PolyU Library

Research Data Management

Sharing good practices for Research Data Management

Data Backup & Preservation


A good data backup strategy is crucial for preventing data loss and ensuring the long-term availability of data for future reuse. This section shares good practices for formulating a backup strategy, selecting storage media, and choosing file formats that preserve your research data for long-term access.

When formulating your backup plan, consider factors such as the value of the research data, the expected level of risk, and the cost and time you can afford. Below are some good practices for your reference:

Apply the 3-2-1 rule

  • Keep 3 copies of your research data,
  • Save the copies on 2 types of storage media, and
  • Place 1 copy off-site.
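
The three conditions of the rule can be checked mechanically against a list of planned copies. The following is a minimal sketch only: the `BackupCopy` class, the `satisfies_3_2_1` function, and the example plan are hypothetical names and values introduced for illustration, not part of any PolyU tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupCopy:
    """One copy of the data set (hypothetical model for illustration)."""
    label: str
    media_type: str  # e.g. "laptop disk", "network drive", "cloud"
    off_site: bool   # True if kept away from the primary location


def satisfies_3_2_1(copies: list[BackupCopy]) -> bool:
    """Return True if the copies meet all three parts of the 3-2-1 rule."""
    enough_copies = len(copies) >= 3                          # 3 copies
    enough_media = len({c.media_type for c in copies}) >= 2   # 2 media types
    has_off_site = any(c.off_site for c in copies)            # 1 off-site
    return enough_copies and enough_media and has_off_site


# Illustrative plan; the labels and media types are made up.
plan = [
    BackupCopy("working copy", "laptop disk", off_site=False),
    BackupCopy("master copy", "network drive", off_site=False),
    BackupCopy("archive copy", "cloud", off_site=True),
]
print(satisfies_3_2_1(plan))
```

Dropping the off-site copy, or keeping all copies on one type of media, makes the check fail.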

Schedule your backup

Back up your data at regular intervals and after every significant change to the data.
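
One way to make each backup run repeatable is to copy the data folder into a new timestamped folder. The sketch below assumes the data lives in a single directory; the function name `make_backup` and the folder naming scheme are hypothetical choices for illustration. Running it at a regular interval would be left to the operating system's scheduler (for example cron or Windows Task Scheduler).

```python
import shutil
from datetime import datetime
from pathlib import Path


def make_backup(source_dir: Path, backup_root: Path) -> Path:
    """Copy source_dir into a new timestamped folder under backup_root.

    Hypothetical helper: the naming scheme "<folder>-<YYYYmmdd-HHMMSS>"
    is an assumption, not a convention taken from this guide.
    """
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    target = backup_root / f"{source_dir.resolve().name}-{stamp}"
    # copytree creates the target (and missing parents) and raises
    # FileExistsError if the target folder already exists.
    shutil.copytree(source_dir, target)
    return target
```

Because every run writes to a fresh folder, earlier backups are never overwritten, at the cost of more storage space.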


Check data integrity regularly

Test your backups periodically to ensure that you can recover the data when needed. Migrate the data files to new storage media periodically so that they are not stranded on obsolete media. Compute and compare checksums after each backup and data migration.
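
The comparison after a backup or migration could be automated along the following lines. This is a minimal sketch, assuming SHA-256 as the checksum algorithm (the guide does not name one); the function names `sha256_of`, `build_manifest`, and `compare_copies` are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_copies(source: Path, backup: Path) -> list[str]:
    """List relative paths that are missing from, or differ in, the backup."""
    src = build_manifest(source)
    dst = build_manifest(backup)
    return sorted(name for name in src if dst.get(name) != src[name])
```

An empty list from `compare_copies` means every file in the source has an identical counterpart in the backup. The sketch does not report extra files that exist only in the backup.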


Use file formats with long-term availability

Where possible, use non-proprietary file formats with open documentation, or formats that are widely adopted. See the suggested file formats listed later in this guide.
 

Storage Media

No storage medium offers convenient access while being immune to damage, loss, and obsolescence. A good practice is to keep your research data on at least two different types of storage media, which diversifies the risk for both short-term and long-term storage. Also check periodically that the data is still accessible.

Below are the common storage media used by researchers:

Desktop PC
  Advantages:
  • Convenient to store and analyze the data on the same device
  Risks:
  • Hard disk failure
  • Physical damage (e.g. fire, flooding)
  • No auto backup
  Suitable for: Temporary storage

Laptop
  Advantages:
  • Convenient to store and analyze the data on the same device
  • Portable
  Risks:
  • Hard disk failure
  • Physical damage
  • Stolen / loss
  • No auto backup
  Suitable for: Temporary storage

External Storage
  Advantages:
  • Portable
  • May allow encryption
  Risks:
  • Device failure
  • Physical damage
  • Stolen / loss
  • No auto backup
  • Can become obsolete
  Suitable for: Temporary storage

University's Network Drive (PolyU Home Drive)
  Advantages:
  • Auto-backup
  • Stable and Secure
  Risks:
  • Storage space allocated to your department/research team may not be sufficient
  Suitable for: Master copy storage

Cloud Services
  Advantages:
  • Auto backup
  • Auto-sync with local folders
  • Anytime Anywhere
  Suitable for: Collaboration with members from different institutions

Suggested File Formats

File format is an important issue to consider if you would like to ensure long-term accessibility of your research data. Theoretically, the ideal file format suitable for long-term access should be:

  • non-proprietary
  • with open documentation
  • widely adopted by the research community
  • compatible with multiple software packages
  • uncompressed, or compressed losslessly (reducing file size without loss of quality)
  • free of embedded scripts or files

It may not always be possible to find a file format that meets all the criteria above. To be prudent, consider saving your data in more than one of the formats below:

Data Type

Recommended formats

Acceptable formats


Spreadsheets
  • CSV (.csv)
  • ODS (.ods)
  • Microsoft Excel (.xls, .xlsx)

Statistical Data
  • SPSS Portable (.por)
  • STATA (.dta)
  • DDI (.xml)
  • R
  • SPSS (.sav)
  • SAS (.7dat; .sd2; .tpt)

Databases
  • SQL (.sql)
  • SIARD (.siard)
  • CSV (.csv)
  • XML (.xml)
  • Microsoft Access (.mdb, .accdb)
  • dBase (.dbf)
  • HDF5 (.hdf5, .he5, .h5)

Text
  • PDF/A (.pdf)
  • ODT (.odt)
  • Unicode text (.txt)
  • Microsoft Word (.doc, .docx)
  • Rich Text File (.rtf)
  • PDF other than PDF/A (.pdf)
  • Non-Unicode text (.txt)

Images
  • JPEG (.jpg, .jpeg, .jp2)
  • TIFF (.tif, .tiff)
  • PNG (.png)
  • Photoshop files (.psd)
  • PDF (.pdf)

Audio
  • FLAC (.flac)
  • BWF (.bwf)
  • MXF (.mxf)
  • Matroska (.mka)
  • OPUS
  • WAVE (.wav)
  • MP3 (.mp3)
  • AAC (.aac, .m4a)
  • AIFF (.aif, .aiff)
  • OGG (.ogg)

Video
  • MXF (.mxf)
  • Matroska (.mkv)
  • MPEG-4 (.mp4, .m4a, .m4v)
  • MPEG-2 (.mpg, .mpeg, .m2v, .mpg2)
  • AVI (.avi)
  • QuickTime (.mov, .qt)

You may also find more recommended and acceptable file formats for other types of data from DANS (Data Archiving and Networked Services).
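
As an illustration of saving the same data in more than one format, the sketch below writes a database both as an SQL dump (.sql) and as a CSV file (.csv), two of the formats listed under Databases above. It assumes the data is held in SQLite, which this guide does not mention; the function name `export_sql_and_csv` and the output file names are hypothetical.

```python
import csv
import sqlite3
from pathlib import Path


def export_sql_and_csv(conn: sqlite3.Connection, table: str, out_dir: Path) -> None:
    """Write the database as dump.sql and one table as <table>.csv.

    Assumes `table` is a trusted table name, since it is placed
    directly into the SELECT statement.
    """
    out_dir.mkdir(parents=True, exist_ok=True)

    # Plain-text SQL dump of the whole database.
    sql_text = "\n".join(conn.iterdump())
    (out_dir / "dump.sql").write_text(sql_text, encoding="utf-8")

    # UTF-8 CSV export of a single table, with a header row.
    cursor = conn.execute(f'SELECT * FROM "{table}"')
    header = [column[0] for column in cursor.description]
    with (out_dir / f"{table}.csv").open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(cursor)
```

Both outputs are plain text, so they remain readable without the software that produced them.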



Checksum Validation

A checksum is a quick way to check data integrity before and after a backup or file migration. A checksum tool computes a short, fixed-length string from the contents of each data file. Any change to the file will, in practice, produce a different string, so comparing the strings reveals data alteration or corruption.

Below are some commonly used free tools for computing checksums:
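
The idea can be tried out in a few lines. This is a minimal sketch, assuming SHA-256 as the checksum algorithm (one common choice; the guide does not prescribe one). The `checksum` function and the sample data are made up for illustration.

```python
import hashlib


def checksum(data: bytes) -> str:
    """Return the SHA-256 checksum of the given bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


# Made-up sample content; the two versions differ in a single character.
original = b"participant_id,score\n1,42\n"
altered = b"participant_id,score\n1,43\n"

# Identical content always gives the same checksum.
print(checksum(original) == checksum(original))
# Changed content gives a different checksum.
print(checksum(original) == checksum(altered))
```

In practice the checksum is recorded when the file is created and recomputed after each backup or migration. A mismatch signals that the copy should not be trusted.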