PolyU Library

The PolyU Research Data Repository

This guide introduces key features of the PolyU Research Data Repository.

Scenarios for Data Deposit


The PolyU Research Data Repository allows researchers to deposit and share data in a variety of scenarios. The types of research data typically deposited in each context are listed below:

Journal Publications:

  • Processed data relevant to the publication
  • Supplementary materials, such as figures, tables, and charts
  • Methodology and experimental protocols
  • Code or scripts used for analysis or simulations
  • Supporting documentation and metadata

Pre-Publication Reviews:

  • Preprints or draft versions of the research manuscript
  • Associated datasets and results
  • Supporting materials, including figures, tables, and additional analyses
  • Methodological details and protocols
  • Relevant metadata and documentation

Patent Applications:

  • Experimental data and results supporting the patent claims
  • Prototypes, designs, or schematics
  • Methodology and detailed experimental procedures
  • Supporting documentation, including patent application drafts and related materials
  • Metadata and documentation describing the data and its relationship to the patent application

Collaborative Research Projects:

  • Data generated by multiple research teams or institutions
  • Interdisciplinary datasets and findings
  • Project reports, meeting minutes, and progress updates
  • Research instruments, surveys, or questionnaires
  • Documentation describing the collaboration and data sharing agreements

Theses and Dissertations:

  • Data collected during the research process
  • Analysis results and findings
  • Supporting materials, such as survey instruments or interview transcripts
  • Methodological details and protocols
  • Relevant metadata and documentation

Data File Preparation


The PolyU Research Data Repository welcomes a wide range of research data, including:

  • Quantitative, qualitative, geospatial, multimedia, simulation, and modeling data, among others
  • Comprehensive metadata and documentation that accompany the datasets to improve readability and support reuse

Note the following precautions and limits when uploading your data:

  • Maximum number of files per dataset: 100
  • Maximum individual file size: 1 GB
  • File format
    • See below for the recommended formats
    • Executable programs or scripts (e.g., exe, bat, vbs) are NOT allowed
  • Sensitive or confidential data must be appropriately anonymized or de-identified before deposit to ensure compliance with ethical and legal requirements
  • To deposit the full text of your research outputs, please use the PolyU Institutional Research Archive (PIRA) instead

If you have special requirements for the number, size, or format of your files, please contact us.
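The limits above can be checked locally before you start an upload. The following is a minimal sketch, not a tool provided by the repository: the function name `check_dataset` is hypothetical, "1 GB" is assumed to mean 1024³ bytes, and the blocked extensions are only the three examples named in this guide, so the repository's actual rules may be stricter.

```python
from pathlib import Path

MAX_FILES = 100                 # file-count limit stated in this guide
MAX_FILE_BYTES = 1024 ** 3      # assumption: "1 GB" is read as 1024^3 bytes
# Assumption: only the examples named in this guide; the real list may be longer.
BLOCKED_SUFFIXES = {".exe", ".bat", ".vbs"}


def check_dataset(folder):
    """Return a list of human-readable problems found in `folder`."""
    # Collect every regular file, including files in subfolders.
    files = [p for p in Path(folder).rglob("*") if p.is_file()]
    problems = []
    if len(files) > MAX_FILES:
        problems.append(f"{len(files)} files found; the limit is {MAX_FILES}")
    for path in files:
        # Compare extensions case-insensitively (".BAT" counts as ".bat").
        if path.suffix.lower() in BLOCKED_SUFFIXES:
            problems.append(f"{path.name}: executable file type is not allowed")
        if path.stat().st_size > MAX_FILE_BYTES:
            problems.append(f"{path.name}: larger than 1 GB")
    return problems
```

Calling `check_dataset("my_dataset")` on a folder returns an empty list when none of these three checks fails. It does not check for sensitive data, which still needs to be reviewed by hand.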

 

Recommended File Format

File format selection is a crucial consideration for the long-term accessibility of your research data. Ideally, the chosen file format should have the following characteristics:

  • Non-proprietary: Opt for open file formats that are not subject to proprietary restrictions.
  • Open documentation: Ensure that the file format has well-documented specifications, promoting transparency and ease of understanding.
  • Wide adoption: Select a file format that is widely adopted and recognized within the research community, enhancing interoperability and future accessibility.
  • Software compatibility: Choose a file format that can be opened and utilized by multiple software applications, minimizing dependencies on specific software tools.
  • Compression considerations: Prefer file formats that employ either no compression or lossless compression techniques, allowing for reduced file sizes without compromising data quality.
  • Avoid embedded scripts or files: To ensure stability and compatibility, refrain from using file formats that incorporate embedded scripts or files.

While it may not always be possible to find a file format that meets all the criteria mentioned above, it is advisable to safeguard your data by saving it in multiple formats whenever feasible. This approach ensures redundancy and adaptability, mitigating risks associated with format obsolescence over time.
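As one illustration of keeping a second copy in another open format, the sketch below rewrites a CSV file as a tab-delimited file using only Python's standard library. The function name `csv_to_tab` and the file names are hypothetical, and UTF-8 encoding is an assumption about your data.

```python
import csv


def csv_to_tab(csv_path, tab_path):
    """Copy a comma-separated file to a tab-delimited file, row by row."""
    # newline="" lets the csv module handle line endings itself.
    with open(csv_path, newline="", encoding="utf-8") as src, \
         open(tab_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.writer(dst, delimiter="\t")
        rows = 0
        for row in csv.reader(src):
            writer.writerow(row)
            rows += 1
    return rows  # number of rows written, header included
```

For example, `csv_to_tab("survey.csv", "survey.tab")` writes the same rows with tabs as separators and returns the row count, which can be compared against the source file as a quick sanity check.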

The table below provides guidance on the recommended and accepted file formats for data sharing, reuse, and preservation within the PolyU Research Data Repository:

Data Type

Recommended formats

Acceptable formats


Spreadsheets
  • CSV (.csv)
  • Tab-delimited file (.tab)
  • Microsoft Excel (.xls, .xlsx)
  • Open Document Spreadsheet (.ods)

Statistical Data
  • SPSS (.sav)
  • Stata (.dta)
  • SAS (.sas7bdat)
  • DDI (.xml)
  • SPSS Portable (.por)

Databases
  • SQL (.sql)
  • SIARD (.siard)
  • CSV (.csv)
  • XML (.xml)
  • Microsoft Access (.mdb, .accdb)
  • dBase (.dbf)
  • HDF5 (.hdf5, .he5, .h5)

Text
  • PDF/A (.pdf)
  • ODT (.odt)
  • Unicode text (.txt)
  • Rich Text File (.rtf)
  • Microsoft Word (.doc, .docx)
  • PDF other than PDF/A (.pdf)
  • Non-Unicode text (.txt)

Images
  • TIFF (.tif, .tiff)
  • Photoshop files (.psd)
  • PDF (.pdf)
  • JPEG (.jpg, .jpeg, .jp2)
  • PNG (.png)

Audio
  • FLAC (.flac)
  • WAVE (.wav)
  • MP3 (.mp3)
  • AAC (.aac, .m4a)
  • AIFF (.aif, .aiff)
  • OGG (.ogg)

Video
  • MPEG-4 (.mp4)
  • Matroska (.mkv)
  • Windows Media Video (.wmv)
  • QuickTime (.mov, .qt)

You can find recommended and acceptable file formats for other types of data from the UK Data Service.




     
Image: CC BY-SA, by SangyaPundir

Make Your Data as FAIR as Possible!


To make your shared data useful, it should be as FAIR as possible. FAIR stands for Findable, Accessible, Interoperable, and Reusable. It is a set of guiding principles formulated by FORCE11 that aims to optimize the discovery and reuse of research data.

FAIR is a framework for managing the research data you share so that other researchers can find, understand, and reuse it effectively. Below is a simplified description of each FAIR principle. We encourage you to read the original document for details.

  Findable

Make sure others can discover the data by providing rich metadata and assigning a persistent identifier, e.g. a DOI.

  Accessible

Both data and metadata should be retrievable via a standard protocol. If the data cannot be made open, make sure the metadata remains publicly available.

  Interoperable

(Meta)data should be interpretable with different tools, applications, and systems by using recognized formats and standards.

  Reusable

There should be a clear license for reuse of the shared data and a persistent identifier for citation. Proper documentation helps others interpret and reuse the data.

Depositing your research data in a data repository is a good way to make your data FAIR, as repositories usually assign a DOI to your data, populate the metadata, and help you specify a license for reuse.
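Before depositing, it can help to check that the descriptive information you plan to supply covers the points above. The sketch below is only an illustration: the field names in `FAIR_FIELDS` and the function `missing_fair_fields` are hypothetical and do not reflect the repository's actual metadata schema. Accessibility via a standard protocol is handled by the repository itself, so it is not checked here.

```python
# Hypothetical field names chosen for illustration; they are NOT the
# repository's actual metadata schema.
FAIR_FIELDS = {
    "title": "Findable: a descriptive title",
    "description": "Findable: rich descriptive metadata",
    "identifier": "Findable/Reusable: a persistent identifier such as a DOI",
    "file_formats": "Interoperable: recognized, documented file formats",
    "license": "Reusable: a clear license for reuse",
    "documentation": "Reusable: documentation (e.g. a README) for interpretation",
}


def _is_blank(value):
    """Treat None, empty containers, and whitespace-only strings as missing."""
    if isinstance(value, str):
        return not value.strip()
    return not value


def missing_fair_fields(record):
    """Return the names of FAIR_FIELDS entries that `record` leaves empty."""
    return [name for name in FAIR_FIELDS if _is_blank(record.get(name))]
```

Passing a dictionary that describes your dataset returns the fields still to be filled in, and `FAIR_FIELDS[name]` gives a short reminder of why each one matters. An identifier is normally assigned by the repository on deposit, so that field may legitimately stay empty until then.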