PolyU Library

Research Data Management

Sharing good practices for Research Data Management

File Management and Organization


Organizing your files and research data in a structured and consistent way can save you time when you need to retrieve them later. This section covers good practices for folder structure, file naming, and version management.

A good folder structure lets you locate the files you need easily, and planning it early in the project lets you build a logical one. The nature of your research project will also determine how you organize your folders and handle the data. Here are some good practices to consider when designing your folder structure:


Design with a hierarchical folder structure

A hierarchical folder structure is a systematic way to organize your files. Start with folders for broad topics, then create sub-folders for more specific topics at the next level. Try not to make the hierarchy too deep (say, no more than 4 levels), as this makes files harder to retrieve. Also avoid putting an excessive number of items in each folder; keep it to fewer than 10 if possible.
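As a rough illustration, the two rules of thumb above (depth and number of items per folder) can be checked with a short Python script. This is only a sketch: the function name audit_tree and the constants are hypothetical, and it assumes that depth is counted in folder levels below the project root.

```python
import os
from pathlib import Path

MAX_DEPTH = 4    # assumed reading of "not more than 4 levels" below the root
ITEM_LIMIT = 10  # folders should hold fewer than this many items


def audit_tree(root, max_depth=MAX_DEPTH, item_limit=ITEM_LIMIT):
    """Return (relative_folder, problem) pairs for folders that break the guidelines."""
    root = Path(root)
    problems = []
    for dirpath, dirnames, filenames in os.walk(root):
        rel = Path(dirpath).relative_to(root)
        depth = len(rel.parts)  # 0 for the root folder itself
        if depth > max_depth:
            problems.append((str(rel), f"depth {depth} is more than {max_depth}"))
        n_items = len(dirnames) + len(filenames)
        if n_items >= item_limit:
            problems.append((str(rel), f"{n_items} items, limit is fewer than {item_limit}"))
    return problems
```

Calling audit_tree on your project folder returns an empty list when every folder follows both guidelines. The thresholds are suggestions, so adjust them to suit your project.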

 

Document your directory structure and naming convention

Proper documentation of your folder structure helps you, your collaborators, and other researchers with whom you share your data understand how the materials are organized. It also helps everyone on the research team file similar items consistently.
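One possible way to start such documentation is to generate a plain-text outline of the folders and paste it into a README file. The sketch below is an illustration only; render_tree is a hypothetical helper, and the annotations explaining what each folder is for still need to be written by hand.

```python
from pathlib import Path


def render_tree(root, indent=""):
    """Return the folder outline as a list of text lines, folders listed first."""
    lines = []
    # Sort so that sub-folders come before files, each group in alphabetical order.
    entries = sorted(Path(root).iterdir(), key=lambda p: (p.is_file(), p.name.lower()))
    for entry in entries:
        suffix = "/" if entry.is_dir() else ""
        lines.append(f"{indent}{entry.name}{suffix}")
        if entry.is_dir():
            lines.extend(render_tree(entry, indent + "    "))
    return lines
```

For example, print("\n".join(render_tree("my_project"))) prints the outline of a folder named my_project (a placeholder name), with one level of indentation per folder level.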

 


Separate old versions from working documents

Put old versions of documents in a separate folder so that you only see the latest version in the working folder. This helps keep your folders tidy and prevents you from accidentally working on an outdated version.


An appropriate file name helps you understand what information the file contains. It also shortens the time spent locating the file later. Here are some good practices for naming your files:


Assign descriptive names

Filenames should reflect the content of the files using elements such as project name, researcher, date, location, data type, and version, arranged in a consistent order. This lets you preview the content and organize the files logically.


Keep names short but meaningful

Most systems, software, and repositories limit the length of file names. You may use abbreviations or codes for elements (such as researcher or data type) to keep filenames short but informative.

Avoid spaces

Spaces in filenames may not be handled correctly by some software. Use alternatives such as an underscore (research_data), a dash (research-data), no separator (researchdata), or camel case (ResearchData).

Avoid using non-alphanumeric characters

Do not use special characters such as @ ~ \ / < > | ? ! [ ] " * : ; = + & $ % in filenames. They increase the likelihood of errors when the files are opened in another application or operating system.
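The two rules above (no spaces, no special characters) can be applied automatically. The following Python sketch is one possible approach, not a standard tool: sanitize_filename is a hypothetical helper that keeps letters, digits, underscores, and dashes, replaces every other run of characters in the name with a single separator, and leaves the file extension untouched.

```python
import re


def sanitize_filename(name, separator="_"):
    """Replace spaces and special characters in a file name, keeping the extension."""
    stem, dot, ext = name.rpartition(".")
    if not dot:  # no extension present
        stem, ext = name, ""
    # Collapse each run of disallowed characters into one separator,
    # then drop any separator left at the start or end of the name.
    stem = re.sub(r"[^A-Za-z0-9_-]+", separator, stem).strip(separator)
    return f"{stem}.{ext}" if ext else stem


print(sanitize_filename("interview notes (draft) #2.txt"))  # interview_notes_draft_2.txt
```

Pass separator="-" if your naming convention uses dashes instead of underscores.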


Ensure files are in chronological order

Use the YYYYMMDD format (e.g. 20200423 instead of 23042020 or 04232020) for filenames that contain dates, and zero-padded two-digit numbers (e.g. 01, 02, 03 instead of 1, 2, 3) for filenames with sequential numbers. These practices ensure your files sort correctly.
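To see why these formats sort correctly, consider the short Python sketch below. The helper dated_name and the order of the elements (date, project, sequence number) are assumptions made for illustration; use whatever order your own convention defines.

```python
from datetime import date


def dated_name(project, day, sequence, extension):
    """Build a file name with a YYYYMMDD date and a two-digit sequence number."""
    return f"{day:%Y%m%d}_{project}_{sequence:02d}.{extension}"


names = [
    dated_name("survey", date(2021, 1, 5), 1, "csv"),
    dated_name("survey", date(2020, 4, 23), 10, "csv"),
    dated_name("survey", date(2020, 4, 23), 2, "csv"),
]

# Plain alphabetical sorting now matches chronological and numerical order.
for name in sorted(names):
    print(name)
# 20200423_survey_02.csv
# 20200423_survey_10.csv
# 20210105_survey_01.csv
```

Without the leading zero, survey_10 would sort before survey_2, and with a DDMMYYYY date, 05012021 would sort before 23042020 even though it is the later date.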


Document your file naming conventions 

Keep a file that explains your filename format, the abbreviations used, and any coded elements. It will help everyone, including yourself, recall and interpret the file names in the future.

Batch Renaming Tools


Consider using a batch renaming tool to rename files imported from another system, software, or device so that they follow your file naming conventions.
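If no dedicated tool is at hand, a few lines of Python can do a simple batch rename. The sketch below is a minimal illustration with hypothetical function names: plan_renames only proposes new names so that you can review them first, and apply_renames then performs the renaming and stops if a target name is already taken.

```python
from pathlib import Path


def plan_renames(folder, prefix, extension):
    """Map each matching file to a sequentially numbered name (nothing is renamed yet)."""
    files = sorted(Path(folder).glob(f"*.{extension}"))
    width = max(2, len(str(len(files))))  # at least two digits: 01, 02, ...
    return {
        path: path.with_name(f"{prefix}_{index:0{width}d}.{extension}")
        for index, path in enumerate(files, start=1)
    }


def apply_renames(plan):
    """Rename the files in a reviewed plan, refusing to overwrite existing files."""
    for old, new in plan.items():
        if old == new:
            continue  # already follows the convention
        if new.exists():
            raise FileExistsError(f"{new} already exists; renaming stopped")
        old.rename(new)
```

Review the dictionary returned by plan_renames before calling apply_renames, and try the script on a copy of your files first, since renaming cannot be undone automatically.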


Versioning your research data and files properly lets you retrieve a specific version easily. This is useful when you want to rework or recover your data from a specific stage of your project. Here are some good practices for versioning files and data:


Use a sequential numbering system

Add a sequential number (v01, v02, v03) to the filename, or use a two-part numbering rule (v1.00, v1.01, v2.00) in which the number before the decimal point indicates major changes and the number after it indicates minor changes. Avoid ambiguous terms like revision, final, final2.
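The two-part numbering rule can be sketched in Python as follows. The helper bump_version is hypothetical and assumes the version tag appears in the filename in the form v1.00 (one or more digits, a dot, then exactly two digits).

```python
import re

_VERSION = re.compile(r"v(\d+)\.(\d{2})")


def bump_version(filename, major=False):
    """Return the filename with its version tag raised by one minor or major step."""
    match = _VERSION.search(filename)
    if match is None:
        raise ValueError(f"no version tag like v1.00 found in {filename!r}")
    major_no, minor_no = int(match.group(1)), int(match.group(2))
    if major:
        major_no, minor_no = major_no + 1, 0  # a major change resets the minor number
    else:
        minor_no += 1
        if minor_no > 99:
            raise ValueError("minor number exceeds two digits; start a new major version")
    new_tag = f"v{major_no}.{minor_no:02d}"
    return filename[:match.start()] + new_tag + filename[match.end():]


print(bump_version("analysis_v1.09.docx"))              # analysis_v1.10.docx
print(bump_version("analysis_v1.10.docx", major=True))  # analysis_v2.00.docx
```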


 

Keep milestone versions only

Although we do not advocate deleting any version during your research process, we recommend keeping only major versions for long-term preservation, because managing files over the long term costs time and money.


Set original files as read-only

Keeping a read-only copy of your raw data protects it from accidental changes.
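File managers on most operating systems let you mark a file as read-only in its properties. The same can be done from a script; the Python sketch below uses the standard os and stat modules, and make_read_only is a hypothetical helper name. Note that on Windows, os.chmod can only toggle the read-only flag, which is sufficient for this purpose.

```python
import os
import stat


def make_read_only(path):
    """Remove write permission for everyone while keeping the other permissions."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode & ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))
```

A read-only flag guards against accidental edits but is not a backup: the file can still be deleted or made writable again, so keep a separate copy of your raw data as well.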


Document your versions

Record all changes made whenever a new version is created. This lets you and your collaborators identify the differences between versions and locate the correct one in the future.
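A plain-text change log kept next to the files is often enough. The sketch below shows one possible layout, with hypothetical helper names and an assumed tab-separated format of date, version, author, and summary; adapt the fields to your own project.

```python
from datetime import date
from pathlib import Path


def format_change(version, author, summary, day=None):
    """Return one tab-separated change log line: date, version, author, summary."""
    day = day or date.today()
    return f"{day:%Y%m%d}\t{version}\t{author}\t{summary}"


def append_change(log_path, version, author, summary, day=None):
    """Add one line to the end of the change log, creating the file if needed."""
    with Path(log_path).open("a", encoding="utf-8") as log:
        log.write(format_change(version, author, summary, day) + "\n")
```

For example, append_change("CHANGELOG.txt", "v1.01", "KL", "Removed duplicate rows") adds a line dated today to a file named CHANGELOG.txt (a placeholder name).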

Version Control Tools


Version control refers to software tools that allow users to track changes made to files within a computer's directory. While version control systems are commonly used in software development, they are increasingly utilised for collaborative purposes in academic and research contexts. These systems are ideal for simple text files, such as computer code and documents.

Git

Git is a free and open source distributed version control system designed to efficiently manage small to very large projects.
