PolyU Library

Research Data Management

Sharing good practices for Research Data Management

File Management and Organization


Organizing your files and research data in a structured and consistent way can save you time when you need to retrieve them later. This section covers good practices for folder structure, file naming, and version management.

Folder Structure

A good folder structure allows you to easily locate the files you need. Planning your folder structure early in the research process allows you to build a logical one. The nature of your research project determines how you organize your folders and handle your data. Here are some good practices you should consider when designing your folder structure:


Design with a hierarchical folder structure

A hierarchical folder structure is a systematic way to organize your files. Start with folders for broad topics, then create sub-folders for more specific topics at the next level. Avoid a deep hierarchy (for instance, more than 4 levels), as this can make files difficult to retrieve. Also avoid placing an excessive number of items in any one folder.
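The two suggestions above can be checked automatically. The following is a minimal sketch, assuming a project stored on a local drive; the function name `audit_folders` and the limit of 50 items per folder are illustrative assumptions, since this guide does not give a specific number for "excessive".

```python
from pathlib import Path


def audit_folders(root, max_levels=4, max_items=50):
    """Return (folder, problem) pairs for folders that break the guidelines.

    max_levels follows the 4-level suggestion above; max_items=50 is an
    arbitrary illustrative threshold, not a figure from this guide.
    """
    root = Path(root)
    folders = [root] + sorted(p for p in root.rglob("*") if p.is_dir())
    findings = []
    for folder in folders:
        relative = folder.relative_to(root)
        # Levels are counted below the project root, so root itself is level 0.
        if len(relative.parts) > max_levels:
            findings.append((relative.as_posix(), "too deep"))
        if sum(1 for _ in folder.iterdir()) > max_items:
            findings.append((relative.as_posix(), "too many items"))
    return findings
```

Calling `audit_folders("my_project")` on a hypothetical project folder returns an empty list when every folder is within both limits.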

 

Document your directory structure and naming convention

Documenting your folder structure helps you, your collaborators, and other researchers with whom you share your data understand how the materials are organized. It also helps everyone in the research team file materials consistently.
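One way to start such documentation is to generate an outline of the current folder tree and paste it into a README file, then annotate each folder by hand. This is only a sketch: the function name `describe_tree` and the two-space indentation are assumptions made for illustration.

```python
from pathlib import Path


def describe_tree(root, indent="  "):
    """Return an indented text outline of every folder and file under root."""
    root = Path(root)
    lines = [root.name + "/"]
    # Sorting paths keeps each folder's contents directly below that folder.
    for path in sorted(root.rglob("*")):
        depth = len(path.relative_to(root).parts)
        marker = "/" if path.is_dir() else ""
        lines.append(indent * depth + path.name + marker)
    return "\n".join(lines)
```

The outline only records what exists. The explanation of what belongs in each folder, and the naming convention used, still has to be written by the research team.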

 


Separate old versions from working documents

Put old versions of documents in a separate folder so that only the latest version appears in the working folder. This keeps your folders tidy and helps you avoid accidentally working on an outdated version.

File Naming

An appropriate file name can help you understand what information is in the file. It will also shorten the time spent on locating the file in the future. You may find some good practices in naming your files below:


Assign descriptive names

Filenames should reflect the content of the files, using elements such as project name, researcher, date, location, data type, and version in a consistent order. This makes it possible to tell what a file contains without opening it and to organize files logically.
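As an illustration, a small helper can assemble these elements in a fixed order so that every file in a project follows the same pattern. The element order, the underscore separator, and all the example values below (`AirQual`, `CWL`, `survey`) are hypothetical choices, not a convention prescribed by this guide.

```python
from datetime import date


def build_filename(project, researcher, day, data_type, version, extension):
    """Join the naming elements in one fixed order, separated by underscores."""
    parts = [
        project,
        researcher,
        day.strftime("%Y%m%d"),  # sortable date, e.g. 20200423
        data_type,
        f"v{version:02d}",       # two-digit version, e.g. v01
    ]
    return "_".join(parts) + "." + extension


print(build_filename("AirQual", "CWL", date(2020, 4, 23), "survey", 1, "csv"))
# AirQual_CWL_20200423_survey_v01.csv
```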


Keep short but meaningful names

Most systems, software, and repositories limit the length of file names. You may use abbreviations or codes for elements (such as researcher or data type) to keep filenames short but informative.

Avoid spaces

Spaces in filenames are not handled correctly by some software. Use an alternative such as an underscore (research_data), a dash (research-data), no separator (researchdata), or camel case (ResearchData).

Avoid using non-alphanumeric characters

Do not use special characters like @ ~ \ / < > | ? ! [ ] " * : ; = + & $ % in filenames. They increase the likelihood of errors when the files are opened in another application or operating system.
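The rules on spaces and special characters can be applied together with two regular-expression substitutions. The sketch below is deliberately strict: it keeps only ASCII letters, digits, dots, underscores, and dashes, so it also drops accented and non-Latin characters, which may not be what every project wants. The function name `sanitize_filename` is an assumption for illustration.

```python
import re


def sanitize_filename(name, separator="_"):
    """Replace runs of whitespace with a separator and drop special characters."""
    name = re.sub(r"\s+", separator, name.strip())
    # Keep only letters, digits, dot, underscore, and dash.
    return re.sub(r"[^A-Za-z0-9._-]", "", name)


print(sanitize_filename("research data (final)!.csv"))  # research_data_final.csv
print(sanitize_filename("Q&A notes: draft*.txt"))        # QA_notes_draft.txt
```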


Ensure files are in chronological order

Use the format of YYYYMMDD (e.g. 20200423 instead of 23042020 or 04232020) for filenames containing date elements and two-digit numbers (e.g. 01, 02, 03 instead of 1, 2, 3) for filenames with sequential numbers. These practices ensure your files can be sorted properly.
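The reason behind both rules is that file browsers and most tools sort names character by character. The short demonstration below uses made-up filenames to show how day-first dates and unpadded numbers sort out of order, while YYYYMMDD dates and zero-padded numbers sort as intended.

```python
day_first = ["23042020_log.txt", "01052019_log.txt", "15012021_log.txt"]
year_first = ["20200423_log.txt", "20190501_log.txt", "20210115_log.txt"]

print(sorted(day_first))
# ['01052019_log.txt', '15012021_log.txt', '23042020_log.txt']  (2019, 2021, 2020)
print(sorted(year_first))
# ['20190501_log.txt', '20200423_log.txt', '20210115_log.txt']  (2019, 2020, 2021)

unpadded = [f"run_{n}" for n in (1, 2, 10)]
padded = [f"run_{n:02d}" for n in (1, 2, 10)]

print(sorted(unpadded))  # ['run_1', 'run_10', 'run_2']
print(sorted(padded))    # ['run_01', 'run_02', 'run_10']
```

Two digits cover up to 99 files. A series expected to grow beyond that would need three or more digits from the start.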


Document your file naming conventions 

Keeping a file that explains your filename format, the abbreviations used, and any coded elements helps everyone, including yourself, to recall and understand the file names in the future.

Batch Renaming Tools


You may consider using a batch renaming tool to rename the files imported from another system, software, or device according to your file naming conventions. 
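Dedicated batch renaming tools are usually the most convenient option, but the same idea can be sketched in a few lines of Python. The example below is an assumption-laden illustration, not a replacement for such tools: the function names, the `prefix_NN.ext` pattern, and the lower-cased extension are all hypothetical choices. It separates planning from renaming so the plan can be reviewed before any file is touched.

```python
from pathlib import Path


def plan_renames(folder, prefix, pattern="*"):
    """Return (old_path, new_path) pairs without renaming anything."""
    files = sorted(p for p in Path(folder).glob(pattern) if p.is_file())
    # At least two digits, more if there are 100 or more files.
    width = max(2, len(str(len(files))))
    return [
        (old, old.with_name(f"{prefix}_{number:0{width}d}{old.suffix.lower()}"))
        for number, old in enumerate(files, start=1)
    ]


def apply_renames(plan):
    """Carry out a reviewed plan, refusing to overwrite existing files."""
    for old, new in plan:
        if old == new:
            continue
        if new.exists():
            raise FileExistsError(f"{new} already exists")
        old.rename(new)
```

A cautious workflow is to print the result of `plan_renames(...)` first, check it against your documented naming convention, and call `apply_renames(...)` on a copy of the data before renaming the originals.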

Version Management

Assigning clear version identifiers to your research data and files allows you to retrieve a specific version easily. This is useful when you wish to rework or retrieve your data from a specific stage of your project. Here are some good practices in versioning files and data:


Use a sequential numbering system

Add a sequential number (v01, v02, v03) to the filename, or use a two-part number (v1.00, v1.01, v2.00) in which the number before the decimal point indicates major changes and the number after it indicates minor changes. Avoid ambiguous labels such as revision, final, or final2, which do not show the order in which versions were created.
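The two-part scheme can be applied mechanically. The sketch below assumes filenames carry the version as `_vMAJOR.MINOR` with a two-digit minor number immediately before the extension (for example `analysis_v1.00.docx`). That placement, and the function name `bump_version`, are assumptions for illustration.

```python
import re

# Matches e.g. "_v1.00" when it is followed by the extension or the end of the name.
VERSION = re.compile(r"_v(\d+)\.(\d{2})(?=\.|$)")


def bump_version(filename, major=False):
    """Return filename with its version raised by one minor or one major step."""
    match = VERSION.search(filename)
    if match is None:
        raise ValueError(f"no version like _v1.00 found in {filename!r}")
    major_no, minor_no = int(match.group(1)), int(match.group(2))
    if major:
        major_no, minor_no = major_no + 1, 0  # a major change resets the minor part
    else:
        minor_no += 1
    if minor_no > 99:
        raise ValueError("minor version exceeds two digits; make a major version")
    new_version = f"_v{major_no}.{minor_no:02d}"
    return filename[:match.start()] + new_version + filename[match.end():]


print(bump_version("analysis_v1.00.docx"))              # analysis_v1.01.docx
print(bump_version("analysis_v1.01.docx", major=True))  # analysis_v2.00.docx
```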


 

Keep milestone versions only

Although we do not advocate deleting any version during your research process, we recommend keeping only major versions for long-term preservation, because of the cost and time involved in managing files over the long term.


Set original files as read-only

Keeping a read-only version of your raw data helps prevent accidental changes to it.
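File properties dialogs in most operating systems offer a read-only option. For many files at once, the same can be done in a script. The following sketch uses the standard `os`-level permission bits through `pathlib`. The function name `make_read_only` is hypothetical, and on Windows this call only toggles the read-only flag.

```python
import stat
from pathlib import Path


def make_read_only(path):
    """Remove write permission for owner, group, and others; keep other bits."""
    path = Path(path)
    write_bits = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH
    path.chmod(path.stat().st_mode & ~write_bits)
```

A read-only flag guards against accidental edits only. Anyone with sufficient rights can remove it, so it does not replace a backup of the raw data.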


Document your versions

Record all changes made whenever a new version is generated. This allows you and your collaborators to identify the differences between versions and makes it easier to locate the correct version in the future.
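A plain-text change log kept beside the files is often enough for this. The sketch below appends one tab-separated line per new version. The column layout, the log file name, and the function name `log_version` are assumptions chosen for illustration, and a spreadsheet or README table would serve equally well.

```python
from datetime import date
from pathlib import Path


def log_version(log_path, filename, version, author, summary, day=None):
    """Append one tab-separated entry to the change log and return it."""
    day = day or date.today()
    entry = f"{day:%Y%m%d}\t{filename}\t{version}\t{author}\t{summary}\n"
    with Path(log_path).open("a", encoding="utf-8") as log:
        log.write(entry)
    return entry
```

For example, `log_version("CHANGELOG.txt", "survey_v1.01.csv", "v1.01", "CWL", "Removed duplicate rows")` adds a dated line describing what changed in that version.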

Version Control Tools


Version control systems are software tools that track the changes made to files within a directory on a computer. While they are most commonly used in the software development industry, they are increasingly used for collaboration in academic and research contexts. These systems work best with plain text files, such as computer code and text-based documents.

Git

Git is a free and open source distributed version control system designed to efficiently manage small to very large projects.
