
Research Data Management

Sharing good practices for Research Data Management



Data Documentation & Metadata


Data documentation consists of human-readable files and records that explain the content, structure, and meaning of data, while metadata consists of standardized, machine-readable fields that make the data discoverable and reusable.

They both provide information about the data, ensure your data is understandable, make future analysis and reuse possible, and thus increase the value of your research data.

It is always easier to create data documentation at the beginning of your research project and update it throughout the research process. Good data documentation usually provides information at two levels:

  • Project level:
    • research background and design, e.g. investigators, funders, research aims, hypotheses, etc.
    • data collection method.
    • structure of data files.
    • procedure for data cleaning and other quality assurance measures adopted.
    • version of the dataset and modifications made.
    • source of secondary data used, if any.
    • license for reuse.
    • related publications and other research outputs.
  • Variable level:
    • definition of the parameters.
    • unit of measurements.
    • format for date, time and other parameters.
    • code values, e.g. 1=female; 2=male, etc.
    • code for missing values.
    • corresponding question number.

Depending on the nature of the research and the data collection method, data documentation can be recorded in different forms such as a readme file, data dictionary, codebook, laboratory notebook, diary, etc. They all share the same goal - to ensure your research data can be understood by current and future researchers who would like to make use of the data again, including yourself!


Metadata

Metadata means data about data. It describes a dataset in a structured, standardized way, which allows different computer systems to interpret the contents automatically and so facilitates interoperability among systems. Below are the common types of metadata:

  • Descriptive Metadata: enables discovery, indexing, and retrieval of the dataset.
  • Technical Metadata: describes how a dataset was produced and structured.
  • Administrative Metadata: describes user rights and management of the dataset, for example:
    • Rights & license for reuse
    • Access information like restrictions and embargo period


Metadata Standards

Metadata can be recorded in a variety of formats like text documents, HTML, or XML. An example of a widely used metadata standard for generic research data is Dublin Core (DC). There are also tools that can help you identify the common metadata standards used in your subject areas.
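As an illustration, a Dublin Core record can be thought of as a set of element-value pairs. The Python sketch below shows how the fifteen DC elements might be filled in for a hypothetical dataset; the element names come from the Dublin Core element set, while every value is invented purely for illustration.

    # A minimal sketch of a Dublin Core record for a hypothetical dataset.
    # Element names follow the Dublin Core element set; all values are illustrative.
    dublin_core_record = {
        "title": "Hourly Roadside PM2.5 Readings, 2023",
        "creator": "Chan, Tai Man",
        "subject": "air quality; PM2.5; environmental monitoring",
        "description": "Hourly PM2.5 readings collected from roadside monitoring stations.",
        "publisher": "Example University",
        "contributor": "Environmental Research Group",
        "date": "2023-12-31",
        "type": "Dataset",
        "format": "text/csv",
        "identifier": "https://doi.org/10.xxxx/example",  # placeholder identifier
        "source": "",                                      # blank if not derived from another resource
        "language": "en",
        "relation": "",
        "coverage": "Hong Kong, 2023",
        "rights": "CC BY 4.0",
    }

    # The same record could equally be serialized to XML or embedded in an HTML page.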



Resources for Creating Documentation

A readme file provides information about a data file. It helps ensure that you and other researchers can understand and reuse the data in the future. A readme file is usually saved as a plain text file rather than in a proprietary format (e.g. MS Word) for long-term accessibility.

You can learn how to create a readme file from Cornell University’s Research Data Management Service Group, and download their suggested template to adapt it for your own data.
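As a rough illustration only (and not the Cornell template itself), the Python sketch below writes a skeleton README.txt with some commonly recommended sections; the headings are placeholders to adapt to your own project.

    # A minimal sketch that writes a skeleton readme file as plain text.
    # The section headings below are illustrative; adapt them to your project.
    sections = [
        "GENERAL INFORMATION",
        "  Dataset title:",
        "  Principal investigator / contact:",
        "  Date of data collection:",
        "",
        "DATA & FILE OVERVIEW",
        "  List of files, with a brief description of each:",
        "",
        "METHODOLOGY",
        "  Data collection method:",
        "  Data processing / quality assurance steps:",
        "",
        "DATA-SPECIFIC INFORMATION",
        "  Variable names, units, code values and missing-value codes:",
        "",
        "SHARING & ACCESS",
        "  License for reuse:",
        "  Related publications:",
    ]

    # Plain text keeps the readme accessible in the long term (see above).
    with open("README.txt", "w", encoding="utf-8") as f:
        f.write("\n".join(sections) + "\n")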

A data dictionary is a file that provides a meaningful description for each variable and value in your dataset. You can find an example from the Open Science Framework (OSF) and learn more from its section on How to Make a Data Dictionary.

For data that includes Python or R scripts, you may also provide brief information about the code and its purpose.
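For illustration only (this is not the OSF example), the sketch below builds a small data dictionary in Python with pandas and saves it as a CSV file; the variable names, units, and code values are all hypothetical.

    # A minimal sketch of a data dictionary saved as a CSV file.
    # Variable names, units, and code values are hypothetical.
    import pandas as pd

    data_dictionary = pd.DataFrame(
        [
            {"variable": "participant_id", "description": "Unique participant identifier",
             "type": "string", "unit_or_codes": ""},
            {"variable": "age", "description": "Age at time of survey",
             "type": "integer", "unit_or_codes": "years"},
            {"variable": "gender", "description": "Self-reported gender",
             "type": "integer", "unit_or_codes": "1=female; 2=male; -9=missing"},
            {"variable": "pm25", "description": "Mean daily PM2.5 exposure",
             "type": "float", "unit_or_codes": "micrograms per cubic metre"},
        ]
    )

    data_dictionary.to_csv("data_dictionary.csv", index=False)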

A codebook provides information about data from a survey instrument. It describes the contents, structure, and layout of the data file, the response codes that are used to record survey responses, and other information.

An example can be found in the ICPSR Guide to Codebooks; we recommend reading the ICPSR guide for more details.
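As a hypothetical illustration (not the ICPSR example), a single codebook entry typically records the question text, the response codes, and how missing values are coded. A minimal Python sketch:

    # A minimal sketch of one codebook entry for a hypothetical survey question.
    codebook_entry = {
        "variable": "Q3_SMOKING",
        "label": "Current smoking status",
        "question_text": "Do you currently smoke cigarettes?",
        "value_labels": {1: "Yes", 2: "No", 8: "Don't know", 9: "Refused"},
        "missing_values": [8, 9],
    }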


A laboratory notebook documents the inputs, conditions, workflows, and other information involved in conducting an experiment. It records data provenance, which improves the transparency of the research process and increases the reliability of research results.

Many researchers are now using electronic laboratory notebooks (ELNs) to replace traditional paper laboratory notebooks because an ELN:

  • Simplifies data management: An ELN stores information in a centralized place, which helps streamline the data management process.
  • Allows audit trails and controlled access: An ELN provides an organized and controlled system that facilitates audits and reviews. All edits are logged and timestamped.
  • Enables searching: An ELN digitizes the entire research process, making it easier to search across previous experiments.
  • Ensures long-term access: An ELN can be backed up on a server or in the cloud, which ensures future accessibility.
  • Enhances collaboration: Sharing scientific data stored in an ELN with collaborators is much easier than with paper-based notebooks.