Data Processing

Data processing is a critical step for researchers who want to draw meaningful insights from collected data. It covers data provenance, cleaning, analysis, and visualization.

The Library has subscribed to DataCamp, an interactive learning platform that allows you to build data skills at your own pace. You can find topics like data cleaning, data visualization, machine learning, data engineering, statistics, and more. Most courses are beginner-friendly!

There are many interactive learning materials for Python, R, SQL, and Power BI.

Data provenance is a record trail that helps the current research team, and other researchers in the future, understand the origin, changes, workflow, and processes of the data. It plays an important role in scientific research by building trust and credibility in the data and by making data analysis reproducible.

Provenance can be recorded simply in a read-me file, but many researchers now capture provenance trails with the following tools:

	Electronic Lab Notebooks (ELNs): Software that replaces traditional paper lab notebooks. ELNs document the research design, experiments, and procedures performed in a laboratory, and support access control and collaboration.
	Online Computational Notebooks: Interactive computing environments for writing and running code, documenting methods, and sharing analyses with others. Typical examples include Jupyter Notebooks and Google Colab.
	Open Science Framework: A free, open platform created by the Center for Open Science (COS) that allows researchers to document, collaborate on, register, and share research projects and data.
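
For researchers who keep provenance in a read-me or log file, the record can also be generated by a short script. The following is a minimal sketch in Python, not a standard format: the field names, the file name `survey_raw.csv`, and the helper `provenance_entry` are assumptions made up for illustration. It records a checksum of the file, the step performed, the tool used, and a timestamp.

```python
import hashlib
import json
from datetime import datetime, timezone


def file_checksum(data: bytes) -> str:
    """Return the SHA-256 hex digest of the raw file contents."""
    return hashlib.sha256(data).hexdigest()


def provenance_entry(filename: str, data: bytes, step: str, tool: str) -> dict:
    """Build one record of what was done to a file, with which tool, and when."""
    return {
        "file": filename,
        "sha256": file_checksum(data),
        "step": step,
        "tool": tool,
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }


# Hypothetical example: log the raw file exactly as it was received.
raw = b"id,age\n1,34\n2,29\n"
entry = provenance_entry("survey_raw.csv", raw, "received from collaborator", "email export")

# One JSON object per line is easy to append to a plain-text log.
log_line = json.dumps(entry, sort_keys=True)
print(log_line)
```

Because the checksum changes whenever the file changes, a later reader can check whether the data they hold is the same version the log describes.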

Data cleaning (also known as data cleansing) is the process of transforming raw data into data that is usable for analysis. To make sure your dataset is correct and complete, data cleaning involves steps such as fixing or removing incomplete data, cross-checking data against a validated dataset, and standardizing inconsistent data.

Typical tools for data cleaning include:

OpenRefine
MS Excel
Python
R
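
As an illustration of the cleaning steps described above, here is a minimal sketch in Python using only the standard library. The sample data, the alias table, and the validated set of country names are all hypothetical; a real project would use its own reference data and may prefer a dedicated tool such as OpenRefine.

```python
import csv
import io

# Hypothetical raw data: inconsistent country labels, a missing age,
# and one country that is not in the validated reference set.
RAW_CSV = """id,country,age
1,hong kong,34
2,HK,
3, Hong Kong ,29
4,Atlantis,41
"""

# Hypothetical validated reference set and the variants mapped onto it.
VALID_COUNTRIES = {"Hong Kong"}
COUNTRY_ALIASES = {"hk": "Hong Kong", "hong kong": "Hong Kong"}


def clean_rows(text):
    """Return (kept, rejected) after applying the three cleaning steps."""
    kept, rejected = [], []
    for row in csv.DictReader(io.StringIO(text)):
        # Step 1: remove incomplete records (any empty field).
        if any(not (value or "").strip() for value in row.values()):
            rejected.append((row["id"], "incomplete"))
            continue
        # Step 2: standardize inconsistent spellings of the same value.
        key = row["country"].strip().lower()
        country = COUNTRY_ALIASES.get(key, row["country"].strip())
        # Step 3: cross-check against the validated reference set.
        if country not in VALID_COUNTRIES:
            rejected.append((row["id"], "unknown country"))
            continue
        kept.append({"id": int(row["id"]), "country": country, "age": int(row["age"])})
    return kept, rejected


kept, rejected = clean_rows(RAW_CSV)
print(kept)
print(rejected)
```

Keeping the rejected rows together with the reason for rejection, rather than silently dropping them, also gives you a record of what the cleaning step changed.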

Data analysis is the process of gaining meaningful insights from raw data. You can uncover trends, patterns, and relationships in your datasets using methods such as statistical modeling, data mining, and machine learning algorithms.
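
To give a sense of what a simple statistical model looks like in practice, the sketch below measures the relationship between two variables with a correlation coefficient and fits a least-squares line. It is written in Python with the standard library only, and the study-hours and exam-score figures are invented for illustration.

```python
import statistics

# Hypothetical observations: weekly study hours and exam scores for six students.
hours = [2, 4, 5, 7, 8, 10]
scores = [52, 58, 63, 70, 74, 85]


def pearson_r(x, y):
    """Pearson correlation coefficient, computed from its definition."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def least_squares(x, y):
    """Slope and intercept of the ordinary least-squares line y = slope * x + intercept."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    slope = cov / var_x
    return slope, mean_y - slope * mean_x


r = pearson_r(hours, scores)
slope, intercept = least_squares(hours, scores)
print(f"correlation: {r:.3f}")
print(f"fitted line: score = {slope:.2f} * hours + {intercept:.2f}")
```

A correlation close to 1 indicates a strong positive linear relationship in this sample; it does not by itself show that one variable causes the other.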

Some examples of data analysis tools:

Data visualization helps you discover trends, associations, and patterns that may not be easy to identify otherwise. It is also a way to communicate your analysis to your audience effectively, using tables, charts, graphs, maps, word clouds, and similar formats.

Some examples of data visualization tools: