Data Processing is a critical step for researchers to draw meaningful insights from the collected data. It involves skills like data provenance, cleaning, analysis, and visualization.
The Library has subscribed to DataCamp, an interactive learning platform that allows you to build data skills at your own pace. You can find topics like data cleaning, data visualization, machine learning, data engineering, statistics, and more. Most courses are beginner-friendly!
There are many interactive learning materials for Python, R, SQL, and Power BI.
Data Provenance is a record trail that helps the current research team and future researchers understand the origin, changes, workflow, and processes of data. It plays an important role in scientific research by building trust and credibility in the data and ensuring the reproducibility of data analysis.
We can simply record the provenance in a read-me file, but many researchers now capture provenance trails with the following tools:
Electronic Lab Notebooks (ELNs)
Online Computational Notebooks
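For the simpler read-me approach, a provenance record can also be generated by a short script. The following is a minimal sketch in Python using only the standard library. The function names, the field names, and the choice of JSON as the output format are assumptions made for illustration, not a prescribed standard.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def build_provenance_record(data_path, source, steps):
    """Describe where a data file came from and what was done to it.

    All field names below are illustrative assumptions.
    """
    data = Path(data_path).read_bytes()
    return {
        "file": Path(data_path).name,
        # The checksum identifies the exact version of the file that was used.
        "sha256": hashlib.sha256(data).hexdigest(),
        "size_bytes": len(data),
        "source": source,
        "processing_steps": list(steps),
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }


def write_provenance_record(record, out_path):
    """Save the record as a human-readable JSON file next to the data."""
    Path(out_path).write_text(json.dumps(record, indent=2), encoding="utf-8")
```

Calling `build_provenance_record("survey.csv", "hypothetical survey export", ["removed test rows"])` and passing the result to `write_provenance_record` would leave a small file that can be kept alongside the dataset and updated whenever the data changes.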
Data Cleaning (also known as Data Cleansing) is an important process to transform your raw data into usable data for analysis. To ensure your dataset is correct and complete, the data cleaning process involves fixing or removing incomplete data, cross-checking data against a validated data set, standardizing inconsistent data, and more.
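The cleaning steps described above can be sketched in a few lines of Python. This is an illustrative example using only the standard library. The column names (`name`, `country`, `age`), the alias table, and the validated set of country codes are all hypothetical, and real projects often rely on dedicated tools instead.

```python
# Hypothetical validated reference set to cross-check against.
VALID_COUNTRIES = {"HK", "SG", "JP"}

# Hypothetical mapping from inconsistent spellings to one standard code.
COUNTRY_ALIASES = {
    "hong kong": "HK", "hk": "HK",
    "singapore": "SG", "sg": "SG",
    "japan": "JP", "jp": "JP",
}


def _text(row, key):
    """Return the value for key as a stripped string ('' if missing)."""
    value = row.get(key)
    return "" if value is None else str(value).strip()


def clean_rows(rows):
    """Split rows into cleaned records and rejected (row, reason) pairs."""
    cleaned, rejected = [], []
    for row in rows:
        name = _text(row, "name")
        country_raw = _text(row, "country").lower()
        age_raw = _text(row, "age")

        # Step 1: remove incomplete records.
        if not name or not country_raw or not age_raw:
            rejected.append((row, "incomplete"))
            continue

        # Step 2: standardize inconsistent values, then cross-check
        # them against the validated reference set.
        country = COUNTRY_ALIASES.get(country_raw)
        if country not in VALID_COUNTRIES:
            rejected.append((row, "unknown country"))
            continue

        # Step 3: make sure numeric fields really are numeric.
        try:
            age = int(age_raw)
        except ValueError:
            rejected.append((row, "invalid age"))
            continue

        cleaned.append({"name": name, "country": country, "age": age})
    return cleaned, rejected
```

Keeping the rejected rows together with a reason, rather than silently dropping them, makes it possible to review what was removed and to document the cleaning decisions.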
Typical tools for data cleaning include:
Data Analysis is the process of gaining meaningful insights from raw data. You may find interesting trends, patterns, and relationships in your datasets by applying different methods, such as statistical modeling, data mining, and machine learning algorithms.
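As a small illustration of finding a relationship between two variables, the sketch below computes a Pearson correlation coefficient and a least-squares trend line in plain Python. The function names are assumptions chosen for this example, and any data passed in would be your own.

```python
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    slope = cov / var_x
    return slope, my - slope * mx
```

A coefficient close to 1 or -1 suggests a strong linear relationship, while a value near 0 suggests little or none. Neither result shows causation on its own.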
Some examples of data analysis tools:
Data Visualization helps you discover trends, associations, and patterns that may not be easily identified otherwise. It is also a way to communicate your analysis to your audience effectively by using tables, charts, graphs, maps, word clouds, and so on.
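Even a very simple chart can reveal a pattern that is hard to see in a raw list of values. The sketch below counts how often each value occurs and draws a text-only bar chart. It uses only the Python standard library, and the function name and the `width` parameter are assumptions made for this example. Dedicated visualization tools would normally produce a graphical chart instead.

```python
from collections import Counter


def text_bar_chart(values, width=30):
    """Return a text bar chart of how often each value occurs."""
    counts = Counter(values)
    if not counts:
        return ""
    largest = max(counts.values())
    lines = []
    # most_common() lists the most frequent values first.
    for label, count in counts.most_common():
        # Scale each bar relative to the most frequent value.
        bar = "#" * max(1, round(count / largest * width))
        lines.append(f"{str(label):<12} {bar} {count}")
    return "\n".join(lines)
```

Printing the result of `text_bar_chart(words)` for a list of words gives a rough frequency view similar in spirit to a word cloud, with the most frequent items at the top.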
Some examples of data visualization tools: