PolyU Library

Research Data Management

Sharing good practices for Research Data Management

Data Processing 


Data Processing is a critical step for researchers to draw meaningful insights from the collected data. It involves skills like data provenance, cleaning, analysis, and visualization.

The Library has subscribed to DataCamp, an interactive learning platform that allows you to build data skills at your own pace. You can find topics like data cleaning, data visualization, machine learning, data engineering, statistics, and more. Most courses are beginner-friendly!

There are many interactive learning materials for Python, R, SQL, and Power BI.

Data Provenance is a record trail that helps the current research team, and other researchers in the future, understand the origin, changes, workflow, and processes of the data. It plays an important role in scientific research by building trust and credibility in the data and ensuring that the data analysis is reproducible.

Provenance can be recorded simply in a read-me file, but many researchers now capture provenance trails with the following tools:

Electronic Lab Notebooks (ELNs)
Software that replaces traditional paper lab notebooks. ELNs document the research design, experiments, and procedures performed in a laboratory, and they support access control and collaboration. Examples of ELNs here.

Online Computational Notebooks
An interactive computing environment for writing and running code, documenting methods, and sharing analysis with others. Typical examples include Jupyter Notebooks and Google Colab.

Open Science Framework
A free, open platform created by the Center for Open Science (COS) that allows researchers to document, collaborate on, register, and share research projects and data.
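
For the simple read-me approach mentioned above, a provenance entry can also be generated by a short script. The following is a minimal sketch in Python, not a standard or a method prescribed by any of the tools listed; the function names, the file name survey_raw.csv, the author name, and the fields recorded are all hypothetical and chosen only for illustration.

```python
import hashlib
from datetime import datetime, timezone


def sha256_of(data: bytes) -> str:
    """Return a checksum so later readers can verify the file is unchanged."""
    return hashlib.sha256(data).hexdigest()


def provenance_entry(filename: str, data: bytes, step: str, author: str) -> str:
    """Build one plain-text provenance entry suitable for a read-me file."""
    timestamp = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")
    return "\n".join([
        f"File: {filename}",
        f"SHA-256: {sha256_of(data)}",
        f"Step: {step}",
        f"Recorded by: {author}",
        f"Recorded at: {timestamp}",
    ])


# Hypothetical file content and metadata, invented for this example.
entry = provenance_entry(
    "survey_raw.csv",
    b"id,age\n1,34\n",
    "Downloaded raw export from the survey platform",
    "A. Researcher",
)
print(entry)
```

Appending one such entry to the read-me each time a file is created or transformed gives a basic trail of origin and changes.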

Data Cleaning (also known as Data Cleansing) is an important process to transform your raw data into usable data for analysis. To ensure your dataset is correct and complete, the data cleaning process involves fixing or removing incomplete data, cross-checking data against a validated data set, standardizing inconsistent data, and more. 
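
The cleaning steps described above (removing incomplete data, standardizing inconsistent values, and cross-checking against a validated data set) can be sketched in a few lines of Python. This is an illustrative sketch only: the records, field names, alias table, and validated list below are invented, and a real project would likely use a dedicated tool or library.

```python
# Hypothetical raw records; field names and values are invented for illustration.
raw_records = [
    {"id": "001", "country": " hong kong ", "age": "34"},
    {"id": "002", "country": "HK", "age": ""},           # incomplete: age missing
    {"id": "003", "country": "Hong Kong", "age": "29"},
    {"id": "004", "country": "Atlantis", "age": "41"},   # not in the validated list
]

# Assumed validated reference set and a mapping of known spelling variants.
VALID_COUNTRIES = {"Hong Kong", "Japan"}
COUNTRY_ALIASES = {"hk": "Hong Kong", "hong kong": "Hong Kong"}


def standardize_country(value):
    """Trim whitespace and map known variants to one canonical spelling."""
    key = value.strip().lower()
    return COUNTRY_ALIASES.get(key, value.strip())


def clean(records):
    """Return (cleaned, rejected); each rejected item is (id, reason)."""
    cleaned, rejected = [], []
    for row in records:
        # Step 1: remove incomplete records.
        if not row["age"].strip():
            rejected.append((row["id"], "missing age"))
            continue
        # Step 2: standardize inconsistent values.
        country = standardize_country(row["country"])
        # Step 3: cross-check against the validated data set.
        if country not in VALID_COUNTRIES:
            rejected.append((row["id"], "unknown country"))
            continue
        cleaned.append({"id": row["id"], "country": country, "age": int(row["age"])})
    return cleaned, rejected


cleaned, rejected = clean(raw_records)
print(cleaned)
print(rejected)
```

Keeping the rejected rows together with a reason, rather than silently dropping them, also supports the provenance trail for the dataset.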

Typical tools for data cleaning include: 

Data Analysis is the process of gaining meaningful insights from raw data. You may find interesting trends, patterns, and relationships in the datasets using different methods, such as statistical modeling, data mining, and machine learning algorithms.
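
As a small illustration of looking for a relationship in a dataset, the sketch below computes summary statistics and a Pearson correlation coefficient using only the Python standard library. The two variables and their values are hypothetical, and pearson_r is a helper written for this example rather than a function from any particular tool.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical observations, invented for illustration only.
study_hours = [1, 2, 3, 4, 5]
test_scores = [52, 58, 61, 70, 74]


def pearson_r(xs, ys):
    """Pearson correlation: covariance scaled by both variables' spread."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


r = pearson_r(study_hours, test_scores)
print(f"mean score: {mean(test_scores)}, std dev: {stdev(test_scores):.2f}")
print(f"correlation between hours and scores: {r:.3f}")
```

A coefficient close to 1 or -1 suggests a strong linear relationship, while a value near 0 suggests little linear association; it does not by itself show causation.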

Some examples of data analysis tools:

Data Visualization helps discover trends, associations, and patterns that may not be easily identified otherwise. It is also a way to communicate your analysis to your audience effectively by employing tables, charts, graphs, maps, word clouds, etc.

Some examples of data visualization tools: