PolyU Library

Online Tools for Assignment

This guide introduces useful online tools that may help you prepare your assignments.

OpenRefine


Previously known as Google Refine, OpenRefine is an open-source application for data cleaning and data transformation, which is useful preparation for data analysis at a later stage. It uses your browser as its interface but keeps your data private on your own device unless you choose to share it with others.

Before we can analyze data, we often need to clean it. We use this Excel file to demonstrate the powerful data cleaning functions in OpenRefine. The "Region" values in column F of the spreadsheet are messy, with inconsistent spelling. Cleaning them manually in Excel would be very time-consuming.

Now let's clean it using OpenRefine with the steps below:

  1. Download and install OpenRefine.
  2. Click on Choose Files to browse to the Excel file we downloaded, and then click on Next >>.
  3. Click on Create Project >>.
  4. Open the drop-down menu of the Region column and select Facet > Text facet.
  5. Click on Cluster in the Region facet panel.
  6. Select an appropriate algorithm from the Keying Function list to re-group your data.
  7. Provide a new value for each group after re-grouping.
  8. Click on Select All to apply changes to all of these groups.
  9. Click on Merge Selected & Close to execute the changes.

     The data is now clean.
  10. We can then export the cleaned data to Excel or other formats by clicking on Export.
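
To give a feel for what the Cluster step does behind the scenes, here is a minimal Python sketch of a fingerprint-style keying function. It is an approximation written for illustration, not OpenRefine's actual code: the function names `fingerprint`, `cluster`, and `merge` are hypothetical, and the sample region values are made up rather than taken from the Excel file above. The idea is that values which reduce to the same key are grouped, and each group is then replaced with one chosen spelling (here, the most frequent one).

```python
import string
import unicodedata
from collections import Counter, defaultdict


def fingerprint(value: str) -> str:
    """Build a key that ignores case, accents, punctuation, spacing and word order."""
    # Decompose accented characters and drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", value)
    unaccented = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Trim, lowercase, and remove ASCII punctuation.
    no_punct = str.maketrans("", "", string.punctuation)
    cleaned = unaccented.strip().lower().translate(no_punct)
    # Split on whitespace, drop duplicate tokens, sort, and re-join.
    return " ".join(sorted(set(cleaned.split())))


def cluster(values):
    """Group values by key; keep only groups with more than one distinct spelling."""
    groups = defaultdict(Counter)
    for value in values:
        groups[fingerprint(value)][value] += 1
    return {key: counts for key, counts in groups.items() if len(counts) > 1}


def merge(values, clusters):
    """Replace each clustered value with the most frequent spelling in its group."""
    replacement = {}
    for counts in clusters.values():
        canonical = counts.most_common(1)[0][0]
        for variant in counts:
            replacement[variant] = canonical
    return [replacement.get(value, value) for value in values]


# Hypothetical sample values, NOT taken from the guide's Excel file.
regions = ["Hong Kong", "hong kong", "Hong  Kong.", "Kong, Hong",
           "Hong Kong", "Macau", "Macao"]

clusters = cluster(regions)        # similar to the groups listed in the Cluster dialog
cleaned = merge(regions, clusters)  # similar to Merge Selected & Close
print(Counter(cleaned))             # similar to the counts shown in a text facet
```

In this sketch, "Macau" and "Macao" stay separate because their keys differ. A keying function only catches differences in case, punctuation, spacing, and word order, which is why it is worth trying more than one algorithm in step 6.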

 

For more data cleaning functions, please refer to the official documentation.


Except where otherwise noted, the content of this guide is licensed under a CC BY-NC 4.0 License.