data wrangling
DESCRIPTION
Our data cleaning toolkit
TRANSCRIPT
Data wrangling: sometimes we have to do dirty jobs
Michele Mauri DensityDesign Research Lab
Data is often messy and needs to be cleaned, or at least converted
My data cleaning toolkit
1. TextWrangler http://www.barebones.com/products/textwrangler/
(Notepad++ on Windows; actually, any advanced text editor)
1. TextWrangler
useful to:
- remove text formatting
- clean hidden characters
- replace separator characters
- structure data
- apply regular expressions
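The text-editor steps above can also be scripted. What follows is a minimal sketch, not part of the original toolkit: the function name `clean_line`, the list of hidden characters, and the choice of `;` and tab as separators are all assumptions made for illustration. It also splits naively on the separator, so it does not handle quoted fields that contain the separator themselves.

```python
import re
import unicodedata

# Characters that are usually invisible in an editor (assumed list):
# zero-width space and joiners, byte order mark, soft hyphen.
HIDDEN_CHARS = re.compile(r"[\u200b\u200c\u200d\ufeff\u00ad]")
# Runs of two or more spaces, collapsed to one below.
MULTI_SPACE = re.compile(r" {2,}")


def clean_line(line: str, old_sep: str = ";", new_sep: str = "\t") -> str:
    """Strip hidden characters, tidy whitespace and swap the field separator."""
    line = unicodedata.normalize("NFC", line)   # one canonical form per character
    line = HIDDEN_CHARS.sub("", line)           # drop invisible characters
    line = line.replace("\u00a0", " ")          # non-breaking space -> plain space
    # Naive split: fields must not contain the separator themselves.
    fields = [MULTI_SPACE.sub(" ", field.strip()) for field in line.split(old_sep)]
    return new_sep.join(fields)


if __name__ == "__main__":
    dirty = "\ufeffname ; city\u00a0name;\u200b42"
    print(repr(clean_line(dirty)))
```

The same regular expressions can be pasted into the find-and-replace dialog of most advanced text editors, as long as the editor supports Unicode escapes.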
2. OpenRefine http://openrefine.org/
2. OpenRefine
useful to:
- convert formats
- reconcile data
- structure data
- enrich (link) data with Freebase
- apply GREL functions
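As an illustration of the "convert formats" item only, here is a small sketch of a CSV-to-JSON conversion using the Python standard library. It is not how OpenRefine works internally; the function name `csv_to_json` and the sample rows are hypothetical.

```python
import csv
import io
import json


def csv_to_json(csv_text: str, delimiter: str = ",") -> str:
    """Convert CSV text (first row = header) into a JSON array of objects."""
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=delimiter)
    rows = [dict(row) for row in reader]
    # All values stay strings; type conversion is a separate cleaning step.
    return json.dumps(rows, ensure_ascii=False, indent=2)


if __name__ == "__main__":
    sample = "city,population\nMilan,1350000\nTurin,870000\n"  # made-up sample data
    print(csv_to_json(sample))
```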
3. Data Wrangler http://vis.stanford.edu/wrangler/
3. Data Wrangler
useful to:
- reformat data values
- correct erroneous or missing values
- (re)structure dataset
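Two of the operations listed above, filling in missing values and restructuring a dataset, can be sketched in plain Python. This is an assumption-laden illustration, not Data Wrangler's own transform language: `fill_down` and `unpivot` are hypothetical helper names, and the sample table is invented.

```python
def fill_down(rows, column):
    """Copy the last non-empty value of `column` into rows where it is blank."""
    filled, last = [], None
    for row in rows:
        value = row.get(column)
        if value in (None, ""):
            row = {**row, column: last}   # new dict, input row is left untouched
        else:
            last = value
        filled.append(row)
    return filled


def unpivot(rows, id_column, value_columns, var_name="variable", value_name="value"):
    """Turn a wide table (one column per variable) into a long one (one row per value)."""
    long_rows = []
    for row in rows:
        for col in value_columns:
            long_rows.append(
                {id_column: row[id_column], var_name: col, value_name: row[col]}
            )
    return long_rows


if __name__ == "__main__":
    # Invented sample: the country name is only written on the first row of each group.
    table = [
        {"country": "Italy", "2010": "5", "2011": "7"},
        {"country": "", "2010": "1", "2011": "2"},
    ]
    for record in unpivot(fill_down(table, "country"), "country", ["2010", "2011"], var_name="year"):
        print(record)
```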
4. Excel http://office.microsoft.com/en-us/excel/
4. Excel
useful to:
- use formulas
- rearrange & filter
- pivot tables
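For readers who have not used pivot tables, the idea can be sketched in a few lines of Python: group rows by one field, spread a second field across columns, and sum a third. The helper name `pivot_sum` and the sales figures are hypothetical; a spreadsheet does the same thing interactively.

```python
from collections import defaultdict


def pivot_sum(rows, index, columns, values):
    """Sum `values` for every (index, columns) pair, like a basic pivot table."""
    table = defaultdict(lambda: defaultdict(float))
    for row in rows:
        table[row[index]][row[columns]] += float(row[values])
    # Convert nested defaultdicts to plain dicts for easier printing and comparison.
    return {row_key: dict(cols) for row_key, cols in table.items()}


if __name__ == "__main__":
    # Invented sample data.
    sales = [
        {"region": "North", "year": "2012", "sales": "10"},
        {"region": "North", "year": "2012", "sales": "20"},
        {"region": "North", "year": "2013", "sales": "5"},
        {"region": "South", "year": "2012", "sales": "7"},
    ]
    print(pivot_sum(sales, "region", "year", "sales"))
```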
5. Code (Processing, JavaScript…)
5. Code
useful to:
- do everything