data wrangling

Data wrangling Sometimes we have to do dirty jobs Michele Mauri DensityDesign Research Lab

Upload: densitydesign

Post on 07-Mar-2016


DESCRIPTION

Our data cleaning toolkit

TRANSCRIPT

Page 1: Data Wrangling

Data wrangling Sometimes we have to do dirty jobs

Michele Mauri DensityDesign Research Lab

Page 2: Data Wrangling

Data is often messy and needs to be cleaned, or at least converted

Page 3: Data Wrangling
Page 4: Data Wrangling
Page 5: Data Wrangling
Page 6: Data Wrangling

My data cleaning toolkit

Page 7: Data Wrangling

1. TextWrangler * ** http://www.barebones.com/products/textwrangler/

* (Notepad++ for Windows)
** (actually, any advanced text editor)

Page 8: Data Wrangling

1. TextWrangler

useful to:
- remove text formatting
- clean hidden characters
- replace separator characters
- structure data
- apply regexp
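
The same kind of cleanup can also be scripted. Below is a minimal Python sketch of three of the jobs listed above: dropping hidden characters, tidying whitespace with a regexp, and swapping the separator. The function name `clean_line` and the default separators are assumptions made for illustration, not part of the original toolkit. The plain `split` also assumes the separator never appears inside a quoted field.

```python
import re
import unicodedata

def clean_line(line, old_sep=";", new_sep="\t"):
    """Clean one line of delimited text (hypothetical helper)."""
    # Fold compatibility characters, e.g. non-breaking space -> plain space
    line = unicodedata.normalize("NFKC", line)
    # Drop hidden characters: control (Cc) and format (Cf) code points,
    # such as carriage returns, byte-order marks and zero-width spaces
    line = "".join(
        ch for ch in line
        if ch == "\t" or unicodedata.category(ch) not in ("Cc", "Cf")
    )
    # Regexp: collapse runs of two or more spaces into one
    line = re.sub(r" {2,}", " ", line)
    # Replace the separator, trimming stray spaces around each field
    fields = [field.strip() for field in line.split(old_sep)]
    return new_sep.join(fields)

cleaned = clean_line("\ufeffname ;  city\u00a0;age\u200b\r")
```

In a text editor the equivalent is a handful of find-and-replace passes. A script is worth the effort mainly when the same cleanup has to be repeated on many files.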

Page 9: Data Wrangling

2. OpenRefine http://openrefine.org/

Page 10: Data Wrangling

2. OpenRefine

useful to:
- convert formats
- reconcile data
- structure data
- enrich (link) data with Freebase
- apply GREL functions
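
As an illustration of the "convert formats" item, here is a minimal Python sketch that turns CSV text into JSON using only the standard library. It does not use OpenRefine itself. The function name `csv_to_json` is an assumption made for this example, and the sketch assumes the first row holds the column names.

```python
import csv
import io
import json

def csv_to_json(csv_text, delimiter=","):
    """Convert CSV text with a header row into a JSON array of objects."""
    # DictReader uses the first row as keys for every following row
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=delimiter)
    rows = [dict(row) for row in reader]
    # ensure_ascii=False keeps accented characters readable in the output
    return json.dumps(rows, ensure_ascii=False, indent=2)

json_text = csv_to_json("name,city\nAda,Milan\nLuca,Torino\n")
```

All values stay strings here. Converting numbers or dates is a separate step that depends on the dataset.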

Page 11: Data Wrangling

3. Data Wrangler http://vis.stanford.edu/wrangler/

Page 12: Data Wrangling

3. Data Wrangler

useful to:
- reformat data values
- correct erroneous or missing values
- (re)structure dataset
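
Two of these operations can be sketched in a few lines of Python: filling missing values from the row above, and restructuring a wide table into a long one. The helpers `fill_down` and `fold` are hypothetical names chosen for this example. They are not Data Wrangler's own code, and the sample rows are made up.

```python
def fill_down(rows, column):
    """Replace empty values in `column` with the last non-empty value above."""
    last = None
    filled = []
    for row in rows:
        row = dict(row)  # copy so the input rows are not mutated
        if row.get(column) in (None, ""):
            row[column] = last
        else:
            last = row[column]
        filled.append(row)
    return filled

def fold(rows, id_column, value_columns):
    """Turn wide rows (one column per key) into long rows (one row per value)."""
    long_rows = []
    for row in rows:
        for col in value_columns:
            long_rows.append(
                {id_column: row[id_column], "key": col, "value": row[col]}
            )
    return long_rows

# Made-up sample: the second row is missing its country label
wide = [
    {"country": "Italy", "2010": "1", "2011": "2"},
    {"country": "", "2010": "3", "2011": "4"},
]
long_table = fold(fill_down(wide, "country"), "country", ["2010", "2011"])
```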

Page 13: Data Wrangling

4. Excel http://office.microsoft.com/en-us/excel/

Page 14: Data Wrangling

4. Excel

useful to:
- use formulas
- rearrange & filter
- pivot tables
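
For readers who have not used pivot tables, the idea can be sketched outside Excel: group rows by one field, spread a second field across columns, and sum a third. The Python below is a minimal illustration. `pivot_sum` and the sample data are assumptions made for this example.

```python
from collections import defaultdict

def pivot_sum(rows, row_key, col_key, value_key):
    """Sum `value_key` for every (row_key, col_key) pair, like a basic pivot table."""
    table = defaultdict(lambda: defaultdict(float))
    for row in rows:
        table[row[row_key]][row[col_key]] += float(row[value_key])
    # Convert nested defaultdicts to plain dicts for a predictable result
    return {r: dict(cols) for r, cols in table.items()}

# Made-up sample data
visits = [
    {"city": "Milan", "year": "2015", "visits": "10"},
    {"city": "Milan", "year": "2015", "visits": "5"},
    {"city": "Rome", "year": "2016", "visits": "7"},
]
summary = pivot_sum(visits, "city", "year", "visits")
```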

Page 15: Data Wrangling

5. Code (Processing, JavaScript…)

Page 16: Data Wrangling

5. Code

useful to:
- do everything