Data
Dr. Michele C. Weigle
CS 725/825 Information Visualization, Spring 2017
http://www.cs.odu.edu/~mweigle/CS725-S17/
Note
• We will not cover these slides in class, but they are required reading for the week.
• There are a few supplemental images for the textbook reading and a few slides on data formats, data cleaning, and data sources.
CS 725/825 - Spring 2016 - Weigle
Data
Ward, Grinstein, Keim, Fig 1.37
Tables
[Figure: a table of n items by m attributes; each cell holds a value. Ward, Grinstein, Keim, Fig 1.37]
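A minimal sketch of the table abstraction in Python, assuming a list of dictionaries is an acceptable stand-in: each dictionary is one item, each key is one attribute, and each entry is one value. The rows reuse the weather sample from the data formats slide later in this deck; the helper name `cell` is hypothetical.

```python
# One dict per item (row); keys are attribute names, entries are values.
table = [
    {"date": "20090101", "max_temp": 26},
    {"date": "20090102", "max_temp": 34},
    {"date": "20090103", "max_temp": 27},
]

n_items = len(table)               # n: number of items
attributes = list(table[0].keys())
m_attributes = len(attributes)     # m: number of attributes


def cell(rows, item_index, attribute):
    """Return the single value stored for one item and one attribute."""
    return rows[item_index][attribute]
```

For example, `cell(table, 1, "max_temp")` looks up the max_temp value of the second item.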
Networks and Trees
http://en.wikipedia.org/wiki/File:Wikipedia_multilingual_network_graph_July_2013.svg
http://commons.wikimedia.org/wiki/File:Binary_tree.svg
Field Dataset
http://en.wikipedia.org/wiki/File:PET-image.jpg
Geometry
http://en.wikipedia.org/wiki/File:Cntr-map-1.jpg
Multidimensional Table
http://en.wikipedia.org/wiki/Table_(information)
Temporal Semantics
http://commons.wikimedia.org/wiki/File:Evidence_of_global_warming_-_time_series_of_seasonal_(red_dots)_and_annual_average_(black_line)_of_global_upper_ocean_heat_content_for_the_0-700m_layer_between_1955-2008.gif
Data Formats
• Delimited text
  • tab delimited
  • comma delimited (CSV)
• Extensible Markup Language (XML)
  • looks a bit like HTML
  • user-defined tags to identify data
• JavaScript Object Notation (JSON)
  • collection of name/value pairs
  • smaller than XML
  • easier to parse
Example of Data Formats
CSV
  20090101,26
  20090102,34
  20090103,27
XML
  <weather_data>
    <observation>
      <date>20090101</date>
      <max_temp>26</max_temp>
    </observation>
    <observation>
      <date>20090102</date>
      <max_temp>34</max_temp>
    </observation>
    <observation>
      <date>20090103</date>
      <max_temp>27</max_temp>
    </observation>
  </weather_data>
JSON
  {"observations": [
    {"date": "20090101", "max_temp": 26},
    {"date": "20090102", "max_temp": 34},
    {"date": "20090103", "max_temp": 27}
  ]}
Yau, Visualize This, Ch 2
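To make the comparison concrete, here is a sketch that reads the three samples above with Python's standard library and shows that they carry the same records. The function names and the choice to convert max_temp to an integer are assumptions made for illustration; the CSV sample has no header row, so the attribute names are supplied by hand.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

CSV_TEXT = "20090101,26\n20090102,34\n20090103,27\n"

XML_TEXT = """<weather_data>
  <observation><date>20090101</date><max_temp>26</max_temp></observation>
  <observation><date>20090102</date><max_temp>34</max_temp></observation>
  <observation><date>20090103</date><max_temp>27</max_temp></observation>
</weather_data>"""

JSON_TEXT = (
    '{"observations": [{"date": "20090101", "max_temp": 26},'
    ' {"date": "20090102", "max_temp": 34},'
    ' {"date": "20090103", "max_temp": 27}]}'
)


def from_csv(text):
    # CSV carries no names or types, so both are supplied here.
    return [{"date": d, "max_temp": int(t)}
            for d, t in csv.reader(io.StringIO(text))]


def from_xml(text):
    # XML names every value with a tag, but all content is still text.
    root = ET.fromstring(text)
    return [{"date": obs.findtext("date"),
             "max_temp": int(obs.findtext("max_temp"))}
            for obs in root.findall("observation")]


def from_json(text):
    # JSON carries both names and basic types (string vs. number).
    return json.loads(text)["observations"]
```

All three functions return the same list of three records, which is one way to see why JSON is described as easier to parse: it needs the least extra code.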
How to convert between data formats?
• Write a program to convert from one format to another
  • awk (my favorite, but I'm old school), Python, Perl, PHP, ...
• Other tools
  • search Google for "csv to json", "csv to xml", "xml to json"
• Mr. Data Converter
  • http://shancarter.github.io/mr-data-converter/
  • developed by a graphics editor at The New York Times
  • input: CSV or tab-delimited data
  • output: HTML table, JSON, MySQL, Python, PHP, Ruby, XML, ...
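As one possible version of "write a program", the sketch below converts headerless CSV text to JSON and to XML using only Python's standard library. The function names, the default tag names, and the "observations" key are assumptions borrowed from the weather example, not a fixed convention.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET


def csv_to_records(csv_text, fieldnames):
    """Read headerless CSV text into a list of dicts (all values stay strings)."""
    reader = csv.DictReader(io.StringIO(csv_text), fieldnames=fieldnames)
    return [dict(row) for row in reader]


def records_to_json(records, key="observations"):
    """Wrap the records in one object, as in the JSON sample."""
    return json.dumps({key: records})


def records_to_xml(records, root_tag="weather_data", item_tag="observation"):
    """Emit one child element per record and one tag per attribute."""
    root = ET.Element(root_tag)
    for record in records:
        item = ET.SubElement(root, item_tag)
        for name, value in record.items():
            ET.SubElement(item, name).text = str(value)
    return ET.tostring(root, encoding="unicode")
```

Usage: `records_to_json(csv_to_records(text, ["date", "max_temp"]))`. Note that every value comes out as a string, because CSV has no types; converting max_temp to a number would be an extra step.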
Data in the Real World
• Data can be missing, have typos, be inconsistent, be spread over multiple tables, etc.
• Two big issues:
  • format
  • accuracy
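A rough sketch of how the format problems could be surfaced before plotting, assuming the data is already loaded as a list of dictionaries. The two helper names and the sample rows are invented for illustration and are not taken from any real dataset.

```python
def find_missing(rows, attributes):
    """Map each attribute to the row indexes where its value is absent or blank."""
    missing = {a: [] for a in attributes}
    for i, row in enumerate(rows):
        for a in attributes:
            value = row.get(a)
            if value is None or str(value).strip() == "":
                missing[a].append(i)
    return missing


def find_inconsistent(rows, attribute):
    """Group spellings that differ only in case or whitespace."""
    variants = {}
    for row in rows:
        value = row.get(attribute)
        if value is None or str(value).strip() == "":
            continue
        key = " ".join(str(value).lower().split())
        variants.setdefault(key, set()).add(value)
    return {k: sorted(v) for k, v in variants.items() if len(v) > 1}


# Hypothetical rows, made up for this example only.
rows = [
    {"country": "United States", "type": "Flood", "killed": "12"},
    {"country": "united states ", "type": "flood", "killed": ""},
    {"country": "Japan", "type": "Earthquake", "killed": None},
]
```

Here `find_missing(rows, ["country", "type", "killed"])` reports rows 1 and 2 for killed, and `find_inconsistent(rows, "country")` groups the two spellings of the same country. Checks like these find format problems only; they say nothing about accuracy.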
World Disasters - Inconsistent
http://www.infochimps.com/datasets/disasters-worldwide-from-1900-2008
World Disasters - Missing
http://www.infochimps.com/datasets/disasters-worldwide-from-1900-2008
What to do with dirty data?
Data Cleaning Tools
Quick Tools
• Data Science Toolkit
  • http://www.datasciencetoolkit.org/
  • lots of quick conversion tools
• Mr. People
  • http://people.ericson.net/
  • formats lists of names
• Mr. Data Converter
  • http://shancarter.github.io/mr-data-converter/
Full Apps
• Trifacta Wrangler
  • https://www.trifacta.com/products/wrangler/
  • video: https://vimeo.com/175859872
• Open Refine (was Google Refine)
  • http://openrefine.org/
  • video: http://www.youtube.com/watch?v=B70J_H_zAWM
  • more info: http://www.propublica.org/nerds/item/using-google-refine-for-data-cleaning
What about accuracy?
• Nathan Yau (Visualize This, Data Points) was an intern at The New York Times
• One day, his entire goal was to verify 3 numbers in a dataset
• You must have accurate data before you can trust the visualization
Yau, Visualize This, Ch 1
Recall the marriage rate chart
Graphing the raw data
New&Hampshire?!?
Let's look at the Excel file
http://www2.census.gov/library/publications/2011/compendia/statab/131ed/tables/12s0133.xls
Now, let's look at the PDF
http://www2.census.gov/library/publications/2011/compendia/statab/131ed/tables/vitstat.pdf
But wait, there's more funny stuff
[Figure panels: PDF, Excel]
Bottom Line
• If you see something weird in your graph that you can't explain, go back and double-check your data
• Even if you didn't make an error, maybe someone else did
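One way to get a shortlist of values worth double-checking is a simple outlier flag. The sketch below uses the median absolute deviation, which is a technique chosen here for illustration and not something these slides prescribe; the function name, the threshold of 5, and the sample numbers are all hypothetical. A flagged value is not necessarily wrong, it is just a prompt to go back to the source.

```python
from statistics import median


def flag_suspects(values_by_label, threshold=5.0):
    """Return labels whose value lies more than `threshold` MADs from the median."""
    values = list(values_by_label.values())
    center = median(values)
    # Median absolute deviation: a spread measure that one bad value cannot inflate.
    mad = median(abs(v - center) for v in values)
    if mad == 0:
        return []  # no spread to compare against
    return sorted(label for label, v in values_by_label.items()
                  if abs(v - center) / mad > threshold)


# Hypothetical values, made up for this example only.
rates = {"A": 68, "B": 71, "C": 74, "D": 65, "E": 700}
suspects = flag_suspects(rates)
```

With these made-up numbers only "E" is flagged. The next step is the one the slide recommends: open the original file and check whether the value is real or an error.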
Data Sources
• Some notable ones
  • Data.gov
  • Google Public Data Explorer
    • http://www.google.com/publicdata/directory
  • Census Bureau
    • Census data visualization gallery - http://www.census.gov/dataviz/
  • Federal Reserve
    • FRASER - http://fraser.stlouisfed.org/
    • FRED - http://research.stlouisfed.org/fred2/
Even more on the Links page