Hannah Aizenman - Get To Know Your Data
DESCRIPTION
A recent article in the New York Times estimates that data scientists spend somewhere between 50% and 80% of their time "collecting and preparing unruly digital data" before they ever get to the analysis. Data is often badly labeled, inconsistently sampled, incorrect in strange places, missing, or otherwise riddled with errors, leading to the "garbage in, garbage out" problem. While detecting the myriad ways in which the data is broken can be difficult, traditional visualization techniques, exploratory data analytics, and cluster analysis can help. This talk will discuss some of the typical methods for sanity checking small data sets: visualization, simple statistics, and some basic combinations of the two. It will then veer into some machine learning techniques for exploring the underlying structure of larger data sets, both to verify that known patterns occur and to detect outliers that could be due to errors rather than to the occurrence of something interesting.
TRANSCRIPT
![Page 1: Hannah Aizenman - Get To Know Your Data](https://reader034.vdocument.in/reader034/viewer/2022042817/559c1cb11a28ab14158b4674/html5/thumbnails/1.jpg)
Get To Know Your Data
Hannah Aizenman @story645
![Page 3: Hannah Aizenman - Get To Know Your Data](https://reader034.vdocument.in/reader034/viewer/2022042817/559c1cb11a28ab14158b4674/html5/thumbnails/3.jpg)
Unprocessed Data
![Page 4: Hannah Aizenman - Get To Know Your Data](https://reader034.vdocument.in/reader034/viewer/2022042817/559c1cb11a28ab14158b4674/html5/thumbnails/4.jpg)
Missing Observations
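The slide content is not reproduced in this transcript, so the following is only a minimal sketch of one way to count missing observations per field. The record layout, the `count_missing` helper, and the set of missing-value markers are all assumptions made for illustration, not taken from the talk.

```python
import math

def count_missing(rows, missing_markers=("", "NA", "NaN", None)):
    """Count missing entries per field in a list of dict records.

    The marker set is an assumption; real data sets use many conventions.
    """
    counts = {}
    for row in rows:
        for key, value in row.items():
            is_missing = value in missing_markers or (
                isinstance(value, float) and math.isnan(value)
            )
            counts[key] = counts.get(key, 0) + int(is_missing)
    return counts

# Hypothetical records, invented for this example.
records = [
    {"station": "A", "temp": 21.5},
    {"station": "B", "temp": float("nan")},
    {"station": "", "temp": None},
]
missing = count_missing(records)
print(missing)
```

Counting per field rather than per record makes it easier to see whether one attribute is systematically absent.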
![Page 5: Hannah Aizenman - Get To Know Your Data](https://reader034.vdocument.in/reader034/viewer/2022042817/559c1cb11a28ab14158b4674/html5/thumbnails/5.jpg)
Misused Technique
![Page 6: Hannah Aizenman - Get To Know Your Data](https://reader034.vdocument.in/reader034/viewer/2022042817/559c1cb11a28ab14158b4674/html5/thumbnails/6.jpg)
Start?
![Page 7: Hannah Aizenman - Get To Know Your Data](https://reader034.vdocument.in/reader034/viewer/2022042817/559c1cb11a28ab14158b4674/html5/thumbnails/7.jpg)
Research
![Page 8: Hannah Aizenman - Get To Know Your Data](https://reader034.vdocument.in/reader034/viewer/2022042817/559c1cb11a28ab14158b4674/html5/thumbnails/8.jpg)
Explore Attributes
![Page 9: Hannah Aizenman - Get To Know Your Data](https://reader034.vdocument.in/reader034/viewer/2022042817/559c1cb11a28ab14158b4674/html5/thumbnails/9.jpg)
Take Snapshots
![Page 10: Hannah Aizenman - Get To Know Your Data](https://reader034.vdocument.in/reader034/viewer/2022042817/559c1cb11a28ab14158b4674/html5/thumbnails/10.jpg)
Plot
![Page 11: Hannah Aizenman - Get To Know Your Data](https://reader034.vdocument.in/reader034/viewer/2022042817/559c1cb11a28ab14158b4674/html5/thumbnails/11.jpg)
Label
![Page 12: Hannah Aizenman - Get To Know Your Data](https://reader034.vdocument.in/reader034/viewer/2022042817/559c1cb11a28ab14158b4674/html5/thumbnails/12.jpg)
Rearrange
![Page 13: Hannah Aizenman - Get To Know Your Data](https://reader034.vdocument.in/reader034/viewer/2022042817/559c1cb11a28ab14158b4674/html5/thumbnails/13.jpg)
Higher D Data: Plot 1 Dim
![Page 14: Hannah Aizenman - Get To Know Your Data](https://reader034.vdocument.in/reader034/viewer/2022042817/559c1cb11a28ab14158b4674/html5/thumbnails/14.jpg)
Plot Another Dim (or 2)
![Page 15: Hannah Aizenman - Get To Know Your Data](https://reader034.vdocument.in/reader034/viewer/2022042817/559c1cb11a28ab14158b4674/html5/thumbnails/15.jpg)
Fix that Plot
![Page 16: Hannah Aizenman - Get To Know Your Data](https://reader034.vdocument.in/reader034/viewer/2022042817/559c1cb11a28ab14158b4674/html5/thumbnails/16.jpg)
Histogram
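As a rough illustration of the histogram idea named on this slide, here is a small sketch that bins values into equal-width intervals and prints a text histogram. The `histogram` helper, the bin width, and the sample readings are hypothetical; the talk's own plots are not available in this transcript.

```python
from collections import Counter

def histogram(values, bin_width):
    """Count values in equal-width bins, keyed by each bin's lower edge."""
    counts = Counter()
    for v in values:
        lower_edge = (v // bin_width) * bin_width
        counts[lower_edge] += 1
    return dict(sorted(counts.items()))

# Hypothetical readings, invented for this example.
readings = [1, 2, 2, 3, 7, 8, 12, 95]
bins = histogram(readings, bin_width=5)
for lower_edge, count in bins.items():
    # One '#' per observation in the bin.
    print(f"{lower_edge:>3}-{lower_edge + 5:<3} {'#' * count}")
```

An isolated bin far from the rest, like the one holding 95 here, is the kind of feature worth checking against the raw data.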
![Page 17: Hannah Aizenman - Get To Know Your Data](https://reader034.vdocument.in/reader034/viewer/2022042817/559c1cb11a28ab14158b4674/html5/thumbnails/17.jpg)
Min, Max, Mean, Median
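The four statistics named on this slide can be computed with the Python standard library alone. This is a minimal sketch on invented sample readings; the `summarize` helper is a hypothetical name, not something from the talk.

```python
import statistics

def summarize(values):
    """Return the minimum, maximum, mean, and median of a sequence."""
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
    }

# Hypothetical readings, invented for this example.
readings = [1, 2, 2, 3, 7, 8, 12, 95]
summary = summarize(readings)
print(summary)
```

A mean well above the median, as in this sample, hints at a skewed distribution or a few extreme values that may deserve a closer look.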
![Page 18: Hannah Aizenman - Get To Know Your Data](https://reader034.vdocument.in/reader034/viewer/2022042817/559c1cb11a28ab14158b4674/html5/thumbnails/18.jpg)
Too Much Data
![Page 19: Hannah Aizenman - Get To Know Your Data](https://reader034.vdocument.in/reader034/viewer/2022042817/559c1cb11a28ab14158b4674/html5/thumbnails/19.jpg)
Multivariate Relationships
![Page 20: Hannah Aizenman - Get To Know Your Data](https://reader034.vdocument.in/reader034/viewer/2022042817/559c1cb11a28ab14158b4674/html5/thumbnails/20.jpg)
Multivariate Relationships With Classes
![Page 21: Hannah Aizenman - Get To Know Your Data](https://reader034.vdocument.in/reader034/viewer/2022042817/559c1cb11a28ab14158b4674/html5/thumbnails/21.jpg)
Known Patterns
![Page 22: Hannah Aizenman - Get To Know Your Data](https://reader034.vdocument.in/reader034/viewer/2022042817/559c1cb11a28ab14158b4674/html5/thumbnails/22.jpg)
Expected Values
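One simple way to check data against expected values is a plausibility range test. The sketch below assumes temperature readings in degrees Celsius and uses illustrative bounds; the data, the bounds, and the `out_of_range` helper are assumptions for this example rather than anything shown in the talk.

```python
def out_of_range(values, low, high):
    """Return (index, value) pairs for values outside [low, high]."""
    return [(i, v) for i, v in enumerate(values) if not (low <= v <= high)]

# Hypothetical temperature readings in degrees Celsius.
temps_c = [12.1, 14.3, -999.0, 15.0, 151.0]

# Illustrative plausibility bounds; choose bounds from domain knowledge.
suspects = out_of_range(temps_c, low=-90.0, high=60.0)
print(suspects)
```

Flagged values are candidates for inspection, not automatic deletions: a value like -999.0 is often a sentinel for missing data, while others may be genuine extremes.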
![Page 23: Hannah Aizenman - Get To Know Your Data](https://reader034.vdocument.in/reader034/viewer/2022042817/559c1cb11a28ab14158b4674/html5/thumbnails/23.jpg)
Look For Structure
![Page 24: Hannah Aizenman - Get To Know Your Data](https://reader034.vdocument.in/reader034/viewer/2022042817/559c1cb11a28ab14158b4674/html5/thumbnails/24.jpg)
Incorporate Outside Knowledge
![Page 25: Hannah Aizenman - Get To Know Your Data](https://reader034.vdocument.in/reader034/viewer/2022042817/559c1cb11a28ab14158b4674/html5/thumbnails/25.jpg)
Weave it All Together