data visualization at codetalks 2016
TRANSCRIPT
![Page 1: Data Visualization at codetalks 2016](https://reader033.vdocument.in/reader033/viewer/2022042706/58ab59e21a28abbc2a8b4bcb/html5/thumbnails/1.jpg)
VisualizingHigh-dimensionalData
Dr. Stefan KühnLead Data Scientist
code.talks - 30.09.2016
![Page 2: Data Visualization at codetalks 2016](https://reader033.vdocument.in/reader033/viewer/2022042706/58ab59e21a28abbc2a8b4bcb/html5/thumbnails/2.jpg)
Overview
• Data Visualization as you might know it• Main Properties of Graphics (and Humans)• A short story about Charts• Pair Plots and Correlations
• Data Visualization as you might not know it• Fundamental Problems• SVD, t-SNE and other approximations• Principal Components and Principal Curves • Parallel Coordinates• The Grand Tour
2
![Page 3: Data Visualization at codetalks 2016](https://reader033.vdocument.in/reader033/viewer/2022042706/58ab59e21a28abbc2a8b4bcb/html5/thumbnails/3.jpg)
Takeaways - hopefully ;-)
• Data Visualization is complicated• There is always an approximation• There is always a bias• There is always a misinterpretation
• Data Visualization is simple• Lots of packages available• Lots of studies + literature• Lots of examples to learn from
3
![Page 4: Data Visualization at codetalks 2016](https://reader033.vdocument.in/reader033/viewer/2022042706/58ab59e21a28abbc2a8b4bcb/html5/thumbnails/4.jpg)
Data Visualization Basics
4
![Page 5: Data Visualization at codetalks 2016](https://reader033.vdocument.in/reader033/viewer/2022042706/58ab59e21a28abbc2a8b4bcb/html5/thumbnails/5.jpg)
The Modes of Perception
5
X
X
X
X
X
XX
O
O
O O
O
O
O
OX
X
XX
O
O
O
OX
X
X
X
X
X
XX
O
O
O O
O
O
O
O
X
X X
X
XO
OO
Fast Slow
Find the outlier
![Page 6: Data Visualization at codetalks 2016](https://reader033.vdocument.in/reader033/viewer/2022042706/58ab59e21a28abbc2a8b4bcb/html5/thumbnails/6.jpg)
The Modes of Perception
6
• Pre-attentive• fast• parallel processing• effortless
• Pattern recognition• semi-fast• governed by laws of per by
• Attentive• slow• sequential• high effort (attention a very limited resource)
![Page 7: Data Visualization at codetalks 2016](https://reader033.vdocument.in/reader033/viewer/2022042706/58ab59e21a28abbc2a8b4bcb/html5/thumbnails/7.jpg)
Main Properties of Graphics
7
Category ExamplePosition
Shape
Size
Color
Orientation (Line)
Length (Line)
Type and Size (Line)
Brightness
![Page 8: Data Visualization at codetalks 2016](https://reader033.vdocument.in/reader033/viewer/2022042706/58ab59e21a28abbc2a8b4bcb/html5/thumbnails/8.jpg)
Main Properties of Graphics Humans
8
Category Amount of pre-attentive informationPosition very high
Shape ———
Size approx. 4
Color approx. 8
Orientation (Line) approx. 4
Length (Line) ———
Type and Size (Line) ———
Brightness approx. 8
![Page 9: Data Visualization at codetalks 2016](https://reader033.vdocument.in/reader033/viewer/2022042706/58ab59e21a28abbc2a8b4bcb/html5/thumbnails/9.jpg)
Pre-attentive perception
9
• Position• fast• effective• high number of different positions
• Color• use with care
• Shape• Orientation
Pre-attentive perception is effortless.Exploit this as much as you can.
![Page 10: Data Visualization at codetalks 2016](https://reader033.vdocument.in/reader033/viewer/2022042706/58ab59e21a28abbc2a8b4bcb/html5/thumbnails/10.jpg)
Pattern detection
10
„It is interesting to note that our brain […] subconsciously always prefers meaningful situations and objects.“
• Emergence• Reiification• Multi-stability• Invariance
Pattern detection can be trained.Exploit this for frequent visualizations.
![Page 11: Data Visualization at codetalks 2016](https://reader033.vdocument.in/reader033/viewer/2022042706/58ab59e21a28abbc2a8b4bcb/html5/thumbnails/11.jpg)
What is this?
11
![Page 12: Data Visualization at codetalks 2016](https://reader033.vdocument.in/reader033/viewer/2022042706/58ab59e21a28abbc2a8b4bcb/html5/thumbnails/12.jpg)
What is this?
12
![Page 13: Data Visualization at codetalks 2016](https://reader033.vdocument.in/reader033/viewer/2022042706/58ab59e21a28abbc2a8b4bcb/html5/thumbnails/13.jpg)
What is this?
13
![Page 14: Data Visualization at codetalks 2016](https://reader033.vdocument.in/reader033/viewer/2022042706/58ab59e21a28abbc2a8b4bcb/html5/thumbnails/14.jpg)
Laws of Gestalt
14
„It is interesting to note that our brain, inaccordance with the laws of Gestalt, subconsciously always prefers meaningful situations and objects.“
![Page 15: Data Visualization at codetalks 2016](https://reader033.vdocument.in/reader033/viewer/2022042706/58ab59e21a28abbc2a8b4bcb/html5/thumbnails/15.jpg)
Accuracy of Graphics
15
Square Pie vs Stacked Bar vs Pie vs Donut
What do you think?
https://eagereyes.org/blog/2016/a-reanalysis-of-a-study-about-square-pie-charts-from-2009
![Page 16: Data Visualization at codetalks 2016](https://reader033.vdocument.in/reader033/viewer/2022042706/58ab59e21a28abbc2a8b4bcb/html5/thumbnails/16.jpg)
Demo
16
![Page 17: Data Visualization at codetalks 2016](https://reader033.vdocument.in/reader033/viewer/2022042706/58ab59e21a28abbc2a8b4bcb/html5/thumbnails/17.jpg)
Beyond the Basics
17
![Page 18: Data Visualization at codetalks 2016](https://reader033.vdocument.in/reader033/viewer/2022042706/58ab59e21a28abbc2a8b4bcb/html5/thumbnails/18.jpg)
Fundamental Problems
• No accurate method in higher dimensions• Approximations methods• „Simulated“ dimensions (color, size, shape)• Animations?
• No notion of quality or accuracy for Visualizations• Information Theory?• „Stability“?
All Visualizations are wrong, but some are useful.
18
![Page 19: Data Visualization at codetalks 2016](https://reader033.vdocument.in/reader033/viewer/2022042706/58ab59e21a28abbc2a8b4bcb/html5/thumbnails/19.jpg)
Approximation methods
• Pair Plots• Axis-aligned projections • Interpretable in terms of original variables
• Singular Value Decomposition• Optimal with respect to 2-norm (Euclidean norm)
and supremum norm• Comes with an error estimate
• Other methods• Stochastic Neighborhood Embedding (t-SNE)• „Manifold Learning“
19
![Page 20: Data Visualization at codetalks 2016](https://reader033.vdocument.in/reader033/viewer/2022042706/58ab59e21a28abbc2a8b4bcb/html5/thumbnails/20.jpg)
Demo
20
![Page 21: Data Visualization at codetalks 2016](https://reader033.vdocument.in/reader033/viewer/2022042706/58ab59e21a28abbc2a8b4bcb/html5/thumbnails/21.jpg)
Manifold Learning Methods
• Locally Linear Embedding• Neighborhood-preserving embedding
• Isomap• quasi-isometric
• Multi-dimensional scaling• quasi-isometric
• Spectral Embedding• Spectral clustering based on similarity
• Stochastic Neighborhood Embedding (SNE, t-SNE)• preserves probabilities
• Local Tangent Space Alignement (LTSA)
21
![Page 22: Data Visualization at codetalks 2016](https://reader033.vdocument.in/reader033/viewer/2022042706/58ab59e21a28abbc2a8b4bcb/html5/thumbnails/22.jpg)
Local Tangent Space Alignement
22
![Page 23: Data Visualization at codetalks 2016](https://reader033.vdocument.in/reader033/viewer/2022042706/58ab59e21a28abbc2a8b4bcb/html5/thumbnails/23.jpg)
Principal Components and Curves
• Principal Component Analysis• orthogonal decomposition based on SVD• linear in all variables• tries to preserve variance
• Principal Curves• minimize the Sum of Squared Errors with respect
to all variables (as PCA, preserve variance)• nonlinear• smooth
23
![Page 24: Data Visualization at codetalks 2016](https://reader033.vdocument.in/reader033/viewer/2022042706/58ab59e21a28abbc2a8b4bcb/html5/thumbnails/24.jpg)
Principal Components and Curves
24
![Page 25: Data Visualization at codetalks 2016](https://reader033.vdocument.in/reader033/viewer/2022042706/58ab59e21a28abbc2a8b4bcb/html5/thumbnails/25.jpg)
Parallel Coordinates
• Parallel Coordinates• especially useful for high-dimensional data• depends on ordering and scaling
25
![Page 26: Data Visualization at codetalks 2016](https://reader033.vdocument.in/reader033/viewer/2022042706/58ab59e21a28abbc2a8b4bcb/html5/thumbnails/26.jpg)
Demo
26
![Page 27: Data Visualization at codetalks 2016](https://reader033.vdocument.in/reader033/viewer/2022042706/58ab59e21a28abbc2a8b4bcb/html5/thumbnails/27.jpg)
The Grand Tour
• Animated sequence of 2-D projections• https://en.wikipedia.org/wiki/
Grand_Tour_(data_visualisation)• Asimov (1985): The grand tour: a tool for viewing
multidimensional data.• Underlying idea• Randomly generate 2-D projections (random
walk)• Over time generate a dense subset of all
possible 2-D projections• Optional: Follow a given path / guided tour
27
![Page 28: Data Visualization at codetalks 2016](https://reader033.vdocument.in/reader033/viewer/2022042706/58ab59e21a28abbc2a8b4bcb/html5/thumbnails/28.jpg)
The Grand Tour
28
![Page 29: Data Visualization at codetalks 2016](https://reader033.vdocument.in/reader033/viewer/2022042706/58ab59e21a28abbc2a8b4bcb/html5/thumbnails/29.jpg)
The Grand Tour
29
![Page 30: Data Visualization at codetalks 2016](https://reader033.vdocument.in/reader033/viewer/2022042706/58ab59e21a28abbc2a8b4bcb/html5/thumbnails/30.jpg)
Demo
30