Foresight: Recommending Visual Insights
Çağatay Demiralp Peter Haas Srinivasan Parthasarathy Tejaswini Pedapati
IBM Research
[Figure: visualization tools placed on an axis running from speed to expressiveness. Chart Typologies: Excel, Google Charts, Tableau. Declarative Encoding Languages: D3, ggplot, VizQL, VizML. Component Model Architectures: Processing, Prefuse. Graphics APIs: OpenGL, DirectX, Java2D, HTML Canvas. Annotations: Majority of Users; Automated Visualization Systems. A second build adds Foresight.]
Exploratory Data Analysis (EDA)
John W. Tukey (1915 - 2000)
Explore patterns and relations in data, ask questions and (re)form hypotheses
Statistics + visualizations
“Here is the data! Which questions does it want us to ask? What seems to be going on?”
Exploratory vs. confirmatory
EDA CHALLENGES
Data complexity
Insufficient time and skills
Cognitive limitations
Transient working memory
Tendency to fit evidence to existing expectations and schemas
[Tversky & Kahneman ’75, Nickerson ’98, Card et al. ’05]
FORESIGHT
Structured, rapid first-order EDA
Framework for exploring datasets through ranked and neighborhood-based visualizations
Exploration engine supporting a faceted interface
Sketch-based composition for fast approximate computation
DEMO
OECD dataset: 25 well-being indicators (columns) for 36 OECD member countries (rows)
PRIOR WORK
[Figure, built up over four slides: prior work on visualization recommendation. Panel labels: data (scatter plots of X vs. Y, X vs. Z, and Y vs. Z), visual encoding, measure + statistical, task. Ranking criteria shown: coverage, alphabetical, Mackinlay’s ranking, user preference, saliency. Systems shown: PRIM-9 ’79, GrandTour ’84, SAGE ’94, Zhou & Chen ’03, Rank-by-Feature ’04, ShowMe ’07, Gotz & Wen ’09, AutoVis ’10, VizDeck ’13, SeeDB ’15, Zenvisage ’16, Voyager ’16, Voyager-2 ’17, Foresight.]
DESIGN
INTERVIEW STUDY
Participants:
10 data scientists (2 female + 8 male)
Diverse domains, e.g., healthcare, marketing, finance
MS & PhDs
Predictive modeling
Sought answers for:
How do analysts start exploratory data analysis?
What tools do analysts generally work with?
What visualizations and statistics do analysts frequently use?
How do analysts decide on what is “interesting” in data?
What strategies do analysts use with large data?
What are productivity challenges in general and for specific tools?
Procedure & analysis:
Face-to-face, open-ended
Walk through a recent experience
Three note takers & audio recorded
Lasted ~30 mins
Merged & grouped through iterative coding
Results:
1) EDA in Data Analysis Process
2) Junior versus Senior Analysts
3) Stratified Greedy Navigation
4) Handling Big Data
5) Tools
6) Challenges
INTERVIEW RESULTS
[Figure: data analysis pipeline with stages Discovery, Wrangling, Profiling, Modeling, Reporting, and EDA marked on the pipeline]
EDA in Data Analysis Process
Analysts spent most of their time on EDA, once the data was readied for analysis
First-order understanding dominated EDA
Junior versus Senior Analysts
Senior analysts (5+ years of experience) spent more time on domain understanding and EDA than junior analysts
Junior analysts transitioned to modeling faster and relied more on ML-based techniques
Senior analysts relied on basic statistical techniques but put more emphasis on domain-specific (causal/semantic) relations
Stratified Greedy Navigation
From simpler, univariate to more complex, multivariate
Hierarchical both in statistical computation and data relations
Rarely considered trivariate relations
Greedy strategy for deciding what to focus on
May cause premature fixation
DESIGN CRITERIA
1. Structure data variation around statistical descriptors
2. Use descriptor strength to drive the promotion of data variation
3. Give user control over the definition of descriptor strength
4. Use the best visualizations for communicating statistical descriptors
5. Facilitate a stratified workflow to minimize the cost of exploration
6. Enable access to raw data on demand
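Criteria 2 and 3 can be illustrated with a small ranking sketch. This is not Foresight's implementation: the function name `rank_by_strength`, the attribute names, and the skewness values are hypothetical, chosen only to show how a user-supplied definition of strength changes which attributes get promoted.

```python
from typing import Callable, Dict, List, Optional


def rank_by_strength(
    values: Dict[str, float],
    strength: Callable[[float], float] = abs,
    top_k: Optional[int] = None,
) -> List[str]:
    """Order attribute names by descending descriptor strength.

    `values` maps an attribute name to a raw descriptor value (for example
    its skewness). `strength` is the user-controlled definition of what
    counts as strong; the default treats large magnitudes as strong.
    """
    ranked = sorted(values, key=lambda name: strength(values[name]), reverse=True)
    return ranked if top_k is None else ranked[:top_k]


# Hypothetical skewness values for three attributes.
skew = {"income": 2.1, "age": -0.3, "hours": -1.4}

by_magnitude = rank_by_strength(skew)                    # strongest skew in either direction
left_skew_first = rank_by_strength(skew, lambda v: -v)   # user prefers left-skewed attributes
```

With the default strength, `income` is promoted first; with the second definition, `hours` is.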
DESCRIPTORS
Dispersion: Quartile coefficient of dispersion; visualized with a histogram
Skew: Standardized skewness coefficient; visualized with a histogram
Heavy tails: Kurtosis; visualized with a histogram
Outliers: Number of points outside the inlier range of a Tukey box-and-whisker plot; visualized with a box-and-whisker plot
Heterogeneous frequencies: Normalized Shannon entropy; visualized with a Pareto chart
Linear relationship: Absolute value of the Pearson correlation coefficient; visualized with a scatter plot with the best-fit line overlaid
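The descriptors above correspond to standard statistics, which can be sketched with NumPy as follows. The function names are illustrative, and several details are assumptions the slides do not specify: kurtosis is computed without subtracting 3, the Tukey fences use the conventional 1.5 × IQR multiplier, and quartiles use NumPy's default interpolation.

```python
import numpy as np


def quartile_dispersion(x):
    """Quartile coefficient of dispersion: (Q3 - Q1) / (Q3 + Q1)."""
    q1, q3 = np.percentile(x, [25, 75])
    return float((q3 - q1) / (q3 + q1))


def _standardize(x):
    # Assumes x is not constant; a zero standard deviation would divide by zero.
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()


def skewness(x):
    """Standardized third central moment."""
    return float(np.mean(_standardize(x) ** 3))


def kurtosis(x):
    """Standardized fourth central moment (non-excess; about 3 for normal data)."""
    return float(np.mean(_standardize(x) ** 4))


def tukey_outlier_count(x, k=1.5):
    """Number of points outside [Q1 - k*IQR, Q3 + k*IQR]."""
    x = np.asarray(x, dtype=float)
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    return int(np.sum((x < q1 - k * iqr) | (x > q3 + k * iqr)))


def normalized_entropy(labels):
    """Shannon entropy of category frequencies divided by log(#categories)."""
    _, counts = np.unique(labels, return_counts=True)
    if len(counts) < 2:
        return 0.0
    p = counts / counts.sum()
    return float(-(p * np.log(p)).sum() / np.log(len(counts)))


def abs_pearson(x, y):
    """Absolute value of the Pearson correlation coefficient."""
    return float(abs(np.corrcoef(x, y)[0, 1]))
```

For example, `tukey_outlier_count([1, 2, 3, 4, 5, 6, 7, 8, 100])` counts the single extreme value, and `normalized_entropy` is 1 when all categories are equally frequent.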
NEIGHBORHOOD
[Figures, four builds: neighborhoods of visualizations around selected attributes; attribute labels shown include A, B, E, I, M, Q, U, V, W, X, and Z.]
SCALABILITY VIA SKETCHING
SKETCHES
Compressed synopses for fast approximate computations
Provide desirable guarantees on approximation errors
Hyperplane sketch for correlation
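The slides name the hyperplane sketch without giving its construction. The sketch below assumes the standard random-hyperplane (SimHash-style) scheme: each column is mean-centered and reduced to k sign bits, and because the Pearson correlation is the cosine of the angle between centered columns, the fraction of disagreeing bits estimates that angle divided by π. Function names, the sketch size, and the synthetic data are illustrative.

```python
import numpy as np


def hyperplane_sketch(column, planes):
    """Sign bits of the mean-centered column against k random hyperplanes."""
    centered = np.asarray(column, dtype=float) - np.mean(column)
    return planes @ centered >= 0


def estimate_correlation(sketch_a, sketch_b):
    """Estimate Pearson correlation from the fraction of disagreeing bits.

    For random Gaussian hyperplanes, P[bits differ] = angle / pi, so
    correlation = cos(angle) is approximated by cos(pi * disagreement).
    """
    disagreement = np.mean(sketch_a != sketch_b)
    return float(np.cos(np.pi * disagreement))


rng = np.random.default_rng(0)
n_rows, n_bits = 500, 2048                       # illustrative sizes
planes = rng.standard_normal((n_bits, n_rows))   # shared by all columns

# Synthetic columns with a true correlation of about 0.8.
x = rng.standard_normal(n_rows)
y = 0.8 * x + 0.6 * rng.standard_normal(n_rows)

exact = float(np.corrcoef(x, y)[0, 1])
approx = estimate_correlation(hyperplane_sketch(x, planes),
                              hyperplane_sketch(y, planes))
```

Each column is sketched once, after which any pair can be compared using only the bit vectors. The estimate tightens as the number of bits grows.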
CONCLUSION
Herb A. Simon (1916 - 2001)
“What information consumes is rather obvious: it consumes the attention of its recipients. Hence a wealth of information creates a poverty of attention, and a need to allocate that attention efficiently among the overabundance of information sources that might consume it.”
FORESIGHT
Framework for exploring datasets through ranked and neighborhood-based visualizations
Exploration engine supporting a faceted interface
Sketch-based composition for fast approximate computation
Interview study providing insights into EDA practices, informing EDA tool design at large
ONGOING
Human-subjects study
New descriptors
Foresight: Recommending Visual Insights
@serravis
INSIGHT
Strong manifestation of a statistical property of the data, e.g., high correlation between two attributes, high skewness or concentration about the mean of a single attribute, a strong clustering of values, etc.