what to expect when you are visualizing
WHAT TO EXPECT WHEN YOU ARE VISUALIZING
Krist Wongsuphasawat / @kristw
Based on true stories
Forever querying
Never-ending cleaning
Hopelessly prototyping
Last-minute coding
and many more…
Computer Engineer, Bangkok, Thailand
PhD in Computer Science (Information Visualization), Univ. of Maryland
IBM, Microsoft
Data Visualization Scientist, Twitter
VISUALIZE DATA
INPUT (DATA) + YOU = OUTPUT (VIS)
EXPECT THE MISMATCHES
INPUT (DATA): What clients think they have vs. what they usually have
YOU: What clients think you are vs. what they will get
OUTPUT (VIS): What clients ask for vs. what they really need
I need this. Take this.
I need this. Here you are.
I need this. Take this.
EXPECT THESE TASKS
INPUT (DATA) + YOU = OUTPUT (VIS)
1. Get data & Wrangle
2. Analyze & Visualize
1. GET DATA & WRANGLE
DATA SOURCES
Open data: publicly available
Internal data: private, owned by clients’ organization
Self-collected data: manual, site scraping, etc.
Combine the above
MANY FORMS OF DATA
Standalone files: txt, csv, tsv, json, Google Docs, …, pdf*
APIs: better quality with more overhead
Databases: doesn’t necessarily mean they are organized
Big data: bigger pain
HAVING ALL TWEETS
How people think I feel. How I really feel.
CHALLENGES
Get relevant Tweets: hashtag: #oscars, keywords: “spotlight” (movie name)
Too big: need to aggregate & reduce size
Slow: long processing time (hours)
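A minimal sketch of the first two challenges, assuming Tweets arrive as simple records with a text and a timestamp. The sample records, field names, and function names are hypothetical and not from the talk; on real data this step would run on the cluster rather than in plain Python.

```python
from collections import Counter
from datetime import datetime

# Hypothetical sample records; real Tweet data has many more fields.
tweets = [
    {"text": "Best picture goes to Spotlight! #Oscars", "created_at": "2016-01-01T20:01:10"},
    {"text": "Spotlight deserved it", "created_at": "2016-01-01T20:01:45"},
    {"text": "Making dinner", "created_at": "2016-01-01T20:02:05"},
    {"text": "What a night #oscars", "created_at": "2016-01-01T20:02:30"},
]

def is_relevant(text, hashtags=("#oscars",), keywords=("spotlight",)):
    """Case-insensitive match on any hashtag or keyword."""
    lowered = text.lower()
    return any(term in lowered for term in hashtags + keywords)

def count_per_minute(records):
    """Reduce size: keep one count per minute instead of every Tweet."""
    counts = Counter()
    for record in records:
        if is_relevant(record["text"]):
            created = datetime.fromisoformat(record["created_at"])
            counts[created.strftime("%Y-%m-%d %H:%M")] += 1
    return dict(counts)

per_minute = count_per_minute(tweets)
```

Substring matching like this is crude (a keyword such as “spotlight” also matches unrelated uses of the word), which is part of why getting relevant Tweets is listed as a challenge.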
GETTING BIG DATA
Hadoop Cluster: Data Storage -> Tool: Pig / Scalding (slow) -> Smaller dataset
Your laptop: Smaller dataset -> Tool: node.js / python / excel (fast) -> Final dataset
EXPECT TO WAIT FOR (BIG) DATA
DATA WRANGLING
Clean: A clean dataset? Joking, right?
Filter: Less is more
Parse, Format, Correct, etc.: change country code from 3-letter to 2-letter, correct time of day based on users’ timezone, etc.
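The two corrections named above could look roughly like this. This is only a sketch: the lookup table is a tiny illustrative subset of the country codes, and the assumption that each user record carries a UTC offset in minutes is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Tiny illustrative subset; a real project would load the full country code table.
ALPHA3_TO_ALPHA2 = {"USA": "US", "THA": "TH", "GBR": "GB"}

def to_alpha2(alpha3):
    """Return the 2-letter code, or None when the 3-letter code is unknown."""
    return ALPHA3_TO_ALPHA2.get(alpha3.strip().upper())

def local_hour(utc_timestamp, utc_offset_minutes):
    """Convert a UTC timestamp to the hour of day in the user's timezone.

    utc_offset_minutes is an assumed per-user field, e.g. 420 for UTC+7.
    """
    utc_time = datetime.fromisoformat(utc_timestamp).replace(tzinfo=timezone.utc)
    user_zone = timezone(timedelta(minutes=utc_offset_minutes))
    return utc_time.astimezone(user_zone).hour

code = to_alpha2(" tha ")
hour = local_hour("2016-01-01T20:30:00", 7 * 60)
```

Returning None for unknown codes keeps bad rows visible instead of silently guessing, which helps when the cleaning has to be repeated.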
EXPECT A LOT OF TIME WITH DATA WRANGLING
70-80% of time “Data Janitor”
RECOMMENDATIONS
Always think that you will have to do it again: document the process, automate
Reusable scripts: break a gigantic do-it-all function into smaller ones
Reusable data: keep for future projects
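One way to read the "smaller ones" advice is as a chain of single-purpose steps. The step names, column names, and sample CSV below are hypothetical and only illustrate the shape.

```python
import csv
import io

def parse_rows(csv_text):
    """Step 1: parse CSV text into a list of dicts."""
    return list(csv.DictReader(io.StringIO(csv_text)))

def drop_incomplete(rows, required):
    """Step 2: filter out rows missing any required field."""
    return [row for row in rows if all(row.get(field) for field in required)]

def normalize_country(rows):
    """Step 3: trim and upper-case the country column."""
    return [{**row, "country": row["country"].strip().upper()} for row in rows]

def wrangle(csv_text):
    """The former do-it-all function is now just the composition of the steps."""
    rows = parse_rows(csv_text)
    rows = drop_incomplete(rows, required=("user", "country"))
    return normalize_country(rows)

RAW = "user,country\nalice, th\nbob,\ncarol,us \n"
cleaned = wrangle(RAW)
```

Each step can be tested on its own and reused in the next project, which is the point of the recommendation.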
2. ANALYZE & VISUALIZE
EXPECT DIFFERENT REQUIREMENTS
TYPE OF PROJECTS
Explanatory to Exploratory
Storytelling: General Public
Analytics Tools: PMs, Data Scientists
Inspirations: General Public
Goals: understand product usage, see what data can tell us, get inspired
So many things we could learn from Twitter data
Give us interesting vis about xxxx by Nov 10
STORYTELLING : WHAT TO EXPECT
timely: Deadline is strict. There can also be unexpected events.
wide audience: easy to explain and understand, multi-device support
one-off projects
content screening
STORYTELLING
WHO/WHAT: user/content
WHERE: location
WHEN: time
TIME : TWEETS/SECOND, by Miguel Rios
TIME : TWEETS/SECOND + ANNOTATION, by Miguel Rios
http://www.flickr.com/photos/twitteroffice/5681263084/
IT DOESN’T HAVE TO BE COMPLEX.
LOCATION, by Miguel Rios
Legend: low density to high density
San Francisco
flickr.com/photos/twitteroffice/8798020541
Rebuild the world based on tweet density, by Nicolas Garcia Belmonte
twitter.github.io/interactive/andes/
CONTENT : US ELECTION 2016
CONTENT : #MUSEUMWEEK
TIME + LOCATION : TWEET TIME BY CITY, by Miguel Rios & Jimmy Lin
Legend: daytime, night, late night
CONTENT + LOCATION : TWEET MAP, by Robert Harris
Label: most frequent term
Example: Gmail was down, Jan 24, 2014
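The slides do not say how the most frequent term is computed. A minimal sketch of one plausible approach, with hypothetical sample Tweets, a hypothetical stopword list, and grouping by city as an assumption:

```python
from collections import Counter, defaultdict

# Hypothetical stopword list; a real one would be much longer.
STOPWORDS = {"the", "is", "a", "was", "my"}

# Hypothetical sample data, already tagged with a city.
tweets = [
    {"city": "San Francisco", "text": "Gmail is down"},
    {"city": "San Francisco", "text": "gmail outage again"},
    {"city": "New York", "text": "the snow was heavy"},
    {"city": "New York", "text": "snow again"},
]

def most_frequent_term_by_city(records):
    """Count non-stopword terms per city and keep the top one for each."""
    term_counts = defaultdict(Counter)
    for record in records:
        words = record["text"].lower().split()
        term_counts[record["city"]].update(w for w in words if w not in STOPWORDS)
    return {city: counts.most_common(1)[0][0] for city, counts in term_counts.items()}

top_terms = most_frequent_term_by_city(tweets)
```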
USER + LOCATION : FAN MAP
interactive.twitter.com/nfl_followers2014
interactive.twitter.com/nba_followers
interactive.twitter.com/premierleague
CONTENT + TIME : STREAMGRAPH
CONTENT + TIME : MATCH SUMMARY
Biggest tournament for European soccer clubs
Count Tweets mentioning the teams every minute, from begin to end
Team 1: Dortmund, Team 2: Bayern Munich
+ goals + players
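Counting Tweets that mention each team every minute could be sketched as follows. The alias lists and sample Tweets are hypothetical, and the input is assumed to be already bucketed by minute since the match began.

```python
from collections import Counter

# Hypothetical aliases used to detect a mention of each team.
TEAMS = {"Dortmund": ("dortmund", "bvb"), "Bayern Munich": ("bayern",)}

# Hypothetical sample: (minute since the match began, Tweet text).
tweets = [
    (0, "Come on Dortmund!"),
    (0, "Bayern look sharp"),
    (1, "BVB pressing high"),
    (1, "Dortmund vs Bayern is great"),
]

def mentions_per_minute(records):
    """Return a Counter keyed by (minute, team)."""
    counts = Counter()
    for minute, text in records:
        lowered = text.lower()
        for team, aliases in TEAMS.items():
            if any(alias in lowered for alias in aliases):
                counts[(minute, team)] += 1
    return counts

counts = mentions_per_minute(tweets)
```

A Tweet that names both teams counts once for each, so the two series can be plotted independently along the time axis.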
CONTENT + TIME : COMPETITION SUMMARY
Bracket: A vs B and C vs D, then A vs C, then C
uclfinal.twitter.com
CONTENT + TIME + LOCATION : NEW YEAR 2014
twitter.github.io/interactive/newyear2014/
BEHIND THE SCENES
https://interactive.twitter.com/tenyears
Project / Twitter 10 years
REQUEST
EXPECT FUNNY REQUESTS
DESIGN & PROTOTYPE
Engagements: First Minute (0 to 60s), First Hour (0 to 60m), First Day (0 to 24h), First Week (0 to 7d)
EXPECT REVISIONS
Visualization is an important piece, but not the entire experience.
DON’T FORGET THE BIG PICTURE.
https://interactive.twitter.com/tenyears
Demo / Twitter 10 years
WORKFLOW
Requested / Identify needs
Design & Prototype
Refine: mobile, embed
Logging
Release
EXPECT THE UNEXPECTED
WORKFLOW
Requested / Identify needs
Design & Prototype
Refine: mobile, embed
Logging
Translations
Release
Data sources -> get -> explore -> analyze -> present -> Output
Approaches: ad-hoc scripts, tools for exploration
ANALYTICS TOOLS : WHAT TO EXPECT
richer, more features: to support exploration of complex data
more technical audience: product managers, engineers, data scientists
accuracy
designed for dynamic input
long-term projects
USER ACTIVITY LOGS
Users: use Twitter
Product Managers: curious
Engineers: write Twitter, instrument
Log data: in Hadoop
WHAT IS BEING LOGGED?
activities: tweet, sign up, log in, retweet, etc.
e.g. tweet from home timeline on twitter.com, tweet from search page on iPhone
ORGANIZE?
LOG EVENT A.K.A. “CLIENT EVENT” [Lee et al. 2012]
Event name: client : page : section : component : element : action
Example: web : home : timeline : tweet_box : button : tweet
Fields: 1) User ID 2) Timestamp 3) Event name 4) Event detail
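The six-level event name and the four fields map naturally onto a small record type. This is a sketch only: the class names, field types, and parsing function are assumptions, not Twitter's actual schema.

```python
from dataclasses import dataclass, field
from typing import NamedTuple

class EventName(NamedTuple):
    """The six levels of a client event name, in order."""
    client: str
    page: str
    section: str
    component: str
    element: str
    action: str

def parse_event_name(raw):
    """Split 'client : page : section : component : element : action'."""
    parts = [part.strip() for part in raw.split(":")]
    if len(parts) != 6:
        raise ValueError(f"expected 6 levels, got {len(parts)}: {raw!r}")
    return EventName(*parts)

@dataclass
class ClientEvent:
    """One log row: the four fields listed on the slide (types are assumed)."""
    user_id: int
    timestamp: int
    name: EventName
    detail: dict = field(default_factory=dict)

name = parse_event_name("web : home : timeline : tweet_box : button : tweet")
event = ClientEvent(user_id=1, timestamp=0, name=name)
```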
LOG DATA
Users: use Twitter
Engineers: write Twitter, instrument
Log data: in Hadoop, bigger than Tweet data
Product Managers: curious, ask Data Scientists
Data Scientists: find, clean, analyze the log data
Monitor: the log data is also monitored
Scribe Radar
Project / Find & Monitor client events
Log data in Hadoop: billions of rows
Aggregate: client events count
Engineers & Data Scientists: find events through search
Search fields: client, page, section, component, element, action
SECTION? COMPONENT? ELEMENT?
Query: web : home : * : * : * : impression
Results:
web : home : home : - : - : impression
web : home : wtf : - : - : impression
Problems:
search can be better
10,000+ event types
not everybody knows: What are all sections under web:home?
one graph / event, x 10,000
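The wildcard query over aggregated event counts, and the "all sections under web:home" question, can be sketched like this. The counts and the extra event names are hypothetical sample data, and the matching rule (a `*` matches any single level) is an assumption about how the search behaves.

```python
# Hypothetical daily counts keyed by six-level client event name.
EVENT_COUNTS = {
    "web:home:home:-:-:impression": 1200,
    "web:home:wtf:-:-:impression": 300,
    "web:home:home:tweet:tweet:click": 80,
    "iphone:home:-:-:-:impression": 900,
}

def matches(query, event_name):
    """True when every level equals the query level or the query level is '*'."""
    query_parts = query.split(":")
    name_parts = event_name.split(":")
    return len(query_parts) == len(name_parts) and all(
        q == "*" or q == n for q, n in zip(query_parts, name_parts)
    )

def search(query, event_counts):
    """Return matching event names in sorted order."""
    return sorted(name for name in event_counts if matches(query, name))

def sections_under(client, page, event_counts):
    """Answer 'what are all sections under client:page?'."""
    return sorted(
        {name.split(":")[2] for name in event_counts
         if name.split(":")[:2] == [client, page]}
    )

results = search("web:home:*:*:*:impression", EVENT_COUNTS)
sections = sections_under("web", "home", EVENT_COUNTS)
```

Even with a helper like `sections_under`, the user still has to know which questions to ask, which is why the slides move from search toward a browsable view of the whole collection.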
GOALS
Search for client events
Explore client event collection
Monitor changes
DESIGN
Engineers & Data Scientists: see the client event collection
Interactions: search box => filter, narrow down
HOW TO VISUALIZE?
client : page : section : component : element : action
CLIENT EVENT HIERARCHY
iphone:home:-:-:-:impression
iphone:home:-:tweet:tweet:click
Tree:
iphone
  home
    -
      - / - / impression
      tweet / tweet / click
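Turning flat event names into a hierarchy amounts to building a tree where each level of the name is one step down. A minimal sketch with nested dicts; the counts are hypothetical and the node layout is an assumption, not the actual Scribe Radar data structure.

```python
def build_hierarchy(event_counts):
    """Build a tree of nested dicts; each node sums the counts beneath it."""
    root = {"count": 0, "children": {}}
    for name, count in event_counts.items():
        node = root
        node["count"] += count
        for level in name.split(":"):
            node = node["children"].setdefault(level, {"count": 0, "children": {}})
            node["count"] += count
    return root

# The two event names from the slide, with hypothetical counts.
tree = build_hierarchy({
    "iphone:home:-:-:-:impression": 100,
    "iphone:home:-:tweet:tweet:click": 20,
})

# Walk down the shared prefix: iphone -> home -> "-" (the section level).
section = tree["children"]["iphone"]["children"]["home"]["children"]["-"]
```

Both events share the `iphone`, `home`, and `-` levels, so the tree only branches at the component level.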
DETECT CHANGES
Compare the hierarchy TODAY to the hierarchy 7 DAYS AGO
CALCULATE CHANGES
DIFF per node: e.g. +5%, +10%, -5%
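The diff between today and 7 days ago can be sketched as a percent change per event. The counts below are hypothetical, chosen to reproduce the +5%, +10%, and -5% figures, and the `favorite` and `profile` event names are made up for illustration. Returning None for events with no baseline is a design choice of this sketch.

```python
def percent_changes(today, week_ago):
    """Percent change per event name; None when there is no baseline count."""
    changes = {}
    for name in sorted(today.keys() | week_ago.keys()):
        before = week_ago.get(name, 0)
        after = today.get(name, 0)
        changes[name] = None if before == 0 else (after - before) * 100 / before
    return changes

# Hypothetical counts.
week_ago = {
    "iphone:home:-:-:-:impression": 1000,
    "iphone:home:-:tweet:tweet:click": 200,
    "iphone:home:-:tweet:tweet:favorite": 100,
}
today = {
    "iphone:home:-:-:-:impression": 1050,
    "iphone:home:-:tweet:tweet:click": 220,
    "iphone:home:-:tweet:tweet:favorite": 95,
    "iphone:profile:-:-:-:impression": 40,  # new event, no baseline
}

changes = percent_changes(today, week_ago)
```

Comparing against the same weekday a week earlier, as the slide does, avoids flagging ordinary weekday versus weekend swings as changes.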
DISPLAY CHANGES
Map of the Market [Wattenberg 1999], StemView [Guerra-Gomez et al. 2013]
Demo / Scribe Radar
Twitter for Banana
Flying Sessions
Project / Funnel Analysis
COUNT PAGE VISITS
home page: banana : home : - : - : - : impression
FUNNEL
home page -> profile page
FUNNEL ANALYSIS
home page -> profile page: 1 job, 1 hour
banana : home : - : - : - : impression
banana : profile : - : - : - : impression
FUNNEL ANALYSIS
home page -> profile page, home page -> search page: 2 jobs, 2 hours
banana : home : - : - : - : impression
banana : profile : - : - : - : impression
banana : search : - : - : - : impression
Specify all funnels manually! n jobs
Time to find a new job
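A single manually specified funnel can be sketched as below, which makes the cost visible: every new funnel is another list of steps and another pass over the sessions. The sample sessions are hypothetical, and treating steps as "in order but not necessarily adjacent" is an assumption of this sketch.

```python
HOME = "banana:home:-:-:-:impression"
PROFILE = "banana:profile:-:-:-:impression"
SEARCH = "banana:search:-:-:-:impression"

def funnel_counts(sessions, steps):
    """Count how many sessions reach each step, in order."""
    reached = [0] * len(steps)
    for events in sessions:
        position = 0
        for event in events:
            if position < len(steps) and event == steps[position]:
                reached[position] += 1
                position += 1
    return reached

# Hypothetical sessions, each an ordered list of client events.
sessions = [
    [HOME, PROFILE],
    [HOME, SEARCH],
    [HOME],
    [PROFILE, HOME],
]

home_to_profile = funnel_counts(sessions, [HOME, PROFILE])  # one "job"
home_to_search = funnel_counts(sessions, [HOME, SEARCH])    # a second "job"
```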
GOAL
home page (banana : home : - : - : - : impression) -> … ……
1 job => all funnels, visualized
USER SESSIONS
Session#1: Start -> A -> B -> end
Session#2: Start -> A -> B -> end
Session#3: Start -> A -> C -> end
Session#4: Start -> A -> end
AGGREGATE
Merge the 4 sessions by shared prefix:
Start -> A (4 sessions)
A -> B -> end (2 sessions)
A -> C -> end (1 session)
A -> end (1 session)
Then scale up: 4,000,000 sessions (millions of sessions, 10,000+ event types)
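The merge-by-shared-prefix step can be sketched by counting every prefix of every session. This illustrates the idea on the four sessions above; it is not the implementation described in the paper, and at millions of sessions it would need to run as a distributed job.

```python
from collections import Counter

def aggregate_sessions(sessions):
    """Count how many sessions pass through each prefix path."""
    prefix_counts = Counter()
    for events in sessions:
        path = ("Start",)
        prefix_counts[path] += 1
        for event in list(events) + ["end"]:
            path = path + (event,)
            prefix_counts[path] += 1
    return prefix_counts

# The four sessions from the slide.
sessions = [["A", "B"], ["A", "B"], ["A", "C"], ["A"]]
prefix_counts = aggregate_sessions(sessions)
```

Each key is one node of the aggregated tree and its value is the number of sessions flowing through it, so all funnels starting from `Start` come out of a single pass.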
TRY WITH SAMPLE DATA
FAIL…
Keep trying to make it work
EXPECT TRIALS AND ERRORS
HOW TO MAKE IT WORK?
Read the details in Krist Wongsuphasawat and Jimmy Lin. “Using Visualizations to Monitor Changes and Harvest Insights from a Global-Scale Logging Infrastructure at Twitter.” Proc. IEEE Conference on Visual Analytics Science and Technology (VAST), 2014.
Demo / Flying Sessions
WORKFLOW
Requested / Identify needs
Design & Prototype: make it work for sample dataset
Refine & Generalize
Productionize
Document & Release
Maintain & Support: keep it running, feature requests & bug fixes
https://medium.com/@kristw/designing-the-game-of-tweets-7f87c30dc5a2
Project / Game of Tweets
EXPECT HARDWARE COMPLICATIONS
EXPECT TO IMPROVE
HOW TO BE BETTER?
Time is limited.
Grow the team
Expand skills
Improve tooling: solve a problem once and for all, automate repetitive tasks
Demo / d3Kit
https://github.com/twitter/d3kit
http://www.slideshare.net/kristw/d3kit
TO SUM UP
INPUT (DATA) + YOU = OUTPUT (VIS)
1. Get data & Wrangle
2. Analyze & Visualize
TYPE OF PROJECTS
Explanatory to Exploratory
Storytelling: General Public
Analytics Tools: PMs, Data Scientists
Inspirations: General Public
Goals: understand product usage, see what data can tell us, get inspired
TAKE-AWAY
Getting data and data wrangling are time-consuming.
Different projects, different requirements: Storytelling, Product insights, Art, etc.
Combine visualization with other skills: HCI, Design, Stats, ML, etc.
Expect the unexpected
Learn and improve: do more with less time (grow the team, expand skills, improve tooling)
Krist Wongsuphasawat / @kristw
kristw.yellowpigz.com
ACKNOWLEDGEMENT
Nicolas Garcia Belmonte, Robert Harris, Miguel Rios, Simon Rogers, Jimmy Lin, Linus Lee, Chuang Liu, and many colleagues at Twitter. Lastly, to my wife for taking care of our 3-month-old baby, so I had time to prepare these slides.
RESOURCES
Images: Banana phone http://goo.gl/GmcMPq; Bar chart https://goo.gl/1G1GBg; Boss https://goo.gl/gcY8Kw; Champions League http://goo.gl/DjtNKE; Database http://goo.gl/5N7zZz; Fishing shark http://goo.gl/2fp4zW; Globe visualization http://goo.gl/UiGMMj; Harry Potter http://goo.gl/Q9Cy64; Holding phone http://goo.gl/It2TzH; Kiwi orange http://goo.gl/ejQ73y; Kiwi http://goo.gl/9yk7o5; Library https://goo.gl/HVeE6h; Library earthquake http://goo.gl/rBqBrs; Minion http://goo.gl/I19Ijg; NBA http://goo.gl/p7HBdG; NFL http://goo.gl/feQMZs; Orange & Apple http://goo.gl/NG6RIL; Pile of paper http://goo.gl/mGLQTx; Premier League http://goo.gl/AqIINO; Scrooge McDuck https://goo.gl/aKv8D7; The Sound of Music https://goo.gl/dqHlzj; Trash pile http://goo.gl/OsFfo3; Tyrion http://goo.gl/WaBonl; Watercolor Map by Stamen Design
THANK YOU