analytics building blocks - poloclub.github.io
TRANSCRIPT
![Page 1: Analytics Building Blocks - poloclub.github.io](https://reader033.vdocument.in/reader033/viewer/2022051412/627dd6f97f865e1b370795e7/html5/thumbnails/1.jpg)
poloclub.github.io/#cse6242CSE6242/CX4242: Data & Visual Analytics
Analytics Building Blocks
Duen Horng (Polo) ChauAssociate Professor, College of Computing Associate Director, MS AnalyticsGeorgia Tech
Mahdi RoozbahaniLecturer, Computational Science & Engineering, Georgia TechFounder of Filio, a visual asset management platform
Partly based on materials by Guy Lebanon, Jeffrey Heer, John Stasko, Christos Faloutsos
![Page 2: Analytics Building Blocks - poloclub.github.io](https://reader033.vdocument.in/reader033/viewer/2022051412/627dd6f97f865e1b370795e7/html5/thumbnails/2.jpg)
Collection
Cleaning
Integration
Visualization
Analysis
Presentation
Dissemination
![Page 3: Analytics Building Blocks - poloclub.github.io](https://reader033.vdocument.in/reader033/viewer/2022051412/627dd6f97f865e1b370795e7/html5/thumbnails/3.jpg)
Building blocks. Not Rigid āStepsā.
Can skip some
Can go back (two-way street)
ā¢ Data types inform visualization design
ā¢ Data size informs choice of algorithms
ā¢ Visualization motivates more data cleaning
ā¢ Visualization challenges algorithm assumptionse.g., user finds that results donāt make sense
Collection
Cleaning
Integration
Visualization
Analysis
Presentation
Dissemination
![Page 4: Analytics Building Blocks - poloclub.github.io](https://reader033.vdocument.in/reader033/viewer/2022051412/627dd6f97f865e1b370795e7/html5/thumbnails/4.jpg)
How ābig dataā affects the process? (Hint: almost everything is harder!)
The Vs of big data (3Vs originally, then 7, now 42)
Volume: ābillionsā, āpetabytesā are common
Velocity: think Twitter, fraud detection, etc.
Variety: text (webpages), video (youtube)ā¦
Veracity: uncertainty of data
Variability
Visualization
Value
Collection
Cleaning
Integration
Visualization
Analysis
Presentation
Disseminationhttp://www.ibmbigdatahub.com/infographic/four-vs-big-data http://dataconomy.com/seven-vs-big-data/https://tdwi.org/articles/2017/02/08/10-vs-of-big-data.aspx
![Page 5: Analytics Building Blocks - poloclub.github.io](https://reader033.vdocument.in/reader033/viewer/2022051412/627dd6f97f865e1b370795e7/html5/thumbnails/5.jpg)
Two Example Projects from Polo Club
![Page 6: Analytics Building Blocks - poloclub.github.io](https://reader033.vdocument.in/reader033/viewer/2022051412/627dd6f97f865e1b370795e7/html5/thumbnails/6.jpg)
Apolo Graph Exploration: Machine Learning + Visualization
6
Apolo: Making Sense of Large Network Data by Combining Rich User Interaction and Machine Learning. Duen Horng (Polo) Chau, Aniket Kittur, Jason I. Hong, Christos Faloutsos. CHI 2011.
![Page 7: Analytics Building Blocks - poloclub.github.io](https://reader033.vdocument.in/reader033/viewer/2022051412/627dd6f97f865e1b370795e7/html5/thumbnails/7.jpg)
7
![Page 8: Analytics Building Blocks - poloclub.github.io](https://reader033.vdocument.in/reader033/viewer/2022051412/627dd6f97f865e1b370795e7/html5/thumbnails/8.jpg)
7
Beautiful Hairball Death Star Spaghetti
![Page 9: Analytics Building Blocks - poloclub.github.io](https://reader033.vdocument.in/reader033/viewer/2022051412/627dd6f97f865e1b370795e7/html5/thumbnails/9.jpg)
Finding More Relevant Nodes
HCIPaper
Data MiningPaper
Citation network
8
![Page 10: Analytics Building Blocks - poloclub.github.io](https://reader033.vdocument.in/reader033/viewer/2022051412/627dd6f97f865e1b370795e7/html5/thumbnails/10.jpg)
Finding More Relevant Nodes
HCIPaper
Data MiningPaper
Citation network
8
![Page 11: Analytics Building Blocks - poloclub.github.io](https://reader033.vdocument.in/reader033/viewer/2022051412/627dd6f97f865e1b370795e7/html5/thumbnails/11.jpg)
Finding More Relevant Nodes
Apolo uses guilt-by-association(Belief Propagation)
HCIPaper
Data MiningPaper
Citation network
8
![Page 12: Analytics Building Blocks - poloclub.github.io](https://reader033.vdocument.in/reader033/viewer/2022051412/627dd6f97f865e1b370795e7/html5/thumbnails/12.jpg)
Demo: Mapping the Sensemaking Literature
9
Nodes: 80k papers from Google Scholar (node size: #citation) Edges: 150k citations
![Page 13: Analytics Building Blocks - poloclub.github.io](https://reader033.vdocument.in/reader033/viewer/2022051412/627dd6f97f865e1b370795e7/html5/thumbnails/13.jpg)
![Page 14: Analytics Building Blocks - poloclub.github.io](https://reader033.vdocument.in/reader033/viewer/2022051412/627dd6f97f865e1b370795e7/html5/thumbnails/14.jpg)
![Page 15: Analytics Building Blocks - poloclub.github.io](https://reader033.vdocument.in/reader033/viewer/2022051412/627dd6f97f865e1b370795e7/html5/thumbnails/15.jpg)
Key Ideas (Recap)Specify exemplarsFind other relevant nodes (BP)
11
![Page 16: Analytics Building Blocks - poloclub.github.io](https://reader033.vdocument.in/reader033/viewer/2022051412/627dd6f97f865e1b370795e7/html5/thumbnails/16.jpg)
What did Apolo go through?
Collection
Cleaning
Integration
Visualization
Analysis
Presentation
Dissemination
Scrape Google Scholar. No API. š©
Design inference algorithm (Which nodes to show next?)
Paper, talks, lectures
Interactive visualization you just saw
You will a new Apolo prototype (called Argo)
![Page 17: Analytics Building Blocks - poloclub.github.io](https://reader033.vdocument.in/reader033/viewer/2022051412/627dd6f97f865e1b370795e7/html5/thumbnails/17.jpg)
13Apolo: Making Sense of Large Network Data by Combining Rich User Interaction and Machine Learning. Duen Horng (Polo) Chau, Aniket Kittur, Jason I. Hong, Christos Faloutsos. ACM Conference on Human Factors in Computing Systems (CHI) 2011. May 7-12, 2011.
![Page 18: Analytics Building Blocks - poloclub.github.io](https://reader033.vdocument.in/reader033/viewer/2022051412/627dd6f97f865e1b370795e7/html5/thumbnails/18.jpg)
NetProbe: Fraud Detection in Online Auction
NetProbe: A Fast and Scalable System for Fraud Detection in Online Auction Networks. Shashank Pandit, Duen Horng (Polo) Chau, Samuel Wang, Christos Faloutsos. WWW 2007
![Page 19: Analytics Building Blocks - poloclub.github.io](https://reader033.vdocument.in/reader033/viewer/2022051412/627dd6f97f865e1b370795e7/html5/thumbnails/19.jpg)
Find bad sellers (fraudsters) on eBay who donāt deliver their items
NetProbe: The Problem
Buyer
$$$
Seller
15
Non-delivery fraud is a common auction fraudsource: https://www.fbi.gov/contact-us/field-offices/portland/news/press-releases/fbi-tech-tuesday---building-a-digital-defense-against-auction-fraud
![Page 20: Analytics Building Blocks - poloclub.github.io](https://reader033.vdocument.in/reader033/viewer/2022051412/627dd6f97f865e1b370795e7/html5/thumbnails/20.jpg)
16
![Page 21: Analytics Building Blocks - poloclub.github.io](https://reader033.vdocument.in/reader033/viewer/2022051412/627dd6f97f865e1b370795e7/html5/thumbnails/21.jpg)
NetProbe: Key Ideas! Fraudsters fabricate their reputation by
ātradingā with their accomplices! Fake transactions form near bipartite cores! How to detect them?
17
![Page 22: Analytics Building Blocks - poloclub.github.io](https://reader033.vdocument.in/reader033/viewer/2022051412/627dd6f97f865e1b370795e7/html5/thumbnails/22.jpg)
NetProbe: Key IdeasUse Belief Propagation
18
F A HFraudsterAccomplic
eHonest
Darker means more likely
![Page 23: Analytics Building Blocks - poloclub.github.io](https://reader033.vdocument.in/reader033/viewer/2022051412/627dd6f97f865e1b370795e7/html5/thumbnails/23.jpg)
NetProbe: Main Results
19
![Page 24: Analytics Building Blocks - poloclub.github.io](https://reader033.vdocument.in/reader033/viewer/2022051412/627dd6f97f865e1b370795e7/html5/thumbnails/24.jpg)
20
![Page 25: Analytics Building Blocks - poloclub.github.io](https://reader033.vdocument.in/reader033/viewer/2022051412/627dd6f97f865e1b370795e7/html5/thumbnails/25.jpg)
20
![Page 26: Analytics Building Blocks - poloclub.github.io](https://reader033.vdocument.in/reader033/viewer/2022051412/627dd6f97f865e1b370795e7/html5/thumbnails/26.jpg)
20
āBelgian Policeā
![Page 27: Analytics Building Blocks - poloclub.github.io](https://reader033.vdocument.in/reader033/viewer/2022051412/627dd6f97f865e1b370795e7/html5/thumbnails/27.jpg)
21
![Page 28: Analytics Building Blocks - poloclub.github.io](https://reader033.vdocument.in/reader033/viewer/2022051412/627dd6f97f865e1b370795e7/html5/thumbnails/28.jpg)
What did NetProbe go through?
Collection
Cleaning
Integration
Visualization
Analysis
Presentation
Dissemination
Scraping (built a āscraperā/ācrawlerā)
Design detection algorithm
Not released
Paper, talks, lectures
![Page 29: Analytics Building Blocks - poloclub.github.io](https://reader033.vdocument.in/reader033/viewer/2022051412/627dd6f97f865e1b370795e7/html5/thumbnails/29.jpg)
23NetProbe: A Fast and Scalable System for Fraud Detection in Online Auction Networks. Shashank Pandit, Duen Horng (Polo) Chau, Samuel Wang, Christos Faloutsos. International Conference on World Wide Web (WWW) 2007. May 8-12, 2007. Banff, Alberta, Canada. Pages 201-210.
![Page 30: Analytics Building Blocks - poloclub.github.io](https://reader033.vdocument.in/reader033/viewer/2022051412/627dd6f97f865e1b370795e7/html5/thumbnails/30.jpg)
Homework 1
ā¢ Simple āEnd-to-endā analysis
ā¢ Collect data about LEGO via API
ā¢ Store in SQLite database
ā¢ Create graph from data
ā¢ Analyze, using SQL queries (e.g., create graphās degree distribution)
ā¢ Visualize graph using ARGO Lite
ā¢ Describe your discoveries
Collection
Cleaning
Integration
Visualization
Analysis
Presentation
Dissemination