Visualize Big Graph Data

MATHIEU BASTIAN, DATA VISUALIZATION SUMMIT, SAN FRANCISCO, APRIL 11-12, 2013

Upload: mathieu-bastian

Post on 26-Jan-2015

TRANSCRIPT

Page 1: Visualize Big Graph Data

Mathieu Bastian

Data Visualization Summit, San Francisco, April 11-12, 2013

Page 2: Visualize Big Graph Data

BIG GRAPH DATA

•  The story of big graph data is just starting
•  BIG GRAPH DATA

Page 3: Visualize Big Graph Data

BIG GRAPH DATA

•  The story of big graph data is just starting
•  BIG GRAPH DATA

BIG DATA GRAPHS

Page 4: Visualize Big Graph Data

BIG GRAPH DATA

•  The story of big graph data is just starting
•  BIG GRAPH DATA

BIG DATA GRAPHS

LARGE DATASETS

DISTRIBUTED SYSTEMS

HADOOP

INDEXATION

REAL-TIME

STORAGE COMPLEX

ALGORITHM

ANALYTICS VISUALIZATION

CLOUD COMPUTING

DATABASES

Page 6: Visualize Big Graph Data

BIG DATA

•  “The Petabyte age”
•  All industries and domains can leverage big data
•  Big Data => Big Problems
•  Focusing on building the technology to handle big data and big graph data (e.g., graph databases)
•  Seeking efficient analysis of ever more complex systems

Health, Government, Finance, Technology

Page 7: Visualize Big Graph Data

GRAPHS

•  Graphs are everywhere, and it’s easy to collect graph data
•  The world is more complex and interconnected than we thought

Source: Collective Dynamics of Small-World Networks, D Watts, S Strogatz, Nature 393, 440-442 (1998)

Page 8: Visualize Big Graph Data

NETWORK SCIENCE

•  The study of graphs has exploded over the last 15 years
•  Networks have properties and patterns one can study
   •  Robustness – How resistant is a network to random attacks?
   •  Contagion – How fast does a disease or gossip spread in a network?
   •  Communities – How many communities exist in a network?
   •  Centrality – Who is the most central individual in a network?
•  If you read one of these books, you understand Network Science
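As a rough illustration of the centrality question above, the sketch below computes degree centrality on a made-up toy network. Degree centrality is only one of several centrality measures, and the function name, node names, and edge list are illustrative assumptions rather than material from the talk.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Return each node's degree divided by (n - 1) for an undirected edge list."""
    neighbors = defaultdict(set)
    for u, v in edges:
        if u == v:
            continue  # self-loops are ignored in this sketch
        neighbors[u].add(v)
        neighbors[v].add(u)
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    return {node: len(adj) / (n - 1) for node, adj in neighbors.items()}


# Hypothetical toy network: "ann" is connected to everyone else.
toy_edges = [("ann", "bob"), ("ann", "cat"), ("ann", "dan"), ("bob", "cat")]
centrality = degree_centrality(toy_edges)
most_central = max(centrality, key=centrality.get)
print(most_central, centrality[most_central])
```

On larger networks, other measures such as betweenness or eigenvector centrality may rank individuals differently, so the choice of measure is itself part of the analysis.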

Page 9: Visualize Big Graph Data

GRAPHS HELP SOLVE PROBLEMS

•  Saddam Hussein Network (2003)

The Universe

C. Wilson. Searching for Saddam: a five-part series on how the US military used social networking to capture the Iraqi dictator. 2010. www.slate.com/id/2245228/.

Page 10: Visualize Big Graph Data

GRAPHS HELP SOLVE PROBLEMS

•  Predicting and controlling infectious disease

Naoki Masuda, Petter Holme. Predicting and controlling infectious disease epidemics using temporal networks. http://f1000.com/prime/reports/b/5/6/
Haraldsdottir S, Gupta S, Anderson RM: Preliminary studies of sexual networks in a male homosexual community in Iceland. J Acquir Immune Defic Syndr. 1992, 5:374–81.

Page 11: Visualize Big Graph Data

GRAPHS HELP SOLVE PROBLEMS

•  Recommendation systems

Credit: http://markorodriguez.com/2011/09/22/a-graph-based-movie-recommender-engine/

Page 12: Visualize Big Graph Data

GRAPHS HELP SOLVE PROBLEMS

•  Recipe recommendation using ingredient networks

Credit: http://www.ladamic.com/wordpress/?p=294

Page 13: Visualize Big Graph Data

GRAPHS HELP SOLVE PROBLEMS

•  Power grid

Credit: http://www.npr.org/templates/story/story.php?storyId=110997398

Page 14: Visualize Big Graph Data

SMALL GRAPHS

•  The famous “Zachary’s Karate Club” study (1977) involved only 34 nodes
•  It could be drawn by hand on paper

W. W. Zachary, An information flow model for conflict and fission in small groups, Journal of Anthropological Research 33, 452-473 (1977).

Zachary’s Karate Club (1977)

Page 15: Visualize Big Graph Data

MEDIUM GRAPHS

•  Your own Facebook or LinkedIn social network
•  The Harlem Shake: Anatomy of a Viral Meme

Gilad Lotan. http://www.huffingtonpost.com/gilad-lotan/the-harlem-shake_b_2804799.html

Page 16: Visualize Big Graph Data

LARGE GRAPHS

•  The Internet Map (~350,000 domains)
•  DBpedia (~290M relationships)
•  Friendster Social Network dataset* (1.8B edges)

Internet Map (http://internet-map.net)

* http://snap.stanford.edu/data/index.html

Page 17: Visualize Big Graph Data

IMPLICIT GRAPHS

•  Graphs can be explicit or implicit
   •  Explicit: The network exists in nature (social networks, food webs, airline networks)
   •  Implicit: The network is derived from other data (word networks, co-authorship)
•  Example of an implicit graph:
   •  Each document in a set carries a set of tags
   •  One can create a link when two tags are on the same document
   •  Aggregate all links across all documents
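The three steps of the implicit-graph example can be sketched as follows. This is a minimal illustration, assuming documents are given as plain lists of tag strings; the function name and the sample tags are hypothetical.

```python
from collections import Counter
from itertools import combinations


def tag_cooccurrence(documents):
    """Count, for every pair of tags, how many documents carry both tags."""
    weights = Counter()
    for tags in documents:
        # set() drops repeated tags; sorted() gives each pair a canonical order.
        for a, b in combinations(sorted(set(tags)), 2):
            weights[(a, b)] += 1
    return weights


# Hypothetical documents, each described only by its tags.
docs = [
    ["python", "graphs", "visualization"],
    ["graphs", "visualization"],
    ["python", "hadoop"],
]
edge_weights = tag_cooccurrence(docs)
for (a, b), weight in sorted(edge_weights.items()):
    print(a, b, weight)
```

Each key is an undirected edge between two tags and each value is the edge weight, so the result is a weighted graph derived entirely from non-graph data.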

Page 18: Visualize Big Graph Data

SIMILARITY GRAPHS

•  Graph of all the co-occurrences between LinkedIn Skills (2011)

Page 19: Visualize Big Graph Data

VISUALIZATION

•  Visualization and statistics are the two basic toolkits one can use on graphs
•  Complex questions are asked when studying graphs
•  Easy (Excel can do this!)
   •  Min, max, average, quartiles
   •  Exact queries, search
•  Harder (Visualization can do this!)
   •  Patterns, trends, correlations
   •  Changes over time, context
   •  Anomalies, data errors
   •  Geographical representation
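The "easy" questions reduce to summary statistics. As a small sketch, assuming the quantity of interest is node degree (the slide does not say which quantity is summarized), the following computes min, max, average, and quartiles for a hypothetical edge list using only the Python standard library.

```python
import statistics
from collections import Counter


def degree_summary(edges):
    """Summarize the degree distribution of an undirected edge list."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    values = sorted(degree.values())
    # The "inclusive" method treats the data as the whole population.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "min": values[0],
        "max": values[-1],
        "mean": statistics.mean(values),
        "q1": q1,
        "median": median,
        "q3": q3,
    }


# Hypothetical edges: node "a" is a hub.
sample_edges = [("a", "b"), ("a", "c"), ("a", "d"), ("a", "e"), ("b", "c")]
summary = degree_summary(sample_edges)
print(summary)
```

Numbers like these answer the easy questions; the harder ones in the list above (patterns, anomalies, change over time) are where a visual, interactive view tends to help.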

Page 20: Visualize Big Graph Data

GRAPH VISUALIZATION

•  Due to the size of graphs and the complexity of questions, visualization is the natural tool to understand what’s going on

“We are more easily persuaded by the reasons we ourselves discover than by those which are given to us by others.” Blaise Pascal

Let me play with the data!

Direct manipulation

Page 21: Visualize Big Graph Data

DATA EXPLORATION AND INTERACTION

•  Use visualization and statistics to discover new hypotheses
   •  Exploratory data analysis
•  The user interface is centered around the human
•  Empowers the user to understand the structure and patterns in the data
•  The machine augments the human
•  How?
   •  Overview and details, zoom and pan interface
   •  Interactive, direct manipulation

“The greatest value of a picture is when it forces us to notice what we never expected to see.” John Tukey

Page 22: Visualize Big Graph Data

MAP YOUR DATA

•  Iterative process to transform relational data into a map
•  Use color, size and position to highlight, group and set up a hierarchy

Page 23: Visualize Big Graph Data

FROM INFORMATION TO KNOWLEDGE

•  Exploring networks interactively and iterating often provides “Eureka” moments for domain experts

Eureka

Page 24: Visualize Big Graph Data

BIG GRAPH DATA

•  Big graph data doesn’t necessarily mean you’re visualizing or analyzing a large graph
•  Small graphs can be extracted from large graphs and analyzed
•  Small graphs can be extracted from non-graph data as well
•  Graphs are just nodes and relationships after all
•  Example: Adverse Drug Event Analysis with Hadoop, R, and Gephi (Josh Wills, Cloudera, 2012)
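One common way to extract a small graph from a large one is to keep only the neighborhood of a node of interest (an "ego network"). The sketch below is one possible approach under that assumption, not the method used in the Cloudera example; the function name and edge list are hypothetical, and a real big-data pipeline would run the extraction on the storage side rather than in memory.

```python
from collections import defaultdict, deque


def ego_network(edges, center, radius=1):
    """Return the edges among nodes within `radius` hops of `center`."""
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    # Breadth-first search that stops expanding at the hop limit.
    distance = {center: 0}
    queue = deque([center])
    while queue:
        node = queue.popleft()
        if distance[node] == radius:
            continue
        for nxt in neighbors[node]:
            if nxt not in distance:
                distance[nxt] = distance[node] + 1
                queue.append(nxt)

    # Keep only edges whose two endpoints were both reached.
    return [(u, v) for u, v in edges if u in distance and v in distance]


# Hypothetical "large" graph, kept tiny here for readability.
big_edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"), ("a", "c"), ("e", "f")]
small = ego_network(big_edges, "a", radius=1)
print(small)
```

Raising `radius` trades readability for context: each extra hop can add many nodes, so small values are usually enough for a graph that stays legible on screen.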

Page 25: Visualize Big Graph Data

GEPHI

•  Built to solve large graph visualization problems
•  Open source tool for Windows, Mac OS X and Linux
•  Large international community involved
•  The latest version has been downloaded more than 100,000 times
•  Extensible with plug-ins
•  Available at http://gephi.org
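To get a graph into a tool like Gephi, a plain edge table is often enough. The sketch below writes a weighted edge list as CSV with `Source`, `Target`, and `Weight` columns, which follows the convention used by Gephi's spreadsheet import; the exact import behavior should be checked against the Gephi version in use. The function name and sample edges are hypothetical.

```python
import csv
import io


def write_edge_csv(weighted_edges, handle):
    """Write {(source, target): weight} as a Source,Target,Weight CSV table."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    for (source, target), weight in sorted(weighted_edges.items()):
        writer.writerow([source, target, weight])


# An in-memory buffer stands in for a file so the sketch has no side effects.
buffer = io.StringIO()
write_edge_csv({("graphs", "visualization"): 2, ("hadoop", "python"): 1}, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To write to disk instead, pass a file opened with `open("edges.csv", "w", newline="")` as `handle`.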

Page 26: Visualize Big Graph Data

GEPHI


VISUALIZATION

LAYOUT

FILTER

STATISTICS

TIMELINE

VISUAL MAPPING

DATA EDITION

Page 27: Visualize Big Graph Data

SIGMA.JS

•  Open-source, lightweight JavaScript library to draw graphs
•  Uses HTML5 Canvas
•  Dynamically displays graphs that can be generated on the fly
•  Available at http://sigmajs.org

Sigma.js v0.1

Page 28: Visualize Big Graph Data

SUMMARY

•  Big graph data = Relational Big Data
•  Graphs are everywhere!
•  Graphs have fascinating structure and patterns one can analyze
•  Visualization is a natural tool for such complex data and complex questions
•  On graphs, visualization done right allows interaction and iteration. Play.
•  The hard part is to extract a small or medium graph from big data
•  Open source tools like Gephi or Sigma.js are a good start

Page 29: Visualize Big Graph Data

Become a graph evangelist!

QUESTIONS?

Mathieu Bastian (@mathieubastian)


Page 30: Visualize Big Graph Data

REFERENCES & LINKS

Join the Social Network Analysis class by Lada Adamic on Coursera https://www.coursera.org/course/sna
Support the Gephi Consortium http://consortium.gephi.org
Computational Information Design, Ben Fry (2004) http://benfry.com/phd/
The Atlas of Economic Complexity, Harvard's Center for International Development (CID) and the MIT Media Lab http://atlas.media.mit.edu/
The Mesh of Civilizations and International Email Flows, Bogdan State, Patrick Park, Ingmar Weber, Yelena Mejova, Michael Macy http://arxiv.org/abs/1303.0045
The Human Disease Network, Goh K-I, Cusick ME, Valle D, Childs B, Vidal M, Barabási A-L (2007) http://www.pnas.org/content/104/21/8685.full
What does your intranet look like? http://intranetdiary.blogspot.co.uk/2012/11/network-visualisation.html
Recipe recommendation using ingredient networks, Chun-Yuen Teng, Yu-Ru Lin, Lada A. Adamic http://arxiv.org/abs/1111.3919
US Presidents Inaugural Speeches 1969-2013 Text Network Analysis http://noduslabs.com/cases/presidents-inaugural-speeches-text-network-analysis/
10 Reasons Why We Visualise Data http://www.slideshare.net/Facegroup/10-reasons-why-we-visualise-data
Sigma.js, Alexis Jacomy et al. http://sigmajs.org
Linked: How Everything Is Connected to Everything Else and What It Means, Albert-László Barabási http://www.amazon.com/gp/product/0452284392/
Six Degrees: The Science of a Connected Age, Duncan J. Watts http://www.amazon.com/gp/product/0393325423/
Nexus: Small Worlds and the Groundbreaking Science of Networks, Mark Buchanan http://www.amazon.com/gp/product/0393324427
Connected: The Surprising Power of Our Social Networks and How They Shape Our Lives, Nicholas A. Christakis and James H. Fowler http://www.amazon.com/dp/product/0316036137
Atelier Iceberg – Gephi http://www.slideshare.net/ateliericeberg/gephi-17680699
Adding Value through graph analysis using Titan and Faunus, Matthias Broecheler http://www.slideshare.net/knowfrominfo/titan-talk-ebaymarch2013
Network Maps Board on Pinterest, Mathieu Bastian http://pinterest.com/mathieubastian/network-maps/
Network Science Book, Albert-László Barabási http://barabasilab.neu.edu/networksciencebook
Adverse Drug Event Analysis with Hadoop, R, and Gephi, Cloudera https://github.com/cloudera/ades