Social Network Analysis in Your Problem Domain

GRAPH DAY TEXAS TALK Networks All Around Us: Discovering Networks in your Domain | 1/5/2015 Russell Jurney http://bit.ly/socialnetworkanalysis

Upload: russell-jurney

Post on 22-Jan-2018

TRANSCRIPT

Page 1: Social Network Analysis in Your Problem Domain

GRAPH DAY TEXAS TALK

Networks All Around Us: Discovering Networks in your Domain | 1/5/2015

Russell Jurney

http://bit.ly/socialnetworkanalysis

Page 2: Social Network Analysis in Your Problem Domain

RELATO MAPS

MARKET

Page 3: Social Network Analysis in Your Problem Domain

BACKGROUND

Serial Entrepreneur

Contributed code to Apache Druid, Apache Pig, Apache DataFu, Apache Whirr, Azkaban, MongoDB

Apache Committer

Three-time O'Reilly Author

Started & Shipped Product at E8 Security

Ning, LinkedIn, Hortonworks veteran

Page 4: Social Network Analysis in Your Problem Domain

2009 2010 2011

2012 2014

EXAMPLES OF NETWORKS

Page 5: Social Network Analysis in Your Problem Domain

FOUNDER

NETWORKS

node = company
edge = employment transition, as in people who worked at one startup, then founded another

Page 6: Social Network Analysis in Your Problem Domain

WEBSITE

BEHAVIOR

node = web page, edge = user browses one page, then another
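
One way to turn this definition into data is to count page-to-page transitions within each browsing session. The sketch below is only an illustration: the `sessions` list and the `clickstream_edges` helper are hypothetical names, and it assumes each session is already an ordered list of page paths.

```python
from collections import Counter


def clickstream_edges(sessions):
    """Count directed page-to-page transitions across browsing sessions."""
    edges = Counter()
    for pages in sessions:
        # Each consecutive pair of pages in a session is one directed edge.
        for src, dst in zip(pages, pages[1:]):
            edges[(src, dst)] += 1
    return edges


# Hypothetical sessions: each list is the ordered pages one user visited.
sessions = [
    ["/home", "/pricing", "/signup"],
    ["/home", "/blog", "/pricing"],
    ["/home", "/pricing"],
]
edges = clickstream_edges(sessions)
for (src, dst), weight in sorted(edges.items()):
    print(src, "->", dst, weight)
```

The counts act as edge weights, so heavily used paths through the site stand out from one-off transitions.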

Page 7: Social Network Analysis in Your Problem Domain

ONLINE SOCIAL

NETWORKS

node = LinkedIn profile, edge = LinkedIn connection

Page 8: Social Network Analysis in Your Problem Domain

EMAIL INBOX

node = email address, edge = sent email

Page 9: Social Network Analysis in Your Problem Domain

MARKETS

node = company, edge = partnership

Page 10: Social Network Analysis in Your Problem Domain

MARKET REPORTS

Page 11: Social Network Analysis in Your Problem Domain

TYPES OF NETWORKS

Page 12: Social Network Analysis in Your Problem Domain

TINKERPOP

“Marko Rodriguez is the Doug Cutting of graph analytics.” —Mark Twain

Page 13: Social Network Analysis in Your Problem Domain

PROPERTY

GRAPHS

Page 14: Social Network Analysis in Your Problem Domain

MULTI RELATIONAL TO SINGLE

RELATIONAL

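// TinkerPop 3 form of the slide's shorthand: keep only 'friend' edges
g.E().hasLabel('friend').subgraph('friends').cap('friends').next()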
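
Outside a graph database, the same reduction is a filter over a labeled edge list. This is a minimal sketch, assuming edges are stored as `(source, label, target)` tuples; the `single_relational` helper and the sample names are hypothetical.

```python
def single_relational(edges, label):
    """Keep only the edges carrying one label, dropping the label itself."""
    return [(src, dst) for src, edge_label, dst in edges if edge_label == label]


# Hypothetical multi-relational edge list: (source, label, target).
edges = [
    ("alice", "friend", "bob"),
    ("alice", "works_with", "carol"),
    ("bob", "friend", "carol"),
]
friends = single_relational(edges, "friend")
print(friends)
```

The resulting unlabeled edge list is the shape most SNA libraries expect as input.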

Page 15: Social Network Analysis in Your Problem Domain

// Write the sample graph to a GraphSON (JSON) file
final Graph g = TinkerFactory.createClassic();
try (final OutputStream os = new FileOutputStream("jsondump/links.json")) {
    GraphSONWriter.build().create().writeGraph(os, g);
}

EXPORT LINKS AS JSON

Page 16: Social Network Analysis in Your Problem Domain

THEN USE SNA

LIBRARIES

# Example: calculate friendship dispersion
import networkx as nx

di_graph = nx.DiGraph()

# util is the speaker's own helper module (not shown in the slides)
all_edges = util.json_cr_file_2_array('jsondump/links.json')

# Keep only partnership links, one directed edge per link
for edge in all_edges:
    if 'type' in edge and edge['type'] == 'partnership':
        di_graph.add_edge(edge['domain1'], edge['domain2'])

dispersion = nx.dispersion(di_graph)

Page 17: Social Network Analysis in Your Problem Domain

A PROPERTY GRAPH IN

EVERY DATABASE

Page 18: Social Network Analysis in Your Problem Domain

PROPERTY GRAPHS IN YOUR DOMAIN

identify entities
identify relationships
specify schema (or not)
populate graph database
learn to think in graph walks (hard)
query in batch
query in realtime

Page 19: Social Network Analysis in Your Problem Domain

POPULATING A PROPERTY GRAPH

// Add nodes
while ((json = company_reader.readLine()) != null) {
    document = jsonSlurper.parseText(json)
    v = graph.addVertex('company')
    v.property("_id", document._id)
    v.property("domain", document.domain)
    v.property("name", document.name)
}

Page 20: Social Network Analysis in Your Problem Domain

POPULATING A PROPERTY GRAPH

// Get a graph traverser
g = graph.traversal()

while ((json = links_reader.readLine()) != null) {
    document = jsonSlurper.parseText(json)

    // Add edges to graph
    v1 = g.V().has('domain', document.home_domain).next()
    v2 = g.V().has('domain', document.link_domain).next()
    v1.addEdge(document.type, v2)
}

Page 21: Social Network Analysis in Your Problem Domain

TOOLS OF

SNA

SNA = Social Network Analysis

centrality
clustering
block models
cores
dispersion
center-pieces

Page 22: Social Network Analysis in Your Problem Domain

CENTRALITY

Centrality is a way of measuring how central or important a particular node is in a social network.

OR

What nodes should I care about?

Page 23: Social Network Analysis in Your Problem Domain

SINGLE-RELATIONAL CENTRALITY(S)

// all-links-the-same-type-centrality
g.V().out().groupCount()

// things-humans-walk-centrality
g.V().hasLabel('human').out('walks').groupCount()

// things-dogs-eat-centrality
g.V().hasLabel('dog').out('eats').groupCount()

Page 24: Social Network Analysis in Your Problem Domain

MULTI-RELATIONAL CENTRALITY(S)

// things-eaten-by-things-humans-walk-centrality
g.V().hasLabel('human').out('walks').out('eats').groupCount()

// things-hated-by-things-humans-pet-centrality
g.V().hasLabel('human').out('pets').out('hates').groupCount()

// things-that-pet-things-that-eat-mice-centrality
g.V().in('eats').in('pets').groupCount()
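
To see what these walks compute, here is a rough Python analogue of the first traversal: follow one edge label per hop from a set of start nodes, then count where the walkers end up. It is a sketch only; the `walk_count` helper and the sample pets are hypothetical, and it assumes edges are `(source, label, target)` tuples.

```python
from collections import Counter, defaultdict


def walk_count(edges, starts, labels):
    """Follow one edge label per hop from the start nodes, then count endpoints."""
    out = defaultdict(list)
    for src, label, dst in edges:
        out[(src, label)].append(dst)
    frontier = list(starts)
    for label in labels:
        # Every walker moves along each matching edge; walkers with none drop out.
        frontier = [dst for node in frontier for dst in out.get((node, label), [])]
    return Counter(frontier)


# Hypothetical data: two humans, two dogs, and what the dogs eat.
edges = [
    ("ann", "walks", "rex"),
    ("bo", "walks", "rex"),
    ("bo", "walks", "fido"),
    ("rex", "eats", "kibble"),
    ("fido", "eats", "kibble"),
    ("fido", "eats", "bone"),
]
counts = walk_count(edges, ["ann", "bo"], ["walks", "eats"])
print(counts.most_common())
```

Each distinct path from a human to a food item contributes one count, so items reachable along many walks score highest.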

Page 25: Social Network Analysis in Your Problem Domain

CENTRALITIES

degree centrality
closeness centrality
betweenness centrality
eigenvector centrality

Page 26: Social Network Analysis in Your Problem Domain

DEGREE CENTRALITY

in-degree centrality is nice… it works even if you’re missing a node’s outbound links

Page 27: Social Network Analysis in Your Problem Domain

DEGREE CENTRALITY

# computation
count connections… it's that simple
in-degree centrality = popularity
out-degree centrality = gregariousness

# meaning
risk of catching a cold
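
Counting connections takes only a few lines of Python. A minimal sketch, assuming a directed edge list of `(source, target)` pairs; the `degree_counts` helper and the sample edges are hypothetical.

```python
from collections import Counter


def degree_counts(edges):
    """Return (in_degree, out_degree) counters for a directed edge list."""
    in_degree = Counter(dst for _, dst in edges)
    out_degree = Counter(src for src, _ in edges)
    return in_degree, out_degree


# Hypothetical directed edges: (source, target).
edges = [("a", "c"), ("b", "c"), ("c", "d"), ("a", "d")]
in_degree, out_degree = degree_counts(edges)
print("popularity:", dict(in_degree))
print("gregariousness:", dict(out_degree))
```

Note that `in_degree` needs only the inbound side of each edge, which is why it still works when a node's outbound links are missing from the data.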

Page 28: Social Network Analysis in Your Problem Domain

DEGREE CENTRALITY IN GREMLIN

// all-links-the-same-type-centrality
g.V().out().groupCount()

Page 29: Social Network Analysis in Your Problem Domain

CLOSENESS CENTRALITY

# computation
count hops of all shortest paths
distance from all other nodes
reciprocal of farness

# meaning
communication efficiency
spread of information
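
The "reciprocal of farness" computation can be sketched with a breadth-first search. This assumes an unweighted, connected, undirected graph stored as an adjacency dict; the `closeness` helper is a hypothetical name, and the score is left unnormalized (libraries commonly rescale it by the number of other nodes).

```python
from collections import deque


def closeness(adjacency, source):
    """Reciprocal of farness: 1 / (sum of shortest-path hops to reachable nodes)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in dist:
                # First visit in BFS order gives the shortest hop count.
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    farness = sum(dist.values())
    return 1.0 / farness if farness else 0.0


# Hypothetical undirected path graph: a - b - c.
adjacency = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
for node in adjacency:
    print(node, closeness(adjacency, node))
```

The middle node scores highest because it is, on average, the fewest hops from everyone else.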

Page 30: Social Network Analysis in Your Problem Domain

CLOSENESS CENTRALITY IN GREMLIN

closenessCentrality = g.V().as("a").
    repeat(both('relationship_type').simplePath()).emit().as("b").
    dedup().by(select("a", "b")).
    path().
    group().by(limit(local, 1)).
            by(count(local).map {1 / it.get()}.sum())

Page 31: Social Network Analysis in Your Problem Domain

BETWEENNESS CENTRALITY

# computation
count of times node appears in shortest paths between all pairs of nodes

# meaning
control of communication between other nodes
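
The slides give no Gremlin for betweenness, so here is a Python sketch using Brandes' algorithm, a standard way to count shortest-path appearances without enumerating every path. It assumes an unweighted graph stored as an adjacency dict; the `betweenness` helper is a hypothetical name, and scores count ordered pairs, so halve them for an undirected graph.

```python
from collections import deque


def betweenness(adjacency):
    """Brandes' algorithm for unweighted graphs; counts ordered (s, t) pairs."""
    score = {v: 0.0 for v in adjacency}
    for s in adjacency:
        # BFS from s, recording shortest-path counts and predecessors.
        order = []
        preds = {v: [] for v in adjacency}
        sigma = {v: 0 for v in adjacency}
        dist = {v: -1 for v in adjacency}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adjacency[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating dependencies.
        delta = {v: 0.0 for v in adjacency}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    return score


# Hypothetical undirected path graph: a - b - c.
adjacency = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
scores = betweenness(adjacency)
print(scores)
```

Only the middle node sits on a shortest path between two others, which is the "control of communication" the slide describes.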

Page 32: Social Network Analysis in Your Problem Domain

EIGENVECTOR CENTRALITY

# computation
counts connections of connected nodes
more connected neighbors matter more

# meaning
influence of one node on others
PageRank is an eigenvector centrality
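
"More connected neighbors matter more" can be sketched as a power iteration: each node repeatedly takes on the scores of its neighbors until the scores settle. This is an illustration under stated assumptions, not PageRank itself: it assumes an undirected graph as an adjacency dict, the `eigenvector_centrality` helper is a hypothetical name, and each node's own score is added in every round so the iteration also converges on bipartite graphs such as the star below.

```python
import math


def eigenvector_centrality(adjacency, iterations=100):
    """Power iteration on (A + I), normalized to unit length each round."""
    score = {v: 1.0 for v in adjacency}
    for _ in range(iterations):
        # New score = own score plus the scores of all neighbors.
        nxt = {v: score[v] + sum(score[u] for u in adjacency[v]) for v in adjacency}
        norm = math.sqrt(sum(x * x for x in nxt.values()))
        score = {v: x / norm for v, x in nxt.items()}
    return score


# Hypothetical star graph: one hub connected to three leaves.
adjacency = {
    "hub": ["x", "y", "z"],
    "x": ["hub"],
    "y": ["hub"],
    "z": ["hub"],
}
scores = eigenvector_centrality(adjacency)
for node, value in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(node, round(value, 4))
```

Adding the identity shifts the eigenvalues but leaves the eigenvectors unchanged, so the ranking matches plain eigenvector centrality.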

Page 33: Social Network Analysis in Your Problem Domain

EIGENVECTOR CENTRALITY IN GREMLIN

g.V().
    repeat(out('relationship_type').groupCount('m').by('unique_key')).
    times(n).cap('m')

Page 34: Social Network Analysis in Your Problem Domain

CLUSTERING

Page 35: Social Network Analysis in Your Problem Domain

CLUSTERING

property based clustering: k-means
graph based clustering: modularity
property graph based clustering: CESNA
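
For the graph-based case, modularity scores how much denser the links inside clusters are than chance would predict. The sketch below only evaluates a given partition; it does not search for the best one. It assumes an undirected edge list, and the `modularity` helper, the `community_of` mapping, and the sample graph are hypothetical.

```python
def modularity(edges, community_of):
    """Newman modularity Q of a partition of an undirected, unweighted graph."""
    m = len(edges)
    internal = {}  # edges with both endpoints inside a community
    degree = {}    # total degree of the nodes in a community
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(
        internal.get(c, 0) / m - (d / (2 * m)) ** 2 for c, d in degree.items()
    )


# Hypothetical graph: two triangles joined by a single bridge edge (c - d).
edges = [
    ("a", "b"), ("b", "c"), ("a", "c"),
    ("d", "e"), ("e", "f"), ("d", "f"),
    ("c", "d"),
]
two_clusters = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_cluster = {node: 0 for node in two_clusters}
q_two = modularity(edges, two_clusters)
q_one = modularity(edges, one_cluster)
print(q_two, q_one)
```

Splitting at the bridge scores higher than lumping everything together, which is the signal modularity-based clustering methods optimize.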

Page 36: Social Network Analysis in Your Problem Domain

BLOCK MODELS

how much do clusters connect?
are links reciprocal?
Circos plots are helpful
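
Both questions can be answered with simple counts once every node has a cluster label. A minimal sketch, assuming directed `(source, target)` edges; the `block_counts` and `reciprocity` helpers, the `cluster_of` mapping, and the sample data are hypothetical.

```python
from collections import Counter


def block_counts(edges, cluster_of):
    """Count directed edges between each ordered pair of clusters."""
    return Counter((cluster_of[src], cluster_of[dst]) for src, dst in edges)


def reciprocity(edges):
    """Fraction of distinct directed edges whose reverse edge also exists."""
    edge_set = set(edges)
    mutual = sum(1 for src, dst in edge_set if (dst, src) in edge_set)
    return mutual / len(edge_set) if edge_set else 0.0


# Hypothetical directed edges and cluster assignments.
edges = [("a", "b"), ("b", "a"), ("a", "c"), ("c", "d")]
cluster_of = {"a": "X", "b": "X", "c": "Y", "d": "Y"}
blocks = block_counts(edges, cluster_of)
print(dict(blocks))
print(reciprocity(edges))
```

The block counts form the cluster-to-cluster matrix that a chord-style diagram would draw.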

Page 37: Social Network Analysis in Your Problem Domain

CORES

Page 38: Social Network Analysis in Your Problem Domain

DISPERSION

Romantic Partnerships and the Dispersion of Social Ties: A Network Analysis of Relationship Status on Facebook

Page 39: Social Network Analysis in Your Problem Domain

CENTER-PIECE SUBGRAPHS

*Slide stolen from Tong, Faloutsos, Pan

Page 40: Social Network Analysis in Your Problem Domain

Russell Jurney, CEO [email protected] twitter.com/rjurney 404-317-3620

http://bit.ly/socialnetworkanalysis