introduction to computational graph analytics - lecture...

52
Introduction to Computational Graph Analytics Lecture 1 CSCI 4974/6971 29 August 2016 1/6

Upload: others

Post on 16-Jul-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Introduction to Computational Graph Analytics - Lecture 1slotag/classes/FA16/slides/lec01-intro.pdf · Slides from Marta Arias & R. Ferrer-i-Cancho, Intro to Complex and Social Networks

Introduction to Computational Graph

AnalyticsLecture 1

CSCI 4974/6971

29 August 2016

1 / 6

Page 2: Introduction to Computational Graph Analytics - Lecture 1slotag/classes/FA16/slides/lec01-intro.pdf · Slides from Marta Arias & R. Ferrer-i-Cancho, Intro to Complex and Social Networks

Graph, networks, and characteristics of real-world dataSlides from Marta Arias & R. Ferrer-i-Cancho, Intro to

Complex and Social Networks

2 / 6

Page 3: Introduction to Computational Graph Analytics - Lecture 1slotag/classes/FA16/slides/lec01-intro.pdf · Slides from Marta Arias & R. Ferrer-i-Cancho, Intro to Complex and Social Networks

Presentation and course logisticsIntro to Network Analysis

Examples of real networksMeasuring and modeling networks

So, let’s start! Today, we’ll see:

1. Examples of real networks

2. What do real networks look like?I real networks exhibit small diameter

I .. and so does the Erdos-Renyi or random model

I real networks have high clustering coefficientI .. and so does the Watts-Strogatz model

I real networks’ degree distribution follows a power-lawI .. and so does the Barabasi-Albert or preferential attachment

model

Marta Arias & R. Ferrer-i-Cancho Intro to Complex and Social Networks

Page 4: Introduction to Computational Graph Analytics - Lecture 1slotag/classes/FA16/slides/lec01-intro.pdf · Slides from Marta Arias & R. Ferrer-i-Cancho, Intro to Complex and Social Networks

Presentation and course logisticsIntro to Network Analysis

Examples of real networksMeasuring and modeling networks

Examples of real networks

I Social networks

I Information networks

I Technological networks

I Biological networks

Marta Arias & R. Ferrer-i-Cancho Intro to Complex and Social Networks

Page 5: Introduction to Computational Graph Analytics - Lecture 1slotag/classes/FA16/slides/lec01-intro.pdf · Slides from Marta Arias & R. Ferrer-i-Cancho, Intro to Complex and Social Networks

Presentation and course logisticsIntro to Network Analysis

Examples of real networksMeasuring and modeling networks

Social networks

Links denote social “interactions”I friendship, collaborations, e-mail, etc.

Marta Arias & R. Ferrer-i-Cancho Intro to Complex and Social Networks

Page 6: Introduction to Computational Graph Analytics - Lecture 1slotag/classes/FA16/slides/lec01-intro.pdf · Slides from Marta Arias & R. Ferrer-i-Cancho, Intro to Complex and Social Networks

Presentation and course logisticsIntro to Network Analysis

Examples of real networksMeasuring and modeling networks

Information networks

Nodes store information, links associate information

I citation networks, the web, p2p networks, etc.

Marta Arias & R. Ferrer-i-Cancho Intro to Complex and Social Networks

Page 7: Introduction to Computational Graph Analytics - Lecture 1slotag/classes/FA16/slides/lec01-intro.pdf · Slides from Marta Arias & R. Ferrer-i-Cancho, Intro to Complex and Social Networks

Presentation and course logisticsIntro to Network Analysis

Examples of real networksMeasuring and modeling networks

Technological networks

Man-built for the distribution of a commodity

I telephone networks, power grids, transportation networks, etc.

Marta Arias & R. Ferrer-i-Cancho Intro to Complex and Social Networks

Page 8: Introduction to Computational Graph Analytics - Lecture 1slotag/classes/FA16/slides/lec01-intro.pdf · Slides from Marta Arias & R. Ferrer-i-Cancho, Intro to Complex and Social Networks

Presentation and course logisticsIntro to Network Analysis

Examples of real networksMeasuring and modeling networks

Biological networks

Represent biological systems

I protein-protein interaction networks, gene regulation networks,metabolic pathways, etc.

Marta Arias & R. Ferrer-i-Cancho Intro to Complex and Social Networks

Page 9: Introduction to Computational Graph Analytics - Lecture 1slotag/classes/FA16/slides/lec01-intro.pdf · Slides from Marta Arias & R. Ferrer-i-Cancho, Intro to Complex and Social Networks

Presentation and course logisticsIntro to Network Analysis

Examples of real networksMeasuring and modeling networks

Representing networks

I Network ≡ Graph

I Networks are just collections of “points” joined by “lines”

points lines

vertices edges, arcs mathnodes links computer sciencesites bonds physicsactors ties, relations sociology

Marta Arias & R. Ferrer-i-Cancho Intro to Complex and Social Networks

Page 10: Introduction to Computational Graph Analytics - Lecture 1slotag/classes/FA16/slides/lec01-intro.pdf · Slides from Marta Arias & R. Ferrer-i-Cancho, Intro to Complex and Social Networks

Presentation and course logisticsIntro to Network Analysis

Examples of real networksMeasuring and modeling networks

Types of networksFrom [Newman, 2003]

(a) unweighted,undirected

(b) discrete vertex andedge types,undirected

(c) varying vertex andedge weights,undirected

(d) directed

Marta Arias & R. Ferrer-i-Cancho Intro to Complex and Social Networks

Page 11: Introduction to Computational Graph Analytics - Lecture 1slotag/classes/FA16/slides/lec01-intro.pdf · Slides from Marta Arias & R. Ferrer-i-Cancho, Intro to Complex and Social Networks

Presentation and course logisticsIntro to Network Analysis

Examples of real networksMeasuring and modeling networks

Small-world phenomenon

I A friend of a friend is also frequently a friend

I Only 6 hops separate any two people in the world

Marta Arias & R. Ferrer-i-Cancho Intro to Complex and Social Networks

Page 12: Introduction to Computational Graph Analytics - Lecture 1slotag/classes/FA16/slides/lec01-intro.pdf · Slides from Marta Arias & R. Ferrer-i-Cancho, Intro to Complex and Social Networks

Presentation and course logisticsIntro to Network Analysis

Examples of real networksMeasuring and modeling networks

Measuring the small-world phenomenon, I

I Let dij be the shortest-path distance between nodes i and jI To check whether “any two nodes are within 6 hops”, we use:

I The diameter (longest shortest-path distance) as

d = maxi,j

dij

I The average shortest-path length as

l =2

n (n + 1)

i>j

dij

I The harmonic mean shortest-path length as

l−1 =2

n (n + 1)

i>j

d−1ij

Marta Arias & R. Ferrer-i-Cancho Intro to Complex and Social Networks

Page 13: Introduction to Computational Graph Analytics - Lecture 1slotag/classes/FA16/slides/lec01-intro.pdf · Slides from Marta Arias & R. Ferrer-i-Cancho, Intro to Complex and Social Networks

Presentation and course logisticsIntro to Network Analysis

Examples of real networksMeasuring and modeling networks

From [Newman, 2003]

Marta Arias & R. Ferrer-i-Cancho Intro to Complex and Social Networks

Page 14: Introduction to Computational Graph Analytics - Lecture 1slotag/classes/FA16/slides/lec01-intro.pdf · Slides from Marta Arias & R. Ferrer-i-Cancho, Intro to Complex and Social Networks

Presentation and course logisticsIntro to Network Analysis

Examples of real networksMeasuring and modeling networks

Degree distribution

Histogram of nr of nodes having a particular degree

fk = fraction of nodes of degree k

Marta Arias & R. Ferrer-i-Cancho Intro to Complex and Social Networks

Page 15: Introduction to Computational Graph Analytics - Lecture 1slotag/classes/FA16/slides/lec01-intro.pdf · Slides from Marta Arias & R. Ferrer-i-Cancho, Intro to Complex and Social Networks

Presentation and course logisticsIntro to Network Analysis

Examples of real networksMeasuring and modeling networks

Scale-free networks

The degree distribution of most real-world networks follows apower-law distribution

fk = ck−α

I “heavy-tail” distribution, impliesexistence of hubs

I hubs are nodes with very high degree

Marta Arias & R. Ferrer-i-Cancho Intro to Complex and Social Networks

Page 16: Introduction to Computational Graph Analytics - Lecture 1slotag/classes/FA16/slides/lec01-intro.pdf · Slides from Marta Arias & R. Ferrer-i-Cancho, Intro to Complex and Social Networks

How to Analyze NetworksSlides from Johannes Putzke, Social Network Analysis: Basic

Concepts, Methods & Theory

3 / 6

Page 17: Introduction to Computational Graph Analytics - Lecture 1slotag/classes/FA16/slides/lec01-intro.pdf · Slides from Marta Arias & R. Ferrer-i-Cancho, Intro to Complex and Social Networks

Different Levels of Analysis

Actor-Level

Dyad-Level

Triad-Level

Subset-level (cliques / subgraphs)

Group (i.e. global) level

Folie: 35

Page 18: Introduction to Computational Graph Analytics - Lecture 1slotag/classes/FA16/slides/lec01-intro.pdf · Slides from Marta Arias & R. Ferrer-i-Cancho, Intro to Complex and Social Networks

Example: Centrality Measures

Who is the most prominent?

Who knows the most actors?

(Degree Centrality)

Who has the shortest distance to the other actors?

Who controls knowledge flows?

...

Folie: 18

Page 19: Introduction to Computational Graph Analytics - Lecture 1slotag/classes/FA16/slides/lec01-intro.pdf · Slides from Marta Arias & R. Ferrer-i-Cancho, Intro to Complex and Social Networks

Closeness Centrality Who knows the most

actors?

Who has the shortest distance to the other actors? (Closesness Centrality)

Who controls knowledge flows?

...

Folie: 40

Page 20: Introduction to Computational Graph Analytics - Lecture 1slotag/classes/FA16/slides/lec01-intro.pdf · Slides from Marta Arias & R. Ferrer-i-Cancho, Intro to Complex and Social Networks

Betweenness Centrality Who knows the most actors?

Who has the shortest distance to the other actors?

Who controls knowledge flows?

(Betweenness Centrality)

...

Folie: 44

19

1 2

4 3 6

5

9 8

7

10 11 12

14

17 18

13 15

16

Page 21: Introduction to Computational Graph Analytics - Lecture 1slotag/classes/FA16/slides/lec01-intro.pdf · Slides from Marta Arias & R. Ferrer-i-Cancho, Intro to Complex and Social Networks

Reachability, Distances and Diameter

Reachability If there is a path between

nodes ni and nj

Geodesic Shortest path between two nodes

(Geodesic) Distance d(i,j) Length of Geodesic (also called „degrees of separation“)

Folie: 27

Page 22: Introduction to Computational Graph Analytics - Lecture 1slotag/classes/FA16/slides/lec01-intro.pdf · Slides from Marta Arias & R. Ferrer-i-Cancho, Intro to Complex and Social Networks

Diameter of a Graph and Average Geodesic Distance

Diameter

Largest geodesic distance between any pair of nodes

Average Geodesic Distance

How fast can information get transmitted?

Folie: 50

19

1 2

4 3 6

5

9 8

7

10 11 12

14

17 18

13 15

16

Page 23: Introduction to Computational Graph Analytics - Lecture 1slotag/classes/FA16/slides/lec01-intro.pdf · Slides from Marta Arias & R. Ferrer-i-Cancho, Intro to Complex and Social Networks

Density

Proportion of ties in a graph

High density (44%)

Low density (14%)

Folie: 51

Page 24: Introduction to Computational Graph Analytics - Lecture 1slotag/classes/FA16/slides/lec01-intro.pdf · Slides from Marta Arias & R. Ferrer-i-Cancho, Intro to Complex and Social Networks

Connectivity of Graphs

Folie: 58

Page 25: Introduction to Computational Graph Analytics - Lecture 1slotag/classes/FA16/slides/lec01-intro.pdf · Slides from Marta Arias & R. Ferrer-i-Cancho, Intro to Complex and Social Networks

Connected Graphs, Components, Cutpoints and Bridges

Connectedness A graph is connected if there

is a path between every pair of nodes

Components Connected subgraphs in a

graph

Connected graph has 1 component

Two disconnected graphs are one social network!!!

Folie: 59

Page 26: Introduction to Computational Graph Analytics - Lecture 1slotag/classes/FA16/slides/lec01-intro.pdf · Slides from Marta Arias & R. Ferrer-i-Cancho, Intro to Complex and Social Networks

Connected Graphs, Components, Cutpoints and Bridges

Connectivity of pairs of nodes and graphs Weakly connected

Joined by semipath

Unilaterally connected

Path from nj to nj or from nj to nj

Strongly connected

Path from nj to nj and from nj to nj

Path may contain different nodes

Recursively Connected

Nodes are strongly connected and both paths use the same nodes and arcs in reverse order

Folie: 60

n4 n2 n3 n1

n4 n2 n3 n1

n4 n2 n3

n1

n4 n2 n3 n1

n5 n6

n4 n2 n3 n1

Page 27: Introduction to Computational Graph Analytics - Lecture 1slotag/classes/FA16/slides/lec01-intro.pdf · Slides from Marta Arias & R. Ferrer-i-Cancho, Intro to Complex and Social Networks

Connected Graphs, Components, Cutpoints and Bridges

Cutpoints number of components in

the graph that contain node nj is fewer than number of components in subgraphs that results from deleting nj from the graph

Cutsets (of size k) k-node cut

Bridges / line cuts Number of

components…that contain line lk

Folie: 61

19

1 2

4 3 6

5

9 8

7

10 11 12

14

17 18

13 15

16

Page 28: Introduction to Computational Graph Analytics - Lecture 1slotag/classes/FA16/slides/lec01-intro.pdf · Slides from Marta Arias & R. Ferrer-i-Cancho, Intro to Complex and Social Networks

Node- and Line Connectivity How vulnerable is a graph to removal of nodes or lines?

Point connectivity /

Node connectivity Minimum number of k for

which the graph has a k-node cut

For any value <k the graph is k-node-connected

Line connectivity / Edge connectivity

Minimum number λ for which for which graph has a λ-line cut

Folie: 62

Page 29: Introduction to Computational Graph Analytics - Lecture 1slotag/classes/FA16/slides/lec01-intro.pdf · Slides from Marta Arias & R. Ferrer-i-Cancho, Intro to Complex and Social Networks

How to Analyze Networks (cont.)Slides from Jon Crowcroft, Introduction to Network Theory

4 / 6

Page 30: Introduction to Computational Graph Analytics - Lecture 1slotag/classes/FA16/slides/lec01-intro.pdf · Slides from Marta Arias & R. Ferrer-i-Cancho, Intro to Complex and Social Networks

SubgraphSubgraph

Vertex and edge sets are subsets of those of GVertex and edge sets are subsets of those of G a a supergraphsupergraph of a graph G is a graph that contains G as a of a graph G is a graph that contains G as a

subgraphsubgraph..

Page 31: Introduction to Computational Graph Analytics - Lecture 1slotag/classes/FA16/slides/lec01-intro.pdf · Slides from Marta Arias & R. Ferrer-i-Cancho, Intro to Complex and Social Networks

IsomorphismIsomorphism

BijectionBijection, i.e., a one-to-one mapping:, i.e., a one-to-one mapping:f : V(G) -> V(H)f : V(G) -> V(H)

u and v from G are adjacent if and only if f(u) and f(v) areu and v from G are adjacent if and only if f(u) and f(v) areadjacent in H.adjacent in H.

If an isomorphism can be constructed between two graphs, thenIf an isomorphism can be constructed between two graphs, thenwe say those graphs are we say those graphs are isomorphicisomorphic..

Page 32: Introduction to Computational Graph Analytics - Lecture 1slotag/classes/FA16/slides/lec01-intro.pdf · Slides from Marta Arias & R. Ferrer-i-Cancho, Intro to Complex and Social Networks

Isomorphism ProblemIsomorphism Problem

Determining whether two graphs areDetermining whether two graphs areisomorphicisomorphic

Although these graphs look very different,Although these graphs look very different,they are isomorphic; one isomorphismthey are isomorphic; one isomorphismbetween them isbetween them isf(a)=1 f(b)=6 f(c)=8 f(d)=3f(a)=1 f(b)=6 f(c)=8 f(d)=3f(g)=5 f(h)=2 f(i)=4 f(j)=7f(g)=5 f(h)=2 f(i)=4 f(j)=7

Page 33: Introduction to Computational Graph Analytics - Lecture 1slotag/classes/FA16/slides/lec01-intro.pdf · Slides from Marta Arias & R. Ferrer-i-Cancho, Intro to Complex and Social Networks

Analyzing using subgraph counting (more cont.)

5 / 6

Page 34: Introduction to Computational Graph Analytics - Lecture 1slotag/classes/FA16/slides/lec01-intro.pdf · Slides from Marta Arias & R. Ferrer-i-Cancho, Intro to Complex and Social Networks

Subgraph Counting

13 / 1

Page 35: Introduction to Computational Graph Analytics - Lecture 1slotag/classes/FA16/slides/lec01-intro.pdf · Slides from Marta Arias & R. Ferrer-i-Cancho, Intro to Complex and Social Networks

Subgraph Counting

13 / 1

Page 36: Introduction to Computational Graph Analytics - Lecture 1slotag/classes/FA16/slides/lec01-intro.pdf · Slides from Marta Arias & R. Ferrer-i-Cancho, Intro to Complex and Social Networks

Subgraph Counting

13 / 1

Page 37: Introduction to Computational Graph Analytics - Lecture 1slotag/classes/FA16/slides/lec01-intro.pdf · Slides from Marta Arias & R. Ferrer-i-Cancho, Intro to Complex and Social Networks

Subgraph Counting

13 / 1

Page 38: Introduction to Computational Graph Analytics - Lecture 1slotag/classes/FA16/slides/lec01-intro.pdf · Slides from Marta Arias & R. Ferrer-i-Cancho, Intro to Complex and Social Networks

Subgraph Counting

13 / 1

Page 39: Introduction to Computational Graph Analytics - Lecture 1slotag/classes/FA16/slides/lec01-intro.pdf · Slides from Marta Arias & R. Ferrer-i-Cancho, Intro to Complex and Social Networks

Motivations for Subgraph Counting, Path FindingWhy do we want fast algorithms for subgraph counting and weighted path finding?

Important to social network analysis, communicationnetwork analysis,bioinformatics, chemoinformatics, etc.

Forms basis of more complex analysis

Motif finding, anomaly detectionGraphlet frequency distance (GFD)Graphlet degree distributions (GDD)Graphlet degree signatures (GDS)

Counting and enumeration on large networks is verytough, O(nk) complexity for naıve algorithm

Finding minimum-weight paths – NP-hard problem

14 / 1

Page 40: Introduction to Computational Graph Analytics - Lecture 1slotag/classes/FA16/slides/lec01-intro.pdf · Slides from Marta Arias & R. Ferrer-i-Cancho, Intro to Complex and Social Networks

Motif Finding

Motif finding: Look for all subgraphs of a certain size (andstructure)Highly occuring subgraphs can have structural significance

● ●●

●●

● ●

0.1

1.0

1 2 3 4 5 6 7 8 9 10 11Subgraph

Rel

ativ

e F

requ

ency

●●●●E.coli S.cerevisiae H.pylori C.elegans

15 / 1

Page 41: Introduction to Computational Graph Analytics - Lecture 1slotag/classes/FA16/slides/lec01-intro.pdf · Slides from Marta Arias & R. Ferrer-i-Cancho, Intro to Complex and Social Networks

Graphlet Frequency Distance Analysis

GFD: Numerically compare occurrence frequency to othernetworksSi(G ) – relative frequency for subgraph i in graph GCi – counts of subgraph iD(G ,H) – frequency distance between two graphs G ,H

16 / 1

Page 42: Introduction to Computational Graph Analytics - Lecture 1slotag/classes/FA16/slides/lec01-intro.pdf · Slides from Marta Arias & R. Ferrer-i-Cancho, Intro to Complex and Social Networks

Graphlet Frequency Distance Analysis

GFD: Numerically compare occurrence frequency to other networksHeatmap of distances between many networks (red = similar, white =dissimilar)Note occurrence of high intra-network type similarities

17 / 1

Page 43: Introduction to Computational Graph Analytics - Lecture 1slotag/classes/FA16/slides/lec01-intro.pdf · Slides from Marta Arias & R. Ferrer-i-Cancho, Intro to Complex and Social Networks

Computational Aspects for Massive GraphsAKA why efficient parallelization is important - i.e. the point

of this class

6 / 6

Page 44: Introduction to Computational Graph Analytics - Lecture 1slotag/classes/FA16/slides/lec01-intro.pdf · Slides from Marta Arias & R. Ferrer-i-Cancho, Intro to Complex and Social Networks

What?

3 / 23

Page 45: Introduction to Computational Graph Analytics - Lecture 1slotag/classes/FA16/slides/lec01-intro.pdf · Slides from Marta Arias & R. Ferrer-i-Cancho, Intro to Complex and Social Networks

Graph Analytics and HPC

Or, given modern extreme-scale graph-structured datasets(web crawls, brain graphs, human interaction networks) andmodern high performance computing systems (Blue Waters),how can we develop a generalized approach to efficiently studysuch datasets on such systems?

4 / 23

Page 46: Introduction to Computational Graph Analytics - Lecture 1slotag/classes/FA16/slides/lec01-intro.pdf · Slides from Marta Arias & R. Ferrer-i-Cancho, Intro to Complex and Social Networks

Why?

5 / 23

Page 47: Introduction to Computational Graph Analytics - Lecture 1slotag/classes/FA16/slides/lec01-intro.pdf · Slides from Marta Arias & R. Ferrer-i-Cancho, Intro to Complex and Social Networks

Why do want to study these large graphs?

Human Interaction Graphs:

I Finding hidden communities, individuals, malicious actors

I Observe how information and knowledge propagates

Brain Graphs:

I Study the topological properties of neural connections

I Finding latent computational substructures, similarities toother information processing systems

Web Crawls:

I Identifying trustworthy/important sites

I Spam networks, untrustworthy sites

6 / 23

Page 48: Introduction to Computational Graph Analytics - Lecture 1slotag/classes/FA16/slides/lec01-intro.pdf · Slides from Marta Arias & R. Ferrer-i-Cancho, Intro to Complex and Social Networks

Prior ApproachesCan we use them to analyze large graphs on HPC?

I Some limited by shared-memory and/or specialized hardwareI Some run in distributed memory but graph scale is still limitedI Others, graph scale isn’t limiting factor but performance can be

7 / 23

Page 49: Introduction to Computational Graph Analytics - Lecture 1slotag/classes/FA16/slides/lec01-intro.pdf · Slides from Marta Arias & R. Ferrer-i-Cancho, Intro to Complex and Social Networks

Graph analytics on HPCSo why do we want to run graph analytics on HPC?

I Scalability for analytic performance and graph sizeI Efficient implementations should be limited only by

distributed memory capacityI Graph500.org - demonstration of performance achievable

for irregular computations through breadth-first search(BFS)

I Relative availability of access in academic/researchcommunities

I Private clusters of various scales, shared supercomputersI Access for domain experts, those using analytics on

real-world graphs

Can we create an approach that is as simple to use asthe aforementioned frameworks but runs on commoncluster hardware and gives state-of-the-artperformance?

8 / 23

Page 50: Introduction to Computational Graph Analytics - Lecture 1slotag/classes/FA16/slides/lec01-intro.pdf · Slides from Marta Arias & R. Ferrer-i-Cancho, Intro to Complex and Social Networks

Challenges

9 / 23

Page 51: Introduction to Computational Graph Analytics - Lecture 1slotag/classes/FA16/slides/lec01-intro.pdf · Slides from Marta Arias & R. Ferrer-i-Cancho, Intro to Complex and Social Networks

Scale

I This work considers “extreme-scale” graphs – billion+vertices and up to trillion+ edges

I Processing these graphs requires at least hundreds tothousands of compute nodes or tens of thousands of cores

I Graph analytic algorithms are generally memory-boundinstead of compute-bound; in the distributed space, thisresults in a ratio of communication versus computationthat increases with core/node count

10 / 23

Page 52: Introduction to Computational Graph Analytics - Lecture 1slotag/classes/FA16/slides/lec01-intro.pdf · Slides from Marta Arias & R. Ferrer-i-Cancho, Intro to Complex and Social Networks

Complexity

I Real-world extreme-scale graphs have similarcharacteristics: small-world nature with skewed degreedistributions

I Small-world graphs are difficult to partition for distributedcomputation or to optimize in terms of cache due to “toomuch locality”

I Skewed degree distributions make efficient parallelizationand load balance difficult to achieve

I Multiple levels of cache/memory and increasing relianceon wide parallelism for modern HPC systems compoundsthe above challenges

11 / 23