cs224w: analysis of networks jure leskovec,

61
CS224W: Analysis of Networks Jure Leskovec, Stanford University http://cs224w.stanford.edu

Upload: others

Post on 04-Oct-2021

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: CS224W: Analysis of Networks Jure Leskovec,

CS224W: Analysis of NetworksJure Leskovec, Stanford University

http://cs224w.stanford.edu

Page 2: CS224W: Analysis of Networks Jure Leskovec,
Page 3: CS224W: Analysis of Networks Jure Leskovec,

9/29/19 Jure Leskovec, Stanford CS224W: Machine Learning with Graphs, cs224w.stanford.edu 3

Degree distribution: P(k)

Path length: h

Clustering coefficient: C

Connected components: s Definitions will be presented for undirected graphs, sometimes we will explicitly mention

extensions to directed graphs, and sometimes extensions will be obvious

Page 4: CS224W: Analysis of Networks Jure Leskovec,

¡ Degree distribution P(k): Probability that a randomly chosen node has degree k

Nk = # nodes with degree k¡ Normalized histogram:

P(k) = Nk / N ➔ plot

9/29/19 Jure Leskovec, Stanford CS224W: Machine Learning with Graphs, cs224w.stanford.edu 4

k

P(k)

1 2 3 4

0.10.20.30.40.50.6

For directed graphs we have separate in- and out-degree distributions.

Page 5: CS224W: Analysis of Networks Jure Leskovec,

¡ A path is a sequence of nodes in which each node is linked to the next one

¡ A path can intersect itself and pass through the same edge multiple times§ E.g.: ACBDCDEG

Pn = {i0,i1,i2,...,in}

Pn = {(i0 ,i1),(i1,i2 ),(i2 ,i3 ),...,(in-1,in )}

C

AB

D E

H

F

G

9/29/19 Jure Leskovec, Stanford CS224W: Machine Learning with Graphs, cs224w.stanford.edu 5

X

Page 6: CS224W: Analysis of Networks Jure Leskovec,

¡ Distance (shortest path, geodesic)between a pair of nodes is defined as the number of edges along the shortest path connecting the nodes

§ *If the two nodes are not connected, the distance is usually defined as infinite (or zero)

¡ In directed graphs, paths need to follow the direction of the arrows

§ Consequence: Distance is not symmetric: hB,C ≠ hC,B

9/29/19 Jure Leskovec, Stanford CS224W: Machine Learning with Graphs, cs224w.stanford.edu 6

B

A

D

C

B

A

D

C

hB,D = 2hA,X = ∞

hB,C = 1, hC,B = 2

X

Page 7: CS224W: Analysis of Networks Jure Leskovec,

¡ Diameter: The maximum (shortest path) distance between any pair of nodes in a graph

¡ Average path length for a connected graph or a strongly connected directed graph

§ Many times we compute the average only over the connected pairs of nodes (that is, we ignore “infinite” length paths)

§ Note that ths measure also applied to (strongly) connected components of a graph

9/29/19 Jure Leskovec, Stanford CS224W: Machine Learning with Graphs, cs224w.stanford.edu 7

å¹

=ijiijhE

h,max2

1 • hij is the distance from node i to node j• Emax is the max number of edges (total

number of node pairs) = n(n-1)/2

Page 8: CS224W: Analysis of Networks Jure Leskovec,

¡ Clustering coefficient (for undirected graphs): § How connected are i’s neighbors to each other?§ Node i with degree ki

§ Ci Î [0,1]

§

¡ Average clustering coefficient:

9/29/19 Jure Leskovec, Stanford CS224W: Machine Learning with Graphs, cs224w.stanford.edu 8

where ei is the number of edges between the neighbors of node i

å=N

iiCN

C 1

Clustering coefficientis undefined (or defined to be 0) for nodes with degree 0 or 1

Note 𝑘"(𝑘" − 1) is max number of edges between the 𝑘" neighbors

Page 9: CS224W: Analysis of Networks Jure Leskovec,

¡ Clustering coefficient (for undirected graps): § How connected are i’s neighbors to each other?§ Node i with degree ki

§

9/29/19 Jure Leskovec, Stanford CS224W: Machine Learning with Graphs, cs224w.stanford.edu 9

where ei is the number of edges between the neighbors of node i

C

AB

D E

H

F

G

kB=2, eB=1, CB=2/2 = 1

kD=4, eD=2, CD=4/12 = 1/3

Avg. clustering: C=0.33

Page 10: CS224W: Analysis of Networks Jure Leskovec,

¡ Size of the largest connected component § Largest set where any two vertices can be joined

by a path¡ Largest component = Giant component

9/29/19 Jure Leskovec, Stanford CS224W: Machine Learning with Graphs, cs224w.stanford.edu 10

How to find connected components:• Start from random node and perform

Breadth First Search (BFS)• Label the nodes that BFS visits• If all nodes are visited, the network is connected• Otherwise find an unvisited node and repeat BFS

DC

A

B

H

F

G

Page 11: CS224W: Analysis of Networks Jure Leskovec,

9/29/19 Jure Leskovec, Stanford CS224W: Machine Learning with Graphs, cs224w.stanford.edu 11

Degree distribution: P(k)

Path length: h

Clustering coefficient: C

Connected components: s

Page 12: CS224W: Analysis of Networks Jure Leskovec,
Page 13: CS224W: Analysis of Networks Jure Leskovec,

MSN Messenger:¡ 1 month of activity§ 245 million users logged in§ 180 million users engaged in

conversations§ More than 30 billion

conversations§ More than 255 billion

exchanged messages

9/29/19 Jure Leskovec, Stanford CS224W: Machine Learning with Graphs, cs224w.stanford.edu 13

Page 14: CS224W: Analysis of Networks Jure Leskovec,

149/29/19 Jure Leskovec, Stanford CS224W: Machine Learning with Graphs, cs224w.stanford.edu

Page 15: CS224W: Analysis of Networks Jure Leskovec,

Network: 180M people, 1.3B edges 159/29/19 Jure Leskovec, Stanford CS224W: Machine Learning with Graphs, cs224w.stanford.edu

Page 16: CS224W: Analysis of Networks Jure Leskovec,

Contact ConversationJure Leskovec, Stanford CS224W: Machine Learning with Graphs, cs224w.stanford.edu9/29/19 16

Messaging as an undirected graph• Edge (u,v) if users u and v

exchanged at least 1 msg• N=180 million people• E=1.3 billion edges

Page 17: CS224W: Analysis of Networks Jure Leskovec,

9/29/19 Jure Leskovec, Stanford CS224W: Machine Learning with Graphs, cs224w.stanford.edu 17

Page 18: CS224W: Analysis of Networks Jure Leskovec,

9/29/19 Jure Leskovec, Stanford CS224W: Machine Learning with Graphs, cs224w.stanford.edu 18

Note: We plotted the same data as on the previous slide, just the axes are now logarithmic.

Page 19: CS224W: Analysis of Networks Jure Leskovec,

9/29/19 Jure Leskovec, Stanford CS224W: Machine Learning with Graphs, cs224w.stanford.edu 19

å=

=kkii

kk

i

CN

C:

1Ck: average Ci of nodes i of degree k:

Avg. clustering of the MSN:C = 0.1140

Page 20: CS224W: Analysis of Networks Jure Leskovec,

9/29/19 Jure Leskovec, Stanford CS224W: Machine Learning with Graphs, cs224w.stanford.edu 20

Page 21: CS224W: Analysis of Networks Jure Leskovec,

9/29/19 Jure Leskovec, Stanford CS224W: Machine Learning with Graphs, cs224w.stanford.edu 21

Number of links between pairs of

nodes in the largest connected

component

Avg. path length 6.690% of the nodes can be reached in < 8 hops

Steps #Nodes0 1

1 10

2 78

3 3,96

4 8,648

5 3,299,252

6 28,395,849

7 79,059,497

8 52,995,778

9 10,321,008

10 1,955,007

11 518,410

12 149,945

13 44,616

14 13,740

15 4,476

16 1,542

17 536

18 167

19 71

20 29

21 16

22 10

23 3

24 2

25 3

# no

des

as w

e do

BFS

out

of a

rand

om n

ode

Page 22: CS224W: Analysis of Networks Jure Leskovec,

9/29/19 Jure Leskovec, Stanford CS224W: Machine Learning with Graphs, cs224w.stanford.edu 22

Degree distribution:

Path length: 6.6

Clustering coefficient: 0.11

Connectivity: giant component

Heavily skewed;avg. degree = 14.4

Are these values “expected”? Are they “surprising”?

To answer this we need a model!

Page 23: CS224W: Analysis of Networks Jure Leskovec,

a. Undirected networkN=2,018 proteins as nodesE=2,930 binding interactions as links.

b. Degree distribution:Skewed. Average degree <k>=2.90c. Diameter:Avg. path length = 5.8d. Clustering:Avg. clustering = 0.12Connectivity: 185 componentsthe largest component has 1,647 nodes (81% of nodes)

9/29/19 Jure Leskovec, Stanford CS224W: Machine Learning with Graphs, cs224w.stanford.edu 23

Page 24: CS224W: Analysis of Networks Jure Leskovec,
Page 25: CS224W: Analysis of Networks Jure Leskovec,

¡ Erdös-Renyi Random Graphs [Erdös-Renyi, ‘60]¡ Two variants:§ Gnp: undirected graph on n nodes where each

edge (u,v) appears i.i.d. with probability p

§ Gnm : undirected graph with n nodes, and m edges picked uniformly at random

Jure Leskovec, Stanford CS224W: Machine Learning with Graphs, cs224w.stanford.edu9/29/19 25

What kind of networks do such models produce?

Page 26: CS224W: Analysis of Networks Jure Leskovec,

¡ n and p do not uniquely determine the graph!§ The graph is a result of a random process

¡ We can have many different realizations given the same n and p

9/29/19 Jure Leskovec, Stanford CS224W: Machine Learning with Graphs, cs224w.stanford.edu 26

n = 10 p= 1/6

Page 27: CS224W: Analysis of Networks Jure Leskovec,

9/29/19 Jure Leskovec, Stanford CS224W: Machine Learning with Graphs, cs224w.stanford.edu 27

Degree distribution: P(k)

Path length: h

Clustering coefficient: C

What are the values of these properties for Gnp?

Page 28: CS224W: Analysis of Networks Jure Leskovec,

¡ Fact: Degree distribution of Gnp is binomial.¡ Let P(k) denote the fraction of nodes with

degree k:

9/29/19 Jure Leskovec, Stanford CS224W: Machine Learning with Graphs, cs224w.stanford.edu 28

knk ppkn

kP ---÷÷ø

öççè

æ -= 1)1(

1)(

Select k nodes out of n-1

Probability of having k edges

Probability of missing the rest of the n-1-k edges

σ 2 = p(1− p)(n−1)

σk=1− pp

1(n−1)

"

#$

%

&'

1/2

≈1

(n−1)1/2)1( -= npk By the law of large numbers, as the network size increases, the distribution becomes increasingly

narrow—we are increasingly confident that the degree of a node is in the vicinity of k.

P(k)

Mean, variance of a binomial distribution

k

Page 29: CS224W: Analysis of Networks Jure Leskovec,

9/29/19 Jure Leskovec, Stanford CS224W: Machine Learning with Graphs, cs224w.stanford.edu 29

nk

nkp

kkkkpCii

ii »-

==--×

=1)1(

)1(

Clustering coefficient of a random graph is small.If we generate bigger and bigger graphs with fixed avg. degree 𝑘 (that is we set 𝑝 = 𝑘 ⋅ 1/𝑛), then C decreases with the graph size n.

)1(2-

=ii

ii kk

eC

ei = pki (ki −1)2 Number of distinct pairs of

neighbors of node i of degree kiEach pair is connected with prob. p

Where ei is the number of edges between i’s neighbors

¡ Remember:

¡ Edges in Gnp appear i.i.d. with prob. p

¡ So, expected E[ei] is:

¡ Then E[Ci]:

Page 30: CS224W: Analysis of Networks Jure Leskovec,

9/29/19 Jure Leskovec, Stanford CS224W: Machine Learning with Graphs, cs224w.stanford.edu 30

Degree distribution:

Clustering coefficient: C=p=k/n

Path length: next!

Connectivity:

knk ppkn

kP ---÷÷ø

öççè

æ -= 1)1(

1)(

Page 31: CS224W: Analysis of Networks Jure Leskovec,

¡ Graph G(V, E) has expansion α: if" S Í V: # of edges leaving S ³ α× min(|S|,|V\S|)

¡ Or equivalently:

9/29/19 Jure Leskovec, Stanford CS224W: Machine Learning with Graphs, cs224w.stanford.edu 31

|)\||,min(|#min SVS

SleavingedgesVSÍ

=a

SV \ S

Page 32: CS224W: Analysis of Networks Jure Leskovec,

¡ Fact: In a graph on n nodes with expansion α, for all pairs of nodes, there is a path of length O((log n)/α).

¡ Random graph Gnp:For log n > np > c, diam(Gnp) = O(log n / log (np))§ Random graphs have good expansion so it takes a

logarithmic number of steps for BFS to visit all nodes

9/29/19 Jure Leskovec, Stanford CS224W: Machine Learning with Graphs, cs224w.stanford.edu 32

S nodes α·S edges

S’ nodes α·S’ edges

s

Page 33: CS224W: Analysis of Networks Jure Leskovec,

Erdös-Renyi Random Graph can grow very large but nodes will be just a few hops apart

0 200000 400000 600000 800000 1000000

05

1015

20

num nodes

aver

age

shor

test

pat

h

9/29/19 Jure Leskovec, Stanford CS224W: Machine Learning with Graphs, cs224w.stanford.edu 33

Here 𝑛 ⋅ 𝑝 =constantThat is, avg deg 𝑘 is const

Page 34: CS224W: Analysis of Networks Jure Leskovec,

9/29/19 Jure Leskovec, Stanford CS224W: Machine Learning with Graphs, cs224w.stanford.edu 34

Degree distribution:

Path length: O(log n)

Clustering coefficient: C = p = k / n

Connected components: next!

knk ppkn

kP ---÷÷ø

öççè

æ -= 1)1(

1)(

Page 35: CS224W: Analysis of Networks Jure Leskovec,

¡ Graph structure of Gnp as p changes:

¡ Emergence of a giant component:avg. degree k=2E/n or p=k/(n-1)§ k=1-ε: all components are of size Ω(log n)§ k=1+ε: 1 component of size Ω(n), others have size Ω(log n)

§ Each node has at least one edge in expectation

9/29/19 Jure Leskovec, Stanford CS224W: Machine Learning with Graphs, cs224w.stanford.edu 35

0 1

p=1/(n-1)

Giant component appears

c/(n-1)Avg. deg const. Lots of isolated

nodes.

log(n)/(n-1)Fewer isolated

nodes.

2*log(n)/(n-1)No isolated nodes.

Emptygraph

Completegraph

Avg deg = 1

Page 36: CS224W: Analysis of Networks Jure Leskovec,

¡ Gnp, n=100,000, k=p(n-1) = 0.5 … 3

9/29/19 Jure Leskovec, Stanford CS224W: Machine Learning with Graphs, cs224w.stanford.edu 36

Fraction of nodes in the largest component

p*(n-1)=1

Page 37: CS224W: Analysis of Networks Jure Leskovec,

9/29/19 Jure Leskovec, Stanford CS224W: Machine Learning with Graphs, cs224w.stanford.edu 37

Degree distribution:

Avg. path length: 6.6 O(log n)

Avg. clustering coef.: 0.11 k / n

Largest Conn. Comp.: 99%C ≈ 8·10-8

h ≈ 8.2

MSN Gnp

GCC existswhen k>1.

k ≈ 14.

n=180M

ý

þ

ý

þ

Page 38: CS224W: Analysis of Networks Jure Leskovec,

¡ Are real networks like random graphs?§ Giant connected component: J§ Average path length: J§ Clustering Coefficient: L§ Degree Distribution: L

¡ Problems with the random networks model:§ Degree distribution differs from that of real networks§ Giant component in most real networks does NOT

emerge through a phase transition§ No local structure – clustering coefficient is too low

¡ Most important: Are real networks random?§ The answer is simply: NO!

9/29/19 Jure Leskovec, Stanford CS224W: Machine Learning with Graphs, cs224w.stanford.edu 38

Page 39: CS224W: Analysis of Networks Jure Leskovec,

¡ If Gnp is wrong, why did we spend time on it?§ It is the reference model for the rest of the class§ It will help us calculate many quantities, that can

then be compared to the real data§ It will help us understand to what degree a

particular property is the result of some random process

9/29/19 Jure Leskovec, Stanford CS224W: Machine Learning with Graphs, cs224w.stanford.edu 39

So, while Gnp is WRONG, it will turn out to be extremly USEFUL!

Page 40: CS224W: Analysis of Networks Jure Leskovec,

Can we have high clustering while also having short paths?

Vs.

High clustering coefficient, High diameter

Low clustering coefficientLow diameter

Page 41: CS224W: Analysis of Networks Jure Leskovec,

¡ MSN network has 7 orders of magnitude larger clustering than the corresponding Gnp!

¡ Other examples:

9/29/19 Jure Leskovec, Stanford CS224W: Machine Learning with Graphs, cs224w.stanford.edu 42

h ... Average shortest path lengthC ... Average clustering coefficient“actual” … real network“random” … random graph with same avg. degree

Actor Collaborations (IMDB): N = 225,226 nodes, avg. degree k = 61Electrical power grid: N = 4,941 nodes, k = 2.67Network of neurons: N = 282 nodes, k = 14

Network hactual hrandom Crandom

Film actors 3.65 2.99 0.00027Power Grid 18.70 12.40 0.005C. elegans 2.65 2.25 0.05

Page 42: CS224W: Analysis of Networks Jure Leskovec,

9/29/19 Jure Leskovec, Stanford CS224W: Machine Learning with Graphs, cs224w.stanford.edu 43

¡ Consequence of expansion:§ Short paths: O(log n)

§ This is the smallest diameter we can get if we keep the degree constant.

§ But clustering is low!¡ But networks have

“local” structure:§ Triadic closure:

Friend of a friend is my friend§ High clustering but

diameter is also high¡ How can we have both?

Low diameterLow clustering coefficient

High clustering coefficientHigh diameter

Page 43: CS224W: Analysis of Networks Jure Leskovec,

¡ Could a network with high clustering also be small world (have log 𝒏 diameter)?§ How can we at the same time have

high clustering and small diameter?

§ Clustering implies edge “locality”§ Randomness enables “shortcuts”

9/29/19 Jure Leskovec, Stanford CS224W: Machine Learning with Graphs, cs224w.stanford.edu 44

High clusteringHigh diameter

Low clusteringLow diameter

Page 44: CS224W: Analysis of Networks Jure Leskovec,

Small-World Model [Watts-Strogatz ‘98]Two components to the model:¡ (1) Start with a low-dimensional regular lattice§ (In our case we are using a ring as a lattice)§ Has high clustering coefficient

¡ (2) Rewire: Introduce randomness (“shortcuts”)§ Add/remove edges to create

shortcuts to join remote parts of the lattice

§ For each edge, with prob. p,move the other endpoint to a random node

9/29/19 Jure Leskovec, Stanford CS224W: Machine Learning with Graphs, cs224w.stanford.edu 45

[Watts-Strogatz, ‘98]

Page 45: CS224W: Analysis of Networks Jure Leskovec,

9/29/19 Jure Leskovec, Stanford CS224W: Machine Learning with Graphs, cs224w.stanford.edu 46

High clusteringLow diameter

Low clusteringLow diameter

43

2== C

kNh

NkCNh ==

loglog

a

Rewiring allows us to “interpolate” between a regular lattice and a random graph

[Watts-Strogatz, ‘98]

12

High clusteringHigh diameter

Page 46: CS224W: Analysis of Networks Jure Leskovec,

9/29/19 Jure Leskovec, Stanford CS224W: Machine Learning with Graphs, cs224w.stanford.edu 47

Clu

ster

ing

Coe

ffici

ent, 𝐶=

1 2∑𝐶𝑖

Prob. of rewiring, p

Parameter region of high clustering and low path length

Intuition: It takes a lot of randomness to ruin the clustering, but a very small amount to create shortcuts.

(sca

led)

Ave

rage

Pat

h Le

ngth

Page 47: CS224W: Analysis of Networks Jure Leskovec,

¡ Could a network with high clustering be at the same time a small world?§ Yes! You don’t need more than a few random links

¡ The Watts Strogatz Model:§ Provides insight on the interplay between clustering

and the small-world § Captures the structure of many realistic networks§ Accounts for the high clustering of real networks§ Does not lead to the correct degree distribution

9/29/19 Jure Leskovec, Stanford CS224W: Machine Learning with Graphs, cs224w.stanford.edu 48

Page 48: CS224W: Analysis of Networks Jure Leskovec,

Generating large realistic graphs

Page 49: CS224W: Analysis of Networks Jure Leskovec,

¡ How can we think of network structure recursively? Intuition: Self-similarity§ Object is similar to a part of itself: the whole has

the same shape as one or more of the parts¡ Mimic recursive graph/community growth:

¡ Kronecker product is a way of generating self-similar matrices

9/29/19 Jure Leskovec, Stanford CS224W: Machine Learning with Graphs, cs224w.stanford.edu 50

Initial graph Recursive expansion

Page 50: CS224W: Analysis of Networks Jure Leskovec,

¡ Kronecker graphs: § A recursive model of network structure

9/29/19 Jure Leskovec, Stanford CS224W: Machine Learning with Graphs, cs224w.stanford.edu 51

81 x 81 adjacency matrix

K1

[PKDD ‘05]

3 x 3 9 x 9

Page 51: CS224W: Analysis of Networks Jure Leskovec,

¡ Kronecker product of matrices 𝐴 and 𝐵 is given by

¡ Define a Kronecker product of two graphs as a Kronecker product of their adjacency matrices

9/29/19 Jure Leskovec, Stanford CS224W: Machine Learning with Graphs, cs224w.stanford.edu 52

N x M K x L

N*K x M*L

Page 52: CS224W: Analysis of Networks Jure Leskovec,

¡ Kronecker graph is obtained bygrowing sequence of graphs by iterating the Kronecker productover the initiator matrix 𝑲𝟏:

¡ Note: One can easily use multiple initiator matrices (𝐾1’, 𝐾1’’, 𝐾1’’’ ) (even of different sizes)

9/29/19 Jure Leskovec, Stanford CS224W: Machine Learning with Graphs, cs224w.stanford.edu 53

K1

[PKDD ‘05]

m

[m]m-1 m

Page 53: CS224W: Analysis of Networks Jure Leskovec,

9/29/19 Jure Leskovec, Stanford CS224W: Machine Learning with Graphs, cs224w.stanford.edu 54

[PKDD ‘05]

Page 54: CS224W: Analysis of Networks Jure Leskovec,

0.25 0.10 0.10 0.04

0.05 0.15 0.02 0.06

0.05 0.02 0.15 0.06

0.01 0.03 0.03 0.09

¡ Create 𝑁1´ 𝑁1 probability matrix 𝚯𝟏¡ Compute the kth Kronecker power 𝚯𝒌¡ For each entry 𝑝𝑢𝑣 of 𝚯𝒌 include an

edge (𝑢, 𝑣) in 𝐾𝑘 with probability 𝑝𝑢𝑣

9/29/19 Jure Leskovec, Stanford CS224W: Machine Learning with Graphs, cs224w.stanford.edu 55

0.5 0.20.1 0.3

Θ1

Instancematrix K2

Θ2= Θ1Ä Θ1

Flip biased coins

Kroneckermultiplication

Probability of edge puv

[PKDD ‘05]

Page 55: CS224W: Analysis of Networks Jure Leskovec,

¡ How do we generate an instance of a (Directed) stochastic Kronecker graph?

¡ Is there a faster way? YES!¡ Idea: Exploit the recursive structure of

Kronecker graphs§ “Drop” edges onto the graph one by one

9/29/19 Jure Leskovec, Stanford CS224W: Machine Learning with Graphs, cs224w.stanford.edu 56

0.25 0.10 0.10 0.040.05 0.15 0.02 0.060.05 0.02 0.15 0.060.01 0.03 0.03 0.09

1 1 0 00 1 0 11 0 1 10 1 0 1

Probability of edge puv

Need to flip 𝒏𝟐 coins!! Way too slow!!

Flip biased coins

Page 56: CS224W: Analysis of Networks Jure Leskovec,

¡ A faster way to generate Kronecker graphs

¡ How to “drop” an edge into a graph 𝑮 on 𝒏 = 𝟐𝒎 nodes

9/29/19 Jure Leskovec, Stanford CS224W: Machine Learning with Graphs, cs224w.stanford.edu 57

Adjacency matrix G

==Q

QÄQ

Page 57: CS224W: Analysis of Networks Jure Leskovec,

¡ A faster way to generate Kronecker graphs

¡ How to “drop” an edge into a graph 𝑮 on 𝒏 = 𝟐𝒎 nodes

9/29/19 Jure Leskovec, Stanford CS224W: Machine Learning with Graphs, cs224w.stanford.edu 58

Adjacency matrix G

a bc d

==Q

QÄQ

Page 58: CS224W: Analysis of Networks Jure Leskovec,

¡ A faster way to generate Kronecker graphs

¡ How to “drop” an edge into a graph 𝑮 on 𝒏 = 𝟐𝒎 nodes

9/29/19 Jure Leskovec, Stanford CS224W: Machine Learning with Graphs, cs224w.stanford.edu 59

==Q

QÄQ

Adjacency matrix G

a bc d

a bc d

Page 59: CS224W: Analysis of Networks Jure Leskovec,

¡ A faster way to generate Kronecker graphs

¡ How to “drop” an edge into a graph 𝑮 on 𝒏 = 𝟐𝒎 nodes:¡ We may get a few

edges colliding. We simply reinsert them.

9/29/19 Jure Leskovec, Stanford CS224W: Machine Learning with Graphs, cs224w.stanford.edu 60

Adjacency matrix G

a bc d

a bc da b

c d

==Q

QÄQ

Page 60: CS224W: Analysis of Networks Jure Leskovec,

Fast Kronecker generator algorithm:§ For generating directed graphs

¡ Insert 1 edge on graph 𝑮 on 𝒏 = 𝟐𝒎 nodes:§ Create normalized matrix 𝑳𝒖𝒗 = 𝚯𝒖𝒗/(∑𝒐𝒑𝚯𝒐𝒑)§ For 𝒊 = 𝟏 …𝒎

§ Start with 𝒙 = 𝟎, 𝒚 = 𝟎§ Pick a row/column (𝒖, 𝒗) with prob. 𝑳𝒖𝒗§ Descend into quadrant (𝒖, 𝒗) at level 𝒊 of 𝑮

§ This means: 𝒙 += 𝒖 ⋅ 𝟐𝒎O𝒊 , 𝒚 += 𝒗 ⋅ 𝟐𝒎O𝒊

§ Add an edge (𝒙, 𝒚) to 𝑮

9/29/19 Jure Leskovec, Stanford CS224W: Machine Learning with Graphs, cs224w.stanford.edu 61

a bc d

𝚯

Page 61: CS224W: Analysis of Networks Jure Leskovec,

¡ Real and Kronecker are very close:

9/29/19 Jure Leskovec, Stanford CS224W: Machine Learning with Graphs, cs224w.stanford.edu 62

=Q10.99 0.54

0.49 0.13

[ICML ‘07]