random graph modelssrijan kumar, georgia tech, cse6240 spring 2020: web search and text mining 3...

Srijan Kumar, Georgia Tech, CSE6240 Spring 2020: Web Search and Text Mining 1

CSE 6240: Web Search and Text Mining. Spring 2020

Random Graph Models

Prof. Srijan Kumar


Today’s Lecture: Networks• Networks introduction • Web as a network • Networks properties• Random graph model: Erdos-Renyi Random Graph Model• Random graph model: Small-world Random Graph Model

Some slides are inspired by Prof. Jure Leskovec’s slides


Simplest Model of Graphs¡ Erdös-Renyi Random Graphs [Erdös-Renyi, 1960]• Two variants:

– Gn,p: undirected graph on n nodes and each edge (u,v) appears i.i.d. with probability p

– Gn,m: undirected graph with n nodes and m edges, where edges are picked uniformly at random

• What kind of networks do such models produce?


Random Graph Models: Intuition

• n and p do not uniquely determine the graph!– The graph is a result of a random process

• We can have many different realizations given the same n and p

n = 10 p= 1/6


Random Graph Model: Edges

• How likely is a graph on E edges?• P(E): the probability that a given Gnp generates a graph on

exactly E edges:

where Emax=n(n-1)/2 is the maximum possible number of edges in an undirected graph of n nodes

• P(E) is a Binomial distribution: NumberofsuccessesinasequenceofEmax independentyes/noexperiments

EEE ppEE

EP --÷÷ø

öççè

æ= max)1()( max


Node Degrees in a Random Graph

• What is expected degree of a node?

• Probability of node u linking to node v is p• u can link (flips a coin) to all other (n-1) nodes• Thus, the expected degree of node u is: p(n-1)

E[Xv ]= E[Xvu ]= (n−1)pu=1

n−1

∑

Srijan Kumar, Georgia Tech, CSE6240 Spring 2020: Web Search and Text Mining

Key Network Properties

7

• Degree distribution: P(k)

• Clustering coefficient: C

• Path length: h

What are the values of these properties for Gnp?


Degree Distribution• Degree distribution of Gnp is binomial• Let P(k) denote the fraction of nodes with degree k:

• Mean and variance of a binomial distribution

knk ppkn

kP ---÷÷ø

öççè

æ -= 1)1(

1)(

Select k nodes out of n-1

Probability of having k edges

Probability of missing the rest of the n-1-k edges

σ 2 = p(1− p)(n−1)

)1( -= npk


Degree Distribution• As the network size increases, the distribution becomes

increasingly narrow—we are increasingly confident that the degree of a node is in the vicinity of k.

σk=1− pp

1(n−1)

"

#$

%

&'

1/2

≈1

(n−1)1/2

P(k)

k


Clustering Coefficient of Gnp

• Clustering coefficient – Where ei is the number of edges between i’s neighbors

• So,

• Clustering coefficient of a random graph is small– Bigger graphs with the same average degree k have lower

clustering coefficient

nk

nkp

kkkkpCii

ii »-

==--×

=1)1(

)1(

)1(2-

=ii

ii kk

eC

ei = pki (ki −1)2

Number of distinct pairs of neighbors of node i of degree ki

Each pair is connected with prob. p


Key Network Properties

11

• Degree distribution:

• Clustering coefficient: C=p=k/n

• Path length: h

What are the values of these properties for Gnp?

knk ppkn

kP ---÷÷ø

öççè

æ -= 1)1(

1)(


Average Shortest Path• Average path length = O(log n)• Erdös-Renyi networks can grow to be very large but nodes

will be just a few hops apart

0 200000 400000 600000 800000 1000000

05

1015

20

num nodes

aver

age

shor

test

pat

h


MSN Network Properties vs. Gnp Properties

13

Degree distribution:

Path length: 6.6 O(log n) ~ 8.2

Clustering coefficient: 0.11 k / n ≈ 8·10-8

MSN Gnp


Clustering Implies Edge Locality• MSN network has 7 orders of magnitude larger clustering

than the corresponding Gnp!• Other examples:

– Actor Collaborations (IMDB): N = 225,226 nodes, avg. degree k = 61– Electrical power grid: N = 4,941 nodes, k = 2.67– Network of neurons: N = 282 nodes, k = 14

Network hactual hrandom Cactual Crandom

Film actors 3.65 2.99 0.00027

Power Grid 18.70 12.40 0.005

C. elegans 2.65 2.25 0.05


Gnp Simulation Experiment: Giant Component

• n=100,000, k=p(n-1) = 0.5 … 3• Emergence of a giant component: average degree k=2E/n

or p=k/(n-1)– When k=1-ε: all

components are of size Ω(log n)

– k=1+ε: 1 component of size Ω(n), others have size Ω(log n)

Fraction of nodes in the largest component

p*(n-1)=1


Real Networks vs. Gnp

• Are real networks like random graphs?– Giant connected component: YES – Average path length: YES – Clustering Coefficient: NO– Degree Distribution: NO



• Problems with the random networks model:– Degree distribution differs from that of real networks– Giant component in most real networks does NOT emerge

through a phase transition– No local structure – clustering coefficient is too low

• Most important: Are real networks random?– The answer is simply: NO!



• If Gnp is wrong, why did we spend time on it?– It is the reference model for the rest of the class. – It will help us calculate many quantities, that can then be

compared to the real data– It will help us understand to what degree is a particular

property the result of some random process• While Gnp is not realistic, it is useful


Problem with the ER Model• Gnp model has short paths: O(log n)

– This is the smallest diameter we can get if we have a constant degree.

– But clustering is low!• But real networks have “local” structure

– Triadic closure: Friend of a friend is my friend– High clustering but diameter is also high

• Can we generate graphs with high clustering coefficient while having short paths (low diameter)?

• Solution: Small-World Model

Low diameterLow clustering coefficient

High clustering coefficientHigh diameter


Six Degrees of Kevin Bacon

Origins of a small-world idea:• The Bacon number:

– Create a network of Hollywood actors– Connect two actors if they co-appeared in

the movie– Bacon number: number of steps to Kevin

Bacon• As of Dec 2007, the highest Bacon

number reported is 8• Only approx. 12% of all actors cannot be

linked to Bacon


Erdos Number

• Erdos Number: number of hops in scientific co-author graph to reach Paul Erdos

• Srijan’ Erdos number is 4.

• Find out your Erdos number: http://www.ams.org/mathscinet/collaborationDistance.html


The Small-World Experiment• What is the typical shortest path length

between any two people?– Experiment on the global friendship network

• Can’t measure, need to probe explicitly • Small-world experiment [Milgram ’67]

– Picked 300 people in Omaha, Nebraska and Wichita, Kansas

– Ask them to get a letter to a stock-broker in Boston by passing it through friends only

• How many steps do you think it took?


The Small-World Experiment• 64 chains completed (letters reached)

– It took 6.2 steps on the average, thus “6 degrees of separation”

• Further observations:– People who owned stock

had shorter paths to the stockbroker than random people: 5.4 vs. 6.7

– People from the Boston area have even closer paths: 4.4

• On average, you are 6 hops away from anyone in the world!

Milgram’s small world experiment


The Small-World Experiment #2: Columbia• In 2003 Dodds, Muhamad and Watts performed similar

experiments using e-mail:– 18 targets of various backgrounds– 24,000 first steps (~1,500 per target)– 65% dropout per step– 384 chains completed (1.5%)

Average chain length = 4.01Problem: People stop participating

Path length, h

n(h)


6-Degrees: Should We Be Surprised?• Assume each human is connected to 100 other people,

then:– Step 1: reach 100 people– Step 2: reach 100*100 = 10,000 people– Step 3: reach 100*100*100 = 1,000,000 people– Step 4: reach 100*100*100*100 = 100M people– In 5 steps we can reach 10 billion people


Small-World: How?• Could a network with high clustering be at the same

time a small world?– How can we at the same time have high clustering and small

diameter?

• Intuition:– Clustering implies edge “locality”– Randomness enables “shortcuts”

High clusteringHigh diameter

Low clusteringLow diameter


The Small-World Model [Watts-Strogatz ‘98]

Two components to the model:• (1) Start with a low-dimensional regular lattice

– In our case, we are using a ring as a lattice– Has high clustering coefficient, but has high diameter

• (2) Rewire: introduce randomness (“shortcuts”)– Add/remove edges to create shortcuts to join remote parts

of the lattice– For each edge with probability p move

the other end to a random node– Reduces the diameter by adding shortcuts


Tuning Randomness in Small-World Model• Rewiring allows us to “interpolate” between a regular lattice

and a random graph

High clusteringHigh diameter

High clusteringLow diameter

Low clusteringLow diameter

43

2== C

kNh

NkCNh ==

loglog

aC = ½


Randomness vs Clustering• Intuition: It takes a lot of randomness to ruin the clustering,

but a very small amount to create shortcuts.

Clustering coefficientC = 1/n ∑ Ci

Parameter region of high clustering and low path length

Probability of rewiring, p

Clustering Coefficient

Average path length


• Alternative formulation of the model:1. Start with a square grid2. Add 1 random long-range edge per node

• Each node has 1 spoke. Then randomly connect them.

• Each node has 8 + 1 = 9 edges• Each node’s neighbors have 12 edges • Clustering

• Diameter: O(log(n)) – Why?

Ci =2 ⋅ei

ki (ki −1)=2 ⋅129 ⋅8

≥ 0.33

Diameter of the Watts-Strogatz Model


Proof: O(log n) Diameter of Small World Model• Convert 2x2 subgraphs into supernodes:

– Each supernode has 4 long-range edges sticking out: a 4-regular random graph!• Ignore the edges between neighboring supernodes

– Recall Gnp: short paths between super nodesÞ Path in the original graph = add at most 2 steps

per long range edge (by traversing within supernodes)

Þ Diameter of the model is O(2 + log n) = O (log n)• Edges between neighboring supernodes: these edges

will reduce the diameter further. 4-regular randomgraph on supernodes

Super node


Small-World: Summary

• Could a network with high clustering be at the same time a small world?– Yes! You don’t need more than a few random links

• The Watts-Strogatz/Small-World Model:– Provides insight on the interplay between clustering and the

small-world – Captures the structure of many realistic networks– Accounts for the high clustering of real networks– Does not lead to the correct degree distribution

random graph modelssrijan kumar, georgia tech, cse6240 spring 2020: web search and text mining 3...

Documents