CS728 Lecture 5: Generative Graph Models and the Web
Importance of Generative Models
• Gives insight into the graph formation process:
– Anomaly detection – abnormal behavior, evolution
– Predictions – predicting future from the past
– Simulations and evaluation of new algorithms
– Graph sampling – many real world graphs like the web are too large and complex to deal with
– Goal: generating graphs with small world property, clustering, power-laws, other naturally occurring structures
Graph Models: Waxman Models
• Used for models of clustering in Internet-like topologies and networks with long and short edges
• The vertices are distributed at random in a plane.
• An edge is added between each pair of vertices (u, v) with probability
$p(u,v) = \alpha \exp\left(-\frac{d}{\beta L}\right), \quad 0 < \alpha, \beta \le 1.$
• L is the maximum distance between any two nodes.
• Increase in alpha increases the number of edges in the graph.
• Increase in beta increases the number of long edges relative to short edges.
• d is the Euclidean distance from u to v in Waxman-1.
• d is a random number in [0, L] in Waxman-2.
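The Waxman rule above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `waxman_graph`, the `variant` switch, and the choice of the unit square as the plane are assumptions made for the example.

```python
import math
import random

def waxman_graph(n, alpha, beta, seed=None, variant=1):
    """Sketch of Waxman-1 (variant=1) and Waxman-2 (variant=2). Assumes n >= 2."""
    rng = random.Random(seed)
    # Assumption: "a plane" is taken to be the unit square.
    points = [(rng.random(), rng.random()) for _ in range(n)]
    # L is the maximum distance between any two nodes.
    L = max(math.dist(points[u], points[v])
            for u in range(n) for v in range(u + 1, n))
    edges = []
    for u in range(n):
        for v in range(u + 1, n):
            if variant == 1:
                d = math.dist(points[u], points[v])  # Euclidean distance
            else:
                d = rng.uniform(0.0, L)              # random number in [0, L]
            # Edge probability p(u,v) = alpha * exp(-d / (beta * L)).
            if rng.random() < alpha * math.exp(-d / (beta * L)):
                edges.append((u, v))
    return points, edges
```

Every pair of vertices is examined, so the sketch takes quadratic time in n. Raising `alpha` scales every edge probability up, while raising `beta` flattens the exponential decay so that long edges become relatively more likely.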
Graph Models: Configuration Model
• Random Graph from given degree sequence
• Problem: Given a degree sequence d_1, d_2, d_3, …, d_n, generate a random graph with that degree sequence
• Solution:
– Place d_i stubs on vertex i
– Choose pairs of stubs at random and join each pair into an edge
• Problem: we may construct graphs with loops and multiedges
• To prevent this there must be enough “absorbing” residual degree capacity.
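A minimal Python sketch of the stub-pairing construction, together with a helper that counts the loops and multiedges it can produce. The names `stub_matching` and `count_defects` are illustrative assumptions, not taken from the lecture.

```python
import random
from collections import Counter

def stub_matching(degrees, seed=None):
    """Pair up stubs uniformly at random; may create loops and multiedges."""
    if sum(degrees) % 2 != 0:
        raise ValueError("the degree sum must be even")
    rng = random.Random(seed)
    # Vertex v contributes degrees[v] stubs.
    stubs = [v for v, d in enumerate(degrees) for _ in range(d)]
    # Shuffling and pairing consecutive stubs gives a random perfect matching.
    rng.shuffle(stubs)
    return [(stubs[i], stubs[i + 1]) for i in range(0, len(stubs), 2)]

def count_defects(edges):
    """Return (number of loops, number of surplus parallel edges)."""
    loops = sum(1 for u, v in edges if u == v)
    multiplicity = Counter(frozenset(e) for e in edges if e[0] != e[1])
    surplus = sum(c - 1 for c in multiplicity.values())
    return loops, surplus
```

Running `count_defects(stub_matching(degrees))` on a few degree sequences shows how often a plain random pairing fails to be a simple graph.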
• Algorithm:
• Maintain list of nodes sorted by residual degrees d(v)
• Repeat until all nodes have been chosen:
– pick arbitrary vertex v
– add edges from v to d(v) vertices of highest residual degree
– update residual degrees
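The greedy procedure above might look as follows in Python. This is a sketch under simplifying assumptions: it re-sorts the candidates at every step instead of maintaining a sorted list, and the name `greedy_realization` is hypothetical. It raises `ValueError` when the residual capacity runs out, that is, when the sequence cannot be realized as a simple graph.

```python
def greedy_realization(degrees):
    """Build a simple graph with the given degree sequence, if one exists."""
    residual = dict(enumerate(degrees))
    unchosen = set(residual)
    edges = []
    while unchosen:
        v = unchosen.pop()            # pick an arbitrary vertex
        need = residual[v]
        residual[v] = 0
        # Unchosen vertices with spare capacity, highest residual degree first.
        candidates = sorted((u for u in unchosen if residual[u] > 0),
                            key=lambda u: residual[u], reverse=True)
        if len(candidates) < need:
            raise ValueError("degree sequence is not realizable as a simple graph")
        for u in candidates[:need]:
            edges.append((v, u))
            residual[u] -= 1          # update residual degrees
    return edges
```

Because a chosen vertex is removed from the pool and its neighbors are distinct, the result has no loops and no multiedges.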
To randomize further, we can start with a realization and repeatedly 2-swap pairs of edges (u,v), (s,t) to (u,t), (s,v)
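A hedged sketch of the 2-swap step in Python. The name `two_swap`, the random orientation of the second edge, and the rule of skipping any swap that would create a loop or a multiedge are choices made for this example. It assumes the input is a simple graph with at least two edges.

```python
import random

def two_swap(edges, num_attempts, seed=None):
    """Repeatedly try to swap (u,v),(s,t) into (u,t),(s,v), preserving degrees."""
    rng = random.Random(seed)
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    for _ in range(num_attempts):
        i, j = rng.sample(range(len(edges)), 2)
        u, v = edges[i]
        s, t = edges[j]
        if rng.random() < 0.5:        # random orientation of the second edge
            s, t = t, s
        if u == t or s == v:          # swap would create a loop
            continue
        a, b = frozenset((u, t)), frozenset((s, v))
        if a in present or b in present:   # swap would create a multiedge
            continue
        present -= {frozenset((u, v)), frozenset((s, t))}
        present |= {a, b}
        edges[i], edges[j] = (u, t), (s, v)
    return edges
```

Each accepted swap leaves every vertex degree unchanged, so the output is another simple realization of the same degree sequence.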
Works OK, but is there a more ‘natural’ generative model?
Generative Graph models: Preferential attachment
• Price’s Model [65]: Physics citations – “cumulative advantage”
• Herb Simon [50’s]: Nobel and Turing Awards, political scientist – “rich get richer” (Pareto)
• Matthew effect / Matilda effect: sociology
• Barabasi and Albert 99: Preferential attachment:
– Add a new node, create d out-links
– Probability of linking a node is proportional to its current degree
• Simple explanation of power-law degree distributions
Issues with preferential attachment and Power-laws
• Barabasi model: fixed constant m for out-degree
• Price’s model: directed, with m mean out-degree
• Probability of adding a new edge to a node is proportional to its (in-)degree k
– problem at the start: degree 0
– Price’s model: prop. to deg + 1
– Analysis: prob. a node has degree k
• $p_k \sim k^{-3}$ (Barabasi model)
• $p_k \sim k^{-(2+1/m)}$, power-law with exponent between 2 and 3 (Price)
• Exercise: give pseudocode that generates such a graph in linear time
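One possible approach to the exercise, sketched in Python rather than pseudocode. It assumes Price's rule (probability proportional to in-degree + 1) with a fixed out-degree m, and it allows repeated targets, so multiedges can occur. The name `price_model` and the `pool` list are illustrative. The idea is to keep a list in which each node appears (in-degree + 1) times, so that a uniform draw from the list is a draw proportional to in-degree + 1.

```python
import random

def price_model(n, m, seed=None):
    """Directed preferential attachment in O(n * m) time (multiedges allowed)."""
    rng = random.Random(seed)
    pool = [0]      # node 0 appears once: in-degree 0, plus 1
    edges = []
    for new in range(1, n):
        # Draw all m targets before the pool changes for this step.
        targets = [rng.choice(pool) for _ in range(m)]
        for t in targets:
            edges.append((new, t))
            pool.append(t)      # t gained one in-link
        pool.append(new)        # the "+1" entry for the new node
    return edges
```

Each edge costs a constant number of list operations, so the running time is linear in the number of edges generated.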
Variations on the PA Theme
• Clustering, Small-World and Ageing
• Copying Model
• Alpha and beta Models
• Temporal Evolution
• Densification
Graph models: Copying model
• Copying model [Kleinberg, Kumar, Raghavan, Rajagopalan and Tomkins, 99]:
– Add a node and choose the number of edges to add
– Choose a random vertex and “copy” its links (neighbors)
• Also generates power-law degree distributions
• Generates communities - clustering
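A rough Python sketch of the copying idea. Several details are assumptions made for the example and are not stated on the slide: a fixed out-degree `d`, a per-link `copy_prob` that decides between copying the prototype's link and choosing a uniformly random target, and a seed graph of d + 1 mutually linked nodes. The sketch assumes n ≥ d + 1.

```python
import random

def copying_model(n, d, copy_prob, seed=None):
    """Return out-link lists; each new node copies links from a random prototype."""
    rng = random.Random(seed)
    # Assumed seed graph: d + 1 nodes, each linking to the other d.
    out = [[u for u in range(d + 1) if u != v] for v in range(d + 1)]
    for new in range(d + 1, n):
        proto = rng.randrange(new)    # choose a random existing vertex
        links = []
        for i in range(d):
            if rng.random() < copy_prob:
                links.append(out[proto][i])        # copy the prototype's i-th link
            else:
                links.append(rng.randrange(new))   # uniformly random target
        out.append(links)
    return out
```

Nodes that already have many in-links are more likely to be the target of a copied link, which is the intuition behind the power-law degree distribution, and nodes that copy from the same prototype share neighbors, which produces communities.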
Graph Models: The Alpha Model
Watts (1999)
α model: Add edges to nodes, as in random graphs, but make links more likely when two nodes have a common friend.
For a range of α values:
– The world is small (average path length is short), and
– Groups tend to form (high clustering coefficient).
[Figure: probability of linkage as a function of number of mutual friends (α is 0 in the upper-left curve, 1 on the diagonal, and ∞ in the bottom-right curve).]
Graph Models: The Beta Model
Watts and Strogatz (1998), “Link Rewiring”
• β = 0: People know their neighbors. Clustered, but not a “small world”.
• β = 0.125: People know their neighbors, and a few distant people. Clustered and “small world”.
• β = 1: People know others at random. Not clustered, but “small world”.
Graph Models: The Beta Model
First five random links reduce the average path length of the network by half, regardless of N!
Both α and β models reproduce short-path results of random graphs, but also allow for clustering.
Small-world phenomena occur at threshold between order and chaos.
Watts and Strogatz (1998)
[Figure: clustering coefficient (C) and normalized average path length (L) plotted against β.]
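The link-rewiring model and the two quantities discussed above can be sketched in Python as follows. The function names, the ring lattice with `k` neighbors per node (k even), and the rule of averaging path lengths over reachable pairs only are assumptions of this example.

```python
import random
from collections import deque

def watts_strogatz(n, k, beta, seed=None):
    """Ring lattice with k neighbors per node; rewire each edge with prob. beta."""
    rng = random.Random(seed)
    adj = [set() for _ in range(n)]
    for u in range(n):
        for j in range(1, k // 2 + 1):
            v = (u + j) % n
            adj[u].add(v)
            adj[v].add(u)
    for j in range(1, k // 2 + 1):
        for u in range(n):
            if rng.random() < beta:
                v = (u + j) % n
                # New endpoint: any node that is not u and not already a neighbor.
                candidates = [w for w in range(n) if w != u and w not in adj[u]]
                if candidates:
                    w = rng.choice(candidates)
                    adj[u].discard(v)
                    adj[v].discard(u)
                    adj[u].add(w)
                    adj[w].add(u)
    return adj

def average_clustering(adj):
    """Mean fraction of a node's neighbor pairs that are themselves linked."""
    total = 0.0
    for nbrs in adj:
        d = len(nbrs)
        if d >= 2:
            links = sum(1 for a in nbrs for b in nbrs if a < b and b in adj[a])
            total += 2.0 * links / (d * (d - 1))
    return total / len(adj)

def average_path_length(adj):
    """Mean BFS distance over all ordered pairs that can reach each other."""
    total = pairs = 0
    for s in range(len(adj)):
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs
```

Sweeping `beta` from 0 to 1 and calling both measures on each graph gives the kind of comparison the lecture describes: path length drops quickly for small `beta`, while clustering stays high until `beta` is much larger.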
Other Related Work
• Hybrid models: Beta + Waxman on grid
• Huberman and Adamic, 1999: Growth dynamics of the world wide web
– Argue against Barabasi model for its age dependence
• Kumar, Raghavan, Rajagopalan, Sivakumar and Tomkins, 1999: Stochastic models for the web graph
• Watts, Dodds, Newman, 2002: Identity and search in social networks
• Medina, Lakhina, Matta, and Byers, 2001: BRITE: An Approach to Universal Topology Generation
• …
Statistics
• Statistics of common networks:
Network       N (nodes)   K (degree)   D (distance)   C (clique fraction)
Actors        225,226     61           3.65           0.79
Power-grid    4,941       2.67         18.7           0.08
C.elegans     282         14           2.65           0.28
Large k = large c?
Small c = large d?
Modeling Ageing and Temporal Evolution
• N(t) … nodes at time t
• E(t) … edges at time t
• Suppose that N(t+1) = 2 * N(t)
• Q: what is your guess for E(t+1)? Is it 2 * E(t)?
• A: over-doubled?
Temporal Evolution of Graphs
• Densification Power Law
– networks appear denser over time
– the number of edges grows faster than the number of nodes
– average degree is increasing
$E(t) \propto N(t)^{a}$
a … densification exponent
or equivalently
$\log E(t) = a \log N(t) + \text{const}$
Graph Densification
• Densification Power Law
• Densification exponent: 1 ≤ a ≤ 2:
– a=1: linear growth – constant out-degree (assumed in the literature so far)
– a=2: quadratic growth – clique
• Let’s see the real graphs!
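As a worked instance of the earlier doubling question, assume the Densification Power Law takes the form E(t) ∝ N(t)^a. If the number of nodes doubles, the number of edges then grows by a factor of 2^a. The value a = 1.5 below is only an illustrative choice.

```latex
\frac{E(t+1)}{E(t)} = \left(\frac{N(t+1)}{N(t)}\right)^{a} = 2^{a},
\qquad
a = 1 \Rightarrow 2, \quad
a = 1.5 \Rightarrow 2^{1.5} \approx 2.83, \quad
a = 2 \Rightarrow 4 .
```

So for any a > 1 the edge count more than doubles, which is the "over-doubled" answer.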
Densification – ArXiv citation graph in Physics
• Citations among physics papers
• 1992:
– 1,293 papers, 2,717 citations
• 2003:
– 29,555 papers, 352,807 citations
• For each month M, create a graph of all citations up to month M
[Figure: E(t) plotted against N(t); densification exponent 1.69]
Densification – Patent Citations
• Citations among patents granted
• 1975
– 334,000 nodes
– 676,000 edges
• 1999
– 2.9 million nodes
– 16.5 million edges
• Each year is a datapoint
[Figure: E(t) plotted against N(t); densification exponent 1.66]
Densification – Internet Autonomous Systems
• Graph of Internet
• 1997
– 3,000 nodes
– 10,000 edges
• 2000
– 6,000 nodes
– 26,000 edges
• One graph per day
[Figure: E(t) plotted against N(t); densification exponent 1.18]
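A small Python sketch of how a densification exponent can be estimated from (N, E) snapshots. The first function uses only the first and last snapshot quoted on the slides. This is a crude two-point estimate, so it should not be expected to match the exponents shown in the plots, which are fitted over every snapshot. The second function is an ordinary least-squares slope on the log-log scale for use when all snapshots are available. The names `endpoint_exponent` and `fit_exponent` are assumptions of this example.

```python
import math

# (nodes, edges) at the first and last snapshot, as quoted on the slides.
SNAPSHOTS = {
    "arXiv citations": ((1_293, 2_717), (29_555, 352_807)),
    "patent citations": ((334_000, 676_000), (2_900_000, 16_500_000)),
    "autonomous systems": ((3_000, 10_000), (6_000, 26_000)),
}

def endpoint_exponent(first, last):
    """Slope of log E against log N between two snapshots."""
    (n0, e0), (n1, e1) = first, last
    return math.log(e1 / e0) / math.log(n1 / n0)

def fit_exponent(points):
    """Least-squares slope of log E against log N over many (N, E) snapshots."""
    xs = [math.log(n) for n, _ in points]
    ys = [math.log(e) for _, e in points]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

estimates = {name: endpoint_exponent(first, last)
             for name, (first, last) in SNAPSHOTS.items()}
```

Even the two-point estimates fall strictly between 1 and 2 for all three datasets, consistent with edges growing faster than nodes but slower than a clique.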
Evolution of the Diameter
• Prior work on Power Law graphs hints at slowly growing diameter:
– diameter ~ O(log N)
– diameter ~ O(log log N)
• What is happening in real data?
• Diameter shrinks over time
– As the network grows the distances between nodes slowly decrease
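To make the notion of diameter concrete, here is a minimal Python sketch that computes it exactly by running a breadth-first search from every node. It is meant for small graphs only and is not the measurement procedure behind the plots in this lecture. The adjacency format (a dict mapping each node to a set of neighbors) and the function names are assumptions of this example.

```python
from collections import deque

def eccentricity(adj, source):
    """Largest BFS distance from source to any node it can reach."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return max(dist.values())

def diameter(adj):
    """Longest shortest path over all pairs that can reach each other."""
    return max(eccentricity(adj, s) for s in adj)

# A path on five nodes, then the same graph with one extra edge closing the cycle.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
cycle = {0: {1, 4}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 0}}
```

Comparing `diameter(path)` with `diameter(cycle)` shows the basic mechanism: adding edges without adding nodes can only keep distances the same or shorten them.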
Diameter – ArXiv citation graph
• Citations among physics papers
• 1992 –2003
• One graph per year
[Figure: diameter plotted against time in years]
Diameter – Autonomous Systems
• Graph of Internet
• One graph per day
• 1997 – 2000
[Figure: diameter plotted against number of nodes]
Next Time: Densification – Possible Explanations
• Generative models to capture the Densification Power Law and Shrinking diameters
• 2 proposed models:
– Community Guided Attachment – obeys Densification
– Forest Fire model – obeys Densification, Shrinking diameter (and Power Law degree distribution)