Random graph models with fixed degree sequences:
choices, consequences and irreducibility proofs for
sampling
Joel Nishimura (1), Bailey K Fosdick (2), Daniel B Larremore (3) and Johan Ugander (4)
(1) Arizona State Univ. (2) Colorado State (3) Univ. of Colorado (4) Stanford Univ.
ASU Discrete Math Seminar 2018
See paper for the numerous literature connections
What is notable about a graph?
Interpretation requires a Null Model
karate club
Interpretation requires a Null Model
[Figure: model error versus implementation difficulty and/or required understanding, comparing replicated experiments, Erdős–Rényi, application specific simulation, and fixed degree sequence models]
Stub Matching
[Figure: edges to stubs; join 2 stubs; drop stub labels. Panels: stub-labeled and vertex-labeled]
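The stub matching steps named on this slide can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `stub_matching` and its arguments are assumptions made here. It breaks each vertex into as many stubs as its degree, joins stubs in uniformly random pairs, and then drops the stub labels by reporting only vertex pairs.

```python
import random
from collections import Counter


def stub_matching(degrees, rng=None):
    """Sample a pseudograph by joining stubs uniformly at random.

    degrees[v] is the degree of vertex v. Self-loops and multiedges may
    appear in the returned edge list.
    """
    rng = rng or random.Random()
    if sum(degrees) % 2:
        raise ValueError("the degrees must sum to an even number")
    # One stub per unit of degree; each stub remembers only its vertex.
    stubs = [v for v, k in enumerate(degrees) for _ in range(k)]
    # Shuffling and pairing consecutive stubs gives a uniform perfect matching.
    rng.shuffle(stubs)
    return [(stubs[i], stubs[i + 1]) for i in range(0, len(stubs), 2)]


def degree_sequence(edges, n):
    """Recompute degrees from an edge list (a self-loop counts twice)."""
    count = Counter()
    for u, v in edges:
        count[u] += 1
        count[v] += 1
    return [count[v] for v in range(n)]
```

For example, `stub_matching([1, 2, 2, 1])` returns three edges whose degree sequence is again 1, 2, 2, 1, possibly including a self-loop or a multiedge.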
Self-loops
[Figure: edges to stubs; join 2 stubs; drop stub labels, for a graph with self-loops]
Self-loops and Multiedges
• Self-loops and multiedges are asymptotically rare (for reasonable degree sequences)
• They have frequently been ignored, or simply deleted
• BUT – they can also have large impacts on finite sized null models
• AND – in null models which allow self-loops or multiedges, stub matching does not sample adjacency matrices uniformly at random
Interpretation requires a Null Model
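[Figure: model error versus implementation difficulty and/or required understanding, comparing replicated experiments, Erdős–Rényi, application specific simulation, and fixed degree sequence models, with question marks at the fixed degree sequence model. Choices shown: multiedges, self-loops, or simple; vertex-labeled or stub-labeled]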
Stub-labeled graphs are biased against multiedges and self-loops
Consider “1,2,2,1”
[Figure: graphs with degree sequence “1,2,2,1”. Annotations: “Uniformly samples from” (the object was an image); “d and e are the same”; “Uniform samples are different”]
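The “1,2,2,1” example is small enough to check by brute force. The sketch below (helper names are illustrative, not from the talk) enumerates every perfect matching of the six stubs and counts how many matchings collapse onto each vertex-labeled graph once stub labels are dropped.

```python
from collections import Counter


def perfect_matchings(stubs):
    """Yield every perfect matching of a list of stubs."""
    if not stubs:
        yield []
        return
    first, rest = stubs[0], stubs[1:]
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for matching in perfect_matchings(remaining):
            yield [(first, partner)] + matching


def stub_matching_counts(degrees):
    """Map each vertex-labeled graph to its number of stub matchings."""
    # A stub is a (vertex, index) pair so that stubs are distinguishable.
    stubs = [(v, s) for v, k in enumerate(degrees) for s in range(k)]
    counts = Counter()
    for matching in perfect_matchings(stubs):
        # Dropping the stub labels keeps only the sorted vertex pairs.
        graph = tuple(sorted(tuple(sorted((a[0], b[0]))) for a, b in matching))
        counts[graph] += 1
    return counts


counts = stub_matching_counts([1, 2, 2, 1])
for graph, count in sorted(counts.items(), key=lambda item: item[1]):
    print(count, graph)
```

Under these definitions there are 15 stub matchings but only 6 vertex-labeled graphs. The two simple paths each arise from 4 matchings, while the graph with two self-loops arises from just 1, so uniform stub matching is not uniform over vertex-labeled graphs.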
There’s a choice of graphs – and it matters
Example 1
• Geometer’s collaboration graph, n = 9,072, m = 22,577
• Nodes: computational geometry researchers.
• Edges: collaboration on a book or paper
• Degree assortativity
• Do high productivity authors coauthor with other high productivity authors?
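Degree assortativity can be computed as the Pearson correlation between the degrees at the two ends of each edge. The sketch below is one common formulation, written here for illustration. The function name is hypothetical, and the handling of self-loops (counted twice in the degree, contributing the pair (k, k)) is an assumption, not something stated in the talk.

```python
from collections import Counter


def degree_assortativity(edges):
    """Pearson correlation of the degrees at either end of an edge."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1  # a self-loop (u == v) adds 2 to degree[u]
    # Each edge is counted in both directions so the measure is symmetric.
    xs, ys = [], []
    for u, v in edges:
        xs += [degree[u], degree[v]]
        ys += [degree[v], degree[u]]
    n = len(xs)
    mean = sum(xs) / n
    covariance = sum(x * y for x, y in zip(xs, ys)) / n - mean * mean
    variance = sum(x * x for x in xs) / n - mean * mean
    if variance <= 1e-15:
        return float("nan")  # undefined when every degree is equal
    return covariance / variance
```

To compare against a null model, one would compute this statistic on the observed graph and on many graphs sampled from the chosen graph space with the same degree sequence.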
Multiedges, self-loops, and labeling?
Stub-labeling isn’t causal
Consider a collaboration network, and two potential stub labelings:
• Node i, Stub 1: first paper; Stub 2: second paper
• Node j, Stub 1: first paper; Stub 2: second paper
[Figure: the two stub labelings, each annotated with the node i and node j stub lists]
Vertex-Labeling is Causal
• Consider a collaboration network:
• Nodes: authors
• Edges: papers/books with unique title
• Suppose you order each edge’s arrival
• Each vertex labeled graph has m! edge orderings
• i.e. all adjacency matrices correspond to the same number of timelines where papers were produced in different orders.
Example 2
• Swallow graph n=17
• Nodes: barn swallows.
• Edges: bird-bird interactions
• Trait assortativity (based on bird color)
• Do birds of a similar color interact together?
Example 3
• South Indian village social support network n=782
• Nodes: villagers
• Edges: reported social support
• Community detection via modularity maximization
• Modularity has a built in stub-labeled Chung-Lu null model
• Do results change if we use vertex labeled model?
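To make the built-in null model explicit, here is a minimal sketch of the standard modularity score, in which the expected number of edges between vertices i and j is the Chung-Lu term k_i k_j / (2m). The function name and the `community` mapping are illustrative assumptions. The sketch covers only this standard form, not the vertex-labeled alternative the slide asks about.

```python
from collections import Counter


def modularity(edges, community):
    """Standard modularity with the k_i * k_j / (2m) null term.

    edges: list of (u, v) pairs; community: dict mapping vertex to label.
    """
    m = len(edges)
    degree = Counter()
    internal = Counter()  # edges with both ends in the same community
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    total_degree = Counter()  # summed degree of each community
    for node, k in degree.items():
        total_degree[community[node]] += k
    # Observed internal edge fraction minus the Chung-Lu expectation.
    return sum(
        internal[c] / m - (total_degree[c] / (2 * m)) ** 2
        for c in total_degree
    )
```

Swapping in a different null model would mean replacing the squared term with expected edge counts estimated from samples of the chosen graph space.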
[Figure: Chung-Lu estimation versus # of edges observed in configuration models]
Sampling graphs uniformly at random…
… is surprisingly difficult (except pseudo-graphs)
Sampling graphs uniformly at random
Sampling via Markov chain Monte Carlo
G0 G1 G2 G3 G4 G5
Goal: A sequence of degree constrained graphs such that subsampling
from this sequence approximates a set of graphs drawn uniformly at
random.
Double Edge Swaps
The Graph of Graphs (GoG)
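A double edge swap picks two edges and exchanges endpoints between them, which leaves every degree unchanged. The sketch below is an illustration with hypothetical names. It works on an edge list and accepts every swap, so it corresponds to moving through a space where self-loops and multiedges are allowed.

```python
import random
from collections import Counter


def double_edge_swap(edges, rng):
    """Return a new edge list after one random double edge swap."""
    i, j = rng.sample(range(len(edges)), 2)  # two distinct edge positions
    (u, v), (x, y) = edges[i], edges[j]
    new_edges = list(edges)
    # The two edges can be rewired in two ways; pick one at random.
    if rng.random() < 0.5:
        new_edges[i], new_edges[j] = (u, x), (v, y)
    else:
        new_edges[i], new_edges[j] = (u, y), (v, x)
    return new_edges


def degrees_of(edges):
    """Degree of every vertex (a self-loop counts twice)."""
    count = Counter()
    for u, v in edges:
        count[u] += 1
        count[v] += 1
    return count
```

Each graph is a node of the Graph of Graphs, and each possible swap is an edge out of that node.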
Dealing with Constraints
no self-loops
MCMC requirements
1. Random walks can reach any graph: irreducibility, i.e. the GoG is connected
2. Balanced transition probabilities: P(𝐺𝑖 → 𝐺𝑗) = P(𝐺𝑗 → 𝐺𝑖), i.e. GoG edges will be weighted but undirected
3. The Markov chain is aperiodic: otherwise subsampling can be biased
NOTE: There are mixing time results for some degree sequences. There are also numerical methods to gauge convergence. I will not discuss either.
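One way these requirements are commonly met for simple graphs is sketched below; the function name and parameters are assumptions made for illustration. Proposals are symmetric, and a swap that would create a self-loop or multiedge is rejected by staying at the current graph. Those rejected steps still count as steps of the chain.

```python
import random


def mcmc_simple_graphs(edges, n_steps, rng=None):
    """Run n_steps of a double edge swap chain on simple graphs."""
    rng = rng or random.Random()
    edges = [tuple(sorted(e)) for e in edges]
    edge_set = set(edges)
    for _ in range(n_steps):
        i, j = rng.sample(range(len(edges)), 2)
        (u, v), (x, y) = edges[i], edges[j]
        if rng.random() < 0.5:
            x, y = y, x  # choose one of the two possible rewirings
        a, b = tuple(sorted((u, x))), tuple(sorted((v, y)))
        # Reject swaps that would create a self-loop or a multiedge.
        # A rejected step leaves the chain at the current graph.
        if u == x or v == y or a == b or a in edge_set or b in edge_set:
            continue
        edge_set -= {edges[i], edges[j]}
        edge_set |= {a, b}
        edges[i], edges[j] = a, b
    return edges
```

In practice one would record the current graph every so many steps, including after rejected steps, to build the subsample described above. How many steps to leave between samples is a mixing time question that this sketch does not address.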
Is the GoG periodic? Nope!
Stub-labeled GoG: the GoG is an undirected simple graph
Vertex-labeled GoG: the GoG is a directed pseudograph
Are transition probabilities balanced?
[Figure: the stub-labeled GoG, which is an undirected simple graph, and the vertex-labeled GoG]
Is the GoG connected?
• Most difficult of the 3 questions
• Need a special proof for each choice of self-loops/multiedges
• The stub-labeled GoG is connected iff the vertex-labeled GoG is connected, because the following swap permutes stubs:
Connectivity of Graph of Pseudographs
[Figure: start, target, and their difference (diff). At each node, # of gold stubs = # of maroon stubs]
Connectivity of Graph of Pseudographs
Can always find a graph one edge closer to target
[Figure: a swap taking the start graph toward the target]
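The "one edge closer" argument for pseudographs can be turned into a constructive sketch. The code below is an illustration of the idea, not the paper's proof, and all names are hypothetical. It repeatedly picks an edge (u, v) that the target has and the current graph lacks, finds surplus edges at u and at v (these exist because the stub counts in the difference balance at every node), and swaps them to create (u, v).

```python
from collections import Counter


def norm(u, v):
    """Store an undirected edge with its endpoints in sorted order."""
    return (u, v) if u <= v else (v, u)


def other_end(edge, u):
    """Endpoint of edge opposite u (u itself for a self-loop)."""
    a, b = edge
    return b if a == u else a


def degrees(edge_counts):
    deg = Counter()
    for (a, b), k in edge_counts.items():
        deg[a] += k
        deg[b] += k  # a self-loop adds 2 per copy
    return deg


def swaps_to_target(start_edges, target_edges):
    """Find double edge swaps taking one pseudograph to another."""
    current = Counter(norm(u, v) for u, v in start_edges)
    target = Counter(norm(u, v) for u, v in target_edges)
    if degrees(current) != degrees(target):
        raise ValueError("graphs must share a degree sequence")
    swaps = []
    while True:
        missing = target - current  # edges only in the target
        if not missing:
            return swaps, sorted(current.elements())
        surplus = current - target  # edges only in the current graph
        u, v = next(iter(missing))
        e1 = next(e for e in surplus if u in e)
        surplus[e1] -= 1  # do not reuse the same copy of e1
        e2 = next(e for e in surplus if surplus[e] > 0 and v in e)
        x, y = other_end(e1, u), other_end(e2, v)
        # Swap (u, x), (v, y) into (u, v), (x, y).
        current[e1] -= 1
        current[e2] -= 1
        current[norm(u, v)] += 1
        current[norm(x, y)] += 1
        swaps.append((e1, e2, norm(u, v), norm(x, y)))
```

Each swap adds at least one target edge and removes only non-target edges, so the loop ends after at most m swaps. This relies on self-loops and multiedges being allowed; with constraints, an intermediate graph might leave the permitted space.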
Connectivity on other GoGs?
Disconnectivity of loopy graphs
Consider graphs with self-loops but no multiedges
There are no swaps between these graphs
Two directions for generalizations: cycles and cliques
Degree sequence: “2,2,…,2”
Swaps can:
1. Merge two cycles into a larger cycle (or do the reverse).
2. Swap two edges inside a cycle, preserving cycle length
3. Make a self-loop & reduce cycle length by 1 (or do the reverse), but only for cycles of length 4 or more.
Swaps cannot make every edge a self-loop
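For very small cases, this disconnectivity can be checked by exhaustive search. The sketch below (names are illustrative) explores every graph reachable by double edge swaps when self-loops are allowed but multiedges are not, using the degree sequence “2,2,…,2” on three and four vertices.

```python
from itertools import combinations


def norm(u, v):
    return (u, v) if u <= v else (v, u)


def loopy_neighbors(graph):
    """Graphs one swap away, allowing self-loops but no multiedges."""
    for (a, b), (c, d) in combinations(sorted(graph), 2):
        rest = graph - {(a, b), (c, d)}
        for new1, new2 in (
            (norm(a, c), norm(b, d)),
            (norm(a, d), norm(b, c)),
        ):
            # A repeated edge would be a multiedge, so skip that swap.
            if new1 != new2 and new1 not in rest and new2 not in rest:
                yield frozenset(rest | {new1, new2})


def reachable(start_edges):
    """All loopy graphs reachable from start_edges by swaps."""
    start = frozenset(norm(u, v) for u, v in start_edges)
    seen, stack = {start}, [start]
    while stack:
        graph = stack.pop()
        for neighbor in loopy_neighbors(graph):
            if neighbor not in seen:
                seen.add(neighbor)
                stack.append(neighbor)
    return seen


triangle = [(0, 1), (1, 2), (0, 2)]
three_loops = [(0, 0), (1, 1), (2, 2)]
print(len(reachable(triangle)), len(reachable(three_loops)))
```

On three vertices the triangle and the graph of three self-loops each reach only themselves. On four vertices the 4-cycle reaches 7 graphs (three 4-cycles and four triangle-plus-loop graphs) but never the graph in which every edge is a self-loop.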
This can be further generalized
1) Let V0 be vertices without a self-loop
2) Vertices in V1 have a neighbor in V0
3) Vk are vertices at distance k from a vertex in V0
A taxonomy of V
Let Vk be vertices k hops from a vertex without a self-loop
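The sets Vk can be computed with a breadth-first search started from every vertex that lacks a self-loop. A minimal sketch, with a hypothetical function name and an edge-list input assumed for illustration:

```python
from collections import defaultdict, deque


def vk_layers(edges):
    """Group vertices by hop distance from the nearest loop-free vertex."""
    neighbors = defaultdict(set)
    has_loop = set()
    for u, v in edges:
        if u == v:
            has_loop.add(u)
            neighbors[u]  # make sure the vertex is registered
        else:
            neighbors[u].add(v)
            neighbors[v].add(u)
    # V0: every vertex without a self-loop starts at distance 0.
    distance = {v: 0 for v in neighbors if v not in has_loop}
    queue = deque(distance)
    while queue:
        u = queue.popleft()
        for w in neighbors[u]:
            if w not in distance:
                distance[w] = distance[u] + 1
                queue.append(w)
    layers = defaultdict(set)
    for v, k in distance.items():
        layers[k].add(v)
    return dict(layers)
```

Vertices that cannot reach any loop-free vertex are left out of the result.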
Degree sequence: “n+1,…,n+1,n-1,…,n-1”
No swaps are possible
Q1 and Q2 are exactly the problems
Proof of 4.20 outline
[Figure: graphs with a fixed degree sequence, arranged by increasing number of self-loops and grouped into connected components]
[Figure: the same diagram, with the graphs with the most self-loops, the ‘m*-loopy’ graphs, shown in ‘yellow’]
Note: connectivity of [set shown in the figure] follows from connectivity of simple graphs and an exchange lemma.
The GoG is disconnected
iff there is some
component where:
U
Zooming into
Easy case:
Harder case:
What do we know about ?
Maximum number of self-loops implies no open wedges in V0, and no sequence of swaps can net create open wedges in V0.
Example: V4 is empty in any
Open Wedge
Q2
Q1
is m*-loopy
Decreasing any degree in K0
leaves Vu1 with excess degree.
is also m*-loopy
By an alternating cycle/path argument.
Thus
Q: Can a different swap connect loopy-graphs?
Triangle swaps connect the GoG
Bonus: other constraints
• Connected Graphs
• GoG known to be connected, but algorithms require complicated data-structures to track effect of edge changes.
• Graphs with the same clustering coefficients
• Or, triangle constraints
Triangle MCMC constraints
• Total number of triangles
• Number of triangles incident at each node
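Both triangle constraints are easy to evaluate on a candidate graph. The sketch below is a brute-force illustration suitable for small graphs, with a hypothetical function name. It returns the total number of triangles and the number of triangles incident at each node.

```python
from collections import defaultdict
from itertools import combinations


def triangle_counts(edges):
    """Return (total triangles, {node: triangles incident at node})."""
    neighbors = defaultdict(set)
    for u, v in edges:
        if u != v:  # self-loops cannot be part of a triangle
            neighbors[u].add(v)
            neighbors[v].add(u)
    per_node = {v: 0 for v in neighbors}
    total = 0
    # Check every vertex triple; fine for small graphs only.
    for u, v, w in combinations(sorted(neighbors), 3):
        if v in neighbors[u] and w in neighbors[u] and w in neighbors[v]:
            total += 1
            for node in (u, v, w):
                per_node[node] += 1
    return total, per_node
```

A constrained sampler could call this before and after a proposed swap and reject the swap if the chosen triangle statistic changes. Whether the resulting GoG stays connected is the question the slide raises.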
Do these affect connectedness in simple graphs?
Can we constrain number of triangles
How about triangle sequence
And more!
Thanks for listening!