subgraph frequencies - stanford universitystanford.edu/~jugander/papers/ · fitting λ to subgraph...

42
Johan Ugander, Lars Backstrom, Jon Kleinberg World Wide Web Conference May , Subgraph Frequencies: The Empirical and Extremal Geography of Large Graph Collections

Upload: others

Post on 16-Oct-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Subgraph Frequencies - Stanford Universitystanford.edu/~jugander/papers/ · Fitting λ to subgraph data How well can we fit λ? Subgraph frequencies are modeled very well by triadic

Johan Ugander, Lars Backstrom, Jon KleinbergWorld Wide Web ConferenceMay !", #$!%

Subgraph Frequencies:The Empirical and Extremal Geographyof Large Graph Collections

Page 2: Subgraph Frequencies - Stanford Universitystanford.edu/~jugander/papers/ · Fitting λ to subgraph data How well can we fit λ? Subgraph frequencies are modeled very well by triadic

▪ Neighborhoods: graph induced by friends of a single ego, excluding ego

Graph collections

All Friends

One-way Communication Mutual Communication

Maintained Relationships

Page 3: Subgraph Frequencies - Stanford Universitystanford.edu/~jugander/papers/ · Fitting λ to subgraph data How well can we fit λ? Subgraph frequencies are modeled very well by triadic

▪ Neighborhoods: graph induced by friends of a single ego, excluding ego

▪ Groups: graph induced by members of a Facebook ‘group‘

▪ Events: graph induced by ‘Yes’ respondents to a Facebook ‘event’

Graph collections

All Friends

One-way Communication Mutual Communication

Maintained Relationships

Page 4: Subgraph Frequencies - Stanford Universitystanford.edu/~jugander/papers/ · Fitting λ to subgraph data How well can we fit λ? Subgraph frequencies are modeled very well by triadic

▪ Neighborhoods: graph induced by friends of a single ego, excluding ego

▪ Groups: graph induced by members of a Facebook ‘group‘

▪ Events: graph induced by ‘Yes’ respondents to a Facebook ‘event’

Graph collections

All Friends

One-way Communication Mutual Communication

Maintained Relationships

Seeking a ‘coordinate system’ on these graphs

Page 5: Subgraph Frequencies - Stanford Universitystanford.edu/~jugander/papers/ · Fitting λ to subgraph data How well can we fit λ? Subgraph frequencies are modeled very well by triadic

SubgraphsAll Friends

One-way Communication Mutual Communication

Maintained Relationships

Page 6: Subgraph Frequencies - Stanford Universitystanford.edu/~jugander/papers/ · Fitting λ to subgraph data How well can we fit λ? Subgraph frequencies are modeled very well by triadic

SubgraphsAll Friends

One-way Communication Mutual Communication

Maintained Relationships

Page 7: Subgraph Frequencies - Stanford Universitystanford.edu/~jugander/papers/ · Fitting λ to subgraph data How well can we fit λ? Subgraph frequencies are modeled very well by triadic

SubgraphsAll Friends

One-way Communication Mutual Communication

Maintained Relationships

Page 8: Subgraph Frequencies - Stanford Universitystanford.edu/~jugander/papers/ · Fitting λ to subgraph data How well can we fit λ? Subgraph frequencies are modeled very well by triadic

SubgraphsAll Friends

One-way Communication Mutual Communication

Maintained Relationships

Compute frequencies

Page 9: Subgraph Frequencies - Stanford Universitystanford.edu/~jugander/papers/ · Fitting λ to subgraph data How well can we fit λ? Subgraph frequencies are modeled very well by triadic

Subgraph Frequencies▪ Definition: The subgraph frequency s(F,G) of a k-node subgraph F in a graph G

is the fraction of k-tuples of nodes in G that induce a copy of F.

Motifs/Frequent subgraphs: Inokuchi et al. #$$$, Milo et al. #$$#, Yan-Han #$$#, Kuramochi-Karypis #$$&Triad census: Davis-Leinhardt !'(!, Wasserman-Faust !''&

Page 10: Subgraph Frequencies - Stanford Universitystanford.edu/~jugander/papers/ · Fitting λ to subgraph data How well can we fit λ? Subgraph frequencies are modeled very well by triadic

Subgraph Frequencies▪ Definition: The subgraph frequency s(F,G) of a k-node subgraph F in a graph G

is the fraction of k-tuples of nodes in G that induce a copy of F.

▪ Subgraph frequency vectors:

s(·, G) = (y1, y2, y3, y4, y5, y6, y7, y8, y9, y10, y11)

s(·, G) = (x1, x2, x3, x4)

Motifs/Frequent subgraphs: Inokuchi et al. #$$$, Milo et al. #$$#, Yan-Han #$$#, Kuramochi-Karypis #$$&Triad census: Davis-Leinhardt !'(!, Wasserman-Faust !''&

= ($.!), $.%(, $.!&, $.%!)

Page 11: Subgraph Frequencies - Stanford Universitystanford.edu/~jugander/papers/ · Fitting λ to subgraph data How well can we fit λ? Subgraph frequencies are modeled very well by triadic

Subgraph Frequencies▪ Definition: The subgraph frequency s(F,G) of a k-node subgraph F in a graph G

is the fraction of k-tuples of nodes in G that induce a copy of F.

▪ Subgraph frequency vectors:

Motifs/Frequent subgraphs: Inokuchi et al. #$$$, Milo et al. #$$#, Yan-Han #$$#, Kuramochi-Karypis #$$&Triad census: Davis-Leinhardt !'(!, Wasserman-Faust !''&

= ($.!), $.%(, $.!&, $.%!)

s(·, G) = (y1, y2, y3, y4, y5, y6, y7, y8, y9, y10, y11)

s(·, G) = (x1, x2, x3, x4)

Page 12: Subgraph Frequencies - Stanford Universitystanford.edu/~jugander/papers/ · Fitting λ to subgraph data How well can we fit λ? Subgraph frequencies are modeled very well by triadic

Empirical/Extremal Questions

▪ Consider the subgraph frequencies as a ‘coordinate system’

▪ Empirical Geography:

▪ What subgraph frequencies do social graphs exhibit?

▪ Is there a good model?

▪ Extremal Geography:

▪ How much of this space is even feasible, combinatorially?

▪ Do empirical graphs fill the feasible space?

Page 13: Subgraph Frequencies - Stanford Universitystanford.edu/~jugander/papers/ · Fitting λ to subgraph data How well can we fit λ? Subgraph frequencies are modeled very well by triadic

Empirical/Extremal Questions

▪ What’s a property of graphs and what’s a property of people?

▪ Consider the subgraph frequencies as a ‘coordinate system’

▪ Empirical Geography:

▪ What subgraph frequencies do social graphs exhibit?

▪ Is there a good model?

▪ Extremal Geography:

▪ How much of this space is even feasible, combinatorially?

▪ Do empirical graphs fill the feasible space?

Page 14: Subgraph Frequencies - Stanford Universitystanford.edu/~jugander/papers/ · Fitting λ to subgraph data How well can we fit λ? Subgraph frequencies are modeled very well by triadic

What do we expect?

Page 15: Subgraph Frequencies - Stanford Universitystanford.edu/~jugander/papers/ · Fitting λ to subgraph data How well can we fit λ? Subgraph frequencies are modeled very well by triadic

tria

dic clo

sure

triadic closure

triadic closure

What do we expect?

Page 16: Subgraph Frequencies - Stanford Universitystanford.edu/~jugander/papers/ · Fitting λ to subgraph data How well can we fit λ? Subgraph frequencies are modeled very well by triadic

What do we expect?

We expect few wedges, many triangles for social networks.

Page 17: Subgraph Frequencies - Stanford Universitystanford.edu/~jugander/papers/ · Fitting λ to subgraph data How well can we fit λ? Subgraph frequencies are modeled very well by triadic

The triad space

Page 18: Subgraph Frequencies - Stanford Universitystanford.edu/~jugander/papers/ · Fitting λ to subgraph data How well can we fit λ? Subgraph frequencies are modeled very well by triadic

The triad space

You are here

*$ node graphsOrange - Neighborhoods Green - GroupsLavender - Events

Page 19: Subgraph Frequencies - Stanford Universitystanford.edu/~jugander/papers/ · Fitting λ to subgraph data How well can we fit λ? Subgraph frequencies are modeled very well by triadic

You are here

Gn,p

The triad space

*$ node graphsOrange - Neighborhoods Green - GroupsLavender - Events

Page 20: Subgraph Frequencies - Stanford Universitystanford.edu/~jugander/papers/ · Fitting λ to subgraph data How well can we fit λ? Subgraph frequencies are modeled very well by triadic

Subgraph frequency of

*$ node graphsOrange - Neighborhoods Green - GroupsLavender - Events

Page 21: Subgraph Frequencies - Stanford Universitystanford.edu/~jugander/papers/ · Fitting λ to subgraph data How well can we fit λ? Subgraph frequencies are modeled very well by triadic

Subgraph frequency of

*$ node graphsOrange - Neighborhoods Green - GroupsLavender - Events

Page 22: Subgraph Frequencies - Stanford Universitystanford.edu/~jugander/papers/ · Fitting λ to subgraph data How well can we fit λ? Subgraph frequencies are modeled very well by triadic

Subgraph frequency of

Gn,p

*$ node graphsOrange - Neighborhoods Green - GroupsLavender - Events

Page 23: Subgraph Frequencies - Stanford Universitystanford.edu/~jugander/papers/ · Fitting λ to subgraph data How well can we fit λ? Subgraph frequencies are modeled very well by triadic

Subgraph frequency of

*$ node graphsOrange - Neighborhoods Green - GroupsLavender - Events

ExtremalGraph Theory

Page 24: Subgraph Frequencies - Stanford Universitystanford.edu/~jugander/papers/ · Fitting λ to subgraph data How well can we fit λ? Subgraph frequencies are modeled very well by triadic

Subgraph frequency ofFrequency of the ‘forbidden triad’ is bounded at ! "/#.

Sharp for Kn/#,n/# (bipartite graph) when n is even.

*$ node graphsOrange - Neighborhoods Green - GroupsLavender - Events

Page 25: Subgraph Frequencies - Stanford Universitystanford.edu/~jugander/papers/ · Fitting λ to subgraph data How well can we fit λ? Subgraph frequencies are modeled very well by triadic

Subgraph frequencies

Page 26: Subgraph Frequencies - Stanford Universitystanford.edu/~jugander/papers/ · Fitting λ to subgraph data How well can we fit λ? Subgraph frequencies are modeled very well by triadic

‘Crowd-sourced’ inner bounds

Consider all social graphs and the complements of all graphs, anti-social graphs (which are also graphs!)

Page 27: Subgraph Frequencies - Stanford Universitystanford.edu/~jugander/papers/ · Fitting λ to subgraph data How well can we fit λ? Subgraph frequencies are modeled very well by triadic

What graphs are missing?

Page 28: Subgraph Frequencies - Stanford Universitystanford.edu/~jugander/papers/ · Fitting λ to subgraph data How well can we fit λ? Subgraph frequencies are modeled very well by triadic

▪ Square unlikely to form:

Triadic Closure and Squares

Page 29: Subgraph Frequencies - Stanford Universitystanford.edu/~jugander/papers/ · Fitting λ to subgraph data How well can we fit λ? Subgraph frequencies are modeled very well by triadic

▪ Square unlikely to form:

Triadic Closure and Squares

Page 30: Subgraph Frequencies - Stanford Universitystanford.edu/~jugander/papers/ · Fitting λ to subgraph data How well can we fit λ? Subgraph frequencies are modeled very well by triadic

▪ Square unlikely to form:

▪ Square has very short ‘half-life’:

Triadic Closure and Squares

Page 31: Subgraph Frequencies - Stanford Universitystanford.edu/~jugander/papers/ · Fitting λ to subgraph data How well can we fit λ? Subgraph frequencies are modeled very well by triadic

Continuous Time Markov Chain Model

Page 32: Subgraph Frequencies - Stanford Universitystanford.edu/~jugander/papers/ · Fitting λ to subgraph data How well can we fit λ? Subgraph frequencies are modeled very well by triadic

tria

dic clo

sure

triadic closure

triadic closure

Continuous Time Markov Chain Model

Page 33: Subgraph Frequencies - Stanford Universitystanford.edu/~jugander/papers/ · Fitting λ to subgraph data How well can we fit λ? Subgraph frequencies are modeled very well by triadic

Edge Formation Random Walk (EFRW)▪ Continuous-time Markov chain▪ Transitions between unlabeled, undirected graphs based in edge formation.

▪ Independent Poisson processes for all node pairs:▪ Arbitrary formation: rate ɣ > $ ▪ Arbitrary deletion: rate δ > $

▪ Triadic closure formation for each wedge: rate λ + $

Page 34: Subgraph Frequencies - Stanford Universitystanford.edu/~jugander/papers/ · Fitting λ to subgraph data How well can we fit λ? Subgraph frequencies are modeled very well by triadic

Edge Formation Random Walk (EFRW)▪ Continuous-time Markov chain▪ Transitions between unlabeled, undirected graphs based in edge formation.

▪ Independent Poisson processes for all node pairs:▪ Arbitrary formation: rate ɣ > $ ▪ Arbitrary deletion: rate δ > $

▪ Triadic closure formation for each wedge: rate λ + $

▪ For &-node graphs, succinct Markov chain state transition diagram:

6�

�4�

2�

2�

3�

3�

2�

�4�

�+�

2�

3(�+�)

6�3�

4�2(�+�)

4�2(�+

�)

2(�+2�)�+�

2��

Page 35: Subgraph Frequencies - Stanford Universitystanford.edu/~jugander/papers/ · Fitting λ to subgraph data How well can we fit λ? Subgraph frequencies are modeled very well by triadic

Edge Formation Random Walk (EFRW)▪ Continuous-time Markov chain▪ Transitions between unlabeled, undirected graphs based in edge formation.

▪ Independent Poisson processes for all node pairs:▪ Arbitrary formation: rate ɣ > $ ▪ Arbitrary deletion: rate δ > $

▪ Triadic closure formation for each wedge: rate λ + $

▪ For &-node graphs, succinct Markov chain state transition diagram:

6�

�4�

2�

2�

3�

3�

2�

�4�

�+�

2�

3(�+�)

6�3�

4�2(�+�)

4�2(�+

�)

2(�+2�)�+�

2��

Page 36: Subgraph Frequencies - Stanford Universitystanford.edu/~jugander/papers/ · Fitting λ to subgraph data How well can we fit λ? Subgraph frequencies are modeled very well by triadic

Fitting λ to subgraph data▪ How well can we fit λ?

▪ Subgraph frequencies are modeled very well by triadic closure.

frequ

ency

0.00

10.

010

0.10

01.

000

Neighborhoods, n=50

●●

Neighborhoods data, meanFit model, λ ν = 19.37

frequ

ency

0.00

10.

010

0.10

01.

000

Groups, n=50

Groups data, meanFit model, λ ν = 7.02

●●

● ●

● ●

frequ

ency

0.00

10.

010

0.10

01.

000

Events, n=50●

● ●

● ●

Events data, meanFit model, λ ν = 7.38

(log-scale y-axis)

Page 37: Subgraph Frequencies - Stanford Universitystanford.edu/~jugander/papers/ · Fitting λ to subgraph data How well can we fit λ? Subgraph frequencies are modeled very well by triadic

Extremal graph theory▪ Subgraph frequencies s(F,G) closely related to homomorphism density t(F,G).

▪ Frequency of cliques, lower bounds: Moon-Moser !'"#, Razborov #$$)▪ Frequency of cliques, upper bounds: Kruskal-Katona Theorem▪ Frequency of trees: Sidorenko Conjecture (‘Theorem for trees’)▪ Also linear relationships across sizes.▪ => Linear Program!

[Borgs et al. $%%&, Lovasz $%%']

Page 38: Subgraph Frequencies - Stanford Universitystanford.edu/~jugander/papers/ · Fitting λ to subgraph data How well can we fit λ? Subgraph frequencies are modeled very well by triadic

▪ A proposition for all subgraphs:

Proposition. For every k, there exist constants ✏ and n0 such that the following

holds. If F is a k-node subgraph that is not a clique and not empty, and G is

any graph on n � n0 nodes, then s(F,G) < 1� ✏.

Extremal graph theory

Page 39: Subgraph Frequencies - Stanford Universitystanford.edu/~jugander/papers/ · Fitting λ to subgraph data How well can we fit λ? Subgraph frequencies are modeled very well by triadic

▪ How do different audience graphs differ?

Audience graph classification

20 50 100 200 500 1000

0.05

0.10

0.20

0.50

1.00

size

Aver

age

edge

den

sity

NeighborhoodsNeighborhoods + egoGroupsEvents

40075

Page 40: Subgraph Frequencies - Stanford Universitystanford.edu/~jugander/papers/ · Fitting λ to subgraph data How well can we fit λ? Subgraph frequencies are modeled very well by triadic

▪ How do different audience graphs differ?

▪ Classification challenges A) (*-node neigh. vs. (*-node events B) &$$-node neigh. vs. &$$-node groups

Audience graph classification

20 50 100 200 500 1000

0.05

0.10

0.20

0.50

1.00

size

Aver

age

edge

den

sity

NeighborhoodsNeighborhoods + egoGroupsEvents

40075

Page 41: Subgraph Frequencies - Stanford Universitystanford.edu/~jugander/papers/ · Fitting λ to subgraph data How well can we fit λ? Subgraph frequencies are modeled very well by triadic

▪ How do different audience graphs differ?

▪ Classification challenges A) (*-node neigh. vs. (*-node events B) &$$-node neigh. vs. &$$-node groups

▪ Features: Quad frequencies : (", / (", accuracy Global features: "', / (", accuracy Quad frequencies + Global features: )!, / )#, accuracy

Audience graph classification

20 50 100 200 500 1000

0.05

0.10

0.20

0.50

1.00

size

Aver

age

edge

den

sity

NeighborhoodsNeighborhoods + egoGroupsEvents

40075

Page 42: Subgraph Frequencies - Stanford Universitystanford.edu/~jugander/papers/ · Fitting λ to subgraph data How well can we fit λ? Subgraph frequencies are modeled very well by triadic

▪ Subgraph frequencies usefully characterize social graphs, have extremal limits!

▪ Edge Formation Random Walk model of dense social graphs:

▪ Homomorphism density bounds yield subgraph density bounds:

Conclusions

6�

�4�

2�

2�

3�

3�

2�

�4�

�+�

2�

3(�+�)

6�3�

4�2(�+�)

4�2(�+

�)

2(�+2�)�+�

2��