Dynamics of Real-world Networks
Jure Leskovec
Machine Learning Department, Carnegie Mellon University
http://www.cs.cmu.edu/~jure



Committee members: Christos Faloutsos, Avrim Blum, Jon Kleinberg, John Lafferty


Network dynamics

[Figure: example networks. Internet; yeast protein interactions; Web & citations; sexual network; friendship network; food web (who-eats-whom).]


Large real-world networks
Instant messenger network: N = 180 million nodes, E = 1.3 billion edges
Blog network: N = 2.5 million nodes, E = 5 million edges
Autonomous systems: N = 6,500 nodes, E = 26,500 edges
Citation network of physics papers: N = 31,000 nodes, E = 350,000 edges
Recommendation network: N = 3 million nodes, E = 16 million edges


Questions we ask
Do networks follow patterns as they grow?
How to generate realistic graphs?
How does influence spread over the network (chains, stars)?
How to find/select nodes to detect cascades?


Our work: Network dynamics
Our research focuses on analyzing and modeling the structure, evolution and dynamics of large real-world networks
Evolution: growth and evolution of networks
Cascades: processes taking place on networks


Our work: Goals
3 parts / goals
G1: What are interesting statistical properties of network structure? (e.g., 6-degrees)
G2: What is a good tractable model? (e.g., preferential attachment)
G3: Use models and findings to predict future behavior (e.g., node immunization)


Our work: Overview

Columns: S1: Dynamics of network evolution; S2: Dynamics of processes on networks
Rows: G1: Patterns; G2: Models; G3: Predictions


Our work: Overview

Columns: S1: Dynamics of network evolution; S2: Dynamics of processes on networks
G1: Patterns. S1: KDD ’05, TKDD ’07. S2: PKDD ’06, ACM EC ’06
G2: Models. S1: KDD ’05, PAKDD ’05. S2: SDM ’07, TWEB ’07
G3: Predictions. S1: KDD ’06, ICML ’07. S2: WWW ’07, submission to KDD


Our work: Impact and applications

Structural properties: abnormality detection
Graph models: graph generation; graph sampling and extrapolations; anonymization
Cascades: node selection and targeting; outbreak detection


Outline
Introduction
Completed work
  S1: Network structure and evolution
  S2: Network cascades
Proposed work
  Kronecker time evolving graphs
  Large online communication networks
  Links and information cascades
Conclusion


Completed work: Overview

Columns: S1: Dynamics of network evolution; S2: Dynamics of processes on networks
G1: Patterns. S1: densification, shrinking diameters. S2: cascade shape and size
G2: Models. S1: Forest Fire, Kronecker graphs. S2: cascade generation model
G3: Predictions. S1: estimating Kronecker parameters. S2: selecting nodes for detecting cascades



G1 - Patterns: Densification
What is the relation between the number of nodes and the number of edges over time?
Networks become denser over time
Densification Power Law: $E(t) \propto N(t)^{a}$
a is the densification exponent, 1 ≤ a ≤ 2:
  a = 1: linear growth, constant degree
  a = 2: quadratic growth, clique
[Figure: log E(t) versus log N(t). Internet: a = 1.2. Citations: a = 1.7.]
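A densification exponent like the ones quoted on this slide can be read off as the slope of a straight-line fit on log-log axes. The sketch below is only an illustration: the function name and the snapshot numbers are made up, and an ordinary least-squares fit is assumed rather than taken from the talk.

```python
import math

def fit_densification_exponent(nodes, edges):
    """Least-squares slope and intercept of log E(t) against log N(t)."""
    xs = [math.log(n) for n in nodes]
    ys = [math.log(e) for e in edges]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical snapshots that follow E(t) = 2 * N(t)^1.5 exactly.
nodes = [100, 400, 1600, 6400]
edges = [2 * n ** 1.5 for n in nodes]
a, log_c = fit_densification_exponent(nodes, edges)
print(f"densification exponent a = {a:.2f}")
```

On real snapshots the points will not lie exactly on a line, so the fitted slope is an estimate of a rather than an exact value.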


G1 - Patterns: Shrinking diameters
Intuition and prior work say that distances between nodes grow slowly as the network grows (like log N)
Diameter shrinks or stabilizes over time: as the network grows, the distances between nodes slowly decrease
[Figure: diameter versus time / size of the graph, for the Internet and Citations networks.]


G2 - Models: Kronecker graphs
We want a model that can generate a realistic graph with realistic growth:
  Patterns for static networks
  Patterns for evolving networks
The model should be
  analytically tractable: we can prove properties of graphs the model generates
  computationally tractable: we can estimate parameters


Idea: Recursive graph generation
Try to mimic recursive graph/community growth, because self-similarity leads to power laws
There are many obvious (but wrong) ways:
  [Figure: initial graph, recursive expansion.]
  Does not densify, has increasing diameter
Kronecker product is a way of generating self-similar matrices


Kronecker product: Graph
[Figure: a graph with its (3x3) adjacency matrix, an intermediate stage, and the resulting (9x9) adjacency matrix.]


Kronecker product: Graph
Continuing to multiply with G1, we obtain G4 and so on …
[Figure: G4 adjacency matrix.]


Properties of Kronecker graphs
We show that Kronecker multiplication generates graphs that have:
Properties of static networks
  Power-law degree distribution
  Power-law eigenvalue and eigenvector distribution
  Small diameter
Properties of dynamic networks
  Densification Power Law
  Shrinking / stabilizing diameter
This means the “shapes” of the distributions match, but the properties are not independent
How do we set the initiator to match the real graph?


G3 - Predictions: The problem

We want to generate realistic networks:

G1) What are the relevant properties?
G2) What is a good tractable model?
G3) How can we fit the model (find parameters)?
[Diagram: given a real network, generate a synthetic network, then compare some property, e.g., degree distribution.]


Model estimation: approach

Maximum likelihood estimation
Given a real graph G
Estimate the Kronecker initiator graph Θ (e.g., 3x3) which maximizes the likelihood: $\arg\max_{\Theta} P(G \mid \Theta)$
We need to (efficiently) calculate $P(G \mid \Theta)$
And maximize over Θ
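To make the quantity P(G | Θ) concrete, here is a minimal sketch that evaluates its logarithm for one fixed labelling of the nodes, assuming each possible edge is an independent Bernoulli draw with probability taken from the k-th Kronecker power of Θ. The helper names and the 2x2 initiator values are hypothetical, and the sketch deliberately ignores the search over node orderings as well as the speed-ups discussed in the talk; it costs O(N²).

```python
import math

def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [[x * y for x in row_a for y in row_b]
            for row_a in a for row_b in b]

def kronecker_power(theta, k):
    """k-th Kronecker power of the initiator matrix theta (k >= 1)."""
    result = theta
    for _ in range(k - 1):
        result = kron(result, theta)
    return result

def log_likelihood(adjacency, theta, k):
    """log P(G | theta) for one fixed node labelling (no permutation search)."""
    probs = kronecker_power(theta, k)
    total = 0.0
    for row_a, row_p in zip(adjacency, probs):
        for has_edge, p in zip(row_a, row_p):
            total += math.log(p) if has_edge else math.log(1.0 - p)
    return total

theta = [[0.9, 0.5], [0.5, 0.1]]   # hypothetical 2x2 initiator
graph = [[1, 1], [1, 0]]           # tiny 2-node example graph
ll = log_likelihood(graph, theta, k=1)
ll_flat = log_likelihood(graph, [[0.5, 0.5], [0.5, 0.5]], k=1)
print(ll > ll_flat)  # the better-matching initiator scores higher
```

Maximizing over Θ would wrap this evaluation in an optimizer; which optimizer to use is left open here.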


Model estimation: solution
Naïvely estimating the Kronecker initiator takes $O(N!\,N^2)$ time:
  N! for graph isomorphism. Metropolis sampling: N! -> (big) const
  N^2 for traversing the graph adjacency matrix. Properties of the Kronecker product and sparsity (E << N^2): N^2 -> E
We can estimate the parameters in linear time O(E)


Model estimation: experiments
Autonomous systems (Internet): N = 6,500, E = 26,500
Fitting takes 20 minutes
The AS graph is undirected, and the estimated parameters correspond to that
[Figure: two panels. Degree distribution: log count versus log degree. Hop plot: log # of reachable pairs versus number of hops; diameter = 4.]


Model estimation: experiments

[Figure: two panels. Scree plot: log eigenvalue versus log rank. Network value: log 1st eigenvector versus log rank.]



Information cascades
Cascades are phenomena in which an idea becomes adopted due to influence by others
We investigate cascade formation in:
  Viral marketing (word of mouth)
  Blogs
[Figure: a social network and a cascade (propagation graph).]


Cascades: Questions
What kinds of cascades arise frequently in real life? Are they like trees, stars, or something else?
What is the distribution of cascade sizes (exponential tail / heavy-tailed)?
When is a person going to follow a recommendation?


Cascades in viral marketing
Senders and followers of recommendations receive discounts on products
[Figure labels: 10% credit; 10% off.]
Recommendations are made at the time of purchase
Data: 3 million people, 16 million recommendations, 500k products (books, DVDs, videos, music)


Product recommendation network
[Figure legend: purchase following a recommendation; customer recommending a product; customer not buying a recommended product.]


G1 - Viral cascade shapes
Stars (“no propagation”)
Bipartite cores (“common friends”)
Nodes having same friends


G1 - Viral cascade sizes
Count how many people are in a single cascade
We observe a heavy-tailed distribution which cannot be explained by a simple branching process
[Figure (books): log count versus log cascade size, with fit $1.8 \times 10^{6}\, x^{-4.98}$, $R^2 = 0.99$. Annotations: steep drop-off; very few large cascades.]


Does receiving more recommendations increase the likelihood of buying?
[Figure: two panels, BOOKS and DVDs. Probability of buying versus number of incoming recommendations.]


Cascades in the blogosphere

Posts are time stamped
We can identify cascades: graphs induced by a time-ordered propagation of information
[Figure: three panels. Blogosphere (blogs B1-B4 and their posts a-e); Post network (links among posts); Extracted cascades.]


G1 - Blog cascade shapes
[Figure: cascade shapes, ordered by frequency.]
Cascades are mainly stars
Interesting relation between the cascade frequency and structure


G1 - Blog cascade size
Count how many posts participate in cascades
Blog cascades tend to be larger than viral marketing cascades
[Figure: log count versus log cascade size, with fit $3.6 \times 10^{4}\, x^{-2.01}$, $R^2 = 0.94$. Annotations: shallow drop-off; some large cascades.]


G2 - Blog cascades: model
A simple virus-propagation type of model (SIS) generates cascades similar to those found in real life
[Figure: four panels of count versus cascade size, cascade node in-degree, size of star cascade, and size of chain cascade. Inset: blogs B1-B4.]
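A rough sketch of what an SIS-style cascade generator could look like is given below. The slide does not spell out the model, so everything specific here is an assumption: a single infection probability beta, a uniformly random starting blog, each infected blog getting one chance to infect the blogs that link to it, and a cap on cascade size. The function and variable names are made up for illustration.

```python
import random

def generate_cascade(in_links, beta, rng, max_edges=1000):
    """Return (start_blog, cascade_edges) for one simulated cascade.

    in_links maps each blog to the list of blogs that link to it.
    """
    start = rng.choice(sorted(in_links))
    edges = []
    infected = [start]
    while infected and len(edges) < max_edges:
        u = infected.pop()
        for v in in_links.get(u, ()):
            if len(edges) < max_edges and rng.random() < beta:
                edges.append((v, u))   # v picked the information up from u
                infected.append(v)
        # u is not marked immune: in an SIS-style process it may be
        # infected again later, which is why max_edges bounds the run.
    return start, edges

# Hypothetical four-blog network (names echo the B1-B4 labels of the figure).
in_links = {"B1": ["B2", "B3"], "B2": ["B4"], "B3": ["B4"], "B4": []}
start, cascade = generate_cascade(in_links, beta=0.5, rng=random.Random(7))
print(start, cascade)
```

Running the generator many times and tabulating cascade sizes and shapes would give the kind of count distributions shown in the figure panels.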


G3- Node selection for cascade detection

Observing cascades, we want to select a set of nodes to quickly detect cascades
Given a limited budget of attention/sensors:
  Which blogs should one read to be most up to date?
  Where should we position monitoring stations to quickly detect disease outbreaks?


Node selection: algorithm
Node selection is NP-hard
We exploit submodularity of objective functions to
  develop scalable node selection algorithms
  give performance guarantees
In practice our solution is at most 5-15% from optimal
[Figure: solution quality versus number of blogs. Curves: worst-case bound; our solution.]
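As an illustration of why submodularity helps, the sketch below runs the standard greedy heuristic on a simple coverage objective: the number of distinct cascades detected by the chosen blogs. For monotone submodular objectives under a cardinality budget, greedy selection is known to reach at least a (1 - 1/e) fraction of the optimum. The objective, the data, and the function name are assumptions made for this example; the talk's actual objectives and algorithm may differ.

```python
def greedy_select(coverage, budget):
    """Pick up to `budget` nodes, each time adding the largest marginal gain.

    coverage maps a node to the set of cascades that node would detect.
    """
    selected = []
    covered = set()
    for _ in range(budget):
        best, best_gain = None, 0
        for node, cascades in coverage.items():
            if node in selected:
                continue
            gain = len(cascades - covered)   # newly detected cascades
            if gain > best_gain:
                best, best_gain = node, gain
        if best is None:                     # nothing left to gain
            break
        selected.append(best)
        covered |= coverage[best]
    return selected, covered

# Hypothetical data: which cascades (numbered 1-7) pass through which blog.
coverage = {
    "blog_a": {1, 2, 3},
    "blog_b": {3, 4},
    "blog_c": {4, 5, 6, 7},
    "blog_d": {1, 7},
}
chosen, detected = greedy_select(coverage, budget=2)
print(chosen, sorted(detected))
```

Each round rescans every candidate, which is fine for a toy example; scaling to large graphs would need a smarter evaluation order.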


Outline
Introduction
Completed work
  Network structure and evolution
  Network cascades
Proposed work
  Large communication networks
  Links and information cascades
  Kronecker time evolving graphs
Conclusion


Proposed work: Overview

Columns: S1: Dynamics of network evolution; S2: Dynamics of processes on networks
G1: Patterns. (1) Dynamics in communication networks
G2: Models. (2) Models of link and cascade creation
G3: Predictions. (3) Kronecker time evolving graphs


Proposed work: Communication networks (1)
Large communication network
  1 billion conversations per day, 3 TB of data!
  How do communication and network properties change with user demographics (age, location, sex, distance)?
Test 6 degrees of separation
Examine transitivity in the network


Proposed work: Communication networks (1)
Preliminary experiment: distribution of shortest path lengths
Microsoft Messenger network
  200 million people
  1.3 billion edges
  Edge if two people exchanged at least one message in a one-month period
Pick a random node, count how many nodes are at distance 1, 2, 3, ... hops
[Figure (MSN Messenger network): log number of nodes versus distance (hops). Annotation: 7.]
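The counting step described on this slide (pick a node, count how many nodes sit at 1, 2, 3, ... hops) is a breadth-first search. A minimal sketch on a made-up five-node graph follows; the function name and the toy adjacency lists are illustrative only.

```python
from collections import deque

def hop_counts(adjacency, source):
    """Return {distance: number of nodes at that distance} from source."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adjacency.get(u, ()):
            if v not in dist:            # first visit gives the shortest distance
                dist[v] = dist[u] + 1
                queue.append(v)
    counts = {}
    for d in dist.values():
        counts[d] = counts.get(d, 0) + 1
    return counts

# Hypothetical undirected graph stored as adjacency lists.
graph = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}
counts = hop_counts(graph, 0)
print(counts)
```

Repeating this from many randomly chosen sources and averaging the counts gives an estimate of the shortest-path length distribution.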


Proposed work: Links & cascades (2)
Given labeled nodes, how do links and cascades form?
Propagation of information
  Do blogs have particular cascading properties?
Propagation of trust
  Social network of professional acquaintances
  7 million people, 50 million edges
  Rich temporal and network information
  How do various factors (profession, education, location) influence link creation?
  How do invitations propagate?


Proposed work: Kronecker graphs (3)
Graphs with weighted edges
  Move beyond the Bernoulli edge generation model
Algorithms for estimating parameters of time evolving networks
  Allow parameters to slowly evolve over time: $\Theta_t \to \Theta_{t+1} \to \Theta_{t+2}$


Timeline
May ’07: communication network (1)
Jun-Aug ’07: research on on-line time evolving networks
Sept-Dec ’07: cascade formation and link prediction (2)
Jan-Apr ’08: Kronecker time evolving graphs (3)
Apr-May ’08: write the thesis
Jun ’08: thesis defense


References
Graphs over Time: Densification Laws, Shrinking Diameters and Possible Explanations, by Jure Leskovec, Jon Kleinberg, Christos Faloutsos, ACM KDD 2005
Graph Evolution: Densification and Shrinking Diameters, by Jure Leskovec, Jon Kleinberg, Christos Faloutsos, ACM TKDD 2007
Realistic, Mathematically Tractable Graph Generation and Evolution, Using Kronecker Multiplication, by Jure Leskovec, Deepay Chakrabarti, Jon Kleinberg, Christos Faloutsos, PKDD 2005
Scalable Modeling of Real Graphs using Kronecker Multiplication, by Jure Leskovec, Christos Faloutsos, ICML 2007
The Dynamics of Viral Marketing, by Jure Leskovec, Lada Adamic, Bernardo Huberman, ACM EC 2006
Cost-effective outbreak detection in networks, by Jure Leskovec, Andreas Krause, Carlos Guestrin, Christos Faloutsos, Jeanne VanBriesen, Natalie Glance, in submission to KDD 2007
Cascading behavior in large blog graphs, by Jure Leskovec, Mary McGlohon, Christos Faloutsos, Natalie Glance, Matthew Hurst, SIAM DM 2007

Acknowledgements: Christos Faloutsos, Mary McGlohon, Jon Kleinberg, Zoubin Ghahramani, Pall Melsted, Andreas Krause, Carlos Guestrin, Deepay Chakrabarti, Marko Grobelnik, Dunja Mladenic, Natasa Milic-Frayling, Lada Adamic, Bernardo Huberman, Eric Horvitz, Susan Dumais


Backup slides


Proposed work: Kronecker graphs (1)
Further analysis of Kronecker graphs
  Prove properties of the diameter of Stochastic Kronecker Graphs
Extend Kronecker to generate graphs with any number of nodes
  Currently Kronecker can generate graphs with $N^k$ nodes
  Idea: expand only one row/column of the current adjacency matrix


Proposed work: GraphGarden (5)
Publicly release a library for mining large graphs
  Developed during our research
  40,000 lines of C++ code
Components
  Properties of static and evolving networks
  Graph generation and model fitting
  Graph sampling
  Analysis of cascades
  Node placement/selection


Proposed work
Columns: Dynamics of network evolution; Dynamics of processes on networks
Structural properties: Dynamics in communication networks
Models: Analysis and extensions of Kronecker graphs; Models of link and cascade creation
Predictions: Kronecker time evolving graphs
Release the graph mining toolkit
The five items are numbered 1 to 5 on the slide.


The model: Forest Fire Model
Want to model graphs that densify and have shrinking diameters
Intuition:
  How do we meet friends at a party?
  How do we identify references when writing papers?
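The intuition above (a newcomer meets one person, then follows some of that person's contacts, and so on) can be sketched as a small simulation. This is a simplified variant written for illustration, not the exact model from the talk: the slides name a forward burning probability p and a backward probability r but do not give the burning rule, so the independent coin flip per link and the use of r * p for in-links are assumptions.

```python
import random

def forest_fire(n_nodes, p, r, rng):
    """Grow a directed graph; returns {node: set of nodes it links to}."""
    out_links = {0: set()}
    in_links = {0: set()}
    for v in range(1, n_nodes):
        ambassador = rng.randrange(v)      # uniformly chosen existing node
        burned = {ambassador}
        frontier = [ambassador]
        while frontier:
            w = frontier.pop()
            # Follow w's out-links with probability p and its in-links
            # with the smaller probability r * p (assumed rule).
            candidates = [(x, p) for x in sorted(out_links[w])]
            candidates += [(x, r * p) for x in sorted(in_links[w])]
            for x, prob in candidates:
                if x not in burned and rng.random() < prob:
                    burned.add(x)
                    frontier.append(x)
        out_links[v] = burned              # newcomer links to every burned node
        in_links[v] = set()
        for x in burned:
            in_links[x].add(v)
    return out_links

graph = forest_fire(200, p=0.35, r=0.3, rng=random.Random(0))
n_edges = sum(len(targets) for targets in graph.values())
print(len(graph), n_edges)
```

Recording the node and edge counts after each arrival would let one check whether a given (p, r) setting densifies.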


Properties of the Forest Fire
Heavy-tailed in-degrees: “rich get richer”
  Highly linked nodes can easily be reached
Communities
  Newcomer copies several of neighbors’ links
Heavy-tailed out-degrees
  Recursive nature provides chance for node to burn many edges
Densification Power Law
  Like in Community Guided Attachment
Shrinking diameter
  Densification helps but is not enough


Forest Fire Model
Forest Fire generates graphs that densify and have shrinking diameter
[Figure: two panels. Densification: E(t) versus N(t), slope 1.32. Diameter: diameter versus N(t).]


Forest Fire: Parameter Space
Fix backward probability r and vary forward burning probability p
We observe a sharp transition between sparse and clique-like graphs
Sweet spot is very narrow
[Figure labels: sparse graph; clique-like graph; increasing diameter; decreasing diameter; constant diameter.]


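Kronecker product: Definition
The Kronecker product of an N x M matrix A and a K x L matrix B is the N*K x M*L matrix $A \otimes B$ obtained by replacing each entry $a_{ij}$ of A with the block $a_{ij} B$
We define the Kronecker product of two graphs as the Kronecker product of their adjacency matrices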
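For reference, the standard block form of the Kronecker product, written with the dimensions used on this slide (A is N x M, B is K x L), is:

```latex
A \otimes B =
\begin{pmatrix}
a_{1,1} B & a_{1,2} B & \cdots & a_{1,M} B \\
a_{2,1} B & a_{2,2} B & \cdots & a_{2,M} B \\
\vdots    & \vdots    & \ddots & \vdots    \\
a_{N,1} B & a_{N,2} B & \cdots & a_{N,M} B
\end{pmatrix}
\in \mathbb{R}^{NK \times ML}
```

Each entry of A scales a full copy of B, which is where the self-similar block structure of the resulting adjacency matrix comes from.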


Kronecker graphs
We propose a growing sequence of graphs by iterating the Kronecker product: $G_k = G_{k-1} \otimes G_1$
Each Kronecker multiplication exponentially increases the size of the graph
$G_k$ has $N_1^k$ nodes and $E_1^k$ edges, so we get densification
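The node and edge counts can be checked numerically. The sketch below iterates the Kronecker product of a small, made-up initiator (a 3-node chain with self-loops) and counts non-zero adjacency entries as edges, so self-loops and both directions of an undirected edge are counted. Since $E_1^k = (N_1^k)^{\log E_1 / \log N_1}$, the implied densification exponent is $a = \log E_1 / \log N_1$.

```python
import math

def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [[x * y for x in row_a for y in row_b]
            for row_a in a for row_b in b]

# Hypothetical initiator G1: 3 nodes, 7 non-zero adjacency entries.
g1 = [[1, 1, 0],
      [1, 1, 1],
      [0, 1, 1]]

sizes = []
g = g1
for k in range(1, 4):
    nodes = len(g)
    edges = sum(sum(row) for row in g)   # non-zero entries of the 0/1 matrix
    sizes.append((nodes, edges))
    g = kron(g, g1)

a = math.log(sizes[0][1]) / math.log(sizes[0][0])
print(sizes, round(a, 3))
```

For this initiator the exponent is about 1.77, inside the 1 ≤ a ≤ 2 range given for the Densification Power Law.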


Stochastic Kronecker graphs
Create an $N_1 \times N_1$ probability matrix $P_1$
Compute the $k$th Kronecker power $P_k$
For each entry $p_{uv}$ of $P_k$, include an edge $(u, v)$ with probability $p_{uv}$

P1 =
0.5  0.2
0.1  0.3

Kronecker multiplication gives P2 = P1 ⊗ P1 (probability of edge p_ij):
0.25  0.10  0.10  0.04
0.05  0.15  0.02  0.06
0.05  0.02  0.15  0.06
0.01  0.03  0.03  0.09

Flip biased coins to obtain the instance matrix K2
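The three steps on this slide can be reproduced in a few lines. The sketch below uses the slide's own P1, computes P2 with a hand-written Kronecker product, and then flips one biased coin per entry to get an instance matrix. The function names and the random seed are arbitrary choices for the example.

```python
import random

def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [[x * y for x in row_a for y in row_b]
            for row_a in a for row_b in b]

def sample_graph(prob_matrix, rng):
    """Include edge (u, v) independently with probability prob_matrix[u][v]."""
    return [[1 if rng.random() < p else 0 for p in row]
            for row in prob_matrix]

p1 = [[0.5, 0.2],
      [0.1, 0.3]]
p2 = kron(p1, p1)                         # 4x4 edge-probability matrix
k2 = sample_graph(p2, random.Random(0))   # one realization (instance matrix)

for row in p2:
    print([round(p, 2) for p in row])
```

Materializing the full probability matrix takes O(N²) memory, so this direct approach only suits small k.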


Cascade formation process
Viral marketing: people purchase and send recommendations
[Figure legend: received a recommendation and propagated it forward; received a recommendation but didn’t propagate.]


Node selection: example
Water distribution network: different objective functions give different placements
[Figure: two panels. Population affected; Detection likelihood.]