propagation on large networks b. aditya prakash badityap christos faloutsos christos carnegie mellon...

Propagation on Large NetworksB. Aditya Prakash

http://www.cs.cmu.edu/~badityap

Christos Faloutsoshttp://www.cs.cmu.edu/~christos

Carnegie Mellon UniversityINARC Meeting – March 28

Prakash and Faloutsos 2012 2

Preaching to the choir:Networks are everywhere!

Human Disease Network [Barabasi 2007]

Gene Regulatory Network [Decourty 2008]

Facebook Network [2010]

The Internet [2005]


Focus of this talk:Dynamical Processes over

networks are also everywhere!


Why do we care?• Social collaboration• Information Diffusion• Viral Marketing• Epidemiology and Public Health• Cyber Security• Human mobility • Games and Virtual Worlds • Ecology........


Why do we care? (1: Epidemiology)

• Dynamical Processes over networks[AJPH 2007]

CDC data: Visualization of the first 35 tuberculosis (TB) patients and their 1039 contacts

Diseases over contact networks



• Dynamical Processes over networks

• Each circle is a hospital• ~3000 hospitals• More than 30,000 patients transferred

[US-MEDICARE NETWORK 2005]

Problem: Given k units of disinfectant, whom to immunize?



CURRENT PRACTICE OUR METHOD

~6x fewer!

[US-MEDICARE NETWORK 2005]

Hospital-acquired inf. took 99K+ lives, cost $5B+ (all per year)


Why do we care? (2: Online Diffusion)

> 800m users, ~$1B revenue [WSJ 2010]

~100m active users

> 50m users


Why do we care? (2: Online Diffusion)


Celebrity

Buy Versace™!

Followers

Social Media Marketing


Why do we care? (3: To change the world?)


Social networks and Collaborative Action


High Impact – Multiple Settings

Q. How to squash rumors faster?

Q. How do opinions spread?

Q. How to market better?

epidemic out-breaks

products/viruses

transmit s/w patches


Research Theme

DATALarge real-world

networks & processes

ANALYSISUnderstanding

POLICY/ ACTIONManaging


Research Theme – Public Health

DATAModeling # patient

transfers

ANALYSISWill an epidemic

happen?

POLICY/ ACTION

How to control out-breaks?


Research Theme – Social Media

DATAModeling Tweets

spreading

POLICY/ ACTION

How to market better?

ANALYSIS# cascades in

future?


In this talk

ANALYSISUnderstanding

Given propagation models:

Q1: Will an epidemic happen?


In this talk

Q2: How to immunize and control out-breaks better?

POLICY/ ACTIONManaging


Outline

• Motivation• Epidemics: what happens? (Theory)• Action: Who to immunize? (Algorithms)


A fundamental questionStrong Virus

Epidemic?


example (static graph)Weak Virus

Epidemic?


Problem Statement

Find, a condition under which– virus will die out exponentially quickly– regardless of initial infection condition

above (epidemic)

below (extinction)

# Infected

time

Separate the regimes?


Threshold (static version)

Problem Statement• Given: –Graph G, and –Virus specs (attack prob. etc.)

• Find: –A condition for virus extinction/invasion


Threshold: Why important?

• Accelerating simulations• Forecasting (‘What-if’ scenarios)• Design of contagion and/or topology• A great handle to manipulate the spreading– Immunization– Maximize collaboration…..


Outline

• Motivation• Epidemics: what happens? (Theory)– Background– Result (Static Graphs)– Proof Ideas (Static Graphs)– Bonus 1: Dynamic Graphs– Bonus 2: Competing Viruses

• Action: Who to immunize? (Algorithms)


“SIR” model: life immunity (mumps)

• Each node in the graph is in one of three states– Susceptible (i.e. healthy)– Infected– Removed (i.e. can’t get infected again)

Prob. β Prob. δ

t = 1 t = 2 t = 3

Background


Terminology: continued

• Other virus propagation models (“VPM”)– SIS : susceptible-infected-susceptible, flu-like– SIRS : temporary immunity, like pertussis– SEIR : mumps-like, with virus incubation (E = Exposed)….………….

• Underlying contact-network – ‘who-can-infect-whom’

Background


Related Work R. M. Anderson and R. M. May. Infectious Diseases of Humans. Oxford University Press,

1991. A. Barrat, M. Barthélemy, and A. Vespignani. Dynamical Processes on Complex Networks.

Cambridge University Press, 2010. F. M. Bass. A new product growth for model consumer durables. Management Science,

15(5):215–227, 1969. D. Chakrabarti, Y. Wang, C. Wang, J. Leskovec, and C. Faloutsos. Epidemic thresholds in

real networks. ACM TISSEC, 10(4), 2008. D. Easley and J. Kleinberg. Networks, Crowds, and Markets: Reasoning About a Highly

Connected World. Cambridge University Press, 2010. A. Ganesh, L. Massoulie, and D. Towsley. The effect of network topology in spread of

epidemics. IEEE INFOCOM, 2005. Y. Hayashi, M. Minoura, and J. Matsukubo. Recoverable prevalence in growing scale-free

networks and the effective immunization. arXiv:cond-at/0305549 v2, Aug. 6 2003. H. W. Hethcote. The mathematics of infectious diseases. SIAM Review, 42, 2000. H. W. Hethcote and J. A. Yorke. Gonorrhea transmission dynamics and control. Springer

Lecture Notes in Biomathematics, 46, 1984. J. O. Kephart and S. R. White. Directed-graph epidemiological models of computer

viruses. IEEE Computer Society Symposium on Research in Security and Privacy, 1991. J. O. Kephart and S. R. White. Measuring and modeling computer virus prevalence. IEEE

Computer Society Symposium on Research in Security and Privacy, 1993. R. Pastor-Santorras and A. Vespignani. Epidemic spreading in scale-free networks.

Physical Review Letters 86, 14, 2001.

……… ……… ………

All are about either:

• Structured topologies (cliques, block-diagonals, hierarchies, random)

• Specific virus propagation models

• Static graphs

Background


Outline




How should the answer look like?

• Answer should depend on:– Graph– Virus Propagation Model (VPM)

• But how??– Graph – average degree? max. degree? diameter?– VPM – which parameters? – How to combine – linear? quadratic? exponential?

?diameterdavg ?/)( max22 ddd avgavg …..


Static Graphs: Our Main Result

• Informally,

•

For, any arbitrary topology (adjacency matrix A) any virus propagation model (VPM) in standard literature

the epidemic threshold depends only 1. on the λ, first eigenvalue of A, and 2. some constant , determined by

the virus propagation model

λVPMC

No epidemic if λ *

< 1

VPMCVPMC

In Prakash+ ICDM 2011 (Selected among best papers).


Our thresholds for some models

• s = effective strength• s < 1 : below threshold

Models Effective Strength (s) Threshold (tipping point)

SIS, SIR, SIRS, SEIRs = λ .

s = 1

SIV, SEIV s = λ .

(H.I.V.) s = λ .

12

221

vv

v

2121 VVISI


Our result: Intuition for λ

“Official” definition:• Let A be the adjacency

matrix. Then λ is the root with the largest magnitude of the characteristic polynomial of A [det(A – xI)].

• Doesn’t give much intuition!

“Un-official” Intuition • λ ~ # paths in the

graph

uu≈ .

kkA

(i, j) = # of paths i j of length k

kA


better connectivity higher λ

Largest Eigenvalue (λ)

Prakash and Faloutsos 2012 33N nodes

Largest Eigenvalue (λ)

λ ≈ 2 λ = N λ = N-1

N = 1000λ ≈ 2 λ= 31.67 λ= 999

better connectivity higher λ


Examples: Simulations – SIR (mumps)

(a) Infection profile (b) “Take-off” plot

PORTLAND graph: synthetic population, 31 million links, 6 million nodes

Frac

tion

of In

fecti

ons

Foot

prin

tEffective StrengthTime ticks


Examples: Simulations – SIRS (pertusis)

Frac

tion

of In

fecti

ons

Foot

prin

tEffective StrengthTime ticks

(a) Infection profile (b) “Take-off” plot

PORTLAND graph: synthetic population, 31 million links, 6 million nodes


Outline




λ * < 1VPMC

Graph-based

Model-basedGeneral VPM structure

Topology and stability

See paper for full proof


Outline




Dynamic Graphs: Epidemic?

adjacency matrix

8

8

Alternating behaviorsDAY (e.g., work)


adjacency matrix

8

8

Dynamic Graphs: Epidemic?Alternating behaviorsNIGHT

(e.g., home)


• SIS model– recovery rate δ– infection rate β

• Set of T arbitrary graphs

Model Description

day

N

N night

N

N , weekend…..

Infected

Healthy

XN1

N3

N2

Prob. βProb. β Prob. δ


• Informally, NO epidemic if

eig (S) = < 1

Our result: Dynamic Graphs Threshold

Single number! Largest eigenvalue of The system matrix S

In Prakash+, ECML-PKDD 2010

S =

Details


Synthetic MIT Reality Mining

log(fraction infected)

Time

BELOW

AT

ABOVE ABOVE

AT

BELOW

Infection-profile


“Take-off” plotsFootprint (# infected @ “steady state”)

Our threshold

Our threshold

(log scale)

NO EPIDEMIC

EPIDEMIC

EPIDEMIC

NO EPIDEMIC

Synthetic MIT Reality


Outline



46

Competing Contagions

iPhone v Android

Blu-ray v HD-DVD

Biological common flu/avian flu, pneumococcal inf etc

Attack Retreatv


A simple model

• Modified flu-like • Mutual Immunity (“pick one of the two”)• Susceptible-Infected1-Infected2-Susceptible

Virus 1 Virus 2

Details


Question: What happens in the end?

green: virus 1red: virus 2

Footprint @ Steady State Footprint @ Steady State = ?

Number of Infections

ASSUME: Virus 1 is stronger than Virus 2

49

Question: What happens in the end?




Strength Strength

??= Strength Strength

2

Footprint @ Steady State Footprint @ Steady State

50

Answer: Winner-Takes-All




51

Our Result: Winner-Takes-All

In Prakash+ WWW 2012

Given our model, and any graph, the weaker virus always dies-out completely

1. The stronger survives only if it is above threshold 2. Virus 1 is stronger than Virus 2, if: strength(Virus 1) > strength(Virus 2)3. Strength(Virus) = λ β / δ same as before!

Details

52

Real Examples

Reddit v Digg Blu-Ray v HD-DVD

[Google Search Trends data]


Outline

• Motivation• Epidemics: what happens? (Theory)• Action: Who to immunize? (Algorithms)


?

?

Given: a graph A, virus prop. model and budget k; Find: k ‘best’ nodes for immunization (removal).

k = 2

??

Full Static Immunization


Outline

• Motivation• Epidemics: what happens? (Theory)• Action: Who to immunize? (Algorithms)– Full Immunization (Static Graphs)– Fractional Immunization


Challenges

• Given a graph A, budget k, Q1 (Metric) How to measure the ‘shield-

value’ for a set of nodes (S)?

Q2 (Algorithm) How to find a set of k nodes with highest ‘shield-value’?


Proposed vulnerability measure λ

Increasing λ Increasing vulnerability

λ is the epidemic threshold

“Safe” “Vulnerable” “Deadly”


1

9

10

3

4

5

7

8

6

2

9

1

11

10

3

4

56

7

8

2

9

Original Graph Without {2, 6}

Eigen-Drop(S) Δ λ = λ - λs

Δ

A1: “Eigen-Drop”: an ideal shield value


(Q2) - Direct Algorithm too expensive!

• Immunize k nodes which maximize Δ λ

S = argmax Δ λ• Combinatorial!• Complexity:– Example: • 1,000 nodes, with 10,000 edges • It takes 0.01 seconds to compute λ• It takes 2,615 years to find 5-best nodes!


A2: Our Solution

• Part 1: Shield Value–Carefully approximate Eigen-drop (Δ λ)–Matrix perturbation theory

• Part 2: Algorithm–Greedily pick best node at each step–Near-optimal due to submodularity

• NetShield (linear complexity)–O(nk2+m) n = # nodes; m = # edges

In Tong, Prakash+ ICDM 2010


Experiment: Immunization qualityLog(fraction of infected nodes)

NetShield

Degree

PageRank

Eigs (=HITS)Acquaintance

Betweeness (shortest path)

Lower is

better Time


Outline

• Motivation• Epidemics: what happens? (Theory)• Action: Who to immunize? (Algorithms)– Full Immunization (Static Graphs)– Fractional Immunization


Fractional Immunization of NetworksB. Aditya Prakash, Lada Adamic, Theodore Iwashyna (M.D.), Hanghang Tong, Christos Faloutsos

Submitted to Science


Fractional Asymmetric Immunization

Hospital Another Hospital

Drug-resistant Bacteria (like XDR-TB)




Drug-resistant Bacteria (like XDR-TB)

= f




Problem: Given k units of disinfectant, how to distribute them to maximize

hospitals saved?


Our Algorithm “SMART-ALLOC”

CURRENT PRACTICE SMART-ALLOC

[US-MEDICARE NETWORK 2005]• Each circle is a hospital, ~3000 hospitals• More than 30,000 patients transferred

~6x fewer!


Running Time

≈

Simulations SMART-ALLOC

> 1 week

14 secs

> 30,000x speed-up!

Wall-Clock Time

Lower is better


Acknowledgements

Collaborators Christos Faloutsos Roni Rosenfeld, Michalis Faloutsos, Lada Adamic, Theodore Iwashyna (M.D.), Dave Andersen, Tina Eliassi-Rad, Iulian Neamtiu,

Varun Gupta, Jilles Vreeken,

Deepayan Chakrabarti, Hanghang Tong, Kunal Punera, Ashwin Sridharan, Sridhar Machiraju, Mukund Seshadri, Alice Zheng, Lei Li, Polo Chau, Nicholas Valler, Alex Beutel, Xuetao Wei


Acknowledgements

Funding


Analysis Policy/Action Data

Propagation on Large Networks

B. Aditya Prakash Christos Faloutsos


BACK-UP SLIDES


λ * < 1VPMC

Graph-based

Model-based

Proof Sketch

General VPM structure

Topology and stability


Models and more modelsModel Used for

SIR Mumps

SIS Flu

SIRS Pertussis

SEIR Chicken-pox

……..

SICR Tuberculosis

MSIR Measles

SIV Sensor Stability

H.I.V.……….

2121 VVISI


Ingredient 1: Our generalized model

Endogenous Transitions

Susceptible Infected

Vigilant

Exogenous Transitions




Vigilant


Special case


Vigilant


Special case: H.I.V.

2121 VVISI

Multiple Infectious, Vigilant states

“Terminal”

“Non-terminal”


Ingredient 2: NLDS+Stability

• View as a NLDS– discrete time – non-linear dynamical system (NLDS)

Probability vector Specifies the state of the system at time t

Details

size mN x 1

.

.

.

.

.

size N (number of nodes in the graph)

.

.

.

S

I

V


Ingredient 2: NLDS + Stability


Non-linear functionExplicitly gives the evolution of system

Details

size mN x 1

.

.

.

.

.

.

.

.


Ingredient 2: NLDS + Stability


• Threshold Stability of NLDS


= probability that node i is not attacked by any of its infectious neighbors

Special case: SIR

size 3N x 1 I

R

S

NLDS

I

R

S

Details


Fixed Point

11.

00.

00.

State when no node is infected

Q: Is it stable?

Details


Stability for SIR

Stableunder threshold

Unstableabove threshold


Another special case: SEIV

• E == exposed latent state

‘Dropping guard’‘Pre-emptive vaccination’

SEIV: itself generalizes • SIR (mumps) • SIS (flu-like) • SIRS (pertussis)• SEIR (varicella) etc.

no “sneezes”

infectious


Our Solution: Part 1

• Approximate Eigen-drop (Δ λ)

• Δ λ ≈ SV(S) =

– Result using Matrix perturbation theory–u(i) == ‘eigenscore’

~~ pagerank(i)A u = λ . u

u(i)

Details


P1: node importance P2: set diversity

Original Graph Select by P1 Select by P1+P2

Details


Our Solution: Part 2: NetShield

• We prove that: SV(S) is sub-modular (& monotone non-decreasing)

• NetShield: Greedily add best node at each step

Corollary: Greedy algorithm works 1. NetShield is near-optimal (w.r.t. max SV(S)) 2. NetShield is O(nk2+m)

Footnote: near-optimal means SV(S NetShield) >= (1-1/e) SV(S Opt)


> 10 days

0.1 seconds NetShield

100,000

>=1,000,000

10,000

10

1

0.1

0.01

argmax Δ λ

argmax SV(S)

Time

1,000

100

10,000,000x

# of vaccines

Lower is

better

NetShield Speed


Full Dynamic Immunization

• Given: Set of T arbitrary graphs

• Find: k ‘best’ nodes to immunize (remove)

day

N

N night

N

N , weekend…..

In Prakash+ ECML-PKDD 2010


Full Dynamic Immunization

• Our solution– Recall theorem– Simple: reduce (= )

• Goal: max eigendrop Δ

• No competing policy for comparison• We propose and evaluate many policies

Matrix Product

Δ =

day night

after before λλ λλ

λ


Greedy-Dmax

A

Greedy-Dav

gA

Greedy-Aav

gA

Greedy-S

Optimal2022242628303234

Footprint after k=6 immunizations

Footprint

Performance of Policies

MIT Reality Mining

Lower is better



• Fractional Effect• Asymmetric Effect



Fractional Effect [ f(x) = ]• Asymmetric Effect

Edge weakened by half

x5.0



Fractional Effect [ f(x) = ] Asymmetric Effect

Only incoming edges

x5.0



• Fractional Effect [ f(x) = ]• Asymmetric Effect

# antidotes = 3

x5.0




# antidotes = 3

x5.0


Problem Statement

• Hospital-transfer networks – Number of patients transferred

• Given:– The SI model– Directed weighted graph – A total of k antidotes– A weakening function f(x)

• Find: – the ‘best’ distribution which minimizes the “footprint”

at some time t

Almost everything is different from before!

In Prakash+ 2012 (Submitted to Science)


Naïve way

• How to estimate the footprint?– Run simulations? – too slow– takes about 3 weeks for graphs of typical size!


Our Solution – Main Idea

• The SI model has no threshold– any infection will become an epidemic

• But can bound the expected number of infected nodes at time t

• Get the distribution which (approx.) minimizes the bound!– Original problem is NP-complete– SMART-ALLOC• Greedy and near-optimal if f() is monotone non-

increasing convex

Details

propagation on large networks b. aditya prakash badityap christos faloutsos christos carnegie mellon...

Documents

contact networks prakash

networks social networks

networks ajph

networks celebritybuy

facebook network

usmedicare network

virus ext

human disease network