propagation on large networks b. aditya prakash badityap christos faloutsos christos carnegie mellon...
TRANSCRIPT
Propagation on Large NetworksB. Aditya Prakash
http://www.cs.cmu.edu/~badityap
Christos Faloutsoshttp://www.cs.cmu.edu/~christos
Carnegie Mellon UniversityINARC Meeting – March 28
Prakash and Faloutsos 2012 2
Preaching to the choir:Networks are everywhere!
Human Disease Network [Barabasi 2007]
Gene Regulatory Network [Decourty 2008]
Facebook Network [2010]
The Internet [2005]
Prakash and Faloutsos 2012 3
Focus of this talk:Dynamical Processes over
networks are also everywhere!
Prakash and Faloutsos 2012 4
Why do we care?• Social collaboration• Information Diffusion• Viral Marketing• Epidemiology and Public Health• Cyber Security• Human mobility • Games and Virtual Worlds • Ecology........
Prakash and Faloutsos 2012 5
Why do we care? (1: Epidemiology)
• Dynamical Processes over networks[AJPH 2007]
CDC data: Visualization of the first 35 tuberculosis (TB) patients and their 1039 contacts
Diseases over contact networks
Prakash and Faloutsos 2012 6
Why do we care? (1: Epidemiology)
• Dynamical Processes over networks
• Each circle is a hospital• ~3000 hospitals• More than 30,000 patients transferred
[US-MEDICARE NETWORK 2005]
Problem: Given k units of disinfectant, whom to immunize?
Prakash and Faloutsos 2012 7
Why do we care? (1: Epidemiology)
CURRENT PRACTICE OUR METHOD
~6x fewer!
[US-MEDICARE NETWORK 2005]
Hospital-acquired inf. took 99K+ lives, cost $5B+ (all per year)
Prakash and Faloutsos 2012 8
Why do we care? (2: Online Diffusion)
> 800m users, ~$1B revenue [WSJ 2010]
~100m active users
> 50m users
Prakash and Faloutsos 2012 9
Why do we care? (2: Online Diffusion)
• Dynamical Processes over networks
Celebrity
Buy Versace™!
Followers
Social Media Marketing
Prakash and Faloutsos 2012 10
Why do we care? (3: To change the world?)
• Dynamical Processes over networks
Social networks and Collaborative Action
Prakash and Faloutsos 2012 11
High Impact – Multiple Settings
Q. How to squash rumors faster?
Q. How do opinions spread?
Q. How to market better?
epidemic out-breaks
products/viruses
transmit s/w patches
Prakash and Faloutsos 2012 12
Research Theme
DATALarge real-world
networks & processes
ANALYSISUnderstanding
POLICY/ ACTIONManaging
Prakash and Faloutsos 2012 13
Research Theme – Public Health
DATAModeling # patient
transfers
ANALYSISWill an epidemic
happen?
POLICY/ ACTION
How to control out-breaks?
Prakash and Faloutsos 2012 14
Research Theme – Social Media
DATAModeling Tweets
spreading
POLICY/ ACTION
How to market better?
ANALYSIS# cascades in
future?
Prakash and Faloutsos 2012 15
In this talk
ANALYSISUnderstanding
Given propagation models:
Q1: Will an epidemic happen?
Prakash and Faloutsos 2012 16
In this talk
Q2: How to immunize and control out-breaks better?
POLICY/ ACTIONManaging
Prakash and Faloutsos 2012 17
Outline
• Motivation• Epidemics: what happens? (Theory)• Action: Who to immunize? (Algorithms)
Prakash and Faloutsos 2012 18
A fundamental questionStrong Virus
Epidemic?
Prakash and Faloutsos 2012 19
example (static graph)Weak Virus
Epidemic?
Prakash and Faloutsos 2012 20
Problem Statement
Find, a condition under which– virus will die out exponentially quickly– regardless of initial infection condition
above (epidemic)
below (extinction)
# Infected
time
Separate the regimes?
Prakash and Faloutsos 2012 21
Threshold (static version)
Problem Statement• Given: –Graph G, and –Virus specs (attack prob. etc.)
• Find: –A condition for virus extinction/invasion
Prakash and Faloutsos 2012 22
Threshold: Why important?
• Accelerating simulations• Forecasting (‘What-if’ scenarios)• Design of contagion and/or topology• A great handle to manipulate the spreading– Immunization– Maximize collaboration…..
Prakash and Faloutsos 2012 23
Outline
• Motivation• Epidemics: what happens? (Theory)– Background– Result (Static Graphs)– Proof Ideas (Static Graphs)– Bonus 1: Dynamic Graphs– Bonus 2: Competing Viruses
• Action: Who to immunize? (Algorithms)
Prakash and Faloutsos 2012 24
“SIR” model: life immunity (mumps)
• Each node in the graph is in one of three states– Susceptible (i.e. healthy)– Infected– Removed (i.e. can’t get infected again)
Prob. β Prob. δ
t = 1 t = 2 t = 3
Background
Prakash and Faloutsos 2012 25
Terminology: continued
• Other virus propagation models (“VPM”)– SIS : susceptible-infected-susceptible, flu-like– SIRS : temporary immunity, like pertussis– SEIR : mumps-like, with virus incubation (E = Exposed)….………….
• Underlying contact-network – ‘who-can-infect-whom’
Background
Prakash and Faloutsos 2012 26
Related Work R. M. Anderson and R. M. May. Infectious Diseases of Humans. Oxford University Press,
1991. A. Barrat, M. Barthélemy, and A. Vespignani. Dynamical Processes on Complex Networks.
Cambridge University Press, 2010. F. M. Bass. A new product growth for model consumer durables. Management Science,
15(5):215–227, 1969. D. Chakrabarti, Y. Wang, C. Wang, J. Leskovec, and C. Faloutsos. Epidemic thresholds in
real networks. ACM TISSEC, 10(4), 2008. D. Easley and J. Kleinberg. Networks, Crowds, and Markets: Reasoning About a Highly
Connected World. Cambridge University Press, 2010. A. Ganesh, L. Massoulie, and D. Towsley. The effect of network topology in spread of
epidemics. IEEE INFOCOM, 2005. Y. Hayashi, M. Minoura, and J. Matsukubo. Recoverable prevalence in growing scale-free
networks and the effective immunization. arXiv:cond-at/0305549 v2, Aug. 6 2003. H. W. Hethcote. The mathematics of infectious diseases. SIAM Review, 42, 2000. H. W. Hethcote and J. A. Yorke. Gonorrhea transmission dynamics and control. Springer
Lecture Notes in Biomathematics, 46, 1984. J. O. Kephart and S. R. White. Directed-graph epidemiological models of computer
viruses. IEEE Computer Society Symposium on Research in Security and Privacy, 1991. J. O. Kephart and S. R. White. Measuring and modeling computer virus prevalence. IEEE
Computer Society Symposium on Research in Security and Privacy, 1993. R. Pastor-Santorras and A. Vespignani. Epidemic spreading in scale-free networks.
Physical Review Letters 86, 14, 2001.
……… ……… ………
All are about either:
• Structured topologies (cliques, block-diagonals, hierarchies, random)
• Specific virus propagation models
• Static graphs
Background
Prakash and Faloutsos 2012 27
Outline
• Motivation• Epidemics: what happens? (Theory)– Background– Result (Static Graphs)– Proof Ideas (Static Graphs)– Bonus 1: Dynamic Graphs– Bonus 2: Competing Viruses
• Action: Who to immunize? (Algorithms)
Prakash and Faloutsos 2012 28
How should the answer look like?
• Answer should depend on:– Graph– Virus Propagation Model (VPM)
• But how??– Graph – average degree? max. degree? diameter?– VPM – which parameters? – How to combine – linear? quadratic? exponential?
?diameterdavg ?/)( max22 ddd avgavg …..
Prakash and Faloutsos 2012 29
Static Graphs: Our Main Result
• Informally,
•
For, any arbitrary topology (adjacency matrix A) any virus propagation model (VPM) in standard literature
the epidemic threshold depends only 1. on the λ, first eigenvalue of A, and 2. some constant , determined by
the virus propagation model
λVPMC
No epidemic if λ *
< 1
VPMCVPMC
In Prakash+ ICDM 2011 (Selected among best papers).
Prakash and Faloutsos 2012 30
Our thresholds for some models
• s = effective strength• s < 1 : below threshold
Models Effective Strength (s) Threshold (tipping point)
SIS, SIR, SIRS, SEIRs = λ .
s = 1
SIV, SEIV s = λ .
(H.I.V.) s = λ .
12
221
vv
v
2121 VVISI
Prakash and Faloutsos 2012 31
Our result: Intuition for λ
“Official” definition:• Let A be the adjacency
matrix. Then λ is the root with the largest magnitude of the characteristic polynomial of A [det(A – xI)].
• Doesn’t give much intuition!
“Un-official” Intuition • λ ~ # paths in the
graph
uu≈ .
kkA
(i, j) = # of paths i j of length k
kA
Prakash and Faloutsos 2012 32
better connectivity higher λ
Largest Eigenvalue (λ)
Prakash and Faloutsos 2012 33N nodes
Largest Eigenvalue (λ)
λ ≈ 2 λ = N λ = N-1
N = 1000λ ≈ 2 λ= 31.67 λ= 999
better connectivity higher λ
Prakash and Faloutsos 2012 34
Examples: Simulations – SIR (mumps)
(a) Infection profile (b) “Take-off” plot
PORTLAND graph: synthetic population, 31 million links, 6 million nodes
Frac
tion
of In
fecti
ons
Foot
prin
tEffective StrengthTime ticks
Prakash and Faloutsos 2012 35
Examples: Simulations – SIRS (pertusis)
Frac
tion
of In
fecti
ons
Foot
prin
tEffective StrengthTime ticks
(a) Infection profile (b) “Take-off” plot
PORTLAND graph: synthetic population, 31 million links, 6 million nodes
Prakash and Faloutsos 2012 36
Outline
• Motivation• Epidemics: what happens? (Theory)– Background– Result (Static Graphs)– Proof Ideas (Static Graphs)– Bonus 1: Dynamic Graphs– Bonus 2: Competing Viruses
• Action: Who to immunize? (Algorithms)
Prakash and Faloutsos 2012 37
λ * < 1VPMC
Graph-based
Model-basedGeneral VPM structure
Topology and stability
See paper for full proof
Prakash and Faloutsos 2012 38
Outline
• Motivation• Epidemics: what happens? (Theory)– Background– Result (Static Graphs)– Proof Ideas (Static Graphs)– Bonus 1: Dynamic Graphs– Bonus 2: Competing Viruses
• Action: Who to immunize? (Algorithms)
Prakash and Faloutsos 2012 39
Dynamic Graphs: Epidemic?
adjacency matrix
8
8
Alternating behaviorsDAY (e.g., work)
Prakash and Faloutsos 2012 40
adjacency matrix
8
8
Dynamic Graphs: Epidemic?Alternating behaviorsNIGHT
(e.g., home)
Prakash and Faloutsos 2012 41
• SIS model– recovery rate δ– infection rate β
• Set of T arbitrary graphs
Model Description
day
N
N night
N
N , weekend…..
Infected
Healthy
XN1
N3
N2
Prob. βProb. β Prob. δ
Prakash and Faloutsos 2012 42
• Informally, NO epidemic if
eig (S) = < 1
Our result: Dynamic Graphs Threshold
Single number! Largest eigenvalue of The system matrix S
In Prakash+, ECML-PKDD 2010
S =
Details
Prakash and Faloutsos 2012 43
Synthetic MIT Reality Mining
log(fraction infected)
Time
BELOW
AT
ABOVE ABOVE
AT
BELOW
Infection-profile
Prakash and Faloutsos 2012 44
“Take-off” plotsFootprint (# infected @ “steady state”)
Our threshold
Our threshold
(log scale)
NO EPIDEMIC
EPIDEMIC
EPIDEMIC
NO EPIDEMIC
Synthetic MIT Reality
Prakash and Faloutsos 2012 45
Outline
• Motivation• Epidemics: what happens? (Theory)– Background– Result (Static Graphs)– Proof Ideas (Static Graphs)– Bonus 1: Dynamic Graphs– Bonus 2: Competing Viruses
• Action: Who to immunize? (Algorithms)
46
Competing Contagions
iPhone v Android
Blu-ray v HD-DVD
Biological common flu/avian flu, pneumococcal inf etc
Attack Retreatv
Prakash and Faloutsos 2012 47
A simple model
• Modified flu-like • Mutual Immunity (“pick one of the two”)• Susceptible-Infected1-Infected2-Susceptible
Virus 1 Virus 2
Details
Prakash and Faloutsos 2012 48
Question: What happens in the end?
green: virus 1red: virus 2
Footprint @ Steady State Footprint @ Steady State = ?
Number of Infections
ASSUME: Virus 1 is stronger than Virus 2
49
Question: What happens in the end?
green: virus 1red: virus 2
Number of Infections
ASSUME: Virus 1 is stronger than Virus 2
Strength Strength
??= Strength Strength
2
Footprint @ Steady State Footprint @ Steady State
50
Answer: Winner-Takes-All
green: virus 1red: virus 2
ASSUME: Virus 1 is stronger than Virus 2
Number of Infections
51
Our Result: Winner-Takes-All
In Prakash+ WWW 2012
Given our model, and any graph, the weaker virus always dies-out completely
1. The stronger survives only if it is above threshold 2. Virus 1 is stronger than Virus 2, if: strength(Virus 1) > strength(Virus 2)3. Strength(Virus) = λ β / δ same as before!
Details
52
Real Examples
Reddit v Digg Blu-Ray v HD-DVD
[Google Search Trends data]
Prakash and Faloutsos 2012 53
Outline
• Motivation• Epidemics: what happens? (Theory)• Action: Who to immunize? (Algorithms)
Prakash and Faloutsos 2012 54
?
?
Given: a graph A, virus prop. model and budget k; Find: k ‘best’ nodes for immunization (removal).
k = 2
??
Full Static Immunization
Prakash and Faloutsos 2012 55
Outline
• Motivation• Epidemics: what happens? (Theory)• Action: Who to immunize? (Algorithms)– Full Immunization (Static Graphs)– Fractional Immunization
Prakash and Faloutsos 2012 56
Challenges
• Given a graph A, budget k, Q1 (Metric) How to measure the ‘shield-
value’ for a set of nodes (S)?
Q2 (Algorithm) How to find a set of k nodes with highest ‘shield-value’?
Prakash and Faloutsos 2012 57
Proposed vulnerability measure λ
Increasing λ Increasing vulnerability
λ is the epidemic threshold
“Safe” “Vulnerable” “Deadly”
Prakash and Faloutsos 2012 58
1
9
10
3
4
5
7
8
6
2
9
1
11
10
3
4
56
7
8
2
9
Original Graph Without {2, 6}
Eigen-Drop(S) Δ λ = λ - λs
Δ
A1: “Eigen-Drop”: an ideal shield value
Prakash and Faloutsos 2012 59
(Q2) - Direct Algorithm too expensive!
• Immunize k nodes which maximize Δ λ
S = argmax Δ λ• Combinatorial!• Complexity:– Example: • 1,000 nodes, with 10,000 edges • It takes 0.01 seconds to compute λ• It takes 2,615 years to find 5-best nodes!
Prakash and Faloutsos 2012 60
A2: Our Solution
• Part 1: Shield Value–Carefully approximate Eigen-drop (Δ λ)–Matrix perturbation theory
• Part 2: Algorithm–Greedily pick best node at each step–Near-optimal due to submodularity
• NetShield (linear complexity)–O(nk2+m) n = # nodes; m = # edges
In Tong, Prakash+ ICDM 2010
Prakash and Faloutsos 2012 61
Experiment: Immunization qualityLog(fraction of infected nodes)
NetShield
Degree
PageRank
Eigs (=HITS)Acquaintance
Betweeness (shortest path)
Lower is
better Time
Prakash and Faloutsos 2012 62
Outline
• Motivation• Epidemics: what happens? (Theory)• Action: Who to immunize? (Algorithms)– Full Immunization (Static Graphs)– Fractional Immunization
Prakash and Faloutsos 2012 63
Fractional Immunization of NetworksB. Aditya Prakash, Lada Adamic, Theodore Iwashyna (M.D.), Hanghang Tong, Christos Faloutsos
Submitted to Science
Prakash and Faloutsos 2012 64
Fractional Asymmetric Immunization
Hospital Another Hospital
Drug-resistant Bacteria (like XDR-TB)
Prakash and Faloutsos 2012 65
Fractional Asymmetric Immunization
Hospital Another Hospital
Drug-resistant Bacteria (like XDR-TB)
= f
Prakash and Faloutsos 2012 66
Fractional Asymmetric Immunization
Hospital Another Hospital
Problem: Given k units of disinfectant, how to distribute them to maximize
hospitals saved?
Prakash and Faloutsos 2012 67
Our Algorithm “SMART-ALLOC”
CURRENT PRACTICE SMART-ALLOC
[US-MEDICARE NETWORK 2005]• Each circle is a hospital, ~3000 hospitals• More than 30,000 patients transferred
~6x fewer!
Prakash and Faloutsos 2012 68
Running Time
≈
Simulations SMART-ALLOC
> 1 week
14 secs
> 30,000x speed-up!
Wall-Clock Time
Lower is better
Prakash and Faloutsos 2012 69
Acknowledgements
Collaborators Christos Faloutsos Roni Rosenfeld, Michalis Faloutsos, Lada Adamic, Theodore Iwashyna (M.D.), Dave Andersen, Tina Eliassi-Rad, Iulian Neamtiu,
Varun Gupta, Jilles Vreeken,
Deepayan Chakrabarti, Hanghang Tong, Kunal Punera, Ashwin Sridharan, Sridhar Machiraju, Mukund Seshadri, Alice Zheng, Lei Li, Polo Chau, Nicholas Valler, Alex Beutel, Xuetao Wei
Prakash and Faloutsos 2012 70
Acknowledgements
Funding
Prakash and Faloutsos 2012 71
Analysis Policy/Action Data
Propagation on Large Networks
B. Aditya Prakash Christos Faloutsos
Prakash and Faloutsos 2012 72
BACK-UP SLIDES
Prakash and Faloutsos 2012 73
λ * < 1VPMC
Graph-based
Model-based
Proof Sketch
General VPM structure
Topology and stability
Prakash and Faloutsos 2012 74
Models and more modelsModel Used for
SIR Mumps
SIS Flu
SIRS Pertussis
SEIR Chicken-pox
……..
SICR Tuberculosis
MSIR Measles
SIV Sensor Stability
H.I.V.……….
2121 VVISI
Prakash and Faloutsos 2012 75
Ingredient 1: Our generalized model
Endogenous Transitions
Susceptible Infected
Vigilant
Exogenous Transitions
Endogenous Transitions
Endogenous Transitions
Susceptible Infected
Vigilant
Prakash and Faloutsos 2012 76
Special case
Susceptible Infected
Vigilant
Prakash and Faloutsos 2012 77
Special case: H.I.V.
2121 VVISI
Multiple Infectious, Vigilant states
“Terminal”
“Non-terminal”
Prakash and Faloutsos 2012 78
Ingredient 2: NLDS+Stability
• View as a NLDS– discrete time – non-linear dynamical system (NLDS)
Probability vector Specifies the state of the system at time t
Details
size mN x 1
.
.
.
.
.
size N (number of nodes in the graph)
.
.
.
S
I
V
Prakash and Faloutsos 2012 79
Ingredient 2: NLDS + Stability
• View as a NLDS– discrete time – non-linear dynamical system (NLDS)
Non-linear functionExplicitly gives the evolution of system
Details
size mN x 1
.
.
.
.
.
.
.
.
Prakash and Faloutsos 2012 80
Ingredient 2: NLDS + Stability
• View as a NLDS– discrete time – non-linear dynamical system (NLDS)
• Threshold Stability of NLDS
Prakash and Faloutsos 2012 81
= probability that node i is not attacked by any of its infectious neighbors
Special case: SIR
size 3N x 1 I
R
S
NLDS
I
R
S
Details
Prakash and Faloutsos 2012 82
Fixed Point
11.
00.
00.
State when no node is infected
Q: Is it stable?
Details
Prakash and Faloutsos 2012 83
Stability for SIR
Stableunder threshold
Unstableabove threshold
Prakash and Faloutsos 2012 84
Another special case: SEIV
• E == exposed latent state
‘Dropping guard’‘Pre-emptive vaccination’
SEIV: itself generalizes • SIR (mumps) • SIS (flu-like) • SIRS (pertussis)• SEIR (varicella) etc.
no “sneezes”
infectious
Prakash and Faloutsos 2012 85
Our Solution: Part 1
• Approximate Eigen-drop (Δ λ)
• Δ λ ≈ SV(S) =
– Result using Matrix perturbation theory–u(i) == ‘eigenscore’
~~ pagerank(i)A u = λ . u
u(i)
Details
Prakash and Faloutsos 2012 86
P1: node importance P2: set diversity
Original Graph Select by P1 Select by P1+P2
Details
Prakash and Faloutsos 2012 87
Our Solution: Part 2: NetShield
• We prove that: SV(S) is sub-modular (& monotone non-decreasing)
• NetShield: Greedily add best node at each step
Corollary: Greedy algorithm works 1. NetShield is near-optimal (w.r.t. max SV(S)) 2. NetShield is O(nk2+m)
Footnote: near-optimal means SV(S NetShield) >= (1-1/e) SV(S Opt)
Prakash and Faloutsos 2012 88
> 10 days
0.1 seconds NetShield
100,000
>=1,000,000
10,000
10
1
0.1
0.01
argmax Δ λ
argmax SV(S)
Time
1,000
100
10,000,000x
# of vaccines
Lower is
better
NetShield Speed
Prakash and Faloutsos 2012 89
Full Dynamic Immunization
• Given: Set of T arbitrary graphs
• Find: k ‘best’ nodes to immunize (remove)
day
N
N night
N
N , weekend…..
In Prakash+ ECML-PKDD 2010
Prakash and Faloutsos 2012 90
Full Dynamic Immunization
• Our solution– Recall theorem– Simple: reduce (= )
• Goal: max eigendrop Δ
• No competing policy for comparison• We propose and evaluate many policies
Matrix Product
Δ =
day night
after before λλ λλ
λ
Prakash and Faloutsos 2012 91
Greedy-Dmax
A
Greedy-Dav
gA
Greedy-Aav
gA
Greedy-S
Optimal2022242628303234
Footprint after k=6 immunizations
Footprint
Performance of Policies
MIT Reality Mining
Lower is better
Prakash and Faloutsos 2012 92
Fractional Asymmetric Immunization
• Fractional Effect• Asymmetric Effect
Prakash and Faloutsos 2012 93
Fractional Asymmetric Immunization
• Fractional Effect• Asymmetric Effect
Prakash and Faloutsos 2012 94
Fractional Asymmetric Immunization
Fractional Effect [ f(x) = ]• Asymmetric Effect
Edge weakened by half
x5.0
Prakash and Faloutsos 2012 95
Fractional Asymmetric Immunization
Fractional Effect [ f(x) = ] Asymmetric Effect
Only incoming edges
x5.0
Prakash and Faloutsos 2012 96
Fractional Asymmetric Immunization
• Fractional Effect [ f(x) = ]• Asymmetric Effect
# antidotes = 3
x5.0
Prakash and Faloutsos 2012 97
Fractional Asymmetric Immunization
• Fractional Effect [ f(x) = ]• Asymmetric Effect
# antidotes = 3
x5.0
Prakash and Faloutsos 2012 98
Fractional Asymmetric Immunization
• Fractional Effect [ f(x) = ]• Asymmetric Effect
# antidotes = 3
x5.0
Prakash and Faloutsos 2012 99
Problem Statement
• Hospital-transfer networks – Number of patients transferred
• Given:– The SI model– Directed weighted graph – A total of k antidotes– A weakening function f(x)
• Find: – the ‘best’ distribution which minimizes the “footprint”
at some time t
Almost everything is different from before!
In Prakash+ 2012 (Submitted to Science)
Prakash and Faloutsos 2012 100
Naïve way
• How to estimate the footprint?– Run simulations? – too slow– takes about 3 weeks for graphs of typical size!
Prakash and Faloutsos 2012 101
Our Solution – Main Idea
• The SI model has no threshold– any infection will become an epidemic
• But can bound the expected number of infected nodes at time t
• Get the distribution which (approx.) minimizes the bound!– Original problem is NP-complete– SMART-ALLOC• Greedy and near-optimal if f() is monotone non-
increasing convex
Details