Making Diffusion Work for You: From Social Media to Epidemiology B. Aditya Prakash Computer Science Virginia Tech. BSEC Conference, ORNL, Aug 26, 2015



Page 2

Networks are everywhere!

Human Disease Network [Barabasi 2007]

Gene Regulatory Network [Decourty 2008]

Facebook Network [2010]

The Internet [2005]

Page 3

Dynamical Processes over networks are also everywhere!

Page 4

Why do we care?
• Social collaboration
• Information Diffusion
• Viral Marketing
• Epidemiology and Public Health
• Cyber Security
• Human mobility
• Games and Virtual Worlds
• Ecology
• ...

Page 5

Why do we care? (1: Epidemiology)

• Dynamical Processes over networks [AJPH 2007]

CDC data: Visualization of the first 35 tuberculosis (TB) patients and their 1039 contacts

Diseases over contact networks

SI Model
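The SI (susceptible-infected) model named above can be sketched as a discrete-time simulation on a contact network. This is a minimal illustration under stated assumptions: the network is an adjacency-list dict, every contact transmits with the same hypothetical probability `beta` per step, and infected nodes never recover. The function name and toy graph are our own, not from the talk.

```python
import random


def simulate_si(adj, seeds, beta, steps, rng=None):
    """Discrete-time SI epidemic on a contact network (illustrative sketch).

    adj:   dict mapping each node to a list of its neighbours.
    seeds: nodes infected at time 0.
    beta:  per-contact, per-step transmission probability (assumed constant).
    Returns the number of infected nodes after each step (index 0 = seeds).
    """
    rng = rng or random.Random(0)
    infected = set(seeds)
    sizes = [len(infected)]
    for _ in range(steps):
        newly_infected = set()
        for u in infected:
            for v in adj[u]:
                # Every infected neighbour gets an independent chance to infect v.
                if v not in infected and rng.random() < beta:
                    newly_infected.add(v)
        infected |= newly_infected  # SI: infected nodes never recover
        sizes.append(len(infected))
    return sizes


# Toy contact network: a path 0 - 1 - 2 - 3 - 4.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(simulate_si(path, seeds=[0], beta=1.0, steps=4))  # [1, 2, 3, 4, 5]
```

With `beta=1.0` the infection advances exactly one hop per step, which makes the output easy to check by hand; smaller values give a stochastic, monotonically growing curve.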

Page 6

Why do we care? (1: Epidemiology)

• Dynamical Processes over networks

• Each circle is a hospital
• ~3000 hospitals
• More than 30,000 patients transferred

[US-MEDICARE NETWORK 2005]

Problem: Given k units of disinfectant, whom to immunize?

Page 7

Why do we care? (1: Epidemiology)

[Figure panels: CURRENT PRACTICE vs. OUR METHOD]

~6x fewer!

[US-MEDICARE NETWORK 2005]

Hospital-acquired infections take 99K+ lives and cost $5B+ per year

Page 8

Why do we care? (2: Online Diffusion)

> 800m users, ~$1B revenue [WSJ 2010]

~100m active users

> 50m users

Page 9

Why do we care? (2: Online Diffusion)

• Dynamical Processes over networks

Celebrity

Buy Versace™!

Followers

Social Media Marketing

Page 10

Why do we care? (3: To change the world?)

• Dynamical Processes over networks

Social networks and Collaborative Action

Page 11

High Impact – Multiple Settings

Q. How to squash rumors faster?

Q. How do opinions spread?

Q. How to market better?

epidemic outbreaks

products/viruses

transmit s/w patches

Page 12

Research Theme

DATA: Large real-world networks & processes

ANALYSIS: Understanding

POLICY/ACTION: Managing/Utilizing

Page 13

Research Theme – Public Health

DATA: Modeling # patient transfers

ANALYSIS: Will an epidemic happen?

POLICY/ACTION: How to control outbreaks?

Page 14

Research Theme – Social Media

DATA: Modeling tweet spreading

POLICY/ACTION: How to market better?

ANALYSIS: How many cascades in the future?

Page 15

In this talk

DATA: Large real-world networks & processes

Q1: How to predict flu trends better?

Q2: How does ‘activity’ evolve over time?

Page 16

In this talk

Q3: How to control outbreaks?

POLICY/ACTION: Utilizing

Page 17

Outline

• Motivation
• Part 1: Learning Models (Empirical Studies)
• Part 2: Policy and Action (Algorithms)
• Conclusion

Page 18

Part 1: Empirical Studies

• Q1: How to predict Flu-trends better?

• Q2: How does activity evolve over time?

Page 19

Surveillance

• How to estimate and predict flu trends?

Population survey

Hospital record

Lab survey

Surveillance Report

Page 20

Google Flu Trends (GFT) & Twitter

• Estimate flu trends using online electronic sources

So cold today, I’m catching cold.

I have headache, sore throat, I can’t go to school today.

My nose is totally congested, I have a hard time understanding what I’m saying.

Page 21

Observation 1: States

• There are different states in an infection cycle.
• SEIR model:
  1. Susceptible
  2. Exposed
  3. Infected
  4. Recovered
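The SEIR cycle above is classically written as a set of compartmental ODEs. The sketch below integrates them with a simple forward-Euler step; it is a textbook illustration, not part of HFSTM, and the rates `beta` (transmission), `sigma` (E to I) and `gamma` (I to R) are hypothetical values chosen only to make the dynamics visible.

```python
def seir(beta, sigma, gamma, s0, e0, i0, r0, dt, steps):
    """Forward-Euler integration of the classic SEIR compartmental model.

    Compartments are population fractions with S + E + I + R = 1.
    beta: transmission rate; sigma: rate E -> I (1 / incubation period);
    gamma: recovery rate I -> R. All values here are illustrative assumptions.
    """
    s, e, i, r = s0, e0, i0, r0
    traj = [(s, e, i, r)]
    for _ in range(steps):
        new_exposed = beta * s * i * dt   # S -> E (contact with infected)
        new_infected = sigma * e * dt     # E -> I (incubation ends)
        new_recovered = gamma * i * dt    # I -> R (recovery)
        s -= new_exposed
        e += new_exposed - new_infected
        i += new_infected - new_recovered
        r += new_recovered
        traj.append((s, e, i, r))
    return traj


traj = seir(beta=0.5, sigma=0.2, gamma=0.1,
            s0=0.99, e0=0.0, i0=0.01, r0=0.0, dt=0.1, steps=1000)
print(round(sum(traj[-1]), 6))  # 1.0: the flows only move people between states
```

Forward Euler is chosen for readability; a production model would normally use an adaptive ODE solver.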

Page 22

Observation 2: Epidemiology vs. Social Media Gap

• Infection cases drop exponentially in epidemiology (Hethcote 2000)

• Keyword mentions drop in a power-law pattern in social media (Matsubara 2012)

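The gap between the two decay shapes is easiest to see on log-log axes: a power law is a straight line there, while an exponential keeps getting steeper. This sketch assumes an exponent of 1.5 (the -1.5 slope quoted later in the talk) and a hypothetical exponential rate of 0.1; neither value is fitted to data.

```python
import math


def exponential_decay(t, rate):
    """Epidemiology-style decay: cases fall as exp(-rate * t)."""
    return math.exp(-rate * t)


def power_law_decay(t, exponent=1.5):
    """Social-media-style decay: mentions fall as t ** (-exponent), for t >= 1."""
    return t ** (-exponent)


def log_log_slope(f, t1, t2):
    """Slope of log f(t) versus log t between t1 and t2."""
    return (math.log(f(t2)) - math.log(f(t1))) / (math.log(t2) - math.log(t1))


print(round(log_log_slope(power_law_decay, 10, 100), 3))  # -1.5: constant slope
print(log_log_slope(lambda t: exponential_decay(t, 0.1), 10, 100))  # far steeper
```

The practical consequence is that a power-law tail keeps a non-negligible volume long after an exponential one has vanished, which is why an epidemic-style model underestimates social media tails.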

Page 23

HFSTM Model

• Hidden Flu-State from Tweet Model (HFSTM)
  – Each word (w) in a tweet (O_i) can be generated by:
    • A background topic
    • Non-flu-related topics
    • State-related topics

[Figure: HFSTM graphical model, labeled: binary background switch; binary non-flu-related switch; word distribution; latent state; initial probability; transition probability; transition switch]

Page 24

HFSTM Model

• Generating tweets: generate the state for a tweet, then generate the topic for each word
  – State: [S, E, I]
  – Topic: [Background, Non-flu, State]

Example tweets:
S: This restaurant is really good
E: The movie was good but it was freezing
I: I think I have flu
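The generative story above can be sketched in a few lines. This is a hypothetical toy version, not the authors' implementation: states follow a first-order Markov chain, and each word flips a background switch and then a non-flu switch before falling back to the current state's topic. It simplifies the model by choosing words uniformly within each topic list (instead of per-topic word distributions) and by omitting the transition switch; all vocabularies and probabilities are made up.

```python
import random

STATES = ["S", "E", "I"]


def generate_tweets(n_tweets, init_prob, trans_prob, p_background, p_nonflu,
                    background_words, nonflu_words, state_words, words_per_tweet, rng):
    """Toy HFSTM-style generative process for one user's tweets (hypothetical).

    1. Each tweet gets a latent state (S, E or I) from a first-order Markov chain.
    2. Each word flips a background switch, then a non-flu switch; if both are
       off, the word is drawn from the current state's topic.
    """
    states, tweets = [], []
    state = rng.choices(STATES, weights=[init_prob[s] for s in STATES])[0]
    for t in range(n_tweets):
        if t > 0:
            row = trans_prob[state]
            state = rng.choices(STATES, weights=[row[s] for s in STATES])[0]
        words = []
        for _ in range(words_per_tweet):
            if rng.random() < p_background:
                words.append(rng.choice(background_words))
            elif rng.random() < p_nonflu:
                words.append(rng.choice(nonflu_words))
            else:
                words.append(rng.choice(state_words[state]))
        states.append(state)
        tweets.append(words)
    return states, tweets


states, tweets = generate_tweets(
    n_tweets=6,
    init_prob={"S": 1.0, "E": 0.0, "I": 0.0},
    trans_prob={"S": {"S": 0.7, "E": 0.3, "I": 0.0},
                "E": {"S": 0.1, "E": 0.5, "I": 0.4},
                "I": {"S": 0.2, "E": 0.0, "I": 0.8}},
    p_background=0.4, p_nonflu=0.3,
    background_words=["the", "is", "today"],
    nonflu_words=["movie", "restaurant", "good"],
    state_words={"S": ["fine", "great"], "E": ["freezing", "cold"], "I": ["flu", "fever"]},
    words_per_tweet=4,
    rng=random.Random(7),
)
for s, words in zip(states, tweets):
    print(s, " ".join(words))
```

Inference reverses this process: given only the words, it recovers which state each tweet was most likely generated from.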

Page 25

Inference

• EM-based algorithm: HFSTM-FIT
  – E-step:
    • $A_t(i) = P(O_1, O_2, \ldots, O_t, S_t = i)$
    • $B_t(i) = P(O_{t+1}, \ldots, O_{T_u} \mid S_t = i)$
    • $\gamma_t(i) = P(S_t = i \mid O_u)$
  – M-step:
    • Other parameters such as state transition probabilities, topic distributions, etc.
  – Parameters learned:
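The E-step quantities $A_t(i)$, $B_t(i)$ and $\gamma_t(i)$ are the standard forward-backward recursions over a hidden state chain. The sketch below computes them for one user's sequence, assuming the per-tweet likelihoods $P(O_t \mid S_t = i)$ are already available as `emit[t][i]` (in HFSTM they would come from the word and switch model, which is not reproduced here). Time is 0-indexed, and the recursions are unscaled, so long sequences would need log-space or scaling to avoid underflow.

```python
def forward_backward(init, trans, emit):
    """Forward-backward recursions for a discrete hidden-state chain.

    init[i]     = P(S_1 = i)
    trans[i][j] = P(S_{t+1} = j | S_t = i)
    emit[t][i]  = P(O_t | S_t = i), assumed precomputed for each tweet t.
    Returns (A, B, gamma, likelihood) where
      A[t][i]     = P(O_1..O_t, S_t = i)
      B[t][i]     = P(O_{t+1}..O_T | S_t = i)
      gamma[t][i] = P(S_t = i | O_1..O_T)
    """
    T, K = len(emit), len(init)
    A = [[0.0] * K for _ in range(T)]
    B = [[1.0] * K for _ in range(T)]  # B at the last step is 1 by definition
    for i in range(K):
        A[0][i] = init[i] * emit[0][i]
    for t in range(1, T):  # forward pass
        for j in range(K):
            A[t][j] = emit[t][j] * sum(A[t - 1][i] * trans[i][j] for i in range(K))
    for t in range(T - 2, -1, -1):  # backward pass
        for i in range(K):
            B[t][i] = sum(trans[i][j] * emit[t + 1][j] * B[t + 1][j] for j in range(K))
    likelihood = sum(A[T - 1])
    gamma = [[A[t][i] * B[t][i] / likelihood for i in range(K)] for t in range(T)]
    return A, B, gamma, likelihood


# Hypothetical two-state example with three tweets.
init = [0.6, 0.4]
trans = [[0.7, 0.3], [0.4, 0.6]]
emit = [[0.5, 0.1], [0.4, 0.3], [0.1, 0.6]]
A, B, gamma, likelihood = forward_backward(init, trans, emit)
for t, row in enumerate(gamma):
    print(t, [round(p, 3) for p in row])
```

The M-step would then re-estimate the transition probabilities and topic distributions using these posteriors as soft counts.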

Page 26

A possible issue with HFSTM

• Suffers from a large, noisy vocabulary.
• Semi-supervision for improvement:
  – Introduce weak supervision into HFSTM.

Page 27

HFSTM-A

• HFSTM-A(spect)
  – Introduce an aspect variable y expressing our belief about whether a word is flu-related or not.
  – The value of y biases the switch variables so that flu-related words are more likely to be explained by state topics.

When the aspect value (y) is introduced, the switching probabilities are updated accordingly.
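The slides do not give the exact update rule, so the following is only a hypothetical illustration of the idea: when y marks a word as flu-related, the weight of the "state topic" option is multiplied by a boost factor and the switch probabilities are renormalized. The `boost` parameter and the multiplicative form are our assumptions, not HFSTM-A's published formulation.

```python
def switch_probabilities(base, y, boost):
    """Hypothetical illustration of how an aspect label y could bias topic switches.

    base:  prior switch probabilities for 'background', 'nonflu' and 'state'.
    y:     1 if the word is believed to be flu-related, else 0.
    boost: assumed factor (> 1) applied to the 'state' option for flu-related words.
    Returns renormalized switch probabilities; `base` is left unchanged.
    """
    weights = dict(base)
    if y == 1:
        weights["state"] *= boost  # favour state topics for flu-related words
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}


base = {"background": 0.5, "nonflu": 0.3, "state": 0.2}
print(switch_probabilities(base, y=0, boost=4.0))  # unchanged priors
print(switch_probabilities(base, y=1, boost=4.0))  # 'state' rises from 0.2 to about 0.5
```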

Page 28

Vocabulary & Dataset

• Vocabulary (230 words):
  – Flu-related keyword list by Chakraborty SDM 2014
  – Extra state-related keyword list
• Dataset (34,000 tweets):
  – Identify infected users and collect their tweets
  – Train on data from Jun 20, 2013 to Aug 06, 2013
  – Test on two time periods:
    • Dec 01, 2012 to July 08, 2013
    • Nov 10, 2013 to Jan 26, 2014

Page 29

Learned word distributions

• The most probable words learned in each state:
  – Probably healthy: S
  – Having symptoms: E
  – Definitely sick: I

Page 30

Learned state transitions

[Figure panels: transition probabilities; transitions in real tweets]

Not directly flu-related, yet correctly identified

Learned by HFSTM:

Page 31

Flu trend fitting

• Ground truth:
  – The Pan American Health Organization (PAHO)
• Algorithms:
  – Baseline:
    • Count the number of keywords weekly as features, and regress to the ground-truth curve.
  – Google Flu Trends:
    • Take the Google Flu Trends data as input, and regress to the PAHO curve.
  – HFSTM:
    • Distinguish the states of keywords, and use only the number of keywords in the I state. Again, regress to PAHO.

Page 32

Flu trend fitting

• Linear regression to the case counts reported by PAHO (the ground truth)
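The regression step described above can be sketched as ordinary least squares with a single feature, for example the weekly count of keywords assigned to the I state. The weekly numbers in the usage lines are invented for illustration, not PAHO or Twitter data.

```python
def fit_line(x, y):
    """Ordinary least squares for y ≈ a * x + b with a single feature."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    a = sxy / sxx          # slope
    b = my - a * mx        # intercept
    return a, b


# Hypothetical weekly I-state keyword counts and reported case counts.
i_state_counts = [12, 30, 55, 80, 60, 25]
reported_cases = [130, 310, 560, 805, 610, 255]
a, b = fit_line(i_state_counts, reported_cases)
estimates = [a * c + b for c in i_state_counts]
print(round(a, 3), round(b, 3))
```

The same function serves all three methods on the slide; only the input feature changes (all keyword counts, Google Flu Trends values, or I-state keyword counts).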

Page 33

HFSTM-A

• With a vocabulary 10 times larger, HFSTM-A gives results qualitatively similar to HFSTM.

See Poster!

Page 34

Part 1: Empirical Studies

• Q1: How to predict Flu-trends better?

• Q2: How does activity evolve over time?

Page 35

Google Search Volume

e.g., given (1) the first spike, (2) the release date of two sequel movies, and (3) the access volume before the release date

[Figure: search volume with two unknown future spikes (? ?); panels: (1) first spike, (2) release date, (3) two weeks before release]

Page 36

Patterns

[Figure: data plotted on X and Y axes]

Page 37

Patterns

[Figure: X vs. Y, with more data]

Page 38

Patterns

[Figure: X vs. Y, labeled "Anomaly?"]

Page 39

Patterns

[Figure: X vs. Y, labeled "Anomaly?" and "Extrapolation"]

Page 40

Patterns

[Figure: X vs. Y, labeled "Anomaly", "Imputation", and "Extrapolation"]

Page 41

Patterns

[Figure labeled "Anomaly", "Imputation", "Extrapolation", and "Compression"]

Page 42

• Memes (# of mentions in blogs): short phrases sourced from U.S. politics in 2008

“you can put lipstick on a pig”

“yes we can”

Rise and fall patterns in social media

Page 43

Rise and fall patterns in social media

• Can we find a unifying model, which includes these patterns?

• Four classes on YouTube [Crane et al. ’08]
• Six classes on memes [Yang et al. ’11]

Page 44

Rise and fall patterns in social media

• Answer: YES!

• We can represent all patterns with a single model

In Matsubara, Sakurai, Prakash+ SIGKDD 2012

Page 45

Main idea: SpikeM
1. Uninformed bloggers (not yet aware of the rumor)
2. External shock at time n_b (e.g., breaking news)
3. Infection (word of mouth)

Infectiveness of a blog post at age n: $f(n) = \beta \cdot n^{-1.5}$
• β: strength of infection (quality of news)
• $n^{-1.5}$: power-law decay function (how a post's infectiveness fades with age)

[Figure: timeline at n = 0, n = n_b, and n = n_b + 1]
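The three ingredients above can be combined into a small recurrence. This is a simplified sketch in the spirit of SpikeM, written from our reading of the model, not the authors' code: `N` bloggers start uninformed, a shock of size `S_b` arrives at time `n_b`, and every earlier post keeps infecting with strength `beta * age ** -1.5`. The noise term `eps`, the optional `activity` factor, and the cap at the remaining uninformed population are our simplifications; the exact equation in the SIGKDD 2012 paper may differ. Parameter values in the usage line are hypothetical.

```python
def spikem_sketch(n_steps, N, beta, n_b, S_b, eps=0.0, activity=None):
    """Simplified SpikeM-style recurrence (a sketch, not the authors' code).

    N:        bloggers who are initially uninformed.
    beta:     strength of infection; a post of age m has infectiveness beta * m ** -1.5.
    n_b, S_b: time (>= 0) and size of the external shock (e.g., breaking news).
    eps:      optional background noise term.
    activity: optional function n -> factor in [0, 1] for periodic activity (None = 1).
    Returns dB, where dB[n] is the number of bloggers newly informed at time n.
    """
    uninformed = float(N)
    dB = [0.0] * n_steps
    shock = [0.0] * n_steps
    if 0 <= n_b < n_steps:
        shock[n_b] = S_b
    for n in range(n_b, n_steps - 1):
        # Word-of-mouth pressure: every earlier post (and the shock) still infects,
        # with a power-law decay in its age (n + 1 - t).
        pressure = sum((dB[t] + shock[t]) * beta * (n + 1 - t) ** -1.5
                       for t in range(n_b, n + 1))
        p = 1.0 if activity is None else activity(n + 1)
        new = p * (uninformed * pressure + eps)
        new = min(new, uninformed)  # cannot inform more bloggers than remain
        dB[n + 1] = new
        uninformed -= new
    return dB


dB = spikem_sketch(200, N=1000, beta=0.001, n_b=10, S_b=50)
peak = max(range(len(dB)), key=dB.__getitem__)
print("peak at n =", peak, "with", round(dB[peak], 1), "new posts")
```

The rise comes from word of mouth feeding on itself; the fall comes from two effects at once, the shrinking pool of uninformed bloggers and the power-law fading of old posts, which is what gives the tail its slow, non-exponential shape.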

Page 46

-1.5 slope: J. G. Oliveira et al. Human Dynamics: The Correspondence Patterns of Darwin and Einstein. Nature 437, 1251 (2005).

(also in Leskovec, McGlohon+, SDM 2007)

Page 47

SpikeM - with periodicity

• Full equation of SpikeM

Periodicity: bloggers change their activity over time (e.g., daily, weekly, yearly).

[Figure: activity vs. time n, with peak activity at 12pm and low activity at 3am]
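A periodic activity factor like the one pictured can be written as a shifted sinusoid that stays between `1 - amplitude` and 1 and multiplies the number of new posts at time n. The form below is close to our understanding of SpikeM's periodicity term, but treat it as an assumption: `amplitude`, `period` and `shift` are hypothetical parameter names, and a pure sinusoid cannot place the peak and the trough at arbitrary hours (the shift only rotates the whole cycle).

```python
import math


def periodic_activity(n, amplitude, period, shift):
    """Hypothetical periodic activity factor, always in [1 - amplitude, 1].

    amplitude: depth of the daily (or weekly, yearly) dip in activity, in [0, 1].
    period:    cycle length in time steps (e.g., 24 for hourly data and a daily cycle).
    shift:     phase offset that rotates the cycle.
    """
    return 1.0 - 0.5 * amplitude * (math.sin(2 * math.pi * (n + shift) / period) + 1.0)


daily = [periodic_activity(h, amplitude=0.8, period=24, shift=0) for h in range(24)]
print([round(v, 2) for v in daily])
```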

Page 48

Tail-part forecasts

• SpikeM can capture the tail part

Page 49

“What-if” forecasting

e.g., given (1) the first spike, (2) the release date of two sequel movies, and (3) the access volume before the release date

[Figure: search volume with two unknown future spikes (? ?); panels: (1) first spike, (2) release date, (3) two weeks before release]

Page 50

“What-if” forecasting

• SpikeM can forecast upcoming spikes
  – It can forecast not only the tail part but also the rise part!

[Figure panels: (1) first spike, (2) release date, (3) two weeks before release]

Page 51

Modeling Malware Penetration

• Worldwide Intelligence Network
  – Which machine got which malware (or legitimate files)
  – 1 billion nodes
  – 37 billion edges

• Q: Temporal patterns?

[Papalexakis et al. 2013]

Page 52

Q: Temporal Patterns

Looks familiar?

Page 53

SpikeM again (or SharkFin)

7 parameters only!

[Figure: two fitted series of ~400 points each]

Page 54

Latent Propagation Patterns

Page 55

Bonus: Protest Predictions

• Can Twitter provide a lead time?
• South American Twitter dataset
  – Language: Spanish/Portuguese
  – Idea:
    1. Look for trending keywords.
    2. Predict the event type of a protest using SpikeM parameters!

[Figure labels: a political tweet; Violent Protest (VP); Non-Violent Protest (P)]

[Sundereisan et al. ASONAM 2014] [Jin et al. SIGKDD 2014]
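The slide says fitted SpikeM parameters are used to predict the event type, but it does not say which classifier was used. As a stand-in, the sketch below uses a nearest-centroid classifier over parameter vectors; the feature values and labels in the test are invented, and a real system would likely normalize features and use a stronger model.

```python
import math


def train_centroids(examples):
    """examples: list of (feature_vector, label). Returns label -> mean vector."""
    sums, counts = {}, {}
    for x, label in examples:
        acc = sums.setdefault(label, [0.0] * len(x))
        for k, v in enumerate(x):
            acc[k] += v
        counts[label] = counts.get(label, 0) + 1
    return {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}


def predict(centroids, x):
    """Assign x to the label whose centroid is nearest in Euclidean distance."""
    return min(centroids, key=lambda lab: math.dist(centroids[lab], x))


# Hypothetical 2-D feature vectors (e.g., two fitted SpikeM parameters per keyword).
training = [([0.9, 0.1], "VP"), ([0.8, 0.2], "VP"), ([0.1, 0.9], "P"), ([0.2, 0.8], "P")]
centroids = train_centroids(training)
print(predict(centroids, [0.7, 0.3]))
```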

Page 56

Part 2: Algorithms

• Q3: How to control outbreaks?

(Broad theme: Network Topology Manipulation)

Page 57

Immunization (= Interventions)

• Different flavors:
  – Pre-emptive
  – Data-aware

Page 58

Pre-emptive: Vulnerability

• The first eigenvalue λ1 (of the adjacency matrix) is sufficient for most diffusion models. [Prakash et al. ICDM’11, selected among the best papers]

λ1 determines the epidemic threshold.

[Figure: “Safe”, “Vulnerable”, “Deadly”: increasing λ1 means increasing vulnerability]
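λ1 can be estimated without a full eigendecomposition by power iteration. The sketch below does this on an adjacency-list graph and then applies the SIS form of the threshold condition, (β/δ) · λ1 < 1, with β the infection rate and δ the recovery rate; for other cascade models the constant multiplying λ1 differs, so this check is an assumption specific to SIS.

```python
import math


def largest_eigenvalue(adj, iters=100):
    """Estimate lambda_1 of an undirected graph's adjacency matrix by power iteration.

    adj: dict mapping each node to a list of neighbours (symmetric, unweighted).
    The growth rate ||A x|| of a unit vector x converges to lambda_1 for
    non-negative symmetric matrices, including bipartite graphs.
    """
    nodes = list(adj)
    x = {v: 1.0 / math.sqrt(len(nodes)) for v in nodes}
    lam = 0.0
    for _ in range(iters):
        y = {v: sum(x[u] for u in adj[v]) for v in nodes}  # y = A x
        lam = math.sqrt(sum(val * val for val in y.values()))
        if lam == 0.0:
            return 0.0  # graph with no edges
        x = {v: val / lam for v, val in y.items()}
    return lam


def dies_out(lambda_1, beta, delta):
    """SIS threshold: the infection dies out when (beta / delta) * lambda_1 < 1."""
    return (beta / delta) * lambda_1 < 1.0


star = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
lam = largest_eigenvalue(star)
print(round(lam, 6))                          # 2.0 (a star with 4 leaves has lambda_1 = 2)
print(dies_out(lam, beta=0.1, delta=0.5))     # True: 0.1 / 0.5 * 2 = 0.4 < 1
```

Because only λ1 enters the check, lowering λ1 by removing nodes or edges directly makes the network safer, which motivates the goal on the next slide.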

Page 59

Goal

• Decrease λ1 as much as possible

• Node-based [Tong, Prakash+ ICDM 2010]
• Edge-based [Tong, Prakash, Eliassi-Rad+ CIKM 2012, Best Paper Award]
• Edge manipulation (see next)

Page 60

Fractional Asymmetric Immunization

Hospital Another Hospital

Drug-resistant Bacteria (like XDR-TB)

[Prakash, Adamic, Iwashyna (M.D.) SDM 2013]

Page 61

Fractional Asymmetric Immunization

Hospital Another Hospital

Drug-resistant Bacteria (like XDR-TB)

= f

Page 62

Fractional Asymmetric Immunization

Hospital Another Hospital

Problem: Given k units of disinfectant, how should we distribute them to maximize the number of hospitals saved?

Page 63

Our Solution

• Part 1: Value
  – Approximate the eigen-drop (Δλ)
  – Matrix perturbation theory
• Part 2: Algorithm
  – Greedily pick the best node at each step
  – Near-optimal due to submodularity
• SmartAlloc (linear complexity)
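The "approximate eigen-drop, then pick greedily" pattern can be illustrated on the simpler problem of removing whole nodes. This is not SmartAlloc, which allocates fractional, asymmetric amounts of disinfectant; it is a sketch of the same two ingredients. First-order perturbation theory says that deleting node i from a symmetric graph without self-loops lowers λ1 by roughly 2 · λ1 · u_i², where u is the unit principal eigenvector; each greedy step recomputes u and removes the node with the largest estimate. The shift by the identity in the power iteration is our choice, made so that the eigenvector converges on bipartite graphs too.

```python
import math


def principal_eigen(adj, alive, iters=300):
    """Power iteration on (A + I) restricted to `alive` nodes.

    The +I shift keeps the iteration from oscillating on bipartite graphs.
    Returns (lambda_1 of A, unit eigenvector as a dict).
    """
    nodes = [v for v in adj if v in alive]
    if not nodes:
        return 0.0, {}
    x = {v: 1.0 / math.sqrt(len(nodes)) for v in nodes}
    for _ in range(iters):
        y = {v: x[v] + sum(x[u] for u in adj[v] if u in alive) for v in nodes}
        norm = math.sqrt(sum(val * val for val in y.values()))
        x = {v: val / norm for v, val in y.items()}
    # Rayleigh quotient x^T A x with ||x|| = 1 gives lambda_1 of A (not A + I).
    lam = sum(x[v] * sum(x[u] for u in adj[v] if u in alive) for v in nodes)
    return lam, x


def greedy_eigendrop(adj, k):
    """Remove k nodes greedily, each time taking the node with the largest
    first-order eigen-drop estimate 2 * lambda_1 * u_i ** 2."""
    alive = set(adj)
    removed = []
    for _ in range(k):
        lam, u = principal_eigen(adj, alive)
        if not u:
            break
        best = max(u, key=lambda v: 2 * lam * u[v] ** 2)
        removed.append(best)
        alive.discard(best)
    return removed


# A triangle (0, 1, 2) plus a separate edge (3, 4): the triangle dominates lambda_1.
g = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4], 4: [3]}
print(greedy_eigendrop(g, 2))
```

The first-order estimate is exact only for small perturbations, so for whole-node removal it is a ranking heuristic; recomputing the eigenvector after every pick keeps the greedy step honest.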

Page 64

Our Algorithm “SMART-ALLOC”

[Figure panels: ~CURRENT PRACTICE vs. SMART-ALLOC]

[US-MEDICARE NETWORK 2005]
• Each circle is a hospital; ~3000 hospitals
• More than 30,000 patients transferred

~6x fewer!

Page 65

Running Time

Wall-clock time (lower is better):
• Simulations (best competitor): > 1 week
• SMART-ALLOC: 14 secs

> 30,000x speed-up!

Page 66

Experiments

• PENN-NETWORK (K = 200): ~5x
• SECOND-LIFE (K = 2000): ~2.5x

(Lower is better)

Page 67

Latest results

• First provable approximation algorithms for the edge-based problem [Saha, Adiga, Prakash, Vullikanti SDM 2015]
  – $O(\log^2 n)$-factor (can be improved to $O(\log n)$), based on the idea of removing closed walks
  – Semidefinite-programming rounding-based $O(1)$ factor

Page 68

Data-aware Immunization

[Figure: graph with infected nodes, and its dominator tree]

• Given: a graph and infected nodes. Find: the ‘best’ nodes for immunization.
• Complexity
  – NP-hard
  – Hard to approximate within an absolute error
• DAVA-tree
  – Optimal solution on the tree
• DAVA and DAVA-fast
  – Merge the infected nodes
  – Build a “dominator tree”, and run DAVA-tree
• Running time: subquadratic
  – DAVA: $O(k(|E| + |V|\log|V|))$
  – DAVA-fast: $O(|E| + |V|\log|V|)$

[Zhang and Prakash, SDM 2014]
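The dominator idea can be illustrated with a brute-force greedy: merge the infected nodes into one source, and immunize the healthy node whose removal cuts off the most nodes from that source, i.e., the node that dominates the largest part of the reachable graph. This is a deliberately naive sketch, not DAVA: it treats every edge as certain to transmit (ignoring edge probabilities) and runs a full BFS per candidate, costing roughly O(k · |V| · (|V| + |E|)) instead of DAVA's subquadratic bound.

```python
from collections import deque


def reachable(adj, sources, blocked):
    """Nodes reachable from `sources` without passing through `blocked` (BFS)."""
    seen = {s for s in sources if s not in blocked}
    queue = deque(seen)
    while queue:
        u = queue.popleft()
        for v in adj.get(u, []):
            if v not in seen and v not in blocked:
                seen.add(v)
                queue.append(v)
    return seen


def greedy_protect(adj, infected, k):
    """Pick up to k healthy nodes to immunize; each pick is the node whose removal
    disconnects the most nodes from the (merged) infected source."""
    chosen = set()
    for _ in range(k):
        base = reachable(adj, infected, chosen)
        best, best_saved = None, -1
        for v in base - set(infected):
            saved = len(base) - len(reachable(adj, infected, chosen | {v}))
            if saved > best_saved:
                best, best_saved = v, saved
        if best is None:
            break  # nothing left to protect
        chosen.add(best)
    return chosen


# Directed toy graph: infected node 0 reaches 1 (which leads to 2, 3, 4) and 5.
adj = {0: [1, 5], 1: [2, 3, 4], 2: [], 3: [], 4: [], 5: []}
print(greedy_protect(adj, infected=[0], k=1))  # {1}: it dominates 1, 2, 3 and 4
```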

Page 69

Extensions

• Can be extended to uncertain and noisy initial data as well!

[Zhang and Prakash, CIKM 2014]

[Figure labels: Twitter Firehose API; 1% sample]

Page 70

Group-based Immunization

How to select groups to minimize the epidemic?

[Figure: example network with groups labeled A to F]

• Epidemiology
  – Contact networks
  – People are grouped by age, demographics, occupation, …
• Social media
  – Friendship networks
  – Friends are grouped by shared interests, e.g., Facebook pages

[Zhang, Adiga, Vullikanti, Prakash, ICDM 2015]

See Poster!

Page 71

Outline

• Motivation
• Part 1: Learning Models (Empirical Studies)
• Part 2: Policy and Action (Algorithms)
• Conclusion and Future Plans


Page 72

Future Plans

DATA: Large real-world networks & processes

ANALYSIS: Understanding

POLICY/ACTION: Managing

Page 73

Scalability – Big Data

• Datasets of unprecedented scale
  – High dimensionality and sample size!
• Need scalable algorithms for
  – Learning models
  – Developing policy
• Leverage parallel systems
  – Map-Reduce clusters (like Hadoop) for data-intensive jobs (more than 6000 machines)
  – Parallelized compute-intensive simulations (like Condor)

Page 74

Uncertain Data in Cascade analysis (more implementable policies)

• Correcting for missing data [Sundereisan, Vreeken, Prakash 2014]
  [Figure panels: original graph with nodes sampled off; culprits, with missing nodes filled in]
• Designing more robust immunization policies [Zhang and Prakash, CIKM 2014]

Page 75

References

1. Scalable Vaccine Distribution in Large Graphs given Uncertain Data (Yao Zhang and B. Aditya Prakash) – In CIKM 2014.
2. Fast Influence-based Coarsening for Large Networks (Manish Purohit, B. Aditya Prakash, Chanhyun Kang, Yao Zhang and V. S. Subrahmanian) – In SIGKDD 2014.
3. DAVA: Distributing Vaccines over Large Networks under Prior Information (Yao Zhang and B. Aditya Prakash) – In SDM 2014.
4. Fractional Immunization on Networks (B. Aditya Prakash, Lada Adamic, Jack Iwashyna, Hanghang Tong, Christos Faloutsos) – In SDM 2013.
5. Spotting Culprits in Epidemics: Who and How many? (B. Aditya Prakash, Jilles Vreeken, Christos Faloutsos) – In ICDM 2012, Brussels (Invited to KAIS Journal Best Papers of ICDM).
6. Gelling, and Melting, Large Graphs through Edge Manipulation (Hanghang Tong, B. Aditya Prakash, Tina Eliassi-Rad, Michalis Faloutsos, Christos Faloutsos) – In ACM CIKM 2012, Hawaii (Best Paper Award).
7. Rise and Fall Patterns of Information Diffusion: Model and Implications (Yasuko Matsubara, Yasushi Sakurai, B. Aditya Prakash, Lei Li, Christos Faloutsos) – In SIGKDD 2012, Beijing.
8. Interacting Viruses on a Network: Can both survive? (Alex Beutel, B. Aditya Prakash, Roni Rosenfeld, Christos Faloutsos) – In SIGKDD 2012, Beijing.
9. Winner-takes-all: Competing Viruses or Ideas on fair-play networks (B. Aditya Prakash, Alex Beutel, Roni Rosenfeld, Christos Faloutsos) – In WWW 2012, Lyon.
10. Threshold Conditions for Arbitrary Cascade Models on Arbitrary Networks (B. Aditya Prakash, Deepayan Chakrabarti, Michalis Faloutsos, Nicholas Valler, Christos Faloutsos) – In IEEE ICDM 2011, Vancouver (Invited to KAIS Journal Best Papers of ICDM).
11. Time Series Clustering: Complex is Simpler! (Lei Li, B. Aditya Prakash) – In ICML 2011, Bellevue.
12. Epidemic Spreading on Mobile Ad Hoc Networks: Determining the Tipping Point (Nicholas Valler, B. Aditya Prakash, Hanghang Tong, Michalis Faloutsos and Christos Faloutsos) – In IEEE NETWORKING 2011, Valencia, Spain.
13. Formalizing the BGP stability problem: patterns and a chaotic model (B. Aditya Prakash, Michalis Faloutsos and Christos Faloutsos) – In IEEE INFOCOM NetSciCom Workshop, 2011.
14. On the Vulnerability of Large Graphs (Hanghang Tong, B. Aditya Prakash, Tina Eliassi-Rad and Christos Faloutsos) – In IEEE ICDM 2010, Sydney, Australia.
15. Virus Propagation on Time-Varying Networks: Theory and Immunization Algorithms (B. Aditya Prakash, Hanghang Tong, Nicholas Valler, Michalis Faloutsos and Christos Faloutsos) – In ECML-PKDD 2010, Barcelona, Spain.
16. MetricForensics: A Multi-Level Approach for Mining Volatile Graphs (Keith Henderson, Tina Eliassi-Rad, Christos Faloutsos, Leman Akoglu, Lei Li, Koji Maruhashi, B. Aditya Prakash and Hanghang Tong) – In SIGKDD 2010, Washington D.C.

Page 76

Acknowledgements

Collaborators: Christos Faloutsos, Roni Rosenfeld, Michalis Faloutsos, Lada Adamic, Theodore Iwashyna (M.D.), Dave Andersen, Tina Eliassi-Rad, Iulian Neamtiu, Varun Gupta, Jilles Vreeken, V. S. Subrahmanian, John Brownstein (M.D.), Deepayan Chakrabarti, Hanghang Tong, Kunal Punera, Ashwin Sridharan, Sridhar Machiraju, Mukund Seshadri, Alice Zheng, Lei Li, Polo Chau, Nicholas Valler, Alex Beutel, Xuetao Wei

Page 77

Acknowledgements

• Students: Liangzhe Chen, Shashidhar Sundereisan, Benjamin Wang, Yao Zhang, Sorour Amiri

Page 78

Acknowledgements

Funding

Page 79

Analysis Policy/Action Data

Making Diffusion Work for You

B. Aditya Prakash http://www.cs.vt.edu/~badityap