Network Dynamics and Simulation Science Laboratory A Data-driven Epidemiological Model Stephen Eubank, Christopher Barrett, Madhav V. Marathe GIACS Conference on "Data in Complex Systems" Palermo, Italy, April 7-9 2008

Post on 22-Dec-2015



A Data-driven Epidemiological Model

Stephen Eubank, Christopher Barrett, Madhav V. Marathe

GIACS Conference on "Data in Complex Systems"

Palermo, Italy, April 7-9 2008


Data driven epidemiological models

I. Complex system

II. Data driven, individual-based simulation

III. Privacy and accuracy issues


What’s so complex about epidemiology?

Consider an “outbreak” among 4 people

[Figure legend: susceptible, infectious, removed]


Outbreaks can be represented as Markov processes

A given configuration of the system probabilistically transitions into any of several other configurations.

Even a small system has many possible configurations.
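As a rough illustration of how quickly the configuration space grows, the sketch below enumerates every configuration of the 4-person outbreak, assuming each person is in exactly one of the three states from the previous slide. The names `STATES` and `configurations` are illustrative, not taken from the talk.

```python
from itertools import product

# Assumed state labels: susceptible, infectious, removed.
STATES = ("S", "I", "R")

def configurations(n_people):
    """Return every assignment of one health state to each of n_people."""
    return list(product(STATES, repeat=n_people))

configs = configurations(4)
n_configs = len(configs)  # 3 ** 4 configurations

# A fully general Markov chain could need one probability for every
# ordered pair of configurations.
n_possible_transitions = n_configs * n_configs

print(n_configs, n_possible_transitions)
```

Even with only four people, a completely unconstrained chain has thousands of candidate transition probabilities, which is the estimation problem the next slide raises.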


Very little data is available to estimate this process

Historically, we (partially) observe 1 or 2 Markov chains

We want to estimate transition probabilities on every edge


Aggregation simplifies the model …

… at the cost of reduced information content.

$p(C'_{t+1} \mid C'_t)$ is less informative than $p(C_{t+1} \mid C_t)$ when $C' \subset C$.

[Figure: aggregated state space; horizontal axis #S (0 to 4), vertical axis #I (0 to 4)]
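A small sketch of what aggregation does to the 4-person example, assuming three states per person and aggregation to the pair (#S, #I). The helper name `aggregate` is illustrative only.

```python
from collections import Counter
from itertools import product

# Assumed state labels: susceptible, infectious, removed.
STATES = ("S", "I", "R")

def aggregate(config):
    """Map an individual-level configuration to the counts (#S, #I)."""
    return (config.count("S"), config.count("I"))

detailed = list(product(STATES, repeat=4))

# How many detailed configurations collapse onto each aggregated cell.
cell_sizes = Counter(aggregate(c) for c in detailed)

print(len(detailed), len(cell_sizes))
print(cell_sizes[(2, 2)])  # configurations with 2 susceptible and 2 infectious
```

The aggregated model has far fewer states, but each cell no longer says which individuals are infected, which is the information lost.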


Other assumptions further simplify the model …

… but are unwarranted in social systems, where components are

1. Heterogeneous (distinguishable)
2. Intentional (behavior not determined by physical laws)


Aggregation naturally makes contact with observations

Observations of outbreaks often ignore heterogeneity and intention, and provide only point estimates.


“An approximate answer to the right problem is worth a good deal more than an exact answer to an approximate problem”
- J. Tukey

“All models are wrong, but some are useful”
- G.E.P. Box

A system is complex “if its behavior crucially depends on the details of its parts.”
- G. Parisi


Interaction approach simplifies process itself

Interactions among system components completely determine transition probabilities among configurations



Calibrating with unexpectedly rich data

• For aerosol-borne pathogens, the probability of transmission is related to physical proximity, duration, etc.

• The interaction approach reduces to estimating a social network.

• There is much more data available for this than for outbreaks.

• But it is not directly observable.
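To make the dependence on contact duration concrete, here is a minimal sketch under one assumed functional form: each minute of contact carries an independent, constant risk. The talk does not state its transmission function, so both the formula and the parameter `per_minute_risk` are assumptions for illustration.

```python
def transmission_probability(duration_minutes, per_minute_risk):
    """Probability of at least one transmission during a contact.

    Assumes (hypothetically) that every minute of contact is an
    independent trial with the same probability per_minute_risk.
    """
    if not 0.0 <= per_minute_risk <= 1.0:
        raise ValueError("per_minute_risk must lie in [0, 1]")
    if duration_minutes < 0:
        raise ValueError("duration_minutes must be non-negative")
    return 1.0 - (1.0 - per_minute_risk) ** duration_minutes

# Longer contacts are riskier, but the probability saturates below 1.
short_contact = transmission_probability(45, 0.001)
long_contact = transmission_probability(320, 0.001)
print(short_contact, long_contact)
```

Under this form, the only disease-specific input is the per-minute risk; everything else comes from who meets whom and for how long, which is why the problem reduces to estimating the social network.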


How can we estimate a social network?


A possible approach we didn’t use

• Consider a subset of random networks subject to certain constraints
• Constraints should be relevant to the global dynamics, i.e. epidemics
• But what are those? A “chicken or the egg” problem:

It would seem offhand that a taxonomy of “nets” … would arise naturally from the consideration of the statistical parameters... But the statistical parameters themselves are singled out on the basis of taxonomic considerations, which have yet to be clarified.

- Anatol Rapoport and William Horvath, Behav Sci. 1961, 6, 279–291


Questions to drive model development

1. What is the optimal targeted allocation of antivirals, used prophylactically or therapeutically, to mitigate an influenza pandemic?

2. What combination of targeted antivirals and feasible, community-based, non-pharmaceutical interventions (e.g. closing schools, allowing liberal leave from work) can best delay an outbreak from becoming epidemic for several months?

For both 1 and 2, models must compare changes in the social network with changes in transmissibility

This is an example of policy informatics for complex systems


Interventions specified naturally by effect on network

No single “knob” reduces overall transmission by 50%



Step 1. Create a synthetic population

• Census data
  – Individual demographics
    • Age and gender
  – Household characteristics
    • Size and income


Successive refinement of synthetic data

• Start from a proto-population, e.g. a list of IDs.
• Add observed data.
• Capture correlations in the data using statistical models (iterative proportional fitting from Public Use Microdata).

ID          Gender   Household
1           M        1
...         ...      ...
3 x 10^8    F        1.2 x 10^8
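Iterative proportional fitting can be sketched in a few lines. The example below rescales a small seed table (standing in for a microdata sample) until its row and column totals match target marginals (standing in for census totals). All numbers, and the row and column meanings, are hypothetical; this is a generic two-dimensional version, not the lab's implementation.

```python
def ipf(seed, row_targets, col_targets, iterations=50):
    """Rescale seed so its row and column sums approach the targets."""
    table = [row[:] for row in seed]  # work on a copy
    for _ in range(iterations):
        # Step 1: scale each row to match its target total.
        for i, target in enumerate(row_targets):
            total = sum(table[i])
            if total > 0:
                table[i] = [value * target / total for value in table[i]]
        # Step 2: scale each column to match its target total.
        for j, target in enumerate(col_targets):
            total = sum(row[j] for row in table)
            if total > 0:
                for row in table:
                    row[j] *= target / total
    return table

# Hypothetical sample counts: rows = two age bands, columns = two household sizes.
seed = [[10.0, 5.0], [4.0, 8.0]]
row_targets = [600.0, 400.0]   # hypothetical census totals per age band
col_targets = [550.0, 450.0]   # hypothetical census totals per household size

fitted = ipf(seed, row_targets, col_targets)
for row in fitted:
    print([round(value, 1) for value in row])
```

The fitted table matches the marginals while keeping the association between rows and columns seen in the seed, which is the sense in which the step "captures correlations in the data".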


Step 2. Assign activities, locations & times

• Locations
  – Dun & Bradstreet data

• Activity surveys
  – Matched to households by demographics
  – Matched to locations by activity type & travel time


Successive refinement of synthetic data

• Surveys are a very different kind of data source than the census
• This step depends on data fusion capability
• Some values may be outcomes of very large games, not statistical models

ID          Gender   Household     Activities      Activity Locations   Activity Times
1           M        1             School, shop    2743                 8:00, 3:00
...         ...      ...           ...             ...                  ...
3 x 10^8    F        1.2 x 10^8    Work, social    98734723947          9:00, 7:30


So far: a typical family’s day

[Figure: timeline of one family's day; activity labels: Home, Carpool, Work, Lunch, Bus, Shopping, Car, Daycare, School; horizontal axis: time]


Overlapping families’ days create a social network
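One way to picture this step: two people are in contact whenever their visits to the same location overlap in time. The sketch below assumes visits are recorded as (person, location, start minute, end minute); the record layout, identifiers, and times are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical visit records: (person_id, location_id, start_minute, end_minute).
visits = [
    (1, "school", 480, 900),
    (2, "school", 540, 840),
    (3, "office", 540, 1020),
    (1, "shop", 930, 975),
    (3, "shop", 960, 990),
]

def contacts_from_visits(visits):
    """Return {(person_a, person_b): total minutes spent co-located}."""
    by_location = defaultdict(list)
    for person, location, start, end in visits:
        by_location[location].append((person, start, end))

    contact_minutes = defaultdict(int)
    for occupants in by_location.values():
        for (p, s1, e1), (q, s2, e2) in combinations(occupants, 2):
            overlap = min(e1, e2) - max(s1, s2)  # shared time in minutes
            if p != q and overlap > 0:
                contact_minutes[tuple(sorted((p, q)))] += overlap
    return dict(contact_minutes)

contacts = contacts_from_visits(visits)
print(contacts)
```

The resulting weighted edge list is a person-person contact network; the same visit records also define a person-location network.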


Successive refinement of synthetic data

• Gives us a generative model for contacts
• More powerful than traditional encapsulated agents
• Note: each byte of data / person adds ~300 MB to the database

ID          Gender   Household     Activities      Activity Locations   Activity Times   Contacts        Contact Duration
1           M        1             School, shop    2743                 8:00, 3:00       2,3,4836, 289   5:20, 0:45
...         ...      ...           ...             ...                  ...              ...             ...
3 x 10^8    F        1.2 x 10^8    Work, social    98734723947          9:00, 7:30


Using data for purposes other than intended

Possibly the only epidemiological model that has been calibrated using automobile traffic counts!

(Because the same activity model generates both transportation demand and contact networks)

Activities adapt to situation & generate network changes


Derive disease interaction from social network

Interactions only need to get a few things right:
• Susceptibility
• Infectivity as a function of time since exposure


Modeling pandemic influenza

• Nobody knows what pandemic flu will look like

• Assume something like seasonal flu, but with less immunity

• Create several “flu” bugs in silico
  – Moderate (10% attack rate)
  – Strong (20-25% attack rate)
  – Catastrophic (> 50% attack rate)

• For each, fix other characteristics:
  – Incubation period: 2-3 days
  – Infectious period: 2-5 days
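The fixed characteristics above can be turned into a simple infectivity profile. This sketch assumes whole-day periods drawn uniformly from the stated ranges and constant infectivity during the infectious period; the slides give only the ranges, so the distributions and the step-shaped profile are assumptions.

```python
import random

def sample_course(rng):
    """Draw (incubation_days, infectious_days) from the stated ranges.

    Uniform integer draws are an assumption made for illustration.
    """
    incubation_days = rng.randint(2, 3)
    infectious_days = rng.randint(2, 5)
    return incubation_days, infectious_days

def infectivity(days_since_exposure, incubation_days, infectious_days):
    """Relative infectivity: 0 while incubating, 1 while infectious, 0 after."""
    start = incubation_days
    end = incubation_days + infectious_days
    return 1.0 if start <= days_since_exposure < end else 0.0

rng = random.Random(0)  # seeded for repeatability
incubation, infectious = sample_course(rng)
profile = [infectivity(day, incubation, infectious) for day in range(10)]
print(incubation, infectious, profile)
```

Scaling this profile up or down is one way the moderate, strong, and catastrophic strains could differ while sharing the same timing.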


Resolution, fidelity, and accuracy are different

• Resolution describes level of aggregation, e.g. individuals vs populations

• Fidelity describes the completeness of the representation’s features, e.g. age vs (age, gender, income, household size, education)

• Accuracy describes the correctness of features and correlations, e.g. is mixing by age derived from social network correct?

“Validity” (always for a particular question) depends on all 3.


Effect of changes in social networks (above) on disease dynamics (below)


Characterizing the resulting network


Degree Distribution, location-location


Degree Distribution, people-people


Sensitivity to parameters


Assortative Mixing

• Static people-people projection is assortative
  – by degree (~0.25)
  – but not as strongly by age, income, household size, …

This is

• Like other social networks
• Unlike
  – technological networks
  – Erdős-Rényi random graphs
  – Barabási-Albert networks
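Assortativity by degree is commonly measured as the Pearson correlation between the degrees at the two ends of each edge. The sketch below computes that coefficient for an undirected edge list; the function name and the toy star graph are illustrative, and the talk does not say which estimator produced the ~0.25 figure.

```python
from collections import Counter

def degree_assortativity(edges):
    """Pearson correlation of endpoint degrees over undirected edges."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1

    # Count each edge in both directions so the measure is symmetric.
    xs, ys = [], []
    for u, v in edges:
        xs += [degree[u], degree[v]]
        ys += [degree[v], degree[u]]

    n = len(xs)
    mean = sum(xs) / n
    covariance = sum((x - mean) * (y - mean) for x, y in zip(xs, ys)) / n
    variance = sum((x - mean) ** 2 for x in xs) / n
    if variance == 0:
        raise ValueError("undefined when every endpoint has the same degree")
    return covariance / variance

# Toy example: a star, where one hub touches only low-degree nodes.
star = [(0, 1), (0, 2), (0, 3)]
print(degree_assortativity(star))
```

A star is the extreme disassortative case; positive values, as reported for the people-people projection, mean high-degree people tend to be in contact with other high-degree people.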


Removing high degree people useless


Removing high degree locations better


Summary

• Complex systems models are hungry for detail (= data)

• Privacy & extrapolation require “synthetic” data, combining observations (declarative), statistical models, and simulation results (procedural)

• Validity of synthetic data depends on resolution, fidelity, accuracy, and the question it is intended to answer


When is this model simpler?

Notation: x and y are states of a component at time t and t+1

1. Components’ states are updated independently:

# parameters

2. Interactions are pairwise independent:

# parameters


3. Most components do not interact directly:

# parameters

4. Only one state transition, S → I, is affected by interactions:

# parameters


Architecture


Computational Resources

• Demonstration experiment
  – 8 experiments (exp ids: 1083 to 1090)
  – 24 cells with 200 days and 25 reps

• Computations performed
  – 291 million contacts * 200 days * 25 reps * 24 cells = 34.92 trillion transmission evaluations

• Time requirements
  – Single processor: 2 years 340 days
  – Small cluster (10 nodes, 4 cores): 26 days 18 hours
  – Current IDAC cluster: > 3 hours
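The figures on this slide can be checked with a few lines of arithmetic. The sketch assumes 365-day years, reads "10 nodes, 4 cores" as 40 cores in total, and assumes ideal linear speedup; those readings are assumptions, not statements from the talk.

```python
contacts = 291_000_000
days, reps, cells = 200, 25, 24

# Product as written on the slide.
evaluations = contacts * days * reps * cells
print(f"{evaluations:.4e} transmission evaluations")

# Single-processor time, assuming 365-day years.
single_processor_days = 2 * 365 + 340

# Assumed reading of the small cluster: 10 nodes with 4 cores each.
cores = 10 * 4
ideal_cluster_days = single_processor_days / cores

whole_days = int(ideal_cluster_days)
hours = round((ideal_cluster_days - whole_days) * 24)
print(f"{whole_days} days {hours} hours on {cores} cores")
```

Under these assumptions the product is about 3.492 × 10^13, and the small-cluster time agrees with the single-processor time divided by 40.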


Example Located Synthetic Population


Example Route Plans

[Figure: route plans for the first and second person in a household; stops labeled HOME, WORK, LUNCH, DOCTOR, SHOP]


Time Slice of a Typical Family’s Day


How much does detail matter?

• Interaction picture:
  – Dynamics of outbreak depend on topology
  – How and how much?
  – What differences in network topology are relevant to prevention/mitigation?

• What statistics capture the difference?
• Answer staring us in the face (see above):
  – Overall attack rate is a function of the topology of the network

• Other measures for other questions
  – Attack rate by transmissibility as function of edges retained
  – Vulnerability of a subset as function of edges retained
  – Distribution of vulnerabilities as function of edges retained


Edge deletion in a graph

• RTI synthesized poultry farm network
• In collaboration with UPenn, studying outbreaks
• National network, essentially complete graph
  – Distribution of weights

• Attack rate as function of edges retained
• Attack rate by transmissibility as function of edges retained
• Vulnerability of a subset as function of edges retained
• Distribution of vulnerabilities as function of edges retained


Model comparison

• Compare outcomes of same scenarios
  – Compare distributions of outcomes of similar scenarios
  – Compare distributions of summary statistics of outcomes of similar scenarios
  – Compare distributions of answers to questions about similar scenarios

• Compare


Adds up to serious informatics challenge

• Managing the refinement process

• Integrating various data sources & simulations

• Curating the database

• Providing HPC services

• Providing analysis support