How Many Cases Are Too Many?
Detection of Disease Outbreaks and Clusters
Lance A. Waller, Department of Biostatistics, Rollins School of Public Health, Emory University
How many are too many?
What sets off the public health “alarm”?
For anthrax and smallpox…
ONE (no statistics needed)
(rare enough and dangerous enough)
What about…
…a more subtle pattern?
5 flu cases in a single day.
20 acute asthma attacks in one neighborhood.
We want to detect anomalies: patterns of cases differing from the “usual” pattern.
What are we looking for?
Among the “Epidemiologic clues that may signal a covert bioterrorism attack” in CDC’s The Public Health Response to Biological and Chemical Terrorism: Interim Planning Guidance for State Public Health Officials (July 2001):
“Disease with unusual geographic or seasonal distribution”
http://www.bt.cdc.gov/Documents/Planning/PlanningGuidance.PDF
John Snow, M.D. 1854 map
Snow, J. (1949) Snow on Cholera. Oxford University Press: London.
What we want...
Statistical assessments of the “unusualness” of observed patterns in space and time.
Suggests statistical tests of H0: No clusters in the data.
Yes/no answer?
Easy to ask, harder to answer.
Distributed “by chance”…
Need to “operationalize” H0
What sort of data arise under H0?
What counts as evidence against H0?
Simple random (uniform) pattern?
Scan statistics
Count events in a moving window.
In time:
Consideration: Cluster “anywhen”, or outbreak now?
[Figure: moving time window; counts shown: 4, 3, 2, 20]
Wallenstein, S. (1980) A test for detection of clustering over time. American Journal of Epidemiology 111, 367-372.
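A minimal sketch of the moving-window idea in time, assuming a fixed window width and a list of event days. The function name, the half-open window convention, and the example onset days are illustrative assumptions, not taken from Wallenstein (1980).

```python
from bisect import bisect_left

def temporal_scan(event_times, window):
    """Largest number of events in any window [t, t + window).

    Only windows that start at an event need checking: sliding a window's
    start forward to its first event never drops an event from it.
    """
    times = sorted(event_times)
    best_count, best_start = 0, None
    for i, start in enumerate(times):
        # Index of the first event at or after the window's end.
        end_idx = bisect_left(times, start + window)
        count = end_idx - i
        if count > best_count:
            best_count, best_start = count, start
    return best_count, best_start

# Hypothetical onset days for 10 cases.
days = [1, 4, 9, 15, 16, 16, 17, 18, 25, 30]
max_count, start = temporal_scan(days, window=4)
print(max_count, start)
```

Scanning every possible start looks for a cluster “anywhen”; to ask about an outbreak now, one would instead count only the most recent window.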
Scan statistic in space
[Figure: spatial scan windows; counts shown: 2, 0, 3, 1]
Kulldorff, M. (1997) A spatial scan statistic. Communications in Statistics-Theory and Methods 26, 1481-1496.
Complication
Heterogeneous population density
Refine the question…
“Are there clusters in the data?” to
“Are there clusters in the data after adjusting for heterogeneities in the population at risk?”
Complication:
Where is “where”?
Which location for each case?
Example: Maxcy (1926) study of endemic typhus fever in Montgomery, AL, 1922-1925.
Lilienfeld, D.E. and Stolley, P.D. (1994) Foundations of Epidemiology, Third Edition. Oxford University Press: New York, pp. 136-140.
Maxcy, K.F. (1926) “An epidemiological study of endemic typhus (Brill’s disease) in the Southeastern United States with special reference to its mode of transmission.” Public Health Reports 41, 2967-2995.
[Maps: cases by residence location; cases by place of employment]
Refine the question…
“Are there clusters in the data after adjusting for heterogeneities in the population at risk?” to…
“Are there clusters of case residences in the data after adjusting for heterogeneities in the population at risk?”
We’re building a conceptual model…
What we have…
Disease surveillance (ongoing collection, monitoring, and analysis of disease data).
• Vital statistics (birth/death certificates)
• Notifiable diseases (required reporting)
• Registries (link multiple sources of information on each case, e.g., SEER)
• Health surveys (NHANES, NHIS, BRFSS)
Teutsch, S.M. and Churchill, R.E. (1994) Principles and Practice of Public Health Surveillance. Oxford University Press: New York.
Data components
Types of location (time or space) data:
Point data (case locations)
• Latitude/longitude
• Street address
• Confidentiality?
Regional count data
• Counts for enumeration districts
Background data
Types of background data:
Point locations for non-cases (“controls”)
• Is the spatial distribution of cases close to that of controls?
Regional census counts
• Is the observed number of cases close to the number expected under H0?
Point data
Case locations geocoded from registry or billing records.
Controls:
• All non-cases (e.g., birth records)
• Sample (perhaps matched) of non-cases
• Different outcome (e.g., nonrespiratory ED visits, compared to respiratory ED visits)
Regional Count Data
Aggregate to regional counts, often to preserve confidentiality.
[Figure: map of regional case counts]
Complication:
Counts lose some resolution...
[Figure: map of regional case counts]
Modifiable Areal Unit Problem
Different aggregations can lead to different results.
[Figure: regional case counts under alternative aggregations of the same cases]
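To make the point concrete, here is a small sketch that aggregates the same case locations with two different sets of region boundaries. The coordinates, the break points, and the helper name `aggregate` are all hypothetical.

```python
from collections import Counter

def aggregate(points, x_breaks, y_breaks):
    """Count points per grid cell; cells are defined by interior break points."""
    def cell(value, breaks):
        # Index of the interval that contains value.
        return sum(value >= b for b in breaks)
    return Counter((cell(x, x_breaks), cell(y, y_breaks)) for x, y in points)

# Hypothetical case locations on a 0-10 by 0-10 study area.
cases = [(4.5, 4.5), (4.8, 5.2), (5.2, 4.7), (5.4, 5.3), (1.0, 9.0), (9.0, 1.0)]

grid_a = aggregate(cases, x_breaks=[5], y_breaks=[5])  # boundaries through the middle
grid_b = aggregate(cases, x_breaks=[3], y_breaks=[3])  # boundaries shifted

max_a = max(grid_a.values())
max_b = max(grid_b.values())
print(dict(grid_a), dict(grid_b))
```

With boundaries through the middle, the four central cases are split across all four cells; with shifted boundaries they fall in one cell, so the same data look clustered or not depending on the aggregation.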
MAUP example: John Snow
Monmonier, M (1991) How to Lie with Maps. University of Chicago Press: Chicago. p. 142.
Operationalizing H0:
Case/control point data: Random labeling hypothesis
Say n0 control and n1 case locations.
H0: Case/control labels are randomly assigned to the n = n0 + n1 total locations.
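A minimal sketch of simulating under random labeling: locations stay fixed and only the case/control labels are permuted, so every simulated data set keeps n1 cases and n0 controls. The label vector and the function name are hypothetical.

```python
import random

def random_relabel(labels, rng):
    """Return one random permutation of the case/control labels.

    Locations are not touched; only which location carries which label changes.
    """
    shuffled = list(labels)
    rng.shuffle(shuffled)
    return shuffled

# Hypothetical data: 1 = case, 0 = control, one label per fixed location.
labels = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]  # n1 = 3 cases, n0 = 7 controls
rng = random.Random(0)
simulated = [random_relabel(labels, rng) for _ in range(5)]
for labeling in simulated:
    print(labeling)
```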
Operationalizing H0:
Regional count data: Constant risk hypothesis
Each individual is subject to the same risk.
Expected count = (risk) × (population size).
Variable total: Poisson counts.
Fixed total: Multinomial counts.
[Figure: three example sets of regional case counts]
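The constant risk hypothesis can be sketched as follows, assuming the overall risk is estimated as total cases divided by total population. The populations are hypothetical, and the Poisson sampler uses Knuth's simple multiplication method, which is only suitable for small expected counts.

```python
import math
import random

def expected_counts(populations, total_cases):
    """Constant risk: expected count = (overall risk) * (population size)."""
    risk = total_cases / sum(populations)
    return [risk * p for p in populations]

def simulate_multinomial(populations, total_cases, rng):
    """Fixed total: place each case in a region with probability
    proportional to that region's population."""
    counts = [0] * len(populations)
    regions = range(len(populations))
    for region in rng.choices(regions, weights=populations, k=total_cases):
        counts[region] += 1
    return counts

def simulate_poisson(expected, rng):
    """Variable total: independent Poisson counts, one per region."""
    counts = []
    for mean in expected:
        threshold, k, product = math.exp(-mean), 0, rng.random()
        while product > threshold:
            k += 1
            product *= rng.random()
        counts.append(k)
    return counts

# Hypothetical populations at risk in four regions, 10 observed cases.
populations = [1000, 3000, 2000, 4000]
expected = expected_counts(populations, total_cases=10)
rng = random.Random(42)
fixed_total = simulate_multinomial(populations, 10, rng)
variable_total = simulate_poisson(expected, rng)
print(expected, fixed_total, variable_total)
```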
H0 drives type of test
Random labeling: often compare the observed spatial intensities (expected number of events per unit area) of cases and controls.
Constant risk: compare observed counts to those expected (goodness of fit).
What deviation from H0?
Tests of clustering: check tendency for cases to occur in clusters.
Tests to detect clusters: find most likely cluster(s).
General tests: detect clusters or clustering anywhere.
Focused tests: detect clusters or clustering around suspected foci.
Besag, J. and Newell, J. (1991) “The detection of clusters in rare diseases”. Journal of the Royal Statistical Society, Series A 154, 143-155.
How weird? (Monte Carlo test)
Under random labeling or constant risk, we can simulate data sets under H0.
For any test statistic, calculate its value in the observed data, Tobs.
Simulate many data sets under H0 and calculate the test statistic for each (T1, T2, …, Tnumsim).
p-value = proportion of test statistics from simulated data sets exceeding Tobs (fraction of T’s > Tobs).
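The recipe above can be sketched generically. The statistic used here (the largest count among four equal-population regions) and all names are hypothetical stand-ins; any statistic that can be computed on simulated data would fit the same pattern.

```python
import random

def monte_carlo_p_value(t_obs, simulate_statistic, num_sim, rng):
    """Fraction of simulated test statistics strictly exceeding the observed one."""
    sims = [simulate_statistic(rng) for _ in range(num_sim)]
    return sum(t > t_obs for t in sims) / num_sim

def simulate_max_count(rng, num_regions=4, total_cases=10):
    """Hypothetical statistic under H0: drop cases uniformly into equal-population
    regions and return the largest regional count."""
    counts = [0] * num_regions
    for _ in range(total_cases):
        counts[rng.randrange(num_regions)] += 1
    return max(counts)

# Hypothetical observed data with 7 of 10 cases in one region, so Tobs = 7.
p = monte_carlo_p_value(7, simulate_max_count, num_sim=999, rng=random.Random(1))
print(p)
```

A common variant counts the observed value among the simulations and reports (r + 1) / (numsim + 1), where r is the number of simulated statistics at least as large as Tobs; that avoids reporting a p-value of exactly zero.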
Example: Regional Counts
Comparing observed to expected.
Pearson’s chi-square statistic:
$X^2 = \sum_i (O_i - E_i)^2 / E_i$
But $X^2$ ignores the location of the lack of fit.
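A short sketch of Pearson's statistic, sum over regions of (O_i - E_i)^2 / E_i, with hypothetical counts. The second call moves the same observed counts to different regions to show that the statistic does not change when the lack of fit moves around the map.

```python
def pearson_chi_square(observed, expected):
    """Sum over regions of (O_i - E_i)^2 / E_i."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical counts for four regions with equal expected counts.
expected = [2.5, 2.5, 2.5, 2.5]
x2 = pearson_chi_square([4, 1, 2, 3], expected)

# Same observed counts assigned to different regions: X^2 is unchanged.
x2_rearranged = pearson_chi_square([1, 3, 4, 2], expected)
print(x2, x2_rearranged)
```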
Spatial goodness-of-fit
Instead of squaring $(O_i - E_i)$, what if we link $(O_i - E_i)$ and $(O_k - E_k)$ by the proximity of regions $i$ and $k$?
Say, $\sum_i \sum_k w_{ik} (O_i - E_i)(O_k - E_k)$, where $w_{ik}$ gives the link between $i$ and $k$?
This (essentially) gives Tango’s index of clustering.
Tango, T. (1990) An index for cancer clustering. Environmental Health Perspectives 87, 157-162.
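A sketch of the weighted double sum described on this slide. It is a simplified form rather than Tango's published index: the weights w_ik = exp(-d_ik / tau), the scale tau, the region centroids, and the counts are all assumptions made for illustration.

```python
import math

def spatial_fit_index(observed, expected, centroids, tau):
    """Sum over i, k of w_ik (O_i - E_i)(O_k - E_k), with w_ik = exp(-d_ik / tau)."""
    diffs = [o - e for o, e in zip(observed, expected)]
    total = 0.0
    for i, (xi, yi) in enumerate(centroids):
        for k, (xk, yk) in enumerate(centroids):
            weight = math.exp(-math.hypot(xi - xk, yi - yk) / tau)
            total += weight * diffs[i] * diffs[k]
    return total

# Hypothetical regions along a line, equal expected counts.
centroids = [(0, 0), (1, 0), (2, 0), (3, 0)]
expected = [2.5, 2.5, 2.5, 2.5]

# Same set of residuals, arranged so excesses are adjacent vs. interleaved.
clustered = spatial_fit_index([5, 4, 1, 0], expected, centroids, tau=1.0)
scattered = spatial_fit_index([5, 0, 4, 1], expected, centroids, tau=1.0)
print(clustered, scattered)
```

Both arrangements have the same sum of squared residuals, but the index is larger when neighboring regions share an excess, which is the location information a plain goodness-of-fit sum ignores.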
Finding spatial clusters?
Spatial scan statistic (SaTScan)
• Scan on windows with distance radii.
Turnbull et al.’s Cluster Evaluation Permutation Procedure (CEPP)
• Scan on windows of constant population size (e.g., 10,000 people at risk).
Besag and Newell’s approach
• Scan on windows of constant number of cases (e.g., 10 cases).
All seek the collection least consistent with H0.
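A rough sketch of the distance-radius scan on regional counts, using the Poisson log likelihood ratio from Kulldorff (1997) to score each window. This is an illustration, not the SaTScan software: windows are grown around each region centroid by adding the next nearest region, capped at an assumed 50% of the total population, and the centroids, counts, and populations are hypothetical.

```python
import math

def poisson_llr(cases_in, expected_in, total_cases):
    """Poisson log likelihood ratio for one window; 0 unless there is an excess inside."""
    if cases_in <= expected_in:
        return 0.0
    cases_out = total_cases - cases_in
    expected_out = total_cases - expected_in
    llr = cases_in * math.log(cases_in / expected_in)
    if cases_out > 0:
        llr += cases_out * math.log(cases_out / expected_out)
    return llr

def spatial_scan(centroids, cases, populations, max_pop_fraction=0.5):
    """Return (best log likelihood ratio, sorted region indices of that window)."""
    total_cases, total_pop = sum(cases), sum(populations)
    best = (0.0, None)
    for center in range(len(centroids)):
        # Regions ordered by distance from this center; the window grows outward.
        order = sorted(range(len(centroids)),
                       key=lambda j: math.dist(centroids[center], centroids[j]))
        cases_in = pop_in = 0
        members = []
        for j in order:
            cases_in += cases[j]
            pop_in += populations[j]
            if pop_in > max_pop_fraction * total_pop:
                break
            members.append(j)
            expected_in = total_cases * pop_in / total_pop
            llr = poisson_llr(cases_in, expected_in, total_cases)
            if llr > best[0]:
                best = (llr, sorted(members))
    return best

# Hypothetical data: five regions, equal populations, excess in regions 3 and 4.
centroids = [(0, 0), (1, 0), (0, 1), (5, 5), (6, 5)]
cases = [1, 1, 1, 8, 7]
populations = [1000, 1000, 1000, 1000, 1000]
best_llr, best_regions = spatial_scan(centroids, cases, populations)
print(best_llr, best_regions)
```

The window with the largest ratio is the most likely cluster; its significance would be judged by recomputing the maximum over data sets simulated under H0.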
New York Leukemia
592 cases 1978-1982, 8 counties, 790 census regions, ~ 1 million people.
Example: case/control point data
Kelsall and Diggle (1995)
Compare ratio of case intensity to control intensity.
Random labeling simulations.
Identify locations where case intensity significantly exceeds control intensity (pointwise test of significance).
Approach to detect clusters.
Kelsall, J.E. and Diggle, P.J. (1995) Non-parametric estimation of spatial variation in relative risk. Statistics in Medicine 14, 2335-2342.
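A minimal sketch of the intensity-ratio idea, assuming a Gaussian kernel with a common bandwidth for cases and controls and no edge correction. The coordinates are hypothetical, and the function names are illustrative rather than taken from Kelsall and Diggle (1995).

```python
import math

def kernel_intensity(x, y, points, bandwidth):
    """Gaussian kernel estimate of intensity (events per unit area) at (x, y)."""
    norm = 1.0 / (2.0 * math.pi * bandwidth ** 2)
    return sum(
        norm * math.exp(-((x - px) ** 2 + (y - py) ** 2) / (2.0 * bandwidth ** 2))
        for px, py in points
    )

def log_relative_risk(x, y, cases, controls, bandwidth):
    """Log ratio of case to control intensity, each scaled by its sample size."""
    f = kernel_intensity(x, y, cases, bandwidth) / len(cases)
    g = kernel_intensity(x, y, controls, bandwidth) / len(controls)
    return math.log(f / g)

# Hypothetical locations: cases concentrated near (5000, 5000), controls spread out.
cases = [(5000, 5000), (5200, 5100), (4900, 4800)]
controls = [(5000, 5000), (7000, 7000), (9000, 6000),
            (6000, 9000), (8000, 8000), (4500, 8500)]

near_cases = log_relative_risk(5000, 5000, cases, controls, bandwidth=500)
far_from_cases = log_relative_risk(8000, 8000, cases, controls, bandwidth=500)
print(near_cases, far_from_cases)
```

Pointwise significance would come from repeating the calculation over a grid after randomly relabeling cases and controls many times and comparing the observed ratio with the simulated ones at each grid point.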
Archeology data
Alt and Vach (1991)
143 grave sites, 30 with affected teeth (“cases”)
Question: families buried together?
Tested question: Do grave sites with affected teeth cluster?
Alt, K.W., and Vach, W. (1991) “The reconstruction of ‘genetic kinship’ in prehistoric burial complexes – problems and statistics”. In Classification, Data Analysis, and Knowledge Organization: Models and Methods with Applications. H.-H. Bock and P. Ihm (eds.) Springer: Berlin.
Map
Case and control intensities
[Figure: kernel intensity surfaces over axes Y and Z. Left panel: affected grave sites (f), bw = 500. Right panel: non-affected grave sites (g), bw = 500.]
Relative risk surface
[Figure: relative risk surface (r) over axes Y and Z, bw = 500, with affected (*) and non-affected (o) grave sites marked.]
Spatial scan statistic
Most likely cluster (p-value = 0.067)
Important ideas
What question do I want to answer?
What data can I get?
What statistical method will I use?
What question can I answer with the data I have and the method?
Does this match my first question?
Additional important ideas
Results depend on data structure (MAUP).
Every test involves a specific definition of “cluster”… ask yourself:
What data result from H0 (the model of “no clustering”)?
• Can you simulate data from H0?
What constitutes evidence against H0 (the model of “clustering”)?
• Do your data appear consistent with H0?
Reading list
Besag, J. and Newell, J. (1991). The detection of clusters in rare diseases. Journal of the Royal Statistical Society, Series A 154, 143-155.
Kelsall, J.E. and Diggle, P.J. (1995). Non-parametric estimation of spatial variation in relative risk. Statistics in Medicine 14, 2335-2342.
Kulldorff, M. (1997). A spatial scan statistic. Communications in Statistics-Theory and Methods 26, 1481-1496.
Neutra, R.R. (1990). Counterpoint from a cluster buster. American Journal of Epidemiology 132, 1-8.
Rothman, K. (1990). A sobering start to the cluster busters’ conference. American Journal of Epidemiology 132 (Supplement), S6-S13.
Snow, J. (1946). Snow on Cholera. Oxford University Press.
Tango, T. (1990). An index for cancer clustering. Environmental Health Perspectives 87, 157-162.
Turnbull, B.W., Iwano, E.J., Burnett, W.S., Howe, H.L., and Clark, L.C. (1990). Monitoring for clusters of disease: application to leukemia incidence in upstate New York. American Journal of Epidemiology 132 (Supplement), S136-S143.
Wallenstein, S. (1980). A test for detection of clustering over time. American Journal of Epidemiology 111, 367-372.
Waller, L.A. and Jacquez, G.M. (1995). Disease models implicit in statistical tests of disease clustering. Epidemiology 6, 584-590.
Waller, L.A. (2002). Methods for detecting disease clustering in time or space. In Statistical Methods and Principles in Public Health Surveillance. R. Brookmeyer and D. Stroup (eds). Oxford University Press.