Epidemiology for mathematicians: “Looking at wildflowers from horseback”. David Ozonoff, MD, MPH, Boston University School of Public Health. DIMACS Working Group on Order Theory in Epidemiology, March 7, 2005


Epidemiology for mathematicians
“Looking at wildflowers from horseback”

David Ozonoff, MD, MPH
Boston University School of Public Health

DIMACS Working Group on Order Theory in Epidemiology

March 7, 2005


Tutorial overview and goals

• The landscape of epidemiology
  – What is epidemiology?
  – Who is an epidemiologist?
  – Who employs them?
  – Kinds of epidemiology
• How epidemiologists think
  – What kinds of things do they work with?
  – What kinds of things are they interested in?


Tutorial overview and goals, cont’d

• Some language and concepts of epidemiology
  – Language of occurrence measures
  – Study designs
  – Causal inference


I. Landscape, perspective, language

What is epidemiology?

Who is an epidemiologist?

Who employs epidemiologists?

Flavors of epidemiology: Descriptive, analytic

Epi and mathematics: models and patterns

Some examples of epidemiological thinking


Some definitions of epidemiology
• Study of health and illness in populations (Kleinbaum, Kupper and Morgenstern)
• Study of the distribution and determinants of disease frequency in human populations (MacMahon and Pugh; Susser)
• Study of the occurrence of illness (Rothman I)
• Theoretical epidemiology: discipline of how to study the occurrence of phenomena of interest in the health field (Miettinen) [NB: not illness centered]


Some more (cynical) definitions
• Rothman II: “Unfortunately, there seem to be more definitions of epidemiology than there are epidemiologists. Some have defined it in terms of its methods. While the methods of epidemiology may be distinctive, it is more typical to define a branch of science in terms of its subject matter rather than its tools…. If the subject of epidemiologic inquiry is taken to be the occurrence of disease and other health outcomes, it is reasonable to infer that the ultimate goal of most epidemiologic research is the elaboration of causes that can explain patterns of disease occurrence.”
• Schneiderman: Epidemiology is the practice of criticizing other epidemiologists


Consensus notions

• Deals with populations, not individuals

• Deals with (frequency of) occurrences of health related events

• Has a (major but not exclusive) concern with causes (“determinants”) of disease patterns in populations


Remarks
• Public health perspective
• “Flavors”: Analytic versus descriptive epidemiology
• Causal inference: assumptions
  – Disease occurrence is not random.
  – Systematic investigation of different populations can identify causal and preventive factors
• Observational versus experimental sciences
• Chronic disease and infectious disease epidemiology
  – What is “theoretical epidemiology”?


Some examples
• Do environmental exposures increase risk of disease?
  – John Snow: cholera epidemic of 1854
  – Contaminated water and leukemia in Woburn, MA
• Are vitamin supplements beneficial?
  – Does Vitamin E lower risk of Alzheimer’s Disease?
  – Folic acid and risk of neural tube (birth) defects
• Do behavioral interventions reduce risk behaviors?
  – Community-based studies to change diets
  – Peer interventions to reduce HIV-risk behaviors


Who is an epidemiologist?
• Relatively new in medical science
  – Precursors: John Graunt (17th century), John Snow (19th century)
  – Rise as a profession: Wade Hampton Frost at JHU
  – 1950s and 1960s: CDC and consolidation as professional discipline, still mainly physicians
  – 1960s+: Infectious disease -> Chronic disease epi
• Professionalization
  – Doctoral degrees in epidemiology
  – Now most epidemiologists are not docs


Who employs epidemiologists?
• Public sector
  – State and federal health officials
    • Communicable and chronic disease programs
      – Infectious disease, outbreak investigations
      – Cancer registries, environmental studies, program areas in substance abuse, health services, etc.
  – Research at CDC, NIH, academia, etc.
• Private sector
  – Industry (chemical companies, drug companies)
  – Consultants
  – Academia, NGOs


“Flavors” of epidemiology

– Descriptive epidemiology
– Analytic epidemiology (finding “risk factors”, a.k.a. “causes”)


Descriptive epidemiology

• Describe patterns of disease by: person, place, time
  – Good for monitoring public’s health (e.g., surveillance, vital events)
  – Used for administrative purposes (e.g., planning)
  – Good for generating hypotheses


NB: Disease patterns and the Science of patterns



Description
• Two kinds
  – Tabulations or summaries only (no inference or estimation)
  – Inference
    • Prediction to other populations (“generalization”; surveys and polling)
    • “True” value in face of noise
• May also assume data produced by underlying population model and try to describe it
  – Parametric: particular functional form assumed
    • Parameter = value that indexes a family of functions, e.g., mean and std deviation of Normal distribution
  – Non-parametric: data-driven estimate of underlying density or distribution


A word about “models” and “patterns” (our usage)

• Models are high level, “global” descriptions of all or most of dataset
  – Descriptive or inferential component
  – Examples
    • Regression models, mixture models, Markov models
• Patterns are “local” features of data
  – Perhaps only a few people or a few variables
  – Also descriptive or inferential
    • Descriptive: look for people with “unusual” features
    • Inferential: Predict which people have “unusual” features
  – Examples: Association rules, mode or gap in density function, outliers, inflection point in regression, symptom clusters, geographic “hot spots”, predict disease from symptoms


Models and patterns, cont’d

• Epidemiologists use both but more interested in patterns, i.e., more interested in “structure” that is local than “structure” that is global
  – George Box: “All models are wrong but some models are useful” describes epi viewpoint
  – But epidemiologists tend to think of patterns as “real,” even if misleading


Warning: word “model” differs by context but is usually some kind of metaphor

• Metaphor: a figure of speech literally denoting one kind of thing but used to represent or reason about another kind of thing
  – Examples: fashion model, model citizen (represent an “ideal”); scale model; animal model; mathematical model; model of an axiomatic system; regression model


Question: What do we learn from the following examples?

Describing populations by person, place and time:

illustrating how epidemiologists think


Person (age, sex, race)

Death rates per 10^5 US population from coronary disease by age and sex, 1981

Age      White Men   White Women
25-34            9             4
35-44           60            16
45-54          265            71
55-64          708           243
65-74         1670           769
75-84         3752          2359
85+           8596          7215


Place
• Where are the rates of disease the highest and lowest?

[Map: Malignant Melanoma of Skin]


A Variation on Place: Migrant Studies

Mortality rates (per 100,000) due to stomach cancer

Japanese in Japan                     58.4
Japanese Immigrants to California     29.9
Sons of Japanese Immigrants           11.7
Native Californians (Caucasians)       8.0


Time
• Does frequency of disease differ now from in the past?


What is a Population? How an epidemiologist would put it

Group of people with a common characteristic, like age, race, sex, geographic location, occupation, etc.

Two types of populations, based on whether membership is permanent or transient:

• Fixed population or cohort: membership is permanent and defined by an event. Ex. Atomic bomb survivors, persons born in 1980

• Dynamic population: membership is transient and defined by being in or out of a “state.” Ex. Members of HMO Blue, residents of the City of Boston


First step, summary description

• Tabulate data by selected features of person, place, time

• What are characteristics of population members? (how many of each sex, race, etc.) And combinations of these features (How many white women? Employed? Etc.)


Constructing contingency table from “raw data”

• “raw data” consists of listing of each subject and his or her attributes:

        M   F   R   L   <65   65+
Case1   1   0   0   1    0     1
Case2   0   1   1   0    1     0
Case3   1   0   0   1    0     1
Case4   1   0   1   0    1     0
Case5   0   1   1   1    1     0
Case6   0   1   0   1    1     0
Case7   1   0   1   0    0     1


One-way tables

• One dimensional Contingency Table (CT) is just a frequency table, i.e., a table that gives number of subjects with each attribute

Males          4
Females        3
Right-handed   4
Left-handed    4
<65            4
65+            3


Two-way tables

• Most contingency tables are (at least) two-way, i.e., they cross-classify two attributes

Right-handed males     2
Left-handed males      2
Right-handed females   2
Left-handed females    2
Males < 65 y.o.        1
Females < 65 y.o.      3
Males 65 and older     3
Females 65 and older   0


Or in more familiar form…

Sex by handedness and age

      R   L   <65   65+
M     2   2    1     3
F     2   2    3     0

But this shows only some of the possible two-way tables; it does not cross-classify handedness against age, for example
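The tabulation steps on the last few slides can be sketched in Python as follows. This is a minimal illustration, not part of the original tutorial: the names `ATTRIBUTES`, `RAW`, `one_way` and `two_way` are hypothetical, and the 0/1 rows are copied from the seven-case listing above.

```python
from collections import Counter
from itertools import combinations

# Raw data from the slide: one row per subject, one 0/1 indicator per attribute.
ATTRIBUTES = ["M", "F", "R", "L", "<65", "65+"]
RAW = {
    "Case1": [1, 0, 0, 1, 0, 1],
    "Case2": [0, 1, 1, 0, 1, 0],
    "Case3": [1, 0, 0, 1, 0, 1],
    "Case4": [1, 0, 1, 0, 1, 0],
    "Case5": [0, 1, 1, 1, 1, 0],
    "Case6": [0, 1, 0, 1, 1, 0],
    "Case7": [1, 0, 1, 0, 0, 1],
}


def one_way(raw):
    """One-way table: number of subjects that have each attribute."""
    counts = Counter()
    for row in raw.values():
        for name, flag in zip(ATTRIBUTES, row):
            counts[name] += flag
    return dict(counts)


def two_way(raw):
    """Two-way cells: number of subjects that have both attributes of a pair."""
    cells = {}
    for a, b in combinations(range(len(ATTRIBUTES)), 2):
        cells[(ATTRIBUTES[a], ATTRIBUTES[b])] = sum(
            row[a] * row[b] for row in raw.values()
        )
    return cells


ONE_WAY = one_way(RAW)
TWO_WAY = two_way(RAW)

# Handedness by age, the cross-classification the slide notes is not shown.
for hand in ("R", "L"):
    print(hand, TWO_WAY[(hand, "<65")], TWO_WAY[(hand, "65+")])
```

Note that Case5 is coded as both R and L in the listing, which is why the handedness counts add up to 8 for 7 subjects.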


What is a Population? How a mathematician might put it

• A population is a triple, (G, M, I)

• Two sets, G and M; G is a set of “people” or “subjects”, M is a set of features the subjects might “have”

• A relation I, I ⊆ G × M
  – Interpretation: r = (g, m) ∈ I means that subject g ∈ G “has” attribute m ∈ M


Contingency tables (“cross-tabs”)
• Mainstay of data preparation, inspection and analysis
• Requires study design based operations
  – Sampling: set of n subjects in set G
  – Variable selection (classification scheme): set of m variables in set M
    • E.g., age, sex, disease status (as indicator variables)
  – Measurement: binary relation I ⊆ G × M
    • E.g., ordered pair (case 2, female=yes) is typical member of I
• We call the triple (G, M, I) a data structure for the contingency table (also called a formal context in FCA literature)
• Simple formulation allows use of rich mathematical theory
• Much more about this from Alex Pogel
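As a rough sketch of the triple (G, M, I), the seven-case data from the earlier slide can be written as two Python sets and a set of ordered pairs. The helper names `subjects_with` and `cell_count` are hypothetical and chosen only for illustration; the point is that a contingency-table cell is just the size of a set of subjects picked out by the relation I.

```python
# Subjects G, attributes M, and the relation I (a subset of G x M),
# transcribed from the seven-case raw data listing.
G = {f"Case{i}" for i in range(1, 8)}
M = {"M", "F", "R", "L", "<65", "65+"}
I = {
    ("Case1", "M"), ("Case1", "L"), ("Case1", "65+"),
    ("Case2", "F"), ("Case2", "R"), ("Case2", "<65"),
    ("Case3", "M"), ("Case3", "L"), ("Case3", "65+"),
    ("Case4", "M"), ("Case4", "R"), ("Case4", "<65"),
    ("Case5", "F"), ("Case5", "R"), ("Case5", "L"), ("Case5", "<65"),
    ("Case6", "F"), ("Case6", "L"), ("Case6", "<65"),
    ("Case7", "M"), ("Case7", "R"), ("Case7", "65+"),
}


def subjects_with(attrs):
    """Subjects g in G such that (g, m) is in I for every m in attrs."""
    return {g for g in G if all((g, m) in I for m in attrs)}


def cell_count(*attrs):
    """A contingency-table cell is the size of such a subject set."""
    return len(subjects_with(attrs))


print(sorted(subjects_with({"F", "<65"})))  # the females under 65
print(cell_count("M", "R"))                 # right-handed males
```

Storing I as a set of pairs keeps the code close to the slide's notation; a 0/1 matrix would carry the same information.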


Quantification: Disease frequency
• Goal will be to see if occurrence of disease differs in populations with different characteristics or experiences (note comparison is at heart of this)
• Quantify disease occurrence in a population at certain point or period of time
  – Population (counting, absolute scale)
    • How big?
    • Composition?
  – Occurrence (counting, absolute scale)
    • Existing cases? New cases?
  – Time
    • Calendar time? (NB: interval scale, preserved under positive linear transformation)
    • Duration of time (NB: ratio scale, preserved under similarity transformation)
• More about this in Fred Roberts’s tutorial


Ex. Hypothetical Frequency of AIDS in Two Cities

          # new cases   time period   population
City A         58         1985          25,000
City B         35         1985-86        7,000

Annual "rate" of AIDS
City A = 58/25,000/1 yr = 232/100,000/yr
City B = 35/7,000/2 yrs = 17.5/7,000/yr = 250/100,000/yr

Make it easy to compare rates (i.e., make them “commensurable”) by using same population unit (say, per 100,000 people) and time period (say, 1 year)

NB: Commensurability is a property of the underlying relational system used in measurement (treated in Roberts tutorial)
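The conversion to a common unit can be sketched as a one-line function. The function name `annual_rate_per_100k` is a hypothetical label, and the inputs are the hypothetical city figures from the slide.

```python
def annual_rate_per_100k(new_cases, population, years):
    """New cases per 100,000 people per year."""
    return new_cases * 100_000 / (population * years)


CITY_A = annual_rate_per_100k(58, 25_000, 1)   # cases observed over 1985
CITY_B = annual_rate_per_100k(35, 7_000, 2)    # cases observed over 1985-86

print(f"City A: {CITY_A:.0f} per 100,000 per year")
print(f"City B: {CITY_B:.0f} per 100,000 per year")
```

Once both cities are expressed per 100,000 people per year, City B's smaller case count turns out to correspond to the slightly higher rate.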


Three kinds of quantitative measures of frequency of occurrence

Used to relate number of cases of disease, size of population, and time

• Proportion: numerator is subset of denominator, often expressed as a percentage

• Ratio: division of one number by another, numbers don't have to be related

• Rate: time (sometimes space) is intrinsic part of denominator, term is often misused (e.g., “birthrate”)

Need to specify if measure represents events or people


(Point) Prevalence (P)

Quantifies number of existing cases of disease in a population at a point in time

• P = Number of existing cases of disease (at a given point in time) / total population
• Ex. City A has 7,000 people with arthritis on Jan 1st, 2002
• Population of City A = 70,000
• Prevalence of arthritis on Jan 1st = .10 or 10%

Prevalence is a proportion


Incidence: quantifies number of (a) new cases of disease that (b) develop in a population at risk (c) during a specified time period

Three key ideas:
• New disease events, or for diseases that can occur more than once, usually first occurrence of disease
• Population at risk (candidate population): can't have disease already, should have relevant organs
• Enough time must pass for a person to move from health to disease


Two Types of Incidence Measures

• Cumulative Incidence (“Attack Rate”) (abbreviated Cum Inc, CI)
• Incidence Rate (“Incidence Density”) (abbreviated I, IR, ID)


Incidence rate (I, IR) = (# new cases of disease) / (total person-time of observation)

Also called incidence density (ID)


Accrual of Person-Time

            Jan 1980          Jan 1981          Jan 1982
Subject 1   -----------------------x                        1.1 person-years (PY)
Subject 2   -------------------------x                      1.2 PY
Subject 3   --------------------------------------------    2.2 PY
Total                                                       4.5 PY

x = outcome of interest; incidence rate = 2/4.5 PY
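The person-time bookkeeping in the diagram can be sketched as below. The record layout and the name `incidence_rate` are assumptions made for illustration; the follow-up times and outcomes are the ones read off the diagram. Each subject contributes time only until the outcome occurs or observation ends.

```python
# (subject, person-years of follow-up, outcome observed?) as read off the diagram.
FOLLOW_UP = [
    ("Subject 1", 1.1, True),
    ("Subject 2", 1.2, True),
    ("Subject 3", 2.2, False),
]


def incidence_rate(records):
    """Return (new cases, total person-time, cases per unit of person-time)."""
    cases = sum(1 for _, _, had_event in records if had_event)
    person_time = sum(py for _, py, _ in records)
    return cases, person_time, cases / person_time


CASES, PERSON_YEARS, RATE = incidence_rate(FOLLOW_UP)
print(f"{CASES} cases / {PERSON_YEARS:.1f} PY = {RATE:.3f} per person-year")
```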


Some Ways to Accrue 100 PY

• 100 people followed 1 year each = 100 py
• 10 people followed 10 years each = 100 py
• 50 people followed 1 year plus 25 people followed 2 years = 100 py

Time unit for person-time = year, month or day
Person-time = person-year, person-month, person-day


Ex.: (Cohort) study of risk of breast cancer among women with hyperthyroidism

• Followed 1,762 women ---> 30,324 py
• Average of 17 years of follow-up per woman
• Ascertained 61 cases of breast cancer
• Incidence rate = 61/30,324 py = .00201/y = 201/100,000 py (.00201 x 100,000 p/100,000 p)


Dimensions

Prevalence = people / people              (no dimension)
Cumulative incidence = people / people    (no dimension)
Incidence rate = people / people-time     (dimension is time^-1)


Types of (instantaneous) rates

• Relative rate (person-time or incidence rate): (1/N) dN/dt
• Absolute rate (used in infectious disease epi and health services): dN/dt
• Also where units do not involve time, such as accidents per passenger mile or cases per unit area


Relationship between prevalence and incidence

P = IR x D

• Prevalence depends on incidence rate and duration of disease (duration lasts from onset of disease to its termination)
• If incidence is low but duration is long, prevalence is relatively high
• If incidence is high but duration is short, prevalence is relatively low
• This is an example of Little’s equation in queuing theory:
  time-avg number of units in the system = arrival rate x avg delay time/unit
• This equation is true if ...


Conditions for equation to be true:

• Steady state
• IR constant
• Distribution of durations constant
• Prevalence of disease is low (less than 10%)

In queuing theory terms: strictly stationary process in steady state conditions


Figuring duration from prevalence and incidence

Lung cancer incidence rate = 45.9/100,000 py

Prevalence of lung cancer = 23/100,000

D = P / IR = (23/100,000 p) / (45.9/100,000 py) = 0.5 years

Conclusion: Individuals with lung cancer survive, on average, about 6 months from diagnosis to death
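The relationship P = IR x D and its rearrangement can be sketched as two small functions. The names `prevalence_from` and `duration_from` are hypothetical, and the sketch assumes the steady-state, low-prevalence conditions listed on the previous slide; it is not meant for situations where those conditions fail.

```python
def prevalence_from(ir, duration):
    """Steady-state approximation P = IR x D (assumes prevalence is low)."""
    return ir * duration


def duration_from(prevalence, ir):
    """Rearranged form D = P / IR, in the time unit used by the rate."""
    return prevalence / ir


IR_LUNG = 45.9 / 100_000   # new cases per person-year
P_LUNG = 23 / 100_000      # existing cases as a proportion of the population
D_LUNG = duration_from(P_LUNG, IR_LUNG)

print(f"Average duration: {D_LUNG:.2f} years ({D_LUNG * 12:.1f} months)")
```

Because the rate is expressed per person-year, the resulting duration comes out in years; multiplying by 12 gives the roughly six months quoted in the conclusion.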


Uses of Prevalence and Incidence Measures

• Prevalence: administration, planning

• Incidence: etiologic research (problems with prevalence since it combines IR and D), planning


Common measures of disease frequency for public health

– Crude death (mortality) rate:
  total number of deaths from all causes / 1,000 people, for one year
  (also cause-specific, age-specific, race-specific death rate)

– Live-birth rate:
  total number of live births / 1,000 people (sometimes women of childbearing age), for one year

– Infant mortality rate:
  # deaths of infants under 1 year of age / 1,000 live births, for one year


Frequency measures used in infectious disease epidemiology

• Case fatality rate:
  # of deaths / # cases of disease, for a defined period of time
• Survival rate:
  # living cases / # cases of disease, for a defined period of time
• Attack rate:
  # cases of disease that develop during defined period / # in pop. at risk at start of period
  (usually used for infectious disease outbreaks)


Tutorial part 2: Exposure - Disease Relationship

Analytic epidemiology


Reprise: Epidemiology is a science within public health

• This means that it adopts a population perspective

• As a science, it is also quantitative

• As a science, it is also interested in explanation and prediction, not just describing


Questions asked by communities

• Exposure driven questions
  – “What will happen to me, my family, my community?”
• Outcome driven questions
  – “Why me, why my child, why us?”
• Mixed
  – “Are we sicker than our neighbors?”


The usual notion of causation: John Stuart Mill’s “Method of Difference”
• A causes B if, all else being held constant, a change in A is accompanied by a subsequent change in B.
  – This of course does not mean that nothing else can produce a change in B.
• The formal method to detect such an occurrence is the Experiment, whereby all things are held constant except A and B, A is varied, and B observed


Expt’l vs. Observational Science

• Epidemiology is an “observational” science

• We do not control the independent variable (or most other variables)

• What is the implication of this for the status of epidemiology as a science?

• What does it mean about epidemiology’s ability to “prove causation”?


Sources of information

• Case studies

• Experimental studies

• Observational studies

Once results are observed, it remains to explain or interpret the observation, whether the result is a difference or a lack of a difference in the compared entities.


Types of observational study designs

• Descriptive
  – Case study and case-series
  – No comparison: Person, place and time
  – Cross-sectional comparison (“Are we sicker than our neighbors?”)
  – “Ecological” (comparing communities/environments; not individual level)
  – Notice how descriptive and analytic shade into each other (as per examples we did earlier)
• Cohort (“What’s going to happen to me?”)
  – Analog of the laboratory experiment
• Case-control (“Why me?”)


Central idea: compare frequencies of occurrence in two groups

• Example: Summarize relationship between exposure and disease by comparing two measures of disease frequency

• Overall rate of disease in an exposed group says nothing about whether exposure is a risk factor for (“causes”) a disease

• This can be evaluated by comparing disease incidence in an exposed group to that in another group that is not exposed (a “comparison group”)

• Comparison or contrast is the essence of epidemiology


 Two Main Options for Comparing disease frequencies

1. Calculate ratio of two measures of disease frequency (a measure in exposed group and a measure in unexposed comparison group)

2. Calculate difference between two measures of disease frequency (a measure in exposed group and a measure in unexposed comparison group)
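The two options can be sketched with the stomach cancer mortality rates from the migrant-study slide. Treating Japanese in Japan as the "exposed" group and native Californians as the comparison group is an assumption made here purely for illustration, and the function names are hypothetical.

```python
def rate_ratio(exposed, unexposed):
    """Relative comparison: how many times higher the exposed rate is (unitless)."""
    return exposed / unexposed


def rate_difference(exposed, unexposed):
    """Absolute comparison: excess rate, in the same units as the inputs."""
    return exposed - unexposed


# Stomach cancer mortality per 100,000, from the migrant-study slide.
JAPAN = 58.4
CALIFORNIA = 8.0

RATIO = rate_ratio(JAPAN, CALIFORNIA)
DIFFERENCE = rate_difference(JAPAN, CALIFORNIA)

print(f"Ratio: {RATIO:.1f}")
print(f"Difference: {DIFFERENCE:.1f} per 100,000")
```

The ratio drops the units and expresses relative strength, while the difference keeps the per-100,000 unit and expresses the absolute excess.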


At the heart of an epidemiological study ...

• Lies a comparison
  – Between 2 rates, ratios, proportions
• Is the difference/lack of difference due to
  – Bias?
  – Chance?
  – Real effect?


Determinants of the comparison

• Compared measures differ or they don’t (ℝ is linearly ordered)

• Either way, the comparison may be affected by:
  – Chance (sample variation)
  – Bias
  – Real effect or lack of effect

• To interpret the comparison and evaluate the last factor, we need to account for effects of the first two


Role of statistics

• Evaluates role that chance might play in the absence of any other factor

• Also used for summary purposes or to express a model mathematically

• Not the main preoccupation of epidemiologists, however

• Bias is main preoccupation of epidemiologists


Evaluating the role of bias

• Epidemiology is observational discipline, so uncontrolled variables abound

• Most of training is in recognizing and accounting for sources of bias, often extremely subtle

• Less emphasis on role of chance, often handed over to biostatisticians

• Extent to which content area (“real effect”) taken into account varies with investigator and who collaborators are


I. Definition of Bias

Bias is a systematic error that results in an incorrect (invalid) estimate of the measure of association

A. Can create spurious association when there really is none (bias away from the null)

B. Can mask an association when there really is one (bias towards the null)

C. Bias is primarily introduced by the investigator or study participants


I. Definition of Bias (cont’d)

D. Bias does not mean that the investigator is “prejudiced” or “not objective”

E. Bias can arise in all study types: experimental, cohort, case-control

F. Bias occurs in the design and conduct of a study. It cannot be fixed in the analysis phase.

G. Two main types of bias are selection and information bias, but there are many other types of bias

H. We will consider only selection and information bias for purposes of illustration of epidemiologic practice


II. Selection Bias

A. Results from procedures used to select subjects into a study that lead to a result different from what would have been obtained from the entire population targeted for study

B. Most likely to occur in case-control or retrospective cohort studies because both exposure and outcome have already occurred at the time of study selection


II. Selection Bias in a Case-Control Study

A. Occurs when controls or cases are more (or less) likely to be included in study if they have been exposed -- that is, inclusion in study is not independent of exposure

B. Result: Relationship between exposure and disease observed among study participants is different from relationship between exposure and disease in individuals who would have been eligible but were not included -- OR from a study that suffers from selection bias will incorrectly represent the relationship between exposure and disease in the overall study population


Selection Bias: Case-Control Study

Question: Do PAP smears prevent cervical cancer? Cases diagnosed at a city hospital. Controls randomly sampled from households in the same city by canvassing the neighborhood on foot. Here is the true relationship:

                          Cervical Cancer Cases    Controls
Had PAP smear                      100                150
Did not have PAP smear             150                100
Total                              250                250

OR = (100)(100) / (150)(150) = 0.44. There was a 56% reduced risk of cervical cancer among women who had PAP smears as compared to women who did not. (40% of cases had PAP smears versus 60% of controls)
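The arithmetic on this slide can be checked with a few lines of code. This is a minimal sketch: the function name `odds_ratio` and the dictionary layout are assumptions for illustration, while the counts are those of the "true relationship" table above.

```python
def odds_ratio(exposed_cases, exposed_controls, unexposed_cases, unexposed_controls):
    """Cross-product odds ratio for a case-control 2x2 table."""
    return (exposed_cases * unexposed_controls) / (exposed_controls * unexposed_cases)

# Counts from the "true relationship" table
cases = {"pap": 100, "no_pap": 150}
controls = {"pap": 150, "no_pap": 100}

true_or = odds_ratio(cases["pap"], controls["pap"], cases["no_pap"], controls["no_pap"])
share_cases_exposed = cases["pap"] / (cases["pap"] + cases["no_pap"])
share_controls_exposed = controls["pap"] / (controls["pap"] + controls["no_pap"])

print(f"OR = {true_or:.2f}")
print(f"Had PAP smear: {share_cases_exposed:.0%} of cases, "
      f"{share_controls_exposed:.0%} of controls")
```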


Selection Bias : Case-Control Study (con’t)

Recall: Cases come from the hospital and controls come from the neighborhood around the hospital.

Now for the bias: Only controls who were at home at the time the researchers came around to recruit for the study were actually included in the study. Women at home were more likely not to work and were less likely to have regular checkups and PAP smears. Therefore, being included in the study as a control is not independent of the exposure. The resulting data are as follows:


Selection Bias (con’t)

                          Cervical Cancer Cases    Controls
Had PAP smear                      100                100
Did not have PAP smear             150                150
Total                              250                250

OR = (100)(150) / (150)(100) = 1.0. There is no association between PAP smears and the risk of cervical cancer. Here, 40% of cases and 40% of controls had PAP smears.


Selection Bias : Case-Control Study (con’t)

Ramifications of using women who were at home during the day as controls:

These women were not representative of the whole study population that produced the cases. They did not accurately represent the distribution of exposure in the study population that produced the cases, and so they gave a biased estimate of the association.
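One way to see how the at-home controls distort the estimate is to model recruitment explicitly. The sketch below assumes each potential control is recruited with a probability that depends on whether she had a PAP smear. The recruitment probabilities (0.20 and 0.45) are hypothetical values picked so that the result matches the biased table; the slides do not give them, and the function names are illustrative.

```python
def recruited_controls(n, p_exposed, s_exposed, s_unexposed):
    """Expected exposed/unexposed counts among n recruited controls.

    p_exposed   : exposure proportion in the source population
    s_exposed   : probability an exposed woman is at home and recruited
    s_unexposed : same probability for an unexposed woman
    """
    w_exposed = p_exposed * s_exposed
    w_unexposed = (1 - p_exposed) * s_unexposed
    frac_exposed = w_exposed / (w_exposed + w_unexposed)
    return n * frac_exposed, n * (1 - frac_exposed)

def odds_ratio(a, b, c, d):
    # a, c = exposed/unexposed cases; b, d = exposed/unexposed controls
    return (a * d) / (b * c)

cases_exposed, cases_unexposed = 100, 150  # same cases in both tables

# Recruitment independent of exposure: controls mirror the source population
fair_b, fair_d = recruited_controls(250, 0.60, 0.30, 0.30)
fair_or = odds_ratio(cases_exposed, fair_b, cases_unexposed, fair_d)

# Hypothetical differential recruitment: women with PAP smears are home less often
biased_b, biased_d = recruited_controls(250, 0.60, 0.20, 0.45)
biased_or = odds_ratio(cases_exposed, biased_b, cases_unexposed, biased_d)

print(f"fair controls:   {fair_b:.0f} exposed, {fair_d:.0f} unexposed, OR = {fair_or:.2f}")
print(f"biased controls: {biased_b:.0f} exposed, {biased_d:.0f} unexposed, OR = {biased_or:.2f}")
```

Under this model the observed OR equals the true OR multiplied by `s_unexposed / s_exposed` (here 0.44 × 2.25 ≈ 1.0), so both the direction and the size of the distortion depend only on the ratio of the two recruitment probabilities.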


When interpreting study results, ask yourself these questions …

• Given conditions of the study, could bias have occurred?

• Is bias actually present?

• Are consequences of the bias large enough to distort the measure of association in an important way?

• Which direction is the distortion? Is it towards the null or away from the null?


Imputation of Causality

What are the roles of …

• Bias: The critique checklist

• Chance: “Statistical significance”

• Real effect
– The Hill “viewpoints”
  • Not necessary “criteria” (not even criteria)
  • Not a checklist
– The way it’s really done...


“Marks” of causality

• Strength of association

• Biologically plausible

• Biological gradient (“dose-response”)

• Appropriate temporal relationship

• Specificity

• “Consistency”


The “Fundamental Question” (according to Hill)

• "Clearly none of these nine viewpoints can bring indisputable evidence for or against a cause-and-effect hypothesis and equally none can be required as a sine qua non. What they can do, with greater or less strength, is to help us to answer the fundamental question--is there any other way of explaining the set of facts before us, is there any other answer equally, or more, likely than cause and effect?”


How it’s really done…

• Assemble the evidence from the literature. What are the pieces of the jigsaw?
– How do you decide?

• Where do they fit?
– How do you decide?


Interpretation

• Evaluate the evidence (a study) for internal validity

• Evaluate the evidence for external validity

• Bottom line
– What roles are played by bias, chance, real effect?


Assemble the jigsaw pieces into a picture

• The “picture” is your version of “causality”

• Your picture may disagree with other scientists
– Disagreement among scientists is the rule, not the exception


Mathematics in epidemiology

• Traditional
– Evaluate role of chance (statistical hypothesis testing; estimation)
– Descriptive (compact summary or generative model)
– Infectious disease epidemiology dynamics


Comparing chronic and infectious disease epidemiology

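[Figure: compartment diagram with compartments S, I, R and P; arrows carry rate labels, two of them with subscripts 1 and 2]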


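[Figure: compartment diagram with compartments S, I, R and P; arrows carry rate labels, two of them with subscripts 1 and 2]

Legend (the rate symbols did not survive extraction):
– birth rate or migration in-rate
– incidence rate or infectivity rate
– mortality and recovery rates, with subscript 1 = case fatality rate and subscript 2 = background mortality rate

Prevalence “rate” = P/(S+P)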
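To make the prevalence formula concrete, here is a minimal sketch of a two-compartment chronic-disease model with susceptibles S and prevalent cases P. Everything about its structure is an assumption made for illustration: the names `alpha` (inflow), `beta` (incidence rate), `gamma1` (case fatality rate) and `gamma2` (background mortality rate) are stand-ins for the slide's rate symbols, recovery is ignored, and the parameter values are invented.

```python
def simulate_chronic(alpha, beta, gamma1, gamma2, years=1000.0, dt=0.1):
    """Euler integration of an assumed two-compartment model.

    dS/dt = alpha - (beta + gamma2) * S
    dP/dt = beta * S - (gamma1 + gamma2) * P
    """
    s, p = 0.0, 0.0
    for _ in range(int(years / dt)):
        ds = alpha - (beta + gamma2) * s
        dp = beta * s - (gamma1 + gamma2) * p
        s += ds * dt
        p += dp * dt
    return s, p

# Hypothetical rates per person-year; alpha is people entering per year
alpha, beta, gamma1, gamma2 = 100.0, 0.01, 0.05, 0.02

s, p = simulate_chronic(alpha, beta, gamma1, gamma2)
prevalence = p / (s + p)

# Steady state of the assumed model: P/S = beta / (gamma1 + gamma2)
expected = beta / (beta + gamma1 + gamma2)

print(f"simulated prevalence = {prevalence:.3f}, steady-state formula = {expected:.3f}")
```

At steady state in this assumed model the prevalence odds P/S equal the incidence rate times the mean time spent as a case, 1/(gamma1 + gamma2), which is the familiar "prevalence reflects incidence and duration" relationship.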


Comparing chronic and infectious epi (cont’d)

• Chronic
– Usually concentrate on the incidence rate because interested in etiology
– Have to account for fact that incidence is a function of calendar time and age, exposure (?metric), sex, race, SES, occupation, co-morbid conditions, latency
– But not usually population size or density, number of other cancer cases, etc.

• Infectious
– Interest in incidence usually limited to its value as a parameter; we know the etiology
– Interested in dynamics over time and space, existence of thresholds or periods, effect of parameters and initial conditions like initial population size, infectivity, mode of contact

Difference is one of emphasis and interest, not concepts
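The infectious-disease interest in dynamics and thresholds can be illustrated with the textbook SIR equations. This is a sketch under stated assumptions, not a model taken from the slides: `beta` (transmission rate) and `gamma` (recovery rate) are the conventional SIR symbols, the population size and rates are invented, and simple Euler steps stand in for a proper ODE solver.

```python
def simulate_sir(beta, gamma, n=10_000, i0=10, days=365, dt=0.1):
    """Euler integration of the classic SIR equations.

    dS/dt = -beta*S*I/N,  dI/dt = beta*S*I/N - gamma*I,  dR/dt = gamma*I
    Returns the peak number infected and the final number recovered.
    """
    s, i, r = float(n - i0), float(i0), 0.0
    peak = i
    for _ in range(int(days / dt)):
        new_infections = beta * s * i / n * dt
        new_recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        peak = max(peak, i)
    return peak, r

gamma = 0.2  # hypothetical recovery rate per day (mean infectious period of 5 days)
results = {}
for r0 in (0.8, 2.5):  # basic reproduction number = beta / gamma
    results[r0] = simulate_sir(beta=r0 * gamma, gamma=gamma)
    peak, final_r = results[r0]
    print(f"R0 = {r0}: peak infected = {peak:.0f}, ever infected = {final_r:.0f}")
```

With the reproduction number below 1 the initial cases die out without an epidemic; above 1 the same equations produce a large outbreak. That is the kind of threshold behavior, driven by parameters and initial conditions, that the infectious column refers to.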


Some new uses for mathematics in epidemiology

• Formalization and theoretical tools
• Pattern and rule detection (“data mining”)
• Descriptive modeling
• Prediction from data
– Classification
– Taxonomy
• Data organization and retrieval from large databases
• Patient confidentiality/coding/cryptography
• Multi-scale inference
• Network construction/applications, etc.