
Study Design: Case-Control Studies

Paul L. Reiter, PhD

Assistant Professor

Division of Cancer Prevention and Control

[email protected]


Learning Objectives

Describe the strengths and weaknesses of case-control studies
Describe the importance of the selection of controls
Compare and contrast the different types of matching for case-control studies
Describe the different types of biases commonly associated with case-control studies

Module Outline

Case-Control Studies
  What are they?
  Case Selection
  Control Selection
  Matching
  Potential Biases
  Strengths and Weaknesses

Evidence Pyramid

http://www.flickr.com/photos/69409968@N07/6329536471/

2 x 2 Table

                Disease
                Yes   No
Exposure  Yes   a     b
          No    c     d

Case-Control Study

Nature / subjects / others assign exposure status
  No formal procedure of random assignment

Subjects selected based on disease status (cases and controls)

Past exposure status is determined for cases and controls
  Compare exposure in cases versus controls

Case-Control Study

                Disease
                Yes   No
Exposure  Yes   a     b
          No    c     d

Case-control studies start with disease status and then determine exposure
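The comparison of exposure in cases versus controls is usually summarized as the exposure odds ratio, (a × d) / (b × c), using the cell labels of the 2 x 2 table. The following is a minimal Python sketch of that calculation; the function name and the counts are hypothetical and serve only as an illustration.

```python
def odds_ratio(a, b, c, d):
    """Exposure odds ratio from a 2 x 2 table.

    a = exposed cases,   b = exposed controls,
    c = unexposed cases, d = unexposed controls.
    """
    if b == 0 or c == 0:
        raise ValueError("odds ratio is undefined when b or c is zero")
    return (a * d) / (b * c)


# Hypothetical counts, not data from any real study.
example_or = odds_ratio(a=40, b=20, c=60, d=80)
print(round(example_or, 2))
```

An odds ratio above 1 means the odds of exposure are higher among cases than among controls.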

Source Cohort

“It is helpful to think of any case-control study as being nested – that is, conducted – within a cohort of exposed and unexposed…Case-control studies can be thought of as nested within a source population…”

Rothman and Greenland

Source Cohort

The “source cohort” behind a case-control study is the population (cohort) that gave rise to the cases

Example

Cases: lung cancer cases in Franklin County, OH

Source Cohort: residents of Franklin County, OH

Selecting Participants

                Disease
                Yes   No
Exposure  Yes   ???   ???
          No    ???   ???
Total           n1    n0

Goal: Select n1 cases and n0 controls from source cohort without knowledge of their exposure status

Selecting Cases

We want cases to be all (or a “representative” sample) of the diseased members of the source cohort
  “Representative” = group provides a valid estimate of exposure

[Figure: source cohort over time, with labels Cases, Controls, and Study Cases]

Where to Find Cases

Clinic-based cases
  Hospitals
  Outpatient clinics
  Physician practices

Population-based cases
  Disease registries
  Death certificates

Considerations

Clinic-based cases
  Possibly harder to define “source cohort” due to referral patterns
  If examining severely ill patients, may get “survivors” instead of a representative sample

Population-based cases
  May be difficult to find a registry for some diseases (e.g., HPV infection)

Incident vs. Prevalent Cases

Incident
  Newly diagnosed cases
  Have to wait for new cases to be diagnosed and have system for identifying them

Prevalent
  People who may have had disease for some time
  Any risk factors identified may be related more to survival than disease development

Verdict: Incident cases are generally preferred

Selecting Controls

We want controls to be a “representative” sample of the non-diseased members of the source cohort
  “Representative” = group provides a valid estimate of exposure

Selecting controls is extremely important since they serve as the “comparison group” to cases for your study
  Want to select the most valid comparison group possible

Selecting Controls

Select individuals who might have become cases in your study if they had developed disease, that is, from the source cohort that gave rise to the cases

Try to conceptualize the “source cohort” (although it may not be easily identifiable) and select controls from that cohort

[Figure: source cohort over time, with labels Cases, Controls, Study Cases, and Study Controls]

Selecting Controls is Difficult!

Control selection is “one of the most difficult problems in epidemiology” (Gordis)

It is also one of the most important components of a case-control study!

Where to Find Controls

Medical care system
  Hospitals
  Outpatient clinics
  Physician practices

Community
  General population
  Family members or friends
  Neighbors (geographic controls)
  Other (schools, worksites, etc.)

Deceased individuals

Medical Care System Controls

Advantages
  Theoretically belong to same source cohort as cases (if using clinic-based cases)
  Easily identifiable
  High cooperation rate
  “Mental set” is similar to cases (potentially less recall bias)

Disadvantages
  Might have medical condition caused by exposure
  Only a subset of source population

Medical Care System Controls

General rules
  Choose control conditions likely to have same referral pattern as disease of interest
  Exclude conditions known to be associated (positively or negatively) with the exposure
  Preferable to select controls from multiple disease categories

Community Controls

Advantages
  Theoretically belong to same source cohort as cases (if using population-based cases)
  Random sampling of population-based controls is usually the most desirable option, if possible

Disadvantages
  Source cohort not always easily identifiable to allow for random sampling of controls
  Low cooperation rate
  Possible “overmatching” if using family or friends
  “Mental set” different from cases (recall bias)

Community Controls - Methodology

Random digit dialing (RDD)
  Cell phone only households
  Negative influence of telemarketing

Door-to-door
  More likely option for developing countries

Ask cases to provide list of family members, friends, or neighbors

Public databases (DMV, voter registration lists, etc.)

How Many Controls Do I Need?

[Figure: precision of estimates (y-axis) plotted against number of controls per case (x-axis, 0 to 6)]

Gains in statistical efficiency diminish sharply once the control-to-case ratio exceeds 4 or 5
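One common way to see this pattern is a textbook approximation: with k controls per case, efficiency relative to an unlimited number of controls is roughly k / (k + 1) when the true association is near the null. The sketch below tabulates that approximation. The formula is an assumption brought in for illustration, not something stated on the slide, and the function name is hypothetical.

```python
def relative_efficiency(k):
    """Approximate efficiency of k controls per case, relative to
    unlimited controls (assumes the association is near the null)."""
    if k <= 0:
        raise ValueError("k must be positive")
    return k / (k + 1)


previous = 0.0
for k in range(1, 7):
    current = relative_efficiency(k)
    # 'gain' is the extra efficiency bought by adding one more control per case.
    print(f"{k} controls per case: efficiency {current:.2f}, gain {current - previous:.2f}")
    previous = current
```

Under this approximation, moving from 4 to 5 controls per case adds only about 3 percentage points of efficiency, while each extra control still adds recruitment cost.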

Matching - Definition

“Matching refers to the selection of a reference series – unexposed subjects in a cohort study or controls in a case-control study – that is identical, or nearly so, to the index series [exposed or cases] with respect to the distribution of one or more potentially confounding factors.”

Rothman and Greenland

Reason for Matching

“A major concern in conducting a case-control study is that cases and controls may differ in characteristics or exposures other than the one that has been targeted for the study.”

Gordis

Matching

Matching basically makes sure that controls and cases are similar on certain characteristics

Two types of matching
  Individual matching
  Group matching

Individual Matching

Also called “matched pairs”
Matching occurs subject by subject
For each case, select one or more controls with characteristics that match that case
Example
  Case is a 50-year-old African American man, and we want to match on age, race, and gender
  Control would be selected who is 50 years old, African American, and male
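The subject-by-subject procedure can be sketched in a few lines of Python. This is a minimal illustration that assumes each person is represented as a dictionary of characteristics; `match_pairs`, the field names, and the sample records are all hypothetical.

```python
def match_pairs(cases, control_pool, keys):
    """Pair each case with one unused control that agrees on every key.

    Returns the matched (case, control) pairs and the cases left unmatched.
    """
    available = list(control_pool)  # copy so the caller's pool is untouched
    pairs, unmatched = [], []
    for case in cases:
        for i, control in enumerate(available):
            if all(case[k] == control[k] for k in keys):
                pairs.append((case, available.pop(i)))
                break
        else:
            # No control shares all matching characteristics with this case.
            unmatched.append(case)
    return pairs, unmatched


# Hypothetical records, not real participants.
cases = [
    {"id": "case-1", "age": 50, "race": "African American", "gender": "male"},
    {"id": "case-2", "age": 62, "race": "White", "gender": "female"},
]
pool = [
    {"id": "ctrl-1", "age": 50, "race": "White", "gender": "male"},
    {"id": "ctrl-2", "age": 50, "race": "African American", "gender": "male"},
]
pairs, unmatched = match_pairs(cases, pool, keys=("age", "race", "gender"))
print([(case["id"], control["id"]) for case, control in pairs])
print([case["id"] for case in unmatched])
```

In practice, continuous variables such as age are often matched within a range (for example, within a few years) rather than exactly; the exact match here only keeps the sketch short.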

Group Matching

Also called “frequency matching”
For a stratum of cases, select a stratum of controls. The proportion of a characteristic should be the same between cases and controls
Often requires that all cases are selected first
Example
  There are 400 cases (300 female, 100 male)
  We would select 300 female and 100 male controls if we wanted to match on gender
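The gender example can be sketched as stratified random sampling. This is a minimal Python illustration that assumes a 1:1 control-to-case ratio and a pool of candidate controls with known gender; `frequency_match`, the field names, and the generated data are hypothetical.

```python
import random
from collections import Counter


def frequency_match(cases, control_pool, key, seed=0):
    """Sample controls so that each stratum of `key` contains as many
    controls as cases (1:1 group matching)."""
    rng = random.Random(seed)
    needed = Counter(person[key] for person in cases)
    selected = []
    for stratum, n_cases in needed.items():
        candidates = [p for p in control_pool if p[key] == stratum]
        if len(candidates) < n_cases:
            raise ValueError(f"not enough controls in stratum {stratum!r}")
        # Sample without replacement within the stratum.
        selected.extend(rng.sample(candidates, n_cases))
    return selected


# Hypothetical data mirroring the example: 300 female and 100 male cases.
cases = [{"gender": "F"}] * 300 + [{"gender": "M"}] * 100
pool = [{"id": i, "gender": "F" if i % 2 else "M"} for i in range(2000)]
controls = frequency_match(cases, pool, key="gender")
print(Counter(c["gender"] for c in controls))
```

Because the stratum sizes are taken from the full set of cases, the sketch also shows why group matching often requires that all cases are selected first.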

Matching – Positives and Negatives

Positives
  Leads to more efficient stratified analyses

Negatives
  Cannot examine the relation of a matched variable to the disease
  May increase complexity of study logistics (hard to find a control for some cases)
  In individual matching, cannot use cases for which no matched control was found
  Risk of “overmatching”, which can result in loss of precision

Matching – The Verdict

Be careful when opting for a matched design
Match (if at all) on only a few variables suspected to be strong confounders

Potential Biases

Selection bias
Information bias
  Recall bias
  Interviewer bias
Confounding bias

Selection Bias

Control-selection bias
  If exposure in selected controls differs from exposure in source cohort

Case-selection bias
  If exposure in selected cases differs from exposure in source cohort
  If some cases did not arise from the source cohort

Want well-defined inclusion/exclusion criteria and sound selection methods

Information Bias

Recall bias
Interviewer bias

Recall Bias

Remember that we identify cases and controls based on disease status and then need to determine past exposure

May not be a problem for some exposures (e.g., presence of a gene) but other exposure data rely on interviews or surveys

Recall is a major problem in case-control studies

Recall Bias

Some participants may not be able to remember or accurately report information related to exposure
  Or they simply may not have the requested information

This means that some cases/controls will likely be misclassified as exposed/unexposed
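To see how misclassification can distort results, the sketch below starts from a hypothetical "true" 2 x 2 table and applies imperfect exposure recall, described by a sensitivity and specificity that may differ between cases and controls. All counts and recall probabilities are made up for illustration, and the function names are hypothetical.

```python
def observed_exposed(exposed, unexposed, sensitivity, specificity):
    """Expected number classified as exposed after imperfect recall."""
    return exposed * sensitivity + unexposed * (1 - specificity)


def observed_odds_ratio(true_table, case_recall, control_recall):
    """Expected odds ratio after misclassification of exposure.

    true_table = (a, b, c, d): exposed cases, exposed controls,
    unexposed cases, unexposed controls.
    Each recall argument is a (sensitivity, specificity) pair.
    """
    a, b, c, d = true_table
    a_obs = observed_exposed(a, c, *case_recall)
    b_obs = observed_exposed(b, d, *control_recall)
    c_obs = (a + c) - a_obs
    d_obs = (b + d) - b_obs
    return (a_obs * d_obs) / (b_obs * c_obs)


true_table = (40, 20, 60, 80)  # hypothetical counts; true odds ratio is 8/3

# Same imperfect recall in cases and controls.
print(observed_odds_ratio(true_table, (0.8, 0.9), (0.8, 0.9)))
# Controls report true exposure less completely than cases.
print(observed_odds_ratio(true_table, (0.9, 0.9), (0.6, 0.9)))
```

In this made-up example, equal recall errors in both groups pull the odds ratio toward 1, while poorer recall among controls pushes it above the true value.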

Interviewer Bias

If using interviewers to collect data, they may not be blinded to the case-control status of participants
  Interviewers may phrase items differently or probe further on exposure questions when interviewing cases

Minimizing Information Bias

Exposure status (and other variables) should be measured in a comparable fashion in cases and controls

Exposure status should not be known when a case or control is selected for study

Sources of exposure information
  Self-reports
  Surrogate / proxy (e.g., spouse)
  Records (hospital, worksite)
  Physical measurements
  Stored samples

Confounding Bias

Confounding: A situation in which the effect or association between an exposure and outcome is distorted by the presence of another variable

If confounding is present in the source cohort, then it should also be present in the study sample
  Since we select cases and controls to be “representative” of the source cohort

Several ways to control for confounding
  Stratification, statistical modeling, etc.
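As one illustration of stratification, the sketch below compares a crude odds ratio with a Mantel-Haenszel summary odds ratio computed across strata of a confounder. The slide names stratification only in general terms, so the choice of the Mantel-Haenszel estimator is an assumption made here; the function names and all counts are hypothetical.

```python
def crude_odds_ratio(strata):
    """Odds ratio after collapsing all strata into one 2 x 2 table."""
    a, b, c, d = (sum(cells) for cells in zip(*strata))
    return (a * d) / (b * c)


def mantel_haenszel_odds_ratio(strata):
    """Mantel-Haenszel summary odds ratio across strata.

    Each stratum is (a, b, c, d): exposed cases, exposed controls,
    unexposed cases, unexposed controls.
    """
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator


# Hypothetical strata of a confounder; the odds ratio is 1.0 within each.
strata = [
    (80, 40, 20, 10),  # stratum 1: mostly exposed, mostly cases
    (10, 20, 40, 80),  # stratum 2: mostly unexposed, mostly controls
]
print(crude_odds_ratio(strata))
print(mantel_haenszel_odds_ratio(strata))
```

Here the crude odds ratio suggests an association that disappears once the confounder is held fixed, which is the kind of distortion the definition of confounding describes.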

Strengths of Case-Control Studies

Easier to study rare diseases
Can examine a variety of exposures for a given disease
Compared to cohort studies, usually:
  Quicker
  Easier
  Cheaper
Under certain conditions, results can estimate a causal parameter

Weaknesses of Case-Control Studies

Difficulty in selecting appropriate controls
Information bias (particularly recall bias)
Not ideal for rare exposures (cohort studies are probably better for this)
Can be difficult to establish temporality between exposure and disease

Case-Control vs. Cohort

                        Case-Control                                    Cohort (Prospective)
Study Group             Diseased persons (cases)                        Exposed persons
Comparison Group        Nondiseased (controls)                          Unexposed persons
Multiple Associations   Several exposures with disease                  Several diseases with exposure
Cost of Study           Relatively inexpensive                          Expensive
Time Required           Relatively short                                Generally long
Best When               Disease is rare                                 Exposure is rare
Problems                Selection of controls, information bias, etc.   Loss to follow-up, misclassify outcomes, etc.

Summary

“A case-control study is a useful first step when searching for a cause of an adverse health outcome.”

Gordis

Summary

Case-control studies have several important strengths, but be aware of their limitations
  Biases discussed earlier

Control selection is crucial to a case-control study
  Source of controls
  Matching

Case Control Quiz

Thank you for completing this module

If you have any questions, write to me.

[email protected]

References

Gordis, L. (2009). Epidemiology, 4th Edition. Philadelphia, PA: Elsevier/Saunders.

Rothman, K.J., Greenland, S. & Lash, T.L. (2008). Modern Epidemiology, 3rd Edition. Philadelphia, PA: Lippincott, Williams & Wilkins.

Rothman, K.J. & Greenland, S. (1998). Modern Epidemiology, 2nd Edition. Philadelphia, PA: Lippincott, Williams & Wilkins.
