screening and prognostic tests thomas b. newman, md, mph october 20, 2005

Screening and Prognostic Tests

Thomas B. Newman, MD, MPH

October 20, 2005

Overview

Questions from last time; administrative stuff

Screening tests

– Introduction

– Biases in observational studies

– Biases in randomized trials

– Conclusion: ecologic view

Prognostic tests

– Differences from diagnostic tests and risk factors

– Quantifying prediction: calibration and discrimination

– Value of information

– Common problems

TN Biases

“When your only tool is a hammer, you tend to see every problem as a nail.”

Biggest gains in longevity have been PUBLIC HEALTH interventions, not interventions aimed at individuals

Biggest threats are still public health threats

Interventions aimed at individuals are overemphasized

Cultural characteristics

"We live in a wasteful, technology driven, individualistic and death-denying culture."

--George Annas, New Engl J Med, 1995

What is screening?

Common definition: testing to detect asymptomatic disease

Better definition*: application of a test to detect a potential disease or condition in people with no known signs or symptoms of that disease or condition.

– Disease vs condition

– Asymptomatic vs no known signs or symptoms

*Common screening tests. David M. Eddy, editor. Philadelphia, PA: American College of Physicians, 1991

Types of screening

Unrecognized symptomatic disease screening: what IS making the person sick.

Disease screening: what WILL make the person sick.

Risk factor screening: what MIGHT make the person sick.

Examples and overlap

Continuum related to both certainty and timing of symptoms

May vary with age

Unrecognized symptomatic disease: vision and hearing problems in young children; iron deficiency anemia, depression

Presymptomatic disease: neonatal hypothyroidism, syphilis, HIV

Risk factor: hypercholesterolemia, hypertension

Somewhere between: prostate cancer, breast carcinoma in situ, more severe hypertension

Disease vs. Risk factor screening

Columns: (Unrecognized) symptomatic disease | Pre-symptomatic disease | Risk factor

# Labeled: Few | Few | High*

# Treated: Few | Few | High*

Duration of treatment: Varies | Varies, may be short | [not given]

NNT: Low | Low | High

Ease of showing benefit: Easy | Often difficult | Usually very difficult

Potential for harm: False positives | False positives, pseudodisease | Harmful treatment, delayed effects

*May be political as well as scientific decision

Possible harms from screening

To all tested

To those with negative results

To those with positive results

To those not tested

See course book

Forces behind excessive screening -1

Companies selling machines to do the test

Companies selling the test itself

Companies selling products to treat the condition

Clinicians who treat the condition

Politicians who are (or want to appear) sympathetic

Forces behind excessive screening -2

Disease research and advocacy groups

Academics who study the condition

Clinicians doing or interpreting the test

Managed care organizations

The public

E-mail excerpt -1

> PLEASE, PLEASE, PLEASE TELL ALL YOUR FEMALE FRIENDS AND RELATIVES TO INSIST ON A CA-125 BLOOD TEST EVERY YEAR AS PART OF THEIR ANNUAL PHYSICAL EXAMS. Be forewarned that their doctors might try to talk them out of it, saying, "IT ISN'T NECESSARY."
>
> …Insist on the CA-125 BLOOD TEST; DO NOT take "NO" for an answer!

Biases in Observational Studies of Screening Tests

Volunteer bias

Lead time bias

Length time bias

Stage migration bias

Pseudodisease

Volunteer Bias

People who volunteer for studies differ from those who do not

Examples

– HIP Mammography study: women who volunteered for mammography had lower heart disease death rates

– Coronary drug project: Men who took their medicine had about half the mortality of men who didn't, whether they were on drug or placebo

Lead time bias

Source: EDITORIAL: Finding and Redefining Disease. Effective Clinical Practice, March/April 1999. Available at: ACP- Online http://www.acponline.org/journals/ecp/marapr99/primer.htm accessed 8/30/02

Length Bias (Different natural history bias)

Screening picks up prevalent disease

Prevalence = incidence x duration

Slowly growing tumors have greater duration in presymptomatic phase, therefore greater prevalence

Therefore, cases picked up by screening will be disproportionately those that are slow growing
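A minimal sketch of the prevalence = incidence x duration argument. The numbers are made up for illustration (two tumor types with equal incidence and presymptomatic durations of 4 years and 1 year); none of them come from the lecture.

```python
# Hypothetical inputs: equal incidence, different presymptomatic durations.
incidence_per_1000_per_year = {"slow": 1.0, "fast": 1.0}
presymptomatic_years = {"slow": 4.0, "fast": 1.0}

# Prevalence of screen-detectable disease = incidence x duration.
prevalence_per_1000 = {
    kind: incidence_per_1000_per_year[kind] * presymptomatic_years[kind]
    for kind in incidence_per_1000_per_year
}

# Share of slow-growing tumors among newly arising cases...
share_slow_incident = incidence_per_1000_per_year["slow"] / sum(
    incidence_per_1000_per_year.values()
)
# ...and among cases that a one-time screen would pick up.
share_slow_screened = prevalence_per_1000["slow"] / sum(prevalence_per_1000.values())

print(f"Slow-growing share of incident cases:        {share_slow_incident:.0%}")
print(f"Slow-growing share of screen-detected cases: {share_slow_screened:.0%}")
```

With these assumed inputs, slow-growing tumors are half of all new cases but make up most of the screen-detected ones, so screen-detected cases look better even if screening changes nothing.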

Length bias

Source: EDITORIAL: Finding and Redefining Disease. Effective Clinical Practice, March/April 1999. Available at: ACP- Online http://www.acponline.org/journals/ecp/marapr99/primer.htm

Length Bias

[Diagram labels: Early detection; Higher cure rate; Slower growing tumor with better prognosis]

Stage migration bias

[Figure: cases distributed across Stage 0 to Stage 4, old tests versus new tests]

Stage migration bias

Also called the "Will Rogers Phenomenon"

– "When the Okies left Oklahoma and moved to California, they raised the average intelligence level in both states." -- Will Rogers

Documented with colon cancer at Yale

Other examples abound: the more you look for disease, the higher the prevalence and the better the prognosis

More generally, be careful with stratified analyses

Best reference on this topic: Black WC and Welch HG. Advances in diagnostic imaging and overestimation of disease prevalence and the benefits of therapy. NEJM 1993;328:1237-43.
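A small worked illustration of stage migration, assuming eight hypothetical patients with made-up 5-year survival probabilities. A more sensitive test reclassifies the early-stage patient with the worst prognosis as late stage; nobody's prognosis changes.

```python
from statistics import mean

# Hypothetical 5-year survival probabilities, staged with the OLD tests.
old_early = [0.9, 0.9, 0.8, 0.7]
old_late = [0.4, 0.3, 0.2, 0.1]

# NEW tests find occult spread in the 0.7 patient, who migrates to late stage.
new_early = [0.9, 0.9, 0.8]
new_late = [0.7, 0.4, 0.3, 0.2, 0.1]

old_early_mean, old_late_mean = mean(old_early), mean(old_late)
new_early_mean, new_late_mean = mean(new_early), mean(new_late)
overall_old = mean(old_early + old_late)
overall_new = mean(new_early + new_late)

print(f"Early stage: {old_early_mean:.3f} -> {new_early_mean:.3f}")
print(f"Late stage:  {old_late_mean:.3f} -> {new_late_mean:.3f}")
print(f"Overall:     {overall_old:.3f} -> {overall_new:.3f}")
```

Stage-specific survival improves in both strata while overall survival is unchanged, which is why stratified comparisons across eras or tests need care.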

A more general example of Stage Migration Bias

VLBW (< 1500 g), LBW (1500-2499 g) and NBW (>= 2500 g) fetuses exposed to Factor X all have decreased mortality compared with those not exposed

Is Factor X good?

Maybe not! Factor X could be cigarette smoking!

– Smoking moves babies to lower birthweight strata

– Compared with other causes of LBW (i.e., prematurity) it is not as bad

Pseudodisease

A condition that looks just like the disease, but never would have bothered the patient

In an individual treated patient it is impossible to distinguish pseudodisease from successfully treated asymptomatic disease

Existence of pseudodisease can only be detected in groups of treated patients

Treating pseudodisease can only cause harm because (by definition) it is unnecessary

Example: Mayo Lung Project (MLP)

RCT of lung cancer screening

Enrollment 1971-76

9,211 male smokers

Two study arms

– Intervention arm: chest x-ray and sputum cytology every 4 months for 6 years (75% compliance)

– Usual care (control) arm: at trial entry only, a recommendation to receive same tests annually

MLP Extended Follow-up Results*

Intervention group: more cancers diagnosed at early, resectable stage

Better survival of those with lung cancer

*Marcus et al., JNCI 2000;92:1308-16

MLP Extended Follow-up Results*

Slight increase in lung-cancer mortality (P=0.09 by

*Marcus et al., JNCI 2000;92:1308-16

What happened?

Lead-time bias?

Length bias?

Volunteer bias?

Overdiagnosis (pseudodisease)

Black, WC. Overdiagnosis: An unrecognized cause of confusion and harm in cancer screening. JNCI 2000;92:1280-1

NHLBI National Lung Screening Trial

46,000 participants randomized in 2 years

Equal randomization

Three annual screens

Spiral CT versus chest x-ray!

Each year, 182,000 women are diagnosed with breast cancer and 43,300 die. One woman in eight either has or will develop breast cancer in her lifetime...

If detected early, the five-year survival rate exceeds 95%. Mammograms are among the best early detection methods, yet 13 million women in the U.S. are 40 years old or older and have never had a mammogram.

39,800 Clicks per mammogram (Sept, ’04)

RCTs of screening tests, Example: Mammography

New York Times: Expert Panel Cites Doubts On Mammogram's Worth

Washington Post: Mammography Review Shatters the Status Quo

Doubts About Its Value Alarm Many

Is screening for breast cancer with mammography justifiable?*

Meta-analysis of randomized trials

Methodologic issues raised

– Quality of randomization

– Post-randomization exclusions

– Choice of outcome variable: Breast cancer mortality vs. total mortality

*Gotzsche P, Olsen O. Lancet 2000;355:1293

Poor Quality Randomization. Example: Edinburgh trial

Randomization by practice (N=87?), not by woman

7 practices changed allocation status

Highest SES

– 26% of women in control group

– 53% of women in screening group

26% reduction in cardiovascular mortality in mammography group

Example 2: Biased post-randomization exclusion for previous breast cancer

New York Trial

– N=853 in screened group

– N=336 in control group

– Breast cancer mortality difference at 18 years: 44 deaths

Edinburgh trial

– N=338 in screened group

– N=177 in control group

Explanation for differences in NY Trial*

In screened group, women with previous breast cancer excluded at entry

In control group, women with previous breast cancer excluded only if they developed breast cancer

Thus, women with previous breast cancer who did NOT develop breast cancer were included in the denominator of the control group but not the mammography group

Therefore, bias against mammography

* Fletcher SW, Gilmore JG. Mammography screening for breast cancer. NEJM 2003;348:1672-80. (Appendix 2)

Problems with breast cancer mortality as an endpoint

Assignment of cause of death is subjective

– Unblinded in NY, Two-county trials

Treatment may have effects on other causes of death

Meta-analysis of radiotherapy for early breast cancer*

Meta-analysis of 40 RCTs

Central review of individual-level data; N = 20,000

Breast cancer mortality reduced (20-yr ARR 4.8%; P = .0001)

Mortality from other causes increased (20-yr ARR -4.3%; P = 0.003)

*Early Breast Cancer Trialists Collaborative Group. Lancet 2000;355:1757
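The two absolute risk reductions above can be put on a common scale as a number needed to treat (NNT = 1/ARR) and a number needed to harm. This is plain arithmetic on the figures quoted on the slide; the "net" line is a naive difference added here for illustration, not a result reported by the source.

```python
# 20-year absolute risk reductions quoted on the slide (as proportions).
arr_breast_cancer_death = 0.048   # benefit: fewer breast cancer deaths
arr_other_cause_death = -0.043    # harm: more deaths from other causes


def number_needed(arr: float) -> float:
    """Patients treated per one event prevented (or caused, if arr < 0)."""
    return 1.0 / abs(arr)


nnt_breast = number_needed(arr_breast_cancer_death)
nnh_other = number_needed(arr_other_cause_death)

# Naive illustration only: simple difference of the two effects.
net_arr = arr_breast_cancer_death + arr_other_cause_death

print(f"NNT to prevent one breast cancer death:  {nnt_breast:.0f}")
print(f"NNH for one extra death from other causes: {nnh_other:.0f}")
print(f"Naive net absolute difference: {net_arr:.1%}")
```

The near cancellation is the point of the slide: a cause-specific endpoint can look favorable while total mortality barely moves.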

[Figures: Mastectomies; Radiotherapy; 13-year total mortality, > 50 y.o.; Breast cancer deaths, 7 yr]

NCI Position*

“NCI recommends mammography for women starting in their 40s” -- Dr. Peter Greenwald, NCI director of cancer prevention

"Everyone agrees that mammography detects breast cancer when it's smaller, when it's earlier. There's no debate about that," Greenwald added. "And everybody agrees mammography detects more cancers.

"The debate is whether that has an impact on mortality later on. It is the only real method that we have, other than clinical exam, that's useful as screening for early detection in healthy women."

*Washington Post, January 24, 2002

Cancer mortality vs Total mortality in RCTs

TN Conclusions on Screening

Screening decisions are heavily influenced by politics, economics, emotion and wishful thinking

Most screening occurs without informed consent

High quality RCTs are needed

Low power to discern effect on total mortality

Big debate about efficacy. But even if proponents are right, much screening is not cost-effective and its disadvantages are consistently downplayed

Cost per QALY

Mammography, age 40-50: $105,000*

Mammography, age 50-69: $21,400*

Smoking cessation counseling: $2,000**

HIV prevention in Africa: $1-20***

*Salzman P et al. Ann Int Med 1997;127:955-65 (Based on optimistic assumptions about mammography.)

**Cromwell J et al. JAMA 1997;278:1759-66

***Marseille E et al. Lancet 2002; 359: 1851-56
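One way to read the cost-per-QALY figures is to ask how many QALYs a fixed budget buys under each option. The sketch below assumes a hypothetical budget of $1,000,000 and uses only the costs quoted on the slide; the HIV prevention entry is kept as a range.

```python
BUDGET = 1_000_000  # hypothetical budget in dollars, chosen for illustration

# (low, high) cost per QALY in dollars, as quoted on the slide.
cost_per_qaly = {
    "Mammography, age 40-50": (105_000, 105_000),
    "Mammography, age 50-69": (21_400, 21_400),
    "Smoking cessation counseling": (2_000, 2_000),
    "HIV prevention in Africa": (1, 20),
}

# QALYs bought = budget / cost per QALY; the high cost gives the low estimate.
qalys_bought = {
    name: (BUDGET / high, BUDGET / low)
    for name, (low, high) in cost_per_qaly.items()
}

for name, (fewest, most) in qalys_bought.items():
    if fewest == most:
        print(f"{name}: about {fewest:,.0f} QALYs")
    else:
        print(f"{name}: about {fewest:,.0f} to {most:,.0f} QALYs")
```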

Return to George Annas*

Need to begin to think differently about health. Two dysfunctional metaphors:

– Military metaphor: battle disease, no cost too high for victory, no room for uncertainty

– Market metaphor: medicine as a business; health care as a product; success measured economically

*Annas G. Reframing the debate on health care reform by replacing our metaphors. NEJM 1995;332:744-7

Ecology metaphor

Sustainability

Limited resources

Interconnectedness

More critical of technology

Move away from domination, buying, selling, exploiting

Focus on the big picture

– Populations rather than individuals

– Causes rather than symptoms

Assessment of Prognostic Tests

Difference from diagnostic tests and risk factors

Quantifying accuracy

Value of prognostic information

Common problems

Potential confusion: “cross-sectional” means 2 things

Cross-sectional sampling means sampling does not depend on either the predictor variable or the outcome variable. (E.g., as opposed to case-control sampling)

Cross-sectional time dimension means that predictor and outcome are measured at the same time -- opposite of longitudinal

Difference from Diagnostic Tests

Longitudinal rather than cross-sectional time dimension

Incidence rather than prevalence

Sensitivity, specificity, prior probability confusing

Time to an event may be important

Harder to quantify accuracy in individuals

– Exceptions: short time course, continuous outcomes

Difference from Risk Factors

Causality not important

Absolute risk very important

– Sampling scheme makes a much bigger difference because absolute risks are less generalizable than relative risks

– Can be informative even if no bad outcomes!
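Why a study with no bad outcomes can still be informative: with 0 events in n subjects, the exact one-sided 95% upper confidence bound on the risk is 1 - 0.05^(1/n), which is close to the familiar "rule of three" approximation 3/n. A minimal sketch, where the sample size of 100 is an arbitrary illustration:

```python
def upper_bound_zero_events(n: int, alpha: float = 0.05) -> float:
    """Exact one-sided upper bound on risk when 0 of n subjects had the event.

    Solves (1 - p)^n = alpha for p.
    """
    return 1.0 - alpha ** (1.0 / n)


n_subjects = 100  # hypothetical study size
exact_bound = upper_bound_zero_events(n_subjects)
rule_of_three = 3.0 / n_subjects

print(f"0/{n_subjects} events: exact 95% upper bound = {exact_bound:.4f}")
print(f"Rule-of-three approximation          = {rule_of_three:.4f}")
```

The bound depends only on the denominator, which is one reason sample size matters so much for absolute risk.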

Quantifying Prediction 1: Calibration

How accurate are the predicted probabilities?

– Assemble a group

– Compare actual and predicted probabilities

Calibration is important for decision making and giving information to patients

Like absolute risk in this way: less generalizable
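A minimal sketch of a calibration check: group subjects by predicted probability and compare the mean predicted risk with the observed event fraction in each group. The function name, the equal-width bins, and the data are all assumptions made for illustration.

```python
from collections import defaultdict


def calibration_table(predicted, outcomes, n_bins=5):
    """Return rows of (bin index, n, mean predicted risk, observed fraction)."""
    bins = defaultdict(list)
    for p, y in zip(predicted, outcomes):
        index = min(int(p * n_bins), n_bins - 1)  # equal-width bins on [0, 1]
        bins[index].append((p, y))
    rows = []
    for index in sorted(bins):
        probs = [p for p, _ in bins[index]]
        events = [y for _, y in bins[index]]
        rows.append((index, len(probs), sum(probs) / len(probs), sum(events) / len(events)))
    return rows


# Hypothetical data: 10 subjects at each of three predicted risks,
# with 1, 5 and 9 events respectively (a well calibrated prediction).
predicted = [0.1] * 10 + [0.5] * 10 + [0.9] * 10
outcomes = [1] + [0] * 9 + [1] * 5 + [0] * 5 + [1] * 9 + [0]

table = calibration_table(predicted, outcomes)
for index, n, mean_predicted, observed in table:
    print(f"bin {index}: n={n}, predicted={mean_predicted:.2f}, observed={observed:.2f}")
```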

Quantifying Prediction 2: Discrimination

How well can the test separate subjects in the group from the mean probability to values closer to zero or 1?

May be more generalizable

Often measured with C-statistic

Illustration

Perfect calibration, no discrimination:

– Predicted = actual 5-year mortality = 45% (for everyone)

Perfect discrimination, poor calibration:

– Every patient who dies has a predicted mortality of 51% and every patient who survives has a predicted mortality of 49%
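The two scenarios can be checked numerically. The sketch below computes the c-statistic as the fraction of (died, survived) pairs in which the patient who died had the higher predicted risk, counting ties as one half. The cohort of 100 patients with 45 deaths is an assumption chosen to match the 45% mortality in the illustration.

```python
def c_statistic(predicted, died):
    """Probability that a random decedent has a higher prediction than a random survivor."""
    cases = [p for p, d in zip(predicted, died) if d]
    noncases = [p for p, d in zip(predicted, died) if not d]
    concordant = 0.0
    for p_case in cases:
        for p_noncase in noncases:
            if p_case > p_noncase:
                concordant += 1.0
            elif p_case == p_noncase:
                concordant += 0.5  # ties count as half
    return concordant / (len(cases) * len(noncases))


died = [1] * 45 + [0] * 55  # hypothetical cohort: 45 of 100 die by 5 years

# Scenario A: everyone is told 45%. Calibrated, but cannot separate anyone.
same_for_all = [0.45] * 100
c_a = c_statistic(same_for_all, died)

# Scenario B: 51% for those who die, 49% for survivors. Separates perfectly,
# but observed mortality among those given 51% is 100%, not 51%.
split = [0.51 if d else 0.49 for d in died]
c_b = c_statistic(split, died)
observed_in_51_group = sum(d for p, d in zip(split, died) if p == 0.51) / split.count(0.51)

print(f"Scenario A c-statistic: {c_a:.2f}")
print(f"Scenario B c-statistic: {c_b:.2f}")
print(f"Scenario B observed mortality at predicted 51%: {observed_in_51_group:.0%}")
```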

Quantifying Discrimination:

Dichotomize outcome at time t

Then can calculate

– Sensitivity and specificity

– Likelihood ratios

– ROC curves, c-statistic

– Can provide these for multiple time points.

In each case, probabilities are for an event on or before time t.
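A minimal sketch of dichotomizing at time t. Each hypothetical record is (test positive?, event time in years, or None if no event was observed); the data are invented, and the sketch assumes everyone was followed at least to t, so censoring before t is ignored.

```python
def accuracy_at(records, t):
    """Sensitivity, specificity and likelihood ratios for an event on or before t."""
    tp = fp = fn = tn = 0
    for test_positive, event_time in records:
        event = event_time is not None and event_time <= t
        if test_positive and event:
            tp += 1
        elif test_positive:
            fp += 1
        elif event:
            fn += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "lr_positive": sensitivity / (1 - specificity),
        "lr_negative": (1 - sensitivity) / specificity,
    }


# Hypothetical cohort: 4 test-positive and 6 test-negative patients.
records = [
    (True, 1), (True, 2), (True, 4), (True, None),
    (False, 3), (False, 6), (False, None), (False, None), (False, None), (False, None),
]

at_2_years = accuracy_at(records, t=2)
at_5_years = accuracy_at(records, t=5)
print("t = 2:", at_2_years)
print("t = 5:", at_5_years)
```

The same test has different sensitivity and specificity at each time point, which is why results are worth reporting for more than one t.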

Quantifying prediction. Analyze as a risk factor

Risk ratios (for cumulative incidence)

Odds ratios (from logistic regression)

Hazard ratios (for time to an event)
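For a dichotomous test and outcome, the risk ratio and the unadjusted odds ratio come straight from a 2x2 table. The counts below are hypothetical. A hazard ratio needs time-to-event data and a survival model, so it is not computed in this sketch.

```python
# Hypothetical 2x2 table: events by a fixed follow-up time.
events_test_positive, n_test_positive = 30, 100
events_test_negative, n_test_negative = 10, 100

risk_positive = events_test_positive / n_test_positive
risk_negative = events_test_negative / n_test_negative

# Risk ratio: ratio of cumulative incidences.
risk_ratio = risk_positive / risk_negative

# Odds ratio: ratio of odds; matches exp(coefficient) from a logistic
# regression with the test result as the only predictor.
odds_positive = risk_positive / (1 - risk_positive)
odds_negative = risk_negative / (1 - risk_negative)
odds_ratio = odds_positive / odds_negative

print(f"Risk ratio: {risk_ratio:.2f}")
print(f"Odds ratio: {odds_ratio:.2f}")
```

The odds ratio is further from 1 than the risk ratio here because the outcome is common in the test-positive group.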

Value of Prognostic Information

Doctors and patients like prognostic information

But hard to assess its value

Most objective approach is decision-analytic. Consider:

– What decision is to be made

– Costs of errors

– Cost of test

Example

DECISION: Treat with more aggressive regimen

BEFORE test: 5-year mortality = 25%

AFTER test: 5-year mortality either 10% or 50%

BUT: do we know how bad it is:

– To treat patient with 10% mortality with more aggressive regimen?

– To treat patient with 50% mortality with less aggressive regimen?
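A decision-analytic sketch of this example. The 25%, 10% and 50% mortality figures are from the slide; everything else is an assumption invented for illustration: the aggressive regimen is taken to cut mortality by a relative 20% and to carry a harm equivalent to 4 percentage points of mortality. Change those two numbers and the value of the test changes with them, which is exactly the difficulty the slide raises.

```python
# Assumed (hypothetical) treatment effects, not from the lecture.
RELATIVE_RISK_REDUCTION = 0.20  # benefit of the aggressive regimen
TREATMENT_HARM = 0.04           # harm, in mortality-equivalent units


def net_benefit(mortality_risk: float) -> float:
    """Expected net benefit of aggressive treatment at a given baseline risk."""
    return mortality_risk * RELATIVE_RISK_REDUCTION - TREATMENT_HARM


pretest, low, high = 0.25, 0.10, 0.50  # 5-year mortality figures from the slide

# Fraction testing low-risk so the average stays at 25%.
fraction_low = (high - pretest) / (high - low)

# Without the test: treat everyone only if that helps on average.
benefit_without_test = max(0.0, net_benefit(pretest))

# With the test: treat each group only if it helps that group.
benefit_with_test = (
    fraction_low * max(0.0, net_benefit(low))
    + (1 - fraction_low) * max(0.0, net_benefit(high))
)

value_of_test = benefit_with_test - benefit_without_test
print(f"Net benefit without test: {benefit_without_test:.4f}")
print(f"Net benefit with test:    {benefit_with_test:.4f}")
print(f"Value of test per patient (before test cost): {value_of_test:.4f}")
```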

Common Problems with Studies of Prognostic Tests - 1

Referral/selection bias, e.g. too many studies from tertiary centers

Effects of prognosis on treatment and effects of treatment on prognosis

– Effective treatments blunt relationships

– End-of-life decisions may accentuate relationships

Common Problems with Studies of Prognostic Tests - 2

Loss to follow-up

– Can do sensitivity analysis

Lack of blinding

– Especially important for subjective outcomes, e.g., physician decisions, cause of death
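One simple form of sensitivity analysis for loss to follow-up is to bound the risk by assuming that none, or all, of the lost subjects had the outcome. The cohort numbers below are hypothetical.

```python
# Hypothetical cohort.
enrolled = 200
followed = 180
events_among_followed = 36
lost = enrolled - followed

# Complete-case estimate: ignores the subjects who were lost.
observed_risk = events_among_followed / followed

# Best case: no lost subject had the event. Worst case: all of them did.
best_case_risk = events_among_followed / enrolled
worst_case_risk = (events_among_followed + lost) / enrolled

print(f"Observed (complete cases): {observed_risk:.2f}")
print(f"Best case:                 {best_case_risk:.2f}")
print(f"Worst case:                {worst_case_risk:.2f}")
```

If conclusions hold at both extremes, the loss to follow-up is unlikely to matter; if they flip, it does.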

Common Problems with Studies of Prognostic Tests - 3

Overfitting: given enough variables and a small enough number of outcomes, can predict almost perfectly

– Need separate validation

Inadequate sample size

– Unlike situations where relative risk is important, for absolute risk the DENOMINATOR is as important as the numerator.
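A small simulation of the overfitting problem, under stated assumptions: 8 training patients, 5,000 candidate predictors that are pure coin-flip noise, and 2,000 validation patients. All sizes and the random seed are arbitrary choices for illustration. The best of the noise variables looks excellent in the training set and does no better than chance in new patients.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable
N_TRAIN, N_VARIABLES, N_VALIDATION = 8, 5000, 2000


def accuracy(predictions, outcomes):
    """Fraction of subjects whose prediction matches the outcome."""
    return sum(p == y for p, y in zip(predictions, outcomes)) / len(outcomes)


# Training set: few outcomes, many candidate variables, all unrelated noise.
train_outcomes = [rng.randint(0, 1) for _ in range(N_TRAIN)]
best_train_accuracy = 0.0
for _ in range(N_VARIABLES):
    candidate = [rng.randint(0, 1) for _ in range(N_TRAIN)]
    best_train_accuracy = max(best_train_accuracy, accuracy(candidate, train_outcomes))

# Validation set: the "winning" variable is still noise in new patients.
validation_outcomes = [rng.randint(0, 1) for _ in range(N_VALIDATION)]
winner_in_new_patients = [rng.randint(0, 1) for _ in range(N_VALIDATION)]
validation_accuracy = accuracy(winner_in_new_patients, validation_outcomes)

print(f"Best apparent accuracy in training set: {best_train_accuracy:.2f}")
print(f"Accuracy of that variable in validation: {validation_accuracy:.2f}")
```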

On Cost-Effectiveness Analyses

"The essential purpose of a cost-effectiveness analysis is to calculate the net benefit or harm to a population if resources are put into one activity rather than another. But that question does not even arise if you do not look past the one activity that interests you...From this narrowed perspective, the results of cost-effectiveness analysis are not only moot, they are an irritant.”

-- David Eddy

Questions?
