Screening and Prognostic Tests
Thomas B. Newman, MD, MPH
October 20, 2005
Overview
Questions from last time; administrative stuff
Screening tests
– Introduction
– Biases in observational studies
– Biases in randomized trials
– Conclusion – ecologic view
Prognostic tests
– Differences from diagnostic tests and risk factors
– Quantifying prediction: calibration and discrimination
– Value of information
– Common problems
TN Biases
“When your only tool is a hammer, you tend to see every problem as a nail.”
Biggest gains in longevity have been PUBLIC HEALTH interventions, not interventions aimed at individuals
Biggest threats are still public health threats
Interventions aimed at individuals are overemphasized
Cultural characteristics
"We live in a wasteful, technology driven, individualistic and death-denying culture."
--George Annas, New Engl J Med, 1995
What is screening?
Common definition: testing to detect asymptomatic disease
Better definition*: application of a test to detect a potential disease or condition in people with no known signs or symptoms of that disease or condition.
– Disease vs condition
– Asymptomatic vs no known signs or symptoms
*Common screening tests. David M. Eddy, editor. Philadelphia, PA: American College of Physicians, 1991
Types of screening
Unrecognized symptomatic disease screening: what IS making the person sick.
Disease screening: what WILL make the person sick.
Risk factor screening: what MIGHT make the person sick.
Examples and overlap
Continuum related to both certainty and timing of symptoms
May vary with age
Unrecognized symptomatic disease: vision and hearing problems in young children; iron deficiency anemia, depression
Presymptomatic disease: neonatal hypothyroidism, syphilis, HIV
Risk factor: hypercholesterolemia, hypertension
Somewhere between: prostate cancer, breast carcinoma in situ, more severe hypertension
Disease vs. Risk factor screening. 1
Characteristic | (Unrecognized) Symptomatic Disease
# Labeled | Few
# Treated | Few
Duration of treatment | Varies
NNT | Low
Ease of showing benefit | Easy
Potential for harm | False positives
Disease vs. Risk factor screening. 2
Characteristic | (Unrecognized) Symptomatic Disease | Pre-symptomatic Disease
# Labeled | Few | Few
# Treated | Few | Few
Duration of treatment | Varies | Varies, may be short
NNT | Low | Low
Ease of showing benefit | Easy | Often difficult
Potential for harm | False positives | False positives, pseudodisease
Disease vs. Risk factor screening. 3
Characteristic | (Unrecognized) Symptomatic Disease | Pre-symptomatic Disease | Risk factor
# Labeled | Few | Few | High*
# Treated | Few | Few | High*
Duration of treatment | Varies | Varies, may be short | Long
NNT | Low | Low | High
Ease of showing benefit | Easy | Often difficult | Usually very difficult
Potential for harm | False positives | False positives, pseudodisease | Harmful treatment, delayed effects
*May be political as well as scientific decision
Possible harms from screening
To all tested
To those with negative results
To those with positive results
To those not tested
See course book
Forces behind excessive screening -1
Companies selling machines to do the test
Companies selling the test itself
Companies selling products to treat the condition
Clinicians who treat the condition
Politicians who are (or want to appear) sympathetic
Forces behind excessive screening -2
Disease research and advocacy groups
Academics who study the condition
Clinicians doing or interpreting the test
Managed care organizations
The public
E-mail excerpt -1
> PLEASE, PLEASE, PLEASE TELL ALL YOUR FEMALE FRIENDS AND RELATIVES TO INSIST ON A CA-125 BLOOD TEST EVERY YEAR AS PART OF THEIR ANNUAL PHYSICAL EXAMS. Be forewarned that their doctors might try to talk them out of it, saying, "IT ISN'T NECESSARY."
>
> …Insist on the CA-125 BLOOD TEST; DO NOT take "NO" for an answer!
Biases in Observational Studies of Screening Tests
Volunteer bias
Lead time bias
Length time bias
Stage migration bias
Pseudodisease
Volunteer Bias
People who volunteer for studies differ from those who do not
Examples
– HIP Mammography study: women who volunteered for mammography had lower heart disease death rates
– Coronary Drug Project: men who took their medicine had about half the mortality of men who didn't, whether they were on drug or placebo
Lead time bias
Source: EDITORIAL: Finding and Redefining Disease. Effective Clinical Practice, March/April 1999. Available at: ACP- Online http://www.acponline.org/journals/ecp/marapr99/primer.htm accessed 8/30/02
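Lead time bias can be illustrated with a single hypothetical patient. The sketch below assumes that earlier detection does not change the age at death. The ages are invented for illustration and are not taken from the editorial cited above.

```python
def survival_from_diagnosis(age_at_diagnosis, age_at_death):
    """Years lived after the diagnosis is made."""
    return age_at_death - age_at_diagnosis

# Hypothetical patient: death occurs at age 70 whether or not screening is done.
AGE_AT_DEATH = 70

# Without screening the cancer is diagnosed from symptoms at 67;
# with screening the same cancer is found at 63.
clinical = survival_from_diagnosis(age_at_diagnosis=67, age_at_death=AGE_AT_DEATH)
screened = survival_from_diagnosis(age_at_diagnosis=63, age_at_death=AGE_AT_DEATH)
lead_time = screened - clinical

print(f"Survival after clinical diagnosis: {clinical} years")
print(f"Survival after screen detection:   {screened} years")
print(f"Lead time: {lead_time} years")
print(f"5-year survivor? clinical={clinical >= 5}, screened={screened >= 5}")
```

Survival measured from diagnosis looks longer in the screened scenario even though the patient dies at the same age, which is why mortality rather than survival is the safer outcome.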
Length Bias (Different natural history bias)
Screening picks up prevalent disease
Prevalence = incidence x duration
Slowly growing tumors have greater duration in presymptomatic phase, therefore greater prevalence
Therefore, cases picked up by screening will be disproportionately those that are slow growing
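The relation prevalence = incidence x duration can be turned into a small calculation. This is a minimal sketch under invented assumptions: two tumor types with equal incidence, and presymptomatic durations of 1 and 4 years. The function name and all numbers are hypothetical.

```python
def screen_detected_share(tumor_types):
    """Share of each tumor type among cases found at a single screen.

    tumor_types maps a name to (incidence, presymptomatic duration in years).
    Prevalence of detectable disease is taken as incidence x duration.
    """
    prevalence = {name: inc * dur for name, (inc, dur) in tumor_types.items()}
    total = sum(prevalence.values())
    return {name: p / total for name, p in prevalence.items()}

# Hypothetical: both types arise equally often (10 per 100,000 per year),
# but slow tumors stay presymptomatic four times as long.
tumors = {"fast": (10, 1.0), "slow": (10, 4.0)}
shares = screen_detected_share(tumors)

for name, share in shares.items():
    print(f"{name}-growing tumors: {share:.0%} of screen-detected cases")
```

Under these assumptions half of all new tumors are slow growing, yet they make up four fifths of the cases a screen finds.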
Length bias
Source: EDITORIAL: Finding and Redefining Disease. Effective Clinical Practice, March/April 1999. Available at: ACP- Online http://www.acponline.org/journals/ecp/marapr99/primer.htm
Length Bias
[Diagram: Early detection -> (?) -> Higher cure rate; alternative explanation: slower growing tumor with better prognosis]
Stage migration bias
[Figure: Stages 0 to 4 as assigned by old tests vs. new tests]
Stage migration bias
Also called the "Will Rogers Phenomenon"
– "When the Okies left Oklahoma and moved to California, they raised the average intelligence level in both states." -- Will Rogers
Documented with colon cancer at Yale
Other examples abound – the more you look for disease, the higher the prevalence and the better the prognosis
More generally, be careful with stratified analyses
Best reference on this topic: Black WC and Welch HG. Advances in diagnostic imaging and overestimation of disease prevalence and the benefits of therapy. NEJM 1993;328:1237-43.
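The stage migration effect can be checked with a few lines of arithmetic. The sketch below uses invented survival probabilities for eight hypothetical patients. A more sensitive test reclassifies two of them from early to late stage without changing anyone's prognosis.

```python
def mean(values):
    return sum(values) / len(values)

# Hypothetical 5-year survival probabilities for individual patients.
stay_early = [0.9, 0.9]        # truly localized disease
migrate = [0.7, 0.7]           # occult spread that only the new test finds
late = [0.3, 0.3, 0.1, 0.1]    # spread already visible to the old test

old_early, old_late = stay_early + migrate, late
new_early, new_late = stay_early, migrate + late

print(f"Early stage survival: {mean(old_early):.2f} -> {mean(new_early):.2f}")
print(f"Late stage survival:  {mean(old_late):.2f} -> {mean(new_late):.2f}")
print(f"Overall survival:     {mean(old_early + old_late):.2f} -> "
      f"{mean(new_early + new_late):.2f}")
```

Stage-specific survival improves in both strata while overall survival is unchanged, so a stratified comparison across eras would be misleading.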
A more general example of Stage Migration Bias
VLBW (< 1500 g), LBW (1500-2499g) and NBW (>= 2500g) fetuses exposed to Factor X all have decreased mortality compared with those not exposed
Is factor X good?
Maybe not! Factor X could be cigarette smoking!
– Smoking moves babies to lower birthweight strata
– Compared with other causes of LBW (i.e., prematurity) it is not as bad
Pseudodisease
A condition that looks just like the disease, but never would have bothered the patient
In an individual treated patient it is impossible to distinguish pseudodisease from successfully treated asymptomatic disease
Existence of pseudodisease can only be detected in groups of treated patients
Treating pseudodisease can only cause harm because (by definition) it is unnecessary
Example: Mayo Lung Project (MLP)
RCT of lung cancer screening
Enrollment 1971-76
9,211 male smokers
Two study arms
– Intervention arm: chest x-ray and sputum cytology every 4 months for 6 years (75% compliance)
– Usual care (control) arm: at trial entry only, a recommendation to receive same tests annually
MLP Extended Follow-up Results*
Intervention group: more cancers diagnosed at early, resectable stage
Better survival of those with lung cancer
*Marcus et al., JNCI 2000;92:1308-16
MLP Extended Follow-up Results*
Slight increase in lung-cancer mortality (P=0.09 by 1996)
*Marcus et al., JNCI 2000;92:1308-16
What happened?
Lead-time bias?
Length bias?
Volunteer bias?
Overdiagnosis (pseudodisease)
Black, WC. Overdiagnosis: An unrecognized cause of confusion and harm in cancer screening. JNCI 2000;92:1280-1
NHLBI National Lung Screening Trial
46,000 participants randomized in 2 years
Equal randomization
Three annual screens
Spiral CT versus chest x-ray!
Each year, 182,000 women are diagnosed with breast cancer and 43,300 die. One woman in eight either has or will develop breast cancer in her lifetime...
If detected early, the five-year survival rate exceeds 95%. Mammograms are among the best early detection methods, yet 13 million women in the U.S. are 40 years old or older and have never had a mammogram.
39,800 Clicks per mammogram (Sept, ’04)
RCTs of screening tests, Example: Mammography
New York Times: Expert Panel Cites Doubts On Mammogram's Worth
Washington Post: Mammography Review Shatters the Status Quo
Doubts About Its Value Alarm Many
Is screening for breast cancer with mammography justifiable?*
Meta-analysis of randomized trials
Methodologic issues raised
– Quality of randomization
– Post-randomization exclusions
– Choice of outcome variable: Breast cancer mortality vs. total mortality
*Gotzsche P, Olsen O. Lancet 2000;355:1293
Poor Quality Randomization. Example: Edinburgh trial
Randomization by practice (N=87?), not by woman
7 practices changed allocation status
Highest SES
– 26% of women in control group
– 53% of women in screening group
26% reduction in cardiovascular mortality in mammography group
Example 2: Biased post-randomization exclusion for previous breast cancer
New York Trial
– N=853 in screened group
– N=336 in control group
– Breast cancer mortality difference at 18 years: 44 deaths
Edinburgh trial
– N=338 in screened group
– N=177 in control group
Explanation for differences in NY Trial*
In screened group, women with previous breast cancer excluded at entry
In control group, women with previous breast cancer excluded only if they developed breast cancer
Thus, women with previous breast cancer who did NOT develop breast cancer were included in the denominator of the control group but not the mammography group
Therefore, bias against mammography
* Fletcher SW, Gilmore JG. Mammography screening for breast cancer. NEJM 2003;348:1672-80. (Appendix 2)
Problems with breast cancer mortality as an endpoint
Assignment of cause of death is subjective
– Unblinded in NY, Two-county trials
Treatment may have effects on other causes of death
Meta-analysis of radiotherapy for early breast cancer*
Meta-analysis of 40 RCTs
Central review of individual-level data; N = 20,000
Breast cancer mortality reduced (20-yr ARR 4.8%; P = .0001)
Mortality from other causes increased (20-yr ARR -4.3%; P = 0.003)
*Early Breast Cancer Trialists Collaborative Group. Lancet 2000;355:1757
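The two absolute risk reductions above can be converted into numbers needed to treat and to harm. The sketch below uses the standard relation NNT = 1 / ARR. Adding the two ARRs to approximate the net effect on total mortality is a simplifying assumption, not a result reported on the slide.

```python
def number_needed(absolute_risk_difference):
    """Number needed to treat (or harm) = 1 / |absolute risk difference|."""
    return 1 / abs(absolute_risk_difference)

arr_breast_cancer = 0.048   # 20-yr ARR for breast cancer mortality
arr_other_causes = -0.043   # negative ARR: other-cause mortality increased

nnt = number_needed(arr_breast_cancer)
nnh = number_needed(arr_other_causes)
# Assumption: the two effects are treated as simply additive.
net_arr_total_mortality = arr_breast_cancer + arr_other_causes

print(f"NNT to prevent one breast cancer death:   {nnt:.1f}")
print(f"NNH to cause one death from other causes: {nnh:.1f}")
print(f"Approximate net ARR for total mortality:  {net_arr_total_mortality:.1%}")
```

Roughly one breast cancer death is prevented per 21 women treated and one other death is caused per 23, so most of the cause-specific benefit disappears when total mortality is the endpoint.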
[Figure panels: Mastectomies; Radiotherapy; 13-year total mortality, > 50 y.o.; Breast cancer deaths, 7 yr]
NCI Position*
“NCI recommends mammography for women starting in their 40s” -- Dr. Peter Greenwald, NCI director of cancer prevention
"Everyone agrees that mammography detects breast cancer when it's smaller, when it's earlier. There's no debate about that," Greenwald added. "And everybody agrees mammography detects more cancers.
"The debate is whether that has an impact on mortality later on. It is the only real method that we have, other than clinical exam, that's useful as screening for early detection in healthy women."
*Washington Post, January 24, 2002
Cancer mortality vs Total mortality in RCTs
TN Conclusions on Screening
Screening decisions are heavily influenced by politics, economics, emotion and wishful thinking
Most screening occurs without informed consent
High quality RCTs are needed
Low power to discern effect on total mortality
Big debate about efficacy. But even if proponents are right, much screening is not cost-effective and its disadvantages are consistently downplayed
Cost per QALY
Mammography, age 40-50: $105,000*
Mammography, age 50-69: $21,400*
Smoking cessation counseling: $2000**
HIV prevention in Africa: $1-20***
*Salzman P et al. Ann Int Med 1997;127:955-65 (Based on optimistic assumptions about mammography.)
**Cromwell J et al. JAMA 1997;278:1759-66
***Marseille E et al. Lancet 2002; 359: 1851-56
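One way to read the cost-per-QALY figures is to ask how many QALYs a fixed budget buys under each program. This is a rough sketch. The budget is set arbitrarily to the cost of one QALY from mammography at age 40-50, and the upper end of the $1-20 range is used for HIV prevention.

```python
# Cost per QALY in dollars, as listed above.
cost_per_qaly = {
    "Mammography, age 40-50": 105_000,
    "Mammography, age 50-69": 21_400,
    "Smoking cessation counseling": 2_000,
    "HIV prevention in Africa": 20,  # upper end of the $1-20 range
}

budget = 105_000  # arbitrary: buys exactly 1 QALY via mammography at 40-50

qalys_bought = {name: budget / cost for name, cost in cost_per_qaly.items()}

for name, qalys in qalys_bought.items():
    print(f"{name}: {qalys:,.1f} QALYs for ${budget:,}")
```

The same money buys about 50 times as many QALYs through smoking cessation counseling, and thousands of times as many through HIV prevention, as it does through mammography at age 40-50.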
Return to George Annas*
Need to begin to think differently about health. Two dysfunctional metaphors:
– Military metaphor – battle disease, no cost too high for victory, no room for uncertainty
– Market metaphor -- medicine as a business; health care as a product; success measured economically
*Annas G. Reframing the debate on health care reform by replacing our metaphors. NEJM 1995;332:744-7
Ecology metaphor
Sustainability
Limited resources
Interconnectedness
More critical of technology
Move away from domination, buying, selling, exploiting
Focus on the big picture
– Populations rather than individuals
– Causes rather than symptoms
Assessment of Prognostic Tests
Difference from diagnostic tests and risk factors
Quantifying accuracy
Value of prognostic information
Common problems
Potential confusion: “cross-sectional” means 2 things
Cross-sectional sampling means sampling does not depend on either the predictor variable or the outcome variable. (E.g., as opposed to case-control sampling)
Cross-sectional time dimension means that predictor and outcome are measured at the same time -- opposite of longitudinal
Difference from Diagnostic Tests
Longitudinal rather than cross-sectional time dimension
Incidence rather than prevalence
Sensitivity, specificity, prior probability confusing
Time to an event may be important
Harder to quantify accuracy in individuals
– Exceptions: short time course, continuous outcomes
Difference from Risk Factors
Causality not important
Absolute risk very important
– Sampling scheme makes a much bigger difference because absolute risks are less generalizable than relative risks
– Can be informative even if no bad outcomes!
Quantifying Prediction 1: Calibration
How accurate are the predicted probabilities?
– Assemble a group
– Compare actual and predicted probabilities
Calibration is important for decision making and giving information to patients
Like absolute risk in this way – less generalizable
Quantifying Prediction 2: Discrimination
How well can the test separate subjects in the group from the mean probability to values closer to zero or 1?
May be more generalizable
Often measured with C-statistic
Illustration
Perfect calibration, no discrimination:
– Predicted = actual 5-year mortality = 45% (for everyone)
Perfect discrimination, poor calibration
– Every patient that dies has a predicted mortality of 51% and every patient who survives has a predicted mortality of 49%
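The two scenarios in this illustration can be reproduced numerically. The sketch below assumes a hypothetical cohort of 20 patients with 9 deaths (45%). It computes the c-statistic as the proportion of death/survivor pairs in which the patient who died had the higher predicted risk, counting ties as one half. The function names are illustrative only.

```python
def c_statistic(predicted, died):
    """Probability that a patient who died had a higher predicted risk
    than a patient who survived (ties count 1/2)."""
    deaths = [p for p, d in zip(predicted, died) if d]
    survivors = [p for p, d in zip(predicted, died) if not d]
    score = 0.0
    for p_death in deaths:
        for p_surv in survivors:
            if p_death > p_surv:
                score += 1.0
            elif p_death == p_surv:
                score += 0.5
    return score / (len(deaths) * len(survivors))

def observed_by_prediction(predicted, died):
    """Observed mortality within each group sharing a predicted risk."""
    groups = {}
    for p, d in zip(predicted, died):
        groups.setdefault(p, []).append(d)
    return {p: sum(ds) / len(ds) for p, ds in groups.items()}

died = [1] * 9 + [0] * 11                 # hypothetical cohort, 45% mortality
flat = [0.45] * 20                        # same prediction for everyone
sharp = [0.51] * 9 + [0.49] * 11          # 51% if died, 49% if survived

c_flat, c_sharp = c_statistic(flat, died), c_statistic(sharp, died)
print("flat: ", c_flat, observed_by_prediction(flat, died))
print("sharp:", c_sharp, observed_by_prediction(sharp, died))
```

The flat model predicts 45% for a group whose observed mortality is 45% but cannot rank anyone, so its c-statistic is 0.5. The sharp model ranks every pair correctly, so its c-statistic is 1.0, yet its 51% group all die and its 49% group all survive.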
Quantifying Discrimination:
Dichotomize outcome at time t
Then can calculate
– Sensitivity and specificity
– Likelihood ratios
– ROC curves, c-statistic
– Can provide these for multiple time points.
In each case, probabilities are for an event on or before time t.
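Dichotomizing at time t can be sketched as follows. The example assumes complete follow-up through t (no censoring) and a dichotomous test. The ten patients and their event times are invented.

```python
def accuracy_at_time(patients, t):
    """Sensitivity, specificity and likelihood ratios for an event on or before t.

    patients: list of (test_positive, event_time); event_time is None if the
    event was never observed. Assumes everyone was followed at least to t.
    """
    tp = fp = fn = tn = 0
    for test_positive, event_time in patients:
        event = event_time is not None and event_time <= t
        if test_positive and event:
            tp += 1
        elif test_positive:
            fp += 1
        elif event:
            fn += 1
        else:
            tn += 1
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    # Note: LR+ is undefined when specificity is 1, LR- when it is 0.
    return {"sensitivity": sens, "specificity": spec,
            "LR+": sens / (1 - spec), "LR-": (1 - sens) / spec}

# Hypothetical cohort: (test result, years until death or None if alive).
patients = [(True, 1), (True, 2), (True, 4), (True, None), (True, 8),
            (False, 3), (False, None), (False, None), (False, 9), (False, None)]

at_5_years = accuracy_at_time(patients, t=5)
at_10_years = accuracy_at_time(patients, t=10)
print("t = 5: ", at_5_years)
print("t = 10:", at_10_years)
```

The same test has different sensitivity and specificity at 5 and at 10 years, because more patients move from the non-event to the event group as t increases.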
Quantifying prediction. Analyze as a risk factor
Risk ratios (for cumulative incidence)
Odds ratios (from logistic regression)
Hazard ratios (for time to an event)
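The first two measures can be computed directly from a 2x2 table. The sketch below uses invented counts. A hazard ratio needs time-to-event modeling and is not computed here.

```python
def risk_ratio(events_pos, n_pos, events_neg, n_neg):
    """Cumulative incidence in test-positives divided by that in test-negatives."""
    return (events_pos / n_pos) / (events_neg / n_neg)

def odds_ratio(events_pos, n_pos, events_neg, n_neg):
    """Odds of the outcome in test-positives divided by odds in test-negatives."""
    odds_pos = events_pos / (n_pos - events_pos)
    odds_neg = events_neg / (n_neg - events_neg)
    return odds_pos / odds_neg

# Hypothetical cohort: 30 of 100 test-positives and 10 of 100 test-negatives die.
rr = risk_ratio(30, 100, 10, 100)
odds = odds_ratio(30, 100, 10, 100)
print(f"Risk ratio: {rr:.2f}")
print(f"Odds ratio: {odds:.2f}")
```

With a common outcome the odds ratio is further from 1 than the risk ratio. Neither conveys the absolute risks of 30% and 10%.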
Value of Prognostic Information
Doctors and patients like prognostic information
But hard to assess its value
Most objective approach is decision-analytic. Consider:
– What decision is to be made
– Costs of errors
– Cost of test
Example
DECISION: Treat with more aggressive regimen
BEFORE test: 5-year mortality = 25%
AFTER test: 5-year mortality either 10% or 50%
BUT: do we know how bad it is:
– To treat patient with 10% mortality with more aggressive regimen?
– To treat patient with 50% mortality with less aggressive regimen?
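A decision-analytic reading of this example needs numbers the slide does not give. The sketch below assumes, purely for illustration, that the aggressive regimen cuts 5-year mortality by 20% in relative terms (`RRR`) and carries a fixed harm equivalent to 3 percentage points of mortality (`HARM`). Both parameters are hypothetical, and the cost of the test itself is ignored.

```python
RRR = 0.20    # hypothetical relative reduction in 5-year mortality
HARM = 0.03   # hypothetical harm of aggressive regimen, in mortality units

def best_net_benefit(risk):
    """Net mortality benefit of the better choice at this risk:
    treat aggressively if risk * RRR exceeds HARM, otherwise do not (benefit 0)."""
    return max(0.0, risk * RRR - HARM)

pre_test, low, high = 0.25, 0.10, 0.50

# Fraction of patients who must test 'low' so the average risk stays at 25%.
frac_low = (high - pre_test) / (high - low)

without_test = best_net_benefit(pre_test)
with_test = frac_low * best_net_benefit(low) + (1 - frac_low) * best_net_benefit(high)
value_of_test = with_test - without_test

print(f"Treatment threshold risk: {HARM / RRR:.0%}")
print(f"Fraction testing low-risk: {frac_low:.1%}")
print(f"Net benefit without test: {without_test:.4f}")
print(f"Net benefit with test:    {with_test:.4f}")
print(f"Value of test:            {value_of_test:.4f}")
```

Under these assumptions the test has value only because the 10% group falls below the 15% treatment threshold and is spared the aggressive regimen. With a different harm or efficacy the test might change no decisions and be worth nothing.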
Common Problems with Studies of Prognostic Tests- 1
Referral/selection bias – e.g. too many studies from tertiary centers
Effects of prognosis on treatment and effects of treatment on prognosis
– Effective treatments blunt relationships
– End-of-life decisions may accentuate relationships
Common Problems with Studies of Prognostic Tests- 2
Loss to follow-up
– Can do sensitivity analysis
Lack of blinding
– Especially important for subjective outcomes, e.g., physician decisions, cause of death
Common Problems with Studies of Prognostic Tests- 3
Overfitting – given enough variables and a small enough number of outcomes, can predict almost perfectly
– Need separate validation
Inadequate sample size
– Unlike situations where relative risk is important, for absolute risk DENOMINATOR as important as numerator.
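Overfitting can be demonstrated with pure noise. The sketch below assumes 10 patients with 3 deaths and 2,000 candidate predictors that are random coin flips. It picks the predictor that best matches the outcomes, then checks it on a separate, simulated validation cohort. All sizes and the seed are arbitrary.

```python
import random

def accuracy(predictor, outcomes):
    """Proportion of patients whose outcome matches the predictor value."""
    return sum(p == o for p, o in zip(predictor, outcomes)) / len(outcomes)

rng = random.Random(0)  # fixed seed so the sketch is reproducible
N_PATIENTS, N_PREDICTORS, N_VALIDATION = 10, 2000, 1000

# Derivation cohort: 3 deaths among 10 patients.
outcomes = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]

# Every candidate predictor is noise, unrelated to the outcome.
predictors = [[rng.randint(0, 1) for _ in range(N_PATIENTS)]
              for _ in range(N_PREDICTORS)]
best = max(predictors, key=lambda p: accuracy(p, outcomes))
apparent_accuracy = accuracy(best, outcomes)

# Separate validation: the 'best' predictor is still noise in new patients.
validation_outcomes = [1 if rng.random() < 0.3 else 0 for _ in range(N_VALIDATION)]
validation_values = [rng.randint(0, 1) for _ in range(N_VALIDATION)]
validation_accuracy = accuracy(validation_values, validation_outcomes)

print(f"Apparent accuracy of best noise predictor: {apparent_accuracy:.0%}")
print(f"Accuracy in validation cohort:             {validation_accuracy:.0%}")
```

With this many candidate variables and so few outcomes, some noise variable is almost certain to fit the derivation cohort nearly perfectly, and only the validation cohort reveals that it predicts nothing.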
On Cost-Effectiveness Analyses
"The essential purpose of a cost-effectiveness analysis is to calculate the net benefit or harm to a population if resources are put into one activity rather than another. But that question does not even arise if you do not look past the one activity that interests you... From this narrowed perspective, the results of cost-effectiveness analysis are not only moot, they are an irritant."
-- David Eddy
Questions?