Classification and Bias of Clinical Research, with a Randomized Clinical Trial Case Study Rick Chappell, Ph.D. Professor, Department of Biostatistics and Medical Informatics University of Wisconsin Medical School


Post on 22-Dec-2015



Good Ethics is Good Science:

“If a research study is so methodologically flawed that little or no reliable information will result, it is unethical to put subjects at risk or even to inconvenience them through participation in such a study. … Clearly, if it is not good science, it is not ethical.”

- U.S. Dept. of Health and Human Services, Policy for Protection of Human Subjects (45 CFR 46, 1/1/92 ed.)

Types of Studies Classified by Temporal Point of View

I. Instantaneous Studies - Surveys

II. Longitudinal Studies

A. Retrospective Studies: Historical Observational Cohort; Case - Control

B. Prospective Studies: Prospective Observational Cohort; Clinical Trial

C. Hybrid Designs

A Schematic for Temporal Classification

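[Figure: a time axis centered on "Now". Retrospective designs (Observational Cohort, Case - Control) look backwards from "Now"; prospective designs (Observational Cohort, Clinical Trial with Randomization) look forwards; the Instantaneous design (Survey) sits at "Now".]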

I. Instantaneous: Population-Based Studies

Synonyms: survey, population-correlation study, ecological study.

Two or more populations are instantaneously compared through the prevalences of both exposure and disease.

As summarized units get smaller (country → region → neighborhood → individual), a survey approaches a historical observational cohort study.

Population-Based Studies

Advantages

Instantaneous.

Easy access to a large and varied population.

Good for hypothesis generation.

Disadvantages

Intervention is usually not feasible.

Very little information on causality: IARC standards require individual-based evidence.

II. Longitudinal: Individual-Based Studies

A longitudinal study observes exposures and events for individuals over a period of time.

There are two types, depending on whether one is looking forwards (prospective) or backwards (retrospective) from the present.

Longitudinal Studies: A. Retrospective

Historical Observational Cohort

Synonyms - survey, retrospective cohort study.

Examines outcomes among patients with past exposures.

E.g., track down 1950s asbestos miners & determine current status.

Case - Control (Breslow and Day, 1980)

Synonyms - case referent, retrospective study.

Examines past exposures among a group of patients with current outcomes.

E.g., interview mesothelioma patients & determine past exposures.

Historical Observational Cohort Studies

Advantages

Quick results - no wait.

Easy to get large samples by ‘mining’ databases.

Yields wide range of sequelae.

Useful for investigating rare treatments or exposures.

Disadvantages

No opportunity to customize data collection.

No possibility for blinding.

Many possible biases: confounding, selection, information.

Case - Control Studies

Advantages

Cheap, quick - record searching can be automated.

Useful for pilot studies.

Useful for investigating rare disorders.

Disadvantages

Gives narrow picture of risks due to treatment or exposure.

Biases: confounding, selection, recall.

Yields only estimates of relative, not absolute, risk.

Hypothetical Historical Cohort Study

Exposed Group: 100 patients, 10 events, rate = .1

Control Group: 100 patients, 5 events, rate = .05

Odds Ratio ≈ 2

Hypothetical Case-Control Study

Event Group: 100 patients, 10 exposures

Non-Event (Control) Group: 100 patients, 5 exposures

Event Rate per Exposure = ? (Not 100/200.)

Odds Ratio ≈ 2
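The arithmetic behind the two hypothetical studies can be sketched as follows. This is only an illustration using the counts on the slides; the helper names `risk` and `odds` are introduced here and are not part of the original lecture. It shows why the cohort layout yields absolute rates and a relative risk, while the case-control layout supports only an odds ratio, which happens to take the same value in both tables (about 2.1, rounded to 2 on the slides).

```python
def risk(count, total):
    """Proportion of a group with the characteristic of interest."""
    return count / total


def odds(count, total):
    """Odds of the characteristic: those with it divided by those without."""
    return count / (total - count)


# Hypothetical historical cohort: groups are defined by exposure,
# so event rates (absolute risks) are directly estimable.
exposed_events, exposed_n = 10, 100
control_events, control_n = 5, 100

relative_risk = risk(exposed_events, exposed_n) / risk(control_events, control_n)
cohort_odds_ratio = odds(exposed_events, exposed_n) / odds(control_events, control_n)

# Hypothetical case-control study: groups are defined by outcome, and the
# investigator fixes how many cases and controls are sampled, so event
# rates are NOT estimable. Only the odds of exposure can be compared.
case_exposed, case_n = 10, 100
noncase_exposed, noncase_n = 5, 100

exposure_odds_ratio = odds(case_exposed, case_n) / odds(noncase_exposed, noncase_n)

print(f"Cohort relative risk:          {relative_risk:.2f}")
print(f"Cohort odds ratio:             {cohort_odds_ratio:.2f}")
print(f"Case-control exposure odds ratio: {exposure_odds_ratio:.2f}")
```

Because the events are uncommon in both groups, the odds ratio stays close to the relative risk, which is what makes the case-control estimate useful despite the missing absolute rates.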

Longitudinal Studies: B. Prospective

General Advantages

Can collect detailed exposure, treatment, disease, and demographic information.

Blinding is possible.

Recall and information bias may be eliminated.

Useful for investigating rare treatments or exposures.

Classification depends on the presence of intervention.

Prospective Studies

Prospective Observational Cohort

Synonyms - prospective trial, ‘clinical trial’.

No intervention.

Randomized Controlled (“Phase III”) Clinical Trial

Synonyms - prospective interventional cohort study, experiment, prospective trial, clinical trial.

Experimenters directly intervene in patient treatment, usually on a randomized basis with controls.

Prospective Observational Cohort Study

Additional Advantage

Passive observation; no need to dictate treatment.

Disadvantages

May take a long time to accrue cases and wait for results.

Potential confounding bias due to lack of randomization and suitable controls.

Clinical Trials

Additional Advantages

“The most definitive tool for evaluation of the applicability of clinical research” - 1979 NIH release.

Biases may be eliminated.

Good design may make analysis simple.

Disadvantages

As above, may take a long time.

Must be ethically and laboriously conducted.

Requires treatment on basis (in part) of scientific rather than medical factors. Patients may make some sacrifice (Meier, 1982).

Phases of a Clinical Trial

Biochemical and pharmacological research.

Animal Studies (Gart, 1986 & Schneiderman, 1967).

Phase I (Storer, 1989) - estimate toxicity rates using few (~ 10 - 40) healthy or sick subjects.

Phase II (Thall & Simon, 1995) - determines whether a therapy has potential using a few very sick patients.

Phases of a Clinical Trial (cont.)

Phase III - large randomized controlled, possibly blinded, experiments

Phase IV - a controlled trial of an approved treatment with long-term followup of safety and efficacy.

Sackett’s Levels of Evidence for “Evidence-based Medicine” (Cook, et al., 2002)

Level Type of Evidence

1a Systematic Review (with homogeneity) of RCTs

1b Individual RCT with Narrow Confidence Interval

1c All or None- Previously Hopeless Case(s) Salvaged

… continued

Levels of Evidence (cont.)

Level Type of Evidence

2a Systematic Review (with Homogeneity) of Cohort Studies

2b Individual Cohort Study (or Low Quality RCT; e.g., <80% Followup)

2c Ecological Studies

3a SR (with Homogeneity) of case-control studies

3b Individual Case-Control Study

… continued

Levels of Evidence (cont.)

Level Type of Evidence

4 Case-series (or poor quality cohort and case-control studies)

5 Expert opinion without explicit critical appraisal, or based on physiology, bench research or "first principles"

A critique (by example) of clinical trials and evidence-based medicine

“Parachute use to prevent death and major trauma related to gravitational challenge: systematic review of randomised controlled trials” (Smith & Pell, 2003).

Fig. 1:

Abstract

Objectives: To determine whether parachutes are effective in preventing major trauma related to gravitational challenge.

Design: Systematic review of randomised controlled trials

...

Main outcome measure: Death or major trauma, defined as an injury severity score > 15.

Results: We were unable to identify any randomised controlled trials of parachute intervention.

Conclusions: As with many interventions intended to prevent ill health, the effectiveness of parachutes has not been subjected to rigorous evaluation by using randomised controlled trials. Advocates of evidence based medicine have criticised the adoption of interventions evaluated by using only observational data. We think that everyone might benefit if the most radical protagonists of evidence based medicine organised and participated in a double blind, randomised, placebo controlled, crossover trial of the parachute.

Longitudinal Studies: C. Hybrid Designs

Prospective Treatment, Historical Controls

Currently treated series of patients is compared with a previous series.

See Gehan & Freireich (1974), Gehan (1984).

Advantages

Doesn’t assign treatments.

No need to recruit controls.

Longitudinal Studies: C. Hybrid Designs (cont.)

Prospective Treatment, Historical Controls

Disadvantages

Same as in Historical Observational Cohort except that characteristics of treated patients (only) can be collected.

Selection bias likely because of time lag between groups.

Hybrid Designs

Prospective Treatment with Both Prospective and Historical Controls

Uses both types of controls to maximize efficiency and minimize bias.

See Pocock (1976a and 1976b).

Bias in Clinical Studies

Definition: Bias is a systematic error in estimation which is not reduced by increasing the study sample size (as opposed to random variation).

See Sackett (1979) and other articles in the same issue; Rose (1982); and Lachin (1988).

Classification is based on whether bias occurs at the time of patient Selection; or at the time of Information collection; or at the time of Publication.

They are all variants of Confounding, in which a third variable is related to both treatment and outcome.
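The distinction between bias and random variation can be illustrated with a small simulation. This is a sketch under invented assumptions: a survey estimates the proportion of smokers, the true proportion is 30%, and smokers respond less often than non-smokers (the response probabilities 0.5 and 0.8 are hypothetical values chosen for illustration, not data from any study). As the sample grows, the estimate settles down, but it settles on the wrong value.

```python
import random


def surveyed_smoking_rate(n, rng, true_rate=0.30,
                          p_respond_smoker=0.5, p_respond_nonsmoker=0.8):
    """Proportion of smokers among respondents when response depends on smoking.

    All rates are hypothetical and exist only to illustrate the definition.
    """
    smokers = 0
    respondents = 0
    for _ in range(n):
        is_smoker = rng.random() < true_rate
        p_respond = p_respond_smoker if is_smoker else p_respond_nonsmoker
        if rng.random() < p_respond:
            respondents += 1
            smokers += is_smoker
    return smokers / respondents


rng = random.Random(0)  # fixed seed so the run is repeatable
estimates = {n: surveyed_smoking_rate(n, rng) for n in (1_000, 10_000, 200_000)}

# Value the estimate converges to: smokers' share among respondents.
expected_biased = (0.30 * 0.5) / (0.30 * 0.5 + 0.70 * 0.8)

for n, estimate in estimates.items():
    print(f"n = {n:>7}: estimated smoking rate = {estimate:.3f}")
print(f"limit of the estimate = {expected_biased:.3f}, true rate = 0.300")
```

Increasing n shrinks the scatter around roughly 0.21, not around the true 0.30: the random error goes away and the systematic error remains.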

I. Selection Bias

Prevalence - Incidence Bias

Prevalence (observed occurrence) of a trait ≠ incidence (rate of onset).

Cause: gap between exposure and selection of subjects.

Not a problem with irreversible events such as mortality, if detectable.

E.g., hypertension may disappear with onset of CV disease and can be overlooked as a risk factor.

See Neyman, 1955.

(Any retrospective study, especially case-control.)

Selection Bias

Admission Rate Bias

Patients may differ from noninstitutionalized subjects in size or direction of effects.

E.g., systemic weakness vs. arthritis: negative relation among inpatients; positive relation among outpatients.

See Berkson, 1946.

(Any nonrandomized study with a mix of patient sources, especially case-control.)

Selection Bias

Nonrespondent (Volunteer) Bias

Nonparticipation may be related to the subject of investigation.

E.g., smokers ignore surveys more often than do non-smokers (Seltzer, 1974).

For general methods to analyze data with ‘nonignorable nonresponse’ see Little and Rubin (1987) and Rubin (1987).

(Case-control, though drop-outs can affect any study not analyzed ‘intent to treat’.)

Example: Where to add armor to fighter planes?

In World War II, the U.S. Air Force conducted an investigation into where armor could most effectively be added to fighter planes.

Researchers examined returning aircraft, mapped the locations of bullet holes, and recommended that the most commonly pierced areas be reinforced.

Their recommendation neglected the most vital part of the aircraft, which was intact in all returning aircraft: the area surrounding the pilot’s head!

II. Information Bias

Detection Signal (Diagnostic Suspicion) Bias

In unblinded studies, an exposure may be considered a risk factor for an endpoint, and such patients preferentially observed.

In blinded studies, an exposure may make an endpoint more detectable.

E.g., estrogen causes bleeding from uterine cancer to be more easily detectable.

(Any unblinded study except case-control; also clinical trials with sensitive endpoints.)

Reports of Original Studies JAVMA 191, 12/1/87

“High-rise syndrome in cats,” Wayne O. Whitney, DVM & Cheryl J. Mehlhaff, DVM

Selection and/or detection bias

Information Bias

Exposure Suspicion Bias

An outcome may cause the investigator to look for a particular exposure.

The temporal reverse of detection signal bias.

E.g., arthritis and knuckle-cracking.

(Case-control studies.)

Information Bias

Recall (Family Information) Bias

Similar to exposure suspicion bias, but errors originate with the subject or his/her family.

E.g., in a study of prescription use among women with fetal malformation, 28% reported unverifiable exposure vs. 20% of the controls (Klemetti & Saxen, 1967).

(Case-control studies.)

III. Publication (Reporting) Bias

Even a perfect study leads to bias if dissemination depends on the direction of its result.

Causes: commercial reasons; researchers’ personal motivations; editorial policy!

Vickers, et al. (1998) show that the problem is widespread: in some countries, 100% of publications show treatment effects.

Publication (Reporting) Bias

A version of the multiple comparisons problem (Miller, 1985), or ‘testing to a foregone conclusion’.

E.g., ORG-2766 protected nerves from cytotoxic injury in 55 women with ovarian cancer - NEJM lead article (van der Hoop, et al., 1990); a subsequent negative study of 133 women - ASCO Proceedings abstract (Neijt, et al., 1994).

(All Studies.)

A type of reporting bias: Multiple Comparisons (“Data Dredging”)

A “p-value” is interpreted as the probability of attaining a result at least as extreme as that observed, given that the apparent effect is false (that is, under the null hypothesis); it can be viewed as the false positive rate under the null hypothesis.

This assumes that only a single test is conducted. If many tests are performed, it is possible to “sample to a foregone conclusion” and produce a falsely low p-value.

For example, if twenty-five independent tests are conducted, the probability of at least one p-value being less than .01 is .22.

Often only the significant result is reported, and the 24 others ignored.
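The .22 figure above follows from the complement rule for independent tests: the chance that none of k tests falls below the threshold is (1 - α)^k. A minimal sketch of the calculation (the function name is introduced here for illustration):

```python
def prob_at_least_one_false_positive(n_tests, alpha):
    """Chance that at least one of n independent null tests has p < alpha."""
    return 1.0 - (1.0 - alpha) ** n_tests


# The slide's example: twenty-five independent tests at the .01 level.
familywise = prob_at_least_one_false_positive(25, 0.01)
print(f"25 tests at alpha = .01: {familywise:.2f}")

# The same calculation for a few other numbers of tests, for comparison.
for n_tests in (1, 5, 25, 100):
    p = prob_at_least_one_false_positive(n_tests, 0.01)
    print(f"{n_tests:>3} tests: {p:.3f}")
```

Reporting only the one "significant" test therefore understates the false positive rate by a factor of about twenty in this example.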

IV. Confounding (General)

Caused by any situation in which:

A third variable exists which isn’t known or at least isn’t accounted for;

It is associated with the “cause”; and

It is also associated with the “effect”.

Then:

The supposed cause-effect relation will be confounded by the third variable.

(Any nonrandomized study)
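The three conditions can be made concrete with a small stratified table. The counts below are invented for illustration only: a treatment has no effect within either stratum of a third variable (baseline health), but healthier patients are more likely to be treated and less likely to have the event, so the crude comparison makes the treatment look protective.

```python
# Hypothetical counts per stratum of the third variable (baseline health):
# (treated_events, treated_n, untreated_events, untreated_n)
strata = {
    "healthier at baseline": (40, 800, 10, 200),   # 5% event rate in both arms
    "sicker at baseline": (40, 200, 160, 800),     # 20% event rate in both arms
}


def relative_risk(treated_events, treated_n, untreated_events, untreated_n):
    """Event rate in the treated divided by event rate in the untreated."""
    return (treated_events / treated_n) / (untreated_events / untreated_n)


# Within each stratum the treatment makes no difference.
stratum_rr = {name: relative_risk(*counts) for name, counts in strata.items()}

# Pooling the strata ignores the third variable.
totals = [sum(column) for column in zip(*strata.values())]
crude_rr = relative_risk(*totals)

for name, rr in stratum_rr.items():
    print(f"{name}: relative risk = {rr:.2f}")
print(f"crude (strata ignored): relative risk = {crude_rr:.2f}")
```

Randomization breaks the link between the third variable and treatment, which is why the slide restricts this bias to nonrandomized studies.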

Do Storks Bring Babies?

Population of Oldenburg, Germany, 1930-1936 (Ornithologische Monatsberichte 44, Jahrgang, 1936, Berlin)

[Figure: plot with axes labeled “Storks (1000s)” and “Humans (1000s)”.]

A Case Study in Randomized Clinical Trials: the Women’s Health Initiative

Slides adapted from a presentation by Jacques E. Rossouw, National Heart, Lung, and Blood Institute.

Rationale for WHI: Prevention of Chronic Disease in Women

Clinical trials to test promising but unproven therapies:

Hormone therapy to prevent CHD

Diet to prevent cancers

Calcium + Vitamin D to prevent fractures

Observational study to identify/quantify risk factors

Community intervention studies to test strategies for health promotion

WHI Clinical Trial: Three Trials in One Study

TOTAL TRIAL PARTICIPANTS = 68,133

Hormone Replacement Therapy = 27,347. Primary Outcome: CHD

Calcium and Vitamin D = 36,282. Primary Outcome: Hip Fractures

Dietary Modification = 48,835. Primary Outcomes: Cancer of Breast, Colon/Rectum

Average follow-up 8.4 years

“In the entire realm of medicine, there are few forms of therapy with a more consistent record of beneficence.”

Robert Wilson, MD, Feminine Forever, ’65

History of HT Use in the US

[Figure: prescriptions per year (vertical axis, 0 to 40) against calendar year, 1960 to 2000, annotated with: “Feminine Forever”; failed RCTs in men; Endo. Ca.; Progest; CHD; bone loss; breast cancer.]

WHI Hormone Trial: Background circa 1992

Suspected benefits of hormones:

↓ risk of CHD

↓ risk of fracture

↓ risk of colorectal cancer

Suspected risks of hormones:

Possible ↑ risk of breast cancer

↑ risk of VTE/PE

American College of Physicians, 1992

"randomized trials are required to prove these effects...."

"Women who have coronary heart disease or who are at increased risk of coronary heart disease are likely to benefit from hormone therapy"

American College of Obstetricians and Gynecologists, 1992

"epidemiologic studies.....strongly suggest that hormone replacement therapy decreases the risk of cardiovascular disease"

"probable beneficial effect of estrogen on heart disease"

"randomized drug trial......is urgently needed"

National Cholesterol Education Program Adult Treatment Panel II, 1993

"Epidemiologic evidence for benefit of estrogen replacement therapy is especially strong for secondary prevention in women with prior CHD"

"Estrogen replacement therapy can be used in postmenopausal women with elevated LDL cholesterol, although confirmation of benefit in CHD risk reduction from clinical trials is still needed for certainty"

American Heart Association, 1996

"ERT does look promising as a long-term protection against heart attack......."

"More clinical trials are needed to determine whether ERT can actually delay heart disease in women after menopause."

Risk for Coronary Heart Disease: Estrogen+Progestin Users vs Nonusers

[Figure: relative risk by study, on a logarithmic axis (0.01, 0.1, 1, 10). Case-Control Studies: Psaty, 1994; Mann, 1994; Rosenberg, 1993; Thompson, 1989. Cohort Studies: Grodstein, 1996; Falkeborn, 1992. Clinical Trial: Nachtigal, 1979. Final row: Summary Relative Risk.]

Barrett-Connor. Annu Rev Public Health. 1998;19:55-72.

Observational studies suggest HT use is associated with reduced risk of CHD in…

users of estrogen

users of estrogen plus progestin

irrespective of type of estrogen or progestin

women without prior heart disease (primary prevention)

women with prior heart disease (secondary prevention)

Selection biases in hormone users may explain all, or a large proportion of, the apparent benefit for CHD…

At start of HT: users healthier.

During HT: compliant pill takers have better health; are under medical surveillance; early CHD events missed in some studies.

On stopping HT: illness often reason for stopping.

Do residual biases in observational studies lead to overestimation of benefit?

Compliance Bias

People who take their pills regularly are less likely to die…….

Mortality reduction in good adherers compared to poor adherers

Active 68%

Placebo 60%

All 62%

Adapted from Beta Blocker Heart Attack Trial, Horwitz et al. (1990)

WHI E+P: CHD by Year of Follow-up

[Figure: annualized CHD rate per 10,000 (vertical axis, 0 to 60) by year of follow-up (1, 2, 3, 4, 5, 6+), E+P vs. placebo.]

Overall HR = 1.24 (1.00-1.54), P < 0.05

P for trend < .05

Report by the U.S. Institute of Medicine – Recommendations:

"One branch is at special risk: the near-term effects of hormones on reducing cardiovascular risk factors and event rates may be confirmed early in this project."

Total HT prescriptions 1995- Aug 2003

[Figure: total HT prescriptions in millions (vertical axis, 0 to 100) by year, 1995 to 2003, with the HERS and WHI publications marked.]

Hersch et al, JAMA 2004

Labeling Changes: “Black Box”

WARNING

Estrogens and progestins should not be used for the prevention of cardiovascular disease. The Women’s Health Initiative (WHI) study reported increased risks of myocardial infarction, stroke, invasive breast cancer, pulmonary emboli, and deep vein thrombosis in postmenopausal women during 5 years of treatment with conjugated equine estrogens (0.625 mg) combined with medroxyprogesterone acetate (2.5 mg) relative to placebo (see CLINICAL PHARMACOLOGY, Clinical Studies)…...

A consequence (? Based on observational data)

References

Berkson, J. Limitations of the application of fourfold table analysis to hospital data (1946). Biometrics Bulletin 2, 47-53.

Breslow, N.E. and Day, N.E. (1980). Statistical Methods in Cancer Research 1: The Analysis of Case-Control Studies. Oxford: Oxford University Press.

Cook, et al. (2002). http://www.eboncall.org/content/levels.html .

Dorr, Robert T. (1997). Personal communication.

Gart, J.J. et al. (1986). Statistical Methods in Cancer Research 3: The Design and Analysis of Long-Term Animal Experiments. Oxford: Oxford University Press.

Gehan, Edmund A. The evaluation of therapies: Historical control studies, with discussion (1984). Statistics in Medicine 3, 315-324.

Gehan, Edmund A. and Freireich, Emil (1974). The New England Journal of Medicine, 198-203.

Horwitz R.I., et al. (1990). Lancet 336, 542-5.

IARC. Monographs on the Evaluation of Carcinogenic Risk of Chemicals to Humans. Lyon: IARC.

Klemetti, A. and Saxen, L. Prospective vs. retrospective approach in the search for environmental causes of malformations (1967). American Journal of Public Health 57, 2071-2075.

Lachin, J. Statistical properties of randomization in clinical trials (1988). Controlled Clinical Trials 9, 289-311.

Little, R.J.A. and Rubin, D.B. (1987). Statistical Analysis with Missing Data. New York: Wiley.

Neyman, J. Statistics - servant of all sciences (1955). Science 122, 401.

Meier, Paul. Current research in statistical methodology for clinical trials (1982). Proceedings of Current Topics in Biostatistics and Epidemiology: A Memorial Symposium in Honor of Jerome Cornfield. Pages 141- 150. Biometrics.

Miller, R. Publication bias (1985). Entry in The Encyclopedia of Statistical Sciences, Volume 5. S. Kotz and N.L. Johnson, eds., pp. 679-689. New York: Wiley.

National Institutes of Health, Division of Research Grants, Research Analysis and Evaluation Branch, Bethesda, MD (1979). NIH inventory of clinical trials: fiscal year 1979, Volume I.

Neijt, et al. (1994). Proceedings of the American Society for Clinical Oncology.

Pocock, S.J. The combination of randomized and historical controls in clinical trials (1976a). Journal of Chronic Diseases 29, 175-188.

Pocock, S.J. Randomized versus historical controls: A compromise solution (1976b). Proceedings of the International Biometric Conference 9/1, 245-260.

Rose, G. Bias (1982). British Journal of Clinical Pharmacology 13, 157-162.

Rubin, D.B. (1987). Multiple Imputation for Nonresponse in Surveys. New York: Wiley.

Sackett, D.L. Bias in analytic research (1979). Journal of Chronic Diseases 32, 51-63.

Schneiderman, M.A. Mouse to man: statistical problems in bringing a drug to clinical trial (1967). Proceedings of the 5th Berkeley Symposium in Mathematical Statistics and Probability, Volume IV. L.M. LeCam and J. Neyman, eds. Berkeley.

Seltzer, C.C. et al. Mail response by smoking status (1974). American Journal of Epidemiology 100, 453-477.

Smith, G.C.S. and Pell, J.P. Parachute use to prevent death and major trauma related to gravitational challenge: systematic review of randomised controlled trials (2003). British Medical Journal 327, 1459-1461.

Storer, B.E. Design and analysis of phase I clinical trials (1989). Biometrics 46, 33-38.

Thall, Peter F. and Simon, Richard M. Recent developments in the design of phase II clinical trials (1995). In Recent Advances in Clinical Trial Design and Analysis. Peter Thall, ed. , pp. 49-72. New York: Kluwer.

Unger, D.L. Does knuckle cracking lead to arthritis of the fingers? [letter]. Arthritis & Rheumatism 41. 949-50, 1998.

van der Hoop, et al. (1990). New England Journal of Medicine 322, 89-94.

Vickers, et al. (1998). Controlled Clinical Trials 19, 159-166.