randomisation: necessary but not sufficient

61
Randomisation: necessary but not sufficient Doug Altman Centre for Statistics in Medicine University of Oxford

Upload: evangeline-joseph

Post on 03-Jan-2016

54 views

Category:

Documents


1 download

DESCRIPTION

Randomisation: necessary but not sufficient. Doug Altman Centre for Statistics in Medicine University of Oxford. Randomisation is not enough. The aim of an RCT is to compare groups equivalent in all respects other than the treatment itself - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Randomisation:  necessary but not sufficient

Randomisation: necessary but not sufficient

Doug Altman

Centre for Statistics in MedicineUniversity of Oxford

Page 2: Randomisation:  necessary but not sufficient

2

Randomisation is not enough

The aim of an RCT is to compare groups equivalent in all respects other than the treatment itself

Randomisation can only produce groups that are comparable at the start of a study

Other aspects of good trial design are required to retain comparability to the end

Randomised trials are conceptually simple but– easy to do badly– hard to do well

Page 3: Randomisation:  necessary but not sufficient

3

… what could possibly go wrong?”

Page 4: Randomisation:  necessary but not sufficient

4

“Clinical trials are only as strong as the weakest elements of design, execution

and analysis”

[Hart HG. NEJM 1992]

Page 5: Randomisation:  necessary but not sufficient

5

Randomised trials

An incomplete compendium of errors:Design AnalysisInterpretationSelective publication Reporting

Implications

Page 6: Randomisation:  necessary but not sufficient

6

DESIGN

Page 7: Randomisation:  necessary but not sufficient

7

Trial was not really randomised

Pediatrics 2009;123;e661-7

Page 8: Randomisation:  necessary but not sufficient

8

The study population comprised children attending the second and third grades of elementary schools in deprived neighborhoods of 2 neighboring cities, namely, Dortmund and Essen, Germany … Schools in Dortmund represented the intervention group (IG) and schools in Essen the control group (CG). For each city, 20 schools were selected randomly (Fig 1).

The study population comprised children attending the second and third grades of elementary schools in deprived neighborhoods of 2 neighboring cities, namely, Dortmund and Essen, Germany … Schools in Dortmund represented the intervention group (IG) and schools in Essen the control group (CG). For each city, 20 schools were selected randomly (Fig 1).

Page 9: Randomisation:  necessary but not sufficient

9

Improper randomisation

“Randomization was alternated every 10 patients, such that the first 10 patients were assigned to early atropine and the next 10 to the regular protocol, etc.  To avoid possible bias, the last 10 were also assigned to early atropine.”

[Lessick et al, Eur J Echocardiography 2000]

Page 10: Randomisation:  necessary but not sufficient

Inadequate blinding

“… the patients were randomly assigned to prophylaxis or nonprophylaxis groups according to hospital number. Both the physician and the nurse technician were blind as to which assignment the patient received. Patients in group A received nitrofurantoin 50 mg four times and phenazopyridine hydrochloride 200 mg three times for 1 day. Patients in group B received phenazopyridine hydrochloride only. The code was broken at the completion of the study.”

10

Page 11: Randomisation:  necessary but not sufficient
Page 12: Randomisation:  necessary but not sufficient

Sources of bias Pre-randomisation Post-randomisation

12

Page 13: Randomisation:  necessary but not sufficient

13

Page 14: Randomisation:  necessary but not sufficient

Sample size

The aim should be to have a large enough sample size to have a high probability (power) of detecting a clinically worthwhile treatment effect if it exists

Larger trials have greater power to detect beneficial (or detrimental) effects

Many clinical trials are far too small– Median 40 patients per arm in 616 trials on PubMed

in 2006– Most trials have very low power to detect clinically

meaningful treatment effects

Page 15: Randomisation:  necessary but not sufficient

15

ANALYSIS

Page 16: Randomisation:  necessary but not sufficient

Analysis does not match design

16

Page 17: Randomisation:  necessary but not sufficient

Analysis does not match designswitched from crossover to parallel

17

Page 18: Randomisation:  necessary but not sufficient

18

Analysis does not match design

Higgins et al. J Am Coll Cardiol 2003

Page 19: Randomisation:  necessary but not sufficient

Analysis does not match design

Primary end point: Progression of heart failure, defined as a composite of all-cause mortality, hospitalization for worsening HF, or ventricular tachyarrhythmias requiring device therapy

19

Page 20: Randomisation:  necessary but not sufficient

20

Analysis does not match design

TARGET trial, Lancet

2004

In fact this was two separate 1:1 comparisons: Lumiracoxib vs naproxen

Lumiracoxib vs ibuprofen

Page 21: Randomisation:  necessary but not sufficient

21

Page 22: Randomisation:  necessary but not sufficient

22

Stender et al, Lancet 2000

Page 23: Randomisation:  necessary but not sufficient

23

What is an intention to treat analysis?

Which patients are included in an intention to treat analysis?

– Should be all randomised patients, retained in the original groups as randomised

Most RCTs with ‘intention to treat’ analyses have some missing data on the primary outcome variable

– 75% of 119 RCTs - Hollis & Campbell, BMJ 1999

– 58% of 100 RCTs - Kruse et al, J Fam Pract 2002

– 77% of 249 RCTs – Gravel et al, Clin Trials 2007

– Really ‘available case analysis’

Page 24: Randomisation:  necessary but not sufficient

24

Improper comparison

Labrie et al, Prostate 2004;59:311-318.

Page 25: Randomisation:  necessary but not sufficient

25

Post hoc data and analysis decisions

Huge scope for post hoc selection from multiple analyses– omitting data– adjustment – categorisation/cutpoints– log transformation– etc

“The “art” part of science is focussed in large part on dealing with these matters in a way that is most likely to preserve fundamental truths, but the way is open for deliberate skewing of results to reach a predetermined conclusion.”

Bailar JC. How to distort the scientific record without actually lying: truth, and the arts of science. Eur J Oncol 2006;11:217-24.

Page 26: Randomisation:  necessary but not sufficient

26

INTERPRETATION

Page 27: Randomisation:  necessary but not sufficient
Page 28: Randomisation:  necessary but not sufficient

Spin in a representative sample of 72 trials [Boutron et al, JAMA 2010]

Title– 18% Title

Abstract– 38% Results section of abstract– 58% Conclusions section of abstract

Main text – 29% Results– 41% Discussion– 50% Conclusions– >40% had spin in at least 2 sections of main text

Page 29: Randomisation:  necessary but not sufficient

“Spin”

Review of breast cancer trials“… spin was used frequently to influence, positively, the interpretation of negative trials, by emphasizing the apparent benefit of a secondary end point. We found bias in reporting efficacy and toxicity in 32.9% and 67.1% of trials, respectively, with spin and bias used to suggest efficacy in 59% of the trials that had no significant difference in their primary endpoint.”

[Vera-Badillo et al, Ann Oncol 2013]

Page 30: Randomisation:  necessary but not sufficient

30

SELECTIVEPUBLICATION

Page 31: Randomisation:  necessary but not sufficient

31

Consistent evidence of study publication bias

Studies with significant results are more likely to be published than those with non-significant results– Statistically significant results are about 20% more

likely to be published [Song et al, HTA 2000]

Studies reported at conferences are less likely to be fully published if not significant

[Scherer et al, CDMR 2004]

Even when published, nonsignificant studies take longer to reach publication that those with significant findings

[Hopewell et al, CDMR 2001]

Page 32: Randomisation:  necessary but not sufficient

Of 635 clinical trials completed by Dec 2008, 294(46%) were published in a peer reviewed biomedical journal, indexed by Medline, within 30 months of trial completion.

Country

Size

Phase

Funder

32

Ross JS, Mulvey GK, Hines EM, Nissen SE, Krumholz HM. Trial publication after registration in ClinicalTrials.gov: a cross-sectional analysis. PLoS Med 2009.

Page 33: Randomisation:  necessary but not sufficient

Consequences of failure to publish

Non-publication of research findings always leads to a reduced evidence-base

Main concern is that inadequate publication distorts the evidence-base – if choices are driven by results

Even if there is no bias the evidence-base is diminished and thus there is extra (and avoidable) imprecision and clinical uncertainty

Page 34: Randomisation:  necessary but not sufficient

Clustering of P values just below 0.05

Pocock et al, BMJ 2004

P=0.05

P=0.01

Page 35: Randomisation:  necessary but not sufficient

PLoS One 2013

“There is strong evidence of an association between significant results and publication; studies that report positive or significant results are more likely to be published and outcomes that are statistically significant have higher odds of being fully reported. Publications have been found to be inconsistent with their protocols.”

Page 36: Randomisation:  necessary but not sufficient

36

REPORTING

Page 37: Randomisation:  necessary but not sufficient

37

Evidence of poor reporting

Poor reporting: key information is missing or ambiguous

There is considerable evidence that many published articles do not contain the necessary information – We cannot tell exactly how the research was done

Page 38: Randomisation:  necessary but not sufficient

Poor description of non-pharmacological interventions in RCTs

Hoffmann et al, BMJ 2013

Only 53/137 (39%) interventions were adequately described– increased to 59% by using responses from

contacted authors

38

Page 39: Randomisation:  necessary but not sufficient

Perry et al, J Exp Criminol 2010

39

Page 40: Randomisation:  necessary but not sufficient

Reporting of harms in randomized controlled trials of psychological interventions for mental and behavioral disorders: A review of current practice [Jonsson et al, CCT 2014]

104 (79%) reports did not indicate that adverse events, side effects, or deterioration had been monitored

40

Page 41: Randomisation:  necessary but not sufficient

“None of the psychological intervention trials mentioned the occurrence of an adverse event in their final report. Trials of drug treatments were more likely to mention adverse events in their protocols compared with those using psychological treatments. When adverse events were mentioned, the protocols of psychological interventions relied heavily on severe adverse events guidelines from the National Research Ethics Service (NRES), which were developed for drug rather than psychological interventions and so may not be appropriate for the latter.” 41

Page 42: Randomisation:  necessary but not sufficient

CONSORT – reporting RCTs

Structured advice, checklist and flow diagram Based on evidence, consensus of relevant

stakeholders Explanation and elaboration paper

42

Page 43: Randomisation:  necessary but not sufficient

Liu et al.,Transplant Int 2013

43

Page 44: Randomisation:  necessary but not sufficient

44

Review of 87 RCTs • Primary Outcome specification never matched

precisely!• 21% failed to register or publish primary outcomes [PO] • discrepancies in 79% of the registry–publication pairs • Percentages did not differ significantly between industry

and non-industry-sponsored trials• 30% of trials contained unambiguous PO

discrepancies• e.g., omitting a registered PO from the publication,

‘‘demoting’’ a registered PO to a published secondary outcome

• 48% non-industry-sponsored, 21% industry-sponsored (P=0.01)

Page 45: Randomisation:  necessary but not sufficient

State of play

Not all trials are published

Methodological errors are common

Research reports are seriously inadequate– Improvement over time is very slow

Reporting guidelines exist

It’s much easier to continue to document the problems than to change behaviour

45

Page 46: Randomisation:  necessary but not sufficient

46

Can we do better?

Page 47: Randomisation:  necessary but not sufficient

47

Page 48: Randomisation:  necessary but not sufficient

48

Some (partial) solutions to improving published randomised trials

Prevention of outcome reporting bias requires changing views about P<0.05

All primary and secondary outcomes should be specified a priori and then fully reported

Monitoring/regulation– Ethics committees, data monitoring committees,

funders Trial registration Journal restrictions Publication of protocols Availability of raw data (data sharing or

publication)

Page 49: Randomisation:  necessary but not sufficient

Publication of protocols

Publication is strongly desirable

Copy of protocol is required by some journals– Some publish this as a Web Appendix

Practice is likely to increase

Page 50: Randomisation:  necessary but not sufficient
Page 51: Randomisation:  necessary but not sufficient

Int J Stroke 2012

Page 52: Randomisation:  necessary but not sufficient
Page 53: Randomisation:  necessary but not sufficient

53

Finally …

“There may be greater danger to the public welfare from statistical dishonesty than from almost any other form of dishonesty”[Bailar JC. Clin Pharmacol Ther 1976;20:113-20.]

As an author– Be honest and transparent

As a reader– Beware

Page 54: Randomisation:  necessary but not sufficient

54

Page 55: Randomisation:  necessary but not sufficient

55

Vasopressin vs epinephrine for out-of-hospital cardiopulmonary resuscitation[Wenzel et al, NEJM 2004]

Primary outcome – hospital admission (alive) The overall comparison gives

OR = 0.79 (95% CI 0.62 to 1.02) [P=0.06]

Page 56: Randomisation:  necessary but not sufficient

56

Wenzel et al, NEJM 2004 Vasopressin

(N=589) Epinephrine

(N=597) P

Value Odds Ratio (95% CI)

All patients Hospital admission

214/ 589 (36.3) 186/ 597 (31.2) 0.06 0.8 (0.6–1.0)

Ventricular fibrillation

Hospital admission

103/ 223 (46.2) 107/ 249 (43.0) 0.48 0.9 (0.6–1.3)

Pulseless electrical activity Hospital admission

35/ 104 (33.7) 25/ 82 (30.5) 0.65 0.8 (0.5–1.6)

Asystole Hospital admission

76/ 262 (29.0) 54/ 266 (20.3) 0.02 0.6 (0.4–0.9)

Page 57: Randomisation:  necessary but not sufficient

57

Wenzel et al, NEJM 2004

Vasopressin (N=589)

Epinephrine (N=597)

P Value

Odds Ratio (95% CI)

All patients Hospital admission

214/ 589 (36.3) 186/ 597 (31.2) 0.06 0.8 (0.6–1.0)

Ventricular fibrillation

Hospital admission

103/ 223 (46.2) 107/ 249 (43.0) 0.48 0.9 (0.6–1.3)

Pulseless electrical activity

Hospital admission

35/ 104 (33.7) 25/ 82 (30.5) 0.65 0.8 (0.5–1.6)

Asystole

Hospital admission

76/ 262 (29.0) 54/ 266 (20.3) 0.02 0.6 (0.4–0.9)

No mention in Methods of planning to look at subgroups

Comparison of P values is wrong No significant difference between 3 groups

(interaction test)

Page 58: Randomisation:  necessary but not sufficient

58

Chan et al, Lancet 2014

Page 59: Randomisation:  necessary but not sufficient

59

What is a double blind trial?

Survey of 91 physicians [Devereaux et al. JAMA 2001]– single, double, and triple blind trials

Which groups are blinded in a double blind trial?– participants– health care providers– data collectors– data analysts– judicial assessors of outcomes

Page 60: Randomisation:  necessary but not sufficient

60

Physician interpretations

Participants + + + + + + + + Health care providers + + + + + + Data collectors + + + + + Data analysts + + Outcome assessors + + + +

% 38 5 5 7 1 7 13 10[13% gave other answers]

Only 29% specified outcome assessors

[Devereaux et al. JAMA 2001]

Page 61: Randomisation:  necessary but not sufficient

“The main limitation of our trial was the lack of blinded outcome assessment; this is probably impossible to achieve in such trials because it is difficult to disguise or mask the mattresses and it would be unethical to frequently move seriously ill, elderly people on to a standard surface for their skin to be assessed. We took steps to minimise the potential for bias this allows by collecting independent skin assessments carried out by both the ward staff and the clinical research nurses. Although ward nurses were not blind to allocation, we have no evidence that this influenced the care given. The frequent mattress changes were a strength of this trial as they represent the use of mattresses in real life and provide generalisable data.

BMJ 2006