Statistical considerations for research plans
Jarno Tuimala, 2015-09-08



Data analysis considerations for (clinical) research

Jarno Tuimala, 2015-09-08

Schedule 2015

Lectures: Haartman Institute, Haartmaninkatu 3, small lecture hall, 14:15-15:45

1. Tue 1.9.2015 Otto Helve: Introduction and curriculum of a clinical investigator

2. Wed 2.9.2015 Jussi Merenmies: Evaluating results from a randomised controlled trial
   Erkki Isometsä: Clinical Epidemiology: observational studies

3. Tue 8.9.2015 Jarno Tuimala: Statistical considerations for research plans

4. Wed 9.9.2015 Ritva Loponen, Harriet Colliander: Clinical trial registrations and submissions to the authorities

5. Tue 22.9.2015 Mikael Knip: Research in an international setting

Principles of experimental design

• Ronald A. Fisher (1935)
  1. Comparison (results are in relation to something)
  2. Replication (several observation units per group)
  3. Randomization (randomly allocate units to groups)
  4. Blocking (take confounding into account)
  5. Factorial experiments (study interactions)
• Originally built on a foundation of analysis of variance (ANOVA), and aimed at agricultural experiments

The world's oldest clinical trial

• Bible, Book of Daniel, 1:3-16.
• Treatment group: four boys from Israel were given just water and vegetables.
• Control group: another group of boys received meat and wine from the king's table.
• After ten days the groups were visually compared, and the treatment group was found to be healthier than the control group.
• In addition, the king found the treatment group to be ten times better in all matters of wisdom and understanding than all the magicians and enchanters in his kingdom.

1. Comparison
2. Replication
3. Randomization

• What design principles were used in this experiment?
• What are the response and explanatory variables?
• How would you make the experiment better?
• What statistical method(s) would you use for analyzing this?

The whole chapter

10 but the official told Daniel, "I am afraid of my lord the king, who has assigned your food and drink. Why should he see you looking worse than the other young men your age? The king would then have my head because of you."
11 Daniel then said to the guard whom the chief official had appointed over Daniel, Hananiah, Mishael and Azariah,
12 "Please test your servants for ten days: Give us nothing but vegetables to eat and water to drink.
13 Then compare our appearance with that of the young men who eat the royal food, and treat your servants in accordance with what you see."
14 So he agreed to this and tested them for ten days.
15 At the end of the ten days they looked healthier and better nourished than any of the young men who ate the royal food.
16 So the guard took away their choice food and the wine they were to drink and gave them vegetables instead.
17 To these four young men God gave knowledge and understanding of all kinds of literature and learning. And Daniel could understand visions and dreams of all kinds.
18 At the end of the time set by the king to bring them in, the chief official presented them to Nebuchadnezzar.
19 The king talked with them, and he found none equal to Daniel, Hananiah, Mishael and Azariah; so they entered the king's service.
20 In every matter of wisdom and understanding about which the king questioned them, he found them ten times better than all the magicians and enchanters in his whole kingdom.

Things to consider

• Hypothesis
• Outcome measures
• Data sources
  • Registries
  • Experiments
  • Data management
• Study design
  • Observational and experimental design
  • Sample size
• Statistical analysis
• Reporting

Hypothesis


Study objectives

• Testable hypotheses?
• Primary and secondary questions?
• Example:
  • Primary: Does smoking cause lung cancer?
  • Secondary: Are old smokers in worse shape than old non-smokers?

Outcome measures

• What will be measured?
  • Does the individual get the disease (yes/no)?
  • How long does it take for the individual to get the disease (time)?
  • How severe is the disease (laboratory tests, various scores or gradings)?
  • Proxies?
• Example
  • Do smokers get cancer more often than non-smokers?
  • Does it take longer for non-smokers to get cancer than for smokers?

Smoking and cancer

• Objective: Find out whether smoking causes cancer
• Hypothesis: Smokers get cancer more often than non-smokers
• Next:
  • What kind of data is needed to test this?
  • Where can we get data to test this?

Data sources


Smoking and cancer

• Hypothesis: Smokers get cancer more often than non-smokers

• Data needs, at least:
  • Two groups: smokers, non-smokers
  • Data: Smoking status, cancer end-point status

Registry or experimental study?

• Experimental
  • Expose individuals to tobacco smoke?
  • A review from an ethics board is needed
  • Not ethical -> registry study
• Registry
  • If strictly registry-based, no ethics board review is needed
  • If patients or their relatives are contacted, a review is mandatory

Registries (examples)

• National
  • Hospital discharge registry (HILMO) [THL]
  • Cancer registry [Cancer Society / THL]
  • Causes of death [Statistics Finland]
  • Medications [KELA]
  • New special reimbursements for medicines [KELA]
  • ASA Registry [TTL]
• Local
  • Hospital registries
• Studies
  • Health 2000 / 2011

Registry study example

• Easy to assess whether an individual has or has had lung cancer
• Much harder to assess whether they smoked or not
  • Health 2000 / 2011 helps
• Use Health 2000 or 2011 data to pick the smokers and non-smokers
• Link with other registries (cancer registry) to assess the cancer status
• Do you need to collect other variables?

Confounding

Causal inference


Causality is (often) the aim

• Causal effects?
  – The total damage caused by a fire and the number of firemen at the site are strongly correlated. Do the firemen cause the damage?
  – More of the lung cancer patients are smokers than non-smokers. Does smoking cause lung cancer?
• Evidence-based medicine...

Confounding

[Figure: a confounder is associated with both the cause and the outcome.]

Confounding

[Figure: occupation as a confounder of the association between smoking and lung cancer.]

Smoking and cancer

• Hypothesis: Smokers get cancer more often than non-smokers
• Data needs, at least:
  • Two groups: smokers, non-smokers
  • Data: Smoking status, cancer end-point status [from Health 2000 or 2011]
  • Data: Occupational exposure [from ASA registry], Age, Sex [from Health 2000 or 2011]

Note on causality

[Figure: time axis on which the exposure precedes the cancer.]

This is the right way!

Note on causality

[Figure: statins and heart attack on a time axis.]

Don't do this!

Confounding by indication

• The patient's condition affects the way treatments or medications are allocated (confounding by severity).
• This is how clinical practice normally works, but it creates problems in epidemiological (observational) studies.

Confounding by indication

• If the effect of treatment is not adjusted for the initial condition of the patient, the risk of drawing a wrong conclusion is high!

Solutions

• The previous example is a type of confounding by indication called confounding by severity.
• Standard statistical methods, such as multivariable regression, cannot adjust for unmeasured variables, which are often important in this kind of situation.
• And even if it is measured, the severity of disease is a royal pain to adjust for!
  – Propensity score adjustment, inverse-probability weighting (Rubin), or instrumental variable methods might work better; latent-variable approaches such as factor analysis and structural equation modeling are another option.
  – If possible, it is better to use a controlled trial, where patients can be randomized to treatment and no treatment (or placebo).
  – Remember natural experiments, too!
  – In other words, this is not necessarily easy...

Causal pathway and confounders

[Figure: causal pathway diagram connecting lung cancer with tobacco, alcohol, socioeconomic status, occupation, and CYP2D6 genotype.]

Study designs

Study designs

• Observational studies
  • Case-control studies
  • Cohort studies
• Treatment studies
  • Randomized controlled trials (RCTs)

Evidence pyramid (EBM), from better to worse evidence

1. Meta-analyses
2. Randomized controlled trials
3. Cohort studies
4. Case-control studies
5. Cross-sectional surveys
6. Case series
7. Case reports
8. Animal studies

Case-control study

Case-control study: initiation, disease status, sampling, matching, exposure?

[Figure sequence, axes Time and Age, with the sampling point marked: individuals are classified as cases (have the disease) or controls (don't have the disease); cases and controls are sampled and matched, and their exposure is then assessed.]

Smoking and cancer

• Hypothesis: Smokers get cancer more often than non-smokers
• Data needs, at least:
  • Two groups: smokers, non-smokers
  • Data: Smoking status, cancer end-point status [from Health 2000 or 2011]
  • Data: Occupational exposure [from ASA registry]
• Study design: Case-control study
  • Cases and controls sampled from Health 2011

Cohort study

Cohort study: initiation, sampling, follow-up

[Figure sequence, axes Time and Age, with the sampling point marked: a cohort is sampled and then followed up over time.]

Follow-up time

[Figure: follow-up time of individuals along the time axis; several follow-ups end in censoring.]

Randomized controlled trials

RCT: initiation, selection, randomization

[Figure sequence: from the pool of eligible patients, ten are selected and numbered 1-10. They are put in random order (5 1 2 4 3 10 9 6 7 8), and the random order is used to allocate them to the control group and the treatment group.]
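The randomization step can be sketched in Python. This is a minimal illustration assuming simple 1:1 randomization of an even number of patients; the function name `randomize` and the seed value are hypothetical, the seed only makes the allocation reproducible, and the sketch will not reproduce the particular order shown on the slide.

```python
import random


def randomize(patient_ids, seed=None):
    """Simple 1:1 randomization: put the patients in random order and split the list in half."""
    rng = random.Random(seed)
    order = rng.sample(patient_ids, k=len(patient_ids))  # a random permutation of the patients
    half = len(order) // 2
    return {
        "random_order": order,
        "control": sorted(order[:half]),
        "treatment": sorted(order[half:]),
    }


allocation = randomize(list(range(1, 11)), seed=2015)
print("Random order:", allocation["random_order"])
print("Control group:", allocation["control"])
print("Treatment group:", allocation["treatment"])
```

In practice, trials often use restricted schemes such as permuted blocks, with the allocation list prepared before recruitment starts; the sketch only shows the basic idea of letting chance, not the investigator, decide the groups.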

Treatment studies

• Randomized controlled trials!

Principles of experimental design

• Ronald A. Fisher (1935)
  1. Comparison
  2. Replication
  3. Randomization
  4. Blocking
  5. Factorial experiments
• Built on a foundation of analysis of variance (ANOVA), and aimed at agricultural experiments

Treatment studies

• Several different design possibilities
  • Parallel groups (or arms)
• Each group either receives or does not receive a treatment
+ Very little contamination from external factors
- Practicality, individual variability, assignment bias, generalization

Subject 1 Pretest Control Posttest
Subject 2 Pretest Treatment A Posttest

Treatment studies

• Several different design possibilities
  • Within-subject design
• Each individual receives all treatments
+ Testing new drugs; requires fewer participants than the parallel groups design
- Carry-over effect -> counterbalanced design

Subject 1 Pretest Control Treatment A Posttest

Subject 2 Pretest Control Treatment B Posttest

Treatment studies

• Several different design possibilities
  • Cross-over (counterbalanced design)
• Each individual receives all treatments
+ The order of treatments can affect the results; this design reduces the chance of this happening
- Might not be practical with many conditions -> incomplete counterbalanced designs, such as the Latin square

Group 1 Treatment A Treatment B Posttest

Group 2 Treatment B Treatment A Posttest

Treatment studies - Factorial design

• Used in designed experiments; sometimes also in clinical trials
• A factor is a manipulated phenomenon, or a treatment, presumed to affect the outcome of the experiment, e.g.:
  • Name of the factor: factor levels
  • Sex: male and female rats
  • Vitamin C: low and high level
• Factorial designs have at least two distinct factors

Full factorial design, terms

• The full factorial design shown on the previous slides is often written as 2^2 (or 2x2) and gives 2*2 = 4 different combinations, or treatments.
• The base is the number of factor levels and the exponent gives the number of factors. Thus, there is a family of full factorial designs that can be written as 2^k.

Factors and levels:
• Diet: Normal, Chocolate
• Sex: Male, Female

          Normal diet   Chocolate diet
Male      Group 1       Group 2
Female    Group 3       Group 4
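Enumerating every combination of factor levels shows where the 2^k count comes from. A minimal sketch, assuming factors are given as a mapping from factor name to its list of levels; `full_factorial` is a hypothetical helper, and its group numbering simply follows the enumeration order, so it need not match the group numbers in the table above.

```python
from itertools import product


def full_factorial(factors):
    """Return one dict per treatment combination; `factors` maps factor name -> list of levels."""
    names = list(factors)
    return [dict(zip(names, levels)) for levels in product(*factors.values())]


design = full_factorial({"diet": ["normal", "chocolate"], "sex": ["male", "female"]})
for number, combination in enumerate(design, start=1):
    print(f"Group {number}: {combination}")
```

Adding a third two-level factor (for example, vitamin C at a low and a high level) doubles the number of groups to 2^3 = 8.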

Sample size

Sample size

• How many individuals do you need (in each group) to be able to detect a statistically significant difference between the groups?
• An essential step!
  • Many published studies are under-powered
  • R. Tsang, L. Colley, L. D. Lynd. Inadequate statistical power to detect clinically significant differences in adverse event rates in randomized controlled trials. Journal of Clinical Epidemiology, 62:609–616, 2009.
• Educated guesswork
  • Very straightforward: go to the library, search for experiments similar to the one you are going to perform, and see how large a sample size was used in those.
• Formal power analysis
  • Should be done before the experiment is conducted.
  • Complements the educated guesswork, or can be worked out even without it.
  • Can be used for estimating any one of the quantities listed on the following slides, if the other four are known or guessed.

These affect the sample size

• Desired power ↑ -> sample size ↑
• Desired significance level (the "p-value" cut-off, alpha) ↑ -> sample size ↓
• Effect size ↑ -> sample size ↓
  • Possibly estimated by a pilot study
• Amount of random variation ↑ -> sample size ↑
  • Possibly estimated by a pilot study
• Desired levels for Type I and Type II errors, usually:
  • Type I error (alpha) = 0.05 (the "p-value" cut-off), the rate of false positives
  • Type II error (beta) = 0.20, the rate of false negatives; power = 1 - beta = 0.80
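The quantities above combine into a formal sample size calculation. A minimal sketch for comparing two proportions, assuming the usual normal-approximation formula without continuity correction; `n_per_group` is a hypothetical helper, and the 10% versus 5% outcome rates are purely illustrative.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Approximate sample size per group for detecting p1 != p2 with a two-sided test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)  # 0.84 for power = 0.80
    p_bar = (p1 + p2) / 2
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return ceil(numerator / (p1 - p2) ** 2)


print(n_per_group(0.10, 0.05))  # 435 per group
```

Raising the power, lowering alpha, or shrinking the difference between p1 and p2 all increase the result, which matches the arrows listed above.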

Power for a case-control study
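The same formula can be turned around to give the power of a case-control study. A minimal sketch, assuming equal numbers of cases and controls, a known exposure prevalence p0 among controls, and a hypothesized true odds ratio; the exposure prevalence among cases then follows from p1 = OR * p0 / (1 + p0 * (OR - 1)). `case_control_power` is a hypothetical helper, the normal approximation ignores the negligible opposite tail, and the scenario values are illustrative only.

```python
from math import sqrt
from statistics import NormalDist


def case_control_power(p0, odds_ratio, n_per_group, alpha=0.05):
    """Approximate power of a two-sided test comparing exposure prevalence in cases and controls."""
    p1 = odds_ratio * p0 / (1 + p0 * (odds_ratio - 1))  # exposure prevalence among cases
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    p_bar = (p0 + p1) / 2
    z = (abs(p1 - p0) * sqrt(n_per_group) - z_alpha * sqrt(2 * p_bar * (1 - p_bar))) / sqrt(
        p1 * (1 - p1) + p0 * (1 - p0)
    )
    return NormalDist().cdf(z)


# Hypothetical scenario: 20% of controls exposed, true OR = 2, 200 cases and 200 controls
print(round(case_control_power(0.20, 2.0, 200), 2))
```

With an odds ratio of exactly 1 the "power" collapses to alpha / 2 in this one-tailed approximation, which is a quick sanity check of the formula.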

Analysis


Statistical analysis plan - Write it before you have the data!

1. Introduction
2. Data sources
3. Analysis objectives
4. Analysis sets / populations / subgroups
5. Endpoints and covariates
6. Handling of missing values
7. Other data conventions
8. Statistical procedures
9. Adjustment for confounders, etc.
10. Sensitivity analyses
11. Rationale for deviation (during the analysis) from this plan
12. Quality control plan
13. Programming plans
14. References
15. Appendices

Adapted from https://www.pfizer.com/files/research/research_clinical_trials/Clinical_Data_Access_Request_Sample_SAP.pdf

Data manipulation


Data manipulation

• Missing values
  • Not all individuals necessarily have values for all variables
  • For example, some individuals might lack information on age and sex
• Solutions
  • Remove from the analysis all individuals with at least one missing value (case-wise deletion)
  • Impute, or estimate, the missing values using information from other variables
  • SPSS offers, for example, a pairwise deletion option, but it can bias the results

Example

Individual  Age  Sex  Smoking  Cancer
1           64   M    S        1
2           79   M    S        0
3           ??   M    NS       0
4           91   M    NS       1
5           83   F    S        1
6           65   F    NS       0
7           90   F    NS       0

Example - imputation

Individual  Age  Sex  Smoking  Cancer
1           64   M    S        1
2           79   M    S        0
3           90   M    NS       0
4           91   M    NS       1
5           83   F    S        1
6           65   F    NS       0
7           90   F    NS       0

Example – case-wise deletion

Individual  Age  Sex  Smoking  Cancer
1           64   M    S        1
2           79   M    S        0
4           91   M    NS       1
5           83   F    S        1
6           65   F    NS       0
7           90   F    NS       0
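Both strategies can be sketched on the example table. This is a minimal illustration with hypothetical helper names; mean imputation is a deliberately simple method swapped in here, and it gives about 78.7 years for individual 3 rather than the 90 shown above (the slides do not say how that value was obtained). Single imputation also understates uncertainty, which is why multiple imputation is usually preferred in real analyses.

```python
from statistics import mean

# (individual, age, sex, smoking, cancer); None marks the missing age of individual 3
rows = [
    (1, 64, "M", "S", 1),
    (2, 79, "M", "S", 0),
    (3, None, "M", "NS", 0),
    (4, 91, "M", "NS", 1),
    (5, 83, "F", "S", 1),
    (6, 65, "F", "NS", 0),
    (7, 90, "F", "NS", 0),
]


def casewise_deletion(data):
    """Keep only individuals with no missing values."""
    return [row for row in data if None not in row]


def mean_impute_age(data):
    """Replace a missing age with the mean of the observed ages (single imputation)."""
    fill = mean(row[1] for row in data if row[1] is not None)
    return [(row[0], fill if row[1] is None else row[1], *row[2:]) for row in data]


complete = casewise_deletion(rows)
imputed = mean_impute_age(rows)
print("Analysed after case-wise deletion:", [row[0] for row in complete])
print("Imputed age for individual 3:", round(imputed[2][1], 1))
```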

Statistical analyses

Odds ratio – a measure of association

• Odds for cancer | smoker: 12 / 8 = 1.5
• Odds for cancer | non-smoker: 36 / 180 = 0.2
• Odds ratio = 1.5 / 0.2 = 7.5
• The odds of a smoker getting lung cancer are 7.5 times the odds of a non-smoker
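The arithmetic above can be written as a small function. A minimal sketch; the argument order (exposed cases, exposed controls, unexposed cases, unexposed controls) is an assumed convention, and the cross-product form a*d / (b*c) is algebraically the same as dividing the two odds.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio of a 2x2 table.

    a = exposed cases, b = exposed controls, c = unexposed cases, d = unexposed controls.
    """
    return (a * d) / (b * c)  # equals (a / b) / (c / d)


print("Odds, smokers:", 12 / 8)
print("Odds, non-smokers:", 36 / 180)
print("Odds ratio:", odds_ratio(12, 8, 36, 180))
```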

Chi-square test for odds ratio

• Is smoking associated with the cancer status?

Pearson's Chi-squared test with Yates' continuity correction

data:  m
X-squared = 18.6247, df = 1, p-value = 1.591e-05

Warning message:
In chisq.test(m) : Chi-squared approximation may be incorrect
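The output above appears to come from R's `chisq.test`. The same statistic can be computed by hand; a minimal sketch, assuming the 2x2 table is [[12, 8], [36, 180]] (smokers and non-smokers by cases and controls). The warning arises because one expected count falls below 5, and the sketch reports the smallest expected count for that reason.

```python
from math import erfc, sqrt


def yates_chi2(a, b, c, d):
    """Pearson chi-square test with Yates' continuity correction for a 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    # The continuity correction subtracts n/2 from |ad - bc|, but never goes below zero
    corrected = max(abs(a * d - b * c) - n / 2, 0)
    statistic = n * corrected**2 / (rows[0] * rows[1] * cols[0] * cols[1])
    p_value = erfc(sqrt(statistic / 2))  # upper tail of the chi-square distribution with 1 df
    smallest_expected = min(r * k / n for r in rows for k in cols)
    return statistic, p_value, smallest_expected


stat, p, min_expected = yates_chi2(12, 8, 36, 180)
print(f"X-squared = {stat:.4f}, df = 1, p-value = {p:.3e}")
if min_expected < 5:
    print(f"Smallest expected count is {min_expected:.2f}; the chi-square approximation may be incorrect")
```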

Fisher's exact test for odds ratio

Fisher's Exact Test for Count Data

data:  m
p-value = 5.103e-05
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
  2.575804 22.522438
sample estimates:
odds ratio
  7.407224
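The exact P-value can be sketched with the hypergeometric distribution. This is a minimal illustration assuming the same table as above and a small relative tolerance for ties, similar in spirit to R's; it computes only the P-value. The odds ratio R reports (7.41) is a conditional maximum-likelihood estimate, which is why it differs slightly from the sample value 7.5; that estimate and its exact interval need numerical root-finding and are left out here.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Conditioning on the margins, the count `a` follows a hypergeometric distribution.
    The P-value sums the probabilities of all tables that are no more likely than the
    observed one (a small relative tolerance guards against floating-point ties).
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    total = comb(n, row1)
    support = range(max(0, row1 + col1 - n), min(row1, col1) + 1)
    probs = {k: comb(col1, k) * comb(n - col1, row1 - k) / total for k in support}
    p_observed = probs[a]
    return sum(p for p in probs.values() if p <= p_observed * (1 + 1e-7))


print(f"p-value = {fisher_exact_two_sided(12, 8, 36, 180):.3e}")
```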

What is a P-value?

• Technicalities:
  – Null hypothesis: the odds ratio equals one
  – Alternative hypothesis: the odds ratio is different from one
• The P-value is the probability of getting a test statistic (here, X-squared) at least as extreme as the observed one, or equivalently of observing a data set at least as extreme as ours, if the null hypothesis is true.
• Usually the P-value is compared to a cut-off, say 0.05, and if the P-value is smaller than the cut-off, the result is called statistically significant.

What is a P-value?

• P-values are used for testing hypotheses: one P-value per hypothesis!
  – Does smoking predispose individuals to lung cancer?
  – Does a larger exposure (more cigarettes smoked) give rise to a larger risk?
• If there is no hypothesis to be tested, do not generate a P-value!
• The P-value is not the whole story. Pay attention to the effect size, too. More on this later.

What is a confidence interval?

• The 95% confidence interval can be thought of as the counterpart of a P-value with a cut-off of 0.05.
• If the same experiment were repeated, say, a hundred times, the 95% confidence intervals computed from those experiments would contain the true population value (of the OR) on average 95 times out of a hundred.
• If the 95% confidence interval for an odds ratio does not include one, the result is statistically significant at the 0.05 level.
• It gives an idea of how imprecise the result is.
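A simple way to get a confidence interval for an odds ratio is to work on the log scale. A minimal sketch using Woolf's normal-approximation method, which is swapped in here for illustration and is not the exact method behind the Fisher output above; for the smoking table it gives roughly 2.9 to 19.7, somewhat narrower than the exact interval.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def odds_ratio_woolf_ci(a, b, c, d, level=0.95):
    """Sample odds ratio with a Woolf (log-scale, normal approximation) confidence interval."""
    or_hat = (a * d) / (b * c)
    se_log_or = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # 1.96 for a 95% interval
    return or_hat, exp(log(or_hat) - z * se_log_or), exp(log(or_hat) + z * se_log_or)


or_hat, lower, upper = odds_ratio_woolf_ci(12, 8, 36, 180)
print(f"OR = {or_hat:.2f} (95% CI {lower:.2f}-{upper:.2f})")
includes_one = lower <= 1 <= upper
print("Statistically significant at the 0.05 level:", not includes_one)
```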

Which test to use - table (columns: type of your dependent variable)

Goal | Interval/ratio (normality assumed) | Interval/ratio (normality not assumed), ordinal | Dichotomy (binomial)
Compare two unpaired groups | Unpaired t test | Mann-Whitney test | Fisher's test
Compare two paired groups | Paired t test | Wilcoxon test | McNemar's test
Compare more than two unmatched groups | ANOVA | Kruskal-Wallis test | Chi-square test
Compare more than two matched groups | Repeated-measures ANOVA | Friedman test | Cochran's Q test
Find relationship between two variables | Pearson correlation | Spearman correlation | Cramer's V
Predict a value with one independent variable | Linear/non-linear regression | Non-parametric regression | Logistic regression
Predict a value with multiple independent variables or binomial variables | Multiple linear/non-linear regression | Poisson regression, survival analysis | Multiple logistic regression

Adapted from http://yatani.jp/HCIstats/HomePage

Adjusting for confounding

Stratification (strata; Mantel & Haenszel, 1959)

Occupation | Cases: non-smokers | Cases: smokers | Controls: non-smokers | Controls: smokers
Housewives and white-collars | 36 | 12 | 180 | 8
Other occupations | 10 | 6 | 56 | 5

Housewives and white-collars | Lung cancer | No lung cancer
Smokers | 12 | 8
Non-smokers | 36 | 180

Other occupations | Lung cancer | No lung cancer
Smokers | 6 | 5
Non-smokers | 10 | 56

Separate analysis

• Housewives and white-collars: OR = 7.4 (2.6-22.5)
• Other occupations: OR = 6.5 (1.4-32.9)

Stratified analysis

• Mantel-Haenszel test: a chi-square test with weighting over a stratification variable
  – OR = 7.2 (3.3-15.9)
  – The effect of smoking is significant even when the confounding variable (occupation) is adjusted for.
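The stratified estimate can be reproduced by hand. A minimal sketch of the Mantel-Haenszel common odds ratio; the confidence interval uses the Robins-Breslow-Greenland variance, which is an assumption about the method behind the slide, although it appears to match the reported 7.2 (3.3-15.9).

```python
from math import exp, log, sqrt
from statistics import NormalDist


def mantel_haenszel_or(strata, level=0.95):
    """Mantel-Haenszel common odds ratio with a Robins-Breslow-Greenland confidence interval.

    Each stratum is (a, b, c, d) = (exposed cases, exposed controls,
    unexposed cases, unexposed controls).
    """
    sum_r = sum_s = sum_pr = sum_ps_qr = sum_qs = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        r, s = a * d / n, b * c / n
        p, q = (a + d) / n, (b + c) / n
        sum_r += r
        sum_s += s
        sum_pr += p * r
        sum_ps_qr += p * s + q * r
        sum_qs += q * s
    or_mh = sum_r / sum_s
    var_log_or = (
        sum_pr / (2 * sum_r**2)
        + sum_ps_qr / (2 * sum_r * sum_s)
        + sum_qs / (2 * sum_s**2)
    )
    half_width = NormalDist().inv_cdf(0.5 + level / 2) * sqrt(var_log_or)
    return or_mh, exp(log(or_mh) - half_width), exp(log(or_mh) + half_width)


strata = [
    (12, 8, 36, 180),  # housewives and white-collars
    (6, 5, 10, 56),  # other occupations
]
or_mh, lower, upper = mantel_haenszel_or(strata)
print(f"OR_MH = {or_mh:.1f} ({lower:.1f}-{upper:.1f})")  # prints OR_MH = 7.2 (3.3-15.9)
```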


Regression modeling

Response variable | Example | Regression method
Continuous | Height of a person | Linear regression
Dichotomous | Disease / no disease [case-control studies] | Logistic regression
Count | Number of naevi [cohort studies, and others] | Poisson regression
Time | Time to death [cohort studies] | Cox's regression

• These are very general and flexible methods
• Several explanatory variables can be used in the model
• Interactions between explanatory variables can be modeled
• If you know these, you seldom need anything else, since e.g. the t-test, ANOVA, and ANCOVA can all be performed using linear (regression) models.
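The claim that a t-test is a special case of linear regression can be checked numerically. A minimal sketch with hypothetical height data and hypothetical helper names: with the group coded as a 0/1 dummy variable, the intercept equals the mean of group 0, the slope equals the difference in means, and the t statistic of the slope equals the equal-variance two-sample t statistic.

```python
from math import sqrt
from statistics import mean


def ols_fit(x, y):
    """Least-squares intercept and slope for y = b0 + b1 * x."""
    mx, my = mean(x), mean(y)
    b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    return my - b1 * mx, b1


def slope_t(x, y):
    """t statistic of the slope in simple linear regression."""
    b0, b1 = ols_fit(x, y)
    rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    sxx = sum((xi - mean(x)) ** 2 for xi in x)
    return b1 / sqrt(rss / (len(x) - 2) / sxx)


def pooled_t(g0, g1):
    """Two-sample t statistic assuming equal variances."""
    ss = sum((v - mean(g0)) ** 2 for v in g0) + sum((v - mean(g1)) ** 2 for v in g1)
    sp2 = ss / (len(g0) + len(g1) - 2)
    return (mean(g1) - mean(g0)) / sqrt(sp2 * (1 / len(g0) + 1 / len(g1)))


# Hypothetical heights (cm) in two groups; the group is coded as a 0/1 dummy variable
group0 = [170, 165, 180, 172]
group1 = [175, 185, 190, 181]
x = [0] * len(group0) + [1] * len(group1)
y = group0 + group1

b0, b1 = ols_fit(x, y)
print(f"intercept = {b0:.2f} (mean of group 0), slope = {b1:.2f} (difference in means)")
print(f"t from regression = {slope_t(x, y):.4f}, t from two-sample t-test = {pooled_t(group0, group1):.4f}")
```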

Logistic regression

• Regression:
  – Allows adjusting for several confounders and covariates at the same time
  – Different types for different purposes
• Linear, logistic, Poisson, survival time, ...
• Logistic regression:
  – The response (dependent) variable has two possible values (yes / no)
  – Estimates an odds ratio, a confidence interval, and a p-value for every variable or variable level.
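A logistic regression fit can be sketched with Newton-Raphson on grouped data. This is a minimal illustration, not the model behind the next slide (which also adjusts for age, using data not reproduced here): it fits lung cancer status on smoking and occupation (two strata) from the stratification table. The helper name is hypothetical, and in practice a statistics package would be used. In a case-control study the intercept is not interpretable, but the coefficient of smoking still estimates the log odds ratio.

```python
import numpy as np


def fit_logistic_grouped(X, cases, totals, max_iter=50, tol=1e-10):
    """Maximum-likelihood logistic regression for grouped (binomial) data via Newton-Raphson.

    X: design matrix (one row per covariate pattern, first column = intercept)
    cases: number of cases per row; totals: number of individuals per row.
    Returns coefficients (log odds ratios) and their standard errors.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        gradient = X.T @ (cases - totals * p)
        information = X.T @ (X * (totals * p * (1.0 - p))[:, None])
        step = np.linalg.solve(information, gradient)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    covariance = np.linalg.inv(X.T @ (X * (totals * p * (1.0 - p))[:, None]))
    return beta, np.sqrt(np.diag(covariance))


# Columns: intercept, smoker (1 = yes), other occupation (1 = yes, 0 = housewife or white-collar)
X = np.array([[1, 1, 0], [1, 0, 0], [1, 1, 1], [1, 0, 1]], dtype=float)
cases = np.array([12, 36, 6, 10], dtype=float)
totals = np.array([20, 216, 11, 66], dtype=float)

beta, se = fit_logistic_grouped(X, cases, totals)
or_smoking = np.exp(beta[1])
ci = np.exp(beta[1] + np.array([-1.96, 1.96]) * se[1])
print(f"Smoking OR adjusted for occupation: {or_smoking:.2f} (95% CI {ci[0]:.2f}-{ci[1]:.2f})")
```

The adjusted estimate should land between the two stratum-specific odds ratios and close to the Mantel-Haenszel estimate.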

Age, occupation and smoking

• The effect of smoking is adjusted for both age and occupation at the same time.
• Note that after adjustment the OR is higher than the raw OR!

Variable | OR (95% CI)
Age <45 | 1
Age 45-54 | 1.91 (0.61-6.75)
Age 55-64 | 2.05 (0.68-7.24)
Age >65 | 3.35 (1.07-12.18)
Occupation: housewife | 1
Occupation: white-collar | 0.92 (0.42-1.91)
Occupation: other | 0.97 (0.46-1.97)
Smoking: no | 1
Smoking: yes | 9.97 (4.22-25.28)

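Causal pathway and confounders

[Figure: causal pathway diagram connecting lung cancer with tobacco, alcohol, socioeconomic status, occupation, and CYP2D6 genotype.]

Causal pathway and confounders

[Figure: the causal pathway diagram again, with lung cancer, tobacco, alcohol, socioeconomic status, and CYP2D6 genotype.]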

Genotype and smoking

• Observation: Smoking is associated with lung cancer.
• Tobacco industry: the observed association between smoking and lung cancer could be explained by some cancer-predisposing genotype that also creates a craving for nicotine.

CYP2D6 genotype and smoking

Pharmacogenetics. 1998 Jun;8(3):227-38.

• Hypothesis: Carriers of CYP2D6 inactivating allele(s) metabolize chemicals in tobacco faster than others, which makes these individuals smoke more often than others.
• Observation: The risk of lung cancer for carriers of an inactivating mutation is 0.69 (95% CI = 0.52-0.90).

Regression modeling: Cox regression example

Clinical relevance

Statistical and clinical significance

• Even if the result is statistically significant, it may not be clinically significant
  – Minimal clinically important difference (MCID)
• The MCID has to be decided before the study
  – Sometimes it is known beforehand, sometimes not, in which case it has to be based on an educated guess.
• For case-control studies, the MCID can also be thought of as, e.g., how much some new predictors help in making the diagnosis.

COPD

• For forced expiratory volume in one second (FEV1), an increase of about 100 mL, which can be perceived by patients, is sometimes considered the MCID.
• Bronchodilators in healthy persons:
  – Salbutamol: FEV1 increase of 62 mL (0-152 mL)
• Bronchodilators in COPD patients (FEV1 in litres):
  – Pre-salbutamol: 1.29 (0.80-2.12)
  – Post-salbutamol: 1.53 (1.19-2.58)
  – Post-placebo: 1.40 (1.36-1.42)
  – Post-caffeine: 1.36 (1.31-1.41) [5 mg / kg, for asthma]
  – Post-indacaterol: 1.71 (1.63-1.78)
  – Post-formoterol: 1.65 (1.59-1.70)

Sources: COPD. 2005 Mar;2(1):111-24; Chest. 2008 Aug;134(2):387-93; Caffeine for asthma (Cochrane Review)

COPD


Presenting results


Graphical representation of the table

Risk theatre

[Figure: risk theatre of lung cancer cases per 10 years among non-smokers and smokers (Doll & Hill).]

Statins – base rate (absolute risk)

Statins – relative risk

Statins – benefits versus adverse effects

Statins – risk theatre

Statins – NNI / risk theatre
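The difference between absolute and relative risk can be made concrete with a few lines of arithmetic. A minimal sketch with purely hypothetical numbers (40 versus 30 events per 1000 people over ten years), not taken from any statin trial; it also computes the number needed to treat (NNT), a standard summary of absolute benefit. Working with counts per 1000, as a risk theatre does, keeps the arithmetic exact.

```python
from math import ceil


def risk_summary(events_control, events_treated, n=1000):
    """Absolute and relative risk measures from events per n people in each arm."""
    difference = events_control - events_treated
    return {
        "risk_control": events_control / n,
        "risk_treated": events_treated / n,
        "absolute_risk_reduction": difference / n,
        "relative_risk": events_treated / events_control,
        "relative_risk_reduction": difference / events_control,
        # Number needed to treat: people treated for one fewer event (rounded up)
        "number_needed_to_treat": ceil(n / difference) if difference > 0 else None,
    }


# Hypothetical example: 40 vs 30 heart attacks per 1000 people over ten years
summary = risk_summary(events_control=40, events_treated=30)
for name, value in summary.items():
    print(f"{name}: {value}")
```

The same hypothetical effect sounds large as a 25% relative risk reduction but small as a 1 percentage point absolute reduction, with 100 people treated for one fewer event.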

Reporting guidelines

Reporting guidelines

• STROBE
  • STrengthening the Reporting of OBservational studies in Epidemiology
• CONSORT
  • CONsolidated Standards of Reporting Trials
• Follow these!

STROBE: Methods (item number in parentheses)

• Study design (4): Present key elements of study design early in the paper
• Setting (5): Describe the setting, locations, and relevant dates, including periods of recruitment, exposure, follow-up, and data collection
• Participants (6):
  (a) Give the eligibility criteria, and the sources and methods of case ascertainment and control selection. Give the rationale for the choice of cases and controls
  (b) For matched studies, give matching criteria and the number of controls per case
• Variables (7): Clearly define all outcomes, exposures, predictors, potential confounders, and effect modifiers. Give diagnostic criteria, if applicable
• Data sources/measurement (8*): For each variable of interest, give sources of data and details of methods of assessment (measurement). Describe comparability of assessment methods if there is more than one group
• Bias (9): Describe any efforts to address potential sources of bias
• Study size (10): Explain how the study size was arrived at
• Quantitative variables (11): Explain how quantitative variables were handled in the analyses. If applicable, describe which groupings were chosen and why
• Statistical methods (12):
  (a) Describe all statistical methods, including those used to control for confounding
  (b) Describe any methods used to examine subgroups and interactions
  (c) Explain how missing data were addressed
  (d) If applicable, explain how matching of cases and controls was addressed
  (e) Describe any sensitivity analyses

STROBE: Results (item number in parentheses)

• Participants (13*):
  (a) Report numbers of individuals at each stage of study, e.g. numbers potentially eligible, examined for eligibility, confirmed eligible, included in the study, completing follow-up, and analysed
  (b) Give reasons for non-participation at each stage
  (c) Consider use of a flow diagram
• Descriptive data (14*):
  (a) Give characteristics of study participants (e.g. demographic, clinical, social) and information on exposures and potential confounders
  (b) Indicate number of participants with missing data for each variable of interest
• Outcome data (15*): Report numbers in each exposure category, or summary measures of exposure
• Main results (16):
  (a) Give unadjusted estimates and, if applicable, confounder-adjusted estimates and their precision (e.g. 95% confidence interval). Make clear which confounders were adjusted for and why they were included
  (b) Report category boundaries when continuous variables were categorized
  (c) If relevant, consider translating estimates of relative risk into absolute risk for a meaningful time period

Data management

Reproducible research

• To sum up the previous steps:
  • Data gathering
  • Data analysis
  • Data presentation
• Working habits:
  • Document everything
  • Everything is a text file
  • Save in an open file format
  • Files should be human readable
  • Tie your files together
  • Have a data management plan
    • Organization, (long-term) storage, availability
  • Use version control for all files
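Some of these habits can be supported with very little code. A minimal sketch, assuming a data table is saved as CSV (an open, human-readable text format) together with a SHA-256 checksum recorded next to it, so that undocumented changes to the file can be detected later; the file and helper names are hypothetical, and checksums complement version control rather than replace it.

```python
import csv
import hashlib
import tempfile
from pathlib import Path


def save_with_checksum(path, header, rows):
    """Write a table as CSV plus a SHA-256 checksum file next to it."""
    path = Path(path)
    with path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)
    digest = hashlib.sha256(path.read_bytes()).hexdigest()
    checksum_file = path.with_name(path.name + ".sha256")
    checksum_file.write_text(f"{digest}  {path.name}\n", encoding="utf-8")
    return digest


def is_unchanged(path):
    """Return True if the file still matches the checksum recorded next to it."""
    path = Path(path)
    recorded = path.with_name(path.name + ".sha256").read_text(encoding="utf-8").split()[0]
    return hashlib.sha256(path.read_bytes()).hexdigest() == recorded


with tempfile.TemporaryDirectory() as tmp:
    data_file = Path(tmp) / "smoking_cancer.csv"
    save_with_checksum(data_file, ["individual", "age", "sex", "smoking", "cancer"], [(1, 64, "M", "S", 1)])
    ok_before = is_unchanged(data_file)
    data_file.write_text("tampered\n", encoding="utf-8")  # simulate an undocumented edit
    ok_after = is_unchanged(data_file)
print(ok_before, ok_after)  # prints "True False"
```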

Clinical trials at Duke

• Potti et al. studied the chemosensitivity of cancer cell lines.
• The results were going to be applied in a clinical trial.
• And so it begins...
• http://bioinformatics.mdanderson.org/Supplements/ReproRsch-All/Modified/StarterSet/index.html

Summary in two minutes

• Coombes et al. delved into the analysis...
• Doxorubicin
  • Sensitive / resistant labels were reversed in the analysis
  • Some samples in the test data were duplicated
  • Some samples are labeled both sensitive and resistant
• Cisplatin and pemetrexed
  • Gene lists were off by one; the correct list does not differentiate the cell lines
  • Some genes are not on the arrays that were used
  • Sensitive / resistant labels are again reversed
• And the list goes on, see: http://arxiv.org/pdf/1010.1092.pdf

Meticulous documentation

• Protect the individuals recruited for the study
• Protect your co-workers and co-authors
• Protect yourself
• Work openly.
• Everybody makes mistakes. Embrace them and learn from them!

Wrap-up

Summary

• The analysis methods are coupled to the study design, which is itself affected by the hypotheses
• Write the analysis plan before you have the data
  • Prepare for small deviations, but don't change the major themes
• Learn regression methodology; it will serve you well
• Make a data management plan. Document everything
• Learn from mistakes. Everybody makes them.
• Also, protect the innocent. It's better to have a horrible end than horrors without end.