Significance testing and confidence intervals Ágnes Hajdu EPIET Introductory course 3.10.2011


Significance testing and confidence intervals

Ágnes Hajdu
EPIET Introductory course

3.10.2011


The idea of statistical inference

[Diagram: Population and Sample, connected by "Conclusions based on the sample", "Generalisation to the population" and "Hypotheses"]


Inferential statistics

• Uses patterns in the sample data to draw inferences about the population represented, accounting for randomness.

• Two basic approaches:
  – Hypothesis testing
  – Estimation

• Common goal: conclude on the effect of an independent variable (exposure) on a dependent variable (outcome).


The aim of a statistical test

To reach a scientific decision (“yes” or “no”) on a difference (or effect), on a probabilistic basis, using observed data.


Why significance testing?

Botulism outbreak in Italy: “The risk of illness was higher among diners who ate home preserved green olives (RR=3.6).”

Is the association due to chance?


The two hypotheses!

Null Hypothesis (H0): There is NO difference between the two groups (= no effect), e.g. RR = 1

Alternative Hypothesis (H1): There is a difference between the two groups (= there is an effect), e.g. RR = 3.6

When you perform a test of statistical significance, you either reject or do not reject the Null Hypothesis (H0).


Botulism outbreak in Italy

• Null hypothesis (H0): “There is no association between consumption of green olives and botulism.”

• Alternative hypothesis (H1): “There is an association between consumption of green olives and botulism.”


Hypothesis testing and the null hypothesis

• Tests of statistical significance
• Data not consistent with H0:
  – H0 can be rejected in favour of some alternative hypothesis H1 (the objective of our study).
• Data consistent with H0:
  – H0 cannot be rejected

You cannot say that H0 is true. You can only decide to reject it or not to reject it.


How to decide when to reject the null hypothesis?

H0 rejected using reported p value

p value = probability of observing our result (e.g. a difference between proportions or an RR), or a more extreme one, if the null hypothesis were true


p values – practicalities

Small p values = low degree of compatibility between H0 and the observed data: you reject H0, the test is significant

Large p values = high degree of compatibility between H0 and the observed data: you don’t reject H0, the test is not significant

We can never reduce to zero the probability that our result was observed by chance alone


Levels of significance – practicalities

We need a cut-off!

0.01 0.05 0.10

p value > 0.05 = H0 not rejected (non significant)

p value ≤ 0.05 = H0 rejected (significant)

BUT: Always give the exact p value rather than “significant” vs. “non-significant”.


Examples from the literature

• “The limit for statistical significance was set at p=0.05.”

• “There was a strong relationship (p<0.001).”

• “…, but it did not reach statistical significance (ns).”

• “The relationship was statistically significant (p=0.0361).”

p=0.05: agreed convention, not an absolute truth

“Surely, God loves the 0.06 nearly as much as the 0.05” (Rosnow and Rosenthal, 1991)


p = 0.05 and its errors

• Level of significance, usually p = 0.05

• p value used for decision making

But still 2 possible errors:

H0 should not be rejected, but it was rejected :

Type I or alpha error

H0 should be rejected, but it was not rejected :

Type II or beta error


Types of errors

• H0 is “true” but rejected: Type I or α error
• H0 is “false” but not rejected: Type II or β error

                               Truth: No diff              Truth: Diff
Decision based on the p value  (H0 to be not rejected)     (H0 to be rejected, H1)
No diff: H0 not rejected       Right decision (1-α)        Type II error (β)
Diff: H0 rejected (H1)         Type I error (α)            Right decision (1-β)


More on errors

• Probability of Type I error:
  – Value of α is determined in advance of the test
  – The significance level is the level of α error that we would accept (usually 0.05)

• Probability of Type II error:
  – Value of β depends on the size of effect (e.g. RR, OR) and sample size
  – 1-β: Statistical power of a study to detect an effect of a specified size (e.g. 0.80)
  – Fix β in advance: choose an appropriate sample size
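To make the link between α, β and sample size concrete, here is a minimal sketch of a sample size calculation for comparing two proportions. It assumes the usual normal approximation with equal group sizes; the function name and the risks of 15% and 5% are hypothetical and chosen only for illustration.

```python
import math
from statistics import NormalDist


def sample_size_two_proportions(p1, p2, alpha=0.05, power=0.80):
    """Approximate subjects needed per group (normal approximation, equal groups)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_beta = NormalDist().inv_cdf(power)           # power = 1 - beta
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2
    return math.ceil(n)


# Hypothetical risks: 15% among the exposed, 5% among the unexposed.
n_80 = sample_size_two_proportions(0.15, 0.05, power=0.80)
n_90 = sample_size_two_proportions(0.15, 0.05, power=0.90)
print(f"per group: {n_80} for 80% power, {n_90} for 90% power")
```

Asking for a smaller β (higher power) or trying to detect a smaller difference both increase the required sample size.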


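Even more on errors

[Figure: distributions of the test statistic T when H0 is true and when H1 is true, with a 2x2 table of the decision according to the p value (H0 or H1) against reality (H0 or H1): the two correct decisions ("ok"), the error of the 1st kind and the error of the 2nd kind, annotated with significance and power (1-β)]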


Principles of significance testing

• Formulate the H0

• Test your sample data against H0

• The p value tells you whether your data are consistent with H0, i.e., whether your sample data are consistent with a chance finding (large p value), or whether there is reason to believe that there is a true difference (association) between the groups you tested

• You can only reject H0, or fail to reject it!


Quantifying the association

• Test of association of exposure and outcome
• E.g. Chi2 test or Fisher’s exact test
• Comparison of proportions
• Chi2-value quantifies the association
• The larger the Chi2-value, the smaller the p value
  – the more the observed data deviate from the assumption of independence (no effect).

Chi-square value

= sum of all cells: for each cell, subtract the expected number from the observed number, square the difference, and divide by the expected number

$$\chi^2 = \sum_{\text{cells}} \frac{(\text{observed num.} - \text{expected num.})^2}{\text{expected num.}}$$


Botulism outbreak in Italy: 2x2 table

             Ill    Non ill   Total
Olives         9       43       52
No olives      4       79       83
Total         13      122      135
             10 %    90 %

Expected proportion of ill and not ill:

Olives:     52 x 10% ill, 52 x 90% non-ill
No olives:  83 x 10% ill, 83 x 90% non-ill

Expected number of ill and not ill for each cell:

             Ill    Non ill
Olives         5       47
No olives      8       75

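Chi-square value

Botulism outbreak in Italy

Each term corresponds to one cell of the 2x2 table (Olives / No olives by Ill / Non ill):

$$\chi^2 = \frac{(9 - 5.01)^2}{5.01} + \frac{(43 - 46.99)^2}{46.99} + \frac{(4 - 7.99)^2}{7.99} + \frac{(79 - 75.01)^2}{75.01} = 5.73$$

p = 0.016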
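The calculation above can be rechecked with a few lines of code. This is a minimal sketch, assuming the uncorrected Pearson chi-square with one degree of freedom; the function name is illustrative. With the olive data it should give a chi-square of about 5.73 and a p value close to the one reported on the slide.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    observed = [[a, b], [c, d]]
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    n = a + b + c + d
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count under independence: row total x column total / n.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom the upper tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Olives: 9 ill, 43 non ill. No olives: 4 ill, 79 non ill.
chi2, p_value = chi_square_2x2(9, 43, 4, 79)
print(f"chi-square = {chi2:.2f}, p = {p_value:.3f}")
```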


Botulism outbreak in Italy

“The relative risk (RR) of illness among diners who ate home preserved green olives was 3.6 (p=0.016).”

The p-value is smaller than the chosen significance level of α = 5%. → Null hypothesis can be rejected.

There is a 0.016 probability (16/1000) that an association at least as strong as the one observed could have occurred by chance, if there were no true association between eating olives and illness.

Epidemiology and statistics


Criticism on significance testing

“Epidemiological application need more than a decision as to whether chance alone could have produced association.” (Rothman et al. 2008)

→ Estimation of an effect measure (e.g. RR, OR) rather than significance testing.


Why estimation?

Botulism outbreak in Italy: “The risk of illness was higher among diners who ate home preserved green olives (RR=3.6).”

How confident can we be in the result?
What is the precision of our point estimate?


The epidemiologist needs measurements rather than probabilities

χ² is a test of association

OR, RR are measures of association on a continuous scale → infinite number of possible values

The best estimate = point estimate

Range of values allowing for random variability: confidence interval → precision of the point estimate


Confidence interval (CI)

Range of values, on the basis of the sample data, in which the population value (or true value) may lie.

• Frequently used formulation: “If the data collection and analysis could be replicated many times, the CI should include the true value of the measure 95% of the time.”


Confidence interval (CI)

Indicates the amount of random error in the estimate

Can be calculated for any “test statistic”, e.g.: means, proportions, ORs, RRs

e.g. CI for means: 95% CI = x̄ – 1.96 SE up to x̄ + 1.96 SE

[Figure: sampling distribution with the central area 1 - α between the lower and upper limits of the 95% CI and α/2 in each tail; α = 5%]
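As a small illustration of the formula for means, the sketch below computes the mean plus or minus 1.96 standard errors. The data values and the function name are hypothetical; the multiplier 1.96 assumes a normal approximation, and for small samples a t quantile would usually be preferred.

```python
import math
import statistics


def mean_confidence_interval(values, z=1.96):
    """Return (mean, lower, upper) using mean +/- z * SE, with SE = s / sqrt(n)."""
    mean = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(len(values))
    return mean, mean - z * se, mean + z * se


# Hypothetical incubation periods in hours (illustrative values only).
incubation_hours = [12, 18, 24, 24, 30, 36, 36, 42, 48, 60]
mean, lower, upper = mean_confidence_interval(incubation_hours)
print(f"mean = {mean:.1f}, 95% CI = {lower:.1f} to {upper:.1f}")
```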


CI terminology

RR = 1.45 (0.99 – 2.1)

Point estimate: 1.45
Confidence interval: 0.99 – 2.1
Lower confidence limit: 0.99
Upper confidence limit: 2.1


Width of confidence interval depends on …

• The amount of variability in the data

• The size of the sample

• The arbitrary level of confidence you desire for your study (usually 90%, 95%, 99%)

A common way to use CI regarding OR/RR is:
If 1.0 is included in CI → non significant
If 1.0 is not included in CI → significant


Looking at the CI

Study A: large sample, precise results, narrow CI – SIGNIFICANT
Study B: small size, large CI – NON SIGNIFICANT

Study A: effect close to NO EFFECT
Study B: no information about absence of large effect

[Figure: confidence intervals of studies A and B on an RR axis running from RR = 1 to large RR]


Are more studies better or worse?

• Decision making based on results from a collection of studies is not facilitated when each study is classified as a YES or NO decision.

[Figure: 20 studies with different results, plotted on an RR axis around RR = 1]

Need to look at the point estimate and its CI

But also consider its clinical or biological significance


Botulism outbreak in Italy

• How confident can we be in the result?
• Relative risk = 3.6 (point estimate)
• 95% CI for the relative risk: (1.17 ; 11.07)

We can be 95% confident that the interval from 1.17 to 11.07 includes the true relative risk: intervals constructed in this way would include the true value in 95% of replications.
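The relative risk and its interval can be recomputed from the 2x2 table (9 ill and 43 non ill among olive eaters, 4 ill and 79 non ill among the others). The sketch below assumes the common log method for the standard error of ln(RR); the slides do not say which method was used, although this one gives limits that agree with the values shown. The function name is illustrative.

```python
import math


def relative_risk_ci(a, b, c, d, z=1.96):
    """RR and CI for exposed (a ill, b not ill) versus unexposed (c ill, d not ill)."""
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    rr = risk_exposed / risk_unexposed
    # Standard error of ln(RR), log method (an assumption, see lead-in).
    se_log_rr = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


rr, lower, upper = relative_risk_ci(9, 43, 4, 79)
print(f"RR = {rr:.1f}, 95% CI {lower:.2f} to {upper:.2f}")
```

Because the interval is built on the log scale, it is not symmetric around the point estimate: the upper limit lies much further from 3.6 than the lower limit.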


Botulism outbreak in Italy

“The risk of illness was higher among diners who ate home preserved green olives (RR=3.6, 95% CI 1.17 to 11.07).”


The p-value (or CI) function

• A graph showing the p value for all possible values of the estimate (e.g. OR or RR).

• Quantitative overview of the statistical relation between exposure and disease for the set of data.

• All confidence intervals can be read from the curve.
• The function can be constructed from the confidence limits in Episheet.
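One way to sketch such a function from a point estimate and its 95% confidence limits is shown below. It assumes a normal approximation on the log scale and is not necessarily how Episheet computes the curve; the function name is hypothetical. By construction, the p value is 1 at the point estimate and about 0.05 at the two 95% limits.

```python
import math
from statistics import NormalDist


def p_value_function(estimate, lower, upper, hypothesised, z=1.96):
    """Two-sided p value for a hypothesised ratio, given an estimate and its 95% CI."""
    # Recover the standard error of ln(estimate) from the width of the CI.
    se_log = (math.log(upper) - math.log(lower)) / (2 * z)
    z_score = (math.log(estimate) - math.log(hypothesised)) / se_log
    return 2 * (1 - NormalDist().cdf(abs(z_score)))


# Botulism example from the slides: RR = 3.6, 95% CI 1.17 to 11.07.
for value in (0.5, 1.0, 2.0, 3.6, 8.0, 11.07):
    p = p_value_function(3.6, 1.17, 11.07, value)
    print(f"hypothesised RR = {value:5.2f}  p = {p:.3f}")
```

The p value at a hypothesised RR of 1 from this approximation need not match the chi-square p value exactly, since the two are based on different test statistics.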


Example: Chlordiazepoxide use and congenital heart disease

            C use    No C use
Cases          4        386
Controls       4       1250

OR = (4 x 1250) / (4 x 386) = 3.2

p = 0.08; 95% CI = 0.81–13

From Rothman K
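The odds ratio and its interval for this small study can be rechecked as follows. The sketch assumes Woolf's log method for the confidence limits, which is not stated in the source but gives limits that agree with the values quoted; the function and parameter names are illustrative.

```python
import math


def odds_ratio_ci(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls, z=1.96):
    """Odds ratio with a CI from Woolf's log method (an assumption, see lead-in)."""
    a, b, c, d = exposed_cases, unexposed_cases, exposed_controls, unexposed_controls
    odds_ratio = (a * d) / (b * c)
    # Standard error of ln(OR): square root of the sum of reciprocal cell counts.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Cases: 4 with C use, 386 without. Controls: 4 with C use, 1250 without.
odds_ratio, lower, upper = odds_ratio_ci(4, 386, 4, 1250)
print(f"OR = {odds_ratio:.1f}, 95% CI {lower:.2f} to {upper:.0f}")
```

With only four exposed cases and four exposed controls, the two terms 1/4 dominate the standard error, which is why the interval is so wide.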

[Figure: curve plotted against the odds ratio, marked with the point estimate 3.2, p = 0.08 and the 95% CI 0.81–13]

Example: Chlordiazepoxide use and congenital heart disease – large study

            C use    No C use
Cases        1090     14 910
Controls     1000     15 000

OR = (1090 x 15000) / (1000 x 14910) = 1.1

p = 0.04; 95% CI = 1.05–1.2

From Rothman K

Precision and strength of association

[Figure labels: Strength, Precision]

Confidence interval provides more information than p value

• Magnitude of the effect (strength of association)

• Direction of the effect (RR > or < 1)

• Precision of the point estimate of the effect (variability)

The p value cannot provide them!


What we have to evaluate the study

χ²: A test of association. It depends on sample size.

p value: Probability that equal (or more extreme) results can be observed by chance alone

OR, RR: Direction & strength of association; if > 1 risk factor, if < 1 protective factor (independently from sample size)

CI: Magnitude and precision of effect


Comments on p-values and CIs

• Presence of significance does not prove clinical or biological relevance of an effect.

• A lack of significance is not necessarily a lack of an effect: “Absence of evidence is not evidence of absence”.


Comments on p values and CIs

• A huge effect in a small sample or a small effect in a large sample can result in identical p values.

• A statistical test will give a significant result for almost any non-zero effect, however small, if the sample is big enough.

• p values and CIs do not provide any information on the possibility that the observed association is due to bias or confounding.


χ² and Relative Risk

         Cases   Non cases   Total
E           9        51        60
NE          5        55        60
Total      14       106       120

χ² = 1.3, p = 0.13, RR = 1.8, 95% CI [0.6 – 4.9]

         Cases   Non cases   Total
E          90       510       600
NE         50       550       600
Total     140      1060      1200

χ² = 12, p = 0.0002, RR = 1.8, 95% CI [1.3 – 2.5]

« Too large a difference and you are doomed to statistical significance »
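The contrast between the two tables can be reproduced with a short sketch: multiplying every cell by ten leaves the relative risk unchanged but makes the chi-square value ten times larger. The function name is illustrative, and the shortcut formula used for the 2x2 chi-square is the uncorrected Pearson statistic.

```python
def rr_and_chi_square(a, b, c, d):
    """RR and Pearson chi-square for exposed (a cases, b non cases) vs unexposed (c, d)."""
    n = a + b + c + d
    rr = (a / (a + b)) / (c / (c + d))
    # Shortcut for a 2x2 table, equivalent to summing (O - E)^2 / E over the four cells.
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return rr, chi2


small = rr_and_chi_square(9, 51, 5, 55)
large = rr_and_chi_square(90, 510, 50, 550)
print(f"small study: RR = {small[0]:.1f}, chi-square = {small[1]:.1f}")
print(f"large study: RR = {large[0]:.1f}, chi-square = {large[1]:.1f}")
```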


Common source outbreak suspected

Exposure   Cases   Non cases   AR%
Yes          15       20       42.8%
No           50      200       20.0%
Total        65      220

χ² = 9.1, p = 0.002
RR = 2.1, 95% CI = 1.4–3.4

23% of the cases (15 of 65) were exposed.

REMEMBER: These values do not provide any information on the possibility that the observed association is due to bias or confounding.
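The arithmetic behind this table can be retraced in a few lines. This is only a sketch with illustrative names; it reads the 23% figure as the share of all cases who were exposed (15 of 65), which is an interpretation of the slide rather than something it states.

```python
def attack_rate(cases, non_cases):
    """Attack rate = cases / (cases + non cases)."""
    return cases / (cases + non_cases)


ar_exposed = attack_rate(15, 20)      # exposed: 15 cases, 20 non cases
ar_unexposed = attack_rate(50, 200)   # unexposed: 50 cases, 200 non cases
rr = ar_exposed / ar_unexposed
share_of_cases_exposed = 15 / (15 + 50)

print(f"AR exposed = {ar_exposed:.1%}, AR unexposed = {ar_unexposed:.1%}")
print(f"RR = {rr:.1f}, cases exposed = {share_of_cases_exposed:.0%}")
```

Even with a significant association, most cases here were not exposed, so this exposure alone cannot account for the bulk of the outbreak.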


Recommendations

• Always look at the raw data (2x2-table). How many cases can be explained by the exposure?

• Interpret with caution associations that achieve statistical significance.

• Double caution if this statistical significance is not expected.

• Use confidence intervals to describe your results.

• Report p values precisely.


Suggested reading

• KJ Rothman, S Greenland, TL Lash, Modern Epidemiology, Lippincott Williams & Wilkins, Philadelphia, PA, 2008

• SN Goodman, R Royall, Evidence and Scientific Research, AJPH 78, 1568, 1988

• SN Goodman, Toward Evidence-Based Medical Statistics. 1: The P Value Fallacy, Ann Intern Med. 130, 995, 1999

• C Poole, Low P-Values or Narrow Confidence Intervals: Which are more Durable? Epidemiology 12, 291, 2001


Previous lecturers
