significance testing ioannis karagiannis (based on previous epiet material) 18 th epiet/euphem...

Significance testing

Ioannis Karagiannis (based on previous EPIET material)

18th EPIET/EUPHEM Introductory course, 28.09.2012

The idea of statistical inference

Population

Sample

Conclusions based on the sample

Generalisation to the population

Hypotheses

Inferential statistics

• Uses patterns in the sample data to draw inferences about the population represented, accounting for randomness

• Two basic approaches:
  – Hypothesis testing
  – Estimation

• Common goal: conclude on the effect of an independent variable on a dependent variable

The aim of a statistical test

To reach a deterministic decision (“yes” or “no”) about observed data on a probabilistic basis.

Why significance testing?

Norovirus outbreak on a Greek island: “The risk of illness was higher among people who ate raw seafood (RR=21.5).”

Is the association due to chance?

The two hypotheses

Null Hypothesis (H0): there is NO difference between the two groups (= no effect), i.e. RR = 1

Alternative Hypothesis (H1): there is a difference between the two groups (= there is an effect), e.g. RR = 21.5

When you perform a test of statistical significance, you reject or do not reject the Null Hypothesis (H0).

Norovirus on a Greek island

• Null hypothesis (H0): “There is no association between consumption of raw seafood and illness.”

• Alternative hypothesis (H1): “There is an association between consumption of raw seafood and illness.”

Hypothesis testing

• Tests of statistical significance

• Data not consistent with H0:
  – H0 can be rejected in favour of some alternative hypothesis H1 (the objective of our study).

• Data consistent with H0:
  – H0 cannot be rejected

You cannot say that H0 is true. You can only decide to reject it or not reject it.

p value

p value = probability of observing our result (e.g. a difference between proportions or an RR), or a more extreme one, if the null hypothesis were true

H0 rejected using reported p value

p values – practicalities

Low p values = low degree of compatibility between H0 and the observed data: association unlikely to be due to chance → you reject H0, the test is significant

High p values = high degree of compatibility between H0 and the observed data: association likely to be due to chance → you don’t reject H0, the test is not significant

Levels of significance – practicalities

We need a cut-off!

1% 5% 10%

p value > 0.05 = H0 not rejected (non-significant)

p value ≤ 0.05 = H0 rejected (significant)

BUT: always give the exact p value rather than “significant” vs. “non-significant”.

Examples from the literature

• “The limit for statistical significance was set at p=0.05.”

• “There was a strong relationship (p<0.001).”

• “…, but it did not reach statistical significance (ns).”

• “The relationship was statistically significant (p=0.0361)”

p = 0.05: agreed convention, not an absolute truth

“Surely, God loves the 0.06 nearly as much as the 0.05” (Rosnow and Rosenthal, 1991)

p = 0.05 and its errors

• Level of significance, usually p = 0.05

• p value used for decision making

But still 2 possible errors:

H0 should not be rejected, but it was rejected: Type I or alpha error

H0 should be rejected, but it was not rejected: Type II or beta error

Types of errors

• H0 is “true” but rejected: Type I or α error
• H0 is “false” but not rejected: Type II or β error

Decision based on the p value    Truth: no diff (H0 not to be rejected)    Truth: diff (H0 to be rejected, H1)
H0 not rejected (no diff)        Right decision (1-α)                      Type II error (β)
H0 rejected (H1, diff)           Type I error (α)                          Right decision (1-β)
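The meaning of the Type I error can be illustrated with a small simulation. The sketch below is an illustration only, not part of the original material: it assumes normally distributed data with a known standard deviation of 1 and uses a simple two-sided z-test, and all function names are hypothetical. Every simulated study is generated with H0 true, so each rejection is a Type I error, and the rejection rate should land close to the chosen α.

```python
import math
import random


def two_sided_p_from_z(z):
    """Two-sided p value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def simulate_type_i_error(n_sim=2000, n=30, alpha=0.05, seed=1):
    """Fraction of simulated studies that reject H0 although H0 is true."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sim):
        # Data generated under H0: true mean 0, known standard deviation 1
        # (an assumption made only for this illustration).
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z = (sum(sample) / n) * math.sqrt(n)
        if two_sided_p_from_z(z) <= alpha:
            rejections += 1
    return rejections / n_sim


type_i_rate = simulate_type_i_error()
print(f"Observed Type I error rate: {type_i_rate:.3f}")
```

Rerunning with another `alpha` (e.g. 0.01 or 0.10) should move the rejection rate accordingly, which is the sense in which α is fixed by the analyst rather than by the data.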

More on errors

• Probability of Type I error:
  – Value of α is determined in advance of the test
  – The significance level is the level of α error that we would accept (usually 0.05)

• Probability of Type II error:
  – Value of β depends on the size of effect (e.g. RR, OR) and sample size
  – 1-β: statistical power of a study to detect an effect of a specified size (e.g. 0.80)
  – Fix β in advance: choose an appropriate sample size
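How β, effect size and sample size are linked can be sketched with a normal approximation. The example below is only a rough illustration under stated assumptions: it compares two group means, expresses the effect as a standardised difference (difference divided by the standard deviation) rather than as an RR or OR, and ignores the negligible contribution of the opposite tail. The function names are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution


def approx_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power (1 - beta) of a two-sided, two-group comparison of means."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    return STD_NORMAL.cdf(effect_size * math.sqrt(n_per_group / 2) - z_crit)


def n_per_group_for_power(effect_size, power=0.80, alpha=0.05):
    """Smallest group size reaching the requested power under the same approximation."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_beta = STD_NORMAL.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)


n_needed = n_per_group_for_power(effect_size=0.5, power=0.80)
print(n_needed, round(approx_power(0.5, n_needed), 3))
```

Halving the effect size roughly quadruples the required sample size, which is why β has to be considered at the design stage.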

Quantifying the association

• Test of association of exposure and outcome
• E.g. chi2 test or Fisher’s exact test
• Comparison of proportions
• Chi2 value quantifies the association
• The larger the chi2 value, the smaller the p value
  – the more the observed data deviate from the assumption of independence (no effect).

Chi-square value

$$\chi^2 = \sum \frac{(\text{observed num.} - \text{expected num.})^2}{\text{expected num.}}$$

The sum runs over all cells of the table.
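The chi-square value can be computed in a few lines. The sketch below derives the expected number in each cell from the row and column totals (row total x column total / grand total) and then sums the cell contributions. The two small tables are hypothetical and serve only to show that the statistic is 0 under perfect independence and grows as the table departs from it.

```python
def expected_counts(observed):
    """Expected cell counts under independence of rows and columns."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square(observed):
    """Pearson chi-square: sum over cells of (observed - expected)^2 / expected."""
    expected = expected_counts(observed)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )


# Hypothetical tables, for illustration only
balanced = [[10, 10], [10, 10]]  # observed equals expected in every cell
skewed = [[15, 5], [5, 15]]      # clear departure from independence

print(chi_square(balanced), chi_square(skewed))
```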

Norovirus on a Greek island: 2x2 table

                  Ill    Non ill   Total
Raw seafood        29          9      38
No raw seafood      5        136     141
Total              34        145     179
                  19%        81%

Expected proportion of ill and not ill:

Raw seafood: 38 x 19% ill, 38 x 81% non-ill
No raw seafood: 141 x 19% ill, 141 x 81% non-ill

Expected number of ill and not ill for each cell (row total x column total / 179):

                  Ill    Non ill
Raw seafood      7.22      30.78
No raw seafood  26.78     114.22

Chi-square calculation

                  Ill                  Non ill
Raw seafood       (29-7.22)²/7.22      (9-30.78)²/30.78
No raw seafood    (5-26.78)²/26.78     (136-114.22)²/114.22

χ² ≈ 103, p < 0.001
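The calculation for the norovirus table can be checked with a short script. This is a sketch, assuming the uncorrected Pearson chi-square (no continuity correction) and using the standard shortcut formula for 2x2 tables. For one degree of freedom the p value can be obtained from the complementary error function, so no statistics package is needed. Computed from unrounded expected counts the statistic is about 103. Hand calculations with rounded expected counts can give a somewhat different value, and p remains far below 0.001 either way.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for the 2x2 table [[a, b], [c, d]], no continuity correction."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def p_value_df1(chi2):
    """Upper-tail probability of a chi-square distribution with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))


# Raw seafood: 29 ill, 9 non ill; no raw seafood: 5 ill, 136 non ill
chi2 = chi_square_2x2(29, 9, 5, 136)
p = p_value_df1(chi2)
print(f"chi2 = {chi2:.1f}, p = {p:.2e}")
```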

Norovirus on a Greek island

“The attack rate of illness among consumers of raw seafood was 21.5 times higher than among non-consumers of these food items (p<0.001).”

The p value is smaller than the chosen significance level of α = 5%. → The null hypothesis is rejected.

There is a < 0.001 probability (<1/1000) that an association at least as strong as the one observed could have occurred by chance, if there were no true association between eating imported raw seafood and illness.

C2012 vs facilitators

The ultimate (eye) test.

H0: the proportion of facilitators wearing glasses during the Tuesday morning sessions was equal to the proportion of fellows wearing glasses.

H1: the above proportions were different.

C2012 vs facilitators

              Glasses   No glasses   Total
Fellow             11           27      38
Facilitator         6            8      14
Total              17           35      52
                  33%          67%

Expected proportion with glasses (+ve) and without glasses (-ve):

Fellow: 38 x 33% +ve, 38 x 67% -ve
Facilitator: 14 x 33% +ve, 14 x 67% -ve

Expected number with and without glasses for each cell (row total x column total / 52):

              Glasses   No glasses
Fellow          12.42        25.58
Facilitator      4.58         9.42

Chi-square calculation

              Glasses               No glasses
Fellow        (11-12.42)²/12.42     (27-25.58)²/25.58
Facilitator   (6-4.58)²/4.58        (8-9.42)²/9.42

χ² ≈ 0.90, p = 0.343
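Because one expected count in this table is below 5, Fisher’s exact test (mentioned earlier as an alternative to the chi2 test) is a natural cross-check. The sketch below is an illustrative implementation, not the method used in the lecture: it fixes the table margins, computes the hypergeometric probability of every possible table, and sums the probabilities of all tables that are no more likely than the observed one (one common definition of the two-sided p value). The function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x,
        # given fixed row and column totals.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Small relative tolerance so that ties with the observed table are included.
    p = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )
    return min(1.0, p)


# Fellows: 11 with glasses, 27 without; facilitators: 6 with glasses, 8 without
p_glasses = fisher_exact_two_sided(11, 27, 6, 8)
print(f"Fisher's exact p = {p_glasses:.3f}")
```

Both tests lead to the same decision here: H0 is not rejected at the 5% level.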

t-test

• Used to compare means of a continuous variable in two different groups

• Assumes normal distribution

t-test

• H0: fellows with glasses do not tend to sit further in the back of the room compared to fellows without glasses

• H1: fellows with glasses tend to sit further in the back of the room compared to fellows without glasses
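A minimal sketch of the corresponding test statistic follows. It assumes a classical two-sample t-test with pooled variance (equal variances in both groups) and uses invented row numbers purely for illustration, since the actual seating data are not given here. The function name is hypothetical.

```python
import math
from statistics import mean, variance


def pooled_t_statistic(group_a, group_b):
    """Two-sample t statistic with pooled variance, plus its degrees of freedom."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / standard_error, df


# HYPOTHETICAL row numbers (1 = front row); not data from the course
rows_glasses = [3, 4, 4, 5, 6]
rows_no_glasses = [1, 2, 2, 3, 4]

t_stat, df = pooled_t_statistic(rows_glasses, rows_no_glasses)
print(f"t = {t_stat:.2f}, df = {df}")
```

Since H1 is one-sided (“further in the back”), the t value would be compared with the one-sided critical value for the given degrees of freedom, e.g. 1.860 for df = 8 at the 5% level according to standard t tables.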

Epidemiology and statistics

Criticism of significance testing

“Epidemiological applications need more than a decision as to whether chance alone could have produced an association.” (Rothman et al. 2008)

Estimation of an effect measure (e.g. RR, OR) rather than significance testing.
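Estimation can be illustrated with the norovirus table used earlier. The sketch below computes the risk ratio together with an approximate 95% confidence interval. It assumes the usual large-sample interval on the log scale (standard error of ln RR), which is a standard textbook method and not necessarily the one used in the outbreak report. The function name is hypothetical.

```python
import math


def risk_ratio_with_ci(a, b, c, d, z=1.96):
    """Risk ratio and approximate 95% CI for the 2x2 table
    [[a, b], [c, d]] = [[exposed ill, exposed non ill], [unexposed ill, unexposed non ill]]."""
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    rr = risk_exposed / risk_unexposed
    # Large-sample standard error of ln(RR)
    se_log_rr = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# Raw seafood: 29 ill, 9 non ill; no raw seafood: 5 ill, 136 non ill
rr, lower, upper = risk_ratio_with_ci(29, 9, 5, 136)
print(f"RR = {rr:.1f}, 95% CI {lower:.1f} to {upper:.1f}")
```

Unlike a bare p value, the interval shows both the size of the effect and how precisely it was estimated. An interval that excludes 1 corresponds to rejecting H0 at the 5% level.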

Suggested reading

• KJ Rothman, S Greenland, TL Lash, Modern Epidemiology, Lippincott Williams & Wilkins, Philadelphia, PA, 2008

• SN Goodman, R Royall, Evidence and Scientific Research, AJPH 78, 1568, 1988

• SN Goodman, Toward Evidence-Based Medical Statistics. 1: The P Value Fallacy, Ann Intern Med. 130, 995, 1999

• C Poole, Low P-Values or Narrow Confidence Intervals: Which are more Durable? Epidemiology 12, 291, 2001

Previous lecturers
