Significance testing
Ioannis Karagiannis (based on previous EPIET material)
18th EPIET/EUPHEM Introductory course, 28.09.2012
The idea of statistical inference
Sample
Population
Conclusions based on the sample
Generalisation to the population
Hypotheses
2
Inferential statistics
• Uses patterns in the sample data to draw inferences about the population represented, accounting for randomness
• Two basic approaches:
– Hypothesis testing
– Estimation
• Common goal: draw conclusions about the effect of an independent variable on a dependent variable
3
The aim of a statistical test
To reach a deterministic decision (“yes” or “no”) about observed data on a probabilistic basis.
4
Why significance testing?
Norovirus outbreak on a Greek island: “The risk of illness was higher among people who ate raw seafood (RR=21.5).”
Is the association due to chance?
5
The two hypotheses
Null Hypothesis (H0)
There is NO difference between the two groups
(= no effect)
(RR=1)
Alternative Hypothesis (H1)
There is a difference between the two groups
(= there is an effect)
(e.g.: RR=21.5)
When you perform a test of statistical significance, you reject or do not reject the Null Hypothesis (H0).
6
Norovirus on a Greek island
• Null hypothesis (H0): “There is no association between consumption of raw seafood and illness.”
• Alternative hypothesis (H1): “There is an association between consumption of raw seafood and illness.”
7
Hypothesis testing
• Tests of statistical significance
• Data not consistent with H0:
– H0 can be rejected in favour of some alternative hypothesis H1 (the objective of our study).
• Data are consistent with the H0:
– H0 cannot be rejected
You cannot say that the H0 is true. You can only decide to reject it or not reject it.
8
p value
p value = probability of observing our result (e.g. a difference between proportions or a RR), or a more extreme one, if the null hypothesis were true
H0 rejected using reported p value
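A minimal sketch of "our result or more extreme values under H0", using a purely hypothetical example that is not from the course data: 9 "successes" in 10 trials, with H0 stating that success and failure are equally likely. The function name and the numbers are assumptions for illustration only.

```python
from math import comb

def binomial_two_sided_p(successes: int, trials: int) -> float:
    """Two-sided p value under H0: success probability = 0.5."""
    # Probability of every possible outcome if H0 were true
    probs = [comb(trials, k) / 2 ** trials for k in range(trials + 1)]
    observed = probs[successes]
    # Add up all outcomes that are at least as unlikely as the observed one
    return sum(p for p in probs if p <= observed * (1 + 1e-9))

# Hypothetical data: 9 successes in 10 trials
p = binomial_two_sided_p(successes=9, trials=10)
print(f"p = {p:.4f}")  # prints "p = 0.0215"
```

The outcomes counted as "at least as extreme" are 0, 1, 9 and 10 successes, so the p value is 22/1024.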
9
p values – practicalities
Low p values = low degree of compatibility between H0 and the observed data: association unlikely to be by chance → you reject H0, the test is significant
High p values = high degree of compatibility between H0 and the observed data: association likely to be by chance → you don’t reject H0, the test is not significant
10
Levels of significance – practicalities
We need a cut-off!
1% 5% 10%
p value > 0.05 = H0 not rejected (non significant)
p value ≤ 0.05 = H0 rejected (significant)
BUT: Always give the exact p value rather than “significant” vs. “non-significant”.
11
Examples from the literature
• “The limit for statistical significance was set at p=0.05.”
• “There was a strong relationship (p<0.001).”
• “…, but it did not reach statistical significance (ns).”
• “The relationship was statistically significant (p=0.0361)”
p = 0.05: agreed convention, not an absolute truth
“Surely, God loves the 0.06 nearly as much as the 0.05” (Rosnow and Rosenthal, 1991)
12
p = 0.05 and its errors
• Level of significance, usually p = 0.05
• p value used for decision making
But still 2 possible errors:
H0 should not be rejected, but it was rejected: Type I or alpha error
H0 should be rejected, but it was not rejected: Type II or beta error
13
Types of errors
• H0 is “true” but rejected: Type I or α error
• H0 is “false” but not rejected: Type II or β error

Decision based on the p value | Truth: no diff (H0 not to be rejected) | Truth: diff (H0 to be rejected, H1)
H0 not rejected (no diff)     | Right decision (1-α)                   | Type II error (β)
H0 rejected (H1) (diff)       | Type I error (α)                       | Right decision (1-β)
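The Type I error can be illustrated by simulation. This is a sketch under stated assumptions, not part of the original course material: two groups with the same true risk (so H0 is true), compared with a two-sided z test for two proportions. The risk of 0.3, the group size of 200, the number of runs and the function names are all hypothetical choices.

```python
import random
from math import erfc, sqrt

def two_proportion_p(x1, n1, x2, n2):
    """Two-sided p value of a z test comparing two proportions."""
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        return 1.0  # no variation at all: nothing to test
    z = (x1 / n1 - x2 / n2) / se
    return erfc(abs(z) / sqrt(2))  # P(|Z| >= |z|) for a standard normal Z

def type_one_error_rate(risk=0.3, n=200, alpha=0.05, runs=2000, seed=1):
    """Share of simulated studies that reject H0 although H0 is true."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(runs):
        # Both groups share the same true risk, so any difference is chance
        ill_a = sum(rng.random() < risk for _ in range(n))
        ill_b = sum(rng.random() < risk for _ in range(n))
        if two_proportion_p(ill_a, n, ill_b, n) <= alpha:
            rejections += 1
    return rejections / runs

rate = type_one_error_rate()
print(f"H0 rejected in {rate:.1%} of simulated studies")
```

With a cut-off of 0.05, the share of wrongly rejected null hypotheses should come out close to 5%, which is what "Type I or α error" refers to.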
14
More on errors
• Probability of Type I error:
– Value of α is determined in advance of the test
– The significance level is the level of α error that we would accept (usually 0.05)
• Probability of Type II error:
– Value of β depends on the size of effect (e.g. RR, OR) and sample size
– 1-β: Statistical power of a study to detect an effect of a specified size (e.g. 0.80)
– Fix β in advance: choose an appropriate sample size
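How power (1-β) depends on sample size can be sketched with a normal approximation for comparing two proportions. The formula used here is a common textbook approximation, not something stated in the slides, and the risks of 10% and 20% are hypothetical values chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist

def approx_power(p1, p2, n_per_group, alpha=0.05):
    """Approximate power of a two-sided test comparing two proportions.

    Normal approximation with equal group sizes; the small chance of
    rejecting H0 in the wrong direction is ignored.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)  # 1.96 for alpha = 0.05
    se = sqrt(p1 * (1 - p1) / n_per_group + p2 * (1 - p2) / n_per_group)
    return z.cdf(abs(p1 - p2) / se - z_crit)

# Hypothetical risks: 10% in the unexposed, 20% in the exposed
for n in (50, 100, 200, 400):
    print(n, round(approx_power(0.10, 0.20, n), 2))
```

Under these assumptions, roughly 200 people per group are needed to reach a power of about 0.80, and smaller groups leave a larger β.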
15
Quantifying the association
• Test of association of exposure and outcome
• E.g. chi-square test or Fisher’s exact test
• Comparison of proportions
• Chi-square value quantifies the association
• The larger the chi-square value, the smaller the p value
– the more the observed data deviate from the assumption of independence (no effect).
16
Chi-square value
$$\chi^2 = \sum \frac{(\text{observed num.} - \text{expected num.})^2}{\text{expected num.}}$$
17
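Norovirus on a Greek island: 2x2 table
                 | Ill | Non ill | Total
Raw seafood      |  29 |       9 |    38
No raw seafood   |   5 |     136 |   141
Total            |  34 |     145 |   179
Proportion       | 19% |     81% |
Expected proportion of ill and not ill:
Raw seafood: 38 x 19% ill, 38 x 81% non-ill
No raw seafood: 141 x 19% ill, 141 x 81% non-ill
Expected number of ill and not ill for each cell (using the unrounded proportions 34/179 and 145/179):
                 | Ill   | Non ill
Raw seafood      |  7.22 |   30.78
No raw seafood   | 26.78 |  114.22
18
Chi-square calculation
                 | Ill                 | Non ill
Raw seafood      | (29-7.22)²/7.22     | (9-30.78)²/30.78
No raw seafood   | (5-26.78)²/26.78    | (136-114.22)²/114.22
χ² = 103, p < 0.001
19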
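The chi-square calculation for the norovirus table can be reproduced with a short sketch. It uses unrounded expected counts, so the result can differ from a hand calculation with rounded values. The function name is hypothetical, and the p value relies on the fact that a 2x2 table has 1 degree of freedom.

```python
from math import erfc, sqrt

def chi_square_2x2(table):
    """Pearson chi-square (no continuity correction) for a 2x2 table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    expected = []
    for i, row in enumerate(table):
        expected_row = []
        for j, observed in enumerate(row):
            # Expected count if exposure and illness were independent
            exp = row_totals[i] * col_totals[j] / total
            expected_row.append(exp)
            chi2 += (observed - exp) ** 2 / exp
        expected.append(expected_row)
    # Upper tail of the chi-square distribution with 1 degree of freedom
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value, expected

# Rows: raw seafood, no raw seafood; columns: ill, non ill
chi2, p_value, expected = chi_square_2x2([[29, 9], [5, 136]])
print(f"chi-square = {chi2:.1f}, p = {p_value:.2g}")
print([[round(e, 1) for e in row] for row in expected])
```

The p value is far below 0.001, so the conclusion of rejecting H0 does not depend on how the expected counts were rounded.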
Norovirus on a Greek island
“The attack rate of illness among consumers of raw seafood was 21.5 times higher than among non consumers of these food items (p<0.001).”
The p value is smaller than the chosen significance level of α = 5%. → The null hypothesis is rejected.
There is a < 0.001 probability (<1/1000) that the observed association, or a stronger one, could have occurred by chance if there were no true association between eating imported raw seafood and illness.
20
C2012 vs facilitators
The ultimate (eye) test.
H0: the proportion of facilitators wearing glasses during the Tuesday morning sessions was equal to the proportion of fellows wearing glasses.
H1: the above proportions were different.
21
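C2012 vs facilitators
                 | Glasses | No glasses | Total
Fellow           |      11 |         27 |    38
Facilitator      |       6 |          8 |    14
Total            |      17 |         35 |    52
Proportion       |     33% |        67% |
22
Expected proportion with glasses (+ve) and without glasses (-ve):
Fellow: 38 x 33% +ve, 38 x 67% -ve
Facilitator: 14 x 33% +ve, 14 x 67% -ve
Expected number with and without glasses for each cell (using the unrounded proportions 17/52 and 35/52):
                 | Glasses | No glasses
Fellow           |   12.42 |      25.58
Facilitator      |    4.58 |       9.42
Chi-square calculation
                 | Glasses             | No glasses
Fellow           | (11-12.42)²/12.42   | (27-25.58)²/25.58
Facilitator      | (6-4.58)²/4.58      | (8-9.42)²/9.42
χ² = 0.90, p = 0.343
23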
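One expected count in this table (facilitators with glasses) is below 5, a situation where a common rule of thumb prefers Fisher’s exact test, which was named earlier as an alternative to the chi-square test. The following is a minimal sketch of a two-sided Fisher’s exact test based on the hypergeometric distribution. The function name is hypothetical, and the two-sided rule used (summing all tables at most as probable as the observed one) is one of several conventions.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2

    def prob(k):
        # Hypergeometric probability of k in the top-left cell, margins fixed
        return comb(row1, k) * comb(row2, col1 - k) / comb(total, col1)

    k_min = max(0, col1 - row2)
    k_max = min(row1, col1)
    p_observed = prob(a)
    # Sum the probabilities of all tables at most as likely as the observed one
    return sum(prob(k) for k in range(k_min, k_max + 1)
               if prob(k) <= p_observed * (1 + 1e-9))

# Rows: fellows, facilitators; columns: glasses, no glasses
p_glasses = fisher_exact_two_sided(11, 27, 6, 8)
print(f"Fisher's exact p = {p_glasses:.3f}")
```

As with the chi-square test, the p value is well above 0.05, so H0 (equal proportions wearing glasses) is not rejected.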
t-test
• Used to compare means of a continuous variable in two different groups
• Assumes normal distribution
24
t-test
• H0: fellows with glasses do not tend to sit further in the back of the room compared to fellows without glasses
• H1: fellows with glasses tend to sit further in the back of the room compared to fellows without glasses
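A sketch of the t statistic for comparing two means. The seating data below are entirely hypothetical (row numbers counted from the front of the room), since the slides do not give the actual observations. The sketch uses the pooled-variance (Student) version, which additionally assumes equal variances in both groups.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t_statistic(group_a, group_b):
    """Student's two-sample t statistic and its degrees of freedom."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    # Pooled variance: weighted average of the two sample variances
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / df
    se = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (mean(group_a) - mean(group_b)) / se
    return t, df

# Hypothetical row numbers (1 = front row)
rows_glasses = [4, 5, 3, 6, 5, 4]
rows_no_glasses = [2, 3, 4, 2, 3, 3, 1]
t, df = pooled_t_statistic(rows_glasses, rows_no_glasses)
print(f"t = {t:.2f} with {df} degrees of freedom")
```

The t statistic is then compared with the t distribution for the given degrees of freedom (from a table or statistical software) to obtain the p value. Because H1 here is directional, a one-sided p value would be the matching choice.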
25
t-test
26
Epidemiology and statistics
27
Criticism of significance testing
“Epidemiological applications need more than a decision as to whether chance alone could have produced an association.” (Rothman et al. 2008)
Estimation of an effect measure (e.g. RR, OR) rather than significance testing.
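Estimation can be illustrated with the norovirus data: instead of only a p value, report the risk ratio with a confidence interval. This sketch uses the usual log-based (Wald) approximation for the 95% confidence interval. The method and the function name are assumptions, as the slides do not say how an interval would be computed.

```python
from math import exp, log, sqrt

def risk_ratio_ci(ill_exp, n_exp, ill_unexp, n_unexp, z=1.96):
    """Risk ratio with an approximate confidence interval (log method)."""
    rr = (ill_exp / n_exp) / (ill_unexp / n_unexp)
    # Standard error of ln(RR)
    se_log_rr = sqrt(1 / ill_exp - 1 / n_exp + 1 / ill_unexp - 1 / n_unexp)
    lower = exp(log(rr) - z * se_log_rr)
    upper = exp(log(rr) + z * se_log_rr)
    return rr, lower, upper

# Norovirus data: 29 of 38 raw seafood eaters ill, 5 of 141 non-eaters ill
rr, lower, upper = risk_ratio_ci(29, 38, 5, 141)
print(f"RR = {rr:.1f} (95% CI {lower:.1f} to {upper:.1f})")
```

The interval shows both the size of the effect and the precision of the estimate, and it excludes RR = 1, in line with the significance test.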
28
Suggested reading
• KJ Rothman, S Greenland, TL Lash, Modern Epidemiology, Lippincott Williams & Wilkins, Philadelphia, PA, 2008
• SN Goodman, R Royall, Evidence and Scientific Research, AJPH 78, 1568, 1988
• SN Goodman, Toward Evidence-Based Medical Statistics. 1: The P Value Fallacy, Ann Intern Med. 130, 995, 1999
• C Poole, Low P-Values or Narrow Confidence Intervals: Which are more Durable? Epidemiology 12, 291, 2001
29
Previous lecturers
30