McNemar’s Test of Correlation
EPI809/Spring 2008
Fisher’s Exact Test

Fisher’s Exact Test is a test for independence in a 2 x 2 table. It is most useful when the total sample size and the expected values are small. The test holds the marginal totals fixed and computes the hypergeometric probability that n11 is at least as large as the observed value. It is useful when E(cell counts) < 5.
Hypergeometric distribution

Example: a 2x2 table with cell counts a, b, c, d. Assume the marginal totals are fixed: M1 = a+b, M2 = c+d, N1 = a+c, N2 = b+d. For convenience assume N1 < N2 and M1 < M2; the possible values of a are 0, 1, ..., min(M1, N1).

The probability distribution of the cell count a follows a hypergeometric distribution, with N = a + b + c + d = N1 + N2 = M1 + M2:

Pr(x = a) = N1! N2! M1! M2! / [N! a! b! c! d!]
Mean(x) = M1 N1 / N
Var(x) = M1 M2 N1 N2 / [N² (N−1)]

Fisher’s exact test is based on this hypergeometric distribution.
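These formulas can be checked numerically with a short Python sketch (not part of the original slides; the margins below are arbitrary example values, and the factorial form is rewritten with binomial coefficients for numerical convenience):

```python
from math import comb

def hyper_pmf(a, M1, M2, N1):
    """Pr(x = a) for a 2x2 table with fixed margins M1, M2, N1.

    Equivalent to the factorial form N1!N2!M1!M2! / [N!a!b!c!d!]
    with b = M1-a, c = N1-a, d = M2-N1+a.
    """
    N = M1 + M2
    return comb(M1, a) * comb(M2, N1 - a) / comb(N, N1)

M1, M2, N1 = 10, 15, 8                 # example margins, so N = 25
N = M1 + M2
support = range(max(0, N1 - M2), min(M1, N1) + 1)

# Mean and variance computed from the pmf should match the
# closed forms M1*N1/N and M1*M2*N1*N2/[N^2 (N-1)].
mean = sum(a * hyper_pmf(a, M1, M2, N1) for a in support)
var = sum((a - mean) ** 2 * hyper_pmf(a, M1, M2, N1) for a in support)
```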
Fisher’s Exact Test Example

Is HIV infection related to a history (Hx) of STDs in sub-Saharan African countries? Test at the 5% level.

                  HIV Infection
Hx of STDs      yes     no    total
   yes            3      7      10
   no             5     10      15
   total          8     17      25
Hypergeometric prob.

The probability of observing this specific table given the fixed marginal totals is

Pr(3, 7, 5, 10) = 10! 15! 8! 17! / [25! 3! 7! 5! 10!] = 0.3332

Note that the above is not the p-value. Why? It is not the cumulative (tail) probability. The tail probability is the sum over all values a = 3, 2, 1, 0.
Hypergeometric prob

Pr(2, 8, 6, 9) = 10! 15! 8! 17! / [25! 2! 8! 6! 9!] = 0.2082
Pr(1, 9, 7, 8) = 10! 15! 8! 17! / [25! 1! 9! 7! 8!] = 0.0595
Pr(0, 10, 8, 7) = 10! 15! 8! 17! / [25! 0! 10! 8! 7!] = 0.0059

Tail prob = 0.3332 + 0.2082 + 0.0595 + 0.0059 = 0.6068
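These four table probabilities and their sum can be reproduced with a short Python sketch (an illustrative cross-check, again using binomial coefficients in place of the factorial ratio):

```python
from math import comb

# Margins of the Hx-of-STDs x HIV table: rows 10/15, columns 8/17, N = 25.
def table_prob(a):
    # Probability of the table (a, 10-a, 8-a, 7+a) given the fixed margins
    return comb(10, a) * comb(15, 8 - a) / comb(25, 8)

probs = [table_prob(a) for a in (3, 2, 1, 0)]
tail = sum(probs)          # left-tail probability, about 0.6069
```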
Fisher’s Exact Test SAS Codes

data dis;
  input STDs $ HIV $ count;
  cards;
no no 10
no yes 5
yes no 7
yes yes 3
;
run;

proc freq data=dis order=data;
  weight count;
  tables STDs*HIV / chisq fisher;
run;
Pearson Chi-square Test and Yates Correction

Pearson chi-square test:

χ² = Σi (Oi − Ei)² / Ei

follows a chi-square distribution with df = (r−1)(c−1) if every Ei ≥ 5.

Yates correction, for a more accurate p-value when Oi and Ei are close to each other:

χ² = Σi (|Oi − Ei| − 0.5)² / Ei
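Both statistics can be computed on the HIV example table from the earlier slide (an illustrative sketch; note that SAS truncates |O − E| − 0.5 at zero, which is why its continuity-adjusted value is 0 for these data):

```python
# Observed 2x2 table from the Hx-of-STDs / HIV example
obs = [[3, 7], [5, 10]]
row = [sum(r) for r in obs]              # row totals: 10, 15
col = [sum(c) for c in zip(*obs)]        # column totals: 8, 17
n = sum(row)                             # 25

chi2 = 0.0
yates = 0.0
for i in range(2):
    for j in range(2):
        e = row[i] * col[j] / n          # expected count under independence
        d = abs(obs[i][j] - e)
        chi2 += d ** 2 / e               # Pearson statistic
        yates += max(d - 0.5, 0) ** 2 / e  # Yates-corrected statistic
# chi2 is about 0.0306; yates is 0.0 because every |O - E| < 0.5 here
```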
Fisher’s Exact Test SAS Output

Statistics for Table of STDs by HIV

Statistic                     DF      Value      Prob
------------------------------------------------------
Chi-Square                     1     0.0306    0.8611
Likelihood Ratio Chi-Square    1     0.0308    0.8608
Continuity Adj. Chi-Square     1     0.0000    1.0000
Mantel-Haenszel Chi-Square     1     0.0294    0.8638
Phi Coefficient                     -0.0350
Contingency Coefficient              0.0350
Cramer's V                          -0.0350

WARNING: 50% of the cells have expected counts less than 5. Chi-Square may not be a valid test.

Fisher's Exact Test
----------------------------------
Cell (1,1) Frequency (F)        10
Left-sided Pr <= F          0.6069
Right-sided Pr >= F         0.7263
Table Probability (P)       0.3332
Two-sided Pr <= P           1.0000
Fisher’s Exact Test

The output consists of three p-values:

Left: Use this when the alternative to independence is a negative association between the variables; that is, the observations tend to lie in the lower left and upper right.

Right: Use this when the alternative to independence is a positive association between the variables; that is, the observations tend to lie in the upper left and lower right.

2-Tail: Use this when there is no prior alternative.
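All three p-values in the SAS output can be reproduced from the hypergeometric distribution (an illustrative standard-library sketch; the two-sided value sums the probabilities of all tables no more likely than the observed one):

```python
from math import comb

# Hx-of-STDs x HIV table: observed a = 3, margins 10/15 (rows), 8/17 (cols)
def p(a):
    return comb(10, a) * comb(15, 8 - a) / comb(25, 8)

support = range(0, 9)                   # possible values of a
p_obs = p(3)

left = sum(p(a) for a in support if a <= 3)       # about 0.6069
right = sum(p(a) for a in support if a >= 3)      # about 0.7263
two_sided = sum(p(a) for a in support
                if p(a) <= p_obs + 1e-12)         # 1.0 for these data
```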
Useful Measures of Association - Nominal Data

Cohen’s Kappa (κ)

Also referred to as Cohen’s General Index of Agreement, it was originally developed to assess the degree of agreement between two judges or raters assessing n items on the basis of a nominal classification with 2 categories. Subsequent work by Fleiss and Light extended this statistic to more than 2 categories.
Useful Measures of Association - Nominal Data

Cohen’s Kappa (κ)

                       Inspector A
                    Good       Bad
Inspector B  Good   .33 (a)   .07 (b)    .40 (pB)
             Bad    .13 (c)   .47 (d)    .60 (qB)
                    .46       .54       1.00
                    (pA)      (qA)
Useful Measures of Association - Nominal Data

Cohen’s Kappa (κ)

Cohen’s κ requires that we calculate two values:

• po: the proportion of cases in which agreement occurs. In our example, this value equals 0.80.
• pe: the proportion of cases in which agreement would have been expected purely by chance, based upon the marginal frequencies, where

pe = pA pB + qA qB = 0.508 for our data.
Useful Measures of Association - Nominal Data

Cohen’s Kappa (κ)

Then Cohen’s κ measures the agreement between the two variables and is defined by

κ = (po − pe) / (1 − pe) = (0.80 − 0.508) / (1 − 0.508) = 0.593
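The arithmetic from the inspector example can be sketched in Python (illustrative only; the cell proportions come from the table above):

```python
# Cell proportions from the Inspector A x Inspector B table
a, b, c, d = 0.33, 0.07, 0.13, 0.47

po = a + d                      # observed agreement: 0.80
pA, qA = a + c, b + d           # column margins: 0.46, 0.54
pB, qB = a + b, c + d           # row margins:    0.40, 0.60
pe = pA * pB + qA * qB          # chance agreement: 0.508
kappa = (po - pe) / (1 - pe)    # about 0.5935
```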
Useful Measures of Association - Nominal Data

Cohen’s Kappa (κ)

To test the null hypothesis that the true kappa = 0, we use the standard error

SE0(κ) = (1 / [(1 − pe) √n]) · √[ pe + pe² − Σi=1..k pi. p.i (pi. + p.i) ]

then z = κ / SE0(κ) ~ N(0,1),

where pi. and p.i refer to the row and column proportions (in the textbook, ai = pi. and bi = p.i).
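A numerical sketch of this standard error for the inspector data (illustrative; the result matches the "ASE under H0" and Z values in the SAS output that follows):

```python
from math import sqrt

n = 100
pe = 0.508
kappa = 0.5935

# Marginal proportions, paired by category (Good, Bad):
# rows are Inspector B, columns are Inspector A
p_rows = [0.40, 0.60]
p_cols = [0.46, 0.54]

# Sum over categories of pi. * p.i * (pi. + p.i)
s = sum(pr * pc * (pr + pc) for pr, pc in zip(p_rows, p_cols))

se0 = sqrt(pe + pe ** 2 - s) / ((1 - pe) * sqrt(n))   # about 0.0993
z = kappa / se0                                       # about 5.98
```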
Useful Measures of Association - Nominal Data - SAS Codes

data kap;
  input B $ A $ prob;
  n = 100;
  count = prob*n;
  cards;
Good Good .33
Good Bad .07
Bad Good .13
Bad Bad .47
;
run;

proc freq data=kap order=data;
  weight count;
  tables B*A / chisq;
  test kappa;
run;
Useful Measures of Association - Nominal Data - SAS Output

The FREQ Procedure
Statistics for Table of B by A

Simple Kappa Coefficient
--------------------------------
Kappa                     0.5935
ASE                       0.0806
95% Lower Conf Limit      0.4356
95% Upper Conf Limit      0.7514

Test of H0: Kappa = 0
ASE under H0              0.0993
Z                         5.9796
One-sided Pr > Z          <.0001
Two-sided Pr > |Z|        <.0001

Sample Size = 100
McNemar’s Test for Correlated (Dependent) Proportions
McNemar’s Test for Correlated (Dependent) Proportions

Basis / Rationale for the Test

The approximate test previously presented for assessing a difference in proportions is based upon the assumption that the two samples are independent.

Suppose, however, that we are faced with a situation where this is not true. Suppose we randomly select 100 people and find that 20% of them have flu. Then imagine that we apply some type of treatment to all sampled people, and on a post-test we find that 20% have flu.
McNemar’s Test for Correlated (Dependent) Proportions

We might be tempted to suppose that no hypothesis test is required under these conditions, in that the ‘Before’ and ‘After’ p values are identical and would surely result in a test statistic value of 0.00.

The problem with this thinking, however, is that the two sample p values are dependent, in that each person was assessed twice. It is possible that the 20 people who had flu originally still had flu. It is also possible that the 20 people who had flu on the second test were a completely different set of 20 people!
McNemar’s Test for Correlated (Dependent) Proportions

It is for precisely this type of situation that McNemar’s Test for Correlated (Dependent) Proportions is applicable.

McNemar’s Test employs two unique features for testing the two proportions:

* a special fourfold contingency table, with a
* special-purpose chi-square (χ²) test statistic (the approximate test).
McNemar’s Test for Correlated (Dependent) Proportions

Nomenclature for the Fourfold (2 x 2) Contingency Table

       A        B       (A + B)
       C        D       (C + D)
    (A + C)  (B + D)       n

where (A + B) + (C + D) = (A + C) + (B + D) = n = the number of units evaluated, and where the test statistic

χ² = (|B − C| − 1)² / (B + C),  df = 1
McNemar’s Test for Correlated (Dependent) Proportions

Underlying Assumptions of the Test

1. Construct a 2x2 table where the paired observations are the sampling units.
2. Each observation must represent a single joint event possibility; that is, classifiable in only one cell of the contingency table.
3. In its Exact form, this test may be conducted as a One-Sample Binomial for the B & C cells.
McNemar’s Test for Correlated (Dependent) Proportions

Underlying Assumptions of the Test

4. The expected frequency (fe) for the B and C cells of the contingency table must be equal to or greater than 5, where, from the fourfold table,

fe = (B + C) / 2
McNemar’s Test for Correlated (Dependent) Proportions

Sample Problem

A randomly selected group of 120 students taking a standardized test for entrance into college exhibits a failure rate of 50%. A company which specializes in coaching students on this type of test has indicated that it can significantly reduce failure rates through a four-hour seminar. The students are exposed to this coaching session and re-take the test a few weeks later. The school board is wondering if the results justify paying this firm to coach all of the students in the high school. Should they? Test at the 5% level.
McNemar’s Test for Correlated (Dependent) Proportions

Sample Problem

The summary data for this study appear as follows:

Number of    Status Before    Status After
Students       Session          Session
    4            Fail             Fail
    4            Pass             Fail
   56            Fail             Pass
   56            Pass             Pass
McNemar’s Test for Correlated (Dependent) Proportions

The data are then entered into the fourfold contingency table:

                     Before
                  Pass    Fail
After    Pass      56      56     112
         Fail       4       4       8
                   60      60     120
McNemar’s Test for Correlated (Dependent) Proportions

Step I: State the Null & Research Hypotheses

H0: πB = πC
H1: πB ≠ πC

where πB and πC relate to the proportions of observations reflecting changes in status (the B & C cells in the table).

Step II: α = 0.05
McNemar’s Test for Correlated (Dependent) Proportions

Step III: State the Associated Test Statistic

χ² = {ABS(B − C) − 1}² / (B + C)
McNemar’s Test for Correlated (Dependent) Proportions

Step IV: State the Distribution of the Test Statistic When H0 is True

χ² ~ chi-square with 1 df when H0 is true.
McNemar’s Test for Correlated (Dependent) Proportions

Step V: Reject H0 if χ² > 3.84

[Figure: chi-square distribution with df = 1; rejection region to the right of the critical value 3.84]
McNemar’s Test for Correlated (Dependent) Proportions

Step VI: Calculate the Value of the Test Statistic

χ² = {ABS(56 − 4) − 1}² / (56 + 4) = 43.35
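The computation, with and without the continuity correction, can be sketched as follows (illustrative; the uncorrected value is the one SAS reports in the output that follows):

```python
# Discordant cells from the fourfold table: B = 56, C = 4
B, C = 56, 4

mcnemar_corrected = (abs(B - C) - 1) ** 2 / (B + C)   # 43.35
mcnemar_uncorrected = (B - C) ** 2 / (B + C)          # 45.0667
reject_h0 = mcnemar_corrected > 3.84                  # True: reject H0
```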
McNemar’s Test for Correlated (Dependent) Proportions - SAS Codes

data test;
  input Before $ After $ count;
  cards;
pass pass 56
pass fail 4
fail pass 56
fail fail 4
;
run;

proc freq data=test order=data;
  weight count;
  tables Before*After / agree;
run;
McNemar’s Test for Correlated (Dependent) Proportions - SAS Output

Statistics for Table of Before by After

McNemar's Test
------------------------
Statistic (S)    45.0667
DF                     1
Pr > S            <.0001

Sample Size = 120

Note: SAS reports the statistic without the continuity correction: (56 − 4)² / 60 = 45.0667.
Conclusion: What we have learned

1. Comparison of binomial proportions using Z and χ² tests
2. Explain the χ² test for independence of 2 variables
3. Explain Fisher’s exact test for independence
4. McNemar’s test for correlated data
5. The Kappa statistic
6. Use of SAS PROC FREQ
Conclusion: Further readings

Read the textbook for:

1. Power and sample size calculation
2. Tests for trends