lecture 07 category shaoqi rao rev

1 Chapter 6 Chi-Square Test for Categorical Variable Shaoqi Rao, PhD 2009.11.9 Slides adapted from Dr. Zhang Jinxin’s


Page 1: Lecture 07 Category Shaoqi Rao Rev

1

Chapter 6

Chi-Square Test for Categorical Variable

Shaoqi Rao, PhD

2009.11.9

Slides adapted from Dr. Zhang Jinxin’s

Page 2: Lecture 07 Category Shaoqi Rao Rev

2

6.1 Basic logic of the χ² test

Given a set of observed frequencies

A1, A2, A3, …

we want to test whether the data follow a certain theory. If the theory is true, it implies a set of theoretical frequencies:

T1, T2, T3, …

Compare A1, A2, A3, … with T1, T2, T3, …

If they are quite different, then the theory might not be true;

otherwise, the theory is acceptable.

Page 3: Lecture 07 Category Shaoqi Rao Rev

3

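6.1.1 Chi-square distribution

The statistic measures the agreement between observed and expected frequencies and follows a χ² distribution:

$$\chi^2_P = \sum_{i=1}^{k} \frac{(f_i - e_i)^2}{e_i} \sim \chi^2$$

where f_i is the observed and e_i the expected frequency in category i.

DF = k − 1 − (number of parameters estimated to obtain the e_i)

For a contingency table,

DF = (# rows − 1)(# columns − 1)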

Page 4: Lecture 07 Category Shaoqi Rao Rev

4

χ² distribution

[Figure: density curves of the χ² distribution; horizontal axis from 0 to 10, vertical axis from 0.0 to 0.3]

Page 5: Lecture 07 Category Shaoqi Rao Rev

5

6.1.2 χ² Test for Goodness of Fit (Large Sample)

Table 1 Frequency distribution and goodness of fit based on 136 measurements of a phantom

Interval    A     Φ(X1)     Φ(X2)     P(X)      T = n·P(X)   (A−T)²/T
1.228-      2     0.00069   0.00466   0.00397    0.5405      3.94143
1.234-      2     0.00466   0.02275   0.01809    2.4601      0.08605
1.240-      7     0.02275   0.08076   0.05801    7.8889      0.10016
1.246-     17     0.08076   0.21186   0.13110   17.8294      0.03859
1.252-     25     0.21186   0.42074   0.20888   28.4083      0.40892
1.258-     37     0.42074   0.65542   0.23468   31.9167      0.80961
1.264-     25     0.65542   0.84134   0.18592   25.2855      0.00322
1.270-     16     0.84134   0.94520   0.10386   14.1244      0.24906
1.276-      4     0.94520   0.98610   0.04090    5.5618      0.43858
1.282-      1     0.98610   0.99744   0.01135    1.5434      0.19130
Total       -     -         -         -          -           6.26692

Φ(X1) and Φ(X2) are the normal cumulative probabilities at the lower and upper limits of each interval. For example, for the interval 1.240-:

$$Z_1 = \frac{1.240 - 1.26}{0.01} = -2.00, \qquad Z_2 = \frac{1.246 - 1.26}{0.01} = -1.40$$
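As a rough check on Table 1, the columns P(X), T and (A−T)²/T can be recomputed from the class limits. This is only a sketch: the helper name `normal_cdf` and the class width 0.006 (read off the interval limits) are assumptions introduced here, and small differences from the table come from its rounding of Φ to five decimals.

```python
import math

def normal_cdf(x, mean, sd):
    """Cumulative distribution function of N(mean, sd^2), via the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

# Lower class limits and observed frequencies A, copied from Table 1.
lower_limits = [1.228, 1.234, 1.240, 1.246, 1.252, 1.258, 1.264, 1.270, 1.276, 1.282]
observed = [2, 2, 7, 17, 25, 37, 25, 16, 4, 1]
width = 0.006            # class width (assumption: read off the interval limits)
mean, sd = 1.26, 0.01    # parameters of the hypothesised normal distribution
n = sum(observed)        # 136 measurements

chi2 = 0.0
for lo, a in zip(lower_limits, observed):
    p = normal_cdf(lo + width, mean, sd) - normal_cdf(lo, mean, sd)  # P(X)
    t = n * p                                                        # T = n * P(X)
    term = (a - t) ** 2 / t                                          # (A-T)^2 / T
    chi2 += term
    print(f"{lo:.3f}-  A={a:2d}  P={p:.5f}  T={t:8.4f}  term={term:.5f}")

print(f"chi-square = {chi2:.2f}")
```

The printed rows should agree with Table 1 to about three decimals, and the total with the 6.27 used in the test on the next slide.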

Page 6: Lecture 07 Category Shaoqi Rao Rev

6

1. Setting up hypotheses

H0: the population follows N(1.26, 0.01²)

H1: the population doesn't follow N(1.26, 0.01²)

α = 0.05

2. Calculation of the statistic:

$$\chi^2 = \sum \frac{(A - T)^2}{T} = 6.27$$

3. P-value: ν = k − 1 − 2 = 10 − 1 − 2 = 7

$$\chi^2_{0.5,7} = 6.35, \qquad \chi^2_{0.05,7} = 14.07$$

$$\chi^2 = 6.27 < \chi^2_{0.5,7}, \qquad P > 0.5$$

4. Conclusion: At significance level α = 0.05, H0 is not rejected. The measurements are consistent with the normal distribution N(1.26, 0.01²).
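The P-value in step 3 can also be computed directly instead of being bracketed from a table. The sketch below uses the standard closed-form series for the χ² upper tail with integer degrees of freedom; the function name `chi2_sf` is hypothetical, and nothing here is taken from the slides beyond the numbers 6.27, 14.07 and ν = 7.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(chi-square with df degrees of freedom > x), integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{r=0}^{df/2-1} (x/2)^r / r!
        term, total = 1.0, 1.0
        for r in range(1, df // 2):
            term *= (x / 2.0) / r
            total += term
        return math.exp(-x / 2.0) * total
    # Odd df: 2*(1 - Phi(sqrt(x))) + 2*phi(sqrt(x)) * sum_{r=1}^{(df-1)/2} x^(r-1/2) / (1*3*...*(2r-1))
    root = math.sqrt(x)
    tail = math.erfc(root / math.sqrt(2.0))                   # equals 2*(1 - Phi(sqrt(x)))
    density = math.exp(-x / 2.0) / math.sqrt(2.0 * math.pi)   # standard normal density at sqrt(x)
    term, total = root, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return tail + 2.0 * density * total

p_value = chi2_sf(6.27, 7)          # statistic and degrees of freedom from the slide
p_at_critical = chi2_sf(14.07, 7)   # should be close to alpha = 0.05
print(f"P = {p_value:.3f}, tail beyond 14.07 = {p_at_critical:.3f}")
```

The result is a P-value slightly above 0.5, in line with the slide's P > 0.5.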

Page 7: Lecture 07 Category Shaoqi Rao Rev

7

6.2 Comparison between Two Independent Sample Proportions

In Chapter 4, the Z test can only be used for comparing π with a given π0 (one sample) or comparing π1 with π2 (two samples).

If we need to compare more than two samples, the Chi-square test is widely used.

Page 8: Lecture 07 Category Shaoqi Rao Rev

8

Example 6.1 In a clinical survey, 215 patients with pulmonary heart disease in a hospital were enrolled, of whom 164 had taken digitalis and 51 had not. Each of them received an ECG examination. The results are listed in Table 6.2.

Page 9: Lecture 07 Category Shaoqi Rao Rev

9

Table 6.2 Data of patients of pulmonary heart disease with arrhythmia

                            ECG
Group                Arrhythmia   Normal   Total   Arrhythmia rate (%)
With digitalis           81         83      164        49.39
Without digitalis        19         32       51        37.25
Total                   100        115      215        46.51

Page 10: Lecture 07 Category Shaoqi Rao Rev

10

Table 6.2 Data of patients of pulmonary heart disease with arrhythmia (expected frequencies in parentheses)

                            ECG
Group                Arrhythmia     Normal       Total   Arrhythmia rate (%)
With digitalis       81 (76.28)    83 (87.72)     164        49.39
Without digitalis    19 (23.72)    32 (27.28)      51        37.25
Total                100           115            215        46.51

Page 11: Lecture 07 Category Shaoqi Rao Rev

11

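By definition,

$$\chi^2_P = \sum_{i=1}^{2} \sum_{j=1}^{2} \frac{(f_{ij} - e_{ij})^2}{e_{ij}}$$

$$\chi^2_P = \frac{(81 - 76.28)^2}{76.28} + \frac{(83 - 87.72)^2}{87.72} + \frac{(19 - 23.72)^2}{23.72} + \frac{(32 - 27.28)^2}{27.28} = 2.3028$$

Equivalently, for a 2×2 table,

$$\chi^2_P = \frac{(f_{11} f_{22} - f_{12} f_{21})^2\, n}{n_{r1} n_{r2} n_{c1} n_{c2}}$$

$$\chi^2_P = \frac{(81 \times 32 - 83 \times 19)^2 \times 215}{164 \times 51 \times 100 \times 115} = 2.3028$$

ν = 1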
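The two forms of the 2×2 statistic can be checked against each other numerically. A minimal sketch with the counts of Table 6.2; the function name `pearson_chi2_2x2` is hypothetical.

```python
def pearson_chi2_2x2(f11, f12, f21, f22):
    """Pearson chi-square for a 2x2 table: (value from the definition, value from the shortcut)."""
    n_r1, n_r2 = f11 + f12, f21 + f22      # row totals
    n_c1, n_c2 = f11 + f21, f12 + f22      # column totals
    n = n_r1 + n_r2
    observed = [[f11, f12], [f21, f22]]
    rows, cols = [n_r1, n_r2], [n_c1, n_c2]
    # Definition: sum over cells of (f - e)^2 / e with e = row total * column total / n
    by_definition = sum(
        (observed[i][j] - rows[i] * cols[j] / n) ** 2 / (rows[i] * cols[j] / n)
        for i in range(2) for j in range(2)
    )
    # Shortcut: (f11*f22 - f12*f21)^2 * n / (product of the four marginal totals)
    by_shortcut = (f11 * f22 - f12 * f21) ** 2 * n / (n_r1 * n_r2 * n_c1 * n_c2)
    return by_definition, by_shortcut

definition_value, shortcut_value = pearson_chi2_2x2(81, 83, 19, 32)
print(round(definition_value, 4), round(shortcut_value, 4))
```

Both values should match the 2.3028 obtained on the slide.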

Page 12: Lecture 07 Category Shaoqi Rao Rev

12

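χ² test and Z test

According to (4.25),

$$z = \frac{P_1 - P_2}{\sqrt{P_0 (1 - P_0) \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}} \qquad (4.25)$$

$$Z = \frac{81/164 - 19/51}{\sqrt{(100/215)(115/215)(1/164 + 1/51)}} = 1.5175$$

$$Z^2 = 2.3028$$

which equals the χ² value obtained for the same data in Example 6.1.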
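The relation Z² = χ² for a 2×2 table can be verified with a few lines. This sketch recomputes both statistics from the counts of Table 6.2; the variable names are illustrative, and the two-sided P-value from the standard normal distribution is an addition not shown on the slide.

```python
import math

x1, n1 = 81, 164   # arrhythmia cases and group size, with digitalis (Table 6.2)
x2, n2 = 19, 51    # arrhythmia cases and group size, without digitalis

p1, p2 = x1 / n1, x2 / n2
p0 = (x1 + x2) / (n1 + n2)                        # pooled proportion, 100/215
z = (p1 - p2) / math.sqrt(p0 * (1 - p0) * (1 / n1 + 1 / n2))

# Shortcut chi-square for the same 2x2 table
chi2 = (81 * 32 - 83 * 19) ** 2 * 215 / (164 * 51 * 100 * 115)

two_sided_p = math.erfc(abs(z) / math.sqrt(2))    # P(|Z| > |z|) under the standard normal
print(f"Z = {z:.4f}, Z^2 = {z * z:.4f}, chi-square = {chi2:.4f}, P = {two_sided_p:.3f}")
```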

Page 13: Lecture 07 Category Shaoqi Rao Rev

13

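Correction for continuity

When n ≥ 40 and some expected frequency satisfies 1 ≤ e_ij < 5, use

$$\chi^2_P = \sum_{i=1}^{2} \sum_{j=1}^{2} \frac{(|f_{ij} - e_{ij}| - 0.5)^2}{e_{ij}}$$

or, equivalently,

$$\chi^2_P = \frac{(|f_{11} f_{22} - f_{12} f_{21}| - n/2)^2\, n}{n_{r1} n_{r2} n_{c1} n_{c2}}$$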
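A short sketch of the corrected shortcut formula follows. It uses the counts of Table 6.14 (Example 6.9, later in this lecture) only because the SPSS output for that example reports a continuity-corrected value to compare against; with n = 12 the recommended analysis there is Fisher's exact test. The function name `yates_chi2_2x2` is hypothetical.

```python
def yates_chi2_2x2(f11, f12, f21, f22):
    """Continuity-corrected chi-square for a 2x2 table (shortcut form with the n/2 term)."""
    n_r1, n_r2 = f11 + f12, f21 + f22      # row totals
    n_c1, n_c2 = f11 + f21, f12 + f22      # column totals
    n = n_r1 + n_r2
    return (abs(f11 * f22 - f12 * f21) - n / 2) ** 2 * n / (n_r1 * n_r2 * n_c1 * n_c2)

# Counts from Table 6.14: new treatment 6 / 1, control 1 / 4
corrected = yates_chi2_2x2(6, 1, 1, 4)
print(round(corrected, 3))
```

The value should agree with the "Continuity Correction" row (2.831) of the SPSS output for Example 6.9.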

Page 14: Lecture 07 Category Shaoqi Rao Rev

14

Fisher's exact test

When n < 40, or some e_ij < 1, the χ² test is not appropriate. Fisher's exact test gives an exact P value on which to base the conclusion.

It is readily available in SPSS.

Page 15: Lecture 07 Category Shaoqi Rao Rev

15

Example 6.9

Table 6.14 The results of treatment of embolic angiitis patients

Groups           Recovery    No recovery    Total
New treatment    6 (a)       1 (b)           7 (n_r1)
Control          1 (c)       4 (d)           5 (n_r2)
Total            7 (n_c1)    5 (n_c2)       12 (n)

Page 16: Lecture 07 Category Shaoqi Rao Rev

16

Statistical description

group * result Crosstabulation

                                          result
group                               recovery   no recovery   Total
new treatment    Count                  6           1           7
                 Expected Count       4.1         2.9         7.0
                 % within group     85.7%       14.3%      100.0%
control          Count                  1           4           5
                 Expected Count       2.9         2.1         5.0
                 % within group     20.0%       80.0%      100.0%
Total            Count                  7           5          12
                 Expected Count       7.0         5.0        12.0
                 % within group     58.3%       41.7%      100.0%

Page 17: Lecture 07 Category Shaoqi Rao Rev

17

Statistical inference

Chi-Square Tests

                                Value     df   Asymp. Sig.   Exact Sig.   Exact Sig.
                                               (2-sided)     (2-sided)    (1-sided)
Pearson Chi-Square              5.182(b)   1   .023
Continuity Correction(a)        2.831      1   .092
Likelihood Ratio                5.555      1   .018
Fisher's Exact Test                                          .072         .045
Linear-by-Linear Association    4.750      1   .029
N of Valid Cases                12

a. Computed only for a 2x2 table
b. 4 cells (100.0%) have expected count less than 5. The minimum expected count is 2.08.
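The exact P values in the SPSS output can be reproduced by enumerating every 2×2 table that shares the margins of Table 6.14. The sketch below is one common way to do this (hypergeometric probabilities, with the two-sided value taken as the sum over tables no more probable than the observed one); the function name `fisher_exact_2x2` is hypothetical, and SPSS's internal method is not described in the slides.

```python
from math import comb

def fisher_exact_2x2(a, b, c, d):
    """Return (one-sided, two-sided) exact P values for the 2x2 table [[a, b], [c, d]]."""
    r1, r2, c1 = a + b, c + d, a + c
    n = r1 + r2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell with all margins fixed
        return comb(r1, x) * comb(r2, c1 - x) / comb(n, c1)

    lo, hi = max(0, c1 - r2), min(r1, c1)   # feasible range of the top-left cell
    p_obs = prob(a)
    # One-sided: tables with a top-left count at least as large as the observed one
    one_sided = sum(prob(x) for x in range(a, hi + 1))
    # Two-sided: all tables that are no more probable than the observed table
    two_sided = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return one_sided, two_sided

# Table 6.14: new treatment 6 recovered / 1 not, control 1 recovered / 4 not
one_sided_p, two_sided_p = fisher_exact_2x2(6, 1, 1, 4)
print(f"one-sided P = {one_sided_p:.3f}, two-sided P = {two_sided_p:.3f}")
```

With these counts the one-sided sum is 36/792 and the two-sided sum is 57/792, which round to the .045 and .072 shown in the output above.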

Page 18: Lecture 07 Category Shaoqi Rao Rev

18

6.3 The χ² Tests for Binary Variable under a Paired Design

Example 6.2 There are 260 serum samples. Each sample is divided into two parts, which are tested by two different immunological tests for rheumatoid factor. The results are listed in Table 6.4. The question is whether the results of the two methods are independent.

Page 19: Lecture 07 Category Shaoqi Rao Rev

19

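Test for independence between two binary variables

Example 6.2

Table 6.4 The results of two immunological tests

                 B
A            +       −     Total
+          172       8      180
−           12      68       80
Total      184      76      260

Proportion positive by B: 172/180 = 95.6% among the A-positive samples and 12/80 = 15% among the A-negative samples.

$$\chi^2_P = \frac{(f_{11} f_{22} - f_{12} f_{21})^2\, n}{n_{r1} n_{r2} n_{c1} n_{c2}} = \frac{(172 \times 68 - 8 \times 12)^2 \times 260}{180 \times 80 \times 184 \times 76} = 173.74$$

All expected frequencies exceed 5 (the smallest is 80 × 76/260 = 23.4), so no continuity correction is needed; with the correction term n/2 the value would be 169.87.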

Page 20: Lecture 07 Category Shaoqi Rao Rev

20

6.3.2 Comparison between two sample proportions

McNemar test

$$\chi^2 = \frac{(f_{12} - f_{21})^2}{f_{12} + f_{21}}$$

Page 21: Lecture 07 Category Shaoqi Rao Rev

21

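H0: π1 = π2, H1: π1 ≠ π2, α = 0.05

When H0 is true, the two discordant cells b and c have the same theoretical frequency:

$$T_1 = T_2 = \frac{b + c}{2}$$

For a large sample (b + c > 40),

$$\chi^2 = \frac{\left(b - \frac{b+c}{2}\right)^2}{\frac{b+c}{2}} + \frac{\left(c - \frac{b+c}{2}\right)^2}{\frac{b+c}{2}} = \frac{(b - c)^2}{b + c}$$

Each numerator equals ((b − c)/2)², so each term is (b − c)²/(2(b + c)) and the two terms add up to (b − c)²/(b + c).

If χ² > χ²_{0.05} (ν = 1), then reject H0.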

Page 22: Lecture 07 Category Shaoqi Rao Rev

22

The Probability Expressions

                    Trt B
Trt A         +            −           Total
+          π11 (a)      π12 (b)        π_r1
−          π21 (c)      π22 (d)        π_r2
Total      π_c1         π_c2           1.0

H0: π_c1 = π_r1, H1: π_c1 ≠ π_r1

Since π_c1 = π11 + π21 and π_r1 = π11 + π12,

this test becomes: H0: π12 = π21, H1: π12 ≠ π21

Page 23: Lecture 07 Category Shaoqi Rao Rev

23

Correction to McNemar test (f12 + f21 < 40)

$$\chi^2 = \frac{(|f_{12} - f_{21}| - 1)^2}{f_{12} + f_{21}}$$

For Table 6.4,

$$\chi^2 = \frac{(|8 - 12| - 1)^2}{8 + 12} = 0.45$$
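The McNemar statistic depends only on the two discordant cells, which a small sketch makes explicit. The function name `mcnemar_chi2` is hypothetical, and the P-value line (upper tail of χ² with 1 degree of freedom, written with the complementary error function) is an addition not shown on the slide.

```python
import math

def mcnemar_chi2(f12, f21, corrected=False):
    """McNemar chi-square from the discordant counts; subtracts 1 from |f12 - f21| if corrected."""
    diff = abs(f12 - f21) - (1 if corrected else 0)
    return diff ** 2 / (f12 + f21)

f12, f21 = 8, 12    # discordant pairs from Table 6.4 (A+ B- and A- B+)
chi2_corrected = mcnemar_chi2(f12, f21, corrected=True)    # (|8 - 12| - 1)^2 / 20
chi2_uncorrected = mcnemar_chi2(f12, f21)                  # (8 - 12)^2 / 20

# For 1 degree of freedom, P(chi-square > x) = erfc(sqrt(x / 2))
p_value = math.erfc(math.sqrt(chi2_corrected / 2))
print(f"corrected = {chi2_corrected:.2f}, uncorrected = {chi2_uncorrected:.2f}, P = {p_value:.2f}")
```

The concordant cells (172 and 68) never enter the calculation, which is why this test answers a different question from the independence test on the same table.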

Page 24: Lecture 07 Category Shaoqi Rao Rev

24

6.4 The χ² Test for R×C Contingency Table

Table 6.6 Blood types of patients suffering from different diseases

                           Blood type
Disease status         A        B        O      Total
Digestive ulcer       679      134      983      1796
Stomach cancer        416       84      383       883
Control              2625      570     2892      6087
Total                3720      788     4258      8766

Page 25: Lecture 07 Category Shaoqi Rao Rev

25

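The statistic for hypothesis test

$$\chi^2 = n \left( \sum_{i} \sum_{j} \frac{f_{ij}^2}{n_{ri}\, n_{cj}} - 1 \right)$$

$$\chi^2_P = 8766 \left( \frac{679^2}{1796 \times 3720} + \cdots + \frac{2892^2}{6087 \times 4258} - 1 \right) = 40.543$$

$$\nu = (R - 1)(C - 1) = 4$$

$$\chi^2_{0.05,4} = 9.488$$

Since 40.543 > 9.488, P < 0.05.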
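The R×C formula is easy to evaluate for the whole of Table 6.6 rather than only the first and last terms. A minimal sketch; the function name `chi2_rxc` is hypothetical, and the critical value 9.488 is the one quoted on the slide.

```python
def chi2_rxc(table):
    """Chi-square for an R x C table via n * (sum of f^2 / (row total * column total) - 1)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    total = sum(
        f * f / (row_totals[i] * col_totals[j])
        for i, row in enumerate(table)
        for j, f in enumerate(row)
    )
    chi2 = n * (total - 1)
    df = (len(table) - 1) * (len(table[0]) - 1)   # (R - 1)(C - 1)
    return chi2, df

# Table 6.6: rows are digestive ulcer, stomach cancer, control; columns are blood types A, B, O
table_6_6 = [
    [679, 134, 983],
    [416, 84, 383],
    [2625, 570, 2892],
]
chi2, df = chi2_rxc(table_6_6)
critical = 9.488   # chi-square critical value for alpha = 0.05, df = 4 (from the slide)
print(f"chi-square = {chi2:.3f}, df = {df}, exceeds critical value: {chi2 > critical}")
```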

Page 26: Lecture 07 Category Shaoqi Rao Rev

26

6.4.2 Multiple comparison for R×C Table

[Schematic: an R×C table with outcome columns + and − and one row per group I, II, III, IV, V, VI, …, with one group marked as the control]

Page 27: Lecture 07 Category Shaoqi Rao Rev

27

6.4.3 Measurement of association for R×C table

Table 6.11 Blood type of 1043 patients

                      MN system
ABO system        M        N       MN     Total
O                85      100      150      335
A                56       78      120      254
B                98      132      170      400
AB               23       25        6       54
Total           262      335      446     1043

Page 28: Lecture 07 Category Shaoqi Rao Rev

28

Pearson contingency coefficient

$$r_P = \sqrt{\frac{\chi^2_P}{n + \chi^2_P}}$$

$$r_P = \sqrt{\frac{25.925}{1043 + 25.925}} = 0.156$$
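The value 25.925 and the resulting coefficient can be recomputed from Table 6.11. This is a sketch under the assumption that χ²_P is the ordinary Pearson statistic of the 4×3 table; the function name `contingency_coefficient` is hypothetical.

```python
import math

def contingency_coefficient(table):
    """Return (Pearson chi-square, Pearson contingency coefficient) for an R x C table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = n * (
        sum(
            f * f / (row_totals[i] * col_totals[j])
            for i, row in enumerate(table)
            for j, f in enumerate(row)
        )
        - 1
    )
    return chi2, math.sqrt(chi2 / (n + chi2))

# Table 6.11: rows are ABO types O, A, B, AB; columns are MN types M, N, MN
table_6_11 = [
    [85, 100, 150],
    [56, 78, 120],
    [98, 132, 170],
    [23, 25, 6],
]
chi2, r_p = contingency_coefficient(table_6_11)
print(f"chi-square = {chi2:.3f}, r_P = {r_p:.3f}")
```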

Page 29: Lecture 07 Category Shaoqi Rao Rev

29

Pre-requisite for χ² test

As a rule of thumb:

The theoretical frequencies should be greater than 5 in more than 4/5 of the cells;

The theoretical frequency in any cell should be greater than 1.

Otherwise, we need to use Fisher's exact test.
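The rule of thumb can be written as a small check on the expected counts. This is a sketch of the rule exactly as stated on this slide; the function names `expected_counts` and `chi2_test_applicable` are hypothetical.

```python
def expected_counts(table):
    """Expected frequency of each cell: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def chi2_test_applicable(table):
    """True if more than 4/5 of the cells have expected count > 5 and every cell has expected count > 1."""
    expected = [e for row in expected_counts(table) for e in row]
    share_above_5 = sum(e > 5 for e in expected) / len(expected)
    return share_above_5 > 4 / 5 and min(expected) > 1

table_6_14 = [[6, 1], [1, 4]]                                       # Example 6.9
table_6_6 = [[679, 134, 983], [416, 84, 383], [2625, 570, 2892]]    # blood types by disease

smallest_expected = min(e for row in expected_counts(table_6_14) for e in row)
print(chi2_test_applicable(table_6_14), chi2_test_applicable(table_6_6), round(smallest_expected, 2))
```

For Table 6.14 every expected count is below 5 (the smallest is about 2.08, as the SPSS footnote reports), so the check fails and Fisher's exact test is used; Table 6.6 passes.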

Page 30: Lecture 07 Category Shaoqi Rao Rev

30