Analysis of Categorical Data Dr Siti Azrin Binti Ab Hamid Unit Biostatistics and Research Methodology

Upload: judith-mcbride

Post on 16-Jan-2016


Page 1: Dr Siti Azrin Binti Ab Hamid Unit Biostatistics and Research Methodology

Analysis of Categorical Data

Dr Siti Azrin Binti Ab Hamid
Unit Biostatistics and Research Methodology

Page 2: Dr Siti Azrin Binti Ab Hamid Unit Biostatistics and Research Methodology

Types of categorical analysis

Steps to analysis

Outline

Page 3: Dr Siti Azrin Binti Ab Hamid Unit Biostatistics and Research Methodology

Dependent variable       Independent variable   Number of groups in independent variable   Parametric test   Non parametric test
Numerical (one)          -                      -                                          One sample t      Sign test
Numerical (one)          Categorical            2 groups (independent)                     Independent t     Mann Whitney
Numerical (one)          Categorical            2 groups (dependent)                       Paired t          Signed rank test
Numerical (one)          Categorical            > 2 groups (independent)                   One way ANOVA     Kruskal Wallis
Categorical (2 groups)   Categorical            2 groups (independent)                     -                 Chi square test, Fisher exact test
Categorical (2 groups)   Categorical            2 groups (dependent)                       -                 McNemar test

Overview univariable analysis

Page 4: Dr Siti Azrin Binti Ab Hamid Unit Biostatistics and Research Methodology

Categorical data analysis deals with discrete data that can be organized into categories.

The data are organized into a contingency table.

Introduction

Page 5: Dr Siti Azrin Binti Ab Hamid Unit Biostatistics and Research Methodology

Data                                        Statistical tests
One proportion                              Chi-square goodness of fit
Two proportions, independent sample         Pearson chi-square / Fisher exact
Two proportions, dependent sample           McNemar test
Stratified sampling to control confounder   Mantel-Haenszel test

Types of categorical data analysis

Page 6: Dr Siti Azrin Binti Ab Hamid Unit Biostatistics and Research Methodology

Step 1: State the hypotheses

Step 2: Set the significance level

Step 3: Check the assumptions

Step 4: Perform the statistical analysis

Step 5: Make interpretation

Step 6: Draw conclusion

Hypothesis testing

Page 7: Dr Siti Azrin Binti Ab Hamid Unit Biostatistics and Research Methodology

Consists of two columns and two rows.

Cells are labeled A through D.

Columns and rows are added for labels.

Row: independent variable / exposure / risk factors

Column: dependent variable / outcome

Contingency table

Page 8: Dr Siti Azrin Binti Ab Hamid Unit Biostatistics and Research Methodology

Example of contingency table

             CHD present   CHD absent   Total
Smoker       138           32           170
Non-smoker   263           105          368
Total        401           137          538

Page 9: Dr Siti Azrin Binti Ab Hamid Unit Biostatistics and Research Methodology

To test the association between two categorical variables

Independent sample

Result of test:
- Not significant: no association
- Significant: an association

Pearson Chi-square

Page 10: Dr Siti Azrin Binti Ab Hamid Unit Biostatistics and Research Methodology

Is estrogen receptor associated with breast cancer status?

Data: Breast cancer.sav

Research Question

Page 11: Dr Siti Azrin Binti Ab Hamid Unit Biostatistics and Research Methodology

HO: There is no association between estrogen receptor and breast cancer status.

HA: There is an association between estrogen receptor and breast cancer status.

Step 1: State the hypothesis

Page 12: Dr Siti Azrin Binti Ab Hamid Unit Biostatistics and Research Methodology

α = 0.05

Step 2: Set the significance level

Page 13: Dr Siti Azrin Binti Ab Hamid Unit Biostatistics and Research Methodology

1. Two variables are independent
2. Two variables are categorical
3. Percentage of cells with expected count < 5:
- > 20%: Fisher exact test
- < 20%: Pearson Chi-square

Expected count = (Row total x Column total) / Grand total

Step 3: Check the assumption

Variable   Breast Ca         Total
           Died     Alive
ER - ve    310      28       338
ER + ve    508      23       531
Total      818      51       869

Page 14: Dr Siti Azrin Binti Ab Hamid Unit Biostatistics and Research Methodology

Step 3: Check the assumption

Variable   Breast Ca                             Total
           Died                Alive
ER - ve    310 (E = 318.2)     28 (E = 19.8)     338
ER + ve    508 (E = 499.8)     23 (E = 31.2)     531
Total      818                 51                869
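The expected counts in this table follow from the rule Expected count = (Row total x Column total) / Grand total. A minimal Python sketch of that arithmetic is shown here; the function name `expected_counts` and the list-of-rows layout are illustrative assumptions, not part of the original slides.

```python
def expected_counts(table):
    """Expected counts under independence for a table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    # Each cell: row total x column total / grand total
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Observed counts from the slide: rows ER -ve, ER +ve; columns Died, Alive.
observed = [[310, 28], [508, 23]]
expected = expected_counts(observed)
rounded = [[round(e, 1) for e in row] for row in expected]
print(rounded)
```

Rounded to one decimal place, the result reproduces the E values shown in the table, and no cell has an expected count below 5.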

Page 15: Dr Siti Azrin Binti Ab Hamid Unit Biostatistics and Research Methodology

Calculate the Chi-square value

$\chi^2 = \sum \frac{(O - E)^2}{E} = 5.897$

$df = (R-1)(C-1) = (2-1)(2-1) = 1$

p value between 0.01 and 0.02

Step 4: Statistical test
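The hand calculation can be checked with a short Python sketch. It uses unrounded expected counts, so the statistic comes out near 5.84 rather than the 5.897 obtained from expected counts rounded to one decimal. The helper names are illustrative assumptions, and the p value uses the identity that, for 1 degree of freedom, the upper tail of the chi-square distribution equals erfc(sqrt(x/2)).

```python
import math


def pearson_chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for an R x C table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


def chi2_upper_tail_df1(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2))


# Rows ER -ve, ER +ve; columns Died, Alive.
chi2, df = pearson_chi_square([[310, 28], [508, 23]])
p_value = chi2_upper_tail_df1(chi2)
print(round(chi2, 2), df, round(p_value, 3))
```

The p value falls between 0.01 and 0.02, in line with the range read from the chi-square table.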

Page 16: Dr Siti Azrin Binti Ab Hamid Unit Biostatistics and Research Methodology

Step 4: Statistical test

[Figure: screenshot of the analysis procedure, steps numbered 1 to 10]

Page 17: Dr Siti Azrin Binti Ab Hamid Unit Biostatistics and Research Methodology

Step 5: Interpretation

p value = 0.016 < 0.05 – reject HO, accept HA

Page 18: Dr Siti Azrin Binti Ab Hamid Unit Biostatistics and Research Methodology

There is a significant association between estrogen receptor and breast cancer status using Pearson Chi-square test (p = 0.016).

Step 6: Conclusion

Page 19: Dr Siti Azrin Binti Ab Hamid Unit Biostatistics and Research Methodology

To test the association between two categorical variables

Independent sample

Sample sizes are small

Fisher’s Exact Test

Page 20: Dr Siti Azrin Binti Ab Hamid Unit Biostatistics and Research Methodology

Is gender associated with coronary heart disease?

Data: CHD data.sav

Research Question

Page 21: Dr Siti Azrin Binti Ab Hamid Unit Biostatistics and Research Methodology

HO: There is no association between gender and coronary heart disease.

HA: There is an association between gender and coronary heart disease.

Step 1: State the hypothesis

Page 22: Dr Siti Azrin Binti Ab Hamid Unit Biostatistics and Research Methodology

α = 0.05

Step 2: Set the significance level

Page 23: Dr Siti Azrin Binti Ab Hamid Unit Biostatistics and Research Methodology

1. Two variables are independent
2. Two variables are categorical
3. Percentage of cells with expected count < 5:
- > 20%: Fisher exact test
- < 20%: Pearson Chi-square

Expected count = (Row total x Column total) / Grand total

Step 3: Check the assumption

Variable   Coronary Heart Disease   Total
           Present      Absent
Male       15           5           20
Female     10           0           10
Total      25           5           30

Page 24: Dr Siti Azrin Binti Ab Hamid Unit Biostatistics and Research Methodology

Step 3: Check the assumption

Variable   Coronary Heart Disease              Total
           Present            Absent
Male       15 (E = 16.7)      5 (E = 3.3)      20
Female     10 (E = 8.3)       0 (E = 1.7)      10
Total      25                 5                30

2 cells (50%) – expected count < 5
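The choice between the two tests can be sketched in Python as a count of cells whose expected count is below 5. The function names are illustrative, and sending a table with exactly 20% small cells to the Pearson chi-square is an assumption, since the slides only state the > 20% and < 20% cases.

```python
def cells_below_threshold(table, threshold=5):
    """Return (number of cells with expected count < threshold, total cells)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    expected = [r * c / grand_total for r in row_totals for c in col_totals]
    small = sum(1 for e in expected if e < threshold)
    return small, len(expected)


def choose_test(table, max_fraction=0.20):
    small, total = cells_below_threshold(table)
    # Assumption: exactly 20% small cells is treated as acceptable for Pearson.
    return "Fisher exact test" if small / total > max_fraction else "Pearson chi-square"


# Rows Male, Female; columns CHD present, CHD absent.
chd = [[15, 5], [10, 0]]
small, total = cells_below_threshold(chd)
print(f"{small} of {total} cells ({small / total:.0%}) have expected count < 5")
print(choose_test(chd))
```

For the gender and CHD table this reports 2 of 4 cells and selects the Fisher exact test; for the estrogen receptor table it selects the Pearson chi-square.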

Page 25: Dr Siti Azrin Binti Ab Hamid Unit Biostatistics and Research Methodology

Calculate the Chi-square value

$\chi^2 = \sum \frac{(O - E)^2}{E} = 3.0968$

$df = (R-1)(C-1) = (2-1)(2-1) = 1$

p value between 0.05 and 0.1

Step 4: Statistical test
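The chi-square value above is a hand calculation; the p value reported in Step 5 comes from Fisher's exact test. A minimal Python sketch of a two-sided exact p value follows. It assumes one common definition of "two-sided": with all margins fixed, sum the hypergeometric probabilities of every table that is no more likely than the observed one. The function name is illustrative.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2
    denom = comb(n, col1)

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Small relative tolerance guards against floating-point ties.
    return sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )


# Male: 15 present, 5 absent; Female: 10 present, 0 absent.
p_value = fisher_exact_two_sided(15, 5, 10, 0)
print(round(p_value, 3))
```

Rounded to three decimals, this agrees with the p value of 0.140 used in the interpretation.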

Page 26: Dr Siti Azrin Binti Ab Hamid Unit Biostatistics and Research Methodology

Step 4: Statistical test

[Figure: screenshot of the analysis procedure, steps numbered 1 to 10]

Page 27: Dr Siti Azrin Binti Ab Hamid Unit Biostatistics and Research Methodology

Step 5: Interpretation

p value = 0.140 > 0.05 – do not reject HO

Page 28: Dr Siti Azrin Binti Ab Hamid Unit Biostatistics and Research Methodology

There is no significant association between gender and coronary heart disease using Fisher’s Exact test (p = 0.140).

Step 6: Conclusion

Page 29: Dr Siti Azrin Binti Ab Hamid Unit Biostatistics and Research Methodology

Categorical data

Dependent sample
- Matched sample
- Cross over design
- Before & after (same subject)

To determine whether the row and column marginal frequencies are equal (marginal homogeneity)

McNemar Test

Page 30: Dr Siti Azrin Binti Ab Hamid Unit Biostatistics and Research Methodology

Null hypothesis of marginal homogeneity states the two marginal probabilities for each outcome are the same

HO : PB = PC

HA : PB ≠ PC

A & D = concordant pair

B & C = discordant pair

A discordant pair is a pair with different outcomes

Hypotheses

Page 31: Dr Siti Azrin Binti Ab Hamid Unit Biostatistics and Research Methodology

Is type of mastectomy associated with 5-year survival proportion in patients with breast cancer?

The sample were breast cancer patients:
- matched for age (same decade of age)
- same clinical condition

Data: breast ca.sav

Research Question

Page 32: Dr Siti Azrin Binti Ab Hamid Unit Biostatistics and Research Methodology

HO: There is no association between type of mastectomy and 5-year survival proportion in patients with breast cancer.

HA: There is an association between type of mastectomy and 5-year survival proportion in patients with breast cancer.

Step 1: State the hypothesis

Page 33: Dr Siti Azrin Binti Ab Hamid Unit Biostatistics and Research Methodology

α = 0.05

Step 2: Set the significance level

Page 34: Dr Siti Azrin Binti Ab Hamid Unit Biostatistics and Research Methodology

1. Two variables are dependent
2. Two variables are categorical

Step 3: Check the assumption

Page 35: Dr Siti Azrin Binti Ab Hamid Unit Biostatistics and Research Methodology

$\chi^2 = \frac{(|b - c| - 1)^2}{b + c} = \frac{(|0 - 8| - 1)^2}{0 + 8} = 6.125$

$df = (R-1)(C-1) = (2-1)(2-1) = 1$

Calculated $\chi^2$ > tabulated $\chi^2$

* $\chi^2 = \frac{(|b - c| - 0.5)^2}{b + c}$

Step 4: Statistical test
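The McNemar calculation uses only the discordant pairs b and c. The Python sketch below computes the continuity-corrected statistic from the formula above and, as an added assumption, an exact two-sided binomial p value (under HO each discordant pair is equally likely to fall in b or c). The function name and return layout are illustrative.

```python
from math import comb, erfc, sqrt


def mcnemar(b, c, correction=1.0):
    """McNemar test from discordant counts b and c (requires b + c > 0).

    Returns (chi-square statistic, chi-square p value with 1 df,
    exact two-sided binomial p value).
    """
    n = b + c
    chi2 = (abs(b - c) - correction) ** 2 / n
    # Upper tail of chi-square with 1 df equals erfc(sqrt(x / 2)).
    p_chi2 = erfc(sqrt(chi2 / 2))
    # Exact test: the smaller discordant count is Binomial(n, 0.5) under HO.
    k = min(b, c)
    p_exact = min(1.0, 2 * sum(comb(n, i) for i in range(k + 1)) / 2 ** n)
    return chi2, p_chi2, p_exact


chi2, p_chi2, p_exact = mcnemar(0, 8)
print(chi2, round(p_chi2, 3), round(p_exact, 3))
```

With b = 0 and c = 8 the statistic is 6.125. The exact binomial p value rounds to 0.008, which is consistent with the value quoted in the interpretation, while the chi-square approximation gives a slightly larger p value.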

Page 36: Dr Siti Azrin Binti Ab Hamid Unit Biostatistics and Research Methodology

Step 4: Statistical test

[Figure: screenshot of the analysis procedure, steps numbered 1 to 9]

Page 37: Dr Siti Azrin Binti Ab Hamid Unit Biostatistics and Research Methodology

Step 5: Interpretation

p value = 0.008 < 0.05 – reject HO, accept HA

Page 38: Dr Siti Azrin Binti Ab Hamid Unit Biostatistics and Research Methodology

There is an association between type of mastectomy and 5-year survival proportion in patients with breast cancer using McNemar test (p = 0.008).

Step 6: Conclusion

Page 39: Dr Siti Azrin Binti Ab Hamid Unit Biostatistics and Research Methodology

The test is a method to compare the probability of an event among independent groups in stratified samples.

The stratification factor can be study center, gender, race, age groups, obesity status or disease severity.

Gives a stratified statistical analysis of the relationship between exposure and disease, after controlling for a confounder (strata variables).

The data are arranged in a series of associated 2 × 2 contingency tables.

Cochran Mantel-Haenszel Test

Page 40: Dr Siti Azrin Binti Ab Hamid Unit Biostatistics and Research Methodology

Is the type of treatment associated with response of treatment among migraine patients after controlling for gender?

Confounder: gender

Research Question

                                Active   Placebo
Female   No of patients         27       25
         No of better response  16       5
Male     No of patients         28       26
         No of better response  12       7

Page 41: Dr Siti Azrin Binti Ab Hamid Unit Biostatistics and Research Methodology

Strata     Gender   Treatment   Better   Same   Total
Strata 1   Female   Active      16       11     27
Strata 1   Female   Placebo     5        20     25
Strata 2   Male     Active      12       16     28
Strata 2   Male     Placebo     7        19     26

Step 1: 2x2 contingency table

Page 42: Dr Siti Azrin Binti Ab Hamid Unit Biostatistics and Research Methodology

1. Random sampling
2. Stratified sampling

Step 2: Check the assumption

Page 43: Dr Siti Azrin Binti Ab Hamid Unit Biostatistics and Research Methodology

HO: There is no association between type of treatment and response of treatment among female and male migraine patients.

HA: There is an association between type of treatment and response of treatment among female and male migraine patients.

Step 3: State the hypothesis

Page 44: Dr Siti Azrin Binti Ab Hamid Unit Biostatistics and Research Methodology

Compute the expected frequency from each stratum

$e_i = \frac{(a_i + b_i)(a_i + c_i)}{n_i}$

Compute the variance for each stratum

$v_i = \frac{(a_i + b_i)(c_i + d_i)(a_i + c_i)(b_i + d_i)}{n_i^2 (n_i - 1)}$

Compute Mantel-Haenszel statistics

$\chi^2_{MH} = \frac{(\sum a_i - \sum e_i)^2}{\sum v_i}$

Step 4: Statistical test

Page 45: Dr Siti Azrin Binti Ab Hamid Unit Biostatistics and Research Methodology

Compute the expected frequency from each stratum

$e_i = \frac{(a_i + b_i)(a_i + c_i)}{n_i}$

$e_1 = \frac{(16 + 11)(16 + 5)}{52} = 10.9038$

$e_2 = \frac{(12 + 16)(12 + 7)}{54} = 9.8519$

Step 4: Statistical test

Page 46: Dr Siti Azrin Binti Ab Hamid Unit Biostatistics and Research Methodology

Compute the variance for each stratum

$v_i = \frac{(a_i + b_i)(c_i + d_i)(a_i + c_i)(b_i + d_i)}{n_i^2 (n_i - 1)}$

$v_1 = \frac{(16 + 11)(5 + 20)(16 + 5)(11 + 20)}{(52)^2 (52 - 1)} = 3.1865$

$v_2 = \frac{(12 + 16)(7 + 19)(12 + 7)(16 + 19)}{(54)^2 (54 - 1)} = 3.1325$

Step 4: Statistical test

Page 47: Dr Siti Azrin Binti Ab Hamid Unit Biostatistics and Research Methodology

Compute Mantel-Haenszel statistics

$\chi^2_{MH} = \frac{(\sum a_i - \sum e_i)^2}{\sum v_i} = \frac{((16 + 12) - (10.9038 + 9.8519))^2}{3.1865 + 3.1325} = 8.3051 = 8.31$

Step 4: Statistical test
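The three hand calculations (expected frequency, variance, and the Mantel-Haenszel statistic) can be combined in one Python sketch. Each stratum is written as a tuple (a, b, c, d) = (active better, active same, placebo better, placebo same); that layout and the function name are illustrative assumptions. No continuity correction is applied, matching the formula on the slide.

```python
from math import erfc, sqrt


def mantel_haenszel_chi2(strata):
    """Mantel-Haenszel chi-square (no continuity correction) and its p value (1 df)."""
    sum_a = sum_e = sum_v = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        sum_a += a
        sum_e += (a + b) * (a + c) / n                     # expected frequency e_i
        sum_v += ((a + b) * (c + d) * (a + c) * (b + d)
                  / (n ** 2 * (n - 1)))                    # variance v_i
    chi2 = (sum_a - sum_e) ** 2 / sum_v
    # Upper tail of chi-square with 1 df equals erfc(sqrt(x / 2)).
    return chi2, erfc(sqrt(chi2 / 2))


strata = [
    (16, 11, 5, 20),  # Strata 1, female
    (12, 16, 7, 19),  # Strata 2, male
]
chi2_mh, p_value = mantel_haenszel_chi2(strata)
print(round(chi2_mh, 2), round(p_value, 3))
```

The statistic rounds to 8.31 and the p value to 0.004, the value quoted in the conclusion.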

Page 48: Dr Siti Azrin Binti Ab Hamid Unit Biostatistics and Research Methodology

Compute the odds ratio

$OR_{MH} = \frac{\sum (a_i d_i / n_i)}{\sum (b_i c_i / n_i)} = \frac{(16 \times 20 / 52) + (12 \times 19 / 54)}{(11 \times 5 / 52) + (16 \times 7 / 54)} = 3.313$

Step 4: Statistical test
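The pooled odds ratio can be reproduced with a few lines of Python. As a sketch under the same assumed layout, each stratum is a tuple (a, b, c, d) = (active better, active same, placebo better, placebo same); the function name is illustrative. The stratum-specific odds ratios are printed alongside for comparison.

```python
def mantel_haenszel_odds_ratio(strata):
    """Mantel-Haenszel pooled odds ratio across 2x2 strata given as (a, b, c, d)."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator


strata = [
    (16, 11, 5, 20),  # Strata 1, female
    (12, 16, 7, 19),  # Strata 2, male
]
or_mh = mantel_haenszel_odds_ratio(strata)
stratum_ors = [a * d / (b * c) for a, b, c, d in strata]
print(round(or_mh, 3), [round(x, 2) for x in stratum_ors])
```

The pooled estimate lies between the two stratum-specific odds ratios, as expected for a weighted summary.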

Page 49: Dr Siti Azrin Binti Ab Hamid Unit Biostatistics and Research Methodology

Step 4: Statistical test

Data: Migraine.sav

[Figure: screenshot of the analysis procedure, steps numbered 1 to 6]

Page 50: Dr Siti Azrin Binti Ab Hamid Unit Biostatistics and Research Methodology

Compute Mantel-Haenszel statistics

$\chi^2_{MH} = \frac{(\sum a_i - \sum e_i)^2}{\sum v_i} = \frac{((16 + 12) - (10.9038 + 9.8519))^2}{3.1865 + 3.1325} = 8.3051 = 8.31$

Step 5: Interpretation

Calculated value > tabulated value

Reject HO

Page 51: Dr Siti Azrin Binti Ab Hamid Unit Biostatistics and Research Methodology

Step 5: Interpretation

The large p-value for the Breslow-Day test (p = 0.222) indicates no significant gender difference in the odds ratios.

HO: OR1 = OR2

Association homogeneous

* Tarone's - adjusted

HO: OR1 = 1

HO: OR2 = 1

Conditionally independent

Page 52: Dr Siti Azrin Binti Ab Hamid Unit Biostatistics and Research Methodology

There is a significant association between type of treatment and response of treatment among female and male migraine patients (p = 0.004).

We estimate that, among both female and male patients, those who receive active treatment have 3.31 times the odds of better migraine symptoms compared with patients who receive placebo.

Step 6: Conclusion