dr siti azrin binti ab hamid unit biostatistics and research methodology

Analysis of Categorical Data

Dr Siti Azrin Binti Ab Hamid
Unit Biostatistics and Research Methodology

- Types of categorical analysis
- Steps to analysis

Outline

Dependent variable       Independent variable   Number of groups in independent variable   Parametric test   Non parametric test
Numerical (one)          -                      -                                          One sample t      Sign test
Numerical (one)          Categorical            2 groups (independent)                     Independent t     Mann Whitney
Numerical (one)          Categorical            2 groups (dependent)                       Paired t          Signed rank test
Numerical (one)          Categorical            > 2 groups (independent)                   One way ANOVA     Kruskal Wallis
Categorical (2 groups)   Categorical            2 groups (independent)                     -                 Chi square test, Fisher exact test
Categorical (2 groups)   Categorical            2 groups (dependent)                       -                 McNemar test

Overview univariable analysis

Categorical data analysis deals with discrete data that can be organized into categories.

The data are organized into a contingency table.

Introduction

Data                                        Statistical tests
One proportion                              Chi-square goodness of fit
Two proportions, independent sample         Pearson chi-square / Fisher exact
Two proportions, dependent sample           McNemar test
Stratified sampling to control confounder   Mantel-Haenszel test

Types of categorical data analysis

Step 1: State the hypotheses

Step 2: Set the significance level

Step 3: Check the assumptions

Step 4: Perform the statistical analysis

Step 5: Make interpretation

Step 6: Draw conclusion

Hypothesis testing

- Consists of two columns and two rows.
- Cells are labeled A through D.
- Columns and rows are added for labels.
- Row: independent variable / exposure / risk factors
- Column: dependent variable / outcome

Contingency table

Example of contingency table

             CHD present   CHD absent   Total
Smoker       138           32           170
Non-smoker   263           105          368
Total        401           137          538

To test the association between two categorical variables

Independent sample

Result of test:
- Not significant: no association
- Significant: an association

Pearson Chi-square

Is estrogen receptor associated with breast cancer status?

Data: Breast cancer.sav

Research Question

H0: There is no association between estrogen receptor and breast cancer status.

HA: There is an association between estrogen receptor and breast cancer status.

Step 1: State the hypothesis

α = 0.05


1. Two variables are independent
2. Two variables are categorical
3. Proportion of cells with expected count < 5:
- > 20%: Fisher exact test
- < 20%: Pearson Chi-square

Expected count = (Row total x Column total) / Grand total

Step 3: Check the assumption

Variable   Breast Ca          Total
           Died     Alive
ER - ve    310      28        338
ER + ve    508      23        531
Total      818      51        869

Step 3: Check the assumption

Variable   Breast Ca                           Total
           Died              Alive
ER - ve    310 (E = 318.2)   28 (E = 19.8)     338
ER + ve    508 (E = 499.8)   23 (E = 31.2)     531
Total      818               51                869
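The expected counts in the table can be reproduced with a few lines of Python. This is a minimal sketch using only the standard library; the function name `expected_counts` and the list-of-rows layout are illustrative choices, not part of the original slides.

```python
def expected_counts(observed):
    """Expected count per cell = row total x column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Rows: ER -ve, ER +ve. Columns: Died, Alive (counts from the table above).
observed = [[310, 28], [508, 23]]
expected = expected_counts(observed)

for row in expected:
    print([round(e, 1) for e in row])
# [318.2, 19.8]
# [499.8, 31.2]
```

No expected count is below 5 here, so the assumption for the Pearson Chi-square test holds.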

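Calculate the Chi-square value

$\chi^2 = \sum \frac{(O - E)^2}{E} = 5.897$ (using the expected counts rounded to one decimal place)

$df = (R - 1)(C - 1) = (2 - 1)(2 - 1) = 1$

From the chi-square table at df = 1, the p value lies between 0.01 and 0.02.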
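The hand calculation can be checked in Python. The sketch below uses only the standard library; `pearson_chi_square` and `p_value_df1` are hypothetical helper names. The p value relies on the identity that, for 1 degree of freedom, the upper tail of the chi-square distribution equals erfc(sqrt(x / 2)), so it applies to 2 x 2 tables only. With unrounded expected counts the statistic comes out near 5.84 rather than 5.897, and the p value agrees with the reported 0.016.

```python
import math


def pearson_chi_square(observed):
    """Return (chi-square statistic, degrees of freedom) for an R x C table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected count
            stat += (o - e) ** 2 / e
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return stat, df


def p_value_df1(stat):
    """Upper-tail p value for a chi-square statistic with df = 1 only."""
    return math.erfc(math.sqrt(stat / 2))


stat, df = pearson_chi_square([[310, 28], [508, 23]])
p = p_value_df1(stat)
print(round(stat, 2), df, round(p, 3))
```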

Step 4: Statistical test



Step 5: Interpretation

p value = 0.016 < 0.05: reject H0, accept HA

There is a significant association between estrogen receptor and breast cancer status using Pearson Chi-square test (p = 0.016).

Step 6: Conclusion

To test the association between two categorical variables

Independent sample

Sample sizes are small

Fisher’s Exact Test

Is gender associated with coronary heart disease?

Data: CHD data.sav

Research Question

H0: There is no association between gender and coronary heart disease.

HA: There is an association between gender and coronary heart disease.


α = 0.05


1. Two variables are independent
2. Two variables are categorical
3. Proportion of cells with expected count < 5:
- > 20%: Fisher exact test
- < 20%: Pearson Chi-square

Expected count = (Row total x Column total) / Grand total


Variable   Coronary Heart Disease   Total
           Presence   Absent
Male       15         5             20
Female     10         0             10
Total      25         5             30

Step 3: Check the assumption

Variable   Coronary Heart Disease            Total
           Presence          Absent
Male       15 (E = 16.7)     5 (E = 3.3)     20
Female     10 (E = 8.3)      0 (E = 1.7)     10
Total      25                5               30

2 cells (50%): expected count < 5
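The 20% rule from the assumption check can be expressed as a small helper. This is a sketch under stated assumptions: the function names are hypothetical, and a share of exactly 20% is sent to Pearson Chi-square here because the slides only give the "> 20%" and "< 20%" cases.

```python
def expected_counts(observed):
    """Expected count per cell = row total x column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def share_of_small_expected(observed, threshold=5):
    """Fraction of cells whose expected count is below the threshold."""
    cells = [e for row in expected_counts(observed) for e in row]
    return sum(e < threshold for e in cells) / len(cells)


def choose_test(observed):
    # Assumption: exactly 20% is treated like "< 20%" (not stated in the slides).
    if share_of_small_expected(observed) > 0.20:
        return "Fisher exact test"
    return "Pearson Chi-square"


chd = [[15, 5], [10, 0]]           # Male, Female x Presence, Absent
breast_ca = [[310, 28], [508, 23]]  # ER -ve, ER +ve x Died, Alive

print(share_of_small_expected(chd), choose_test(chd))
print(share_of_small_expected(breast_ca), choose_test(breast_ca))
```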

Calculate the Chi-square value

$\chi^2 = \sum \frac{(O - E)^2}{E} = 3.0968$

$df = (R - 1)(C - 1) = (2 - 1)(2 - 1) = 1$

From the chi-square table at df = 1, the p value lies between 0.05 and 0.1.





p value = 0.140 > 0.05: fail to reject H0

There is no significant association between gender and coronary heart disease using Fisher’s Exact test (p = 0.140).
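The Fisher's Exact p value for the gender and coronary heart disease table can be reproduced from the hypergeometric distribution. The sketch below is a plain standard-library illustration, not the routine the statistical software uses internally; `fisher_exact_two_sided` is a hypothetical name. It applies the common two-sided rule of summing the probabilities of all tables that are no more likely than the observed one, with the margins held fixed.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p value for the 2 x 2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x):
        # Probability that the top-left cell equals x, given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Small relative tolerance so ties with the observed table are included.
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-7)
    )


# Male: 15 present, 5 absent. Female: 10 present, 0 absent.
p = fisher_exact_two_sided(15, 5, 10, 0)
print(round(p, 3))  # 0.14
```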

Step 6: Conclusion

Categorical data

Dependent sample
- Matched sample
- Cross over design
- Before & after (same subject)

To determine whether the row and column marginal frequencies are equal (marginal homogeneity)

McNemar Test

The null hypothesis of marginal homogeneity states that the two marginal probabilities for each outcome are the same.

$H_0: p_b = p_c$

$H_A: p_b \neq p_c$

A & D = concordant pair

B & C = discordant pair

A discordant pair is a pair with different outcomes.

Hypotheses

Is type of mastectomy associated with 5-year survival proportion in patients with breast cancer?

The sample were breast cancer patients:
- matched for age (same decade of age)
- same clinical condition

Data: breast ca.sav

Research Question

H0: There is no association between type of mastectomy and 5-year survival proportion in patients with breast cancer.

HA: There is an association between type of mastectomy and 5-year survival proportion in patients with breast cancer.


α = 0.05


1. Two variables are dependent
2. Two variables are categorical


$\chi^2 = \frac{(|b - c| - 1)^2}{b + c} = \frac{(|0 - 8| - 1)^2}{0 + 8} = 6.125$

$df = (R - 1)(C - 1) = (2 - 1)(2 - 1) = 1$

Calculated $\chi^2$ > tabulated $\chi^2$

*$\chi^2 = \frac{(|b - c| - 0.5)^2}{b + c}$
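The McNemar calculation depends only on the discordant counts b and c, which makes it easy to sketch in Python. The helper names below are hypothetical. The second function is the exact binomial version of the test, which is commonly reported when there are few discordant pairs. With b = 0 and c = 8 it gives about 0.008, so a p value of 0.008 for this example is consistent with the exact version rather than the chi-square approximation; that reading is an inference, not something the slides state.

```python
import math


def mcnemar_chi_square(b, c, correction=1.0):
    """Continuity-corrected McNemar statistic from the discordant counts."""
    return (abs(b - c) - correction) ** 2 / (b + c)


def mcnemar_exact_p(b, c):
    """Two-sided exact binomial p value: under H0 each discordant pair is 50/50."""
    n = b + c
    k = min(b, c)
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


b, c = 0, 8
stat = mcnemar_chi_square(b, c)
p_chi_square = math.erfc(math.sqrt(stat / 2))  # df = 1 upper tail
p_exact = mcnemar_exact_p(b, c)

print(stat)                    # 6.125
print(round(p_chi_square, 3))  # chi-square approximation
print(round(p_exact, 3))       # 0.008
```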





p value = 0.008 < 0.05: reject H0, accept HA

There is an association between type of mastectomy and 5-year survival proportion in patients with breast cancer using McNemar test (p = 0.008).

Step 6: Conclusion

The Cochran Mantel-Haenszel test is a method to compare the probability of an event among independent groups in stratified samples.

The stratification factor can be study center, gender, race, age groups, obesity status or disease severity.

Gives a stratified statistical analysis of the relationship between exposure and disease, after controlling for a confounder (strata variables).

The data are arranged in a series of associated 2 × 2 contingency tables.

Cochran Mantel-Haenszel Test

Is the type of treatment associated with response to treatment among migraine patients after controlling for gender?

Confounder: gender

Research Question

Gender   Measure                 Active   Placebo
Female   No of patients          27       25
Female   No of better response   16       5
Male     No of patients          28       26
Male     No of better response   12       7

Stratum             Treatment   Better   Same   Total
Strata 1 (Female)   Active      16       11     27
Strata 1 (Female)   Placebo     5        20     25
Strata 2 (Male)     Active      12       16     28
Strata 2 (Male)     Placebo     7        19     26

Step 1: 2x2 contingency table

1. Random sampling
2. Stratified sampling


H0: There is no association between type of treatment and response of treatment among female and male migraine patients.

HA: There is an association between type of treatment and response of treatment among female and male migraine patients.


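Compute the expected frequency from each stratum

$e_i = \frac{(a_i + b_i)(a_i + c_i)}{n_i}$

Compute the variance for each stratum

$v_i = \frac{(a_i + b_i)(c_i + d_i)(a_i + c_i)(b_i + d_i)}{n_i^2 (n_i - 1)}$

Compute Mantel-Haenszel statistics

$\chi^2_{MH} = \frac{\left(\sum a_i - \sum e_i\right)^2}{\sum v_i}$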


Compute the expected frequency from each stratum

$e_i = \frac{(a_i + b_i)(a_i + c_i)}{n_i}$

$e_1 = \frac{(16 + 11)(16 + 5)}{52} = 10.9038$

$e_2 = \frac{(12 + 16)(12 + 7)}{54} = 9.8519$


Compute the variance for each stratum

$v_i = \frac{(a_i + b_i)(c_i + d_i)(a_i + c_i)(b_i + d_i)}{n_i^2 (n_i - 1)}$

$v_1 = \frac{(16 + 11)(5 + 20)(16 + 5)(11 + 20)}{(52)^2 (52 - 1)} = 3.1865$

$v_2 = \frac{(12 + 16)(7 + 19)(12 + 7)(16 + 19)}{(54)^2 (54 - 1)} = 3.1325$



$\chi^2_{MH} = \frac{\left(\sum a_i - \sum e_i\right)^2}{\sum v_i} = \frac{((16 + 12) - (10.9038 + 9.8519))^2}{3.1865 + 3.1325} = 8.3051 \approx 8.31$
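The stratum-by-stratum arithmetic can be collected into one function. This is a minimal standard-library sketch without a continuity correction, matching the formula used in the hand calculation; `mantel_haenszel_chi_square` is a hypothetical name, and each stratum is passed as a tuple (a, b, c, d) read across the rows of its 2 x 2 table.

```python
import math


def mantel_haenszel_chi_square(strata):
    """Mantel-Haenszel chi-square over a list of 2 x 2 tables (a, b, c, d)."""
    sum_a = sum_e = sum_v = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        sum_a += a
        sum_e += (a + b) * (a + c) / n  # expected frequency e_i
        sum_v += (a + b) * (c + d) * (a + c) * (b + d) / (n ** 2 * (n - 1))  # v_i
    return (sum_a - sum_e) ** 2 / sum_v


# (Active better, Active same, Placebo better, Placebo same)
strata = [
    (16, 11, 5, 20),  # Strata 1: Female
    (12, 16, 7, 19),  # Strata 2: Male
]

stat = mantel_haenszel_chi_square(strata)
p = math.erfc(math.sqrt(stat / 2))  # df = 1 upper tail
print(round(stat, 2), round(p, 3))  # 8.31 0.004
```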


Compute odds ratio

$OR_{MH} = \frac{\sum (a_i d_i / n_i)}{\sum (b_i c_i / n_i)} = \frac{(16 \times 20 / 52) + (12 \times 19 / 54)}{(11 \times 5 / 52) + (16 \times 7 / 54)} = 3.313$
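The pooled odds ratio can be computed alongside the stratum-specific odds ratios, which helps show what the Mantel-Haenszel estimate is averaging over. The function names here are illustrative only, and the tuple layout (a, b, c, d) is the same assumed layout as in the hand calculation.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio of a single 2 x 2 table."""
    return (a * d) / (b * c)


def mantel_haenszel_odds_ratio(strata):
    """Pooled odds ratio: sum(a*d/n) / sum(b*c/n) over the strata."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator


strata = [
    (16, 11, 5, 20),  # Strata 1: Female
    (12, 16, 7, 19),  # Strata 2: Male
]

or_female = odds_ratio(*strata[0])
or_male = odds_ratio(*strata[1])
or_mh = mantel_haenszel_odds_ratio(strata)

print(round(or_female, 2), round(or_male, 2), round(or_mh, 3))
```

The pooled estimate lies between the two stratum-specific odds ratios, as expected for a weighted average.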



Data: Migraine.sav



$\chi^2_{MH} = 8.31$, df = 1

Calculated value > tabulated value (3.841 at df = 1, α = 0.05)

Reject H0


The large p-value for the Breslow-Day test (p = 0.222) indicates no significant gender difference in the odds ratios.

H0: OR1 = OR2 (association is homogeneous)

*Tarone's adjusted

H0: OR1 = 1 and OR2 = 1 (conditionally independent)

There is a significant association between type of treatment and response to treatment among female and male migraine patients (p = 0.004).

We estimate that patients, female and male, who receive active treatment have 3.31 times the odds of a better migraine response compared with patients who receive placebo.

Step 6: Conclusion
