Analysis of Categorical Data
Dr Siti Azrin Binti Ab Hamid
Unit Biostatistics and Research Methodology
- Types of categorical analysis
- Steps to analysis
Outline
Dependent variable | Independent variable | Number of groups in independent variable | Parametric test | Non parametric test
Numerical (one) | - | - | One sample t | Sign test
Numerical (one) | Categorical | 2 groups (independent) | Independent t | Mann Whitney
Numerical (one) | Categorical | 2 groups (dependent) | Paired t | Signed rank test
Numerical (one) | Categorical | > 2 groups (independent) | One way ANOVA | Kruskal Wallis
Categorical (2 groups) | Categorical | 2 groups (independent) | - | Chi square test / Fisher exact test
Categorical (2 groups) | Categorical | 2 groups (dependent) | - | McNemar test
Overview univariable analysis
Categorical data analysis deals with discrete data that can be organized into categories.
The data are organized into a contingency table.
Introduction
Data | Statistical test
One proportion | Chi-square goodness of fit
Two proportions, independent sample | Pearson chi-square / Fisher exact
Two proportions, dependent sample | McNemar test
Stratified sampling to control confounder | Mantel-Haenszel test
Types of categorical data analysis
Step 1: State the hypotheses
Step 2: Set the significance level
Step 3: Check the assumptions
Step 4: Perform the statistical analysis
Step 5: Interpret the result
Step 6: Draw a conclusion
Hypothesis testing
Consists of two columns and two rows.
Cells are labeled A through D.
Columns and rows are added for labels.
Row: independent variable / exposure / risk factors
Column: dependent variable / outcome
Contingency table
Example of contingency table
 | CHD present | CHD absent | Total
Smoker | 138 | 32 | 170
Non-smoker | 263 | 105 | 368
Total | 401 | 137 | 538
To test the association between two categorical variables
Independent sample
Result of test:
- Not significant: no association
- Significant: an association
Pearson Chi-square
Is estrogen receptor associated with breast cancer status?
Data: Breast cancer.sav
Research Question
HO: There is no association between estrogen receptor and breast cancer status.
HA: There is an association between estrogen receptor and breast cancer status.
Step 1: State the hypothesis
α = 0.05
Step 2: Set the significance level
1. Two variables are independent
2. Two variables are categorical
3. Percentage of cells with an expected count of < 5
- > 20%: Fisher exact test
- < 20%: Pearson Chi-square
Expected count = (Row total x Column total) / Grand total
Step 3: Check the assumption
Variable | Breast Ca: Died | Breast Ca: Alive | Total
ER -ve | 310 | 28 | 338
ER +ve | 508 | 23 | 531
Total | 818 | 51 | 869
Step 3: Check the assumption
Variable | Breast Ca: Died | Breast Ca: Alive | Total
ER -ve | 310 (E = 318.2) | 28 (E = 19.8) | 338
ER +ve | 508 (E = 499.8) | 23 (E = 31.2) | 531
Total | 818 | 51 | 869
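The expected counts in the table above can be reproduced with a few lines of Python. This is a minimal sketch using only the standard library. The function names `expected_counts` and `share_of_small_cells` are illustrative and do not come from the slides.

```python
def expected_counts(table):
    """Expected count for each cell: row total x column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def share_of_small_cells(expected, threshold=5):
    """Fraction of cells whose expected count is below the threshold."""
    cells = [e for row in expected for e in row]
    return sum(e < threshold for e in cells) / len(cells)


# Observed counts from the slide: rows ER -ve / ER +ve, columns Died / Alive.
observed = [[310, 28], [508, 23]]
expected = expected_counts(observed)
small_share = share_of_small_cells(expected)

# Rule from the assumption slide: more than 20% small cells -> Fisher exact test.
suggested_test = "Fisher exact test" if small_share > 0.20 else "Pearson Chi-square"

for row in expected:
    print([round(e, 1) for e in row])
print(suggested_test)
```

No cell has an expected count below 5 here, so the rule points to the Pearson Chi-square test.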
Calculate the Chi-square value
$\chi^2 = \sum \frac{(O - E)^2}{E} = 5.897$ (using the rounded expected counts)
$df = (R - 1)(C - 1) = (2 - 1)(2 - 1) = 1$
p value from the Chi-square table: between 0.01 and 0.02
Step 4: Statistical test
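As a cross-check of the hand calculation, the sketch below computes the Pearson Chi-square statistic from unrounded expected counts. It assumes no continuity correction and uses only the standard library. The helper names are illustrative. The p value formula is valid only for 1 degree of freedom.

```python
import math


def pearson_chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for an R x C table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


def p_value_df1(chi2):
    """Upper-tail p value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))


# Rows ER -ve / ER +ve, columns Died / Alive.
chi2, df = pearson_chi_square([[310, 28], [508, 23]])
p = p_value_df1(chi2)
print(round(chi2, 2), df, round(p, 3))
```

With unrounded expected counts the statistic is about 5.84 rather than 5.897. The p value is about 0.016, which still falls between 0.01 and 0.02.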
Step 4: Statistical test
[Figure: numbered software steps for running the test; image not reproduced]
Step 5: Interpretation
p value = 0.016 < 0.05: reject HO, accept HA
There is a significant association between estrogen receptor and breast cancer status using the Pearson Chi-square test (p = 0.016).
Step 6: Conclusion
To test the association between two categorical variables
Independent sample
Sample sizes are small
Fisher’s Exact Test
Is gender associated with coronary heart disease?
Data: CHD data.sav
Research Question
HO: There is no association between gender and coronary heart disease.
HA: There is an association between gender and coronary heart disease.
Step 1: State the hypothesis
α = 0.05
Step 2: Set the significance level
1. Two variables are independent
2. Two variables are categorical
3. Percentage of cells with an expected count of < 5
- > 20%: Fisher exact test
- < 20%: Pearson Chi-square
Expected count = (Row total x Column total) / Grand total
Step 3: Check the assumption
Variable | CHD: Presence | CHD: Absent | Total
Male | 15 | 5 | 20
Female | 10 | 0 | 10
Total | 25 | 5 | 30
Step 3: Check the assumption
Variable | CHD: Presence | CHD: Absent | Total
Male | 15 (E = 16.7) | 5 (E = 3.3) | 20
Female | 10 (E = 8.3) | 0 (E = 1.7) | 10
Total | 25 | 5 | 30
2 cells (50%) have an expected count < 5
Calculate the Chi-square value
$\chi^2 = \sum \frac{(O - E)^2}{E} = 3.0968$ (using the rounded expected counts)
$df = (R - 1)(C - 1) = (2 - 1)(2 - 1) = 1$
p value from the Chi-square table: between 0.05 and 0.1
Step 4: Statistical test
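Because 50% of the cells have an expected count below 5, the reported p value comes from Fisher's Exact test rather than the Chi-square value. A minimal sketch of the two-sided exact test for a 2 x 2 table follows, using only the standard library. It assumes the common definition of the two-sided p value: the sum of the probabilities of all tables that are no more likely than the observed one. The function name is illustrative.

```python
from math import comb


def fisher_exact_two_sided(table):
    """Two-sided Fisher exact p value for a 2 x 2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x,
        # with all row and column totals held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Sum every possible table that is no more likely than the observed one.
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )


# Rows Male / Female, columns CHD presence / absent, as on the slide.
p = fisher_exact_two_sided([[15, 5], [10, 0]])
print(round(p, 3))
```

Under this definition the result agrees with the p value of 0.140 given on the interpretation slide.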
Step 4: Statistical test
[Figure: numbered software steps for running the test; image not reproduced]
Step 5: Interpretation
p value = 0.140 > 0.05: fail to reject HO
There is no significant association between gender and coronary heart disease using Fisher’s Exact test (p = 0.140).
Step 6: Conclusion
Categorical data
Dependent sample
- Matched sample
- Cross over design
- Before & after (same subject)
To determine whether the row and column marginal frequencies are equal (marginal homogeneity)
McNemar Test
The null hypothesis of marginal homogeneity states that the two marginal probabilities for each outcome are the same.
HO: $P_B = P_C$
HA: $P_B \neq P_C$
A & D = concordant pairs
B & C = discordant pairs
A discordant pair is a pair with different outcomes.
Hypotheses
Is the type of mastectomy associated with the 5-year survival proportion in patients with breast cancer?
The sample consisted of breast cancer patients:
- matched for age (same decade of age)
- same clinical condition
Data: breast ca.sav
Research Question
HO: There is no association between type of mastectomy and 5-year survival proportion in patients with breast cancer.
HA: There is an association between type of mastectomy and 5-year survival proportion in patients with breast cancer.
Step 1: State the hypothesis
α = 0.05
Step 2: Set the significance level
1. Two variables are dependent
2. Two variables are categorical
Step 3: Check the assumption
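$\chi^2 = \frac{(|b - c| - 1)^2}{b + c} = \frac{(|0 - 8| - 1)^2}{0 + 8} = 6.125$
$df = (R - 1)(C - 1) = (2 - 1)(2 - 1) = 1$
Calculated $\chi^2$ (6.125) > tabulated $\chi^2$ (3.84 at df = 1, α = 0.05)
*Alternative form: $\chi^2 = \frac{(|b - c| - 0.5)^2}{b + c}$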
Step 4: Statistical test
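The McNemar calculation depends only on the discordant counts b and c, so it can be sketched in a few lines of standard-library Python. The function names are illustrative. The exact p value is computed here as a two-sided binomial test on the discordant pairs, which is an assumption about how the reported p value was obtained.

```python
import math


def mcnemar_chi_square(b, c, correction=1.0):
    """McNemar chi-square (1 df) from the discordant counts b and c."""
    return (abs(b - c) - correction) ** 2 / (b + c)


def mcnemar_exact_p(b, c):
    """Exact two-sided p value: binomial test of min(b, c) out of b + c at 0.5."""
    n = b + c
    k = min(b, c)
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


b, c = 0, 8  # discordant pairs from the worked example
chi2 = mcnemar_chi_square(b, c)
p_approx = math.erfc(math.sqrt(chi2 / 2))  # chi-square upper tail with 1 df
p_exact = mcnemar_exact_p(b, c)
print(chi2, round(p_approx, 3), round(p_exact, 3))
```

The corrected statistic is 6.125, as in the hand calculation. The exact binomial p value rounds to 0.008, while the Chi-square approximation gives a somewhat larger value of about 0.013.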
Step 4: Statistical test
[Figure: numbered software steps for running the test; image not reproduced]
Step 5: Interpretation
p value = 0.008 < 0.05: reject HO, accept HA
There is an association between type of mastectomy and 5-year survival proportion in patients with breast cancer using McNemar test (p = 0.008).
Step 6: Conclusion
The Cochran Mantel-Haenszel test is a method to compare the probability of an event among independent groups in stratified samples.
The stratification factor can be study center, gender, race, age groups, obesity status or disease severity.
Gives a stratified statistical analysis of the relationship between exposure and disease, after controlling for a confounder (strata variables).
The data are arranged in a series of associated 2 × 2 contingency tables.
Cochran Mantel-Haenszel Test
Is the type of treatment associated with response of treatment among migraine patients after controlling for gender?
Confounder: gender
Research Question
 | Active | Placebo
Female: No of patients | 27 | 25
Female: No of better response | 16 | 5
Male: No of patients | 28 | 26
Male: No of better response | 12 | 7
Stratum | Treatment | Better | Same | Total
Strata 1: Female | Active | 16 | 11 | 27
Strata 1: Female | Placebo | 5 | 20 | 25
Strata 2: Male | Active | 12 | 16 | 28
Strata 2: Male | Placebo | 7 | 19 | 26
Step 1: 2x2 contingency table
1. Random sampling
2. Stratified sampling
Step 2: Check the assumption
HO: There is no association between type of treatment and response of treatment among female and male migraine patients.
HA: There is an association between type of treatment and response of treatment among female and male migraine patients.
Step 3: State the hypothesis
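Compute the expected frequency for each stratum:
$e_i = \frac{(a_i + b_i)(a_i + c_i)}{n_i}$
Compute the variance for each stratum:
$v_i = \frac{(a_i + b_i)(c_i + d_i)(a_i + c_i)(b_i + d_i)}{n_i^2 (n_i - 1)}$
Compute the Mantel-Haenszel statistic:
$\chi^2_{MH} = \frac{(\sum a_i - \sum e_i)^2}{\sum v_i}$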
Step 4: Statistical test
Compute the expected frequency for each stratum:
$e_i = \frac{(a_i + b_i)(a_i + c_i)}{n_i}$
$e_1 = \frac{(16 + 11)(16 + 5)}{52} = 10.9038$
$e_2 = \frac{(12 + 16)(12 + 7)}{54} = 9.8519$
Step 4: Statistical test
Compute the variance for each stratum:
$v_i = \frac{(a_i + b_i)(c_i + d_i)(a_i + c_i)(b_i + d_i)}{n_i^2 (n_i - 1)}$
$v_1 = \frac{(16 + 11)(5 + 20)(16 + 5)(11 + 20)}{(52)^2 (52 - 1)} = 3.1865$
$v_2 = \frac{(12 + 16)(7 + 19)(12 + 7)(16 + 19)}{(54)^2 (54 - 1)} = 3.1325$
Step 4: Statistical test
Compute the Mantel-Haenszel statistic:
$\chi^2_{MH} = \frac{(\sum a_i - \sum e_i)^2}{\sum v_i}$
$= \frac{((16 + 12) - (10.9038 + 9.8519))^2}{3.1865 + 3.1325} = 8.3051 \approx 8.31$
Step 4: Statistical test
Compute the odds ratio:
$OR_{MH} = \frac{\sum (a_i d_i / n_i)}{\sum (b_i c_i / n_i)}$
$= \frac{(16 \times 20 / 52) + (12 \times 19 / 54)}{(11 \times 5 / 52) + (16 \times 7 / 54)} = 3.313$
Step 4: Statistical test
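The stratum-by-stratum arithmetic above can be checked with a short script. This is a minimal sketch using only the standard library. It follows the formulas on the slides (no continuity correction) and takes the cell labels a, b, c, d to mean Active better, Active same, Placebo better and Placebo same. The variable names are illustrative.

```python
import math

# Each stratum is (a, b, c, d): Active better, Active same,
# Placebo better, Placebo same.
strata = {"Female": (16, 11, 5, 20), "Male": (12, 16, 7, 19)}

sum_a = sum_e = sum_v = 0.0
or_num = or_den = 0.0
for a, b, c, d in strata.values():
    n = a + b + c + d
    e = (a + b) * (a + c) / n  # expected frequency of cell a
    v = (a + b) * (c + d) * (a + c) * (b + d) / (n ** 2 * (n - 1))  # variance
    sum_a += a
    sum_e += e
    sum_v += v
    or_num += a * d / n
    or_den += b * c / n

chi2_mh = (sum_a - sum_e) ** 2 / sum_v
or_mh = or_num / or_den
p = math.erfc(math.sqrt(chi2_mh / 2))  # chi-square upper tail with 1 df
print(round(chi2_mh, 2), round(or_mh, 3), round(p, 3))
```

The script gives a statistic of about 8.31, a common odds ratio of about 3.313 and a p value of about 0.004, in line with the hand calculation.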
Step 4: Statistical test
Data: Migraine.sav
[Figure: numbered software steps for running the test; image not reproduced]
Step 5: Interpretation
Calculated value (8.31) > tabulated value (3.84 at df = 1, α = 0.05)
Reject HO
Step 5: Interpretation
The large p-value for the Breslow-Day test (p = 0.222) indicates no significant gender difference in the odds ratios.
HO: OR1 = OR2 (the association is homogeneous across strata)
*Tarone's: adjusted
HO: OR1 = 1 and OR2 = 1 (conditionally independent)
There is a significant association between type of treatment and response of treatment among female and male migraine patients (p = 0.004).
We estimate that patients, female or male, who receive active treatment have 3.31 times the odds of a better response than patients who receive placebo.
Step 6: Conclusion