Log-linear Models
HRP 261 03/03/04
Log-Linear Models for Multi-way Contingency Tables
1. GLM for Poisson-distributed data with log-link (see Agresti chapter 4).
2. Recall: $\log \mu = \alpha + \beta x$, so $\mu = e^{\alpha}(e^{\beta})^{x}$. A one-unit increase in X has a multiplicative impact of $e^{\beta}$ on $\mu$.
3. General idea: predict the expected frequency (count) in each cell by a product of “effects”—main effects and interactions.
4. (Take logs to linearize).
Log-linear vs. logistic
1. The cell counts are assumed to follow a Poisson distribution, not a binomial.
2. The link function is the log, not the logit.
3. Predictions are estimates of the cell counts in a contingency table, not the logit of y.
Log-linear vs. logistic
The variables investigated by log-linear models are all treated as “response variables.” Therefore, log-linear models only demonstrate association between variables (like chi-square or a correlation coefficient).
If clear explanatory and response variables exist, then logistic regression should be used instead.
Also, if the variables are continuous and cannot be broken down into discrete categories, logistic regression is preferable.
Example: 3-way contingency

                          Heart Disease
Body Weight      Sex      Yes    No    Total
Not overweight   Male      15     5      20
                 Female    40    60     100
                 Total     55    65     120
Overweight       Male      20    10      30
                 Female    10    40      50
                 Total     30    50      80
Source: Angela Jeansonne
In class exercise:
Analyze these data using methods we have already learned.
Is gender related to heart disease and is this effect modified or confounded by weight?
What’s the relationship between overweight and gender (controlled for chd) and overweight and heart disease (controlled for gender)?
Crude OR, CHD-Male (ignore overweight)

                       Heart Disease
              Sex      Yes    No    Total
All weights   Male      35    15      50
              Female    50   100     150
              Total     85   115     200

OR CHD-Male = 35*100/(15*50) = 4.67
Crude OR, Overweight-Male (ignore heart disease)

                          Overweight
                 Sex      Yes    No    Total
All CHD-status   Male      30    20      50
                 Female    50   100     150
                 Total     80   120     200

OR Overweight-Male = 30*100/(20*50) = 3.0
Crude OR, CHD-Overweight (ignore gender)

                                   Heart Disease
                         Weight    Yes    No    Total
Men and women combined   Heavy      30    50      80
                         Light      55    65     120
                         Total      85   115     200

OR CHD-Overweight = 30*65/(50*55) = 0.71
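As a cross-check, the three crude odds ratios can be reproduced in a few lines of Python. This is only a minimal sketch and not part of the original SAS analysis; the `counts` layout (keys are overweight, male, chd with 1 = yes) and the helper name `crude_or` are illustrative assumptions.

```python
# Observed counts keyed by (overweight, male, chd); 1 = yes, 0 = no.
# The key layout is an assumption made for this sketch.
counts = {
    (0, 1, 1): 15, (0, 1, 0): 5,
    (0, 0, 1): 40, (0, 0, 0): 60,
    (1, 1, 1): 20, (1, 1, 0): 10,
    (1, 0, 1): 10, (1, 0, 0): 40,
}

def crude_or(counts, i, j):
    """Odds ratio between variables i and j, summing over the third one."""
    t = [[0, 0], [0, 0]]
    for cell, n in counts.items():
        t[cell[i]][cell[j]] += n
    return (t[1][1] * t[0][0]) / (t[1][0] * t[0][1])

or_chd_male = crude_or(counts, 1, 2)         # ignores overweight
or_overweight_male = crude_or(counts, 0, 1)  # ignores heart disease
or_chd_overweight = crude_or(counts, 0, 2)   # ignores gender
print(round(or_chd_male, 2), round(or_overweight_male, 2), round(or_chd_overweight, 2))
```

Collapsing over the third variable is exactly what "crude" means here, so any difference from the stratified results points to confounding or effect modification.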
OR_MH (CHD-Male), stratified by Overweight

$$OR_{MH} = \frac{\sum_{i=1}^{2} a_i d_i / T_i}{\sum_{i=1}^{2} b_i c_i / T_i} = \frac{\dfrac{15 \cdot 60}{120} + \dfrac{20 \cdot 40}{80}}{\dfrac{5 \cdot 40}{120} + \dfrac{10 \cdot 10}{80}} = 6.0$$
Stratified by Heart Disease

                         Overweight
                Sex      Yes    No    Total
Heart Disease   Male      20    15      35
                Female    10    40      50
                Total     30    55      85
No CHD          Male      10     5      15
                Female    40    60     100
                Total     50    65     115
OR_MH (Overweight-Male), stratified by Heart Disease

$$OR_{MH} = \frac{\sum_{i=1}^{2} a_i d_i / T_i}{\sum_{i=1}^{2} b_i c_i / T_i} = \frac{\dfrac{20 \cdot 40}{85} + \dfrac{10 \cdot 60}{115}}{\dfrac{15 \cdot 10}{85} + \dfrac{5 \cdot 40}{115}} = 4.2$$
Stratified by gender

                    Heart Disease
Gender   Weight     Yes    No    Total
Male     Heavy       20    10      30
         Light       15     5      20
         Total       35    15      50
Female   Heavy       10    40      50
         Light       40    60     100
         Total       50   100     150
OR_MH (CHD-Overweight), stratified by Gender

$$OR_{MH} = \frac{\sum_{i=1}^{2} a_i d_i / T_i}{\sum_{i=1}^{2} b_i c_i / T_i} = \frac{\dfrac{20 \cdot 5}{50} + \dfrac{10 \cdot 60}{150}}{\dfrac{10 \cdot 15}{50} + \dfrac{40 \cdot 40}{150}} = .44$$
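The Mantel-Haenszel summary odds ratios can be checked with a short Python sketch. It is not part of the original SAS analysis; the function name `mantel_haenszel_or` and the (a, b, c, d) tuple layout are assumptions chosen for illustration.

```python
def mantel_haenszel_or(strata):
    """Summary OR over strata; each stratum is (a, b, c, d) for [[a, b], [c, d]]."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den

# Rows male/female, columns CHD yes/no; strata: not overweight, overweight
chd_male_by_weight = [(15, 5, 40, 60), (20, 10, 10, 40)]
# Rows male/female, columns overweight yes/no; strata: CHD, no CHD
overweight_male_by_chd = [(20, 15, 10, 40), (10, 5, 40, 60)]
# Rows heavy/light, columns CHD yes/no; strata: male, female
chd_overweight_by_sex = [(20, 10, 15, 5), (10, 40, 40, 60)]

or_mh_chd_male = mantel_haenszel_or(chd_male_by_weight)
or_mh_overweight_male = mantel_haenszel_or(overweight_male_by_chd)
or_mh_chd_overweight = mantel_haenszel_or(chd_overweight_by_sex)
print(round(or_mh_chd_male, 1), round(or_mh_overweight_male, 1), round(or_mh_chd_overweight, 2))
```

Each stratum contributes a_i*d_i/T_i to the numerator and b_i*c_i/T_i to the denominator, matching the formula on the slides.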
Model with log-linear models
Model 1: Independence
SAS code for a generalized linear model with Poisson distribution and log link function:
proc genmod data=loglinear;
  model total = Overweight IsMale HeartDis / dist=poisson link=log pred;
run;
Model 1 (main effects only):
$$\log(\text{count}) = \alpha + \beta_{overweight}\,\text{overweight} + \beta_{male}\,\text{isMale} + \beta_{chd}\,\text{HeartDisease}$$
Implies that the cell counts only depend on the MARGINAL probabilities (odds)
Independence model: parameters

                          Standard        Wald 95%           Chi-
Parameter    DF  Estimate    Error    Confidence Limits    Square   Pr > ChiSq
Intercept     1    3.9464   0.1170    3.7171     4.1758   1137.17     <.0001
Overweight    1   -0.4055   0.1443   -0.6884    -0.1226      7.89     0.0050
IsMale        1   -1.0986   0.1633   -1.4187    -0.7786     45.26     <.0001
HeartDis      1   -0.3023   0.1430   -0.5826    -0.0219      4.47     0.0346
Model 1:
Log (counts) = 3.95 -.41 (weight) – 1.1 (male) -.30 (heart disease)
Interpretation of Parameters: Marginal Odds
Model 1:
Log (counts) = 3.95 - .41 (weight) - 1.1 (male) - .30 (heart disease)
e^(-.41) = the (marginal) odds of being overweight = .67 = 80/120
e^(-1.1) = the odds of being male = .33 = 50/150
e^(-0.3) = the odds of having disease = .74 = 85/115
Marginal probabilities
P(overweight) = .67/(.67+1) = .40 (80/200)
P(male) = .33/(.33+1) = .25 (50/200)
P(heart disease) = .74/(.74+1) = .425 (85/200)
Predicted Counts
As examples:
The expected number of light men with heart disease = 200*(1-.40)(.25)(.425) under independence, or 12.75
The expected number of light men without disease = 200*(1-.40)(.25)(1-.425) under independence, or 17.25
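The same predictions can be computed two ways: from the marginal probabilities, or by exponentiating the Model 1 linear predictor. The Python sketch below is illustrative only; the function names are assumptions, and the coefficients are simply copied from the SAS output for Model 1.

```python
import math

N = 200
# Marginal probabilities taken from the observed table
p_overweight, p_male, p_chd = 80 / N, 50 / N, 85 / N

def expected_from_margins(overweight, male, chd):
    """Expected count when the three variables are mutually independent."""
    p = p_overweight if overweight else 1 - p_overweight
    p *= p_male if male else 1 - p_male
    p *= p_chd if chd else 1 - p_chd
    return N * p

def expected_from_coefficients(overweight, male, chd):
    """Same prediction from the Model 1 estimates in the SAS output."""
    return math.exp(3.9464 - 0.4055 * overweight - 1.0986 * male - 0.3023 * chd)

light_men_chd = expected_from_margins(0, 1, 1)
light_men_no_chd = expected_from_margins(0, 1, 0)
print(round(light_men_chd, 2), round(light_men_no_chd, 2))
print(round(expected_from_coefficients(0, 1, 1), 2))
```

The two routes agree up to the rounding of the printed coefficients, which is the sense in which the independence model depends only on the margins.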
Independence model: goodness-of-fit

Cells                      Observed    Pred
light/male/disease            15      12.75
light/male/no disease          5      17.25
light/female/disease          40      38.25
light/female/no disease       60      51.75
heavy/male/disease            20       8.5
heavy/male/no disease         10      11.5
heavy/female/disease          10      25.5
heavy/female/no disease       40      34.5

Pearson χ² = Σ (Observed - Pred)² / Pred = 36.5
4 df = cells - parameters in model = 8 - 4
Suggests independence model is a poor fit!!
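A goodness-of-fit statistic for the independence model can be computed directly from the observed and predicted columns. The sketch below is an illustration in Python, not the original SAS output; it computes both the Pearson chi-square and the deviance (G²), since either is commonly reported for log-linear models.

```python
import math

# Observed and predicted counts, in the order of the goodness-of-fit table
observed = [15, 5, 40, 60, 20, 10, 10, 40]
predicted = [12.75, 17.25, 38.25, 51.75, 8.5, 11.5, 25.5, 34.5]

# Pearson chi-square: sum of (O - E)^2 / E over the 8 cells
pearson = sum((o - e) ** 2 / e for o, e in zip(observed, predicted))
# Deviance (likelihood-ratio statistic): 2 * sum of O * log(O / E)
deviance = 2 * sum(o * math.log(o / e) for o, e in zip(observed, predicted))
# Degrees of freedom: 8 cells minus 4 fitted parameters
df = len(observed) - 4
print(round(pearson, 1), round(deviance, 1), df)
```

Both statistics are far larger than their 4 degrees of freedom, which is what "poor fit" means here.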
Predicted Table (note: marginal proportions don’t change)

                          Heart Disease
Body Weight      Sex      Yes      No      Total
Not overweight   Male     12.75   17.25      30
                 Female   38.25   51.75      90
                 Total    51      69        120
Overweight       Male      8.5    11.5       20
                 Female   25.5    34.5       60
                 Total    34      46         80
Predicted OR, CHD-Male

                       Heart Disease
              Sex      Yes      No      Total
All weights   Male     21.25   28.75      50
              Female   63.75   86.25     150
              Total    85     115        200

OR CHD-Male = 21.25*86.25/(28.75*63.75) = 1.0
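The model coefficients have an odds ratio interpretation…

Among non-overweight:
$$\begin{aligned}
\log(\#\text{ in cell } a) &= \alpha + \beta_{male}(1) + \beta_{chd}(1)\\
\log(\#\text{ in cell } d) &= \alpha + \beta_{male}(0) + \beta_{chd}(0)\\
\log(\#\text{ in cell } b) &= \alpha + \beta_{male}(1) + \beta_{chd}(0)\\
\log(\#\text{ in cell } c) &= \alpha + \beta_{male}(0) + \beta_{chd}(1)\\
\log(OR_{chd-male}) = \log\frac{ad}{bc} &= \log a + \log d - \log b - \log c\\
&= (\alpha + \beta_{male} + \beta_{chd}) + (\alpha) - (\alpha + \beta_{male}) - (\alpha + \beta_{chd}) = 0\\
OR_{chd-male} &= e^{0} = 1.0
\end{aligned}$$

Among overweight:
$$\begin{aligned}
\log(\#\text{ in cell } a) &= \alpha + \beta_{male}(1) + \beta_{chd}(1) + \beta_{overweight}(1)\\
\log(\#\text{ in cell } d) &= \alpha + \beta_{male}(0) + \beta_{chd}(0) + \beta_{overweight}(1)\\
\log(\#\text{ in cell } b) &= \alpha + \beta_{male}(1) + \beta_{chd}(0) + \beta_{overweight}(1)\\
\log(\#\text{ in cell } c) &= \alpha + \beta_{male}(0) + \beta_{chd}(1) + \beta_{overweight}(1)\\
\log(OR_{chd-male}) = \log\frac{ad}{bc} &= \log a + \log d - \log b - \log c\\
&= (\alpha + \beta_{male} + \beta_{chd} + \beta_{overweight}) + (\alpha + \beta_{overweight}) - (\alpha + \beta_{male} + \beta_{overweight}) - (\alpha + \beta_{chd} + \beta_{overweight}) = 0\\
OR_{chd-male} &= e^{0} = 1.0
\end{aligned}$$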
Coefficients represent the (log) predicted counts in each cell.
Coefficients have a direct odds ratio interpretation.
Calculate OR CHD-Male in each weight stratum.
This interpretation becomes more interesting/useful when interaction terms occur!
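The cancellation in the derivation can be checked numerically with the Model 1 estimates. This is a small Python sketch under the assumption that cells a, b, c, d are male/CHD, male/no CHD, female/CHD and female/no CHD; the function names are illustrative.

```python
# Model 1 estimates as reported in the SAS output
alpha, b_overweight, b_male, b_chd = 3.9464, -0.4055, -1.0986, -0.3023

def log_count(male, chd, overweight):
    """Linear predictor of the independence model for one cell."""
    return alpha + b_male * male + b_chd * chd + b_overweight * overweight

def log_or_chd_male(overweight):
    """log(ad/bc) = log a + log d - log b - log c within one weight stratum."""
    a = log_count(1, 1, overweight)
    b = log_count(1, 0, overweight)
    c = log_count(0, 1, overweight)
    d = log_count(0, 0, overweight)
    return a + d - b - c

print(log_or_chd_male(0), log_or_chd_male(1))
```

Every main effect appears once with a plus sign and once with a minus sign, so only interaction terms can make the log odds ratio differ from zero.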
Expected OR, CHD-Overweight

                        Heart Disease
              Weight    Yes    No    Total
All genders   Heavy      34    46      80
              Light      51    69     120
              Total      85   115     200

OR CHD-Overweight = 34*69/(46*51) = 1.0
Expected OR, Overweight-Male

                          Overweight
                 Sex      Yes    No    Total
All CHD status   Male      20    30      50
                 Female    60    90     150
                 Total     80   120     200

OR Overweight-Male = 20*90/(60*30) = 1.0
Model with Interaction:
Model 2 (main effects + interaction with gender):
This model corresponds to the case where heart disease and overweight are conditionally independent (conditioned on gender).
$$\log(\text{count}) = \alpha + \beta_{overweight}\,\text{overweight} + \beta_{male}\,\text{isMale} + \beta_{chd}\,\text{HeartDisease} + \beta_{chd*male}\,\text{isMale*HeartDisease} + \beta_{male*overweight}\,\text{isMale*overweight}$$
proc genmod data=loglinear;
  model total = Overweight IsMale HeartDis isMale*HeartDis isMale*Overweight / dist=poisson link=log pred;
run;
Implies that gender is associated with heart disease and with overweight, but overweight and heart disease are independent within each gender.
OR CHD-Male ≠ 1 and
OR Overweight-Male ≠ 1, but
OR CHD-Overweight = 1
Model 2:
Log (counts) = 4.20 - .69 (weight) - 2.4 (male) - .69 (heart disease) + 1.54 (if male and heartdis) + 1.1 (if overweight and male)
Analysis Of Parameter Estimates

                                 Standard        Wald 95%           Chi-
Parameter           DF  Estimate    Error    Confidence Limits    Square   Pr > ChiSq
Intercept            1    4.1997   0.1155    3.9734     4.4260   1322.81     <.0001
Overweight           1   -0.6931   0.1732   -1.0326    -0.3537     16.02     <.0001
IsMale               1   -2.4079   0.3317   -3.0580    -1.7579     52.71     <.0001
HeartDis             1   -0.6931   0.1732   -1.0326    -0.3537     16.02     <.0001
IsMale*HeartDis      1    1.5404   0.3539    0.8468     2.2341     18.95     <.0001
Overweight*IsMale    1    1.0986   0.3367    0.4388     1.7584     10.65     0.0011
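Interpretation of Parameters, Model 2
Model 2:
Log (counts) = 4.20 - .69 (weight) - 2.4 (male) - .69 (heart disease) + 1.54 (if male and heartdis) + 1.1 (if overweight and male)

$$\log(OR_{chd-male}) = \log\frac{ad}{bc} = \log a + \log d - \log b - \log c$$
Not overweight:
$$(\alpha + \beta_{male} + \beta_{chd} + \beta_{chd*male}) + (\alpha) - (\alpha + \beta_{male}) - (\alpha + \beta_{chd}) = \beta_{chd*male}$$
Overweight:
$$(\alpha + \beta_{male} + \beta_{chd} + \beta_{overweight} + \beta_{chd*male} + \beta_{male*overweight}) + (\alpha + \beta_{overweight}) - (\alpha + \beta_{male} + \beta_{overweight} + \beta_{male*overweight}) - (\alpha + \beta_{chd} + \beta_{overweight}) = \beta_{chd*male}$$
In both strata k:
$$OR_{chd-male}(k) = e^{\beta_{chd*male}} = e^{1.54} = 4.67$$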
OR estimate from predicted counts

Cells                      Observed    Pred
light/male/disease            15      14
light/male/no disease          5       6
light/female/disease          40      33.33
light/female/no disease       60      66.67
heavy/male/disease            20      21
heavy/male/no disease         10       9
heavy/female/disease          10      16.67
heavy/female/no disease       40      33.33

OR(k = light) = 14*66.67/(6*33.33) = 4.67
OR(k = heavy) = 21*33.33/(9*16.67) = 4.67
OR CHD-Male is not confounded by weight
Pearson χ² = 6.4 on 2 df
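OR Overweight-Male
Model 2:
Log (counts) = 4.20 - .69 (weight) - 2.4 (male) - .69 (heart disease) + 1.54 (if male and heartdis) + 1.1 (if overweight and male)

$$\log(OR_{overweight-male}) = \log\frac{ad}{bc} = \log a + \log d - \log b - \log c$$
No chd:
$$(\alpha + \beta_{male} + \beta_{overweight} + \beta_{male*overweight}) + (\alpha) - (\alpha + \beta_{male}) - (\alpha + \beta_{overweight}) = \beta_{male*overweight}$$
Chd:
$$(\alpha + \beta_{male} + \beta_{overweight} + \beta_{chd} + \beta_{chd*male} + \beta_{male*overweight}) + (\alpha + \beta_{chd}) - (\alpha + \beta_{male} + \beta_{chd} + \beta_{chd*male}) - (\alpha + \beta_{overweight} + \beta_{chd}) = \beta_{male*overweight}$$
In both strata k:
$$OR_{overweight-male}(k) = e^{\beta_{male*overweight}} = e^{1.1} = 3.0$$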
OR estimate from predicted counts

Cells                      Observed    Pred
light/male/disease            15      14
light/male/no disease          5       6
light/female/disease          40      33.33
light/female/no disease       60      66.67
heavy/male/disease            20      21
heavy/male/no disease         10       9
heavy/female/disease          10      16.67
heavy/female/no disease       40      33.33

OR(k = chd) = 21*33.33/(14*16.67) = 3.00
OR(k = no chd) = 9*66.67/(6*33.33) = 3.00
OR male-overweight is not confounded by chd
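OR CHD-Overweight
Model 2:
Log (counts) = 4.20 - .69 (weight) - 2.4 (male) - .69 (heart disease) + 1.54 (if male and heartdis) + 1.1 (if overweight and male)

$$\log(OR_{chd-overweight}) = \log\frac{ad}{bc} = \log a + \log d - \log b - \log c$$
k = male:
$$(\alpha + \beta_{male} + \beta_{overweight} + \beta_{chd} + \beta_{chd*male} + \beta_{male*overweight}) + (\alpha + \beta_{male}) - (\alpha + \beta_{male} + \beta_{overweight} + \beta_{male*overweight}) - (\alpha + \beta_{male} + \beta_{chd} + \beta_{chd*male}) = 0$$
k = female:
$$(\alpha + \beta_{overweight} + \beta_{chd}) + (\alpha) - (\alpha + \beta_{overweight}) - (\alpha + \beta_{chd}) = 0$$
In both strata k:
$$OR_{chd-overweight}(k) = e^{0} = 1.0$$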
Interpretation: Model 2
Overweight and heart disease are independent when you condition on gender.

                      Heart Disease
                      Yes       No
Men     Overweight    21         9
        Normal        14         6
OR = 21*6/(14*9) = 1.0

Women   Overweight    16.67    33.33
        Normal        33.33    66.67
OR = 16.67*66.67/(33.33*33.33) = 1.0
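For this conditional-independence model the fitted counts have a closed form: within each gender, multiply the overweight margin by the heart-disease margin and divide by the gender total. The Python sketch below illustrates that shortcut (it is not the SAS fit itself); the helper `n` and the key order (overweight, male, chd) are assumptions for this example.

```python
# Observed counts keyed by (overweight, male, chd); 1 = yes, 0 = no
observed = {
    (0, 1, 1): 15, (0, 1, 0): 5, (0, 0, 1): 40, (0, 0, 0): 60,
    (1, 1, 1): 20, (1, 1, 0): 10, (1, 0, 1): 10, (1, 0, 0): 40,
}

def n(ow=None, male=None, chd=None):
    """Sum observed counts over every variable left as None."""
    return sum(v for (o, m, c), v in observed.items()
               if ow in (None, o) and male in (None, m) and chd in (None, c))

# Fitted count = n(overweight, gender) * n(gender, chd) / n(gender)
fitted = {(o, m, c): n(ow=o, male=m) * n(male=m, chd=c) / n(male=m)
          for (o, m, c) in observed}

def odds_ratio(a, b, c, d):
    return a * d / (b * c)

# CHD-overweight OR within each gender (expected to be 1)
or_chd_overweight = {m: odds_ratio(fitted[1, m, 1], fitted[1, m, 0],
                                   fitted[0, m, 1], fitted[0, m, 0]) for m in (0, 1)}
# CHD-male OR within each weight stratum (expected to equal e^1.54)
or_chd_male = {o: odds_ratio(fitted[o, 1, 1], fitted[o, 1, 0],
                             fitted[o, 0, 1], fitted[o, 0, 0]) for o in (0, 1)}
print(fitted[0, 1, 1], or_chd_overweight, or_chd_male)
```

The fitted counts reproduce the Pred column (14, 6, 33.33, ...), and the stratum-specific odds ratios match the values read off the interaction coefficients.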
Model 3: only male and chd are related
Model 3 (main effects + single interaction):
This model corresponds to the case where heart disease and overweight, and gender and overweight, are conditionally independent.
$$\log(\text{count}) = \alpha + \beta_{overweight}\,\text{overweight} + \beta_{male}\,\text{isMale} + \beta_{chd}\,\text{HeartDisease} + \beta_{chd*male}\,\text{isMale*HeartDisease}$$
Output Model 3:
Log (counts) = 4.09 - .41 (weight) - 1.9 (male) - .69 (heart disease) + 1.54 (if male and heartdis)
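OR: Male and CHD

$$\log(OR_{chd-male}) = \log\frac{ad}{bc} = \log a + \log d - \log b - \log c$$
No overweight:
$$(\alpha + \beta_{male} + \beta_{chd} + \beta_{chd*male}) + (\alpha) - (\alpha + \beta_{male}) - (\alpha + \beta_{chd}) = \beta_{chd*male}$$
Overweight:
$$(\alpha + \beta_{male} + \beta_{chd} + \beta_{overweight} + \beta_{chd*male}) + (\alpha + \beta_{overweight}) - (\alpha + \beta_{male} + \beta_{overweight}) - (\alpha + \beta_{chd} + \beta_{overweight}) = \beta_{chd*male}$$
In both strata k:
$$OR_{chd-male}(k) = e^{\beta_{chd*male}} = e^{1.54} = 4.67$$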
Model 3:
Log (counts) = 4.09 - .41 (weight) - 1.9 (male) - .69 (heart disease) + 1.54 (if male and heartdis)

Cells                      Observed    Pred
light/male/disease            15       21
light/male/no disease          5        9
light/female/disease          40       30
light/female/no disease       60       60
heavy/male/disease            20       14
heavy/male/no disease         10        6
heavy/female/disease          10       20
heavy/female/no disease       40       40
Model 3: only male and chd are related
Collapses to…

          Male   Female
CHD        35      50
No CHD     15     100

OR = 35*100/(50*15) = 4.67
And… heart disease and overweight are independent, regardless of gender

          Overweight   Light
CHD           34         51
No CHD        46         69

OR = 34*69/(51*46) = 1.00
And… overweight and gender are independent, regardless of disease

          Overweight   Light
Male          20         30
Female        60         90

OR = 20*90/(30*60) = 1.00
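Model 3 treats overweight as independent of the gender-by-disease table, so its fitted counts are the overweight margin times the gender/CHD cell total, divided by N. The sketch below (Python, illustrative only; `n` and `collapsed_or` are assumed helper names) rebuilds the Pred column and the three collapsed odds ratios.

```python
# Observed counts keyed by (overweight, male, chd); 1 = yes, 0 = no
observed = {
    (0, 1, 1): 15, (0, 1, 0): 5, (0, 0, 1): 40, (0, 0, 0): 60,
    (1, 1, 1): 20, (1, 1, 0): 10, (1, 0, 1): 10, (1, 0, 0): 40,
}

def n(ow=None, male=None, chd=None):
    """Sum observed counts over every variable left as None."""
    return sum(v for (o, m, c), v in observed.items()
               if ow in (None, o) and male in (None, m) and chd in (None, c))

N = n()
# Fitted count = n(overweight) * n(gender, chd) / N
fitted = {(o, m, c): n(ow=o) * n(male=m, chd=c) / N for (o, m, c) in observed}

def collapsed_or(i, j):
    """OR between variables i and j after summing fitted counts over the third."""
    t = [[0.0, 0.0], [0.0, 0.0]]
    for cell, v in fitted.items():
        t[cell[i]][cell[j]] += v
    return t[1][1] * t[0][0] / (t[1][0] * t[0][1])

or_chd_male = collapsed_or(1, 2)
or_chd_overweight = collapsed_or(0, 2)
or_overweight_male = collapsed_or(0, 1)
print(fitted[0, 1, 1], round(or_chd_male, 2), round(or_chd_overweight, 2), round(or_overweight_male, 2))
```

Only the gender-by-disease association survives in the fitted table; the other two odds ratios collapse to 1.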
M4: All pair-wise interactions
proc genmod data=loglinear;
  model total = Overweight IsMale HeartDis isMale*HeartDis isMale*Overweight Overweight*HeartDis / dist=poisson link=log pred;
run;
Model 4 (main effects + all pairwise interactions):
No pair of variables is conditionally independent.
$$\log(\text{count}) = \alpha + \beta_{overweight}\,\text{overweight} + \beta_{male}\,\text{isMale} + \beta_{chd}\,\text{HeartDisease} + \beta_{chd*male}\,\text{isMale*HeartDisease} + \beta_{male*overweight}\,\text{isMale*overweight} + \beta_{chd*overweight}\,\text{HeartDis*overweight}$$
Model 4:
Log (counts) = 4.11 - .45 (weight) - 2.7 (male) - .45 (heart disease) + 1.8 (if male and heartdis) + 1.4 (if overweight and male) - .82 (if over and heartdis)
Analysis Of Parameter Estimates

                                   Standard        Wald 95%           Chi-
Parameter             DF  Estimate    Error    Confidence Limits    Square   Pr > ChiSq
Intercept              1    4.1103   0.1263    3.8627     4.3579   1058.30     <.0001
Overweight             1   -0.4458   0.1978   -0.8336    -0.0581      5.08     0.0242
IsMale                 1   -2.7153   0.3877   -3.4753    -1.9554     49.04     <.0001
HeartDis               1   -0.4458   0.1978   -0.8336    -0.0581      5.08     0.0242
IsMale*HeartDis        1    1.8213   0.3871    1.0627     2.5799     22.14     <.0001
Overweight*IsMale      1    1.4456   0.3797    0.7013     2.1899     14.49     0.0001
Overweight*HeartDis    1   -0.8239   0.3431   -1.4963    -0.1515      5.77     0.0163
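OR: Male and CHD

$$\log(OR_{chd-male}) = \log\frac{ad}{bc} = \log a + \log d - \log b - \log c$$
Not overweight:
$$(\alpha + \beta_{male} + \beta_{chd} + \beta_{chd*male}) + (\alpha) - (\alpha + \beta_{male}) - (\alpha + \beta_{chd}) = \beta_{chd*male}$$
Overweight:
$$(\alpha + \beta_{male} + \beta_{chd} + \beta_{overweight} + \beta_{chd*male} + \beta_{male*overweight} + \beta_{chd*overweight}) + (\alpha + \beta_{overweight}) - (\alpha + \beta_{male} + \beta_{overweight} + \beta_{male*overweight}) - (\alpha + \beta_{chd} + \beta_{overweight} + \beta_{chd*overweight}) = \beta_{chd*male}$$
In both strata k:
$$OR_{chd-male}(k) = e^{\beta_{chd*male}} = e^{1.8} = 6.0$$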
Model 4:
Log (counts) = 4.11 - .45 (weight) - 2.7 (male) - .45 (heart disease) + 1.8 (if male and heartdis) + 1.4 (if overweight and male) - .82 (if over and heartdis)
Corresponds to the M-H summary OR, stratified by overweight
OR: CHD and overweight
$$OR_{chd-overweight} = e^{\beta_{chd*overweight}} = e^{-.82} = .44$$
Model 4:
Log (counts) = 4.11 - .45 (weight) - 2.7 (male) - .45 (heart disease) + 1.8 (if male and heartdis) + 1.4 (if overweight and male) - .82 (if over and heartdis)
Corresponds to the M-H summary OR, stratified by gender
OR: male and overweight
$$OR_{male-overweight} = e^{\beta_{male*overweight}} = e^{1.4} = 4.2$$
Model 4:
Log (counts) = 4.11 - .45 (weight) - 2.7 (male) - .45 (heart disease) + 1.8 (if male and heartdis) + 1.4 (if overweight and male) - .82 (if over and heartdis)
Corresponds to the M-H summary OR, stratified by chd
OR estimate from predicted counts

Cells                      Observed    Pred
light/male/disease            15       16
light/male/no disease          5        4
light/female/disease          40       39
light/female/no disease       60       61
heavy/male/disease            20       19
heavy/male/no disease         10       11
heavy/female/disease          10       11
heavy/female/no disease       40       39

χ² = .571 on 1 df
GOOD FIT!
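Model 4 has no closed-form fitted counts, but they can be approximated by iterative proportional fitting (IPF), which repeatedly rescales the table to match each observed two-way margin. Note that this is a swapped-in technique for illustration; the slides obtain the fit from PROC GENMOD. The helper names and the fixed 200 cycles are assumptions of this sketch.

```python
from itertools import product

# Observed counts keyed by (overweight, male, chd); 1 = yes, 0 = no
observed = {
    (0, 1, 1): 15, (0, 1, 0): 5, (0, 0, 1): 40, (0, 0, 0): 60,
    (1, 1, 1): 20, (1, 1, 0): 10, (1, 0, 1): 10, (1, 0, 0): 40,
}
cells = list(product((0, 1), repeat=3))
pairs = [(0, 1), (0, 2), (1, 2)]  # the three two-way margins in Model 4

def margin(table, axes):
    """Sum a table over every variable not listed in axes."""
    out = {}
    for cell, n in table.items():
        key = tuple(cell[a] for a in axes)
        out[key] = out.get(key, 0.0) + n
    return out

targets = {axes: margin(observed, axes) for axes in pairs}
fitted = {cell: 1.0 for cell in cells}  # start with no association at all
for _ in range(200):  # fixed number of cycles, assumed ample for a 2x2x2 table
    for axes in pairs:
        current = margin(fitted, axes)
        for cell in cells:
            key = tuple(cell[a] for a in axes)
            fitted[cell] *= targets[axes][key] / current[key]

def or_chd_male(overweight):
    """CHD-male odds ratio within one weight stratum of the fitted table."""
    f = fitted
    return (f[overweight, 1, 1] * f[overweight, 0, 0]) / (f[overweight, 1, 0] * f[overweight, 0, 1])

pearson = sum((observed[c] - fitted[c]) ** 2 / fitted[c] for c in cells)
print(round(fitted[0, 1, 1], 2), round(or_chd_male(0), 2), round(or_chd_male(1), 2), round(pearson, 3))
```

Because the model has no three-way term, the fitted CHD-male odds ratio is the same in both weight strata, and the small chi-square reflects how close the fitted counts are to the observed ones.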
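The saturated model
Model 5 (saturated):
$$\log(\text{count}) = \alpha + \beta_{overweight}\,\text{overweight} + \beta_{male}\,\text{isMale} + \beta_{chd}\,\text{HeartDisease} + \beta_{chd*male}\,\text{isMale*HeartDisease} + \beta_{male*overweight}\,\text{isMale*overweight} + \beta_{chd*overweight}\,\text{HeartDis*overweight} + \beta_{chd*male*overweight}\,\text{isMale*HeartDisease*overweight}$$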
Perfect fit—but no degrees of freedom.