Logit model, logistic regression, and log-linear model
A comparison
Row i, column j. Variables: A, B
$\ln \lambda_{ij} = u + u_i^A + u_j^B + u_{ij}^{AB}$
or
$\ln \lambda_{ij} = \mu + \mu_i + \mu_j + \mu_{ij}$
or
$\ln \lambda = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \dots$
with A = TIME [early = 0; late = 1] and B = SEX [female = 0; male = 1]
EARLY is the reference category
Leaving home
Models of counts: log-linear model
Model 1: null model
$\mu = 4.887$; $\lambda_{ij} = \exp[4.887] = 132.5$ for all i and j (= 530/4)
Model 2: + TIME
$\mu = 4.649$
$\mu_i^A = 0.4291$ for ‘late’ (0 for ‘early’)
$\lambda_{ij} = \exp[4.649 + 0.4291\,t]$: 104.5 for ‘early’ (t = 0) and 160.5 for ‘late’ (t = 1)
or
$\lambda_{1j} = \exp[4.649] = 104.5$ for early
$\lambda_{2j} = \exp[4.649 + 0.4291] = 160.5$ for late
Model 3: TIME and SEX
$\mu = 4.697$; $\mu_2^A = 0.4291$; $\mu_2^B = -0.0982$
Reference categories: ‘early’ [$\mu_1^A = 0$] and ‘females’ [$\mu_1^B = 0$]
$\ln \lambda_{ij} = \mu + \mu_i^A + \mu_j^B$
Table: Predicted number of young adults leaving home by age and sex (unsaturated log-linear model)
Age      Females   Males   Total
< 20       109.6    99.4     209
≥ 20       168.4   152.6     321
Total      278     252       530
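Model 3: TIME and SEX (unsaturated log-linear model)
$\ln \lambda_{ij} = \mu + \mu_i^A + \mu_j^B$
$\lambda_{ij} = \exp[\mu + \mu_i^A + \mu_j^B]$
$\lambda_{11} = \exp[4.697] = 109.6$
$\lambda_{21} = \exp[4.697 + 0.4291] = 168.4$
$\lambda_{12} = \exp[4.697 - 0.0982] = 99.4$
$\lambda_{22} = \exp[4.697 + 0.4291 - 0.0982] = 152.6$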
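The predicted counts of the unsaturated model can be checked from the margins of the table alone. The following is a minimal Python sketch, assuming the observed counts 135, 74, 143 and 178 reported for this data set and reference categories ‘early’ and ‘females’; the function name `independence_fit` is hypothetical.

```python
import math

# Observed counts: rows are age (< 20, >= 20), columns are sex (females, males).
observed = [[135, 74], [143, 178]]

def independence_fit(table):
    """Expected counts and parameters of the main-effects log-linear model.

    Reference categories are the first row and the first column.
    """
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    n = sum(row_tot)
    # Under the main-effects model the expected count is row total * column total / n.
    expected = [[r * c / n for c in col_tot] for r in row_tot]
    mu = math.log(expected[0][0])                # overall effect (early, females)
    mu_time = math.log(row_tot[1] / row_tot[0])  # late relative to early
    mu_sex = math.log(col_tot[1] / col_tot[0])   # males relative to females
    return expected, (mu, mu_time, mu_sex)

expected, (mu, mu_time, mu_sex) = independence_fit(observed)
print([[round(x, 1) for x in row] for row in expected])
print(round(mu, 3), round(mu_time, 4), round(mu_sex, 4))
```

Each expected count equals exp of the sum of the parameters that apply to the cell, so the same numbers can be rebuilt from `mu`, `mu_time` and `mu_sex`.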
Model 4: TIME and SEX and TIME*SEX interaction (saturated log-linear model)
$\mu = 4.905$ (overall effect); $\mu_2^A = 0.0576$ (TIME); $\mu_2^B = -0.6012$ (GENDER); $\mu_{22}^{AB} = 0.8201$ (TIME*GENDER)
$\ln \lambda_{ij} = \mu + \mu_i^A + \mu_j^B + \mu_{ij}^{AB}$
or
$\ln \lambda_{ij} = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \beta_3 x_{3i}$
$x_{1i} = 0$ for < 20; $x_{1i} = 1$ for ≥ 20
$x_{2i} = 0$ for females; $x_{2i} = 1$ for males
$x_{3i} = 0$ for < 20 and females; $x_{3i} = 0$ for < 20 and males; $x_{3i} = 0$ for ≥ 20 and females; $x_{3i} = 1$ for ≥ 20 and males
The saturated model predicts perfectly
Model 4: TIME and SEX and TIME*SEX interaction
$\mu = 4.905$ (overall effect); $\mu_2^A = 0.05757$ (TIME(2)); $\mu_2^B = -0.6012$ (SEX(2)); $\mu_{22}^{AB} = 0.8201$ (TIME(2)*SEX(2))
$\ln \lambda_{ij} = \mu + \mu_i^A + \mu_j^B + \mu_{ij}^{AB}$
Table: Predicted number of young adults leaving home by age and sex (saturated log-linear model)
Age      Females   Males   Total
< 20         135      74     209
≥ 20         143     178     321
Total        278     252     530
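Model 4: TIME and SEX and TIME*SEX interaction
$\ln \lambda_{ij} = \mu + \mu_i^A + \mu_j^B + \mu_{ij}^{AB}$
$\lambda_{ij} = \exp[\mu + \mu_i^A + \mu_j^B + \mu_{ij}^{AB}]$
$\lambda_{11} = \exp[4.905] = 135$
$\lambda_{21} = \exp[4.905 + 0.0576] = 143$
$\lambda_{12} = \exp[4.905 - 0.6012] = 74$
$\lambda_{22} = \exp[4.905 + 0.0576 - 0.6012 + 0.8201] = 178$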
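Because the saturated model reproduces the observed counts, its parameters can be read directly off the table as log counts and log ratios. A minimal Python sketch under that assumption, with ‘< 20’ and ‘females’ as reference categories; the function name `saturated_params` is hypothetical.

```python
import math

# Observed counts: rows are age (< 20, >= 20), columns are sex (females, males).
observed = [[135, 74], [143, 178]]

def saturated_params(t):
    """Parameters of the saturated log-linear model for a 2x2 table."""
    mu = math.log(t[0][0])                  # overall effect (reference cell)
    mu_time = math.log(t[1][0] / t[0][0])   # >= 20 relative to < 20, females
    mu_sex = math.log(t[0][1] / t[0][0])    # males relative to females, < 20
    # Interaction: log of the cross-product (odds) ratio.
    mu_int = math.log(t[1][1] * t[0][0] / (t[1][0] * t[0][1]))
    return mu, mu_time, mu_sex, mu_int

mu, mu_time, mu_sex, mu_int = saturated_params(observed)
print(round(mu, 3), round(mu_time, 5), round(mu_sex, 4), round(mu_int, 4))

# Rebuild the cell for males aged >= 20 from the four parameters.
cell_22 = math.exp(mu + mu_time + mu_sex + mu_int)
print(round(cell_22))
```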
Log-linear and logit model
Log-linear model: $\ln \lambda_{ij} = \mu + \mu_i^A + \mu_j^B + \mu_{ij}^{AB}$
Select one variable as a dependent variable: response variable, e.g. does voting behaviour differ by sex
Are females more likely to vote conservative than males?
Logit model: $\ln \frac{\lambda_{1j}}{\lambda_{2j}} = \gamma + \gamma_j^B$
Political attitudes
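Males voting conservative rather than labour:
$\ln \frac{\lambda_{11}}{\lambda_{21}} = (\mu + \mu_1^A + \mu_1^B + \mu_{11}^{AB}) - (\mu + \mu_2^A + \mu_1^B + \mu_{21}^{AB})$
Females voting conservative rather than labour:
$\ln \frac{\lambda_{12}}{\lambda_{22}} = (\mu + \mu_1^A + \mu_2^B + \mu_{12}^{AB}) - (\mu + \mu_2^A + \mu_2^B + \mu_{22}^{AB})$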
Are females more likely to vote conservative than males?
Log-odds = logit
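$\ln \frac{\lambda_{11}}{\lambda_{21}} = \mu_1^A - \mu_2^A + \mu_{11}^{AB} - \mu_{21}^{AB} = 2\mu_1^A + 2\mu_{11}^{AB}$
$\ln \frac{\lambda_{12}}{\lambda_{22}} = \mu_1^A - \mu_2^A + \mu_{12}^{AB} - \mu_{22}^{AB} = 2\mu_1^A + 2\mu_{12}^{AB}$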
Effect coding (1)
$\ln \theta_1^B = \gamma + \gamma_1^B$
$\ln \theta_2^B = \gamma + \gamma_2^B$
A = Party; B = Sex
Are women more conservative than men? Do women vote more conservative than men? The odds ratio.
$\ln \frac{\theta_2^B}{\theta_1^B} = (\gamma + \gamma_2^B) - (\gamma + \gamma_1^B) = \gamma_2^B - \gamma_1^B$
If the log odds ratio is positive (odds ratio greater than 1), then the odds of voting conservative rather than labour are larger for women than for men. In that case, women vote more conservative than men.
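$\ln \theta_1^B = \gamma + \gamma_1^B + (\gamma_2^B - \gamma_1^B) \cdot 0$
$\ln \theta_2^B = \gamma + \gamma_1^B + (\gamma_2^B - \gamma_1^B) \cdot 1$
Logit model: $\eta = \text{logit}(p) = \ln \frac{p}{1-p} = \ln \frac{p_1}{p_2} = a + bx$
with $a = \gamma + \gamma_1^B$: log odds of reference category (males)
and $b = \gamma_2^B - \gamma_1^B$: log odds ratio (odds females / odds males)
with x = 0, 1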
The logit model as a regression model
• Select a response variable proportion
• Dependent variable of logit model is the log of (odds of) being in one category rather than in another.
• Number of observations in each subpopulation (males, females) is assumed to be fixed.
• Intercept (a) = log odds of reference category
• Slope (b) = log odds ratio
DATA
                Sex
Party           Male   Female   Total
Conservative     279      352     631
Labour           335      291     626
Total            614      643    1257
Logit model: descriptive statistics
Counts in terms of odds and odds ratio
Reference categories: Labour; Males
Odds of voting conservative rather than labour, by sex:
        Male     Female   Total
Odds    0.8328   1.2096   1.0080
Odds ratio (ref. cat.: males): 1.4524
Odds (females / males), by party:
Party           Odds     Odds ratio
Conservative    1.2616
Labour          0.8687
Total           1.0472   1.4524
F11 = 279
F21 = 335 = 279 * 335/279 = 279 / 0.8328
F12 = 352 = 279 * 352/279 = 279 * 1.2616
F22 = 291 = 279 * 352/279 * 291/352 = 279 * 1.2616 * [1/1.2096]
Proportion voting conservative, and odds of voting conservative rather than labour, by sex:
                Proportion         Odds
Party           Male    Female     Males    Females
Conservative    0.454   0.547      0.8328   1.2096
Are females more likely to vote conservative than males?
LOGIT MODEL: logit(p) = a + bX (males reference category)
         v = ln(odds)   exp(v) = odds   p
a          -0.18292        0.8328       0.454   Males: 0.833/(1 + 0.833)
b           0.37323        1.4524               Odds ratio
a + b       0.19031        1.2096       0.547   Females: 1.2096/(1 + 1.2096)
logit(p) = -0.18292 + 0.37323 X (with X = 0 for males and X = 1 for females)
If the number of males and the number of females are known, the counts can be calculated.
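The intercept and slope of this logit model follow directly from the four counts in the party by sex table. A minimal Python sketch, assuming males as reference category (X = 0) and females as X = 1; the helper name `prob_conservative` is hypothetical.

```python
import math

# Counts from the DATA table: party by sex.
conservative = {"male": 279, "female": 352}
labour = {"male": 335, "female": 291}

# Odds of voting conservative rather than labour, by sex.
odds = {s: conservative[s] / labour[s] for s in conservative}

a = math.log(odds["male"])                    # log odds of the reference category
b = math.log(odds["female"] / odds["male"])   # log odds ratio

def prob_conservative(x):
    """Inverse logit of a + b*x, with x = 0 for males and x = 1 for females."""
    return 1 / (1 + math.exp(-(a + b * x)))

print(round(a, 5), round(b, 5))
print(round(prob_conservative(0), 3), round(prob_conservative(1), 3))

# With the group sizes known, the counts can be recovered from the model.
n_male = conservative["male"] + labour["male"]
print(round(n_male * prob_conservative(0)))
```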
Logistic regression SPSS
Reference category: females (X = 1 for males and X = 0 for females)
Variable    Param    S.E.    Exp(param)
SEX(1)      .3732    .1133   1.4524
Constant   -.1903    .0792
Females voting labour: 1/[1+exp[-(-0.1903)]] = 45% (291/643; females ref. cat.)
Males voting labour: 1/[1+exp[-(-0.1903+0.3732)]] = 55% (335/614)
Different parameter coding: X = -0.5 for males and X = 0.5 for females
Variable    Param    S.E.    Exp(param)
SEX(1)     -.3732    .1133   0.6885
Constant   -.0037    .0567
Females voting labour: 1/[1+exp[-(-0.0037 + 0.5*(-0.3732))]] = 45% (291/643)
Males voting labour: 1/[1+exp[-(-0.0037 - 0.5*(-0.3732))]] = 55% (335/614)
Observation from a binomial distribution with parameter p and index m
The logit model and the logistic regression
Leaving parental home
Logit model and logistic regression
Number of young adults leaving home early: 209
Total number of young adults leaving home: 530
Probability of leaving home early: 209/530 = 0.394
REFERENCE CATEGORY: leaving home late (late = 0; early = 1)
ODDS of leaving home early versus late: 209/(530 - 209) = 0.6511
Logit of leaving home early: ln 0.6511 = -0.4291
Specify a model:
Logit model
$\text{Logit}(p) = \ln \frac{p}{1-p} = \ln \frac{0.394}{1 - 0.394} = -0.4291$
Logistic regression
$p = \frac{1}{1 + \exp[-(-0.4291)]} = 0.394$
Standard error:
$\sqrt{\frac{1}{209} + \frac{1}{321}} = 0.0889$
Confidence interval: -0.4291 ± 1.96 * 0.0889 = (-0.603, -0.255) ON LOGIT SCALE
and
$\left(\frac{1}{1+\exp[-(-0.6033)]},\ \frac{1}{1+\exp[-(-0.2549)]}\right) = (0.3536, 0.4366)$ ON PROBABILITY SCALE
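The standard error and the two versions of the confidence interval can be reproduced from the counts 209 (early) and 321 (late). A minimal Python sketch, assuming the usual large-sample standard error of a log odds, the square root of the sum of reciprocal counts; the helper name `inv_logit` is hypothetical.

```python
import math

early, late = 209, 321   # young adults leaving home early and late

logit_early = math.log(early / late)       # log odds of leaving early
se = math.sqrt(1 / early + 1 / late)       # large-sample s.e. of the log odds

# 95% interval on the logit scale.
lower, upper = logit_early - 1.96 * se, logit_early + 1.96 * se

def inv_logit(v):
    """Logistic transformation: logit scale to probability scale."""
    return 1 / (1 + math.exp(-v))

print(round(logit_early, 4), round(se, 4))
print(round(lower, 3), round(upper, 3))
print(round(inv_logit(lower), 4), round(inv_logit(upper), 4))
```

The interval is computed on the logit scale first and only then transformed, so it is not symmetric around 0.394 on the probability scale.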
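Relation logit and log-linear model
The unsaturated model
Log-linear model: $\ln \lambda_{ij} = \mu + \mu_i^A + \mu_j^B$
with $\mu_i^A$ the effect of timing and $\mu_j^B$ the effect of sex
Odds of leaving parental home late rather than early: females:
$\text{ODDS}_{21} = \frac{\lambda_{21}}{\lambda_{11}} = \frac{168.4}{109.6} \approx 1.536$
$\text{ODDS}_{21} = \frac{\lambda_{21}}{\lambda_{11}} = \frac{\exp[\mu + \mu_2^A + \mu_1^B]}{\exp[\mu + \mu_1^A + \mu_1^B]} = \exp[\mu_2^A - \mu_1^A] = \exp[0.4291 - 0] = 1.536$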
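Relation logit and log-linear model
The unsaturated model
Odds of leaving parental home late rather than early: males:
$\text{ODDS}_{21} = \frac{\lambda_{22}}{\lambda_{12}} = \frac{152.6}{99.4} \approx 1.536$
$\text{ODDS}_{21} = \frac{\lambda_{22}}{\lambda_{12}} = \frac{\exp[\mu + \mu_2^A + \mu_2^B]}{\exp[\mu + \mu_1^A + \mu_2^B]} = \exp[\mu_2^A - \mu_1^A] = \exp[0.4291 - 0] = 1.536$
$\text{Logit}(p) = \ln \frac{p_{late}}{p_{early}} = 0.4291$ for females and males.
Output of logit model gives same result (s.e. 0.0889)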
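Relation logit and log-linear model
The saturated model
Log-linear model: $\ln \lambda_{ij} = \mu + \mu_i^A + \mu_j^B + \mu_{ij}^{AB}$
with $\mu_i^A$ the effect of timing, $\mu_j^B$ the effect of sex and $\mu_{ij}^{AB}$ the effect of interaction between timing and sex
Odds of leaving parental home late rather than early: females (ref):
$\text{ODDS}_{21} = \frac{\lambda_{21}}{\lambda_{11}} = \frac{143}{135} = 1.059$
$\text{ODDS}_{21} = \frac{\exp[\mu + \mu_2^A + \mu_1^B + \mu_{21}^{AB}]}{\exp[\mu + \mu_1^A + \mu_1^B + \mu_{11}^{AB}]} = \exp[(\mu_2^A - \mu_1^A) + (\mu_{21}^{AB} - \mu_{11}^{AB})] = \exp[(0.0576 - 0) + (0 - 0)] = 1.059$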
Relation logit and log-linear model
The saturated model
Odds of leaving parental home late rather than early: males:
$\text{ODDS}_{22} = \frac{\lambda_{22}}{\lambda_{12}} = \frac{178}{74} = 2.405$
$\text{ODDS}_{22} = \frac{\exp[\mu + \mu_2^A + \mu_2^B + \mu_{22}^{AB}]}{\exp[\mu + \mu_1^A + \mu_2^B + \mu_{12}^{AB}]} = \exp[(\mu_2^A - \mu_1^A) + (\mu_{22}^{AB} - \mu_{12}^{AB})] = \exp[(0.0576 - 0) + (0.8201 - 0)] = 2.405$
ln 1.059 = 0.0573 is the overall effect of the logit model (females ref. cat.)
ln 2.405 = 0.8775 is the log odds for males
0.8775 - 0.0573 ≈ 0.8201 is the log ODDS RATIO (odds males / odds females [ref])
Logit model: logit(p) = 0.0573 + 0.8201 X (with X = 0 for females and 1 for males)
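The step from the saturated log-linear model to the logit model can be traced numerically: the timing parameter becomes the intercept and the interaction parameter becomes the slope. A minimal Python sketch, assuming the estimates 0.05757 (timing) and 0.8201 (timing by sex) quoted for the saturated model; the variable names are hypothetical.

```python
import math

# Saturated log-linear estimates (reference categories: < 20 and females).
mu_time = 0.05757   # effect of leaving late
mu_int = 0.8201     # interaction: late and male

# In the odds of late versus early, the overall effect and the sex effect
# cancel, leaving only the timing and interaction parameters.
odds_females = math.exp(mu_time)
odds_males = math.exp(mu_time + mu_int)

# Logit model: logit(p) = intercept + slope * X, X = 0 females, 1 males.
intercept = math.log(odds_females)
slope = math.log(odds_males / odds_females)

print(round(odds_females, 3), round(odds_males, 3))
print(round(intercept, 4), round(slope, 4))
```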
Logistic regression: probability of leaving home late
Logit model: $\text{Logit}(p) = \ln \frac{p}{1-p} = 0.8777 - 0.8201\,X$
X = 0 for males
X = 1 for females
Males: $p = \frac{1}{1 + \exp[-(0.8777)]} = 0.706 = \frac{178}{252}$
Females: $p = \frac{1}{1 + \exp[-(0.8777 - 0.8201)]} = 0.514 = \frac{143}{278}$
Table: Number of young adults leaving home by age and sex
Age      Females   Males   Total
< 20         135      74     209
≥ 20         143     178     321
Total        278     252     530
Dummy coding: reference category: (i) females; (ii) leaving home late
Logit model: $\text{Logit}(p_i) = \ln \frac{p_i}{1-p_i} = \beta_0 + \beta_1 x_i = -0.05757 - 0.8201\,x_i$
$x_i$ is 0 for females and 1 for males
LOGIT(p) is -0.05757 for females and -0.05757 - 0.8201 = -0.8777 for males
ODDS
Females (reference): exp[-0.05757] = 0.9440 = 135/143
Males: exp[-0.8777] = 0.4157 = 74/178
ODDS RATIO
ODDS males / ODDS females = exp[-0.8201] = 0.4404 = 0.4157/0.9440
Are males more likely to leave home early than females?
Logistic regression
$p_f = \frac{1}{1 + \exp[-(-0.05757)]} = 0.486$
$p_m = \frac{1}{1 + \exp[-(-0.05757 - 0.8201)]} = 0.294$
Dummy coding: ref. cat.: females, late
$\text{Logit}(p_i) = \ln \frac{p_i}{1-p_i} = \beta_0 + \beta_1 x_i = -0.05757 - 0.8201\,x_i$
Effect coding or marginal coding: females +1; males –1
$\text{Logit}(p_i) = \ln \frac{p_i}{1-p_i} = \beta_0 + \beta_1 x_i = -0.4676 + 0.4101\,x_i$
$x_i$ is 1 for females and -1 for males
Logit(p) is -0.4676 + 0.4101 = -0.0576 for females and -0.4676 + 0.4101 * (-1) = -0.8777 for males
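Dummy coding and effect coding describe the same two logits; only the meaning of the intercept and slope changes. A minimal Python sketch, assuming p is the probability of leaving home early and using the counts 135/143 (females) and 74/178 (males); the variable names are hypothetical.

```python
import math

# Log odds of leaving home early rather than late, by sex.
logit_f = math.log(135 / 143)
logit_m = math.log(74 / 178)

# Dummy coding: x = 0 for females (reference), x = 1 for males.
b0_dummy = logit_f
b1_dummy = logit_m - logit_f

# Effect coding: x = +1 for females, x = -1 for males.
b0_effect = (logit_f + logit_m) / 2   # mean of the two logits
b1_effect = (logit_f - logit_m) / 2   # half the difference

print(round(b0_dummy, 5), round(b1_dummy, 4))
print(round(b0_effect, 4), round(b1_effect, 4))

# Both codings return the same logit for males.
print(round(b0_dummy + b1_dummy * 1, 4), round(b0_effect + b1_effect * (-1), 4))
```

The effect-coded slope is half the size of the dummy-coded slope, with opposite sign, because the two groups are two units apart on x and females carry the positive code.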
The logistic regression in SPSS
Micro data and tabulated data
SPSS: Micro-data
• Micro-data: age at leaving home in months
• Crosstabs: Number leaving home by reason (row) and sex (column)
• Create variable: Age in years
• Age = TRUNC[(month-1)/12]
• Create variable: TIMING2 based on MONTH:
• TIMING2 = 1 (early) if month ≤ 240 & reason < 4
• TIMING2 = 2 (late) if month > 240 & reason < 4
• For analysis: select cases that are NOT censored: SELECT CASES with reason < 4
SPSS: tabulated data
• Number of observations: WEIGHT cases (in data)
• No difference between model for tabulated data and micro-data
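That weighted tabulated data and micro-data lead to the same model can be illustrated outside SPSS. The following Python sketch fits logit(p) = b0 + b1 X by Newton-Raphson, once on four weighted cells and once on 530 individual records. It assumes p is the probability of leaving home late and X = 1 for females; the micro-data here are rebuilt from the table rather than taken from the original file, and the function name `fit_logit` is hypothetical.

```python
import math

def fit_logit(records, iterations=25):
    """Newton-Raphson fit of logit(p) = b0 + b1*x.

    records: list of (x, y, weight) with y = 1 (late) or 0 (early).
    """
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y, w in records:
            p = 1 / (1 + math.exp(-(b0 + b1 * x)))
            g0 += w * (y - p)           # score for the intercept
            g1 += w * (y - p) * x       # score for the slope
            v = w * p * (1 - p)         # weight in the information matrix
            h00 += v
            h01 += v * x
            h11 += v * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (information matrix) * delta = score.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Tabulated data: one record per cell, weighted by its count.
tabulated = [(1, 1, 143), (1, 0, 135), (0, 1, 178), (0, 0, 74)]
# Micro-data: one record per person, each with weight 1.
micro = [(x, y, 1) for x, y, w in tabulated for _ in range(w)]

b0_tab, b1_tab = fit_logit(tabulated)
b0_mic, b1_mic = fit_logit(micro)
print(round(b0_tab, 4), round(b1_tab, 4))
print(round(b0_mic, 4), round(b1_mic, 4))
```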
The logistic regression in SPSS
SPSS: regression/logistic
Note: Dependent variable: TIMING2 (p = probability of leaving home LATE)
Covariate: sex (CATEGORICAL)
Logit(p) = ln[p/(1-p)] = 0.8777 – 0.8201 X with males as reference category
Males coded 0; hence X is 1 for females
OUTPUT SPSS:
---------------------- Variables in the Equation -----------
Variable    B        S.E.    Wald      df   Sig     R        Exp(B)
SEX(1)     -.8201    .1831   20.0598   1    .0000   -.1594   .4404
Constant    .8777    .1383   40.2681   1    .0000
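The B, S.E., Wald and Exp(B) columns of this output can be checked by hand from the four cell counts. A minimal Python sketch, assuming the large-sample standard errors based on reciprocal counts and the Wald statistic as the squared ratio of estimate to standard error; the variable names are hypothetical.

```python
import math

# Counts of young adults by timing and sex.
late = {"females": 143, "males": 178}
early = {"females": 135, "males": 74}

# p = probability of leaving home late; males are the reference category.
const = math.log(late["males"] / early["males"])
b_sex = math.log(late["females"] / early["females"]) - const

# Large-sample standard errors from reciprocal counts.
se_const = math.sqrt(1 / late["males"] + 1 / early["males"])
se_sex = math.sqrt(sum(1 / n for n in (*late.values(), *early.values())))

# Wald statistic: (estimate / s.e.) squared, 1 degree of freedom.
wald_sex = (b_sex / se_sex) ** 2
wald_const = (const / se_const) ** 2
exp_b = math.exp(b_sex)   # odds ratio, females relative to males

print(round(b_sex, 4), round(se_sex, 4), round(wald_sex, 2), round(exp_b, 4))
print(round(const, 4), round(se_const, 4), round(wald_const, 2))
```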
Related models
• Poisson distribution: counts have Poisson distribution (total number not fixed)
• Poisson regression
• Log-linear model: model of count data (log of counts)
• Binomial and multinomial distributions: counts follow multinomial distribution (total number is fixed)
• Logit model: model of proportions [and odds (log of odds)]
• Logistic regression
• Log-rate model: log-linear model with OFFSET (constant term)
Parameters of these models are related
Construct your own logistic regression model
Specify the logistic regression for this observation
• School leavers: 50% are males and 50% are females
• 70% of school leavers find a job within a year
• 60% of those who find a job are females
1. Construct table
Table: Duration of job search among school leavers, by sex
Duration of search    Females   Males   Total
Less than 1 year           42      28      70
1 year and more             8      22      30
Total                      50      50     100
84% of females find a job within a year against 56% of males
2. Determine reference categories
• Duration of job search: One year or more
• Sex: Males
3. Odds ratios
• Males (ref. cat.): 28/22 = 1.273
• Females: 42/8 = 5.250
• Odds ratio: 5.250/1.273 = 4.125
Logit model
• p = probability of finding a job within a year
• Logit(p) = ln[p/(1-p)] = a + b x
• with x Sex (0 for males and 1 for females)
– a = ln 1.273 = 0.241
– b = ln 4.125 = 1.417
• Logit model for these data:
logit(p) = 0.241 + 1.417 x
Logistic regression
• For males: $p = \frac{1}{1 + \exp[-(0.241 + 1.417 \cdot 0)]} = 0.56$
• For females: $p = \frac{1}{1 + \exp[-(0.241 + 1.417 \cdot 1)]} = 0.84$
• 84% of females find a job within a year against 56% of males
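The whole exercise, from the three stated percentages to the fitted probabilities, fits in a few lines. A minimal Python sketch, assuming a cohort of 100 school leavers as in the table, with males as reference category; the helper name `prob_job` is hypothetical.

```python
import math

n = 100                          # cohort size used in the table
females = males = n // 2         # 50% males, 50% females
found = round(0.70 * n)          # 70% find a job within a year
found_f = round(0.60 * found)    # 60% of those are females
found_m = found - found_f

# Odds of finding a job within a year, by sex.
odds_m = found_m / (males - found_m)
odds_f = found_f / (females - found_f)

a = math.log(odds_m)             # intercept: log odds for males
b = math.log(odds_f / odds_m)    # slope: log odds ratio

def prob_job(x):
    """Probability of finding a job within a year; x = 0 males, 1 females."""
    return 1 / (1 + math.exp(-(a + b * x)))

print(found_f, found_m)
print(round(a, 3), round(b, 3))
print(round(prob_job(0), 2), round(prob_job(1), 2))
```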
Confidence interval
• S.e. saturated model:
– s.e. of a [0.2412] = $\sqrt{\frac{1}{28} + \frac{1}{22}} = 0.2849$
– s.e. of b [1.417] = $\sqrt{\frac{1}{42} + \frac{1}{8} + \frac{1}{28} + \frac{1}{22}} = 0.4796$
Confidence interval
• S.e. null model:
– s.e. of ln[0.7/(1-0.7)] = s.e. of 0.8473 = $\sqrt{\frac{1}{70} + \frac{1}{30}} = 0.2182$
• Conf. interval: 0.8473 +/- 1.96 * 0.2182 = (0.420, 1.275) on logit scale
or (0.603, 0.782) on probability scale
• The p for males (0.56) and females (0.84) are significantly different: both lie outside this interval
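The standard errors and the null-model interval for the job search data can be recomputed from the cell counts. A minimal Python sketch, assuming the reciprocal-count approximation for the standard error of a log odds and a log odds ratio; the helper names `se_log_odds` and `inv_logit` are hypothetical.

```python
import math

def se_log_odds(*counts):
    """Large-sample s.e. of a log odds (2 counts) or log odds ratio (4 counts)."""
    return math.sqrt(sum(1 / c for c in counts))

def inv_logit(v):
    """Logit scale to probability scale."""
    return 1 / (1 + math.exp(-v))

# Saturated model: intercept (males) and slope (females versus males).
se_a = se_log_odds(28, 22)
se_b = se_log_odds(42, 8, 28, 22)

# Null model: one common log odds for all 100 school leavers.
logit_all = math.log(70 / 30)
se_null = se_log_odds(70, 30)
ci_logit = (logit_all - 1.96 * se_null, logit_all + 1.96 * se_null)
ci_prob = tuple(inv_logit(v) for v in ci_logit)

print(round(se_a, 4), round(se_b, 4), round(se_null, 4))
print([round(v, 3) for v in ci_logit], [round(v, 3) for v in ci_prob])
```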
SPSS output: logistic regression
Parameters of logistic regression
Variable    B         S.E.     Wald     df   Sig (p-value)   R
SEX(1)     -1.4168    0.4795   8.7297   1    0.0031          -0.2347
Constant   -0.2412    0.2849   0.7165   1    0.3973
p = probability that duration of search is more than one year
Simple coding (SPSS): reference categories:
• Dependent variable: timing: early
• Factor: sex: males
Parameters
SPSS output: logistic regression
Parameters of logistic regression
p = probability that duration of search is more than one year
Deviation coding (SPSS):
• Dependent variable: timing: early
• Factor: females (-1); males (+1)
Parameters
Variable    B         S.E.     Wald      df   Sig (p-value)   R
SEX(1)     -0.7084    0.2398   8.7297    1    0.0031          -0.2347
Constant   -0.9496    0.2398   15.6849   1    0.0001
SPSS and GLIM: a comparison
TIMING2 * SEX Crosstabulation
Count
              SEX
TIMING2    Females   Males   Total
Early          135      74     209
Late           143     178     321
Total          278     252     530
SPSS: UNSATURATED LOG-LINEAR MODEL: Parameter Estimates
                                              Asymptotic 95% CI
           Parameter   Estimate   SE      Z-value   Lower   Upper
               1        5.0280    .0721   69.75     4.89    5.17
TIMI(1)        2         .0982    .0870    1.13     -.07     .27
               3         .0000    .        .         .       .
SEX(1)         4        -.4291    .0889   -4.83     -.60    -.25
               5         .0000    .        .         .       .
GLIM: UNSATURATED LOG-LINEAR MODEL
[o]       estimate     s.e.      parameter
[o]  1     4.697       0.08058   1
[o]  2     0.4291      0.08887   TIMI(2)
[o]  3    -0.09819     0.08697   SEX(2)
[o] scale parameter taken as 1.000
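SPSS and GLIM report different numbers for the unsaturated model because they use different reference categories, yet both sets of estimates imply the same predicted counts. A minimal Python sketch of that check. It rests on two assumptions that are not stated in the output: GLIM takes the first categories (early, females) as reference and SPSS the last (late, males); and, judging by magnitude and standard error against the GLIM estimates, the SPSS value .0982 is treated as the sex contrast and -.4291 as the timing contrast, although the SPSS listing attaches the TIMI(1) and SEX(1) labels the other way round. The function names are hypothetical.

```python
import math

# GLIM estimates; assumed reference categories: early and females.
glim = {"const": 4.697, "time_late": 0.4291, "sex_male": -0.09819}
# SPSS estimates; assumed reference categories: late and males.
spss = {"const": 5.0280, "sex_female": 0.0982, "time_early": -0.4291}

def glim_cell(late, male):
    """Predicted count from the GLIM parameterisation (late, male are 0 or 1)."""
    return math.exp(glim["const"] + glim["time_late"] * late
                    + glim["sex_male"] * male)

def spss_cell(late, male):
    """Predicted count from the SPSS parameterisation (late, male are 0 or 1)."""
    return math.exp(spss["const"] + spss["time_early"] * (1 - late)
                    + spss["sex_female"] * (1 - male))

for late in (0, 1):
    for male in (0, 1):
        print(late, male, round(glim_cell(late, male), 1),
              round(spss_cell(late, male), 1))
```

Small differences in the last digit come from the rounding of the published estimates, not from the models.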
SPSS: SATURATED MODEL
                                              Asymptotic 95% CI
           Parameter   Estimate   SE      Z-value   Lower   Upper
               1        5.1846    .0748   69.27     5.04    5.33
TIMI(1)        2        -.2183    .1121   -1.95     -.44    1.497E-03
               3         .0000    .        .         .       .
SEX(1)         4        -.8738    .1379   -6.33    -1.14    -.60
               5         .0000    .        .         .       .
TIMI*SEX       6         .8164    .1827    4.47      .46    1.17
               7         .0000    .        .         .       .
               8         .0000    .        .         .       .
               9         .0000    .        .         .
GLIM: SATURATED MODEL
d e$
[o]       estimate     s.e.      parameter
[o]  1     4.905       0.08607   1
[o]  2     0.05757     0.1200    TIMI(2)
[o]  3    -0.6012      0.1446    SEX(2)
[o]  4     0.8201      0.1831    TIMI(2).SEX(2)
[o] scale parameter taken as 1.000