Generalized Linear Model — Imperial College London (bm508/teaching/appstats/lecture5.pdf)
TRANSCRIPT
Generalized Linear Model
- All models we have seen so far deal with continuous outcome variables with no restriction on their expectations, and (most) have assumed that mean and variance are unrelated (i.e. the variance is constant).
- Many outcomes of interest do not satisfy this.
- Examples: binary outcomes, Poisson count outcomes.
- A Generalized Linear Model (GLM) is a model with two ingredients: a link function and a variance function.
- The link relates the means of the observations to the predictors: linearization.
- The variance function relates the means to the variances.
Generalized Linear Model
- The data involve 462 males between the ages of 15 and 64. The outcome Y is the presence (Y = 1) or absence (Y = 0) of heart disease.

Coefficients:
                 Estimate Std. Error z value Pr(>|z|)
(Intercept)    -5.9207616  1.3265724  -4.463 8.07e-06 ***
sbp             0.0076602  0.0058574   1.308 0.190942
tobacco         0.0777962  0.0266602   2.918 0.003522 **
ldl             0.1701708  0.0597998   2.846 0.004432 **
adiposity       0.0209609  0.0294496   0.712 0.476617
famhistPresent  0.9385467  0.2287202   4.103 4.07e-05 ***
typea           0.0376529  0.0124706   3.019 0.002533 **
obesity        -0.0661926  0.0443180  -1.494 0.135285
alcohol         0.0004222  0.0045053   0.094 0.925346
age             0.0441808  0.0121784   3.628 0.000286 ***
Generalized Linear Model
Motivation
- Classical linear model:
      Y = Xβ + ε
  where ε ∼ N(0, σ²). That means
      Y ∼ N(Xβ, σ²)
- In the GLM, we specify instead that
      Y ∼ P(Xβ)
  for some other distribution P (e.g. Bernoulli, Poisson).
Generalized Linear Model
We write the GLM as
      E(Yi) = µi
and
      ηi = g(µi) = Xiβ
where g is called the link function and the distribution of Yi belongs to an exponential family.
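As an illustration (my own, not from the slides), the two most common links and their inverses can be written directly; the function names below are made up for this sketch:

```python
import math

# Two common link functions g and their inverses g^{-1} (illustrative sketch).
def logit(mu):
    """Canonical link for binary data: eta = log(mu / (1 - mu))."""
    return math.log(mu / (1.0 - mu))

def inv_logit(eta):
    """Inverse logit: mu = exp(eta) / (1 + exp(eta))."""
    return math.exp(eta) / (1.0 + math.exp(eta))

def log_link(mu):
    """Canonical link for Poisson counts: eta = log(mu)."""
    return math.log(mu)

def inv_log(eta):
    return math.exp(eta)

# The link maps the restricted mean scale (e.g. (0, 1) for a probability)
# to the unrestricted linear-predictor scale:
eta = logit(0.8)       # about 1.386
mu = inv_logit(eta)    # back to 0.8
```

The point of the link is exactly this change of scale: Xβ can be any real number, while µ may be constrained.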
Generalized Linear Model
- The exponential family density is specified by two components, the canonical parameter θ and the dispersion parameter φ.
- Let Y = (Yi), i = 1, …, n, be a sequence of random variables. Yi has an exponential family density if
      f(yi; θi, φ) = exp{ [yiθi − b(θi)] / ai(φ) + c(yi, φ) }
  where the functions b, c are specific to each distribution and ai(φ) = φ/wi.
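For instance, the Poisson density can be put in this form by expanding the logarithm (a standard identification, not spelled out on the slide):

```latex
f(y;\theta) = \frac{\mu^{y} e^{-\mu}}{y!}
            = \exp\bigl\{\, y \log\mu - \mu - \log y! \,\bigr\},
```

so θ = log µ (the canonical parameter), b(θ) = e^θ = µ, a(φ) = 1 (φ = 1, wi = 1) and c(y, φ) = −log y!.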
Generalized Linear Model
Law        Density f(y)                                        µ      σ²
------------------------------------------------------------------------------
B(m, p)    (m choose y) p^y (1 − p)^(m−y),  y ∈ {0, …, m}      mp     mp(1 − p)
P(µ)       µ^y e^(−µ) / y!,  y ∈ {0, 1, 2, …}                  µ      µ
N(µ, σ²)   (2πσ²)^(−1/2) exp{ −(y − µ)² / (2σ²) } dy           µ      σ²
IG(µ, λ)   √(λ / (2πy³)) exp{ −λ(y − µ)² / (2µ²y) } dy         µ      µ³/λ
Generalized Linear Model
- We write
      ℓ(y; θ, φ) = log f(y; θ, φ)
  for the log-likelihood function of Y.
- Using the facts that
      E(∂ℓ/∂θ) = 0
  and
      Var(∂ℓ/∂θ) = −E(∂²ℓ/∂θ²)
- we have
      E(y) = b′(θ)
  and
      Var(y) = b″(θ) a(φ)
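A short check of these formulas, filling in a step the slide leaves implicit: differentiating ℓ(y; θ, φ) = [yθ − b(θ)]/a(φ) + c(y, φ) with respect to θ gives

```latex
\frac{\partial \ell}{\partial \theta} = \frac{y - b'(\theta)}{a(\phi)},
\qquad
\frac{\partial^2 \ell}{\partial \theta^2} = -\frac{b''(\theta)}{a(\phi)} .
```

The first identity then reads 0 = [E(y) − b′(θ)]/a(φ), i.e. E(y) = b′(θ); the second reads Var(y)/a(φ)² = b″(θ)/a(φ), i.e. Var(y) = b″(θ)a(φ).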
Generalized Linear Model
- Gaussian case:
      f(y; θ, φ) = (1/(σ√(2π))) exp[ −(y − µ)² / (2σ²) ]
                 = exp( (yµ − µ²/2)/σ² − (1/2)(y²/σ² + log(2πσ²)) )
  We can write θ = µ, φ = σ², a(φ) = φ, b(θ) = θ²/2 and
      c(y, φ) = −(1/2)(y²/σ² + log(2πσ²))
- Binomial case:
      f(y; θ, φ) = (n choose y) µ^y (1 − µ)^(n−y)
                 = exp( y log(µ/(1 − µ)) + n log(1 − µ) + log (n choose y) )
  We can write θ = log(µ/(1 − µ)), b(θ) = −n log(1 − µ) = n log(1 + e^θ) and
      c(y, φ) = log (n choose y)
Generalized Linear Model
Recall that in ordinary linear models the MLE of β satisfies
      β̂ = (Xᵀ X)⁻¹ Xᵀ Y
if X has full rank. In a GLM, the MLE β̂ does not exist in closed form and is computed approximately via iteratively reweighted least squares.
Generalized Linear Model
- For n observations, the log-likelihood function is
      L(β) = Σ_{i=1}^n ℓ(yi; θi, φ)
- Computing, by the chain rule,
      ∂ℓi/∂βj = (∂ℓi/∂θi)(∂θi/∂µi)(∂µi/∂ηi)(∂ηi/∂βj)
              = [(yi − µi)/(φ/wi)] · [1/b″(θi)] · [1/g′(µi)] · xij
- Since Var(yi) = b″(θi)φ/wi, the likelihood equations are
      Σ_{i=1}^n xij (yi − µi) / (g′(µi) Var(yi)) = 0,   j = 1, …, p
- Put
      W = diag{ g′(µi)² Var(yi) }_{i=1,…,n}
  and
      ∂µ/∂η = diag{ ∂µi/∂ηi }_{i=1,…,n} = diag{ 1/g′(µi) }_{i=1,…,n}
Generalized Linear Model
- In matrix form, these likelihood equations are
      Xᵀ W⁻¹ (∂µ/∂η)⁻¹ (y − µ) = 0
- These equations are non-linear in β and require an iterative method (e.g. Newton-Raphson).
- The Fisher information matrix is
      I(β) = Xᵀ W⁻¹ X
  and in general terms
      [I(β)]jk = −E( ∂²L(β)/∂βj∂βk ) = Σ_{i=1}^n [xij xik / Var(yi)] (∂µi/∂ηi)²
Generalized Linear Model
Let µ0 = Y be the initial estimate. Then set η0 = g(µ0) and form the adjusted variable
      Z0 = η0 + (Y − µ0) · ∂η/∂µ |_{µ=µ0}
Calculate β1 by the weighted least squares regression of Z0 on X, that is,
      β1 = argmin_β (Z0 − Xβ)ᵀ W0⁻¹ (Z0 − Xβ)
So
      β1 = (Xᵀ W0⁻¹ X)⁻¹ Xᵀ W0⁻¹ Z0
Set
      η1 = Xβ1,   µ1 = g⁻¹(η1)
Repeat until the changes in βm are sufficiently small.
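The loop above can be sketched in a few lines. Below is a minimal illustration for Poisson regression with the log link, where ∂η/∂µ = 1/µ and the working weights are W⁻¹ = diag{µi}; the toy data and variable names are made up for this sketch, not from the lecture:

```python
import math

# Toy count data (hypothetical): y grows roughly exponentially in x.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.0, 1.0, 2.0, 4.0, 7.0]

mu = list(y)                        # mu^0 = Y (slide's initialisation; all y_i > 0 here)
eta = [math.log(m) for m in mu]     # eta^0 = g(mu^0), log link
b0, b1 = float("inf"), float("inf")
for _ in range(50):
    # adjusted variable Z = eta + (y - mu) * d eta/d mu, with d eta/d mu = 1/mu
    z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]
    # weighted least squares of Z on (1, x) with weights mu (diagonal of W^{-1})
    sw = sum(mu)
    swx = sum(m * xi for m, xi in zip(mu, x))
    swxx = sum(m * xi * xi for m, xi in zip(mu, x))
    swz = sum(m * zi for m, zi in zip(mu, z))
    swxz = sum(m * xi * zi for m, xi, zi in zip(mu, x, z))
    det = sw * swxx - swx * swx                     # 2x2 normal equations by hand
    b0_new = (swxx * swz - swx * swxz) / det
    b1_new = (sw * swxz - swx * swz) / det
    done = abs(b0_new - b0) + abs(b1_new - b1) < 1e-10
    b0, b1 = b0_new, b1_new
    if done:
        break
    eta = [b0 + b1 * xi for xi in x]                # eta^{m+1} = X beta^{m+1}
    mu = [math.exp(e) for e in eta]                 # mu^{m+1} = g^{-1}(eta^{m+1})

mu_hat = [math.exp(b0 + b1 * xi) for xi in x]
```

At convergence the score equations for the canonical log link, Σ(yi − µ̂i) = 0 and Σ xi(yi − µ̂i) = 0, hold, which is a convenient correctness check. (For counts of zero, the initialisation µ0 = Y would need an offset such as Y + 0.5 to avoid log 0.)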
Generalized Linear Model
Estimation
- In theory, βm → β̂ as m → ∞, but in practice the algorithm may fail to converge.
- Under some conditions, β̂ is asymptotically distributed as
      N(β, I(β)⁻¹)
- In practice, the asymptotic covariance matrix of β̂ is estimated by
      φ̂ (Xᵀ Wm⁻¹ X)⁻¹
  where Wm is the weight matrix from the m-th (final) iteration.
- If φ is unknown, it is estimated by
      φ̂ = (1/(n − p)) Σ_{i=1}^n wi (yi − µ̂i)² / V(µ̂i)
  where V(µi) = Var(yi)/a(φ) = wi Var(yi)/φ.
Generalized Linear Model
- Confidence interval:
      CIα(βj) = [ β̂j − u_{1−α/2} σ̂_{βj} ;  β̂j + u_{1−α/2} σ̂_{βj} ]
  where u_{1−α/2} is the 1 − α/2 quantile of N(0, 1) and
      σ̂²_{βj} = [ I(β̂)⁻¹ ]jj.
- To test the hypothesis
      H0 : βj = 0  against  H1 : βj ≠ 0,
  if φ is known we use
      |β̂j| / √( φ (Xᵀ Wm⁻¹ X)⁻¹(j, j) ) ∼ N(0, 1)
  and if φ is unknown,
      |β̂j| / √( φ̂ (Xᵀ Wm⁻¹ X)⁻¹(j, j) ) ∼ t_{n−p}
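These Wald statistics are what R reports in the z value and Pr(>|z|) columns of a glm summary. As a check, the intercept row of the heart-disease output shown earlier can be reproduced with the standard library alone:

```python
import math

# Wald test for a single coefficient: z = estimate / std.error,
# two-sided p-value = 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
estimate, std_error = -5.9207616, 1.3265724   # intercept row of the R output

z = estimate / std_error
p_value = math.erfc(abs(z) / math.sqrt(2.0))

print(round(z, 3))   # -4.463, as in the z value column
print(p_value)       # about 8.07e-06, as in the Pr(>|z|) column
```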
Generalized Linear Model
Goodness-of-Fit
      H0 : the true model is M   versus   H1 : the true model is Msat
- The likelihood ratio statistic for this hypothesis is called the deviance.
- For any submodel M,
      dev(M) = 2( ℓ̂sat − ℓ̂M )
- Under H0, dev(M) → χ²_{psat − p}.
Generalized Linear Model
Goodness-of-Fit
- The scaled deviance for a GLM is
      D(y; µ̂) = 2[ ℓ(µ^sat, φ; y) − ℓ(µ̂, φ; y) ]
               = Σ_{i=1}^n 2wi{ yi(θ(µi^sat) − θ(µ̂i)) − b(θ(µi^sat)) + b(θ(µ̂i)) } / φ
               = Σ_{i=1}^n D*(yi; µ̂i) / φ
               = D*(y; µ̂) / φ
  where µ^sat is the fit of the saturated model and D* is the (unscaled) deviance.
Generalized Linear Model
Tests
- We use the deviance to compare two nested models having p1 and p2 parameters respectively, where p1 < p2. Let µ̂1 and µ̂2 denote the corresponding MLEs.
- Then, approximately,
      D(y; µ̂1) − D(y; µ̂2) ∼ χ²_{p2 − p1}
- If φ is unknown,
      [ D*(y; µ̂1) − D*(y; µ̂2) ] / [ (p2 − p1) φ̂ ] ∼ F_{p2 − p1, n − p2}
Generalized Linear Model
Goodness-of-Fit
- The deviance residuals for a given model are
      di = sign(yi − µ̂i) √D*(yi; µ̂i)
- A poorly fitting point will make a large contribution to the deviance, so |di| will be large.
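As a concrete case (a standard result, not derived on the slide), for a Poisson model D*(yi; µ̂i) = 2[yi log(yi/µ̂i) − (yi − µ̂i)], giving the deviance residual below; the numeric values are hypothetical:

```python
import math

def poisson_deviance_residual(y, mu):
    """d_i = sign(y - mu) * sqrt(D*(y; mu)) for the Poisson family,
    where D*(y; mu) = 2 * [y * log(y / mu) - (y - mu)] (with y log y := 0 at y = 0)."""
    term = y * math.log(y / mu) if y > 0 else 0.0
    d_star = 2.0 * (term - (y - mu))
    return math.copysign(math.sqrt(max(d_star, 0.0)), y - mu)

# A perfectly fitted point contributes nothing to the deviance:
print(poisson_deviance_residual(4.0, 4.0))   # 0.0
```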
Generalized Linear Model
Diagnostics
- The Pearson residuals are defined by
      ri = (yi − µ̂i) / √( (1 − hii) V(µ̂i) )
  where hii is the i-th diagonal element of
      H = X (Xᵀ Wm⁻¹ X)⁻¹ Xᵀ Wm⁻¹
- The deviance residuals are
      εi = sign(yi − µ̂i) √( D*(yi; µ̂i) / (1 − hii) )
Generalized Linear Model
Diagnostics
- The Anscombe residual is defined as a transformation of the Pearson residual:
      rAi = [ t(yi) − t(µ̂i) ] / [ t′(µ̂i) √( φ V(µ̂i)(1 − hii) ) ]
  The aim in introducing the function t is to make the residuals as Gaussian as possible. We consider
      t(x) = ∫₀ˣ V(µ)^(−1/3) dµ
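For the Poisson family V(µ) = µ, so the integral works out to t(x) = (3/2) x^(2/3), the classical Anscombe transform (a standard computation, sketched here):

```python
def anscombe_t_poisson(x):
    """t(x) = integral_0^x mu^(-1/3) d mu = (3/2) * x^(2/3) when V(mu) = mu."""
    return 1.5 * x ** (2.0 / 3.0)

# t is increasing with t(0) = 0; e.g. t(8) = 1.5 * 8^(2/3) = 1.5 * 4 = 6
print(anscombe_t_poisson(8.0))
```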
Generalized Linear Model
Diagnostics
- Influential points, using Cook's distance:
      Ci = (1/p)(β̂(i) − β̂)ᵀ Xᵀ Wm X (β̂(i) − β̂) ≈ r²i hii / [ p(1 − hii)² ]
  where β̂(i) is the estimate computed with the i-th observation deleted.
- Leverage points: if hii > 2p/n (or hii > 3p/n), we consider the i-th point to have high leverage.
Generalized Linear Model
Model Selection
- Model selection can be done using the AIC and BIC.
- Forward, backward and stepwise approaches can be used.
Generalized Linear Model
Logistic regression
- Logistic regression is a generalization of regression used when the outcome Y is binary (0 or 1).
- As an example, we assume that
      P(Yi = 1 | Xi) = e^(β0 + β1Xi) / (1 + e^(β0 + β1Xi))
- Note that
      E(Yi | Xi) = P(Yi = 1 | Xi)
Generalized Linear Model
Logistic regression
- Define the logit function
      logit(z) = log( z / (1 − z) )
- We can write
      logit(πi) = β0 + β1Xi
  where πi = P(Yi = 1 | Xi).
- The extension to several covariates is
      logit(πi) = β0 + Σ_{j=1}^p βj xij
Generalized Linear Model
How do we estimate the parameters?
- The model can be fit using maximum likelihood.
- The likelihood function is
      L(β) = Π_{i=1}^n f(yi | Xi; β) = Π_{i=1}^n πi^yi (1 − πi)^(1−yi)
- The estimator β̂ has to be found numerically.
Generalized Linear Model
Usually, we use iteratively reweighted least squares:
- First set starting values β(0).
- At step k, compute
      πi = e^(Xiβ(k)) / (1 + e^(Xiβ(k)))
- Define the weight matrix W whose i-th diagonal element is πi(1 − πi).
- Define the adjusted response vector
      Z = Xβ(k) + W⁻¹(Y − π)
- Take
      β(k+1) = (Xᵀ W X)⁻¹ Xᵀ W Z,
  which is the weighted linear regression of Z on X.
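These steps translate almost line for line into code. Below is a minimal self-contained sketch for one covariate plus an intercept (the 2×2 weighted least squares system is solved by hand; the data are made-up toy values, not from the lecture):

```python
import math

# Toy data (hypothetical): binary outcome vs one covariate, not separable.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [0.0, 0.0, 1.0, 0.0, 1.0, 1.0]

b0, b1 = 0.0, 0.0                                        # starting values beta^(0)
for _ in range(100):
    eta = [b0 + b1 * xi for xi in x]                     # X beta^(k)
    pi = [1.0 / (1.0 + math.exp(-e)) for e in eta]       # pi_i
    w = [p * (1.0 - p) for p in pi]                      # diagonal of W
    # adjusted response Z = X beta^(k) + W^{-1}(Y - pi)
    z = [e + (yi - p) / wi for e, yi, p, wi in zip(eta, y, pi, w)]
    # beta^(k+1) = (X^T W X)^{-1} X^T W Z, written out for the 2x2 case
    sw = sum(w)
    swx = sum(wi * xi for wi, xi in zip(w, x))
    swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    swz = sum(wi * zi for wi, zi in zip(w, z))
    swxz = sum(wi * xi * zi for wi, xi, zi in zip(w, x, z))
    det = sw * swxx - swx * swx
    b0_new = (swxx * swz - swx * swxz) / det
    b1_new = (sw * swxz - swx * swz) / det
    done = abs(b0_new - b0) + abs(b1_new - b1) < 1e-10
    b0, b1 = b0_new, b1_new
    if done:
        break

pi_hat = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
```

At convergence the score equations Σ(yi − π̂i) = 0 and Σ xi(yi − π̂i) = 0 hold, a convenient correctness check. (If the classes are perfectly separable, the MLE does not exist and the iterations diverge, which is one way the algorithm can fail to converge.)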
Generalized Linear Model
Model selection and diagnostics
- Diagnostics: the Pearson χ² residuals
      (Yi − π̂i) / √( π̂i(1 − π̂i) )
- The deviance residuals
      sign(Yi − π̂i) √( 2[ Yi log(Yi/π̂i) + (1 − Yi) log((1 − Yi)/(1 − π̂i)) ] )
Generalized Linear Model
- To fit this model, we use the glm command.

Call:
glm(formula = chd ~ ., family = binomial, data = SAheart)

Deviance Residuals:
    Min      1Q  Median      3Q     Max
-1.8320 -0.8250 -0.4354  0.8747  2.5503

Coefficients:
                 Estimate Std. Error z value Pr(>|z|)
(Intercept)    -5.9207616  1.3265724  -4.463 8.07e-06 ***
row.names      -0.0008844  0.0008950  -0.988 0.323042
sbp             0.0076602  0.0058574   1.308 0.190942
tobacco         0.0777962  0.0266602   2.918 0.003522 **
ldl             0.1701708  0.0597998   2.846 0.004432 **
adiposity       0.0209609  0.0294496   0.712 0.476617
famhistPresent  0.9385467  0.2287202   4.103 4.07e-05 ***
typea           0.0376529  0.0124706   3.019 0.002533 **
obesity        -0.0661926  0.0443180  -1.494 0.135285
alcohol         0.0004222  0.0045053   0.094 0.925346
age             0.0441808  0.0121784   3.628 0.000286 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 596.11 on 461 degrees of freedom
Residual deviance: 471.16 on 451 degrees of freedom
AIC: 493.16

Number of Fisher Scoring iterations: 5
Generalized Linear Model
- To select covariates, we apply stepwise selection (the step command) to the fitted model.

Start:  AIC=493.16
chd ~ row.names + sbp + tobacco + ldl + adiposity + famhist +
    typea + obesity + alcohol + age

            Df Deviance    AIC
- alcohol    1   471.17 491.17
- adiposity  1   471.67 491.67
- row.names  1   472.14 492.14
- sbp        1   472.88 492.88
<none>           471.16 493.16
- obesity    1   473.47 493.47
- ldl        1   479.65 499.65
- tobacco    1   480.27 500.27
- typea      1   480.75 500.75
- age        1   484.76 504.76
- famhist    1   488.29 508.29

etc...

Step:  AIC=487.69
chd ~ tobacco + ldl + famhist + typea + age

          Df Deviance    AIC
<none>         475.69 487.69
- ldl      1   484.71 494.71
- typea    1   485.44 495.44
- tobacco  1   486.03 496.03
- famhist  1   492.09 502.09
- age      1   502.38 512.38
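The AIC column in these tables is just deviance + 2 × (number of fitted parameters), which is exact here because the saturated log-likelihood of a binary logistic model is zero. The full model has 11 parameters (intercept plus 10 covariates) and the final model 6, so the printed values can be verified directly:

```python
def aic(deviance, n_params):
    # AIC = deviance + 2 * number of fitted parameters (binomial glm, phi = 1)
    return deviance + 2 * n_params

print(round(aic(471.16, 11), 2))   # 493.16, the "Start" AIC above
print(round(aic(475.69, 6), 2))    # 487.69, the final "Step" AIC above
```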
Generalized Linear Model
- Suppose Yi ∼ Binomial(ni, πi).
- We can fit the logistic model as before:
      logit(πi) = Xiβ
- Pearson residuals:
      ri = (Yi − ni π̂i) / √( ni π̂i(1 − π̂i) )
- Deviance residuals:
      di = sign(Yi − µ̂i) √( 2[ Yi log(Yi/µ̂i) + (ni − Yi) log((ni − Yi)/(ni − µ̂i)) ] )
  where µ̂i = ni π̂i.
Generalized Linear Model
Goodness-of-Fit test
- The Pearson statistic
      χ² = Σi r²i
- and the deviance
      D = Σi d²i
- both have an approximate χ²_{n−p} distribution if the model is correct.
Generalized Linear Model
- To fit this model, we use the glm command.

Call:
glm(formula = cbind(y, n - y) ~ x, family = binomial)

Deviance Residuals:
     Min       1Q   Median       3Q      Max
-0.70832 -0.29814  0.02996  0.64070  0.91132

Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept) -14.73119    1.83018  -8.049 8.35e-16 ***
x             0.24785    0.03031   8.178 2.89e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 137.7204 on 7 degrees of freedom
Residual deviance:   2.6558 on 6 degrees of freedom
AIC: 28.233

Number of Fisher Scoring iterations: 4
Generalized Linear Model
To test the correctness of the model:

> pvalue = 1 - pchisq(out$dev, out$df.residual)
> print(pvalue)
[1] 0.8506433
> r = resid(out, type="deviance")
> p = out$linear.predictors
> plot(p, r, pch=19, xlab="linear predictor", ylab="deviance residuals")
> print(sum(r^2))
[1] 2.655771
> cooks.distance(out)
           1            2            3            4            5
0.0004817501 0.3596628502 0.0248918197 0.1034462077 0.0242941942
           6            7            8
0.0688081629 0.0014847981 0.0309767612

Note that the sum of squared deviance residuals gives back the deviance, and the p-value is large, indicating no evidence of a lack of fit.
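The 1 - pchisq(...) call can be reproduced without R. For even degrees of freedom 2m there is a standard identity P(χ²_{2m} > x) = e^(−x/2) Σ_{k=0}^{m−1} (x/2)^k / k!; the sketch below applies it to the deviance 2.6558 on 6 df from the output above:

```python
import math

def chisq_upper_tail_even_df(x, df):
    """P(chi^2_df > x) for even df, via the chi-square/Poisson identity."""
    assert df % 2 == 0 and df > 0
    m = df // 2
    return math.exp(-x / 2.0) * sum((x / 2.0) ** k / math.factorial(k) for k in range(m))

p_value = chisq_upper_tail_even_df(2.6558, 6)
print(round(p_value, 4))   # 0.8506, matching R's 1 - pchisq output
```

R's pchisq of course handles any df; this closed form exists only for even df, which is enough to check the output here.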