Lecture 6 STK3100 - Categorical responses
22. September 2014
Plan for lecture:
1. GLM for binary and binomial data
2. Link functions
3. Parameter interpretation in logistic regression
4. Parameter interpretation with other link functions
5. Goodness-of-fit: Hosmer-Lemeshow-test
6. ROC curves
7. Overdispersion
Lecture 6 STK3100 - Categorical responses – p. 1
Binomial responses
• Assume Yi ∼ Bin(ni, πi) and independent
• The data belong to the exponential family with pmf
f(y, θi, φi) = (ni choose y) πi^y (1 − πi)^(ni−y) = c(y) exp(yθi − a(θi))
• θi = log(πi/(1 − πi))
• a(θi) = ni log(1 + exp(θi))
• dispersion parameter φi = 1 and known, and c(y) = (ni choose y)
• E[Yi] = a′(θi) = ni exp(θi)/(1 + exp(θi)) = ni πi = µi
• Var[Yi] = φi a′′(θi) = ni exp(θi)/(1 + exp(θi))² = ni πi(1 − πi).
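The mean and variance identities above can be checked numerically. A small Python sketch (not part of the lecture; the example values n = 20, π = 0.3 are illustrative):

```python
import math

def binomial_family_moments(n_i, pi_i):
    """Mean and variance of Y ~ Bin(n_i, pi_i) computed via the
    exponential-family derivatives a'(theta) and a''(theta)."""
    theta = math.log(pi_i / (1 - pi_i))                       # canonical parameter
    mean = n_i * math.exp(theta) / (1 + math.exp(theta))      # a'(theta)
    var = n_i * math.exp(theta) / (1 + math.exp(theta)) ** 2  # a''(theta), phi = 1
    return mean, var

mean, var = binomial_family_moments(20, 0.3)
# agrees with n*pi = 6.0 and n*pi*(1 - pi) = 4.2
```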
Lecture 6 STK3100 - Categorical responses – p. 2
Binomial or binary responses?
• Assume Yi ∼ Bin(ni, πi) and independent
• The data can also be represented as
Yi,j = 1 for j = 1, ..., Yi and Yi,j = 0 for j = Yi + 1, ..., ni
• which gives us binary data
• Note: If Yi,j ∼ Bin(1, πi), but the Yi,j are dependent within group i, the sum is not binomial
• Positive dependence gives overdispersion
• Grouping, and then taking overdispersion into account, may be a way to handle such data
Lecture 6 STK3100 - Categorical responses – p. 3
Binary responses or grouped data?
• Yi ∼ Bin(ni, πi), i = 1, ..., k, or
• Yi′ ∼ Bin(1, πi′), i′ = 1, ..., n′ = Σ_{i=1}^{k} ni
Estimation is equivalent for both representations
AIC for comparing models is also equivalent
The deviance goodness-of-fit test becomes different!
• ∆ ∼ χ²_{n−q}
• n = k for grouped data
• n = n′ = Σ_{i=1}^{k} ni for binary data
• To trust the deviance goodness-of-fit test, we require:
Yi ∼ Bin(ni, πi) where niπi > 5 and ni(1 − πi) > 5
Lecture 6 STK3100 - Categorical responses – p. 4
Ex: Beetles
> dim(beetle)
[1] 8 3
> glm(cbind(Dode,Ant-Dode)˜Dose,family=binomial,data=beetle)
Coefficients:
(Intercept) Dose
-60.72 34.27
Degrees of Freedom: 7 Total (i.e. Null); 6 Residual
Null Deviance: 284.2
Residual Deviance: 11.23 AIC: 41.43
> dim(beetle2)
[1] 481 2
> glm(Dode˜Dose,family=binomial,data=beetle2)
Coefficients:
(Intercept) Dose
-60.72 34.27
Degrees of Freedom: 480 Total (i.e. Null); 479 Residual
Null Deviance: 645.4
Residual Deviance: 372.5 AIC: 376.5
Lecture 6 STK3100 - Categorical responses – p. 5
GLM for binomial or binary (ni = 1) responses
• Independent Yi with success probability πi
• Linear predictor ηi = βTxi
• Link function g(πi) = ηi
The logit link function is the most common:
g(πi) = log(πi/(1 − πi)) = logit(πi)
which gives
πi = exp(ηi)/(1 + exp(ηi)) = g−1(ηi)
This is the canonical link function, i.e. the canonical parameter θi = ηi.
The logit link yields logistic regression
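The link and its inverse can be sketched directly (Python, illustrative):

```python
import math

def logit(p):
    """g(pi) = log(pi / (1 - pi))"""
    return math.log(p / (1 - p))

def inv_logit(eta):
    """g^{-1}(eta) = exp(eta) / (1 + exp(eta))"""
    return math.exp(eta) / (1 + math.exp(eta))

# logit maps (0, 1) onto the whole real line; inv_logit maps back
p = 0.8
eta = logit(p)   # log(4)
```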
Lecture 6 STK3100 - Categorical responses – p. 6
Requirements for link function for binomial data
g() should
• be smooth (can be differentiated)
• be strongly monotone (increasing)
• take values over all real numbers
• g([0, 1]) = R, or equivalently g−1(R) = [0, 1]
• g−1(η) should be the cumulative distribution function (CDF) of a continuous distribution on R
The logit link satisfies these requirements. g−1(η) is the CDF of the "standard" logistic distribution with density
exp(η)/(1 + exp(η))²
Lecture 6 STK3100 - Categorical responses – p. 7
CDF and pdf in "standard" logistic distribution
[Figure: CDF F(x) ("Kumulativ logistisk fordeling", cumulative logistic distribution) and density f(x) ("Tetthet logistisk fordeling", logistic density) of the "standard" logistic distribution, x from −6 to 6.]
The pdf is symmetric around x = 0, hence the expectation is 0.
The variance is
∫_{−∞}^{∞} x² exp(x)/(1 + exp(x))² dx = π²/3 ≈ 3.29
(so the standard deviation is π/√3 ≈ 1.814)
Lecture 6 STK3100 - Categorical responses – p. 8
Probit link: Inverse of CDF for standard normal
g(π) = Φ−1(π)
where Φ(y) = ∫_{−∞}^{y} (1/√(2π)) exp(−x²/2) dx
• Since the pdf of the standard normal distribution is also symmetric around 0, the probit link often gives results comparable with those from logistic regression
• However, the logistic distribution has heavier tails than the normal, and in some situations the probit link may be better
Lecture 6 STK3100 - Categorical responses – p. 9
CDF and pdf for logit and probit
[Figure: CDFs ("Kumulative fordelingsfunksjoner") and densities ("Tettheter") for the logit and probit links, x from −6 to 6; legend: "logistisk" (logistic) and "probit (skalert)" (scaled probit).]
Lecture 6 STK3100 - Categorical responses – p. 10
Comparing estimates from logit and probit
E[Yi] = g−1(ηi) ≈ g−1(0) + (g−1)′(0)ηi
= 0.5 + 0.25 ηi^l for logit
= 0.5 + Φ′(0) ηi^p for probit
I.e., for ηi ≈ 0, since Φ′(0) = 1/√(2π),
ηi^l ≈ (Φ′(0)/0.25) ηi^p = √(8/π) ηi^p ≈ 1.6 ηi^p
or
βj^l ≈ 1.6 βj^p
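The scaling constant follows from the two slopes at 0; a quick Python check (illustrative — compare with the empirical ratio ≈ 1.74 for the beetle data):

```python
import math

logit_slope = 0.25                         # derivative of exp(eta)/(1+exp(eta)) at eta = 0
probit_slope = 1 / math.sqrt(2 * math.pi)  # Phi'(0), the standard normal density at 0

# logit coefficients are larger by roughly this factor
ratio = probit_slope / logit_slope         # = sqrt(8/pi) ≈ 1.596
```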
Lecture 6 STK3100 - Categorical responses – p. 11
R-output beetles: Logit vs. Probit
> logfit<-glm(cbind(Dode,Ant-Dode)˜Dose,binomial(link=logit),beetle)
> profit<-glm(cbind(Dode,Ant-Dode)˜Dose,binomial(link=probit),beetle)
> logfit
Coefficients:
(Intercept) Dose
-60.72 34.27
Degrees of Freedom: 7 Total (i.e. Null); 6 Residual
Null Deviance: 284.2
Residual Deviance: 11.23 AIC: 41.43
> profit
Coefficients:
(Intercept) Dose
-34.94 19.73
Degrees of Freedom: 7 Total (i.e. Null); 6 Residual
Null Deviance: 284.2
Residual Deviance: 10.12 AIC: 40.32
> logfit$coef/profit$coef
(Intercept) Dose
1.737999 1.737147
Lecture 6 STK3100 - Categorical responses – p. 12
Akaike information criterion (AIC)
AIC = −2l + 2q
• q = number of parameters in the model
• l is the maximum log-likelihood under the model
• AIC is used for model selection
• The model with the lowest AIC is the best according to this criterion
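A minimal sketch of the criterion (Python; the log-likelihood values are hypothetical):

```python
def aic(loglik, q):
    """AIC = -2*l + 2*q: penalized fit, lower is better."""
    return -2 * loglik + 2 * q

# Two hypothetical models with the same maximized log-likelihood:
# the one with fewer parameters gets the lower (better) AIC
aic_small = aic(-18.7, 2)
aic_big = aic(-18.7, 3)
```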
Lecture 6 STK3100 - Categorical responses – p. 13
R-output beetles: Logit
> summary(logfit)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.5941 -0.3944 0.8329 1.2592 1.5940
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -60.717 5.181 -11.72 <2e-16 ***
Dose 34.270 2.912 11.77 <2e-16 ***
---
Signif. codes: 0 ’ *** ’ 0.001 ’ ** ’ 0.01 ’ * ’ 0.05 ’.’ 0.1 ’ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 284.202 on 7 degrees of freedom
Residual deviance: 11.232 on 6 degrees of freedom
AIC: 41.43
Lecture 6 STK3100 - Categorical responses – p. 14
R-output beetles: Probit
> summary(profit)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.5714 -0.4703 0.7501 1.0632 1.3449
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -34.935 2.648 -13.19 <2e-16 ***
Dose 19.728 1.487 13.27 <2e-16 ***
---
Signif. codes: 0 ’ *** ’ 0.001 ’ ** ’ 0.01 ’ * ’ 0.05 ’.’ 0.1 ’ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 284.202 on 7 degrees of freedom
Residual deviance: 10.120 on 6 degrees of freedom
AIC: 40.318
Number of Fisher Scoring iterations: 4
Lecture 6 STK3100 - Categorical responses – p. 15
clog-log-link based on the Gumbel distribution
The link ηi = g(πi) = log(− log(1 − πi)) is called the
"complementary log-log link"
Its inverse is given by
πi = 1 − exp(− exp(ηi)) = F(ηi)
which is the CDF of the (standardized) Gumbel distribution
Properties:
• not symmetric
• light tail towards +∞
• tail like the logistic distribution towards −∞
• expectation = minus the Euler–Mascheroni constant ≈ −0.577
• variance π²/6 ≈ 1.645
Lecture 6 STK3100 - Categorical responses – p. 16
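The link, its inverse, and the asymmetry can be sketched in Python (illustrative):

```python
import math

def cloglog(p):
    """eta = log(-log(1 - pi))"""
    return math.log(-math.log(1 - p))

def inv_cloglog(eta):
    """pi = 1 - exp(-exp(eta)): the standardized Gumbel CDF"""
    return 1 - math.exp(-math.exp(eta))

# Asymmetry: F(0) = 1 - exp(-1) ≈ 0.632, not 0.5 as for logit/probit
f_at_zero = inv_cloglog(0.0)
```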
CDF and pdf in Gumbel distribution
[Figure: CDF ("Kumulativ fordelingsfunksjon Gumbel") and density ("Tetthet Gumbel") of the standardized Gumbel distribution, x from −4 to 4.]
Lecture 6 STK3100 - Categorical responses – p. 17
R-output beetles: Clog-log
> clogfit<-glm(cbind(Dode,Ant-Dode)˜Dose,binomial(link=cloglog),beetle)
> summary(clogfit)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -39.572 3.240 -12.21 <2e-16 ***
Dose 22.041 1.799 12.25 <2e-16 ***
Null deviance: 284.2024 on 7 degrees of freedom
Residual deviance: 3.4464 on 6 degrees of freedom
AIC: 33.644
Number of Fisher Scoring iterations: 4
> logfit$coef/clogfit$coef
(Intercept) Dose
1.534342 1.554832
Lecture 6 STK3100 - Categorical responses – p. 18
Comparing link functions by AIC
> AIC(logfit,profit,clogfit)
df AIC
logfit 2 41.43027
profit 2 40.31780
clogfit 2 33.64448
The clog-log link gives the lowest AIC. Since all three models have the same number of parameters, it also gives the highest log-likelihood, i.e. the best fit.
Lecture 6 STK3100 - Categorical responses – p. 19
Fitted probabilities for beetle data
with logit link and clog-log link:
[Figure: fitted probabilities vs. dose (log_10); y-axis: "andel dode biller" (proportion of dead beetles), 0 to 1; curves: "logistisk" (logistic) and "cloglog".]
The clog-log link fits the observed proportions better than the logit link, with residual deviance 3.45 for clog-log and 11.23 for logit.
Lecture 6 STK3100 - Categorical responses – p. 20
Including a second-order term of Dose
> form = cbind(Dode,Ant-Dode)˜Dose+I(Doseˆ2)
> logfit2<-glm(form,binomial(link=logit),beetle)
> profit2<-glm(form,binomial(link=probit),beetle)
> clogfit2<-glm(form,binomial(link=cloglog),beetle)
> AIC(clogfit,logfit2,profit2,caufit2,clogfit2)
df AIC
clogfit 2 33.64448
logfit2 3 35.39294
profit2 3 35.29647
clogfit2 3 35.60866
Lecture 6 STK3100 - Categorical responses – p. 21
Fitted probabilities for beetle data
including also models with quadratic terms ofDose
[Figure: fitted probabilities vs. dose (log_10); y-axis: "andel dode biller" (proportion of dead beetles), 0 to 1; curves: "logistisk" (logistic), "cloglog", and "logistisk, 2. gradsledd" (logistic with quadratic term).]
clog-log link: The quadratic term yields residual deviance 3.19, compared to 3.44 with only the linear term.
Lecture 6 STK3100 - Categorical responses – p. 22
Interpretation of parameters in logistic regression
The odds for an event is defined as:
Odds = π/(1 − π)
In logistic regression, with η = βTx, the odds are
Odds = [exp(η)/(1 + exp(η))] / [1 − exp(η)/(1 + exp(η))] = [exp(η)/(1 + exp(η))] / [1/(1 + exp(η))] = exp(η)
i.e.
η = log Odds
Lecture 6 STK3100 - Categorical responses – p. 23
Interpretation of parameters in logistic regression:
Odds-ratio
• Let x′k = xk for k ≠ j, and x′j = xj + 1, i.e.
x′ − x = (0, . . . , 0, 1, 0, . . . , 0)
• The ratio between the two odds with explanatory variables x and x′ is called the odds-ratio
(with π′ = exp(η′)/(1 + exp(η′)) and η′ = βTx′):
ORj = [π′/(1 − π′)] / [π/(1 − π)] = Odds′/Odds = exp(η′ − η) = exp(βT(x′ − x)) = exp(βj)
or
βj = log(ORj)
• i.e. the regression coefficients are log-odds-ratios, or relative changes in odds on the log scale
Lecture 6 STK3100 - Categorical responses – p. 24
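A numeric illustration (Python; the coefficients b0 and b1 are made up, not from any fit in the lecture): whatever the starting x, increasing it by one unit multiplies the odds by exp(βj).

```python
import math

b0, b1 = -1.0, 0.7   # hypothetical logistic-regression coefficients

def pi(x):
    eta = b0 + b1 * x
    return math.exp(eta) / (1 + math.exp(eta))

def odds(p):
    return p / (1 - p)

# odds-ratio for x -> x + 1 equals exp(b1) regardless of x
ors = [odds(pi(x + 1)) / odds(pi(x)) for x in (-2.5, 0.0, 1.3)]
```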
Odds-ratio ≈ Relative Risk (RR) when the probabilities
are small
• Relative risk is defined as the ratio between two
probabilities:
RR = π′/π
• When both π and π′ are small, 1 − π ≈ 1 and 1 − π′ ≈ 1. Therefore,
OR = (π′/π) · (1 − π)/(1 − π′) ≈ π′/π = RR
• I.e., when the probabilities are small, exp(βj) expresses approximately the relative change in probability when xj is increased by one unit
Lecture 6 STK3100 - Categorical responses – p. 25
The approximation OR ≈ RR
Relative risk Odds-ratio
π 0.01 0.05 0.10 0.20 0.01 0.05 0.10 0.20
π′ = 0.01 1 0.2 0.1 0.05 1.00 0.19 0.09 0.04
π′ = 0.05 5 1.0 0.5 0.25 5.21 1.00 0.47 0.21
π′ = 0.10 10 2.0 1.0 0.50 11.00 2.11 1.00 0.44
π′ = 0.20 20 4.0 2.0 1.00 24.75 4.75 2.25 1.00
π′ = 0.30 30 6.0 3.0 1.50 42.43 8.14 3.86 1.71
π′ = 0.40 40 8.0 4.0 2.00 66.00 12.67 6.00 2.67
π′ = 0.50 50 10.0 5.0 2.50 99.00 19.00 9.00 4.00
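The table entries can be reproduced directly from the definitions (Python sketch):

```python
def odds_ratio(p1, p0):
    """OR = (p1/(1-p1)) / (p0/(1-p0))"""
    return (p1 / (1 - p1)) / (p0 / (1 - p0))

# Small probabilities: OR close to RR = p1/p0
or_small = odds_ratio(0.05, 0.01)   # table: 5.21, RR = 5
# Larger probabilities: OR drifts far from RR
or_large = odds_ratio(0.50, 0.01)   # table: 99.00, RR = 50
```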
Lecture 6 STK3100 - Categorical responses – p. 26
Interpretation of parameters with clog-log-link
π = 1 − exp(− exp(βTx))
or
η = βTx = log(− log(1 − π))
If π is small, then − log(1 − π) ≈ π (Taylor expansion), which gives
η ≈ log(π) ⇔ π ≈ exp(η)
and thus
RRj = π′/π ≈ exp(βj)
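A quick numeric check (Python; βj and the baseline η are made up):

```python
import math

def inv_cloglog(eta):
    """pi = 1 - exp(-exp(eta))"""
    return 1 - math.exp(-math.exp(eta))

beta_j = 0.5
eta0 = -5.0   # hypothetical baseline giving a small probability

# relative risk for a one-unit increase in x_j:
# close to exp(beta_j) because the probabilities are small
rr = inv_cloglog(eta0 + beta_j) / inv_cloglog(eta0)
```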
Lecture 6 STK3100 - Categorical responses – p. 27
Ex: Mortality by Wilm’s tumor
444 dead, 3471 survivors
> glm(d˜unfav+factor(stg),family=binomial(link=logit),
data=nwts)$coef
(Intercept) unfav factor(stg)2 factor(stg)3 factor(stg)4
-3.2415851 1.9927784 0.6957588 1.0305140 1.7935930
> glm(d˜unfav+factor(stg),family=binomial(link=cloglog),
data=nwts)$coef
(Intercept) unfav factor(stg)2 factor(stg)3 factor(stg)4
-3.2240445 1.7404373 0.6591325 0.9664677 1.6147868
Lecture 6 STK3100 - Categorical responses – p. 28
Interpretation of parameters with probit link
Sometimes we may have continuous responses, for instance normally distributed, Yi0 ∼ N(βTxi, σ²), but still prefer to study
Yi = 1 if Yi0 < γ = threshold value, 0 if not
Ex: Yi0 = birth weight
Yi = 1 if Yi0 < 2800 gram, 0 if not
Ex: Psychometric measurements, Yi0 = score on a depression scale
Yi = 1 if Yi0 < threshold value, 0 if not
Lecture 6 STK3100 - Categorical responses – p. 29
Underlying scale
Yi = 1 if Yi0 < γ = threshold value, 0 if not
[Figure: density ("tetthet") of the underlying variable Y0.]
Lecture 6 STK3100 - Categorical responses – p. 30
Probit, cont.
Why binary response?
• Tradition of doing table analyses
• The direct score Yi0 may have a skewed distribution
• The direct score may not be registered, only an underlying scale we imagine exists (a "latent" variable)
The relation between
• Yi0 ∼ N(βTxi, σ²)
• Yi = I(Yi0 ≤ γ)
is given by
πi = P(Yi = 1) = P(Yi0 ≤ γ) = Φ(γ/σ − (β/σ)Txi)
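As a sketch (Python; it borrows the birth-weight numbers that appear later in the lecture, and drops the sex term for simplicity):

```python
import math

def Phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Underlying linear model for birth weight vs. gestational age (sex term dropped)
b0, b1, sigma, gamma = -1447.24, 120.89, 177.1, 2800.0

def pi(x):
    """P(Y = 1) = P(Y0 <= gamma) = Phi(gamma/sigma - (b0 + b1*x)/sigma)"""
    return Phi(gamma / sigma - (b0 + b1 * x) / sigma)

# longer gestation -> lower probability of low birth weight, since b1 > 0
p36, p40 = pi(36), pi(40)
```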
Lecture 6 STK3100 - Categorical responses – p. 31
Relationship between parameters on probit
and underlying scale
E[Yi0] = βTxi = β0 + β1xi1 + · · · + βpxip is equivalent to the linear predictor on the probit scale
Φ−1(πi) = α0 + α1xi1 + · · · + αpxip
where
• α0 = (γ − β0)/σ
• αj = −βj/σ for j = 1, . . . , p
Note: The standard deviation σ on the underlying scale cannot be identified by the probit analysis
Lecture 6 STK3100 - Categorical responses – p. 32
Ex: Birth weight and gestational age
> summary(lm(vekt˜svlengde+sex))
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -1447.24 784.26 -1.845 0.0791 .
svlengde 120.89 20.46 5.908 7.28e-06 ***
sex -163.04 72.81 -2.239 0.0361 *
---
Residual standard error: 177.1 on 21 degrees of freedom
Multiple R-Squared: 0.64, Adjusted R-squared: 0.6057
F-statistic: 18.67 on 2 and 21 DF, p-value: 2.194e-05
Here σ = 177.1 (the residual standard error).
Lecture 6 STK3100 - Categorical responses – p. 33
Ex: Birth weight and gestational age cont.
Define Yi = 1 if birth weight is less than 2800 gram
> lavvekt<-1 * (vekt<2800)
> table(lavvekt)
0 1
17 7
>
> glm(lavvekt˜svlengde+sex,family=binomial(link=probit))$coef
(Intercept) svlengde sex
24.1550285 -0.6801164 0.7522067
> -lm(vekt˜svlengde+sex)$coef/177.1
(Intercept) svlengde sex
8.1718986 -0.6826331 0.9206059
Approximate probit estimates from the linear regression:
αj ≈ −βj/σ
Lecture 6 STK3100 - Categorical responses – p. 34
Goodness of fit tests for binomial data
• If Yi ∼ Bin(ni, πi) and (a) niπi > 5 and (b) ni(1 − πi) > 5 for i = 1, . . . , N, we have approximately
Residual deviance ∆ = 2(l̃ − l̂) ∼ χ²_{N−p}
Pearson chi-square X² = Σ_{i=1}^{N} (Yi − niπ̂i)²/(niπ̂i(1 − π̂i)) ∼ χ²_{N−p}
• l̃ is the log-likelihood of the saturated model
• l̂ is the log-likelihood of the fitted model with p parameters
• π̂i are the estimated probabilities
• If the deviance ∆ and X² are much larger than N − p, it indicates that the model fit is bad
• However, the Yi are often binary, and then conditions (a) and (b) are not fulfilled
Lecture 6 STK3100 - Categorical responses – p. 35
Two strategies for goodness of fit tests with binary data
• With categorical explanatory variables: Aggregate to
binomial data
• Aggregation cannot be used if there are many categorical variables with many levels, or if there are continuous variables.
We can then instead use the Hosmer-Lemeshow test
Lecture 6 STK3100 - Categorical responses – p. 36
Aggregation
• Count number of individuals within each combination of
the categorical variables
• Count number ofYi = 1 within each combination
• Fit a GLM on aggregated data
• The model is OK if D and X² are small compared to χ²_{N−p}, where N is the number of combinations of the categorical variables
• Requires that the expected number of successes/failures in each group is > 5
Lecture 6 STK3100 - Categorical responses – p. 37
Ex: Aggregation on Wilm’s tumor data
> table(nwts$unfav)
0 1
3476 439
> table(nwts$stg)
1 2 3 4
1543 993 906 473
> nwts2 = aggregate(nwts$d,by=list(nwts$unfav,nwts$stg),FUN=table)
Group.1 Group.2 x.0 x.1
1 0 1 1371 59
2 1 1 93 20
3 0 2 809 65
4 1 2 77 42
5 0 3 697 72
6 1 3 72 65
7 0 4 329 74
8 1 4 23 47
> nwts2 = data.frame(unfav=nwts2$Group.1,stg=nwts2$Group.2,
n=nwts2$x[,1]+nwts2$x[,2],d=nwts2$x[,2])
Lecture 6 STK3100 - Categorical responses – p. 38
Ex: Aggregation on Wilm’s tumor data
> glmfit = glm(cbind(d,n-d)˜as.factor(unfav)+as.factor(stg),data=nwts2,family=binomial)
> glmfit
(Intercept) unfavaggr factor(stgaggr)2 factor(stgaggr)3 factor(stgaggr)4
-3.2416 1.9928 0.6958 1.0305 1.7936
Degrees of Freedom: 7 Total (i.e. Null); 3 Residual
Null Deviance: 413.4
Residual Deviance: 3.33 AIC: 56.85
> X2<-sum(residuals(glmfit,type="pearson")ˆ2)
> X2
[1] 3.259168
Lecture 6 STK3100 - Categorical responses – p. 39
Ex: Aggregation on Wilm’s tumor data cont.
• The model seems to be OK, since the residual deviance D = 3.33 ≈ X² = 3.26 (the Pearson chi-square) is small compared to the residual degrees of freedom, df = 3
• Are the expected numbers of successes and failures > 5? We compute these:
> round(nwts2$n * glmfit$fit,2)
1 2 3 4 5 6 7 8
53.81 63.55 75.95 76.70 25.19 43.45 61.05 44.30
> round(nwts2$n * (1-glmfit$fit),2)
1 2 3 4 5 6 7 8
1376.19 810.45 693.05 326.30 87.81 75.55 75.95 25.70
Lecture 6 STK3100 - Categorical responses – p. 40
Hosmer-Lemeshow test
• Fit the GLM model
• Order the individuals by fitted probabilities
π̂(1) ≤ π̂(2) ≤ · · · ≤ π̂(n)
• Divide the individuals into G groups according to the ordering, with equally many individuals in each group ("C statistic")
• Divide the interval from π̂(1) to π̂(n) into G intervals ("H statistic")
• Compute the average π̄g of the π̂(i) in group g = 1, 2, . . . , G
• Compute the number of observations ng and successes Yg in group g
• Compute the Hosmer-Lemeshow statistic
X²hl = Σ_{g=1}^{G} (Yg − ngπ̄g)²/(ngπ̄g(1 − π̄g))
• Under the null hypothesis (model is OK) we have approximately X²hl ∼ χ²_{G−2}
Lecture 6 STK3100 - Categorical responses – p. 41
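The C-statistic version of this recipe can be sketched in Python (illustrative, not the MKmisc implementation; the check data are synthetic):

```python
def hosmer_lemeshow_C(p_hat, y, G=10):
    """Hosmer-Lemeshow C statistic: split the ordered fitted probabilities
    into G roughly equal-sized groups and compare observed vs. expected."""
    pairs = sorted(zip(p_hat, y))                  # order by fitted probability
    n = len(pairs)
    x2 = 0.0
    for g in range(G):
        chunk = pairs[g * n // G:(g + 1) * n // G]
        ng = len(chunk)
        pbar = sum(p for p, _ in chunk) / ng       # average fitted probability
        yg = sum(yv for _, yv in chunk)            # observed successes
        x2 += (yg - ng * pbar) ** 2 / (ng * pbar * (1 - pbar))
    return x2  # compare with a chi-square with G - 2 degrees of freedom

# When observed frequencies match the fitted probabilities, X2 is near 0
x2 = hosmer_lemeshow_C([0.1] * 10 + [0.9] * 10,
                       [0] * 9 + [1] + [1] * 9 + [0], G=2)
```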
Ex: Hosmer-Lemeshow test on Wilm’s tumor data
> glmfit<-glm(d˜unfav+factor(stg)+yr.regis+age,
data=nwts,family=binomial)
> library(MKmisc)
> HLgof.test(glmfit$fit,nwts$d)
$C
Hosmer-Lemeshow C statistic
data: glmfit$fit and nwts$d
X-squared = 3.4823, df = 8, p-value = 0.9006
$H
Hosmer-Lemeshow H statistic
data: glmfit$fit and nwts$d
X-squared = 6.6996, df = 8, p-value = 0.5694
Lecture 6 STK3100 - Categorical responses – p. 42
Ex: Hosmer-Lemeshow test on Wilm’s tumor data cont.
> glmfit<-glm(d˜unfav+factor(stg)+yr.regis+age,family=binomial)
> kuttoff<-sort(glmfit$fit)[c(round(length(d) * (1:10)/10))]
> gr<-rep(1,length(d))
> for (i in 1:9) gr<-gr+(glmfit$fit>kuttoff[i])
> table(gr)
1 2 3 4 5 6 7 8 9 10
392 392 391 392 392 390 391 392 392 391
> ngr<-as.numeric(table(gr))
> ngr
[1] 392 392 391 392 392 390 391 392 392 391
> dgr<-numeric(0)
> for (i in 1:10) dgr[i]<-sum(d[gr==i])
> dgr
[1] 10 14 16 26 20 28 36 48 79 167
> pigr<-numeric(0)
> for (i in 1:10) pigr[i]<-mean(glmfit$fit[gr==i])
> round(pigr,3)
[1] 0.024 0.032 0.040 0.049 0.061 0.076 0.095 0.128 0.202 0.427
> X2HL<-sum((dgr-ngr * pigr)ˆ2/(ngr * pigr * (1-pigr)))
> X2HL
[1] 3.482061
> 1-pchisq(X2HL,8)
[1] 0.9005774
Lecture 6 STK3100 - Categorical responses – p. 43
Sensitivity and specificity
• Classification:
• Predict an event (Yi = 1) if πi > γ, where γ is a threshold value
• Predict no event if πi ≤ γ
• Count the number of correct classifications in the data set
• Sensitivity: proportion of correct predictions when the true Yi = 1
• Specificity: proportion of correct predictions when the true Yi = 0
• We want high values for both sensitivity and specificity
• For a given method, we can choose the threshold value γ to give a good balance in a specific classification situation
Lecture 6 STK3100 - Categorical responses – p. 44
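These two proportions can be computed directly (Python sketch with made-up fitted probabilities):

```python
def sens_spec(p_hat, y, gamma):
    """Sensitivity and specificity of the rule: predict Y = 1 if p_hat > gamma."""
    pred = [1 if p > gamma else 0 for p in p_hat]
    tp = sum(1 for pr, yv in zip(pred, y) if pr == 1 and yv == 1)
    tn = sum(1 for pr, yv in zip(pred, y) if pr == 0 and yv == 0)
    sens = tp / sum(y)              # correct predictions among true events
    spec = tn / (len(y) - sum(y))   # correct predictions among true non-events
    return sens, spec

y = [1, 1, 0, 0, 1, 0]
p_hat = [0.9, 0.6, 0.4, 0.2, 0.3, 0.7]
sens, spec = sens_spec(p_hat, y, 0.5)   # both 2/3 at this threshold
```

Lowering γ trades specificity for sensitivity, which is exactly what the ROC curve on the next slides traces out.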
ROC curves
• For evaluating and comparing models, we can vary the
thresholdγ and plot a Receiver Operating Characteristics
curve or ROC-curve with sensitivity on the y-axis and
(1-specificity) on the x-axis
• Can also compute the area under curve (AUC)
• AUC=1 if perfect classification
• AUC=0.5 if random classification
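AUC also has a probabilistic reading: the chance that a randomly chosen event gets a higher fitted probability than a randomly chosen non-event (ties count 1/2). A Python sketch (illustrative):

```python
def auc(p_hat, y):
    """Area under the ROC curve via the pairwise-comparison identity."""
    events = [p for p, yv in zip(p_hat, y) if yv == 1]
    nonevents = [p for p, yv in zip(p_hat, y) if yv == 0]
    wins = sum((pe > pn) + 0.5 * (pe == pn)
               for pe in events for pn in nonevents)
    return wins / (len(events) * len(nonevents))

auc_perfect = auc([0.9, 0.8, 0.1, 0.2], [1, 1, 0, 0])  # perfect classification
auc_random = auc([0.5, 0.5, 0.5, 0.5], [1, 1, 0, 0])   # no predictability
```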
Lecture 6 STK3100 - Categorical responses – p. 45
ROC for predicting bycatch of fish
• Shrimp fishery in Barents Sea: Predict if one can expect to
catch more than 0.8 juvenile cod per kg shrimps caught
• If yes, the fishing area is temporarily closed
[Figure: ROC curve, 1 − specificity vs. sensitivity; y-axis: "Probability of correct prediction if observed > 0.8"; legend: "model predictor" and "no predictability".]
Lecture 6 STK3100 - Categorical responses – p. 46
Overdispersion in “binomial” data
• With independent, binary data there is never overdispersion (Var(Yi) = πi(1 − πi))
• If independent, binary data Yij with the same πi are aggregated to Yi = Σ_{j=1}^{ni} Yij, then Yi ∼ Bin(ni, πi), Var(Yi) = niπi(1 − πi), and there is no overdispersion
• However, overdispersion occurs if the outcomes of the individual trials are positively correlated.
Then Var(Yi) > niπi(1 − πi)
• Possibility 1: Quasi-likelihood
• Possibility 2: Mixed model
• Possibility 3: Beta-binomial distribution
Lecture 6 STK3100 - Categorical responses – p. 47
Overdispersion in “binomial” data - Quasi-likelihood
• Specify the mean structure by link function and linear predictor
• Specify the variance structure
• Possibility 1: Var(Yi) = φ niπi(1 − πi)
• Possibility 2: Var(Yi) = (1 + ρ(ni − 1)) niπi(1 − πi)
• Fit the model. (Not sure if Possibility 2 is implemented in R)
Lecture 6 STK3100 - Categorical responses – p. 48
Random π or beta-binomial response
• Mixed model: πi random with expectation π*i
• If πi is random and beta distributed (continuous between 0 and 1), Yi becomes beta-binomial
• Then Var(Yi) = (1 + ρ(ni − 1)) niπ(1 − π)
• Can be estimated in R by the betabin function from the aod library
Lecture 6 STK3100 - Categorical responses – p. 49