binomial data

Upload: mark-ebrahim

Post on 03-Apr-2018

7/29/2019 Binomial Data

p. 3-3

A linear model approach for binomial data:

$y_x/n_x = X\beta + \epsilon$. Q: when is the approach appropriate?

The approach is reasonable when the pmf shape of the joint distribution of $y_x/n_x$ is similar to the pdf shape of $X\beta + \epsilon$.

Some problems with this approach:

The predicted probability may be > 1 or < 0.

The normal approximation might be too much of a stretch when the $n_x$'s are not large or $p_x \approx 1$ or $0$.

The variance of the binomial is not constant.

Some of these problems could be corrected by using transformation and weighting.
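The first and third problems can be seen in a small numerical sketch. The data below are hypothetical (not from the lecture), and the straight-line fit uses the closed-form least-squares formulas for a single covariate.

```python
# Hypothetical grouped binomial data: x = covariate, n = trials, y = successes.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
n = [20, 20, 20, 20, 20, 20]
y = [1, 2, 6, 13, 18, 20]

props = [yi / ni for yi, ni in zip(y, n)]

# Ordinary least squares for props = b0 + b1 * x (closed form, one covariate).
x_bar = sum(x) / len(x)
p_bar = sum(props) / len(props)
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (pi - p_bar) for xi, pi in zip(x, props))
b1 = sxy / sxx
b0 = p_bar - b1 * x_bar


def predict(x_new):
    """Fitted 'probability' from the straight-line model."""
    return b0 + b1 * x_new


# Problem: predictions are not confined to [0, 1].
print(predict(-1.0), predict(6.0))

# Problem: the estimated Var(y/n) = p(1 - p)/n changes with p, so it is not constant.
variances = [pi * (1 - pi) / ni for pi, ni in zip(props, n)]
print(variances)
```

With these made-up numbers the fitted line falls below 0 at the low end of the covariate and exceeds 1 at the high end, which is the behaviour the list above warns about.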

p. 3-4

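Recall: linear model

model description 1: $Y = X\beta + \epsilon$, $\epsilon \sim N(0, \sigma^2 I)$

model description 2: $Y \sim N(X\beta, \sigma^2 I)$, i.e., $Y_x \sim N(\mu_x, \sigma_x^2)$ with $\mu_x = x^T\beta$, $\sigma_x = \sigma$

Q: which description can be generalized to binomial data?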

3 components in a generalized linear model (binomial example)

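Random component: $y_x \sim B(n_x, p_x)$

Linear predictor: $\eta_x = \sum_{i=1}^{p} \beta_i h_i(X_1, \ldots, X_m) = x^T\beta$, $-\infty < \eta_x < \infty$

Link function $g$: $g$ monotone and $\eta_x = g(p_x)$ [for binomial, $g: (0, 1) \to (-\infty, \infty)$]

Common choices of link function for binomial data:

Logit: $\eta_x = \log(p_x/(1-p_x))$

Probit: $\eta_x = \Phi^{-1}(p_x)$, where $\Phi$ is the cdf of the standard normal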

p. 3-5

Complementary log-log: $\eta_x = \log(-\log(1-p_x))$

Logit is close to the complementary log-log when $p_x$ is small

Logit is close to probit when 0.1
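The three link functions can be written down directly. The following is a minimal sketch using only the Python standard library; the function names and the probabilities in the loop are illustrative choices, not part of the lecture notes. It prints the links side by side so that the closeness of logit and complementary log-log at small $p_x$ can be checked numerically.

```python
import math
from statistics import NormalDist


def logit(p):
    """Logit link: log odds."""
    return math.log(p / (1.0 - p))


def probit(p):
    """Probit link: inverse cdf of the standard normal."""
    return NormalDist().inv_cdf(p)


def cloglog(p):
    """Complementary log-log link."""
    return math.log(-math.log(1.0 - p))


def inv_logit(eta):
    """Inverse of the logit link: maps a linear predictor back to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))


# Compare the links at a few illustrative probabilities.
for p in (0.01, 0.05, 0.5, 0.95):
    print(p, round(logit(p), 3), round(probit(p), 3), round(cloglog(p), 3))
```

Each link maps (0, 1) onto the whole real line, and the inverse link recovers the probability from the linear predictor.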

p. 3-7

Since the saturated model fits as well as any model can fit, the deviance $D$ measures how close the (smaller) model comes to perfection.

Deviance can be treated as a measure of goodness of fit

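Suppose that $y_i$ is truly binomial and that the $n_i$ are relatively large. Then, if the (smaller) model is correct, $D$ is approximately $\chi^2$ distributed with degrees of freedom equal to the number of observations minus the number of parameters in the model, so we can use the deviance to test whether the model is an adequate fit.

The chi-square distribution is only an approximation that becomes more accurate as the $n_i$ increase [often suggest $n_i \ge 5$].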
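As a rough illustration of the goodness-of-fit use of the deviance, the sketch below fits the simplest possible model (one common success probability, whose MLE is the pooled proportion) to hypothetical grouped data and compares its deviance with a chi-square distribution. The data, the helper names, and the choice of an intercept-only model are assumptions made for the example; the chi-square tail probability is computed with a standard recurrence for integer degrees of freedom so that no external library is needed.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 1:
        sf, a = math.erfc(math.sqrt(half)), 0.5  # exact for df = 1
    else:
        sf, a = math.exp(-half), 1.0  # exact for df = 2
    # Recurrence Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1)
    while 2 * a < df:
        sf += math.exp(a * math.log(half) - half - math.lgamma(a + 1.0))
        a += 1.0
    return sf


def binomial_deviance(y, n, p_hat):
    """Deviance of fitted probabilities p_hat against the saturated model."""
    d = 0.0
    for yi, ni, pi in zip(y, n, p_hat):
        fitted = ni * pi
        if yi > 0:
            d += yi * math.log(yi / fitted)
        if yi < ni:
            d += (ni - yi) * math.log((ni - yi) / (ni - fitted))
    return 2.0 * d


# Hypothetical data with every n_i >= 5.
y = [3, 5, 4, 6, 2]
n = [10, 12, 10, 12, 10]

# Intercept-only model: a single common probability, MLE = pooled proportion.
p_common = sum(y) / sum(n)
deviance = binomial_deviance(y, n, [p_common] * len(y))
df = len(y) - 1  # observations minus fitted parameters
p_value = chi2_sf(deviance, df)
print(deviance, df, p_value)
```

A large p-value here means the deviance gives no evidence against the common-probability model; it does not show that the model is correct.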

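Use deviance to compare two models $S$ and $L$, $S$ nested in $L$

Larger model $L$: deviance $D_L$ and $df_L$ ($= k_l$)

Smaller model $S$: deviance $D_S$ and $df_S$ ($= k_s$)

To test $H_0$: $S$ vs. $H_1$: $L \setminus S$, the test statistic is

$D_S - D_L$

which is asymptotically distributed as $\chi^2$ with degrees of freedom equal to the number of additional parameters in $L$ relative to $S$.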
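A small sketch of the nested comparison, under assumptions chosen to keep the fits in closed form: the data are hypothetical, the smaller model $S$ has one common probability, and the larger model $L$ adds a binary group indicator, so the fitted probabilities are the pooled proportion under $S$ and the within-group proportions under $L$. $L$ has one extra parameter, so the reference distribution is chi-square with 1 degree of freedom.

```python
import math


def binomial_deviance(y, n, p_hat):
    """Deviance of fitted probabilities p_hat against the saturated model."""
    d = 0.0
    for yi, ni, pi in zip(y, n, p_hat):
        fitted = ni * pi
        if yi > 0:
            d += yi * math.log(yi / fitted)
        if yi < ni:
            d += (ni - yi) * math.log((ni - yi) / (ni - fitted))
    return 2.0 * d


# Hypothetical data: three binomial observations in each of two groups.
group = [0, 0, 0, 1, 1, 1]
n = [15, 20, 15, 15, 20, 15]
y = [4, 6, 5, 9, 13, 10]

# Smaller model S: one common probability.
p_s = sum(y) / sum(n)
d_s = binomial_deviance(y, n, [p_s] * len(y))

# Larger model L: one probability per group (intercept + group indicator).
p_group = {}
for g in (0, 1):
    y_g = sum(yi for yi, gi in zip(y, group) if gi == g)
    n_g = sum(ni for ni, gi in zip(n, group) if gi == g)
    p_group[g] = y_g / n_g
d_l = binomial_deviance(y, n, [p_group[g] for g in group])

# Test statistic D_S - D_L, compared with chi-square on 1 df.
lr = d_s - d_l
p_value = math.erfc(math.sqrt(lr / 2.0))  # chi-square(1) upper tail
print(d_s, d_l, lr, p_value)
```

The contribution of the saturated model cancels in the difference, so $D_S - D_L$ is the same as twice the difference in maximized log-likelihoods.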

In terms of the accuracy of the distributional approximation, this test of nested models is better than the goodness-of-fit test.

p. 3-8

(Wald's test) alternative test for $H_0$: $\beta_i = 0$

test statistic: z-value, $z = \hat\beta_i / \mathrm{se}(\hat\beta_i)$

Asymptotic null distribution: $N(0, 1)$

Can be generalized to $H_0$: $\beta_i = c$ or $H_0$: $\beta = c$

In contrast to the normal linear model, these two statistics (deviance-based and Wald's tests) are not identical.

Hauck-Donner effect (see Hauck and Donner, 1977): for sparse data (i.e., many $n_i$'s $= 1$ or small), the standard errors can be overestimated, so the z-value is too small and the significance of an effect could be missed.

Therefore, the deviance-based test is preferred.
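The disagreement between the two statistics can be illustrated with a two-group comparison, where both have closed forms. This is a sketch under stated assumptions: the counts are hypothetical, the model is a logistic regression with a single binary covariate, and the standard error is the usual large-sample one for a log odds ratio. It shows one way the Hauck-Donner behaviour can appear: as the second group's proportion moves closer to 1, the standard error grows faster than the estimate.

```python
import math


def wald_z(y0, n0, y1, n1):
    """Wald z-value for the group effect (log odds ratio); needs 0 < y < n."""
    beta = math.log(y1 / (n1 - y1)) - math.log(y0 / (n0 - y0))
    se = math.sqrt(1 / y0 + 1 / (n0 - y0) + 1 / y1 + 1 / (n1 - y1))
    return beta / se


def deviance_stat(y0, n0, y1, n1):
    """Deviance-based statistic D_S - D_L for the same hypothesis."""

    def loglik(y, n, p):
        return y * math.log(p) + (n - y) * math.log(1 - p)

    p0, p1 = y0 / n0, y1 / n1
    pooled = (y0 + y1) / (n0 + n1)
    ll_large = loglik(y0, n0, p0) + loglik(y1, n1, p1)
    ll_small = loglik(y0, n0, pooled) + loglik(y1, n1, pooled)
    return 2.0 * (ll_large - ll_small)


# Hypothetical counts: group 0 fixed at 12/25, group 1 increasingly extreme.
for y1 in (20, 23, 24):
    print(y1, round(wald_z(12, 25, y1, 25), 3),
          round(deviance_stat(12, 25, y1, 25), 3))
```

Going from 23/25 to 24/25 in the second group makes the observed effect larger and the deviance-based statistic larger, yet the z-value goes down, which is why the Wald test can understate significance.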

THU STAT 5230, 2011 Lecture Notes

made by Shao-Wei Cheng (NTHU)

p. 3-9

$100(1-\alpha)\%$ confidence interval

Relationship between confidence interval and test

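Approach 1: (from Wald's test) $\hat\beta_i \pm z_{\alpha/2}\,\mathrm{se}(\hat\beta_i)$

Approach 2: (profile likelihood-based method) the interval consists of the values of $\beta_i$ that are not rejected by the deviance-based test, with the other $\beta_j$'s, $j \neq i$, set to the maximizing values

(recall: the computation of the C.I. for $\lambda$ in the Box-Cox method)

The profile likelihood method is generally preferable for the same Hauck-Donner reason.

A similar method can be generalized to construct a confidence region for several parameters.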
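The two approaches can be compared on a small example. The sketch below assumes a two-group logistic model, logit $p_0 = \beta_0$ and logit $p_1 = \beta_0 + \beta_1$, with hypothetical counts; the helper names and the bisection search are illustrative choices, not the lecture's method. For each candidate $\beta_1$ the nuisance parameter $\beta_0$ is set to its maximizing value, and the interval endpoints are where twice the drop in profile log-likelihood equals the 0.95 quantile of chi-square with 1 degree of freedom.

```python
import math

# Hypothetical counts: successes/trials in group 0 and group 1.
Y0, N0, Y1, N1 = 12, 25, 23, 25
CHI2_1_95 = 3.841458820694124  # 0.95 quantile of chi-square(1)
Z_975 = 1.959963984540054      # 0.975 quantile of N(0, 1)


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def log1pexp(t):
    """Numerically stable log(1 + exp(t))."""
    return t + math.log1p(math.exp(-t)) if t > 0 else math.log1p(math.exp(t))


def loglik(b0, b1):
    return Y0 * b0 - N0 * log1pexp(b0) + Y1 * (b0 + b1) - N1 * log1pexp(b0 + b1)


def bisect(f, lo, hi, tol=1e-10):
    """Root of f on [lo, hi], assuming f(lo) and f(hi) differ in sign."""
    f_lo = f(lo)
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def profile_loglik(b1):
    """Maximize over b0 for fixed b1 by solving the score equation in b0."""
    def score(b0):
        return (Y0 + Y1) - N0 * sigmoid(b0) - N1 * sigmoid(b0 + b1)
    return loglik(bisect(score, -30.0, 30.0), b1)


# Maximum likelihood estimates (closed form for this saturated two-group model).
b0_hat = math.log(Y0 / (N0 - Y0))
b1_hat = math.log(Y1 / (N1 - Y1)) - b0_hat
ll_max = loglik(b0_hat, b1_hat)

# Approach 1: Wald interval.
se = math.sqrt(1 / Y0 + 1 / (N0 - Y0) + 1 / Y1 + 1 / (N1 - Y1))
wald_lo, wald_hi = b1_hat - Z_975 * se, b1_hat + Z_975 * se


# Approach 2: profile likelihood interval.
def excess(b1):
    return 2.0 * (ll_max - profile_loglik(b1)) - CHI2_1_95


prof_lo = bisect(excess, b1_hat - 10.0, b1_hat)
prof_hi = bisect(excess, b1_hat, b1_hat + 10.0)

print("Wald:   ", wald_lo, wald_hi)
print("Profile:", prof_lo, prof_hi)
```

The Wald interval is symmetric about the estimate by construction, while the profile likelihood interval follows the shape of the likelihood and is asymmetric when one proportion is close to 0 or 1.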
