binomial data

Upload: mark-ebrahim

Post on 03-Apr-2018


  • 7/29/2019 Binomial Data


    p. 3-3

    A linear model approach for binomial data:

    $y_x/n_x = X\beta + \epsilon$. Q: when is the approach appropriate?

    The pmf shape of the joint distribution of $y_x/n_x$ is similar to the pdf shape of $X\beta + \epsilon$

    Some problems with this approach

    Predicted probability may be > 1 or < 0

    Normal approximation might be too much of a stretch when the $n_x$'s are not large or $p_x$ is close to 1 or 0

    Variance of the binomial is not constant

    Some of these problems could be corrected by using transformation and weighting
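The first and third problems can be seen numerically in a small sketch. The data below are hypothetical (they are not from the lecture notes), and the fit is plain least squares on the observed proportions with a single covariate.

```python
# Hypothetical grouped binomial data: y[i] successes out of n[i] trials at x[i].
x = [0.0, 1.0, 2.0, 3.0, 4.0]
n = [20, 20, 20, 20, 20]
y = [1, 4, 10, 16, 19]

props = [yi / ni for yi, ni in zip(y, n)]

# Ordinary least squares of the proportions on x (closed form, one covariate).
x_bar = sum(x) / len(x)
p_bar = sum(props) / len(props)
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (pi - p_bar) for xi, pi in zip(x, props))
slope = sxy / sxx
intercept = p_bar - slope * x_bar

def predict(x_new):
    """Fitted 'probability' from the linear model; not constrained to [0, 1]."""
    return intercept + slope * x_new

# Binomial variance of a sample proportion, p(1 - p)/n, at the observed proportions.
variances = [pi * (1 - pi) / ni for pi, ni in zip(props, n)]

print("prediction at x = -1:", predict(-1.0))
print("prediction at x =  5:", predict(5.0))
print("variances of y/n:", variances)
```

Extrapolating slightly beyond the observed range already gives fitted values outside [0, 1], and the variance of the proportions changes with $p_x$, which is what the weighting mentioned above would have to compensate for.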

    p. 3-4

    Recall: linear model

    model description 1: $Y = X\beta + \epsilon$

    model description 2: $Y_x \sim N(\mu_x, \sigma^2)$, $\eta_x = \mu_x$, $\eta_x = x^T\beta$

    Q: which description can be generalized to binomial data?

    3 components in a generalized linear model (binomial example)

    distribution: $y_x \sim B(n_x, p_x)$

    link function $g$: $g$ monotone and $\eta_x = g(p_x)$ [for binomial, $g: (0, 1) \to (-\infty, \infty)$]

    linear predictor: $X\beta = \sum_{i=1}^{p} \beta_i h_i(X_1, \dots, X_m) = \eta_x$, $-\infty < \eta_x < \infty$

    Common choices of link function for binomial data

    Logit: $\eta_x = \log(p_x/(1-p_x))$

    Probit: $\eta_x = \Phi^{-1}(p_x)$, where $\Phi$ is the cdf of the standard normal


    p. 3-5

    Complementary log-log: $\eta_x = \log(-\log(1-p_x))$

    Logit is close to the complementary log-log when $p_x$ is small

    Logit is close to probit when 0.1
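The three link functions are easy to compare numerically. The sketch below uses only the Python standard library; the function names are illustrative, and the probabilities in the loop are arbitrary choices.

```python
import math
from statistics import NormalDist

def logit(p):
    """eta = log(p / (1 - p))"""
    return math.log(p / (1.0 - p))

def probit(p):
    """eta = inverse cdf of the standard normal at p"""
    return NormalDist().inv_cdf(p)

def cloglog(p):
    """eta = log(-log(1 - p))"""
    return math.log(-math.log(1.0 - p))

# Each link maps (0, 1) onto the whole real line and is increasing in p.
for p in (0.001, 0.01, 0.1, 0.5, 0.9):
    print(f"p={p:<6} logit={logit(p):8.3f} "
          f"probit={probit(p):8.3f} cloglog={cloglog(p):8.3f}")
```

For small $p_x$ the logit and complementary log-log columns nearly coincide, while the complementary log-log is not symmetric about $p_x = 0.5$ the way the logit and probit are.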


    p. 3-7

    Since the saturated model fits as well as any model can fit, the deviance $D$ measures how close the (smaller) model comes to perfection.

    Deviance can be treated as a measure of goodness of fit

    Suppose that $y_i$ is truly binomial and that the $n_i$ are relatively large. Then, if the (smaller) model is correct, $D$ is approximately chi-square distributed with degrees of freedom equal to the number of observations minus the number of parameters, so the deviance can be used to test whether the model is an adequate fit

    The chi-square distribution is only an approximation that becomes more accurate as the $n_i$ increase [often suggest $n_i \geq 5$]

    Use deviance to compare two models $S$ and $L$, $S$ nested in $L$

    Larger model $L$: deviance $D_L$ and $df_L$ ($= k_l$)

    Smaller model $S$: deviance $D_S$ and $df_S$ ($= k_s$)

    To test $H_0: S$ vs. $H_1: L \setminus S$, the test statistic is

    $D_S - D_L$

    which is asymptotically distributed as chi-square, with degrees of freedom equal to the difference in the number of parameters between $L$ and $S$
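A minimal sketch of both uses of the deviance, assuming hypothetical grouped data and a logit link with one covariate. The Newton-Raphson fitter `fit_logit` and all variable names are illustrative, not taken from the notes; $S$ is the intercept-only model and $L$ adds the slope, so the nested test has 1 degree of freedom.

```python
import math

# Hypothetical grouped data: y[i] successes out of n[i] trials at covariate x[i].
x = [0.0, 1.0, 2.0, 3.0, 4.0]
n = [20, 20, 20, 20, 20]
y = [2, 5, 9, 14, 18]

def fit_logit(x, n, y, iters=25):
    """Newton-Raphson fit of logit(p_x) = b0 + b1 * x; returns (b0, b1)."""
    b0 = b1 = 0.0
    for _ in range(iters):
        p = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
        # Score vector (gradient of the log-likelihood).
        g0 = sum(yi - ni * pi for yi, ni, pi in zip(y, n, p))
        g1 = sum((yi - ni * pi) * xi for yi, ni, pi, xi in zip(y, n, p, x))
        # Fisher information matrix [[a, b], [b, c]].
        w = [ni * pi * (1.0 - pi) for ni, pi in zip(n, p)]
        a = sum(w)
        b = sum(wi * xi for wi, xi in zip(w, x))
        c = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = a * c - b * b
        b0, b1 = b0 + (c * g0 - b * g1) / det, b1 + (a * g1 - b * g0) / det
    return b0, b1

def deviance(n, y, p):
    """Binomial deviance against the saturated model (0 * log 0 taken as 0)."""
    def term(obs, fitted):
        return obs * math.log(obs / fitted) if obs > 0 else 0.0
    return 2.0 * sum(term(yi, ni * pi) + term(ni - yi, ni * (1.0 - pi))
                     for yi, ni, pi in zip(y, n, p))

# Smaller model S: one common probability (its MLE is the pooled proportion).
p_small = [sum(y) / sum(n)] * len(y)
# Larger model L: intercept plus slope.
b0_hat, b1_hat = fit_logit(x, n, y)
p_large = [1.0 / (1.0 + math.exp(-(b0_hat + b1_hat * xi))) for xi in x]

d_small = deviance(n, y, p_small)
d_large = deviance(n, y, p_large)

# Nested test: D_S - D_L on 1 df. For 1 df, P(chi2 > t) = erfc(sqrt(t / 2)).
stat = d_small - d_large
p_value = math.erfc(math.sqrt(stat / 2.0))

# Goodness of fit of L: compare D_L with the chi-square 5% critical value
# on 5 - 2 = 3 df, which is 7.815.
adequate_fit = d_large < 7.815

print(f"D_S = {d_small:.3f}, D_L = {d_large:.3f}, D_S - D_L = {stat:.3f}")
print(f"p-value for H0: S = {p_value:.3g}; L adequate at 5% level: {adequate_fit}")
```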

    In terms of the accuracy of the distributional approximation, the test comparing nested models is better than the goodness-of-fit test

    p. 3-8

    (Wald's test) alternative test for $H_0: \beta_i = 0$

    test statistic: z-value, $z = \hat{\beta}_i / se(\hat{\beta}_i)$

    Asymptotic null distribution: $N(0, 1)$

    Can be generalized to $H_0: \beta_i = c$ or $H_0: \beta = c$

    In contrast to the normal linear model, these two statistics (deviance-based and Wald's tests) are not identical

    Hauck-Donner effect (see Hauck and Donner, 1977): for sparse data (i.e., many $n_i$'s equal to 1 or small), the standard errors can be overestimated, so the z-value is too small and the significance of an effect could be missed

    therefore, the deviance-based test is preferred
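The two tests of $H_0: \beta_1 = 0$ can be put side by side in a short sketch. The data are hypothetical, the model is logit with one covariate, and the helper names (`probs`, `information`, `fit`) are illustrative. The standard error comes from the inverse Fisher information at the fitted values.

```python
import math

# Hypothetical grouped data: y[i] successes out of n[i] trials at covariate x[i].
x = [0.0, 1.0, 2.0, 3.0, 4.0]
n = [20, 20, 20, 20, 20]
y = [2, 5, 9, 14, 18]

def probs(b0, b1):
    return [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]

def information(b0, b1):
    """Entries a, b, c of the Fisher information matrix [[a, b], [b, c]]."""
    w = [ni * pi * (1.0 - pi) for ni, pi in zip(n, probs(b0, b1))]
    a = sum(w)
    b = sum(wi * xi for wi, xi in zip(w, x))
    c = sum(wi * xi * xi for wi, xi in zip(w, x))
    return a, b, c

def fit(iters=25):
    """Newton-Raphson fit of logit(p_x) = b0 + b1 * x."""
    b0 = b1 = 0.0
    for _ in range(iters):
        p = probs(b0, b1)
        g0 = sum(yi - ni * pi for yi, ni, pi in zip(y, n, p))
        g1 = sum((yi - ni * pi) * xi for yi, ni, pi, xi in zip(y, n, p, x))
        a, b, c = information(b0, b1)
        det = a * c - b * b
        b0, b1 = b0 + (c * g0 - b * g1) / det, b1 + (a * g1 - b * g0) / det
    return b0, b1

def loglik(p):
    """Binomial log-likelihood up to an additive constant."""
    return sum(yi * math.log(pi) + (ni - yi) * math.log(1.0 - pi)
               for yi, ni, pi in zip(y, n, p))

b0_hat, b1_hat = fit()

# Wald test: z = estimate / standard error, compared with N(0, 1).
a, b, c = information(b0_hat, b1_hat)
se_b1 = math.sqrt(a / (a * c - b * b))   # (2, 2) entry of the inverse information
z = b1_hat / se_b1
wald_p = math.erfc(abs(z) / math.sqrt(2.0))   # two-sided p-value

# Deviance-based test: difference in deviance between the intercept-only model
# and the fitted model, compared with chi-square on 1 df.
p_null = sum(y) / sum(n)
lr = 2.0 * (loglik(probs(b0_hat, b1_hat)) - loglik([p_null] * len(x)))
lr_p = math.erfc(math.sqrt(lr / 2.0))

print(f"Wald: z = {z:.3f}, z^2 = {z * z:.3f}, p = {wald_p:.3g}")
print(f"Deviance-based: statistic = {lr:.3f}, p = {lr_p:.3g}")
```

In a normal linear model the corresponding statistics would agree exactly; here $z^2$ and the deviance difference are visibly different numbers even though both reject $H_0$ for these data.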

    THU STAT 5230, 2011 Lecture Notes

    made by Shao-Wei Cheng (NTHU)


    p. 3-9

    $100(1-\alpha)\%$ confidence interval

    Relationship between confidence interval and test

    Approach 1 (from Wald's test): $\hat{\beta}_i \pm z_{\alpha/2}\, se(\hat{\beta}_i)$

    Approach 2 (profile likelihood-based method): the interval consists of the values of $\beta_i$ that the likelihood ratio test does not reject, with the other $\beta_j$'s, $j \neq i$, set to the maximizing values

    (recall: the computation of the C.I. for $\lambda$ in the Box-Cox method)

    the profile likelihood method is generally preferable for the same Hauck-Donner reason

    A similar method can be generalized to construct a confidence region for several parameters
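Both approaches can be sketched for the slope of a one-covariate logit model. Everything here is an illustrative assumption: the data are hypothetical, the level is fixed at 95%, and the profile interval is found by bisection on the likelihood ratio statistic, with the intercept re-maximized (also by bisection on its score equation) at each candidate slope.

```python
import math

# Hypothetical grouped data: y[i] successes out of n[i] trials at covariate x[i].
x = [0.0, 1.0, 2.0, 3.0, 4.0]
n = [20, 20, 20, 20, 20]
y = [2, 5, 9, 14, 18]
Z = 1.959964          # 97.5% quantile of N(0, 1)
CRIT = Z * Z          # 95% quantile of chi-square on 1 df

def probs(b0, b1):
    return [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]

def loglik(b0, b1):
    """Binomial log-likelihood up to an additive constant."""
    return sum(yi * math.log(pi) + (ni - yi) * math.log(1.0 - pi)
               for yi, ni, pi in zip(y, n, probs(b0, b1)))

def information(b0, b1):
    w = [ni * pi * (1.0 - pi) for ni, pi in zip(n, probs(b0, b1))]
    return (sum(w), sum(wi * xi for wi, xi in zip(w, x)),
            sum(wi * xi * xi for wi, xi in zip(w, x)))

def fit(iters=25):
    """Newton-Raphson fit of logit(p_x) = b0 + b1 * x."""
    b0 = b1 = 0.0
    for _ in range(iters):
        p = probs(b0, b1)
        g0 = sum(yi - ni * pi for yi, ni, pi in zip(y, n, p))
        g1 = sum((yi - ni * pi) * xi for yi, ni, pi, xi in zip(y, n, p, x))
        a, b, c = information(b0, b1)
        det = a * c - b * b
        b0, b1 = b0 + (c * g0 - b * g1) / det, b1 + (a * g1 - b * g0) / det
    return b0, b1

def best_b0(b1):
    """Maximizing intercept for a fixed slope: root of the (decreasing) score."""
    lo, hi = -50.0, 50.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        score = sum(yi - ni * pi for yi, ni, pi in zip(y, n, probs(mid, b1)))
        lo, hi = (mid, hi) if score > 0 else (lo, mid)
    return 0.5 * (lo + hi)

b0_hat, b1_hat = fit()
a, b, c = information(b0_hat, b1_hat)
se = math.sqrt(a / (a * c - b * b))
l_max = loglik(b0_hat, b1_hat)

def drop(b1):
    """Likelihood ratio statistic for H0: slope = b1, intercept profiled out."""
    return 2.0 * (l_max - loglik(best_b0(b1), b1))

def endpoint(direction):
    """Bisection for the slope where drop() crosses CRIT; direction is +1 or -1."""
    inner, outer = b1_hat, b1_hat + direction * se
    while drop(outer) < CRIT:          # widen until the bound is outside the interval
        outer += direction * se
    for _ in range(60):
        mid = 0.5 * (inner + outer)
        inner, outer = (mid, outer) if drop(mid) < CRIT else (inner, mid)
    return 0.5 * (inner + outer)

wald_lo, wald_hi = b1_hat - Z * se, b1_hat + Z * se
prof_lo, prof_hi = endpoint(-1.0), endpoint(+1.0)
print(f"Wald 95% CI:    ({wald_lo:.3f}, {wald_hi:.3f})")
print(f"Profile 95% CI: ({prof_lo:.3f}, {prof_hi:.3f})")
```

The Wald interval is symmetric about the estimate by construction, whereas the profile interval follows the shape of the likelihood and need not be symmetric.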
