alison bowling maximum likelihood. general linear model


Upload: amy-heather-mcbride

Post on 18-Jan-2018


Page 1: ALISON BOWLING MAXIMUM LIKELIHOOD. GENERAL LINEAR MODEL

ALISON BOWLING

MAXIMUM LIKELIHOOD


GENERAL LINEAR MODEL

• $e_i \sim \text{i.i.d. } N(0, \sigma^2)$

• Residuals are:
  • independent and identically distributed
  • normally distributed
  • mean 0, variance $\sigma^2$

• What to do when the normality assumption does not hold?

• We can fit an alternative distribution.
  • This requires Maximum Likelihood methods.


ALTERNATIVE DISTRIBUTIONS

• Binomial (proportions)
  • P (event occurring), 1 − P (event not occurring)

• Poisson (count data)


MAXIMUM LIKELIHOOD

• Myung, I. J. (2003). Tutorial on maximum likelihood estimation. Journal of Mathematical Psychology, 47, 90–100.
• Standard approach to parameter estimation and inference in statistics.
• Many of the inference methods in statistics are based on MLE:
  • Chi-square test
  • Bayesian methods
  • Modelling of random effects


PROBABILITY DISTRIBUTIONS

• Imagine a biased coin, with probability of heads w = 0.7, that is tossed 10 times.
• The following probability distribution can be computed using the binomial formula.

[Figure: probability of each number of heads (0–10) in 10 tosses of a coin with w = 0.7; x-axis: Number of Heads; y-axis: Probability of result, f(y)]

This is a probability distribution:
• the probability of obtaining each particular outcome in 10 tosses of a coin with w = .7
• 7 heads are more likely to occur than any other number of heads.
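The probabilities in the figure can be reproduced directly from the binomial formula $P(y) = \binom{n}{y} w^y (1-w)^{n-y}$. A minimal Python sketch with n = 10 and w = 0.7, using only the standard library:

```python
from math import comb

n, w = 10, 0.7

# Binomial probability of y heads in n tosses: C(n, y) * w^y * (1 - w)^(n - y)
pmf = [comb(n, y) * w**y * (1 - w) ** (n - y) for y in range(n + 1)]

# The outcome with the highest probability (the peak of the distribution)
most_likely = max(range(n + 1), key=lambda y: pmf[y])
print(most_likely, round(pmf[most_likely], 4))  # 7 0.2668
```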


LIKELIHOOD FUNCTION

• Suppose we don’t know w, but have tossed the coin 10 times and obtained y = 7 heads.
• What is the most likely value of w?
  • This may be obtained from the likelihood function.
  • This is a function of the parameter, w, given the data, y.

• The most likely value of w is at the peak of this function.
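The likelihood function uses the same binomial formula, but now the data (y = 7) are held fixed and w varies. A minimal sketch that evaluates it on a grid of w values (the grid spacing of 0.001 is an arbitrary choice) and picks the peak:

```python
from math import comb

n, y = 10, 7

def likelihood(w):
    # Binomial formula read as a function of w, with the observed y fixed
    return comb(n, y) * w**y * (1 - w) ** (n - y)

grid = [i / 1000 for i in range(1001)]   # candidate values of w from 0 to 1
w_peak = max(grid, key=likelihood)
print(w_peak)  # 0.7
```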


MAXIMUM LIKELIHOOD ESTIMATION

• We are interested in finding the probability distribution that underlies the data that have been collected.
  • We are consequently interested in finding the parameter value(s) that correspond to that probability distribution.
• The maximum likelihood estimate (MLE) is the value at the maximum (peak) of the likelihood function.
  • This may be found by setting the first derivative of the likelihood function to zero.
  • To make sure this is a peak (and not a valley), the second derivative is also checked: it must be negative.
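For the coin example the calculus can be written out in full. It is easiest to work with the log of the likelihood, which peaks at the same value of w because the log is a monotonic function. With n = 10 tosses and y = 7 heads:

```latex
\begin{aligned}
L(w \mid y) &= \binom{n}{y} w^{y} (1-w)^{n-y} \\
\ln L(w \mid y) &= \ln\binom{n}{y} + y \ln w + (n-y)\ln(1-w) \\
\frac{d \ln L}{dw} &= \frac{y}{w} - \frac{n-y}{1-w} = 0
  \;\Rightarrow\; \hat{w} = \frac{y}{n} = \frac{7}{10} = 0.7 \\
\frac{d^{2} \ln L}{dw^{2}} &= -\frac{y}{w^{2}} - \frac{n-y}{(1-w)^{2}} < 0
  \quad \text{(so } \hat{w} \text{ is a peak)}
\end{aligned}
```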


ITERATIVE METHOD

• For very simple scenarios, the maximum can be obtained using calculus, as in the example.
• This is usually not possible, especially when the model involves many parameters.
• Instead, the maximum is found by an iterative series of trial-and-error steps:
  • Start with a value of a parameter, w, and compute the likelihood of the data given that value.
  • Then try another value, and see if the likelihood is higher.
  • If so, keep going.
  • Stop when the maximum is found (the solution converges).
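The trial-and-error idea can be sketched in a few lines of Python for the coin example. This is a simple illustrative search (start at w = 0.5, move by a fixed step while the likelihood improves, halve the step when it does not), not the algorithm any particular package uses; the starting value, step size and tolerance are arbitrary choices.

```python
import math

def log_likelihood(w, y=7, n=10):
    # Binomial log likelihood for y heads in n tosses (constant log C(n, y) dropped)
    return y * math.log(w) + (n - y) * math.log(1 - w)

def trial_and_error(w=0.5, step=0.1, tol=1e-8):
    """Keep moving w while the likelihood improves; shrink the step when it stops."""
    best = log_likelihood(w)
    while step > tol:
        for candidate in (w + step, w - step):
            if 0 < candidate < 1 and log_likelihood(candidate) > best:
                w, best = candidate, log_likelihood(candidate)
                break
        else:
            step /= 2  # neither neighbour is better: try smaller steps
    return w

w_hat = trial_and_error()
print(round(w_hat, 4))  # 0.7
```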


MLE ALGORITHMS

• Different algorithms are used to obtain the result:
  • EM: expectation-maximisation algorithm
  • Newton-Raphson
  • Fisher scoring

• SPSS uses both the Newton-Raphson and the Fisher scoring method.
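As an illustration of one of these algorithms, here is a minimal Newton-Raphson sketch for the coin example. Each step moves w by the first derivative of the log likelihood divided by the second derivative. It is written for this one-parameter binomial case only (the function name and defaults are illustrative); real implementations handle many parameters and guard against steps outside the valid range.

```python
def newton_raphson(y=7, n=10, w=0.5, tol=1e-10, max_iter=50):
    """Newton-Raphson for the binomial MLE of w, starting from w."""
    for _ in range(max_iter):
        score = y / w - (n - y) / (1 - w)                # first derivative of ln L
        hessian = -y / w**2 - (n - y) / (1 - w) ** 2     # second derivative of ln L
        step = score / hessian
        w -= step                                        # Newton update
        if abs(step) < tol:                              # converged
            break
    return w

print(round(newton_raphson(), 6))  # 0.7
```

Fisher scoring replaces the second derivative with its expected value, $-n / (w(1-w))$; for this simple model both methods reach $y/n$ in a single step from w = 0.5.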


LOG LIKELIHOOD

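• The computation of the likelihood involves multiplying the probabilities of each individual outcome.
  • This can be computationally awkward: with many observations the product becomes extremely small and can underflow to zero on a computer.
• For this reason, the log of the likelihood is computed instead.
  • Instead of multiplying the probabilities, their logs are added: $\log(A \times B) = \log A + \log B$
• We maximise the log of the likelihood rather than the likelihood itself, for computational convenience. Because the log is a monotonic function, both peak at the same parameter value.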
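A small demonstration of why the log is used: multiplying 2000 probabilities of 0.5 (hypothetical values) underflows to exactly zero in double-precision floating point, while the sum of their logs is an ordinary number.

```python
import math

probs = [0.5] * 2000   # hypothetical probabilities of 2000 individual outcomes

likelihood = 1.0
for p in probs:
    likelihood *= p    # the raw product underflows to 0.0

log_likelihood = sum(math.log(p) for p in probs)   # adding logs stays usable

print(likelihood)                 # 0.0
print(round(log_likelihood, 2))   # -1386.29
```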


-2LL

• The log likelihood is the sum of the log probabilities of the observed outcomes under the model's predictions.
  • This is analogous to the residual sum of squares in OLS regression.
  • The larger −LL (i.e. the more negative the log likelihood), the greater the unexplained variance.
• The log likelihood is usually negative, and can be made positive by multiplying it by −1.
• We multiply by 2 because differences in −2LL between nested models approximately follow a chi-square distribution, which lets us obtain p values to compare models.
• This value is −2LL.
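For the coin example, −2LL can be computed directly. In this minimal sketch (the `minus_2ll` helper is a hypothetical name), the better-fitting value w = 0.7 gives a smaller −2LL than w = 0.5, in the same way a better OLS model has a smaller residual sum of squares.

```python
from math import comb, log

def minus_2ll(w, y=7, n=10):
    # -2 x log likelihood of observing y heads in n tosses when P(heads) = w
    return -2 * log(comb(n, y) * w**y * (1 - w) ** (n - y))

print(round(minus_2ll(0.7), 2))   # 2.64
print(round(minus_2ll(0.5), 2))   # 4.29
```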


EVALUATING MODELS

• Using OLS, we use R² to evaluate models,
  • i.e. does the addition of a predictor produce a significant increase in R²?
• R² is based on sums of squares, which we do not have when using ML.
• We use −2LL, deviance, and information criteria to evaluate models using ML.
  • Unlike R², −2LL is not meaningful in its own right.
  • It is used to compare with other models.


DEVIANCE

• Deviance is a measure of lack of fit.
  • It measures how much worse the model is than a perfectly fitting (saturated) model.
• Deviance can be used to obtain a measure of pseudo-R²:
  • $R^2_{\text{deviance}} = 1 - D_{\text{model}} / D_{\text{intercept-only}}$
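To make "how much worse than a perfectly fitting model" concrete, here is a minimal Python sketch for a Poisson model, assuming made-up counts and fitted means (`counts` and `fitted` are hypothetical). The saturated model sets each fitted mean equal to the observed count, and the deviance is twice the gap in log likelihood between it and the fitted model; the closed-form Poisson expression gives the same number.

```python
import math

def poisson_loglik(y, mu):
    # Sum of log Poisson probabilities; y * log(mu) is taken as 0 when y == 0
    total = 0.0
    for yi, mi in zip(y, mu):
        total += (yi * math.log(mi) if yi > 0 else 0.0) - mi - math.lgamma(yi + 1)
    return total

def deviance(y, mu):
    saturated = poisson_loglik(y, y)   # perfect fit: each mean equals the observed count
    return 2 * (saturated - poisson_loglik(y, mu))

def deviance_closed_form(y, mu):
    # 2 * sum[y ln(y / mu) - (y - mu)], with 0 * ln 0 = 0
    return 2 * sum((yi * math.log(yi / mi) if yi > 0 else 0.0) - (yi - mi)
                   for yi, mi in zip(y, mu))

counts = [0, 2, 3, 1, 5]             # hypothetical observed counts
fitted = [1.0, 1.5, 2.5, 2.0, 3.0]   # hypothetical model predictions
print(round(deviance(counts, fitted), 4), round(deviance_closed_form(counts, fitted), 4))
```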


LIKELIHOOD RATIO STATISTIC

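• $L_R$ = likelihood of the reduced model (without the parameters)
• $L_F$ = likelihood of the full model (with the parameters)
• $G^2 = -2\ln(L_R / L_F) = (-2LL_R) - (-2LL_F) \sim \chi^2_r$ (approximately, in large samples), where $r = df_{\text{full}} - df_{\text{reduced}}$
• G² compares the fitted model with the intercept-only model.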


MAXIMUM LIKELIHOOD IN SPSS

• Logistic regression
  • Used with a binomial outcome variable, e.g. yes/no; correct/incorrect; married/not married.
• Generalised Linear Models
  • Provide a range of non-linear models to be fitted.


BAR-TAILED GODWIT DATA

• Dependent variable is a count:
  • Maximum number of birds observed at each estuary for each year
• Independent variables:
  • Estuary: Richmond, Hastings, Clarence, Hunter, Tweed (categorical)
  • Year: 1981–2014 (continuous, centred to 0 at 1981)
• Research question:
  • Does the number of Bar-tailed Godwits in the Richmond Estuary remain stable, or improve, compared to the other estuaries?


STEP 1: GRAPH THE DATA

It is obvious that these data have problems.

Counts in the Hunter estuary are much higher than in the other estuaries, and have much greater variance.


STEP 2: DUMMY CODE THE ESTUARY DATA

Dummy      Richmond  Clarence  Hunter  Hastings  Tweed
Clarence   0         1         0       0         0
Hunter     0         0         1       0         0
Hastings   0         0         0       1         0
Tweed      0         0         0       0         1

Use Richmond as the comparison category. Each of the other estuaries may be compared in turn with Richmond.
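Dummy coding can be sketched as a small lookup: each non-reference estuary gets its own 0/1 indicator, and Richmond, the comparison category, is coded 0 on all of them. The helper name `dummy_code` is hypothetical.

```python
ESTUARIES = ["Richmond", "Clarence", "Hunter", "Hastings", "Tweed"]
REFERENCE = "Richmond"   # comparison category

def dummy_code(estuary):
    # One 0/1 indicator per non-reference estuary; Richmond is 0 on every one
    return {name: int(estuary == name) for name in ESTUARIES if name != REFERENCE}

print(dummy_code("Hunter"))
# {'Clarence': 0, 'Hunter': 1, 'Hastings': 0, 'Tweed': 0}
```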


STEP 3: RUN OLS ANALYSIS OF THE DATA

• I will just include Hunter in this analysis, to illustrate.
• Model:
• Including just Year0:
  • There is a non-significant change in Godwit numbers over the years.


OLS DATA ANALYSIS

• Including the estuary and the estuary × Year0 interaction:

There is a significant increase in R² when Hunter and the Hunter × Year0 interaction are included in the model.


INTERPRETATION OF THE FULL MODEL

• At Year0 = 0, the predicted number of Godwits for Richmond = 292 birds.
• Change in numbers per year for Richmond = −4.4.
• At Year0 = 0, the difference between numbers in the Hunter and Richmond estuaries = 1449.7 (p < .001).
• Over 24 years, the difference in rate of change for Hunter, compared with Richmond, is −15.2 (p = .031),
  • i.e. there is a steeper decline in bird numbers in the Hunter estuary than in the Richmond estuary.
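The coefficients above can be combined into predicted counts. A minimal sketch, assuming the model is Count = b0 + b1·Year0 + b2·Hunter + b3·Hunter×Year0, with Hunter coded 1 for the Hunter estuary and 0 for Richmond; the coefficient values are the rounded ones reported above, and the function name is hypothetical.

```python
# Rounded OLS coefficients reported above (Hunter vs Richmond)
B0, B_YEAR, B_HUNTER, B_HUNTER_YEAR = 292.0, -4.4, 1449.7, -15.2

def predicted_count(year0, hunter):
    """Predicted Godwit count; hunter = 1 for Hunter, 0 for Richmond (assumed coding)."""
    return B0 + B_YEAR * year0 + B_HUNTER * hunter + B_HUNTER_YEAR * hunter * year0

richmond_slope = B_YEAR                 # change per year in Richmond: -4.4
hunter_slope = B_YEAR + B_HUNTER_YEAR   # change per year in Hunter: -19.6

print(predicted_count(0, 0), predicted_count(0, 1))   # Richmond and Hunter in 1981
```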


CHECKING RESIDUALS….

• Residuals are not normally distributed.
• The assumptions for a linear model are not met!


WHAT TO DO?

• We could try a transformation of the DV.
  • A square-root transformation is better, but not perfect.
• We could use a non-linear model.
  • The data are counts, and we could use either a Poisson or a Negative Binomial distribution.
  • We will use a Negative Binomial (for reasons that will be explained later).
  • Use Generalized Linear Models for the analysis.


INTERCEPT ONLY MODEL

• No predictors are included, and the model simply tests whether the overall number of BT Godwits is different from zero.
• The log likelihood is −827.26
• −2LL = 1654.53


MODEL WITH THREE PARAMETERS

• Running the model including Year0, Hunter and Hunter × Year0 gives the following goodness-of-fit measures:
  • Log likelihood = −781.3
  • −2LL = 1562.6


COMPARING THE TWO MODELS

• −2LL for the intercept-only model = 1654.53
• −2LL for the full model (with parameters) = 1562.6
• Likelihood ratio (G²) = 1654.5 − 1562.6 = 91.9
• df = 3, p < .001
• Therefore the model including the three parameters is a better fit to the data than the intercept-only model.
• Limitations:

1. The models must be nested (one model must be contained within the other).

2. The data sets must be identical.
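The likelihood ratio test can be reproduced from the two −2LL values above. A self-contained Python sketch (no SciPy; `chi2_sf` is a hypothetical helper that uses the closed-form chi-square tail probabilities valid for integer degrees of freedom):

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df."""
    y = x / 2.0
    if df % 2 == 0:
        # Even df: finite sum of Poisson-type terms
        return math.exp(-y) * sum(y**i / math.factorial(i) for i in range(df // 2))
    # Odd df: start from df = 1 (erfc) and add one term per extra 2 df
    total = math.erfc(math.sqrt(y))
    for i in range(1, (df - 1) // 2 + 1):
        total += math.exp(-y) * y ** (i - 0.5) / math.gamma(i + 0.5)
    return total

m2ll_intercept_only = 1654.53   # values reported on the slides
m2ll_full = 1562.6
g2 = m2ll_intercept_only - m2ll_full   # likelihood ratio statistic
p = chi2_sf(g2, df=3)
print(round(g2, 2), p < 0.001)          # 91.93 True
```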


INFORMATION CRITERIA

• Akaike’s Information Criterion: $\text{AIC} = -2LL + 2k$
• Schwarz’s Bayesian Criterion: $\text{BIC} = -2LL + k\ln(N)$
  • k = number of parameters
  • N = number of participants
• Can be used with non-nested models.
• These IC are similar in spirit to adjusted R².

• The more parameters you have, the better a model is likely to fit the data.

• The IC take this into account by penalising for additional parameters and/or participants.

• Better fitting models have lower values of the IC.


ANALYSIS OF COUNT DATA

• Coxe, S., West, S. G., & Aiken, L. S. (2009). The analysis of count data: A gentle introduction to Poisson regression and its alternatives. Journal of Personality Assessment, 91, 121–136.
• Poisson regression
• Overdispersed Poisson regression models
• Negative binomial regression models
• Models which address problems with zeros


ANALYSIS OF COUNT DATA

• Count data are discrete numbers.
  • Usually not normally distributed.
  • E.g. number of drinks on a Saturday night.
• Modelled by a Poisson distribution.
  • This has one parameter, μ.
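A quick numerical check that a Poisson distribution's single parameter μ is both its mean and its variance. μ = 2.5 is an arbitrary illustrative value, and the sum is truncated at 60, beyond which the probabilities are negligible for this μ.

```python
from math import exp, factorial

def poisson_pmf(k, mu):
    # P(Y = k) = mu^k e^(-mu) / k!
    return mu**k * exp(-mu) / factorial(k)

mu = 2.5
ks = range(60)
probs = [poisson_pmf(k, mu) for k in ks]
mean = sum(k * p for k, p in zip(ks, probs))
var = sum((k - mean) ** 2 * p for k, p in zip(ks, probs))
print(round(mean, 6), round(var, 6))  # 2.5 2.5
```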


POISSON MODEL

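• Assumptions: $(Y \mid X) \sim \text{Poi}(\mu)$, $\text{Var}(Y \mid X) = \phi\mu$, $\phi = 1$
• i.e. the outcome, conditional on the predictors, has a Poisson distribution, so its variance equals its mean.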


EXAMPLE: DRINKS DATA

• Coxe et al. Poisson dataset in SPSS format.
  • Sensation: mean score on a sensation seeking scale (1–7)
  • Gender (0 = female, 1 = male)
  • Y: number of drinks on a Saturday night


OLS REGRESSION

• Intercept < 0
  • When sensation = 0, the predicted number of drinks is negative!
• Residuals are not normally distributed.
• OLS has problems!


POISSON REGRESSION: PARAMETERS

• Sensation only

• When sensation = 0, predicted drinks = $e^{-.14}$ = .86
• For every 1-unit increase in sensation, the predicted number of drinks is multiplied by $e^{.231}$ = 1.26.
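Because Poisson regression models the log of the expected count, a coefficient acts multiplicatively. A minimal sketch using the rounded coefficients above, assuming a log link (as the exponentiation on the slide implies); the function name is hypothetical.

```python
import math

B0, B_SENSATION = -0.14, 0.231   # rounded sensation-only coefficients from the slide

def expected_drinks(sensation):
    # Log link: ln(mu) = B0 + B_SENSATION * sensation
    return math.exp(B0 + B_SENSATION * sensation)

# The ratio for a 1-unit increase is the same wherever you start on the scale
ratios = [expected_drinks(s + 1) / expected_drinks(s) for s in (1, 3, 5)]
print([round(r, 2) for r in ratios])   # [1.26, 1.26, 1.26]
```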


POISSON REGRESSION: MODEL FIT

• Sensation only: Model fit

• G² = 35.07
  • The model fits better than the intercept-only model.
• Deviance = 1151
• −2LL = −2 × (−1037.5) = 2075
• BIC = 2087

• Deviance for the intercept-only model = 1186 (check)

• Pseudo-R2 =
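A quick calculation with the deviances reported above, assuming the deviance-based definition pseudo-R² = 1 − D_model / D_intercept-only (an assumption; other pseudo-R² measures exist). The drop in deviance should also be close to the reported G² of 35.07, since the deviances are rounded.

```python
D_MODEL, D_NULL = 1151.0, 1186.0   # deviances reported above
G2_REPORTED = 35.07

pseudo_r2 = 1 - D_MODEL / D_NULL   # proportion of intercept-only deviance explained
drop = D_NULL - D_MODEL            # should be close to the likelihood ratio G2
print(round(pseudo_r2, 3), drop)   # 0.03 35.0
```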


POISSON REGRESSION: PARAMETERS

• Sensation and Gender as predictors

• What is the effect of gender on number of drinks consumed (holding sensation constant)?


EFFECT OF GENDER

• Intercept = −.789 (for gender = 0; female)
  • Exp(−.789) = .45
  • Females are predicted to drink .45 drinks on a Saturday night (when sensation seeking = 0).
• B = .839 (gender = 1: male)
  • Exp(.839) = 2.3
  • Males drink 2.3 times as many drinks as females (when sensation seeking = 0).


POISSON REGRESSION: MODEL FIT

• −2LL = −2 × (−941.4) = 1882.8
• BIC = 1900.77
• The model including gender is a substantially better fit than the sensation model alone (BIC 1900 vs 2087).

Pseudo-R2 =
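The information criteria can be reproduced from the −2LL values: 2075 for the sensation-only model and −2 × (−941.4) = 1882.8 for the model adding gender. A minimal sketch, assuming N = 400 (the sample size is not stated on these slides; with this value the BIC formula gives 2086.98 and 1900.77, matching the reported values) and counting k as the intercept plus one coefficient per predictor.

```python
import math

def aic(m2ll, k):
    return m2ll + 2 * k

def bic(m2ll, k, n):
    return m2ll + k * math.log(n)

N = 400  # assumed sample size; not stated on these slides
models = {
    "sensation": (2075.0, 2),            # -2LL, k (intercept + 1 slope)
    "sensation + gender": (1882.8, 3),   # -2 x (-941.4) = 1882.8
}
for name, (m2ll, k) in models.items():
    print(name, round(aic(m2ll, k), 2), round(bic(m2ll, k, N), 2))
# sensation 2079.0 2086.98
# sensation + gender 1888.8 1900.77
```

Lower values are better on both criteria, so both favour the model that includes gender.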


MODEL ADEQUACY

• Save deviance residuals and predicted values, and plot the residuals against predicted values.
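Deviance residuals are the signed square roots of each case's contribution to the deviance. A minimal sketch for a Poisson model, with hypothetical counts and predicted values; plotting `residuals` against `fitted` (for example with matplotlib) gives the adequacy plot described above.

```python
import math

def deviance_residuals(y, mu):
    """Signed square root of each case's Poisson deviance contribution."""
    res = []
    for yi, mi in zip(y, mu):
        unit = 2 * ((yi * math.log(yi / mi) if yi > 0 else 0.0) - (yi - mi))
        res.append(math.copysign(math.sqrt(max(unit, 0.0)), yi - mi))
    return res

counts = [0, 2, 3, 1, 5]             # hypothetical observed counts
fitted = [1.0, 1.5, 2.5, 2.0, 3.0]   # hypothetical predicted values
residuals = deviance_residuals(counts, fitted)
print([round(r, 3) for r in residuals])
```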


OVERDISPERSION

• A Poisson distribution has only one parameter, μ, which is both the mean and the variance of the distribution.
• Often the variance of a set of data is greater than the mean.
  • The data are overdispersed.
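A rough first check is to compare the sample variance with the sample mean: for Poisson data the ratio should be near 1. The counts below are made up for illustration. This simple check ignores the predictors; in a fitted model the comparable check looks at variability around the predicted values.

```python
from statistics import mean, variance

counts = [0, 0, 1, 0, 3, 7, 0, 2, 12, 1, 0, 5]   # hypothetical drink counts

m = mean(counts)
v = variance(counts)   # sample variance (n - 1 denominator)
ratio = v / m          # about 1 for Poisson data; well above 1 suggests overdispersion
print(round(m, 2), round(v, 2), round(ratio, 2))
```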


OVERDISPERSED POISSON REGRESSION MODELS

• A second parameter, $\phi$, is estimated to scale the variance: $\text{Var}(Y \mid X) = \phi\mu$.
• The parameter estimates from the overdispersed model are the same as with the simple model, but the standard errors are larger (multiplied by $\sqrt{\phi}$).
• Use information criteria to compare models.
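One common way to estimate $\phi$ (an assumption here; software may offer several methods) is the Pearson chi-square divided by the residual degrees of freedom. The sketch below, with hypothetical function names and toy numbers, shows that estimate and the $\sqrt{\phi}$ inflation of a Poisson standard error.

```python
import math

def pearson_dispersion(y, mu, n_params):
    """Estimate phi as the Pearson chi-square divided by the residual df."""
    chi2 = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))
    return chi2 / (len(y) - n_params)

def overdispersed_se(se_poisson, phi):
    # Coefficients are unchanged; only the standard error is scaled by sqrt(phi)
    return se_poisson * math.sqrt(phi)

counts = [0, 0, 1, 0, 3, 7, 0, 2, 12, 1, 0, 5]   # hypothetical outcomes
fitted = [2.58] * len(counts)                   # hypothetical intercept-only fit
phi = pearson_dispersion(counts, fitted, n_params=1)
print(round(phi, 2), round(overdispersed_se(0.05, phi), 3))
```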


NEGATIVE BINOMIAL MODELS

• Negative binomial models start from a Poisson distribution, but allow the Poisson mean to vary across individuals (following a gamma distribution), so the variance, $\mu + \alpha\mu^2$, can exceed the mean.
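A minimal simulation sketch of that idea, using only Python's standard library: each simulated person gets their own Poisson mean drawn from a gamma distribution with mean μ and variance αμ², and the resulting counts have a variance well above their mean. The values μ = 4 and α = 0.5, the seed, and the helper names are hypothetical choices for illustration.

```python
import math
import random
from statistics import mean, variance

def poisson_draw(lam, rng):
    # Knuth's multiplication method; adequate for the small means used here
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def negative_binomial_draw(mu, alpha, rng):
    # Person-specific Poisson mean from a gamma with mean mu, variance alpha * mu**2
    lam = rng.gammavariate(1 / alpha, alpha * mu)   # shape, scale
    return poisson_draw(lam, rng)

rng = random.Random(1)
mu, alpha = 4.0, 0.5
sample = [negative_binomial_draw(mu, alpha, rng) for _ in range(20000)]
print(round(mean(sample), 2), round(variance(sample), 2))
# theory: mean = mu = 4, variance = mu + alpha * mu**2 = 12
```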


HOMEWORK

• Use PGSI Data.sav (Leigh’s Honours data)
  • DV = PGSI (score on the Problem Gambling Severity Index)
  • Predictors = GABS, FreqCoded
• Run a Poisson regression to predict PGSI from GABS.
  • Does GABS significantly predict PGSI score?
  • Look at the likelihood ratio (G²).
  • Interpret the coefficients for the intercept and GABS.
• Run a second regression including FreqCode (as a continuous variable) in the model.
  • Does this second predictor improve the model fit?
  • (Hint: look at the BIC for the two models.)