Understanding and Interpreting Results from Logistic, Multinomial, and Ordered Logistic Regression Models: Using Post-Estimation Commands in Stata
Raymond Sin-Kwok Wong
University of California-Santa Barbara
Model Estimation and Interpretation
• For OLS models, both model estimation and interpretation are relatively easy, since the effects are linear.
• For non-linear models, model estimation is simple but the interpretation of results can be tricky, especially for beginners who are not familiar with the non-linear relationship between dependent and independent variables.
What is my talk about?
– Not about the rationale for statistical modeling or the mathematical and statistical derivation of specific non-linear models
– But about a set of post-estimation tools that aid understanding and interpretation, and about presenting complex relationships among variables using graphical displays
Alternative Output Methods
• (a) Display odds ratios rather than logit coefficients
logit y x1 x2 x3 x4 x5
logit, or
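As a rough illustration of what the odds-ratio display reports, the Python sketch below exponentiates a logit coefficient to obtain the odds ratio and the corresponding percent change in the odds. The coefficient and confidence limits are the ones printed for drug in the logit output later in this transcript; the function names are illustrative and are not part of Stata.

```python
import math

def odds_ratio(coef):
    """Convert a logit coefficient (a change in log odds) to an odds ratio."""
    return math.exp(coef)

def percent_change_in_odds(coef):
    """Percent change in the odds for a one-unit increase in x."""
    return 100.0 * (math.exp(coef) - 1.0)

# Coefficient and 95% limits for drug, copied from the logit output in this transcript.
b_drug, lower, upper = -1.150009, -2.237697, -0.0623212

print(f"odds ratio             = {odds_ratio(b_drug):.3f}")
# The interval for the odds ratio is obtained by exponentiating the logit limits.
print(f"95% interval (odds)    = [{odds_ratio(lower):.3f}, {odds_ratio(upper):.3f}]")
print(f"percent change in odds = {percent_change_in_odds(b_drug):.1f}%")
```

An odds ratio below 1 corresponds to a negative logit coefficient, so the direction of the effect is unchanged; only the scale differs.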
Alternative Output Methods
• (b) Use listcoef
listcoef [varlist] [, pvalue(#) [factor|percent|std] constant help]
Factor: factor changes in the odds or expected counts
Percent: % change in the odds or expected counts
Std: standardized coefficients
==============================================================
                                           Option
                                     std      factor   percent
--------------------------------------------------------------
Type 1: regress, probit, cloglog,    Default  No       No
        oprobit, tobit, cnreg, intreg
Type 2: logit, logistic, ologit      Yes      Default  Yes
Type 3: clogit, mlogit, poisson,     No       Default  Yes
        nbreg, zip, zinb
==============================================================
Alternative Output Methods
• Different standardized coefficients (Sx = standard deviation of xk, Sy = standard deviation of y)
– x-standardized (bStdX)
• For a standard deviation increase in xk, y is expected to change by βk·Sx units, holding everything else constant
– y-standardized (bStdY)
• For a unit increase in xk, y is expected to change by βk/Sy standard deviations, holding everything else constant
– Fully standardized (bStdXY)
• For a standard deviation increase in xk, y is expected to change by βk·Sx/Sy standard deviations, holding everything else constant
Post-Estimation Tests
regress y x1 x2 … xk
estimates store mod1
What is the use? Stored estimates can be recalled for post-estimation analysis
Two kinds of tests are common:
(a) Wald test, and
(b) LR tests
Wald Test
test varlist [, accumulate]
For example,
(a) regress y x1 x2 x3 x4 x5
    test x1 x2 x3 x4 x5
This tests H0: β1 = β2 = β3 = β4 = β5 = 0
(b) test x1=2*x2
    test x3=x4, accumulate
This tests H0: β1 = 2β2 and β3 = β4
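For a single restriction such as H0: βk = 0 in a model fit by maximum likelihood, the Wald statistic reduces to the familiar z ratio, and its square is a chi-squared statistic with 1 degree of freedom. The Python sketch below works through that arithmetic for the drug coefficient and standard error printed in the logit output later in this transcript. The helper names are hypothetical; joint tests of several coefficients also need the full covariance matrix, which is not shown in the transcript and is not attempted here.

```python
import math

def wald_z(coef, se, null_value=0.0):
    """z statistic for H0: coefficient == null_value."""
    return (coef - null_value) / se

def two_sided_p(z):
    """P(|Z| > |z|) for a standard normal Z, via the complementary error function."""
    return math.erfc(abs(z) / math.sqrt(2.0))

# Coefficient and standard error for drug, copied from the logit output in this transcript.
b_drug, se_drug = -1.150009, 0.5549529

z = wald_z(b_drug, se_drug)
print(f"z        = {z:.2f}")
print(f"chi2(1)  = {z ** 2:.2f}")   # squared z: the 1-df Wald chi-squared
print(f"p-value  = {two_sided_p(z):.3f}")
```

The z and p-value computed this way agree with the z and P>|z| columns Stata prints for drug.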
LR (Likelihood-Ratio) Tests
lrtest [, saving(name) using(name) model(name) df(#) ]
For example,
(a) logit chd age age2 sex      estimate saturated model
(b) lrtest, saving(0)           save results
(c) logit chd age sex           estimate simpler model
(d) lrtest                      obtain test
(e) lrtest, saving(1)           save results as 1
(f) logit chd sex               estimate simplest model
(g) lrtest                      compare to saturated model
(h) lrtest, using(1)            compare to model 1
(i) lrtest, model(1)            repeat earlier test
. logit died studytime age drug

Logit estimates                                   Number of obs   =         48
                                                  LR chi2(3)      =      13.67
                                                  Prob > chi2     =     0.0034
Log likelihood = -24.364293                       Pseudo R2       =     0.2191
------------------------------------------------------------------------------
        died |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   studytime |  -.0236468   .0457671    -0.52   0.605    -.1133487    .0660551
         age |   .0793438   .0699391     1.13   0.257    -.0577343    .2164219
        drug |  -1.150009   .5549529    -2.07   0.038    -2.237697   -.0623212
       _cons |  -1.113136   3.945369    -0.28   0.778    -8.845918    6.619645
------------------------------------------------------------------------------

. lrtest, saving(0)

. logit died studytime age

Iteration 0:   log likelihood = -31.199418
Iteration 1:   log likelihood =  -26.82757
Iteration 2:   log likelihood = -26.734502
Iteration 3:   log likelihood = -26.734061
Iteration 4:   log likelihood = -26.734061

Logit estimates                                   Number of obs   =         48
                                                  LR chi2(2)      =       8.93
                                                  Prob > chi2     =     0.0115
Log likelihood = -26.734061                       Pseudo R2       =     0.1431
------------------------------------------------------------------------------
        died |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   studytime |  -.0843475   .0353784    -2.38   0.017     -.153688    -.015007
         age |   .0518897   .0646409     0.80   0.422    -.0748042    .1785836
       _cons |    -.87332   3.729449    -0.23   0.815    -8.182906    6.436266
------------------------------------------------------------------------------

. lrtest
Logit:  likelihood-ratio test                     chi2(1)         =       4.74
                                                  Prob > chi2     =     0.0295

. lrtest, saving(1)

. logit died age

Iteration 0:   log likelihood = -31.199418
Iteration 1:   log likelihood = -29.955649
Iteration 2:   log likelihood = -29.945382
Iteration 3:   log likelihood = -29.945379

Logit estimates                                   Number of obs   =         48
                                                  LR chi2(1)      =       2.51
                                                  Prob > chi2     =     0.1133
Log likelihood = -29.945379                       Pseudo R2       =     0.0402
------------------------------------------------------------------------------
        died |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0893535   .0585925     1.52   0.127    -.0254857    .2041928
       _cons |  -4.353928   3.238757    -1.34   0.179    -10.70177    1.993919
------------------------------------------------------------------------------

. lrtest
Logit:  likelihood-ratio test                     chi2(2)         =      11.16
                                                  Prob > chi2     =     0.0038

. lrtest, using(1)
Logit:  likelihood-ratio test                     chi2(1)         =       6.42
                                                  Prob > chi2     =     0.0113
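The three likelihood-ratio tests in the output above can be checked by hand: each statistic is twice the difference between the log likelihoods of the larger and the smaller model, with degrees of freedom equal to the number of coefficients dropped. The Python sketch below redoes that arithmetic from the log likelihoods printed in the output. The function names are illustrative, and the tail-probability helper uses closed forms that hold only for 1 and 2 degrees of freedom, which is all this example needs.

```python
import math

def lr_statistic(ll_larger, ll_smaller):
    """Likelihood-ratio chi-squared: 2 * (logL of larger model - logL of nested model)."""
    return 2.0 * (ll_larger - ll_smaller)

def chi2_upper_tail(x, df):
    """Upper-tail chi-squared probability; closed forms for df = 1 and df = 2 only."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("this sketch only handles df = 1 or df = 2")

# Log likelihoods copied from the three logit runs above.
ll_full = -24.364293      # died on studytime, age, drug
ll_no_drug = -26.734061   # died on studytime, age
ll_age_only = -29.945379  # died on age

comparisons = [
    ("full vs. no drug", ll_full, ll_no_drug, 1),
    ("full vs. age only", ll_full, ll_age_only, 2),
    ("no drug vs. age only", ll_no_drug, ll_age_only, 1),
]
for label, ll_big, ll_small, df in comparisons:
    stat = lr_statistic(ll_big, ll_small)
    print(f"{label}: chi2({df}) = {stat:.2f}, p = {chi2_upper_tail(stat, df):.4f}")
```

The comparisons correspond, in order, to the first lrtest, the second lrtest (against the model saved as 0), and lrtest, using(1).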
Fit Statistics
• fitstat calculates a large number of fit statistics for many kinds of regression models. It works after the following: clogit, cnreg, cloglog, intreg, logistic, logit, mlogit, nbreg, ocratio, ologit, oprobit, poisson, probit, regress, zinb, and zip. With the saving() and using() options, it can also be used to compare fit measures for two different models.
fitstat [, saving(name) using(name) bic force save dif]
Examples:
(a) logit y x1 x2 … x10
    fitstat                         fit statistics
(b) To compute, save, and compare with other models
    logit y x1 x2 x3 x4 x5 age
    quietly fitstat, saving(mod1)
    generate age2=age*age
    logit y x1 x2 x3 x4 x5 age age2
    fitstat, using(mod1)
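Two of the simpler likelihood-based fit measures can be recomputed from numbers already printed in the logit output earlier in this transcript: the model LR chi-squared and the pseudo R2 that Stata reports (McFadden's). The Python sketch below assumes that the iteration 0 log likelihood (-31.199418) is the log likelihood of the intercept-only model, which is how Stata's logit starts its iterations; the function names are illustrative, and the sketch is not a reimplementation of fitstat.

```python
def model_lr_chi2(ll_model, ll_null):
    """LR chi-squared of the fitted model against the intercept-only model."""
    return 2.0 * (ll_model - ll_null)

def mcfadden_r2(ll_model, ll_null):
    """McFadden's pseudo R2: 1 - logL(model) / logL(intercept only)."""
    return 1.0 - ll_model / ll_null

# Assumption: iteration 0 log likelihood = intercept-only log likelihood.
LL_NULL = -31.199418

# Final log likelihoods of the three logit models shown in this transcript.
models = {
    "studytime age drug": -24.364293,
    "studytime age": -26.734061,
    "age": -29.945379,
}
for label, ll in models.items():
    print(f"{label}: LR chi2 = {model_lr_chi2(ll, LL_NULL):.2f}, "
          f"pseudo R2 = {mcfadden_r2(ll, LL_NULL):.4f}")
```

Comparing these values across models is the kind of side-by-side comparison that the saving() and using() options automate.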
Post-Estimation Approach to Interpret Non-Linear Regression Models
• For non-linear regression models, individual coefficients do not have a simple linear interpretation. For example, the beta coefficient in a logistic regression model can be interpreted directly only as a logit coefficient, that is, as a change in the log odds. If we want to interpret the model in terms of predicted probabilities, the effect of a change in a variable depends on the values of all variables in the model. To put it differently, it depends on where we evaluate the effect.
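A small numerical sketch may make the point concrete. The Python code below plugs the coefficients from the logit of died on studytime, age, and drug (shown earlier in this transcript) into the logistic function and computes the change in P(died=1) for a one-unit increase in drug at several ages. The evaluation points (studytime = 10, ages 50 to 70, drug moving from 0 to 1) are hypothetical choices made for illustration; the transcript does not describe how the variables are coded.

```python
import math

def invlogit(xb):
    """Logistic function: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-xb))

# Coefficients copied from the logit output in this transcript.
COEF = {"_cons": -1.113136, "studytime": -0.0236468, "age": 0.0793438, "drug": -1.150009}

def predicted_probability(studytime, age, drug):
    """P(died=1) at the given covariate values."""
    xb = (COEF["_cons"] + COEF["studytime"] * studytime
          + COEF["age"] * age + COEF["drug"] * drug)
    return invlogit(xb)

def effect_of_drug(studytime, age, start=0.0, end=1.0):
    """Change in P(died=1) when drug moves from start to end, other covariates fixed."""
    return (predicted_probability(studytime, age, end)
            - predicted_probability(studytime, age, start))

# Hypothetical evaluation points: the logit coefficient is the same everywhere,
# but the change in probability is not.
for age in (50, 60, 70):
    print(f"age {age}: change in P(died=1) = {effect_of_drug(10, age):+.3f}")
```

The logit coefficient for drug is constant, yet the implied change in probability shrinks or grows with the values of the other covariates, which is why the post-estimation tools below always ask where the effect should be evaluated.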
Post-Estimation Approach to Interpret Non-Linear Regression Models
• (1) Use the predict command
regress y x1 x2 x3          generate predicted y
predict newvar
logit y x1 x2 x3            generate predicted P(Y=1)
predict newvar
ologit y x1 x2 x3           generate predicted P(Y=k)
predict newvar1 newvar2 … newvark
mlogit y x1 x2 x3           generate predicted P(Y=k)
predict newvar1 newvar2 … newvark
poisson y x1 x2 x3          generate predicted count
predict newvar
Post-Estimation Approach to Interpret Non-Linear Regression Models
• (2) Use prchange command to compute discrete and marginal changes in the predicted outcomes
prchange [varlist] [if exp] [in range] [, x(variables_and_values) rest(stat) outcome(#) fromto brief nobase nolabel help all uncentered delta(#)]
Examples:
(a) prchange age, x(x1=20 x2=10) rest(mean) help
(b) prchange, help
(c) prchange x1 x2, fromto
These calculate the change in the predicted outcome as x moves from min to max, from 0 to 1, from -.5 to +.5 around its current value, and from -.5 sd to +.5 sd around its current value, plus the marginal effect.
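The difference between a discrete change and a marginal change can be sketched in a few lines of Python. The example uses the single-predictor logit of died on age shown earlier in this transcript; the evaluation point (age 56) and the start and end values in the usage lines are hypothetical, and the function names are illustrative rather than anything prchange itself exposes.

```python
import math

def invlogit(xb):
    """Logistic function: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-xb))

# Coefficients copied from the output of "logit died age" in this transcript.
B_CONS, B_AGE = -4.353928, 0.0893535

def prob(age):
    """P(died=1) at a given age."""
    return invlogit(B_CONS + B_AGE * age)

def discrete_change(start, end):
    """Change in P(died=1) as age moves from start to end."""
    return prob(end) - prob(start)

def centered_change(age, delta=1.0):
    """Change in P(died=1) as age moves from age - delta/2 to age + delta/2."""
    return discrete_change(age - delta / 2.0, age + delta / 2.0)

def marginal_effect(age):
    """Instantaneous rate of change for a logit: beta * p * (1 - p)."""
    p = prob(age)
    return B_AGE * p * (1.0 - p)

age0 = 56.0  # hypothetical evaluation point
print(f"centered unit change   : {centered_change(age0):.4f}")
print(f"uncentered unit change : {discrete_change(age0, age0 + 1.0):.4f}")
print(f"marginal effect        : {marginal_effect(age0):.4f}")
print(f"change from 50 to 65   : {discrete_change(50.0, 65.0):.4f}")
```

For a small change the centered discrete change and the marginal effect nearly coincide; over a wide range they can differ substantially because the probability curve is not a straight line.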
Post-Estimation Approach to Interpret Non-Linear Regression Models
• (3) Use the prvalue command to calculate the change in probability for a discrete change of any magnitude in an independent variable.
prvalue [if exp] [in range] [, x(variables_and_values) rest(stat) level(#) save dif brief all maxcnt(#) nobase nolabel ystar]
Examples:
(a) prvalue, rest(median)
(b) prvalue, x(age=30) save brief
    prvalue, x(age=40) dif brief
(c) prvalue age, x(age=30) uncentered delta(10) rest(mean) brief
Examples (b) and (c) each give the change in probability (P(Y=1)) from age 30 to age 40.
Post-Estimation Approach to Interpret Non-Linear Regression Models
• (4) Use prgen command to compute predicted values as one variable changes over a range of values, which is useful for constructing plots. The syntax is:
prgen varname, generate(newvar) [from(#) to(#) ncases(#) x(variables_and_values) rest(stat) maxcnt(#) brief nobase all]
Examples
To compute predicted values from an ordered probit where warm has four categories SD, D, A and SA:
. oprobit warm yr89 male white age ed prst
. prgen age, f(20) t(80) gen(mn)
. prgen age, x(male=0) rest(grmean) f(20) t(80) gen(fem)
. prgen age, x(male=1) rest(grmean) f(20) t(80) gen(mal)
To plot the predicted probabilities for average males:
. graph malp1 malp2 malp3 malp4 malX
Post-Estimation Approach to Interpret Non-Linear Regression Models
Models and Predictions (* is the prefix)
all models:
*X: value of X
logit & probit:
Predicted probability of each outcome: *p0, *p1
ologit, oprobit:
Predicted probabilities: *p#1, *p#2, ... where #1, #2, ... are values of the outcome variable.
Cumulative probabilities: *s#1, *s#2, ... where #1, #2, ... are values of the outcome variable. *s#k is the probability of all categories up to or equal to #k.
mlogit:
Predicted probabilities: *p#1,*p#2,... where #1,#2,... are values of the outcome variable.
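The relationship between the category probabilities (the *p variables) and the cumulative probabilities (the *s variables) in an ordered model can be sketched as follows. The Python code uses the ordered logit form, in which the probability of being at or below category k is the logistic function of the k-th cutpoint minus the linear predictor; for an ordered probit the logistic function would be replaced by the standard normal CDF. The linear predictor and the three cutpoints are hypothetical numbers chosen only to give a four-category example; they are not estimates from the transcript.

```python
import math

def invlogit(z):
    """Logistic function: maps a real number to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def ordered_logit_probs(xb, cutpoints):
    """Return (category probabilities, cumulative probabilities) for an ordered logit.

    cumulative[k] is P(y <= category k); the last category always has cumulative 1.
    """
    cumulative = [invlogit(c - xb) for c in cutpoints] + [1.0]
    probs = [cumulative[0]] + [
        cumulative[k] - cumulative[k - 1] for k in range(1, len(cumulative))
    ]
    return probs, cumulative

# Hypothetical linear predictor and cutpoints (three cutpoints give four categories).
xb = 0.5
cutpoints = [-1.0, 0.5, 2.0]

probs, cumulative = ordered_logit_probs(xb, cutpoints)
for k, (p, s) in enumerate(zip(probs, cumulative), start=1):
    print(f"category {k}: p = {p:.3f}, cumulative s = {s:.3f}")
```

Each cumulative value is the running sum of the category probabilities, which is why plots of the *s variables stack up to 1 while plots of the *p variables can cross.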
Post-Estimation Approach to Interpret Non-Linear Regression Models
• (5) Use prtab command to construct a table of predicted values for all combinations of up to three variables. The syntax is:
prtab rowvar [colvar [supercolvar]] [if exp] [in range] [, by(superrowvar) x(variables_and_values) rest(stat) outcome(string) brief nobase nolabel novarlbl all]
Examples:
(a) probit faculty female fellow phd mcit3 mnas
prtab female fellow mnas
(b) ologit jobclass female fellow pub1 phd
prtab female fellow, x(phd=min)
(c) logit died female race age educ
prtab female race educ
Post-Estimation Approach to Interpret Non-Linear Regression Models
• (6) Use the mfx compute command, which numerically calculates the marginal effects or the elasticities and their standard errors after estimation. Exactly what mfx can calculate is determined by the previous estimation command and the predict() option. The points at which the marginal effects or elasticities are evaluated are determined by the at() option. By default, mfx calculates the marginal effects or elasticities at the means of the independent variables, using the default prediction option associated with the preceding estimation command.
Post-Estimation Approach to Interpret Non-Linear Regression Models
Examples
(a) logit foreign mpg price
mfx compute
mfx, at(mpg = 20, price = 6000)
mfx compute, predict(xb)
mfx replay, level(90)
(b) mlogit rep78 mpg displ, nolog
mfx compute, predict(outcome(1))
(c) regress mpg length weight
mfx compute, eyex
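To show what a numerically computed marginal effect and an elasticity look like, the Python sketch below differentiates the predicted probability from the logit of died on studytime, age, and drug (coefficients from earlier in this transcript) by central finite differences and compares the result with the analytic logit formula beta * p * (1 - p). The evaluation point stands in for the covariate means, which the transcript does not report, so its values are hypothetical. The step size and function names are also assumptions; this is not a description of mfx's internal algorithm.

```python
import math

def invlogit(xb):
    """Logistic function: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-xb))

# Coefficients copied from the logit output in this transcript.
COEF = {"_cons": -1.113136, "studytime": -0.0236468, "age": 0.0793438, "drug": -1.150009}

def predicted_probability(x):
    """P(died=1) at covariate values given as a dict keyed by variable name."""
    xb = COEF["_cons"] + sum(COEF[name] * value for name, value in x.items())
    return invlogit(xb)

def numerical_marginal_effect(x, name, h=1e-5):
    """Central finite-difference derivative of the probability with respect to one covariate."""
    up, down = dict(x), dict(x)
    up[name] += h
    down[name] -= h
    return (predicted_probability(up) - predicted_probability(down)) / (2.0 * h)

def analytic_marginal_effect(x, name):
    """Closed-form logit marginal effect: beta * p * (1 - p)."""
    p = predicted_probability(x)
    return COEF[name] * p * (1.0 - p)

def elasticity(x, name):
    """Proportional change in the probability per proportional change in x (dy/dx * x / y)."""
    return numerical_marginal_effect(x, name) * x[name] / predicted_probability(x)

# Hypothetical evaluation point standing in for the covariate means.
AT = {"studytime": 15.0, "age": 56.0, "drug": 1.0}

for name in AT:
    print(f"{name}: numerical dy/dx = {numerical_marginal_effect(AT, name):.6f}, "
          f"analytic = {analytic_marginal_effect(AT, name):.6f}, "
          f"elasticity = {elasticity(AT, name):.4f}")
```

Changing the values in AT plays the role of the at() option: the same coefficients yield different marginal effects at different evaluation points.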