Chapter 16 Qualitative and Limited Dependent Variable Models Adapted from Vera Tabakova’s notes ECON 4551 Econometrics II Memorial University of Newfoundland

Upload: jade-chandler

Post on 25-Dec-2015


Chapter 16

Qualitative and Limited Dependent Variable Models

Adapted from Vera Tabakova’s notes

ECON 4551 Econometrics II

Memorial University of Newfoundland


Chapter 16: Qualitative and Limited Dependent Variable Models

16.1 Models with Binary Dependent Variables

16.2 The Logit Model for Binary Choice

16.3 Multinomial Logit

16.4 Conditional Logit

16.5 Ordered Choice Models

16.6 Models for Count Data

16.7 Limited Dependent Variables

Slide 16-2, Principles of Econometrics, 3rd Edition


16.6 Models for Count Data

When the dependent variable in a regression model is a count of the number of occurrences of an event, the outcome variable is y = 0, 1, 2, 3, … These numbers are actual counts, and thus different from the ordinal numbers of the previous section. Examples include:

The number of trips to a physician a person makes during a year.

The number of fishing trips taken by a person during the previous year.

The number of children in a household.

The number of automobile accidents at a particular intersection during a month.

The number of televisions in a household.

The number of alcoholic drinks a college student takes in a week.


16.6 Models for Count Data

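If Y is a Poisson random variable, then its probability function is

$$f(y) = P(Y = y) = \frac{e^{-\lambda}\lambda^{y}}{y!}, \quad y = 0, 1, 2, \ldots \tag{16.27}$$

where $y! = y \times (y-1) \times (y-2) \times \cdots \times 1$.

The conditional mean is specified as

$$E(Y) = \lambda = \exp(\beta_1 + \beta_2 x) \tag{16.28}$$

Here $\lambda$ is the “rate”, which is also equal to the variance. This choice defines the Poisson regression model for count data.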
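A minimal Python sketch of (16.27) and (16.28) may help fix ideas. The function names and the coefficient values are assumptions made only for illustration; the check at the end confirms numerically that the mean and the variance both equal the rate.

```python
import math

def poisson_pmf(y, lam):
    """P(Y = y) for a Poisson random variable with rate lam, as in (16.27)."""
    return math.exp(-lam) * lam ** y / math.factorial(y)

def poisson_rate(beta1, beta2, x):
    """Conditional mean E(Y) = exp(beta1 + beta2 * x), as in (16.28)."""
    return math.exp(beta1 + beta2 * x)

# Hypothetical coefficients and regressor value, chosen only for illustration.
lam = poisson_rate(0.5, 0.1, 2.0)

# Sum over a long stretch of the support; the tail beyond 100 is negligible here.
support = range(0, 100)
mean = sum(y * poisson_pmf(y, lam) for y in support)
variance = sum((y - mean) ** 2 * poisson_pmf(y, lam) for y in support)

print(lam, mean, variance)  # the three numbers agree up to rounding error
```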


16.6.1 Maximum Likelihood Estimation

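If we observe 3 individuals, one who faces no event and two who face two events each, the likelihood and log-likelihood are

$$L(\beta_1, \beta_2) = P(Y = 0) \times P(Y = 2) \times P(Y = 2)$$

$$\ln L(\beta_1, \beta_2) = \ln P(Y = 0) + \ln P(Y = 2) + \ln P(Y = 2)$$

Each term has the form

$$\ln P(Y = y) = \ln\left[\frac{e^{-\lambda}\lambda^{y}}{y!}\right] = -\lambda + y \ln\lambda - \ln y! = -\exp(\beta_1 + \beta_2 x) + y(\beta_1 + \beta_2 x) - \ln y!$$

so that, for N observations,

$$\ln L(\beta_1, \beta_2) = \sum_{i=1}^{N} \left\{ -\exp(\beta_1 + \beta_2 x_i) + y_i(\beta_1 + \beta_2 x_i) - \ln y_i! \right\}$$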
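The log-likelihood above can be sketched in Python for the three-individual example (counts 0, 2 and 2). The regressor values and the trial coefficients are hypothetical, since the slide does not give them; the point is only that the closed-form sum matches the sum of log Poisson probabilities.

```python
import math

def poisson_loglik(beta1, beta2, xs, ys):
    """Sum over i of -exp(b1 + b2*x_i) + y_i*(b1 + b2*x_i) - ln(y_i!)."""
    total = 0.0
    for x, y in zip(xs, ys):
        index = beta1 + beta2 * x
        total += -math.exp(index) + y * index - math.lgamma(y + 1)  # lgamma(y+1) = ln(y!)
    return total

ys = [0, 2, 2]          # one individual with no event, two with two events each
xs = [1.0, 2.0, 3.0]    # hypothetical regressor values
b1, b2 = 0.0, 0.2       # hypothetical trial values of the coefficients

ll = poisson_loglik(b1, b2, xs, ys)

# Same quantity computed directly from the Poisson probabilities.
direct = 0.0
for x, y in zip(xs, ys):
    lam = math.exp(b1 + b2 * x)
    direct += math.log(math.exp(-lam) * lam ** y / math.factorial(y))

print(ll, direct)
```

A maximum likelihood routine would search over beta1 and beta2 for the values that make this sum as large as possible.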


16.6.2 Interpretation in the Poisson Regression Model

For someone with characteristics $x_0$, evaluating at the estimated coefficients,

$$E(y_0) = \lambda_0 = \exp(\beta_1 + \beta_2 x_0)$$

which is the expected number of occurrences observed, and

$$\Pr(Y = y) = \frac{\exp(-\lambda_0)\,\lambda_0^{y}}{y!}, \quad y = 0, 1, 2, \ldots$$

which is the predicted probability of a certain number y of events.


16.6.2 Interpretation in the Poisson Regression Model

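The marginal effect of $x_i$ on the expected count is

$$\frac{\partial E(y_i)}{\partial x_i} = \lambda_i \beta_2 \tag{16.29}$$

You may prefer to express this marginal effect as a %:

$$\frac{\%\Delta E(y_i)}{\Delta x_i} = 100\,\frac{\partial E(y_i)/E(y_i)}{\partial x_i} = 100\,\beta_2\,\%$$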


16.6.2 Interpretation in the Poisson Regression Model

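If there is a dummy involved, be careful. Remember that with

$$E(y_i) = \lambda_i = \exp(\beta_1 + \beta_2 x_i + \delta D_i)$$

we have

$$E(y_i \mid D_i = 0) = \exp(\beta_1 + \beta_2 x_i)$$

$$E(y_i \mid D_i = 1) = \exp(\beta_1 + \beta_2 x_i + \delta)$$

so the percentage change in the expected count is

$$100\left[\frac{\exp(\beta_1 + \beta_2 x_i + \delta) - \exp(\beta_1 + \beta_2 x_i)}{\exp(\beta_1 + \beta_2 x_i)}\right]\% = 100\left(e^{\delta} - 1\right)\%$$

which is identical to the effect of a dummy in the log-linear model we saw under OLS.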
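A short Python sketch of the two interpretation results: the marginal effect of a continuous regressor and the percentage effect of a dummy. The symbol delta for the dummy coefficient and all numerical values are assumptions for illustration only.

```python
import math

def expected_count(beta1, beta2, delta, x, d):
    """E(y) = exp(beta1 + beta2*x + delta*d) for regressor x and dummy d."""
    return math.exp(beta1 + beta2 * x + delta * d)

def marginal_effect(beta1, beta2, delta, x, d):
    """dE(y)/dx = lambda * beta2, as in (16.29)."""
    return expected_count(beta1, beta2, delta, x, d) * beta2

def dummy_effect_percent(delta):
    """Percentage change in E(y) when the dummy switches from 0 to 1."""
    return 100.0 * (math.exp(delta) - 1.0)

# Hypothetical coefficient values and evaluation point.
b1, b2, delta, x = 0.2, 0.05, 0.3, 4.0

e0 = expected_count(b1, b2, delta, x, 0)
e1 = expected_count(b1, b2, delta, x, 1)
pct_direct = 100.0 * (e1 - e0) / e0

# Finite-difference check of the marginal effect.
h = 1e-6
numeric = (expected_count(b1, b2, delta, x + h, 0) - e0) / h

print(pct_direct, dummy_effect_percent(delta))
print(numeric, marginal_effect(b1, b2, delta, x, 0))
```

Note that the dummy effect does not depend on x, while the marginal effect of x does, through the rate.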


Example on Olympic Medals

# Poisson Regression
open "c:\Program Files\gretl\data\poe\olympics.gdt"
smpl year = 88 --restrict
genr lpop = log(pop)
genr lgdp = log(gdp)
poisson medaltot const lpop lgdp
# marginal effect of (log) GDP for the median country
genr mft = exp($coeff(const)+$coeff(lpop)*median(lpop) \
  +$coeff(lgdp)*median(lgdp))*$coeff(lgdp)
# predicted number of medals for the median country
genr predicted_medals = exp($coeff(const)+$coeff(lpop)*median(lpop) \
  +$coeff(lgdp)*median(lgdp))

The prediction is 0.863 medals for a country with median GDP and population.


Extensions: overdispersion

Under a plain Poisson the mean of the count is assumed to be equal to the variance (equidispersion)

This will often not hold

Real life data are often overdispersed

For example:

• a few women will have many affairs and many women will have few
• a few travelers will make many trips to a park and many will make few
• etc.


Extensions: overdispersion

open C:\Users\rmartinezesp\aaa\bbbECONOMETRICS\Rober\4551\GROSMORNE.dta

. poisson persontrip Travelcost educat income, nolog

Poisson regression                     Number of obs =     919
                                       LR chi2(3)    =  671.71
                                       Prob > chi2   =  0.0000
Log likelihood = -2541.5165            Pseudo R2     =  0.1167

  persontrip |      Coef.  Std. Err.       z   P>|z|    [95% Conf. Interval]
  Travelcost |  -.9570718   .0435943  -21.95   0.000   -1.042515   -.8716285
      educat |  -.0206209   .0163568   -1.26   0.207   -.0526797    .0114379
      income |  -.0014578   .0004404   -3.31   0.001    -.002321   -.0005946
       _cons |   2.144476   .0688666   31.14   0.000      2.0095    2.279452

. poisson visits Travelcost educat income

Iteration 0: log likelihood = -1321.4696
Iteration 1: log likelihood = -1321.4665
Iteration 2: log likelihood = -1321.4665

Poisson regression                     Number of obs =     919
                                       LR chi2(3)    =   56.61
                                       Prob > chi2   =  0.0000
Log likelihood = -1321.4665            Pseudo R2     =  0.0210

      visits |      Coef.  Std. Err.       z   P>|z|    [95% Conf. Interval]
  Travelcost |  -.3299655   .0529402   -6.23   0.000   -.4337264   -.2262045
      educat |  -.0307667    .026493   -1.16   0.246   -.0826921    .0211587
      income |  -.0019933   .0007191   -2.77   0.006   -.0034027   -.0005839
       _cons |   .8765791   .1125493    7.79   0.000    .6559865    1.097172


Extensions: overdispersion

Summary statistics for the same data set:

    Variable |   Obs       Mean   Std. Dev.        Min       Max
  persontrip |   966   3.824017    6.264637          1        91
  Travelcost |   947   .7748112    .6820585   .0036767    7.8652
      income |   966   88.83793    41.94486         20       160
      educat |   938   4.144989    1.120433          1         6
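A rough first check for overdispersion compares the sample variance of the count with its sample mean. The sketch below uses the persontrip figures from the summary table; it is only suggestive, because equidispersion is a statement about the distribution conditional on the regressors, and this particular sample has no zeros.

```python
# Summary statistics for persontrip, taken from the table above.
mean_trips = 3.824017
sd_trips = 6.264637

variance_trips = sd_trips ** 2
ratio = variance_trips / mean_trips  # would be close to 1 under equidispersion

print(variance_trips, ratio)
```

A ratio far above one, as here, is a signal to test for overdispersion formally before trusting the Poisson standard errors.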



Extensions: negative binomial

Under a plain Poisson the mean of the count is assumed to be equal to the variance (equidispersion)

If the data are overdispersed, the Poisson will inflate your t-ratios, making you think that your model works better than it actually does

You can test for overdispersion, to check whether the problem is significant

Or use a Negative Binomial model instead (nbreg) or even a Generalised Negative Binomial (gnbreg), which will allow you to model the overdispersion parameter as a function of covariates of your choice


Extensions: negative binomial

. nbreg persontrip Travelcost educat income, nolog

Negative binomial regression           Number of obs =     919
                                       LR chi2(3)    =  236.04
Dispersion = mean                      Prob > chi2   =  0.0000
Log likelihood = -2038.1155            Pseudo R2     =  0.0547

  persontrip |      Coef.  Std. Err.       z   P>|z|    [95% Conf. Interval]
  Travelcost |  -.7135986   .0489137  -14.59   0.000   -.8094676   -.6177295
      educat |  -.0218888   .0248201   -0.88   0.378   -.0705353    .0267578
      income |  -.0014357   .0006578   -2.18   0.029   -.0027249   -.0001465
       _cons |   1.994577      .1037   19.23   0.000    1.791329    2.197826
    /lnalpha |  -1.190022   .0724583                   -1.332038   -1.048006
       alpha |   .3042145   .0220429                    .2639388    .3506361

Likelihood-ratio test of alpha=0: chibar2(01) = 1006.80 Prob>=chibar2 = 0.000


Extensions: excess zeros

Often the number of zeros in the sample cannot be accommodated properly by a Poisson or Negative Binomial model

These models would underpredict the zeros

There is said to be an “excess zeros” problem

You can then use hurdle models or zero inflated or zero augmented models to accommodate the extra zeros


Extensions: excess zeros

nbvargr is a very useful command: it graphs the observed proportions against the Poisson and negative binomial probabilities

[Figure: observed proportion, negative binomial probability and Poisson probability plotted against k = 0 to 10; mean = 3.296; overdispersion = 5.439]


Extensions: excess zeros

Hurdle and zero-inflated models will also allow you to have one process driving whether the value is zero or strictly positive and a different process driving the value of the strictly positive count

EXAMPLES:
• Number of extramarital affairs versus gender
• Number of children before marriage versus religiosity

In the continuous case we have similar models (e.g. Cragg’s Model); an example is the size of insurance claims from fires versus the age of the building


Extensions: excess zeros


Hurdle Models

A hurdle model is a modified count model in which there are two processes, one generating the zeros and one generating the positive values. The two models are not constrained to be the same. In the hurdle model a binomial probability model governs the binary outcome of whether a count variable has a zero or a positive value. If the value is positive, the "hurdle is crossed," and the conditional distribution of the positive values is governed by a zero-truncated count model.

Example: smokers versus non-smokers, if you are a smoker you will smoke!
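The two-part structure of a hurdle model can be sketched as a probability function. This is a minimal illustration with a Poisson for the positive counts; p_zero stands for the probability of a zero from the binary model, and both it and the rate are hypothetical numbers, not estimates.

```python
import math

def poisson_pmf(y, lam):
    """Plain Poisson probability of y events with rate lam."""
    return math.exp(-lam) * lam ** y / math.factorial(y)

def hurdle_poisson_pmf(y, p_zero, lam):
    """Zeros come from the binary part; positives from a zero-truncated Poisson."""
    if y == 0:
        return p_zero
    truncated = poisson_pmf(y, lam) / (1.0 - math.exp(-lam))
    return (1.0 - p_zero) * truncated

# Hypothetical values: 70% zeros, and a rate of 2 governing the positive counts.
p_zero, lam = 0.7, 2.0
total = sum(hurdle_poisson_pmf(y, p_zero, lam) for y in range(0, 100))

print(hurdle_poisson_pmf(0, p_zero, lam), poisson_pmf(0, lam), total)
```

In an estimated model p_zero would come from a logit or probit on one set of coefficients and lam from the truncated count equation on another, which is why the two processes need not agree.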


Extensions: excess zeros

Hurdle Models

In Stata, Joseph Hilbe’s downloadable ado-file HPLOGIT will work, although it does not allow for two different sets of variables, just two different sets of coefficients


Extensions: excess zeros


Zero-inflated models (initially suggested by D. Lambert) attempt to account for excess zeros in a subtly different way.

In this model there are two kinds of zeros, "true zeros" and excess zeros.

Zero-inflated models also estimate two equations, one for the count model and one for the excess zeros.

The key difference is that the count model allows zeros now. It is not a truncated count model, but allows for “corner solutions”

Example: meat eaters (who sometimes just did not eat meat that week) versus vegetarians who never ever do
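A zero-inflated Poisson mixes a point mass at zero with an ordinary Poisson that can itself produce zeros. The sketch below writes out that probability function; pi (the share of "always zero" individuals, the vegetarians in the example) and the rate are hypothetical values.

```python
import math

def poisson_pmf(y, lam):
    """Plain Poisson probability of y events with rate lam."""
    return math.exp(-lam) * lam ** y / math.factorial(y)

def zip_pmf(y, pi, lam):
    """Zero-inflated Poisson: excess zeros with probability pi, Poisson otherwise."""
    if y == 0:
        return pi + (1.0 - pi) * math.exp(-lam)  # excess zeros plus "true" Poisson zeros
    return (1.0 - pi) * poisson_pmf(y, lam)

# Hypothetical values: 40% never consume, the rest follow a Poisson with rate 1.5.
pi, lam = 0.4, 1.5
support = range(0, 100)
total = sum(zip_pmf(y, pi, lam) for y in support)
mean = sum(y * zip_pmf(y, pi, lam) for y in support)

print(zip_pmf(0, pi, lam), poisson_pmf(0, lam), mean)
```

Unlike the hurdle model, the count part is not truncated: a zero can arise from either regime.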


Extensions: excess zeros

webuse fish

We want to model how many fish are being caught by fishermen at a state park.

Visitors are asked how long they stayed, how many people were in the group, whether there were children in the group and how many fish were caught.

Some visitors do not fish at all, but there is no data on whether a person fished or not.

Some visitors who did fish did not catch any fish (and admitted it), so there are excess zeros in the data because of the people who did not fish.


Extensions: excess zeros

. histogram count, discrete freq

[Figure: histogram of count, frequency against count (0 to 150)]

Lots of zeros!


Extensions: excess zeros (sample restricted to count<29)

. histogram count, discrete freq

[Figure: density of count (0 to 25) with a fitted gamma(0.21421, 8.5873) curve]

Lots of zeros!

Test statistic for gamma: z = -1.384, p-value = 0.16642


Extensions: excess zeros, number of affairs (Fair 1978)

. histogram count, discrete freq

Lots of zeros!

We will showcase zero-inflated models using STATA now…

LIMDEP has an extra option to run this from the Poisson or Negative Binomial dialogs

You would need to program it in GRETL using its maximum likelihood routines (there is a ZIP example in the PDF user’s guide)


Extensions: excess zeros (greene22_2.gdt)

genr ANYAFFAIRS = ( Y>0)

. zip naffairs age male relig , inflate( age male relig ) vuong nolog

Zero-inflated Poisson regression       Number of obs =     601
                                       Nonzero obs   =     150
                                       Zero obs      =     451
Inflation model = logit                LR chi2(3)    =   29.67
Log likelihood = -810.055              Prob > chi2   =  0.0000

    naffairs |      Coef.  Std. Err.       z   P>|z|    [95% Conf. Interval]
naffairs     |
         age |    .015609   .0038029    4.10   0.000    .0081555    .0230625
        male |  -.1598035   .0686006   -2.33   0.020   -.2942583   -.0253487
       relig |  -.0971114   .0292688   -3.32   0.001   -.1544772   -.0397456
       _cons |   1.581638   .1577305   10.03   0.000    1.272492    1.890784
inflate      |
         age |   -.019041   .0104841   -1.82   0.069   -.0395895    .0015075
        male |  -.1791471   .1948003   -0.92   0.358   -.5609488    .2026546
       relig |   .2884574   .0841492    3.43   0.001    .1235281    .4533867
       _cons |   .9322364   .3901503    2.39   0.017    .1675558    1.696917

Vuong test of zip vs. standard Poisson: z = 11.66 Pr>z = 0.0000


Extensions: excess zeros

. zinb naffairs age male relig , inflate( age male relig ) vuong nolog

Zero-inflated negative binomial regression   Number of obs =     601
                                             Nonzero obs   =     150
                                             Zero obs      =     451
Inflation model = logit                      LR chi2(3)    =    8.92
Log likelihood = -726.405                    Prob > chi2   =  0.0304

    naffairs |      Coef.  Std. Err.       z   P>|z|    [95% Conf. Interval]
naffairs     |
         age |   .0258188   .0107692    2.40   0.017    .0047115     .046926
        male |  -.2214886   .1660362   -1.33   0.182   -.5469135    .1039364
       relig |  -.1472717   .0749567   -1.96   0.049   -.2941842   -.0003593
       _cons |   1.273196   .3874106    3.29   0.001    .5138849    2.032506
inflate      |
         age |   -.014892   .0113465   -1.31   0.189   -.0371308    .0073468
        male |  -.2309299   .2091759   -1.10   0.270   -.6409071    .1790474
       relig |    .274744   .0904315    3.04   0.002    .0975014    .4519865
       _cons |   .6673066    .433002    1.54   0.123   -.1813618    1.515975
    /lnalpha |  -.2743069   .2532933   -1.08   0.279   -.7707527    .2221388
       alpha |   .7600988   .1925279                    .4626647    1.248745

Vuong test of zinb vs. standard negative binomial: z = 2.82 Pr>z = 0.0024


Extensions: truncation

• Count data can be truncated too (usually at zero)

• The commands ztp and ztnb can accommodate that

• Example: you interview visitors at the recreational site, so they all made at least that one trip

• In the continuous case we would have to use the truncreg command
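To see why truncation matters, note that under a zero-truncated Poisson the mean of the observed (strictly positive) counts is lambda / (1 - exp(-lambda)), which is above lambda. The sketch below checks this numerically for a hypothetical rate.

```python
import math

def zero_truncated_pmf(y, lam):
    """P(Y = y | Y > 0) for a Poisson with rate lam."""
    if y < 1:
        return 0.0
    plain = math.exp(-lam) * lam ** y / math.factorial(y)
    return plain / (1.0 - math.exp(-lam))

def truncated_mean(lam):
    """E(Y | Y > 0) = lam / (1 - exp(-lam))."""
    return lam / (1.0 - math.exp(-lam))

lam = 1.5  # hypothetical rate
numeric_mean = sum(y * zero_truncated_pmf(y, lam) for y in range(1, 100))

print(lam, truncated_mean(lam), numeric_mean)
```

Fitting a plain Poisson to such a sample treats the inflated observed mean as if it were lambda, which is the source of the bias that the truncated estimators correct.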


Extensions: truncation

This model works much better and showcases the bias in the previous estimates:

. ztp persontrip Travelcost educat income, nolog

Zero-truncated Poisson regression      Number of obs =     919
                                       LR chi2(3)    =  885.68
                                       Prob > chi2   =  0.0000
Log likelihood = -2412.6552            Pseudo R2     =  0.1551

  persontrip |      Coef.  Std. Err.       z   P>|z|    [95% Conf. Interval]
  Travelcost |  -1.380461   .0571736  -24.15   0.000   -1.492519   -1.268403
      educat |  -.0170332   .0175026   -0.97   0.330   -.0513376    .0172712
      income |  -.0013521    .000473   -2.86   0.004   -.0022791   -.0004251
       _cons |   2.278878   .0728394   31.29   0.000    2.136116    2.421641

The estimated Consumer Surplus is now smaller


Extensions: truncation

• Now accounting for overdispersion

This model works much better and showcases the bias in the previous estimates:

. ztnb persontrip Travelcost educat income, nolog

Zero-truncated negative binomial regression  Number of obs =     919
                                             LR chi2(3)    =  263.89
Dispersion = mean                            Prob > chi2   =  0.0000
Log likelihood = -1866.326                   Pseudo R2     =  0.0660

  persontrip |      Coef.  Std. Err.       z   P>|z|    [95% Conf. Interval]
  Travelcost |  -1.079011    .068793  -15.68   0.000   -1.213843   -.9441795
      educat |  -.0216377   .0322941   -0.67   0.503    -.084933    .0416576
      income |  -.0016369   .0008563   -1.91   0.056   -.0033152    .0000413
       _cons |   2.015503   .1344308   14.99   0.000    1.752024    2.278983
    /lnalpha |  -.6368613    .101849                   -.8364818   -.4372409
       alpha |     .52895    .053873                     .433232    .6458158

Likelihood-ratio test of alpha=0: chibar2(01) = 1092.66 Prob>=chibar2 = 0.000


Extensions: truncation and endogenous stratification

• Example: you interview visitors at the recreational site, so they all made at least that one trip

• You interview patients at the doctors’ office about how often they visit the doctor

• You ask people on George St. how often they go to George St…

• Then you are oversampling “frequent visitors” and biasing your estimates, perhaps substantially


Extensions: truncation and endogenous stratification

• It turns out to be super easy to deal with a Truncated and Endogenously Stratified Poisson Model (as shown by Shaw, 1988):

Simply run a plain Poisson on “Count-1” and that will work (in STATA: poisson on the corrected count)

It is more complex if there is overdispersion though
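One way to see why the “Count-1” trick works is the usual size-biased sampling argument, stated here as an assumption about how the on-site sample arises: if the chance of being intercepted is proportional to the number of trips y, the on-site probability is y f(y) / lambda, and for a Poisson f this equals the plain Poisson probability of y - 1. The sketch checks the identity numerically for a hypothetical rate.

```python
import math

def poisson_pmf(y, lam):
    """Plain Poisson probability of y events with rate lam."""
    return math.exp(-lam) * lam ** y / math.factorial(y)

def onsite_pmf(y, lam):
    """Probability of y trips in an on-site sample, y >= 1 (size-biased Poisson)."""
    return y * poisson_pmf(y, lam) / lam

lam = 2.5  # hypothetical rate
max_gap = max(abs(onsite_pmf(y, lam) - poisson_pmf(y - 1, lam)) for y in range(1, 30))

for y in range(1, 6):
    print(y, onsite_pmf(y, lam), poisson_pmf(y - 1, lam))
```

Because the two columns coincide, a plain Poisson fitted to the count minus one recovers the population rate.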


Extensions: truncation and endogenous stratification

• Supereasy to deal with a Truncated and Endogenously Stratified Poisson Model

. poisson persontripminusone Travelcost educat income, nolog

Poisson regression                     Number of obs =     919
                                       LR chi2(3)    = 1071.95
                                       Prob > chi2   =  0.0000
Log likelihood = -2474.3262            Pseudo R2     =  0.1780

persontrip~e |      Coef.  Std. Err.       z   P>|z|    [95% Conf. Interval]
  Travelcost |  -1.657986   .0620722  -26.71   0.000   -1.779646   -1.536327
      educat |  -.0202144   .0191574   -1.06   0.291   -.0577622    .0173333
      income |  -.0016285   .0005184   -3.14   0.002   -.0026446   -.0006124
       _cons |   2.191885   .0792934   27.64   0.000    2.036473    2.347298

The estimated Consumer Surplus is now much smaller


Extensions: truncation and endogenous stratification

• Endogenously Stratified Negative Binomial Model (as shown by Shaw, 1988; Englin and Shonkwiler, 1995):

. nbstrat persontrip Travelcost educat income, nolog

Negative Binomial with Endogenous Stratification   Number of obs =     919
                                                   Wald chi2(3)  =  283.49
Log likelihood = -1837.3183                        Prob > chi2   =  0.0000

  persontrip |      Coef.  Std. Err.       z   P>|z|    [95% Conf. Interval]
  Travelcost |  -1.152915   .0695958  -16.57   0.000   -1.289321    -1.01651
      educat |  -.0229483   .0318753   -0.72   0.472   -.0854228    .0395261
      income |  -.0017368   .0008447   -2.06   0.040   -.0033923   -.0000813
       _cons |   1.189429   .1561017    7.62   0.000    .8834757    1.495383
    /lnalpha |    .092944   .1482435    0.63   0.531    -.197608    .3834959
       alpha |     1.0974   .1626825                    .8206915    1.467406

AIC Statistic = 4.007       BIC Statistic = -6243.307
Deviance      = 0.000       Dispersion    = 0.000

Even after accounting for overdispersion, the CS estimate is relatively low


Extensions: truncation and endogenous stratification

• How do we calculate the pseudo-R2 for this model???



Extensions: truncation and endogenous stratification

• GNBSTRAT will also allow you to model the overdispersion parameter in this case, just as gnbreg did for the plain case


NOTE: what is the exposure

• Count models often need to deal with the fact that the counts may be measured over different observation periods, which might be of different length (in terms of time or some other relevant dimension)

For example, the number of accidents is recorded for 50 different intersections. However, the number of vehicles that pass through the intersections can vary greatly. Five accidents for 30,000 vehicles is very different from five accidents for 1,500 vehicles.

Count models account for these differences by including the log of the exposure variable in the model with its coefficient constrained to be one.

The use of exposure is often superior to analyzing rates as response variables as such, because it makes use of the correct probability distributions
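The role of exposure can be sketched as follows: adding log(exposure) with a coefficient of one makes the expected count proportional to exposure, so the regression coefficients describe the rate per unit of exposure. The coefficient values are hypothetical; the traffic volumes are the ones from the intersection example.

```python
import math

def expected_count(exposure, beta1, beta2, x):
    """E(y) = exp(ln(exposure) + beta1 + beta2*x) = exposure * exp(beta1 + beta2*x)."""
    return math.exp(math.log(exposure) + beta1 + beta2 * x)

# Raw accident rates implied by the example in the text.
rate_busy = 5 / 30000
rate_quiet = 5 / 1500

# Hypothetical coefficients and regressor value, identical for both intersections.
beta1, beta2, x = -8.0, 0.1, 3.0
busy = expected_count(30000, beta1, beta2, x)
quiet = expected_count(1500, beta1, beta2, x)

print(rate_quiet / rate_busy)   # the same five accidents imply very different rates
print(busy / quiet)             # same rate, so expected counts scale with exposure
```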


16.7 Limited Dependent Variables

16.7.1 Censored Data

Figure 16.3 Histogram of Wife’s Hours of Work in 1975


16.7.1 Censored Data

Having censored data means that a substantial fraction of the observations on the dependent variable take a limit value. The regression function is no longer given by (16.30).

$$E(y \mid x) = \beta_1 + \beta_2 x \tag{16.30}$$

The least squares estimators of the regression parameters obtained by running a regression of y on x are biased and inconsistent: least squares estimation fails.


Censoring versus Truncation

With truncation, we only observe the value of the regressors when the dependent variable takes a certain value (usually a positive one instead of zero)

With censoring we observe in principle the value of the regressors for everyone, but not the value of the dependent variable for those whose dependent variable takes a value beyond the limit


16.7.2 A Monte Carlo Experiment

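We give the parameters the specific values $\beta_1 = -9$ and $\beta_2 = 1$, so that the latent variable is

$$y_i^* = \beta_1 + \beta_2 x_i + e_i = -9 + x_i + e_i \tag{16.31}$$

Assume $e_i \sim N(0, \sigma^2 = 16)$.

The observed variable is

$$y_i = \begin{cases} 0 & \text{if } y_i^* \le 0; \\ y_i^* & \text{if } y_i^* > 0. \end{cases}$$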


16.7.2 A Monte Carlo Experiment

• Create N = 200 random values of $x_i$ that are spread evenly (or uniformly) over the interval [0, 20]. These we will keep fixed in further simulations.

• Obtain N = 200 random values $e_i$ from a normal distribution with mean 0 and variance 16.

• Create N = 200 values of the latent variable.

• Obtain N = 200 values of the observed $y_i$ using

$$y_i = \begin{cases} 0 & \text{if } y_i^* \le 0 \\ y_i^* & \text{if } y_i^* > 0 \end{cases}$$
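The steps above can be sketched in Python for a single sample. The seed and the small OLS helper are choices made for this illustration; the design itself (intercept -9, slope 1, error standard deviation 4, N = 200) follows the text. The estimates will differ from those reported in the slides, which use a different random draw.

```python
import random

def ols(xs, ys):
    """Simple-regression OLS: returns (intercept, slope)."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return ybar - slope * xbar, slope

random.seed(12345)  # arbitrary seed, only to make the run reproducible
N = 200
xs = [random.uniform(0, 20) for _ in range(N)]        # regressor, uniform on [0, 20]
es = [random.gauss(0, 4) for _ in range(N)]           # errors with variance 16
y_star = [-9 + x + e for x, e in zip(xs, es)]         # latent variable
y = [max(0.0, v) for v in y_star]                     # censored at zero

b1_latent, b2_latent = ols(xs, y_star)
b1_cens, b2_cens = ols(xs, y)

print(b1_latent, b2_latent)  # close to the true values -9 and 1
print(b1_cens, b2_cens)      # slope pulled toward zero by the censoring
```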


16.7.2 A Monte Carlo Experiment

Figure 16.4 Uncensored Sample Data and Regression Function

16.7.2 A Monte Carlo Experiment

Figure 16.5 Censored Sample Data, and Latent Regression Function and Least Squares Fitted Line


16.7.2 A Monte Carlo Experiment

OLS for all the 200 observations predicts:

$$\hat{y}_i = -2.1477 + .5161\,x_i \tag{16.32a}$$
(se) (.3706) (.0326)

OLS for only the 100 positive observations (y > 0) predicts:

$$\hat{y}_i = -3.1399 + .6388\,x_i \tag{16.32b}$$
(se) (1.2055) (.0827)

Our Monte Carlo experiment resamples 200 times and averages the estimates across samples:

$$E_{MC}(b_k) = \frac{1}{NSAM}\sum_{m=1}^{NSAM} b_{k(m)} \tag{16.33}$$


16.7.3 Maximum Likelihood Estimation

The maximum likelihood procedure is called Tobit in honor of James Tobin, winner of the 1981 Nobel Prize in Economics, who first studied this model.

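The probit probability that yi = 0 is:

$$P(y_i = 0) = P[y_i^* \le 0] = 1 - \Phi\!\left[\frac{\beta_1 + \beta_2 x_i}{\sigma}\right]$$

The likelihood function is:

$$L(\beta_1, \beta_2, \sigma) = \prod_{y_i = 0}\left\{1 - \Phi\!\left(\frac{\beta_1 + \beta_2 x_i}{\sigma}\right)\right\} \times \prod_{y_i > 0}\left\{\left(2\pi\sigma^2\right)^{-1/2}\exp\!\left(-\frac{1}{2\sigma^2}\left(y_i - \beta_1 - \beta_2 x_i\right)^2\right)\right\}$$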
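As a rough illustration of how the two products in the Tobit likelihood combine, the following Python sketch evaluates its logarithm for data censored from below at zero. The function name `tobit_loglik` is hypothetical, and the sketch only evaluates the function; it does not maximize it.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for Phi


def tobit_loglik(beta1, beta2, sigma, x, y):
    """Log of the Tobit likelihood for y censored from below at zero."""
    total = 0.0
    for xi, yi in zip(x, y):
        index = beta1 + beta2 * xi
        if yi > 0:
            # uncensored observation: log of the normal density of y_i
            total += (-0.5 * math.log(2 * math.pi * sigma ** 2)
                      - (yi - index) ** 2 / (2 * sigma ** 2))
        else:
            # censored observation: P(y_i = 0) = 1 - Phi(index/sigma) = Phi(-index/sigma)
            total += math.log(STD_NORMAL.cdf(-index / sigma))
    return total
```

Writing the censoring probability as Φ(-index/σ) rather than 1 - Φ(index/σ) is a small numerical choice that avoids subtracting two numbers close to 1. A maximum likelihood routine would search over beta1, beta2 and sigma for the values that make this function as large as possible.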

The maximum likelihood estimator is consistent and asymptotically normal, with a known covariance matrix.

Using the artificial data, the fitted values are:

$$\begin{gathered}\tilde{y}_i = -10.2773 + 1.0487\,x_i \\ (\text{se})\quad (1.0970)\quad (.0790)\end{gathered} \tag{16.34}$$

You can run this experiment yourselves in GRETL:

open "c:\Program Files\gretl\data\poe\tobit.gdt"
smpl 1 200
genr xs = 20*uniform()
loop 1000 --progressive
    genr y = -9 + 1*xs + 4*normal()
    genr yi = y > 0
    # which is a handy command to generate dummies!
    genr yc = y*yi
    ols yc const xs --quiet
    genr b1s = $coeff(const)
    genr b2s = $coeff(xs)
    store coeffs.gdt b1s b2s
endloop

You can repeat the experiment using only the positive values:

open "c:\Program Files\gretl\data\poe\tobit.gdt"
genr xs = 20*uniform()
genr idx = 1
matrix A = zeros(1000,3)
loop 1000 --quiet
    smpl --full
    genr y = -9 + 1*xs + 4*normal()
    smpl y > 0 --restrict
    ols y const xs --quiet
    genr b1s = $coeff(const)
    genr b2s = $coeff(xs)
    matrix A[idx,1]=idx
    matrix A[idx,2]=b1s
    matrix A[idx,3]=b2s
    genr idx = idx + 1
endloop
# The matrix A contains all 1000 sets of coefficients
# bb finds the column mean of A
matrix bb = meanc(A)
bb

Note that the first cell of bb is the average of the “case” number (500.5, since there are 1000 cases numbered 1 to 1000).

16.7.4 Tobit Model Interpretation

The marginal effect of x on the observed y is the latent-model slope scaled by a normal cdf:

$$\frac{\partial E(y \mid x)}{\partial x} = \beta_2\,\Phi\!\left(\frac{\beta_1 + \beta_2 x}{\sigma}\right) \tag{16.35}$$

Because the cdf values are positive, the sign of the coefficient does tell the direction of the marginal effect, just not its magnitude. If β2 > 0, as x increases the cdf function approaches 1, and the slope of the regression function approaches that of the latent variable model.
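To see how the scale factor in (16.35) behaves, here is a minimal Python sketch. It plugs in the parameter values of the Monte Carlo design used in this section (β1 = -9, β2 = 1, σ = 4); the function name `tobit_marginal_effect` and the three x values are illustrative choices.

```python
from statistics import NormalDist


def tobit_marginal_effect(x, beta1, beta2, sigma):
    """dE(y|x)/dx = beta2 * Phi((beta1 + beta2*x)/sigma), as in (16.35)."""
    return beta2 * NormalDist().cdf((beta1 + beta2 * x) / sigma)


if __name__ == "__main__":
    # Low, middle and high values of x in the simulated design
    for x in (0, 9, 20):
        effect = tobit_marginal_effect(x, beta1=-9, beta2=1, sigma=4)
        print(f"x = {x:2d}: marginal effect = {effect:.4f}")
```

At x = 9 the index is zero, so the marginal effect is exactly half of β2; for large x it approaches β2 itself.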

Figure 16.6 Censored Sample Data, and Regression Functions for Observed and Positive y values

16.7.5 An Example

$$HOURS = \beta_1 + \beta_2\,EDUC + \beta_3\,EXPER + \beta_4\,AGE + \beta_5\,KIDSL6 + e \tag{16.36}$$

$$\frac{\partial E(HOURS)}{\partial EDUC} = \tilde{\beta}_2\,\tilde{\Phi} = 73.29 \times .3638 = 26.34$$

Tobit example in GRETL

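#Tobit
open "c:\Program Files\gretl\data\poe\mroz.gdt"
tobit hours const educ exper age kidsl6
# fitted index at the sample means, with kidsl6 set to 1
genr H_hat = $coeff(const)+$coeff(educ)*mean(educ) +$coeff(exper)*mean(exper) \
 +$coeff(age)*mean(age)+$coeff(kidsl6)*1
# cdf scale factor for the marginal effect
genr z = cnorm(H_hat/$sigma)
# marginal effect of educ on expected hours
genr pred = z*$coeff(educ)
# OLS using only the women with positive hours
smpl hours > 0 --restrict
ols hours const educ exper age kidsl6
# OLS using the full sample
smpl --full
ols hours const educ exper age kidsl6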

16.7.6 Sample Selection

Problem: our sample is not a random sample. The data we observe are “selected” by a systematic process for which we do not account.

Solution: a technique called Heckit, named after its developer (not original author), Nobel Prize winning econometrician James Heckman.

16.7.6a The Econometric Model

The econometric model describing the situation is composed of two equations. The first is the selection equation, which determines whether the variable of interest is observed:

$$z_i^* = \gamma_1 + \gamma_2 w_i + u_i, \qquad i = 1, \ldots, N \tag{16.37}$$

$$z_i = \begin{cases} 1 & z_i^* > 0 \\ 0 & \text{otherwise} \end{cases} \tag{16.38}$$

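The second equation is the linear model of interest. It is

$$y_i = \beta_1 + \beta_2 x_i + e_i, \qquad i = 1, \ldots, n, \quad N > n \tag{16.39}$$

The conditional mean of y for the selected observations is

$$E[y_i \mid z_i^* > 0] = \beta_1 + \beta_2 x_i + \beta_\lambda \lambda_i, \qquad i = 1, \ldots, n \tag{16.40}$$

where the “Inverse Mills Ratio” is

$$\lambda_i = \frac{\phi(\gamma_1 + \gamma_2 w_i)}{\Phi(\gamma_1 + \gamma_2 w_i)} \tag{16.41}$$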
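A small simulation can make the selected-sample mean in (16.40) concrete. The sketch below is an illustration under stated assumptions, not part of the original slides: the errors (u, e) are taken to be bivariate normal with var(u) = 1, correlation rho and sd(e) = sigma_e, in which case the coefficient on the Inverse Mills Ratio is rho × sigma_e. The function names and all numerical values are hypothetical.

```python
import random
from statistics import NormalDist, fmean

N01 = NormalDist()  # standard normal distribution


def inverse_mills(index):
    """lambda = phi(index) / Phi(index), as in (16.41)."""
    return N01.pdf(index) / N01.cdf(index)


def selected_mean(index, beta1, beta2, x, rho, sigma_e, draws=200_000, seed=1):
    """Simulate E[y | z* > 0] when z* = index + u and y = beta1 + beta2*x + e."""
    rng = random.Random(seed)
    kept = []
    for _ in range(draws):
        u = rng.gauss(0, 1)
        # e is built so that corr(u, e) = rho and sd(e) = sigma_e
        e = sigma_e * (rho * u + (1 - rho ** 2) ** 0.5 * rng.gauss(0, 1))
        if index + u > 0:  # y is observed only when z* > 0
            kept.append(beta1 + beta2 * x + e)
    return fmean(kept)


if __name__ == "__main__":
    simulated = selected_mean(0.5, 1.0, 2.0, 3.0, rho=0.6, sigma_e=2.0)
    formula = 1.0 + 2.0 * 3.0 + 0.6 * 2.0 * inverse_mills(0.5)
    print(f"simulated = {simulated:.3f}, formula = {formula:.3f}")
```

The two numbers should agree up to simulation noise, and both exceed β1 + β2x whenever rho is positive, which is why least squares on the selected sample alone is biased.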

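The estimated “Inverse Mills Ratio” is

$$\tilde{\lambda}_i = \frac{\phi(\tilde{\gamma}_1 + \tilde{\gamma}_2 w_i)}{\Phi(\tilde{\gamma}_1 + \tilde{\gamma}_2 w_i)}$$

The estimating equation is

$$y_i = \beta_1 + \beta_2 x_i + \beta_\lambda \tilde{\lambda}_i + v_i, \qquad i = 1, \ldots, n \tag{16.42}$$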

16.7.6b Heckit Example: Wages of Married Women

OLS yields:

$$\begin{gathered}\ln(WAGE) = -.4002 + .1095\,EDUC + .0157\,EXPER \qquad R^2 = .1484 \\ (t\text{-stat})\quad (-2.10)\quad (7.73)\quad (3.90)\end{gathered} \tag{16.43}$$

Probit on dummy indicating “being in the labour force” yields:

$$\begin{gathered}\hat{P}(LFP = 1) = \Phi(1.1923 - .0206\,AGE + .0838\,EDUC - .3139\,KIDS - 1.3939\,MTR) \\ (t\text{-stat})\quad (-2.93)\quad (3.61)\quad (-2.54)\quad (-2.26)\end{gathered}$$

From here predict the inverse Mills ratio:

$$\tilde{\lambda} = IMR = \frac{\phi(1.1923 - .0206\,AGE + .0838\,EDUC - .3139\,KIDS - 1.3939\,MTR)}{\Phi(1.1923 - .0206\,AGE + .0838\,EDUC - .3139\,KIDS - 1.3939\,MTR)}$$
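As a worked illustration of this step, the Python sketch below evaluates the probit index, the participation probability and the inverse Mills ratio for one person. The coefficients are the probit estimates reported above, with the signs implied by the reported t-statistics; the woman's characteristics are hypothetical values chosen only for the example, and the helper names are not from the original slides.

```python
from statistics import NormalDist

N01 = NormalDist()  # standard normal distribution

# Probit estimates from the labour force participation equation
PROBIT = {"const": 1.1923, "AGE": -0.0206, "EDUC": 0.0838,
          "KIDS": -0.3139, "MTR": -1.3939}


def probit_index(person):
    """Linear index: constant plus coefficient times each characteristic."""
    return PROBIT["const"] + sum(PROBIT[name] * value
                                 for name, value in person.items())


def inverse_mills(index):
    """IMR = phi(index) / Phi(index)."""
    return N01.pdf(index) / N01.cdf(index)


# Hypothetical woman, for illustration only (not an observation from the data)
woman = {"AGE": 40, "EDUC": 12, "KIDS": 1, "MTR": 0.7}

if __name__ == "__main__":
    index = probit_index(woman)
    print(f"index    = {index:.4f}")
    print(f"P(LFP=1) = {N01.cdf(index):.4f}")
    print(f"IMR      = {inverse_mills(index):.4f}")
```

In the two-step procedure this calculation is repeated for every woman in the sample, and the resulting IMR series is added as a regressor in the wage equation.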

The two-step (Heckit) estimate of the wage equation, including the IMR as a regressor, is

$$\begin{gathered}\ln(WAGE) = .8105 + .0585\,EDUC + .0163\,EXPER - .8664\,IMR \\ (t\text{-stat})\quad (1.64)\quad (2.45)\quad (4.08)\quad (-2.65) \\ (t\text{-stat-adj})\quad (1.33)\quad (1.97)\quad (3.88)\quad (-2.17)\end{gathered} \tag{16.44}$$

The maximum likelihood estimated wage equation is

$$\begin{gathered}\ln(WAGE) = .6686 + .0658\,EDUC + .0118\,EXPER \\ (t\text{-stat})\quad (2.84)\quad (3.96)\quad (2.87)\end{gathered}$$

The standard errors based on the full information maximum likelihood procedure are smaller than those yielded by the two-step estimation method.

Heckit example in GRETL

#Heckit
open "c:\Program Files\gretl\data\poe\mroz.gdt"
# dummy equal to 1 if the household has any children
genr kids = (kidsl6+kids618>0)
logs wage
list X = const educ exper
list W = const mtr age kids educ
probit lfp W
# probit index for each observation
genr ind = $coeff(const) + $coeff(age)*age + $coeff(educ)*educ + $coeff(kids)*kids + $coeff(mtr)*mtr
#Predict the inverse Mills ratio:
genr lambda = dnorm(ind)/cnorm(ind)
# second step by hand: OLS of log wage on X and the inverse Mills ratio
ols l_wage X lambda
# built-in two-step Heckit estimator
heckit l_wage X ; lfp W --two-step

Keywords

binary choice models
censored data
conditional logit
count data models
feasible generalized least squares
Heckit
identification problem
independence of irrelevant alternatives (IIA)
index models
individual and alternative specific variables
individual specific variables
latent variables
likelihood function
limited dependent variables
linear probability model
logistic random variable
logit
log-likelihood function
marginal effect
maximum likelihood estimation
multinomial choice models
multinomial logit
odds ratio
ordered choice models
ordered probit
ordinal variables
Poisson random variable
Poisson regression model
probit
selection bias
tobit model
truncated data

Further models

Survival analysis (time-to-event data analysis)

Multivariate probit (biprobit, triprobit, mvprobit)

References

Hoffmann, 2004, for all topics

Long, S. and J. Freese, for all topics

Cameron and Trivedi’s book, for count data

Agresti, A. (2001) Categorical Data Analysis (2nd ed). New York: Wiley.