Research Method
Lecture 15
Tobit model for corner solution
The corner solution responses
Corner solution response examples:
1. Amount of charitable donation: Many people do not donate. Thus, a significant fraction of the data has zero value.
2. Hours worked by married women: Many married women do not work. Thus, a significant fraction of the data has zero hours worked.
The Tobit model is used to model such situations.
The model
For explanatory purposes, I use a model with one explanatory variable. This can be extended, of course, to cases with multiple variables.
Consider estimating the effect of education, x, on married women's hours worked, y.
In the Tobit model, we start with a latent variable y*, which is only partially observed by the researcher:
y* = β0 + β1x + u and u ~ N(0, σ²)
If y* is positive, then actual hours worked, y, is equal to y*. But if y* is zero or negative, then actual hours worked, y, is equal to zero. We also assume u is normally distributed.
The model can be conveniently written as:
yi* = β0 + β1xi + ui ..... (1)
such that yi = yi* if yi* > 0
yi = 0 if yi* ≤ 0
and ui ~ N(0, σ²)
We introduce the subscript i to denote the ith observation. We assume that equation (1) satisfies the Classical Linear Model assumptions.
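To make the censoring rule concrete, the latent-variable model can be simulated. The sketch below is only an illustration: the function name `simulate_tobit`, the parameter values, and the range of education are all hypothetical and are not taken from the Mroz data.

```python
import random

def simulate_tobit(n, beta0, beta1, sigma, seed=0):
    """Draw (x, y_star, y) triples from y* = beta0 + beta1*x + u, y = max(y*, 0)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(5, 17)              # hypothetical years of education
        u = rng.gauss(0, sigma)             # u ~ N(0, sigma^2)
        y_star = beta0 + beta1 * x + u      # latent hours, can be negative
        y = y_star if y_star > 0 else 0.0   # observed hours, never negative
        data.append((x, y_star, y))
    return data

# Hypothetical parameter values chosen only for illustration.
data = simulate_tobit(n=1000, beta0=-500.0, beta1=80.0, sigma=1000.0)
share_zero = sum(1 for _, _, y in data if y == 0) / len(data)
print(f"share of observations with zero hours: {share_zero:.2f}")
```

A sizeable share of the simulated observations pile up at exactly zero, which is the feature of the data that the Tobit model is built to handle.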
Graphical illustration
[Figure: latent hours y* and actual hours worked y (vertical axis) plotted against Educ (horizontal axis). When y* is negative, actual hours worked is zero.]
The variable y* can be negative, but if it is negative, then actual hours worked is equal to zero.
In this way, the Tobit model deals with the fact that many women do not work, so hours worked is zero for many women.
The estimation procedure
The model is estimated by maximum likelihood. If hours worked is positive (i.e., for women who are working), yi* = yi, thus
ui = yi − (β0 + β1xi)
Thus, the likelihood contribution for a working woman is given by the height of the density function:
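$$
L_i = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(y_i-\beta_0-\beta_1 x_i)^2}{2\sigma^2}}
    = \frac{1}{\sigma}\cdot\frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{y_i-\beta_0-\beta_1 x_i}{\sigma}\right)^2}
    = \frac{1}{\sigma}\,\phi\!\left(\frac{y_i-\beta_0-\beta_1 x_i}{\sigma}\right)
$$
where φ(.) is the standard normal density function.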
If the actual hours worked is zero (i.e., for women who are not working), we only know that y*≤0. Thus, the likelihood contribution is the probability that y*≤0, which is given by:
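$$
\begin{aligned}
L_i = P(y_i^* \le 0) &= P(\beta_0+\beta_1 x_i+u_i \le 0) \\
&= P\big(u_i \le -(\beta_0+\beta_1 x_i)\big) \\
&= P\left(\frac{u_i}{\sigma} \le -\frac{\beta_0+\beta_1 x_i}{\sigma}\right) \\
&= \Phi\left(-\frac{\beta_0+\beta_1 x_i}{\sigma}\right) \\
&= 1-\Phi\left(\frac{\beta_0+\beta_1 x_i}{\sigma}\right)
\end{aligned}
$$
where Φ(.) is the standard normal cumulative distribution function.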
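To summarize,
$$
L_i = \frac{1}{\sigma}\,\phi\!\left(\frac{y_i-\beta_0-\beta_1 x_i}{\sigma}\right) \ \text{ if } y_i > 0
\qquad\text{and}\qquad
L_i = 1-\Phi\left(\frac{\beta_0+\beta_1 x_i}{\sigma}\right) \ \text{ if } y_i = 0
$$
Let Di be a dummy variable that takes the value 1 if yi > 0 and 0 otherwise. Then the above likelihood contributions can be written as
$$
L_i = \left[\frac{1}{\sigma}\,\phi\!\left(\frac{y_i-\beta_0-\beta_1 x_i}{\sigma}\right)\right]^{D_i}
      \left[1-\Phi\left(\frac{\beta_0+\beta_1 x_i}{\sigma}\right)\right]^{1-D_i}
$$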
The likelihood function, L, is obtained by multiplying the likelihood contributions of all the observations:
$$
L(\beta_0,\beta_1,\sigma) = \prod_{i=1}^{n} L_i
$$
The values of β0, β1 and σ that maximize the likelihood function are the Tobit estimators of the parameters.
In actual computation, you maximize log(L).
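The log-likelihood described above can be written out in a few lines. This is a minimal sketch for the one-regressor model censored at zero, not the routine Stata uses internally; the function name `tobit_loglik` and the toy data are hypothetical.

```python
from math import log, pi
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1

def tobit_loglik(beta0, beta1, sigma, xs, ys):
    """Sum of log(L_i) for the one-regressor Tobit model censored at zero."""
    total = 0.0
    for x, y in zip(xs, ys):
        xb = beta0 + beta1 * x
        if y > 0:
            # Working woman: log[(1/sigma) * phi((y - xb)/sigma)]
            z = (y - xb) / sigma
            total += -log(sigma) - 0.5 * log(2.0 * pi) - 0.5 * z * z
        else:
            # Not working: log[1 - Phi(xb/sigma)], written as log Phi(-xb/sigma)
            total += log(STD_NORMAL.cdf(-xb / sigma))
    return total

# Hypothetical toy data: two women with zero hours, three with positive hours.
xs = [8.0, 10.0, 12.0, 14.0, 16.0]
ys = [0.0, 0.0, 400.0, 1200.0, 2000.0]
ll_a = tobit_loglik(-1500.0, 150.0, 800.0, xs, ys)
ll_b = tobit_loglik(-1500.0, 0.0, 800.0, xs, ys)
print(ll_a, ll_b)
```

A numerical optimizer would search over (β0, β1, σ) for the values that make this sum as large as possible; the two calls above simply compare the log-likelihood at two candidate parameter vectors.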
Exercise
Using Mroz.dta, estimate the hours worked equation for married women using the Tobit model. Include in the model nwifeinc, educ, exper, expersq, age, kidslt6, and kidsge6.
Answer
. use "D:\My Documents\IUJ_teaching\Research Methodology, Cross section and panel\Wooldridge Econometrics resources\data\MROZ.DTA", clear
. tobit hours nwifeinc educ exper expersq age kidslt6 kidsge6, ll(0)

Tobit regression                                  Number of obs   =        753
                                                  LR chi2(7)      =     271.59
                                                  Prob > chi2     =     0.0000
Log likelihood = -3819.0946                       Pseudo R2       =     0.0343

------------------------------------------------------------------------------
       hours |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    nwifeinc |  -8.814243   4.459096    -1.98   0.048    -17.56811   -.0603724
        educ |   80.64561   21.58322     3.74   0.000     38.27453    123.0167
       exper |   131.5643   17.27938     7.61   0.000     97.64231    165.4863
     expersq |  -1.864158   .5376615    -3.47   0.001    -2.919667   -.8086479
         age |  -54.40501   7.418496    -7.33   0.000    -68.96862    -39.8414
     kidslt6 |  -894.0217   111.8779    -7.99   0.000    -1113.655   -674.3887
     kidsge6 |    -16.218   38.64136    -0.42   0.675    -92.07675    59.64075
       _cons |   965.3053   446.4358     2.16   0.031     88.88528    1841.725
-------------+----------------------------------------------------------------
      /sigma |   1122.022   41.57903                      1040.396    1203.647
------------------------------------------------------------------------------
  Obs. summary:        325  left-censored observations at hours<=0
                       428     uncensored observations
                         0 right-censored observations
The partial effects (marginal effects)
The estimated parameters βj measure the effect of xj on y*.
But in a corner solution model, we are interested in the effect of xj on actual hours worked, y.
In the next few slides, I will show how to compute the effect of an increase in an explanatory variable on the expected value of y.
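Note that the expectation of y given x is given by
E(y|x) = P(y>0|x) E(y|y>0,x) + P(y=0|x) E(y|y=0,x) = P(y>0|x) E(y|y>0,x) ..... (1)
because the second term is zero. Now, let me compute E(y|y>0,x).
$$
\begin{aligned}
E(y \mid y>0, x) &= E(\beta_0+\beta_1 x+u \mid \beta_0+\beta_1 x+u>0,\ x) \\
&= \beta_0+\beta_1 x + E\big(u \mid u > -(\beta_0+\beta_1 x),\ x\big) \\
&= \beta_0+\beta_1 x + \sigma\, E\left(\frac{u}{\sigma} \,\Big|\, \frac{u}{\sigma} > -\frac{\beta_0+\beta_1 x}{\sigma},\ x\right)
\end{aligned}
$$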
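Now, we use the fact that if v is a standard normal variable and c is a constant, then
$$
E(v \mid v > c) = \frac{\phi(c)}{1-\Phi(c)}
$$
In our case, v = u/σ and c = −(β0+β1x)/σ. (Note that the expectation is also conditioned on x, so you can treat x as a constant.) Because the standard normal density is symmetric, φ(−c) = φ(c) and 1 − Φ(−c) = Φ(c). Thus, we have
$$
\begin{aligned}
E(y \mid y>0, x) &= \beta_0+\beta_1 x + \sigma\, E\left(\frac{u}{\sigma} \,\Big|\, \frac{u}{\sigma} > -\frac{\beta_0+\beta_1 x}{\sigma},\ x\right) \\
&= \beta_0+\beta_1 x + \sigma\, \frac{\phi\left(-\frac{\beta_0+\beta_1 x}{\sigma}\right)}{1-\Phi\left(-\frac{\beta_0+\beta_1 x}{\sigma}\right)} \\
&= \beta_0+\beta_1 x + \sigma\, \frac{\phi\left(\frac{\beta_0+\beta_1 x}{\sigma}\right)}{\Phi\left(\frac{\beta_0+\beta_1 x}{\sigma}\right)} \\
&= \beta_0+\beta_1 x + \sigma\, \lambda\left(\frac{\beta_0+\beta_1 x}{\sigma}\right) \qquad (2)
\end{aligned}
$$
where λ(c) = φ(c)/Φ(c). This term is called the inverse Mills ratio and is denoted by λ(.).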
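Now, let me compute P(y>0|x).
$$
\begin{aligned}
P(y>0 \mid x) &= P(\beta_0+\beta_1 x+u>0 \mid x) \\
&= P\big(u > -(\beta_0+\beta_1 x) \mid x\big) \\
&= P\left(\frac{u}{\sigma} > -\frac{\beta_0+\beta_1 x}{\sigma} \,\Big|\, x\right) \\
&= 1-\Phi\left(-\frac{\beta_0+\beta_1 x}{\sigma}\right) \\
&= \Phi\left(\frac{\beta_0+\beta_1 x}{\sigma}\right) \qquad (3)
\end{aligned}
$$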
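By plugging (2) and (3) into (1), we have
$$
E(y \mid x) = \Phi\left(\frac{\beta_0+\beta_1 x}{\sigma}\right)\left[\beta_0+\beta_1 x + \sigma\, \lambda\left(\frac{\beta_0+\beta_1 x}{\sigma}\right)\right] \qquad (4)
$$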
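Equations (2), (3) and (4) translate directly into code. The sketch below is an illustration under assumed inputs: the function names and the parameter values are hypothetical, and the standard normal functions come from Python's `statistics.NormalDist`.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1

def inverse_mills(c):
    """Inverse Mills ratio: lambda(c) = phi(c) / Phi(c)."""
    return STD_NORMAL.pdf(c) / STD_NORMAL.cdf(c)

def expected_hours_if_working(beta0, beta1, sigma, x):
    """Equation (2): E(y | y > 0, x)."""
    xb = beta0 + beta1 * x
    return xb + sigma * inverse_mills(xb / sigma)

def prob_working(beta0, beta1, sigma, x):
    """Equation (3): P(y > 0 | x)."""
    return STD_NORMAL.cdf((beta0 + beta1 * x) / sigma)

def expected_hours(beta0, beta1, sigma, x):
    """Equation (4): E(y | x) = P(y > 0 | x) * E(y | y > 0, x)."""
    return prob_working(beta0, beta1, sigma, x) * expected_hours_if_working(
        beta0, beta1, sigma, x
    )

# Hypothetical parameter values and education level, for illustration only.
b0, b1, s, educ = -500.0, 80.0, 1000.0, 12.0
print(prob_working(b0, b1, s, educ))
print(expected_hours_if_working(b0, b1, s, educ))
print(expected_hours(b0, b1, s, educ))
```

Because P(y>0|x) is below one, the unconditional expectation is always smaller than the expectation for those who work.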
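From the above computation, you can see that there are two ways to compute the partial effect.
1. The effect of x on hours worked for those who are working:
$$
\frac{\partial E(y \mid y>0, x)}{\partial x} = \beta_1\left\{1-\lambda\left(\frac{\beta_0+\beta_1 x}{\sigma}\right)\left[\frac{\beta_0+\beta_1 x}{\sigma}+\lambda\left(\frac{\beta_0+\beta_1 x}{\sigma}\right)\right]\right\}
$$
2. The overall effect of x on hours worked:
$$
\frac{\partial E(y \mid x)}{\partial x} = \beta_1\,\Phi\left(\frac{\beta_0+\beta_1 x}{\sigma}\right)
$$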
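The overall effect takes a very simple form, and it may help to see why. The short derivation below differentiates equation (4) directly; the shorthand z is introduced here for convenience and does not appear in the original slides.

```latex
% Shorthand (an assumption of this sketch): z = (\beta_0 + \beta_1 x)/\sigma,
% so that \partial z / \partial x = \beta_1 / \sigma.
% Using \lambda(z) = \phi(z)/\Phi(z), equation (4) becomes
\[
E(y \mid x) = \Phi(z)\,[\sigma z + \sigma\lambda(z)] = \sigma\,[\,z\,\Phi(z) + \phi(z)\,].
\]
% Differentiate with respect to z, using \Phi'(z) = \phi(z) and \phi'(z) = -z\,\phi(z):
\[
\frac{d}{dz}\,[\,z\,\Phi(z) + \phi(z)\,] = \Phi(z) + z\,\phi(z) - z\,\phi(z) = \Phi(z).
\]
% Chain rule:
\[
\frac{\partial E(y \mid x)}{\partial x}
  = \sigma\,\Phi(z)\cdot\frac{\beta_1}{\sigma}
  = \beta_1\,\Phi\!\left(\frac{\beta_0+\beta_1 x}{\sigma}\right).
\]
```

The two terms involving φ(z) cancel, which is why the overall effect is just β1 scaled by the probability of working.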
As can be seen, both partial effects depend on x. Therefore, they differ across observations in the data.
However, we need a single overall effect rather than the effect for a specific person in the data.
As was the case in the probit and logit models, there are two ways to compute the 'overall partial effect'.
The first is the Partial Effect at the Average (PEA). You simply plug the average value of x into the partial effect formula. This is computed automatically by Stata.
The second is the Average Partial Effect (APE). You compute the partial effect for each individual in the data, then take the average.
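The difference between the two summaries can be illustrated with a small sketch. The index values `z_values` below are hypothetical stand-ins for the standardized linear prediction of each observation (the fitted index divided by σ); only the educ coefficient is taken from the Tobit output in this lecture. The function names are also hypothetical.

```python
from statistics import NormalDist, mean

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1

def inverse_mills(c):
    """Inverse Mills ratio: lambda(c) = phi(c) / Phi(c)."""
    return STD_NORMAL.pdf(c) / STD_NORMAL.cdf(c)

def pe_working(beta_j, z):
    """Partial effect on E(y | y > 0, x) at standardized index z."""
    lam = inverse_mills(z)
    return beta_j * (1.0 - lam * (z + lam))

def pe_overall(beta_j, z):
    """Partial effect on E(y | x) at standardized index z."""
    return beta_j * STD_NORMAL.cdf(z)

beta_educ = 80.64561                      # educ coefficient from the Tobit output
z_values = [-1.0, -0.2, 0.3, 0.9, 1.5]    # hypothetical index values, one per person

# PEA: evaluate the formula once, at the average index.
z_bar = mean(z_values)
pea_overall = pe_overall(beta_educ, z_bar)
pea_working = pe_working(beta_educ, z_bar)

# APE: evaluate the formula for every person, then average.
ape_overall = mean(pe_overall(beta_educ, z) for z in z_values)
ape_working = mean(pe_working(beta_educ, z) for z in z_values)

print(pea_overall, ape_overall)
print(pea_working, ape_working)
```

Because the index is linear in x, evaluating at the average of x is the same as evaluating at the average index. The PEA and APE generally differ because the partial effect formulas are nonlinear in the index.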
Exercise
Using Mroz.dta, estimate the hours worked equation for married women using the Tobit model. Include in the model nwifeinc, educ, exper, expersq, age, kidslt6, and kidsge6.
1. Compute the effect of education on hours worked for those who are currently working: ∂E(y|y>0,x)/∂educ
2. Compute the effect of education on hours worked for all observations: ∂E(y|x)/∂educ
Partial effect at average for working women: computing manually.
. tobit hours nwifeinc educ exper expersq age kidslt6 kidsge6, ll(0)
(output identical to the Tobit regression table in the Answer to the first exercise)
. *****************************
. *Compute the Partial effect *
. *at average of educ         *
. *on hours for working women *
. *manually                   *
. *****************************
. predict xbeta, xb
. egen avxbeta=mean(xbeta)
. gen avxbsig=avxbeta/_b[/sigma]
. gen lambda=normalden(avxbsig)/normal(avxbsig)
. gen partial=_b[educ]*(1-lambda*(avxbsig+lambda))
. su partial

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
     partial |       753    34.27517           0   34.27517   34.27517
Partial effect at average for working women: computing automatically.
. tobit hours nwifeinc educ exper expersq age kidslt6 kidsge6, ll(0)
(output identical to the Tobit regression table in the Answer to the first exercise)
. ***********************************
. * Compute the partial effect      *
. * at average of educ on hours     *
. * for working women automatically *
. ***********************************
. mfx, predict(e(0,.)) varlist(educ)

Marginal effects after tobit
      y  = E(hours|hours>0) (predict, e(0,.))
         =  1012.0327
------------------------------------------------------------------------------
variable |      dy/dx    Std. Err.     z    P>|z|  [    95% C.I.   ]      X
---------+--------------------------------------------------------------------
    educ |   34.27517     9.11708    3.76   0.000    16.406  52.1443   12.2869
------------------------------------------------------------------------------
Partial effect at average for all the observations: computing automatically.
. *****************************************
. *Compute the Partial effect at average  *
. *of education for the entire observation*
. *automatically                          *
. *****************************************
. mfx, predict(ystar(0,.)) varlist(educ)

Marginal effects after tobit
      y  = E(hours*|hours>0) (predict, ystar(0,.))
         =  611.57078
------------------------------------------------------------------------------
variable |      dy/dx    Std. Err.     z    P>|z|  [    95% C.I.   ]      X
---------+--------------------------------------------------------------------
    educ |   48.73409      12.963    3.76   0.000   23.3263  74.1419   12.2869
------------------------------------------------------------------------------

Partial effect at average for all the observations: computing manually.
. *****************************************
. *Compute the Partial effect at average  *
. *of education for the entire observation*
. *manually                               *
. *****************************************
. gen partial_all=_b[educ]*normal(avxbsig)
. su partial_all

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
 partial_all |       753    48.73409           0   48.73409   48.73409