Binary and Fractional Response Models with Continuous and
Binary Endogenous Explanatory Variables
Wei Lin∗
Jeffrey M. Wooldridge†
November 8th, 2017
Abstract
This paper considers latent variable models for binary responses and fractional responses with a binary
endogenous explanatory variable (EEV) and potentially many continuous endogenous explanatory
variables. A two-step control function (CF) approach is promoted to account for endogeneity. The CF
approach enables an uncovering of partial effects of causal interest. The inference for the partial effects
can be easily obtained through bootstrapping because of the computational simplicity of the two-step CF
approach. A basic probit model, an endogenous switching probit model, and a fractional probit model
are discussed in the paper. Variable addition tests on generalized residuals are used to detect additional
endogeneity from the binary EEV. Monte Carlo experiments show that partial effects obtained by inserting
generalized residuals into binary response models outperform coefficients from linear specifications.
In fact, they provide fairly close approximations to partial effects from joint estimations. An empirical
illustration of the determination of housing budget shares shows that, in a fractional response model,
using generalized residuals again leads to a close approximation to joint estimations. The coefficients
from linear specifications and partial effects from quasi-MLE are also close in this case.
∗ Center for Real Estate, Massachusetts Institute of Technology, Cambridge, MA 02139, United States.
† Department of Economics, Michigan State University, East Lansing, MI 48824, United States.
1 Introduction
Binary response models play a significant role in many fields of empirical studies. Examples include
econometric models determining the probability of migration in labor economics (Dong and Lewbel, 2015),
the chance of college admission in the economics of education (Conlin et al., 2013), and the likelihood of
takeover activity in finance (Edmans et al., 2012), to name just a few.
In practice, linear probability models for binary responses are often used because they are easy to estimate.
However, a linear projection disregards the limited nature of the binary response, resulting in unrealistic
predictions of the response probability. Therefore, coefficients of the linear probability models serve,
at best, as approximations to marginal effects of interest.
Latent variable models for binary responses, on the other hand, are well grounded on economic theories.
The latent threshold-crossing structure captures the trigger effect, and the nonlinear transformation ensures
that the response probability falls into the unit interval. Thereby, post-estimation quantities from the latent
variable models, such as average partial effects (APEs), can bear a causal interpretation, even in the presence
of endogenous explanatory variables (Lin and Wooldridge, 2015). Yet due to the nonlinearity, detecting and
solving endogeneity issues in the first place is less straightforward, especially when some of the suspected
endogenous explanatory variables are also discrete.
This paper considers the estimation of a special case of latent variable models for binary responses with
EEVs of differing attributes, where a control function approach can be applied to make it easier to handle
EEVs. Namely, we allow for one binary EEV and potentially many continuous EEVs. The binary EEV
could be an indicator of self-selecting into a treatment, into a region or into a sample, depending on how it
appears in the binary response equation. Some empirical examples for the binary EEV are whether to own
a house, whether to submit SAT scores in college applications, or whether to be in the labor force. The
continuous EEVs could be years of education, family income, prices of substitutes, or inputs for production
functions. The endogeneity here is modeled to arise from omitted variable problems—the existence of
common unobservables affecting both the binary outcome and the EEVs.
To estimate the model above, for simplicity, we take a simple parametric approach by assuming joint
normality among error terms, following the treatments of binary and continuous EEVs adopted by Heckman
(1978), Amemiya (1978), and Rivers and Vuong (1988). This distributional assumption allows not only for
control function approaches but also for joint estimation, such as a limited information maximum likelihood
(LIML) estimation, or a quasi-limited information maximum likelihood (quasi-LIML) estimation if any part of
the distribution is misspecified. However, this kind of joint estimation is rarely conducted in practice, due to
the difficulty in searching for a numerical solution along so many dimensions. One might also be tempted
to mimic two-stage least squares by substituting fitted values from a first-stage estimation for EEVs in the
binary response equation. Despite its simplicity, this procedure leads to the so-called "forbidden regression"
coined by Hausman (1975), which in turn yields inconsistent estimates.
Alternatively, drawing on Wooldridge (2014), this paper promotes a two-step control function (CF)
approach under the quasi-LIML framework, which is not only computationally simple but also delivers
sensible estimators of APEs in the presence of multiple EEVs and various sources of heterogeneity. To carry
out this two-step procedure, residuals, instead of fitted values, from the first-stage estimation of reduced
forms for the continuous EEVs are plugged into the second-stage joint estimation of the binary outcome and
binary EEV. Routines in commonly used software can be exploited (or slightly modified) to carry out this
procedure. Most importantly, due to the computational simplicity of the CF approach, bootstrapping can
be easily applied to obtain inference for functions of the parameters, such as APEs, rather than using the
complicated delta method.
In addition, as shown in Wooldridge (2014), simple variable addition tests (VATs) for endogeneity are
obtained as by-products of the CF approach. This paper extends the VATs for a single EEV to VATs for
multiple EEVs. The VATs are based on standard Wald tests of those plugged-in residuals obtained from
the first stage estimations. In particular, in the presence of a binary EEV, testing on generalized residuals
enables us to determine whether we can avoid a joint estimation in the second stage. Further, since EEVs
are often correlated with each other, conditioning on residuals obtained from other EEVs helps reduce the
likelihood of detecting additional endogeneity in the binary EEV.
Another feature of the two-step procedure, which stems from White (1982), is the quasi-LIML framework.
The analysis for binary response models in this paper easily carries through to fractional response
models, as proposed in Papke and Wooldridge (1996). So long as the conditional mean of the fractional
response is correctly specified, we consistently estimate parameters in the conditional mean even though
other features of the distribution are misspecified.
Besides parametric approaches to estimating this triangular model for binary responses, semi-parametric
and nonparametric approaches are also available. Nevertheless, while those approaches sensibly relax
parametric assumptions in one aspect, they inevitably impose restrictions in other directions. For example,
Blundell and Powell (2003) advance CF approaches to fully nonparametric binary response models with
continuous EEVs. Unfortunately, their assumption of additive, independent errors rules out discrete EEVs.
The special regressor method in Dong and Lewbel (2015) allows for both continuous and binary EEVs in
semiparametric binary response models. However, their method requires a special regressor to be excluded
from the reduced forms for the EEVs, and the special regressor cannot appear in the structural equation in
flexible functional forms. Further, as discussed in Lin and Wooldridge (2015), the average index functions
(AIF), proposed as a basis for defining marginal effects for special regressor methods, lack a causal
interpretation. Some other existing semiparametric methods for estimating this model are discussed in more detail
by Lin (2016).
The rest of the paper is organized as follows. Section 2 starts with a basic model with one binary and
many continuous EEVs. The same arguments are then extended to an endogenous switching model where
the error terms and switching indicator are allowed to interact. Section 3 derives the VAT for endogeneity
from a binary EEV given residuals from continuous EEVs. Section 4 shows that the CF approach can
be applied to fractional response models. Section 5 presents Monte Carlo simulation results of empirical
distributions of APEs for binary response models. Section 6 illustrates this approach by revisiting the study
of the effects of price and total expenditure in a housing budget share equation. Section 7 concludes.
2 Model Specification and Estimation for Binary Response
2.1 Probit Models with One Binary EEV and Many Continuous EEVs
As a starting point, we first assume that the only complication arises from EEVs of differing attributes,
with no presence of heterogeneity or misspecification yet. More specifically, consider a simple model for a
binary response y1 with many continuous EEVs y2 and one binary EEV y3. Write the model recursively in
a triangular form as
y1 = 1 [x1β + u1 > 0] , (1a)
y2 = zΠ + v2, (1b)
y3 = 1 [zδ + u3 > 0] . (1c)
Equation (1a) is a structural equation that represents a causal relationship. Equations (1b) and (1c) are
reduced forms for the continuous EEVs y2 of dimension 1×G and the scalar binary EEV y3, respectively.
1 [·] denotes the indicator function that takes on a value of one when the statement in the bracket is true
and zero otherwise. x1 is a 1 × K1 vector where each of its elements is a general function of (z1, y2, y3),
such as polynomials, interactions, logarithms, etc., with x1 = (z1, y2, y3) being the leading case. z1 is a
1 × L1 strict subset of the entire 1 × L vector of exogenous variables z ≡ (z1, z2), with L ≡ L1 + L2
and L2 ≥ G + 1. Identification based on nonlinearity alone often works poorly in practice, so
we need at least one excluded instrument for the binary EEV. Further, the same rank condition holds as in
two-stage least squares: rank E(z′z) = L1 + L2 and rank E(z′x1) = K1. Moreover, let z1 include unity
as its first element, which effectively forces the error terms (u1, v2, u3) to have zero means. Π is an L × G
matrix of parameters. In the simplest case, when G = 1, y2 is a scalar.
This system of equations describes endogeneity as an omitted variable problem. The structural error u1
is correlated with explanatory variables y2 and y3, in that it contains an unobservable that also appears in
error terms v2 and u3. Write the linear projections of (u1, u3) on v2 in error forms:
u1 = v2θ + v1, (2a)
u3 = v2η + v3, (2b)
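where $\boldsymbol{\theta} \equiv E(\mathbf{v}_2'\mathbf{v}_2)^{-1}E(\mathbf{v}_2'u_1)$ and $\boldsymbol{\eta} \equiv E(\mathbf{v}_2'\mathbf{v}_2)^{-1}E(\mathbf{v}_2'u_3)$ are the G × 1 vectors of the population
regression coefficients.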
A convenient joint normality assumption among (u1,v2, u3) (Heckman, 1978; Amemiya, 1978; Rivers
and Vuong, 1988) leaves us with a bivariate normally distributed vector of errors (v1, v3) that is independent
of v2 (by definition of a linear projection and by a property of multivariate normality):
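$$
D\begin{pmatrix} v_1 \\ v_3 \end{pmatrix}
= D\left( \begin{pmatrix} v_1 \\ v_3 \end{pmatrix} \,\middle|\, \mathbf{v}_2 \right)
\sim \mathrm{Normal}\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix},
\begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix} \right), \tag{3}
$$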
where D (·) denotes the distribution, the variances of v1 and v3 are normalized to one, and ρ ≡ Cov (v1, v3)
is the covariance.
In fact, if we are willing to assume a strong enough exogeneity condition for the instruments z, the
bivariate distribution (v1, v3) becomes independent not only of v2 but also of z and thus of y2 (because y2
is a deterministic function of z,v2):
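$$
D\begin{pmatrix} v_1 \\ v_3 \end{pmatrix}
= D\left( \begin{pmatrix} v_1 \\ v_3 \end{pmatrix} \,\middle|\, \mathbf{z}, \mathbf{v}_2 \right)
= D\left( \begin{pmatrix} v_1 \\ v_3 \end{pmatrix} \,\middle|\, \mathbf{y}_2, \mathbf{z}, \mathbf{v}_2 \right). \tag{4}
$$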
Given the distributional assumptions, we arrive at a bivariate probit model that accounts for endogeneity
issues in (1) by adding v2 as extra explanatory variables:
y1 = 1 [x1β + v2θ + v1 ≥ 0] , (5a)
y3 = 1 [zδ + v2η + v3 ≥ 0] . (5b)
Adding reduced form errors to control for endogeneity is the essence of a control function approach.
However, since we cannot observe v2, a simple two-step procedure makes the approach operational:
1. Estimate (1b), the reduced forms for y2, by ordinary least squares (OLS), equation by equation, to
obtain the residuals v̂2 = y2 − zΠ̂.
2. Estimate (5), the bivariate probit model for y1 and y3, jointly by maximum likelihood estimation
(MLE), replacing v2 with v̂2.
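Step 1 of the procedure above can be sketched in a few lines. The following is a minimal illustration, not the authors' code: it assumes a scalar y2 (G = 1), uses simulated data with made-up coefficients (`true_pi` is hypothetical), and relies only on the Python standard library, so the small `solve` helper stands in for a linear algebra package. Step 2 needs a bivariate probit routine and is not shown.

```python
import random

def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def first_stage_residuals(z, y2):
    """OLS of a scalar y2 on the rows of z; returns (pi_hat, v2_hat)."""
    k = len(z[0])
    ztz = [[sum(row[a] * row[b] for row in z) for b in range(k)] for a in range(k)]
    zty = [sum(row[a] * y for row, y in zip(z, y2)) for a in range(k)]
    pi_hat = solve(ztz, zty)
    v2_hat = [y - sum(p * zj for p, zj in zip(pi_hat, row)) for row, y in zip(z, y2)]
    return pi_hat, v2_hat

# Simulated data (hypothetical coefficients): z = (1, z1, z2), y2 = z * pi + v2.
rng = random.Random(0)
z = [[1.0, rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(500)]
true_pi = [0.5, 1.0, -0.3]
y2 = [sum(p * zj for p, zj in zip(true_pi, row)) + rng.gauss(0, 1) for row in z]

pi_hat, v2_hat = first_stage_residuals(z, y2)
# By construction, OLS residuals are orthogonal to every column of z.
orthogonality = [sum(row[a] * v for row, v in zip(z, v2_hat)) for a in range(3)]
```

The residuals `v2_hat`, rather than the fitted values, are what enter the second-step bivariate probit as additional regressors.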
Since there is no one-to-one mapping between the reduced form error v3 and the binary EEV y3, we
cannot obtain a proxy for v3 and hence have to rely on a joint estimation in the second step. By the usual
consistency argument of two-step M-estimations (see, for example, Wooldridge, 2010, section 12.4.1), the
resulting control function estimator (Π̂, β̂, δ̂, θ̂, η̂) is consistent for parameters identified by the following
population problems. Formally,
$$
\boldsymbol{\Pi} = E(\mathbf{z}'\mathbf{z})^{-1} E(\mathbf{z}'\mathbf{y}_2), \tag{6}
$$
and (β, δ,θ,η) is the unique solution to
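$$
\begin{aligned}
\max_{\mathbf{b}\in\mathbb{R}^{K_1},\,\mathbf{d}\in\mathbb{R}^{L},\,\mathbf{r}\in\mathbb{R}^{G},\,\mathbf{g}\in\mathbb{R}^{G},\,\rho\in\mathbb{R}}\;
& E\left[\log P(y_1, y_3 \mid \mathbf{y}_2, \mathbf{z}, \mathbf{v}_2)\right] \\
= E\Big[\, & y_1 y_3 \log \int_{-q_3}^{\infty} \Phi(d)\,\phi(\upsilon_3)\,d\upsilon_3 \\
& + (1-y_1)\, y_3 \log \int_{-q_3}^{\infty} [1-\Phi(d)]\,\phi(\upsilon_3)\,d\upsilon_3 \\
& + y_1 (1-y_3) \log \int_{-\infty}^{-q_3} \Phi(d)\,\phi(\upsilon_3)\,d\upsilon_3 \\
& + (1-y_1)(1-y_3) \log \int_{-\infty}^{-q_3} [1-\Phi(d)]\,\phi(\upsilon_3)\,d\upsilon_3 \,\Big],
\end{aligned} \tag{7}
$$
where
$$
d \equiv \frac{\mathbf{x}_1\mathbf{b} + \mathbf{v}_2\mathbf{r} + \rho\,\upsilon_3}{\sqrt{1-\rho^2}}, \tag{8}
$$
$$
q_3 \equiv \mathbf{z}\mathbf{d} + \mathbf{v}_2\mathbf{g}. \tag{9}
$$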
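To make the four integrals in (7) concrete, the sketch below evaluates the cell probabilities numerically for given index values. It is an illustration under stated assumptions: `index1` stands for x1b + v2r and `q3` for zd + v2g, the numerical values are hypothetical, the infinite limits are truncated at ±10, and a simple composite Simpson rule replaces a dedicated bivariate normal routine.

```python
import math

def norm_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def norm_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def integrate(f, lo, hi, n=4000):
    """Composite Simpson rule on [lo, hi] with n (even) subintervals."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(lo + k * h)
    return total * h / 3.0

def cell_probs(index1, q3, rho, tail=10.0):
    """P(y1 = a, y3 = b | .) for the four cells in (7), using d from (8).

    The infinite integration limits are truncated at +/- tail (an approximation).
    """
    def d(u):
        return (index1 + rho * u) / math.sqrt(1.0 - rho * rho)
    p11 = integrate(lambda u: norm_cdf(d(u)) * norm_pdf(u), -q3, tail)
    p01 = integrate(lambda u: (1.0 - norm_cdf(d(u))) * norm_pdf(u), -q3, tail)
    p10 = integrate(lambda u: norm_cdf(d(u)) * norm_pdf(u), -tail, -q3)
    p00 = integrate(lambda u: (1.0 - norm_cdf(d(u))) * norm_pdf(u), -tail, -q3)
    return {(1, 1): p11, (0, 1): p01, (1, 0): p10, (0, 0): p00}

# Hypothetical index values for one observation.
probs = cell_probs(index1=0.4, q3=-0.2, rho=0.5)
total = sum(probs.values())  # the four cells should sum to (almost exactly) one

# With rho = 0 the outcome and the binary EEV are conditionally independent,
# so P(y1 = 1, y3 = 1) factors into Phi(index1) * Phi(q3).
p11_indep = cell_probs(index1=0.4, q3=-0.2, rho=0.0)[(1, 1)]
```

The log of the probability for the observed cell is one observation's contribution to the sample analogue of (7).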
However, as the magnitude of β depends on the normalization of the error terms and thus is only identified
up to scale, interpreting β is not especially meaningful. Instead, the primary goal in empirical studies
is to explain marginal effects of a variable of interest on response probabilities. In the presence of EEVs,
P (y1 = 1|x1), the conditional response probability, is hardly of any interest: it is affected by y2 and y3
having correlations with the omitted variable in the unobservable u1. We must use care in constructing
an interesting response function for deriving partial effects. Fortunately, Blundell and Powell (2003, 2004)
have proposed the average structural function (ASF), which is intuitively appealing and can be obtained via
counterfactual reasoning. In defining the ASF for the structural equation (1a), we break the correlations by
holding the observables x1 as fixed arguments and averaging out the unobservable ui1 without conditioning
on x1:
$$
\mathrm{ASF}(\mathbf{x}_1) = E_{u_{i1}}\left\{ 1\left[\mathbf{x}_1\boldsymbol{\beta} + u_{i1} > 0\right] \right\}, \tag{10}
$$
where the subscript i on ui1 emphasizes that it is a random variable, and Eui1 {·} is the expected value with
respect to ui1.
In the two-step CF procedure above, we identify parameters that correspond to the conditional normality
of u1 given v2, namely,
u1|v2 ∼ Normal (v2θ, 1) . (11)
Thus, by the usual law of iterated expectations, the ASF defined in (10) can be obtained in two steps.
First, we treat vi2 as fixed, and then average them out as random variables:
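$$
\begin{aligned}
\mathrm{ASF}(\mathbf{x}_1)
&= E_{\mathbf{v}_{i2}}\left\{ E_{u_{i1}\mid\mathbf{v}_{i2}}\left\{ 1\left[\mathbf{x}_1\boldsymbol{\beta} + \mathbf{v}_{i2}\boldsymbol{\theta} + v_{i1} > 0\right] \mid \mathbf{v}_{i2} \right\} \right\} \\
&= E_{\mathbf{v}_{i2}}\left\{ \Phi\left(\mathbf{x}_1\boldsymbol{\beta} + \mathbf{v}_{i2}\boldsymbol{\theta}\right) \right\} \\
&= \int_{-\infty}^{\infty} \Phi\left(\mathbf{x}_1\boldsymbol{\beta} + \boldsymbol{\upsilon}_{i2}\boldsymbol{\theta}\right) \phi(\boldsymbol{\upsilon}_{i2})\, d\boldsymbol{\upsilon}_{i2},
\end{aligned} \tag{12}
$$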
where φ (·) is the density function for the random variables vi2.
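As a quick numerical check of (12), the sketch below averages Φ(x1β + vi2θ) over simulated draws of a scalar v2 and compares the result with the closed form available when v2 is normal. The numbers (index 0.3, θ = 0.8, σ = 1.5) are purely illustrative assumptions; the closed form Φ(a/√(1 + θ²σ²)) follows because u1 = v2θ + v1 is then normal with variance 1 + θ²σ².

```python
import math
import random

def norm_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def asf_hat(index, theta, v2_draws):
    """Sample analogue of (12): average Phi(x1*beta + v_i2*theta) over the v_i2."""
    return sum(norm_cdf(index + v * theta) for v in v2_draws) / len(v2_draws)

# Hypothetical values: x1*beta = 0.3, theta = 0.8, scalar v2 ~ Normal(0, sigma^2).
rng = random.Random(1)
sigma = 1.5
draws = [rng.gauss(0.0, sigma) for _ in range(50_000)]
asf_mc = asf_hat(0.3, 0.8, draws)

# Closed form under normal v2: Phi(index / sqrt(1 + theta^2 * sigma^2)).
asf_exact = norm_cdf(0.3 / math.sqrt(1.0 + (0.8 * sigma) ** 2))
```

In applications the draws are replaced by the first-stage residuals, so no distributional assumption on v2 is needed for the averaging step.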
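The average partial effects (APEs) for a given x1 are then obtained by taking derivatives or differences
of (12):
$$
\mathrm{APE}_{y_2}(\mathbf{x}_1) = \beta_{y_2} \int_{-\infty}^{\infty} \phi\left(\mathbf{x}_1\boldsymbol{\beta} + \boldsymbol{\upsilon}_{i2}\boldsymbol{\theta}\right) \phi(\boldsymbol{\upsilon}_{i2})\, d\boldsymbol{\upsilon}_{i2}, \tag{13a}
$$
$$
\mathrm{APE}_{y_3}(\mathbf{x}_1) = \int_{-\infty}^{\infty} \left[ \Phi\left(\mathbf{x}_1^{(1)}\boldsymbol{\beta} + \boldsymbol{\upsilon}_{i2}\boldsymbol{\theta}\right) - \Phi\left(\mathbf{x}_1^{(0)}\boldsymbol{\beta} + \boldsymbol{\upsilon}_{i2}\boldsymbol{\theta}\right) \right] \phi(\boldsymbol{\upsilon}_{i2})\, d\boldsymbol{\upsilon}_{i2}, \tag{13b}
$$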
where $\beta_{y_2}$ is the coefficient on y2, $\mathbf{x}_1^{(1)}$ denotes the explanatory variables at a particular fixed value with
y3 = 1, and $\mathbf{x}_1^{(0)}$ denotes the same fixed value of the explanatory variables except that y3 = 0. These APEs
can be consistently estimated by using sample analogues and inserting the consistent estimators β̂ and θ̂ from
the two-step CF approach:
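$$
\widehat{\mathrm{APE}}_{y_2}(\mathbf{x}_1) = \hat{\beta}_{y_2} \left[ N^{-1} \sum_{i=1}^{N} \phi\left(\mathbf{x}_1\hat{\boldsymbol{\beta}} + \hat{\mathbf{v}}_{i2}\hat{\boldsymbol{\theta}}\right) \right], \tag{14a}
$$
$$
\widehat{\mathrm{APE}}_{y_3}(\mathbf{x}_1) = N^{-1} \sum_{i=1}^{N} \left[ \Phi\left(\mathbf{x}_1^{(1)}\hat{\boldsymbol{\beta}} + \hat{\mathbf{v}}_{i2}\hat{\boldsymbol{\theta}}\right) - \Phi\left(\mathbf{x}_1^{(0)}\hat{\boldsymbol{\beta}} + \hat{\mathbf{v}}_{i2}\hat{\boldsymbol{\theta}}\right) \right]. \tag{14b}
$$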
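The estimators in (14a) and (14b) are plain sample averages, as the following sketch shows for a scalar continuous EEV. All numerical inputs (residuals, coefficient estimates, index values) are hypothetical placeholders for what the two-step procedure would deliver.

```python
import math

def norm_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def norm_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def ape_y2_hat(index, beta_y2, theta, v2_hat):
    """Sample analogue (14a): beta_y2 times the average of phi(x1*beta + v_i2*theta)."""
    avg_density = sum(norm_pdf(index + v * theta) for v in v2_hat) / len(v2_hat)
    return beta_y2 * avg_density

def ape_y3_hat(index_y3_on, index_y3_off, theta, v2_hat):
    """Sample analogue (14b): average change in Phi when y3 goes from 0 to 1."""
    diffs = [norm_cdf(index_y3_on + v * theta) - norm_cdf(index_y3_off + v * theta)
             for v in v2_hat]
    return sum(diffs) / len(diffs)

# Hypothetical inputs: first-stage residuals and second-stage estimates.
v2_hat = [-1.2, -0.4, 0.1, 0.6, 0.9]
beta_y2, beta_y3, theta = -1.0, 0.8, 0.5
index_off = 0.2                  # x1 * beta evaluated with y3 = 0
index_on = index_off + beta_y3   # same x1 but with y3 = 1

effect_y2 = ape_y2_hat(index_on, beta_y2, theta, v2_hat)
effect_y3 = ape_y3_hat(index_on, index_off, theta, v2_hat)
```

Note that x1 is held fixed while the residuals vary across i, which is what distinguishes these averages from fitted probabilities evaluated at each observation's own covariates.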
To obtain inference for the estimators of APEs in (14a) and (14b), analytical standard errors can be
derived by the delta method after casting the two-step control function problem as a one-step method of
moments problem. However, because all the procedures involved in the estimation are standard routines,
bootstrap standard errors can be easily obtained to account for the sampling error in both steps.
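The key point of the bootstrap here is that both steps are re-run on every resample, so the first-stage sampling error is carried into the final standard error. The sketch below illustrates this with a deliberately simplified stand-in: `toy_two_step` is a hypothetical estimator in which the second step is a linear regression on the first-stage residual rather than a bivariate probit, and the data are simulated.

```python
import random
import statistics

def bootstrap_se(rows, estimator, n_boot=500, seed=0):
    """Nonparametric bootstrap: resample rows with replacement, re-run the
    whole estimator on each resample, and return the std. dev. of the results."""
    rng = random.Random(seed)
    n = len(rows)
    draws = []
    for _ in range(n_boot):
        resample = [rows[rng.randrange(n)] for _ in range(n)]
        draws.append(estimator(resample))
    return statistics.stdev(draws)

def slope(x, y):
    """Slope from a simple regression of y on x (with intercept)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx

def toy_two_step(rows):
    """Step 1: residuals from regressing y2 on z. Step 2: slope of y1 on them."""
    z = [r[0] for r in rows]
    y2 = [r[1] for r in rows]
    y1 = [r[2] for r in rows]
    b = slope(z, y2)
    a = statistics.fmean(y2) - b * statistics.fmean(z)
    v2_hat = [y - a - b * zi for zi, y in zip(z, y2)]
    return slope(v2_hat, y1)

# Simulated rows (z, y2, y1) with a true coefficient of 0.5 on v2.
rng = random.Random(42)
rows = []
for _ in range(300):
    z, v2 = rng.gauss(0, 1), rng.gauss(0, 1)
    rows.append((z, z + v2, 0.5 * v2 + rng.gauss(0, 1)))

theta_hat = toy_two_step(rows)
theta_se = bootstrap_se(rows, toy_two_step)
```

Replacing `toy_two_step` with a function that returns an APE from the full two-step CF procedure gives bootstrap standard errors for (14a) or (14b) without any delta-method algebra.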
As shown in (13a) and (13b), APEs for the binary response model have the attractive feature of built-in
heterogeneity: they deliver varying partial effects when evaluated at different values of x1. However, if one
is interested in using a single summary statistic for marginal effects, further averaging across x1 should be
applied. A joint averaging across (x1, v̂2), as the "margins" command does in Stata, is computationally easier
but bears a different causal interpretation from sequentially averaging out v̂2 and x1 (Nam and Wooldridge,
2014).
Although it serves as a starting point, the modelling strategy in (1) for the CF approach is limited in several
ways. One restrictive feature is that the reduced form error v2 needs to be independent of the exogenous
variables z. The linear functional form for the conditional mean of y2 can be unrealistic and can be relaxed to
a generic function π (·) of z, as in Blundell and Powell (2003, 2004). More importantly, v2 here acts as a
sufficient statistic to control for any endogeneity from y2 in the structural error u1: that is, y2 is correlated
with u1 only through v2 in level form. However, as shown in Murtazashvili and Wooldridge (2015), in
case of additional heterogeneity such as random coefficients, the unobservable u1 can contain full interactions
between v2 and (z, x1). Beyond interactions, allowing for an unknown function h (·) of v2, as
in Lin (2016), does not make the dependence of u1 on v2 fully flexible, but it does relax this
restrictive assumption somewhat.
2.2 Probit Endogenous Switching Models with Many Continuous EEVs
As we are interested in modeling some heterogeneity besides EEVs, we turn to a probit switching regression
with EEVs. The binary EEV y3 can be viewed as a switching indicator. In addition to shifting
intercepts when y3 appears by itself in the linear index, the switching can be made more general. Interacting
y3 with all the observables allows us to switch into regimes of differing slopes. The interaction between
y3 and unobservables indicates the two regimes have differing unobservables. The switching is endogenous
because y3 is correlated with the unobservables. In the treatment effect framework, y3 is the treatment
indicator and the treatment effect is heterogeneous. To see this, first write the model as follows:
y1 = 1 [(1− y3)x1β0 + y3x1β1 + (1− y3)u0 + y3u1 > 0] (15a)
y2 = zΠ + v2 (15b)
y3 = 1 [zδ + u3 > 0] , (15c)
Under a similar set of notations and assumptions as in (1), write the linear projection of u1, u0 and u3
onto the reduced form error v2 in error forms:
u0 = v2θ0 + v0 (16a)
u1 = v2θ1 + v1 (16b)
u3 = v2η + v3, (16c)
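where $\boldsymbol{\theta}_0 \equiv E(\mathbf{v}_2'\mathbf{v}_2)^{-1}E(\mathbf{v}_2'u_0)$, $\boldsymbol{\theta}_1 \equiv E(\mathbf{v}_2'\mathbf{v}_2)^{-1}E(\mathbf{v}_2'u_1)$, and $\boldsymbol{\eta} \equiv E(\mathbf{v}_2'\mathbf{v}_2)^{-1}E(\mathbf{v}_2'u_3)$. Then, we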
maintain a strong exogeneity assumption that the remaining error terms v0 and v1 are independent of v2 and
a parametric assumption that each has a bivariate normal distribution with the remaining error term v3, with
covariances ρ0 and ρ1, respectively:
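$$
D\left( \begin{pmatrix} v_0 \\ v_3 \end{pmatrix} \,\middle|\, \mathbf{v}_2 \right)
\sim \mathrm{Normal}\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix},
\begin{pmatrix} 1 & \rho_0 \\ \rho_0 & 1 \end{pmatrix} \right), \tag{17a}
$$
$$
D\left( \begin{pmatrix} v_1 \\ v_3 \end{pmatrix} \,\middle|\, \mathbf{v}_2 \right)
\sim \mathrm{Normal}\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix},
\begin{pmatrix} 1 & \rho_1 \\ \rho_1 & 1 \end{pmatrix} \right). \tag{17b}
$$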
Again, assuming (v1, v3) and (v0, v3) are independent of z leads to an independence between y2 and
the joint distribution of (v0, v3) and (v1, v3)
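$$
D\begin{pmatrix} v_0 \\ v_3 \end{pmatrix}
= D\left( \begin{pmatrix} v_0 \\ v_3 \end{pmatrix} \,\middle|\, \mathbf{z}, \mathbf{v}_2 \right)
= D\left( \begin{pmatrix} v_0 \\ v_3 \end{pmatrix} \,\middle|\, \mathbf{y}_2, \mathbf{z}, \mathbf{v}_2 \right), \tag{18a}
$$
$$
D\begin{pmatrix} v_1 \\ v_3 \end{pmatrix}
= D\left( \begin{pmatrix} v_1 \\ v_3 \end{pmatrix} \,\middle|\, \mathbf{z}, \mathbf{v}_2 \right)
= D\left( \begin{pmatrix} v_1 \\ v_3 \end{pmatrix} \,\middle|\, \mathbf{y}_2, \mathbf{z}, \mathbf{v}_2 \right). \tag{18b}
$$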
Then, rewrite model (15) in the treatment framework:
\begin{align}
y_1 &= (1-y_3)\, y_1^{(0)} + y_3\, y_1^{(1)}, \tag{19} \\
y_1^{(0)} &= 1\left[\mathbf{x}_1\boldsymbol{\beta}_0 + \mathbf{v}_2\boldsymbol{\theta}_0 + v_0 > 0\right], \tag{20} \\
y_1^{(1)} &= 1\left[\mathbf{x}_1\boldsymbol{\beta}_1 + \mathbf{v}_2\boldsymbol{\theta}_1 + v_1 > 0\right], \tag{21} \\
y_3 &= 1\left[\mathbf{z}\boldsymbol{\delta} + \mathbf{v}_2\boldsymbol{\eta} + v_3 > 0\right], \tag{22}
\end{align}
where $y_1^{(0)}$ is the potential outcome when the treatment y3 equals zero and $y_1^{(1)}$ is the potential outcome
when the treatment equals one. The self-selection problem is represented by the nonzero correlation between
the treatment indicator y3 and the unobservables v0 and v1 in the potential outcomes. Those who self-select
into treatment inherently have a different distribution of unobservables from those who do not.
To consistently estimate the parameters in this model, a simple three-step control function approach
splits the above model into two Heckman sample selection models with sub-samples defined by the treatment
status:
1. Using all observations, estimate (15b), the reduced forms for y2, by ordinary least squares (OLS),
equation by equation, to obtain the residuals v̂2 = y2 − zΠ̂.
2. Since $y_1^{(1)}$ is observed only when y3 = 1, jointly estimate (21) and (22), the binary outcome equation
for $y_1^{(1)}$ and the sample selection equation for the indicator y3, by maximum likelihood estimation (MLE), replacing
v2 with v̂2, to obtain β̂1 and θ̂1.
3. Since $y_1^{(0)}$ is observed only when y3 = 0, jointly estimate (20) and (22), the binary response model for
$y_1^{(0)}$ and the sample selection equation for the indicator 1 − y3, by maximum likelihood estimation (MLE), replacing
v2 with v̂2, to obtain β̂0 and θ̂0.
The above procedure is justified by splitting the objective function for the second-step estimation into
two parts.
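Namely, solving
$$
\begin{aligned}
\max_{\mathbf{b}_0,\mathbf{b}_1\in\mathbb{R}^{K_1},\,\mathbf{d}\in\mathbb{R}^{L},\,\mathbf{r}_0,\mathbf{r}_1,\mathbf{g}\in\mathbb{R}^{G},\,\rho_0,\rho_1\in\mathbb{R}}\;
& E\left[\log P(y_1, y_3 \mid \mathbf{y}_2, \mathbf{z}, \mathbf{v}_2)\right] \\
= E\Big[\, & y_1 y_3 \log P\left(y_1^{(1)} = 1, y_3 = 1 \mid \mathbf{z}, \mathbf{v}_2, \mathbf{y}_2\right) \\
& + (1-y_1)\, y_3 \log P\left(y_1^{(1)} = 0, y_3 = 1 \mid \mathbf{z}, \mathbf{v}_2, \mathbf{y}_2\right) \\
& + y_1 (1-y_3) \log P\left(y_1^{(0)} = 1, y_3 = 0 \mid \mathbf{z}, \mathbf{v}_2, \mathbf{y}_2\right) \\
& + (1-y_1)(1-y_3) \log P\left(y_1^{(0)} = 0, y_3 = 0 \mid \mathbf{z}, \mathbf{v}_2, \mathbf{y}_2\right) \Big],
\end{aligned} \tag{23}
$$
is equivalent to solving
$$
\begin{aligned}
\max_{\mathbf{b}_1\in\mathbb{R}^{K_1},\,\mathbf{d}\in\mathbb{R}^{L},\,\mathbf{r}_1\in\mathbb{R}^{G},\,\mathbf{g}\in\mathbb{R}^{G},\,\rho_1\in\mathbb{R}}\;
& E\left[\log P(y_1, y_3 \mid \mathbf{y}_2, \mathbf{z}, \mathbf{v}_2)\right] \\
= E\Big[\, & y_1^{(1)} y_3 \log P\left(y_1^{(1)} = 1, y_3 = 1 \mid \mathbf{z}, \mathbf{v}_2, \mathbf{y}_2\right) \\
& + \left(1-y_1^{(1)}\right) y_3 \log P\left(y_1^{(1)} = 0, y_3 = 1 \mid \mathbf{z}, \mathbf{v}_2, \mathbf{y}_2\right) \\
& + (1-y_3) \log P\left(y_3 = 0 \mid \mathbf{z}, \mathbf{v}_2, \mathbf{y}_2\right) \Big],
\end{aligned} \tag{24}
$$
and
$$
\begin{aligned}
\max_{\mathbf{b}_0\in\mathbb{R}^{K_1},\,\mathbf{d}\in\mathbb{R}^{L},\,\mathbf{r}_0\in\mathbb{R}^{G},\,\mathbf{g}\in\mathbb{R}^{G},\,\rho_0\in\mathbb{R}}\;
& E\left[\log P(y_1, y_3 \mid \mathbf{y}_2, \mathbf{z}, \mathbf{v}_2)\right] \\
= E\Big[\, & y_1^{(0)} (1-y_3) \log P\left(y_1^{(0)} = 1, y_3 = 0 \mid \mathbf{z}, \mathbf{v}_2, \mathbf{y}_2\right) \\
& + \left(1-y_1^{(0)}\right) (1-y_3) \log P\left(y_1^{(0)} = 0, y_3 = 0 \mid \mathbf{z}, \mathbf{v}_2, \mathbf{y}_2\right) \\
& + y_3 \log P\left(y_3 = 1 \mid \mathbf{z}, \mathbf{v}_2, \mathbf{y}_2\right) \Big],
\end{aligned} \tag{25}
$$
where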
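\begin{align}
P\left(y_1^{(1)} = 1, y_3 = 1 \mid \mathbf{z}, \mathbf{v}_2, \mathbf{y}_2\right) &= \int_{-q_3}^{\infty} \Phi(d_1)\,\phi(\upsilon_3)\,d\upsilon_3, \tag{26} \\
P\left(y_1^{(1)} = 0, y_3 = 1 \mid \mathbf{z}, \mathbf{v}_2, \mathbf{y}_2\right) &= \int_{-q_3}^{\infty} [1-\Phi(d_1)]\,\phi(\upsilon_3)\,d\upsilon_3, \tag{27} \\
P\left(y_1^{(0)} = 1, y_3 = 0 \mid \mathbf{z}, \mathbf{v}_2, \mathbf{y}_2\right) &= \int_{-\infty}^{-q_3} \Phi(d_0)\,\phi(\upsilon_3)\,d\upsilon_3, \tag{28} \\
P\left(y_1^{(0)} = 0, y_3 = 0 \mid \mathbf{z}, \mathbf{v}_2, \mathbf{y}_2\right) &= \int_{-\infty}^{-q_3} [1-\Phi(d_0)]\,\phi(\upsilon_3)\,d\upsilon_3, \tag{29} \\
P\left(y_3 = 1 \mid \mathbf{z}, \mathbf{v}_2, \mathbf{y}_2\right) &= \Phi(q_3), \tag{30} \\
P\left(y_3 = 0 \mid \mathbf{z}, \mathbf{v}_2, \mathbf{y}_2\right) &= 1 - \Phi(q_3), \tag{31} \\
d_1 &\equiv \frac{\mathbf{x}_1\mathbf{b}_1 + \mathbf{v}_2\mathbf{r}_1 + \rho_1\upsilon_3}{\sqrt{1-\rho_1^2}}, \tag{32} \\
d_0 &\equiv \frac{\mathbf{x}_1\mathbf{b}_0 + \mathbf{v}_2\mathbf{r}_0 + \rho_0\upsilon_3}{\sqrt{1-\rho_0^2}}, \tag{33} \\
q_3 &\equiv \mathbf{z}\mathbf{d} + \mathbf{v}_2\mathbf{g}. \tag{34}
\end{align}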
Similar to (12), the ASF for the endogenous switching model is a combination of the ASFs for the two
regimes:
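$$
\mathrm{ASF}(\mathbf{x}_1) = \int_{-\infty}^{\infty} \left[ y_3\, \Phi\left(\mathbf{x}_1\boldsymbol{\beta}_1 + \boldsymbol{\upsilon}_{i2}\boldsymbol{\theta}_1\right) + (1-y_3)\, \Phi\left(\mathbf{x}_1\boldsymbol{\beta}_0 + \boldsymbol{\upsilon}_{i2}\boldsymbol{\theta}_0\right) \right] \phi(\boldsymbol{\upsilon}_{i2})\, d\boldsymbol{\upsilon}_{i2}. \tag{35}
$$
APEs for a continuous EEV y2 and the binary EEV y3 are defined, respectively, as
$$
\mathrm{APE}_{y_2}(\mathbf{x}_1) = \int_{-\infty}^{\infty} \left[ \beta_{y_2}^{(1)}\, y_3\, \phi\left(\mathbf{x}_1\boldsymbol{\beta}_1 + \boldsymbol{\upsilon}_{i2}\boldsymbol{\theta}_1\right) + \beta_{y_2}^{(0)}\, (1-y_3)\, \phi\left(\mathbf{x}_1\boldsymbol{\beta}_0 + \boldsymbol{\upsilon}_{i2}\boldsymbol{\theta}_0\right) \right] \phi(\boldsymbol{\upsilon}_{i2})\, d\boldsymbol{\upsilon}_{i2}, \tag{36a}
$$
$$
\mathrm{APE}_{y_3}(\mathbf{x}_1) = \int_{-\infty}^{\infty} \left[ \Phi\left(\mathbf{x}_1\boldsymbol{\beta}_1 + \boldsymbol{\upsilon}_{i2}\boldsymbol{\theta}_1\right) - \Phi\left(\mathbf{x}_1\boldsymbol{\beta}_0 + \boldsymbol{\upsilon}_{i2}\boldsymbol{\theta}_0\right) \right] \phi(\boldsymbol{\upsilon}_{i2})\, d\boldsymbol{\upsilon}_{i2}, \tag{36b}
$$
where $\beta_{y_2}^{(1)}$ is the coefficient on y2 in (21) and $\beta_{y_2}^{(0)}$ is the coefficient on y2 in (20).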
Notice that the APE for a binary exogenous variable z1 is defined nontrivially as
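$$
\begin{aligned}
\mathrm{APE}_{z_1}(\mathbf{x}_1) = \int_{-\infty}^{\infty} \Big\{ & y_3 \left[ \Phi\left(\mathbf{x}_1^{(1)}\boldsymbol{\beta}_1 + \boldsymbol{\upsilon}_{i2}\boldsymbol{\theta}_1\right) - \Phi\left(\mathbf{x}_1^{(0)}\boldsymbol{\beta}_1 + \boldsymbol{\upsilon}_{i2}\boldsymbol{\theta}_1\right) \right] \\
& + (1-y_3) \left[ \Phi\left(\mathbf{x}_1^{(1)}\boldsymbol{\beta}_0 + \boldsymbol{\upsilon}_{i2}\boldsymbol{\theta}_0\right) - \Phi\left(\mathbf{x}_1^{(0)}\boldsymbol{\beta}_0 + \boldsymbol{\upsilon}_{i2}\boldsymbol{\theta}_0\right) \right] \Big\}\, \phi(\boldsymbol{\upsilon}_{i2})\, d\boldsymbol{\upsilon}_{i2},
\end{aligned} \tag{37}
$$
where $\mathbf{x}_1^{(1)}$ denotes the explanatory variables at a particular fixed value with z1 = 1 and $\mathbf{x}_1^{(0)}$ denotes the same
fixed value of the explanatory variables except that now z1 = 0.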
Correspondingly, a consistent estimate of the APEs is a sample analog of (36a) and (36b) with consistent
estimates for the parameters plugged in:
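$$
\widehat{\mathrm{APE}}_{y_2}(\mathbf{x}_1) = N^{-1} \sum_{i=1}^{N} \left[ \hat{\beta}_{y_2}^{(1)}\, y_3\, \phi\left(\mathbf{x}_1\hat{\boldsymbol{\beta}}_1 + \hat{\mathbf{v}}_{i2}\hat{\boldsymbol{\theta}}_1\right) + \hat{\beta}_{y_2}^{(0)}\, (1-y_3)\, \phi\left(\mathbf{x}_1\hat{\boldsymbol{\beta}}_0 + \hat{\mathbf{v}}_{i2}\hat{\boldsymbol{\theta}}_0\right) \right], \tag{38a}
$$
$$
\widehat{\mathrm{APE}}_{y_3}(\mathbf{x}_1) = N^{-1} \sum_{i=1}^{N} \left[ \Phi\left(\mathbf{x}_1\hat{\boldsymbol{\beta}}_1 + \hat{\mathbf{v}}_{i2}\hat{\boldsymbol{\theta}}_1\right) - \Phi\left(\mathbf{x}_1\hat{\boldsymbol{\beta}}_0 + \hat{\mathbf{v}}_{i2}\hat{\boldsymbol{\theta}}_0\right) \right]. \tag{38b}
$$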
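The switching-model estimator (38b) differs from (14b) only in that each regime has its own index and its own coefficient on the residual. A short sketch for a scalar v̂2, with hypothetical regime-specific estimates, makes this explicit:

```python
import math

def norm_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def ape_y3_switching(index1, index0, theta1, theta0, v2_hat):
    """Sample analogue (38b): average of Phi(x1*b1 + v*theta1) - Phi(x1*b0 + v*theta0)."""
    diffs = [norm_cdf(index1 + v * theta1) - norm_cdf(index0 + v * theta0)
             for v in v2_hat]
    return sum(diffs) / len(diffs)

# Hypothetical first-stage residuals and regime-specific estimates:
# x1*beta1 = 0.9 and theta1 = 0.7 in the treated regime,
# x1*beta0 = 0.2 and theta0 = 0.3 in the untreated regime.
v2_hat = [-1.2, -0.4, 0.1, 0.6, 0.9]
effect = ape_y3_switching(0.9, 0.2, theta1=0.7, theta0=0.3, v2_hat=v2_hat)

# If both regimes share the same index and the same theta, switching has no effect.
no_effect = ape_y3_switching(0.2, 0.2, theta1=0.3, theta0=0.3, v2_hat=v2_hat)
```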
As before, instead of deriving complicated analytical formulas for the standard errors of the estimated APEs,
bootstrap standard errors can easily be obtained to account for the sampling variation in the generated regressors
v̂2.
Although the switching model brings in additional flexibility by allowing the structural error u ≡
(1− y3)u0 + y3u1 to depend not only on v2 but also on interactions between v2 and y3, the assumption that the
reduced forms for y2 remain unchanged across the two regimes is restrictive in empirical applications.
3 Test for Endogeneity from a Binary Explanatory Variable
This section focuses on variable addition tests for additional endogeneity from a binary explanatory
variable, conditioning on v̂2, the residuals from reduced forms for continuous EEVs. As we have seen in
equations (1) and (15), the only consistent approach to deal with a binary EEV is to make distributional
assumptions and conduct a joint estimation. In applications, we would often prefer to avoid a joint MLE
because of its sensitivity to the distributional assumption and the computational difficulty of arriving at a
numerical solution. A variable addition test (VAT), as proposed in Wooldridge (2014), helps us determine
whether such a joint estimation is necessary by testing on generalized residuals before proceeding to a
joint estimation. In particular, if we have already controlled for endogeneity from other continuous EEVs by
conditioning on v̂2, the generalized residual is less likely to be correlated with the remaining unobservable.
The following shows that the VAT on the generalized residual is a valid test for endogeneity from a binary
explanatory variable because it is asymptotically equivalent to an LM test under the null hypothesis of no
endogeneity.
More formally, in the basic model (1), we are interested in testing the following null hypothesis:
H0 : ρ = 0. First, we present an infeasible Lagrange multiplier (score) test that has an asymptotic
$\chi^2_1$ distribution. Then, we show that, conditional on v2, the VAT on the generalized residual is asymptotically
equivalent to the infeasible LM test and thus has the same asymptotic $\chi^2_1$ distribution. In practice, in
order to account for the sampling error in v̂2, we bootstrap the two-step procedure to obtain the p-value of
the test. Let γ ≡ (β,θ) and wi ≡ (xi1,vi2). Let d̃i be di in (8) evaluated at ρ = 0 and γ̃ be the estimates
of γ obtained from the restricted model. The restricted model is one where ρ = 0, so we treat y3 as an
exogenous explanatory variable. Let q̂3i be q3i in (9) evaluated at the parameters (δ̂, η̂) from a reduced-form
probit estimation.
As in Semykina and Wooldridge (2017), using the likelihood function Li ≡ P (yi1, yi3|yi2, zi,vi2)
for one observation, the LM statistic plugs the estimates from the restricted model into the score from the
unrestricted model:
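$$
LM = \left( \sum_{i=1}^{N} \tilde{S}_{i,\rho} \right)' \tilde{A}^{22} \left[ \tilde{V}_{22} \right]^{-1} \tilde{A}^{22} \left( \sum_{i=1}^{N} \tilde{S}_{i,\rho} \right) \Big/ N, \tag{39}
$$
where
$$
\tilde{S}_{i,\rho} \equiv \left. \frac{\partial \ln L_i}{\partial \rho} \right|_{\gamma=\tilde{\gamma},\,\rho=0}
= \frac{y_{i1} - \Phi(\tilde{d}_i)}{\Phi(\tilde{d}_i)\left[1-\Phi(\tilde{d}_i)\right]}\, \phi(\tilde{d}_i)\, \widehat{gr}_{i3},
$$
$$
\begin{aligned}
\tilde{A} &\equiv -\frac{1}{N} \left. \begin{pmatrix}
\sum_{i=1}^{N} E\left( \frac{\partial^2 \ln L_i}{\partial\gamma\,\partial\gamma'} \,\middle|\, y_{i3}, \mathbf{y}_{i2}, \mathbf{z}_i, \mathbf{v}_{i2} \right) &
\sum_{i=1}^{N} E\left( \frac{\partial^2 \ln L_i}{\partial\rho\,\partial\gamma'} \,\middle|\, y_{i3}, \mathbf{y}_{i2}, \mathbf{z}_i, \mathbf{v}_{i2} \right) \\
\sum_{i=1}^{N} E\left( \frac{\partial^2 \ln L_i}{\partial\gamma\,\partial\rho} \,\middle|\, y_{i3}, \mathbf{y}_{i2}, \mathbf{z}_i, \mathbf{v}_{i2} \right) &
\sum_{i=1}^{N} E\left( \frac{\partial^2 \ln L_i}{\partial\rho\,\partial\rho} \,\middle|\, y_{i3}, \mathbf{y}_{i2}, \mathbf{z}_i, \mathbf{v}_{i2} \right)
\end{pmatrix} \right|_{\gamma=\tilde{\gamma},\,\rho=0} \\
&= \frac{1}{N} \begin{pmatrix}
\sum_{i=1}^{N} \frac{\phi(\tilde{d}_i)^2}{\Phi(\tilde{d}_i)[1-\Phi(\tilde{d}_i)]}\, \mathbf{w}_i'\mathbf{w}_i &
\sum_{i=1}^{N} \frac{\phi(\tilde{d}_i)^2}{\Phi(\tilde{d}_i)[1-\Phi(\tilde{d}_i)]}\, \mathbf{w}_i'\,\widehat{gr}_{i3} \\
\sum_{i=1}^{N} \frac{\phi(\tilde{d}_i)^2}{\Phi(\tilde{d}_i)[1-\Phi(\tilde{d}_i)]}\, \widehat{gr}_{i3}\,\mathbf{w}_i &
\sum_{i=1}^{N} \frac{\phi(\tilde{d}_i)^2}{\Phi(\tilde{d}_i)[1-\Phi(\tilde{d}_i)]}\, \widehat{gr}_{i3}^{\,2}
\end{pmatrix},
\end{aligned}
$$
$$
\tilde{A}^{-1} = \begin{pmatrix} \tilde{A}^{11} & \tilde{A}^{12} \\ \tilde{A}^{21} & \tilde{A}^{22} \end{pmatrix}, \qquad
\tilde{V} = \tilde{A}^{-1}\tilde{B}\tilde{A}^{-1} = \begin{pmatrix} \tilde{V}_{11} & \tilde{V}_{12} \\ \tilde{V}_{21} & \tilde{V}_{22} \end{pmatrix}, \qquad
\tilde{B} \equiv \frac{1}{N} \sum_{i=1}^{N} \left( \tilde{S}_{i,\rho}\tilde{S}_{i,\rho}' \right),
$$
$$
\widehat{gr}_{i3} \equiv y_{i3}\, \frac{\phi(\hat{q}_{3i})}{\Phi(\hat{q}_{3i})} - (1-y_{i3})\, \frac{\phi(-\hat{q}_{3i})}{\Phi(-\hat{q}_{3i})}. \tag{40}
$$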
Matrix Ã above is an estimator of the expected value of the negative Hessian matrix that uses the expected
Hessian form. The outer product of scores or the usual Hessian form of the matrix could also be used. ĝri3 is a
consistent estimator of gri3 ≡ E (vi3|yi3,yi2, zi,vi2).
A VAT can be carried out by the following procedure of testing on generalized residuals:
1. Use OLS to estimate the reduced-form equations (1b) for yi2 to obtain v̂i2.
2. Use probit to estimate the augmented reduced form for yi3 in (5b), and construct ĝri3 according to the
formula in equation (40).
3. Augment equation (5a) by ĝri3 and estimate by probit. Use the t statistic on ĝri3 for testing the single
hypothesis.
Under the null hypothesis the coefficient on ĝri3 is zero, and so estimation of the parameters in ĝri3 does
not affect the √N-asymptotic distribution of the test statistic. There is no need to account for the first-step
estimation of ĝri3 when performing the test. However, as in Wooldridge (2010, Section 12.5.2), we need
to adjust for the first-step estimation of vi2, by stacking the moment conditions or by bootstrapping the
two-step procedure.
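The generalized residual constructed in step 2 of the procedure above is a simple function of the binary EEV and the fitted probit index. The sketch below implements formula (40) for one observation; the index value 0.7 is an arbitrary illustration, and the probit fits of steps 2 and 3 are assumed to come from standard software and are not shown.

```python
import math

def norm_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def norm_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def generalized_residual(y3, q3):
    """Formula (40): y3 * phi(q3)/Phi(q3) - (1 - y3) * phi(-q3)/Phi(-q3)."""
    if y3 == 1:
        return norm_pdf(q3) / norm_cdf(q3)
    return -norm_pdf(-q3) / norm_cdf(-q3)

q3 = 0.7  # hypothetical fitted index z*delta_hat + v2_hat*eta_hat

gr_treated = generalized_residual(1, q3)    # positive: inverse Mills ratio
gr_untreated = generalized_residual(0, q3)  # negative

# Weighting the two branches by P(y3 = 1) = Phi(q3) and P(y3 = 0) = Phi(-q3)
# gives zero, so the generalized residual has conditional mean zero.
mean_gr = norm_cdf(q3) * gr_treated + norm_cdf(-q3) * gr_untreated
```

Adding this variable to the probit for y1 and inspecting its t statistic is the variable addition test; a bootstrap over all steps accounts for the estimation of v̂2.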
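The following shows that, conditional on vi2, the variable addition test is asymptotically equivalent to
the LM test. Write the second-step likelihood function as
$$
L_i = \Phi\left(\mathbf{x}_{i1}\boldsymbol{\beta} + \mathbf{v}_{i2}\boldsymbol{\theta} + \tau\, gr_{i3}\right)^{y_{i1}} \left[ 1 - \Phi\left(\mathbf{x}_{i1}\boldsymbol{\beta} + \mathbf{v}_{i2}\boldsymbol{\theta} + \tau\, gr_{i3}\right) \right]^{1-y_{i1}}. \tag{41}
$$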
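As mentioned above, we ignore the fact that gri3 is estimated consistently at the first step. The score
vector of (41) is
$$
S_i = \begin{pmatrix} \dfrac{\partial \ln L_i}{\partial \gamma} \\[2ex] \dfrac{\partial \ln L_i}{\partial \tau} \end{pmatrix}
= \frac{y_{i1} - \Phi\left(\mathbf{w}_i\gamma + \tau\, gr_{i3}\right)}{\Phi\left(\mathbf{w}_i\gamma + \tau\, gr_{i3}\right)\left[1 - \Phi\left(\mathbf{w}_i\gamma + \tau\, gr_{i3}\right)\right]}\,
\phi\left(\mathbf{w}_i\gamma + \tau\, gr_{i3}\right)
\begin{pmatrix} \mathbf{w}_i' \\ gr_{i3} \end{pmatrix}. \tag{42}
$$
Summing the score vector over all i and using a mean-value expansion about the true parameter vector
gives
$$
N^{-1/2} \sum_{i=1}^{N} \hat{S}_i = N^{-1/2} \sum_{i=1}^{N} S_i - A \sqrt{N} \begin{pmatrix} \hat{\gamma} - \gamma \\ \hat{\tau} - \tau \end{pmatrix} + o_p(1) = 0, \tag{43}
$$
where Ŝi is the score vector evaluated at the estimated parameters (γ̂′, τ̂)′, and A is the expected value of
the negative Hessian matrix. Rearranging gives
$$
\sqrt{N} \begin{pmatrix} \hat{\gamma} - \gamma \\ \hat{\tau} - \tau \end{pmatrix} = A^{-1} \left[ N^{-1/2} \sum_{i=1}^{N} S_i \right] + o_p(1). \tag{44}
$$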
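When testing H0 : τ = 0, the robust Wald test statistic is given by
$$
W = (\hat{\tau} - \tau)' \left( \hat{V}_{22}/N \right)^{-1} (\hat{\tau} - \tau) = \sqrt{N}\,(\hat{\tau} - \tau)'\, \hat{V}_{22}^{-1}\, \sqrt{N}\,(\hat{\tau} - \tau), \tag{45}
$$
where
$$
\hat{V} = \hat{A}^{-1}\hat{B}\hat{A}^{-1} = \begin{pmatrix} \hat{V}_{11} & \hat{V}_{12} \\ \hat{V}_{21} & \hat{V}_{22} \end{pmatrix}, \tag{46}
$$
$$
\hat{B} = \frac{1}{N} \sum_{i=1}^{N} \left( \tilde{S}_{i,\rho}\tilde{S}_{i,\rho}' \right), \tag{47}
$$
$$
\hat{A} = \frac{1}{N} \begin{pmatrix}
\sum_{i=1}^{N} \frac{\phi(\hat{p}_i)^2}{\Phi(\hat{p}_i)[1-\Phi(\hat{p}_i)]}\, \mathbf{w}_i'\mathbf{w}_i &
\sum_{i=1}^{N} \frac{\phi(\hat{p}_i)^2}{\Phi(\hat{p}_i)[1-\Phi(\hat{p}_i)]}\, \mathbf{w}_i'\,\widehat{gr}_{i3} \\
\sum_{i=1}^{N} \frac{\phi(\hat{p}_i)^2}{\Phi(\hat{p}_i)[1-\Phi(\hat{p}_i)]}\, \widehat{gr}_{i3}\,\mathbf{w}_i &
\sum_{i=1}^{N} \frac{\phi(\hat{p}_i)^2}{\Phi(\hat{p}_i)[1-\Phi(\hat{p}_i)]}\, \widehat{gr}_{i3}^{\,2}
\end{pmatrix}, \tag{48}
$$
$$
\hat{p}_i = \mathbf{w}_i\hat{\gamma} + \hat{\tau}\, \widehat{gr}_{i3}, \tag{49}
$$
$$
\hat{A}^{-1} \xrightarrow{p} A^{-1} = \begin{pmatrix} A^{11} & A^{12} \\ A^{21} & A^{22} \end{pmatrix}. \tag{50}
$$
So the Wald statistic can also be written as
$$
W = \left( \sum_{i=1}^{N} S_{i,\tau} \right)' A^{22}\, \hat{V}_{22}^{-1}\, A^{22} \left( \sum_{i=1}^{N} S_{i,\tau} \right) \Big/ N. \tag{51}
$$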
Under the null of no selection bias (τ = 0, ρ = 0), the score and Hessian matrices used in (39) and (51)
are the same when evaluated at the true parameter values. When the null is true, $\hat{\tau} \xrightarrow{p} 0$, and $\sqrt{N}(\hat{\gamma} - \gamma)$ and
$\sqrt{N}(\tilde{\gamma} - \gamma)$ converge in distribution. Therefore, $LM - W \xrightarrow{p} 0$, so the tests are asymptotically equivalent.
A p-value for the test can be obtained by bootstrapping the two-step procedure.
4 Quasi-LIML and Fractional Response
Based on the literature on quasi-MLE (White, 1982), the findings above carry through if f1 is a fractional
response with a conditional mean that has a probit form. The key insight from quasi-likelihood
estimation is that we do not need to know the true distribution of the entire model to obtain consistent
parameter estimates. The same likelihood function can therefore be applied when the outcome is a fractional
response, as long as we model its conditional mean to have a probit form. Because the Bernoulli distribution
is in the linear exponential family, quasi-LIML identifies the parameters of a correctly specified conditional
mean regardless of misspecification in other aspects of the distribution.
Namely,
E (f1|x1, c1) = Φ (x1β+c1) (52a)
y2 = zΠ + v2 (52b)
y3 = 1 [zδ + u3 ≥ 0] , (52c)
where c1 is an omitted variable thought to be correlated with y2 and y3. Assuming that c1 is jointly
normally distributed with v2 and u3, the linear projections of c1 and u3 onto v2 have the following error form:
c1 = v2θ+a1 (53a)
u3 = v2η + v3 (53b)
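where $\boldsymbol{\theta} \equiv E(\mathbf{v}_2'\mathbf{v}_2)^{-1}E(\mathbf{v}_2'c_1)$ and $\boldsymbol{\eta} \equiv E(\mathbf{v}_2'\mathbf{v}_2)^{-1}E(\mathbf{v}_2'u_3)$. Plugging the linear projections (53a) and
(53b) back into (52a) and (52c), we have an augmented equation for the conditional mean of f1 and the reduced
form for y3: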
E (f1|x1,v2, a1) = Φ (x1β + v2θ+a1) (54a)
y3 = 1 [zδ + v2η + v3 ≥ 0] , (54b)
where a1 is the remaining unobservable factor that, after conditioning on v2, captures the additional
endogeneity from y3 through v3. Again, assume that a1 and v3 are jointly normal:
$$
D\left( \begin{pmatrix} a_1 \\ v_3 \end{pmatrix} \,\middle|\, \mathbf{v}_2 \right)
\sim \mathrm{Normal}\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix},
\begin{pmatrix} \sigma_a^2 & \rho\sigma_a \\ \rho\sigma_a & 1 \end{pmatrix} \right), \tag{55}
$$
where $\sigma_a^2 \equiv \mathrm{Var}(a_1)$ and ρ is the correlation between a1 and v3. Further averaging out the unobservable a1, the
conditional mean of the joint distribution of f1 and y3 has exactly the same form as the probit model with many
continuous EEVs and one binary EEV in (1):
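\[
\begin{aligned}
E(f_1, y_3 = 1 \mid \mathbf{z}, \mathbf{v}_2, \mathbf{y}_2)
 &= E(y_1, y_3 = 1 \mid \mathbf{z}, \mathbf{v}_2, \mathbf{y}_2)
  = P(y_1 = 1, y_3 = 1 \mid \mathbf{z}, \mathbf{v}_2, \mathbf{y}_2) \\
 &= \int_{-q_3}^{\infty} \Phi(d)\,\phi(\upsilon_3)\,d\upsilon_3,
\end{aligned} \tag{56a}
\]
\[
\begin{aligned}
E(f_1, y_3 = 0 \mid \mathbf{z}, \mathbf{v}_2, \mathbf{y}_2)
 &= E(y_1, y_3 = 0 \mid \mathbf{z}, \mathbf{v}_2, \mathbf{y}_2)
  = P(y_1 = 1, y_3 = 0 \mid \mathbf{z}, \mathbf{v}_2, \mathbf{y}_2) \\
 &= \int_{-\infty}^{-q_3} \Phi(d)\,\phi(\upsilon_3)\,d\upsilon_3,
\end{aligned} \tag{56b}
\]
where
\[
d \equiv \frac{\mathbf{x}_1\mathbf{b} + \mathbf{v}_2\mathbf{r} + \rho\upsilon_3}{\sqrt{1 + (1-\rho^2)\sigma_a^2}}, \tag{57a}
\]
\[
q_3 \equiv \mathbf{z}\mathbf{d} + \mathbf{v}_2\mathbf{g}. \tag{57b}
\]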
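Because the Bernoulli log likelihood belongs to the linear exponential family, the solution to the
following maximization problem identifies (β,θ):
\[
\begin{aligned}
\max_{\mathbf{b}_1 \in \mathbb{R}^{K_1},\, \mathbf{d} \in \mathbb{R}^{L},\, \mathbf{r}_1 \in \mathbb{R}^{G},\, \mathbf{g} \in \mathbb{R}^{G},\, \rho_1 \in \mathbb{R}}
 \; & E\left[\log P(f_1, y_3 \mid \mathbf{y}_2, \mathbf{z}, \mathbf{v}_2)\right] \\
 = E\Big[\, & f_1 y_3 \log \int_{-q_3}^{\infty} \Phi(d)\,\phi(\upsilon_3)\,d\upsilon_3 \\
 & + (1-f_1)\, y_3 \log \int_{-q_3}^{\infty} \left[1-\Phi(d)\right]\phi(\upsilon_3)\,d\upsilon_3 \\
 & + f_1 (1-y_3) \log \int_{-\infty}^{-q_3} \Phi(d)\,\phi(\upsilon_3)\,d\upsilon_3 \\
 & + (1-f_1)(1-y_3) \log \int_{-\infty}^{-q_3} \left[1-\Phi(d)\right]\phi(\upsilon_3)\,d\upsilon_3 \,\Big].
\end{aligned} \tag{58}
\]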
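As a rough numerical illustration of the objective in (58), the sketch below evaluates one observation's contribution by approximating the four integrals with a trapezoid rule. It is not the estimation code used in the paper. The names `cell_integral`, `quasi_loglik_obs`, `index` (standing for x1b + v2r), `steps` and `bound` are assumptions introduced here, and the integration range is truncated at ±8 for convenience.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()

def cell_integral(index, q3, rho, sigma_a, lower_tail, steps=4000, bound=8.0):
    """Trapezoid approximation of the integral of Phi(d(u)) * phi(u) du.

    Integrates over [-q3, +inf) when lower_tail is False and over
    (-inf, -q3] when lower_tail is True, truncating the range at +/- bound.
    `index` stands for x1*b + v2*r in (57a); this name is hypothetical.
    """
    scale = math.sqrt(1.0 + (1.0 - rho ** 2) * sigma_a ** 2)
    cut = min(max(-q3, -bound), bound)
    lo, hi = (-bound, cut) if lower_tail else (cut, bound)
    if hi <= lo:
        return 0.0
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        u = lo + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        d = (index + rho * u) / scale
        total += weight * STD_NORMAL.cdf(d) * STD_NORMAL.pdf(u)
    return total * h

def quasi_loglik_obs(f1, y3, index, q3, rho, sigma_a):
    """One observation's contribution to the objective in (58)."""
    p11 = cell_integral(index, q3, rho, sigma_a, lower_tail=False)
    p10 = cell_integral(index, q3, rho, sigma_a, lower_tail=True)
    # The [1 - Phi(d)] integrals follow from P(y3 = 1) = Phi(q3).
    p01 = STD_NORMAL.cdf(q3) - p11
    p00 = (1.0 - STD_NORMAL.cdf(q3)) - p10
    return (f1 * y3 * math.log(p11)
            + (1 - f1) * y3 * math.log(p01)
            + f1 * (1 - y3) * math.log(p10)
            + (1 - f1) * (1 - y3) * math.log(p00))
```

A quick sanity check under these assumptions: with ρ = 0 and σa = 0 the index does not depend on υ3, so the first integral collapses to Φ(index)·Φ(q3). Because f1 enters only as a weight, the same function accepts a binary y1 or a fractional f1 in [0, 1].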
5 Monte Carlo Simulations
In this section, six Monte Carlo experiments are conducted to compare the finite-sample behavior of
different estimators for binary response models with both continuous and discrete EEVs. The six Monte Carlo
experiments fall into two designs. In the first design, the error terms (u1,v2, u3) are jointly normally distributed.
In the second design, conditional on v2, u1 and u3 are assumed to have a bivariate normal distribution. For
each design, three data generating processes (DGPs) are considered: a just-identified case, an over-identified
case, and a switching model with two regimes. Nine estimators are compared in each case: four assume
a linear probability model for the binary outcome, and the other five acknowledge the nonlinear functional
form. APEs are simulated for those estimators that respect the nonlinear functional form and are compared
with the coefficients from the linear estimators.
More specifically, in the first design of joint normality, the DGP for the Just ID case is
y1 = 1 [−y2 + y3 + 0.3z1 + 0.3z2 + 0.5v2 + 0.5v3 + r1 > 0]
y2 = 0.1z1 + 0.2z2 + 0.1z3 + z4 + v2 (59)
y3 = 1 [0.2z1 + 0.1z2 + z3 + 0.1z4 + 0.5v2 + v3 > 0] ,
where
u1 = 0.5v2 + v1 (60)
v1 = 0.5v3 + r1 (61)
r1 ∼ Normal (0, 0.5) (62)
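so that
\[
\begin{pmatrix} u_1 \\ v_2 \\ v_3 \end{pmatrix}
\sim \mathrm{Normal}\!\left( \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix},
\begin{pmatrix} 1 & 0.5 & 0.5 \\ 0.5 & 1 & 0 \\ 0.5 & 0 & 1 \end{pmatrix} \right). \tag{63}
\]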
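As a check on (63), the second moments can be worked out from (60)-(62). The derivation below assumes that v2, v3 and r1 are mutually independent with Var(v2) = Var(v3) = 1, and that the 0.5 in (62) is a variance; these readings are inferred from (63) rather than stated explicitly.

```latex
\begin{aligned}
u_1 &= 0.5\,v_2 + v_1 = 0.5\,v_2 + 0.5\,v_3 + r_1, \\
\mathrm{Var}(u_1) &= 0.5^2\,\mathrm{Var}(v_2) + 0.5^2\,\mathrm{Var}(v_3) + \mathrm{Var}(r_1)
                   = 0.25 + 0.25 + 0.5 = 1, \\
\mathrm{Cov}(u_1, v_2) &= 0.5\,\mathrm{Var}(v_2) = 0.5, \qquad
\mathrm{Cov}(u_1, v_3) = 0.5\,\mathrm{Var}(v_3) = 0.5, \qquad
\mathrm{Cov}(v_2, v_3) = 0.
\end{aligned}
```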
The continuous EEV y2 and the binary EEV y3 are generated to have coefficients of opposite signs in order to
show how biased estimators react to the sign difference. The exogenous variables are generated as:
z1 ∼ Normal (0, 1)
e2 ∼ Normal (0, 1)
z2 = 1 [e2 > 0]
z3 ∼ Normal (0, 1)
e4 ∼ Normal (0, 1)
z4 = 1 [e4 > 0] ,
where the continuous z3 is the instrument mainly for binary EEV y3 and the binary z4 is the instrument
mainly for continuous EEV y2. To make them valid instruments, z3 and z4 are excluded from the structural
equation.
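The Just ID design in (59)-(63) can be simulated with a few lines of standard-library Python. This is only a sketch of the DGP as written above, not the authors' simulation code. The function name `simulate_just_id` and the dictionary layout are hypothetical, and the 0.5 in (62) is read as a variance so that Var(u1) = 1.

```python
import math
import random

def simulate_just_id(n, seed=0):
    """Draw n observations from the Just ID design in (59)-(63)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        # Exogenous variables: z1, z3 continuous; z2, z4 binary.
        z1 = rng.gauss(0.0, 1.0)
        z2 = 1.0 if rng.gauss(0.0, 1.0) > 0 else 0.0
        z3 = rng.gauss(0.0, 1.0)
        z4 = 1.0 if rng.gauss(0.0, 1.0) > 0 else 0.0
        # Error terms; r1 has variance 0.5 (assumption about (62)).
        v2 = rng.gauss(0.0, 1.0)
        v3 = rng.gauss(0.0, 1.0)
        r1 = rng.gauss(0.0, math.sqrt(0.5))
        u1 = 0.5 * v2 + 0.5 * v3 + r1
        # Continuous EEV, binary EEV, and binary outcome.
        y2 = 0.1 * z1 + 0.2 * z2 + 0.1 * z3 + z4 + v2
        y3 = 1.0 if 0.2 * z1 + 0.1 * z2 + z3 + 0.1 * z4 + 0.5 * v2 + v3 > 0 else 0.0
        y1 = 1.0 if -y2 + y3 + 0.3 * z1 + 0.3 * z2 + u1 > 0 else 0.0
        rows.append({"y1": y1, "y2": y2, "y3": y3, "z1": z1, "z2": z2,
                     "z3": z3, "z4": z4, "u1": u1, "v2": v2, "v3": v3})
    return rows
```

Note that z3 and z4 enter only the reduced forms for y2 and y3, which is the exclusion restriction described above.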
In this DGP, the true ASF is defined as
ASF (x1) = Φ (−y2 + y3 + 0.3z1 + 0.3z2) . (64)
The second case of over identification has the same parameters except that we have two additional
instruments z5 and z6, where
z5 ∼ Normal (0, 1)
e6 ∼ Normal (0, 1)
z6 = 1 [e6 > 0] .
Continuous z5 is mainly for the continuous EEV y2 and binary z6 is mainly for the binary EEV y3. The true
ASF remains the same as in (64).
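To make the link between the ASF in (64) and the APEs concrete, the sketch below computes the average derivative with respect to the continuous y2 and the average discrete change for the binary y3. The function names and the three sample tuples are hypothetical, and the section does not spell out the distribution over which the true APEs are averaged, so this is an illustration of the definitions rather than a reproduction of the reported values.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()

def asf(y2, y3, z1, z2):
    """Average structural function in (64)."""
    return STD_NORMAL.cdf(-y2 + y3 + 0.3 * z1 + 0.3 * z2)

def ape_y2(sample):
    """Average derivative of the ASF with respect to the continuous y2."""
    total = 0.0
    for y2, y3, z1, z2 in sample:
        # d/dy2 Phi(index) = -1 * phi(index), since y2 has coefficient -1.
        total += -1.0 * STD_NORMAL.pdf(-y2 + y3 + 0.3 * z1 + 0.3 * z2)
    return total / len(sample)

def ape_y3(sample):
    """Average change in the ASF when the binary y3 moves from 0 to 1."""
    total = 0.0
    for y2, _, z1, z2 in sample:
        total += asf(y2, 1.0, z1, z2) - asf(y2, 0.0, z1, z2)
    return total / len(sample)

# Hypothetical covariate tuples (y2, y3, z1, z2), for illustration only.
sample = [(0.5, 1.0, 0.2, 1.0), (-0.3, 0.0, -1.0, 0.0), (1.2, 1.0, 0.0, 1.0)]
print(ape_y2(sample), ape_y3(sample))
```

Because y2 and y3 enter the index with opposite signs, the APE of y2 is negative and the APE of y3 is positive for any sample.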
In the endogenous switching case, the coefficients on the continuous EEV y2 and the correlations
between the reduced-form errors and the structural error are designed to have opposite signs
across regimes, namely
y(1)1 = 1 [−y2 + y3 + 0.3z1 + 0.3z2 + 0.5v2 + v1 > 0]
y(0)1 = 1 [0.3y2 + y3 − 0.5z1 + 0.1z2 − 0.5v2 + v0 > 0] (65)
y2 = 0.1z1 + 0.2z2 + 0.1z3 + z4 + v2
y3 = 1 [0.2z1 + 0.1z2 + z3 + 0.1z4 + 0.5v2 + v3 > 0] ,
where
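u0 = −0.5v2 + v0 (66)
v0 = −0.5v3 + r1 (67)
so that
\[
\begin{pmatrix} u_0 \\ v_2 \\ v_3 \end{pmatrix}
\sim \mathrm{Normal}\!\left( \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix},
\begin{pmatrix} 1 & -0.5 & -0.5 \\ -0.5 & 1 & 0 \\ -0.5 & 0 & 1 \end{pmatrix} \right). \tag{68}
\]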
ASF in this case is
ASF (x1) = y3Φ (−y2 + y3 + 0.3z1 + 0.3z2) + (1− y3) Φ (0.3y2 + y3 − 0.5z1 + 0.1z2) . (69)
In design 2, the parameterizations are the same as in design 1, but we assume v2 follows a demeaned χ²
distribution with one degree of freedom:
v2 ∼ χ²₁ − 1. (70)
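A demeaned χ² draw with one degree of freedom, as in (70), can be generated by squaring a standard normal draw and subtracting one. The helper name `draw_demeaned_chi2` is hypothetical; the sketch only illustrates that the resulting v2 has mean zero, variance 2, and a skewed distribution bounded below by −1.

```python
import random

def draw_demeaned_chi2(n, seed=0):
    """Draw n values of v2 = g**2 - 1 with g standard normal, as in (70)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) ** 2 - 1.0 for _ in range(n)]

draws = draw_demeaned_chi2(50000, seed=1)
mean = sum(draws) / len(draws)
var = sum((d - mean) ** 2 for d in draws) / (len(draws) - 1)
print(round(mean, 3), round(var, 3))
```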
In all experiments, the number of replications is 1000, and the results of the experiments are presented
for sample sizes of 1000, 3000 and 5000. Table 1 and Table 2 report biases and the root mean squared errors
(RMSEs) for estimators of APE for y2 and y3, respectively. Figure 1 and Figure 2 depict the empirical
distributions of estimators of APE for y2 with sample size of 1000 under design 1 and design 2, respectively.
Similarly, Figure 3 and Figure 4 depict the counterparts for y3.
For each of the above designs, coefficients of linear probability models and APEs of probit models are
compared. Further, for probit models, joint estimations with the binary EEV or all EEVs are compared with
two-step estimations with control function terms (residuals or generalized residuals) plugged in. In addition,
a switching version of each model is considered to account for the endogenous switching DGP in case 3.
More specifically, CF Biprobit is the control function approach inserting first-stage residual from reduced-
form estimation of y2 into the second-step joint estimation between y1 and y3. CF Biprobit Switching
performs Heckman probit with sample selection for y(1)1 and y(0)1 separately using y3 as a sample selection
indicator. CF Probit avoids the joint estimation with y3 by inserting a generalized residual from y3 as a
proxy for endogeneity given residual from y2. CF Probit Switching performs the CF Probit separately for
sub-samples defined by y3. CF Linear inserts a residual from y2 and a generalized residual from y3 into the
linear probability model for y1. CF Linear Switching allows for a full set of interactions between y3 and
other observables and unobservables in the linear probability model. Usual 2SLS uses linear probability
models for both y1 and y3 and applies the usual two-step IV estimation. Optimal IV uses predicted values
from the reduced forms for y2 and y3 as instruments for a linear probability model of y1; y2 is predicted from
a linear model and y3 from a probit model. Joint MLE is a full joint estimation of y1, y2 and y3.
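The generalized residual that CF Probit and CF Linear insert as a control function has a simple closed form for a probit first stage. The sketch below uses the standard expression y3·λ(index) − (1 − y3)·λ(−index), with λ = φ/Φ. The function name and the `index` argument (the fitted first-stage index, including the control function term for y2) are assumptions for illustration.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()

def generalized_residual(y3, index):
    """Probit generalized residual for one observation.

    index is the fitted first-stage probit index. For very large
    |index| the denominators approach zero, so production code would
    need a numerically safer inverse Mills ratio.
    """
    if y3 == 1:
        return STD_NORMAL.pdf(index) / STD_NORMAL.cdf(index)
    return -STD_NORMAL.pdf(index) / (1.0 - STD_NORMAL.cdf(index))
```

By construction the generalized residual has mean zero given the index, since Φ(index) times the y3 = 1 branch exactly offsets (1 − Φ(index)) times the y3 = 0 branch.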
For the APE of y2 under joint normality as in Figure 1, CF Biprobit and Joint MLE are the consistent
estimators in the Just ID case and Over ID case while CF Biprobit Switching is the consistent estimator in the
Switching case. Their empirical distributions are centered around the true APE depicted by the red vertical
line. Besides those consistent estimators, approximations provided by the CF Probit (or CF Probit Switching
in the Switching case) outperform, to a great extent, the approximations provided by the linear probability
estimators such as CF Linear (or CF Linear Switching in the Switching case), Usual 2SLS and Optimal IV.
In fact, in the Switching case, CF Probit Switching and CF Biprobit Switching (the consistent estimator in
this case) seem to completely overlap with each other, suggesting a negligible amount of bias. In the Just
ID case and Over ID case, CF Probit has a mild amount of upward bias and a slightly lower peak than CF
Biprobit and Joint MLE. In contrast, approximations provided by linear probability model estimators (CF
Linear, Usual 2SLS, Optimal IV and CF Linear Switching) have a significant amount of downward bias in
all cases. The differences in bias within the linear probability model estimators are not noticeable: they all
seem to cluster together. In the Switching case, they are joined by the misspecified CF Biprobit and Joint
MLE which have a similar amount of downward bias. When CF Biprobit and Joint MLE are consistent, they
still completely overlap with each other. This happens not only in the Just ID case but also in the Over ID
case, suggesting a negligible amount of efficiency loss by carrying out a two-step procedure. CF Biprobit
Switching and CF Probit Switching, however, suffer a slightly flatter peak compared to their counterpart
non-switching estimators (CF Biprobit and CF Probit) in the Just ID case and Over ID case, indicating an
efficiency loss from a more complex parameterization.
When the error terms are only conditionally normal, the estimators of the APE of y2 have fairly different
finite-sample behavior from that under joint normality. As reflected in Figure 2, Joint MLE lacks robustness
and is no longer consistent in any case. As before, CF Biprobit is the consistent estimator in
the Just ID case and Over ID case while CF Biprobit Switching is the consistent estimator in the Switching
case. Approximations provided by the CF Probit (or CF Probit Switching in the Switching case) are still the
best: they almost overlap with those provided by CF Biprobit (or CF Biprobit Switching in the Switching
case), the consistent estimator. Joint MLE is biased upwards to a noticeable degree in the Just ID case and
Over ID case. In the Switching case, where Joint MLE is misspecified, it is biased downward and joined
by other inconsistent estimators like CF Biprobit and the linear probability estimators (CF Linear, Usual 2SLS
and Optimal IV). CF Linear Switching performs mildly better than the other linear estimators in the Switching
case but is still more biased than the consistent estimator. Overall, the linear probability estimators
continue to perform poorly in all cases: they are heavily biased downwards. CF Biprobit Switching and CF
Probit Switching still lead to an efficiency loss, indicated by flatter peaks, in the Just ID case and Over ID case.
Under joint normality, APE of y3 follows a similar pattern as that of y2, with some minor differences.
As in Figure 3, CF Biprobit and Joint MLE still overlap with each other in all cases, whether as consistent
estimators in the Just ID case and Over ID case, or as misspecified estimators in the Switching case. The
approximation provided by CF Probit (or CF Probit Switching in the Switching case) is still the best but with
a flatter peak than those in Figure 1. The linear probability estimators are biased upwards. The differences
in empirical distributions for the linear estimators are more pronounced in the binary EEV y3 than for
continuous EEV y2. Particularly, CF Linear using the generalized residual is no longer close to Usual 2SLS
using linear probability model for y3. In the Switching case, linear probability estimators all lie between the
CF Biprobit Switching and CF Biprobit with varying degrees of bias and precision.
APEs of y3 under conditional normality are sketched in Figure 4. As with the estimators in Figure 3, they
follow patterns similar to those of their counterparts for y2. More specifically, Joint MLE is biased downwards in
the Just ID case and Over ID case but biased upwards in the Switching case. Similarly, linear probability
estimators are biased upwards rather than downwards. CF Probit (or CF Probit Switching in the Switching
case) still provides the best approximations in all cases, significantly better than linear probability estimators.
Table 1 and Table 2 report the bias and RMSE of the estimated APEs of y2 and y3, respectively, for all the
estimators in the six cases. Despite the difference in sign and magnitude, the patterns of the estimators for y2
and y3 are similar. Methods using CF approaches are listed in Columns (1) through (6), followed by conventional
methods (IV 2SLS, Opt. IV 2SLS and Joint MLE) in Columns (7) to (9). As the sample size increases
from 1000 to 5000, the bias of CF Biprobit in the Just ID case and Over ID case (or CF Biprobit
Switching in the Switching case) shrinks toward zero. Their RMSEs also decrease by about half at
the same time. The bias of CF Probit in the Just ID case and Over ID case (or CF Probit Switching in the
Switching case) is small at a sample size of 1000 but shrinks by a smaller magnitude as the sample size increases
to 5000. The RMSEs of CF Probit and CF Probit Switching also decrease by about half as the sample size
increases. The bias of the linear probability estimators (CF 2SLS, CF 2SLS Switching, IV 2SLS and Opt.
IV 2SLS) and of the misspecified Joint MLE, however, is large to start with and does not shrink, or even
increases in some cases, as the sample size increases. Their RMSEs also do not decrease as much.
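For reference, the two summary statistics reported in Tables 1 and 2 can be computed from the replication-level APE estimates as follows. The sign convention (estimate minus true value) is an assumption chosen to match the discussion of upward and downward bias in the text; the function name `bias_and_rmse` is hypothetical.

```python
import math

def bias_and_rmse(estimates, true_value):
    """Monte Carlo bias and root mean squared error of an estimator.

    estimates holds one APE estimate per replication. Bias is taken as
    the mean of (estimate - true_value); this sign convention is assumed.
    """
    errors = [est - true_value for est in estimates]
    bias = sum(errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return bias, rmse
```

Since RMSE² equals bias² plus the sampling variance, an estimator whose bias does not vanish has an RMSE that stops shrinking as N grows, which is the pattern described above for the linear probability estimators.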
In summary, the Monte Carlo results show that CF Biprobit does not lose efficiency compared to Joint
MLE in the correctly specified case. CF Probit (or CF Probit Switching in the Switching case) provides good
approximations, outperforming linear estimators of any sort to a great extent.
6 Empirical Illustration
As an empirical illustration, we revisit the empirical example of Murtazashvili and Wooldridge (2015)
under different functional form assumptions and estimation methods. Murtazashvili and Wooldridge (2015)
study the sensitivity of the budget share of housing expenditure to price and total expenditure using a linear
probability panel data model with many sources of heterogeneity. Total expenditure is considered to be the
continuous EEV because of its joint determination with the budget share on housing expenditure. Home-
ownership dummy is considered to be the binary EEV. It is also assumed to play the role of an endogenous
switching indicator that is employed for the budget share of housing expenditure equation. Here, instead,
we employ a fractional response model with switching, as in (71), that acknowledges the fractional nature
of the budget share and therefore has built-in heterogeneity.
E(HousingShare|x1, c1, c0) = Φ [β0 + β1Log(Expend.) + β2Homeowner
+z1β3 + β4Log(Expend.) · Homeowner
+β5Homeowner · z1 + c0 + Homeowner · c1] , (71a)
Log(Expend.) = ζ0 + zζ1 + v2, (71b)
Homeowner = 1 [γ0 + zγ1 + u3 > 0] . (71c)
We also use just one cross-sectional period from the sample, which turns out to give fairly close estimates
of marginal effect for the variables of interest to the panel linear model with many sources of heterogeneity.
The summary APEs from the nonlinear model are compared to the coefficients from the linear
probability panel data models.
The sample employed in the estimation is the 2001 wave of the Panel Study of Income Dynamics (PSID)
that consists of 2355 owners and 629 renters. Since we suspect that the homeownership dummy indicates
switching into differing regions, we report separate summary statistics for different home ownership statuses
as in Table 3. Because of the way the dependent variable, the housing budget share, is constructed, and the
increase in home prices, 84 out of 2355 homeowners have a negative housing budget share. As the dependent
variable has to lie in the unit interval for a fractional response, the housing budget shares for these 84
homeowners are set to their lower bound of zero. On average, owners spend smaller budget shares on housing than renters.
The total expenditure and income of owners are greater than those of renters. Log price, age of the household
head, marital status, whether recently moved and race are the exogenous control variables. Log income
is considered to be the instrument primarily for the log expenditure, whereas years of education of the
household head and number of children in the household are instruments mainly for home ownership.
Table 4 reports the first-stage reduced-form estimation for the two EEVs. Linear reduced form regres-
sions are reported for log total expenditure, the continuous EEV, in Column (1) and (2). Probit regression is
reported for home ownership, the binary EEV, in Column (3) to (5). Slight variations in the specifications are
reported in each case. For example, age squared is included in Column (2) in addition to age in its level form
as in Column (1), which turns out to be significant but practically unimportant. For any specification, the
instruments mentioned above are strong enough. The probit reduced forms for the homeownership dummy
are reported with and without the continuous EEV. The predicted value of home ownership from Column
(3) contains only an exogenous variable and is used as an instrument in Regression (3) in Table 5. Columns
(4) and (5) show that including the residual from the reduced form of the log expenditure is sufficient to
control for all the endogeneity from the total expenditure in the home ownership equation.
Table 5 compares the APEs from the fractional response models to the coefficients of linear models.
Columns (1) to (4) report the coefficients from linear models for the housing budget share, and Columns
(5) to (10) report the APEs from fractional response models. The same set of estimators as in the Monte
Carlo study are compared here, the only difference being that the dependent variable is a fractional response,
instead of a binary response. A “Frac” is added to the names to indicate that a quasi-probit is assumed for the
housing budget share. When homeownership is jointly estimated in the fractional probit, as in Column
(9) and Column (10), the biprobit and heckprobit commands in Stata are modified to allow for a fractional
dependent variable. The standard errors for the estimates of APEs are bootstrapped.
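A bootstrap of a two-step estimator has to rerun both steps on every resample so that the first-stage sampling error feeds into the standard error. The sketch below shows the generic resampling loop only. The name `bootstrap_se`, the `estimator` callback and the default of 500 replications are assumptions for illustration and are not taken from the paper.

```python
import random
import statistics

def bootstrap_se(data, estimator, reps=500, seed=0):
    """Nonparametric bootstrap standard error of estimator(data).

    estimator must rerun every step (first stage, second stage, APE)
    on the resampled rows so that first-stage noise is reflected.
    """
    rng = random.Random(seed)
    n = len(data)
    draws = []
    for _ in range(reps):
        # Resample rows with replacement, keeping the sample size fixed.
        resample = [data[rng.randrange(n)] for _ in range(n)]
        draws.append(estimator(resample))
    return statistics.stdev(draws)

# Toy usage: the "estimator" is a sample mean standing in for an APE.
toy = [float(i % 10) for i in range(200)]
print(bootstrap_se(toy, lambda rows: sum(rows) / len(rows), reps=500, seed=1))
```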
As we can tell from Table 5, first of all, failing to account for endogeneity, whether as in the linear
model represented in Column (1) or as in the Frac Probit model as in Column (5), leads to fairly different
estimates from those methods that take care of endogeneity using the same models. Among the linear
probability models that have accounted for endogeneity, the estimates from IV 2SLS differ significantly
from those of Opt. IV 2SLS and CF 2SLS. Both Opt. IV 2SLS and CF 2SLS are close to the APEs in the Frac
Probit models that have accounted for endogeneity. This suggests that the relationship between the housing
budget share and the covariates of interest may be close to linear in the unit interval so that two-stage
least squares estimator provides a good approximation. Among the Frac Probit models, the estimates from
different methods of accounting for endogeneity are fairly close across the board. The difference between
conducting joint estimations with the home ownership, as in Column (9) and Column (10), and plugging
in the generalized residual from home ownership, as in Column (7) and Column (8), is small. Particularly,
if we use home ownership as the switching indicator, the estimates and standard error from Column (10)
and Column (8) are the same, at least to the third decimal place. The difference between Column (7) and
Column (9) is also negligibly small.
Table 6 reports the t statistics for testing of endogeneity and their p-values. The p-values are obtained
from bootstrapping the test statistics. Only estimators that employ CF approaches are considered in this
table. All the test statistics reported are Wald tests for significance. The names refer to the estimators, as in
Table 5. Columns (1) to (4) report the variable addition tests (VATs) on control function terms only. Columns
(5) and (6) also report Wald tests on the correlation parameter ρ (or ρ0 and ρ1 for the two switching regimes),
representing the endogeneity from home ownership equations, given the control function term from log
expenditure equation. In any case, the evidence of endogeneity from log expenditure is strong: the p-values
are identically zero in any test on the significance of v̂2, the control function term from log expenditure
equation. No test based on the fractional response model confirms endogeneity from homeownership,
whether it is a VAT on the generalized residual or a test on the correlation parameter, although the test on the
generalized residual in the linear model CF 2SLS turns out to be significant. The test statistic and p-value
from the VAT on the generalized residual in Column (3) are quite similar to those from the Wald test on the
correlation parameter ρ in Column (5), suggesting the validity of using a VAT on generalized residuals to detect
the additional endogeneity from home ownership. An LM or LR test on ρ can also be performed, but the
proper method of bootstrapping for p-values has to be determined. Another advantage of performing a VAT
on generalized residuals is its robustness to model specification. In the case of switching, a joint test
concerning the two regimes needs to be conducted to detect endogeneity. This can easily be done by performing
a joint test on the interaction terms between the control function terms and the switching indicator.
Finding a way to combine the correlation parameters ρ0 and ρ1 obtained from different regimes, however, is no
easy job.
7 Conclusion
This paper has shown applications of control function approaches to account for one binary EEV and
many continuous EEVs in binary and fractional response models. The control function approach is com-
putationally simple and allows for a flexible incorporation of heterogeneity, as in an endogenous switching
model. Partial effects based on the ASF are of causal interpretation and can be easily bootstrapped to obtain
inference due to the computational simplicity of the control function approach. A VAT test based on the
generalized residual is shown to be a valid test for detecting additional endogeneity from the binary EEV,
conditioning on the residuals from the continuous EEVs. The simulation study shows that using generalized
residuals to account for endogeneity provides a fairly good approximation to the true APE, significantly bet-
ter than approximations provided by linear probability models. Applying the CF approach to an empirical
illustration using a fractional response model for the housing budget share, we show that homeownership is
not endogenous after controlling for total expenditure, the continuous EEV. This is revealed by performing
VATs on the generalized residuals and cross validated by a Wald test on the correlation parameter. These
results imply that plugging in generalized residuals into a binary response model (or a fractional response
model) acts as a better approximation to the causal marginal effects of interest than other conventional
methods, and it is computationally simpler, enabling an easier detection of endogeneity.
A.1 Figures and Tables for Section 5
Figure 1: Empirical Distribution of APEs for y2 for the Sample Size of 1000 under Joint Normality
A.2 Tables for Section 6
Table 1: Simulation Results for APE of y2

Design 1: Joint Normality

| Case / sample size | Statistic | (1) CF Biprobit | (2) CF Biprobit Switching | (3) CF Probit | (4) CF Probit Switching | (5) CF 2SLS | (6) CF 2SLS Switching | (7) IV 2SLS | (8) Opt. IV 2SLS | (9) MLE |
|---|---|---|---|---|---|---|---|---|---|---|
| Case 1: Just ID, One Region, APE y2 = -.2650 | | | | | | | | | | |
| N=1000 | Bias | .0011 | .0020 | .0029 | .0041 | -.0566 | -.0568 | -.0541 | -.0542 | .0011 |
| N=1000 | RMSE | .0102 | .0105 | .0112 | .0117 | .0634 | .0635 | .0613 | .0614 | .0102 |
| N=3000 | Bias | .0003 | .0005 | .0018 | .0022 | -.0571 | -.0572 | -.0548 | -.0549 | .0003 |
| N=3000 | RMSE | .0054 | .0055 | .0060 | .0061 | .0593 | .0593 | .0571 | .0572 | .0054 |
| N=5000 | Bias | .0002 | .0004 | .0018 | .0020 | -.0573 | -.0574 | -.0550 | -.0551 | .0002 |
| N=5000 | RMSE | .0044 | .0045 | .0050 | .0051 | .0587 | .0587 | .0564 | .0565 | .0044 |
| Case 2: Over ID, One Region, APE y2 = -.2279 | | | | | | | | | | |
| N=1000 | Bias | .0000 | .0003 | .0008 | .0011 | -.0342 | -.0342 | -.0322 | -.0325 | .0000 |
| N=1000 | RMSE | .0068 | .0069 | .0072 | .0074 | .0359 | .0358 | .0341 | .0343 | .0067 |
| N=3000 | Bias | .0001 | .0002 | .0008 | .0009 | -.0335 | -.0334 | -.0315 | -.0317 | .0001 |
| N=3000 | RMSE | .0039 | .0039 | .0041 | .0041 | .0341 | .0339 | .0321 | .0323 | .0039 |
| N=5000 | Bias | .0000 | .0000 | .0007 | .0007 | -.0338 | -.0339 | -.0317 | -.0320 | .0000 |
| N=5000 | RMSE | .0029 | .0029 | .0031 | .0032 | .0341 | .0635 | .0321 | .0323 | .0029 |
| Case 3: Just ID, Two Regions, APE y2 = -.0902 | | | | | | | | | | |
| N=1000 | Bias | -.0248 | -.0001 | -.0237 | .0010 | -.0274 | -.0261 | -.0252 | -.0255 | -.0248 |
| N=1000 | RMSE | .0345 | .0171 | .0337 | .0171 | .0382 | .0365 | .0368 | .0369 | .0346 |
| N=3000 | Bias | -.0252 | -.0003 | -.0251 | -.0003 | -.0269 | -.0165 | -.0249 | -.0192 | -.0252 |
| N=3000 | RMSE | .0289 | .0098 | .0295 | .0108 | .0311 | .0225 | .0293 | .0251 | .0289 |
| N=5000 | Bias | -.0256 | .0001 | -.0245 | .0009 | -.0271 | -.0260 | -.0250 | -.0253 | -.0256 |
| N=5000 | RMSE | .0280 | .0076 | .0269 | .0077 | .0298 | .0286 | .0279 | .0282 | .0280 |

Design 2: Conditional Normality

| Case / sample size | Statistic | (1) CF Biprobit | (2) CF Biprobit Switching | (3) CF Probit | (4) CF Probit Switching | (5) CF 2SLS | (6) CF 2SLS Switching | (7) IV 2SLS | (8) Opt. IV 2SLS | (9) MLE |
|---|---|---|---|---|---|---|---|---|---|---|
| Case 1: Just ID, One Region, APE y2 = -.2426 | | | | | | | | | | |
| N=1000 | Bias | .0009 | .0018 | .0024 | .0028 | -.0463 | -.0465 | -.0474 | -.0476 | .0121 |
| N=1000 | RMSE | .0133 | .0147 | .0138 | .0151 | .0560 | .0562 | .0570 | .0571 | .0160 |
| N=3000 | Bias | .0002 | .0004 | .0016 | .0013 | -.0458 | -.0459 | -.0468 | -.0469 | .0114 |
| N=3000 | RMSE | .0075 | .0081 | .0078 | .0084 | .0490 | .0490 | .0498 | .0500 | .0128 |
| N=5000 | Bias | -.0001 | .0001 | .0013 | .0009 | -.0461 | -.0461 | -.0472 | -.0472 | .0112 |
| N=5000 | RMSE | .0058 | .0062 | .0061 | .0064 | .0480 | .0480 | .0490 | .0490 | .0121 |
| Case 2: Over ID, One Region, APE y2 = -.2123 | | | | | | | | | | |
| N=1000 | Bias | .0005 | .0006 | .0015 | .0015 | -.0269 | -.0273 | -.0291 | -.0294 | .0059 |
| N=1000 | RMSE | .0072 | .0074 | .0078 | .0080 | .0298 | .0301 | .0319 | .0321 | .0092 |
| N=3000 | Bias | .0000 | .0000 | .0011 | .0009 | -.0272 | -.0277 | -.0292 | -.0294 | .0057 |
| N=3000 | RMSE | .0041 | .0042 | .0045 | .0046 | .0281 | .0286 | .0300 | .0302 | .0070 |
| N=5000 | Bias | .0000 | .0001 | .0010 | .0009 | -.0271 | -.0276 | -.0290 | -.0293 | .0057 |
| N=5000 | RMSE | .0032 | .0033 | .0035 | .0036 | .0277 | .0281 | .0295 | .0298 | .0065 |
| Case 3: Just ID, Two Regions, APE y2 = -.0624 | | | | | | | | | | |
| N=1000 | Bias | -.0239 | -.0001 | -.0230 | .0001 | -.0176 | -.0158 | -.0176 | -.0182 | -.0241 |
| N=1000 | RMSE | .0363 | .0197 | .0357 | .0196 | .0336 | .0317 | .0336 | .0339 | .0363 |
| N=3000 | Bias | -.0259 | -.0004 | -.0251 | -.0003 | -.0187 | -.0166 | -.0186 | -.0192 | -.0261 |
| N=3000 | RMSE | .0302 | .0108 | .0295 | .0108 | .0247 | .0225 | .0246 | .0251 | .0303 |
| N=5000 | Bias | -.0251 | .0000 | -.0243 | .0001 | -.0175 | -.0159 | -.0175 | -.0181 | -.0253 |
| N=5000 | RMSE | .0278 | .0083 | .0270 | .0083 | .0213 | .0197 | .0214 | .0218 | .0279 |

a Sequential averaging of the control function term v2 and x is applied to compute estimates of APEs.
b The bias is defined as the difference between the true APEs and the estimates. RMSE is the root mean squared error.
c Estimator (1) is the CF approach inserting the first-stage residual v̂2 to a second-stage joint biprobit between y1, y3. Estimator (2) is the CF approach inserting the first-stage residual v̂2 to a second-stage joint biprobit between y(1)1, y3 and y(0)1, y3. Estimator (3) is the CF approach inserting first-stage residual v̂2 and ĝr3 into the probit model for y1. Estimator (4) is the CF approach inserting first-stage residual v̂2 and ĝr3 into the probit model for y1 separately for sub-samples defined by y3. Estimator (5) is the CF approach applied to linear probability model for y1 by inserting first-stage residual v̂2 and ĝr3. Estimator (6) is the CF approach applied to linear probability model for y1 by inserting first-stage residual v̂2 and ĝr3 for sub-samples defined by y3. Estimator (7) is the 2SLS IV approach for a linear probability model of y1. Estimator (8) is the 2SLS IV approach using predicted fitted values from the first-stage reduced forms for y2 and y3 as instruments. y2 is predicted using a linear model and y3 is predicted using probit model. Estimator (9) is the joint estimation of y1, y2 and y3 by maximum likelihood.
Table 2: Simulation Results for APE of y3

Design 1: Joint Normality

| Case / sample size | Statistic | (1) CF Biprobit | (2) CF Biprobit Switching | (3) CF Probit | (4) CF Probit Switching | (5) CF 2SLS | (6) CF 2SLS Switching | (7) IV 2SLS | (8) Opt. IV 2SLS | (9) MLE |
|---|---|---|---|---|---|---|---|---|---|---|
| Case 1: Just ID, One Region, APE y3 = .2573 | | | | | | | | | | |
| N=1000 | Bias | -.0013 | -.0012 | -.0116 | -.0119 | .1010 | .1010 | .0809 | .0823 | -.0014 |
| N=1000 | RMSE | .0380 | .0382 | .0440 | .0442 | .1117 | .1119 | .0966 | .0972 | .0380 |
| N=3000 | Bias | .0001 | .0002 | -.0090 | -.0091 | .1032 | .1033 | .0852 | .0857 | .0000 |
| N=3000 | RMSE | .0228 | .0229 | .0267 | .0268 | .1069 | .1070 | .0904 | .0906 | .0228 |
| N=5000 | Bias | -.0004 | -.0003 | -.0096 | -.0096 | .1024 | .1026 | .0840 | .0849 | -.0004 |
| N=5000 | RMSE | .0172 | .0172 | .0213 | .0214 | .1047 | .1048 | .0873 | .0880 | .0171 |
| Case 2: Over ID, One Region, APE y3 = .2132 | | | | | | | | | | |
| N=1000 | Bias | .0002 | -.0008 | -.0003 | -.0006 | .0672 | .0653 | .0512 | .0534 | .0001 |
| N=1000 | RMSE | .0323 | .0349 | .0350 | .0367 | .0813 | .0811 | .0744 | .0729 | .0322 |
| N=3000 | Bias | .0002 | .0004 | -.0003 | .0003 | .0663 | .0647 | .0501 | .0521 | .0001 |
| N=3000 | RMSE | .0188 | .0197 | .0198 | .0207 | .0715 | .0705 | .0587 | .0595 | .0188 |
| N=5000 | Bias | -.0001 | -.0002 | -.0012 | -.0007 | .0655 | .0635 | .0491 | .0513 | -.0001 |
| N=5000 | RMSE | .0147 | .0152 | .0158 | .0162 | .0687 | .0670 | .0545 | .0559 | .0146 |
| Case 3: Just ID, Two Regions, APE y3 = .0622 | | | | | | | | | | |
| N=1000 | Bias | .0436 | .0005 | .0339 | .0065 | .0292 | .0327 | .0128 | .0153 | .0437 |
| N=1000 | RMSE | .0721 | .0392 | .0585 | .0444 | .0576 | .0575 | .0559 | .0546 | .0722 |
| N=3000 | Bias | .0437 | .0001 | .0467 | .0023 | .0292 | -.0051 | .0129 | .0294 | .0438 |
| N=3000 | RMSE | .0555 | .0236 | .0547 | .0268 | .0413 | .0294 | .0351 | .0428 | .0555 |
| N=5000 | Bias | .0447 | .0010 | .0353 | .0076 | .0300 | .0335 | .0130 | .0159 | .0447 |
| N=5000 | RMSE | .0516 | .0180 | .0413 | .0215 | .0372 | .0395 | .0273 | .0280 | .0516 |

Design 2: Conditional Normality

| Case / sample size | Statistic | (1) CF Biprobit | (2) CF Biprobit Switching | (3) CF Probit | (4) CF Probit Switching | (5) CF 2SLS | (6) CF 2SLS Switching | (7) IV 2SLS | (8) Opt. IV 2SLS | (9) MLE |
|---|---|---|---|---|---|---|---|---|---|---|
| Case 1: Just ID, One Region, APE y3 = .2385 | | | | | | | | | | |
| N=1000 | Bias | -.0003 | .0013 | -.0087 | -.0075 | .0667 | .0656 | .0738 | .0752 | -.0119 |
| N=1000 | RMSE | .0422 | .0438 | .0472 | .0484 | .0866 | .0869 | .0956 | .0961 | .0432 |
| N=3000 | Bias | -.0001 | .0001 | -.0084 | -.0083 | .0675 | .0663 | .0744 | .0757 | -.0118 |
| N=3000 | RMSE | .0236 | .0241 | .0276 | .0281 | .0742 | .0735 | .0819 | .0830 | .0259 |
| N=5000 | Bias | .0008 | .0009 | -.0073 | -.0071 | .0683 | .0671 | .0768 | .0815 | -.0112 |
| N=5000 | RMSE | .0193 | .0198 | .0224 | .0228 | .0727 | .0717 | .0816 | .0815 | .0221 |
| Case 2: Over ID, One Region, APE y3 = .2026 | | | | | | | | | | |
| N=1000 | Bias | .0009 | .0009 | -.0012 | .0003 | .0353 | .0493 | .0492 | .0513 | -.0044 |
| N=1000 | RMSE | .0394 | .0394 | .0400 | .0411 | .0622 | .0736 | .0772 | .0762 | .0372 |
| N=3000 | Bias | .0004 | .0008 | -.0017 | -.0006 | .0330 | .0339 | .0494 | .0510 | -.0041 |
| N=3000 | RMSE | .0217 | .0229 | .0237 | .0234 | .0444 | .0455 | .0608 | .0608 | .0226 |
| N=5000 | Bias | .0003 | .0006 | -.0016 | -.0005 | .0338 | .0478 | .0485 | .0506 | -.0041 |
| N=5000 | RMSE | .0173 | .0180 | .0185 | .0189 | .0414 | .0540 | .0557 | .0569 | .0174 |
| Case 3: Just ID, Two Regions, APE y3 = .0864 | | | | | | | | | | |
| N=1000 | Bias | .0535 | .0007 | .0462 | .0028 | .0257 | -.0045 | .0257 | .0301 | .0549 |
| N=1000 | RMSE | .0766 | .0409 | .0669 | .0455 | .0555 | .0500 | .0597 | .0601 | .0779 |
| N=3000 | Bias | .0536 | -.0003 | .0467 | .0023 | .0254 | -.0051 | .0246 | .0294 | .0546 |
| N=3000 | RMSE | .0624 | .0238 | .0547 | .0268 | .0386 | .0294 | .0408 | .0428 | .0634 |
| N=5000 | Bias | .0536 | -.0003 | .0466 | .0023 | .0245 | -.0057 | .0247 | .0291 | .0545 |
| N=5000 | RMSE | .0592 | .0185 | .0516 | .0208 | .0333 | .0237 | .0350 | .0378 | .0601 |

a Sequential averaging of the control function term v2 and x is applied to compute estimates of APEs.
b The bias is defined as the difference between the true APEs and the estimates. RMSE is the root mean squared error.
c Estimator (1) is the CF approach inserting the first-stage residual v̂2 to a second-stage joint biprobit between y1, y3. Estimator (2) is the CF approach inserting the first-stage residual v̂2 to a second-stage joint biprobit between y(1)1, y3 and y(0)1, y3. Estimator (3) is the CF approach inserting first-stage residual v̂2 and ĝr3 into the probit model for y1. Estimator (4) is the CF approach inserting first-stage residual v̂2 and ĝr3 into the probit model for y1 separately for sub-samples defined by y3. Estimator (5) is the CF approach applied to linear probability model for y1 by inserting first-stage residual v̂2 and ĝr3. Estimator (6) is the CF approach applied to linear probability model for y1 by inserting first-stage residual v̂2 and ĝr3 for sub-samples defined by y3. Estimator (7) is the 2SLS IV approach for a linear probability model of y1. Estimator (8) is the 2SLS IV approach using predicted fitted values from the first-stage reduced forms for y2 and y3 as instruments. y2 is predicted using a linear model and y3 is predicted using probit model. Estimator (9) is the joint estimation of y1, y2 and y3 by maximum likelihood.
Figure 2: Empirical Distribution of APEs for y2 for the Sample Size of 1000 under Conditional Normality
Figure 3: Empirical Distribution of APEs for y3 for the Sample Size of 1000 under Joint Normality
Figure 4: Empirical Distribution of APEs for y3 for the Sample Size of 1000 under Conditional Normality
Table 3: Summary Statistics of the Estimation Sample (N=2964)

| Variable | Owner | Renter |
|---|---|---|
| Budget Share on Housing | .20 (.16) | .41 (.16) |
| Ln(Expenditure) | 10.35 (.59) | 9.82 (.59) |
| Ln(Income) | 10.94 (.75) | 10.29 (.73) |
| Ln(Price) | 8.55 (.21) | 8.93 (.12) |
| Age | 49.88 (12.97) | 44.45 (13.69) |
| Married | .79 (.40) | .35 (.48) |
| Moved | .10 (.30) | .33 (.47) |
| Black | .21 (.41) | .46 (.50) |
| Years of education | 13.27 (2.73) | 12.12 (2.90) |
| Number of Children | .94 (1.14) | 1.05 (1.24) |
| Obs. | 2355 | 629 |

a The sample is based on the 2001 wave of the Panel Study of Income Dynamics (PSID). All monetary variables were converted to 1998 dollars before they were logged.
b Sample standard deviations are in parentheses next to the sample means.
Table 4: The First-Stage Reduced-Form Regressions for the EEVs

| | (1) | (2) | (3) | (4) | (5) |
|---|---|---|---|---|---|
| Estimation Method | OLS | OLS | Probit | Probit | Probit |
| Dependent Variable | Ln(Expenditure) | Ln(Expenditure) | Owner | Owner | Owner |
| v̂2 | | | | .058*** (.009) | |
| Ln(Expenditure) | | | | | .058*** (.009) |
| Ln(Income) | .410*** (.013) | .385*** (.013) | .055*** (.006) | .057*** (.006) | .033*** (.007) |
| Education | .025*** (.0031) | .024*** (.003) | .005** (.001) | .004** (.0015) | .002 (.0015) |
| Children | .056*** (.0077) | .062*** (.008) | .012*** (.004) | .011*** (.003) | .008** (.004) |
| Age | -.0025*** (.0007) | .029*** (.004) | .003*** (.0003) | .003*** (.0003) | .003*** (.0003) |
| Age2 | | -.0002*** (.00004) | | | |
| Ln(Price) | -.068** (.0318) | -.067** (.0314) | -.646*** (.017) | -.624*** (.016) | -.620*** (.016) |
| Married | .27*** (.021) | .26*** (.021) | .054*** (.010) | .053*** (.009) | .037*** (.010) |
| Moved | .012* (.022) | .037* (.022) | -.065*** (.009) | -.063*** (.009) | -.064*** (.009) |
| Black | -.058** (.019) | -.079*** (.0189) | -.064*** (.0087) | -.060*** (.009) | -.057*** (.009) |

a v̂2 denotes the residual from Regression (1), the reduced form for log total expenditure.
b Regression (1) and (2) are first-stage regressions for log total expenditure, the continuous EEV. Regression (3)-(5) are first-stage regressions for home ownership, the binary EEV.
c * p-value<10%, ** p-value<5%, *** p-value<1%. Standard errors are in parentheses.
-
Table 5: Comparing marginal effects in the structural equation of the housing share
Estimation Method: IV, Opt. IV, CF, CF, CF, CF, CF, CF
No EEVs, No EEVs, One EEV, Two EEVs, Switching, Switching
                          (1)         (2)         (3)         (4)         (5)            (6)            (7)            (8)            (9)              (10)
Functional form for y1    Linear      2SLS        2SLS        2SLS        Frac Probit    Frac Probit    Frac Probit    Frac Probit    Frac Biprobit    Frac Biprobit
Marginal Effects          Coeff       Coeff       Coeff       Coeff       APE            APE            APE            APE            APE              APE
Ln(Expenditure)           -.109∗∗∗    -.176∗∗     -.056∗∗∗    -.061∗∗∗    -.106∗∗∗       -.064∗∗∗       -.061∗∗∗       -.059∗∗∗       -.062∗∗∗         -.059∗∗∗
                          (.005)      (.084)      (.01)       (.01)       (.005)         (.009)         (.010)         (.009)         (.01)            (.009)
Owner                     -.094∗∗∗    .336        -.14∗∗∗     -.124∗∗∗    -.074∗∗∗       -.088∗∗∗       -.101∗∗∗       -.124∗∗∗       -.098∗∗∗         -.124∗∗∗
                          (.009)      (.330)      (.016)      (.015)      (.0082)        (.012)         (.020)         (.038)         (.020)           (.038)
Age                       .002∗∗∗     .0003       .003∗∗∗     .003∗∗∗     .002∗∗∗        .0025∗∗∗       .0026∗∗∗       .0026∗∗∗       .0025∗∗∗         .0026∗∗∗
                          (.0002)     (.001)      (.0002)     (.0002)     (.0002)        (.0002)        (.0002)        (.0002)        (.0002)          (.0002)
Ln(Price)                 .147∗∗∗     .54∗        .109∗∗∗     .124∗∗∗     .151∗∗∗        .147∗∗∗        .136∗∗∗        .138∗∗∗        .139∗∗∗          .138∗∗∗
                          (.013)      (.29)       (.017)      (.017)      (.013)         (.016)         (.021)         (.022)         (.021)           (.022)
Married                   -.005       -.06∗∗      -.027       -.028∗∗∗    -.0026         -.027∗∗∗       -.027∗∗∗       -.027∗∗∗       -.027∗∗∗         -.027∗∗∗
                          (.007)      (.025)      (.008)      (.009)      (.006)         (.008)         (.008)         (.008)         (.008)           (.008)
Moved                     .011        .071        .005        .007        .009           .009           .007           .005           .007             .005
                          (.007)      (.047)      (.008)      (.007)      (.007)         (.007)         (.008)         (.009)         (.007)           (.009)
Black                     -.012∗∗     .041        -.010∗      -.009       -.012∗∗        -.006          -.008          -.009          -.007            -.009
                          (.006)      (.036)      (.006)      (.006)      (.006)         (.006)         (.006)         (.007)         (.006)           (.007)
a The dependent variable is the expenditure share on housing.
b Standard errors for the estimated APEs were bootstrap standard errors with 200 replications.
c Regression (1) is the OLS for linear probability model that assumes no EEVs. Regression (2) is the 2SLS IV estimator for linear probability model that uses a linear probability model for the reduced form of home ownership. Regressi