
Binary and Fractional Response Models with Continuous and Binary Endogenous Explanatory Variables

Wei Lin∗

Jeffrey M. Wooldridge†

November 8th, 2017

Abstract

This paper considers latent variable models for binary responses and fractional responses with a bi-

nary endogenous explanatory variable (EEV) and potentially many continuous endogenous explanatory

variables. A two-step control function (CF) approach is promoted to account for endogeneity. The CF

approach enables an uncovering of partial effects of causal interest. The inference for the partial effects

can be easily obtained through bootstrapping because of the computational simplicity of the two-step CF

approach. A basic probit model, an endogenous switching probit model, and a fractional probit model

are discussed in the paper. Variable addition tests on generalized residuals are used to detect additional

endogeneity from the binary EEV. Monte Carlo experiments show that partial effects obtained by insert-

ing generalized residuals into binary response models outperform coefficients from linear specifications.

In fact, they provide fairly close approximations to partial effects from joint estimations. An empirical

illustration of the determination of housing budget shares shows that, in a fractional response model,

using generalized residuals again leads to a close approximation to joint estimations. The coefficients

from linear specifications and partial effects from quasi-MLE are also close in this case.

∗Center for Real Estate, Massachusetts Institute of Technology, Cambridge, MA 02139, United States. [email protected].

†Department of Economics, Michigan State University, East Lansing, MI 48824, United States. [email protected].


1 Introduction

Binary response models play a significant role in many fields of empirical studies. Examples include

econometric models determining the probability of migration in labor economics (Dong and Lewbel, 2015),

the chance of college admission in the economics of education (Conlin et al., 2013), and the likelihood of

takeover activity in finance (Edmans et al., 2012), to name just a few.

In practice, linear probability models for binary responses are often used because they are easy to esti-

mate. However, a linear projection disregards the limited nature of the binary response, resulting in unreal-

istic predictions of the response probability. Therefore, coefficients of the linear probability models serve,

at best, as approximations to marginal effects of interest.

Latent variable models for binary responses, on the other hand, are well grounded on economic theories.

The latent threshold-crossing structure captures the trigger effect, and the nonlinear transformation ensures

that the response probability falls into the unit interval. Thereby, post-estimation quantities from the latent

variable models, such as average partial effects (APEs), can bear a causal interpretation, even in the presence

of endogenous explanatory variables (Lin and Wooldridge, 2015). Yet due to the nonlinearity, detecting and

solving endogeneity issues in the first place is less straightforward, especially when some of the suspected

endogenous explanatory variables are also discrete.

This paper considers the estimation of a special case of latent variable models for binary responses with

EEVs of differing attributes, where a control function approach can be applied to make it easier to handle

EEVs. Namely, we allow for one binary EEV and potentially many continuous EEVs. The binary EEV

could be an indicator of self-selecting into a treatment, into a region or into a sample, depending on how it

appears in the binary response equation. Some empirical examples for the binary EEV are whether to own

a house, whether to submit SAT scores in college applications, or whether to be in the labor force. The

continuous EEVs could be years of education, family income, prices of substitutes, or inputs for production

functions. The endogeneity here is modeled to arise from omitted variable problems—the existence of

common unobservables affecting both the binary outcome and the EEVs.

To estimate the model above, for simplicity, we take a simple parametric approach by assuming joint

normality among error terms, following the treatments of binary and continuous EEVs adopted by Heckman

(1978), Amemiya (1978), and Rivers and Vuong (1988). This distributional assumption allows not only for

control function approaches but also for joint estimation, such as limited information maximum likelihood

(LIML) estimation, or quasi-limited information maximum likelihood (quasi-LIML) estimation if any of

the distributions is misspecified. However, this kind of joint estimation is rarely conducted in practice, due to

the difficulty in searching for a numerical solution along so many dimensions. One might also be tempted

to mimic two-stage least squares by substituting fitted values from a first-stage estimation for EEVs in the

binary response equation. Despite its simplicity, this procedure leads to the so-called "forbidden regression",

a term coined by Hausman (1975), which in turn yields inconsistent estimates.

Alternatively, drawing on Wooldridge (2014), this paper promotes a two-step control function (CF)

approach under the quasi-LIML framework, which is not only computationally simple but also delivers

sensible estimators of APEs in the presence of multiple EEVs and various sources of heterogeneity. To carry

out this two-step procedure, residuals, instead of fitted values, from the first-stage estimation of reduced

forms for the continuous EEVs are plugged into the second-stage joint estimation of the binary outcome and

binary EEV. Routines in commonly used software can be exploited (or slightly modified) to carry out this

procedure. Most importantly, due to the computational simplicity of the CF approach, bootstrapping can

be easily applied to obtain inference for functions of the parameters, such as APEs, rather than using the

complicated delta method.

In addition, as shown in Wooldridge (2014), simple variable addition tests (VATs) for endogeneity are

obtained as by-products of the CF approach. This paper extends the VATs for a single EEV to VATs for

multiple EEVs. The VATs are based on standard Wald tests of those plugged-in residuals obtained from

the first stage estimations. In particular, in the presence of a binary EEV, testing on generalized residuals

enables us to determine whether we can avoid a joint estimation in the second stage. Further, since EEVs

are often correlated with each other, conditioning on residuals obtained from other EEVs helps reduce the

likelihood of detecting additional endogeneity in the binary EEV.

Another feature of the two-step procedure, which stems from White (1982), is the quasi-LIML frame-

work. The analysis for binary response models in this paper easily carries through to fractional response

models, as proposed in Papke and Wooldridge (1996). So long as the conditional mean of the fractional

response is correctly specified, we consistently estimate parameters in the conditional mean even though

other features of the distribution are misspecified.

Besides parametric approaches to estimating this triangular model for binary responses, semi-parametric

and nonparametric approaches are also available. Nevertheless, while those approaches sensibly relax para-

metric assumptions in one aspect, they inevitably impose restrictions in other directions. For example,

Blundell and Powell (2003) advance CF approaches to fully nonparametric binary response models with

continuous EEVs. Unfortunately, their assumption of additive, independent errors rules out discrete EEVs.

The special regressor method in Dong and Lewbel (2015) allows for both continuous and binary EEVs in

semiparametric binary response models. However, their method requires a special regressor to be excluded

from the reduced forms for the EEVs, and the special regressor cannot appear in the structural equation in

flexible functional forms. Further, as discussed in Lin and Wooldridge (2015), the average index functions

(AIF), proposed as a basis for defining marginal effects for special regressor methods, lack a causal interpre-

tation. Some other existing semiparametric methods for estimating this model are discussed in more detail

by Lin (2016).

The rest of the paper is organized as follows. Section 2 starts with a basic model with one binary and

many continuous EEVs. The same arguments are then extended to an endogenous switching model where

the error terms and switching indicator are allowed to interact. Section 3 derives the VAT for endogeneity

from a binary EEV given residuals from continuous EEVs. Section 4 shows that the CF approach can

be applied to fractional response models. Section 5 presents Monte Carlo simulation results of empirical

distributions of APEs for binary response models. Section 6 illustrates this approach by revisiting the study

of the effects of price and total expenditure on the housing budget share. Section 7 concludes.

2 Model Specification and Estimation for Binary Response

2.1 Probit Models with One Binary EEV and Many Continuous EEVs

As a starting point, we first assume that the only complication arises from EEVs of differing attributes,

with no presence of heterogeneity or misspecification yet. More specifically, consider a simple model for a

binary response y1 with many continuous EEVs y2 and one binary EEV y3. Write the model recursively in

a triangular form as

y1 = 1 [x1β + u1 > 0] , (1a)

y2 = zΠ + v2, (1b)

y3 = 1 [zδ + u3 > 0] . (1c)

Equation (1a) is a structural equation that represents a causal relationship. Equations (1b) and (1c) are

reduced forms for the continuous EEVs y2 of dimension 1×G and the scalar binary EEV y3, respectively.

1 [·] denotes the indicator function that takes on a value of one when the statement in the bracket is true

and zero otherwise. x1 is a 1 ×K1 vector where each of its elements is a general function of (z1,y2, y3),

such as polynomials, interactions, logarithms, etc., with x1 = (z1,y2, y3) being the leading case. z1 is a

1 × L1 strict subset of the entire 1× L vector of exogenous variables z ≡ (z1, z2), with L ≡ L1 + L2

and L2 ≥ G + 1. Identification based on nonlinearity alone often performs poorly in practice, so

we need at least one excluded instrument for the binary EEV. Further, the same rank condition holds as in

two-stage least squares: rank E(z′z) = L1 + L2 and rank E(z′x1) = K1. Moreover, let z1 include unity

as its first element, which effectively forces the error terms (u1, v2, u3) to have zero means. Π is an L × G

matrix of parameters. In the simplest case, when G = 1, y2 is a scalar.

This system of equations describes endogeneity as an omitted variable problem. The structural error u1

is correlated with explanatory variables y2 and y3, in that it contains an unobservable that also appears in

error terms v2 and u3. Write the linear projections of (u1, u3) on v2 in error forms:

u1 = v2θ + v1, (2a)

u3 = v2η + v3, (2b)

where $\theta \equiv [E(v_2' v_2)]^{-1} E(v_2' u_1)$ and $\eta \equiv [E(v_2' v_2)]^{-1} E(v_2' u_3)$ are the G × 1 vectors of population

regression coefficients.

A convenient joint normality assumption among (u1,v2, u3) (Heckman, 1978; Amemiya, 1978; Rivers

and Vuong, 1988) leaves us with a bivariate normally distributed vector of errors (v1, v3) that is independent

of v2 (by definition of a linear projection and by a property of multivariate normality):

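$$D\begin{pmatrix} v_1 \\ v_3 \end{pmatrix} = D\left( \begin{pmatrix} v_1 \\ v_3 \end{pmatrix} \,\middle|\, v_2 \right) \sim \mathrm{Normal}\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix} \right), \tag{3}$$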

where D (·) denotes the distribution, the variances of v1 and v3 are normalized to one, and ρ ≡ Cov (v1, v3)

is the covariance.

In fact, if we are willing to assume a strong enough exogeneity condition for the instruments z, the

bivariate distribution (v1, v3) becomes independent not only of v2 but also of z and thus of y2 (because y2

is a deterministic function of z,v2):

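$$D\begin{pmatrix} v_1 \\ v_3 \end{pmatrix} = D\left( \begin{pmatrix} v_1 \\ v_3 \end{pmatrix} \,\middle|\, z, v_2 \right) = D\left( \begin{pmatrix} v_1 \\ v_3 \end{pmatrix} \,\middle|\, y_2, z, v_2 \right). \tag{4}$$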

Given the distributional assumptions, we arrive at a bivariate probit model that accounts for endogeneity

issues in (1) by adding v2 as extra explanatory variables:

y1 = 1 [x1β + v2θ + v1 ≥ 0] , (5a)

y3 = 1 [zδ + v2η + v3 ≥ 0] . (5b)

Adding reduced form errors to control for endogeneity is the essence of a control function approach.

However, since v2 is not observed, the approach is made operational by a simple two-step procedure:

1. Estimate (1b), the reduced forms for y2, by ordinary least squares (OLS), equation by equation, to

obtain the residuals v̂2 = y2 − zΠ̂.

2. Estimate (5), the bivariate probit model for y1 and y3, jointly by maximum likelihood estimation

(MLE), replacing v2 with v̂2.
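Step 1 of the procedure above can be sketched in a few lines. This is only an illustration on simulated data: the names `reduced_form_residuals`, `z`, `y2` and `Pi_true`, and all numerical values, are assumptions made for the example and do not come from the paper.

```python
import numpy as np

def reduced_form_residuals(z, y2):
    """Step 1: OLS of each column of y2 on z; returns (Pi_hat, v2_hat)."""
    # lstsq solves all G regressions at once, which coincides with
    # equation-by-equation OLS because every equation shares the regressors z.
    Pi_hat, *_ = np.linalg.lstsq(z, y2, rcond=None)
    v2_hat = y2 - z @ Pi_hat
    return Pi_hat, v2_hat

# Illustrative simulated data (every value here is an assumption).
rng = np.random.default_rng(0)
N, L, G = 500, 4, 2
z = np.column_stack([np.ones(N), rng.normal(size=(N, L - 1))])
Pi_true = rng.normal(size=(L, G))
y2 = z @ Pi_true + rng.normal(size=(N, G))

Pi_hat, v2_hat = reduced_form_residuals(z, y2)
```

The residuals `v2_hat` are what would be added as extra regressors to both equations of (5) in step 2.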

Since there is no one-to-one mapping between the reduced form error v3 and the binary EEV y3, we

cannot obtain a proxy for v3 and hence have to rely on a joint estimation in the second step. By the usual

consistency argument of two-step M-estimations (see, for example, Wooldridge, 2010, section 12.4.1), the

resulting control function estimator $(\hat{\Pi}, \hat{\beta}, \hat{\delta}, \hat{\theta}, \hat{\eta})$ is consistent for parameters identified by the following

population problems. Formally,

$$\Pi = [E(z'z)]^{-1} E(z'y_2), \tag{6}$$

and (β, δ,θ,η) is the unique solution to

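$$\begin{aligned}
&\max_{\mathbf{b} \in \mathbb{R}^{K_1},\, \mathbf{d} \in \mathbb{R}^{L},\, \mathbf{r} \in \mathbb{R}^{G},\, \mathbf{g} \in \mathbb{R}^{G},\, \rho \in \mathbb{R}} E[\log P(y_1, y_3 \mid y_2, z, v_2)] \\
&\quad = E\Big[\, y_1 y_3 \log \int_{-q_3}^{\infty} \Phi(d)\,\phi(\upsilon_3)\,d\upsilon_3
+ (1-y_1)\, y_3 \log \int_{-q_3}^{\infty} [1-\Phi(d)]\,\phi(\upsilon_3)\,d\upsilon_3 \\
&\qquad\quad + y_1 (1-y_3) \log \int_{-\infty}^{-q_3} \Phi(d)\,\phi(\upsilon_3)\,d\upsilon_3
+ (1-y_1)(1-y_3) \log \int_{-\infty}^{-q_3} [1-\Phi(d)]\,\phi(\upsilon_3)\,d\upsilon_3 \Big],
\end{aligned} \tag{7}$$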

where

$$d \equiv \frac{x_1 \mathbf{b} + v_2 \mathbf{r} + \rho\,\upsilon_3}{\sqrt{1-\rho^2}}, \tag{8}$$

$$q_3 \equiv z\mathbf{d} + v_2 \mathbf{g}. \tag{9}$$
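Each integral in (7) is a bivariate normal probability, which is why standard bivariate probit routines can evaluate the objective. The sketch below checks this numerically for the first term: integrating Φ(d)φ(υ3) over υ3 > −q3 gives the same value as the bivariate standard normal CDF with correlation ρ evaluated at (a, q3), where a = x1b + v2r. The function names and the numerical values are illustrative assumptions, not part of the paper.

```python
import numpy as np
from scipy import integrate, stats

def p11_integral(a, q3, rho):
    """P(y1 = 1, y3 = 1) as the first integral in (7); a = x1*b + v2*r."""
    def integrand(u3):
        d = (a + rho * u3) / np.sqrt(1.0 - rho**2)   # equation (8)
        return stats.norm.cdf(d) * stats.norm.pdf(u3)
    value, _ = integrate.quad(integrand, -q3, np.inf)
    return value

def p11_bvn(a, q3, rho):
    """The same probability from the bivariate standard normal CDF."""
    cov = [[1.0, rho], [rho, 1.0]]
    return stats.multivariate_normal.cdf([a, q3], mean=[0.0, 0.0], cov=cov)

a, q3, rho = 0.3, -0.2, 0.5   # arbitrary illustrative values
print(p11_integral(a, q3, rho), p11_bvn(a, q3, rho))
```

The two numbers should agree up to the numerical tolerance of the bivariate CDF routine.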

However, as the magnitude of β depends on the normalization of the error terms and thus is only iden-

tified up to scale, interpreting β is not especially meaningful. Instead, the primary goal in empirical studies

is to explain marginal effects of a variable of interest on response probabilities. In the presence of EEVs,

the conditional response probability P (y1 = 1|x1) is hardly of any interest: it is contaminated by the correlation of y2 and y3

with the omitted variables in the unobservable u1. We must therefore take care in constructing

an interesting response function for deriving partial effects. Fortunately, Blundell and Powell (2003, 2004)

have proposed the average structural function (ASF), which is intuitively appealing and can be obtained via

counterfactual reasoning. In defining the ASF for the structural equation (1a), we break the correlations by

holding the observables x1 as fixed arguments and averaging out the unobservable ui1 without conditioning

on x1:

ASF (x1) = Eui1 {1 [x1β + ui1 > 0]} , (10)

where the subscript i on ui1 emphasizes that it is a random variable, and Eui1 {·} is the expected value with

respect to ui1.

In the two-step CF procedure above, we identify parameters that correspond to the conditional normality

of u1 given v2, namely,

u1|v2 ∼ Normal (v2θ, 1) . (11)

Thus, by the usual law of iterated expectations, the ASF defined in (10) can be obtained in two steps.

First, we treat vi2 as fixed, and then average them out as random variables:

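$$\begin{aligned}
ASF(x_1) &= E_{v_{i2}}\Big\{ E_{u_{i1} \mid v_{i2}} \big\{ 1[x_1\beta + v_{i2}\theta + v_{i1} > 0] \,\big|\, v_{i2} \big\} \Big\} \\
&= E_{v_{i2}} \{ \Phi(x_1\beta + v_{i2}\theta) \} \\
&= \int_{-\infty}^{\infty} \Phi(x_1\beta + \upsilon_{i2}\theta)\,\phi(\upsilon_{i2})\,d\upsilon_{i2},
\end{aligned} \tag{12}$$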

where φ (·) is the density function for the random variables vi2.
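To make the averaging in (12) concrete, consider the special case of a scalar v2 that is normal with mean zero and standard deviation σ_v (an assumption made here for illustration). The integral then has the closed form Φ(x1β / √(1 + θ²σ_v²)), a standard property of the normal distribution. The sketch below compares the two numerically; the function names and numbers are hypothetical.

```python
import numpy as np
from scipy import integrate, stats

def asf_numeric(index, theta, sigma_v):
    """Integrate Phi(index + v*theta) against the N(0, sigma_v^2) density, as in (12)."""
    f = lambda v: stats.norm.cdf(index + v * theta) * stats.norm.pdf(v, scale=sigma_v)
    value, _ = integrate.quad(f, -np.inf, np.inf)
    return value

def asf_closed_form(index, theta, sigma_v):
    """Closed form for scalar normal v2: Phi(index / sqrt(1 + theta^2 * sigma_v^2))."""
    return stats.norm.cdf(index / np.sqrt(1.0 + theta**2 * sigma_v**2))

index, theta, sigma_v = 0.4, 0.7, 1.5   # index plays the role of x1*beta
print(asf_numeric(index, theta, sigma_v), asf_closed_form(index, theta, sigma_v))
```

The closed form shows the attenuation that results from averaging out v2: the ASF index is x1β rescaled by the standard deviation of the full structural error u1.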

The average partial effects (APEs) for a given x1 are then obtained by taking derivatives or differences

of (12)

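$$APE_{y_2}(x_1) = \beta_{y_2} \int_{-\infty}^{\infty} \phi(x_1\beta + \upsilon_{i2}\theta)\,\phi(\upsilon_{i2})\,d\upsilon_{i2}, \tag{13a}$$

$$APE_{y_3}(x_1) = \int_{-\infty}^{\infty} \Big[ \Phi\big(x_1^{(1)}\beta + \upsilon_{i2}\theta\big) - \Phi\big(x_1^{(0)}\beta + \upsilon_{i2}\theta\big) \Big] \phi(\upsilon_{i2})\,d\upsilon_{i2}, \tag{13b}$$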

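where $\beta_{y_2}$ is the coefficient on y2, $x_1^{(1)}$ denotes the explanatory variables at a particular fixed value with

y3 = 1, and $x_1^{(0)}$ denotes the same fixed value of the explanatory variables except that y3 = 0. Those APEs

can be consistently estimated by using sample analogues and inserting the consistent estimators β̂ and θ̂ from

the two-step CF approach:

$$\widehat{APE}_{y_2}(x_1) = \hat{\beta}_{y_2} \left[ N^{-1} \sum_{i=1}^{N} \phi\big(x_1\hat{\beta} + \hat{v}_{i2}\hat{\theta}\big) \right], \tag{14a}$$

$$\widehat{APE}_{y_3}(x_1) = N^{-1} \sum_{i=1}^{N} \Big[ \Phi\big(x_1^{(1)}\hat{\beta} + \hat{v}_{i2}\hat{\theta}\big) - \Phi\big(x_1^{(0)}\hat{\beta} + \hat{v}_{i2}\hat{\theta}\big) \Big]. \tag{14b}$$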
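The sample analogues (14a) and (14b) translate directly into code. The following is a sketch under stated assumptions: x1 = (1, y2, y3) with y2 entering linearly, and the parameter values and residuals are made up for illustration rather than estimated. The function names are hypothetical.

```python
import numpy as np
from scipy.stats import norm

def ape_continuous(x1, beta, theta, v2_hat, k):
    """(14a): beta[k] times the sample average of phi(x1*beta + v2_hat_i*theta)."""
    index = x1 @ beta + v2_hat @ theta        # one value per observation i
    return beta[k] * norm.pdf(index).mean()

def ape_binary(x1_on, x1_off, beta, theta, v2_hat):
    """(14b): average difference in Phi between y3 = 1 and y3 = 0 at fixed x1."""
    cf = v2_hat @ theta                       # control function term for each i
    return (norm.cdf(x1_on @ beta + cf) - norm.cdf(x1_off @ beta + cf)).mean()

# Hypothetical inputs: x1 = (1, y2, y3), evaluated at y2 = 0.5.
beta = np.array([0.2, 0.8, -0.5])
theta = np.array([0.4])
v2_hat = np.random.default_rng(1).normal(size=(1000, 1))
x1_on, x1_off = np.array([1.0, 0.5, 1.0]), np.array([1.0, 0.5, 0.0])

print(ape_continuous(x1_on, beta, theta, v2_hat, k=1))
print(ape_binary(x1_on, x1_off, beta, theta, v2_hat))
```

Note that x1 is held fixed while the average runs only over the residuals, which mirrors the definition of the ASF.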

To obtain inference for the estimators of APEs in (14a) and (14b), analytical standard errors can be

derived by the delta method after casting the two-step control function problem as a one-step method of

moments problem. However, because all the procedures involved in the estimation are standard routines,

bootstrap standard errors can be easily obtained to account for the sampling error in both steps.
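A minimal sketch of the bootstrap idea follows. To keep it short it uses a deliberately simplified setting that is an assumption of this example, not the paper's full model: one continuous EEV and no binary EEV, so the second step is an ordinary probit rather than a bivariate probit. The point is only that both steps are re-estimated inside every bootstrap replication. All names and simulated values are hypothetical, and the APE here averages jointly over x1 and the residuals.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def probit_fit(X, y):
    """Probit MLE by minimizing the negative log-likelihood with BFGS."""
    s = 2.0 * y - 1.0                                  # +1 / -1 coding of y
    nll = lambda b: -norm.logcdf(s * (X @ b)).sum()
    return minimize(nll, np.zeros(X.shape[1]), method="BFGS").x

def two_step_ape(z, y2, y1):
    """Step 1: OLS residuals. Step 2: probit of y1 on (1, y2, v2_hat). Returns APE of y2."""
    Pi_hat, *_ = np.linalg.lstsq(z, y2, rcond=None)
    v2_hat = y2 - z @ Pi_hat
    X = np.column_stack([np.ones(len(y1)), y2, v2_hat])
    b = probit_fit(X, y1)
    return b[1] * norm.pdf(X @ b).mean()

def bootstrap_se(z, y2, y1, B=30, seed=0):
    """Resample rows with replacement and redo BOTH steps in each replication."""
    rng = np.random.default_rng(seed)
    n = len(y1)
    draws = []
    for _ in range(B):
        idx = rng.integers(0, n, size=n)
        draws.append(two_step_ape(z[idx], y2[idx], y1[idx]))
    return np.std(draws, ddof=1)

# Simulated data with an endogenous y2 (all values are assumptions).
rng = np.random.default_rng(42)
n = 400
z = np.column_stack([np.ones(n), rng.normal(size=n)])
v2 = rng.normal(size=n)
y2 = z @ np.array([0.0, 1.0]) + v2
y1 = (0.3 + 0.7 * y2 + 0.5 * v2 + rng.normal(size=n) > 0).astype(float)

ape_hat = two_step_ape(z, y2, y1)
se_hat = bootstrap_se(z, y2, y1)
```

In practice B would be much larger than 30; the small value here only keeps the example quick.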

As shown in (13a) and (13b), APEs for the binary response model have the attractive feature of built-in

heterogeneity: they deliver varying partial effects when evaluated at different values of x1. However, if one

is interested in using a single summary statistic for marginal effects, further averaging across x1 should be

applied. A joint averaging across x1, v̂2 (as the "margins" command does in Stata) is computationally easier

but bears a different causal interpretation from sequentially averaging out v̂2 and x1 (Nam and Wooldridge,

2014).

Although it serves as a starting point, the modelling strategy in (1) for the CF approach is limited in several

ways. One restrictive feature is that the reduced form error v2 needs to be independent of the exogenous

variables z. Thus, the linear functional form for the conditional mean of y2 can be unrealistic, and it can be relaxed to

any generic function π (·) of z as in Blundell and Powell (2003, 2004). More importantly, v2 here acts as a

sufficient statistic to control for any endogeneity from y2 in the structural error u1: that is, y2 is correlated

with u1 only through v2 in level form. However, as shown in Murtazashvili and Wooldridge (2015), with

more heterogeneity, such as random coefficients, the unobservable u1 can contain full interactions

between v2 and (z, x1). Besides interactions, allowing for an unknown function h (·) of v2 as

in Lin (2016) does not make the dependence of u1 on v2 fully flexible, but it does add some

flexibility to this restrictive assumption.

2.2 Probit Endogenous Switching Models with Many Continuous EEVs

As we are interested in modeling some heterogeneity besides EEVs, we turn to a probit switching re-

gression with EEVs. The binary EEV y3 can be viewed as a switching indicator. In addition to shifting

intercepts when y3 appears by itself in the linear index, the switching can be made more general. Interacting

y3 with all the observables allows us to switch into regimes of differing slopes. The interaction between

y3 and unobservables indicates the two regimes have differing unobservables. The switching is endogenous

because y3 is correlated with the unobservables. In the treatment effect framework, y3 is the treatment

indicator and the treatment effect is heterogeneous. To see this, first write the model as follows:

y1 = 1 [(1− y3)x1β0 + y3x1β1 + (1− y3)u0 + y3u1 > 0] , (15a)

y2 = zΠ + v2, (15b)

y3 = 1 [zδ + u3 > 0] . (15c)

Under a similar set of notations and assumptions as in (1), write the linear projection of u1, u0 and u3

onto the reduced form error v2 in error forms:

u0 = v2θ0 + v0 (16a)

u1 = v2θ1 + v1 (16b)

u3 = v2η + v3, (16c)

where $\theta_0 \equiv [E(v_2' v_2)]^{-1} E(v_2' u_0)$, $\theta_1 \equiv [E(v_2' v_2)]^{-1} E(v_2' u_1)$ and $\eta \equiv [E(v_2' v_2)]^{-1} E(v_2' u_3)$. Then, we

maintain a strong exogeneity assumption that the remaining error terms v0 and v1 are independent of v2 and

a parametric assumption that each of them has a bivariate normal distribution with the remaining error term v3, with

covariances ρ0 and ρ1, respectively:

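$$D\left( \begin{pmatrix} v_0 \\ v_3 \end{pmatrix} \,\middle|\, v_2 \right) \sim \mathrm{Normal}\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & \rho_0 \\ \rho_0 & 1 \end{pmatrix} \right), \tag{17a}$$

$$D\left( \begin{pmatrix} v_1 \\ v_3 \end{pmatrix} \,\middle|\, v_2 \right) \sim \mathrm{Normal}\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & \rho_1 \\ \rho_1 & 1 \end{pmatrix} \right). \tag{17b}$$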

Again, assuming (v1, v3) and (v0, v3) are independent of z leads to an independence between y2 and

the joint distribution of (v0, v3) and (v1, v3)

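$$D\begin{pmatrix} v_0 \\ v_3 \end{pmatrix} = D\left( \begin{pmatrix} v_0 \\ v_3 \end{pmatrix} \,\middle|\, z, v_2 \right) = D\left( \begin{pmatrix} v_0 \\ v_3 \end{pmatrix} \,\middle|\, y_2, z, v_2 \right), \tag{18a}$$

$$D\begin{pmatrix} v_1 \\ v_3 \end{pmatrix} = D\left( \begin{pmatrix} v_1 \\ v_3 \end{pmatrix} \,\middle|\, z, v_2 \right) = D\left( \begin{pmatrix} v_1 \\ v_3 \end{pmatrix} \,\middle|\, y_2, z, v_2 \right). \tag{18b}$$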

Then, rewrite model (15) in the treatment framework:

$$y_1 = (1-y_3)\, y_1^{(0)} + y_3\, y_1^{(1)}, \tag{19}$$

$$y_1^{(0)} = 1[x_1\beta_0 + v_2\theta_0 + v_0 > 0], \tag{20}$$

$$y_1^{(1)} = 1[x_1\beta_1 + v_2\theta_1 + v_1 > 0], \tag{21}$$

$$y_3 = 1[z\delta + v_2\eta + v_3 > 0], \tag{22}$$

where $y_1^{(0)}$ is the potential outcome when the treatment y3 equals zero and $y_1^{(1)}$ is the potential outcome

when the treatment is one. The self-selection problem is represented by the non-zero correlation between

the treatment indicator y3 and the unobservables v0 and v1 in the potential outcomes. Those who self-select

into treatment inherently have a different distribution of unobservables from those who do not.

To consistently estimate the parameters in this model, a simple three-step control function approach

splits the above model into two Heckman sample selection models with sub-samples defined by the treatment

status:

1. Using all observations, estimate (15b), the reduced forms for y2, by ordinary least squares (OLS),

equation by equation, to obtain the residuals v̂2 = y2 − zΠ̂.

2. Since $y_1^{(1)}$ is observed only when y3 = 1, jointly estimate (21) and (22), the binary outcome equation

for $y_1^{(1)}$ and the sample selection equation for the indicator y3, by maximum likelihood estimation (MLE), replacing

v2 with v̂2, to obtain β̂1 and θ̂1.

3. Since $y_1^{(0)}$ is observed only when y3 = 0, jointly estimate (20) and (22), the binary response model for

$y_1^{(0)}$ and the sample selection equation for the indicator 1 − y3, by maximum likelihood estimation (MLE), replacing

v2 with v̂2, to obtain β̂0 and θ̂0.

The above procedure is justified by splitting the objective function for the second-step estimation into

two parts.

Namely, solving

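$$\begin{aligned}
&\max_{\mathbf{b}_0 \in \mathbb{R}^{K_1},\, \mathbf{b}_1 \in \mathbb{R}^{K_1},\, \mathbf{d} \in \mathbb{R}^{L},\, \mathbf{r}_0 \in \mathbb{R}^{G},\, \mathbf{r}_1 \in \mathbb{R}^{G},\, \mathbf{g} \in \mathbb{R}^{G},\, \rho_0 \in \mathbb{R},\, \rho_1 \in \mathbb{R}} E[\log P(y_1, y_3 \mid y_2, z, v_2)] \\
&\quad = E\Big[\, y_1 y_3 \log P\big(y_1^{(1)} = 1, y_3 = 1 \mid z, v_2, y_2\big) + (1-y_1)\, y_3 \log P\big(y_1^{(1)} = 0, y_3 = 1 \mid z, v_2, y_2\big) \\
&\qquad\quad + y_1 (1-y_3) \log P\big(y_1^{(0)} = 1, y_3 = 0 \mid z, v_2, y_2\big) + (1-y_1)(1-y_3) \log P\big(y_1^{(0)} = 0, y_3 = 0 \mid z, v_2, y_2\big) \Big],
\end{aligned} \tag{23}$$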

is equivalent to solving

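$$\begin{aligned}
&\max_{\mathbf{b}_1 \in \mathbb{R}^{K_1},\, \mathbf{d} \in \mathbb{R}^{L},\, \mathbf{r}_1 \in \mathbb{R}^{G},\, \mathbf{g} \in \mathbb{R}^{G},\, \rho_1 \in \mathbb{R}} E[\log P(y_1, y_3 \mid y_2, z, v_2)] \\
&\quad = E\Big[\, y_1^{(1)} y_3 \log P\big(y_1^{(1)} = 1, y_3 = 1 \mid z, v_2, y_2\big) + \big(1-y_1^{(1)}\big)\, y_3 \log P\big(y_1^{(1)} = 0, y_3 = 1 \mid z, v_2, y_2\big) \\
&\qquad\quad + (1-y_3) \log P(y_3 = 0 \mid z, v_2, y_2) \Big],
\end{aligned} \tag{24}$$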

and


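$$\begin{aligned}
&\max_{\mathbf{b}_0 \in \mathbb{R}^{K_1},\, \mathbf{d} \in \mathbb{R}^{L},\, \mathbf{r}_0 \in \mathbb{R}^{G},\, \mathbf{g} \in \mathbb{R}^{G},\, \rho_0 \in \mathbb{R}} E[\log P(y_1, y_3 \mid y_2, z, v_2)] \\
&\quad = E\Big[\, y_1^{(0)} (1-y_3) \log P\big(y_1^{(0)} = 1, y_3 = 0 \mid z, v_2, y_2\big) + \big(1-y_1^{(0)}\big)(1-y_3) \log P\big(y_1^{(0)} = 0, y_3 = 0 \mid z, v_2, y_2\big) \\
&\qquad\quad + y_3 \log P(y_3 = 1 \mid z, v_2, y_2) \Big],
\end{aligned} \tag{25}$$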

where

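$$P\big(y_1^{(1)} = 1, y_3 = 1 \mid z, v_2, y_2\big) = \int_{-q_3}^{\infty} \Phi(d_1)\,\phi(\upsilon_3)\,d\upsilon_3 \tag{26}$$

$$P\big(y_1^{(1)} = 0, y_3 = 1 \mid z, v_2, y_2\big) = \int_{-q_3}^{\infty} [1-\Phi(d_1)]\,\phi(\upsilon_3)\,d\upsilon_3 \tag{27}$$

$$P\big(y_1^{(0)} = 1, y_3 = 0 \mid z, v_2, y_2\big) = \int_{-\infty}^{-q_3} \Phi(d_0)\,\phi(\upsilon_3)\,d\upsilon_3 \tag{28}$$

$$P\big(y_1^{(0)} = 0, y_3 = 0 \mid z, v_2, y_2\big) = \int_{-\infty}^{-q_3} [1-\Phi(d_0)]\,\phi(\upsilon_3)\,d\upsilon_3 \tag{29}$$

$$P(y_3 = 1 \mid z, v_2, y_2) = \Phi(q_3) \tag{30}$$

$$P(y_3 = 0 \mid z, v_2, y_2) = 1 - \Phi(q_3) \tag{31}$$

$$d_1 \equiv \frac{x_1\mathbf{b}_1 + v_2\mathbf{r}_1 + \rho_1\upsilon_3}{\sqrt{1-\rho_1^2}} \tag{32}$$

$$d_0 \equiv \frac{x_1\mathbf{b}_0 + v_2\mathbf{r}_0 + \rho_0\upsilon_3}{\sqrt{1-\rho_0^2}} \tag{33}$$

$$q_3 \equiv z\mathbf{d} + v_2\mathbf{g}. \tag{34}$$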
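The three cell probabilities that enter the regime-1 objective (24) can be evaluated with a bivariate normal CDF, as in a probit model with sample selection. The sketch below is illustrative only: `a1` stands for x1b1 + v2r1, the function names are hypothetical, and the sign flips in the second cell follow from the symmetry of the normal distribution.

```python
import numpy as np
from scipy.stats import multivariate_normal, norm

def bvn_cdf(a, b, rho):
    """Bivariate standard normal CDF with correlation rho, evaluated at (a, b)."""
    cov = [[1.0, rho], [rho, 1.0]]
    return multivariate_normal.cdf([a, b], mean=[0.0, 0.0], cov=cov)

def regime1_probs(a1, q3, rho1):
    """Cell probabilities entering (24); a1 = x1*b1 + v2*r1, q3 as in (34)."""
    p11 = bvn_cdf(a1, q3, rho1)       # P(y1(1) = 1, y3 = 1), equation (26)
    p01 = bvn_cdf(-a1, q3, -rho1)     # P(y1(1) = 0, y3 = 1), equation (27)
    p_0 = norm.cdf(-q3)               # P(y3 = 0), equation (31)
    return p11, p01, p_0

def regime1_loglik(y1, y3, a1, q3, rho1):
    """Log-likelihood contribution of one observation to the sample version of (24)."""
    p11, p01, p_0 = regime1_probs(a1, q3, rho1)
    if y3 == 1:
        return np.log(p11) if y1 == 1 else np.log(p01)
    return np.log(p_0)                # outcome in regime 1 is unobserved

p11, p01, p_0 = regime1_probs(0.4, 0.1, 0.3)   # arbitrary illustrative values
print(p11 + p01 + p_0)                          # the three cells exhaust the sample space
```

The regime-0 objective (25) has the same structure with the roles of y3 and 1 − y3 exchanged.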

Similar to (12), the ASF for the endogenous switching model is a combination of the ASFs for the two

regimes:

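$$ASF(x_1) = \int_{-\infty}^{\infty} \big[ y_3\,\Phi(x_1\beta_1 + \upsilon_{i2}\theta_1) + (1-y_3)\,\Phi(x_1\beta_0 + \upsilon_{i2}\theta_0) \big] \phi(\upsilon_{i2})\,d\upsilon_{i2}. \tag{35}$$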

APEs for a continuous EEV y2 and binary EEV y3 are defined as follows respectively:

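$$APE_{y_2}(x_1) = \int_{-\infty}^{\infty} \Big[ \beta_{y_2}^{(1)}\, y_3\, \phi(x_1\beta_1 + \upsilon_{i2}\theta_1) + \beta_{y_2}^{(0)}\, (1-y_3)\, \phi(x_1\beta_0 + \upsilon_{i2}\theta_0) \Big] \phi(\upsilon_{i2})\,d\upsilon_{i2}, \tag{36a}$$

$$APE_{y_3}(x_1) = \int_{-\infty}^{\infty} \big[ \Phi(x_1\beta_1 + \upsilon_{i2}\theta_1) - \Phi(x_1\beta_0 + \upsilon_{i2}\theta_0) \big] \phi(\upsilon_{i2})\,d\upsilon_{i2}, \tag{36b}$$

where $\beta_{y_2}^{(1)}$ is the coefficient on y2 in (21) and $\beta_{y_2}^{(0)}$ is the coefficient on y2 in (20).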

Notice that the APE for a binary exogenous variable z1 is defined nontrivially as

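$$\begin{aligned}
APE_{z_1}(x_1) = \int_{-\infty}^{\infty} \Big\{ & y_3 \Big[ \Phi\big(x_1^{(1)}\beta_1 + \upsilon_{i2}\theta_1\big) - \Phi\big(x_1^{(0)}\beta_1 + \upsilon_{i2}\theta_1\big) \Big] \\
& + (1-y_3) \Big[ \Phi\big(x_1^{(1)}\beta_0 + \upsilon_{i2}\theta_0\big) - \Phi\big(x_1^{(0)}\beta_0 + \upsilon_{i2}\theta_0\big) \Big] \Big\}\, \phi(\upsilon_{i2})\,d\upsilon_{i2},
\end{aligned} \tag{37}$$

where $x_1^{(1)}$ denotes the explanatory variables at a particular fixed value with z1 = 1 and $x_1^{(0)}$ denotes the same

fixed value of the explanatory variables except that now z1 = 0.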

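Correspondingly, consistent estimates of the APEs are the sample analogues of (36a) and (36b) with consistent

estimates of the parameters plugged in:

$$\widehat{APE}_{y_2}(x_1) = N^{-1} \sum_{i=1}^{N} \Big[ \hat{\beta}_{y_2}^{(1)}\, y_3\, \phi\big(x_1\hat{\beta}_1 + \hat{v}_{i2}\hat{\theta}_1\big) + \hat{\beta}_{y_2}^{(0)}\, (1-y_3)\, \phi\big(x_1\hat{\beta}_0 + \hat{v}_{i2}\hat{\theta}_0\big) \Big], \tag{38a}$$

$$\widehat{APE}_{y_3}(x_1) = N^{-1} \sum_{i=1}^{N} \Big[ \Phi\big(x_1\hat{\beta}_1 + \hat{v}_{i2}\hat{\theta}_1\big) - \Phi\big(x_1\hat{\beta}_0 + \hat{v}_{i2}\hat{\theta}_0\big) \Big]. \tag{38b}$$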
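A short sketch of (38a) and (38b), assuming the regime-specific estimates are already available from the three-step procedure. The function names, argument layout and example values are hypothetical; `x1` is a fixed evaluation point and `y3` is the regime at which the effect of y2 is evaluated.

```python
import numpy as np
from scipy.stats import norm

def ape_switch_binary(x1, beta1, beta0, theta1, theta0, v2_hat):
    """(38b): average over i of Phi(x1*b1 + v_i*t1) - Phi(x1*b0 + v_i*t0)."""
    p1 = norm.cdf(x1 @ beta1 + v2_hat @ theta1)
    p0 = norm.cdf(x1 @ beta0 + v2_hat @ theta0)
    return (p1 - p0).mean()

def ape_switch_continuous(x1, y3, k, beta1, beta0, theta1, theta0, v2_hat):
    """(38a): regime-specific scaled densities, selected by the fixed value of y3."""
    d1 = norm.pdf(x1 @ beta1 + v2_hat @ theta1)
    d0 = norm.pdf(x1 @ beta0 + v2_hat @ theta0)
    return (y3 * beta1[k] * d1 + (1 - y3) * beta0[k] * d0).mean()

# Hypothetical values: x1 = (1, y2), one continuous EEV.
x1 = np.array([1.0, 0.5])
beta1, beta0 = np.array([0.3, 0.9]), np.array([-0.1, 0.4])
theta1, theta0 = np.array([0.5]), np.array([0.2])
v2_hat = np.random.default_rng(3).normal(size=(800, 1))

print(ape_switch_binary(x1, beta1, beta0, theta1, theta0, v2_hat))
print(ape_switch_continuous(x1, 1, 1, beta1, beta0, theta1, theta0, v2_hat))
```

When the two regimes share the same parameters, the estimated effect of y3 is zero by construction, which is a quick sanity check on any implementation.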

As before, instead of deriving complicated analytical formulas for the standard errors of the estimated APEs,

bootstrap standard errors can easily be computed to account for the sampling variation in the generated regressor

v̂2.

Although the switching model adds flexibility by allowing the structural error u ≡

(1− y3)u0 + y3u1 to depend not only on v2 but also on interactions between v2 and y3, the assumption that the

reduced forms for y2 remain unchanged across the two regimes is restrictive in empirical applications.

3 Test for Endogeneity from a Binary Explanatory Variable

This section focuses on variable addition tests for additional endogeneity from a binary explanatory

variable, conditioning on v̂2, the residuals from reduced forms for continuous EEVs. As we have seen in

equations (1) and (15), dealing consistently with a binary EEV requires making distributional

assumptions and conducting a joint estimation. In applications, we would like to avoid joint MLE

whenever possible, due to its sensitivity to the distributional assumptions and the computational difficulty in arriving at a

numerical solution. A variable addition test (VAT), as proposed in Wooldridge (2014), helps us determine

whether such a joint estimation is necessary by testing on generalized residuals before proceeding to a

joint estimation. In particular, if we have already controlled for endogeneity from the continuous EEVs by

conditioning on v̂2, the generalized residual is less likely to be correlated with the remaining unobservable.

The following shows that the VAT on the generalized residual is a valid test for endogeneity from a binary

explanatory variable because it is asymptotically equivalent to an LM test under the null hypothesis of no

endogeneity.

More formally, in the basic model (1), we are interested in testing the following null hypothesis:

H0 : ρ = 0. First, we present an infeasible Lagrange multiplier (score) test that has an asymptotic

$\chi^2_1$ distribution. Then, we show that, conditional on v2, the VAT on the generalized residual is asymptotically

equivalent to the infeasible LM test and thus has the same asymptotic $\chi^2_1$ distribution. In practice, in

order to account for the sampling error in v̂2, we bootstrap the two-step procedure to obtain the p-value of

the test. Let γ ≡ (β,θ) and wi ≡ (xi1,vi2). Let $\tilde{d}_i$ be $d_i$ in (8) evaluated at ρ = 0 and $\tilde{\gamma}$ be the estimate

of γ obtained from the restricted model. The restricted model is the one where ρ = 0, so that y3 is treated as an

exogenous explanatory variable. Let $\hat{q}_{3i}$ be $q_{3i}$ in (9) evaluated at the parameters $(\hat{\delta}, \hat{\eta})$ from a reduced-form

probit estimation.

As in Semykina and Wooldridge (2017), using the likelihood function Li ≡ P (yi1, yi3|yi2, zi,vi2)

for one observation, the LM statistic plugs the estimates from the restricted model into the score from the

unrestricted model:

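$$LM = \left( \sum_{i=1}^{N} \tilde{S}_{i,\rho} \right)' \tilde{A}^{22} \big[ \tilde{V}_{22} \big]^{-1} \tilde{A}^{22} \left( \sum_{i=1}^{N} \tilde{S}_{i,\rho} \right) \Big/ N, \tag{39}$$

where

$$\tilde{S}_{i,\rho} \equiv \frac{\partial \ln L_i}{\partial \rho} \bigg|_{\gamma=\tilde{\gamma},\,\rho=0} = \frac{y_{i1} - \Phi(\tilde{d}_i)}{\Phi(\tilde{d}_i)[1-\Phi(\tilde{d}_i)]}\, \phi(\tilde{d}_i)\, \widehat{gr}_{i3},$$

$$\tilde{A} \equiv -\frac{1}{N} \begin{pmatrix}
\sum_{i=1}^{N} E\Big( \frac{\partial^2 \ln L_i}{\partial\gamma\,\partial\gamma'} \Big| y_{i3}, y_{i2}, z_i, v_{i2} \Big) & \sum_{i=1}^{N} E\Big( \frac{\partial^2 \ln L_i}{\partial\rho\,\partial\gamma'} \Big| y_{i3}, y_{i2}, z_i, v_{i2} \Big) \\
\sum_{i=1}^{N} E\Big( \frac{\partial^2 \ln L_i}{\partial\gamma\,\partial\rho} \Big| y_{i3}, y_{i2}, z_i, v_{i2} \Big) & \sum_{i=1}^{N} E\Big( \frac{\partial^2 \ln L_i}{\partial\rho\,\partial\rho} \Big| y_{i3}, y_{i2}, z_i, v_{i2} \Big)
\end{pmatrix} \Bigg|_{\gamma=\tilde{\gamma},\,\rho=0}$$

$$= \frac{1}{N} \begin{pmatrix}
\sum_{i=1}^{N} \frac{\phi(\tilde{d}_i)^2}{\Phi(\tilde{d}_i)[1-\Phi(\tilde{d}_i)]}\, w_i' w_i & \sum_{i=1}^{N} \frac{\phi(\tilde{d}_i)^2}{\Phi(\tilde{d}_i)[1-\Phi(\tilde{d}_i)]}\, w_i'\, \widehat{gr}_{i3} \\
\sum_{i=1}^{N} \frac{\phi(\tilde{d}_i)^2}{\Phi(\tilde{d}_i)[1-\Phi(\tilde{d}_i)]}\, \widehat{gr}_{i3}\, w_i & \sum_{i=1}^{N} \frac{\phi(\tilde{d}_i)^2}{\Phi(\tilde{d}_i)[1-\Phi(\tilde{d}_i)]}\, \widehat{gr}_{i3}^{\,2}
\end{pmatrix},$$

$$\tilde{A}^{-1} = \begin{pmatrix} \tilde{A}^{11} & \tilde{A}^{12} \\ \tilde{A}^{21} & \tilde{A}^{22} \end{pmatrix}, \qquad
\tilde{V} = \tilde{A}^{-1} \tilde{B} \tilde{A}^{-1} = \begin{pmatrix} \tilde{V}_{11} & \tilde{V}_{12} \\ \tilde{V}_{21} & \tilde{V}_{22} \end{pmatrix}, \qquad
\tilde{B} \equiv \frac{1}{N} \sum_{i=1}^{N} \big( \tilde{S}_{i,\rho} \tilde{S}_{i,\rho}' \big),$$

$$\widehat{gr}_{i3} \equiv y_{i3}\, \frac{\phi(\hat{q}_{3i})}{\Phi(\hat{q}_{3i})} - (1-y_{i3})\, \frac{\phi(-\hat{q}_{3i})}{\Phi(-\hat{q}_{3i})}. \tag{40}$$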

Matrix $\tilde{A}$ above is an estimator of the expected value of the negative Hessian matrix that uses the expected

Hessian form. The outer product of scores or the usual Hessian form of the matrix could be used instead. $\widehat{gr}_{i3}$ is a

consistent estimator of $gr_{i3} \equiv E(v_{i3} \mid y_{i3}, y_{i2}, z_i, v_{i2})$.

A VAT can be carried out by the following procedure of testing on generalized residuals:

1. Use OLS to estimate the reduced-form equations (1b) for yi2 to obtain v̂i2.

2. Use probit to estimate the augmented reduced form for yi3 in (5b), and construct ĝri3 according to the

formula in equation (40).

3. Augment equation (5a) by ĝri3 and estimate by probit. Use the t statistic on ĝri3 to test the single

hypothesis that its coefficient is zero.
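The generalized residual in step 2 is straightforward to compute once the probit index has been estimated. The sketch below implements formula (40); the function name is hypothetical, and the inverse Mills ratios are computed in logs, which is an implementation choice made here for numerical stability rather than something taken from the paper.

```python
import numpy as np
from scipy.stats import norm

def generalized_residual(y3, q3_hat):
    """Equation (40): y3*phi(q)/Phi(q) - (1 - y3)*phi(-q)/Phi(-q)."""
    y3 = np.asarray(y3, dtype=float)
    q = np.asarray(q3_hat, dtype=float)
    # Inverse Mills ratios via logs, to avoid dividing by a tiny Phi(.) in the tails.
    imr_pos = np.exp(norm.logpdf(q) - norm.logcdf(q))
    imr_neg = np.exp(norm.logpdf(-q) - norm.logcdf(-q))
    return y3 * imr_pos - (1.0 - y3) * imr_neg

# At a zero index the residual is +/- phi(0)/Phi(0).
print(generalized_residual([1, 0], [0.0, 0.0]))
```

Given the index, the generalized residual has conditional mean zero, because Φ(q)·φ(q)/Φ(q) − [1 − Φ(q)]·φ(q)/[1 − Φ(q)] = 0. This is what makes it a natural proxy for the unobserved v3 in step 3.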

Under the null hypothesis the coefficient on ĝri3 is zero, and so estimation of the parameters in ĝri3 does

not affect the √N-asymptotic distribution of the test statistic. There is no need to account for the first-step

estimation of ĝri3 when performing the test. However, as in Wooldridge (2010, Section 12.5.2), we need

to adjust for the first-step estimation of vi2, by stacking the moment conditions or by bootstrapping the two-step

procedure.

The following shows that, conditional on vi2, the variable addition test is asymptotically equivalent to

the LM test. Write the second-step likelihood function for observation i as

$$L_i = \Phi(x_{i1}\beta + v_{i2}\theta + \tau\, gr_{i3})^{y_{i1}}\, [1 - \Phi(x_{i1}\beta + v_{i2}\theta + \tau\, gr_{i3})]^{1-y_{i1}}. \tag{41}$$

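As mentioned above, we ignore the fact that $gr_{i3}$ is estimated in the first step. The score

vector of (41) is

$$S_i = \begin{pmatrix} \dfrac{\partial \ln L_i}{\partial \gamma} \\[2mm] \dfrac{\partial \ln L_i}{\partial \tau} \end{pmatrix}
= \frac{y_{i1} - \Phi(w_i\gamma + \tau\, gr_{i3})}{\Phi(w_i\gamma + \tau\, gr_{i3})\,[1 - \Phi(w_i\gamma + \tau\, gr_{i3})]}\; \phi(w_i\gamma + \tau\, gr_{i3}) \begin{pmatrix} w_i' \\ gr_{i3} \end{pmatrix}. \tag{42}$$

Summing the score vector over all i and using a mean-value expansion about the true parameter vector

gives

$$N^{-1/2} \sum_{i=1}^{N} \hat{S}_i = N^{-1/2} \sum_{i=1}^{N} S_i - A \sqrt{N} \begin{pmatrix} \hat{\gamma} - \gamma \\ \hat{\tau} - \tau \end{pmatrix} + o_p(1) = 0, \tag{43}$$

where $\hat{S}_i$ is the score vector evaluated at the estimated parameters $(\hat{\gamma}', \hat{\tau})'$, and $A$ is the expected value of

the negative Hessian matrix. Hence,

$$\sqrt{N} \begin{pmatrix} \hat{\gamma} - \gamma \\ \hat{\tau} - \tau \end{pmatrix} = A^{-1} \left[ N^{-1/2} \sum_{i=1}^{N} S_i \right] + o_p(1). \tag{44}$$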
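When testing H0 : τ = 0, the robust Wald test statistic is given by

$$W = (\hat{\tau} - \tau)' \big( \hat{V}_{22}/N \big)^{-1} (\hat{\tau} - \tau) = \sqrt{N}\,(\hat{\tau} - \tau)'\, \hat{V}_{22}^{-1}\, \sqrt{N}\,(\hat{\tau} - \tau), \tag{45}$$

where

$$\hat{V} = \hat{A}^{-1} \hat{B} \hat{A}^{-1} = \begin{pmatrix} \hat{V}_{11} & \hat{V}_{12} \\ \hat{V}_{21} & \hat{V}_{22} \end{pmatrix}, \tag{46}$$

$$\hat{B} = \frac{1}{N} \sum_{i=1}^{N} \big( \tilde{S}_{i,\rho} \tilde{S}_{i,\rho}' \big), \tag{47}$$

$$\hat{A} = \frac{1}{N} \begin{pmatrix}
\sum_{i=1}^{N} \frac{\phi(\hat{p}_i)^2}{\Phi(\hat{p}_i)[1-\Phi(\hat{p}_i)]}\, w_i' w_i & \sum_{i=1}^{N} \frac{\phi(\hat{p}_i)^2}{\Phi(\hat{p}_i)[1-\Phi(\hat{p}_i)]}\, w_i'\, \widehat{gr}_{i3} \\
\sum_{i=1}^{N} \frac{\phi(\hat{p}_i)^2}{\Phi(\hat{p}_i)[1-\Phi(\hat{p}_i)]}\, \widehat{gr}_{i3}\, w_i & \sum_{i=1}^{N} \frac{\phi(\hat{p}_i)^2}{\Phi(\hat{p}_i)[1-\Phi(\hat{p}_i)]}\, \widehat{gr}_{i3}^{\,2}
\end{pmatrix}, \tag{48}$$

$$\hat{p}_i = w_i\hat{\gamma} + \hat{\tau}\, \widehat{gr}_{i3}, \tag{49}$$

$$\hat{A}^{-1} \xrightarrow{p} A^{-1} = \begin{pmatrix} A^{11} & A^{12} \\ A^{21} & A^{22} \end{pmatrix}. \tag{50}$$

So the Wald statistic can also be written as

$$W = \left( \sum_{i=1}^{N} S_{i,\tau} \right)' A^{22}\, \hat{V}_{22}^{-1}\, A^{22} \left( \sum_{i=1}^{N} S_{i,\tau} \right) \Big/ N. \tag{51}$$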

Under the null of no selection bias (τ = 0, ρ = 0), the score and Hessian matrices used in (39) and (51)

are the same when evaluated at the true parameter values. When the null is true, $\hat{\tau} \xrightarrow{p} 0$, and $\sqrt{N}(\hat{\gamma} - \gamma)$ and

$\sqrt{N}(\tilde{\gamma} - \gamma)$ converge in distribution. Therefore, $LM - W \xrightarrow{p} 0$, so the tests are asymptotically equivalent.

The p-value for the test can be obtained by bootstrapping the two-step procedure.

4 Quasi-LIML and Fractional Response

Based on the literature of Quasi-MLE (White, 1982), the findings above carry through if f1 is a fractional

response with a conditional mean that happens to have a probit form. The key insight from quasi-likelihood

estimation is that we do not need to know the true distribution of the entire model to obtain consistent param-

eter estimates. This likelihood function could also be applied to the case where y1 is a fractional response,

as long as we model the conditional mean of y1 to have a probit form. With the Bernoulli distribution being

in the linear exponential family, quasi-LIML would identify parameters in a correctly specified conditional

mean regardless of misspecification in other aspects of the distribution.

Namely,

E (f1|x1, c1) = Φ (x1β+c1) (52a)

y2 = zΠ + v2 (52b)

y3 = 1 [zδ + u3 ≥ 0] , (52c)

where c1 is an omitted variable thought to be correlated with y2 and y3. Assuming that c1 is jointly normally

distributed with v2 and u3, the linear projections of c1 and u3 onto v2 have the following error form:

c1 = v2θ+a1 (53a)

u3 = v2η + v3 (53b)

where $\theta \equiv [E(v_2' v_2)]^{-1} E(v_2' c_1)$ and $\eta \equiv [E(v_2' v_2)]^{-1} E(v_2' u_3)$. Plugging the linear projections (53a) and

(53b) back into (52a) and (52c), we have an augmented equation for the conditional mean of f1 and the reduced

form for y3:

E (f1|x1,v2, a1) = Φ (x1β + v2θ+a1) (54a)

y3 = 1 [zδ + v2η + v3 ≥ 0] , (54b)

where a1 is the remaining unobservable factor that, after conditioning on v2, captures the additional endogeneity from y3 through v3. Again, assume that a1 and v3 are jointly normal conditional on v2:

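\[
D\left( \begin{pmatrix} a_1 \\ v_3 \end{pmatrix} \middle|\, v_2 \right) \sim \text{Normal}\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \sigma_a^2 & \rho\sigma_a \\ \rho\sigma_a & 1 \end{pmatrix} \right), \tag{55}
\]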

where $\sigma_a^2 \equiv \mathrm{Var}(a_1)$ and ρ is the correlation between a1 and v3 (so that their covariance is ρσa). Further averaging out the unobservable a1, the conditional mean of the joint distribution of f1 and y3 has exactly the same form as the probit model with many continuous EEVs and one binary EEV in (1).

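\[
\begin{aligned}
E(f_1, y_3 = 1 \mid z, v_2, y_2) &= E(y_1, y_3 = 1 \mid z, v_2, y_2) = P(y_1 = 1, y_3 = 1 \mid z, v_2, y_2) \\
&= \int_{-q_3}^{\infty} \Phi(d)\,\phi(\upsilon_3)\,d\upsilon_3,
\end{aligned} \tag{56a}
\]

\[
\begin{aligned}
E(f_1, y_3 = 0 \mid z, v_2, y_2) &= E(y_1, y_3 = 0 \mid z, v_2, y_2) = P(y_1 = 1, y_3 = 0 \mid z, v_2, y_2) \\
&= \int_{-\infty}^{-q_3} \Phi(d)\,\phi(\upsilon_3)\,d\upsilon_3,
\end{aligned} \tag{56b}
\]

where

\[
d \equiv \frac{x_1\beta + v_2\theta + \rho\upsilon_3}{\sqrt{1 + (1-\rho^2)\sigma_a^2}}, \tag{57a}
\]

\[
q_3 \equiv z\delta + v_2\eta. \tag{57b}
\]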

Because the Bernoulli log likelihood belongs to the linear exponential family, the solution from the

following maximization problem identifies (β,θ):


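\[
\begin{aligned}
E\left[\log P(f_1, y_3 \mid y_2, z, v_2)\right] = E\Big[\, & f_1 y_3 \log \int_{-q_3}^{\infty} \Phi(d)\,\phi(\upsilon_3)\,d\upsilon_3 \\
& + (1-f_1)\, y_3 \log \int_{-q_3}^{\infty} [1-\Phi(d)]\,\phi(\upsilon_3)\,d\upsilon_3 \\
& + f_1 (1-y_3) \log \int_{-\infty}^{-q_3} \Phi(d)\,\phi(\upsilon_3)\,d\upsilon_3 \\
& + (1-f_1)(1-y_3) \log \int_{-\infty}^{-q_3} [1-\Phi(d)]\,\phi(\upsilon_3)\,d\upsilon_3 \Big].
\end{aligned} \tag{58}
\]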
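The quasi-likelihood argument can be checked numerically in a stripped-down case. The sketch below assumes no endogenous regressors, so that f1 = Φ(x1β + a1) with a1 normal and independent of x1; averaging out a1 then gives E(f1|x1) = Φ(x1β/√(1 + σa²)), which is the scaling in (57a) with ρ = 0. Maximizing the Bernoulli log likelihood with the fractional f1 in place of a binary outcome should recover the scaled coefficients. The function names and the parameter values are hypothetical choices made for illustration only.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm


def bernoulli_quasi_loglik(beta, X, f):
    """Bernoulli log likelihood evaluated at a fractional response f in [0, 1]."""
    idx = X @ beta
    return np.sum(f * norm.logcdf(idx) + (1.0 - f) * norm.logcdf(-idx))


def fit_fractional_probit(X, f):
    """Quasi-MLE of a probit conditional mean for a fractional response."""
    objective = lambda b: -bernoulli_quasi_loglik(b, X, f) / len(f)
    return minimize(objective, np.zeros(X.shape[1]), method="BFGS").x


rng = np.random.default_rng(0)
n = 20_000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta = np.array([0.2, -0.8])          # illustrative structural coefficients
sigma_a = 1.0                         # illustrative standard deviation of a1
a1 = rng.normal(scale=sigma_a, size=n)
f1 = norm.cdf(X @ beta + a1)          # fractional response, not Bernoulli distributed

beta_hat = fit_fractional_probit(X, f1)
beta_scaled = beta / np.sqrt(1.0 + sigma_a ** 2)   # what the conditional mean identifies
print(beta_hat, beta_scaled)
```

The outcome here is never exactly zero or one, so the Bernoulli distribution is misspecified; only the probit form of the conditional mean is correct, which is all the quasi-likelihood argument requires.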

5 Monte Carlo Simulations

In this section, six Monte Carlo experiments are conducted to compare the finite sample behavior of different estimators for binary response models with both continuous and discrete EEVs. The six Monte Carlo experiments fall into two designs. In the first design, the error terms (u1, v2, u3) are jointly normally distributed. In the second design, conditional on v2, u1 and u3 are assumed to have a bivariate normal distribution. For each design, three data generating processes (DGPs) are considered: a just identification case, an over identification case, and a switching model with two regimes. Nine estimators are compared in each case: four estimators assume a linear probability model for the binary outcome, and the other five acknowledge the nonlinear functional form. APEs are simulated for those estimators that respect the nonlinear functional form and are compared with coefficients from linear estimators.

More specifically, in the first design of joint normality, the DGP for the Just ID is

y1 = 1 [−y2 + y3 + 0.3z1 + 0.3z2 + 0.5v2 + 0.5v3 + r1 > 0]

y2 = 0.1z1 + 0.2z2 + 0.1z3 + z4 + v2 (59)

y3 = 1 [0.2z1 + 0.1z2 + z3 + 0.1z4 + 0.5v2 + v3 > 0] ,

where

u1 = 0.5v2 + v1 (60)

v1 = 0.5v3 + r1 (61)

r1 ∼ Normal (0, 0.5) (62)

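so that

\[
\begin{pmatrix} u_1 \\ v_2 \\ v_3 \end{pmatrix} \sim \text{Normal}\left( \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & 0.5 & 0.5 \\ 0.5 & 1 & 0 \\ 0.5 & 0 & 1 \end{pmatrix} \right). \tag{63}
\]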

The continuous EEV y2 and the binary EEV y3 are generated to have coefficients of opposite signs in order to show how biased estimators react to the sign difference. The exogenous variables are generated as:

z1 ∼ Normal (0, 1)

e2 ∼ Normal (0, 1)

z2 = 1 [e2 > 0]

z3 ∼ Normal (0, 1)

e4 ∼ Normal (0, 1)

z4 = 1 [e4 > 0] ,

where the continuous z3 is the instrument mainly for the binary EEV y3 and the binary z4 is the instrument mainly for the continuous EEV y2. To make them valid instruments, z3 and z4 are excluded from the structural equation.

In this DGP, the true ASF is defined as

ASF (x1) = Φ (−y2 + y3 + 0.3z1 + 0.3z2) . (64)
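The Just ID design can be sketched in a few lines of code. The block below is an illustration rather than the authors' simulation code: it draws one large sample from (59)-(63) and approximates the true APEs by averaging the derivative of the ASF in (64) with respect to y2 and its discrete change in y3 over the simulated covariates. It reads Normal(0, 0.5) in (62) as a variance of 0.5, which is what makes Var(u1) = 1 in (63); the function names are hypothetical.

```python
import numpy as np
from scipy.stats import norm


def simulate_just_id(n, rng):
    """One draw of size n from the Just ID design in (59)-(63)."""
    z1 = rng.normal(size=n)
    z2 = (rng.normal(size=n) > 0).astype(float)    # z2 = 1[e2 > 0]
    z3 = rng.normal(size=n)
    z4 = (rng.normal(size=n) > 0).astype(float)    # z4 = 1[e4 > 0]
    v2 = rng.normal(size=n)
    v3 = rng.normal(size=n)
    r1 = rng.normal(scale=np.sqrt(0.5), size=n)    # variance 0.5 (assumed reading of (62))
    u1 = 0.5 * v2 + 0.5 * v3 + r1                  # (60)-(61), so that Var(u1) = 1
    y2 = 0.1 * z1 + 0.2 * z2 + 0.1 * z3 + z4 + v2
    y3 = (0.2 * z1 + 0.1 * z2 + z3 + 0.1 * z4 + 0.5 * v2 + v3 > 0).astype(float)
    y1 = (-y2 + y3 + 0.3 * z1 + 0.3 * z2 + u1 > 0).astype(float)
    return {"y1": y1, "y2": y2, "y3": y3, "z1": z1, "z2": z2,
            "u1": u1, "v2": v2, "v3": v3}


def true_apes(d):
    """Average the derivative (y2) and the discrete change (y3) of the ASF in (64)."""
    base = -d["y2"] + 0.3 * d["z1"] + 0.3 * d["z2"]
    ape_y2 = np.mean(-norm.pdf(base + d["y3"]))            # coefficient on y2 is -1
    ape_y3 = np.mean(norm.cdf(base + 1.0) - norm.cdf(base))
    return ape_y2, ape_y3


d = simulate_just_id(200_000, np.random.default_rng(0))
ape_y2, ape_y3 = true_apes(d)
print(round(ape_y2, 3), round(ape_y3, 3))
```

The simulated values can be compared with the true APEs reported in the case headings of Table 1 and Table 2 for the Just ID case under joint normality.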

The second case of over identification has the same parameters except that we have two additional

instruments z5 and z6, where

z5 ∼ Normal (0, 1)

e6 ∼ Normal (0, 1)

z6 = 1 [e6 > 0] .

Continuous z5 is mainly for the continuous EEV y2 and binary z6 is mainly for the binary EEV y3. The true

ASF remains the same as in (64).

In the endogenous switching case, the coefficients on the continuous EEV y2 and the correlations between the reduced form errors and the structural error are designed to have opposite directions across regimes, namely

y(1)1 = 1 [−y2 + y3 + 0.3z1 + 0.3z2 + 0.5v2 + v1 > 0]

y(0)1 = 1 [0.3y2 + y3 − 0.5z1 + 0.1z2 − 0.5v2 + v0 > 0] (65)

y2 = 0.1z1 + 0.2z2 + 0.1z3 + z4 + v2

y3 = 1 [0.2z1 + 0.1z2 + z3 + 0.1z4 + 0.5v2 + v3 > 0] ,

where

u0 = −0.5v2 + v0 (66)

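v0 = −0.5v3 + r1 (67)

\[
\begin{pmatrix} u_0 \\ v_2 \\ v_3 \end{pmatrix} \sim \text{Normal}\left( \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & -0.5 & -0.5 \\ -0.5 & 1 & 0 \\ -0.5 & 0 & 1 \end{pmatrix} \right). \tag{68}
\]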


ASF in this case is

ASF (x1) = y3Φ (−y2 + y3 + 0.3z1 + 0.3z2) + (1− y3) Φ (0.3y2 + y3 − 0.5z1 + 0.1z2) . (69)

In design 2, the parameterizations are the same as in design 1, but we assume v2 follows a demeaned chi-squared distribution with one degree of freedom:

\[
v_2 \sim \chi^2_1 - 1. \tag{70}
\]

In all experiments, the number of replications is 1000, and the results of the experiments are presented

for sample sizes of 1000, 3000 and 5000. Table 1 and Table 2 report biases and the root mean squared errors

(RMSEs) for estimators of APE for y2 and y3, respectively. Figure 1 and Figure 2 depict the empirical

distributions of estimators of APE for y2 with sample size of 1000 under design 1 and design 2, respectively.

Similarly, Figure 3 and Figure 4 depict the counterparts for y3.

For each of the above designs, coefficients of linear probability models and APEs of probit models are

compared. Further, for probit models, joint estimations with the binary EEV or all EEVs are compared with

two-step estimations with control function terms (residuals or generalized residuals) plugged in. In addition,

a switching version of each model is considered to account for the endogenous switching DGP in case 3.

More specifically, CF Biprobit is the control function approach inserting first-stage residual from reduced-

form estimation of y2 into the second-step joint estimation between y1 and y3. CF Biprobit Switching

performs Heckman probit with sample selection for y(1)1 and y(0)1 separately using y3 as a sample selection

indicator. CF Probit avoids the joint estimation with y3 by inserting a generalized residual from y3 as a

proxy for endogeneity given residual from y2. CF Probit Switching performs the CF Probit separately for

sub-samples defined by y3. CF Linear inserts a residual from y2 and a generalized residual from y3 into the

linear probability model for y1. CF Linear Switching allows for a full set of interactions between y3 and

other observables and unobservables in the linear probability model. Usual 2SLS uses linear probability

models for both y1 and y3 and applies the usual two-step IV estimation. Optimal IV uses predicted values from the reduced forms for y2 and y3 as instruments for a linear probability model of y1, where y2 is predicted from a linear model and y3 is predicted from a probit model. Joint MLE is a full joint estimation of y1, y2 and y3.
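To make the CF Probit estimator concrete, the following sketch runs the two steps on one simulated sample from the Just ID design in (59)-(63). It is a minimal illustration, not the authors' code. Two choices are assumptions: the first-stage probit for y3 includes the residual v̂2, in line with (54b), and the APEs are computed by averaging the estimated probit response over all pairs of observed covariates and estimated control function terms, which is one reading of the "sequential averaging" mentioned in the notes to Table 1. All function names are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm


def gen_resid(y, idx):
    """Probit generalized residual, E[error | y, index] for a unit-variance normal error."""
    return (y * np.exp(norm.logpdf(idx) - norm.logcdf(idx))
            - (1 - y) * np.exp(norm.logpdf(idx) - norm.logcdf(-idx)))


def fit_probit(y, W):
    """Probit MLE by BFGS; the score is the generalized residual times W."""
    nll = lambda g: -np.mean(y * norm.logcdf(W @ g) + (1 - y) * norm.logcdf(-(W @ g)))
    grad = lambda g: -(W * gen_resid(y, W @ g)[:, None]).mean(axis=0)
    return minimize(nll, np.zeros(W.shape[1]), jac=grad, method="BFGS").x


def cf_probit_apes(y1, y2, y3, Z1, Z):
    """Two-step CF Probit. Z1: included exogenous variables, Z: all exogenous (both with a constant)."""
    # Step 1a: OLS reduced form for the continuous EEV; keep the residual.
    v2_hat = y2 - Z @ np.linalg.lstsq(Z, y2, rcond=None)[0]
    # Step 1b: probit reduced form for the binary EEV given v2_hat; keep the generalized residual.
    W3 = np.column_stack([Z, v2_hat])
    gr3_hat = gen_resid(y3, W3 @ fit_probit(y3, W3))
    # Step 2: probit for y1 with both control function terms added.
    W1 = np.column_stack([y2, y3, Z1, v2_hat, gr3_hat])
    g = fit_probit(y1, W1)
    # APEs: average over every pair (x_i, control terms_j).
    k = 2 + Z1.shape[1]                    # columns belonging to x1 = (y2, y3, Z1)
    cf = W1[:, k:] @ g[k:]                 # control function part for each draw j
    X = W1[:, :k]
    ape_y2 = g[0] * np.mean(norm.pdf((X @ g[:k])[:, None] + cf[None, :]))
    X_on, X_off = X.copy(), X.copy()
    X_on[:, 1], X_off[:, 1] = 1.0, 0.0
    asf_mean = lambda M: np.mean(norm.cdf((M @ g[:k])[:, None] + cf[None, :]))
    ape_y3 = asf_mean(X_on) - asf_mean(X_off)
    return ape_y2, ape_y3


# One simulated sample from the Just ID design under joint normality.
rng = np.random.default_rng(1)
n = 2000
z1, z3 = rng.normal(size=n), rng.normal(size=n)
z2 = (rng.normal(size=n) > 0).astype(float)
z4 = (rng.normal(size=n) > 0).astype(float)
v2, v3 = rng.normal(size=n), rng.normal(size=n)
r1 = rng.normal(scale=np.sqrt(0.5), size=n)
y2 = 0.1 * z1 + 0.2 * z2 + 0.1 * z3 + z4 + v2
y3 = (0.2 * z1 + 0.1 * z2 + z3 + 0.1 * z4 + 0.5 * v2 + v3 > 0).astype(float)
y1 = (-y2 + y3 + 0.3 * z1 + 0.3 * z2 + 0.5 * v2 + 0.5 * v3 + r1 > 0).astype(float)
const = np.ones(n)
Z1 = np.column_stack([const, z1, z2])
Z = np.column_stack([const, z1, z2, z3, z4])
ape_y2_hat, ape_y3_hat = cf_probit_apes(y1, y2, y3, Z1, Z)
print(ape_y2_hat, ape_y3_hat)
```

The switching variants would repeat step 2 separately on the sub-samples with y3 = 1 and y3 = 0, and the linear variants would replace the second-step probit with OLS.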

For the APE of y2 under joint normality as in Figure 1, CF Biprobit and Joint MLE are the consistent

estimators in the Just ID case and Over ID case while CF Biprobit Switching is the consistent estimator in the

Switching case. Their empirical distributions are centered around the true APE depicted by the red vertical


line. Besides those consistent estimators, approximations provided by the CF Probit (or CF Probit Switching

in the Switching case) outperform, to a great extent, the approximations provided by the linear probability

estimators such as CF Linear (or CF Linear Switching in the Switching case), Usual 2SLS and Optimal IV.

In fact, in the Switching case, CF Probit Switching and CF Biprobit Switching (the consistent estimator in

this case) seem to completely overlap with each other, suggesting a negligible amount of bias. In the Just

ID case and Over ID case, CF Probit has a mild amount of upward bias and a slightly lower peak than CF

Biprobit and Joint MLE. In contrast, approximations provided by linear probability model estimators (CF

Linear, Usual 2SLS, Optimal IV and CF Linear Switching) have a significant amount of downward bias in

all cases. The differences in bias within the linear probability model estimators are not noticeable: they all

seem to cluster together. In the Switching case, they are joined by the misspecified CF Biprobit and Joint

MLE which have a similar amount of downward bias. When CF Biprobit and Joint MLE are consistent, they

still completely overlap with each other. This happens not only in the Just ID case but also in the Over ID

case, suggesting a negligible amount of efficiency loss by carrying out a two-step procedure. CF Biprobit

Switching and CF Probit Switching, however, suffer a slightly flatter peak compared to their counterpart

non-switching estimators (CF Biprobit and CF Probit) in the Just ID case and Over ID case, indicating an

efficiency loss from a more complex parameterization.

When the error terms are conditionally normal, the estimators of the APEs for y2 have fairly different finite sample behavior from that under joint normality. As reflected in Figure 2, Joint MLE lacks robustness

and is no longer the consistent estimator in any case. As before, CF Biprobit is the consistent estimator in

the Just ID case and Over ID case while CF Biprobit Switching is the consistent estimator in the Switching

case. Approximations provided by the CF Probit (or CF Probit Switching in the Switching case) are still the

best: they almost overlap with those provided by CF Biprobit (or CF Biprobit Switching in the Switching

case), the consistent estimator. Joint MLE is biased upwards to a noticeable degree in the Just ID case and

Over ID case. In the Switching case where Joint MLE is misspecified, it is biased downward and joined

by other inconsistent estimators like CF Biprobit and linear probability estimators (CF Linear, Usual 2SLS

and Optimal IV). CF Linear Switching performs mildly better than other linear estimators in the Switching case but is still more biased than the consistent estimator. Overall, linear probability estimators continue to perform poorly in all cases: they are severely biased downwards. The CF Biprobit Switching and CF

Probit Switching still lead to efficiency loss indicated by flatter peaks in the Just ID case and Over ID case.

Under joint normality, APE of y3 follows a similar pattern as that of y2, with some minor differences.


As in Figure 3, CF Biprobit and Joint MLE still overlap with each other in all cases, whether as consistent

estimators in the Just ID case and Over ID case, or as misspecified estimators in the Switching case. The

approximation provided by CF Probit (or CF Probit Switching in the Switching case) is still the best but with

a flatter peak than those in Figure 1. The linear probability estimators are biased upwards. The differences

in empirical distributions for the linear estimators are more pronounced in the binary EEV y3 than for

continuous EEV y2. Particularly, CF Linear using the generalized residual is no longer close to Usual 2SLS

using linear probability model for y3. In the Switching case, linear probability estimators all lie between the

CF Biprobit Switching and CF Biprobit with varying degrees of bias and precision.

APEs of y3 under conditional normality are sketched in Figure 4. Like estimators in Figure 3, they

have identical patterns with their counterparts for y2. More specifically, Joint MLE is biased downwards in

the Just ID case and Over ID case but biased upwards in the Switching case. Similarly, linear probability

estimators are biased upwards rather than downwards. CF Probit (or CF Probit Switching in the Switching

case) still provides the best approximations in all cases, significantly better than linear probability estimators.

Table 1 and Table 2 report the bias and RMSE of y2 and y3 for all the estimators in the six cases,

respectively. Despite the difference in sign and magnitude, the patterns of estimators for y2 and y3 are

similar. Methods using CF approaches are listed in Column (1) through (6), followed by conventional

methods like IV 2SLS, Opt. IV 2SLS and Joint MLE from Column (7) to (9). With the increase of sample

size from 1000 to 5000, the bias of CF Biprobit in the Just ID case and Over ID case (or CF Biprobit

Switching in the Switching case) shrinks drastically to zero. Their RMSEs also decrease by about half at

the same time. The bias of CF Probit in the Just ID case and Over ID case (or CF Probit Switching in the Switching case) is small at a sample size of 1000 but shrinks by a smaller magnitude as the sample size increases to 5000. The RMSEs of CF Probit or CF Probit Switching also decrease by about half as the sample size increases. The bias of the linear probability estimators (CF 2SLS, CF 2SLS Switching, IV 2SLS and Opt. IV 2SLS) and the misspecified Joint MLE, however, is large to start with and does not shrink, or even increases in some cases, as the sample size increases. Their RMSEs also do not decrease as much.

In summary, the Monte Carlo results show that CF Biprobit does not lose efficiency compared to Joint MLE in the correctly specified case. CF Probit (or CF Probit Switching in the Switching case) provides good approximations, outperforming linear estimators of any sort to a great extent.


6 Empirical Illustration

As an empirical illustration, we revisit the empirical example of Murtazashvili and Wooldridge (2015)

under different functional form assumptions and estimation methods. Murtazashvili and Wooldridge (2015)

study the sensitivity of the budget share of housing expenditure to price and total expenditure using a linear

probability panel data model with many sources of heterogeneity. Total expenditure is considered to be the

continuous EEV because of its joint determination with the budget share on housing expenditure. Home-

ownership dummy is considered to be the binary EEV. It is also assumed to play the role of an endogenous

switching indicator that is employed for the budget share of housing expenditure equation. Here, instead,

we employ a fractional response model with switching, as in (71), that acknowledges the fractional nature

of the budget share and therefore has built-in heterogeneity.

E(HousingShare|x1, c1, c0) = Φ [β0 + β1Log(Expend.) + β2Homeowner

+z1β3 + β4Log(Expend.) · Homeowner

+β5Homeowner · z1 + c0 + Homeowner · c1] , (71a)

Log(Expend.) = ζ0 + zζ1 + v2, (71b)

Homeowner = 1 [γ0 + zγ1 + u3 > 0] . (71c)

We also use just one cross-sectional period from the sample, which turns out to give fairly close estimates

of marginal effect for the variables of interest to the panel linear model with many sources of heterogeneity.

The summary APEs from the nonlinear model are compared to the coefficients in linear probability panel data models.

The sample employed in the estimation is the 2001 wave of the Panel Study of Income Dynamics (PSID)

that consists of 2355 owners and 629 renters. Since we suspect that the homeownership dummy indicates

switching into differing regions, we report separate summary statistics for different home ownership statuses

as in Table 3. Due to the way the dependent variable housing budget share is constructed, and the increase

in the price for homes, 84 out of 2355 home owners face a negative housing budget share. As the dependent variable has to be in the unit interval for a fractional response, the housing budget shares for these 84 homeowners are set to their lower bound of zero. On average, owners spend smaller budget shares on housing than renters.


The total expenditure and income of owners are greater than those of renters. Log price, age of the household

head, marital status, whether recently moved and race are the exogenous control variables. Log income

is considered to be the instrument primarily for the log expenditure, whereas years of education of the

household head and number of children in the household are instruments mainly for home ownership.

Table 4 reports the first-stage reduced-form estimation for the two EEVs. Linear reduced form regres-

sions are reported for log total expenditure, the continuous EEV, in Column (1) and (2). Probit regression is

reported for home ownership, the binary EEV, in Column (3) to (5). Slight variations in the specifications are

reported in each case. For example, age squared is included in Column (2) in addition to age in its level form

as in Column (1), which turns out to be significant but practically unimportant. For any specification, the

instruments mentioned above are strong enough. The probit reduced forms for the homeownership dummy

are reported with and without the continuous EEV. The predicted value of home ownership from Column

(3) contains only an exogenous variable and is used as an instrument in Regression (3) in Table 5. Columns

(4) and (5) show that including the residual from the reduced form of the log expenditure is sufficient to

control for all the endogeneity from the total expenditure in the home ownership equation.

Table 5 compares the APEs from the fractional response models to the coefficients of linear models.

Columns (1) to (4) report the coefficients from linear models for the housing budget share, and Columns

(5) to (10) report the APEs from fractional response models. The same set of estimators as in the Monte

Carlo study are compared here, the only difference being that the dependent variable is a fractional response,

instead of a binary response. A “Frac” is added to the names to indicate that a quasi-probit is assumed for the

housing budget share. When homeownership is jointly estimated in the fractional probit, as in Column (9) and Column (10), the biprobit and heckprobit commands in Stata are modified to allow for a fractional dependent variable. The standard errors for the estimates of APEs are bootstrapped.
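Bootstrapping a two-step estimator means repeating both steps on each resampled data set, so that the first-stage sampling error is carried into the standard error. The sketch below illustrates this with a deliberately simple linear control function estimator on simulated data; it stands in for the fractional probit used here only to keep the example short. The function names, the data generating process, and the number of replications are all hypothetical.

```python
import numpy as np


def cf_linear_coef(data):
    """Two-step CF estimate of the coefficient on y2; columns of data are (y1, y2, z1, z2)."""
    y1, y2, z1, z2 = data.T
    const = np.ones(len(y1))
    Z = np.column_stack([const, z1, z2])
    v2_hat = y2 - Z @ np.linalg.lstsq(Z, y2, rcond=None)[0]   # step 1: reduced-form residual
    W = np.column_stack([const, y2, z1, v2_hat])               # step 2: add the residual
    return np.linalg.lstsq(W, y1, rcond=None)[0][1]


def bootstrap_se(estimator, data, n_boot, rng):
    """Nonparametric (pairs) bootstrap standard error of a scalar estimator."""
    n = data.shape[0]
    draws = np.empty(n_boot)
    for b in range(n_boot):
        rows = rng.integers(0, n, size=n)   # resample observations with replacement
        draws[b] = estimator(data[rows])    # redo BOTH steps on the resampled data
    return draws.std(ddof=1)


rng = np.random.default_rng(0)
n = 2000
z1, z2, v2 = rng.normal(size=(3, n))
y2 = 0.5 * z1 + z2 + v2                                        # z2 is the excluded instrument
y1 = 1.0 + 2.0 * y2 - z1 + 0.5 * v2 + rng.normal(size=n)       # y2 is endogenous through v2
data = np.column_stack([y1, y2, z1, z2])

coef = cf_linear_coef(data)
se = bootstrap_se(cf_linear_coef, data, n_boot=200, rng=rng)
print(coef, se)
```

For APEs, the same loop applies with `estimator` replaced by a function that returns the APE from the full two-step procedure.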

As we can tell from Table 5, first of all, failing to account for endogeneity, whether as in the linear

model represented in Column (1) or as in the Frac Probit model as in Column (5), leads to fairly different

estimates from those methods that take care of endogeneity using the same models. Among the linear probability models that have accounted for endogeneity, the estimates from IV 2SLS differ significantly from those of Opt. IV 2SLS and CF 2SLS. Both Opt. IV 2SLS and CF 2SLS are close to the APEs in the Frac

Probit models that have accounted for endogeneity. This suggests that the relationship between the housing

budget share and the covariates of interest may be close to linear in the unit interval so that two-stage

least squares estimator provides a good approximation. Among the Frac Probit models, the estimates from


different methods of accounting for endogeneity are fairly close across the board. The difference between

conducting joint estimations with the home ownership, as in Column (9) and Column (10), and plugging

in the generalized residual from home ownership, as in Column (7) and Column (8), is small. Particularly,

if we use home ownership as the switching indicator, the estimates and standard error from Column (10)

and Column (8) are the same, at least to the third decimal place. The difference between Column (7) and

Column (9) is also negligibly small.

Table 6 reports the t statistics for testing of endogeneity and their p-values. The p-values are obtained

from bootstrapping the test statistics. Only estimators that employ CF approaches are considered in this

table. All the test statistics reported are Wald tests for significance. The names refer to the estimators, as in

Table 5. Columns (1) to (4) report the variable addition tests (VATs) on control function terms only. Columns

(5) and (6) also report Wald tests on the correlation parameter ρ (or ρ0 and ρ1 for the two switching regimes),

representing the endogeneity from home ownership equations, given the control function term from log

expenditure equation. In any case, the evidence of endogeneity from log expenditure is strong: the p-values

are identically zero in any test on the significance of v̂2, the control function term from log expenditure

equation. No test based on the fractional response model confirms the endogeneity from homeownership, whether it is a VAT on the generalized residual or a Wald test on the correlation parameter, although the test on the generalized residual in the linear model CF 2SLS turns out to be significant. The test statistic and p-value from the VAT on the generalized residual in Column (3) are quite similar to those from the Wald test on the correlation

parameter ρ in Column (5), suggesting the validity of using VAT on generalized residuals to detect the

additional endogeneity from home ownership. The LM or LR test on ρ can also be performed, but the

proper method of bootstrapping for p-values has to be determined. Another advantage of performing a VAT

test on generalized residuals is its robustness to model specifications. In the case of switching, a joint test

concerning two regimes needs to be conducted to detect endogeneity. This can be easily done by performing

a joint test on the interaction terms between the control function terms and the switching indicator. Finding a way to combine the correlation parameters ρ0 and ρ1 obtained from different regimes, however, is no easy job.


7 Conclusion

This paper has shown applications of control function approaches to account for one binary EEV and

many continuous EEVs in binary and fractional response models. The control function approach is com-

putationally simple and allows for a flexible incorporation of heterogeneity, as in an endogenous switching

model. Partial effects based on the ASF have a causal interpretation, and inference on them can be easily obtained by bootstrapping because of the computational simplicity of the control function approach. A VAT based on the

generalized residual is shown to be a valid test for detecting additional endogeneity from the binary EEV,

conditioning on the residuals from the continuous EEVs. The simulation study shows that using generalized

residuals to account for endogeneity provides a fairly good approximation to the true APE, significantly bet-

ter than approximations provided by linear probability models. Applying the CF approach to an empirical

illustration using a fractional response model for the housing budget share, we show that homeownership is

not endogenous after controlling for total expenditure, the continuous EEV. This is revealed by performing

VATs on the generalized residuals and cross validated by a Wald test on the correlation parameter. These

results imply that plugging in generalized residuals into a binary response model (or a fractional response

model) acts as a better approximation to the causal marginal effects of interest than other conventional

methods, and it is computationally simpler, enabling an easier detection of endogeneity.


A.1 Figures and Tables for Section 5

Figure 1: Empirical Distribution of APEs for y2 for the Sample Size of 1000 under Joint Normality

A.2 Tables for Section 6


Table 1: Simulation Results for APE of y2

Columns: (1) CF Biprobit; (2) CF Biprobit Switching; (3) CF Probit; (4) CF Probit Switching; (5) CF 2SLS; (6) CF 2SLS Switching; (7) IV 2SLS; (8) Opt. IV 2SLS; (9) MLE.

Design 1: Joint Normality

                   (1)     (2)     (3)     (4)     (5)     (6)     (7)     (8)     (9)
Case 1: Just ID, One Region, APE y2 = -.2650
N=1000  Bias     .0011   .0020   .0029   .0041  -.0566  -.0568  -.0541  -.0542   .0011
        RMSE     .0102   .0105   .0112   .0117   .0634   .0635   .0613   .0614   .0102
N=3000  Bias     .0003   .0005   .0018   .0022  -.0571  -.0572  -.0548  -.0549   .0003
        RMSE     .0054   .0055   .0060   .0061   .0593   .0593   .0571   .0572   .0054
N=5000  Bias     .0002   .0004   .0018   .0020  -.0573  -.0574  -.0550  -.0551   .0002
        RMSE     .0044   .0045   .0050   .0051   .0587   .0587   .0564   .0565   .0044
Case 2: Over ID, One Region, APE y2 = -.2279
N=1000  Bias     .0000   .0003   .0008   .0011  -.0342  -.0342  -.0322  -.0325   .0000
        RMSE     .0068   .0069   .0072   .0074   .0359   .0358   .0341   .0343   .0067
N=3000  Bias     .0001   .0002   .0008   .0009  -.0335  -.0334  -.0315  -.0317   .0001
        RMSE     .0039   .0039   .0041   .0041   .0341   .0339   .0321   .0323   .0039
N=5000  Bias     .0000   .0000   .0007   .0007  -.0338  -.0339  -.0317  -.0320   .0000
        RMSE     .0029   .0029   .0031   .0032   .0341   .0635   .0321   .0323   .0029
Case 3: Just ID, Two Regions, APE y2 = -.0902
N=1000  Bias    -.0248  -.0001  -.0237   .0010  -.0274  -.0261  -.0252  -.0255  -.0248
        RMSE     .0345   .0171   .0337   .0171   .0382   .0365   .0368   .0369   .0346
N=3000  Bias    -.0252  -.0003  -.0251  -.0003  -.0269  -.0165  -.0249  -.0192  -.0252
        RMSE     .0289   .0098   .0295   .0108   .0311   .0225   .0293   .0251   .0289
N=5000  Bias    -.0256   .0001  -.0245   .0009  -.0271  -.0260  -.0250  -.0253  -.0256
        RMSE     .0280   .0076   .0269   .0077   .0298   .0286   .0279   .0282   .0280

Design 2: Conditional Normality

                   (1)     (2)     (3)     (4)     (5)     (6)     (7)     (8)     (9)
Case 1: Just ID, One Region, APE y2 = -.2426
N=1000  Bias     .0009   .0018   .0024   .0028  -.0463  -.0465  -.0474  -.0476   .0121
        RMSE     .0133   .0147   .0138   .0151   .0560   .0562   .0570   .0571   .0160
N=3000  Bias     .0002   .0004   .0016   .0013  -.0458  -.0459  -.0468  -.0469   .0114
        RMSE     .0075   .0081   .0078   .0084   .0490   .0490   .0498   .0500   .0128
N=5000  Bias    -.0001   .0001   .0013   .0009  -.0461  -.0461  -.0472  -.0472   .0112
        RMSE     .0058   .0062   .0061   .0064   .0480   .0480   .0490   .0490   .0121
Case 2: Over ID, One Region, APE y2 = -.2123
N=1000  Bias     .0005   .0006   .0015   .0015  -.0269  -.0273  -.0291  -.0294   .0059
        RMSE     .0072   .0074   .0078   .0080   .0298   .0301   .0319   .0321   .0092
N=3000  Bias     .0000   .0000   .0011   .0009  -.0272  -.0277  -.0292  -.0294   .0057
        RMSE     .0041   .0042   .0045   .0046   .0281   .0286   .0300   .0302   .0070
N=5000  Bias     .0000   .0001   .0010   .0009  -.0271  -.0276  -.0290  -.0293   .0057
        RMSE     .0032   .0033   .0035   .0036   .0277   .0281   .0295   .0298   .0065
Case 3: Just ID, Two Regions, APE y2 = -.0624
N=1000  Bias    -.0239  -.0001  -.0230   .0001  -.0176  -.0158  -.0176  -.0182  -.0241
        RMSE     .0363   .0197   .0357   .0196   .0336   .0317   .0336   .0339   .0363
N=3000  Bias    -.0259  -.0004  -.0251  -.0003  -.0187  -.0166  -.0186  -.0192  -.0261
        RMSE     .0302   .0108   .0295   .0108   .0247   .0225   .0246   .0251   .0303
N=5000  Bias    -.0251   .0000  -.0243   .0001  -.0175  -.0159  -.0175  -.0181  -.0253
        RMSE     .0278   .0083   .0270   .0083   .0213   .0197   .0214   .0218   .0279

a Sequential averaging of the control function term v̂2 and x is applied to compute estimates of APEs.

b The bias is defined as the difference between the true APEs and the estimates. RMSE is the root mean squared error.

c Estimator (1) is the CF approach inserting the first-stage residual v̂2 into a second-stage joint biprobit between y1 and y3. Estimator (2) is the CF approach inserting the first-stage residual v̂2 into a second-stage joint biprobit between y(1)1, y3 and y(0)1, y3. Estimator (3) is the CF approach inserting the first-stage residual v̂2 and ĝr3 into the probit model for y1. Estimator (4) is the CF approach inserting the first-stage residual v̂2 and ĝr3 into the probit model for y1 separately for sub-samples defined by y3. Estimator (5) is the CF approach applied to the linear probability model for y1 by inserting the first-stage residual v̂2 and ĝr3. Estimator (6) is the CF approach applied to the linear probability model for y1 by inserting the first-stage residual v̂2 and ĝr3 for sub-samples defined by y3. Estimator (7) is the 2SLS IV approach for a linear probability model of y1. Estimator (8) is the 2SLS IV approach using predicted fitted values from the first-stage reduced forms for y2 and y3 as instruments; y2 is predicted using a linear model and y3 is predicted using a probit model. Estimator (9) is the joint estimation of y1, y2 and y3 by maximum likelihood.


Tabl

e2:

Sim

ulat

ion

Res

ults

forA

PEof

y3

(1)

(2)

(3)

(4)

(5)

(6)

(7)

(8)

(9)

(1)

(2)

(3)

(4)

(5)

(6)

(7)

(8)

(9)

Des

ign

1:Jo

intN

orm

ality

Des

ign

2:C

ondi

tiona

lNor

mal

ityC

FC

FC

FC

FC

FC

FIV

Opt

.IV

ML

EC

FC

FC

FC

FC

FC

FIV

Opt

.IV

ML

EB

ipro

bit

Bip

robi

tPr

obit

Prob

it2S

LS

2SL

S2S

LS

2SL

SB

ipro

bit

Bip

robi

tPr

obit

Prob

it2S

LS

2SL

S2S

LS

2SL

SSw

itchi

ngSw

itchi

ngSw

itchi

ngSw

itchi

ngSw

itchi

ngSw

itchi

ngC

ase

1Ju

stID

,One

Reg

ion,

APE

y3=.

2573

Just

ID,O

neR

egio

n,A

PEy3=.

2385

N=1

000

Bia

s-.0

013

-.001

2-.0

116

-.011

9.1

010

.101

0.0

809

.082

3-.0

014

-.000

3.0

013

-.008

7-.0

075

.066

7.0

656

.073

8.0

752

-.011

9R

MSE

.038

0.0

382

.044

0.0

442

.111

7.1

119

.096

6.0

972

.038

0.0

422

.043

8.0

472

.048

4.0

866

.086

9.0

956

.096

1.0

432

N=3

000

Bia

s.0

001

.000

2-.0

090

-.009

1.1

032

.103

3.0

852

.085

7.0

000

-.000

1.0

001

-.008

4-.0

083

.067

5.0

663

.074

4.0

757

-.011

8R

MSE

.022

8.0

229

.026

7.0

268

.106

9.1

070

.090

4.0

906

.022

8.0

236

.024

1.0

276

.028

1.0

742

.073

5.0

819

.083

0.0

259

N=5

000

Bia

s-.0

004

-.000

3-.0

096

-.009

6.1

024

.102

6.0

840

.084

9-.0

004

.000

8.0

009

-.007

3-.0

071

.068

3.0

671

.076

8.0

815

-.011

2R

MSE

.017

2.0

172

.021

3.0

214

.104

7.1

048

.087

3.0

880

.017

1.0

193

.019

8.0

224

.022

8.0

727

.071

7.0

816

.081

5.0

221

Cas

e2

Ove

rID

,One

Reg

ion,

APE

y3=.

2132

Ove

rID

,One

Reg

ion,

APE

y3=.

2026

N=1

000

Bia

s.0

002

-.000

8-.0

003

-.000

6.0

672

.065

3.0

512

.053

4.0

001

.000

9.0

009

-.001

2.0

003

.035

3.0

493

.049

2.0

513

-.004

4R

MSE

.032

3.0

349

.035

0.0

367

.081

3.0

811

.074

4.0

729

.032

2.0

394

.039

4.0

400

.041

1.0

622

.073

6.0

772

.076

2.0

372

N=3

000

Bia

s.0

002

.000

4-.0

003

.000

3.0

663

.064

7.0

501

.052

1.0

001

.000

4.0

008

-.001

7-.0

006

.033

0.0

339

.049

4.0

510

-.004

1R

MSE

.018

8.0

197

.019

8.0

207

.071

5.0

705

.058

7.0

595

.018

8.0

217

.022

9.0

237

.023

4.0

444

.045

5.0

608

.060

8.0

226

N=5

000

Bia

s-.0

001

-.000

2-.0

012

-.000

7.0

655

.063

5.0

491

.051

3-.0

001

.000

3.0

006

-.001

6-.0

005

.033

8.0

478

.048

5.0

506

-.004

1R

MSE

.014

7.0

152

.015

8.0

162

.068

7.0

670

.054

5.0

559

.014

6.0

173

.018

0.0

185

.018

9.0

414

.054

0.0

557

.056

9.0

174

Case 3

Just ID, Two Regions, APE y3 = .0622
                 (1)     (2)     (3)     (4)     (5)     (6)     (7)     (8)     (9)
N=1000  Bias   .0436   .0005   .0339   .0065   .0292   .0327   .0128   .0153   .0437
        RMSE   .0721   .0392   .0585   .0444   .0576   .0575   .0559   .0546   .0722
N=3000  Bias   .0437   .0001   .0467   .0023   .0292  -.0051   .0129   .0294   .0438
        RMSE   .0555   .0236   .0547   .0268   .0413   .0294   .0351   .0428   .0555
N=5000  Bias   .0447   .0010   .0353   .0076   .0300   .0335   .0130   .0159   .0447
        RMSE   .0516   .0180   .0413   .0215   .0372   .0395   .0273   .0280   .0516

Just ID, Two Regions, APE y3 = .0864
                 (1)     (2)     (3)     (4)     (5)     (6)     (7)     (8)     (9)
N=1000  Bias   .0535   .0007   .0462   .0028   .0257  -.0045   .0257   .0301   .0549
        RMSE   .0766   .0409   .0669   .0455   .0555   .0500   .0597   .0601   .0779
N=3000  Bias   .0536  -.0003   .0467   .0023   .0254  -.0051   .0246   .0294   .0546
        RMSE   .0624   .0238   .0547   .0268   .0386   .0294   .0408   .0428   .0634
N=5000  Bias   .0536  -.0003   .0466   .0023   .0245  -.0057   .0247   .0291   .0545
        RMSE   .0592   .0185   .0516   .0208   .0333   .0237   .0350   .0378   .0601

a Sequential averaging of the control function term v2 and x is applied to compute estimates of APEs.

b The bias is defined as the difference between the true APEs and the estimates. RMSE is the root mean squared error.

c Estimator (1) is the CF approach inserting the first-stage residual v̂2 to a second-stage joint biprobit between y1, y3. Estimator (2) is the CF approach inserting the first-stage residual v̂2 to a second-stage joint biprobit between y1^(1), y3 and y1^(0), y3. Estimator (3) is the CF approach inserting first-stage residual v̂2 and ĝr3 into the probit model for y1. Estimator (4) is the CF approach inserting first-stage residual v̂2 and ĝr3 into the probit model for y1 separately for sub-samples defined by y3. Estimator (5) is the CF approach applied to linear probability model for y1 by inserting first-stage residual v̂2 and ĝr3. Estimator (6) is the CF approach applied to linear probability model for y1 by inserting first-stage residual v̂2 and ĝr3 for sub-samples defined by y3. Estimator (7) is the 2SLS IV approach for a linear probability model of y1. Estimator (8) is the 2SLS IV approach using predicted fitted values from the first-stage reduced forms for y2 and y3 as instruments. y2 is predicted using a linear model and y3 is predicted using probit model. Estimator (9) is the joint estimation of y1, y2 and y3 by maximum likelihood.
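The bias and RMSE defined in note b can be illustrated with a short Python sketch. It is only an illustration: the function name `bias_and_rmse` and the sample numbers are hypothetical, and the sign convention (estimate minus true APE) is an assumption, since the note does not say which way the difference is taken.

```python
import math
from statistics import fmean


def bias_and_rmse(estimates, true_ape):
    """Monte Carlo bias and RMSE of APE estimates across replications."""
    # Assumed sign convention: estimate minus true value.
    errors = [est - true_ape for est in estimates]
    bias = fmean(errors)
    rmse = math.sqrt(fmean(e * e for e in errors))
    return bias, rmse


# Hypothetical APE estimates from five replications (illustrative numbers only).
draws = [0.21, 0.22, 0.20, 0.23, 0.19]
bias, rmse = bias_and_rmse(draws, true_ape=0.21)
print(f"bias = {bias:.4f}, RMSE = {rmse:.4f}")
```

Reversing the sign convention flips the sign of the bias but leaves the RMSE unchanged.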

Figure 2: Empirical Distribution of APEs for y2 for the Sample Size of 1000 under Conditional Normality

Figure 3: Empirical Distribution of APEs for y3 for the Sample Size of 1000 under Joint Normality

Figure 4: Empirical Distribution of APEs for y3 for the Sample Size of 1000 under Conditional Normality

Table 3: Summary Statistics of the Estimation Sample (N=2964)

Variable                     Owner     Renter
Budget Share on Housing        .20        .41
                             (.16)      (.16)
Ln(Expenditure)              10.35       9.82
                             (.59)      (.59)
Ln(Income)                   10.94      10.29
                             (.75)      (.73)
Ln(Price)                     8.55       8.93
                             (.21)      (.12)
Age                          49.88      44.45
                           (12.97)    (13.69)
Married                        .79        .35
                             (.40)      (.48)
Moved                          .10        .33
                             (.30)      (.47)
Black                          .21        .46
                             (.41)      (.50)
Years of education           13.27      12.12
                            (2.73)     (2.90)
Number of Children             .94       1.05
                            (1.14)     (1.24)
Obs.                          2355        629

a The sample is based on the 2001 waves of the Panel Study of Income Dynamics (PSID). All monetary variables were converted to 1998 dollars before they were logged.

b Sample standard deviations are in parentheses below the sample means.

Table 4: The First Stage Reduced Form Regression for the EEVs

                                  (1)              (2)              (3)              (4)              (5)
Estimation Method                 OLS              OLS           Probit           Probit           Probit
Dependent Variable    Ln(Expenditure)  Ln(Expenditure)            Owner            Owner            Owner
v̂2                                                                                .058***
                                                                                   (.009)
Ln(Expenditure)                                                                                    .058***
                                                                                                    (.009)
Ln(Income)                    .410***          .385***          .055***          .057***          .033***
                               (.013)           (.013)           (.006)           (.006)           (.007)
Education                     .025***          .024***           .005**           .004**             .002
                              (.0031)           (.003)           (.001)          (.0015)          (.0015)
Children                      .056***          .062***          .012***          .011***           .008**
                              (.0077)           (.008)           (.004)           (.003)           (.004)
Age                         -.0025***          .029***          .003***          .003***          .003***
                              (.0007)           (.004)          (.0003)          (.0003)          (.0003)
Age²                                         -.0002***
                                              (.00004)
Ln(Price)                     -.068**          -.067**         -.646***         -.624***         -.620***
                              (.0318)          (.0314)           (.017)           (.016)           (.016)
Married                        .27***           .26***          .054***          .053***          .037***
                               (.021)           (.021)           (.010)           (.009)           (.010)
Moved                           .012*            .037*         -.065***         -.063***         -.064***
                               (.022)           (.022)           (.009)           (.009)           (.009)
Black                         -.058**         -.079***         -.064***         -.060***         -.057***
                               (.019)          (.0189)          (.0087)           (.009)           (.009)

a v̂2 denotes the residual from Regression (1), the reduced form for log total expenditure.

b Regression (1) and (2) are first-stage regressions for log total expenditure, the continuous EEV. Regression (3)-(5) are first-stage regressions for home ownership, the binary EEV.

c * p-value < 10%, ** p-value < 5%, *** p-value < 1%.
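The probit first stage for the binary EEV is what supplies the generalized residual ĝr3 used as a control function term. A minimal Python sketch of the standard probit generalized residual, ĝr3 = y3·φ(zδ̂)/Φ(zδ̂) − (1 − y3)·φ(zδ̂)/(1 − Φ(zδ̂)), follows. The function names and index values are hypothetical, and the fitted probit index zδ̂ is assumed to be already estimated rather than computed here.

```python
import math


def norm_pdf(a):
    """Standard normal density."""
    return math.exp(-0.5 * a * a) / math.sqrt(2.0 * math.pi)


def norm_cdf(a):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(a / math.sqrt(2.0)))


def generalized_residual(y3, index):
    """Probit generalized residual for a binary outcome y3 in {0, 1}.

    `index` is the fitted probit index z'delta_hat (assumed given).
    """
    pdf, cdf = norm_pdf(index), norm_cdf(index)
    if y3 == 1:
        return pdf / cdf           # inverse Mills ratio term for y3 = 1
    return -pdf / (1.0 - cdf)      # mirrored term for y3 = 0


# Hypothetical fitted indices for three households (illustrative values only).
observations = [(1, 0.8), (0, 0.8), (1, -0.3)]
gr3 = [generalized_residual(y, idx) for y, idx in observations]
print(gr3)
```

By construction the term is positive when y3 = 1 and negative when y3 = 0.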

Table 5: Comparing marginal effects in the structural equation of the housing share

Estimation Method: IV, Opt. IV, CF, CF, CF, CF, CF, CF
No EEVs, No EEVs, One EEV, Two EEVs, Switching, Switching

                             (1)       (2)       (3)       (4)       (5)       (6)       (7)       (8)       (9)      (10)
Functional form for y1    Linear      2SLS      2SLS      2SLS      Frac      Frac      Frac      Frac      Frac      Frac
                                                                  Probit    Probit    Probit    Probit  Biprobit  Biprobit
Marginal Effects           Coeff     Coeff     Coeff     Coeff       APE       APE       APE       APE       APE       APE
Ln(Expenditure)         -.109***   -.176**  -.056***  -.061***  -.106***  -.064***  -.061***  -.059***  -.062***  -.059***
                          (.005)    (.084)     (.01)     (.01)    (.005)    (.009)    (.010)    (.009)     (.01)    (.009)
Owner                   -.094***      .336   -.14***  -.124***  -.074***  -.088***  -.101***  -.124***  -.098***  -.124***
                          (.009)    (.330)    (.016)    (.015)   (.0082)    (.012)    (.020)    (.038)    (.020)    (.038)
Age                      .002***     .0003   .003***   .003***   .002***  .0025***  .0026***  .0026***  .0025***  .0026***
                         (.0002)    (.001)   (.0002)   (.0002)   (.0002)   (.0002)   (.0002)   (.0002)   (.0002)   (.0002)
Ln(Price)                .147***      .54*   .109***   .124***   .151***   .147***   .136***   .138***   .139***   .138***
                          (.013)     (.29)    (.017)    (.017)    (.013)    (.016)    (.021)    (.022)    (.021)    (.022)
Married                    -.005    -.06**     -.027  -.028***    -.0026  -.027***  -.027***  -.027***  -.027***  -.027***
                          (.007)    (.025)    (.008)    (.009)    (.006)    (.008)    (.008)    (.008)    (.008)    (.008)
Moved                       .011      .071      .005      .007      .009      .009      .007      .005      .007      .005
                          (.007)    (.047)    (.008)    (.007)    (.007)    (.007)    (.008)    (.009)    (.007)    (.009)
Black                    -.012**      .041    -.010*     -.009   -.012**     -.006     -.008     -.009     -.007     -.009
                          (.006)    (.036)    (.006)    (.006)    (.006)    (.006)    (.006)    (.007)    (.006)    (.007)

a The dependent variable is the expenditure share on housing.

b Standard errors for the estimated APEs were bootstrap standard errors with 200 replications.

c Regression (1) is the OLS for linear probability model that assumes no EEVs. Regression (2) is the 2SLS IV estimator for linear probability model that uses a linear probability model for the reduced form of home ownership. Regression
