

  • Binary and Fractional Response Models with Continuous and

    Binary Endogenous Explanatory Variables

    Wei Lin∗

    Jeffrey M. Wooldridge†

    November 8th, 2017

    Abstract

    This paper considers latent variable models for binary responses and fractional responses with a binary

    endogenous explanatory variable (EEV) and potentially many continuous endogenous explanatory

    variables. A two-step control function (CF) approach is promoted to account for endogeneity. The CF

    approach enables an uncovering of partial effects of causal interest. The inference for the partial effects

    can be easily obtained through bootstrapping because of the computational simplicity of the two-step CF

    approach. A basic probit model, an endogenous switching probit model, and a fractional probit model

    are discussed in the paper. Variable addition tests on generalized residuals are used to detect additional

    endogeneity from the binary EEV. Monte Carlo experiments show that partial effects obtained by

    inserting generalized residuals into binary response models outperform coefficients from linear specifications.

    In fact, they provide fairly close approximations to partial effects from joint estimations. An empirical

    illustration of the determination of housing budget shares shows that, in a fractional response model,

    using generalized residuals again leads to a close approximation to joint estimations. The coefficients

    from linear specifications and partial effects from quasi-MLE are also close in this case.

    ∗Center for Real Estate, Massachusetts Institute of Technology, Cambridge, MA 02139, United States.

    †Department of Economics, Michigan State University, East Lansing, MI 48824, United States.

  • 1 Introduction

    Binary response models play a significant role in many fields of empirical studies. Examples include

    econometric models determining the probability of migration in labor economics (Dong and Lewbel, 2015),

    the chance of college admission in the economics of education (Conlin et al., 2013), and the likelihood of

    takeover activity in finance (Edmans et al., 2012), to name just a few.

    In practice, linear probability models for binary responses are often used because they are easy to

    estimate. However, a linear projection disregards the limited nature of the binary response, resulting in

    unrealistic predictions of the response probability. Therefore, coefficients of the linear probability models serve,

    at best, as approximations to marginal effects of interest.

    Latent variable models for binary responses, on the other hand, are well grounded in economic theory.

    The latent threshold-crossing structure captures the trigger effect, and the nonlinear transformation ensures

    that the response probability falls into the unit interval. Thereby, post-estimation quantities from the latent

    variable models, such as average partial effects (APEs), can bear a causal interpretation, even in the presence

    of endogenous explanatory variables (Lin and Wooldridge, 2015). Yet due to the nonlinearity, detecting and

    solving endogeneity issues in the first place is less straightforward, especially when some of the suspected

    endogenous explanatory variables are also discrete.

    This paper considers the estimation of a special case of latent variable models for binary responses with

    EEVs of differing attributes, where a control function approach can be applied to make it easier to handle

    EEVs. Namely, we allow for one binary EEV and potentially many continuous EEVs. The binary EEV

    could be an indicator of self-selecting into a treatment, into a region or into a sample, depending on how it

    appears in the binary response equation. Some empirical examples for the binary EEV are whether to own

    a house, whether to submit SAT scores in college applications, or whether to be in the labor force. The

    continuous EEVs could be years of education, family income, prices of substitutes, or inputs for production

    functions. The endogeneity here is modeled to arise from omitted variable problems—the existence of

    common unobservables affecting both the binary outcome and the EEVs.

    To estimate the model above, for simplicity, we take a simple parametric approach by assuming joint

    normality among error terms, following the treatments of binary and continuous EEVs adopted by Heckman

    (1978), Amemiya (1978), and Rivers and Vuong (1988). This distributional assumption allows not only for

    control function approaches but also for joint estimation, such as limited information maximum likelihood

    (LIML) estimation, or quasi-limited information maximum likelihood (quasi-LIML) estimation if any of

    the distributions is misspecified. However, this kind of joint estimation is rarely conducted in practice, due to

    the difficulty in searching for a numerical solution along so many dimensions. One might also be tempted

    to mimic two-stage least squares by substituting fitted values from a first-stage estimation for EEVs in the

    binary response equation. Despite its simplicity, this procedure amounts to the so-called "forbidden regression",

    a term coined by Hausman (1975), and it yields inconsistent estimates.

    Alternatively, drawing on Wooldridge (2014), this paper promotes a two-step control function (CF)

    approach under the quasi-LIML framework, which is not only computationally simple but also delivers

    sensible estimators of APEs in the presence of multiple EEVs and various sources of heterogeneity. To carry

    out this two-step procedure, residuals, instead of fitted values, from the first-stage estimation of reduced

    forms for the continuous EEVs are plugged into the second-stage joint estimation of the binary outcome and

    binary EEV. Routines in commonly used software can be exploited (or slightly modified) to carry out this

    procedure. Most importantly, due to the computational simplicity of the CF approach, bootstrapping can

    be easily applied to obtain inference for functions of the parameters, such as APEs, rather than using the

    complicated delta method.

    In addition, as shown in Wooldridge (2014), simple variable addition tests (VATs) for endogeneity are

    obtained as by-products of the CF approach. This paper extends the VATs for a single EEV to VATs for

    multiple EEVs. The VATs are based on standard Wald tests of those plugged-in residuals obtained from

    the first stage estimations. In particular, in the presence of a binary EEV, testing on generalized residuals

    enables us to determine whether we can avoid a joint estimation in the second stage. Further, since EEVs

    are often correlated with each other, conditioning on residuals obtained from other EEVs helps reduce the

    likelihood of detecting additional endogeneity in the binary EEV.

    Another feature of the two-step procedure, which stems from White (1982), is the quasi-LIML

    framework. The analysis for binary response models in this paper easily carries through to fractional response

    models, as proposed in Papke and Wooldridge (1996). So long as the conditional mean of the fractional

    response is correctly specified, we consistently estimate parameters in the conditional mean even though

    other features of the distribution are misspecified.

    Besides parametric approaches to estimating this triangular model for binary responses, semi-parametric

    and nonparametric approaches are also available. Nevertheless, while those approaches sensibly relax

    parametric assumptions in one aspect, they inevitably impose restrictions in other directions. For example,

    Blundell and Powell (2003) advance CF approaches to fully nonparametric binary response models with

    continuous EEVs. Unfortunately, their assumption of additive, independent errors rules out discrete EEVs.

    The special regressor method in Dong and Lewbel (2015) allows for both continuous and binary EEVs in

    semiparametric binary response models. However, their method requires a special regressor to be excluded

    from the reduced forms for the EEVs, and the special regressor cannot appear in the structural equation in

    flexible functional forms. Further, as discussed in Lin and Wooldridge (2015), the average index functions

    (AIF), proposed as a basis for defining marginal effects for special regressor methods, lack a causal

    interpretation. Some other existing semiparametric methods for estimating this model are discussed in more detail

    by Lin (2016).

    The rest of the paper is organized as follows. Section 2 starts with a basic model with one binary and

    many continuous EEVs. The same arguments are then extended to an endogenous switching model where

    the error terms and switching indicator are allowed to interact. Section 3 derives the VAT for endogeneity

    from a binary EEV given residuals from continuous EEVs. Section 4 shows that the CF approach can

    be applied to fractional response models. Section 5 presents Monte Carlo simulation results of empirical

    distributions of APEs for binary response models. Section 6 illustrates this approach by revisiting the study

    of the effects of price and total expenditure on housing budget shares. Section 7 concludes.

    2 Model Specification and Estimation for Binary Response

    2.1 Probit Models with One Binary EEV and Many Continuous EEVs

    As a starting point, we first assume that the only complication arises from EEVs of differing attributes,

    with no presence of heterogeneity or misspecification yet. More specifically, consider a simple model for a

    binary response y1 with many continuous EEVs y2 and one binary EEV y3. Write the model recursively in

    a triangular form as

    y1 = 1 [x1β + u1 > 0] , (1a)

    y2 = zΠ + v2, (1b)

    y3 = 1 [zδ + u3 > 0] . (1c)

    Equation (1a) is a structural equation that represents a causal relationship. Equations (1b) and (1c) are

    reduced forms for the continuous EEVs y2 of dimension 1×G and the scalar binary EEV y3, respectively.

    1 [·] denotes the indicator function that takes on a value of one when the statement in the bracket is true

    and zero otherwise. x1 is a 1 ×K1 vector where each of its elements is a general function of (z1,y2, y3),

    such as polynomials, interactions, logarithms, etc., with x1 = (z1,y2, y3) being the leading case. z1 is a

    1 × L1 strict subset of the entire 1× L vector of exogenous variables z ≡ (z1, z2), with L ≡ L1 + L2

    and L2 ≥ G + 1. Identification that relies on nonlinearity alone often performs poorly in practice, so

    we need at least one excluded instrument for the binary EEV (hence L2 ≥ G + 1 rather than L2 ≥ G). Further, the same rank condition holds as in

    two-stage least squares: rank E(z′z) = L1 + L2 and rank E(z′x1) = K1. Moreover, let z1 include unity

    as its first element, which effectively forces the error terms (u1, v2, u3) to have zero means. Π is an L × G

    matrix of parameters. In the simplest case, when G = 1, y2 is a scalar.

    This system of equations describes endogeneity as an omitted variable problem. The structural error u1

    is correlated with explanatory variables y2 and y3, in that it contains an unobservable that also appears in

    error terms v2 and u3. Write the linear projections of (u1, u3) on v2 in error forms:

    u1 = v2θ + v1, (2a)

    u3 = v2η + v3, (2b)

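    where $\boldsymbol{\theta} \equiv E(\mathbf{v}_2'\mathbf{v}_2)^{-1}E(\mathbf{v}_2'u_1)$ and $\boldsymbol{\eta} \equiv E(\mathbf{v}_2'\mathbf{v}_2)^{-1}E(\mathbf{v}_2'u_3)$ are the G × 1 vectors of population

    regression coefficients.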

    A convenient joint normality assumption among (u1,v2, u3) (Heckman, 1978; Amemiya, 1978; Rivers

    and Vuong, 1988) leaves us with a bivariate normally distributed vector of errors (v1, v3) that is independent

    of v2 (by definition of a linear projection and by a property of multivariate normality):

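    $$
    D\!\left(\begin{pmatrix} v_1 \\ v_3 \end{pmatrix}\right)
    = D\!\left(\begin{pmatrix} v_1 \\ v_3 \end{pmatrix} \,\middle|\, \mathbf{v}_2\right)
    \sim \text{Normal}\!\left(\begin{pmatrix} 0 \\ 0 \end{pmatrix},
    \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}\right), \tag{3}
    $$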

    where D (·) denotes the distribution, the variances of v1 and v3 are normalized to one, and ρ ≡ Cov (v1, v3)

    is the covariance.

    In fact, if we are willing to assume a strong enough exogeneity condition for the instruments z, the

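    bivariate distribution of (v1, v3) becomes independent not only of v2 but also of z, and thus of y2 (because y2

    is a deterministic function of z and v2):

    $$
    D\!\left(\begin{pmatrix} v_1 \\ v_3 \end{pmatrix}\right)
    = D\!\left(\begin{pmatrix} v_1 \\ v_3 \end{pmatrix} \,\middle|\, \mathbf{z}, \mathbf{v}_2\right)
    = D\!\left(\begin{pmatrix} v_1 \\ v_3 \end{pmatrix} \,\middle|\, \mathbf{y}_2, \mathbf{z}, \mathbf{v}_2\right). \tag{4}
    $$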

    Given the distributional assumptions, we arrive at a bivariate probit model that accounts for endogeneity

    issues in (1) by adding v2 as extra explanatory variables:

    y1 = 1 [x1β + v2θ + v1 ≥ 0] , (5a)

    y3 = 1 [zδ + v2η + v3 ≥ 0] . (5b)

    Adding reduced form errors to control for endogeneity is the essence of a control function approach.

    However, since v2 is not observed, the approach is made operational by a simple two-step procedure:

    1. Estimate (1b), the reduced forms for y2, by ordinary least squares (OLS), equation by equation, to

    obtain the residuals v̂2 = y2 − zΠ̂.

    2. Estimate (5), the bivariate probit model for y1 and y3, jointly by maximum likelihood estimation

    (MLE), replacing v2 with v̂2.
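The two steps above can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the function names `first_stage_residuals` and `biprobit_loglik`, the parameter stacking order, and the `atanh` reparameterization of ρ are all choices made here for illustration. The log-likelihood uses the standard sign-adjusted bivariate normal CDF form of a bivariate probit, applied to (5a) and (5b) with v̂2 in place of v2.

```python
import numpy as np
from scipy.stats import multivariate_normal


def first_stage_residuals(y2, z):
    """Step 1: OLS of each column of y2 on z; returns v2_hat = y2 - z @ Pi_hat."""
    pi_hat, *_ = np.linalg.lstsq(z, y2, rcond=None)
    return y2 - z @ pi_hat


def biprobit_loglik(params, y1, y3, x1, z, v2_hat):
    """Step 2 objective: bivariate probit log-likelihood for (5a)-(5b).

    params stacks (beta, theta, delta, eta, atanh(rho)). The atanh
    reparameterization keeps rho inside (-1, 1); it is an assumption of
    this sketch, not something taken from the paper.
    """
    k1, g, l = x1.shape[1], v2_hat.shape[1], z.shape[1]
    beta, theta = params[:k1], params[k1:k1 + g]
    delta = params[k1 + g:k1 + g + l]
    eta = params[k1 + g + l:k1 + 2 * g + l]
    rho = np.tanh(params[-1])
    idx1 = x1 @ beta + v2_hat @ theta    # linear index in (5a)
    idx3 = z @ delta + v2_hat @ eta      # linear index in (5b)
    s1, s3 = 2 * y1 - 1, 2 * y3 - 1      # map outcomes {0, 1} to signs {-1, +1}
    pts = np.column_stack([s1 * idx1, s3 * idx3])
    prob = np.empty(len(y1))
    # The sign-adjusted correlation s1 * s3 * rho takes only two values,
    # so the bivariate normal CDF is evaluated in two batches.
    for sign in (1, -1):
        mask = (s1 * s3) == sign
        if mask.any():
            cov = [[1.0, sign * rho], [sign * rho, 1.0]]
            prob[mask] = multivariate_normal.cdf(pts[mask], mean=[0.0, 0.0], cov=cov)
    return np.sum(np.log(np.clip(prob, 1e-300, None)))
```

To obtain estimates, the negative of `biprobit_loglik` could be passed to a general-purpose optimizer such as `scipy.optimize.minimize`, with probit estimates from each equation as starting values.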

    Since there is no one-to-one mapping between the reduced form error v3 and the binary EEV y3, we

    cannot obtain a proxy for v3 and hence have to rely on a joint estimation in the second step. By the usual

    consistency argument of two-step M-estimations (see, for example, Wooldridge, 2010, section 12.4.1), the

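    resulting control function estimator $(\hat{\boldsymbol{\Pi}}, \hat{\boldsymbol{\beta}}, \hat{\boldsymbol{\delta}}, \hat{\boldsymbol{\theta}}, \hat{\boldsymbol{\eta}})$ is consistent for the parameters identified by the following

    population problems. Formally,

    $$
    \boldsymbol{\Pi} = E(\mathbf{z}'\mathbf{z})^{-1} E(\mathbf{z}'\mathbf{y}_2), \tag{6}
    $$

    and (β, δ, θ, η) is the unique solution to

    $$
    \begin{aligned}
    &\max_{\mathbf{b}\in\mathbb{R}^{K_1},\,\mathbf{d}\in\mathbb{R}^{L},\,\mathbf{r}\in\mathbb{R}^{G},\,\mathbf{g}\in\mathbb{R}^{G},\,\rho\in\mathbb{R}}
    E\left[\log P(y_1, y_3 \mid \mathbf{y}_2, \mathbf{z}, \mathbf{v}_2)\right] \\
    &\quad = E\Big[\, y_1 y_3 \log \int_{-q_3}^{\infty} \Phi(d)\,\phi(\upsilon_3)\,d\upsilon_3
    + (1-y_1)\,y_3 \log \int_{-q_3}^{\infty} [1-\Phi(d)]\,\phi(\upsilon_3)\,d\upsilon_3 \\
    &\qquad + y_1 (1-y_3) \log \int_{-\infty}^{-q_3} \Phi(d)\,\phi(\upsilon_3)\,d\upsilon_3
    + (1-y_1)(1-y_3) \log \int_{-\infty}^{-q_3} [1-\Phi(d)]\,\phi(\upsilon_3)\,d\upsilon_3 \Big],
    \end{aligned}
    \tag{7}
    $$

    where

    $$
    d \equiv \frac{\mathbf{x}_1\mathbf{b} + \mathbf{v}_2\mathbf{r} + \rho\,\upsilon_3}{\sqrt{1-\rho^2}}, \tag{8}
    $$

    $$
    q_3 \equiv \mathbf{z}\mathbf{d} + \mathbf{v}_2\mathbf{g}. \tag{9}
    $$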

    However, because the magnitude of β depends on the normalization of the error terms, β is identified only

    up to scale, and interpreting β itself is not especially meaningful. Instead, the primary goal in empirical studies

    is to explain marginal effects of a variable of interest on response probabilities. In the presence of EEVs,

    the conditional response probability P (y1 = 1|x1) is of little interest: it is contaminated by the correlation of y2 and y3

    with the omitted variable in the unobservable u1. We must therefore take care in constructing

    an interesting response function for deriving partial effects. Fortunately, Blundell and Powell (2003, 2004)

    have proposed the average structural function (ASF), which is intuitively appealing and can be obtained via

    counterfactual reasoning. In defining the ASF for the structural equation (1a), we break the correlations by

    holding the observables x1 as fixed arguments and averaging out the unobservable ui1 without conditioning

    on x1:

    ASF (x1) = Eui1 {1 [x1β + ui1 > 0]} , (10)

    where the subscript i on ui1 emphasizes that it is a random variable, and Eui1 {·} is the expected value with

    respect to ui1.

    In the two-step CF procedure above, we identify parameters that correspond to the conditional normality

    of u1 given v2, namely,

    u1|v2 ∼ Normal (v2θ, 1) . (11)

    Thus, by the usual law of iterated expectations, the ASF defined in (10) can be obtained in two steps.

    First, we treat vi2 as fixed, and then average them out as random variables:

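    $$
    \begin{aligned}
    ASF(\mathbf{x}_1) &= E_{\mathbf{v}_{i2}}\Big\{ E_{u_{i1}\mid \mathbf{v}_{i2}}\big\{ 1[\mathbf{x}_1\boldsymbol{\beta} + \mathbf{v}_{i2}\boldsymbol{\theta} + v_{i1} > 0] \,\big|\, \mathbf{v}_{i2} \big\} \Big\} \\
    &= E_{\mathbf{v}_{i2}}\big\{ \Phi(\mathbf{x}_1\boldsymbol{\beta} + \mathbf{v}_{i2}\boldsymbol{\theta}) \big\} \\
    &= \int_{-\infty}^{\infty} \Phi(\mathbf{x}_1\boldsymbol{\beta} + \boldsymbol{\upsilon}_{i2}\boldsymbol{\theta})\,\phi(\boldsymbol{\upsilon}_{i2})\,d\boldsymbol{\upsilon}_{i2},
    \end{aligned}
    \tag{12}
    $$

    where φ (·) is the density function of the random vector vi2.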
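As a short worked step, added here for illustration: under the joint normality assumption on (u1, v2, u3), the integral in (12) has a closed form. The symbol $\boldsymbol{\Sigma}_v \equiv E(\mathbf{v}_2'\mathbf{v}_2)$ is notation introduced for this sketch and does not appear in the original text. Because $u_1 = \mathbf{v}_2\boldsymbol{\theta} + v_1$ with $v_1 \sim \text{Normal}(0, 1)$ independent of $\mathbf{v}_2 \sim \text{Normal}(\mathbf{0}, \boldsymbol{\Sigma}_v)$,

$$
u_1 \sim \text{Normal}\big(0,\; 1 + \boldsymbol{\theta}'\boldsymbol{\Sigma}_v\boldsymbol{\theta}\big)
\quad\Longrightarrow\quad
ASF(\mathbf{x}_1) = P(u_1 > -\mathbf{x}_1\boldsymbol{\beta})
= \Phi\!\left(\frac{\mathbf{x}_1\boldsymbol{\beta}}{\sqrt{1 + \boldsymbol{\theta}'\boldsymbol{\Sigma}_v\boldsymbol{\theta}}}\right).
$$

The scale factor shows why the coefficients from the second step are not themselves the quantities of interest: they are scaled relative to the coefficients that enter the ASF. Averaging over the first-stage residuals, as in the sample analogues, avoids having to rely on normality of v2 for this last step.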

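    The average partial effects (APEs) for a given x1 are then obtained by taking derivatives or differences

    of (12):

    $$
    APE_{y_2}(\mathbf{x}_1) = \beta_{y_2} \int_{-\infty}^{\infty} \phi(\mathbf{x}_1\boldsymbol{\beta} + \boldsymbol{\upsilon}_{i2}\boldsymbol{\theta})\,\phi(\boldsymbol{\upsilon}_{i2})\,d\boldsymbol{\upsilon}_{i2}, \tag{13a}
    $$

    $$
    APE_{y_3}(\mathbf{x}_1) = \int_{-\infty}^{\infty} \Big[ \Phi\big(\mathbf{x}_1^{(1)}\boldsymbol{\beta} + \boldsymbol{\upsilon}_{i2}\boldsymbol{\theta}\big) - \Phi\big(\mathbf{x}_1^{(0)}\boldsymbol{\beta} + \boldsymbol{\upsilon}_{i2}\boldsymbol{\theta}\big) \Big] \phi(\boldsymbol{\upsilon}_{i2})\,d\boldsymbol{\upsilon}_{i2}, \tag{13b}
    $$

    where $\beta_{y_2}$ is the coefficient on y2, $\mathbf{x}_1^{(1)}$ denotes the explanatory variables at a particular fixed value with

    y3 = 1, and $\mathbf{x}_1^{(0)}$ denotes the same fixed value of the explanatory variables except that y3 = 0. These APEs

    can be consistently estimated by their sample analogues, inserting the consistent estimators β̂ and θ̂ from

    the two-step CF approach: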

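    $$
    \widehat{APE}_{y_2}(\mathbf{x}_1) = \hat{\beta}_{y_2} \left[ N^{-1} \sum_{i=1}^{N} \phi\big(\mathbf{x}_1\hat{\boldsymbol{\beta}} + \hat{\mathbf{v}}_{i2}\hat{\boldsymbol{\theta}}\big) \right], \tag{14a}
    $$

    $$
    \widehat{APE}_{y_3}(\mathbf{x}_1) = N^{-1} \sum_{i=1}^{N} \Big[ \Phi\big(\mathbf{x}_1^{(1)}\hat{\boldsymbol{\beta}} + \hat{\mathbf{v}}_{i2}\hat{\boldsymbol{\theta}}\big) - \Phi\big(\mathbf{x}_1^{(0)}\hat{\boldsymbol{\beta}} + \hat{\mathbf{v}}_{i2}\hat{\boldsymbol{\theta}}\big) \Big]. \tag{14b}
    $$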
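A minimal Python sketch of the sample analogues (14a) and (14b) follows. The function names and argument layout are hypothetical and chosen only for illustration; `x1` is a single fixed evaluation point, `v2_hat` holds the N first-stage residuals, and `j` is the assumed position of y2 within x1 (the leading case in which y2 enters x1 linearly).

```python
import numpy as np
from scipy.stats import norm


def ape_continuous(x1, beta, theta, v2_hat, j):
    """Sample analogue (14a): APE of the regressor in position j of x1.

    x1 is held fixed; the average runs over the N residual rows of v2_hat.
    """
    index = x1 @ beta + v2_hat @ theta      # one index value per observation i
    return beta[j] * norm.pdf(index).mean()


def ape_binary(x1_on, x1_off, beta, theta, v2_hat):
    """Sample analogue (14b): x1_on sets y3 = 1 and x1_off sets y3 = 0."""
    cf = v2_hat @ theta                     # control function term per observation
    return (norm.cdf(x1_on @ beta + cf) - norm.cdf(x1_off @ beta + cf)).mean()
```

Evaluating these functions over a grid of `x1` values traces out how the partial effects vary with the covariates.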

    For inference on the APE estimators in (14a) and (14b), analytical standard errors can be

    derived by the delta method after casting the two-step control function problem as a one-step

    method of moments problem. However, because every step of the estimation uses standard routines,

    bootstrap standard errors that account for the sampling error in both steps are easy to obtain.
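The bootstrap described above can be sketched as a generic resampling loop. This is an illustrative sketch, not the authors' code: `bootstrap_se` and its arguments are hypothetical names, and the key assumption is that the supplied `estimator` reruns both steps (first-stage OLS and second-stage estimation) on each resampled data set, so that the sampling error in v̂2 is reflected in the draws.

```python
import numpy as np


def bootstrap_se(data, estimator, n_boot=500, seed=0):
    """Nonparametric (pairs) bootstrap standard error.

    data: (N, p) array with one row per observation.
    estimator: function mapping such an array to a scalar or 1-D array,
        e.g. an APE computed after rerunning the full two-step procedure.
    """
    rng = np.random.default_rng(seed)
    n = data.shape[0]
    draws = []
    for _ in range(n_boot):
        rows = rng.integers(0, n, size=n)   # resample rows with replacement
        draws.append(estimator(data[rows]))
    return np.std(np.asarray(draws), axis=0, ddof=1)
```

Resampling whole rows keeps each observation's outcome, EEVs, and instruments together, which is what allows the first-stage estimation error to propagate into the second step.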

    As shown in (13a) and (13b), APEs for the binary response model have the attractive feature of built-in

    heterogeneity: they deliver varying partial effects when evaluated at different values of x1. However, if one

    wants a single summary statistic for marginal effects, further averaging across x1 is needed.

    Averaging jointly across x1 and v̂2 (as the "margins" command in Stata does) is computationally easier

    but bears a different causal interpretation from sequentially averaging out v̂2 and then x1 (Nam and Wooldridge,

    2014).

    Although it serves as a starting point, the modelling strategy in (1) for the CF approach is limited in several

    ways. One restrictive feature is that the reduced form error v2 needs to be independent of the exogenous

    variables z. The linear functional form for the conditional mean of y2 may also be unrealistic; it can be relaxed to

    a generic function π (·) of z, as in Blundell and Powell (2003, 2004). More importantly, v2 here acts as a

    sufficient statistic for the endogeneity of y2 in the structural error u1: that is, y2 is correlated

    with u1 only through v2 in level form. However, as shown in Murtazashvili and Wooldridge (2015), with

    more heterogeneity, such as random coefficients, the unobservable u1 can contain full interactions

    between v2 and (z, x1). Besides interactions, allowing for an unknown function h (·) of v2, as

    in Lin (2016), does not make the dependence of u1 on v2 fully flexible, but it does relax

    this restrictive assumption somewhat.

    2.2 Probit Endogenous Switching Models with Many Continuous EEVs

    As we are interested in modeling some heterogeneity besides EEVs, we turn to a probit switching

    regression with EEVs. The binary EEV y3 can be viewed as a switching indicator. In addition to shifting

    intercepts when y3 appears by itself in the linear index, the switching can be made more general. Interacting

    y3 with all the observables allows us to switch into regimes of differing slopes. The interaction between

    y3 and unobservables indicates the two regimes have differing unobservables. The switching is endogenous

    because y3 is correlated with the unobservables. In the treatment effect framework, y3 is the treatment

    indicator and the treatment effect is heterogeneous. To see this, first write the model as follows:

    y1 = 1 [(1− y3)x1β0 + y3x1β1 + (1− y3)u0 + y3u1 > 0] (15a)

    y2 = zΠ + v2 (15b)

    y3 = 1 [zδ + u3 > 0] , (15c)

    Under a similar set of notations and assumptions as in (1), write the linear projection of u1, u0 and u3

    onto the reduced form error v2 in error forms:

    u0 = v2θ0 + v0 (16a)

    u1 = v2θ1 + v1 (16b)

    u3 = v2η + v3, (16c)

    where $\boldsymbol{\theta}_0 \equiv E(\mathbf{v}_2'\mathbf{v}_2)^{-1}E(\mathbf{v}_2'u_0)$, $\boldsymbol{\theta}_1 \equiv E(\mathbf{v}_2'\mathbf{v}_2)^{-1}E(\mathbf{v}_2'u_1)$ and $\boldsymbol{\eta} \equiv E(\mathbf{v}_2'\mathbf{v}_2)^{-1}E(\mathbf{v}_2'u_3)$. Then, we

    maintain a strong exogeneity assumption that the remaining error terms v0 and v1 are independent of v2 and

    a parametric assumption that each has a bivariate normal distribution with the remaining error term v3, with

    covariances ρ0 and ρ1, respectively:

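    $$
    D\!\left(\begin{pmatrix} v_0 \\ v_3 \end{pmatrix} \,\middle|\, \mathbf{v}_2\right)
    \sim \text{Normal}\!\left(\begin{pmatrix} 0 \\ 0 \end{pmatrix},
    \begin{pmatrix} 1 & \rho_0 \\ \rho_0 & 1 \end{pmatrix}\right), \tag{17a}
    $$

    $$
    D\!\left(\begin{pmatrix} v_1 \\ v_3 \end{pmatrix} \,\middle|\, \mathbf{v}_2\right)
    \sim \text{Normal}\!\left(\begin{pmatrix} 0 \\ 0 \end{pmatrix},
    \begin{pmatrix} 1 & \rho_1 \\ \rho_1 & 1 \end{pmatrix}\right). \tag{17b}
    $$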

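    Again, assuming (v1, v3) and (v0, v3) are independent of z leads to independence between y2 and

    the joint distributions of (v0, v3) and (v1, v3):

    $$
    D\!\left(\begin{pmatrix} v_0 \\ v_3 \end{pmatrix}\right)
    = D\!\left(\begin{pmatrix} v_0 \\ v_3 \end{pmatrix} \,\middle|\, \mathbf{z}, \mathbf{v}_2\right)
    = D\!\left(\begin{pmatrix} v_0 \\ v_3 \end{pmatrix} \,\middle|\, \mathbf{y}_2, \mathbf{z}, \mathbf{v}_2\right), \tag{18a}
    $$

    $$
    D\!\left(\begin{pmatrix} v_1 \\ v_3 \end{pmatrix}\right)
    = D\!\left(\begin{pmatrix} v_1 \\ v_3 \end{pmatrix} \,\middle|\, \mathbf{z}, \mathbf{v}_2\right)
    = D\!\left(\begin{pmatrix} v_1 \\ v_3 \end{pmatrix} \,\middle|\, \mathbf{y}_2, \mathbf{z}, \mathbf{v}_2\right). \tag{18b}
    $$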

    Then, rewrite model (15) in the treatment framework:

    \begin{align}
    y_1 &= (1-y_3)\,y_1^{(0)} + y_3\,y_1^{(1)}, \tag{19} \\
    y_1^{(0)} &= 1[\mathbf{x}_1\boldsymbol{\beta}_0 + \mathbf{v}_2\boldsymbol{\theta}_0 + v_0 > 0], \tag{20} \\
    y_1^{(1)} &= 1[\mathbf{x}_1\boldsymbol{\beta}_1 + \mathbf{v}_2\boldsymbol{\theta}_1 + v_1 > 0], \tag{21} \\
    y_3 &= 1[\mathbf{z}\boldsymbol{\delta} + \mathbf{v}_2\boldsymbol{\eta} + v_3 > 0], \tag{22}
    \end{align}

    where $y_1^{(0)}$ is the potential outcome when the treatment y3 equals zero and $y_1^{(1)}$ is the potential outcome

    when the treatment equals one. The self-selection problem is represented by the nonzero correlation between

    the treatment indicator y3 and the unobservables v0 and v1 in the potential outcomes. Those who self-select

    into treatment inherently have a different distribution of unobservables from those who do not.

    To consistently estimate the parameters in this model, a simple three-step control function approach

    splits the above model into two Heckman sample selection models with sub-samples defined by the treatment

    status:

    1. Using all observations, estimate (15b), the reduced forms for y2, by ordinary least squares (OLS),

    equation by equation, to obtain the residuals v̂2 = y2 − zΠ̂.

    2. Since $y_1^{(1)}$ is observed only when y3 = 1, jointly estimate (21) and (22), the binary outcome equation

    for $y_1^{(1)}$ and the sample selection equation for the indicator y3, by maximum likelihood estimation (MLE), replacing

    v2 with v̂2, to obtain β̂1 and θ̂1.

    3. Since $y_1^{(0)}$ is observed only when y3 = 0, jointly estimate (20) and (22), the binary response model for

    $y_1^{(0)}$ and the sample selection equation for the indicator 1 − y3, by maximum likelihood estimation (MLE), replacing

    v2 with v̂2, to obtain β̂0 and θ̂0.

    The above procedure is justified by splitting the objective function for the second-step estimation into

    two parts.

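    Namely, solving

    $$
    \begin{aligned}
    &\max_{\mathbf{b}_0\in\mathbb{R}^{K_1},\,\mathbf{b}_1\in\mathbb{R}^{K_1},\,\mathbf{d}\in\mathbb{R}^{L},\,\mathbf{r}_0\in\mathbb{R}^{G},\,\mathbf{r}_1\in\mathbb{R}^{G},\,\mathbf{g}\in\mathbb{R}^{G},\,\rho_0\in\mathbb{R},\,\rho_1\in\mathbb{R}}
    E\left[\log P(y_1, y_3 \mid \mathbf{y}_2, \mathbf{z}, \mathbf{v}_2)\right] \\
    &\quad = E\Big[\, y_1 y_3 \log P\big(y_1^{(1)} = 1, y_3 = 1 \mid \mathbf{z}, \mathbf{v}_2, \mathbf{y}_2\big)
    + (1-y_1)\,y_3 \log P\big(y_1^{(1)} = 0, y_3 = 1 \mid \mathbf{z}, \mathbf{v}_2, \mathbf{y}_2\big) \\
    &\qquad + y_1 (1-y_3) \log P\big(y_1^{(0)} = 1, y_3 = 0 \mid \mathbf{z}, \mathbf{v}_2, \mathbf{y}_2\big)
    + (1-y_1)(1-y_3) \log P\big(y_1^{(0)} = 0, y_3 = 0 \mid \mathbf{z}, \mathbf{v}_2, \mathbf{y}_2\big) \Big],
    \end{aligned}
    \tag{23}
    $$

    is equivalent to solving

    $$
    \begin{aligned}
    &\max_{\mathbf{b}_1\in\mathbb{R}^{K_1},\,\mathbf{d}\in\mathbb{R}^{L},\,\mathbf{r}_1\in\mathbb{R}^{G},\,\mathbf{g}\in\mathbb{R}^{G},\,\rho_1\in\mathbb{R}}
    E\left[\log P(y_1, y_3 \mid \mathbf{y}_2, \mathbf{z}, \mathbf{v}_2)\right] \\
    &\quad = E\Big[\, y_1^{(1)} y_3 \log P\big(y_1^{(1)} = 1, y_3 = 1 \mid \mathbf{z}, \mathbf{v}_2, \mathbf{y}_2\big)
    + \big(1-y_1^{(1)}\big)\,y_3 \log P\big(y_1^{(1)} = 0, y_3 = 1 \mid \mathbf{z}, \mathbf{v}_2, \mathbf{y}_2\big) \\
    &\qquad + (1-y_3) \log P\big(y_3 = 0 \mid \mathbf{z}, \mathbf{v}_2, \mathbf{y}_2\big) \Big],
    \end{aligned}
    \tag{24}
    $$

    and

    $$
    \begin{aligned}
    &\max_{\mathbf{b}_0\in\mathbb{R}^{K_1},\,\mathbf{d}\in\mathbb{R}^{L},\,\mathbf{r}_0\in\mathbb{R}^{G},\,\mathbf{g}\in\mathbb{R}^{G},\,\rho_0\in\mathbb{R}}
    E\left[\log P(y_1, y_3 \mid \mathbf{y}_2, \mathbf{z}, \mathbf{v}_2)\right] \\
    &\quad = E\Big[\, y_1^{(0)} (1-y_3) \log P\big(y_1^{(0)} = 1, y_3 = 0 \mid \mathbf{z}, \mathbf{v}_2, \mathbf{y}_2\big)
    + \big(1-y_1^{(0)}\big)(1-y_3) \log P\big(y_1^{(0)} = 0, y_3 = 0 \mid \mathbf{z}, \mathbf{v}_2, \mathbf{y}_2\big) \\
    &\qquad + y_3 \log P\big(y_3 = 1 \mid \mathbf{z}, \mathbf{v}_2, \mathbf{y}_2\big) \Big],
    \end{aligned}
    \tag{25}
    $$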

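    where

    \begin{align}
    P\big(y_1^{(1)} = 1, y_3 = 1 \mid \mathbf{z}, \mathbf{v}_2, \mathbf{y}_2\big) &= \int_{-q_3}^{\infty} \Phi(d_1)\,\phi(\upsilon_3)\,d\upsilon_3, \tag{26} \\
    P\big(y_1^{(1)} = 0, y_3 = 1 \mid \mathbf{z}, \mathbf{v}_2, \mathbf{y}_2\big) &= \int_{-q_3}^{\infty} [1-\Phi(d_1)]\,\phi(\upsilon_3)\,d\upsilon_3, \tag{27} \\
    P\big(y_1^{(0)} = 1, y_3 = 0 \mid \mathbf{z}, \mathbf{v}_2, \mathbf{y}_2\big) &= \int_{-\infty}^{-q_3} \Phi(d_0)\,\phi(\upsilon_3)\,d\upsilon_3, \tag{28} \\
    P\big(y_1^{(0)} = 0, y_3 = 0 \mid \mathbf{z}, \mathbf{v}_2, \mathbf{y}_2\big) &= \int_{-\infty}^{-q_3} [1-\Phi(d_0)]\,\phi(\upsilon_3)\,d\upsilon_3, \tag{29} \\
    P\big(y_3 = 1 \mid \mathbf{z}, \mathbf{v}_2, \mathbf{y}_2\big) &= \Phi(q_3), \tag{30} \\
    P\big(y_3 = 0 \mid \mathbf{z}, \mathbf{v}_2, \mathbf{y}_2\big) &= 1 - \Phi(q_3), \tag{31} \\
    d_1 &\equiv \frac{\mathbf{x}_1\mathbf{b}_1 + \mathbf{v}_2\mathbf{r}_1 + \rho_1\upsilon_3}{\sqrt{1-\rho_1^2}}, \tag{32} \\
    d_0 &\equiv \frac{\mathbf{x}_1\mathbf{b}_0 + \mathbf{v}_2\mathbf{r}_0 + \rho_0\upsilon_3}{\sqrt{1-\rho_0^2}}, \tag{33} \\
    q_3 &\equiv \mathbf{z}\mathbf{d} + \mathbf{v}_2\mathbf{g}. \tag{34}
    \end{align}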

    Similar to (12), the ASF for the endogenous switching model is a combination of the ASFs for the two

    regimes:

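    $$
    ASF(\mathbf{x}_1) = \int_{-\infty}^{\infty} \big[ y_3\,\Phi(\mathbf{x}_1\boldsymbol{\beta}_1 + \boldsymbol{\upsilon}_{i2}\boldsymbol{\theta}_1) + (1-y_3)\,\Phi(\mathbf{x}_1\boldsymbol{\beta}_0 + \boldsymbol{\upsilon}_{i2}\boldsymbol{\theta}_0) \big] \phi(\boldsymbol{\upsilon}_{i2})\,d\boldsymbol{\upsilon}_{i2}. \tag{35}
    $$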

    APEs for a continuous EEV y2 and binary EEV y3 are defined as follows respectively:

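    $$
    APE_{y_2}(\mathbf{x}_1) = \int_{-\infty}^{\infty} \Big[ \beta_{y_2^{(1)}}\, y_3\, \phi(\mathbf{x}_1\boldsymbol{\beta}_1 + \boldsymbol{\upsilon}_{i2}\boldsymbol{\theta}_1)
    + \beta_{y_2^{(0)}}\, (1-y_3)\, \phi(\mathbf{x}_1\boldsymbol{\beta}_0 + \boldsymbol{\upsilon}_{i2}\boldsymbol{\theta}_0) \Big] \phi(\boldsymbol{\upsilon}_{i2})\,d\boldsymbol{\upsilon}_{i2}, \tag{36a}
    $$

    $$
    APE_{y_3}(\mathbf{x}_1) = \int_{-\infty}^{\infty} \big[ \Phi(\mathbf{x}_1\boldsymbol{\beta}_1 + \boldsymbol{\upsilon}_{i2}\boldsymbol{\theta}_1) - \Phi(\mathbf{x}_1\boldsymbol{\beta}_0 + \boldsymbol{\upsilon}_{i2}\boldsymbol{\theta}_0) \big] \phi(\boldsymbol{\upsilon}_{i2})\,d\boldsymbol{\upsilon}_{i2}, \tag{36b}
    $$

    where $\beta_{y_2^{(1)}}$ is the coefficient on y2 in (21) and $\beta_{y_2^{(0)}}$ is the coefficient on y2 in (20).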

    Notice that the APE for a binary exogenous variable z1 is defined nontrivially as

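    $$
    \begin{aligned}
    APE_{z_1}(\mathbf{x}_1) = \int_{-\infty}^{\infty} \Big\{ & y_3 \Big[ \Phi\big(\mathbf{x}_1^{(1)}\boldsymbol{\beta}_1 + \boldsymbol{\upsilon}_{i2}\boldsymbol{\theta}_1\big) - \Phi\big(\mathbf{x}_1^{(0)}\boldsymbol{\beta}_1 + \boldsymbol{\upsilon}_{i2}\boldsymbol{\theta}_1\big) \Big] \\
    & + (1-y_3) \Big[ \Phi\big(\mathbf{x}_1^{(1)}\boldsymbol{\beta}_0 + \boldsymbol{\upsilon}_{i2}\boldsymbol{\theta}_0\big) - \Phi\big(\mathbf{x}_1^{(0)}\boldsymbol{\beta}_0 + \boldsymbol{\upsilon}_{i2}\boldsymbol{\theta}_0\big) \Big] \Big\}\, \phi(\boldsymbol{\upsilon}_{i2})\,d\boldsymbol{\upsilon}_{i2},
    \end{aligned}
    \tag{37}
    $$

    where $\mathbf{x}_1^{(1)}$ denotes the explanatory variables at a particular fixed value with z1 = 1 and $\mathbf{x}_1^{(0)}$ denotes the same

    fixed value of the explanatory variables except that now z1 = 0.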

    Correspondingly, a consistent estimate of the APEs is a sample analog of (36a) and (36b) with consistent

    estimates for the parameters plugged in:

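    $$
    \widehat{APE}_{y_2}(\mathbf{x}_1) = N^{-1} \sum_{i=1}^{N} \Big[ \hat{\beta}_{y_2^{(1)}}\, y_3\, \phi\big(\mathbf{x}_1\hat{\boldsymbol{\beta}}_1 + \hat{\mathbf{v}}_{i2}\hat{\boldsymbol{\theta}}_1\big)
    + \hat{\beta}_{y_2^{(0)}}\, (1-y_3)\, \phi\big(\mathbf{x}_1\hat{\boldsymbol{\beta}}_0 + \hat{\mathbf{v}}_{i2}\hat{\boldsymbol{\theta}}_0\big) \Big], \tag{38a}
    $$

    $$
    \widehat{APE}_{y_3}(\mathbf{x}_1) = N^{-1} \sum_{i=1}^{N} \Big[ \Phi\big(\mathbf{x}_1\hat{\boldsymbol{\beta}}_1 + \hat{\mathbf{v}}_{i2}\hat{\boldsymbol{\theta}}_1\big) - \Phi\big(\mathbf{x}_1\hat{\boldsymbol{\beta}}_0 + \hat{\mathbf{v}}_{i2}\hat{\boldsymbol{\theta}}_0\big) \Big]. \tag{38b}
    $$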
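As an illustration of (38b), the treatment-type APE in the switching model is simply the average gap between the two regimes' fitted response probabilities at a fixed x1. The function name and argument order below are hypothetical, and the regime-specific estimates are assumed to come from the three-step procedure described earlier.

```python
import numpy as np
from scipy.stats import norm


def ape_switch(x1, beta1, theta1, beta0, theta0, v2_hat):
    """Sample analogue (38b): average difference between regime probabilities."""
    p1 = norm.cdf(x1 @ beta1 + v2_hat @ theta1)   # regime with y3 = 1
    p0 = norm.cdf(x1 @ beta0 + v2_hat @ theta0)   # regime with y3 = 0
    return (p1 - p0).mean()
```

When both regimes share the same coefficients the function returns zero, which matches the intuition that the treatment effect in this model comes entirely from differences between (β1, θ1) and (β0, θ0).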

    As before, instead of deriving complicated analytical formulas for the standard errors of the estimated APEs,

    bootstrap standard errors can easily be computed to account for the sampling variation in the generated regressor

    v̂2.

    Although the switching model adds flexibility by allowing the structural error u ≡

    (1− y3)u0 + y3u1 to depend not only on v2 but also on interactions between v2 and y3, assuming that the

    reduced forms for y2 remain unchanged across the two regimes is restrictive in empirical applications.

    3 Test for Endogeneity from a Binary Explanatory Variable

    This section focuses on variable addition tests for additional endogeneity from a binary explanatory

    variable, conditioning on v̂2, the residuals from reduced forms for continuous EEVs. As we have seen in

    equations (1) and (15), the only consistent approach to deal with a binary EEV is to make distributional

    assumptions and conduct a joint estimation. In applications, we would like to avoid joint MLE where possible,

    because of its sensitivity to the distributional assumptions and the computational difficulty of arriving at a

    numerical solution. A variable addition test (VAT), as proposed in Wooldridge (2014), helps us determine

    whether such a joint estimation is necessary by testing on generalized residuals before proceeding to a

    joint estimation. Especially if we have already controlled for endogeneity from other continuous EEVs by

    conditioning on v̂2, the generalized residual is less likely to be correlated with the remaining unobservable.

    The following shows that the VAT on the generalized residual is a valid test for endogeneity from a binary

    explanatory variable because it is asymptotically equivalent to an LM test under the null hypothesis of no

    endogeneity.

    More formally, in the basic model (1), we are interested in testing the following null hypothesis:

    H0 : ρ = 0. We begin with an infeasible Lagrange multiplier (score) test that has an asymptotic

    $\chi^2_1$ distribution. Then, we show that, conditional on v2, the VAT on the generalized residual is asymptotically

    equivalent to the infeasible LM test and thus has the same asymptotic $\chi^2_1$ distribution. In practice, in

    order to account for the sampling error in v̂2, we bootstrap the two-step procedure to obtain the p-value of

    the test. Let γ ≡ (β,θ) and wi ≡ (xi1,vi2). Let d̃i be di in (8) evaluated at ρ = 0 and γ̃ be the estimate

    of γ obtained from the restricted model. The restricted model is one where ρ = 0, so we treat y3 as an

    exogenous explanatory variable. Let q̂3i be q3i in (9) evaluated at the parameter estimates (δ̂, η̂) from a reduced-form

    probit estimation.

    As in Semykina and Wooldridge (2017), using the likelihood function Li ≡ P (yi1, yi3|yi2, zi,vi2)

    for one observation, the LM statistic plugs the estimates from the restricted model into the score from the

    unrestricted model:

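    $$
    LM = \left( \sum_{i=1}^{N} \tilde{S}_{i,\rho} \right)' \tilde{A}_{22} \big[ \tilde{V}_{22} \big]^{-1} \tilde{A}_{22} \left( \sum_{i=1}^{N} \tilde{S}_{i,\rho} \right) \Big/ N, \tag{39}
    $$

    where

    $$
    \tilde{S}_{i,\rho} \equiv \left. \frac{\partial \ln L_i}{\partial \rho} \right|_{\boldsymbol{\gamma}=\tilde{\boldsymbol{\gamma}},\,\rho=0}
    = \frac{y_{i1} - \Phi(\tilde{d}_i)}{\Phi(\tilde{d}_i)\,[1-\Phi(\tilde{d}_i)]}\, \phi(\tilde{d}_i)\, \widehat{gr}_{i3},
    $$

    $$
    \begin{aligned}
    \tilde{A} &\equiv -\frac{1}{N} \left. \begin{pmatrix}
    \sum_{i=1}^{N} E\!\left( \frac{\partial^2 \ln L_i}{\partial \boldsymbol{\gamma}\,\partial \boldsymbol{\gamma}'} \,\middle|\, y_{i3}, \mathbf{y}_{i2}, \mathbf{z}_i, \mathbf{v}_{i2} \right) &
    \sum_{i=1}^{N} E\!\left( \frac{\partial^2 \ln L_i}{\partial \boldsymbol{\gamma}\,\partial \rho} \,\middle|\, y_{i3}, \mathbf{y}_{i2}, \mathbf{z}_i, \mathbf{v}_{i2} \right) \\
    \sum_{i=1}^{N} E\!\left( \frac{\partial^2 \ln L_i}{\partial \rho\,\partial \boldsymbol{\gamma}'} \,\middle|\, y_{i3}, \mathbf{y}_{i2}, \mathbf{z}_i, \mathbf{v}_{i2} \right) &
    \sum_{i=1}^{N} E\!\left( \frac{\partial^2 \ln L_i}{\partial \rho\,\partial \rho} \,\middle|\, y_{i3}, \mathbf{y}_{i2}, \mathbf{z}_i, \mathbf{v}_{i2} \right)
    \end{pmatrix} \right|_{\boldsymbol{\gamma}=\tilde{\boldsymbol{\gamma}},\,\rho=0} \\
    &= \frac{1}{N} \begin{pmatrix}
    \sum_{i=1}^{N} \frac{\phi(\tilde{d}_i)^2}{\Phi(\tilde{d}_i)[1-\Phi(\tilde{d}_i)]}\, \mathbf{w}_i'\mathbf{w}_i &
    \sum_{i=1}^{N} \frac{\phi(\tilde{d}_i)^2}{\Phi(\tilde{d}_i)[1-\Phi(\tilde{d}_i)]}\, \mathbf{w}_i'\,\widehat{gr}_{i3} \\
    \sum_{i=1}^{N} \frac{\phi(\tilde{d}_i)^2}{\Phi(\tilde{d}_i)[1-\Phi(\tilde{d}_i)]}\, \widehat{gr}_{i3}\,\mathbf{w}_i &
    \sum_{i=1}^{N} \frac{\phi(\tilde{d}_i)^2}{\Phi(\tilde{d}_i)[1-\Phi(\tilde{d}_i)]}\, \widehat{gr}_{i3}^{\,2}
    \end{pmatrix},
    \end{aligned}
    $$

    $$
    \tilde{A}^{-1} = \begin{pmatrix} \tilde{A}_{11} & \tilde{A}_{12} \\ \tilde{A}_{21} & \tilde{A}_{22} \end{pmatrix}, \qquad
    \tilde{V} = \tilde{A}^{-1}\tilde{B}\tilde{A}^{-1} = \begin{pmatrix} \tilde{V}_{11} & \tilde{V}_{12} \\ \tilde{V}_{21} & \tilde{V}_{22} \end{pmatrix}, \qquad
    \tilde{B} \equiv \frac{1}{N} \sum_{i=1}^{N} \big( \tilde{S}_{i,\rho}\tilde{S}_{i,\rho}' \big),
    $$

    $$
    \widehat{gr}_{i3} \equiv y_{i3}\, \frac{\phi(\hat{q}_{3i})}{\Phi(\hat{q}_{3i})} - (1-y_{i3})\, \frac{\phi(-\hat{q}_{3i})}{\Phi(-\hat{q}_{3i})}. \tag{40}
    $$

    Here the blocks of $\tilde{A}^{-1}$ and $\tilde{V}$ are partitioned conformably with (γ, ρ).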

    Matrix Ã above is an estimator of the expected value of the negative Hessian matrix that uses the expected
    Hessian form. The outer product of scores or the usual Hessian form of the matrix could also be used. ĝri3 is a
    consistent estimator of gri3 ≡ E (vi3|yi3,yi2, zi,vi2).
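    As a small illustration of equation (40), the following sketch computes the generalized residual for a
    single observation from the binary outcome and a fitted probit index. The function name and argument
    names are illustrative choices and do not come from the paper.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def generalized_residual(y3: int, q3_hat: float) -> float:
    """Generalized residual of a probit fit, following equation (40).

    y3 is the binary outcome (0 or 1); q3_hat is the fitted probit index.
    """
    if y3 == 1:
        # For y3 = 1 the residual is the inverse Mills ratio phi(q)/Phi(q).
        return _STD_NORMAL.pdf(q3_hat) / _STD_NORMAL.cdf(q3_hat)
    # For y3 = 0 the residual is -phi(-q)/Phi(-q).
    return -_STD_NORMAL.pdf(-q3_hat) / _STD_NORMAL.cdf(-q3_hat)
```

    The residual is positive whenever y3 = 1 and negative whenever y3 = 0, which is what one expects of an
    estimate of E(v3 | y3, ·).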

    A VAT can be carried out by the following procedure of testing on generalized residuals:

    1. Use OLS to estimate the reduced-form equations for yi2 (1b) to obtain v̂i2.

    2. Use probit to estimate the augmented reduced form for yi3 in (5b), and construct ĝri3 according to the
    formula in equation (40).

    3. Augment equation (5a) by ĝri3 and estimate by probit. Use the t statistics for testing single hypotheses.
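    The three steps above can be sketched in Python as follows. This is a minimal illustration under stated
    assumptions, not the authors' implementation: there is a single continuous EEV, the probit is fitted by a
    hand-written Fisher scoring routine (fit_probit, a helper introduced here), and the reported t statistic
    uses expected-Hessian standard errors that ignore the sampling error in v̂2. As discussed in the text, a
    bootstrap of the whole procedure would be needed for a valid p-value.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)


def norm_cdf(x):
    """Standard normal CDF, clipped away from 0 and 1 for numerical safety."""
    p = 0.5 * (1.0 + _erf(np.asarray(x, dtype=float) / math.sqrt(2.0)))
    return np.clip(p, 1e-10, 1.0 - 1e-10)


def norm_pdf(x):
    """Standard normal density."""
    x = np.asarray(x, dtype=float)
    return np.exp(-0.5 * x**2) / math.sqrt(2.0 * math.pi)


def fit_probit(X, y, n_iter=100, tol=1e-10):
    """Probit MLE by Fisher scoring. Returns coefficients and standard errors
    from the expected-Hessian form of the information matrix (taken from the
    final iteration, i.e. essentially at the converged estimate)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        idx = X @ beta
        p, pdf = norm_cdf(idx), norm_pdf(idx)
        score = X.T @ ((y - p) * pdf / (p * (1.0 - p)))
        info = (X * (pdf**2 / (p * (1.0 - p)))[:, None]).T @ X
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    se = np.sqrt(np.diag(np.linalg.inv(info)))
    return beta, se


def vat_t_statistic(y1, y2, y3, x_exog, z):
    """t statistic on the generalized residual in the augmented probit for y1.

    x_exog: included exogenous regressors (with a constant column).
    z: all exogenous variables (constant, included and excluded instruments).
    """
    # Step 1: OLS reduced form for the continuous EEV; keep the residual.
    pi_hat = np.linalg.lstsq(z, y2, rcond=None)[0]
    v2_hat = y2 - z @ pi_hat

    # Step 2: probit reduced form for y3 augmented by v2_hat, then the
    # generalized residual of equation (40).
    w3 = np.column_stack([z, v2_hat])
    delta_hat, _ = fit_probit(w3, y3)
    q3_hat = w3 @ delta_hat
    gr_hat = (y3 * norm_pdf(q3_hat) / norm_cdf(q3_hat)
              - (1.0 - y3) * norm_pdf(-q3_hat) / norm_cdf(-q3_hat))

    # Step 3: probit for y1 augmented by v2_hat and gr_hat; the generalized
    # residual is the last column, so its t statistic is the last ratio.
    x1 = np.column_stack([x_exog, y2, y3, v2_hat, gr_hat])
    coef, se = fit_probit(x1, y1)
    return coef[-1] / se[-1]
```

    Keeping the whole procedure in one function makes it straightforward to re-run all three steps on each
    bootstrap sample.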

    Under the null hypothesis the coefficient on ĝri3 is zero, and so estimation of the parameters in ĝri3 does
    not affect the √N-asymptotic distribution of the test statistic. There is no need to account for the first-step
    estimation of ĝri3 when performing the test. However, as in Wooldridge (2010, Section 12.5.2), we need
    to adjust for the first-step estimation of vi2, by stacking the moment conditions or by bootstrapping the
    two-step procedure.

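    The following shows that, conditional on vi2, the variable addition test is asymptotically equivalent to
    the LM test. Write the second-step likelihood function for one observation as

    $$
    L_i = \Phi\left(x_{i1}\beta + v_{i2}\theta + \tau\, gr_{i3}\right)^{y_{i1}}\left[1-\Phi\left(x_{i1}\beta + v_{i2}\theta + \tau\, gr_{i3}\right)\right]^{1-y_{i1}}. \tag{41}
    $$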

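    As mentioned above, we ignore the fact that gr3 is estimated consistently at the first step. The score
    vector of (41) is

    $$
    S_i = \begin{pmatrix} \partial \ln L_i/\partial\gamma \\ \partial \ln L_i/\partial\tau \end{pmatrix}
    = \frac{y_{i1}-\Phi(w_i\gamma+\tau\, gr_{i3})}{\Phi(w_i\gamma+\tau\, gr_{i3})\,[1-\Phi(w_i\gamma+\tau\, gr_{i3})]}\,\phi(w_i\gamma+\tau\, gr_{i3})
    \begin{pmatrix} w_i' \\ gr_{i3} \end{pmatrix}. \tag{42}
    $$

    Summing the score vector over all i and using a mean-value expansion about the true parameter vector
    gives

    $$
    N^{-1/2}\sum_{i=1}^{N}\hat{S}_i = N^{-1/2}\sum_{i=1}^{N}S_i - A\sqrt{N}\begin{pmatrix}\hat{\gamma}-\gamma \\ \hat{\tau}-\tau\end{pmatrix} + o_p(1) = 0, \tag{43}
    $$

    where Ŝi is the score vector evaluated at the estimated parameters (γ̂′, τ̂)′, and A is the expected value of
    the negative Hessian matrix. Hence

    $$
    \sqrt{N}\begin{pmatrix}\hat{\gamma}-\gamma \\ \hat{\tau}-\tau\end{pmatrix} = A^{-1}\left[N^{-1/2}\sum_{i=1}^{N}S_i\right] + o_p(1). \tag{44}
    $$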

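    When testing H0 : τ = 0, the robust Wald test statistic is given by

    $$
    W = (\hat{\tau}-\tau)'\left(\hat{V}_{22}/N\right)^{-1}(\hat{\tau}-\tau) = \sqrt{N}(\hat{\tau}-\tau)'\,\hat{V}_{22}^{-1}\,\sqrt{N}(\hat{\tau}-\tau), \tag{45}
    $$

    where

    $$
    \hat{V} = \hat{A}^{-1}\hat{B}\hat{A}^{-1} = \begin{pmatrix}\hat{V}_{11} & \hat{V}_{12} \\ \hat{V}_{21} & \hat{V}_{22}\end{pmatrix}, \tag{46}
    $$

    $$
    \hat{B} = \frac{1}{N}\sum_{i=1}^{N}\left(\tilde{S}_{i,\rho}\tilde{S}_{i,\rho}'\right), \tag{47}
    $$

    $$
    \hat{A} = \frac{1}{N}
    \begin{pmatrix}
    \sum_{i=1}^{N}\frac{\phi(\hat{p}_i)^2}{\Phi(\hat{p}_i)[1-\Phi(\hat{p}_i)]}\,w_i'w_i &
    \sum_{i=1}^{N}\frac{\phi(\hat{p}_i)^2}{\Phi(\hat{p}_i)[1-\Phi(\hat{p}_i)]}\,w_i'\,\widehat{gr}_{i3} \\
    \sum_{i=1}^{N}\frac{\phi(\hat{p}_i)^2}{\Phi(\hat{p}_i)[1-\Phi(\hat{p}_i)]}\,\widehat{gr}_{i3}\,w_i &
    \sum_{i=1}^{N}\frac{\phi(\hat{p}_i)^2}{\Phi(\hat{p}_i)[1-\Phi(\hat{p}_i)]}\,\widehat{gr}_{i3}^{\,2}
    \end{pmatrix}, \tag{48}
    $$

    $$
    \hat{p}_i = w_i\hat{\gamma} + \hat{\tau}\,\widehat{gr}_{i3}, \tag{49}
    $$

    $$
    \hat{A}^{-1} \xrightarrow{p} A^{-1} = \begin{pmatrix} A^{11} & A^{12} \\ A^{21} & A^{22}\end{pmatrix}. \tag{50}
    $$

    So the Wald statistic can also be written as

    $$
    W = \left(\sum_{i=1}^{N}S_{i,\tau}\right)' A^{22}\,\hat{V}_{22}^{-1}\,A^{22}\left(\sum_{i=1}^{N}S_{i,\tau}\right)\Big/ N. \tag{51}
    $$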

    Under the null of no selection bias (τ = 0, ρ = 0), the score and Hessian matrices used in (39) and (51)

    are the same when evaluated at the true parameter values. When the null is true, τ̂ converges in probability
    to 0, and √N (γ̂ − γ) and √N (γ̃ − γ) converge in distribution. Therefore, LM − W converges in probability
    to 0, so the tests are asymptotically equivalent. The p-value for the test can be obtained by bootstrapping
    the two-step procedure.

    4 Quasi-LIML and Fractional Response

    Based on the literature on quasi-MLE (White, 1982), the findings above carry through if f1 is a fractional
    response with a conditional mean that happens to have a probit form. The key insight from quasi-likelihood
    estimation is that we do not need to know the true distribution of the entire model to obtain consistent
    parameter estimates. This likelihood function could also be applied to the case where y1 is a fractional
    response, as long as we model the conditional mean of y1 to have a probit form. With the Bernoulli
    distribution being in the linear exponential family, quasi-LIML would identify parameters in a correctly
    specified conditional mean regardless of misspecification in other aspects of the distribution.

    Namely,

    E (f1|x1, c1) = Φ (x1β+c1) (52a)

    y2 = zΠ + v2 (52b)

    y3 = 1 [zδ + u3 ≥ 0] , (52c)

    where c1 is an omitted variable thought to be correlated with y2 and y3. Assuming c1 is jointly normally
    distributed with v2 and u3, the linear projections of c1 and u3 onto v2 have the following error form:

    c1 = v2θ+a1 (53a)

    u3 = v2η + v3 (53b)

    where θ ≡ [E (v′2v2)]⁻¹E (v′2c1) and η ≡ [E (v′2v2)]⁻¹E (v′2u3). Plugging the linear projections (53a) and
    (53b) back into (52a) and (52c), we have an augmented equation for the conditional mean of f1 and the reduced
    form for y3:

    E (f1|x1,v2, a1) = Φ (x1β + v2θ+a1) (54a)

    y3 = 1 [zδ + v2η + v3 ≥ 0] , (54b)

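    where a1 is the remaining unobservable factor that, after conditioning on v2, captures the additional
    endogeneity from y3 through v3. Again, assume joint normality of a1 and v3:

    $$
    D\left(\begin{pmatrix} a_1 \\ v_3 \end{pmatrix}\,\middle|\, v_2\right) \sim \text{Normal}\left(\begin{pmatrix} 0 \\ 0 \end{pmatrix},\ \begin{pmatrix} \sigma_a^2 & \rho\sigma_a \\ \rho\sigma_a & 1 \end{pmatrix}\right), \tag{55}
    $$

    where σ²a ≡ Var(a1) and ρ is the correlation between a1 and v3. Further averaging out the unobservable a1,
    the conditional mean of the joint distribution of f1 and y3 has exactly the same form as the probit model
    with many continuous EEVs and one binary EEV in (1):

    $$
    E(f_1, y_3 = 1 \mid z, v_2, y_2) = E(y_1, y_3 = 1 \mid z, v_2, y_2) = P(y_1 = 1, y_3 = 1 \mid z, v_2, y_2)
    = \int_{-q_3}^{\infty}\Phi(d)\,\phi(\upsilon_3)\,d\upsilon_3, \tag{56a}
    $$

    $$
    E(f_1, y_3 = 0 \mid z, v_2, y_2) = E(y_1, y_3 = 0 \mid z, v_2, y_2) = P(y_1 = 1, y_3 = 0 \mid z, v_2, y_2)
    = \int_{-\infty}^{-q_3}\Phi(d)\,\phi(\upsilon_3)\,d\upsilon_3, \tag{56b}
    $$

    where

    $$
    d \equiv \frac{x_1 b + v_2 r + \rho\,\upsilon_3}{\sqrt{1+(1-\rho^2)\sigma_a^2}}, \tag{57a}
    $$

    $$
    q_3 \equiv z\,\mathbf{d} + v_2\, g. \tag{57b}
    $$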

    Because the Bernoulli log likelihood belongs to the linear exponential family, the solution from the

    following maximization problem identifies (β,θ):

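    $$
    \begin{aligned}
    \max_{b\in\mathbb{R}^{K_1},\ \mathbf{d}\in\mathbb{R}^{L},\ r\in\mathbb{R}^{G},\ g\in\mathbb{R}^{G},\ \rho\in\mathbb{R}}\
    & E\left[\log P(f_1, y_3 \mid y_2, z, v_2)\right] \\
    = E\Big[\, & f_1 y_3 \log\int_{-q_3}^{\infty}\Phi(d)\,\phi(\upsilon_3)\,d\upsilon_3
    + (1-f_1)\, y_3 \log\int_{-q_3}^{\infty}[1-\Phi(d)]\,\phi(\upsilon_3)\,d\upsilon_3 \\
    & + f_1 (1-y_3) \log\int_{-\infty}^{-q_3}\Phi(d)\,\phi(\upsilon_3)\,d\upsilon_3
    + (1-f_1)(1-y_3) \log\int_{-\infty}^{-q_3}[1-\Phi(d)]\,\phi(\upsilon_3)\,d\upsilon_3 \Big].
    \end{aligned} \tag{58}
    $$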
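    The linear exponential family property invoked above can be illustrated in a much simpler setting than
    (58). The sketch below is an assumption-laden toy example rather than the quasi-LIML estimator of this
    section: there are no endogenous regressors, the fractional outcome is drawn from a Beta distribution
    whose conditional mean is Φ(xβ) (a hypothetical DGP chosen for illustration), and the parameters are
    estimated by maximizing the Bernoulli quasi-log-likelihood with Fisher scoring.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)


def norm_cdf(x):
    """Standard normal CDF, clipped away from 0 and 1 for numerical safety."""
    p = 0.5 * (1.0 + _erf(np.asarray(x, dtype=float) / math.sqrt(2.0)))
    return np.clip(p, 1e-10, 1.0 - 1e-10)


def norm_pdf(x):
    """Standard normal density."""
    x = np.asarray(x, dtype=float)
    return np.exp(-0.5 * x**2) / math.sqrt(2.0 * math.pi)


def fit_fractional_probit(X, f, n_iter=100, tol=1e-10):
    """Maximize the Bernoulli quasi-log-likelihood
    sum_i f_i*log(Phi(x_i b)) + (1 - f_i)*log(1 - Phi(x_i b))
    by Fisher scoring. f may take any value in [0, 1]."""
    b = np.zeros(X.shape[1])
    for _ in range(n_iter):
        idx = X @ b
        p, pdf = norm_cdf(idx), norm_pdf(idx)
        score = X.T @ ((f - p) * pdf / (p * (1.0 - p)))
        info = (X * (pdf**2 / (p * (1.0 - p)))[:, None]).T @ X
        step = np.linalg.solve(info, score)
        b = b + step
        if np.max(np.abs(step)) < tol:
            break
    return b


def simulate_fractional(n, beta, precision=5.0, seed=0):
    """Hypothetical DGP: f | x ~ Beta with mean Phi(x beta). The outcome is
    fractional and not Bernoulli, but its conditional mean has the probit form."""
    rng = np.random.default_rng(seed)
    X = np.column_stack([np.ones(n), rng.normal(size=n)])
    mu = norm_cdf(X @ beta)
    f = rng.beta(mu * precision, (1.0 - mu) * precision)
    return X, f
```

    With a large simulated sample, for example simulate_fractional(20000, np.array([0.2, -0.5])), the
    estimates from fit_fractional_probit should lie close to the coefficients used in the simulation even
    though the Bernoulli distribution is misspecified for the Beta outcome.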

    5 Monte Carlo Simulations

    In this section, six Monte Carlo experiments are conducted to compare the finite sample behavior of
    different estimators for binary response models with both continuous and discrete EEVs. The six Monte Carlo
    experiments fall into two designs. In the first design, the error terms (u1,v2, u3) are jointly normally
    distributed. In the second design, conditional on v2, u1 and u3 are assumed to have a bivariate normal
    distribution. For each design, three data generating processes (DGPs) are considered: a just identification
    case, an over identification case, and a switching model with two regimes. Nine estimators are compared in
    each case: four estimators assuming a linear probability model for the binary outcome and five estimators
    acknowledging the nonlinear functional form. APEs are simulated for those estimators that respect the
    nonlinear functional form and are compared with coefficients from linear estimators.

    More specifically, in the first design of joint normality, the DGP for the Just ID is

    y1 = 1 [−y2 + y3 + 0.3z1 + 0.3z2 + 0.5v2 + 0.5v3 + r1 > 0]

    y2 = 0.1z1 + 0.2z2 + 0.1z3 + z4 + v2 (59)

    y3 = 1 [0.2z1 + 0.1z2 + z3 + 0.1z4 + 0.5v2 + v3 > 0] ,

    where

    u1 = 0.5v2 + v1 (60)

    v1 = 0.5v3 + r1 (61)

    r1 ∼ Normal (0, 0.5) (62)

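    so that

    $$
    \begin{pmatrix} u_1 \\ v_2 \\ v_3 \end{pmatrix} \sim \text{Normal}\left(\begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix},\ \begin{pmatrix} 1 & 0.5 & 0.5 \\ 0.5 & 1 & 0 \\ 0.5 & 0 & 1 \end{pmatrix}\right). \tag{63}
    $$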

    The continuous EEV y2 and binary EEV y3 are generated to have coefficients of opposite signs in order to
    show how biased estimators react to the sign difference. The exogenous variables are generated as:

    z1 ∼ Normal (0, 1)

    e2 ∼ Normal (0, 1)

    z2 = 1 [e2 > 0]

    z3 ∼ Normal (0, 1)

    e4 ∼ Normal (0, 1)

    z4 = 1 [e4 > 0] ,

    where the continuous z3 is the instrument mainly for the binary EEV y3 and the binary z4 is the instrument
    mainly for the continuous EEV y2. To make them valid instruments, z3 and z4 are excluded from the structural
    equation.
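    The Just ID design in (59) to (63) can be simulated with a few lines of Python. The sketch below is an
    illustration only: the function name and the returned dictionary are choices made here, and the value 0.5
    in (62) is read as the variance of r1, which is the reading consistent with Var(u1) = 1 in (63).

```python
import numpy as np


def simulate_just_id(n, seed=0):
    """Simulate one sample from the Just ID design under joint normality."""
    rng = np.random.default_rng(seed)

    # Exogenous variables: z1 and z3 continuous, z2 and z4 binary.
    z1 = rng.normal(size=n)
    z2 = (rng.normal(size=n) > 0).astype(float)
    z3 = rng.normal(size=n)
    z4 = (rng.normal(size=n) > 0).astype(float)

    # Error terms; r1 has variance 0.5 (an assumption, see the lead-in).
    v2 = rng.normal(size=n)
    v3 = rng.normal(size=n)
    r1 = rng.normal(scale=np.sqrt(0.5), size=n)
    v1 = 0.5 * v3 + r1          # equation (61)
    u1 = 0.5 * v2 + v1          # equation (60)

    # Continuous EEV, binary EEV and binary outcome, equation (59).
    y2 = 0.1 * z1 + 0.2 * z2 + 0.1 * z3 + z4 + v2
    y3 = (0.2 * z1 + 0.1 * z2 + z3 + 0.1 * z4 + 0.5 * v2 + v3 > 0).astype(float)
    y1 = (-y2 + y3 + 0.3 * z1 + 0.3 * z2 + u1 > 0).astype(float)

    return {"y1": y1, "y2": y2, "y3": y3, "z1": z1, "z2": z2,
            "z3": z3, "z4": z4, "u1": u1, "v2": v2, "v3": v3}
```

    In a large simulated sample, the sample covariance matrix of (u1, v2, v3) should be close to the matrix
    in (63), which gives a quick check that the error structure has been coded as intended.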

    In this DGP, the true ASF is defined as

    ASF (x1) = Φ (−y2 + y3 + 0.3z1 + 0.3z2) . (64)
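    Given the ASF in (64), average partial effects can be computed by averaging over the covariates. The
    following sketch assumes that the APE of the continuous EEV y2 is the sample average of the derivative
    of the ASF with respect to y2, and that the APE of the binary EEV y3 is the sample average of the
    difference in the ASF between y3 = 1 and y3 = 0. The function name is illustrative, and the sketch is not
    claimed to reproduce the exact averaging used for the true APEs reported in the tables.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)


def norm_cdf(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + _erf(np.asarray(x, dtype=float) / math.sqrt(2.0)))


def norm_pdf(x):
    """Standard normal density."""
    x = np.asarray(x, dtype=float)
    return np.exp(-0.5 * x**2) / math.sqrt(2.0 * math.pi)


def true_apes(y2, y3, z1, z2):
    """APEs implied by ASF(x1) = Phi(-y2 + y3 + 0.3*z1 + 0.3*z2)."""
    # Continuous EEV: derivative of the ASF with respect to y2, averaged.
    index = -y2 + y3 + 0.3 * z1 + 0.3 * z2
    ape_y2 = float(np.mean(-1.0 * norm_pdf(index)))

    # Binary EEV: discrete change in the ASF when y3 moves from 0 to 1.
    base = -y2 + 0.3 * z1 + 0.3 * z2
    ape_y3 = float(np.mean(norm_cdf(base + 1.0) - norm_cdf(base)))
    return ape_y2, ape_y3
```

    Because the coefficient on y2 is negative and the coefficient on y3 is positive, the two APEs returned by
    this function always have opposite signs, in line with the design described above.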

    The second case of over identification has the same parameters except that we have two additional

    instruments z5 and z6, where

    z5 ∼ Normal (0, 1)

    e6 ∼ Normal (0, 1)

    z6 = 1 [e6 > 0] .

    Continuous z5 is mainly for the continuous EEV y2 and binary z6 is mainly for the binary EEV y3. The true

    ASF remains the same as in (64).

    In the endogenous switching case, the coefficients on the continuous EEV y2 and the correlations
    between the reduced-form errors and the structural error are designed to have opposite signs
    across regimes, namely

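    $$
    \begin{aligned}
    y_1^{(1)} &= 1\left[-y_2 + y_3 + 0.3z_1 + 0.3z_2 + 0.5v_2 + v_1 > 0\right] \\
    y_1^{(0)} &= 1\left[0.3y_2 + y_3 - 0.5z_1 + 0.1z_2 - 0.5v_2 + v_0 > 0\right] \\
    y_2 &= 0.1z_1 + 0.2z_2 + 0.1z_3 + z_4 + v_2 \\
    y_3 &= 1\left[0.2z_1 + 0.1z_2 + z_3 + 0.1z_4 + 0.5v_2 + v_3 > 0\right],
    \end{aligned} \tag{65}
    $$

    where

    $$
    u_0 = -0.5v_2 + v_0, \tag{66}
    $$

    $$
    v_0 = -0.5v_3 + r_1, \tag{67}
    $$

    $$
    \begin{pmatrix} u_0 \\ v_2 \\ v_3 \end{pmatrix} \sim \text{Normal}\left(\begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix},\ \begin{pmatrix} 1 & -0.5 & -0.5 \\ -0.5 & 1 & 0 \\ -0.5 & 0 & 1 \end{pmatrix}\right). \tag{68}
    $$

    The ASF in this case is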

    ASF (x1) = y3Φ (−y2 + y3 + 0.3z1 + 0.3z2) + (1− y3) Φ (0.3y2 + y3 − 0.5z1 + 0.1z2) . (69)

    In design 2, parameterizations are the same as in design 1, but we assume v2 follows a demeaned χ²
    distribution with one degree of freedom:

    v2 ∼ χ²₁ − 1. (70)

    In all experiments, the number of replications is 1000, and the results of the experiments are presented

    for sample sizes of 1000, 3000 and 5000. Table 1 and Table 2 report biases and the root mean squared errors

    (RMSEs) for estimators of APE for y2 and y3, respectively. Figure 1 and Figure 2 depict the empirical

    distributions of estimators of APE for y2 with sample size of 1000 under design 1 and design 2, respectively.

    Similarly, Figure 3 and Figure 4 depict the counterparts for y3.
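    For reference, the two summary measures reported in the tables can be computed from the replications as
    in the following sketch. The function name is an illustrative choice; the definitions are the standard
    ones (bias as the mean estimate minus the true value, RMSE as the square root of the mean squared
    deviation from the true value).

```python
import math
from typing import Sequence, Tuple


def bias_and_rmse(estimates: Sequence[float], true_value: float) -> Tuple[float, float]:
    """Bias and root mean squared error of Monte Carlo estimates of one quantity."""
    n = len(estimates)
    # Bias: average estimate across replications minus the true value.
    bias = sum(estimates) / n - true_value
    # RMSE: square root of the average squared deviation from the true value.
    rmse = math.sqrt(sum((e - true_value) ** 2 for e in estimates) / n)
    return bias, rmse
```

    Since the squared RMSE equals the squared bias plus the variance of the estimates across replications, an
    estimator with a large bias cannot have a small RMSE even when its sampling variance is small.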

    For each of the above designs, coefficients of linear probability models and APEs of probit models are

    compared. Further, for probit models, joint estimations with the binary EEV or all EEVs are compared with

    two-step estimations with control function terms (residuals or generalized residuals) plugged in. In addition,

    a switching version of each model is considered to account for the endogenous switching DGP in case 3.

    More specifically, CF Biprobit is the control function approach inserting first-stage residual from reduced-

    form estimation of y2 into the second-step joint estimation between y1 and y3. CF Biprobit Switching

    performs Heckman probit with sample selection for y(1)1 and y(0)1 separately using y3 as a sample selection

    indicator. CF Probit avoids the joint estimation with y3 by inserting a generalized residual from y3 as a

    proxy for endogeneity given residual from y2. CF Probit Switching performs the CF Probit separately for

    sub-samples defined by y3. CF Linear inserts a residual from y2 and a generalized residual from y3 into the

    linear probability model for y1. CF Linear Switching allows for a full set of interactions between y3 and

    other observables and unobservables in the linear probability model. Usual 2SLS uses linear probability

    models for both y1 and y3 and applies the usual two-step IV estimation. Optimal IV uses predicted values
    from reduced forms for y2 and y3 as instruments for a linear probability model of y1, where y3 is predicted
    from a probit model. Joint MLE is a full joint estimation of y1, y2 and y3.

    For the APE of y2 under joint normality as in Figure 1, CF Biprobit and Joint MLE are the consistent

    estimators in the Just ID case and Over ID case while CF Biprobit Switching is the consistent estimator in the

    Switching case. Their empirical distributions are centered around the true APE depicted by the red vertical
    line. Besides those consistent estimators, approximations provided by the CF Probit (or CF Probit Switching

    in the Switching case) outperform, to a great extent, the approximations provided by the linear probability

    estimators such as CF Linear (or CF Linear Switching in the Switching case), Usual 2SLS and Optimal IV.

    In fact, in the Switching case, CF Probit Switching and CF Biprobit Switching (the consistent estimator in

    this case) seem to completely overlap with each other, suggesting a negligible amount of bias. In the Just

    ID case and Over ID case, CF Probit has a mild amount of upward bias and a slightly lower peak than CF

    Biprobit and Joint MLE. In contrast, approximations provided by linear probability model estimators (CF

    Linear, Usual 2SLS, Optimal IV and CF Linear Switching) have a substantial amount of downward bias in

    all cases. The differences in bias within the linear probability model estimators are not noticeable: they all

    seem to cluster together. In the Switching case, they are joined by the misspecified CF Biprobit and Joint

    MLE which have a similar amount of downward bias. When CF Biprobit and Joint MLE are consistent, they

    still completely overlap with each other. This happens not only in the Just ID case but also in the Over ID

    case, suggesting a negligible amount of efficiency loss by carrying out a two-step procedure. CF Biprobit

    Switching and CF Probit Switching, however, suffer a slightly flatter peak compared to their counterpart

    non-switching estimators (CF Biprobit and CF Probit) in the Just ID case and Over ID case, indicating an

    efficiency loss from a more complex parameterization.

    When the error terms follow conditional normality, the estimators of the APEs for y2 have fairly different
    finite sample behaviors from those under joint normality. As reflected in Figure 2, Joint MLE lacks robustness

    and is no longer the consistent estimator in any case. As before, CF Biprobit is the consistent estimator in

    the Just ID case and Over ID case while CF Biprobit Switching is the consistent estimator in the Switching

    case. Approximations provided by the CF Probit (or CF Probit Switching in the Switching case) are still the

    best: they almost overlap with those provided by CF Biprobit (or CF Biprobit Switching in the Switching

    case), the consistent estimator. Joint MLE is biased upwards to a noticeable degree in the Just ID case and

    Over ID case. In the Switching case where Joint MLE is misspecified, it is biased downward and joined

    by other inconsistent estimators like CF Biprobit and linear probability estimators (CF Linear, Usual 2SLS

    and Optimal IV). CF Linear Switching performs mildly better than other linear estimators in the Switching
    case but is still more biased than the consistent estimator. Overall, linear probability estimators
    continue to perform poorly in all cases: they are biased far downwards. The CF Biprobit Switching and CF

    Probit Switching still lead to efficiency loss indicated by flatter peaks in the Just ID case and Over ID case.

    Under joint normality, the APE of y3 follows a pattern similar to that of y2, with some minor differences.
    As in Figure 3, CF Biprobit and Joint MLE still overlap with each other in all cases, whether as consistent

    estimators in the Just ID case and Over ID case, or as misspecified estimators in the Switching case. The

    approximation provided by CF Probit (or CF Probit Switching in the Switching case) is still the best but with

    a flatter peak than those in Figure 1. The linear probability estimators are biased upwards. The differences

    in empirical distributions for the linear estimators are more pronounced for the binary EEV y3 than for
    the continuous EEV y2. In particular, CF Linear using the generalized residual is no longer close to Usual 2SLS
    using a linear probability model for y3. In the Switching case, linear probability estimators all lie between the

    CF Biprobit Switching and CF Biprobit with varying degrees of bias and precision.

    APEs of y3 under conditional normality are sketched in Figure 4. Like estimators in Figure 3, they

    have identical patterns with their counterparts for y2. More specifically, Joint MLE is biased downwards in

    the Just ID case and Over ID case but biased upwards in the Switching case. Similarly, linear probability

    estimators are biased upwards rather than downwards. CF Probit (or CF Probit Switching in the Switching

    case) still provides the best approximations in all cases, significantly better than linear probability estimators.

    Table 1 and Table 2 report the bias and RMSE of the estimated APEs of y2 and y3 for all the estimators in the six cases,

    respectively. Despite the difference in sign and magnitude, the patterns of estimators for y2 and y3 are

    similar. Methods using CF approaches are listed in Column (1) through (6), followed by conventional

    methods like IV 2SLS, Opt. IV 2SLS and Joint MLE from Column (7) to (9). With the increase of sample

    size from 1000 to 5000, the bias of CF Biprobit in the Just ID case and Over ID case (or CF Biprobit

    Switching in the Switching case) shrinks drastically to zero. Their RMSEs also decrease by about half at

    the same time. The bias of CF Probit in the Just ID case and Over ID case (or CF Probit Switching in the

    Switching case) is small at a sample size of 1000 but shrinks by a smaller magnitude as the sample size increases
    to 5000. The RMSEs of CF Probit or CF Probit Switching also decrease by about half as the sample size
    increases. The bias of the linear probability estimators (CF 2SLS, CF 2SLS Switching, IV 2SLS and Opt.
    IV 2SLS) and misspecified Joint MLE, however, is large to start with and does not shrink, or even increases
    in some cases, as the sample size increases. Their RMSEs also do not decrease as much.

    In summary, the Monte Carlo results show that CF Biprobit does not lose efficiency compared to Joint
    MLE in the correctly specified case. CF Probit (or CF Probit Switching in the Switching case) provides good
    approximations, outperforming linear estimators of any sort to a great extent.

    6 Empirical Illustration

    As an empirical illustration, we revisit the empirical example of Murtazashvili and Wooldridge (2015)

    under different functional form assumptions and estimation methods. Murtazashvili and Wooldridge (2015)

    study the sensitivity of the budget share of housing expenditure to price and total expenditure using a linear

    probability panel data model with many sources of heterogeneity. Total expenditure is considered to be the

    continuous EEV because of its joint determination with the budget share of housing expenditure. The
    homeownership dummy is considered to be the binary EEV. It is also assumed to play the role of an endogenous

    switching indicator that is employed for the budget share of housing expenditure equation. Here, instead,

    we employ a fractional response model with switching, as in (71), that acknowledges the fractional nature

    of the budget share and therefore has built-in heterogeneity.

    E(HousingShare|x1, c1, c0) = Φ [β0 + β1Log(Expend.) + β2Homeowner

    +z1β3 + β4Log(Expend.) · Homeowner

    +β5Homeowner · z1 + c0 + Homeowner · c1] , (71a)

    Log(Expend.) = ζ0 + zζ1 + v2, (71b)

    Homeowner = 1 [γ0 + zγ1 + u3 > 0] . (71c)

    We also use just one cross-sectional period from the sample, which turns out to give estimates of the
    marginal effects for the variables of interest that are fairly close to those from the panel linear model with
    many sources of heterogeneity. The summary APEs from the nonlinear model are compared to the coefficients
    in linear probability panel data models.

    The sample employed in the estimation is the 2001 wave of the Panel Study of Income Dynamics (PSID)

    that consists of 2355 owners and 629 renters. Since we suspect that the homeownership dummy indicates

    switching into differing regions, we report separate summary statistics for different home ownership statuses

    as in Table 3. Due to the way the dependent variable housing budget share is constructed, and the increase

    in the price for homes, 84 out of 2355 home owners face a negative housing budget share. As the dependent

    variable has to be in the unit interval for a fractional response, the housing budget shares for these 84 homeowners
    are set to the lower bound of zero. On average, owners spend smaller budget shares on housing than renters.
    The total expenditure and income of owners are greater than those of renters. Log price, age of the household

    head, marital status, whether recently moved and race are the exogenous control variables. Log income

    is considered to be the instrument primarily for the log expenditure, whereas years of education of the

    household head and number of children in the household are instruments mainly for home ownership.

    Table 4 reports the first-stage reduced-form estimation for the two EEVs. Linear reduced form regres-

    sions are reported for log total expenditure, the continuous EEV, in Column (1) and (2). Probit regression is

    reported for home ownership, the binary EEV, in Column (3) to (5). Slight variations in the specifications are

    reported in each case. For example, age squared is included in Column (2) in addition to age in its level form

    as in Column (1), which turns out to be significant but practically unimportant. For any specification, the

    instruments mentioned above are strong enough. The probit reduced forms for the homeownership dummy

    are reported with and without the continuous EEV. The predicted value of home ownership from Column

    (3) contains only an exogenous variable and is used as an instrument in Regression (3) in Table 5. Columns

    (4) and (5) show that including the residual from the reduced form of the log expenditure is sufficient to

    control for all the endogeneity from the total expenditure in the home ownership equation.

    Table 5 compares the APEs from the fractional response models to the coefficients of linear models.

    Columns (1) to (4) report the coefficients from linear models for the housing budget share, and Columns

    (5) to (10) report the APEs from fractional response models. The same set of estimators as in the Monte

    Carlo study are compared here, the only difference being that the dependent variable is a fractional response,

    instead of a binary response. A “Frac” is added to the names to indicate that a quasi-probit is assumed for the

    housing budget share. When homeownership is jointly estimated in the fractional probit, as in Column
    (9) and Column (10), the biprobit and heckprobit commands in Stata are modified to allow for a fractional
    dependent variable. The standard errors for the estimates of APEs are bootstrapped.
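    Bootstrapping a two-step estimator means re-running every step on each resampled data set, so that the
    sampling error from the first step is carried into the second. The sketch below illustrates the idea with
    a generic pairs bootstrap. It is a simplified stand-in and not the procedure used for the tables: the
    example estimator two_step_cf is a linear control function regression with one instrument, and both
    function names are hypothetical.

```python
import numpy as np


def bootstrap_se(estimator, data, n_boot=200, seed=0):
    """Pairs-bootstrap standard error of estimator(data).

    data is a 2-D array with one observation per row; estimator maps such an
    array to a scalar (or a 1-D array of statistics).
    """
    rng = np.random.default_rng(seed)
    n = data.shape[0]
    draws = []
    for _ in range(n_boot):
        # Resample whole rows with replacement and redo all estimation steps.
        idx = rng.integers(0, n, size=n)
        draws.append(estimator(data[idx]))
    return np.std(np.asarray(draws), axis=0, ddof=1)


def two_step_cf(data):
    """Illustrative two-step control function estimator.

    Columns of data: y1 (outcome), y2 (continuous EEV), z (instrument).
    Returns the coefficient on y2 in the second-step regression.
    """
    y1, y2, z = data[:, 0], data[:, 1], data[:, 2]
    const = np.ones(len(z))
    # Step 1: reduced form for y2 by OLS; keep the residual.
    Z = np.column_stack([const, z])
    v2_hat = y2 - Z @ np.linalg.lstsq(Z, y2, rcond=None)[0]
    # Step 2: regress y1 on y2 and the first-step residual.
    X = np.column_stack([const, y2, v2_hat])
    return np.linalg.lstsq(X, y1, rcond=None)[0][1]
```

    Passing the complete two-step function to bootstrap_se, rather than a second step that takes v̂2 as
    given, is what makes the resulting standard error reflect the first-step estimation.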

    As we can tell from Table 5, first of all, failing to account for endogeneity, whether as in the linear

    model represented in Column (1) or as in the Frac Probit model as in Column (5), leads to fairly different

    estimates from those methods that take care of endogeneity using the same models. Among the linear

    probability models that have accounted for endogeneity, the estimates from IV 2SLS differ significantly
    from those of Opt. IV 2SLS and CF 2SLS. Both Opt. IV 2SLS and CF 2SLS are close to the APEs in the Frac
    Probit models that have accounted for endogeneity. This suggests that the relationship between the housing
    budget share and the covariates of interest may be close to linear in the unit interval so that a two-stage
    least squares estimator provides a good approximation. Among the Frac Probit models, the estimates from
    different methods of accounting for endogeneity are fairly close across the board. The difference between

    conducting joint estimations with the home ownership, as in Column (9) and Column (10), and plugging

    in the generalized residual from home ownership, as in Column (7) and Column (8), is small. Particularly,

    if we use home ownership as the switching indicator, the estimates and standard error from Column (10)

    and Column (8) are the same, at least to the third decimal place. The difference between Column (7) and

    Column (9) is also negligibly small.

    Table 6 reports the t statistics for testing of endogeneity and their p-values. The p-values are obtained

    from bootstrapping the test statistics. Only estimators that employ CF approaches are considered in this

    table. All the test statistics reported are Wald tests for significance. The names refer to the estimators, as in

    Table 5. Columns (1) to (4) report the variable addition tests (VATs) on control function terms only. Columns

    (5) and (6) also report Wald tests on the correlation parameter ρ (or ρ0 and ρ1 for the two switching regimes),

    representing the endogeneity from home ownership equations, given the control function term from log

    expenditure equation. In any case, the evidence of endogeneity from log expenditure is strong: the p-values

    are identically zero in any test on the significance of v̂2, the control function term from log expenditure

    equation. No test based on the fractional response model confirms the endogeneity of homeownership,
    whether it is a VAT on the generalized residual or a Wald test on the correlation parameter, although the test on the
    generalized residual in the linear model CF 2SLS turns out to be significant. The test statistic and p-value
    from the VAT on the generalized residual in Column (3) are quite similar to those from the Wald test on the correlation
    parameter ρ in Column (5), suggesting the validity of using VATs on generalized residuals to detect the

    additional endogeneity from home ownership. The LM or LR test on ρ can also be performed, but the

    proper method of bootstrapping for p-values has to be determined. Another advantage of performing a VAT

    test on generalized residuals is its robustness to model specifications. In the case of switching, a joint test

    concerning two regimes needs to be conducted to detect endogeneity. This can be easily done by performing

    a joint test on the interaction terms between the control function terms and the switching indicator. However,
    finding a way to combine the correlation parameters ρ0 and ρ1 obtained from different regimes is no
    easy job.

    7 Conclusion

    This paper has shown applications of control function approaches to account for one binary EEV and

    many continuous EEVs in binary and fractional response models. The control function approach is com-

    putationally simple and allows for a flexible incorporation of heterogeneity, as in an endogenous switching

    model. Partial effects based on the ASF have a causal interpretation, and inference on them can be easily obtained
    by bootstrapping because of the computational simplicity of the control function approach. A VAT based on the

    generalized residual is shown to be a valid test for detecting additional endogeneity from the binary EEV,

    conditioning on the residuals from the continuous EEVs. The simulation study shows that using generalized

    residuals to account for endogeneity provides a fairly good approximation to the true APE, significantly bet-

    ter than approximations provided by linear probability models. Applying the CF approach to an empirical

    illustration using a fractional response model for the housing budget share, we show that homeownership is

    not endogenous after controlling for total expenditure, the continuous EEV. This is revealed by performing

    VATs on the generalized residuals and cross validated by a Wald test on the correlation parameter. These

    results imply that plugging generalized residuals into a binary response model (or a fractional response
    model) provides a better approximation to the causal marginal effects of interest than other conventional
    methods, and it is computationally simpler, enabling easier detection of endogeneity.

    A.1 Figures and Tables for Section 5

    Figure 1: Empirical Distribution of APEs for y2 for the Sample Size of 1000 under Joint Normality

    A.2 Tables for Section 6


  • Tabl

    e1:

    Sim

    ulat

    ion

    Res

    ults

    forA

    PEof

    y2

    (1)

    (2)

    (3)

    (4)

    (5)

    (6)

    (7)

    (8)

    (9)

    (1)

    (2)

    (3)

    (4)

    (5)

    (6)

    (7)

    (8)

    (9)

    Des

    ign

    1:Jo

    intN

    orm

    ality

    Des

    ign

    2:C

    ondi

    tiona

    lNor

    mal

    ityC

    FC

    FC

    FC

    FC

    FC

    FIV

    Opt

    .IV

    ML

    EC

    FC

    FC

    FC

    FC

    FC

    FIV

    Opt

    .IV

    ML

    EB

    ipro

    bit

    Bip

    robi

    tPr

    obit

    Prob

    it2S

    LS

    2SL

    S2S

    LS

    2SL

    SB

    ipro

    bit

    Bip

    robi

    tPr

    obit

    Prob

    it2S

    LS

    2SL

    S2S

    LS

    2SL

    SSw

    itchi

    ngSw

    itchi

    ngSw

    itchi

    ngSw

    itchi

    ngSw

    itchi

    ngSw

    itchi

    ngC

    ase

    1:Ju

    stID

    ,One

    Reg

    ion,

    APE

    y2=-

    .265

    0Ju

    stID

    ,One

    Reg

    ion,

    APE

    y2=-

    .242

    6N

    =100

    0B

    ias

    .001

    1.0

    020

    .002

    9.0

    041

    -.056

    6-.0

    568

    -.054

    1-.0

    542

    .001

    1.0

    009

    .001

    8.0

    024

    .002

    8-.0

    463

    -.046

    5-.0

    474

    -.047

    6.0

    121

    RM

    SE.0

    102

    .010

    5.0

    112

    .011

    7.0

    634 .0635 .0613 .0614 .0102 .0133 .0147 .0138 .0151 .0560 .0562 .0570 .0571 .0160

    N=3000 Bias       .0003    .0005    .0018    .0022   -.0571   -.0572   -.0548   -.0549    .0003       .0002    .0004    .0016    .0013   -.0458   -.0459   -.0468   -.0469    .0114
           RMSE       .0054    .0055    .0060    .0061    .0593    .0593    .0571    .0572    .0054       .0075    .0081    .0078    .0084    .0490    .0490    .0498    .0500    .0128
    N=5000 Bias       .0002    .0004    .0018    .0020   -.0573   -.0574   -.0550   -.0551    .0002      -.0001    .0001    .0013    .0009   -.0461   -.0461   -.0472   -.0472    .0112
           RMSE       .0044    .0045    .0050    .0051    .0587    .0587    .0564    .0565    .0044       .0058    .0062    .0061    .0064    .0480    .0480    .0490    .0490    .0121

    Case 2: OverID, One Region, APE_y2 = -.2279                                                       OverID, One Region, APE_y2 = -.2123
    N=1000 Bias       .0000    .0003    .0008    .0011   -.0342   -.0342   -.0322   -.0325    .0000       .0005    .0006    .0015    .0015   -.0269   -.0273   -.0291   -.0294    .0059
           RMSE       .0068    .0069    .0072    .0074    .0359    .0358    .0341    .0343    .0067       .0072    .0074    .0078    .0080    .0298    .0301    .0319    .0321    .0092
    N=3000 Bias       .0001    .0002    .0008    .0009   -.0335   -.0334   -.0315   -.0317    .0001       .0000    .0000    .0011    .0009   -.0272   -.0277   -.0292   -.0294    .0057
           RMSE       .0039    .0039    .0041    .0041    .0341    .0339    .0321    .0323    .0039       .0041    .0042    .0045    .0046    .0281    .0286    .0300    .0302    .0070
    N=5000 Bias       .0000    .0000    .0007    .0007   -.0338   -.0339   -.0317   -.0320    .0000       .0000    .0001    .0010    .0009   -.0271   -.0276   -.0290   -.0293    .0057
           RMSE       .0029    .0029    .0031    .0032    .0341    .0635    .0321    .0323    .0029       .0032    .0033    .0035    .0036    .0277    .0281    .0295    .0298    .0065

    Case 3: JustID, Two Regions, APE_y2 = -.0902                                                      JustID, Two Regions, APE_y2 = -.0624
    N=1000 Bias      -.0248   -.0001   -.0237    .0010   -.0274   -.0261   -.0252   -.0255   -.0248      -.0239   -.0001   -.0230    .0001   -.0176   -.0158   -.0176   -.0182   -.0241
           RMSE       .0345    .0171    .0337    .0171    .0382    .0365    .0368    .0369    .0346       .0363    .0197    .0357    .0196    .0336    .0317    .0336    .0339    .0363
    N=3000 Bias      -.0252   -.0003   -.0251   -.0003   -.0269   -.0165   -.0249   -.0192   -.0252      -.0259   -.0004   -.0251   -.0003   -.0187   -.0166   -.0186   -.0192   -.0261
           RMSE       .0289    .0098    .0295    .0108    .0311    .0225    .0293    .0251    .0289       .0302    .0108    .0295    .0108    .0247    .0225    .0246    .0251    .0303
    N=5000 Bias      -.0256    .0001   -.0245    .0009   -.0271   -.0260   -.0250   -.0253   -.0256      -.0251    .0000   -.0243    .0001   -.0175   -.0159   -.0175   -.0181   -.0253
           RMSE       .0280    .0076    .0269    .0077    .0298    .0286    .0279    .0282    .0280       .0278    .0083    .0270    .0083    .0213    .0197    .0214    .0218    .0279

    a Sequential averaging of the control function term v2 and x is applied to compute estimates of APEs.
    b The bias is defined as the difference between the true APEs and the estimates. RMSE is the root mean squared error.
    c Estimator (1) is the CF approach inserting the first-stage residual v̂2 into a second-stage joint biprobit between y1 and y3. Estimator (2) is the CF approach inserting the first-stage residual v̂2 into a second-stage joint biprobit between y1^(1), y3 and y1^(0), y3. Estimator (3) is the CF approach inserting the first-stage residuals v̂2 and ĝr3 into the probit model for y1. Estimator (4) is the CF approach inserting the first-stage residuals v̂2 and ĝr3 into the probit model for y1 separately for sub-samples defined by y3. Estimator (5) is the CF approach applied to a linear probability model for y1 by inserting the first-stage residuals v̂2 and ĝr3. Estimator (6) is the CF approach applied to a linear probability model for y1 by inserting the first-stage residuals v̂2 and ĝr3 for sub-samples defined by y3. Estimator (7) is the 2SLS IV approach for a linear probability model of y1. Estimator (8) is the 2SLS IV approach using predicted fitted values from the first-stage reduced forms for y2 and y3 as instruments; y2 is predicted using a linear model and y3 is predicted using a probit model. Estimator (9) is the joint estimation of y1, y2 and y3 by maximum likelihood.
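Note b names the two statistics reported in every row of the table. The following is a minimal sketch of how they could be computed from a set of Monte Carlo replications; it is not the authors' code. The sign convention (estimate minus true value), the function name `bias_and_rmse`, and the simulated draws are assumptions made for illustration.

```python
import math
import random
import statistics


def bias_and_rmse(estimates, true_value):
    """Summarize Monte Carlo estimates of a scalar parameter.

    Assumed convention: bias = mean(estimate) - true_value.
    """
    errors = [est - true_value for est in estimates]
    bias = statistics.fmean(errors)
    rmse = math.sqrt(statistics.fmean([err * err for err in errors]))
    return bias, rmse


# Hypothetical replications: 500 noisy draws around a made-up "true" APE.
random.seed(0)
true_ape = -0.25
draws = [true_ape + random.gauss(0.0, 0.01) for _ in range(500)]
bias, rmse = bias_and_rmse(draws, true_ape)
print(f"bias = {bias:.4f}, RMSE = {rmse:.4f}")
```

Because RMSE squared equals bias squared plus the sampling variance of the estimates, a row in which the absolute bias is close to the RMSE indicates that the error is mostly bias rather than noise.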


  • Table 2: Simulation Results for APE of y3

                        (1)      (2)      (3)      (4)      (5)      (6)      (7)      (8)      (9)         (1)      (2)      (3)      (4)      (5)      (6)      (7)      (8)      (9)
                  Design 1: Joint Normality                                                           Design 2: Conditional Normality
                         CF       CF       CF       CF       CF       CF       IV   Opt.IV      MLE          CF       CF       CF       CF       CF       CF       IV   Opt.IV      MLE
                   Biprobit Biprobit   Probit   Probit     2SLS     2SLS     2SLS     2SLS             Biprobit Biprobit   Probit   Probit     2SLS     2SLS     2SLS     2SLS
                           Switching         Switching         Switching                                       Switching         Switching         Switching

    Case 1: JustID, One Region, APE_y3 = .2573                                                        JustID, One Region, APE_y3 = .2385
    N=1000 Bias      -.0013   -.0012   -.0116   -.0119    .1010    .1010    .0809    .0823   -.0014      -.0003    .0013   -.0087   -.0075    .0667    .0656    .0738    .0752   -.0119
           RMSE       .0380    .0382    .0440    .0442    .1117    .1119    .0966    .0972    .0380       .0422    .0438    .0472    .0484    .0866    .0869    .0956    .0961    .0432
    N=3000 Bias       .0001    .0002   -.0090   -.0091    .1032    .1033    .0852    .0857    .0000      -.0001    .0001   -.0084   -.0083    .0675    .0663    .0744    .0757   -.0118
           RMSE       .0228    .0229    .0267    .0268    .1069    .1070    .0904    .0906    .0228       .0236    .0241    .0276    .0281    .0742    .0735    .0819    .0830    .0259
    N=5000 Bias      -.0004   -.0003   -.0096   -.0096    .1024    .1026    .0840    .0849   -.0004       .0008    .0009   -.0073   -.0071    .0683    .0671    .0768    .0815   -.0112
           RMSE       .0172    .0172    .0213    .0214    .1047    .1048    .0873    .0880    .0171       .0193    .0198    .0224    .0228    .0727    .0717    .0816    .0815    .0221

    Case 2: OverID, One Region, APE_y3 = .2132                                                        OverID, One Region, APE_y3 = .2026
    N=1000 Bias       .0002   -.0008   -.0003   -.0006    .0672    .0653    .0512    .0534    .0001       .0009    .0009   -.0012    .0003    .0353    .0493    .0492    .0513   -.0044
           RMSE       .0323    .0349    .0350    .0367    .0813    .0811    .0744    .0729    .0322       .0394    .0394    .0400    .0411    .0622    .0736    .0772    .0762    .0372
    N=3000 Bias       .0002    .0004   -.0003    .0003    .0663    .0647    .0501    .0521    .0001       .0004    .0008   -.0017   -.0006    .0330    .0339    .0494    .0510   -.0041
           RMSE       .0188    .0197    .0198    .0207    .0715    .0705    .0587    .0595    .0188       .0217    .0229    .0237    .0234    .0444    .0455    .0608    .0608    .0226
    N=5000 Bias      -.0001   -.0002   -.0012   -.0007    .0655    .0635    .0491    .0513   -.0001       .0003    .0006   -.0016   -.0005    .0338    .0478    .0485    .0506   -.0041
           RMSE       .0147    .0152    .0158    .0162    .0687    .0670    .0545    .0559    .0146       .0173    .0180    .0185    .0189    .0414    .0540    .0557    .0569    .0174

    Case 3: JustID, Two Regions, APE_y3 = .0622                                                       JustID, Two Regions, APE_y3 = .0864
    N=1000 Bias       .0436    .0005    .0339    .0065    .0292    .0327    .0128    .0153    .0437       .0535    .0007    .0462    .0028    .0257   -.0045    .0257    .0301    .0549
           RMSE       .0721    .0392    .0585    .0444    .0576    .0575    .0559    .0546    .0722       .0766    .0409    .0669    .0455    .0555    .0500    .0597    .0601    .0779
    N=3000 Bias       .0437    .0001    .0467    .0023    .0292   -.0051    .0129    .0294    .0438       .0536   -.0003    .0467    .0023    .0254   -.0051    .0246    .0294    .0546
           RMSE       .0555    .0236    .0547    .0268    .0413    .0294    .0351    .0428    .0555       .0624    .0238    .0547    .0268    .0386    .0294    .0408    .0428    .0634
    N=5000 Bias       .0447    .0010    .0353    .0076    .0300    .0335    .0130    .0159    .0447       .0536   -.0003    .0466    .0023    .0245   -.0057    .0247    .0291    .0545
           RMSE       .0516    .0180    .0413    .0215    .0372    .0395    .0273    .0280    .0516       .0592    .0185    .0516    .0208    .0333    .0237    .0350    .0378    .0601

    a Sequential averaging of the control function term v2 and x is applied to compute estimates of APEs.
    b The bias is defined as the difference between the true APEs and the estimates. RMSE is the root mean squared error.
    c Estimator (1) is the CF approach inserting the first-stage residual v̂2 into a second-stage joint biprobit between y1 and y3. Estimator (2) is the CF approach inserting the first-stage residual v̂2 into a second-stage joint biprobit between y1^(1), y3 and y1^(0), y3. Estimator (3) is the CF approach inserting the first-stage residuals v̂2 and ĝr3 into the probit model for y1. Estimator (4) is the CF approach inserting the first-stage residuals v̂2 and ĝr3 into the probit model for y1 separately for sub-samples defined by y3. Estimator (5) is the CF approach applied to a linear probability model for y1 by inserting the first-stage residuals v̂2 and ĝr3. Estimator (6) is the CF approach applied to a linear probability model for y1 by inserting the first-stage residuals v̂2 and ĝr3 for sub-samples defined by y3. Estimator (7) is the 2SLS IV approach for a linear probability model of y1. Estimator (8) is the 2SLS IV approach using predicted fitted values from the first-stage reduced forms for y2 and y3 as instruments; y2 is predicted using a linear model and y3 is predicted using a probit model. Estimator (9) is the joint estimation of y1, y2 and y3 by maximum likelihood.
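The description of Estimator (3) in note c can be illustrated on simulated data. The sketch below is not the authors' code: the data-generating process, the coefficient values, all variable names, and the choice to include v̂2 in the first-stage probit for y3 are assumptions made for illustration, and the probit is fitted with a small hand-written Fisher-scoring routine so that the example needs only NumPy. It runs OLS of y2 on the instruments to obtain v̂2, a probit of y3 to obtain the generalized residual ĝr3, and then a probit of y1 that includes both control-function terms. The APE of y3 is taken as the sample average of the difference in fitted probabilities at y3 = 1 and y3 = 0, holding each observation's other regressors and control-function terms at their observed values.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)


def norm_cdf(x):
    return 0.5 * (1.0 + _erf(np.asarray(x, dtype=float) / math.sqrt(2.0)))


def norm_pdf(x):
    return np.exp(-0.5 * np.square(x)) / math.sqrt(2.0 * math.pi)


def fit_probit(y, X, max_iter=100, tol=1e-10):
    """Probit MLE by Fisher scoring; returns the coefficient vector."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        xb = X @ beta
        p = np.clip(norm_cdf(xb), 1e-12, 1 - 1e-12)
        d = norm_pdf(xb)
        score = X.T @ (d * (y - p) / (p * (1 - p)))
        info = X.T @ (X * (d ** 2 / (p * (1 - p)))[:, None])
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def generalized_residual(y, xb):
    """Probit generalized residual evaluated at the fitted index xb."""
    p = np.clip(norm_cdf(xb), 1e-12, 1 - 1e-12)
    d = norm_pdf(xb)
    return y * d / p - (1 - y) * d / (1 - p)


# Hypothetical data: z2 shifts y2, z3 shifts y3, errors are correlated.
rng = np.random.default_rng(0)
n = 4000
z1, z2, z3 = rng.normal(size=(3, n))
cov = [[1.0, 0.4, 0.4], [0.4, 1.0, 0.3], [0.4, 0.3, 1.0]]
u1, v2, u3 = rng.multivariate_normal(np.zeros(3), cov, size=n).T
y2 = 0.5 * z1 + 1.0 * z2 + v2                                  # continuous EEV
y3 = (0.3 * z1 + 1.0 * z3 + u3 > 0).astype(float)              # binary EEV
y1 = (0.2 + 0.5 * z1 - 0.5 * y2 + 0.8 * y3 + u1 > 0).astype(float)

# Step 1: reduced forms give the two control-function terms.
ones = np.ones(n)
Z = np.column_stack([ones, z1, z2, z3])
pi_hat, *_ = np.linalg.lstsq(Z, y2, rcond=None)
v2_hat = y2 - Z @ pi_hat
Z3 = np.column_stack([Z, v2_hat])
delta_hat = fit_probit(y3, Z3)
gr3_hat = generalized_residual(y3, Z3 @ delta_hat)

# Step 2: probit for y1 with both control-function terms added.
W = np.column_stack([ones, z1, y2, y3, v2_hat, gr3_hat])
theta_hat = fit_probit(y1, W)

# APE of y3: average difference in fitted probabilities at y3 = 1 vs y3 = 0.
W1, W0 = W.copy(), W.copy()
W1[:, 3], W0[:, 3] = 1.0, 0.0
ape_y3 = float(np.mean(norm_cdf(W1 @ theta_hat) - norm_cdf(W0 @ theta_hat)))
print(f"estimated APE of y3: {ape_y3:.3f}")
```

Standard errors are not computed here; since both steps are cheap, resampling the rows and repeating the two steps is one way to obtain them.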


  • Figure 2: Empirical Distribution of APEs for y2 for the Sample Size of 1000 under Conditional Normality


  • Figure 3: Empirical Distribution of APEs for y3 for the Sample Size of 1000 under Joint Normality


  • Figure 4: Empirical Distribution of APEs for y3 for the Sample Size of 1000 under Conditional Normality


  • Table 3: Summary Statistics of the Estimation Sample (N=2964)

    Variable                         Owner    Renter
    Budget Share on Housing            .20       .41
                                     (.16)     (.16)
    Ln(Expenditure)                  10.35      9.82
                                     (.59)     (.59)
    Ln(Income)                       10.94     10.29
                                     (.75)     (.73)
    Ln(Price)                         8.55      8.93
                                     (.21)     (.12)
    Age                              49.88     44.45
                                   (12.97)   (13.69)
    Married                            .79       .35
                                     (.40)     (.48)
    Moved                              .10       .33
                                     (.30)     (.47)
    Black                              .21       .46
                                     (.41)     (.50)
    Years of education               13.27     12.12
                                    (2.73)    (2.90)
    Number of Children                 .94      1.05
                                    (1.14)    (1.24)
    Obs.                              2355       629

    a The sample is based on the 2001 waves of the Panel Study of Income Dynamics (PSID). All monetary variables were converted to 1998 dollars before they were logged.
    b Sample standard deviations are in parentheses below the sample means.


  • Table 4: The First-Stage Reduced Form Regressions for the EEVs

                                      (1)              (2)              (3)              (4)              (5)
    Estimation Method                 OLS              OLS           Probit           Probit           Probit
    Dependent Variable    Ln(Expenditure)  Ln(Expenditure)            Owner            Owner            Owner
    v̂2                                                                               .058***
                                                                                      (.009)
    Ln(Expenditure)                                                                                   .058***
                                                                                                       (.009)
    Ln(Income)                    .410***          .385***          .055***          .057***          .033***
                                   (.013)           (.013)           (.006)           (.006)           (.007)
    Education                     .025***          .024***           .005**           .004**             .002
                                  (.0031)           (.003)           (.001)          (.0015)          (.0015)
    Children                      .056***          .062***          .012***          .011***           .008**
                                  (.0077)           (.008)           (.004)           (.003)           (.004)
    Age                         -.0025***          .029***          .003***          .003***          .003***
                                  (.0007)           (.004)          (.0003)          (.0003)          (.0003)
    Age^2                                        -.0002***
                                                  (.00004)
    Ln(Price)                     -.068**          -.067**         -.646***         -.624***         -.620***
                                  (.0318)          (.0314)           (.017)           (.016)           (.016)
    Married                        .27***           .26***          .054***          .053***          .037***
                                   (.021)           (.021)           (.010)           (.009)           (.010)
    Moved                           .012*            .037*         -.065***         -.063***         -.064***
                                   (.022)           (.022)           (.009)           (.009)           (.009)
    Black                         -.058**         -.079***         -.064***         -.060***         -.057***
                                   (.019)          (.0189)          (.0087)           (.009)           (.009)

    a v̂2 denotes the residual from Regression (1), the reduced form for log total expenditure.
    b Regressions (1) and (2) are first-stage regressions for log total expenditure, the continuous EEV. Regressions (3)-(5) are first-stage regressions for home ownership, the binary EEV.
    c * p-value < 10%, ** p-value < 5%, *** p-value < 1%.


  • Table 5: Comparing marginal effects in the structural equation of the housing share

                                       (1)           (2)           (3)           (4)           (5)           (6)           (7)           (8)           (9)          (10)
    Estimation Method                                 IV        Opt.IV            CF                          CF            CF            CF            CF            CF
    Functional form for y1          Linear          2SLS          2SLS          2SLS    FracProbit    FracProbit    FracProbit    FracProbit  FracBiprobit  FracBiprobit
                                   No EEVs                                                 No EEVs       One EEV      Two EEVs     Switching                   Switching
    Marginal Effects                 Coeff         Coeff         Coeff         Coeff           APE           APE           APE           APE           APE           APE
    Ln(Expenditure)               -.109***       -.176**      -.056***      -.061***      -.106***      -.064***      -.061***      -.059***      -.062***      -.059***
                                    (.005)        (.084)         (.01)         (.01)        (.005)        (.009)        (.010)        (.009)         (.01)        (.009)
    Owner                         -.094***          .336       -.14***      -.124***      -.074***      -.088***      -.101***      -.124***      -.098***      -.124***
                                    (.009)        (.330)        (.016)        (.015)       (.0082)        (.012)        (.020)        (.038)        (.020)        (.038)
    Age                            .002***         .0003       .003***       .003***       .002***      .0025***      .0026***      .0026***      .0025***      .0026***
                                   (.0002)        (.001)       (.0002)       (.0002)       (.0002)       (.0002)       (.0002)       (.0002)       (.0002)       (.0002)
    Ln(Price)                      .147***          .54*       .109***       .124***       .151***       .147***       .136***       .138***       .139***       .138***
                                    (.013)         (.29)        (.017)        (.017)        (.013)        (.016)        (.021)        (.022)        (.021)        (.022)
    Married                          -.005        -.06**         -.027      -.028***        -.0026      -.027***      -.027***      -.027***      -.027***      -.027***
                                    (.007)        (.025)        (.008)        (.009)        (.006)        (.008)        (.008)        (.008)        (.008)        (.008)
    Moved                             .011          .071          .005          .007          .009          .009          .007          .005          .007          .005
                                    (.007)        (.047)        (.008)        (.007)        (.007)        (.007)        (.008)        (.009)        (.007)        (.009)
    Black                          -.012**          .041        -.010*         -.009       -.012**         -.006         -.008         -.009         -.007         -.009
                                    (.006)        (.036)        (.006)        (.006)        (.006)        (.006)        (.006)        (.007)        (.006)        (.007)

    a The dependent variable is the expenditure share on housing.
    b Standard errors for the estimated APEs were bootstrap standard errors with 200 replications.

    c Regression (1) is the OLS for linear probability model that assumes no EEVs. Regression (2) is the 2SLS IV estimator for linear probability model that uses a linear probability model for the reduced form of home ownership. Regressi