censored quantile instrumental variable estimation via control...

Censored Quantile Instrumental Variable Estimation via

Control Functions

Victor Chernozhukov∗ Ivan Fernandez-Val† Amanda E. Kowalski‡

June 24, 2009

Abstract

In this paper, we develop a new censored quantile instrumental variable (CQIV)

estimator and describe its properties and computation. The CQIV estimator handles

censoring semi-parametrically in the tradition of Powell (1986), and it generalizes stan-

dard censored quantile regression (CQR) methods to incorporate endogenous regressors

in a manner that is computationally tractable. Our computational algorithm combines

a control function approach with the CQR estimator developed by Chernozhukov and

Hong (2002). Through Monte-Carlo simulation, we show that CQIV performs well

relative to Tobit IV in terms of bias and dispersion in a model that satisfies the para-

metric assumptions required for Tobit IV to be efficient. Given the strong parametric

assumptions required by Tobit IV, the gains to CQIV relative to Tobit IV are likely

to be large in empirical applications. We present results from an empirical applica-

tion of CQIV to the estimation of Engel curves for alcohol. This empirical application

demonstrates the importance of accounting for censoring and endogeneity with CQIV.

∗Department of Economics, MIT, 50 Memorial Drive, Cambridge, MA 02142, [email protected].†Boston University, Department of Economics, 270 Bay State Road,Boston, MA 02215, [email protected].‡Department of Economics, Yale University, 37 Hillhouse Avenue, New Haven, CT 06520, and NBER,

[email protected].

1

1 Introduction

Censoring in the dependent variable can introduce bias in traditional mean and quantile

estimators because it induces correlation between the regressors and the error term. Several

mean regression estimators such as Tobit IV have been developed to produce consistent esti-

mates in models with censored dependent variables, but they often require strong parametric

assumptions. Through a generalization of the quantile regression estimator, Powell (1986)

developed a semi-parametric way to achieve consistent quantile estimates on censored data.

The Powell estimator, however, has proven to be computationally difficult to execute, and it

does not allow for endogeneity in the regressors. In this paper, we develop a new censored

quantile instrumental variable (CQIV) estimator that handles censoring semi-parametrically

in the tradition of Powell (1986) and generalizes standard censored quantile regression (CQR)

methods to incorporate endogenous regressors.

The CQIV computational algorithm that we develop here uses a control function approach

to account for endogeneity in the structural equation. Newey, Powell, and Vella (1999)

describe the use of the control function approach in triangular simultaneous equations models

for the conditional mean. Lee (2007) sets forth an estimation strategy using a control

function approach in a model with quantile structural and first stage equations. Our model

differs from Lee’s in that the dependent variable is censored, and our first stage equation

does not need to be additive in the unobservables. Blundell and Powell (2007) propose

an alternative censored quantile instrumental variable estimator that also assumes additive

errors in the first stage.

Our CQIV computational algorithm is simple to compute using standard statistical soft-

ware. In this paper, we demonstrate the implementation of CQIV with a Monte-Carlo

simulation and an empirical application to the estimation of Engel curves. The results

of the Monte-Carlo exercise demonstrate that the performance of CQIV is comparable to

that of Tobit IV in data generated to satisfy the Tobit IV assumptions. The results of

the application to Engel curves demonstrate the importance of accounting for endogeneity

and censoring. Another application of our CQIV estimator to the estimation of the price

elasticity of expenditure on medical care appears in Kowalski (2009).

In Section 2, we present the CQIV model and estimation methods. In Section 3, we

describe the associated computational algorithm and present results from a Monte-Carlo

simulation exercise. In Section 4, we present an empirical application of CQIV to Engel

curves. In Section 5, we provide conclusions and discuss potential empirical applications of

CQIV.

2

2 Censored Quantile Instrumental Variable Regression

2.1 The Model

The general stochastic model we consider is the following “triangular” system of quantile

equations:

Y = max(Y ∗, C), (2.1)

Y ∗ = QY ∗(U |D, W, V ), (2.2)

D = QD(V |W, Z). (2.3)

In this system, Y ∗ is the latent response variable, Y is obtained by censoring Y ∗ above at

the censoring variable C, D is the endogenous regressor, W is a vector of covariates, possibly

containing C, V is a latent unobserved regressor, and Z is a vector of instruments. Further,

QY ∗(·|D, W, V ) is the conditional quantile function of Y ∗ given (D, W, V ); and QD(·|W, Z)

is the conditional quantile function of the endogenous variable D given (W, Z). Here, U is

a Skorohod disturbance for Y that satisfies the independence assumption

U ∼ U(0, 1)|D, W, C, V,

and V is a Skorohod disturbance for D such that

V ∼ U(0, 1)|W, C, Z.

In the last two equations, we make the assumption that the censoring variable C is indepen-

dent of the disturbances U and V . This variable can, in principle, be related to W . Indeed,

our notation allows us to capture possible dependence of W and C by simply treating C as

a component of W .

In the model above, to recover the structural function of interest, QY ∗(·|D, W, V ), it is

important to condition on an omitted regressor V called the “control function.” The equation

(2.3) allows us to recover this omitted regressor as a residual that explains movements in

the variable D, conditional on the set of instruments and other regressors. Nonparametric

triangular models for uncensored data are developed in Newey, Powell, and Vella (1999),

Chesher (2003), and Imbens and Newey (2008); parametric nonlinear variants of these models

are also discussed in Wooldridge (2002); and linear variants of these models appear in the

analysis of Hausman (1978). The model treated in this paper differs from these earlier models

by explicitly treating the case of a censored response variable.

From the system of equations above and the equivariance property of the quantiles to

3

monotone transformations, we have that

Y = QY (U |D, W, V, C) = max(QY ∗(U |D, W, V ), C). (2.4)

Thus, the conditional quantile function of the observed response variable Y is equal to

the conditional quantile function of the latent variable Y ∗, transformed by the censoring

transformation function max(·, C).

2.2 Estimation

To make estimation both practical and realistic, we make a flexible semi-parametric restric-

tion on the functional form of the structural quantile function. In particular, we assume

that

QY ∗(u|D, W, V ) = X ′β(u), X = T (D, W, V ) = (X, V ),

where T (D, W, V ) is a collection of continuously differentiable transformations of the initial

regressors (D, W, V ). The transformations could be, for example, polynomial, trigonometric,

B-spline or other basis functions that have good approximating ability for economic problems.

In this notation, we also need to distinguish the part of the vector T (D, W, V ) that only

depends on V ; we denote this part V . An important property of this functional form is

linearity in parameters, which will lead us to a construction of a computationally efficient

estimator. The resulting functional form for the conditional quantile function of the censored

random variable is given by

QY (u|D, W, V, C) = max(X ′β(u), C).

This is the standard functional form first derived by Powell (1984) in the exogenous case.

We then form the estimator for parameters of this function as

β(u) = arg minβ∈Rk

1

n

n∑

i=1

[1((Xi,Vi)

′γ > c)ρu(Yi − (Xi,Vi)

′β)],

where ρu(x) = (u − 1(x < 0))x is the asymmetric absolute loss function of Koenker and

Bassett (1978), and X is a vector of transformations of the vector (X, C). This estimator

adapts the estimator developed in Chernozhukov and Hong (2002) to deal with endogeneity.

We call the multiplier 1((Xi,Vi)

′γ > c) the selector, as its purpose is to predict the subset

of regressors where the probability of censoring is sufficiently low to permit using a linear

– in place of a censored linear – functional form for the conditional quantile. We formally

4

state the conditions on the selector in the next subsection. This formulation allows us to

compute this estimator through several steps, all taking the form described above. We

provide practical implementation details in the next section. This estimator may also be

seen as a computationally attractive approximation to the Powell estimator applied to our

case:

βp(u) = arg minβ∈Rk

1

n

n∑

i=1

[ρu(Yi − max((Xi,Vi)

′β, Ci))].

The control function V can be estimated in several ways. We can see that

V = V (D, W, Z) ≡ Q−1D (D|W, Z) =

∫ 1

0

1{QD(v|W, Z) ≤ D}dv.

Take any estimator for QD(v|W, Z) or for Q−1D (D|W, Z), based on any parametric or semi-

parametric functional form. Denote the resulting estimator for the control function as

V = V (D, W, Z) ≡ Q−1D (D|W, Z) =

∫ 1

0

1{QD(v|W, Z) ≤ D}dv.

There are several examples: in the classical additive location model, we have that QD(v|W, Z) =

Z ′π + Q(v), where Q is a quantile function, and Z is a vector collecting transformations of

W and Z, so that

V = Q−1(D − Z ′π),

which can be estimated by the empirical CDF of the OLS residuals. In a non-additive

example, we have that QD(v|W, Z) = Z ′π(v), and

V =

∫ 1

0

1{Z ′π(v) ≤ D}dv.

The estimator then takes the form

V =

∫ 1

0

1{Z ′π(v) ≤ D}dv, (2.5)

where the integral can be approximated numerically using a finite grid of quantiles. Cher-

nozhukov, Fernandez-Val, and Galichon (2006) develops asymptotic theory for this estimator.

2.3 Regularity Conditions for Estimation

In order to estimate and make inference on β(u) where u is the probability index of interest

in (0, 1), we make the following assumptions:

5

Assumption 1 (Sampling) We have a sample of size n of independent and identically

distributed vectors (Yi, Di, Wi, Zi). The distribution function of (Yi, Di, Wi, Zi) has a compact

support and satisfies the conditions stated below.

Assumption 2 (Conditions on the Estimator of the Control Function) We have that

V = V (D, Z, W ), where V (·) ∈ V,

where V is class of functions that are sufficiently smooth, in the sense that the class satisfies

Pollard’s entropy condition, and

√n( V − V ) = B(D, Z, W )

1√n

n∑

i=1

Si + op(1),

where S1, ..., Sn are i.i.d. random vectors with finite second moments, and B(D, Z, W ) also

has finite second moments.

Assumption 3 (Conditions on the Selector) The selection rule is equivalent to the form

1((X, V )′γ > c),

where γ →p γ and, for some b > 0,

1((X, V )′γ > c) ≤ 1(Pr[Y = C|X, Z, V ] < u + b), a.s.

The selector must also be nontrivial in the sense that

1((X, V )′γ > c) = 1(Pr[Y = C|X, Z, V ] < u + b)

with positive probability.

Assumption 4 (Smoothness Conditions ) (a) The conditional density fY (y|X = x) is

differentiable in the argument y, with a derivative that is uniformly bounded in y and x

varying over the support of (Y, X). (b) the mapping (α, V ′) 7→ P ((X, V ′)′α > v) is Lipschitz

in α and in V ′, for α in an open neighborhood of γ and V ′ in V.

Assumption 5 (Design Conditions) The matrices

J(u) ≡ EfY (X ′iβ(u)|Xi)XiX

′i1[(Xi, Vi)

′γ > c]

6

and

Λ(u) ≡ V ar[{(u − 1(Yi < X ′iβ(u)))Xi + E[fY (X ′

iβ(u)|Xi)XiB(Xi)]Si} · 1((Xi, Vi)′γ > c)]

are of full rank.

Assumption 1 imposes standard independence conditions as well as compactness of sup-

port of the data variables. We can relax the compactness at the cost of more complicated

notation and proofs. Assumption 2 imposes a high-level condition on the estimator of the

control function. This condition is plausible, and it holds for the parametric estimators

of the control function in the additive set-up, and also for semi-parametric estimators of

the control function in the non-additive set-up using quantile regression (see Chernozhukov,

Fernandez-Val, and Galichon, 2006). Assumption 3 imposes a high-level condition on the

estimator of the selector function. This condition is plausible, and it holds for a variety

of selectors based on the initial estimates of the censoring probability and estimates of the

conditional quantile functions (see Chernozhukov and Hong, 2002). Assumption 4 imposes

some smoothness conditions on the distribution of Y and on the distribution of the linear

index entering the selector function. This assumption is more or less standard, and it also

appears to be plausible. Assumption 5 imposes a design condition that allows us to identify

the parameters of interest and also estimate them at the standard√

n rate.

2.4 Main Theorem

The following result states that the CQIV estimator is consistent, converges to the true

parameter at√

n rate, and is normally distributed in large samples.

Theorem 1 Under the stated assumptions

√n(β(u) − β(u))

d−→ N(0, J−1(u)Λ(u)J−1(u)) (2.6)

See Appendix A for a proof. We can estimate the variance-covariance matrix using standard

methods and carry out analytical inference based on the normal distribution. In practice,

we find it more practical to use bootstrap and subsampling to perform inference.

7

3 Implementation Details and Monte-Carlo Illustra-

tions

We begin our CQIV computational algorithm with Step 0 to facilitate comparison with the

Chernozhukov and Hong (2002) 3-Step CQR algorithm, which we follow closely. For each

desired quantile u,

0. Obtain a prediction of the control function V (and its transformations). A simple

additive strategy is to obtain V by predicting the OLS residuals from the first stage

regression of D on W and Z. If desired, higher order functions of the predicted

residuals and interactions with the other regressors can be included in X ≡ (X, V ).

We mentioned non-additive strategies in the previous section.

1. Select a subset of observations, Jo, which are not likely to be censored using a para-

metric probability model:

1(Yi > Ci) = p(X ′iγ) + εi

where 1(Yi > Ci) takes on a value of 1 if the observation is not censored and takes

on a value of zero otherwise. Note that the control function V is included in X. In

practice, a probit, logit, or any other model that fits the data well can be used. Select

the sample J0 according to the following criterion:

J0 = {i : p(X ′iγ) > 1 − u + c}.

In practice, it is advisable to choose c such that a constant fraction of observations

satisfying p(X ′iγ) > 1 − u are excluded from J0 for each quantile. To do so, set c so

that (1 − u − c) is the q0th quantile of p(X ′iγ) such that p(X ′

iγ) > 1 − u, where q0

is a percentage (10% worked well in our simulation). The empirical value of c and

the percentage of observations retained in J0 can be computed as simple robustness

diagnostic test at each quantile.

2. Estimate the standard quantile regression on the sample Jo:

β(u) minimizes∑

J0

ρu(Yi − X ′iβ(u)), (3.1)

and, using the predicted values, select another subset of observations, J1, from the full

8

sample according to the following criterion:

J1 = {i : X ′iβ(u) > Ci + δn}.

In practice, it is convenient to choose δn such that a constant fraction of observations

satisfying X ′iβ(u) > Ci are excluded from J1 for each quantile. To do so, set (Ci + δn)

to be the q1th quantile of X ′iβ(u) such that X ′

iβ(u) > Ci, where q1 is a percentage

less than q0 (3% worked well in our simulation). In practice, it should be true that J0

⊂ J1. If this is not the case, it is advisable to alter q0, q1, or the regression models.

At each quantile, the empirical value of δn, the percentage of observations from the

full sample retained in J1, the percentage of observations from J0 retained in J1, and

the number of observations in J1 but not in J0 can be computed as simple robustness

diagnostic tests. Coefficient estimates β(u) obtained in this step are consistent but will

be inefficient relative to estimates obtained in the subsequent step.

3. Estimate the standard quantile regression on the sample J1. Formally, replace J0 with

J1 in (3.1). The new estimates, β(u), are the 3-Step CQIV coefficient estimates.

4. (Optional) With results from the previous step, select a new sample J2. Repeat this

and the previous step as many times as desired.

Beginning with Step 2, each successive step of the algorithm should yield estimates that

come closer to minimizing the Powell objective function. As a simple robustness diagnostic

test, we recommend computing the value of the Powell objective function using the full

sample and the estimated coefficients after each step, starting with Step 2. This diagnostic

test is computationally straightforward because computing the value of the objective function

for a given set of values is much simpler than maximizing it. In practice, this test can be

used to determine when to stop the CQIV algorithm for each quantile. If the value of the

Powell objective function increases from Step s to Step s + 1 for s ≥ 2, estimates from step

s can be retained as the coefficient estimates.

We recommend obtaining confidence intervals through a bootstrap procedure, though

analytical formulas can also be used. If the estimation runs quickly on the desired sample,

it is straightforward to draw R ≥ 100 bootstrap samples with replacement and run each

bootstrapped sample through all steps of the algorithm. A 95% confidence interval for each

coefficient estimate can be formed from the .025 and .975 quantiles of the vector of point

estimates obtained for each coefficient.

9

3.1 Monte-Carlo

The goal of the following Monte-Carlo simulation is to compare the empirical performance of

CQIV relative to Tobit IV. For our simulation, we generate data according to a model that

satisfies the Tobit IV parametric assumptions. When the Tobit IV assumptions are satisfied,

Tobit IV is consistent and efficient, and CQIV at each quantile is consistent but inefficient.

Thus, estimates from both models satisfy the criteria for a Hausman (1978) specification

test, in which the null hypothesis is that the Tobit IV assumptions are satisfied. Moreover,

a comparison of Tobit IV coefficients to CQIV coefficients at each quantile quantifies the

relative efficiency of CQIV in a model where Tobit IV can be expected to perform as well as

possible.

A location model facilitates comparison between the conditional mean estimates of Tobit

IV and the conditional quantile estimates of CQIV. Specifically, for each of R Monte-Carlo

repetitions, we generate N observations according to the following model:

Di = π0 + π1Zi + π2Wi + Φ−1(Vi), Vi v U(0, 1) (3.2)

Y ∗i = β0 + β1Di + β2Wi + Φ−1(U ′

i), U ′i v U(0, 1) (3.3)

where (Φ−1(Vi), Φ−1(U ′

i)) is distributed multivariate normal with mean zero and covariance

matrix

Σ =

[1 ρ

ρ 1

]. (3.4)

Though we can observe Y ∗i in the simulated data, we artificially censor the data to observe

Yi = max(Y ∗i , Ci) = max(β0 + β1Di + β2Wi + Φ−1(U ′

i), Ci). (3.5)

From properties of the multivariate normal distribution, Φ−1(U ′i) = ρΦ−1(Vi)+

√1 − ρ2Φ−1(Ui),

where Ui v U(0, 1). Using this expression, we can combine (3.2) and (3.5) for an alternative

formulation of the censored model in which the control term V is included in the structural

equation:

Yi = max(Y ∗i , Ci) = max(β0 + β1Di + β2Wi + ρΦ−1(Vi) +

√1 − ρ2Φ−1(Ui), Ci).

This formulation is useful because it indicates that when we include the control term in the

structural equation, its true coefficient is ρ.

In our simulated data, we create extreme endogeneity by setting ρ = .9. For simplicity, we

set π0 = β0 = 0, and π1 = π2 = β1 = β2 = 1. To generate the data, we draw the disturbances

10

Φ−1(Vi) and Φ−1(U ′i) from a multivariate normal distribution with mean zero and covariance

matrix (3.4). We draw Zi from a standard normal distribution, and we generate Wi to

be a log-normal random variable that is censored from the right at its 95th percentile, r.

Formally, we draw Wi from a standard normal distribution. We then calculate r = QW (.95),

which differs across replication samples. Next, we set Wi = min(eWi , r). For comparative

purposes, we set the amount of censoring in the dependent variable to be comparable to that

in Kowalski (2009). Specifically, we set Ci = C = QY (.38) in each replication sample. For

comparison to Kowalski (2009), we run the simulation with N = 30, 000, but we we also run

and focus on a simulation with N = 1, 000 to demonstrate CQIV performance in a more

conventional sample size.

For each of R = 100 replications, we compute traditional Tobit IV estimates and Tobit

estimates using the control function approach for comparison to CQIV estimates. For addi-

tional comparisons, we present results from a censored quantile regression (CQR) estimator

that does not address endogeneity, a quantile instrumental variables estimator (QIV) that

incorporates the control function to control for endogeneity but does not account for cen-

soring, and a quantile regression (QR) estimator that does not account for endogeneity or

censoring. We report estimates at the .05, .10, .25, .5, .75, .90, and .95 quantiles to demon-

strate the performance of the estimators across the distribution. However, since the model

is homoskedastic, the true value is the same at all quantiles.

In the top panel of Table 1, we report Tobit IV results generated from a prepackaged

Tobit IV algorithm as well as results generated from a prepackaged Tobit algorithm with

a control variable generated with an OLS first stage. To demonstrate bias and dispersion,

we report median bias, mean bias, interquartile range (IQR), standard deviation, and root

mean square error for each estimator at each quantile. Bias on the coefficients on D and

W is computed as (1 − estimate). Since all numbers in the table are multiplied by 100,

they report bias on D and W in percentage terms. Bias on the estimated control term,

V , the predicted residual from the first stage regression of D on Z, W , and a constant, is

computed as (.9− estimate). As demonstrated in the table, the Tobit IV estimator with an

OLS estimate of the control variable represents a substantial improvement over Tobit IV in

terms of median bias and IQR. This comparison illustrates the value of the control function

approach in a nonlinear model.

In the next panels of Table 1, we report bias and dispersion of a CQIV estimator with an

OLS estimate of the control variable and another CQIV estimator with an estimate of the

control variable derived from several quantile regressions. Appendix B provides technical

details and robustness test disagnostic test results for the CQIV estimators. As shown in

the table, even though Tobit IV is efficient in this design, the CQIV estimates compare very

11

favorably to the Tobit IV estimates with an OLS estimate of the control variable, and they

out-perform the prepackaged Tobit IV estimator. It is notable that even with 38% censoring,

we are able to obtain CQIV estimates at low conditional quantiles, but the RMSE is largest

at the extreme quantiles.

The lowest panels of the table report results from a QIV estimator with an OLS estimate

of the control variable, a QIV estimate with a quantile estimate of the control variable, a

CQR estimator, and a QR estimator. The CQIV estimators out-perform the other quantile

estimators at almost all estimated quantiles. This finding demonstrates the advantage of

taking censoring and endogeneity into account with CQIV.

Given the homoskedasticity in the first stage of the design, the CQIV estimator with the

OLS estimate of the control variable has a very similar performance to the CQIV estimator

with an estimate of the control variable derived from quantile regressions. The same finding

generally holds true for the QIV estimators. However, we observe a difference in the perfor-

mance of both types of estimators when we introduce heteroskedasticity into the first stage

of the design by replacing 3.2 with the following equation:

Di = π0 + π1Zi + π2Wi + (π3 + π4Wi)Φ−1(Vi), Vi v U(0, 1) (3.6)

where we set π3 = π4 = 1. Table 2 reports results from this design. As shown in the

second and third panels, the CQIV estimator with a quantile estimate of the control function

now outperforms the CQIV estimator with an OLS estimate of the control function at every

quantile. As above the CQIV estimator also outperforms the other quantile estimators in

the lower panels. Most importantly, the CQIV estimator with the quantile estimate of the

control function now outperforms the Tobit IV estimators, which are no longer efficient given

the heteroskedasticity in the design of the first stage.

In summary, in data that satisfy the assumptions required for Tobit IV to be efficient,

the results of this simulation should provide a lower bound of CQIV performance relative to

Tobit IV. In a model in which the assumptions required for Tobit IV to be efficient are not

maintained, we show that CQIV has smaller bias and dispersion than Tobit IV. Since Tobit

IV requires strong parametric assumptions, the advantages of CQIV are likely to be large

relative to Tobit IV in applied work.

4 Empirical Application: Engel Curve Estimation

In this section, we apply the CQIV estimator to the estimation of Engel curves. The Engel

curve relationship describes how a household’s demand for a commodity changes as the

12

household’s expenditure increases. Lewbel (2006) provides a recent survey of the extensive

literature on Engel curve estimation. For comparability to the recent studies, we use data

from the 1995 U.K. Family Expenditure Survey (FES) as in Blundell, Chen, and Kristensen

(2007) and Imbens and Newey (2008). Following Blundell, Chen, and Kristensen (2007),

we restrict the sample to 1655 married or cohabitating couples with two or fewer children,

in which the head of household is employed and between the ages of 20 and 55. The FES

collects data on household expenditure for different categories of commodities. We focus on

estimation of the Engel curve relationship for the alcohol category because 16% of families

in our data report zero expenditure on alcohol. Although zero expenditure on alcohol arises

as a corner solution outcome, and not from bottom coding, both types of censoring motivate

the use of censored estimators such as CQIV.

Endogeneity in the estimation of Engel curves arises because the decision to consume a

particular category of commodity may occur simultaneously with the allocation of income

between consumption and savings. Following the literature, we rely on a two-stage budgeting

argument to justify the use of income as an instrument for expenditure. Specifically, we

estimate a quantile model in the first stage, where the logarithm of total expenditure D is a

function of the logarithm of gross earnings of the head of the household Z and demographic

household characteristics W . The control function V is obtained using the estimator in

(2.5), where the integral is approximated by a grid of 100 quantiles. For comparison, we also

obtain control function estimates as the empirical CDF of the OLS residuals from the first

stage equation. The correlation between these two control function estimates is .9986, and

both methods result in very similar estimates in the second stage.

In the second stage we focus on the following quantile specification for Engel curve

estimation:

Yi = max(X ′iβ(Ui), 0), Xi = (Di, D

2i , Wi, Vi), Ui v U(0, 1) | Xi,

where Y is the observed share of total expenditure on alcohol censored at zero, W is a binary

household demographic variable that indicates whether the family has any children, and V

is the control function estimate from the first stage. We define our binary demographic

variable following Blundell, Chen and Kristensen (2007).1

To choose the specification, we rely on recent studies in Engel curves estimation. Thus,

following Blundell, Browning, and Crawford (2003) we impose separability between the con-

1Demographic variables are important shifters of Engel curves. In recent literature, “shape invariant”specifications for demographic variable have become popular. For comparison with this literature, we alsoestimate an unrestricted version of shape invariant specification in which we include a term for the interactionbetween the logarithm of expenditure and our demographic variable. The results from the shape invariantspecification are qualitatively similar but less precise than the one reported in this application.

13

trol function and other regressors (see also Newey, Powell, Vella (1999) for a general treatment

of separable triangular models). Hausman, Newey, and Powell (1995) and Banks, Blundell,

and Lewbel (1997) show that the quadratic specification in log-expenditure gives a better

fit than the linear specification used in earlier studies. In particular, Blundell, Duncan, and

Pendakur (1998) find that the quadratic specification gives a good approximation to the

shape of the Engel curve for alcohol. To check the robustness of the specification to the

linearity in the control function, we also estimate specifications that include nonlinear terms

in the control function. The results are very similar to the ones reported.

Figure 1 reports the estimated coefficients for each quantile of the alcohol share for a

variety of estimators. In addition to reporting results for CQIV with a quantile estimate

of the control variable, as above, we report estimates from a censored quantile regression

(CQR), a quantile instrumental variables estimator with a quantile estimate of the control

variable (QIV), and a quantile regression (QR) estimator. We also estimate a model for the

conditional mean with a Tobit estimator that incorporates an OLS estimate of the control

variable. The traditional Tobit IV algorithm implemented by standard statistical software

does not converge in this application. Given the censoring, we focus on conditional quantiles

above the .15 quantile.

In the panels of Figure 1 that depict the coefficients of D and D2, the importance of

controlling for censoring is especially apparent. Comparison between the censored quantile

estimators (CQIV and CQR) and the uncensored quantile estimators (QIV and QR) demon-

strates that the censoring attenuates the uncorrected estimates toward zero at all quantiles

in this application. In particular, censoring appears very important even at the highest quan-

tiles. Relative to the Tobit IV estimate of the conditional mean, CQIV provides a richer

picture of the heterogenous effects of the variables. Comparison of the quantile estimators

that account for endogeneity (CQIV and QIV) and those that do not (CQR and QR) shows

that endogeneity also influences the estimates, but the pattern is difficult to interpret. The

estimates of the coefficient of the control function indicate that the endogeneity problem is

more severe in the upper half of the distribution.

Our quadratic quantile model is flexible in that it permits the expenditure elasticities to

vary across quantiles of the alcohol share and across the level of total expenditure. These

elasticities are related to the coefficients of the model by

ED(D, U) = β1(U) + 2β2(U)D,

where β1 and β2 are the coefficients of D and D2, respectively. Figure 2 reports estimates

of these elasticities evaluated at the three quartiles of D. Here we see that accounting for

14

endogeneity and censoring also has important consequences for these economically relevant

quantities. In the top panel, which depicts the elasticities at the .25 quartile of expenditure,

the difference between the estimates is more pronounced along the endogeneity dimension

than it is along the censoring dimension. At the .75 quartile, the difference between CQIV

and all of the other estimators is especially apparent. Figure 3 plots 95% pointwise confidence

intervals for the CQIV elasticity estimates obtained by bootstrap with 200 repetitions. Here

we can see that there is significant heterogeneity in the expenditure elasticity across quantiles

and levels of total expenditure. Thus, alcohol passes from being a normal good for low

quantiles to being an inferior good for high quantiles, with a stronger pattern of change

as the level of expenditure increases. This heterogeneity is missed by conventional mean

estimates of the elasticity.

In Figure 4 we report families of Engel curves based on the CQIV coefficient estimates

at each quartile of the alcohol share. We predict the value of the alcohol share, Y , for a

grid of values of log expenditure using the cqiv coefficients at each quartile. The subfigures

depict the engel curves for each quartile of the empirical values of the control variable, for

individuals with and without kids. Here we can see that controlling for censoring has an

important effect on the shape of the Engel curves even at the median. The families of Engel

curves are fairly robust to the values of the control variable, but the effect of children on

alcohol shares is more pronounced. The presence of children in the household produces a

downward shift in the Engel curves at all the levels of log-expenditure considered.

5 Conclusion

In this paper, we develop a new censored quantile instrumental variables estimator, and we

demonstrate its computation and finite sample performance using a Monte-Carlo simulation.

We also report a CQIV application to Engel curve estimation. Censoring and endogeneity

abound in empirical work, making CQIV a valuable addition to the applied econometrician’s

toolkit. For example, Kowalski (2009) uses CQIV to estimate the price elasticity of expen-

diture on medical care across the quantiles of the expenditure distribution, where censoring

arises because of the decision to consume zero care and endogeneity arises because marginal

prices explicitly depend on expenditure. Since CQIV can be implemented using standard

statistical software, it should prove useful to applied researchers in many applications.

15

A Proof of Theorem 1.

Below, const and K are generic positive constants. Ci denotes the censoring point.

Step 1. The rescaled statistic Zn =√

n(β(u) − β(u)) minimizes

Qn(z, γ, V ) ≡ 1√n

n∑

i=1

Vin(z)1[(Xi,Vi)

′γ > c], where (A.1)

Vin(z, V ) ≡ √n[ρu(εi − (Xi,

Vi)′z/

√n) − ρu(εi)] and εi ≡ Yi − X ′

iβ(u). The claim is that for

any finite collection of points zj , j ≤ l

(Qn(zj , γ, V ), j ≤ l

) d−→(

Q∞(zj), j ≤ l), (A.2)

where

Q∞(z) ≡ W ′z + 12z′Jz

Wd= N(0, Λ)

J ≡ EfY (X ′iβ(u)|Xi)XiX

′i1[X ′

iγ > c],

Λ(u) ≡ V ar[{(u − 1(Yi < X ′iβ(u)))Xi + E[fY (X ′

iβ(u)|Xi)XiB(Xi)]Si} · 1(X ′iγ > c)]

This claim above follows immediately from the standard CLT and LLN and some standard

calculations applied to the first order approximation

Qn(z, γ, V ) = Qn(z, γ, V ) + z′E[fY (X ′iβ(u)|Xi)XiB(Xi)]

1√n

n∑

i=1

Si + op(1), (A.3)

which is obtained in the Step 2 below.

Matrix J is invertible by assumption. Since functions Qn and Q∞ are convex, finite, and

continuous in z, and since function Q∞ is uniquely minimized at random vector −J−1W =

Op(1), (A.2) implies

Znd−→ −J−1W (A.4)

by the convexity theorem (e.g. Pollard, 1989).

Step 2. For any fixed z, the empirical process

{Qn(z, γ′, V ′) − EQn(z, γ′, V ′), γ′ ∈ G, V ′ ∈ V} (A.5)

is stochastically equicontinuous in γ, where G ≡ {γ : |γ − γ0| ≤ δ} and δ > 0 is small.

16

Indeed, let

F ={(W, Z, D) 7→ 1[(X ′, [V ′(W, Z, D)])γ > c], γ ∈ G, V ′ ∈ V

}(A.6)

and

Gn = {(W, Z, D, ε) 7→√

n[ρu(ε − (X ′, V ′(W, Z, D))z/√

n) − ρu(ε)], V′ ∈ V]. (A.7)

and, finally,

Hn = F × Gn. (A.8)

By the boundedness assumptions, Hn has a constant envelope that is bounded. The class of

functions Gn is a uniformly Lipschitz transformation of V. Using this fact it is not difficult

to show that the bracketing integral for Hn satisfies

J[](δn,Hn, L2(P )) ↘ 0, as δn ↘ 0. (A.9)

Indeed, the L2(P ) pseudo-metric on Hn is equivalent to the following pseudo-metric on G×F .

Let h1 ∈ Hn be defined by pair γ1, V1 and h2 ∈ Hn be defined by pair γ2, V2, then we define

the pseudo-metric on G × F as

ρ(γ1, γ2, V1, V2) ≡ supn≥1

√E|h1 − h2|2

.

√‖γ2 − γ1‖2 +

√E|V1(X, D, Z) − V2(X, D, Z)|2 +

√E|V1(X, D, Z) − V2(X, D, Z)|2

.

√‖γ2 − γ1‖2 +

√E|V1(X, D, Z) − V2(X, D, Z)|2 +

√E|V1(X, D, Z) − V2(X, D, Z)|2

. (‖γ2 − γ1‖2)1/2 + (E|V1(X, D, Z) − V2(X, D, Z)|2)1/4 + [E|V1(X, D, Z) − V2(X, D, Z)|2]1/2

where the first inequality follows by triangular inequality and some simple direct calculations,

and the second from V being a uniform Lipschitz transform of V , and the last inequality is

elementary. Using this inequality we can conclude that

J[](δn,Hn, L2(P )) . J[](δn,V, L2(P )) + J[](δn,G, L2(P )), (A.10)

where

J[](δn,V, L2(P )) + J[](δn,G, L2(P )) ↘ 0 as δn ↘ 0. (A.11)

where the first terms goes to zero by assumption on the class V; and the second term

converges to zero trivially.

17

The stochastic equicontinuity condition implies that

Qn(z, γ, V ) − Qn(z, γ, V ) − EQn(z, γ, V ) + EQn(z, γ, V ) = op(1). (A.12)

Thus, to complete the proof, it remains to examine the behavior of

EQn(z, γ, V )−EQn(z, γ, V ) = EQn(z, γ, V )−EQn(z, γ, V ) + EQn(z, γ, V )−EQn(z, γ, V )

(A.13)

We first show that EQn(z, γ, V )−EQn(z, γ, V ) = op(1). We can suppress V in the analysis.

We will show that for si(γ, γ0) ≡ 1[X ′iγ > c] − 1[X ′

iγ0 > c]:

EQn(z, γ) − EQn(z, γ0)|γ=γ ≡√

nEVin(z)si(γ, γ0)|γ=γ = Op(γ − γ0), (A.14)

Write√

nVin(z) ≡ −√n[{u − 1[εi ≤ 0]}X ′

iz]

+√

n[− ηi(z){X ′

iz − εi

√n}] ≡ √

nV ′in(z) +

√nV ′′

in(z), where ηi(z) ≡[1(εi ≤ 0) − 1(εi ≤ X ′

iz/√

n)]. For γ close enough to γ0, X ′

iγ > c

implies X ′iβ(u) < Ci − v, a.s. for v > 0 small, for all i, so that

E[√

nV ′in(z)si(γ, γ0)|Xi, Ci

]= 0 uniformly in i, (A.15)

since P[εi ≤ 0|Xi, Ci, Xiβ(u) < Ci − v

]= u [ if Xiβ(u) < Ci, εi has u-th conditional

quantile at 0] . Also E[√

nV ′′in(z)si(γ, γ0)|Xi, Ci

]=O

[fu(0|Xi)z

′XiX′iz1(X ′

iβ(u) < Ci −v)

]× si(γ, γ0), uniformly in i . Therefore,

EE[√

nV ′′in(z)si(γ, γ0)|Xi, Ci

]= O

(E

[si(γ, γ0)

])= O(γ − γ0). (A.16)

Next we consider the second term, and we can see by a direct calculation that

EQn(z, γ, V ) − EQn(z, γ, V ) = z′E[fY (X ′iβ(u)|Xi)XiB(Xi)]

1√n

n∑

i=1

Si + op(1)

�

B CQIV Technical Details and Robustness Diagnostic

Test Results

For the OLS estimate of the control variable, we run an OLS first stage and retain the

predicted residual from the OLS first stage as the control variable. For the quantile estimate

of the control variable, we first stage regressions at each quantile from .01 to .99 in increments

18

of .01. Next, for each observation, we next compute the fraction of the quantile estimates for

which the predicted value of the endogenous variable is less than or equal to the true value

of the endogenous variable. We then evaluate the standard normal cumulative distribution

function at this value and retain the result as the estimate of the control variable. In this

way, the quantile estimate of the control variable allows for heteroskedasticity in the first

stage.

In Table B1, we present the CQIV robustness diagnostic tests suggested in section 3 for

the CQIV estimator with an OLS estimate of the control variable. In our estimates, we

used a probit model in the first step, and we set q0 = 10 and q1 = 3. In practice, we do not

necessarily recommend reporting the diagnostics in Table B1, but we have included them here

for expositional purposes. In the top section of the table, we present diagnostics computed

after CQIV Step 1. At the 0.05 quantile, observations are retained in J0 if their predicted

probability of being uncensored exceeds 1−u+c = 1− .05+ .0445 = .9945. Empirically, this

leaves 47.0% of the total sample in J0 in the median replication sample. In all statistics, the

variation across replication samples appears small. However, as intended by the algorithm,

there is meaningful variation across the estimated quantiles. As the estimated quantile

increases, the percentage of observations retained in J0 increases. From these diagnostics,

the CQIV estimator appears well-behaved in the sense that the percentage of observations

retained in J0 is never very close to 0 or 100.

In the second section of Table B1, we present robustness test diagnostics computed after

CQIV Step 2. Observations are retained in J1 if the predicted Yi exceeds Ci + δn, where

the median value of Ci, as shown in the table, is 1.575, and the median value of δn at the

.05 quantile is 1.694. As desired, at each quantile, the percentage of observations retained in

J1 is smaller than the percentage of observations with predicted values above Ci but larger

than the percentage of observations retained in J0. As shown in sections of the table labeled

“Percent J0 in J1” and “Count J1 not in J0” J0 is almost a proper subset of J1.

In the last section of Table B1, we report the value of the Powell objective function

obtained after CQIV Step 2 and CQIV Step 3. The last column shows that on average the

final CQIV step represents an improvement in the objective function in 36-51% of replication

samples across the estimated quantiles. In our CQIV simulation results, we report the results

from the third step. Researchers might prefer to select select results from the second or third

step based on the value of the objective function.

19

References

[1] Banks, James, Blundell, Richard, and Lewbel, Arthur. “Quadratic Engel Curves and

Consumer Demand.” Review of Economics and Statistics. 1997. 79(4). pp 527-539.

[2] Blundell, Richard, Browning, Martin, and Crawford, Ian. “Nonparametric Engel Curves

and Revealed Preference.” Econometrica. 2003. 71(1). pp 205-240.

[3] Blundell, Richard, Chen, Xiaohong, and Kristensen, Dennis. “Semi-nonparametric IV

Estimation of Shape-Invariant Engel Curves.” Econometrica. 2007. 75(6). pp. 1613-1669.

[4] Blundell, Richard, Duncan, Alan, and Pendakur, Krishna. “Semiparametric Estimation

and Consumer Demand.” Journal of Applied Econometrics. 1998. 13(5). pp. 435-461.

[5] Blundell, Richard, and Powell, James. “Censored Regression Quantiles with Endoge-

nous Regressors.” Journal of Econometrics. 2007. 141. pp. 65-83.

[6] Chernozhukov, Victor, Fernandez-Val, Ivan, and Galichon, Alfred. “Quantile and Prob-

ability Curves without Crossing.” 2006. MIT Working Paper.

[7] Chernozhukov, Victor, and Hansen, Christian. “Instrumental variable quantile regres-

sion: A robust inference approach.” Journal of Econometrics. January 2008. 142(1).

pp.379-398.

[8] Chernozhukov, Victor, and Hong, Han. “Three-Step Quantile Regression and Extra-

marital Affairs.” Journal of The American Statistical Association. September 2002.

97(459). pp. 872-882.

[9] Chesher, A. “Identification in Nonseparable Models.” Econometrica, 2003, 71(5), pp.

1405-1441.

[10] Hausman, Jerry A..“Specification Tests in Econometrics.” Econometrica. 1978. 46(6).

pp. 1251-71.

[11] Hausman, Jerry, Newey, Whitney, and Powell, James. “Nonlinear Errors in Variables

Estimation of Some Engel Curves.” Journal of Econometrics. 1995. 65. pp. 203-233.

[12] Imbens, Guido W., and Newey, Whitney K.. “Identification and Estimation of Trian-

gular Simultaneous Equations Models without Additivity.” NBER Technical Working

Paper 285.

20

[13] Kowalski, Amanda E. “Censored Quantile Instrumental Variable Estimates of the Price

Elasticity of Expenditure on Medical Care.” NBER Working Paper 15085. 2009.

[14] Koenker, Roger, and Bassett, Gilbert Jr. “Regression Quantiles.” Econometrica, 1978,

46(1), pp. 33-50.

[15] Lee, Sokbae. “Endogeneity in quantile regression models: A control function ap-

proach.” Journal of Econometrics. 2007. 141, pp. 1131-1158.

[16] Lewbel, Arthur. “Entry for the New Palgrave Dictionary of Economics, 2nd Edition. ”

Boston College. 2006.

[17] McClellan, M., McNeil, B.J., and Newhouse, J.P. “Does more intensive treatment of

acute myocardial infarction in the elderly reduce mortality? Analysis using instru-

mental variables.” Journal of the American Medical Association. 1994. 272(11). pp.

859-866.

[18] Newey, Whitney K., Powell, James L., Vella, Francis. “Nonparametric Estimation of

Triangular Simultaneous Equations Models.” Econometrica. 1999. 67(3), 565-603.

[19] Powell, James L. “Censored Regression Quantiles.” Journal of Econometrics, 1986. 23.

pp-143-155.

[20] Powell, James L. “Least absolute deviations estimation for the censored regression

model.” Journal of Econometrics, 1984, 25(3), pp. 303-325.

[21] Wooldridge, Jeffrey M. Econometric Analysis of Cross Section and Panel Data. MIT

Press. Cambridge, MA. 2002.

21

Table 1: Monte Carlo Statistics - Design with Homoskedastic First Stage

Quantile Med. Mean IQR SD RMSE Med. Mean IQR SD RMSE Med. Mean IQR SD RMSE

Tobit IV Standard Algorithm (first row), Tobit IV with OLS Estimate of the Control Variable (second row)

NA 2.94 14.45 37.88 18.82 23.73 -0.93 -1.30 6.59 4.92 5.09 NA NA NA NA NA

NA -0.04 0.09 4.72 3.41 3.41 0.28 0.22 5.40 4.12 4.13 0.12 -0.01 4.66 3.50 3.50

CQIV with OLS Estimate of the Control Variable

0.05 -0.15 -0.05 7.32 5.29 5.29 0.59 0.18 8.15 6.05 6.05 -0.28 -0.34 9.22 6.69 6.70

0.10 -0.12 0.02 6.12 4.54 4.54 0.38 0.12 7.13 5.17 5.18 -0.08 -0.11 7.47 5.43 5.43

0.25 0.04 0.13 5.21 3.81 3.81 0.02 0.07 5.91 4.43 4.43 -0.13 -0.13 6.58 4.66 4.66

0.50 -0.07 0.03 5.25 3.77 3.77 0.30 0.21 6.08 4.52 4.52 0.04 -0.05 6.11 4.44 4.44

0.75 0.05 0.07 5.39 3.83 3.83 0.31 0.24 6.06 4.57 4.57 0.04 -0.04 6.10 4.42 4.42

0.90 -0.07 0.03 5.58 4.23 4.23 0.46 0.29 6.92 5.09 5.10 -0.01 -0.04 7.10 5.23 5.23

0.95 0.00 0.15 6.73 4.79 4.79 0.14 0.14 7.60 5.60 5.61 -0.05 -0.16 8.08 5.98 5.98

CQIV with Quantile Estimate of the Control Variable

0.05 -0.47 -0.30 7.23 5.39 5.40 1.50 1.16 8.25 6.12 6.23 3.33 3.34 9.41 7.08 7.83

0.10 -0.34 -0.20 6.27 4.75 4.75 1.15 1.00 7.22 5.27 5.36 3.30 3.29 8.16 6.00 6.84

0.25 -0.22 -0.15 5.45 3.94 3.95 0.85 0.82 6.25 4.52 4.59 3.26 3.09 7.51 5.22 6.07

0.50 -0.33 -0.26 5.43 3.91 3.91 1.02 0.83 6.12 4.56 4.63 3.13 3.01 6.73 4.94 5.78

0.75 -0.36 -0.21 5.60 3.94 3.95 1.05 0.80 6.15 4.64 4.71 3.27 2.94 6.90 5.04 5.83

0.90 -0.43 -0.25 5.63 4.30 4.30 0.94 0.71 7.01 5.04 5.09 2.95 2.77 7.69 5.69 6.33

0.95 -0.31 -0.21 6.80 4.79 4.79 0.77 0.59 7.65 5.64 5.67 3.07 2.60 8.43 6.38 6.89

QIV with OLS Estimate of the Control Variable

0.05 20.87 20.85 6.92 4.99 21.44 9.22 9.28 8.13 6.05 11.08 18.87 18.79 7.42 5.57 19.60

0.10 20.94 21.07 5.91 4.30 21.50 7.25 7.34 7.05 5.18 8.99 18.96 18.91 6.70 4.97 19.55

0.25 23.70 23.70 5.38 3.98 24.03 3.84 3.94 6.31 4.74 6.16 21.47 21.44 6.07 4.55 21.91

0.50 31.34 31.45 5.97 4.40 31.76 -2.49 -2.57 7.40 5.32 5.91 28.54 28.42 6.87 5.07 28.87

0.75 40.94 40.72 6.93 5.02 41.03 -12.22 -12.14 8.78 6.60 13.81 37.26 37.09 9.22 6.71 37.69

0.90 45.84 45.90 7.91 6.01 46.29 -20.58 -20.32 11.27 8.22 21.92 41.98 41.83 11.28 8.17 42.62

0.95 47.83 47.73 10.03 7.33 48.29 -24.25 -24.24 13.39 10.00 26.22 43.51 43.25 13.46 10.10 44.41

QIV with Quantile Estimate of the Control Variable

0.05 21.38 21.20 7.07 5.13 21.81 10.04 9.85 8.14 6.11 11.59 21.37 21.17 8.14 5.91 21.98

0.10 21.36 21.39 6.12 4.40 21.84 7.83 7.80 7.06 5.32 9.44 21.51 21.39 7.24 5.24 22.02

0.25 23.99 24.03 5.52 4.12 24.38 4.08 4.05 6.55 4.86 6.33 23.94 23.81 6.68 4.65 24.26

0.50 31.54 31.61 6.37 4.68 31.96 -2.45 -2.61 7.75 5.56 6.15 31.02 30.89 6.51 4.94 31.28

0.75 40.91 40.79 7.49 5.45 41.15 -12.78 -12.66 9.32 6.99 14.46 39.95 39.95 8.23 6.10 40.42

0.90 45.73 45.80 8.81 6.54 46.27 -21.27 -21.28 11.86 8.69 22.99 45.56 45.73 10.01 7.57 46.35

0.95 47.56 47.64 10.50 7.76 48.27 -26.05 -25.86 13.90 10.41 27.88 47.51 47.97 12.93 9.56 48.92

CQR

0.05 -44.98 -44.92 9.03 6.95 45.45 44.82 44.86 8.90 6.50 45.33 NA NA NA NA NA

0.10 -45.09 -44.96 6.75 5.24 45.26 44.92 44.86 6.49 5.00 45.14 NA NA NA NA NA

0.25 -45.11 -45.20 5.44 3.92 45.37 45.06 45.00 5.22 3.92 45.17 NA NA NA NA NA

0.50 -45.12 -45.22 4.66 3.53 45.36 45.17 45.18 4.60 3.54 45.32 NA NA NA NA NA

0.75 -45.12 -45.11 4.59 3.51 45.25 45.04 45.06 5.23 3.79 45.22 NA NA NA NA NA

0.90 -44.77 -44.94 5.76 4.07 45.12 45.15 45.21 5.76 4.37 45.42 NA NA NA NA NA

0.95 -44.75 -44.82 6.86 4.99 45.09 44.97 45.14 7.74 5.53 45.48 NA NA NA NA NA

QR

0.05 13.00 13.42 11.34 8.18 15.71 49.22 49.21 8.57 6.56 49.64 NA NA NA NA NA

0.10 8.10 7.97 7.81 5.82 9.87 43.19 43.00 6.90 5.11 43.30 NA NA NA NA NA

0.25 4.85 5.08 6.07 4.56 6.83 35.00 34.77 5.97 4.30 35.03 NA NA NA NA NA

0.50 7.96 8.07 5.57 4.04 9.02 24.96 24.99 6.17 4.60 25.41 NA NA NA NA NA

0.75 14.18 14.05 5.55 4.07 14.63 13.88 14.06 7.46 5.65 15.16 NA NA NA NA NA

0.90 18.42 18.40 5.40 4.14 18.86 5.64 5.47 9.39 6.94 8.84 NA NA NA NA NA

0.95 20.23 20.18 6.20 4.39 20.65 1.29 1.40 10.85 8.08 8.20 NA NA NA NA NA

Values for each quantile multiplied by 100. Med. And Mean report median and mean bias from true values D =1, W =1, Vhat=.9

N=1,000

Replications=1,000

Endogenous Variable (D ) Covariate (W ) Control Term (Vhat )

22

Table 2: Monte Carlo Statistics - Design with Heteroskedastic First Stage

Quantile Med. Mean IQR SD RMSE Med. Mean IQR SD RMSE Med. Mean IQR SD RMSE

Tobit IV Standard Algorithm (first row), Tobit IV with OLS Estimate of the Control Variable (second row)

NA 16.22 7.32 31.10 16.18 17.76 -1.76 3.04 19.84 12.14 12.51 NA NA NA NA NA

NA 5.19 4.88 4.63 3.50 6.01 5.66 5.89 6.35 4.75 7.57 60.18 60.35 4.69 3.58 60.46

CQIV with OLS Estimate of the Control Variable

0.05 7.95 8.17 7.09 5.22 9.69 6.51 6.54 8.87 6.97 9.55 59.84 59.73 7.74 5.52 59.98

0.10 7.78 7.80 6.26 4.50 9.01 6.51 6.49 8.47 6.25 9.01 59.66 59.55 6.32 4.67 59.73

0.25 6.76 7.02 5.59 4.04 8.10 6.68 6.45 7.17 5.52 8.49 59.25 59.22 5.87 4.21 59.37

0.50 5.46 5.79 5.04 3.91 6.99 6.92 6.51 6.77 5.25 8.37 59.63 59.26 5.41 3.98 59.39

0.75 4.53 4.95 5.72 4.15 6.46 6.12 5.73 7.38 5.48 7.93 59.48 59.25 5.88 4.30 59.41

0.90 3.46 3.97 6.64 4.75 6.19 5.65 5.11 7.91 6.15 8.00 60.03 59.66 6.75 5.05 59.88

0.95 3.19 3.44 7.39 5.59 6.57 5.53 4.71 9.59 7.30 8.68 60.03 59.94 7.93 6.10 60.25

CQIV with Quantile Estimate of the Control Variable

0.05 -0.36 -0.37 3.88 2.97 2.99 1.03 1.06 7.30 5.40 5.50 5.17 4.77 13.28 10.07 11.14

0.10 -0.41 -0.33 3.25 2.61 2.63 0.91 0.98 5.75 4.46 4.57 4.91 4.83 11.92 8.82 10.05

0.25 -0.41 -0.33 2.91 2.19 2.21 0.98 0.98 4.85 3.69 3.81 4.39 4.47 9.86 7.47 8.70

0.50 -0.43 -0.37 2.78 2.07 2.10 1.20 1.03 4.95 3.66 3.80 4.31 4.19 9.35 7.02 8.17

0.75 -0.34 -0.25 2.90 2.17 2.18 1.04 0.94 4.95 3.79 3.91 3.76 3.73 9.60 7.05 7.98

0.90 -0.25 -0.17 3.11 2.44 2.44 1.17 0.87 5.79 4.32 4.41 3.36 3.24 10.77 8.13 8.75

0.95 -0.12 -0.12 3.72 2.85 2.85 0.96 0.70 6.54 4.86 4.92 2.87 2.84 13.19 9.92 10.32

QIV with OLS Estimate of the Control Variable

0.05 19.25 19.48 6.63 5.06 20.13 0.01 0.02 9.18 6.41 6.41 56.24 56.09 6.69 4.91 56.30

0.10 20.25 20.24 6.14 4.57 20.75 -0.98 -1.32 8.26 6.05 6.19 56.00 55.81 5.94 4.42 55.98

0.25 22.59 22.86 6.07 4.46 23.29 -3.56 -3.90 8.09 5.99 7.14 55.29 55.10 5.50 4.21 55.26

0.50 28.96 29.05 6.90 5.12 29.50 -7.94 -8.24 9.29 6.82 10.70 53.59 53.40 6.22 4.72 53.61

0.75 40.56 40.83 8.63 6.47 41.33 -21.71 -21.81 12.25 9.27 23.70 50.35 49.88 8.34 6.48 50.30

0.90 48.02 48.45 10.75 7.98 49.11 -40.92 -41.33 16.46 11.79 42.98 49.69 49.48 11.62 8.88 50.27

0.95 49.97 50.36 13.02 9.93 51.33 -52.88 -53.32 19.47 14.54 55.26 50.99 50.74 14.82 11.14 51.95

QIV with Quantile Estimate of the Control Variable

0.05 6.21 6.21 5.39 3.92 7.34 12.50 12.49 5.78 4.20 13.18 33.83 33.69 10.50 7.79 34.58

0.10 4.97 5.23 4.44 3.30 6.19 12.79 12.79 5.14 3.62 13.29 37.70 37.77 9.50 6.83 38.39

0.25 3.46 3.58 4.41 3.15 4.77 14.64 14.61 4.54 3.38 14.99 49.83 49.80 9.12 6.87 50.27

0.50 0.68 0.82 5.49 3.93 4.02 19.80 19.60 6.27 4.80 20.18 78.88 78.93 13.09 9.28 79.48

0.75 -0.03 0.20 7.70 5.73 5.73 20.48 20.00 12.61 9.51 22.15 107.74 107.45 16.43 11.88 108.11

0.90 3.61 3.67 8.68 6.67 7.62 8.93 8.93 19.52 14.19 16.76 115.56 115.88 18.60 13.62 116.68

0.95 6.15 6.25 10.06 7.38 9.67 -0.94 -0.45 23.19 17.39 17.39 116.60 116.71 20.82 15.63 117.75

CQR

0.05 -18.83 -18.97 2.85 2.11 19.09 22.64 23.01 6.42 4.96 23.54 NA NA NA NA NA

0.10 -19.44 -19.54 2.45 1.84 19.63 24.45 24.64 4.91 3.82 24.93 NA NA NA NA NA

0.25 -20.46 -20.45 2.08 1.55 20.51 27.34 27.30 3.93 2.88 27.46 NA NA NA NA NA

0.50 -21.28 -21.31 1.91 1.47 21.36 29.93 29.92 3.76 2.75 30.04 NA NA NA NA NA

0.75 -21.94 -21.98 2.04 1.60 22.04 32.00 32.07 3.70 2.75 32.18 NA NA NA NA NA

0.90 -22.30 -22.31 2.53 1.89 22.39 33.70 33.78 4.14 3.05 33.92 NA NA NA NA NA

0.95 -22.36 -22.47 3.00 2.19 22.58 34.47 34.51 4.79 3.46 34.68 NA NA NA NA NA

QR

0.05 -7.23 -7.06 4.54 3.35 7.81 23.04 23.12 5.13 3.73 23.42 NA NA NA NA NA

0.10 -7.65 -7.42 3.48 2.58 7.86 23.44 23.45 4.02 2.97 23.64 NA NA NA NA NA

0.25 -6.69 -6.57 2.82 2.06 6.89 24.43 24.45 3.59 2.64 24.60 NA NA NA NA NA

0.50 -2.62 -2.50 3.05 2.25 3.36 23.27 23.26 4.02 3.13 23.47 NA NA NA NA NA

0.75 5.92 5.91 4.06 2.87 6.57 12.15 11.82 8.32 6.24 13.37 NA NA NA NA NA

0.90 12.81 12.87 3.99 2.98 13.21 -7.65 -7.79 12.26 9.27 12.11 NA NA NA NA NA

0.95 15.74 15.73 3.98 3.02 16.01 -20.45 -20.58 15.94 12.00 23.83 NA NA NA NA NA

Values for each quantile multiplied by 100. Med. And Mean report median and mean bias from true values D =1, W =1, Vhat=.9

N=1,000

Replications=1,000

Endogenous Variable (D ) Covariate (W ) Control Term (Vhat )

23

Figure 1: Coefficients for Engel Curves

0.5

1

−.08−.06−.04−.02

0

−.08−.06−.04−.02

0

0.01

.02

.03

−2

−1

01

20 40 60 80 100 20 40 60 80 100

20 40 60 80 100 20 40 60 80 100

20 40 60 80 100

quantile quantile

quantile quantile

quantile

log expenditure log expenditure squared

kids control variable

constant

CQIV

Tobit IV

QIV

CQR

QR

24

Figure 2: Expenditure elasticities by expenditure quartile

−.15

−.1

−.05

0.05

20 40 60 80 100 20 40 60 80 100 20 40 60 80 100

quantile quantile quantile

.25 quartile .50 quartile .75 quartile

CQIV

Tobit IV

QIV

CQR

QR

25

Figure 3: 95% confidence intervals for expenditure elasticities by expenditure quartile

−.3

−.2

−.1

0.1

20 40 60 80 100 20 40 60 80 100 20 40 60 80 100

quantile quantile quantile

.25 quartile .50 quartile .75 quartile

bootstrapped upper bound (200 replications)

elasticity

bootstrapped lower bound (200 replications)

26

Figure 4: Engel curves

0.01.02.03.04

0.01.02.03.04

0.01.02.03.04

4 4.5 5 5.5 6 4 4.5 5 5.5 6

log expenditure

log expenditure

log expenditure

log expenditure

log expenditure

log expenditure

kids .25 quartile control var



no kids .25 quartile control var



Engel curve for .75 quartile of alcohol share



27

Table B1: CQIV Robustness Diagnostic Test Resultsfor CQIV with OLS Estimate of the Control Variable - Homoskedastic Design

CQIV Step 1

Quantile Median Min Max Median Min Max

0.05 0.04 0.04 0.05 47.20 43.30 50.30

0.1 0.09 0.06 0.10 49.10 46.00 51.30

0.25 0.20 0.15 0.24 52.20 50.50 53.70

0.5 0.36 0.26 0.46 55.80 54.80 56.80

0.75 0.43 0.29 0.58 59.40 57.70 61.10

0.9 0.37 0.22 0.58 62.40 60.30 65.10

0.95 0.30 0.18 0.54 64.20 61.40 67.50

CQIV Step 2

Quantile Median Min Max Median Min Max Median Min Max

0.05 1.70 1.45 2.01 50.70 46.70 54.90 52.30 48.20 56.70

0.1 1.71 1.44 1.96 52.80 49.50 55.50 54.50 51.10 57.30

0.25 1.71 1.46 1.98 56.30 53.60 58.70 58.10 55.30 60.60

0.5 1.72 1.44 2.02 60.10 57.60 63.40 62.00 59.40 65.40

0.75 1.73 1.47 1.99 64.00 61.20 66.80 66.00 63.10 68.90

0.9 1.75 1.44 2.01 67.40 64.60 70.60 69.50 66.60 72.80

0.95 1.76 1.49 2.02 69.30 65.60 72.80 71.50 67.70 75.10

Quantile Median Min Max Median Min Max Median Min Max

0.05 1.60 1.33 1.85 100 97.7 100 36 0 81

0.1 1.60 1.33 1.85 100 99.0 100 37 7 74

0.25 1.60 1.33 1.85 100 99.6 100 40 15 68

0.5 1.60 1.33 1.85 100 99.6 100 43 23 78

0.75 1.60 1.33 1.85 100 99.7 100 47 17 74

0.9 1.60 1.33 1.85 100 99.7 100 50 15 88

0.95 1.60 1.33 1.85 100 99.1 100 51 16 97

Comparison of Objective Functions

Objective Step 3<Objective Step 2

Quantile Median Min Max Median Min Max Median Mean

0.05 5058 4458 5674 5054 4400 5753 0 0.44

0.1 8939 7925 9946 8927 7888 10049 0 0.47

0.25 17292 15100 19839 17271 14741 20052 0 0.44

0.5 22859 18692 27022 22837 18306 27091 0 0.45

0.75 16073 9603 22872 15895 8737 22866 0 0.42

0.9 -1016 -9624 7150 -1047 -10834 9265 0 0.45

0.95 -13815 -24602 -2884 -14034 -27816 -1919 0 0.44

N=1,000, Replications=1,000

Objective Step 3 Objective Step 2

Deltan Percent J1

Percent J0 in J1 Count in J1 not in J0

c

Percent Predicted Above C

Percent J0

C

28

censored quantile instrumental variable estimation via control...

Documents