censored quantile instrumental variable estimation via control...
TRANSCRIPT
Censored Quantile Instrumental Variable Estimation via
Control Functions
Victor Chernozhukov∗ Ivan Fernandez-Val† Amanda E. Kowalski‡
June 24, 2009
Abstract
In this paper, we develop a new censored quantile instrumental variable (CQIV)
estimator and describe its properties and computation. The CQIV estimator handles
censoring semi-parametrically in the tradition of Powell (1986), and it generalizes stan-
dard censored quantile regression (CQR) methods to incorporate endogenous regressors
in a manner that is computationally tractable. Our computational algorithm combines
a control function approach with the CQR estimator developed by Chernozhukov and
Hong (2002). Through Monte-Carlo simulation, we show that CQIV performs well
relative to Tobit IV in terms of bias and dispersion in a model that satisfies the para-
metric assumptions required for Tobit IV to be efficient. Given the strong parametric
assumptions required by Tobit IV, the gains to CQIV relative to Tobit IV are likely
to be large in empirical applications. We present results from an empirical applica-
tion of CQIV to the estimation of Engel curves for alcohol. This empirical application
demonstrates the importance of accounting for censoring and endogeneity with CQIV.
∗Department of Economics, MIT, 50 Memorial Drive, Cambridge, MA 02142, [email protected].†Boston University, Department of Economics, 270 Bay State Road,Boston, MA 02215, [email protected].‡Department of Economics, Yale University, 37 Hillhouse Avenue, New Haven, CT 06520, and NBER,
1
1 Introduction
Censoring in the dependent variable can introduce bias in traditional mean and quantile
estimators because it induces correlation between the regressors and the error term. Several
mean regression estimators such as Tobit IV have been developed to produce consistent esti-
mates in models with censored dependent variables, but they often require strong parametric
assumptions. Through a generalization of the quantile regression estimator, Powell (1986)
developed a semi-parametric way to achieve consistent quantile estimates on censored data.
The Powell estimator, however, has proven to be computationally difficult to execute, and it
does not allow for endogeneity in the regressors. In this paper, we develop a new censored
quantile instrumental variable (CQIV) estimator that handles censoring semi-parametrically
in the tradition of Powell (1986) and generalizes standard censored quantile regression (CQR)
methods to incorporate endogenous regressors.
The CQIV computational algorithm that we develop here uses a control function approach
to account for endogeneity in the structural equation. Newey, Powell, and Vella (1999)
describe the use of the control function approach in triangular simultaneous equations models
for the conditional mean. Lee (2007) sets forth an estimation strategy using a control
function approach in a model with quantile structural and first stage equations. Our model
differs from Lee’s in that the dependent variable is censored, and our first stage equation
does not need to be additive in the unobservables. Blundell and Powell (2007) propose
an alternative censored quantile instrumental variable estimator that also assumes additive
errors in the first stage.
Our CQIV computational algorithm is simple to compute using standard statistical soft-
ware. In this paper, we demonstrate the implementation of CQIV with a Monte-Carlo
simulation and an empirical application to the estimation of Engel curves. The results
of the Monte-Carlo exercise demonstrate that the performance of CQIV is comparable to
that of Tobit IV in data generated to satisfy the Tobit IV assumptions. The results of
the application to Engel curves demonstrate the importance of accounting for endogeneity
and censoring. Another application of our CQIV estimator to the estimation of the price
elasticity of expenditure on medical care appears in Kowalski (2009).
In Section 2, we present the CQIV model and estimation methods. In Section 3, we
describe the associated computational algorithm and present results from a Monte-Carlo
simulation exercise. In Section 4, we present an empirical application of CQIV to Engel
curves. In Section 5, we provide conclusions and discuss potential empirical applications of
CQIV.
2
2 Censored Quantile Instrumental Variable Regression
2.1 The Model
The general stochastic model we consider is the following “triangular” system of quantile
equations:
Y = max(Y ∗, C), (2.1)
Y ∗ = QY ∗(U |D, W, V ), (2.2)
D = QD(V |W, Z). (2.3)
In this system, Y ∗ is the latent response variable, Y is obtained by censoring Y ∗ above at
the censoring variable C, D is the endogenous regressor, W is a vector of covariates, possibly
containing C, V is a latent unobserved regressor, and Z is a vector of instruments. Further,
QY ∗(·|D, W, V ) is the conditional quantile function of Y ∗ given (D, W, V ); and QD(·|W, Z)
is the conditional quantile function of the endogenous variable D given (W, Z). Here, U is
a Skorohod disturbance for Y that satisfies the independence assumption
U ∼ U(0, 1)|D, W, C, V,
and V is a Skorohod disturbance for D such that
V ∼ U(0, 1)|W, C, Z.
In the last two equations, we make the assumption that the censoring variable C is indepen-
dent of the disturbances U and V . This variable can, in principle, be related to W . Indeed,
our notation allows us to capture possible dependence of W and C by simply treating C as
a component of W .
In the model above, to recover the structural function of interest, QY ∗(·|D, W, V ), it is
important to condition on an omitted regressor V called the “control function.” The equation
(2.3) allows us to recover this omitted regressor as a residual that explains movements in
the variable D, conditional on the set of instruments and other regressors. Nonparametric
triangular models for uncensored data are developed in Newey, Powell, and Vella (1999),
Chesher (2003), and Imbens and Newey (2008); parametric nonlinear variants of these models
are also discussed in Wooldridge (2002); and linear variants of these models appear in the
analysis of Hausman (1978). The model treated in this paper differs from these earlier models
by explicitly treating the case of a censored response variable.
From the system of equations above and the equivariance property of the quantiles to
3
monotone transformations, we have that
Y = QY (U |D, W, V, C) = max(QY ∗(U |D, W, V ), C). (2.4)
Thus, the conditional quantile function of the observed response variable Y is equal to
the conditional quantile function of the latent variable Y ∗, transformed by the censoring
transformation function max(·, C).
2.2 Estimation
To make estimation both practical and realistic, we make a flexible semi-parametric restric-
tion on the functional form of the structural quantile function. In particular, we assume
that
QY ∗(u|D, W, V ) = X ′β(u), X = T (D, W, V ) = (X, V ),
where T (D, W, V ) is a collection of continuously differentiable transformations of the initial
regressors (D, W, V ). The transformations could be, for example, polynomial, trigonometric,
B-spline or other basis functions that have good approximating ability for economic problems.
In this notation, we also need to distinguish the part of the vector T (D, W, V ) that only
depends on V ; we denote this part V . An important property of this functional form is
linearity in parameters, which will lead us to a construction of a computationally efficient
estimator. The resulting functional form for the conditional quantile function of the censored
random variable is given by
QY (u|D, W, V, C) = max(X ′β(u), C).
This is the standard functional form first derived by Powell (1984) in the exogenous case.
We then form the estimator for parameters of this function as
β(u) = arg minβ∈Rk
1
n
n∑
i=1
[1((Xi,Vi)
′γ > c)ρu(Yi − (Xi,Vi)
′β)],
where ρu(x) = (u − 1(x < 0))x is the asymmetric absolute loss function of Koenker and
Bassett (1978), and X is a vector of transformations of the vector (X, C). This estimator
adapts the estimator developed in Chernozhukov and Hong (2002) to deal with endogeneity.
We call the multiplier 1((Xi,Vi)
′γ > c) the selector, as its purpose is to predict the subset
of regressors where the probability of censoring is sufficiently low to permit using a linear
– in place of a censored linear – functional form for the conditional quantile. We formally
4
state the conditions on the selector in the next subsection. This formulation allows us to
compute this estimator through several steps, all taking the form described above. We
provide practical implementation details in the next section. This estimator may also be
seen as a computationally attractive approximation to the Powell estimator applied to our
case:
βp(u) = arg minβ∈Rk
1
n
n∑
i=1
[ρu(Yi − max((Xi,Vi)
′β, Ci))].
The control function V can be estimated in several ways. We can see that
V = V (D, W, Z) ≡ Q−1D (D|W, Z) =
∫ 1
0
1{QD(v|W, Z) ≤ D}dv.
Take any estimator for QD(v|W, Z) or for Q−1D (D|W, Z), based on any parametric or semi-
parametric functional form. Denote the resulting estimator for the control function as
V = V (D, W, Z) ≡ Q−1D (D|W, Z) =
∫ 1
0
1{QD(v|W, Z) ≤ D}dv.
There are several examples: in the classical additive location model, we have that QD(v|W, Z) =
Z ′π + Q(v), where Q is a quantile function, and Z is a vector collecting transformations of
W and Z, so that
V = Q−1(D − Z ′π),
which can be estimated by the empirical CDF of the OLS residuals. In a non-additive
example, we have that QD(v|W, Z) = Z ′π(v), and
V =
∫ 1
0
1{Z ′π(v) ≤ D}dv.
The estimator then takes the form
V =
∫ 1
0
1{Z ′π(v) ≤ D}dv, (2.5)
where the integral can be approximated numerically using a finite grid of quantiles. Cher-
nozhukov, Fernandez-Val, and Galichon (2006) develops asymptotic theory for this estimator.
2.3 Regularity Conditions for Estimation
In order to estimate and make inference on β(u) where u is the probability index of interest
in (0, 1), we make the following assumptions:
5
Assumption 1 (Sampling) We have a sample of size n of independent and identically
distributed vectors (Yi, Di, Wi, Zi). The distribution function of (Yi, Di, Wi, Zi) has a compact
support and satisfies the conditions stated below.
Assumption 2 (Conditions on the Estimator of the Control Function) We have that
V = V (D, Z, W ), where V (·) ∈ V,
where V is class of functions that are sufficiently smooth, in the sense that the class satisfies
Pollard’s entropy condition, and
√n( V − V ) = B(D, Z, W )
1√n
n∑
i=1
Si + op(1),
where S1, ..., Sn are i.i.d. random vectors with finite second moments, and B(D, Z, W ) also
has finite second moments.
Assumption 3 (Conditions on the Selector) The selection rule is equivalent to the form
1((X, V )′γ > c),
where γ →p γ and, for some b > 0,
1((X, V )′γ > c) ≤ 1(Pr[Y = C|X, Z, V ] < u + b), a.s.
The selector must also be nontrivial in the sense that
1((X, V )′γ > c) = 1(Pr[Y = C|X, Z, V ] < u + b)
with positive probability.
Assumption 4 (Smoothness Conditions ) (a) The conditional density fY (y|X = x) is
differentiable in the argument y, with a derivative that is uniformly bounded in y and x
varying over the support of (Y, X). (b) the mapping (α, V ′) 7→ P ((X, V ′)′α > v) is Lipschitz
in α and in V ′, for α in an open neighborhood of γ and V ′ in V.
Assumption 5 (Design Conditions) The matrices
J(u) ≡ EfY (X ′iβ(u)|Xi)XiX
′i1[(Xi, Vi)
′γ > c]
6
and
Λ(u) ≡ V ar[{(u − 1(Yi < X ′iβ(u)))Xi + E[fY (X ′
iβ(u)|Xi)XiB(Xi)]Si} · 1((Xi, Vi)′γ > c)]
are of full rank.
Assumption 1 imposes standard independence conditions as well as compactness of sup-
port of the data variables. We can relax the compactness at the cost of more complicated
notation and proofs. Assumption 2 imposes a high-level condition on the estimator of the
control function. This condition is plausible, and it holds for the parametric estimators
of the control function in the additive set-up, and also for semi-parametric estimators of
the control function in the non-additive set-up using quantile regression (see Chernozhukov,
Fernandez-Val, and Galichon, 2006). Assumption 3 imposes a high-level condition on the
estimator of the selector function. This condition is plausible, and it holds for a variety
of selectors based on the initial estimates of the censoring probability and estimates of the
conditional quantile functions (see Chernozhukov and Hong, 2002). Assumption 4 imposes
some smoothness conditions on the distribution of Y and on the distribution of the linear
index entering the selector function. This assumption is more or less standard, and it also
appears to be plausible. Assumption 5 imposes a design condition that allows us to identify
the parameters of interest and also estimate them at the standard√
n rate.
2.4 Main Theorem
The following result states that the CQIV estimator is consistent, converges to the true
parameter at√
n rate, and is normally distributed in large samples.
Theorem 1 Under the stated assumptions
√n(β(u) − β(u))
d−→ N(0, J−1(u)Λ(u)J−1(u)) (2.6)
See Appendix A for a proof. We can estimate the variance-covariance matrix using standard
methods and carry out analytical inference based on the normal distribution. In practice,
we find it more practical to use bootstrap and subsampling to perform inference.
7
3 Implementation Details and Monte-Carlo Illustra-
tions
We begin our CQIV computational algorithm with Step 0 to facilitate comparison with the
Chernozhukov and Hong (2002) 3-Step CQR algorithm, which we follow closely. For each
desired quantile u,
0. Obtain a prediction of the control function V (and its transformations). A simple
additive strategy is to obtain V by predicting the OLS residuals from the first stage
regression of D on W and Z. If desired, higher order functions of the predicted
residuals and interactions with the other regressors can be included in X ≡ (X, V ).
We mentioned non-additive strategies in the previous section.
1. Select a subset of observations, Jo, which are not likely to be censored using a para-
metric probability model:
1(Yi > Ci) = p(X ′iγ) + εi
where 1(Yi > Ci) takes on a value of 1 if the observation is not censored and takes
on a value of zero otherwise. Note that the control function V is included in X. In
practice, a probit, logit, or any other model that fits the data well can be used. Select
the sample J0 according to the following criterion:
J0 = {i : p(X ′iγ) > 1 − u + c}.
In practice, it is advisable to choose c such that a constant fraction of observations
satisfying p(X ′iγ) > 1 − u are excluded from J0 for each quantile. To do so, set c so
that (1 − u − c) is the q0th quantile of p(X ′iγ) such that p(X ′
iγ) > 1 − u, where q0
is a percentage (10% worked well in our simulation). The empirical value of c and
the percentage of observations retained in J0 can be computed as simple robustness
diagnostic test at each quantile.
2. Estimate the standard quantile regression on the sample Jo:
β(u) minimizes∑
J0
ρu(Yi − X ′iβ(u)), (3.1)
and, using the predicted values, select another subset of observations, J1, from the full
8
sample according to the following criterion:
J1 = {i : X ′iβ(u) > Ci + δn}.
In practice, it is convenient to choose δn such that a constant fraction of observations
satisfying X ′iβ(u) > Ci are excluded from J1 for each quantile. To do so, set (Ci + δn)
to be the q1th quantile of X ′iβ(u) such that X ′
iβ(u) > Ci, where q1 is a percentage
less than q0 (3% worked well in our simulation). In practice, it should be true that J0
⊂ J1. If this is not the case, it is advisable to alter q0, q1, or the regression models.
At each quantile, the empirical value of δn, the percentage of observations from the
full sample retained in J1, the percentage of observations from J0 retained in J1, and
the number of observations in J1 but not in J0 can be computed as simple robustness
diagnostic tests. Coefficient estimates β(u) obtained in this step are consistent but will
be inefficient relative to estimates obtained in the subsequent step.
3. Estimate the standard quantile regression on the sample J1. Formally, replace J0 with
J1 in (3.1). The new estimates, β(u), are the 3-Step CQIV coefficient estimates.
4. (Optional) With results from the previous step, select a new sample J2. Repeat this
and the previous step as many times as desired.
Beginning with Step 2, each successive step of the algorithm should yield estimates that
come closer to minimizing the Powell objective function. As a simple robustness diagnostic
test, we recommend computing the value of the Powell objective function using the full
sample and the estimated coefficients after each step, starting with Step 2. This diagnostic
test is computationally straightforward because computing the value of the objective function
for a given set of values is much simpler than maximizing it. In practice, this test can be
used to determine when to stop the CQIV algorithm for each quantile. If the value of the
Powell objective function increases from Step s to Step s + 1 for s ≥ 2, estimates from step
s can be retained as the coefficient estimates.
We recommend obtaining confidence intervals through a bootstrap procedure, though
analytical formulas can also be used. If the estimation runs quickly on the desired sample,
it is straightforward to draw R ≥ 100 bootstrap samples with replacement and run each
bootstrapped sample through all steps of the algorithm. A 95% confidence interval for each
coefficient estimate can be formed from the .025 and .975 quantiles of the vector of point
estimates obtained for each coefficient.
9
3.1 Monte-Carlo
The goal of the following Monte-Carlo simulation is to compare the empirical performance of
CQIV relative to Tobit IV. For our simulation, we generate data according to a model that
satisfies the Tobit IV parametric assumptions. When the Tobit IV assumptions are satisfied,
Tobit IV is consistent and efficient, and CQIV at each quantile is consistent but inefficient.
Thus, estimates from both models satisfy the criteria for a Hausman (1978) specification
test, in which the null hypothesis is that the Tobit IV assumptions are satisfied. Moreover,
a comparison of Tobit IV coefficients to CQIV coefficients at each quantile quantifies the
relative efficiency of CQIV in a model where Tobit IV can be expected to perform as well as
possible.
A location model facilitates comparison between the conditional mean estimates of Tobit
IV and the conditional quantile estimates of CQIV. Specifically, for each of R Monte-Carlo
repetitions, we generate N observations according to the following model:
Di = π0 + π1Zi + π2Wi + Φ−1(Vi), Vi v U(0, 1) (3.2)
Y ∗i = β0 + β1Di + β2Wi + Φ−1(U ′
i), U ′i v U(0, 1) (3.3)
where (Φ−1(Vi), Φ−1(U ′
i)) is distributed multivariate normal with mean zero and covariance
matrix
Σ =
[1 ρ
ρ 1
]. (3.4)
Though we can observe Y ∗i in the simulated data, we artificially censor the data to observe
Yi = max(Y ∗i , Ci) = max(β0 + β1Di + β2Wi + Φ−1(U ′
i), Ci). (3.5)
From properties of the multivariate normal distribution, Φ−1(U ′i) = ρΦ−1(Vi)+
√1 − ρ2Φ−1(Ui),
where Ui v U(0, 1). Using this expression, we can combine (3.2) and (3.5) for an alternative
formulation of the censored model in which the control term V is included in the structural
equation:
Yi = max(Y ∗i , Ci) = max(β0 + β1Di + β2Wi + ρΦ−1(Vi) +
√1 − ρ2Φ−1(Ui), Ci).
This formulation is useful because it indicates that when we include the control term in the
structural equation, its true coefficient is ρ.
In our simulated data, we create extreme endogeneity by setting ρ = .9. For simplicity, we
set π0 = β0 = 0, and π1 = π2 = β1 = β2 = 1. To generate the data, we draw the disturbances
10
Φ−1(Vi) and Φ−1(U ′i) from a multivariate normal distribution with mean zero and covariance
matrix (3.4). We draw Zi from a standard normal distribution, and we generate Wi to
be a log-normal random variable that is censored from the right at its 95th percentile, r.
Formally, we draw Wi from a standard normal distribution. We then calculate r = QW (.95),
which differs across replication samples. Next, we set Wi = min(eWi , r). For comparative
purposes, we set the amount of censoring in the dependent variable to be comparable to that
in Kowalski (2009). Specifically, we set Ci = C = QY (.38) in each replication sample. For
comparison to Kowalski (2009), we run the simulation with N = 30, 000, but we we also run
and focus on a simulation with N = 1, 000 to demonstrate CQIV performance in a more
conventional sample size.
For each of R = 100 replications, we compute traditional Tobit IV estimates and Tobit
estimates using the control function approach for comparison to CQIV estimates. For addi-
tional comparisons, we present results from a censored quantile regression (CQR) estimator
that does not address endogeneity, a quantile instrumental variables estimator (QIV) that
incorporates the control function to control for endogeneity but does not account for cen-
soring, and a quantile regression (QR) estimator that does not account for endogeneity or
censoring. We report estimates at the .05, .10, .25, .5, .75, .90, and .95 quantiles to demon-
strate the performance of the estimators across the distribution. However, since the model
is homoskedastic, the true value is the same at all quantiles.
In the top panel of Table 1, we report Tobit IV results generated from a prepackaged
Tobit IV algorithm as well as results generated from a prepackaged Tobit algorithm with
a control variable generated with an OLS first stage. To demonstrate bias and dispersion,
we report median bias, mean bias, interquartile range (IQR), standard deviation, and root
mean square error for each estimator at each quantile. Bias on the coefficients on D and
W is computed as (1 − estimate). Since all numbers in the table are multiplied by 100,
they report bias on D and W in percentage terms. Bias on the estimated control term,
V , the predicted residual from the first stage regression of D on Z, W , and a constant, is
computed as (.9− estimate). As demonstrated in the table, the Tobit IV estimator with an
OLS estimate of the control variable represents a substantial improvement over Tobit IV in
terms of median bias and IQR. This comparison illustrates the value of the control function
approach in a nonlinear model.
In the next panels of Table 1, we report bias and dispersion of a CQIV estimator with an
OLS estimate of the control variable and another CQIV estimator with an estimate of the
control variable derived from several quantile regressions. Appendix B provides technical
details and robustness test disagnostic test results for the CQIV estimators. As shown in
the table, even though Tobit IV is efficient in this design, the CQIV estimates compare very
11
favorably to the Tobit IV estimates with an OLS estimate of the control variable, and they
out-perform the prepackaged Tobit IV estimator. It is notable that even with 38% censoring,
we are able to obtain CQIV estimates at low conditional quantiles, but the RMSE is largest
at the extreme quantiles.
The lowest panels of the table report results from a QIV estimator with an OLS estimate
of the control variable, a QIV estimate with a quantile estimate of the control variable, a
CQR estimator, and a QR estimator. The CQIV estimators out-perform the other quantile
estimators at almost all estimated quantiles. This finding demonstrates the advantage of
taking censoring and endogeneity into account with CQIV.
Given the homoskedasticity in the first stage of the design, the CQIV estimator with the
OLS estimate of the control variable has a very similar performance to the CQIV estimator
with an estimate of the control variable derived from quantile regressions. The same finding
generally holds true for the QIV estimators. However, we observe a difference in the perfor-
mance of both types of estimators when we introduce heteroskedasticity into the first stage
of the design by replacing 3.2 with the following equation:
Di = π0 + π1Zi + π2Wi + (π3 + π4Wi)Φ−1(Vi), Vi v U(0, 1) (3.6)
where we set π3 = π4 = 1. Table 2 reports results from this design. As shown in the
second and third panels, the CQIV estimator with a quantile estimate of the control function
now outperforms the CQIV estimator with an OLS estimate of the control function at every
quantile. As above the CQIV estimator also outperforms the other quantile estimators in
the lower panels. Most importantly, the CQIV estimator with the quantile estimate of the
control function now outperforms the Tobit IV estimators, which are no longer efficient given
the heteroskedasticity in the design of the first stage.
In summary, in data that satisfy the assumptions required for Tobit IV to be efficient,
the results of this simulation should provide a lower bound of CQIV performance relative to
Tobit IV. In a model in which the assumptions required for Tobit IV to be efficient are not
maintained, we show that CQIV has smaller bias and dispersion than Tobit IV. Since Tobit
IV requires strong parametric assumptions, the advantages of CQIV are likely to be large
relative to Tobit IV in applied work.
4 Empirical Application: Engel Curve Estimation
In this section, we apply the CQIV estimator to the estimation of Engel curves. The Engel
curve relationship describes how a household’s demand for a commodity changes as the
12
household’s expenditure increases. Lewbel (2006) provides a recent survey of the extensive
literature on Engel curve estimation. For comparability to the recent studies, we use data
from the 1995 U.K. Family Expenditure Survey (FES) as in Blundell, Chen, and Kristensen
(2007) and Imbens and Newey (2008). Following Blundell, Chen, and Kristensen (2007),
we restrict the sample to 1655 married or cohabitating couples with two or fewer children,
in which the head of household is employed and between the ages of 20 and 55. The FES
collects data on household expenditure for different categories of commodities. We focus on
estimation of the Engel curve relationship for the alcohol category because 16% of families
in our data report zero expenditure on alcohol. Although zero expenditure on alcohol arises
as a corner solution outcome, and not from bottom coding, both types of censoring motivate
the use of censored estimators such as CQIV.
Endogeneity in the estimation of Engel curves arises because the decision to consume a
particular category of commodity may occur simultaneously with the allocation of income
between consumption and savings. Following the literature, we rely on a two-stage budgeting
argument to justify the use of income as an instrument for expenditure. Specifically, we
estimate a quantile model in the first stage, where the logarithm of total expenditure D is a
function of the logarithm of gross earnings of the head of the household Z and demographic
household characteristics W . The control function V is obtained using the estimator in
(2.5), where the integral is approximated by a grid of 100 quantiles. For comparison, we also
obtain control function estimates as the empirical CDF of the OLS residuals from the first
stage equation. The correlation between these two control function estimates is .9986, and
both methods result in very similar estimates in the second stage.
In the second stage we focus on the following quantile specification for Engel curve
estimation:
Yi = max(X ′iβ(Ui), 0), Xi = (Di, D
2i , Wi, Vi), Ui v U(0, 1) | Xi,
where Y is the observed share of total expenditure on alcohol censored at zero, W is a binary
household demographic variable that indicates whether the family has any children, and V
is the control function estimate from the first stage. We define our binary demographic
variable following Blundell, Chen and Kristensen (2007).1
To choose the specification, we rely on recent studies in Engel curves estimation. Thus,
following Blundell, Browning, and Crawford (2003) we impose separability between the con-
1Demographic variables are important shifters of Engel curves. In recent literature, “shape invariant”specifications for demographic variable have become popular. For comparison with this literature, we alsoestimate an unrestricted version of shape invariant specification in which we include a term for the interactionbetween the logarithm of expenditure and our demographic variable. The results from the shape invariantspecification are qualitatively similar but less precise than the one reported in this application.
13
trol function and other regressors (see also Newey, Powell, Vella (1999) for a general treatment
of separable triangular models). Hausman, Newey, and Powell (1995) and Banks, Blundell,
and Lewbel (1997) show that the quadratic specification in log-expenditure gives a better
fit than the linear specification used in earlier studies. In particular, Blundell, Duncan, and
Pendakur (1998) find that the quadratic specification gives a good approximation to the
shape of the Engel curve for alcohol. To check the robustness of the specification to the
linearity in the control function, we also estimate specifications that include nonlinear terms
in the control function. The results are very similar to the ones reported.
Figure 1 reports the estimated coefficients for each quantile of the alcohol share for a
variety of estimators. In addition to reporting results for CQIV with a quantile estimate
of the control variable, as above, we report estimates from a censored quantile regression
(CQR), a quantile instrumental variables estimator with a quantile estimate of the control
variable (QIV), and a quantile regression (QR) estimator. We also estimate a model for the
conditional mean with a Tobit estimator that incorporates an OLS estimate of the control
variable. The traditional Tobit IV algorithm implemented by standard statistical software
does not converge in this application. Given the censoring, we focus on conditional quantiles
above the .15 quantile.
In the panels of Figure 1 that depict the coefficients of D and D2, the importance of
controlling for censoring is especially apparent. Comparison between the censored quantile
estimators (CQIV and CQR) and the uncensored quantile estimators (QIV and QR) demon-
strates that the censoring attenuates the uncorrected estimates toward zero at all quantiles
in this application. In particular, censoring appears very important even at the highest quan-
tiles. Relative to the Tobit IV estimate of the conditional mean, CQIV provides a richer
picture of the heterogenous effects of the variables. Comparison of the quantile estimators
that account for endogeneity (CQIV and QIV) and those that do not (CQR and QR) shows
that endogeneity also influences the estimates, but the pattern is difficult to interpret. The
estimates of the coefficient of the control function indicate that the endogeneity problem is
more severe in the upper half of the distribution.
Our quadratic quantile model is flexible in that it permits the expenditure elasticities to
vary across quantiles of the alcohol share and across the level of total expenditure. These
elasticities are related to the coefficients of the model by
ED(D, U) = β1(U) + 2β2(U)D,
where β1 and β2 are the coefficients of D and D2, respectively. Figure 2 reports estimates
of these elasticities evaluated at the three quartiles of D. Here we see that accounting for
14
endogeneity and censoring also has important consequences for these economically relevant
quantities. In the top panel, which depicts the elasticities at the .25 quartile of expenditure,
the difference between the estimates is more pronounced along the endogeneity dimension
than it is along the censoring dimension. At the .75 quartile, the difference between CQIV
and all of the other estimators is especially apparent. Figure 3 plots 95% pointwise confidence
intervals for the CQIV elasticity estimates obtained by bootstrap with 200 repetitions. Here
we can see that there is significant heterogeneity in the expenditure elasticity across quantiles
and levels of total expenditure. Thus, alcohol passes from being a normal good for low
quantiles to being an inferior good for high quantiles, with a stronger pattern of change
as the level of expenditure increases. This heterogeneity is missed by conventional mean
estimates of the elasticity.
In Figure 4 we report families of Engel curves based on the CQIV coefficient estimates
at each quartile of the alcohol share. We predict the value of the alcohol share, Y , for a
grid of values of log expenditure using the cqiv coefficients at each quartile. The subfigures
depict the engel curves for each quartile of the empirical values of the control variable, for
individuals with and without kids. Here we can see that controlling for censoring has an
important effect on the shape of the Engel curves even at the median. The families of Engel
curves are fairly robust to the values of the control variable, but the effect of children on
alcohol shares is more pronounced. The presence of children in the household produces a
downward shift in the Engel curves at all the levels of log-expenditure considered.
5 Conclusion
In this paper, we develop a new censored quantile instrumental variables estimator, and we
demonstrate its computation and finite sample performance using a Monte-Carlo simulation.
We also report a CQIV application to Engel curve estimation. Censoring and endogeneity
abound in empirical work, making CQIV a valuable addition to the applied econometrician’s
toolkit. For example, Kowalski (2009) uses CQIV to estimate the price elasticity of expen-
diture on medical care across the quantiles of the expenditure distribution, where censoring
arises because of the decision to consume zero care and endogeneity arises because marginal
prices explicitly depend on expenditure. Since CQIV can be implemented using standard
statistical software, it should prove useful to applied researchers in many applications.
15
A Proof of Theorem 1.
Below, const and K are generic positive constants. Ci denotes the censoring point.
Step 1. The rescaled statistic Zn =√
n(β(u) − β(u)) minimizes
Qn(z, γ, V ) ≡ 1√n
n∑
i=1
Vin(z)1[(Xi,Vi)
′γ > c], where (A.1)
Vin(z, V ) ≡ √n[ρu(εi − (Xi,
Vi)′z/
√n) − ρu(εi)] and εi ≡ Yi − X ′
iβ(u). The claim is that for
any finite collection of points zj , j ≤ l
(Qn(zj , γ, V ), j ≤ l
) d−→(
Q∞(zj), j ≤ l), (A.2)
where
Q∞(z) ≡ W ′z + 12z′Jz
Wd= N(0, Λ)
J ≡ EfY (X ′iβ(u)|Xi)XiX
′i1[X ′
iγ > c],
Λ(u) ≡ V ar[{(u − 1(Yi < X ′iβ(u)))Xi + E[fY (X ′
iβ(u)|Xi)XiB(Xi)]Si} · 1(X ′iγ > c)]
This claim above follows immediately from the standard CLT and LLN and some standard
calculations applied to the first order approximation
Qn(z, γ, V ) = Qn(z, γ, V ) + z′E[fY (X ′iβ(u)|Xi)XiB(Xi)]
1√n
n∑
i=1
Si + op(1), (A.3)
which is obtained in the Step 2 below.
Matrix J is invertible by assumption. Since functions Qn and Q∞ are convex, finite, and
continuous in z, and since function Q∞ is uniquely minimized at random vector −J−1W =
Op(1), (A.2) implies
Znd−→ −J−1W (A.4)
by the convexity theorem (e.g. Pollard, 1989).
Step 2. For any fixed z, the empirical process
{Qn(z, γ′, V ′) − EQn(z, γ′, V ′), γ′ ∈ G, V ′ ∈ V} (A.5)
is stochastically equicontinuous in γ, where G ≡ {γ : |γ − γ0| ≤ δ} and δ > 0 is small.
16
Indeed, let
F ={(W, Z, D) 7→ 1[(X ′, [V ′(W, Z, D)])γ > c], γ ∈ G, V ′ ∈ V
}(A.6)
and
Gn = {(W, Z, D, ε) 7→√
n[ρu(ε − (X ′, V ′(W, Z, D))z/√
n) − ρu(ε)], V′ ∈ V]. (A.7)
and, finally,
Hn = F × Gn. (A.8)
By the boundedness assumptions, Hn has a constant envelope that is bounded. The class of
functions Gn is a uniformly Lipschitz transformation of V. Using this fact it is not difficult
to show that the bracketing integral for Hn satisfies
J[](δn,Hn, L2(P )) ↘ 0, as δn ↘ 0. (A.9)
Indeed, the L2(P ) pseudo-metric on Hn is equivalent to the following pseudo-metric on G×F .
Let h1 ∈ Hn be defined by pair γ1, V1 and h2 ∈ Hn be defined by pair γ2, V2, then we define
the pseudo-metric on G × F as
ρ(γ1, γ2, V1, V2) ≡ supn≥1
√E|h1 − h2|2
.
√‖γ2 − γ1‖2 +
√E|V1(X, D, Z) − V2(X, D, Z)|2 +
√E|V1(X, D, Z) − V2(X, D, Z)|2
.
√‖γ2 − γ1‖2 +
√E|V1(X, D, Z) − V2(X, D, Z)|2 +
√E|V1(X, D, Z) − V2(X, D, Z)|2
. (‖γ2 − γ1‖2)1/2 + (E|V1(X, D, Z) − V2(X, D, Z)|2)1/4 + [E|V1(X, D, Z) − V2(X, D, Z)|2]1/2
where the first inequality follows by triangular inequality and some simple direct calculations,
and the second from V being a uniform Lipschitz transform of V , and the last inequality is
elementary. Using this inequality we can conclude that
J[](δn,Hn, L2(P )) . J[](δn,V, L2(P )) + J[](δn,G, L2(P )), (A.10)
where
J[](δn,V, L2(P )) + J[](δn,G, L2(P )) ↘ 0 as δn ↘ 0. (A.11)
where the first terms goes to zero by assumption on the class V; and the second term
converges to zero trivially.
17
The stochastic equicontinuity condition implies that
Qn(z, γ, V ) − Qn(z, γ, V ) − EQn(z, γ, V ) + EQn(z, γ, V ) = op(1). (A.12)
Thus, to complete the proof, it remains to examine the behavior of
EQn(z, γ, V )−EQn(z, γ, V ) = EQn(z, γ, V )−EQn(z, γ, V ) + EQn(z, γ, V )−EQn(z, γ, V )
(A.13)
We first show that EQn(z, γ, V )−EQn(z, γ, V ) = op(1). We can suppress V in the analysis.
We will show that for si(γ, γ0) ≡ 1[X ′iγ > c] − 1[X ′
iγ0 > c]:
EQn(z, γ) − EQn(z, γ0)|γ=γ ≡√
nEVin(z)si(γ, γ0)|γ=γ = Op(γ − γ0), (A.14)
Write√
nVin(z) ≡ −√n[{u − 1[εi ≤ 0]}X ′
iz]
+√
n[− ηi(z){X ′
iz − εi
√n}] ≡ √
nV ′in(z) +
√nV ′′
in(z), where ηi(z) ≡[1(εi ≤ 0) − 1(εi ≤ X ′
iz/√
n)]. For γ close enough to γ0, X ′
iγ > c
implies X ′iβ(u) < Ci − v, a.s. for v > 0 small, for all i, so that
E[√
nV ′in(z)si(γ, γ0)|Xi, Ci
]= 0 uniformly in i, (A.15)
since P[εi ≤ 0|Xi, Ci, Xiβ(u) < Ci − v
]= u [ if Xiβ(u) < Ci, εi has u-th conditional
quantile at 0] . Also E[√
nV ′′in(z)si(γ, γ0)|Xi, Ci
]=O
[fu(0|Xi)z
′XiX′iz1(X ′
iβ(u) < Ci −v)
]× si(γ, γ0), uniformly in i . Therefore,
EE[√
nV ′′in(z)si(γ, γ0)|Xi, Ci
]= O
(E
[si(γ, γ0)
])= O(γ − γ0). (A.16)
Next we consider the second term, and we can see by a direct calculation that
EQn(z, γ, V ) − EQn(z, γ, V ) = z′E[fY (X ′iβ(u)|Xi)XiB(Xi)]
1√n
n∑
i=1
Si + op(1)
�
B CQIV Technical Details and Robustness Diagnostic
Test Results
For the OLS estimate of the control variable, we run an OLS first stage and retain the
predicted residual from the OLS first stage as the control variable. For the quantile estimate
of the control variable, we first stage regressions at each quantile from .01 to .99 in increments
18
of .01. Next, for each observation, we next compute the fraction of the quantile estimates for
which the predicted value of the endogenous variable is less than or equal to the true value
of the endogenous variable. We then evaluate the standard normal cumulative distribution
function at this value and retain the result as the estimate of the control variable. In this
way, the quantile estimate of the control variable allows for heteroskedasticity in the first
stage.
In Table B1, we present the CQIV robustness diagnostic tests suggested in section 3 for
the CQIV estimator with an OLS estimate of the control variable. In our estimates, we
used a probit model in the first step, and we set q0 = 10 and q1 = 3. In practice, we do not
necessarily recommend reporting the diagnostics in Table B1, but we have included them here
for expositional purposes. In the top section of the table, we present diagnostics computed
after CQIV Step 1. At the 0.05 quantile, observations are retained in J0 if their predicted
probability of being uncensored exceeds 1−u+c = 1− .05+ .0445 = .9945. Empirically, this
leaves 47.0% of the total sample in J0 in the median replication sample. In all statistics, the
variation across replication samples appears small. However, as intended by the algorithm,
there is meaningful variation across the estimated quantiles. As the estimated quantile
increases, the percentage of observations retained in J0 increases. From these diagnostics,
the CQIV estimator appears well-behaved in the sense that the percentage of observations
retained in J0 is never very close to 0 or 100.
In the second section of Table B1, we present robustness test diagnostics computed after
CQIV Step 2. Observations are retained in J1 if the predicted Yi exceeds Ci + δn, where
the median value of Ci, as shown in the table, is 1.575, and the median value of δn at the
.05 quantile is 1.694. As desired, at each quantile, the percentage of observations retained in
J1 is smaller than the percentage of observations with predicted values above Ci but larger
than the percentage of observations retained in J0. As shown in sections of the table labeled
“Percent J0 in J1” and “Count J1 not in J0” J0 is almost a proper subset of J1.
In the last section of Table B1, we report the value of the Powell objective function
obtained after CQIV Step 2 and CQIV Step 3. The last column shows that on average the
final CQIV step represents an improvement in the objective function in 36-51% of replication
samples across the estimated quantiles. In our CQIV simulation results, we report the results
from the third step. Researchers might prefer to select select results from the second or third
step based on the value of the objective function.
19
References
[1] Banks, James, Blundell, Richard, and Lewbel, Arthur. “Quadratic Engel Curves and
Consumer Demand.” Review of Economics and Statistics. 1997. 79(4). pp 527-539.
[2] Blundell, Richard, Browning, Martin, and Crawford, Ian. “Nonparametric Engel Curves
and Revealed Preference.” Econometrica. 2003. 71(1). pp 205-240.
[3] Blundell, Richard, Chen, Xiaohong, and Kristensen, Dennis. “Semi-nonparametric IV
Estimation of Shape-Invariant Engel Curves.” Econometrica. 2007. 75(6). pp. 1613-1669.
[4] Blundell, Richard, Duncan, Alan, and Pendakur, Krishna. “Semiparametric Estimation
and Consumer Demand.” Journal of Applied Econometrics. 1998. 13(5). pp. 435-461.
[5] Blundell, Richard, and Powell, James. “Censored Regression Quantiles with Endoge-
nous Regressors.” Journal of Econometrics. 2007. 141. pp. 65-83.
[6] Chernozhukov, Victor, Fernandez-Val, Ivan, and Galichon, Alfred. “Quantile and Prob-
ability Curves without Crossing.” 2006. MIT Working Paper.
[7] Chernozhukov, Victor, and Hansen, Christian. “Instrumental variable quantile regres-
sion: A robust inference approach.” Journal of Econometrics. January 2008. 142(1).
pp.379-398.
[8] Chernozhukov, Victor, and Hong, Han. “Three-Step Quantile Regression and Extra-
marital Affairs.” Journal of The American Statistical Association. September 2002.
97(459). pp. 872-882.
[9] Chesher, A. “Identification in Nonseparable Models.” Econometrica, 2003, 71(5), pp.
1405-1441.
[10] Hausman, Jerry A..“Specification Tests in Econometrics.” Econometrica. 1978. 46(6).
pp. 1251-71.
[11] Hausman, Jerry, Newey, Whitney, and Powell, James. “Nonlinear Errors in Variables
Estimation of Some Engel Curves.” Journal of Econometrics. 1995. 65. pp. 203-233.
[12] Imbens, Guido W., and Newey, Whitney K.. “Identification and Estimation of Trian-
gular Simultaneous Equations Models without Additivity.” NBER Technical Working
Paper 285.
20
[13] Kowalski, Amanda E. “Censored Quantile Instrumental Variable Estimates of the Price
Elasticity of Expenditure on Medical Care.” NBER Working Paper 15085. 2009.
[14] Koenker, Roger, and Bassett, Gilbert Jr. “Regression Quantiles.” Econometrica, 1978,
46(1), pp. 33-50.
[15] Lee, Sokbae. “Endogeneity in quantile regression models: A control function ap-
proach.” Journal of Econometrics. 2007. 141, pp. 1131-1158.
[16] Lewbel, Arthur. “Entry for the New Palgrave Dictionary of Economics, 2nd Edition. ”
Boston College. 2006.
[17] McClellan, M., McNeil, B.J., and Newhouse, J.P. “Does more intensive treatment of
acute myocardial infarction in the elderly reduce mortality? Analysis using instru-
mental variables.” Journal of the American Medical Association. 1994. 272(11). pp.
859-866.
[18] Newey, Whitney K., Powell, James L., Vella, Francis. “Nonparametric Estimation of
Triangular Simultaneous Equations Models.” Econometrica. 1999. 67(3), 565-603.
[19] Powell, James L. “Censored Regression Quantiles.” Journal of Econometrics, 1986. 23.
pp-143-155.
[20] Powell, James L. “Least absolute deviations estimation for the censored regression
model.” Journal of Econometrics, 1984, 25(3), pp. 303-325.
[21] Wooldridge, Jeffrey M. Econometric Analysis of Cross Section and Panel Data. MIT
Press. Cambridge, MA. 2002.
21
Table 1: Monte Carlo Statistics - Design with Homoskedastic First Stage
Quantile Med. Mean IQR SD RMSE Med. Mean IQR SD RMSE Med. Mean IQR SD RMSE
Tobit IV Standard Algorithm (first row), Tobit IV with OLS Estimate of the Control Variable (second row)
NA 2.94 14.45 37.88 18.82 23.73 -0.93 -1.30 6.59 4.92 5.09 NA NA NA NA NA
NA -0.04 0.09 4.72 3.41 3.41 0.28 0.22 5.40 4.12 4.13 0.12 -0.01 4.66 3.50 3.50
CQIV with OLS Estimate of the Control Variable
0.05 -0.15 -0.05 7.32 5.29 5.29 0.59 0.18 8.15 6.05 6.05 -0.28 -0.34 9.22 6.69 6.70
0.10 -0.12 0.02 6.12 4.54 4.54 0.38 0.12 7.13 5.17 5.18 -0.08 -0.11 7.47 5.43 5.43
0.25 0.04 0.13 5.21 3.81 3.81 0.02 0.07 5.91 4.43 4.43 -0.13 -0.13 6.58 4.66 4.66
0.50 -0.07 0.03 5.25 3.77 3.77 0.30 0.21 6.08 4.52 4.52 0.04 -0.05 6.11 4.44 4.44
0.75 0.05 0.07 5.39 3.83 3.83 0.31 0.24 6.06 4.57 4.57 0.04 -0.04 6.10 4.42 4.42
0.90 -0.07 0.03 5.58 4.23 4.23 0.46 0.29 6.92 5.09 5.10 -0.01 -0.04 7.10 5.23 5.23
0.95 0.00 0.15 6.73 4.79 4.79 0.14 0.14 7.60 5.60 5.61 -0.05 -0.16 8.08 5.98 5.98
CQIV with Quantile Estimate of the Control Variable
0.05 -0.47 -0.30 7.23 5.39 5.40 1.50 1.16 8.25 6.12 6.23 3.33 3.34 9.41 7.08 7.83
0.10 -0.34 -0.20 6.27 4.75 4.75 1.15 1.00 7.22 5.27 5.36 3.30 3.29 8.16 6.00 6.84
0.25 -0.22 -0.15 5.45 3.94 3.95 0.85 0.82 6.25 4.52 4.59 3.26 3.09 7.51 5.22 6.07
0.50 -0.33 -0.26 5.43 3.91 3.91 1.02 0.83 6.12 4.56 4.63 3.13 3.01 6.73 4.94 5.78
0.75 -0.36 -0.21 5.60 3.94 3.95 1.05 0.80 6.15 4.64 4.71 3.27 2.94 6.90 5.04 5.83
0.90 -0.43 -0.25 5.63 4.30 4.30 0.94 0.71 7.01 5.04 5.09 2.95 2.77 7.69 5.69 6.33
0.95 -0.31 -0.21 6.80 4.79 4.79 0.77 0.59 7.65 5.64 5.67 3.07 2.60 8.43 6.38 6.89
QIV with OLS Estimate of the Control Variable
0.05 20.87 20.85 6.92 4.99 21.44 9.22 9.28 8.13 6.05 11.08 18.87 18.79 7.42 5.57 19.60
0.10 20.94 21.07 5.91 4.30 21.50 7.25 7.34 7.05 5.18 8.99 18.96 18.91 6.70 4.97 19.55
0.25 23.70 23.70 5.38 3.98 24.03 3.84 3.94 6.31 4.74 6.16 21.47 21.44 6.07 4.55 21.91
0.50 31.34 31.45 5.97 4.40 31.76 -2.49 -2.57 7.40 5.32 5.91 28.54 28.42 6.87 5.07 28.87
0.75 40.94 40.72 6.93 5.02 41.03 -12.22 -12.14 8.78 6.60 13.81 37.26 37.09 9.22 6.71 37.69
0.90 45.84 45.90 7.91 6.01 46.29 -20.58 -20.32 11.27 8.22 21.92 41.98 41.83 11.28 8.17 42.62
0.95 47.83 47.73 10.03 7.33 48.29 -24.25 -24.24 13.39 10.00 26.22 43.51 43.25 13.46 10.10 44.41
QIV with Quantile Estimate of the Control Variable
0.05 21.38 21.20 7.07 5.13 21.81 10.04 9.85 8.14 6.11 11.59 21.37 21.17 8.14 5.91 21.98
0.10 21.36 21.39 6.12 4.40 21.84 7.83 7.80 7.06 5.32 9.44 21.51 21.39 7.24 5.24 22.02
0.25 23.99 24.03 5.52 4.12 24.38 4.08 4.05 6.55 4.86 6.33 23.94 23.81 6.68 4.65 24.26
0.50 31.54 31.61 6.37 4.68 31.96 -2.45 -2.61 7.75 5.56 6.15 31.02 30.89 6.51 4.94 31.28
0.75 40.91 40.79 7.49 5.45 41.15 -12.78 -12.66 9.32 6.99 14.46 39.95 39.95 8.23 6.10 40.42
0.90 45.73 45.80 8.81 6.54 46.27 -21.27 -21.28 11.86 8.69 22.99 45.56 45.73 10.01 7.57 46.35
0.95 47.56 47.64 10.50 7.76 48.27 -26.05 -25.86 13.90 10.41 27.88 47.51 47.97 12.93 9.56 48.92
CQR
0.05 -44.98 -44.92 9.03 6.95 45.45 44.82 44.86 8.90 6.50 45.33 NA NA NA NA NA
0.10 -45.09 -44.96 6.75 5.24 45.26 44.92 44.86 6.49 5.00 45.14 NA NA NA NA NA
0.25 -45.11 -45.20 5.44 3.92 45.37 45.06 45.00 5.22 3.92 45.17 NA NA NA NA NA
0.50 -45.12 -45.22 4.66 3.53 45.36 45.17 45.18 4.60 3.54 45.32 NA NA NA NA NA
0.75 -45.12 -45.11 4.59 3.51 45.25 45.04 45.06 5.23 3.79 45.22 NA NA NA NA NA
0.90 -44.77 -44.94 5.76 4.07 45.12 45.15 45.21 5.76 4.37 45.42 NA NA NA NA NA
0.95 -44.75 -44.82 6.86 4.99 45.09 44.97 45.14 7.74 5.53 45.48 NA NA NA NA NA
QR
0.05 13.00 13.42 11.34 8.18 15.71 49.22 49.21 8.57 6.56 49.64 NA NA NA NA NA
0.10 8.10 7.97 7.81 5.82 9.87 43.19 43.00 6.90 5.11 43.30 NA NA NA NA NA
0.25 4.85 5.08 6.07 4.56 6.83 35.00 34.77 5.97 4.30 35.03 NA NA NA NA NA
0.50 7.96 8.07 5.57 4.04 9.02 24.96 24.99 6.17 4.60 25.41 NA NA NA NA NA
0.75 14.18 14.05 5.55 4.07 14.63 13.88 14.06 7.46 5.65 15.16 NA NA NA NA NA
0.90 18.42 18.40 5.40 4.14 18.86 5.64 5.47 9.39 6.94 8.84 NA NA NA NA NA
0.95 20.23 20.18 6.20 4.39 20.65 1.29 1.40 10.85 8.08 8.20 NA NA NA NA NA
Values for each quantile multiplied by 100. Med. And Mean report median and mean bias from true values D =1, W =1, Vhat=.9
N=1,000
Replications=1,000
Endogenous Variable (D ) Covariate (W ) Control Term (Vhat )
22
Table 2: Monte Carlo Statistics - Design with Heteroskedastic First Stage
Quantile Med. Mean IQR SD RMSE Med. Mean IQR SD RMSE Med. Mean IQR SD RMSE
Tobit IV Standard Algorithm (first row), Tobit IV with OLS Estimate of the Control Variable (second row)
NA 16.22 7.32 31.10 16.18 17.76 -1.76 3.04 19.84 12.14 12.51 NA NA NA NA NA
NA 5.19 4.88 4.63 3.50 6.01 5.66 5.89 6.35 4.75 7.57 60.18 60.35 4.69 3.58 60.46
CQIV with OLS Estimate of the Control Variable
0.05 7.95 8.17 7.09 5.22 9.69 6.51 6.54 8.87 6.97 9.55 59.84 59.73 7.74 5.52 59.98
0.10 7.78 7.80 6.26 4.50 9.01 6.51 6.49 8.47 6.25 9.01 59.66 59.55 6.32 4.67 59.73
0.25 6.76 7.02 5.59 4.04 8.10 6.68 6.45 7.17 5.52 8.49 59.25 59.22 5.87 4.21 59.37
0.50 5.46 5.79 5.04 3.91 6.99 6.92 6.51 6.77 5.25 8.37 59.63 59.26 5.41 3.98 59.39
0.75 4.53 4.95 5.72 4.15 6.46 6.12 5.73 7.38 5.48 7.93 59.48 59.25 5.88 4.30 59.41
0.90 3.46 3.97 6.64 4.75 6.19 5.65 5.11 7.91 6.15 8.00 60.03 59.66 6.75 5.05 59.88
0.95 3.19 3.44 7.39 5.59 6.57 5.53 4.71 9.59 7.30 8.68 60.03 59.94 7.93 6.10 60.25
CQIV with Quantile Estimate of the Control Variable
0.05 -0.36 -0.37 3.88 2.97 2.99 1.03 1.06 7.30 5.40 5.50 5.17 4.77 13.28 10.07 11.14
0.10 -0.41 -0.33 3.25 2.61 2.63 0.91 0.98 5.75 4.46 4.57 4.91 4.83 11.92 8.82 10.05
0.25 -0.41 -0.33 2.91 2.19 2.21 0.98 0.98 4.85 3.69 3.81 4.39 4.47 9.86 7.47 8.70
0.50 -0.43 -0.37 2.78 2.07 2.10 1.20 1.03 4.95 3.66 3.80 4.31 4.19 9.35 7.02 8.17
0.75 -0.34 -0.25 2.90 2.17 2.18 1.04 0.94 4.95 3.79 3.91 3.76 3.73 9.60 7.05 7.98
0.90 -0.25 -0.17 3.11 2.44 2.44 1.17 0.87 5.79 4.32 4.41 3.36 3.24 10.77 8.13 8.75
0.95 -0.12 -0.12 3.72 2.85 2.85 0.96 0.70 6.54 4.86 4.92 2.87 2.84 13.19 9.92 10.32
QIV with OLS Estimate of the Control Variable
0.05 19.25 19.48 6.63 5.06 20.13 0.01 0.02 9.18 6.41 6.41 56.24 56.09 6.69 4.91 56.30
0.10 20.25 20.24 6.14 4.57 20.75 -0.98 -1.32 8.26 6.05 6.19 56.00 55.81 5.94 4.42 55.98
0.25 22.59 22.86 6.07 4.46 23.29 -3.56 -3.90 8.09 5.99 7.14 55.29 55.10 5.50 4.21 55.26
0.50 28.96 29.05 6.90 5.12 29.50 -7.94 -8.24 9.29 6.82 10.70 53.59 53.40 6.22 4.72 53.61
0.75 40.56 40.83 8.63 6.47 41.33 -21.71 -21.81 12.25 9.27 23.70 50.35 49.88 8.34 6.48 50.30
0.90 48.02 48.45 10.75 7.98 49.11 -40.92 -41.33 16.46 11.79 42.98 49.69 49.48 11.62 8.88 50.27
0.95 49.97 50.36 13.02 9.93 51.33 -52.88 -53.32 19.47 14.54 55.26 50.99 50.74 14.82 11.14 51.95
QIV with Quantile Estimate of the Control Variable
0.05 6.21 6.21 5.39 3.92 7.34 12.50 12.49 5.78 4.20 13.18 33.83 33.69 10.50 7.79 34.58
0.10 4.97 5.23 4.44 3.30 6.19 12.79 12.79 5.14 3.62 13.29 37.70 37.77 9.50 6.83 38.39
0.25 3.46 3.58 4.41 3.15 4.77 14.64 14.61 4.54 3.38 14.99 49.83 49.80 9.12 6.87 50.27
0.50 0.68 0.82 5.49 3.93 4.02 19.80 19.60 6.27 4.80 20.18 78.88 78.93 13.09 9.28 79.48
0.75 -0.03 0.20 7.70 5.73 5.73 20.48 20.00 12.61 9.51 22.15 107.74 107.45 16.43 11.88 108.11
0.90 3.61 3.67 8.68 6.67 7.62 8.93 8.93 19.52 14.19 16.76 115.56 115.88 18.60 13.62 116.68
0.95 6.15 6.25 10.06 7.38 9.67 -0.94 -0.45 23.19 17.39 17.39 116.60 116.71 20.82 15.63 117.75
CQR
0.05 -18.83 -18.97 2.85 2.11 19.09 22.64 23.01 6.42 4.96 23.54 NA NA NA NA NA
0.10 -19.44 -19.54 2.45 1.84 19.63 24.45 24.64 4.91 3.82 24.93 NA NA NA NA NA
0.25 -20.46 -20.45 2.08 1.55 20.51 27.34 27.30 3.93 2.88 27.46 NA NA NA NA NA
0.50 -21.28 -21.31 1.91 1.47 21.36 29.93 29.92 3.76 2.75 30.04 NA NA NA NA NA
0.75 -21.94 -21.98 2.04 1.60 22.04 32.00 32.07 3.70 2.75 32.18 NA NA NA NA NA
0.90 -22.30 -22.31 2.53 1.89 22.39 33.70 33.78 4.14 3.05 33.92 NA NA NA NA NA
0.95 -22.36 -22.47 3.00 2.19 22.58 34.47 34.51 4.79 3.46 34.68 NA NA NA NA NA
QR
0.05 -7.23 -7.06 4.54 3.35 7.81 23.04 23.12 5.13 3.73 23.42 NA NA NA NA NA
0.10 -7.65 -7.42 3.48 2.58 7.86 23.44 23.45 4.02 2.97 23.64 NA NA NA NA NA
0.25 -6.69 -6.57 2.82 2.06 6.89 24.43 24.45 3.59 2.64 24.60 NA NA NA NA NA
0.50 -2.62 -2.50 3.05 2.25 3.36 23.27 23.26 4.02 3.13 23.47 NA NA NA NA NA
0.75 5.92 5.91 4.06 2.87 6.57 12.15 11.82 8.32 6.24 13.37 NA NA NA NA NA
0.90 12.81 12.87 3.99 2.98 13.21 -7.65 -7.79 12.26 9.27 12.11 NA NA NA NA NA
0.95 15.74 15.73 3.98 3.02 16.01 -20.45 -20.58 15.94 12.00 23.83 NA NA NA NA NA
Values for each quantile multiplied by 100. Med. And Mean report median and mean bias from true values D =1, W =1, Vhat=.9
N=1,000
Replications=1,000
Endogenous Variable (D ) Covariate (W ) Control Term (Vhat )
23
Figure 1: Coefficients for Engel Curves
0.5
1
−.08−.06−.04−.02
0
−.08−.06−.04−.02
0
0.01
.02
.03
−2
−1
01
20 40 60 80 100 20 40 60 80 100
20 40 60 80 100 20 40 60 80 100
20 40 60 80 100
quantile quantile
quantile quantile
quantile
log expenditure log expenditure squared
kids control variable
constant
CQIV
Tobit IV
QIV
CQR
QR
24
Figure 2: Expenditure elasticities by expenditure quartile
−.15
−.1
−.05
0.05
20 40 60 80 100 20 40 60 80 100 20 40 60 80 100
quantile quantile quantile
.25 quartile .50 quartile .75 quartile
CQIV
Tobit IV
QIV
CQR
QR
25
Figure 3: 95% confidence intervals for expenditure elasticities by expenditure quartile
−.3
−.2
−.1
0.1
20 40 60 80 100 20 40 60 80 100 20 40 60 80 100
quantile quantile quantile
.25 quartile .50 quartile .75 quartile
bootstrapped upper bound (200 replications)
elasticity
bootstrapped lower bound (200 replications)
26
Figure 4: Engel curves
0.01.02.03.04
0.01.02.03.04
0.01.02.03.04
4 4.5 5 5.5 6 4 4.5 5 5.5 6
log expenditure
log expenditure
log expenditure
log expenditure
log expenditure
log expenditure
kids .25 quartile control var
kids .50 quartile control var
kids .75 quartile control var
no kids .25 quartile control var
no kids .50 quartile control var
no kids .75 quartile control var
Engel curve for .75 quartile of alcohol share
Engel curve for .50 quartile of alcohol share
Engel curve for .25 quartile of alcohol share
27
Table B1: CQIV Robustness Diagnostic Test Resultsfor CQIV with OLS Estimate of the Control Variable - Homoskedastic Design
CQIV Step 1
Quantile Median Min Max Median Min Max
0.05 0.04 0.04 0.05 47.20 43.30 50.30
0.1 0.09 0.06 0.10 49.10 46.00 51.30
0.25 0.20 0.15 0.24 52.20 50.50 53.70
0.5 0.36 0.26 0.46 55.80 54.80 56.80
0.75 0.43 0.29 0.58 59.40 57.70 61.10
0.9 0.37 0.22 0.58 62.40 60.30 65.10
0.95 0.30 0.18 0.54 64.20 61.40 67.50
CQIV Step 2
Quantile Median Min Max Median Min Max Median Min Max
0.05 1.70 1.45 2.01 50.70 46.70 54.90 52.30 48.20 56.70
0.1 1.71 1.44 1.96 52.80 49.50 55.50 54.50 51.10 57.30
0.25 1.71 1.46 1.98 56.30 53.60 58.70 58.10 55.30 60.60
0.5 1.72 1.44 2.02 60.10 57.60 63.40 62.00 59.40 65.40
0.75 1.73 1.47 1.99 64.00 61.20 66.80 66.00 63.10 68.90
0.9 1.75 1.44 2.01 67.40 64.60 70.60 69.50 66.60 72.80
0.95 1.76 1.49 2.02 69.30 65.60 72.80 71.50 67.70 75.10
Quantile Median Min Max Median Min Max Median Min Max
0.05 1.60 1.33 1.85 100 97.7 100 36 0 81
0.1 1.60 1.33 1.85 100 99.0 100 37 7 74
0.25 1.60 1.33 1.85 100 99.6 100 40 15 68
0.5 1.60 1.33 1.85 100 99.6 100 43 23 78
0.75 1.60 1.33 1.85 100 99.7 100 47 17 74
0.9 1.60 1.33 1.85 100 99.7 100 50 15 88
0.95 1.60 1.33 1.85 100 99.1 100 51 16 97
Comparison of Objective Functions
Objective Step 3<Objective Step 2
Quantile Median Min Max Median Min Max Median Mean
0.05 5058 4458 5674 5054 4400 5753 0 0.44
0.1 8939 7925 9946 8927 7888 10049 0 0.47
0.25 17292 15100 19839 17271 14741 20052 0 0.44
0.5 22859 18692 27022 22837 18306 27091 0 0.45
0.75 16073 9603 22872 15895 8737 22866 0 0.42
0.9 -1016 -9624 7150 -1047 -10834 9265 0 0.45
0.95 -13815 -24602 -2884 -14034 -27816 -1919 0 0.44
N=1,000, Replications=1,000
Objective Step 3 Objective Step 2
Deltan Percent J1
Percent J0 in J1 Count in J1 not in J0
c
Percent Predicted Above C
Percent J0
C
28