Constrained Conditional Moment Restriction Models∗
Victor Chernozhukov
M.I.T.
Whitney K. Newey†
M.I.T.
Andres Santos‡
U.C.L.A.
First Draft: September, 2015
This Draft: July, 2020
Abstract
Shape restrictions have played a central role in economics as both testable impli-
cations of classical theory and sufficient conditions for obtaining informative coun-
terfactual predictions. In this paper we provide a general procedure for inference
under shape restrictions in identified and partially identified models defined by con-
ditional moment restrictions. Our test statistics and proposed inference methods
are based on the minimum of the generalized method of moments (GMM) objective
function with and without shape restrictions. The critical values are obtained by
building a strong approximation to the statistic and then bootstrapping a conser-
vatively relaxed form of the statistic. Sufficient conditions are provided, including
strong approximation using Koltchinskii's coupling. Examples given in the paper
include inference for linear instrumental variable (IV) models with linear inequality
constraints, inference for the variability of quantile IV treatment effects, and infer-
ence for bounds on average equivalent variation in a demand model with general
heterogeneity. We find in Monte Carlo examples that the critical values are conser-
vatively accurate and that tests about objects of interest have good power relative
to unrestricted GMM. We also give an empirical application to estimating returns
to schooling when marginal returns are positive and declining in schooling.
Keywords: Shape restrictions, inference on functionals, conditional moment
(in)equality restrictions, instrumental variables, nonparametric and semiparametric
models, Banach space, Banach lattice, Koltchinskii coupling.
∗We thank Riccardo D’amato for excellent research assistance. We are also indebted to the editor,
three anonymous referees, and numerous seminar participants for their valuable comments.
†Research supported by NSF Grant 1757140.
‡Research supported by NSF Grant SES-1426882.
1 Introduction
Shape restrictions have played a central role in economics as both testable implications
of classical theory and sufficient conditions for obtaining informative counterfactual pre-
dictions (Topkis, 1998). A long tradition in applied and theoretical econometrics has as
a result studied shape restrictions, their ability to aid in identification, estimation, and
inference, and the possibility of testing for their validity (Matzkin, 1994; Chetverikov
et al., 2018). The canonical example of this interplay between theory and practice is
undoubtedly consumer demand analysis, where theoretical predictions such as Slutsky
conditions have been extensively tested for and employed in estimation (Hausman and
Newey, 1995, 2016; Blundell et al., 2012; Dette et al., 2016). The empirical analysis
of shape restrictions, however, goes well beyond this important application with recent
examples including studies into the monotonicity of the state price density (Jackwerth,
2000; Aït-Sahalia and Duarte, 2003), the presence of ramp-up and start-up costs (Wolak,
2007; Reguant, 2014), and the existence of complementarities in demand (Gentzkow,
2007) and organizational design (Athey and Stern, 1998; Kretschmer et al., 2012).
In this paper, we provide a general procedure for inference under shape restrictions
in conditional moment restriction models in which the conditioning variables (i.e. in-
struments) may vary with equations. As shown by Ai and Chen (2007, 2012), the
models we study encompass parametric (Hansen, 1982), semiparametric (Ai and Chen,
2003), and nonparametric (Newey and Powell, 2003) specifications, as well as panel data
applications (Chamberlain, 1992) and the study of plug-in functionals. By incorporat-
ing nuisance parameters into the definition of the parameter space, these models also
encompass semi(non)-parametric conditional moment (in)equality models.
Shape restrictions are often equivalent to inequalities on parameters of interest and of
certain unknown functions. For example, Slutsky negative semi-definiteness and mono-
tonicity require that certain functions satisfy inequality restrictions. Inference with
inequality restrictions is difficult. Such restrictions lead to discontinuities in limiting
distributions when the inequality restrictions bind, which makes inference challenging
due to non-pivotal and potentially unreliable pointwise asymptotic approximations (An-
drews, 2000, 2001). We address these challenges by carefully building some constraint
slackness into a potentially regularized specification of the local parameter space that
accounts for the curvature present in nonlinear constraints.
Our test statistics and proposed inference methods are based on the minimums
of the generalized method of moments (GMM) objective function with and without
inequality restrictions. We obtain a strong approximation to our test statistics and
propose bootstrap based critical values. The resulting tests remain valid in partially
identified settings and are shown to asymptotically control size uniformly over a class of
underlying distributions of the data. Inference is potentially conservative but powerful
in exploiting the large amount of information that inequality restrictions can provide
in many cases relevant for applications. While aspects of our analysis are specific to
the conditional moment restriction model, the role of the local parameter space is solely
dictated by the shape restrictions. As such, we expect the insights of the set up here to
be applicable to the study of shape restrictions in alternative models as well.
The inequalities associated with nonparametric shape restrictions necessitate con-
sideration of parameter spaces that are sufficiently general yet endowed with enough
structure to ensure a fruitful asymptotic analysis. An important insight of this paper is
that this simultaneous flexibility and structure is possessed by sets defined by inequality
restrictions on Abstract M (AM) spaces, an AM space being a Banach lattice whose
norm obeys a condition discussed in Section 3. We illustrate the general applicability
of our framework by applying our main results to: (i) Conduct inference on parameters
of interest estimated by GMM in the presence of parametric inequality restrictions; (ii)
Test shape restrictions on structural functions satisfying quantile conditional moment
restrictions; (iii) Conduct inference about partially identified sets of average equivalent
variation and other objects of interest in demand estimation with general heterogeneity
and smooth demand functions; and (iv) Impose the Slutsky restrictions to conduct in-
ference in a linear conditional moment restriction model. Additionally, while we do not
pursue further examples in detail for conciseness, we note our results may be applied to
conduct tests of homogeneity, supermodularity, and economies of scale or scope, as well
as inference on functionals of the identified set in certain partially identified models.
The literature on nonparametric shape restrictions in econometrics has classically
focused on testing whether conditional mean regressions satisfy the restrictions implied
by consumer demand theory; see, e.g., Lewbel (1995) and Haag et al. (2009), as well
as Lee et al. (2018) for a generalization to estimators allowing for Bahadur representa-
tions. The related problem of studying monotone conditional mean regressions has also
garnered widespread attention – recent advances on this problem include Chetverikov
(2019) and Chatterjee et al. (2013). Additional work concerning monotonicity con-
straints includes Beare and Schmidt (2014) who test the monotonicity of the pricing
kernel, Chetverikov and Wilhelm (2017) who study estimation of a nonparametric in-
strumental variable (IV) regression under monotonicity constraints, Armstrong (2015)
who develops minimax rate optimal one sided tests in a Gaussian regression discon-
tinuity (RD) design, Babii and Kumar (2019) who apply isotonic regression to RD,
and Freyberger and Horowitz (2015) who study a partially identified nonparametric IV
model with discrete regressors. Our results do not lend themselves computationally to
the construction of uniform confidence bands for shape restricted functions – a problem
that has been addressed in different contexts by Chernozhukov et al. (2009), Horowitz
and Lee (2017), and Freyberger and Reeves (2018). Following the original version of
this paper, Zhu (2019) and Fang and Seo (2019) have proposed inference methods for
convex restrictions which, while applicable to an important class of problems, rule out
inference on nonlinear functionals or tests of certain shape restrictions. The analysis in
our paper is also related to the moment inequalities literature (Canay and Shaikh, 2017;
Ho and Rosen, 2017). When specialized to these models, our results enable subvector
inference under conditions related to those in Bugni et al. (2017); see Torgovitsky
(2019) for an application of our procedure in this context. Our paper also contributes
to a literature studying semiparametric and nonparametric models under partial iden-
tification (Manski, 2003). Examples of such work include Chen et al. (2011a), Hong
(2017), Santos (2012), Tao (2014), and Chen et al. (2011b) who focus on inference on
functionals of the structural function but do not allow for shape constraints as we do.
This paper innovates relative to the previous literature in providing novel bootstrap
based, uniformly valid inference methods for general conditional moment restriction
models, including parametric and nonparametric IV models as special cases. Most of the
previous results are either for conditional means or the inference results are not uniformly
valid and are thus unsuitable for confidence interval construction. We also innovate in
providing inference results for functionals of the structural function while allowing for
shape constraints. The results of Chernozhukov et al. (2009) are complementary to our
analysis since our results do not provide feasible uniform confidence intervals but do lead
to, e.g., tests for monotonicity and for inference at a point whereas their methods would
be conservative. The results here are highly complementary to Chetverikov and Wilhelm
(2017) in providing inference for nonparametric IV under shape restrictions while they
showed that imposing monotonicity can greatly improve the convergence rate of the
estimator – an observation that additionally motivates our use of test statistics based
on shape constrained (instead of unconstrained) estimators.
The remainder of the paper is organized as follows. In Section 2 we preview our test
in the context of two specific examples. Section 3 contains our main theoretical results,
while Section 4 applies them to conduct inference in the heterogeneous demand model
of Hausman and Newey (2016). Finally, Section 5 contains a brief simulation study,
while Section 6 revisits the Angrist and Krueger (1991) study of the returns to education
by imposing shape restrictions. All mathematical derivations are included in a series of
appendices; see in particular Appendix S.6 for applications of our general results and
Appendix S.8 for coupling results based on Koltchinskii (1994).
2 Guide to Implementation
To fix ideas, we first describe our test in the context of two specific examples. We reserve
until later the full mathematical framework and focus here on implementation.
2.1 Linear Instrumental Variables
As perhaps the simplest possible example, we first consider a linear instrumental variable
model in which θ0(P ) ∈ Θ is identified through the moment conditions
EP [(Y −W ′θ0(P ))Z] = 0,
where Y ∈ R, W ∈ R^{dθ}, Z ∈ R^{dz}, and P denotes the distribution of V ≡ (Y, W, Z).
We are interested in testing whether θ0(P ) belongs to a set R characterized by
R = {θ ∈ R^{dθ} : Fθ = b, Cθ ≤ c},    (1)

for F a db × dθ matrix, C a dc × dθ matrix, b ∈ R^{db}, and c ∈ R^{dc}.
We consider tests based on minimizing the norm of the weighted sample moments
as in Sargan (1958) and Hansen (1982). To this end, we define the criterion

Qn(θ) ≡ {((1/n) ∑_{i=1}^n (Yi − Wi′θ)Zi)′ Σn′Σn ((1/n) ∑_{i=1}^n (Yi − Wi′θ)Zi)}^{1/2},

where Σn is consistent for (E[ZZ′U²])^{−1/2} for U ≡ Y − W′θ0(P) – e.g. Σn′Σn may be the
optimal weighting matrix of Hansen (1982). Our test statistic is then based on

In(R) ≡ min_{θ∈R} √n Qn(θ)        In(Θ) ≡ min_{θ∈R^{dθ}} √n Qn(θ).

Specifically, we consider tests that reject for large values of In(R) − In(Θ). In what follows
it will also be helpful to let θ̂n and θ̂nᵘ denote the minimizers of Qn over θ ∈ R and θ ∈ R^{dθ}
respectively – i.e. θ̂n and θ̂nᵘ are the constrained and unconstrained estimators.
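As a concrete illustration, the construction above can be sketched in a few lines of code. Everything below is a hypothetical simulation: the data-generating process, the box-constraint restriction set, and the use of a generic smooth optimizer are our own choices, not part of the paper.

```python
# Sketch of I_n(R) and I_n(Theta) for the linear IV model of Section 2.1.
# Data, restriction matrices, and optimizer choice are illustrative only.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n, d_theta, d_z = 500, 2, 3
theta0 = np.array([0.5, -0.5])                    # true parameter (simulated)
Z = rng.normal(size=(n, d_z))
W = Z[:, :d_theta] + 0.1 * rng.normal(size=(n, d_theta))
Y = W @ theta0 + rng.normal(size=n)

# Preliminary estimator and plug-in weighting matrix (E[Z Z' U^2])^{-1/2}.
theta_prelim = np.linalg.lstsq(Z.T @ W, Z.T @ Y, rcond=None)[0]
U = Y - W @ theta_prelim
M = (Z * (U**2)[:, None]).T @ Z / n
vals, vecs = np.linalg.eigh(M)
Sigma = vecs @ np.diag(vals**-0.5) @ vecs.T

def Q_n(theta):
    """Weighted norm of the sample moments (the p = 2 case)."""
    g_bar = Z.T @ (Y - W @ theta) / n
    return np.linalg.norm(Sigma @ g_bar)

# Restriction R = {theta : C theta <= c}; here a placeholder box constraint.
C, c = np.eye(d_theta), np.ones(d_theta)

res_u = minimize(Q_n, theta_prelim, method="SLSQP")
res_r = minimize(Q_n, np.zeros(d_theta), method="SLSQP",
                 constraints=[{"type": "ineq", "fun": lambda t: c - C @ t}])
I_Theta = np.sqrt(n) * res_u.fun                  # unconstrained statistic
I_R = np.sqrt(n) * res_r.fun                      # constrained statistic
stat = I_R - I_Theta                              # reject for large values
```

Since R is a subset of R^{dθ}, In(R) ≥ In(Θ) up to optimizer tolerance, so the statistic is nonnegative by construction.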
We construct a critical value for this choice of test statistic by using the multiplier
bootstrap. Specifically, let b ∈ {1, . . . , B} index a bootstrap draw, let {ωiᵇ}_{i=1}^n be i.i.d.
independent of the data with ωiᵇ ∼ N(0, 1), and for any θ ∈ R^{dθ} define

Wnᵇ(θ) ≡ (1/√n) ∑_{i=1}^n ωiᵇ{(Yi − Wi′θ)Zi − (1/n) ∑_{j=1}^n (Yj − Wj′θ)Zj},

which is a simulated draw of the true (centered) moment functions. We also require an
estimator of the derivative of the moment conditions, and to this end we set

Dn[h] ≡ −(1/n) ∑_{i=1}^n ZiWi′h

for any h ∈ R^{dθ}. Finally, we additionally need to account for the variation in θ̂n (recall
In(R) = √n Qn(θ̂n)). The principal challenge in this regard is accounting for the possible
values that θ̂n may take, and for this purpose we introduce the set

Vn(θ̂n) ≡ {h ∈ R^{dθ} : Fh = 0 and Cjh ≤ √n max{0, −(rn + Cjθ̂n − cj)} for all j},

where Cj is the jth row of C, cj the jth coordinate of c, and rn is a slackness parameter
whose choice we discuss shortly. The set Vn(θ̂n) can be thought of as a local version of
the restricted parameter space, approximating the set of values h/√n that could equal
θ̂n − θ0(P). Our bootstrap approximations to In(R) and In(Θ) are then given by

Unᵇ(R) ≡ min_{h∈Vn(θ̂n)} {(Wnᵇ(θ̂n) + Dnh)′Σn′Σn(Wnᵇ(θ̂n) + Dnh)}^{1/2}    (2)

Unᵇ(Θ) ≡ min_{h∈R^{dθ}} {(Wnᵇ(θ̂nᵘ) + Dnh)′Σn′Σn(Wnᵇ(θ̂nᵘ) + Dnh)}^{1/2}.    (3)

Thus, for a level α test, we employ the 1 − α quantile of Unᵇ(R) − Unᵇ(Θ) across the B
bootstrap draws as our critical value. The asymptotic validity of this test is formally
established in Appendix S.6.1, where we specialize our results to parametric GMM.
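The bootstrap construction can likewise be sketched; one draw of Unᵇ(R) − Unᵇ(Θ) is computed below. The simulated data, the identity weighting matrix, and the fixed value of rn are placeholder assumptions; in practice rn would be calibrated as in (4) below and the draw repeated B times.

```python
# One multiplier-bootstrap draw of U_n^b(R) - U_n^b(Theta) from Section 2.1.
# Data, weighting matrix, and the slackness r_n are illustrative placeholders.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n, d_theta = 400, 2
theta0 = np.array([0.5, 0.2])
Z = rng.normal(size=(n, 3))
W = Z[:, :d_theta] + 0.1 * rng.normal(size=(n, d_theta))
Y = W @ theta0 + rng.normal(size=n)
Sigma = np.eye(3)                          # identity weighting for brevity
C, c = np.eye(d_theta), np.ones(d_theta)   # restriction C theta <= c
r_n = 0.1                                  # placeholder slackness, see eq. (4)

def Q(theta):
    return np.linalg.norm(Sigma @ (Z.T @ (Y - W @ theta) / n))

th_u = minimize(Q, np.zeros(d_theta), method="SLSQP").x        # unconstrained
th_r = minimize(Q, np.zeros(d_theta), method="SLSQP",          # constrained
                constraints=[{"type": "ineq", "fun": lambda t: c - C @ t}]).x

D_mat = -(Z.T @ W) / n                     # estimated moment derivative D_n

def W_boot(theta, omega):
    """Multiplier-bootstrap draw W_n^b of the centered moment process."""
    g = (Y - W @ theta)[:, None] * Z       # n x k moment contributions
    return omega @ (g - g.mean(axis=0)) / np.sqrt(n)

def U_boot(theta, omega, local_set):
    """Minimize ||Sigma (W_n^b(theta) + D_n h)|| over h in the local set."""
    Wb = W_boot(theta, omega)
    obj = lambda h: np.linalg.norm(Sigma @ (Wb + D_mat @ h))
    return minimize(obj, np.zeros(d_theta), method="SLSQP",
                    constraints=local_set).fun

omega = rng.normal(size=n)                 # one draw of N(0,1) multipliers
slack = np.sqrt(n) * np.maximum(0.0, -(r_n + C @ th_r - c))
V_n = [{"type": "ineq", "fun": lambda h: slack - C @ h}]       # local space
boot_stat = U_boot(th_r, omega, V_n) - U_boot(th_u, omega, [])
```

Repeating the last four lines for b = 1, . . . , B and taking the empirical 1 − α quantile of `boot_stat` yields the critical value.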
The critical value depends on the choice of rn. Our asymptotic theory will require
rn√n → ∞, with the choice rn = +∞ always being valid. Intuitively, rn is meant to
reflect the sampling uncertainty in C{θ̂n − θ0(P)}. Since the distribution of θ̂n cannot
be uniformly consistently estimated, we suggest linking rn to the degree of sampling
uncertainty in C{θ̂nᵘ − θ0(P)} instead. Specifically, we recommend setting rn to satisfy

P( max_{1≤j≤J} Cj{θ̂nᵘ − θ̂nᵘ⋆} ≤ rn | Data) = 1 − γn    (4)

where γn → 0 and θ̂nᵘ⋆ is a “bootstrap” analogue of θ̂nᵘ. This approach changes the
problem of selecting rn into the problem of selecting γn. However, γn is more interpretable:
if we employed Vn(θ̂nᵘ) in place of Vn(θ̂n) in (2), then a Bonferroni bound implies the
asymptotic size is bounded by α + γn for fixed γn.¹ In simulations we find this bound
to be overly pessimistic and the test to have size at most α for a wide range of γn.
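A sketch of the calibration in (4): rn is the 1 − γn quantile, across bootstrap draws, of max_j Cj{θ̂nᵘ − θ̂nᵘ⋆}. The nonparametric bootstrap used for θ̂nᵘ⋆ and the simulated data are illustrative assumptions.

```python
# Sketch of choosing r_n via (4); data and bootstrap scheme are illustrative.
import numpy as np

rng = np.random.default_rng(0)
n, B, gamma_n = 400, 200, 0.05
theta0 = np.array([0.5, 0.2])
Z = rng.normal(size=(n, 3))
W = Z[:, :2] + 0.1 * rng.normal(size=(n, 2))
Y = W @ theta0 + rng.normal(size=n)
C = np.eye(2)

def iv_gmm(Zm, Wm, Ym):
    """Unconstrained linear IV GMM estimator with identity weighting."""
    G, g = Zm.T @ Wm / len(Ym), Zm.T @ Ym / len(Ym)
    return np.linalg.solve(G.T @ G, G.T @ g)

theta_u = iv_gmm(Z, W, Y)
draws = np.empty(B)
for b in range(B):
    idx = rng.integers(0, n, size=n)       # one nonparametric bootstrap draw
    theta_star = iv_gmm(Z[idx], W[idx], Y[idx])
    draws[b] = np.max(C @ (theta_u - theta_star))
r_n = np.quantile(draws, 1 - gamma_n)      # the recommended slackness level
```

Because only a quantile of the bootstrap distribution is needed, B can be modest relative to what a full critical-value computation requires.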
Remark 2.1. Our test may be employed to obtain confidence regions for a coordinate
of θ0(P) while imposing restrictions of the form Cθ0(P) ≤ c (e.g. sign restrictions or
monotonicity of x ↦ x′θ0(P)). Specifically, for θk the kth coordinate of θ ∈ R^{dθ}, let

Rλ ≡ {θ ∈ R^{dθ} : θk = λ, Cθ ≤ c},

which is a special case of (1). The desired confidence region can then be obtained by
conducting test inversion in λ employing the described test.
¹While we may replace Vn(θ̂n) with Vn(θ̂nᵘ) in identified models, in partially identified models we employ Vn(θ̂n) due to the identified set potentially not being a subset of R under the null hypothesis.
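The inversion in Remark 2.1 amounts to a grid search. The decision rule `reject` below is a hypothetical stand-in for the full bootstrap test of H0 : θk = λ; only the inversion logic is illustrated.

```python
# Sketch of test inversion over lambda (Remark 2.1); `reject` is a placeholder
# standing in for comparing I_n(R_lambda) - I_n(Theta) with its critical value.
import numpy as np

def reject(lam, alpha=0.05):
    # Hypothetical decision rule; the real one runs the bootstrap test.
    return abs(lam - 1.0) > 0.4

grid = np.linspace(0.0, 2.0, 201)
accepted = [lam for lam in grid if not reject(lam)]
ci = (min(accepted), max(accepted))        # confidence region for theta_k
```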
2.2 Quantile Treatment Effects
As a second introductory example we consider a nonparametric conditional moment
restriction model. Specifically, for an outcome Y ∈ R, treatment D ∈ [0, 1], instrument
Z ∈ R, and quantile τ ∈ (0, 1), we are interested in the function θ0(P ) solving
P (Y ≤ θ0(P )(D)|Z) = τ, (5)
where θ0(P )(D) denotes the value of θ0(P ) at the point D and P the distribution of
V ≡ (Y,D,Z). If D is randomly assigned, then we may set D = Z and interpret the
derivative ∇θ0(P) as the τth quantile treatment effect (QTE). Alternatively, if D ≠ Z,
then we obtain the IV model of QTE of Chernozhukov and Hansen (2005). In what
follows we assume θ0(P ) belongs to a parameter space Θ, such as the set of functions
with bounded first and second derivatives; see Appendix S.6.3 for a formal analysis.
We aim to obtain a confidence interval on the variation (in D) of the QTE while
imposing that the QTE be increasing in treatment intensity.² To this end, we set

R ≡ {θ ∈ Θ : ∫₀¹ (∇θ(u))² du − (∫₀¹ ∇θ(u) du)² = λ and ∇²θ(u) ≥ 0 for all u ∈ [0, 1]},
where λ is the hypothesized value for the variation of the QTE, and the positivity
constraint on ∇²θ ensures the QTE is increasing in treatment intensity. By conducting
test inversion (in λ) of the null hypothesis that θ0(P ) ∈ R we may then obtain the desired
confidence region. To construct a test statistic we employ a sequence of transformations
{qk}_{k=1}^{kn} of Z, let q^{kn}(z) ≡ (q1(z), . . . , qkn(z))′, and define the criterion function

Qn(θ) ≡ {((1/n) ∑_{i=1}^n (1{Yi ≤ θ(Di)} − τ)q^{kn}(Zi))′ ((1/n) ∑_{i=1}^n (1{Yi ≤ θ(Di)} − τ)q^{kn}(Zi))}^{1/2},

where we allow kn to increase with n to reflect the full content of the conditional moment
restriction in (5). Because Θ is infinite dimensional, we do not minimize Qn over Θ,
but instead employ a sieve Θn. Specifically, for {pj}_{j=1}^{jn} a sequence of approximating
functions, p^{jn}(d) ≡ (p1(d), . . . , pjn(d))′, and Θn ≡ {p^{jn}′β : p^{jn}′β ∈ Θ} we define

In(R) ≡ inf_{θ∈Θn∩R} √n Qn(θ)        In(Θ) ≡ inf_{θ∈Θn} √n Qn(θ),

and let θ̂n and θ̂nᵘ denote the corresponding constrained and unconstrained estimators.
The statistics In(R) and In(R) − In(Θ) can both be employed as the basis for a test.
As we discuss in Section 3, however, our asymptotic approximation to In(R) − In(Θ)
imposes more stringent assumptions on the rate of convergence of θ̂nᵘ than those needed to
²Inference on other functionals, such as the average (in D) QTE, is also feasible but for illustrative purposes we focus here on a nonlinear functional such as the variance.
approximate In(R) – a distinction that may be relevant in this nonparametric application
(particularly if D ≠ Z), but was unnecessary in the parametric model of Section 2.1.
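The sieve criterion of this section can be sketched as follows. The data-generating process, the polynomial choices for q^{kn} and p^{jn}, and the crude random search standing in for a proper optimizer are all our own illustrative assumptions.

```python
# Sketch of the criterion Q_n(theta) of Section 2.2 on a polynomial sieve.
# DGP, bases, and the random-search "optimizer" are illustrative only.
import numpy as np

rng = np.random.default_rng(0)
n, k_n, j_n, tau = 500, 4, 3, 0.5
D = rng.uniform(size=n)                    # treatment in [0, 1]
Z = D                                      # randomly assigned case (D = Z)
Y = 1.0 + 2.0 * D + rng.normal(size=n)     # placeholder outcome equation
q = np.vander(Z, k_n, increasing=True)     # transforms q^{kn}(Z_i)
p = np.vander(D, j_n, increasing=True)     # sieve basis p^{jn}(D_i)

def Q_n(beta):
    theta_D = p @ beta                     # theta(D_i) for theta = p'beta
    resid = (Y <= theta_D).astype(float) - tau
    return np.linalg.norm(q.T @ resid / n)

# The criterion is a step function of beta, so a gradient-free search is
# natural; here a crude random search stands in for a real optimizer.
best_val = min(Q_n(rng.normal(scale=2.0, size=j_n)) for _ in range(2000))
I_Theta = np.sqrt(n) * best_val            # unconstrained statistic I_n(Theta)
```

Imposing the shape and functional constraints on the same search would yield the constrained statistic In(R) in the same way.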
To obtain a critical value we proceed in a conceptually similar manner to Section 2.1.
First, for b ∈ {1, . . . , B} indexing bootstrap draws of {ωiᵇ}_{i=1}^n i.i.d. with ωiᵇ ∼ N(0, 1), let

Wnᵇ(θ) ≡ (1/√n) ∑_{i=1}^n ωiᵇ{(1{Yi ≤ θ(Di)} − τ)q^{kn}(Zi) − (1/n) ∑_{j=1}^n (1{Yj ≤ θ(Dj)} − τ)q^{kn}(Zj)}
be our multiplier bootstrap analogue to the centered moments. Second, since Qn is not
everywhere differentiable, we approximate the derivative of the moments by
Dn(θ)[p^{jn}′β] ≡ (1/√n) ∑_{i=1}^n q^{kn}(Zi)(1{Yi ≤ θ(Di) + p^{jn}(Di)′β/√n} − 1{Yi ≤ θ(Di)}),
which is a numerical derivative with step size 1/√n. Third, to account for the possible
values that the restricted estimator θ̂n may take we introduce the set

Vn(θ̂n, ℓn) ≡ {β ∈ R^{jn} : ∇²p^{jn}(d)′β ≥ √n min{0, rn − ∇²θ̂n(d)} for all d ∈ [0, 1],

∫₀¹ (∇θ̂n(u) + ∇p^{jn}(u)′β/√n)² du − (∫₀¹ ∇(θ̂n(u) + p^{jn}(u)′β/√n) du)² = λ, ‖p^{jn}′β‖2,∞ ≤ √n ℓn}

where the choices of rn and ℓn are discussed below and the norm ‖·‖2,∞ is defined as

‖p^{jn}′β‖2,∞ ≡ max_{0≤α≤2} sup_{d∈[0,1]} |∇^α p^{jn}(d)′β|.
In analogy to Section 2.1, Vn(θ̂n, ℓn) represents possible local deviations from the ap-
proximation to θ0(P) in Θn. Our bootstrap approximations to In(R) and In(Θ) are

Unᵇ(R|ℓn) ≡ inf_{β∈Vn(θ̂n,ℓn)} {(Wnᵇ(θ̂n) + Dn(θ̂n)[p^{jn}′β])′(Wnᵇ(θ̂n) + Dn(θ̂n)[p^{jn}′β])}^{1/2}

Unᵇ(Θ|ℓn) ≡ inf_{β:‖p^{jn}′β‖2,∞≤√n ℓn} {(Wnᵇ(θ̂nᵘ) + Dn(θ̂nᵘ)[p^{jn}′β])′(Wnᵇ(θ̂nᵘ) + Dn(θ̂nᵘ)[p^{jn}′β])}^{1/2}.

Hence, a test based on In(R) rejects whenever In(R) exceeds the 1 − α quantile of
Unᵇ(R|ℓn) across bootstrap replications, while a test based on In(R) − In(Θ) rejects
whenever In(R) − In(Θ) exceeds the 1 − α quantile of Unᵇ(R|ℓn) − Unᵇ(Θ|ℓn).
Our critical values depend on the choices of rn and ℓn. The slackness parameter rn
again measures sampling uncertainty in whether constraints “bind” and we thus set

P( max_{d∈[0,1]} ∇²{θ̂nᵘ(d) − θ̂nᵘ⋆(d)} ≤ rn | Data) = 1 − γn

for θ̂nᵘ⋆ a “bootstrap” analogue to θ̂nᵘ and γn → 0 as in Section 2.1. The band-
width ℓn regularizes the local parameter space, and its main role is to ensure that
ℓn × (rate of convergence) = o(n^{−1/2}) in settings in which the rate of convergence is
slower than the usually required o(n^{−1/4}). We thus suggest setting ℓn to satisfy

P( max_{d∈[0,1]} |∇²{θ̂nᵘ(d) − θ̂nᵘ⋆(d)}| ≤ (1/√n)ℓn | Data) = 1 − γn.

In settings in which the rate of convergence is expected to be sufficiently fast, no regu-
larization is necessary and one may set ℓn = +∞; see Lemma 3.1 below.
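The numerical derivative Dn(θ)[p^{jn}′β] used above can be made concrete on simulated data with a polynomial basis; the data-generating process and the plugged-in estimate standing in for θ̂n are hypothetical placeholders.

```python
# Sketch of the step-1/sqrt(n) numerical derivative D_n(theta)[p'beta] used
# for the non-smooth quantile moments; all inputs are illustrative.
import numpy as np

rng = np.random.default_rng(0)
n, k_n, j_n = 400, 4, 3
D = rng.uniform(size=n)
Z = D
Y = 1.0 + 2.0 * D + rng.normal(size=n)
q = np.vander(Z, k_n, increasing=True)     # instrument transforms q^{kn}(Z_i)
p = np.vander(D, j_n, increasing=True)     # sieve basis p^{jn}(D_i)
theta_hat = 1.0 + 2.0 * D                  # placeholder for theta_n(D_i)

def D_n(beta):
    """(1/sqrt(n)) sum_i q(Z_i)(1{Y_i <= theta + p'beta/sqrt(n)} - 1{Y_i <= theta})."""
    bump = p @ beta / np.sqrt(n)
    diff = (Y <= theta_hat + bump).astype(float) - (Y <= theta_hat).astype(float)
    return q.T @ diff / np.sqrt(n)

deriv = D_n(np.ones(j_n))                  # derivative in the direction p' 1
```

Because the perturbation here is nonnegative, the coordinate corresponding to the constant transform is necessarily nonnegative, which offers a quick sanity check.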
3 General Theory
We next turn to developing a general inferential framework that encompasses the tests
discussed in Section 2 as special cases. The class of models we consider are those in
which the parameter of interest θ0 ∈ Θ satisfies J conditional moment restrictions

E_P[ρℓ(X, θ0) | Zℓ] = 0 for 1 ≤ ℓ ≤ J

with ρℓ : R^{dx} × Θ → R, X ∈ R^{dx}, Zℓ ∈ R^{dzℓ}, V ≡ (X, {Zℓ}_{ℓ=1}^J), and V ∼ P ∈ P.
In some of the applications that motivate us, such as Hausman and Newey (2016), the
parameter θ0 is not identified. It is therefore convenient to define the identified set

Θ0(P) ≡ {θ ∈ Θ : E_P[ρℓ(X, θ) | Zℓ] = 0 for 1 ≤ ℓ ≤ J}
and employ it as the basis of our statistical analysis. Formally, for a set R of parameters
satisfying a conjectured restriction, we develop a test for the hypothesis
H0 : Θ0(P) ∩ R ≠ ∅        H1 : Θ0(P) ∩ R = ∅;    (6)
i.e. we devise a test of whether at least one element of the identified set satisfies the
posited constraints. In an identified model, a test of (6) is thus equivalent to a test of
whether θ0 satisfies the hypothesized constraint. We denote the set of distributions P
satisfying the null hypothesis by P0 ≡ {P ∈ P : Θ0(P) ∩ R ≠ ∅}.
The defining elements determining the generality of the hypothesis allowed for by
(6) are the choices of Θ and R. In imposing restrictions on both Θ and R we therefore
aim to allow for a general framework while simultaneously ensuring enough structure
for a fruitful asymptotic analysis. To this end, we require Θ to be a subset of a complete
normed vector space B (i.e. a Banach space) and consider sets R with the structure
R ≡ {θ ∈ B : ΥF(θ) = 0 and ΥG(θ) ≤ 0},    (7)
where ΥF : B→ F and ΥG : B→ G are known maps. Our first assumption formalizes
the restrictions that we impose on the parameter space Θ and the restriction set R.
Assumption 3.1. (i) Θ ⊆ B, where B is a Banach space with norm ‖·‖B; (ii)
ΥF : B → F and ΥG : B → G, where F is a Banach space with norm ‖·‖F and G is
an AM space with order unit 1G and norm ‖·‖G.
Assumption 3.1(i) allows us to address parametric, semiparametric, and nonpara-
metric models. Assumption 3.1(ii) similarly imposes that ΥF take values in a Banach
space F, while ΥG is required to take values in an AM space G – we provide an overview
of AM spaces in the supplemental appendix. Heuristically, the essential properties of G
are: (i) G is a vector space equipped with a partial order “≤”; (ii) The partial order and
the vector space operations interact in the same manner they do on R (e.g. if θ1 ≤ θ2,
then θ1 + θ3 ≤ θ2 + θ3); and (iii) The order unit 1G ∈ G is an element such that for any
θ ∈ G there exists a scalar λ > 0 satisfying |θ| ≤ λ1G (e.g. when G = Rd we may set
1G ≡ (1, . . . , 1)′ ∈ Rd). These properties of an AM space will prove instrumental in our
analysis. In particular, the order unit 1G will provide a crucial link between the partial
order “≤” and the norm ‖·‖G, and (through smoothness of ΥG) will allow us to leverage
a rate of convergence in B to build a suitable sample analogue to the local parameter space.
3.1 The Test Statistic
We test the null hypothesis in (6) by employing sieve-GMM criterion based tests that
may be viewed as generalizations of the J-test of Sargan (1958) and Hansen (1982) or
the incremental J-test of Eichenbaum et al. (1988).
For each instrument Zℓ, we consider transformations {qk,n,ℓ}_{k=1}^{kn,ℓ} and let q_{n,ℓ}^{kn,ℓ}(z) ≡
(q1,n,ℓ(z), . . . , qkn,ℓ,n,ℓ(z))′. Further setting Z ≡ (Z1′, . . . , ZJ′)′, kn ≡ ∑_{ℓ=1}^J kn,ℓ, qn^{kn}(z) ≡
(q_{n,1}^{kn,1}(z1)′, . . . , q_{n,J}^{kn,J}(zJ)′)′, and ρ(x, θ) ≡ (ρ1(x, θ), . . . , ρJ(x, θ))′ we then let

(1/√n) ∑_{i=1}^n ρ(Xi, θ) ∗ qn^{kn}(Zi) ≡ (1/√n) ∑_{i=1}^n (ρ1(Xi, θ)q_{n,1}^{kn,1}(Zi,1)′, . . . , ρJ(Xi, θ)q_{n,J}^{kn,J}(Zi,J)′)′;

i.e. for each θ we obtain the scaled sample averages of the product of each “residual”
ρℓ(X, θ) with the transformations of its respective instrument Zℓ. For a possibly data
dependent kn × kn matrix Σn, we then construct a criterion function Qn : Θ → R+ by

Qn(θ) ≡ ‖(1/n) ∑_{i=1}^n ρ(Xi, θ) ∗ qn^{kn}(Zi)‖Σn,p,

where for any a ∈ R^{kn} we let ‖a‖Σn,p ≡ ‖Σna‖p and ‖·‖p be the p-norm on R^{kn} for any
p ≥ 2 – i.e. ‖a‖p^p ≡ ∑_{i=1}^d |a^{(i)}|^p for any a ≡ (a^{(1)}, . . . , a^{(d)})′ ∈ R^d. We note that setting
p = 2 leads to the computationally most attractive test. However, we allow for p ≠ 2 as
higher values of p enable us to establish coupling results under weaker conditions.
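The stacked moment vector ρ(Xi, θ) ∗ qn^{kn}(Zi) and the weighted p-norm criterion can be sketched for J = 2 equations; the residual functions, instruments, and weighting below are illustrative assumptions.

```python
# Sketch of the stacked moments rho(X_i, theta) * q_n^{kn}(Z_i) and the
# criterion Q_n with a general p-norm, for J = 2 illustrative equations.
import numpy as np

rng = np.random.default_rng(0)
n = 300
Z1, Z2 = rng.normal(size=n), rng.normal(size=n)   # instruments per equation
X = rng.normal(size=(n, 2))

def rho(x, theta):
    """Two placeholder residual functions rho_1 and rho_2."""
    return np.stack([x[:, 0] - theta[0], x[:, 1] - theta[1]], axis=1)

q1 = np.vander(Z1, 3, increasing=True)     # k_{n,1} = 3 transforms of Z_1
q2 = np.vander(Z2, 2, increasing=True)     # k_{n,2} = 2 transforms of Z_2

def stacked_moments(theta):
    r = rho(X, theta)
    # each residual multiplies only the transforms of its own instrument
    return np.concatenate([q1.T @ r[:, 0], q2.T @ r[:, 1]]) / n   # k_n = 5

def Q_n(theta, Sigma, p_norm):
    return np.linalg.norm(Sigma @ stacked_moments(theta), ord=p_norm)

Sigma = np.eye(5)
val2 = Q_n(np.zeros(2), Sigma, 2)          # computationally simplest choice
val4 = Q_n(np.zeros(2), Sigma, 4)          # larger p: weaker coupling conditions
```

For a fixed vector, the p-norm is nonincreasing in p, so `val4` can never exceed `val2`.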
Heuristically, √n Qn should diverge to infinity when evaluated at any θ ∉ Θ0(P)
and remain “stable” when evaluated at a θ ∈ Θ0(P). Thus, examining the minimum
of √n Qn over R should reveal whether there is a θ that simultaneously makes √n Qn
“stable” (θ ∈ Θ0(P)) and satisfies the conjectured restriction (θ ∈ R). This intuition
leads to a generalization of the J-test that is based on the test statistic

In(R) ≡ inf_{θ∈Θn∩R} √n Qn(θ),
where Θn ∩R is a finite dimensional subset of Θ ∩R that grows dense in Θ ∩R (Chen,
2007). Alternatively, we may recenter our test statistic by considering
In(R)− In(Θ).
Intuitively, relative to In(R), tests based on In(R) − In(Θ) can have higher power against
alternatives that satisfy Θ0(P) ≠ ∅ but lower power against alternatives under which
Θ0(P) = ∅ (Chen and Santos, 2018). The theoretical study of both statistics, however,
is similar. Specifically, we will obtain coupling results that apply to statistics In(R) and
suitable bootstrap counterparts for general sets R. As a result, they will also apply to
R = Θ, which immediately lets us analyze the recentered statistic as well.
Before proceeding, we introduce notation and assumptions we employ throughout
our analysis. For any function f of V we let Gn,P f ≡ (1/√n) ∑_{i=1}^n {f(Vi) − E_P[f(V)]}
denote the empirical process evaluated at f. We will often evaluate Gn,P on the class

Fn ≡ {ρℓ(·, θ) : θ ∈ Θn ∩ R and 1 ≤ ℓ ≤ J}.    (8)
The “size” of Fn plays a crucial role, and we control it through the bracketing integral

J[](δ, Fn, ‖·‖P,2) ≡ ∫₀^δ √(1 + log N[](ε, Fn, ‖·‖P,2)) dε,    (9)

where N[](ε, Fn, ‖·‖P,2) is the smallest number of ε-brackets (under ‖·‖P,2) required to
cover Fn. It will also prove useful to denote the vector subspace generated by Θn ∩ R
by Bn. If there are multiple norms ‖·‖A1 and ‖·‖A2 on Bn, we further set

Sn(A1, A2) ≡ sup_{b∈Bn} ‖b‖A1/‖b‖A2,    (10)

which we note depends on the sample size n only through the choice of sieve Θn ∩ R.
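When, as an illustrative assumption beyond the text, both norms on the finite-dimensional space Bn are Hilbert norms induced by positive definite Gram matrices, Sn(A1, A2) reduces to a generalized eigenvalue and is cheap to compute:

```python
# If ||b||_{A1}^2 = b' M1 b and ||b||_{A2}^2 = b' M2 b on the sieve (an
# illustrative assumption), then S_n(A1, A2)^2 is the largest generalized
# eigenvalue of the pair (M1, M2).
import numpy as np
from scipy.linalg import eigh

M1 = np.diag([4.0, 1.0])                   # illustrative Gram matrices
M2 = np.eye(2)
S_n = np.sqrt(eigh(M1, M2, eigvals_only=True)[-1])   # sup_b ||b||_A1/||b||_A2
```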
The following assumptions introduce the basic structure we employ.
Assumption 3.2. (i) {Vi}_{i=1}^∞ is an i.i.d. sequence with Vi ∼ P ∈ P.
Assumption 3.3. (i) sup_{1≤ℓ≤J} sup_{1≤k≤kn,ℓ} ‖qk,n,ℓ‖∞ ≤ Bn with Bn ≥ 1; (ii) The eigen-
values of E_P[q_{n,ℓ}^{kn,ℓ}(Zℓ)q_{n,ℓ}^{kn,ℓ}(Zℓ)′] are bounded in ℓ, n, P ∈ P; (iii) Fn has envelope Fn,
sup_{P∈P} ‖Fn‖P,2 < ∞, and sup_{P∈P} J[](‖Fn‖P,2, Fn, ‖·‖P,2) ≤ Jn for some Jn < ∞.
Assumption 3.4. (i) For each P ∈ P there is a Σn(P ) > 0 with ‖Σn − Σn(P )‖o,p =
oP (1) uniformly in P ∈ P; (ii) The matrices Σn(P ) are invertible for all n and P ∈ P;
(iii) ‖Σn(P )‖o,p and ‖Σn(P )−1‖o,p are uniformly bounded in n and P ∈ P.
Assumption 3.2 imposes that the sample {Vi}_{i=1}^n be i.i.d. with P belonging to a set of
distributions P over which our results will hold uniformly. By allowing Bn to depend on
n, Assumption 3.3(i) accommodates both transformations that are uniformly bounded
in n, such as trigonometric series, and those with diverging bound, such as normalized
B-splines. The bound on eigenvalues imposed in Assumption 3.3(ii) guarantees that
{qk,n,ℓ}_{k=1}^{kn,ℓ} are Bessel sequences uniformly in n, while Assumption 3.3(iii) controls the
“size” of the class Fn, which is crucial in studying the induced empirical process. We
note that Jn is allowed to diverge with the sample size and thus Assumption 3.3(iii)
accommodates non-compact parameter spaces Θ as in Chen and Pouzo (2012, 2015).
Alternatively, if the class F ≡ ⋃_{n=1}^∞ Fn is restricted to be Donsker, then Assumptions
3.3(ii)-(iii) can hold with uniformly bounded Jn and ‖Fn‖P,2. Finally, Assumption 3.4
requires the weighting matrix Σn to converge to an invertible matrix Σn(P ) – here,
‖ · ‖o,p denotes the operator norm when Rkn is endowed with ‖ · ‖p.
3.2 Strong Approximation
We begin our analysis by obtaining a strong approximation to statistics of the form
In(R). The results immediately apply to In(R)− In(Θ) as well by setting R = Θ.
3.2.1 Rate of Convergence
As a preliminary step, we first aim to characterize the rate of convergence of the (ap-
proximate) minimizers of Qn on Θn ∩ R. Formally, for any sequence τn ↓ 0, we let

Θnʳ ≡ {θ ∈ Θn ∩ R : Qn(θ) ≤ inf_{θ̃∈Θn∩R} Qn(θ̃) + τn},    (11)
which constitutes the set of exact (τn = 0) or near (τn > 0) minimizers of Qn. We
consider the general case with τn ↓ 0, as we employ both the set of exact and near
minimizers in characterizing and estimating the distribution of our test statistics.
Following the literature on set estimation (Chernozhukov et al., 2007; Beresteanu
and Molinari, 2008; Santos, 2011; Kaido and Santos, 2014), for any sets A and B we let
d⃗H(A, B, ‖·‖E) ≡ sup_{a∈A} inf_{b∈B} ‖a − b‖E

dH(A, B, ‖·‖E) ≡ max{d⃗H(A, B, ‖·‖E), d⃗H(B, A, ‖·‖E)},
which constitute the directed Hausdorff distance and the Hausdorff distance under a
norm ‖ · ‖E that need not equal ‖ · ‖B. Heuristically, in order to obtain a strong approx-
imation we require a convergence rate under a metric for which the empirical process is
equicontinuous – a purpose for which ‖ · ‖B is often too strong with its use leading to
overly stringent assumptions. By introducing a potentially weaker norm ‖ · ‖E we are
thus able to obtain better rate conditions in some infinite dimensional problems.
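For finite sets — e.g. estimates of the minimizer set evaluated on a grid — the directed and two-sided Hausdorff distances reduce to the elementary computation below; the point sets are illustrative.

```python
# Sketch of the directed and two-sided Hausdorff distances between finite
# point sets, under the Euclidean norm; A and B are illustrative.
import numpy as np

def d_H_directed(A, B):
    """sup_{a in A} inf_{b in B} ||a - b||."""
    return max(min(np.linalg.norm(a - b) for b in B) for a in A)

def d_H(A, B):
    return max(d_H_directed(A, B), d_H_directed(B, A))

A = [np.array([0.0, 0.0]), np.array([1.0, 0.0])]
B = [np.array([0.0, 0.0]), np.array([3.0, 0.0])]
# The directed distances need not coincide:
dAB = d_H_directed(A, B)   # = 1.0 (point (1,0) is 1 away from B)
dBA = d_H_directed(B, A)   # = 2.0 (point (3,0) is 2 away from A)
```

The asymmetry of the directed distance is exactly why Theorem 3.1(i) (a one-sided statement) holds under weaker conditions than Theorem 3.1(ii).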
For each θ ∈ Θ ∩ R, we further let Πnʳθ denote its approximation on Θn ∩ R and set

Θ0nʳ(P) ≡ {Πnʳθ : θ ∈ Θ0(P) ∩ R}.    (12)
It will also prove useful to define the population analogue to Qn(θ), which is given by
Qn,P(θ) ≡ ‖E_P[ρ(X, θ) ∗ qn^{kn}(Z)]‖Σn(P),p.
Our next set of assumptions suffices for deriving a rate of convergence for Θnʳ.
Assumption 3.5. For each P ∈ P0 there is a map Πnʳ : Θ ∩ R → Θn ∩ R such that
sup_{P∈P0} sup_{θ∈Θ0(P)∩R} {Qn,P(Πnʳθ) − inf_{θ̃∈Θn∩R} Qn,P(θ̃)} = O(JnBn kn^{1/p} √(log(1 + kn)/n)).

Assumption 3.6. (i) There is a norm ‖·‖E on Bn, sets Vn(P) ⊆ Θn ∩ R, and {νn}_{n=1}^∞
with νn^{−1} = O(1), such that Θnʳ ⊆ Vn(P) with probability tending to one uniformly in
P ∈ P0 and, for any θ ∈ Vn(P) and ηn ≡ JnBn kn^{1/p} √(log(1 + kn)/n), it follows that

νn^{−1} d⃗H(θ, Θ0nʳ(P), ‖·‖E) ≤ Qn,P(θ) − sup_{θ0∈Θ0nʳ(P)} Qn,P(θ0) + O(ηn);

(ii) Σn satisfies sup_{θ∈Θ0nʳ(P)} Qn,P(θ) × ‖Σn − Σn(P)‖o,p = OP(ηn) uniformly in P ∈ P0.
Assumption 3.5 formalizes the sense in which Θ0(P) ∩ R must be approximated by
Θ0nʳ(P). Since Θ0(P) ∩ R minimizes Qn,P over the entire parameter space, Assumption
3.5 requires that Θ0nʳ(P) be “close” to minimizing Qn,P over the sieve. Assumption
3.6(i) introduces a local identification condition by requiring that, on a set Vn(P), the
criterion Qn,P grow at rate νn^{−1} as θ moves away from Θ0nʳ(P). The parameter νn^{−1},
which implicitly depends on kn and the choice of sieve Θn ∩ R, is conceptually related
to the sieve measure of ill-posedness (Blundell et al., 2007). The set Vn(P) may be taken to
equal the entire sieve in convex models, or it may be taken to equal a local neighborhood
of Θr0n(P ) after establishing the consistency of Θr
n; see Lemma S.3.1 in the supplemental
appendix. Finally, Assumption 3.6(ii) imposes a rate of convergence requirement on Σn.
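The directed distance d⃗_H appearing in Assumption 3.6(i) is one-sided, unlike the two-sided Hausdorff distance used below. For intuition, both can be computed for finite point sets (a hypothetical discretization for illustration only; the sets in the paper live in function space):

```python
import numpy as np

def directed_hausdorff(A, B):
    """Directed distance: sup over a in A of the Euclidean distance from a to B."""
    A = np.atleast_2d(np.asarray(A, dtype=float))
    B = np.atleast_2d(np.asarray(B, dtype=float))
    pairwise = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)  # |A| x |B|
    return float(pairwise.min(axis=1).max())

def hausdorff(A, B):
    """Two-sided Hausdorff distance: the maximum of the two directed distances."""
    return max(directed_hausdorff(A, B), directed_hausdorff(B, A))
```

When A ⊆ B the directed distance from A to B is zero while the two-sided distance can be large; this asymmetry is why one-sided consistency is automatic but Hausdorff consistency requires further conditions.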
The next theorem employs Assumptions 3.5 and 3.6 to obtain a rate of convergence.
Theorem 3.1. Let Assumptions 3.2, 3.3(i), 3.3(iii), 3.4, 3.5, and 3.6 hold, and let

R_n ≡ ν_n k_n^{1/p} √log(1+k_n) J_n B_n / √n. (13)

Then uniformly in P ∈ P_0: (i) d⃗_H(Θ^r_n, Θ^r_{0n}(P), ‖·‖_E) = O_P(R_n + ν_n τ_n); and (ii) d_H(Θ^r_n, Θ^r_{0n}(P), ‖·‖_E) = O_P(ν_n τ_n) provided J_n B_n k_n^{1/p} √(log(1+k_n)/n) = o(τ_n).
Theorem 3.1(i) implies that, with arbitrarily high probability, Θ^r_n is contained in a neighborhood of Θ^r_{0n}(P) that shrinks at an R_n + ν_n τ_n rate. We further note that in identified models Θ^r_{0n}(P) is a singleton and Theorem 3.1(i) reduces to consistency in the Hausdorff metric. For partially identified models, however, Theorem 3.1(ii) further requires that τ_n not tend to zero too fast in order to obtain Hausdorff consistency. It is also worth noting that, under assumptions on the Hausdorff distance between Θ^r_{0n}(P) and Θ_0(P) ∩ R, Theorem 3.1 and the triangle inequality can yield a rate of convergence of Θ^r_n to Θ_0(P) ∩ R. Heuristically, we focus on convergence to Θ^r_{0n}(P) (instead of Θ_0(P) ∩ R) because our strong approximation will rely on undersmoothing.
While we employ Theorem 3.1 in our forthcoming analysis, we emphasize that in
specific applications alternative results that are better suited for the particular structure
of the model may be available. In this regard, we note Assumptions 3.5 and 3.6 are not
needed in our analysis beyond their role in delivering a rate of convergence. In particular,
if an alternative rate is derived under different assumptions, then such a result can still be
combined with our analysis to establish the validity of the proposed inferential methods.
3.2.2 Local Approximation
We follow the literature by observing that it is possible to accommodate non-differentiable moment functions by requiring their conditional moments to be differentiable (Chen and Pouzo, 2009, 2012). To this end, for any 1 ≤ ℓ ≤ J we define m_{P,ℓ} : B → L²_P by

m_{P,ℓ}(θ)(Z_ℓ) ≡ E_P[ρ_ℓ(X, θ)|Z_ℓ].
Our strong approximation then relies on the following assumptions.
Assumption 3.7. sup_{θ∈Θ_n∩R} ‖G_{n,P} ρ(·, θ) ∗ q^{k_n}_n − W_{n,P} ρ(·, θ) ∗ q^{k_n}_n‖_p = o_P(a_n) uniformly in P ∈ P for W_{n,P} Gaussian and {a_n}_{n=1}^∞ some known bounded sequence.

Assumption 3.8. There are κ_ρ > 0 and K_ρ < ∞ such that for all θ_1, θ_2 ∈ Θ_n ∩ R and P ∈ P it follows that E_P[‖ρ(X, θ_1) − ρ(X, θ_2)‖²_2] ≤ K²_ρ ‖θ_1 − θ_2‖_E^{2κ_ρ}.
Assumption 3.9. There is a norm ‖·‖_L on B_n, linear maps ∇m_{P,ℓ}(θ) : B → L²_P, and constants ε > 0 and K_m, M_m < ∞ such that for all P ∈ P, h ∈ B_n, and elements θ_0, θ_1 ∈ {θ ∈ Θ_n ∩ R : d⃗_H(θ, Θ^r_{0n}(P), ‖·‖_E) ≤ ε}: (i) ‖m_{P,ℓ}(θ_1) − m_{P,ℓ}(θ_0) − ∇m_{P,ℓ}(θ_0)[θ_1 − θ_0]‖_{P,2} ≤ K_m ‖θ_1 − θ_0‖_L ‖θ_1 − θ_0‖_E; (ii) ‖∇m_{P,ℓ}(θ_1)[h] − ∇m_{P,ℓ}(θ_0)[h]‖_{P,2} ≤ K_m ‖θ_1 − θ_0‖_L ‖h‖_E; and (iii) ‖∇m_{P,ℓ}(θ_0)[h]‖_{P,2} ≤ M_m ‖h‖_E.
Assumption 3.10. (i) k_n^{1/p} √log(1+k_n) B_n sup_{P∈P} J_{[ ]}(R_n^{κ_ρ}, F_n, ‖·‖_{P,2}) = o(a_n); (ii) sup_{P∈P_0} sup_{θ_0∈Θ^r_{0n}(P)} √n ‖E_P[ρ(X, θ_0) ∗ q^{k_n}_n(Z)]‖_{Σ_n(P),p} = o(a_n); (iii) ‖Σ_n − Σ_n(P)‖_{o,p} = o_P(a_n (k_n^{1/p} √log(1+k_n) B_n J_n)^{-1}) uniformly in P ∈ P_0.
Assumption 3.7 requires that the empirical process be coupled to a Gaussian process uniformly in P ∈ P. The sequence {a_n}_{n=1}^∞ denotes a bound on the rate of convergence
of the coupling, which in turn characterizes the rate of convergence of our strong ap-
proximation. In the supplemental appendix, we illustrate how to verify Assumption 3.7
by relying on available coupling results (Yurinskii, 1977; Zhai, 2018) or an extension of
Koltchinskii (1994) derived in Appendix S.8. Assumption 3.8 imposes a mild restric-
tion on the moment functions that ensures Wn,P is appropriately equicontinuous with
respect to ‖ · ‖E. Assumption 3.9(i) demands that the conditional moment functions
mP, be locally approximated by linear maps ∇mP, with an approximation error that
is controlled by the product of ‖ · ‖E and a potentially stronger norm ‖ · ‖L. In turn,
Assumptions 3.9(ii)(iii) impose continuity conditions on ∇mP, – these assumptions are
not required to obtain a strong approximation but are needed for our inference results.
Finally, Assumption 3.10 contains our rate restrictions: Assumption 3.10(i) ensures the
rate of convergence Rn is sufficiently fast to overcome an asymptotic loss of equiconti-
nuity; Assumption 3.10(ii) is needed for the test statistic to be properly centered under
the null hypothesis; and Assumption 3.10(iii) controls the rate of convergence of Σn.
We obtain a strong approximation by employing a local reparametrization. Formally, for any θ ∈ Θ_n ∩ R we denote the collection of local deviations from θ by

V_n(θ, ℓ) ≡ {h/√n ∈ B_n : θ + h/√n ∈ Θ_n ∩ R and ‖h/√n‖_E ≤ ℓ}. (14)

The normalization by √n is notationally convenient but otherwise plays no particular role since ℓ may depend on n. In addition, for any θ_0 ∈ Θ^r_{0n}(P) and h ∈ B_n we let

D_{n,P}(θ_0)[h] ≡ E_P[∇m_P(θ_0)[h](Z) ∗ q^{k_n}_n(Z)],

where ∇m_P(θ_0)[h](Z) ≡ (∇m_{P,1}(θ_0)[h](Z_1), . . . , ∇m_{P,J}(θ_0)[h](Z_J))′. For any given sequence ℓ_n, we then define a sequence of random variables U_{n,P}(R|ℓ_n) to be given by

U_{n,P}(R|ℓ_n) ≡ inf_{θ_0∈Θ^r_{0n}(P)} inf_{h/√n∈V_n(θ_0,ℓ_n)} ‖W_{n,P} ρ(·, θ_0) ∗ q^{k_n}_n + D_{n,P}(θ_0)[h]‖_{Σ_n(P),p}. (15)
The next result establishes the relation between Un,P (R|`n) and our test statistic.
It is helpful to recall here that the notation Sn(L,E) was introduced in (10) and that
the constant Km, introduced in Assumption 3.9, equals zero for linear models.
Theorem 3.2. Let Assumptions 3.2, 3.3, 3.4, 3.6, 3.7, 3.8, 3.9(i), and 3.10 hold. Then:
(i) For any ℓ_n ↓ 0 satisfying k_n^{1/p} √log(1+k_n) B_n × sup_{P∈P} J_{[ ]}(ℓ_n^{κ_ρ}, F_n, ‖·‖_{P,2}) = o(a_n) and K_m ℓ_n² × S_n(L,E) = o(a_n n^{-1/2}) it follows uniformly in P ∈ P_0 that:

I_n(R) ≤ U_{n,P}(R|ℓ_n) + o_P(a_n).

(ii) If in addition K_m R_n² × S_n(L,E) = o(a_n n^{-1/2}), then ℓ_n may additionally be chosen to satisfy R_n = o(ℓ_n), in which case it follows uniformly in P ∈ P_0 that:

I_n(R) = U_{n,P}(R|ℓ_n) + o_P(a_n).
Theorem 3.2 is perhaps best understood as establishing the validity of a family (indexed by ℓ_n) of strong approximations that differ in the size of the neighborhoods of Θ^r_{0n}(P) that they employ. Its proof crucially relies on the local approximation

√n E_P[ρ(X, θ_0 + h/√n) ∗ q^{k_n}_n(Z)] ≈ √n E_P[ρ(X, θ_0) ∗ q^{k_n}_n(Z)] + D_{n,P}(θ_0)[h], (16)

which is valid for nonlinear moments (K_m ≠ 0) whenever h/√n lies in a sufficiently small neighborhood of θ_0 – i.e. h/√n ∈ V_n(θ_0, ℓ_n) for ℓ_n sufficiently small. By Theorem 3.1, the minimizer of Q_n is with high probability contained in an R_n-shrinking neighborhood of Θ^r_{0n}(P) and as a result has the structure θ_0 + h/√n for some θ_0 ∈ Θ^r_{0n}(P), h/√n ∈ V_n(θ_0, ℓ_n), and appropriately selected ℓ_n. In particular, whenever R_n tends to zero sufficiently fast (or moments are linear), the minimizer of Q_n will be contained in a neighborhood for which (16) holds, which delivers the conclusion of Theorem 3.2(ii).
Regrettably, in certain nonlinear models the rate of convergence may be too slow and the minimizer of Q_n may fail to be contained in neighborhoods for which the approximation in (16) is accurate. In such instances, we may instead rely on the set inclusion

I_n(R) = inf_{θ∈Θ_n∩R} √n Q_n(θ) ≤ inf_{θ_0∈Θ^r_{0n}(P)} inf_{h/√n∈V_n(θ_0,ℓ_n)} √n Q_n(θ_0 + h/√n) (17)

and successfully couple the right hand side of (17) by restricting attention to sequences ℓ_n for which (16) is accurate. Thus, by regularizing the local parameter space through a norm bound, we obtain in Theorem 3.2(i) a distributional approximation that, while potentially conservative, holds under weaker requirements on the rate of convergence.
3.3 Bootstrap Approximation
Theorem 3.2 provides us with a family of distributions (indexed by `n) whose quantiles
may be employed as critical values for our test. We next focus on devising a bootstrap
procedure for estimating the distribution of Un,P (R|`n) for appropriate `n.
3.3.1 Bootstrap Statistic
We estimate the distribution of U_{n,P}(R|ℓ_n) by replacing population parameters with suitable sample analogues. The key ingredients are: (i) a random variable W_n whose distribution conditional on the data is consistent for the distribution of W_{n,P}; (ii) an estimator D_n(θ) for D_{n,P}(θ); and (iii) a sample analogue, denoted V_n(θ, ℓ_n) below, of the local parameter space in (14). We then approximate the distribution of U_{n,P}(R|ℓ_n) by that of

U_n(R|ℓ_n) ≡ inf_{θ∈Θ^r_n} inf_{h/√n∈V_n(θ,ℓ_n)} ‖W_n ρ(·, θ) ∗ q^{k_n}_n + D_n(θ)[h]‖_{Σ_n,p} (18)

conditional on the data, where recall Θ^r_n is the set estimator introduced in (11).
For concreteness, we employ the following sample analogues in our construction.

Gaussian Distribution: We estimate the distribution of the Gaussian process W_{n,P} by relying on the multiplier bootstrap (Ledoux and Talagrand, 1988). Specifically, for an i.i.d. sample {ω_i}_{i=1}^n with ω_i ∼ N(0,1) and independent of {V_i}_{i=1}^n we set

W_n f ≡ (1/√n) ∑_{i=1}^n ω_i {f(V_i) − (1/n) ∑_{j=1}^n f(V_j)} (19)

for any function f. Since {ω_i}_{i=1}^n is independent of {V_i}_{i=1}^n, it follows that, conditionally on {V_i}_{i=1}^n, the distribution of W_n is centered Gaussian and its covariance kernel equals the sample analogue to the unknown covariance kernel of W_{n,P}.
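In finite dimensions, one draw of W_n amounts to perturbing centered moment evaluations by standard normal weights. A minimal sketch (illustrative names; in the paper f ranges over the functions ρ(·, θ) ∗ q^{k_n}_n):

```python
import numpy as np

def multiplier_draw(f_vals, rng):
    """One draw of W_n f as in (19): f_vals is an (n, m) array whose columns
    hold f(V_1), ..., f(V_n) for m different functions f."""
    n = f_vals.shape[0]
    omega = rng.standard_normal(n)            # omega_i ~ N(0,1), independent of the data
    centered = f_vals - f_vals.mean(axis=0)   # f(V_i) - (1/n) sum_j f(V_j)
    return omega @ centered / np.sqrt(n)      # (m,) vector of process evaluations
```

Conditional on the data, the draw is centered Gaussian with the sample covariance kernel; note that constant functions are mapped to exactly zero by the recentering.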
The Derivative: We estimate D_{n,P}(θ) by employing a construction that is applicable to non-differentiable moments. Specifically, for any θ ∈ Θ_n ∩ R and h ∈ B_n we set

D_n(θ)[h] ≡ (1/√n) ∑_{i=1}^n {ρ(X_i, θ + h/√n) − ρ(X_i, θ)} ∗ q^{k_n}_n(Z_i). (20)

While we focus our theoretical study on D_n(θ) due to its general applicability, alternative approaches may be preferable in models with differentiable moments. In particular, under differentiability, it is computationally preferable to employ

(1/n) ∑_{i=1}^n ∇_θ ρ(X_i, θ)[h] ∗ q^{k_n}_n(Z_i),

for ∇_θ ρ(x, θ)[h] ≡ (∂/∂τ) ρ(x, θ + τh)|_{τ=0}, as an estimator for D_{n,P}(θ)[h].
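For a scalar moment function, the estimator in (20) is a one-line finite-difference computation. The sketch below (hypothetical interface: rho maps data and a parameter to an (n,) vector) recovers the analytic-derivative version exactly when ρ is linear in θ:

```python
import numpy as np

def D_n(rho, X, Q, theta, h):
    """Numerical-derivative estimator (20) for a scalar moment function:
    n^{-1/2} sum_i { rho(X_i, theta + h/sqrt(n)) - rho(X_i, theta) } * q(Z_i).
    X is the (n, dx) data matrix and Q the (n, k) instrument matrix q(Z_i)."""
    n = X.shape[0]
    diff = rho(X, theta + h / np.sqrt(n)) - rho(X, theta)  # (n,) finite differences
    return (diff[:, None] * Q).sum(axis=0) / np.sqrt(n)    # (k,) estimate of D_{n,P}[h]
```

For example, with ρ(x, θ) = y − x'θ the finite difference equals the analytic directional derivative scaled by 1/√n, so (20) coincides with the average-derivative formula above.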
Local Parameter Space: In order to account for the role inequality constraints play in determining the local parameter space, we conservatively estimate "binding" sets in analogy to approaches pursued in the partial identification literature; see, e.g., Chernozhukov et al. (2007); Galichon and Henry (2009); Linton et al. (2010); Andrews and Soares (2010). Specifically, for a sequence r_n and any θ ∈ Θ_n ∩ R we define

G_n(θ) ≡ {h/√n ∈ B_n : Υ_G(θ + h/√n) ≤ (Υ_G(θ) − K_g r_n ‖h/√n‖_B 1_G) ∨ (−r_n 1_G)}, (21)

where recall 1_G is the order unit in G and g_1 ∨ g_2 represents the supremum of any g_1, g_2 ∈ G. The constant K_g, formally introduced in Assumption 3.11 below, is related to the curvature of Υ_G and equals zero for linear Υ_G. For any ℓ_n we then define the sample analogue

V_n(θ, ℓ_n) ≡ {h/√n ∈ B_n : h/√n ∈ G_n(θ), Υ_F(θ + h/√n) = 0, and ‖h/√n‖_B ≤ ℓ_n}; (22)

i.e. in comparison to the population local parameter space in (14) we: (i) replace Υ_G(θ + h/√n) ≤ 0 by h/√n ∈ G_n(θ); (ii) retain Υ_F(θ + h/√n) = 0; and (iii) substitute ‖h/√n‖_E ≤ ℓ_n with ‖h/√n‖_B ≤ ℓ_n.
3.3.2 Local Parameter Space
The construction of a suitable sample analogue for the local parameter space is arguably
the most novel aspect of the proposed bootstrap procedure. We next discuss the key
assumptions that ensure its validity and provide intuition for our construction.
In what follows, we will impose assumptions on ε-neighborhoods of Θ^r_{0n}(P) denoted

(Θ^r_{0n}(P))^ε ≡ {θ ∈ B_n : d⃗_H(θ, Θ^r_{0n}(P), ‖·‖_B) ≤ ε}.

In addition, we denote the closure of the linear span of Υ_F(B_n) by F_n, and for any linear map Γ on B we let N(Γ) ≡ {h ∈ B : Γ(h) = 0} denote its null space. Given these definitions, we next impose the following requirements on the maps Υ_F and Υ_G.
Assumption 3.11. For some K_g, M_g < ∞, ε > 0 and all n, P ∈ P_0, θ_1, θ_2 ∈ (Θ^r_{0n}(P))^ε: (i) Υ_G is Fréchet differentiable with ‖Υ_G(θ_1) − Υ_G(θ_2) − ∇Υ_G(θ_1)[θ_1 − θ_2]‖_G ≤ K_g ‖θ_1 − θ_2‖²_B; (ii) ‖∇Υ_G(θ_1) − ∇Υ_G(θ_2)‖_o ≤ K_g ‖θ_1 − θ_2‖_B; (iii) ‖∇Υ_G(θ_1)‖_o ≤ M_g.

Assumption 3.12. For some K_f, M_f < ∞, ε > 0 and all n, P ∈ P_0, θ_1, θ_2 ∈ (Θ^r_{0n}(P))^ε: (i) Υ_F is Fréchet differentiable with ‖Υ_F(θ_1) − Υ_F(θ_2) − ∇Υ_F(θ_1)[θ_1 − θ_2]‖_F ≤ K_f ‖θ_1 − θ_2‖²_B; (ii) ‖∇Υ_F(θ_1) − ∇Υ_F(θ_2)‖_o ≤ K_f ‖θ_1 − θ_2‖_B; (iii) ‖∇Υ_F(θ_1)‖_o ≤ M_f; (iv) ∇Υ_F(θ_1) : B_n → F_n admits a right inverse ∇Υ_F(θ_1)⁻ with K_f ‖∇Υ_F(θ_1)⁻‖_o ≤ M_f.

Assumption 3.13. Either (i) Υ_F : B → F is affine, or (ii) there are constants ε > 0, K_d < ∞ such that for every P ∈ P_0, n, and θ_0 ∈ Θ^r_{0n}(P) there exists an h_0 ∈ B_n ∩ N(∇Υ_F(θ_0)) satisfying Υ_G(θ_0) + ∇Υ_G(θ_0)[h_0] ≤ −ε 1_G and ‖h_0‖_B ≤ K_d.
Figure 1: Impact of inequality constraints for the restriction θ_0 ≤ 0. Left panel: true local parameter space V_n(θ_0, +∞) in the shaded region. Right panel: approximation G_n(θ_n) at an estimator θ_n in the shaded region.

Assumption 3.11 imposes that Υ_G be Fréchet differentiable. The constant K_g, employed in the construction of V_n(θ, ℓ), may be interpreted as a bound on the second derivative of Υ_G and equals zero when Υ_G is linear. Assumptions 3.12 and 3.13 mark
an important difference between hypotheses in which Υ_F is linear and those in which Υ_F is nonlinear – note linear Υ_F automatically satisfy Assumptions 3.12 and 3.13. This distinction reflects that when Υ_F is linear its impact on the local parameter space is known and need not be estimated.³ Thus, while Assumptions 3.12(i)-(iii) impose conditions analogous to those required of Υ_G, Assumption 3.12(iv) additionally demands that ∇Υ_F(θ) possess a norm-bounded right inverse on (Θ^r_{0n}(P))^ε – the existence of a right inverse is equivalent to a classical rank condition.⁴ Finally, for nonlinear Υ_F, Assumption 3.13(ii) requires the existence of a local perturbation to any θ_0 ∈ Θ^r_{0n}(P) that relaxes "active" inequality constraints without a first order effect on the equality restrictions.
Figure 1 illustrates how we account for inequality constraints in the case in which B is the set of continuous functions on R and we aim to test whether θ_0(x) ≤ 0 for all x ∈ R. Absent equality constraints, the local parameter space for θ_0 consists of the set of perturbations h/√n such that θ_0 + h/√n remains negative. For an estimator θ_n of θ_0, G_n(θ_n) then consists of the h/√n such that at any x either (i) θ_n(x) + h(x)/√n is not "too close" to zero (to account for sampling uncertainty) or (ii) h(x)/√n is negative (since all negative h/√n belong to the local parameter space for θ_0). As θ_n converges to θ_0, G_n(θ_n) is asymptotically contained in the local parameter space of θ_0. Unlike in Figure 1, whenever Υ_G is nonlinear we must further account for the curvature of Υ_G, which motivates the presence of the term K_g r_n ‖h/√n‖_B 1_G in (21).
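For the scalar restriction of Figure 1 evaluated on a grid (so Υ_G is linear and K_g = 0), membership in G_n(θ_n) reduces to an elementwise comparison. A grid-based sketch, purely for illustration:

```python
import numpy as np

def in_G_n(theta_n, h, n, r_n, tol=1e-12):
    """Check h/sqrt(n) in G_n(theta_n) from (21) for the linear restriction
    theta(x) <= 0 on a grid: theta_n + h/sqrt(n) <= theta_n v (-r_n), elementwise.
    Where theta_n(x) > -r_n ("nearly binding") this forces h(x) <= 0; elsewhere
    it only requires theta_n(x) + h(x)/sqrt(n) <= -r_n."""
    lhs = np.asarray(theta_n) + np.asarray(h) / np.sqrt(n)
    rhs = np.maximum(np.asarray(theta_n), -r_n)   # Upsilon_G(theta_n) v (-r_n 1_G)
    return bool(np.all(lhs <= rhs + tol))
```

The max in rhs is exactly the lattice supremum "∨" of (21) specialized to pointwise evaluation: nearly binding grid points admit only negative perturbations, while strictly slack points admit any perturbation that stays r_n away from zero.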
³For linear Υ_F, the requirement Υ_F(θ + h/√n) = 0 is equivalent to Υ_F(h) = 0 for any θ ∈ R.
⁴Recall that for a linear map Γ : B_n → F_n, its right inverse is a map Γ⁻ : F_n → B_n such that ΓΓ⁻(h) = h for any h ∈ F_n. The right inverse Γ⁻ need not be unique if Γ is not bijective, in which case Assumption 3.12(iv) is satisfied as long as it holds for some right inverse of ∇Υ_F(θ) : B_n → F_n.

Figure 2: Impact of equality constraints. Left panel: set of θ satisfying a nonlinear equality restriction. Right panel: corresponding local parameter spaces (absent inequality constraints).

Figure 2 illustrates how regularizing with a ‖·‖_B-norm constraint allows us to account
for the impact of equality constraints on the local parameter space when B = R² and F = R. In this setting, the constraint Υ_F(θ) = 0 and the local parameter spaces V_n(θ, +∞) generate curves in R². Since all curves V_n(θ, +∞) pass through zero, they are all "close" in a neighborhood of the origin. However, for nonlinear Υ_F the size of the neighborhood of the origin in which V_n(θ_n, +∞) is "close" to V_n(θ_0, +∞) crucially depends on both the distance of θ_n to θ_0 and the curvature of Υ_F. The set V_n(θ_n, ℓ_n) therefore accounts for the impact of equality constraints by restricting attention to the expanding neighborhood of the origin in which V_n(θ_n, +∞) resembles V_n(θ_0, +∞).
Remark 3.1. Whenever Υ_F and Υ_G are linear, controlling the norm ‖·‖_B is no longer necessary as the "curvature" of Υ_F and Υ_G is known. As a result, we may instead set

V_n(θ, ℓ_n) ≡ {h/√n ∈ B_n : h/√n ∈ G_n(θ), Υ_F(θ + h/√n) = 0, and ‖h/√n‖_E ≤ ℓ_n}. (23)

Controlling the norm ‖·‖_E, however, may still be necessary to ensure that D_n(θ)[h] is a consistent estimator for D_{n,P}(θ)[h] uniformly in h/√n ∈ V_n(θ, ℓ_n).
3.3.3 Bootstrap Coupling
We impose a final set of assumptions in order to couple our bootstrap statistic.
Assumption 3.14. sup_{θ∈Θ_n∩R} ‖W_n ρ(·, θ) ∗ q^{k_n}_n − W*_{n,P} ρ(·, θ) ∗ q^{k_n}_n‖_p = o_P(a_n) uniformly in Φ × P with P ∈ P, for W*_{n,P} an isonormal Gaussian process that is independent of {V_i}_{i=1}^n and Φ the standard normal distribution.

Assumption 3.15. (i) For some K_b < ∞, ‖h‖_E ≤ K_b ‖h‖_B for all h ∈ B_n; (ii) there is an ε > 0 such that P((Θ^r_n)^ε ⊆ Θ_n) tends to one uniformly in P ∈ P_0.

Assumption 3.16. (i) Either Υ_F and Υ_G are affine or (R_n + ν_n τ_n) × S_n(B,E) = o(1); (ii) the sequences ℓ_n, τ_n satisfy k_n^{1/p} √log(1+k_n) B_n × sup_{P∈P} J_{[ ]}(ℓ_n^{κ_ρ} ∨ (ν_n τ_n)^{κ_ρ}, F_n, ‖·‖_{P,2}) = o(a_n), K_m ℓ_n (ℓ_n + R_n + ν_n τ_n) × S_n(L,E) = o(a_n n^{-1/2}), and ℓ_n (ℓ_n + (R_n + ν_n τ_n) × S_n(B,E)) 1{K_f > 0} = o(a_n n^{-1/2}); (iii) the sequence r_n satisfies lim sup_{n→∞} 1{K_g > 0} ℓ_n/r_n < 1/2 and (R_n + ν_n τ_n) × S_n(B,E) = o(r_n).
Assumption 3.14 demands that W_n be coupled with a Gaussian process W*_{n,P} independent of {V_i}_{i=1}^n. This condition implies the multiplier bootstrap is valid in our potentially non-Donsker setting; see Appendix S.9 for sufficient conditions. Assumption 3.15(i) ensures that ‖·‖_B is (weakly) stronger than ‖·‖_E. Assumption 3.15(ii) demands that Θ^r_n be asymptotically contained in the interior of Θ_n. We emphasize this requirement does not rule out that parameter space restrictions be binding at Θ^r_{0n}(P) – instead, Assumption 3.15(ii) requires that all such restrictions be explicitly stated through R. Assumption 3.16(i) ensures the one-sided Hausdorff convergence of Θ^r_n to Θ^r_{0n}(P) under ‖·‖_B whenever Υ_F or Υ_G are nonlinear. The main conditions on the sequence ℓ_n, employed in constructing V_n(θ, ℓ_n), are contained in Assumption 3.16(ii). These conditions ensure the consistency of D_n(θ)[h], the applicability of Theorem 3.2, and that the intuition behind Figure 2 be valid – recall the latter two are automatic when K_m = K_f = 0, which is reflected in Assumption 3.16(ii). Finally, Assumption 3.16(iii) requires that r_n not decrease to zero faster than the ‖·‖_B-rate of convergence of Θ^r_n.
Our next result provides a coupling for our bootstrap statistic. In its statement, U*_{n,P}(R|ℓ_n) is defined identically to U_{n,P}(R|ℓ_n) but with W*_{n,P} in place of W_{n,P}.

Theorem 3.3. If Assumptions 3.1, 3.2, 3.3, 3.4, 3.6, 3.7, 3.8, 3.9, 3.10(i), 3.10(iii), 3.11, 3.12, 3.13, 3.14, 3.15, and 3.16 hold, then there is ℓ̃_n ≍ ℓ_n so that uniformly in P ∈ P_0

U_n(R|ℓ_n) ≥ U*_{n,P}(R|ℓ̃_n) + o_P(a_n).
Theorem 3.3 shows that, with unconditional probability tending to one uniformly in P ∈ P_0, our bootstrap statistic is bounded from below by a random variable that is independent of the data. The significance of this result lies in that the lower bound is equal in distribution to the coupling to I_n(R) obtained in Theorem 3.2. Thus, Theorems 3.2 and 3.3 provide the basis for establishing that comparing I_n(R) to the quantiles of U_n(R|ℓ_n) conditional on the data provides asymptotic size control.
We conclude by showing that it is sometimes possible to set ℓ_n = +∞.

Lemma 3.1. Suppose there is an A_n(P) ⊆ Θ_n ∩ R such that ‖h‖_E ≤ ν_n ‖D_{n,P}(θ)[h]‖_p for all θ ∈ A_n(P) and h ∈ √n {B_n ∩ R − θ}. If the estimator D_n(θ) satisfies

sup_{θ∈A_n(P)} sup_{h∈√n{B_n∩R−θ} : ‖h/√n‖_B ≥ ℓ_n} ‖D_n(θ)[h] − D_{n,P}(θ)[h]‖_p / ‖h‖_E = o_P(ν_n^{-1}) (24)

and Θ^r_n ⊆ A_n(P) with probability tending to one uniformly in P ∈ P_0, Assumptions 3.3(i)(iii), 3.4, and 3.14 hold, and S_n(B,E) R_n = o(ℓ_n), then uniformly in P ∈ P_0

U_n(R|ℓ_n) = inf_{θ∈Θ^r_n} inf_{h/√n∈V_n(θ,+∞)} ‖W_n ρ(·, θ) ∗ q^{k_n}_n + D_n(θ)[h]‖_{Σ_n,p} + o_P(a_n). (25)
Heuristically, Lemma 3.1 establishes that the constraint ‖h/√n‖_B ≤ ℓ_n is asymptotically not binding provided ℓ_n ↓ 0 sufficiently slowly. In order for ℓ_n to simultaneously satisfy such a requirement and Assumptions 3.16(ii)-(iii), it must be that either the rate of convergence is adequately fast, or that both the moments and Υ_F are linear. Thus, setting ℓ_n < ∞ can be necessary in nonlinear problems with slow rates of convergence.
3.4 The Test
3.4.1 Critical Value: I_n(R)

For any statistic T_n that is a function of the data {V_i}_{i=1}^n, we let q_{τ,P}(T_n) denote its τ quantile when V_i ∼ P – e.g. the 1 − α quantile of I_n(R) is denoted by

q_{1−α,P}(I_n(R)) ≡ inf{u : P(I_n(R) ≤ u) ≥ 1 − α}.

Similarly, given any statistic T_n that is a function of {V_i}_{i=1}^n and the bootstrap weights {ω_i}_{i=1}^n, we let q_τ(T_n) denote its conditional τ quantile given {V_i}_{i=1}^n. In particular, Theorems 3.2 and 3.3 suggest that a valid critical value for I_n(R) is given by

q_{1−α}(U_n(R|ℓ_n)) ≡ inf{u : P(U_n(R|ℓ_n) ≤ u | {V_i}_{i=1}^n) ≥ 1 − α}.
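Operationally, this conditional quantile is approximated by redrawing only the multiplier weights {ω_i} while holding the data fixed, and taking an empirical quantile of the resulting bootstrap draws. A sketch with a hypothetical stat_fn that computes one draw of U_n(R|ℓ_n):

```python
import numpy as np

def bootstrap_critical_value(stat_fn, data, alpha, n_boot=999, seed=0):
    """Empirical 1 - alpha quantile of the bootstrap statistic conditional on
    the data: stat_fn(data, rng) returns one draw of U_n(R|ell_n) computed
    from fresh multiplier weights drawn via rng."""
    rng = np.random.default_rng(seed)
    draws = np.array([stat_fn(data, rng) for _ in range(n_boot)])
    return float(np.quantile(draws, 1.0 - alpha))
```

The test then rejects when I_n(R) exceeds this value; only the weights are resampled across the n_boot repetitions.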
We additionally impose one final assumption.

Assumption 3.17. There is a δ > 0 such that for all ε > 0 and α′ ∈ [α − δ, α + δ] we have sup_{P∈P_0} P(q_{1−α′,P}(I_n(R)) − ε ≤ I_n(R) ≤ q_{1−α′,P}(I_n(R)) + ε) ≤ ϱ_n (ε ∧ 1) + o(1), where the concentration parameter ϱ_n is smaller than the coupling rate (i.e. ϱ_n ≤ a_n^{-1}).

It is well known that uniformly consistent estimation of an approximating distribution is not sufficient for establishing asymptotic size control (Romano and Shaikh, 2012). Intuitively, in order to achieve size control the approximating distribution must be suitably continuous at the quantile of interest uniformly in P ∈ P_0. Assumption 3.17 imposes precisely this requirement, allowing the modulus of continuity, captured here by the concentration parameter ϱ_n, to deteriorate with the sample size provided that ϱ_n ≤ a_n^{-1} – i.e. the loss of continuity must occur at a slower rate than the coupling rate of Theorems
The next result establishes the asymptotic validity of a test based on I_n(R).

Corollary 3.1. Let Assumption 3.17 hold and the conditions of Theorem 3.2(i) and Theorem 3.3 be satisfied. If c_n = q_{1−α}(U_n(R|ℓ_n)), then it follows that:

lim sup_{n→∞} sup_{P∈P_0} P(I_n(R) > c_n) ≤ α.

We note that if there are no inequality constraints, then it is possible to show that the test in Corollary 3.1 is similar and its asymptotic size equals the nominal level α whenever the conditions of Theorem 3.2(ii) are satisfied. The consistency of the test against any P ∈ P \ P_0 for which max_{1≤ℓ≤J} ‖E_P[ρ_ℓ(X, θ)|Z_ℓ]‖_{P,2} is bounded away from zero (in θ ∈ Θ ∩ R) is also straightforward to establish under suitable conditions.
3.4.2 Critical Value: I_n(R) − I_n(Θ)

We next establish the asymptotic validity of a test based on I_n(R) − I_n(Θ). To this end, we will apply Theorems 3.2 and 3.3 to both R as in (7) and R = Θ, which in turn will require us to impose assumptions for both cases. Throughout, we therefore signify parameters associated with setting R = Θ by a "u" superscript – e.g. F^u_n, Θ^u_{0n}(P), and V^u_n(θ, ℓ) are understood to be as in (8), (12), and (14) but with R = Θ. Employing this notation, we then note that the analogue to U_{n,P}(R|ℓ_n) (as in (15)) is given by

U_{n,P}(Θ|ℓ_n) ≡ inf_{θ_0∈Θ^u_{0n}(P)} inf_{h/√n∈V^u_n(θ_0,ℓ_n)} ‖W_{n,P} ρ(·, θ_0) ∗ q^{k_n}_n + D_{n,P}(θ_0)[h]‖_{Σ_n(P),p}.
Our next result establishes a strong approximation to the recentered statistic.

Corollary 3.2. Let the conditions of Theorem 3.2(i) be satisfied with R as in (7) and the conditions of Theorem 3.2(ii) be satisfied with R = Θ. Then, for any ℓ_n, ℓ^u_n ↓ 0 satisfying k_n^{1/p} √log(1+k_n) B_n × {sup_{P∈P} J_{[ ]}(ℓ_n^{κ_ρ}, F_n, ‖·‖_{P,2}) + sup_{P∈P} J_{[ ]}((ℓ^u_n)^{κ_ρ}, F^u_n, ‖·‖_{P,2})} = o(a_n), K_m (ℓ^u_n)² × S^u_n(L,E) = o(a_n), and R^u_n = o(ℓ^u_n), it follows uniformly in P ∈ P_0 that

I_n(R) − I_n(Θ) ≤ U_{n,P}(R|ℓ_n) − U_{n,P}(Θ|ℓ^u_n) + o_P(a_n).
Corollary 3.2 follows immediately by applying Theorem 3.2(i) to couple I_n(R) and Theorem 3.2(ii) to couple I_n(Θ). In particular, it is worth emphasizing that in coupling I_n(Θ) we must rely on Theorem 3.2(ii) instead of Theorem 3.2(i) in order to ensure that I_n(R) − I_n(Θ) is weakly smaller than its strong approximation. As a result, whenever moments are nonlinear, Corollary 3.2 requires the rate of convergence of the unconstrained estimator to be sufficiently fast for Theorem 3.2(ii) to apply.

⁵Alternatively, Assumption 3.17 can be dispensed with by adding a fixed constant η > 0 to the critical value – i.e. using q_{1−α}(U_n(R|ℓ_n)) + η as the critical value (Andrews and Shi, 2013).
In order to obtain a bootstrap approximation, we define the following estimators:

Θ^u_n ≡ {θ ∈ Θ_n : Q_n(θ) ≤ inf_{θ̃∈Θ_n} Q_n(θ̃) + τ^u_n}    V^u_n(ℓ) ≡ {h/√n ∈ B^u_n : ‖h/√n‖_E ≤ ℓ};

i.e. Θ^u_n is simply the set estimator of Section 3.2.1 applied with R = Θ, while V^u_n(ℓ) is the local parameter space sample analogue of Section 3.3.2 applied with no equality or inequality constraints. As a bootstrap approximation to U_{n,P}(Θ|ℓ^u_n) we then employ

U_n(Θ|ℓ^u_n) ≡ inf_{θ∈Θ^u_n} inf_{h/√n∈V^u_n(ℓ^u_n)} ‖W_n ρ(·, θ) ∗ q^{k_n}_n + D_n(θ)[h]‖_{Σ_n,p}.
The next corollary obtains a coupling for our bootstrap approximation.

Corollary 3.3. Let the conditions of Theorem 3.3 be satisfied with R as in (7) and with R = Θ, and suppose J^u_n B_n k_n^{1/p} √(log(1+k_n)/n) = o(τ^u_n). Then, it follows that there are ℓ̃_n ≍ ℓ_n and ℓ̃^u_n ≍ ℓ^u_n such that uniformly in P ∈ P_0 we have

U_n(R|ℓ_n) − U_n(Θ|ℓ^u_n) ≥ U*_{n,P}(R|ℓ̃_n) − U*_{n,P}(Θ|ℓ̃^u_n) + o_P(a_n).
Corollary 3.3 simply follows by applying Theorem 3.3 to U_n(R|ℓ_n) and coupling U_n(Θ|ℓ^u_n) to U*_{n,P}(Θ|ℓ̃^u_n). For the latter coupling, it is important that Θ^u_n be consistent for Θ^u_{0n}(P) in the Hausdorff metric instead of the directed Hausdorff metric. While both metrics are equivalent for identified models, and hence we may set τ^u_n = 0, in partially identified models we require that τ^u_n tend to zero sufficiently slowly, as in Theorem 3.1(ii). We further note that in identified models it is possible to employ either W_n ρ(·, θ_n) or W_n ρ(·, θ^u_n) in constructing both U_n(R|ℓ_n) and U_n(Θ|ℓ^u_n) – a change that results in an asymptotically equivalent coupling but ensures that the bootstrap statistic is positive.
The asymptotic validity of a test that rejects whenever I_n(R) − I_n(Θ) exceeds the appropriate quantile of our bootstrap approximation immediately follows. In addition, we note that it is always possible to set ℓ^u_n = +∞ – a choice that does not lead to a loss of power under conditions related to those stated in Lemma 3.1.

Corollary 3.4. Let Assumption 3.17 hold with I_n(R) − I_n(Θ) instead of I_n(R), and the conditions of Corollaries 3.2 and 3.3 be satisfied. If c_n = q_{1−α}(U_n(R|ℓ_n) − U_n(Θ|ℓ^u_n)), then:

lim sup_{n→∞} sup_{P∈P_0} P(I_n(R) − I_n(Θ) > c_n) ≤ α.

Moreover, the same conclusion holds if we instead set c_n = q_{1−α}(U_n(R|ℓ_n) − U_n(Θ|+∞)).
4 Heterogeneity and Demand Analysis
For our final example, we illustrate how to conduct inference in the heterogeneous demand model of Hausman and Newey (2016). Specifically, for Y ∈ [0,1] equal to the expenditure share on a commodity, W ∈ W ⊆ R^{d_w} a vector of prices, income, and covariates, and η representing unobserved individual heterogeneity, we suppose

Y = g(W, η), (26)

where g is a known function of (W, η). The requirement that g be known is not necessarily restrictive, as η can in principle be infinite dimensional. For instance, Hausman and Newey (2016) show that if the expenditure share is actually given by Y = ḡ(W, ε) for some unknown function ḡ and unobserved ε, then under appropriate conditions

Y = ḡ(W, ε) = ∑_{j=1}^∞ ψ_j(W) β_j(ε), (27)

where {ψ_j}_{j=1}^∞ is a known basis with ∑_{j=1}^∞ ψ_j²(W) finite and {β_j(ε)}_{j=1}^∞ are unknown coefficients. Setting η = {β_j(ε)}_{j=1}^∞ and viewing η as a random variable in the sequence space ℓ² ≡ {{a_j}_{j=1}^∞ : ∑_j a_j² < ∞}, we then note that (27) reduces to (26) with g known.
If the covariates W are independent of η, then for any c ∈ R it follows that

P(Y ≤ c|W) = P(g(W, η) ≤ c|W) = ∫ 1{g(W, η) ≤ c} ν_P(dη), (28)

where ν_P denotes the unknown distribution of η. Result (28) restricts the possible values of ν_P and hence the identified set for average exact consumer surplus, average share, and other functionals of interest. Specifically, for Ψ(g, η) an object of interest for preferences denoted by η, such as equivalent variation, Hausman and Newey (2016) study

∫ Ψ(g, η) ν_P(dη), (29)

which is the average across individuals. By evaluating the set of values of (29) that can be generated by a distribution ν_P satisfying (28) at a grid {c_ℓ}_{ℓ=1}^J, Hausman and Newey (2016) provide estimates of the identified set for (29). We further note that bounds on the distribution of Ψ(g, η) under ν_P can be obtained by replacing Ψ(g, η) in (29) with an indicator that Ψ(g, η) be less than or equal to some real number.
In what follows, we apply our results to conduct inference on functionals as in (29). To this end, we let F_P(c|W) ≡ P(Y ≤ c|W) and, for a given grid {c_ℓ}_{ℓ=1}^J, set θ_0(P) = ({F_P(c_ℓ|W)}_{ℓ=1}^J, ν_P). To define B, we suppose η ∈ Ω for some known Hausdorff space Ω, set B to be the Borel σ-algebra on Ω, let M(B) be the space of regular signed Borel measures on Ω, and let ‖·‖_{TV} denote the total variation norm. Assuming F_P(c_ℓ|·) ∈ C_B(W) for C_B(W) the space of continuous and bounded functions on W, we define B ≡ (⊗_{ℓ=1}^J C_B(W)) × M(B) and for any θ = ({F(c_ℓ|·)}_{ℓ=1}^J, ν) ∈ B let ‖θ‖_B = ∑_{ℓ=1}^J ‖F(c_ℓ|·)‖_∞ + ‖ν‖_{TV}. As the parameter space Θ we then set

Θ ≡ {θ = ({F(c_ℓ|·)}_{ℓ=1}^J, ν) ∈ B : max_{1≤ℓ≤J} ‖F(c_ℓ|·)‖_∞ ≤ 2}, (30)

where the "2" norm bound is simply selected to ensure θ_0(P) is in the interior of Θ. Letting X ≡ (Y, W) and Z_ℓ = W for every 1 ≤ ℓ ≤ J, we then define

ρ_ℓ(X, θ) = 1{Y ≤ c_ℓ} − F(c_ℓ|W), (31)
which yields conditional moment restrictions that identify F_P(c_ℓ|W) – ν_P, however, is potentially partially identified. Also let G = ℓ^∞(B), which is an AM space under ‖·‖_∞ with order unit 1_G satisfying 1_G(B) = 1 for any B ∈ B. Defining Υ_G : B → ℓ^∞(B) by

Υ_G(θ)(B) = −ν(B) (32)

for any set B ∈ B and θ = ({F(c_ℓ|·)}_{ℓ=1}^J, ν) ∈ B then enables us to impose the restriction that ν be a positive measure through the inequality Υ_G(θ) ≤ 0. Finally, for a grid {w_l}_{l=1}^L we let Υ_F : B → R^{JL+2} be given by Υ_F(θ) = (Υ^{(e)}_F(θ), Υ^{(ν)}_F(θ), Υ^{(s)}_F(θ)), where

Υ^{(e)}_F(θ) = {F(c_ℓ|w_l) − ∫ 1{g(w_l, η) ≤ c_ℓ} ν(dη)}_{1≤ℓ≤J, 1≤l≤L} (33)

Υ^{(ν)}_F(θ) = ν(Ω) − 1 (34)

Υ^{(s)}_F(θ) = ∫ Ψ(g, η) ν(dη) − λ; (35)

i.e. (33) reflects that (28) holds, (34) demands that ν have total measure one, and (35) lets us impose that the average object of interest be equal to a hypothesized value λ. By conducting test inversion in λ we can obtain a confidence region for the functional in (29).
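The test-inversion step is a scan over hypothesized values λ, retaining every value at which the test does not reject (a sketch; the statistic and critical value are hypothetical callables standing in for I_n(R) and c_n at each λ):

```python
def confidence_region(test_stat, critical_value, lam_grid):
    """Invert the test: collect every lambda on the grid at which the
    restricted statistic does not exceed its critical value. The result is a
    (1 - alpha)-confidence region for the functional in (29)."""
    return [lam for lam in lam_grid if test_stat(lam) <= critical_value(lam)]
```

Each grid point requires re-solving the constrained problem with Υ^{(s)}_F pinned at that λ, so in practice the grid is chosen around a preliminary estimate of the identified set.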
As in Hausman and Newey (2016), we can impose utility maximization by requiring
that Ω consist only of η such that g(·, η) satisfies the Slutsky conditions. One may
sample from Ω by drawing randomly from sets of η that satisfy Slutsky symmetry and
only keeping those where the compensated price effects matrix is negative semidefinite
on a grid. This is the procedure followed in Hausman and Newey (2016) for two goods.
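With two goods there is a single demand equation, so Slutsky symmetry is vacuous and the negative semidefiniteness check reduces to a scalar sign condition s(p, y) = ∂q/∂p + q · ∂q/∂y ≤ 0 on a grid. A sketch of the rejection step under this illustrative interface (the callables are assumptions, not the paper's code):

```python
def slutsky_ok(q, dq_dp, dq_dy, grid, tol=1e-12):
    """Keep a candidate demand q(p, y) only if the compensated price effect
    s(p, y) = dq/dp + q * dq/dy is nonpositive at every (p, y) grid point."""
    return all(dq_dp(p, y) + q(p, y) * dq_dy(p, y) <= tol for p, y in grid)
```

For instance, the Cobb-Douglas demand q = y/(2p) passes at any positive (p, y), while an upward-sloping candidate such as q(p, y) = p is rejected.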
Given a collection of orthogonal probability measures {μ_{s,n}}_{s=1}^{s_n} ⊂ M(B), we employ

M_n(B) ≡ {ν ∈ M(B) : ν = ∑_{s=1}^{s_n} π_s μ_{s,n} for some {π_s}_{s=1}^{s_n} ∈ R^{s_n}}

as a sieve for M(B). To guarantee that each consumer is rational, we focus on discrete distributions where each μ_{s,n} is a Dirac measure at a point in Ω satisfying the Slutsky conditions. For Dirac μ_{s,n} it is also particularly (computationally) simple to impose the constraints Υ_G(θ) ≤ 0 (see (32)). As a sieve for {F_P(c_ℓ|·)}_{ℓ=1}^J, we employ approximating functions {p_{j,n}}_{j=1}^{j_n}. In particular, setting p^{j_n}_n(w) = (p_{1,n}(w), . . . , p_{j_n,n}(w))′, we let

Θ_n ≡ {({(p^{j_n}_n)′β_ℓ}_{ℓ=1}^J, ν) : ν ∈ M_n(B) and max_{1≤ℓ≤J} ‖(p^{j_n}_n)′β_ℓ‖_∞ ≤ 2}. (36)
Similarly, for a sequence {q_{k,n}}_{k=1}^{k_n} and k_n × k_n positive definite matrices {Σ_{ℓ,n}}_{ℓ=1}^J, we set q^{k_n}_n(w) ≡ (q_{1,n}(w), . . . , q_{k_n,n}(w))′ and for any θ = ({F(c_ℓ|·)}_{ℓ=1}^J, ν) define

Q_n(θ) ≡ {∑_{ℓ=1}^J ‖(1/n) ∑_{i=1}^n (1{Y_i ≤ c_ℓ} − F(c_ℓ|W_i)) q^{k_n}_n(W_i)‖²_{Σ_{ℓ,n},2}}^{1/2}. (37)

The statistics I_n(R) and I_n(Θ) then equal the minima of √n Q_n over Θ_n ∩ R and Θ_n, respectively.
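A direct implementation of (37), taking ‖a‖_{Σ,2} = ‖Σa‖_2 as the weighted norm (a convention assumed here) and hypothetical evaluators F_hat and q_fn for the candidate c.d.f. and instrument vector:

```python
import numpy as np

def Q_n(Y, W, c_grid, F_hat, q_fn, Sigma_list):
    """Criterion (37): for each threshold c_l, average the moment
    (1{Y_i <= c_l} - F(c_l|W_i)) * q(W_i) over the sample, apply the weight
    matrix Sigma_l, and combine the squared 2-norms across thresholds."""
    Y = np.asarray(Y, dtype=float)
    Qmat = np.array([q_fn(w) for w in W])            # (n, k) instrument values
    total = 0.0
    for c, Sigma in zip(c_grid, Sigma_list):
        resid = (Y <= c).astype(float) - np.array([F_hat(c, w) for w in W])
        m = (resid[:, None] * Qmat).mean(axis=0)     # (k,) sample moment vector
        v = Sigma @ m
        total += float(v @ v)                        # squared weighted 2-norm
    return float(np.sqrt(total))
```

Minimizing √n times this criterion over the sieve with and without the restrictions in R would yield I_n(R) and I_n(Θ).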
Our next set of assumptions enables us to couple I_n(R) and I_n(R) − I_n(Θ).
Assumption 4.1. (i) Yi,Wini=1 is i.i.d. P with P ∈ P; (ii) supw ‖pjnn (w)‖2 .
√jn;
(iii) EP [pjnn (W )pjnn (W )′] has eigenvalues bounded away from zero and infinity uniformly
in P ∈ P and n; (iv) For each P ∈ P0 and θ0 ∈ Θ0(P ) ∩ R, there exists a Πrnθ0 =
(Fn(c|·)J=1, νn) ∈ Θn ∩ R such that∑J
=1 ‖EP [(Fn(c|W ) − FP (c|W ))qknn (W )]‖2 =
O((n log(n))−1/2) uniformly in P ∈ P0 and θ0 ∈ Θ0(P ) ∩R.
Assumption 4.2. (i) max_{1≤k≤kn} ‖qk,n‖∞ ≲ √kn; (ii) EP[q_n^{kn}(W)q_n^{kn}(W)′] has eigen-
values bounded uniformly in P ∈ P and n; (iii) EP[q_n^{kn}(W)p_n^{jn}(W)′] has singular values
bounded away from zero uniformly in P ∈ P and n; (iv) kn²jn^{3/2} log³(n) = o(n^{1/2}).
Assumption 4.3. For all 1 ≤ ℓ ≤ J: (i) ‖Σℓ,n − Σℓ,n(P)‖o,2 = oP(1/(kn√jn log²(n)))
uniformly in P ∈ P; (ii) Σℓ,n(P) is invertible and ‖Σℓ,n(P)‖o,2 and ‖Σℓ,n(P)^{−1}‖o,2 are
bounded uniformly in P ∈ P and n.
Assumptions 4.1(ii)-(iv) state the conditions on Θn, with Assumptions 4.1(ii)(iii) being
satisfied by standard choices such as B-splines or wavelets. Assumption 4.1(iv) is an
asymptotic unbiasedness requirement – a condition that is eased by the fact that no
requirements are imposed on the approximating space for νP. The requirements on
{qk,n}_{k=1}^{kn} are stated in Assumption 4.2(i)-(iii) and are again satisfied by standard
choices. Assumption 4.2(iv) states a rate condition that suffices for verifying the coupling
requirements of Theorem 3.2. Assumption 4.3 imposes the requirements on the weighting matrices.
Our next result employs Theorem 3.2(ii) to obtain strong approximations.
Theorem 4.1. Let Assumptions 4.1, 4.2, and 4.3 hold, an = (log(n))^{−1/2}, and for any
θ = ({F(cℓ|·)}_{ℓ=1}^J, ν) ∈ B let ‖θ‖E = max_{1≤ℓ≤J} ‖F(cℓ|·)‖∞. If ℓn, ℓn^u ↓ 0 satisfy
kn√jn log²(n)(ℓn ∨ ℓn^u) = o(1) and knjn log(n)/√n = o(ℓn ∧ ℓn^u), then uniformly in P ∈ P0:

In(R) = Un,P(R|ℓn) + oP(an)
In(R) − In(Θ) = Un,P(R|ℓn) − Un,P(Θ|ℓn^u) + oP(an).
In order to conduct inference, we next aim to estimate the distributions of Un,P(R|ℓn)
and Un,P(Θ|ℓn^u). To this end, we note that Θ_{0n}^r(P) is potentially non-singleton and
we therefore employ a set estimator Θ_n^r (as in (11)) to estimate the distribution of
Un,P(R|ℓn). In contrast, since Un,P(Θ|ℓn^u) only depends on θ0(P) through its identified
component {FP(cℓ|·)}_{ℓ=1}^J, for the unconstrained problem we employ any minimizer θ_n^u
of Qn on Θn. With regards to the local parameter space, we note that in this application
Gn(θ) = {({p_n^{jn}′βℓ,h}_{ℓ=1}^J, νh) : νh(B) ≥ min{rn − ν(B), 0} for all B ∈ B} (38)

for any θ = ({F(cℓ|·)}_{ℓ=1}^J, ν). Computationally, since any ν, νh ∈ Mn(B) has the struc-
ture ν = ∑_{s=1}^{sn} πs µs,n and νh = ∑_{s=1}^{sn} πs^h µs,n, it follows that the constraints in (38)
reduce to πs^h ≥ min{rn − πs, 0} for all 1 ≤ s ≤ sn whenever {µs,n}_{s=1}^{sn} are orthogonal.
Furthermore, since the moments and restrictions are linear, we may let ℓn = +∞ and set

Vn(θ, +∞) = {({p_n^{jn}′βℓ,h}_{ℓ=1}^J, νh) : h/√n ∈ Gn(θ), ΥF(h) = 0}. (39)
Thus, the estimators of the strong approximations obtained in Theorem 4.1 equal

Un(R|+∞) ≡ inf_{θ∈Θ_n^r} inf_{h/√n ∈ Vn(θ,+∞)} {∑_{ℓ=1}^J ‖Wn ρℓ(·, θ)q_n^{kn} + Dℓ,n[h]‖²_{Σℓ,n,2}}^{1/2}

Un(Θ|+∞) ≡ inf_{h/√n} {∑_{ℓ=1}^J ‖Wn ρℓ(·, θ_n^u)q_n^{kn} + Dℓ,n[h]‖²_{Σℓ,n,2}}^{1/2}.
Before stating our final assumption, we need an auxiliary result. To this end, define

Γn(θ) ≡ {ν ∈ Mn(B) : θ = ({F(cℓ|·)}_{ℓ=1}^J, ν) satisfies ΥF(θ) = 0, ΥG(θ) ≤ 0} (40)

for any ({F(cℓ|·)}_{ℓ=1}^J, ν) = θ ∈ Θn ∩ R – i.e. Γn(θ) denotes the set of distributions of
unobserved heterogeneity that agree with the restrictions implied by {F(cℓ|·)}_{ℓ=1}^J. The
next result bounds the ‖ · ‖TV-Hausdorff distance between Γn(θ1) and Γn(θ2).
Lemma 4.1. If the probability measures {µs,n}_{s=1}^{sn} are orthogonal and sn > JL, then
for every n there is a ζn < ∞ independent of P such that

dH(Γn(θ1), Γn(θ2), ‖ · ‖TV) ≤ ζn ∑_{ℓ=1}^J ‖F1(cℓ|·) − F2(cℓ|·)‖∞

for any ({F1(cℓ|·)}_{ℓ=1}^J, ν1) = θ1 ∈ Θn ∩ R and ({F2(cℓ|·)}_{ℓ=1}^J, ν2) = θ2 ∈ Θn ∩ R.
We next introduce a final assumption to show the validity of our bootstrap procedure.
Assumption 4.4. (i) Ψ(g, ·) is bounded on Ω; (ii) The measures {µs,n}_{s=1}^{sn} are or-
thogonal and sn > JL; (iii) kn⁴jn⁵ log⁵(n)/n = o(1); (iv) Πnθ0(P) = ({Fn(cℓ|·)}_{ℓ=1}^J, ν)
satisfies ‖Fn(cℓ|·) − FP(cℓ|·)‖∞ = o(1) uniformly in θ0(P) ∈ Θ0(P) ∩ R and P ∈ P0; (v)
knjn log²(n)τn = o(1), and ζn(knjn log(n)/√n + √jn τn) = o(rn).
The boundedness of Ψ(g, ·) on Ω ensures ΥF^{(s)} (as in (35)) is continuous, while As-
sumption 4.4(ii) allows us to apply Lemma 4.1. Assumption 4.4(iii) is a low level suf-
ficient condition for verifying the bootstrap coupling requirement of Assumption 3.14.
These rate requirements could be improved under smoothness conditions on FP(cℓ|·).
Finally, Assumption 4.4(iv) imposes a mild requirement on the sieve, while Assumption
4.4(v) states conditions on τn and rn – note τn = 0 and rn = +∞ are always valid.
Our final result obtains a coupling for our bootstrap approximations.
Theorem 4.2. Let the conditions of Theorem 4.1 hold and Assumption 4.4 be satis-
fied. Then for any sequences ℓn, ℓn^u ↓ 0 satisfying knjn log(n)/√n = o(ℓn ∧ ℓn^u) and
kn√jn log²(n)(ℓn ∨ ℓn^u) = o(1), it follows uniformly in P ∈ P0 that

Un(R|+∞) ≥ U⋆n,P(R|ℓn) + oP(an)
Un(R|+∞) − Un(Θ|+∞) ≥ U⋆n,P(R|ℓn) − U⋆n,P(Θ|ℓn^u) + oP(an).

In particular, since the conditions on ℓn and ℓn^u imposed in Theorems 4.1 and 4.2 are
the same, it follows that we may employ the quantiles of Un(R|+∞) and Un(R|+∞) −
Un(Θ|+∞) conditional on the data as critical values for In(R) and In(R) − In(Θ).
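Operationally, the last remark says the test compares In(R) (or In(R) − In(Θ)) with a conditional quantile of the corresponding bootstrap statistic. A minimal sketch with generic inputs (the statistic and draws below are placeholders):

```python
import numpy as np

def bootstrap_test(stat, boot_draws, alpha=0.05):
    """Reject when the statistic exceeds the 1 - alpha quantile of the
    bootstrap draws computed conditional on the data."""
    cv = float(np.quantile(boot_draws, 1 - alpha))
    return stat > cv, cv
```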
5 Simulation Evidence
We next study the finite sample performance of our inference procedure by revisiting
the simulation design in Chetverikov and Wilhelm (2017).
5.1 Identified Model
We first consider a nonparametric instrumental variable model in which, for some
unknown function θ0, the distribution of (Y, W, Z) ∈ R³ satisfies the restriction

Y = θ0(W) + ε,   E[ε|Z] = 0; (41)

see Appendix S.6 for a formal study of this model. Following Chetverikov and Wilhelm
(2017), we set θ0(x) ≡ 0.2x + x² and for (ε̄, ζ, ν) independent standard normal random
variables we let Z = Φ(ζ), W = Φ(0.3ζ + √(1 − (0.3)²) ε̄), and ε = (0.3ε̄ + √(1 − (0.3)²) ν)/2,
for Φ the cumulative distribution function of a standard normal. All reported results
are based on five thousand replications employing five hundred bootstrap draws each.
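The design above is fully specified and straightforward to replicate. A minimal sketch, where the seed and sample size are arbitrary and ε̄ denotes the common normal draw entering both W and ε:

```python
import numpy as np
from math import erf, sqrt

def simulate_cw(n, seed=0):
    """Simulate (Y, W, Z) from the Section 5.1 design of Chetverikov and Wilhelm (2017)."""
    rng = np.random.default_rng(seed)
    eps_bar, zeta, nu = rng.normal(size=(3, n))
    Phi = np.vectorize(lambda x: 0.5 * (1 + erf(x / sqrt(2))))  # standard normal CDF
    Z = Phi(zeta)
    W = Phi(0.3 * zeta + sqrt(1 - 0.3 ** 2) * eps_bar)
    eps = (0.3 * eps_bar + sqrt(1 - 0.3 ** 2) * nu) / 2
    Y = 0.2 * W + W ** 2 + eps   # theta_0(x) = 0.2x + x^2
    return Y, W, Z, eps
```

W is endogenous because it shares the ε̄ shock with ε, while Z depends only on ζ and so remains a valid instrument.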
Our results enable us to conduct inference on functionals of θ0 while imposing shape
restrictions. In what follows we set ΥF to denote either the level or the derivative of θ0 at
the point w0 = 0.5 and employ ΥG to impose that θ be either monotonically increasing
or monotonically increasing and convex. We employ the test statistic In(R) − In(Θ)
with r = 2 and Σn an estimate of the optimal weighting matrix based on a first stage
unconstrained estimator. The implementation of the test is similar to that of the linear
model of Section 2.1, with the difference that we must select the sieve Θn = {p_n^{jn}′β :
β ∈ R^{jn}} and the functions q_n^{kn}. We follow Chetverikov and Wilhelm (2017) in
employing continuously differentiable piecewise quadratic splines with equally spaced
knots for both p_n^{jn} and q_n^{kn}.
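One simple way to build such a basis is the truncated power representation: the functions 1, x, x² together with (x − t)²₊ for each equally spaced interior knot t are continuously differentiable piecewise quadratics. The sketch below illustrates that idea; the paper does not commit to this particular parametrization.

```python
import numpy as np

def quad_spline_basis(x, dim):
    """C^1 piecewise quadratic basis on [0, 1] with dim - 3 equally
    spaced interior knots; dim plays the role of j_n or k_n."""
    x = np.asarray(x, dtype=float)
    cols = [np.ones_like(x), x, x ** 2]
    knots = np.linspace(0.0, 1.0, dim - 1)[1:-1]     # dim - 3 interior knots
    for t in knots:
        cols.append(np.clip(x - t, 0.0, None) ** 2)  # truncated power (x - t)_+^2
    return np.column_stack(cols)
```

For example, dim = 4 yields the basis {1, x, x², (x − 1/2)²₊}, which is quadratic on each half of [0, 1] and has a continuous first derivative at the knot.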
In computing critical values we set ℓn = ℓn^u = +∞ since the model is linear and
τn = 0 since the model is identified – note these choices are also valid under partial
identification. We select rn by proceeding as in Section 2.1. For p_n^{jn}′βn^u the minimizer of
In(Θ) and p_n^{jn}′βn^{u⋆} its score bootstrap analogue (Kline and Santos, 2012), we set

P( max_{1≤j≤J} Cj′(βn^{u⋆} − βn^u) ≤ rn | {Vi}_{i=1}^n ) = 1 − γn (42)

where γn ∈ (0, 1) and the vectors Cj ∈ R^{jn} depend on the shape restriction being
imposed. We emphasize that the sequence γn must tend to zero in order for rn to satisfy
our assumptions. Finally, we employ the minimizer of In(R) in obtaining bootstrap
draws for both Un(R|+∞) and Un(Θ|+∞); see the discussion following Corollary 3.3.
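The calibration in (42) amounts to taking a conditional quantile of the bootstrap draws. A sketch, where the constraint vectors Cj and the bootstrap coefficient draws are hypothetical inputs:

```python
import numpy as np

def calibrate_rn(beta_u, beta_boot, C, gamma_n):
    """r_n as the 1 - gamma_n quantile of max_j C_j'(beta*_n - beta_n)
    across score-bootstrap draws (rows of beta_boot), as in (42)."""
    t = (np.asarray(beta_boot) - np.asarray(beta_u)) @ np.asarray(C).T
    return float(np.quantile(t.max(axis=1), 1 - gamma_n))
```

With γn tending to zero, as the assumptions require, rn moves toward the largest bootstrap draw, which is the conservative direction.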
Table 1 reports empirical rejection probabilities under the null hypothesis for 5%-
level tests on the derivative and level of θ0 at w0 = 0.5 under different shape restrictions.
Simulations for additional values of (jn, kn) are included in the supplemental appendix.
With regards to rn, we examine the extreme possible values (0 and ∞) and choices
corresponding to (42) for different γn. Across all simulations, including those in the
supplemental appendix, we find that the rejection probability is below the nominal level
provided 1−γn ≥ 0.5 in (42). Overall, we find the general lack of sensitivity to different
choices of bandwidths to be reassuring for empirical practice.
In Figure 3 we report power curves for different 5%-level tests concerning the value
of θ0 and its derivative at w0 = 0.5. For conciseness, we focus on the sample sizes
n ∈ 1000, 5000 and rn chosen as in (42) with 1 − γn = 0.95. The curves labeled
“Mon” and “Mon+Conv” correspond to tests based on In(R)− In(Θ) with R imposing
monotonicity and monotonicity and convexity while changing the conjectured value
of θ0 and its derivative at w0 = 0.5. The curve labeled “Unres.” corresponds to a
Wald test based on the unrestricted estimator. For all designs we find that imposing
shape restrictions can yield important power gains. The benefits of imposing shape
restrictions, however, depend on both the sampling uncertainty and how “close” the
shape restrictions are to binding (Chetverikov et al., 2018). Since our design is fixed with
n and θ0 is strictly increasing and convex, in our simulations we see the advantages of
imposing shape restrictions decrease with n as sampling uncertainty decreases. Similarly,
since estimating the derivative is harder than estimating the level, we observe larger
power gains when imposing shape restrictions in the former problem.

                         Imposed: Mon.                 Imposed: Mon.+Conv.
                     Level         Derivative       Level         Derivative
   n    rn/(jn,kn)  (4,4)  (4,6)  (4,4)  (4,6)     (4,4)  (4,6)  (4,4)  (4,6)
        ∞            1.90   1.72   1.88   2.02      1.44   1.52   2.74   2.84
        95%          1.74   1.68   1.90   2.08      1.46   1.54   2.68   2.84
  500   50%          1.74   1.70   1.90   2.10      1.46   1.54   2.68   2.84
        5%           2.18   2.90   2.20   2.96      1.52   1.82   2.74   2.98
        0            5.30   5.10   4.62   4.48      5.42   5.36   5.08   4.84
        ∞            1.56   1.82   1.68   1.94      1.40   1.54   2.26   2.32
        95%          1.52   1.84   1.64   1.86      1.36   1.44   2.04   2.26
 1000   50%          1.52   1.86   1.64   1.86      1.36   1.44   2.04   2.26
        5%           2.02   2.84   2.06   3.06      1.44   1.86   2.14   2.38
        0            4.54   4.56   4.58   4.68      4.62   4.78   4.38   4.20
        ∞            1.34   1.58   1.26   1.52      1.04   1.36   1.36   1.58
        95%          1.40   1.50   1.32   1.62      1.06   1.42   1.36   1.62
 5000   50%          1.42   1.52   1.32   1.62      1.06   1.42   1.36   1.62
        5%           2.20   3.62   2.36   3.36      1.42   2.38   1.46   1.86
        0            3.98   4.56   4.68   4.50      4.10   4.74   3.98   4.06

Table 1: Empirical rejection probabilities for 5%-level tests based on In(R) − In(Θ).
Value of rn set to a percentile corresponds to the choice of 1 − γn in (42).
5.2 Partially Identified Model
We next examine the performance of our test in a partially identified setting by dis-
cretizing the simulation design in Chetverikov and Wilhelm (2017). Concretely, we
generate (W, Z, ε) ∈ [0, 1]² × R as in Section 5.1, divide [0, 1] into Sw and Sz equally
spaced segments, and generate dummy variables Dw and Dz for the segment to which
W and Z belong – e.g. if (Sw, Sz) = (3, 2), then Dw(W) ≡ (1{W ∈ [0, 1/3]}, 1{W ∈
(1/3, 2/3]}, 1{W ∈ (2/3, 1]})′ and Dz(Z) ≡ (1{Z ∈ [0, 1/2]}, 1{Z ∈ (1/2, 1]})′. The
outcome Y is generated according to (41) but employing Dw in place of W.
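The discretization maps W and Z into segment indicators. A minimal sketch of the dummy construction described above:

```python
import numpy as np

def segment_dummies(x, S):
    """Indicator vector for the S equally spaced segments of [0, 1],
    with the first segment closed at 0 as in the text."""
    edges = np.linspace(0.0, 1.0, S + 1)
    # right=True gives 1{x in (e_k, e_{k+1}]}; x = 0 maps to the first segment.
    k = int(np.digitize(x, edges[1:-1], right=True))
    d = np.zeros(S)
    d[k] = 1.0
    return d
```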
The discretized design is characterized by Sz linear unconditional moment restric-
tions in Sw unknowns. For conciseness, we focus on imposing that θ0 be monotoni-
cally increasing and convex while conducting inference on the value of θ0 at the point
[Figure 3: eight panels of power curves plotting rejection probability (0 to 1) against the conjectured value. The top four panels, “Power Curve for Level”, cover (jn, kn) = (4, 4) and (4, 6) at n = 1000 and n = 5000, with the conjectured level ranging over [−0.2, 1.1]. The bottom four panels, “Power Curve for Derivative”, cover the same configurations with the conjectured derivative ranging over [0, 4]. Each panel displays the curves “Mon”, “Mon+Conv”, and “Unres.”]
Figure 3: Rejection probabilities for 5%-level tests on the conjectured value of θ0(0.5) (true
value 0.35) and θ′0(0.5) (true value 1.2). Tests implemented with 1 − γn = 0.95 in (42).
d0 ≡ Dw(0.5) – e.g., if Sw = 3, then d0 = (0, 1, 0)′. We note that θ0(d0) is generically not
identified whenever Sw > Sz, but imposing a shape restriction on θ0 partially identifies
θ0(d0) in this design. Table 2 reports the relevant identified sets.
                               (Sw, Sz)
Restriction on θ0      (3, 2)            (4, 2)            (4, 3)
Mon.+Convex        [0.059, 0.252]    [0.100, 0.412]    [0.310, 0.388]
No Restriction       (−∞, +∞)          (−∞, +∞)          (−∞, +∞)

Table 2: Identified sets for θ0(d0) with and without shape restrictions.
                 Lower Endpoint           Midpoint             Upper Endpoint
                   (Sw, Sz)               (Sw, Sz)               (Sw, Sz)
   n    rn     (3,2)  (4,2)  (4,3)    (3,2)  (4,2)  (4,3)    (3,2)  (4,2)  (4,3)
        ∞       1.96   3.34   1.48     0.10   0.02   1.48     1.88   3.10   2.00
        95%     3.64   4.70   1.46     0.10   0.02   1.46     2.26   3.12   1.98
  500   50%     5.34   5.24   1.46     0.50   0.06   1.50     5.22   5.02   2.04
        5%      5.36   5.24   3.56     0.50   0.06   3.44     5.24   5.02   3.54
        0       5.34   5.26   4.64     0.50   0.06   4.48     5.24   5.16   4.60
        ∞       1.84   3.06   1.12     0.00   0.00   1.10     1.96   2.90   1.34
        95%     4.98   4.84   1.12     0.02   0.00   1.08     2.98   2.90   1.34
 1000   50%     5.10   4.88   1.20     0.12   0.00   1.14     5.00   4.86   1.44
        5%      5.10   4.88   3.48     0.12   0.00   3.12     5.00   4.86   2.78
        0       5.28   4.88   4.42     0.08   0.00   4.14     5.10   4.86   3.82
        ∞       1.98   4.40   1.34     0.00   0.00   1.22     1.98   2.80   1.36
        95%     5.08   6.76   1.34     0.00   0.00   1.26     4.56   4.86   1.34
 5000   50%     5.08   8.30   1.48     0.00   0.00   1.44     4.58   4.84   1.52
        5%      5.08   9.00   4.28     0.00   0.00   4.14     4.58   4.84   3.58
        0       4.96   8.84   4.70     0.00   0.00   4.38     4.64   5.02   4.46

Table 3: Empirical rejection probabilities for 5%-level tests based on In(R) for different
points in the null hypothesis. Lower and upper endpoints correspond to Table 2.
We test whether a value c belongs to the identified set for θ0(d0) by setting ΥF(θ) =
θ(d0) − c and employing ΥG to impose that θ be monotonically increasing and convex.
We base inference on In(R) with r = 2, Σn the sample analogue of E[DzDz′], all moment
restrictions (kn = Sz), and a saturated model for θ0 (jn = Sw). To compute critical
values we set ℓn = +∞ and τn = 0 – though note Θ_n^r need not be a singleton when τn = 0
because jn > kn. We select rn by modifying the approach employed in Section 5.1: For
θn^L and θn^U the minimizer and maximizer of θ(d0) over the set of θ that are monotonically
increasing, convex, and minimize ‖∑_{i=1}^n (Yi − θ(Dw,i))Dz,i/n‖∞, we set

P( max_{1≤j≤J} max{ΥG,j(θn^{L⋆} − θn^L), ΥG,j(θn^{U⋆} − θn^U)} ≤ rn | {Vi}_{i=1}^n ) = 1 − γn, (43)

where ΥG,j(θ) denotes the jth coordinate of the vector ΥG(θ) and θn^{L⋆} and θn^{U⋆} are again
computed employing the score bootstrap. As in our previous analysis, γn must tend to
zero with n in order for rn to satisfy our assumptions.
Table 3 reports empirical rejection rates for testing whether c belongs to the identified
set, with the lower and upper endpoint columns corresponding to setting c to equal
the lower and upper endpoints in Table 2. All tests are conducted at a 5% nominal
level. Across designs, we find that setting rn = +∞ always delivers tests with rejection
probabilities below their nominal level. Setting rn according to (43) with 1− γn = 0.95
also delivers adequate size control, with the exception of n = 5000 and (Sw, Sz) = (4, 2)
where we see a modest over-rejection at the lower endpoint of the identified set. Overall,
the degree of sensitivity to the choice of rn is similar to that found in Section 5.1.
6 Returns to Education
As a final illustration of the empirical content of our results, we conclude by examining
the role that shape restrictions can play in the study of Angrist and Krueger (1991) on
returns to education. Specifically, for Y log-earnings and D years of education we posit
Y = g(D) +W ′γ + ε, (44)
where W is a vector of observable covariates consisting of a constant, age, age squared,
and dummy variables for year of birth, race, marital status, residence in a Standard
Metropolitan Statistical Area (SMSA), and census region of residence. The function g
is modeled as a piecewise quadratic spline with a knot placed at the median level of
education. Following Angrist and Krueger (1991), we address the possible endogeneity
of D by employing a vector of instruments Z consisting of W and a set of quarter of
birth dummy variables interacted with year of birth dummy variables.
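Under this specification the second stage is linear in known transformations of D and W: modeling g as a piecewise quadratic spline with a single knot amounts to including D, D², and (D − m)²₊ for m the median years of education. A sketch of the design matrix, using the truncated power form as an illustration rather than the authors' exact construction (the constant is assumed to be a column of W):

```python
import numpy as np

def schooling_design(D, W, knot):
    """Regressors for (44): spline columns for g plus the covariates W."""
    D = np.asarray(D, dtype=float)
    g_cols = np.column_stack([D, D ** 2, np.clip(D - knot, 0.0, None) ** 2])
    return np.hstack([g_cols, np.asarray(W)])
```

The marginal return at the median is then the derivative g′(m) evaluated from the fitted spline coefficients.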
We conduct our analysis employing the decennial census extracts studied by Angrist
and Krueger (1991). All statistical procedures are based on In(R) − In(Θ) and, as in
Angrist and Krueger (1991), applied separately to men born in the 1920-1929, 1930-1939,
and 1940-1949 cohorts. Since the model is linear and identification holds when the rank
condition is satisfied, we set `n = +∞, τn = 0, and implement our tests by following
the discussion in Section 2.1. All bootstrap critical values are based on five thousand
bootstrap draws, with both bootstrap statistics based on the constrained estimator.
                            p-value of test of shape restriction
Shape Restr.   rn    1920-1929 Cohort   1930-1939 Cohort   1940-1949 Cohort
Mon.           95%        0.636              0.815              0.982
Mon.           0          0.218              0.528              0.836
Mon.+Conc.     95%        0.184              0.750              0.551
Mon.+Conc.     0          0.054              0.476              0.332

Table 4: Tests of whether returns to education satisfy a shape restriction. The choice of
bandwidth rn = 95% corresponds to setting 1 − γn = 0.95 in (4).
We first examine the empirical plausibility of returns to education satisfying certain
shape restrictions. In Table 4 we report the p-values for the null hypothesis that g
in (44) is monotonically increasing (i.e. marginal returns are positive) and for the null
hypothesis that it is monotonically increasing and concave (i.e. marginal returns are pos-
itive and decreasing). The p-values are obtained under different choices of rn, including
rn computed as suggested in Section 2.1 by setting 1 − γn = 0.95 in (4). While not
theoretically justified, we also report p-values corresponding to rn = 0 to illustrate the
sensitivity of the test to the choice of rn. We find little evidence against the conjectured
shape restrictions across cohorts and theoretically recommended choices of rn.
  Cohort      Shape Restr.   rn   Restr. Estimate    Restr. CI         2SLS    2SLS CI
             Mon.           95%       0.000       [0.000, 0.165]     -0.083  [-0.291, 0.125]
 1920-1929   Mon.           0         0.000       [0.000, 0.123]     -0.083  [-0.291, 0.125]
             Mon.+Conc.     95%       0.086       [0.001, 0.173]     -0.083  [-0.291, 0.125]
             Mon.+Conc.     0         0.086       [0.016, 0.150]     -0.083  [-0.291, 0.125]
             Mon.           95%       0.116       [0.000, 0.243]      0.160  [-0.061, 0.380]
 1930-1939   Mon.           0         0.116       [0.000, 0.243]      0.160  [-0.061, 0.380]
             Mon.+Conc.     95%       0.090       [0.000, 0.202]      0.160  [-0.061, 0.380]
             Mon.+Conc.     0         0.090       [0.000, 0.196]      0.160  [-0.061, 0.380]
             Mon.           95%       0.032       [0.000, 0.194]      0.032  [-0.155, 0.220]
 1940-1949   Mon.           0         0.032       [0.000, 0.194]      0.032  [-0.155, 0.220]
             Mon.+Conc.     95%       0.078       [0.000, 0.167]      0.032  [-0.155, 0.220]
             Mon.+Conc.     0         0.078       [0.000, 0.162]      0.032  [-0.155, 0.220]

Table 5: 95% confidence regions for the marginal return to education at the median.
Columns: (i) Imposed shape restriction; (ii) Choice of rn, with rn = 95% denoting
1 − γn = 0.95 in (4); (iii) Shape restricted 2SLS estimate; (iv) Confidence region using
shape restrictions; (v)-(vi) Unrestricted 2SLS estimate and Wald confidence region.
In Table 5 we impose the conjectured shape restrictions while building 95%-level
confidence intervals for the marginal return to education at the median level of education.
By way of comparison, we also report the Wald confidence regions based on the
unrestricted two stage least squares (2SLS) estimator. Overall, we find little sensitivity
of our results to the choice of rn. In unreported results we found the confidence regions
corresponding to rn = +∞ to equal those obtained by setting 1 − γn = 0.95. We
further find that imposing shape restrictions considerably sharpens the confidence regions,
with the impact on the analysis of the 1930-1939 cohort being particularly salient. With
the exception of the 1920-1929 cohort, we also find that imposing both monotonicity and
concavity is more informative than only imposing monotonicity.
References
Ai, C. and Chen, X. (2003). Efficient estimation of models with conditional moment restrictions containing unknown functions. Econometrica, 71 1795–1844.

Ai, C. and Chen, X. (2007). Estimation of possibly misspecified semiparametric conditional moment restriction models with different conditioning variables. Journal of Econometrics, 141 5–43.

Ai, C. and Chen, X. (2012). The semiparametric efficiency bound for models of sequential moment restrictions containing unknown functions. Journal of Econometrics, 170 442–457.

Aït-Sahalia, Y. and Duarte, J. (2003). Nonparametric option pricing under shape restrictions. Journal of Econometrics, 116 9–47.

Andrews, D. W. (2000). Inconsistency of the bootstrap when a parameter is on the boundary of the parameter space. Econometrica, 68 399–405.

Andrews, D. W. (2001). Testing when a parameter is on the boundary of the maintained hypothesis. Econometrica, 69 683–734.

Andrews, D. W. K. and Shi, X. (2013). Inference based on conditional moment inequalities. Econometrica, 81 609–666.

Andrews, D. W. K. and Soares, G. (2010). Inference for parameters defined by moment inequalities using generalized moment selection. Econometrica, 78 119–157.

Angrist, J. D. and Krueger, A. B. (1991). Does compulsory school attendance affect schooling and earnings? The Quarterly Journal of Economics, 106 979–1014.

Armstrong, T. (2015). Adaptive testing on a regression function at a point. The Annals of Statistics, 43 2086–2101.

Athey, S. and Stern, S. (1998). An empirical framework for testing theories about complementarity in organizational design. Tech. rep., National Bureau of Economic Research.

Babii, A. and Kumar, R. (2019). Isotonic regression discontinuity designs. Available at SSRN 3458127.

Beare, B. K. and Schmidt, L. D. (2014). An empirical test of pricing kernel monotonicity. Journal of Applied Econometrics (in press).

Beresteanu, A. and Molinari, F. (2008). Asymptotic properties for a class of partially identified models. Econometrica, 76 763–814.

Blundell, R., Chen, X. and Kristensen, D. (2007). Semi-nonparametric iv estimation of shape-invariant engel curves. Econometrica, 75 1613–1669.

Blundell, R., Horowitz, J. L. and Parey, M. (2012). Measuring the price responsiveness of gasoline demand: Economic shape restrictions and nonparametric demand estimation. Quantitative Economics, 3 29–51.

Bugni, F. A., Canay, I. A. and Shi, X. (2017). Inference for subvectors and other functions of partially identified parameters in moment inequality models. Quantitative Economics, 8 1–38.

Canay, I. A. and Shaikh, A. M. (2017). Practical and theoretical advances in inference for partially identified models. In Advances in Economics and Econometrics: Eleventh World Congress, vol. 2. Cambridge University Press, Cambridge, 271–306.

Chamberlain, G. (1992). Comment: Sequential moment restrictions in panel data. Journal of Business & Economic Statistics, 10 20–26.

Chatterjee, S., Guntuboyina, A. and Sen, B. (2013). Improved risk bounds in isotonic regression. arXiv preprint arXiv:1311.3765.

Chen, X. (2007). Large sample sieve estimation of semi-nonparametric models. In Handbook of Econometrics 6B (J. J. Heckman and E. E. Leamer, eds.). North Holland, Elsevier.

Chen, X. and Pouzo, D. (2009). Efficient estimation of semiparametric conditional moment models with possibly nonsmooth residuals. Journal of Econometrics, 152 46–60.

Chen, X. and Pouzo, D. (2012). Estimation of nonparametric conditional moment models with possibly nonsmooth generalized residuals. Econometrica, 80 277–321.

Chen, X. and Pouzo, D. (2015). Sieve wald and qlr inferences on semi/nonparametric conditional moment models. Econometrica, 83 1013–1079.

Chen, X., Pouzo, D. and Tamer, E. (2011a). Qlr inference on partially identified nonparametric conditional moment models. Working paper, Yale University.

Chen, X. and Santos, A. (2018). Overidentification in regular models. Econometrica, 86 1771–1817.

Chen, X., Tamer, E. and Torgovitsky, A. (2011b). Sensitivity analysis in semiparametric likelihood models. Working paper, Yale University.

Chernozhukov, V., Chetverikov, D. and Kato, K. (2014). Comparison and anti-concentration bounds for maxima of gaussian random vectors. Probability Theory and Related Fields, 162 47–70.

Chernozhukov, V., Fernandez-Val, I. and Galichon, A. (2009). Improving point and interval estimators of monotone functions by rearrangement. Biometrika, 96 559–575.

Chernozhukov, V. and Hansen, C. (2005). An iv model of quantile treatment effects. Econometrica, 73 245–261.

Chernozhukov, V., Hong, H. and Tamer, E. (2007). Estimation and confidence regions for parameter sets in econometric models. Econometrica, 75 1243–1284.

Chernozhukov, V., Lee, S. and Rosen, A. M. (2013). Intersection bounds: Estimation and inference. Econometrica, 81 667–737.

Chetverikov, D. (2019). Testing regression monotonicity in econometric models. Econometric Theory, 35 729–776.

Chetverikov, D., Santos, A. and Shaikh, A. M. (2018). The econometrics of shape restrictions. Annual Review of Economics, 10 31–63.

Chetverikov, D. and Wilhelm, D. (2017). Nonparametric instrumental variable estimation under monotonicity. Econometrica, 85 1303–1320.

Dette, H., Hoderlein, S. and Neumeyer, N. (2016). Testing multivariate economic restrictions using quantiles: the example of slutsky negative semidefiniteness. Journal of Econometrics, 191 129–144.

Eichenbaum, M. S., Hansen, L. P. and Singleton, K. J. (1988). A time series analysis of representative agent models of consumption and leisure choice under uncertainty. The Quarterly Journal of Economics, 103 51–78.

Fang, Z. and Seo, J. (2019). A general framework for inference on shape restrictions. arXiv preprint arXiv:1910.07689.

Freyberger, J. and Horowitz, J. L. (2015). Identification and shape restrictions in nonparametric instrumental variables estimation. Journal of Econometrics, 189 41–53.

Freyberger, J. and Reeves, B. (2018). Inference under shape restrictions. Available at SSRN 3011474.

Galichon, A. and Henry, M. (2009). A test of non-identifying restrictions and confidence regions for partially identified parameters. Journal of Econometrics, 152 186–196.

Gentzkow, M. (2007). Valuing new goods in a model with complementarity: Online newspapers. American Economic Review, 97 713–744.

Haag, B. R., Hoderlein, S. and Pendakur, K. (2009). Testing and imposing slutsky symmetry in nonparametric demand systems. Journal of Econometrics, 153 33–50.

Hansen, L. P. (1982). Large sample properties of generalized method of moments estimators. Econometrica, 50 891–916.

Hausman, J. A. and Newey, W. K. (1995). Nonparametric estimation of exact consumers surplus and deadweight loss. Econometrica, 63 1445–1476.

Hausman, J. A. and Newey, W. K. (2016). Individual heterogeneity and average welfare. Econometrica, 84 1225–1248.

Ho, K. and Rosen, A. M. (2017). Partial identification in applied research: Benefits and challenges. In Advances in Economics and Econometrics: Volume 2: Eleventh World Congress, vol. 2. Cambridge University Press, 307.

Hong, S. (2017). Inference in semiparametric conditional moment models with partial identification. Journal of Econometrics, 196 156–179.

Horowitz, J. L. and Lee, S. (2017). Nonparametric estimation and inference under shape restrictions. Journal of Econometrics, 201 108–126.

Jackwerth, J. C. (2000). Recovering risk aversion from option prices and realized returns. Review of Financial Studies, 13 433–451.

Kaido, H. and Santos, A. (2014). Asymptotically efficient estimation of models defined by convex moment inequalities. Econometrica, 82 387–413.

Kline, P. and Santos, A. (2012). A score based approach to wild bootstrap inference. Journal of Econometric Methods, 1 23–41.

Koltchinskii, V. I. (1994). Komlos-Major-Tusnady approximation for the general empirical process and Haar expansions of classes of functions. Journal of Theoretical Probability, 7 73–118.

Kretschmer, T., Miravete, E. J. and Pernías, J. C. (2012). Competitive pressure and the adoption of complementary innovations. The American Economic Review, 102 1540.

Ledoux, M. and Talagrand, M. (1988). Un critere sur les petites boules dans le theoreme limite central. Probability Theory and Related Fields, 77 29–47.

Lee, S., Song, K. and Whang, Y.-J. (2018). Testing for a general class of functional inequalities. Econometric Theory, 34 1018–1064.

Lewbel, A. (1995). Consistent nonparametric hypothesis tests with an application to slutsky symmetry. Journal of Econometrics, 67 379–401.

Linton, O., Song, K. and Whang, Y.-J. (2010). An improved bootstrap test of stochastic dominance. Journal of Econometrics, 154 186–202.

Manski, C. F. (2003). Partial Identification of Probability Distributions. Springer-Verlag, New York.

Matzkin, R. L. (1994). Restrictions of economic theory in nonparametric methods. In Handbook of Econometrics (R. Engle and D. McFadden, eds.), vol. IV. Elsevier.

Newey, W. K. and Powell, J. (2003). Instrumental variables estimation of nonparametric models. Econometrica, 71 1565–1578.

Reguant, M. (2014). Complementary bidding mechanisms and startup costs in electricity markets. The Review of Economic Studies, 81 1708–1742.

Romano, J. P. and Shaikh, A. M. (2012). On the uniform asymptotic validity of subsampling and the bootstrap. The Annals of Statistics, 40 2798–2822.

Santos, A. (2011). Instrumental variables methods for recovering continuous linear functionals. Journal of Econometrics, 161 129–146.

Santos, A. (2012). Inference in nonparametric instrumental variables with partial identification. Econometrica, 80 213–275.

Sargan, J. D. (1958). The estimation of economic relationships using instrumental variables. Econometrica, 26 393–415.

Tao, J. (2014). Inference for point and partially identified semi-nonparametric conditional moment models. Working paper, University of Washington.

Topkis, D. M. (1998). Supermodularity and Complementarity. Princeton University Press.

Torgovitsky, A. (2019). Nonparametric inference on state dependence in unemployment. Econometrica, 87 1475–1505.

Wolak, F. A. (2007). Quantifying the supply-side benefits from forward contracting in wholesale electricity markets. Journal of Applied Econometrics, 22 1179–1209.

Yurinskii, V. V. (1977). On the error of the gaussian approximation for convolutions. Theory of Probability and Its Applications, 22 236–247.

Zhai, A. (2018). A high-dimensional clt in w2 distance with near optimal convergence rate. Probability Theory and Related Fields, 170 821–845.

Zhu, Y. (2019). Inference in non-parametric/semi-parametric moment equality models with shape restrictions. Semi-Parametric Moment Equality Models with Shape Restrictions (October 23, 2019).
Constrained Conditional Moment Restriction Models
Supplemental Appendix
Victor Chernozhukov
Department of Economics
M.I.T.
Whitney K. Newey‖
Department of Economics
M.I.T.
Andres Santos∗∗
Department of Economics
U.C.L.A.
January, 2020
This Supplemental Appendix to “Constrained Conditional Moment Restriction Mod-
els” is organized as follows. Sections S.1 and S.2 provide a review of AM spaces and
additional simulation results. Sections S.3 and S.4 contain the proofs of Theorems 3.1
and 3.2 respectively. The proofs for all remaining results concerning our bootstrap ap-
proximation and test are contained in Section S.5. Section S.6 analyzes a number of
examples including a formal study of GMM (which covers Section 2.1) and the exam-
ple discussed in Section 2.2. Section S.6 additionally includes an example concerning
inference under Slutsky restrictions and the proof of the results stated in Section 4.
Finally, Sections S.7, S.8, and S.9 develop results that may be of independent interest,
and include the analysis of the local parameter space, empirical process coupling results
based on Koltchinskii (1994), and bootstrap coupling results.
‖Research supported by NSF Grant 1757140.
∗∗Research supported by NSF Grant SES-1426882.
S.1 AM Spaces
We provide a brief introduction to AM spaces and refer the reader to Chapters 8 and 9
of Aliprantis and Border (2006) for a more detailed exposition. Before proceeding, we
first recall the definitions of a partially ordered set and a lattice.
Definition S.1.1. A partially ordered set (G,≥) is a set G with a partial order rela-
tionship ≥ defined on it – i.e. ≥ is a transitive (x ≥ y and y ≥ z implies x ≥ z), reflexive
(x ≥ x), and antisymmetric (x ≥ y and y ≥ x implies x = y) relation.
Definition S.1.2. A lattice is a partially ordered set (G,≥) such that any pair x, y ∈ G
has a least upper bound (denoted x ∨ y) and a greatest lower bound (denoted x ∧ y).
Whenever G is both a vector space and a lattice, it is possible to define objects that
depend on both the vector space and lattice operations. In particular, for x ∈ G we
define the positive part x+ ≡ x ∨ 0, the negative part x− ≡ (−x) ∨ 0, and the absolute
value |x| ≡ x ∨ (−x). It is also natural to demand that the order relation ≥ interact
with the algebraic operations in a manner analogous to that of R – i.e. to have
x ≥ y implies x+ z ≥ y + z for each z ∈ G (S.1)
x ≥ y implies αx ≥ αy for each 0 ≤ α ∈ R . (S.2)
A complete normed vector space that shares these familiar properties of R under a given
order relation ≥ is referred to as a Banach lattice. Formally, we define:
Definition S.1.3. A Banach space G with norm ‖ · ‖G is a Banach lattice if (i) G is a
lattice under ≥, (ii) ‖x‖G ≤ ‖y‖G when |x| ≤ |y|, (iii) (S.1) and (S.2) hold.
An AM space is a Banach lattice in which, for any two positive elements, the norm of their maximum equals the maximum of their norms.

Definition S.1.4. A Banach lattice G is called an AM space if for any elements 0 ≤ x, y ∈ G it follows that ‖x ∨ y‖_G = max{‖x‖_G, ‖y‖_G}.
In certain Banach lattices there may exist an element 1G > 0 called an order unit
such that for any x ∈ G there exists a 0 < λ ∈ R for which |x| ≤ λ1G – for example, in
Rd the vector (1, . . . , 1)′ is an order unit. The order unit 1G can be used to define
  ‖x‖_∞ ≡ inf{λ > 0 : |x| ≤ λ1_G},   (S.3)
which is a norm on G. In principle, ‖ · ‖∞ need not be related to the original norm
‖ · ‖G. However, if G is an AM space, then ‖ · ‖G and ‖ · ‖∞ are equivalent in that they
generate the same topology. Hence, we refer to G as an AM space with unit 1G if: (i)
G is an AM space, (ii) 1G is an order unit in G, and (iii) The norm of G equals ‖ · ‖∞.
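As a concrete illustration, R^d under the coordinatewise order with the sup norm is an AM space with unit (1, …, 1)′. The sketch below (numpy; the vectors are arbitrary test inputs, not objects from the paper) checks the defining identity ‖x ∨ y‖_∞ = max{‖x‖_∞, ‖y‖_∞} for positive elements, and notes that the identity is special to the sup norm by exhibiting its failure for the Euclidean norm.

```python
import numpy as np

rng = np.random.default_rng(0)
for _ in range(1000):
    # positive elements of the lattice R^5 under the coordinatewise order
    x, y = rng.uniform(0, 10, size=5), rng.uniform(0, 10, size=5)
    join = np.maximum(x, y)                      # x ∨ y (coordinatewise maximum)
    # AM property under the order-unit (sup) norm
    assert np.isclose(np.linalg.norm(join, np.inf),
                      max(np.linalg.norm(x, np.inf), np.linalg.norm(y, np.inf)))

# The identity can fail for the Euclidean norm: here ||x ∨ y||_2 = sqrt(2) > 1.
x, y = np.array([1.0, 0.0]), np.array([0.0, 1.0])
assert np.linalg.norm(np.maximum(x, y)) > max(np.linalg.norm(x), np.linalg.norm(y))
```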
S.2 Additional Simulations
Below we report simulation results for additional choices of (jn, kn) in the identified
design discussed in Section 5.1 of the main paper.
Imposed Shape Restriction: Monotonicity

                            Inference on Level                       Inference on Derivative
           (jn, kn):  (3,3) (3,5) (4,4) (4,6) (5,5) (5,8)    (3,3) (3,5) (4,4) (4,6) (5,5) (5,8)
 n = 500   rn = ∞     1.84  1.88  1.90  1.72  0.86  1.32     3.60  3.92  1.88  2.02  1.28  3.76
           rn = 95%   1.80  1.84  1.74  1.68  1.12  1.44     3.40  3.72  1.90  2.08  1.30  2.04
           rn = 50%   1.92  2.24  1.74  1.70  1.12  1.44     3.52  4.04  1.90  2.10  1.30  2.04
           rn = 5%    3.96  4.40  2.18  2.90  1.14  1.72     4.58  5.14  2.20  2.96  1.34  2.34
           rn = 0     5.32  4.84  5.30  5.10  3.48  4.30     5.00  5.52  4.62  4.48  3.40  4.38
 n = 1000  rn = ∞     1.30  1.42  1.56  1.82  0.84  1.08     2.90  3.06  1.68  1.94  1.18  1.24
           rn = 95%   1.20  1.32  1.52  1.84  0.90  1.10     2.88  2.96  1.64  1.86  1.24  1.18
           rn = 50%   1.44  1.96  1.52  1.86  0.90  1.10     3.24  3.60  1.64  1.86  1.20  1.18
           rn = 5%    3.00  3.72  2.02  2.84  0.92  1.32     4.22  4.70  2.06  3.06  1.22  1.46
           rn = 0     3.94  4.20  4.54  4.56  3.62  3.66     4.18  4.74  4.58  4.68  3.48  3.32
 n = 5000  rn = ∞     0.98  1.20  1.34  1.58  0.70  0.68     2.88  2.84  1.26  1.52  0.98  0.94
           rn = 95%   1.00  1.36  1.40  1.50  0.76  0.70     3.06  2.96  1.32  1.62  0.98  0.96
           rn = 50%   2.06  2.48  1.42  1.52  0.76  0.70     4.40  4.34  1.32  1.62  0.96  0.96
           rn = 5%    3.98  4.54  2.20  3.62  0.92  1.06     5.00  5.20  2.36  3.36  0.98  1.34
           rn = 0     4.74  5.20  3.98  4.56  3.18  3.52     4.94  5.34  4.68  4.50  3.72  3.76

Imposed Shape Restriction: Monotonicity & Convexity

                            Inference on Level                       Inference on Derivative
           (jn, kn):  (3,3) (3,5) (4,4) (4,6) (5,5) (5,8)    (3,3) (3,5) (4,4) (4,6) (5,5) (5,8)
 n = 500   rn = ∞     1.56  1.50  1.44  1.52  1.06  1.70     4.32  4.40  2.74  2.84  2.54  2.98
           rn = 95%   1.36  1.42  1.46  1.54  1.28  1.78     4.24  4.22  2.68  2.84  2.66  3.10
           rn = 50%   1.36  1.44  1.46  1.54  1.28  1.78     4.24  4.24  2.68  2.84  2.66  3.10
           rn = 5%    3.40  3.88  1.52  1.82  1.22  1.86     4.98  5.26  2.74  2.98  2.68  3.12
           rn = 0     5.30  4.84  5.42  5.36  4.48  5.22     5.48  5.48  5.08  4.84  4.18  4.80
 n = 1000  rn = ∞     0.78  1.10  1.40  1.54  1.18  1.32     3.28  3.40  2.26  2.32  2.12  2.02
           rn = 95%   0.98  1.06  1.36  1.44  1.22  1.30     3.30  3.42  2.04  2.26  2.32  2.08
           rn = 50%   0.98  1.12  1.36  1.44  1.22  1.30     3.30  3.46  2.04  2.26  2.32  2.08
           rn = 5%    2.70  3.62  1.44  1.86  1.22  1.42     4.18  4.68  2.14  2.38  2.32  2.10
           rn = 0     3.94  4.20  4.62  4.78  4.42  4.56     4.30  4.64  4.38  4.20  4.06  3.90
 n = 5000  rn = ∞     0.78  0.84  1.04  1.36  0.98  1.00     3.28  3.18  1.36  1.58  1.48  1.56
           rn = 95%   0.80  0.78  1.06  1.42  0.96  1.04     3.32  3.04  1.36  1.62  1.42  1.44
           rn = 50%   3.86  1.78  1.06  1.42  0.96  1.04     3.82  3.70  1.36  1.62  1.42  1.44
           rn = 5%    3.86  4.40  1.42  2.38  0.96  1.10     4.98  4.68  1.46  1.86  1.42  1.46
           rn = 0     4.74  5.20  4.10  4.74  3.78  4.22     4.78  5.02  3.98  4.06  3.38  3.34

Table 6: Empirical rejection probabilities for 5%-level tests based on I_n(R) − I_n(Θ). Vectors p_n^{j_n}(x) and q_n^{k_n}(z) constructed from piecewise quadratic continuously differentiable splines. Setting r_n to a percentile corresponds to the choice of 1 − γ_n in (42).
S.3 Rate of Convergence
Proof of Theorem 3.1: For conciseness, define η_n ≡ k_n^{1/p} √(log(1+k_n)) J_n B_n/√n and let δ_n^{-1} ≡ ν_n(η_n + τ_n). In addition, we define the event A_n ≡ A_{n1} ∩ A_{n2} ∩ A_{n3} where, for Q̄_{n,P}(θ) ≡ ‖E_P[ρ(X, θ) ∗ q_n^{k_n}(Z)]‖_{Σ_n(P),p},

  A_{n1} ≡ {Θ̂_n^r ⊆ V_n(P)}
  A_{n2} ≡ {Σ̂_n^{-1} exists and max{‖Σ̂_n^{-1}‖_{o,p}, ‖Σ̂_n‖_{o,p}, ‖Σ_n(P)^{-1}‖_{o,p}, ‖Σ_n(P)‖_{o,p}} < B}
  A_{n3} ≡ {sup_{θ∈Θ_{0n}^r(P)} Q̄_{n,P}(θ) ‖Σ̂_n − Σ_n(P)‖_{o,p} ≤ B η_n and ‖Σ̂_n − Σ_n(P)‖_{o,p} ≤ 1/(2B)}.   (S.4)
Moreover, note that for any ε > 0 and B sufficiently large we can conclude that

  lim sup_{n→∞} sup_{P∈P_0} P(A_n^c) < ε   (S.5)

due to Lemma S.3.4 and Assumptions 3.4(i), 3.4(iii), and 3.6. Therefore, we obtain that

  lim sup_{n→∞} sup_{P∈P_0} P(δ_n d⃗_H(Θ̂_n^r, Θ_{0n}^r(P), ‖·‖_E) > 2^M)
    ≤ lim sup_{n→∞} sup_{P∈P_0} P(δ_n d⃗_H(Θ̂_n^r, Θ_{0n}^r(P), ‖·‖_E) > 2^M ; A_n) + ε   (S.6)
for any M. For each P ∈ P_0, next partition V_n(P) into subsets S_{n,j}(P) defined by

  S_{n,j}(P) ≡ {θ ∈ V_n(P) : 2^{j−1} < δ_n d⃗_H(θ, Θ_{0n}^r(P), ‖·‖_E) ≤ 2^j}.

Since Θ̂_n^r ⊆ V_n(P) under A_n, it follows from the definition of Θ̂_n^r and (S.6) that

  lim sup_{n→∞} sup_{P∈P_0} P(δ_n d⃗_H(Θ̂_n^r, Θ_{0n}^r(P), ‖·‖_E) > 2^M)
    ≤ lim sup_{n→∞} sup_{P∈P_0} ∑_{j≥M} P(inf_{θ∈S_{n,j}(P)} Q̂_n(θ) ≤ inf_{θ∈Θ_n∩R} Q̂_n(θ) + τ_n ; A_n) + ε.   (S.7)
Letting Q_{n,P}(θ) ≡ ‖E_P[ρ(X, θ) ∗ q_n^{k_n}(Z)]‖_{Σ̂_n,p}, we then obtain from Lemma S.3.2 that

  inf_{θ∈Θ_n∩R} Q̂_n(θ) ≤ inf_{θ∈Θ_n∩R} Q_{n,P}(θ) + ‖Σ̂_n‖_{o,p} Z_{n,P} ≤ inf_{θ∈Θ_n∩R} Q_{n,P}(θ) + B Z_{n,P},   (S.8)

where the final inequality holds under the event A_n by (S.4). Moreover, since for any a ∈ R^{k_n} we have ‖Σ̂_n a‖_p ≤ ‖Σ̂_n Σ_n(P)^{-1}‖_{o,p} ‖Σ_n(P) a‖_p, we obtain from the inequality ‖Σ̂_n Σ_n(P)^{-1}‖_{o,p} ≤ ‖(Σ̂_n − Σ_n(P))Σ_n(P)^{-1}‖_{o,p} + 1 that under the event A_n

  inf_{θ∈Θ_n∩R} Q_{n,P}(θ) ≤ ‖Σ̂_n Σ_n(P)^{-1}‖_{o,p} inf_{θ∈Θ_n∩R} Q̄_{n,P}(θ)
    ≤ (1 + ‖Σ_n(P)^{-1}‖_{o,p} ‖Σ̂_n − Σ_n(P)‖_{o,p}) inf_{θ∈Θ_n∩R} Q̄_{n,P}(θ)
    ≤ inf_{θ∈Θ_n∩R} Q̄_{n,P}(θ) + B² η_n,   (S.9)

where Q̄_{n,P}(θ) ≡ ‖E_P[ρ(X, θ) ∗ q_n^{k_n}(Z)]‖_{Σ_n(P),p}.
In addition, note that by similar arguments we also obtain from Lemma S.3.2 and ‖Σ_n(P) a‖_p ≤ ‖Σ_n(P) Σ̂_n^{-1}‖_{o,p} ‖Σ̂_n a‖_p that under the event A_n we have

  inf_{θ∈S_{n,j}(P)} Q̂_n(θ) ≥ inf_{θ∈S_{n,j}(P)} Q_{n,P}(θ) − ‖Σ̂_n‖_{o,p} Z_{n,P}
    ≥ ‖Σ_n(P) Σ̂_n^{-1}‖_{o,p}^{-1} inf_{θ∈S_{n,j}(P)} Q̄_{n,P}(θ) − B Z_{n,P}.   (S.10)

Next, we note that the triangle inequality, ‖(Σ_n(P) − Σ̂_n)Σ̂_n^{-1}‖_{o,p} ≤ ‖Σ̂_n^{-1}‖_{o,p} ‖Σ̂_n − Σ_n(P)‖_{o,p}, and ‖Σ̂_n^{-1}‖_{o,p} ≤ B under the event A_n by (S.4) yield the inequality

  ‖Σ_n(P) Σ̂_n^{-1}‖_{o,p}^{-1} − 1 ≥ (‖(Σ_n(P) − Σ̂_n)Σ̂_n^{-1}‖_{o,p} + 1)^{-1} − 1
    ≥ −‖(Σ_n(P) − Σ̂_n)Σ̂_n^{-1}‖_{o,p} ≥ −B ‖Σ̂_n − Σ_n(P)‖_{o,p}.   (S.11)

Therefore, combining results (S.10) and (S.11), together with Assumption 3.6(i) and the definition of S_{n,j}(P), we obtain for B sufficiently large that under the event A_n we have

  inf_{θ∈S_{n,j}(P)} Q̂_n(θ) ≥ (1 − B‖Σ̂_n − Σ_n(P)‖_{o,p}) × inf_{θ∈S_{n,j}(P)} Q̄_{n,P}(θ) − B Z_{n,P}
    ≥ (1 − B‖Σ̂_n − Σ_n(P)‖_{o,p}) (sup_{θ∈Θ_{0n}^r(P)} Q̄_{n,P}(θ) + 2^{j−1}/(ν_n δ_n) − B η_n) − B Z_{n,P}
    ≥ inf_{θ∈Θ_n∩R} Q̄_{n,P}(θ) + 2^{j−2}/(ν_n δ_n) − B(Z_{n,P} + 2B η_n),   (S.12)
where in the final inequality we employed Assumption 3.5 and the definition of A_n in (S.4). Hence, combining results (S.7), (S.8), (S.9), and (S.12) allows us to conclude

  lim sup_{M↑∞} lim sup_{n→∞} sup_{P∈P_0} P(δ_n d⃗_H(Θ̂_n^r, Θ_{0n}^r(P), ‖·‖_E) > 2^M)
    ≤ lim sup_{M↑∞} lim sup_{n→∞} sup_{P∈P_0} ∑_{j≥M} P(2^{j−2}/(ν_n δ_n) ≤ 3B(B η_n + Z_{n,P}) + τ_n ; A_n) + ε
    ≤ lim sup_{M↑∞} lim sup_{n→∞} sup_{P∈P_0} ∑_{j≥M} P(2^{j−3}(η_n + τ_n) ≤ 3B Z_{n,P}) + ε,   (S.13)

where in the final inequality we employed that we had defined δ_n^{-1} ≡ ν_n(η_n + τ_n). Therefore, Z_{n,P} ∈ R_+, Lemma S.3.2, τ_n ≥ 0, and Markov's inequality yield

  lim sup_{M↑∞} lim sup_{n→∞} sup_{P∈P_0} ∑_{j≥M} P(2^{j−3}(η_n + τ_n) ≤ 3B Z_{n,P})
    ≲ lim sup_{M↑∞} lim sup_{n→∞} ∑_{j≥M} 2^{−j} × (1/η_n) × k_n^{1/p} √(log(1+k_n)) J_n B_n/√n = 0,   (S.14)

where in the final result we employed that η_n ≡ k_n^{1/p} √(log(1+k_n)) J_n B_n/√n. The first claim of the theorem therefore follows from (S.13) and (S.14) and ε being arbitrary.
To establish the second claim of the theorem, next define the event A_{n4} ≡ {Θ_{0n}^r(P) ⊆ Θ̂_n^r}. Since d⃗_H(Θ_{0n}^r(P), Θ̂_n^r, ‖·‖_E) = 0 whenever the event A_{n4} occurs, we can conclude from Lemma S.3.1(ii) and part (i) of this theorem that

  lim sup_{M↑∞} lim sup_{n→∞} sup_{P∈P_0} P(δ_n d_H(Θ̂_n^r, Θ_{0n}^r(P), ‖·‖_E) > 2^M)
    = lim sup_{M↑∞} lim sup_{n→∞} sup_{P∈P_0} P(δ_n d⃗_H(Θ̂_n^r, Θ_{0n}^r(P), ‖·‖_E) > 2^M) = 0,   (S.15)

and thus the theorem follows from δ_n^{-1} = ν_n(η_n + τ_n) and η_n = o(τ_n).
Lemma S.3.1. Let Assumptions 3.2, 3.3(i), 3.3(iii), 3.4, and 3.6(ii) hold, let ‖·‖_A be a norm on B_n, and for some ε > 0 let V_n(P) ≡ {θ ∈ Θ_n∩R : d⃗_H(θ, Θ_{0n}^r(P), ‖·‖_A) ≤ ε} and, for Q̄_{n,P}(θ) ≡ ‖E_P[ρ(X, θ) ∗ q_n^{k_n}(Z)]‖_{Σ_n(P),p},

  S_n(ε) ≡ inf_{P∈P_0} {inf_{θ∈(Θ_n∩R)\V_n(P)} Q̄_{n,P}(θ) − inf_{θ∈Θ_n∩R} Q̄_{n,P}(θ)}.

(i) If η_n ∨ τ_n = o(S_n(ε)) for η_n ≡ k_n^{1/p} √(log(1+k_n)) J_n B_n/√n, then Θ̂_n^r ⊆ V_n(P) with probability tending to one uniformly in P ∈ P_0. (ii) If Assumption 3.5 holds and η_n = o(τ_n), then Θ_{0n}^r(P) ⊆ Θ̂_n^r with probability tending to one uniformly in P ∈ P_0.
Proof: For a given ε > 0 first notice that by definition of Θ̂_n^r and V_n(P) we have

  P(d⃗_H(Θ̂_n^r, Θ_{0n}^r(P), ‖·‖_A) > ε) ≤ P(inf_{θ∈(Θ_n∩R)\V_n(P)} Q̂_n(θ) ≤ inf_{θ∈Θ_n∩R} Q̂_n(θ) + τ_n)   (S.16)

for all P ∈ P_0. For conciseness let η_n ≡ k_n^{1/p} √(log(1+k_n)) J_n B_n/√n and, setting Q_{n,P}(θ) ≡ ‖E_P[ρ(X, θ) ∗ q_n^{k_n}(Z)]‖_{Σ̂_n,p}, note that Lemmas S.3.2 and S.3.4 imply

  inf_{θ∈(Θ_n∩R)\V_n(P)} Q_{n,P}(θ) ≤ inf_{θ∈(Θ_n∩R)\V_n(P)} Q̂_n(θ) + O_P(η_n)   (S.17)

uniformly in P ∈ P_0. In addition, by similar arguments we obtain uniformly in P ∈ P_0

  inf_{θ∈Θ_n∩R} Q̂_n(θ) ≤ inf_{θ∈Θ_n∩R} Q_{n,P}(θ) + O_P(η_n).   (S.18)
Next note that for any a ∈ R^{k_n} we have ‖Σ_n(P) a‖_p ≤ ‖Σ_n(P) Σ̂_n^{-1}‖_{o,p} ‖Σ̂_n a‖_p, and therefore from the definition of S_n(ε) we obtain that

  inf_{θ∈(Θ_n∩R)\V_n(P)} Q_{n,P}(θ) ≥ ‖Σ_n(P) Σ̂_n^{-1}‖_{o,p}^{-1} inf_{θ∈(Θ_n∩R)\V_n(P)} Q̄_{n,P}(θ)
    ≥ ‖Σ_n(P) Σ̂_n^{-1}‖_{o,p}^{-1} S_n(ε) + ‖Σ_n(P) Σ̂_n^{-1}‖_{o,p}^{-1} inf_{θ∈Θ_n∩R} Q̄_{n,P}(θ).   (S.19)

Similarly, employing that ‖Σ̂_n a‖_p ≤ ‖Σ̂_n Σ_n(P)^{-1}‖_{o,p} ‖Σ_n(P) a‖_p, we can conclude

  inf_{θ∈Θ_n∩R} Q_{n,P}(θ) − ‖Σ_n(P) Σ̂_n^{-1}‖_{o,p}^{-1} inf_{θ∈Θ_n∩R} Q̄_{n,P}(θ)
    ≤ (‖Σ̂_n Σ_n(P)^{-1}‖_{o,p} − ‖Σ_n(P) Σ̂_n^{-1}‖_{o,p}^{-1}) inf_{θ∈Θ_n∩R} Q̄_{n,P}(θ).   (S.20)

For I_n the k_n × k_n identity matrix, then note that ‖I_n‖_{o,p} = 1 implies the bound

  |‖Σ_n(P) Σ̂_n^{-1}‖_{o,p} − 1| = |‖Σ_n(P) Σ̂_n^{-1}‖_{o,p} − ‖I_n‖_{o,p}| ≤ ‖(Σ_n(P) − Σ̂_n)Σ̂_n^{-1}‖_{o,p}
    ≤ ‖Σ̂_n^{-1}‖_{o,p} ‖Σ_n(P) − Σ̂_n‖_{o,p} = O_P(‖Σ_n(P) − Σ̂_n‖_{o,p}),   (S.21)

where the final equality holds uniformly in P ∈ P_0 by Lemma S.3.4. By identical arguments it follows that |‖Σ̂_n Σ_n(P)^{-1}‖_{o,p} − 1| = O_P(‖Σ̂_n − Σ_n(P)‖_{o,p}) uniformly in P ∈ P_0, and therefore result (S.20) and Assumption 3.6(ii) allow us to conclude that

  inf_{θ∈Θ_n∩R} Q_{n,P}(θ) − ‖Σ_n(P) Σ̂_n^{-1}‖_{o,p}^{-1} inf_{θ∈Θ_n∩R} Q̄_{n,P}(θ) ≤ O_P(η_n)   (S.22)

uniformly in P ∈ P_0. Therefore, (S.16), (S.17), (S.18), (S.19), and (S.22) imply

  lim sup_{n→∞} sup_{P∈P_0} P(d⃗_H(Θ̂_n^r, Θ_{0n}^r(P), ‖·‖_A) > ε)
    ≤ lim sup_{M↑∞} lim sup_{n→∞} sup_{P∈P_0} P(S_n(ε) ≤ ‖Σ_n(P)‖_{o,p} ‖Σ̂_n‖_{o,p} M(η_n + τ_n)) = 0,

where the equality follows from Lemma S.3.4, Assumption 3.4(iii), and η_n ∨ τ_n = o(S_n(ε)) by hypothesis. Part (i) of the lemma then follows by definition of V_n(P).
In order to establish part (ii) of the lemma, note the definition of Θ̂_n^r implies

  P(Θ_{0n}^r(P) ⊆ Θ̂_n^r) ≥ P(sup_{θ∈Θ_{0n}^r(P)} Q̂_n(θ) ≤ inf_{θ∈Θ_n∩R} Q̂_n(θ) + τ_n)   (S.23)

for all P ∈ P_0. Moreover, applying Lemma S.3.2 and ‖Σ̂_n a‖_p ≤ ‖Σ̂_n Σ_n(P)^{-1}‖_{o,p} ‖Σ_n(P) a‖_p for any a ∈ R^{k_n}, we obtain that uniformly in P ∈ P_0 we have

  sup_{θ∈Θ_{0n}^r(P)} Q̂_n(θ) ≤ sup_{θ∈Θ_{0n}^r(P)} Q_{n,P}(θ) + O_P(η_n)
    ≤ ‖Σ̂_n Σ_n(P)^{-1}‖_{o,p} sup_{θ∈Θ_{0n}^r(P)} Q̄_{n,P}(θ) + O_P(η_n) = inf_{θ∈Θ_n∩R} Q̄_{n,P}(θ) + O_P(η_n),   (S.24)

where the final equality follows from Assumption 3.5, identical arguments to those in (S.21) implying |‖Σ̂_n Σ_n(P)^{-1}‖_{o,p} − 1| = O_P(‖Σ̂_n − Σ_n(P)‖_{o,p}) uniformly in P ∈ P, and Assumption 3.6(ii). Similarly, Lemma S.3.2, ‖Σ_n(P) a‖_p ≤ ‖Σ_n(P) Σ̂_n^{-1}‖_{o,p} ‖Σ̂_n a‖_p for any a ∈ R^{k_n}, Assumption 3.6(ii), and result (S.21) imply

  inf_{θ∈Θ_n∩R} Q̂_n(θ) ≥ inf_{θ∈Θ_n∩R} Q_{n,P}(θ) − O_P(η_n)
    ≥ ‖Σ_n(P) Σ̂_n^{-1}‖_{o,p}^{-1} inf_{θ∈Θ_n∩R} Q̄_{n,P}(θ) − O_P(η_n) = inf_{θ∈Θ_n∩R} Q̄_{n,P}(θ) − O_P(η_n)   (S.25)

uniformly in P ∈ P_0. Part (ii) of the lemma thus follows from results (S.23), (S.24), (S.25), and η_n = o(τ_n) by hypothesis.
Lemma S.3.2. Let Q_{n,P}(θ) ≡ ‖E_P[ρ(X, θ) ∗ q_n^{k_n}(Z)]‖_{Σ̂_n,p}, and let Assumptions 3.2, 3.3(i), and 3.3(iii) hold. Then, for each P ∈ P there are random variables Z_{n,P} ∈ R_+ with

  |Q̂_n(θ) − Q_{n,P}(θ)| ≤ ‖Σ̂_n‖_{o,p} × Z_{n,P}

for all θ ∈ Θ_n∩R, and in addition sup_{P∈P} E_P[Z_{n,P}] = O(k_n^{1/p} √(log(1+k_n)) J_n B_n/√n).

Proof: Let 𝒢_n ≡ {f q_{k,n,ℓ} : f ∈ 𝓕_n, 1 ≤ ℓ ≤ J, and 1 ≤ k ≤ k_{n,ℓ}}. Note that by Assumption 3.3(i), ‖q_{k,n,ℓ}‖_∞ ≤ B_n for all 1 ≤ ℓ ≤ J and 1 ≤ k ≤ k_{n,ℓ}. Hence, letting F_n be the envelope for 𝓕_n, as in Assumption 3.3(iii), it follows that G_n ≡ B_n F_n is an envelope for 𝒢_n satisfying sup_{P∈P} E_P[G_n²(V)] < ∞. Thus, we obtain

  sup_{P∈P} E_P[sup_{g∈𝒢_n} |𝔾_{n,P} g|] ≲ sup_{P∈P} J_{[ ]}(‖G_n‖_{P,2}, 𝒢_n, ‖·‖_{P,2})   (S.26)

by Theorem 2.14.2 in van der Vaart and Wellner (1996). Moreover, also notice that Lemma S.3.3, the change of variables u = ε/B_n, and B_n ≥ 1 imply

  sup_{P∈P} J_{[ ]}(‖G_n‖_{P,2}, 𝒢_n, ‖·‖_{P,2}) ≤ sup_{P∈P} ∫_0^{‖G_n‖_{P,2}} √(1 + log(k_n N_{[ ]}(ε/B_n, 𝓕_n, ‖·‖_{P,2}))) dε
    ≤ (1 + √(log(k_n))) B_n × sup_{P∈P} J_{[ ]}(‖F_n‖_{P,2}, 𝓕_n, ‖·‖_{P,2}) = O(√(log(1+k_n)) B_n J_n),   (S.27)

where the final equality follows from Assumption 3.3(iii). Next define Z_{n,P} ∈ R_+ by

  Z_{n,P} ≡ (k_n^{1/p}/√n) × sup_{g∈𝒢_n} |𝔾_{n,P} g|

and note (S.26) and (S.27) imply sup_{P∈P} E_P[Z_{n,P}] = O(k_n^{1/p} √(log(1+k_n)) B_n J_n/√n) as desired. Since we also have that ‖𝔾_{n,P} ρ(·, θ) ∗ q_n^{k_n}‖_p ≤ k_n^{1/p} × sup_{g∈𝒢_n} |𝔾_{n,P} g| for all θ ∈ Θ_n∩R by definition of 𝒢_n, we can in turn conclude by direct calculation

  |Q̂_n(θ) − Q_{n,P}(θ)| ≤ (‖Σ̂_n‖_{o,p}/√n) × ‖𝔾_{n,P} ρ(·, θ) ∗ q_n^{k_n}‖_p ≤ ‖Σ̂_n‖_{o,p} × Z_{n,P},

which establishes the claim of the lemma.
Lemma S.3.3. Let {g_j}_{j=1}^J be a finite set of functions satisfying sup_{P∈P} ‖g_j‖_{P,∞} ≤ C for all 1 ≤ j ≤ J and some C < ∞. Defining the class of functions

  𝒢_n ≡ {f g_j : f ∈ 𝓕_n, 1 ≤ j ≤ J},

it then follows that N_{[ ]}(ε, 𝒢_n, ‖·‖_{P,2}) ≤ J × N_{[ ]}(ε/C, 𝓕_n, ‖·‖_{P,2}) for all P ∈ P.

Proof: First define g_j^+ ≡ g_j ∨ 0 and g_j^- ≡ g_j ∧ 0, where ∨ and ∧ denote the pointwise maximum and minimum. If {[f_{i,l,P}, f_{i,u,P}]}_i is a collection of brackets covering 𝓕_n and satisfying

  ∫ (f_{i,u,P} − f_{i,l,P})² dP ≤ ε²   (S.28)

for all i, then it follows that the following collection of brackets covers the class 𝒢_n:

  {[g_j^+ f_{i,l,P} + g_j^- f_{i,u,P}, g_j^- f_{i,l,P} + g_j^+ f_{i,u,P}]}_{i,j}.   (S.29)

Moreover, since |g_j| = g_j^+ − g_j^- by construction, we also obtain from (S.28) that

  ∫ (g_j^+ f_{i,u,P} + g_j^- f_{i,l,P} − g_j^+ f_{i,l,P} − g_j^- f_{i,u,P})² dP = ∫ (f_{i,u,P} − f_{i,l,P})² |g_j|² dP ≤ ε² C².   (S.30)

Since there are J × N_{[ ]}(ε, 𝓕_n, ‖·‖_{P,2}) brackets in (S.29), each of ‖·‖_{P,2}-size at most εC by (S.30), rescaling ε yields

  N_{[ ]}(ε, 𝒢_n, ‖·‖_{P,2}) ≤ J × N_{[ ]}(ε/C, 𝓕_n, ‖·‖_{P,2})

for all P ∈ P, which establishes the claim of the lemma.
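The bracket construction in (S.29) can be sanity-checked numerically: splitting g into its positive and negative parts and pairing them with the endpoints of a bracket for f yields pointwise bounds on fg whose width is |g|(f_u − f_l), as in (S.30). The sketch below (numpy; all arrays are arbitrary test data, not objects from the paper) verifies the coverage and the size bound on a grid.

```python
import numpy as np

x = np.linspace(0, 1, 200)               # evaluation grid standing in for the domain
f = np.sin(7 * x)                        # a function in the class
f_l, f_u = f - 0.1, f + 0.1              # a bracket [f_l, f_u] containing f
g = 2.0 * np.cos(5 * x)                  # bounded multiplier with |g| <= C = 2
g_pos, g_neg = np.maximum(g, 0), np.minimum(g, 0)   # g = g_pos + g_neg, as g^+ and g^-

# Bracket for f*g as in (S.29): lower = g^+ f_l + g^- f_u, upper = g^- f_l + g^+ f_u
lower = g_pos * f_l + g_neg * f_u
upper = g_neg * f_l + g_pos * f_u
assert np.all(lower <= f * g + 1e-12) and np.all(f * g <= upper + 1e-12)

# Size bound as in (S.30): upper - lower = |g| (f_u - f_l) <= C (f_u - f_l)
C = 2.0
assert np.allclose(upper - lower, np.abs(g) * (f_u - f_l))
assert np.all(upper - lower <= C * (f_u - f_l) + 1e-12)
```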
Lemma S.3.4. If Assumption 3.4 holds, then there exists a constant B < ∞ such that

  lim inf_{n→∞} inf_{P∈P} P(Σ̂_n^{-1} exists and max{‖Σ̂_n‖_{o,p}, ‖Σ̂_n^{-1}‖_{o,p}} < B) = 1.

Proof: First note that by Assumption 3.4(iii) there exists a B < ∞ such that

  sup_{n≥1} sup_{P∈P} max{‖Σ_n(P)‖_{o,p}, ‖Σ_n(P)^{-1}‖_{o,p}} < B/2.   (S.31)

Next, let I_n denote the k_n × k_n identity matrix and for each P ∈ P rewrite Σ̂_n as

  Σ̂_n = Σ_n(P){I_n − Σ_n(P)^{-1}(Σ_n(P) − Σ̂_n)}.   (S.32)

By Theorem 2.9 in Kress (1999), the matrix I_n − Σ_n(P)^{-1}(Σ_n(P) − Σ̂_n) is invertible and the operator norm of its inverse is bounded by two whenever ‖Σ_n(P)^{-1}(Σ_n(P) − Σ̂_n)‖_{o,p} < 1/2. Since by Assumption 3.4(ii) and the equality in (S.32) it follows that Σ̂_n is invertible if and only if I_n − Σ_n(P)^{-1}(Σ_n(P) − Σ̂_n) is invertible, we obtain that

  P(Σ̂_n^{-1} exists and ‖{I_n − Σ_n(P)^{-1}(Σ_n(P) − Σ̂_n)}^{-1}‖_{o,p} < 2)
    ≥ P(‖Σ_n(P)^{-1}(Σ̂_n − Σ_n(P))‖_{o,p} < 1/2) ≥ P(‖Σ̂_n − Σ_n(P)‖_{o,p} < 1/B),   (S.33)

where we employed ‖Σ_n(P)^{-1}(Σ̂_n − Σ_n(P))‖_{o,p} ≤ ‖Σ_n(P)^{-1}‖_{o,p} ‖Σ̂_n − Σ_n(P)‖_{o,p} and (S.31). Hence, since ‖Σ_n(P)‖_{o,p} < B/2 for all P ∈ P and n, (S.32) and (S.33) yield

  P(Σ̂_n^{-1} exists and ‖Σ̂_n^{-1}‖_{o,p} < B) ≥ P(‖Σ̂_n − Σ_n(P)‖_{o,p} < 1/B).   (S.34)

Finally, since ‖Σ̂_n‖_{o,p} ≤ B/2 + ‖Σ̂_n − Σ_n(P)‖_{o,p} by (S.31), result (S.34) implies that

  lim inf_{n→∞} inf_{P∈P} P(Σ̂_n^{-1} exists and max{‖Σ̂_n‖_{o,p}, ‖Σ̂_n^{-1}‖_{o,p}} < B)
    ≥ lim inf_{n→∞} inf_{P∈P} P(‖Σ̂_n − Σ_n(P)‖_{o,p} < min{B/2, 1/B}) = 1,

where the equality, and hence the lemma, follows from Assumption 3.4(i).
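The Neumann-series step behind (S.32)–(S.33) is easy to illustrate numerically: when ‖Σ(P)^{-1}(Σ(P) − Σ̂)‖ < 1/2 in operator norm, Σ̂ is invertible and the inverse of I − Σ(P)^{-1}(Σ(P) − Σ̂) has operator norm at most two. A sketch with the spectral norm (p = 2; the matrices below are arbitrary test inputs, not estimators from the paper):

```python
import numpy as np

rng = np.random.default_rng(2)
k = 6
sigma_p = np.eye(k) + 0.1 * rng.standard_normal((k, k))   # invertible "population" matrix
sigma_hat = sigma_p + 0.01 * rng.standard_normal((k, k))  # a small perturbation of it

delta = np.linalg.inv(sigma_p) @ (sigma_p - sigma_hat)    # Σ(P)^{-1}(Σ(P) − Σ̂)
r = np.linalg.norm(delta, 2)                              # operator (spectral) norm
assert r < 0.5

# Neumann bound: ||(I − Δ)^{-1}|| <= 1/(1 − ||Δ||) <= 2, so Σ̂ = Σ(P)(I − Δ) is invertible
inv_factor = np.linalg.inv(np.eye(k) - delta)
assert np.linalg.norm(inv_factor, 2) <= 1.0 / (1.0 - r) + 1e-9
assert np.linalg.norm(inv_factor, 2) <= 2.0

# and, by submultiplicativity, ||Σ̂^{-1}|| <= ||(I − Δ)^{-1}|| ||Σ(P)^{-1}||
assert (np.linalg.norm(np.linalg.inv(sigma_hat), 2)
        <= np.linalg.norm(inv_factor, 2) * np.linalg.norm(np.linalg.inv(sigma_p), 2) + 1e-9)
```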
Lemma S.3.5. If a ∈ R^d, then ‖a‖_p ≤ d^{(1/p − 1/p̃)_+} ‖a‖_{p̃} for any p, p̃ ∈ [2, ∞].

Proof: The case p̃ ≤ p trivially follows from ‖a‖_p ≤ ‖a‖_{p̃} for all a ∈ R^d. For the case p̃ > p, let a = (a^{(1)}, …, a^{(d)})′ and note that by Hölder's inequality we obtain

  ‖a‖_p^p = ∑_{i=1}^d |a^{(i)}|^p × 1 ≤ (∑_{i=1}^d (|a^{(i)}|^p)^{p̃/p})^{p/p̃} (∑_{i=1}^d 1^{p̃/(p̃−p)})^{1−p/p̃} = (∑_{i=1}^d |a^{(i)}|^{p̃})^{p/p̃} d^{1−p/p̃}.   (S.35)

Thus, the claim of the lemma for p̃ > p follows from taking the 1/p power in (S.35).
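A quick numeric check of Lemma S.3.5 (numpy; random test vectors and a few arbitrary (p, p̃) pairs, with the convention 1/∞ = 0):

```python
import numpy as np

rng = np.random.default_rng(3)
d = 10
for _ in range(500):
    a = rng.standard_normal(d)
    for p, p_t in [(2, 4), (2, np.inf), (3, 7), (4, 2), (np.inf, 2)]:
        # exponent (1/p - 1/p~)_+ from the lemma; 1.0/np.inf evaluates to 0.0
        exponent = max(1.0 / p - 1.0 / p_t, 0.0)
        bound = d ** exponent * np.linalg.norm(a, p_t)
        assert np.linalg.norm(a, p) <= bound + 1e-9
```

For a = (1, …, 1)′ ∈ R^d and (p, p̃) = (2, ∞) the bound holds with equality, so the constant d^{(1/p − 1/p̃)_+} cannot be improved.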
S.4 Strong Approximation
Proof of Theorem 3.2: Let Pf ≡ E_P[f(V)] and note by Assumption 3.4(iii) there is a C_0 < ∞ such that ‖Σ_n(P)‖_{o,p} ≤ C_0 for all P ∈ P_0. Therefore, Assumption 3.10(ii) and Lemma S.3.5 imply that for all P ∈ P_0, θ_0 ∈ Θ_{0n}^r(P), and h/√n ∈ V_n(θ_0, ℓ_n)

  ‖√n Pρ(·, θ_0 + h/√n) ∗ q_n^{k_n} − D_{n,P}(θ_0)[h]‖_{Σ_n(P),p}
    ≤ C_0 ‖√n P(ρ(·, θ_0 + h/√n) − ρ(·, θ_0)) ∗ q_n^{k_n} − D_{n,P}(θ_0)[h]‖_2 + o(a_n).   (S.36)

Moreover, Lemma S.4.5 and the maps m_{P,ℓ} satisfying Assumption 3.9(i) imply that

  ∑_{ℓ=1}^J ∑_{k=1}^{k_{n,ℓ}} ⟨√n {m_{P,ℓ}(θ_0 + h/√n) − m_{P,ℓ}(θ_0)} − ∇m_{P,ℓ}(θ_0)[h], q_{k,n,ℓ}⟩²_{L²_P}
    ≤ ∑_{ℓ=1}^J C_1 ‖√n {m_{P,ℓ}(θ_0 + h/√n) − m_{P,ℓ}(θ_0) − ∇m_{P,ℓ}(θ_0)[h/√n]}‖²_{P,2}
    ≤ ∑_{ℓ=1}^J C_1 K_m² × n × ‖h/√n‖²_L × ‖h/√n‖²_E   (S.37)

for some constant C_1 < ∞ and all P ∈ P_0, θ_0 ∈ Θ_{0n}^r(P), and h/√n ∈ V_n(θ_0, ℓ_n). Therefore, by results (S.36) and (S.37), the law of iterated expectations, the definition of S_n(L,E), and K_m ℓ_n² × S_n(L,E) = o(a_n n^{-1/2}) by hypothesis, we obtain

  sup_{P∈P_0} sup_{θ_0∈Θ_{0n}^r(P)} sup_{h/√n∈V_n(θ_0,ℓ_n)} ‖√n Pρ(·, θ_0 + h/√n) ∗ q_n^{k_n} − D_{n,P}(θ_0)[h]‖_{Σ_n(P),p}
    ≲ K_m × √n ℓ_n² × S_n(L,E) + o(a_n) = o(a_n).   (S.38)
Next, note that since k_n^{1/p} √(log(1+k_n)) B_n × sup_{P∈P} J_{[ ]}(ℓ_n^{κ_ρ}, 𝓕_n, ‖·‖_{P,2}) = o(a_n), Assumption 3.10(i) implies there is a sequence ℓ̃_n satisfying the conditions of Lemma S.4.1 and ℓ_n = o(ℓ̃_n). Therefore, applying Lemma S.4.1 we obtain uniformly in P ∈ P_0

  I_n(R) = inf_{θ_0∈Θ_{0n}^r(P)} inf_{h/√n∈V_n(θ_0,ℓ̃_n)} ‖W_{n,P} ρ(·, θ_0) ∗ q_n^{k_n} + √n Pρ(·, θ_0 + h/√n) ∗ q_n^{k_n}‖_{Σ_n(P),p} + o_P(a_n).   (S.39)

Moreover, since ℓ_n = o(ℓ̃_n) implies V_n(θ, ℓ_n) ⊆ V_n(θ, ℓ̃_n) for all θ ∈ Θ_n∩R, we have

  inf_{θ_0∈Θ_{0n}^r(P)} inf_{h/√n∈V_n(θ_0,ℓ̃_n)} ‖W_{n,P} ρ(·, θ_0) ∗ q_n^{k_n} + √n Pρ(·, θ_0 + h/√n) ∗ q_n^{k_n}‖_{Σ_n(P),p}
    ≤ inf_{θ_0∈Θ_{0n}^r(P)} inf_{h/√n∈V_n(θ_0,ℓ_n)} ‖W_{n,P} ρ(·, θ_0) ∗ q_n^{k_n} + √n Pρ(·, θ_0 + h/√n) ∗ q_n^{k_n}‖_{Σ_n(P),p}
    = inf_{θ_0∈Θ_{0n}^r(P)} inf_{h/√n∈V_n(θ_0,ℓ_n)} ‖W_{n,P} ρ(·, θ_0) ∗ q_n^{k_n} + D_{n,P}(θ_0)[h]‖_{Σ_n(P),p} + o(a_n)   (S.40)

uniformly in P ∈ P_0, with the final equality following from (S.38), Assumption 3.4(iii), and Lemma S.4.6. Thus, the first claim of the theorem follows from (S.39) and (S.40), while the second follows by noting that if K_m R_n² × S_n(L,E) = o(a_n n^{-1/2}), then we may set ℓ_n to simultaneously satisfy the conditions of Lemma S.4.1 and K_m ℓ_n² × S_n(L,E) = o(a_n n^{-1/2}), which obviates the need to introduce ℓ̃_n in (S.39) and (S.40).
Lemma S.4.1. Let Assumptions 3.2, 3.3(i), 3.3(iii), 3.4, 3.6, 3.7, 3.8, and 3.10 hold, and define Pf ≡ E_P[f(V)]. Then, for any sequence ℓ_n satisfying R_n = o(ℓ_n) and k_n^{1/p} √(log(1+k_n)) B_n sup_{P∈P} J_{[ ]}(ℓ_n^{κ_ρ}, 𝓕_n, ‖·‖_{P,2}) = o(a_n), we have uniformly in P ∈ P_0:

  I_n(R) = inf_{θ_0∈Θ_{0n}^r(P)} inf_{h/√n∈V_n(θ_0,ℓ_n)} ‖W_{n,P} ρ(·, θ_0) ∗ q_n^{k_n} + √n Pρ(·, θ_0 + h/√n) ∗ q_n^{k_n}‖_{Σ_n(P),p} + o_P(a_n).
Proof: First note that the existence of the required sequence ℓ_n is guaranteed by Assumption 3.10(i). Next, for η_n = o(a_n ∧ R_n/ν_n) let θ̂_n ∈ Θ_n∩R satisfy

  Q̂_n(θ̂_n) ≤ inf_{θ∈Θ_n∩R} Q̂_n(θ) + η_n.   (S.41)

Note Assumption 3.10(ii) implies Assumption 3.5, and hence Assumptions 3.10(ii) and 3.6 allow us to apply Theorem 3.1 with τ_n ≡ η_n and employ ν_n η_n = o(R_n) to obtain

  d⃗_H(θ̂_n, Θ_{0n}^r(P), ‖·‖_E) = O_P(R_n)   (S.42)

uniformly in P ∈ P_0. Hence, defining for each P ∈ P_0 the shrinking neighborhood (Θ_{0n}^r(P))^{ℓ_n} ≡ {θ ∈ Θ_n∩R : d⃗_H(θ, Θ_{0n}^r(P), ‖·‖_E) ≤ ℓ_n}, we obtain

  I_n(R) = inf_{θ∈(Θ_{0n}^r(P))^{ℓ_n}} √n Q̂_n(θ) + o_P(a_n)   (S.43)

uniformly in P ∈ P_0 due to R_n = o(ℓ_n), η_n = o(a_n), and results (S.41) and (S.42). Next, note that by Assumption 3.7 and Lemmas S.3.4 and S.4.6 it follows that

  |inf_{θ∈(Θ_{0n}^r(P))^{ℓ_n}} √n Q̂_n(θ) − inf_{θ∈(Θ_{0n}^r(P))^{ℓ_n}} ‖W_{n,P} ρ(·, θ) ∗ q_n^{k_n} + √n Pρ(·, θ) ∗ q_n^{k_n}‖_{Σ̂_n,p}|
    ≤ ‖Σ̂_n‖_{o,p} × sup_{θ∈Θ_n∩R} ‖𝔾_{n,P} ρ(·, θ) ∗ q_n^{k_n} − W_{n,P} ρ(·, θ) ∗ q_n^{k_n}‖_p = o_P(a_n)   (S.44)

uniformly in P ∈ P_0. Similarly, employing Lemmas S.3.4, S.4.2, S.4.6, and ℓ_n satisfying k_n^{1/p} √(log(1+k_n)) B_n × sup_{P∈P} J_{[ ]}(ℓ_n^{κ_ρ}, 𝓕_n, ‖·‖_{P,2}) = o(a_n) by hypothesis yields

  inf_{θ_0∈Θ_{0n}^r(P)} inf_{h/√n∈V_n(θ_0,ℓ_n)} ‖W_{n,P} ρ(·, θ_0 + h/√n) ∗ q_n^{k_n} + √n Pρ(·, θ_0 + h/√n) ∗ q_n^{k_n}‖_{Σ̂_n,p}
    = inf_{θ_0∈Θ_{0n}^r(P)} inf_{h/√n∈V_n(θ_0,ℓ_n)} ‖W_{n,P} ρ(·, θ_0) ∗ q_n^{k_n} + √n Pρ(·, θ_0 + h/√n) ∗ q_n^{k_n}‖_{Σ̂_n,p} + o_P(a_n)

uniformly in P ∈ P_0, which together with (S.43), (S.44), and Lemma S.4.3 establish the claim of the lemma.
Lemma S.4.2. Let Assumptions 3.3(i) and 3.8 hold. If δ_n is a sequence satisfying k_n^{1/p} √(log(1+k_n)) B_n × sup_{P∈P} J_{[ ]}(δ_n^{κ_ρ}, 𝓕_n, ‖·‖_{P,2}) = o(a_n), then uniformly in P ∈ P:

  sup_{θ_0∈Θ_{0n}^r(P)} sup_{h/√n∈V_n(θ_0,δ_n)} ‖W_{n,P} ρ(·, θ_0 + h/√n) ∗ q_n^{k_n} − W_{n,P} ρ(·, θ_0) ∗ q_n^{k_n}‖_p = o_P(a_n).
Proof: Since ‖q_{k,n,ℓ}‖_∞ ≤ B_n for all 1 ≤ ℓ ≤ J and 1 ≤ k ≤ k_{n,ℓ} by Assumption 3.3(i), Assumption 3.8 yields for any P ∈ P, θ ∈ Θ_n∩R, and h/√n ∈ V_n(θ, δ_n) that

  E_P[‖ρ(X, θ + h/√n) − ρ(X, θ)‖²₂ q²_{k,n,ℓ}(Z)] ≤ K_ρ² B_n² ‖h/√n‖_E^{2κ_ρ} ≤ K_ρ² B_n² δ_n^{2κ_ρ}.   (S.45)

Next, define the class of functions 𝒢_n ≡ {f q_{k,n,ℓ} for some f ∈ 𝓕_n, 1 ≤ ℓ ≤ J, and 1 ≤ k ≤ k_{n,ℓ}} and note that (S.45) implies that for all 1 ≤ ℓ ≤ J and P ∈ P

  sup_{θ_0∈Θ_{0n}^r(P)} sup_{h/√n∈V_n(θ_0,δ_n)} max_{1≤k≤k_{n,ℓ}} |W_{n,P} ρ(·, θ_0 + h/√n) q_{k,n,ℓ} − W_{n,P} ρ(·, θ_0) q_{k,n,ℓ}|
    ≤ sup_{g_1,g_2∈𝒢_n : ‖g_1−g_2‖_{P,2} ≤ K_ρ B_n δ_n^{κ_ρ}} |W_{n,P} g_1 − W_{n,P} g_2|.   (S.46)

Hence, since ‖a‖_p ≤ k_n^{1/p} ‖a‖_∞ for any a ∈ R^{k_n}, result (S.46) allows us to conclude that

  sup_{θ_0∈Θ_{0n}^r(P)} sup_{h/√n∈V_n(θ_0,δ_n)} ‖W_{n,P} ρ(·, θ_0 + h/√n) ∗ q_n^{k_n} − W_{n,P} ρ(·, θ_0) ∗ q_n^{k_n}‖_p
    ≤ k_n^{1/p} × sup_{g_1,g_2∈𝒢_n : ‖g_1−g_2‖_{P,2} ≤ K_ρ B_n δ_n^{κ_ρ}} |W_{n,P} g_1 − W_{n,P} g_2|.   (S.47)

Moreover, Corollary 2.2.8 in van der Vaart and Wellner (1996) additionally implies that

  sup_{P∈P} E_P[sup_{g_1,g_2∈𝒢_n : ‖g_1−g_2‖_{P,2} ≤ K_ρ B_n δ_n^{κ_ρ}} |W_{n,P} g_1 − W_{n,P} g_2|]
    ≲ sup_{P∈P} ∫_0^{K_ρ B_n δ_n^{κ_ρ}} √(log N_{[ ]}(ε, 𝒢_n, ‖·‖_{P,2})) dε
    ≲ sup_{P∈P} √(log(1+k_n)) B_n ∫_0^{K_ρ δ_n^{κ_ρ}} √(1 + log N_{[ ]}(u, 𝓕_n, ‖·‖_{P,2})) du,   (S.48)

where the second inequality follows from Lemma S.3.3 and the change of variables u = ε/B_n. However, note that since N_{[ ]}(u, 𝓕_n, ‖·‖_{P,2}) is decreasing in u, it follows that J_{[ ]}(K_ρ δ_n^{κ_ρ}, 𝓕_n, ‖·‖_{P,2}) ≤ K_ρ J_{[ ]}(δ_n^{κ_ρ}, 𝓕_n, ‖·‖_{P,2}). Therefore, the lemma follows from (S.47), ‖Σ̂_n‖_{o,p} = O_P(1) uniformly in P ∈ P by Lemma S.3.4, and Markov's inequality combined with (S.48), the definition of J_{[ ]}(ε, 𝓕_n, ‖·‖_{P,2}), and k_n^{1/p} √(log(1+k_n)) B_n × sup_{P∈P} J_{[ ]}(δ_n^{κ_ρ}, 𝓕_n, ‖·‖_{P,2}) = o(a_n) by hypothesis.
Lemma S.4.3. Let Assumptions 3.3(i), 3.3(iii), 3.4, and 3.10(ii)-(iii) hold. For any positive sequence δ_n it then follows that uniformly in P ∈ P_0 we have

  inf_{θ_0∈Θ_{0n}^r(P)} inf_{h/√n∈V_n(θ_0,δ_n)} ‖W_{n,P} ρ(·, θ_0) ∗ q_n^{k_n} + √n E_P[ρ(X, θ_0 + h/√n) ∗ q_n^{k_n}(Z)]‖_{Σ_n(P),p}
    = inf_{θ_0∈Θ_{0n}^r(P)} inf_{h/√n∈V_n(θ_0,δ_n)} ‖W_{n,P} ρ(·, θ_0) ∗ q_n^{k_n} + √n E_P[ρ(X, θ_0 + h/√n) ∗ q_n^{k_n}(Z)]‖_{Σ̂_n,p} + o_P(a_n).
Proof: First note that by Assumption 3.4(iii) there exists a constant C_0 < ∞ such that max{‖Σ_n(P)‖_{o,p}, ‖Σ_n(P)^{-1}‖_{o,p}} ≤ C_0 for all P ∈ P. Therefore, since ‖Σ̂_n a‖_p ≤ ‖Σ̂_n Σ_n(P)^{-1}‖_{o,p} ‖Σ_n(P) a‖_p for any a ∈ R^{k_n}, and ‖Σ̂_n Σ_n(P)^{-1}‖_{o,p} ≤ ‖Σ_n(P)^{-1}‖_{o,p} ‖Σ̂_n − Σ_n(P)‖_{o,p} + 1 by the triangle inequality, it follows that

  {C_0 ‖Σ̂_n − Σ_n(P)‖_{o,p} + 1} ‖W_{n,P} ρ(·, θ_0) ∗ q_n^{k_n} + √n E_P[ρ(X, θ_0 + h/√n) ∗ q_n^{k_n}(Z)]‖_{Σ_n(P),p}
    ≥ ‖W_{n,P} ρ(·, θ_0) ∗ q_n^{k_n} + √n E_P[ρ(X, θ_0 + h/√n) ∗ q_n^{k_n}(Z)]‖_{Σ̂_n,p}   (S.49)

for any θ_0 ∈ Θ_{0n}^r(P) and h/√n ∈ V_n(θ_0, δ_n). Moreover, ‖Σ_n(P)‖_{o,p} ≤ C_0, 0 ∈ V_n(θ, δ_n) for any θ ∈ Θ_n∩R, and Assumption 3.10(ii) imply uniformly in P ∈ P that

  inf_{θ_0∈Θ_{0n}^r(P)} inf_{h/√n∈V_n(θ_0,δ_n)} ‖W_{n,P} ρ(·, θ_0) ∗ q_n^{k_n} + √n E_P[ρ(X, θ_0 + h/√n) ∗ q_n^{k_n}(Z)]‖_{Σ_n(P),p}
    ≲ sup_{θ∈Θ_n∩R} ‖W_{n,P} ρ(·, θ) ∗ q_n^{k_n}‖_p + o(a_n) = O_P(k_n^{1/p} √(log(1+k_n)) B_n J_n) + o(a_n),   (S.50)

where the final equality holds uniformly in P ∈ P_0 by Lemma S.4.4 and Markov's inequality. Therefore, results (S.49), (S.50), and Assumption 3.10(iii) imply

  inf_{θ_0∈Θ_{0n}^r(P)} inf_{h/√n∈V_n(θ_0,δ_n)} ‖W_{n,P} ρ(·, θ_0) ∗ q_n^{k_n} + √n E_P[ρ(X, θ_0 + h/√n) ∗ q_n^{k_n}(Z)]‖_{Σ_n(P),p} + o_P(a_n)
    ≥ inf_{θ_0∈Θ_{0n}^r(P)} inf_{h/√n∈V_n(θ_0,δ_n)} ‖W_{n,P} ρ(·, θ_0) ∗ q_n^{k_n} + √n E_P[ρ(X, θ_0 + h/√n) ∗ q_n^{k_n}(Z)]‖_{Σ̂_n,p}

uniformly in P ∈ P_0. The reverse of the final inequality follows by identical arguments but relying on Lemma S.3.4 implying ‖Σ̂_n‖_{o,p} = O_P(1) and ‖Σ̂_n^{-1}‖_{o,p} = O_P(1) uniformly in P ∈ P rather than on max{‖Σ_n(P)‖_{o,p}, ‖Σ_n(P)^{-1}‖_{o,p}} ≤ C_0, and thus the claim of the lemma follows.
Lemma S.4.4. If Assumptions 3.3(i) and 3.3(iii) hold, then for some K_0 > 0:

  sup_{P∈P} E_P[sup_{θ∈Θ_n∩R} ‖W_{n,P} ρ(·, θ) ∗ q_n^{k_n}‖_p] ≤ K_0 k_n^{1/p} √(log(1+k_n)) B_n J_n.
Proof: Define the class 𝒢_n ≡ {f q_{k,n,ℓ} : f ∈ 𝓕_n, 1 ≤ ℓ ≤ J, and 1 ≤ k ≤ k_{n,ℓ}}, and note ‖a‖_p ≤ d^{1/p} ‖a‖_∞ for any a ∈ R^d implies that for any P ∈ P we have

  E_P[sup_{θ∈Θ_n∩R} ‖W_{n,P} ρ(·, θ) ∗ q_n^{k_n}‖_p] ≤ k_n^{1/p} E_P[sup_{g∈𝒢_n} |W_{n,P} g|]
    ≤ k_n^{1/p} {E_P[|W_{n,P} g_0|] + C_1 ∫_0^∞ √(log N_{[ ]}(ε, 𝒢_n, ‖·‖_{P,2})) dε},   (S.51)

where the final inequality holds for any g_0 ∈ 𝒢_n and some C_1 < ∞ by Corollary 2.2.8 in van der Vaart and Wellner (1996). Next, let G_n ≡ B_n F_n for F_n as in Assumption 3.3(iii) and note Assumption 3.3(i) implies G_n is an envelope for 𝒢_n. Thus, [−G_n, G_n] is a bracket of size 2‖G_n‖_{P,2} covering 𝒢_n, and as a result we obtain

  ∫_0^∞ √(log N_{[ ]}(ε, 𝒢_n, ‖·‖_{P,2})) dε ≤ ∫_0^{2‖G_n‖_{P,2}} √(1 + log N_{[ ]}(ε, 𝒢_n, ‖·‖_{P,2})) dε ≤ C_2 √(log(1+k_n)) B_n J_n,   (S.52)

where the final inequality holds for some C_2 < ∞ by result (S.27) and N_{[ ]}(u, 𝒢_n, ‖·‖_{P,2}) being decreasing in u. Furthermore, since E_P[|W_{n,P} g_0|] ≤ ‖g_0‖_{P,2} ≤ ‖G_n‖_{P,2}, we have

  E_P[|W_{n,P} g_0|] ≤ ‖G_n‖_{P,2} ≤ ∫_0^{‖G_n‖_{P,2}} √(1 + log N_{[ ]}(u, 𝒢_n, ‖·‖_{P,2})) du.   (S.53)

Thus, the claim of the lemma follows from (S.51), (S.52), and (S.53).
Lemma S.4.5. Let Assumption 3.3(ii) hold. It then follows that there exists a constant C < ∞ such that for all P ∈ P, n ≥ 1, 1 ≤ ℓ ≤ J, and functions f ∈ L²_P we have

  ∑_{k=1}^{k_{n,ℓ}} ⟨f, q_{k,n,ℓ}⟩²_{L²_P} ≤ C E_P[(E_P[f(V)|Z])²].   (S.54)
Proof: Let L²_P(Z) denote the subspace of L²_P consisting of functions depending on Z only, and set ℓ²(N) ≡ {{c_k}_{k=1}^∞ : c_k ∈ R and ‖{c_k}‖_{ℓ²(N)} < ∞}, where ‖{c_k}‖²_{ℓ²(N)} ≡ ∑_k c_k². For any sequence {c_k} ∈ ℓ²(N), then define the map J_{P,n,ℓ} : ℓ²(N) → L²_P(Z) by

  J_{P,n,ℓ}{c_k} = ∑_{k=1}^{k_{n,ℓ}} c_k q_{k,n,ℓ}.

Clearly, the maps J_{P,n,ℓ} : ℓ²(N) → L²_P(Z) are linear, and moreover by Assumption 3.3(ii) there is a C < ∞ such that the largest eigenvalue of E_P[q_n^{k_{n,ℓ}}(Z) q_n^{k_{n,ℓ}}(Z)′] is bounded by C for all n ≥ 1 and P ∈ P. Therefore, we obtain

  sup_{P∈P} sup_{n≥1} ‖J_{P,n,ℓ}‖²_o = sup_{P∈P} sup_{n≥1} sup_{{c_k}: ∑_k c_k² = 1} ‖J_{P,n,ℓ}{c_k}‖²_{P,2}
    = sup_{P∈P} sup_{n≥1} sup_{{c_k}: ∑_k c_k² = 1} E_P[(∑_{k=1}^{k_{n,ℓ}} c_k q_{k,n,ℓ}(Z))²] ≤ sup_{{c_k}: ∑_k c_k² = 1} C ∑_{k=1}^∞ c_k² = C,   (S.55)

which implies J_{P,n,ℓ} is continuous. Next, define J*_{P,n,ℓ} : L²_P(Z) → ℓ²(N) to be given by

  J*_{P,n,ℓ} g = {a_k(g)}_{k=1}^∞,  where a_k(g) ≡ ⟨g, q_{k,n,ℓ}⟩_{L²_P(Z)} if k ≤ k_{n,ℓ} and a_k(g) ≡ 0 if k > k_{n,ℓ},

and note J*_{P,n,ℓ} is the adjoint of J_{P,n,ℓ}. Therefore, since ‖J_{P,n,ℓ}‖_o = ‖J*_{P,n,ℓ}‖_o by Theorem 6.5.1 in Luenberger (1969), we obtain for any P ∈ P, n ≥ 1, and g ∈ L²_P(Z)

  ∑_{k=1}^{k_{n,ℓ}} ⟨g, q_{k,n,ℓ}⟩²_{L²_P(Z)} = ‖J*_{P,n,ℓ} g‖²_{ℓ²(N)} ≤ ‖J*_{P,n,ℓ}‖²_o ‖g‖²_{P,2} = ‖J_{P,n,ℓ}‖²_o ‖g‖²_{P,2}.   (S.56)

Therefore, since E_P[f(V) q_{k,n,ℓ}(Z)] = E_P[E_P[f(V)|Z] q_{k,n,ℓ}(Z)] for any f ∈ L²_P, setting g(Z) = E_P[f(V)|Z] in (S.56) and employing (S.55) yields the lemma.
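A finite-dimensional analogue of (S.54) can be checked numerically: with sample averages in place of population moments, the sum of squared inner products ⟨f, q_k⟩ is bounded by the largest eigenvalue of the second-moment matrix of the basis times the second moment of the conditional-mean projection. The sketch below (numpy; the data-generating design is an arbitrary illustration, not taken from the paper) uses a discrete Z with indicator basis functions, so E[f(V)|Z] is computable by group means.

```python
import numpy as np

rng = np.random.default_rng(4)
n, n_groups = 5000, 5
z = rng.integers(0, n_groups, size=n)            # discrete instrument Z
v = z + rng.standard_normal(n)
f = v ** 2                                       # f(V)

q = np.eye(n_groups)[z]                          # basis q_k(Z) = 1{Z = k}, shape (n, 5)
second_moment = q.T @ q / n                      # sample analogue of E[q(Z) q(Z)']
lam_max = np.linalg.eigvalsh(second_moment).max()

inner = q.T @ f / n                              # sample analogue of <f, q_k>_{L^2}
lhs = np.sum(inner ** 2)

# E[f(V)|Z] via group means, then its second moment
cond_mean = np.array([f[z == k].mean() for k in range(n_groups)])
rhs = lam_max * np.mean(cond_mean[z] ** 2)
assert lhs <= rhs + 1e-12
```

With an indicator basis the inequality holds exactly in sample: the left side is ∑_k p̂_k² m̂_k² and the right side is (max_k p̂_k) ∑_k p̂_k m̂_k², where p̂_k and m̂_k are the group frequency and group mean.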
Lemma S.4.6. If Λ is a set, A : Λ → R^k, B : Λ → R^k, and W is a k × k matrix, then

  |inf_{λ∈Λ} ‖W A(λ)‖_p − inf_{λ∈Λ} ‖W B(λ)‖_p| ≤ ‖W‖_{o,p} × sup_{λ∈Λ} ‖A(λ) − B(λ)‖_p.
Proof: Fix η > 0, and let λ_a ∈ Λ satisfy ‖W A(λ_a)‖_p ≤ inf_{λ∈Λ} ‖W A(λ)‖_p + η. Then,

  inf_{λ∈Λ} ‖W B(λ)‖_p − inf_{λ∈Λ} ‖W A(λ)‖_p ≤ ‖W B(λ_a)‖_p − ‖W A(λ_a)‖_p + η
    ≤ ‖W {B(λ_a) − A(λ_a)}‖_p + η ≤ ‖W‖_{o,p} × sup_{λ∈Λ} ‖A(λ) − B(λ)‖_p + η,   (S.57)

where the second result follows from the triangle inequality, and the final result from ‖Wv‖_p ≤ ‖W‖_{o,p} ‖v‖_p for any v ∈ R^k. By identical manipulations we also have

  inf_{λ∈Λ} ‖W A(λ)‖_p − inf_{λ∈Λ} ‖W B(λ)‖_p ≤ ‖W‖_{o,p} × sup_{λ∈Λ} ‖A(λ) − B(λ)‖_p + η.   (S.58)

Thus, since η was arbitrary, the lemma follows from results (S.57) and (S.58).
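Lemma S.4.6 is a Lipschitz property of the weighted minimum-norm functional, and it is straightforward to verify numerically (numpy, with p = 2 and Λ a finite grid; all inputs below are arbitrary test data):

```python
import numpy as np

rng = np.random.default_rng(5)
k, n_lam = 4, 50
W = rng.standard_normal((k, k))
A = rng.standard_normal((n_lam, k))              # A(λ) over a finite grid of λ values
B = A + 0.1 * rng.standard_normal((n_lam, k))    # a perturbation of A

inf_a = np.linalg.norm(A @ W.T, axis=1).min()    # inf_λ ||W A(λ)||_2
inf_b = np.linalg.norm(B @ W.T, axis=1).min()    # inf_λ ||W B(λ)||_2
w_op = np.linalg.norm(W, 2)                      # operator norm ||W||_{o,2}
sup_diff = np.linalg.norm(A - B, axis=1).max()   # sup_λ ||A(λ) − B(λ)||_2
assert abs(inf_a - inf_b) <= w_op * sup_diff + 1e-9
```

This is the property invoked in (S.44) and (S.60) to swap the sample moment process for its coupled Gaussian counterpart inside the infimum.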
S.5 Bootstrap Approximation
This appendix contains the proofs of all results concerning the bootstrap approximation. We first introduce two assumptions that generalize Assumption 3.16 (at the cost of introducing additional notation) and deliver a stronger version of Theorem 3.3.
Assumption S.5.1. There is an ε > 0 and scalars D_n(L,E) and D_n(B,E) such that for any P ∈ P, θ_0 ∈ Θ_{0n}^r(P), and θ_1 ∈ Θ_n∩R satisfying ‖θ_1 − θ_0‖_E ≤ ε, there exists θ̃_0 ∈ Θ_{0n}^r(P) such that ‖θ̃_0 − θ_0‖_E = 0, ‖θ̃_0 − θ_1‖_L ≤ D_n(L,E)‖θ̃_0 − θ_1‖_E, and ‖θ̃_0 − θ_1‖_B ≤ D_n(B,E)‖θ̃_0 − θ_1‖_E.
Assumption S.5.2. (i) Either Υ_F and Υ_G are affine or (R_n + ν_nτ_n)D_n(B,E) = o(1); (ii) k_n^{1/p} √(log(1+k_n)) B_n sup_{P∈P} J_{[ ]}(ℓ_n^{κ_ρ} ∨ (ν_nτ_n)^{κ_ρ}, 𝓕_n, ‖·‖_{P,2}) = o(a_n), K_m ℓ_n² S_n(L,E) = o(a_n n^{-1/2}), K_m ℓ_n(R_n + ν_nτ_n)D_n(L,E) = o(a_n n^{-1/2}), ℓ_n² S_n(B,E) 1{K_f > 0} = o(a_n n^{-1/2}), and ℓ_n(R_n + ν_nτ_n)D_n(B,E) 1{K_f > 0} = o(a_n n^{-1/2}); (iii) The sequence r_n satisfies lim sup_{n→∞} 1{K_g > 0} ℓ_n/r_n < 1/2 and (R_n + ν_nτ_n)D_n(B,E) = o(r_n).
In particular, note Assumption S.5.1 holds with D_n(L,E) = S_n(L,E), D_n(B,E) = S_n(B,E), and θ̃_0 = θ_0. Hence, Assumption 3.16 implies Assumptions S.5.1 and S.5.2. In general, however, D_n(L,E) and D_n(B,E) can be smaller than S_n(L,E) and S_n(B,E), while the possibility of setting θ̃_0 ≠ θ_0 eases requirements in partially identified models.
Our next theorem consists of two parts. The first part, which replaces Assumption
3.16 with S.5.1 and S.5.2, can by the preceding discussion be seen as a generalization of
Theorem 3.3. The second part shows that, under additional restrictions, it is possible
to replace the norm ‖·‖_B in the definition of V_n(θ, ℓ) (as in (22)) with the norm ‖·‖_E – an observation that is sometimes helpful in easing rate restrictions.
Theorem S.5.1. Let Assumptions 3.1, 3.2, 3.3, 3.4, 3.6, 3.7, 3.8, 3.9, 3.10(i), 3.10(iii),
3.11, 3.12, 3.13, 3.14, 3.15, S.5.1, and S.5.2 hold. Then:
(i) There exists a sequence ℓ̃_n ≍ ℓ_n such that uniformly in P ∈ P_0 we have

  Û_n(R|ℓ_n) ≥ U*_{n,P}(R|ℓ̃_n) + o_P(a_n).

(ii) If in addition P({θ ∈ B_n : d⃗_H(θ, Θ̂_n^r, ‖·‖_E) ≤ ε} ⊆ Θ_n) tends to one uniformly in P ∈ P_0 for some ε > 0 and Υ_F and Υ_G are affine, then part (i) of this theorem continues to hold with Û_n(R|ℓ_n) defined as in (15) but with V_n(θ, ℓ) given by

  V_n(θ, ℓ) ≡ {h/√n ∈ B_n : h/√n ∈ G_n(θ), Υ_F(θ + h/√n) = 0, and ‖h/√n‖_E ≤ ℓ}.   (S.59)
Proof: First note Assumption S.5.2(ii) and Lemma S.5.1 imply uniformly in P ∈ P_0

  Û_n(R|ℓ_n) = inf_{θ∈Θ̂_n^r} inf_{h/√n∈V_n(θ,ℓ_n)} ‖W*_{n,P} ρ(·, θ) ∗ q_n^{k_n} + D_{n,P}(θ)[h]‖_{Σ_n(P),p} + o_P(a_n).   (S.60)

Thus, we may select θ̂_n ∈ Θ̂_n^r and ĥ_n/√n ∈ V_n(θ̂_n, ℓ_n) so that uniformly in P ∈ P_0

  Û_n(R|ℓ_n) = ‖W*_{n,P} ρ(·, θ̂_n) ∗ q_n^{k_n} + D_{n,P}(θ̂_n)[ĥ_n]‖_{Σ_n(P),p} + o_P(a_n).   (S.61)

Next note that by Assumptions 3.10(i), S.5.1, and S.5.2 there is a δ_n so that δ_n D_n(B,E) = o(r_n), δ_n D_n(B,E) = o(1) if either Υ_F or Υ_G is not affine, R_n + ν_nτ_n = o(δ_n), and

  ℓ_n δ_n D_n(B,E) 1{K_f > 0} = o(a_n n^{-1/2})   (S.62)
  K_m δ_n ℓ_n D_n(L,E) = o(a_n n^{-1/2})   (S.63)
  k_n^{1/p} √(log(1+k_n)) B_n × sup_{P∈P} J_{[ ]}(δ_n^{κ_ρ}, 𝓕_n, ‖·‖_{P,2}) = o(a_n).   (S.64)
Next, notice that Theorem 3.1 implies that there exists θ_{0n} ∈ Θ_{0n}^r(P) such that

  ‖θ̂_n − θ_{0n}‖_E = o_P(δ_n)   (S.65)

uniformly in P ∈ P_0 due to (R_n + ν_nτ_n) = o(δ_n). Furthermore, by Assumption S.5.1 we can assume without loss of generality that θ_{0n} in addition satisfies

  ‖θ̂_n − θ_{0n}‖_L = o_P(D_n(L,E) δ_n),  ‖θ̂_n − θ_{0n}‖_B = o_P(D_n(B,E) δ_n)   (S.66)

uniformly in P ∈ P_0. In addition note that since ‖q_{k,n,ℓ}‖_∞ ≤ B_n for all 1 ≤ k ≤ k_{n,ℓ} by Assumption 3.3(i), we obtain from Assumption 3.8 together with result (S.65) that with probability tending to one uniformly in P ∈ P_0 we have

  E_P[‖ρ(X, θ̂_n) − ρ(X, θ_{0n})‖²₂ q²_{k,n,ℓ}(Z)] ≤ B_n² K_ρ² δ_n^{2κ_ρ}.

Hence, letting 𝒢_n ≡ {f q_{k,n,ℓ} : f ∈ 𝓕_n, 1 ≤ ℓ ≤ J, and 1 ≤ k ≤ k_{n,ℓ}}, we obtain from ‖Σ_n(P)‖_{o,p} being uniformly bounded by Assumption 3.4(iii), result (S.48), Markov's inequality, and δ_n satisfying (S.64) that uniformly in P ∈ P we have

  ‖W*_{n,P} ρ(·, θ̂_n) ∗ q_n^{k_n} − W*_{n,P} ρ(·, θ_{0n}) ∗ q_n^{k_n}‖_{Σ_n(P),p}
    ≤ ‖Σ_n(P)‖_{o,p} k_n^{1/p} sup_{g_1,g_2∈𝒢_n : ‖g_1−g_2‖_{P,2} ≤ B_n K_ρ δ_n^{κ_ρ}} |W*_{n,P} g_1 − W*_{n,P} g_2| = o_P(a_n).   (S.67)

Similarly, result (S.65) implies that for any ε > 0 we have d⃗_H(θ̂_n, Θ_{0n}^r(P), ‖·‖_E) ≤ ε with probability tending to one uniformly in P ∈ P_0. Therefore, Lemma S.5.3 yields

  ‖D_{n,P}(θ_{0n})[ĥ_n] − D_{n,P}(θ̂_n)[ĥ_n]‖_{Σ_n(P),p} ≲ ‖Σ_n(P)‖_{o,p} × K_m ‖θ̂_n − θ_{0n}‖_L ‖ĥ_n‖_E + o_P(a_n)
    ≲ ‖Σ_n(P)‖_{o,p} × K_m D_n(L,E) δ_n ℓ_n √n + o_P(a_n) = o_P(a_n),   (S.68)

where the second inequality follows from ‖ĥ_n/√n‖_B ≤ ℓ_n due to ĥ_n/√n ∈ V_n(θ̂_n, ℓ_n), Assumption 3.15(i), and result (S.66). In turn, the final result in (S.68) is implied by (S.63) and ‖Σ_n(P)‖_{o,p} being uniformly bounded due to Assumption 3.4(iii). Next, we note that since either Υ_G is affine (implying K_g = 0) or δ_n D_n(B,E) = o(1), and in addition we have δ_n D_n(B,E) = o(r_n) and lim sup_{n→∞} ℓ_n/r_n 1{K_g > 0} < 1/2 by Assumption S.5.2(iii), we can conclude that

  r_n ≥ {M_g δ_n D_n(B,E) + K_g δ_n² D_n(B,E)²} ∨ 2(ℓ_n + δ_n D_n(B,E)) 1{K_g > 0}

for n sufficiently large. Hence, Theorem S.7.1(i), Assumption 3.15(ii), and ‖h‖_E ≤ K_b ‖h‖_B for all h ∈ B_n and P ∈ P by Assumption 3.15(i), imply that there is an M < ∞ for which with probability tending to one uniformly in P ∈ P_0

  inf_{h_0/√n∈V_n(θ_{0n},2K_bℓ_n)} ‖ĥ_n/√n − h_0/√n‖_B ≤ M ℓ_n (ℓ_n + δ_n D_n(B,E)) 1{K_f > 0}.
It follows from Assumption S.5.2(ii) and (S.62) that there is an $h_{0n}/\sqrt{n} \in V_n(\theta_{0n}, 2K_b\ell_n)$ such that $\|h_{0n} - h_n\|_{\mathbf{B}} = o_P(a_n)$ uniformly in $P \in \mathbf{P}_0$, and hence Assumption 3.4(iii), Lemma S.5.3, and $\|h\|_{\mathbf{E}} \le K_b\|h\|_{\mathbf{B}}$ by Assumption 3.15(i) yield

$$\|\mathbb{D}_{n,P}(\theta_{0n})[h_n] - \mathbb{D}_{n,P}(\theta_{0n})[h_{0n}]\|_{\Sigma_n(P),p} \lesssim \|\Sigma_n(P)\|_{o,p} \times \|h_n - h_{0n}\|_{\mathbf{E}} = o_P(a_n) \qquad (S.69)$$

uniformly in $P \in \mathbf{P}_0$. Therefore, combining results (S.61), (S.67), (S.68), and (S.69) together with $\theta_{0n} \in \Theta^r_{0n}(P)$ and $h_{0n}/\sqrt{n} \in V_n(\theta_{0n}, 2K_b\ell_n)$ implies

$$U_n(R|\ell_n) = \|\mathbb{W}^\star_{n,P}\rho(\cdot,\theta_{0n}) * q_n^{k_n} + \mathbb{D}_{n,P}(\theta_{0n})[h_{0n}]\|_{\Sigma_n(P),p} + o_P(a_n) \ge \inf_{\theta \in \Theta^r_{0n}(P)} \inf_{h/\sqrt{n} \in V_n(\theta, 2K_b\ell_n)} \|\mathbb{W}^\star_{n,P}\rho(\cdot,\theta) * q_n^{k_n} + \mathbb{D}_{n,P}(\theta)[h]\|_{\Sigma_n(P),p} + o_P(a_n)$$

uniformly in $P \in \mathbf{P}_0$. The first part of the theorem then follows by setting $\tilde{\ell}_n = 2K_b\ell_n$.
In order to establish the second part of the theorem, note that the only assumptions that potentially require the norm $\|\cdot\|_{\mathbf{B}}$ to be stronger than $\|\cdot\|_{\mathbf{E}}$ are Assumptions 3.11, 3.12, 3.13 (pertaining to the differentiability of $\Upsilon_F$ and $\Upsilon_G$) and Assumption 3.15(ii) (since a stronger norm $\|\cdot\|_{\mathbf{B}}$ makes $(\Theta_n^r)^\varepsilon$ smaller). We therefore establish part (ii) of the theorem by repeating the arguments employed in showing part (i) while carefully re-examining the role played by the norm $\|\cdot\|_{\mathbf{B}}$. To this end, note that since

$$\liminf_{n\to\infty} \inf_{P \in \mathbf{P}_0} P(\{\theta \in \mathbf{B}_n : \vec{d}_H(\theta, \Theta_n, \|\cdot\|_{\mathbf{E}}) \le \varepsilon\} \subseteq \Theta_n) = 1, \qquad (S.70)$$

we may apply Lemma S.5.1 with $\|\cdot\|_{\mathbf{B}}$ set to equal $\|\cdot\|_{\mathbf{E}}$ to still obtain that

$$U_n(R|\ell_n) = \inf_{\theta \in \Theta_n^r} \inf_{h/\sqrt{n} \in V_n(\theta,\ell_n)} \|\mathbb{W}^\star_{n,P}\rho(\cdot,\theta) * q_n^{k_n} + \mathbb{D}_{n,P}(\theta)[h]\|_{\Sigma_n(P),p} + o_P(a_n). \qquad (S.71)$$
Letting θn and hn be defined as in (S.61) (but with Vn(θ, `) as defined in (S.59)), then
observe that since results (S.67) and (S.68) do not rely on Assumptions 3.11, 3.12, 3.13
or 3.15(ii), we can conclude from result (S.71) that uniformly in $P \in \mathbf{P}_0$

$$U_n(R|\ell_n) = \|\mathbb{W}^\star_{n,P}\rho(\cdot,\theta_{0n}) * q_n^{k_n} + \mathbb{D}_{n,P}(\theta_{0n})[h_n]\|_{\Sigma_n(P),p} + o_P(a_n) \qquad (S.72)$$

for some $\theta_{0n} \in \Theta^r_{0n}(P)$. Next, note that $r_n \ge M_g\delta_n D_n(\mathbf{B},\mathbf{E})$ for $n$ large enough since $\delta_n D_n(\mathbf{B},\mathbf{E}) = o(r_n)$. Hence, $K_g = 0$ due to $\Upsilon_G$ being affine, together with Theorem S.7.1(ii), imply that with probability tending to one uniformly in $P \in \mathbf{P}_0$

$$V_n(\theta_n,\ell_n) \equiv \left\{\frac{h}{\sqrt{n}} \in \mathbf{B}_n : \frac{h}{\sqrt{n}} \in G_n(\theta_n),\ \Upsilon_F\left(\theta_n + \frac{h}{\sqrt{n}}\right) = 0,\ \left\|\frac{h}{\sqrt{n}}\right\|_{\mathbf{E}} \le \ell_n\right\} \subseteq \left\{\frac{h}{\sqrt{n}} \in \mathbf{B}_n : \Upsilon_G\left(\theta_{0n} + \frac{h}{\sqrt{n}}\right) \le 0,\ \Upsilon_F\left(\theta_{0n} + \frac{h}{\sqrt{n}}\right) = 0,\ \left\|\frac{h}{\sqrt{n}}\right\|_{\mathbf{E}} \le \ell_n\right\} \subseteq V_n(\theta_{0n},\ell_n),$$
where the final inclusion follows from $\ell_n \downarrow 0$, and results (S.65) and (S.70) implying that with probability tending to one uniformly in $P \in \mathbf{P}_0$ we have $\theta_{0n} + h/\sqrt{n} \in \Theta_n$ for any $h/\sqrt{n} \in \mathbf{B}_n$ with $\|h/\sqrt{n}\|_{\mathbf{E}} \le \ell_n$. Therefore, we can conclude that $h_n/\sqrt{n} \in V_n(\theta_{0n},\ell_n)$ with probability tending to one uniformly in $P \in \mathbf{P}_0$, which by (S.72) yields

$$U_n(R|\ell_n) \ge \inf_{\theta \in \Theta^r_{0n}(P)} \inf_{h/\sqrt{n} \in V_n(\theta,\ell_n)} \|\mathbb{W}^\star_{n,P}\rho(\cdot,\theta) * q_n^{k_n} + \mathbb{D}_{n,P}(\theta)[h]\|_{\Sigma_n(P),p} + o_P(a_n),$$

and hence establishes the second claim of the theorem.
Proof of Theorem 3.3: Follows immediately from Theorem S.5.1(i) and Assumption 3.16 implying Assumptions S.5.1 and S.5.2 are satisfied by setting $D_n(\mathbf{B},\mathbf{E}) = S_n(\mathbf{B},\mathbf{E})$, $D_n(\mathbf{L},\mathbf{E}) = S_n(\mathbf{L},\mathbf{E})$, and $\theta_{0n} = \theta_0$.
Proof of Lemma 3.1: In the following arguments, we note that the only requirement on $D_n(\theta)$ is that it satisfies condition (24). As a result, the lemma applies to estimators $D_n(\theta)$ besides the numerical derivative examined in the main text.

In order to establish the result, we first let $\theta_n \in \Theta^r_n$ and $h_n/\sqrt{n} \in V_n(\theta_n,+\infty)$ satisfy
$$\inf_{\theta \in \Theta^r_n} \inf_{h/\sqrt{n} \in V_n(\theta,+\infty)} \|\mathbb{W}_n\rho(\cdot,\theta) * q_n^{k_n} + D_n(\theta)[h]\|_{\Sigma_n,p} = \|\mathbb{W}_n\rho(\cdot,\theta_n) * q_n^{k_n} + D_n(\theta_n)[h_n]\|_{\Sigma_n,p} + o(a_n). \qquad (S.73)$$

Then note that in order to establish the claim of the lemma it suffices to show that

$$\limsup_{n\to\infty} \sup_{P \in \mathbf{P}_0} P\left(\left\|\frac{h_n}{\sqrt{n}}\right\|_{\mathbf{B}} \ge \ell_n\right) = 0. \qquad (S.74)$$
To this end, observe that since $0 \in V_n(\theta,+\infty)$ for all $\theta \in \Theta_n \cap R$, we obtain from the triangle inequality, $\|\Sigma_n\|_{o,p} = O_P(1)$ by Lemma S.3.4, and Assumption 3.14 that

$$\|D_n(\theta_n)[h_n]\|_{\Sigma_n,p} \le \|\mathbb{W}_n\rho(\cdot,\theta_n) * q_n^{k_n} + D_n(\theta_n)[h_n]\|_{\Sigma_n,p} + \|\mathbb{W}_n\rho(\cdot,\theta_n) * q_n^{k_n}\|_{\Sigma_n,p} \le 2\|\Sigma_n\|_{o,p}\|\mathbb{W}^\star_{n,P}\rho(\cdot,\theta_n) * q_n^{k_n}\|_p + o_P(a_n) \qquad (S.75)$$

uniformly in $P \in \mathbf{P}$. Hence, since $\Theta^r_n \subseteq \Theta_n \cap R$ almost surely, we obtain from result (S.75), $\|\Sigma_n\|_{o,p} = O_P(1)$, and Lemma S.4.4 together with Markov's inequality

$$\|D_n(\theta_n)[h_n]\|_{\Sigma_n,p} \le 2\|\Sigma_n\|_{o,p} \sup_{\theta \in \Theta_n \cap R} \|\mathbb{W}^\star_{n,P}\rho(\cdot,\theta) * q_n^{k_n}\|_p + o_P(a_n) = O_P(k_n^{1/p}\sqrt{\log(1+k_n)}B_nJ_n) \qquad (S.76)$$
uniformly in $P \in \mathbf{P}$. Since $h_n/\sqrt{n} \in V_n(\theta_n,+\infty)$ implies $h_n \in \sqrt{n}\,\{\mathbf{B}_n \cap R - \theta_n\}$ and $\theta_n \in \Theta^r_n \subseteq A_n(P)$ with probability tending to one uniformly in $P \in \mathbf{P}_0$, we obtain from the first hypothesis of the lemma that $\|h_n\|_{\mathbf{E}} \le \nu_n\|D_{n,P}(\theta_n)[h_n]\|_p$ with probability tending to one uniformly in $P \in \mathbf{P}_0$. Therefore, it follows that

$$\limsup_{n\to\infty} \sup_{P \in \mathbf{P}_0} P\left(\ell_n \le \left\|\frac{h_n}{\sqrt{n}}\right\|_{\mathbf{B}}\right) = \limsup_{n\to\infty} \sup_{P \in \mathbf{P}_0} P\left(\ell_n \le \left\|\frac{h_n}{\sqrt{n}}\right\|_{\mathbf{B}} \text{ and } \|h_n\|_{\mathbf{E}} \le \nu_n\|D_{n,P}(\theta_n)[h_n]\|_p\right) \le \limsup_{n\to\infty} \sup_{P \in \mathbf{P}_0} P\left(\ell_n \le \left\|\frac{h_n}{\sqrt{n}}\right\|_{\mathbf{B}} \text{ and } \|h_n\|_{\mathbf{E}} \le 2\nu_n\|D_n(\theta_n)[h_n]\|_p\right), \qquad (S.77)$$
where the inequality follows from (24). Hence, results (S.76) and (S.77), the definitions of $S_n(\mathbf{B},\mathbf{E})$ and $\mathcal{R}_n$, and $S_n(\mathbf{B},\mathbf{E})\mathcal{R}_n = o(\ell_n)$ by hypothesis yield

$$\limsup_{n\to\infty} \sup_{P \in \mathbf{P}_0} P\left(\ell_n \le \left\|\frac{h_n}{\sqrt{n}}\right\|_{\mathbf{B}}\right) \le \limsup_{n\to\infty} \sup_{P \in \mathbf{P}_0} P\left(\ell_n \le \frac{2\nu_n S_n(\mathbf{B},\mathbf{E})}{\sqrt{n}}\|D_n(\theta_n)[h_n]\|_p\right) = 0, \qquad (S.78)$$

which establishes (S.74) and hence the claim of the lemma.
Proof of Corollary 3.1: We establish the corollary by appealing to Lemma S.5.4. To this end, note (S.99) holds by Assumption 3.17, and recall that

$$U^\star_{n,P}(R|\tilde{\ell}_n) \equiv \inf_{\theta \in \Theta^r_{0n}(P)} \inf_{h/\sqrt{n} \in V_n(\theta,\tilde{\ell}_n)} \|\mathbb{W}^\star_{n,P}\rho(\cdot,\theta) * q_n^{k_n} + \mathbb{D}_{n,P}(\theta)[h]\|_{\Sigma_n(P),p},$$

which is independent of $\{V_i\}_{i=1}^n$ by Assumption 3.14. Moreover, Theorem 3.3 yields

$$U_n(R|\ell_n) \ge U^\star_{n,P}(R|\tilde{\ell}_n) + o_P(a_n)$$

uniformly in $P \in \mathbf{P}_0$ with $\tilde{\ell}_n \asymp \ell_n$, while Assumption 3.16(ii) implies $K_m\tilde{\ell}_n^2 S_n(\mathbf{L},\mathbf{E}) = o(a_n n^{-1/2})$ and $k_n^{1/p}\sqrt{\log(1+k_n)}B_n \times \sup_{P\in\mathbf{P}} J_{[\,]}(\tilde{\ell}_n^{\kappa_\rho},\mathcal{F}_n,\|\cdot\|_{P,2}) = o(a_n)$, and hence

$$I_n(R) \le U_{n,P}(R|\tilde{\ell}_n) + o_P(a_n) \qquad (S.79)$$

uniformly in $P \in \mathbf{P}_0$ by Theorem 3.2(i). Since the right hand side of (S.79) shares the same distribution as $U^\star_{n,P}(R|\tilde{\ell}_n)$, the corollary holds by Lemma S.5.4.
Proof of Corollary 3.2: First note Theorem 3.2(i) implies that uniformly in $P \in \mathbf{P}_0$

$$I_n(R) \le U_{n,P}(R|\ell_n) + o_P(a_n). \qquad (S.80)$$

On the other hand, by applying Theorem 3.2(ii) with $R = \Theta$ we also obtain that

$$I_n(\Theta) = U_{n,P}(\Theta|\ell_n^u) + o_P(a_n). \qquad (S.81)$$

The claim of the corollary is therefore immediate from (S.80) and (S.81).
Proof of Corollary 3.3: First note Theorem 3.3 implies that uniformly in $P \in \mathbf{P}_0$

$$U_n(R|\ell_n) \ge U^\star_{n,P}(R|\tilde{\ell}_n) + o_P(a_n) \qquad (S.82)$$

with $\tilde{\ell}_n \asymp \ell_n$. Similarly, also note that Lemma S.5.5 implies uniformly in $P \in \mathbf{P}_0$ that

$$U_n(\Theta|\ell_n^u) \le U^\star_{n,P}(\Theta|\ell_n^u) + o_P(a_n), \qquad (S.83)$$

and hence the corollary follows with $\tilde{\ell}_n^u = \ell_n^u$ from (S.82) and (S.83).
Proof of Corollary 3.4: The proof of the first claim follows by the same arguments as in Corollary 3.1 but employing Corollaries 3.2 and 3.3 in place of Theorems 3.2 and 3.3. The proof of the second claim is immediate from $U_n(\Theta|+\infty) \le U_n(\Theta|\ell_n^u)$.
Lemma S.5.1. Let Assumptions 3.2, 3.3, 3.4, 3.6, 3.7, 3.8, 3.9(i), 3.10(ii)(iii), 3.14, 3.15 hold, and suppose $\ell_n \downarrow 0$ satisfies $k_n^{1/p}\sqrt{\log(1+k_n)}B_n \sup_{P\in\mathbf{P}} J_{[\,]}(\ell_n^{\kappa_\rho},\mathcal{F}_n,\|\cdot\|_{P,2}) = o(a_n)$ and $K_m\ell_n^2 S_n(\mathbf{L},\mathbf{E}) = o(a_n n^{-1/2})$. It then follows that uniformly in $P \in \mathbf{P}_0$ we have

$$U_n(R|\ell_n) = \inf_{\theta \in \Theta^r_n} \inf_{h/\sqrt{n} \in V_n(\theta,\ell_n)} \|\mathbb{W}^\star_{n,P}\rho(\cdot,\theta) * q_n^{k_n} + \mathbb{D}_{n,P}(\theta)[h]\|_{\Sigma_n(P),p} + o_P(a_n).$$
Proof: First note Assumption 3.10(ii) implies Assumption 3.5, and hence Assumptions 3.10(ii) and 3.6 allow us to apply Theorem 3.1(i) to conclude $\vec{d}_H(\Theta^r_n, \Theta^r_{0n}, \|\cdot\|_{\mathbf{E}}) = O_P(\mathcal{R}_n + \nu_n\tau_n)$ uniformly in $P \in \mathbf{P}_0$. Thus, for any $\varepsilon > 0$ it follows that

$$\liminf_{n\to\infty} \inf_{P \in \mathbf{P}_0} P(\Theta^r_n \subseteq \{\theta \in \Theta_n \cap R : \vec{d}_H(\theta, \Theta^r_{0n}(P), \|\cdot\|_{\mathbf{E}}) \le \varepsilon\}) = 1. \qquad (S.84)$$
Furthermore, for any $\theta \in \Theta^r_n$ and $h/\sqrt{n} \in V_n(\theta,\ell_n)$ note that $\Upsilon_G(\theta + h/\sqrt{n}) \le 0$ and $\Upsilon_F(\theta + h/\sqrt{n}) = 0$ by definition of $V_n(\theta,\ell_n)$. Thus, $\theta + h/\sqrt{n} \in R$ for any $\theta \in \Theta^r_n$ and $h/\sqrt{n} \in V_n(\theta,\ell_n)$, and hence Assumption 3.15(ii) allows us to conclude

$$\liminf_{n\to\infty} \inf_{P \in \mathbf{P}_0} P\left(\theta + \frac{h}{\sqrt{n}} \in \Theta_n \cap R \text{ for all } \theta \in \Theta^r_n \text{ and } \frac{h}{\sqrt{n}} \in V_n(\theta,\ell_n)\right) = \liminf_{n\to\infty} \inf_{P \in \mathbf{P}_0} P\left(\theta + \frac{h}{\sqrt{n}} \in \Theta_n \text{ for all } \theta \in \Theta^r_n \text{ and } \frac{h}{\sqrt{n}} \in V_n(\theta,\ell_n)\right) = 1 \qquad (S.85)$$
due to $\|h/\sqrt{n}\|_{\mathbf{B}} \le \ell_n \downarrow 0$ for any $h/\sqrt{n} \in V_n(\theta,\ell_n)$. Therefore, results (S.84) and (S.85), Assumption 3.15(i), and Lemma S.5.2 yield uniformly in $P \in \mathbf{P}_0$

$$\sup_{\theta \in \Theta^r_n} \sup_{h/\sqrt{n} \in V_n(\theta,\ell_n)} \|D_n(\theta)[h] - D_{n,P}(\theta)[h]\|_p = o_P(a_n). \qquad (S.86)$$

Moreover, since $\Theta^r_n \subseteq \Theta_n \cap R$ almost surely, we also have from Assumption 3.14 that

$$\sup_{\theta \in \Theta^r_n} \|\mathbb{W}_n\rho(\cdot,\theta) * q_n^{k_n} - \mathbb{W}^\star_{n,P}\rho(\cdot,\theta) * q_n^{k_n}\|_p = o_P(a_n) \qquad (S.87)$$
uniformly in $P \in \mathbf{P}$. Therefore, since $\|\Sigma_n\|_{o,p} = O_P(1)$ uniformly in $P \in \mathbf{P}$ by Lemma S.3.4, we obtain from results (S.86) and (S.87) and Lemma S.4.6 that

$$U_n(R|\ell_n) = \inf_{\theta \in \Theta^r_n} \inf_{h/\sqrt{n} \in V_n(\theta,\ell_n)} \|\mathbb{W}^\star_{n,P}\rho(\cdot,\theta) * q_n^{k_n} + \mathbb{D}_{n,P}(\theta)[h]\|_{\Sigma_n,p} + o_P(a_n) \qquad (S.88)$$

uniformly in $P \in \mathbf{P}_0$. Next, note that by Assumption 3.4(iii) there exists a constant $C_0 < \infty$ such that $\|\Sigma_n(P)^{-1}\|_{o,p} \le C_0$ for all $n$ and $P \in \mathbf{P}$. Thus, using that $\|\Sigma_n a\|_p \le \|\Sigma_n\Sigma_n(P)^{-1}\|_{o,p}\|\Sigma_n(P)a\|_p$ for any $a \in \mathbf{R}^{k_n}$ and the triangle inequality, we obtain

$$\|\mathbb{W}^\star_{n,P}\rho(\cdot,\theta) * q_n^{k_n} + \mathbb{D}_{n,P}(\theta)[h]\|_{\Sigma_n,p} \le \{C_0\|\Sigma_n - \Sigma_n(P)\|_{o,p} + 1\}\|\mathbb{W}^\star_{n,P}\rho(\cdot,\theta) * q_n^{k_n} + \mathbb{D}_{n,P}(\theta)[h]\|_{\Sigma_n(P),p} \qquad (S.89)$$
for any $\theta \in \Theta_n \cap R$, $h \in \mathbf{B}_n$, and $P \in \mathbf{P}$. In particular, since $0 \in V_n(\theta,\ell_n)$ for any $\theta \in \Theta_n \cap R$, Assumptions 3.4(iii), 3.10(iii), Markov's inequality, and Lemma S.4.4 yield

$$\|\Sigma_n - \Sigma_n(P)\|_{o,p} \times \inf_{\theta \in \Theta^r_n} \inf_{h/\sqrt{n} \in V_n(\theta,\ell_n)} \|\mathbb{W}^\star_{n,P}\rho(\cdot,\theta) * q_n^{k_n} + \mathbb{D}_{n,P}(\theta)[h]\|_{\Sigma_n,p} \le \|\Sigma_n - \Sigma_n(P)\|_{o,p} \times \sup_{\theta \in \Theta_n \cap R} \|\mathbb{W}^\star_{n,P}\rho(\cdot,\theta) * q_n^{k_n}\|_{\Sigma_n(P),p} = o_P(a_n) \qquad (S.90)$$
uniformly in $P \in \mathbf{P}$. It then follows from (S.89) and (S.90) that uniformly in $P \in \mathbf{P}$

$$\inf_{\theta \in \Theta^r_n} \inf_{h/\sqrt{n} \in V_n(\theta,\ell_n)} \|\mathbb{W}^\star_{n,P}\rho(\cdot,\theta) * q_n^{k_n} + \mathbb{D}_{n,P}(\theta)[h]\|_{\Sigma_n,p} \le \inf_{\theta \in \Theta^r_n} \inf_{h/\sqrt{n} \in V_n(\theta,\ell_n)} \|\mathbb{W}^\star_{n,P}\rho(\cdot,\theta) * q_n^{k_n} + \mathbb{D}_{n,P}(\theta)[h]\|_{\Sigma_n(P),p} + o_P(a_n). \qquad (S.91)$$

The reverse inequality to (S.91) can be obtained by identical arguments and exploiting $\max\{\|\Sigma_n\|_{o,p}, \|\Sigma_n^{-1}\|_{o,p}\} = O_P(1)$ uniformly in $P \in \mathbf{P}$ by Lemma S.3.4. The claim of the lemma then follows from (S.88) and (S.91) (and its reverse inequality).
Lemma S.5.2. Let Assumptions 3.3(i)-(ii), 3.7, 3.8, 3.9(i) hold, and define

$$V_n(\theta,\ell_n) \equiv \left\{\frac{h}{\sqrt{n}} \in \mathbf{B}_n : \theta + \frac{h}{\sqrt{n}} \in \Theta_n \cap R \text{ and } \left\|\frac{h}{\sqrt{n}}\right\|_{\mathbf{E}} \le \ell_n\right\}. \qquad (S.92)$$

If $\ell_n \downarrow 0$ satisfies $k_n^{1/p}\sqrt{\log(1+k_n)}B_n \times \sup_{P\in\mathbf{P}} J_{[\,]}(\ell_n^{\kappa_\rho},\mathcal{F}_n,\|\cdot\|_{P,2}) = o(a_n)$ and $K_m\ell_n^2 \times S_n(\mathbf{L},\mathbf{E}) = o(a_n n^{-1/2})$, then there is an $\varepsilon > 0$ such that uniformly in $P \in \mathbf{P}$

$$\sup_{\theta \in \Theta_n \cap R : \vec{d}_H(\theta,\Theta^r_{0n}(P),\|\cdot\|_{\mathbf{E}}) \le \varepsilon}\ \sup_{h/\sqrt{n} \in V_n(\theta,\ell_n)} \|D_n(\theta)[h] - D_{n,P}(\theta)[h]\|_p = o_P(a_n). \qquad (S.93)$$
Proof: By definition of the set $V_n(\theta,\ell_n)$, we have $\theta + h/\sqrt{n} \in \Theta_n \cap R$ for any $\theta \in \Theta_n \cap R$ and $h/\sqrt{n} \in V_n(\theta,\ell_n)$. Therefore, since $\|h/\sqrt{n}\|_{\mathbf{E}} \le \ell_n$ for all $h/\sqrt{n} \in V_n(\theta,\ell_n)$, we obtain

$$\sup_{\theta \in \Theta_n \cap R} \sup_{h/\sqrt{n} \in V_n(\theta,\ell_n)} \left\|D_n(\theta)[h] - \sqrt{n}E_P\left[\left(\rho\left(X,\theta + \frac{h}{\sqrt{n}}\right) - \rho(X,\theta)\right) * q_n^{k_n}(Z)\right]\right\|_p \le \sup_{\theta_1,\theta_2 \in \Theta_n \cap R : \|\theta_1-\theta_2\|_{\mathbf{E}} \le \ell_n} \|\mathbb{G}_{n,P}\rho(\cdot,\theta_1) * q_n^{k_n} - \mathbb{G}_{n,P}\rho(\cdot,\theta_2) * q_n^{k_n}\|_p. \qquad (S.94)$$

Further, we note that Assumptions 3.3(i) and 3.8 additionally allow us to conclude

$$\sup_{P \in \mathbf{P}} \sup_{\theta_1,\theta_2 \in \Theta_n \cap R : \|\theta_1-\theta_2\|_{\mathbf{E}} \le \ell_n} E_P[\|\rho(X,\theta_1) - \rho(X,\theta_2)\|_2^2 q_{k,n,\ell}^2(Z)] \le B_n^2K_\rho^2\ell_n^{2\kappa_\rho}. \qquad (S.95)$$
Next, let $\mathcal{G}_n \equiv \{f q_{k,n,\ell} : f \in \mathcal{F}_n,\ 1 \le \ell \le \mathcal{J},\ 1 \le k \le k_{n,\ell}\}$, and then observe that Assumption 3.7, result (S.95), and $\|a\|_p \le k_n^{1/p}\|a\|_\infty$ for any $a \in \mathbf{R}^{k_n}$ yield

$$\sup_{\theta_1,\theta_2 \in \Theta_n \cap R : \|\theta_1-\theta_2\|_{\mathbf{E}} \le \ell_n} \|\mathbb{G}_{n,P}\rho(\cdot,\theta_1) * q_n^{k_n} - \mathbb{G}_{n,P}\rho(\cdot,\theta_2) * q_n^{k_n}\|_p \le 2k_n^{1/p} \times \sup_{g_1,g_2 \in \mathcal{G}_n : \|g_1-g_2\|_{P,2} \le B_nK_\rho\ell_n^{\kappa_\rho}} |\mathbb{W}_{n,P}g_1 - \mathbb{W}_{n,P}g_2| + o_P(a_n) \qquad (S.96)$$

uniformly in $P \in \mathbf{P}$. Therefore, from the calculations in (S.48), Markov's inequality, and $k_n^{1/p}\sqrt{\log(1+k_n)}B_n \times \sup_{P\in\mathbf{P}} J_{[\,]}(\ell_n^{\kappa_\rho},\mathcal{F}_n,\|\cdot\|_{P,2}) = o(a_n)$ by hypothesis, we conclude

$$\sup_{\theta \in \Theta_n \cap R} \sup_{h/\sqrt{n} \in V_n(\theta,\ell_n)} \left\|D_n(\theta)[h] - \sqrt{n}E_P\left[\left(\rho\left(X,\theta + \frac{h}{\sqrt{n}}\right) - \rho(X,\theta)\right) * q_n^{k_n}(Z)\right]\right\|_p = o_P(a_n) \qquad (S.97)$$
uniformly in $P \in \mathbf{P}$. Let $N_n(P) \equiv \{\theta \in \Theta_n \cap R : \vec{d}_H(\theta, \Theta^r_{0n}(P), \|\cdot\|_{\mathbf{E}}) \le \varepsilon\}$, and select $\varepsilon$ sufficiently small for Assumption 3.9(i) to hold. We can then conclude from Lemmas S.3.5 and S.4.5, and Assumption 3.9(i) that

$$\sup_{\theta \in N_n(P)} \sup_{h/\sqrt{n} \in V_n(\theta,\ell_n)} \left\|\sqrt{n}E_P\left[\left(\rho\left(X,\theta + \frac{h}{\sqrt{n}}\right) - \rho(X,\theta)\right) * q_n^{k_n}(Z)\right] - \mathbb{D}_{n,P}(\theta)[h]\right\|_p \lesssim \sup_{\theta \in N_n(P)} \sup_{h/\sqrt{n} \in V_n(\theta,\ell_n)} K_m \times \sqrt{n}\left\|\frac{h}{\sqrt{n}}\right\|_{\mathbf{E}}\left\|\frac{h}{\sqrt{n}}\right\|_{\mathbf{L}} = o(a_n), \qquad (S.98)$$

where the final equality follows from $K_m\ell_n^2 \times S_n(\mathbf{L},\mathbf{E}) = o(a_n n^{-1/2})$ by hypothesis. Hence, the lemma follows from results (S.97) and (S.98).
Lemma S.5.3. Let Assumptions 3.3(ii) and 3.9(ii)-(iii) hold. Then there are constants $\varepsilon > 0$ and $C < \infty$ such that for all $n$, $P \in \mathbf{P}$, $\theta_0 \in \Theta^r_{0n}(P)$, $\theta_1 \in \Theta_n \cap R$ satisfying $\vec{d}_H(\theta_1, \Theta^r_{0n}(P), \|\cdot\|_{\mathbf{E}}) \le \varepsilon$, and $h_0, h_1 \in \mathbf{B}_n$ it follows that

$$\|\mathbb{D}_{n,P}(\theta_0)[h_0] - \mathbb{D}_{n,P}(\theta_1)[h_1]\|_p \le C\{M_m\|h_0 - h_1\|_{\mathbf{E}} + K_m\|\theta_0 - \theta_1\|_{\mathbf{L}}\|h_1\|_{\mathbf{E}}\}.$$

Proof: We first fix $\varepsilon > 0$ such that Assumptions 3.9(ii)-(iii) are satisfied. Then note that by Lemmas S.3.5 and S.4.5 it follows that there is a constant $C_0 < \infty$ with

$$\|\mathbb{D}_{n,P}(\theta_0)[h_0] - \mathbb{D}_{n,P}(\theta_1)[h_1]\|_p \le C_0\left\{\sum_{\ell=1}^{\mathcal{J}} \|\nabla m_{P,\ell}(\theta_0)[h_0] - \nabla m_{P,\ell}(\theta_1)[h_1]\|_{P,2}^2\right\}^{1/2}.$$

Moreover, since $(h_0 - h_1) \in \mathbf{B}_n$, we can also conclude from Assumptions 3.9(ii)-(iii) that

$$\|\nabla m_{P,\ell}(\theta_0)[h_0] - \nabla m_{P,\ell}(\theta_1)[h_1]\|_{P,2} \le \|\nabla m_{P,\ell}(\theta_0)[h_0 - h_1]\|_{P,2} + \|\nabla m_{P,\ell}(\theta_0)[h_1] - \nabla m_{P,\ell}(\theta_1)[h_1]\|_{P,2} \le M_m\|h_0 - h_1\|_{\mathbf{E}} + K_m\|\theta_1 - \theta_0\|_{\mathbf{L}}\|h_1\|_{\mathbf{E}},$$

and thus the claim of the lemma follows.
Lemma S.5.4. Suppose there exists a $\delta > 0$ such that for all $\varepsilon > 0$ and $\tilde{\alpha} \in [\alpha - \delta, \alpha + \delta]$

$$\sup_{P \in \mathbf{P}_0} P(q_{1-\tilde{\alpha},P}(I_n(R)) - \varepsilon \le I_n(R) \le q_{1-\tilde{\alpha},P}(I_n(R)) + \varepsilon) \le a_n^{-1}(\varepsilon \wedge 1) + o(1). \qquad (S.99)$$

If $I_n(R) \le U_{n,P}(R|\tilde{\ell}_n) + o_P(a_n)$ and $U_n(R|\ell_n) \ge U^\star_{n,P}(R|\tilde{\ell}_n) + o_P(a_n)$ uniformly in $P \in \mathbf{P}_0$ with $U_{n,P}(R|\tilde{\ell}_n) \stackrel{d}{=} U^\star_{n,P}(R|\tilde{\ell}_n)$ and $U^\star_{n,P}(R|\tilde{\ell}_n)$ independent of $\{V_i\}_{i=1}^n$, then

$$\limsup_{n\to\infty} \sup_{P \in \mathbf{P}_0} P(I_n(R) > q_{1-\alpha}(U_n(R|\ell_n))) \le \alpha.$$

Proof: We first note that by hypothesis there exists a positive sequence $b_n$ such that $b_n = o(a_n)$ and in addition we have uniformly in $P \in \mathbf{P}_0$ that

$$I_n(R) \le U_{n,P}(R|\tilde{\ell}_n) + o_P(b_n) \qquad U_n(R|\ell_n) \ge U^\star_{n,P}(R|\tilde{\ell}_n) + o_P(b_n). \qquad (S.100)$$

Next, observe that by Markov's inequality and result (S.100) we can conclude that

$$\limsup_{n\to\infty} \sup_{P \in \mathbf{P}_0} P(P(U^\star_{n,P}(R|\tilde{\ell}_n) > U_n(R|\ell_n) + b_n \mid \{V_i\}_{i=1}^n) > \varepsilon) \le \limsup_{n\to\infty} \sup_{P \in \mathbf{P}_0} \frac{1}{\varepsilon}P(U^\star_{n,P}(R|\tilde{\ell}_n) > U_n(R|\ell_n) + b_n) = 0. \qquad (S.101)$$

Thus, it follows from (S.101) that there exists some sequence $\eta_n \downarrow 0$ such that the event

$$\Omega_n(P) \equiv \{\{V_i\}_{i=1}^n : P(U^\star_{n,P}(R|\tilde{\ell}_n) > U_n(R|\ell_n) + b_n \mid \{V_i\}_{i=1}^n) \le \eta_n\}$$

satisfies $P(\Omega_n(P)^c) = o(1)$ uniformly in $P \in \mathbf{P}_0$. Hence, for any $t \in \mathbf{R}$ we obtain that

$$P(U_n(R|\ell_n) \le t \mid \{V_i\}_{i=1}^n)1\{\{V_i\}_{i=1}^n \in \Omega_n(P)\} \le P(U_n(R|\ell_n) \le t \text{ and } U^\star_{n,P}(R|\tilde{\ell}_n) \le U_n(R|\ell_n) + b_n \mid \{V_i\}_{i=1}^n) + \eta_n \le P(U^\star_{n,P}(R|\tilde{\ell}_n) \le t + b_n) + \eta_n, \qquad (S.102)$$

where the final inequality employed that $U^\star_{n,P}(R|\tilde{\ell}_n)$ is independent of $\{V_i\}_{i=1}^n$. Set

$$q_{1-\alpha,P}(U_{n,P}(R|\tilde{\ell}_n)) \equiv \inf\{u : P(U_{n,P}(R|\tilde{\ell}_n) \le u) \ge 1 - \alpha\}$$

and note evaluating (S.102) at $t = q_{1-\alpha}(U_n(R|\ell_n))$ yields that $\Omega_n(P)$ implies $q_{1-\alpha}(U_n(R|\ell_n)) + b_n \ge q_{1-\alpha-\eta_n,P}(U_{n,P}(R|\tilde{\ell}_n))$. Hence, $P(\Omega_n(P)^c) = o(1)$ uniformly in $P \in \mathbf{P}_0$ yields

$$\liminf_{n\to\infty} \inf_{P \in \mathbf{P}_0} P(q_{1-\alpha-\eta_n,P}(U_{n,P}(R|\tilde{\ell}_n)) \le q_{1-\alpha}(U_n(R|\ell_n)) + b_n) \ge \liminf_{n\to\infty} \inf_{P \in \mathbf{P}_0} P(\Omega_n(P)) = 1. \qquad (S.103)$$

Furthermore, arguing as in (S.102) it follows that for some sequence $\eta_n = o(1)$ we have

$$q_{1-\alpha-\eta_n,P}(I_n(R)) \le q_{1-\alpha-\eta_n,P}(U_{n,P}(R|\tilde{\ell}_n)) + b_n. \qquad (S.104)$$
Thus, employing (S.103), (S.104), condition (S.99), and $b_n = o(a_n)$, we conclude that

$$\limsup_{n\to\infty} \sup_{P \in \mathbf{P}_0} P(I_n(R) > q_{1-\alpha}(U_n(R|\ell_n))) \le \limsup_{n\to\infty} \sup_{P \in \mathbf{P}_0} P(I_n(R) > q_{1-\alpha-\eta_n,P}(U_{n,P}(R|\tilde{\ell}_n)) - b_n) \le \limsup_{n\to\infty} \sup_{P \in \mathbf{P}_0} P(I_n(R) > q_{1-\alpha-\eta_n,P}(I_n(R)) - 2b_n) = \alpha,$$

which establishes the claim of the lemma.
Lemma S.5.5. Let the conditions of Theorem 3.3 hold with $R = \Theta$, and suppose $J_n^uB_nk_n^{1/p}\sqrt{\log(1+k_n)/n} = o(\tau_n^u)$. Then, it follows uniformly in $P \in \mathbf{P}_0$ that

$$U_n(\Theta|\ell_n^u) \le U^\star_{n,P}(\Theta|\ell_n^u) + o_P(a_n).$$

Proof: Since we assumed the conditions of Theorem 3.3 hold with $R = \Theta$, we can apply Lemma S.5.1 with $\Theta_n^u$ and $V_n^u(\ell_n^u)$ in place of $\Theta_n^r$ and $V_n(\theta,\ell_n)$ to conclude

$$U_n(\Theta|\ell_n^u) = \inf_{\theta \in \Theta_n^u} \inf_{h/\sqrt{n} \in V_n^u(\ell_n^u)} \|\mathbb{W}^\star_{n,P}\rho(\cdot,\theta) * q_n^{k_n} + \mathbb{D}_{n,P}(\theta)[h]\|_{\Sigma_n(P),p} + o_P(a_n) \qquad (S.105)$$
uniformly in $P \in \mathbf{P}_0$. Next, let $\theta_{0n} \in \Theta^u_{0n}(P)$ and $h_n/\sqrt{n} \in V_n^u(\ell_n^u)$ be such that

$$\inf_{\theta \in \Theta^u_{0n}(P)} \inf_{h/\sqrt{n} \in V_n^u(\ell_n^u)} \|\mathbb{W}^\star_{n,P}\rho(\cdot,\theta) * q_n^{k_n} + \mathbb{D}_{n,P}(\theta)[h]\|_{\Sigma_n(P),p} = \|\mathbb{W}^\star_{n,P}\rho(\cdot,\theta_{0n}) * q_n^{k_n} + \mathbb{D}_{n,P}(\theta_{0n})[h_n]\|_{\Sigma_n(P),p} + o(a_n). \qquad (S.106)$$

Assumption 3.16 holding with $R = \Theta$ implies $K_m\ell_n^u((\nu_n^u\tau_n^u) \vee \ell_n^u)S_n^u(\mathbf{L},\mathbf{E}) = o(a_n n^{-1/2})$ and $k_n^{1/p}\sqrt{\log(1+k_n)}B_n \sup_{P\in\mathbf{P}} J_{[\,]}((\ell_n^u)^{\kappa_\rho},\mathcal{F}_n,\|\cdot\|_{P,2}) = o(a_n)$. Hence, there is a $\delta_n$ with

$$K_m\delta_n\ell_n^uS_n^u(\mathbf{L},\mathbf{E}) = o(a_n n^{-1/2}) \qquad (S.107)$$

$$k_n^{1/p}\sqrt{\log(1+k_n)}B_n \times \sup_{P\in\mathbf{P}} J_{[\,]}(\delta_n^{\kappa_\rho},\mathcal{F}_n^u,\|\cdot\|_{P,2}) = o(a_n), \qquad (S.108)$$

and $\nu_n^u\tau_n^u = o(\delta_n)$. Further note that since we required $J_n^uB_nk_n^{1/p}\sqrt{\log(1+k_n)/n} = o(\tau_n^u)$ and we assumed all other conditions of Theorem 3.1(ii) are satisfied when $R = \Theta$, it follows that we may find a $\theta_n$ in $\Theta_n^u$ such that uniformly in $P \in \mathbf{P}_0$

$$\|\theta_{0n} - \theta_n\|_{\mathbf{E}} = O_P(\nu_n^u\tau_n^u). \qquad (S.109)$$

Thus, $\nu_n^u\tau_n^u = o(\delta_n)$ and $\theta_n \in \Theta_n^u \subseteq \Theta_n$ imply that $(\theta_n - \theta_{0n}) \in V_n^u(\theta_{0n},\delta_n)$ with probability tending to one uniformly in $P \in \mathbf{P}_0$. Hence, applying Lemma S.4.2 with
$\Theta^u_{0n}(P)$ and $V_n^u(\theta_0,\delta_n)$ in place of $\Theta^r_{0n}(P)$ and $V_n(\theta_0,\delta_n)$, yields uniformly in $P \in \mathbf{P}_0$

$$\|\mathbb{W}^\star_{n,P}\rho(\cdot,\theta_n) * q_n^{k_n} - \mathbb{W}^\star_{n,P}\rho(\cdot,\theta_{0n}) * q_n^{k_n}\|_p = o_P(a_n). \qquad (S.110)$$

Furthermore, Lemma S.5.3 implies with probability tending to one uniformly in $P \in \mathbf{P}_0$

$$\|\mathbb{D}_{n,P}(\theta_n)[h_n] - \mathbb{D}_{n,P}(\theta_{0n})[h_n]\|_p \le K_mS_n^u(\mathbf{L},\mathbf{E})\delta_n\ell_n^u\sqrt{n} = o(a_n), \qquad (S.111)$$

where the final equality follows from (S.107). Therefore, $\|\Sigma_n(P)\|_{o,p}$ being bounded uniformly in $P \in \mathbf{P}$ by Assumption 3.4(iii), $\theta_n \in \Theta_n^u$ and $h_n/\sqrt{n} \in V_n^u(\ell_n^u)$ together with results (S.106), (S.110), and (S.111) establish that uniformly in $P \in \mathbf{P}_0$ we have

$$\inf_{\theta \in \Theta^u_{0n}(P)} \inf_{h/\sqrt{n} \in V_n^u(\ell_n^u)} \|\mathbb{W}^\star_{n,P}\rho(\cdot,\theta) * q_n^{k_n} + \mathbb{D}_{n,P}(\theta)[h]\|_{\Sigma_n(P),p} \ge \inf_{\theta \in \Theta_n^u} \inf_{h/\sqrt{n} \in V_n^u(\ell_n^u)} \|\mathbb{W}^\star_{n,P}\rho(\cdot,\theta) * q_n^{k_n} + \mathbb{D}_{n,P}(\theta)[h]\|_{\Sigma_n(P),p} + o_P(a_n). \qquad (S.112)$$

Hence, the lemma follows from $V_n^u(\theta,\ell_n^u) \subseteq V_n^u(\ell_n^u)$, (S.105), and (S.112).
S.6 Illustrative Examples
We next examine special cases of our general analysis and illustrate both how to imple-
ment our procedure and verify the assumptions in the main text.
S.6.1 Generalized Method of Moments
Our first example concerns the generalized method of moments (GMM) model of Hansen (1982). For simplicity we assume the parameter of interest $\theta_0(P) \in \Theta \subseteq \mathbf{R}^{d_\theta}$ is identified as the unique solution to the unconditional moment restriction

$$E_P[\rho(X,\theta_0(P))] = 0, \qquad (S.113)$$

where $X \in \mathbf{R}^{d_x}$ is distributed according to $P \in \mathbf{P}$ and $\rho : \mathbf{R}^{d_x} \times \Theta \to \mathbf{R}^{\mathcal{J}}$. This model maps into our general analysis by letting $Z_\ell = 1$ for all $1 \le \ell \le \mathcal{J}$. Moreover, since we have assumed $\theta_0(P)$ is identified, the hypothesis testing problem simplifies to

$$H_0 : \theta_0(P) \in R \qquad H_1 : \theta_0(P) \notin R.$$

The set $R$ is, as in the main text, defined by equality and inequality restrictions. In
particular, for known functions $\Upsilon_F : \mathbf{R}^{d_\theta} \to \mathbf{R}^{d_F}$ and $\Upsilon_G : \mathbf{R}^{d_\theta} \to \mathbf{R}^{d_G}$ we set

$$R \equiv \{\theta \in \mathbf{R}^{d_\theta} : \Upsilon_F(\theta) = 0 \text{ and } \Upsilon_G(\theta) \le 0\}. \qquad (S.114)$$

To verify Assumption 3.1, note $\mathbf{R}^d$ is a Banach space under any norm $\|\cdot\|_p$ with $1 \le p \le \infty$, so for concreteness we set $\mathbf{B} = \mathbf{R}^{d_\theta}$, $\mathbf{F} = \mathbf{R}^{d_F}$, and $\|\cdot\|_{\mathbf{B}} = \|\cdot\|_{\mathbf{F}} = \|\cdot\|_2$. The space $\mathbf{R}^d$ is in addition a lattice under the standard pointwise partial order

$$a \le b \text{ if and only if } a_i \le b_i \text{ for all } 1 \le i \le d \qquad (S.115)$$

for any $a = (a_1,\ldots,a_d)'$ and $b = (b_1,\ldots,b_d)'$ in $\mathbf{R}^d$. The least upper bound $a \vee b$ and greatest lower bound $a \wedge b$ are then given by the pointwise maximum and minimum; i.e.

$$a \vee b = (\max\{a_1,b_1\},\ldots,\max\{a_d,b_d\})' \qquad a \wedge b = (\min\{a_1,b_1\},\ldots,\min\{a_d,b_d\})'.$$

The vector $(1,\ldots,1)'$ is an order unit in $\mathbf{R}^d$ under the partial order in (S.115). As discussed in the appendix to the main paper, the order unit induces the norm

$$\inf\{\lambda > 0 : |a| \le \lambda(1,\ldots,1)'\} = \max_{1\le i\le d} |a_i|,$$

which corresponds to the usual $\|\cdot\|_\infty$ norm on $\mathbf{R}^d$. Hence, by setting $\mathbf{G} = \mathbf{R}^{d_G}$, $\|\cdot\|_{\mathbf{G}} = \|\cdot\|_\infty$, and $1_{\mathbf{G}} = (1,\ldots,1)'$ we verify the requirements of Assumption 3.1.
Since the parameter space $\Theta$ is finite dimensional and all moment restrictions are unconditional, we may set $\Theta_n = \Theta$ and $k_n = \mathcal{J}$ for all $n$. We base our test statistic on quadratic forms in the moments ($p = 2$), which implies $Q_n(\theta)$ is given by

$$Q_n(\theta) \equiv \left\{\left(\frac{1}{n}\sum_{i=1}^n \rho(X_i,\theta)\right)'\Sigma_n\left(\frac{1}{n}\sum_{i=1}^n \rho(X_i,\theta)\right)\right\}^{1/2}.$$

In what follows, we consider tests based on the un-centered statistic $I_n(R)$ and the re-centered statistic $I_n(R) - I_n(\Theta)$. To this end, we impose the following assumptions.
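To make the construction concrete, the sketch below minimizes the quadratic form with an off-the-shelf solver, treating the statistic as $\sqrt{n}$ times the minimized value of $Q_n$ over $\Theta \cap R$ (an assumption of this sketch). The moment function, data, weighting matrix, and restriction here are hypothetical placeholders, not the paper's specification.

```python
import numpy as np
from scipy.optimize import minimize

def gmm_quadform(theta, X, rho, Sigma):
    # rho_bar' Sigma rho_bar, with rho_bar the sample mean of the moment vector
    rho_bar = rho(X, theta).mean(axis=0)
    return float(rho_bar @ Sigma @ rho_bar)

def I_n(X, rho, Sigma, theta_start, constraints=()):
    # sqrt(n) * inf_{theta in Theta ∩ R} Q_n(theta); R enters through `constraints`
    res = minimize(gmm_quadform, theta_start, args=(X, rho, Sigma),
                   constraints=constraints)
    return np.sqrt(len(X)) * np.sqrt(max(res.fun, 0.0))

# Hypothetical example: rho(x, theta) = x - theta in R^2, restriction theta_1 <= theta_2.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2)) + np.array([0.0, 1.0])
rho = lambda X, theta: X - theta
Sigma = np.eye(2)
cons = [{"type": "ineq", "fun": lambda t: t[1] - t[0]}]  # Upsilon_G(theta) = theta_1 - theta_2 <= 0
stat = I_n(X, rho, Sigma, theta_start=np.zeros(2), constraints=cons)
```

Replacing `cons` with an equality constraint plays the role of $\Upsilon_F$, and the re-centered statistic subtracts the same quantity computed without constraints.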
Assumption S.6.1. (i) $\{X_i\}_{i=1}^n$ is an i.i.d. sequence with $X_i \sim P \in \mathbf{P}$; (ii) For each $P \in \mathbf{P}_0$ there exists a unique $\theta_0(P) \in \Theta$ solving (S.113); (iii) $\Theta$ is convex and compact.

Assumption S.6.2. (i) $\rho(x,\cdot) : \Theta \to \mathbf{R}^{\mathcal{J}}$ is twice differentiable for all $x \in \mathbf{R}^{d_x}$; (ii) $E_P[\sup_{\theta\in\Theta}\|\rho(X,\theta)\|_2^3]$, $E_P[\sup_{\theta\in\Theta}\|\nabla_\theta\rho(X,\theta)\|_{o,2}^2]$, and $E_P[\sup_{\theta\in\Theta}\|\nabla^2_\theta\rho(X,\theta)\|_{o,2}^{1+\delta}]$ are finite and bounded uniformly in $P \in \mathbf{P}$ for some $\delta > 0$.

Assumption S.6.3. (i) $\inf_{P\in\mathbf{P}_0}\inf_{\theta\in\Theta:\|\theta-\theta_0(P)\|_2\ge\varepsilon}\|E_P[\rho(X,\theta)]\|_2 > 0$ for all $\varepsilon > 0$; (ii) The singular values of $E_P[\nabla_\theta\rho(X,\theta_0(P))]$ are bounded away from zero in $P \in \mathbf{P}_0$.

Assumption S.6.4. (i) $\|\Sigma_n - \Sigma(P)\|_{o,2} = O_P(n^{-1/2})$ uniformly in $P \in \mathbf{P}$; (ii) $\Sigma(P)$ is invertible and $\|\Sigma(P)\|_{o,2}$ and $\|\Sigma(P)^{-1}\|_{o,2}$ are bounded uniformly in $P \in \mathbf{P}$.
In Assumption S.6.2 we focus on differentiable moments for simplicity. Assumption
S.6.3 essentially imposes strong identification of θ0(P ) and hence guarantees that θ0(P )
can be consistently estimated uniformly in P ∈ P0. Finally, Assumption S.6.4 states
the requirements on the weighting matrix Σn. The consistency rate in Assumption S.6.4
is imposed for simplicity since it is easily satisfied in this finite dimensional model.
In what follows, we set the local parameter spaces $V_n(\theta,\ell)$ and $V_n^u(\theta,\ell)$ to equal

$$V_n(\theta,\ell) = \{h/\sqrt{n} \in \mathbf{R}^{d_\theta} : \theta + h/\sqrt{n} \in \Theta \cap R \text{ and } \|h/\sqrt{n}\|_2 \le \ell\}$$
$$V_n^u(\theta,\ell) = \{h/\sqrt{n} \in \mathbf{R}^{d_\theta} : \theta + h/\sqrt{n} \in \Theta \text{ and } \|h/\sqrt{n}\|_2 \le \ell\}.$$

We also denote the random variables to which $I_n(R)$ and $I_n(\Theta)$ will be coupled by

$$U_{n,P}(R|\ell_n) \equiv \inf_{h/\sqrt{n} \in V_n(\theta_0(P),\ell_n)} \|\mathbb{W}_{n,P}\rho(\cdot,\theta_0(P)) + E_P[\nabla_\theta\rho(X,\theta_0(P))]h\|_{\Sigma(P),2} \qquad (S.116)$$
$$U_{n,P}(\Theta|\ell_n) \equiv \inf_{h/\sqrt{n} \in V_n^u(\theta_0(P),\ell_n)} \|\mathbb{W}_{n,P}\rho(\cdot,\theta_0(P)) + E_P[\nabla_\theta\rho(X,\theta_0(P))]h\|_{\Sigma(P),2}. \qquad (S.117)$$
Our distributional approximations then follow immediately from Theorem 3.2(ii).

Theorem S.6.1. Let Assumptions S.6.1, S.6.2, S.6.3, and S.6.4 hold, $\Upsilon_F$ and $\Upsilon_G$ be continuous, and set $a_n = \sqrt{\log(n)}/n^{\frac{1}{10+5d_\theta}}$. Then: for any $\ell_n, \ell_n^u \downarrow 0$ satisfying $(\ell_n \vee \ell_n^u)\sqrt{\log(1/(\ell_n \vee \ell_n^u))} = o(a_n)$ and $n^{-1/2} = o(\ell_n \vee \ell_n^u)$ we have uniformly in $P \in \mathbf{P}_0$

$$I_n(R) = U_{n,P}(R|\ell_n) + o_P(a_n)$$
$$I_n(R) - I_n(\Theta) = U_{n,P}(R|\ell_n) - U_{n,P}(\Theta|\ell_n^u) + o_P(a_n).$$
The rate of coupling of $\sqrt{\log(n)}/n^{\frac{1}{10+5d_\theta}}$ obtained in Theorem S.6.1 suffices for both the empirical process and bootstrap coupling; see Lemmas S.6.3 and S.6.4 below. While the rate is adequate for our purposes, it can potentially be improved, for example, under additional moment restrictions. Here, we rely on Yurinskii (1977) both to illustrate the diversity of coupling arguments that can be employed to verify Assumption 3.7, and to impose only the weak third moment restriction of Assumption S.6.2(ii).
As in Theorem 3.2, the conclusion of Theorem S.6.1 in fact delivers a family (indexed by $\ell_n$) of approximations to the distribution of both $I_n(R)$ and $I_n(R) - I_n(\Theta)$. Our next goal is therefore to obtain bootstrap approximations to the distributional approximations obtained in Theorem S.6.1. To this end, we write $\Upsilon_F(\theta) = (\Upsilon_{F,1}(\theta),\ldots,\Upsilon_{F,d_F}(\theta))'$ and $\Upsilon_G(\theta) = (\Upsilon_{G,1}(\theta),\ldots,\Upsilon_{G,d_G}(\theta))'$, and introduce the following assumptions.

Assumption S.6.5. For some $\varepsilon > 0$ and $B_\varepsilon \equiv \bigcup_{P\in\mathbf{P}_0}\{\theta : \|\theta - \theta_0(P)\|_2 \le \varepsilon\}$: (i) $B_\varepsilon \subseteq \Theta$; (ii) $\Upsilon_F$ and $\Upsilon_G$ are twice differentiable on $B_\varepsilon$; (iii) $\|\nabla\Upsilon_F(\theta)\|_{o,2}$ and $\|\nabla\Upsilon_G(\theta)\|_{o,2}$ are bounded on $B_\varepsilon$; (iv) $\|\nabla^2\Upsilon_{F,j}(\theta)\|_{o,2}$ is bounded on $B_\varepsilon$ for $1 \le j \le d_F$; (v) $\|\nabla^2\Upsilon_{G,j}(\theta)\|_{o,2}$ is bounded on $B_\varepsilon$ for $1 \le j \le d_G$; (vi) $\nabla\Upsilon_F(\theta)$ has full row rank on $B_\varepsilon$.
Assumption S.6.6. Either (i) $\Upsilon_F : \mathbf{R}^{d_\theta} \to \mathbf{R}^{d_F}$ is linear, or (ii) there is an $\varepsilon > 0$ and $K_d < \infty$ such that the singular values of $\nabla\Upsilon_F(\theta_1)'$ are bounded away from zero uniformly in $\theta_1 \in \{\theta : \|\theta - \theta_0(P)\|_2 \le \varepsilon\}$ and $P \in \mathbf{P}_0$, and there is an $h_0(P) \in \mathcal{N}(\nabla\Upsilon_F(\theta_0(P)))$ with $\|h_0(P)\|_2 \le K_d$ satisfying $\Upsilon_{G,j}(\theta_0(P)) + \nabla\Upsilon_{G,j}(\theta_0(P))[h_0(P)] \le -\varepsilon$ for all $1 \le j \le d_G$.
In order to describe the structure of the bootstrap procedure in this application, let $\theta_n$ and $\theta_n^u$ satisfy

$$Q_n(\theta_n) \le \inf_{\theta\in\Theta\cap R} Q_n(\theta) + o_P(n^{-1/2}) \qquad Q_n(\theta_n^u) \le \inf_{\theta\in\Theta} Q_n(\theta) + o_P(n^{-1/2})$$

uniformly in $P \in \mathbf{P}$; e.g. $\theta_n$ and $\theta_n^u$ may be set to equal any minimizer of $Q_n$ over $\Theta \cap R$ and $\Theta$ when such minimizers exist. Employing $\theta_n$ and $\theta_n^u$ we then obtain estimators for the distribution of the random variable $\mathbb{W}_{n,P}\rho(\cdot,\theta_0(P))$ and for the derivative $E_P[\nabla_\theta\rho(X,\theta_0(P))]$ by evaluating the following expressions at both $\theta = \theta_n$ and $\theta = \theta_n^u$:
$$\mathbb{W}_n\rho(\cdot,\theta) \equiv \frac{1}{\sqrt{n}}\sum_{i=1}^n \omega_i\left\{\rho(X_i,\theta) - \frac{1}{n}\sum_{j=1}^n \rho(X_j,\theta)\right\} \qquad (S.118)$$

$$D_n(\theta) \equiv \frac{1}{n}\sum_{i=1}^n \nabla_\theta\rho(X_i,\theta), \qquad (S.119)$$

where $\{\omega_i\}_{i=1}^n$ is an i.i.d. sample of standard normal random variables independent of the data $\{X_i\}_{i=1}^n$. We note that since the moments are assumed differentiable, we employ an analytical derivative in (S.119) for the resulting computational simplicity.
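A minimal sketch of the multiplier bootstrap quantities in (S.118) and (S.119), under the illustrative assumption of a linear moment $\rho(x,\theta) = x - \theta$ (whose analytic derivative is $-I$); all names and data below are placeholders.

```python
import numpy as np

def W_hat(X, theta, rho, omega):
    # (S.118): (1/sqrt(n)) * sum_i omega_i * {rho(X_i, theta) - rho_bar}
    r = rho(X, theta)
    return (omega[:, None] * (r - r.mean(axis=0))).sum(axis=0) / np.sqrt(len(X))

def D_hat(X, theta, grad_rho):
    # (S.119): sample average of the analytic derivative of rho at theta
    return grad_rho(X, theta).mean(axis=0)

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
rho = lambda X, theta: X - theta
grad_rho = lambda X, theta: np.broadcast_to(-np.eye(2), (len(X), 2, 2))
omega = rng.standard_normal(len(X))     # i.i.d. N(0,1) multipliers, independent of X
W = W_hat(X, np.zeros(2), rho, omega)   # one bootstrap draw of the process
D = D_hat(X, np.zeros(2), grad_rho)     # here exactly -I
```

Repeating the draw of `omega` across bootstrap replications produces the conditional distribution used below.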
With regards to the local parameter space, we note that the construction of $V_n(\theta,\ell)$ requires the bound $K_g$ on the second derivative of $\Upsilon_G$ (as specified in Assumption 3.11). In particular, Assumption S.6.5(v) implies Assumption 3.11 is satisfied with

$$K_g \equiv \max_{1\le j\le d_G}\sup_{\theta\in B_\varepsilon}\|\nabla^2_\theta\Upsilon_{G,j}(\theta)\|_{o,2}$$

(see Lemma S.6.5). If an a-priori bound on the second derivative is not available, then it is also possible to simply substitute $K_g$ with the data driven choice

$$\hat{K}_g \equiv \max_{1\le j\le d_G}\sup_{\theta\in\Theta:\|\theta-\theta_n\|_2\le r_n}\|\nabla^2_\theta\Upsilon_{G,j}(\theta)\|_{o,2}.$$

Given $K_g$ (or its sample analogue $\hat{K}_g$), we then note the set $G_n(\theta)$ in this application equals

$$G_n(\theta) = \left\{\frac{h}{\sqrt{n}} \in \mathbf{R}^{d_\theta} : \Upsilon_{G,j}\left(\theta + \frac{h}{\sqrt{n}}\right) \le \max\left\{\Upsilon_{G,j}(\theta) - K_gr_n\left\|\frac{h}{\sqrt{n}}\right\|_2, -r_n\right\} \text{ for } 1 \le j \le d_G\right\}.$$
Furthermore, in this parametric problem we may additionally specify the bandwidth $\ell_n$ to be infinite. Hence, the sample analogue to the local parameter space is given by

$$V_n(\theta,+\infty) = \left\{\frac{h}{\sqrt{n}} \in \mathbf{R}^{d_\theta} : \frac{h}{\sqrt{n}} \in G_n(\theta) \text{ and } \Upsilon_F\left(\theta + \frac{h}{\sqrt{n}}\right) = 0\right\}.$$

The approximations to the distributions of $I_n(R)$ and $I_n(\Theta)$ are then given by the laws of $U_n(R|+\infty)$ and $U_n(\Theta|+\infty)$ conditional on the data, where

$$U_n(R|+\infty) \equiv \inf_{h \in V_n(\theta_n,+\infty)} \|\mathbb{W}_n\rho(\cdot,\theta_n) + D_n(\theta_n)h\|_{\Sigma_n,2}$$
$$U_n(\Theta|+\infty) \equiv \inf_{h \in \mathbf{R}^{d_\theta}} \|\mathbb{W}_n\rho(\cdot,\theta_n^u) + D_n(\theta_n^u)h\|_{\Sigma_n,2}.$$
The validity of these approximations then follows from Theorem 3.3 and Corollary 3.3.

Theorem S.6.2. Let Assumptions S.6.1, S.6.2, S.6.3, S.6.4, S.6.5, and S.6.6 hold, set $a_n = \sqrt{\log(n)}/n^{\frac{1}{10+5d_\theta}}$, and let $n^{-1/2} = o(r_n)$. Then: for any sequences $\ell_n, \ell_n^u \downarrow 0$ satisfying $(\ell_n \vee \ell_n^u)^2\sqrt{\log(1/(\ell_n \vee \ell_n^u))} = o(a_nn^{-1/2})$, $\ell_n = o(r_n)$, and $n^{-1/2} = o(\ell_n \wedge \ell_n^u)$, it follows that uniformly in $P \in \mathbf{P}_0$ we have

$$U_n(R|+\infty) \ge U^\star_{n,P}(R|\ell_n) + o_P(a_n)$$
$$U_n(R|+\infty) - U_n(\Theta|+\infty) \ge U^\star_{n,P}(R|\ell_n) - U^\star_{n,P}(\Theta|\ell_n^u) + o_P(a_n).$$
Crucially, note that any sequences $\ell_n, \ell_n^u$ satisfying the conditions of Theorem S.6.2 also satisfy the conditions of Theorem S.6.1. Therefore, Theorems S.6.2 and S.6.1 together establish the validity of employing the laws of $U_n(R|+\infty)$ and $U_n(\Theta|+\infty)$ conditional on the data to approximate the laws of $I_n(R)$ and $I_n(\Theta)$. In particular, for a level $\alpha$ test we may compare the test statistic $I_n(R)$ to the critical value

$$q_{1-\alpha}(U_n(R|+\infty)) \equiv \inf\{c : P(U_n(R|+\infty) \le c \mid \{X_i\}_{i=1}^n) \ge 1-\alpha\}.$$

Similarly, for the re-centered statistic $I_n(R) - I_n(\Theta)$, valid critical values are given by

$$q_{1-\alpha}(U_n(R|+\infty) - U_n(\Theta|+\infty)) \equiv \inf\{c : P(U_n(R|+\infty) - U_n(\Theta|+\infty) \le c \mid \{X_i\}_{i=1}^n) \ge 1-\alpha\}.$$

These approximations are valid under the requirement that $r_n$ satisfies $r_n\sqrt{n} \to \infty$.
Intuitively, the bandwidth $r_n$ is meant to reflect a bound on the distance between $\theta_n$ and $\theta_0(P)$. For a data driven choice of $r_n$ we therefore suggest employing a bootstrap estimate of an upper quantile of the distribution of the unconstrained estimator. Specifically, for $\theta_n^{u\star}$ the bootstrapped version of $\theta_n^u$, we let $\hat{r}_n$ be given by

$$\hat{r}_n \equiv \inf\{c : P(\|\theta_n^{u\star} - \theta_n^u\|_2 \le c \mid \{X_i\}_{i=1}^n) \ge 1-\gamma_n\}$$

for $\gamma_n \to 0$ as the sample size $n$ tends to infinity, and employ $\hat{r}_n$ in place of $r_n$.
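Both $\hat{r}_n$ and the critical values above are empirical quantiles of conditional bootstrap draws, so a single helper suffices; the draws below are hypothetical placeholders standing in for $\|\theta_n^{u\star} - \theta_n^u\|_2$ across replications.

```python
import numpy as np

def bootstrap_quantile(draws, level):
    # inf{c : fraction of draws <= c is at least `level`}
    draws = np.sort(np.asarray(draws, dtype=float))
    k = int(np.ceil(level * len(draws)))  # smallest k with k/n >= level
    return draws[k - 1]

rng = np.random.default_rng(0)
draws = np.abs(rng.normal(size=1000))        # hypothetical bootstrap draws
r_hat = bootstrap_quantile(draws, 1 - 0.05)  # gamma_n = 0.05
```

The same call with the draws of $U_n(R|+\infty)$ and level $1-\alpha$ produces the critical value $q_{1-\alpha}(U_n(R|+\infty))$.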
S.6.1.1 Proofs and Auxiliary Results
Proof of Theorem S.6.1: We establish the theorem by simply applying Theorem 3.2(ii) to both $R$ as in (S.114) (to couple $I_n(R)$) and to $R = \Theta$ (to couple $I_n(\Theta)$). To this end, note that, as discussed, Assumption 3.1 holds, while Assumption 3.2 is directly imposed in S.6.1(i). Since $q_n^{k_n}(Z)$ equals the vector $(1,\ldots,1)' \in \mathbf{R}^{\mathcal{J}}$, it further follows Assumption 3.3(i) holds with $B_n = 1$, while Assumption 3.3(ii) is automatically satisfied. We further note that Assumption 3.3(iii) holds for $R = \Theta$ (and hence also for $R$ as in (S.114)) with $J_n = C_0$ for some $C_0 < \infty$ by Assumption S.6.2(ii) and Lemma S.6.2. Assumption 3.4 is implied by Assumption S.6.4. Lemma S.6.1 additionally verifies that Assumption 3.6 holds with $\|\cdot\|_{\mathbf{E}} = \|\cdot\|_2$ and $\nu_n^{-1} = \eta$ for some $\eta > 0$ when $R = \Theta$, and hence also when $R$ is as in (S.114). Since Assumption 3.5 holds automatically due to $E_P[\rho(X,\theta_0(P))] = 0$, we note Theorem 3.1 implies $\mathcal{R}_n \lesssim n^{-1/2}$ for both choices of $R$ under consideration. Also note Assumption 3.7 is satisfied for $R = \Theta$, and hence also for $R$ as in (S.114), by Lemma S.6.3. Additionally, since $\Theta$ is convex by Assumption S.6.1(iii), the mean value theorem and Assumption S.6.2(ii) imply

$$E_P[\|\rho(X,\theta_1) - \rho(X,\theta_2)\|_2^2] \le E_P[\sup_{\theta\in\Theta}\|\nabla_\theta\rho(X,\theta)\|_{o,2}^2]\|\theta_1 - \theta_2\|_2^2$$

for all $\theta_1, \theta_2 \in \Theta$, which verifies Assumption 3.8 holds with $\kappa_\rho = 1$. To verify Assumption 3.9, note that in this application $\nabla m_{P,\ell}(\theta) = E_P[\nabla_\theta\rho_\ell(X,\theta)]$. Hence, Assumptions 3.9(i)(ii) hold with $\|\cdot\|_{\mathbf{L}} = \|\cdot\|_2$ due to $E_P[\sup_{\theta\in\Theta}\|\nabla^2_\theta\rho(X,\theta)\|_{o,2}]$ being bounded in $P \in \mathbf{P}$ by Assumption S.6.2(ii). Similarly, Assumption 3.9(iii) is satisfied due to $E_P[\sup_{\theta\in\Theta}\|\nabla_\theta\rho(X,\theta)\|_{o,2}]$ being bounded by Assumption S.6.2(ii). Finally, we note that since $\mathcal{R}_n \lesssim n^{-1/2}$ and $\kappa_\rho = 1$, Lemma S.6.2 verifies Assumption 3.10(i). Assumption 3.10(ii) is immediate since $E_P[\rho(X,\theta_0(P))] = 0$, and Assumption 3.10(iii) holds by Assumption S.6.4(i). To conclude, simply note that the condition $k_n^{1/p}\sqrt{\log(1+k_n)}B_n \times \sup_{P\in\mathbf{P}} J_{[\,]}(\ell_n^{\kappa_\rho},\mathcal{F}_n,\|\cdot\|_{P,2}) = o(a_n)$ is implied by $\ell_n\sqrt{\log(1/\ell_n)} = o(a_n)$ by Lemma S.6.2, and $K_m\mathcal{R}_n^2 = o(a_n/\sqrt{n})$ is implied by $n^{-1/2} = o(a_n)$.
Proof of Theorem S.6.2: We first define $E_n(R|\ell_n)$ and $E_n(\Theta|\ell_n^u)$ to equal

$$E_n(R|\ell_n) \equiv \inf_{h/\sqrt{n} \in V_n(\theta_n,\ell_n)} \|\mathbb{W}_n\rho(\cdot,\theta_n) + D_n(\theta_n)h\|_{\Sigma_n,2}$$
$$E_n(\Theta|\ell_n^u) \equiv \inf_{\|h/\sqrt{n}\|_2 \le \ell_n^u} \|\mathbb{W}_n\rho(\cdot,\theta_n^u) + D_n(\theta_n^u)h\|_{\Sigma_n,2},$$

and note Lemma S.6.6 implies $U_n(R|+\infty) = E_n(R|\ell_n) + o_P(a_n)$ and $U_n(\Theta|+\infty) = E_n(\Theta|\ell_n^u) + o_P(a_n)$ uniformly in $P \in \mathbf{P}_0$ for any $\ell_n, \ell_n^u \downarrow 0$ satisfying the conditions of
the theorem. Therefore, to establish the theorem it suffices to show

$$E_n(R|\ell_n) \ge U^\star_{n,P}(R|\tilde{\ell}_n) + o_P(a_n)$$
$$E_n(R|\ell_n) - E_n(\Theta|\ell_n^u) \ge U^\star_{n,P}(R|\tilde{\ell}_n) - U^\star_{n,P}(\Theta|\tilde{\ell}_n^u) + o_P(a_n)$$

uniformly in $P \in \mathbf{P}_0$ for $\tilde{\ell}_n \asymp \ell_n$ and $\tilde{\ell}_n^u \asymp \ell_n^u$. To this end we rely on Theorem 3.3 (for $E_n(R|\ell_n)$) and Corollary 3.3 (for $E_n(R|\ell_n) - E_n(\Theta|\ell_n^u)$). We note that in the proof of Theorem S.6.1 we established that Assumptions S.6.1, S.6.2, S.6.3, and S.6.4 imply Assumptions 3.1, 3.2, 3.3, 3.4, 3.5, 3.6, 3.7, 3.8, 3.9, and 3.10 hold with $\mathcal{R}_n \lesssim n^{-1/2}$, $\nu_n \asymp 1$, $\|\cdot\|_{\mathbf{B}} = \|\cdot\|_{\mathbf{E}} = \|\cdot\|_{\mathbf{L}} = \|\cdot\|_2$, $\kappa_\rho = 1$, and $a_n = \sqrt{\log(n)}/n^{\frac{1}{10+5d_\theta}}$ for $R = \Theta$ and $R$ as in (S.114). We thus avoid repeating the arguments, and verify only Assumptions 3.11, 3.12, 3.13, 3.14, 3.15, and 3.16 for $R = \Theta$ and $R$ as in (S.114).

Next note that Lemma S.6.5 implies Assumptions 3.11, 3.12, and 3.13 are satisfied, while Lemma S.6.4 verifies Assumption 3.14 with $a_n = \sqrt{\log(n)}/n^{\frac{1}{10+5d_\theta}}$ for $R = \Theta$, and hence also for $R$ as in (S.114). Assumption 3.15(i) is immediate since $\|\cdot\|_{\mathbf{E}} = \|\cdot\|_{\mathbf{B}} = \|\cdot\|_2$ and hence $K_b = 1$, while Assumption 3.15(ii) is implied by Assumption S.6.5(i) and $\|\theta_n - \theta_0(P)\|_2 = o_P(1)$ uniformly in $P \in \mathbf{P}_0$. Assumption 3.16(i) is immediate since $S_n(\mathbf{B},\mathbf{E}) = 1$ and the choices of $\theta_n$ and $\theta_n^u$ correspond to setting $\tau_n = o(n^{-1/2})$. Similarly, it follows from Lemma S.6.2, $S_n(\mathbf{L},\mathbf{E}) = 1$, and $n^{-1/2} = o(\ell_n)$ that the condition $\ell_n^2\sqrt{\log(1/\ell_n)} = o(a_nn^{-1/2})$ verifies Assumption 3.16(ii). Moreover, since $\ell_n = o(r_n)$ and $n^{-1/2} = o(r_n)$, Assumption 3.16(iii) automatically holds. Hence, Theorem 3.3 implies

$$E_n(R|\ell_n) \ge U^\star_{n,P}(R|\tilde{\ell}_n) + o_P(a_n) \qquad (S.120)$$

uniformly in $P \in \mathbf{P}_0$ for some $\tilde{\ell}_n \asymp \ell_n$. Similarly, since $\mathcal{R}_n^u \lesssim n^{-1/2}$, the conditions of Corollary 3.3 are immediate and hence there are $\tilde{\ell}_n \asymp \ell_n$ and $\tilde{\ell}_n^u \asymp \ell_n^u$ such that

$$E_n(R|\ell_n) - E_n(\Theta|\ell_n^u) \ge U^\star_{n,P}(R|\tilde{\ell}_n) - U^\star_{n,P}(\Theta|\tilde{\ell}_n^u) + o_P(a_n). \qquad (S.121)$$

The theorem therefore follows from (S.120), (S.121), and Lemma S.6.6.
Lemma S.6.1. If Assumptions S.6.1(ii)(iii), S.6.2, S.6.3, and S.6.4(ii) hold, then Assumption 3.6 is satisfied with $R = \Theta$, $\|\cdot\|_{\mathbf{E}} = \|\cdot\|_2$, $\nu_n^{-1} = \eta$ for some $\eta > 0$, and $V_n(P) \equiv \{\theta \in \Theta : \|\theta - \theta_0(P)\|_2 \le \varepsilon\}$ for some $\varepsilon > 0$.

Proof: First note that in this application $Q_{n,P}(\theta) \equiv \|\Sigma(P)E_P[\rho(X,\theta)]\|_2$ for all $n$, and hence we write $Q_P(\theta)$ for simplicity. Also observe that Assumptions S.6.1(i), S.6.2(ii), S.6.4, and S.6.3(i) and Lemma S.6.2 allow us to apply Lemma S.3.1(i) with $\|\cdot\|_{\mathbf{A}} = \|\cdot\|_2$, $J_n = O(1)$, and $S_n(\varepsilon) > 0$ to conclude $\theta_n \in V_n(P) \equiv \{\theta \in \Theta_n : \|\theta - \theta_0(P)\|_2 \le \varepsilon\}$ with probability tending to one uniformly in $P \in \mathbf{P}_0$ for any $\varepsilon > 0$. Moreover, $\Theta$ being convex and Assumption S.6.2(ii) imply that for some $C_0 < \infty$ we have

$$\|E_P[\rho(X,\theta)] - E_P[\rho(X,\theta_0(P))] - E_P[\nabla_\theta\rho(X,\theta_0(P))](\theta - \theta_0(P))\|_2 \le C_0\|\theta - \theta_0(P)\|_2^2$$

for all $\theta \in \Theta$. Hence, since the smallest singular value of $E_P[\nabla_\theta\rho(X,\theta_0(P))]$ is bounded away from zero uniformly in $P \in \mathbf{P}_0$ by Assumption S.6.3(ii), we obtain for some $C_1 < \infty$

$$\|\theta - \theta_0(P)\|_2 \le C_1\|E_P[\nabla_\theta\rho(X,\theta_0(P))](\theta - \theta_0(P))\|_2 \le C_1\{\|E_P[\rho(X,\theta)] - E_P[\rho(X,\theta_0(P))]\|_2 + C_0\|\theta - \theta_0(P)\|_2^2\} \qquad (S.122)$$

for all $\theta \in \Theta$ and $P \in \mathbf{P}_0$. Therefore, since $E_P[\rho(X,\theta_0(P))] = 0$, Assumption 3.6(i) holds with $\|\cdot\|_{\mathbf{E}} = \|\cdot\|_2$ and $\nu_n^{-1} = \eta$ for some $\eta > 0$ due to (S.122) and Assumption S.6.4(ii). Assumption 3.6(iv) is immediate due to $Q_P(\theta_0(P)) = 0$.
Lemma S.6.2. Let F ≡ {ρℓ(·,θ) : θ ∈ Θ and 1 ≤ ℓ ≤ J}. If Assumptions S.6.1(iii) and S.6.2 hold, then it follows that sup_{P∈P} N[](ε, F, ‖·‖P,2) ≲ 1 ∨ ε^{-dθ} and sup_{P∈P} J[](ε, F, ‖·‖P,2) ≲ ε(1 + √log(1 ∨ ε^{-1})).

Proof: Since Θ is convex by Assumption S.6.1(iii), the mean value theorem and Assumption S.6.2(i) imply for any θ1, θ2 ∈ Θ and 1 ≤ ℓ ≤ J that

  |ρℓ(x,θ1) − ρℓ(x,θ2)| ≤ sup_{θ∈Θ} ‖∇θρ(x,θ)‖o,2 ‖θ1 − θ2‖2.   (S.123)

Setting D(x) ≡ sup_{θ∈Θ} ‖∇θρ(x,θ)‖o,2, then note that Theorem 2.7.11 in van der Vaart and Wellner (1996) and the right hand side of (S.123) not depending on ℓ imply

  N[](ε, F, ‖·‖P,2) ≤ J × N(ε/(2‖D‖P,2), Θ, ‖·‖2) ≲ 1 ∨ ε^{-dθ},   (S.124)

where we employed that N(ε, Θ, ‖·‖2) ≲ 1 ∨ ε^{-dθ} due to Θ being bounded by Assumption S.6.1(iii) and sup_{P∈P} ‖D‖P,2 < ∞ by Assumption S.6.2(ii).

For the second claim of the Lemma we employ the bound in (S.124) to obtain

  sup_{P∈P} J[](ε, F, ‖·‖P,2) ≲ ∫_0^ε (1 + log(1 ∨ u^{-dθ}))^{1/2} du
    = ε ∫_0^1 (1 + log(1 ∨ (εv)^{-dθ}))^{1/2} dv ≲ ε(1 + √log(1 ∨ ε^{-1})),

where the first equality follows from the change of variables v = u/ε and the final inequality is implied by the inequality 1 ∨ (ab) ≤ (1 ∨ a)(1 ∨ b).
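The final bound can be checked numerically. The sketch below is an illustration only, not part of the proof; the constant 5 absorbing the ≲ and the grid resolution are arbitrary choices. It approximates the entropy integral ∫_0^ε (1 + log(1 ∨ u^{-d}))^{1/2} du by a midpoint Riemann sum and compares it with ε(1 + √log(1 ∨ ε^{-1})):

```python
import numpy as np

def entropy_integral(eps, d, m=200000):
    # Midpoint Riemann sum for ∫_0^eps (1 + log(1 ∨ u^{-d}))^{1/2} du.
    u = (np.arange(m) + 0.5) * eps / m
    integrand = np.sqrt(1.0 + np.log(np.maximum(1.0, u ** (-float(d)))))
    return integrand.mean() * eps

def claimed_bound(eps, C=5.0):
    # Right-hand side eps * (1 + sqrt(log(1 ∨ 1/eps))), up to an arbitrary constant C.
    return C * eps * (1.0 + np.sqrt(np.log(max(1.0, 1.0 / eps))))

# The integral is dominated by the bound across dimensions and scales of eps.
for d in (1, 2, 5):
    for eps in (0.5, 0.1, 0.01):
        assert entropy_integral(eps, d) <= claimed_bound(eps)
```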
Lemma S.6.3. If Assumptions S.6.1(i)(iii) and S.6.2 hold, then it follows that Assumption 3.7 is satisfied with R = Θ and an = √log(n)/n^{1/(6+5dθ)}.

Proof: Let εn = √log(n)/n^{1/(6+5dθ)} and set δn ≡ 1 ∧ (εn^2 √n)^{-2/(2+5dθ)}, which note satisfies 1 ≥ δn = o(1). Further define Nn ≡ N(δn, Θ, ‖·‖2) and set {θk}_{k=1}^{Nn} to be the centers of the Nn balls covering Θ. For notational simplicity, we also let

  rn,P(x) ≡ ((ρ(x,θ1) − EP[ρ(X,θ1)])′, …, (ρ(x,θ_{Nn}) − EP[ρ(X,θ_{Nn})])′)′

and note rn,P(x) ∈ R^{JNn}. For any P ∈ P and η > 0 also define Cn,P(η) to equal

  Cn,P(η) ≡ (JNn) EP[‖rn,P(X)‖2^3] / (η^3 εn^3 √n).   (S.125)

It then follows by Yurinskii's coupling (see, e.g., Theorem 10.10 in Pollard (2002)) that there exists a Gaussian vector Nn,P ∈ R^{JNn} and a universal constant K0 such that

  P(‖Gn,P rn,P − Nn,P‖2 > 3ηεn) ≤ K0 Cn,P(η)(1 + |log(1/Cn,P(η))|/(JNn)).   (S.126)
Next note that Assumption S.6.2(ii) and the convexity of the map u ↦ |u|^{3/2} imply that

  sup_{P∈P} EP[‖rn,P(X)‖2^3] ≤ (JNn)^{3/2} × sup_{P∈P} (JNn)^{-1} Σ_{k=1}^{Nn} Σ_{ℓ=1}^{J} EP[|ρℓ(X,θk)|^3] ≲ Nn^{3/2}.   (S.127)

In particular, since N(ε, Θ, ‖·‖2) ≲ 1 ∨ ε^{-dθ}, it follows from δn ≤ 1 that Nn ≲ δn^{-dθ}, and hence by (S.127) and the definition of Cn,P(η) in (S.125) we obtain

  sup_{P∈P} Cn,P(η) ≲ Nn^{5/2}/(η^3 εn^3 √n) ≲ 1/(η^3 εn^3 (n δn^{5dθ})^{1/2}).   (S.128)
Moreover, since the function u ↦ u(1 + |log(1/u)|/A) with A ≥ 1 is increasing in u on the interval (0,1] and εn^3 (n δn^{5dθ})^{1/2} → ∞, we obtain from results (S.126) and (S.128) that

  limsup_{n→∞} sup_{P∈P} P(‖Gn,P rn,P − Nn,P‖2 > 3ηεn)
    ≲ limsup_{n→∞} (η^3 εn^3 (n δn^{5dθ})^{1/2})^{-1} (1 + |log(η^3 εn^3 (n δn^{5dθ})^{1/2})|/(JNn)) = 0,   (S.129)

where the final result follows by direct calculation. Letting Sn,P denote the linear span of rn,P in L2_P, we then employ Nn,P to define a Gaussian process W(1)n,P on Sn,P by setting

  W(1)n,P(Σ_{k=1}^{Nn} λk′ρ(·,θk)) ≡ (λ1′, …, λ_{Nn}′) Nn,P   (S.130)

for any {λk}_{k=1}^{Nn} with λk ∈ R^J. Letting Proj{f|Sn,P} denote the projection of f onto Sn,P under ‖·‖P,2, and assuming the probability space is suitably large to carry an isonormal process W(2)n,P on {f − Proj{f|Sn,P} : f ∈ F} that is independent of W(1)n,P, we then define the isonormal process Wn,P to be given by

  Wn,P f ≡ W(1)n,P(Proj{f|Sn,P}) + W(2)n,P(f − Proj{f|Sn,P}).   (S.131)
To establish that Wn,P meets the requirements of Assumption 3.7, next let Πnθ denote the projection of any θ ∈ Θ onto {θk}_{k=1}^{Nn} under ‖·‖2 and define the class

  Gn,P ≡ {(ρℓ(·,θ) − ρℓ(·,Πnθ)) − EP[(ρℓ(X,θ) − ρℓ(X,Πnθ))] : θ ∈ Θ, 1 ≤ ℓ ≤ J}.   (S.132)

By the mean value theorem, Θ being convex by Assumption S.6.1(iii), and ‖θ − Πnθ‖2 ≤ δn for every θ ∈ Θ due to δn-balls around {θk}_{k=1}^{Nn} covering Θ, it follows that

  sup_{θ∈Θ} |(ρℓ(x,θ) − ρℓ(x,Πnθ)) − EP[(ρℓ(X,θ) − ρℓ(X,Πnθ))]|
    ≤ {sup_{θ∈Θ} ‖∇θρ(x,θ)‖o,2 + sup_{P∈P} EP[sup_{θ∈Θ} ‖∇θρ(X,θ)‖o,2]} × δn.

Hence, setting G(x) ≡ 1 ∨ {sup_{θ∈Θ} ‖∇θρ(x,θ)‖o,2 + sup_{P∈P} EP[sup_{θ∈Θ} ‖∇θρ(X,θ)‖o,2]}, it follows that G(x)δn is an envelope for Gn,P, which by Assumption S.6.2(ii) satisfies sup_{P∈P} ‖Gδn‖P,2 ≲ δn. Further note that if [fl, fu] is a bracket containing a function f, then [fl − EP[fu(X)], fu − EP[fl(X)]] contains f − EP[f(X)] and satisfies

  ‖fu − fl − EP[fl(X) − fu(X)]‖P,2 ≤ 2‖fu − fl‖P,2

by Jensen's inequality and the triangle inequality. Therefore, Lemma S.6.2 implies

  sup_{P∈P} N[](ε, Gn,P, ‖·‖P,2) ≲ Nn × (1 ∨ ε^{-dθ}),

and hence Theorem 2.14.2 in van der Vaart and Wellner (1996) together with Gn,P having envelope δnG with G ≥ 1, sup_{P∈P} ‖G‖P,2 < ∞, and Nn ≲ δn^{-dθ} yield

  sup_{P∈P} EP[sup_{g∈Gn,P} |Gn,P g|] ≲ sup_{P∈P} δn‖G‖P,2 ∫_0^1 (1 + log N[](ε δn‖G‖P,2, Gn,P, ‖·‖P,2))^{1/2} dε
    ≲ δn ∫_0^1 (1 + log(Nn) + log(1 ∨ (εδn)^{-dθ}))^{1/2} dε ≲ δn(1 + log(δn^{-dθ}))^{1/2}.   (S.133)

Therefore, the definitions of δn and εn, result (S.133), and Markov's inequality imply

  limsup_{n→∞} sup_{P∈P} P(sup_{g∈Gn,P} |Gn,P g| > ηεn) ≲ limsup_{n→∞} δn(1 + log(δn^{-dθ}))^{1/2}/(ηεn) = 0.   (S.134)
Similarly, since Wn,P is Gaussian and 0 ∈ Gn,P, Corollary 2.2.8 in van der Vaart and Wellner (1996) and packing numbers being bounded by bracketing numbers imply

  sup_{P∈P} EP[sup_{g∈Gn,P} |Wn,P g|] ≲ sup_{P∈P} ∫_0^∞ (log N[](ε, Gn,P, ‖·‖P,2))^{1/2} dε
    ≲ sup_{P∈P} ∫_0^{2δn‖G‖P,2} (log N[](ε, Gn,P, ‖·‖P,2))^{1/2} dε ≲ δn(1 + log(δn^{-dθ}))^{1/2},   (S.135)

where in the second inequality we employed that the bracket [−δnG, δnG] covers Gn,P due to δnG being an envelope for Gn,P, and the final inequality follows from the change of variables u = ε/(2δn‖G‖P,2) and the same manipulations as in (S.133). Hence,

  limsup_{n→∞} sup_{P∈P} P(sup_{g∈Gn,P} |Wn,P g| > ηεn) ≲ limsup_{n→∞} δn(1 + log(δn^{-dθ}))^{1/2}/(ηεn) = 0,   (S.136)

by result (S.135) and Markov's inequality. To conclude, note that the definitions of Wn,P in (S.130) and (S.131), and of Gn,P in (S.132), and the triangle inequality yield

  sup_{θ∈Θ} ‖Gn,P ρ(·,θ) − Wn,P ρ(·,θ)‖2 ≤ ‖Gn,P rn,P − Nn,P‖2 + sup_{g∈Gn,P} √J |Gn,P g| + sup_{g∈Gn,P} √J |Wn,P g|.

Thus the Lemma follows from (S.129), (S.134), and (S.136).
Lemma S.6.4. If Assumptions S.6.1(i)(iii) and S.6.2 hold, then it follows that Assumption 3.14 is satisfied with R = Θ and an = log^{3/4}(n)/n^{1/(12+2dθ)}.
Proof: We aim to establish the lemma by relying on Theorem S.9.1(i) in Section S.9. To this end set ζn = n^{-1/(2(6+dθ))}, Mn = n^{1/(6+dθ)}, and Nn ≡ N(ζn, Θ, ‖·‖2). By Assumption S.6.2(ii) the function F(x) ≡ sup_{θ∈Θ} ‖ρ(x,θ)‖2 is integrable, and for any θ ∈ Θ we let

  ρ̄(x,θ) ≡ (ρ1(x,θ)1{F(x) ≤ Mn}, …, ρJ(x,θ)1{F(x) ≤ Mn})′.

Defining dn = JNn and {θk}_{k=1}^{Nn} to be the centers of the ζn-balls covering Θ, we then let

  f_n^{dn}(x) ≡ (ρ̄(x,θ1)′, …, ρ̄(x,θ_{Nn})′)′.

Next note that since each entry of the matrix f_n^{dn}(X)f_n^{dn}(X)′ is almost surely bounded by Mn^2, it follows that ‖EP[f_n^{dn}(X)f_n^{dn}(X)′]‖o,2 ≤ dn Mn^2, and hence Assumption S.9.1 in Section S.9 holds with Cn = dn Mn^2 and Kn = Mn. For every θ ∈ Θ let Πnθ denote its projection (under ‖·‖2) onto {θk}_{k=1}^{Nn} and define the class Gn ≡ {ρ̄ℓ(·,θ) − ρ̄ℓ(·,Πnθ) : θ ∈ Θ and 1 ≤ ℓ ≤ J}. Further observe Gn has envelope Gn given by

  sup_{g∈Gn} |g(x)| ≤ max_{1≤ℓ≤J} sup_{θ∈Θ} {|ρℓ(x,θ) − ρℓ(x,Πnθ)| + |ρℓ(x,Πnθ)|1{F(x) > Mn}}
    ≤ sup_{θ∈Θ} ‖∇θρ(x,θ)‖o,2 ‖θ − Πnθ‖2 + F(x)1{F(x) > Mn} ≡ Gn(x),

where in the second inequality we employed the mean value theorem and Θ being convex by Assumption S.6.1(iii). Therefore, since the ζn-balls centered around {θk}_{k=1}^{Nn} cover Θ, we obtain from Assumption S.6.2(ii) and Markov's and Hölder's inequalities that

  sup_{P∈P} EP[Gn^2(X)] ≲ sup_{P∈P} EP[sup_{θ∈Θ} ‖∇θρ(X,θ)‖o,2^2] × ζn^2 + sup_{P∈P} EP[F^3(X)]^{2/3} P(F(X) > Mn)^{1/3} ≲ ζn^2 + Mn^{-1}.   (S.137)
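The last step in (S.137) chains Hölder's inequality, EP[F^2 1{F > Mn}] ≤ EP[F^3]^{2/3} P(F > Mn)^{1/3}, with Markov's inequality, P(F > Mn) ≤ EP[F^3]/Mn^3. As an illustration only (not part of the proof), the sketch below checks this chain on a simulated sample; the distribution of F is an arbitrary choice:

```python
import numpy as np

rng = np.random.default_rng(0)
F = np.abs(rng.standard_normal(200000)) + 0.1  # arbitrary positive "envelope" sample
M = 2.0                                        # stand-in for the truncation level Mn

lhs = np.mean(F**2 * (F > M))                                        # E[F^2 1{F > M}]
holder = np.mean(F**3) ** (2/3) * np.mean(F > M) ** (1/3)            # Hölder, exponents 3/2 and 3
markov = np.mean(F**3) ** (2/3) * (np.mean(F**3) / M**3) ** (1/3)    # plus Markov's bound

# The chain behind (S.137): lhs ≤ holder ≤ markov, and markov simplifies to E[F^3]/M.
assert lhs <= holder <= markov
assert abs(markov - np.mean(F**3) / M) < 1e-9
```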
Moreover, since Θ being bounded by Assumption S.6.1(iii) implies Nn ≲ ζn^{-dθ}, we obtain

  sup_{P∈P} N[](ε, Gn, ‖·‖P,2) ≲ Nn × (1 ∨ ε^{-dθ}) ≲ ζn^{-dθ} × (1 ∨ ε^{-dθ})   (S.138)

due to Lemma S.6.2. Hence, the change of variables u = ε/‖Gn‖P,2 implies that

  sup_{P∈P} ∫_0^{‖Gn‖P,2} (1 + log N[](ε, Gn, ‖·‖P,2))^{1/2} dε
    ≲ sup_{P∈P} ‖Gn‖P,2 ∫_0^1 (1 + log(ζn^{-dθ}) + log(1 ∨ (u‖Gn‖P,2)^{-dθ}))^{1/2} du
    ≲ (ζn + Mn^{-1/2})(1 + log(ζn^{-1} + Mn^{1/2}))^{1/2},   (S.139)

where in the inequalities we employed results (S.137) and (S.138). In particular, results (S.138) and (S.139) together with Lemma S.9.4 imply that Assumption S.9.2(i) in Section S.9 is satisfied with J1n ≲ (ζn + Mn^{-1/2})(1 + log(ζn^{-1} + Mn))^{1/2}. Similarly, note that in this application, the set Bn in Assumption S.9.2(ii) consists of 0 ∈ R^{dn} and the set of vectors in R^{dn} with one coordinate equal to one and all other coordinates equal to zero. Thus, Assumption S.9.2(ii) holds with J2n = (log(1 + dn))^{1/2} ≲ (log(1 + ζn^{-1}))^{1/2}.

We have so far verified that Assumptions S.9.1 and S.9.2 in Section S.9 hold with dn ≲ ζn^{-dθ}, Kn = Mn, Cn ≲ Mn^2 ζn^{-dθ}, J1n ≲ (ζn + Mn^{-1/2})(1 + log(ζn^{-1} + Mn^{1/2}))^{1/2}, and J2n ≲ (log(1 + ζn^{-1}))^{1/2}. Since we had set ζn = n^{-1/(2(6+dθ))} and Mn = n^{1/(6+dθ)}, the requirement that dn log(1 + dn) Kn^2 = o(n) imposed by Theorem S.9.1(i) holds as well. Therefore, Assumption S.6.1(i) and Theorem S.9.1(i) finally enable us to conclude that there exists a process W*n,P that is independent of the data {Xi}_{i=1}^n and such that

  sup_{θ∈Θ} ‖Wnρ(·,θ) − W*n,Pρ(·,θ)‖2 ≤ max_{1≤ℓ≤J} sup_{θ∈Θ} √J |Wnρℓ(·,θ) − W*n,Pρℓ(·,θ)| = OP(log^{3/4}(n) n^{-1/(12+2dθ)})

uniformly in P ∈ P, which concludes the proof of the Lemma.
Lemma S.6.5. If Assumptions S.6.1(ii), S.6.5(ii)-(vi), and S.6.6 hold, then it follows that Assumptions 3.11, 3.12, and 3.13 are satisfied.

Proof: Recall that in this setting G = R^{dG} and ‖·‖G = ‖·‖∞. Then note that if θ1, θ2 ∈ {θ : ‖θ − θ0(P)‖2 ≤ ε for some P ∈ P0}, then it follows that αθ1 + (1−α)θ2 ∈ Bε for all α ∈ [0,1] and Bε as defined in Assumption S.6.5. Hence, by convexity of Bε and Proposition 7.3.3 in Luenberger (1969) it follows that

  ‖ΥG(θ1) − ΥG(θ2) − ∇ΥG(θ1)[θ1 − θ2]‖G ≤ sup_{θ∈Bε} max_{1≤j≤dG} ‖∇^2ΥG,j(θ)‖o,2 ‖θ1 − θ2‖2^2 / 2.   (S.140)

Employing again that θ1, θ2 ∈ {θ : ‖θ − θ0(P)‖2 ≤ ε for some P ∈ P0} implies θ1, θ2 ∈ Bε, we similarly obtain from Proposition 7.3.2 in Luenberger (1969) that

  ‖∇ΥG(θ1) − ∇ΥG(θ2)‖o = sup_{‖h‖2=1} max_{1≤j≤dG} |(∇ΥG,j(θ1) − ∇ΥG,j(θ2))[h]|
    ≤ sup_{θ∈Bε} max_{1≤j≤dG} ‖∇^2ΥG,j(θ)‖o,2 ‖θ1 − θ2‖2.   (S.141)
Since ‖∇^2ΥG,j(θ)‖o,2 is uniformly bounded on Bε by Assumption S.6.5(v), it follows from results (S.140) and (S.141) that Assumptions 3.11(i)-(ii) are satisfied with

  Kg ≡ sup_{θ∈Bε} max_{1≤j≤dG} ‖∇^2ΥG,j(θ)‖o,2.

Assumption S.6.5(iii) additionally implies Mg ≡ sup_{θ∈Bε} ‖∇ΥG(θ)‖o,2 < ∞, and hence verifies Assumption 3.11(iii). By identical arguments, but recalling F = R^{dF} and ‖·‖F = ‖·‖2, it follows Assumptions S.6.5(iii)-(iv) imply Assumptions 3.12(i)-(iii) hold with

  Kf ≡ √dF sup_{θ∈Bε} max_{1≤j≤dF} ‖∇^2ΥF,j(θ)‖o,2   Mf ≡ sup_{θ∈Bε} ‖∇ΥF(θ)‖o,2.   (S.142)

To conclude, note that since Assumption S.6.5(vi) implies the range of ∇ΥF(θ) equals R^{dF} for all θ ∈ Bε, it follows that ∇ΥF(θ) admits a right inverse. Moreover, if ΥF is linear, then Kf = 0 and hence Assumption 3.12(iv) holds. On the other hand, if ΥF is nonlinear, then note ∇ΥF(θ)^− = ∇ΥF(θ)′(∇ΥF(θ)∇ΥF(θ)′)^{-1}, and therefore ‖∇ΥF(θ)^−‖o,2 is bounded for all θ ∈ Bε due to ‖∇ΥF(θ)‖o,2 being bounded on Bε by Assumption S.6.5(ii), and the smallest singular value of ∇ΥF(θ)′ being bounded away from zero on Bε by Assumption S.6.6(ii). It thus follows Assumption 3.12(iv) holds as well (after possibly increasing Mf in (S.142)). Finally, we note Assumption S.6.6 directly implies Assumption 3.13, and hence the Lemma follows.
Lemma S.6.6. Let Assumptions S.6.1, S.6.2, S.6.3, and S.6.4 hold, and set an = √log(n)/n^{1/(10+5dθ)}. For any ℓn with n^{-1/2} = o(ℓn), it follows uniformly in P ∈ P0 that

  Un(R|+∞) = inf_{h/√n ∈ V̂n(θ̂n, ℓn)} ‖Ŵnρ(·,θ̂n) + D̂n(θ̂n)h‖Σ̂n,2 + oP(an)
  Un(Θ|+∞) = inf_{‖h/√n‖2 ≤ ℓn} ‖Ŵnρ(·,θ̂n^u) + D̂n(θ̂n^u)h‖Σ̂n,2 + oP(an).
Proof: We establish the lemma by relying on Lemma 3.1. To this end note that in the proof of Theorem S.6.1, Assumptions 3.2, 3.3(i)(iii), and 3.4 were verified, and θ̂n and θ̂n^u were shown to be consistent for θ0(P) with ‖·‖B = ‖·‖E = ‖·‖2, Rn ≍ n^{-1/2}, and νn ≍ 1 for both R as in (S.114) and for R = Θ. Next, note Lemma S.6.4 verifies Assumption 3.14 holds with an = √log(n)/n^{1/(10+5dθ)} for R = Θ and hence also for R as in (S.114). Also observe that the mean value theorem and Θ being convex imply

  |∂ρℓ(x,θ1)/∂θk − ∂ρℓ(x,θ2)/∂θk| ≤ max_{1≤ℓ≤J} sup_{θ∈Θ} ‖∇θ^2 ρℓ(x,θ)‖o,2 ‖θ1 − θ2‖2   (S.143)

for any θ1, θ2 ∈ Θ, 1 ≤ ℓ ≤ J, and 1 ≤ k ≤ dθ. Thus, Assumption S.6.2(ii) implies there exists a C0 < ∞ such that for all P ∈ P and θ1, θ2 ∈ Θ it follows that

  ‖EP[∇θρ(X,θ1) − ∇θρ(X,θ2)]‖o,2 ≤ C0‖θ1 − θ2‖2.

In particular, the function θ ↦ EP[∇θρ(X,θ)] is uniformly continuous in θ and P ∈ P, which implies by Assumption S.6.3(ii) that there is an ε0 > 0 such that the smallest singular value of EP[∇θρ(X,θ)] is bounded away from zero on {θ ∈ Θ : ‖θ − θ0(P)‖2 ≤ ε0 for some P ∈ P0}. Since νn ≍ 1, p = 2, and ‖·‖E = ‖·‖2, the Lemma 3.1 requirement ‖h‖E ≤ νn‖Dn,P(θ)[h]‖p for all θ ∈ An(P), P ∈ P0, and h ∈ √n{Bn ∩ R − θ} holds with An(P) = (Θ_{0n}^r(P))^{ε0} and R = Θ (and hence also for R as in (S.114)). Moreover, by uniform consistency of θ̂n and θ̂n^u, it follows that θ̂n, θ̂n^u ∈ An(P) with probability tending to one uniformly in P ∈ P0.
To conclude, define F ≡ {∂ρℓ(·,θ)/∂θk : θ ∈ Θ, 1 ≤ ℓ ≤ J, 1 ≤ k ≤ dθ} and let F(x) ≡ max_{1≤ℓ≤J} sup_{θ∈Θ} ‖∇θ^2 ρℓ(x,θ)‖o,2. Then note that if ε-balls around {θi}_{i=1}^{Nε} cover Θ, then result (S.143) implies that the brackets [∂ρℓ(·,θi)/∂θk − εF, ∂ρℓ(·,θi)/∂θk + εF] cover F. Writing these brackets as {[f_{l,k}, f_{u,k}]}_{k=1}^{Kε} for conciseness, further note that Kε = J dθ Nε < ∞ since Nε < ∞ due to Θ being compact, and C1 ≡ sup_{P∈P} ‖F‖P,1 < ∞ by Assumption S.6.2(ii). Moreover, by definition of [f_{l,k}, f_{u,k}] it further follows that

  EP[f_{u,k}(X) − f_{l,k}(X)] = ‖f_{u,k} − f_{l,k}‖P,1 ≤ 2εC1   (S.144)

for all P ∈ P. Hence, employing the bound f(x) − EP[f(X)] ≤ f_{u,k}(x) − EP[f_{l,k}(X)] for [f_{l,k}, f_{u,k}] the bracket containing f, we obtain from result (S.144) that

  sup_{f∈F} n^{-1} Σ_{i=1}^n {f(Xi) − EP[f(X)]} ≤ max_{1≤k≤Kε} |n^{-1} Σ_{i=1}^n {f_{u,k}(Xi) − EP[f_{u,k}(X)]}| + 2εC1 = 2εC1 + oP(1),   (S.145)

where the equality holds uniformly in P ∈ P by Assumption S.6.2(ii), Kε < ∞, and Theorem 2.8.1 in van der Vaart and Wellner (1996). By identical arguments, we have

  inf_{f∈F} n^{-1} Σ_{i=1}^n {f(Xi) − EP[f(X)]} ≥ −max_{1≤k≤Kε} |n^{-1} Σ_{i=1}^n {f_{l,k}(Xi) − EP[f_{l,k}(X)]}| − 2εC1 = −2εC1 + oP(1)   (S.146)

uniformly in P ∈ P. We thus conclude from results (S.145) and (S.146) that F is Glivenko-Cantelli uniformly in P ∈ P. Since by Assumption S.6.5(i) there exists an ε > 0 such that {θ : ‖θ − θ0(P)‖2 ≤ ε} ⊆ Θ for all P ∈ P0, we obtain

  sup_{θ:‖θ−θ0(P)‖2≤ε} sup_{h∈R^{dθ}:‖h/√n‖2≥ℓn} ‖D̂n(θ)[h] − Dn,P(θ)[h]‖2 / ‖h‖2
    ≤ sup_{θ∈Θ} ‖n^{-1} Σ_{i=1}^n ∇θρ(Xi,θ) − EP[∇θρ(X,θ)]‖o,2 = oP(1)   (S.147)

uniformly in P ∈ P, and where the equality follows from F being Glivenko-Cantelli uniformly in P ∈ P. Since νn ≍ 1, result (S.147) verifies condition (24) in Lemma 3.1. This concludes verifying the requirements of Lemma 3.1, and hence the present Lemma follows for any ℓn satisfying Sn(B,E)Rn = o(ℓn), which in this application is equivalent to n^{-1/2} = o(ℓn) due to ‖·‖B = ‖·‖E = ‖·‖2 and Rn ≍ n^{-1/2}.
S.6.2 Consumer Demand
We base our next example on a long-standing literature aiming to replace paramet-
ric assumptions with shape restrictions implied by economic theory (Matzkin, 1994).
Specifically, suppose that quantity demanded by individual i, denoted Qi, satisfies

  Qi = gP(Si, Yi) + Wi′γP + Ui,

where Si ∈ R+ denotes price, Yi ∈ R+ denotes income, and Wi ∈ R^{dw} is a set of covariates. In addition, we assume there is an instrument Zi yielding the restriction

  EP[Q − gP(S,Y) − W′γP | Z] = 0.   (S.148)

For instance, under exogeneity of prices we may let Z = (S, Y, W′)′ as in Blundell et al. (2012). Alternatively, if there is a concern that prices are endogenous, then we may set Z = (I, Y, W′)′ for I an instrument for S, as in Blundell et al. (2017).
Our goal is to conduct inference on the level of demand at a particular price-income pair (s0, y0) while imposing that the function gP satisfies the Slutsky restriction

  ∂gP(s,y)/∂s + gP(s,y) ∂gP(s,y)/∂y ≤ 0.   (S.149)

To map this problem into our framework, we assume that for some set Ω, (S,Y) ∈ Ω ⊆ R+^2 with probability one for all P ∈ P and impose that gP ∈ C1B(Ω), where

  CmB(Ω) ≡ {g : Ω → R s.t. ‖g‖m,∞ < ∞}   ‖g‖m,∞ ≡ sup_{0≤|α|≤m} sup_{(s,y)∈Ω} |∇^α g(s,y)|.

Since the parameters in this model are (gP, γP) with γP ∈ R^{dw}, we set B = C1B(Ω) × R^{dw} and equip it with the norm ‖θ‖B = max{‖g‖1,∞, ‖γ‖2} for any (g,γ) = θ ∈ B. Note that in this application, X = (Q,S,Y,W) ∈ X ≡ R+ × Ω × R^{dw} and

  ρ(X,θ) = Q − g(S,Y) − W′γ.   (S.150)
We further assume that θ0(P ) = (gP , γP ) is identified as the unique solution to (S.148).
In order to impose the Slutsky restriction in (S.149) we let G = C0B(Ω) and ‖·‖G = ‖·‖∞, where with some abuse of notation we write ‖·‖∞ in place of ‖·‖0,∞. The space C0B(Ω) is a Banach lattice under the standard pointwise ordering given by

  a ≤ b if and only if a(s,y) ≤ b(s,y) for all (s,y) ∈ Ω.   (S.151)

Moreover, the constant function c ∈ C0B(Ω) satisfying c(s,y) = 1 for all (s,y) ∈ Ω is an order unit under the partial ordering in (S.151). Its induced norm on C0B(Ω) equals

  inf{λ > 0 : |a| ≤ λc} = sup_{(s,y)∈Ω} |a(s,y)|,

which coincides with the norm ‖·‖∞ on C0B(Ω), and we therefore set 1G = c.
To encode the Slutsky restriction in (S.149) we then let the map ΥG : B → G equal

  ΥG(θ)(s,y) = ∂g(s,y)/∂s + g(s,y) ∂g(s,y)/∂y   (S.152)

for any θ = (g,γ) ∈ B. Finally, to test whether the level of demand at a prescribed price s0 and income y0 equals a hypothesized value c0, we set F = R, ‖·‖F = |·|, and

  ΥF(θ) = g(s0,y0) − c0   (S.153)

for any θ = (g,γ) ∈ B. By setting R = {θ ∈ B : ΥG(θ) ≤ 0 and ΥF(θ) = 0} and conducting test inversion (over different values of c0) of the null hypothesis

  H0 : θ0(P) ∈ R   against   H1 : θ0(P) ∉ R,

we may then obtain a confidence region for the level of demand at price s0 and income y0 that imposes the Slutsky restriction in (S.149).
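For intuition, the Slutsky expression in (S.149) is straightforward to evaluate numerically. The sketch below, which is purely illustrative, checks the restriction on a grid for a simple linear demand function; the particular g, its derivatives, and the grid over Ω = [0,1]^2 are our own assumptions, not objects from the paper:

```python
import numpy as np

def slutsky(g, g_s, g_y, s, y):
    # Slutsky expression from (S.149): dg/ds + g * dg/dy, evaluated pointwise.
    return g_s(s, y) + g(s, y) * g_y(s, y)

# Hypothetical linear demand g(s, y) = 1 - s + 0.1 y on Omega = [0, 1]^2.
g   = lambda s, y: 1.0 - s + 0.1 * y
g_s = lambda s, y: -1.0 + 0.0 * s   # dg/ds
g_y = lambda s, y: 0.1 + 0.0 * s    # dg/dy

s, y = np.meshgrid(np.linspace(0, 1, 50), np.linspace(0, 1, 50))
vals = slutsky(g, g_s, g_y, s, y)

# Here dg/ds + g dg/dy = -1 + 0.1 (1 - s + 0.1 y), with maximum about -0.89 on the
# grid, so this g satisfies the inequality part of R everywhere on Omega.
assert vals.max() <= 0
```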
We set the parameter space to be a ball in B under ‖·‖B by setting Θ to be

  Θ ≡ {(g,γ) ∈ C1B(Ω) × R^{dw} : ‖g‖1,∞ ≤ C0 and ‖γ‖2 ≤ C0}   (S.154)

for some C0 < ∞. Given a sequence of approximating functions {p_{j,n}}_{j=1}^{jn}, we then let pn^{jn}(s,y) ≡ (p_{1,n}(s,y), …, p_{jn,n}(s,y))′ and set the sieve Θn to equal

  Θn ≡ {(pn^{jn}′β, γ) : ‖pn^{jn}′β‖1,∞ ≤ C0 and ‖γ‖2 ≤ C0}.

Similarly, for a sequence {q_{k,n}}_{k=1}^{kn} of transformations of the conditioning variable Z, we let qn^{kn}(z) ≡ (q_{1,n}(z), …, q_{kn,n}(z))′. We base our test statistic on the quadratic form

  Qn(θ) ≡ {(n^{-1} Σ_{i=1}^n (Qi − g(Si,Yi) − Wi′γ) qn^{kn}(Zi))′ Σ̂n (n^{-1} Σ_{i=1}^n (Qi − g(Si,Yi) − Wi′γ) qn^{kn}(Zi))}^{1/2}

for some weighting matrix Σ̂n and every (g,γ) = θ ∈ Θ. The constrained (i.e. In(R)) and unconstrained (i.e. In(Θ)) statistics are then simply given by the minimum of √n Qn over Θn ∩ R and Θn respectively.
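For concreteness, the objective Qn can be coded in a few lines. The sketch below is illustrative only: the monomial bases for g and for the instruments, the simulated design, and the identity weighting matrix Σ̂n = I are our own choices, not requirements of the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
S, Y = rng.uniform(0, 1, n), rng.uniform(0, 1, n)
W = rng.standard_normal((n, 2))
Q = 1.0 - S + 0.1 * Y + W @ np.array([0.5, -0.5]) + 0.1 * rng.standard_normal(n)

def p_basis(s, y):   # hypothetical sieve for g: (1, s, y)
    return np.column_stack([np.ones_like(s), s, y])

def q_basis(s, y):   # hypothetical instrument transformations: (1, s, y, s*y)
    return np.column_stack([np.ones_like(s), s, y, s * y])

def Qn(beta, gamma, Sigma=None):
    # Square root of the quadratic form: residuals rho = Q - g(S,Y) - W'gamma are
    # averaged against the instrument basis and weighted by Sigma.
    rho = Q - p_basis(S, Y) @ beta - W @ gamma
    m = q_basis(S, Y).T @ rho / n
    Sigma = np.eye(len(m)) if Sigma is None else Sigma
    return float(np.sqrt(m @ Sigma @ m))

val = Qn(np.array([1.0, -1.0, 0.1]), np.array([0.5, -0.5]))
assert val >= 0 and np.isfinite(val)
```

Minimizing √n Qn over coefficients satisfying (and not satisfying) the constraints would then give In(R) and In(Θ).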
The next assumptions suffice for obtaining a strong approximation. In their statement, the notation sing{A} denotes the smallest singular value of a matrix A.

Assumption S.6.7. (i) {Xi, Zi}_{i=1}^n is i.i.d. with (X,Z) distributed according to P ∈ P; (ii) For Θ as in (S.154) and each P ∈ P0 there exists a unique θ0(P) ∈ Θ satisfying (S.148); (iii) The support of (Q,W) is bounded uniformly in P ∈ P.

Assumption S.6.8. (i) sup_{(s,y)} ‖pn^{jn}(s,y)‖2 ≲ √jn; (ii) sup_{(s,y)} ‖∂_a pn^{jn}(s,y)‖2 ≲ jn^{3/2} for a ∈ {s,y}; (iii) The eigenvalues of EP[pn^{jn}(S,Y)pn^{jn}(S,Y)′] are bounded away from zero and infinity uniformly in P ∈ P and n; (iv) For each P ∈ P0 there is a Πnθ0(P) = (gn,γ) ∈ Θn ∩ R with sup_{P∈P0} ‖EP[(gP(S,Y) − gn(S,Y))qn^{kn}(Z)]‖2 = O((n log(n))^{-1/2}).

Assumption S.6.9. (i) max_{1≤k≤kn} ‖q_{k,n}‖∞ ≲ √kn; (ii) The largest eigenvalues of EP[qn^{kn}(Z)qn^{kn}(Z)′] are bounded uniformly in P ∈ P and n; (iii) For every n, sn ≡ inf_{P∈P} sing{EP[qn^{kn}(Z)(pn^{jn}(S,Y)′, W′)]} is positive and sn = O(1); (iv) jn^3 kn^3 log^3(n) = o(n) and kn^2 jn log^{3/2}(1+kn)/(sn√n)(1 ∨ √log(sn√n/kn)) = o((log(n))^{-1/2}).

Assumption S.6.10. (i) ‖Σ̂n − Σn(P)‖o,2 = OP((kn√jn log(n))^{-1}) uniformly in P ∈ P; (ii) Σn(P) is invertible and ‖Σn(P)‖o,2 and ‖Σn(P)^{-1}‖o,2 are bounded uniformly in P ∈ P and n.
Assumption S.6.7(iii) requires (Q,W) to be bounded, which enables us to apply the recent coupling results of Zhai (2018). Alternatively, Assumption S.6.7(iii) can be relaxed under appropriate tail conditions; see, e.g., the proof of Lemma S.6.4 for related arguments. Assumptions S.6.8(i)-(iii) are standard requirements on Θn that are satisfied, e.g., by tensor product wavelets or B-splines under appropriate conditions on the distribution of (S,Y) (Newey, 1997; Chen, 2007; Belloni et al., 2015; Chen and Christensen, 2018). Assumption S.6.8(iv) pertains to the approximation requirements on the sieve; see Remarks S.6.1 and S.6.2 below. Assumption S.6.8(iv) can be relaxed for studying the re-centered statistic, but we impose it here for simplicity to analyze both In(R) and In(R) − In(Θ). In turn, Assumptions S.6.9(i)(ii) impose standard requirements on the transformations {q_{k,n}}_{k=1}^{kn}. Assumptions S.6.9(iii)(iv) contain the required rate conditions, which are governed by sn – a parameter that is proportional to νn^{-1} (as in Assumption 3.6) and is closely linked to the degree of ill-posedness; see Remark S.6.2 below. Finally, Assumption S.6.10 states the conditions on the weighting matrix Σ̂n.
In this application, we may set ‖θ‖E = sup_{P∈P} ‖g‖P,2 + ‖γ‖2 for any (g,γ) ∈ Θ. Since in addition any θ = (g,γ) ∈ Θn ∩ R has the structure g = pn^{jn}′β, we have

  Vn(θ,ℓ) = {(pn^{jn}′βh, γh) : ‖g + pn^{jn}′βh/√n‖1,∞ ≤ C0 and ‖γ + γh/√n‖2 ≤ C0,   (S.155)
    pn^{jn}(s0,y0)′βh = 0,   (S.156)
    ∂/∂s (g + pn^{jn}′βh/√n) + (g + pn^{jn}′βh/√n) ∂/∂y (g + pn^{jn}′βh/√n) ≤ 0,   (S.157)
    sup_{P∈P} ‖pn^{jn}′βh‖P,2 + ‖γh‖2 ≤ ℓ√n},   (S.158)

where constraint (S.155) corresponds to (θ + h/√n) ∈ Θn, constraints (S.156) and (S.157) impose θ + h/√n ∈ R, and constraint (S.158) imposes ‖h/√n‖E ≤ ℓ. Similarly,

  Vn^u(θ,ℓ) = {(pn^{jn}′βh, γh) : ‖g + pn^{jn}′βh/√n‖1,∞ ≤ C0 and ‖γ + γh/√n‖2 ≤ C0,   (S.159)
    sup_{P∈P} ‖pn^{jn}′βh‖P,2 + ‖γh‖2 ≤ ℓ√n}.   (S.160)
Finally, we also note that in this application Dn,P(θ) does not depend on θ due to the moments being linear. Therefore, we suppress θ from the notation and note

  Dn,P[h] = −EP[qn^{kn}(Z)(pn^{jn}(S,Y)′βh + W′γh)]

for any h = (pn^{jn}′βh, γh) ∈ Bn. Given these definitions, recall for any ℓn we have

  Un,P(R|ℓn) ≡ inf_{h/√n ∈ Vn(Πnθ0(P),ℓn)} ‖Wn,Pρ(·,Πnθ0(P))qn^{kn} + Dn,P[h]‖Σn(P),2
  Un,P(Θ|ℓn) ≡ inf_{h/√n ∈ Vn^u(Πnθ0(P),ℓn)} ‖Wn,Pρ(·,Πnθ0(P))qn^{kn} + Dn,P[h]‖Σn(P),2.
Theorem 3.2(ii) immediately yields the following distributional approximations.
Theorem S.6.3. Let Assumptions S.6.7, S.6.8, S.6.9, and S.6.10 hold, and let an = (log(n))^{-1/2}. Then: for any ℓn, ℓn^u ↓ 0 satisfying kn√jn log(1+kn)(ℓn ∨ ℓn^u)√log(√jn/(ℓn ∨ ℓn^u)) = o(an) and kn√jn log(1+kn)/(sn√n) = o(ℓn ∧ ℓn^u), it follows uniformly in P ∈ P0 that

  In(R) = Un,P(R|ℓn) + oP(an)
  In(R) − In(Θ) = Un,P(R|ℓn) − Un,P(Θ|ℓn^u) + oP(an).
Theorem S.6.3 yields a family of strong approximations to both In(R) and the re-centered statistic In(R) − In(Θ). Thus, our next goal is to obtain a bootstrap estimate of the distribution of any of these strong approximations. To this end, we let

  θ̂n ∈ arg min_{θ∈Θn∩R} Qn(θ)   θ̂n^u ∈ arg min_{θ∈Θn} Qn(θ)

and, for ρ(·,θ) as in (S.150), we approximate the law of Wn,Pρ(·,Πnθ0(P)) by evaluating

  Ŵnρ(·,θ)qn^{kn} ≡ n^{-1/2} Σ_{i=1}^n ωi {qn^{kn}(Zi)ρ(Xi,θ) − n^{-1} Σ_{j=1}^n qn^{kn}(Zj)ρ(Xj,θ)}

at θ = θ̂n and θ = θ̂n^u, where {ωi}_{i=1}^n is an i.i.d. sample of standard normal random variables that are independent of the data. Furthermore, since ρ(X,θ) is linear in θ, we do not need to employ a numerical derivative estimator. Instead, we simply set

  D̂n[h] = −n^{-1} Σ_{i=1}^n qn^{kn}(Zi)(Wi′γh + pn^{jn}(Si,Yi)′βh)

for any h = (pn^{jn}′βh, γh) ∈ Bn as our estimator for Dn,P[h].
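For illustration, a single multiplier draw of Ŵnρ(·,θ̂n)qn^{kn} is simple to simulate once residuals are formed. In the sketch below the residuals ρ(Xi, θ̂n) and the instrument transformations qn^{kn}(Zi) are synthetic placeholders rather than estimates from the model:

```python
import numpy as np

rng = np.random.default_rng(0)
n, kn = 400, 5
qZ = rng.uniform(-1, 1, (n, kn))     # stand-in for q_n^{kn}(Z_i)
rho = rng.standard_normal(n)         # stand-in for the residuals rho(X_i, theta_hat_n)

def multiplier_draw(qZ, rho, rng):
    # (1/sqrt(n)) sum_i omega_i { q(Z_i) rho_i - (1/n) sum_j q(Z_j) rho_j },
    # with omega_i i.i.d. N(0,1) drawn independently of the data.
    m = qZ * rho[:, None]
    centered = m - m.mean(axis=0)
    omega = rng.standard_normal(len(rho))
    return (omega[:, None] * centered).sum(axis=0) / np.sqrt(len(rho))

draw = multiplier_draw(qZ, rho, rng)
assert draw.shape == (kn,) and np.all(np.isfinite(draw))
```

Repeating the draw many times, conditional on the data, yields the bootstrap distribution used below.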
With regards to the local parameter space, we note that in this application Assumptions 3.11(i)(ii) are satisfied with Kg = 2 (see Lemma S.6.11). Therefore, we have

  Ĝn(θ̂n) = {h/√n ∈ Bn : ∂/∂s pn^{jn}(s,y)′(β̂n + βh/√n) + pn^{jn}(s,y)′(β̂n + βh/√n) ∂/∂y pn^{jn}(s,y)′(β̂n + βh/√n)
    ≤ max{∂/∂s pn^{jn}(s,y)′β̂n + pn^{jn}(s,y)′β̂n ∂/∂y pn^{jn}(s,y)′β̂n − 2rn‖pn^{jn}′βh/√n‖1,∞, −rn}}.   (S.161)

Moreover, in this problem it is possible to set ℓn to be infinite due to ρ(X,·) and ΥF being linear. Hence, as a sample analogue to the local parameter space we employ

  V̂n(θ̂n,+∞) = {h/√n ∈ Bn : h/√n ∈ Ĝn(θ̂n) and pn^{jn}(s0,y0)′βh = 0}.

Given the introduced notation, we define the statistics Un(R|+∞) and Un(Θ|+∞) by

  Un(R|+∞) ≡ inf_{h/√n ∈ V̂n(θ̂n,+∞)} ‖Ŵnρ(·,θ̂n) + D̂n[h]‖Σ̂n,2
  Un(Θ|+∞) ≡ inf_{h/√n ∈ Bn} ‖Ŵnρ(·,θ̂n^u) + D̂n[h]‖Σ̂n,2.
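For intuition on computing these statistics: with no constraints on h and an identity weighting matrix, the infimum defining the unrestricted statistic is an ordinary linear least-squares problem in the coefficients of h. A minimal numerical sketch, in which all inputs are synthetic placeholders rather than quantities from the model, is:

```python
import numpy as np

rng = np.random.default_rng(1)
kn, dim_h = 6, 3
w = rng.standard_normal(kn)           # stand-in for the multiplier draw
D = rng.standard_normal((kn, dim_h))  # stand-in for the linear map h -> D_n[h]

# inf_h ||w + D h||_2 is attained at the least-squares solution of D h = -w.
h_star, *_ = np.linalg.lstsq(D, -w, rcond=None)
u_n = np.linalg.norm(w + D @ h_star)

# The minimized value can be no larger than the value at h = 0.
assert u_n <= np.linalg.norm(w) + 1e-12
```

With the shape and equality constraints of V̂n(θ̂n,+∞) added, the problem instead becomes a linearly constrained convex program.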
In order to establish the validity of the bootstrap, we impose one final assumption.

Assumption S.6.11. (i) There is an ε > 0 such that ‖gP‖1,∞ ∨ ‖γP‖2 ≤ C0 − ε for all P ∈ P0; (ii) Πnθ0(P) = (gn,γ) ∈ Θn ∩ R satisfies ‖gn − gP‖1,∞ = o(1) uniformly in P ∈ P0; (iii) The sequence rn ↓ 0 satisfies kn jn^2 √log(1+kn)/(sn√n) = o(rn/√log(n)); (iv) kn jn^{3/4}(En ∨ √log(kn)) log^{1/4}(1+kn) = o(n^{1/4}/√log(n)), where En ≡ ∫_0^∞ √log N(ε, Cn, ‖·‖2) dε and Cn ≡ {β : ‖pn^{jn}′β‖1,∞ ≤ C0}.
Assumptions S.6.11(i)(ii) suffice for verifying that Assumption 3.15(ii) is satisfied. These requirements may be dropped at the expense of modifying the estimators of the local parameter space to reflect the possible impact of Πnθ0(P) being "near" the boundary of Θn. In turn, Assumption S.6.11(iii) imposes the rate conditions on rn. Finally, Assumption S.6.11(iv) controls the "size" of the set of coefficients β corresponding to elements pn^{jn}′β ∈ Θn and suffices for verifying the bootstrap coupling requirement of Assumption 3.14. For instance, for tensor product B-splines, En ≍ jn^{1/4} (see Lemma S.6.14), which implies that a sufficient condition for Assumption S.6.11(iv) is that kn^4 jn^4 log^4(kn) = o(n/log^2(n)). The rate requirements for a bootstrap coupling can be weakened, for instance, if the test statistic is based on the ‖·‖∞ norm of the moments (see Lemma S.6.10) or under additional smoothness assumptions (see Theorem S.9.1(ii)).
Our next result characterizes the properties of the proposed bootstrap statistics.

Theorem S.6.4. Let Assumptions S.6.7, S.6.8, S.6.9, S.6.10, and S.6.11 hold, and let an = (log(n))^{-1/2}. Then: for any ℓn, ℓn^u ↓ 0 satisfying kn jn^2 log(1+kn)/(sn√n) = o(ℓn ∧ ℓn^u), ℓn = o(rn), and kn√jn log(1+kn)(ℓn ∨ ℓn^u)√log(√jn/(ℓn ∨ ℓn^u)) = o(an), it follows uniformly in P ∈ P0 that

  Un(R|+∞) ≥ U*n,P(R|ℓn) + oP(an)
  Un(R|+∞) − Un(Θ|+∞) ≥ U*n,P(R|ℓn) − U*n,P(Θ|ℓn^u) + oP(an).

We note that the existence of sequences ℓn, ℓn^u satisfying the requirements of the theorem is implied by Assumptions S.6.9(iv) and S.6.11(iii). Importantly, any sequences ℓn, ℓn^u satisfying the requirements of Theorem S.6.4 also satisfy the requirements of Theorem S.6.3. As a result, together Theorems S.6.3 and S.6.4 justify employing

  q̂1−α(Un(R|+∞)) ≡ inf{c : P(Un(R|+∞) ≤ c | {Vi}_{i=1}^n) ≥ 1 − α}

as a critical value for In(R). Similarly, for the statistic In(R) − In(Θ) we may employ

  q̂1−α(Un(R|+∞) − Un(Θ|+∞)) ≡ inf{c : P(Un(R|+∞) − Un(Θ|+∞) ≤ c | {Vi}_{i=1}^n) ≥ 1 − α}.
Remark S.6.1. Suppose for notational simplicity that there are no covariates W and let the marginal distribution of (S,Y,Z) be constant in P ∈ P. If Z = (S,Y) (i.e. (S,Y) is exogenous), we may set qn^{kn}(Z) = pn^{kn}(S,Y) for some kn ≥ jn. The singular value sn can then be assumed to be bounded away from zero, and a sufficient condition for Assumption S.6.9(iv) is that kn^3 jn^3 log^3(n) = o(n). In order to appreciate the content of Assumption S.6.8(iv), suppose {pj}_{j=1}^∞ is an orthonormal basis such that

  gP = Σ_{j=1}^∞ βj pj with |βj| = O(j^{-γβ}).

Setting Πn^u gP = Σ_{j=1}^{jn} pj βj, we obtain from a standard integral bound for a sum that

  ‖EP[(gP(S,Y) − Πn^u gP(S,Y))qn^{kn}(Z)]‖2^2 ≲ Σ_{j=jn+1}^{kn} 1/j^{2γβ} ≲ 1/jn^{2γβ−1} − 1/kn^{2γβ−1}.   (S.162)

For instance, if kn − jn = O(1), then (S.162) yields a bound of order 1/jn^{γβ} for the norm. Hence, provided the approximation errors of Πn^u gP and gn (as in Assumption S.6.8(iv)) are of the same order when gP ∈ R, we obtain that Assumption S.6.8(iv) is equivalent to √(n log(n))/jn^{γβ} = o(1) when kn − jn = O(1). This approximation requirement is compatible with the condition kn^3 jn^3 log^3(n) = o(n) provided γβ > 3.
Remark S.6.2. Building on Remark S.6.1, suppose again there are no covariates W and the marginal distribution of (S,Y,Z) is constant in P ∈ P, but now let (S,Y) be endogenous. A standard benchmark for nonparametric models with endogeneity is to assume the operator g ↦ EP[g(X)|Z] is compact, in which case there are orthonormal sequences of functions {φj}_{j=1}^∞ of (S,Y) and {ψj}_{j=1}^∞ of Z satisfying

  EP[φj(S,Y)|Z] = λj ψj(Z)   EP[ψj(Z)|S,Y] = λj φj(S,Y),

where λj > 0 tends to zero. In addition suppose gP admits an expansion satisfying

  gP = Σ_{j=1}^∞ βj φj with |βj| = O(j^{-γβ}),

and let pn^{jn} = (φ1, …, φ_{jn})′ and qn^{kn} = (ψ1, …, ψ_{kn})′ with kn ≥ jn and kn − jn = O(1), and set Πn^u gP = Σ_{j=1}^{jn} φj βj. Provided the approximation errors of Πn^u gP and gn (as in Assumption S.6.8(iv)) are of the same order when gP ∈ R, we then obtain

  ‖EP[(gP(S,Y) − gn(S,Y))qn^{kn}(Z)]‖2 ≲ λ_{jn}/jn^{γβ}.

Moreover, direct calculation shows sn, which is proportional to νn^{-1} as in Assumption 3.6, satisfies sn = λ_{jn} and hence equals the reciprocal of the sieve measure of ill-posedness (Blundell et al., 2007). It follows that if λj ≍ j^{-γλ} and γβ > 3, then Assumptions S.6.8(iv) and S.6.9(iv) can be satisfied by setting jn ≍ n^κ with (γλ + γβ)^{-1} < 2κ < (3 + γλ)^{-1} and kn − jn = O(1). Alternatively, if λj = exp{−γλ j}, then Assumptions S.6.8(iv) and S.6.9(iv) can be satisfied when γβ > 4 by setting, for example, jn = (log(n) − κ log(log(n)))/(2γλ) with 7 < κ < 2γβ − 1 and kn − jn = O(1).
S.6.2.1 Proofs and Auxiliary Results
Proof of Theorem S.6.3: We establish the theorem by simply verifying the conditions of Theorem 3.2(ii) both for R as corresponding to (S.152) and (S.153) (to couple In(R)) and for R = Θ (to couple In(Θ)). To this end, note Assumption 3.2 is imposed in Assumption S.6.7(i), Assumption 3.3(i) holds with Bn ≍ √kn by Assumption S.6.9(i), and Assumption 3.3(ii) is satisfied by Assumption S.6.9(ii). Further note that for R = Θ, the class Fn has bounded envelope Fn by Assumption S.6.7(iii) and ‖g‖∞ ≤ C0 for any (g,γ) ∈ Θ. Hence, by Lemma S.6.8 it follows that Assumption 3.3(iii) holds with Jn ≍ √jn log(1+jn) when R = Θ, and therefore also for R as corresponding to (S.152) and (S.153). Assumption 3.4 is implied by Assumption S.6.10.

Lemma S.6.7 additionally verifies that Assumptions 3.5 and 3.6 hold with ‖·‖E ≡ sup_{P∈P} ‖·‖P,2 + ‖·‖2, Vn(P) = Θn ∩ R, and νn^{-1} ≍ sn for both R = Θ and R as corresponding to (S.152) and (S.153). Similarly, we note Lemma S.6.9 and jn^3 kn^3 log^3(n)/n = o(1) by Assumption S.6.9(iv) imply Assumption 3.7 for both specifications of R under consideration and any an satisfying an = O((log(n))^{-1}). To verify Assumption 3.8, we observe that for any (g1,γ1) ∈ Θn and (g2,γ2) ∈ Θn, Assumption S.6.7(iii) implies

  EP[((Q − g1(S,Y) − W′γ1) − (Q − g2(S,Y) − W′γ2))^2] ≲ sup_{P∈P} ‖g1 − g2‖P,2^2 + ‖γ1 − γ2‖2^2.   (S.163)

Hence, since ‖·‖E ≡ sup_{P∈P} ‖·‖P,2 + ‖·‖2, it follows Assumption 3.8 holds with κρ = 1 and some Kρ < ∞. Similarly, note in this application we have

  ∇mP(θ)[h] = −EP[gh(S,Y) + W′γh | Z]

for any θ ∈ B and (gh,γh) = h ∈ B. By direct calculation it then follows Assumptions 3.9(i)(ii) hold with Km = 0 for ‖·‖L = ‖·‖E, and Assumption 3.9(iii) is satisfied for some Mm < ∞ by result (S.163) and Jensen's inequality. Finally, note that since, as argued, νn ≍ 1/sn, p = 2, Jn ≍ √jn log(1+jn), and Bn ≍ √kn, it follows Rn ≍ kn√jn log(1+kn)/(sn√n) due to jn ≤ kn by Assumption S.6.9(iii). Therefore, since κρ = 1, Lemma S.6.8 implies Assumption 3.10(i) demands kn√jn log(1+kn)Rn(1 + √log(1 ∨ (√jn/Rn))) = o(an), which is satisfied with an = 1/√log(n) by Assumption S.6.9(iv). Hence, since Assumptions S.6.8(iv) and S.6.10 imply Assumptions 3.10(ii) and 3.10(iii) respectively, it follows that Assumption 3.10 holds as well. To conclude, note that since Km = 0, the condition Km Rn^2 Sn(L,E) = o(an n^{-1/2}) is automatically satisfied, the requirement Rn = o(ℓn) is equivalent to kn√jn log(1+kn)/(sn√n) = o(ℓn), and the condition kn^{1/p} √log(1+kn) Bn × sup_{P∈P} J[](ℓn^{κρ}, Fn, ‖·‖P,2) = o(an) is implied by kn√jn log(1+kn) ℓn √log(√jn/ℓn) = o((log(n))^{-1/2}). Thus, all the conditions of Theorem 3.2(ii) hold for both R = Θ and R corresponding to (S.152) and (S.153), and thus the claim of the theorem follows.
Proof of Theorem S.6.4: We first define $\mathcal{E}_n(R|\ell_n)$ and $\mathcal{E}_n(\Theta|\ell_n^u)$ to be given by
$$\mathcal{E}_n(R|\ell_n) \equiv \inf_{\frac{h}{\sqrt{n}}\in \hat{V}_n(\hat\theta_n,\ell_n)}\|\mathbb{W}_n\rho(\cdot,\hat\theta_n)+\hat{\mathbb{D}}_n[h]\|_{\hat\Sigma_n,2} \qquad \mathcal{E}_n(\Theta|\ell_n^u) \equiv \inf_{\|\frac{h}{\sqrt{n}}\|_2\le\ell_n^u}\|\mathbb{W}_n\rho(\cdot,\hat\theta_n^u)+\hat{\mathbb{D}}_n[h]\|_{\hat\Sigma_n,2},$$
and note that for any sequences $\ell_n$, $\ell_n^u$ satisfying the conditions of the theorem, Lemma S.6.12 implies $\hat{U}_n(R|{+}\infty) = \mathcal{E}_n(R|\ell_n) + o_P(a_n)$ and $\hat{U}_n(\Theta|{+}\infty) = \mathcal{E}_n(\Theta|\ell_n^u) + o_P(a_n)$ uniformly in $P \in \mathbf{P}_0$. To establish the theorem, it therefore suffices to show
$$\mathcal{E}_n(R|\ell_n) \ge U^\star_{n,P}(R|\tilde\ell_n) + o_P(a_n)$$
$$\mathcal{E}_n(R|\ell_n) - \mathcal{E}_n(\Theta|\ell_n^u) \ge U^\star_{n,P}(R|\tilde\ell_n) - U^\star_{n,P}(\Theta|\tilde\ell_n^u) + o_P(a_n)$$
with $\ell_n \asymp \tilde\ell_n$ and $\ell_n^u \asymp \tilde\ell_n^u$ uniformly in $P \in \mathbf{P}_0$. To this end we rely on Theorem 3.3 (for $\mathcal{E}_n(R|\ell_n)$) and Corollary 3.3 (for $\mathcal{E}_n(R|\ell_n) - \mathcal{E}_n(\Theta|\ell_n^u)$). We note that in the proof of Theorem S.6.3 we established that Assumptions S.6.7, S.6.8, S.6.9, and S.6.10 imply that Assumptions 3.1, 3.2, 3.3, 3.4, 3.5, 3.6, 3.7, 3.8, 3.9, and 3.10 hold with $R_n \asymp k_n\sqrt{j_n\log(1+k_n)}/s_n\sqrt{n}$, $B_n \asymp \sqrt{k_n}$, $\nu_n \asymp 1/s_n$, $\|\theta\|_{\mathbf{B}} = \|g\|_{1,\infty}\vee\|\gamma\|_2$ for any $\theta = (g,\gamma)\in\mathbf{B}_n$, $\|\theta\|_{\mathbf{L}} = \|\theta\|_{\mathbf{E}} = \sup_{P\in\mathbf{P}}\|g\|_{P,2}+\|\gamma\|_2$ for any $\theta = (g,\gamma)\in\mathbf{B}_n$, $\kappa_\rho = 1$, and $a_n = 1/\sqrt{\log(n)}$ for $R = \Theta$ and $R$ as corresponding to (S.152) and (S.153). We thus only verify Assumptions 3.11, 3.12, 3.13, 3.14, 3.15, and 3.16 for $R = \Theta$ and $R$ as corresponding to (S.152) and (S.153).
Next note that Lemma S.6.11 implies Assumptions 3.11, 3.12, and 3.13 are satisfied with $K_g = 2$ and $K_f = 0$, while Lemma S.6.10 and Assumption S.6.11(iv) imply Assumption 3.14 holds with $R = \Theta$ (and hence for $R$ corresponding to (S.152) and (S.153)) with $a_n = (\log(n))^{-1/2}$. Further note that since $\sup_{P\in\mathbf{P}}\|g\|_{P,2}+\|\gamma\|_2 \le 2(\|g\|_{1,\infty}\vee\|\gamma\|_2)$ for any $(g,\gamma)\in C^1_B(\Omega)\times\mathbf{R}^{d_w}$, it follows that Assumption 3.15(i) holds with $K_b = 2$. To verify Assumptions 3.15(ii) and 3.16, note that by the definitions of $\|\cdot\|_{\mathbf{B}}$ and $\|\cdot\|_{\mathbf{E}}$ in this application and the eigenvalues of $E_P[p_n^{j_n}(S,Y)p_n^{j_n}(S,Y)']$ being bounded away from zero uniformly in $P\in\mathbf{P}$ by Assumption S.6.8(iii) we obtain
$$\mathcal{S}_n(\mathbf{B},\mathbf{E}) = \sup_{(\beta,\gamma)}\frac{\|p_n^{j_n\prime}\beta\|_{1,\infty}\vee\|\gamma\|_2}{\sup_{P\in\mathbf{P}}\|p_n^{j_n\prime}\beta\|_{P,2}+\|\gamma\|_2} \le 1\vee\sup_\beta\frac{\|p_n^{j_n\prime}\beta\|_{1,\infty}}{\sup_{P\in\mathbf{P}}\|p_n^{j_n\prime}\beta\|_{P,2}} \lesssim 1\vee\sup_\beta\frac{\|p_n^{j_n\prime}\beta\|_{1,\infty}}{\|\beta\|_2} \lesssim j_n^{3/2} \quad (S.164)$$
where the final inequality follows from Assumptions S.6.8(i)(ii). Thus, since setting $\hat\theta_n$ and $\hat\theta_n^u$ to be the minimizers of $Q_n$ corresponds to setting $\tau_n = 0$, Assumption 3.16(i) holds provided $R_n\mathcal{S}_n(\mathbf{B},\mathbf{E}) = o(1)$, which is satisfied by Assumption S.6.9(iv) and $j_n\le k_n$ due to Assumption S.6.9(iii). Furthermore, also note that $R_n\mathcal{S}_n(\mathbf{B},\mathbf{E}) = o(1)$ together with Assumptions S.6.11(i)(ii) and setting $\Pi_n^u\theta_0(P) = \Pi_n^r\theta_0(P) = \Pi_n\theta_0(P)$ for all $P\in\mathbf{P}_0$ imply Assumption 3.15(ii) is satisfied as well. Moreover, since $B_n \asymp \sqrt{k_n}$ and $K_f = K_m = 0$, Lemma S.6.8 implies Assumption 3.16(ii) holds for any $\ell_n$ satisfying $k_n\sqrt{j_n\log(1+k_n)}\,\ell_n(1+\sqrt{\log(\sqrt{j_n}/\ell_n)}) = o(a_n)$. Similarly, we obtain that Assumption 3.16(iii) is satisfied provided $\ell_n = o(r_n)$ (imposed in the theorem) and $k_nj_n^2\sqrt{\log(1+k_n)}/s_n\sqrt{n} = o(r_n)$ (implied by Assumption S.6.11(iii)). We have thus verified the conditions of Theorem 3.3 and Corollary 3.3, which together establish the claim of the theorem.
Lemma S.6.7. If Assumptions S.6.7(ii)(iii), S.6.8, S.6.9(i)(iii)(iv), and S.6.10 hold, then Assumptions 3.5 and 3.6 are satisfied with both $R = \Theta$ and $R$ corresponding to (S.152) and (S.153), $\|\cdot\|_{\mathbf{E}} = \sup_{P\in\mathbf{P}}\|\cdot\|_{P,2}+\|\cdot\|_2$, $V_n(P) = \Theta_n\cap R$, and $\nu_n^{-1} \asymp s_n$.
Proof: Note that by Assumption S.6.7(ii) there is a unique $\theta_0(P) \equiv (g_P,\gamma_P)\in\Theta\cap R$ for which (S.148) holds, and we let $\Pi_n^u\theta_0(P) = \Pi_n^r\theta_0(P) = (g_n,\gamma_P)\in\Theta_n\cap R$ for $g_n = p_n^{j_n\prime}\beta_n$ as in Assumption S.6.8(iv). Further note that in this application $Q_{n,P}(\theta)$ equals
$$Q_{n,P}(\theta) = \|E_P[(Q-g(S,Y)-W'\gamma)q_n^{k_n}(Z)]\|_{\Sigma_n(P),2}$$
for any $(g,\gamma) = \theta$. Therefore, the definition of $\Pi_n^r\theta_0(P)$, $\theta_0(P)$ satisfying (S.148), and $\|\Sigma_n(P)\|_{o,2}$ being bounded uniformly in $n$ and $P\in\mathbf{P}$ by Assumption S.6.10(ii) imply
$$Q_{n,P}(\Pi_n^r\theta_0(P)) = \|E_P[(g_P(S,Y)-g_n(S,Y))q_n^{k_n}(Z)]\|_{\Sigma_n(P),2} = O\Big(\frac{1}{\sqrt{n}\log(n)}\Big) \quad (S.165)$$
where the final result holds by Assumption S.6.8(iv). Also note that since $\mathcal{F}_n$ is uniformly bounded by definition of $\Theta$ and $(Q,W)$ having compact support by Assumption S.6.7(iii), Lemma S.6.8 implies $J_n \lesssim \sqrt{j_n\log(1+j_n)}$. Hence, since we may set $B_n \asymp \sqrt{k_n}$ by Assumption S.6.9(i) and $k_n\ge j_n$ by Assumption S.6.9(iii), $\eta_n$ (in the statement of Assumption 3.6) is bounded by $k_n\sqrt{j_n\log(1+k_n)}/\sqrt{n}$. Thus, since $Q_{n,P}(\theta)\ge 0$ for all $\theta\in\mathbf{B}$, it follows from result (S.165) that Assumption 3.5 is satisfied.
In order to verify Assumption 3.6(i) we set $\|\theta\|_{\mathbf{E}} \equiv \sup_{P\in\mathbf{P}}\|g\|_{P,2}+\|\gamma\|_2$ for any $(g,\gamma) = \theta\in\mathbf{B}$. Since the eigenvalues of $E_P[p_n^{j_n}(S,Y)p_n^{j_n}(S,Y)']$ are bounded uniformly in $n$ and $P\in\mathbf{P}$ by Assumption S.6.8(iii), it follows for any $(p_n^{j_n\prime}\beta,\gamma)\in\mathbf{B}_n$ that
$$s_n\|\theta-\Pi_n^r\theta_0(P)\|_{\mathbf{E}} \lesssim s_n\{\|\beta-\beta_n\|_2+\|\gamma-\gamma_P\|_2\} \lesssim \|E_P[(p_n^{j_n}(S,Y)'(\beta-\beta_n)+W'(\gamma-\gamma_P))q_n^{k_n}(Z)]\|_{\Sigma_n(P),2} \le Q_{n,P}(\theta)-Q_{n,P}(\Pi_n^r\theta_0(P))+O\Big(\frac{1}{\sqrt{n}\log(n)}\Big) \quad (S.166)$$
where the second inequality holds due to $\|\Sigma_n(P)^{-1}\|_{o,2}$ being bounded uniformly in $n$ and $P\in\mathbf{P}$ by Assumption S.6.10(ii), while the final inequality follows from (S.165) and the triangle inequality. Thus, we conclude from (S.166) that Assumption 3.6(i) holds with $\nu_n^{-1} \asymp s_n$ and $V_n(P) = \Theta_n\cap R$. Finally, note Assumption 3.6(ii) holds by result (S.165) and Assumption S.6.10(i), and hence the lemma follows.
Lemma S.6.8. Define the class $\mathcal{F}_n \equiv \{f : f(v) = (q-g(s,y)-w'\gamma)$ for some $(g,\gamma)\in\Theta_n\}$ and suppose that Assumptions S.6.7(iii) and S.6.8(i)(iii) hold. Then, it follows that $\sup_{P\in\mathbf{P}}N_{[\,]}(\varepsilon,\mathcal{F}_n,\|\cdot\|_{P,2}) \lesssim (1\vee(\sqrt{j_n}K/\varepsilon))^{j_n+d_w}$ for some $K<\infty$, and in addition $\sup_{P\in\mathbf{P}}J_{[\,]}(\varepsilon,\mathcal{F}_n,\|\cdot\|_{P,2}) \lesssim \varepsilon\sqrt{j_n}(1+\sqrt{\log(1\vee(\sqrt{j_n}/\varepsilon))})$.
Proof: Define the classes $\mathcal{F}_{1n} \equiv \{f : f(v) = q-w'\gamma$ with $\|\gamma\|_2\le C_0\}$ and $\mathcal{F}_{2n} \equiv \{p_n^{j_n\prime}\beta : \|p_n^{j_n\prime}\beta\|_{1,\infty}\le C_0\}$, and then note that by definition of $\mathcal{F}_n$ we have
$$\sup_{P\in\mathbf{P}}N_{[\,]}(\varepsilon,\mathcal{F}_n,\|\cdot\|_{P,2}) \le \sup_{P\in\mathbf{P}}N_{[\,]}(\tfrac{\varepsilon}{2},\mathcal{F}_{1n},\|\cdot\|_{P,2})\times\sup_{P\in\mathbf{P}}N_{[\,]}(\tfrac{\varepsilon}{2},\mathcal{F}_{2n},\|\cdot\|_{P,2}). \quad (S.167)$$
Next observe that since the support of $W$ is bounded uniformly in $P\in\mathbf{P}$ by Assumption S.6.7(iii), the Cauchy-Schwarz inequality, the covering numbers of $\{\gamma\in\mathbf{R}^{d_w} : \|\gamma\|_2\le C_0\}$ being bounded (up to a multiplicative constant) by $1\vee\varepsilon^{-d_w}$, and Theorem 2.7.11 in van der Vaart and Wellner (1996) imply that
$$\sup_{P\in\mathbf{P}}N_{[\,]}(\tfrac{\varepsilon}{2},\mathcal{F}_{1n},\|\cdot\|_{P,2}) \lesssim 1\vee\varepsilon^{-d_w}. \quad (S.168)$$
Similarly, for any elements $p_n^{j_n\prime}\beta_1, p_n^{j_n\prime}\beta_2\in\mathcal{F}_{2n}$, it follows from the Cauchy-Schwarz inequality that
$$|p_n^{j_n}(s,y)'\beta_1 - p_n^{j_n}(s,y)'\beta_2| \le \sup_{(s,y)}\|p_n^{j_n}(s,y)\|_2\|\beta_1-\beta_2\|_2 \lesssim \sqrt{j_n}\|\beta_1-\beta_2\|_2,$$
where in the final inequality we employed Assumption S.6.8(i). Hence, Theorem 2.7.11 in van der Vaart and Wellner (1996), $\|\beta\|_2 \asymp \|p_n^{j_n\prime}\beta\|_{P,2}$ uniformly in $P\in\mathbf{P}$ by Assumption S.6.8(ii), and $\|p_n^{j_n\prime}\beta\|_{P,2}\le\|p_n^{j_n\prime}\beta\|_\infty\le C_0$ for any $p_n^{j_n\prime}\beta\in\Theta_n$ imply that
$$\sup_{P\in\mathbf{P}}N_{[\,]}(\tfrac{\varepsilon}{2},\mathcal{F}_{2n},\|\cdot\|_{P,2}) \lesssim 1\vee\Big(\frac{K\sqrt{j_n}}{\varepsilon}\Big)^{j_n} \quad (S.169)$$
for some $K<\infty$. Thus, the first claim of the lemma follows from results (S.167), (S.168), and (S.169). For the second claim of the lemma, we employ the first claim of the lemma and the change of variables $v = u/\varepsilon$ to obtain the bound
$$\sup_{P\in\mathbf{P}}J_{[\,]}(\varepsilon,\mathcal{F}_n,\|\cdot\|_{P,2}) \lesssim \varepsilon+\int_0^\varepsilon\Big(\log\Big(1\vee\Big(\frac{K\sqrt{j_n}}{u}\Big)^{j_n+d_w}\Big)\Big)^{1/2}du = \varepsilon\Big(1+\sqrt{j_n+d_w}\int_0^1\Big(\log\Big(1\vee\Big(\frac{K\sqrt{j_n}}{v\varepsilon}\Big)\Big)\Big)^{1/2}dv\Big) \lesssim \sqrt{j_n}\,\varepsilon\Big(1+\sqrt{\log(1\vee(\sqrt{j_n}/\varepsilon))}\Big),$$
where we used that $(1\vee ab)\le(1\vee a)(1\vee b)$ whenever $a$ and $b$ are positive.
Lemma S.6.9. If $k_n^3j_n^3\log^2(n) = o(n)$ and Assumptions S.6.7(i)(iii), S.6.9(i), and S.6.8(i)(iii) hold, then Assumption 3.7 holds with $R = \Theta$ for any $a_n$ with $k_n^3j_n^3\log^2(n)/n = o(a_n)$.
Proof: We establish the claim of the lemma by relying on Lemma S.6.28. To this end, let $\bar{j}_n \equiv (1+d_w)+j_n$, set $r_n^{\bar{j}_n}(x) \equiv (q,w',p_n^{j_n\prime}(x))'$, and observe any $f\in\mathcal{F}_n$ can be written as $f = r_n^{\bar{j}_n\prime}\delta$ for some $\delta\in\mathbf{R}^{\bar{j}_n}$. Moreover, by Assumption S.6.8(iii) and the definition of $\Theta_n$, it follows that there exists an $M<\infty$ such that $\mathcal{F}_n\subseteq\{r_n^{\bar{j}_n\prime}\delta : \|\delta\|_2\le M\}$, while Assumptions S.6.7(iii), S.6.8(i), and S.6.9(i) imply $\max_{1\le j\le\bar{j}_n}\|r_j\|_\infty \lesssim \sqrt{j_n}$ and $\max_{1\le k\le k_n}\|q_k\|_\infty \lesssim \sqrt{k_n}$. The claim of the lemma therefore follows from applying Lemma S.6.28 with $b_{1n} \asymp \sqrt{k_n}$, $b_{2n} \asymp \sqrt{j_n}$, and $C_n = M$.
Lemma S.6.10. Suppose Assumptions S.6.7(i)(iii), S.6.9(i)(ii), and S.6.8(i)(iii) hold and let $\mathcal{C}_n \equiv \{\beta\in\mathbf{R}^{j_n} : \|p_n^{j_n\prime}\beta\|_{1,\infty}\le C_0\}$ and $\mathcal{E}_n \equiv \int_0^\infty\sqrt{\log(N(\varepsilon,\mathcal{C}_n,\|\cdot\|_2))}d\varepsilon$. If $j_n^2k_n^2\log(1+k_nj_n) = o(n)$, then it follows that Assumption 3.14 holds with $R = \Theta$ for any sequence $a_n$ satisfying $k_n^{1/p}(\sqrt{\log(k_n)}+\mathcal{E}_n)j_n^{3/4}k_n^{1/2}\log^{1/4}(1+j_nk_n)/n^{1/4} = o(a_n)$.
Proof: Recall that in this application $X \equiv (Q,S,Y,W')'$ and, when $R = \Theta$, we have $\mathcal{F}_n \equiv \{f : f(x) = (q-g(s,y)-w'\gamma)$ for some $(g,\gamma)\in\Theta_n\}$. Therefore, letting $\bar{\mathcal{F}}_n \equiv \{fq_{k,n} : f\in\mathcal{F}_n$ and $1\le k\le k_n\}$ we obtain by direct calculation that
$$\sup_{f\in\mathcal{F}_n}\|\mathbb{W}_nfq_n^{k_n}-\mathbb{W}^\star_{n,P}fq_n^{k_n}\|_p \le k_n^{1/p}\sup_{f\in\bar{\mathcal{F}}_n}|\mathbb{W}_nf-\mathbb{W}^\star_{n,P}f|. \quad (S.170)$$
We establish the claim of the lemma by relying on (S.170) and applying Theorem S.9.1 to the class $\bar{\mathcal{F}}_n$. To this end, define $d_n = k_n(j_n+d_w+1)$ and let $f_n^{d_n}(v) \equiv q_n^{k_n}(z)\otimes(p_n^{j_n}(s,y)',q,w')'$. Set $D_1 \equiv (Q,W',p_n^{j_n}(S,Y)')'$ and $D_2 = q_n^{k_n}(Z)$, and for $\mathrm{eig}\{D_1D_1'\}$ the largest eigenvalue of $D_1D_1'$, note that we have
$$\sup_{P\in\mathbf{P}}\|\mathrm{eig}\{D_1D_1'\}\|_{P,\infty} \le \sup_{P\in\mathbf{P}}\|\mathrm{trace}\{D_1D_1'\}\|_{P,\infty} \lesssim j_n, \quad (S.171)$$
where the final inequality follows from Assumptions S.6.7(iii) and S.6.8(i). Hence, since $\mathrm{eig}\{E_P[q_n^{k_n}(Z)q_n^{k_n}(Z)']\}$ is bounded uniformly in $P\in\mathbf{P}$ by Assumption S.6.9(ii), result (S.171) and Lemma S.6.13 imply Assumption S.9.1(i) is satisfied with $C_n \asymp j_n$. Similarly, note that Assumptions S.6.7(iii), S.6.9(i), and S.6.8(i) imply Assumption S.9.1(ii) holds with $K_n = \sqrt{j_nk_n}$. Moreover, Assumption S.9.2(i) is immediate with $G_{n,P}$ equal to the zero function and $J_{1n} = 0$. Finally, note that any $f\in\bar{\mathcal{F}}_n$ has the structure
$$f(v) = q_{k,n}(z)(q-p_n^{j_n}(s,y)'\beta-w'\gamma) \text{ for some } (p_n^{j_n\prime}\beta,\gamma)\in\Theta_n. \quad (S.172)$$
Therefore, for $\mathcal{B}_n$ as defined in Assumption S.9.2(ii), $\mathcal{C}_n \equiv \{\beta\in\mathbf{R}^{j_n} : \|p_n^{j_n\prime}\beta\|_{1,\infty}\le C_0\}$, and $\mathcal{G}_n \equiv \{\gamma\in\mathbf{R}^{d_w} : \|\gamma\|_2\le C_0\}$, we can conclude that
$$N(\varepsilon,\mathcal{B}_n,\|\cdot\|_2) \le k_n\times N(\tfrac{\varepsilon}{2},\mathcal{G}_n,\|\cdot\|_2)\times N(\tfrac{\varepsilon}{2},\mathcal{C}_n,\|\cdot\|_2) \lesssim k_n\times\Big(\Big(\frac{C_0}{\varepsilon}\Big)^{d_w}\vee 1\Big)\times N(\tfrac{\varepsilon}{2},\mathcal{C}_n,\|\cdot\|_2), \quad (S.173)$$
where in the second inequality we employed that $N(\varepsilon,\mathcal{G}_n,\|\cdot\|_2) \lesssim (C_0/\varepsilon)^{d_w}\vee 1$. Furthermore, note that Assumption S.6.8(iii) implies that $\|\beta\|_2 \asymp \|p_n^{j_n\prime}\beta\|_{P,2}$ uniformly in $n$ and $P\in\mathbf{P}$, and hence since $\|p_n^{j_n\prime}\beta\|_{P,2}\le\|p_n^{j_n\prime}\beta\|_{1,\infty}$, the definition of $\Theta_n$ and (S.172) imply that the radius of $\mathcal{B}_n$ under $\|\cdot\|_2$ is uniformly bounded in $n$. Thus, the bound in (S.173) yields that for some $M<\infty$ we must have
$$\int_0^\infty\sqrt{\log(N(\varepsilon,\mathcal{B}_n,\|\cdot\|_2))}d\varepsilon \lesssim \int_0^M\sqrt{\log(k_n)}d\varepsilon+\int_0^{C_0}\sqrt{\log(C_0/\varepsilon)}d\varepsilon+\int_0^M\sqrt{\log(N(\varepsilon/2,\mathcal{C}_n,\|\cdot\|_2))}d\varepsilon \lesssim \sqrt{\log(k_n)}+\int_0^\infty\sqrt{\log(N(u,\mathcal{C}_n,\|\cdot\|_2))}du,$$
where the final inequality follows from $N(\varepsilon,\mathcal{C}_n,\|\cdot\|_2)$ being (weakly) larger than one for all $\varepsilon$ and the change of variables $u = \varepsilon/2$. Hence, Assumption S.9.2(ii) holds with $J_{2n} = \sqrt{\log(k_n)}+\mathcal{E}_n$, and as a result Theorem S.9.1 implies uniformly in $P\in\mathbf{P}$ that
$$\|\mathbb{W}_n-\mathbb{W}^\star_{n,P}\|_{\bar{\mathcal{F}}_n} = O_P\Big((\sqrt{\log(k_n)}+\mathcal{E}_n)\Big(\frac{j_n^3k_n^2\log(1+j_nk_n)}{n}\Big)^{1/4}\Big). \quad (S.174)$$
The claim of the lemma therefore follows from (S.170) and (S.174).
Lemma S.6.11. If $\mathbf{B} = C_B^1(\Omega)\times\mathbf{R}^{d_w}$ and $\Upsilon_G$, $\Upsilon_F$, and $\Theta$ are as defined in (S.152), (S.153), and (S.154), then it follows that Assumptions 3.11, 3.12, and 3.13 are satisfied with $K_g = 2$, $K_f = 0$, and for any $\theta = (g,\gamma)$ and $h = (g_h,\gamma_h)$, $\nabla\Upsilon_G(\theta)[h]$ equals
$$\nabla\Upsilon_G(\theta)[h](s,y) = \frac{\partial}{\partial s}g_h(s,y)+g(s,y)\frac{\partial}{\partial y}g_h(s,y)+g_h(s,y)\frac{\partial}{\partial y}g(s,y).$$
Proof: Recall that in this application $\mathbf{G} = C_B^0(\Omega)$ and $\|\theta\|_{\mathbf{B}} = \max\{\|g\|_{1,\infty},\|\gamma\|_2\}$. Hence, for any $\theta_1 = (g_1,\gamma_1)\in\mathbf{B}$ and $\theta_2 = (g_2,\gamma_2)\in\mathbf{B}$ we obtain that
$$\|\Upsilon_G(\theta_1)-\Upsilon_G(\theta_2)-\nabla\Upsilon_G(\theta_1)[\theta_1-\theta_2]\|_{\mathbf{G}} \le \sup_{(s,y)\in\Omega}|g_1(s,y)-g_2(s,y)|\times\sup_{(s,y)\in\Omega}\Big|\frac{\partial}{\partial y}(g_1(s,y)-g_2(s,y))\Big| \le \|g_1-g_2\|_{1,\infty}^2, \quad (S.175)$$
which verifies that Assumption 3.11(i) holds with $K_g = 2$. Similarly, we additionally conclude
$$\|\nabla\Upsilon_G(\theta_1)-\nabla\Upsilon_G(\theta_2)\|_o = \sup_{g_h:\|g_h\|_{1,\infty}\le 1}\sup_{(s,y)\in\Omega}\Big|(g_1(s,y)-g_2(s,y))\frac{\partial}{\partial y}g_h(s,y)+g_h(s,y)\frac{\partial}{\partial y}(g_1(s,y)-g_2(s,y))\Big| \le 2\|g_1-g_2\|_{1,\infty}, \quad (S.176)$$
which verifies Assumption 3.11(ii) holds with $K_g = 2$ as well. Moreover, note that since any $\theta = (g,\gamma)\in\Theta$ satisfies $\|g\|_{1,\infty}\le C_0$, it follows that $\|g\|_{1,\infty}\le C_0+\varepsilon$ for any $g\in\Theta^\varepsilon$. Thus, by identical arguments to those in (S.176) we obtain
$$\|\nabla\Upsilon_G(\theta)\|_o \le 2\|g\|_{1,\infty} \le 2(C_0+\varepsilon), \quad (S.177)$$
which thus verifies Assumption 3.11(iii) holds with $M_g = 2(C_0+\varepsilon)$.
Next note $\Upsilon_F : \mathbf{B}\to\mathbf{F}$ is affine and continuous, and hence $\nabla\Upsilon_F(\theta)[h] = \Upsilon_F(h)-c_0$ for all $\theta,h\in\mathbf{B}$. Therefore, Assumptions 3.12(i)(ii) hold with $K_f = 0$, while
$$\sup_{g_h:\|g_h\|_{1,\infty}\le 1}|g_h(s_0,y_0)| \le 1 \quad (S.178)$$
implies Assumption 3.12(iii) is satisfied with $M_f = 1$. Since $\Upsilon_F$ being affine and $K_f = 0$ further imply that Assumptions 3.12(iv) and 3.13 hold, the lemma follows.
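As a quick sanity check on the derivative formula in Lemma S.6.11, the following sketch (our own toy code, with illustrative choices of $g$ and $g_h$ that are not from the paper) compares a finite-difference quotient of $\Upsilon_G(g) = \partial_s g + g\,\partial_y g$ against the closed-form expression $\partial_s g_h + g\,\partial_y g_h + g_h\,\partial_y g$ at a single point:

```python
import numpy as np

# Central finite differences stand in for the analytic partials.
def d_ds(f, s, y, eps=1e-6):
    return (f(s + eps, y) - f(s - eps, y)) / (2 * eps)

def d_dy(f, s, y, eps=1e-6):
    return (f(s, y + eps) - f(s, y - eps)) / (2 * eps)

def upsilon_g(g):
    # Upsilon_G(g)(s, y) = d/ds g + g * d/dy g, as in the demand application
    return lambda s, y: d_ds(g, s, y) + g(s, y) * d_dy(g, s, y)

g = lambda s, y: np.sin(s) * np.cos(y)   # example g (our choice)
gh = lambda s, y: s * y**2               # example perturbation g_h (our choice)

s0, y0, t = 0.3, 0.7, 1e-5
# one-sided difference quotient of Upsilon_G at g in direction g_h
num = (upsilon_g(lambda s, y: g(s, y) + t * gh(s, y))(s0, y0)
       - upsilon_g(g)(s0, y0)) / t
# closed-form Frechet derivative from Lemma S.6.11
ana = d_ds(gh, s0, y0) + g(s0, y0) * d_dy(gh, s0, y0) + gh(s0, y0) * d_dy(g, s0, y0)
assert abs(num - ana) < 1e-3
```

Because $\Upsilon_G$ is quadratic in $g$, the difference quotient agrees with the formula up to an $O(t)$ remainder, which is what the assertion checks.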
Lemma S.6.12. Let Assumptions S.6.7, S.6.8, S.6.9, and S.6.10 hold, and let $a_n = (\log(n))^{-1/2}$. For any $\ell_n$ with $k_nj_n^2\sqrt{\log(1+k_n)}/s_n\sqrt{n} = o(\ell_n)$, it follows uniformly in $P\in\mathbf{P}_0$ that
$$\hat{U}_n(R|{+}\infty) = \inf_{\frac{h}{\sqrt{n}}\in \hat{V}_n(\hat\theta_n,\ell_n)}\|\mathbb{W}_n\rho(\cdot,\hat\theta_n)+\hat{\mathbb{D}}_n[h]\|_{\hat\Sigma_n,2}+o_P(a_n)$$
$$\hat{U}_n(\Theta|{+}\infty) = \inf_{\|\frac{h}{\sqrt{n}}\|_{\mathbf{B}}\le\ell_n}\|\mathbb{W}_n\rho(\cdot,\hat\theta_n^u)+\hat{\mathbb{D}}_n[h]\|_{\hat\Sigma_n,2}+o_P(a_n).$$
Proof: We establish the lemma by applying Lemma 3.1. To this end, recall that in the proof of Theorem S.6.3, Assumptions 3.3(i)(iii) and 3.4 were verified to hold with $B_n \asymp \sqrt{k_n}$ and $J_n \asymp \sqrt{j_n\log(1+j_n)}$. Since the eigenvalues of $E_P[p_n^{j_n}(S,Y)p_n^{j_n}(S,Y)']$ are bounded uniformly in $P\in\mathbf{P}$ by Assumption S.6.8(iii) and $\|\theta\|_{\mathbf{E}} = \sup_{P\in\mathbf{P}}\|g\|_{P,2}+\|\gamma\|_2$ for any $\theta = (g,\gamma)$, it also follows that for any $h = (p_n^{j_n\prime}\beta_h,\gamma_h)\in\mathbf{B}_n$ we have
$$\|h\|_{\mathbf{E}} = \sup_{P\in\mathbf{P}}\|p_n^{j_n\prime}\beta_h\|_{P,2}+\|\gamma_h\|_2 \lesssim \|\beta_h\|_2+\|\gamma_h\|_2 \lesssim \frac{1}{s_n}\|E_P[q_n^{k_n}(Z)(p_n^{j_n}(S,Y)'\beta_h+W'\gamma_h)]\|_2 = \frac{1}{s_n}\|\mathbb{D}_{n,P}[h]\|_2,$$
where the second inequality holds by Assumption S.6.9(iii) and the final equality follows from the definition of $\mathbb{D}_{n,P}[h]$. Hence, since $\nu_n \asymp 1/s_n$ by Lemma S.6.7 and $p = 2$, we conclude that the Lemma 3.1 requirement that $\|h\|_{\mathbf{E}} \le \nu_n\|\mathbb{D}_{n,P}(\theta)[h]\|_p$ for all $\theta\in A_n(P)$ holds with $A_n(P) = \Theta_n\cap R$ (for either $R = \Theta$ or $R$ corresponding to (S.152) and (S.153)). Next, define the $k_n\times(j_n+d_w)$ matrix $M_{i,n}$ to be given by
$$M_{i,n} \equiv q_n^{k_n}(Z_i)(p_n^{j_n}(S_i,Y_i)'\;\,W_i') - E_P[q_n^{k_n}(Z)(p_n^{j_n}(S,Y)'\;\,W')], \quad (S.179)$$
which satisfies $E_P[M_{i,n}] = 0$. Since $\|(p_n^{j_n\prime}\beta,\gamma)\|_{\mathbf{E}} \asymp \|\beta\|_2+\|\gamma\|_2$ by Assumption S.6.8(iii), we then conclude from (S.179) that for some $C_1<\infty$ we must have
$$\sup_{P\in\mathbf{P}}P\Big(\sup_{h\in\mathbf{B}_n}\frac{\|\hat{\mathbb{D}}_n[h]-\mathbb{D}_{n,P}[h]\|_2}{\|h\|_{\mathbf{E}}} > s_n\Big) \le \sup_{P\in\mathbf{P}}P\Big(\Big\|\frac{1}{n}\sum_{i=1}^nM_{i,n}\Big\|_o > C_1s_n\Big) \le (j_n+k_n)\exp\Big\{-\frac{ns_n^2C_2}{(k_n^2\vee j_n)+s_nk_n\sqrt{j_n}}\Big\} = o(1), \quad (S.180)$$
where the final inequality follows by applying Lemma S.6.29 with $b_{1n} = \sqrt{j_n}$ (by Assumptions S.6.7(iii) and S.6.8(i)) and $b_{2n} = k_n$ (by Assumption S.6.9(i)), while the final equality results from $\log(k_n)k_n^2/s_n^2n = o(1)$ by Assumption S.6.9(iv) and $k_n\ge j_n$ by Assumption S.6.9(iii). Hence, $\nu_n \asymp 1/s_n$ and (S.180) imply condition (24) in Lemma 3.1 holds. Finally, we note that by Assumption S.6.11(iv), we may apply Lemma S.6.10 with $p = 2$ to conclude that Assumption 3.14 holds with $R = \Theta$ (and hence for $R$ as corresponding to (S.152) and (S.153)) with $a_n = (\log(n))^{-1/2}$. This concludes verifying the requirements of Lemma 3.1, and therefore the present lemma follows for any $\ell_n$ satisfying $\mathcal{S}_n(\mathbf{B},\mathbf{E})R_n = o(\ell_n)$, which in this application is equivalent to $k_nj_n^2\sqrt{\log(1+k_n)}/s_n\sqrt{n} = o(\ell_n)$ due to $\mathcal{S}_n(\mathbf{B},\mathbf{E}) \lesssim j_n^{3/2}$ and $R_n \asymp k_n\sqrt{j_n\log(1+k_n)}/s_n\sqrt{n}$.
Lemma S.6.13. Let $D_1\in\mathbf{R}^{d_1}$, $D_2\in\mathbf{R}^{d_2}$ be distributed according to $Q$, and for any matrix $A$ let $\mathrm{eig}\{A\}$ denote its largest eigenvalue. Then it follows that
$$\mathrm{eig}\{E_Q[(D_1\otimes D_2)(D_1\otimes D_2)']\} \le \|\mathrm{eig}\{D_1D_1'\}\|_{Q,\infty}\times\mathrm{eig}\{E_Q[D_2D_2']\}.$$
Proof: Let $\mathcal{A} \equiv \{\{a_i\}_{i=1}^{d_1} : a_i\in\mathbf{R}^{d_2}$ and $\sum_{i=1}^{d_1}\|a_i\|_2^2\le 1\}$, set $(D_1^{(1)},\ldots,D_1^{(d_1)}) = D_1\in\mathbf{R}^{d_1}$, and then note that by direct calculation we obtain that
$$\mathrm{eig}\{E_Q[(D_1\otimes D_2)(D_1\otimes D_2)']\} = \sup_{\{a_i\}_{i=1}^{d_1}\in\mathcal{A}}(a_1',\ldots,a_{d_1}')E_Q[(D_1\otimes D_2)(D_1\otimes D_2)'](a_1',\ldots,a_{d_1}')' = \sup_{\{a_i\}_{i=1}^{d_1}\in\mathcal{A}}E_Q\Big[\Big(\sum_{i=1}^{d_1}(a_i'D_2)D_1^{(i)}\Big)^2\Big] \le \|\mathrm{eig}\{D_1D_1'\}\|_{Q,\infty}\sup_{\{a_i\}_{i=1}^{d_1}\in\mathcal{A}}\sum_{i=1}^{d_1}E_Q[(a_i'D_2)^2].$$
However, since $\sum_{i=1}^{d_1}\|a_i\|_2^2\le 1$ for all $\{a_i\}_{i=1}^{d_1}\in\mathcal{A}$, we additionally have the inequality
$$\sup_{\{a_i\}_{i=1}^{d_1}\in\mathcal{A}}\sum_{i=1}^{d_1}E_Q[(a_i'D_2)^2] \le \sup_{\{a_i\}_{i=1}^{d_1}\in\mathcal{A}}\sum_{i=1}^{d_1}\mathrm{eig}\{E_Q[D_2D_2']\}\|a_i\|_2^2 = \mathrm{eig}\{E_Q[D_2D_2']\},$$
and therefore the claim of the lemma follows.
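Since Lemma S.6.13 holds for any distribution $Q$, including an empirical measure, its bound can be checked numerically. The sketch below (our own illustration; the sample sizes and distributions are arbitrary choices) compares the two sides on simulated data, using that for a vector outer product $D_1D_1'$ the largest eigenvalue is simply $\|D_1\|_2^2$:

```python
import numpy as np

rng = np.random.default_rng(0)
d1, d2, n = 3, 4, 2000
D1 = rng.uniform(-1.0, 1.0, size=(n, d1))   # bounded, so ||eig{D1 D1'}||_inf is finite
D2 = rng.normal(size=(n, d2))

# rows of `kron` are the Kronecker products D1_i (x) D2_i
kron = np.einsum("ni,nj->nij", D1, D2).reshape(n, d1 * d2)
lhs = np.linalg.eigvalsh(kron.T @ kron / n).max()          # eig{E_Q[(D1 x D2)(D1 x D2)']}
sup_eig_d1 = (D1**2).sum(axis=1).max()                     # eig{D1 D1'} = ||D1||_2^2 pointwise
rhs = sup_eig_d1 * np.linalg.eigvalsh(D2.T @ D2 / n).max() # ||eig{D1 D1'}||_inf * eig{E_Q[D2 D2']}
assert lhs <= rhs + 1e-10
```

The assertion holds for every sample, exactly as the lemma predicts, since the empirical distribution is a valid choice of $Q$.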
Lemma S.6.14. Let $\lambda$ be the Lebesgue measure, $\{B_{b,n}^{(1)}\}_{b=1}^{j_{1n}}$ and $\{B_{b,n}^{(2)}\}_{b=1}^{j_{2n}}$ be B-splines on $[0,1]$ of order $r\ge 3$ with no interior knot multiplicity, mesh ratio bounded in $n$, and normalized to have $\|\cdot\|_{\lambda,2}$-norm one. If $\{p_{j,n}\}_{j=1}^{j_n}$ is the tensor product of $\{B_{b,n}^{(1)}\}_{b=1}^{j_{1n}}$ and $\{B_{b,n}^{(2)}\}_{b=1}^{j_{2n}}$ and $\mathcal{C}_n \equiv \{\beta\in\mathbf{R}^{j_n} : \|p_n^{j_n\prime}\beta\|_{1,\infty}\le C_0\}$, then it follows that
$$\int_0^\infty\sqrt{\log(N(\varepsilon,\mathcal{C}_n,\|\cdot\|_2))}d\varepsilon \lesssim \sqrt{j_{1n}\wedge j_{2n}}\,\log(j_n+1).$$
Proof: We rely heavily on Chapter 5 in DeVore and Lorentz (1993), and note that $B_j$ corresponds to $N_j/\|N_j\|_{\lambda,2}$ in their notation. Throughout, for two sequences $a_n$ and $b_n$ we employ $a_n \asymp b_n$ to mean that there exist constants $0 < \underline{c} \le \bar{c} < \infty$ such that $\underline{c}a_n \le b_n \le \bar{c}a_n$ for all $n$. In what follows it will also prove convenient to index the elements of $\beta\in\mathbf{R}^{j_n}$ by $\beta_{b_1,b_2}$ with $1\le b_1\le j_{1n}$ and $1\le b_2\le j_{2n}$. Then note that the mesh ratios corresponding to $\{B_{b,n}^{(1)}\}_{b=1}^{j_{1n}}$ and $\{B_{b,n}^{(2)}\}_{b=1}^{j_{2n}}$ being bounded uniformly in $n$ and two applications of Theorem 5.4.2 in DeVore and Lorentz (1993) imply that
$$\|p_n^{j_n\prime}\beta\|_\infty = \sup_{u_1\in[0,1]}\sup_{u_2\in[0,1]}\Big|\sum_{b_2=1}^{j_{2n}}B_{b_2,n}^{(2)}(u_2)\sum_{b_1=1}^{j_{1n}}\beta_{b_1,b_2}B_{b_1,n}^{(1)}(u_1)\Big| \asymp \sup_{u_1\in[0,1]}\max_{1\le b_2\le j_{2n}}\sqrt{j_{2n}}\Big|\sum_{b_1=1}^{j_{1n}}\beta_{b_1,b_2}B_{b_1,n}^{(1)}(u_1)\Big| \asymp \sqrt{j_{1n}j_{2n}}\|\beta\|_\infty \quad (S.181)$$
uniformly in $\beta\in\mathbf{R}^{j_n}$. By similar arguments we also obtain uniformly in $\beta\in\mathbf{R}^{j_n}$ that
$$\sup_{u_1\in(0,1)}\sup_{u_2\in[0,1]}\Big|\sum_{b_2=1}^{j_{2n}}B_{b_2,n}^{(2)}(u_2)\sum_{b_1=1}^{j_{1n}}\beta_{b_1,b_2}\frac{\partial}{\partial u_1}B_{b_1,n}^{(1)}(u_1)\Big| \asymp \sup_{u_1\in(0,1)}\max_{1\le b_2\le j_{2n}}\sqrt{j_{2n}}\Big|\sum_{b_1=1}^{j_{1n}}\beta_{b_1,b_2}\frac{\partial}{\partial u_1}B_{b_1,n}^{(1)}(u_1)\Big| \asymp \max_{1\le b_2\le j_{2n}}\max_{2\le b_1\le j_{1n}}\sqrt{j_{2n}}\,j_{1n}^{3/2}|\beta_{b_1,b_2}-\beta_{b_1-1,b_2}|, \quad (S.182)$$
where the second result follows by employing equation (3.11) and Theorem 5.4.2 in Chapter 5 of DeVore and Lorentz (1993) and the mesh ratio of $\{B_{b,n}^{(1)}\}_{b=1}^{j_{1n}}$ being bounded. Since by identical arguments we can also derive the symmetric (to (S.182)) relationship
$$\sup_{u_1\in[0,1]}\sup_{u_2\in(0,1)}\Big|\sum_{b_1=1}^{j_{1n}}B_{b_1,n}^{(1)}(u_1)\sum_{b_2=1}^{j_{2n}}\beta_{b_1,b_2}\frac{\partial}{\partial u_2}B_{b_2,n}^{(2)}(u_2)\Big| \asymp \max_{1\le b_1\le j_{1n}}\max_{2\le b_2\le j_{2n}}\sqrt{j_{1n}}\,j_{2n}^{3/2}|\beta_{b_1,b_2}-\beta_{b_1,b_2-1}|, \quad (S.183)$$
it follows from results (S.181), (S.182), and (S.183) that there is an $M_0<\infty$ such that
$$\max_{1\le b_1\le j_{1n}}\max_{1\le b_2\le j_{2n}}|\beta_{b_1,b_2}| \le M_0/\sqrt{j_n}$$
$$\max_{1\le b_2\le j_{2n}}\max_{2\le b_1\le j_{1n}}|\beta_{b_1,b_2}-\beta_{b_1-1,b_2}| \le M_0/(j_{1n}\sqrt{j_n})$$
$$\max_{1\le b_1\le j_{1n}}\max_{2\le b_2\le j_{2n}}|\beta_{b_1,b_2}-\beta_{b_1,b_2-1}| \le M_0/(j_{2n}\sqrt{j_n}) \quad (S.184)$$
for all $\beta\in\mathcal{C}_n$. Hence, in order to establish the claim of the lemma, it suffices to bound the covering numbers for the set defined by (S.184).
We proceed by combining two bounds, one for "small" $\varepsilon$ and one for "large" $\varepsilon$. First, assume without loss of generality $j_{1n}\ge j_{2n}$, let $c_n \equiv \lceil\log(j_{1n}+1)\rceil$, and define the sets
$$\Big\{\beta : \frac{\varepsilon}{3\sqrt{j_n}}k_1 \le \beta_{b_1,b_2} \le \frac{\varepsilon}{3\sqrt{j_n}}(k_1+1) \text{ for all } b_1 = mc_n+1 \text{ with } 0\le m\le\lceil j_{1n}/c_n\rceil-1,$$
$$\text{and } \frac{\varepsilon}{3c_n\sqrt{j_n}}k_2 \le \beta_{b_1,b_2}-\beta_{b_1-1,b_2} \le \frac{\varepsilon}{3c_n\sqrt{j_n}}(k_2+1) \text{ otherwise}\Big\} \quad (S.185)$$
where $k_1$, $k_2$ are non-zero integers – i.e. the sets (in $\mathbf{R}^{j_n}$) defined in (S.185) consist of "chains" along the $b_1$ dimension that reset every $c_n$ integers. To compute the diameter of the sets in (S.185), note that since all "chains" have the same structure,
$$\sup\|\beta-\tilde\beta\|_2^2 \text{ s.t. } \beta,\tilde\beta \text{ satisfying (S.185)} \le \sup\, j_{2n}\Big\lceil\frac{j_{1n}}{c_n}\Big\rceil\sum_{b_1=1}^{c_n}(\beta_{b_1,j_{2n}}-\tilde\beta_{b_1,j_{2n}})^2 \text{ s.t. } \beta,\tilde\beta \text{ satisfying (S.185)} \le j_{2n}\Big\lceil\frac{j_{1n}}{c_n}\Big\rceil\sum_{b_1=1}^{c_n}\frac{\varepsilon^2}{9j_n}\Big\{1+\frac{2(b_1-1)}{c_n}\Big\}^2, \quad (S.186)$$
where the final inequality follows from (S.185). Since $\lceil j_{1n}/c_n\rceil c_n \le (j_{1n}+c_n) \le 2j_{1n}$ due to $\lceil j_{1n}/c_n\rceil \le 1+j_{1n}/c_n$ and $c_n\le j_{1n}$, it follows from $j_n = j_{1n}j_{2n}$ that every set in (S.185) is contained in a ball of radius $\varepsilon$. Moreover, by (S.184) the total number of sets with the structure in (S.185) needed to cover the set $\mathcal{C}_n$ is bounded by
$$\Big(\Big\lceil\frac{6M_0}{\varepsilon}\Big\rceil\Big)^{j_{2n}\lceil\frac{j_{1n}}{c_n}\rceil}\Big(\Big\lceil\frac{6M_0c_n}{\varepsilon j_{1n}}\Big\rceil\Big)^{j_{2n}\lceil\frac{j_{1n}}{c_n}\rceil c_n}. \quad (S.187)$$
Next, we employ again the bound $\lceil j_{1n}/c_n\rceil c_n\le 2j_{1n}$ and $\lceil a\rceil\le 2a$ whenever $a\ge 1$, to obtain from (S.186) and (S.187) that whenever $\varepsilon\le 6M_0c_n/j_{1n}$ we have
$$N(\varepsilon,\mathcal{C}_n,\|\cdot\|_2) \le \Big(\frac{12M_0}{\varepsilon}\Big)^{\frac{2j_n}{c_n}}\Big(\frac{12M_0c_n}{\varepsilon j_{1n}}\Big)^{2j_n} = \Big(\frac{12M_0c_n}{\varepsilon j_{1n}}\Big(\frac{j_{1n}}{c_n}\Big)^{\frac{1}{c_n}}\Big)^{\frac{2j_n(c_n+1)}{c_n}} \le \Big(\frac{M_1\log(1+j_{1n})}{\varepsilon j_{1n}}\Big)^{4j_n}, \quad (S.188)$$
where the final inequality holds for some $M_1<\infty$ due to $(c_n+1)/c_n\le 2$ and $(j_{1n}/c_n)^{1/c_n}\le j_{1n}^{1/\log(1+j_{1n})} = O(1)$ because $c_n = \lceil\log(1+j_{1n})\rceil$.
The bound in (S.188) is valid only for $\varepsilon\le 6M_0c_n/j_{1n}$. To obtain a bound for $\varepsilon\ge 6M_0c_n/j_{1n}$, let $\{Z_{b_1,b_2}\}_{b_1,b_2}$ be independent standard normal random variables. By Sudakov's inequality (see, e.g., Proposition A.2.5 in van der Vaart and Wellner (1996)), it then follows that for some $M_2<\infty$ independent of $n$ we have that
$$\sqrt{\log(N(\varepsilon,\mathcal{C}_n,\|\cdot\|_2))} \le \frac{M_2}{\varepsilon}E\Big[\sup_{\beta\in\mathcal{C}_n}\sum_{b_1=1}^{j_{1n}}\sum_{b_2=1}^{j_{2n}}\beta_{b_1,b_2}Z_{b_1,b_2}\Big]. \quad (S.189)$$
Next, for notational convenience define $\Delta_{b_1}\beta_{b_1,b_2} = (\beta_{b_1,b_2}-\beta_{b_1-1,b_2})$ and $\Delta_{b_2}\beta_{b_1,b_2} = (\beta_{b_1,b_2}-\beta_{b_1,b_2-1})$, and then note that by (S.184) it follows that
$$\sup_{\beta\in\mathcal{C}_n}\sum_{b_1=1}^{j_{1n}}\sum_{b_2=1}^{j_{2n}}\beta_{b_1,b_2}Z_{b_1,b_2} = \sup_{\beta\in\mathcal{C}_n}\Big\{\sum_{b_2=1}^{j_{2n}}\sum_{b_1=2}^{j_{1n}}\Delta_{b_1}\beta_{b_1,b_2}\sum_{\bar{b}_1=b_1}^{j_{1n}}Z_{\bar{b}_1,b_2}+\sum_{b_2=2}^{j_{2n}}\Delta_{b_2}\beta_{1,b_2}\sum_{\bar{b}_1=1}^{j_{1n}}\sum_{\bar{b}_2=b_2}^{j_{2n}}Z_{\bar{b}_1,\bar{b}_2}+\beta_{1,1}\sum_{\bar{b}_1=1}^{j_{1n}}\sum_{\bar{b}_2=1}^{j_{2n}}Z_{\bar{b}_1,\bar{b}_2}\Big\}$$
$$\le \sum_{b_2=1}^{j_{2n}}\sum_{b_1=2}^{j_{1n}}\frac{M_0}{j_{1n}\sqrt{j_n}}\Big|\sum_{\bar{b}_1=b_1}^{j_{1n}}Z_{\bar{b}_1,b_2}\Big|+\sum_{b_2=2}^{j_{2n}}\frac{M_0}{j_{2n}\sqrt{j_n}}\Big|\sum_{\bar{b}_1=1}^{j_{1n}}\sum_{\bar{b}_2=b_2}^{j_{2n}}Z_{\bar{b}_1,\bar{b}_2}\Big|+\frac{M_0}{\sqrt{j_n}}\Big|\sum_{\bar{b}_1=1}^{j_{1n}}\sum_{\bar{b}_2=1}^{j_{2n}}Z_{\bar{b}_1,\bar{b}_2}\Big|.$$
Hence, employing that if $\mathcal{W}\sim N(0,\sigma^2)$ then $E[|\mathcal{W}|]\lesssim\sigma$, we can conclude that
$$E\Big[\sup_{\beta\in\mathcal{C}_n}\sum_{b_1=1}^{j_{1n}}\sum_{b_2=1}^{j_{2n}}\beta_{b_1,b_2}Z_{b_1,b_2}\Big] \lesssim \sum_{b_2=1}^{j_{2n}}\sum_{b_1=2}^{j_{1n}}\frac{\sqrt{j_{1n}-b_1}}{j_{1n}\sqrt{j_n}}+\sum_{b_2=2}^{j_{2n}}\frac{\sqrt{j_{1n}(j_{2n}-b_2)}}{j_{2n}\sqrt{j_n}}+1 \le \frac{j_n\sqrt{j_{1n}}}{j_{1n}\sqrt{j_n}}+\frac{j_{2n}\sqrt{j_{1n}j_{2n}}}{j_{2n}\sqrt{j_n}}+1 \le 3\sqrt{j_{2n}}$$
where in the final inequality we employed that $j_n = j_{1n}j_{2n}$. Hence, by (S.189) we have
$$\sqrt{\log(N(\varepsilon,\mathcal{C}_n,\|\cdot\|_2))} \lesssim \frac{\sqrt{j_{2n}}}{\varepsilon}. \quad (S.190)$$
To conclude the proof, we combine the bounds in (S.188) and (S.190). In particular, setting $\delta_n \equiv 6M_0\lceil\log(j_{1n}+1)\rceil/j_{1n}$ and observing that $\|\beta\|_2 \lesssim \|p_n^{j_n\prime}\beta\|_{\lambda,2}\le C_0$ for all $\beta\in\mathcal{C}_n$ allows us to conclude that for some $M_2<\infty$ we must have
$$\int_0^\infty\sqrt{\log(N(\varepsilon,\mathcal{C}_n,\|\cdot\|_2))}d\varepsilon \lesssim \int_{\delta_n}^{M_2}\frac{\sqrt{j_{2n}}}{\varepsilon}d\varepsilon+\sqrt{j_n}\int_0^{\delta_n}\Big(\log\Big(\frac{M_1\log(1+j_{1n})}{\varepsilon j_{1n}}\Big)\Big)^{1/2}d\varepsilon \le \sqrt{j_{2n}}\log(1+j_{1n})+\frac{\sqrt{j_n}\log(1+j_{1n})}{j_{1n}}\int_0^1\Big(\log\Big(\frac{1}{u}\Big)\Big)^{1/2}du \lesssim \sqrt{j_{2n}}\log(1+j_{1n})$$
where the second inequality follows from the change of variables $u = \varepsilon/\delta_n$ and the final inequality employed that $j_n = j_{1n}j_{2n}$ and $j_{2n}\le j_{1n}$. Substituting $j_{2n} = j_{1n}\wedge j_{2n}$ and employing $j_{1n}\le j_n$ establishes the lemma.
S.6.3 Quantile Treatment Effects
For our next example we formally study the nonparametric quantile treatment effect (QTE) application introduced in Section 2.2. Recall that in this context $\theta_0(P)$ is assumed to be the solution to the conditional moment restriction
$$P(Y\le\theta_0(P)(D)|Z) = \tau \quad (S.191)$$
where $Y\in\mathbf{R}$, $D\in[0,1]$, and $Z\in\mathbf{R}$. Thus, in this application $X = (Y,D,Z)\in\mathbf{X}\equiv\mathbf{R}\times[0,1]\times\mathbf{R}$ and the residual function $\rho:\mathbf{X}\times\mathbf{B}\to\mathbf{R}$ corresponding to (S.191) is
$$\rho(X,\theta) = 1\{Y\le\theta(D)\}-\tau. \quad (S.192)$$
Our analysis readily enables us to conduct inference on the QTE itself. However, in
order to illustrate our conditions in a number of different settings, we focus here on a
nonlinear functional of $\theta_0(P)$. In particular, we conduct inference on
$$\int_0^1(\nabla\theta_0(P)(u))^2du-\Big(\int_0^1\nabla\theta_0(P)(u)du\Big)^2$$
while imposing that the QTE be increasing in treatment intensity (i.e. $d\mapsto\nabla\theta_0(P)(d)$ is increasing). To map this problem into our framework we define
$$C_B^m([0,1]) \equiv \{\theta : [0,1]\to\mathbf{R} \text{ s.t. } \|\theta\|_{m,\infty}<\infty\} \qquad \|\theta\|_{m,\infty} \equiv \sup_{0\le\alpha\le m}\sup_{d\in[0,1]}|\nabla^\alpha\theta(d)|,$$
and set $\mathbf{B} = C_B^2([0,1])$ and $\|\cdot\|_{\mathbf{B}} = \|\cdot\|_{2,\infty}$. We impose the restriction that the quantile treatment effect be increasing in the intensity of treatment by letting $\mathbf{G} = C_B^0([0,1])$, $\|\cdot\|_{\mathbf{G}} = \|\cdot\|_\infty$ (where we write $\|\cdot\|_\infty$ in place of $\|\cdot\|_{0,\infty}$), and defining
$$\Upsilon_G(\theta) \equiv -\nabla^2\theta. \quad (S.193)$$
As already argued in Section S.6.2, we note that G is a Banach lattice with order unit
1G = c for c the constant function c(d) = 1 for all d ∈ [0, 1]. Setting F = R with
‖ · ‖F = | · |, we may then test whether the variance of the quantile treatment effects
equals a hypothesized value $\lambda\ne 0$ by setting $\Upsilon_F : \mathbf{B}\to\mathbf{R}$ to equal
$$\Upsilon_F(\theta) = \int_0^1(\nabla\theta(u))^2du-\Big(\int_0^1\nabla\theta(u)du\Big)^2-\lambda. \quad (S.194)$$
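The functional $\Upsilon_F$ is simply the variance of $\nabla\theta(U)$ for $U$ uniform on $[0,1]$, recentered by $\lambda$. A minimal numeric sketch (our own code; the grid size and the example QTE curve are arbitrary choices) evaluates it by numerical differentiation and quadrature:

```python
import numpy as np

def upsilon_f(theta, lam, m=10_000):
    u = np.linspace(0.0, 1.0, m + 1)
    dtheta = np.gradient(theta(u), u)        # numerical derivative of the QTE curve
    mean_sq = np.trapz(dtheta**2, u)         # integral of (grad theta)^2
    sq_mean = np.trapz(dtheta, u) ** 2       # squared integral of grad theta
    return mean_sq - sq_mean - lam

# Example: theta(d) = d + d^2/2 has grad theta = 1 + d, whose variance is 1/12,
# so Upsilon_F vanishes at lambda = 1/12.
theta = lambda d: d + 0.5 * d**2
print(upsilon_f(theta, lam=1/12))            # numerically close to 0
```

For this quadratic example the restriction $\Upsilon_F(\theta) = 0$ holds exactly at $\lambda = 1/12$, so the returned value is zero up to discretization error.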
For the parameter space for θ0(P ) we employ a ball in B and we thus set Θ to equal
$$\Theta \equiv \{\theta\in C_B^2([0,1]) \text{ s.t. } \|\theta\|_{2,\infty}\le C_0\} \quad (S.195)$$
for some $C_0<\infty$. For a sequence of approximating functions $\{p_{j,n}\}_{j=1}^{j_n}$ defined on $[0,1]$ we then let $p_n^{j_n}(d) \equiv (p_{1,n}(d),\ldots,p_{j_n,n}(d))'$ and define $\Theta_n$ to equal
$$\Theta_n \equiv \{p_n^{j_n\prime}\beta\in C_B^2([0,1]) : \|p_n^{j_n\prime}\beta\|_{2,\infty}\le C_0\}. \quad (S.196)$$
Similarly employing a sequence of transformations $\{q_{k,n}\}_{k=1}^{k_n}$ of the conditioning variable $Z$, we set $q_n^{k_n}(z) \equiv (q_{1,n}(z),\ldots,q_{k_n,n}(z))'$ and for some $2\le p\le\infty$ we let
$$Q_n(\theta) \equiv \Big\|\hat\Sigma_n\frac{1}{n}\sum_{i=1}^n(1\{Y_i\le\theta(D_i)\}-\tau)q_n^{k_n}(Z_i)\Big\|_p$$
for some weighting matrix $\hat\Sigma_n$. For $R = \{\theta\in\Theta : \Upsilon_F(\theta) = 0$ and $\Upsilon_G(\theta)\le 0\}$ with $\Upsilon_G$ and $\Upsilon_F$ as defined in (S.193) and (S.194), we then set $I_n(R)$ and $I_n(\Theta)$ to equal
$$I_n(R) \equiv \inf_{\theta\in\Theta_n\cap R}\sqrt{n}Q_n(\theta) \qquad I_n(\Theta) \equiv \inf_{\theta\in\Theta_n}\sqrt{n}Q_n(\theta).$$
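To fix ideas, the criterion $Q_n(\theta)$ is straightforward to evaluate for a candidate $\theta$. The toy implementation below is our own sketch, not the authors' code: the polynomial instruments, the identity weighting matrix, and the simulated exogenous design ($Z = D$) are all illustrative assumptions.

```python
import numpy as np

def q_n(theta, Y, D, Z, tau, p=2, kn=4):
    qkn = np.vander(Z, kn)                        # assumed transformations q_{k,n}(Z_i)
    resid = (Y <= theta(D)).astype(float) - tau   # rho(X_i, theta) = 1{Y <= theta(D)} - tau
    gbar = qkn.T @ resid / len(Y)                 # (1/n) sum_i rho(X_i, theta) q^{kn}(Z_i)
    sigma = np.eye(kn)                            # identity weighting for the sketch
    return np.linalg.norm(sigma @ gbar, ord=p)

rng = np.random.default_rng(0)
n, tau = 5_000, 0.5
D = rng.uniform(size=n); Z = D                    # exogenous case Z = D for simplicity
Y = 1.0 + 2.0 * D + rng.normal(size=n)            # true conditional median: 1 + 2d
print(q_n(lambda d: 1.0 + 2.0 * d, Y, D, Z, tau))       # near zero at the truth
print(q_n(lambda d: np.zeros_like(d), Y, D, Z, tau))    # larger at a misspecified theta
```

The criterion is small at the true median function and visibly larger at a misspecified one, which is the feature the test statistics $I_n(R)$ and $I_n(\Theta)$ exploit.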
In order to approximate the finite sample distribution of In(R) and In(Θ) we impose:
Assumption S.6.12. (i) $\{Y_i,D_i,Z_i\}_{i=1}^n$ is i.i.d. with $(Y,D,Z)\in\mathbf{R}\times[0,1]\times\mathbf{R}$ distributed according to $P\in\mathbf{P}$; (ii) For $\Theta$ as in (S.195) and each $P\in\mathbf{P}_0$ there exists a unique $\theta_0(P)\in\Theta$ satisfying (S.191); (iii) The distribution of $Y$ conditional on $(D,Z)$ is absolutely continuous with density $f_{Y|DZ,P}(\cdot|D,Z)$ that is bounded and Lipschitz uniformly in $(D,Z)$ and $P\in\mathbf{P}$; (iv) Assumptions S.8.1 and S.8.2 hold.
Assumption S.6.13. (i) $\sup_d\|p_n^{j_n}(d)\|_2 \lesssim \sqrt{j_n}$; (ii) $E_P[p_n^{j_n}(D)p_n^{j_n}(D)']$ has eigenvalues bounded away from zero and infinity uniformly in $P\in\mathbf{P}$ and $n$; (iii) For each $P\in\mathbf{P}_0$ there is a $\Pi_n\theta_0(P)\in\Theta_n\cap R$ with $\sup_{P\in\mathbf{P}_0}\|E_P[(1\{Y\le\Pi_n\theta_0(P)(D)\}-1\{Y\le\theta_0(P)(D)\})q_n^{k_n}(Z)]\|_p = O((n\log(n))^{-1/2})$.
Assumption S.6.14. (i) $\inf_{P\in\mathbf{P}_0}\inf_{\theta\in\Theta:\|\theta-\theta_0(P)\|_{1,\infty}\ge\varepsilon}E_P[(P(Y\le\theta(D)|Z)-\tau)^2] > 0$ for every $\varepsilon > 0$; (ii) There are $\varepsilon$ and $s_n > 0$ satisfying, for all $P\in\mathbf{P}_0$ and $\|\theta-\Pi_n\theta_0(P)\|_{1,\infty}\le\varepsilon$, $s_n\le\mathrm{sing}\{E_P[f_{Y|D,Z}(\theta(D)|D,Z)q_n^{k_n}(Z)p_n^{j_n}(D)']\}$ and $s_n = O(1)$.
Assumption S.6.15. (i) $\max_{1\le k\le k_n}\|q_{k,n}\|_\infty = O(1)$; (ii) $\max_{1\le k\le k_n}\|q_{k,n}\|_{1,\infty} = O(k_n)$; (iii) $E_P[q_n^{k_n}(Z)q_n^{k_n}(Z)']$ has eigenvalues bounded away from zero and infinity uniformly in $P\in\mathbf{P}$ and $n$; (iv) For each $\theta\in\Theta$ there is a $\pi_n(\theta)\in\mathbf{R}^{k_n}$ such that $E_P[(E_P[\rho(X,\theta)|Z]-q_n^{k_n}(Z)'\pi_n(\theta))^2] = o(1)$ uniformly in $P\in\mathbf{P}$ and $\theta\in\Theta$; (v) $k_n^{1/p}\sqrt{j_n}\log^{3/2}(n)(n^{1/6}\vee k_n)/n^{1/3} = o(1)$ and $j_n\log^{5/2}(1+k_n)k_n^{2/p+1/2}/s_n\sqrt{n} = o((\log(n))^{-1})$.
Assumption S.6.16. (i) $\|\hat\Sigma_n-\Sigma_n(P)\|_{o,p} = o_P((k_n^{1/p}\log(n))^{-1})$ uniformly in $P\in\mathbf{P}$; (ii) $\Sigma_n(P)$ is invertible and $\|\Sigma_n(P)\|_{o,p}$ and $\|\Sigma_n(P)^{-1}\|_{o,p}$ are bounded uniformly in $P\in\mathbf{P}$ and $n$.
Besides imposing identification of $\theta_0(P)$, Assumption S.6.12 imposes regularity conditions on the distribution $P$ that enable us to apply the empirical process coupling results of Appendix S.8. Assumption S.6.13 states the requirements on the sieve $\Theta_n$, including demanding an asymptotically negligible bias in Assumption S.6.13(iii). Assumption S.6.14(i) holds pointwise in $P\in\mathbf{P}_0$ due to $\Theta$ being compact under $\|\cdot\|_{1,\infty}$, and hence the uniformity in $P\in\mathbf{P}_0$ demanded by Assumption S.6.14(i) corresponds to imposing strong identification. While Assumption S.6.14(i) delivers a uniform consistency result, Assumption S.6.14(ii) enables us to obtain a uniform rate of convergence under the norm $\|\cdot\|_{\mathbf{E}} = \sup_{P\in\mathbf{P}_0}\|\cdot\|_{P,2}$. As in Section S.6.2, $s_n$ can be shown to be related to the degree of ill-posedness; see Remarks S.6.1 and S.6.2. Assumptions S.6.15(i)-(iv) impose conditions on the transformed moments $\{q_{k,n}\}_{k=1}^{k_n}$, including that they be bounded – this requirement can be relaxed at the cost of more stringent rate restrictions to ensure a coupling of the empirical process (see Lemma S.6.19). Finally, Assumption S.6.15(v) states our rate restrictions, which we note are easier to satisfy for higher values of $p$.
In this application we set $\|\cdot\|_{\mathbf{E}} = \sup_{P\in\mathbf{P}_0}\|\cdot\|_{P,2}$. Hence, for any $\theta = p_n^{j_n\prime}\beta\in\Theta_n\cap R$, the local parameter space $V_n(\theta,\ell)$ is in this application given by
$$V_n(\theta,\ell) = \Big\{p_n^{j_n\prime}\beta_h : \Big\|\theta+\frac{p_n^{j_n\prime}\beta_h}{\sqrt{n}}\Big\|_{2,\infty}\le C_0,\ \sup_{P\in\mathbf{P}_0}\|p_n^{j_n\prime}\beta_h\|_{P,2}\le\ell\sqrt{n},$$
$$\int_0^1\Big(\nabla\theta(u)+\frac{\nabla p_n^{j_n}(u)'\beta_h}{\sqrt{n}}\Big)^2du-\Big(\int_0^1\Big\{\nabla\theta(u)+\frac{\nabla p_n^{j_n}(u)'\beta_h}{\sqrt{n}}\Big\}du\Big)^2 = \lambda,$$
$$-\nabla^2\theta(d)-\frac{\nabla^2p_n^{j_n}(d)'\beta_h}{\sqrt{n}}\le 0 \text{ for all } d\in[0,1]\Big\}, \quad (S.197)$$
where the first two constraints impose that $\theta+p_n^{j_n\prime}\beta_h/\sqrt{n}\in\Theta_n$ and $\|p_n^{j_n\prime}\beta_h/\sqrt{n}\|_{\mathbf{E}}\le\ell$, while the final two constraints require that $\theta+p_n^{j_n\prime}\beta_h/\sqrt{n}\in R$. Similarly, here
$$V_n^u(\theta,\ell) = \Big\{p_n^{j_n\prime}\beta_h : \Big\|\theta+\frac{p_n^{j_n\prime}\beta_h}{\sqrt{n}}\Big\|_{2,\infty}\le C_0 \text{ and } \sup_{P\in\mathbf{P}_0}\|p_n^{j_n\prime}\beta_h\|_{P,2}\le\ell\sqrt{n}\Big\},$$
which corresponds to dropping the constraint $\theta+p_n^{j_n\prime}\beta_h/\sqrt{n}\in R$ in (S.197). We note that in this problem the residual function $\rho(\cdot,\theta)$ is not smooth in $\theta$. Nonetheless, Assumption 3.8 holds (see Lemma S.6.20) and $\mathbb{D}_{n,P}(\theta)$ is for any $h = p_n^{j_n\prime}\beta_h\in\mathbf{B}_n$ equal to
$$\mathbb{D}_{n,P}(\theta)[h] = E_P[q_n^{k_n}(Z)f_{Y|DZ,P}(\theta(D)|D,Z)p_n^{j_n}(D)'\beta_h]. \quad (S.198)$$
Our next result employs Theorem 3.2 to derive approximations to the finite sample distribution of the test statistics $I_n(R)$ and $I_n(R)-I_n(\Theta)$.
Theorem S.6.5. Let Assumptions S.6.12, S.6.13, S.6.14, S.6.15, and S.6.16 hold, $a_n = (\log(n))^{-1/2}$, and $\ell_n\downarrow 0$ satisfy $k_n^{1/p}\sqrt{j_n}\,\ell_n\log(1+k_n)\log(1/\ell_n) = o((\log(n))^{-1/2})$ and $\ell_n^2\sqrt{n}\,j_n\log(n) = o(1)$. Then: uniformly in $P\in\mathbf{P}_0$ it follows that
$$I_n(R) \le U_{n,P}(R|\ell_n)+o_P(a_n).$$
If in addition $k_n\log(1+k_n)\sqrt{j_n\log(n)}/s_n^2\sqrt{n} = o(1)$, then for any $\ell_n^u\downarrow 0$ satisfying $k_n^{1/p}\sqrt{j_n}\,\ell_n^u\log(1+k_n)\log(1/\ell_n^u) = o((\log(n))^{-1/2})$, $(\ell_n^u)^2\sqrt{n}\,j_n\log(n) = o(1)$, and $\sqrt{k_n\log(1+k_n)}/s_n\sqrt{n} = o(\ell_n^u)$, it follows uniformly in $P\in\mathbf{P}_0$ that
$$I_n(R)-I_n(\Theta) \le U_{n,P}(R|\ell_n)-U_{n,P}(\Theta|\ell_n^u)+o_P(a_n).$$
The first part of Theorem S.6.5 obtains an upper bound for $I_n(R)$ by relying on Theorem 3.2(i). In order to approximate the re-centered statistic $I_n(R)-I_n(\Theta)$, we cannot rely on an upper bound for $I_n(\Theta)$, as the resulting approximation could fail to control size. Therefore, the second part of Theorem S.6.5 instead relies on Theorem 3.2(ii) to obtain a valid approximation. Applying Theorem 3.2(ii), however, requires an additional rate condition in order to establish that the linearization of the moment conditions is asymptotically valid – i.e. we impose $R_n^2\times\mathcal{S}_n(\mathbf{L},\mathbf{E}) = o(a_nn^{-1/2})$. Finally, we also note that the second conclusion of Theorem S.6.5 in fact holds with equality whenever $\ell_n$ satisfies the same rate restrictions as $\ell_n^u$.
In order to provide bootstrap estimates for these distributional approximations, let
$$\hat\theta_n\in\arg\min_{\theta\in\Theta_n\cap R}Q_n(\theta) \qquad \hat\theta_n^u\in\arg\min_{\theta\in\Theta_n}Q_n(\theta).$$
Recalling that $\rho(\cdot,\theta)$ is as defined in (S.192), then note that $\mathbb{W}_n$ is here given by
$$\mathbb{W}_n\rho(\cdot,\theta)q_n^{k_n} \equiv \frac{1}{\sqrt{n}}\sum_{i=1}^n\omega_i\Big\{q_n^{k_n}(Z_i)(1\{Y_i\le\theta(D_i)\}-\tau)-\frac{1}{n}\sum_{j=1}^nq_n^{k_n}(Z_j)(1\{Y_j\le\theta(D_j)\}-\tau)\Big\}.$$
Since $\rho(\cdot,\theta)$ is not smooth in $\theta$, we estimate $\mathbb{D}_{n,P}(\theta)[h]$ through the numerical derivative
$$\hat{\mathbb{D}}_n(\theta)[h] \equiv \frac{1}{\sqrt{n}}\sum_{i=1}^nq_n^{k_n}(Z_i)\Big(1\Big\{Y_i\le\theta(D_i)+\frac{h(D_i)}{\sqrt{n}}\Big\}-1\{Y_i\le\theta(D_i)\}\Big).$$
An unappealing feature of $\hat{\mathbb{D}}_n(\theta)$ is that it is not linear in $h$ and thus may complicate computation of the bootstrap statistic. We note that, alternatively, a plug-in estimator that employs the analytical expression in (S.198) could be used – such an estimator, however, would necessitate estimating the density function $f_{Y|DZ,P}$ and thus require the introduction of additional bandwidths.
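The appeal of the numerical derivative is that it sidesteps density estimation entirely: perturbing $\theta$ by $h/\sqrt{n}$ and differencing the indicator moments implicitly integrates the conditional density. The sketch below is our own toy code (the instruments, the simulated design, and the exogenous choice $Z = D$ are illustrative assumptions), compared against the population expression in (S.198):

```python
import numpy as np

def d_n(theta, h, Y, D, Z, q_kn):
    # numerical derivative: (1/sqrt(n)) sum_i q^{kn}(Z_i)(1{Y_i <= theta + h/sqrt(n)} - 1{Y_i <= theta})
    n = len(Y)
    bumped = (Y <= theta(D) + h(D) / np.sqrt(n)).astype(float)
    base = (Y <= theta(D)).astype(float)
    return q_kn(Z).T @ (bumped - base) / np.sqrt(n)

rng = np.random.default_rng(1)
n = 50_000
D = rng.uniform(size=n); Z = D
Y = D + rng.normal(size=n)                 # so f_{Y|D}(theta(D)|D) = phi(0) at theta(d) = d
q_kn = lambda z: np.vander(z, 3)           # assumed instruments (z^2, z, 1)
est = d_n(lambda d: d, lambda d: np.ones_like(d), Y, D, Z, q_kn)
# population counterpart (S.198): E[q^{kn}(D) * phi(0) * 1], phi the standard normal density
pop = np.array([1 / 3, 1 / 2, 1.0]) / np.sqrt(2 * np.pi)
print(est, pop)                            # the two vectors should be close
```

With $n$ large the sample version approximates $E_P[q_n^{k_n}(Z)f_{Y|DZ,P}(\theta(D)|D,Z)h(D)]$ without any bandwidth choice, at the cost of $\hat{\mathbb{D}}_n(\theta)[h]$ being nonlinear in $h$.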
With regards to the local parameter space, we note that in this application the function $\Upsilon_G$ is linear. Therefore, for this problem the set $\hat{G}_n(\hat\theta_n)$ is simply given by
$$\hat{G}_n(\hat\theta_n) \equiv \Big\{\frac{h}{\sqrt{n}}\in\mathbf{B}_n : -\nabla^2\hat\theta_n(d)-\frac{\nabla^2h(d)}{\sqrt{n}}\le\max\{-\nabla^2\hat\theta_n(d),\,-r_n\} \text{ for all } d\in[0,1]\Big\}.$$
Employing that $\|\cdot\|_{\mathbf{B}} = \|\cdot\|_{2,\infty}$ and the expression for $\Upsilon_F$ in (S.194), the estimate for the local parameter space $V_n(\Pi_n\theta_0(P),\ell_n)$ in this application is given by
$$\hat{V}_n(\hat\theta_n,\ell_n) = \Big\{\frac{h}{\sqrt{n}}\in\mathbf{B}_n : \frac{h}{\sqrt{n}}\in\hat{G}_n(\hat\theta_n),\ \Big\|\frac{h}{\sqrt{n}}\Big\|_{2,\infty}\le\ell_n,$$
$$\int_0^1\Big(\nabla\hat\theta_n(u)+\frac{\nabla h(u)}{\sqrt{n}}\Big)^2du-\Big(\int_0^1\nabla\Big(\hat\theta_n(u)+\frac{h(u)}{\sqrt{n}}\Big)du\Big)^2 = \lambda\Big\},$$
where $\ell_n$ is chosen to satisfy conditions stated below. The bootstrap statistics $\hat{U}_n(R|\ell_n)$ and $\hat{U}_n(\Theta|\ell_n^u)$ for approximating the distributions in Theorem S.6.5 are then
$$\hat{U}_n(R|\ell_n) \equiv \inf_{\frac{h}{\sqrt{n}}\in\hat{V}_n(\hat\theta_n,\ell_n)}\|\mathbb{W}_n\rho(\cdot,\hat\theta_n)+\hat{\mathbb{D}}_n(\hat\theta_n)[h]\|_{\hat\Sigma_n,p}$$
$$\hat{U}_n(\Theta|\ell_n^u) \equiv \inf_{\|\frac{h}{\sqrt{n}}\|_{2,\infty}\le\ell_n^u}\|\mathbb{W}_n\rho(\cdot,\hat\theta_n^u)+\hat{\mathbb{D}}_n(\hat\theta_n^u)[h]\|_{\hat\Sigma_n,p}.$$
The following final assumption will enable us to establish bootstrap validity.
Assumption S.6.17. (i) The functions $\theta(d) = 1$ and $\theta(d) = d^2$ are in $\mathbf{B}_n$; (ii) $\|\theta_0(P)-\Pi_n\theta_0(P)\|_{2,\infty} = o(1)$ uniformly in $P\in\mathbf{P}_0$ and $\sup_{P\in\mathbf{P}_0}\|\theta_0(P)\|_{2,\infty} < C_0$; (iii) $k_n$ satisfies $k_n^{1/p+9/20} = o(n^{1/20}/\log(n))$; (iv) $\sup_d\|\nabla^2p_n^{j_n}(d)\|_2\vee\|\nabla p_n^{j_n}(d)\|_2 \lesssim j_n^{5/2}$; (v) $r_n$ and $\tilde\ell_n = \ell_n\vee\ell_n^u$ satisfy $k_n^{1/p}\sqrt{j_n}\,\tilde\ell_n\log(1+k_n)\log(1/\tilde\ell_n) = o((\log(n))^{-1/2})$, $\tilde\ell_n\sqrt{n}(\tilde\ell_n+j_n^{5/2}\sqrt{k_n\log(1+k_n)}/s_n) = o((\log(n))^{-1/2})$, and $j_n^{5/2}\sqrt{k_n\log(1+k_n)}/s_n\sqrt{n} = o(1\wedge r_n)$.
Assumption S.6.17(i) simply demands that the quadratic functions belong to Bn – a
condition we employ to verify Assumption 3.13. In turn Assumption S.6.17(ii) implies
that θ0(P ) and its approximation Πnθ0(P ) belong to the interior of Θ. Assumption
S.6.17(iii) imposes a sufficient condition for verifying the bootstrap coupling requirement
of Assumption 3.14. In particular, we establish Assumption 3.14 holds by applying the
results in Appendix S.9 to a Haar basis expansion. While condition S.6.17(iii) suffices
for verifying Assumption 3.14 in both the endogenous (Z 6= D) and exogenous (Z = D)
settings, we note in both cases the rate condition of Assumption S.6.17(iii) could be
improved.1 Finally, Assumption S.6.17(iv), which is satisfied for instance by B-splines, ensures that $\mathcal{S}_n(\mathbf{B},\mathbf{E}) \lesssim j_n^{5/2}$, while Assumption S.6.17(v) imposes the rate requirements on $\ell_n$, $\ell_n^u$, and $r_n$. Intuitively, these rate requirements demand that $\ell_n\downarrow 0$ sufficiently fast and that $r_n$ not tend to zero too quickly.
The next theorem establishes the validity of the bootstrap procedure.
Theorem S.6.6. Let Assumptions S.6.12, S.6.13, S.6.14, S.6.15, S.6.16, and S.6.17
hold and an = (log(n))−1/2. Then, it follows that there are sequences ˜n and ˜u
n such
1For instance under endogeneity, a better rate could be obtained by conducting a basis expansionusing the tensor product of a Haar Basis for (Y,D) and the functions qk,nknk=1.
that $\ell_n \asymp \tilde\ell_n$, $\ell_n^u \asymp \tilde\ell_n^u$, and uniformly in $P\in\mathbf{P}_0$ we have
\[
\hat U_n(R\,|\,\ell_n) \geq U^\star_{n,P}(R\,|\,\tilde\ell_n) + o_P(a_n)
\]
\[
\hat U_n(R\,|\,\ell_n) - \hat U_n(\Theta\,|\,\ell_n^u) \geq U^\star_{n,P}(R\,|\,\tilde\ell_n) - U^\star_{n,P}(\Theta\,|\,\tilde\ell_n^u) + o_P(a_n).
\]
Thus, Theorems S.6.5 and S.6.6 imply that as a critical value for $\hat I_n(R)$ we may employ
\[
\hat q_{1-\alpha}(\hat U_n(R\,|\,\ell_n)) \equiv \inf\{c : P(\hat U_n(R\,|\,\ell_n) \leq c\,|\,\{V_i\}_{i=1}^n) \geq 1-\alpha\}.
\]
If in addition $k_n\log(1+k_n)\sqrt{j_n}\log(n)/(s_n^2\sqrt{n}) = o(1)$ and $\sqrt{k_n\log(1+k_n)}/(s_n\sqrt{n}) = o(\ell_n^u)$ (so that the second conclusion of Theorem S.6.5 holds), then Theorem S.6.6 implies a valid test can be obtained by rejecting whenever $\hat I_n(R) - \hat I_n(\Theta)$ exceeds the critical value
\[
\hat q_{1-\alpha}(\hat U_n(R\,|\,\ell_n) - \hat U_n(\Theta\,|\,\ell_n^u)) \equiv \inf\{c : P(\hat U_n(R\,|\,\ell_n) - \hat U_n(\Theta\,|\,\ell_n^u) \leq c\,|\,\{V_i\}_{i=1}^n) \geq 1-\alpha\}.
\]
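Operationally, the conditional quantile above is approximated by Monte Carlo over bootstrap draws of the statistic: conditional on the data, the infimum over $c$ is attained at the $\lceil B(1-\alpha)\rceil$-th order statistic of $B$ bootstrap realizations. A minimal sketch of this quantile rule (the Gaussian draws below merely stand in for actual realizations of $\hat U_n(R|\ell_n)$, which depend on the model):

```python
import numpy as np

def bootstrap_critical_value(draws, alpha):
    """Smallest c such that the empirical conditional CDF of the bootstrap
    draws at c is at least 1 - alpha, i.e. the ceil(B*(1-alpha))-th order
    statistic of the B draws."""
    draws = np.sort(np.asarray(draws))
    B = draws.size
    k = int(np.ceil(B * (1.0 - alpha)))  # number of draws that must lie <= c
    return draws[k - 1]

rng = np.random.default_rng(0)
# Stand-ins for B bootstrap realizations of the statistic (illustrative only).
boot = rng.normal(size=999)
c = bootstrap_critical_value(boot, alpha=0.05)
# By construction at least 95% of the draws fall at or below c.
print(float(np.mean(boot <= c)) >= 0.95)
```

The same routine applies verbatim to draws of $\hat U_n(R|\ell_n) - \hat U_n(\Theta|\ell_n^u)$ when the restricted-minus-unrestricted statistic is used.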
Remark S.6.3. To illustrate the role of $\ell_n$, it is helpful to conduct a pointwise (in $P$) analysis, set $p = 2$, and connect our assumptions to the literature on estimation of conditional moment restriction models (Chen and Pouzo, 2012). We follow the literature in imposing a local curvature assumption, which in our application corresponds to
\[
\|E_P[(P(Y\leq h(D)|Z)-\tau)q_n^{k_n}(Z)]\|_2 \asymp \|E_P[f_{Y|DZ,P}(\theta(D)|D,Z)(\theta_0(P)(D)-h(D))q_n^{k_n}(Z)]\|_2 \tag{S.199}
\]
for all $h\in\Theta_n$ and $\theta\in\Theta$ that are in a neighborhood of $\theta_0(P)$. We further suppose the linear operator $h\mapsto E_P[f_{Y|DZ,P}(\theta_0(P)(D)|D,Z)h(D)|Z]$ is compact, in which case there exist orthonormal bases $\{\psi_j\}$ and $\{\phi_j\}$ and a sequence $\lambda_j\downarrow 0$ satisfying
\[
E_P[f_{Y|DZ,P}(\theta_0(P)(D)|D,Z)\phi_j(D)|Z] = \lambda_j\psi_j(Z). \tag{S.200}
\]
Setting $k_n\geq j_n$ with $k_n-j_n = O(1)$, $p^{j_n} = (\phi_1,\ldots,\phi_{j_n})'$, $q^{k_n} = (\psi_1,\ldots,\psi_{k_n})'$, and $\Pi_n^u\theta_0(P) = \sum_{j=1}^{j_n}\phi_j\beta_j$, we also suppose $\theta_0(P)$ admits an expansion
\[
\theta_0(P) = \sum_{j=1}^{\infty}\phi_j\beta_j \quad\text{with } |\beta_j| = O(j^{-\gamma_\beta}). \tag{S.201}
\]
Provided that the approximation errors of $\Pi_n\theta_0(P)$ (as in Assumption S.6.13(iii)) and $\Pi_n^u\theta_0(P)$ are of the same order, it then follows from (S.199) and (S.200) that
\[
\|E_P[(1\{Y\leq\Pi_n\theta_0(P)(D)\}-1\{Y\leq\theta_0(P)(D)\})q^{k_n}(Z)]\|_2 \lesssim \frac{\lambda_{j_n}}{j_n^{\gamma_\beta}} \tag{S.202}
\]
and $s_n\asymp\lambda_{j_n}$; i.e.\ $s_n$ is proportional to the reciprocal of the sieve measure of ill-posedness (Chen and Pouzo, 2012). As a result, if $\lambda_j\asymp j^{-\gamma_\lambda}$ and $\gamma_\beta > \max\{5/2,\,3-\gamma_\lambda\}$, then Theorem S.6.5 may be applied to couple $\hat I_n(R)$ by setting $j_n\asymp n^{\kappa}$ with $(2(\gamma_\lambda+\gamma_\beta))^{-1} < \kappa < \min\{(5+2\gamma_\lambda)^{-1},\,1/6\}$, while coupling $\hat I_n(R)-\hat I_n(\Theta)$ additionally requires $\gamma_\beta > 3/2+\gamma_\lambda$ and $\kappa < (3+4\gamma_\lambda)^{-1}$. In contrast, in the severely ill-posed case in which $\lambda_j\asymp\exp\{-\gamma_\lambda j\}$, the conditions of Theorem S.6.5 for coupling $\hat I_n(R)-\hat I_n(\Theta)$ are not satisfied. However, the conditions for coupling $\hat I_n(R)$ can still be met provided $\gamma_\beta > 4$ by setting $j_n = (\log(n)-\kappa\log(\log(n)))/2\gamma_\lambda$ with $7 < \kappa < 2\gamma_\beta-1$. Heuristically, while in the severely ill-posed case the rate of convergence is too slow to apply Theorem 3.2(ii) to $\hat I_n(R)$, Theorem 3.2(i) is nonetheless able to deliver a coupling upper bound for $\hat I_n(R)$ under suitable sequences $\ell_n$.
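The admissible range for the sieve growth exponent $\kappa$ in the mildly ill-posed case reduces to checking one interval. The sketch below (with hypothetical smoothness values $\gamma_\lambda$, $\gamma_\beta$ that are not taken from the paper) verifies that the window $(2(\gamma_\lambda+\gamma_\beta))^{-1} < \kappa < \min\{(5+2\gamma_\lambda)^{-1}, 1/6\}$ is nonempty precisely when $\gamma_\beta$ is large enough:

```python
def kappa_window(g_lam, g_beta):
    """Admissible exponents kappa for j_n ~ n^kappa when lambda_j ~ j^{-g_lam}
    and |beta_j| = O(j^{-g_beta}): returns the (lower, upper) endpoints of
    the interval for kappa."""
    lower = 1.0 / (2.0 * (g_lam + g_beta))
    upper = min(1.0 / (5.0 + 2.0 * g_lam), 1.0 / 6.0)
    return lower, upper

# Hypothetical smoothness values g_lam = 1, g_beta = 3 satisfy
# g_beta > max(5/2, 3 - g_lam), so the window (1/8, 1/7) is nonempty.
lo, up = kappa_window(1.0, 3.0)
print(lo < up)
# With g_beta = 2 the smoothness condition fails and the window is empty.
lo2, up2 = kappa_window(1.0, 2.0)
print(lo2 < up2)
```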
S.6.3.1 Proofs and Auxiliary Results
Proof of Theorem S.6.5: We establish the result by applying Theorem 3.2 to both $R = \Theta$ and $R$ corresponding to (S.193) and (S.194). To this end, note that Assumption 3.2 is directly imposed in Assumption S.6.12(i), Assumption 3.3(i) holds with $B_n = O(1)$ by Assumption S.6.15(i), Assumption 3.3(ii) is directly imposed by Assumption S.6.15(iii), and Assumption 3.3(iii) holds with $F_n = 1$ and $J_n = O(1)$ by Lemma S.6.18. We also note Assumption 3.4 is implied by Assumption S.6.16, while Lemma S.6.16 verifies Assumption 3.6 for both $R = \Theta$ and $R$ corresponding to (S.193) and (S.194). Next, we apply Lemma S.6.19 with $\pi_{0n} = O(1)$ and $\pi_{1n} = O(k_n)$ (which is possible by Assumptions S.6.15(i)(ii)) to obtain that Assumption 3.7 holds for both $R = \Theta$ and $R$ corresponding to (S.193) and (S.194) for any $a_n$ satisfying $k_n^{1/p}\sqrt{j_n}\log(n)(n^{1/6}\vee k_n)/n^{1/3} = o(a_n)$, and in particular it holds for $a_n\asymp(\log(n))^{-1/2}$ by Assumption S.6.15(v). We also note Assumptions 3.8 and 3.9 hold with $\|\cdot\|_{\mathbf L} = \|\cdot\|_\infty$, $\|\cdot\|_{\mathbf E} = \sup_{P\in\mathbf P_0}\|\cdot\|_{P,2}$, and $\kappa_\rho = 1/2$ by Lemma S.6.20. Finally, to verify Assumption 3.10, note $J_n = O(1)$, $B_n = O(1)$, and $\nu_n\asymp\sqrt{k_n}/(s_nk_n^{1/p})$ by Lemma S.6.16 imply in this application we have $R_n\asymp\sqrt{k_n\log(1+k_n)}/(s_n\sqrt n)$. Thus, Lemma S.6.18 and Assumption S.6.15(v) verify Assumption 3.10(i), Assumption S.6.13(iii) implies Assumption 3.10(ii), and Assumption S.6.16(i) implies Assumption 3.10(iii). Finally, since in this application $\|\cdot\|_{\mathbf L} = \|\cdot\|_\infty$ by Lemma S.6.20, Assumptions S.6.13(i)(ii) yield
\[
\sup_\beta\frac{\|p_n^{j_n\prime}\beta\|_\infty}{\sup_{P\in\mathbf P_0}\|p_n^{j_n\prime}\beta\|_{P,2}} \lesssim \frac{\sqrt{j_n}\|\beta\|_2}{\|\beta\|_2} = \sqrt{j_n}. \tag{S.203}
\]
Therefore, the condition $K_m\ell_n^2\times S_n(\mathbf L,\mathbf E) = o(a_nn^{-1/2})$ is equivalent to $\ell_n^2\sqrt{nj_n\log(n)} = o(1)$. Moreover, by Lemma S.6.18, Assumption S.6.13(ii), $\kappa_\rho = 1/2$, and $B_n = O(1)$, the condition $k_n^{1/p}\sqrt{\log(1+k_n)}B_n\times\sup_{P\in\mathbf P}J_{[\,]}(\ell_n^{\kappa_\rho},\mathcal F_n,\|\cdot\|_{P,2}) = o(a_n)$ is implied by the restriction $k_n^{1/p}\sqrt{j_n\ell_n\log(1+k_n)\log(1/\ell_n)} = o((\log(n))^{-1/2})$. Thus, the first claim of the theorem follows from Theorem 3.2(i) applied to $\hat I_n(R)$.
The second claim of the theorem follows from applying Theorem 3.2(ii) to both $\hat I_n(R)$ and $\hat I_n(\Theta)$. To this end, note that the only remaining condition to verify is that $R_n^2\times S_n(\mathbf L,\mathbf E) = o(a_nn^{-1/2})$. Using that, as already argued, $R_n\asymp\sqrt{k_n\log(1+k_n)}/(s_n\sqrt n)$ and result (S.203), we note that a sufficient condition for this final requirement is that $k_n\log(1+k_n)\sqrt{j_n}\log(n)/(s_n^2\sqrt n) = o(1)$, and therefore the theorem follows.
Proof of Theorem S.6.6: We proceed by relying on Theorem 3.3 and Corollary 3.3. To this end, we note that, for both $R=\Theta$ and $R$ corresponding to (S.193) and (S.194), the proof of Theorem S.6.5 established that Assumptions 3.2, 3.3, 3.4, 3.6, 3.7, 3.8, 3.9, and 3.10 are satisfied with $B_n = O(1)$, $J_n = O(1)$, $\|\cdot\|_{\mathbf E} = \sup_{P\in\mathbf P_0}\|\cdot\|_{P,2}$, $\|\cdot\|_{\mathbf L} = \|\cdot\|_\infty$, $\nu_n\asymp\sqrt{k_n}/(s_nk_n^{1/p})$, $R_n\asymp\sqrt{k_n\log(1+k_n)}/(s_n\sqrt n)$, $\kappa_\rho = 1/2$, and $S_n(\mathbf L,\mathbf E)\lesssim\sqrt{j_n}$.

Next, note that Assumptions 3.11, 3.12, and 3.13 are satisfied by Lemma S.6.21 with $K_g = 0$ and $K_f > 0$. To verify Assumption 3.14 we apply Lemma S.6.22 with $\pi_{0n} = O(1)$ and $\pi_{1n}\lesssim k_n$, which is possible by Assumptions S.6.15(i)(ii). In particular, Lemma S.6.22 evaluated at $d_n\asymp(nk_n)^{3/10}$ and Assumption S.6.17(iii) yield
\[
\sup_{\theta\in\Theta_n}\|\hat{\mathbb W}_n\rho(\cdot,\theta)q_n^{k_n}-\mathbb W^\star_{n,P}\rho(\cdot,\theta)q_n^{k_n}\|_p = o_P((\log(n))^{-1/2}) \tag{S.204}
\]
uniformly in $P\in\mathbf P$, which implies Assumption 3.14 is satisfied for both $R=\Theta$ and $R$ corresponding to (S.193) and (S.194). Next note that Assumption 3.15(i) is immediate since $\|\cdot\|_{\mathbf E} = \sup_{P\in\mathbf P_0}\|\cdot\|_{P,2}$ and $\|\cdot\|_{\mathbf B} = \|\cdot\|_{2,\infty}$. In order to verify the rate conditions of Assumption 3.16, note that the eigenvalues of $E_P[p_n^{j_n}(D)p_n^{j_n}(D)']$ being bounded away from zero uniformly in $P\in\mathbf P$ by Assumption S.6.13(ii), Assumptions S.6.13(i) and S.6.17(ii), together with $\|\cdot\|_{\mathbf B} = \|\cdot\|_{2,\infty}$ and $\|\cdot\|_{\mathbf E} = \sup_{P\in\mathbf P_0}\|\cdot\|_{P,2}$, yield
\[
S_n(\mathbf B,\mathbf E) = \sup_\beta\frac{\|p_n^{j_n\prime}\beta\|_{2,\infty}}{\sup_{P\in\mathbf P_0}\|p_n^{j_n\prime}\beta\|_{P,2}} \lesssim \frac{j_n^{5/2}\|\beta\|_2}{\|\beta\|_2} = j_n^{5/2}. \tag{S.205}
\]
Thus, we note Assumption 3.16(i) follows from Assumption S.6.17(v), result (S.205), and $R_n\asymp\sqrt{k_n\log(1+k_n)}/(s_n\sqrt n)$. Furthermore, we note $R_nS_n(\mathbf B,\mathbf E) = o(1)$, Assumption S.6.17(ii), and the definition of $\Theta$ in (S.195) imply Assumption 3.15(ii) is satisfied as well. Assumption 3.16(ii) similarly follows from Assumption S.6.17(v), result (S.205), $R_n\asymp\sqrt{k_n\log(1+k_n)}/(s_n\sqrt n)$, $\kappa_\rho = 1/2$, and Lemma S.6.18. Finally, Assumption 3.16(iii) also follows by Assumption S.6.17(v), result (S.205), $R_n\asymp\sqrt{k_n\log(1+k_n)}/(s_n\sqrt n)$, and $K_g = 0$. We have thus verified the conditions of Theorem 3.3 and Corollary 3.3, and the theorem follows.
Lemma S.6.15. Let Assumptions S.6.12(i)(iii), S.6.13(ii)(iii), S.6.14(i), S.6.15(i)(iii)(iv), and S.6.16 hold, and for either $R = \Theta$ or $R$ corresponding to (S.193) and (S.194) set
\[
\hat\Theta_n^r \equiv \arg\min_{\theta\in\Theta_n\cap R}\hat Q_n(\theta).
\]
If $k_n\log(1+k_n)/\sqrt n = o(1)$, then $\vec d_H(\hat\Theta_n^r,\Theta_{0n}^r,\|\cdot\|_{1,\infty}) = o_P(1)$ uniformly in $P\in\mathbf P_0$.
Proof: We establish the result by verifying the conditions of Lemma S.3.1. To this end, note that Assumption 3.2 is directly imposed in Assumption S.6.12(i), Assumption 3.3(i) holds with $B_n = O(1)$ by Assumption S.6.15(i), Assumption 3.3(iii) holds with $F_n = 1$ and $J_n = O(1)$ by Lemma S.6.18, Assumption 3.4 is implied by Assumption S.6.16, and Assumption 3.6(ii) follows from Assumption S.6.13(iii). Next, define
\[
V_n(P) \equiv \{\theta\in\Theta_n : \|\theta-\theta_0(P)\|_{1,\infty}\leq\varepsilon\} \tag{S.206}
\]
for any $\varepsilon > 0$ and $P\in\mathbf P_0$. Since for any $a\in\mathbf R^{k_n}$ we have $\|a\|_p\leq\|\Sigma_n^{-1}(P)\|_{o,p}\|\Sigma_n(P)a\|_p$ and $\|\Sigma_n^{-1}(P)\|_{o,p}$ is bounded uniformly in $n$ and $P\in\mathbf P$ by Assumption S.6.16(ii), we obtain from Lemma S.3.5 and Assumption S.6.13(iii) that
\[
S_n(\varepsilon) \equiv \inf_{P\in\mathbf P_0}\Big\{\inf_{\theta\in(\Theta_n\cap R)\setminus V_n(P)}Q_{n,P}(\theta)-\inf_{\theta\in\Theta_n\cap R}Q_{n,P}(\theta)\Big\} \gtrsim \inf_{P\in\mathbf P_0}\inf_{\theta\in(\Theta_n\cap R)\setminus V_n(P)}\frac{k_n^{1/p}}{\sqrt{k_n}}\|E_P[q_n^{k_n}(Z)\rho(X,\theta)]\|_2+O((n\log(n))^{-1/2}). \tag{S.207}
\]
We further note that the eigenvalues of $E_P[q_n^{k_n}(Z)q_n^{k_n}(Z)']$ being bounded uniformly in $n$ and $P\in\mathbf P$, together with Lemma S.4.5 and Assumption S.6.15(iv), yield
\[
\|E_P[q_n^{k_n}(Z)(E_P[\rho(X,\theta)|Z]-q_n^{k_n}(Z)'\pi_n(\theta))]\|_2^2 \lesssim E_P[(E_P[\rho(X,\theta)|Z]-q_n^{k_n}(Z)'\pi_n(\theta))^2] = o(1). \tag{S.208}
\]
Thus, since the eigenvalues of $(E_P[q_n^{k_n}(Z)q_n^{k_n}(Z)'])^{1/2}$ are bounded away from zero by Assumption S.6.15(iii), we obtain from result (S.208) that
\[
\inf_{P\in\mathbf P_0}\inf_{\theta\in(\Theta_n\cap R)\setminus V_n(P)}\|E_P[q_n^{k_n}(Z)\rho(X,\theta)]\|_2 \geq \inf_{P\in\mathbf P_0}\inf_{\theta\in(\Theta_n\cap R)\setminus V_n(P)}\|E_P[q_n^{k_n}(Z)q_n^{k_n}(Z)'\pi_n(\theta)]\|_2+o(1) \gtrsim \inf_{P\in\mathbf P_0}\inf_{\theta\in(\Theta_n\cap R)\setminus V_n(P)}(E_P[(q_n^{k_n}(Z)'\pi_n(\theta))^2])^{1/2}+o(1). \tag{S.209}
\]
Therefore, combining results (S.207) and (S.209), Assumption S.6.15(iv), and the set inclusion $(\Theta_n\cap R)\setminus V_n(P)\subseteq\{\theta\in\Theta : \|\theta-\theta_0(P)\|_{1,\infty}\geq\varepsilon\}$, we can conclude
\[
S_n(\varepsilon) \gtrsim \inf_{P\in\mathbf P_0}\inf_{\theta\in\Theta:\|\theta-\theta_0(P)\|_{1,\infty}\geq\varepsilon}\frac{k_n^{1/p}}{\sqrt{k_n}}(E_P[(P(Y\leq\theta(D)|Z)-\tau)^2])^{1/2}+o\Big(\frac{k_n^{1/p}}{\sqrt{k_n}}\Big). \tag{S.210}
\]
Since $J_n\asymp 1$ and $B_n\asymp 1$, Assumption S.6.14(i), result (S.210), and $k_n\log(1+k_n)/\sqrt n = o(1)$ by hypothesis imply that $k_n^{1/p}\sqrt{\log(1+k_n)}J_nB_n/\sqrt n = o(S_n(\varepsilon))$, as required by Lemma S.3.1, and thus the claim of the lemma follows.
Lemma S.6.16. Let Assumptions S.6.12(i)(iii), S.6.13(ii)(iii), S.6.14, S.6.15(i)(iii)(iv), and S.6.16 hold. For both $R=\Theta$ and $R$ corresponding to (S.193) and (S.194), it then follows that Assumption 3.6 is satisfied with $\|\cdot\|_{\mathbf E}\equiv\sup_{P\in\mathbf P_0}\|\cdot\|_{P,2}$ and $\nu_n\asymp\sqrt{k_n}/(s_nk_n^{1/p})$.

Proof: For $R=\Theta$ or $R$ corresponding to (S.193) and (S.194), we define $\hat\Theta_n^r$ to equal
\[
\hat\Theta_n^r \equiv \arg\min_{\theta\in\Theta_n\cap R}\hat Q_n(\theta),
\]
which corresponds to setting $\tau_n = 0$ in (11). For any $\varepsilon > 0$, we may then let $V_n(P)\equiv\{\theta\in\Theta_n\cap R : \|\theta-\Pi_n\theta_0(P)\|_{1,\infty}\leq\varepsilon\}$, and note Lemma S.6.15 implies $\hat\Theta_n^r\subseteq V_n(P)$ with probability tending to one uniformly in $P\in\mathbf P_0$. Further observe that since $\Pi_n\theta_0(P)\in\Theta_n$, there exists a $\beta_{0n}$ such that $\Pi_n\theta_0(P) = p_n^{j_n\prime}\beta_{0n}$. For any $\theta = p_n^{j_n\prime}\beta\in V_n(P)$, it then follows by Assumptions S.6.13(ii) and S.6.14(ii) that
\[
\sup_{P\in\mathbf P}\|p_n^{j_n\prime}(\beta-\beta_{0n})\|_{P,2} \lesssim \|\beta-\beta_{0n}\|_2 \leq \frac{1}{s_n}\times\inf_{P\in\mathbf P_0}\inf_{\|\theta-\Pi_n\theta_0(P)\|_{1,\infty}\leq\varepsilon}\|E_P[f_{Y|DZ,P}(\theta(D)|D,Z)q_n^{k_n}(Z)p_n^{j_n}(D)'(\beta-\beta_{0n})]\|_2.
\]
Hence, since $\|\cdot\|_{\mathbf E} = \sup_{P\in\mathbf P_0}\|\cdot\|_{P,2}$, the mean value theorem allows us to conclude that
\[
\frac{s_nk_n^{1/p}}{\sqrt{k_n}}\|p_n^{j_n\prime}(\beta-\beta_{0n})\|_{\mathbf E} \lesssim \frac{k_n^{1/p}}{\sqrt{k_n}}\|E_P[(P(Y\leq p_n^{j_n}(D)'\beta|D,Z)-P(Y\leq p_n^{j_n}(D)'\beta_{0n}|D,Z))q_n^{k_n}(Z)]\|_2 \lesssim Q_{n,P}(\theta)-Q_{n,P}(\Pi_n\theta_0(P))+O((n\log(n))^{-1/2}) \tag{S.211}
\]
for any $\theta = p_n^{j_n\prime}\beta\in V_n(P)$ and $P\in\mathbf P_0$, where the final inequality follows from Lemma S.3.5, $\|a\|_p\leq\|\Sigma_n(P)^{-1}\|_{o,p}\|\Sigma_n(P)a\|_p$ for any $a\in\mathbf R^{k_n}$, and Assumptions S.6.13(iii) and S.6.16(ii). Hence, (S.211) verifies that Assumption 3.6(i) is satisfied with $\nu_n\asymp\sqrt{k_n}/(s_nk_n^{1/p})$. Moreover, since $J_n = O(1)$ by Lemma S.6.18 and $B_n = O(1)$ by Assumption S.6.15(i), Assumptions S.6.13(iii) and S.6.16(i) imply Assumption 3.6(ii) is satisfied as well.
Lemma S.6.17. Let Assumption S.6.12(iii) hold. If $f(y,d) = 1\{y\leq\theta(d)\}-\tau$ for some $\theta\in\Theta$ ($\Theta$ as in (S.195)) and $z\mapsto q(z)$ is differentiable with bounded level and derivative, then there exists a $K<\infty$ independent of $f$, such that for all $P\in\mathbf P$ we have
\[
\varpi(fq,h,P) \leq K\times(\|q\|_\infty\sqrt h+\|q\|_{1,\infty}h).
\]

Proof: First note that since $\|f\|_\infty\leq 1$, we can obtain by direct calculation and the definition of the integral modulus of continuity in (S.309) the upper bound
\[
\varpi^2(fq,h,P) \leq 2\|q\|_\infty^2\varpi^2(f,h,P)+2\varpi^2(q,h,P). \tag{S.212}
\]
For $\Omega_Z(P)$ the support of $Z$ under $P$, the mean value theorem then implies that
\[
\varpi^2(q,h,P) \equiv \sup_{\|s\|_2\leq h}E_P[(q(Z+s)-q(Z))^21\{Z+s\in\Omega_Z(P)\}] \lesssim \sup_{\|s\|_2\leq h}(\|q\|_{1,\infty}\|s\|_2)^2 = \|q\|_{1,\infty}^2h^2. \tag{S.213}
\]
Furthermore, for any $(s_y,s_d)\in\mathbf R^2$ and $d\in[0,1]$ such that $d+s_d\in[0,1]$, we also note that the mean value theorem and Assumption S.6.12(iii) imply that
\[
E_P[(1\{Y+s_y\leq\theta(D+s_d)\}-1\{Y\leq\theta(D)\})^2|D=d] = |P(Y\leq\theta(D+s_d)-s_y|D=d)-P(Y\leq\theta(D)|D=d)| \lesssim |\theta(d+s_d)-s_y-\theta(d)|. \tag{S.214}
\]
Hence, by the law of iterated expectations, a second application of the mean value theorem, and employing that $\|\theta\|_{1,\infty}\leq C_0$ by definition of $\Theta$, we can conclude
\[
\varpi^2(f,h,P) \leq \sup_{\|(s_y,s_d)\|_2\leq h}E_P[(1\{Y+s_y\leq\theta(D+s_d)\}-1\{Y\leq\theta(D)\})^21\{D+s_d\in[0,1]\}] \lesssim \sup_{\|(s_y,s_d)\|_2\leq h}E_P[|\theta(D+s_d)-s_y-\theta(D)|1\{D+s_d\in[0,1]\}] \lesssim h. \tag{S.215}
\]
The claim of the lemma then follows from (S.212), (S.213), and (S.215).
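The order of the modulus bound can be illustrated by simulation: shifting the arguments of $f(y,d) = 1\{y\leq\theta(d)\}-\tau$ by an amount of size $h$ flips the indicator only on an event of probability of order $h$, so $\varpi(f,h,P)\lesssim\sqrt h$. A sketch assuming $(Y,D)$ uniform on $[0,1]^2$ and $\theta(d) = d$ (choices made purely for illustration and not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
Y, D = rng.uniform(size=n), rng.uniform(size=n)

def shifted_l2_sq(sy, sd):
    """Monte Carlo estimate of E[(1{Y + sy <= theta(D + sd)} - 1{Y <= theta(D)})^2]
    with theta(d) = d: the two indicators disagree only when Y falls between
    D and D + (sd - sy), an event of probability of order |sd - sy|."""
    f0 = (Y <= D).astype(float)
    f1 = (Y + sy <= D + sd).astype(float)
    return float(np.mean((f1 - f0) ** 2))

# The squared modulus grows linearly in the shift size, so the modulus itself
# is of order sqrt(h), matching the K * sqrt(h) term of the lemma.
est = shifted_l2_sq(0.0, 0.05)  # roughly 0.05 up to boundary effects
print(round(est, 3))
```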
Lemma S.6.18. Define the class $\mathcal F_n\equiv\{f : f(v) = 1\{y\leq\theta(d)\}-\tau\text{ for some }\theta\in\Theta_n\}$ for $\Theta_n$ as in (S.196), and suppose that Assumptions S.6.12(iii) and S.6.13(ii) hold. For $\zeta_{j_n}\equiv\sup_{d\in[0,1]}\|p_n^{j_n}(d)\|_2$, it then follows that for all $\varepsilon\leq 1$ and some $K<\infty$
\[
\sup_{P\in\mathbf P}N_{[\,]}(\varepsilon,\mathcal F_n,\|\cdot\|_{P,2}) \leq \exp\Big\{\frac K\varepsilon\Big\}\wedge\Big(\frac{K\sqrt{\zeta_{j_n}}}{\varepsilon}\Big)^{2j_n},
\]
and $\sup_{P\in\mathbf P}J_{[\,]}(\varepsilon,\mathcal F_n,\|\cdot\|_{P,2}) \lesssim \sqrt{1\wedge\varepsilon}\wedge\sqrt{j_n(\log(\zeta_{j_n})+\log(1\vee\varepsilon^{-1}))}(1\wedge\varepsilon)$.

Proof: We first note that if $\theta_L(d)\leq\theta(d)\leq\theta_U(d)$, then it immediately follows that
\[
1\{y\leq\theta_L(d)\}-\tau \leq 1\{y\leq\theta(d)\}-\tau \leq 1\{y\leq\theta_U(d)\}-\tau, \tag{S.216}
\]
which implies brackets for $\Theta_n$ readily yield brackets in $\mathcal F_n$. Moreover, by the mean value theorem and Assumption S.6.12(iii) we can in addition conclude that
\[
E_P[(1\{Y\leq\theta_L(D)\}-1\{Y\leq\theta_U(D)\})^2] = E_P[P(Y\leq\theta_U(D)|D)-P(Y\leq\theta_L(D)|D)] \lesssim E_P[|\theta_U(D)-\theta_L(D)|]. \tag{S.217}
\]
Hence, combining results (S.216) and (S.217) it follows that for some $M_0<\infty$ we have
\[
N_{[\,]}(\varepsilon,\mathcal F_n,\|\cdot\|_{P,2}) \leq N_{[\,]}\Big(\frac{\varepsilon^2}{M_0},\Theta_n,\|\cdot\|_{P,1}\Big). \tag{S.218}
\]
On the other hand, since $\Theta_n\subseteq\Theta$, we also obtain by Corollary 2.7.2 in van der Vaart and Wellner (1996), $\|\cdot\|_{P,1}\leq\|\cdot\|_\infty$, and inequality (S.218) that
\[
\sup_{P\in\mathbf P}N_{[\,]}(\varepsilon,\mathcal F_n,\|\cdot\|_{P,2}) \leq N_{[\,]}\Big(\frac{\varepsilon^2}{M_0},\Theta_n,\|\cdot\|_\infty\Big) \leq \exp\Big\{\frac{M_1}{\varepsilon}\Big\}. \tag{S.219}
\]
In addition, the Cauchy-Schwarz inequality implies $\sup_{P\in\mathbf P}\|p_n^{j_n\prime}(\beta_1-\beta_2)\|_{P,1}\leq\zeta_{j_n}\|\beta_1-\beta_2\|_2$ for any $\beta_1,\beta_2\in\mathbf R^{j_n}$. Therefore, defining $B_n\equiv\{\beta\in\mathbf R^{j_n} : p_n^{j_n\prime}\beta\in\Theta_n\}$, Theorem 2.7.11 in van der Vaart and Wellner (1996) allows us to conclude
\[
\sup_{P\in\mathbf P}N_{[\,]}(\varepsilon,\mathcal F_n,\|\cdot\|_{P,2}) \leq N\Big(\frac{\varepsilon^2}{2M_0\zeta_{j_n}},B_n,\|\cdot\|_2\Big). \tag{S.220}
\]
Further note that by Assumption S.6.13(ii), we have $\|p_n^{j_n\prime}\beta\|_{P,2}\asymp\|\beta\|_2$ uniformly in $P\in\mathbf P$ and $n$, and hence, since $\sup_{P\in\mathbf P}\|p_n^{j_n\prime}\beta\|_{P,2}\leq\|p_n^{j_n\prime}\beta\|_\infty\leq C_0$, it follows that
\[
\sup_{P\in\mathbf P}N_{[\,]}(\varepsilon,\mathcal F_n,\|\cdot\|_{P,2}) \leq \Big(\frac{M_2\sqrt{\zeta_{j_n}}}{\varepsilon}\Big)^{2j_n} \tag{S.221}
\]
for some $M_2<\infty$ due to result (S.220). The first claim of the lemma therefore follows from (S.219) and (S.221). Moreover, noting $N_{[\,]}(1,\mathcal F_n,\|\cdot\|_\infty) = 1$ we also obtain
\[
\sup_{P\in\mathbf P}J_{[\,]}(\varepsilon,\mathcal F_n,\|\cdot\|_{P,2}) \leq \Big(\int_0^{1\wedge\varepsilon}\Big(1+\frac Ku\Big)^{1/2}du\Big)\wedge\Big(\int_0^{1\wedge\varepsilon}\Big(1+2j_n\log\Big(\frac{K\sqrt{\zeta_{j_n}}}{u}\Big)\Big)^{1/2}du\Big) \lesssim \sqrt{1\wedge\varepsilon}\wedge\Big\{\sqrt{j_n\log(\zeta_{j_n})}+\int_0^1\Big(j_n\log\Big(\frac{1}{v(1\wedge\varepsilon)}\Big)\Big)^{1/2}dv\Big\}(1\wedge\varepsilon), \tag{S.222}
\]
where the second inequality follows by the change of variables $v = u/(1\wedge\varepsilon)$. The claim of the lemma thus follows from (S.222) and direct calculation.
Lemma S.6.19. Let Assumptions S.6.12(i)(iii)(iv) and S.6.13(ii) hold, let $\Theta_n$ be as in (S.196), set $\pi_{0n}\equiv\max_{1\leq k\leq k_n}\|q_k\|_\infty$ and $\pi_{1n}\equiv\max_{1\leq k\leq k_n}\|q_k\|_{1,\infty}$, and suppose $\log(k_n\vee\pi_{0n}\vee\sup_d\|p_n^{j_n}(d)\|_2) = O(\log(n))$. If $j_n/n = o(1)$, then Assumption 3.7 holds with $R = \Theta$ for any $a_n$ with $(\pi_{0n}k_n^{1/p}\log(n)\sqrt{j_n}/\sqrt n)(\sqrt{j_n}+\pi_{0n}n^{1/3}+\pi_{1n}n^{1/6}) = o(a_n)$.

Proof: We establish the lemma by applying Theorem S.8.1. To this end, define the class $\bar{\mathcal F}_n\equiv\{fq_k\text{ for some }f\in\mathcal F_n,\ 1\leq k\leq k_n\}$, and obtain the upper bound
\[
\sup_{\theta\in\Theta_n}\|\mathbb G_{n,P}\rho(\cdot,\theta)q_n^{k_n}-\mathbb W_{n,P}\rho(\cdot,\theta)q_n^{k_n}\|_p \leq \sup_{f\in\bar{\mathcal F}_n}k_n^{1/p}|\mathbb G_{n,P}f-\mathbb W_{n,P}f|. \tag{S.223}
\]
We proceed by applying Theorem S.8.1 to the class $\bar{\mathcal F}_n$ with $\delta_n\asymp\sqrt{j_n/n}$. Note that Assumptions S.8.1 and S.8.2 are directly imposed in Assumption S.6.12(iv), while Assumption S.8.3(i) is satisfied by Lemma S.6.17, and Assumption S.8.3(ii) holds with $K_n = \pi_{0n}$ since $\mathcal F_n$ has envelope 1. Furthermore, for $S_n$ as in (S.310), we have
\[
S_n^2 \lesssim \sum_{i=0}^{\lceil\log_2 n\rceil}2^i\Big(\frac{\pi_{0n}^2}{2^{i/3}}+\frac{\pi_{1n}^2}{2^{2i/3}}\Big) \lesssim \pi_{0n}^2n^{2/3}+\pi_{1n}^2n^{1/3}
\]
due to Lemma S.6.17. Also note that Lemmas S.3.3 and S.6.18 together imply
\[
\sup_{P\in\mathbf P}\log(N_{[\,]}(\delta_n,\bar{\mathcal F}_n,\|\cdot\|_{P,2})) \leq \sup_{P\in\mathbf P}\log\Big(k_nN_{[\,]}\Big(\frac{\delta_n}{\pi_{0n}},\mathcal F_n,\|\cdot\|_{P,2}\Big)\Big) \lesssim j_n\log(n),
\]
where we employed that $\log(k_n\vee\pi_{0n}\vee\sup_d\|p_n^{j_n}(d)\|_2) = O(\log(n))$ and $\delta_n\asymp\sqrt{j_n/n}$. Similarly, Lemmas S.3.3 and S.6.18 and the change of variables $v = u/\pi_{0n}$ yield
\[
\sup_{P\in\mathbf P}J_{[\,]}(\delta_n,\bar{\mathcal F}_n,\|\cdot\|_{P,2}) \leq \sup_{P\in\mathbf P}\int_0^{\delta_n}\Big(1+\log(k_n)+\log\Big(N_{[\,]}\Big(\frac{u}{\pi_{0n}},\mathcal F_n,\|\cdot\|_{P,2}\Big)\Big)\Big)^{1/2}du \leq \delta_n\sqrt{\log(k_n)}+\sup_{P\in\mathbf P}J_{[\,]}\Big(\frac{\delta_n}{\pi_{0n}},\mathcal F_n,\|\cdot\|_{P,2}\Big)\pi_{0n} \lesssim \frac{j_n\sqrt{\log(n)}}{\sqrt n},
\]
where in the final inequality we employed that $\log(\pi_{0n}) = O(\log(n))$, $\log(k_n) = O(\log(n))$, and $\log(\sup_d\|p_n^{j_n}(d)\|_2) = O(\log(n))$. Hence, Theorem S.8.1 implies
\[
\|\mathbb G_{n,P}-\mathbb W_{n,P}\|_{\bar{\mathcal F}_n} = O_P\Big(\frac{\pi_{0n}\log(n)\sqrt{j_n}}{\sqrt n}\big(\sqrt{j_n}+\pi_{0n}n^{1/3}+\pi_{1n}n^{1/6}\big)\Big)
\]
uniformly in $P\in\mathbf P$, which together with (S.223) establishes the lemma.
Lemma S.6.20. If Assumption S.6.12(iii) holds, then it follows that Assumptions 3.8 and 3.9 are satisfied when $R = \Theta$ ($\Theta$ as in (S.195)) with $\|\cdot\|_{\mathbf E} = \sup_{P\in\mathbf P_0}\|\cdot\|_{P,2}$, $\|\cdot\|_{\mathbf L} = \|\cdot\|_\infty$, $\kappa_\rho = 1/2$, $m_P(\theta)(Z)\equiv P(Y\leq\theta(D)|Z)$, and
\[
\nabla m_P(\theta)[h](Z) \equiv E_P[f_{Y|DZ,P}(\theta(D)|D,Z)h(D)|Z]. \tag{S.224}
\]

Proof: For $\theta_1\vee\theta_2$ and $\theta_1\wedge\theta_2$ the pointwise maximum and minimum of $\theta_1$ and $\theta_2$, note that the conditional density $f_{Y|DZ,P}$ being bounded in $(D,Z)$ and $P\in\mathbf P$ by Assumption S.6.12(iii), together with the mean value theorem, implies
\[
E_P[(\rho(X,\theta_1)-\rho(X,\theta_2))^2] = E_P[P(Y\leq\theta_1(D)\vee\theta_2(D)|D)-P(Y\leq\theta_1(D)\wedge\theta_2(D)|D)] \lesssim E_P[\theta_1(D)\vee\theta_2(D)-\theta_1(D)\wedge\theta_2(D)] \leq \sup_{P\in\mathbf P}\|\theta_1-\theta_2\|_{P,2},
\]
where in the final inequality we employed Jensen's inequality and that $\theta_1(d)\vee\theta_2(d)-\theta_1(d)\wedge\theta_2(d) = |\theta_1(d)-\theta_2(d)|$. It thus follows that Assumption 3.8 holds with $\|\cdot\|_{\mathbf E} = \sup_{P\in\mathbf P_0}\|\cdot\|_{P,2}$ and $\kappa_\rho = 1/2$. Moreover, Jensen's inequality and the mean value theorem imply, for some $\bar\theta$ such that $\bar\theta(d)$ is a convex combination of $\theta_1(d)$ and $\theta_0(d)$, that
\[
E_P[(P(Y\leq\theta_1(D)|Z)-P(Y\leq\theta_0(D)|Z)-\nabla m_P(\theta_0)[\theta_1-\theta_0](Z))^2] \leq E_P[((f_{Y|DZ,P}(\bar\theta(D)|D,Z)-f_{Y|DZ,P}(\theta_0(D)|D,Z))(\theta_1(D)-\theta_0(D)))^2] \lesssim \|\theta_1-\theta_0\|_\infty^2\times\sup_{P\in\mathbf P}E_P[(\theta_1(D)-\theta_0(D))^2],
\]
where the final inequality follows from $f_{Y|DZ,P}$ being Lipschitz uniformly in $(D,Z)$ and $P\in\mathbf P$. Hence, we may conclude Assumption 3.9(i) is satisfied with $\|\cdot\|_{\mathbf L} = \|\cdot\|_\infty$ and $\|\cdot\|_{\mathbf E} = \sup_{P\in\mathbf P_0}\|\cdot\|_{P,2}$. Furthermore, once again employing Jensen's inequality and that $f_{Y|DZ,P}$ is Lipschitz uniformly in $(D,Z)$ and $P\in\mathbf P$ yields
\[
E_P[(E_P[(f_{Y|DZ,P}(\theta_1(D)|D,Z)-f_{Y|DZ,P}(\theta_0(D)|D,Z))h(D)|Z])^2] \lesssim \|\theta_1-\theta_0\|_\infty^2\times\sup_{P\in\mathbf P}\|h\|_{P,2}^2, \tag{S.225}
\]
which implies Assumption 3.9(ii) is also satisfied under the stated choices of $\|\cdot\|_{\mathbf L}$ and $\|\cdot\|_{\mathbf E}$. Finally, we note Assumption 3.9(iii) is immediate due to Jensen's inequality and $f_{Y|DZ,P}$ being bounded uniformly in $(D,Z)$ and $P\in\mathbf P$.
Lemma S.6.21. If Assumption S.6.17(i) holds, $\mathbf B = C_B^2([0,1])$, and $\Upsilon_G$, $\Upsilon_F$, and $\Theta$ are as defined in (S.193) (with $\lambda\neq 0$), (S.194), and (S.195), then it follows that Assumptions 3.11, 3.12, and 3.13 are satisfied with $K_g = 0$, $\nabla\Upsilon_G(\theta)[h] = -\nabla^2h$, and
\[
\nabla\Upsilon_F(\theta)[h] = 2\int_0^1\theta(u)h(u)du-2\Big(\int_0^1\theta(u)du\Big)\Big(\int_0^1h(u)du\Big). \tag{S.226}
\]

Proof: Note that since $\Upsilon_G$ is linear and continuous, it immediately follows that Assumptions 3.11(i) and 3.11(ii) hold with $\nabla\Upsilon_G = \Upsilon_G$ and $K_g = 0$. It further follows from $\nabla\Upsilon_G = \Upsilon_G$ and the definitions of the operator norm $\|\cdot\|_o$ and $\|\cdot\|_{m,\infty}$ that
\[
\|\nabla\Upsilon_G(\theta)\|_o = \sup_{\|h\|_{2,\infty}=1}\|-\nabla^2h\|_\infty \leq 1, \tag{S.227}
\]
which implies Assumption 3.11(iii) holds with $M_g = 1$. Moreover, by direct calculation
\[
|\Upsilon_F(\theta_1)-\Upsilon_F(\theta_2)-\nabla\Upsilon_F(\theta_1)[\theta_1-\theta_2]| = \Big|\int_0^1(\theta_1(u)-\theta_2(u))^2du-\Big(\int_0^1(\theta_1(u)-\theta_2(u))du\Big)^2\Big| \leq \|\theta_1-\theta_2\|_{2,\infty}^2, \tag{S.228}
\]
which implies $\Upsilon_F$ is indeed Fréchet differentiable and its derivative is equal to $\nabla\Upsilon_F$ as defined in (S.226). In addition, by (S.226) and Jensen's inequality we have
\[
\|\nabla\Upsilon_F(\theta_1)-\nabla\Upsilon_F(\theta_2)\|_o = \sup_{\|h\|_{2,\infty}=1}2\Big|\int_0^1(\theta_1(u)-\theta_2(u))\Big(h(u)-\int_0^1h(v)dv\Big)du\Big| \leq 2\|\theta_1-\theta_2\|_{2,\infty}, \tag{S.229}
\]
which together with (S.228) implies Assumptions 3.12(i) and 3.12(ii) hold with $K_f = 2$. Next, note that since $\lambda\neq 0$ it follows that $\mathbf F_n = \mathbf R$. For any $\theta\in\mathbf B_n$ such that $\Upsilon_F(\theta)\neq 0$, we then define $\nabla\Upsilon_F(\theta)^-:\mathbf F_n\to\mathbf B_n$ to be given (for any $c\in\mathbf R$) by
\[
\nabla\Upsilon_F(\theta)^-[c](d) \equiv c\times\frac{\theta(d)-\int_0^1\theta(u)du}{2\Upsilon_F(\theta)}, \tag{S.230}
\]
and note that since $\theta\in\mathbf B_n$ and the constant function is in $\mathbf B_n$ by Assumption S.6.17(i), it follows that $\nabla\Upsilon_F(\theta)^-[c]\in\mathbf B_n$. Moreover, since $\nabla\Upsilon_F(\theta)^-[c]$ integrates to zero over $[0,1]$, by direct calculation we obtain
\[
\nabla\Upsilon_F(\theta)\nabla\Upsilon_F(\theta)^-[c] = 2\int_0^1\theta(u)\Big(c\times\frac{\theta(u)-\int_0^1\theta(v)dv}{2\Upsilon_F(\theta)}\Big)du = c\times\frac{2\Upsilon_F(\theta)}{2\Upsilon_F(\theta)} = c, \tag{S.231}
\]
which verifies $\nabla\Upsilon_F(\theta)^-$ is indeed the right inverse of $\nabla\Upsilon_F(\theta)$. In addition note that
\[
\|\nabla\Upsilon_F(\theta)^-\|_o = \sup_{|c|=1}\Big\|c\times\frac{\theta-\int_0^1\theta(u)du}{2\Upsilon_F(\theta)}\Big\|_{2,\infty} \leq \frac{\|\theta\|_{2,\infty}}{\Upsilon_F(\theta)}, \tag{S.232}
\]
and hence, since $\|\theta\|_{2,\infty}\leq C_0$ and $\Upsilon_F(\theta) = \lambda$ for any $\theta\in\Theta_{0n}^r(P)$, it follows that we may select an $\varepsilon>0$ such that Assumption 3.12(iv) holds with $M_f = 4C_0/\lambda$.

Next, let $\bar\theta$ be the function $d\mapsto d^2$ and note that by Assumption S.6.17(i) it follows that $\bar\theta\in\mathbf B_n$. For any $\theta_0\in\Theta_{0n}^r(P)$ we may then set $h_0$ to equal
\[
h_0 \equiv \frac{2\lambda}{C_0}\bar\theta-\frac{\nabla\Upsilon_F(\theta_0)[\bar\theta]}{C_0}\theta_0, \tag{S.233}
\]
which belongs to $\mathbf B_n$ since $\bar\theta,\theta_0\in\mathbf B_n$. Further observe $\nabla\Upsilon_F(\theta_0)[\theta_0] = 2\Upsilon_F(\theta_0) = 2\lambda$ due to $\theta_0\in R$, and hence by linearity of $\nabla\Upsilon_F(\theta_0)$ and (S.233) we can conclude that $h_0\in\mathbf B_n\cap N(\nabla\Upsilon_F(\theta_0))$. In addition, it also follows from $\Upsilon_G = \nabla\Upsilon_G$ that
\[
\Upsilon_G(\theta_0)(u)+\nabla\Upsilon_G(\theta_0)[h_0](u) = -\nabla^2\theta_0(u)\Big(1-\frac{\nabla\Upsilon_F(\theta_0)[\bar\theta]}{C_0}\Big)-\frac{4\lambda}{C_0} \leq -\frac{4\lambda}{C_0}, \tag{S.234}
\]
where the inequality results from $-\nabla^2\theta_0(u)\leq 0$ due to $\theta_0\in\Theta_{0n}^r(P)$, $\bar\theta(d) = d^2$, and $|\nabla\Upsilon_F(\theta_0)[\bar\theta]|\leq C_0$ by the Cauchy-Schwarz inequality and $\|\theta_0\|_{2,\infty}\leq C_0$ since $\theta_0\in\Theta_{0n}^r(P)$. By similar arguments and the triangle inequality we also have $\|h_0\|_{2,\infty}\leq 4\lambda/C_0+C_0$, and hence by (S.234) we conclude Assumption 3.13 is satisfied.
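The right-inverse identity (S.231) survives any discretization of the integral that is linear and maps constants to themselves, which makes it easy to verify numerically. A sketch on a midpoint grid (the grid size, test function, and constant $c$ are arbitrary illustrative choices):

```python
import numpy as np

# Midpoint grid on [0, 1]; integral(f) below is linear and maps the
# constant function to itself, which is all the identity needs.
u = (np.arange(10_000) + 0.5) / 10_000

def integral(f):
    return float(np.mean(f))

def upsilon_F(theta):
    # Variance functional: int theta^2 - (int theta)^2.
    return integral(theta**2) - integral(theta) ** 2

def grad_upsilon_F(theta, h):
    # Derivative (S.226): 2 int theta*h - 2 (int theta)(int h).
    return 2 * integral(theta * h) - 2 * integral(theta) * integral(h)

def right_inverse(theta, c):
    # (S.230): c * (theta - int theta) / (2 * Upsilon_F(theta)).
    return c * (theta - integral(theta)) / (2 * upsilon_F(theta))

theta = u**2          # the function d -> d^2 used in the proof
c = 3.7
h = right_inverse(theta, c)
print(abs(grad_upsilon_F(theta, h) - c) < 1e-9)  # (S.231): recovers c
```

Note that the second term of $\nabla\Upsilon_F$ vanishes on $h$ exactly because the right inverse is built from the demeaned $\theta$, mirroring the display in the proof.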
Lemma S.6.22. Suppose Assumptions S.6.12(i)(iii)(iv) and S.6.13(ii) hold, let $\Theta_n$ be as in (S.196), and let $\pi_{0n}\equiv\max_{1\leq k\leq k_n}\|q_k\|_\infty$ and $\pi_{1n}\equiv\max_{1\leq k\leq k_n}\|q_k\|_{1,\infty}$. For any sequence $d_n\uparrow\infty$ such that $d_n^2\log(1+d_n) = o(n)$ and $\delta_n\asymp d_n^{-1/6}+\pi_{1n}/(\pi_{0n}d_n^{1/3})$ satisfies $\delta_n\log(1+k_n) = o(1)$, it follows that uniformly in $P\in\mathbf P$ we have
\[
\sup_{\theta\in\Theta_n}\|\hat{\mathbb W}_n\rho(\cdot,\theta)q_n^{k_n}-\mathbb W^\star_{n,P}\rho(\cdot,\theta)q_n^{k_n}\|_p = O_P\Big(k_n^{1/p}d_n\pi_{0n}\frac{\sqrt{d_n\log(1+d_n)}}{\sqrt n}+\pi_{0n}k_n^{1/p}(\sqrt{\delta_n}+\sqrt n\exp\{-n\delta_n^3/2\})\Big).
\]
Proof: We first define the class $\bar{\mathcal F}_n\equiv\{fq_k\text{ for some }f\in\mathcal F_n,\ 1\leq k\leq k_n\}$ and note
\[
\sup_{\theta\in\Theta_n}\|\hat{\mathbb W}_n\rho(\cdot,\theta)q_n^{k_n}-\mathbb W^\star_{n,P}\rho(\cdot,\theta)q_n^{k_n}\|_p \leq \sup_{f\in\bar{\mathcal F}_n}k_n^{1/p}|\hat{\mathbb W}_nf-\mathbb W^\star_{n,P}f|. \tag{S.235}
\]
In what follows, we aim to establish the lemma by applying Theorem S.9.1 to the class $\bar{\mathcal F}_n$ by relying on a Haar basis expansion as in Lemmas S.8.1 and S.8.2. Specifically, note that by Assumption S.6.12(iv) and Lemma S.8.1, there exists a sequence of partitions $\Delta_n(P) = \{\Delta_{d,n}(P) : d = 1,\ldots,d_n\}$ of the support of $V\equiv(Y,D,Z)$ such that $P(\Delta_{d,n}(P))\asymp 1/d_n$. For any $1\leq d\leq d_n$ we then set $\{f_{d,n,P}\}_{d=1}^{d_n}$ to be given by
\[
f_{d,n,P}(V) \equiv \frac{1\{V\in\Delta_{d,n}(P)\}}{\sqrt{P(\Delta_{d,n}(P))}} \tag{S.236}
\]
and let $f_{n,P}^{d_n}(v)\equiv(f_{1,n,P}(v),\ldots,f_{d_n,n,P}(v))'$. Then note $E_P[f_{n,P}^{d_n}(V)f_{n,P}^{d_n}(V)'] = I_{d_n}$ for $I_{d_n}$ the $d_n\times d_n$ identity matrix. Thus, Assumption S.9.1(i) holds with $C_n = 1$, while by (S.236) and $P(\Delta_{d,n}(P))\asymp 1/d_n$, Assumption S.9.1(ii) is satisfied with $K_n = \sqrt{d_n}$.

We next aim to verify that Assumption S.9.2 is satisfied by setting $\beta_{n,P}(f)$ to be
\[
\beta_{n,P}(f) \equiv \begin{pmatrix} E_P[f(V)|V\in\Delta_{1,n}(P)]\sqrt{P(\Delta_{1,n}(P))} \\ \vdots \\ E_P[f(V)|V\in\Delta_{d_n,n}(P)]\sqrt{P(\Delta_{d_n,n}(P))} \end{pmatrix} \tag{S.237}
\]
for any $f\in\bar{\mathcal F}_n$. Then observe that by direct calculation, for any $f\in\bar{\mathcal F}_n$ we have that
\[
f_{n,P}^{d_n}(V)'\beta_{n,P}(f) = \sum_{d=1}^{d_n}E_P[f(V)|V\in\Delta_{d,n}(P)]1\{V\in\Delta_{d,n}(P)\}. \tag{S.238}
\]
Defining the class of residual functions $\mathcal G_{n,P}\equiv\{f-f_{n,P}^{d_n\prime}\beta_{n,P}(f) : f\in\bar{\mathcal F}_n\}$, then observe that since the functions $f\in\mathcal F_n$ have envelope 1, it follows the class $\bar{\mathcal F}_n$ has envelope $\pi_{0n}$, and hence by (S.238) and Jensen's inequality the class $\mathcal G_{n,P}$ has envelope $G_{n,P}\equiv 2\pi_{0n}$. Moreover, by Lemmas S.8.2 and S.6.17 we can in addition conclude
\[
\|f-f_{n,P}^{d_n\prime}\beta_{n,P}(f)\|_{P,2}^2 \lesssim \frac{\pi_{0n}^2}{d_n^{1/3}}+\frac{\pi_{1n}^2}{d_n^{2/3}},
\]
and hence it follows that $\|g\|_{P,2}\leq\delta_n\|G_{n,P}\|_{P,2}$ for all $g\in\mathcal G_{n,P}$ and $\delta_n$ satisfying
\[
\delta_n \asymp \frac{1}{d_n^{1/6}}+\frac{\pi_{1n}}{\pi_{0n}d_n^{1/3}}.
\]
Next, note that if $\underline f(V)\leq f(V)\leq\overline f(V)$ almost surely, then result (S.238) yields
\[
\underline f(V)-f_n^{d_n}(V)'\beta_{n,P}(\overline f) \leq f(V)-f_n^{d_n}(V)'\beta_{n,P}(f) \leq \overline f(V)-f_n^{d_n}(V)'\beta_{n,P}(\underline f), \tag{S.239}
\]
which implies brackets for $\bar{\mathcal F}_n$ can readily be employed to obtain brackets for $\mathcal G_{n,P}$. We further note that formula (S.238), the triangle inequality, and Jensen's inequality yield
\[
\|(\overline f-f_n^{d_n\prime}\beta_{n,P}(\underline f))-(\underline f-f_n^{d_n\prime}\beta_{n,P}(\overline f))\|_{P,2} = \|(\overline f-\underline f)+f_n^{d_n\prime}\beta_{n,P}(\overline f-\underline f)\|_{P,2} \leq 2\|\overline f-\underline f\|_{P,2}. \tag{S.240}
\]
Hence, results (S.239), (S.240), Lemma S.3.3, and $\bar{\mathcal F}_n$ having envelope $\pi_{0n}$ establish that
\[
\sup_{P\in\mathbf P}\log(N_{[\,]}(\varepsilon,\mathcal G_{n,P},\|\cdot\|_{P,2})) \leq \sup_{P\in\mathbf P}\log\Big(N_{[\,]}\Big(\frac\varepsilon2,\bar{\mathcal F}_n,\|\cdot\|_{P,2}\Big)\Big) \leq \log(k_n)+\sup_{P\in\mathbf P}\log\Big(N_{[\,]}\Big(\frac{\varepsilon}{2\pi_{0n}},\mathcal F_n,\|\cdot\|_{P,2}\Big)\Big) \lesssim \log(k_n)+\frac{\pi_{0n}}{\varepsilon}1\{\varepsilon\leq 2\pi_{0n}\}, \tag{S.241}
\]
where the final inequality follows for any $\varepsilon\leq 2\pi_{0n}$ by Lemma S.6.18, and for any $\varepsilon>2\pi_{0n}$ by observing that $\mathcal F_n$ is contained in the bracket $[-\tau,1-\tau]$, which has width 1, and hence $N_{[\,]}(1,\mathcal F_n,\|\cdot\|_{P,2}) = 1$. Recalling that $\mathcal G_{n,P}$ has envelope $G_{n,P}\equiv 2\pi_{0n}$, we can then obtain from result (S.241) the upper bound
\[
\sup_{P\in\mathbf P}J_{[\,]}(\delta_n\|G_{n,P}\|_{P,2},\mathcal G_{n,P},\|\cdot\|_{P,2}) \lesssim \int_0^{2\delta_n\pi_{0n}}\sqrt{1+\log(k_n)+\frac{\pi_{0n}}{\varepsilon}1\{\varepsilon\leq 2\pi_{0n}\}}\,d\varepsilon \lesssim \delta_n\pi_{0n}\sqrt{\log(1+k_n)}+\int_0^{2\delta_n\pi_{0n}}\sqrt{\frac{\pi_{0n}}{\varepsilon}}\,d\varepsilon \lesssim \pi_{0n}\sqrt{\delta_n}, \tag{S.242}
\]
where in the final inequality we employed that $\delta_n\log(1+k_n) = o(1)$ by hypothesis. Similarly, for $\eta_{n,P}\equiv 1+\log N_{[\,]}(\delta_n\|G_{n,P}\|_{P,2},\mathcal G_{n,P},\|\cdot\|_{P,2})$ we can conclude that
\[
\sqrt nE_P\Big[G_{n,P}(V)\exp\Big\{-\frac{n\delta_n^2\|G_{n,P}\|_{P,2}^2}{G_{n,P}^2(V)\eta_{n,P}}\Big\}\Big] \lesssim \sqrt n\pi_{0n}\exp\Big\{-\frac{n\delta_n^2}{1+\log(k_n)+\frac{1}{2\delta_n}}\Big\}1\{\delta_n\leq 1\} \leq \sqrt n\pi_{0n}\exp\Big\{-\frac{n\delta_n^3}{2}\Big\} \tag{S.243}
\]
where the second inequality holds for $n$ sufficiently large due to $\delta_n\log(1+k_n) = o(1)$. Together, results (S.242) and (S.243) verify Assumption S.9.2(i) is satisfied with $J_{1n}\asymp\pi_{0n}(\sqrt{\delta_n}+\sqrt n\exp\{-n\delta_n^3/2\})$. Finally, let $B_n\equiv\{\beta_{n,P}(f) : f\in\bar{\mathcal F}_n,\ P\in\mathbf P\}$ and note that definition (S.237), $P(\Delta_{d,n}(P))\asymp 1/d_n$, Jensen's inequality, and $\|f\|_\infty\leq\pi_{0n}$ imply $\|\beta_{n,P}(f)\|_2\leq\pi_{0n}$ for all $f\in\bar{\mathcal F}_n$ and $P\in\mathbf P$. It thus follows that $B_n$ is contained in a ball of radius $\pi_{0n}$, which together with the change of variables $u = \varepsilon/\pi_{0n}$ yields
\[
\int_0^\infty\sqrt{\log N(\varepsilon,B_n,\|\cdot\|_2)}\,d\varepsilon \lesssim \int_0^{\pi_{0n}}\sqrt{d_n\log\Big(\frac{\pi_{0n}}{\varepsilon}\Big)}\,d\varepsilon = \sqrt{d_n}\pi_{0n}\int_0^1\sqrt{\log\Big(\frac1u\Big)}\,du = O(\sqrt{d_n}\pi_{0n}). \tag{S.244}
\]
Result (S.244) hence verifies Assumption S.9.2(ii) is satisfied with $J_{2n}\asymp\sqrt{d_n}\pi_{0n}$. In summary, we have shown the conditions of Theorem S.9.1(ii) hold with $C_n = 1$, $K_n = \sqrt{d_n}$, $J_{1n}\asymp\pi_{0n}(\sqrt{\delta_n}+\sqrt n\exp\{-n\delta_n^3/2\})$, and $J_{2n}\asymp\sqrt{d_n}\pi_{0n}$. Theorem S.9.1(ii) thus allows us to conclude that uniformly in $P\in\mathbf P$ we have
\[
\sup_{f\in\bar{\mathcal F}_n}|\hat{\mathbb W}_nf-\mathbb W^\star_{n,P}f| = O_P\Big(d_n\pi_{0n}\frac{\sqrt{d_n\log(1+d_n)}}{\sqrt n}+\pi_{0n}(\sqrt{\delta_n}+\sqrt n\exp\{-n\delta_n^3/2\})\Big),
\]
which together with (S.235) establishes the claim of the lemma.
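The projection construction in (S.236)-(S.238) can be made concrete in the simplest case of an equal-cell partition of $[0,1)$ with $V$ uniform (an illustrative special case; the lemma's partitions come from Lemma S.8.1). The normalized indicators are exactly orthonormal, and $f_n^{d_n\prime}\beta_{n,P}(f)$ is the piecewise-constant conditional-mean approximation of $f$:

```python
import numpy as np

d_n = 8
# Fine grid standing in for the distribution of V ~ Uniform[0, 1).
v = (np.arange(80_000) + 0.5) / 80_000
cells = np.minimum((v * d_n).astype(int), d_n - 1)   # index of Delta_{d,n}

# f_{d,n}(V) = 1{V in Delta_d} / sqrt(P(Delta_d)) with P(Delta_d) = 1/d_n here.
F = np.sqrt(d_n) * (cells[:, None] == np.arange(d_n)[None, :])

# E[f f'] = Identity: the basis is orthonormal, as claimed after (S.236).
gram = F.T @ F / v.size
print(np.allclose(gram, np.eye(d_n)))

# Projection (S.238): sum_d E[f | Delta_d] 1{Delta_d}, here for f(v) = v.
f = v
beta = F.T @ f / v.size                  # coefficients as in (S.237)
proj = F @ beta                          # piecewise-constant approximation
# Cell means of f(v) = v are the cell midpoints 1/16, 3/16, ..., 15/16.
print(np.allclose(np.unique(np.round(proj, 6)), (2 * np.arange(d_n) + 1) / 16))
```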
S.6.4 Heterogeneity and Average Welfare
Below we include the proofs of Theorem 4.1, Lemma 4.1, and Theorem 4.2 stated in Section 4 of the main text, as well as the auxiliary results they require.
Proof of Theorem 4.1: We establish the claim of the theorem by verifying the conditions of Theorem 3.2(ii) for both $R$ corresponding to the constraints in (32) and (33)-(35) (to couple $\hat I_n(R)$) and $R = \Theta$ (to couple $\hat I_n(\Theta)$). To this end, note that Assumption 3.2 is imposed in Assumption 4.1(i), Assumption 3.3(i) holds with $B_n\asymp\sqrt{k_n}$ by Assumption 4.2(i), Assumption 3.3(ii) is directly imposed in Assumption 4.2(ii), and Assumption 3.3(iii) is satisfied with $J_n\asymp\sqrt{j_n\log(1+j_n)}$ by Lemma S.6.24 and $\mathcal F_n$ being $\|\cdot\|_\infty$-bounded uniformly in $n$. Further note Assumption 3.4 is implied by Assumption 4.3, while Assumption 3.6 is verified by Lemma S.6.23. The coupling requirement of Assumption 3.7 is satisfied for $R = \Theta$, and hence also for $R$ corresponding to (32) and (33)-(35), with $a_n = (\log(n))^{-1/2}$ by Lemma S.6.26, Assumption 4.2(iv), and $k_n\geq j_n$ by Assumption 4.2(iii). Moreover, Assumptions 3.8 and 3.9 also hold by Lemma S.6.25. It thus only remains to verify Assumption 3.10, and to this end we note Assumption 3.10(ii) is implied by Assumption 4.1(iv). Furthermore, as argued, $B_n\asymp\sqrt{k_n}$, $J_n\asymp\sqrt{j_n\log(1+j_n)}$, and $\nu_n\asymp\sqrt{j_n}$ by Lemma S.6.23, which allows us to conclude that Assumption 3.10(iii) follows from Assumption 4.3(i). By Lemma S.6.24 and $R_n\asymp k_nj_n\sqrt{\log(1+k_n)\log(1+j_n)/n}$, we then note that Assumption 3.10(i) is implied by Assumption 4.2(iv) and $\kappa_\rho = 1$, while the requirements $k_n^{1/p}\sqrt{\log(1+k_n)}B_n\sup_{P\in\mathbf P}J_{[\,]}(\ell_n^{\kappa_\rho},\mathcal F_n,\|\cdot\|_{P,2}) = o(a_n)$ and $R_n = o(\ell_n)$ are implied by $k_n\sqrt{j_n}\log^2(n)\ell_n = o(1)$ and $k_nj_n\log(n)/\sqrt n = o(\ell_n)$. Since $K_m = 0$ in this application, it follows all the conditions of Theorem 3.2(ii) hold for both $R = \Theta$ and $R$ corresponding to (32) and (33)-(35), and hence the theorem follows.
Proof of Lemma 4.1: The result essentially follows from Theorem 1 in Walkup and Wets (1969). To map our problem into their setting, note that since $\{\mu_{s,n}\}_{s=1}^{s_n}$ are orthogonal, every $\nu\in M_n(\mathbf B)$ can be identified with a unique $\pi\in\mathbf R^{s_n}$ through the relation $\nu = \sum_{s=1}^{s_n}\pi_s\mu_{s,n}$, e.g.\ by $\pi_s = \nu(S_{s,n})$ for $S_{s,n}$ the support of $\mu_{s,n}$. With some abuse of notation, for the remainder of the proof we therefore employ $\pi$ in place of $\nu$. Further note that, for any $\theta = (\{F_\ell(c_\ell|\cdot)\}_{\ell=1}^J,\pi)$, the restrictions $\Upsilon_G(\theta)\leq 0$, $\Upsilon_F^{(v)}(\theta) = 0$, and $\Upsilon_F^{(s)}(\theta) = 0$ depend only on $\pi$ and define a closed convex polyhedron in $\mathbf R^{s_n}$, which we denote by $K_n$. Next, define the map $\Lambda_n:\mathbf R^{s_n}\to\mathbf R^{JL}$ by
\[
\Lambda_n(\pi) = \sum_{s=1}^{s_n}\pi_s\Big(\int 1\{g(w_l,\eta)\leq c_\ell\}\mu_{s,n}(d\eta)\Big)_{1\leq\ell\leq J,\,1\leq l\leq L} \tag{S.245}
\]
and note that for any $(\{F_\ell(c_\ell|\cdot)\}_{\ell=1}^J,\pi) = \theta\in\Theta_n\cap R$, it follows by (40) that
\[
\Gamma_n(\theta) = K_n\cap\Lambda_n^{-1}(\{F_\ell(c_\ell|w_l)\}_{1\leq\ell\leq J,\,1\leq l\leq L}). \tag{S.246}
\]
Also observe that since $s_n > JL$ by hypothesis, the null space of $\Lambda_n$ must have dimension at least one. On the other hand, if the null space of $\Lambda_n$ has dimension $s_n$, then $\Gamma_n(\theta_1) = \Gamma_n(\theta_2)$ for any $\theta_1,\theta_2\in\Theta_n\cap R$ by result (S.246), and hence the conclusion of the lemma is immediate. We therefore assume without loss of generality that the null space of $\Lambda_n$ has dimension smaller than $s_n$, which allows us to apply Theorem 1 in Walkup and Wets (1969) to conclude there is a $C_n$ such that for any $\theta_1,\theta_2\in\Theta_n\cap R$ we have
\[
d_H(\Gamma_n(\theta_1),\Gamma_n(\theta_2),\|\cdot\|_2) \leq C_n\Big\{\sum_{\ell=1}^J\sum_{l=1}^L(F_{1\ell}(c_\ell|w_l)-F_{2\ell}(c_\ell|w_l))^2\Big\}^{1/2} \lesssim C_n\sum_{\ell=1}^J\|F_{1\ell}(c_\ell|\cdot)-F_{2\ell}(c_\ell|\cdot)\|_\infty, \tag{S.247}
\]
where the norm $\|\cdot\|_2$ on $\Gamma_n(\theta)$ is understood as the usual Euclidean norm on the corresponding $\pi\in\mathbf R^{s_n}$. On the other hand, also note that for any two measures $\nu = \sum_{s=1}^{s_n}\pi_{s,n}\mu_{s,n}$ and $\bar\nu = \sum_{s=1}^{s_n}\bar\pi_{s,n}\mu_{s,n}$ we have $\|\nu-\bar\nu\|_{TV} = \|\pi-\bar\pi\|_1$ due to the measures $\{\mu_{s,n}\}_{s=1}^{s_n}$ being orthogonal. Hence, since $\|a\|_1\leq\sqrt{s_n}\|a\|_2$ for any $a\in\mathbf R^{s_n}$ by the Cauchy-Schwarz inequality, result (S.247) allows us to conclude
\[
d_H(\Gamma_n(\theta_1),\Gamma_n(\theta_2),\|\cdot\|_{TV}) \lesssim \sqrt{s_n}C_n\sum_{\ell=1}^J\|F_{1\ell}(c_\ell|\cdot)-F_{2\ell}(c_\ell|\cdot)\|_\infty,
\]
which establishes the claim of the lemma by setting $\zeta_n = C_n\sqrt{s_n}$.
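The last two steps of the proof are elementary and easy to check numerically: for mixtures over mutually orthogonal measures the total variation distance equals the $\ell^1$ distance of the mixing weights, and $\|a\|_1\leq\sqrt{s_n}\|a\|_2$ by Cauchy-Schwarz. A sketch with measures supported on disjoint blocks of a common grid (an illustrative special case of orthogonality):

```python
import numpy as np

rng = np.random.default_rng(2)
s_n, cell = 6, 50
# mu_s: uniform probability vectors on disjoint blocks of a common grid,
# hence mutually orthogonal measures.
mus = np.zeros((s_n, s_n * cell))
for s in range(s_n):
    mus[s, s * cell:(s + 1) * cell] = 1.0 / cell

pi, pi_bar = rng.uniform(size=s_n), rng.uniform(size=s_n)
nu, nu_bar = pi @ mus, pi_bar @ mus

# Orthogonality turns the total variation of nu - nu_bar into the l1
# distance of the mixing weights: ||nu - nu_bar||_TV = ||pi - pi_bar||_1.
tv = float(np.sum(np.abs(nu - nu_bar)))
print(np.isclose(tv, np.linalg.norm(pi - pi_bar, 1)))

# Cauchy-Schwarz step used at the end: ||a||_1 <= sqrt(s_n) * ||a||_2.
a = pi - pi_bar
print(np.linalg.norm(a, 1) <= np.sqrt(s_n) * np.linalg.norm(a, 2) + 1e-12)
```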
Proof of Theorem 4.2: For any $\ell\geq 0$ let $V_n(\theta,\ell)\equiv V_n(\theta,+\infty)\cap\{h/\sqrt n\in\mathbf B_n : \|h/\sqrt n\|_{\mathbf E}\leq\ell\}$, and recall $\|\theta\|_{\mathbf E} = \max_{1\leq\ell\leq J}\|F_\ell(c_\ell|\cdot)\|_\infty$ for any $\theta = (\{F_\ell(c_\ell|\cdot)\}_{\ell=1}^J,\nu)\in\mathbf B$. Next define the random variables $\hat E_n(R|\ell_n)$ and $\hat E_n(\Theta|\ell_n^u)$ to be given by
\[
\hat E_n(R|\ell_n) = \inf_{\theta\in\hat\Theta_n^r}\inf_{\frac{h}{\sqrt n}\in V_n(\theta,\ell_n)}\Big\{\sum_{\ell=1}^J\|\hat{\mathbb W}_n\rho_\ell(\cdot,\theta)q_n^{k_n}+\hat{\mathbb D}_{\ell,n}[h]\|_{\hat\Sigma_{\ell,n},2}^2\Big\}^{1/2}
\]
\[
\hat E_n(\Theta|\ell_n^u) = \inf_{\|\frac{h}{\sqrt n}\|_{\mathbf E}\leq\ell_n^u}\Big\{\sum_{\ell=1}^J\|\hat{\mathbb W}_n\rho_\ell(\cdot,\hat\theta_n^u)q_n^{k_n}+\hat{\mathbb D}_{\ell,n}[h]\|_{\hat\Sigma_{\ell,n},2}^2\Big\}^{1/2},
\]
and note that for any sequences $\ell_n,\ell_n^u$ satisfying the conditions of the theorem, Assumption 4.4(iii) and Lemma S.6.31 together imply that $\hat U_n(R|+\infty) = \hat E_n(R|\ell_n)+o_P(a_n)$ and $\hat U_n(\Theta|+\infty) = \hat E_n(\Theta|\ell_n^u)+o_P(a_n)$ uniformly in $P\in\mathbf P_0$. Hence, to establish the theorem it suffices to show there are $\ell_n\asymp\tilde\ell_n$ and $\ell_n^u\asymp\tilde\ell_n^u$ such that
\[
\hat E_n(R|\ell_n) \geq U^\star_{n,P}(R|\tilde\ell_n)+o_P(a_n)
\]
\[
\hat E_n(R|\ell_n)-\hat E_n(\Theta|\ell_n^u) \geq U^\star_{n,P}(R|\tilde\ell_n)-U^\star_{n,P}(\Theta|\tilde\ell_n^u)+o_P(a_n) \tag{S.248}
\]
uniformly in $P\in\mathbf P_0$. To this end, we rely on Theorem S.5.1(ii) (for $\hat E_n(R|\ell_n)$) and Lemma S.5.5 (for $\hat E_n(\Theta|\ell_n^u)$). Also note that in the proof of Theorem 4.1 we showed Assumptions 4.1, 4.2, and 4.3 imply Assumptions 3.1-3.10 hold with $B_n\asymp\sqrt{k_n}$, $J_n\asymp\sqrt{j_n\log(1+j_n)}$, $\nu_n\asymp\sqrt{j_n}$, $R_n\asymp k_nj_n\sqrt{\log(1+k_n)\log(1+j_n)/n}$, $a_n = (\log(n))^{-1/2}$, $\kappa_\rho = 1$, $\|\theta\|_{\mathbf L} = \|\theta\|_{\mathbf E} = \max_{1\leq\ell\leq J}\|F_\ell(c_\ell|\cdot)\|_\infty$, and $\|\theta\|_{\mathbf B} = \sum_{\ell=1}^J\|F_\ell(c_\ell|\cdot)\|_\infty+\|\nu\|_{TV}$ for $R = \Theta$ and $R$ corresponding to the constraints in (32) and (33)-(35).

In order to apply Theorem S.5.1(ii), we note Assumption 4.4(i) and Lemma S.6.27 verify Assumptions 3.11, 3.12, and 3.13 are satisfied with $K_g = 0$, $M_g = 1$, and $K_f = 0$, while Assumption 4.4(iii) and Lemma S.6.30 verify Assumption 3.14, and Assumption 3.15(i) is immediate given the definitions of $\|\cdot\|_{\mathbf E}$ and $\|\cdot\|_{\mathbf B}$. Also note Assumption 3.15(ii) follows from Theorem 3.1, $\nu_n\asymp\sqrt{j_n}$, and $\tau_n\sqrt{j_n} = o(1)$ by Assumption 4.4(v) implying that $\vec d_H(\hat\Theta_n^r,\Theta_{0n}^r(P),\|\cdot\|_{\mathbf E}) = o_P(1)$ uniformly in $P\in\mathbf P_0$ and therefore
\[
\liminf_{n\to\infty}\inf_{P\in\mathbf P}P(\{\theta\in\mathbf B_n : \vec d_H(\theta,\hat\Theta_n^r,\|\cdot\|_{\mathbf E})\leq\varepsilon\}\subseteq\Theta_n) \geq \liminf_{n\to\infty}\inf_{P\in\mathbf P}P(\{\theta\in\mathbf B_n : \vec d_H(\theta,\Theta_{0n}^r(P),\|\cdot\|_{\mathbf E})\leq 2\varepsilon\}\subseteq\Theta_n) = 1, \tag{S.249}
\]
where the final equality holds for any $\varepsilon<1/2$ by Assumption 4.4(iv), the definition of $\Theta_n$, and $\|F_{\ell,P}(c_\ell|\cdot)\|_\infty\leq 1$. Next, observe Lemma 4.1 and the definitions of $\|\cdot\|_{\mathbf E}$, $\|\cdot\|_{\mathbf L}$, and $\|\cdot\|_{\mathbf B}$ imply Assumption S.5.1 holds with $D_n(\mathbf B,\mathbf E)\asymp\zeta_n$ and $D_n(\mathbf L,\mathbf E) = 1$. Since $K_m = K_g = K_f = 0$ and $\Upsilon_F$ and $\Upsilon_G$ are affine, the only requirements imposed by Assumption S.5.2 are that $k_n^{1/p}\sqrt{\log(1+k_n)}B_n\sup_{P\in\mathbf P}J_{[\,]}(\ell_n^{\kappa_\rho}\vee(\nu_n\tau_n)^{\kappa_\rho},\mathcal F_n,\|\cdot\|_{P,2}) = o(a_n)$ and $(R_n+\nu_n\tau_n)D_n(\mathbf B,\mathbf E) = o(r_n)$, which are implied by Assumption 4.4(v), Lemma S.6.24, and $k_n\sqrt{j_n}\log^2(n)\ell_n = o(1)$ by hypothesis. Hence, all the conditions of Theorem S.5.1(ii) hold, which implies there is a $\tilde\ell_n\asymp\ell_n$ such that uniformly in $P\in\mathbf P_0$
\[
\hat E_n(R|\ell_n) \geq U^\star_{n,P}(R|\tilde\ell_n)+o_P(a_n). \tag{S.250}
\]
Finally, to apply Lemma S.5.5, note that in coupling $\hat E_n(\Theta|\ell_n^u)$ we can set $\|\cdot\|_{\mathbf B} = \|\cdot\|_{\mathbf E} = \max_{1\leq\ell\leq J}\|F_\ell(c_\ell|\cdot)\|_\infty$ and $\Upsilon_G(\theta) = \Upsilon_F(\theta) = 0$ for all $\theta\in\mathbf B$ (since $R = \Theta$). Hence, Assumptions 3.11, 3.12, 3.13, and 3.15(i) are immediate, while Assumption 3.14 is satisfied by Assumption 4.4(iii) and Lemma S.6.30. Further note that since $\Theta_0(P)$ is an equivalence class under $\|\cdot\|_{\mathbf E}$, when studying the unconstrained statistic we can treat the model as identified. As a result, we may set $\tau_n^u = 0$ and Assumption 3.15(ii) holds by arguments identical to (S.249). In order to apply Lemma S.5.5, it therefore only remains to verify Assumption 3.16 for the unconstrained problem. However, in this instance, the only condition imposed by Assumption 3.16 is that $k_n^{1/p}\sqrt{\log(1+k_n)}B_n\sup_{P\in\mathbf P}J_{[\,]}((\ell_n^u)^{\kappa_\rho},\mathcal F_n,\|\cdot\|_{P,2}) = o(a_n)$, which holds by Lemma S.6.24 and $k_n\sqrt{j_n}\log^2(n)\ell_n^u = o(1)$ by hypothesis. Thus, (S.250) and Lemma S.5.5 verify (S.248), which in turn establishes the theorem.
Lemma S.6.23. If Assumptions 4.1, 4.2(i)(iii), 4.3 hold, then Assumptions 3.5 and 3.6 hold with $R = \Theta$ and $R$ corresponding to (32), (33)-(35), $\|\theta\|_{\mathbf{E}} = \max_{1\leq\ell\leq\mathcal{J}}\|F_\ell(c_\ell|\cdot)\|_\infty$ for any $\theta = (\{F_\ell(c_\ell|\cdot)\}_{\ell=1}^{\mathcal{J}}, \nu) \in \mathbf{B}$, $V_n(P) = \Theta_n \cap R$, and $\nu_n^{-1} \asymp 1/\sqrt{j_n}$.
Proof: First note that Assumption 4.1(iv) and $\|\Sigma_{\ell,n}(P)\|_{o,2}$ being uniformly bounded in $P \in \mathbf{P}$ and $n$ by Assumption 4.3(ii) allow us to conclude that
\[
\sup_{P\in\mathbf{P}_0}\sup_{\theta\in\Theta_0(P)} Q_{n,P}(\Pi^r_n\theta) = O((n\log(n))^{-\frac{1}{2}}). \tag{S.251}
\]
Furthermore, since the class $\mathcal{F}_n$ has envelope $3$ due to $\|F_\ell(c_\ell|\cdot)\|_\infty \leq 2$ by definition of $\Theta$ in (30), Lemma S.6.24 allows us to set $\mathcal{J}_n \asymp \sqrt{j_n\log(1+j_n)}$. Since in addition Assumption 4.2(i) implies $B_n \asymp \sqrt{k_n}$ while Assumption 4.2(iii) implies $k_n \geq j_n$, we obtain that $\eta_n$ (as defined in Assumption 3.6) satisfies $\eta_n \asymp \sqrt{j_nk_n\log(1+k_n)}/\sqrt{n}$. Hence, $Q_{n,P}(\theta) \geq 0$ for all $\theta \in \mathbf{B}$ and (S.251) imply Assumption 3.5 holds.
In order to verify Assumption 3.6(i), let $\|\theta\|_{\mathbf{E}} = \max_{1\leq\ell\leq\mathcal{J}}\|F_\ell(c_\ell|\cdot)\|_\infty$ for any $\theta = (\{F_\ell(c_\ell|\cdot)\}_{\ell=1}^{\mathcal{J}}, \nu) \in \mathbf{B}$. Then note any $(\{F_\ell(c_\ell|\cdot)\}_{\ell=1}^{\mathcal{J}}, \nu) = \theta \in \Theta_n$ must be such that $F_\ell(c_\ell|\cdot) = p^{j_n\prime}_n\beta_{\ell,\theta}$ for some $\beta_{\ell,\theta} \in \mathbf{R}^{j_n}$ and, similarly, $\Pi^r_n\theta_0(P) = (\{F_{\ell,n}(c_\ell|\cdot)\}_{\ell=1}^{\mathcal{J}}, \nu_n)$ must satisfy $F_{\ell,n}(c_\ell|\cdot) = p^{j_n\prime}_n\beta_{\ell,n}$. The Cauchy-Schwarz inequality and Assumptions 4.1(ii)(iii) and 4.2(iii) then yield uniformly in $P \in \mathbf{P}$ that
\[
\|\theta - \Pi^r_n\theta_0(P)\|_{\mathbf{E}} \lesssim \sqrt{j_n}\sum_{\ell=1}^{\mathcal{J}}\|\beta_{\ell,\theta} - \beta_{\ell,n}\|_2 \lesssim \sqrt{j_n}\sum_{\ell=1}^{\mathcal{J}}\|E_P[q^{k_n}_n(W)p^{j_n}_n(W)'(\beta_{\ell,\theta} - \beta_{\ell,n})]\|_2 \lesssim \sqrt{j_n}\Big\{\sum_{\ell=1}^{\mathcal{J}}\|E_P[(F_\ell(c_\ell|W) - F_{\ell,n}(c_\ell|W))q^{k_n}_n(W)]\|^2_{\Sigma_{\ell,n}(P),2}\Big\}^{1/2}, \tag{S.252}
\]
where the final inequality holds due to $\|\Sigma_{\ell,n}(P)^{-1}\|_{o,2}$ being uniformly bounded by Assumption 4.3(ii) and $\sum_{\ell=1}^{\mathcal{J}}|a^{(\ell)}| \leq \sqrt{\mathcal{J}}\|a\|_2$ for any $(a^{(1)}, \ldots, a^{(\mathcal{J})})' = a \in \mathbf{R}^{\mathcal{J}}$. Hence, combining (S.251), (S.252), and the law of iterated expectations yields
\[
\frac{1}{\sqrt{j_n}}\|\theta - \Pi^r_n\theta_0(P)\|_{\mathbf{E}} \lesssim Q_{n,P}(\theta) - \sup_{\theta\in\Theta_0(P)} Q_{n,P}(\Pi^r_n\theta) + O((n\log(n))^{-1/2}). \tag{S.253}
\]
Since inequality (S.253) holds for any $\theta \in \Theta_n$ and, as already shown, $\eta_n \asymp \sqrt{j_nk_n\log(1+k_n)}/\sqrt{n}$, we conclude that Assumption 3.6(i) is satisfied with $\nu_n^{-1} = 1/\sqrt{j_n}$ and $V_n(P) = \Theta_n \cap R$. Finally, we note Assumption 3.6(ii) is implied by Assumption 4.3(i), result (S.251), and $\eta_n \asymp \sqrt{j_nk_n\log(1+k_n)}/\sqrt{n}$.
Lemma S.6.24. Define the class $\mathcal{F}_n \equiv \{f : f(v) = 1\{y \leq c_\ell\} - p^{j_n}_n(w)'\beta \text{ for some } 1 \leq \ell \leq \mathcal{J} \text{ and } \|p^{j_n\prime}_n\beta\|_\infty \leq 2\}$ and suppose that Assumptions 4.1(ii)(iii) hold. Then, it follows that $\sup_{P\in\mathbf{P}} N_{[\,]}(\epsilon, \mathcal{F}_n, \|\cdot\|_{P,2}) \lesssim (1 \vee (\sqrt{j_n}K/\epsilon))^{j_n}$ for some $K < \infty$, and in addition $\sup_{P\in\mathbf{P}} J_{[\,]}(\epsilon, \mathcal{F}_n, \|\cdot\|_{P,2}) \lesssim \epsilon\sqrt{j_n}(1 + \sqrt{\log(1 \vee (\sqrt{j_n}/\epsilon))})$.
Proof: First note that for any $p^{j_n\prime}_n\beta_1$ and $p^{j_n\prime}_n\beta_2$, the Cauchy-Schwarz inequality yields
\[
|p^{j_n}_n(w)'\beta_1 - p^{j_n}_n(w)'\beta_2| \leq \sup_w\|p^{j_n}_n(w)\|_2\|\beta_1 - \beta_2\|_2 \lesssim \sqrt{j_n}\|\beta_1 - \beta_2\|_2,
\]
where in the final inequality we employed Assumption 4.1(ii). Hence, Theorem 2.7.11 in van der Vaart and Wellner (1996), $\|\beta\|_2 \asymp \sup_{P\in\mathbf{P}}\|p^{j_n\prime}_n\beta\|_{P,2}$ by Assumption 4.1(iii), and $\sup_{P\in\mathbf{P}}\|p^{j_n\prime}_n\beta\|_{P,2} \leq \|p^{j_n\prime}_n\beta\|_\infty \leq 2$ for any $p^{j_n\prime}_n\beta \in \Theta_n$ imply
\[
\sup_{P\in\mathbf{P}} N_{[\,]}(\epsilon, \mathcal{F}_n, \|\cdot\|_{P,2}) \lesssim 1 \vee \Big(\frac{K\sqrt{j_n}}{\epsilon}\Big)^{j_n} \tag{S.254}
\]
for some $K < \infty$, which establishes the first claim of the lemma. For the second claim of the lemma, we employ (S.254) and the change of variables $v = u/\epsilon$ to obtain
\[
\sup_{P\in\mathbf{P}} J_{[\,]}(\epsilon, \mathcal{F}_n, \|\cdot\|_{P,2}) \lesssim \epsilon + \int_0^\epsilon\Big(\log\Big(1 \vee \Big(\frac{K\sqrt{j_n}}{u}\Big)^{j_n}\Big)\Big)^{1/2}du = \epsilon\Big(1 + \sqrt{j_n}\int_0^1\Big(\log\Big(1 \vee \Big(\frac{K\sqrt{j_n}}{v\epsilon}\Big)\Big)\Big)^{1/2}dv\Big) \lesssim \sqrt{j_n}\epsilon\Big(1 + \sqrt{\log(1 \vee (\sqrt{j_n}/\epsilon))}\Big),
\]
where the final inequality follows from $(1 \vee ab) \leq (1 \vee a)(1 \vee b)$ for any $a, b \in \mathbf{R}_+$.
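The change-of-variables computation above is easy to sanity-check numerically. The sketch below is a hypothetical illustration (not part of the proof): it evaluates the entropy integral $\int_0^\epsilon(\log(1\vee(K\sqrt{j_n}/u)^{j_n}))^{1/2}du$ by a midpoint rule and verifies it is dominated by the bound of Lemma S.6.24, with the constant 5 standing in for the $\lesssim$ and the factor $K$.

```python
import math

def sqrt_log_N(u, j, K):
    # square root of the bracketing entropy from (S.254): j * log(1 v (K*sqrt(j)/u))
    return math.sqrt(j * math.log(max(1.0, K * math.sqrt(j) / u)))

def entropy_integral(eps, j, K, steps=20000):
    # midpoint-rule approximation of the entropy integral over (0, eps]
    h = eps / steps
    return sum(sqrt_log_N((i + 0.5) * h, j, K) * h for i in range(steps))

def lemma_bound(eps, j):
    # sqrt(j) * eps * (1 + sqrt(log(1 v (sqrt(j)/eps)))), as in Lemma S.6.24
    return math.sqrt(j) * eps * (1.0 + math.sqrt(math.log(max(1.0, math.sqrt(j) / eps))))

for j in (2, 5, 20):
    for eps in (0.05, 0.3, 1.0):
        assert entropy_integral(eps, j, K=2.0) <= 5.0 * lemma_bound(eps, j)
```

The midpoint rule avoids the (integrable) logarithmic singularity at $u = 0$, so the check is stable for small $\epsilon$ as well.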
Lemma S.6.25. Let $\rho$ be as defined in (31). It then follows that Assumptions 3.8 and 3.9 hold with $\kappa_\rho = 1$, $K_\rho = \sqrt{\mathcal{J}}$, $K_m = 0$, $M_m < \infty$, $\|\cdot\|_{\mathbf{L}} = \|\cdot\|_{\mathbf{E}}$, and $\|\theta\|_{\mathbf{E}} = \max_{1\leq\ell\leq\mathcal{J}}\|F_\ell(c_\ell|\cdot)\|_\infty$ for any $\theta = (\{F_\ell(c_\ell|\cdot)\}_{\ell=1}^{\mathcal{J}}, \nu) \in \mathbf{B}$.

Proof: First note that for any $(\{F_{1,\ell}(c_\ell|\cdot)\}_{\ell=1}^{\mathcal{J}}, \nu_1) = \theta_1 \in \mathbf{B}$ and $(\{F_{2,\ell}(c_\ell|\cdot)\}_{\ell=1}^{\mathcal{J}}, \nu_2) = \theta_2 \in \mathbf{B}$, we obtain from (31) and the definition of $\|\cdot\|_{\mathbf{E}}$ that for all $P \in \mathbf{P}$
\[
E_P[\|\rho(X, \theta_1) - \rho(X, \theta_2)\|_2^2] = \sum_{\ell=1}^{\mathcal{J}} E_P[(F_{1,\ell}(c_\ell|W) - F_{2,\ell}(c_\ell|W))^2] \leq \mathcal{J}\|\theta_1 - \theta_2\|_{\mathbf{E}}^2,
\]
which verifies Assumption 3.8 holds with $\kappa_\rho = 1$ and $K_\rho = \sqrt{\mathcal{J}}$. Next, for any $P \in \mathbf{P}$ define $\nabla m_{P,\ell}(\theta)[h] = -F_{h,\ell}(c_\ell|W)$ for all $\theta \in \mathbf{B}$ and $(\{F_{h,\ell}(c_\ell|\cdot)\}_{\ell=1}^{\mathcal{J}}, \nu_h) = h \in \mathbf{B}$. Since $m_{P,\ell}(\theta) = P(Y \leq c_\ell|W) - F_\ell(c_\ell|W)$ for any $\theta = (\{F_\ell(c_\ell|\cdot)\}_{\ell=1}^{\mathcal{J}}, \nu) \in \mathbf{B}$, direct calculation verifies Assumption 3.9 holds with $K_m = 0$, $M_m = 1$, and $\|\cdot\|_{\mathbf{L}} = \|\cdot\|_{\mathbf{E}}$.
Lemma S.6.26. If $k_n^3j_n^3\log^2(n) = o(n)$ and Assumptions 4.1(i)-(iii) and 4.2(i) hold, then Assumption 3.7 holds with $R = \Theta$ for any $a_n$ with $k_n^3j_n^3\log^2(n)/n = o(a_n)$.

Proof: We establish the result by applying Lemma S.6.28. To this end, we let $\bar{j}_n = \mathcal{J} + j_n$, set $\{r_{j,n}\}_{j=1}^{\bar{j}_n} = \{1\{y \leq c_\ell\}\}_{\ell=1}^{\mathcal{J}} \cup \{p_{j,n}\}_{j=1}^{j_n}$, and let $r^{\bar{j}_n}_n(x) \equiv (r_{1,n}(x), \ldots, r_{\bar{j}_n,n}(x))'$. Next note that any $f \in \mathcal{F}_n$ may be written as $r^{\bar{j}_n\prime}_n\beta$ for some $\beta \in \mathbf{R}^{\bar{j}_n}$. Moreover, since $\sup_{P\in\mathbf{P}}\max_{1\leq\ell\leq\mathcal{J}}\|F_\ell(c_\ell|\cdot)\|_{P,2} \leq \max_{1\leq\ell\leq\mathcal{J}}\|F_\ell(c_\ell|\cdot)\|_\infty \leq 2$ for any $(\{F_\ell(c_\ell|\cdot)\}_{\ell=1}^{\mathcal{J}}, \nu) = \theta \in \Theta_n$, Assumption 4.1(iii) implies that there exists a $C_0 < \infty$ (independent of $n$) such that $\|\beta\|_2 \leq C_0$ whenever $r^{\bar{j}_n\prime}_n\beta \in \mathcal{F}_n$. Hence, by Assumptions 4.1(ii) and 4.2(i), we may apply Lemma S.6.28 with $b_{1n} \asymp \sqrt{k_n}$, $b_{2n} \asymp \sqrt{j_n}$, and $C_n = O(1)$, from which the claim of the present lemma immediately follows.
Lemma S.6.27. Let $\mathbf{B} = (\otimes_{\ell=1}^{\mathcal{J}} C_B(\mathcal{W})) \times M(\mathcal{B})$ and $\Theta$, $\Upsilon_G$, and $\Upsilon_F$ be as defined in (30), (32), and (33)-(35). If $\Psi(g, \cdot)$ is bounded on $\Omega$, then Assumptions 3.11, 3.12, and 3.13 are satisfied with $K_g = 0$, $M_g = 1$, $\nabla\Upsilon_G(\theta)[h] = \Upsilon_G(h)$, $K_f = 0$, and
\[
\nabla\Upsilon_F(\theta)[h] = (\Upsilon^{(e)}_F(h), \Upsilon^{(\nu)}_F(h) + 1, \Upsilon^{(s)}_F(h) + \lambda)'. \tag{S.255}
\]
Proof: For any measure $\nu \in M(\mathcal{B})$ let $\nu = \nu_+ - \nu_-$ denote its Jordan decomposition, $|\nu| = \nu_+ + \nu_-$, and recall the total variation of $\nu$ equals $\|\nu\|_{TV} = |\nu|(\Omega)$. Since $\Upsilon_G : \mathbf{B} \to \ell^\infty(\mathcal{B})$ is linear, in order to verify Assumption 3.11 we need only show that $\Upsilon_G$ is continuous. To this end, recall that for any $(\{F_\ell(c_\ell|\cdot)\}_{\ell=1}^{\mathcal{J}}, \nu) = \theta \in \mathbf{B}$ we had defined $\|\theta\|_{\mathbf{B}} = \sum_{\ell=1}^{\mathcal{J}}\|F_\ell(c_\ell|\cdot)\|_\infty + \|\nu\|_{TV}$. Hence, employing the definition of $\Upsilon_G$ we obtain
\[
\|\Upsilon_G\|_o = \sup_{\|\theta\|_{\mathbf{B}}=1}\|\Upsilon_G(\theta)\|_\infty = \sup_{\nu:\|\nu\|_{TV}=1}\sup_{B\in\mathcal{B}}|\nu(B)| \leq \sup_{\nu:\|\nu\|_{TV}=1}|\nu|(\Omega) = 1,
\]
which, by linearity of $\Upsilon_G$, implies Assumption 3.11 holds with $\nabla\Upsilon_G(\theta)[h] = \Upsilon_G(h)$, $K_g = 0$, and $M_g = 1$. By similar arguments, note that $\Upsilon^{(e)}_F : \mathbf{B} \to \mathbf{R}^{\mathcal{J}L}$, as defined in (33), is linear and its operator norm can be bounded by
\[
\|\Upsilon^{(e)}_F\|_o^2 = \sup_{\|\theta\|_{\mathbf{B}}=1}\sum_{\ell=1}^{\mathcal{J}}\sum_{l=1}^{L}\Big(F_\ell(c_\ell|w_l) - \int 1\{g(w_l, \eta) \leq c_\ell\}\nu(d\eta)\Big)^2 \leq \sum_{\ell=1}^{\mathcal{J}}\sum_{l=1}^{L}\Big\{2\sup_{\|F_\ell(c_\ell|\cdot)\|_\infty=1}(F_\ell(c_\ell|w_l))^2 + 2\sup_{\|\nu\|_{TV}=1}(|\nu|(\Omega))^2\Big\} = 4\mathcal{J}L. \tag{S.256}
\]
Moreover, note that for any bounded $f : \Omega \to \mathbf{R}$ and $\nu_1, \nu_2 \in M(\mathcal{B})$ it follows that
\[
\int_\Omega f(\eta)(\nu_1(d\eta) - \nu_2(d\eta)) \leq \|f\|_\infty|\nu_1 - \nu_2|(\Omega) = \|f\|_\infty\|\nu_1 - \nu_2\|_{TV},
\]
which implies $\Upsilon^{(\nu)}_F : \mathbf{B} \to \mathbf{R}$ and $\Upsilon^{(s)}_F : \mathbf{B} \to \mathbf{R}$ are Fréchet differentiable at any $\theta \in \mathbf{B}$ with $\nabla\Upsilon^{(\nu)}_F(\theta)[h] = \Upsilon^{(\nu)}_F(h) + 1$, $\nabla\Upsilon^{(s)}_F(\theta)[h] = \Upsilon^{(s)}_F(h) + \lambda$, $\|\nabla\Upsilon^{(\nu)}_F(\theta)\|_o \leq 1$, and $\|\nabla\Upsilon^{(s)}_F(\theta)\|_o \leq \|\Psi(g, \cdot)\|_\infty$. By employing (S.256) we may therefore conclude Assumptions 3.12(i)(ii)(iii) are satisfied with $\nabla\Upsilon_F(\theta)$ as in (S.255), $K_f = 0$, and $M_f^2 = 4\mathcal{J}L + 1 + \|\Psi(g, \cdot)\|_\infty^2$. Furthermore, note that (provided $R \neq \emptyset$) we may find a $\nu^\star$ such that $\nu^\star(\Omega) = 1$ and $\int\Psi(g, \eta)\nu^\star(d\eta) = \lambda$, which together with (S.255) implies the range of $\nabla\Upsilon_F(\theta)$ equals $\mathbf{F}_n$ and hence Assumption 3.12(iv) holds. Finally, we note Assumption 3.13 is immediate due to $\Upsilon_F$ being affine.
Lemma S.6.28. Let $\{r_{j,n}\}_{j=1}^{j_n}$ be a set of functions of $X$, $r^{j_n}_n(x) \equiv (r_{1,n}(x), \ldots, r_{j_n,n}(x))'$, $\mathcal{G}_n \equiv \{r^{j_n\prime}_n\beta : \beta \in \mathbf{R}^{j_n} \text{ with } \|\beta\|_2 \leq C_n\}$, and suppose $b_{1n} \equiv \max_{1\leq k\leq k_n}\|q_{k,n}\|_\infty$ and $b_{2n} \equiv \max_{1\leq j\leq j_n}\|r_{j,n}\|_\infty$ are finite for all $n$. If $\{X_i, Z_i\}_{i=1}^n$ is i.i.d. with $(X, Z)$ distributed according to $P \in \mathbf{P}$, then it follows that uniformly in $P \in \mathbf{P}$
\[
\sup_{g\in\mathcal{G}_n}\|\mathbb{G}_{n,P}gq^{k_n}_n - \mathbb{W}_{n,P}gq^{k_n}_n\|_2 = O_P\Big(\frac{C_n(k_nj_nb_{1n}b_{2n}\log(n))^2}{n}\Big).
\]
Proof: For notational simplicity, we first define a $k_n \times j_n$ matrix $E^{(1)}_{n,P}$ to be given by
\[
E^{(1)}_{n,P} \equiv \frac{1}{\sqrt{n}}\sum_{i=1}^n\{q^{k_n}_n(Z_i)r^{j_n}_n(X_i)' - E_P[q^{k_n}_n(Z)r^{j_n}_n(X)']\}.
\]
For any matrix $A$ let $\mathrm{vec}\{A\}$ denote a column vector consisting of the unique elements of $A$ and set $E_{n,P} \equiv \mathrm{vec}\{E^{(1)}_{n,P}\}$, noting that $E_{n,P}$ has dimension (at most) $j_nk_n$. Our first step is to couple $E_{n,P}$ to a normal vector $\mathbb{N}_{n,P}$. To this end, we note that
\[
\|\mathrm{vec}\{q^{k_n}_n(Z)r^{j_n}_n(X)' - E_P[q^{k_n}_n(Z)r^{j_n}_n(X)']\}\|_2 \leq 2\sqrt{j_nk_n}b_{1n}b_{2n}
\]
by definition of $b_{1n}$ and $b_{2n}$. Since the dimension of $E_{n,P}$ is at most $j_nk_n$, Theorem 1.1 in Zhai (2018) and Markov's inequality imply, provided the underlying probability space is suitably rich, that there is a Gaussian vector $\mathbb{N}_{n,P}$ such that
\[
\|E_{n,P} - \mathbb{N}_{n,P}\|_2 = O_P\Big(\frac{(k_nj_nb_{1n}b_{2n}\log(n))^2}{n}\Big) \tag{S.257}
\]
uniformly in $P \in \mathbf{P}$. Next observe that for any $g \in \mathcal{G}_n$ there exists a $\beta \in \mathbf{R}^{j_n}$ such that
\[
\mathbb{G}_{n,P}gq^{k_n}_n = E^{(1)}_{n,P}\beta.
\]
Hence, letting $\mathbb{N}^{(1)}_{n,P}$ denote the $k_n \times j_n$ matrix built from the corresponding entries of the normal vector $\mathbb{N}_{n,P}$, we define the Gaussian process $\mathbb{W}_{n,P}$ by setting
\[
\mathbb{W}_{n,P}gq^{k_n}_n = \mathbb{N}^{(1)}_{n,P}\beta
\]
for any $r^{j_n\prime}_n\beta = g \in \mathcal{G}_n$. Therefore, since $\|\beta\|_2 \leq C_n$ by definition of $\mathcal{G}_n$, and the operator norm is bounded by the Frobenius norm, we obtain from result (S.257) that
\[
\sup_{g\in\mathcal{G}_n}\|\mathbb{G}_{n,P}gq^{k_n}_n - \mathbb{W}_{n,P}gq^{k_n}_n\|_2 \leq \|E^{(1)}_{n,P} - \mathbb{N}^{(1)}_{n,P}\|_{o,2}C_n = O_P\Big(\frac{C_n(k_nj_nb_{1n}b_{2n}\log(n))^2}{n}\Big)
\]
uniformly in $P \in \mathbf{P}$, and hence the claim of the lemma follows.
Lemma S.6.29. Let $\{r_{j,n}\}_{j=1}^{j_n}$ be a set of functions of $X$, $r^{j_n}_n(x) \equiv (r_{1,n}(x), \ldots, r_{j_n,n}(x))'$, and suppose $\sup_z\|q^{k_n}_n(z)\|_2 \lesssim b_{1n}$, $\sup_x\|r^{j_n}_n(x)\|_2 \lesssim b_{2n}$, and $E_P[q^{k_n}_n(Z)q^{k_n}_n(Z)']$ and $E_P[r^{j_n}_n(X)r^{j_n}_n(X)']$ have eigenvalues bounded uniformly in $P \in \mathbf{P}$ and $n$. If $\{X_i, Z_i\}_{i=1}^n$ is i.i.d. with $(X, Z) \sim P \in \mathbf{P}$, then there is a $K < \infty$ such that for all $\delta \geq 0$
\[
\sup_{P\in\mathbf{P}} P\Big(\Big\|\frac{1}{n}\sum_{i=1}^n q^{k_n}_n(Z_i)r^{j_n}_n(X_i)' - E_P[q^{k_n}_n(Z)r^{j_n}_n(X)']\Big\|_{o,2} > \delta\Big) \leq (j_n + k_n)\exp\Big\{-\frac{n\delta^2}{K\{b_{1n}^2 \vee b_{2n}^2 + \delta b_{1n}b_{2n}\}}\Big\}. \tag{S.258}
\]
Proof: We first define a $k_n \times j_n$ random matrix $M_{i,n}$ satisfying $E_P[M_{i,n}] = 0$ by
\[
M_{i,n} \equiv \frac{1}{n}\{q^{k_n}_n(Z_i)r^{j_n}_n(X_i)' - E_P[q^{k_n}_n(Z)r^{j_n}_n(X)']\}.
\]
Since for any random matrix $A$ we have $\|E[A]\|_o \leq E[\|A\|_o]$ by Jensen's inequality, $\|A\|_o^2 \leq \mathrm{trace}\{A'A\}$, $\sup_z\|q^{k_n}_n(z)\|_2 \lesssim b_{1n}$, and $\sup_x\|r^{j_n}_n(x)\|_2 \lesssim b_{2n}$ imply
\[
\|M_{i,n}\|_o^2 \lesssim \Big\|\frac{1}{n}q^{k_n}_n(Z_i)r^{j_n}_n(X_i)'\Big\|_o^2 + E_P\Big[\Big\|\frac{1}{n}q^{k_n}_n(Z)r^{j_n}_n(X)'\Big\|_o^2\Big] \lesssim \frac{\sup_z\|q^{k_n}_n(z)\|_2^2 \times \sup_x\|r^{j_n}_n(x)\|_2^2}{n^2} \lesssim \frac{b_{1n}^2b_{2n}^2}{n^2}. \tag{S.259}
\]
Moreover, since the eigenvalues of $E_P[q^{k_n}_n(Z)q^{k_n}_n(Z)']$ are bounded uniformly in $P \in \mathbf{P}$ by assumption and $\sup_x\|r^{j_n}_n(x)\|_2 \lesssim b_{2n}$, it additionally follows that
\[
\sup_{P\in\mathbf{P}}\Big\|\sum_{i=1}^n E_P[M_{i,n}M_{i,n}']\Big\|_o \leq \sup_{P\in\mathbf{P}}\frac{2}{n}\|E_P[q^{k_n}_n(Z)q^{k_n}_n(Z)'\|r^{j_n}_n(X)\|_2^2]\|_o \lesssim \frac{b_{2n}^2}{n}. \tag{S.260}
\]
Identical arguments, but relying on the eigenvalues of $E_P[r^{j_n}_n(X)r^{j_n}_n(X)']$ being bounded uniformly in $P \in \mathbf{P}$ and $\sup_z\|q^{k_n}_n(z)\|_2 \lesssim b_{1n}$ by hypothesis, further yield that
\[
\sup_{P\in\mathbf{P}}\Big\|\sum_{i=1}^n E_P[M_{i,n}'M_{i,n}]\Big\|_o \lesssim \frac{b_{1n}^2}{n}. \tag{S.261}
\]
The claim of the lemma then follows from results (S.259), (S.260), and (S.261) allowing us to apply Theorem 1.6 in Tropp (2012) with $\sigma^2 \asymp (b_{1n}^2 \vee b_{2n}^2)/n$ and $R \asymp b_{1n}b_{2n}/n$.
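The exponential tail bound (S.258) implies in particular that the operator-norm deviation of the sample cross-moment matrix vanishes as $n$ grows when the bases are bounded. The following simulation is a hypothetical illustration of this (the trigonometric bases, dimensions, and sample sizes are our own choices, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def op_norm_dev(n, kn=4, jn=3):
    # ||(1/n) sum_i q(Z_i) r(X_i)' - E_P[q(Z) r(X)']||_{o,2} for bounded bases;
    # here q and r are cosine/sine bases of a single uniform draw on [0, 2*pi],
    # so the population cross-moment matrix is exactly zero by orthogonality.
    U = rng.uniform(0.0, 2.0 * np.pi, size=n)
    q = np.stack([np.cos((k + 1) * U) for k in range(kn)], axis=1)
    r = np.stack([np.sin((j + 1) * U) for j in range(jn)], axis=1)
    return np.linalg.norm(q.T @ r / n, 2)  # spectral (operator) norm

# average deviation over repetitions at two sample sizes
devs = [float(np.mean([op_norm_dev(n) for _ in range(20)])) for n in (200, 3200)]
assert devs[1] < devs[0]  # deviations shrink as n grows, as (S.258) predicts
```

Averaging over repetitions smooths the sampling noise so that the monotone decline in the deviation is visible even at moderate sample sizes.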
Lemma S.6.30. If Assumptions 4.1(i)-(iii) and 4.2(i)(ii) hold, and $j_n^2k_n^2\log(1 + j_nk_n) = o(n)$, then it follows that Assumption 3.14 holds with $R = \Theta$ for any sequence $a_n$ satisfying $k_n^{1/p}(k_n^2j_n^5\log^3(1 + k_nj_n)/n)^{1/4} = o(a_n)$.
Proof: Let $\mathcal{G}_n \equiv \{g : g(x) = 1\{y \leq c_\ell\} - p^{j_n\prime}_n\beta \text{ for some } 1 \leq \ell \leq \mathcal{J} \text{ and } \|p^{j_n\prime}_n\beta\|_\infty \leq 2\}$ and $\mathcal{F}_n \equiv \{gq_{k,n} : g \in \mathcal{G}_n \text{ and } 1 \leq k \leq k_n\}$. Then note that when $R = \Theta$ we obtain
\[
\sup_{f\in\mathcal{F}_n}\|\mathbb{W}_nfq^{k_n}_n - \mathbb{W}^{\star}_nfq^{k_n}_n\|_p \leq k_n^{1/p}\sup_{f\in\mathcal{F}_n}|\mathbb{W}_nf - \mathbb{W}^{\star}_{n,P}f|. \tag{S.262}
\]
We will therefore establish the lemma by employing (S.262) and applying Theorem S.9.1(i) to the class $\mathcal{F}_n$. To this end, let $f^{d_n}_n(v) \equiv q^{k_n}_n(z) \otimes (p^{j_n}_n(w)', 1\{y \leq c_1\}, \ldots, 1\{y \leq c_{\mathcal{J}}\})'$ and note $d_n = k_n(j_n + \mathcal{J})$. Next observe that applying Lemma S.6.13 with $D_1 \equiv (p^{j_n}_n(W)', 1\{Y \leq c_1\}, \ldots, 1\{Y \leq c_{\mathcal{J}}\})'$ and $D_2 = q^{k_n}_n(Z)$ allows us to conclude
\[
\sup_{P\in\mathbf{P}}\mathrm{eig}\{E_P[f^{d_n}_n(V)f^{d_n}_n(V)']\} \leq \sup_{P\in\mathbf{P}}\|\mathrm{eig}\{D_1D_1'\}\|_{P,\infty} \times \mathrm{eig}\{E_P[D_2D_2']\} \lesssim j_n,
\]
where the final inequality holds by Assumptions 4.1(ii) and 4.2(ii). Hence, it follows Assumption S.9.1(i) holds with $C_n = j_n$, while Assumption S.9.1(ii) is satisfied with $K_n = \sqrt{k_nj_n}$ by Assumptions 4.1(ii) and 4.2(i). Further note that by Assumption 4.1(iii) it follows that $\|\beta\|_2 \asymp \sup_{P\in\mathbf{P}}\|p^{j_n\prime}_n\beta\|_{P,2} \leq \|p^{j_n\prime}_n\beta\|_\infty$. Hence, by definition of $\mathcal{F}_n$, there is a $C_0 < \infty$ such that any $f \in \mathcal{F}_n$ satisfies $f = f^{d_n\prime}_n\beta$ for some $\beta$ in
\[
\mathcal{B}_n \equiv \{\beta \in \mathbf{R}^{d_n} : \beta = e_k \otimes \gamma \text{ for some } \gamma \in \mathbf{R}^{j_n+1} \text{ with } \|\gamma\|_2 \leq C_0\},
\]
where $e_k \in \mathbf{R}^{k_n}$ has its $k$th coordinate equal to one and all other coordinates equal to zero. In particular, it follows that Assumption S.9.2(i) is immediate with $G_{n,P}$ equal to the zero function and $J_{1n} = 0$. Moreover, setting $\mathcal{C}_n \equiv \{\gamma \in \mathbf{R}^{1+j_n} : \|\gamma\|_2 \leq C_0\}$, we can then conclude from the definition of $\mathcal{B}_n$ and $N(\epsilon, \mathcal{C}_n, \|\cdot\|_2) \lesssim 1 \vee (C_0/\epsilon)^{j_n}$ that
\[
\int_0^\infty\sqrt{\log(N(\epsilon, \mathcal{B}_n, \|\cdot\|_2))}\,d\epsilon \lesssim \int_0^{C_0}\sqrt{\log(k_n) + \log(N(\epsilon, \mathcal{C}_n, \|\cdot\|_2))}\,d\epsilon \lesssim \sqrt{\log(k_n)} + \sqrt{j_n},
\]
which verifies Assumption S.9.2(ii) is satisfied with $J_{2n} \asymp \sqrt{\log(k_n)} + \sqrt{j_n}$. Thus, applying Theorem S.9.1(i) with $K_n \asymp \sqrt{k_nj_n}$, $C_n \asymp j_n$, $d_n \lesssim k_nj_n$, $J_{1n} = 0$, and $J_{2n} \asymp \sqrt{\log(k_n)} + \sqrt{j_n}$ implies that uniformly in $P \in \mathbf{P}$ we have
\[
\sup_{f\in\mathcal{F}_n}|\mathbb{W}_nf - \mathbb{W}^{\star}_{n,P}f| = O_P\Big(\Big(\frac{k_n^2j_n^5\log^3(1 + k_nj_n)}{n}\Big)^{1/4}\Big) \tag{S.263}
\]
provided that $j_n^2k_n^2\log(1 + j_nk_n) = o(n)$. Since the latter condition is satisfied by hypothesis, the claim of the lemma then follows from (S.262) and (S.263).
Lemma S.6.31. Define $\|\theta\|_{\mathbf{E}} = \max_{1\leq\ell\leq\mathcal{J}}\|F_\ell(c_\ell|\cdot)\|_\infty$ and let $V_n(\theta, \ell_n) = V_n(\theta, +\infty) \cap \{h/\sqrt{n} : \|h/\sqrt{n}\|_{\mathbf{E}} \leq \ell_n\}$ for $V_n(\theta, +\infty)$ as in (39). If Assumptions 4.1, 4.2(i)(iii)(iv), and 4.3 hold, then for any $a_n$ and $\ell_n$ satisfying $k_n^4j_n^5\log^3(1 + k_nj_n)/n = o(a_n^4)$ and $k_nj_n\log(n)/\sqrt{n} = o(\ell_n)$ it follows uniformly in $P \in \mathbf{P}$ that
\[
U_n(R|{+\infty}) = \inf_{\theta\in\Theta^r_n}\inf_{\frac{h}{\sqrt{n}}\in V_n(\theta,\ell_n)}\Big\{\sum_{\ell=1}^{\mathcal{J}}\|\mathbb{W}_n\rho_\ell(\cdot, \theta)q^{k_n}_n + D_{\ell,n}[h]\|^2_{\Sigma_{\ell,n},2}\Big\}^{1/2} + o_P(a_n)
\]
\[
U_n(\Theta|{+\infty}) = \inf_{\|\frac{h}{\sqrt{n}}\|_{\mathbf{E}}\leq\ell_n}\Big\{\sum_{\ell=1}^{\mathcal{J}}\|\mathbb{W}_n\rho_\ell(\cdot, \theta^u_n)q^{k_n}_n + D_{\ell,n}[h]\|^2_{\Sigma_{\ell,n},2}\Big\}^{1/2} + o_P(a_n).
\]
Proof: We establish the claim by verifying the conditions of Lemma 3.1 under both $R = \Theta$ and $R$ corresponding to the constraints in (32) and (33)-(35). To this end, recall that in the proof of Theorem 4.1 we argued (for both specifications of $R$) that Assumptions 3.3(i)(iii) hold with $B_n \asymp \sqrt{k_n}$ and $\mathcal{J}_n \asymp \sqrt{j_n\log(1+j_n)}$ and that Assumption 3.4 is satisfied by Assumption 4.3. Further note Assumption 4.1(ii) and the Cauchy-Schwarz inequality imply for any $(\{p^{j_n\prime}_n\beta_{\ell,h}\}_{\ell=1}^{\mathcal{J}}, \nu) = h \in \mathbf{B}_n$ that
\[
\|h\|_{\mathbf{E}} \lesssim \max_{1\leq\ell\leq\mathcal{J}}\sqrt{j_n}\|\beta_{\ell,h}\|_2 \lesssim \sqrt{j_n}\Big\{\sum_{\ell=1}^{\mathcal{J}}\|D_{\ell,n,P}[h]\|_2^2\Big\}^{1/2} = \sqrt{j_n}\|D_{n,P}[h]\|_2, \tag{S.264}
\]
where the second inequality follows from $D_{\ell,n,P}[h] = -E_P[q^{k_n}_n(Z)p^{j_n}_n(W)'\beta_{\ell,h}]$ and the smallest singular value of $E_P[q^{k_n}_n(Z)p^{j_n}_n(W)']$ being bounded away from zero uniformly in $P \in \mathbf{P}$ by Assumption 4.2(iii). Since $\nu_n \asymp \sqrt{j_n}$ by Lemma S.6.23 and the derivative $D_{n,P}(\theta)$ does not depend on $\theta$, we conclude $\|h\|_{\mathbf{E}} \leq \nu_n\|D_{n,P}[h]\|_2$ for all $h \in \mathbf{B}_n$ -- i.e., in verifying the conditions of Lemma 3.1 we may set $A_n(P) = \Theta_n \cap R$. In order to verify condition (24) of Lemma 3.1 we note that since $\|h\|_{\mathbf{E}} \lesssim \max_{1\leq\ell\leq\mathcal{J}}\sqrt{j_n}\|\beta_{\ell,h}\|_2$ as shown in (S.264), the definitions of the operator norm $\|\cdot\|_{o,2}$, $D_{\ell,n}$, and $D_{\ell,n,P}$ imply
\[
\sup_{h\in\mathbf{B}_n}\frac{\|D_n[h] - D_{n,P}[h]\|_2}{\|h\|_{\mathbf{E}}} \lesssim \sqrt{j_n}\Big\|\frac{1}{n}\sum_{i=1}^n q^{k_n}_n(Z_i)p^{j_n}_n(W_i)' - E_P[q^{k_n}_n(Z)p^{j_n}_n(W)']\Big\|_{o,2} = o_P(1), \tag{S.265}
\]
where the final equality holds uniformly in $P \in \mathbf{P}$ by applying Lemma S.6.29 with $b_{1n} = \sqrt{j_n}$ and $b_{2n} = k_n$ (by Assumptions 4.1(ii) and 4.2(i)) and employing that $k_n \geq j_n$ and $k_n^2j_n\log(k_n)/n = o(1)$ by Assumptions 4.2(iii)(iv). Finally, we note that Assumption 4.4(iii) implies $j_n^5k_n^4\log^5(1 + j_nk_n) = o(n)$, and employing Lemma S.6.30 with $p = 2$ yields that Assumption 3.14 holds for $R = \Theta$, and hence also for $R$ corresponding to (32) and (33)-(35). The only condition of Lemma 3.1 that remains to be verified is that $\mathcal{S}_n(\mathbf{B}, \mathbf{E})\mathcal{R}_n = o(\ell_n)$. To this end, we observe that since $V_n(\theta, \ell_n)$ is defined through the constraint $\|h\|_{\mathbf{E}} \leq \ell_n$ (instead of $\|\cdot\|_{\mathbf{B}} \leq \ell_n$), it suffices to verify $\mathcal{R}_n = o(\ell_n)$ -- i.e. for the purposes of this lemma we may set $\|\cdot\|_{\mathbf{B}} = \|\cdot\|_{\mathbf{E}}$. However, since as argued $\mathcal{J}_n \asymp \sqrt{j_n\log(j_n)}$, $B_n \asymp \sqrt{k_n}$, and $\nu_n \asymp \sqrt{j_n}$, we have $\mathcal{R}_n \asymp k_nj_n\sqrt{\log(k_n)\log(j_n)}/\sqrt{n}$, and the requirement $\mathcal{R}_n = o(\ell_n)$ is implied by $k_nj_n\log(n)/\sqrt{n} = o(\ell_n)$. Thus, the claim of the lemma follows from Lemma 3.1.
S.7 Local Parameter Space
This section contains analytical results concerning the local parameter space and our
approximation to it. The main result of this section is Theorem S.7.1, which plays an
instrumental role in the proof of the results of Section 3.3 in the main text.
Theorem S.7.1. Let Assumptions 3.1, 3.11, 3.12, and 3.13 hold, $\{\ell_n, \delta_n, r_n\}_{n=1}^\infty$ satisfy $\ell_n \downarrow 0$, $\delta_n1\{K_f > 0\} \downarrow 0$, $r_n \geq (M_g\delta_n + K_g\delta_n^2) \vee 2(\ell_n + \delta_n)1\{K_g > 0\}$, and define
\[
G_n(\theta) \equiv \Big\{\frac{h}{\sqrt{n}} \in \mathbf{B}_n : \Upsilon_G\Big(\theta + \frac{h}{\sqrt{n}}\Big) \leq \Big(\Upsilon_G(\theta) - K_gr_n\Big\|\frac{h}{\sqrt{n}}\Big\|_{\mathbf{B}}1_{\mathbf{G}}\Big) \vee (-r_n1_{\mathbf{G}})\Big\}
\]
\[
A_n(\theta) \equiv \Big\{\frac{h}{\sqrt{n}} \in \mathbf{B}_n : \frac{h}{\sqrt{n}} \in G_n(\theta),\ \Upsilon_F\Big(\theta + \frac{h}{\sqrt{n}}\Big) = 0 \text{ and } \Big\|\frac{h}{\sqrt{n}}\Big\|_{\mathbf{B}} \leq \ell_n\Big\}
\]
\[
T_n(\theta) \equiv \Big\{\frac{h}{\sqrt{n}} \in \mathbf{B}_n : \Upsilon_F\Big(\theta + \frac{h}{\sqrt{n}}\Big) = 0,\ \Upsilon_G\Big(\theta + \frac{h}{\sqrt{n}}\Big) \leq 0 \text{ and } \Big\|\frac{h}{\sqrt{n}}\Big\|_{\mathbf{B}} \leq 2\ell_n\Big\}.
\]
(i) Then, there exist $M < \infty$, $\epsilon > 0$, and $n_0 < \infty$ such that for all $n > n_0$, $P \in \mathbf{P}$, $\theta_0 \in \Theta^r_{0n}(P)$, and $\theta_1 \in (\Theta^r_{0n}(P))^\epsilon \cap R$ satisfying $\|\theta_0 - \theta_1\|_{\mathbf{B}} \leq \delta_n$ we have
\[
\sup_{\frac{h_1}{\sqrt{n}}\in A_n(\theta_1)}\inf_{\frac{h_0}{\sqrt{n}}\in T_n(\theta_0)}\Big\|\frac{h_1}{\sqrt{n}} - \frac{h_0}{\sqrt{n}}\Big\|_{\mathbf{B}} \leq M \times \ell_n(\ell_n + \delta_n)1\{K_f > 0\}. \tag{S.266}
\]
(ii) If in addition $\Upsilon_G$ and $\Upsilon_F$ are affine, then for any $\theta_0, \theta_1 \in \mathbf{B}_n \cap R$ with $\|\theta_0 - \theta_1\|_{\mathbf{B}} \leq \delta_n$
\[
\Big\{\frac{h}{\sqrt{n}} \in \mathbf{B}_n : \frac{h}{\sqrt{n}} \in G_n(\theta_1),\ \Upsilon_F\Big(\theta_1 + \frac{h}{\sqrt{n}}\Big) = 0\Big\} \subseteq \Big\{\frac{h}{\sqrt{n}} \in \mathbf{B}_n : \Upsilon_G\Big(\theta_0 + \frac{h}{\sqrt{n}}\Big) \leq 0,\ \Upsilon_F\Big(\theta_0 + \frac{h}{\sqrt{n}}\Big) = 0\Big\}.
\]
Proof: We begin by establishing part (ii) due to its simplicity. In particular, note that if $\Upsilon_G$ is affine, then $K_g = 0$ and by Lemma S.7.1(ii) we obtain that
\[
G_n(\theta_1) \subseteq \Big\{\frac{h}{\sqrt{n}} \in \mathbf{B}_n : \Upsilon_G\Big(\theta_0 + \frac{h}{\sqrt{n}}\Big) \leq 0\Big\} \tag{S.267}
\]
for any $\theta_0, \theta_1 \in \mathbf{B}_n$ with $\|\theta_0 - \theta_1\|_{\mathbf{B}} \leq \delta_n$ due to $r_n \geq M_g\delta_n$. Moreover, if $\Upsilon_F$ is affine and continuous, then $\Upsilon_F(\theta) = L(\theta) + c_0$ for some continuous linear map $L : \mathbf{B} \to \mathbf{F}$ and $c_0 \in \mathbf{F}$. It follows that $\nabla\Upsilon_F(\theta)[h] = L(h)$, which does not depend on $\theta$, and since any $\theta \in R$ must satisfy $L(\theta) = -c_0$ (since $\Upsilon_F(\theta) = 0$), we can conclude that $\{h : \Upsilon_F(\theta + h) = 0\} = \{h : \nabla\Upsilon_F(\theta)[h] = 0\}$ whenever $\theta \in R$. Therefore the claim of Theorem S.7.1(ii) follows from result (S.267) and $\theta_0, \theta_1 \in R$.
We next turn to the proof of part (i). Throughout, let $\bar{\epsilon}$ be such that Assumptions 3.11 and 3.12 hold and set $\epsilon = \bar{\epsilon}/2$. We break up the proof into four steps.

Step 1: (Decompose $h/\sqrt{n}$). For any $P \in \mathbf{P}$, $\theta_0 \in \Theta^r_{0n}(P)$, and $h \in \mathbf{B}_n$ set
\[
h^{\perp\theta_0} \equiv \nabla\Upsilon_F(\theta_0)^-\nabla\Upsilon_F(\theta_0)[h] \qquad h^{N\theta_0} \equiv h - h^{\perp\theta_0}, \tag{S.268}
\]
where recall $\nabla\Upsilon_F(\theta_0)^- : \mathbf{F}_n \to \mathbf{B}_n$ denotes the right inverse of $\nabla\Upsilon_F(\theta_0) : \mathbf{B}_n \to \mathbf{F}_n$. Further note that $h^{N\theta_0} \in \mathcal{N}(\nabla\Upsilon_F(\theta_0))$ since $\nabla\Upsilon_F(\theta_0)\nabla\Upsilon_F(\theta_0)^- = I$ implies that
\[
\nabla\Upsilon_F(\theta_0)[h^{N\theta_0}] = \nabla\Upsilon_F(\theta_0)[h] - \nabla\Upsilon_F(\theta_0)\nabla\Upsilon_F(\theta_0)^-\nabla\Upsilon_F(\theta_0)[h] = 0 \tag{S.269}
\]
by definition of $h^{\perp\theta_0}$ in (S.268). Next, observe that if $\theta_1 \in (\Theta^r_{0n}(P))^\epsilon \cap R$ and $h/\sqrt{n} \in \mathbf{B}_n$ satisfies $\|h/\sqrt{n}\|_{\mathbf{B}} \leq \ell_n$ and $\Upsilon_F(\theta_1 + h/\sqrt{n}) = 0$, then $\theta_1 + h/\sqrt{n} \in (\Theta^r_{0n}(P))^{\bar{\epsilon}}$ for $n$ sufficiently large, and hence by Assumption 3.12(i) and $\Upsilon_F(\theta_1) = 0$ due to $\theta_1 \in R$
\[
\Big\|\nabla\Upsilon_F(\theta_1)\Big[\frac{h}{\sqrt{n}}\Big]\Big\|_{\mathbf{F}} = \Big\|\Upsilon_F\Big(\theta_1 + \frac{h}{\sqrt{n}}\Big) - \Upsilon_F(\theta_1) - \nabla\Upsilon_F(\theta_1)\Big[\frac{h}{\sqrt{n}}\Big]\Big\|_{\mathbf{F}} \leq K_f\Big\|\frac{h}{\sqrt{n}}\Big\|_{\mathbf{B}}^2. \tag{S.270}
\]
Hence, Assumption 3.12(ii), result (S.270), $\|\theta_0 - \theta_1\|_{\mathbf{B}} \leq \delta_n$, and $\|h/\sqrt{n}\|_{\mathbf{B}} \leq \ell_n$ imply
\[
\Big\|\nabla\Upsilon_F(\theta_0)\Big[\frac{h}{\sqrt{n}}\Big]\Big\|_{\mathbf{F}} \leq \|\nabla\Upsilon_F(\theta_0) - \nabla\Upsilon_F(\theta_1)\|_o\Big\|\frac{h}{\sqrt{n}}\Big\|_{\mathbf{B}} + K_f\Big\|\frac{h}{\sqrt{n}}\Big\|_{\mathbf{B}}^2 \leq K_f\ell_n(\delta_n + \ell_n). \tag{S.271}
\]
Moreover, since $\nabla\Upsilon_F(\theta_0)^- : \mathbf{F}_n \to \mathbf{B}_n$ satisfies Assumption 3.12(iv), we also have that
\[
K_f\|h^{\perp\theta_0}\|_{\mathbf{B}} = K_f\|\nabla\Upsilon_F(\theta_0)^-\nabla\Upsilon_F(\theta_0)[h]\|_{\mathbf{B}} \leq K_f\|\nabla\Upsilon_F(\theta_0)^-\|_o\|\nabla\Upsilon_F(\theta_0)[h]\|_{\mathbf{F}} \leq M_f\|\nabla\Upsilon_F(\theta_0)[h]\|_{\mathbf{F}}. \tag{S.272}
\]
Further note that if $K_f = 0$, then (S.268) and (S.271) imply $h^{\perp\theta_0} = 0$. Thus, combining results (S.271) and (S.272) to handle the case $K_f > 0$, we conclude that for any $P \in \mathbf{P}$, $\theta_0 \in \Theta^r_{0n}(P)$, $\theta_1 \in (\Theta^r_{0n}(P))^\epsilon \cap R$ satisfying $\|\theta_0 - \theta_1\|_{\mathbf{B}} \leq \delta_n$ and any $h/\sqrt{n} \in \mathbf{B}_n$ such that $\Upsilon_F(\theta_1 + h/\sqrt{n}) = 0$ and $\|h/\sqrt{n}\|_{\mathbf{B}} \leq \ell_n$ we have the bound
\[
\Big\|\frac{h^{\perp\theta_0}}{\sqrt{n}}\Big\|_{\mathbf{B}} \leq M_f\ell_n(\delta_n + \ell_n)1\{K_f > 0\}. \tag{S.273}
\]
Step 2: (Inequality Constraints). In what follows, it is convenient to define the set
\[
S_n(\theta_0, \theta_1) \equiv \Big\{\frac{h}{\sqrt{n}} \in \mathbf{B}_n : \Upsilon_G\Big(\theta_0 + \frac{h}{\sqrt{n}}\Big) \leq 0,\ \Upsilon_F\Big(\theta_1 + \frac{h}{\sqrt{n}}\Big) = 0,\ \Big\|\frac{h}{\sqrt{n}}\Big\|_{\mathbf{B}} \leq \ell_n\Big\}.
\]
Then note $r_n \geq (M_g\delta_n + K_g\delta_n^2) \vee 2(\ell_n + \delta_n)1\{K_g > 0\}$ and Lemma S.7.1(i) imply that
\[
A_n(\theta_1) \subseteq S_n(\theta_0, \theta_1) \tag{S.274}
\]
for $n$ sufficiently large, all $P \in \mathbf{P}$, $\theta_0 \in \Theta^r_{0n}(P)$, and $\theta_1 \in (\Theta^r_{0n}(P))^\epsilon$ satisfying $\|\theta_0 - \theta_1\|_{\mathbf{B}} \leq \delta_n$. The proof will proceed by verifying (S.266) holds with $S_n(\theta_0, \theta_1)$ in place of $A_n(\theta_1)$. In particular, if $K_f = 0$, then $\Upsilon_F(\theta_0) = \Upsilon_F(\theta_1)$ due to $\theta_0, \theta_1 \in R$, and Assumptions 3.12(i)-(ii) together with (S.274) imply $A_n(\theta_1) \subseteq S_n(\theta_0, \theta_1) \subseteq T_n(\theta_0)$. Hence, result (S.266) holds for the case $K_f = 0$.

For the rest of the proof we therefore assume $K_f > 0$. We further note that Lemma S.7.2 implies that for any $\eta_n \downarrow 0$, there is a sufficiently large $n$ and constant $1 \leq C < \infty$ (independent of $\eta_n$) such that for all $P \in \mathbf{P}$ and $\theta_0 \in \Theta^r_{0n}(P)$ there exists a $h_{\theta_0,n}/\sqrt{n} \in \mathbf{B}_n \cap \mathcal{N}(\nabla\Upsilon_F(\theta_0))$ such that for any $\bar{h}/\sqrt{n} \in \mathbf{B}_n$ for which there exists a $h/\sqrt{n} \in S_n(\theta_0, \theta_1)$ satisfying $\|(h - \bar{h})/\sqrt{n}\|_{\mathbf{B}} \leq \eta_n$ the following inequalities hold
\[
\Upsilon_G\Big(\theta_0 + \frac{h_{\theta_0,n}}{\sqrt{n}} + \frac{\bar{h}}{\sqrt{n}}\Big) \leq 0 \qquad \Big\|\frac{h_{\theta_0,n}}{\sqrt{n}}\Big\|_{\mathbf{B}} \leq C\eta_n. \tag{S.275}
\]
Step 3: (Equality Constraints). The results in this step allow us to address the challenge that $h/\sqrt{n} \in S_n(\theta_0, \theta_1)$ satisfies $\Upsilon_F(\theta_1 + h/\sqrt{n}) = 0$ but not necessarily $\Upsilon_F(\theta_0 + h/\sqrt{n}) = 0$. To this end, let $\mathcal{R}(\nabla\Upsilon_F(\theta_0)^-\nabla\Upsilon_F(\theta_0))$ denote the range of the operator $\nabla\Upsilon_F(\theta_0)^-\nabla\Upsilon_F(\theta_0) : \mathbf{B}_n \to \mathbf{B}_n$ and define the vector subspaces
\[
\mathbf{B}^{N\theta_0}_n \equiv \mathbf{B}_n \cap \mathcal{N}(\nabla\Upsilon_F(\theta_0)) \qquad \mathbf{B}^{\perp\theta_0}_n \equiv \mathcal{R}(\nabla\Upsilon_F(\theta_0)^-\nabla\Upsilon_F(\theta_0)). \tag{S.276}
\]
Since $h^{N\theta_0} \in \mathbf{B}^{N\theta_0}_n$ by (S.269), the definitions in (S.268) and (S.276) imply that $\mathbf{B}_n = \mathbf{B}^{N\theta_0}_n + \mathbf{B}^{\perp\theta_0}_n$. Furthermore, since $\nabla\Upsilon_F(\theta_0)\nabla\Upsilon_F(\theta_0)^- = I$, we also have
\[
\nabla\Upsilon_F(\theta_0)^-\nabla\Upsilon_F(\theta_0)[h] = h \tag{S.277}
\]
for any $h \in \mathbf{B}^{\perp\theta_0}_n$, and thus that $\mathbf{B}^{N\theta_0}_n \cap \mathbf{B}^{\perp\theta_0}_n = \{0\}$. Since $\mathbf{B}_n = \mathbf{B}^{N\theta_0}_n + \mathbf{B}^{\perp\theta_0}_n$, it then follows that $\mathbf{B}_n = \mathbf{B}^{N\theta_0}_n \oplus \mathbf{B}^{\perp\theta_0}_n$ -- i.e. the decomposition in (S.268) is unique. Moreover, we observe that $\mathbf{B}^{N\theta_0}_n \cap \mathbf{B}^{\perp\theta_0}_n = \{0\}$ further implies the restricted map $\nabla\Upsilon_F(\theta_0) : \mathbf{B}^{\perp\theta_0}_n \to \mathbf{F}_n$ is in fact bijective, and by (S.277) its inverse is $\nabla\Upsilon_F(\theta_0)^- : \mathbf{F}_n \to \mathbf{B}^{\perp\theta_0}_n$.

Note Assumption 3.12(i) implies that for all $n$ and $P \in \mathbf{P}$, $\Upsilon_F$ is Fréchet differentiable at all $\theta \in (\Theta^r_{0n}(P))^{\bar{\epsilon}}$. Therefore, applying Lemma S.7.4 with $\mathbf{A}_1 = \mathbf{B}^{N\theta_0}_n$, $\mathbf{A}_2 = \mathbf{B}^{\perp\theta_0}_n$, and $K_0 \equiv K_f \vee M_f \vee M_f/K_f$ yields that for any $P \in \mathbf{P}$, $\theta_0 \in \Theta^r_{0n}(P)$, and $h^{N\theta_0} \in \mathbf{B}^{N\theta_0}_n$ satisfying $\|h^{N\theta_0}\|_{\mathbf{B}} \leq \{\bar{\epsilon}/2 \wedge (2K_0)^{-2} \wedge 1\}^2$, there exists a $h^\star(h^{N\theta_0}) \in \mathbf{B}^{\perp\theta_0}_n$ such that
\[
\Upsilon_F(\theta_0 + h^{N\theta_0} + h^\star(h^{N\theta_0})) = 0 \qquad \|h^\star(h^{N\theta_0})\|_{\mathbf{B}} \leq 2K_0^2\|h^{N\theta_0}\|_{\mathbf{B}}^2. \tag{S.278}
\]
Moreover, for any $P \in \mathbf{P}$, $\theta_0 \in \Theta^r_{0n}(P)$, $\theta_1 \in (\Theta^r_{0n}(P))^\epsilon \cap R$ with $\|\theta_0 - \theta_1\|_{\mathbf{B}} \leq \delta_n$, and any $h/\sqrt{n} \in \mathbf{B}_n$ such that $\Upsilon_F(\theta_1 + h/\sqrt{n}) = 0$ and $\|h/\sqrt{n}\|_{\mathbf{B}} \leq \ell_n$, result (S.273), the decomposition in (S.268), and $\delta_n \downarrow 0$ (since $K_f > 0$), $\ell_n \downarrow 0$ imply that for $n$ large
\[
\Big\|\frac{h^{N\theta_0}}{\sqrt{n}}\Big\|_{\mathbf{B}} \leq \Big\|\frac{h}{\sqrt{n}}\Big\|_{\mathbf{B}} + \Big\|\frac{h^{\perp\theta_0}}{\sqrt{n}}\Big\|_{\mathbf{B}} \leq 2\ell_n. \tag{S.279}
\]
Thus, for $h_{\theta_0,n} \in \mathbf{B}^{N\theta_0}_n$ as in (S.275) and $C \geq 1$, results (S.278) and (S.279) imply that for $n$ sufficiently large we must have for all $P \in \mathbf{P}$, $\theta_0 \in \Theta^r_{0n}(P)$, $\theta_1 \in \mathbf{B}_n \cap R$ with $\|\theta_0 - \theta_1\|_{\mathbf{B}} \leq \delta_n$ and $h/\sqrt{n} \in \mathbf{B}_n$ satisfying $\Upsilon_F(\theta_1 + h/\sqrt{n}) = 0$ that
\[
\Upsilon_F\Big(\theta_0 + \frac{h_{\theta_0,n}}{\sqrt{n}} + \frac{h^{N\theta_0}}{\sqrt{n}} + h^\star\Big(\frac{h_{\theta_0,n}}{\sqrt{n}} + \frac{h^{N\theta_0}}{\sqrt{n}}\Big)\Big) = 0
\]
\[
\Big\|h^\star\Big(\frac{h_{\theta_0,n}}{\sqrt{n}} + \frac{h^{N\theta_0}}{\sqrt{n}}\Big)\Big\|_{\mathbf{B}} \leq 16K_0^2C^2(\ell_n^2 + \eta_n^2). \tag{S.280}
\]
Step 4: (Build Approximation). In order to employ Steps 2 and 3, we now set $\eta_n$ to
\[
\eta_n = 32(M_f + C^2K_0^2)\ell_n(\ell_n + \delta_n). \tag{S.281}
\]
In addition, for any $P \in \mathbf{P}$, $\theta_0 \in \Theta^r_{0n}(P)$, $\theta_1 \in \mathbf{B}_n \cap R$ satisfying $\|\theta_0 - \theta_1\|_{\mathbf{B}} \leq \delta_n$, and any $h/\sqrt{n} \in S_n(\theta_0, \theta_1)$, we let $h^{N\theta_0}$ be as in (S.268) and define
\[
\frac{\tilde{h}}{\sqrt{n}} \equiv \frac{h_{\theta_0,n}}{\sqrt{n}} + \frac{h^{N\theta_0}}{\sqrt{n}} + h^\star\Big(\frac{h_{\theta_0,n}}{\sqrt{n}} + \frac{h^{N\theta_0}}{\sqrt{n}}\Big). \tag{S.282}
\]
From Steps 2 and 3 it then follows that for $n$ sufficiently large (independent of $P \in \mathbf{P}$, $\theta_0 \in \Theta^r_{0n}(P)$, $\theta_1 \in \mathbf{B}_n \cap R$ with $\|\theta_0 - \theta_1\|_{\mathbf{B}} \leq \delta_n$ or $h/\sqrt{n} \in S_n(\theta_0, \theta_1)$) we have
\[
\Upsilon_F\Big(\theta_0 + \frac{\tilde{h}}{\sqrt{n}}\Big) = 0. \tag{S.283}
\]
Moreover, from results (S.280) and (S.281) we also obtain that for $n$ sufficiently large
\[
\Big\|h^\star\Big(\frac{h_{\theta_0,n}}{\sqrt{n}} + \frac{h^{N\theta_0}}{\sqrt{n}}\Big)\Big\|_{\mathbf{B}} \leq 16C^2K_0^2(\ell_n^2 + \eta_n^2) \leq \frac{\eta_n}{2} + 16C^2K_0^2\eta_n^2 \leq \frac{3}{4}\eta_n. \tag{S.284}
\]
Thus, $h = h^{N\theta_0} + h^{\perp\theta_0}$, (S.273), (S.281), (S.282), and (S.284) imply $\|(h - \tilde{h} - h_{\theta_0,n})/\sqrt{n}\|_{\mathbf{B}} \leq \eta_n$ for $n$ sufficiently large, and employing (S.275) with $\bar{h}/\sqrt{n} = (\tilde{h} - h_{\theta_0,n})/\sqrt{n}$ yields
\[
\Upsilon_G\Big(\theta_0 + \frac{\tilde{h}}{\sqrt{n}}\Big) \leq 0. \tag{S.285}
\]
Furthermore, since $\|h_{\theta_0,n}/\sqrt{n}\|_{\mathbf{B}} \leq C\eta_n$ by (S.275), results (S.273), (S.280), and $\|h/\sqrt{n}\|_{\mathbf{B}} \leq \ell_n$ for any $h/\sqrt{n} \in S_n(\theta_0, \theta_1)$ imply by (S.281) and $\ell_n \downarrow 0$, $\delta_n \downarrow 0$ that
\[
\Big\|\frac{\tilde{h}}{\sqrt{n}}\Big\|_{\mathbf{B}} \leq \Big\|\frac{h_{\theta_0,n}}{\sqrt{n}}\Big\|_{\mathbf{B}} + \Big\|h^\star\Big(\frac{h_{\theta_0,n}}{\sqrt{n}} + \frac{h^{N\theta_0}}{\sqrt{n}}\Big)\Big\|_{\mathbf{B}} + \Big\|\frac{h^{\perp\theta_0}}{\sqrt{n}}\Big\|_{\mathbf{B}} + \Big\|\frac{h}{\sqrt{n}}\Big\|_{\mathbf{B}} \leq C\eta_n + 16C^2K_0^2(\ell_n^2 + \eta_n^2) + M_f\ell_n(\delta_n + \ell_n) + \ell_n \leq 2\ell_n \tag{S.286}
\]
for $n$ sufficiently large. Therefore, we conclude from (S.283), (S.285), and (S.286) that $\tilde{h}/\sqrt{n} \in T_n(\theta_0)$. Similarly, (S.273), (S.275), (S.280), and (S.281) yield for some $M < \infty$
\[
\Big\|\frac{\tilde{h}}{\sqrt{n}} - \frac{h}{\sqrt{n}}\Big\|_{\mathbf{B}} \leq \Big\|\frac{h_{\theta_0,n}}{\sqrt{n}}\Big\|_{\mathbf{B}} + \Big\|h^\star\Big(\frac{h_{\theta_0,n}}{\sqrt{n}} + \frac{h^{N\theta_0}}{\sqrt{n}}\Big)\Big\|_{\mathbf{B}} + \Big\|\frac{h^{\perp\theta_0}}{\sqrt{n}}\Big\|_{\mathbf{B}} \leq C\eta_n + 16C^2K_0^2(\ell_n^2 + \eta_n^2) + M_f\ell_n(\ell_n + \delta_n) \leq M\ell_n(\ell_n + \delta_n),
\]
which establishes (S.266) for the case $K_f > 0$.
Lemma S.7.1. Let Assumptions 3.1 and 3.11 hold, and $\ell_n \downarrow 0$ be given. (i) Then, there are $n_0 < \infty$ and $\epsilon > 0$ such that for all $n > n_0$, $P \in \mathbf{P}_0$, $\theta_0 \in \Theta^r_{0n}(P)$, $\theta_1 \in (\Theta^r_{0n}(P))^\epsilon$:
\[
\Big\{\frac{h}{\sqrt{n}} \in \mathbf{B}_n : \Upsilon_G\Big(\theta_1 + \frac{h}{\sqrt{n}}\Big) \leq \Big(\Upsilon_G(\theta_1) - K_gr\Big\|\frac{h}{\sqrt{n}}\Big\|_{\mathbf{B}}1_{\mathbf{G}}\Big) \vee (-r1_{\mathbf{G}}) \text{ and } \Big\|\frac{h}{\sqrt{n}}\Big\|_{\mathbf{B}} \leq \ell_n\Big\} \subseteq \Big\{\frac{h}{\sqrt{n}} \in \mathbf{B}_n : \Upsilon_G\Big(\theta_0 + \frac{h}{\sqrt{n}}\Big) \leq 0 \text{ and } \Big\|\frac{h}{\sqrt{n}}\Big\|_{\mathbf{B}} \leq \ell_n\Big\}
\]
for any $r \geq \{M_g\|\theta_0 - \theta_1\|_{\mathbf{B}} + K_g\|\theta_0 - \theta_1\|_{\mathbf{B}}^2\} \vee \{2\ell_n + \|\theta_0 - \theta_1\|_{\mathbf{B}}\}1\{K_g > 0\}$. (ii) If in addition $\Upsilon_G$ is affine, then for any $n$, $\theta_0, \theta_1 \in \mathbf{B}_n$ and $r \geq M_g\|\theta_0 - \theta_1\|_{\mathbf{B}}$ we have
\[
\Big\{\frac{h}{\sqrt{n}} \in \mathbf{B}_n : \Upsilon_G\Big(\theta_1 + \frac{h}{\sqrt{n}}\Big) \leq \Upsilon_G(\theta_1) \vee (-r1_{\mathbf{G}})\Big\} \subseteq \Big\{\frac{h}{\sqrt{n}} \in \mathbf{B}_n : \Upsilon_G\Big(\theta_0 + \frac{h}{\sqrt{n}}\Big) \leq 0\Big\}.
\]
Proof: Let $\bar{\epsilon} > 0$ be such that Assumption 3.11 holds, set $\epsilon = \bar{\epsilon}/2$, and for notational simplicity let $N_{n,P}(\delta) \equiv \{\theta \in \mathbf{B}_n : \vec{d}_H(\theta, \Theta^r_{0n}(P), \|\cdot\|_{\mathbf{B}}) < \delta\}$ for any $\delta > 0$. Then note that for any $\theta_1 \in N_{n,P}(\epsilon)$ and $\|h/\sqrt{n}\|_{\mathbf{B}} \leq \ell_n$ we have $\theta_1 + h/\sqrt{n} \in N_{n,P}(\bar{\epsilon})$ for $n$ sufficiently large. Therefore, by Assumption 3.11(ii) we obtain that
\[
\Big\|\Upsilon_G\Big(\theta_1 + \frac{h}{\sqrt{n}}\Big) - \Upsilon_G(\theta_1) - \nabla\Upsilon_G(\theta_1)\Big[\frac{h}{\sqrt{n}}\Big]\Big\|_{\mathbf{G}} \leq K_g\Big\|\frac{h}{\sqrt{n}}\Big\|_{\mathbf{B}}^2. \tag{S.287}
\]
Similarly, Assumption 3.11(ii) implies that if $\theta_0 \in \Theta^r_{0n}(P)$ and $\theta_1 \in N_{n,P}(\epsilon)$, then
\[
\Big\|\nabla\Upsilon_G(\theta_0)\Big[\frac{h}{\sqrt{n}}\Big] - \nabla\Upsilon_G(\theta_1)\Big[\frac{h}{\sqrt{n}}\Big]\Big\|_{\mathbf{G}} \leq \|\nabla\Upsilon_G(\theta_0) - \nabla\Upsilon_G(\theta_1)\|_o\Big\|\frac{h}{\sqrt{n}}\Big\|_{\mathbf{B}} \leq K_g\|\theta_0 - \theta_1\|_{\mathbf{B}}\Big\|\frac{h}{\sqrt{n}}\Big\|_{\mathbf{B}} \tag{S.288}
\]
for any $h/\sqrt{n} \in \mathbf{B}_n$. Hence, since $\Upsilon_G(\theta_0) \leq 0$ due to $\theta_0 \in \Theta_n \cap R$, we can conclude that
\[
\Upsilon_G\Big(\theta_0 + \frac{h}{\sqrt{n}}\Big) + \Upsilon_G(\theta_1) - \Upsilon_G\Big(\theta_1 + \frac{h}{\sqrt{n}}\Big) \leq \Big\{\Upsilon_G\Big(\theta_0 + \frac{h}{\sqrt{n}}\Big) - \Upsilon_G(\theta_0)\Big\} + \Big\{\Upsilon_G(\theta_1) - \Upsilon_G\Big(\theta_1 + \frac{h}{\sqrt{n}}\Big)\Big\} \leq K_g\Big\|\frac{h}{\sqrt{n}}\Big\|_{\mathbf{B}}\Big\{2\Big\|\frac{h}{\sqrt{n}}\Big\|_{\mathbf{B}} + \|\theta_0 - \theta_1\|_{\mathbf{B}}\Big\}1_{\mathbf{G}}, \tag{S.289}
\]
by (S.287), (S.288), and Lemma S.7.3. Also note for any $\theta_0 \in \Theta^r_{0n}(P)$, $\theta_1 \in N_{n,P}(\epsilon)$, and $h/\sqrt{n} \in \mathbf{B}_n$ with $\|h/\sqrt{n}\|_{\mathbf{B}} \leq \ell_n$ we have $\theta_0 + h/\sqrt{n} \in N_{n,P}(\bar{\epsilon})$ and $\theta_1 + h/\sqrt{n} \in N_{n,P}(\bar{\epsilon})$ for $n$ sufficiently large. Thus, by Assumptions 3.11(i), 3.11(iii), and Lemma S.7.3
\[
\Upsilon_G\Big(\theta_0 + \frac{h}{\sqrt{n}}\Big) - \Upsilon_G\Big(\theta_1 + \frac{h}{\sqrt{n}}\Big) \leq \nabla\Upsilon_G\Big(\theta_0 + \frac{h}{\sqrt{n}}\Big)[\theta_0 - \theta_1] + K_g\|\theta_0 - \theta_1\|_{\mathbf{B}}^21_{\mathbf{G}} \leq \{M_g\|\theta_0 - \theta_1\|_{\mathbf{B}} + K_g\|\theta_0 - \theta_1\|_{\mathbf{B}}^2\}1_{\mathbf{G}}. \tag{S.290}
\]
Hence, (S.289) and (S.290) yield for $r \geq \{M_g\|\theta_0 - \theta_1\|_{\mathbf{B}} + K_g\|\theta_0 - \theta_1\|_{\mathbf{B}}^2\} \vee \{2\ell_n + \|\theta_0 - \theta_1\|_{\mathbf{B}}\}1\{K_g > 0\}$, $\theta_0 \in \Theta^r_{0n}(P)$, $\theta_1 \in N_{n,P}(\epsilon)$, $\|h/\sqrt{n}\|_{\mathbf{B}} \leq \ell_n$, and $n$ large
\[
\Upsilon_G\Big(\theta_0 + \frac{h}{\sqrt{n}}\Big) \leq \Upsilon_G\Big(\theta_1 + \frac{h}{\sqrt{n}}\Big) + \Big\{\Big(K_gr\Big\|\frac{h}{\sqrt{n}}\Big\|_{\mathbf{B}}1_{\mathbf{G}} - \Upsilon_G(\theta_1)\Big) \wedge r1_{\mathbf{G}}\Big\} = \Upsilon_G\Big(\theta_1 + \frac{h}{\sqrt{n}}\Big) - \Big\{\Big(\Upsilon_G(\theta_1) - K_gr\Big\|\frac{h}{\sqrt{n}}\Big\|_{\mathbf{B}}1_{\mathbf{G}}\Big) \vee (-r1_{\mathbf{G}})\Big\}, \tag{S.291}
\]
where the equality follows from $(-a) \vee (-b) = -(a \wedge b)$ by Theorem 8.6 in Aliprantis and Border (2006). Since $a_1 \leq a_2$ and $b_1 \leq b_2$ implies $a_1 \wedge b_1 \leq a_2 \wedge b_2$ in $\mathbf{G}$ by Corollary 8.7 in Aliprantis and Border (2006), (S.291) implies that for $n$ sufficiently large and any $\theta_0 \in \Theta^r_{0n}(P)$, $\theta_1 \in N_{n,P}(\epsilon)$, and $h/\sqrt{n} \in \mathbf{B}_n$ satisfying $\|h/\sqrt{n}\|_{\mathbf{B}} \leq \ell_n$ and
\[
\Upsilon_G\Big(\theta_1 + \frac{h}{\sqrt{n}}\Big) \leq \Big(\Upsilon_G(\theta_1) - K_gr\Big\|\frac{h}{\sqrt{n}}\Big\|_{\mathbf{B}}1_{\mathbf{G}}\Big) \vee (-r1_{\mathbf{G}})
\]
we must have $\Upsilon_G(\theta_0 + h/\sqrt{n}) \leq 0$, which verifies the first claim of the lemma. For the second claim, just note that if $\Upsilon_G$ is affine, then we may set $K_g = 0$ and $\bar{\epsilon} = +\infty$ in Assumption 3.11, which leads to the desired simplification.
Lemma S.7.2. If Assumptions 3.1, 3.11, and 3.13(ii) hold, and $\eta_n \downarrow 0$, $\ell_n \downarrow 0$, then there is a $n_0$ (depending on $\{\eta_n, \ell_n\}$) and a $C < \infty$ (independent of $\{\eta_n, \ell_n\}$) such that for all $n > n_0$, $P \in \mathbf{P}_0$, and $\theta_0 \in \Theta^r_{0n}(P)$ there is $h_{\theta_0,n}/\sqrt{n} \in \mathbf{B}_n \cap \mathcal{N}(\nabla\Upsilon_F(\theta_0))$ with
\[
\Upsilon_G\Big(\theta_0 + \frac{h_{\theta_0,n}}{\sqrt{n}} + \frac{\bar{h}}{\sqrt{n}}\Big) \leq 0 \qquad \Big\|\frac{h_{\theta_0,n}}{\sqrt{n}}\Big\|_{\mathbf{B}} \leq C\eta_n \tag{S.292}
\]
for all $\bar{h}/\sqrt{n} \in \mathbf{B}_n$ for which there is a $h/\sqrt{n} \in \mathbf{B}_n$ satisfying $\|(h - \bar{h})/\sqrt{n}\|_{\mathbf{B}} \leq \eta_n$, $\|h/\sqrt{n}\|_{\mathbf{B}} \leq \ell_n$, and the inequality $\Upsilon_G(\theta_0 + h/\sqrt{n}) \leq 0$.
Proof: By Assumption 3.13(ii) there are $\epsilon > 0$ and $K_d < \infty$ such that for every $P \in \mathbf{P}_0$, $n$, and $\theta_0 \in \Theta^r_{0n}(P)$ there exists a $\tilde{h}_{\theta_0,n} \in \mathbf{B}_n \cap \mathcal{N}(\nabla\Upsilon_F(\theta_0))$ satisfying
\[
\Upsilon_G(\theta_0) + \nabla\Upsilon_G(\theta_0)[\tilde{h}_{\theta_0,n}] \leq -\epsilon1_{\mathbf{G}} \tag{S.293}
\]
and $\|\tilde{h}_{\theta_0,n}\|_{\mathbf{B}} \leq K_d$. Moreover, for any $h/\sqrt{n} \in \mathbf{B}_n$ such that $\|h/\sqrt{n}\|_{\mathbf{B}} \leq \ell_n$, Assumption 3.11(i), Lemma S.7.3, and $K_g\ell_n^2 \leq M_g\ell_n$ for $n$ sufficiently large yield
\[
\Upsilon_G\Big(\theta_0 + \frac{h}{\sqrt{n}}\Big) \leq \Upsilon_G(\theta_0) + \nabla\Upsilon_G(\theta_0)\Big[\frac{h}{\sqrt{n}}\Big] + K_g\Big\|\frac{h}{\sqrt{n}}\Big\|_{\mathbf{B}}^21_{\mathbf{G}} \leq \Upsilon_G(\theta_0) + \{\|\nabla\Upsilon_G(\theta_0)\|_o\ell_n + K_g\ell_n^2\}1_{\mathbf{G}} \leq \Upsilon_G(\theta_0) + 2M_g\ell_n1_{\mathbf{G}}. \tag{S.294}
\]
Hence, (S.293) and (S.294) imply for any $h/\sqrt{n} \in \mathbf{B}_n$ with $\|h/\sqrt{n}\|_{\mathbf{B}} \leq \ell_n$ we must have
\[
\Upsilon_G\Big(\theta_0 + \frac{h}{\sqrt{n}}\Big) + \nabla\Upsilon_G(\theta_0)[\tilde{h}_{\theta_0,n}] \leq \{2M_g\ell_n - \epsilon\}1_{\mathbf{G}}. \tag{S.295}
\]
Next, we let $C_0 > 8M_g/\epsilon$ and aim to show (S.292) holds with $C = C_0K_d$ by setting
\[
\frac{h_{\theta_0,n}}{\sqrt{n}} \equiv C_0\eta_n\tilde{h}_{\theta_0,n}. \tag{S.296}
\]
To this end, we first note that if $\theta_0 \in \Theta^r_{0n}(P)$, $h/\sqrt{n} \in \mathbf{B}_n$ satisfies $\|h/\sqrt{n}\|_{\mathbf{B}} \leq \ell_n$ and $\Upsilon_G(\theta_0 + h/\sqrt{n}) \leq 0$, and $\bar{h}/\sqrt{n} \in \mathbf{B}_n$ is such that $\|(h - \bar{h})/\sqrt{n}\|_{\mathbf{B}} \leq \eta_n$, then definition (S.296) implies that $\|(h_{\theta_0,n} + \bar{h})/\sqrt{n}\|_{\mathbf{B}} = o(1)$. Therefore, Assumption 3.11(i), Lemma S.7.3, and $\|(h - \bar{h})/\sqrt{n}\|_{\mathbf{B}} \leq \eta_n$ together allow us to conclude
\[
\Upsilon_G\Big(\theta_0 + \frac{h_{\theta_0,n}}{\sqrt{n}} + \frac{\bar{h}}{\sqrt{n}}\Big) \leq \Upsilon_G\Big(\theta_0 + \frac{h}{\sqrt{n}}\Big) + \nabla\Upsilon_G\Big(\theta_0 + \frac{h}{\sqrt{n}}\Big)\Big[\frac{h_{\theta_0,n}}{\sqrt{n}} + \frac{(\bar{h} - h)}{\sqrt{n}}\Big] + 2K_g\Big(\Big\|\frac{h_{\theta_0,n}}{\sqrt{n}}\Big\|_{\mathbf{B}}^2 + \eta_n^2\Big)1_{\mathbf{G}} \leq \Upsilon_G\Big(\theta_0 + \frac{h}{\sqrt{n}}\Big) + \nabla\Upsilon_G\Big(\theta_0 + \frac{h}{\sqrt{n}}\Big)\Big[\frac{h_{\theta_0,n}}{\sqrt{n}}\Big] + \Big\{2K_g\Big\|\frac{h_{\theta_0,n}}{\sqrt{n}}\Big\|_{\mathbf{B}}^2 + 2M_g\eta_n\Big\}1_{\mathbf{G}}, \tag{S.297}
\]
where the final result follows from Assumption 3.11(iii) and $2K_g\eta_n^2 \leq M_g\eta_n$ for $n$ sufficiently large. Similarly, Assumption 3.11(ii) and Lemma S.7.3 yield
\[
\nabla\Upsilon_G\Big(\theta_0 + \frac{h}{\sqrt{n}}\Big)\Big[\frac{h_{\theta_0,n}}{\sqrt{n}}\Big] \leq \nabla\Upsilon_G(\theta_0)\Big[\frac{h_{\theta_0,n}}{\sqrt{n}}\Big] + \Big\|\nabla\Upsilon_G\Big(\theta_0 + \frac{h}{\sqrt{n}}\Big) - \nabla\Upsilon_G(\theta_0)\Big\|_o\Big\|\frac{h_{\theta_0,n}}{\sqrt{n}}\Big\|_{\mathbf{B}}1_{\mathbf{G}} \leq \nabla\Upsilon_G(\theta_0)\Big[\frac{h_{\theta_0,n}}{\sqrt{n}}\Big] + K_g\ell_n\Big\|\frac{h_{\theta_0,n}}{\sqrt{n}}\Big\|_{\mathbf{B}}1_{\mathbf{G}}. \tag{S.298}
\]
Hence, combining results (S.297) and (S.298), $\|h_{\theta_0,n}/\sqrt{n}\|_{\mathbf{B}} \leq C_0K_d\eta_n$ due to $\|\tilde{h}_{\theta_0,n}\|_{\mathbf{B}} \leq K_d$, and $\eta_n \downarrow 0$, $\ell_n \downarrow 0$, we obtain that for $n$ sufficiently large we have
\[
\Upsilon_G\Big(\theta_0 + \frac{h_{\theta_0,n}}{\sqrt{n}} + \frac{\bar{h}}{\sqrt{n}}\Big) \leq \Upsilon_G\Big(\theta_0 + \frac{h}{\sqrt{n}}\Big) + \nabla\Upsilon_G(\theta_0)\Big[\frac{h_{\theta_0,n}}{\sqrt{n}}\Big] + 4M_g\eta_n1_{\mathbf{G}}. \tag{S.299}
\]
In addition, since $C_0\eta_n \downarrow 0$, we have $C_0\eta_n \leq 1$ eventually, and hence $\Upsilon_G(\theta_0 + h/\sqrt{n}) \leq 0$, $2M_g\ell_n \leq \epsilon/2$ for $n$ sufficiently large due to $\ell_n \downarrow 0$, and result (S.295) imply that
\[
\Upsilon_G\Big(\theta_0 + \frac{h}{\sqrt{n}}\Big) + C_0\eta_n\nabla\Upsilon_G(\theta_0)[\tilde{h}_{\theta_0,n}] \leq C_0\eta_n\Big\{\Upsilon_G\Big(\theta_0 + \frac{h}{\sqrt{n}}\Big) + \nabla\Upsilon_G(\theta_0)[\tilde{h}_{\theta_0,n}]\Big\} \leq C_0\eta_n\{2M_g\ell_n - \epsilon\}1_{\mathbf{G}} \leq -\frac{C_0\eta_n\epsilon}{2}1_{\mathbf{G}}. \tag{S.300}
\]
Thus, we can conclude from results (S.296), (S.299), (S.300), and $C_0 > 8M_g/\epsilon$ that
\[
\Upsilon_G\Big(\theta_0 + \frac{h_{\theta_0,n}}{\sqrt{n}} + \frac{\bar{h}}{\sqrt{n}}\Big) \leq \Big\{4M_g - \frac{C_0\epsilon}{2}\Big\}\eta_n1_{\mathbf{G}} \leq 0
\]
for $n$ sufficiently large, which establishes the claim of the Lemma.
Lemma S.7.3. If $\mathbf{A}$ is an AM space with norm $\|\cdot\|_{\mathbf{A}}$ and unit $1_{\mathbf{A}}$, then $a_1 \leq a_2 + C1_{\mathbf{A}}$ for any $a_1, a_2 \in \mathbf{A}$ satisfying $\|a_1 - a_2\|_{\mathbf{A}} \leq C$.

Proof: Since $\mathbf{A}$ is an AM space with unit $1_{\mathbf{A}}$, we have that $\|a_1 - a_2\|_{\mathbf{A}} \leq C$ implies $|a_1 - a_2| \leq C1_{\mathbf{A}}$, and hence the claim follows trivially from $a_1 - a_2 \leq |a_1 - a_2|$.
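As a concrete finite-dimensional illustration of Lemma S.7.3 (our own hypothetical example, not used in the proofs): $\mathbf{R}^d$ with the sup norm and the componentwise order is an AM space whose unit $1_{\mathbf{A}}$ is the vector of ones, and the lemma's conclusion can be checked directly.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 6
unit = np.ones(d)  # the order unit 1_A of R^d under the sup norm

for _ in range(100):
    a1, a2 = rng.normal(size=d), rng.normal(size=d)
    C = np.max(np.abs(a1 - a2))  # ||a1 - a2||_A (sup norm)
    # Lemma S.7.3: ||a1 - a2||_A <= C implies a1 <= a2 + C * 1_A componentwise
    assert np.all(a1 <= a2 + C * unit + 1e-12)
```

The small tolerance only guards against floating-point rounding; the order inequality itself is exact.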
Lemma S.7.4. Let $\mathbf{A}$ and $\mathbf{C}$ be Banach spaces with norms $\|\cdot\|_{\mathbf{A}}$ and $\|\cdot\|_{\mathbf{C}}$, $\mathbf{A} = \mathbf{A}_1 \oplus \mathbf{A}_2$, and $F : \mathbf{A} \to \mathbf{C}$. Suppose $F(a_0) = 0$ and that there are $\epsilon_0 > 0$ and $K_0 < \infty$ such that:

(i) $F : \mathbf{A} \to \mathbf{C}$ is Fréchet differentiable at all $a \in B_{\epsilon_0}(a_0) \equiv \{a \in \mathbf{A} : \|a - a_0\|_{\mathbf{A}} \leq \epsilon_0\}$.

(ii) $\|F(a + h) - F(a) - \nabla F(a)[h]\|_{\mathbf{C}} \leq K_0\|h\|_{\mathbf{A}}^2$ for all $a, a + h \in B_{\epsilon_0}(a_0)$.

(iii) $\|\nabla F(a_1) - \nabla F(a_2)\|_o \leq K_0\|a_1 - a_2\|_{\mathbf{A}}$ for all $a_1, a_2 \in B_{\epsilon_0}(a_0)$.

(iv) $\nabla F(a_0) : \mathbf{A} \to \mathbf{C}$ has $\|\nabla F(a_0)\|_o \leq K_0$.

(v) $\nabla F(a_0) : \mathbf{A}_2 \to \mathbf{C}$ is bijective and $\|\nabla F(a_0)^{-1}\|_o \leq K_0$.

Then, for all $h_1 \in \mathbf{A}_1$ with $\|h_1\|_{\mathbf{A}} \leq \{\frac{\epsilon_0}{2} \wedge (4K_0^2)^{-1} \wedge 1\}^2$ there is a unique $h_2^\star(h_1) \in \mathbf{A}_2$ with $F(a_0 + h_1 + h_2^\star(h_1)) = 0$. In addition, $h_2^\star(h_1)$ satisfies $\|h_2^\star(h_1)\|_{\mathbf{A}} \leq 4K_0^2\|h_1\|_{\mathbf{A}}$ for arbitrary $\mathbf{A}_1$, and $\|h_2^\star(h_1)\|_{\mathbf{A}} \leq 2K_0^2\|h_1\|_{\mathbf{A}}^2$ when $\mathbf{A}_1 = \mathcal{N}(\nabla F(a_0))$.
Proof: We closely follow the arguments in the proof of Theorem 4.B in Zeidler (1985). First, we define $g : \mathbf{A}_1 \times \mathbf{A}_2 \to \mathbf{C}$ pointwise for any $h_1 \in \mathbf{A}_1$ and $h_2 \in \mathbf{A}_2$ by
\[
g(h_1, h_2) \equiv \nabla F(a_0)[h_2] - F(a_0 + h_1 + h_2). \tag{S.301}
\]
Since $\nabla F(a_0) : \mathbf{A}_2 \to \mathbf{C}$ is bijective by hypothesis, $F(a_0 + h_1 + h_2) = 0$ if and only if
\[
h_2 = \nabla F(a_0)^{-1}[g(h_1, h_2)]. \tag{S.302}
\]
Letting $T_{h_1} : \mathbf{A}_2 \to \mathbf{A}_2$ be given by $T_{h_1}(h_2) = \nabla F(a_0)^{-1}[g(h_1, h_2)]$, we see from (S.302) that the desired $h_2^\star(h_1)$ must be a fixed point of $T_{h_1}$. Next, define the set
\[
M_0 \equiv \{h_2 \in \mathbf{A}_2 : \|h_2\|_{\mathbf{A}} \leq \delta_0\}
\]
for $\delta_0 \equiv \frac{\epsilon_0}{2} \wedge (4K_0^2)^{-1} \wedge 1$, and consider an arbitrary $h_1 \in \mathbf{A}_1$ with $\|h_1\|_{\mathbf{A}} \leq \delta_0^2$. Notice that then $a_0 + h_1 + h_2 \in B_{\epsilon_0}(a_0)$ for any $h_2 \in M_0$ and hence $g$ is differentiable with respect to $h_2$ with derivative $\nabla_2g(h_1, h_2) \equiv \nabla F(a_0) - \nabla F(a_0 + h_1 + h_2)$. Thus, if $h_2, \bar{h}_2 \in M_0$, then Proposition 7.3.2 in Luenberger (1969) implies that
\[
\|g(h_1, h_2) - g(h_1, \bar{h}_2)\|_{\mathbf{C}} \leq \sup_{0<\tau<1}\|\nabla_2g(h_1, \bar{h}_2 + \tau(h_2 - \bar{h}_2))\|_o\|h_2 - \bar{h}_2\|_{\mathbf{A}} \leq \frac{1}{2K_0}\|h_2 - \bar{h}_2\|_{\mathbf{A}}, \tag{S.303}
\]
where the final inequality follows by Condition (iii) and $\delta_0^2 \leq \delta_0 \leq (4K_0^2)^{-1}$. Moreover,
\[
\|\nabla F(a_0)[h_2] - \nabla F(a_0 + h_1)[h_2]\|_{\mathbf{C}} \leq \|\nabla F(a_0) - \nabla F(a_0 + h_1)\|_o\|h_2\|_{\mathbf{A}} \leq K_0\|h_1\|_{\mathbf{A}}\|h_2\|_{\mathbf{A}} \leq \frac{\|h_2\|_{\mathbf{A}}}{4K_0} \tag{S.304}
\]
by Condition (iii) and ‖h1‖A ≤ δ0 ≤ (4K20 )−1. Similarly, for any h2 ∈M0 we have
‖F (a0 + h1 + h2)− F (a0 + h1)−∇F (a0 + h1)[h2]‖C ≤ K0‖h2‖2A ≤‖h2‖A4K0
due to a0 + h1 ∈ Bε0(a0) and Condition (ii). Moreover, since F (a0) = 0 by hypothesis,
95
Conditions (ii) and (iv), ‖h1‖A ≤ δ20 , and δ0 ≤ (4K2
0 )−1 yield that
‖F (a0+h1)‖C = ‖F (a0+h1)−F (a0)‖C ≤ K0‖h1‖2A+‖∇F (a0)‖o‖h1‖A ≤δ0
2K0. (S.305)
Hence, by (S.301) and (S.304)-(S.305) we obtain for any h2 ∈M0 and h1 with ‖h1‖A ≤ δ20
‖g(h1, h2)‖C ≤‖h2‖A2K0
+δ0
2K0≤ δ0
K0. (S.306)
Thus, since ‖∇F (a0)−1‖o ≤ K0 by Condition (v), result (S.306) implies Th1 : M0 →M0, and (S.303) yields ‖Th1(h2) − Th1(h2)‖A ≤ 2−1‖h2 − h2‖A for any h2, h2 ∈ M0.
By Theorem 1.1.1.A in Zeidler (1985) we then conclude Th1 has a unique fixed point
h?2(h1) ∈M0, and the first claim of the Lemma follows from (S.301) and (S.302).
Next, we note that since h?2(h1) is a fixed point of Th1 , we can conclude that
‖h?2(h1)‖A = ‖Th1(h?2(h1))‖A ≤ ‖Th1(h?2(h1))− Th1(0)‖A + ‖Th1(0)‖A. (S.307)
Thus, since (S.303) and ‖∇F (a0)−1‖o ≤ K0 imply ‖Th1(h?2(h1))−Th1(0)‖A ≤ 2−1‖h?2(h1)‖A,
it follows from result (S.307) and Th1(0) ≡ −∇F (a0)−1F (a0 + h1) that
1
2‖h?2(h1)‖A ≤ ‖Th1(0)‖A ≤ K0‖F (a0 + h1)‖C
≤ K0K0‖h1‖2A + ‖∇F (a0)‖o‖h1‖A ≤ 2K20‖h1‖A, (S.308)
where in the second inequality we employed ‖∇F (a0)−1‖o ≤ K0, in the third inequality
we used (S.305), and in the final inequality we exploited ‖h1‖A ≤ 1. While the estimate
in (S.308) applies for generic A1, we note that if in addition A1 = N (∇F (a0)), then
1
2‖h?2(h1)‖A ≤ ‖Th1(0)‖A ≤ K0‖F (a0 + h1)‖C ≤ K2
0‖h1‖2A ,
due to F (a0) = 0 and ∇F (a0)[h1] = 0, and thus the final claim of the lemma follows.
S.8 Coupling via Koltchinskii (1994)
In this section we develop uniform coupling results for empirical processes that help
verify Assumption 3.7 in specific applications. Our analysis is based on the Hungarian
construction of Massart (1989) and Koltchinskii (1994), and we state the results in a
notation that abstracts from the rest of the paper due to their potential independent
interest. Thus, in what follows we consider V ∈ Rdv to be a generic random variable
distributed according to P ∈ P, denote its support under P by Ω(P ) ⊂ Rdv , and let λ
denote the Lebesgue measure on Rdv .
The rates obtained through a Hungarian construction crucially depend on the ability
of the functions indexing the empirical process to be approximated by a suitable Haar
basis. Here, we follow Koltchinskii (1994) and control the relevant approximation errors
through primitive conditions stated in terms of the integral modulus of continuity. For a measure P and a function f : R^{dv} → R, the integral modulus of continuity of f is the function ϖ(f, ·, P) : R+ → R+ defined for every h ∈ R+ as

ϖ(f, h, P) ≡ sup_{‖s‖≤h} ( ∫_{Ω(P)} (f(v + s) − f(v))² 1{v + s ∈ Ω(P)} dP(v) )^{1/2}. (S.309)

Intuitively, the integral modulus of continuity quantifies the "smoothness" of a function f by examining the difference between f and its own translation. For example, it is straightforward to verify that ϖ(f, h, P) ≲ h whenever f is Lipschitz. In contrast, indicator functions such as f(v) = 1{v ≤ t} typically satisfy ϖ(f, h, P) ≲ h^{1/2}.
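The two displayed rates are easy to see numerically. The sketch below (ours) estimates ϖ(f, h, P) by Monte Carlo for P = Uniform[0,1], comparing a Lipschitz function with an indicator; the assumed distribution and sample sizes are illustrative only.

```python
# Monte Carlo sketch of the integral modulus of continuity ϖ(f, h, P) in (S.309)
# for P = Uniform[0,1], so that Ω(P) = [0,1]. (Illustrative assumption, ours.)
import math
import random

def varpi(f, h, n_shift=10, n_mc=20000, seed=0):
    rng = random.Random(seed)
    draws = [rng.random() for _ in range(n_mc)]
    best = 0.0
    for k in range(1, n_shift + 1):
        for s in (h * k / n_shift, -h * k / n_shift):
            # estimate ∫ (f(v+s) - f(v))^2 1{v+s ∈ Ω(P)} dP(v) by a sample average
            m = sum((f(v + s) - f(v)) ** 2 for v in draws if 0.0 <= v + s <= 1.0) / n_mc
            best = max(best, m)
    return math.sqrt(best)

lip = lambda v: v                          # Lipschitz: ϖ(f, h, P) of order h
ind = lambda v: 1.0 if v <= 0.5 else 0.0   # indicator: ϖ(f, h, P) of order h**0.5

for h in (0.1, 0.01):
    print(h, varpi(lip, h), varpi(ind, h))
```

Shrinking h by a factor of ten shrinks the Lipschitz modulus by roughly ten, but the indicator modulus only by roughly √10, matching the two displayed rates.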
We impose the following assumptions to establish the uniform coupling results.
Assumption S.8.1. (i) For all P ∈ P, P ≪ λ and Ω(P) is compact; (ii) The densities dP/dλ satisfy sup_{P∈P} sup_{v∈Ω(P)} (dP/dλ)(v) < ∞ and inf_{P∈P} inf_{v∈Ω(P)} (dP/dλ)(v) > 0.

Assumption S.8.2. (i) For each P ∈ P there is a continuously differentiable bijection TP : [0,1]^{dv} → Ω(P); (ii) The Jacobian JTP and its determinant |JTP| satisfy inf_{P∈P} inf_{v∈[0,1]^{dv}} |JTP(v)| > 0 and sup_{P∈P} sup_{v∈[0,1]^{dv}} ‖JTP(v)‖o < ∞.

Assumption S.8.3. The classes of functions Fn satisfy: (i) sup_{P∈P} sup_{f∈Fn} ϖ(f, h, P) ≤ ϕn(h) for some ϕn : R+ → R+ satisfying ϕn(Ch) ≤ C^κ ϕn(h) for all n, C > 0, and some κ > 0; and (ii) sup_{P∈P} sup_{f∈Fn} ‖f‖P,∞ ≤ Kn for some Kn > 0.
In Assumption S.8.1 we impose that V ∼ P be continuously distributed for all
P ∈ P, with uniformly (in P ) bounded supports and densities bounded from above
and away from zero. Assumption S.8.2 requires that the support of V under each P
be “smooth” in the sense that it may be seen as a differentiable transformation of the
unit square. Together, Assumptions S.8.1 and S.8.2 enable us to construct partitions
of Ω(P ) such that the diameter of each set in the partition is controlled uniformly in
P ; see Lemma S.8.1. As a result, the approximation error by the Haar bases implied
by each partition can be controlled uniformly by the integral modulus of continuity; see
Lemma S.8.2. Together with Assumption S.8.3, which imposes conditions on the integral
modulus of continuity of Fn uniformly in P , we can obtain a uniform coupling result
through the analysis in Koltchinskii (1994). We note that the homogeneity condition
on ϕn in Assumption S.8.3(i) is not necessary, but imposed to simplify the bound.
The next theorem provides us with an important tool for verifying Assumption
3.7. In its statement we employ the same notation as in the main text, where recall
N[](ε, Fn, ‖·‖P,2) denotes the smallest number of brackets of size ε (under ‖·‖P,2) needed to cover Fn, while J[](ε, Fn, ‖·‖P,2) is defined as the integral

J[](ε, Fn, ‖·‖P,2) ≡ ∫_0^ε √(1 + log N[](u, Fn, ‖·‖P,2)) du.

Theorem S.8.1. Let Assumptions S.8.1-S.8.3 hold, {Vi}_{i=1}^n be i.i.d. with Vi ∼ P ∈ P, and for any δn ↓ 0 let Nn ≡ sup_{P∈P} N[](δn, Fn, ‖·‖P,2), Jn ≡ sup_{P∈P} J[](δn, Fn, ‖·‖P,2), and

Sn ≡ ( ∑_{i=0}^{⌈log₂ n⌉} 2^i ϕn²(2^{-i/dv}) )^{1/2}. (S.310)

If Nn ↑ ∞, then there are Gaussian processes {Wn,P}_{n=1}^∞ such that uniformly in P ∈ P

‖Gn,P − Wn,P‖Fn = OP( Kn log(nNn)/√n + Kn √(log(nNn) log(n)) Sn/√n + Jn(1 + JnKn/(δn²√n)) ). (S.311)
Theorem S.8.1 is a mild modification of the results in Koltchinskii (1994). The proof
of Theorem S.8.1 relies on a coupling of the empirical process on a sequence of grids of
cardinality Nn, and employs the equicontinuity of Gn,P and Wn,P to obtain a coupling
on the entire class Fn. The conclusion of Theorem S.8.1 applies to any choice of grid
accuracy δn. In order to obtain the best rate, δn must be chosen to balance the terms
in (S.311) and thus depends on the metric entropy of Fn through the terms Nn and Jn.
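To see how Sn in (S.310) interacts with Assumption S.8.3, consider the worked special case (ours) ϕn(h) = C h^α, covering Lipschitz-type classes (α = 1) and indicator-type classes (α = 1/2):

```latex
S_n^2 = \sum_{i=0}^{\lceil \log_2 n \rceil} 2^i \varphi_n^2\big(2^{-i/d_v}\big)
      = C^2 \sum_{i=0}^{\lceil \log_2 n \rceil} 2^{i(1 - 2\alpha/d_v)}
      \asymp
\begin{cases}
1 & \text{if } 2\alpha > d_v, \\
\log_2 n & \text{if } 2\alpha = d_v, \\
n^{1 - 2\alpha/d_v} & \text{if } 2\alpha < d_v.
\end{cases}
```

For instance, for indicator-type classes with dv = 1 we obtain Sn ≍ √(log₂ n), so the Sn term in (S.311) contributes only logarithmic factors.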
Below, we include the proof of Theorem S.8.1 and auxiliary results.
Proof of Theorem S.8.1: Let ∆i(P) be the partitions of Ω(P) in Lemma S.8.1 and BP,i the σ-algebra generated by ∆i(P). By Lemma S.8.2 and Assumption S.8.3,

sup_{P∈P} sup_{f∈Fn} ( ∑_{i=0}^{⌈log₂ n⌉} 2^i EP[(f(V) − EP[f(V)|BP,i])²] )^{1/2} ≤ C1 ( ∑_{i=0}^{⌈log₂ n⌉} 2^i ϕn²(2^{-i/dv}) )^{1/2} ≡ C1Sn (S.312)

for some constant C1 > 0 and for Sn as defined in (S.310). Next, let FP,n,δn ⊆ Fn denote a finite δn-net of Fn with respect to ‖·‖P,2. Since N(ε, Fn, ‖·‖P,2) ≤ N[](ε, Fn, ‖·‖P,2), it follows from the definition of Nn that we may choose FP,n,δn so that

sup_{P∈P} card{FP,n,δn} ≤ sup_{P∈P} N[](δn, Fn, ‖·‖P,2) ≡ Nn. (S.313)

By Theorem 3.5 in Koltchinskii (1994), (S.312) and (S.313), it follows that for each n ≥ 1 there exists an isonormal process Wn,P such that for all η1 > 0, η2 > 0

sup_{P∈P} P( (√n/Kn) ‖Gn,P − Wn,P‖_{FP,n,δn} ≥ η1 + √(η1η2)(C1Sn + 1) ) ≲ Nn exp{−C2η1} + n exp{−C2η2}, (S.314)

for some C2 > 0. Since Nn ↑ ∞, (S.314) implies for any ε > 0 there are C3 > 0, C4 > 0 sufficiently large, such that setting η1 ≡ C3 log(Nn) and η2 ≡ C3 log(n) yields

sup_{P∈P} P( ‖Gn,P − Wn,P‖_{FP,n,δn} ≥ C4Kn × [log(nNn) + √(log(Nn) log(n)) Sn]/√n ) < ε. (S.315)

Next, note that by definition of FP,n,δn, there exists a Γn,P : Fn → FP,n,δn such that sup_{P∈P} sup_{f∈Fn} ‖f − Γn,P f‖P,2 ≤ δn. Let D(ε, Fn, ‖·‖P,2) denote the ε-packing number for Fn under ‖·‖P,2, and note D(ε, Fn, ‖·‖P,2) ≤ N[](ε, Fn, ‖·‖P,2). Therefore, by Corollary 2.2.8 in van der Vaart and Wellner (1996) we can conclude that

sup_{P∈P} EP[‖Wn,P − Wn,P ∘ Γn,P‖Fn] ≲ sup_{P∈P} ∫_0^{δn} √(log D(ε, Fn, ‖·‖P,2)) dε ≤ sup_{P∈P} J[](δn, Fn, ‖·‖P,2) ≡ Jn. (S.316)

Similarly, employing Lemma 3.4.2 in van der Vaart and Wellner (1996) yields that

sup_{P∈P} EP[‖Gn,P − Gn,P ∘ Γn,P‖Fn] ≲ sup_{P∈P} J[](δn, Fn, ‖·‖P,2) ( 1 + sup_{P∈P} J[](δn, Fn, ‖·‖P,2) Kn/(δn²√n) ) ≡ Jn(1 + JnKn/(δn²√n)). (S.317)

Therefore, combining (S.315), (S.316), and (S.317) together with the decomposition

‖Gn,P − Wn,P‖Fn ≤ ‖Gn,P − Wn,P‖_{FP,n,δn} + ‖Gn,P − Gn,P ∘ Γn,P‖Fn + ‖Wn,P − Wn,P ∘ Γn,P‖Fn,

establishes the claim of the theorem by Markov's inequality.
Lemma S.8.1. Let BP denote the completion of the Borel σ-algebra on Ω(P) with respect to P. If Assumptions S.8.1(i)(ii) and S.8.2(i)(ii) hold, then for each P ∈ P there exists a sequence {∆i(P)} of partitions of (Ω(P), BP, P) such that:

(i) ∆i(P) = {∆_{i,k}(P) : k = 0, ..., 2^i − 1}, ∆_{i,k}(P) ∈ BP, and ∆_{0,0}(P) = Ω(P).

(ii) ∆_{i,k}(P) = ∆_{i+1,2k}(P) ∪ ∆_{i+1,2k+1}(P) and ∆_{i+1,2k}(P) ∩ ∆_{i+1,2k+1}(P) = ∅ for any integers k = 0, ..., 2^i − 1 and i ≥ 0.

(iii) P(∆_{i+1,2k}(P)) = P(∆_{i+1,2k+1}(P)) = 2^{-i-1} for k = 0, ..., 2^i − 1, i ≥ 0.

(iv) sup_{P∈P} max_{0≤k≤2^i−1} sup_{v,v′∈∆_{i,k}(P)} ‖v − v′‖₂ = O(2^{-i/dv}).

(v) BP equals the completion with respect to P of the σ-algebra generated by ⋃_{i≥0} ∆i(P).
Proof: Let A denote the Borel σ-algebra on [0,1]^{dv}, and for any A ∈ A define

QP(A) ≡ P(TP(A)), (S.318)

where TP(A) ∈ BP due to TP^{-1} being measurable. Moreover, QP([0,1]^{dv}) = 1 due to TP being surjective, and QP is σ-additive due to TP being injective. Hence, we conclude QP defined by (S.318) is a probability measure on ([0,1]^{dv}, A). In addition, for λ the Lebesgue measure, we obtain from Theorem 3.7.1 in Bogachev (2007) that

QP(A) = P(TP(A)) = ∫_{TP(A)} (dP/dλ)(v) dλ(v) = ∫_A (dP/dλ)(TP(a)) |JTP(a)| dλ(a), (S.319)

where |JTP(a)| denotes the Jacobian determinant of TP at any point a ∈ [0,1]^{dv}. Hence, QP has density with respect to Lebesgue measure given by gP(a) ≡ (dP/dλ)(TP(a)) |JTP(a)| for any a ∈ [0,1]^{dv}. Next, let a = (a_1, ..., a_{dv})′ ∈ [0,1]^{dv} and define for any t ∈ [0,1]

G_{l,P}(t|A) ≡ QP({a ∈ A : a_l ≤ t})/QP(A), (S.320)

for any set A ∈ A and 1 ≤ l ≤ dv. Further let m(i) ≡ i − ⌊(i−1)/dv⌋ × dv (i.e. m(i) equals i modulo dv), set ∆̄_{0,0}(P) = [0,1]^{dv}, and inductively define the partitions ∆̄_i(P) ≡ {∆̄_{i,k}(P) : 0 ≤ k ≤ 2^i − 1} of [0,1]^{dv} through

∆̄_{i+1,2k}(P) ≡ {a ∈ ∆̄_{i,k}(P) : G_{m(i+1),P}(a_{m(i+1)} | ∆̄_{i,k}(P)) ≤ 1/2}
∆̄_{i+1,2k+1}(P) ≡ ∆̄_{i,k}(P) \ ∆̄_{i+1,2k}(P) (S.321)

for 0 ≤ k ≤ 2^i − 1. For cl{∆̄_{i,k}(P)} the closure of ∆̄_{i,k}(P), we then note that by construction each ∆̄_{i,k}(P) is a hyper-rectangle in [0,1]^{dv}, i.e. it is of the general form

cl{∆̄_{i,k}(P)} = ∏_{j=1}^{dv} [l_{i,k,j}(P), u_{i,k,j}(P)].

Moreover, since gP is positive everywhere on [0,1]^{dv} by Assumptions S.8.1(ii) and S.8.2(ii), it follows that for any i ≥ 0, 0 ≤ k ≤ 2^i − 1, and 1 ≤ j ≤ dv we have

l_{i+1,2k,j}(P) = l_{i,k,j}(P), and u_{i+1,2k,j}(P) = u_{i,k,j}(P) if j ≠ m(i+1), while u_{i+1,2k,j}(P) solves G_{m(i+1),P}(u_{i+1,2k,j}(P) | ∆̄_{i,k}(P)) = 1/2 if j = m(i+1). (S.322)

Similarly, since ∆̄_{i+1,2k+1}(P) = ∆̄_{i,k}(P) \ ∆̄_{i+1,2k}(P), it additionally follows that

u_{i+1,2k+1,j}(P) = u_{i,k,j}(P), and l_{i+1,2k+1,j}(P) = l_{i,k,j}(P) if j ≠ m(i+1), while l_{i+1,2k+1,j}(P) = u_{i+1,2k,j}(P) if j = m(i+1). (S.323)

Since QP(cl{∆̄_{i+1,2k}(P)}) = QP(∆̄_{i+1,2k}(P)) by QP ≪ λ, (S.320) and (S.322) yield

QP(∆̄_{i+1,2k}(P)) = QP({a ∈ ∆̄_{i,k}(P) : a_{m(i+1)} ≤ u_{i+1,2k,m(i+1)}(P)})
 = G_{m(i+1),P}(u_{i+1,2k,m(i+1)}(P) | ∆̄_{i,k}(P)) QP(∆̄_{i,k}(P)) = (1/2) QP(∆̄_{i,k}(P)).

Therefore, since ∆̄_{i,k}(P) = ∆̄_{i+1,2k}(P) ∪ ∆̄_{i+1,2k+1}(P), it follows that QP(∆̄_{i+1,2k+1}(P)) = (1/2) QP(∆̄_{i,k}(P)) for 0 ≤ k ≤ 2^i − 1 as well. In particular, QP(∆̄_{0,0}(P)) = 1 implies that

QP(∆̄_{i,k}(P)) = 2^{-i} (S.324)

for any integers i ≥ 1 and 0 ≤ k ≤ 2^i − 1. Moreover, we note that result (S.319) and Assumptions S.8.1(ii) and S.8.2(ii) together imply that the density gP of QP satisfies

0 < inf_{P∈P} inf_{a∈[0,1]^{dv}} gP(a) ≤ sup_{P∈P} sup_{a∈[0,1]^{dv}} gP(a) < ∞, (S.325)

and therefore QP(A) ≍ λ(A) uniformly in A ∈ A and P ∈ P. Hence, since by (S.322) u_{i+1,2k,j}(P) = u_{i,k,j}(P) and l_{i+1,2k,j}(P) = l_{i,k,j}(P) for all j ≠ m(i+1), we obtain

(u_{i+1,2k,m(i+1)}(P) − l_{i+1,2k,m(i+1)}(P))/(u_{i,k,m(i+1)}(P) − l_{i,k,m(i+1)}(P))
 = ∏_{j=1}^{dv} (u_{i+1,2k,j}(P) − l_{i+1,2k,j}(P)) / ∏_{j=1}^{dv} (u_{i,k,j}(P) − l_{i,k,j}(P))
 = λ(∆̄_{i+1,2k}(P))/λ(∆̄_{i,k}(P)) ≍ QP(∆̄_{i+1,2k}(P))/QP(∆̄_{i,k}(P)) = 1/2 (S.326)

uniformly in P ∈ P, i ≥ 0, and 0 ≤ k ≤ 2^i − 1 by results (S.324) and (S.325). Moreover, by identical arguments but using (S.323) instead of (S.322) we conclude

(u_{i+1,2k+1,m(i+1)}(P) − l_{i+1,2k+1,m(i+1)}(P))/(u_{i,k,m(i+1)}(P) − l_{i,k,m(i+1)}(P)) ≍ 1/2 (S.327)

also uniformly in P ∈ P, i ≥ 0, and 0 ≤ k ≤ 2^i − 1. Thus, since (u_{i+1,2k,j}(P) − l_{i+1,2k,j}(P)) = (u_{i+1,2k+1,j}(P) − l_{i+1,2k+1,j}(P)) = (u_{i,k,j}(P) − l_{i,k,j}(P)) for all j ≠ m(i+1), and u_{0,0,j}(P) − l_{0,0,j}(P) = 1 for all 1 ≤ j ≤ dv, we obtain from m(i) = i − ⌊(i−1)/dv⌋ × dv, results (S.326) and (S.327), and proceeding inductively that

(u_{i,k,j}(P) − l_{i,k,j}(P)) ≍ 2^{-i/dv} (S.328)

uniformly in P ∈ P, i ≥ 0, 0 ≤ k ≤ 2^i − 1, and 1 ≤ j ≤ dv. Thus, result (S.328) yields

sup_{P∈P} max_{0≤k≤2^i−1} sup_{a,a′∈∆̄_{i,k}(P)} ‖a − a′‖ ≤ sup_{P∈P} max_{0≤k≤2^i−1} max_{1≤j≤dv} √dv × (u_{i,k,j}(P) − l_{i,k,j}(P)) = O(2^{-i/dv}). (S.329)

We next obtain the desired sequence of partitions ∆i(P) of (Ω(P), BP, P) by constructing them from the partitions ∆̄_i(P) of [0,1]^{dv}. To this end, set

∆_{i,k}(P) ≡ TP(∆̄_{i,k}(P))

for all i ≥ 0 and 0 ≤ k ≤ 2^i − 1, and ∆i(P) ≡ {∆_{i,k}(P) : 0 ≤ k ≤ 2^i − 1}. Note that ∆i(P) satisfies conditions (i) and (ii) due to TP^{-1} being a measurable map, TP being bijective, and result (S.321). In addition, ∆i(P) satisfies condition (iii) since by definition (S.318) and result (S.324) we have

P(∆_{i,k}(P)) = P(TP(∆̄_{i,k}(P))) = QP(∆̄_{i,k}(P)) = 2^{-i}

for all 0 ≤ k ≤ 2^i − 1. Moreover, by Assumption S.8.2(ii), sup_{P∈P} sup_{a∈[0,1]^{dv}} ‖JTP(a)‖o < ∞, and hence by the mean value theorem we can conclude that

sup_{P∈P} max_{0≤k≤2^i−1} sup_{v,v′∈∆_{i,k}(P)} ‖v − v′‖ = sup_{P∈P} max_{0≤k≤2^i−1} sup_{a,a′∈∆̄_{i,k}(P)} ‖TP(a) − TP(a′)‖
 ≲ sup_{P∈P} max_{0≤k≤2^i−1} sup_{a,a′∈∆̄_{i,k}(P)} ‖a − a′‖ = O(2^{-i/dv})

by result (S.329), which verifies that ∆i(P) satisfies condition (iv). Also note that to verify ∆i(P) satisfies condition (v) it suffices to show that ⋃_{i≥0} ∆i(P) generates the Borel σ-algebra on Ω(P). To this end, we first aim to show that

A = σ(⋃_{i≥0} ∆̄_i(P)), (S.330)

where for a collection of sets C, σ(C) denotes the σ-algebra generated by C. For any closed set A ∈ A, then define Di(P) to be given by

Di(P) ≡ ⋃_{k : ∆̄_{i,k}(P) ∩ A ≠ ∅} ∆̄_{i,k}(P).

Notice that since ∆̄_i(P) is a partition of [0,1]^{dv}, A ⊆ Di(P) for all i ≥ 0 and hence A ⊆ ⋂_{i≥0} Di(P). Moreover, if a_0 ∈ A^c, then A^c being open and (S.329) imply a_0 ∉ Di(P) for i sufficiently large. Hence, A^c ∩ (⋂_{i≥0} Di(P)) = ∅ and therefore A = ⋂_{i≥0} Di(P). It follows that if A is closed, then A ∈ σ(⋃_{i≥0} ∆̄_i(P)), which implies A ⊆ σ(⋃_{i≥0} ∆̄_i(P)). On the other hand, since ∆̄_{i,k}(P) is Borel for all i ≥ 0 and 0 ≤ k ≤ 2^i − 1, we also have σ(⋃_{i≥0} ∆̄_i(P)) ⊆ A, and hence (S.330) follows. To conclude, we then note that

σ(⋃_{i≥0} ∆_i(P)) = σ(⋃_{i≥0} TP(∆̄_i(P))) = TP(σ(⋃_{i≥0} ∆̄_i(P))) = TP(A), (S.331)

by Corollary 1.2.9 in Bogachev (2007). However, TP and TP^{-1} being continuous implies TP(A) equals the Borel σ-algebra on Ω(P), and therefore (S.331) implies ∆i(P) satisfies condition (v), establishing the lemma.
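The recursion above is straightforward to mimic numerically. The sketch below (ours) replaces the exact conditional-quantile splits of (S.321) with sample medians on a point cloud in [0,1]², so every split remains axis-aligned, each depth-i cell carries ≈ 2^{−i} of the points, and cell diameters shrink at the 2^{−i/dv} rate of condition (iv).

```python
# Empirical analogue of the dyadic partition (S.321) with dv = 2: cells are split at
# the sample median along coordinate m(i+1), cycling through coordinates.
# (Illustrative sketch, ours; the proof splits at exact conditional quantiles of Q_P.)
import random

def build_partition(points, depth, dv=2):
    cells = [points]
    for i in range(depth):
        axis = i % dv  # coordinate m(i+1) - 1 being halved at this depth
        nxt = []
        for cell in cells:
            cut = sorted(p[axis] for p in cell)[len(cell) // 2]  # sample median
            nxt.append([p for p in cell if p[axis] < cut])
            nxt.append([p for p in cell if p[axis] >= cut])
        cells = nxt
    return cells

rng = random.Random(0)
pts = [(rng.random(), rng.random()) for _ in range(2 ** 14)]
cells = build_partition(pts, depth=6)
sizes = [len(c) for c in cells]
print(len(cells), min(sizes), max(sizes))  # 64 equally sized cells of 256 points each
```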
Lemma S.8.2. Let ∆i(P) be as in Lemma S.8.1, and BP,i denote the σ-algebra generated by ∆i(P). If Assumptions S.8.1(i)(ii) and S.8.2(i)(ii) hold, then there are K0 > 0, K1 > 0 such that for all P ∈ P and any f satisfying f ∈ L²_P for all P ∈ P:

EP[(f(V) − EP[f(V)|BP,i])²] ≤ K0 × ϖ²(f, K1 × 2^{-i/dv}, P).

Proof: Since ∆i(P) is a partition of Ω(P) and P(∆_{i,k}(P)) = 2^{-i} for all i ≥ 0 and 0 ≤ k ≤ 2^i − 1, we may express EP[f(V)|BP,i] as an element of L²_P by

EP[f(V)|BP,i] = 2^i ∑_{k=0}^{2^i−1} 1{v ∈ ∆_{i,k}(P)} ∫_{∆_{i,k}(P)} f(ṽ) dP(ṽ).

Hence, employing that P(∆_{i,k}(P)) = 2^{-i} for all i ≥ 0 and 0 ≤ k ≤ 2^i − 1 together with ∆i(P) being a partition of Ω(P), and applying Hölder's inequality to the term (f(v) − f(ṽ)) 1{ṽ ∈ Ω(P)} 1{ṽ ∈ ∆_{i,k}(P)}, we can conclude that

EP[(f(V) − EP[f(V)|BP,i])²]
 = ∑_{k=0}^{2^i−1} ∫_{∆_{i,k}(P)} ( f(v) − 2^i ∫_{∆_{i,k}(P)} f(ṽ) dP(ṽ) )² dP(v)
 = ∑_{k=0}^{2^i−1} 2^{2i} ∫_{∆_{i,k}(P)} ( ∫_{∆_{i,k}(P)} (f(v) − f(ṽ)) 1{ṽ ∈ Ω(P)} dP(ṽ) )² dP(v)
 ≤ ∑_{k=0}^{2^i−1} 2^{2i} P(∆_{i,k}(P)) ∫_{∆_{i,k}(P)} ∫_{∆_{i,k}(P)} (f(v) − f(ṽ))² 1{ṽ ∈ Ω(P)} dP(ṽ) dP(v)
 = ∑_{k=0}^{2^i−1} 2^i ∫_{∆_{i,k}(P)} ∫_{∆_{i,k}(P)} (f(v) − f(ṽ))² 1{ṽ ∈ Ω(P)} dP(ṽ) dP(v).

Let Di ≡ sup_{P∈P} max_{0≤k≤2^i−1} diam{∆_{i,k}(P)}, where diam{∆_{i,k}(P)} is the diameter of ∆_{i,k}(P). Further note that by Lemma S.8.1(iv), Di = O(2^{-i/dv}) and hence we have λ({s ∈ R^{dv} : ‖s‖ ≤ Di}) ≤ M1 2^{-i} for some M1 > 0 and λ the Lebesgue measure. Noting that the densities dP/dλ are bounded above and away from zero uniformly in P ∈ P by Assumption S.8.1(ii), and doing the change of variables s = ṽ − v, we then obtain for some constant M0 > 0 that

EP[(f(V) − EP[f(V)|BP,i])²]
 ≤ M0 ∑_{k=0}^{2^i−1} 2^i ∫_{∆_{i,k}(P)} ∫_{∆_{i,k}(P)} (f(v) − f(ṽ))² 1{ṽ ∈ Ω(P)} dλ(ṽ) dλ(v)
 ≤ M0M1 sup_{‖s‖≤Di} ∑_{k=0}^{2^i−1} ∫_{∆_{i,k}(P)} (f(v + s) − f(v))² 1{v + s ∈ Ω(P)} dλ(v). (S.332)

Hence, since {∆_{i,k}(P) : k = 0, ..., 2^i − 1} is a partition of Ω(P), ϖ(f, h, P) is nondecreasing in h, and Di ≤ K1 2^{-i/dv} for some K1 > 0 by Lemma S.8.1(iv), we obtain

EP[(f(V) − EP[f(V)|BP,i])²] ≤ M0M1 × ϖ²(f, K1 × 2^{-i/dv}, P) (S.333)

by (S.332). Setting K0 ≡ M0 × M1 in (S.333) then establishes the lemma.
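In the simplest setting the conclusion of Lemma S.8.2 can be checked directly. Assuming P = Uniform[0,1] and dv = 1 (our illustration), the cells are exact dyadic intervals, EP[f(V)|BP,i] is the cell average, and the L² error decays like ϖ(f, 2^{−i}, P): at rate 2^{−i} for a Lipschitz f and 2^{−i/2} for an indicator.

```python
# Numerical check of Lemma S.8.2 for P = Uniform[0,1], dv = 1, with dyadic cells
# Δ_{i,k} = [k 2^{-i}, (k+1) 2^{-i}); E_P[f(V)|B_{P,i}] is then the cell average.
# (Illustration under these simplifying assumptions; not part of the proof.)
import math

def l2_error(f, i, n_grid=2 ** 16):
    # Riemann-sum approximation of E_P[(f(V) - E_P[f(V)|B_{P,i}])^2]^{1/2}
    vals = [f((j + 0.5) / n_grid) for j in range(n_grid)]
    m = n_grid // 2 ** i  # grid points per dyadic cell
    err = 0.0
    for k in range(2 ** i):
        cell = vals[k * m:(k + 1) * m]
        avg = sum(cell) / m
        err += sum((v - avg) ** 2 for v in cell) / n_grid
    return math.sqrt(err)

lip = lambda v: v                           # error of order 2**(-i)
ind = lambda v: 1.0 if v <= 1 / 3 else 0.0  # error of order 2**(-i/2)
for i in (2, 4, 6):
    print(i, l2_error(lip, i), l2_error(ind, i))
```

Raising i by two divides the Lipschitz error by four but the indicator error only by two, in line with the two moduli of continuity.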
S.9 Uniform Bootstrap Coupling
We next provide uniform coupling results for the multiplier bootstrap that allow us to
verify Assumption 3.14 in a variety of problems. The results in this appendix may be of
independent interest, as they extend the validity of the multiplier bootstrap to suitable
non-Donsker classes Fn. For this reason, as in Section S.8, we state the results in a
notation that abstracts from the rest of the paper. Hence, here V ∈ Rdv should be
interpreted as a generic random variable whose distribution is given by P ∈ P.
Our coupling results rely on a series approximation to the elements of Fn. To this end, we will assume that for each P ∈ P there is a basis {f_{d,n,P}}_{d=1}^{dn}, with dn possibly diverging to infinity, that provides a suitable approximation to every f ∈ Fn. Formally, for f^{dn}_{n,P}(v) ≡ (f_{1,n,P}(v), ..., f_{dn,n,P}(v))′, we impose the following:

Assumption S.9.1. For each P ∈ P there is an array of functions {f_{d,n,P}}_{d=1}^{dn} ⊂ L²_P such that: (i) The eigenvalues of EP[f^{dn}_{n,P}(V) f^{dn}_{n,P}(V)′] are bounded above by Cn uniformly in P ∈ P, where 1 ≤ Cn; (ii) sup_{P∈P} max_{1≤d≤dn} ‖f_{d,n,P}‖P,∞ ≤ Kn with 1 ≤ Kn finite.
Assumption S.9.2. For every f ∈ Fn and P ∈ P there is a βn,P(f) ∈ R^{dn} such that: (i) The class Gn,P ≡ {f − f^{dn′}_{n,P} βn,P(f) : f ∈ Fn} has envelope Gn,P which satisfies ‖g‖P,2 ≤ δn‖Gn,P‖P,2 for all P ∈ P, g ∈ Gn,P, and some δn > 0 with

J1n ≡ sup_{P∈P} { J[](δn‖Gn,P‖P,2, Gn,P, ‖·‖P,2) + √n EP[ Gn,P(V) exp{ −nδn²‖Gn,P‖²P,2/(2G²n,P(V)ηn,P) } ] }

finite and ηn,P ≡ 1 + log N[](δn‖Gn,P‖P,2, Gn,P, ‖·‖P,2); (ii) The set Bn ≡ {βn,P(f) : f ∈ Fn, P ∈ P} ∪ {0} satisfies J2n ≡ ∫_0^∞ √(log N(ε, Bn, ‖·‖₂)) dε < ∞.
Assumption S.9.1 imposes our regularity conditions on the approximating functions {f_{d,n,P}}_{d=1}^{dn}. We emphasize that the functions {f_{d,n,P}}_{d=1}^{dn} need not be known, as they are merely employed in the theoretical construction of the bootstrap coupling, and not in the computation of the multiplier bootstrap process Wn. In certain applications, such as when Fn itself is finite dimensional, a basis {f_{d,n,P}}_{d=1}^{dn} may be naturally available. The approximating requirements on {f_{d,n,P}}_{d=1}^{dn} are formally imposed in Assumption S.9.2. In particular, Assumption S.9.2(i) requires that the remainder of the approximation of Fn by {f_{d,n,P}}_{d=1}^{dn} not be "too large." Intuitively, Assumption S.9.2(i) may be understood as controlling the "bias" in a series approximation of Fn by linear combinations of {f_{d,n,P}}_{d=1}^{dn}. Assumption S.9.2(ii) in turn controls the "variance" of the series approximation by demanding that the class of approximating functions have finite entropy. As in Section S.8, Assumption S.9.2 does not require the functions f ∈ Fn to be "smooth" in the sense of being differentiable.
We next show Assumptions S.9.1 and S.9.2 suffice for coupling Wn to W⋆n,P.

Theorem S.9.1. Let Assumptions S.9.1, S.9.2 hold, {(ωi, Vi)}_{i=1}^n be i.i.d. with Vi ∼ P ∈ P, ωi ∼ N(0,1), ωi and Vi independent, and suppose that dn log(1 + dn)Kn² = o(n). (i) Then, there is a linear Gaussian process W⋆n,P independent of {Vi}_{i=1}^n with

‖Wn − W⋆n,P‖Fn = OP( J2n (Kn²Cndn log(1 + dn)/n)^{1/4} + J1n )

uniformly in P ∈ P. (ii) If in addition the eigenvalues of EP[f^{dn}_{n,P}(V) f^{dn}_{n,P}(V)′] are bounded away from zero uniformly in n and P ∈ P, then uniformly in P ∈ P

‖Wn − W⋆n,P‖Fn = OP( J2nKn √(Cndn log(1 + dn))/√n + J1n ).
Theorem S.9.1(i) derives a rate of convergence for the coupled process, while Theorem S.9.1(ii) improves on the rate under the additional requirement that EP[f^{dn}_{n,P}(V) f^{dn}_{n,P}(V)′] be bounded away from singularity. The rates in both Theorem S.9.1(i) and Theorem S.9.1(ii) depend on the selected sequence dn, which should be chosen to deliver the best possible implied rate. Heuristically, the proof of Theorem S.9.1 proceeds in two steps. First, we construct a multivariate normal random variable W⋆n,P(f^{dn}_{n,P}) ∈ R^{dn} that is coupled with Wn(f^{dn}_{n,P}) ∈ R^{dn}, and then employ the linearity of Wn to obtain a suitable coupling on the subspace Sn,P ≡ span{f^{dn}_{n,P}}. Second, we employ Assumption S.9.2(i) to show that a successful coupling on Sn,P leads to the desired construction, since Fn is well approximated by {f_{d,n,P}}_{d=1}^{dn}. We note that, while we do not pursue it here for conciseness, the outlined heuristics can also be employed to verify Assumption 3.7 by coupling Gn,P(f^{dn}_{n,P}) to Wn,P(f^{dn}_{n,P}) through standard results (e.g. Yurinskii (1977)).
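The first step of the outlined heuristics rests on the fact that, conditional on the data, the multiplier process Wn f = n^{−1/2} ∑ᵢ ωᵢ(f(Vᵢ) − f̄n) is exactly Gaussian with covariance given by the centered sample covariance of the basis functions. The simulation below (ours, with hypothetical basis functions f1(v) = v and f2(v) = v²) illustrates this.

```python
# Conditional on the data, (W_n f1, W_n f2) is Gaussian with covariance equal to the
# centered sample covariance of (f1(V), f2(V)); we verify this by simulating many
# multiplier draws ω. (Illustrative sketch, ours; f1, f2 are hypothetical.)
import random

rng = random.Random(0)
n = 200
V = [rng.random() for _ in range(n)]

def center(x):
    m = sum(x) / len(x)
    return [xi - m for xi in x]

c1, c2 = center([v for v in V]), center([v * v for v in V])
sigma_hat = [[sum(a * b for a, b in zip(x, y)) / n for y in (c1, c2)] for x in (c1, c2)]

draws = []
for _ in range(5000):  # bootstrap draws of (W_n f1, W_n f2) for fixed data
    w = [rng.gauss(0.0, 1.0) for _ in range(n)]
    draws.append([sum(wi * xi for wi, xi in zip(w, x)) / n ** 0.5 for x in (c1, c2)])

cov = [[sum(d[a] * d[b] for d in draws) / len(draws) for b in (0, 1)] for a in (0, 1)]
print(sigma_hat, cov)  # simulated covariance matches the sample covariance
```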
Below, we include the proof of Theorem S.9.1 and auxiliary results.
Proof of Theorem S.9.1: We proceed by employing Lemma S.9.1 to couple Wn on a finite dimensional subspace, and then showing that such a result suffices for coupling Wn and W⋆n,P on Fn. To this end, let Sn,P ≡ span{f^{dn}_{n,P}} and note that Assumption S.9.2(ii) and Lemma S.9.1 imply that there exists a linear Gaussian process W(1)n,P on Sn,P and a sequence Rn = o(1) such that uniformly in P ∈ P we have

sup_{β∈Bn} |Wn(f^{dn′}_{n,P}β) − W(1)n,P(f^{dn′}_{n,P}β)| = OP(J2nRn). (S.334)

To establish part (i) of the theorem we will set Rn = (dn log(1 + dn)CnKn²/n)^{1/4} and employ Lemma S.9.1(i), while to establish part (ii) we will set Rn = (dn log(1 + dn)CnKn²/n)^{1/2} and employ Lemma S.9.1(ii) instead.

For any closed linear subspace An,P of L²_P, let Proj{f|An,P} denote the ‖·‖P,2-projection of f onto An,P and set A⊥n,P ≡ {f ∈ L²_P : f = g − Proj{g|An,P} for some g ∈ L²_P} (i.e. A⊥n,P is the orthocomplement of An,P in L²_P). Assuming the underlying probability space is suitably enlarged to carry a linear isonormal process W(2)n,P on S⊥n,P independent of W(1)n,P and {Vi}_{i=1}^n, we then define W⋆n,P on L²_P pointwise by

W⋆n,P f ≡ W(1)n,P(Proj{f|Sn,P}) + W(2)n,P(Proj{f|S⊥n,P}),

which is linear in f by linearity of f ↦ Proj{f|Sn,P}, W(1)n,P, and W(2)n,P. Moreover, since W⋆n,P is sub-Gaussian with respect to ‖·‖P,2, it follows from Corollary 2.2.8 in van der Vaart and Wellner (1996), N(δn‖Gn,P‖P,2, Gn,P, ‖·‖P,2) = 1 due to ‖g‖P,2 ≤ δn‖Gn,P‖P,2 for all g ∈ Gn,P and P ∈ P, bracketing numbers being larger than covering numbers, Jensen's inequality, and the definition of J1n in Assumption S.9.2(i) that

EP[ sup_{g∈Gn,P} |W⋆n,P g| ] ≲ δn‖Gn,P‖P,2 + ∫_0^∞ √(log N(ε, Gn,P, ‖·‖P,2)) dε
 ≤ δn‖Gn,P‖P,2 + ∫_0^{δn‖Gn,P‖P,2} √(1 + log N[](ε, Gn,P, ‖·‖P,2)) dε ≲ J1n. (S.335)

To obtain an analogous bound for Wn, note that sup_{g∈Gn,P} ‖g‖P,2 ≤ δn‖Gn,P‖P,2 by Assumption S.9.2(i) and |EP[g(V)]| ≤ ‖g‖P,2 by Jensen's inequality imply that

sup_{g∈Gn,P} |Wn g| ≤ sup_{g∈Gn,P} |(1/√n) ∑_{i=1}^n ωi g(Vi)|
 + |(1/√n) ∑_{i=1}^n ωi| × { sup_{g∈Gn,P} |(1/n) ∑_{i=1}^n (g(Vi) − EP[g(V)])| + δn‖Gn,P‖P,2 }. (S.336)

Next, define the class G̃n,P ≡ {(ω, v) ↦ ωg(v) : g ∈ Gn,P}, and with some abuse of notation let P index the joint distribution of (V, ω). Further note that if {[g_{i,l,P}, g_{i,u,P}]}_i is a collection of brackets for Gn,P, then the functions [g̃_{i,l,P}, g̃_{i,u,P}] given by

g̃_{i,l,P} ≡ (ω, v) ↦ max{ω, 0} g_{i,l,P}(v) + min{ω, 0} g_{i,u,P}(v)
g̃_{i,u,P} ≡ (ω, v) ↦ min{ω, 0} g_{i,l,P}(v) + max{ω, 0} g_{i,u,P}(v)

form brackets for G̃n,P. Moreover, since E[ω²] = 1 and ω and V are independent, it follows from direct calculation that ‖g̃_{i,u,P} − g̃_{i,l,P}‖P,2 = ‖g_{i,u,P} − g_{i,l,P}‖P,2. Setting G̃n,P(ω, v) ≡ |ω| Gn,P(v), we then note that G̃n,P is an envelope for G̃n,P which satisfies ‖G̃n,P‖P,2 = ‖Gn,P‖P,2. Recalling ηn,P ≡ 1 + log N[](δn‖Gn,P‖P,2, Gn,P, ‖·‖P,2), we then obtain by Theorem 2.14.2 in van der Vaart and Wellner (1996) that

EP[ sup_{g∈Gn,P} |(1/√n) ∑_{i=1}^n ωi g(Vi)| ] ≲ J[](δn‖Gn,P‖P,2, G̃n,P, ‖·‖P,2)
 + √n EP[ |ω| Gn,P(V) 1{ |ω| Gn,P(V)/‖Gn,P‖P,2 > √n δn/√ηn,P } ]. (S.337)

Moreover, since ω follows a standard normal distribution, we have E[|ω| 1{|ω| > a}] ≲ exp{−a²/2} for any a ≥ 0. Therefore, the independence of ω and V implies

EP[ |ω| Gn,P(V) 1{ |ω| Gn,P(V)/‖Gn,P‖P,2 > √n δn/√ηn,P } ] ≲ EP[ Gn,P(V) exp{ −nδn²‖Gn,P‖²P,2/(2G²n,P(V)ηn,P) } ],

which together with result (S.337) and the definition of J1n in Assumption S.9.2(i) yields

EP[ sup_{g∈Gn,P} |(1/√n) ∑_{i=1}^n ωi g(Vi)| ] ≲ J1n. (S.338)

Moreover, by Lemma 2.3.1 in van der Vaart and Wellner (1996) we further obtain that

EP[ sup_{g∈Gn,P} |(1/√n) ∑_{i=1}^n (g(Vi) − EP[g(V)])| ] + δn‖Gn,P‖P,2
 ≤ EP[ sup_{g∈Gn,P} |(2/√n) ∑_{i=1}^n ωi g(Vi)| ] + δn‖Gn,P‖P,2 ≲ J1n, (S.339)

where the final inequality follows from (S.338) and the definition of J1n. Thus, (S.336), (S.338), and (S.339) together with Markov's inequality imply that uniformly in P ∈ P

‖Wn‖Gn,P = OP(J1n). (S.340)

Next, we use the linearity of the processes f ↦ Wn(f) and f ↦ W⋆n,P(f) to obtain that

‖Wn − W⋆n,P‖Fn ≤ sup_{f∈Fn} |Wn(f^{dn′}_{n,P}βn,P(f)) − W⋆n,P(f^{dn′}_{n,P}βn,P(f))| + ‖Wn − W⋆n,P‖Gn,P
 ≤ sup_{β∈Bn} |Wn(f^{dn′}_{n,P}β) − W⋆n,P(f^{dn′}_{n,P}β)| + OP(J1n) = OP(J2nRn + J1n),

where the second inequality holds uniformly in P ∈ P by (S.335) and Markov's inequality, result (S.340), and set inclusion, while the equality holds uniformly in P ∈ P by result (S.334). The first claim of the theorem then follows by using Lemma S.9.1(i) to set Rn = (dn log(1 + dn)CnKn²/n)^{1/4} in (S.334), and the second part of the theorem follows from using Lemma S.9.1(ii) to set Rn = (dn log(1 + dn)CnKn²/n)^{1/2} instead.
Lemma S.9.1. Let {(ωi, Vi)}_{i=1}^n be i.i.d. with Vi ∼ P ∈ P, ωi ∼ N(0,1), and ωi and Vi independent. Suppose Assumption S.9.1 holds, dn log(1 + dn)Kn² = o(n), and Bn ⊂ R^{dn} satisfies 0 ∈ Bn and J2n ≡ ∫_0^∞ √(log N(ε, Bn, ‖·‖₂)) dε < ∞. (i) Then, there is a linear Gaussian process W⋆n,P on Sn,P ≡ span{f^{dn}_{n,P}} independent of {Vi}_{i=1}^n with

sup_{β∈Bn} |Wn(f^{dn′}_{n,P}β) − W⋆n,P(f^{dn′}_{n,P}β)| = OP( J2n (dn log(1 + dn)CnKn²/n)^{1/4} )

uniformly in P ∈ P. (ii) If in addition the eigenvalues of EP[f^{dn}_{n,P}(V) f^{dn}_{n,P}(V)′] are bounded away from zero uniformly in n and P ∈ P, then uniformly in P ∈ P

sup_{β∈Bn} |Wn(f^{dn′}_{n,P}β) − W⋆n,P(f^{dn′}_{n,P}β)| = OP( J2n √(dn log(1 + dn)Cn) Kn/√n ).
Proof: First note that Wn(f − c) = Wn f for any c ∈ R and f ∈ L²_P. We may therefore assume without loss of generality that EP[f^{dn}_{n,P}(V)] = 0, and for every P ∈ P we let Σn(P) ≡ VarP{f^{dn}_{n,P}(V)} = EP[f^{dn}_{n,P}(V) f^{dn}_{n,P}(V)′] and define

Σ̂n(P) ≡ (1/n) ∑_{i=1}^n ( f^{dn}_{n,P}(Vi) − (1/n) ∑_{j=1}^n f^{dn}_{n,P}(Vj) )( f^{dn}_{n,P}(Vi) − (1/n) ∑_{j=1}^n f^{dn}_{n,P}(Vj) )′.

For a sequence Rn with Rn = o(1), and any constant M > 0 and P ∈ P define the event

An,P(M) ≡ { ‖Σ̂n^{1/2}(P) − Σn^{1/2}(P)‖o,2 ≤ MRn }. (S.341)

Further note that by Lemma S.9.2 it follows we may select Rn = o(1) such that we have

lim inf_{M↑∞} lim inf_{n→∞} inf_{P∈P} P({Vi}_{i=1}^n ∈ An,P(M)) = 1. (S.342)

In particular, to establish part (i) of the lemma we will set Rn = (dn log(1 + dn)CnKn²/n)^{1/4} and employ Lemma S.9.2(i), while to establish part (ii) we will set Rn = (dn log(1 + dn)CnKn²/n)^{1/2} and employ Lemma S.9.2(ii) instead.

Next, let N_{dn} ∈ R^{dn} follow a standard normal distribution and be independent of {(ωi, Vi)}_{i=1}^n (defined on the same suitably enlarged probability space). Further let {νd}_{d=1}^{dn} denote eigenvectors of Σ̂n(P), {λd}_{d=1}^{dn} represent the corresponding (possibly zero) eigenvalues, and define the random variable Zn,P ∈ R^{dn} to be given by

Zn,P ≡ ∑_{d:λd≠0} νd (νd′ Wn(f^{dn}_{n,P})) / λd^{1/2} + ∑_{d:λd=0} νd (νd′ N_{dn}). (S.343)

Then note that since Wn(f^{dn}_{n,P}) ∼ N(0, Σ̂n(P)) conditional on {Vi}_{i=1}^n, and N_{dn} is independent of {(ωi, Vi)}_{i=1}^n, Zn,P is Gaussian conditional on {Vi}_{i=1}^n. Furthermore,

E[Zn,P Zn,P′ | {Vi}_{i=1}^n] = ∑_{d=1}^{dn} νd νd′ = I_{dn}

by direct calculation, for I_{dn} the dn × dn identity matrix. Hence, Zn,P ∼ N(0, I_{dn}) conditional on {Vi}_{i=1}^n almost surely in {Vi}_{i=1}^n, and is thus independent of {Vi}_{i=1}^n. Moreover, we also note that by Theorem 3.6.1 in Bogachev (1998) and Wn(f^{dn}_{n,P}) ∼ N(0, Σ̂n(P)) conditional on {Vi}_{i=1}^n, it follows that Wn(f^{dn}_{n,P}) belongs to the range of Σ̂n(P) : R^{dn} → R^{dn} almost surely in {(ωi, Vi)}_{i=1}^n. Therefore, since {νd : λd ≠ 0} spans the range of Σ̂n(P), we conclude from (S.343) that for any β ∈ R^{dn}

β′ Σ̂n^{1/2}(P) Zn,P = β′ ∑_{d:λd≠0} νd (νd′ Wn(f^{dn}_{n,P})) = Wn(β′ f^{dn}_{n,P}).

Analogously, we define for any β ∈ R^{dn} the linear Gaussian process W⋆n,P on Sn,P by

W⋆n,P(β′ f^{dn}_{n,P}) ≡ β′ Σn^{1/2}(P) Zn,P,

which is independent of {Vi}_{i=1}^n due to the independence of Zn,P and {Vi}_{i=1}^n. Setting

W̄n,P(β) ≡ |β′(Σ̂n^{1/2}(P) − Σn^{1/2}(P)) Zn,P| 1{An,P(M)}, (S.344)

we then note that for 1{An,P(M)} the indicator of the event {Vi}_{i=1}^n ∈ An,P(M), we have

sup_{β∈Bn} |Wn(f^{dn′}_{n,P}β) − W⋆n,P(f^{dn′}_{n,P}β)| 1{An,P(M)}
 = sup_{β∈Bn} |β′(Σ̂n^{1/2}(P) − Σn^{1/2}(P)) Zn,P| 1{An,P(M)} = sup_{β∈Bn} |W̄n,P(β)|. (S.345)

Since Bn is bounded due to J2n being finite, definition (S.344) implies W̄n,P ∈ ℓ∞(Bn). Moreover, we note that conditional on {Vi}_{i=1}^n, W̄n,P is sub-Gaussian under the semimetric ρn(β, β̃) ≡ ‖(Σ̂n^{1/2}(P) − Σn^{1/2}(P))(β − β̃)‖₂. Since ‖Σ̂n^{1/2}(P) − Σn^{1/2}(P)‖o,2 ≤ MRn whenever 1{An,P(M)} = 1, we additionally obtain that

∫_0^∞ √(log N(ε, Bn, ρn)) dε ≤ ∫_0^∞ √(log N(ε/MRn, Bn, ‖·‖₂)) dε = MRn ∫_0^∞ √(log N(u, Bn, ‖·‖₂)) du, (S.346)

where the equality follows from the change of variables ε = MRn u. Therefore, since 0 ∈ Bn, Corollary 2.2.8 in van der Vaart and Wellner (1996) and (S.346) imply

E[ sup_{β∈Bn} |W̄n,P(β)| | {Vi}_{i=1}^n ] ≲ MRn ∫_0^∞ √(log N(u, Bn, ‖·‖₂)) du ≡ MRnJ2n. (S.347)

Next, note (S.344), (S.345), and (S.347) together with Markov's inequality imply that

P( sup_{β∈Bn} |Wn(f^{dn′}_{n,P}β) − W⋆n,P(f^{dn′}_{n,P}β)| > M²RnJ2n ; An,P(M) )
 ≤ P( sup_{β∈Bn} |W̄n,P(β)| > M²RnJ2n ) ≲ 1/M (S.348)

for all P ∈ P. Therefore, combining results (S.342) and (S.348), we can finally conclude

lim sup_{M↑∞} lim sup_{n→∞} sup_{P∈P} P( sup_{β∈Bn} |Wn(f^{dn′}_{n,P}β) − W⋆n,P(f^{dn′}_{n,P}β)| > M²RnJ2n )
 ≲ lim sup_{M↑∞} lim sup_{n→∞} sup_{P∈P} { 1/M + P({Vi}_{i=1}^n ∉ An,P(M)) } = 0.

The first claim of the lemma then follows by employing Lemma S.9.2(i) to set Rn = (dn log(1 + dn)CnKn²/n)^{1/4} in (S.341), while the second claim follows by employing Lemma S.9.2(ii) to set Rn = (dn log(1 + dn)CnKn²/n)^{1/2}.
Lemma S.9.2. Let {Vi}_{i=1}^n be i.i.d. with Vi ∼ P ∈ P, suppose Assumption S.9.1 holds, define Σn(P) ≡ VarP{f^{dn}_{n,P}(V)} and its sample analogue Σ̂n(P) to equal

Σ̂n(P) ≡ (1/n) ∑_{i=1}^n ( f^{dn}_{n,P}(Vi) − (1/n) ∑_{j=1}^n f^{dn}_{n,P}(Vj) )( f^{dn}_{n,P}(Vi) − (1/n) ∑_{j=1}^n f^{dn}_{n,P}(Vj) )′,

and assume dn log(1 + dn)Kn²Cn = o(n). (i) Then, it follows that uniformly in P ∈ P

‖Σ̂n^{1/2}(P) − Σn^{1/2}(P)‖o,2 = OP( (dn log(1 + dn)CnKn²/n)^{1/4} ).

(ii) If in addition sup_{P∈P} ‖Σn^{-1}(P)‖o,2 is bounded in n, then uniformly in P ∈ P

‖Σ̂n^{1/2}(P) − Σn^{1/2}(P)‖o,2 = OP( √(dn log(1 + dn)Cn) Kn/√n ).
Proof: First note Assumption S.9.1(i) implies that for all P ∈ P we must have
‖ 1
nfdnn,P (Vi)f
dnn,P (Vi)
′ − EP [fdnn,P (V )fdnn,P (V )′]‖o,2 ≤2dnK
2n
n(S.349)
almost surely for all P ∈ P since each entry of the matrix fdnn,P (Vi)fdnn,P (Vi)
′ is bounded
by K2n. Similarly, employing ‖fdnn,P (Vi)f
dnn,P (Vi)
′‖o,2 ≤ dnK2n almost surely we obtain
‖ 1
nEP [fdnn,P (Vi)f
dnn,P (Vi)
′ − EP [fdnn,P (V )fdnn,P (V )′]2]‖o,2 ≤2dnK
2nCnn
. (S.350)
Thus, employing results (S.349) and (S.350), together with dn log(1 + dn)K2nCn = o(n),
we obtain by Theorem 6.1(ii) in Tropp (2012) (Bernstein’s inequality for matrices) that
supP∈P
P (‖ 1
n
n∑i=1
fdnn,P (Vi)fdnn,P (Vi)
′−EP [fdnn,P (V )fdnn,P (V )′]‖o,2 >M
√dn log(1 + dn)CnKn√
n)
≤ dn exp−M2(dn log(1 + dn)K2
nCn)
2n
n
MBdnK2nCn (S.351)
for some B <∞. In particular, we can conclude from (S.351) that uniformly in P ∈ P
‖ 1
n
n∑i=1
fdnn,P (Vi)fdnn,P (Vi)
′ − EP [fdnn,P (V )fdnn,P (V )′]‖o,2 = OP (
√dn log(1 + dn)CnKn√
n).
(S.352)
Next note that EP [f2n,d,P (V )] ≤ ‖EP [fdnn,P (V )fdnn,P (V )′]‖o ≤ Cn, Markov’s inequality, and
Lemmas 2.2.9 and 2.2.10 in van der Vaart and Wellner (1996) imply that
‖ 1
n
n∑i=1
fdnn,P (Vi)− EP [fdnn,P (Vi)]‖2 ≤√dn max
1≤d≤dn| 1n
n∑i=1
fd,n,P (Vi)− EP [fd,n,P (V )]|
= OP (Kn log(1 + dn)
√dn
n+
√Cndn log(1 + dn)√
n) (S.353)
uniformly in P ∈ P. Moreover, ‖EP [fdnn,P (V )fdnn,P (V )′]‖o ≤ Cn, ‖b‖2 = sup‖a‖2=1 |a′b| for
any b ∈ Rdn , Assumption S.9.1(ii), and Jensen’s inequality together imply that
‖EP [fdnn,P (V )]‖2 = sup‖a‖=1
|a′E[fdnn,P (V )]| ≤ sup‖a‖=1
EP [(a′fdnn,P (V ))2]1/2 ≤√Cn ∧Kn
(S.354)
uniformly in P ∈ P. Therefore, since by the triangle inequality we additionally have
‖( 1
n
n∑i=1
fdnn,P (Vi))(1
n
n∑i=1
fdnn,P (Vi))′ − EP [fdnn,P (V )]EP [fdnn,P (V )]′‖o,2
≤ ‖EP [fdnn,P (V )]‖2 + ‖ 1
n
n∑i=1
fdnn,P (Vi)‖2‖1
n
n∑i=1
fdnn,P (Vi)− EP [fdnn,P (V )]‖2,
111
it follows from results (S.352), (S.353), and (S.354) that uniformly in P ∈ P we have
‖Σn(P )− Σn(P )‖o,2 = OP (
√dn log(1 + dn)CnKn√
n). (S.355)
Finally, since Σn(P ) ≥ 0 and Σn(P ) ≥ 0, Theorem X.1.1 in Bhatia (1997) implies that
‖Σ1/2n (P )− Σ1/2
n (P )‖o,2 ≤ ‖Σn(P )− Σn(P )‖1/2o,2 (S.356)
almost surely, and hence the first claim the lemma follows from (S.355) and (S.356).
For the second claim of the lemma, let ${\rm eig}\{A\}$ denote the smallest eigenvalue of
any Hermitian matrix $A$. Since $\|\Sigma_n^{-1}(P)\|_{o,2} = ({\rm eig}\{\Sigma_n(P)\})^{-1}$, $\sup_{P\in\mathbf{P}} \|\Sigma_n^{-1}(P)\|_{o,2}$ being
bounded uniformly in $n$, result (S.355), and Corollary III.2.6 in Bhatia (1997) imply
$$\liminf_{n\to\infty} \inf_{P\in\mathbf{P}} P\big( \min\{{\rm eig}\{\hat\Sigma_n(P)\}, {\rm eig}\{\Sigma_n(P)\}\} > \eta_0 \big) = 1$$
for some η0 > 0. Applying Theorem X.3.8 in Bhatia (1997) we can then conclude that
$$\limsup_{M\uparrow\infty} \limsup_{n\to\infty} \sup_{P\in\mathbf{P}} P\Big( \|\hat\Sigma_n^{1/2}(P) - \Sigma_n^{1/2}(P)\|_{o,2} > M \frac{\sqrt{d_n \log(1+d_n) C_n}\, K_n}{\sqrt{n}} \Big)$$
$$\le \limsup_{M\uparrow\infty} \limsup_{n\to\infty} \sup_{P\in\mathbf{P}} P\Big( \|\hat\Sigma_n(P) - \Sigma_n(P)\|_{o,2} > 2M\sqrt{\eta_0}\, \frac{\sqrt{d_n \log(1+d_n) C_n}\, K_n}{\sqrt{n}} \Big) = 0,$$
where the final equality follows from result (S.355).
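As an informal numerical illustration of the rate in (S.355), one can simulate the operator-norm deviation of a sample second-moment matrix from its population counterpart and verify that it shrinks roughly like $1/\sqrt{n}$. The design below (i.i.d. coordinates uniform on $[-1,1]$, so the population matrix is $I_d/3$) is a hypothetical example, not the estimator from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5
sigma = np.eye(d) / 3.0  # E[V V'] when the coordinates of V are i.i.d. uniform on [-1, 1]

def cov_error(n):
    # operator-norm (spectral) deviation of the sample second-moment matrix
    v = rng.uniform(-1.0, 1.0, size=(n, d))
    return np.linalg.norm(v.T @ v / n - sigma, ord=2)

def avg_error(n, reps=20):
    # average deviation over several replications, to smooth the simulation noise
    return float(np.mean([cov_error(n) for _ in range(reps)]))

# the deviation should shrink with n, consistent with the 1/sqrt(n) rate in (S.355)
assert avg_error(16000) < avg_error(1000) < avg_error(250)
```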
Lemma S.9.3. Let $A_n \equiv \{\{\beta_j\}_{j=j_n}^{\infty} : |\beta_j| \le M_n/j^{\gamma}\}$ for some $j_n \ge 2$, $M_n > 0$, and
$\gamma > 1$, and define the metric $\|\beta\|_{\ell^1} \equiv \sum_{j \ge j_n} |\beta_j|$. For any $\varepsilon > 0$ it then follows that

$$\log N(\varepsilon, A_n, \|\cdot\|_{\ell^1}) \le \Big\{\Big(\frac{2M_n}{\varepsilon(\gamma-1)}\Big)^{\frac{1}{\gamma-1}} + 1 - j_n\Big\} \vee 0 \times \log\Big(\frac{2M_n (j_n-1)^{-(\gamma-1)}}{\varepsilon(\gamma-1)} + 1\Big).$$
Proof: For any $\{\beta_j\} \in A_n$ and integer $k \ge (j_n - 1) \ge 1$ we first obtain the estimate

$$\sum_{j=k+1}^{\infty} |\beta_j| \le \sum_{j=k+1}^{\infty} \frac{M_n}{j^{\gamma}} \le M_n \int_k^{\infty} u^{-\gamma}\, du = \frac{M_n k^{-(\gamma-1)}}{\gamma-1}. \qquad (S.357)$$
For any $a \in \mathbf{R}$, let $\lceil a \rceil$ denote the smallest integer larger than $a$, and further define

$$j_n^{\star}(\varepsilon) \equiv \Big\lceil \Big(\frac{2M_n}{\varepsilon(\gamma-1)}\Big)^{\frac{1}{\gamma-1}} \Big\rceil \vee (j_n - 1). \qquad (S.358)$$

Then note that (S.357) implies that for any $\{\beta_j\} \in A_n$ we have $\sum_{j > j_n^{\star}(\varepsilon)} |\beta_j| \le \varepsilon/2$.
Hence, letting $A_n(\varepsilon) \equiv \{\{\beta_j\}_{j=j_n}^{\infty} \in A_n : \beta_j = 0 \text{ for all } j > j_n^{\star}(\varepsilon)\}$, we obtain

$$N(\varepsilon, A_n, \|\cdot\|_{\ell^1}) \le N(\varepsilon/2, A_n(\varepsilon), \|\cdot\|_{\ell^1}) \le \prod_{j=j_n}^{j_n^{\star}(\varepsilon)} N\Big( c_j \varepsilon/2, \Big[-\frac{M_n}{j^{\gamma}}, \frac{M_n}{j^{\gamma}}\Big], |\cdot| \Big) \qquad (S.359)$$

where the final inequality holds for any $\{c_j\}_{j=j_n}^{j_n^{\star}(\varepsilon)}$ satisfying $c_j > 0$ and $\sum_{j=j_n}^{j_n^{\star}(\varepsilon)} c_j = 1$, and
the product should be understood to equal one if $j_n^{\star}(\varepsilon) = j_n - 1$. In particular, setting
$c_j = j^{-\gamma}/(\sum_{j=j_n}^{j_n^{\star}(\varepsilon)} j^{-\gamma})$ in (S.359) we obtain the upper bound
$$N(\varepsilon, A_n, \|\cdot\|_{\ell^1}) \le \Big( \Big\lceil \frac{2M_n(\sum_{j=j_n}^{\infty} j^{-\gamma})}{\varepsilon} \Big\rceil \Big)^{(j_n^{\star}(\varepsilon) - j_n)\vee 0} \le \Big( \Big\lceil \frac{2M_n (j_n-1)^{-(\gamma-1)}}{\varepsilon(\gamma-1)} \Big\rceil \Big)^{(j_n^{\star}(\varepsilon) - j_n)\vee 0}$$
where the final inequality follows from an integral bound as in (S.357). Thus, the claim
of the lemma follows from the bound $\lceil a \rceil \le a + 1$ and results (S.358) and (S.359).
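The integral comparison in (S.357) is also easy to sanity-check numerically. The sketch below truncates the infinite series (which can only make the left side smaller, so the check is conservative) and uses arbitrary illustrative values of $M_n$, $\gamma$, and $k$.

```python
import math

def tail_sum(m_n, gamma, k, terms=200000):
    # truncated version of sum_{j > k} M_n / j^gamma; a lower bound on
    # the infinite series, so the comparison below is conservative
    return sum(m_n / j**gamma for j in range(k + 1, k + 1 + terms))

def tail_bound(m_n, gamma, k):
    # right-hand side of (S.357): M_n k^{-(gamma-1)} / (gamma - 1)
    return m_n * k ** (-(gamma - 1)) / (gamma - 1)

# illustrative parameter values only
for m_n, gamma, k in [(1.0, 2.0, 5), (3.0, 1.5, 10), (0.5, 3.0, 2)]:
    assert tail_sum(m_n, gamma, k) <= tail_bound(m_n, gamma, k)
```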
Lemma S.9.4. For any positive random variable $U$ with $E[U^2] < \infty$ and finite constant
$A > 0$ it follows that $E[U \exp\{-A/U^2\}] \le E[U] \exp\{-A/E[U^2]\} + E[U^2]/\sqrt{2A}$.
Proof: First note that the function $u \mapsto u\exp\{-A/u^2\}$ is convex on $u \in (0, \sqrt{2A}]$.
Therefore, by applying Jensen's inequality, the fact that the function $u \mapsto u\exp\{-A/u^2\}$
is increasing in $u \in (0,\infty)$, $E[1\{0 < U \le \sqrt{2A}\}U] \le E[U]$ due to $U$ being positive almost
surely, and $\exp\{-A/U^2\} \le 1$ due to $A > 0$, we can conclude that
$$E\Big[U\exp\Big\{-\frac{A}{U^2}\Big\}\Big] = E\Big[1\{0 < U \le \sqrt{2A}\}\, U \exp\Big\{-\frac{A}{U^2}\Big\}\Big] + E\Big[1\{U > \sqrt{2A}\}\, U \exp\Big\{-\frac{A}{U^2}\Big\}\Big]$$
$$\le E[U]\exp\Big\{-\frac{A}{E[U^2]}\Big\} + E[1\{U > \sqrt{2A}\}\, U].$$
The claim of the lemma therefore follows from $E[1\{U > \sqrt{2A}\}U] \le E[U^2]/\sqrt{2A}$, which
holds by the Cauchy-Schwarz and Markov inequalities.
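Since both sides of Lemma S.9.4 are ordinary expectations, the inequality can be verified exactly for discrete distributions of $U$; the support points, probabilities, and values of $A$ below are arbitrary illustrations.

```python
import math

def lemma_s94_gap(vals, probs, a):
    # returns RHS - LHS of Lemma S.9.4 for a discrete positive random variable U
    assert abs(sum(probs) - 1.0) < 1e-12 and all(u > 0 for u in vals) and a > 0
    e_u = sum(p * u for u, p in zip(vals, probs))
    e_u2 = sum(p * u * u for u, p in zip(vals, probs))
    lhs = sum(p * u * math.exp(-a / u**2) for u, p in zip(vals, probs))
    rhs = e_u * math.exp(-a / e_u2) + e_u2 / math.sqrt(2 * a)
    return rhs - lhs

# illustrative distributions and constants only
for vals, probs, a in [
    ([0.5, 2.0], [0.5, 0.5], 1.0),
    ([0.1, 1.0, 10.0], [0.3, 0.4, 0.3], 0.5),
    ([3.0], [1.0], 4.0),
]:
    assert lemma_s94_gap(vals, probs, a) >= 0.0
```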
References
Aliprantis, C. D. and Border, K. C. (2006). Infinite Dimensional Analysis – A Hitchhiker's Guide. Springer-Verlag, Berlin.
Belloni, A., Chernozhukov, V., Chetverikov, D. and Kato, K. (2015). Some new asymptotic theory for least squares series: Pointwise and uniform results. Journal of Econometrics, 186 345–366.
Bhatia, R. (1997). Matrix Analysis. Springer, New York.
Blundell, R., Chen, X. and Kristensen, D. (2007). Semi-nonparametric IV estimation of shape-invariant Engel curves. Econometrica, 75 1613–1669.
Blundell, R., Horowitz, J. and Parey, M. (2017). Nonparametric estimation of a nonseparable demand function under the Slutsky inequality restriction. Review of Economics and Statistics, 99 291–304.
Blundell, R., Horowitz, J. L. and Parey, M. (2012). Measuring the price responsiveness of gasoline demand: Economic shape restrictions and nonparametric demand estimation. Quantitative Economics, 3 29–51.
Bogachev, V. I. (1998). Gaussian Measures. American Mathematical Society, Providence.
Bogachev, V. I. (2007). Measure Theory: Volume I. Springer-Verlag, Berlin.
Chen, X. (2007). Large sample sieve estimation of semi-nonparametric models. In Handbook of Econometrics 6B (J. J. Heckman and E. E. Leamer, eds.). North Holland, Elsevier.
Chen, X. and Christensen, T. M. (2018). Optimal sup-norm rates and uniform inference on nonlinear functionals of nonparametric IV regression. Quantitative Economics, 9 39–84.
Chen, X. and Pouzo, D. (2012). Estimation of nonparametric conditional moment models with possibly nonsmooth generalized residuals. Econometrica, 80 277–321.
DeVore, R. A. and Lorentz, G. G. (1993). Constructive Approximation, vol. 303. Springer Science & Business Media.
Hansen, L. P. (1982). Large sample properties of generalized method of moments estimators. Econometrica, 50 1029–1054.
Koltchinskii, V. I. (1994). Komlos-Major-Tusnady approximation for the general empirical process and Haar expansions of classes of functions. Journal of Theoretical Probability, 7 73–118.
Kress, R. (1999). Linear Integral Equations. Springer, New York.
Luenberger, D. G. (1969). Optimization by Vector Space Methods. Wiley, New York.
Massart, P. (1989). Strong approximation for multivariate empirical and related processes, via KMT constructions. The Annals of Probability, 17 266–291.
Matzkin, R. L. (1994). Restrictions of economic theory in nonparametric methods. In Handbook of Econometrics (R. Engle and D. McFadden, eds.), vol. IV. Elsevier.
Newey, W. K. (1997). Convergence rates and asymptotic normality for series estimators.Journal of Econometrics, 79 147–168.
Pollard, D. (2002). A User's Guide to Measure Theoretic Probability, vol. 8. Cambridge University Press.
Tropp, J. A. (2012). User-friendly tail bounds for sums of random matrices. Foundations of Computational Mathematics, 12 389–434.
van der Vaart, A. W. and Wellner, J. A. (1996). Weak Convergence and Empirical Processes: with Applications to Statistics. Springer, New York.
Walkup, D. W. and Wets, R. J.-B. (1969). A Lipschitzian characterization of convex polyhedra. Proceedings of the American Mathematical Society 167–173.
Yurinskii, V. V. (1977). On the error of the Gaussian approximation for convolutions. Theory of Probability and Its Applications, 22 236–247.
Zeidler, E. (1985). Nonlinear Functional Analysis and its Applications I. Springer-Verlag, Berlin.
Zhai, A. (2018). A high-dimensional CLT in W2 distance with near optimal convergence rate. Probability Theory and Related Fields, 170 821–845.