rohit patra university of floridausers.stat.ufl.edu/~rohitpatra/papersanddraft/cvxsimslides.pdf ·...
TRANSCRIPT
Efficient Estimation in Convex Single IndexModels1
Rohit PatraUniversity of Florida
http://arxiv.org/abs/1708.00145
1Joint work with Arun K. Kuchibhotla (UPenn) and Bodhisattva Sen (Columbia)
1/28
Overview
1 Introduction
2 Estimation
3 Asymptotics
4 Simulation study
2/28
Introduction
3/28
A semiparametric model
Convex single index model
Y = m0(θ>0 X ) + ε, E(ε|X ) = 0.
(Y ,X ) ∈ R× Rd ∼ Pθ0,m0 .
θ0 ∈ Rd is the unknown coefficient vector.
m0 : R→ R is an unknown convex link function with no parametricrestriction.
Offers a balance between flexibility of nonparametric models andinterpretability of parametric models.
4/28
Goal and Applicability
Model
Y = m0(θ>0 X ) + ε, E(ε|X ) = 0.
Problem
Estimate θ0 and m0 simultaneously, when we have i.i.d. data {(xi , yi )}ni=1
from the above model.
Why convex link
Convex/concave SIMs are widely used in economics, operationsresearch, financial engineering among other fields.
Production functions, utility functions, and call option prices areknown to be concave.
5/28
Restrictions
Identifiability: If m1(t) = m0(−t/2) and θ1 = −2θ0, thenm0(θ>0 x) = m1(θ>1 x). Thus we need to assume
θ0 ∈ Θ := {β ∈ Rd : |β| = 1,β1 > 0}.
We need some “regularity” assumptions on the class of link functions.
Only Shape constraints
Shape and Smoothness constraints
Some relevant works in the shape constrained single index model include:Murphy et al. (1999), Chen and Samworth (2015), Groeneboom andHendrickx (2016), and Balabdaoui et al. (2016).
6/28
Y = 2(X>θ0)2 + N(0, 5),
X ∼ U[0, 1]3, m(t) = 2t2.
Y: Output for 555 Belgian Firms
X: Labour, Capital, Wage
●●
●
●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●
●●
●
●
●
●
●●
●
●
0
10
20
30
0 1 2 3 4 5Index
y
Convex LSE Smoothing Splines Truth
●
●
●
●
●●
●
●
●
●
●
●●●
●
●
●●●
●
●
●●●●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
●●
●●
●●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●●●●
●
●●
●●
●
●
●
●
●
●●●●
●
●
●
●
●
●
●
●
●●
●
●
●
●●●●
●●●●
●
●●
●
●●●●
●
●
●
●
●●
●
●
●
●●
●
●●
●
●
●●
●●
●
●
●
●
●
●●
●
●
●
●●
●●
●
●●
●
●
●
●
●
●●
●●●
●●
●
●●
●●
●
●●
●
●
●●●
●
●
●
●
●
●
●
●
●●
●●
●●
●●
●●
●
●
●
●
●
●
●
●●
●●
●
●●
●●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
●●●
●
●
●
●●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●●
●
●
●●
●●
●●
●
●
●
●
●
●
●
●
●●
●
●●
●
●
●●
●
●
●
●
●
●
●
●
●●●
●●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●●
●
●
●
●
●
●
●●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●●
●
●
●
●
●
●
●●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
●●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●●
● ●
●
●
●
●●●
●
●●
●
●●●
●
●
●●●
●
●
●
●
●●
●●
●
●●
●
●●
●
●
●
●
●
●
●
●
●
●
●
−1
0
1
2
3
4
0 2 4Index
log(
outp
ut)
Splines Concave LSE Kong and Xia (2007)
Labour Data for Belgian Firms (1996)
Figure: Estimated link functions: stability due to convex constraint. Index := θ̂x
.7/28
Estimation
8/28
Estimation in Convex SIM [Kuchibhotla, Patra, Sen, 2017]
Lipschitz LSE
(m̂, θ̂) = argminm∈CL,θ∈Θ
1
n
n∑i=1
{yi −m(θ>xi )}2,
where
CL ={m|m is convex and |m(t1)−m(t2)| ≤ L|t1 − t2| ∀ t1, t2 ∈ R
}.
Penalized LSE
(m̌, θ̌) := argmin(m,θ)∈R×Θ
1
n
n∑i=1
{yi −m(θ>xi )}2 + λ̌2n
∫{m′′(t)}2dt,
where R denotes the class of all convex functions that have absolutelycontinuous first derivative.
9/28
Computation: Alternating Scheme
Recall:
(m̂, θ̂) = argminm∈CL,θ∈Θ Qn(m, θ), where
Qn(m, θ) :=1
n
n∑i=1
{yi −m(θ>xi )}2.
Minimization for fixed θ
For a fixed θ, define
m̂θ := argminm∈CL
Qn(m, θ).
The minimization is a convex optimization problem.
m̂θ can computed efficiently using the nnls package in R.
m̌θ can be computed via a damped newton type algorithm. Our Rpackage simest has a implementation of this computation.
10/28
Computation contd.
Minimization for fixed θ
We can now define the profiled loss Qn : Θ→ R,
Qn(θ) := Qn(m̂θ, θ)
Minimization over θ
To find θ̂, we now minimize Qn(θ) over θ ∈ Θ.
The loss function is not convex.
Simulations suggest a large domain of attraction for moderate d .
We use a gradient step on the unit sphere based on the rightderivative of m̂θ.
Some initial work has suggested that the convergence of thisalternating scheme is linear, i.e., |θ(k+1) − θ̂| ≤ (1− ρ)|θ(k) − θ̂|.
11/28
Asymptotics
12/28
Theoretical properties of LLSE
Rates of convergence of the estimators [Kuchibhotla, Patra, Sen, 2017]
Under some regularity conditions on m0 and distribution of X (boundedsupport), sub-Gaussian errors and L ≥ L0 the Lipschitz LSE satisfies
‖m̂ ◦ θ̂ −m0 ◦ θ0‖ = Op(n−2/5), [Estimation error]
‖m̂ ◦ θ0 −m0 ◦ θ0‖ = Op(n−2/5), [Estimation error of m̂]
‖m̂′ ◦ θ0 −m′0 ◦ θ0‖ = Op(n−2/15), [Estimation error of m̂′]
|θ̂ − θ0| = Op(n−2/5). [Estimation error of θ̂]
For a convex Lipschitz function g : R→ R let g ′ define the right
derivative of g that satisfies g(b) = g(a) +∫ b
ag ′(t)dt.
Here for any f : R→ R, and θ ∈ Rd , we define ‖f ◦ θ‖2 :=∫|f (θ>x)|2dPX (x).
13/28
Asymptotic normality: Homoscedastic model
Recall:
(m̂, θ̂) = argmin(m,θ)∈CL×Θ
1
n
n∑i=1
{yi −m(θ>xi )}2.
Semiparametric efficiency of θ̂ [Kuchibhotla, Patra, and Sen, 2017]
Assume that E(ε2|X ) ≡ σ2. Let `θ,m : Rd × R→ Rd−1 be the efficientscore and let us define the efficient information matrix
Iθ0,m0 := E(`θ0,m0`>θ0,m0
) ∈ R(d−1)×(d−1).
If m0 is twice differentiable, then under some regularity conditions we canconclude
√n(θ̂ − θ0)
d→ N(0,Hθ0I−1θ0,m0
H>θ0).
14/28
Inference
Our result readily yields confidence sets for θ0.
We can use the following plug-in estimator for the covarianceestimator:
Σ̂ := σ̂4Hθ̂Pθ̂,m̂[`θ̂,m̂(Y ,X )`>θ̂,m̂
(Y ,X )]−1H>θ̂,
where
σ̂2 :=n∑
i=1
[yi − m̂(θ̂>xi )]2/n
`θ,m(y , x) :=(y −m(θ>x)
)m′(θ>x)H>θ
{x − hθ(θ>x)
}.
Asymptotic 1− 2α confidence interval[θ̂i −
zα√n
(Σ̂i,i
)1/2
, θ̂i +zα√n
(Σ̂i,i
)1/2],
15/28
Asymptotic properties of the PLSE
If λ̌−1n = Op(n2/5) and λ̌n = op(n−1/4), then under some similar regularity
assumptions the PLSE satisfies
‖m̌ ◦ θ̌ −m0 ◦ θ0‖ = Op(λ̌n), [Estimation error]
‖m̌ ◦ θ0 −m0 ◦ θ0‖ = Op(λ̌n), [Estimation error for m̌]
‖m̌′ ◦ θ0 −m′0 ◦ θ0‖ = Op(λ̌1/2n ), [Estimation error for m̌′]
and √n(θ̌ − θ0)
d→ N(0,Hθ0I−1θ0,m0
H>θ0).
An example choice of λ̌n := C n−2/5.
Proof borrows ideas from the empirical process theory; see e.g.,Mammen and van de Geer (1997) and van de Geer (2000).
16/28
Difficulty in proving efficiency
The LLSE m̂ is a piecewise affine function and lies on the boundary of CL.
Nuisance tangent space
Consider the following model:
Y = m(θ>X ) + ε, where m ∈ CL, and θ ∈ Θ.
A linear perturbation/submodel around m:
ms,a(t) = m(t)− s a(t), where s ∈ R.
The score for the single index model along this submodel isproportional to a(·).
lin{a : D → R|ms,a ∈ R for small enough s} ⊆ L2(Λ).
The set inclusion is strict when m is not strongly convex.
17/28
Efficient score
Let Sθ denote the parametric score of the model and Λm denote thenuisance tangent space.
When m is strongly convex, the efficient score is known to be
Π(Sθ|Λ⊥m) := `θ,m(y , x) =(y−m(θ>x)
)m′(θ>x)H>θ
{x − hθ(θ>x)
}.
Since m̂ is not strongly convex, it is not clear if one can show that
Π(Sθ̂|Λ⊥m̂)
??= `θ̂,m̂.
18/28
Efficient score
Since m̂ lies on the boundary of CL, the “least favorable path” doesnot exist, i.e., we can not find (θt ,mt) (centered at (θ̂, m̂)) such that
`θ̂,m̂??=
∂
∂t
(y −mt(θ
>t x)
)2∣∣∣∣t=0
.
Since LLSE is the minimizer of the least squares loss, this would mean
Pn`θ̂,m̂ = 0.
Since m̂ is piecewise affine, it is not clear if one can show that
Pn`θ̂,m̂??= op(n−1/2).
19/28
Approximations
We next try to construct some paths around (θ̂, m̂) that will havescore “close” to `θ̂,m̂.
We find a path that has the following score:
Sθ,m = {(y −m(θ>x)
)H>θ
[m′(θ>x)x +
∫ θ>x
s0
m′(u)k ′(u)du −m′(θ>x)k(θ>x)
+ m′0(s0)k(s0)−m′
0(s0)hθ0(s0)
].
Note `θ0,m0 = Sθ0,m0 =(y −m(θ>0 x)
)H>θ0
m′0(θ>0 x){x − hθ0 (θ>0 x)
}.
van der Vaart (2002) calls such a path “approximately leastfavorable”.
20/28
Approximation, Part 2
Sθ,m is not very tractable, so we further approximate this by
ψθ,m(x , y) := (y −m(θ>x))H>θ [m′(θ>x)x − hθ0 (θ>x)m′0(θ>x)].
Compare ψθ,m to
`θ,m =(y −m(θ>x)
)H>θ m
′(θ>x){x − hθ(θ>x)
}.
Also note ψθ0,m0 = `θ0,m0 .
We show thatPnψθ̂,m̂ = op(n−1/2).
21/28
Simulation study
22/28
Another Estimator
Convex LSE
(m̃, θ̃) := argmin(m,θ)∈C×Θ
Qn(m, θ).
We can compute the LSE via the alternating minimization algorithm.
Simulations suggest θ̃ is√n−consistent.
The behavior of m̃′ at the boundary is not well-understood.
In fact, in univariate regression, m̃′ is unbounded in probability at theboundary.
23/28
Choice of L
Y = (θ>0 X )2 + N(0, .12), where X ∼ Uniform[−1, 1]4 and θ0 = 14/2.
2 3 4 5 7 10 CvxLSE
0.00
0.04
0.08
L
Mea
n A
bsol
ute
Dev
iatio
n
Figure: Box plots of 14
∑4i=1 |θi − θ0,i | (over 1000 replications, n = 500) from the
following model as the tuning parameter varies over {3, 4, 5, 7, 10} and CvxLSE.Here L0 = 4.
24/28
Confidence Interval
Y = (θ>0 X )2 + N(0, .32), X ∼ Uniform[−1, 1]3 and θ0 = 13/√
3. (1)
Table: The estimated coverage probabilities and average lengths (obtained from800 replicates) of nominal 95% confidence intervals for the first coordinate of θ0
for the model (1).
nCvxLip CvxPen
Coverage Avg Length Coverage Avg Length
50 0.92 0.30 0.94 0.29100 0.91 0.18 0.92 0.19200 0.92 0.13 0.93 0.13500 0.94 0.08 0.92 0.08
1000 0.93 0.06 0.92 0.062000 0.92 0.04 0.93 0.04
25/28
CvxPen
CvxLip
Smooth
EFM
EDR
0.000
0.001
0.002
0.003
0.004
0.005 d = 10
CvxPen
CvxLip
Smooth
EFM
EDR
0.000
0.002
0.004
0.006
0.008
0.010 d = 25
CvxPen
CvxLip
Smooth
EFM
EDR
0.000
0.005
0.010
0.015
0.020
0.025 d = 50
CvxPen
CvxLip
Smooth
EFM
0.00
0.02
0.04
0.06
0.08
0.10d = 100
Figure: Boxplots of∑d
i=1 |θ̂i − θ0,i |/d (over 500 replications) based on 200observations for dimensions 10, 25, 50, and 100.
26/28
Summary
First work providing efficient estimator in shape-constrained LSE (in abundled parameter problem) when m̂ is piecewise affine.
Our estimators readily lead to asymptotic confidence sets for θ0.
Our methods are robust towards the choice of the tuning parameter.
The proposed estimators are implemented in the R package simest.
27/28
References
[1] Kuchibhotla, A. K. and Patra, R. K. (2016).simest: Single Index Model Estimation with Constraints on Link Function.R package version 0.2.
[2] Kuchibhotla, A. K., Patra, R. K., and Sen, B. (2017).Efficient Estimation in Convex Single Index Models.arxiv.org/abs/1708.00145.
[3] Kuchibhotla, A. K., and Patra, R. K. (2017).Efficient estimation in single index models through smoothing splines.arxiv.org/abs/1612.00068.
[4] Balabdaoui, F., Durot, C. and Jankowski, H. (2016).Least squares estimation in the monotone single index model.arxiv.org/abs/1610.06026.
[5] Groeneboom, P. and Hendrickx, K. (2016).Current status linear regression.Annals of Statistics (Forthcoming).
[6] Chen, Y. and Samworth, R. J. (2014).Generalised additive and index models with shape constraints.JRSSB. 78(4), 729–754..
[7] Murphy, S. A., van der Vaart, A. W. and Wellner, J. A. (1999).Current status regression.Math. Methods Statist. 8(3), 407425.
Thank You! Questions?
28/28