rohit patra university of floridausers.stat.ufl.edu/~rohitpatra/papersanddraft/cvxsimslides.pdf ·...

Efficient Estimation in Convex Single IndexModels1

Rohit PatraUniversity of Florida

http://arxiv.org/abs/1708.00145

1Joint work with Arun K. Kuchibhotla (UPenn) and Bodhisattva Sen (Columbia)

1/28

http://arxiv.org/abs/1708.00145

Overview

1 Introduction

2 Estimation

3 Asymptotics

4 Simulation study

2/28

Introduction

3/28

A semiparametric model

Convex single index model

Y = m0(θ>0 X ) + ε, E(ε|X ) = 0.

(Y ,X ) ∈ R× Rd ∼ Pθ0,m0 .

θ0 ∈ Rd is the unknown coefficient vector.

m0 : R→ R is an unknown convex link function with no parametricrestriction.

Offers a balance between flexibility of nonparametric models andinterpretability of parametric models.

4/28

Goal and Applicability

Model

Y = m0(θ>0 X ) + ε, E(ε|X ) = 0.

Problem

Estimate θ0 and m0 simultaneously, when we have i.i.d. data {(xi , yi )}ni=1

from the above model.

Why convex link

Convex/concave SIMs are widely used in economics, operationsresearch, financial engineering among other fields.

Production functions, utility functions, and call option prices areknown to be concave.

5/28

Restrictions

Identifiability: If m1(t) = m0(−t/2) and θ1 = −2θ0, thenm0(θ>0 x) = m1(θ>1 x). Thus we need to assume

θ0 ∈ Θ := {β ∈ Rd : |β| = 1,β1 > 0}.

We need some “regularity” assumptions on the class of link functions.

Only Shape constraints

Shape and Smoothness constraints

Some relevant works in the shape constrained single index model include:Murphy et al. (1999), Chen and Samworth (2015), Groeneboom andHendrickx (2016), and Balabdaoui et al. (2016).

6/28

Y = 2(X>θ0)2 + N(0, 5),

X ∼ U[0, 1]3, m(t) = 2t2.

Y: Output for 555 Belgian Firms

X: Labour, Capital, Wage

●●

●

●

●●

●

●

●

●

●

●

●

●

●●

●

●

●

●

●

●

●

●

●

●

●●

●

●

●

●

●

●

●

●

●●

●

●●

●

●

●

●

●●

●

●

0

10

20

30

0 1 2 3 4 5Index

y

Convex LSE Smoothing Splines Truth

●

●

●

●

●●

●

●

●

●

●

●●●

●

●

●●●

●

●

●●●●

●

●

●

●

●

●

●

●

●

●

●

●

●●

●

●

●

●

●

●

●

●●

●

●

●

●●

●●

●●

●

●

●

●

●

●●●

●

●

●

●

●

●

●

●

●●●●

●

●●

●●

●

●

●

●

●

●●●●

●

●

●

●

●

●

●

●

●●

●

●

●

●●●●

●●●●

●

●●

●

●●●●

●

●

●

●

●●

●

●

●

●●

●

●●

●

●

●●

●●

●

●

●

●

●

●●

●

●

●

●●

●●

●

●●

●

●

●

●

●

●●

●●●

●●

●

●●

●●

●

●●

●

●

●●●

●

●

●

●

●

●

●

●

●●

●●

●●

●●

●●

●

●

●

●

●

●

●

●●

●●

●

●●

●●

●

●

●

●

●●

●

●

●

●

●

●

●

●●

●

●

●

●●●

●

●

●

●●

●

●

●

●

●●

●

●

●

●

●

●

●

●

●

●●

●

●

●●

●●

●●

●

●

●

●

●

●

●

●

●●

●

●●

●

●

●●

●

●

●

●

●

●

●

●

●●●

●●

●

●

●

●

●●

●

●

●

●

●

●

●

●

●●

●●

●

●

●

●

●

●

●●

●

●●

●

●

●

●

●

●

●

●

●

●

●●

●

●

●

●

●●

●

●●

●

●

●

●

●

●

●

●

●

●

●

●

●

●●●

●

●

●

●

●●

●

●

●

●

●

●

●●

●

●

●

●

●●

●

●

●

●

●

●

●

●●

●

●

●

●●

●●

●

●

●

●

●

●

●

●

●●

●

●

●

●

●●

●●

●

●

●

●

●

●

●

●

●●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●●

●

●

●●●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●●

●

●

●

●

●

●

●●

● ●

●

●

●

●●●

●

●●

●

●●●

●

●

●●●

●

●

●

●

●●

●●

●

●●

●

●●

●

●

●

●

●

●

●

●

●

●

●

−1

0

1

2

3

4

0 2 4Index

log(

outp

ut)

Splines Concave LSE Kong and Xia (2007)

Labour Data for Belgian Firms (1996)

Figure: Estimated link functions: stability due to convex constraint. Index := θ̂x

.7/28

Estimation

8/28

Estimation in Convex SIM [Kuchibhotla, Patra, Sen, 2017]

Lipschitz LSE

(m̂, θ̂) = argminm∈CL,θ∈Θ

1

n

n∑i=1

{yi −m(θ>xi )}2,

where

CL ={m|m is convex and |m(t1)−m(t2)| ≤ L|t1 − t2| ∀ t1, t2 ∈ R

}.

Penalized LSE

(m̌, θ̌) := argmin(m,θ)∈R×Θ

1

n

n∑i=1

{yi −m(θ>xi )}2 + λ̌2n

∫{m′′(t)}2dt,

where R denotes the class of all convex functions that have absolutelycontinuous first derivative.

9/28

Computation: Alternating Scheme

Recall:

(m̂, θ̂) = argminm∈CL,θ∈Θ Qn(m, θ), where

Qn(m, θ) :=1

n

n∑i=1

{yi −m(θ>xi )}2.

Minimization for fixed θ

For a fixed θ, define

m̂θ := argminm∈CL

Qn(m, θ).

The minimization is a convex optimization problem.

m̂θ can computed efficiently using the nnls package in R.

m̌θ can be computed via a damped newton type algorithm. Our Rpackage simest has a implementation of this computation.

10/28

Computation contd.

Minimization for fixed θ

We can now define the profiled loss Qn : Θ→ R,

Qn(θ) := Qn(m̂θ, θ)

Minimization over θ

To find θ̂, we now minimize Qn(θ) over θ ∈ Θ.

The loss function is not convex.

Simulations suggest a large domain of attraction for moderate d .

We use a gradient step on the unit sphere based on the rightderivative of m̂θ.

Some initial work has suggested that the convergence of thisalternating scheme is linear, i.e., |θ(k+1) − θ̂| ≤ (1− ρ)|θ(k) − θ̂|.

11/28

Asymptotics

12/28

Theoretical properties of LLSE

Rates of convergence of the estimators [Kuchibhotla, Patra, Sen, 2017]

Under some regularity conditions on m0 and distribution of X (boundedsupport), sub-Gaussian errors and L ≥ L0 the Lipschitz LSE satisfies

‖m̂ ◦ θ̂ −m0 ◦ θ0‖ = Op(n−2/5), [Estimation error]

‖m̂ ◦ θ0 −m0 ◦ θ0‖ = Op(n−2/5), [Estimation error of m̂]

‖m̂′ ◦ θ0 −m′0 ◦ θ0‖ = Op(n−2/15), [Estimation error of m̂′]

|θ̂ − θ0| = Op(n−2/5). [Estimation error of θ̂]

For a convex Lipschitz function g : R→ R let g ′ define the right

derivative of g that satisfies g(b) = g(a) +∫ b

ag ′(t)dt.

Here for any f : R→ R, and θ ∈ Rd , we define ‖f ◦ θ‖2 :=∫|f (θ>x)|2dPX (x).

13/28

Asymptotic normality: Homoscedastic model

Recall:

(m̂, θ̂) = argmin(m,θ)∈CL×Θ

1

n

n∑i=1

{yi −m(θ>xi )}2.

Semiparametric efficiency of θ̂ [Kuchibhotla, Patra, and Sen, 2017]

Assume that E(ε2|X ) ≡ σ2. Let `θ,m : Rd × R→ Rd−1 be the efficientscore and let us define the efficient information matrix

Iθ0,m0 := E(`θ0,m0`>θ0,m0

) ∈ R(d−1)×(d−1).

If m0 is twice differentiable, then under some regularity conditions we canconclude

√n(θ̂ − θ0)

d→ N(0,Hθ0I−1θ0,m0

H>θ0).

14/28

Inference

Our result readily yields confidence sets for θ0.

We can use the following plug-in estimator for the covarianceestimator:

Σ̂ := σ̂4Hθ̂Pθ̂,m̂[`θ̂,m̂(Y ,X )`>θ̂,m̂

(Y ,X )]−1H>θ̂,

where

σ̂2 :=n∑

i=1

[yi − m̂(θ̂>xi )]2/n

`θ,m(y , x) :=(y −m(θ>x)

)m′(θ>x)H>θ

{x − hθ(θ>x)

}.

Asymptotic 1− 2α confidence interval[θ̂i −

zα√n

(Σ̂i,i

)1/2

, θ̂i +zα√n

(Σ̂i,i

)1/2],

15/28

Asymptotic properties of the PLSE

If λ̌−1n = Op(n2/5) and λ̌n = op(n−1/4), then under some similar regularity

assumptions the PLSE satisfies

‖m̌ ◦ θ̌ −m0 ◦ θ0‖ = Op(λ̌n), [Estimation error]

‖m̌ ◦ θ0 −m0 ◦ θ0‖ = Op(λ̌n), [Estimation error for m̌]

‖m̌′ ◦ θ0 −m′0 ◦ θ0‖ = Op(λ̌1/2n ), [Estimation error for m̌′]

and √n(θ̌ − θ0)

d→ N(0,Hθ0I−1θ0,m0

H>θ0).

An example choice of λ̌n := C n−2/5.

Proof borrows ideas from the empirical process theory; see e.g.,Mammen and van de Geer (1997) and van de Geer (2000).

16/28

Difficulty in proving efficiency

The LLSE m̂ is a piecewise affine function and lies on the boundary of CL.

Nuisance tangent space

Consider the following model:

Y = m(θ>X ) + ε, where m ∈ CL, and θ ∈ Θ.

A linear perturbation/submodel around m:

ms,a(t) = m(t)− s a(t), where s ∈ R.

The score for the single index model along this submodel isproportional to a(·).

lin{a : D → R|ms,a ∈ R for small enough s} ⊆ L2(Λ).

The set inclusion is strict when m is not strongly convex.

17/28

Efficient score

Let Sθ denote the parametric score of the model and Λm denote thenuisance tangent space.

When m is strongly convex, the efficient score is known to be

Π(Sθ|Λ⊥m) := `θ,m(y , x) =(y−m(θ>x)

)m′(θ>x)H>θ

{x − hθ(θ>x)

}.

Since m̂ is not strongly convex, it is not clear if one can show that

Π(Sθ̂|Λ⊥m̂)

??= `θ̂,m̂.

18/28

Efficient score

Since m̂ lies on the boundary of CL, the “least favorable path” doesnot exist, i.e., we can not find (θt ,mt) (centered at (θ̂, m̂)) such that

`θ̂,m̂??=

∂

∂t

(y −mt(θ

>t x)

)2∣∣∣∣t=0

.

Since LLSE is the minimizer of the least squares loss, this would mean

Pn`θ̂,m̂ = 0.

Since m̂ is piecewise affine, it is not clear if one can show that

Pn`θ̂,m̂??= op(n−1/2).

19/28

Approximations

We next try to construct some paths around (θ̂, m̂) that will havescore “close” to `θ̂,m̂.

We find a path that has the following score:

Sθ,m = {(y −m(θ>x)

)H>θ

[m′(θ>x)x +

∫ θ>x

s0

m′(u)k ′(u)du −m′(θ>x)k(θ>x)

+ m′0(s0)k(s0)−m′

0(s0)hθ0(s0)

].

Note `θ0,m0 = Sθ0,m0 =(y −m(θ>0 x)

)H>θ0

m′0(θ>0 x){x − hθ0 (θ>0 x)

}.

van der Vaart (2002) calls such a path “approximately leastfavorable”.

20/28

Approximation, Part 2

Sθ,m is not very tractable, so we further approximate this by

ψθ,m(x , y) := (y −m(θ>x))H>θ [m′(θ>x)x − hθ0 (θ>x)m′0(θ>x)].

Compare ψθ,m to

`θ,m =(y −m(θ>x)

)H>θ m

′(θ>x){x − hθ(θ>x)

}.

Also note ψθ0,m0 = `θ0,m0 .

We show thatPnψθ̂,m̂ = op(n−1/2).

21/28

Simulation study

22/28

Another Estimator

Convex LSE

(m̃, θ̃) := argmin(m,θ)∈C×Θ

Qn(m, θ).

We can compute the LSE via the alternating minimization algorithm.

Simulations suggest θ̃ is√n−consistent.

The behavior of m̃′ at the boundary is not well-understood.

In fact, in univariate regression, m̃′ is unbounded in probability at theboundary.

23/28

Choice of L

Y = (θ>0 X )2 + N(0, .12), where X ∼ Uniform[−1, 1]4 and θ0 = 14/2.

2 3 4 5 7 10 CvxLSE

0.00

0.04

0.08

L

Mea

n A

bsol

ute

Dev

iatio

n

Figure: Box plots of 14

∑4i=1 |θi − θ0,i | (over 1000 replications, n = 500) from the

following model as the tuning parameter varies over {3, 4, 5, 7, 10} and CvxLSE.Here L0 = 4.

24/28

Confidence Interval

Y = (θ>0 X )2 + N(0, .32), X ∼ Uniform[−1, 1]3 and θ0 = 13/√

3. (1)

Table: The estimated coverage probabilities and average lengths (obtained from800 replicates) of nominal 95% confidence intervals for the first coordinate of θ0

for the model (1).

nCvxLip CvxPen

Coverage Avg Length Coverage Avg Length

50 0.92 0.30 0.94 0.29100 0.91 0.18 0.92 0.19200 0.92 0.13 0.93 0.13500 0.94 0.08 0.92 0.08

1000 0.93 0.06 0.92 0.062000 0.92 0.04 0.93 0.04

25/28

CvxPen

CvxLip

Smooth

EFM

EDR

0.000

0.001

0.002

0.003

0.004

0.005 d = 10

CvxPen

CvxLip

Smooth

EFM

EDR

0.000

0.002

0.004

0.006

0.008

0.010 d = 25

CvxPen

CvxLip

Smooth

EFM

EDR

0.000

0.005

0.010

0.015

0.020

0.025 d = 50

CvxPen

CvxLip

Smooth

EFM

0.00

0.02

0.04

0.06

0.08

0.10d = 100

Figure: Boxplots of∑d

i=1 |θ̂i − θ0,i |/d (over 500 replications) based on 200observations for dimensions 10, 25, 50, and 100.

26/28

Summary

First work providing efficient estimator in shape-constrained LSE (in abundled parameter problem) when m̂ is piecewise affine.

Our estimators readily lead to asymptotic confidence sets for θ0.

Our methods are robust towards the choice of the tuning parameter.

The proposed estimators are implemented in the R package simest.

27/28

References

[1] Kuchibhotla, A. K. and Patra, R. K. (2016).simest: Single Index Model Estimation with Constraints on Link Function.R package version 0.2.

[2] Kuchibhotla, A. K., Patra, R. K., and Sen, B. (2017).Efficient Estimation in Convex Single Index Models.arxiv.org/abs/1708.00145.

[3] Kuchibhotla, A. K., and Patra, R. K. (2017).Efficient estimation in single index models through smoothing splines.arxiv.org/abs/1612.00068.

[4] Balabdaoui, F., Durot, C. and Jankowski, H. (2016).Least squares estimation in the monotone single index model.arxiv.org/abs/1610.06026.

[5] Groeneboom, P. and Hendrickx, K. (2016).Current status linear regression.Annals of Statistics (Forthcoming).

[6] Chen, Y. and Samworth, R. J. (2014).Generalised additive and index models with shape constraints.JRSSB. 78(4), 729–754..

[7] Murphy, S. A., van der Vaart, A. W. and Wellner, J. A. (1999).Current status regression.Math. Methods Statist. 8(3), 407425.

Thank You! Questions?

28/28

rohit patra university of floridausers.stat.ufl.edu/~rohitpatra/papersanddraft/cvxsimslides.pdf ·...

Documents