Simplex Techniques for Quantile Regression Model Selection

Yonggang Yao

SAS Institute Inc.

8/01/2010

Nonparametric Statistics — 2010 JSM, Vancouver, Canada

Outline

Background

Quantile Regression

Linear Programming

Model Selection and Simplex Tableau

Greedy Methods

Penalty Methods

Resampling

Some Computing Issues

Background

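• Three Formulations of Quantile Regression (QR) at level $\tau$

$$\min_{\beta}\; \frac{1}{n}\sum_{i=1}^{n} \rho_\tau(y_i - x_i'\beta)$$

$$\min_{\beta}\; \frac{1}{n}\sum_{i=1}^{n} \left[ \tfrac{1}{2}\,|y_i - x_i'\beta| + \left(\tau - \tfrac{1}{2}\right)(y_i - x_i'\beta) \right]$$

$$\max_{a}\; y'a \quad \text{s.t. } X'a = (1-\tau)X'\mathbf{1} \text{ and } a \in [0,1]^n$$

where $\rho_\tau(t) = \tau t^{+} + (1-\tau)\, t^{-}$.

• Linear Programming (LP)

$$\min_{z}\; c'z \quad \text{s.t. } Az = b \text{ and } z \ge 0$$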
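As a small numerical check of the formulations above, the following sketch evaluates the check loss $\rho_\tau(t) = \tau t^{+} + (1-\tau)t^{-}$ and its equivalent form $|t|/2 + (\tau - 1/2)t$, then scans candidate constants for an intercept-only model. The data values and function names are illustrative assumptions, not taken from the talk.

```python
def rho(t, tau):
    """Check loss: tau * t^+ + (1 - tau) * t^-, where t^- = max(-t, 0)."""
    return tau * max(t, 0.0) + (1.0 - tau) * max(-t, 0.0)


def rho_abs_form(t, tau):
    """Equivalent form: |t| / 2 + (tau - 1/2) * t."""
    return 0.5 * abs(t) + (tau - 0.5) * t


def mean_check_loss(y, fitted, tau):
    """(1/n) * sum_i rho_tau(y_i - fitted_i)."""
    return sum(rho(yi - fi, tau) for yi, fi in zip(y, fitted)) / len(y)


# Hypothetical toy sample; for an intercept-only model the minimizer
# of the mean check loss is a sample quantile (here the median).
y = [1.0, 2.0, 4.0, 7.0, 11.0]
tau = 0.5
best = min(y, key=lambda b: mean_check_loss(y, [b] * len(y), tau))
print(best, mean_check_loss(y, [best] * len(y), tau))
```

Scanning only the observed values is enough here because an intercept-only quantile fit always has a minimizer at one of the data points.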

Background

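• Cast QR problem as LP problem

LP (Standard Form): $\min_{z}\; c'z$ s.t. $Az = b$ and $z \ge 0$, with

$$A = (X,\; -X,\; I,\; -I), \qquad z = (\beta^{+\prime},\; \beta^{-\prime},\; \varepsilon^{+\prime},\; \varepsilon^{-\prime})',$$

$$c = (0',\; 0',\; \tau\mathbf{1}'/n,\; (1-\tau)\mathbf{1}'/n)', \qquad b = Y,$$

where c and z are m-vectors with m=2p+2n, A is an n-by-m matrix, and $\varepsilon = Y - X\beta$ is the residual vector.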
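The casting can be checked numerically. The sketch below builds $A = (X, -X, I, -I)$, $b = Y$ and $c = (0', 0', \tau\mathbf{1}'/n, (1-\tau)\mathbf{1}'/n)'$, maps a coefficient vector to a feasible $z$ by splitting the coefficients and residuals into positive and negative parts, and confirms that $Az = b$ and that $c'z$ equals the mean check loss. The function names and the toy data are assumptions made for illustration.

```python
def qr_lp_data(X, y, tau):
    """Return (A, b, c) for min c'z s.t. Az = b, z >= 0,
    with z = (beta+, beta-, resid+, resid-)."""
    n, p = len(X), len(X[0])
    A = []
    for i in range(n):
        eye = [1.0 if k == i else 0.0 for k in range(n)]
        A.append(list(X[i]) + [-v for v in X[i]] + eye + [-e for e in eye])
    c = [0.0] * (2 * p) + [tau / n] * n + [(1.0 - tau) / n] * n
    return A, list(y), c


def beta_to_z(X, y, beta):
    """Split beta and the residuals y - X beta into nonnegative parts."""
    resid = [yi - sum(x * b for x, b in zip(row, beta)) for row, yi in zip(X, y)]
    pos = lambda v: [max(t, 0.0) for t in v]
    neg = lambda v: [max(-t, 0.0) for t in v]
    return pos(beta) + neg(beta) + pos(resid) + neg(resid)


# Hypothetical toy data: intercept plus one covariate, n = 3, p = 2.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
y = [1.0, 2.0, 4.0]
tau = 0.25
A, b, c = qr_lp_data(X, y, tau)
z = beta_to_z(X, y, [2.0, -0.5])

Az = [sum(a * v for a, v in zip(row, z)) for row in A]
objective = sum(ci * zi for ci, zi in zip(c, z))
print(len(c), Az, objective)
```

Any $\beta$ yields a feasible $z$ this way; the LP then searches over such points for the smallest $c'z$.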

Simplex Theory

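• Let $B = \{B_1, \ldots, B_n\} \subset \{1, 2, \ldots, m\}$ denote an index set, and let $A_B = [A_{B_1}, \ldots, A_{B_n}]$ denote an invertible sub-matrix of A. $z^* = [z_1^*, \ldots, z_m^*]'$ is called a basic solution if $z^*$ satisfies:

$$z_B^* = A_B^{-1} b \quad \text{and} \quad z_j^* = 0 \text{ for } j \in \{1, 2, \ldots, m\} \setminus B.$$

• $z^*$ is an optimal solution if

$$z_B^* = A_B^{-1} b \ge 0 \quad \text{and} \quad c' - c_B' A_B^{-1} A \ge 0.$$

LP (Standard Form): $\min_{z}\; c'z$ s.t. $Az = b$ and $z \ge 0$

• Simplex Tableau

$$\begin{pmatrix} c_B' z_B^* & c' - c_B' A_B^{-1} A \\ A_B^{-1} b & A_B^{-1} A \end{pmatrix}$$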
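The two optimality conditions, $A_B^{-1}b \ge 0$ and $c' - c_B'A_B^{-1}A \ge 0$, can be verified directly for a small problem. The sketch below does this with exact fractions for an intercept-only median fit of three hypothetical observations, using the column order (coefficient plus, coefficient minus, positive residual parts, negative residual parts) with 0-based indices. The helper names and the data are assumptions for illustration.

```python
from fractions import Fraction as F


def solve(M, rhs):
    """Solve the square system M x = rhs exactly by Gauss-Jordan elimination."""
    n = len(M)
    aug = [list(map(F, row)) + [F(v)] for row, v in zip(M, rhs)]
    for col in range(n):
        piv = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[piv] = aug[piv], aug[col]
        aug[col] = [v / aug[col][col] for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]


def check_basis(A, b, c, B):
    """Return (z_B, reduced costs, is_optimal) for the index set B."""
    n, m = len(A), len(A[0])
    A_B = [[A[i][j] for j in B] for i in range(n)]
    z_B = solve(A_B, b)                                   # z_B = A_B^{-1} b
    A_B_T = [[A_B[i][k] for i in range(n)] for k in range(n)]
    pi = solve(A_B_T, [c[j] for j in B])                  # pi' = c_B' A_B^{-1}
    d = [c[j] - sum(pi[i] * A[i][j] for i in range(n)) for j in range(m)]
    return z_B, d, all(v >= 0 for v in z_B) and all(v >= 0 for v in d)


# Hypothetical data: y = (1, 2, 4), intercept only, tau = 1/2, so m = 2 + 6 = 8.
A = [[1, -1, 1, 0, 0, -1, 0, 0],
     [1, -1, 0, 1, 0, 0, -1, 0],
     [1, -1, 0, 0, 1, 0, 0, -1]]
b = [1, 2, 4]
c = [F(0), F(0)] + [F(1, 6)] * 6

z_good, d_good, ok_good = check_basis(A, b, c, [0, 5, 4])   # fit passes through y = 2
z_bad, d_bad, ok_bad = check_basis(A, b, c, [0, 3, 4])      # fit passes through y = 1
print(z_good, ok_good, ok_bad)
```

The first basis is optimal and recovers the median; the second is feasible but has a negative reduced cost, so a simplex pivot would improve it.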

Model Selection and Simplex Tableau

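• Cast QR model selection problem as LP problem

$$A = (X_1,\; -X_1,\; I,\; -I), \qquad A^* = (X_2,\; -X_2),$$

$$c = (0',\; 0',\; \tau\mathbf{1}'/n,\; (1-\tau)\mathbf{1}'/n)', \qquad c^* = (0',\; 0')',$$

$$z = (\beta_1^{+\prime},\; \beta_1^{-\prime},\; \varepsilon^{+\prime},\; \varepsilon^{-\prime})', \qquad z^* = (\beta_2^{+\prime},\; \beta_2^{-\prime})', \qquad b = Y,$$

where $\beta_2$ is forced to be a zero vector.

LP (Standard Form): $\min_{z}\; c'z$ s.t. $Az = b$ and $z \ge 0$

• Simplex Tableau for Model Selection

$$\begin{pmatrix} c_B' z_B & c' - c_B' A_B^{-1} A & c^{*\prime} - c_B' A_B^{-1} A^* \\ A_B^{-1} b & A_B^{-1} A & A_B^{-1} A^* \end{pmatrix}$$

where $A$ is for $X_1$, and $A^*$ is for $X_2$.

• Use index set $B = \{B_1, \ldots, B_n\}$ to control a model selection process.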
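One way to read the extra tableau block $c^{*\prime} - c_B'A_B^{-1}A^*$ is as a price list for the excluded covariates: with $c^* = 0$, a negative entry indicates that letting the corresponding coefficient leave zero can lower the objective. The sketch below computes that block for a fitted intercept-only median model and two hypothetical candidate covariates. Ranking candidates by the most negative entry is an illustrative assumption, not necessarily the entry rule used in the talk (whose simulation reports Wald-score p-values).

```python
from fractions import Fraction as F


def solve(M, rhs):
    """Solve the square system M x = rhs exactly by Gauss-Jordan elimination."""
    n = len(M)
    aug = [list(map(F, row)) + [F(v)] for row, v in zip(M, rhs)]
    for col in range(n):
        piv = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[piv] = aug[piv], aug[col]
        aug[col] = [v / aug[col][col] for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]


def price_excluded(A_B, c_B, X2):
    """Reduced costs of the excluded columns A* = (X2, -X2), taking c* = 0."""
    n = len(A_B)
    A_B_T = [[A_B[i][k] for i in range(n)] for k in range(n)]
    pi = solve(A_B_T, c_B)                       # pi' = c_B' A_B^{-1}
    q = len(X2[0])
    plus = [-sum(pi[i] * X2[i][j] for i in range(n)) for j in range(q)]
    minus = [-v for v in plus]                   # columns -X2 flip the sign
    return plus, minus


# Hypothetical current fit: median of y = (1, 2, 4) with intercept only.
# Basic columns: intercept, negative residual of obs 1, positive residual of obs 3.
A_B = [[1, -1, 0], [1, 0, 0], [1, 0, 1]]
c_B = [F(0), F(1, 6), F(1, 6)]
X2 = [[0, 1], [1, 1], [2, 0]]                    # two candidate covariates (columns)

plus, minus = price_excluded(A_B, c_B, X2)
scores = plus + minus
best = min(range(len(scores)), key=scores.__getitem__)
print(plus, minus, best)
```

Here the first candidate, entering with a positive coefficient, has the most negative price, which matches the upward trend of the toy responses in that covariate.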

Key Ideas

A model is like a port. The simplex tableau is like a cargo ship.

Evaluate models with data.

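Model Selection and Simplex Tableau

• Greedy Methods (Forward, Backward, Stepwise)

• Fit Criteria

$$\mathrm{MWAR}_F(\tau) = \min_{\text{full-model}} \frac{1}{n}\sum_{i=1}^{n} \rho_\tau[y_i - x_i'\beta], \qquad \mathrm{MWAR}_R(\tau) = \min_{\text{reduced-model}} \frac{1}{n}\sum_{i=1}^{n} \rho_\tau[y_i - x_i'\beta]$$

$$R^1(\tau) = 1 - \frac{\mathrm{MWAR}_F(\tau)}{\mathrm{MWAR}_R(\tau)} \quad (\text{vs. } R^2)$$

$$\mathrm{AIC}(\tau) = 2n\log(\mathrm{MWAR}(\tau)) + 2p$$

$$\mathrm{SIC}(\tau) = 2n\log(\mathrm{MWAR}(\tau)) + p\log n$$

AICC($\tau$) = $2n\log(\mathrm{MWAR}(\tau))$ + [penalty term garbled in the transcript]

Sawa's BIC($\tau$) = $2n\log(\mathrm{MWAR}(\tau))$ + [penalty term garbled in the transcript]

Likelihood ratio, Wald score, Rank score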

Simulation

True Model: p=20 and n=1000

[Model equation garbled in the transcript: $y$ is built from six covariate terms plus an error $e$, with the covariates i.i.d. Unif(0,1) and $e$ i.i.d. N(0,25).]

Simulation

Forward Selection Summary at quantile level 0.5

Step  Effect     Objective  p-value        QRR       ADJQRR     AIC       SIC
      Entered    Function   (Wald Scores)
-----------------------------------------------------------------------------------
0     Intercept  0.875704                  0.000000  -0.000500  -131.727  -125.819
1     x10        0.858893   0.00001        0.019198   0.018707  -151.111  -145.203
2     x15        0.850475   0.00001        0.028811   0.027838  -159.960  -148.145
3     x12        0.842748   0.00001        0.037634   0.036187  -168.087  -150.364
4     x1         0.837204   0.00006        0.043965   0.042047  -173.687  -150.056*
5     x5         0.832609   0.00058        0.049213   0.046827  -178.192  -148.653
6     x18        0.830066   0.00138        0.052117   0.049260  -180.250  -144.804

* Optimal Value Of Criterion

Selection stopped as the candidate for entry has p-value > 0.1.

Stop Details

Candidate For  Effect  Candidate Significance  Compare Significance
Entry          x3      0.13957                 > 0.1000 (p-value on Wald Score)
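The QRR column in the forward selection summary above appears to be consistent with $R^1 = 1 - \mathrm{MWAR}_F/\mathrm{MWAR}_R$ when the intercept-only model (step 0) is taken as the reduced model. That reading is an assumption; the sketch below recomputes the column from the reported objective values so it can be compared with the table.

```python
# Objective function values copied from the forward selection summary.
objective = {0: 0.875704, 1: 0.858893, 2: 0.850475, 3: 0.842748,
             4: 0.837204, 5: 0.832609, 6: 0.830066}


def r1(mwar_full, mwar_reduced):
    """R^1 = 1 - MWAR_F / MWAR_R."""
    return 1.0 - mwar_full / mwar_reduced


# Assumption: the reduced model is the intercept-only fit at step 0.
qrr = {step: r1(value, objective[0]) for step, value in objective.items()}
for step, value in qrr.items():
    print(step, round(value, 6))
```

Small differences in the last printed digit are expected, since the objective values in the table are themselves rounded.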

Simulation

Effects: Intercept x10 x15 x12 x1 x5 x18

Parameter Estimates at quantile level 0.5

Parameter  DF  Estimate  Standard Error  95% Confidence Limits  t Value  Pr > |t|
Intercept  1   -0.0519   0.3907          -0.8186   0.7148       -0.13    0.8944
x10        1    1.3327   0.3115           0.7215   1.9439        4.28    <.0001
x15        1    1.2865   0.3086           0.6809   1.8921        4.17    <.0001
x12        1    1.1219   0.3066           0.5203   1.7235        3.66    0.0003
x1         1    0.8864   0.3010           0.2957   1.4770        2.94    0.0033
x5         1    0.7483   0.3030           0.1536   1.3430        2.47    0.0137
x18        1    0.6918   0.3021           0.0990   1.2845        2.29    0.0222

Model Selection and Simplex Tableau

• Penalty Methods

LASSO penalty, OSCAR penalty, Grouped LASSO penalty

• Manipulating Simplex Algorithm for Penalty Methods

For example, the LASSO penalty can be measured by using a vector $a$ as follows:

LP (Parametric-Cost Form): $\min_{z}\; c'z + \lambda a'z$ s.t. $Az = b$ and $z \ge 0$

$$\begin{pmatrix} c_B' z_B & c' - c_B' A_B^{-1} A \\ a_B' z_B & a' - a_B' A_B^{-1} A \\ A_B^{-1} b & A_B^{-1} A \end{pmatrix}$$

where $a = (\mathbf{1}',\; \mathbf{1}',\; 0',\; 0')'$ according to $z = (\beta^{+\prime},\; \beta^{-\prime},\; \varepsilon^{+\prime},\; \varepsilon^{-\prime})'$.
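In a parametric-cost LP of the form $\min\, c'z + \lambda a'z$, a basis stays optimal for every $\lambda$ at which the combined reduced costs $(c' - c_B'A_B^{-1}A) + \lambda\,(a' - a_B'A_B^{-1}A)$ are nonnegative, so the two reduced-cost rows of the tableau give the range of $\lambda$ directly. The sketch below computes that range; the function name and the numbers are hypothetical and are not taken from the talk.

```python
import math


def lambda_range(d_c, d_a):
    """Interval (lo, hi) of lambda >= 0 with d_c[j] + lambda * d_a[j] >= 0 for all j.

    d_c and d_a stand for the reduced-cost rows c' - c_B' A_B^{-1} A and
    a' - a_B' A_B^{-1} A. Returns None if no such lambda exists.
    """
    lo, hi = 0.0, math.inf
    for dc, da in zip(d_c, d_a):
        if da > 0:
            lo = max(lo, -dc / da)     # constraint lambda >= -dc / da
        elif da < 0:
            hi = min(hi, -dc / da)     # constraint lambda <= -dc / da
        elif dc < 0:
            return None                # violated for every lambda
    return (lo, hi) if lo <= hi else None


# Hypothetical reduced-cost rows for three nonbasic columns.
d_c = [0.2, -0.1, 0.3]
d_a = [-0.5, 1.0, 0.0]
interval = lambda_range(d_c, d_a)
print(interval)
```

At either end of the interval some reduced cost reaches zero, and a pivot on that column moves to the basis that is optimal on the neighbouring range of $\lambda$; repeating this traces a piecewise-linear solution path.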

Simulation

True Model: p=11 and n=1000.

$$y = \sum_{g=1}^{3}\sum_{i=1}^{p_g} x_{gi}\,\beta_{gi} + e$$

True $\beta$ = ((2,3,2), (0,0,0,0), (-3,2,-2)), $X = (x_1, \ldots, x_{10})$ ~ i.i.d. N(0,1), $e$ ~ i.i.d. N(0,50).

Solution Path for LASSO QR

[Figure: solution path.] Penalty: $\sum_{i=1}^{p} |\beta_i| \le s$. True $\beta$ = ((2,3,2), (0,0,0,0), (-3,2,-2)).

Solution Path for OSCAR QR

[Figure: solution path.] True $\beta$ = ((2,3,2), (0,0,0,0), (-3,2,-2)).

Solution Path for Grouped-LASSO QR

[Figure: solution path.] Penalty: $\sum_{g=1}^{G} \max\{|\beta_{g1}|, \ldots, |\beta_{g p_g}|\} \le s$. True $\beta$ = ((2,3,2), (0,0,0,0), (-3,2,-2)).
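To make the two stated penalties concrete, the sketch below evaluates the LASSO penalty (sum of absolute coefficients) and the grouped-LASSO penalty (sum over groups of the largest absolute coefficient in the group) at the true coefficient vector of the simulation. The reading of the grouped penalty as a sum of group-wise maxima is a reconstruction from the slide, and the function names are illustrative.

```python
# True coefficients from the simulation, grouped as on the slides.
true_beta = [(2, 3, 2), (0, 0, 0, 0), (-3, 2, -2)]


def lasso_penalty(groups):
    """Sum of |beta_i| over all coefficients."""
    return sum(abs(b) for group in groups for b in group)


def grouped_lasso_penalty(groups):
    """Sum over groups of max_i |beta_gi| (max-norm grouping)."""
    return sum(max(abs(b) for b in group) for group in groups)


print(lasso_penalty(true_beta), grouped_lasso_penalty(true_beta))
```

Both penalties are piecewise linear in the coefficients, which is what allows them to be handled inside a linear program.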

Applications of Simplex Techniques

• Resampling

Cross-validation, Bootstrap

• Manipulating Simplex Algorithm for Resampling

1. Check whether an observation is active for a fitted model.
2. Drive out some observations by changing the objective function.
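The two steps above can be sketched with observation weights in the objective. The sketch assumes that "active" means an observation with zero residual under the fitted model, and that an observation is driven out by setting its cost coefficients to zero while the constraints stay unchanged; both readings are assumptions, as are the function names and the toy data.

```python
def check_loss(t, tau):
    """rho_tau(t) = tau * t^+ + (1 - tau) * t^-."""
    return tau * max(t, 0.0) + (1.0 - tau) * max(-t, 0.0)


def residuals(X, y, beta):
    return [yi - sum(x * b for x, b in zip(row, beta)) for row, yi in zip(X, y)]


def weighted_objective(X, y, beta, tau, weights):
    """(1/n) * sum_i w_i * rho_tau(y_i - x_i' beta); w_i = 0 drives observation i out."""
    n = len(y)
    return sum(w * check_loss(r, tau)
               for w, r in zip(weights, residuals(X, y, beta))) / n


def active_observations(X, y, beta, tol=1e-9):
    """Indices whose residual is zero, i.e. points the fitted model passes through."""
    return [i for i, r in enumerate(residuals(X, y, beta)) if abs(r) <= tol]


# Hypothetical intercept-only data; the last observation is held out.
X = [[1.0], [1.0], [1.0], [1.0]]
y = [1.0, 2.0, 4.0, 10.0]
beta = [2.0]
full = weighted_objective(X, y, beta, 0.5, [1, 1, 1, 1])
held_out = weighted_objective(X, y, beta, 0.5, [1, 1, 1, 0])
active = active_observations(X, y, beta)
print(full, held_out, active)
```

Under the same scheme a bootstrap resample corresponds to integer weights equal to the number of times each observation is drawn, so only the cost row changes between resamples.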

Key Ideas

Simplex Tableau can be used to:

• update an optimal partial model to another optimal partial model or full model.

• add extra constraints on a model.

• update an optimal model on a subset of a dataset to the optimal model on another subset of the dataset.

Computational Goals

• High-performance computing

• Massive data processing

• Re-usable programs

Parallel Computing

• Parallel computation can expedite the tableau simplex algorithm in:

1. Building initial tableau

2. Sorting/Ordering positive tableau rows

3. Changing the signs of tableau rows

4. Pivot Updating
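A serial sketch of a tableau simplex solver for the QR linear program shows where the four steps occur. It assumes the standard-form casting with columns (coefficient plus, coefficient minus, positive residual parts, negative residual parts), starts from all coefficients at zero, uses Bland's rule to avoid cycling, and interprets step 2 as the ratio test over rows with a positive entry in the entering column. These choices and the function name are assumptions; the talk's own implementation is not shown in the slides.

```python
from fractions import Fraction as F


def qr_simplex(X, y, tau):
    """Tableau simplex for min c'z s.t. Az = b, z >= 0 (exact arithmetic)."""
    n, p = len(X), len(X[0])
    m = 2 * p + 2 * n
    tau = F(tau)
    c = [F(0)] * (2 * p) + [tau / n] * n + [(1 - tau) / n] * n
    # Step 1: build the initial tableau [A | b] with residual columns basic.
    T, basis = [], []
    for i in range(n):
        row = [F(v) for v in X[i]] + [-F(v) for v in X[i]]
        row += [F(int(k == i)) for k in range(n)]
        row += [-F(int(k == i)) for k in range(n)]
        row.append(F(y[i]))
        if row[-1] >= 0:
            basis.append(2 * p + i)           # positive residual part is basic
        else:
            row = [-v for v in row]           # Step 3: sign change so rhs >= 0
            basis.append(2 * p + n + i)       # negative residual part is basic
        T.append(row)
    while True:
        # Reduced costs d_j = c_j - c_B' A_B^{-1} A_j.
        d = [c[j] - sum(c[basis[i]] * T[i][j] for i in range(n)) for j in range(m)]
        enter = next((j for j in range(m) if d[j] < 0), None)   # Bland's rule
        if enter is None:
            break
        # Step 2: ratio test over rows with a positive entry in the entering column.
        rows = [i for i in range(n) if T[i][enter] > 0]
        leave = min(rows, key=lambda i: (T[i][-1] / T[i][enter], basis[i]))
        # Step 4: pivot update.
        piv = T[leave][enter]
        T[leave] = [v / piv for v in T[leave]]
        for i in range(n):
            if i != leave and T[i][enter] != 0:
                f = T[i][enter]
                T[i] = [a - f * b for a, b in zip(T[i], T[leave])]
        basis[leave] = enter
    z = [F(0)] * m
    for i, j in enumerate(basis):
        z[j] = T[i][-1]
    beta = [z[j] - z[p + j] for j in range(p)]
    return beta, sum(c[j] * z[j] for j in range(m))


# Hypothetical toy problems.
beta_med, obj_med = qr_simplex([[1], [1], [1]], [1, 2, 4], 0.5)
beta_neg, obj_neg = qr_simplex([[1], [1], [1]], [-1, -2, -4], 0.5)
beta_line, obj_line = qr_simplex([[1, 0], [1, 1], [1, 2], [1, 3]], [1, 2, 4, 3], 0.5)
print(beta_med, obj_med, beta_line, obj_line)
```

The row operations in the pivot update and the per-column reduced-cost computation are independent across rows and columns, which is where parallel execution would apply.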

References

• Chen, C. and Wei, Y. (2005), Computational Issues for Quantile Regression, Sankhyā: The Indian Journal of Statistics, (67), pp.399-417.

• Koenker, R. (2005), Quantile Regression, Cambridge University Press.

• Koenker, R. and Machado, J.A.F. (1999), Goodness of fit and related inference processes for quantile regression, Journal of the American Statistical Association, (94), pp.1296-1310.

• Li, Y. and Zhu, J. (2008), L1-norm quantile regression, Journal of Computational & Graphical Statistics, (17), pp.163-185.

• Sawa, T. (1978), Information criteria for discriminating among alternative regression models, Econometrica, (46), pp.1273-1282.

• Schwarz, G. (1978), Estimating the dimension of a model, Annals of Statistics, (6), pp.461-464.

• Yao, Y. and Lee, Y. (2007), Another look at linear programming for feature selection via methods of regularization, Technical Report No. 800, Department of Statistics, The Ohio State University.
