REGRESSION SHRINKAGE AND SELECTION VIA THE LASSO
Author: Robert Tibshirani
Journal of the Royal Statistical Society, 1996
Presentation: Tinglin Liu, Oct. 27, 2010


Page 1:

REGRESSION SHRINKAGE AND SELECTION VIA THE LASSO

Author: Robert Tibshirani
Journal of the Royal Statistical Society, 1996
Presentation: Tinglin Liu
Oct. 27, 2010

Page 2:

OUTLINE

What’s the Lasso?

Why should we use the Lasso?

Why will the results of Lasso be sparse?

How to find the Lasso solutions?


Page 3:

OUTLINE

What’s the Lasso?

Why should we use the Lasso?

Why will the results of Lasso be sparse?

How to find the Lasso solutions?


Page 4:

LASSO (LEAST ABSOLUTE SHRINKAGE AND SELECTION OPERATOR)

Definition

The lasso is a shrunken version of the ordinary least squares estimate, obtained by minimizing the residual sum of squares subject to the constraint that the sum of the absolute values of the coefficients is no greater than a constant.


Page 5:

LASSO (LEAST ABSOLUTE SHRINKAGE AND SELECTION OPERATOR)

Features: Equivalent to the Classic Expression of Sparse Coding

The lasso estimate is defined by

\[
(\hat\alpha, \hat\beta) = \arg\min \Big\{ \sum_{i=1}^{N} \Big( y_i - \alpha - \sum_j \beta_j x_{ij} \Big)^2 \Big\}
\quad \text{subject to} \quad \sum_j |\beta_j| \le t .
\]

Here, standardization is required: every predictor variable is centred and normalized, \(\sum_i x_{ij}/N = 0\) and \(\sum_i x_{ij}^2/N = 1\), and the response is centred so that \(\bar y = 0\), which gives \(\hat\alpha = \bar y = 0\). Dropping \(\alpha\), the problem is equivalent to the classic penalized (sparse-coding) form

\[
\hat\beta = \arg\min_{\beta} \; \| Y - X\beta \|_2^2 + \lambda \| \beta \|_1 .
\]

Murray, W., Gill, P. and Wright, M. (1981) Practical Optimization, Chapter 5. Academic Press.
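The constrained definition can be checked numerically on a tiny example. The sketch below (an illustration with invented numbers, not the paper's algorithm) brute-force grid-searches the constrained problem for an orthonormal two-predictor design, where the minimizer can be worked out by hand; notice the second coefficient comes out exactly 0.

```python
# Brute-force illustration of the lasso definition: minimize the residual
# sum of squares subject to |b1| + |b2| <= t, by grid search.

# Orthonormal toy design: two observations, two predictors (X = identity),
# so RSS = (3 - b1)^2 + (1 - b2)^2 and the OLS estimate is (3, 1).
y = [3.0, 1.0]
t = 2.0  # constraint level, smaller than |3| + |1| = 4, so shrinkage occurs

def rss(b1, b2):
    return (y[0] - b1) ** 2 + (y[1] - b2) ** 2

best = None
grid = [i / 10 for i in range(-30, 31)]  # step 0.1 over [-3, 3]
for b1 in grid:
    for b2 in grid:
        if abs(b1) + abs(b2) <= t + 1e-12:  # feasible region (the diamond)
            if best is None or rss(b1, b2) < rss(*best):
                best = (b1, b2)

print(best)  # -> (2.0, 0.0): the second coefficient is exactly 0
```

The minimizer sits at a corner of the constraint region, which is the geometric source of the sparsity discussed later in the deck.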

Page 6:

LASSO (LEAST ABSOLUTE SHRINKAGE AND SELECTION OPERATOR)

Features: Sparse Solutions

Let \(\hat\beta_j^0\) be the full least squares estimates and let \(t_0 = \sum_j |\hat\beta_j^0|\). Values \(t < t_0\) will cause shrinkage of the solutions towards 0, and some coefficients may be exactly 0. Define the scaled lasso parameter as

\[
s = t / t_0 = t \Big/ \sum_j |\hat\beta_j^0| .
\]
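As a concrete computation of the scaled parameter (the OLS values here are invented for illustration):

```python
# Scaled lasso parameter s = t / t0, where t0 is the L1 norm of the full
# least squares estimate. Values of s below 1 cause shrinkage.
beta_ols = [2.5, -1.0, 0.5]          # hypothetical full OLS estimates
t0 = sum(abs(b) for b in beta_ols)   # t0 = 4.0; for t >= t0 the constraint is inactive
t = 2.0                              # chosen constraint level
s = t / t0                           # scaled lasso parameter in [0, 1]
print(s)  # -> 0.5
```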

Page 7:

LASSO (LEAST ABSOLUTE SHRINKAGE AND SELECTION OPERATOR)

Features: Lasso as a Bayes Estimate

Assume that \(y \sim N(X\beta, \sigma^2 I)\), and that each \(\beta_j\) independently has the double-exponential (Laplace) prior density

\[
\pi(\beta_j) = \frac{\lambda}{2} \, e^{-\lambda |\beta_j|} .
\]

Then we can derive the lasso regression estimate as the Bayes posterior mode.
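The posterior-mode claim follows in one line. With the Gaussian likelihood and independent Laplace priors, the negative log-posterior is, up to additive constants,

```latex
-\log p(\beta \mid y)
  = \frac{1}{2\sigma^2} \, \| y - X\beta \|_2^2
  + \lambda \sum_j |\beta_j| + \text{const},
```

so maximizing the posterior over \(\beta\) is exactly minimizing a lasso criterion, with the effective penalty weight determined by \(\lambda\) and \(\sigma^2\).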

Page 8:

OUTLINE

What’s the Lasso?

Why should we use the Lasso?

Why will the results of Lasso be sparse?

How to find the Lasso solutions?


Page 9:

WHY LASSO?

Prediction Accuracy

Assume \(y = f(x) + \varepsilon\), with \(E(\varepsilon) = 0\) and \(\operatorname{var}(\varepsilon) = \sigma^2\). Then the prediction error of the estimate \(\hat f(x)\) is

\[
\begin{aligned}
\operatorname{Err}(x) &= E\big[ (y - \hat f(x))^2 \big] \\
&= \sigma^2 + \big[ E\hat f(x) - f(x) \big]^2 + E\big[ \hat f(x) - E\hat f(x) \big]^2 \\
&= \sigma^2 + \operatorname{bias}^2\big(\hat f(x)\big) + \operatorname{var}\big(\hat f(x)\big) .
\end{aligned}
\]

OLS estimates often have low bias but large variance; the lasso can improve overall prediction accuracy by sacrificing a little bias to reduce the variance of the predicted value.
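The decomposition can be verified exactly on a toy discrete model (all numbers invented for illustration): training noise makes the estimate random, test noise makes \(y\) random, and the three terms add up.

```python
# Exact check of Err = sigma^2 + bias^2 + variance on a tiny discrete model.
# True value f = 2 at a fixed x. A hypothetical shrinkage estimator uses a
# noisy training observation: fhat = 0.8 * (f + e_train), e_train in {-1, +1}.
# The test response is y = f + e_test, e_test in {-1, +1}, independent.
f = 2.0
fhats = [0.8 * (f + e) for e in (-1.0, 1.0)]   # equally likely estimates
ys = [f + e for e in (-1.0, 1.0)]              # equally likely responses

def mean(xs):
    return sum(xs) / len(xs)

# Expected squared prediction error, enumerated over the 4 equally likely cases.
err = mean([(y - fh) ** 2 for fh in fhats for y in ys])

sigma2 = mean([(y - f) ** 2 for y in ys])              # irreducible noise = 1.0
bias2 = (mean(fhats) - f) ** 2                         # squared bias = 0.16
var = mean([(fh - mean(fhats)) ** 2 for fh in fhats])  # variance = 0.64

# The three terms reproduce the prediction error exactly (1.8 in total).
assert abs(err - (sigma2 + bias2 + var)) < 1e-9
```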

Page 10:

WHY LASSO?

Interpretation

In many cases, the response \(y\) is determined by just a small subset of the predictor variables.

Page 11:

OUTLINE

What’s the Lasso?

Why should we use the Lasso?

Why will the results of Lasso be sparse?

How to find the Lasso solutions?


Page 12:

WHY SPARSE?

Geometry of the Lasso

The least squares criterion equals the quadratic function

\[
\sum_{i=1}^{N} \Big( y_i - \sum_j \beta_j x_{ij} \Big)^2 ,
\]

which is expressed as elliptical contours centred at the OLS estimates. The L1-norm constraint \(\sum_j |\beta_j| \le t\) is expressed as a square (a diamond, rotated relative to the axes) centred at the origin. The lasso solution is the first place where an expanding contour touches the square; when that contact happens at a corner, some coefficients are exactly 0.

Page 13:

WHY SPARSE?

[Figure: elliptical contours of the residual sum of squares meeting the L1 constraint region]

Page 14:

WHY SPARSE?

Geometry of the Lasso

Since the variables are standardized, the principal axes of the contours are at 45° to the coordinate axes. The correlations between the variables can influence the axis lengths of the elliptical contours, but have almost no influence on the lasso solutions.
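The corner-touching picture has a closed form in the orthonormal-design case (\(X^\top X = I\)): each lasso coefficient is the OLS coefficient soft-thresholded, so small OLS coefficients become exactly 0. A minimal sketch with invented numbers (\(\gamma\) denotes the threshold implied by the constraint level \(t\)):

```python
# Soft-thresholding: the lasso solution under an orthonormal design.
# Each coefficient is beta_j = sign(b0_j) * max(|b0_j| - gamma, 0),
# where b0_j is the full least squares estimate and gamma is the
# threshold determined by the constraint level t.
def soft_threshold(b0, gamma):
    mag = abs(b0) - gamma
    if mag <= 0:
        return 0.0                      # coefficient set exactly to zero
    return mag if b0 > 0 else -mag

beta_ols = [3.0, -0.5, -2.0, 0.2]       # hypothetical OLS estimates
gamma = 1.0
beta_lasso = [soft_threshold(b, gamma) for b in beta_ols]
print(beta_lasso)  # -> [2.0, 0.0, -1.0, 0.0]
```

The two small OLS coefficients are zeroed out while the large ones are merely shrunk, which is exactly the selection-plus-shrinkage behaviour the deck describes.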

Page 15:

OUTLINE

What’s the Lasso?

Why should we use the Lasso?

Why will the results of Lasso be sparse?

How to find the Lasso solutions?


Page 16:

HOW TO SOLVE THE PROBLEM?

The absolute-value constraint can be translated into \(2^p\) linear inequality constraints (\(p\) stands for the number of predictor variables):

\[
\sum_j |\beta_j| \le t \quad \Longleftrightarrow \quad G\beta \le t\mathbf{1},
\]

where \(G\) is a \(2^p \times p\) matrix whose rows are the sign vectors \(\delta \in \{-1, +1\}^p\), corresponding to the \(2^p\) linear inequality constraints. But direct application of this procedure is not practical, due to the fact that \(2^p\) may be very large.

Lawson, C. and Hansen, R. (1974) Solving Least Squares Problems. Prentice Hall.
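The translation works because the largest value of \(\delta^\top \beta\) over all sign vectors is exactly the L1 norm of \(\beta\), so all \(2^p\) inequalities hold precisely when the single L1 constraint holds. A quick check for \(p = 3\) with an arbitrary test vector:

```python
from itertools import product

# The L1 constraint sum_j |b_j| <= t is equivalent to G b <= t*1, where the
# rows of G are all 2^p sign vectors delta in {-1, +1}^p.
p = 3
G = list(product((-1.0, 1.0), repeat=p))     # 2^p = 8 rows for p = 3

beta = [1.5, -2.0, 0.25]                     # arbitrary test vector
l1 = sum(abs(b) for b in beta)               # L1 norm = 3.75
max_row = max(sum(d * b for d, b in zip(delta, beta)) for delta in G)

# The binding row is the one matching the signs of beta, and it attains
# exactly the L1 norm, so: G b <= t*1  <=>  ||b||_1 <= t.
print(len(G), l1, max_row)  # -> 8 3.75 3.75
```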

Page 17:

HOW TO SOLVE THE PROBLEM?

Outline of the Algorithm

Sequentially introduce the inequality constraints: solve the least squares problem under a small subset of the \(2^p\) constraints, and if the solution violates \(\sum_j |\beta_j| \le t\), add the sign constraint it violates and re-solve, until the solution is feasible.

In practice, the average number of iteration steps required is in the range \((0.5p, 0.75p)\), so the algorithm is acceptable.

Lawson, C. and Hansen, R. (1974) Solving Least Squares Problems. Prentice Hall.
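The sequential procedure above needs a constrained least-squares solver, so as a short runnable stand-in here is a later, simpler method for the equivalent penalized form: proximal gradient descent (ISTA), which alternates a gradient step on the residual sum of squares with soft-thresholding. This is not the paper's algorithm; the data, penalty, and step size are invented, and an orthonormal design is used so the answer is known in closed form.

```python
# Proximal gradient (ISTA) for the penalized lasso problem
#   minimize ||y - X b||^2 + lam * sum_j |b_j|
# Pure-Python matrix operations on tiny sizes; not the paper's algorithm.

def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def soft(z, g):
    return 0.0 if abs(z) <= g else (z - g if z > 0 else z + g)

# Orthonormal toy design (columns of the identity), so the exact solution
# is b_j = soft(y_j, lam / 2) and convergence is easy to check.
X = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
y = [3.0, 0.5, -2.0]
lam = 1.0
step = 0.5   # safe here: the largest eigenvalue of X^T X is 1

b = [0.0, 0.0]
for _ in range(100):
    r = [yi - ri for yi, ri in zip(y, matvec(X, b))]       # residual y - X b
    Xt = list(zip(*X))                                     # transpose of X
    grad = [-2.0 * sum(c * ri for c, ri in zip(col, r)) for col in Xt]
    b = [soft(bj - step * gj, step * lam) for bj, gj in zip(b, grad)]

print(b)  # -> [2.5, 0.0]: soft(3, 0.5) and soft(0.5, 0.5)
```

For this orthonormal design ISTA lands on the closed-form answer after the first iteration; on general designs it converges over many iterations instead.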
