
Page 1:

Lecture 3: Regularized Linear models

Hien Van Nguyen

University of Houston

9/6/2017

Page 2:

Bias-Variance Tradeoff

• Suppose the data arise from this model

• Assume the data are fixed and rewrite the MSE

• A more complex model will typically have lower bias (it fits better) but higher variance

9/6/2017 Machine Learning 2

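As a sketch of the standard setup these bullets refer to (the symbols below are my assumption, since the slide's equations did not carry over; "true value" refers to f(x)):

```latex
% Data-generating model: true function f plus zero-mean noise
y = f(x) + \epsilon, \qquad \mathbb{E}[\epsilon] = 0, \quad \mathrm{Var}(\epsilon) = \sigma_\epsilon^2

% MSE at a fixed input x_0, for a fitted predictor \hat{f}:
\mathbb{E}\big[(y_0 - \hat{f}(x_0))^2\big]
  = \underbrace{\sigma_\epsilon^2}_{\text{irreducible noise}}
  + \underbrace{\big(\mathbb{E}[\hat{f}(x_0)] - f(x_0)\big)^2}_{\text{Bias}^2}
  + \underbrace{\mathrm{Var}\big(\hat{f}(x_0)\big)}_{\text{Variance}}
```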

Page 3:

Bias-Variance Tradeoff – Linear Regression

• Linear model

• Bias-Variance decomposition of linear model


Page 4:

Bias-Variance Tradeoff – Linear Regression

• Each additional variable in the predictor adds the same amount of variance, σ_ε²/n, regardless of whether its true coefficient is large or small (or even zero)

• In other words, the variance term of the error scales linearly with the number of variables

• On a side note, the variance is inversely proportional to the number of training samples n
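For context, the standard result behind these bullets (assuming least-squares regression with p variables and n training samples):

```latex
% Average variance of the least-squares fit over the n training inputs:
\frac{1}{n}\sum_{i=1}^{n} \mathrm{Var}\big(\hat{f}(x_i)\big) = \frac{p}{n}\,\sigma_\epsilon^2
% i.e., each of the p variables contributes \sigma_\epsilon^2 / n of variance,
% so variance grows linearly in p and shrinks as 1/n.
```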


Page 5:

Why regularize the model?

• Predictive ability: Linear regression is unbiased (when the true model is linear) but can suffer from high variance. In some applications it is desirable to sacrifice some bias for smaller variance

• Interpretability: A large number of predictor variables makes the model difficult to interpret, and variables can be highly correlated. We need to identify a smaller subset of important variables

• Data scarcity: When the dimension of the data is large and the number of training samples is small, non-regularized linear regression cannot be used (the least-squares problem becomes singular)


Page 6:

Ridge Regression

• Ridge regression (RR) is similar to least-squares regression, but shrinks the regression coefficients towards zero by imposing a penalty on their size

• One-variable case

• Important convention:

• The constant (intercept) term b can be removed (e.g., by centering the data)
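The one-variable ridge objective presumably shown on this slide (my notation):

```latex
J(w) = \sum_{i=1}^{n} (y_i - w\,x_i)^2 + \lambda\, w^2, \qquad \lambda \ge 0
% The first term is the least-squares fit; the second penalizes the size of w.
```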


Page 7:

Ridge Regression

• Solution:
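For the one-variable case, setting the derivative of the penalized objective to zero gives the closed form (my reconstruction; λ is the ridge penalty):

```latex
\frac{dJ}{dw} = -2\sum_i x_i (y_i - w x_i) + 2\lambda w = 0
\quad\Longrightarrow\quad
\hat{w} = \frac{\sum_i x_i y_i}{\sum_i x_i^2 + \lambda}
% The denominator grows with \lambda, shrinking \hat{w} toward zero.
```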


Page 8:

Multivariate Ridge Regression

• Multivariate version

• Take the derivative and set it to zero:

• Regularization makes the problem non-singular even if XXᵀ is not full rank

• Scenarios where XXᵀ is not full rank:
• One dimension can be computed as a linear combination of the other dimensions
• The dimension is larger than the number of training samples
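A minimal NumPy sketch (an assumed illustration, not the slides' code) of the ridge closed form, solving (XXᵀ + λI)w = Xy in the slides' convention that the columns of X are training samples, for the rank-deficient case d > n:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 5, 3                      # more dimensions than samples -> X X^T is rank-deficient
X = rng.normal(size=(d, n))      # columns are training samples (slide convention)
y = rng.normal(size=n)

lam = 0.1
G = X @ X.T                      # d x d Gram matrix, rank at most n < d
print(np.linalg.matrix_rank(G))  # 3: not full rank, so plain least squares is singular

# Ridge: adding lam * I makes the system invertible for any lam > 0
w_ridge = np.linalg.solve(G + lam * np.eye(d), X @ y)
print(w_ridge.shape)             # (5,)
```

Without the λI term, `np.linalg.solve(G, X @ y)` would fail (or be numerically meaningless) because G is singular here.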


Page 9:

Least Absolute Shrinkage and Selection Operator (Lasso)

• Motivation:

• Ridge regression rarely sets coefficients to zero exactly

→ cannot perform variable selection in the linear model

→ less interpretable model

• Solution: use a different regularizer


Page 10:

Lasso

• Ridge uses ℓ2-norm while lasso uses ℓ1-norm

• Many coefficients will be shrunk to zero exactly

• The regularizer is strongly affected by the scale of each variable: a variable with good predictive power can be penalized more simply because of its large scale

→ It is good practice to scale variables to unit variance
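A quick NumPy illustration (assumed, not from the slides) of putting variables on a common scale before applying the penalty:

```python
import numpy as np

rng = np.random.default_rng(1)
scales = np.array([1.0, 100.0, 0.01])      # wildly different variable scales
A = rng.normal(size=(200, 3)) * scales     # rows are samples, columns are variables

# z-score each column so the regularizer penalizes every variable equally
A_std = (A - A.mean(axis=0)) / A.std(axis=0)

print(np.round(A_std.std(axis=0), 6))      # every column now has unit variance
```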


Page 11:

Lasso

• No closed-form solution; it must be solved with iterative optimization methods

• The objective function cannot be decomposed explicitly into bias and variance

• But a similar trend holds: larger λ leads to higher bias and lower variance

• More suitable for variable selection
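Since there is no closed form, here is a sketch of one standard solver, cyclic coordinate descent with soft-thresholding (an assumed illustration, not the slides' code; names and data are made up):

```python
import numpy as np

def soft_threshold(z, lam):
    """Shrink z toward zero; returns exactly 0 when |z| <= lam."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def lasso_cd(A, y, lam, n_iter=200):
    """Minimize (1/2)||y - A w||^2 + lam * ||w||_1 by cyclic coordinate descent."""
    n, p = A.shape
    w = np.zeros(p)
    col_sq = (A ** 2).sum(axis=0)              # ||A_j||^2 for each column j
    for _ in range(n_iter):
        for j in range(p):
            r_j = y - A @ w + A[:, j] * w[j]   # residual with coordinate j removed
            w[j] = soft_threshold(A[:, j] @ r_j, lam) / col_sq[j]
    return w

rng = np.random.default_rng(0)
n, p = 50, 6
A = rng.normal(size=(n, p))
w_true = np.array([3.0, -2.0, 0.0, 0.0, 1.5, 0.0])  # sparse ground truth
y = A @ w_true + 0.1 * rng.normal(size=n)

w_hat = lasso_cd(A, y, lam=5.0)
print(np.round(w_hat, 2))   # entries whose true coefficient is 0 are typically exactly 0
```

The soft-thresholding step is what produces exact zeros, which is why lasso (unlike ridge) performs variable selection.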


Page 12:

Coefficient paths – Prostate cancer


[Figure: coefficient paths on the prostate cancer data as the regularization strength varies; left panel: Ridge, right panel: Lasso]