Regularization Methods for Linear Regression


Page 1: Regularization Methods for Linear Regression

Regularization Methods for Linear Regression

Mathilde Mougeot

ENSIIE

2017-2018

Mathilde Mougeot (ENSIIE) MRR2017 1 / 66

Page 2: Regularization Methods for Linear Regression

Agenda

Lessons

• 3 plenary lessons (MRR)

• 3 Practical work sessions using R (mrr)

• 10 Project sessions (ipR)

Before next session, install on your computer

1 R software, https://www.r-project.org/

2 Rstudio, https://www.rstudio.com/

Documents are available (at this stage)

• https://sites.google.com/site/MougeotMathilde/teaching

Mathilde Mougeot (ENSIIE) MRR2017 2 / 66

Page 3: Regularization Methods for Linear Regression

Plan

A word on data and predictive models

Data are everywhere

• Industry (Temperature, IR sensors...)

• Finance : transactions

• Marketing : consumer data.

• on your phone (GPS, mail, music, ...)

→ Databases are available everywhere: from small data sets to Big Data

Mathilde Mougeot (ENSIIE) MRR2017 3 / 66

Page 5: Regularization Methods for Linear Regression

Plan

A word on data and predictive models

Nowadays, predictive models are crucial for monitoring and diagnosis

• Industry : Health monitoring, Energy...

• Finance : forecast of the evolution of the market

• Marketing : scoring

• Health

→ Machine learning models are used to mine and operate on the data.

Mathilde Mougeot (ENSIIE) MRR2017 4 / 66

Page 6: Regularization Methods for Linear Regression

Plan

Regularization Methods for Linear Regression

- Linear regression and regularized linear regression belong to the predictive model family.
- Linear regression is an old model but still very useful! (Gauss, 1785; Legendre, 1805)

Outline of the lesson

• Motivations

• Ordinary Least Square -OLS- (geometrical approach)

• The linear Model (probabilistic approach)

• Using R software for modeling

Mathilde Mougeot (ENSIIE) MRR2017 5 / 66

Page 7: Regularization Methods for Linear Regression

Plan

Evolution of the average age of the French population

Mathilde Mougeot (ENSIIE) MRR2017 6 / 66

Page 8: Regularization Methods for Linear Regression

Plan

Evolution of the average age of the French population

Mathilde Mougeot (ENSIIE) MRR2017 7 / 66

Page 9: Regularization Methods for Linear Regression

Plan

Modeling the average age of the French population

Mathilde Mougeot (ENSIIE) MRR2017 8 / 66

Page 10: Regularization Methods for Linear Regression

Plan

Application : Social Networks

Facebook users :

date / number of users (millions):
31/12/04: 0
31/12/05: 5
31/12/06: 10
31/3/07: 20
30/12/07: 60
30/6/08: 100
30/12/08: 150
30/3/09: 170
30/6/09: 200
30/9/09: 250
30/12/09: 300
30/3/10: 400
30/6/10: 500
30/6/11: 750

Mathilde Mougeot (ENSIIE) MRR2017 9 / 66

Page 11: Regularization Methods for Linear Regression

Plan

Evolution of the number of Facebook users

Motivations: investment, forecast
Mathilde Mougeot (ENSIIE) MRR2017 10 / 66

Page 12: Regularization Methods for Linear Regression

Plan

Modeling the evolution of the number of Facebook users

Mathilde Mougeot (ENSIIE) MRR2017 11 / 66

Page 13: Regularization Methods for Linear Regression

Introduction : Regression model

• (Y, X): pair of variables
Y: target, quantitative variable
X = (X1, X2, ..., Xp): covariates, quantitative variables

→ The goal is to propose a regression model to explain Y given X. The parameters of the model are computed using a set of data:

Y = F_data(X) = F_data(X1, ..., Xp)

In this case, F is a linear function.

• Questions :

- What is the performance of this model?
- What are the main explanatory variables?
- Is it possible to use the model to predict new values? To forecast?
- Can we improve the model?

Page 14: Regularization Methods for Linear Regression

Boston Housing Data

The original data are n = 506 observations on p = 14 variables, medv (median value) being the target variable.

crim      per capita crime rate by town
zn        proportion of residential land zoned for lots over 25,000 sq.ft
indus     proportion of non-retail business acres per town
chas      Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
nox       nitric oxides concentration (parts per 10 million)
rm        average number of rooms per dwelling
age       proportion of owner-occupied units built prior to 1940
dis       weighted distances to five Boston employment centres
rad       index of accessibility to radial highways
tax       full-value property-tax rate per USD 10,000
ptratio   pupil-teacher ratio by town
b         1000(B − 0.63)^2 where B is the proportion of blacks by town
lstat     percentage of lower status of the population
medv      median value of owner-occupied homes in USD 1000's

Page 15: Regularization Methods for Linear Regression

Boston Housing Data

The data:
nr  crim   zn  indus  chas  nox   rm    age   dis   rad  tax  ptratio  b      lstat  medv
1   0.006  18  2.3    0     0.53  6.57  65.2  4.09  1    296  15.3     396.9  4.9    24.0
2   0.027  0   7.0    0     0.46  6.42  78.9  4.96  2    242  17.8     396.9  9.1    21.6
3   0.027  0   7.0    0     0.46  7.18  61.1  4.96  2    242  17.8     392.8  4.0    34.7
4   0.032  0   2.1    0     0.45  6.99  45.8  6.06  3    222  18.7     394.6  2.9    33.4
5   0.069  0   2.1    0     0.45  7.14  54.2  6.06  3    222  18.7     396.9  5.3    36.2
... ...    ... ...    ...   ...   ...   ...   ...   ...  ...  ...      ...    ...    ...

Page 16: Regularization Methods for Linear Regression

Boston Housing Data

Different points of view may exist for studying these data. Model: Y = F_data(X)

• Mining approach

− Evaluate the performance of the model

− What are the most important variables? (variable selection) → sparse models, less complex, better performance

• Predictive approach

− Inference and simulation
→ point estimation for new values of the covariates
→ confidence interval computation

Page 17: Regularization Methods for Linear Regression

Outline

Outline

• Applications

• Ordinary Least Squares (OLS) / Moindres Carrés Ordinaires (MCO)

• Linear Model

• Regularization methods : ridge, lasso

Mathilde Mougeot (ENSIIE) MRR2017 16 / 66

Page 18: Regularization Methods for Linear Regression

OLS

Ordinary Least Square (OLS)

Page 19: Regularization Methods for Linear Regression

Outline

Ordinary Least Square (OLS)

• Values/Variables :

- Y ∈ R: value / target variable
- X = (X_1, ..., X_p), X ∈ R^p: values / covariates

• Data : S = {(xi , yi ) i = 1, ..., n, yi ∈ R, xi ∈ Rp}

• Goal : Modeling Y linearly with X , with a ”small” ε term

• Y = β1 + β2X2 + · · ·+ βpXp+ ε

• If X1 = 1, we can write Y = β1X1 + β2X2 + · · ·+ βpXp+ ε

• Y = ∑_{j=1}^p β_j X_j + ε (p parameters β_j, p variables, "one constant" inside)

Mathilde Mougeot (ENSIIE) MRR2017 18 / 66

Page 22: Regularization Methods for Linear Regression

OLS

Ordinary Least Square (OLS)
Simple Linear Regression model

Page 23: Regularization Methods for Linear Regression

Outline

Simple Linear Regression: example
We only have one covariate (X) to explain the target variable (Y).
The scatter plot is shown below:

[Figure: scatter plot of y versus x]

Mathilde Mougeot (ENSIIE) MRR2017 20 / 66

Page 24: Regularization Methods for Linear Regression

Outline

Simple Linear Regression : example

For the observation pairs (Y_i, X_i), 1 ≤ i ≤ n, the goal here is to minimize ∑_{i=1}^n (Y_i − (β_1 + β_2 X_i))^2

[Figure: scatter plot of y versus x with the fitted regression line]

Mathilde Mougeot (ENSIIE) MRR2017 21 / 66

Page 25: Regularization Methods for Linear Regression

Outline

Ordinary Least Square : simple linear model

Formalism :

• Distance to a single point: (y_i − x_i β_2 − β_1)^2

• Distance to the whole sample: ∑_{i=1}^n (y_i − x_i β_2 − β_1)^2

→ Best line: intercept β_1 and slope β_2 such that ∑_{i=1}^n (y_i − x_i β_2 − β_1)^2 is minimum, among all possible values of β_1 and β_2.

OLS estimator:
The values estimated by OLS (the estimates) for β_1 and β_2 satisfy:

(β_1^{ols}, β_2^{ols}) = argmin_{(β_1, β_2) ∈ R^2} ∑_{i=1}^n (y_i − x_i β_2 − β_1)^2

Mathilde Mougeot (ENSIIE) MRR2017 22 / 66

Page 26: Regularization Methods for Linear Regression

Outline

Ordinary Least Square: simple linear model

OLS estimator (observations):
The values estimated by OLS (the estimates) for β_1 and β_2 satisfy:
(β_1^{ols}, β_2^{ols}) = argmin_{(β_1, β_2) ∈ R^2} ∑_{i=1}^n (y_i − x_i β_2 − β_1)^2

OLS estimator (vector notation): y, x, 1_n
The values estimated by OLS (the estimates) for β_1 and β_2 satisfy:
(β_1^{ols}, β_2^{ols}) = argmin_{(β_1, β_2) ∈ R^2} ‖y − x β_2 − 1_n β_1‖_2^2

OLS estimator (matrix notation, Y (n,1); X (n,2)):
The values estimated by OLS (the estimates) for β_1 and β_2 satisfy:
(β_1^{ols}, β_2^{ols}) = argmin_{(β_1, β_2) ∈ R^2} ‖Y − Xβ‖_2^2

Mathilde Mougeot (ENSIIE) MRR2017 23 / 66

Page 27: Regularization Methods for Linear Regression

Outline

Ordinary Least Square : simple linear model

Theorem:
The OLS estimators have the following expressions:

β̂_1 = ȳ − β̂_2 x̄

β̂_2 = ∑_{i=1}^n (x_i − x̄)(y_i − ȳ) / ∑_{i=1}^n (x_i − x̄)^2 = Cov(x, y) / Var(x)

Proof: set the derivative of the objective function, which is convex, to zero.

Mathilde Mougeot (ENSIIE) MRR2017 24 / 66
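These closed-form estimates are easy to check numerically against lm(); a minimal R sketch with simulated data (variable names are illustrative):

# closed-form simple-regression estimates vs lm()
set.seed(1)
x <- runif(100, 5, 10)
y <- 3 + 2 * x + rnorm(100)
b2 <- cov(x, y) / var(x)        # slope = Cov(x, y) / Var(x)
b1 <- mean(y) - b2 * mean(x)    # intercept = ybar - b2 * xbar
c(b1, b2)
coef(lm(y ~ x))                 # same two values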

Page 28: Regularization Methods for Linear Regression

Outline

Ordinary Least Square : simple linear model

For the simple linear model, the correlation coefficient may be very useful :

• r(x, y): correlation coefficient / coefficient de corrélation linéaire

r(x, y) = cov(x, y) / (√var(x) √var(y))

• |r(x, y)| = 1 if and only if Y = aX + b, an exact linear relation between Y and X

R-square, used in multiple regression:

• R^2 = Var(Ŷ) / Var(Y)

• R^2 ∈ [0, 1]

• For simple regression, R^2 = r^2.

Mathilde Mougeot (ENSIIE) MRR2017 25 / 66

Page 31: Regularization Methods for Linear Regression

Outline

The value of a correlation coefficient may be non-informative

Illustration of two joint distributions with a correlation coefficient equal to 0.999

[Figure: two scatter plots of y versus x, each with n = 500; left panel: correlation 0.9998, right panel: correlation 0.9990]

→ The scatter plots show two very different distributions. Always look at the data!

Always study the data carefully.

Mathilde Mougeot (ENSIIE) MRR2017 26 / 66
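A classical illustration of this point is available directly in R with the built-in anscombe data set: four (x, y) pairs share almost the same correlation yet look completely different when plotted. A minimal sketch:

# nearly identical correlations, very different scatter plots (Anscombe's quartet)
data(anscombe)
sapply(1:4, function(k) cor(anscombe[[paste0("x", k)]], anscombe[[paste0("y", k)]]))
# all four are about 0.816; compare plot(anscombe$x1, anscombe$y1) with plot(anscombe$x4, anscombe$y4)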

Page 32: Regularization Methods for Linear Regression

OLS

Ordinary Least Square (OLS)
Multiple Linear Regression model

Page 33: Regularization Methods for Linear Regression

Outline

Ordinary Least Square : multiple linear model

• We suppose Y = ∑_{j=1}^p β_j X_j + ε and S = {(x_i, y_i), i = 1...n, y_i ∈ R, x_i ∈ R^p}

• The quadratic error is defined by:

E(β) = ∑_{i=1}^n ε_i^2 = ∑_{i=1}^n (y_i − ∑_j x_i^j β_j)^2

with matrix notation:

E(β) = (Y − Xβ)^T (Y − Xβ)

• Goal: minimize the error E(β) on the data set S, i.e. compute β̂ ∈ R^p:

β̂ = argmin_{β ∈ R^p} E(β)

Mathilde Mougeot (ENSIIE) MRR2017 28 / 66

Page 36: Regularization Methods for Linear Regression

Outline

Ordinary Least Square. multiple regression model

• We aim to compute β̂ which minimizes:

E(β) = ‖Y − Xβ‖_2^2 = (Y − Xβ)^T (Y − Xβ)

• Assumption: X^T X invertible (n ≥ p).

Theorem:

β̂_MCO = (X^T X)^{-1} X^T Y

Mathilde Mougeot (ENSIIE) MRR2017 29 / 66
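This matrix formula can be checked against lm() in a few lines of R; a minimal sketch with simulated data:

# beta_hat = (X^T X)^{-1} X^T Y, compared with lm()
set.seed(2)
n <- 100
X0 <- matrix(rnorm(n * 3), n, 3)               # three covariates
Y  <- 1 + X0 %*% c(2, -3, 0.5) + rnorm(n, sd = 0.1)
X  <- cbind(1, X0)                             # add the constant column
beta <- solve(t(X) %*% X, t(X) %*% Y)          # solves (X^T X) beta = X^T Y
drop(beta)
coef(lm(Y ~ X0))                               # same estimates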

Page 37: Regularization Methods for Linear Regression

Outline

MCO

• Estimation

β̂ = (X^T X)^{-1} X^T Y

• Prediction: knowing β̂ and given X_1, ..., X_p, the prediction of the target can be computed: Ŷ = ∑_j β̂_j X_j

Ŷ = X β̂ = X (X^T X)^{-1} X^T Y

• P: projection matrix onto the hyperplane spanned by the columns of X (hat matrix)
P = X (X^T X)^{-1} X^T, with P^2 = P

• Residuals: ε̂ = Y − Ŷ
• Remark: no assumption on the law or distribution of ε

Mathilde Mougeot (ENSIIE) MRR2017 30 / 66
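The hat matrix and these residual properties can be verified numerically; a minimal R sketch with simulated data:

# hat matrix P = X (X^T X)^{-1} X^T : projection, predictions, residuals
set.seed(3)
n <- 50
X <- cbind(1, matrix(rnorm(n * 2), n, 2))
Y <- X %*% c(1, 2, -1) + rnorm(n)
P <- X %*% solve(t(X) %*% X) %*% t(X)
max(abs(P %*% P - P))      # ~ 0 : P is idempotent (P^2 = P)
Yhat <- P %*% Y            # predictions
eps  <- Y - Yhat           # residuals
sum(eps)                   # ~ 0 because the model contains an intercept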

Page 41: Regularization Methods for Linear Regression

Outline

Ordinary Least Square. Geometrical interpretation

Y = ∑_{j=1}^p X_j β_j + ε

Y ∈ R^n; the fitted part Xβ̂ lies in the column space of X (dimension p), and the residual ε̂ lies in its orthogonal complement (dimension n − p).

Mathilde Mougeot (ENSIIE) MRR2017 31 / 66

Page 42: Regularization Methods for Linear Regression

Outline Properties

Ordinary Least Square. Properties

• Orthogonality:
- Ŷ ⊥ ε̂
- X_j ⊥ ε̂, ∀j ∈ [1...p]: <X_j, ε̂> = 0

• Residual average:
- ∑_i ε̂_i = 0 if there is an intercept in the model (X_1 = (1, 1, ..., 1)) → the average point belongs to the hyperplane
- the mean of Ŷ equals the mean of Y

• Analysis of Variance -ANOVA- (Pythagoras):
- var(Y) = var(Ŷ) + var(ε̂)

Mathilde Mougeot (ENSIIE) MRR2017 32 / 66

Page 45: Regularization Methods for Linear Regression

Outline Adjustment

Multiple Linear model : example with R

> head(mydata, 3)
      y   x1   x2   x3
1 -2.20 0.38 0.98 0.46
2 -1.75 0.11 0.62 0.37
3 -0.24 0.80 0.59 0.87
...
> modlm = lm(y ~ x1 + x2 + x3, data = mydata)
Call:
lm(formula = y ~ x1 + x2 + x3, data = mydata)
Coefficients:
(Intercept)       x1        x2        x3
    0.02754  1.98163  -3.03612   0.01903

Mathilde Mougeot (ENSIIE) MRR2017 33 / 66

Page 46: Regularization Methods for Linear Regression

Outline Adjustment

Multiple Linear model : example with R

> modlm = lm(y ~ x1 + x2 + x3, data = mydata)
> summary(modlm)
lm(formula = y ~ x1 + x2 + x3, data = mydata)
Residuals:
    Min      1Q   Median     3Q    Max
  -0.29  -0.075  -0.0035  0.073  0.281
Coefficients:
             Estimate Std. Error  t value Pr(>|t|)
(Intercept)   0.02754    0.01503    1.833   0.0674 .
x1            1.98163    0.01577  125.652   <2e-16 ***
x2           -3.03612    0.01621 -187.286   <2e-16 ***
x3            0.01903    0.01576    1.208   0.2277
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.1009 on 496 degrees of freedom
Multiple R-squared: 0.9904, Adjusted R-squared: 0.9904
F-statistic: 1.707e+04 on 3 and 496 DF, p-value: < 2.2e-16

Mathilde Mougeot (ENSIIE) MRR2017 34 / 66

Page 47: Regularization Methods for Linear Regression

Outline Adjustment

Ordinary Least Square, quality of the adjustment

• R^2, R-square (French: coefficient de détermination)

• R^2 = var(Ŷ) / var(Y); remark: no unit

• cos^2(ω) = R^2 = ‖Ŷ − Ȳ 1_n‖^2 / ‖Y − Ȳ 1_n‖^2
ω: angle between the centered vector (Y − Ȳ 1_n) and its centered prediction (Ŷ − Ȳ 1_n)

• var(ε̂) = (1/n) ∑_{i=1}^n (y_i − ŷ_i)^2 = (1 − R^2) var(Y), in units of Y^2

Mathilde Mougeot (ENSIIE) MRR2017 35 / 66

Page 50: Regularization Methods for Linear Regression

Outline Adjustment

Ordinary Least Square: Quality of the Adjustment
• R-square (for few variables)
  • R^2 = cos^2(ω) = Var(Ŷ) / Var(Y)
  • R^2 ∈ [0, 1]
  • R^2 increases mechanically with the number of variables

• Adjusted R-squared is sometimes preferred (penalization by the number of variables)
  • R^2_adj = 1 − (1 − R^2) (n − 1)/(n − p)
  • R^2_adj may be negative

• Residual study:
  • ε̂_i = y_i − ŷ_i, ∀i ∈ 1..n
  • Visualization of (ε̂_i, i), ∀i ∈ 1..n, and of (ε̂_i, ŷ_i): homoscedastic vs heteroscedastic

• Prediction: visualization of (ŷ_i, y_i), ∀i ∈ 1..n, comparison with the first bisector (the line y = x).

Mathilde Mougeot (ENSIIE) MRR2017 36 / 66
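The R^2 and adjusted R^2 formulas above can be reproduced by hand from an lm fit; a minimal sketch with simulated data:

# R^2 = Var(Yhat)/Var(Y) and adjusted R^2, recomputed by hand
set.seed(4)
n <- 100; p <- 4                              # p parameters, intercept included
x1 <- rnorm(n); x2 <- rnorm(n); x3 <- rnorm(n)
y  <- 1 + 2 * x1 - x2 + rnorm(n)
fit <- lm(y ~ x1 + x2 + x3)
R2    <- var(fitted(fit)) / var(y)
R2adj <- 1 - (1 - R2) * (n - 1) / (n - p)
c(R2, R2adj)
c(summary(fit)$r.squared, summary(fit)$adj.r.squared)   # same values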

Page 54: Regularization Methods for Linear Regression

Outline Adjustment

Ordinary Least Square: Quality of the Adjustment

Graphs of (y_i, ŷ_i), 1 ≤ i ≤ n: VERY USEFUL

[Figure: four scatter plots of y versus modlm$fit (fitted values), illustrating fits of different quality]

Mathilde Mougeot (ENSIIE) MRR2017 37 / 66

Page 55: Regularization Methods for Linear Regression

Outline Adjustment

Ordinary Least Square: Student residual graph
ε̂_i / SE = (y_i − ŷ_i) / SE (a unitless quantity)

[Figure: residual graph, studentized residuals E versus y]

→ Random distribution: there is no information left to capture.

Mathilde Mougeot (ENSIIE) MRR2017 38 / 66

Page 56: Regularization Methods for Linear Regression

Outline Adjustment

Ordinary Least Square: Student residual graph
ε̂_i / SE = (y_i − ŷ_i) / SE (a unitless quantity)

[Figure: residual graph, studentized residuals E versus observation index]

→ Large values for some points? Outlier detection?

Mathilde Mougeot (ENSIIE) MRR2017 39 / 66

Page 57: Regularization Methods for Linear Regression

Outline Adjustment

Ordinary Least Square: Student residual graph
ε̂_i / SE = (y_i − ŷ_i) / SE (a unitless quantity)

[Figure: residual graph, studentized residuals E2 as a function of y2]

→ There is still some information in the residuals.
→ The model needs to be changed.

Mathilde Mougeot (ENSIIE) MRR2017 40 / 66

Page 58: Regularization Methods for Linear Regression

Outline Adjustment

Ordinary Least Square : curse of dimension

Data set: {(y_i, x_i), 1 ≤ i ≤ n}. One target variable, one covariate.

[Figure: scatter plot of y versus x1; Y: N(0,1), X: N(0,1), n = 50, p = 1+1]

→ R^2 = 0.07, R^2_adj = −0.02

Mathilde Mougeot (ENSIIE) MRR2017 41 / 66

Page 59: Regularization Methods for Linear Regression

Outline Adjustment

Ordinary Least Square: illustration of the impact of the number of covariates on the model
Initial data and OLS line

[Figure: scatter plot of mco$fit versus y with the OLS line; Y: N(0,1), X: N(0,1), n = 50, p = 1+1]

→ R^2 = 0.07, R^2_adj = −0.02

Mathilde Mougeot (ENSIIE) MRR2017 42 / 66

Page 60: Regularization Methods for Linear Regression

Outline Adjustment

Ordinary Least Square: illustration of the impact of the number of covariates on the model

Initial data, with 48 additional N(0,1) covariates added to the initial data set.

[Figure: scatter plot of mco$fit versus y; Y: N(0,1), X: N(0,1), n = 50, p = 1+48]

→ R^2 = 0.99, R^2_adj = 0.93

Mathilde Mougeot (ENSIIE) MRR2017 43 / 66
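The inflation of R^2 shown on the last two slides is easy to reproduce; a minimal R sketch with simulated data (45 noise covariates are added here rather than 48, to keep a few residual degrees of freedom):

# R^2 increases mechanically when pure-noise covariates are added
set.seed(5)
n  <- 50
y  <- rnorm(n); x1 <- rnorm(n)             # Y: N(0,1), X: N(0,1), unrelated
summary(lm(y ~ x1))$r.squared              # small R^2
Xnoise <- matrix(rnorm(n * 45), n, 45)     # 45 extra N(0,1) covariates
m2 <- lm(y ~ x1 + Xnoise)
summary(m2)$r.squared                      # close to 1
summary(m2)$adj.r.squared                  # heavily penalized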

Page 61: Regularization Methods for Linear Regression

Outline Adjustment

Ordinary Least Square : need to change the model (1/2)

initial data set :

Mathilde Mougeot (ENSIIE) MRR2017 44 / 66

Page 62: Regularization Methods for Linear Regression

Outline Adjustment

Ordinary Least Square : need to change the model (1/2)

logarithmic transformation

Mathilde Mougeot (ENSIIE) MRR2017 45 / 66

Page 63: Regularization Methods for Linear Regression

Outline Adjustment

MCO Regression. Some limits:

If X^T X is not invertible:

• n >> p, collinearity between some X_j
  • Pseudo-inverse: the solution is not unique
  • Variable selection

• p >> n, when the number of variables is larger than the number of observations
  • Regularization methods
  • Ridge (L2), Lasso (L1)
  • Variable selection

OLS model: point estimation.

Mathilde Mougeot (ENSIIE) MRR2017 46 / 66

Page 64: Regularization Methods for Linear Regression

Outline Adjustment

OLS, X^T X non invertible → pseudo-inverse computation

Solution (n > p), X^T X non invertible with rank k, k < p:

X^T X = U Σ^2 U^T
      = U diag(σ_1^2, ..., σ_k^2, 0, ..., 0) U^T
      = U_k Σ_k^2 U_k^T

(X^T X)^{*-1} = U_k (Σ_k^2)^{-1} U_k^T, with Σ_k^2 = diag(σ_1^2, ..., σ_k^2)

β̂ = (X^T X)^{*-1} X^T Y
The solution is not unique.

Mathilde Mougeot (ENSIIE) MRR2017 47 / 66
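A minimal R sketch of this pseudo-inverse computation, on simulated data with an exactly collinear column (rank k < p):

# pseudo-inverse of X^T X through its SVD when the rank is deficient
set.seed(6)
n <- 30; x1 <- rnorm(n); x2 <- rnorm(n)
X <- cbind(1, x1, x2, x1 + x2)             # last column collinear: rank 3 < p = 4
Y <- 1 + x1 - x2 + rnorm(n, sd = 0.1)
s <- svd(t(X) %*% X)
k <- sum(s$d > 1e-8)                       # numerical rank k
Uk <- s$u[, 1:k]
XtX_pinv <- Uk %*% diag(1 / s$d[1:k]) %*% t(Uk)
beta <- XtX_pinv %*% t(X) %*% Y            # one solution among infinitely many
drop(beta)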

Page 65: Regularization Methods for Linear Regression

Linear Model

Outline

• Motivations

• Ordinary Least Square

• Linear Model

• Penalized regression, ridge, lasso

Mathilde Mougeot (ENSIIE) MRR2017 48 / 66

Page 66: Regularization Methods for Linear Regression

Linear Model

Probabilistic assumption on the residuals

The probabilistic assumption on the residuals allows us to introduce:
- statistical tests on the value of the coefficients (to study whether a variable is useful or not to explain the target),
- a test of whether the overall linear model is useful (all coefficients equal to zero).

Page 67: Regularization Methods for Linear Regression

Linear model

• We write: Y = Xβ + ε with ε ∼ N(0, σ^2)

• We have:

- ε_i = Y_i − ∑_j X_i^j β_j with density f(ε) = (1/(√(2π) σ)) e^{−ε^2 / (2σ^2)}, i.i.d.

• Residual density and Maximum Likelihood Estimation:

f(ε_1, ..., ε_n) = ∏_i f(ε_i)
                 = (1/((2π)^{n/2} σ^n)) e^{−∑_i ε_i^2 / (2σ^2)}
                 = (1/((2π)^{n/2} σ^n)) e^{−‖Y − Xβ‖^2 / (2σ^2)}

• The goal is to compute β, σ^2 as solutions of the Maximum Likelihood Estimation (MLE). Same solution for the MLE and the OLS:

- β̂ = (X^T X)^{-1} X^T Y

- σ̂^2 = (1/n) ∑_{i=1}^n ε̂_i^2
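A minimal R sketch contrasting the MLE estimate of σ^2 (division by n) with the unbiased estimate used by summary.lm (division by n − p), on simulated data:

# MLE sigma^2 (divide by n) vs unbiased sigma^2 (divide by n - p)
set.seed(10)
n <- 200; x <- rnorm(n)
y <- 2 + x + rnorm(n, sd = 0.5)
fit <- lm(y ~ x)
rss <- sum(residuals(fit)^2)
c(mle = rss / n, unbiased = rss / (n - 2), summary_lm = summary(fit)$sigma^2)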

Page 71: Regularization Methods for Linear Regression

Linear model: what are the laws of the estimators?

Y = Xβ + ε with ε ∼ N(0, σ^2)

Law of the estimators:
• Law of β̂?
• Law of Ŷ?
• Law of σ̂^2?

Benefits:
→ allows us to compute confidence intervals for β and Ŷ
→ allows us to test the parameters.

Page 72: Regularization Methods for Linear Regression

Law of β̂:

β̂ ∼ N(β, σ^2 (X^T X)^{-1})

Expectation and variance of β̂:

β̂ = (X^T X)^{-1} X^T Y and Y = Xβ + ε

• E(β̂) = β: unbiased estimator, E(β̂) − β = 0

• Var(β̂) = σ^2 (X^T X)^{-1}:
  Var(β̂) = (X^T X)^{-1} X^T [Var(Y)] X (X^T X)^{-1}
          = (X^T X)^{-1} X^T [Var(ε)] X (X^T X)^{-1}
          = σ^2 (X^T X)^{-1}

• E[(β̂ − β)^2] = Var(β̂) + 0

Recall: Var(AY) = A Var(Y) A^T

Page 76: Regularization Methods for Linear Regression

Law of Ŷ:

Ŷ ∼ N(Xβ, σ^2 X (X^T X)^{-1} X^T)

Expectation and variance of Ŷ = X β̂:

• E(Ŷ) = Xβ:
  E(Ŷ) = E(X β̂) = X E(β̂) = Xβ = E(Y)

• Var(Ŷ) = σ^2 X (X^T X)^{-1} X^T:
  Var(Ŷ) = Var(X β̂) = X Var(β̂) X^T = σ^2 X (X^T X)^{-1} X^T

Page 79: Regularization Methods for Linear Regression

Law of ε̂:

ε̂ ∼ N(0, σ^2 (I_n − X (X^T X)^{-1} X^T))

Expectation and variance of ε̂ = Y − Ŷ:

• E(ε̂) = 0

• Var(ε̂) = σ^2 (I_n − X (X^T X)^{-1} X^T):
  Var(ε̂) = Var(Y − Ŷ) = Var(Y − X β̂)
          = σ^2 I_n − X Var(β̂) X^T
          = σ^2 (I_n − X (X^T X)^{-1} X^T)

Recall: Var(AY) = A Var(Y) A^T

Page 82: Regularization Methods for Linear Regression

Linear model: law of the estimators

Under the assumption that the ε_i are i.i.d. with ε_i ∼ N(0, σ^2):

Theorem: if p ≤ n and X^T X is invertible,
the vector (β̂, ε̂), of dimension p + n, is a Gaussian vector
with mean (β, 0)
and variance σ^2 [ (X^T X)^{-1}   0 ;   0   I_n − X (X^T X)^{-1} X^T ] (block diagonal).

Page 83: Regularization Methods for Linear Regression

Law of σ̂^2:

((n − p)/σ^2) σ̂^2 ∼ χ^2_{n−p}

We define σ̂^2 = ‖ε̂‖^2 / (n − p), with ‖ε̂‖^2 = ∑_i ε̂_i^2.
‖ε̂‖^2 follows a σ^2 χ^2(n − p) law (Cochran's theorem).

Then the expectation of σ̂^2 = ‖ε̂‖^2 / (n − p) is σ^2 (since E(χ^2(n − p)) = n − p).

We deduce the law of σ̂^2: σ̂^2 ∼ (σ^2/(n − p)) χ^2(n − p)

Recall (Student's theorem): if U ∼ N(0, 1) and V ∼ χ^2(d), with U and V independent, then Z = U / √(V/d) follows a Student distribution with parameter d.

Page 86: Regularization Methods for Linear Regression

Significance test of β_j, σ^2 unknown

• Student statistic: T

• Significance test (two-sided)
  • H0: β_j = 0
  • H1: β_j ≠ 0

• Decision with risk α, reject H0 if
  • |β̂_j| / √(σ̂^2 S_{j,j}) > t_{n−p}(1 − α/2), with S_{j,j} the j-th term of the diagonal of (X^T X)^{-1}
  • equivalently, p-value < α

• Conclusion:
  • β_j is significantly different from zero
  • X_j has an influence in the model

Not true if there is collinearity between the variables.
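The Student test statistic can be recomputed by hand and compared with summary(lm); a minimal sketch with simulated data:

# t statistic and p-value for H0: beta_j = 0, computed by hand
set.seed(7)
n <- 100; x1 <- rnorm(n); x2 <- rnorm(n)
y <- 1 + 0.5 * x1 + rnorm(n)
X <- cbind(1, x1, x2); p <- ncol(X)
beta   <- solve(t(X) %*% X, t(X) %*% y)
eps    <- y - X %*% beta
sigma2 <- sum(eps^2) / (n - p)                 # unbiased estimate of sigma^2
S      <- solve(t(X) %*% X)
tstat  <- beta[2] / sqrt(sigma2 * S[2, 2])     # test on the x1 coefficient
pval   <- 2 * (1 - pt(abs(tstat), df = n - p))
c(tstat, pval)
summary(lm(y ~ x1 + x2))$coefficients["x1", ]  # same t value and p-value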

Page 91: Regularization Methods for Linear Regression

Global significance of the model

Test of the model with risk α:

H0: β_2 = β_3 = ... = β_p = 0
H1: ∃ j ∈ {2, ..., p}, β_j ≠ 0

Statistic:

F = ((n − p)/(p − 1)) ‖Ŷ − Ȳ‖^2 / ‖Y − Ŷ‖^2 ∼ Fisher(p − 1, n − p)

Remark: ((n − p)/(p − 1)) ‖Ŷ − Ȳ‖^2 / ‖Y − Ŷ‖^2 = (SSE/(p − 1)) / (SSR/(n − p)) (E: Estimated; R: Residual)

Decision rule:

• if F_obs > q_{F,α}, H0 is rejected, and there exists a coefficient which is not zero. The regression is "useful".

• if F_obs ≤ q_{F,α}, H0 is accepted, all the coefficients are assumed to be null. The regression is not "useful".

Page 94: Regularization Methods for Linear Regression

Global significance of the model

• Fisher statistic

• Significance test
  • H0: β_2 = ... = β_p = 0
  • H1: ∃ β_j ≠ 0

• Decision with risk α, reject H0 if
  • ((n − p)/(p − 1)) R^2/(1 − R^2) > f_{p−1, n−p}(1 − α)
  • or if the p-value < α

→ The linear model has added value.

Page 97: Regularization Methods for Linear Regression

Global significance of the model

Remark 1, on the Fisher statistic:

F = ((n − p)/(p − 1)) R^2/(1 − R^2)

The R^2 coefficient increases mechanically with the number of variables.

Remark 2: the adjusted R^2 may be used:

R^2_adj = 1 − (1 − R^2)(n − 1)/(n − p)

The adjusted R^2 does not increase with the number of variables.
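The global F statistic can be recovered from R^2 with the formula above; a minimal sketch with simulated data:

# F = (n - p)/(p - 1) * R^2 / (1 - R^2), compared with summary(lm)
set.seed(8)
n <- 100; x1 <- rnorm(n); x2 <- rnorm(n); x3 <- rnorm(n)
y <- 1 + x1 - 2 * x2 + rnorm(n)
fit <- lm(y ~ x1 + x2 + x3)
R2  <- summary(fit)$r.squared
p   <- 4                                   # coefficients, intercept included
(n - p) / (p - 1) * R2 / (1 - R2)
summary(fit)$fstatistic                    # same value, with df (p - 1, n - p)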

Page 99: Regularization Methods for Linear Regression

Boston Housing Data

The original data are n = 506 observations on p = 14 variables, medv being the target variable.

crim      per capita crime rate by town
zn        proportion of residential land zoned for lots over 25,000 sq.ft
indus     proportion of non-retail business acres per town
chas      Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
nox       nitric oxides concentration (parts per 10 million)
rm        average number of rooms per dwelling
age       proportion of owner-occupied units built prior to 1940
dis       weighted distances to five Boston employment centres
rad       index of accessibility to radial highways
tax       full-value property-tax rate per USD 10,000
ptratio   pupil-teacher ratio by town
b         1000(B − 0.63)^2 where B is the proportion of blacks by town
lstat     percentage of lower status of the population
medv      median value of owner-occupied homes in USD 1000's

Page 100: Regularization Methods for Linear Regression

Boston Housing Data

The data:
nr  crim     zn  indus  chas  nox    rm     age   dis     rad  tax  ptratio  b       lstat  medv
1   0.00632  18  2.31   0     0.538  6.575  65.2  4.0900  1    296  15.3     396.90  4.98   24.0
2   0.02731  0   7.07   0     0.469  6.421  78.9  4.9671  2    242  17.8     396.90  9.14   21.6
3   0.02729  0   7.07   0     0.469  7.185  61.1  4.9671  2    242  17.8     392.83  4.03   34.7
4   0.03237  0   2.18   0     0.458  6.998  45.8  6.0622  3    222  18.7     394.63  2.94   33.4
5   0.06905  0   2.18   0     0.458  7.147  54.2  6.0622  3    222  18.7     396.90  5.33   36.2
... ...      ... ...    ...   ...    ...    ...   ...     ...  ...  ...      ...     ...    ...

Page 101: Regularization Methods for Linear Regression

Boston Housing Data

MCO in R
library(mlbench)

# Data
data(BostonHousing)
tab = BostonHousing; names(tab)
target = "medv"; Y = tab[, target]; X = tab[, names(tab) != target]; names(X)

# MCO
resfit = lsfit(x = X, y = Y, intercept = T)
resfit$coef
hist(resfit$res)

Cst    crim   zn     indus  chas  nox     rm    age   dis    rad   tax    ptratio  b  lstat
36.45  -0.10  0.046  0.020  2.68  -17.76  3.80  0.00  -1.47  0.30  -0.01  -0.95  ...

Page 102: Regularization Methods for Linear Regression

Boston Housing Data

Linear Model in R
R code: reslm = lm(medv ~ ., data = tab); summary(reslm)
Results: n = 506, p = 14
Residuals:

Min 1Q Median 3Q Max

-15.595 -2.730 -0.518 1.777 26.199

Coefficients:
              Estimate   Std. Error   t value  Pr(>|t|)

(Intercept) 3.646e+01 5.103e+00 7.144 3.28e-12 ***

crim -1.080e-01 3.286e-02 -3.287 0.001087 **

zn 4.642e-02 1.373e-02 3.382 0.000778 ***

indus 2.056e-02 6.150e-02 0.334 0.738288

chas1 2.687e+00 8.616e-01 3.118 0.001925 **

nox -1.777e+01 3.820e+00 -4.651 4.25e-06 ***

rm 3.810e+00 4.179e-01 9.116 < 2e-16 ***

age 6.922e-04 1.321e-02 0.052 0.958229

dis -1.476e+00 1.995e-01 -7.398 6.01e-13 ***

rad 3.060e-01 6.635e-02 4.613 5.07e-06 ***

tax -1.233e-02 3.760e-03 -3.280 0.001112 **

ptratio -9.527e-01 1.308e-01 -7.283 1.31e-12 ***

b 9.312e-03 2.686e-03 3.467 0.000573 ***

lstat -5.248e-01 5.072e-02 -10.347 < 2e-16 ***

Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.745 on 492 degrees of freedom

Multiple R-squared: 0.7406, Adjusted R-squared: 0.7338

F-statistic: 108.1 on 13 and 492 DF, p-value: < 2.2e-16

Page 103: Regularization Methods for Linear Regression

Precautions

• Multicollinearity
The MCO (OLS) solution requires computing (X^T X)^{-1}. When the determinant of this matrix is very close to zero, the problem is ill-conditioned.

• Choice of variables
The coefficient of determination R^2 increases with the number of variables. If p = n, then R^2 = 1, which is not necessarily meaningful.
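Ill-conditioning can be checked before trusting the OLS solution; a minimal R sketch with two nearly collinear covariates:

# detecting an ill-conditioned X^T X (multicollinearity)
set.seed(9)
n  <- 100; x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 1e-4)             # almost collinear with x1
X  <- cbind(1, x1, x2)
kappa(t(X) %*% X)                          # very large condition number
det(t(X) %*% X)                            # determinant tiny relative to the matrix scale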

Page 104: Regularization Methods for Linear Regression

Demonstration in R