
Slide 1: Linear Methods for Regression

Lecture Notes for CMPUT 466/551

Nilanjan Ray

Slide 2: Assumption: Linear Regression Function

Model assumption: the output $Y$ is linear in the inputs $X = (X_1, X_2, X_3, \ldots, X_p)$.

Predict the output by:

$$\hat{Y} = \hat{\beta}_0 + \sum_{j=1}^{p} X_j \hat{\beta}_j = X^T \hat{\beta}$$

(vector notation, with the constant 1 included in $X$). In matrix-vector form, $y = X\beta$, where

$$y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{pmatrix}, \qquad X = \begin{pmatrix} 1 & x_{11} & \cdots & x_{1p} \\ 1 & x_{21} & \cdots & x_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_{N1} & \cdots & x_{Np} \end{pmatrix}$$

Also known as multiple regression when $p > 1$.

Slide 3: Least Squares Solution

Residual sum of squares:

$$\mathrm{RSS}(\beta) = \sum_{i=1}^{N} \big( y_i - f(x_i) \big)^2 = \sum_{i=1}^{N} \Big( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j \Big)^2$$

In matrix-vector notation:

$$\mathrm{RSS}(\beta) = (y - X\beta)^T (y - X\beta)$$

where $(y - X\beta)$ is the residual. Vector differentiation:

$$\frac{\partial \mathrm{RSS}}{\partial \beta} = -2 X^T (y - X\beta) = 0$$

Solution:

$$\hat{\beta} = (X^T X)^{-1} X^T y$$

Known as the least squares solution. For a new input $x_0 = (x_{01}, x_{02}, \ldots, x_{0p})$ (with the leading 1 included as before), the regression output is

$$\hat{Y}(x_0) = x_0^T \hat{\beta} = x_0^T (X^T X)^{-1} X^T y$$
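As a concrete illustration, here is a minimal NumPy sketch of this solution on synthetic data; the dimensions, variable names, and data are made up for illustration only:

```python
import numpy as np

# Synthetic data; a column of ones is included in X so beta[0] is the intercept.
rng = np.random.default_rng(0)
N, p = 100, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, p))])
beta_true = np.array([1.0, 2.0, -1.0, 0.5])
y = X @ beta_true + rng.normal(scale=0.3, size=N)

# beta_hat = (X^T X)^{-1} X^T y; solve() is preferred over an explicit inverse.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Regression output for a new input x0 (leading 1 included).
x0 = np.array([1.0, 0.2, -0.4, 1.1])
y0_hat = x0 @ beta_hat
print(beta_hat, y0_hat)
```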

Slide 4: Bias-Variance Decomposition

Model: $Y = f(X) + \varepsilon$, where $f(X) = X^T \beta$ and the noise $\varepsilon$ has zero expectation, the same variance $\sigma^2$ for every observation, and is uncorrelated across observations.

Estimator:

$$\hat{y}_0 = \hat{f}(x_0) = x_0^T (X^T X)^{-1} X^T y$$

Bias:

$$\begin{aligned}
f(x_0) - E[\hat{f}(x_0)] &= x_0^T \beta - E\big[x_0^T (X^T X)^{-1} X^T y\big] \\
&= x_0^T \beta - E\big[x_0^T (X^T X)^{-1} X^T (X\beta + \varepsilon)\big] \\
&= x_0^T \beta - x_0^T \beta - E\big[x_0^T (X^T X)^{-1} X^T \varepsilon\big] \\
&= 0
\end{aligned}$$

Unbiased estimator! Ex. Show the last step.

Variance:

$$\begin{aligned}
E\big[(\hat{f}(x_0) - E[\hat{f}(x_0)])^2\big] &= E\big[(\hat{f}(x_0) - f(x_0))^2\big] \\
&= E\big[(x_0^T (X^T X)^{-1} X^T y - x_0^T \beta)^2\big] \\
&= E\big[(x_0^T (X^T X)^{-1} X^T \varepsilon)^2\big] \\
&= \sigma^2 \, x_0^T (X^T X)^{-1} x_0 \;\approx\; \sigma^2 (p/N)
\end{aligned}$$

Decomposition of EPE (expected prediction error):

$$\begin{aligned}
\mathrm{EPE}(x_0) &= E\big[(y_0 - \hat{f}(x_0))^2\big] = \sigma^2 + E\big[(f(x_0) - \hat{f}(x_0))^2\big] \\
&= \sigma^2 + \big(f(x_0) - E[\hat{f}(x_0)]\big)^2 + E\big[(\hat{f}(x_0) - E[\hat{f}(x_0)])^2\big]
\end{aligned}$$

Irreducible error $= \sigma^2$; squared bias $= 0$; variance $= \sigma^2 (p/N)$ (the last value on average over inputs $x_0$ drawn like the training data).
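These claims are easy to check numerically. Below is a small Monte Carlo sketch (synthetic data, made-up dimensions) that holds the design $X$ fixed, redraws the noise many times, and compares the empirical bias and variance of the LS prediction at a fixed $x_0$ against the formulas above:

```python
import numpy as np

rng = np.random.default_rng(1)
N, p, sigma = 200, 5, 1.0
X = rng.normal(size=(N, p))            # no intercept, matching the sigma^2 p/N rule
beta = rng.normal(size=p)
x0 = rng.normal(size=p)

preds = []
for _ in range(5000):                  # fresh noise each time, fixed design X
    y = X @ beta + sigma * rng.normal(size=N)
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    preds.append(x0 @ beta_hat)
preds = np.asarray(preds)

print("bias     :", preds.mean() - x0 @ beta)                      # ~ 0
print("variance :", preds.var())
print("theory   :", sigma**2 * x0 @ np.linalg.solve(X.T @ X, x0))  # sigma^2 x0^T(X^TX)^{-1}x0
print("p/N rule :", sigma**2 * p / N)  # average of the line above over x0 ~ data dist.
```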

Slide 5: Gauss-Markov Theorem

Gauss-Markov Theorem: the least squares estimate has the minimum variance among all linear unbiased estimators.

Interpretation:

The estimator found by least squares is linear in $y$:

$$\hat{f}(x_0) = x_0^T \hat{\beta} = x_0^T (X^T X)^{-1} X^T y = c_0^T y$$

We have noticed that this estimator is unbiased, i.e., $E[\hat{f}(x_0)] = f(x_0)$.

If we find any other unbiased estimator $g(x_0)$ of $f(x_0)$ that is also linear in $y$, i.e., $g(x_0) = c^T y$ with $E[g(x_0)] = f(x_0)$, then

$$\mathrm{Var}[\hat{f}(x_0)] \le \mathrm{Var}[g(x_0)].$$

Question: Is the LS the best estimator for the given linear additive model?
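A small numerical illustration of the theorem (not a proof): any other linear unbiased estimator can be built by adding to $c_0$ a component $d$ orthogonal to the column space of $X$ (so $X^T d = 0$ preserves unbiasedness), and its variance is never smaller. The data and constants below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
N, p, sigma = 50, 3, 1.0
X = np.column_stack([np.ones(N), rng.normal(size=(N, p))])
x0 = np.array([1.0, 0.3, -0.2, 0.8])

# Least-squares weight vector c0 such that f_hat(x0) = c0 @ y.
c0 = X @ np.linalg.solve(X.T @ X, x0)

# Another linear unbiased estimator: add a component d with X.T @ d = 0.
d = rng.normal(size=N)
d -= X @ np.linalg.solve(X.T @ X, X.T @ d)   # project out the column space of X
c1 = c0 + 0.5 * d

# For uncorrelated noise, Var(c^T y) = sigma^2 * ||c||^2.
print("Var LS   :", sigma**2 * (c0 @ c0))
print("Var other:", sigma**2 * (c1 @ c1))    # always >= Var LS
```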

Slide 6: Subset Selection

• The LS solution often has large variance (recall that the variance is proportional to the number of inputs p, i.e., to the model complexity)

• If we decrease the number of input variables p, we can decrease the variance; however, we then sacrifice the zero bias

• If this trade-off decreases the test error, the solution can be accepted

• This reasoning leads to subset selection, i.e., selecting a subset of the p inputs for the regression computation (see the sketch after this list)

• Subset selection has another advantage: easy and focused interpretation of the influence of the input variables on the output
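As a concrete, brute-force illustration of the idea, the sketch below fits LS on every subset of the inputs and keeps the subset with the lowest error on a held-out validation set (a stand-in for test error). The helper names `fit_ls` and `best_subset` are hypothetical, and an exhaustive search like this fits $2^p$ models, so it is only practical for small p:

```python
from itertools import combinations
import numpy as np

def fit_ls(X, y):
    # Least squares with an intercept column prepended.
    A = np.column_stack([np.ones(len(X)), X])
    return np.linalg.lstsq(A, y, rcond=None)[0]

def best_subset(X_tr, y_tr, X_va, y_va):
    # Try every subset of the p inputs; keep the lowest validation error.
    p = X_tr.shape[1]
    best_err, best_S = np.inf, ()
    for k in range(p + 1):
        for S in combinations(range(p), k):
            S = list(S)
            b = fit_ls(X_tr[:, S], y_tr)
            pred = np.column_stack([np.ones(len(X_va)), X_va[:, S]]) @ b
            err = np.mean((y_va - pred) ** 2)
            if err < best_err:
                best_err, best_S = err, tuple(S)
    return best_S, best_err
```

Greedy alternatives (e.g., forward stepwise selection) trade this exhaustive search for a much cheaper approximation.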

Slide 7: Subset Selection...

The fitted model is

$$\hat{Y} = \hat{\beta}_0 + \sum_{j=1}^{p} X_j \hat{\beta}_j$$

Can we determine which $\beta_j$'s are insignificant?

Yes, we can, by statistical hypothesis testing!

However, we need a model assumption:

$$Y = \beta_0 + \sum_{j=1}^{p} X_j \beta_j + \varepsilon$$

where $\varepsilon$ is zero-mean Gaussian with standard deviation $\sigma$.

Slide 8: Subset Selection: Statistical Significance Test

The linear model with additive Gaussian noise has the following property:

$$\hat{\beta} \sim N\big(\beta, \, (X^T X)^{-1} \sigma^2\big)$$

Ex. Show this.

So we can form a standardized coefficient, or Z-score, test for each coefficient:

$$z_j = \frac{\hat{\beta}_j}{\hat{\sigma} \sqrt{v_j}}$$

where

$$\hat{\sigma}^2 = \frac{1}{N - p - 1} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2$$

and $v_j$ is the j-th diagonal element of $(X^T X)^{-1}$.

The hypothesis-testing principle says that a large Z-score should retain the coefficient, while a small one should discard it. How large or small depends on the significance level.
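A minimal sketch of computing these Z-scores on synthetic data (the names, dimensions, and coefficient values are made up):

```python
import numpy as np

rng = np.random.default_rng(2)
N, p = 100, 4
X = np.column_stack([np.ones(N), rng.normal(size=(N, p))])  # intercept + p inputs
beta = np.array([2.0, 1.0, 0.0, -0.5, 0.0])                 # two coefficients truly zero
y = X @ beta + rng.normal(size=N)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
resid = y - X @ beta_hat

# sigma_hat^2 = (1/(N - p - 1)) * sum_i (y_i - y_hat_i)^2
sigma_hat = np.sqrt(resid @ resid / (N - p - 1))

# z_j = beta_hat_j / (sigma_hat * sqrt(v_j)), v_j = j-th diagonal of (X^T X)^{-1}
z = beta_hat / (sigma_hat * np.sqrt(np.diag(XtX_inv)))
print(np.round(z, 2))   # |z| > ~2 flags a coefficient as significant at roughly 5%
```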

Slide 9: Case Study: Prostate Cancer

Output = log prostate-specific antigen

Inputs = (log cancer volume, log prostate weight, age, log of benign prostatic hyperplasia, seminal vesicle invasion, log of capsular penetration, Gleason score, percent of Gleason scores 4 or 5)

Goals: (1) predict the output given a novel input; (2) interpret the influence of the inputs on the output

Slide 10: Case Study...

[Figure: scatter-plot matrix of the prostate cancer variables]

From the scatter plots alone it is hard to interpret which inputs are most influential. We also want to find out how the inputs jointly influence the output.

Slide 11: Subset Selection on Prostate Cancer Data

Term        Coefficient   Std. Error   Z-score
Intercept        2.48        0.09       27.66
Lcavol           0.68        0.13        5.37
Lweight          0.30        0.11        2.75
Age             -0.14        0.10       -1.40
Lbph             0.21        0.10        2.06
Svi              0.31        0.12        2.47
Lcp             -0.29        0.15       -1.87
Gleason         -0.02        0.15       -0.15
Pgg45            0.27        0.15        1.74

Z-scores with magnitude greater than 2 indicate variables that are significant at roughly the 5% significance level.

Slide 12: Coefficient Shrinkage: Ridge Regression Method

$$\hat{\beta}^{\mathrm{ridge}} = \arg\min_{\beta} \Big\{ \sum_{i=1}^{N} \Big( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j \Big)^2 + \lambda \sum_{j=1}^{p} \beta_j^2 \Big\}$$

where $\lambda \ge 0$ is a non-negative penalty parameter. The solution is

$$\hat{\beta}^{\mathrm{ridge}} = (X^T X + \lambda I)^{-1} X^T y$$

One computational advantage is that the matrix $(X^T X + \lambda I)$ is always invertible (for $\lambda > 0$).

If the L2 norm in the penalty is replaced by the L1 norm, the corresponding regression is called the LASSO (see [HTF]).
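A minimal sketch of the ridge solution. As is common practice (though not shown on the slide), the intercept is left unpenalized here by centering $X$ and $y$ before solving:

```python
import numpy as np

def ridge(X, y, lam):
    # beta_ridge = (Xc^T Xc + lambda I)^{-1} Xc^T yc on centered data,
    # so the intercept is not penalized. Always solvable for lam > 0.
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    p = X.shape[1]
    return np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ yc)

rng = np.random.default_rng(3)
X = rng.normal(size=(50, 6))
y = X @ rng.normal(size=6) + rng.normal(size=50)
print(ridge(X, y, lam=1.0))
```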

Slide 13: Ridge Regression...

[Figure: profiles of the ridge coefficients as $\lambda$ decreases]

One way to determine $\lambda$ is cross-validation; we'll learn about it later.

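Although cross-validation is covered later in the course, here is a hedged sketch of how K-fold CV could be used to pick $\lambda$; the grid, data, and function names are illustrative assumptions, not part of the lecture:

```python
import numpy as np

def ridge_fit(X, y, lam):
    # Centered ridge solve, as in the previous sketch (intercept unpenalized).
    Xc, yc = X - X.mean(axis=0), y - y.mean()
    return np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ yc)

def cv_choose_lambda(X, y, lambdas, K=5, seed=0):
    # Average held-out squared error over K folds for each candidate lambda.
    folds = np.array_split(np.random.default_rng(seed).permutation(len(y)), K)
    cv_err = []
    for lam in lambdas:
        err = 0.0
        for va in folds:
            tr = np.setdiff1d(np.arange(len(y)), va)
            b = ridge_fit(X[tr], y[tr], lam)
            # Apply the training fold's centering to the held-out fold.
            pred = (X[va] - X[tr].mean(axis=0)) @ b + y[tr].mean()
            err += np.mean((y[va] - pred) ** 2)
        cv_err.append(err / K)
    return lambdas[int(np.argmin(cv_err))]

rng = np.random.default_rng(4)
X = rng.normal(size=(60, 8))
y = X @ rng.normal(size=8) + rng.normal(size=60)
print(cv_choose_lambda(X, y, lambdas=[0.01, 0.1, 1.0, 10.0]))
```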