partial least squares - ucd school of mathematics and...
TRANSCRIPT
12/9/2013
1
Partial Least Squares
A tutorial
Lutgarde Buydens
Partial Least Squares
• Multivariate regression
  • Multiple Linear Regression (MLR)
  • Principal Component Regression (PCR)
  • Partial Least Squares (PLS)
• Validation
• Preprocessing
Multivariate Regression
X (n × p) and Y (n × k)
Rows: cases, observations, … (analytical observations of different samples, experimental runs, persons, …)
Columns: variables, classes, tags
p: spectral variables, analytical measurements
k: class information, concentrations, …
X: independent variables (always available)
Y: dependent variables (to be predicted later from X)
[Figure: raw spectra, absorbance vs. wavenumber (cm⁻¹), 2000-14000]
Multivariate Regression
Y = f(X) : Predict Y from X
MLR: Multiple Linear Regression
PCR: Principal Component Regression
PLS: Partial Least Squares
From univariate to Multiple Linear Regression (MLR)
Least squares regression
y = b0 + b1x1 + ε
b0: intercept, b1: slope, ε: residual error
[Figure: least-squares line through the (x, y) points]
MLR: Multiple Linear Regression
Least squares regression: y = b0 + b1x1 + ε (b0: intercept, b1: slope)

Multiple Linear Regression: y = b0 + b1x1 + b2x2 + … + bpxp + ε

Least squares minimizes the residuals E = Y − Ŷ; equivalently, the fit maximizes r(y, ŷ).

[Figure: regression plane of y on x1 and x2, with residual ε]
MLR: Multiple Linear Regression
y= b0 +b1 x1 + b2x2 + … bpxp + ε
In matrix notation, with a leading column of ones in X so that b contains b0 (p + 1 coefficients in total):

y(n×1) = X(n×p) b(p×1) + e(n×1)

For k responses: Y(n×k) = X(n×p) B(p×k) + E(n×k)

Least-squares solution: b = (XᵀX)⁻¹Xᵀy
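The least-squares solution above can be sketched in a few lines of NumPy. This is a minimal illustration, not code from the tutorial; the helper names `mlr_fit` and `mlr_predict` and the toy data are made up:

```python
import numpy as np

def mlr_fit(X, y):
    """MLR by least squares: prepend a column of ones for the intercept b0,
    then solve the normal equations b = (X'X)^-1 X'y."""
    X1 = np.column_stack([np.ones(len(X)), X])   # n x (p+1)
    # solve() is numerically safer than forming the inverse explicitly
    return np.linalg.solve(X1.T @ X1, X1.T @ y)  # b[0] = intercept

def mlr_predict(X, b):
    return b[0] + X @ b[1:]

# toy data: 20 samples, 3 independent variables
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = 1.0 + X @ np.array([2.0, -1.0, 0.5]) + 0.01 * rng.normal(size=20)
b = mlr_fit(X, y)   # recovers roughly [1.0, 2.0, -1.0, 0.5]
```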
MLR: Multiple Linear Regression
Disadvantages (the inverse (XᵀX)⁻¹ must exist):
• Uncorrelated X-variables required
• n ≥ p + 1
[Figure: y regressed on x1 and x2 when r(x1, x2) → 1]
MLR: Multiple Linear Regression
Disadvantages (the inverse (XᵀX)⁻¹ must exist):
• Uncorrelated X-variables required
When r(x1, x2) ≈ 1, the regression fits a plane through a line!
MLR: Multiple Linear Regression
Disadvantages (the inverse (XᵀX)⁻¹ must exist):
• Uncorrelated X-variables required

Two almost identical data sets; Set B differs from Set A only in the third value of x2 (0.21 instead of 0.23):

      Set A            Set B
  x1      x2       x1      x2        y
 3.67    3.76     3.67    3.76    11.29
-2.87   -2.91    -2.87   -2.91    -8.09
 0.23    0.23     0.23    0.21     2.19
 5.49    5.55     5.49    5.55    19.09
 3.23    3.25     3.23    3.25    10.33
-1.01   -0.99    -1.01   -0.99    -1.89

Model: y = b1x1 + b2x2 + ε

         Set A             Set B
       b1      b2        b1      b2
MLR   10.3   -6.92      2.96    0.28
      R² = 0.98         R² = 0.98

A tiny change in the data produces completely different MLR coefficients.
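The instability in the table can be reproduced directly. The sketch below (illustrative only; `fit_no_intercept` is a made-up helper) fits y = b1x1 + b2x2 to both sets and shows that the near-collinearity of x1 and x2 makes the coefficients swing wildly:

```python
import numpy as np

# data from the Set A / Set B table: x1 and y are identical in both sets,
# x2 differs only in its third value (0.23 vs 0.21)
x1  = np.array([3.67, -2.87, 0.23, 5.49, 3.23, -1.01])
x2a = np.array([3.76, -2.91, 0.23, 5.55, 3.25, -0.99])
x2b = np.array([3.76, -2.91, 0.21, 5.55, 3.25, -0.99])
y   = np.array([11.29, -8.09, 2.19, 19.09, 10.33, -1.89])

def fit_no_intercept(x1, x2, y):
    """Least-squares fit of y = b1*x1 + b2*x2 (no intercept, as on the slide)."""
    X = np.column_stack([x1, x2])
    return np.linalg.lstsq(X, y, rcond=None)[0]

ba = fit_no_intercept(x1, x2a, y)   # Set A coefficients
bb = fit_no_intercept(x1, x2b, y)   # Set B coefficients
print("r(x1, x2) =", np.corrcoef(x1, x2a)[0, 1])  # very close to 1
print("Set A:", ba, " Set B:", bb)  # very different b1, b2
```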
MLR: Multiple Linear Regression
Disadvantages (the inverse (XᵀX)⁻¹ must exist):
• Uncorrelated X-variables required
• n ≥ p + 1
For wide data (X with p >> n), two remedies:
• Dimension reduction / variable selection
• Latent variables (PCR, PLS)
PCR: Principal Component Regression
Step 1: Perform PCA on the original X
Step 2: Use the orthogonal PC scores as independent variables in an MLR model
Step 3: Calculate the b-coefficients from the a-coefficients
[Diagram: X (n × p) → PCA → scores T (n × a) (Step 1); MLR of y on T gives a1 … aa (Step 2); back-transformation gives b0, b1 … bp (Step 3)]
PCR: Principal Component Regression
Dimension reduction: use scores (projections) on latent variables that explain maximal variance in X
[Figure: PC1 in the (x1, x2, …, xp) space]
PCR: Principal Component Regression
Step 0: Mean-center X
Step 1: Perform PCA: X = TPᵀ (MLR is performed on the reconstructed X* = (TPᵀ)*)
Step 2: Perform MLR on the scores: Y = TA, so A = (TᵀT)⁻¹TᵀY
Step 3: Calculate B: Y = X*B = (TPᵀ)B, hence A = PᵀB and B = (PPᵀ)⁻¹PA = PA (P has orthonormal columns)
Finally, calculate the intercepts: b0 = mean(y) − mean(ŷ)
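The three PCR steps translate almost line by line into NumPy. A minimal sketch, assuming mean-centering and a made-up helper name `pcr_fit` (not from any library):

```python
import numpy as np

def pcr_fit(X, y, n_pc):
    """PCR following the steps above: mean-center, PCA via SVD,
    MLR on the scores, then back-transform to b-coefficients."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean                                    # Step 0: mean-center
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)  # Step 1: X = T P'
    T = U[:, :n_pc] * s[:n_pc]                         # scores
    P = Vt[:n_pc].T                                    # loadings
    a = np.linalg.solve(T.T @ T, T.T @ (y - y_mean))   # Step 2: A = (T'T)^-1 T'y
    b = P @ a                                          # Step 3: B = P A
    b0 = y_mean - x_mean @ b                           # intercept from the means
    return b0, b

# toy check: with all PCs retained, PCR reproduces an exact linear relation
rng = np.random.default_rng(1)
X = rng.normal(size=(30, 10))
y = X @ rng.normal(size=10) + 5.0
b0, b = pcr_fit(X, y, n_pc=10)
```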
PCR: Principal Component Regression
Optimal number of PCs: calculate the cross-validation RMSE for different numbers of PCs:
RMSECV = √( Σᵢ (yᵢ − ŷᵢ)² / n )
PLS: Partial Least Squares Regression
[Diagram: Phase 1: X (n × p) → PLS → scores T (n × a); Phase 2: MLR of Y (n × k) on T gives a1 … aa; Phase 3: back-transformation gives b0, b1 … bp]
PLS: Partial Least Squares Regression
Projection to Latent Structures
[Figure: in the (x1, x2, …, xp) space, PCR projects onto PC1 while PLS projects onto LV1 (w)]
PCR uses PCs: maximize the variance in X
PLS uses LVs: maximize the covariance of (X, y); note cov²(X, y) = var(X) · var(y) · cor²(X, y)
PLS: Partial Least Squares Regression
Sequential algorithm: latent variables and their scores are calculated sequentially.
Phase 1: calculate new independent variables (T)
• Step 0: Mean-center X
• Step 1: Calculate w: LV1 = w1 maximizes the covariance of (X, Y), via SVD on XᵀY:
  (XᵀY)(p×k) = W(p×a) D(a×a) Zᵀ(a×k),  w1 = 1st column of W
[Figure: direction w1 in the (x1, …, xp) space]
PLS: Partial Least Squares Regression
Sequential algorithm: latent variables and their scores are calculated sequentially.
Phase 1: calculate new independent variables (T)
• Step 1: Calculate LV1 = w1, which maximizes the covariance of (X, Y), via SVD on XᵀY:
  (XᵀY)(p×k) = W(p×a) D(a×a) Zᵀ(a×k),  w1 = 1st column of W
• Step 2: Calculate t1, the scores (projections) of X on w1: t(n×1) = X(n×p) w(p×1)
[Figure: projection of the samples onto w]
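Phase 1 of the sequential algorithm (w1 from an SVD of XᵀY, then the scores t1 = Xw1) can be sketched as follows; `pls_first_lv` is a made-up name for this illustration:

```python
import numpy as np

def pls_first_lv(X, Y):
    """First PLS latent variable: mean-center, take w1 as the first
    column of W from (X'Y) = W D Z', then project: t1 = X w1."""
    Xc = X - X.mean(axis=0)                 # Step 0: mean-center
    Yc = np.atleast_2d(Y - np.mean(Y, axis=0))
    if Yc.shape[0] == 1:                    # accept a 1-D y as well
        Yc = Yc.T
    W, D, Zt = np.linalg.svd(Xc.T @ Yc, full_matrices=False)
    w1 = W[:, 0]                            # Step 1: maximizes cov(X, Y)
    t1 = Xc @ w1                            # Step 2: scores of X on w1
    return w1, t1

rng = np.random.default_rng(2)
X = rng.normal(size=(25, 6))
y = X[:, 0] - 2 * X[:, 1] + 0.1 * rng.normal(size=25)
w1, t1 = pls_first_lv(X, y)                 # t1 correlates strongly with y
```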
PLS: Partial Least Squares Regression
Optimal number of LVs: calculate the cross-validation RMSE for different numbers of LVs:
RMSECV = √( Σᵢ (yᵢ − ŷᵢ)² / n )
PLS: Partial Least Squares Regression
Same two data sets as before (Set B differs from Set A only in the third value of x2):

      Set A            Set B
  x1      x2       x1      x2        y
 3.67    3.76     3.67    3.76    11.29
-2.87   -2.91    -2.87   -2.91    -8.09
 0.23    0.23     0.23    0.21     2.19
 5.49    5.55     5.49    5.55    19.09
 3.23    3.25     3.23    3.25    10.33
-1.01   -0.99    -1.01   -0.99    -1.89

Model: y = b1x1 + b2x2 + ε

         Set A             Set B
       b1      b2        b1      b2
MLR   10.3   -6.92      2.96    0.28
PCR    1.60    1.62     1.60    1.62
PLS    1.60    1.62     1.60    1.62

Unlike MLR, PCR and PLS give stable coefficients for both sets.
MLR, PCR, PLS:
VALIDATION
Estimating prediction error.
Basic principle: test how well your model works with new data it has not seen yet!
Common measure for prediction error: RMSEP (the RMSE of prediction on such new data).
A Biased Approach
Prediction error of the samples the model was built on: this error is biased! The samples were also used to build the model, so the model is biased towards accurate prediction of these specific samples.

Validation: Basic Principle
Test how well your model works with new data it has not seen yet! Split the data into a training set and a test set. Several ways:
• One large test set
• Leave one out and repeat: LOO
• Leave n objects out and repeat: LNO
Apply the entire modelling procedure to the test set.
Validation
Full dataset → split into training set and test set.
Build the model on the training set (b0 … bp); predict y for the test set and compute RMSEP.
Remark: for the final model, use the whole data set.
Training and test sets
Split into a training and a test set:
• The test set should be representative of the training set
• A random choice is often the best
• Check for extremely unlucky divisions
• Apply the whole procedure to the test and validation sets
Cross-validation
• Simplest case: Leave-One-Out (LOO; segment = 1 sample). Normally 10-20% is left out (LnO).
• Remark: for the final model, use the whole data set.
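The LOO idea can be written down in a few lines. A sketch, not code from the tutorial; `rmsecv_loo` takes the model as a pair of made-up callbacks `fit`/`predict`:

```python
import numpy as np

def rmsecv_loo(X, y, fit, predict):
    """Leave-One-Out CV: leave each sample out once, rebuild the model
    on the rest, and collect the prediction residuals into an RMSECV."""
    n = len(y)
    resid = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i                 # training segment
        model = fit(X[keep], y[keep])
        resid[i] = y[i] - predict(X[i:i + 1], model)[0]
    return np.sqrt(np.mean(resid ** 2))          # sqrt(sum (y - yhat)^2 / n)

# illustration with ordinary least squares as the model
def ols_fit(X, y):
    X1 = np.column_stack([np.ones(len(X)), X])
    return np.linalg.lstsq(X1, y, rcond=None)[0]

def ols_predict(X, b):
    return b[0] + X @ b[1:]

rng = np.random.default_rng(3)
X = rng.normal(size=(40, 4))
y = X @ np.array([1.0, 0.5, -0.5, 2.0]) + 3.0   # exactly linear toy data
rmse = rmsecv_loo(X, y, ols_fit, ols_predict)   # close to zero here
```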
Cross-validation: an example
• The data
Cross-validation: an example
• Split the data into a training set and a validation set
• Build a model on the training set and predict the left-out samples
• Split the data again into a training set and a validation set, and repeat until all samples have been in the validation set once (common: Leave-One-Out, LOO)
Cross-validation: a warning
• Data: 13 × 5 = 65 NIR spectra (1102 wavelengths)
  - 13 samples: different compositions of NaOH, NaOCl and Na2CO3
  - 5 temperatures: each sample measured at 5 temperatures

Composition:
Sample   NaOH (wt%)   NaOCl (wt%)   Na2CO3 (wt%)   Temperature (°C)
1        18.99        0             0              15, 21, 27, 34, 40
2         9.15        9.99          0.15           15, 21, 27, 34, 40
3        15.01        0             4.01           15, 21, 27, 34, 40
4         9.34        5.96          3.97           15, 21, 27, 34, 40
…        …            …             …              …
13       16.02        2.01          1.00           15, 21, 27, 34, 40
Cross-validation: a warning
• The data: X is 65 × 1102, with a corresponding y
• Leave SAMPLE out: leave out all 5 spectra of the same sample together (13 segments), not individual spectra
Selection of number of LV’s
Through validation: choose the number of LVs that results in the model with the lowest prediction error.
The test set used to assess the final model cannot be used for this!
→ Divide the training set further (cross-validation).
Validation
Full dataset → training set + test' set + test set
1) Determine the number of LVs with the test' set
2) Build the model (b0 … bp); predict y for the test set and compute RMSEP
Remark: for the final model, use the whole data set.
Double Cross Validation
Full dataset → training set (cross-validated)
1) Determine the number of LVs: CV inner loop (CV2)
2) Build the model and estimate the error: CV outer loop (CV1), giving RMSEP
Remark: for the final model, use the whole data set.
Double cross-validation
• The data
• Split the data into a training set and a validation set
  (the validation set is used later to assess model performance!)
Double cross-validation
• Apply cross-validation on the rest: split the training set into a (new) training set and a test set
• Fit models with 1, 2, 3, … LVs on each inner split and select the number of LVs with the lowest RMSECV
Cross-validation: an example
• Repeat the procedure until all samples have been in the validation set once
Double cross-validation
• In this way:
  - The number of LVs is determined using samples not used to build the model
  - The prediction error is also determined using samples the model has not seen before
Remark: for the final model, use the whole data set.
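Double CV is just two nested CV loops. The sketch below uses a simple PCR model as the inner learner; all names (`double_cv`, `pcr_fit`, `pcr_predict`) are made up for this illustration:

```python
import numpy as np

def double_cv(X, y, fit, predict, max_lv, outer_k=5, inner_k=5):
    """Inner loop (CV2) picks the number of LVs with the lowest RMSECV;
    outer loop (CV1) estimates the error on samples never used for
    fitting or LV selection."""
    n = len(y)
    resid = []
    for test in np.array_split(np.arange(n), outer_k):       # CV1
        train = np.setdiff1d(np.arange(n), test)
        rmse = np.zeros(max_lv)
        for a in range(1, max_lv + 1):                       # CV2
            e = []
            for val in np.array_split(train, inner_k):
                tr = np.setdiff1d(train, val)
                e.extend(y[val] - predict(X[val], fit(X[tr], y[tr], a)))
            rmse[a - 1] = np.sqrt(np.mean(np.square(e)))
        best_a = int(np.argmin(rmse)) + 1                    # lowest RMSECV
        m = fit(X[train], y[train], best_a)
        resid.extend(y[test] - predict(X[test], m))
    return np.sqrt(np.mean(np.square(resid)))                # outer RMSEP

def pcr_fit(X, y, a):
    """Tiny PCR used as the inner model (a = number of components)."""
    xm, ym = X.mean(axis=0), y.mean()
    U, s, Vt = np.linalg.svd(X - xm, full_matrices=False)
    b = Vt[:a].T @ ((U[:, :a].T @ (y - ym)) / s[:a])
    return xm, ym, b

def pcr_predict(X, model):
    xm, ym, b = model
    return ym + (X - xm) @ b

rng = np.random.default_rng(5)
X = rng.normal(size=(40, 6))
y = X @ rng.normal(size=6) + 2.0                 # exactly linear toy data
rmsep = double_cv(X, y, pcr_fit, pcr_predict, max_lv=6)
```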
Raw + mean-centered data
[Figures: raw spectra (absorbance (a.u.) vs. wavenumber (cm⁻¹), 2000-14000) and the same spectra after mean-centering]
PLS: an example
RMSECV vs. number of LVs
[Figure: RMSECV values for the prediction of NaOH, for 1-10 LVs]
Regression coefficients
[Figures: raw spectra (absorbance (a.u.) vs. wavenumber (cm⁻¹), 3000-13000) and the corresponding regression coefficients]
True vs. predicted
[Figure: predicted vs. true NaOH values, 8-20 wt%]
[Figure: a spectrum (intensity (a.u.) vs. wavelength (cm⁻¹), 500-2500) showing the original spectrum with offset, slope and scatter effects]
Why Pre-Processing? Data artefacts:
• Baseline correction
• Alignment
• Scatter correction
• Noise removal
• Scaling, normalisation
• Transformation
• …
Other:
• Missing values
• Outliers
[Figures: a spectrum (intensity (a.u.) vs. wavelength (a.u.), 0-1600) shown as original, with offset, with offset + slope, with a multiplicative effect, and with offset + slope + multiplicative effect combined]
Pre-Processing Methods: 4914 combinations, all reasonable
STEP 1, BASELINE (7×): no baseline correction; detrending, polynomial order 2, 3 or 4 (3×); derivatisation, 1st or 2nd derivative (2×); AsLS
STEP 2, SCATTER (10×): no scatter correction; scaling by the mean, median, max or L2 norm (4×); SNV; RNV at 15, 25 or 35% (3×); MSC
STEP 3, NOISE (10×): no noise removal; S-G smoothing with window 5, 9 or 11 pt and order 2, 3 or 4 (9×)
STEP 4, SCALING & TRANSFORMATIONS (7×): mean-centering; autoscaling; range scaling; Pareto scaling; Poisson scaling; level scaling; log transformation
Supervised pre-processing methods: OSC or DOSC, combined with no noise removal and one of the 7 scalings (7 × 10 × 10 × 7 + 2 × 7 = 4914)
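Two of the simplest options from the table, mean-centering (STEP 4) and SNV (STEP 2), can be sketched directly; the function names and the simulated spectra are made up for this illustration:

```python
import numpy as np

def mean_center(X):
    """STEP 4 example: subtract the mean of each column (variable)."""
    return X - X.mean(axis=0)

def snv(X):
    """STEP 2 example, Standard Normal Variate: center and scale each
    spectrum (row) by its own mean and standard deviation."""
    return (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)

# simulate one spectrum measured with different offset and
# multiplicative (scatter-like) effects
base = np.sin(np.linspace(0, 3, 200))
X = np.array([a * base + o for a, o in [(1.0, 0.0), (1.5, 0.2), (0.7, -0.1)]])
Xs = snv(X)   # the three rows coincide again after SNV
```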
Pre-Processing Results
[Figure: classification accuracy (%) vs. complexity of the model (number of LVs) for the pre-processing combinations, with the raw-data result marked]
• Complexity of the model: number of LVs
• Classification accuracy of the raw data shown for reference
J. Engel et al., TrAC 2013
SOFTWARE
• PLS Toolbox (Eigenvector Inc.): www.eigenvector.com, for use in MATLAB (or standalone!)
• XLSTAT-PLS (XLSTAT): www.xlstat.com, for use in Microsoft Excel
• Package pls for R: free software, http://cran.r-project.org