PLS (Partial Least Squares): A Standard Tool for Multivariate Regression


Post on 20-Dec-2015


TRANSCRIPT

Page 1: PLS (Partial Least Squares): A Standard Tool for Multivariate Regression

Page 2:

PLS: Partial Least Squares

A Standard Tool for Multivariate Regression

Page 3:

Regression:

Modeling dependent variable(s) Y (e.g., chemical property, biological activity)

by predictor variables X (e.g., chemical composition, coded chemical structure)

Page 4:

Traditional method: MLR

Works if the X-variables are:

- Few (# X-variables < # samples)

- Uncorrelated (full-rank X)

- Noise free (when some correlation exists)

Page 5:

But!

Instruments: spectrometers, chromatographs, sensor arrays

Data are: numerous, correlated, noisy, incomplete

Page 6:

X: "Independent" variables

Correlated, so better called Predictor variables

Page 7:

PLSR models:

1. The relation between two matrices X and Y, by a linear multivariate regression

2. The structure of both X and Y

Richer results than MLR

Page 8:

PLSR is a generalization of MLR

PLSR is able to analyze data with:

- Noise

- Collinearity (highly correlated data)

- Numerous X-variables (> # samples)

- Incompleteness in both X and Y

Page 9:

History

Herman Wold (1975):

Modeling of chain matrices by Nonlinear Iterative PArtial Least Squares (NIPALS)

Regression between:

- a variable matrix

- a parameter vector

Other parameter vector: fixed

Page 10:

Svante Wold & H. Martens (1980):

Completion and modification of two-block (X, Y) PLS (the simplest)

Herman Wold (~2000):

Projection to Latent Structures, as a more descriptive interpretation

Page 11:

A QSPR example:

One Y-variable, a chemical property: the free energy of unfolding of a protein

Seven X-variables (highly correlated): quantitative description of variation in chemical structure

19 different amino acids in position 49 of the protein

Page 12:

Data (X: PIE, PIF, DGR, SAC, MR, Lam, Vol; Y: DDGTS)

No.    PIE    PIF    DGR     SAC     MR    Lam    Vol   DDGTS
 1    0.23   0.31  -0.55   254.2  2.126  -0.02   82.2    8.5
 2   -0.48  -0.60   0.51   303.6  2.994  -1.24  112.3    8.2
 3   -0.61  -0.77   1.20   287.9  2.994  -1.08  103.7    8.5
 4    0.45   1.54  -1.40   282.9  2.933  -0.11   99.1   11.0
 5   -0.11  -0.22   0.29   335.0  3.458  -1.19  127.5    6.3
 6   -0.51  -0.64   0.76   311.6  3.243  -1.43  120.5    8.8
 7    0.00   0.00   0.00   224.9  1.662   0.03   65.0    7.1
 8    0.15   0.13  -0.25   337.2  3.856  -1.06  140.6   10.1
 9    1.20   1.80  -2.10   322.6  3.350   0.04  131.7   16.8
10    1.28   1.70  -2.00   324.0  3.518   0.12  131.5   15.0
11   -0.77  -0.99   0.78   336.6  2.933  -2.26  144.3    7.9
12    0.90   1.23  -1.60   336.3  3.860  -0.33  132.3   13.3
13    1.56   1.79  -2.60   366.1  4.638  -0.05  155.8   11.2
14    0.38   0.49  -1.50   288.5  2.876  -0.31  106.7    8.2
15    0.00  -0.04   0.09   266.7  2.279  -0.40   88.5    7.4
16    0.17   0.26  -0.58   283.9  2.743  -0.53  105.3    8.8
17    1.85   2.25  -2.70   401.8  5.755  -0.31  185.9    9.9
18    0.89   0.96  -1.70   377.8  4.791  -0.84  162.7    8.8
19    0.71   1.22  -1.60   295.1  3.054  -0.13  115.6   12.0

Page 13:

Transformation: gives a more symmetrical distribution

Raw values: 12.5, 4235, 0.2, 546, 100584

After log: 1.097, 3.627, -0.699, 2.737, 5.002
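The log transformation on this slide can be reproduced in a few lines. This is a minimal sketch that assumes the slide uses the base-10 logarithm (the listed values are consistent with that); the variable names are illustrative only.

```python
import numpy as np

# Raw values from the slide; they span several orders of magnitude.
raw = np.array([12.5, 4235.0, 0.2, 546.0, 100584.0])

# Assumption: "log" on the slide means log10, which compresses the range
# and makes the distribution more symmetrical.
logged = np.log10(raw)

print(np.round(logged, 3))
```

The transformed values agree with the slide to within about 0.001.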

Page 14:

Scaling: more weight for more informative X-variables

No knowledge about the importance of variables: Auto Scaling

1. Scale to unit variance (x_i / SD).

2. Centering (x_i - x_aver).

Same weights for all X-variables
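A minimal sketch of auto scaling, assuming the sample standard deviation (ddof=1) is meant by SD; the helper name `autoscale` is hypothetical. The example uses the first four rows of the SAC and MR columns of the amino-acid data, two variables with very different numerical ranges.

```python
import numpy as np

def autoscale(X):
    """Center each column and scale it to unit variance."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    sd = X.std(axis=0, ddof=1)   # assumption: sample standard deviation
    return (X - mean) / sd, mean, sd

# First four rows of the SAC and MR columns from the amino-acid data.
X = np.array([[254.2, 2.126],
              [303.6, 2.994],
              [287.9, 2.994],
              [282.9, 2.933]])

Xs, mean, sd = autoscale(X)
print(Xs.mean(axis=0))          # approximately zero for every column
print(Xs.std(axis=0, ddof=1))   # one for every column
```

Returning `mean` and `sd` alongside the scaled matrix makes it possible to apply the same scaling to new samples later.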

Page 15:

Auto Scaling: numerically more stable

Page 16:

Base of the PLSR model (usually linear)

A few "new" variables: X-scores t_a (a = 1, 2, …, A)

- Orthogonal

- Linear combinations of the X-variables: T = X W* (W*: weights)

- Modelers of X, predictors of Y

Page 17:

T (X-scores), t_a (a = 1, 2, …, A), are:

Modelers of X: X = T P’ + E (P: loadings)

Predictors of Y: Y = T Q’ + F

Y = X W* Q’ + F = X B + F, with B = W* Q’ the PLS-regression coefficients

Page 18:

Estimation of T:

By stepwise subtraction of each component (t_a p_a’) from X

X = T P’ + E

X - T P’ = E

X_{a-1} - t_a p_a’ = X_a

X_a: residual after subtraction of the a-th component

Page 19:

X_0 = t_1 p_1’ + t_2 p_2’ + t_3 p_3’ + t_4 p_4’ + … + t_a p_a’ + E

Residuals after each successive subtraction: X_1, X_2, X_3, …, X_{a-1}, X_a

Page 20:

Stepwise "deflation" of the X-matrix

t_1 = X_0 w_1,  X_1 = X_0 - t_1 p_1’

t_2 = X_1 w_2,  X_2 = X_1 - t_2 p_2’

t_3 = X_2 w_3,  …,  X_{a-1} = X_{a-2} - t_{a-1} p_{a-1}’

t_a = X_{a-1} w_a,  X_a = X_{a-1} - t_a p_a’ = E
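The deflation scheme can be sketched as a short loop. This is an illustration under stated assumptions, not the deck's own code: it uses a single y-variable, so each weight vector can be taken directly as the normalized X’y, and the data are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
X0 = rng.normal(size=(10, 4))       # placeholder data, not from the slides
y = rng.normal(size=10)
X0 -= X0.mean(axis=0)               # centering
y -= y.mean()

A = 3                               # number of components to extract
X = X0.copy()
T, P = [], []
for a in range(A):
    w = X.T @ y                     # weights (single-y shortcut, an assumption)
    w /= np.linalg.norm(w)
    t = X @ w                       # t_a = X_{a-1} w_a
    p = X.T @ t / (t @ t)           # p_a = X_{a-1}' t_a / (t_a' t_a)
    X = X - np.outer(t, p)          # X_a = X_{a-1} - t_a p_a'
    T.append(t)
    P.append(p)

T = np.column_stack(T)
P = np.column_stack(P)
E = X                               # residual after A components

# X0 is recovered as T P' + E, and the scores are mutually orthogonal.
print(np.allclose(X0, T @ P.T + E))
print(np.round(T.T @ T, 8))
```

Because each deflated matrix is orthogonal to the scores already removed, the off-diagonal entries of T’T come out as zero.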

Page 21:

Geometrical interpretation

The t’s are modelers of X and predictors of Y

Page 22:

Multivariable Y: PLS-1 or PLS-2?

PLS-1: one y at a time

PLS-2: all in a single model

PCA of Y gives the rank of Y (# PCs)

If # PCs << # Y variables: one PLS-2 model

If # PCs is close to # Y variables: PLS-1 models
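A rough way to estimate the rank of Y is to count how many principal components are needed to explain most of its variance. The sketch below uses synthetic data (five Y-variables built from two hypothetical latent factors) and an arbitrary 95 % threshold; both are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
factors = rng.normal(size=(50, 2))               # two hypothetical latent factors
loadings = np.array([[1.0, 1.0, 1.0, 0.0, 0.0],
                     [0.0, 0.0, 1.0, 1.0, 1.0]])
Y = factors @ loadings + 0.01 * rng.normal(size=(50, 5))

# Autoscale Y, then obtain the PCA variances from the singular values.
Yc = (Y - Y.mean(axis=0)) / Y.std(axis=0, ddof=1)
s = np.linalg.svd(Yc, compute_uv=False)
explained = s**2 / np.sum(s**2)
cumulative = np.cumsum(explained)

# Number of PCs needed to reach the (assumed) 95 % threshold.
n_pcs = int(np.searchsorted(cumulative, 0.95) + 1)
print(n_pcs, "PCs for", Y.shape[1], "Y-variables")
```

Here the number of PCs is well below the number of Y-variables, which is the situation where a single PLS-2 model is suggested.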

Page 23:

Number of PLS components!

Too few: underfitting

Too many: overfitting

If proper: GOOD prediction ability

Page 24:

Cross validation:

PRESS: Predictive REsidual Sum of Squares

[Diagram: X and Y each split into a calibration part (Calibr.) and a prediction part (Pred.)]

Page 25:

Different # components in the model give different PRESS values

The model with the proper # components is the model with the minimum PRESS value
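The selection rule can be sketched with leave-one-out cross validation. Everything here is an assumption made for illustration: a single y-variable, centering only (no scaling), leave-one-out splits, synthetic data, and the hypothetical helper names `pls1_coefficients` and `press_loo`.

```python
import numpy as np

def pls1_coefficients(X, y, n_components):
    """PLS regression vector for centered X and a single centered y."""
    Xa, W, P, q = X.copy(), [], [], []
    for _ in range(n_components):
        w = Xa.T @ y
        w /= np.linalg.norm(w)
        t = Xa @ w
        tt = t @ t
        p = Xa.T @ t / tt
        q.append(y @ t / tt)
        Xa = Xa - np.outer(t, p)            # deflate X
        W.append(w)
        P.append(p)
    W, P, q = np.column_stack(W), np.column_stack(P), np.array(q)
    return W @ np.linalg.solve(P.T @ W, q)  # b = W (P'W)^-1 q

def press_loo(X, y, max_components):
    """PRESS for 1..max_components, leaving one sample out at a time."""
    n = X.shape[0]
    press = np.zeros(max_components)
    for i in range(n):
        keep = np.arange(n) != i
        x_mean, y_mean = X[keep].mean(axis=0), y[keep].mean()
        Xc, yc = X[keep] - x_mean, y[keep] - y_mean
        for a in range(1, max_components + 1):
            b = pls1_coefficients(Xc, yc, a)
            y_hat = (X[i] - x_mean) @ b + y_mean
            press[a - 1] += (y[i] - y_hat) ** 2
    return press

# Synthetic, collinear data driven by two latent factors.
rng = np.random.default_rng(1)
latent = rng.normal(size=(20, 2))
X = latent @ rng.normal(size=(2, 5)) + 0.1 * rng.normal(size=(20, 5))
y = latent[:, 0] - 0.5 * latent[:, 1] + 0.1 * rng.normal(size=20)

press = press_loo(X, y, max_components=4)
best = int(np.argmin(press)) + 1
print(press, "-> proper # components:", best)
```

Recomputing the means inside each split keeps the left-out sample from influencing the calibration model.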

Page 26:

PLS algorithm

Nonlinear Iterative PArtial Least Squares (NIPALS): common and simple

Initially: transformation, scaling and centering of X and Y

Page 27:

Base:

X = T P’ + E

Y = U Q’ + F = T Q’ + F

Utilizing the X-model (up to normalization):

T = X P

P = X’ T

Page 28:

For a = 1 to A

Having: X, which is (X_0, or X_1, …, or X_{a-1}), and Y, which is (Y_0, or Y_1, …, or Y_{a-1})

1. Autoscaled

2. Not deflated

A. Getting u (temporary Y-score): one of the Y columns, for use as an X-score

Page 29:

B. Calculating w_a (X-weights; temporary X-loadings)

X_{a-1} = u_a w_a’ + E

w_a = X_{a-1}’ u_a / (u_a’ u_a)

Make w_a’ w_a = 1

Page 30:

C. Calculating t_a (X-scores):

X_{a-1} = t_a w_a’ + E

t_a = X_{a-1} w_a

Scores for both X and Y

Page 31:

D. Calculating p_a (X-loading) and q_a (Y-loading)

X_{a-1} = t_a p_a’ + E

Y_{a-1} = t_a q_a’ + F

p_a = X_{a-1}’ t_a / (t_a’ t_a)

q_a = Y_{a-1}’ t_a / (t_a’ t_a)

Page 32:

E. Testing the suitability of u_a, by calculating t_a again:

(u_a)new = Y_{a-1} q_a / (q_a’ q_a)

w_a = X_{a-1}’ u_a / (u_a’ u_a), using (u_a)new as u_a

(t_a)new = X_{a-1} w_a

Performing a convergence test on it:

||(t_a)new - t_a|| / ||(t_a)new|| < 10^-7

Page 33:

F. If no convergence: go to B, using (u_a)new

G. If convergence: calculate the new X and Y for the next cycle

X_a = X_{a-1} - t_a p_a’

Y_a = Y_{a-1} - t_a q_a’

Next a (or: a = a + 1 and go to B)
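Steps A to G can be put together as one function. This is a sketch under stated assumptions rather than a reference implementation: X and Y are assumed to be already centered (and scaled if desired), the starting u is taken as the Y column with the largest variance (the slides only say "one of the Y columns"), and the name `nipals_pls` plus the `max_iter` safeguard are additions for illustration.

```python
import numpy as np

def nipals_pls(X, Y, n_components, tol=1e-7, max_iter=500):
    """NIPALS PLS for centered X (n x k) and Y (n x m)."""
    Xa, Ya = X.astype(float).copy(), Y.astype(float).copy()
    W, T, P, Q = [], [], [], []
    for _ in range(n_components):
        # A: temporary Y-score, here the Y column with the largest variance
        u = Ya[:, np.argmax(Ya.var(axis=0))].copy()
        t_old = None
        for _ in range(max_iter):
            w = Xa.T @ u / (u @ u)          # B: X-weights
            w /= np.linalg.norm(w)          #    make w'w = 1
            t = Xa @ w                      # C: X-scores
            q = Ya.T @ t / (t @ t)          # D: Y-loadings
            u = Ya @ q / (q @ q)            # E: updated Y-scores
            if t_old is not None and \
               np.linalg.norm(t - t_old) / np.linalg.norm(t) < tol:
                break                       # converged
            t_old = t                       # F: iterate again with the new u
        p = Xa.T @ t / (t @ t)              # D: X-loadings
        Xa = Xa - np.outer(t, p)            # G: deflate X
        Ya = Ya - np.outer(t, q)            #    and Y
        W.append(w); T.append(t); P.append(p); Q.append(q)
    return tuple(np.column_stack(M) for M in (W, T, P, Q)) + (Xa, Ya)

# Placeholder data: 15 samples, 6 X-variables, 2 Y-variables.
rng = np.random.default_rng(2)
X0 = rng.normal(size=(15, 6))
Y0 = X0[:, :2] @ rng.normal(size=(2, 2)) + 0.1 * rng.normal(size=(15, 2))
X0 -= X0.mean(axis=0)
Y0 -= Y0.mean(axis=0)

W, T, P, Q, E, F = nipals_pls(X0, Y0, n_components=3)
print(np.allclose(X0, T @ P.T + E), np.allclose(Y0, T @ Q.T + F))
```

The X-loadings are computed once after convergence, since only t is needed inside the inner loop; the final residual matrices are returned as E and F.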

Page 34:

H. Last step (when a = A)

PLS-regression coefficients (B):

B = W (P’ W)^-1 Q’

Y = X B + B_0
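The last step can be checked numerically. In this sketch the weights are obtained as the dominant left singular vector of X’Y for each deflated pair, a shortcut that replaces the NIPALS inner iteration and gives the same direction at convergence; that shortcut, the helper name `pls_fit`, the random data, and the intercept B_0 = mean(Y) - mean(X) B for centered-only data are all assumptions for illustration.

```python
import numpy as np

def pls_fit(X, Y, n_components):
    """Return W, T, P, Q for centered X and Y (SVD shortcut for the weights)."""
    Xa, Ya = X.copy(), Y.copy()
    W, T, P, Q = [], [], [], []
    for _ in range(n_components):
        w = np.linalg.svd(Xa.T @ Ya, full_matrices=False)[0][:, 0]
        t = Xa @ w
        tt = t @ t
        p, q = Xa.T @ t / tt, Ya.T @ t / tt
        Xa, Ya = Xa - np.outer(t, p), Ya - np.outer(t, q)
        W.append(w); T.append(t); P.append(p); Q.append(q)
    return [np.column_stack(M) for M in (W, T, P, Q)]

rng = np.random.default_rng(3)
X_raw = rng.normal(size=(12, 5))
Y_raw = X_raw @ rng.normal(size=(5, 2)) + 0.1 * rng.normal(size=(12, 2))
x_mean, y_mean = X_raw.mean(axis=0), Y_raw.mean(axis=0)
X, Y = X_raw - x_mean, Y_raw - y_mean

W, T, P, Q = pls_fit(X, Y, n_components=3)

W_star = W @ np.linalg.inv(P.T @ W)   # W* = W (P'W)^-1, so that T = X W*
B = W_star @ Q.T                      # B = W (P'W)^-1 Q'
B0 = y_mean - x_mean @ B              # intercept in the original units
Y_hat = X_raw @ B + B0                # Y = X B + B0

print(np.allclose(X @ W_star, T))
print(np.allclose(Y_hat, T @ Q.T + y_mean))
```

Both checks confirm that predicting through B is the same as predicting through the scores T and the Y-loadings Q.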

Page 35:

Summary

From X_0 and Y_0:

Scores: t_1, u_1

Loadings: w_1, p_1, q_1

X_1 = X_0 - t_1 p_1’

Y_1 = Y_0 - t_1 q_1’

Page 36:

From X_1 and Y_1:

Scores: t_2, u_2

Loadings: w_2, p_2, q_2

X_2 = X_1 - t_2 p_2’

Y_2 = Y_1 - t_2 q_2’

Page 37:

From X_{a-1} and Y_{a-1}:

Scores: t_a, u_a

Loadings: w_a, p_a, q_a

X_a = X_{a-1} - t_a p_a’ = E

Y_a = Y_{a-1} - t_a q_a’ = F

Page 38:

From X and Y: T, U, W, P, Q

+ A, E, and F