
Regression / Calibration

MLR, RR, PCR, PLS

Paul Geladi

Head of Research, NIRCE
Unit of Biomass Technology and Chemistry
Swedish University of Agricultural Sciences, Umeå
Technobothnia, Vasa
paul.geladi@btk.slu.se
paul.geladi@syh.fi

Univariate regression

[Figure: y plotted against x, with a straight-line fit showing offset a and slope b]

y = a + bx + e   (e: residual error)

[Figure: four y-against-x panels: linear fit, underfit, overfit, quadratic fit]
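A minimal numpy sketch of the univariate fit (simulated data, with a, b and the noise level chosen only for illustration), including how the polynomial degree drives under- and overfitting:

    import numpy as np

    rng = np.random.default_rng(0)
    x = np.linspace(0, 10, 30)
    y = 2.0 + 0.5 * x + rng.normal(scale=0.3, size=x.size)  # y = a + bx + e

    b, a = np.polyfit(x, y, 1)      # straight line: slope b, offset a
    quad = np.polyfit(x, y, 2)      # quadratic fit
    wiggly = np.polyfit(x, y, 9)    # too many terms: starts fitting the noise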

Multivariate linear regression

y = f(x)

Works sometimes, but only for a few variables:

Measurement noise!

∞ possible functions

[Figure: data matrix X (I samples × K variables) and response vector y (I × 1)]

y = f(x)

Simplified by:

y = b0 + b1x1 + b2x2 + ... + bKxK + f

Linear approximation

y = b0 + b1x1 + b2x2 + ... + bKxK + f

y : response
xk : predictors
bk : regression coefficients
b0 : offset, constant
f : residual

Nomenclature

[Figure: X (I × K), y (I × 1)]

X, y mean-centered, so b0 drops out:

y = b1x1 + b2x2 + ... + bKxK + f

(one such equation for each of the I samples)

[Figure: in matrix form, y (I × 1) equals X (I × K) times b (K × 1) plus f (I × 1)]

y = Xb + f

X, y: known, measurable
b, f: unknown

No solution

f must be constrained

The MLR solution

Multiple Linear Regression

Ordinary Least Squares (OLS)

b = (X'X)^-1 X'y
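A minimal numpy sketch of the OLS estimate (shapes and data simulated for illustration; in practice np.linalg.lstsq is numerically safer than forming the inverse explicitly):

    import numpy as np

    rng = np.random.default_rng(1)
    I, K = 50, 5                            # I samples, K predictors, I >= K
    X = rng.normal(size=(I, K))             # mean-centered by construction
    y = X @ rng.normal(size=K) + rng.normal(scale=0.1, size=I)

    b = np.linalg.inv(X.T @ X) @ X.T @ y    # b = (X'X)^-1 X'y
    f = y - X @ b                           # residual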

Problems?

Least squares

3b1 + 4b2 = 1
4b1 + 5b2 = 0

One solution

3b1 + 4b2 = 1
4b1 + 5b2 = 0
b1 + b2 = 4

No solution

3b1 + 4b2 + b3 = 1
4b1 + 5b2 + b3 = 0

∞ solutions

b = (X'X)^-1 X'y

- K > I: ∞ solutions
- I > K: no solution
- error in X
- error in y
- inverse may not exist
- inverse may be unstable

3b1 + 4b2 + e = 1
4b1 + 5b2 + e = 0
b1 + b2 + e = 4

Solution

Wanted solution

- I ≥ K
- No inverse
- No noise in X

Diagnostics

y = Xb + f

SStot = SSmod + SSres

R2 = SSmod / SStot = 1 - SSres / SStot

Coefficient of determination

Diagnostics

y = Xb + f

SSres = f'f

RMSEC = [ SSres / (I - A) ]^1/2   (A: number of components or fitted parameters)

Root Mean Squared Error of Calibration
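Continuing the numpy sketch above; taking A as the number of estimated coefficients is an assumption about the slide's notation:

    SSres = f @ f                       # SSres = f'f
    SStot = y @ y                       # total sum of squares (y mean-centered)
    R2 = 1.0 - SSres / SStot            # coefficient of determination
    RMSEC = np.sqrt(SSres / (I - K))    # here A = K coefficients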

Alternatives to MLR/OLS

Ridge Regression (RR)

b = (X'X)^-1 X'y

The identity matrix I is the easiest matrix to invert, so stabilize X'X by adding a small multiple of it:

b = (X'X + kI)^-1 X'y

k (ridge constant) as small as possible
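A one-line ridge sketch in the same notation; the value of k is arbitrary here and must be tuned in practice:

    k = 0.1                                                   # ridge constant
    b_rr = np.linalg.inv(X.T @ X + k * np.eye(K)) @ X.T @ y   # (X'X + kI)^-1 X'y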

Problems

- Choice of ridge constant

- No diagnostics

Principal Component Regression (PCR)

- I ≥ K

- Easy inversion

Principal Component Regression (PCR)

[Figure: PCA compresses X (I × K) into a score matrix T (I × A)]

- A ≤ I
- T orthogonal
- Noise in X removed

Principal Component Regression (PCR)

y = Td + f

d = (T'T)^-1 T'y
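A minimal PCR sketch via the SVD, reusing X and y from the sketch above; A = 2 components is an arbitrary choice:

    A = 2                                   # number of components kept
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    T = U[:, :A] * s[:A]                    # PCA scores (I x A), orthogonal columns
    d = np.linalg.inv(T.T @ T) @ T.T @ y    # d = (T'T)^-1 T'y; T'T is diagonal
    b_pcr = Vt[:A].T @ d                    # regression vector in the original variables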

Problem

How many components should be used?

Advantage

- PCA done on data
- Outliers
- Classes
- Noise in X removed

Partial Least Squares Regression (PLS)

[Figure: PLS decomposes X (scores t, weights w', loadings p') and Y (scores u, loadings q'), with A components each; the outer relationships model X and Y separately, the inner relationship links t to u]

Advantages

- X decomposed
- Y decomposed
- Noise in X left out
- Noise in Y left out

PCR and PLS are one-component-at-a-time methods.

After each component, a residual is calculated.

The next component is calculated on the residual.
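A minimal PLS1 (single response) sketch in the NIPALS style showing exactly this deflation: each component is extracted, then X and y are replaced by their residuals. X and y are assumed mean-centered, and the function name is arbitrary:

    def pls1(X, y, A):
        X, y = X.astype(float).copy(), y.astype(float).copy()
        W, P, q = [], [], []
        for _ in range(A):
            w = X.T @ y
            w /= np.linalg.norm(w)         # weight vector
            t = X @ w                      # X scores
            p = X.T @ t / (t @ t)          # X loadings
            qa = (y @ t) / (t @ t)         # inner-relation coefficient
            X -= np.outer(t, p)            # deflate X: keep only the residual
            y -= qa * t                    # deflate y likewise
            W.append(w); P.append(p); q.append(qa)
        W, P, q = np.array(W).T, np.array(P).T, np.array(q)
        return W @ np.linalg.solve(P.T @ W, q)   # b_PLS in the original X space

    b_pls = pls1(X, y, 2)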

Another view

y = Xb + f

y = X bRR + fRR

y = X bPCR + fPCR

y = X bPLS + fPLS

[Figure: regression vectors b1 (OLS), b2 (shrunk and rotated), b3 (too much shrinkage)]

Subspace of useful regression vectors

Prediction

[Figure: calibration set Xcal (I × K) with ycal; test set Xtest (J × K) with ytest; predictions yhat]

Prediction diagnostics

yhat = Xtest b

ftest = ytest - yhat

PRESS = ftest' ftest

RMSEP = [ PRESS / J ]^1/2

Root Mean Squared Error of Prediction

Prediction diagnostics

yhat = Xtest b

ftest = ytest - yhat

R2test = Q2 = 1 - ftest' ftest / ytest' ytest
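A sketch of these test-set diagnostics, assuming a test set Xtest, ytest (centered with the calibration means) and a regression vector b from any of the methods above:

    yhat = Xtest @ b
    ftest = ytest - yhat
    J = ytest.size                        # number of test samples
    PRESS = ftest @ ftest                 # PRESS = ftest'ftest
    RMSEP = np.sqrt(PRESS / J)
    Q2 = 1.0 - PRESS / (ytest @ ytest)    # R2test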

Some rules of thumb

R2 > 0.65 (5 PLS comp.)

R2test > 0.5

R2 - R2test < 0.2

Bias

f = y - Xb

always 0 bias (the calibration residuals sum to zero)

ftest = ytest - yhat

bias = (1/J) Σ ftest
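In the same notation, a continuation of the test-set sketch above:

    bias = ftest.sum() / J    # (1/J) * sum of test residuals; 0 when unbiased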

Leverage - influence

b = (X'X)^-1 X'y

yhat = Xb = X(X'X)^-1 X'y = Hy

H: the hat matrix

diagonal elements of H: leverage
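A small sketch of the hat matrix and the leverages for the calibration data above; forming H explicitly costs I × I memory and is done here only for illustration:

    H = X @ np.linalg.inv(X.T @ X) @ X.T   # hat matrix: yhat = H y
    leverage = np.diag(H)                  # one leverage value per sample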

Leverage - influence

b= (X’X)-1 X’y

yhat = Xb = X(X’X)-1 X’y = Hy

the Hat matrix

diagonal elements of H: Leverage

Leverage - influence

Leverage - influence

Leverage - influence

Residual plot

[Figure: ftest against predicted y, showing unbiased, biased, outlier, small variance, large variance, and heteroscedastic patterns]

Residual checks:
- Check histogram of f
- Check E variable-wise
- Check E object-wise

[Figure: predicted against measured response, panels E-G: heteroscedastic, outlier by extrapolation, bad outlier]


Plotting: line plots

- Scree plot: RMSEC, RMSECV, RMSEP
- Loading plot against wavelength
- Score plot against time
- Residual against sample
- Residual against yhat
- T2 against sample
- H against sample

Plotting: scatter plots (2D, 3D)

- Score plot
- Loading plot
- Biplot
- H against residual
- Inner relation t - u
- Weights w, q

Nonlinearities

[Figure: y against x, panels A-E: linear, weak nonlinear, strong nonlinear, non-monotonic, linear approximations]

Remedies for nonlinearities: making nonlinear data fit a linear model, or making the model nonlinear.

- Fundamental theory (e.g. going from transmittance to absorbance)
- Use extra latent variables in PCR or PLSR
- Use transformations of latent variables
- Remove disturbing variables
- Find subsets that behave linearly

Remedies for nonlinearities: making nonlinear data fit a linear model, or making the model nonlinear.

- Use intrinsically nonlinear methods
- Locally transform variables X, y, or both nonlinearly (powers, logarithms, adding powers)
- Transformation in a neighbourhood (window methods)
- Use global transformations (Fourier, Wavelet)
- GIFI-type discretization
