1
5. Multiway calibration
Quimiometria Teórica e Aplicada
Instituto de Química - UNICAMP
2
Multiway regression problems, e.g. batch reaction monitoring
[Figure: process measurements X (batch × process variable × time) and product quality Y (batch × product quality)]
3
Multiway regression problems, e.g. tandem mass spectroscopy
[Figure: MS-MS spectra X1 ... X5 stacked along the sample mode (samples × parent ion m/z × daughter ion m/z) and compound concentrations (sample × compound)]
4
Some terminology
• Zero-order: univariate calibration (OLS, ordinary least squares). Cannot handle interferents.
• First-order: multivariate calibration (ridge regression, PCR, PLS etc.). Can handle interferents if they are present in the training set.
• Second-order: second-order advantage (PARAFAC, restricted Tucker, GRAM, RBL etc.). Can handle unknown interferents (although see work of K. Faber).
• N-PLS (?)
5
Multiway calibration methods
• PARAFAC (already discussed on first day)
• (Unfold-PLS)
• Multiway PCR
• N-PLS
• MCovR (multiway covariates regression) (see work of Smilde & Gurden)
• GRAM, NBRA, RBL (see work of Kowalski et al.)
6
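Unfold-PLS
• Matricize (or ‘unfold’) the data and use standard two-way PLS:
[Figure: the three-way array X (I × J × K) is unfolded slab by slab (labelled X1 ... XI in the figure) into a two-way I × JK matrix, which is regressed on Y (I × M)]
• But if a multiway structure exists in the data, multiway methods have some important advantages!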
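The unfolding step can be sketched in a few lines of NumPy. This is only an illustration: the function name `unfold`, the array sizes and the random data are assumptions, and the column ordering (slabs placed side by side along the third mode) is one possible convention.

```python
import numpy as np

def unfold(X):
    """Matricize an I x J x K array into an I x JK matrix.

    The K slabs X[:, :, k] (each I x J) are placed side by side,
    so column k*J + j of the result holds X[:, j, k].
    """
    I, J, K = X.shape
    return X.transpose(0, 2, 1).reshape(I, K * J)

# Hypothetical data: 4 batches, 3 process variables, 5 time points
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3, 5))

X_unf = unfold(X)

# The same matrix built explicitly, slab by slab
slabs = np.hstack([X[:, :, k] for k in range(X.shape[2])])
```

Any standard two-way PLS routine can then be applied to `X_unf`; the multiway structure is no longer used by the model.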
7
Two-way PCR
• Standard PCR for X (I × J) and y (I × 1).
1. Calculate PCA model of X:
$X = TP^{T} + E$
2. Use PCA scores for ordinary regression:
$y = Tb + E$
$b = (T^{T}T)^{-1}T^{T}y$
3. Make predictions for new samples:
$T_{new} = X_{new}P$
$y_{new} = T_{new}b$
[Figure: PCA decomposition of X into scores T, loadings P and residuals E, with y regressed on T to give b]
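The three PCR steps can be written out as a minimal NumPy sketch. The function names `pcr_fit` and `pcr_predict` and the synthetic data are assumptions made for illustration; PCA is computed through the SVD and no centring is applied, since the slide does not mention it.

```python
import numpy as np

def pcr_fit(X, y, n_components):
    # Step 1: PCA of X via the SVD; the loadings P are the leading
    # right singular vectors, the scores are T = X P.
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    P = Vt[:n_components].T            # J x R
    T = X @ P                          # I x R
    # Step 2: ordinary regression on the scores, b = (T^T T)^-1 T^T y
    b = np.linalg.solve(T.T @ T, T.T @ y)
    return P, b

def pcr_predict(X_new, P, b):
    # Step 3: project new samples onto the loadings, then apply b
    T_new = X_new @ P
    return T_new @ b

# Hypothetical noise-free data with two underlying components
rng = np.random.default_rng(0)
T_true = rng.normal(size=(20, 2))
P_true = rng.normal(size=(6, 2))
X = T_true @ P_true.T
y = T_true @ np.array([1.5, -0.5])

P, b = pcr_fit(X, y, n_components=2)
y_hat = pcr_predict(X, P, b)
```

With noise-free rank-two data, two components reproduce y; with real data the number of components would have to be chosen, e.g. by cross-validation.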
8
Multiway PCR
• Multiway PCR for X (I × J × K) and y (I × 1).
1. Calculate multiway model:
$X = A(C \odot B)^{T} + E$
where X is the array unfolded to I × JK and $\odot$ is the column-wise Kronecker (Khatri-Rao) product
2. Use scores for regression:
$y = Ab_{PCR} + E$
$b_{PCR} = (A^{T}A)^{-1}A^{T}y$
3. Make predictions for new samples:
$A_{new} = X_{new}P(P^{T}P)^{-1}$, where $P = (C \odot B)$
$y_{new} = A_{new}b_{PCR}$
[Figure: decomposition of the three-way array X into scores A, loadings B and C, and residuals E, with y regressed on A to give bPCR]
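Steps 2 and 3 of multiway PCR can be sketched as follows, assuming the loadings B and C are already available from a fitted multiway model (here they are simply the known factors of synthetic data, so no model fitting is shown). All function names and sizes are illustrative assumptions.

```python
import numpy as np

def khatri_rao(C, B):
    """Column-wise Kronecker product: column r is kron(C[:, r], B[:, r])."""
    return np.column_stack([np.kron(C[:, r], B[:, r]) for r in range(C.shape[1])])

def unfold(X):
    """I x J x K array to I x JK, ordered to match khatri_rao(C, B)."""
    I, J, K = X.shape
    return X.transpose(0, 2, 1).reshape(I, K * J)

def scores_from_loadings(X, B, C):
    # A = X P (P^T P)^-1 with P = khatri_rao(C, B)
    P = khatri_rao(C, B)
    return unfold(X) @ P @ np.linalg.inv(P.T @ P)

# Hypothetical noise-free trilinear data
rng = np.random.default_rng(0)
I, J, K, R = 10, 4, 5, 2
A = rng.normal(size=(I, R))
B = rng.normal(size=(J, R))
C = rng.normal(size=(K, R))
X = np.einsum('ir,jr,kr->ijk', A, B, C)
b_true = np.array([2.0, -1.0])
y = A @ b_true

# Step 2: regress y on the scores
A_cal = scores_from_loadings(X, B, C)
b_pcr = np.linalg.solve(A_cal.T @ A_cal, A_cal.T @ y)

# Step 3: predict for new samples using the same loadings
A_new = rng.normal(size=(3, R))
X_new = np.einsum('ir,jr,kr->ijk', A_new, B, C)
y_new = scores_from_loadings(X_new, B, C) @ b_pcr
```

In practice B and C would come from a PARAFAC fit to the calibration array, and the recovered scores would only approximate the true ones.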
9
N-PLS
• N-PLS is a direct extension of standard two-way PLS for N-way arrays.
• The advantages of N-PLS are the same as for any multiway analysis:
– a more parsimonious model
– loadings which are easier to plot and interpret
10
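N-PLS
• The standard two-way PLS algorithm (see ‘Multivariate Calibration’ by Martens and Næs):
1. $\max_{w_r} \operatorname{cov}(X_{r-1}w_r,\ y_{r-1})$ with $\lVert w_r \rVert = 1$
2. $t_r = X_{r-1}w_r$
3. $X_r = X_{r-1} - t_r w_r^{T}$
4. $y_r = y_0 - Uq_r$
• The N-PLS algorithm (R. Bro) uses PARAFAC-type loadings, but is otherwise very similar:
1. $\max_{w_r, v_r} \operatorname{cov}(X_{r-1}(v_r \otimes w_r),\ y_{r-1})$ with $\lVert w_r \rVert = \lVert v_r \rVert = 1$
2. $t_r = X_{r-1}(v_r \otimes w_r)$
3. $X_r = X_{r-1} - t_r(v_r \otimes w_r)^{T}$
4. $y_r = y_0 - Uq_r$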
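The four N-PLS steps can be sketched for a three-way X and a single y. This is a sketch under assumptions, not R. Bro's reference implementation: the name `npls1_fit` is hypothetical, the weights in step 1 are taken as the first singular vectors of the J × K matrix with elements Σ_i y_i x_ijk, and step 4 is implemented as a least-squares regression of y_0 on all scores found so far.

```python
import numpy as np

def unfold(X):
    """I x J x K array to I x JK, with column k*J + j holding X[:, j, k]."""
    I, J, K = X.shape
    return X.transpose(0, 2, 1).reshape(I, K * J)

def npls1_fit(X, y, n_components):
    I, J, K = X.shape
    Xr = unfold(X).astype(float)          # X_0, unfolded
    y0 = np.asarray(y, dtype=float)
    yr = y0.copy()
    W, V, T = [], [], []
    for _ in range(n_components):
        # Step 1: unit vectors w (mode J) and v (mode K) maximising
        # the inner product of X(v kron w) with y
        Z = (yr @ Xr).reshape(K, J).T     # J x K
        U, _, Vt = np.linalg.svd(Z)
        w, v = U[:, 0], Vt[0]
        p = np.kron(v, w)
        # Step 2: scores
        t = Xr @ p
        # Step 3: deflate X
        Xr = Xr - np.outer(t, p)
        W.append(w)
        V.append(v)
        T.append(t)
        # Step 4 (assumed form): residual of y0 after regression on all scores
        Tmat = np.column_stack(T)
        b, *_ = np.linalg.lstsq(Tmat, y0, rcond=None)
        yr = y0 - Tmat @ b
    return np.column_stack(W), np.column_stack(V), Tmat, b

# Hypothetical rank-one, noise-free example
rng = np.random.default_rng(0)
a, bl, cl = rng.normal(size=8), rng.normal(size=4), rng.normal(size=5)
X = np.einsum('i,j,k->ijk', a, bl, cl)
y = 2.0 * a

W, V, T, b = npls1_fit(X, y, n_components=1)
y_fit = T @ b
t_direct = np.einsum('ijk,j,k->i', X, W[:, 0], V[:, 0])
```

The scores can be computed either from the unfolded matrix, as in step 2, or directly from the array as in `t_direct`; the two are the same quantity.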
11
N-PLS graphic (taken from R. Bro)
12
Other methods
• Multiway covariates regression (MCovR)
– different to PLS-type models
– choice of structure on X (PARAFAC, Tucker, unfold etc.)
– sometimes loadings are easier to interpret
$\min_{W}\ \alpha \lVert X - XWP_X^{T} \rVert^{2} + (1 - \alpha) \lVert Y - XWP_Y^{T} \rVert^{2}$
• Restricted Tucker, GRAM, RBL, NBRA etc.
– for more specialized use
– second-order advantage, i.e. able to handle unknown interferents
[Figure: restricted loadings A; standard sample with N components, mixture with N + M components]
13
Conclusions
• There are a number of different calibration methods for multiway data.
• N-PLS is an extension of two-way PLS for multiway data.
• All the normal guidelines for multivariate regression still apply!
– watch out for outliers
– don’t apply the model outside of the calibration range
14
Outliers (1)
• Outliers are objects which are very different from the rest of the data. These can have a large effect on the regression model and should be removed.
[Figure: two plots of T (°C) against pH, before and after removing an outlier labelled "bad experiment"]
15
Outliers (2)
• Outliers can also be found in the model space or in the residuals.
[Figure: scores plot (PC 2 against PC 1) and sum-of-squared residuals against time (min)]
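Both diagnostics can be computed from a PCA model, as in the sketch below. The data are synthetic, the outlier is planted by hand, and the columns are mean-centred before the PCA; all of these are assumptions made for illustration only.

```python
import numpy as np

# Hypothetical data: 30 samples, 8 variables, two underlying components
rng = np.random.default_rng(0)
scores_true = rng.normal(size=(30, 2))
loadings_true = 3.0 * np.eye(8)[:, :2]
X = scores_true @ loadings_true.T + 0.01 * rng.normal(size=(30, 8))
X[5, 7] += 1.0                       # planted outlier in sample 5

# Two-component PCA model of the centred data
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
P = Vt[:2].T
T = Xc @ P                           # scores: look for outliers in the model space
E = Xc - T @ P.T                     # residuals
q = np.sum(E ** 2, axis=1)           # sum-of-squared residuals per sample

suspect = int(np.argmax(q))          # sample with the largest residual
```

Plotting `T[:, 1]` against `T[:, 0]` and `q` against the sample index gives the two kinds of plot shown on the slide.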
16
Model extrapolation...
[Figure: mean height (cm) against age (months) for ages of roughly 18 to 30 months]
• Univariate example: mean height vs age of a group of young children
• A strong linear relationship between height and age is seen.
• For young children, height and age are correlated.
Moore, D.S. and McCabe G.P., Introduction to the Practice of Statistics (1989).
17
... can be dangerous!
[Figure: height (cm) against age (years) from 0 to 30 years, with the linear model extrapolated]
Linear model was valid for this age range...
...but is not valid for 30 year olds!
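The danger can be reproduced with a straight-line fit. The heights below are hypothetical values, chosen only to resemble the range in the plot; they are not the Moore and McCabe data.

```python
import numpy as np

# Hypothetical calibration data: ages 18 ... 29 months
age_months = np.arange(18, 30)
height_cm = 64.9 + 0.635 * age_months + 0.2 * (-1.0) ** age_months

# Straight-line fit, valid inside the calibration range
slope, intercept = np.polyfit(age_months, height_cm, 1)

# Extrapolation to 30 years (360 months)
height_at_30_years = intercept + slope * 30 * 12
```

Inside the calibration range the fit is good, but the extrapolated height is far above any plausible adult height.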