6. Other issues
Quimiometria Teórica e Aplicada
Instituto de Química - UNICAMP
How many components to use?
• Use the ‘unfolding trick’, i.e. look at the rank of each mode.
– does not have a strict statistical basis, but generally works well!
• Use the core-consistency diagnostic (PARAFAC).
– also seems to work well in practice
• Split-half analysis.
• Does the algorithm converge without problems?
• Use full cross-validation.
– N-way Toolbox now has a routine for this
– can be slow!
• Look at loadings and residuals.
• Use chemical knowledge.
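The ‘unfolding trick’ can be sketched in a few lines of NumPy: unfold the array along each mode and inspect the singular values of the resulting matrix. This is only an illustration under stated assumptions; the function name `mode_singular_values` and the simulated two-component array are hypothetical and not taken from the N-way Toolbox.

```python
import numpy as np

def mode_singular_values(X):
    """Return the singular values of every mode unfolding of an N-way array."""
    result = []
    for mode in range(X.ndim):
        # Bring the chosen mode to the front and flatten all remaining modes.
        unfolded = np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)
        result.append(np.linalg.svd(unfolded, compute_uv=False))
    return result

# Hypothetical example: a two-component trilinear array (8 x 6 x 5) plus small noise.
rng = np.random.default_rng(0)
A, B, C = rng.random((8, 2)), rng.random((6, 2)), rng.random((5, 2))
X = np.einsum("ir,jr,kr->ijk", A, B, C) + 1e-6 * rng.standard_normal((8, 6, 5))

for mode, s in enumerate(mode_singular_values(X)):
    # Singular values relative to the largest one; a sharp drop hints at the rank.
    print(f"mode {mode + 1}:", np.round(s / s[0], 6))
```

For this simulated array, each mode should show two singular values well above the noise level, which suggests two components. With real data the drop is rarely this clean, so the other checks in the list still apply.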
Preprocessing: centering (1)
• We are often interested in the differences between objects, not in their absolute values.
– building calibration models: differences between samples
• Mean-centering removes offsets from the data
– removes constant background effects
– can help to linearize data, i.e.
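Preprocessing: centering (2)
• When performing a calibration, it is most common to remove the mean value from each column:
Two-way (X: object × variable; $\bar{x}_{j}$ is the mean of column j over the objects):
$$x^{*}_{ij} = x_{ij} - \bar{x}_{j}$$
Three-way (X: object × primary variable × secondary variable; $\bar{x}_{jk}$ is the mean of column $x_{jk}$ over the objects):
$$x^{*}_{ijk} = x_{ijk} - \bar{x}_{jk}$$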
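Column centering for both cases can be sketched with NumPy broadcasting. The arrays below are random placeholders and the variable names are assumptions made for illustration; the only point is that the mean is taken over the object mode (axis 0) in both the two-way and the three-way case.

```python
import numpy as np

rng = np.random.default_rng(1)

# Two-way: objects x variables. Subtract the mean of each column j.
X2 = rng.random((10, 4)) + 5.0
X2_centered = X2 - X2.mean(axis=0, keepdims=True)

# Three-way: objects x primary variable x secondary variable.
# Subtract the mean over the objects for every (j, k) column.
X3 = rng.random((10, 4, 3)) + 5.0
X3_centered = X3 - X3.mean(axis=0, keepdims=True)

print(np.abs(X2_centered.mean(axis=0)).max())
print(np.abs(X3_centered.mean(axis=0)).max())
```

Both printed values should be zero up to rounding error, confirming that the constant offset of 5 has been removed from every column.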
Preprocessing: scaling (1)
• Sometimes we want to analyse variables measured in different units
– chemical engineering: temperatures, pressures, flow rates
– QSAR: ionization constants, Hammett constants, dipole moments
• These variables should be scaled so that each has an equal chance to appear in the model.
Preprocessing: scaling (2)
• For two-way arrays (object × variable), it is common to divide by the standard deviation after mean-centering the data (‘autoscaling’):
Two-way (X: object × variable; $s_{j}$ is the standard deviation of variable j):
$$x^{*}_{ij} = x_{ij} / s_{j}$$
Three-way (X: object × primary variable × secondary variable; column $x_{jk}$ with standard deviation $s_{jk}$):
Autoscaling can destroy multilinear structure!
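Two-way autoscaling can be sketched as follows. The helper name `autoscale`, the use of the sample standard deviation (`ddof=1`), and the simulated process data (temperature, pressure, flow rate) are assumptions made for illustration only.

```python
import numpy as np

def autoscale(X):
    """Mean-center each column and divide it by its sample standard deviation."""
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    return (X - mean) / std, mean, std

# Hypothetical process data: 20 objects, three variables in very different units.
rng = np.random.default_rng(2)
X = np.column_stack([
    rng.normal(350.0, 10.0, 20),    # temperature
    rng.normal(2.0, 0.1, 20),       # pressure
    rng.normal(1200.0, 150.0, 20),  # flow rate
])

X_scaled, mean, std = autoscale(X)
print(np.round(X_scaled.std(axis=0, ddof=1), 6))
```

Returning `mean` and `std` lets the same transformation be applied to new samples later, which is usually needed in calibration.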
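Preprocessing: scaling (3)
[Figure: three-way array X (object × process variable × time) with slab $X_j$ for process variable j]
Slab scaling maintains the multilinear structure!
$$x^{*}_{ijk} = x_{ijk} / s_{j}$$
[Figure: three-way array X (object × process variable 1 × process variable 2) with slabs $X_j$ and $X_k$]
Double slab scaling may also be useful - ITERATIVE
$$x^{*}_{ijk} = x_{ijk} / s_{j}, \qquad x^{**}_{ijk} = x^{*}_{ijk} / s_{k}$$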
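The difference between slab scaling and column-wise scaling can be checked numerically on a simulated trilinear array. This is a rough sketch under assumptions: the scale of a slab is taken to be its root-mean-square value (the slides do not say which scale measure is meant), and `slab_scale`, `double_slab_scale` and `unfolding_rank` are hypothetical helper names.

```python
import numpy as np

def slab_scale(X, mode):
    """Divide every slab along `mode` by its root-mean-square value."""
    other_axes = tuple(ax for ax in range(X.ndim) if ax != mode)
    s = np.sqrt(np.mean(X**2, axis=other_axes, keepdims=True))
    return X / s

def double_slab_scale(X, modes=(1, 2), n_iter=50):
    """Alternate slab scaling over two modes; scaling one mode disturbs the other."""
    for _ in range(n_iter):
        for mode in modes:
            X = slab_scale(X, mode)
    return X

def unfolding_rank(X, mode, rel_tol=1e-8):
    """Numerical rank of the unfolding of X along `mode`."""
    unfolded = np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)
    s = np.linalg.svd(unfolded, compute_uv=False)
    return int(np.sum(s > rel_tol * s[0]))

# Hypothetical noise-free two-component trilinear array (6 x 5 x 4).
rng = np.random.default_rng(3)
A, B, C = rng.random((6, 2)), rng.random((5, 2)), rng.random((4, 2))
X = np.einsum("ir,jr,kr->ijk", A, B, C)

X_slab = slab_scale(X, mode=1)                     # one scale per slab j
X_double = double_slab_scale(X)                    # scales for j and k, iterated
X_col = X / X.std(axis=0, ddof=1, keepdims=True)   # one scale per column (j, k)

for name, arr in [("raw", X), ("slab", X_slab), ("double slab", X_double), ("column", X_col)]:
    print(name, unfolding_rank(arr, mode=1))
```

Slab scaling only rescales the rows of one loading matrix, so the two-component structure survives. Scaling every (j, k) column separately applies weights that in general cannot be absorbed into the loadings, and the rank of the mode-2 unfolding increases.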
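Tucker models
• Tucker1: $\mathbf{X} = \mathbf{A}\mathbf{G} + \mathbf{E}$
– Tucker1 = PCA
• Tucker2: $\mathbf{X} = \mathbf{G}(\mathbf{B} \otimes \mathbf{A})^{\mathrm{T}} + \mathbf{E}$
– G (I × R2 × R3)
– very rarely used
• Tucker3: $\mathbf{X} = \mathbf{A}\mathbf{G}(\mathbf{C} \otimes \mathbf{B})^{\mathrm{T}} + \mathbf{E}$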
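The Tucker3 expression in unfolded form can be checked numerically against the element-wise definition. The sketch below is an illustration under assumptions: the dimensions and random factors are made up, the error term E is omitted, and the unfolding convention (second index running fastest along the columns) is chosen so that it matches `np.kron(C, B)`.

```python
import numpy as np

rng = np.random.default_rng(4)
I, J, K = 6, 5, 4       # array dimensions (hypothetical)
R1, R2, R3 = 3, 2, 2    # number of components in each mode (hypothetical)

A = rng.standard_normal((I, R1))
B = rng.standard_normal((J, R2))
C = rng.standard_normal((K, R3))
G = rng.standard_normal((R1, R2, R3))   # core array

def unfold_first_mode(T):
    """Unfold T (n1 x n2 x n3) to n1 x (n3*n2), second index running fastest."""
    return T.transpose(0, 2, 1).reshape(T.shape[0], -1)

# Unfolded Tucker3 structure: X = A G (C kron B)^T
X_unfolded = A @ unfold_first_mode(G) @ np.kron(C, B).T

# Element-wise definition: x_ijk = sum_pqr g_pqr a_ip b_jq c_kr
X_tensor = np.einsum("pqr,ip,jq,kr->ijk", G, A, B, C)

print(np.allclose(X_unfolded, unfold_first_mode(X_tensor)))
```

A different unfolding order would pair with `np.kron(B, C)` instead, so the convention has to be kept consistent between the data and the core.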
PARAFAC2
[Figure: three-way array with modes object (I), wavelength (J) and time (K); a time shift is indicated between objects]
In PARAFAC2, only the matrix product $\mathbf{X}_i\mathbf{X}_i^{\mathrm{T}}$ (J × J) is modelled. It works if the correlation structures in the objects are the same.
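A small numerical check illustrates why modelling the cross-product helps with time shifts. As a simplifying assumption, the shift is treated as circular (`np.roll`), which is an idealization: a real shift that moves signal out of the measured time window would leave the cross-product only approximately unchanged. The data are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(5)
J, K = 5, 40

# One object: wavelength (J) x time (K), hypothetical data.
X_i = rng.random((J, K))

# The same object with its time axis shifted circularly by 7 points.
X_shifted = np.roll(X_i, shift=7, axis=1)

# Cross-products over the time mode (J x J).
cross = X_i @ X_i.T
cross_shifted = X_shifted @ X_shifted.T

print(np.allclose(X_i, X_shifted), np.allclose(cross, cross_shifted))
```

The raw slabs differ, but the cross-products agree, because a circular shift only permutes the columns that are summed over.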
Missing data
• Expectation-maximization (EM) is a technique for estimating models (PARAFAC, Tucker, PLS, PCA etc.) when some of the data are missing:
X = [X* X#] (X*: known, X#: missing)
• 0. Initialize X#
• 1. Estimate model, $\mathbf{X}_{n} = \hat{\mathbf{X}}_{n} + \mathbf{E}_{n}$ (maximization)
• 2. Replace missing values with model values, $\mathbf{X}^{\#}_{n} = \hat{\mathbf{X}}^{\#}_{n}$ (expectation)
• 3. Repeat until convergence
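The steps above can be sketched for the simplest case, a bilinear model fitted by truncated SVD. This is a minimal illustration under assumptions, not the N-way Toolbox routine: the function name `em_impute_lowrank`, the column-mean initialization, the uncentered SVD model and the stopping rule are all choices made here for the example.

```python
import numpy as np

def em_impute_lowrank(X, n_components, max_iter=500, tol=1e-10):
    """Impute NaN entries of X by alternating a truncated-SVD fit and re-filling."""
    missing = np.isnan(X)
    X_filled = X.copy()
    if not missing.any():
        return X_filled
    # Step 0: initialize the missing entries with the means of the known column values.
    col_means = np.nanmean(X, axis=0)
    X_filled[missing] = col_means[np.where(missing)[1]]
    for _ in range(max_iter):
        # Step 1: estimate the model from the completed data (truncated SVD).
        U, s, Vt = np.linalg.svd(X_filled, full_matrices=False)
        X_hat = (U[:, :n_components] * s[:n_components]) @ Vt[:n_components]
        # Step 2: replace only the missing entries with the model values.
        change = np.max(np.abs(X_filled[missing] - X_hat[missing]))
        X_filled[missing] = X_hat[missing]
        # Step 3: repeat until the imputed values stop changing.
        if change < tol:
            break
    return X_filled

# Hypothetical example: a one-component 30 x 8 matrix with 12 entries removed.
rng = np.random.default_rng(6)
X_true = (rng.random((30, 1)) + 0.5) @ (rng.random((1, 8)) + 0.5)
mask = np.zeros(X_true.size, dtype=bool)
mask[rng.choice(X_true.size, size=12, replace=False)] = True
mask = mask.reshape(X_true.shape)

X = X_true.copy()
X[mask] = np.nan
X_imputed = em_impute_lowrank(X, n_components=1)

print(np.max(np.abs(X_imputed[mask] - X_true[mask])))
```

The known entries are never altered; only the missing positions are updated in each pass. The same loop applies to PARAFAC or Tucker models if the SVD step is replaced by the corresponding model fit.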
Thank you very much for your attention!