Orthogonalization Approaches for Data Preprocessing with
Pharmaceutical, Petrochemical and Remote Sensing Applications
Barry M. Wise, Jeremy M. Shaver and Neal B. Gallagher
Eigenvector Research, Inc.
Abstract
Over the past dozen years, a number of powerful spectral analysis methods have been published which make use of orthogonalization (i.e. projection followed by weighted subtraction) of interferences or "clutter." These filtering methods provide a means to mitigate the effect of interferences arising from background chemical or physical species, instrumental artifacts, systematic sampling errors and instrument or system drift. They have been used very effectively with complex biological systems, remote sensing applications, chemical process monitoring and calibration transfer problems.
This class of methods includes Orthogonal Partial Least Squares (O-PLS), External Parameter Orthogonalization (EPO), Dynamic Orthogonal Projection (DOP), Orthogonal Signal Correction (OSC), Constrained Principal Spectral Analysis (CPSA), Generalized Least Squares Weighting (GLSW), and Science Based Calibration (SBC), among others. All are based on the orthogonalization premise and each touts a unique ability to improve model performance, robustness, and/or interpretability.
Some relationships between these methods are noted, along with ties to older work. Examples are given of the use of the methods in calibration and classification problems in pharmaceutical, petrochemical and remote sensing applications.
What is an Orthogonalization Filter?
• Removes spectral patterns from data which are "interfering" with signal of interest
• The interfering species are historically called “clutter” (backgrounds, noise, interferents)
• Filters return spectra with features “removed”
• Weighted subtraction of one or more vectors
• “Soft” orthogonalization is deweighting but not outright complete subtraction
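A minimal sketch of the idea, assuming a single known clutter vector: the spectrum is projected onto the clutter direction and a weighted amount of that projection is subtracted. The function name `orthogonalize`, the `weight` parameter and the synthetic spectra are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def orthogonalize(x, p, weight=1.0):
    """Subtract `weight` times the projection of spectrum x onto clutter vector p."""
    p = p / np.linalg.norm(p)           # unit-length clutter direction
    return x - weight * (x @ p) * p     # weight=1: hard filter; 0 < weight < 1: soft

signal = np.sin(np.linspace(0, np.pi, 50))   # made-up analyte feature
clutter = np.linspace(-1, 1, 50)             # made-up sloping baseline
x = signal + 0.8 * clutter

x_hard = orthogonalize(x, clutter, weight=1.0)   # clutter direction fully removed
x_soft = orthogonalize(x, clutter, weight=0.5)   # clutter direction only deweighted

p = clutter / np.linalg.norm(clutter)
print(x @ p, x_soft @ p, x_hard @ p)             # projection onto clutter before/after
```

With `weight=1.0` the filtered spectrum has no remaining component along the clutter vector; with `weight=0.5` half of that component is left in place.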
Some Examples Using Orthogonalization Filters (by Eigenvector)
• In vivo tissue identification with NIR probe
• Cancer detection using in vivo fluorescence
• Identification of atherosclerosis in artery walls using NIR
• Determination of hydroxide concentration in high-concentration aqueous ion solutions using Raman spectroscopy
• Identification of chemical species in remote sensing
SOME Orthogonalization Filters
Method 1: Orthogonalization of Model
• OSC – Orthogonal Signal Correction (Wold et al. 1998)
• OPLS – Orthogonal PLS (Trygg, Wold 2002, patented)
• MOSC – Modified OSC (POSC – Feudale, Tan, S. Brown 2003)
Method 2: Pre-selection of "clutter"
• CPSA – Constrained Principal Spectral Analysis (J. Brown 1990, patented)
• EPO – External Parameter Orthogonalization (Roger, Chauchard, Bellon-Maurel 2003)
• GLS – Generalized Least Squares (Aitken 1935, Martens et al. 2003)
• SBC – Science Based Calibration (Marbach 2005, patented (?))
Two General Approaches
Method 1: Orthogonalization of Model
[Flow diagram: Calibration Spectra and Y-block (Classes) → PCA or PLS Decomposition → Scores & Loadings → Orthogonalize To Y-block → Clutter Loadings → Filter → Filtered Spectra; Repeat for Multiple Components]
Method 2: Pre-selection of "clutter"
[Flow diagram: Clutter Spectra → PCA Decomposition → Clutter Loadings → Choose Subset → Filter → Filtered Spectra]
Orthogonal Signal Correction (OSC)
• Introduced by Wold in 1998
– “OSC paper not the clearest thing” – Johan Trygg, June 9, 2011
• OSC objective function
OSC Issues
• To the extent the objective function is optimized, OSC doesn’t work
– Only works if you don’t try too hard!
• Many algorithms (at least 5) with various problems
– Factors not orthogonal to y
– Factors don’t capture maximum variance in X
– Filtered X not in same subspace as original X
• Often implemented prior to cross validation: totally misleading!
O-PLS
• Originally formulated as sequential algorithm (NIPALS based)
• Since shown to be obtainable from post-processing conventional PLS model
• Does not improve prediction
• Claim is that model is more interpretable
E.K. Kemsley and H.S. Tapp, “OPLS filtered data can be obtained directly from non-orthogonalized PLS1,” J. Chemometrics, 23, 263-264, 2009.
R. Ergon, “PLS post-processing by similarity transformation (PLS+ST): a simple alternative to OPLS,” J. Chemometrics, 19, 1-4, 2005.
J. Trygg and S. Wold, “Orthogonal Projections to Latent Structures (O-PLS),” J. Chemometrics, 16, 119-128, 2002.
NIR of Pseudo-gasoline Samples
[Figure: Estimated Pure Component Spectra. Absorbance vs. Wavelength (nm), 800 to 1600 nm.]
PLS Model on Component 1
[Figure: Samples/Scores Plot of spec1. Y CV Predicted 1 vs. Y Measured 1.]
R^2 = 0.988
5 Latent Variables
RMSEC = 0.46945
RMSECV = 0.68306
Calibration Bias = −1.0658e−14
CV Bias = 0.0375

Percent Variance Captured by Regression Model

| Comp | X-Block This | X-Block Total | Y-Block This | Y-Block Total |
|------|--------------|---------------|--------------|---------------|
| 1 | 91.17 | 91.17 | 8.36 | 8.36 |
| 2 | 7.40 | 98.57 | 7.19 | 15.55 |
| 3 | 0.93 | 99.50 | 32.81 | 48.36 |
| 4 | 0.46 | 99.96 | 26.18 | 74.54 |
| 5 | 0.02 | 99.98 | 24.90 | 99.44 |
Regular and O-PLS Filtered Regression Vectors
[Figure, two panels labeled “Regular” and “O-PLS Filtered”: Variables/Loadings Plot for spec1. Reg Vector for Y 1 vs. Variable (800 to 1600).]
Interpretation
• Better in previous example, but partly because we know what spectra should look like
• What if problem has discrete variables with signal that could be positive, negative or zero?
• Much harder! (see “On the Interpretability of O-PLS Models”)
• Working on developing better understanding of when it will work and when it won’t
Orthogonalize Model
Pre-selection Methods…
• CPSA – Constrained Principal Spectral Analysis (J. Brown 1990, patented)
• EPO – External Parameter Orthogonalization (Roger, Chauchard, Bellon-Maurel 2003)
– CPSA and EPO: Identical; Choose # of PCs
• GLS – Generalized Least Squares (Aitken 1935)
• SBC – Science Based Calibration (Marbach 2005, patented (?))
– GLS and SBC: Quite similar; Down-weight by scale of eigenvalues
All the same…
[Flow diagram: Clutter Spectra → PCA Decomposition → Clutter Loadings → Choose Subset → Filter]
Clutter Covariance
$$X_c = \underbrace{(X_{1,c} - \bar{x}_{1,c})}_{\text{Clutter source 1}} + \underbrace{(X_{2,c} - \bar{x}_{2,c})}_{\text{Clutter source 2}} + \dots$$
$$C = \frac{X_c^T X_c}{N - 1}$$
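The clutter covariance can be sketched as follows. This is only an illustration under stated assumptions: the "+" between clutter sources is read here as stacking the mean-centered blocks row-wise, N is taken as the total number of stacked clutter samples, and the function name `clutter_covariance` and the random data are made up.

```python
import numpy as np

def clutter_covariance(blocks):
    """Mean-center each clutter block, stack them, and return (Xc, C)."""
    centered = [X - X.mean(axis=0) for X in blocks]   # X_{i,c} minus its own mean
    Xc = np.vstack(centered)                          # assumed meaning of "+" on the slide
    C = Xc.T @ Xc / (Xc.shape[0] - 1)                 # C = Xc^T Xc / (N - 1)
    return Xc, C

rng = np.random.default_rng(0)
source_1 = rng.normal(loc=1.0, size=(8, 5))    # e.g. repeat scans of one sample
source_2 = rng.normal(loc=-3.0, size=(6, 5))   # e.g. samples within one class
Xc, C = clutter_covariance([source_1, source_2])
print(Xc.shape, C.shape)
```

Centering each block separately means that the offsets between sources (here the different `loc` values) do not count as clutter; only the variation within each source does.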
Covariance to GLS Weighting Matrix
$$C = V S^2 V^T$$
$$G = V D^{-1} V^T$$
with
$$d_{i,i}^{-1} = \frac{1}{\dfrac{s_{i,i}^2}{g^2} + 1}$$
Large $g$: $d_{i,i}^{-1} \to 1$, dimension unaffected
Small $g$: $d_{i,i}^{-1} \to 0$, dimension eliminated
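A small sketch of building the weighting matrix from a clutter covariance, using the diagonal term as it reads on this slide. The function name `gls_weighting` and the rank-one test covariance are assumptions for illustration; implementations in actual software may define the diagonal differently.

```python
import numpy as np

def gls_weighting(C, g):
    """G = V D^{-1} V^T with d_ii^{-1} = 1 / (s_ii^2 / g^2 + 1)."""
    s2, V = np.linalg.eigh(C)        # C = V S^2 V^T (eigenvalues in ascending order)
    s2 = np.clip(s2, 0.0, None)      # guard against tiny negative round-off
    d_inv = 1.0 / (s2 / g**2 + 1.0)
    return (V * d_inv) @ V.T         # scales column i of V by d_inv[i]

p = np.array([1.0, 0.0, 0.0])        # made-up clutter direction
q = np.array([0.0, 1.0, 0.0])        # direction with no clutter variance
C = 4.0 * np.outer(p, p)             # rank-one clutter covariance

G_large = gls_weighting(C, g=1e6)    # large g: essentially the identity
G_small = gls_weighting(C, g=1e-6)   # small g: clutter direction eliminated
print(np.round(G_large, 6))
print(np.round(G_small, 6))
```

Directions with zero clutter variance always get weight 1, so only directions in which clutter was actually observed are deweighted.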
Choosing Components
[Figure: Eigenvalues of Clutter. log(eigenvalues) vs. Principal Component Number (1 to 11), with cutoffs marked at k=3, k=4 and k=5 and the annotation “decreasing D”.]
EPO / CPSA: $x_f = x - x P_k P_k^T$
GLS / SBC: $x_f = x - x P D P^T$
One adjustable parameter in each method
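The two filter forms can be compared side by side in a short sketch. Assumptions: P holds the PCA loadings of mean-centered clutter as columns; the D in the GLS / SBC form is taken to be the identity minus the inverse diagonal of the GLS weighting (so 0 keeps a direction and 1 removes it); and all function names and data are made up for illustration.

```python
import numpy as np

def clutter_pca(Xc):
    """Loadings P (as columns) and covariance eigenvalues of mean-centered clutter Xc."""
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt.T, s**2 / (Xc.shape[0] - 1)

def epo_filter(x, P, k):
    """Hard filter: x_f = x - x P_k P_k^T, using the first k clutter loadings."""
    Pk = P[:, :k]
    return x - x @ Pk @ Pk.T

def gls_filter(x, P, eigvals, g):
    """Soft filter: x_f = x - x P D P^T, with the assumed D described above."""
    d = 1.0 - 1.0 / (eigvals / g**2 + 1.0)   # near 1 for large eigenvalues, near 0 for small
    return x - ((x @ P) * d) @ P.T

rng = np.random.default_rng(0)
shapes = rng.normal(size=(2, 30))            # two made-up clutter shapes
Xc = rng.normal(size=(20, 2)) @ shapes       # clutter samples built from those shapes
Xc = Xc - Xc.mean(axis=0)
P, eigvals = clutter_pca(Xc)

x = rng.normal(size=30)                      # an arbitrary "spectrum"
x_epo = epo_filter(x, P, k=2)
x_gls_small_g = gls_filter(x, P, eigvals, g=1e-6)
x_gls_large_g = gls_filter(x, P, eigvals, g=1e6)
print(np.abs(x_gls_small_g - x_epo).max(), np.abs(x_gls_large_g - x).max())
```

In this construction the soft filter approaches the hard filter (with k equal to the clutter rank) as g shrinks, and leaves the spectrum untouched as g grows, which is one way to see the single adjustable parameter in each method.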
Other Similar Pre-selection Filters…
• Extended Mixture Model (Extended Least Squares) orthogonal filtering for Classical Least Squares (CLS) models!
Target (Calibration) Spectra: $S_{target}$
Clutter Spectra: $S_{clutter}$
$$c = x S (S^T S)^{-1}, \qquad S = [\,S_{target} \;\; S_{clutter}\,]$$
Pseudo-inverse is an orthogonalization!
Equivalent to full-rank EPO / CPSA model
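The equivalence can be checked numerically in a small sketch, assuming spectra are stored as columns of S and x is a single measured spectrum. The random spectra and variable names are made up; the comparison is between the extended-mixture CLS estimate and a CLS estimate made after hard orthogonalization against the full clutter subspace.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 40
s_target = rng.random((n, 1))                 # made-up target spectrum (column)
S_clutter = rng.random((n, 3))                # made-up clutter spectra (columns)
S = np.hstack([s_target, S_clutter])

# Measured spectrum: 2.5 units of target plus some clutter
x = 2.5 * s_target[:, 0] + S_clutter @ np.array([0.3, -1.0, 0.7])

# Extended mixture model: c = x S (S^T S)^{-1}; c[0] is the target estimate
c = x @ S @ np.linalg.inv(S.T @ S)

# Full-rank hard orthogonalization against clutter, then CLS on the target alone
M = np.eye(n) - S_clutter @ np.linalg.pinv(S_clutter)   # projector off the clutter subspace
x_f = x @ M
s_f = M @ s_target[:, 0]
c_target = (x_f @ s_f) / (s_f @ s_f)

print(c[0], c_target)
```

Both routes give the same target estimate because fitting the clutter spectra alongside the target is the same as first projecting the clutter subspace out of both the measurement and the target spectrum.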
Pre-selecting Clutter
How to get clutter?
Look at differences in samples which should otherwise be the same.
In classification, all samples within a class should nominally be the same!
Use Calibration itself!
[Flow diagram: Calibration Spectra → Clutter Spectra → PCA Decomposition → Clutter Loadings → Choose Subset → Filter]
More on How to Get Clutter
• Pure component spectra of known interferences
• Subspace spanned by
– samples where analyte of interest is not present
– variation in data that is all of the same class
– differences between samples where analyte of interest is (nearly) the same, e.g. y-gradient
– repeat measurement of blanks
• Make it up! e.g. polynomial baseline shapes
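As one illustration of "made up" clutter, the sketch below builds a polynomial baseline subspace and orthogonalizes spectra against it. The choice of Legendre polynomials, the degree, the function names and the synthetic spectra are all assumptions for this example.

```python
import numpy as np

def baseline_clutter(n_channels, degree=2):
    """Orthonormal basis (columns) for polynomial baselines up to `degree`."""
    grid = np.linspace(-1.0, 1.0, n_channels)
    B = np.polynomial.legendre.legvander(grid, degree)   # shape (n_channels, degree + 1)
    Q, _ = np.linalg.qr(B)                               # orthonormalize the columns
    return Q

def remove_baseline(x, Q):
    """Hard orthogonalization of spectrum x against the baseline subspace."""
    return x - x @ Q @ Q.T

n = 200
grid = np.linspace(-1.0, 1.0, n)
peak = np.exp(-((grid - 0.2) / 0.05) ** 2)       # made-up absorption band
baseline = 0.3 + 0.5 * grid - 0.2 * grid ** 2    # made-up quadratic baseline

Q = baseline_clutter(n, degree=2)
with_baseline = remove_baseline(peak + baseline, Q)
without_baseline = remove_baseline(peak, Q)
print(np.abs(with_baseline - without_baseline).max())
```

After filtering, the spectrum with the added baseline and the spectrum without it are the same, since any quadratic baseline lies entirely in the subspace being removed.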
Y-gradient Method
• Sort samples by y (reference) values
• Take differences between adjacent samples
• Weight X-differences by inverse of difference in y values
• Deweight by covariance of differences (GLS) or orthogonalize against some number of PCs (EPO, ELS, EMM, PA-CLS)
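The first three steps can be sketched as below. The exact weighting is not given on the slide, so this assumes weights of 1/Δy with a small floor (`min_dy`, a hypothetical parameter) to avoid division by zero, followed by mean-centering of the weighted differences; the function name and synthetic data are also made up.

```python
import numpy as np

def y_gradient_clutter(X, y, min_dy=1e-8):
    """Weighted, mean-centered differences between y-adjacent samples."""
    order = np.argsort(y)                   # sort samples by reference value
    dX = np.diff(X[order], axis=0)          # differences between adjacent samples
    dy = np.diff(y[order])
    w = 1.0 / np.maximum(dy, min_dy)        # assumed weighting: inverse of the y difference
    D = dX * w[:, None]
    return D - D.mean(axis=0)

rng = np.random.default_rng(0)
y = rng.uniform(10, 30, size=25)            # made-up reference values
t = rng.normal(size=25)                     # clutter amounts, unrelated to y
s = rng.normal(size=40)                     # made-up analyte spectrum
b = rng.normal(size=40)                     # made-up clutter spectrum
X = np.outer(y, s) + np.outer(t, b)

D = y_gradient_clutter(X, y)
_, _, Vt = np.linalg.svd(D, full_matrices=False)
alignment = abs(Vt[0] @ b) / np.linalg.norm(b)
print(alignment)                            # close to 1: leading PC follows the clutter shape
```

In this idealized example every weighted difference contains the same analyte contribution, so centering removes it and the leading principal component of `D` lines up with the clutter spectrum. `D` can then be fed to a covariance-based (GLS) or PC-based (EPO) filter.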
Orthogonalization Filters

| Filter | Soft/Hard | Adj. Params | Clutter source | Improves Prediction? |
|--------|-----------|-------------|----------------|----------------------|
| OSC | Hard | # LVs | Part of X orthogonal to y | No, but reduces model complexity |
| O-PLS | Hard | # LVs | Part of X-model space orthogonal to X’y | No, but improves interpretation |
| MOSC | Hard | # PCs | Part of X orthogonal to y | Maybe |
| CPSA | Hard | # PCs | A priori, includes pathlength adj. | Yes |
| EPO | Hard | # PCs | Classes, y-gradient or a priori | Yes |
| DOP | Hard | # PCs | Synthetic reference samples | Yes |
| GLS | Soft | Shrinkage a | Classes, y-gradient or a priori | Yes |
| SBC | Soft | # PCs (20?) | Repeat samples or blanks | Yes |
| EMM | Hard | None | A priori from known interferents, clutter subspace | Yes, CLS model |
| ELS | Hard | # PCs | Clutter subspace | Yes |
| PA-CLS | Hard | None/# PCs | Baseline shapes, residuals | Yes, CLS model |
| WLS | Soft | Regularization | Noise measurements | Yes |
We think it is useful to use Clutter!
Example Classification Data
[Figure: spectra plotted on an x-axis running from 4000 to 500, with the annotation “Using these regions only”.]
• Mid-IR spectra of food grade oils
• Classify oils, detect adulterated olive oil
Calibration with MSC
[Figure: Samples/Scores Plot of Olive Oil Calibration. Scores on PC 2 (12.71%) vs. Scores on PC 1 (84.99%). Classes: CMarg, Corn, Olive, Saffl.]
Cal and Test with MSC
[Figure: Samples/Scores Plot of Olive Oil Calibration & Oiltest. Scores on PC 2 (12.71%) vs. Scores on PC 1 (84.99%).]
With MSC and GLS
[Figure: Samples/Scores Plot of Olive Oil Calibration & Oiltest. Scores on PC 2 (37.55%) vs. Scores on PC 1 (61.84%).]
Zoom on Olive Oil
[Figure: Samples/Scores Plot of Olive Oil Calibration & Oiltest, zoomed. Scores on PC 2 (37.55%) vs. Scores on PC 1 (61.84%). Labeled groups: “Calibration and test Olive Oil” and “Adulterated Olive Oil”.]
Zoom on Corn and Safflower Oil
[Figure: Samples/Scores Plot of Olive Oil Calibration & Oiltest, zoomed. Scores on PC 2 (37.55%) vs. Scores on PC 1 (61.84%). Labeled groups: “Calibration and test Corn Oil” and “Calibration and test Safflower Oil”.]
Zoom on Corn Margarine
[Figure: Samples/Scores Plot of Olive Oil Calibration & Oiltest, zoomed. Scores on PC 2 (37.55%) vs. Scores on PC 1 (61.84%).]
With MSC and EPO
[Figure: Samples/Scores Plot of Olive Oil Calibration & Oiltest. Scores on PC 2 (11.52%) vs. Scores on PC 1 (88.03%).]
Indian Pines Data
• Classic image data set used in many publications
• Crop area near West Lafayette, Indiana
• Ground truth identified 16 known crop areas
• Data from AVIRIS: Airborne Visible/Infrared Imaging Spectrometer
• 220 channels, 400-2500 nm
Indian Pines Image
Soybean Fields
[Image legend: Soybeans no till, Soybeans min, Soybeans clean]
PLS-DA, Mean-Center Only
Class Probability Image
PLS-DA, EPO 1-PC
Class Probability Image
Example Calibration Data
• IDRC-2002 Shootout data
• NIR Transflectance of pharmaceutical tablets
• Goal is to predict assay value
[Figure: Signal Intensity vs. Wavelength (nm), 600 to 1800 nm.]
Calibration and Test with MSC & MC
[Figure: Samples/Scores Plot of calibrate_1,c & test_1. Y Predicted 3 assay vs. Y Measured 3 assay.]
R^2 = 0.964
2 Latent Variables
RMSEC = 3.3253
RMSEP = 3.3487
Calibration Bias = 0
Prediction Bias = −0.4224
With MSC, GLS & MC
[Figure: Samples/Scores Plot of calibrate_1,c & test_1. Y Predicted 3 assay vs. Y Measured 3 assay.]
R^2 = 0.984
2 Latent Variables
RMSEC = 2.5171
RMSEP = 2.159
Calibration Bias = −1.1369e−13
Prediction Bias = 0.067298
With MSC, EPO & MC
[Figure: Samples/Scores Plot of calibrate_1,c & test_1. Y Predicted 3 assay vs. Y Measured 3 assay.]
R^2 = 0.979
2 Latent Variables
RMSEC = 3.0015
RMSEP = 2.3951
Calibration Bias = −8.5265e−14
Prediction Bias = 0.18893
With MSC, ELS and MC
[Figure: Samples/Scores Plot of calibrate_1,c & test_1. Y Predicted 3 assay vs. Y Measured 3 assay.]
R^2 = 0.949
2 Latent Variables
RMSEC = 2.3962
RMSEP = 4.5752
Calibration Bias = −1.08e−12
Prediction Bias = 1.2235
Conclusions
• Main differences between methods are
– How the clutter is defined
– Whether the de-weighting is hard or soft
• Filtering methods are more similar than published statements might have you believe
• Methods achieve similar results, model performance generally improved (except O-PLS, OSC)
• Interpretation of filtered results can be challenging – except OPLS (mostly)