![Page 1: Using Clutter to Improve Models - Eigenvector · Clutter, defined as theconfoundingeffects ofinterferingchemical species, physical effects, noise andinstrument non-idealities, is](https://reader034.vdocument.in/reader034/viewer/2022042513/5fa3ad9f8063300b9242489f/html5/thumbnails/1.jpg)
Using Clutter to Improve Models
Barry M. Wise
Eigenvector Research, Inc.
Manson, WA USA
![Page 2: Using Clutter to Improve Models - Eigenvector · Clutter, defined as theconfoundingeffects ofinterferingchemical species, physical effects, noise andinstrument non-idealities, is](https://reader034.vdocument.in/reader034/viewer/2022042513/5fa3ad9f8063300b9242489f/html5/thumbnails/2.jpg)
Abstract
Clutter, defined as the confounding effects of interfering chemical species, physical effects, noise and instrument non-idealities, is present in all measurements. Sources of clutter include variation in chemical interferents, physical effects such as scattering due to particles, changes in temperature or pressure, instrument drift, detector non-linearity, as well as non-systematic random noise. The effect of clutter on models for sample classification or regression can be mitigated through use of a clutter model. These models can be derived in a number of ways such as combined class-centered data, background characterization or y-block gradient. Once obtained, they can be used to construct filters to be used in preprocessing, such as Generalized Least Squares Weighting (GLSW) and External Parameter Orthogonalization (EPO). Clutter models can also be used directly with alternative model forms based on Classical Least Squares (CLS) such as Extended Least Squares (ELS). This talk discusses methods for obtaining clutter models and demonstrates their use in a number of applications.
Over the past dozen years, a number of powerful spectral analysis methods have been published which make use of orthogonalization (i.e. projection followed by weighted subtraction) of interferences or "clutter." These filtering methods provide a means to mitigate the effect of interferences arising from background chemical or physical species, instrumental artifacts, systematic sampling errors and instrument or system drift. They have been used very effectively with complex biological systems, remote sensing applications, chemical process monitoring and calibration transfer problems.
This class of methods includes Orthogonal Partial Least Squares (O-PLS), External Parameter Orthogonalization (EPO), Dynamic Orthogonal Projection (DOP), Orthogonal Signal Correction (OSC), Constrained Principal Spectral Analysis (CPSA), Generalized Least Squares Weighting (GLSW), and Science Based Calibration (SBC), among others. All are based on the orthogonalization premise, and each touts a unique ability to improve model performance, robustness, and/or interpretability.
Some relationships between these methods are noted, along with ties to older work. Examples are given of the use of the methods in calibration and classification problems in pharmaceutical, petrochemical and remote
![Page 3: Using Clutter to Improve Models - Eigenvector · Clutter, defined as theconfoundingeffects ofinterferingchemical species, physical effects, noise andinstrument non-idealities, is](https://reader034.vdocument.in/reader034/viewer/2022042513/5fa3ad9f8063300b9242489f/html5/thumbnails/3.jpg)
Outline
• What is clutter?
• Orthogonalization filters
• How to get a clutter model
• Ways to deal with clutter
• Examples
![Page 4: Using Clutter to Improve Models - Eigenvector · Clutter, defined as theconfoundingeffects ofinterferingchemical species, physical effects, noise andinstrument non-idealities, is](https://reader034.vdocument.in/reader034/viewer/2022042513/5fa3ad9f8063300b9242489f/html5/thumbnails/4.jpg)
What is “Clutter?”
• A confused multitude of things: a condition in which things are not in their expected places
• Radar Clutter Definition: (DOD, NATO) Unwanted signals, echoes, or images on the face of the display tube, which interfere with observation of desired signals.
• Variations in the signal (e.g. spectra) that are not due to the factor of interest (e.g. analyte) but arise from systematic or random effects
![Page 5: Using Clutter to Improve Models - Eigenvector · Clutter, defined as theconfoundingeffects ofinterferingchemical species, physical effects, noise andinstrument non-idealities, is](https://reader034.vdocument.in/reader034/viewer/2022042513/5fa3ad9f8063300b9242489f/html5/thumbnails/5.jpg)
Measured Signal
• Clutter is present in all measurements
  – X-block, Y-block
[Diagram labels: Measured Signal; Target Signal; Clutter Signal; Interference Signal; Noise]
![Page 6: Using Clutter to Improve Models - Eigenvector · Clutter, defined as theconfoundingeffects ofinterferingchemical species, physical effects, noise andinstrument non-idealities, is](https://reader034.vdocument.in/reader034/viewer/2022042513/5fa3ad9f8063300b9242489f/html5/thumbnails/6.jpg)
Sources of Clutter
• Systematic background variability
  – in the system being sensed
    • Interfering analytes not of interest
    • Changes in particle size distribution
    • T, P changes
    • Variable sample matrix, e.g. pH
  – due to physics of instrument
    • Drift, optics clouding
    • Instrument maintenance
    • Variable baseline or gain
• Non-systematic random noise
  – homoscedastic, heteroscedastic
![Page 7: Using Clutter to Improve Models - Eigenvector · Clutter, defined as theconfoundingeffects ofinterferingchemical species, physical effects, noise andinstrument non-idealities, is](https://reader034.vdocument.in/reader034/viewer/2022042513/5fa3ad9f8063300b9242489f/html5/thumbnails/7.jpg)
Orthogonalization Filters
• Remove clutter from data which interferes with signal of interest
• Filters return spectra with clutter “removed”
• “Hard” orthogonalization is projection of a subspace out of the data
• “Soft” orthogonalization is deweighting but not outright complete subtraction
![Page 8: Using Clutter to Improve Models - Eigenvector · Clutter, defined as theconfoundingeffects ofinterferingchemical species, physical effects, noise andinstrument non-idealities, is](https://reader034.vdocument.in/reader034/viewer/2022042513/5fa3ad9f8063300b9242489f/html5/thumbnails/8.jpg)
Some Examples Using Orthogonalization Filters (by Eigenvector)
• In vivo tissue identification with NIR probe
• Cancer detection using in vivo fluorescence
• Identification of atherosclerosis in artery walls using NIR
• Determination of hydroxide concentration in high-concentration aqueous ion solutions using Raman spectroscopy
• Identification of chemical species in remote sensing
![Page 9: Using Clutter to Improve Models - Eigenvector · Clutter, defined as theconfoundingeffects ofinterferingchemical species, physical effects, noise andinstrument non-idealities, is](https://reader034.vdocument.in/reader034/viewer/2022042513/5fa3ad9f8063300b9242489f/html5/thumbnails/9.jpg)
SOME Orthogonalization Filters
Method 1: Orthogonalization of Model
• OSC – Orthogonal Signal Correction (Wold et al. 1998)
• OPLS – Orthogonal PLS (Trygg, Wold 2002, patented)
• MOSC – Modified OSC (POSC - Feudale, Tan, S. Brown 2003)
Method 2: Pre-selection of "clutter"
• CPSA – Constrained Principal Spectral Analysis (J. Brown 1990, patented)
• EPO – External Parameter Orthogonalization (Roger et al. 2003)
• GLS – Generalized Least Squares (Aitken 1935, Martens et al. 2003)
• SBC – Science Based Calibration (Marbach 2005, patented)
• EMSC – Extended Multiplicative Scatter Correction (Martens, Stark)
• ELS/EMM – Extended Least Squares/Extended Mixture Model
![Page 10: Using Clutter to Improve Models - Eigenvector · Clutter, defined as theconfoundingeffects ofinterferingchemical species, physical effects, noise andinstrument non-idealities, is](https://reader034.vdocument.in/reader034/viewer/2022042513/5fa3ad9f8063300b9242489f/html5/thumbnails/10.jpg)
Two General Approaches
Method 1: Orthogonalization of Model
[Flow diagram blocks: Calibration Spectra; Y-block (Classes); PCA or PLS Decomposition; Scores & Loadings; Orthogonalize To Y-block; Clutter Loadings; Filtered Spectra; Repeat for Multiple Components]
Method 2: Pre-selection of "clutter" (Focusing on this)
[Flow diagram blocks: Clutter Spectra; PCA Decomposition; Clutter Loadings; Choose Subset; Filter]
![Page 11: Using Clutter to Improve Models - Eigenvector · Clutter, defined as theconfoundingeffects ofinterferingchemical species, physical effects, noise andinstrument non-idealities, is](https://reader034.vdocument.in/reader034/viewer/2022042513/5fa3ad9f8063300b9242489f/html5/thumbnails/11.jpg)
Pre-selection Methods…
[Flow diagram blocks: Clutter Spectra; PCA Decomposition; Clutter Loadings; Choose Subset; Filter]
All the same…
• CPSA – Constrained Principal Spectral Analysis
• EPO – External Parameter Orthogonalization
  – Identical
  – Choose # of PCs
• GLS – Generalized Least Squares
• SBC – Science Based Calibration
  – Quite similar
  – Down-weight by scale of eigenvalues
• EMSC – Extended MSC
• EMM/ELS – Extended Mixture Model
  – CLS type models
![Page 12: Using Clutter to Improve Models - Eigenvector · Clutter, defined as theconfoundingeffects ofinterferingchemical species, physical effects, noise andinstrument non-idealities, is](https://reader034.vdocument.in/reader034/viewer/2022042513/5fa3ad9f8063300b9242489f/html5/thumbnails/12.jpg)
Pre-selecting Clutter
How to get clutter? Look at differences in samples which should otherwise be the same.
In classification – all samples within a class should nominally be the same!
Use Calibration itself!
[Flow diagram blocks: Calibration Spectra; Clutter Spectra; PCA Decomposition; Clutter Loadings; Choose Subset; Filter]
![Page 13: Using Clutter to Improve Models - Eigenvector · Clutter, defined as theconfoundingeffects ofinterferingchemical species, physical effects, noise andinstrument non-idealities, is](https://reader034.vdocument.in/reader034/viewer/2022042513/5fa3ad9f8063300b9242489f/html5/thumbnails/13.jpg)
More on How to Get Clutter
• Pure component spectra of known interferences
• Subspace spanned by
  – samples where analyte of interest is not present
  – variation in data that is all of the same class
  – repeat measurement of blanks
  – off-target pixels in remote sensing
• Make it up! e.g. polynomial baseline shapes
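The "make it up" option can be sketched in a few lines. This is a minimal sketch, assuming a hard (projection) filter and a channel axis rescaled to [-1, 1]; the function names `polynomial_clutter_basis` and `project_out` and the synthetic peak are illustrative assumptions, not something taken from the talk.

```python
import numpy as np

def polynomial_clutter_basis(n_channels, degree=2):
    """Orthonormal basis (n_channels x (degree + 1)) spanning polynomial baselines."""
    axis = np.linspace(-1.0, 1.0, n_channels)
    shapes = np.vander(axis, degree + 1, increasing=True)  # columns: 1, v, v^2, ...
    basis, _ = np.linalg.qr(shapes)                        # orthonormalize the shapes
    return basis

def project_out(X, basis):
    """Hard orthogonalization: remove the subspace spanned by `basis` from each row of X."""
    return X - (X @ basis) @ basis.T

# Synthetic example (assumed data): one peak plus five different quadratic baselines.
rng = np.random.default_rng(0)
n_channels = 200
axis = np.linspace(-1.0, 1.0, n_channels)
peak = np.exp(-0.5 * (axis / 0.05) ** 2)
baselines = rng.normal(size=(5, 3)) @ np.vander(axis, 3, increasing=True).T
X = peak + baselines

B = polynomial_clutter_basis(n_channels, degree=2)
X_filtered = project_out(X, B)
```

After filtering, the five rows differ only by numerical round-off, because every baseline lies inside the made-up clutter subspace. The peak itself is also slightly altered, since part of it overlaps that subspace.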
![Page 14: Using Clutter to Improve Models - Eigenvector · Clutter, defined as theconfoundingeffects ofinterferingchemical species, physical effects, noise andinstrument non-idealities, is](https://reader034.vdocument.in/reader034/viewer/2022042513/5fa3ad9f8063300b9242489f/html5/thumbnails/14.jpg)
Y-gradient Method
• Sort samples by y (reference) values
• Take differences between adjacent samples
• Weight X-differences by inverse of difference in y values
• Deweight by covariance of differences (GLS) or orthogonalize against some number of PCs (EPO, ELS, EMM, PA-CLS)
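The first three steps of the y-gradient method can be sketched as below. This is only a sketch under stated assumptions: the small offset `eps` (to avoid division by zero when two samples share a y value) and the normalization of the weights to a maximum of 1 are illustrative choices, and `y_gradient_clutter` is a hypothetical name rather than a function from any toolbox.

```python
import numpy as np

def y_gradient_clutter(X, y, eps=1e-3):
    """Weighted differences between y-adjacent samples, used as clutter rows."""
    order = np.argsort(y, kind="stable")     # sort samples by reference value
    dX = np.diff(X[order], axis=0)           # differences between adjacent samples
    dy = np.diff(y[order])
    weights = 1.0 / (np.abs(dy) + eps)       # inverse of the y difference (eps is an assumption)
    weights /= weights.max()                 # assumed normalization, largest weight = 1
    return dX * weights[:, None]

# Synthetic example (assumed data): signal scales with y, interferent varies independently.
signal = np.array([1.0, 2.0, 3.0, 2.0, 1.0, 0.0, 0.0, 0.0])
interferent = np.array([0.0, 0.0, 0.0, 1.0, 2.0, 3.0, 2.0, 1.0])
y = np.array([1.0, 1.0, 2.0, 2.0, 3.0, 3.0])
t = np.array([0.0, 1.0, 0.0, 1.0, 0.0, 1.0])
X = np.outer(y, signal) + np.outer(t, interferent)

D = y_gradient_clutter(X, y)
_, _, Vt = np.linalg.svd(D, full_matrices=False)
leading_direction = Vt[0]                    # dominant clutter direction
```

Pairs of samples with nearly equal y but different spectra receive the largest weights, so in this toy case the leading direction of the weighted differences lines up with the interferent rather than with the signal. The rows of `D` can then be passed to a GLS-type or EPO-type filter.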
![Page 15: Using Clutter to Improve Models - Eigenvector · Clutter, defined as theconfoundingeffects ofinterferingchemical species, physical effects, noise andinstrument non-idealities, is](https://reader034.vdocument.in/reader034/viewer/2022042513/5fa3ad9f8063300b9242489f/html5/thumbnails/15.jpg)
Clutter Covariance

$$X_c = \underbrace{(X_{1,c} - \bar{x}_{1,c})}_{\text{Clutter source 1}} + \underbrace{(X_{2,c} - \bar{x}_{2,c})}_{\text{Clutter source 2}} + \dots$$

$$C = \frac{X_c^T X_c}{N - 1}$$
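A minimal sketch of the clutter covariance for the classification case follows. It assumes that each clutter source is one class, that each class block is centered on its own mean, and that the centered blocks are stacked row-wise before forming the covariance (one reading of the "+" on the slide). The names `class_centered_clutter` and `clutter_covariance` and the small data set are illustrative.

```python
import numpy as np

def class_centered_clutter(X, labels):
    """Subtract each class mean from the rows of that class."""
    Xc = np.empty_like(X, dtype=float)
    for cls in np.unique(labels):
        rows = labels == cls
        Xc[rows] = X[rows] - X[rows].mean(axis=0)
    return Xc

def clutter_covariance(Xc):
    """C = Xc^T Xc / (N - 1), with N the number of clutter rows."""
    return Xc.T @ Xc / (Xc.shape[0] - 1)

# Two classes, three samples each (assumed data).
X = np.array([[1.0, 2.0, 3.0],
              [1.2, 2.1, 2.9],
              [0.8, 1.9, 3.1],
              [5.0, 4.0, 3.0],
              [5.3, 4.2, 2.8],
              [4.7, 3.8, 3.2]])
labels = np.array([0, 0, 0, 1, 1, 1])

Xc = class_centered_clutter(X, labels)
C = clutter_covariance(Xc)
```

Because the class means are removed first, `C` describes only the within-class variation, which is the part treated as clutter. The between-class differences that the classifier needs are left out of it.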
![Page 16: Using Clutter to Improve Models - Eigenvector · Clutter, defined as theconfoundingeffects ofinterferingchemical species, physical effects, noise andinstrument non-idealities, is](https://reader034.vdocument.in/reader034/viewer/2022042513/5fa3ad9f8063300b9242489f/html5/thumbnails/16.jpg)
Covariance to Clutter Basis

$$C = V S^2 V^T$$

For basis choose some number of factors:

$$B = V_{1 \ldots k}$$
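The step from covariance to a hard filter can be sketched as follows. This is a minimal sketch, assuming spectra are stored as rows and that the filter is the EPO-style projection of the first k eigenvectors out of the data; `clutter_basis` and `epo_filter` are hypothetical names and the clutter data are synthetic.

```python
import numpy as np

def clutter_basis(C, k):
    """First k eigenvectors of the clutter covariance (largest eigenvalues first)."""
    eigvals, eigvecs = np.linalg.eigh(C)       # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]
    return eigvecs[:, order[:k]]

def epo_filter(X, B):
    """Hard filter: x_f = x - x B B^T for every row x of X."""
    return X - (X @ B) @ B.T

# Synthetic rank-2 clutter in 10 channels (assumed data).
rng = np.random.default_rng(1)
directions = rng.normal(size=(2, 10))
X_clutter = rng.normal(size=(20, 2)) @ directions
C = X_clutter.T @ X_clutter / (X_clutter.shape[0] - 1)

B = clutter_basis(C, k=2)
residual = epo_filter(X_clutter, B)            # clutter left after filtering
```

With k equal to the true rank of the clutter, the filtered clutter is zero to round-off. Choosing k smaller leaves some clutter in, and choosing it larger risks removing signal that overlaps the weaker clutter directions.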
![Page 17: Using Clutter to Improve Models - Eigenvector · Clutter, defined as theconfoundingeffects ofinterferingchemical species, physical effects, noise andinstrument non-idealities, is](https://reader034.vdocument.in/reader034/viewer/2022042513/5fa3ad9f8063300b9242489f/html5/thumbnails/17.jpg)
Covariance to GLS Weighting Matrix

$$C = V S^2 V^T$$

$$G = V D^{-1} V^T \quad \text{(weighting matrix)}$$

with

$$d_{i,i}^{-1} = \frac{1}{\dfrac{s_{i,i}^2}{\alpha^2} + 1}$$

Large α → ∞, dimension unaffected
Small α → 0, dimension eliminated
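A soft filter built from the clutter covariance can be sketched as below. This is a minimal sketch that takes the diagonal weights as 1 / (s² / α² + 1), following the terms that can be read from the slide; other formulations of the weights (for example with a square root) exist, so treat the exact form as an assumption. `glsw_matrix` is a hypothetical name and the clutter data are synthetic.

```python
import numpy as np

def glsw_matrix(C, alpha):
    """Weighting matrix G = V D^-1 V^T with d_ii^-1 = 1 / (s_ii^2 / alpha^2 + 1)."""
    s2, V = np.linalg.eigh(C)                  # eigenvalues s^2 and eigenvectors V
    s2 = np.clip(s2, 0.0, None)                # guard against tiny negative round-off
    d_inv = 1.0 / (s2 / alpha**2 + 1.0)
    return (V * d_inv) @ V.T                   # scales column i of V by d_inv[i]

# Synthetic rank-2 clutter in 10 channels (assumed data).
rng = np.random.default_rng(0)
directions = rng.normal(size=(2, 10))
X_clutter = rng.normal(size=(20, 2)) @ directions
C = X_clutter.T @ X_clutter / (X_clutter.shape[0] - 1)

G_soft = glsw_matrix(C, alpha=1e-3)            # small alpha: clutter directions nearly eliminated
G_off = glsw_matrix(C, alpha=1e6)              # large alpha: filter does almost nothing
X_weighted = X_clutter @ G_soft                # filtered spectra are x G
```

The two limits on the slide show up directly: with a very large α the matrix is close to the identity, and with a very small α the filter behaves like a hard projection of the clutter subspace. Directions with zero clutter variance are left unchanged for any α.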
![Page 18: Using Clutter to Improve Models - Eigenvector · Clutter, defined as theconfoundingeffects ofinterferingchemical species, physical effects, noise andinstrument non-idealities, is](https://reader034.vdocument.in/reader034/viewer/2022042513/5fa3ad9f8063300b9242489f/html5/thumbnails/18.jpg)
Choosing Components
[Figure: Eigenvalues of Clutter; x-axis: Principal Component Number (1 to 11); y-axis: log(eigenvalues); annotations: k=3, k=4, k=5 and decreasing α]

EPO / CPSA

$$x_f = x - x P_k P_k^T$$

GLS / SBC

$$x_f = x - x P D P^T$$

One adjustable parameter in each method
![Page 19: Using Clutter to Improve Models - Eigenvector · Clutter, defined as theconfoundingeffects ofinterferingchemical species, physical effects, noise andinstrument non-idealities, is](https://reader034.vdocument.in/reader034/viewer/2022042513/5fa3ad9f8063300b9242489f/html5/thumbnails/19.jpg)
Other Similar Pre-selection Filters…
• Extended Mixture Model (Extended Least Squares) orthogonal filtering for Classical Least Squares (CLS) models!
[Diagram: S is built from the Target (Calibration) Spectra, S_target, and the Clutter Spectra, S_clutter]

$$c = x S (S^T S)^{-1}$$

Pseudo-inverse is an orthogonalization!
Equivalent to full-rank EPO / CPSA model
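The claim that the pseudo-inverse acts as an orthogonalization can be checked numerically. This is a minimal sketch, assuming synthetic Gaussian bands for the target and clutter spectra and spectra stored as columns of S (so the row-vector formula on the slide becomes a least-squares solve); `cls_predict`, `gauss` and `epo` are hypothetical helper names.

```python
import numpy as np

def cls_predict(x, S):
    """CLS estimate c = x S (S^T S)^-1, computed with a least-squares solve."""
    coef, *_ = np.linalg.lstsq(S, np.atleast_2d(x).T, rcond=None)
    return coef.T

axis = np.linspace(0.0, 1.0, 100)

def gauss(center, width):
    return np.exp(-0.5 * ((axis - center) / width) ** 2)

# Assumed pure-component spectra: one target band, two overlapping clutter bands.
s_target = gauss(0.50, 0.05)
S_clutter = np.column_stack([gauss(0.55, 0.08), gauss(0.40, 0.10)])
x = 2.0 * s_target + S_clutter @ np.array([0.7, -0.3])   # true target amount is 2.0

# Plain CLS ignores the clutter and is biased.
c_plain = cls_predict(x, s_target[:, None])[0, 0]

# Extended model: augment S with the clutter spectra.
S = np.column_stack([s_target, S_clutter])
c_els = cls_predict(x, S)[0, 0]

# Full-rank hard filter: project the clutter subspace out of both x and the target spectrum.
Qc, _ = np.linalg.qr(S_clutter)

def epo(v):
    return v - Qc @ (Qc.T @ v)

c_epo = cls_predict(epo(x), epo(s_target)[:, None])[0, 0]
```

The augmented model and the filter-then-CLS route return the same target estimate, which is the equivalence stated on the slide. The plain CLS estimate is off because the clutter bands overlap the target band.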
![Page 20: Using Clutter to Improve Models - Eigenvector · Clutter, defined as theconfoundingeffects ofinterferingchemical species, physical effects, noise andinstrument non-idealities, is](https://reader034.vdocument.in/reader034/viewer/2022042513/5fa3ad9f8063300b9242489f/html5/thumbnails/20.jpg)
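Extended Multiplicative Scatter Correction
• EMSC attempts to correct for scatter that appears in forms other than just linear using the extended mixture model

$$\mathbf{s}_2 = \begin{bmatrix} \mathbf{s}_{ref} & \boldsymbol{\upsilon}^2 & \boldsymbol{\upsilon} & \mathbf{1} \end{bmatrix} \begin{bmatrix} c_1 \\ \mathbf{c}_P \end{bmatrix}$$

$$\mathbf{c} = (\mathbf{Z}^T \mathbf{Z})^{-1} \mathbf{Z}^T \mathbf{s}_2$$

$$\mathbf{s}_{2,corrected} = (\mathbf{s}_2 - \mathbf{P}\mathbf{c}_P) / c_1$$

where

$$\mathbf{P}_{N \times K} = \begin{bmatrix} \boldsymbol{\upsilon}^2 & \boldsymbol{\upsilon} & \mathbf{1} \end{bmatrix}$$

$$\mathbf{Z}_{N \times (1+K)} = \begin{bmatrix} \mathbf{s}_{ref} & \mathbf{P} \end{bmatrix}$$

$$\mathbf{c} = \begin{bmatrix} c_1 \\ \mathbf{c}_P \end{bmatrix}$$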
![Page 21: Using Clutter to Improve Models - Eigenvector · Clutter, defined as theconfoundingeffects ofinterferingchemical species, physical effects, noise andinstrument non-idealities, is](https://reader034.vdocument.in/reader034/viewer/2022042513/5fa3ad9f8063300b9242489f/html5/thumbnails/21.jpg)
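EMSC

$$\mathbf{s}_2 = \begin{bmatrix} \mathbf{s}_{ref} & \mathbf{S} & \mathbf{P} & \mathbf{Q} \end{bmatrix} \mathbf{c}$$

$$\mathbf{c} = (\mathbf{Z}^T \mathbf{Z})^{-1} \mathbf{Z}^T \mathbf{s}_2$$

$$\mathbf{s}_{2,corrected} = (\mathbf{s}_2 - \mathbf{P}\mathbf{c}_P - \mathbf{Q}\mathbf{c}_Q) / c_1$$

where

$$\mathbf{P}_{N \times K} = \begin{bmatrix} \boldsymbol{\upsilon}^2 & \boldsymbol{\upsilon} & \mathbf{1} \end{bmatrix}$$

$$\mathbf{Z}_{N \times (1+J+K+L)} = \begin{bmatrix} \mathbf{s}_{ref} & \mathbf{S}_A & \mathbf{P} & \mathbf{Q} \end{bmatrix}$$

$$\mathbf{c}^T_{1 \times (1+J+K+L)} = \begin{bmatrix} c_1 & \mathbf{c}_S^T & \mathbf{c}_P^T & \mathbf{c}_Q^T \end{bmatrix}$$

• can add spectra of known target analyte $\mathbf{S}_{A, N \times J}$
• can add spectra or basis of clutter $\mathbf{Q}_{N \times L}$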
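The EMSC correction can be sketched as a single least-squares fit followed by a subtraction and a rescaling. This is a minimal sketch under stated assumptions: the channel axis υ is rescaled to [-1, 1], the polynomial block is quadratic, the optional clutter block Q is included but the known-analyte block S is left out for brevity, and the result is divided by the fitted multiplier c1. `emsc_correct` is a hypothetical name and the spectra are synthetic.

```python
import numpy as np

def emsc_correct(spectra, s_ref, poly_degree=2, Q=None):
    """Fit each spectrum to Z = [s_ref, P, Q], subtract the P and Q parts, divide by c1."""
    spectra = np.atleast_2d(spectra)
    v = np.linspace(-1.0, 1.0, s_ref.size)         # assumed scaling of the channel axis
    P = np.vander(v, poly_degree + 1)              # columns: v^2, v, 1
    blocks = [s_ref[:, None], P]
    if Q is not None:
        blocks.append(Q)                           # optional clutter spectra or basis
    Z = np.hstack(blocks)
    coef, *_ = np.linalg.lstsq(Z, spectra.T, rcond=None)   # c = (Z^T Z)^-1 Z^T s
    c1 = coef[0]                                   # multiplier on the reference spectrum
    nuisance = Z[:, 1:] @ coef[1:]                 # P c_P + Q c_Q
    return ((spectra.T - nuisance) / c1).T

# Synthetic example (assumed data): scaled reference plus baseline and one clutter band.
v = np.linspace(-1.0, 1.0, 150)
s_ref = np.exp(-0.5 * ((v - 0.1) / 0.10) ** 2)
q = np.exp(-0.5 * ((v + 0.5) / 0.15) ** 2)
measured = np.array([0.5 * s_ref + 0.3 + 0.2 * v,
                     1.5 * s_ref - 0.1 + 0.4 * v**2 + 0.8 * q])

corrected = emsc_correct(measured, s_ref, Q=q[:, None])
```

In this idealized case both corrected spectra collapse onto the reference, because the measured spectra contain nothing outside the span of the reference, the polynomial and the clutter band. With real spectra the residual after the fit stays in the corrected spectrum and carries the chemical information.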
![Page 22: Using Clutter to Improve Models - Eigenvector · Clutter, defined as theconfoundingeffects ofinterferingchemical species, physical effects, noise andinstrument non-idealities, is](https://reader034.vdocument.in/reader034/viewer/2022042513/5fa3ad9f8063300b9242489f/html5/thumbnails/22.jpg)
We think it is useful to use Clutter!
![Page 23: Using Clutter to Improve Models - Eigenvector · Clutter, defined as theconfoundingeffects ofinterferingchemical species, physical effects, noise andinstrument non-idealities, is](https://reader034.vdocument.in/reader034/viewer/2022042513/5fa3ad9f8063300b9242489f/html5/thumbnails/23.jpg)
Example Classification Data
• Mid-IR spectra of food grade oils
• Classify oils, detect adulterated olive oil
[Figure: mid-IR spectra, x-axis running from 4000 to 500; annotation: Using these regions only]
![Page 24: Using Clutter to Improve Models - Eigenvector · Clutter, defined as theconfoundingeffects ofinterferingchemical species, physical effects, noise andinstrument non-idealities, is](https://reader034.vdocument.in/reader034/viewer/2022042513/5fa3ad9f8063300b9242489f/html5/thumbnails/24.jpg)
PCA Scores Plot of Oils
[Figure: PCA scores plot; class labels: Olive oil, Corn oil, Safflower oil, Corn margarine]
Selected regions, mean centering only
![Page 25: Using Clutter to Improve Models - Eigenvector · Clutter, defined as theconfoundingeffects ofinterferingchemical species, physical effects, noise andinstrument non-idealities, is](https://reader034.vdocument.in/reader034/viewer/2022042513/5fa3ad9f8063300b9242489f/html5/thumbnails/25.jpg)
GLS α = 1
![Page 26: Using Clutter to Improve Models - Eigenvector · Clutter, defined as theconfoundingeffects ofinterferingchemical species, physical effects, noise andinstrument non-idealities, is](https://reader034.vdocument.in/reader034/viewer/2022042513/5fa3ad9f8063300b9242489f/html5/thumbnails/26.jpg)
GLS α = 0.3
![Page 27: Using Clutter to Improve Models - Eigenvector · Clutter, defined as theconfoundingeffects ofinterferingchemical species, physical effects, noise andinstrument non-idealities, is](https://reader034.vdocument.in/reader034/viewer/2022042513/5fa3ad9f8063300b9242489f/html5/thumbnails/27.jpg)
GLS α = 0.1
![Page 28: Using Clutter to Improve Models - Eigenvector · Clutter, defined as theconfoundingeffects ofinterferingchemical species, physical effects, noise andinstrument non-idealities, is](https://reader034.vdocument.in/reader034/viewer/2022042513/5fa3ad9f8063300b9242489f/html5/thumbnails/28.jpg)
GLS α = 0.03
![Page 29: Using Clutter to Improve Models - Eigenvector · Clutter, defined as theconfoundingeffects ofinterferingchemical species, physical effects, noise andinstrument non-idealities, is](https://reader034.vdocument.in/reader034/viewer/2022042513/5fa3ad9f8063300b9242489f/html5/thumbnails/29.jpg)
GLS α = 0.01
![Page 30: Using Clutter to Improve Models - Eigenvector · Clutter, defined as theconfoundingeffects ofinterferingchemical species, physical effects, noise andinstrument non-idealities, is](https://reader034.vdocument.in/reader034/viewer/2022042513/5fa3ad9f8063300b9242489f/html5/thumbnails/30.jpg)
GLS α = 0.003
![Page 31: Using Clutter to Improve Models - Eigenvector · Clutter, defined as theconfoundingeffects ofinterferingchemical species, physical effects, noise andinstrument non-idealities, is](https://reader034.vdocument.in/reader034/viewer/2022042513/5fa3ad9f8063300b9242489f/html5/thumbnails/31.jpg)
Calibration with MSC
[Figure: Samples/Scores Plot of Olive Oil Calibration; x-axis: Scores on PC 1 (84.99%); y-axis: Scores on PC 2 (12.71%); legend: CMarg, Corn, Olive, Saffl]
![Page 32: Using Clutter to Improve Models - Eigenvector · Clutter, defined as theconfoundingeffects ofinterferingchemical species, physical effects, noise andinstrument non-idealities, is](https://reader034.vdocument.in/reader034/viewer/2022042513/5fa3ad9f8063300b9242489f/html5/thumbnails/32.jpg)
Cal and Test with MSC
[Figure: Samples/Scores Plot of Olive Oil Calibration & Oiltest; x-axis: Scores on PC 1 (84.99%); y-axis: Scores on PC 2 (12.71%)]
![Page 33: Using Clutter to Improve Models - Eigenvector · Clutter, defined as theconfoundingeffects ofinterferingchemical species, physical effects, noise andinstrument non-idealities, is](https://reader034.vdocument.in/reader034/viewer/2022042513/5fa3ad9f8063300b9242489f/html5/thumbnails/33.jpg)
With MSC and GLS
[Figure: Samples/Scores Plot of Olive Oil Calibration & Oiltest; x-axis: Scores on PC 1 (61.84%); y-axis: Scores on PC 2 (37.55%)]
![Page 34: Using Clutter to Improve Models - Eigenvector · Clutter, defined as theconfoundingeffects ofinterferingchemical species, physical effects, noise andinstrument non-idealities, is](https://reader034.vdocument.in/reader034/viewer/2022042513/5fa3ad9f8063300b9242489f/html5/thumbnails/34.jpg)
Zoom on Olive Oil
[Figure: Samples/Scores Plot of Olive Oil Calibration & Oiltest; x-axis: Scores on PC 1 (61.84%); y-axis: Scores on PC 2 (37.55%); labeled groups: Calibration and test Olive Oil, Adulterated Olive Oil]
![Page 35: Using Clutter to Improve Models - Eigenvector · Clutter, defined as theconfoundingeffects ofinterferingchemical species, physical effects, noise andinstrument non-idealities, is](https://reader034.vdocument.in/reader034/viewer/2022042513/5fa3ad9f8063300b9242489f/html5/thumbnails/35.jpg)
Zoom on Corn and Safflower Oil
[Figure: Samples/Scores Plot of Olive Oil Calibration & Oiltest; x-axis: Scores on PC 1 (61.84%); y-axis: Scores on PC 2 (37.55%); labeled groups: Calibration and test Corn Oil, Calibration and test Safflower Oil]
![Page 36: Using Clutter to Improve Models - Eigenvector · Clutter, defined as theconfoundingeffects ofinterferingchemical species, physical effects, noise andinstrument non-idealities, is](https://reader034.vdocument.in/reader034/viewer/2022042513/5fa3ad9f8063300b9242489f/html5/thumbnails/36.jpg)
With MSC and EPO
[Figure: Samples/Scores Plot of Olive Oil Calibration & Oiltest; x-axis: Scores on PC 1 (88.03%); y-axis: Scores on PC 2 (11.52%)]
![Page 37: Using Clutter to Improve Models - Eigenvector · Clutter, defined as theconfoundingeffects ofinterferingchemical species, physical effects, noise andinstrument non-idealities, is](https://reader034.vdocument.in/reader034/viewer/2022042513/5fa3ad9f8063300b9242489f/html5/thumbnails/37.jpg)
Indian Pines Data
• Classic image data set used in many publications
• Crop area near West Lafayette, Indiana
• Ground truth identified 16 known crop areas
• Data from AVIRIS: Airborne Visible/Infrared Imaging Spectrometer
• 220 channels, 400-2500 nm
![Page 38: Using Clutter to Improve Models - Eigenvector · Clutter, defined as theconfoundingeffects ofinterferingchemical species, physical effects, noise andinstrument non-idealities, is](https://reader034.vdocument.in/reader034/viewer/2022042513/5fa3ad9f8063300b9242489f/html5/thumbnails/38.jpg)
Indian Pines Image
![Page 39: Using Clutter to Improve Models - Eigenvector · Clutter, defined as theconfoundingeffects ofinterferingchemical species, physical effects, noise andinstrument non-idealities, is](https://reader034.vdocument.in/reader034/viewer/2022042513/5fa3ad9f8063300b9242489f/html5/thumbnails/39.jpg)
Soybean Fields
[Figure legend: Soybeans no till; Soybeans min; Soybeans clean]
![Page 40: Using Clutter to Improve Models - Eigenvector · Clutter, defined as theconfoundingeffects ofinterferingchemical species, physical effects, noise andinstrument non-idealities, is](https://reader034.vdocument.in/reader034/viewer/2022042513/5fa3ad9f8063300b9242489f/html5/thumbnails/40.jpg)
PLS-DA, Mean-Center Only
Class Probability Image
![Page 41: Using Clutter to Improve Models - Eigenvector · Clutter, defined as theconfoundingeffects ofinterferingchemical species, physical effects, noise andinstrument non-idealities, is](https://reader034.vdocument.in/reader034/viewer/2022042513/5fa3ad9f8063300b9242489f/html5/thumbnails/41.jpg)
PLS-DA, EPO 1-PC
Class Probability Image
![Page 42: Using Clutter to Improve Models - Eigenvector · Clutter, defined as theconfoundingeffects ofinterferingchemical species, physical effects, noise andinstrument non-idealities, is](https://reader034.vdocument.in/reader034/viewer/2022042513/5fa3ad9f8063300b9242489f/html5/thumbnails/42.jpg)
Example Calibration Data
• IDRC-2002 Shootout data
• NIR Transflectance of pharmaceutical tablets
• Goal is to predict assay value
[Figure: tablet spectra; x-axis: Wavelength (nm), 600 to 1800; y-axis: Signal Intensity]
![Page 43: Using Clutter to Improve Models - Eigenvector · Clutter, defined as theconfoundingeffects ofinterferingchemical species, physical effects, noise andinstrument non-idealities, is](https://reader034.vdocument.in/reader034/viewer/2022042513/5fa3ad9f8063300b9242489f/html5/thumbnails/43.jpg)
Calibration and Test with MSC & MC
[Figure: Samples/Scores Plot of calibrate_1,c & test_1; x-axis: Y Measured 3 assay; y-axis: Y Predicted 3 assay]
R^2 = 0.964
2 Latent Variables
RMSEC = 3.3253
RMSEP = 3.3487
Calibration Bias = 0
Prediction Bias = -0.4224
![Page 44: Using Clutter to Improve Models - Eigenvector · Clutter, defined as theconfoundingeffects ofinterferingchemical species, physical effects, noise andinstrument non-idealities, is](https://reader034.vdocument.in/reader034/viewer/2022042513/5fa3ad9f8063300b9242489f/html5/thumbnails/44.jpg)
With MSC, GLS & MC
[Figure: Samples/Scores Plot of calibrate_1,c & test_1; x-axis: Y Measured 3 assay; y-axis: Y Predicted 3 assay]
R^2 = 0.984
2 Latent Variables
RMSEC = 2.5171
RMSEP = 2.159
Calibration Bias = -1.1369e-13
Prediction Bias = 0.067298
![Page 45: Using Clutter to Improve Models - Eigenvector · Clutter, defined as theconfoundingeffects ofinterferingchemical species, physical effects, noise andinstrument non-idealities, is](https://reader034.vdocument.in/reader034/viewer/2022042513/5fa3ad9f8063300b9242489f/html5/thumbnails/45.jpg)
With MSC, EPO & MC
[Figure: Samples/Scores Plot of calibrate_1,c & test_1; x-axis: Y Measured 3 assay; y-axis: Y Predicted 3 assay]
R^2 = 0.979
2 Latent Variables
RMSEC = 3.0015
RMSEP = 2.3951
Calibration Bias = -8.5265e-14
Prediction Bias = 0.18893
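The RMSEP values reported on the three tablet slides can be put side by side with a short calculation. This is only arithmetic on the numbers shown above; the dictionary keys and the choice of relative reduction as the comparison are illustrative.

```python
# RMSEP values as reported on the slides for the IDRC-2002 tablet data.
rmsep = {
    "MSC & MC": 3.3487,
    "MSC, GLS & MC": 2.159,
    "MSC, EPO & MC": 2.3951,
}

baseline = rmsep["MSC & MC"]

# Relative reduction in prediction error, in percent, versus MSC and mean centering alone.
reduction = {
    name: 100.0 * (baseline - value) / baseline
    for name, value in rmsep.items()
}

for name, value in reduction.items():
    print(f"{name}: {value:.1f}% lower RMSEP than MSC & MC")
```

On these numbers the GLS filter lowers the prediction error by roughly a third and the EPO filter by a little over a quarter, both relative to MSC with mean centering only.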
![Page 46: Using Clutter to Improve Models - Eigenvector · Clutter, defined as theconfoundingeffects ofinterferingchemical species, physical effects, noise andinstrument non-idealities, is](https://reader034.vdocument.in/reader034/viewer/2022042513/5fa3ad9f8063300b9242489f/html5/thumbnails/46.jpg)
Orthogonalization Filters

| Filter | Soft/Hard | Adj. Params | Clutter source | Improves Prediction? |
|---|---|---|---|---|
| OSC | Hard | # LVs | Part of X orthogonal to y | No, but reduces model complexity |
| O-PLS | Hard | # LVs | Part of X-model space orthogonal to X’y | No, but sometimes improves interpretation |
| MOSC | Hard | # PCs | Part of X orthogonal to y | Maybe |
| CPSA | Hard | # PCs | A priori, includes pathlength adj. | Yes |
| EPO | Hard | # PCs | Classes, y-gradient or a priori | Yes |
| DOP | Hard | # PCs | Synthetic reference samples | Yes |
| GLS | Soft | Shrinkage α | Classes, y-gradient or a priori | Yes |
| SBC | Soft | # PCs (20?) | Repeat samples or blanks | Yes |
| EMM | Hard | None | A priori from known interferents, clutter subspace | Yes, CLS model |
| ELS | Hard | # PCs | Clutter subspace | Yes |
| PA-CLS | Hard | None/# PCs | Baseline shapes, residuals | Yes, CLS model |
| WLS | Soft | Regularization | Noise measurements | Yes |
![Page 47: Using Clutter to Improve Models - Eigenvector · Clutter, defined as theconfoundingeffects ofinterferingchemical species, physical effects, noise andinstrument non-idealities, is](https://reader034.vdocument.in/reader034/viewer/2022042513/5fa3ad9f8063300b9242489f/html5/thumbnails/47.jpg)
Conclusions
• Main differences between methods are
  – How the clutter is defined
  – Whether the de-weighting is hard or soft
• Filtering methods are more similar than published statements might have you believe
• Methods achieve similar results, model performance generally improved (except O-PLS, OSC)
• Interpretation of filtered results can be challenging – except OPLS (ideally)