Dr. Marina Cocchi, Dpt Chemical and Geological Sciences - UniMORE, Arctic Analysis 10-15 March [email protected]
Class-modelling vs Discriminant classification
A practitioner's reflections
Dept. of Chemical and Geological Sciences, University of Modena and Reggio Emilia
Via G. Campi, 183 - 41125 Modena (Italy)
Servizi Tracciabilità, Autenticità, Monitoraggio Processo e prodotto
www.chemstamp.it
Dr. Marina Cocchi Researcher Analytical Chemistry (Chemometrics)
Sunday, March 16, 2014
outline
o The one class modeling context
o Retrospective on SIMCA
o PLS-DA and its implementation
Classification task
The one class modeling context
Class Modeling
‣ single class information is used
‣ useful in authentication
Discriminant methods
‣ information from the other class(es) is always used
[Figures by F. Marini]
The one class modeling context
Why class-modeling? E.g. the authentication issue
Need to identify non-conformity, adulterations...
Sometimes we know what we want!!!
‣ there is a category to contrast with
Most often we would like to know!!!
‣ to find whether an adulterant is present, even without knowing what it could be
Chemical Signature
‣ we expect something to change in the signal profiles if an “unknown” constituent is added
The one class modeling context
‣ the “unknown” has a similar profile but some peculiar features
‣ in the model space of the modeled class(es) the “unknown” will be accepted / close to the class, but it will have high residuals
‣ a discriminant method will assign it to one of the classes by definition
‣ Does the paradigm “one class against the rest of the world” make sense? (e.g. ill vs healthy)
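The point about an “unknown” that sits close to the class in model space but shows high residuals can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the data are synthetic, the two "pure profiles" are invented, and `fit_class_model` and `distances` are hypothetical helper names, not part of any software mentioned in the talk.

```python
import numpy as np

def fit_class_model(X, n_components):
    """PCA model of a single class: mean, loadings (M x A), score variances."""
    mean = X.mean(axis=0)
    # Rows of Vt are the principal directions of the centred class data
    _, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    loadings = Vt[:n_components].T
    score_var = s[:n_components] ** 2 / (X.shape[0] - 1)
    return mean, loadings, score_var

def distances(x, model):
    """Return (SD, OD) of one sample with respect to the class model."""
    mean, loadings, score_var = model
    xc = x - mean
    t = xc @ loadings                 # position inside the model space
    e = xc - t @ loadings.T           # part of the signal the model cannot describe
    sd = np.sqrt(np.sum(t ** 2 / score_var))   # Mahalanobis-type score distance
    od = np.sqrt(np.sum(e ** 2))               # orthogonal (residual) distance
    return sd, od

# Synthetic "authentic" class: mixtures of two invented pure profiles plus noise
rng = np.random.default_rng(0)
profiles = np.array([[1.0, 2.0, 3.0, 2.0, 1.0, 0.0],
                     [0.0, 1.0, 0.0, 1.0, 0.0, 1.0]])
conc = rng.uniform(0.5, 1.5, size=(30, 2))
X = conc @ profiles + 0.01 * rng.standard_normal((30, 6))
model = fit_class_model(X, n_components=2)

typical = np.array([1.0, 1.0]) @ profiles
unknown = typical + np.array([0.0, 0.0, 0.0, 0.0, 0.5, 0.0])  # one peculiar feature

train_sd, train_od = np.array([distances(x, model) for x in X]).T
sd_u, od_u = distances(unknown, model)
print(f"unknown: SD = {sd_u:.2f} (training max {train_sd.max():.2f}), "
      f"OD = {od_u:.2f} (training max {train_od.max():.2f})")
```

In this toy case the score distance of the “unknown” stays within the range of the training samples, while its orthogonal distance is far larger than any training residual, which is the situation a pure discriminant rule cannot flag.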
The one class modeling context
Distinguish different valued Italian products

Modeled categories
CLASS      Training set   Test set
Liguria    12             9
Apulia     12             10

‣ how do the models consider “non-Italian samples”?

[Figure: ECVA model, 3 LV in inner PLS* (axis: Canonical Variate Scores); SIMCA model Liguria, 3 LV** (axes: Score Distance/SDlim vs Orthogonal Distance/ODlim)]
* model dimensionality according to minimum CV classification rate, or ** CV Efficiency
The one class modeling context
[Figure: ECVA weights (+ Apulia, - Liguria) and SIMCA PC1 loadings for Liguria, shown with the std for each category. Annotations: “different concentration”, “common”, “main variance for Liguria”]
The one class modeling context
discriminant PLS model, 3 LV
[Figure panels: calibration, prediction]
again the model is built on 2 classes, Apulia and Liguria
‣ how do the models consider “non-Italian samples”?
Retrospective on SIMCA
1. Build a distinct PCA model for each class (separate centering/scaling & dimensionality choice)
2. Define a distance measure to the class model
3. Define a limit/boundary for acceptability (classification rule)
4. For a “new” sample estimate distances from each class model
5. objects may be assigned to one, more than one, or none of the classes
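The five steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not any package's implementation: only the orthogonal distance is used, the acceptance limit is a made-up rule (three times the RMS training distance), the data are synthetic, and `fit_pca`, `fit_simca` and `assign` are hypothetical names.

```python
import numpy as np

def fit_pca(X, n_components):
    """Step 1: separate centring and PCA for one class."""
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components].T

def orthogonal_distance(x, model):
    """Step 2: distance from the class model space (norm of the residual)."""
    mean, P = model
    xc = x - mean
    e = xc - (xc @ P) @ P.T
    return np.sqrt(np.sum(e ** 2))

def fit_simca(classes, n_components, factor=3.0):
    """Steps 1-3 for every class; `factor` defines an illustrative limit only."""
    models = {}
    for name, X in classes.items():
        model = fit_pca(X, n_components)
        od_train = np.array([orthogonal_distance(x, model) for x in X])
        limit = factor * np.sqrt(np.mean(od_train ** 2))   # assumed acceptance rule
        models[name] = (model, limit)
    return models

def assign(x, models):
    """Steps 4-5: list of all classes that accept the sample (may be empty)."""
    return [name for name, (model, limit) in models.items()
            if orthogonal_distance(x, model) <= limit]

# Two synthetic one-component classes whose model lines cross at the origin
rng = np.random.default_rng(1)
s = rng.uniform(-2, 2, size=(25, 1))
XA = s * np.array([1.0, 0.0, 0.0]) + 0.05 * rng.standard_normal((25, 3))
XB = s * np.array([0.0, 1.0, 0.0]) + 0.05 * rng.standard_normal((25, 3))
models = fit_simca({"A": XA, "B": XB}, n_components=1)

print(assign(np.array([1.5, 0.0, 0.0]), models))   # accepted by one class
print(assign(np.array([0.0, 0.0, 0.0]), models))   # accepted by both classes
print(assign(np.array([1.5, 1.5, 0.0]), models))   # accepted by none
```

Returning a list rather than a single label keeps the three outcomes of step 5 explicit; a score-distance check would normally be added alongside the orthogonal one.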
[Figure: SIMCA methods; some objects are not assigned, others are assigned to both classes]
Retrospective on SIMCA
SIMCA critical/differing steps

Common frame
SD = Scores Distance: distance within model space. Uses PCA scores.
OD = Orthogonal Distance: distance from model space. Uses residuals.

2. Define a distance measure to the class model
Variations
‣ how they are defined
   SD: Euclidean / Mahalanobis / Leverage; from origin / from boundaries
   OD: DoF correction
‣ use CV residuals and scores, or not
‣ use both / only one of the two
‣ how to combine

3. Define limit(s)/boundary(ies) for acceptability (classification rule)
original-SIMCA
‣ combine SD & OD for an “external” object in a single “distance to class” measure: D
‣ use an F-test to compare D with the class residual variance (RSD)
alternative-SIMCA
‣ use SD & OD distributions/statistics: same / different
‣ use SD & OD limits: robust / distribution-free
Implementation 1: SIMCA original [1-2]
[1] S. Wold, Pattern Recognition, 1976, 127; [2] S. Wold, M. Sjostrom, ACS Symposium Series, 1977, 243-281.

[Figure: class space in the PC1-PC2 plane, showing $s_p^{(q)}$, $t_a - \vartheta_{a,lim}^{(q)}$ and $d_p^{(q)}$; inside the class box $d_p^{(q)} = s_p^{(q)}$]

i = 1:M, M = n° of variables; k = 1:N, N = n° of objects of class q; a = 1:A, A = n° of components

Total RSD of a class, i.e. q:
$s_0^{(q)} = \sqrt{\sum_{ik} e_{ik}^2 / [(N-A-1)(M-A)]}$

Distance from a class, i.e. q, of a test object, i.e. p:
$d_p^{(q)} = \sqrt{ (s_p^{(q)})^2 + \sum_a \Phi_a^2 \, (t_a - \vartheta_{a,lim}^{(q)})^2 }$   (first term: OD; second term: SD)
$s_p^{(q)} = \sqrt{\sum_k e_{pk}^2 / (M-A)}$   (the sum runs over the residuals of object p)
$\Phi_a^2 = (s_p^{(q)})^2 / \left( \sum_k (t_{ak}^{(q)})^2 / N_q \right)$
$\vartheta_{a,lim}^{(q)}$ can be: $t_{max}$, $t_{max}$ + std(t), CI(t), ...

★ SD = 0 for calibration samples
★ SD is weighted by $\Phi_a^2$
★ F-test because it is a comparison of variances

Classification rule
$F = (d_p^{(q)})^2 / (s_0^{(q)})^2 < F_{crit}[(M-A),\ (N-A-1)(M-A)]$
If true for both q and r, unique assignment only if
$F = (d_p^{(q)})^2 / (d_p^{(r)})^2 > F_{crit}[(M-A_r),\ (M-A_q)]$
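A minimal sketch of the original-SIMCA distance and F ratio defined on this slide is given below, under several assumptions: the score limits are taken as the training minimum and maximum (one of the listed options), the critical value is an approximate tabulated number typed in by hand, the data are synthetic, and `fit_class` and `f_ratio` are hypothetical names.

```python
import numpy as np

def fit_class(X, A):
    """Class PCA model plus the pooled residual standard deviation s0."""
    N, M = X.shape
    mean = X.mean(axis=0)
    Xc = X - mean
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:A].T
    T = Xc @ P
    E = Xc - T @ P.T
    s0 = np.sqrt(np.sum(E ** 2) / ((N - A - 1) * (M - A)))
    return {"mean": mean, "P": P, "T": T, "s0": s0, "A": A, "M": M}

def f_ratio(x, m):
    """F = d_p^2 / s0^2 for a test object; box limits assumed as [t_min, t_max]."""
    xc = x - m["mean"]
    t = xc @ m["P"]
    e = xc - t @ m["P"].T
    sp2 = np.sum(e ** 2) / (m["M"] - m["A"])               # OD term, s_p^2
    t_min, t_max = m["T"].min(axis=0), m["T"].max(axis=0)
    # Distance of each score from the class box; zero when inside the limits
    excess = np.maximum(0.0, np.maximum(t - t_max, t_min - t))
    phi2 = sp2 / np.mean(m["T"] ** 2, axis=0)              # weights Phi_a^2
    d2 = sp2 + np.sum(phi2 * excess ** 2)                  # OD^2 + weighted SD^2
    return d2 / m["s0"] ** 2

rng = np.random.default_rng(2)
profiles = np.array([[1.0, 2.0, 3.0, 2.0, 1.0],
                     [1.0, 0.0, -1.0, 0.0, 1.0]])
conc = rng.uniform(0.5, 1.5, size=(20, 2))
X = conc @ profiles + 0.05 * rng.standard_normal((20, 5))
m = fit_class(X, A=2)

F_CRIT = 2.79  # approximate tabulated F(0.95; M-A = 3, (N-A-1)(M-A) = 51)
inside = m["mean"]                                          # the class centroid
outside = m["mean"] + np.array([0.0, 0.0, 0.0, 0.0, 3.0])   # strong extra signal
for name, x in (("inside", inside), ("outside", outside)):
    F = f_ratio(x, m)
    print(name, round(float(F), 2), "accepted" if F < F_CRIT else "rejected")
```

Passing the critical value as a constant keeps the sketch free of extra dependencies; in practice it would come from an F-distribution routine, with whichever degrees-of-freedom convention is chosen.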
Retrospective on SIMCA
variations on Theme SIMCA original

DoF corrections:
$F = (d_p^{(q)})^2 / (s_0^{(q)})^2$, compared with $F_{crit}[(M-A),\ (N-A-1)(M-A)]$
• correct F by the factor (N-1)/(N-A-1)
Variants:
• ½[(N-A-1)(M-A)] (to enlarge acceptance)
• use (r-A)(N-A-1)/N, (N-A-1)(r-A), with r = M if N > M and r = N-1 if N <= M
‣ correct F by the factor N/(N-A-1) and use $F_{crit}[1,\ (N-A-1)]$
See references 15-18 in [1]

Distances:
✓ use cross-validated residuals in the Orthogonal Distance [1, 2] (DoF in the F-test were left unchanged)
✓ use only the Orthogonal Distance [2]
✓ ... use only Leverage values [3]
✓ use the Mahalanobis distance as SD

[Figure: class space in the PC1-PC2 plane, showing $s_p^{(q)}$]

1. R.D. Maesschalck, A. Candolfi, D.L. Massart, S. Heuerding, Chemom. Intell. Lab. Syst. 47 (1999) 65-77
2. G.R. Flaten, B. Grung, O.M. Kvalheim, Chemom. Intell. Lab. Syst. 72 (2004) 101
3. Todeschini et al., Chemom. Intell. Lab. Syst. 87 (2007) 3–17.
Implementation in Software: SIMCA original

SIMCA Umetrics
pooled RSD of a class, i.e. q: $s_0^{(q)} = \sqrt{\sum_{ik} e_{ik}^2 / [(N-A-1)(M-A)]}$
absolute DModX(q) = $s_i^{(q)} = \sqrt{\sum_k e_{ik}^2 / (M-A)} \cdot v$ (calibration samples)
where v is a correction factor slightly higher than 1, accounting for the fact that the distance to the model is expected to be slightly smaller for a sample that is part of it
absolute DModXPS(q) = $s_p^{(q)} = \sqrt{\sum_k e_{pk}^2 / (M-A)}$ (“external” samples), i.e. OD
augmented DModXPS+(q) = $d_p^{(q)} = \sqrt{(s_p^{(q)})^2 + SD^2}$
Classification rule
$(s_i^{(q)} / s_0^{(q)})^2$ is approximately F-distributed, with the DoF of the observation and of the model

SIMCA PARVUS
The classification rule is as in Umetrics, but F is corrected by the factor (N-1)/(N-A-1). Moreover, variants are introduced:
‣ $\vartheta_{a,lim}^{(q)}$ can be restricted
‣ D is considered as the hypotenuse from the class box
‣ residuals are corrected by leverage
Retrospective on SIMCA
Implementation in Software: SIMCA original

SIMCA Unscrambler
$s_0^{(q)}$ = sqrt(ResXValTot) =(?) $\sqrt{\sum_{ik} e_{ik}^2 / [N_{val} (M-A)/M]}$
ModelDist(q) = $s_p^{(q)} = \sqrt{\sum_k e_{pk}^2 / (M-A)}$ (“external” samples), i.e. OD

Classification Membership Limits
‣ Leverage Limit: <= 3*(A+1)/N
‣ Sample to model distance Limit: $(s_p^{(q)} / s_0^{(q)})$ is approximately F-distributed
   Leverage correction: Fcrit 1, (N-A-1)
   Cross Validation/Test: Fcrit 1, Nval
It is not clear to me how the two limits are used.
Retrospective on SIMCA
alternative SIMCA: a somewhat alternative approach (coming from the MSPC realm)

[Figure: PCA model plane (PC1, PC2) in the space of Variable 1, Variable 2 and Variable 3, with T² measured within the plane and Q orthogonal to it]

Common frame
SD = Scores Distance: use the Mahalanobis Distance / Hotelling's T-square (T²)
OD = Orthogonal Distance: Q is the sum of squared residuals, $\sum_k e_{pk}^2$ (same role as $(s_p^{(q)})^2$); Qlim is calculated by assuming a χ² distribution
Variations
‣ which reference distribution for SD
‣ use of robust estimation
‣ use CV scores

Classification rule
Use a summary of the reduced distances SD/SDlim & OD/ODlim
Variations
‣ unique assignment / not
Retrospective on SIMCA
alternative SIMCA
Implementation II: PLS-Toolbox SIMCA

Hotelling's T-square statistic: $T_i^2 = t_i \lambda^{-1} t_i^T$
$T_{lim}^2$ = [A*(N-1)/N*(N-A)] * Fcrit(A, N-A)
$Q = \sum_k e_{pk}^2$; Qlim: χ² distribution, JM approximation

Classification rule
Assign an object to a class if its reduced combined distance satisfies:
$\sqrt{(T^2/T_{lim}^2)^2 + (Q/Q_{lim})^2} < \sqrt{2}$
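The T²/Q rule can be sketched as below. Several things are assumptions rather than statements about PLS-Toolbox: the combined reduced distance is taken as sqrt((T²/T²lim)² + (Q/Qlim)²) with a threshold of √2, the T² limit uses the commonly quoted form A(N-1)/(N-A)·F (the slide's own bracket should take precedence where it differs), the Q limit uses the Jackson-Mudholkar approximation with a hand-typed normal deviate, and all function names and data are invented.

```python
import numpy as np

def fit_t2_q_model(X, A, f_crit, c_alpha=1.645):
    """PCA class model with T2 and Q limits.

    f_crit  : tabulated F critical value for (A, N-A) dof, supplied by the user
    c_alpha : standard normal deviate for the confidence level (1.645 ~ 95 %)
    """
    N = X.shape[0]
    mean = X.mean(axis=0)
    _, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    eigvals = s ** 2 / (N - 1)                 # eigenvalues of the covariance matrix
    P, lam = Vt[:A].T, eigvals[:A]
    t2_lim = A * (N - 1) / (N - A) * f_crit    # assumed form of the T2 limit
    # Jackson-Mudholkar approximation, built from the discarded eigenvalues
    th1, th2, th3 = (np.sum(eigvals[A:] ** k) for k in (1, 2, 3))
    h0 = 1.0 - 2.0 * th1 * th3 / (3.0 * th2 ** 2)
    q_lim = th1 * (c_alpha * np.sqrt(2.0 * th2 * h0 ** 2) / th1
                   + 1.0 + th2 * h0 * (h0 - 1.0) / th1 ** 2) ** (1.0 / h0)
    return {"mean": mean, "P": P, "lam": lam, "t2_lim": t2_lim, "q_lim": q_lim}

def combined_reduced_distance(x, m):
    xc = x - m["mean"]
    t = xc @ m["P"]
    t2 = np.sum(t ** 2 / m["lam"])                 # Hotelling's T2
    q = np.sum((xc - t @ m["P"].T) ** 2)           # Q: sum of squared residuals
    return np.sqrt((t2 / m["t2_lim"]) ** 2 + (q / m["q_lim"]) ** 2)

def accepted(x, m):
    """Assumed rule: accept when the combined reduced distance is below sqrt(2)."""
    return bool(combined_reduced_distance(x, m) < np.sqrt(2.0))

rng = np.random.default_rng(4)
profiles = np.array([[1.0, 2.0, 3.0, 2.0, 1.0],
                     [1.0, 0.0, -1.0, 0.0, 1.0]])
conc = rng.uniform(0.5, 1.5, size=(30, 2))
X = conc @ profiles + 0.05 * rng.standard_normal((30, 5))
m = fit_t2_q_model(X, A=2, f_crit=3.34)   # approx. tabulated F(0.95; 2, 28)

centroid = m["mean"]
outlier = m["mean"] + np.array([0.0, 0.0, 0.0, 0.0, 3.0])
print(accepted(centroid, m), accepted(outlier, m))
```

Dividing each statistic by its own limit puts T² and Q on a common scale, which is why they can be combined without the scaling factor needed in the original SIMCA distance.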
robust SIMCA [1,2]
- use robust PCA for the class models
- use a weighted sum of the “reduced” orthogonal distance (OD, i.e. $(s_p^{(q)})^2$ in the previous slide) and the “reduced” Mahalanobis distance (MD, in scores space)
- thus the classification rule is to assign a new sample to the class for which R is minimal:
R = γ (OD/ODlim) + (1-γ) (MD/MDlim)
where MDlim is estimated by using as reference the chi-squared distribution with DoF equal to the number of retained components (A), and ODlim is estimated by assuming that a scaled version of the chi-squared distribution is appropriate (Wilson-Hilferty approximation)
γ in the range 0-1

1. K.V. Branden and M. Hubert, Chemolab 79 (2005) pp. 10-21
2. M. Daszykowski, K. Kaczmareck, I. Stanimirova, Y. Vander Heyden, B. Walczak, Chemom. Intell. Lab. Syst. (2006)
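The weighted rule R = γ (OD/ODlim) + (1-γ) (MD/MDlim) is simple enough to write out directly. In this sketch the distances and limits are taken as already computed; the numbers in `example` are made up for illustration and the function names are hypothetical.

```python
def reduced_distance(od, md, od_lim, md_lim, gamma=0.5):
    """R = gamma * OD/ODlim + (1 - gamma) * MD/MDlim, with gamma in [0, 1]."""
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in the range 0-1")
    return gamma * od / od_lim + (1.0 - gamma) * md / md_lim

def assign_by_min_r(per_class, gamma=0.5):
    """per_class: dict name -> (od, md, od_lim, md_lim). Returns (class, R values)."""
    r = {name: reduced_distance(*vals, gamma=gamma)
         for name, vals in per_class.items()}
    return min(r, key=r.get), r

# Made-up distances of one new sample to two class models
example = {"Liguria": (0.5, 2.0, 1.0, 4.0), "Apulia": (3.0, 2.0, 1.0, 4.0)}
best, r_values = assign_by_min_r(example)
print(best, r_values)
```

With γ = 0.5 both distances carry the same weight; moving γ towards 1 makes the rule rely mostly on the orthogonal distance.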
variations on Theme alternative SIMCA
variations on Theme alternative SIMCA
Pomerantsev acceptance area [1]
Scores Distance = Leverage
‣ SD distribution [formula not recovered]; labelled terms: mean Leverage, DoF = Nh
‣ the author estimated the DoF by the method of moments
‣ the author prefers a robust estimate
‣ DoF = A if scores are normally distributed
Orthogonal Distance
‣ the same method as for SD is used
1. A. Pomerantsev, J. Chemometrics 22 (2008) 601–609.
Classification rule
Different acceptance areas are derived depending on how SD and OD are combined
variations on Theme alternative SIMCA
Pomerantsev acceptance area [1]
[Figure: candidate acceptance areas, labelled I, II and V]
I, II and V were considered acceptable; see article
variations on Theme alternative SIMCA

Orthogonal Distance: Q
Scores Distance:
‣ Leverage: $H_{fit} = \mathrm{diag}[T_{fit}(T_{fit}^T T_{fit})^{-1} T_{fit}^T]$; $H_{CV} = \mathrm{diag}[T_{CV}(T_{fit}^T T_{fit})^{-1} T_{CV}^T]$
‣ D-statistic: $D_{fit} = \mathrm{diag}[T_{fit}(S_{fit})^{-1} T_{fit}^T]$
T: scores matrix; S: scores covariance matrix

‣ Q, H limits:
   Hlim fit: $h_i N/(N-1) \sim \beta(M/2, (N-M-1)/2)$ [2]
   Dlim fit [1] ~ [A(N² - 1)/N(N-A)] F(A, N-A)
   Qlim fit: χ² distribution, JM approximation
   * H CV 95%: the 95th percentile of HCV
   * Q CV 95%: the 95th percentile of QCV

Classification rule (reduced distances as in PLS toolbox)

[1] J.A. Westerhuis, S.P. Gurden and A.K. Smilde, Chemolab. 51 (2000) 95-114.
[2] M. Forina, Chemolab. 96 (2009) 239-245
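The leverage and D-statistic expressions on this slide can be checked numerically. The sketch below assumes centred PCA scores from synthetic data and uses hypothetical function names; the `T_new` argument stands in for scores of samples not used in the fit (for example cross-validated ones).

```python
import numpy as np

def leverage(T_fit, T_new=None):
    """diag[T_new (T_fit' T_fit)^-1 T_new']; T_new defaults to T_fit (H_fit)."""
    T_new = T_fit if T_new is None else T_new
    G_inv = np.linalg.inv(T_fit.T @ T_fit)
    # Row-wise quadratic form, avoids building the full N x N matrix
    return np.einsum("ij,jk,ik->i", T_new, G_inv, T_new)

def d_statistic(T_fit, T_new=None):
    """diag[T_new S^-1 T_new'], S being the covariance of the (centred) fit scores."""
    T_new = T_fit if T_new is None else T_new
    S = T_fit.T @ T_fit / (T_fit.shape[0] - 1)
    return np.einsum("ij,jk,ik->i", T_new, np.linalg.inv(S), T_new)

rng = np.random.default_rng(5)
X = rng.standard_normal((20, 5))
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
A = 2
T = Xc @ Vt[:A].T                      # scores on the first A components

h = leverage(T)
d = d_statistic(T)
print(round(float(h.sum()), 6))        # the leverages sum to A
print(np.allclose(d, (T.shape[0] - 1) * h))
```

For centred scores the two statistics differ only by the factor N-1, so the choice between them matters mainly through the reference distribution used for the limit.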
Implementation I & II comparison

original in SIMCA (I): score and orthogonal distances are combined as a “rsd” measure
‣ the square distance/rsd of a “new” sample is compared to the square rsd of the class by F-test
‣ the score distance is computed from the “box” boundaries (but can be taken from the average)
‣ problematic degrees of freedom for the F-test
‣ a scaling factor is needed in order to combine the score distance with the orthogonal one

alternative in PLS-Toolbox SIMCA (II):
‣ reduced score distance and orthogonal distance are combined, thus being directly comparable
‣ the score distance is computed from the average
‣ two different reference distributions are assumed; problematic degrees of freedom (?)
‣ gives the same weight to both types of distance

Variants
‣ using T², MD avoids having to define “box” boundaries
‣ using only orthogonal distances does not take into account deviations inside the class space
‣ using the χ² distribution for both OD and SD
‣ do not combine OD and SD in a single rule

[Figure: class space (PC1, PC2) with $s_p^{(q)}$ for (I), and a T² vs Q plot with T²lim and Qlim for (II). Marked regions: rejected by both approaches; rejected by original SIMCA but accepted by this rule; accepted by both approaches; accepted by rules using only the orthogonal distance?]
[Figure: Dlim as a function of the number of objects (0 to 100), with curves for different ratios objs/LVs: one third, one fourth, one half, N-1]
Dlim fit [1] ~ [A(N² - 1)/N(N-A)] F(A, N-A)
DASCO: between LDA and SIMCA
Distances:
✓ use a modified Mahalanobis distance defined by using both SD and OD
Bayes rule: [formula not recovered]
DASCO: min [formula not recovered; the expression contains an SD term and an OD term]
1. I. Frank, Chemom. Intell. Lab. Syst. 4 (1988) 215-222
Regression on dummy variables (DPLS, PLS-DA)
Y coding: as many dummy y-variables as classes (1/0). Fit a PLS2 model.

DPLS
Predict the dummy values (yi_pred) and assign the object to the group with the highest predicted value, k = argmax(yi_pred)
‣ with more than 2 categories this can be sub-optimal (masking effect, Hastie [1])

PLS-DA
It has been demonstrated that PLS on dummy variables has a connection with LDA [2], or with Fisher canonical discriminant analysis; in fact it corresponds to the eigendecomposition of a slightly altered version (M) of the between-group variance-covariance matrix (B)
‣ a better alternative would be to use LDA, QDA, etc. on PLS scores
Nocairi et al. [3] implemented PLS-DA so that it leads to the eigendecomposition of B

1 Hastie T. et al., The Elements of Statistical Learning, Springer: New York, 2001
2 Barker and Rayens, J. Chemometrics 2003 (17): 166-173
3 Nocairi et al., Comp. Stat. & Data Anal. 2005 (48) 139-147
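The dummy-coding and argmax rule described on this slide can be sketched with a small hand-written PLS2. This is an illustration under assumptions, not any package's algorithm: the weights are taken from the dominant singular vector of E'F at each deflation step, the three synthetic classes are deliberately placed in a triangle (not on a line) so that the masking effect does not show up, and all names are hypothetical.

```python
import numpy as np

def dummy_code(labels, classes):
    """One 1/0 column per class."""
    return (np.asarray(labels)[:, None] == np.asarray(classes)[None, :]).astype(float)

def pls2_fit(X, Y, n_components):
    """Small PLS2 with deflation; returns the means and the regression matrix B."""
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    E, F = X - x_mean, Y - y_mean
    W, P, Q = [], [], []
    for _ in range(n_components):
        # Weight vector: dominant left singular vector of E'F
        w = np.linalg.svd(E.T @ F, full_matrices=False)[0][:, 0]
        t = E @ w
        p = E.T @ t / (t @ t)
        q = F.T @ t / (t @ t)
        E = E - np.outer(t, p)
        F = F - np.outer(t, q)
        W.append(w); P.append(p); Q.append(q)
    W, P, Q = np.column_stack(W), np.column_stack(P), np.column_stack(Q)
    B = W @ np.linalg.inv(P.T @ W) @ Q.T
    return x_mean, y_mean, B

def dpls_predict(X, model, classes):
    """Predict the dummy values and assign each object by argmax."""
    x_mean, y_mean, B = model
    y_pred = (X - x_mean) @ B + y_mean
    return np.asarray(classes)[np.argmax(y_pred, axis=1)], y_pred

rng = np.random.default_rng(3)
classes = ["A", "B", "C"]
centres = np.array([[0, 0, 0, 0], [10, 0, 0, 0], [0, 10, 0, 0]], dtype=float)
labels = np.repeat(classes, 15)
X = np.repeat(centres, 15, axis=0) + rng.standard_normal((45, 4))
Y = dummy_code(labels, classes)

model = pls2_fit(X, Y, n_components=2)
pred, y_pred = dpls_predict(X, model, classes)
print((pred == labels).mean())
```

Note that argmax always returns one of the modeled classes, so a sample from none of them is still assigned somewhere; this is the behaviour contrasted with class modelling earlier in the talk.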
Regression on dummy variables (DPLS, PLS-DA)
Y coding: as many dummy y-variables as classes 1/0 Fit a PLS2 model
PLS-DA extension
A general framework has been proposed [1] by including prior probabilities in the derivation.
Given a classification problem with specified prior probabilities, a feasible strategy is to extract the PLS loading weights according to the dominant eigenvector of B, found by SVD of $Y^T X$ *
* X has to be centered by the weighted global means, taking the priors into account
Moreover, using LDA on the PLS2 scores or on the Y fitted values (not redundant, i.e. with k columns) gives the same results
1 U. Indahl, H. Martens, T. Naes, J. Chemometrics 2007; 21: 529–536
- Y coding: as many dummy y-variables as classes 1/0
- Fit a PLS2 model
‣ define a threshold
   ‣ UNSCRAMBLER: fixed, e.g. 0.5 (also suggests using LDA... on PLS scores)
   ‣ SIMCA-Umetrics: < 0.35 reject; 0.35-0.65 borderline; 0.65-1.35 accept; > 1.35 does not belong, possible extreme/outlier
   ‣ PLS-toolbox:
      ‣ threshold selected on the best SENS/SPEC compromise from the ROC curve (CV)
      ‣ Bayesian threshold
        The Bayesian threshold calculation assumes that the predicted y values follow a distribution similar to what will be observed for future samples. Using these estimated distributions, a threshold is selected at the point where the two estimated distributions cross; this is the y-value at which the number of false positives and false negatives should be minimized for future predictions (either in fit or CV prediction).
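The fixed and banded threshold rules quoted above can be written as two small functions. The band edges are the ones listed for SIMCA-Umetrics; how values falling exactly on an edge are treated is an assumption of this sketch, and the function names are hypothetical.

```python
def fixed_threshold(y_pred, threshold=0.5):
    """Fixed cut-off on the predicted dummy value (0.5 is the quoted example)."""
    return y_pred >= threshold

def banded_decision(y_pred):
    """Bands quoted for SIMCA-Umetrics; edge handling is assumed."""
    if y_pred < 0.35:
        return "reject"
    if y_pred < 0.65:
        return "borderline"
    if y_pred <= 1.35:
        return "accept"
    return "does not belong (possible extreme/outlier)"

for y in (0.10, 0.50, 0.90, 1.60):
    print(y, fixed_threshold(y), banded_decision(y))
```

The banded rule differs from a single cut-off in two ways: it leaves an explicit undecided zone, and it rejects predictions far above 1, which brings it closer to a class-modelling reading of the PLS output.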
Implementation in Software
Classification rule
discriminant PLS
Practitioner's reflections:
‣ there is a clear imbalance between literature and implementations
‣ there is a need for tutorials (back to basics... but a little beyond...)
‣ need to define a problem-oriented strategy, e.g. modeling vs. discrimination, etc.

for discussion:
‣ using the threshold rule(s) in the DPLS spanned space, how to relate to the theoretical FCDA framework or other?
‣ when there are more than 2 classes a “hybrid” use of the threshold can be found, i.e. use it as class boundaries and treat results as class modeling... does it make sense (??)
‣ is it correct to interpret PLS weights and regression coefficients in the usual way, forgetting that Y is a dummy matrix?
‣ what about target projection and SR in the discriminant PLS framework?
‣ could we think of a unified/generalized SIMCA, making a synthesis of the different derivations?
Thanks for your attention