![Page 1: Lecture note 02 - CHERIC• K. Pearson, "On Lines and Planes of Closest Fit to Systems of Points in Space,” Philosophical Magazine, 2(6), 559–572. (1901) • H. Hotelling,"Analysis](https://reader034.vdocument.in/reader034/viewer/2022042116/5e94a8548c4bac7754137ec5/html5/thumbnails/1.jpg)
Tutorial 2
• Tutorial 2: spectral example (tablet‐spectra.csv)
• 460 observations (tablets), NIR absorbance measured at 650 different wavelengths
• i.e., X is a (460×650) matrix
유준, Copyright © 26
Tutorial 2
• Matrix residuals
Tutorial 2
• Column residuals (from last lecture)
Tutorial 2: Column residuals
• Spectral example
Hotelling’s $T^2$
• Hotelling’s $T^2$
Tutorial 2: Hotelling’s $T^2$
• Spectral example
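The plot for this slide did not survive in the text, so the following is only a hedged sketch of how Hotelling's $T^2$ is commonly computed from PCA scores: for each observation, the squared scores are scaled by the variance of each score column and summed over the A components. The function name `hotellings_t2`, the synthetic data (a stand-in for tablet‐spectra.csv, which is not loaded here) and the choice A = 2 are assumptions for illustration.

```python
import numpy as np

def hotellings_t2(T):
    """T2_i = sum_a (t_ia / s_a)^2, where s_a is the std. deviation of score column a."""
    s = T.std(axis=0, ddof=1)
    return np.sum((T / s) ** 2, axis=1)

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 8))            # synthetic stand-in data (assumption)
Xc = X - X.mean(axis=0)                  # mean-center the columns
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
A = 2                                    # number of components (assumption)
T = Xc @ Vt[:A].T                        # (N x A) score matrix
t2 = hotellings_t2(T)                    # one T2 value per observation
```

Observations with unusually large `t2` values lie far from the center of the model plane; control limits are not computed in this sketch.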
NIPALS algorithm
• Non‐linear iterative partial least squares (NIPALS) algorithm
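The slides with the algorithm steps are not present in the text. As a minimal sketch, assuming a mean-centered X with no missing values, the commonly described NIPALS iteration for PCA alternates between regressing the columns of X onto a score vector and the rows of X onto the resulting loading vector, then deflates X before extracting the next component. The function name `nipals_pca`, the tolerance, and the synthetic data are illustrative assumptions.

```python
import numpy as np

def nipals_pca(X, n_components, tol=1e-10, max_iter=500):
    """Sketch of NIPALS for PCA on a data matrix without missing values."""
    E = X - X.mean(axis=0)              # residual matrix, deflated after each component
    N, K = E.shape
    T = np.zeros((N, n_components))     # scores
    P = np.zeros((K, n_components))     # loadings
    for a in range(n_components):
        # initial guess for t: the column of E with the largest variance
        t = E[:, np.argmax(E.var(axis=0))].copy()
        for _ in range(max_iter):
            p = E.T @ t / (t @ t)       # regress each column of E onto t
            p /= np.linalg.norm(p)      # rescale the loading to unit length
            t_new = E @ p               # regress each row of E onto p (p'p = 1)
            converged = np.linalg.norm(t_new - t) < tol * np.linalg.norm(t_new)
            t = t_new
            if converged:
                break
        T[:, a], P[:, a] = t, p
        E = E - np.outer(t, p)          # deflate: remove the fitted component
    return T, P, E

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 6)) * [5.0, 3.0, 1.0, 0.5, 0.3, 0.1]  # synthetic data
T, P, E = nipals_pca(X, n_components=2)
```

Because components are extracted one at a time, the iteration can stop after the first few components, which is one reason it is attractive for wide matrices such as spectra.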
Cross‐validation
• Cross‐validation
• A general tool for avoiding over‐fitting
• Can be applied to any model
[Figure: error versus # of parameters in the model, with curves for modeling (training) and prediction]
Cross‐validation
1. Rows of data (X) are divided into G groups
2. A PCA model is estimated for the data minus one group
3. Calculate the residual $\mathbf{E}_{g,CV}$ for the deleted group using the PCA model
4. Repeat steps 2 and 3 for each group and collect the residuals into $\mathbf{E}_{G,CV}$
※ PRESS: prediction error sum of squares; SSX: sum of squares of X
• $R^2$: how well the training data are explained by the model
• $Q^2$: how well the test data are explained by the model
$$Q^2 = 1 - \frac{\operatorname{var}(\mathbf{E}_{G,CV})}{\operatorname{var}(\mathbf{X})} = 1 - \frac{\mathrm{PRESS}}{\mathrm{SS}_X}$$
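The four cross-validation steps and the $Q^2$ formula can be sketched as follows. This is a simplified illustration, not necessarily what any particular software does: rows are assigned to groups in rotation, the PCA model is taken from an SVD of the training rows, and each deleted row is projected onto the loadings to get its residual. The function name `pca_q2`, the default of 7 groups, and the synthetic data are assumptions.

```python
import numpy as np

def pca_q2(X, n_components, n_groups=7):
    """Row-wise G-group cross-validated Q2 = 1 - PRESS / SSX for a PCA model."""
    N = X.shape[0]
    groups = np.arange(N) % n_groups            # step 1: split rows into G groups
    press, ssx = 0.0, 0.0
    for g in range(n_groups):
        train, test = X[groups != g], X[groups == g]
        mean = train.mean(axis=0)
        # step 2: PCA model for the data minus one group
        _, _, Vt = np.linalg.svd(train - mean, full_matrices=False)
        P = Vt[:n_components].T
        # step 3: residual of the deleted group using that model
        Xg = test - mean
        Eg = Xg - Xg @ P @ P.T
        press += np.sum(Eg ** 2)                # step 4: accumulate over all groups
        ssx += np.sum(Xg ** 2)
    return 1.0 - press / ssx

rng = np.random.default_rng(2)
scores = rng.normal(size=(60, 2)) * [4.0, 2.0]  # two real latent directions
X = scores @ rng.normal(size=(2, 10)) + 0.1 * rng.normal(size=(60, 10))
q2 = [pca_q2(X, a) for a in (1, 2, 3)]          # Q2 for A = 1, 2, 3
```

In this synthetic example $Q^2$ rises sharply from one to two components and then levels off. Because each held-out row is projected using all of its variables, this simplified $Q^2$ cannot decrease as components are added; schemes that delete individual elements rather than whole rows are one way of addressing that.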
Cross‐validation
• How many components are necessary?
Cross‐validation
• True number of principal components?
• No one knows.
• Recommendation
  • Use cross‐validation as a guide, and always look at a few extra components and step back a few components,
  • then make a judgement that is relevant to your intended use of the model.
  • Models that we intend to learn from, or to use for optimizing or monitoring a process, may well benefit from fewer or more components than suggested by cross‐validation.
Tutorial 3
• Food data (Foods.csv)
• Food consumption data from 16 EU countries
• % of households consuming different types of foods
• Objective: find any similarities / differences among countries using ProMV
Tutorial 3
• Food data
• % of households consuming different types of foods
Tutorial 3
• Food data
Tutorial 3
• In ProMV,
Some properties of PCA models
• The model is defined by the loading vectors, p1, p2, ... , pA; each is a (K×1) vector, and they can be collected into a single (K×A) loadings matrix, P.
• These vectors define a line for one component, a plane for 2 components, and a hyperplane for 3 or more components. This line, plane or hyperplane defines the latent variable model.
• An equivalent interpretation of the model plane is that these direction vectors are oriented in such a way that the scores have maximal variance for each component. No other direction of the loading vector (i.e. no other hyperplane) gives a greater variance.
Some properties of PCA models
• This plane is calculated with respect to a given data set, X, an (N×K) matrix, so that the direction vectors best‐fit the data. With one component, the best estimate of the original matrix X is then:
  $\hat{\mathbf{X}}_1 = \mathbf{t}_1\mathbf{p}_1^T$, or equivalently $\mathbf{X} = \mathbf{t}_1\mathbf{p}_1^T + \mathbf{E}_1$
• If we fit a second component:
  $\hat{\mathbf{X}}_2 = \mathbf{t}_1\mathbf{p}_1^T + \mathbf{t}_2\mathbf{p}_2^T$, or equivalently $\mathbf{X} = \mathbf{t}_1\mathbf{p}_1^T + \mathbf{t}_2\mathbf{p}_2^T + \mathbf{E}_2$
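As a small numerical illustration of the one- and two-component estimates, the sketch below builds $\hat{\mathbf{X}}_1$ and $\hat{\mathbf{X}}_2$ from score and loading vectors and compares the residuals $\mathbf{E}_1$ and $\mathbf{E}_2$. The data are synthetic, and the scores and loadings are taken from an SVD for brevity; both choices are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(30, 5)) * [3.0, 2.0, 1.0, 0.5, 0.2]  # synthetic (N x K) data
X = X - X.mean(axis=0)                     # mean-center the columns

U, S, Vt = np.linalg.svd(X, full_matrices=False)
t1, p1 = U[:, 0] * S[0], Vt[0]             # first score and loading vectors
t2, p2 = U[:, 1] * S[1], Vt[1]             # second score and loading vectors

X_hat1 = np.outer(t1, p1)                  # one-component estimate t1 p1'
E1 = X - X_hat1                            # residual after one component
X_hat2 = X_hat1 + np.outer(t2, p2)         # two-component estimate
E2 = X - X_hat2                            # residual after two components

# the residual sum of squares shrinks as components are added
print(np.sum(E1 ** 2), np.sum(E2 ** 2))
```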
Some properties of PCA models
• The loading vectors are of unit length: $\|\mathbf{p}_a\| = 1.0$
• The loading vectors are orthogonal to one another: $\mathbf{p}_i \perp \mathbf{p}_j$ for $i \neq j$
• The variance of the t1 vector must be greater than the variance of the t2 vector, and so on.
• Each loading direction, pa, must point in the direction that best explains the data; but this direction is not unique, since −pa also meets this criterion. If we did select −pa as the direction, then the scores would just be −ta instead. This does not matter too much, because $(-\mathbf{t}_a)(-\mathbf{p}_a)^T = \mathbf{t}_a\mathbf{p}_a^T$
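These properties can be checked numerically. The sketch below is an illustration on synthetic data with loadings taken from an SVD (both assumptions): it tests unit length, mutual orthogonality, decreasing score variance, and the fact that flipping the sign of both pa and ta leaves the product unchanged.

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(40, 4)) * [3.0, 2.0, 1.0, 0.5]   # synthetic (N x K) data
X = X - X.mean(axis=0)                                # mean-center the columns

_, _, Vt = np.linalg.svd(X, full_matrices=False)
P = Vt[:3].T                 # (K x A) loadings matrix, A = 3
T = X @ P                    # (N x A) scores matrix

unit_length = np.allclose(np.linalg.norm(P, axis=0), 1.0)   # each ||p_a|| = 1
orthogonal = np.allclose(P.T @ P, np.eye(3))                # p_i' p_j = 0 for i != j
score_var = T.var(axis=0, ddof=1)
decreasing = bool(np.all(np.diff(score_var) <= 0))          # var(t1) >= var(t2) >= ...

# flipping the sign of p_a flips t_a, so the product t_a p_a' is unchanged
p1, t1 = P[:, 0], T[:, 0]
same_product = np.allclose(np.outer(-t1, -p1), np.outer(t1, p1))
```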
Readings
• History
  • K. Pearson, “On Lines and Planes of Closest Fit to Systems of Points in Space,” Philosophical Magazine, 2(6), 559–572 (1901).
  • H. Hotelling, “Analysis of a Complex of Statistical Variables into Principal Components,” Journal of Educational Psychology, 24, 417–441, 498–520 (1933).
  • Papers by K. Karhunen (1947) in Russian and M. Loève (1948) in French
• NIPALS algorithm
  • H. Wold, “Estimation of principal components and related models by iterative least squares,” in Multivariate Analysis (Ed., Krishnaiah, P. R.), Academic Press, NY, pp. 391–420 (1966).
• Cross‐validation
  • S. Wold, “Cross‐validatory estimation of the number of components in factor and principal components models,” Technometrics, 20, 397–405 (1978).
Readings
• General
  • S. Wold, K. Esbensen, and P. Geladi, “Principal Component Analysis,” Chemometrics and Intelligent Laboratory Systems, 2, 37–52 (1987).
  • T. Kourti and J. MacGregor, “Process analysis, monitoring and diagnosis using multivariate projection methods – a tutorial,” Chemometrics and Intelligent Laboratory Systems, 28, 3–21 (1995).
  • J. MacGregor, H. Yu, S. García‐Muñoz, and J. Flores‐Cerrillo, “Data‐Based Latent Variable Methods for Process Analysis, Monitoring and Control,” Computers and Chemical Engineering, 29, 1217–1223 (2005).