20090504 IRLab Study Group: PCA (transcript)
Theory and Toolkits of PCA
2009/5/4 IRLab Study Group
Presenter: Chin-Hui Chen
Agenda
Theory:
1. Scenario
2. What is PCA?
3. How to minimize Squared-Error?
4. Dimensionality Reduction

Toolkit:
- A list of PCA toolkits
- Demo
Scenario (Point? Line?)
Consider a 2-dimensional space with scattered data points. Which single point, and which line, best fit the points in the least-squared-error sense, i.e. minimize the sum of squared distances d from the points?
What is PCA ? (1)
Principal component analysis (PCA) involves a mathematical procedure that transforms a number of possibly correlated variables into a smaller number of uncorrelated variables called “principal components”.
What is PCA ? (2)
What can PCA do?
- Dimensionality Reduction

For example:
- Assume N points in a D-dimensional space, e.g. {x1, x2, x3, x4} with xi = (v1, v2)
- Choose a set of M basis vectors for projection, e.g. {u1}
- The basis vectors are orthonormal (each has length 1, pairwise inner products are 0) and M << D, so each feature is represented in M dimensions
- After projection, e.g. xi = (p1)
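As a toy illustration of the example above (the numbers are my own, not from the slides): projecting a 2-D point onto a single orthonormal basis vector u1 yields its 1-D representation (p1).

```python
# Hypothetical numbers: project a 2-D point onto one orthonormal basis
# vector u1, giving a 1-D representation (p1), as described above.
import numpy as np

x1 = np.array([3.0, 4.0])               # a point in D = 2 dimensions
u1 = np.array([1.0, 1.0]) / np.sqrt(2)  # one orthonormal basis vector (M = 1)

p1 = u1 @ x1                            # the M-dim representation: x1 = (p1)
print(p1)                               # 7/sqrt(2) ≈ 4.95
```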
How to minimize Squared-Error ?
Consider a D-dimensional space:
- Given N points {x1, x2, …, xn}
- Each xi is a D-dimensional vector

How to:
1. Find a point that minimizes the squared error
2. Find a line that minimizes the squared error
How to ? - Point
Goal: find x0 minimizing J0(x0) = Σk ||x0 − xk||².
Let m = (1/n) Σk xk (the sample mean). Writing x0 − xk = (x0 − m) + (m − xk), the cross terms sum to zero, so J0(x0) = n||x0 − m||² + Σk ||m − xk||², which is minimized at x0 = m.
How to ? – Point - Line
∴ x0 = m, the sample mean.

1. Find a point that minimizes the squared error: done, x0 = m.
2. Find a line that minimizes the squared error:

Line L through m with unit direction e: xk′ − x0 = ak e, i.e. xk′ = x0 + ak e = m + ak e.
How to ? – Line
Line L: xk′ = m + ak e.
Goal: find a1 … an (and e) minimizing J1(a1, …, an, e) = Σk ||(m + ak e) − xk||².
How to ? – Line
Differentiating each term with respect to ak gives [2ak − 2eᵀ(xk − m)]; setting it to zero yields ak = eᵀ(xk − m).

What does it mean? Each ak is the length of the orthogonal projection of xk − m onto the direction e.
How to ? – Line
Then, how about e? Substituting ak = eᵀ(xk − m) back into J1 turns it into a function of e alone.
How to ? – Line
Let S = Σk (xk − m)(xk − m)ᵀ (the scatter matrix). Then
J1(e) = −eᵀSe + Σk ||xk − m||².
The second term is independent of e, so minimizing J1(e) is equivalent to maximizing eᵀSe.
How to ? – Line
To extremize f(x, y) subject to a constraint g(x, y) = 0, use a Lagrange multiplier.
Here J1′(e) = −eᵀSe must be minimized subject to |e| = 1 (i.e. eᵀe = 1), so form
u = eᵀSe − λ(eᵀe − 1).
Setting ∂u/∂e = 2Se − 2λe = 0 gives Se = λe.
How to ? – Line
What is S?

The covariance matrix (strictly, the scatter matrix: n times the covariance matrix). Assume D-dimensional data; then S is D × D, and its (i, j) entry measures how dimensions i and j vary together.
How to ? – Line
Given the data, we know S. Then, what is e? The eigenvectors of S.
Se = λe has the eigenvalue form AX = λX ("eigen" means "same": the transformation keeps an eigenvector's direction and only rescales it, by λ).
How to ? – conclusion
Summary:
- Find a line xk′ = m + ak e
- ak = eᵀ(xk − m)
- Se = λe; e is an eigenvector of the covariance matrix
- A D-dimensional space yields D eigenvectors
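The summary can be checked numerically. This is an illustrative NumPy sketch with synthetic data (my own construction, not from the slides): fitting the line through m along the top eigenvector of the scatter matrix, and verifying that the residual squared error equals the discarded eigenvalue.

```python
# Sketch: fit the least-squared-error line x' = m + ak*e with NumPy,
# following the derivation above. Data is synthetic (2-D, stretched).
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2)) @ np.array([[3.0, 0.0], [0.0, 0.5]])

m = X.mean(axis=0)                    # x0 = m, the sample mean
S = (X - m).T @ (X - m)               # scatter matrix: sum (xk-m)(xk-m)^T
eigvals, eigvecs = np.linalg.eigh(S)  # S is symmetric, so eigh applies
e = eigvecs[:, np.argmax(eigvals)]    # direction maximizing e^T S e

a = (X - m) @ e                       # ak = e^T (xk - m)
X_proj = m + np.outer(a, e)           # xk' = m + ak*e
err = ((X - X_proj) ** 2).sum()       # residual squared error

# In 2-D the residual equals the smaller eigenvalue of S:
print(err, eigvals.min())
```

This also previews dimensionality reduction: the error of dropping dimensions is the sum of the eigenvalues you discard.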
Dimensionality Reduction
Consider a 2-dim space with two points, X1 = (a, b) and X2 = (c, d).
Rewriting them in the new (principal-axis) coordinates gives X1 = (a′, b′) and X2 = (c′, d′).
We are going to keep only the first coordinate: X1 = (a′), X2 = (c′).
Dimensionality Reduction
We want to prove that the axes of the projected data are uncorrelated.

Consider N m-dimensional vectors {x1, x2, …, xn}:
- Let X = [x1−m  x2−m  …  xn−m]ᵀ, where m is the mean
- Let E = [e1 e2 … em]
- The eigendecomposition Se = λe gives eigenvectors {e1, …, em} and eigenvalues {λ1, …, λm}
Dimensionality Reduction
With E = [e1 e2 … em] and D = diag(λ1, …, λm):
SE = [Se1 Se2 … Sem] = [λ1e1 λ2e2 … λmem] = ED
∴ S = EDE⁻¹ (and E⁻¹ = Eᵀ, since the eigenvectors are orthonormal).
Dimensionality Reduction
We want to know the covariance matrix of the projected vectors.
Let Y = [y1 y2 … yn] with Y = EᵀX, where the columns of X are the centered vectors xk − m and E = [e1 e2 … em].
Then SY = YYᵀ = EᵀXXᵀE = EᵀSE = Eᵀ(EDEᵀ)E.
Dimensionality Reduction
SY = D, a diagonal matrix. Therefore:
1. The covariance between any two different axes is 0 (the projected axes are uncorrelated).
2. The better an axis represents the data, the larger the variance along it, and hence the larger its eigenvalue λ.
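A quick numerical check of this claim (synthetic correlated data, my own construction): projecting the centered data onto the eigenvectors of S produces a diagonal scatter matrix whose diagonal holds the eigenvalues.

```python
# Verify SY = E^T S E = D on synthetic data.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3)) @ rng.normal(size=(3, 3))  # correlated data
Xc = X - X.mean(axis=0)            # center: xk - m

S = Xc.T @ Xc                      # scatter matrix
eigvals, E = np.linalg.eigh(S)     # columns of E satisfy Se = lambda*e

Y = Xc @ E                         # projected data (rows yk = E^T (xk - m))
SY = Y.T @ Y                       # scatter of the projected data

# Off-diagonal covariances vanish; the diagonal holds the eigenvalues.
print(np.round(SY, 6))
```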
Dimensionality Reduction
Conclusion: to reduce the dimension from D to M (M << D):
1. Find the covariance matrix S
2. Compute its eigenvalues and eigenvectors
3. Select the eigenvectors with the top M eigenvalues
4. Project the data onto them
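The four steps can be sketched in a few lines of NumPy. This is a minimal illustration with synthetic data and a hypothetical helper name (`pca_reduce`), not a reference implementation:

```python
# Minimal sketch of the 4-step recipe: reduce D = 5 dimensions to M = 2
# by projecting onto the top-M eigenvectors of the covariance matrix.
import numpy as np

def pca_reduce(X, M):
    m = X.mean(axis=0)
    S = np.cov(X, rowvar=False)            # 1. find S (covariance matrix)
    eigvals, eigvecs = np.linalg.eigh(S)   # 2. eigenvalues / eigenvectors
    order = np.argsort(eigvals)[::-1][:M]  # 3. select the top M
    E = eigvecs[:, order]
    return (X - m) @ E                     # 4. project the centered data

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 5)) @ rng.normal(size=(5, 5))
Y = pca_reduce(X, 2)
print(Y.shape)  # (100, 2)
```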
Toolkits
A List of PCA Toolkits
C & Java:
- Fionn Murtagh's Multivariate Data Analysis Software and Resources
- http://astro.u-strasbg.fr/~fmurtagh/mda-sw/

Perl:
- PDL::PCA

Matlab:
- Statistics Toolbox™: princomp

Weka:
- weka.attributeSelection.PrincipalComponents (http://www.laps.ufpa.br/aldebaro/weka/feature_selection.html)
A List of PCA Toolkits
C:
- Download: pca.c
- Compile: `cc pca.c -lm -o pcac`
- Run: `./pcac spectr.dat 36 8 R > pcaout.c.txt`

Java:
- Download: JAMA, PCAcorr.java
- Compile: `javac -classpath Jama-1.0.2.jar PCAcorr.java`
- Run: `java PCAcorr iris.dat > pcaout.java.txt`