20090504_ir_studygroup

Theory and Toolkits of PCA

2009 5/4 IRLab Study Group

Presenter: Chin-Hui Chen

TRANSCRIPT

Page 1

Theory and Toolkits of PCA

2009 5/4 IRLab Study Group

Presenter: Chin-Hui Chen

Page 2

Agenda

Theory:
◦ 1. Scenario
◦ 2. What is PCA?
◦ 3. How to minimize Squared Error?
◦ 4. Dimensionality Reduction

Toolkit:
◦ A list of PCA toolkits
◦ Demo

Page 3

Scenario (Point? Line?)

Consider a 2-dimensional space.

[Figure: sample points in the plane; d marks the distance from a point to a candidate point/line. Criterion: least squared error.]

Page 4

Agenda

Theory:
◦ 1. Scenario
◦ 2. What is PCA?
◦ 3. How to minimize Squared Error?
◦ 4. Dimensionality Reduction

Toolkit:
◦ A list of PCA toolkits
◦ Demo

Page 5

What is PCA? (1)

Principal component analysis (PCA) is a mathematical procedure that transforms a number of possibly correlated variables into a smaller number of uncorrelated variables called "principal components".

Page 6

What is PCA? (2)

What can PCA do?
◦ Dimensionality reduction

For example:
◦ Assume N points in a D-dim space, e.g. {x1, x2, x3, x4}; xi = (v1, v2)
◦ A set of M basis vectors for projection, e.g. {u1}
◦ The bases are orthonormal (each has length 1, and any two have inner product 0); M << D, so each feature is represented in M dimensions
◦ e.g. xi = (p1), the coordinate of xi along u1 (see the expansion sketched below)
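As a sketch of what "represent the feature in M dimensions" means (standard linear algebra, not text from the slides): with a full orthonormal basis $u_1, \dots, u_D$, any $x_i$ expands exactly, and PCA keeps only the first $M$ coordinates:

\[
x_i = \sum_{j=1}^{D} (u_j^t x_i)\, u_j \;\;\approx\;\; \sum_{j=1}^{M} p_j\, u_j, \qquad p_j = u_j^t x_i .
\]

(Later slides center the data first, so in practice $p_j = u_j^t (x_i - m)$ with $m$ the mean.)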

Page 7

Agenda

Theory:
◦ 1. Scenario
◦ 2. What is PCA?
◦ 3. How to minimize Squared Error?
◦ 4. Dimensionality Reduction

Toolkit:
◦ A list of PCA toolkits
◦ Demo

Page 8

How to minimize Squared Error?

Consider a D-dimensional space:
◦ Given N points: {x1, x2, …, xN}
◦ Each xi is a D-dim vector

How to:
◦ 1. Find a point that minimizes the squared error
◦ 2. Find a line that minimizes the squared error

Page 9

How to? – Point

◦ Goal: find $x_0$ minimizing $J_0(x_0) = \sum_{k=1}^{N} \|x_0 - x_k\|^2$
◦ Let $m = \frac{1}{N}\sum_{k=1}^{N} x_k$ (the sample mean)
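The minimization step, reconstructed here since the slide shows it only as an image, is the standard expansion around the mean:

\[
J_0(x_0) = \sum_{k=1}^{N} \|(x_0 - m) + (m - x_k)\|^2
= N\|x_0 - m\|^2 + 2(x_0 - m)^t \sum_{k=1}^{N} (m - x_k) + \sum_{k=1}^{N} \|m - x_k\|^2 .
\]

The cross term vanishes because $\sum_k (m - x_k) = Nm - Nm = 0$, and the last term does not involve $x_0$, so $J_0$ is minimized exactly at $x_0 = m$.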

Page 10

How to? – Point, then Line

∴ $x_0 = m$

◦ 1. Find a point that minimizes the squared error (done: $x_0 = m$)
◦ 2. Find a line that minimizes the squared error

L: a line through $m$ with unit direction $e$; the projection of $x_k$ onto it satisfies $x_k' - x_0 = a_k e$, i.e. $x_k' = x_0 + a_k e = m + a_k e$.

Page 11

How to? – Line

L: $x_k' = m + a_k e$

Goal: find $a_1, \dots, a_N$ (and $e$) minimizing
$J_1(a_1, \dots, a_N, e) = \sum_{k=1}^{N} \|(m + a_k e) - x_k\|^2$
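Expanding $J_1$ (a reconstruction of the algebra the slide shows as an image), using $\|e\| = 1$:

\[
J_1 = \sum_{k=1}^{N} \|a_k e - (x_k - m)\|^2
= \sum_{k=1}^{N} a_k^2 \|e\|^2 - 2 \sum_{k=1}^{N} a_k\, e^t (x_k - m) + \sum_{k=1}^{N} \|x_k - m\|^2 .
\]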

Page 12

How to? – Line

Differentiating each term with respect to $a_k$ and setting it to zero:
$\frac{\partial J_1}{\partial a_k} = 2 a_k - 2 e^t (x_k - m) = 0 \;\Rightarrow\; a_k = e^t (x_k - m)$

What does it mean? $a_k$ is the signed length of the projection of $x_k - m$ onto the unit vector $e$: least-squares fitting reduces to orthogonal projection.

Page 13

How to? – Line

Then, how about $e$? Substitute $a_k = e^t (x_k - m)$ back into $J_1$.

Page 14

How to? – Line

Let $S = \sum_{k=1}^{N} (x_k - m)(x_k - m)^t$. Substituting $a_k$ gives
$J_1(e) = -\,e^t S e + \sum_{k=1}^{N} \|x_k - m\|^2$,
where the second term is independent of $e$.

Page 15

How to? – Line

To minimize $J_1(e)$ we must maximize $e^t S e$. An unconstrained $f(x, y)$ is optimized where its gradient is zero; but when $x, y$ must satisfy a constraint $g(x, y) = 0$, we use a Lagrange multiplier.

$J_1'(e) = -\,e^t S e$. Because $\|e\| = 1$ (i.e. $e^t e = 1$), form
$u = e^t S e - \lambda (e^t e - 1)$.

Setting $\frac{\partial u}{\partial e} = 2 S e - 2 \lambda e = 0$ gives $S e = \lambda e$.

Page 16

How to? – Line

◦ What is $S$?

The covariance matrix: for D-dim data, $S = \sum_{k=1}^{N} (x_k - m)(x_k - m)^t$ is a $D \times D$ matrix (up to a $1/N$ factor, which does not change the eigenvectors).
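As a concrete sketch of this step, here is a minimal plain-Java routine that computes the mean and the scatter/covariance matrix from N row vectors. The class and method names are illustrative, not from the slides:

    // Minimal sketch: mean vector and scatter matrix S = sum_k (x_k - m)(x_k - m)^t.
    // Divide each entry of S by N to get the 1/N covariance convention.
    public class Scatter {
        static double[] mean(double[][] x) {
            int n = x.length, d = x[0].length;
            double[] m = new double[d];
            for (double[] row : x)
                for (int j = 0; j < d; j++) m[j] += row[j] / n;
            return m;
        }

        static double[][] scatter(double[][] x, double[] m) {
            int d = m.length;
            double[][] s = new double[d][d];
            for (double[] row : x)
                for (int i = 0; i < d; i++)
                    for (int j = 0; j < d; j++)
                        s[i][j] += (row[i] - m[i]) * (row[j] - m[j]);
            return s;
        }

        public static void main(String[] args) {
            double[][] x = { {1, 1}, {2, 2}, {3, 3} };    // toy 2-D data
            double[][] s = scatter(x, mean(x));           // S = [[2, 2], [2, 2]]
            System.out.println(java.util.Arrays.deepToString(s));
        }
    }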

Page 17

How to? – Line

From $S e = \lambda e$, and we know $S$. Then, what is $e$? An eigenvector of $S$.

$AX = \lambda X$: "eigen" because $A$ maps $X$ to the same direction, only scaled by $\lambda$.

Page 18

How to? – Conclusion

Summary:
◦ Find a line: $x_k' = m + a_k e$
◦ $a_k = e^t (x_k - m)$
◦ $S e = \lambda e$; $e$ is an eigenvector of the covariance matrix
◦ In a D-dim space, $S$ has $D$ eigenvectors
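To make this summary runnable, here is a minimal self-contained Java sketch that fits the line end to end. The slides only say "eigenvectors of the covariance matrix"; this sketch uses power iteration for the top eigenvector, which is a stand-in choice (it assumes a dominant eigenvalue), not the method named in the slides. A library eigendecomposition such as JAMA's, used in the demo later, works equally well.

    // Minimal sketch: fit the least-squared-error line x' = m + a*e in D dimensions.
    // Power iteration (a stand-in for a full eigendecomposition) finds the top
    // eigenvector e of the scatter matrix S; then a_k = e^t (x_k - m).
    public class PcaLine {
        public static void main(String[] args) {
            double[][] x = { {1, 1}, {2, 2}, {3, 3} };    // toy 2-D data
            int n = x.length, d = x[0].length;

            double[] m = new double[d];                   // mean
            for (double[] r : x) for (int j = 0; j < d; j++) m[j] += r[j] / n;

            double[][] s = new double[d][d];              // scatter matrix S
            for (double[] r : x)
                for (int i = 0; i < d; i++)
                    for (int j = 0; j < d; j++)
                        s[i][j] += (r[i] - m[i]) * (r[j] - m[j]);

            double[] e = new double[d];                   // power iteration for e
            java.util.Arrays.fill(e, 1.0 / Math.sqrt(d));
            for (int it = 0; it < 100; it++) {
                double[] se = new double[d];
                for (int i = 0; i < d; i++)
                    for (int j = 0; j < d; j++) se[i] += s[i][j] * e[j];
                double norm = 0;
                for (double v : se) norm += v * v;
                norm = Math.sqrt(norm);
                for (int i = 0; i < d; i++) e[i] = se[i] / norm;
            }

            for (double[] r : x) {                        // a_k = e^t (x_k - m)
                double a = 0;
                for (int j = 0; j < d; j++) a += e[j] * (r[j] - m[j]);
                System.out.printf("a = %.4f%n", a);
            }
        }
    }

For this toy data all points lie on the line y = x, so the printed coefficients −1.4142, 0, 1.4142 (that is, −√2, 0, √2) reproduce each point exactly with a single number.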

Page 19

Agenda

Theory:
◦ 1. Scenario
◦ 2. What is PCA?
◦ 3. How to minimize Squared Error?
◦ 4. Dimensionality Reduction

Toolkit:
◦ A list of PCA toolkits
◦ Demo

Page 20

Dimensionality Reduction

Page 21

Dimensionality Reduction

Consider a 2-dim space …

Original coordinates: X1 = (a, b), X2 = (c, d)

After rotating to the eigenvector basis: X1 = (a', b'), X2 = (c', d')

We are going to keep only the first coordinate: X1 = (a'), X2 = (c')
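A worked numeric instance of this picture, with toy numbers of my own rather than the slides': take $x_1 = (1,1)$, $x_2 = (2,2)$, $x_3 = (3,3)$.

\[
m = (2,2), \qquad
S = \sum_k (x_k - m)(x_k - m)^t = \begin{pmatrix} 2 & 2 \\ 2 & 2 \end{pmatrix},
\qquad \lambda_1 = 4,\; \lambda_2 = 0,\; e_1 = \tfrac{1}{\sqrt{2}}(1,1)^t .
\]

Projecting, $a_k = e_1^t (x_k - m)$ gives $(-\sqrt{2},\, 0,\, \sqrt{2})$: each 2-D point is represented by a single number, and because $\lambda_2 = 0$ the reconstruction $x_k' = m + a_k e_1$ is exact.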

Page 22

Dimensionality Reduction

We want to prove: the axes of the projected data are uncorrelated.

Consider N m-dim vectors {x1, x2, …, xN}:
◦ Let $X = [\,x_1 - m \;\; x_2 - m \;\; \dots \;\; x_N - m\,]$ (centered vectors as columns; $m$ = the mean vector)
◦ Let $E = [\,e_1 \;\; e_2 \;\; \dots \;\; e_m\,]$

Eigen decomposition $S e_i = \lambda_i e_i$ gives the eigenvectors {e1, …, em} and eigenvalues {λ1, …, λm}.

Page 23

Dimensionality Reduction

$SE = [\,S e_1 \;\; S e_2 \;\; \dots \;\; S e_m\,] = [\,\lambda_1 e_1 \;\; \lambda_2 e_2 \;\; \dots \;\; \lambda_m e_m\,] = ED$,
where $D = \mathrm{diag}(\lambda_1, \dots, \lambda_m)$ and $E = [\,e_1 \;\; e_2 \;\; \dots \;\; e_m\,]$.

Hence $S = E D E^{-1}$; and since the $e_i$ are orthonormal, $E^{-1} = E^t$, so $S = E D E^t$.

Page 24

Dimensionality Reduction

We want the covariance matrix of the projected vectors.

Let $Y = [\,y_1 \;\; y_2 \;\; \dots \;\; y_N\,]$ with $Y = E^t X$ and $E = [\,e_1 \;\; e_2 \;\; \dots \;\; e_m\,]$.

Then $S_Y = Y Y^t = (E^t X)(E^t X)^t = E^t (X X^t) E = E^t S E$.

Page 25

Dimensionality Reduction

$S_Y = E^t S E = E^t (E D E^t) E = D$

1. The covariance between any two projected axes is 0 (the off-diagonal entries of $D$).
2. The variance along axis $i$ is $\lambda_i$: the better an axis represents the data, the larger its variance, and hence the larger its $\lambda$.

Page 26

Dimensionality Reduction

Conclusion: to reduce dimension D to M (M << D):
1. Compute the covariance matrix S
2. Eigen-decompose S to get eigenvalues and eigenvectors
3. Select the eigenvectors with the top M eigenvalues
4. Project the data onto them
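Putting the four steps together, here is a minimal sketch using JAMA, the library the Java demo later in this deck downloads; compile and run with Jama-1.0.2.jar on the classpath. The data values and class name are illustrative. One assumption to flag: JAMA's eigendecomposition of a symmetric matrix returns eigenvalues in ascending order, so the top-M eigenvectors are taken from the last columns of V; if in doubt, inspect eig.getRealEigenvalues().

    import Jama.EigenvalueDecomposition;
    import Jama.Matrix;

    // Minimal sketch of the four-step recipe: covariance -> eigen -> top-M -> project.
    public class PcaReduce {
        public static void main(String[] args) {
            double[][] data = { {2.5, 2.4}, {0.5, 0.7}, {2.2, 2.9}, {1.9, 2.2}, {3.1, 3.0} };
            int n = data.length, d = data[0].length, M = 1;   // reduce 2-D -> 1-D

            // Center the data: X holds the centered vectors as columns (d x n).
            double[] mean = new double[d];
            for (double[] r : data) for (int j = 0; j < d; j++) mean[j] += r[j] / n;
            Matrix X = new Matrix(d, n);
            for (int k = 0; k < n; k++)
                for (int j = 0; j < d; j++) X.set(j, k, data[k][j] - mean[j]);

            // Step 1: S = X X^t (scatter; divide by n for the covariance convention).
            Matrix S = X.times(X.transpose());

            // Step 2: eigen-decompose S (eigenvalues ascending for symmetric S).
            EigenvalueDecomposition eig = S.eig();
            Matrix V = eig.getV();

            // Step 3: keep the eigenvectors with the top M eigenvalues (last M columns).
            Matrix E = V.getMatrix(0, d - 1, d - M, d - 1);

            // Step 4: project, Y = E^t X  (M x n).
            Matrix Y = E.transpose().times(X);
            Y.print(8, 4);
        }
    }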

Page 27

Agenda

Theory:
◦ 1. Scenario
◦ 2. What is PCA?
◦ 3. How to minimize Squared Error?
◦ 4. Dimensionality Reduction

Toolkit:
◦ A list of PCA toolkits
◦ Demo

Page 28

Toolkits

Page 29

A List of PCA Toolkits

C & Java:
◦ Fionn Murtagh's Multivariate Data Analysis Software and Resources
◦ http://astro.u-strasbg.fr/~fmurtagh/mda-sw/

Perl:
◦ PDL::PCA

Matlab:
◦ Statistics Toolbox™: princomp

Weka:
◦ weka.attributeSelection.PrincipalComponents
◦ (http://www.laps.ufpa.br/aldebaro/weka/feature_selection.html)

Page 30

A List of PCA Toolkits

C & Java:
◦ Fionn Murtagh's Multivariate Data Analysis Software and Resources
◦ http://astro.u-strasbg.fr/~fmurtagh/mda-sw/

C:
◦ Download: pca.c
◦ Compile: cc pca.c -lm -o pcac
◦ Run: ./pcac spectr.dat 36 8 R > pcaout.c.txt

Java:
◦ Download: JAMA, PCAcorr.java
◦ Compile: javac -classpath Jama-1.0.2.jar PCAcorr.java
◦ Run: java PCAcorr iris.dat > pcaout.java.txt

Page 31

Page 32