
Page 1:

Brief Introduction to PCA & SVD

2008.09.09

Byung-Hyun Ha

[email protected]

Page 2:

Contents

Principal component analysis (PCA)

Singular value decomposition (SVD)

Page 3:

Principal Component Analysis

Steps
1. Get some data
2. Subtract the mean
3. Calculate the covariance matrix
4. Calculate the eigenvectors and eigenvalues of the covariance matrix
5. Choose components and form a feature vector
6. Derive the new data set
7. Get the old data back

Page 4:

Principal Component Analysis

Get some data
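The slide's data table was an image and did not survive the transcript. As a stand-in, here is a minimal MATLAB sketch with a hypothetical 2-D data set (one observation per row); the later slides' variable names (DataAdjust, FeatureVector, FinalData) are reused below:

% Hypothetical 2-D data set (the slide's actual table was an image)
% Each row is an observation; each column is a variable
Data = [2.5 2.4; 0.5 0.7; 2.2 2.9; 1.9 2.2; 3.1 3.0; ...
        2.3 2.7; 2.0 1.6; 1.0 1.1; 1.5 1.6; 1.1 0.9];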

Page 5:

Principal Component Analysis

Subtract the mean
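Continuing the sketch above, centering subtracts the per-variable mean from every observation so each column averages to zero:

% Subtract the mean of each variable from every observation
DataAdjust = Data - repmat(mean(Data), size(Data, 1), 1);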

Page 6:

Principal Component Analysis

Calculate the covariance matrix

Calculate the eigenvectors and eigenvalues of the covariance matrix
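A sketch of both steps in MATLAB, continuing from DataAdjust (eig does not guarantee any ordering, so the eigenpairs are sorted by decreasing eigenvalue here):

% Covariance matrix of the mean-centered data
C = cov(DataAdjust);
% Eigenvectors (columns of V) and eigenvalues (diagonal of D) of C
[V, D] = eig(C);
% Sort eigenpairs so the first column of V is the principal component
[lambda, order] = sort(diag(D), 'descend');
V = V(:, order);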

Page 7:

Principal Component Analysis

Choosing components and forming a feature vector (1)

Deriving the new data set (1):

FinalData1 = DataAdjust × FeatureVector1
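A sketch of this step, assuming FeatureVector1 keeps every eigenvector (so no information is discarded, only the coordinate system changes):

% FeatureVector1: all eigenvectors, ordered by decreasing eigenvalue
FeatureVector1 = V;
% Express each observation in principal-component coordinates
FinalData1 = DataAdjust * FeatureVector1;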

Page 8:

Principal Component Analysis

Choosing components and forming a feature vector (2)

Deriving the new data set (2):

FinalData2 = DataAdjust × FeatureVector2

Getting the old data back:

DataAdjust' = FinalData2 × FeatureVector2^T
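A sketch of the reduced projection and the reconstruction, assuming FeatureVector2 keeps only the leading eigenvector; the recovered DataAdjust' is only approximate because the minor component was discarded:

% FeatureVector2: only the most significant eigenvector
FeatureVector2 = V(:, 1);
% One-dimensional representation of the data
FinalData2 = DataAdjust * FeatureVector2;
% Getting the (approximate) old data back: FinalData2 * FeatureVector2^T
DataAdjustBack = FinalData2 * FeatureVector2';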

Page 9:

Principal Component Analysis

Theoretically, the most important component w1 is obtained as

w1 = arg max_{||w|| = 1} E[(w^T x)^2]

• Here, x is the random variable (vector) of a data point.

After the first k − 1 components have been removed, the most important remaining component is wk.

Finding every component that satisfies the above condition is an optimization problem, and it can be solved by computing the eigenvector matrix of the covariance matrix C:

• C = X X^T = W Λ W^T

• Here, the rows of X^T are the data points, W is the eigenvector matrix, and Λ is a diagonal matrix.
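A quick MATLAB check of this factorization, reusing (as an assumption) the centered data from the earlier sketches, with one data point per column as the slide's X requires:

% One (centered) data point per column, matching C = X*X^T above
X = DataAdjust';
C = X * X';                 % (unnormalized) covariance matrix
[W, Lambda] = eig(C);       % eigendecomposition of C
norm(C - W * Lambda * W')   % essentially zero: C = W*Lambda*W^T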

Page 10:

Singular Value Decomposition

Singular value decomposition: any m by n matrix A can be factored into

A = Q1 Σ Q2^T = (orthogonal)(diagonal)(orthogonal).

The columns of Q1 (m by m) are eigenvectors of AA^T, and the columns of Q2 (n by n) are eigenvectors of A^T A. The r singular values on the diagonal of Σ (m by n) are the square roots of the nonzero eigenvalues of both AA^T and A^T A.

Example (by MATLAB):

[2 5 8 7]   [-0.54 -0.38  0.62 -0.34]   [21.72  0     0     0   ]   [-0.20 -0.47 -0.56 -0.66]
[3 5 7 6]   [-0.50 -0.42 -0.23  0.42]   [ 0     3.81  0     0   ]   [-0.33  0.24 -0.73  0.55]
[1 6 4 9] = [-0.51  0.82  0.15  0.00] X [ 0     0     1.43  0   ] X [-0.92  0.06  0.37 -0.08]
[2 2 3 4]   [-0.26 -0.05 -0.65 -0.72]   [ 0     0     0     0.91]   [ 0.05  0.85 -0.13 -0.51]
[2 4 4 5]   [-0.36  0.03 -0.36  0.44]
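The factorization above can be reproduced with MATLAB's built-in svd (economy size, since A is 5 by 4); corresponding columns of U and V may differ in sign from the slide, which is the usual sign ambiguity of the SVD:

% The 5-by-4 matrix from the slide
A = [2 5 8 7; 3 5 7 6; 1 6 4 9; 2 2 3 4; 2 4 4 5];
[U, S, V] = svd(A, 'econ');   % U: 5x4, S: 4x4 diagonal, V: 4x4
diag(S)'                      % approx [21.72 3.81 1.43 0.91]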

Page 11:

Singular Value Decomposition

Applications: image compression

All four singular values kept (exact product):

[-0.54 -0.38  0.62 -0.34]   [21.72  0     0     0   ]   [-0.20 -0.47 -0.56 -0.66]   [2.0 5.0 8.0 7.0]
[-0.50 -0.42 -0.23  0.42]   [ 0     3.81  0     0   ]   [-0.33  0.24 -0.73  0.55]   [3.0 5.0 7.0 6.0]
[-0.51  0.82  0.15  0.00] X [ 0     0     1.43  0   ] X [-0.92  0.06  0.37 -0.08] = [1.0 6.0 4.0 9.0]
[-0.26 -0.05 -0.65 -0.72]   [ 0     0     0     0.91]   [ 0.05  0.85 -0.13 -0.51]   [2.0 2.0 3.0 4.0]
[-0.36  0.03 -0.36  0.44]                                                           [2.0 4.0 4.0 5.0]

With σ4 set to 0 (rank-3 approximation):

[-0.54 -0.38  0.62 -0.34]   [21.72  0     0     0   ]   [-0.20 -0.47 -0.56 -0.66]   [2.0 5.3 8.0 6.8]
[-0.50 -0.42 -0.23  0.42]   [ 0     3.81  0     0   ]   [-0.33  0.24 -0.73  0.55]   [3.0 4.7 7.0 6.2]
[-0.51  0.82  0.15  0.00] X [ 0     0     1.43  0   ] X [-0.92  0.06  0.37 -0.08] = [1.0 6.0 4.0 9.0]
[-0.26 -0.05 -0.65 -0.72]   [ 0     0     0     0.00]   [ 0.05  0.85 -0.13 -0.51]   [2.0 2.6 2.9 3.7]
[-0.36  0.03 -0.36  0.44]                                                           [2.0 3.7 4.1 5.2]

With σ3 and σ4 set to 0 (rank-2 approximation):

[-0.54 -0.38  0.62 -0.34]   [21.72  0     0     0   ]   [-0.20 -0.47 -0.56 -0.66]   [2.8 5.2 7.6 6.9]
[-0.50 -0.42 -0.23  0.42]   [ 0     3.81  0     0   ]   [-0.33  0.24 -0.73  0.55]   [2.7 4.7 7.2 6.2]
[-0.51  0.82  0.15  0.00] X [ 0     0     0.00  0   ] X [-0.92  0.06  0.37 -0.08] = [1.2 6.0 4.0 9.0]
[-0.26 -0.05 -0.65 -0.72]   [ 0     0     0     0.00]   [ 0.05  0.85 -0.13 -0.51]   [1.2 2.7 3.3 3.6]
[-0.36  0.03 -0.36  0.44]                                                           [1.5 3.7 4.2 5.2]

With only σ1 kept (rank-1 approximation):

[-0.54 -0.38  0.62 -0.34]   [21.72  0     0     0   ]   [-0.20 -0.47 -0.56 -0.66]   [2.4 5.6 6.6 7.7]
[-0.50 -0.42 -0.23  0.42]   [ 0     0.00  0     0   ]   [-0.33  0.24 -0.73  0.55]   [2.2 5.1 6.0 7.1]
[-0.51  0.82  0.15  0.00] X [ 0     0     0.00  0   ] X [-0.92  0.06  0.37 -0.08] = [2.2 5.3 6.2 7.3]
[-0.26 -0.05 -0.65 -0.72]   [ 0     0     0     0.00]   [ 0.05  0.85 -0.13 -0.51]   [1.1 2.7 3.1 3.7]
[-0.36  0.03 -0.36  0.44]                                                           [1.6 3.7 4.3 5.1]
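The three truncated products above are the rank-3, rank-2, and rank-1 approximations of A; zeroing the smallest singular values is exactly how SVD-based image compression discards the least important information. A sketch in MATLAB:

% Rank-k approximations of A: keep the k largest singular values,
% zero out the rest, and re-multiply the factors
A = [2 5 8 7; 3 5 7 6; 1 6 4 9; 2 2 3 4; 2 4 4 5];
[U, S, V] = svd(A);
for k = [3 2 1]
    Sk = S;
    Sk(k+1:end, :) = 0;    % drop singular values after the k-th
    disp(U * Sk * V')      % rank-k approximation of A
end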