Lab 9 Slides: PCA
Brett Bernstein (CDS at NYU), November 14, 2018

  • PCA

    1 Define x1 = (4, 1), x2 = (−3, 1), and x3 = (1, 1).

      1 Give a one-dimensional affine subspace of R2 that best approximates these three points.

      2 Use this to represent each point using a single number (i.e., reduce the dimension from 2 to 1).

    2 Suppose x1, . . . , xn ∈ Rp are datapoints you want to represent in k < p dimensions.

      1 Explain how to do this using PCA.

      2 How do you determine a value for k?

      3 How can you implement PCA using the SVD? (See the sketch below.)

      4 Why should we perform dimensionality reduction?

    3 Suppose there are two eigenvectors of the covariance matrix that correspond to large eigenvalues, and the rest of the eigenvalues are small. How do we interpret this?
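
    A minimal sketch of PCA via the SVD (question 2.3), assuming NumPy; the function name and the demo on the question-1 points are mine:

        import numpy as np

        def pca_svd(X, k):
            """Project the rows of X (n x p) onto the top-k principal components."""
            mu = X.mean(axis=0)              # feature means
            Xc = X - mu                      # center the data
            # Rows of Vt are the right singular vectors, i.e. the principal directions.
            _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
            W = Vt[:k].T                     # p x k matrix of the top-k directions
            return Xc @ W, W, mu             # n x k scores, directions, mean

        # Question 1: the best one-dimensional affine subspace is {mu + t w : t ∈ R}.
        X = np.array([[4.0, 1.0], [-3.0, 1.0], [1.0, 1.0]])
        scores, W, mu = pca_svd(X, k=1)
        print(mu, W.ravel(), scores.ravel())  # the line y = 1; one score per point

    For these three points the recovered line is y = 1, and the single score per point answers question 1.2.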

  • Scree Plot

    (Scree plot figure omitted.)
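
    How a scree plot is typically drawn, as one way to pick k (a sketch assuming NumPy and Matplotlib; the data here is a random placeholder):

        import numpy as np
        import matplotlib.pyplot as plt

        X = np.random.default_rng(0).normal(size=(200, 10))        # placeholder data
        Xc = X - X.mean(axis=0)
        eigvals = np.linalg.svd(Xc, compute_uv=False) ** 2 / (len(X) - 1)

        plt.plot(np.arange(1, eigvals.size + 1), eigvals, "o-")
        plt.xlabel("component index")
        plt.ylabel("covariance eigenvalue")
        plt.show()                     # look for the "elbow" and keep k components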

  • Variance Along a Direction

    1 Let x1, . . . , xn ∈ Rp, and fix a direction w ∈ Rp with ‖w‖ = 1. We define the variance along the direction w by

        (1/(n − 1)) ∑_{i=1}^n (wᵀxi − wᵀµ)²,

      where µ = (1/n) ∑_{i=1}^n xi is the sample mean. On the homework we will see that the first eigenvector of the covariance matrix gives the direction with maximum variance. Why is this desirable? (A numerical check follows below.)

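    A quick numerical check of that claim (a sketch assuming NumPy; the data is made up):

        import numpy as np

        rng = np.random.default_rng(0)
        X = rng.normal(size=(500, 2)) @ np.array([[3.0, 0.0], [1.0, 0.5]])
        mu = X.mean(axis=0)

        def var_along(w):
            w = w / np.linalg.norm(w)                 # enforce ||w|| = 1
            return np.sum((X @ w - mu @ w) ** 2) / (len(X) - 1)

        S = np.cov(X, rowvar=False)                   # sample covariance matrix
        lams, U = np.linalg.eigh(S)                   # eigenvalues in ascending order
        print(var_along(U[:, -1]), lams[-1])          # equal: top eigenvector attains λ_max
        print(max(var_along(rng.normal(size=2)) for _ in range(1000)))  # never exceeds λ_max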

  • Project Along Direction (x̃i = xi − µ)

    (Figure omitted: centered points x̃1, . . . , x̃7 with a unit direction w.)


  • Project Along Direction (x̃i = xi − µ)

    (Figure omitted: the same points x̃1, . . . , x̃7 with their projections, the wᵀx̃i-values, marked along w.)

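    The wᵀx̃i-values in the figure are just inner products; a sketch (assuming NumPy, with made-up points):

        import numpy as np

        X = np.array([[2.0, 1.0], [0.5, 2.0], [-1.0, 1.5], [-2.0, -0.5],
                      [1.0, -1.0], [2.5, 0.0], [-0.5, -2.0]])      # seven sample points
        Xt = X - X.mean(axis=0)               # centered points x̃_i
        w = np.array([1.0, 1.0])
        w = w / np.linalg.norm(w)             # unit direction
        print(Xt @ w)                         # the seven wᵀx̃_i-values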

  • Low Variance Direction

    (Figures omitted: projection of the data onto a low-variance direction.)

  • High Variance Direction

    (Figures omitted: projection of the data onto a high-variance direction.)

  • Standardization

    1 Someone suggests that you should standardize each feature before running PCA (i.e., subtract the mean of each feature, and then divide by the standard deviation). Does this have any effect? (See the sketch below.)

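    A sketch suggesting the answer empirically (assuming NumPy; the correlated, badly scaled data is made up). Standardizing replaces the covariance matrix with the correlation matrix, so in general the principal directions change:

        import numpy as np

        rng = np.random.default_rng(1)
        Z0 = rng.normal(size=(300, 2))
        X = (Z0 @ np.array([[1.0, 0.5], [0.5, 1.0]])) * np.array([1.0, 10.0])

        def top_direction(A):
            _, _, Vt = np.linalg.svd(A - A.mean(axis=0), full_matrices=False)
            return Vt[0]

        Z = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize each feature
        print(top_direction(X))                    # dominated by the large-scale feature
        print(top_direction(Z))                    # roughly ±(1, 1)/√2 after standardizing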

  • Centering: Uncentered Data

    (Figure omitted: the directions u1, u2 computed without centering the data.)

  • Centering: Centered Data

    (Figure omitted: the directions u1, u2 after centering the data.)
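
    What the figures illustrate, as a sketch (assuming NumPy; the offset data is made up): without centering, the first singular vector of the raw data matrix chases the mean rather than the spread:

        import numpy as np

        rng = np.random.default_rng(2)
        X = rng.normal(size=(200, 2)) @ np.diag([2.0, 0.5]) + np.array([10.0, 10.0])

        _, _, Vt_raw = np.linalg.svd(X, full_matrices=False)                  # uncentered
        _, _, Vt_ctr = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)
        print(Vt_raw[0])   # points roughly toward the mean (10, 10)
        print(Vt_ctr[0])   # roughly ±(1, 0), the direction of largest spread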

  • Scaling

    (Figure omitted: the principal directions u1, u2 of the original data.)

  • Scaling: Multiply Second Feature by 4

    (Figure omitted: the principal directions u1, u2 after the scaling.)
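
    The same experiment for these figures (a sketch assuming NumPy; the data is made up): rescaling one feature changes which directions carry the most variance:

        import numpy as np

        rng = np.random.default_rng(3)
        X = rng.normal(size=(200, 2)) @ np.diag([2.0, 1.0])   # more spread in feature 1
        Y = X * np.array([1.0, 4.0])                          # multiply feature 2 by 4

        def directions(A):
            _, _, Vt = np.linalg.svd(A - A.mean(axis=0), full_matrices=False)
            return Vt

        print(directions(X)[0])   # roughly ±(1, 0)
        print(directions(Y)[0])   # roughly ±(0, 1): the principal directions changed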

  • PCA Example

    Example

    A collection of people come to a testing site to have their heights measured twice. The two testers use different measuring devices, each of which introduces errors into the measurement process. Below we depict some of the measurements computed (already centered).


  • PCA Example (Data is Centered)

    (Scatter plot omitted: Tester 1 on the horizontal axis, about −10 to 10; Tester 2 on the vertical axis, about −20 to 20.)

    1 Describe (vaguely) what you expect the sample covariance matrix to look like.

    2 What do you think the principal directions u1 and u2 are?


  • PCA Example: Solutions

    1 We expect tester 2 to have a larger variance than tester 1, and the two testers' measurements to be nearly perfectly correlated. The sample covariance matrix is

        S = [ 40.5154   93.5069
              93.5069  232.8653 ].

    2 We have S = UΛUᵀ, with

        U = [ 0.3762  −0.9265
              0.9265   0.3762 ],

        Λ = [ 270.8290  0
              0         2.5518 ].

      Note that trace Λ = trace S.

      Since λ2 is small, u2 is almost in the nullspace of S. This suggests −0.9265x + 0.3762y ≈ 0 for centered data points (x, y) ∈ R2, i.e., y ≈ 2.46x. Maybe tester 2 used centimeters and tester 1 used inches.

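    A sketch verifying these numbers (assuming NumPy):

        import numpy as np

        S = np.array([[40.5154, 93.5069],
                      [93.5069, 232.8653]])
        lams, U = np.linalg.eigh(S)          # eigenvalues in ascending order
        print(lams)                          # ≈ [2.5518, 270.8290]
        print(U)                             # columns are u2, u1 (up to sign)
        print(np.trace(S), lams.sum())       # the traces agree
        print(0.9265 / 0.3762)               # slope ≈ 2.46 (2.54 cm per inch)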

  • PCA Example: Plot In Terms of Principal Components

    (Scatter plots omitted: the data in the original Tester 1/Tester 2 axes, and re-plotted in principal-component coordinates, with u1 ranging over about −20 to 20 and u2 over about −1.25 to 6.25.)

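    Re-plotting in principal-component coordinates is just a change of basis; a sketch (assuming NumPy, with U from the solution slide and made-up centered points):

        import numpy as np

        U = np.array([[0.3762, -0.9265],
                      [0.9265,  0.3762]])           # columns u1, u2
        Xc = np.array([[4.0, 10.0], [-2.0, -5.0]])  # made-up centered points
        print(Xc @ U)   # (u1-score, u2-score) pairs; the u2-scores are tiny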

  • Principal Components Are Linear

    Suppose we have the following labeled data.

    (Scatter plot omitted: two colored clusters.)

    How can we apply PCA and obtain a single principal component that distinguishes the colored clusters?


  • Principal Components Are Linear: Doesn’t Work

    (Figure omitted: the first principal component of the raw data does not separate the clusters.)

  • Principal Components Are Linear

    1 In general, we can deal with non-linearity by adding features or using kernels.

    2 Using kernels results in the technique called Kernel PCA.

    3 Below we added the feature ‖x̃i‖² and took the first principal component (figure omitted; see the sketch after this list).

    4 Next class we will look at diffusion maps.

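    A sketch of item 3 (assuming NumPy, and assuming the figure showed two concentric clusters, which is the typical example): appending ‖x̃i‖² makes the radius visible to a linear method.

        import numpy as np

        rng = np.random.default_rng(4)
        n = 200
        theta = rng.uniform(0, 2 * np.pi, size=n)
        r = np.where(np.arange(n) < n // 2, 1.0, 3.0)         # two concentric rings
        X = np.c_[r * np.cos(theta), r * np.sin(theta)] + 0.05 * rng.normal(size=(n, 2))

        Xt = X - X.mean(axis=0)                               # centered points x̃_i
        F = np.c_[Xt, np.sum(Xt**2, axis=1)]                  # append the feature ||x̃_i||²
        Fc = F - F.mean(axis=0)
        _, _, Vt = np.linalg.svd(Fc, full_matrices=False)
        pc1 = Fc @ Vt[0]                                      # first principal component scores
        print(pc1[: n // 2].mean(), pc1[n // 2:].mean())      # the rings get well-separated scores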

  • Diagonalization

    1 Suppose A ∈ Rn×n has a linearly independent list of n eigenvectors v1, . . . , vn with eigenvalues λ1, . . . , λn. Can we factor A in a way similar to the spectral decomposition?

    2 The Fibonacci sequence is defined by F0 = 0, F1 = 1, and Fk+2 = Fk+1 + Fk for k ≥ 0. How quickly does Fk grow (linearly, polynomially, exponentially)? (See the sketch below.)

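    A sketch for question 2 (assuming NumPy): the factorization asked for in question 1 is A = VΛV⁻¹, which gives Aᵏ = VΛᵏV⁻¹; applying it to the Fibonacci step matrix shows that Fk grows exponentially, like φᵏ with φ the golden ratio.

        import numpy as np

        A = np.array([[1.0, 1.0], [1.0, 0.0]])    # (F_{k+2}, F_{k+1}) = A (F_{k+1}, F_k)
        lams, V = np.linalg.eig(A)                # eigenvalues φ ≈ 1.618 and 1 − φ
        phi = lams.max()

        def fib(k):
            Ak = V @ np.diag(lams**k) @ np.linalg.inv(V)   # A^k = V Λ^k V^{-1}
            return Ak[0, 1]                                # F_k is the (0, 1) entry of A^k

        print(round(fib(10)))           # 55
        print(fib(20) / fib(19), phi)   # the ratio approaches φ: exponential growth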