2009 7 PCA + Factor Analyses


TRANSCRIPT

  • Slide 1/68

    Data Reduction I: PCA and Factor Analysis

    Carlos Pinto

    Neuropsychiatric Genetics Group, University of Dublin

    Data Analysis Seminars, 11 November 2009

  • Slide 2/68

    Overview

    1. Distortions in Data

    2. Variance and Covariance

    3. PCA: A Toy Example

    4. Running PCA in R

    5. Factor Analysis

    6. Running Factor Analysis in R

    7. Conclusions

  • Slides 3–6/68

    Sources of Confusion in Data

    1. Noise

    2. Rotation

    3. Redundancy

    ... in linear systems


  • Slides 7–10/68

    Noise

    1. Noise arises from inaccuracies in our measurements

    2. Noise is measured relative to the signal:

    SNR = σ²_signal / σ²_noise

    The SNR must be large if we are to extract useful information from our experiment.
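    As a quick illustration in R (the signal and noise vectors here are made up for the example):

        signal <- sin(seq(0, 10, by = 0.01))        # hypothetical clean signal
        noise  <- rnorm(length(signal), sd = 0.1)   # hypothetical measurement noise
        snr    <- var(signal) / var(noise)          # ratio of the two variances
        snr                                         # large values mean the data are informative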


  • Slide 11/68

    Variance in Signal and Noise

  • Slides 12–13/68

    Rotation

    1. We don't always know the true underlying variables in the system we are studying

    2. We may actually be measuring some (hopefully linear!) combination of the true variables.


  • Slides 14–15/68

    Redundancy

    1. We don't always know whether the variables we are measuring are independent.

    2. We may actually be measuring one or more (hopefully linear!) combinations of the same set of variables.


  • Slide 16/68

    Degrees of Redundancy


  • Slides 18–20/68

    Goals of PCA

    1. To isolate noise

    2. To separate out the redundant degrees of freedom

    3. To eliminate the effects of rotation


  • Slide 21/68

    Example: Measuring Two Correlated Variables

  • Slide 22/68

    Reduced Data

  • Slide 23/68

    Variance

    Variable X, measurements {x1 ... xn}

    Variable Y, measurements {y1 ... yn}

    Variance of X: σ²_XX = Σ_i (x_i - x̄)(x_i - x̄) / (n - 1)

    Variance of Y: σ²_YY = Σ_i (y_i - ȳ)(y_i - ȳ) / (n - 1)
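    R's var() implements exactly this n - 1 formula; for instance, on the X of the toy example below:

        x <- c(1, 2)
        var(x)   # ((1 - 1.5)^2 + (2 - 1.5)^2) / (2 - 1) = 0.5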

  • Slide 24/68

    Covariance

    Covariance of X and Y:

    σ²_XY = Σ_i (x_i - x̄)(y_i - ȳ) / (n - 1) = σ²_YX

    The covariance measures the degree of the (linear!) relationship between X and Y.

    Small values of the covariance indicate that the variables are uncorrelated (no linear relationship).

  • Slide 25/68

    Covariance Matrix

    We can write the four variances associated with our two variables in the form of a matrix:

    ( C_XX  C_XY )
    ( C_YX  C_YY )

    The diagonal elements are the variances and the off-diagonal elements are the covariances.

    The covariance matrix is symmetric, since C_XY = C_YX.
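    In R, cov() computes both a single covariance and, given a matrix, the full covariance matrix; for instance, on the toy data of Example 1 below:

        x <- c(1, 2)
        y <- c(3, 6)
        cov(x, y)                  # 1.5
        cov(cbind(X = x, Y = y))   # the full symmetric 2 x 2 matrix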

  • Slide 26/68

    Three Variables

    For 3 variables, we get a 3 × 3 matrix:

    ( C11  C12  C13 )
    ( C21  C22  C23 )
    ( C31  C32  C33 )

    As before, the diagonal elements are the variances and the off-diagonal elements are the covariances, and the matrix is symmetric.

  • Slides 27–28/68

    Interpretation of the Covariance Matrix

    1. The diagonal terms are the variances of our different variables.

    2. Large diagonal values correspond to strong signals.

    3. The off-diagonal terms are the covariances between different variables. They reflect distortions in the data (noise, rotation, redundancy).

    4. Large off-diagonal values correspond to distortions in our data.


  • Slides 29–31/68

    Minimizing Distortion

    1. If we redefine our variables (as linear combinations of each other) the covariance matrix will change.

    2. We want to change the covariance matrix so that the off-diagonal elements are close to zero.

    3. That is, we want to diagonalize the covariance matrix.


  • Slides 32–33/68

    Example 1: Toy Example

    X = (1, 2), so x̄ = 3/2

    Y = (3, 6), so ȳ = 9/2

    Subtract the means, since we are only interested in the variances:

    X = (-1/2, 1/2)

    Y = (-3/2, 3/2)


  • Slide 34/68

    Variances

    C11 = (-1/2)(-1/2) + (1/2)(1/2) = 1/2

    C12 = (-1/2)(-3/2) + (1/2)(3/2) = 3/2

    C21 = (-3/2)(-1/2) + (3/2)(1/2) = 3/2

    C22 = (-3/2)(-3/2) + (3/2)(3/2) = 9/2

    (With n = 2 the denominator n - 1 is 1, so the sums themselves are the variances.)

  • Slide 35/68

    Covariance Matrix

    The covariance matrix is:

    C = ( 1/2  3/2 )
        ( 3/2  9/2 )

    This is not diagonal: the nonzero off-diagonal terms indicate the presence of distortion, in this case because the two variables are correlated.
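    The eigendecomposition of C can be checked with R's eigen(). Note that eigen() returns unit-length eigenvectors, so its eigenvalues are 5 and 0; the unnormalised axes used on the next slides (X′ = X + 3Y, Y′ = -3X + Y) scale the leading variance by a factor of 10:

        C <- matrix(c(1/2, 3/2,
                      3/2, 9/2), nrow = 2, byrow = TRUE)
        eigen(C)   # values 5, 0; vectors proportional to (1, 3) and (-3, 1), up to sign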

  • Slides 36–38/68

    Rotation of Axes I

    We need to redefine our variables to diagonalize the covariance matrix.

    Choose new variables which are a linear combination of the old ones:

    X′ = a1 X + a2 Y

    Y′ = b1 X + b2 Y

    We have to determine the four constants a1, a2, b1, b2 such that the covariance matrix is diagonal.


  • Slide 39/68

    Rotation of Axes II

    In this case, the constants are easy to find:

    X′ = X + 3Y

    Y′ = -3X + Y

  • Slide 40/68

    Recalculating the Data Points

    Our data points relative to the new axes are now:

    X′1 = -1/2 + 3(-3/2) = -5

    X′2 = 1/2 + 3(3/2) = 5

    Y′1 = -3(-1/2) + (-3/2) = 0

    Y′2 = -3(1/2) + 3/2 = 0

  • Slide 41/68

    Recalculating the Variances

    The new variances are now:

    C′11 = (-5)(-5) + (5)(5) = 50

    C′12 = (-5)(0) + (5)(0) = 0

    C′21 = (0)(-5) + (0)(5) = 0

    C′22 = (0)(0) + (0)(0) = 0

  • Slide 42/68

    Diagonalised Covariance Matrix

    The covariance matrix is now:

    C′ = ( 50  0 )
         (  0  0 )

    1. All off-diagonal entries are now zero, so the data distortions have been removed.

    2. The variance of Y′ (second entry on the diagonal) is now zero. This variable has been eliminated, so the data has been dimensionally reduced (from 2 dimensions to 1).
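    A small verification in R, applying the slides' (unnormalised) rotation to the centred data:

        x  <- c(-1/2, 1/2)     # mean-centred X
        y  <- c(-3/2, 3/2)     # mean-centred Y
        xp <- x + 3 * y        # X′ = X + 3Y
        yp <- -3 * x + y       # Y′ = -3X + Y
        cov(cbind(xp, yp))     # diagonal 50 and 0, off-diagonals 0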

  • Slide 43/68

    Eigenvalues

    1. The numbers on the diagonal of C′ are called the eigenvalues of the covariance matrix (in our example they carry the extra factor of 10 that comes from the unnormalised axes).

    2. Large eigenvalues correspond to large variances: these are the important variables to consider.

    3. Small eigenvalues correspond to small variances: these variables can sometimes be neglected.

  • Slide 44/68

    Eigenvectors

    The directions of the new rotated axes are called the eigenvectors of the covariance matrix.

    Etymological Note:

    vector: from Latin, "carrier"

    eigen: from German, "proper to, belonging to"

  • Slide 45/68

    Example 2: A More Realistic Example

    3 variables, 5 observations:

    Finance, X   Marketing, Y   Policy, Z
         3            6            5
         7            3            3
        10            9            8
         3            9            7
        10            6            5

  • Slide 46/68

    Running PCA in R
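    The R commands themselves are not in the transcript; a minimal sketch of one way to run PCA on the Example 2 data, using base R's prcomp (princomp is an alternative):

        finance   <- c(3, 7, 10, 3, 10)
        marketing <- c(6, 3, 9, 9, 6)
        policy    <- c(5, 3, 8, 7, 5)
        dat <- data.frame(finance, marketing, policy)

        pca <- prcomp(dat)   # centres the data; PCA on the covariance matrix
        summary(pca)         # proportion of variance explained per component
        pca$rotation         # the loadings (eigenvectors of the covariance matrix)
        screeplot(pca)       # variances of the components as a bar plot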

  • Slide 47/68

    PCA Output from R

  • Slide 48/68

    Assumptions and Limitations of PCA

    1. Large variances represent signal; small variances represent noise.

    2. The new axes are linear combinations of the original axes.

    3. The principal components are orthogonal (perpendicular) to each other.

    4. The distributions of our measurements are fully described by their mean and variance.

  • Slide 49/68

    Advantages of PCA

    1. The procedure is hypothesis free.

    2. The procedure is well defined.

    3. The solution is (essentially) unique.

  • Slide 50/68

    Factor Analysis

    There are a multiplicity of procedures which go under the name of factor analysis.

    Factor analysis is hypothesis driven and tries to find a priori structure in the data.

    We'll use the data from Example 2 in the next few slides...

  • Slides 51–53/68

    Factor Model

    Hypothesis: A significant part of the variance of our 3 variables can be expressed as linear combinations of two underlying factors, F1 and F2:

    X = a1 F1 + a2 F2 + eX

    Y = b1 F1 + b2 F2 + eY

    Z = c1 F1 + c2 F2 + eZ

    eX, eY and eZ allow for errors in the measurements of our 3 variables.
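    To make the model concrete, here is an illustrative simulation in R; the loadings and error standard deviations are invented for the example, not taken from the slides:

        set.seed(1)
        n  <- 10000
        F1 <- rnorm(n)   # factor 1: mean 0, variance 1
        F2 <- rnorm(n)   # factor 2: mean 0, variance 1
        X  <- 0.8 * F1 + 0.2 * F2 + rnorm(n, sd = 0.5)   # hypothetical loadings
        Y  <- 0.1 * F1 + 0.9 * F2 + rnorm(n, sd = 0.5)
        Z  <- 0.5 * F1 + 0.5 * F2 + rnorm(n, sd = 0.5)
        cov(X, Y)   # close to a1*b1 + a2*b2 = 0.26, as slide 56 predicts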


  • Slides 54–55/68

    Assumptions

    1. The factors Fi are independent with mean 0 and variance 1.

    2. The error terms are independent with mean 0 and variance σ²_i.

    With these assumptions, we can calculate the variances in terms of our hypothesised loadings.

  • Slide 56/68

    Factor Model Variances

    Since the factors have unit variance and are independent of each other and of the errors, the model implies:

    C_XX = a1² + a2² + σ²_eX

    C_YY = b1² + b2² + σ²_eY

    C_ZZ = c1² + c2² + σ²_eZ

    C_XY = a1 b1 + a2 b2

    C_XZ = a1 c1 + a2 c2

    C_YZ = b1 c1 + b2 c2

  • Slide 57/68

    Observed Variances for Example 2

    C_XX = 9.84

    C_YY = 5.04

    C_ZZ = 3.04

    C_XY = -0.36

    C_XZ = 0.44

    C_YZ = 3.84

    Note: the observations in the table are not mean-adjusted; the means are subtracted in the calculation, and the denominator used here is n rather than n - 1.
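    These values can be reproduced in R by rescaling cov()'s n - 1 denominator to n:

        finance   <- c(3, 7, 10, 3, 10)
        marketing <- c(6, 3, 9, 9, 6)
        policy    <- c(5, 3, 8, 7, 5)
        n <- 5
        cov(cbind(finance, marketing, policy)) * (n - 1) / n
        # diagonal: 9.84, 5.04, 3.04; off-diagonals: -0.36, 0.44, 3.84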

  • Slides 58–59/68

    Fitting the Model

    To fit the model we would like to choose the loadings ai, bi and ci so that the predicted variances match the observed variances.

    Problem: The solution is not unique.

    That is, there are different choices for the loadings that yield the same values for the variances; in fact, an infinite number...


  • Slides 60–61/68

    So ... what does the computer do?

    In essence, it finds an initial set of loadings and then rotates these to find the best solution according to a user-specified criterion:

    Varimax: tends to maximise the difference in loading between factors.

    Quartimax: tends to maximise the difference in loading between variables.
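    In R, base stats supplies varimax() (and promax()); quartimax is not in base R, but packages such as GPArotation provide it. A sketch on a made-up loadings matrix:

        L <- matrix(c(0.8, 0.7, 0.6, 0.3,    # hypothetical loadings on factor 1
                      0.3, 0.2, 0.4, 0.8),   # hypothetical loadings on factor 2
                    ncol = 2)
        varimax(L)                   # rotated loadings plus the rotation matrix
        # GPArotation::quartimax(L)  # quartimax, if GPArotation is installed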


  • Slide 62/68

    Example 3

    Physics   History   English   Maths   Politics
       8         4         3        8        5
       7         3         3        9        3
      10         4         3        9        4
       3         8         9        4        7
       4         8         6        4        7
       5         9         8        5        9
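    The R session for this example is not in the transcript; a sketch of running both analyses on the Example 3 marks, assuming two factors as in the factor model above (with only six observations this is purely illustrative, and factanal may warn about convergence):

        physics  <- c(8, 7, 10, 3, 4, 5)
        history  <- c(4, 3, 4, 8, 8, 9)
        english  <- c(3, 3, 3, 9, 6, 8)
        maths    <- c(8, 9, 9, 4, 4, 5)
        politics <- c(5, 3, 4, 7, 7, 9)
        marks <- data.frame(physics, history, english, maths, politics)

        prcomp(marks, scale. = TRUE)                        # PCA view of Example 3
        factanal(marks, factors = 2, rotation = "varimax")  # factor-analysis view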

  • Slide 63/68

    PCA Analysis of Example 3

  • Slide 64/68

    Factor Analysis of Example 3

  • Slide 65/68

    Some Opinions on Factor Analysis...

  • Slide 66/68

    Take Home

  • Slide 67/68

    1. Measurements are subject to many distortions, including noise, rotation and redundancy.

    2. PCA is a robust, well defined method for reducing data and investigating underlying structure.

    3. There are many approaches to factor analysis; all of them are subject to a certain degree of arbitrariness and should be used with caution.

  • Slide 68/68

    Further Reading

    A Tutorial on Principal Components Analysis, Jonathon Shlens
    http://www.snl.salk.edu/~shlens/notes.html

    Factor Analysis, Peter Tryfos
    www.yorku.ca/ptryfos/f1400.pdf