Principal Components Analysis BMTRY 726 3/27/14

Upload: ami-reynolds

Post on 26-Dec-2015


Principal Components Analysis

BMTRY 726 3/27/14

Uses

Goal: Explain the variability of a set of variables using a “small” set of linear combinations of those variables

Why: There are several reasons we may want to do this

(1) Dimension Reduction (use k of p components)
- Note, total variability still requires p components

(2) Identify “hidden” underlying relationships (i.e. patterns in the data)
- Use these relationships in further analyses

(3) Select subsets of variables

“Exact” Principal Components

We can represent data X as linear combinations of p random measurements on j = 1, 2, …, n subjects

“Exact” Principal Components

Principal components are those linear combinations $Y_1, Y_2, \ldots, Y_p$ that are:

(1) Uncorrelated

(2) Variance as large as possible

(3) Subject to:

$$
\begin{aligned}
1^{\text{st}}\ \text{PC} &= \text{linear combo } a_1'X \text{ that maximizes } \mathrm{Var}(a_1'X) \text{ subject to } a_1'a_1 = 1 \\
2^{\text{nd}}\ \text{PC} &= \text{linear combo } a_2'X \text{ that maximizes } \mathrm{Var}(a_2'X) \text{ subject to } a_2'a_2 = 1 \text{ and } \mathrm{Cov}(a_1'X, a_2'X) = 0 \\
&\ \ \vdots \\
p^{\text{th}}\ \text{PC} &= \text{linear combo } a_p'X \text{ that maximizes } \mathrm{Var}(a_p'X) \text{ subject to } a_p'a_p = 1 \text{ and } \mathrm{Cov}(a_i'X, a_p'X) = 0 \text{ for } i < p
\end{aligned}
$$

Finding PC’s Under Constraints

• So how do we find PC’s that meet the constraints we just discussed?

• We want to maximize $\mathrm{Var}(Y_i) = \mathrm{Var}(a_i'X) = a_i'\Sigma a_i$ subject to the constraint that $a_i'a_i = 1$

• This constrained maximization problem can be done using the method of Lagrange multipliers

• Thus we want to maximize the function

$$a_i'\Sigma a_i - \lambda_i\left(a_i'a_i - 1\right)$$

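Finding PC’s Under Constraints

• Differentiate $a_i'\Sigma a_i - \lambda_i(a_i'a_i - 1)$ w.r.t. $a_i$ and set the result to zero:

$$\frac{\partial}{\partial a_i}\left[a_i'\Sigma a_i - \lambda_i\left(a_i'a_i - 1\right)\right] = 2\Sigma a_i - 2\lambda_i a_i = 0 \;\Longrightarrow\; \Sigma a_i = \lambda_i a_i$$

• So $a_i$ must be an eigenvector of $\Sigma$ with eigenvalue $\lambda_i$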

Finding PC’s Under Constraints

• But how do we choose our eigenvector (i.e. which eigenvector corresponds to which PC?)

• We can see that what we want to maximize is

$$a_i'\Sigma a_i = a_i'\lambda_i a_i = \lambda_i a_i'a_i = \lambda_i$$

• So we choose $\lambda_i$ to be as large as possible

• If $\lambda_1$ is our largest eigenvalue with corresponding eigenvector $e_1$ then the solution for our max is

$$a_1 = e_1$$

Finding PC’s Under Constraints

• Recall we had a second constraint

$$\mathrm{Cov}(Y_i, Y_k) = \mathrm{Cov}(a_i'X, a_k'X) = 0$$

• We could conduct a second Lagrangian maximization to find our second PC

• However we already know that eigenvectors of the symmetric matrix $\Sigma$ are orthogonal, so $\mathrm{Cov}(e_i'X, e_k'X) = e_i'\Sigma e_k = \lambda_k e_i'e_k = 0$ (this constraint is met)

• We choose the order of the PCs by the magnitude of the eigenvalues

“Exact” Principal Components

So we can compute the PCs from the variance matrix of X, $\mathrm{Var}(X) = \Sigma$:

1. $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p \ge 0$ eigenvalues of $\Sigma$

2. $e_1, e_2, \ldots, e_p$ corresponding eigenvectors of $\Sigma$ such that

$$\Sigma e_i = \lambda_i e_i, \qquad e_i'e_i = 1, \qquad e_i'e_j = 0 \ \ (i \ne j)$$

This yields our $i^{\text{th}}$ principal component

$$Y_i = e_i'X = e_{i1}X_1 + e_{i2}X_2 + \cdots + e_{ip}X_p$$
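The two steps above can be sketched numerically. This is a minimal illustration, assuming NumPy is available; the function name `principal_components` and the 3×3 covariance matrix are hypothetical and chosen only for the example.

```python
import numpy as np

def principal_components(sigma):
    """Eigenvalues (largest first) and unit eigenvectors (as columns) of a covariance matrix."""
    sigma = np.asarray(sigma, dtype=float)
    eigvals, eigvecs = np.linalg.eigh(sigma)   # eigh assumes a symmetric matrix, ascending order
    order = np.argsort(eigvals)[::-1]          # reorder so lambda_1 >= lambda_2 >= ...
    return eigvals[order], eigvecs[:, order]

# Hypothetical covariance matrix, for illustration only
sigma = np.array([[4.0, 2.0, 0.5],
                  [2.0, 3.0, 1.0],
                  [0.5, 1.0, 2.0]])
lam, E = principal_components(sigma)

print(lam)               # eigenvalues in decreasing order
print(E.T @ sigma @ E)   # approximately diag(lam): the components are uncorrelated
```

Column `E[:, i]` plays the role of $e_{i+1}$ (Python indexes from zero), and its sign is arbitrary: $-e_i$ is an equally valid eigenvector.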

Properties

We can also find the moments of our PC’s

$$
\begin{aligned}
\text{First PC: } Y_1 &= e_1'X = e_{11}X_1 + e_{12}X_2 + \cdots + e_{1p}X_p \\
E(Y_1) &= e_1'\mu \\
\mathrm{Var}(Y_1) &= e_1'\Sigma e_1 = \lambda_1
\end{aligned}
$$

Properties

We can also find the moments of our PC’s

$$
\begin{aligned}
k^{\text{th}}\ \text{PC: } Y_k &= e_k'X = e_{k1}X_1 + e_{k2}X_2 + \cdots + e_{kp}X_p \\
E(Y_k) &= e_k'\mu \\
\mathrm{Var}(Y_k) &= e_k'\Sigma e_k = \lambda_k \\
\mathrm{Cov}(Y_i, Y_k) &= e_i'\Sigma e_k = 0, \quad i \ne k
\end{aligned}
$$

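Properties

Normality assumption not required to find PC’s

If $X_j \sim N_p(\mu, \Sigma)$ then:

$$
Y_j = \begin{pmatrix} Y_{1j} \\ Y_{2j} \\ \vdots \\ Y_{pj} \end{pmatrix}
= \begin{pmatrix} e_1' \\ e_2' \\ \vdots \\ e_p' \end{pmatrix} X_j
= \Gamma'X_j \sim N_p\left(\Gamma'\mu, \Lambda\right)
$$

and $Y_{1j}, Y_{2j}, \ldots, Y_{pj}$ are independent, where $\Gamma = [e_1, e_2, \ldots, e_p]$ and $\Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_p)$

Total Variance:

$$
\begin{aligned}
\mathrm{trace}(\Sigma) &= \mathrm{Var}(X_1) + \mathrm{Var}(X_2) + \cdots + \mathrm{Var}(X_p) \\
&= \lambda_1 + \lambda_2 + \cdots + \lambda_p \\
&= \mathrm{Var}(Y_1) + \mathrm{Var}(Y_2) + \cdots + \mathrm{Var}(Y_p)
\end{aligned}
$$

and proportion total variance accounted for by the $k^{\text{th}}$ component:

$$\frac{\lambda_k}{\sum_{i=1}^{p}\lambda_i} = \frac{\mathrm{Var}(Y_k)}{\sum_{i=1}^{p}\mathrm{Var}(Y_i)}$$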

Principal Components

Consider data with p random measures on j = 1, 2, …, n subjects

For the jth subject we then have the random vector

$$X_j = \begin{pmatrix} X_{1j} \\ X_{2j} \\ \vdots \\ X_{pj} \end{pmatrix}, \quad j = 1, 2, \ldots, n$$

Suppose $X_j \sim N_p(\mu, \Sigma)$... if we set $p = 2$ we know what $X_j$ looks like

[Figure: $X_1$ and $X_2$ axes with $\mu_1$ and $\mu_2$ marked]

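Graphic Representation

$X_j \sim N_p(\mu, \Sigma)$, $X_j' = (X_{1j}, X_{2j}, \ldots, X_{pj})$

Density of $X$ is constant on the ellipsoid (written with $\mu = 0$; otherwise replace $X$ by $X - \mu$)

$$X'\Sigma^{-1}X = c^2$$

Recall: $Y_i = e_i'X$ and $\Sigma = \sum_{i=1}^{p}\lambda_i e_i e_i'$

Note:

$$\Sigma = P\Lambda P' \;\Rightarrow\; \Sigma^{-1} = \left(P\Lambda P'\right)^{-1} = (P')^{-1}\Lambda^{-1}P^{-1}, \text{ but } P^{-1} = P' \text{ and } (P')^{-1} = P$$

$$\Rightarrow\; \Sigma^{-1} = P\Lambda^{-1}P' = \sum_{i=1}^{p}\frac{1}{\lambda_i}e_i e_i'$$

$$X'\Sigma^{-1}X = X'\left(\sum_{i=1}^{p}\frac{1}{\lambda_i}e_i e_i'\right)X = \sum_{i=1}^{p}\frac{1}{\lambda_i}\left(e_i'X\right)^2 = \sum_{i=1}^{p}\frac{Y_i^2}{\lambda_i} = c^2$$

$$\text{and}\quad \frac{Y_1^2}{c^2\lambda_1} + \frac{Y_2^2}{c^2\lambda_2} + \cdots + \frac{Y_p^2}{c^2\lambda_p} = 1$$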
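The ellipsoid identity $X'\Sigma^{-1}X = \sum_i Y_i^2/\lambda_i$ is easy to check numerically. The sketch below assumes NumPy, takes $\mu = 0$, and uses a made-up 2×2 covariance matrix and an arbitrary point; none of these values come from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical covariance matrix, for illustration only
sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])
lam, P = np.linalg.eigh(sigma)      # columns of P are the unit eigenvectors e_i

x = rng.normal(size=2)              # an arbitrary point, with mu taken as 0
y = P.T @ x                         # y_i = e_i' x

quad_form = x @ np.linalg.inv(sigma) @ x    # x' Sigma^{-1} x
sum_scaled = np.sum(y**2 / lam)             # sum of y_i^2 / lambda_i

print(quad_form, sum_scaled)        # the two values agree up to rounding error
```

In the rotated coordinates the ellipsoid's axes line up with the $Y_i$ axes, and the half-length of axis $i$ is $c\sqrt{\lambda_i}$.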

Graphic Representation

Now suppose $(X_1, X_2)' \sim N_2(\mu, \Sigma)$

$Y_1$ axis selected to maximize variation in the scores $\sum_{j=1}^{n}\left(Y_{1j} - \bar{Y}_1\right)^2$

$Y_2$ axis must be orthogonal to $Y_1$ and maximize variation in the scores $\sum_{j=1}^{n}\left(Y_{2j} - \bar{Y}_2\right)^2$

[Figure: $X_1$ and $X_2$ axes with the rotated $Y_1$ and $Y_2$ axes overlaid]

Dimension Reduction

Proportion of total variance accounted for by the first k components is

$$\frac{\sum_{i=1}^{k}\lambda_i}{\sum_{i=1}^{p}\lambda_i}$$

If the proportion of variance accounted for by the first k principal components is large, we might want to restrict our attention to only these first k components

Keep in mind, components are simply linear combinations of the original p measurements

Ideally look for meaningful interpretations of our chosen k components
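As a small illustration of the proportion formula, the sketch below computes the cumulative proportion of variance for every k at once. It assumes NumPy; the function name `cumulative_proportion` and the five eigenvalues are hypothetical.

```python
import numpy as np

def cumulative_proportion(eigenvalues):
    """Entry k-1 is the proportion of total variance explained by the first k components."""
    lam = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]   # largest first
    return np.cumsum(lam) / lam.sum()

# Hypothetical eigenvalues, for illustration only
lam = [5.0, 2.0, 1.5, 1.0, 0.5]
cum = cumulative_proportion(lam)

for k, value in enumerate(cum, start=1):
    print(f"first {k} component(s): {value:.2f}")
```

With these made-up values the first two components already account for 70% of the total variance, and all five are needed to reach 100%.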

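PC’s from Standardized Variables

We may want to standardize our variables before finding PCs

$$Z_1 = \frac{X_1 - \mu_1}{\sqrt{\sigma_{11}}}, \qquad Z_2 = \frac{X_2 - \mu_2}{\sqrt{\sigma_{22}}}, \qquad \ldots, \qquad Z_p = \frac{X_p - \mu_p}{\sqrt{\sigma_{pp}}}$$

$$Z = \left(V^{1/2}\right)^{-1}(X - \mu), \qquad V^{1/2} = \mathrm{diag}\left(\sqrt{\sigma_{11}}, \sqrt{\sigma_{22}}, \ldots, \sqrt{\sigma_{pp}}\right)$$

$$\mathrm{Cov}(Z) = \left(V^{1/2}\right)^{-1}\Sigma\left(V^{1/2}\right)^{-1} = \rho = \begin{pmatrix} 1 & \rho_{12} & \cdots & \rho_{1p} \\ \rho_{12} & 1 & \cdots & \rho_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ \rho_{1p} & \rho_{2p} & \cdots & 1 \end{pmatrix}, \qquad \rho_{ij} = \frac{\sigma_{ij}}{\sqrt{\sigma_{ii}\sigma_{jj}}}$$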

PC’s from Standardized Variables

So the covariance of Z equals the correlation of X

We can define our PC’s for Z the same way as before….

$$Z = \left(V^{1/2}\right)^{-1}(X - \mu)$$

$$i^{\text{th}}\ \text{PC: } Y_i = e_i'Z = e_i'\left(V^{1/2}\right)^{-1}(X - \mu)$$

but now $\lambda_i$ and $e_i$ are the eigenvalues/vectors for $\rho$

because they are standardized:

$$\mathrm{Var}(Z_i) = 1 \quad\text{and}\quad \sum_{i=1}^{p}\mathrm{Var}(Y_i) = \sum_{i=1}^{p}\mathrm{Var}(Z_i) = p$$

Compare Standardized/Non-standardized PCs

Non-standardized:

$$\Sigma = \begin{pmatrix} 1 & 4 \\ 4 & 100 \end{pmatrix}$$

$$\lambda_1 = 100.16,\ e_1' = (0.04,\ 0.999) \qquad \lambda_2 = 0.84,\ e_2' = (0.999,\ -0.04)$$

Proportion of variance explained by the first PC: $\dfrac{\lambda_1}{\lambda_1 + \lambda_2} = \dfrac{100.16}{101} = 0.992$

Standardized:

$$\rho = \begin{pmatrix} 1 & 0.4 \\ 0.4 & 1 \end{pmatrix}$$

$$\lambda_1 = 1.4,\ e_1' = (0.707,\ 0.707) \qquad \lambda_2 = 0.6,\ e_2' = (0.707,\ -0.707)$$

Proportion of variance explained by the first PC: $\dfrac{\lambda_1}{\lambda_1 + \lambda_2} = \dfrac{1.4}{2} = 0.70$
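The comparison can be reproduced from the covariance matrix on the slide. The sketch below assumes NumPy; the helper name `sorted_eig` is hypothetical. It derives $\rho$ from $\Sigma$ and decomposes both.

```python
import numpy as np

sigma = np.array([[1.0, 4.0],
                  [4.0, 100.0]])           # covariance matrix from the slide

# rho = V^{-1/2} Sigma V^{-1/2}: divide each entry by the two standard deviations
sd = np.sqrt(np.diag(sigma))
rho = sigma / np.outer(sd, sd)

def sorted_eig(m):
    """Eigenvalues (largest first) and matching unit eigenvectors as columns."""
    vals, vecs = np.linalg.eigh(m)
    order = np.argsort(vals)[::-1]
    return vals[order], vecs[:, order]

lam_cov, e_cov = sorted_eig(sigma)
lam_cor, e_cor = sorted_eig(rho)

prop_cov = lam_cov[0] / lam_cov.sum()      # share of variance of the first PC, raw scale
prop_cor = lam_cor[0] / lam_cor.sum()      # same, after standardizing

print(np.round(lam_cov, 2), round(prop_cov, 3))
print(np.round(lam_cor, 2), round(prop_cor, 3))
```

On the raw scale the first component is dominated by the second variable, whose variance is 100; after standardizing, both variables contribute equally. Eigenvector signs returned by the routine may be flipped relative to the slide, which does not change the components' variances.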

Estimation

In general we do not know what $\Sigma$ is; we must estimate it from the sample

So what are our estimated principal components?

Assume we have a random sample: $x_1, x_2, \ldots, x_n$

We can use:

$$S = \frac{1}{n-1}\sum_{j=1}^{n}\left(x_j - \bar{x}\right)\left(x_j - \bar{x}\right)'$$

Eigenvalues for $S$:

$$\hat\lambda_1 \ge \hat\lambda_2 \ge \cdots \ge \hat\lambda_p \quad (\text{consistent estimators of } \lambda_1, \lambda_2, \ldots, \lambda_p)$$

Eigenvectors for $S$:

$$\hat e_1, \hat e_2, \ldots, \hat e_p \quad (\text{consistent estimators of } e_1, e_2, \ldots, e_p)$$

$i^{\text{th}}$ principal component:

$$\hat y_i = \hat e_i'x$$
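A minimal sketch of the estimation steps, assuming NumPy. The data here are simulated (n = 200, p = 3, arbitrary mixing matrix `A`) purely so the example runs; they are not the lecture's data.

```python
import numpy as np

rng = np.random.default_rng(726)

# Hypothetical sample: n = 200 subjects, p = 3 correlated measurements
n, p = 200, 3
A = rng.normal(size=(p, p))                 # arbitrary mixing matrix
X = rng.normal(size=(n, p)) @ A.T           # row j holds the observation x_j'

S = np.cov(X, rowvar=False)                 # sample covariance, divisor n - 1

lam_hat, E_hat = np.linalg.eigh(S)          # eigh: symmetric input, ascending order
order = np.argsort(lam_hat)[::-1]
lam_hat, E_hat = lam_hat[order], E_hat[:, order]

# Estimated scores: Y_hat[j, i] = e_hat_i' x_j
Y_hat = X @ E_hat

print(lam_hat)
print(Y_hat[:3])                            # scores for the first three subjects
```

`np.cov` with `rowvar=False` treats columns as variables and uses the same $n-1$ divisor as the formula for $S$ above.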

Sample Properties

In general we do not know what $\Sigma$ is; estimate it from the sample

So what are our estimated principal components?

1. Estimated variance of $\hat y_i$:

$$\frac{1}{n-1}\sum_{j=1}^{n}\left(\hat y_{ij} - \bar{\hat y}_i\right)^2 = \hat\lambda_i$$

2. Sample covariance and correlations for $\hat y_i, \hat y_k$:

$$\widehat{\mathrm{Cov}}\left(\hat y_i, \hat y_k\right) = 0, \quad i \ne k$$

3. Proportion total variance accounted for by $\hat y_k$:

$$\frac{\hat\lambda_k}{\hat\lambda_1 + \hat\lambda_2 + \cdots + \hat\lambda_p}$$

4. Estimated correlation for $\hat y_i, x_k$:

$$r_{\hat y_i, x_k} = \frac{\sum_{j=1}^{n}\left(\hat y_{ij} - \bar{\hat y}_i\right)\left(x_{kj} - \bar x_k\right)}{\sqrt{\sum_{j=1}^{n}\left(\hat y_{ij} - \bar{\hat y}_i\right)^2}\sqrt{\sum_{j=1}^{n}\left(x_{kj} - \bar x_k\right)^2}} = \frac{\hat e_{ik}\sqrt{\hat\lambda_i}}{\sqrt{s_{kk}}}$$
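Properties 1, 2 and 4 can be checked on simulated data. This is a sketch assuming NumPy; the data-generating choices are arbitrary and only serve to produce correlated columns.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: n = 500 subjects, p = 3 correlated measurements
n, p = 500, 3
X = rng.normal(size=(n, p)) @ rng.normal(size=(p, p))

S = np.cov(X, rowvar=False)
lam_hat, E_hat = np.linalg.eigh(S)
order = np.argsort(lam_hat)[::-1]
lam_hat, E_hat = lam_hat[order], E_hat[:, order]
Y_hat = X @ E_hat                           # column i holds the scores of PC i+1

# Properties 1 and 2: the sample covariance of the scores is diag(lam_hat)
S_y = np.cov(Y_hat, rowvar=False)

# Property 4, closed form: r[i, k] = e_hat_ik * sqrt(lam_hat_i) / sqrt(s_kk)
r_formula = E_hat.T * np.sqrt(lam_hat)[:, None] / np.sqrt(np.diag(S))[None, :]

# Property 4, directly: correlations between each score column and each x column
r_direct = np.corrcoef(Y_hat, X, rowvar=False)[:p, p:]

print(np.round(S_y, 6))
print(np.round(r_formula - r_direct, 10))   # approximately zero everywhere
```

The closed form follows because the sample covariance between $\hat y_i$ and $x_k$ is $\hat\lambda_i \hat e_{ik}$, which is then divided by the two standard deviations $\sqrt{\hat\lambda_i}$ and $\sqrt{s_{kk}}$.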

Centering

We often center our observations before defining our PCs

The centered PCs are found according to:

$$\hat y_i = \hat e_i'\left(x - \bar x\right), \quad i = 1, 2, \ldots, p$$

$$\hat y_{ij} = \hat e_i'\left(x_j - \bar x\right), \quad j = 1, 2, \ldots, n$$

$$\bar{\hat y}_i = \frac{1}{n}\sum_{j=1}^{n}\hat e_i'\left(x_j - \bar x\right) = \frac{1}{n}\hat e_i'\sum_{j=1}^{n}\left(x_j - \bar x\right) = \frac{1}{n}\hat e_i'\,0 = 0$$

Example

Jolicoeur and Mosimann (1960) conducted a study looking at the relationship between size and shape of painted turtle carapaces.

We can develop PC’s for natural log of length, width, and height of female turtles’ carapaces

$$x_j = \begin{pmatrix} x_{1j} \\ x_{2j} \\ x_{3j} \end{pmatrix} = \begin{pmatrix} \text{log carapace length} \\ \text{log carapace width} \\ \text{log carapace height} \end{pmatrix}, \qquad S = \begin{pmatrix} .0264 & .0201 & .0249 \\ .0201 & .0162 & .0194 \\ .0249 & .0194 & .0249 \end{pmatrix}$$

$$\hat e_1 = \begin{pmatrix} .627 \\ .488 \\ .608 \end{pmatrix}, \qquad \hat e_2 = \begin{pmatrix} .553 \\ .272 \\ -.788 \end{pmatrix}, \qquad \hat e_3 = \begin{pmatrix} -.550 \\ .830 \\ -.099 \end{pmatrix}$$

$$\hat\lambda' = \begin{pmatrix} .06623 & .00077 & .00054 \end{pmatrix}$$

Example

The first PC is:

$$\hat y_1 = \hat e_1'x = 0.627\,\log(\text{length}) + 0.488\,\log(\text{width}) + 0.608\,\log(\text{height})$$

This might be interpreted as an overall size component

Small values of $\hat y_1$: shell dimensions small

Large values of $\hat y_1$: shell dimensions large

Example

The second PC is:

$$\hat y_2 = \hat e_2'x = 0.553\,\log(\text{length}) + 0.272\,\log(\text{width}) - 0.788\,\log(\text{height})$$

Emphasizes contrast between length and height of the shell

[Figure: shells with small values of $\hat y_2$ and with large values of $\hat y_2$]

Example

The third PC is:

$$\hat y_3 = \hat e_3'x = -0.550\,\log(\text{length}) + 0.830\,\log(\text{width}) - 0.099\,\log(\text{height})$$

Emphasizes contrast between width and length of the shell

[Figure: shells with small values of $\hat y_3$ and with large values of $\hat y_3$]

Example

Consider the proportion of variability accounted for by each PC

$$\hat\lambda' = \begin{pmatrix} .06623 & .00077 & .00054 \end{pmatrix}$$
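The proportions follow directly from the three eigenvalues on the slide. A short sketch, assuming NumPy:

```python
import numpy as np

lam_hat = np.array([0.06623, 0.00077, 0.00054])   # eigenvalues from the slide

proportion = lam_hat / lam_hat.sum()              # share of total variance per PC
cumulative = np.cumsum(proportion)                # share for the first k PCs

for i, (prop, cum) in enumerate(zip(proportion, cumulative), start=1):
    print(f"PC{i}: proportion = {prop:.3f}, cumulative = {cum:.3f}")
```

The first component alone accounts for roughly 98% of the total sample variance in the log measurements, which is why the second and third components contribute so little.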

Example

How are the PCs correlated with each of the x’s?

$$r_{\hat y_i, x_j} = \frac{\hat e_{ij}\sqrt{\hat\lambda_i}}{\sqrt{s_{jj}}}$$

Then

$$r_{\hat y_1, x_1} = \frac{\hat e_{11}\sqrt{\hat\lambda_1}}{\sqrt{s_{11}}} = \frac{0.627\sqrt{0.06623}}{\sqrt{0.0264}} = 0.99$$

Trait    $\hat y_1$    $\hat y_2$    $\hat y_3$
$x_1$    0.99     0.09    -0.08
$x_2$    0.99     0.06     0.15
$x_3$    0.99    -0.14    -0.01
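The whole correlation table can be rebuilt from the quantities given in the example. The sketch below assumes NumPy. One caveat: the minus signs on the eigenvector entries are an assumption here, inferred from the signs in the correlation table and from the "contrast" interpretations, and $s_{11} = 0.0264$ is the value used in the slide's worked calculation.

```python
import numpy as np

lam_hat = np.array([0.06623, 0.00077, 0.00054])   # eigenvalues from the example
s_diag = np.array([0.0264, 0.0162, 0.0249])       # s_11, s_22, s_33

# Rows are e_hat_1, e_hat_2, e_hat_3; the signs are an assumption (see the note above)
E_rows = np.array([[ 0.627, 0.488,  0.608],
                   [ 0.553, 0.272, -0.788],
                   [-0.550, 0.830, -0.099]])

# r[j, i] = e_hat_ij * sqrt(lam_hat_i) / sqrt(s_jj)
# rows index the traits x_1..x_3, columns index the components y_1..y_3
r = (E_rows * np.sqrt(lam_hat)[:, None]).T / np.sqrt(s_diag)[:, None]

print(np.round(r, 2))
```

All three traits correlate at about 0.99 with the first component, which supports reading it as overall size.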

Interpretation of PCs

Consider data x1, x2, …., xp:

Let: $\hat y_i = \hat e_i'\left(x - \bar x\right)$, with $\widehat{\mathrm{Var}}\left(\hat y_i\right) = \hat\lambda_i$, $i = 1, \ldots, p$

Consider the contour:

$$\left(x - \bar x\right)'S^{-1}\left(x - \bar x\right) = c^2$$

This contour mimics the density of $N_p(\mu, \Sigma)$,

$$\hat y_i = \hat e_i'\left(x - \bar x\right) = \text{length of projection of } \left(x - \bar x\right) \text{ in direction of } \hat e_i$$

PCs are actually projections onto the estimated eigenvectors

- 1st PC is the one with the largest projection
- For data reduction, only use PCA if the eigenvalues vary
- If x’s are uncorrelated, we can’t really do data reduction

Choosing Number of PCs

Often the goal of PCA is dimension reduction of data

Select a limited number of PCs that capture majority of the variability in the data

How do we decide how many PCs to include:

1. Scree plot: plot of $\hat\lambda_i$ versus $i$

2. Select all PCs with $\hat\lambda_i \ge 1$ (for standardized observations)

3. Choose some proportion of the variance you want to account for
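The three rules can be sketched side by side. The example assumes NumPy; the six eigenvalues are hypothetical values for a correlation matrix (they sum to p = 6), and the 80% target in rule 3 is an arbitrary choice.

```python
import numpy as np

# Hypothetical eigenvalues of a 6 x 6 correlation matrix (they sum to p = 6)
lam_hat = np.array([2.9, 1.4, 0.8, 0.5, 0.3, 0.1])

# Rule 1: scree plot data, pairs (i, lam_hat_i); look for the "elbow" by eye
scree_points = list(enumerate(lam_hat.tolist(), start=1))

# Rule 2: keep components with lam_hat_i >= 1 (standardized variables only)
k_eigen = int(np.sum(lam_hat >= 1.0))

# Rule 3: smallest k whose cumulative proportion reaches the chosen target
target = 0.80
cum = np.cumsum(lam_hat) / lam_hat.sum()
k_prop = int(np.argmax(cum >= target)) + 1

print(scree_points)
print("eigenvalue rule keeps", k_eigen, "components")
print("proportion rule keeps", k_prop, "components")
```

The rules need not agree: with these made-up values the eigenvalue rule keeps two components while the 80% rule keeps three, so the choice still calls for judgment.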


Scree Plots

Choosing Number of PCs

Should principal components that only account for a small proportion of variance always be ignored?

Not necessarily, they may indicate near-perfect collinearities among traits

In the turtle example, this is true: very little of the variation in shell measurements can be attributed to the 2nd and 3rd components

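Large Sample Properties

If n is large, there are nice properties we can use

$$S = \frac{1}{n-1}\sum_{j=1}^{n}\left(x_j - \bar x\right)\left(x_j - \bar x\right)' \quad\text{with}\quad \hat\lambda' = \left(\hat\lambda_1, \hat\lambda_2, \ldots, \hat\lambda_p\right)$$

$$\text{For large } n:\quad \sqrt{n}\left(\hat\lambda - \lambda\right) \xrightarrow{D} N_p\left(0, 2\Lambda^2\right)$$

where:

$$\Lambda^2 = \mathrm{diag}\left(\lambda_1^2, \lambda_2^2, \ldots, \lambda_p^2\right)$$

a. estimated eigenvalues for $\Sigma$ are asymptotically independent

b. distribution of $\hat\lambda_i \sim N\left(\lambda_i, \dfrac{2\lambda_i^2}{n}\right)$

c. an approximate $(1-\alpha)$ CI for $\lambda_i$ is:

$$\frac{\hat\lambda_i}{1 + z_{\alpha/2}\sqrt{2/n}} \le \lambda_i \le \frac{\hat\lambda_i}{1 - z_{\alpha/2}\sqrt{2/n}}$$

d. alternative approximation: $\ln\hat\lambda_i \sim N\left(\ln\lambda_i, \dfrac{2}{n}\right)$, giving

$$\hat\lambda_i e^{-z_{\alpha/2}\sqrt{2/n}} \le \lambda_i \le \hat\lambda_i e^{z_{\alpha/2}\sqrt{2/n}}$$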
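Both interval formulas, (c) and (d), are simple to evaluate. The sketch below uses only the Python standard library; the function name `eigenvalue_ci` and the inputs ($\hat\lambda_i = 5.0$, $n = 200$) are hypothetical.

```python
from math import exp, sqrt
from statistics import NormalDist

def eigenvalue_ci(lam_hat, n, alpha=0.05):
    """Approximate large-sample (1 - alpha) intervals for one eigenvalue.

    Returns the interval from (c) and the log-scale interval from (d).
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)     # upper alpha/2 standard normal quantile
    half = z * sqrt(2 / n)
    if half >= 1:
        raise ValueError("n is too small for the interval in (c)")
    ci_c = (lam_hat / (1 + half), lam_hat / (1 - half))
    ci_d = (lam_hat * exp(-half), lam_hat * exp(half))
    return ci_c, ci_d

# Hypothetical values: lam_hat = 5.0 estimated from n = 200 observations
ci_c, ci_d = eigenvalue_ci(5.0, 200)
print("interval (c):", ci_c)
print("interval (d):", ci_d)
```

Neither interval is symmetric about $\hat\lambda_i$, and both rely on the normality assumption and a large n; the guard on `half` reflects that the upper limit in (c) is undefined when $z_{\alpha/2}\sqrt{2/n} \ge 1$.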

Large Sample Properties

Also for our estimated eigenvectors

$$\text{1. For large } n:\quad \sqrt{n}\left(\hat e_i - e_i\right) \xrightarrow{D} N_p\left(0, E_i\right)$$

where:

$$E_i = \lambda_i\sum_{k=1,\, k \ne i}^{p}\frac{\lambda_k}{\left(\lambda_k - \lambda_i\right)^2}e_k e_k'$$

2. For large $n$, $\hat\lambda_i$ is approximately independent of the distribution for $\hat e_i$

These results assume that $X_1, X_2, \ldots, X_n$ are $N_p(\mu, \Sigma)$
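The matrix $E_i$ can be computed directly from an eigendecomposition. This is a sketch assuming NumPy and distinct eigenvalues; the function name `eigenvector_asymptotic_cov` is hypothetical, the index `i` is zero-based, and the matrix used is the 2×2 covariance matrix from the standardized versus non-standardized comparison.

```python
import numpy as np

def eigenvector_asymptotic_cov(sigma, i):
    """E_i = lambda_i * sum over k != i of lambda_k / (lambda_k - lambda_i)^2 * e_k e_k'.

    Assumes distinct eigenvalues. i is zero-based, so i = 0 is the first PC.
    Returns (E_i, e_i).
    """
    sigma = np.asarray(sigma, dtype=float)
    lam, E = np.linalg.eigh(sigma)
    order = np.argsort(lam)[::-1]
    lam, E = lam[order], E[:, order]
    total = np.zeros(sigma.shape)
    for k in range(len(lam)):
        if k != i:
            total += lam[k] / (lam[k] - lam[i]) ** 2 * np.outer(E[:, k], E[:, k])
    return lam[i] * total, E[:, i]

sigma = np.array([[1.0, 4.0],
                  [4.0, 100.0]])
E_1, e_1 = eigenvector_asymptotic_cov(sigma, 0)
print(E_1)
print(E_1 @ e_1)    # approximately zero: no asymptotic variance along e_1 itself
```

Because each $e_k$ with $k \ne i$ is orthogonal to $e_i$, the matrix $E_i$ has rank at most $p - 1$, and the spread of $\hat e_i$ grows when another eigenvalue is close to $\lambda_i$.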

Summary

Principal component analysis is most useful for dimensionality reduction

Can also be used for identifying collinear variables

Note, use of PCA in a regression setting is therefore one way to handle multicollinearity

A caveat… principal components can be difficult to interpret and should therefore be used with caution