Math 5364/66 Notes: Principal Components and Factor Analysis in SAS
Jesse Crawford
Department of Mathematics, Tarleton State University
Setting for Principal Components
Random vector $X = (X_1, \ldots, X_p)'$, taking values in $\mathbb{R}^p$
Typical Coordinate System
Principal Components
Relation to Eigenvectors
• Let $\Sigma = \operatorname{cov}(X)$
• Suppose $\lambda_1 \geq \cdots \geq \lambda_p \geq 0$ are the eigenvalues of $\Sigma$
• Let $a_1, \ldots, a_p$ be corresponding orthonormal eigenvectors
• Then $a_1, \ldots, a_p$ are the principal components
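A short derivation may help explain why eigenvectors appear. It assumes the usual definition, which the slides do not spell out: the first principal component is the unit vector $a$ that maximizes $\operatorname{Var}(a'X)$. The Lagrangian $\mathcal{L}$ and multiplier $\lambda$ are notation introduced only for this sketch.

```latex
\begin{align*}
\operatorname{Var}(a'X) &= a' \Sigma a, \\
\mathcal{L}(a, \lambda) &= a' \Sigma a - \lambda\,(a'a - 1), \\
\nabla_a \mathcal{L} = 2\Sigma a - 2\lambda a = 0 &\iff \Sigma a = \lambda a, \\
\operatorname{Var}(a'X) &= a' \Sigma a = \lambda\, a'a = \lambda .
\end{align*}
```

Under that definition the maximizer is a unit eigenvector for the largest eigenvalue $\lambda_1$, and the variance it captures equals $\lambda_1$. Repeating the argument subject to orthogonality with the earlier components gives $a_2, \ldots, a_p$.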
Implementation in R
Simulating the Data in SAS
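$$\begin{pmatrix} X_1 \\ X_2 \end{pmatrix} = \begin{pmatrix} 5 & 0 \\ 2 & 0.4 \end{pmatrix} \begin{pmatrix} Z_1 \\ Z_2 \end{pmatrix}, \qquad \operatorname{cov}(Z) = I$$

$$\operatorname{cov}(X) = \begin{pmatrix} 5 & 0 \\ 2 & 0.4 \end{pmatrix} \operatorname{cov}(Z) \begin{pmatrix} 5 & 2 \\ 0 & 0.4 \end{pmatrix} = \begin{pmatrix} 5 & 0 \\ 2 & 0.4 \end{pmatrix} \begin{pmatrix} 5 & 2 \\ 0 & 0.4 \end{pmatrix} = \begin{pmatrix} 25 & 10 \\ 10 & 4.16 \end{pmatrix}$$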
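The calculation $\operatorname{cov}(X) = AA'$ for $X = AZ$ can be checked numerically. The sketch below is in Python rather than SAS and is only an illustration: it assumes $Z_1, Z_2$ are independent standard normals, and the sample size, seed, and function names are choices made here, not taken from the slides.

```python
import random

# Mixing matrix from the slide: X1 = 5*Z1, X2 = 2*Z1 + 0.4*Z2.
A = [[5.0, 0.0],
     [2.0, 0.4]]


def simulate(n, seed=0):
    """Draw n observations of X = A Z, with Z1, Z2 assumed iid N(0, 1)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        data.append((A[0][0] * z1 + A[0][1] * z2,
                     A[1][0] * z1 + A[1][1] * z2))
    return data


def sample_cov(data):
    """Unbiased 2x2 sample covariance matrix of a list of (x1, x2) pairs."""
    n = len(data)
    m1 = sum(x for x, _ in data) / n
    m2 = sum(y for _, y in data) / n
    s11 = sum((x - m1) ** 2 for x, _ in data) / (n - 1)
    s22 = sum((y - m2) ** 2 for _, y in data) / (n - 1)
    s12 = sum((x - m1) * (y - m2) for x, y in data) / (n - 1)
    return [[s11, s12], [s12, s22]]


# Theoretical covariance A A' (valid because cov(Z) = I).
theoretical = [[sum(A[i][m] * A[j][m] for m in range(2)) for j in range(2)]
               for i in range(2)]

S = sample_cov(simulate(50_000))
print("theoretical:", theoretical)
print("sample:     ", S)
```

With a large sample the printed sample covariance should be close to the theoretical matrix, up to sampling error.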
Covariance Matrix in SAS
$$\operatorname{cov}(X) = \begin{pmatrix} 25 & 10 \\ 10 & 4.16 \end{pmatrix}$$
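For a 2×2 covariance matrix the principal components can be worked out in closed form. The following Python sketch (not SAS) does this for the matrix with entries 25, 10, 10, 4.16; the helper name `eig_sym_2x2` is hypothetical, and the formula assumes a nonzero off-diagonal entry.

```python
import math


def eig_sym_2x2(a, b, d):
    """Eigen-decomposition of the symmetric matrix [[a, b], [b, d]], b != 0.

    Returns [(lambda1, v1), (lambda2, v2)] with lambda1 >= lambda2 and
    unit-length eigenvectors v1, v2.
    """
    mean = (a + d) / 2.0
    radius = math.hypot((a - d) / 2.0, b)
    pairs = []
    for lam in (mean + radius, mean - radius):
        # (b, lam - a) solves (Sigma - lam*I) v = 0 whenever b != 0.
        vx, vy = b, lam - a
        norm = math.hypot(vx, vy)
        pairs.append((lam, (vx / norm, vy / norm)))
    return pairs


(lam1, a1), (lam2, a2) = eig_sym_2x2(25.0, 10.0, 4.16)
total = lam1 + lam2  # equals the trace, i.e. the total variance 29.16

print("eigenvalues:", lam1, lam2)
print("first principal component:", a1)
print("proportion of variance explained by PC1:", lam1 / total)
```

The two eigenvalues sum to the trace (29.16) and multiply to the determinant (4), so the first component carries nearly all of the variance in this example.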
Principal Components in SAS
Inputting a Covariance Matrix Manually
PCA Using Original Data
Example: Math and Reading Exams
Example: Adelges (Winged Aphids)
• 19 variables
• 4 principal components needed to explain 90% of the total variation
• PCA can be used to reduce dimensionality
PCA Summary
• $p$-dimensional random vector $X$
• Covariance matrix $\Sigma$
• Principal components are simply an orthonormal eigenbasis of $\Sigma$
• Dimensionality reduction is achieved by dropping components with small eigenvalues
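One common way to decide how many components to keep is to retain the smallest number whose eigenvalues reach a chosen share of the total variance. The Python sketch below illustrates that rule; the function name and the eigenvalues are hypothetical and are not the Adelges values.

```python
def components_needed(eigenvalues, threshold=0.90):
    """Smallest number of leading components whose eigenvalues account for
    at least `threshold` of the total variance (the sum of all eigenvalues)."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    running = 0.0
    for count, lam in enumerate(ordered, start=1):
        running += lam
        if running >= threshold * total:
            return count
    return len(ordered)


# Hypothetical eigenvalues, NOT the Adelges values from the slides.
example = [6.0, 2.0, 1.2, 0.4, 0.3, 0.1]
print(components_needed(example, 0.90))
```

Here the first three eigenvalues account for 9.2 of a total of 10, so three components pass the 90% threshold.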
Setting for Factor Analysis
• Random vector $X = (X_1, \ldots, X_p)'$
• Example from Spearman (1904). Exam scores for 33 students.
  $X = (\text{Classics}, \text{French}, \text{English}, \text{Math}, \text{Pitch}, \text{Music})'$
• Idea: Explain the variation in $X$ with a random vector $f = (f_1, \ldots, f_k)'$ via a regression equation

$$\begin{aligned}
X_1 &= \mu_1 + l_{11} f_1 + \cdots + l_{1k} f_k + \epsilon_1 \\
X_2 &= \mu_2 + l_{21} f_1 + \cdots + l_{2k} f_k + \epsilon_2 \\
&\;\;\vdots \\
X_p &= \mu_p + l_{p1} f_1 + \cdots + l_{pk} f_k + \epsilon_p
\end{aligned}$$
In each equation $X_i = \mu_i + l_{i1} f_1 + \cdots + l_{ik} f_k + \epsilon_i$:
• $X_i$: observed data (random, observable)
• $\mu_i$: intercept term (constant)
• $l_{ij}$: factor loadings (constant)
• $f_j$: common factors (random, unobservable)
• $\epsilon_i$: specific factors (random, unobservable)
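$$\begin{aligned}
X_1 &= \mu_1 + l_{11} f_1 + \cdots + l_{1k} f_k + \epsilon_1 \\
X_2 &= \mu_2 + l_{21} f_1 + \cdots + l_{2k} f_k + \epsilon_2 \\
&\;\;\vdots \\
X_p &= \mu_p + l_{p1} f_1 + \cdots + l_{pk} f_k + \epsilon_p
\end{aligned}$$

In matrix form: $X = \mu + Lf + \epsilon$
• $X$ is a $p$-dimensional random vector
• $\mu \in \mathbb{R}^p$
• $L \in \mathbb{R}^{p \times k}$
• $f$ is a $k$-dimensional random vector
• $\epsilon$ is a $p$-dimensional random vector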
Assumptions for the model $X = \mu + Lf + \epsilon$:
• $E(f) = 0$
• $E(\epsilon) = 0$
• $\operatorname{cov}(f, \epsilon) = 0$
• $\operatorname{cov}(f) = I_k$
• $\operatorname{cov}(\epsilon) = \Psi = \operatorname{diag}(\psi_1, \ldots, \psi_p)$, with each $\psi_i \geq 0$
• $\operatorname{cov}(X) = LL' + \Psi$
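The identity $\operatorname{cov}(X) = LL' + \Psi$ follows from the model $X = \mu + Lf + \epsilon$ and the assumptions $\operatorname{cov}(f) = I_k$, $\operatorname{cov}(f, \epsilon) = 0$, and $\operatorname{cov}(\epsilon) = \Psi$. A sketch of the steps, using that the constant $\mu$ does not affect covariances:

```latex
\begin{align*}
\operatorname{cov}(X)
  &= \operatorname{cov}(Lf + \epsilon) \\
  &= L \operatorname{cov}(f) L' + L \operatorname{cov}(f, \epsilon)
     + \operatorname{cov}(\epsilon, f) L' + \operatorname{cov}(\epsilon) \\
  &= L I_k L' + 0 + 0 + \Psi \\
  &= LL' + \Psi .
\end{align*}
```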
Under the model $X = \mu + Lf + \epsilon$ with $\Sigma = \operatorname{cov}(X) = LL' + \Psi$:

$$\operatorname{Var}(X_i) = \sigma_{ii} = l_{i1}^2 + \cdots + l_{ik}^2 + \psi_i = h_i^2 + \psi_i$$

• $h_i^2 = l_{i1}^2 + \cdots + l_{ik}^2$: communality, or common variance
• $\psi_i$: uniqueness, or specific variance
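The split of each variance into communality and uniqueness is easy to compute once a loading matrix is available. The Python sketch below illustrates it; the function names and the loading matrix are hypothetical and are not estimates from the Spearman data.

```python
def communalities(loadings):
    """Communality h_i^2 = sum_j l_ij^2 for each row i of the loading matrix."""
    return [sum(l * l for l in row) for row in loadings]


def uniquenesses(loadings, variances):
    """Specific variance psi_i = Var(X_i) - h_i^2 for each variable."""
    return [v - h for v, h in zip(variances, communalities(loadings))]


# Hypothetical loadings for p = 3 variables and k = 2 factors.
L = [[0.8, 0.3],
     [0.7, -0.2],
     [0.5, 0.6]]

h2 = communalities(L)
# With a correlation matrix every Var(X_i) is 1, so psi_i = 1 - h_i^2.
psi = uniquenesses(L, [1.0, 1.0, 1.0])
print("communalities:", h2)
print("uniquenesses: ", psi)
```

For the first variable, for instance, $h_1^2 = 0.8^2 + 0.3^2 = 0.73$, which leaves a specific variance of $0.27$.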
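$X = \mu + Lf + \epsilon$
$\operatorname{Var}(X_i) = h_i^2 + \psi_i$
If $\Sigma = \operatorname{corr}(X)$, then $h_i^2 + \psi_i = 1$
[SAS output annotated with the estimates $\hat{L}$ and $\hat{h}_i^2$]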
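$X_i = \mu_i + l_{i1} f_1 + \cdots + l_{ik} f_k + \epsilon_i$
$\operatorname{cov}(X_i, f_j) = l_{ij}$
If $\Sigma = \operatorname{corr}(X)$, then $\operatorname{corr}(X_i, f_j) = l_{ij}$
[SAS output: correlations between the $X_i$'s and the factors $f_j$]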
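The interpretation of the loadings as covariances follows from $\operatorname{cov}(f) = I_k$ and $\operatorname{cov}(f, \epsilon) = 0$. A sketch of the steps, with $m$ used here as a summation index:

```latex
\begin{align*}
\operatorname{cov}(X_i, f_j)
  &= \operatorname{cov}\Big(\mu_i + \sum_{m=1}^{k} l_{im} f_m + \epsilon_i,\; f_j\Big) \\
  &= \sum_{m=1}^{k} l_{im} \operatorname{cov}(f_m, f_j)
     + \operatorname{cov}(\epsilon_i, f_j) \\
  &= l_{ij}, \\
\operatorname{corr}(X_i, f_j)
  &= \frac{l_{ij}}{\sqrt{\operatorname{Var}(X_i)}\,\sqrt{\operatorname{Var}(f_j)}}
   = l_{ij} \quad \text{when } \operatorname{Var}(X_i) = 1 .
\end{align*}
```

The last line uses $\operatorname{Var}(f_j) = 1$, and $\operatorname{Var}(X_i) = 1$ holds when the analysis is run on the correlation matrix.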
Principal Component Method for Factor Analysis
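$X = \mu + Lf + \epsilon$, so $\Sigma = LL' + \Psi$
$\Sigma = A \Lambda A'$, where $A = (a_1, \ldots, a_p)$ is orthogonal and $\Lambda = \operatorname{diag}(\lambda_1, \ldots, \lambda_p)$
Define $\hat{L} = (\sqrt{\lambda_1}\, a_1, \ldots, \sqrt{\lambda_k}\, a_k)$
Define $\hat{\psi}_{ii} = \sigma_{ii} - (\hat{L}\hat{L}')_{ii}$
$\Sigma \approx \hat{L}\hat{L}' + \hat{\Psi}$
$\text{Res} = \Sigma - (\hat{L}\hat{L}' + \hat{\Psi})$

$i$th column of $\hat{L}$ $= \sqrt{\lambda_i}\, a_i$, where $a_i$ is the $i$th principal component of $\Sigma$ and $\lambda_i$ is the $i$th eigenvalue

$$\text{Res} = \begin{pmatrix}
0 & \text{res}_{12} & \cdots & \text{res}_{1p} \\
\text{res}_{21} & 0 & \cdots & \text{res}_{2p} \\
\vdots & \vdots & \ddots & \vdots \\
\text{res}_{p1} & \text{res}_{p2} & \cdots & 0
\end{pmatrix}$$

[SAS output annotated with $\hat{L}$ and $\hat{h}_i^2$. Diagonal entries: $\hat{\psi}_{ii}$'s. Off-diagonal entries: $\text{res}_{ij}$'s]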
Rule of thumb:
If the RMS of the off-diagonal residuals satisfies $\text{RMS} < 0.05$, then the model is acceptable
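The principal component method and the RMS check can be sketched for the simplest case of a single factor ($k = 1$). The Python code below is an illustration, not the SAS procedure. It assumes a symmetric matrix with a dominant positive eigenvalue, uses power iteration as a stand-in for a full eigen-decomposition, and the function names and the 3×3 correlation matrix are hypothetical.

```python
import math


def top_eigenpair(S, iters=500):
    """Largest eigenvalue and unit eigenvector of a symmetric matrix with a
    dominant positive eigenvalue, found by power iteration."""
    p = len(S)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iters):
        w = [sum(S[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    Sv = [sum(S[i][j] * v[j] for j in range(p)) for i in range(p)]
    lam = sum(v[i] * Sv[i] for i in range(p))  # Rayleigh quotient
    return lam, v


def pc_factor_one(S):
    """Principal component method with k = 1 factor.

    Returns the loadings L_hat (the single column of the loading matrix),
    the specific variances psi_hat, the residual matrix, and the RMS of the
    off-diagonal residuals.
    """
    p = len(S)
    lam, a = top_eigenpair(S)
    L_hat = [math.sqrt(lam) * x for x in a]
    psi_hat = [S[i][i] - L_hat[i] ** 2 for i in range(p)]
    res = [[S[i][j] - L_hat[i] * L_hat[j] - (psi_hat[i] if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    off = [res[i][j] for i in range(p) for j in range(p) if i != j]
    rms = math.sqrt(sum(r * r for r in off) / len(off))
    return L_hat, psi_hat, res, rms


# Hypothetical correlation matrix for three variables (not from the slides).
R = [[1.0, 0.6, 0.5],
     [0.6, 1.0, 0.4],
     [0.5, 0.4, 1.0]]

L_hat, psi_hat, res, rms = pc_factor_one(R)
print("loadings:", L_hat)
print("specific variances:", psi_hat)
print("RMS off-diagonal residual:", rms)
```

By construction the diagonal of the residual matrix is zero, so only the off-diagonal entries enter the RMS that is compared with 0.05.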
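Estimating Factor Scores
$X = \mu + Lf + \epsilon$, with $\operatorname{cov}(\epsilon) = \Psi$
Regard $X - \hat{\mu} = \hat{L} f + \epsilon$ as a regression with error covariance $\hat{\Psi}$ and solve for $f$:
$\hat{f} = (\hat{L}' \hat{\Psi}^{-1} \hat{L})^{-1} \hat{L}' \hat{\Psi}^{-1} (X - \hat{\mu})$ (Generalized/weighted least squares method)
Values of $\hat{f}$ are called factor scores.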
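For a single factor ($k = 1$) the weighted least squares formula reduces to a ratio of two weighted sums, which makes it easy to illustrate. The Python sketch below assumes every specific variance is strictly positive; the function name and all numerical estimates are hypothetical.

```python
def factor_score_one(x, mu, loadings, psi):
    """Weighted least squares factor score for a single factor (k = 1).

    Implements f_hat = (L' Psi^{-1} L)^{-1} L' Psi^{-1} (x - mu), which for
    k = 1 reduces to a ratio of two weighted sums. Requires every psi_i > 0.
    """
    num = sum(l * (xi - m) / s for xi, m, l, s in zip(x, mu, loadings, psi))
    den = sum(l * l / s for l, s in zip(loadings, psi))
    return num / den


# Hypothetical estimates for p = 3 variables (not from the slides).
mu_hat = [50.0, 60.0, 55.0]
L_hat = [4.0, 3.0, 2.0]
psi_hat = [9.0, 16.0, 25.0]

# An observation lying exactly on the factor line mu + L * 1.5.
x = [m + l * 1.5 for m, l in zip(mu_hat, L_hat)]
print(factor_score_one(x, mu_hat, L_hat, psi_hat))
```

Because the observation was built with no specific-factor noise, the estimated score recovers the value 1.5 used to construct it, up to floating-point rounding. Variables with smaller $\hat{\psi}_i$ receive more weight.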
Rotation of Factors
Model: $X = \mu + Lf + \epsilon$, where
• $E(f) = 0$
• $E(\epsilon) = 0$
• $\operatorname{cov}(f, \epsilon) = 0$
• $\operatorname{cov}(f) = I_k$
• $\operatorname{cov}(\epsilon) = \Psi = \operatorname{diag}(\psi_1, \ldots, \psi_p)$, with each $\psi_i \geq 0$
• $\operatorname{cov}(X) = LL' + \Psi$

• Let $T$ be a $k \times k$ orthogonal matrix.
• Then $L^* = LT$ and $f^* = T'f$ satisfy the above conditions.
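Rotation leaves the fitted covariance unchanged because multiplying the loadings by an orthogonal matrix $T$ gives $(LT)(LT)' = L\,TT'\,L' = LL'$. The symbol $T$ is notation chosen for this sketch. The Python code below checks this numerically for $k = 2$, using a plane rotation as the orthogonal matrix; the function names, the loadings, and the 30-degree angle are hypothetical.

```python
import math


def rotate_loadings(L, theta):
    """Return L T, where T is the 2x2 rotation matrix for angle theta."""
    c, s = math.cos(theta), math.sin(theta)
    T = [[c, -s],
         [s, c]]
    return [[row[0] * T[0][j] + row[1] * T[1][j] for j in range(2)]
            for row in L]


def gram(L):
    """Return L L', the part of cov(X) explained by the common factors."""
    p = len(L)
    return [[sum(a * b for a, b in zip(L[i], L[j])) for j in range(p)]
            for i in range(p)]


# Hypothetical loadings for p = 3 variables and k = 2 factors.
L = [[0.8, 0.3],
     [0.7, -0.2],
     [0.5, 0.6]]

L_rot = rotate_loadings(L, math.radians(30))
max_diff = max(abs(a - b)
               for row_a, row_b in zip(gram(L), gram(L_rot))
               for a, b in zip(row_a, row_b))
print(max_diff)  # zero up to floating-point rounding
```

The individual loadings change under rotation while $LL'$, and hence the communalities on its diagonal, stay the same. This is why a rotation can be chosen for interpretability without changing the fit.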