principal component analysis principles and application
Post on 22-Dec-2015
Principal Component Analysis: Principles and Application
Fast Computers + Multi-Sensor Instruments => Large Data Sets
Examples:
• Satellite Data
• Digital Camera, Video Data
• Tomography
• Particle Imaging Velocimetry (PIV)
• Ultrasound Velocimetry (UVP)
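Large Data Sets
Low resolution image, with pixel values p(x_i, y_j):
$$
\begin{pmatrix}
p(x_1, y_1) & p(x_1, y_2) & \cdots & p(x_1, y_{600}) \\
p(x_2, y_1) & p(x_2, y_2) & \cdots & p(x_2, y_{600}) \\
\vdots & \vdots & & \vdots \\
p(x_{400}, y_1) & p(x_{400}, y_2) & \cdots & p(x_{400}, y_{600})
\end{pmatrix}
$$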
• There are 400 x 600 = 240,000 pieces of information.
• Not all of this information is independent => information compression (data compression)
Experiment:
• Consider the flow past a cylinder, and suppose we position a cross-wire probe downstream of the cylinder.
• With a cross-wire probe we can measure two components of the velocity at successive time intervals and store the results in a computer.
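$$
\begin{pmatrix} u_1 \\ v_1 \end{pmatrix},\;
\begin{pmatrix} u_2 \\ v_2 \end{pmatrix},\; \ldots,\;
\begin{pmatrix} u_j \\ v_j \end{pmatrix},\; \ldots,\;
\begin{pmatrix} u_m \\ v_m \end{pmatrix}
\qquad (\text{time} \rightarrow)
$$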
Example 1. Two-component velocity measurement
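• As the previous slide suggests, the pair of velocities can be represented as a column vector: $\mathbf{u}_j = \begin{pmatrix} u_j \\ v_j \end{pmatrix}$
• u is a vector at position x in physical space.
• The magnitude and angle of the vector change with time.
[Figure: the velocity vector u drawn at position x in the (x, y) plane.]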
Mathematical Representation of Data
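Basic Statistics
• Mean velocity: $\bar{\mathbf{u}} = \begin{pmatrix} \bar{u} \\ \bar{v} \end{pmatrix}$, where the bar means $\bar{u} = \frac{1}{m} \sum_{j=1}^{m} u_j$
• Variance:
$$
\sigma_u^2 = \mathrm{Var}(u) = \frac{1}{m} \sum_{j=1}^{m} (u_j - \bar{u})^2, \qquad
\sigma_v^2 = \mathrm{Var}(v) = \frac{1}{m} \sum_{j=1}^{m} (v_j - \bar{v})^2
$$
• Covariance:
$$
\mathrm{cov}(u, v) = \frac{1}{m} \sum_{j=1}^{m} (u_j - \bar{u})(v_j - \bar{v}), \qquad
\mathrm{cov}(v, u) = \mathrm{cov}(u, v)
$$
• Correlation:
$$
\rho_{uv} = \frac{\mathrm{cov}(u, v)}{\sigma_u \sigma_v}, \qquad -1 \le \rho_{uv} \le 1
$$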
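The statistics above can be sketched in a few lines of NumPy. This is a minimal illustration, not the author's code: the function name `basic_statistics` and the synthetic (u, v) samples are assumptions made for the example, and the 1/m normalization follows the formulas above.

```python
import numpy as np

def basic_statistics(u, v):
    """Mean, variance, covariance and correlation with 1/m normalization."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    u_bar, v_bar = u.mean(), v.mean()
    var_u = np.mean((u - u_bar) ** 2)
    var_v = np.mean((v - v_bar) ** 2)
    cov_uv = np.mean((u - u_bar) * (v - v_bar))
    rho_uv = cov_uv / np.sqrt(var_u * var_v)
    return {"mean": (u_bar, v_bar), "var": (var_u, var_v),
            "cov": cov_uv, "rho": rho_uv}

# Synthetic stand-in for cross-wire measurements (assumption: not real data).
rng = np.random.default_rng(0)
u = rng.normal(1.0, 0.5, size=1000)
v = 0.6 * u + rng.normal(0.0, 0.2, size=1000)

stats = basic_statistics(u, v)
print(stats)
```

Because v is built from u plus noise, the correlation comes out positive and well below 1 in magnitude only by the amount of added noise.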
Plot u vs v
[Figure: scatter plot of the pairs (u_1, v_1), ..., (u_j, v_j), ..., (u_m, v_m) in the (u, v) plane.]
The data look correlated.
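Examine the Statistics
Move to a data-centered coordinate system: $u' = u - \bar{u}$, $v' = v - \bar{v}$
[Figure: axes u' and v' with origin at $(\bar{u}, \bar{v})$ in the (u, v) plane.]
Calculate the covariance matrix:
$$
\begin{pmatrix}
\frac{1}{m} \sum_{i=1}^{m} u_i'^2 & \frac{1}{m} \sum_{i=1}^{m} u_i' v_i' \\
\frac{1}{m} \sum_{i=1}^{m} v_i' u_i' & \frac{1}{m} \sum_{i=1}^{m} v_i'^2
\end{pmatrix}
$$
Diagonal terms are the variances in the u' and v' directions.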
Off-diagonal terms are the covariance (cross-correlation) between u' and v'.
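A small worked example may help here. The four (u, v) samples below are hypothetical, chosen only so that the arithmetic is easy to check by hand; the variable names are likewise illustrative.

```python
import numpy as np

# Four hypothetical (u, v) samples (assumption: illustrative values only).
u = np.array([1.0, 2.0, 3.0, 4.0])
v = np.array([2.0, 1.0, 4.0, 3.0])

# Move to the data-centered coordinate system.
u_c = u - u.mean()
v_c = v - v.mean()

m = u.size
D = np.vstack([u_c, v_c])   # 2 x m matrix of centered samples
C = (D @ D.T) / m           # 2 x 2 covariance matrix, 1/m normalization

print(C)
# [[1.25 0.75]
#  [0.75 1.25]]
```

The diagonal entries (1.25) are the variances of u' and v'; the off-diagonal entry (0.75) is their covariance, so these samples are positively correlated.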
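Rotate coordinates to remove the correlations
[Figure: rotated axes u'' and v'' drawn over the data in the (u, v) plane.]
Covariance matrix in the (u'', v'') coordinate system:
$$
\begin{pmatrix}
\frac{1}{m} \sum_{i=1}^{m} u_i''^2 & 0 \\
0 & \frac{1}{m} \sum_{i=1}^{m} v_i''^2
\end{pmatrix}
$$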
We have just carried out a Principal Axis Transformation. This is the first step in a Principal Component Analysis (PCA).
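The rotation can be sketched numerically. The snippet below is a minimal illustration under stated assumptions: the helper name `principal_axis_transform` and the four sample points are hypothetical, and the rotation is obtained from the eigenvectors of the covariance matrix (the route the later slides describe).

```python
import numpy as np

def principal_axis_transform(u, v):
    """Rotate centered (u, v) data onto the eigenvectors of its covariance matrix."""
    D = np.vstack([u - np.mean(u), v - np.mean(v)])   # centered data, 2 x m
    m = D.shape[1]
    C = D @ D.T / m                                   # covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)              # ascending eigenvalues
    order = np.argsort(eigvals)[::-1]                 # largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    D_rot = eigvecs.T @ D                             # coordinates (u'', v'')
    C_rot = D_rot @ D_rot.T / m                       # covariance after rotation
    return eigvals, eigvecs, C_rot

# Hypothetical samples (assumption: illustrative values only).
u = np.array([1.0, 2.0, 3.0, 4.0])
v = np.array([2.0, 1.0, 4.0, 3.0])
eigvals, eigvecs, C_rot = principal_axis_transform(u, v)
print(np.round(C_rot, 12))
```

For these samples the rotated covariance matrix is diagonal with entries 2.0 and 0.5: the correlation has been removed and the variances along the new axes are the eigenvalues.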
Principal Component Analysis
A procedure for transforming a set of correlated variables into a new set of uncorrelated variables.
How do we do it?
Construction of the PCA coordinate system
The PCA coordinate system is one that maximizes the mean squared projection of the data. In this sense it is an “optimal” orthogonal coordinate system. Its popularity is primarily due to its dimension-reducing properties.
The basic algorithm for constructing the PCA eigenvectors is:
• Find the best direction (line) in the space, $\phi_1$.
• Find the best direction (line) $\phi_2$ with the restriction that it must be orthogonal to $\phi_1$.
• Find the best direction (line) $\phi_i$ with the restriction that $\phi_i$ is orthogonal to $\phi_j$ for all j < i.
How do we find this nice coordinate system?
Calculate the eigenvalues and eigenvectors of the Covariance Matrix.
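To see why the eigenvectors give the "best" direction, one can compare a brute-force search with the eigenvector route. This is a sketch under stated assumptions: the 2-D data are synthetic, and the names `mean_squared_projection`, `best_dir` and `phi_1` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical correlated 2-D data (assumption), stored as a 2 x m matrix.
D = rng.multivariate_normal([0.0, 0.0], [[3.0, 1.2], [1.2, 1.0]], size=2000).T
D = D - D.mean(axis=1, keepdims=True)   # center the data
C = D @ D.T / D.shape[1]                # covariance matrix

def mean_squared_projection(direction):
    """Mean squared projection of the data onto a unit vector."""
    direction = direction / np.linalg.norm(direction)
    return np.mean((direction @ D) ** 2)

# Brute force: scan unit directions in the plane in 0.1 degree steps.
angles = np.linspace(0.0, np.pi, 1801)
scores = np.array([mean_squared_projection(np.array([np.cos(a), np.sin(a)]))
                   for a in angles])
best_angle = angles[np.argmax(scores)]
best_dir = np.array([np.cos(best_angle), np.sin(best_angle)])

# Eigenvector route: eigh returns eigenvalues in ascending order.
eigvals, eigvecs = np.linalg.eigh(C)
phi_1 = eigvecs[:, -1]                  # eigenvector of the largest eigenvalue

print(abs(best_dir @ phi_1), scores.max(), eigvals[-1])
```

The scanned optimum lines up with the leading eigenvector (up to sign and grid resolution), and the maximum mean squared projection matches the largest eigenvalue.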
Experiment:
• Pipe Flow -- measurement of velocity profile.
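Example 2. Velocity Profile Measurement
[Figure: velocity profile u(z) plotted against z.]
$$
\mathbf{u} = \begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{pmatrix},
\qquad \text{where } u_k = u(z_k)
$$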
• As before we represent the velocities in the form of a column vector, but this time the vector is not in physical space.
• The space in which our vector lives is one we shall call profile space or pattern space.
• Profile space has n dimensions. In this example, the position $z_k$ defines a direction in profile space.
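• As time evolves, we measure a sequence of velocity profiles:
$$
\begin{pmatrix} u(z_1, t_1) \\ u(z_2, t_1) \\ \vdots \\ u(z_n, t_1) \end{pmatrix},\;
\begin{pmatrix} u(z_1, t_2) \\ u(z_2, t_2) \\ \vdots \\ u(z_n, t_2) \end{pmatrix},\; \ldots,\;
\begin{pmatrix} u(z_1, t_j) \\ u(z_2, t_j) \\ \vdots \\ u(z_n, t_j) \end{pmatrix},\; \ldots,\;
\begin{pmatrix} u(z_1, t_m) \\ u(z_2, t_m) \\ \vdots \\ u(z_n, t_m) \end{pmatrix}
\qquad (\text{time} \rightarrow)
$$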
Vectors in Profile Space
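The Preliminary Calculations
1. UVP Data Matrix (n x m = 128 x 1024)
$$
\mathbf{U} = \begin{pmatrix}
u(z_1, t_1) & u(z_1, t_2) & \cdots & u(z_1, t_m) \\
u(z_2, t_1) & u(z_2, t_2) & \cdots & u(z_2, t_m) \\
\vdots & \vdots & & \vdots \\
u(z_n, t_1) & u(z_n, t_2) & \cdots & u(z_n, t_m)
\end{pmatrix}
$$
2. Mean Profile Matrix (n x m)
$$
\bar{\mathbf{U}} = \begin{pmatrix}
\bar{u}(z_1) & \bar{u}(z_1) & \cdots & \bar{u}(z_1) \\
\bar{u}(z_2) & \bar{u}(z_2) & \cdots & \bar{u}(z_2) \\
\vdots & \vdots & & \vdots \\
\bar{u}(z_n) & \bar{u}(z_n) & \cdots & \bar{u}(z_n)
\end{pmatrix},
\qquad \bar{u}(z_i) = \frac{1}{m} \sum_{k=1}^{m} u(z_i, t_k)
$$
3. Centered Data Matrix (n x m)
$$
\mathbf{X} = \frac{1}{\sqrt{m}} \left( \mathbf{U} - \bar{\mathbf{U}} \right)
$$
4. Covariance Matrix (n x n = 128 x 128)
$$
\mathbf{R} = \underbrace{\frac{1}{\sqrt{m}} \left( \mathbf{U} - \bar{\mathbf{U}} \right)}_{\mathbf{X}} \;
\underbrace{\frac{1}{\sqrt{m}} \left( \mathbf{U} - \bar{\mathbf{U}} \right)^T}_{\mathbf{X}^T}
= \mathbf{X} \mathbf{X}^T
$$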
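The four steps above translate directly into array operations. The sketch below assumes the data are held as an n x m NumPy array (rows are positions, columns are times); the function name `preliminary_calculations` is hypothetical and random numbers stand in for real UVP data.

```python
import numpy as np

def preliminary_calculations(U):
    """Return the mean profile matrix, centered data matrix and covariance matrix.

    U is an n x m data matrix: rows are positions z_i, columns are times t_k.
    """
    n, m = U.shape
    mean_profile = U.mean(axis=1, keepdims=True)   # n x 1 time-averaged profile
    U_bar = np.tile(mean_profile, (1, m))          # n x m, every column identical
    X = (U - U_bar) / np.sqrt(m)                   # centered, scaled data matrix
    R = X @ X.T                                    # n x n covariance matrix
    return U_bar, X, R

# Placeholder data with the slide's dimensions (assumption: random stand-in).
rng = np.random.default_rng(2)
n, m = 128, 1024
U = rng.normal(size=(n, m))

U_bar, X, R = preliminary_calculations(U)
print(U_bar.shape, X.shape, R.shape)   # (128, 1024) (128, 1024) (128, 128)
```

Scaling X by 1/sqrt(m) means R = X X^T carries the 1/m normalization of a covariance matrix without a separate division.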
The Diagonalization
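Eigenvalue Equation:
$$
\mathbf{R} \boldsymbol{\Phi} = \boldsymbol{\Phi} \boldsymbol{\lambda},
\qquad \text{i.e. } (\mathbf{R} - \lambda_k \mathbf{I}) \, \boldsymbol{\phi}_k = \mathbf{0}, \quad k = 1, \ldots, n
$$
Eigenvalues:
$$
\boldsymbol{\lambda} = \begin{pmatrix}
\lambda_1 & 0 & \cdots & 0 \\
0 & \lambda_2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \lambda_n
\end{pmatrix}
$$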
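Eigenvectors (eigenprofiles):
$$
\boldsymbol{\Phi} = \begin{pmatrix} \boldsymbol{\phi}_1 & \cdots & \boldsymbol{\phi}_n \end{pmatrix}
= \begin{pmatrix}
\phi_{11} & \cdots & \phi_{1n} \\
\vdots & & \vdots \\
\phi_{n1} & \cdots & \phi_{nn}
\end{pmatrix},
\qquad
\boldsymbol{\phi}_i^T \boldsymbol{\phi}_k =
\begin{cases} 1, & i = k \\ 0, & i \ne k \end{cases}
$$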
Note: $\lambda_k = \sigma_k^2$ is the variance of the data in the direction $\boldsymbol{\phi}_k$.
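The diagonalization and the note about variances can be checked numerically. This is a minimal sketch: the helper `diagonalize`, the small dimensions and the synthetic data are all assumptions, and `numpy.linalg.eigh` is used because the covariance matrix is symmetric.

```python
import numpy as np

def diagonalize(R):
    """Eigenvalues (descending) and matching eigenvectors of a symmetric matrix."""
    eigvals, Phi = np.linalg.eigh(R)        # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], Phi[:, order]

# Hypothetical correlated data (assumption): n positions, m time samples.
rng = np.random.default_rng(3)
n, m = 16, 500
U = rng.normal(size=(n, n)) @ rng.normal(size=(n, m))
X = (U - U.mean(axis=1, keepdims=True)) / np.sqrt(m)
R = X @ X.T

lam, Phi = diagonalize(R)

# Eigenvectors are orthonormal and diagonalize R.
orthonormal = np.allclose(Phi.T @ Phi, np.eye(n))
diagonal = np.allclose(Phi.T @ R @ Phi, np.diag(lam))

# Variance of the data along each eigenvector equals the eigenvalue.
coefficients = Phi.T @ X                    # projections onto the eigenprofiles
variances = np.sum(coefficients ** 2, axis=1)

print(orthonormal, diagonal, np.allclose(variances, lam))
```

Since X already contains the 1/sqrt(m) factor, summing the squared projections over time gives the variance along each direction directly.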
Example 3. Taylor-Couette Flow
UVP Example
[Figure: UVP data, space vs. time.]
[Figure: Covariance Matrix, space vs. space, before and after diagonalisation.]
After diagonalisation: compression!!
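The Eigenvalue Spectrum: (Signal) Energy Spectrum
Energy Fraction:
$$
E_k = \frac{\lambda_k}{\sum_{j=1}^{n} \lambda_j}, \qquad \sum_{k=1}^{n} E_k = 1
$$
[Figure: E_k vs. Mode Number (1 to 128), vertical axis 0 to 1.]
[Figure: E_k and the cumulative sum of E_k vs. Mode Number (1 to 20), vertical axis 0 to 1.]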
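The energy fractions and their cumulative sum take only a few lines to compute. The sketch below uses a made-up eigenvalue spectrum; the function names `energy_fractions` and `modes_needed` and the 90% threshold are illustrative assumptions, not values from the slides.

```python
import numpy as np

def energy_fractions(eigvals):
    """Energy fraction E_k of each mode and the cumulative sum."""
    eigvals = np.asarray(eigvals, dtype=float)
    E = eigvals / eigvals.sum()
    return E, np.cumsum(E)

def modes_needed(cumulative, threshold):
    """Smallest number of leading modes whose cumulative energy reaches threshold."""
    return int(np.searchsorted(cumulative, threshold) + 1)

# Hypothetical eigenvalues, sorted in descending order (assumption).
lam = np.array([8.0, 4.0, 2.0, 1.0, 1.0])
E, cumE = energy_fractions(lam)
k90 = modes_needed(cumE, 0.9)

print(E)      # [0.5    0.25   0.125  0.0625 0.0625]
print(cumE)   # [0.5    0.75   0.875  0.9375 1.    ]
print(k90)    # 4
```

A steep cumulative curve means a few modes carry most of the variance, which is what makes truncating the expansion a useful compression.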
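Filtering and Reconstruction
• Decompose X into signal-dominated and noise-dominated components (subspaces):
$$
\mathbf{X} = \mathbf{X}_F + \mathbf{X}_{Noise}
$$
where $\mathbf{X}_F$ is the Filtered data and $\mathbf{X}_{Noise}$ is the Residual.
• Reconstruct the filtered UVP velocity (the factor $\sqrt{m}$ undoes the $1/\sqrt{m}$ normalization in the definition of X):
$$
\mathbf{U}_F = \sqrt{m} \, \mathbf{X}_F + \bar{\mathbf{U}}
$$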
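One common way to realize this decomposition is to project X onto the leading eigenvectors and treat the remainder as noise. The sketch below assumes that approach; the function `pca_filter`, the choice of a single retained mode and the synthetic test signal are all assumptions for illustration.

```python
import numpy as np

def pca_filter(U, n_modes):
    """Keep the n_modes leading PCA modes of U (n x m) and rebuild the velocity."""
    n, m = U.shape
    U_bar = U.mean(axis=1, keepdims=True)          # mean profile, n x 1
    X = (U - U_bar) / np.sqrt(m)                   # centered, scaled data
    lam, Phi = np.linalg.eigh(X @ X.T)             # ascending eigenvalues
    Phi = Phi[:, np.argsort(lam)[::-1]]            # largest variance first
    Phi_F = Phi[:, :n_modes]                       # signal-dominated subspace
    X_F = Phi_F @ (Phi_F.T @ X)                    # filtered component
    X_noise = X - X_F                              # residual component
    U_F = np.sqrt(m) * X_F + U_bar                 # undo scaling, add mean back
    return U_F, X_F, X_noise

# Hypothetical test signal (assumption): one coherent pattern plus white noise.
rng = np.random.default_rng(4)
z = np.linspace(0.0, 1.0, 32)[:, None]
t = np.linspace(0.0, 10.0, 400)[None, :]
clean = 1.0 + np.sin(np.pi * z) * np.cos(2.0 * np.pi * t)
U = clean + 0.1 * rng.normal(size=clean.shape)

U_F, X_F, X_noise = pca_filter(U, n_modes=1)
print(np.mean((U - clean) ** 2), np.mean((U_F - clean) ** 2))
```

For this synthetic case the filtered field is closer to the noise-free signal than the raw data. In practice the number of retained modes would be read off the eigenvalue spectrum.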
[Figure: U, U_F and the residual U - U_F, shown with the Eigenvalue Spectrum.]
[Figure: Filtered Time Series (Channel 70): Raw data, Filtered data, Residual.]
[Figure: Power Spectra (integrated over all channels).]
[Figure: Superimpose the Spectra.]
Generalizations
• Response to a stimulus
• Comparison of multiple data sets obtained by varying a parameter to study a transition.
$$
\mathbf{X} = \frac{1}{\sqrt{m}} \left( \mathbf{U} - \mathbf{U}_{ref} \right)
$$
ref U 0
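Measuring the data relative to a reference state instead of the mean changes the matrix being diagonalized in a simple way. The sketch below assumes a reference profile that is constant in time; the function name `reference_pca_matrices`, the synthetic data and the zero reference used here are illustrative choices, not taken from the slides.

```python
import numpy as np

def reference_pca_matrices(U, u_ref):
    """X and R = X X^T for data measured relative to a reference profile u_ref."""
    n, m = U.shape
    U_ref = np.tile(np.asarray(u_ref, dtype=float).reshape(n, 1), (1, m))
    X = (U - U_ref) / np.sqrt(m)
    return X, X @ X.T

# Hypothetical data with a non-zero mean profile (assumption).
rng = np.random.default_rng(5)
n, m = 8, 300
U = 2.0 + rng.normal(size=(n, m))
u_bar = U.mean(axis=1)

# Reference = mean profile recovers the ordinary covariance matrix.
_, R_mean = reference_pca_matrices(U, u_bar)

# A different reference (here: zero, chosen for illustration).
u_ref = np.zeros(n)
_, R_ref = reference_pca_matrices(U, u_ref)

# The two matrices differ by the outer product of the mean offset.
shift = (u_bar - u_ref).reshape(n, 1)
print(np.allclose(R_ref, R_mean + shift @ shift.T))   # True
```

With a reference other than the mean, the leading mode therefore tends to pick up the offset between the mean state and the reference state as well as the fluctuations.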