Lecture 21: SVD, Latent Semantic Indexing, and Dimensional Reduction
Shang-Hua Teng
Singular Value Decomposition

A = σ1 u1 v1ᵀ + σ2 u2 v2ᵀ + … + σr ur vrᵀ

where
• u1 … ur are the r orthonormal vectors that form a basis of C(A), and
• v1 … vr are the r orthonormal vectors that form a basis of C(Aᵀ)
The Singular Value Decomposition

Reduced form:

A    =    U      Σ      Vᵀ
m×n      m×r    r×r    r×n

Full form (Σ padded with zero blocks):

A    =    U      Σ      Vᵀ
m×n      m×m    m×n    n×n
The Singular Value Reduction

A    =    U      Σ      Vᵀ
m×n      m×r    r×r    r×n

Ak   =    Uk     Σk     Vkᵀ
m×n      m×k    k×k    k×n
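A minimal sketch of this reduction in NumPy; the matrix A here is an arbitrary illustrative example, not one from the lecture:

```python
import numpy as np

# Example matrix (arbitrary values, chosen only for illustration)
A = np.array([[3.0, 1.0, 1.0],
              [-1.0, 3.0, 1.0]])

# Thin SVD: U is m x r, s holds the singular values, Vt is r x n
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Rank-k reduction: keep the first k columns of U, the k largest
# singular values, and the first k rows of V^T
k = 1
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

print(A_k.shape)                   # same shape as A: (2, 3)
print(np.linalg.matrix_rank(A_k))  # at most k
```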
Distance between Two Matrices

• Frobenius norm of a matrix A:

  ‖A‖F = sqrt( Σi=1..m Σj=1..n aij² )

• Distance between two matrices A and B:

  ‖A − B‖F = sqrt( Σi,j (aij − bij)² )
Approximation Theorem
• [Schmidt 1907; Eckart and Young 1936]

Among all m × n matrices B of rank at most k, Ak is the one that minimizes ‖A − B‖F:

  ‖A − Ak‖F = min over rank(B) ≤ k of ‖A − B‖F = sqrt( σk+1² + … + σr² )
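The theorem can be checked numerically; a sketch using a random matrix in NumPy (the matrix and the competing rank-k matrix B are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 4))

U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Frobenius distance to the best rank-k approximation equals the
# root-sum-of-squares of the discarded singular values sigma_{k+1}..sigma_r
err = np.linalg.norm(A - A_k, 'fro')
tail = np.sqrt(np.sum(s[k:] ** 2))
print(np.isclose(err, tail))  # True

# Any other rank-k matrix B does at least as badly
B = U[:, :k] @ np.diag(0.9 * s[:k]) @ Vt[:k, :]
print(err <= np.linalg.norm(A - B, 'fro'))  # True
```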
Application: Image Compression
• Uncompressed m by n pixel image: m×n numbers
• Rank-k approximation of image:
  – k singular values
  – the first k columns of U (m-vectors)
  – the first k columns of V (n-vectors)
  – Total: k × (m + n + 1) numbers
Example: Yogi (Uncompressed)
• Source: [Will]
• Yogi: rock photographed by the Sojourner Mars mission
• 256 × 264 grayscale bitmap → 256 × 264 matrix M
• Pixel values in [0, 1]
• 256 × 264 = 67,584 numbers
Example: Yogi (Compressed)
• M has 256 singular values
• Rank-81 approximation of M:
• 81 × (256 + 264 + 1) = 42,201 numbers
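The storage counts above can be checked directly; a rank-k approximation stores k singular values plus the first k columns of U (length m) and of V (length n):

```python
# Storage cost of a rank-k SVD approximation of an m x n image:
# k singular values + k columns of U + k columns of V
def compressed_size(m, n, k):
    return k * (m + n + 1)

m, n, k = 256, 264, 81   # the Yogi bitmap and rank from the slides
uncompressed = m * n
compressed = compressed_size(m, n, k)
print(uncompressed)  # 67584
print(compressed)    # 42201
```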
Eigenface
• Patented by MIT
• Uses two-dimensional, global grayscale images
• Each face is mapped to a vector of numbers
• Creates an image subspace (face space) that best discriminates between faces
• Works only on properly lit, frontal images
• Latent Semantic Analysis (LSA)
• Latent Semantic Indexing (LSI)
• Principal Component Analysis (PCA)
Term-Document Matrix
• Index each document (by human or by computer)
  – fij: counts, frequencies, weights, etc.

            doc 1   doc 2   …   doc n
  term 1  [  f11     f12    …    f1n  ]
  term 2  [  f21     f22    …    f2n  ]
    ⋮
  term m  [  fm1     fm2    …    fmn  ]
• Each document can be regarded as a point in m dimensions
Document-Term Matrix
• Index each document (by human or by computer)
  – fij: counts, frequencies, weights, etc.

           term 1  term 2   …   term n
  doc 1  [  f11     f12     …    f1n  ]
  doc 2  [  f21     f22     …    f2n  ]
    ⋮
  doc m  [  fm1     fm2     …    fmn  ]

• Each document can be regarded as a point in n dimensions
            c1  c2  c3  c4  c5  m1  m2  m3  m4
  human      1   0   0   1   0   0   0   0   0
  interface  1   0   1   0   0   0   0   0   0
  computer   1   1   0   0   0   0   0   0   0
  user       0   1   1   0   1   0   0   0   0
  system     0   1   1   2   0   0   0   0   0
  response   0   1   0   0   1   0   0   0   0
  time       0   1   0   0   1   0   0   0   0
  EPS        0   0   1   1   0   0   0   0   0
  survey     0   1   0   0   0   0   0   0   1
  trees      0   0   0   0   0   1   1   1   0
  graph      0   0   0   0   0   0   1   1   1
  minors     0   0   0   0   0   0   0   1   1
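A sketch of computing k = 2 LSI coordinates for this term-document matrix with NumPy; scaling the coordinates by the singular values is one common convention, not the only one:

```python
import numpy as np

# The term-document matrix above: rows are terms (human ... minors),
# columns are documents c1..c5, m1..m4
A = np.array([
    [1, 0, 0, 1, 0, 0, 0, 0, 0],  # human
    [1, 0, 1, 0, 0, 0, 0, 0, 0],  # interface
    [1, 1, 0, 0, 0, 0, 0, 0, 0],  # computer
    [0, 1, 1, 0, 1, 0, 0, 0, 0],  # user
    [0, 1, 1, 2, 0, 0, 0, 0, 0],  # system
    [0, 1, 0, 0, 1, 0, 0, 0, 0],  # response
    [0, 1, 0, 0, 1, 0, 0, 0, 0],  # time
    [0, 0, 1, 1, 0, 0, 0, 0, 0],  # EPS
    [0, 1, 0, 0, 0, 0, 0, 0, 1],  # survey
    [0, 0, 0, 0, 0, 1, 1, 1, 0],  # trees
    [0, 0, 0, 0, 0, 0, 1, 1, 1],  # graph
    [0, 0, 0, 0, 0, 0, 0, 1, 1],  # minors
], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2

# Each term: first k values of its row (scaled by the singular values);
# each document: first k values of its column
term_coords = U[:, :k] * s[:k]
doc_coords = (Vt[:k, :] * s[:k, None]).T

print(term_coords.shape)  # (12, 2)
print(doc_coords.shape)   # (9, 2)
```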
LSI
• Using k = 2, each term and each document gets two coordinates, which can be plotted against LSI Factor 1 and LSI Factor 2.
[Figure: 2-D LSI plot; documents about "differential equations" and about "applications & algorithms" cluster along different factors.]
• Each term's coordinates are the first k values of its row of the term matrix T.
• Each document's coordinates are the first k values of its column of the document matrix D.
Positive Definite Matrices and Quadratic Shapes
• For any m × n matrix A, all eigenvalues of AAᵀ and AᵀA are non-negative.
• Symmetric matrices that have positive eigenvalues are called positive definite.
• Symmetric matrices that have non-negative eigenvalues are called positive semi-definite.
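A quick numerical check of the first claim; the random matrix is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 5))

# A^T A is symmetric, so eigvalsh applies; its eigenvalues are the
# squared singular values of A (padded with zeros), hence non-negative
eigvals = np.linalg.eigvalsh(A.T @ A)

print(eigvals.min() >= -1e-10)  # True (up to floating-point error)
```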