Dimensionality Reduction
Slides by Jure Leskovec 2
• High-dimensional == many features
• Find concepts/topics/genres:
  – Documents:
    • Features: thousands of words, millions of word pairs
  – Surveys
  – Netflix: 480k users x 177k movies
• Compress / reduce dimensionality:
  – 10^6 rows; 10^3 columns; no updates
  – random access to any cell(s); small error: OK
• Assumption: Data lies on or near a low d-dimensional subspace
• Axes of this subspace are an effective representation of the data
Why Reduce Dimensions?
• Discover hidden correlations/topics
  – Words that occur commonly together
• Remove redundant and noisy features
  – Not all words are useful
• Interpretation and visualization
• Easier storage and processing of the data
SVD - Definition
A[m x n] = U[m x r] Σ[r x r] (V[n x r])^T

• A: Input data matrix
  – m x n matrix (e.g., m documents, n terms)
• U: Left singular vectors
  – m x r matrix (m documents, r concepts)
• Σ: Singular values
  – r x r diagonal matrix (strength of each ‘concept’)
  – (r: rank of the matrix A)
• V: Right singular vectors
  – n x r matrix (n terms, r concepts)
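These shapes can be seen concretely with NumPy's SVD routine; a minimal sketch (the 4x3 toy matrix below is made up purely for illustration):

```python
import numpy as np

# Hypothetical 4x3 "documents x terms" matrix, just to show the shapes.
A = np.array([[1., 0., 2.],
              [0., 1., 1.],
              [3., 0., 0.],
              [1., 1., 1.]])

# full_matrices=False gives the economy SVD matching the slide:
# U is m x r, sigma holds the r singular values, Vt is r x n.
U, sigma, Vt = np.linalg.svd(A, full_matrices=False)
print(U.shape, sigma.shape, Vt.shape)  # (4, 3) (3,) (3, 3)

# A is recovered (up to floating point) as U @ diag(sigma) @ Vt:
print(np.allclose(A, U @ np.diag(sigma) @ Vt))  # True
```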
SVD
(Diagram: A (m x n) = U (m x r) x Σ (r x r) x V^T (r x n).)
(Diagram: A = σ1 u1 v1^T + σ2 u2 v2^T + …, where each σi is a scalar and ui, vi are vectors.)
SVD - Properties
It is always possible to decompose a real matrix A into A = U Σ V^T, where
• U, Σ, V: unique
• U, V: column orthonormal
  – U^T U = I; V^T V = I (I: identity matrix)
  – (Cols. are orthogonal unit vectors)
• Σ: diagonal
  – Entries (singular values) are positive, and sorted in decreasing order (σ1 ≥ σ2 ≥ … ≥ 0)
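These properties are easy to check numerically; a small sketch on a random real matrix:

```python
import numpy as np

A = np.random.default_rng(0).normal(size=(6, 4))
U, sigma, Vt = np.linalg.svd(A, full_matrices=False)

# Columns of U and of V (rows of Vt) are orthonormal: U^T U = I, V^T V = I.
print(np.allclose(U.T @ U, np.eye(4)))    # True
print(np.allclose(Vt @ Vt.T, np.eye(4)))  # True

# Singular values are non-negative and sorted in decreasing order.
print(bool(np.all(sigma >= 0) and np.all(np.diff(sigma) <= 0)))  # True
```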
SVD – Example: Users-to-Movies
• A = U Σ V^T - example:

          Matrix Alien Serenity Casablanca Amelie
SciFi   [ 1  1  1  0  0 ]     [ 0.18  0    ]
SciFi   [ 2  2  2  0  0 ]     [ 0.36  0    ]
SciFi   [ 1  1  1  0  0 ]     [ 0.18  0    ]   [ 9.64  0    ]   [ 0.58  0.58  0.58  0     0    ]
SciFi   [ 5  5  5  0  0 ]  =  [ 0.90  0    ] x [ 0     5.29 ] x [ 0     0     0     0.71  0.71 ]
Romance [ 0  0  0  2  2 ]     [ 0     0.53 ]
Romance [ 0  0  0  3  3 ]     [ 0     0.80 ]
Romance [ 0  0  0  1  1 ]     [ 0     0.27 ]
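The decomposition above can be verified numerically; a sketch (note that NumPy may flip the signs of the singular vectors, but the singular values match):

```python
import numpy as np

# Users-to-movies matrix from the example
# (columns: Matrix, Alien, Serenity, Casablanca, Amelie).
A = np.array([[1, 1, 1, 0, 0],
              [2, 2, 2, 0, 0],
              [1, 1, 1, 0, 0],
              [5, 5, 5, 0, 0],
              [0, 0, 0, 2, 2],
              [0, 0, 0, 3, 3],
              [0, 0, 0, 1, 1]], dtype=float)

U, sigma, Vt = np.linalg.svd(A, full_matrices=False)

# Only two non-zero singular values (9.64 and 5.29):
# A has rank 2, i.e. two concepts.
print(np.round(sigma, 2))

# The first concept loads only on the three SciFi movies (up to sign):
print(np.round(np.abs(Vt[0]), 2))
```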
• In the decomposition above, the first concept is the SciFi-concept and the second the Romance-concept.
• U is the “user-to-concept” similarity matrix.
• σ1 = 9.64 is the ‘strength’ of the SciFi-concept.
• V is the “movie-to-concept” similarity matrix.
SVD - Interpretation #1
‘movies’, ‘users’ and ‘concepts’:
• U: user-to-concept similarity matrix
• V: movie-to-concept similarity matrix
• Σ: its diagonal elements give the ‘strength’ of each concept
SVD - Interpretation #2

SVD gives the best axis to project on:
• ‘best’ = minimizes the sum of squares of the projection errors
• i.e., minimum reconstruction error
(Figure: user points plotted by Movie 1 rating vs. Movie 2 rating; v1, the first right singular vector, is the best direction to project onto.)
• In the example, v1 (the first right singular vector) is the first row of V^T: [0.58 0.58 0.58 0 0].
• σ1 captures the variance (‘spread’) of the data along the v1 axis.
More details
• Q: How exactly is dim. reduction done?
• A: Set the smallest singular values to zero
Zeroing the smallest singular value (σ2 = 5.29) in the example:

    [ 0.18  0    ]
    [ 0.36  0    ]
    [ 0.18  0    ]   [ 9.64  0 ]   [ 0.58  0.58  0.58  0     0    ]
A ≈ [ 0.90  0    ] x [ 0     0 ] x [ 0     0     0     0.71  0.71 ]
    [ 0     0.53 ]
    [ 0     0.80 ]
    [ 0     0.27 ]
Equivalently, dropping the zeroed rows and columns:

    [ 0.18 ]
    [ 0.36 ]
    [ 0.18 ]
A ≈ [ 0.90 ] x [ 9.64 ] x [ 0.58  0.58  0.58  0  0 ]
    [ 0    ]
    [ 0    ]
    [ 0    ]
The result B is close to A:

    [ 1 1 1 0 0 ]       [ 1 1 1 0 0 ]
    [ 2 2 2 0 0 ]       [ 2 2 2 0 0 ]
    [ 1 1 1 0 0 ]       [ 1 1 1 0 0 ]
A = [ 5 5 5 0 0 ]   B = [ 5 5 5 0 0 ]
    [ 0 0 0 2 2 ]       [ 0 0 0 0 0 ]
    [ 0 0 0 3 3 ]       [ 0 0 0 0 0 ]
    [ 0 0 0 1 1 ]       [ 0 0 0 0 0 ]

Frobenius norm: ǁMǁF = √(Σij Mij²)

ǁA−BǁF = √(Σij (Aij−Bij)²) is “small”
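This step can be checked numerically; a sketch: zeroing σ2 gives the best rank-1 matrix B, and the Frobenius error ǁA−BǁF equals the root-sum-square of the dropped singular values (here just σ2 ≈ 5.29):

```python
import numpy as np

A = np.array([[1, 1, 1, 0, 0],
              [2, 2, 2, 0, 0],
              [1, 1, 1, 0, 0],
              [5, 5, 5, 0, 0],
              [0, 0, 0, 2, 2],
              [0, 0, 0, 3, 3],
              [0, 0, 0, 1, 1]], dtype=float)

U, sigma, Vt = np.linalg.svd(A, full_matrices=False)

# Keep only the largest singular value: rank-1 approximation.
k = 1
B = U[:, :k] @ np.diag(sigma[:k]) @ Vt[:k, :]

# The error equals the root-sum-square of the dropped sigmas.
err = np.linalg.norm(A - B, 'fro')
print(round(err, 2))                                     # 5.29
print(np.isclose(err, np.sqrt((sigma[k:] ** 2).sum())))  # True
```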
(Diagram: A = U Σ V^T; setting the smallest singular values in Σ to zero gives B = U S V^T, and B approximates A.)
SVD – Best Low Rank Approx.

• Theorem: Let A = U Σ V^T (σ1 ≥ σ2 ≥ …, rank(A) = r),
  then B = U S V^T, where
  – S = diagonal r x r matrix with si = σi (i = 1…k), else si = 0,
  is a best rank-k approximation to A:
  – B is the solution to min_B ǁA−BǁF where rank(B) = k

• We will need 2 facts:
  – ǁMǁF² = Σi qii², where M = P Q R is the SVD of M
  – U Σ V^T − U S V^T = U (Σ − S) V^T
• We apply the facts with:
  – P column orthonormal
  – R row orthonormal
  – Q diagonal
• A = U Σ V^T, B = U S V^T (σ1 ≥ σ2 ≥ … ≥ 0, rank(A) = r)
  – S = diagonal r x r matrix where si = σi (i = 1…k), else si = 0
  then B is the solution to min_B ǁA−BǁF with rank(B) = k. Why?

min_{rank(B)=k} ǁA−BǁF = min_S ǁU Σ V^T − U S V^TǁF = min_S ǁΣ − SǁF = min_{si} √(Σ_{i=1..r} (σi − si)²)

• We want to choose the si to minimize Σi (σi − si)². Since rank(B) = k, at most k of the si can be non-zero, so

  min [ Σ_{i=1..k} (σi − si)² + Σ_{i=k+1..r} σi² ] ≥ Σ_{i=k+1..r} σi²

  and the minimum is attained by setting si = σi (i = 1…k), else si = 0.
SVD - Interpretation #2

Equivalent: ‘spectral decomposition’ of the matrix:

[ 1 1 1 0 0 ]
[ 2 2 2 0 0 ]
[ 1 1 1 0 0 ]
[ 5 5 5 0 0 ]  =  [ u1 u2 ] x [ σ1  0  ] x [ v1 ]
[ 0 0 0 2 2 ]                 [ 0   σ2 ]   [ v2 ]
[ 0 0 0 3 3 ]
[ 0 0 0 1 1 ]
Equivalently, as a sum of rank-1 terms:

A = u1 σ1 v1^T + u2 σ2 v2^T + …   (k terms; each ui is m x 1, each vi^T is 1 x n)

Assume: σ1 ≥ σ2 ≥ σ3 ≥ … ≥ 0

Why is setting the small σs to zero the thing to do?
Vectors ui and vi are unit length, so it is σi that scales them.
So, zeroing the small σs introduces less error.
Q: How many σs to keep?
A: Rule-of-thumb: keep 80-90% of the ‘energy’ (= Σi σi²)
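The rule-of-thumb can be coded directly; a sketch (the singular values below are hypothetical, sorted decreasingly):

```python
import numpy as np

sigma = np.array([9.64, 5.29, 1.30, 0.40, 0.10])  # hypothetical, sorted

energy = sigma ** 2
retained = np.cumsum(energy) / energy.sum()

# Smallest k that keeps at least 90% of the energy:
k = int(np.searchsorted(retained, 0.90)) + 1
print(k)  # 2
```

Here the first singular value alone holds less than 90% of the energy, but the first two together exceed it, so k = 2.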
SVD - Complexity
• To compute the SVD:
  – O(nm²) or O(n²m) (whichever is less)
• But:
  – less work if we just want the singular values
  – or if we want only the first k singular vectors
  – or if the matrix is sparse
• Implemented in linear algebra packages:
  – LINPACK, Matlab, SPlus, Mathematica, ...
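As an instance of the "less work" point: NumPy can skip computing U and V entirely when only the singular values are needed (a small sketch on a random matrix):

```python
import numpy as np

A = np.random.default_rng(0).normal(size=(200, 50))

# compute_uv=False returns just the singular values -- cheaper
# than the full decomposition when that's all we need.
sigma = np.linalg.svd(A, compute_uv=False)
print(sigma.shape)                         # (50,)
print(bool(np.all(np.diff(sigma) <= 0)))   # True: sorted decreasingly
```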
SVD - Conclusions so far
• SVD: A = U Σ V^T: unique
  – U: user-to-concept similarities
  – V: movie-to-concept similarities
  – Σ: strength of each concept
• Dimensionality reduction:
  – keep the few largest singular values (80-90% of ‘energy’)
  – SVD picks up linear correlations
Case study: How to query?
Q: Find users that like ‘Matrix’ and ‘Alien’
Q: Find users that like ‘Matrix’
A: Map the query into ‘concept space’ – how?
     Matrix Alien Serenity Casablanca Amelie
q = [ 5     0     0        0          0     ]

Project into concept space: inner product with each ‘concept’ vector vi.

(Figure: q in the (Matrix, Alien) plane with concept directions v1 and v2; the projection q·v1 gives q’s coordinate on the SciFi-concept.)
Compactly, we have: qconcept = q V

E.g., with q = [5 0 0 0 0] (Matrix, Alien, Serenity, Casablanca, Amelie):

                [ 0.58  0    ]
                [ 0.58  0    ]
[ 5 0 0 0 0 ] x [ 0.58  0    ]  =  [ 2.9  0 ]
                [ 0     0.71 ]
                [ 0     0.71 ]

(V: movie-to-concept similarities; 2.9 is q’s weight on the SciFi-concept.)
How would a user d who rated (‘Alien’, ‘Serenity’) be handled?
dconcept = d V

E.g., with d = [0 4 5 0 0]:

                [ 0.58  0    ]
                [ 0.58  0    ]
[ 0 4 5 0 0 ] x [ 0.58  0    ]  =  [ 5.22  0 ]
                [ 0     0.71 ]
                [ 0     0.71 ]

(5.22 is d’s weight on the SciFi-concept.)
Observation: User d, who rated (‘Alien’, ‘Serenity’), will be similar to the query “user” q, who rated only (‘Matrix’), although d did not rate ‘Matrix’!

q = [ 5 0 0 0 0 ]  →  qconcept = [ 2.9   0 ]
d = [ 0 4 5 0 0 ]  →  dconcept = [ 5.22  0 ]

In movie space: similarity(q, d) = 0. In concept space: similarity ≠ 0.
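The observation can be verified numerically; a sketch using the V of the running example:

```python
import numpy as np

# V from the example: columns are the SciFi- and Romance-concept vectors.
V = np.array([[0.58, 0.58, 0.58, 0.00, 0.00],
              [0.00, 0.00, 0.00, 0.71, 0.71]]).T

q = np.array([5., 0., 0., 0., 0.])  # rated only 'Matrix'
d = np.array([0., 4., 5., 0., 0.])  # rated 'Alien' and 'Serenity'

# In movie space q and d share no movies, so their similarity is 0:
print(q @ d)  # 0.0

# In concept space both load on the SciFi-concept:
qc, dc = q @ V, d @ V   # qc ≈ [2.9, 0], dc ≈ [5.22, 0]

# Cosine similarity in concept space is maximal:
cos = qc @ dc / (np.linalg.norm(qc) * np.linalg.norm(dc))
print(round(float(cos), 2))  # 1.0
```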
SVD: Drawbacks
+ Optimal low-rank approximation:
  • in Frobenius norm

- Interpretability problem:
  – a singular vector specifies a linear combination of all input columns or rows

- Lack of sparsity:
  – singular vectors are dense!