SVD: SINGULAR VALUE DECOMPOSITION

Source: hari/teaching/bigdata/14_svd.pdf

  • SVD: Singular Value Decomposition

  • Review …

    Last time

    Recommendation Systems

    Decompositions of the Utility Matrix

    Gradient Descent

    Today

    Singular Value Decomposition

  • Dimensionality reduction

    Utility Matrix M is low rank; apply the Singular Value Decomposition:

    M → n × m

    M = UV

    U → n × d, V → d × m

  • Singular Value Decomposition: THE SWISS ARMY KNIFE OF LINEAR ALGEBRA

  • Goal: Given an m × n matrix A (with m ≥ n), compute

        A = U Σ V* = sum_{j=1}^{n} σ_j u_j v_j*

    where A is m × n, U is m × n, Σ is n × n, and V is n × n.

    σ1 ≥ σ2 ≥ ⋯ ≥ σn ≥ 0 are the singular values of A,
    u1, u2, …, un are orthonormal, the left singular vectors of A, and
    v1, v2, …, vn are orthonormal, the right singular vectors of A.
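    A minimal check of these properties with NumPy (np.linalg.svd; full_matrices=False gives the thin form with the shapes listed above):

```python
import numpy as np

A = np.random.randn(8, 5)                         # an m × n matrix, m >= n
U, s, Vh = np.linalg.svd(A, full_matrices=False)  # thin SVD: U is 8 × 5, Vh is 5 × 5

assert np.all(s[:-1] >= s[1:])                 # σ1 >= σ2 >= ... >= σn >= 0
assert np.allclose(U.T @ U, np.eye(5))         # left singular vectors: orthonormal
assert np.allclose(Vh @ Vh.T, np.eye(5))       # right singular vectors: orthonormal
assert np.allclose(U @ np.diag(s) @ Vh, A)     # A = U Σ V*

# Equivalently, A is a sum of rank-1 terms σ_j u_j v_j*
assert np.allclose(sum(s[j] * np.outer(U[:, j], Vh[j]) for j in range(5)), A)
```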

  • Singular Value Decomposition

    Closely related problems:

    Eigenvalue decomposition: A ≈ VΛV*

    Spanning columns or rows: A ≈ CUR

    Applications:

    Principal Component Analysis: form an empirical covariance matrix from some collection of statistical data; the singular value decomposition of the matrix gives the directions of maximal variance (see the sketch after this list).

    Finding spanning columns or rows: collect statistical data in a large matrix; a set of spanning columns identifies variables that "explain" the data (say, a small collection of genes among a set of recorded genomes, or a small number of stocks in a portfolio).

    Relaxed solutions to k-means clustering: relaxed solutions can be found via the singular value decomposition.

    PageRank: computing the principal eigenvector of the link matrix.
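    To make the PCA application concrete, a small sketch (my own illustration, not from the slides): center the data, take the SVD, and read the variances off the singular values.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2)) @ np.array([[3.0, 0.0], [1.0, 0.5]])  # correlated 2D data
X = X - X.mean(axis=0)                    # PCA requires mean-centered columns

# Rows of Vh are the directions of maximal variance (principal components)
U, s, Vh = np.linalg.svd(X, full_matrices=False)
explained_variance = s**2 / (len(X) - 1)  # eigenvalues of the empirical covariance

# Same result as eigendecomposition of the covariance matrix C = X^T X / (n-1)
C = X.T @ X / (len(X) - 1)
assert np.allclose(np.linalg.eigvalsh(C)[::-1], explained_variance)
```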

  • Singular values, intuition

    (Figure: blue circles are m data points in 2D; the singular vectors of the m × 2 data matrix are drawn on top.)

    v1, the 1st (right) singular vector, is the direction of maximal variance.

    σ1 measures how much of the data variance is explained by the first singular vector.

    v2, the 2nd (right) singular vector, is the direction of maximal variance after removing the projection of the data along the first singular vector.

    σ2 measures how much of the data variance is explained by the second singular vector.

  • SVD - Interpretation

    M = UΣV* - example:

        [1 1 1 0 0]       [0.18  0   ]
        [2 2 2 0 0]       [0.36  0   ]
        [1 1 1 0 0]       [0.18  0   ]     [9.64  0   ]   [0.58 0.58 0.58 0    0   ]
        [5 5 5 0 0]   =   [0.90  0   ]  ×  [0     5.29] × [0    0    0    0.71 0.71]
        [0 0 0 2 2]       [0     0.53]
        [0 0 0 3 3]       [0     0.80]
        [0 0 0 1 1]       [0     0.27]

    The first row of V*, v1* = [0.58 0.58 0.58 0 0], defines the first projection axis; σ1 = 9.64 measures the variance ('spread') of the data along the v1 axis.

    UΣ gives the coordinates of the points in the projection axes.
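    A quick check of the example with NumPy (a sketch; SVD routines may flip the signs of singular-vector pairs, so the printout is compared up to sign):

```python
import numpy as np

M = np.array([[1, 1, 1, 0, 0],
              [2, 2, 2, 0, 0],
              [1, 1, 1, 0, 0],
              [5, 5, 5, 0, 0],
              [0, 0, 0, 2, 2],
              [0, 0, 0, 3, 3],
              [0, 0, 0, 1, 1]], dtype=float)

U, s, Vh = np.linalg.svd(M, full_matrices=False)
print(np.round(s, 2))               # [9.64 5.29 0. 0. 0.] -- M has rank 2
print(np.round(np.abs(Vh[:2]), 2))  # ~[0.58 0.58 0.58 0 0] and [0 0 0 0.71 0.71]

coords = U[:, :2] * s[:2]           # UΣ: coordinates of each row in the projection axes
assert np.allclose(coords @ Vh[:2], M)
```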

  • Dimensionality reduction

    Set the smallest singular values to zero: in the example above, drop σ2 = 5.29 together with the corresponding column of U and row of V*.

  • Dimensionality reduction

        [1 1 1 0 0]     [0.18]
        [2 2 2 0 0]     [0.36]
        [1 1 1 0 0]     [0.18]
        [5 5 5 0 0]  ≈  [0.90] × 9.64 × [0.58 0.58 0.58 0 0]
        [0 0 0 2 2]     [0   ]
        [0 0 0 3 3]     [0   ]
        [0 0 0 1 1]     [0   ]

  • Dimensionality reduction

        [1 1 1 0 0]     [1 1 1 0 0]
        [2 2 2 0 0]     [2 2 2 0 0]
        [1 1 1 0 0]     [1 1 1 0 0]
        [5 5 5 0 0]  ≈  [5 5 5 0 0]
        [0 0 0 2 2]     [0 0 0 0 0]
        [0 0 0 3 3]     [0 0 0 0 0]
        [0 0 0 1 1]     [0 0 0 0 0]

    The rank-1 approximation reproduces the first block exactly and zeroes out the smaller second block.
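    A sketch of the truncation in NumPy; by the Eckart-Young theorem the rank-k truncation is the best rank-k approximation, with error equal to σ_{k+1}:

```python
import numpy as np

def truncated_svd(A, k):
    """Rank-k approximation: keep the k largest singular triplets."""
    U, s, Vh = np.linalg.svd(A, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vh[:k]

M = np.array([[1, 1, 1, 0, 0],
              [2, 2, 2, 0, 0],
              [1, 1, 1, 0, 0],
              [5, 5, 5, 0, 0],
              [0, 0, 0, 2, 2],
              [0, 0, 0, 3, 3],
              [0, 0, 0, 1, 1]], dtype=float)

M1 = truncated_svd(M, 1)
print(np.round(M1, 2))        # first block survives; second block becomes zeros

s = np.linalg.svd(M, compute_uv=False)
assert np.isclose(np.linalg.norm(M - M1, 2), s[1])  # error = σ2 = 5.29
```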

  • Relation to Eigenvalue Decomposition

    Eigenvalue decomposition can only be applied to certain classes of square matrices.

    Given an SVD of a matrix M:

        M*M = (VΣ*U*)(UΣV*) = V (Σ*Σ) V*

        MM* = (UΣV*)(VΣ*U*) = U (ΣΣ*) U*

    The columns of V are eigenvectors of M*M.

    The columns of U are eigenvectors of MM*.

    The non-zero elements of Σ are the square roots of the non-zero eigenvalues of MM* or M*M.
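    A numerical check of these identities (a sketch):

```python
import numpy as np

A = np.random.randn(6, 4)
U, s, Vh = np.linalg.svd(A, full_matrices=False)

# Eigenvalues of A*A (eigh returns them ascending) are the squared singular values
evals = np.linalg.eigvalsh(A.T @ A)
assert np.allclose(evals[::-1], s**2)

# Each column of V is an eigenvector of A*A: (A*A) v_j = σ_j² v_j
for j in range(4):
    assert np.allclose(A.T @ A @ Vh[j], s[j]**2 * Vh[j])
```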

  • Calculating inverses with SVD

    Let A be an n × n matrix.

    Then U, Σ, and V are also n × n.

    U and V are orthogonal, so their inverses are equal to their transposes.

    Σ is diagonal, so its inverse is the diagonal matrix whose elements are the reciprocals of the elements of Σ:

        A⁻¹ = V [1/σ1   ⋯    0  ]
                [  ⋮    ⋱    ⋮  ] U^T
                [  0    ⋯   1/σn]

  • Calculating inverses

    If one of the σᵢ is zero, or so small that its value is dominated by round-off error, there is a problem.

    The more of the σᵢ that have this problem, the 'more singular' A is.

    The SVD therefore gives a way of determining how singular A is.

    'How singular' A is, is captured by the condition number of A: the ratio of its largest singular value to its smallest singular value, κ(A) = σ1/σn.
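    In practice, tiny σᵢ are truncated rather than inverted, which yields the pseudoinverse; a sketch of the cutoff (essentially what np.linalg.pinv does, with rcond setting the threshold):

```python
import numpy as np

def pinv_svd(A, rcond=1e-15):
    """Pseudoinverse via SVD: invert only singular values above a cutoff."""
    U, s, Vh = np.linalg.svd(A, full_matrices=False)
    s_inv = np.where(s > rcond * s.max(), 1.0 / s, 0.0)  # zero out tiny σ_i
    return Vh.T @ (s_inv[:, None] * U.T)

A = np.array([[1.0, 2.0],
              [2.0, 4.0]])                        # rank 1: exactly singular
assert np.allclose(pinv_svd(A), np.linalg.pinv(A))

B = np.random.randn(4, 4)
s = np.linalg.svd(B, compute_uv=False)
assert np.isclose(s[0] / s[-1], np.linalg.cond(B))  # condition number = σ_max/σ_min
```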

  • Concepts you should know

    Null space of A: { x | Ax = 0 }

    Range of A: { b | Ax = b, for some vector x }

    Rank of A: the dimension of the range of A

    The Singular Value Decomposition constructs orthonormal bases for the range and null space of a matrix:

    The columns of U which correspond to non-zero singular values of A are an orthonormal set of basis vectors for the range of A.

    The columns of V which correspond to zero singular values form an orthonormal basis for the null space of A.
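    A sketch extracting both bases from the SVD of a rank-deficient matrix:

```python
import numpy as np

A = np.random.randn(6, 3) @ np.random.randn(3, 5)  # a 6 × 5 matrix of rank 3

U, s, Vh = np.linalg.svd(A)                        # full SVD: U is 6 × 6, Vh is 5 × 5
tol = s.max() * max(A.shape) * np.finfo(float).eps
r = int(np.sum(s > tol))                           # numerical rank, here 3

range_basis = U[:, :r]    # columns of U with non-zero σ: basis for the range
null_basis = Vh[r:].T     # columns of V with zero σ: basis for the null space

assert np.allclose(A @ null_basis, 0)                     # A x = 0 on the null space
b = A @ np.random.randn(5)                                # any b in the range of A
assert np.allclose(range_basis @ (range_basis.T @ b), b)  # projection leaves it fixed
```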

  • Computing the SVD

    1. Reduce the matrix M to a bidiagonal matrix (a sketch of this step follows below)

       Householder reflections

       QR decomposition

    2. Compute the SVD of the bidiagonal matrix

       Iterative methods
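    A minimal sketch of step 1, Golub-Kahan bidiagonalization by Householder reflections (a textbook construction, assuming m ≥ n; not the lecture's code, and production libraries use the tuned LAPACK equivalent):

```python
import numpy as np

def house(x):
    """Householder vector v with (I - 2 vv^T/(v^T v)) x = -s ||x|| e1."""
    v = x.astype(float).copy()
    norm = np.linalg.norm(x)
    if norm == 0.0:
        v[0] = 1.0                   # x already zero: return a harmless reflector
        return v
    s = np.sign(x[0]) if x[0] != 0 else 1.0
    v[0] += s * norm                 # sign choice avoids cancellation
    return v

def bidiagonalize(A):
    """A = U B V^T with B upper bidiagonal and U, V orthogonal (m >= n)."""
    B = A.astype(float).copy()
    m, n = B.shape
    U, V = np.eye(m), np.eye(n)
    for j in range(n):
        # Left reflector zeroes B[j+1:, j] (below the diagonal in column j)
        v = house(B[j:, j])
        beta = 2.0 / (v @ v)
        B[j:, j:] -= beta * np.outer(v, v @ B[j:, j:])
        U[:, j:] -= beta * np.outer(U[:, j:] @ v, v)
        if j <= n - 3:
            # Right reflector zeroes B[j, j+2:] (right of the superdiagonal)
            w = house(B[j, j+1:])
            beta = 2.0 / (w @ w)
            B[:, j+1:] -= beta * np.outer(B[:, j+1:] @ w, w)
            V[:, j+1:] -= beta * np.outer(V[:, j+1:] @ w, w)
    return U, B, V

A = np.random.randn(6, 4)
U, B, V = bidiagonalize(A)
assert np.allclose(U @ B @ V.T, A)      # exact factorization
assert np.allclose(np.tril(B, -1), 0)   # zeros below the diagonal
assert np.allclose(np.triu(B, 2), 0)    # zeros above the superdiagonal
```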

  • Randomized SVD

  • Goal: Given an m × n matrix A, for large m, n, we seek to compute a rank-k approximation, with k ≪ n,

        A ≈ U Σ V* = sum_{j=1}^{k} σ_j u_j v_j*

    where A is m × n, U is m × k, Σ is k × k, and V* is k × n.

    σ1 ≥ σ2 ≥ ⋯ ≥ σk ≥ 0 are the (approximate) singular values of A,
    u1, u2, …, uk are orthonormal, the (approximate) left singular vectors of A, and
    v1, v2, …, vk are orthonormal, the (approximate) right singular vectors of A.

  • Randomized SVD

    1. Draw an n × k Gaussian random matrix Ω

    2. Form the m × k sample matrix Y = AΩ

    3. Form an m × k orthonormal matrix Q such that Y = QR

    4. Form the k × n matrix B = Q*A

    5. Compute the SVD of the small matrix B: B = ÛΣV*

    6. Form the matrix U = QÛ

    Computational costs? Steps 2 and 4 are k matrix-vector products with A; steps 3, 5, and 6 are dense operations on matrices of size m × k and k × n. (A NumPy transcription follows below.)

  • Computational Costs

    If A can fit in RAM:

    Cost is dominated by the 2mnk flops required for steps 2 and 4.

    If A cannot fit in RAM:

    Standard approaches suffer; the randomized SVD succeeds as long as the matrices of size m × k and k × n fit in RAM.

    Parallelization:

    Steps 2 and 4 permit k-way parallelization.

  • Probabilistic Error Analysis

    The error of the method is defined as

        e_k = ‖A − A_k‖

    e_k is a random variable whose theoretical minimum value is

        σ_{k+1} = min { ‖A − A_k‖ : A_k has rank k }

    Ideally, we would like e_k to be close to σ_{k+1} with high probability.

    This is not the case for the basic scheme: the expectation of e_k/σ_{k+1} is large, and e_k has very large variance.

  • Oversampling …

    Oversample a little: if p is a small integer (think p = 5), then we can often bound e_{k+p} by something close to σ_{k+1}:

        E ‖A − A_{k+p}‖ ≤ (1 + k/(p − 1)) σ_{k+1} + (e √(k + p) / p) (sum_{j=k+1}^{n} σ_j²)^{1/2}
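    An empirical check of the oversampling effect (a sketch, reusing the randomized_svd routine sketched above): draw k + p samples and compare the achieved error to σ_{k+1}, the best possible rank-k error.

```python
import numpy as np

rng = np.random.default_rng(0)

# Build a 500 × 400 matrix with smoothly decaying singular values
U0, _ = np.linalg.qr(rng.normal(size=(500, 400)))
V0, _ = np.linalg.qr(rng.normal(size=(400, 400)))
sigma = 1.0 / (1.0 + np.arange(400))**2
A = (U0 * sigma) @ V0.T

k, p = 10, 5                                  # target rank k, oversampling p
U, s, Vh = randomized_svd(A, k + p, rng=rng)  # k + p samples instead of k
err = np.linalg.norm(A - U @ np.diag(s) @ Vh, 2)

print(err, sigma[k])   # err lands close to σ_{k+1}, as the bound predicts
```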