singular value decomposition (svd) · the singular value decomposition for a matrix a writes a as a...


Post on 23-Oct-2020


  • Singular Value Decomposition (SVD)

    It is a fundamental matrix operation. A generalization of diagonalization of a matrix.

    The singular value decomposition for a matrix A writes A as a product (hanger)(stretcher)(aligner)

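    Example: For a 5 x 4 matrix A, the singular value decomposition of A is given by:

    $$
    A = \left( \vec{h}_1 \mid \vec{h}_2 \mid \vec{h}_3 \mid \vec{h}_4 \mid \vec{h}_5 \right)
    \begin{pmatrix}
    s_1 & 0 & 0 & 0 \\
    0 & s_2 & 0 & 0 \\
    0 & 0 & s_3 & 0 \\
    0 & 0 & 0 & s_4 \\
    0 & 0 & 0 & 0
    \end{pmatrix}
    \begin{pmatrix} a_1 \\ a_2 \\ a_3 \\ a_4 \end{pmatrix}
    $$

    h and a are orthogonal and s is a 5 by 4 diagonal matrix with real, nonnegative diagonal elements $s_i$, such that $s_i = 0$ for $i > \min(5,4)$.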
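The 5 x 4 example can be checked numerically. The following is a minimal sketch using NumPy's `np.linalg.svd`; the random matrix and the variable names `hanger`, `stretcher` and `aligner` are illustrative assumptions, not part of the original slides.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 4))          # arbitrary 5 x 4 example matrix

# full_matrices=True returns the full 5 x 5 hanger and the 4 x 4 aligner
hanger, s, aligner = np.linalg.svd(A, full_matrices=True)

# Build the 5 x 4 stretcher: s_1..s_4 on the diagonal, last row all zeros
stretcher = np.zeros((5, 4))
stretcher[:4, :4] = np.diag(s)

# (hanger)(stretcher)(aligner) reproduces A
A_rebuilt = hanger @ stretcher @ aligner
print(hanger.shape, stretcher.shape, aligner.shape)
print(np.allclose(A, A_rebuilt))
```

NumPy returns the singular values as a 1-D array sorted in descending order, so the diagonal stretcher has to be assembled by hand when the full matrix form is wanted.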

  • If we apply this operation to the cross-covariance matrix of two data sets $Y_{nt,ny}$ and $Z_{nt,nz}$, where nt is the number of time steps and ny and nz are the numbers of space points/variables, then, just like CCA, SVD isolates linear combinations of variables within the two fields that tend to be linearly related.

    If $Y_{nt,ny}$ and $Z_{nt,nz}$ are equal, SVD provides the principal components.

    SVD provides spatial patterns from the two fields that explain most of the covariance between them.

    Also called Maximum Covariance Analysis.
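The special case of equal fields can be illustrated with a short sketch. It assumes synthetic, column-centered data (the array `Y` and the random seed are made up for illustration) and checks that the SVD of the covariance matrix of Y with itself returns the eigenvalues of that matrix, which are the variances of the principal components.

```python
import numpy as np

rng = np.random.default_rng(1)
nt, ny = 200, 3

# Synthetic field with correlated columns, then remove the column means
Y = rng.standard_normal((nt, ny)) @ rng.standard_normal((ny, ny))
Y = Y - Y.mean(axis=0)

# With Z = Y the cross-covariance matrix is the ordinary covariance matrix
C_YY = Y.T @ Y / (nt - 1)

G, sigma, Ht = np.linalg.svd(C_YY)

# Eigenvalues of the symmetric covariance matrix, sorted in descending order
eigvals = np.linalg.eigvalsh(C_YY)[::-1]

# Principal component time series and their variances
pcs = Y @ G
pc_var = pcs.var(axis=0, ddof=1)

print(np.allclose(sigma, eigvals), np.allclose(pc_var, sigma))
```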


  • Theory

    Given two data sets $Y_{nt,ny}$ and $Z_{nt,nz}$:

    nt is the number of time steps, which must be the same for Y and Z. The numbers of space points/variables, ny and nz, do not need to be the same.

    The mean of each column of Y and Z must be zero.

    Find spatial patterns G and H such that the time series U and V, which are linear combinations of Y and Z, respectively, explain most of the total covariance:

    U = YG    (1)

    V = ZH    (2)

    The cross-covariance matrix of Y and Z is:

    $$C_{YZ} = \frac{1}{nt-1}\, Y'Z$$

    The patterns G and H are chosen such that the covariance between U and V is maximized:

    $$\mathrm{cov}(U,V) = \frac{1}{nt-1}\, U'V = \max \qquad (3)$$
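As a rough numerical illustration of the cross-covariance matrix, the sketch below builds $C_{YZ}$ from two centered synthetic fields and evaluates the covariance between U and V for one arbitrary pair of unit-length patterns. The data, the seed and the vectors `g` and `h` are assumptions chosen only for the example.

```python
import numpy as np

rng = np.random.default_rng(2)
nt, ny, nz = 100, 4, 3

Y = rng.standard_normal((nt, ny))
Z = rng.standard_normal((nt, nz))

# The column means must be zero
Y = Y - Y.mean(axis=0)
Z = Z - Z.mean(axis=0)

# Cross-covariance matrix, shape ny x nz
C_YZ = Y.T @ Z / (nt - 1)

# Arbitrary (not optimal) patterns, normalized to unit length
g = rng.standard_normal(ny)
g = g / np.linalg.norm(g)
h = rng.standard_normal(nz)
h = h / np.linalg.norm(h)

# Time series and their covariance
u = Y @ g
v = Z @ h
cov_uv = u @ v / (nt - 1)

# The same number, written in terms of the cross-covariance matrix
print(np.isclose(cov_uv, g @ C_YZ @ h))
```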

  • Using (1) and (2) in (3) gives:

    $$\mathrm{cov}(U,V) = \frac{1}{nt-1}\, G'Y'ZH = G'C_{YZ}H = \max$$

    We now have a maximization problem with constraints: G'G = H'H = 1, i.e., each pattern is normalized to unit length.

    Introduce the Lagrange multipliers µ and σ. The function to be maximized becomes:

    $$F(G,H) = G'C_{YZ}H - \sigma(G'G - 1) - \mu(H'H - 1)$$

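    Setting the derivatives $\partial F/\partial G' = 0$ and $\partial F/\partial H = 0$, we can find the maximum values of F (constant factors from differentiating the quadratic constraints are absorbed into σ and µ):

    $$\frac{\partial F}{\partial G'} = C_{YZ}H - \sigma G = 0 \qquad (4)$$

    $$\frac{\partial F}{\partial H} = G'C_{YZ} - \mu H' = 0 \qquad (5)$$

    Solution:

    Left-multiplying (4) by G' and right-multiplying (5) by H, and using the constraints, gives $\sigma = \mu = G'C_{YZ}H$. Transposing (5), with $C_{ZY} = C_{YZ}'$, then gives

    $$H = \frac{C_{ZY}\,G}{\sigma}$$

    and substituting this into (4):

    $$(C_{YZ}C_{ZY} - \sigma^2 I)\,G = 0$$

    where $\sigma^2$ are the eigenvalues of $C_{YZ}C_{ZY}$ and G the eigenvectors.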
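The eigenvalue form of the solution can be verified on synthetic data. This is a sketch under stated assumptions: the two fields share one made-up common signal, and only the leading mode is examined. It solves the eigenproblem for $C_{YZ}C_{ZY}$, recovers H from G, and compares σ with the leading singular value of $C_{YZ}$.

```python
import numpy as np

rng = np.random.default_rng(3)
nt, ny, nz = 150, 4, 3

# Two synthetic fields sharing one common signal plus noise
signal = rng.standard_normal(nt)
Y = np.outer(signal, rng.standard_normal(ny)) + 0.5 * rng.standard_normal((nt, ny))
Z = np.outer(signal, rng.standard_normal(nz)) + 0.5 * rng.standard_normal((nt, nz))
Y = Y - Y.mean(axis=0)
Z = Z - Z.mean(axis=0)

C_YZ = Y.T @ Z / (nt - 1)
C_ZY = C_YZ.T

# Eigenproblem (C_YZ C_ZY - sigma^2 I) G = 0; the product is symmetric
eigvals, eigvecs = np.linalg.eigh(C_YZ @ C_ZY)
k = np.argmax(eigvals)                 # leading mode
sigma = np.sqrt(eigvals[k])
G = eigvecs[:, k]

# Recover the right pattern from H = C_ZY G / sigma
H = C_ZY @ G / sigma

# sigma should match the largest singular value of C_YZ
sigma_svd = np.linalg.svd(C_YZ, compute_uv=False)[0]
print(np.isclose(sigma, sigma_svd), np.isclose(G @ C_YZ @ H, sigma))
```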

  • The SVD (matrix operation) of any matrix $M_{nt,nz}$ is given by

    $$M = G\,\sigma\,H'$$

    where G and H are orthogonal, with columns $G_k$ and $H_k$, and σ is an nt by nz diagonal matrix with real, nonnegative diagonal elements $\sigma_i$, such that

    $$\sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_{\min(nt,nz)} \ge 0$$

    Comparing with an eigenvalue problem:

    $\sigma_i$ are the singular values of M (the analogue of eigenvalues), $G_k$ is the left singular vector (the analogue of a left eigenvector), $H_k$ is the right singular vector (the analogue of a right eigenvector), and k is the mode of the singular vector.

    σ has at most as many nonzero elements as the minimum between nt and nz; the remaining ones are zero.

    G and H are composed of a number of vectors equal to the minimum between nt and nz.

    If SVD is applied to the cross-covariance matrix $C_{YZ}$ between two fields, $Y_{nt,ny}$ the predictor (left field) and $Z_{nt,nz}$ the predictand (right field), we can identify the pairs of spatial patterns $G_{ny,k}$ and $H_{nz,k}$ that explain most of the temporal covariance between the two fields.

    The new time series are:

    $$U_{nt,k} = Y_{nt,ny}\,G_{ny,k} \quad\text{and}\quad V_{nt,k} = Z_{nt,nz}\,H_{nz,k}$$
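Applying the SVD to a cross-covariance matrix and forming the new time series might look like the following sketch. The synthetic fields and seed are assumptions; `full_matrices=False` is used so that G and H each contain min(ny, nz) vectors.

```python
import numpy as np

rng = np.random.default_rng(4)
nt, ny, nz = 120, 5, 3

# Synthetic predictor (left) and predictand (right) fields with a shared signal
signal = rng.standard_normal(nt)
Y = np.outer(signal, rng.standard_normal(ny)) + 0.5 * rng.standard_normal((nt, ny))
Z = np.outer(signal, rng.standard_normal(nz)) + 0.5 * rng.standard_normal((nt, nz))
Y = Y - Y.mean(axis=0)
Z = Z - Z.mean(axis=0)

C_YZ = Y.T @ Z / (nt - 1)

# Reduced SVD: G is ny x k, H is nz x k, with k = min(ny, nz)
G, sigma, Ht = np.linalg.svd(C_YZ, full_matrices=False)
H = Ht.T

# New time series (expansion coefficients), one column per mode
U = Y @ G
V = Z @ H

# Covariance between the time series: diagonal, with sigma_k on the diagonal
cov_UV = U.T @ V / (nt - 1)
print(np.allclose(cov_UV, np.diag(sigma)))
```

The off-diagonal zeros show that the kth left time series is uncorrelated with every right time series of a different mode.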

  • For easier interpretation of the spatial patterns (singular vectors G and H), we look at them as correlation maps.

    Homogeneous correlation maps (g):

    - vector of correlations between the values of one field (Y or Z) and the expansion coefficient (time series) of the kth mode of the same field (U = YG for Y and V = ZH for Z).

    - indicator of the geographic localization of the covarying part of the field.

    Heterogeneous correlation maps (h):

    - vector of correlations between the values of one field (Y or Z) and the expansion coefficient of the kth mode of the other field (V = ZH for Y and U = YG for Z).

    - indicates how well the data points of one field can be predicted from the kth expansion coefficient of the other field.

    You need to calculate the significance of these correlations.
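A minimal sketch of both kinds of correlation map for the leading mode follows. The helper `corr_map` is a hypothetical name introduced here, and the data are synthetic; significance testing is not included.

```python
import numpy as np

def corr_map(field, series):
    """Pearson correlation between each column of `field` and `series`."""
    f = (field - field.mean(axis=0)) / field.std(axis=0)
    s = (series - series.mean()) / series.std()
    return f.T @ s / len(s)

rng = np.random.default_rng(5)
nt, ny, nz = 120, 5, 3

signal = rng.standard_normal(nt)
Y = np.outer(signal, rng.standard_normal(ny)) + 0.5 * rng.standard_normal((nt, ny))
Z = np.outer(signal, rng.standard_normal(nz)) + 0.5 * rng.standard_normal((nt, nz))
Y = Y - Y.mean(axis=0)
Z = Z - Z.mean(axis=0)

G, sigma, Ht = np.linalg.svd(Y.T @ Z / (nt - 1), full_matrices=False)
U = Y @ G
V = Z @ Ht.T

k = 0  # leading mode

# Homogeneous maps: each field against its own expansion coefficient
g_Y = corr_map(Y, U[:, k])
g_Z = corr_map(Z, V[:, k])

# Heterogeneous maps: each field against the other field's expansion coefficient
h_Y = corr_map(Y, V[:, k])
h_Z = corr_map(Z, U[:, k])

print(g_Y.shape, g_Z.shape, h_Y.shape, h_Z.shape)
```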

  • The maximum covariance is equal to the largest singular value $\sigma_1$.

    The total squared covariance explained by a single pair of patterns $(G_k, H_k)$ is $\sigma_k^2$.

    The squared covariance fraction (SCF), i.e., the fraction of the squared covariance explained by the kth pair of patterns, is:

    $$SCF_k = \frac{\sigma_k^2}{\sum_{l=1}^{\min(nt,nz)} \sigma_l^2}$$

    SCF is useful for comparing the relative importance of modes within one given expansion.

    To compare different SVD expansions, the normalized squared covariance (NSC) is used, defined as:

    $$NSC_k = \left( \frac{\sigma_k^2}{\sum_i \sum_j var_i\, var_j} \right)^{1/2}$$

    where $var_i$ and $var_j$ are the variances of the ith variable in the left field and the jth variable in the right field, respectively. For normalized data, $\sum_i \sum_j var_i\, var_j = 1$.

    $NSC_k \ge 0$
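Both measures follow directly from the singular values and the variances of the two fields. The sketch below assumes synthetic centered data and uses sample variances with nt - 1 in the denominator, matching the cross-covariance definition.

```python
import numpy as np

rng = np.random.default_rng(6)
nt, ny, nz = 120, 5, 3

signal = rng.standard_normal(nt)
Y = np.outer(signal, rng.standard_normal(ny)) + 0.5 * rng.standard_normal((nt, ny))
Z = np.outer(signal, rng.standard_normal(nz)) + 0.5 * rng.standard_normal((nt, nz))
Y = Y - Y.mean(axis=0)
Z = Z - Z.mean(axis=0)

C_YZ = Y.T @ Z / (nt - 1)
sigma = np.linalg.svd(C_YZ, compute_uv=False)

# Squared covariance fraction of each mode
scf = sigma**2 / np.sum(sigma**2)

# Normalized squared covariance of each mode
var_Y = Y.var(axis=0, ddof=1)           # var_i, left field
var_Z = Z.var(axis=0, ddof=1)           # var_j, right field
denominator = np.sum(np.outer(var_Y, var_Z))
nsc = np.sqrt(sigma**2 / denominator)

print(scf)
print(nsc)
```

Because each squared covariance is bounded by the product of the two variances, the NSC values computed this way cannot exceed one.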

  • Caveats

    SVD can only be applied to data sets under specific conditions. Solution: compare the heterogeneous maps with maps of simple correlation.

    Newman and Sardeshmukh, 1995, J. Climate, 8, 352-360.

    Modeling

    If Y and Z denote the predictor and predictand fields, respectively, the model is generated by first calculating a matrix of regression coefficients (A) which relates the values of the predictor time series (U) to the individual points in the predictand field (Z):

    $$A = (U'U)^{-1}\,U'Z$$

    and

    $$\hat{Z} = UA$$

    but U = YG, so that

    $$\hat{Z} = YGA$$

    In practice:

    - choose only the significant modes of U to compose the model, so that the model will not fit to noise;

    - reserve part of the data set for validation, so that

    $$\hat{Z}_{val} = Y_{val}\,G\,A$$
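The modeling steps can be sketched as follows. Everything specific here is an assumption made for illustration: the synthetic data, the 150/50 split between fitting and validation periods, and the choice of a single retained mode (`n_modes = 1`), which in a real application should come from a significance test.

```python
import numpy as np

rng = np.random.default_rng(7)
nt, ny, nz = 200, 5, 3
n_fit = 150                              # first part fits the model, rest validates

signal = rng.standard_normal(nt)
Y = np.outer(signal, rng.standard_normal(ny)) + 0.5 * rng.standard_normal((nt, ny))
Z = np.outer(signal, rng.standard_normal(nz)) + 0.5 * rng.standard_normal((nt, nz))

# Center both periods with the means of the fitting period only
Y_mean, Z_mean = Y[:n_fit].mean(axis=0), Z[:n_fit].mean(axis=0)
Y_fit, Z_fit = Y[:n_fit] - Y_mean, Z[:n_fit] - Z_mean
Y_val, Z_val = Y[n_fit:] - Y_mean, Z[n_fit:] - Z_mean

# SVD of the cross-covariance matrix of the fitting period
C_YZ = Y_fit.T @ Z_fit / (n_fit - 1)
G, sigma, Ht = np.linalg.svd(C_YZ, full_matrices=False)

n_modes = 1                              # assumed number of significant modes
G_k = G[:, :n_modes]

# Regression coefficients A = (U'U)^{-1} U'Z
U = Y_fit @ G_k
A = np.linalg.solve(U.T @ U, U.T @ Z_fit)

# Fitted values and prediction for the reserved validation period
Z_hat_fit = U @ A
Z_hat_val = Y_val @ G_k @ A

rmse_val = np.sqrt(np.mean((Z_val - Z_hat_val) ** 2))
print(A.shape, Z_hat_val.shape, rmse_val)
```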

  • Examples of SVD use:

    SVD working as EOF. Data: precipitation (Uvo and Berndtsson, 1996).

    [Figure: data matrix of stations (Stn 1, 2, 3, 4, …, n) against time (1, 2, 3, 4, …, t)]


  • SVD Relating SST to precipitation (Uvo et al. 1998)


  • NSC = 14.8%

    NSC = 20.6%

  • Suggested references:

    - Bretherton et al., 1992, J. Climate
    - Wallace et al., 1992, J. Climate
    - Cherry, 1996, J. Climate
    - Uvo and Berndtsson, 1996, JGR-Atmospheres
    - Uvo et al., 1998, J. Climate

  • Report

    You are expected to read through the references above before doing your report on SVD.

    The report should contain at least:

    - Description of the data.
    - Application of SVD to your data set.
    - Physical interpretation of the results.

    How is the explained variance distributed among the modes? How many modes do you consider important, and why? Can you explain physically the modes you consider important? What do they mean?

    If you are using the same data you developed CCA with, compare CCA and SVD results.

    Develop an SVD model.

    Which one performed better, the CCA or the SVD model?

    Other analysis or comments.