
  • Golub-Kahan iterative bidiagonalization and

    determining the noise level in the data

    Iveta Hnětynková ∗,∗∗, Martin Plešinger ∗∗,∗∗∗, Zdeněk Strakoš ∗,∗∗

    * Charles University, Prague

    ** Academy of Sciences of the Czech republic, Prague

    *** Technical University, Liberec

    with thanks to P. C. Hansen, M. Kilmer and many others

    SIAM ALA, Monterey, October 2009

    1

  • A six-ton large-scale real-world ill-posed problem:

    2

  • Solving large scale discrete ill-posed problems is frequently based upon

    orthogonal projections-based model reduction using Krylov sub-

    spaces, see, e.g., hybrid methods. This can be viewed as

    approximation of a Riemann-Stieltjes distribution function

    via matching moments.

    3

  • Consider the Riemann-Stieltjes distribution function ω(λ) with the
    n points of increase associated with the HPD matrix B and the
    normalized initial vector s (or with the transfer function given by
    the Laplace transform of a linear dynamical system determined by
    B, s). Then

        s^* (λI − B)^{−1} s = Σ_{j=1}^{n} ω_j / (λ − λ_j) ≡ F_n(λ) ,

    where λ_j, j = 1, . . . , n, denote the eigenvalues of B and ω_j the
    squared size of the component of s in the corresponding invariant
    subspace.

    4

  • The function F_n(λ) can be written as the continued fraction

        F_n(λ) ≡ R_n(λ) / P_n(λ)
               ≡ 1 / (λ − γ_1 − δ_2^2 / (λ − γ_2 − δ_3^2 / (λ − γ_3 − · · · − δ_n^2 / (λ − γ_n)))) ,

    and the entries γ_1, . . . , γ_n and δ_2, . . . , δ_n form the Jacobi matrix

    5

  • T_n ≡
        [ γ_1  δ_2                ]
        [ δ_2  γ_2  ...           ]
        [      ...  ...   δ_n     ]
        [           δ_n   γ_n     ] ,   δ_ℓ > 0 ,  ℓ = 2, . . . , n .

    Consider the kth Gauss-Christoffel quadrature approximation ω(k)(λ)
    of the Riemann-Stieltjes distribution function ω(λ). Its algebraic
    degree is 2k − 1, i.e., it matches the first 2k moments

        ξ_{ℓ−1} = ∫ λ^{ℓ−1} dω(λ) = Σ_{j=1}^{k} ω_j^{(k)} (λ_j^{(k)})^{ℓ−1} ,   ℓ = 1, . . . , 2k .

    6
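The moment-matching property can be checked numerically. The following is a minimal sketch (numpy assumed; the matrix B, the vector s, and the `jacobi_matrix` helper are illustrative, not from the talk): build the Jacobi matrix T_k by the Lanczos process for a small SPD matrix, and verify that its eigenvalues and squared first eigenvector components reproduce the first 2k moments ∫ λ^{ℓ−1} dω(λ) = s^T B^{ℓ−1} s.

```python
import numpy as np

def jacobi_matrix(B, s, k):
    """k steps of the Lanczos tridiagonalization of the SPD matrix B
    started from s/||s||; returns the k-by-k Jacobi matrix T_k."""
    n = len(s)
    Q = np.zeros((n, k + 1))
    alpha, beta = np.zeros(k), np.zeros(k + 1)
    Q[:, 0] = s / np.linalg.norm(s)
    for j in range(k):
        w = B @ Q[:, j] - (beta[j] * Q[:, j - 1] if j > 0 else 0.0)
        alpha[j] = Q[:, j] @ w
        w -= alpha[j] * Q[:, j]
        w -= Q[:, :j + 1] @ (Q[:, :j + 1].T @ w)   # reorthogonalize
        beta[j + 1] = np.linalg.norm(w)
        Q[:, j + 1] = w / beta[j + 1]
    return np.diag(alpha) + np.diag(beta[1:k], 1) + np.diag(beta[1:k], -1)

rng = np.random.default_rng(0)
M = rng.standard_normal((8, 8))
B = M @ M.T + 8 * np.eye(8)          # a small SPD test matrix (illustrative)
s = rng.standard_normal(8)
s /= np.linalg.norm(s)

k = 3
Tk = jacobi_matrix(B, s, k)
nodes, vecs = np.linalg.eigh(Tk)     # nodes lambda_j^{(k)} of the quadrature
weights = vecs[0, :] ** 2            # squared first eigenvector components

# Gauss-Christoffel quadrature of algebraic degree 2k-1:
# the first 2k moments match
for ell in range(1, 2 * k + 1):
    moment = s @ np.linalg.matrix_power(B, ell - 1) @ s
    quad = weights @ nodes ** (ell - 1)
    assert abs(quad - moment) <= 1e-8 * abs(moment)
```

The weights sum to one because they are the squared first components of a complete orthonormal eigenvector basis.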

  • The nodes and weights of ω(k)(λ) are given by the eigenvalues and
    the corresponding squared first elements of the normalized
    eigenvectors of T_k.

    Expansion of the continued fraction F_n(λ) in terms of the decreasing
    powers of λ and the approximation by its kth convergent F_k(λ) gives

        F_n(λ) = Σ_{ℓ=1}^{2k} ξ_{ℓ−1} / λ^ℓ + O(1/λ^{2k+1}) = F_k(λ) + O(1/λ^{2k+1}) .

    Here F_k(λ) approximates F_n(λ) with the error proportional to
    λ^{−(2k+1)}, which represents the minimal partial realization matching
    the first 2k moments, cf. [Stieltjes – 1894, Chebyshev – 1855].

    7

  • Discrete ill-posed problem,
    the smallest node and weight in approximation of ω(λ):

    [Figure: weight of the leftmost node. Plotted: ω(λ) with nodes σ_j^2
    and weights |(b/β_1, u_j)|^2; the approximations with nodes
    (θ_1^{(k)})^2 and weights |(p_1^{(k)}, e_1)|^2; and the level δnoise^2.]

    8

  • Outline

    1. Problem formulation

    2. Golub-Kahan iterative bidiagonalization, Lanczos tridiagonalization,
       and approximation of the Riemann-Stieltjes distribution function

    3. Propagation of the noise in the Golub-Kahan bidiagonalization

    4. Determination of the noise level

    5. Numerical illustration

    6. Summary and future work

    9

  • Consider an ill-posed square nonsingular linear algebraic system

        A x ≈ b ,   A ∈ R^{n×n} ,   b ∈ R^n ,

    with the right-hand side corrupted by white noise,

        b = bexact + bnoise ≠ 0 ,   ‖bexact‖ ≫ ‖bnoise‖ ,

    and the goal to approximate xexact ≡ A^{−1} bexact.

    The noise level

        δnoise ≡ ‖bnoise‖ / ‖bexact‖ ≪ 1 .

    10

  • Properties (assumptions):

    • matrices A, A^T, A A^T have a smoothing property;

    • left singular vectors uj of A represent increasing frequencies

    as j increases;

    • the system A x = bexact satisfies the discrete Picard condition.

    Discrete Picard condition (DPC):

    On average, the components |(bexact, uj)| of the true right-hand

    side bexact in the left singular subspaces of A decay faster than

    the singular values σj of A, j = 1, . . . , n .

    White noise:

    The components |(bnoise, uj)|, j = 1, . . . , n do not exhibit any trend.

    11

  • Problem Shaw: Noise level, Singular values, and DPC:

    [Hansen – Regularization Tools]

    [Figure, left: σ_j in double precision arithmetic, σ_j in high precision
    arithmetic, and |(bexact, u_j)| in high precision arithmetic, versus the
    singular value number. Right: σ_j and |(b, u_j)| for
    δnoise = 10^{−14}, 10^{−8}, and 10^{−4}.]

    12

  • Regularization is used to suppress the effect of errors in the data

    and extract the essential information about the solution.

    In hybrid methods, see [O'Leary, Simmons – 81], [Hansen – 98], or
    [Fierro, Golub, Hansen, O'Leary – 97], [Kilmer, O'Leary – 01], [Kilmer,
    Hansen, Español – 06], the outer bidiagonalization is combined with an
    inner regularization – e.g., truncated SVD (TSVD) or Tikhonov
    regularization – of the projected small problem (i.e., of the reduced
    model).

    The bidiagonalization is stopped when the regularized solution of the

    reduced model matches some selected stopping criteria.

    13

  • Stopping criteria are typically based, amongst others, see [Björck –
    88], [Björck, Grimme, Van Dooren – 94], on

    • estimation of the L-curve [Calvetti, Golub, Reichel – 99], [Calvetti,
      Morigi, Reichel, Sgallari – 00], [Calvetti, Reichel – 04];

    • estimation of the distance between the exact and regularized
      solution [O'Leary – 01];

    • the discrepancy principle [Morozov – 66], [Morozov – 84];

    • cross validation methods [Chung, Nagy, O'Leary – 04], [Golub,
      Von Matt – 97], [Nguyen, Milanfar, Golub – 01].

    For an extensive study and comparison see [Hansen – 98], [Kilmer,

    O’Leary – 01].

    14

  • Stopping criteria based on information from residual vectors:

    A vector x̂ is a good approximation to xexact = A^{−1} bexact if the
    corresponding residual approximates the (white) noise in the data,

        r̂ ≡ b − A x̂ ≈ bnoise .

    Behavior of r̂ can be expressed in the frequency domain using

    • discrete Fourier transform, see [Rust – 98], [Rust – 00],

    [Rust, O’Leary – 08], or

    • discrete cosine transform, see [Hansen, Kilmer, Kjeldsen – 06],

    and then analyzed using statistical tools – cumulative periodograms.

    15
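The whiteness of a residual can be illustrated with a cumulative periodogram. A minimal numpy sketch (the helper name and the test signals are illustrative, not from the cited works): for white noise the normalized cumulative periodogram stays close to a straight line, while a smooth residual concentrates its spectral mass at low frequencies.

```python
import numpy as np

def cumulative_periodogram(r):
    """Normalized cumulative periodogram of a real vector r
    (zero-frequency term dropped)."""
    power = np.abs(np.fft.rfft(r)) ** 2
    power = power[1:]
    c = np.cumsum(power)
    return c / c[-1]

rng = np.random.default_rng(0)
n = 4096
t = np.arange(n) / n

c_white = cumulative_periodogram(rng.standard_normal(n))
c_smooth = cumulative_periodogram(np.sin(2 * np.pi * 3 * t))

# white noise: close to the straight line from 0 to 1
line = np.arange(1, len(c_white) + 1) / len(c_white)
assert np.max(np.abs(c_white - line)) < 0.1

# smooth (low-frequency) signal: almost all mass in the first few frequencies
assert c_smooth[4] > 0.99
```

A Kolmogorov-Smirnov-type band around the straight line is what the cited periodogram-based stopping criteria test in practice.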

  • This talk:

    Under the given (quite natural) assumptions, the Golub-Kahan itera-

    tive bidiagonalization reveals the noise level δnoise .

    16

  • Outline

    1. Problem formulation

    2. Golub-Kahan iterative bidiagonalization, Lanczos tridiagonalization,
       and approximation of the Riemann-Stieltjes distribution function

    3. Propagation of the noise in the Golub-Kahan bidiagonalization

    4. Determination of the noise level

    5. Numerical illustration

    6. Summary and future work

    17

  • Golub-Kahan iterative bidiagonalization (GK) of A :

    given w_0 = 0 , s_1 = b / β_1 , where β_1 = ‖b‖ , for j = 1, 2, . . .

        α_j w_j = A^T s_j − β_j w_{j−1} ,   ‖w_j‖ = 1 ,
        β_{j+1} s_{j+1} = A w_j − α_j s_j ,   ‖s_{j+1}‖ = 1 .

    Let Sk = [ s1, . . . , sk ] , Wk = [w1, . . . , wk ] be the associated

    matrices with orthonormal columns.

    18
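The GK recurrence above can be sketched in a few lines of numpy (a minimal illustration with a random test matrix; reorthogonalization is added for numerical safety, and the `golub_kahan` helper is not from the talk). The asserts verify the matrix formulation A^T S_k = W_k L_k^T and A W_k = S_{k+1} L_{k+}:

```python
import numpy as np

def golub_kahan(A, b, k):
    """k steps of Golub-Kahan bidiagonalization started from s_1 = b/||b||.
    Returns S_{k+1}, W_k and the coefficients alpha_1..alpha_k,
    beta_1..beta_{k+1} (stored as beta[0..k])."""
    n = A.shape[0]
    S = np.zeros((n, k + 1))
    W = np.zeros((n, k))
    alpha, beta = np.zeros(k), np.zeros(k + 1)
    beta[0] = np.linalg.norm(b)
    S[:, 0] = b / beta[0]
    for j in range(k):
        w = A.T @ S[:, j] - (beta[j] * W[:, j - 1] if j > 0 else 0.0)
        w -= W[:, :j] @ (W[:, :j].T @ w)          # reorthogonalize
        alpha[j] = np.linalg.norm(w)
        W[:, j] = w / alpha[j]
        s = A @ W[:, j] - alpha[j] * S[:, j]
        s -= S[:, :j + 1] @ (S[:, :j + 1].T @ s)  # reorthogonalize
        beta[j + 1] = np.linalg.norm(s)
        S[:, j + 1] = s / beta[j + 1]
    return S, W, alpha, beta

rng = np.random.default_rng(0)
A = rng.standard_normal((10, 10))
b = rng.standard_normal(10)
k = 4
S, W, alpha, beta = golub_kahan(A, b, k)

# lower bidiagonal L_k and the extended matrix L_{k+}
Lk = np.diag(alpha) + np.diag(beta[1:k], -1)
Lkp = np.vstack([Lk, beta[k] * np.eye(1, k, k - 1)])

assert np.allclose(A.T @ S[:, :k], W @ Lk.T)   # A^T S_k = W_k L_k^T
assert np.allclose(A @ W, S @ Lkp)             # A W_k = S_{k+1} L_{k+}
assert np.allclose(S.T @ S, np.eye(k + 1))     # orthonormal columns
```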

  • Denote

        L_k =
            [ α_1                ]
            [ β_2  α_2           ]
            [      ...  ...      ]
            [           β_k  α_k ] ,

        L_{k+} = [      L_k       ]
                 [ β_{k+1} e_k^T  ] ,

    the lower bidiagonal matrices containing the normalization
    coefficients. Then GK can be written in the matrix form as

        A^T S_k = W_k L_k^T ,

        A W_k = [ S_k , s_{k+1} ] L_{k+} = S_{k+1} L_{k+} .

    19

  • GK is closely related to the Lanczos tridiagonalization of the
    symmetric matrix A A^T with the starting vector s_1 = b / β_1 :

        A A^T S_k = S_k T_k + α_k β_{k+1} s_{k+1} e_k^T ,

        T_k = L_k L_k^T =
            [ α_1^2     α_1 β_2                              ]
            [ α_1 β_2   α_2^2 + β_2^2   ...                  ]
            [           ...             ...    α_{k−1} β_k   ]
            [                  α_{k−1} β_k   α_k^2 + β_k^2   ] ,

    i.e., the matrix L_k from GK represents a Cholesky factor of the
    symmetric tridiagonal matrix T_k from the Lanczos process.

    20
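Both relations can be checked numerically. A minimal sketch (numpy assumed; the random test data are illustrative) runs the plain GK recurrence and verifies the Lanczos relation for A A^T together with the Cholesky property of L_k:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 6, 3
A = rng.standard_normal((n, n))
b = rng.standard_normal(n)

# plain GK recurrence (no reorthogonalization; fine for this tiny example)
S = np.zeros((n, k + 1)); W = np.zeros((n, k))
alpha, beta = np.zeros(k), np.zeros(k + 1)
beta[0] = np.linalg.norm(b)
S[:, 0] = b / beta[0]
for j in range(k):
    w = A.T @ S[:, j] - (beta[j] * W[:, j - 1] if j > 0 else 0.0)
    alpha[j] = np.linalg.norm(w); W[:, j] = w / alpha[j]
    s = A @ W[:, j] - alpha[j] * S[:, j]
    beta[j + 1] = np.linalg.norm(s); S[:, j + 1] = s / beta[j + 1]

Lk = np.diag(alpha) + np.diag(beta[1:k], -1)
Tk = Lk @ Lk.T     # symmetric tridiagonal; L_k is its Cholesky factor

# Lanczos relation: A A^T S_k = S_k T_k + alpha_k beta_{k+1} s_{k+1} e_k^T
R = np.zeros((n, k))
R[:, k - 1] = alpha[k - 1] * beta[k] * S[:, k]
assert np.allclose(A @ (A.T @ S[:, :k]), S[:, :k] @ Tk + R)
```

Since L_k is lower triangular with positive diagonal, it is exactly the (unique) Cholesky factor that `np.linalg.cholesky(Tk)` returns.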

  • Approximation of the distribution function:

    The Lanczos tridiagonalization of the given (SPD) matrix B ∈ R^{n×n}
    generates at each step k a non-decreasing piecewise constant
    distribution function ω(k), with the nodes being the (distinct)
    eigenvalues of the Lanczos matrix T_k and the weights ω_j^{(k)} being
    the squared first entries of the corresponding normalized eigenvectors
    [Hestenes, Stiefel – 52].

    The distribution functions ω(k)(λ), k = 1, 2, . . . , represent
    Gauss-Christoffel quadrature (i.e. minimal partial realization)
    approximations of the distribution function ω(λ), [Hestenes,
    Stiefel – 52], [Fischer – 96], [Meurant, Strakoš – 06].

    21

  • Consider the SVD

        L_k = P_k Θ_k Q_k^T ,

        P_k = [ p_1^{(k)}, . . . , p_k^{(k)} ] ,   Q_k = [ q_1^{(k)}, . . . , q_k^{(k)} ] ,
        Θ_k = diag ( θ_1^{(k)}, . . . , θ_k^{(k)} ) ,

    with the singular values ordered in the increasing order,

        0 < θ_1^{(k)} < . . . < θ_k^{(k)} .

    Then T_k = L_k L_k^T = P_k Θ_k^2 P_k^T is the spectral decomposition
    of T_k ; (θ_ℓ^{(k)})^2 are its eigenvalues (the Ritz values of A A^T)
    and p_ℓ^{(k)} its eigenvectors (which determine the Ritz vectors of
    A A^T), ℓ = 1, . . . , k .

    22

  • Summarizing:

    The GK bidiagonalization generates at each step k the distribution
    function

        ω(k)(λ) with nodes (θ_ℓ^{(k)})^2 and weights ω_ℓ^{(k)} = |(p_ℓ^{(k)}, e_1)|^2

    that approximates the distribution function

        ω(λ) with nodes σ_j^2 and weights ω_j = |(b/β_1, u_j)|^2 ,

    where σ_j^2, u_j are the eigenpairs of A A^T, for j = n, . . . , 1 .

    Note that unlike the Ritz values (θ_ℓ^{(k)})^2, the squared singular
    values σ_j^2 are enumerated in descending order.

    23

  • Outline

    1. Problem formulation

    2. Golub-Kahan iterative bidiagonalization, Lanczos tridiagonalization,
       and approximation of the Riemann-Stieltjes distribution function

    3. Propagation of the noise in the Golub-Kahan bidiagonalization

    4. Determination of the noise level

    5. Numerical illustration

    6. Summary and future work

    24

  • GK starts with the normalized noisy right-hand side s1 = b / ‖b‖ .

    Consequently, vectors sj contain information about the noise.

    Can this information be used to determine the level of the noise in

    the observation vector b ?

    Consider the problem Shaw from [Hansen – Regularization Tools]
    (computed via [A,b_exact,x] = shaw(400)) with a noisy right-hand
    side (the noise was artificially added using the MATLAB function
    randn). As an example we set

        δnoise ≡ ‖bnoise‖ / ‖bexact‖ = 10^{−14} .

    25
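The shaw problem itself comes from the MATLAB Regularization Tools and is not reproduced here; the noise-adding step is simple, though. A minimal numpy sketch (the `add_white_noise` helper and the smooth stand-in right-hand side are illustrative) scales a randn vector so that ‖bnoise‖/‖bexact‖ equals a prescribed δnoise exactly:

```python
import numpy as np

def add_white_noise(b_exact, delta_noise, rng):
    """Return (b, b_noise) with ||b_noise|| / ||b_exact|| == delta_noise."""
    e = rng.standard_normal(b_exact.shape)
    b_noise = e * (delta_noise * np.linalg.norm(b_exact) / np.linalg.norm(e))
    return b_exact + b_noise, b_noise

rng = np.random.default_rng(0)
# a smooth stand-in for shaw's exact right-hand side (illustrative)
b_exact = np.sin(np.linspace(0.0, np.pi, 400))
b, b_noise = add_white_noise(b_exact, 1e-14, rng)

ratio = np.linalg.norm(b_noise) / np.linalg.norm(b_exact)
assert abs(ratio - 1e-14) < 1e-20
```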

  • Components of several bidiagonalization vectors sj

    computed via GK with double reorthogonalization:

    [Figure: componentwise plots of s_1, s_6, s_11, s_16, s_17, s_18,
    s_19, s_20, s_21, and s_22.]

    26

  • The first 80 spectral coefficients of the vectors sj

    in the basis of the left singular vectors uj of A:

    [Figure: U^T s_j for j = 1, 6, 11, 16, 17, 18, 19, 20, 21, 22.]

    27

  • Signal space – noise space diagrams:

    [Figure: transitions s_1 → s_2, s_5 → s_6, s_6 → s_7, s_10 → s_11,
    s_13 → s_14, s_15 → s_16, s_16 → s_17, s_17 → s_18, s_18 → s_19,
    s_19 → s_20, s_20 → s_21, s_21 → s_22.]

    s_k (triangle) and s_{k+1} (circle) in the signal space
    span{u_1, . . . , u_{k+1}} (horizontal axis) and the noise space
    span{u_{k+2}, . . . , u_n} (vertical axis).

    28

  • The noise is amplified with the ratio α_k / β_{k+1} :

    GK for the spectral components:

        α_1 (V^T w_1) = Σ (U^T s_1) ,
        β_2 (U^T s_2) = Σ (V^T w_1) − α_1 (U^T s_1) ,

    and for k = 2, 3, . . .

        α_k (V^T w_k) = Σ (U^T s_k) − β_k (V^T w_{k−1}) ,
        β_{k+1} (U^T s_{k+1}) = Σ (V^T w_k) − α_k (U^T s_k) .

    29

  • Since dominance in Σ (U^T s_k) and (V^T w_{k−1}) is shifted by one
    component, one cannot expect a significant cancellation in

        α_k (V^T w_k) = Σ (U^T s_k) − β_k (V^T w_{k−1}) ,

    and therefore

        α_k ≈ β_k .

    Whereas Σ (V^T w_k) and (U^T s_k) do exhibit dominance in the
    direction of the same components. If this dominance is strong enough,
    then the required orthogonality of s_{k+1} and s_k in

        β_{k+1} (U^T s_{k+1}) = Σ (V^T w_k) − α_k (U^T s_k)

    cannot be achieved without a significant cancellation, and one can
    expect

        β_{k+1} ≪ α_k .

    30

  • Absolute values of the first 25 components of Σ (V^T w_k),
    α_k (U^T s_k), and β_{k+1} (U^T s_{k+1}) for k = 7,
    β_8/α_7 = 0.0524 (left) and for k = 12, β_13/α_12 = 0.677 (right).

    [Figure: the two semilogarithmic plots.]

    Shaw with the noise level δnoise = 10^{−14}.

    31

  • Outline

    1. Problem formulation

    2. Golub-Kahan iterative bidiagonalization, Lanczos tridiagonalization,
       and approximation of the Riemann-Stieltjes distribution function

    3. Propagation of the noise in the Golub-Kahan bidiagonalization

    4. Determination of the noise level

    5. Numerical illustration

    6. Summary and future work

    32

  • Depending on the noise level, the smaller nodes of ω(λ) are
    completely dominated by noise, i.e., there exists an index Jnoise such
    that for j ≥ Jnoise

        |(b/β_1, u_j)|^2 ≈ |(bnoise/β_1, u_j)|^2 ,

    and the weight of the set of the associated nodes is given by

        δ^2 ≡ Σ_{j=Jnoise}^{n} |(bnoise/β_1, u_j)|^2 .

    Recall that the large nodes σ_1^2, σ_2^2, . . . are well-separated
    (relative to the small ones) and their weights on average decrease
    faster than σ_1^2, σ_2^2, see (DPC). Therefore the large nodes
    essentially control the behavior of the early stages of the Lanczos
    process.

    33

  • At any iteration step, the weight corresponding to the smallest node
    (θ_1^{(k)})^2 must be larger than the sum of the weights of all σ_j^2
    smaller than this (θ_1^{(k)})^2, see [Fischer, Freund – 94]. As k
    increases, (θ_1^{(k)})^2 eventually approaches (or becomes smaller
    than) the node σ_{Jnoise}^2, and its weight becomes

        |(p_1^{(k)}, e_1)|^2 ≈ δ^2 ≈ δnoise^2 .

    The weight |(p_1^{(k)}, e_1)|^2 corresponding to the smallest Ritz
    value (θ_1^{(k)})^2 is strictly decreasing. At some iteration step it
    sharply starts to (almost) stagnate on the level close to the squared
    noise level δnoise^2.

    34
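The resulting estimator needs only the bidiagonal coefficients. The following sketch is illustrative, not the talk's exact procedure: the synthetic DPC-satisfying test problem and the simple stagnation heuristic (first step where the decrease of the weight nearly stops) are assumptions made here for a self-contained example. For each k, take the SVD of L_k and record the first component of the left singular vector belonging to the smallest singular value; the level where |(p_1^{(k)}, e_1)| stagnates estimates δnoise.

```python
import numpy as np

rng = np.random.default_rng(0)
n, kmax, delta_noise = 32, 25, 1e-6

# synthetic ill-posed problem satisfying the DPC (illustrative):
# singular values decay geometrically, |(b_exact, u_j)| decays faster
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
sigma = 10.0 ** (-12 * np.arange(n) / (n - 1))
A = U @ np.diag(sigma) @ V.T
b_exact = U @ sigma ** 2
e = rng.standard_normal(n)
b = b_exact + e * (delta_noise * np.linalg.norm(b_exact) / np.linalg.norm(e))

# GK bidiagonalization with full reorthogonalization
S = np.zeros((n, kmax + 1)); W = np.zeros((n, kmax))
alpha, beta = np.zeros(kmax), np.zeros(kmax + 1)
beta[0] = np.linalg.norm(b); S[:, 0] = b / beta[0]
for j in range(kmax):
    w = A.T @ S[:, j] - (beta[j] * W[:, j - 1] if j > 0 else 0.0)
    w -= W[:, :j] @ (W[:, :j].T @ w)
    alpha[j] = np.linalg.norm(w); W[:, j] = w / alpha[j]
    s = A @ W[:, j] - alpha[j] * S[:, j]
    s -= S[:, :j + 1] @ (S[:, :j + 1].T @ s)
    beta[j + 1] = np.linalg.norm(s); S[:, j + 1] = s / beta[j + 1]

# |(p_1^{(k)}, e_1)| for k = 1, ..., kmax
w1 = []
for k in range(1, kmax + 1):
    Lk = np.diag(alpha[:k]) + np.diag(beta[1:k], -1)
    P, _, _ = np.linalg.svd(Lk)    # singular values in decreasing order
    w1.append(abs(P[0, -1]))       # first entry of p_1^{(k)} (smallest theta)
w1 = np.array(w1)

# stagnation heuristic (an assumption): first step where the decrease
# of |(p_1^{(k)}, e_1)| nearly stops
ratios = w1[1:] / w1[:-1]
k_noise = int(np.argmax(ratios > 0.7)) + 1
delta_estimate = w1[k_noise]
assert 1e-8 < delta_estimate < 1e-4   # close to the true delta_noise = 1e-6
```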

  • Square roots of the weights |(p_1^{(k)}, e_1)|^2, k = 1, 2, . . . :

    [Figure: the values versus the iteration number k, with the level
    δnoise marked; k_noise = 18 − 1 = 17.]

    Shaw with the noise level δnoise = 10^{−14}.

    35

  • Square roots of the weights |(p_1^{(k)}, e_1)|^2, k = 1, 2, . . . :

    [Figure: the values versus the iteration number k, with the level
    δnoise marked; k_noise = 8 − 1 = 7.]

    Shaw with the noise level δnoise = 10^{−4}.

    36

  • The smallest node and weight in approximation of ω(λ):

    [Figure: weight of the leftmost node. Plotted: ω(λ) with nodes σ_j^2
    and weights |(b/β_1, u_j)|^2; the approximations with nodes
    (θ_1^{(k)})^2 and weights |(p_1^{(k)}, e_1)|^2; and the level δnoise^2.]

    37

  • The smallest node and weight in approximation of ω(λ):

    [Figure: weight of the leftmost node, for the larger noise level.
    Plotted: ω(λ) with nodes σ_j^2 and weights |(b/β_1, u_j)|^2; the
    approximations with nodes (θ_1^{(k)})^2 and weights
    |(p_1^{(k)}, e_1)|^2; and the level δnoise^2.]

    38

  • Outline

    1. Problem formulation

    2. Golub-Kahan iterative bidiagonalization, Lanczos tridiagonalization,
       and approximation of the Riemann-Stieltjes distribution function

    3. Propagation of the noise in the Golub-Kahan bidiagonalization

    4. Determination of the noise level

    5. Numerical illustration

    6. Summary and future work

    39

  • Image deblurring problem, image size 324 × 470 pixels,
    problem dimension n = 152280, the exact solution (left) and
    the noisy right-hand side (right), δnoise = 3 × 10^{−3}.

    [Figure: xexact (left) and bexact + bnoise (right).]

    40

  • Square roots of the weights |(p_1^{(k)}, e_1)|^2, k = 1, 2, . . . (top)
    and error history of LSQR solutions (bottom):

    [Figure, top: |(p_1^{(k)}, e_1)| versus k, with the level
    δnoise = ‖bnoise‖ / ‖bexact‖ = 0.003 marked. Bottom: the error history
    ‖x_k^{LSQR} − xexact‖, with the minimal error at k = 41.]

    41

  • The best LSQR reconstruction (left), x_41^{LSQR},
    and the corresponding componentwise error (right).
    GK without any reorthogonalization!

    [Figure: the LSQR reconstruction with minimal error, x_41^{LSQR}, and
    the error of the best LSQR reconstruction, |xexact − x_41^{LSQR}|.]

    42

  • Outline

    1. Problem formulation

    2. Golub-Kahan iterative bidiagonalization, Lanczos tridiagonalization,
       and the approximation of the Riemann-Stieltjes distribution function

    3. Propagation of the noise in the Golub-Kahan bidiagonalization

    4. Determination of the noise level

    5. Numerical illustration

    6. Summary and future work

    43

  • Message:

    Using GK, information about the noise can be obtained in a
    straightforward way.

    Future work:

    • Large scale problems;

    • Behavior in finite precision arithmetic

    (GK without reorthogonalization);

    • Regularization;

    • Denoising;

    • Colored noise.

    44

  • References

    • Golub, Kahan: Calculating the singular values and pseudoinverse of
      a matrix, SIAM J. Numer. Anal. Ser. B 2, 1965.

    • Hansen: Rank-deficient and discrete ill-posed problems, SIAM
      Monographs Math. Modeling Comp., 1998.

    • Hansen, Kilmer, Kjeldsen: Exploiting residual information in the
      parameter choice for discrete ill-posed problems, BIT, 2006.

    • Hnětynková, Strakoš: Lanczos tridiag. and core problem, LAA, 2007.

    • Meurant, Strakoš: The Lanczos and CG algorithms in finite precision
      arithmetic, Acta Numerica, 2006.

    • Paige, Strakoš: Core problem in linear algebraic systems, SIMAX,
      2006.

    • Rust: Truncating the SVD for ill-posed problems, Technical Report,
      1998.

    • Rust, O'Leary: Residual periodograms for choosing regularization
      parameters for ill-posed problems, Inverse Problems, 2008.

    • Hnětynková, Plešinger, Strakoš: The regularizing effect of the
      Golub-Kahan iterative bidiagonalization and revealing the noise
      level, BIT, 2009.

    • ...

    45

  • Main message:

    Whenever you see a blurred elephant which is a bit too noisy,

    the best thing is to apply the GK iterative bidiagonalization.

    Full version of the talk can be found at

    www.cs.cas.cz/strakos

    46

  • Thank you for your kind attention!

    47