term projects - university of california, berkeley › ... › ma128blectureweek11.pdf ·...

48
Term Projects I Solving linear equations with Gaussian Elimination I The QR Algorithm for Symmetric Eigenvalue Problem I The QR Algorithm for The SVD I Quasi-Newton Methods

Upload: others

Post on 06-Jul-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Term Projects - University of California, Berkeley › ... › MA128BLectureWeek11.pdf · 2018-04-15 · Project Requirements I Householder reduction to tridiagonal form I Implicit

Term Projects

I Solving linear equations with GaussianElimination

I The QR Algorithm for Symmetric EigenvalueProblem

I The QR Algorithm for The SVD

I Quasi-Newton Methods

Page 2: Term Projects - University of California, Berkeley › ... › MA128BLectureWeek11.pdf · 2018-04-15 · Project Requirements I Householder reduction to tridiagonal form I Implicit

Solving linear equations with Gaussian Elimination (I)

I Gaussian Elimination with Partial Pivoting (GEPP)

I Gaussian Elimination with Complete Pivoting (GECP)

I Iterative Refinement

I Comparison with matlab \ function.

Page 3: Term Projects - University of California, Berkeley › ... › MA128BLectureWeek11.pdf · 2018-04-15 · Project Requirements I Householder reduction to tridiagonal form I Implicit

Gaussian Elimination with partial pivoting

A =

a11 a12 · · · a1na21 a22 · · · a2n

......

. . ....

an1 an2 · · · ann

.

Let Ej be the j-th row of AI for s = 1, 2, · · · , n − 1:

I pivoting: choose largest entry in absolute value:

pivsdef= argmaxs≤j≤n|ajs |, Es ↔ Epivs (row interchange: permutation).

I eliminating xs from Es+1 through En:

ljs =ajsass, s + 1 ≤ j ≤ n,

ajk = ajk − ljs ask , s + 1 ≤ j , k ≤ n.

Page 4: Term Projects - University of California, Berkeley › ... › MA128BLectureWeek11.pdf · 2018-04-15 · Project Requirements I Householder reduction to tridiagonal form I Implicit

Gaussian Elimination with partial pivoting

A =

a11 a12 · · · a1na21 a22 · · · a2n

......

. . ....

an1 an2 · · · ann

.

Let Ej be the j-th row of AI for s = 1, 2, · · · , n − 1:

I pivoting: choose largest entry in absolute value:

pivsdef= argmaxs≤j≤n|ajs |, Es ↔ Epivs

(row interchange: permutation).

I eliminating xs from Es+1 through En:

ljs =ajsass, s + 1 ≤ j ≤ n,

ajk = ajk − ljs ask , s + 1 ≤ j , k ≤ n.

Page 5: Term Projects - University of California, Berkeley › ... › MA128BLectureWeek11.pdf · 2018-04-15 · Project Requirements I Householder reduction to tridiagonal form I Implicit

Gaussian Elimination with partial pivoting

A =

a11 a12 · · · a1na21 a22 · · · a2n

......

. . ....

an1 an2 · · · ann

.

Let Ej be the j-th row of AI for s = 1, 2, · · · , n − 1:

I pivoting: choose largest entry in absolute value:

pivsdef= argmaxs≤j≤n|ajs |, Es ↔ Epivs (row interchange: permutation).

I eliminating xs from Es+1 through En:

ljs =ajsass, s + 1 ≤ j ≤ n,

ajk = ajk − ljs ask , s + 1 ≤ j , k ≤ n.

Page 6: Term Projects - University of California, Berkeley › ... › MA128BLectureWeek11.pdf · 2018-04-15 · Project Requirements I Householder reduction to tridiagonal form I Implicit

Gaussian Elimination with partial pivoting

A =

a11 a12 · · · a1na21 a22 · · · a2n

......

. . ....

an1 an2 · · · ann

.

Let Ej be the j-th row of AI for s = 1, 2, · · · , n − 1:

I pivoting: choose largest entry in absolute value:

pivsdef= argmaxs≤j≤n|ajs |, Es ↔ Epivs (row interchange: permutation).

I eliminating xs from Es+1 through En:

ljs =ajsass, s + 1 ≤ j ≤ n,

ajk = ajk − ljs ask , s + 1 ≤ j , k ≤ n.

Page 7: Term Projects - University of California, Berkeley › ... › MA128BLectureWeek11.pdf · 2018-04-15 · Project Requirements I Householder reduction to tridiagonal form I Implicit

GEPP as LU factorization

Theorem: Let A = (aij) ∈ Rn×n be non-singular. Then GEPPcomputes an LU factorization with permutation matrix P such that

P · A = L · U =

·

.

P is the accumulation of row interchanges, L is unit lowertriangular, and U upper triangular.

Page 8: Term Projects - University of California, Berkeley › ... › MA128BLectureWeek11.pdf · 2018-04-15 · Project Requirements I Householder reduction to tridiagonal form I Implicit

Solving general linear equations with GEPP

A x = b, P · A = L · U

I interchanging components in b

P · (A x) = (P · b) , (L · U) x = (P · b) .

I solving for b with forward and backward substitution

x = (L · U)−1 (P · b)

=(U−1

(L−1 (P · b)

)).

Page 9: Term Projects - University of California, Berkeley › ... › MA128BLectureWeek11.pdf · 2018-04-15 · Project Requirements I Householder reduction to tridiagonal form I Implicit

Gaussian Elimination with complete pivoting

A =

a11 a12 · · · a1na21 a22 · · · a2n...

.

.

.. . .

.

.

.an1 an2 · · · ann

.

I for s = 1, 2, · · · , n − 1:I pivoting: choose largest entry in absolute value:

(ip, jp)def= argmaxs≤i,j≤n|aij |, Es ↔ Eip.

swap matrix columns/ variables s and jpI eliminating xs from Es+1 through En:

ajk = ajk −ajsass

askoverwrite==== ajk , s + 1 ≤ j ≤ n, s + 1 ≤ k ≤ n,

More numerically stable, too expensive. GECP computes LUfactorization with permutations Prow and Pcolumn,

Prow · A · Pcolumn = L · U =

·

.

Page 10: Term Projects - University of California, Berkeley › ... › MA128BLectureWeek11.pdf · 2018-04-15 · Project Requirements I Householder reduction to tridiagonal form I Implicit

Gaussian Elimination with complete pivoting

A =

a11 a12 · · · a1na21 a22 · · · a2n...

.

.

.. . .

.

.

.an1 an2 · · · ann

.

I for s = 1, 2, · · · , n − 1:I pivoting: choose largest entry in absolute value:

(ip, jp)def= argmaxs≤i,j≤n|aij |, Es ↔ Eip.

swap matrix columns/ variables s and jp

I eliminating xs from Es+1 through En:

ajk = ajk −ajsass

askoverwrite==== ajk , s + 1 ≤ j ≤ n, s + 1 ≤ k ≤ n,

More numerically stable, too expensive. GECP computes LUfactorization with permutations Prow and Pcolumn,

Prow · A · Pcolumn = L · U =

·

.

Page 11: Term Projects - University of California, Berkeley › ... › MA128BLectureWeek11.pdf · 2018-04-15 · Project Requirements I Householder reduction to tridiagonal form I Implicit

Gaussian Elimination with complete pivoting

A =

a11 a12 · · · a1na21 a22 · · · a2n...

.

.

.. . .

.

.

.an1 an2 · · · ann

.

I for s = 1, 2, · · · , n − 1:I pivoting: choose largest entry in absolute value:

(ip, jp)def= argmaxs≤i,j≤n|aij |, Es ↔ Eip.

swap matrix columns/ variables s and jpI eliminating xs from Es+1 through En:

ajk = ajk −ajsass

askoverwrite==== ajk , s + 1 ≤ j ≤ n, s + 1 ≤ k ≤ n,

More numerically stable, too expensive. GECP computes LUfactorization with permutations Prow and Pcolumn,

Prow · A · Pcolumn = L · U =

·

.

Page 12: Term Projects - University of California, Berkeley › ... › MA128BLectureWeek11.pdf · 2018-04-15 · Project Requirements I Householder reduction to tridiagonal form I Implicit

Error Bounds and Iterative Refinement

Assume that x is an approximation to the solution x of A x = b.

I Residual rdef= b− A x = A (x− x) . Thus small ‖x− x‖

implies small ‖r‖.T¯

hm: Assume x is an approximate solution with r = b−A x. Thenfor any natural norm,

‖x− x‖ ≤∥∥A−1∥∥ ‖r‖ ,

‖x− x‖‖x‖

≤ κ (A)‖r‖‖b‖

,

where κ (A)def= ‖A‖

∥∥A−1∥∥ is the condition number of A

Page 13: Term Projects - University of California, Berkeley › ... › MA128BLectureWeek11.pdf · 2018-04-15 · Project Requirements I Householder reduction to tridiagonal form I Implicit

Iterative Refinement (I)

I Let A x = b with non-singular A and non-zero b.

I Let F (·) be an in-exact equation solver, so F (b) isapproximate solution.

I Assume F (·) is accurate enough that there exists a ρ < 1 so

‖b− AF (b)‖‖b‖

≤ ρ for any b 6= 0.

For this project,

I F (·) will be: LU factorization without pivoting, LUfactorization with partial pivoting, and LU factorization withcomplete pivoting.

F (b) = U−1(L−1 b

).

Page 14: Term Projects - University of California, Berkeley › ... › MA128BLectureWeek11.pdf · 2018-04-15 · Project Requirements I Householder reduction to tridiagonal form I Implicit

Iterative Refinement (I)

I Let A x = b with non-singular A and non-zero b.

I Let F (·) be an in-exact equation solver, so F (b) isapproximate solution.

I Assume F (·) is accurate enough that there exists a ρ < 1 so

‖b− AF (b)‖‖b‖

≤ ρ for any b 6= 0.

For this project,

I F (·) will be: LU factorization without pivoting, LUfactorization with partial pivoting, and LU factorization withcomplete pivoting.

F (b) = U−1(L−1 b

).

Page 15: Term Projects - University of California, Berkeley › ... › MA128BLectureWeek11.pdf · 2018-04-15 · Project Requirements I Householder reduction to tridiagonal form I Implicit

Iterative Refinement (II)

Given a tolerance τ > 0 and x(0)

I Initialize r(0) = b− A x(0).I for k = 0, 1, · · ·

I Compute

∆x(k) = F(r(k)),

x(k+1) = x(k) + ∆x(k),

r(k+1) = r(k) − A∆x(k).

I If∥∥r(k+1)

∥∥ ≤ τ ‖b‖ stop.

Page 16: Term Projects - University of California, Berkeley › ... › MA128BLectureWeek11.pdf · 2018-04-15 · Project Requirements I Householder reduction to tridiagonal form I Implicit

Project Requirements

I Implement GE, GEPP, GECP, with and without iterativerefinement.

I Compare with matlab \ function in terms of accuracy(residual norm) and execution time.

Page 17: Term Projects - University of California, Berkeley › ... › MA128BLectureWeek11.pdf · 2018-04-15 · Project Requirements I Householder reduction to tridiagonal form I Implicit

The QR Algorithm for Symmetric Eigenvalue Problem

I Householder reduction to tridiagonal

I Implicit bulge Chasing scheme for tridiagonal QRAlgorithm.

I Recover eigenvectors.

Page 18: Term Projects - University of California, Berkeley › ... › MA128BLectureWeek11.pdf · 2018-04-15 · Project Requirements I Householder reduction to tridiagonal form I Implicit

Householder Reduction for symmetric A ∈ Rn×n

I Let vj ∈ Rn−j be the unit vector in the Householder

Reflection matrix Hj = I − 2 vj vTj so that

Hjdef=

(Ij

Hj

)∈ Rn×n, j = 1, · · · , n − 1.

I Perform n − 1 Householder reflections so that

Hn−1 · · · H1 AHT1 · · · HT

n−1 =

α1 β1β1 α2 β2

β2 α3 β3. . .

. . .. . .

βn−1 αn

def= T .

I Householder’s Method,

HT AH = T , or A = H T HT ,

where Hdef= HT

1 · · · HTn−1.

Page 19: Term Projects - University of California, Berkeley › ... › MA128BLectureWeek11.pdf · 2018-04-15 · Project Requirements I Householder reduction to tridiagonal form I Implicit

Computing H

For j = n − 2, n − 3, · · · , 2, 1, let

Hj+1 · · ·Hn−1def=

Ij1

H j+1

. Then

Hj ·Hj+1 · · ·Hn−1 =

(Ij

Hj

) Ij1

H j+1

=

(Ij

H j

),

where

H j =(I − 2 vj v

Tj

)( 1

H j+1

)=

(1

H j+1

)− (2 vj)

(vTj

(1

H j+1

)).

Page 20: Term Projects - University of California, Berkeley › ... › MA128BLectureWeek11.pdf · 2018-04-15 · Project Requirements I Householder reduction to tridiagonal form I Implicit
Page 21: Term Projects - University of California, Berkeley › ... › MA128BLectureWeek11.pdf · 2018-04-15 · Project Requirements I Householder reduction to tridiagonal form I Implicit

Project Requirements

I Householder reduction to tridiagonal form

I Implicit Bulge Chasing tridiagonal QR Algorithm for alleigenvalues.

I Accumulate all Householder reflections and Givens rotationsto recover all eigenvectors.

I Total cost O(n3) flops.

I Compare with matlab eig function in terms of accuracy(residual and orthogonality) and execution time.

Algorithm Details

http://people.inf.ethz.ch/arbenz/ewp/Lnotes/chapter4.pdf

Page 22: Term Projects - University of California, Berkeley › ... › MA128BLectureWeek11.pdf · 2018-04-15 · Project Requirements I Householder reduction to tridiagonal form I Implicit

The QR Algorithm for SVD

I Householder reduction to bidiagonal form

I Implicit bulge chasing scheme for bidiagonal SVD.

I Recover singular vectors.

Page 23: Term Projects - University of California, Berkeley › ... › MA128BLectureWeek11.pdf · 2018-04-15 · Project Requirements I Householder reduction to tridiagonal form I Implicit

Householder Bidiagonal Reduction for A ∈ Rn×n (I)

Partition matrix A =(

a1 A1

)∈ Rn×n, A1 ∈ Rn×(n−1).

I Let u1 ∈ Rn be the unit vector in the Householder Reflection

matrix G1def= G1 = I − 2u1 uT1 so that

G1 a1 =

(± ‖a1‖2

0

)def=

(α1

0

). and

G1 A =(

G1 a1 G1 A1

)def=

(α1 bT10 A1

).

I Let v1 ∈ Rn−1 be the unit vector in the HouseholderReflection matrix H1 = I − 2 v1 vT1 so that,

H1 b1 =

(± ‖b1‖2

0

)def=

(β10

). With H1 =

(1

H1

),

G1 AH1 =

(α1 bT1 H1

0 A1 H1

)def=

(α1 β1 0T

0 a2 A2

).

Page 24: Term Projects - University of California, Berkeley › ... › MA128BLectureWeek11.pdf · 2018-04-15 · Project Requirements I Householder reduction to tridiagonal form I Implicit

G1 AH1 =

(α1 β1 0T

0 a2 A2

), A2 ∈ R(n−1)×(n−2).

I Let u2 ∈ Rn−1 be the unit vector in G2 = I − 2u2 uT2 so that

G2 a2 =

(± ‖a2‖2

0

)def=

(α2

0

). With G2 =

(1

G2

),

G2 G1 AH1 =

(α1 β1 0T

0 G2 a2 G2 A2

)def=

α1 β1 0T

0 α2 bT20 0 A2

.

I Let v2 ∈ Rn−2 be the unit vector in H2 = I − 2 v2 vT2 so that,

H2 b2 =

(± ‖b2‖2

0

)def=

(β20

). With H2 =

(I2

H2

),

G2 G1 AH1H2 =

α1 β1 0T

0 α2 bT2 H2

0 0 A2 H2

def=

α1 β1 0 0T

0 α2 β2 0T

0 0 a3 A3

.

Page 25: Term Projects - University of California, Berkeley › ... › MA128BLectureWeek11.pdf · 2018-04-15 · Project Requirements I Householder reduction to tridiagonal form I Implicit

Householder Bidiagonal Reduction for A ∈ Rn×n (III)

I After n − 1 pairs of Householder Reflections,

Gn−1 · · ·G2 G1 AH1H2 · · ·Hn−1 =

α1 β1

α2 β2

α3. . .. . . βn−1

αn

def= B.

B is an upper bidiagonal matrix.

I Next step is QR algorithm for bidiagonal SVD.

Page 26: Term Projects - University of California, Berkeley › ... › MA128BLectureWeek11.pdf · 2018-04-15 · Project Requirements I Householder reduction to tridiagonal form I Implicit

Input matrix B =

α1 β1α2 β2

α3

. . .

. . . βn−1αn βn

, tolerance τ

Algorithm 1 BidiSVD (B, τ)

while not yet converged doif n = 1 then return SVD of Bfor j = 1, · · · , n − 1 doif |βj | ≤ τ then

return BidiSVD (B (1 : j , 1 : j) , τ) andBidiSVD (B (j + 1 : n, j + 1 : n + 1) , τ)

end ifif |αj | ≤ τ then

return BidiSVD (B (1 : j − 1, 1 : j) , τ) andBidiSVD (B (j : n, j + 1 : n + 1) , τ)

end ifend forCompute shift σ. Bulge-chasing QR with σ.

end while

Page 27: Term Projects - University of California, Berkeley › ... › MA128BLectureWeek11.pdf · 2018-04-15 · Project Requirements I Householder reduction to tridiagonal form I Implicit

All forms Bidiagonal

B =

α1 β1

α2 β2

α3. . .. . . βn−1

αn βn

, B =

β1α2 β2

α3. . .. . . βn−1

αn

B =

α1 β1

α2 β2

α3. . .. . . βn−1

αn

, B =

β1α2 β2

α3. . .. . . βn−1

αn βn

Next: QR for SVD of B =

β1α2 β2

α3. . .. . . βn−1

αn βn

.

Page 28: Term Projects - University of California, Berkeley › ... › MA128BLectureWeek11.pdf · 2018-04-15 · Project Requirements I Householder reduction to tridiagonal form I Implicit

All forms Bidiagonal

B =

α1 β1

α2 β2

α3. . .. . . βn−1

αn βn

, B =

β1α2 β2

α3. . .. . . βn−1

αn

B =

α1 β1

α2 β2

α3. . .. . . βn−1

αn

, B =

β1α2 β2

α3. . .. . . βn−1

αn βn

Next: QR for SVD of B =

β1α2 β2

α3. . .. . . βn−1

αn βn

.

Page 29: Term Projects - University of California, Berkeley › ... › MA128BLectureWeek11.pdf · 2018-04-15 · Project Requirements I Householder reduction to tridiagonal form I Implicit

Bidiagonal SVDI B BT is symmetric tridiagonal.

B BT =

β21 α2β1

α2β1 α22 + β2

2 α3β2

α3β2 α23 + β2

3

. . .. . .

. . . αnβn−1

αnβn−1 α2n + β2

n

.

I Bottom 2× 2 matrix(α2n−1 + β2

n−1 αnβn−1

αnβn−1 α2n + β2

n

)= S ST , with S

def=

(αn−1 βn−1

αn βn

).

Shift σ is the singular value of S closer to√α2n + β2n .

I Bulge-chasing QR based on QR factorization of

B BT − σ2 I =

β21 − σ2 α2β1

α2β1 α22 + β2

2 − σ2 . . .. . .

. . . αnβn−1

αnβn−1 α2n + β2

n − σ2

.

Page 30: Term Projects - University of California, Berkeley › ... › MA128BLectureWeek11.pdf · 2018-04-15 · Project Requirements I Householder reduction to tridiagonal form I Implicit

Bulge-chasing QR for Bidiagonal SVD (I)

I Givens rotation on first column of B BT − σ2 I :

G1

(B BT − σ2 I

)def=

(c1 s1−s1 c1

In−2

) (β21 − σ2

α2β10

)=

( ∗00

).

G1 is the first Givens rotation towards QR factorization ofB BT − σ2 I .

I Creating bulge:

(d1 f1d2 f2

)def=

(c1 s1−s1 c1

) (β1 0α2 β2

).

G1 B =

d1 f1d2 f2

α3. . .. . . βn−1

αn βn

=

∗ +∗ ∗

∗. . .. . . ∗

∗ ∗

.

I Next: Chase Bulge (+) down along the diagonals.

Page 31: Term Projects - University of California, Berkeley › ... › MA128BLectureWeek11.pdf · 2018-04-15 · Project Requirements I Householder reduction to tridiagonal form I Implicit

Bulge-chasing QR for Bidiagonal SVD (II)

G1 B =

d1 f1d2 f2

α3. . .. . . βn−1

αn βn

=

∗ +∗ ∗

∗. . .. . . ∗

∗ ∗

.

I Givens rotation on first row, H1def=

(c1 s1−s1 c1

In−2

)(d1 f1 0T

)HT1 =

( √d21 + f 21 0 0T

)def=(β1 0T

).

I Compute(d2 f2

α3

) (c1 s1−s1 c1

)Tdef=

(d2 f 2d3 f 3

).

Page 32: Term Projects - University of California, Berkeley › ... › MA128BLectureWeek11.pdf · 2018-04-15 · Project Requirements I Householder reduction to tridiagonal form I Implicit

Bulge-chasing QR for Bidiagonal SVD (III)

G1 B H1 =

β1d2 f 2d3 f 3 β3

α4. . .. . . βn−1

αn βn

=

∗∗ ∗+ ∗ ∗

∗. . .. . . ∗

∗ ∗

.

I Givens rotation on first column, G2def=

1c2 s2−s2 c2

In−3

(

c2 s2−s2 c2

) (d2 f 2 0

d3 f 3 β3

)def=

(α2 d2 f20 d3 f3

).

Page 33: Term Projects - University of California, Berkeley › ... › MA128BLectureWeek11.pdf · 2018-04-15 · Project Requirements I Householder reduction to tridiagonal form I Implicit

Bulge-chasing QR for Bidiagonal SVD (IV)

G2 G1 B H1 =

β1α2 d2 f2

d3 f3

α4. . .. . . βn−1

αn βn

=

∗∗ ∗ +∗ ∗

∗. . .. . . ∗

∗ ∗

.

I New Givens rotation:(d2 f2d3 f30 α4

)(c2 s2−s2 c2

)T

=

β2 0d3 f 3d4 f 4

.

I Repeat procedure until bulge disappears at the lower rightcorner.

Page 34: Term Projects - University of California, Berkeley › ... › MA128BLectureWeek11.pdf · 2018-04-15 · Project Requirements I Householder reduction to tridiagonal form I Implicit

Project Requirements

I Householder reduction to bidiagonal form

I Implicit Bulge Chasing bidiagonal QR Algorithm for allsingular values.

I Accumulate all Householder reflections and Givens rotationsto recover all singular vectors.

I Total cost O(n3) flops.

I Compare with matlab svd function in terms of accuracy(residual and orthogonality) and execution time.

Page 35: Term Projects - University of California, Berkeley › ... › MA128BLectureWeek11.pdf · 2018-04-15 · Project Requirements I Householder reduction to tridiagonal form I Implicit

Quasi-Newton Methods (BFGS) for minx g (x)I BFGS Method:

x(k+1) = x(k) − αB−1k ∇g(x(k)

), where Bk ∈ Rn×n.

I Desired Property I: ”Mimics” B(x(k)

)in some sense.

I Desired Property II: ”Easy” to compute B−1k+1 from B−1k .I Ensure symmetry in BkI Step size selection as in Steepest Descent.

Page 36: Term Projects - University of California, Berkeley › ... › MA128BLectureWeek11.pdf · 2018-04-15 · Project Requirements I Householder reduction to tridiagonal form I Implicit

BFGS Method: Derivation

I Assume for some step k ≥ 0, we have available x(k) ∈ Rn andBk ∈ Rn×n. By BFGS Method:

x(k+1) = x(k) − B−1k ∇g(x(k)

).

I Secant equation

∇g(x(k+1)

)−∇g

(x(k)

)≈ H

(x(k)

) (x(k+1) − x(k)

).

I BFGS ideas:I Approximate secant equation: Bk+1 sk+1 = yk+1, where

yk+1def= ∇g

(x(k+1)

)−∇g

(x(k)), sk+1

def= x(k+1) − x(k).

I Choose a special, symmetric Bk+1 that does not differ muchfrom Bk

Bk+1 = Bk +yk+1yTk+1

yTk+1 sk+1−Bk sk+1 sTk+1 BksTk+1 Bk sk+1

Page 37: Term Projects - University of California, Berkeley › ... › MA128BLectureWeek11.pdf · 2018-04-15 · Project Requirements I Householder reduction to tridiagonal form I Implicit

BFGS MethodI Initialization: Given x(0) ∈ Rn.I Choose symmetric B0 ∈ Rn×n, typically SPD.I For k = 0, 1, · · · ,

x(k+1) = x(k) − αB−1k ∇g(x(k)

),

Bk+1 = Bk +yk+1y

Tk+1

yTk+1 sk+1−Bk sk+1 s

Tk+1 Bk

sTk+1 Bk sk+1, with

yk+1 = ∇g(x(k+1)

)−∇g

(x(k)

), sk+1 = x(k+1) − x(k).

Practical details:

I Bk remains SPD for convex problems.I Cholesky factorization of Bk can help to compute Cholesky

factorization of Bk+1? (updating Cholesky factorization)I BFGS inverse update:

B−1k+1 =

(I −

sk+1 yTk+1

sTk+1 yk+1

)B−1k

(I −

sk+1 yTk+1

sTk+1 yk+1

)T

+sk+1 s

Tk+1

sTk+1 yk+1.

Page 38: Term Projects - University of California, Berkeley › ... › MA128BLectureWeek11.pdf · 2018-04-15 · Project Requirements I Householder reduction to tridiagonal form I Implicit

BFGS MethodI Initialization: Given x(0) ∈ Rn.I Choose symmetric B0 ∈ Rn×n, typically SPD.I For k = 0, 1, · · · ,

x(k+1) = x(k) − αB−1k ∇g(x(k)

),

Bk+1 = Bk +yk+1y

Tk+1

yTk+1 sk+1−Bk sk+1 s

Tk+1 Bk

sTk+1 Bk sk+1, with

yk+1 = ∇g(x(k+1)

)−∇g

(x(k)

), sk+1 = x(k+1) − x(k).

Practical details:

I Bk remains SPD for convex problems.I Cholesky factorization of Bk can help to compute Cholesky

factorization of Bk+1? (updating Cholesky factorization)I BFGS inverse update:

B−1k+1 =

(I −

sk+1 yTk+1

sTk+1 yk+1

)B−1k

(I −

sk+1 yTk+1

sTk+1 yk+1

)T

+sk+1 s

Tk+1

sTk+1 yk+1.

Page 39: Term Projects - University of California, Berkeley › ... › MA128BLectureWeek11.pdf · 2018-04-15 · Project Requirements I Householder reduction to tridiagonal form I Implicit

Limited Memory BFGS (L-BFGS) Method (I)

B−1k+1 maybe too large to fit in memory

I L-BFGS: Mimics BFGS, but with linear memory.

BFGS inverse update, from B−1k to B−1k+1:

B−1k+1 =

(I −

sk+1 yTk+1

sTk+1 yk+1

)B−1k

(I −

sk+1 yTk+1

sTk+1 yk+1

)T

+sk+1 s

Tk+1

sTk+1 yk+1.

Big Idea: Go back only m + 1 steps in inverse update

Page 40: Term Projects - University of California, Berkeley › ... › MA128BLectureWeek11.pdf · 2018-04-15 · Project Requirements I Householder reduction to tridiagonal form I Implicit

Limited Memory BFGS (L-BFGS) Method (II)

B−1k+1 =sk+1 s

Tk+1

sTk+1 yk+1+

(I −

sk+1 yTk+1

sTk+1 yk+1

)· · ·

(I −

sk+1−m yTk+1−msTk+1−m yk+1−m

)

×B−1k−m

(I −

yk+1−m sTk+1−msTk+1−m yk+1−m

)· · ·

(I −

yk+1 sTk+1

sTk+1 yk+1

)

+m−1∑j=0

(I −

sk+1 yTk+1

sTk+1 yk+1

)· · ·

(I −

sk+1−j yTk+1−j

sTk+1−j yk+1−j

)

×

(sk−j s

Tk−j

sTk−j yk−j

) (I −

yk+1−j sTk+1−j

sTk+1−j yk+1−j

)· · ·

(I −

yk+1 sTk+1

sTk+1 yk+1

).

Big Idea: Go back only m + 1 steps in inverse update: Set B−1k−m = B−10

Page 41: Term Projects - University of California, Berkeley › ... › MA128BLectureWeek11.pdf · 2018-04-15 · Project Requirements I Householder reduction to tridiagonal form I Implicit

Computing B−1k+1 h for any vector h (I)

B−1k+1 h = αk+1 sk+1 +

(I −

sk+1 yTk+1

sTk+1 yk+1

)· · ·

(I −

sk+1−m yTk+1−msTk+1−m yk+1−m

)B−10 qk+1−m

+m−1∑j=0

(I −

sk+1 yTk+1

sTk+1 yk+1

)· · ·

(I −

sk+1−j yTk+1−j

sTk+1−j yk+1−j

)

×

(sk−j s

Tk−j

sTk−j yk−j

)qk+1−j ,

where αk+1 =sTk+1 h

sTk+1 yk+1, qk+1 = h− αk+1 yk+1,

qtdef=

(I − yt sTt

sTt yt

)· · ·

(I −

yk+1 sTk+1

sTk+1 yk+1

)h

= qt+1 − αt yt , αt =sTt qt+1

sTt yt, t = k , · · · , k + 1−m.

Page 42: Term Projects - University of California, Berkeley › ... › MA128BLectureWeek11.pdf · 2018-04-15 · Project Requirements I Householder reduction to tridiagonal form I Implicit

Computing B−1k+1 h for any vector h (II)

Let zk+1−m = B−10 qk+1−m. Then

B−1k+1 h = αk+1 sk+1 +

(I −

sk+1 yTk+1

sTk+1 yk+1

)· · ·

(I −

sk+1−m yTk+1−msTk+1−m yk+1−m

)zk+1−m

+m−1∑j=0

(I −

sk+1 yTk+1

sTk+1 yk+1

)· · ·

(I −

sk+1−j yTk+1−j

sTk+1−j yk+1−j

)× (αk−j sk−j)

= αk+1 sk+1 +

(I −

sk+1 yTk+1

sTk+1 yk+1

)· · ·

(I −

sk+2−m yTk+2−msTk+2−m yk+2−m

)zk+2−m

+m−2∑j=0

(I −

sk+1 yTk+1

sTk+1 yk+1

)· · ·

(I −

sk+1−j yTk+1−j

sTk+1−j yk+1−j

)× (αk−j sk−j) ,

where zk+2−m = zk+1−m + (αk+1−m − βk+1−m) sk+1−m,

βk+1−m =yTk+1−m zk+1−m

sTk+1−m yk+1−m

Page 43: Term Projects - University of California, Berkeley › ... › MA128BLectureWeek11.pdf · 2018-04-15 · Project Requirements I Householder reduction to tridiagonal form I Implicit

Computing B−1k+1 h for any vector h (II)

Let zk+1−m = B−10 qk+1−m. Then

B−1k+1 h = αk+1 sk+1 +

(I −

sk+1 yTk+1

sTk+1 yk+1

)· · ·

(I −

sk+1−m yTk+1−msTk+1−m yk+1−m

)zk+1−m

+m−1∑j=0

(I −

sk+1 yTk+1

sTk+1 yk+1

)· · ·

(I −

sk+1−j yTk+1−j

sTk+1−j yk+1−j

)× (αk−j sk−j)

= αk+1 sk+1 +

(I −

sk+1 yTk+1

sTk+1 yk+1

)· · ·

(I −

sk+2−m yTk+2−msTk+2−m yk+2−m

)zk+2−m

+m−2∑j=0

(I −

sk+1 yTk+1

sTk+1 yk+1

)· · ·

(I −

sk+1−j yTk+1−j

sTk+1−j yk+1−j

)× (αk−j sk−j) ,

where zk+2−m = zk+1−m + (αk+1−m − βk+1−m) sk+1−m,

βk+1−m =yTk+1−m zk+1−m

sTk+1−m yk+1−m

Page 44: Term Projects - University of California, Berkeley › ... › MA128BLectureWeek11.pdf · 2018-04-15 · Project Requirements I Householder reduction to tridiagonal form I Implicit

Algorithm 2 Computing B−1k+1 h for any vector h

Set q = h.for t = k + 1, k , · · · , k + 1−m do

Compute αt = sTt qsTt yt

, q = q− αt yt .

end forCompute z = B−10 qfor t = k + 1−m, · · · , k + 1 do

Compute β = yTt zsTt yt

, z = z + (αt − β) st .

end forReturn z

Page 45: Term Projects - University of California, Berkeley › ... › MA128BLectureWeek11.pdf · 2018-04-15 · Project Requirements I Householder reduction to tridiagonal form I Implicit

Step-size SelectionI Ensure reduction in g (x): Algorithm 3 will halt with anα3 > 0 so that g

(x(k) + α3 d(k)

)< g

(x(k)

).

Algorithm 3 g (x) Reduction

Set α3 = 1.while g

(x(k) + α3 d(k)

)≥ g

(x(k)

)do

Set α3 = α32 .

end while

I Ensure sufficient reduction in g (x) with α(k):

Algorithm 4 g (x) Sufficient Reduction

Set α2 = α3/2. Compute coefficients h0, h1, h2so P (α) = h0 + h1 α + h2 α (α− α2) interpolatesg(x(k) + α d(k)

)at α = 0, α2, α3. Set α0 = 1

2 (α2 − h1/h2)

α(k) = argminα∈[α0,α2,α3]g(x(k) + α d(k)

).

Page 46: Term Projects - University of California, Berkeley › ... › MA128BLectureWeek11.pdf · 2018-04-15 · Project Requirements I Householder reduction to tridiagonal form I Implicit

Step-size SelectionI Ensure reduction in g (x): Algorithm 3 will halt with anα3 > 0 so that g

(x(k) + α3 d(k)

)< g

(x(k)

).

Algorithm 3 g (x) Reduction

Set α3 = 1.while g

(x(k) + α3 d(k)

)≥ g

(x(k)

)do

Set α3 = α32 .

end while

I Ensure sufficient reduction in g (x) with α(k):

Algorithm 4 g (x) Sufficient Reduction

Set α2 = α3/2. Compute coefficients h0, h1, h2so P (α) = h0 + h1 α + h2 α (α− α2) interpolatesg(x(k) + α d(k)

)at α = 0, α2, α3. Set α0 = 1

2 (α2 − h1/h2)

α(k) = argminα∈[α0,α2,α3]g(x(k) + α d(k)

).

Page 47: Term Projects - University of California, Berkeley › ... › MA128BLectureWeek11.pdf · 2018-04-15 · Project Requirements I Householder reduction to tridiagonal form I Implicit

BFGS Method for minx g (x)

I Initialization: Given x(0) ∈ Rn.

I Choose symmetric B0 ∈ Rn×n, typically SPD.I For k = 0, 1, · · · ,

I Compute x(d) = −B−1k ∇g

(x(k)).

I Compute step size α.I

x(k+1) = x(k) − αB−1k ∇g

(x(k)),

Bk+1 = Bk +yk+1yTk+1

yTk+1 sk+1−Bk sk+1 sTk+1 BksTk+1 Bk sk+1

, with

yk+1 = ∇g(x(k+1)

)−∇g

(x(k)), sk+1 = x(k+1) − x(k).

Page 48: Term Projects - University of California, Berkeley › ... › MA128BLectureWeek11.pdf · 2018-04-15 · Project Requirements I Householder reduction to tridiagonal form I Implicit

Project Requirements

I Implement BFGS and L-BFGS

I Run BFGS and L-BFGS with logistic objective function