Lecture 8: Fast Linear Solvers (Part 5)

Page 1

Lecture 8: Fast Linear Solvers (Part 5)


Page 2

Conjugate Gradient (CG) Method

• Solve Ax = b, where A is an n × n symmetric positive definite matrix.

• Define the quadratic function

  φ(x) = (1/2) x^T A x − x^T b.

If x minimizes φ(x), then x is the solution to Ax = b.

β€’ π›»πœ™ 𝒙 = πœ•πœ™

πœ•π‘₯1, … ,

πœ•πœ™

πœ•π‘₯𝑛= 𝐴𝒙 βˆ’ 𝒃

• The iteration takes the form x^(k+1) = x^(k) + α_k v^(k), where v^(k) is the search direction and α_k is the step size.

• Define r^(k) = b − Ax^(k) to be the residual vector.

Page 3

• Let x and v ≠ 0 be fixed vectors and let α be a real variable. Define

  h(α) = φ(x + αv) = φ(x) + α <v, Ax − b> + (α²/2) <v, Av>.

h(α) has a minimum when h′(α) = 0. This occurs when

  α = v^T(b − Ax) / v^T A v.

So

  h(α) = φ(x) − (v^T(b − Ax))² / (2 v^T A v).

Suppose x* is a vector that minimizes φ(x). Then φ(x* + αv) ≥ φ(x*) for every v and α. If v^T(b − Ax*) ≠ 0 for some v, the optimal α above would give φ(x* + αv) < φ(x*), a contradiction; hence v^T(b − Ax*) = 0 for all v. Therefore b − Ax* = 0.


Page 4

• For any v ≠ 0, choosing α = v^T(b − Ax) / v^T A v gives φ(x + αv) < φ(x) unless v^T(b − Ax) = 0.

• How to choose the search direction v?
  – Method of steepest descent: v = −∇φ(x)
• Remark: slow convergence for linear systems, especially when A is ill-conditioned (see the next page).

Algorithm. Let x^(0) be an initial guess.
for k = 1, 2, …
  v^(k) = b − Ax^(k−1)
  α_k = <v^(k), b − Ax^(k−1)> / <v^(k), Av^(k)>
  x^(k) = x^(k−1) + α_k v^(k)
end


Page 5

Steepest Descent When λ_max / λ_min Is Large

• Consider solving Ax = b with

  A = [ λ_1  0  ]
      [ 0    λ_2 ],   b = (λ_1, λ_2)^T,

and the start vector x^(0) = (−9, −1)^T (so the exact solution is x* = (1, 1)^T). Iterate until ||Ax^(k) − b||_2 < 10^−4.

– With λ_1 = 1, λ_2 = 2, it takes about 10 iterations.
– With λ_1 = 1, λ_2 = 10, it takes about 40 iterations.


Page 6

• A second approach to choosing the search direction v:
  – A-orthogonal approach: use a set of nonzero direction vectors {v^(1), …, v^(n)} that satisfy <v^(i), Av^(j)> = 0 if i ≠ j. The set {v^(1), …, v^(n)} is called A-orthogonal.

• Theorem. Let {v^(1), …, v^(n)} be an A-orthogonal set of nonzero vectors associated with the symmetric, positive definite matrix A, and let x^(0) be arbitrary. Define

  α_k = <v^(k), b − Ax^(k−1)> / <v^(k), Av^(k)>  and  x^(k) = x^(k−1) + α_k v^(k)  for k = 1, 2, …, n.

Then Ax^(n) = b when arithmetic is exact.


Page 7

Conjugate Gradient Method

• The conjugate gradient method is due to Hestenes and Stiefel.

• Main idea: construct {v^(1), v^(2), …} during the iteration so that the residual vectors r^(k) are mutually orthogonal.


Page 8

Algorithm of CG Method

Let x^(0) be an initial guess. Set r^(0) = b − Ax^(0); v^(1) = r^(0).
for k = 1, 2, …
  α_k = <r^(k−1), r^(k−1)> / <v^(k), Av^(k)>
  x^(k) = x^(k−1) + α_k v^(k)
  r^(k) = r^(k−1) − α_k Av^(k)  // construct residual
  ρ_k = <r^(k), r^(k)>; if ρ_k < ε exit  // convergence test
  s_k = <r^(k), r^(k)> / <r^(k−1), r^(k−1)>
  v^(k+1) = r^(k) + s_k v^(k)  // construct new search direction
end


Page 9

Remarks

• The constructed directions {v^(1), v^(2), …} are pairwise A-orthogonal.

• Each iteration requires one matrix-vector multiplication, two dot products, and three scalar-vector multiplications (the updates of x, r, and v).

• Due to round-off errors, in practice more than n iterations may be needed to reach the solution.

• If the matrix A is ill-conditioned, the CG method is sensitive to round-off errors (in this respect CG is not as robust as Gaussian elimination with pivoting).

• The main use of CG is as an iterative method applied to a better-conditioned, i.e., preconditioned, system.


Page 10

CG as Krylov Subspace Method

Theorem. 𝒙(π‘˜) of the CG method minimizes the function πœ™ 𝒙 with respect to the subspace

Ξšπ‘˜ 𝐴, 𝒓 0 =

π‘ π‘π‘Žπ‘›{𝒓 0 , 𝐴𝒓 0 , 𝐴2𝒓 0 , … , π΄π‘˜βˆ’1𝒓 0 }.

I.e.

πœ™ 𝒙(π‘˜) = π‘šπ‘–π‘›π‘π‘–πœ™ 𝒙(0) + 𝑐𝑖𝐴

𝑖𝒓 0π‘˜βˆ’1𝑖=0

The subspace Ξšπ‘˜ 𝐴, 𝒓 0 is called Krylov subspace.


Page 11

Error Estimate

• Define an energy norm ||·||_A of a vector u with respect to the matrix A: ||u||_A = (u^T A u)^(1/2).

• Define the error e^(k) = x^(k) − x*, where x* is the exact solution.

• Theorem.

  ||x^(k) − x*||_A ≤ 2 ( (√κ(A) − 1) / (√κ(A) + 1) )^k ||x^(0) − x*||_A,

where κ(A) = cond(A) = λ_max(A) / λ_min(A) ≥ 1.

Remark: convergence is fast if the matrix A is well-conditioned.
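To make the rate concrete (a back-of-the-envelope estimate, not from the slides): since (√κ − 1)/(√κ + 1) ≈ 1 − 2/√κ for large κ, reducing the A-norm error by a factor ε takes roughly

  k ≈ (√κ / 2) ln(2/ε)

iterations. For example, κ = 10^4 and ε = 10^−6 give k ≈ 50 · ln(2·10^6) ≈ 730.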


Page 12

Preconditioning

Let the symmetric positive definite matrix M be a preconditioner for A, with Cholesky factorization M = LL^T, chosen so that M^(−1)A is better conditioned than A. The preconditioned system of equations is

  M^(−1)Ax = M^(−1)b, or
  L^(−T)L^(−1)Ax = L^(−T)L^(−1)b, where L^(−T) = (L^T)^(−1).

Multiply by L^T to obtain

  L^(−1)A L^(−T) L^T x = L^(−1)b.

Define Ã = L^(−1)AL^(−T), x̃ = L^T x, b̃ = L^(−1)b.

Now apply CG to Ã x̃ = b̃.
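A small NumPy check of this transformation, using the Jacobi choice M = diag(A) on a deliberately badly scaled SPD test matrix (our own illustration, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
Q = rng.standard_normal((n, n))
B = Q @ Q.T + n * np.eye(n)           # a well-conditioned SPD core
d = 10.0 ** rng.uniform(0.0, 3.0, n)  # badly scaled rows and columns
A = d[:, None] * B * d[None, :]       # SPD, condition inflated by the scaling

M = np.diag(np.diag(A))               # Jacobi preconditioner: M = diag(A)
L = np.linalg.cholesky(M)             # M = L L^T
Linv = np.linalg.inv(L)
A_tilde = Linv @ A @ Linv.T           # A~ = L^{-1} A L^{-T}
print(np.linalg.cond(A))              # large, because of the scaling
print(np.linalg.cond(A_tilde))        # much smaller: the scaling is undone
```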


Page 13

Preconditioned CG Method

β€’ Define 𝒛(π‘˜) = π‘€βˆ’1𝒓(π‘˜) to be the preconditioned residual.

Let x^(0) be an initial guess. Set r^(0) = b − Ax^(0); solve Mz^(0) = r^(0) for z^(0); set v^(1) = z^(0).
for k = 1, 2, …
  α_k = <z^(k−1), r^(k−1)> / <v^(k), Av^(k)>
  x^(k) = x^(k−1) + α_k v^(k)
  r^(k) = r^(k−1) − α_k Av^(k)
  solve Mz^(k) = r^(k) for z^(k)
  ρ_k = <r^(k), r^(k)>; if ρ_k < ε exit  // convergence test
  s_k = <z^(k), r^(k)> / <z^(k−1), r^(k−1)>
  v^(k+1) = z^(k) + s_k v^(k)  // note z^(k), not r^(k), in the preconditioned update
end

Page 14

Incomplete Cholesky Factorization

• Assume A is symmetric positive definite and sparse.
• Factor A = LL^T + R with R ≠ 0, where L has a sparsity structure similar to that of A.


for π‘˜ = 1, … , 𝑛 π‘™π‘˜π‘˜ = π‘Žπ‘˜π‘˜ for 𝑖 = π‘˜ + 1, … , 𝑛

π‘™π‘–π‘˜ =π‘Žπ‘–π‘˜

π‘™π‘˜π‘˜

for 𝑗 = π‘˜ + 1, … , 𝑛 if π‘Žπ‘–π‘— = 0 then

𝑙𝑖𝑗 = 0

else π‘Žπ‘–π‘— = π‘Žπ‘–π‘— βˆ’ π‘™π‘–π‘˜π‘™π‘˜π‘—

endif endfor endfor endfor
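A dense-array NumPy transcription of this pseudocode (a didactic sketch: a practical version would use sparse storage, and it assumes the pivots a_kk stay positive):

```python
import numpy as np

def incomplete_cholesky(A):
    """IC(0): A ~= L L^T, with L restricted to the sparsity pattern of A."""
    a = A.astype(float).copy()
    n = a.shape[0]
    L = np.zeros_like(a)                  # l_ij = 0 outside the pattern
    for k in range(n):
        L[k, k] = np.sqrt(a[k, k])        # l_kk = sqrt(a_kk)
        for i in range(k + 1, n):
            L[i, k] = a[i, k] / L[k, k]   # column k of L (zero where a_ik = 0)
        for i in range(k + 1, n):
            for j in range(k + 1, n):
                if a[i, j] != 0.0:        # update only inside A's pattern
                    a[i, j] -= L[i, k] * L[j, k]
    return L
```

For a dense matrix (no zero entries) this reduces to exact Cholesky, so np.allclose(L @ L.T, A) then holds.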

Page 15

Jacobi Preconditioning

• In diagonal or Jacobi preconditioning, M = diag(A).
• Jacobi preconditioning is cheap if it works, i.e., solving Mz^(k) = r^(k) for z^(k) costs almost nothing.

References
• C. T. Kelley, Iterative Methods for Linear and Nonlinear Equations, SIAM, 1995.
• T. F. Chan and H. A. van der Vorst, Approximate and incomplete factorizations, in D. E. Keyes, A. Sameh, and V. Venkatakrishnan, eds., Parallel Numerical Algorithms, pp. 167-202, Kluwer, 1997.
• M. J. Grote and T. Huckle, Parallel preconditioning with sparse approximate inverses, SIAM J. Sci. Comput. 18:838-853, 1997.
• Y. Saad, Highly parallel preconditioners for general sparse matrices, in G. Golub, A. Greenbaum, and M. Luskin, eds., Recent Advances in Iterative Methods, pp. 165-199, Springer-Verlag, 1994.
• H. A. van der Vorst, High performance preconditioning, SIAM J. Sci. Stat. Comput. 10:1174-1185, 1989.

Page 16

Row-wise Block-Striped Decomposition of a Symmetrically Banded Matrix

[Figure: row decomposition of a banded matrix among tasks]

Page 17

Parallel CG Algorithm

• Assume a row-wise block-striped decomposition of the matrix A, and partition all vectors uniformly among tasks.

Let x^(0) be an initial guess. Set r^(0) = b − Ax^(0); solve Mz^(0) = r^(0) for z^(0); set v^(1) = z^(0).
for k = 1, 2, …
  g = Av^(k)  // parallel matrix-vector multiplication
  zr = <z^(k−1), r^(k−1)>  // parallel dot product by MPI_Allreduce
  α_k = zr / <v^(k), g>  // parallel dot product by MPI_Allreduce
  x^(k) = x^(k−1) + α_k v^(k)
  r^(k) = r^(k−1) − α_k g
  solve Mz^(k) = r^(k) for z^(k)  // preconditioner solve; can involve additional complexity
  ρ_k = <r^(k), r^(k)>  // MPI_Allreduce
  if ρ_k < ε exit  // convergence test
  zr_n = <z^(k), r^(k)>  // parallel dot product
  s_k = zr_n / zr
  v^(k+1) = z^(k) + s_k v^(k)  // z^(k), not r^(k), in the preconditioned version
end
