Linear Systems of Equations with Dense Matrices GE in Parallel: Blockwise QR-Decomposition
Parallel Numerics, WT 2012/2013
3 Linear Systems of Equations with Dense Matrices
Contents
1 Introduction
    1.1 Computer Science Aspects
    1.2 Numerical Problems
    1.3 Graphs
    1.4 Loop Manipulations
2 Elementary Linear Algebra Problems
    2.1 BLAS: Basic Linear Algebra Subroutines
    2.2 Matrix-Vector Operations
    2.3 Matrix-Matrix-Product
3 Linear Systems of Equations with Dense Matrices
    3.1 Gaussian Elimination
    3.2 Parallelization
    3.3 QR-Decomposition with Householder matrices
4 Sparse Matrices
    4.1 General Properties, Storage
    4.2 Sparse Matrices and Graphs
    4.3 Reordering
    4.4 Gaussian Elimination for Sparse Matrices
5 Iterative Methods for Sparse Matrices
    5.1 Stationary Methods
    5.2 Nonstationary Methods
    5.3 Preconditioning
6 Domain Decomposition
    6.1 Overlapping Domain Decomposition
    6.2 Non-overlapping Domain Decomposition
    6.3 Schur Complements
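3.1. Linear Systems of Equations with Dense Matrices
3.1.1. Gaussian Elimination: Basic Properties
• Linear system of equations:
\[
\begin{aligned}
a_{11}x_1 + \dots + a_{1n}x_n &= b_1\\
&\;\;\vdots\\
a_{n1}x_1 + \dots + a_{nn}x_n &= b_n
\end{aligned}
\]
• Solve $Ax = b$:
\[
\begin{pmatrix} a_{11} & \cdots & a_{1n}\\ \vdots & \ddots & \vdots\\ a_{n1} & \cdots & a_{nn} \end{pmatrix}
\begin{pmatrix} x_1\\ \vdots\\ x_n \end{pmatrix}
=
\begin{pmatrix} b_1\\ \vdots\\ b_n \end{pmatrix}
\]
• Generate simpler linear equations (matrices): transform $A$ into triangular form, $A = A^{(1)} \to A^{(2)} \to \dots \to A^{(n)} = U$.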
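Transformation to Upper Triangular Form
\[
\begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n}\\
a_{21} & a_{22} & \cdots & a_{2n}\\
\vdots & \vdots & \ddots & \vdots\\
a_{n1} & a_{n2} & \cdots & a_{nn}
\end{pmatrix}
\]
Row transformations: $(2) \to (2) - \frac{a_{21}}{a_{11}} \cdot (1),\ \dots,\ (n) \to (n) - \frac{a_{n1}}{a_{11}} \cdot (1)$
leads to
\[
A^{(2)} =
\begin{pmatrix}
a_{11} & a_{12} & a_{13} & \cdots & a_{1n}\\
0 & a^{(2)}_{22} & a^{(2)}_{23} & \cdots & a^{(2)}_{2n}\\
0 & a^{(2)}_{32} & a^{(2)}_{33} & \cdots & a^{(2)}_{3n}\\
\vdots & \vdots & \vdots & \ddots & \vdots\\
0 & a^{(2)}_{n2} & a^{(2)}_{n3} & \cdots & a^{(2)}_{nn}
\end{pmatrix}
\]
Next transformations: $(3) \to (3) - \frac{a^{(2)}_{32}}{a^{(2)}_{22}} \cdot (2),\ \dots,\ (n) \to (n) - \frac{a^{(2)}_{n2}}{a^{(2)}_{22}} \cdot (2)$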
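Transformation to Triangular Form (cont.)
\[
A^{(3)} =
\begin{pmatrix}
a_{11} & a_{12} & a_{13} & \cdots & a_{1n}\\
0 & a^{(2)}_{22} & a^{(2)}_{23} & \cdots & a^{(2)}_{2n}\\
0 & 0 & a^{(3)}_{33} & \cdots & a^{(3)}_{3n}\\
\vdots & \vdots & \vdots & \ddots & \vdots\\
0 & 0 & a^{(3)}_{n3} & \cdots & a^{(3)}_{nn}
\end{pmatrix}
\]
Next transformations: $(4) \to (4) - \frac{a^{(3)}_{43}}{a^{(3)}_{33}} \cdot (3),\ \dots,\ (n) \to (n) - \frac{a^{(3)}_{n3}}{a^{(3)}_{33}} \cdot (3)$
\[
A^{(n)} =
\begin{pmatrix}
a_{11} & a_{12} & a_{13} & \cdots & a_{1n}\\
0 & a^{(2)}_{22} & a^{(2)}_{23} & \cdots & a^{(2)}_{2n}\\
0 & 0 & a^{(3)}_{33} & \cdots & a^{(3)}_{3n}\\
\vdots & \vdots & \vdots & \ddots & \vdots\\
0 & 0 & 0 & \cdots & a^{(n)}_{nn}
\end{pmatrix}
= U
\]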
Pseudocode Gaussian Elimination (GE)
Simplification: assume that no pivoting is necessary, i.e.
\[
a^{(k)}_{kk} \neq 0 \quad\text{or}\quad |a^{(k)}_{kk}| \ge \rho > 0 \quad\text{for } k = 1, 2, \dots, n.
\]

for k = 1 : n − 1
    for i = k + 1 : n
        l(i,k) = a(i,k) / a(k,k)
    end
    for i = k + 1 : n
        for j = k + 1 : n
            a(i,j) = a(i,j) − l(i,k) · a(k,j)
        end
    end
end

In practice:
• Include pivoting and include the right-hand side b.
• A triangular system in U still has to be solved!
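The pseudocode above can be sketched in plain Python as follows. This is only an illustration under the same simplification (no pivoting); the function name `lu_in_place` and the packed storage of the multipliers in the lower triangle are assumptions made for this example, not part of the slides.

```python
def lu_in_place(a):
    """Gaussian elimination without pivoting on a square list-of-lists matrix.

    Overwrites `a`: the strict lower triangle holds the multipliers l[i][k],
    the upper triangle (including the diagonal) holds U.
    """
    n = len(a)
    for k in range(n - 1):
        if a[k][k] == 0.0:
            raise ZeroDivisionError("zero pivot; pivoting would be required")
        for i in range(k + 1, n):
            a[i][k] = a[i][k] / a[k][k]        # multiplier l_{i,k}
        for i in range(k + 1, n):
            for j in range(k + 1, n):
                a[i][j] -= a[i][k] * a[k][j]   # update the trailing block
    return a


A = [[2.0, 1.0, 1.0],
     [4.0, 3.0, 3.0],
     [8.0, 7.0, 9.0]]
LU = lu_in_place([row[:] for row in A])   # work on a copy, keep A intact
```

Storing the multipliers in the positions that were just eliminated is a common space-saving convention; the unit diagonal of L is implied and not stored.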
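Intermediate Systems
$A^{(k)}$, $k = 1, 2, \dots, n$, with $A = A^{(1)}$ and $U = A^{(n)}$:
\[
A^{(k)} =
\begin{pmatrix}
a^{(1)}_{1,1} & \cdots & a^{(1)}_{1,k-1} & a^{(1)}_{1,k} & \cdots & a^{(1)}_{1,n}\\
0 & \ddots & \vdots & \vdots & \ddots & \vdots\\
\vdots & \ddots & a^{(k-1)}_{k-1,k-1} & a^{(k-1)}_{k-1,k} & \cdots & a^{(k-1)}_{k-1,n}\\
0 & \cdots & 0 & a^{(k)}_{k,k} & \cdots & a^{(k)}_{k,n}\\
\vdots & \ddots & \vdots & \vdots & \ddots & \vdots\\
0 & \cdots & 0 & a^{(k)}_{n,k} & \cdots & a^{(k)}_{n,n}
\end{pmatrix}
\]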
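Define Auxiliary Matrices
\[
L =
\begin{pmatrix}
1 & 0 & \cdots & 0\\
l_{2,1} & 1 & \ddots & 0\\
\vdots & \ddots & \ddots & \vdots\\
l_{n,1} & \cdots & l_{n,n-1} & 1
\end{pmatrix}
\quad\text{and}\quad U = A^{(n)}
\]
$L_k$ is zero except for the multipliers below the diagonal in column $k$:
\[
L_k :=
\begin{pmatrix}
0 & \cdots & 0 & 0 & 0 & \cdots & 0\\
\vdots & \ddots & \vdots & \vdots & \vdots & \ddots & \vdots\\
0 & \cdots & 0 & 0 & 0 & \cdots & 0\\
0 & \cdots & 0 & l_{k+1,k} & 0 & \cdots & 0\\
\vdots & \ddots & \vdots & \vdots & \vdots & \ddots & \vdots\\
0 & \cdots & 0 & l_{n,k} & 0 & \cdots & 0
\end{pmatrix},
\qquad L = I + \sum_k L_k
\]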
Elimination Step in Terms of Auxiliary Matrices
\[
A^{(k+1)} = (I - L_k) \cdot A^{(k)} = A^{(k)} - L_k \cdot A^{(k)}
\]
\[
U = A^{(n)} = (I - L_{n-1}) \cdot A^{(n-1)} = \dots = (I - L_{n-1}) \cdots (I - L_1) A^{(1)} = \tilde{L} \cdot A
\]
\[
\tilde{L} := (I - L_{n-1}) \cdots (I - L_1)
\]
$A = \tilde{L}^{-1} \cdot U$ with $U$ upper triangular and $\tilde{L}$ lower triangular.
• Theorem 2: $\tilde{L}^{-1} = L$ and therefore $A = LU$.
• Advantage: Every further problem $Ax = b_j$ can be reduced to $(LU)x = b_j$ for arbitrary $j$.
• Solve two triangular problems: $(LU)x = b$ splits into $Ly = b$ followed by $Ux = y$.
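The two triangular solves can be sketched as follows. This is a minimal illustration that assumes the factors are stored in one packed matrix (multipliers of L below the diagonal, U on and above it); the function names and the packed layout are assumptions of this example.

```python
def forward_substitution(lu, b):
    """Solve L y = b, with L unit lower triangular stored below the diagonal of lu."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = b[i] - sum(lu[i][j] * y[j] for j in range(i))
    return y


def back_substitution(lu, y):
    """Solve U x = y, with U stored on and above the diagonal of lu."""
    n = len(y)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(lu[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (y[i] - s) / lu[i][i]
    return x


# Packed LU factors of A = [[2,1,1],[4,3,3],[8,7,9]] (no pivoting needed).
LU = [[2.0, 1.0, 1.0],
      [2.0, 1.0, 1.0],
      [4.0, 3.0, 2.0]]

# Two right-hand sides reuse the same factorization.
x1 = back_substitution(LU, forward_substitution(LU, [4.0, 10.0, 24.0]))
x2 = back_substitution(LU, forward_substitution(LU, [2.0, 4.0, 8.0]))
```

Each additional right-hand side costs only two triangular solves, O(n²) operations, instead of a new O(n³) elimination.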
Theorem 2: $\tilde{L}^{-1} = L \;\Rightarrow\; A = LU$
For $i \le j$: $L_i \cdot L_j = 0$. Hence
\[
(I + L_j)(I - L_j) = I + L_j - L_j - L_j^2 = I \;\Rightarrow\; (I - L_j)^{-1} = I + L_j
\]
\[
(I + L_i)(I + L_j) = I + L_i + L_j + L_i L_j = I + L_i + L_j \quad (i \le j)
\]
\[
\tilde{L}^{-1} = [(I - L_{n-1}) \cdots (I - L_1)]^{-1} = (I - L_1)^{-1} \cdots (I - L_{n-1})^{-1}
= (I + L_1)(I + L_2) \cdots (I + L_{n-1}) = I + L_1 + L_2 + \dots + L_{n-1} = L
\]
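As a small worked illustration of the proof (the case $n = 3$, chosen here only as an example), the two auxiliary matrices are

\[
L_1 = \begin{pmatrix} 0 & 0 & 0\\ l_{2,1} & 0 & 0\\ l_{3,1} & 0 & 0 \end{pmatrix},
\qquad
L_2 = \begin{pmatrix} 0 & 0 & 0\\ 0 & 0 & 0\\ 0 & l_{3,2} & 0 \end{pmatrix},
\qquad
L_1 L_2 = 0,
\]

because the only nonzero column of $L_1$ (the first) meets the first row of $L_2$, which is zero. Therefore

\[
(I + L_1)(I + L_2) = I + L_1 + L_2 =
\begin{pmatrix} 1 & 0 & 0\\ l_{2,1} & 1 & 0\\ l_{3,1} & l_{3,2} & 1 \end{pmatrix} = L.
\]

The order matters: $L_2 L_1$ has the entry $l_{3,2}\, l_{2,1}$ in position $(3,1)$ and is in general not zero, which is why the statement is restricted to $i \le j$.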
3.2. GE in Parallel: Blockwise
Main idea: Blocking of GE to avoid data transfer between processors.
Basic Concepts:
Replace GE or a large LU-decomposition of the full matrix by small intermediate steps (a sequence of small block operations):
• Solving a collection of small triangular systems $L U_k = B_k$ (parallelism in columns of $U$)
• Updating matrices $A \to A - LU$ (also easy to parallelize)
• Small LU-decompositions $B = LU$ (parallelism in rows of $B$)
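How to Choose Blocks in L/U Satisfying LU = A
\[
\begin{pmatrix} L_{11} & 0 & 0\\ L_{21} & L_{22} & 0\\ L_{31} & L_{32} & L_{33} \end{pmatrix}
\begin{pmatrix} U_{11} & U_{12} & U_{13}\\ 0 & U_{22} & U_{23}\\ 0 & 0 & U_{33} \end{pmatrix}
=
\begin{pmatrix} A_{11} & A_{12} & A_{13}\\ A_{21} & A_{22} & A_{23}\\ A_{31} & A_{32} & A_{33} \end{pmatrix}
\]
\[
=
\begin{pmatrix}
L_{11}U_{11} & L_{11}U_{12} & L_{11}U_{13}\\
L_{21}U_{11} & L_{21}U_{12} + L_{22}U_{22} & L_{21}U_{13} + L_{22}U_{23}\\
L_{31}U_{11} & L_{31}U_{12} + L_{32}U_{22} & *
\end{pmatrix}
\]
Different ways of computing L and U depending on
• start (assume first entry/row/column of L/U as given)
• how to compute new entry/row/column of L/U
• update of block structure of L/U by grouping in
    – known blocks
    – blocks newly to compute
    – blocks to be computed later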
Crout Form
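Crout Form (cont.)
1. Solve
\[
\begin{pmatrix} L_{22}\\ L_{32} \end{pmatrix} U_{22}
= \begin{pmatrix} A_{22}\\ A_{32} \end{pmatrix} - \begin{pmatrix} L_{21}\\ L_{31} \end{pmatrix} U_{12}
\]
by small LU-decomposition of the modified part of $A$ → $L_{22}$, $L_{32}$, and $U_{22}$.
2. Solve
\[
L_{22} U_{23} = A_{23} - L_{21} U_{13}
\]
by solving small triangular systems of equations in $L_{22}$ → $U_{23}$.
Initial steps:
\[
L_{11}U_{11} = A_{11}, \qquad
\begin{pmatrix} L_{21}\\ L_{31} \end{pmatrix} U_{11} = \begin{pmatrix} A_{21}\\ A_{31} \end{pmatrix}, \qquad
L_{11}\,(U_{12}\ \ U_{13}) = (A_{12}\ \ A_{13})
\]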
New Partitioning
• Combine the already computed parts from the second column of L and the second row of U into the first column of L and the first row of U.
• Split the parts $L_{33}$ and $U_{33}$, ignored so far, into new columns/rows.
• Repeat this overall procedure until L and U are fully computed.
Block Structure
Intermediate block structure:
Solve for red blocks.
Reconfigure the block structure:
Repeat until done.
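Left Looking GE
• Solve $L_{11}U_{12} = A_{12}$ by a couple of parallel triangular solves. Then update part of $A$ and perform a small LU-decomposition:
\[
\begin{pmatrix} L_{22}\\ L_{32} \end{pmatrix} U_{22}
= \begin{pmatrix} A_{22}\\ A_{32} \end{pmatrix} - \begin{pmatrix} L_{21}\\ L_{31} \end{pmatrix} U_{12}
=: \begin{pmatrix} \tilde{A}_{22}\\ \tilde{A}_{32} \end{pmatrix}
\]
• Reorder blocks and repeat until ready. Start: $L_{11}U_{11} = A_{11}$, $L_{21}U_{11} = A_{21}$, and $L_{31}U_{11} = A_{31}$.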
Block Structure
Intermediate block structure:
Solve for red blocks.
Reconfigure the block structure:
Repeat until done.
Right Looking GE
New blocking:
• Start with $L_{11}U_{11} = A_{11}$ (small LU-decomposition).
• The equations $L_{21}U_{11} = A_{21}$ and $L_{11}U_{12} = A_{12}$ are solved by triangular solves and give $L_{21}$ and $U_{12}$.
• It remains $L_{22}U_{22} = A_{22} - L_{21}U_{12} =: \tilde{A}_{22}$.
• To compute the LU-decomposition of the modified block $\tilde{A}_{22}$, repeat the 2 × 2-blocking for $\tilde{A}_{22}$ and apply recursively.
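The four right-looking steps can be sketched in plain Python as below. This is a sequential illustration only, assuming no pivoting; the function name `blocked_lu`, the block size parameter `nb`, and the iterative (rather than recursive) treatment of the trailing block are choices made for this example. In a parallel code, steps 2 to 4 are the ones that would be distributed.

```python
def blocked_lu(a, nb):
    """Right-looking blocked LU without pivoting, in place on list-of-lists `a`.

    On return the strict lower triangle holds L (unit diagonal implied)
    and the upper triangle holds U.
    """
    n = len(a)
    for s in range(0, n, nb):
        e = min(s + nb, n)
        # 1) Small LU-decomposition of the diagonal block: L11 U11 = A11.
        for k in range(s, e):
            for i in range(k + 1, e):
                a[i][k] /= a[k][k]
                for j in range(k + 1, e):
                    a[i][j] -= a[i][k] * a[k][j]
        # 2) Triangular solves L11 U12 = A12 (columns j are independent).
        for j in range(e, n):
            for i in range(s, e):
                for k in range(s, i):
                    a[i][j] -= a[i][k] * a[k][j]
        # 3) Triangular solves L21 U11 = A21 (rows i are independent).
        for i in range(e, n):
            for k in range(s, e):
                for p in range(s, k):
                    a[i][k] -= a[i][p] * a[p][k]
                a[i][k] /= a[k][k]
        # 4) Update of the trailing block: A22 <- A22 - L21 U12.
        for i in range(e, n):
            for j in range(e, n):
                for k in range(s, e):
                    a[i][j] -= a[i][k] * a[k][j]
    return a


A = [[2.0, 1.0, 1.0, 0.0],
     [4.0, 3.0, 3.0, 1.0],
     [8.0, 7.0, 9.0, 5.0],
     [6.0, 7.0, 9.0, 8.0]]
LU = blocked_lu([row[:] for row in A], nb=2)
```

Step 4 is a matrix-matrix product and carries most of the flops for large matrices, which is the BLAS-3 character that the blocked variants rely on.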
Block Structure
Intermediate block structure:
Solve for blue and both red blocks.
Reconfigure the block structure:
Repeat until done.
Comparison and Overview
• In comparison, all methods
    – have nearly the same efficiency in parallel
    – but better performance (sequential or parallel) than the unblocked variants because they are based on BLAS-3.
• Elementary steps of all blocking methods:
    – Matrix-matrix product and sum (easy to parallelize)
    – A couple of triangular solves (easy to parallelize)
    – Small LU-decomposition (parallelizable for long rows)
• Crout and right looking are slightly better because they spend more flops in matrix updates and fewer in triangular solves or LU-decompositions.
3.3. QR-Decomposition with Householder Matrices
3.3.1. QR-decomposition
• Gaussian elimination → LU-decomposition: sometimes not numerically stable; problematic for over-/underdetermined systems
• Improvement: QR-decomposition $A = QR$ with $Q$ orthogonal and $R$ triangular. Solve the linear system $Ax = b$ in a numerically stable way via
\[
b = Ax = QRx \iff Rx = Q^T b
\]
by a cheap matrix-vector multiplication and a triangular solve.
Overdetermined Systems
• $Ax = b$ with
    – $A$ an $m \times n$ matrix, $n \ll m$
    – $x$ a vector of length $n$
    – $b$ a vector of length $m$
• Best approximate solution by solving the minimization problem
\[
\min_x \|Ax - b\|_2^2 = \min_x \left( x^T A^T A x - 2 x^T A^T b + b^T b \right)
\]
• Gradient equal to zero ⇔ $A^T A x = A^T b$ (normal equations)
• Solution by considering the linear system with $A^T A$, but the condition number is worse:
\[
\mathrm{cond}(A^T A) = \mathrm{cond}(A)^2
\]
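A small worked example, chosen here only for illustration, shows what the squared condition number means in floating-point practice. Take, with $0 < \varepsilon < 1$,

\[
A = \begin{pmatrix} 1 & 0\\ 0 & \varepsilon\\ 0 & 0 \end{pmatrix},
\qquad
A^T A = \begin{pmatrix} 1 & 0\\ 0 & \varepsilon^2 \end{pmatrix},
\qquad
\mathrm{cond}_2(A) = \frac{1}{\varepsilon},
\qquad
\mathrm{cond}_2(A^T A) = \frac{1}{\varepsilon^2}.
\]

For $\varepsilon = 10^{-8}$ the matrix $A$ itself is still moderately conditioned for double precision ($\mathrm{cond}_2(A) = 10^{8}$), while $\mathrm{cond}_2(A^T A) = 10^{16}$ is already of the order of the reciprocal of the machine precision, so the normal equations can lose essentially all accuracy.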
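Advantage of QR-Decomposition
\[
A = QR, \quad R = \begin{pmatrix} R_1\\ 0 \end{pmatrix}, \quad \mathrm{cond}(R_1) = \mathrm{cond}(A), \quad
\tilde{b} = Q^T b = \begin{pmatrix} \tilde{b}_1\\ \tilde{b}_2 \end{pmatrix}
\]
\[
\begin{aligned}
A^T A x = A^T b
&\iff (QR)^T (QR) x = (QR)^T b\\
&\iff R^T R x = R^T (Q^T b)\\
&\iff (R_1^T \ \ 0) \begin{pmatrix} R_1\\ 0 \end{pmatrix} x = (R_1^T \ \ 0)\, \tilde{b}\\
&\iff R_1^T R_1 x = (R_1^T \ \ 0) \begin{pmatrix} \tilde{b}_1\\ \tilde{b}_2 \end{pmatrix}\\
&\iff R_1^T R_1 x = R_1^T \tilde{b}_1\\
&\iff R_1 x = \tilde{b}_1
\end{aligned}
\]
• Instead of solving the normal equations we only have to consider the triangular system in $R_1$.
• Cheap and better condition number.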
3.3.2. Householder Method
• Define special orthogonal and simple matrices $H$ called Householder matrices (compare Givens):
\[
u \in \mathbb{R}^n, \ \|u\|_2 = 1: \quad H = I - 2uu^T
\]
• $H$, as a rank-1 perturbation of the identity, is symmetric, involutory ($H^2 = I$) and orthogonal:
\[
H^T = I - 2uu^T = H
\]
\[
H^T H = H^2 = (I - 2uu^T)(I - 2uu^T) = I - 2uu^T - 2uu^T + 4u\,\underbrace{u^T u}_{=\,1}\,u^T = I
\]
• For complex problems: orthogonal → unitary, symmetric → Hermitian
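Householder Method (cont.)
• Use $H_1$ with an appropriate vector $u_1$ to eliminate the first column of $A$:
\[
H_1 A = (I - 2u_1u_1^T)(a_1 \ \cdots \ a_m) = (a_1 - 2(u_1^T a_1)u_1 \ \ \cdots \ \ *) =
\begin{pmatrix} \alpha & *\\ 0 & *\\ \vdots & \vdots\\ 0 & * \end{pmatrix}
\]
• To satisfy this equation we have to find a vector $u_1$ of length 1 with
\[
a_1 - 2(u_1^T a_1)u_1 = \alpha e_1
\]
• Because $H_1$ is orthogonal it holds:
\[
\|H_1 a_1\|_2 = \|a_1\|_2 = \|\alpha e_1\|_2 = |\alpha| \;\Rightarrow\; \alpha = \pm\|a_1\|_2, \ \text{e.g. } \alpha = \|a_1\|_2
\]
\[
u_1 = \frac{a_1 - \|a_1\|_2 e_1}{2(u_1^T a_1)} = \frac{a_1 - \|a_1\|_2 e_1}{\|a_1 - \|a_1\|_2 e_1\|_2}
\]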
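A short worked example with numbers (the vector $a_1 = (3, 4)^T$ is chosen here purely for illustration) makes the construction concrete. With $\alpha = \|a_1\|_2 = 5$:

\[
a_1 - \alpha e_1 = \begin{pmatrix} -2\\ 4 \end{pmatrix},
\qquad
u_1 = \frac{1}{\sqrt{20}} \begin{pmatrix} -2\\ 4 \end{pmatrix},
\qquad
u_1^T a_1 = \frac{-6 + 16}{\sqrt{20}} = \frac{10}{\sqrt{20}},
\]

\[
H_1 a_1 = a_1 - 2(u_1^T a_1)u_1 = \begin{pmatrix} 3\\ 4 \end{pmatrix} - \frac{20}{20} \begin{pmatrix} -2\\ 4 \end{pmatrix} = \begin{pmatrix} 5\\ 0 \end{pmatrix} = \alpha e_1.
\]

In floating-point implementations the sign of $\alpha$ is commonly chosen opposite to the sign of the first entry of $a_1$, so that $a_1 - \alpha e_1$ involves no cancellation; both signs satisfy the equations above.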
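Householder Method (cont. 2)
• Repeat for all columns of $A$:
\[
H_1 A = H_1 A_1 = (I - 2u_1u_1^T)A =
\begin{pmatrix}
\|a_1\|_2 & * & \cdots & *\\
0 & & & \\
\vdots & & A_2 & \\
0 & & &
\end{pmatrix}
\]
• Apply the same procedure to $A_2$, an $(n-1) \times (m-1)$ matrix:
\[
\tilde{H}_2 A_2 = (I - 2\tilde{u}_2\tilde{u}_2^T)A_2 =
\begin{pmatrix}
\alpha_2 & * & \cdots & *\\
0 & & & \\
\vdots & & A_3 & \\
0 & & &
\end{pmatrix}
\]
• Extend
\[
u_2 := \begin{pmatrix} 0\\ \tilde{u}_2 \end{pmatrix},
\qquad
H_2 := I - 2u_2u_2^T =
\begin{pmatrix}
1 & 0 & \cdots & 0\\
0 & & & \\
\vdots & & \tilde{H}_2 & \\
0 & & &
\end{pmatrix}
\]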
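Householder Method (cont. 3)
• For columns $1, 2, \dots, m$ this gives Householder matrices $H_1, \dots, H_m$ with
\[
\underbrace{H_m \cdots H_2 H_1}_{=\,Q^T} \cdot A = H_m \cdots H_3 \cdot
\begin{pmatrix}
\alpha_1 & * & * & \cdots & *\\
0 & \alpha_2 & * & \cdots & *\\
0 & 0 & & & \\
\vdots & \vdots & & A_3 & \\
0 & 0 & & &
\end{pmatrix}
= \begin{pmatrix} R_1\\ 0 \end{pmatrix} =: R
\]
• Hence:
\[
A = QR, \qquad Q := (H_m \cdots H_2 H_1)^T = H_1 H_2 \cdots H_m
\]
• Remark: for $m = n$, $H_1, \dots, H_{m-1}$ are enough, because the last remaining block is a scalar.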
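The column-by-column procedure, combined with the triangular solve $R_1 x = \tilde{b}_1$ for least squares problems, can be sketched in plain Python as follows. This is a minimal illustration: the function name is hypothetical, $Q$ is never formed explicitly (the reflectors are applied to $A$ and $b$ directly), $A$ is assumed to have full column rank, and, unlike the choice $\alpha = \|a_1\|_2$ on the slides, the sign of $\alpha$ is taken opposite to the leading entry to avoid cancellation.

```python
import math


def householder_least_squares(a, b):
    """Minimise ||A x - b||_2 for an n x m list-of-lists A with n >= m.

    Works on copies; applies H_1, ..., H_m to A and b, then solves R1 x = b1.
    """
    a = [row[:] for row in a]
    b = b[:]
    n, m = len(a), len(a[0])
    for k in range(m):
        # Column k from the diagonal downwards.
        col = [a[i][k] for i in range(k, n)]
        norm = math.sqrt(sum(c * c for c in col))
        if norm == 0.0:
            continue                       # nothing to eliminate
        # alpha = -sign(a_kk) * ||col|| avoids cancellation in col - alpha e1.
        alpha = -norm if col[0] >= 0.0 else norm
        v = col[:]
        v[0] -= alpha
        vnorm = math.sqrt(sum(c * c for c in v))
        u = [c / vnorm for c in v]         # unit Householder vector
        # Apply H = I - 2 u u^T to columns k..m-1 of A and to b.
        for j in range(k, m):
            dot = sum(u[i] * a[k + i][j] for i in range(n - k))
            for i in range(n - k):
                a[k + i][j] -= 2.0 * dot * u[i]
        dot = sum(u[i] * b[k + i] for i in range(n - k))
        for i in range(n - k):
            b[k + i] -= 2.0 * dot * u[i]
    # Back substitution on the upper triangular m x m block R1.
    x = [0.0] * m
    for i in range(m - 1, -1, -1):
        s = sum(a[i][j] * x[j] for j in range(i + 1, m))
        x[i] = (b[i] - s) / a[i][i]
    return x


# Fit c + d*t to the points (0, 1), (1, 3), (2, 5), which lie on 1 + 2t.
coeffs = householder_least_squares([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]],
                                   [1.0, 3.0, 5.0])
```

Applying each reflector as $a_j - 2(u^T a_j)u$ costs one dot product and one vector update per column, which is why the matrices $H_k$ never need to be stored as dense matrices.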
3.3.3. Householder Method in Parallel - Blockwise
Idea: work again blockwise.
• In a first step compute $u_1$ and the application of $H_1$ to the first $k$ columns of $A$. Do not compute $H_1 A$ fully!
• Then compute $u_2, \dots, u_k$ and the application of $H_k \cdots H_1$ to the first $k$ columns of $A$. With $A = (A_1 \ A_2)$, where $A_1$ holds the first $k$ columns:
\[
H_k \cdots H_1 (A_1 \ \ A_2) = (H_k \cdots H_1 A_1 \quad (H_k \cdots H_1) A_2) = (A_1^{(k)} \quad V A_2)
\]
• Still to compute: $V A_2$.
• How can we take advantage of parallelism in this computation? → Represent $V$ in a special form that allows fast and parallel evaluation of $V A_2$.
Property of Householder matrices
Theorem 3: For Householder matrices $H_k, \dots, H_i$ it holds
\[
H_k \cdots H_i = (I - 2u_ku_k^T) \cdots (I - 2u_iu_i^T) = I - \underbrace{(u_k \ \cdots \ u_i)}_{=:\,Y} \, T_i
\begin{pmatrix} u_k^T\\ \vdots\\ u_i^T \end{pmatrix}
\]
with $T_i$ being upper triangular.
Proof by induction:
The representation obviously holds for one Householder matrix, $i = k$ (with $T_k = 2$).
Assume the representation holds for $H_k, \dots, H_i$. Then . . . (next slide)
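Property of Householder matrices (cont.)
\[
\begin{aligned}
&[(I - 2u_ku_k^T) \cdots (I - 2u_iu_i^T)]\,(I - 2u_{i-1}u_{i-1}^T)\\
&\quad= \left( I - (u_k \ \cdots \ u_i)\, T_i \begin{pmatrix} u_k^T\\ \vdots\\ u_i^T \end{pmatrix} \right) (I - 2u_{i-1}u_{i-1}^T)\\
&\quad= I - 2u_{i-1}u_{i-1}^T - (u_k \ \cdots \ u_i)\, T_i \begin{pmatrix} u_k^T\\ \vdots\\ u_i^T \end{pmatrix}
+ 2\,(u_k \ \cdots \ u_i)\, \underbrace{T_i \begin{pmatrix} u_k^T u_{i-1}\\ \vdots\\ u_i^T u_{i-1} \end{pmatrix}}_{=:\,y}\, u_{i-1}^T\\
&\quad= I - (u_k \ \cdots \ u_i \ \ u_{i-1}) \cdot \begin{pmatrix} T_i & -2y\\ 0 & 2 \end{pmatrix} \cdot
\begin{pmatrix} u_k^T\\ \vdots\\ u_i^T\\ u_{i-1}^T \end{pmatrix}
\end{aligned}
\]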
Algorithm for parallel Householder
Computation of $H_k \cdots H_i A = V_{ki} A = (I - YTY^T)A$ in the form
\[
V_{ki} A = V_{ki} (A_1 \ \ A_2) = (* \quad V_{ki} A_2)
\]
and
\[
V_{ki} A_2 = (I - YTY^T) A_2 = A_2 - Y\,[\,T\,(Y^T A_2)\,]
\]
Algorithm:
• Compute $u_1$ and $H_1 A_1$; $u_2$ and $H_2 A_1$; . . . ; $u_k$ and $H_k A_1$ (sequential)
• Compute $Y$ and $V A_2$ (parallel)
• Repeat with indices $k + 1, \dots, 2k$; $2k + 1, \dots, 3k$; . . .
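The recurrence from the induction proof, $T \to \begin{pmatrix} T & -2y\\ 0 & 2 \end{pmatrix}$ with $y = T\,Y^T u_{i-1}$, can be turned into a small plain-Python sketch that builds $Y$ and $T$ and evaluates $A_2 - Y[T(Y^T A_2)]$. All function names are hypothetical, the vectors $u_1, \dots, u_k$ are assumed to be given already, and the code runs sequentially; the three matrix products in `apply_block` are the part that a parallel implementation would distribute.

```python
def build_yt(us):
    """Given vectors u_1, ..., u_k (lists of equal length), return (Y, T) with
    H_k ... H_1 = I - Y T Y^T, where H_j = I - 2 u_j u_j^T.

    Y is returned as the list of its columns (u_k, ..., u_1); T is upper triangular.
    """
    cols = [us[-1]]                  # start with the single reflector H_k
    t = [[2.0]]
    for u in reversed(us[:-1]):      # append u_{k-1}, ..., u_1 on the right
        z = [sum(c[r] * u[r] for r in range(len(u))) for c in cols]      # Y^T u
        y = [sum(t[i][j] * z[j] for j in range(len(z))) for i in range(len(t))]
        t = [row + [-2.0 * yi] for row, yi in zip(t, y)]
        t.append([0.0] * len(cols) + [2.0])
        cols.append(u)
    return cols, t


def apply_block(cols, t, a2):
    """Return A2 - Y (T (Y^T A2)); a2 is a list of rows."""
    n, p, k = len(a2), len(a2[0]), len(cols)
    w = [[sum(cols[i][r] * a2[r][j] for r in range(n)) for j in range(p)]
         for i in range(k)]                                   # Y^T A2
    tw = [[sum(t[i][l] * w[l][j] for l in range(k)) for j in range(p)]
          for i in range(k)]                                  # T (Y^T A2)
    return [[a2[r][j] - sum(cols[i][r] * tw[i][j] for i in range(k))
             for j in range(p)] for r in range(n)]


def apply_sequential(us, a2):
    """Reference: apply H_1, then H_2, ..., then H_k one at a time."""
    a = [row[:] for row in a2]
    for u in us:
        for j in range(len(a[0])):
            dot = sum(u[r] * a[r][j] for r in range(len(u)))
            for r in range(len(u)):
                a[r][j] -= 2.0 * dot * u[r]
    return a


us = [[1.0 / 3.0, 2.0 / 3.0, 2.0 / 3.0], [0.0, 0.6, 0.8]]   # u_1, u_2
A2 = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
Y, T = build_yt(us)
blocked = apply_block(Y, T, A2)
reference = apply_sequential(us, A2)   # agrees with `blocked` up to rounding
```

The blocked form replaces $k$ rank-1 updates of $A_2$ by matrix-matrix products with $Y$, $T$ and $Y^T$, which is what makes the evaluation of $V A_2$ BLAS-3 friendly.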
Communication Avoiding QR
• Four independent QR-factorisations
↓
• Two independent reduced QR-factorisations
↓
• One reduced QR-factorisation
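Tall Skinny QR
\[
A = \begin{pmatrix} A_0\\ A_1\\ A_2\\ A_3 \end{pmatrix}
= \begin{pmatrix} Q_0R_0\\ Q_1R_1\\ Q_2R_2\\ Q_3R_3 \end{pmatrix}
= \begin{pmatrix} Q_0 & & & \\ & Q_1 & & \\ & & Q_2 & \\ & & & Q_3 \end{pmatrix}
\begin{pmatrix} R_0\\ R_1\\ R_2\\ R_3 \end{pmatrix}
\]
\[
\begin{pmatrix} R_0\\ R_1\\ R_2\\ R_3 \end{pmatrix}
= \begin{pmatrix} \begin{pmatrix} R_0\\ R_1 \end{pmatrix}\\[2ex] \begin{pmatrix} R_2\\ R_3 \end{pmatrix} \end{pmatrix}
= \begin{pmatrix} Q_{01}R_{01}\\ Q_{23}R_{23} \end{pmatrix}
= \begin{pmatrix} Q_{01} & \\ & Q_{23} \end{pmatrix}
\begin{pmatrix} R_{01}\\ R_{23} \end{pmatrix}
\]
\[
\begin{pmatrix} R_{01}\\ R_{23} \end{pmatrix} = Q_{0123} R_{0123}
\]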
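Tall Skinny QR (cont.)
\[
A = \begin{pmatrix} A_0\\ A_1\\ A_2\\ A_3 \end{pmatrix}
= \begin{pmatrix} Q_0 & & & \\ & Q_1 & & \\ & & Q_2 & \\ & & & Q_3 \end{pmatrix}
\cdot \begin{pmatrix} Q_{01} & \\ & Q_{23} \end{pmatrix}
\cdot Q_{0123} \cdot R_{0123}
\]
Advantage: Messages in O(log(P)) compared to O(2n log(P)) for ScaLAPACK.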
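The reduction tree can be sketched in plain Python as follows. This is a sequential illustration of the data flow only: the function names are hypothetical, only the final factor $R_{0123}$ is computed (the $Q$ factors are not formed), the number of row blocks is assumed to be a power of two, and each block is assumed to have at least as many rows as columns. In a parallel setting each call in one tree level would run on a different processor.

```python
import math


def r_factor(a):
    """Return the m x m upper triangular R of a Householder QR of the
    n x m list-of-lists `a` (n >= m); Q is not formed."""
    a = [row[:] for row in a]
    n, m = len(a), len(a[0])
    for k in range(m):
        col = [a[i][k] for i in range(k, n)]
        norm = math.sqrt(sum(c * c for c in col))
        if norm == 0.0:
            continue
        alpha = -norm if col[0] >= 0.0 else norm
        v = col[:]
        v[0] -= alpha                       # unnormalised Householder vector
        vv = sum(c * c for c in v)
        for j in range(k, m):
            dot = sum(v[i] * a[k + i][j] for i in range(n - k))
            for i in range(n - k):
                a[k + i][j] -= 2.0 * dot / vv * v[i]
    return [[a[i][j] if j >= i else 0.0 for j in range(m)] for i in range(m)]


def tsqr_r(a, parts=4):
    """R factor of a tall-skinny matrix via a binary reduction tree."""
    rows = len(a) // parts
    blocks = [a[p * rows:(p + 1) * rows] for p in range(parts - 1)]
    blocks.append(a[(parts - 1) * rows:])
    rs = [r_factor(blk) for blk in blocks]       # independent local QRs
    while len(rs) > 1:                           # pairwise reduction
        rs = [r_factor(rs[i] + rs[i + 1]) for i in range(0, len(rs), 2)]
    return rs[0]


A = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0],
     [9.0, 10.0], [11.0, 12.0], [13.0, 14.0], [15.0, 17.0]]
R = tsqr_r(A)   # satisfies R^T R = A^T A up to rounding
```

Only the small $m \times m$ factors $R_i$ travel up the tree, so the amount of data exchanged per level does not depend on the number of rows of $A$.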