Lecture Notes 8: Gaussian Elimination—Sequential and Basic Parallel Algorithms
Shantanu Dutt, ECE Dept., UIC



Page 2:

Acknowledgements

• Parallel Implementations of Gaussian Elimination, Vasilije Perović, Western Michigan University

• Parallel Gaussian Elimination, Univ. of Frankfurt

Page 3:

Parallel Implementations of Gaussian Elimination

Vasilije Perovic

Western Michigan University

[email protected]

January 27, 2012

Page 4:

Linear systems of equations

General form of a linear system of equations is given by

a11 x1 + · · · + a1n xn = b1
a21 x1 + · · · + a2n xn = b2
   ...
am1 x1 + · · · + amn xn = bm

where the aij's and bi's are known and we are solving for the xi's.

Page 5:

Ax = b

More compactly, we can rewrite the system of linear equations in the form

Ax = b

where

A = [ a11  a12  · · ·  a1n
      a21  a22  · · ·  a2n
       :    :    ...    :
      am1  am2  · · ·  amn ] ,

x = [ x1, x2, . . . , xn ]^T ,    b = [ b1, b2, . . . , bm ]^T

Page 6:

Why study solutions of Ax = b

1. One of the most fundamental problems in the field of scientific computing
2. Arises in many applications:
   - Chemical engineering
   - Interpolation
   - Structural analysis
   - Regression analysis
   - Numerical ODEs and PDEs



Page 10:

Methods for solving Ax = b

1. Direct methods – obtain the exact solution (in real arithmetic) in finitely many operations
   - Gaussian elimination (LU factorization)
   - QR factorization
   - WZ factorization
2. Iterative methods – generate a sequence of approximations that converges in the limit to the solution
   - Jacobi iteration
   - Gauss-Seidel iteration
   - SOR method (successive over-relaxation)

Page 12:

Gaussian Elimination

When solving Ax = b we will assume throughout this presentation that A is non-singular and that A and b are known:

a11 x1 + · · · + a1n xn = b1
a21 x1 + · · · + a2n xn = b2
   ...
an1 x1 + · · · + ann xn = bn .

Page 13:

Gaussian Elimination

Assuming that a11 ≠ 0, we first subtract a21/a11 times the first equation from the second equation to eliminate the coefficient of x1 in the second equation, and so on, until the coefficients of x1 in the last n − 1 rows have all been eliminated. This gives the modified system of equations

a11 x1 + a12 x2 + · · · + a1n xn = b1
         a^(1)_22 x2 + · · · + a^(1)_2n xn = b^(1)_2
                  ...
         a^(1)_n2 x2 + · · · + a^(1)_nn xn = b^(1)_n ,

where

a^(1)_ij = aij − (ai1/a11) a1j
b^(1)_i  = bi − (ai1/a11) b1          for i, j = 2, . . . , n .
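The update above is easy to express in code. Below is a minimal Python/NumPy sketch (not from the original slides; the function name is illustrative) of this first elimination step, producing the a^(1)_ij and b^(1)_i entries:

    import numpy as np

    def eliminate_first_column(A, b):
        """One step of Gaussian elimination: zero out column 0 below the pivot.

        Assumes A[0, 0] != 0. Returns modified copies whose entries correspond
        to a^(1)_ij and b^(1)_i in the formulas above.
        """
        A = A.astype(float).copy()
        b = b.astype(float).copy()
        for i in range(1, len(b)):
            factor = A[i, 0] / A[0, 0]          # a_i1 / a_11
            A[i, :] -= factor * A[0, :]         # row_i <- row_i - (a_i1/a_11) * row_1
            b[i] -= factor * b[0]
        return A, b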

Page 15:

Gaussian Elimination (Forward Reduction)

We apply the same process to the last n − 1 equations of the modified system to eliminate the coefficients of x2 in the last n − 2 equations, and so on, until the entire system has been reduced to the (upper) triangular form

[ a11   a12        · · ·   a1n        ] [ x1 ]   [ b1         ]
[       a^(1)_22   · · ·   a^(1)_2n   ] [ x2 ] = [ b^(1)_2    ]
[                   ...       ...     ] [ .. ]   [    ...     ]
[                          a^(n−1)_nn ] [ xn ]   [ b^(n−1)_n  ]

• The superscripts indicate the number of times the elements had to be changed.
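For reference, here is a short Python/NumPy sketch (not part of the slides; names are illustrative) of the complete forward reduction, assuming every pivot encountered is non-zero (pivoting is discussed later):

    import numpy as np

    def forward_reduction(A, b):
        """Reduce (A, b) to the upper-triangular form shown above (no pivoting)."""
        A = A.astype(float).copy()
        b = b.astype(float).copy()
        n = len(b)
        for k in range(n - 1):                  # k-th elimination step
            for i in range(k + 1, n):
                factor = A[i, k] / A[k, k]      # multiplier for row i
                A[i, k:] -= factor * A[k, k:]
                b[i] -= factor * b[k]
        return A, b                             # A is now upper triangular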

Page 17:

Gaussian Elimination – Example

• Perform the forward reduction on the system given below:

[  4  -9  2 ] [ x1 ]   [ 2 ]
[  2  -4  4 ] [ x2 ] = [ 3 ]
[ -1   2  2 ] [ x3 ]   [ 1 ]

We start by writing down the augmented matrix for the given system:

[  4  -9  2 | 2 ]
[  2  -4  4 | 3 ]
[ -1   2  2 | 1 ]



Page 21:

Gaussian Elimination – Example

We perform appropriate row operations to obtain:

[  4  -9  2 | 2 ]
[  2  -4  4 | 3 ]      --( R2 <- R2 - (2/4) R1 ,   R3 <- R3 - (-1/4) R1 )-->
[ -1   2  2 | 1 ]

[ 4  -9     2   | 2   ]
[ 0   0.5   3   | 2   ]      --( R3 <- R3 - (-0.25/0.5) R2 )-->
[ 0  -0.25  2.5 | 1.5 ]

[ 4  -9    2 | 2   ]
[ 0   0.5  3 | 2   ]
[ 0   0    4 | 2.5 ]
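These two steps are easy to check numerically. A small Python/NumPy snippet (illustrative, not from the slides) that reproduces them on the augmented matrix:

    import numpy as np

    M = np.array([[ 4., -9., 2., 2.],
                  [ 2., -4., 4., 3.],
                  [-1.,  2., 2., 1.]])          # augmented matrix [A | b]
    M[1] -= (M[1, 0] / M[0, 0]) * M[0]          # R2 <- R2 - ( 2/4) R1
    M[2] -= (M[2, 0] / M[0, 0]) * M[0]          # R3 <- R3 - (-1/4) R1
    print(M)                                    # second (intermediate) matrix above
    M[2] -= (M[2, 1] / M[1, 1]) * M[1]          # R3 <- R3 - (-0.25/0.5) R2
    print(M)                                    # [[4,-9,2,2],[0,0.5,3,2],[0,0,4,2.5]]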


Page 23:

Gaussian Elimination – Example

Note that the row operations used to eliminate x1 from the second and the third equations are equivalent to multiplying the augmented matrix on the left by the matrix L1:

      [  1    0  0 ]         [  4  -9  2 | 2 ]   [ 4  -9     2   | 2   ]
L1 =  [ -0.5  1  0 ] ,  L1 · [  2  -4  4 | 3 ] = [ 0   0.5   3   | 2   ]
      [ 0.25  0  1 ]         [ -1   2  2 | 1 ]   [ 0  -0.25  2.5 | 1.5 ]


Page 25:

Gaussian Elimination – Example

Similarly, the row operation used to eliminate x2 from the third equation corresponds to multiplying on the left by a second matrix L2:

      [ 1  0    0 ]                           [ 4  -9    2 | 2   ]
L2 =  [ 0  1    0 ] ,   L2 · L1 · [A | b]  =  [ 0   0.5  3 | 2   ] ,   with  L̃ = L2 · L1 .
      [ 0  0.5  1 ]                           [ 0   0    4 | 2.5 ]

Thus, we can write

(L2 · L1) · A = L̃ · A = U

Page 26:

LU factorization

In general, we can think of row operations as multiplication by matrices on the left; thus the i-th elimination step is equivalent to multiplication on the left by

       [ 1                               ]
       [    ...                          ]
       [          1                      ]
Li  =  [       -l_(i+1,i)   1            ]
       [           :            ...      ]
       [       -l_(n,i)              1   ]

where the l_(j,i) = a^(i−1)_(ji) / a^(i−1)_(ii) are the elimination multipliers.

Page 27:

LU factorization

Continuing in this fashion we obtain

L̃ A x = L̃ b ,    where  L̃ = L_(n−1) · · · L2 L1 ,

A = L̃^(−1) U = L U .
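In code, L can be read off directly from the multipliers used during elimination: storing each multiplier in the position it eliminates gives the unit lower-triangular factor. A small Python/NumPy sketch (illustrative, not from the slides):

    import numpy as np

    def lu_factor(A):
        """LU factorization without pivoting: returns L (unit lower triangular,
        holding the multipliers l_(j,i)) and U (the upper-triangular result of
        forward reduction), so that L @ U reproduces A."""
        U = A.astype(float).copy()
        n = U.shape[0]
        L = np.eye(n)
        for k in range(n - 1):
            for i in range(k + 1, n):
                L[i, k] = U[i, k] / U[k, k]     # the multiplier l_(i,k)
                U[i, k:] -= L[i, k] * U[k, k:]
        return L, U

    A = np.array([[4., -9., 2.], [2., -4., 4.], [-1., 2., 2.]])
    L, U = lu_factor(A)
    print(np.allclose(L @ U, A))                # True: L is the inverse of L~ above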

Page 29:

LU factorization

Continuing in this fashion we obtain

A = L̃^(−1) U = L U .

Thus, the Gaussian elimination algorithm for solving Ax = b is mathematically equivalent to the three-step process:

1. Factor A = LU
2. Solve (forward substitution) Ly = b
3. Solve (back substitution) Ux = y .

• Operation count: n³/3 + n² − n/3 multiplications/divisions and n³/3 + n²/2 − 5n/6 additions/subtractions.
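Steps 2 and 3 are both simple triangular solves. A minimal Python/NumPy sketch (illustrative, not from the slides):

    import numpy as np

    def solve_lu(L, U, b):
        """Solve Ax = b given A = LU: forward substitution Ly = b,
        then back substitution Ux = y."""
        n = len(b)
        y = np.zeros(n)
        for i in range(n):                              # forward substitution
            y[i] = (b[i] - L[i, :i] @ y[:i]) / L[i, i]
        x = np.zeros(n)
        for i in range(n - 1, -1, -1):                  # back substitution
            x[i] = (y[i] - U[i, i+1:] @ x[i+1:]) / U[i, i]
        return x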

Page 31:

Need for partial pivoting

Apply LU factorization without pivoting to

A = [ 0.0001  1 ]
    [ 1       1 ]

in three-decimal-digit floating-point arithmetic.

Page 32:

Need for partial pivoting

A = [ 10^-4  1 ]
    [ 1      1 ]

Solution: L and U are easily obtainable and are given by:

L = [ 1             0 ]        where fl(1 / 10^-4) rounds to 10^4 ,
    [ fl(1/10^-4)   1 ]

U = [ 10^-4   1               ]        where fl(1 − 10^4 · 1) rounds to −10^4 ,
    [ 0       fl(1 − 10^4·1)  ]

so

LU = [ 1     0 ]   [ 10^-4    1    ]     [ 10^-4  1 ]                 [ 10^-4  1 ]
     [ 10^4  1 ] · [ 0       -10^4 ]  =  [ 1      0 ] ,   but   A  =  [ 1      1 ] .

Page 33:

Need for partial pivoting

A = [ 10^-4  1 ]     but     LU = [ 10^-4  1 ]
    [ 1      1 ]                  [ 1      0 ]

Remark 1: Note that the original a22 has been entirely "lost" from the computation by subtracting 10^4 from it. Thus, if we were to use this LU factorization to solve a system, there would be no way to guarantee an accurate answer. This is called numerical instability.

Remark 2: Suppose we attempted to solve the system Ax = [1, 2]^T for x using this LU decomposition. The correct answer is x ≈ [1, 1]^T. Instead, if we were to stick with the three-digit floating-point arithmetic, we would get the answer x̂ = [0, 1]^T, which is completely erroneous.
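The failure is easy to reproduce. The Python snippet below (illustrative, not from the slides) models "three significant decimal digits" with a small rounding helper fl and repeats the no-pivoting solve; it ends with the erroneous x̂ = [0, 1]^T of Remark 2:

    from math import floor, log10

    def fl(x):
        """Round x to three significant decimal digits (a crude model of the
        three-digit floating-point arithmetic used in the example)."""
        if x == 0:
            return 0.0
        return round(x, 2 - floor(log10(abs(x))))

    a11, a12, a21, a22 = 1e-4, 1.0, 1.0, 1.0    # the matrix A
    b1, b2 = 1.0, 2.0                           # right-hand side
    m   = fl(a21 / a11)                         # multiplier, rounds to 1e4
    u22 = fl(a22 - m * a12)                     # rounds to -1e4: the original a22 is lost
    y2  = fl(b2 - m * b1)                       # forward substitution
    x2  = fl(y2 / u22)                          # -> 1.0
    x1  = fl((b1 - a12 * x2) / a11)             # -> 0.0 instead of x1 ~ 1
    print(x1, x2)                               # 0.0 1.0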

Page 35:

Sequential Algorithm
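[The pseudocode on this slide did not survive extraction.] As a stand-in, here is a compact Python/NumPy sketch of sequential Gaussian elimination with partial pivoting and back substitution; it is an assumption of what the slide showed, not the slide's own code:

    import numpy as np

    def gaussian_elimination(A, b):
        """Solve Ax = b by Gaussian elimination with partial pivoting."""
        A = A.astype(float).copy()
        b = b.astype(float).copy()
        n = len(b)
        for k in range(n - 1):
            # partial pivoting: bring the largest |a_jk| (j >= k) into the pivot position
            p = k + int(np.argmax(np.abs(A[k:, k])))
            if A[p, k] == 0.0:
                raise ValueError("matrix is singular")
            if p != k:
                A[[k, p]] = A[[p, k]]
                b[[k, p]] = b[[p, k]]
            # eliminate column k below the pivot
            for i in range(k + 1, n):
                factor = A[i, k] / A[k, k]
                A[i, k:] -= factor * A[k, k:]
                b[i] -= factor * b[k]
        # back substitution
        x = np.zeros(n)
        for i in range(n - 1, -1, -1):
            x[i] = (b[i] - A[i, i+1:] @ x[i+1:]) / A[i, i]
        return x

    A = np.array([[4., -9., 2.], [2., -4., 4.], [-1., 2., 2.]])
    b = np.array([2., 3., 1.])
    print(gaussian_elimination(A, b))   # [0.75  0.25  0.625] for the earlier example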

Page 36:

Row-Oriented Algorithm

1. Determination of the local pivot element
2. Determination of the global pivot element
3. Exchange of the pivot row
4. Distribution of the pivot row
5. Computation of the elimination factors
6. Computation of the matrix elements
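Steps 1-2 of the list above can be sketched with mpi4py. The snippet below is illustrative only; the data layout (each process holds its rows of the augmented matrix together with their global indices) and all names are assumptions, not the slides' code:

    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()

    def global_pivot(my_rows, my_global_ids, k):
        """Steps 1-2: local pivot among this process's rows, then global pivot.

        my_rows       : this process's rows of the augmented matrix (2-D array)
        my_global_ids : global row indices of those rows
        k             : current elimination column
        Returns (owner_rank, global_row_index) of the pivot row.
        """
        # Step 1: local pivot = largest |a_jk| among not-yet-eliminated local rows
        cand = [(abs(my_rows[r][k]), my_global_ids[r])
                for r in range(len(my_global_ids)) if my_global_ids[r] >= k]
        local_best = max(cand) if cand else (-1.0, -1)
        # Step 2: global pivot, by comparing every process's local candidate
        all_best = comm.allgather((local_best, rank))
        (val, pivot_idx), owner = max(all_best)
        return owner, pivot_idx

    # Steps 3-4 would then exchange rows k and pivot_idx and broadcast row k.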

Page 37:

Row-Oriented Algorithm

Remark 1: The computation of the solution vector x in the backward substitution is inherently serial, since the values of xk depend on each other and are computed one after another. Thus, in step k the processor Pq owning row k computes the value of xk and sends it to all other processors by a single broadcast operation.

Remark 2: Note that the data distribution is not quite adequate. For example, suppose we are at the i-th step and that i > m · n/p, where m is a natural number. In that case, processors p0, p1, . . . , pm−1 are idle, since all their work is done until the back-substitution step at the very end. Hence there is an issue with load balancing.

Page 39:

Row-Oriented Algorithm

Remedy: use a block-cyclic row distribution, so that rows are dealt out to the processors in interleaved blocks and every processor keeps some not-yet-eliminated rows until the final steps.
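For concreteness, a tiny sketch (an assumption, not from the slides) of how a block-cyclic mapping assigns rows to processes so that no process runs out of unreduced rows early:

    def owner_of_row(i, P, B):
        """Block-cyclic mapping: rows are grouped into blocks of size B and the
        blocks are dealt out round-robin to the P processes."""
        return (i // B) % P

    # n = 12 rows, P = 3 processes, block size B = 2:
    # rows 0-1 -> P0, 2-3 -> P1, 4-5 -> P2, 6-7 -> P0, 8-9 -> P1, 10-11 -> P2
    print([owner_of_row(i, P=3, B=2) for i in range(12)])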

Page 40:

Row-Oriented Algorithm

Remark 3: We could also consider a column-oriented data distribution. In that case, at the k-th step the processor that owns the k-th column has all the data needed to determine the new pivot. On the one hand, this reduces the communication between processors that we had with the row-oriented data distribution. On the other hand, with column orientation the pivot determination is done entirely serially. Thus, when n >> p (which is often the case), the row-oriented data distribution may be more advantageous, since the pivot determination is done in parallel.

Page 41:

Parallel Gaussian Elimination

Page 42:

Gaussian Elimination

• We work with the rowwise decomposition:
  – Each processor receives n/p rows of the extended matrix (A, b)
• We parallelize the individual pivoting steps. In step i:
  – Find the maximum |A(j, i)| for i ≤ j ≤ n
  – Swap row j with row i
  – Broadcast row i to the processors holding rows i+1 … n
  – Each processor computes the new row_j = row_j − A(j, i)/A(i, i) · row_i
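A sketch of the broadcast-and-update part of one such step with mpi4py (assuming the block distribution of n/p consecutive rows per process described above; names and layout are illustrative):

    from mpi4py import MPI
    import numpy as np

    comm = MPI.COMM_WORLD

    def elimination_step(my_block, first_row, i, pivot_row, owner):
        """Broadcast pivot row i and update the locally owned rows below it.

        my_block  : this process's consecutive rows of (A, b), as a 2-D array
        first_row : global index of my_block[0]
        i         : index of the current pivot row/column
        pivot_row : row i of (A, b) on the owner; elsewhere a same-sized buffer
        owner     : rank of the process that owns global row i
        """
        pivot_row = np.ascontiguousarray(pivot_row, dtype='d')
        comm.Bcast(pivot_row, root=owner)              # distribute row i
        for r in range(my_block.shape[0]):
            j = first_row + r                          # global index of local row r
            if j > i:
                factor = my_block[r, i] / pivot_row[i]
                my_block[r, :] -= factor * pivot_row   # row_j -= A(j,i)/A(i,i) * row_i
        return my_block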

Page 43:

[The body of this slide (the communication-cost analysis) did not survive extraction; only these parenthetical costs remain:]
(or O(nP) for a linear array)
(or O(n²P) for a linear array)

Page 44:

[Most of this slide's body did not survive extraction; the surviving fragments:]
… for a linear array, and O(n² log P) for a hypercube.
• Communication time needs to be reduced independently: because its constants are much larger than those of the computation time, it can dominate or at least significantly affect the parallel time.

Page 46:

Based on the fact that a pivot is picked from every row, except one, exactly once

Page 47:

Timing of the pipelined implementation (annotations recovered from the slide's timeline figure for Proc. 0, Proc. 1, …, Proc. P−1):

• Proc. 0: starts @ t = 0; pivot column selection done @ t = n; sends pivot row 1 @ t = n.
• Proc. 1: receives row 1 @ t = 2n; row operations done @ t = n²/P + 2n; pivot column selection @ t = n²/P + 3n; sends its pivot row @ t = 3n + n²/P.
  Note: this is the maximum delay at Proc. 1 in sending the next pivot row (for the one it owns; the next pivot rows from Proc. 0 will be sent faster, since the row-operation time is then not needed). Thus the worst-case stage delay is 2n + n²/P.
• Proc. P−1: receives row 1 @ t = n + n(P−1); row operations done @ t = n²/P + n + n(P−1); receives row 2 latest @ t = 3n + n²/P + n(P−2) = 2n + n²/P + n(P−1).

So a new set of column eliminations + row operations is done every n + n²/P time units (of which n time units are idling due to communication delay), after an initial wait of n(P−1) time units. When the pivot operation for its own rows is included, computation is done every 2n + n²/P time units (idling is again n time units). This is the worst-case stage delay.

So total time = n(P−1) + (n−1)(2n + n²/P). Computation time = Θ(n³/P); communication time is only Θ(nP + n²) [the initial pipeline-fill time is due to communication only, and the subsequent per-stage communication delay of n (causing idling) gives n(P−1) + (n−1)·n time].
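A quick numeric check of these totals (illustrative only; the split between row-operation work and the remaining per-stage overhead follows the breakdown above):

    n, P = 4096, 16
    total = n * (P - 1) + (n - 1) * (2 * n + n**2 / P)   # total pipelined time
    row_ops = (n - 1) * (n**2 / P)                       # ~ Theta(n^3 / P) computation
    overhead = total - row_ops                           # ~ Theta(nP + n^2) comm./idle/pivot
    print(total, row_ops / total)                        # computation dominates for n >> P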


Note (Dutt): the n(P−1) wait happens only once, as it is the pipeline fill time (idling = Θ(nP)); subsequent idling is only Θ(n) per phase, for a total of Θ(n² + nP) of idling.