
Page 1:

Parallelizing Iterative Computation for Multiprocessor Architectures

Peter Cappello

Page 2:

2

What is the problem?

Create programs for a multiprocessor unit (MPU):

– Multicore processors

– Graphics processing units (GPU)

Page 3:

3

For whom is it a problem? Compiler designer

Application Program → Compiler → Executable → CPU   (EASY)

Page 4:

4

For whom is it a problem? Compiler designer

Application Program → Compiler → Executable → MPU   (HARD)

Page 5:

5

For whom is it a problem? Application programmer

Application Program → Compiler → Executable → MPU

Page 6:

6

Complex machine consequences

• Programmer needs to be highly skilled

• Programming is error-prone

These consequences imply . . .

Increased parallelism → increased development cost!

Page 7:

7

Amdahl’s Law

The speedup of a program is bounded by its inherently sequential part.

(http://en.wikipedia.org/wiki/Amdahl's_law)

If
– A program needs 20 hours using a CPU
– 1 hour of it cannot be parallelized

Then
– Minimum execution time ≥ 1 hour
– Maximum speedup ≤ 20
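In formula form (a standard statement of Amdahl's law, added here for reference; the slide only gives the numbers): if a fraction p of the running time can be parallelized across N processors, then

    speedup(N) = 1 / ((1 - p) + p/N)  <  1 / (1 - p)

In the example above p = 19/20, so speedup(N) < 20 no matter how many processors are used.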

Page 8:

8

[Figure: Amdahl's law speedup curves (http://en.wikipedia.org/wiki/Amdahl's_law)]

Page 9:

9

Parallelization opportunities

Scalable parallelism resides in 2 sequential program constructs:

• Divide-and-conquer recursion

• Iterative statements (for)

Page 10:

10

2 schools of thought

• Create a general solution

(Address everything somewhat well)

• Create a specific solution

(Address one thing very well)

Page 11:

11

Focus on iterative statements (for)

float[] x = new float[n];
float[] b = new float[n];
float[][] a = new float[n][n];
. . .
for ( int i = 0; i < n; i++ )
{
    b[i] = 0;
    for ( int j = 0; j < n; j++ )
        b[i] += a[i][j]*x[j];
}
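Each outer-loop iteration writes only b[i], so the rows are independent; that independence is exactly the parallelism to be exposed. A minimal sketch of one way to exploit it in plain Java (parallel streams; an illustration added here, not part of the slides):

import java.util.stream.IntStream;

// Rows are independent: each iteration writes only b[i].
IntStream.range( 0, n ).parallel().forEach( i -> {
    float sum = 0;
    for ( int j = 0; j < n; j++ )
        sum += a[i][j]*x[j];
    b[i] = sum;
} );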

Page 12:

12

Matrix-Vector Product

b = Ax, illustrated with a 3×3 matrix A.


b1 = a11*x1 + a12*x2 + a13*x3

b2 = a21*x1 + a22*x2 + a23*x3

b3 = a31*x1 + a32*x2 + a33*x3

Page 13:

13

[Figure: the 3×3 matrix-vector product drawn as a data-flow picture: the a_ij entries arranged as a grid, the x_j values flowing in, and the b_i results flowing out.]

Page 14:

14

[Figure: the same matrix-vector product data flow, now laid out along SPACE and TIME axes.]

Page 15:

15

[Figure: the matrix-vector product data flow under a different SPACE/TIME layout.]

Page 16:

16

[Figure: a third SPACE/TIME layout of the matrix-vector product data flow.]

Page 17:

17

Matrix Product

C = AB, illustrated with 2×2 matrices.

c11 = a11*b11 + a12*b21

c12 = a11*b12 + a12*b22

c21 = a21*b11 + a22*b21

c22 = a21*b12 + a22*b22
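Written as an iterative statement in the style of the earlier matrix-vector code, this is the familiar triple loop (a sketch added for illustration; it assumes a and b are n×n float arrays, where b here names the second matrix rather than the earlier result vector):

float[][] c = new float[n][n];
for ( int i = 0; i < n; i++ )
    for ( int j = 0; j < n; j++ )
    {
        c[i][j] = 0;
        for ( int k = 0; k < n; k++ )
            c[i][j] += a[i][k]*b[k][j];
    }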

Page 18:

18

[Figure: data-flow picture of the 2×2 matrix product, with the a entries arranged by row, the b entries flowing in by column, and the k direction labeled.]

Page 19:

19

[Figure: the 2×2 matrix product data flow laid out along space (S) and time (T) axes.]

Page 20:

20

[Figure: another space (S) / time (T) layout of the 2×2 matrix product data flow.]

Page 21:

21

Declaring an iterative computation

• Index set

• Data network

• Functions

• Space-time embedding

Page 22:

22

Declaring an Index set

I1: 1 ≤ i ≤ j ≤ n

I2: 1 ≤ i ≤ n, 1 ≤ j ≤ n

[Figure: the two index sets drawn as regions of (i, j) points: I1 is triangular, I2 is square.]
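Read operationally, the two index sets are just loop bounds; a small sketch in the style of the earlier Java code (visit is a hypothetical placeholder for whatever is computed at each index point):

// I1: 1 ≤ i ≤ j ≤ n  (triangular)
for ( int i = 1; i <= n; i++ )
    for ( int j = i; j <= n; j++ )
        visit( i, j );

// I2: 1 ≤ i ≤ n, 1 ≤ j ≤ n  (square)
for ( int i = 1; i <= n; i++ )
    for ( int j = 1; j <= n; j++ )
        visit( i, j );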

Page 23:

23

Declaring a Data network

D1:
  x: [ -1,  0 ];
  b: [  0, -1 ];
  a: [  0,  0 ];

D2:
  x: [ -1,  0 ];
  b: [ -1, -1 ];
  a: [  0, -1 ];

[Figure: the two data networks drawn as a node with incoming x, b, and a edges.]

Page 24:

24

Declaring an Index set + Data network

I1: 1 ≤ i ≤ j ≤ n

D1:
  x: [ -1,  0 ];
  b: [  0, -1 ];
  a: [  0,  0 ];

[Figure: the data network D1 replicated at each point of the index set I1, giving a triangular grid of nodes connected by x, b, and a edges.]

Page 25:

25

Declaring the Functions

F1:
  float x’ (float x) { return x; }
  float b’ (float b, float x, float a) { return b + a*x; }

F2:
  char x’ (char x) { return x; }
  boolean b’ (boolean b, char x, char a) { return b && a == x; }
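One consistent way to read how the pieces compose (an assumed sketch, not the author's notation): treat each offset in D1 as the neighboring index point that supplies a value, and apply F1 at every point of I1. In plain Java, with hypothetical work arrays xv and bv holding the value of x and b at each index point:

for ( int i = 1; i <= n; i++ )
    for ( int j = i; j <= n; j++ )                  // I1: 1 ≤ i ≤ j ≤ n
    {
        xv[i][j] = xv[i-1][j];                      // x: [ -1,  0 ]   (x’ = x)
        bv[i][j] = bv[i][j-1] + a[i][j]*xv[i][j];   // b: [  0, -1 ]   (b’ = b + a*x)
    }

Under this reading, injecting x_j along the i = 0 boundary and setting b to 0 along the j = i - 1 boundary leaves b_i = a_ii*x_i + … + a_in*x_n at j = n, i.e. the upper triangular matrix-vector product named on the next slides.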

Page 26:

26

Declaring a Spacetime embedding

E1:
  – space = -i + j
  – time  =  i + j

E2:
  – space1 = i
  – space2 = j
  – time   = i + j

[Figure: the index points plotted under each embedding: space and time axes for E1; space1, space2, and time axes for E2.]
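A small worked reading of what an embedding does (an assumption for illustration; the slide does not spell it out): the space coordinate picks a virtual processor and the time coordinate picks a step.

// Under E1, index point (i, j) is executed by processor (-i + j) at step (i + j);
// all points with the same i + j therefore run concurrently on different processors.
int space( int i, int j ) { return -i + j; }
int time ( int i, int j ) { return  i + j; }   // e.g. (2, 3) -> processor 1, step 5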

Page 27:

27

Declaring an iterative computation: Upper triangular matrix-vector product

UTMVP = (I1, D1, F1, E1)

[Figure: the UTMVP index points laid out along the time and space axes of E1.]

Page 28:

28

Declaring an iterative computation: Full matrix-vector product

Full MVP = (I2, D1, F1, E1)

[Figure: the full matrix-vector product laid out along the time and space axes of E1.]

Page 29:

29

Declaring an iterative computation: Convolution (polynomial product)

Convolution = (I2, D2, F1, E1)

[Figure: the convolution laid out along the time and space axes of E1.]

Page 30:

30

Declaring an iterative computation: String pattern matching

String pattern matching = (I2, D2, F2, E1)

[Figure: the string pattern matcher laid out along the time and space axes of E1.]

Page 31:

31

Declaring an iterative computation: Pipelined string pattern matching

Pipelined string pattern matching = (I2, D2, F2, E2)

[Figure: the pipelined string pattern matcher laid out along the time, space1, and space2 axes of E2.]

Page 32:

32

Iterative computation specification

A declarative specification:

• Is a 4-dimensional design space (actually 5-dimensional: the space embedding is independent of the time embedding)

• Facilitates reuse of design components.

Page 33:

33

Starting with an existing language …

• Can infer

– Index set

– Data network

– Functions

• Cannot infer

– Space embedding

– Time embedding

Page 34:

34

Spacetime embedding

• Start with it as a program annotation

• More advanced: the compiler optimizes the embedding, guided by a program-annotated figure of merit.

Page 35:

35

Work

• Work out details of the notation
• Implement in Java, C, Matlab, HDL, …
• Map the virtual processor network to the actual processor network
• Map
  – Java: map processors to Threads, [links to Channels]
  – GPU: map processors to GPU processing elements

(Challenge: the spacetime embedding depends on the underlying architecture)
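A minimal sketch of the Java mapping mentioned above, assuming (for illustration only; the class and field names and the one-Thread-per-node simplification are not the author's) that each node of the data network becomes a Runnable run on its own Thread and each link becomes a BlockingQueue:

import java.util.concurrent.BlockingQueue;

// One node of the data network, run on its own Thread: it takes x and b
// from its input links, applies F1 (x' = x, b' = b + a*x), and forwards
// the results on its output links.
class Cell implements Runnable {
    final float a;                         // locally resident value (offset [ 0,  0])
    final BlockingQueue<Float> xIn, xOut;  // links carrying x      (offset [-1,  0])
    final BlockingQueue<Float> bIn, bOut;  // links carrying b      (offset [ 0, -1])

    Cell( float a, BlockingQueue<Float> xIn, BlockingQueue<Float> xOut,
          BlockingQueue<Float> bIn, BlockingQueue<Float> bOut ) {
        this.a = a; this.xIn = xIn; this.xOut = xOut; this.bIn = bIn; this.bOut = bOut;
    }

    public void run() {
        try {
            float x = xIn.take();      // x arrives from the (i-1, j) neighbor
            float b = bIn.take();      // b arrives from the (i, j-1) neighbor
            xOut.put( x );             // x' = x
            bOut.put( b + a*x );       // b' = b + a*x
        } catch ( InterruptedException e ) { Thread.currentThread().interrupt(); }
    }
}

Wiring the queues between neighboring cells according to D1, starting each cell with new Thread(cell).start(), and feeding x_j and b = 0 at the boundaries reproduces the UTMVP computation; a practical mapping would fold many index points onto each real processor, which is exactly what the spacetime embedding specifies.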

Page 36:

36

Work …

• The output of one iterative computation is the input to another.

• Develop a notation for specifying composite iterative computations?

Page 37:

37

Thanks for listening!

Questions?