Design Issues
Jaruloj Chongstitvatana, Parallel Programming: Parallelization

TRANSCRIPT

Page 1: Design Issues

Page 2: How to parallelize

- Task decomposition
- Data decomposition
- Dataflow decomposition


Page 3: Task Decomposition

Page 4: Task decomposition

Identify tasks: decompose the serial code into parts that can be parallelized. These parts need to be completely independent.

Dependency: an interaction between tasks.

Sequential consistency property: the parallel code gives the same result, for the same input, as the serial code does.

Test for a parallelizable loop: if running the loop in reverse order gives the same result as the original loop, it is possibly parallelizable (see the sketch below).

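A minimal sketch of the reverse-order test in C (both loops are my illustrations, not from the slides): the first loop passes the test because its iterations are independent; the second fails it because each iteration reads the result of the previous one.

/* Illustrative only; N, a[] and b[] are assumed to be set up elsewhere. */
void reverse_order_test(int N, int a[], const int b[]) {
    /* Independent iterations: running this loop in reverse order leaves
       a[] unchanged, so it is a candidate for parallelization. */
    for (int i = 0; i < N; i++)
        a[i] = b[i] * 2;

    /* Dependent iterations: reversing the order changes a[], because each
       iteration reads a[i-1] written by the previous one. Not
       parallelizable as written. */
    for (int i = 1; i < N; i++)
        a[i] = a[i-1] + b[i];
}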

Page 5: Design Consideration

- What are the tasks, and how are they defined?
- What are the dependencies between tasks, and how can they be satisfied?
- How are tasks assigned to threads/processors?


Page 6: Task Definition

Tasks are mostly related to activities, as in GUI applications. Example: a multimedia web application that must

- Play background music.
- Display animation.
- Read input from the user.

Which part of the code should be parallelized? The hotspot: the part that is executed most often. (A sketch of the activity tasks follows below.)

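A hypothetical sketch of the multimedia example using POSIX threads; the task bodies are placeholders of mine, not from the slides. Each independent activity becomes one thread.

#include <pthread.h>

/* Placeholder task bodies: each long-running activity is one task. */
void *play_music(void *arg)     { /* decode and play audio */  return NULL; }
void *show_animation(void *arg) { /* render frames */          return NULL; }
void *read_input(void *arg)     { /* handle user events */     return NULL; }

int main(void) {
    pthread_t t[3];
    pthread_create(&t[0], NULL, play_music, NULL);
    pthread_create(&t[1], NULL, show_animation, NULL);
    pthread_create(&t[2], NULL, read_input, NULL);
    for (int i = 0; i < 3; i++)
        pthread_join(t[i], NULL);
    return 0;
}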

Page 7: Criteria for decomposition

- More tasks than threads (or cores). Why? So that a free thread always has more work to pick up (see task scheduling below).
- Granularity (fine- or coarse-grained decomposition): the amount of computation in a task, or the time between synchronizations.
- Tasks should be big enough compared with the overhead of handling tasks and threads. Overhead includes thread management, synchronization, etc.


Page 8: Fine-grained vs. coarse-grained

[Figure: comparison of fine-grained and coarse-grained decomposition]


Page 9: Dependency between tasks

Order dependency: tasks must execute in a certain order. Can be enforced by:

- Putting dependent tasks in the same thread.
- Adding synchronization.

Data dependency: variables are shared between tasks. Can be handled with:

- Shared and private variables.
- Locks and critical regions.


[Diagram: two dependency graphs over tasks A, B, C, D]

The original serial code, where every iteration updates the shared variable sum:

sum = 0;
for (i = 0; i < m; i++)
    sum = sum + a[i];
for (i = 0; i < n; i++)
    sum = sum + b[i];

Decomposed into two independent partial sums (suma and sumb have no dependency between them):

sum = 0;
suma = 0;
for (i = 0; i < m; i++)
    suma = suma + a[i];
sumb = 0;
for (j = 0; j < n; j++)
    sumb = sumb + b[j];
sum = suma + sumb;
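With the partial sums made independent, the two loops can run as parallel tasks; a minimal sketch using OpenMP sections (my addition, not shown on the slide):

#include <omp.h>

double sum_two_arrays(const double a[], int m, const double b[], int n) {
    double suma = 0, sumb = 0;
    /* Each section is an independent task and may run on its own thread. */
    #pragma omp parallel sections
    {
        #pragma omp section
        {
            for (int i = 0; i < m; i++) suma += a[i];
        }
        #pragma omp section
        {
            for (int j = 0; j < n; j++) sumb += b[j];
        }
    }
    return suma + sumb;   /* combine the partial results */
}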

Page 10: Task scheduling

Static scheduling

- Simple.
- Works well if the amount of work can be estimated before execution.

Dynamic scheduling

- Divide the work into more tasks than processing elements.
- Assign a task to a processing element whenever one becomes free (see the sketch below).

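For loop-based work, OpenMP exposes both policies through the schedule clause; a minimal sketch (my addition; the chunk size 16 is an arbitrary choice):

#include <omp.h>

void scale(double *x, int n) {
    /* Static: iterations are divided among threads up front. */
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < n; i++)
        x[i] = x[i] * 2.0;

    /* Dynamic: each thread grabs the next chunk of 16 iterations
       whenever it becomes free; better for uneven work. */
    #pragma omp parallel for schedule(dynamic, 16)
    for (int i = 0; i < n; i++)
        x[i] = x[i] / 3.0;
}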

Page 11: Data Decomposition

Page 12: Data decomposition

Divide the data into chunks; each task works on a chunk.

Considerations:

- How to divide the data (a chunking sketch follows below).
- Make sure each task has access to the data it requires.
- Where does each chunk go?

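A minimal sketch of dividing an array roughly equally among T tasks; the ceiling-division chunking and the process_chunk helper are my illustrations, not from the slides.

/* Hypothetical per-chunk worker, assumed to be defined elsewhere. */
void process_chunk(const double a[], int lo, int hi);

void decompose(const double a[], int n, int T) {
    int chunk = (n + T - 1) / T;               /* ceiling division */
    for (int t = 0; t < T; t++) {
        int lo = t * chunk;
        int hi = (lo + chunk < n) ? lo + chunk : n;
        if (lo < hi)
            process_chunk(a, lo, hi);          /* one task per chunk;
                                                  run these in parallel */
    }
}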

Page 13: How to divide data

- Roughly equally, except when the computation is not the same for all data.
- Shape of the chunks: the number of neighboring chunks determines the amount of data exchanged.


Page 14: Data access for each task

Make a local copy of the data for each task. Data duplication:

- Wastes memory.
- Requires synchronization to keep the copies consistent; no synchronization is needed if the data are read-only.
- Is not worthwhile if the data are used only a few times.

No duplication is needed in the shared-memory model.


Page 15: Assign chunks to threads/cores

Static scheduling

- In the distributed-memory model, shared data must be considered in order to reduce synchronization.

Dynamic scheduling

- Use when the amount of work is not known ahead of time.


Page 16: Example

/* Computes one generation of a Game-of-Life-style grid update. */
void computeNextGen(Grid curr, Grid next, int N, int M) {
    int count;
    for (int i = 1; i <= N; i++) {
        for (int j = 1; j <= M; j++) {
            count = 0;
            if (curr[i-1][j-1] == ALIVE) count++;
            if (curr[i-1][j]   == ALIVE) count++;
            if (curr[i-1][j+1] == ALIVE) count++;  /* the slide elides the   */
            if (curr[i][j-1]   == ALIVE) count++;  /* middle neighbor checks; */
            if (curr[i][j+1]   == ALIVE) count++;  /* all 8 follow the same   */
            if (curr[i+1][j-1] == ALIVE) count++;  /* pattern                 */
            if (curr[i+1][j]   == ALIVE) count++;
            if (curr[i+1][j+1] == ALIVE) count++;
            if (count <= 1 || count >= 4)
                next[i][j] = DEAD;
            else if (curr[i][j] == ALIVE && (count == 2 || count == 3))
                next[i][j] = ALIVE;
            else if (curr[i][j] == DEAD && count == 3)
                next[i][j] = ALIVE;
            else
                next[i][j] = DEAD;
        }
    }
    return;
}

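Here the data decomposition falls out naturally: curr is only read, and every (i, j) cell of next is written exactly once, so the rows of the grid can be divided among threads. A minimal OpenMP sketch (my addition, not on the slide) is to place this directly before the outer loop of computeNextGen:

/* Each thread updates a disjoint block of rows of next, while curr is
   shared read-only. count must be private, or declared inside the loop. */
#pragma omp parallel for private(count)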

Page 17: Dataflow decomposition

Break up the problem based on how data flows between tasks.

Example: the producer/consumer pattern (sketched below).

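A minimal producer/consumer sketch using POSIX threads; the bounded buffer, its size, and the item count are my assumptions, not from the slides. The producer's output flows through the buffer to the consumer.

#include <pthread.h>
#include <stdio.h>

#define BUF_SIZE 8
static int buf[BUF_SIZE];
static int count = 0, in_pos = 0, out_pos = 0;
static pthread_mutex_t lock      = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  not_full  = PTHREAD_COND_INITIALIZER;
static pthread_cond_t  not_empty = PTHREAD_COND_INITIALIZER;

static void *producer(void *arg) {
    (void)arg;
    for (int i = 0; i < 100; i++) {
        pthread_mutex_lock(&lock);
        while (count == BUF_SIZE)              /* wait until there is space */
            pthread_cond_wait(&not_full, &lock);
        buf[in_pos] = i;
        in_pos = (in_pos + 1) % BUF_SIZE;
        count++;
        pthread_cond_signal(&not_empty);
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

static void *consumer(void *arg) {
    (void)arg;
    for (int i = 0; i < 100; i++) {
        pthread_mutex_lock(&lock);
        while (count == 0)                     /* wait until there is data */
            pthread_cond_wait(&not_empty, &lock);
        int item = buf[out_pos];
        out_pos = (out_pos + 1) % BUF_SIZE;
        count--;
        pthread_cond_signal(&not_full);
        pthread_mutex_unlock(&lock);
        printf("consumed %d\n", item);
    }
    return NULL;
}

int main(void) {
    pthread_t p, c;
    pthread_create(&p, NULL, producer, NULL);
    pthread_create(&c, NULL, consumer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return 0;
}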

Page 18: What not to parallelize

- Algorithms with state. Example: finite state machine simulation.
- Recurrence relations. Examples: a convergence loop, calculating Fibonacci numbers.
- Induction variables: variables incremented exactly once in each iteration of a loop.
- Reduction: computing a single result from a collection of data, e.g. a sum.
- Loop-carried dependence: results of a previous iteration are used in the current iteration.


Page 19: Algorithms with state

Options for handling state:

- Add some form of synchronization; this serializes all concurrent executions.
- Write the code to be reentrant (i.e., it can be re-entered without detrimental side effects while it is already running). This may not be possible if updating global variables is part of the code.
- Use thread-local storage if the variable(s) holding the state do not have to be shared between threads (see the sketch below).

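A classic illustration of hidden state (my example, not from the slide): the C library's rand() keeps global state across calls, while POSIX rand_r() is reentrant because the caller supplies the state, which can live in thread-local storage.

#include <stdlib.h>

/* Not thread-safe: rand() updates hidden global state on every call. */
int draw_shared(void) {
    return rand();
}

/* Reentrant: the state lives in a caller-supplied seed. Making the seed
   thread-local (C11 _Thread_local) removes sharing entirely. */
static _Thread_local unsigned int seed = 12345u;

int draw_private(void) {
    return rand_r(&seed);
}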

Page 20: Recurrence Relations

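For illustration (this example is mine, not from the deck), a typical recurrence loop: each iteration needs the value produced by the previous one, so the iterations cannot simply be divided among threads.

/* x[k] depends on x[k-1]: the iterations form a serial chain. */
for (int k = 1; k < N; k++)
    x[k] = a * x[k-1] + b;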

Page 21: Induction Variables

With induction variables i1 and i2, each updated once per iteration, every iteration depends on the previous one:

i1 = 4;
i2 = 0;
for (k = 1; k < N; k++) {
    B[i1++] = function1(k, q, r);
    i2 += k;
    A[i2] = function2(k, r, q);
}

Rewritten with closed-form expressions for the induction variables, which removes the dependence:

for (k = 1; k < N; k++) {
    B[k+3] = function1(k, q, r);      /* i1 == 4 + (k-1) == k + 3 */
    i2 = (k*k + k)/2;                 /* i2 == 1 + 2 + ... + k    */
    A[i2] = function2(k, r, q);
}


Page 22: Reduction

Combine a collection of data into a single scalar value.

To remove the dependency, the combining operation must be associative and commutative.

sum = 0;
big = c[0];
for (i = 0; i < N; i++) {
    sum += c[i];
    big = (c[i] > big ? c[i] : big);
}

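Both reductions in the loop above map directly onto OpenMP's reduction clause; a minimal sketch (my addition; the max operator requires OpenMP 3.1 or later):

#include <omp.h>

void reduce(const double c[], int N, double *sum_out, double *big_out) {
    double sum = 0, big = c[0];
    /* Each thread keeps private partial results, which the runtime
       combines when the loop ends. */
    #pragma omp parallel for reduction(+:sum) reduction(max:big)
    for (int i = 0; i < N; i++) {
        sum += c[i];
        big = (c[i] > big ? c[i] : big);
    }
    *sum_out = sum;
    *big_out = big;
}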

Page 23: Loop-carried Dependence

References to the same array appear on the LHS of one assignment and, with a backward offset, on the RHS of another, so an iteration reads values written by earlier iterations.

This is the general case of recurrence relations and cannot be solved easily.

for (k = 5; k < N; k++) {
    b[k] = DoSomething(k);
    a[k] = b[k-5] + MoreStuff(k);   /* reads the b[] written 5 iterations earlier */
}

Page 24: Example: Loop-carried dependence

With a loop-carried dependence: wrap is written in one iteration and read in the next.

wrap = a[0] * b[0];
for (i = 1; i < N; i++) {
    c[i] = wrap;                 /* reads the wrap from the previous iteration */
    wrap = a[i] * b[i];
    d[i] = 2 * wrap;
}

Dependence removed: each iteration recomputes the value it needs from the previous array elements, so the iterations become independent (wrap should then be private to each iteration or thread).

for (i = 1; i < N; i++) {
    wrap = a[i-1] * b[i-1];      /* recomputed locally instead of carried over */
    c[i] = wrap;
    wrap = a[i] * b[i];
    d[i] = 2 * wrap;
}
