
Design Issues

How to parallelize

Task decomposition
Data decomposition
Dataflow decomposition


Task Decomposition

Task decomposition: identify tasks by decomposing the serial code into parts that can be run in parallel. Ideally, these parts are completely independent.
Dependency: an interaction between tasks.

Sequential consistency property
The parallel code gives the same result, for the same input, as the serial code does.

Test for a parallelizable loop
If running the loop in reverse order gives the same result as the original loop, the loop is possibly parallelizable.
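As a quick illustration of the reverse-order test (my example, not from the slides), consider these two loops. The first gives the same result whether i runs forward or backward, so it is a candidate for parallelization; the second does not, because iteration i reads the value written in iteration i-1.

#include <stdio.h>

#define N 8

int main(void) {
    int a[N] = {1, 2, 3, 4, 5, 6, 7, 8};
    int b[N];

    /* Independent iterations: running i from N-1 down to 0 gives the
       same b, so this loop passes the reverse-order test. */
    for (int i = 0; i < N; i++)
        b[i] = 2 * a[i];

    /* Each iteration reads a[i-1] written by the previous iteration:
       reversing the order changes the result, so it fails the test. */
    for (int i = 1; i < N; i++)
        a[i] = a[i] + a[i - 1];

    printf("b[N-1]=%d a[N-1]=%d\n", b[N - 1], a[N - 1]);
    return 0;
}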


Design Consideration

What are the tasks, and how are they defined?
What are the dependencies between tasks, and how can they be satisfied?
How should tasks be assigned to threads/processors?


Task Definition

Tasks are mostly related to activities, as in GUI applications.
Example: a multimedia web application that must simultaneously:
  Play background music.
  Display animation.
  Read input from the user.

Which part of the code should be parallelized?
Hotspot: the part of the code that is executed most often.
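A minimal sketch of this activity-based task decomposition, assuming POSIX threads; the three function names are placeholders for the activities listed above, not code from the slides.

#include <pthread.h>

void *play_music(void *arg)      { /* decode and play audio   */ return NULL; }
void *show_animation(void *arg)  { /* render animation frames */ return NULL; }
void *read_user_input(void *arg) { /* poll for input events   */ return NULL; }

int main(void) {
    pthread_t t[3];
    /* One thread per activity: the tasks are largely independent. */
    pthread_create(&t[0], NULL, play_music, NULL);
    pthread_create(&t[1], NULL, show_animation, NULL);
    pthread_create(&t[2], NULL, read_user_input, NULL);
    for (int i = 0; i < 3; i++)
        pthread_join(t[i], NULL);
    return 0;
}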


Criteria for decomposition

Create more tasks than threads (or cores). Why? So that a thread that finishes early can pick up remaining work.
Granularity (fine- vs. coarse-grained decomposition): the amount of computation in a task, or the time between synchronizations.
Tasks should be big enough compared to the overhead of handling tasks and threads. Overhead includes thread management, synchronization, etc.


[Diagram: fine-grained vs. coarse-grained decomposition]


Dependency between tasks

Order dependency: the tasks must execute in a certain order. It can be enforced by:
  Putting dependent tasks in the same thread.
  Adding synchronization.
Data dependency: variables are shared between tasks. It can be handled with:
  Shared and private variables.
  Locks and critical regions.


[Diagram: two task-dependency graphs over tasks A, B, C, D]

Serial code, in which every iteration updates the shared variable sum:

sum = 0;
for (i = 0; i < m; i++)
    sum = sum + a[i];
for (i = 0; i < n; i++)
    sum = sum + b[i];

The same computation with the dependency between the two loops removed, using private partial sums:

sum = 0;
suma = 0;
for (i = 0; i < m; i++)
    suma = suma + a[i];
sumb = 0;
for (j = 0; j < n; j++)
    sumb = sumb + b[j];
sum = suma + sumb;
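A sketch of running the two partial sums as parallel tasks, assuming OpenMP (which the slides do not mandate); each section updates only its own partial sum, so the tasks interact only at the final addition.

#include <omp.h>

double parallel_sum(const double *a, int m, const double *b, int n) {
    double suma = 0.0, sumb = 0.0;
    #pragma omp parallel sections
    {
        #pragma omp section
        for (int i = 0; i < m; i++) suma += a[i];   /* task 1 */
        #pragma omp section
        for (int j = 0; j < n; j++) sumb += b[j];   /* task 2 */
    }
    return suma + sumb;   /* the only point where the two tasks meet */
}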

Task scheduling

Static scheduling: simple; works well if the amount of work can be estimated before execution.
Dynamic scheduling: divide the work into more tasks than processing elements, and assign a task to a processing element whenever it becomes free.
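As a hedged illustration, OpenMP's loop schedules map directly onto these two policies; work() is a placeholder function, not part of the slides.

void work(int i);   /* hypothetical per-iteration job */

void run(int n) {
    /* Static: iterations are divided among threads before the loop runs;
       good when every iteration costs about the same. */
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < n; i++) work(i);

    /* Dynamic: chunks of 16 iterations are handed to whichever thread
       is free; good when the cost per iteration is hard to predict. */
    #pragma omp parallel for schedule(dynamic, 16)
    for (int i = 0; i < n; i++) work(i);
}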


Data Decomposition


Data decomposition

Divide the data into chunks; each task works on one chunk.
Considerations:
  How to divide the data.
  How to make sure each task has access to the data it requires.
  Where each chunk goes.


How to divide data

Divide roughly equally, except when the computation is not the same for all data.
Shape of the chunks: the number of neighboring chunks determines the amount of data exchanged.
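One common way to divide n items into roughly equal chunks (a sketch with names of my choosing, not from the slides): the first n mod nchunks chunks each take one extra item.

/* Compute the half-open range [lo, hi) of items owned by chunk c. */
void chunk_bounds(int n, int nchunks, int c, int *lo, int *hi) {
    int base = n / nchunks;
    int rem  = n % nchunks;          /* leftover items */
    *lo = c * base + (c < rem ? c : rem);
    *hi = *lo + base + (c < rem ? 1 : 0);
}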


Data access for each task

Make a local copy of the data for each task.
Data duplication:
  Wastes memory.
  Requires synchronization for data consistency; no synchronization is needed if the data are read-only.
  Not worthwhile if the data are used only a few times.
No duplication is needed in the shared-memory model.


Assign chunks to threads/cores

Static scheduling: in the distributed-memory model, shared data need to be considered in order to reduce synchronization.
Dynamic scheduling: used when the amount of work per chunk is not known ahead of time.


Example: computing the next generation of a grid (Conway's Game of Life)

void computeNextGen(Grid curr, Grid next, int N, int M) {
    int count;
    for (int i = 1; i <= N; i++) {
        for (int j = 1; j <= M; j++) {
            count = 0;
            if (curr[i-1][j-1] == ALIVE) count++;
            if (curr[i-1][j] == ALIVE) count++;
            /* … same test for the remaining neighbors … */
            if (curr[i+1][j+1] == ALIVE) count++;
            if (count <= 1 || count >= 4)
                next[i][j] = DEAD;
            else if (curr[i][j] == ALIVE && (count == 2 || count == 3))
                next[i][j] = ALIVE;
            else if (curr[i][j] == DEAD && count == 3)
                next[i][j] = ALIVE;
            else
                next[i][j] = DEAD;
        }
    }
    return;
}
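One possible data decomposition of this loop nest (a sketch, assuming OpenMP): the rows of the grid are the chunks. Every iteration reads only curr and writes a distinct cell of next, so no synchronization is needed inside the loop; note that count must be declared inside the loop so each thread gets its own copy.

void computeNextGenParallel(Grid curr, Grid next, int N, int M) {
    #pragma omp parallel for
    for (int i = 1; i <= N; i++) {        /* each thread takes a band of rows */
        for (int j = 1; j <= M; j++) {
            int count = 0;                /* private to the iteration */
            /* … same neighbor counting and update rules as above … */
        }
    }
}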


Dataflow decomposition

Break up the problem based on how the data flows between tasks.
Example: the producer/consumer pattern.
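A minimal producer/consumer sketch (assuming POSIX threads; none of this code is from the slides): the producer's output flows through a bounded queue to the consumer, which is exactly the dataflow split described above.

#include <pthread.h>
#include <stdio.h>

#define QSIZE 8
#define ITEMS 100

static int q[QSIZE];
static int head = 0, tail = 0, count = 0;
static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t not_full  = PTHREAD_COND_INITIALIZER;
static pthread_cond_t not_empty = PTHREAD_COND_INITIALIZER;

/* Producer task: generates items and pushes them into the queue. */
void *producer(void *arg) {
    for (int i = 0; i < ITEMS; i++) {
        pthread_mutex_lock(&m);
        while (count == QSIZE)                  /* queue full: wait */
            pthread_cond_wait(&not_full, &m);
        q[tail] = i;
        tail = (tail + 1) % QSIZE;
        count++;
        pthread_cond_signal(&not_empty);
        pthread_mutex_unlock(&m);
    }
    return NULL;
}

/* Consumer task: pops items and processes them. */
void *consumer(void *arg) {
    for (int i = 0; i < ITEMS; i++) {
        pthread_mutex_lock(&m);
        while (count == 0)                      /* queue empty: wait */
            pthread_cond_wait(&not_empty, &m);
        int item = q[head];
        head = (head + 1) % QSIZE;
        count--;
        pthread_cond_signal(&not_full);
        pthread_mutex_unlock(&m);
        printf("consumed %d\n", item);          /* process the item */
    }
    return NULL;
}

int main(void) {
    pthread_t p, c;
    pthread_create(&p, NULL, producer, NULL);
    pthread_create(&c, NULL, consumer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return 0;
}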


What not to parallelize

Algorithms with state. Example: finite state machine simulation.
Recurrence relations. Examples: convergence loops, computing Fibonacci numbers.
Induction variables: variables incremented once in each iteration of a loop.
Reduction: computing a single value from a collection of data, e.g. a sum.
Loop-carried dependence: results of a previous iteration are used in the current iteration.


Algorithms with state

Options for handling state:
  Add some form of synchronization to serialize all concurrent executions.
  Write the code to be reentrant (i.e., it can be reentered without detrimental side effects while it is already running); this may not be possible if updating global variables is part of the code.
  Use thread-local storage if the variables holding the state do not have to be shared between threads.
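A sketch of the thread-local-storage option using C11's _Thread_local; the state variable here is a stand-in for whatever per-thread state the algorithm keeps, not code from the slides.

/* Each thread gets its own copy of state, so no locking is needed as
   long as the state does not have to be shared between threads. */
static _Thread_local int state = 0;

int step(int input) {
    state = state * 31 + input;   /* update this thread's private state */
    return state;
}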


Recurrence Relations
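A typical recurrence (an illustrative example of my own, not from the slides): each iteration consumes the result of the previous one, so the iterations cannot simply be divided among threads.

x[0] = seed;
for (i = 1; i < N; i++)
    x[i] = a * x[i - 1] + b;   /* needs x[i-1] before x[i] can start */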


Induction Variables

A loop with induction variables i1 and i2, each updated once per iteration:

i1 = 4;
i2 = 0;
for (k = 1; k < N; k++) {
    B[i1++] = function1(k, q, r);
    i2 += k;
    A[i2] = function2(k, r, q);
}

The same loop with each induction variable replaced by a closed-form expression in k, so the iterations no longer depend on one another:

for (k = 1; k < N; k++) {
    B[k + 3] = function1(k, q, r);
    A[(k*k + k) / 2] = function2(k, r, q);
}
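Once the induction variables are gone, each iteration depends only on k, so the loop can be parallelized; a sketch assuming OpenMP:

#pragma omp parallel for
for (int k = 1; k < N; k++) {
    B[k + 3] = function1(k, q, r);          /* index computed from k alone */
    A[(k*k + k) / 2] = function2(k, r, q);  /* likewise, no carried state  */
}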


Reduction

Combine a collection of data into a single scalar value, e.g. a sum or a maximum.
To remove the dependency, the combining operation must be associative and commutative.

sum = 0;
big = c[0];
for (i = 0; i < N; i++) {
    sum += c[i];
    big = (c[i] > big ? c[i] : big);
}
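Because + and max are associative, the dependency can be removed; a sketch using OpenMP's reduction clause (my example, not the slides'):

double sum = 0.0;
double big = c[0];
#pragma omp parallel for reduction(+:sum) reduction(max:big)
for (int i = 0; i < N; i++) {
    sum += c[i];                          /* each thread keeps a private sum */
    big = (c[i] > big ? c[i] : big);      /* ...and a private maximum */
}
/* OpenMP combines the private copies into sum and big at the end. */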


Loop-carried Dependence

References to the same array appear on both the left- and right-hand sides of assignments, with a backward reference in some right-hand-side use of the array.
This is the general case of a recurrence relation, and it cannot be removed easily:

for (k = 5; k < N; k++) {
    b[k] = DoSomething(k);
    a[k] = b[k-5] + MoreStuff(k);
}

Example: a loop-carried dependence through the variable wrap:

wrap = a[0] * b[0];
for (i = 1; i < N; i++) {
    c[i] = wrap;
    wrap = a[i] * b[i];
    d[i] = 2 * wrap;
}

The dependence is removed by recomputing wrap from the current index, making each iteration self-contained:

for (i = 1; i < N; i++) {
    wrap = a[i-1] * b[i-1];
    c[i] = wrap;
    wrap = a[i] * b[i];
    d[i] = 2 * wrap;
}
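With wrap recomputed from the loop index, the rewritten loop parallelizes directly (a sketch assuming OpenMP; wrap must be private so each thread has its own copy):

#pragma omp parallel for private(wrap)
for (i = 1; i < N; i++) {
    wrap = a[i-1] * b[i-1];   /* recomputed locally, no carried value */
    c[i] = wrap;
    wrap = a[i] * b[i];
    d[i] = 2 * wrap;
}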
