

Parallel Programming Methodology

Parallel and Distributed Computing

Department of Computer Science and Engineering (DEI), Instituto Superior Técnico

October 4, 2012

CPD (DEI / IST) Parallel and Distributed Computing – 6 2012-10-04 1 / 26


Outline

Parallel programming

Dependency graphs

Overheads

influence on programming of shared- vs distributed-memory systems

Foster’s design methodology


Parallel Programming

Steps:

Identify work that can be done in parallel

Partition work and perhaps data among tasks

Manage data access, communication and synchronization


Dependency Graphs

Programs can be modeled as directed graphs:

Nodes: at the finest granularity level, individual instructions

⇒ to reduce complexity, a node may be an arbitrary sequence of statements

Edges: data dependency constraints among the instructions in the nodes

Data Dependency Graphs


Dependency Graphs

read(A, B);
x = initX(A, B);
y = initY(A, B);
z = initZ(A, B);

for (i = 0; i < N_ENTRIES; i++)
    x[i] = compX(y[i], z[i]);

for (i = 1; i < N_ENTRIES; i++) {
    x[i] = solveX(x[i-1]);
    z[i] = x[i] + y[i];
}

finalize1(&x, &y, &z);
finalize2(&x, &y, &z);
finalize3(&x, &y, &z);

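To make the dependency structure concrete, here is a compilable sketch of the example above. The bodies of compX and solveX (and the value of N_ENTRIES) are hypothetical stand-ins, since the slide leaves them unspecified; the comments mark which loop admits parallelism.

```c
#include <assert.h>

#define N_ENTRIES 4

static int compX(int y, int z) { return y + z; }    /* hypothetical body */
static int solveX(int prev)    { return prev * 2; } /* hypothetical body */

void run(int *x, int *y, int *z) {
    /* Loop 1: each iteration touches only index i, so there is no
     * loop-carried dependency; iterations may run in parallel. */
    for (int i = 0; i < N_ENTRIES; i++)
        x[i] = compX(y[i], z[i]);

    /* Loop 2: x[i] depends on x[i-1], a loop-carried dependency chain,
     * so these iterations must execute sequentially. z[i] only uses
     * values produced in the same iteration. */
    for (int i = 1; i < N_ENTRIES; i++) {
        x[i] = solveX(x[i-1]);
        z[i] = x[i] + y[i];
    }
}
```

Drawing the dependency graph of `run` makes loop 1 a set of independent nodes and loop 2 a chain, which is exactly the distinction the graph model is meant to expose.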


Types of Parallelism

[Diagram: three dependency-graph shapes illustrating the types of parallelism — data parallelism (the same task applied to independent data items), functional parallelism (different independent tasks on the same data), and pipeline parallelism (a stream of items flowing through successive stages).]
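Two of the three shapes can be hinted at in plain C — a minimal sketch in which the stage functions B and C are hypothetical stand-ins for the tasks in the figure:

```c
#include <assert.h>

static int B(int v) { return v + 1; }  /* hypothetical task/stage */
static int C(int v) { return v * 2; }  /* hypothetical task/stage */

/* Data parallelism: the same task B applied to independent data items;
 * the iterations could run concurrently. */
void data_par(int *a, int n) {
    for (int i = 0; i < n; i++)
        a[i] = B(a[i]);
}

/* Pipeline parallelism: each item flows through stages B then C; with
 * several items in flight, different stages run concurrently. */
int pipeline(int v) { return C(B(v)); }
```

Functional parallelism would correspond to calling B and C on the same input concurrently, since neither depends on the other's result.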


Overheads

Task creation/finish

Data transfer

Communication (synchronization)

Load balancing


Shared vs Distributed Memory Systems

Overheads very different depending on type of architecture!

              Start/Finish   Data   Load   Comm
Shared             H           H      =      N
Distributed        N           N      =      H


Shared vs Distributed Memory Systems

Tasks SM: more dynamic creation of tasks, hence these can be more fine-grained.
DM: typically all tasks stay active until the end, hence more coarse-grained tasks are required.

Data SM: data partitioning is not an issue when defining tasks; however, take care when accessing shared data: avoid races using mutually exclusive regions.
DM: data partitioning is critical for the performance of the application.

In both SM and DM:

minimize synchronization points
be careful about load balancing


Shared Memory Systems

Typical diagram of a parallel application under shared memory:

[Diagram: fork/join parallelism — over time, the master thread repeatedly forks a team of other threads and joins them when their work completes.]
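The diagram above can be sketched with POSIX threads — a minimal fork/join example in which the thread count, array size, and the worker's job are all hypothetical choices for illustration:

```c
#include <assert.h>
#include <pthread.h>

#define NTHREADS 4
#define N 16

static int data[N];

static void *worker(void *arg) {
    long id = (long)arg;
    long chunk = N / NTHREADS;
    /* Each thread writes a disjoint slice, so no synchronization is
     * needed inside the parallel region. */
    for (long i = id * chunk; i < (id + 1) * chunk; i++)
        data[i] = (int)i * 2;
    return NULL;
}

int fork_join(void) {
    pthread_t t[NTHREADS];
    for (long i = 0; i < NTHREADS; i++)          /* fork */
        pthread_create(&t[i], NULL, worker, (void *)i);
    for (long i = 0; i < NTHREADS; i++)          /* join: wait for all */
        pthread_join(t[i], NULL);
    int sum = 0;                                 /* master resumes alone */
    for (int i = 0; i < N; i++)
        sum += data[i];
    return sum;
}
```

Directive-based systems such as OpenMP (next class) generate essentially this fork/join pattern from annotations on the sequential code.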


Shared Memory Systems

Application is typically a single program, with directives to handle parallelism:

fork / join

parallel loops

private vs shared variables

critical sections


Distributed Memory Systems

Cannot use fine granularity!

Each processor gets assigned a (large) task:

static scheduling: all tasks start at the beginning of computation

dynamic scheduling: tasks start as needed

Application is typically also a single program!
⇒ the identification number of each task indicates what its job is.
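This single-program idea can be sketched in plain C. The rank/size parameters mimic what a library such as MPI would supply to each task, but this is an illustration, not MPI code:

```c
#include <assert.h>

typedef struct { long lo, hi; } range_t;

/* Block distribution of n items over p tasks: the task with
 * identification number `id` gets items [lo, hi). The n % p remainder
 * items go one each to the first tasks, keeping the load balanced. */
range_t my_block(long n, int p, int id) {
    long base = n / p, rem = n % p;
    range_t r;
    r.lo = id * base + (id < rem ? id : rem);
    r.hi = r.lo + base + (id < rem ? 1 : 0);
    return r;
}
```

Every task runs the same program; only the id passed to `my_block` differs, and that alone selects each task's share of the work.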


Task / Channel Model

Parallel programming for distributed memory systems uses:

Task / Channel Model

Parallel computation is represented as a set of tasks that may interact with each other by sending messages through channels.

Task: program + local memory + I/O ports

Channel: message queue that connects one task’s output port with another task’s input port

All tasks start simultaneously, and the finishing time is determined by the time the last task stops its execution.

CPD (DEI / IST) Parallel and Distributed Computing – 6 2012-10-04 13 / 26

Page 18: Parallel and Distributed Computing - ULisboa · Outline Parallel programming Dependency graphs Overheads in uence on programming of shared- vs distributed-memory systems Foster’s

Messages in the Task / Channel Model

ordering of data in the channel is maintained

receiving task blocks until a value is available at the receiver

sender never blocks, regardless of whether previously sent messages have been delivered yet

In the task / channel model

receiving is a synchronous operation

sending is an asynchronous operation
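These semantics can be sketched as a small thread-safe queue in C. A fixed capacity CAP stands in for the model's unbounded channel, which is a simplification; the names are hypothetical:

```c
#include <assert.h>
#include <pthread.h>

#define CAP 64

typedef struct {
    int buf[CAP];
    int head, tail, count;
    pthread_mutex_t mu;
    pthread_cond_t nonempty;
} channel_t;

void chan_init(channel_t *c) {
    c->head = c->tail = c->count = 0;
    pthread_mutex_init(&c->mu, NULL);
    pthread_cond_init(&c->nonempty, NULL);
}

/* Asynchronous send: enqueue and return immediately. */
void chan_send(channel_t *c, int v) {
    pthread_mutex_lock(&c->mu);
    c->buf[c->tail] = v;
    c->tail = (c->tail + 1) % CAP;
    c->count++;
    pthread_cond_signal(&c->nonempty);
    pthread_mutex_unlock(&c->mu);
}

/* Synchronous receive: block until a value is available; the FIFO
 * buffer preserves the channel's message ordering. */
int chan_recv(channel_t *c) {
    pthread_mutex_lock(&c->mu);
    while (c->count == 0)
        pthread_cond_wait(&c->nonempty, &c->mu);
    int v = c->buf[c->head];
    c->head = (c->head + 1) % CAP;
    c->count--;
    pthread_mutex_unlock(&c->mu);
    return v;
}
```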


Foster’s Design Methodology

Development of scalable parallel algorithms by delaying machine-dependent decisions to later stages.

Four steps:

partitioning

communication

agglomeration

mapping


Foster’s Design Methodology

[Diagram: starting from the problem, partitioning produces primitive tasks; communication links them with channels; agglomeration groups them into larger tasks; mapping assigns the groups to processors.]


Foster’s Design Methodology: Partitioning

Partitioning

Process of dividing the computation and the data into many small primitive tasks.

Strategies: (no single universal recipe...)

data decomposition

functional decomposition

recursive decomposition

Checklist:

more than 10 × P primitive tasks for P processors

minimize redundant computations and redundant data storage

primitive tasks are roughly the same size

number of tasks grows naturally with the problem size


Recursive Decomposition

Suitable for problems solvable using divide-and-conquer

Steps:

decompose a problem into a set of sub-problems

recursively decompose each sub-problem

stop decomposition when minimum desired granularity reached
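These steps can be sketched for a sum reduction. The THRESHOLD granularity is a hypothetical choice; in a parallel version the two recursive calls would become independent tasks:

```c
#include <assert.h>

#define THRESHOLD 4  /* hypothetical minimum desired granularity */

long rec_sum(const int *a, long lo, long hi) {
    /* Stop decomposing once the sub-problem is small enough: solve it
     * sequentially. */
    if (hi - lo <= THRESHOLD) {
        long s = 0;
        for (long i = lo; i < hi; i++)
            s += a[i];
        return s;
    }
    /* Decompose into two sub-problems and recursively decompose each;
     * the two halves are independent of each other. */
    long mid = lo + (hi - lo) / 2;
    return rec_sum(a, lo, mid) + rec_sum(a, mid, hi);
}
```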


Data Decomposition

Appropriate data partitioning is critical to parallel performance

Steps:

identify the data on which computations are performed

partition the data across various tasks

Decomposition can be based on

input data

output data

input + output data

intermediate data


Input Data Decomposition

Applicable if each output is computed as a function of the input

May be the only natural decomposition if output is unknown

finding the minimum in a set, or other reductions
sorting a vector

Associate a task with each input data partition

each task performs the computation on its part of the data
subsequent processing combines the partial results of earlier tasks
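A sequential sketch of input-data decomposition for the minimum example: each partial_min call represents one task's work on its input partition, and the loop that combines them represents the subsequent reduction step (names are hypothetical):

```c
#include <assert.h>
#include <limits.h>

/* One task's work: the minimum of its input partition [lo, hi). */
int partial_min(const int *a, long lo, long hi) {
    int m = INT_MAX;
    for (long i = lo; i < hi; i++)
        if (a[i] < m) m = a[i];
    return m;
}

int min_of(const int *a, long n, int ntasks) {
    long chunk = n / ntasks;
    int m = INT_MAX;
    /* Combine the partial results; the partial_min calls themselves
     * are independent and could run as concurrent tasks. */
    for (int t = 0; t < ntasks; t++) {
        long lo = t * chunk;
        long hi = (t == ntasks - 1) ? n : lo + chunk;
        int pm = partial_min(a, lo, hi);
        if (pm < m) m = pm;
    }
    return m;
}
```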


Output Data Decomposition

Applicable if each element of the output can be computed independently

algorithm is based on one-to-one or many-to-one functions

Partition the output data across tasks

Have each task perform the computation for its outputs

Example:

Matrix-vector multiplication
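A sketch of the matrix-vector example y = A x under output decomposition: each entry y[i] depends only on row i of A and on x, so the output vector is partitioned by rows and each task would call this with its own row range (written here as plain sequential C):

```c
#include <assert.h>

/* Compute this task's slice of the output: rows [row_lo, row_hi) of
 * y = A x, for an n-by-n row-major matrix A. */
void matvec_rows(const double *A, const double *x, double *y,
                 int n, int row_lo, int row_hi) {
    for (int i = row_lo; i < row_hi; i++) {
        double s = 0.0;
        for (int j = 0; j < n; j++)
            s += A[i * n + j] * x[j];
        y[i] = s;   /* y[i] is owned by exactly one task: no races */
    }
}
```

Because each output element is written by exactly one task, no synchronization is needed beyond making the full x available to every task.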


Foster’s Design Methodology: Communication

Communication

Identification of the communication pattern among primitive tasks.

local communication: values shared by a small number of tasks
⇒ draw a channel from the producing task to the consumer tasks

global communication: values are required by a significant number of tasks
⇒ while important, not useful to represent in the task/channel model

Checklist:

communication balanced among tasks

each task communicates with a small number of tasks

tasks can perform their communication concurrently

tasks can perform their computations concurrently


Foster’s Design Methodology: Agglomeration

Agglomeration

Process of grouping primitive tasks into larger tasks.

Strategies:

group tasks that have high communication with each other
group sender tasks, and group receiver tasks
group tasks to allow re-use of sequential code

Checklist:

locality has been maximized
replicated computations take less time than the communications they replace
amount of replicated data is small enough to allow the algorithm to scale
tasks are balanced in terms of computation and communication
number of tasks grows naturally with problem size
number of tasks is small, but at least as great as P
cost of modifications to sequential code is minimized


Foster’s Design Methodology: Mapping

Mapping

Process of assigning tasks to processors.

Strategies:

maximize processor utilization (average % of time processors are active)
⇒ even load distribution

minimize interprocessor communication
⇒ map tasks with channels among them to the same processor
⇒ take into account the network topology


Review

Parallel programming

Dependency graphs

Overheads

influence on programming of shared- vs distributed-memory systems

Foster’s design methodology


Next Class

OpenMP
