cisc 372 5 october

CISC 372 5 October

Goals for today:• Foster’s parallel algorithm design

– Partitioning– Task dependency graph

• Granularity• Concurrency• Collective communication

Task/Channel Model

TaskTaskChannelChannel

Parallel Algorithm Design

• Large problem– May be complex– May have large data set– May be “embarrassingly parallel”

• How “solve” problem? (Foster’s Methodology)– Partition problem & data– Determine communication requirements– Agglomerate tasks into more efficient size– Map tasks to processors

Foster’s Methodology

P ro b lemP artitio ning

C ommunic atio n

Agglomeratio nM ap p ing

Partitioning

• Dividing computation and data into pieces• Domain decomposition

– Divide data into pieces– Determine how to associate computations with the

• Functional decomposition– Divide computation into pieces– Determine how to associate data with the

computations

Partitioning Checklist

• At least 10x more primitive tasks than processors in target computer

• Minimize redundant computations and redundant data storage

• Primitive tasks roughly the same size

• Number of tasks an increasing function of problem size

C ommunic atio n

Communication

• Determine values passed among tasks• Local communication

– Task needs values from a small number of other tasks

– Create channels illustrating data flow

• Global communication– Significant number of tasks contribute data to

perform a computation– Don’t create channels for them early in design

Communication Checklist

• Communication operations balanced among tasks

• Each task communicates with only small group of neighbors

• Tasks can perform communications concurrently

• Task can perform computations concurrently

C ommunic atio n

Agglomeration

• Grouping tasks into larger tasks• Goals

– Improve performance– Maintain scalability of program– Simplify programming

• In MPI programming, goal often to create one agglomerated task per processor

Agglomeration to Improve Performance

• Eliminate communication between primitive tasks agglomerated into consolidated task

• Combine groups of sending and receiving tasks

Agglomeration Checklist

• Locality of parallel algorithm has increased• Replicated computations take less time than

communications they replace• Data replication doesn’t affect scalability• Agglomerated tasks have similar computational

and communications costs• Number of tasks increases with problem size• Number of tasks suitable for likely target systems• Tradeoff between agglomeration and code

modifications costs is reasonable

C ommunic atio n

Mapping

• Process of assigning tasks to processors• Centralized multiprocessor: mapping done by

operating system• Distributed memory system: mapping done by

user• Conflicting goals of mapping

– Maximize processor utilization– Minimize interprocessor communication

Mapping Example

Optimal Mapping

• Finding optimal mapping is NP-hard

• Must rely on heuristics

Mapping Checklist

• Considered designs based on one task per processor and multiple tasks per processor

• Evaluated static and dynamic task allocation• If dynamic task allocation chosen, task

allocator is not a bottleneck to performance• If static task allocation chosen, ratio of tasks

to processors is at least 10:1

Complex Mesh Decomposition

Mesh Cube With

Hollow Sphere inside

Cross-section of Cube

• Finite number of tetrahedra• Each tetrahedron varies in size

Tetra-he-who?

• Tetrahedron: A solid having four triangular faces

Maybe not this. This is a tetrahedron.

Static Task Number,Unstructured Comm,

Mesh Cube Partitioned with

Edge Detection

• Finite number of pixels• All pixels same size• All pixel values constrained (0-255)• Stencil computation

Electromagnetic Fields

• Finite number of points• Laplacian equation (Jacobi method - iterate

until converge on solution)• Convergence time varies at each point

• Rough surface sphere• Radiation source at

center• Measure strength on

surface

Parallel Tree Search

• Unbalanced tree

• Nodes may change from active to inactive at any time

• Searching for specific object in set of all active nodes

Online Sort

• Receive list in separate pieces at different times (constant update)

• Keep list sorted at all times

cisc 372 5 october

scalabilityagglomerated

ratio of tasks

multiple tasks

efficient sizemap tasks

static task number

taskseach task

task allocator

large data setmay

Documents

risc cisc difference

risc cisc

cisc i risc

1 recap. 2 complex instruction set computing (cisc) cisc –...

risc x cisc

cisc design

risc ans cisc

risc and cisc understanding the risc and cisc architectures

cisc vs risc

cisc architecture

cs 372: computational geometry lecture 7 segment treescs...

cisc 2 - caxtoncollege.com

risc vs cisc

arquitecturas risc - cisc

cisc icca the canadian steel conference · 2019-04-30 ·...

cisc processors

presentation risc vs cisc

risc - cisc

spit 372 spare-parts 372 05022008

x86 architecture. intel’s dilemma all (most) software uses...