
COMPE 462 Parallel Computing

Ali Yazıcı – Ziya Karakaya

2010-11 Fall

Atılım University


Parallel Programming

Chapter 1


Why Use Parallel Computing?

Main reasons:

Save time and/or money: In theory, throwing more resources at a task will shorten its time to completion, with potential cost savings. Parallel clusters can be built from cheap, commodity components.

Solve larger problems: Many problems are so large and/or complex that it is impractical or impossible to solve them on a single computer, especially given limited computer memory. Examples: "Grand Challenge" problems (en.wikipedia.org/wiki/Grand_Challenge) requiring petaflops and petabytes of computing resources, and web search engines/databases processing millions of transactions per second.


Demand for Computational Speed

Continual demand for greater computational speed from a computer system than is currently possible

Areas requiring great computational speed include numerical modeling and simulation of scientific and engineering problems.

Computations must be completed within a “reasonable” time period.


Grand Challenge Problems

One that cannot be solved in a reasonable amount of time with today’s computers. Obviously, an execution time of 10 years is always unreasonable.

Examples:
Databases, data mining
Oil exploration
Web search engines, web-based business services
Medical imaging and diagnosis
Pharmaceutical design
Management of national and multi-national corporations
Financial and economic modeling
Advanced graphics and virtual reality, particularly in the entertainment industry
Networked video and multimedia technologies
Collaborative work environments
Modeling large DNA structures
Global weather forecasting
Modeling motion of astronomical bodies


Weather Forecasting

Atmosphere modeled by dividing it into 3-dimensional cells.

Calculations of each cell repeated many times to model passage of time.


Global Weather Forecasting Example

Suppose the whole global atmosphere is divided into cells of size 1 mile × 1 mile × 1 mile to a height of 10 miles (10 cells high) - about 5 × 10^8 cells.

Suppose each calculation requires 200 floating point operations. In one time step, 10^11 floating point operations are necessary.

To forecast the weather over 7 days using 1-minute intervals, a computer operating at 1 Gflops (10^9 floating point operations/s) takes 10^6 seconds, or over 10 days.

To perform the calculation in 5 minutes requires a computer operating at 3.4 Tflops (3.4 × 10^12 floating point operations/s).
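These figures can be checked with a few lines of arithmetic. A minimal sketch in C; the constants are the ones quoted above, and the program itself is illustrative rather than part of the original slides.

#include <stdio.h>

int main(void) {
    double cells = 5e8;            /* ~1-mile cells to a height of 10 miles */
    double flop_per_cell = 200.0;  /* floating point operations per cell per step */
    double steps = 7.0 * 24 * 60;  /* 7 days at 1-minute intervals */

    double per_step = cells * flop_per_cell;  /* ~1e11 ops per time step */
    double total = per_step * steps;          /* ~1e15 ops for the full forecast */

    printf("ops per step: %.1e\n", per_step);
    printf("at 1 Gflops:  %.1f days\n", total / 1e9 / 86400.0);
    printf("for a 5-minute run: %.2e flops needed\n", total / 300.0);
    return 0;
}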


Modeling Motion of Astronomical Bodies

Each body is attracted to each other body by gravitational forces. The movement of each body is predicted by calculating the total force on each body.

With N bodies, N - 1 forces must be calculated for each body, or approximately N^2 calculations in total. (N log2 N for an efficient approximate algorithm.)

After determining the new positions of the bodies, the calculations are repeated.
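The N^2 step described above is just a doubly nested loop over bodies. A minimal sketch in C; the Body type and helper name are illustrative, not from the slides.

#include <math.h>

typedef struct { double x, y, z, mass; } Body;

#define G 6.674e-11   /* gravitational constant */

/* Accumulate the total gravitational force on body i from all other
   bodies: the N - 1 force calculations mentioned above. */
void total_force(const Body *b, int n, int i, double f[3]) {
    f[0] = f[1] = f[2] = 0.0;
    for (int j = 0; j < n; j++) {
        if (j == i) continue;
        double dx = b[j].x - b[i].x;
        double dy = b[j].y - b[i].y;
        double dz = b[j].z - b[i].z;
        double r2 = dx * dx + dy * dy + dz * dz;
        double r = sqrt(r2);
        double mag = G * b[i].mass * b[j].mass / r2;  /* Newtonian attraction */
        f[0] += mag * dx / r;
        f[1] += mag * dy / r;
        f[2] += mag * dz / r;
    }
}

Calling total_force for every body gives the ~N^2 cost; tree-based approximations such as Barnes-Hut give the N log2 N figure quoted above.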


A galaxy might have, say, 10^11 stars.

Even if each calculation is done in 1 ms (an extremely optimistic figure), it takes 10^9 years for one iteration using the N^2 algorithm, and almost a year for one iteration using an efficient N log2 N approximate algorithm.


Astrophysical N-body simulation by Scott Linssen (undergraduate UNC-Charlotte student).


Parallel Computing

Using more than one computer, or a computer with more than one processor, to solve a problem.

Motives:

Usually faster computation. The very simple idea is that n computers operating simultaneously can achieve a result n times faster; it will not be n times faster in practice, for various reasons.

Other motives include: fault tolerance, larger amount of memory available, ...


The Real World is Massively Parallel


Parallel Computing vs Traditional Computing


Background

Parallel computers - computers with more than one processor - and their programming - parallel programming - have been around for more than 40 years.


Gill writes in 1958:

“... There is therefore nothing new in the idea of parallel programming, but its application to computers. The author cannot believe that there will be any insuperable difficulty in extending it to computers. It is not to be expected that the necessary programming techniques will be worked out overnight. Much experimenting remains to be done. After all, the techniques that are commonly used in programming today were only won at the cost of considerable toil several years ago. In fact the advent of parallel programming may do something to revive the pioneering spirit in programming which seems at the present to be degenerating into a rather dull and routine occupation ...”

Gill, S. (1958), “Parallel Programming,” The Computer Journal, vol. 1, April, pp. 2-10.


Speedup Factor

The speedup factor is

S(p) = Execution time using one processor (best sequential algorithm) / Execution time using a multiprocessor with p processors = ts / tp

where ts is the execution time on a single processor and tp is the execution time on a multiprocessor.

S(p) gives the increase in speed gained by using the multiprocessor.

Use the best sequential algorithm with the single-processor system; the underlying algorithm for the parallel implementation might be (and usually is) different.


The speedup factor can also be cast in terms of computational steps:

S(p) = Number of computational steps using one processor / Number of parallel computational steps with p processors

Time complexity can also be extended to parallel computations.


Maximum Speedup

Maximum speedup is usually p with p processors (linear speedup).

It is possible to get superlinear speedup (greater than p), but usually there is a specific reason for it, such as:

Extra memory in the multiprocessor system
A nondeterministic algorithm


Maximum Speedup

Factors limiting speedup:

Communication time
Extra computations in the parallel algorithm (e.g., reevaluation of constants locally)
Idle time of some processors


Maximum Speedup: Amdahl’s law

[Figure: (a) One processor: a serial section taking f·ts followed by parallelizable sections taking (1 - f)·ts, for a total time ts. (b) Multiple processors (p): the parallelizable part is divided among the p processors and takes (1 - f)·ts/p, for a total time tp.]


Amdahl’s Law

The speedup factor is given by:

S(p) = ts / (f·ts + (1 - f)·ts/p) = p / (1 + (p - 1)·f)

This equation is known as Amdahl’s law.


Speedup against number of processors

[Plot: speedup factor S(p) against number of processors p (up to 20), for serial fractions f = 0%, 5%, 10%, and 20%.]


Amdahl’s law

Even with an infinite number of processors, the maximum speedup is limited to 1/f.

Example: With only 5% of the computation being serial, the maximum speedup is 20, irrespective of the number of processors.
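A minimal sketch in C that evaluates Amdahl’s law as given above and reproduces these figures (the helper name amdahl is illustrative):

#include <stdio.h>

/* Amdahl's law: S(p) = p / (1 + (p - 1) * f) */
double amdahl(int p, double f) {
    return (double)p / (1.0 + (p - 1) * f);
}

int main(void) {
    double fs[] = {0.0, 0.05, 0.10, 0.20};
    for (int i = 0; i < 4; i++) {
        printf("f = %4.2f:", fs[i]);
        for (int p = 4; p <= 20; p += 4)
            printf("  S(%2d) = %5.2f", p, amdahl(p, fs[i]));
        printf("\n");
    }
    /* As p grows without bound, S(p) approaches 1/f:
       with f = 0.05 the speedup can never exceed 20. */
    return 0;
}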


Superlinear Speedup Example - Searching

[Figure: (a) Searching each sub-space sequentially. The search space is divided into p sub-spaces, each taking ts/p to search. The solution is found at time x·ts/p + Δt from the start, where x (the number of sub-spaces searched completely first) is indeterminate.]


[Figure: (b) Searching each sub-space in parallel. All p sub-spaces are searched simultaneously, and the solution is found after time Δt.]


The speedup is then given by:

S(p) = (x·ts/p + Δt) / Δt


The worst case for sequential search is when the solution is found in the last sub-space search. Then the parallel version offers the greatest benefit:

S(p) = (((p - 1)/p)·ts + Δt) / Δt

which tends to infinity as Δt tends to zero.


The least advantage for the parallel version is when the solution is found in the first sub-space search of the sequential search:

S(p) = Δt / Δt = 1

The actual speedup depends upon which sub-space holds the solution, but it could be extremely large.


Scalability

Architecturally scalable system: an increase in the number of processors leads to an increase in speedup.

Architectural/algorithmic scalability: an increase in data size can be accommodated by an increase in the number of processors.

Gustafson argued that Amdahl’s law is not significant:
The parallel execution time is fixed.
As p increases, the problem size is also increased to maintain a constant parallel execution time.

With f the serial fraction of the fixed parallel execution time, the scaled speedup works out to:

S(p) = (f·tp + (1 - f)·p·tp) / tp = f + (1 - f)·p = p + (1 - p)·f
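A minimal sketch in C contrasting the two laws under the reconstruction above: Amdahl’s fixed-size speedup saturates at 1/f, while Gustafson’s scaled speedup grows almost linearly in p.

#include <stdio.h>

int main(void) {
    double f = 0.05;                              /* serial fraction */
    for (int p = 4; p <= 64; p *= 2) {
        double amdahl = p / (1.0 + (p - 1) * f);  /* fixed problem size */
        double gustafson = f + (1.0 - f) * p;     /* problem scaled with p */
        printf("p = %2d  Amdahl = %6.2f  Gustafson = %6.2f\n",
               p, amdahl, gustafson);
    }
    return 0;
}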


Message-Passing Computations

In a message-passing environment, the computation time consists of two parts:

tp = tcomp + tcomm

The ratio below can be used as a metric:

computation/communication ratio = tcomp / tcomm


Types of Parallel Computers

Two principal types:

Shared memory multiprocessor

Distributed memory multicomputer


Shared Memory Multiprocessor


Conventional Computer

Consists of a processor executing a program stored in a (main) memory:

[Figure: a processor connected to main memory; instructions flow to the processor, and data flows to or from the processor.]

Each main memory location is identified by its address. Addresses start at 0 and extend to 2^b - 1 when there are b bits (binary digits) in the address.


Shared Memory Multiprocessor System

A natural way to extend the single-processor model: have multiple processors connected to multiple memory modules, such that each processor can access any memory module:

[Figure: processors connected through an interconnection network to memory modules forming one address space.]


Simplistic view of a small shared memory multiprocessor:

[Figure: processors connected by a bus to shared memory.]

Examples: dual Pentiums, quad Pentiums.


Quad Pentium Shared Memory Multiprocessor

[Figure: four processors, each with an L1 cache, an L2 cache, and a bus interface, connected by a processor/memory bus to a memory controller with shared memory and to an I/O interface on an I/O bus.]


Programming Shared Memory Multiprocessors

Threads: the programmer decomposes the program into individual parallel sequences (threads), each being able to access variables declared outside the threads. Example: Pthreads (see the sketch below).

Sequential programming language with preprocessor compiler directives to declare shared variables and specify parallelism. Example: OpenMP - an industry standard - needs an OpenMP compiler.
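A minimal Pthreads sketch of the thread model just described: several threads are created, and each accesses a variable declared outside the threads. The variable names are illustrative, and error handling is omitted.

#include <pthread.h>
#include <stdio.h>

#define NTHREADS 4

int counter = 0;                                  /* shared: declared outside threads */
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

void *worker(void *arg) {
    long id = (long)arg;
    pthread_mutex_lock(&lock);      /* serialize access to the shared variable */
    counter++;
    printf("thread %ld sees counter = %d\n", id, counter);
    pthread_mutex_unlock(&lock);
    return NULL;
}

int main(void) {
    pthread_t t[NTHREADS];
    for (long i = 0; i < NTHREADS; i++)
        pthread_create(&t[i], NULL, worker, (void *)i);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(t[i], NULL);
    return 0;
}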


Sequential programming language with added syntax to declare shared variables and specify parallelism. Example: UPC (Unified Parallel C) - needs a UPC compiler.

Parallel programming language with syntax to express parallelism - the compiler creates executable code for each processor (not now common).

Sequential programming language with a parallelizing compiler that converts it into parallel executable code - also not now common.


Message-Passing Multicomputer

Complete computers connected through an interconnection network:

[Figure: computers, each with a processor and local memory, exchanging messages over an interconnection network.]


Interconnection Networks

Limited and exhaustive interconnections:
2- and 3-dimensional meshes
Hypercube (not now common)

Using switches:
Crossbar
Trees
Multistage interconnection networks

Peer-to-peer: c(c - 1)/2 links with c processors


Two-dimensional array (mesh)

[Figure: computers/processors connected by links in a two-dimensional grid.]

For a √p × √p mesh, the diameter is 2(√p - 1).


Three-dimensional hypercube

[Figure: eight nodes with 3-bit addresses 000 through 111, each linked to the three nodes whose addresses differ from its own in exactly one bit.]

In a d-dimensional hypercube, each node connects to one node in each dimension; a 3-dimensional hypercube is shown above. Each node is assigned a 3-bit address, and the address difference between directly connected nodes is only 1 bit. The diameter is log2 p.
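The one-bit addressing rule means a node’s neighbors can be found by flipping each address bit in turn, i.e., XOR-ing with powers of 2. A small illustration in C (the values chosen are arbitrary):

#include <stdio.h>

int main(void) {
    int d = 3;        /* 3-dimensional hypercube: 8 nodes, 3-bit addresses */
    int node = 5;     /* node 101 */
    for (int bit = 0; bit < d; bit++)
        printf("neighbor across dimension %d: %d\n", bit, node ^ (1 << bit));
    /* prints 4 (100), 7 (111), and 1 (001): one link per dimension */
    return 0;
}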


Four-dimensional hypercube

[Figure: sixteen nodes with 4-bit addresses 0000 through 1111, formed by linking corresponding nodes of two 3-dimensional hypercubes.]

Hypercubes were popular in the 1980s - not now.


Crossbar switch

[Figure: processors connected to memories through a grid of switches, one at each crossing point.]

Provides exhaustive connections using one switch for each connection. Used in shared memory systems.


Tree

[Figure: processors at the leaves, connected by links through switch elements up to a root.]

Example: Thinking Machines’ Connection Machine CM-5 (1993).


Multistage Interconnection Network

Example: Omega network

[Figure: eight inputs (000 through 111) routed to eight outputs (000 through 111) through stages of 2 × 2 switch elements offering straight-through or crossover connections.]


Communication Methods

Circuit switching:
Establish the path
Maintain/reserve the links for message passing
The simple telephone system is an example
Used in early multicomputers (e.g., the Intel iPSC/2)

Packet switching:
Divide the message into “packets”
Packet = source/destination addresses + data
The maximum packet size is known
The mail system is an example


Distributed Shared Memory

Making the main memory of a group of interconnected computers look as though it is a single memory with a single address space. Shared memory programming techniques can then be used.

[Figure: computers, each holding a processor and part of the shared memory, exchanging messages over an interconnection network.]


Flynn’s Classifications

Flynn (1966) created a classification for computers based upon instruction streams and data streams:

Single instruction stream-single data stream (SISD) computer

Single processor computer - single stream of instructions generated from program. Instructions operate upon a single stream of data items.


Single Instruction, Single Data (SISD):
• A serial (non-parallel) computer
• Single instruction: only one instruction stream is being acted on by the CPU during any one clock cycle
• Single data: only one data stream is being used as input during any one clock cycle
• Deterministic execution
• This is the oldest and, until recently, the most prevalent form of computer
• Examples: most PCs, single-CPU workstations and mainframes




Multiple Instruction Stream-Multiple Data Stream (MIMD) Computer

General-purpose multiprocessor system - each processor has a separate program, and one instruction stream is generated from each program for each processor. Each instruction operates upon different data.

Both the shared memory and the message-passing multiprocessors described so far fall into the MIMD classification.


MIMD

Multiple Instruction, Multiple Data (MIMD):
Currently the most common type of parallel computer; most modern computers fall into this category.
Multiple instruction: every processor may be executing a different instruction stream
Multiple data: every processor may be working with a different data stream
Execution can be synchronous or asynchronous, deterministic or non-deterministic
Examples: most current supercomputers, networked parallel computer "grids", and multi-processor SMP computers - including some types of PCs.




Single Instruction Stream-Multiple Data Stream (SIMD) Computer

A specially designed computer - a single instruction stream from a single program, but multiple data streams exist. Instructions from the program are broadcast to more than one processor. Each processor executes the same instruction in synchronism, but using different data.

Developed because a number of important applications mostly operate upon arrays of data.


SIMD

Single Instruction, Multiple Data (SIMD):
A type of parallel computer
Single instruction: all processing units execute the same instruction at any given clock cycle
Multiple data: each processing unit can operate on a different data element
This type of machine typically has an instruction dispatcher, a very high-bandwidth internal network, and a very large array of very small-capacity instruction units.
Best suited for specialized problems characterized by a high degree of regularity, such as image processing.
Synchronous (lockstep) and deterministic execution
Two varieties: processor arrays and vector pipelines
Examples:
Processor arrays: Connection Machine CM-2, MasPar MP-1, MP-2
Vector pipelines: IBM 9000, Cray C90, Fujitsu VP, NEC SX-2, Hitachi S820




Multiple Program Multiple Data (MPMD) Structure

Within the MIMD classification, each processor will have its own program to execute:

[Figure: two processors, each fetching instructions from its own program and operating on its own data.]


Single Program Multiple Data (SPMD) Structure

A single source program is written, and each processor executes its personal copy of this program, although independently and not in synchronism.

The source program can be constructed so that parts of the program are executed by certain computers and not others, depending upon the identity of the computer (see the sketch below).
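A minimal MPI sketch of the SPMD structure: every process runs the same program and branches on its own identity (rank). The master/worker split shown is illustrative.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    int rank, nprocs;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);    /* this process's identity */
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);  /* total number of processes */

    if (rank == 0)
        printf("master: coordinating %d processes\n", nprocs);  /* only rank 0 */
    else
        printf("worker %d: doing my share of the work\n", rank);

    MPI_Finalize();
    return 0;
}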


Networked Computers as a Computing Platform

A network of computers became a very attractive alternative to expensive supercomputers and parallel computer systems for high-performance computing in the early 1990s.

Several early projects. Notable:

Berkeley NOW (network of workstations) project
NASA Beowulf project


Key advantages:

Very high performance workstations and PCs readily available at low cost.

The latest processors can easily be incorporated into the system as they become available.

Existing software can be used or modified.


Software Tools for Clusters

Based upon message-passing parallel programming:

Parallel Virtual Machine (PVM) - developed in the late 1980s; became very popular.

Message-Passing Interface (MPI) - a standard defined in the 1990s.

Both provide a set of user-level libraries for message passing, used with regular programming languages (C, C++, ...); see the sketch below.
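A minimal sketch of message passing through the MPI library mentioned above: process 0 sends an integer to process 1 (run with at least two processes; the value sent is arbitrary).

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    int rank, value;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        value = 42;
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);  /* to rank 1, tag 0 */
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);                          /* from rank 0 */
        printf("rank 1 received %d\n", value);
    }

    MPI_Finalize();
    return 0;
}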


Beowulf Clusters*

A group of interconnected “commodity” computers achieving high performance at low cost.

Typically uses commodity interconnects - high-speed Ethernet - and the Linux OS.

* Beowulf comes from the name given to the NASA Goddard Space Flight Center cluster project.


Cluster Interconnects

Originally fast Ethernet on low-cost clusters; Gigabit Ethernet offers an easy upgrade path.

More specialized/higher performance:
Myrinet - 2.4 Gbits/sec - disadvantage: single vendor
cLAN
SCI (Scalable Coherent Interface)
QsNet
InfiniBand - may become important, as InfiniBand interfaces may be integrated on next-generation PCs


Dedicated cluster with a master node

[Figure: a dedicated cluster in which the user reaches the master node through an external network and an uplink to the master node's second Ethernet interface; a switch connects the master node to the compute nodes.]