
Page 1: Parallel Computing Department Of Computer Engineering Ferdowsi University Hossain Deldari

Parallel Computing

Department Of Computer Engineering

Ferdowsi University

Hossain Deldari

Page 2:

•Parallel Processing

•Super Computer

•Parallel Computer

•Amdahl’s Law, Speedup, Efficiency

•Parallel Machine Architecture

•Computational Model

•Concurrency Approach

•Parallel Programming

•Cluster Computing

Lecture organization

Page 3:

•It is the division of work into smaller tasks

•Assigning many smaller tasks to multiple workers to work on simultaneously

•Parallel processing is the use of multiple processors to execute different parts of the same program simultaneously

•Difficulties: coordinating, controlling and monitoring the workers

•The main goals of parallel processing are:

•to solve much bigger problems much faster

•to reduce the wall-clock execution time of computer programs

•to increase the size of computational problems that can be solved

What is Parallel Processing?

Page 4:

What is a Supercomputer?

A supercomputer is a computer that is much faster than the computers ordinary people use.

Note: this is a time-dependent definition.

TOP500 Lists

June 1993:

Manufacturer  Computer/Procs  Rmax   Rpeak   Installation Site               Country/Year
TMC           CM-5/1024       59.70  131.00  Los Alamos National Laboratory  USA/1993

(Rmax and Rpeak in GFlops)

Supercomputer & parallel computer

Page 5:

June 2003:

Manufacturer  Computer/Procs        Rmax      Rpeak     Installation Site       Country/Year
NEC           Earth-Simulator/5120  35860.00  40960.00  Earth Simulator Center  Japan/2002

Rmax: maximal LINPACK performance achieved

Rpeak: theoretical peak performance

LINPACK is a benchmark.

Page 6:
Page 7:

Amdahl’s Law

speedup_PC(p) = Time(1) / Time(p)

Amdahl's Law, Speedup, Efficiency
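The slide defines speedup as Time(1)/Time(p); Amdahl's Law bounds this quantity when a fraction of the program must run serially. A minimal sketch (the function name and the serial-fraction parameter f are my own additions, not from the slides):

```python
def amdahl_speedup(p, serial_fraction):
    """Upper bound on speedup with p processors when a fraction f
    of the work cannot be parallelized: S(p) = 1 / (f + (1 - f) / p)."""
    f = serial_fraction
    return 1.0 / (f + (1.0 - f) / p)

# With 10% serial work, 4 processors give only about 3.08x,
# and even an enormous processor count cannot exceed 10x.
print(amdahl_speedup(4, 0.10))      # ≈ 3.08
print(amdahl_speedup(10**9, 0.10))  # approaches 10.0
```

This is why the serial fraction, not the processor count, usually dominates achievable speedup.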

Page 8:
Page 9:

Efficiency is a measure of the fraction of time that a processor spends performing useful work.

Efficiency
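Efficiency relates achieved speedup to the processor count, E(p) = speedup / p; a value of 1.0 means every processor does useful work all the time. A one-line sketch with illustrative numbers:

```python
def efficiency(speedup, p):
    """Fraction of time each processor spends on useful work:
    E(p) = speedup / p. Perfect linear speedup gives E = 1.0."""
    return speedup / p

# A speedup of 3.08 on 4 processors means ~77% utilization each.
print(efficiency(3.08, 4))  # → 0.77
```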

Page 10:

Shunt Operation

Page 11:

• SIMD
• MIMD
• MISD
• Clusters

Parallel and Distributed Computers

Page 12:

SIMD (Single Instruction Multiple Data)

Page 13:

MISD(Multi Instruction Single Data)

Page 14:

MIMD (Multiple Instruction Multiple Data)

Page 15:

MIMD(cont.)

Page 16:

•Shared memory model
 •Bus-based
 •Switch-based
 •NUMA

•Distributed memory model

•Distributed shared memory model
 •Page-based
 •Object-based
 •Hardware

Parallel machine architecture

Page 17:

Shared memory model

Page 18:

- Shared memory or Multiprocessor

-OpenMP is a standard (C/C++/FORTRAN)

Advantage:

Easy Programming.

Disadvantage:

Design Complexity

Not Scalable

Shared memory model(cont.)

Page 19:

-Bus is bottleneck

- Not scalable

Bus-based shared memory model

Page 20:

- Maintenance is difficult.

- Expensive

- Scalable

Switch-based shared memory model

Page 21:

•NUMA stands for Non-Uniform Memory Access.

•Simulated shared memory

•Better scalability

NUMA model

Page 22:

• Multi computer

•MPI(Message Passing Interface)

•Easy design

•Low cost

•High scalability

•Difficult programming

Distributed memory model

Page 23:

Linear Array

Ring

Mesh

Fully Connected


Examples of Network Topology

Page 24:

[Figure: a 4-dimensional hypercube (d = 4), nodes labeled with 4-bit binary addresses; S marks the source node]

Hypercubes

Examples of Network Topology (cont.)
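In a d-dimensional hypercube, two nodes are connected exactly when their binary addresses differ in one bit, so each node has d neighbors and routing flips one bit per hop. A small sketch of the neighbor rule (function name is my own):

```python
def hypercube_neighbors(node, d):
    """Neighbors of `node` in a d-dimensional hypercube:
    flip each of the d address bits in turn (XOR with a one-bit mask)."""
    return [node ^ (1 << bit) for bit in range(d)]

# Node 0000 in a d=4 hypercube connects to 0001, 0010, 0100, 1000.
print([format(n, "04b") for n in hypercube_neighbors(0b0000, 4)])
```

This bit-flip structure gives the hypercube its diameter of d hops between any two of the 2^d nodes.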

Page 25:

•Simpler abstraction

•Sharing data

•Easier portability

•Easy design with easy programming

•Low performance (when communication is heavy)

Distributed shared memory model

Page 26:

[Figure: spectrum of architectures, from tightly to loosely coupled]

                        SIMD     SMP      NUMA     Cluster
Degree of coupling:     tight  <------------------>  loose
Communication speed:    fast   <------------------>  slow
Supported grain sizes:  fine   <------------------>  coarse
Memory:                 shared <------------------>  distributed
Control:                SIMD   <------------------>  MIMD

Parallel and Distributed Architecture (Leopold, 2001)

Page 27:

•RAM

•PRAM

•BSP

•LogP

•MPI

Computational Model

Page 28:

RAM Model

Page 29:

• Synchronized Read-Compute-Write cycle

• EREW (Exclusive Read, Exclusive Write)
• ERCW (Exclusive Read, Concurrent Write)
• CREW (Concurrent Read, Exclusive Write)
• CRCW (Concurrent Read, Concurrent Write)

[Figure: processors P1 … Pp, each with private memory, connected through a control unit to a shared global memory]

Parallel Random Access Machine (PRAM) Model

Page 30:

• Generalization of PRAM Model

• Processor-Memory Pairs

• Communication Network

• Barrier Synchronization

Super-step:

1. Processes execute (local computation)
2. Communications
3. Barrier synchronization

Bulk Synchronous Parallel (BSP) Model

Page 31:

• Cost of a superstep = w + max(hs, hr)·g + l

– w: maximum number of local operations
– hs: maximum number of packets sent
– hr: maximum number of packets received
– g: communication throughput (gap)
– l: synchronization latency
– p: number of processors

BSP Space Complexity
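The superstep cost formula can be evaluated directly; a minimal sketch with illustrative numbers (function and parameter names are my own):

```python
def bsp_superstep_cost(w, h_sent, h_received, g, l):
    """BSP cost of one superstep: w + max(hs, hr) * g + l,
    where w is the largest local-computation load across processors,
    hs/hr the largest message counts sent/received, g the per-message
    communication cost, and l the barrier-synchronization latency."""
    return w + max(h_sent, h_received) * g + l

# 1000 local ops, at most 20 messages sent / 15 received,
# g = 4 time units per message, l = 50 for the barrier:
print(bsp_superstep_cost(1000, 20, 15, 4, 50))  # → 1130
```

A whole BSP program's cost is just the sum of its supersteps' costs, which is what makes the model easy to reason about.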

Page 32:

• Closely related to BSP
• Models asynchronous execution
• Parameters:

L: the latency, an upper bound on the time to transmit a message through the network.

o: the overhead, defined as the length of time that a processor is engaged in the transmission or reception of each message; during this time the processor cannot perform other operations.

g: the gap, defined as the minimum time interval between consecutive message transmissions or receptions. The reciprocal of g corresponds to the available per-processor bandwidth.

P: the number of processor/memory modules.

LogP Model
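Under these parameters, a single small point-to-point message costs the send overhead, plus the network latency, plus the receive overhead, i.e. o + L + o. A one-line sketch (function name is my own):

```python
def logp_message_time(L, o):
    """End-to-end time for one small message under LogP:
    send overhead o, then network latency L, then receive overhead o."""
    return o + L + o

print(logp_message_time(L=10, o=2))  # → 14
```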

Page 33:

LogP (cont.)

Page 34:

What Is MPI?

•A message-passing library specification
 •message-passing model
 •not a compiler specification
 •not a specific product

•For parallel computers, clusters, and heterogeneous networks

•Full-featured

•Designed to permit (unleash?) the development of parallel software libraries

•Designed to provide access to advanced parallel hardware for
 •end users
 •library writers
 •tool developers

MPI(Message Passing Interface)

Page 35:

[Figure: Node 1 and Node 2, each running an application task on top of an MPI layer and a communication layer; Task 1 and Task 2 communicate virtually at the MPI level, while the real communication occurs between the nodes' communication layers]

MPI Layer

Page 36:

Matrix Multiplication Example

Page 37:

PRAM Matrix Multiplication

Cost Of PRAM Algorithm
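The slide's formulas did not survive the transcript, but the standard PRAM algorithm uses n³ processors: each computes one elementary product a[i][k]·b[k][j] in a single step, then each of the n² sums is reduced over k in O(log n) steps, for O(log n) time overall. A sequential sketch of that same decomposition (an assumption: this mirrors the textbook algorithm, not necessarily the slide's exact variant):

```python
def pram_style_matmul(a, b):
    """Mirror the PRAM decomposition sequentially: first form all n^3
    elementary products (mutually independent, so a PRAM computes them
    in one parallel step), then reduce each of the n^2 sums (a PRAM
    does each with a balanced-tree reduction in O(log n) steps)."""
    n = len(a)
    # Step 1: every elementary product, each independent of the others.
    prod = {(i, j, k): a[i][k] * b[k][j]
            for i in range(n) for j in range(n) for k in range(n)}
    # Step 2: reduce over k; sequentially this is a plain sum.
    return [[sum(prod[(i, j, k)] for k in range(n)) for j in range(n)]
            for i in range(n)]

a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
print(pram_style_matmul(a, b))  # → [[19, 22], [43, 50]]
```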

Page 38:

BSP Matrix Multiplication

Cost of algorithm

Page 39:

Concurrency Approach

•Control Parallel

•Data Parallel

Page 40:

Control Parallel

Page 41:

Data Parallel
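Data parallelism applies the same operation to different partitions of the data simultaneously, whereas control parallelism runs different operations concurrently. A minimal data-parallel sketch using Python threads (illustrative only; the slides do not prescribe a language, and the chunking scheme here is my own):

```python
from concurrent.futures import ThreadPoolExecutor

def scale_chunk(chunk, factor):
    # The single operation applied, data-parallel style, to every partition.
    return [x * factor for x in chunk]

data = list(range(8))
chunks = [data[:4], data[4:]]  # partition the data among workers

with ThreadPoolExecutor(max_workers=2) as pool:
    parts = list(pool.map(lambda c: scale_chunk(c, 10), chunks))

result = [x for part in parts for x in part]
print(result)  # → [0, 10, 20, 30, 40, 50, 60, 70]
```

The key property is that the chunks are independent, so adding workers only changes how the data is split, not the program's logic.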

Page 42:

The Best granularity for programming

Page 43:

•Explicit Parallel Programming

Occam, MPI, PVM

•Implicit Parallel Programming

Parallel functional programming

ML,…

Concurrent object-oriented programming

COOL,…

Data parallel programming

Fortran 90, HPF,…

Parallel Programming

Page 44:

• A cluster system is
 – a parallel multicomputer built from high-end PCs and a conventional high-speed network
 – a platform that supports parallel programming

Cluster Computing

Page 45:

• Scientific computing
 – simulation, CFD, CAD/CAM, weather prediction, processing large volumes of data

• Super server systems
 – scalable internet/web servers
 – database servers
 – multimedia, video, and audio servers

Applications

Cluster Computing(cont.)

Page 46:

Cluster System Building Block

High Speed Network

HW HW HW HW

OS OS OS OS

Single System Image Layer

System Tool Layer

Application Layer

Cluster Computing(cont.)

Page 47:

Why cluster computing?

• Scalability
 – build a small system first, grow it later

• Low cost
 – hardware based on the COTS (commercial off-the-shelf) model
 – software based on freeware from the research community

• Easier to maintain

• Vendor independent

Cluster Computing(cont.)

Page 48:

The End