CS591x - Cluster Computing and Parallel Programming: Parallel Computer Architecture and Software Models

Upload: helen-sherman

Post on 17-Jan-2016


Page 1: CS591x -Cluster Computing and Parallel Programming Parallel Computer Architecture and Software Models

CS591x - Cluster Computing and Parallel Programming

Parallel Computer Architecture and Software Models


It's all about performance

Greater performance is the reason for parallel computing
Many types of scientific and engineering programs are too large and too complex for traditional uniprocessors
Such large problems are common in:
  ocean modeling, weather modeling, astrophysics, solid state physics, power systems, ...


FLOPS – a measure of performance

FLOPS: Floating Point Operations per Second, a measure of how much computation can be done in a certain amount of time
  MegaFLOPS (MFLOPS): 10^6 FLOPS
  GigaFLOPS (GFLOPS): 10^9 FLOPS
  TeraFLOPS (TFLOPS): 10^12 FLOPS
  PetaFLOPS (PFLOPS): 10^15 FLOPS


How fast …

Cray 1: ~150 MFLOPS
Pentium 4: 3-6 GFLOPS
IBM's BlueGene: 70+ TFLOPS
PSC's Big Ben: 10 TFLOPS
Humans: it depends
  as calculators: 0.001 MFLOPS
  as information processors: 10 PFLOPS


FLOPS vs. MIPS

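FLOPS is only concerned with floating-point calculations
MIPS (millions of instructions per second) counts all instructions, not just floating-point ones
Other performance issues:
  memory latency
  cache performance
  I/O capacity
  ...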


See…

www.Top500.org
  biannual performance reports and … rankings of the fastest computers in the world


Performance

Speedup(n processors) = time(1 processor)/time(n processors)

** Culler, Singh and Gupta, Parallel Computer Architecture: A Hardware/Software Approach
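The speedup definition can be sketched in a few lines of Python. The function name `speedup` and the timings below are hypothetical, chosen only to illustrate the ratio; they are not measurements from the course.

```python
def speedup(time_one_processor, time_n_processors):
    """Speedup(n processors) = time(1 processor) / time(n processors)."""
    if time_n_processors <= 0:
        raise ValueError("time on n processors must be positive")
    return time_one_processor / time_n_processors


# Hypothetical wall-clock times, in seconds, for the same program.
t1 = 120.0   # run on 1 processor
t8 = 20.0    # run on 8 processors

print("speedup on 8 processors:", speedup(t1, t8))
```

With these made-up timings the speedup is less than 8, which is the usual situation: doubling the processors rarely halves the run time.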


Consider…

from: www.lib.utexas.edu/maps/indian_ocean.html


… a model of the Indian Ocean -

73,000,000 square kilometers
One data point per 100 meters (10 x 10 = 100 points per square kilometer): 7,300,000,000 surface points
Need to model the ocean at depth, say every 10 meters down to 200 meters: 20 depth data points
Every 10 minutes for 4 hours: 24 time steps


So –

73 x 10^6 (sq. km of surface) x 10^2 (points per sq. km) x 20 (depth points) x 24 (time steps)
  = 3,504,000,000,000 data points in the model grid
Suppose 100 instructions per grid point:
  350,400,000,000,000 instructions in the model


Then -

Imagine that you have a computer that can run 1 billion (10^9) instructions per second
3.504 x 10^14 / 10^9 = 350,400 seconds, or about 97 hours


But –

On a 10 teraflops computer:
3.504 x 10^14 / 10^13 = 35.0 seconds
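The arithmetic for the Indian Ocean model can be checked with a short Python sketch. The variable names are illustrative, and the sketch assumes, as the slides implicitly do, that one instruction costs the same as one floating-point operation.

```python
# Grid size and run-time estimate for the ocean model, using the slide's numbers.
surface_km2 = 73_000_000         # surface area in square kilometers
points_per_km2 = 10 * 10         # one point every 100 m gives 10 x 10 per sq. km
depth_levels = 200 // 10         # every 10 m down to 200 m
time_steps = (4 * 60) // 10      # every 10 minutes for 4 hours
instructions_per_point = 100     # cost per grid point supposed in the slides

grid_points = surface_km2 * points_per_km2 * depth_levels * time_steps
instructions = grid_points * instructions_per_point


def run_time_seconds(total_instructions, rate_per_second):
    """Seconds needed at a given sustained instruction rate."""
    return total_instructions / rate_per_second


t_uniprocessor = run_time_seconds(instructions, 10**9)    # 1 billion instr/s
t_teraflops = run_time_seconds(instructions, 10**13)      # 10 teraflops

print("grid points:      ", grid_points)
print("instructions:     ", instructions)
print("at 10^9 per sec:  ", t_uniprocessor, "s =", t_uniprocessor / 3600, "h")
print("at 10^13 per sec: ", t_teraflops, "s")
```

The two rates differ by a factor of 10^4, so the estimated run time drops by the same factor.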


Gaining performance

Pipelining
  More instructions, faster
  More instructions in execution at the same time in a single processor
  Not usually an attractive strategy these days. Why?


Instruction Level Parallelism (ILP)

Based on the fact that many instructions do not depend on the instructions that come before them
Processor has extra hardware to execute several instructions at the same time
  e.g. multiple adders


Pipelining and ILP are not the solution to our problem. Why?

Only incremental improvements in performance
Already been done
We need orders-of-magnitude improvements in performance


Gaining Performance

Vector Processors
  Scientific and engineering computations are often vector and matrix operations
    graphic transformations, e.g. shift object x to the right
  Redundant arithmetic hardware and vector registers to operate on an entire vector in one step (SIMD)
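The "shift object x to the right" example can be sketched as one operation applied to every element of a vector. The function `shift_right` and the `square` data are hypothetical, and plain Python executes the additions one after another, so this only illustrates the programming pattern; a vector processor would apply the same add to all x coordinates in one step.

```python
def shift_right(vertices, dx):
    """Apply the same operation (add dx to x) to every vertex of an object."""
    return [(x + dx, y) for (x, y) in vertices]


# Hypothetical object: the four corners of a unit square.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]

print(shift_right(square, 5))
```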


Gaining Performance

Vector Processors
  Declining popularity for a while
    Hardware expensive
  Popularity returning
    Applications: science, engineering, cryptography, media/graphics
    Earth Simulator


Parallel Computer Architecture

Shared Memory Architectures
Distributed Memory


Shared Memory Systems

Multiple processors connected to/share the same pool of memory
SMP (symmetric multiprocessing)
Every processor has, potentially, access to and control of every memory location
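A minimal sketch of the shared-memory idea, using Python threads as stand-ins for processors. The function `parallel_sum` is hypothetical. Every worker can read and write the same variable, so access is guarded with a lock. CPython threads do not run this arithmetic truly in parallel, so the sketch shows the programming model rather than a performance gain.

```python
import threading


def parallel_sum(data, n_threads=4):
    """Sum a list with several threads that all update one shared total."""
    total = 0                      # the shared memory location
    lock = threading.Lock()        # protects updates to the shared total

    def worker(chunk):
        nonlocal total
        partial = sum(chunk)       # work done privately by this thread
        with lock:                 # only one thread updates at a time
            total += partial

    # Split the data into n_threads interleaved slices.
    chunks = [data[i::n_threads] for i in range(n_threads)]
    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total


print(parallel_sum(list(range(1, 101))))
```

Because every thread can touch `total`, correctness depends on the lock; this is the kind of synchronization that shared-memory programs have to manage.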


Shared Memory Computers

[Diagram: one memory shared by six processors]


Shared Memory Computers

[Diagram: three memory modules shared by three processors]


Shared Memory Computer

[Diagram: three processors connected to three memory modules through a switch]


Shared Memory Computers

SGI Origin2000 at NCSA
Balder
256 250 MHz R10000 processors
128 Gbytes of memory


Shared Memory Computers

Rachel at PSC
64 1.15 GHz EV7 processors
256 Gbytes of shared memory


Distributed Memory Systems

Multiple processors, each with its own memory
Interconnected to share/exchange data and processing
Modern architectural approach to supercomputers
Supercomputers and clusters are similar
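A minimal sketch of the distributed-memory idea, again in Python. The names `node` and `distributed_sum` are hypothetical. Threads stand in for cluster nodes and a queue stands in for the interconnect; each node works only on its own local data and communicates solely by sending a message. A real cluster would use separate machines and a message-passing library instead.

```python
import queue
import threading


def node(rank, local_data, interconnect):
    """One node: compute on private data, then send a single message."""
    partial = sum(local_data)            # touches only this node's memory
    interconnect.put((rank, partial))    # explicit message to the collector


def distributed_sum(data, n_nodes=4):
    """Scatter data to nodes, then gather their partial sums by message."""
    interconnect = queue.Queue()
    # Scatter: each node gets its own private slice of the data.
    parts = [data[i::n_nodes] for i in range(n_nodes)]
    workers = [
        threading.Thread(target=node, args=(rank, parts[rank], interconnect))
        for rank in range(n_nodes)
    ]
    for w in workers:
        w.start()
    # Gather: receive exactly one message per node.
    total = 0
    for _ in range(n_nodes):
        _rank, partial = interconnect.get()
        total += partial
    for w in workers:
        w.join()
    return total


print(distributed_sum(list(range(1, 101))))
```

No lock is needed here because nothing is shared; the cost moves into the messages that cross the interconnect.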


Clusters – distributed memory

[Diagram: six processors, each with its own memory, connected by an interconnect]


Cluster: Distributed Memory with SMP

[Diagram: six nodes, each with two processors (Proc1, Proc2) sharing a local memory, connected by an interconnect]


Distributed Memory Supercomputer

BlueGene/L, DOE/IBM
0.7 GHz PowerPC 440
32768 processors
70 Teraflops


Distributed Memory Supercomputer

Thunder at LLNL
Number 5
20 Teraflops
1.4 GHz Itanium processors
4096 processors


Grid Computing Systems

What is a Grid?
  Means different things to different people
Distributed processors
  Around campus
  Around the state
  Around the world


Grid Computing Systems

Widely distributed
Loosely connected (i.e. the Internet)
No central management


Grid Computing Systems

Connected clusters/other dedicated scientific computers
I2/Abilene


Grid Computer Systems

[Diagram: a control/scheduler node connected over the Internet to machines whose idle cycles are harvested]


Grid Computing Systems

Dedicated Grids
  TeraGrid
  Sabre
  NASA Information Power Grid
Cycle Harvesting Grids
  Condor
  *GlobalGridForum (Parabon)
  Seti@home


Let’s revisit speedup…

We can achieve speedup (theoretically) by using more processors,
but a number of factors may limit speedup:
  Interprocessor communications
  Interprocess synchronization
  Load balance


Amdahl’s Law

According to Amdahl's Law:
Speedup = 1/(S + (1-S)/N)
where
  S is the fraction of the program that is purely sequential
  N is the number of processors


Amdahl's Law

What does it mean?
  Part of a program is parallelizable
  Part of the program must be sequential (S)
Amdahl's law says:
  Speedup is constrained by the portion of the program that must remain sequential relative to the part that is parallelized.
Note: if S is very small, the problem is "embarrassingly parallel"
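Amdahl's formula, Speedup = 1/(S + (1-S)/N), can be explored with a short Python sketch. The function name `amdahl_speedup` and the sequential fraction of 5% are assumptions made for illustration only.

```python
def amdahl_speedup(s, n):
    """Amdahl's Law: speedup = 1 / (S + (1 - S) / N).

    s: fraction of the program that is purely sequential (0 to 1)
    n: number of processors (at least 1)
    """
    if not 0.0 <= s <= 1.0:
        raise ValueError("s must be between 0 and 1")
    if n < 1:
        raise ValueError("n must be at least 1")
    return 1.0 / (s + (1.0 - s) / n)


# Assumed example: 5% of the program cannot be parallelized.
sequential_fraction = 0.05
for processors in (1, 10, 100, 1000, 10000):
    print(processors, round(amdahl_speedup(sequential_fraction, processors), 2))
```

As N grows, the (1-S)/N term shrinks toward zero and the speedup approaches 1/S, so with S = 0.05 no number of processors can deliver more than a 20x speedup. When S is 0 the speedup is simply N, the embarrassingly parallel case.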


Software models for parallel computing

Shared Memory
Distributed Memory
Data Parallel


Flynn’s Taxonomy

Single Instruction/Single Data - SISD

Multiple Instruction/Single Data - MISD

Single Instruction/Multiple Data - SIMD

Multiple Instruction/Multiple Data - MIMD

Single Program/Multiple Data - SPMD (a later addition, not one of Flynn's original four categories)


Next

Cluster Computer Architecture
Linux