
Page 1: Why Parallel/Distributed Computing

Why Parallel/Distributed Computing

Sushil K. Prasad, [email protected]

Page 2: Why Parallel/Distributed Computing

What is Parallel and Distributed Computing?

Solving a single problem faster using multiple CPUs
E.g., matrix multiplication C = A x B (see the sketch after this list)
Parallel = shared memory among all CPUs
Distributed = local memory per CPU
Common issues: partitioning, synchronization, dependencies, load balancing
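A minimal sketch of the shared-memory version, assuming C with the OpenMP API that appears later in these slides (the slides themselves name no language): each row of C is independent, so the outer loop can be divided among the CPUs.

/* Sketch: parallel matrix multiplication C = A x B on shared memory.
   Rows of C are split across threads by OpenMP; compile with -fopenmp
   (without it the pragma is ignored and the code runs sequentially). */
#include <stdio.h>

#define N 4

int main(void) {
    double A[N][N], B[N][N], C[N][N];

    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            A[i][j] = i + j;
            B[i][j] = (i == j) ? 1.0 : 0.0;   /* identity, so C equals A */
        }

    /* each row of C depends only on A and B, never on other rows of C,
       so different CPUs can compute different rows without synchronization */
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            C[i][j] = 0.0;
            for (int k = 0; k < N; k++)
                C[i][j] += A[i][k] * B[k][j];
        }

    printf("C[0][0] = %g\n", C[0][0]);        /* expect 0, since A[0][0] = 0 */
    return 0;
}

Partitioning by rows is only one choice; blocks or columns work too, which is exactly the partitioning and load-balancing issue the slide lists.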

Page 3: Why Parallel/Distributed Computing

ENIAC (350 op/s), 1946 (U.S. Army photo)

Page 4: Why Parallel/Distributed Computing

ASCI White (10 teraops/s, 2006)

Megaflops = 10^6 flops ≈ 2^20
Giga = 10^9 (billion) ≈ 2^30
Tera = 10^12 (trillion) ≈ 2^40
Peta = 10^15 (quadrillion) ≈ 2^50
Exa = 10^18 (quintillion) ≈ 2^60

Page 5: Why Parallel/Distributed Computing

65 Years of Speed Increases

ENIAC (1946): 350 flops
Today (2011): K computer, 8 petaflops (8 x 10^15 flops)

Page 6: Why Parallel/Distributed Computing

Why Parallel and Distributed Computing? Grand Challenge Problems

Weather forecasting; global warming
Materials design – superconducting material at room temperature; nano-devices; spaceships
Organ modeling; drug discovery

Page 7: Why Parallel/Distributed Computing

Why Parallel and Distributed Computing? Physical Limitations of Circuits

Heat and speed-of-light effects
Superconducting materials to counter the heat effect
Speed-of-light effect – no solution!

Page 8: Why Parallel/Distributed Computing

Microprocessor Revolution

[Chart: speed (log scale) versus time, with curves for supercomputers, mainframes, minis, and micros]

Page 9: Why Parallel/Distributed Computing

Why Parallel and Distributed Computing? VLSI – Effect of Integration

1 M transistors is enough for full functionality – DEC's Alpha (1990s)
The rest must go into multiple CPUs per chip
Cost – multitudes of average CPUs give better FLOPS/$ than traditional supercomputers

Page 10: Why Parallel/Distributed Computing

Modern Parallel Computers

Caltech's Cosmic Cube (Seitz and Fox)
Commercial copy-cats:
  nCUBE Corporation (512 CPUs)
  Intel's Supercomputer Systems: iPSC1, iPSC2, Intel Paragon (512 CPUs)
  Thinking Machines Corporation: CM2 (65K 4-bit CPUs, 12-dimensional hypercube, SIMD); CM5 (fat-tree interconnect, MIMD)
Tianhe-1A: 4.7 petaflops, 14K Xeon X5670 CPUs and 7,168 Nvidia Tesla M2050 GPUs
K computer (2011): 8 petaflops (8 x 10^15 FLOPS), 68K 2.0 GHz 8-core CPUs, 548,352 cores

Page 11: Why Parallel/Distributed Computing

Why Parallel and Distributed Computing? Everyday Reasons

Available local networked workstations and Grid resources should be utilized
Solve compute-intensive problems faster
  Make infeasible problems feasible
  Reduce design time
Leverage large combined memory
  Solve larger problems in the same amount of time
  Improve the answer's precision
  Reduce design time
Gain competitive advantage
Exploit commodity multi-core and GPU chips
Find jobs!

Page 12: Why Parallel/Distributed Computing

Why Shared-Memory Programming?

Easier conceptual environment
Programmers are typically familiar with concurrent threads and processes sharing an address space
CPUs within multi-core chips share memory
OpenMP – an application programming interface (API) for shared-memory systems; supports higher-performance parallel programming of symmetric multiprocessors (a minimal example follows this list)
Java threads
MPI for distributed-memory programming
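A minimal OpenMP sketch in C, assuming a compiler with OpenMP support (e.g., gcc -fopenmp); it only illustrates the slide's point that threads in one process share a single address space, so no data has to be copied or sent as messages.

/* Sketch: all threads read and write the same array directly. */
#include <stdio.h>
#include <omp.h>

#define N 16

int main(void) {
    int data[N];                        /* one array, shared by every thread */

    #pragma omp parallel                /* start a team of threads */
    {
        int tid = omp_get_thread_num();
        int nthreads = omp_get_num_threads();
        /* each thread writes the elements i with i % nthreads == tid */
        for (int i = tid; i < N; i += nthreads)
            data[i] = tid;
    }                                   /* implicit barrier here */

    for (int i = 0; i < N; i++)
        printf("data[%2d] was written by thread %d\n", i, data[i]);
    return 0;
}

An MPI version of the same program would instead give each process its own private copy of the data and exchange results through explicit messages, which is the distributed-memory model the last bullet refers to.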

Page 13: Why Parallel/Distributed Computing

Seeking Concurrency

Data dependence graphs
Data parallelism
Functional parallelism
Pipelining

Page 14: Why Parallel/Distributed Computing

Data Dependence Graph

Directed graph
Vertices = tasks
Edges = dependencies (a small sketch follows this list)
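A small illustrative sketch in C (the task numbers and dependences are made up, not from the slides): the graph is stored as a Boolean matrix, and tasks whose predecessors have all finished form a "wave" that could run concurrently.

/* Sketch: data dependence graph for 4 tasks.
   dep[v][u] == true means task v needs the result of task u. */
#include <stdio.h>
#include <stdbool.h>

#define T 4

int main(void) {
    bool dep[T][T] = {{false}};
    dep[2][0] = true;                  /* task 2 needs tasks 0 and 1 */
    dep[2][1] = true;
    dep[3][2] = true;                  /* task 3 needs task 2 */

    bool done[T] = {false};
    int remaining = T;

    for (int wave = 0; remaining > 0; wave++) {
        bool ready[T];
        printf("wave %d:", wave);
        for (int v = 0; v < T; v++) {
            ready[v] = !done[v];
            for (int u = 0; u < T; u++)
                if (dep[v][u] && !done[u]) ready[v] = false;
            if (ready[v]) printf(" task %d", v);     /* can run in parallel */
        }
        printf("\n");
        for (int v = 0; v < T; v++)
            if (ready[v]) { done[v] = true; remaining--; }
    }
    return 0;
}

Output: wave 0 holds tasks 0 and 1, wave 1 holds task 2, wave 2 holds task 3, mirroring how a scheduler would exploit the edges of the graph.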

Page 15: Why Parallel/Distributed Computing

Data Parallelism

Independent tasks apply the same operation to different elements of a data set
OK to perform the operations concurrently (see the sketch after this list)
Speedup: potentially p-fold, where p is the number of processors

for i ← 0 to 99 do
    a[i] ← b[i] + c[i]
endfor
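The same loop written in C with OpenMP (my choice of notation; the slide gives only pseudocode): all 100 iterations are independent, so the runtime can hand blocks of them to different processors.

/* Sketch: data-parallel vector addition a[i] = b[i] + c[i], i = 0..99. */
#include <stdio.h>

int main(void) {
    double a[100], b[100], c[100];
    for (int i = 0; i < 100; i++) { b[i] = i; c[i] = 2.0 * i; }

    #pragma omp parallel for           /* iterations distributed over CPUs */
    for (int i = 0; i < 100; i++)
        a[i] = b[i] + c[i];

    printf("a[99] = %g\n", a[99]);     /* 99 + 198 = 297 */
    return 0;
}

With p processors the loop can run up to p times faster, which is the p-fold speedup the slide mentions.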

Page 16: Why Parallel/Distributed Computing

Functional Parallelism

Independent tasks apply different operations to different data elements:

a ← 2
b ← 3
m ← (a + b) / 2
s ← (a^2 + b^2) / 2
v ← s - m^2

The first and second statements can run concurrently
The third and fourth statements can run concurrently (an OpenMP version follows below)
Speedup: limited by the number of concurrent sub-tasks
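A minimal sketch of the five statements using OpenMP sections (my choice of mechanism; the slide shows only pseudocode): statements with no dependence between them are placed in separate sections so they may execute on different CPUs.

/* Sketch: functional parallelism.
   a and b are independent; m and s are independent of each other,
   but both need a and b; v needs both m and s. */
#include <stdio.h>

int main(void) {
    double a, b, m, s, v;

    #pragma omp parallel sections       /* statements 1 and 2 concurrently */
    {
        #pragma omp section
        a = 2;
        #pragma omp section
        b = 3;
    }

    #pragma omp parallel sections       /* statements 3 and 4 concurrently */
    {
        #pragma omp section
        m = (a + b) / 2;
        #pragma omp section
        s = (a * a + b * b) / 2;
    }

    v = s - m * m;                      /* statement 5 must wait for m and s */
    printf("mean = %g, variance = %g\n", m, v);   /* 2.5 and 0.25 */
    return 0;
}

Only two statements ever run at once here, which is why the slide says the speedup is limited by the number of concurrent sub-tasks.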

Page 17: Why Parallel/Distributed Computing

Pipelining

Divide a process into stages
Produce several items simultaneously (a sketch follows this list)
Speedup: limited by the number of concurrent sub-tasks = the number of stages in the pipeline
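A rough sketch of a three-stage pipeline using OpenMP task dependences (OpenMP 4.0 or later; the stage functions are made-up placeholders, not from the slides): each stage handles one item at a time, but while stage 2 works on item i, stage 1 can already start item i+1, so up to three items are in flight.

/* Sketch: 3-stage software pipeline over N items.
   t1, t2, t3 serialize each stage; the s1[i]/s2[i] dependences pass
   item i from one stage to the next. */
#include <stdio.h>

#define N 8

static int stage1(int x) { return x + 1; }     /* placeholder stages */
static int stage2(int x) { return x * 2; }
static int stage3(int x) { return x - 3; }

int main(void) {
    int in[N], s1[N], s2[N], out[N];
    int t1 = 0, t2 = 0, t3 = 0;                /* one token per stage */
    for (int i = 0; i < N; i++) in[i] = i;

    #pragma omp parallel
    #pragma omp single
    for (int i = 0; i < N; i++) {
        #pragma omp task depend(inout: t1) depend(out: s1[i])
        s1[i] = stage1(in[i]);

        #pragma omp task depend(inout: t2) depend(in: s1[i]) depend(out: s2[i])
        s2[i] = stage2(s1[i]);

        #pragma omp task depend(inout: t3) depend(in: s2[i])
        out[i] = stage3(s2[i]);
    }                                          /* barrier waits for all tasks */

    for (int i = 0; i < N; i++)
        printf("%d ", out[i]);                 /* 2*(i+1) - 3 = -1 1 3 5 ... */
    printf("\n");
    return 0;
}

With S stages, at most S items are processed simultaneously, matching the slide's point that the speedup is bounded by the number of pipeline stages.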