
Page 1: Parallel Computing Systems, Part I: Introduction

©2003 Dror Feitelson

Dror Feitelson
Hebrew University

Page 2: Topics

• Overview of the field

• Architectures: vectors, MPPs, SMPs, and clusters

• Networks and routing

• Scheduling parallel jobs

• Grid computing

• Evaluating performance

Page 3: Today (and next week?)

• What is parallel computing

• Some history

• The Top500 list

• The fastest machines in the world

• Trends and predictions

Page 4: What is a Parallel System?

In particular, what is the difference between parallel and distributed computing?

Page 5: What is a Parallel System?

Chandy: it is related to concurrency.

• In distributed computing, concurrency is part of the problem.

• In parallel computing, concurrency is part of the solution.

Page 6: Distributed Systems

• Concurrency because of physical distribution
  – Desktops of different users
  – Servers across the Internet
  – Branches of a firm
  – Central bank computer and ATMs

• Need to coordinate among autonomous systems

• Need to tolerate failures and disconnections

Page 7: Parallel Systems

• High-performance computing: solve problems that are too big for a single machine
  – Get the solution faster (weather forecast)
  – Get a better solution (physical simulation)
• Need to parallelize the algorithm
• Need to control overhead
• Can we assume a friendly system?

Page 8: The Convergence

Use distributed resources for parallel processing:

• Networks of workstations – use available desktop machines within an organization
• Grids – use available resources (servers?) across organizations
• Internet computing – use personal PCs across the globe (SETI@home)

Page 9: Some History

Page 10: Early HPC

• Parallel systems in academia/research
  – 1974: C.mmp
  – 1974: Illiac IV
  – 1978: Cm*
  – 1983: Goodyear MPP

Page 11: Illiac IV

• 1974
• SIMD: all processors do the same
• Numerical calculations at NASA
• Now in the Boston Computer Museum

Page 12: The Illiac IV in Numbers

• 64 processors arranged as an 8×8 grid
• Each processor has 10^4 ECL transistors
• Each processor has 2K 64-bit words (total is 8 Mbit)
• Arranged in 2^10 boards
• Packed in 16 cabinets
• 500 Mflops peak performance
• Cost: $31 million
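
The total memory figure is just the product of the per-processor numbers:

$$64 \text{ processors} \times 2048 \text{ words} \times 64 \text{ bits/word} = 8{,}388{,}608 \text{ bits} \approx 8 \text{ Mbit}$$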

Page 13: Sustained vs. Peak

• Peak performance: product of clock rate and number of functional units (a rate that the vendor guarantees will not be exceeded; worked example below)
• Sustained rate: what you actually achieve on a real application
• Sustained is typically much lower than peak
  – Application does not require all functional units
  – Need to wait for data to arrive from memory
  – Need to synchronize
  – Best results are for dense matrix operations (Linpack)
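
A worked example, using the Cray 1 numbers cited later in these slides (80 MHz clock, 160 Mflops peak), under the assumption that an add pipeline and a multiply pipeline each deliver one result per cycle:

$$\text{peak} = 80 \times 10^6 \ \tfrac{\text{cycles}}{\text{s}} \times 2 \ \tfrac{\text{flops}}{\text{cycle}} = 160 \ \text{Mflops}$$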

Page 14: Early HPC

• Parallel systems in academia/research
  – 1974: C.mmp
  – 1974: Illiac IV
  – 1978: Cm*
  – 1983: Goodyear MPP
• Vector systems by Cray and Japanese firms
  – 1976: Cray 1 rated at 160 Mflops peak
  – 1982: Cray X-MP, later Y-MP, C90, …
  – 1985: Cray 2, NEC SX-2

Page 15: Cray's Achievements

• Architectural innovations
  – Vector operations on vector registers
  – All memory is equally close: no cache
  – Trade-off between accuracy and speed
• Packaging
  – Short and equally long wires
  – Liquid cooling systems
• Style

Page 16: Vector Supercomputers

• Vector registers store whole vectors of values for fast access
• Vector instructions operate on whole vectors of values
  – Overhead of instruction decode is paid only once per vector
  – Pipelined execution of the instruction on the vector elements: one result per clock tick (at least after the pipeline is full)
  – Possible to chain vector operations: start feeding the second functional unit before the first one finishes (see the sketch below)
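
As a rough illustration (hypothetical C, not from the slides): a SAXPY-style loop is the classic pattern for vector execution, with the multiply pipeline chained into the add pipeline.

#include <stddef.h>

/* saxpy: y[i] = a*x[i] + y[i]. On a vector machine this becomes
 * roughly:
 *   V1 <- load x;  V2 <- load y
 *   V3 <- a * V1          (multiply pipeline)
 *   V4 <- V3 + V2         (add pipeline, chained to the multiply)
 *   store V4 -> y
 * One instruction decode covers a whole vector register (64
 * elements on the Cray 1), and chaining lets the add start
 * consuming multiply results before the multiply has finished. */
void saxpy(size_t n, double a, const double *x, double *y) {
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}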

Page 17: Cray 1

• 1975
• 80 MHz clock
• 160 Mflops peak
• Liquid cooling
• World's most expensive love seat: power supply and cooling under the seat
• Available in red, blue, black…
• No operating system

Page 18: Cray 1 Wiring

• Round configuration for small and uniform distances
• Longest wire: 4 feet
• Wires connected manually by extra-small engineers

Page 19: Cray X-MP

• 1982
• 1 Gflop
• Multiprocessor with 2 or 4 Cray 1-like processors
• Shared memory

Page 20: Cray X-MP

Page 21: Cray 2

• 1985
• Smaller and more compact than the Cray 1
• 4 (or 8) processors
• Total-immersion liquid cooling

Page 22: Cray Y-MP

• 1988

• 8 proc’s

• Achieved 1 Gflop

Page 23: Cray Y-MP – Opened

Page 24: Cray Y-MP – From Back

Power supply and cooling

Page 25: Cray C90

• 1992

• 1 Gflop per processor

• 8 or more processors

Page 26: The MPP Boom

• 1985: Thinking Machines introduces the Connection Machine CM-1
  – 16K single-bit processors, SIMD
  – Followed by CM-2, CM-200
  – Similar machines by MasPar
• Mid '80s: hypercubes become successful
• Also: Transputers used as building blocks
• Early '90s: big companies join
  – IBM, Cray

Page 27: SIMD Array Processors

• '80s favorites
  – Connection Machine
  – MasPar
• Very many single-bit processors with attached memory – proprietary hardware
• Single control unit: everything is totally synchronized (SIMD = single instruction, multiple data)
• Massive parallelism even with "correct counting" (i.e., dividing the single-bit processor count by 32)

Page 28: Connection Machine CM-2

• Cube of 64K proc's
• Acts as a backend
• Hypercube topology
• Data vault for parallel I/O

Page 29: Hypercubes

• Early '80s: Caltech 64-node Cosmic Cube
• Mid to late '80s: commercialized by several companies
  – Intel iPSC, iPSC/2, iPSC/860
  – nCUBE, nCUBE 2 (later turned into a VoD server…)
• Early '90s: replaced by mesh/torus
  – Intel Paragon – i860 processors
  – Cray T3D, T3E – Alpha processors

Page 30: Transputers

• A microprocessor with built-in support for communication
• Programmed using Occam
• Used in Meiko and other systems

PAR
  SEQ
    x := 13
    c ! x
  SEQ
    c ? y
    z := y    -- z is 13

Synchronous communication: an assignment across processes (compare the MPI version below)
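
For comparison, the same rendezvous expressed with MPI's synchronous send (a hypothetical C sketch, not part of the original slides):

#include <mpi.h>
#include <stdio.h>

/* Mirrors the Occam example: "c ! x" becomes a synchronous send,
 * "c ? y" a receive. MPI_Ssend completes only when the matching
 * receive has started, so the two processes meet at the transfer. */
int main(int argc, char **argv) {
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) {
        int x = 13;
        MPI_Ssend(&x, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        int y;
        MPI_Recv(&y, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("z is %d\n", y);   /* prints: z is 13 */
    }
    MPI_Finalize();
    return 0;
}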

Page 31: Attack of the Killer Micros

• Commodity microprocessors advance at a faster rate than vector processors
• Takeover point was around the year 2000
• Even before that, using many together could provide lots of power
  – 1992: TMC uses SPARC in CM-5
  – 1992: Intel uses i860 in Paragon
  – 1993: IBM SP uses RS/6000, later PowerPC
  – 1993: Cray uses Alpha in T3D
  – Berkeley NoW project

Page 32: Connection Machine CM-5

• 1992
• SPARC-based
• Fat-tree network
• Dominant in the early '90s
• Featured in Jurassic Park
• Support for gang scheduling!

Page 33: Intel Paragon

• 1992
• 2 i860 proc's per node:
  – Compute
  – Communication
• Mesh interconnect with spiffy display

Page 34: Cray T3D/T3E

• 1993 – Cray T3D

• Uses commodity microprocessors (DEC Alpha)

• 3D Torus interconnect

• 1995 – Cray T3E

Page 35: IBM SP

• 1993
• 16 RS/6000 processors per rack
• Each runs AIX (full Unix)
• Multistage network
• Flexible configurations
• First large IUCC machine

Page 36: Berkeley NoW

• The building is the computer

• Just need some glue software…

Page 37: Not Everybody is Convinced…

• Japan's computer industry continues to build vector machines
• NEC – SX series of supercomputers
• Hitachi – SR series of supercomputers
• Fujitsu – VPP series of supercomputers
• Albeit with less style

Page 38: Fujitsu VPP700

Page 39: NEC SX-4

Page 40: More Recent History

• 1994–1995 slump
  – Cold War is over
  – Thinking Machines files for Chapter 11
  – Kendall Square Research (KSR) files for Chapter 11
• Late '90s much better
  – IBM, Cray retain the parallel machine market
  – Later also SGI, Sun, especially with SMPs
  – ASCI program is started
• 21st century: clusters take over
  – Based on SMPs

Page 41: SMPs

• Machines with several CPUs

• Initially small scale: 8-16 processors

• Later achieved large scale of 64-128 processors

• Global shared memory accessed via a bus

• Hard to scale further due to shared memory and cache coherence

Page 42: SGI Challenge

• 1 to 16 processors

• Bus interconnect

• Dominated low end of Top500 list in mid ’90s

• Not only graphics…

Page 43: SGI Origin

An Origin 2000 installed at IUCC

• MIPS processors

• Remote memory access

Page 44: Architectural Convergence

• Shared memory used to be uniform (UMA)
  – Based on a bus or crossbar
  – Conventional load/store operations
• Distributed memory used message passing
• Newer machines support remote memory access
  – Nonuniform (NUMA): access to remote memory costs more
  – Put/get operations (but handled by the NIC; sketched below)
  – Cray T3D/T3E, SGI Origin 2000/3000
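
A minimal sketch of the put/get style, using the OpenSHMEM API as a stand-in for the Cray SHMEM library that accompanied the T3D/T3E (hypothetical code, not from the slides):

#include <shmem.h>
#include <stdio.h>

long slot = -1;   /* symmetric variable: exists on every PE */

/* Each PE writes into its right neighbor's memory with a one-sided
 * put; the target executes no receive call -- the remote write is
 * handled by the network interface. */
int main(void) {
    shmem_init();
    int me = shmem_my_pe();
    int npes = shmem_n_pes();
    long msg = me;

    shmem_long_put(&slot, &msg, 1, (me + 1) % npes);
    shmem_barrier_all();   /* wait until all puts have landed */

    printf("PE %d: slot = %ld\n", me, slot);
    shmem_finalize();
    return 0;
}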

Page 45: The ASCI Program

• 1996: nuclear test ban leads to need for simulation of nuclear explosions

• Accelerated Strategic Computing Initiative: Moore’s law not fast enough…

• Budget of a billion dollars

Page 46: The Vision

[Diagram: performance vs. time. ASCI requirements rise faster than market-driven progress; a "path forward" closes the gap via technology transfer.]

Page 47: ASCI Milestones

• 1996 – ASCI Red: 1 TF, Intel
• 1998 – ASCI Blue Mountain: 3 TF
• 1998 – ASCI Blue Pacific: 3 TF
• 2001 – ASCI White: 10 TF
• 2003 – ASCI Purple: 30 TF?

So far, two thirds delivered.

Page 48: The ASCI Red Machine

• 9260 processors – Pentium Pro 200

• Arranged as 4-way SMPs in 86 cabinets

• 573 GB memory total

• 2.25 TB disk space total

• 2 miles of cables

• 850 KW peak power consumption

• 44 tons (+300 tons air conditioning equipment)

• Cost: $55 million

Page 49: Clusters vs. MPPs

• Mix-and-match approach
  – PCs/SMPs/blades used as processing nodes
  – Fast switched network for interconnect
  – Linux on each node
  – MPI for software development
  – Something for management
• Lower cost to set up
• Non-trivial to operate effectively

Page 50: SMP Nodes

• PCs, workstations, or servers with several CPUs

• Small scale (4-8) used as nodes in MPPs or clusters

• Access to shared memory via shared L2 cache

• SMP support (cache coherence) built into modern microprocessors

Page 51: Myrinet

• 1995
• Switched gigabit LAN
  – As opposed to Ethernet, which is a broadcast medium
• Programmable NIC
  – Offloads communication operations from the CPU
• Allows clusters to achieve the communication rates of MPPs
• Very expensive
• Later: gigabit Ethernet


Page 53: Blades

• PCs/SMPs require resources
  – Floor space
  – Cables for interconnect
  – Power supplies and fans
• This is meaningful if you have thousands
• Blades provide dense packaging
• With vertical mounting, get < 1U on average
• The hot new thing in 2002

Page 54: SunFire Servers

16 servers in a rack-mounted box

Used to be called “single-board computers” in the ’80s (Makbilan)

Page 55: The Cray Name

• 1972: Cray Research founded
  – Cray 1, X-MP, Cray 2, Y-MP, C90…
  – From 1993: MPPs T3D, T3E
• 1989: Cray Computer founded
  – GaAs efforts; closed
• 1996: SGI acquires Cray Research
  – Attempt to merge T3E and Origin
• 2000: sold to Tera
  – Use the name to bolster the MTA
• 2002: Cray sells the Japanese NEC SX-6
• 2002: announces the new X1 supercomputer

Page 56: Vectors are not Dead!

• 1994: Cray T90
  – Continues the Cray C90 line
• 1996: Cray J90
  – Continues the Cray Y-MP line
• 2000: Cray SV1
• 2002: Cray X1
  – Only "Big Iron" company left

Page 57: Cray J90

• 1996

• Very popular continuation of Y-MP

• 8, then 16, then 32 processors

• One installed at IUCC

Page 58: Cray X1

• 2002
• Up to 1024 nodes
• 4 custom vector processors per node
• 12.8 Gflops peak each
• Torus interconnect

Page 59: Confused?

Page 60: The Top500 List

• List of the 500 most powerful computer installations in the world
• Separates academic chic from real impact
• Measured using Linpack
  – Dense matrix operations
  – Might not be representative of real applications


Page 65: The Competition

How to achieve a rank:

• Few vector processors
  – Maximize power per processor
  – High efficiency
• Many commodity processors
  – Ride the technology curve
  – Power in numbers
  – Low efficiency

Page 66: Vector Programming

• Conventional Fortran
• Automatic vectorization of loops by the compiler (example below)
• Autotasking uses processors that happen to be available at runtime to execute chunks of loop iterations
• Easy for application writers
• Very high efficiency
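
As a rough illustration (hypothetical C, rather than the Fortran named above): the compiler vectorizes loops whose iterations are independent, while a loop-carried dependence blocks vectorization.

/* Independent iterations: a vectorizing compiler turns the body into
 * vector instructions, and autotasking-style parallelization can
 * split the iteration range across whatever processors are free. */
void smooth(int n, const double *restrict in, double *restrict out) {
    for (int i = 1; i < n - 1; i++)
        out[i] = (in[i-1] + in[i] + in[i+1]) / 3.0;
}

/* A loop-carried dependence: each iteration needs the previous
 * one's result, so this loop cannot be vectorized as written. */
void prefix(int n, double *a) {
    for (int i = 1; i < n; i++)
        a[i] += a[i-1];
}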

Page 67: MPP Programming

• Library added to a programming language
  – MPI for distributed memory
  – OpenMP for shared memory
• Applications need to be partitioned manually (see the sketch below)
• Many possible efficiency losses
  – Fragmentation in allocating processors
  – Stalls waiting for memory and communication
  – Imbalance among threads
• Hard for programmers
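
A minimal sketch of the manual partitioning MPI requires (hypothetical C, not from the slides): the programmer, not the compiler, decides which block of the work each process owns.

#include <mpi.h>
#include <stdio.h>

#define N 1000000

/* Each process sums its own block of the index space; a reduction
 * combines the partial results. Choosing the block boundaries is
 * exactly the manual partitioning the slide refers to. */
int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    long lo = (long)N * rank / size;         /* my block: [lo, hi) */
    long hi = (long)N * (rank + 1) / size;
    double local = 0.0, total = 0.0;
    for (long i = lo; i < hi; i++)
        local += 1.0 / (double)(i + 1);

    MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("harmonic sum H_%d = %f\n", N, total);
    MPI_Finalize();
    return 0;
}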

Page 68: Also National Competition

• Japan Inc. is "more Cray than Cray"
  – Computers based on few but powerful proprietary vector processors
  – Numerical Wind Tunnel at rank 1 from 1993 to 1995
  – CP-PACS at rank 1 in 1996
  – Earth Simulator at rank 1 in 2003
• US industry switched to commodity microprocessors
  – Even Cray did
  – ASCI machines at rank 1 in 1997–2002

Page 69: Vectors vs. MPPs – 1994

Feitelson, Int. J. High-Perf. Comput. App., 1999

Page 70: Vectors vs. MPPs – 1997

Page 71: Vectors vs. MPPs – 1997

Page 72: The Current Situation

Page 73: Real Usage

• Control functions for telecom companies
• Reservoir modeling for oil companies
• Graphic rendering for Hollywood
• Financial modeling for Wall Street
• Drug design for pharmaceuticals
• Weather prediction
• Airplane design for Boeing and Airbus
• Hush-hush activities


Page 75: The Earth Simulator

• Operational in late 2002

• Top rank in Top500 list

• Result of 5-year design and implementation effort

• Equivalent power to top 15 US machines (including all ASCI machines)

• Really big


Page 81: The Earth Simulator in Numbers

• 640 nodes

• 8 vector processors per node, 5120 total

• 8 Gflops per processor, 40 Tflops total

• 16 GB memory per node, 10 TB total

• 2800 km of cables

• 320 cabinets (2 nodes each)

• Cost: $350 million
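
The totals are consistent with the per-node figures:

$$640 \times 8 = 5120 \ \text{processors}, \qquad 5120 \times 8 \ \text{Gflops} \approx 40 \ \text{Tflops}, \qquad 640 \times 16 \ \text{GB} = 10.24 \ \text{TB} \approx 10 \ \text{TB}$$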

Page 82: Trends


Page 84: Exercise

• Look at 10 years of Top500 lists and try to say something non-trivial about trends

• Are there things that grow?

• Are there things that stay the same?

• Can you make predictions?

Page 85: Distribution of Vendors – 1994

Feitelson, Int. J. High-Perf. Comput. App., 1999

Page 86: Distribution of Vendors – 1997

Page 87: IBM in the Lists

Arrows are the ANL SP1 with 128 processors

Rank doubles each year

Page 88: Minimal Parallelism

Page 89: Min vs. Max

Page 90: Power with Time

• Rmax of the last machine doubles each year (this is 8-fold in three years)
• Degree of parallelism doubles every three years
• So the power of each processor increases 4-fold in three years (= doubles in 18 months)
• Which is Moore's Law…
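
Making the arithmetic explicit: over three years Rmax grows by $2^3 = 8$ while parallelism grows by $2$, so per-processor power grows by

$$\frac{2^3}{2} = 4 = 2^2 \ \text{per three years} \quad\Longrightarrow\quad \text{doubling every } 18 \ \text{months}$$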

Page 91: Distribution of Power

• Rank of a given machine doubles each year
• Power of the rank-500 machine doubles each year
• So the rank-250 machine this year has double the power of the rank-500 machine this year
• And the rank-125 machine has double the power of the rank-250 machine
• In short, power decreases polynomially with rank: a power law, as shown below
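
In symbols: if halving the rank doubles the power, then

$$Rmax(rank) \propto \frac{1}{rank}, \qquad \log Rmax = \log Rmax_1 - \log(rank)$$

a straight line of slope $-1$ on log-log axes. The fits on the next slide measure exactly this slope, and find it drifting below 1 over the years.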

Pages 92–94: Power and Rank (figures)

Page 95: Power and Rank

The slope is becoming flatter:

$$\log(Rmax_{rank}) = \log(Rmax_1) - \alpha \cdot \log(rank)$$

Fitted slope α by year:

1994: 0.978
1995: 0.865
1996: 0.839
1997: 0.816
1998: 0.777
1999: 0.761
2000: 0.800
2001: 0.753
2002: 0.746

Page 96: Machine Ages in Lists

Page 97: New Machines

Page 98: Industry Share

Page 99: Vector Share

Page 100: Summary

Invariants of the last few years:

• Power grows exponentially with time

• Parallelism grows exponentially with time

• But maximal usable parallelism is ~10000

• Power drops polynomially with rank

• Age in the list drops exponentially

• About 300 new machines each year

• About 50% of machines in industry

• About 15% of power due to vector processors