CSC 364/664 Parallel Computation Fall 2003 Burg/Miller/Torgersen

Chapter 1: Parallel Computers

Page 1

CSC 364/664 Parallel Computation Fall 2003 Burg/Miller/Torgersen

Chapter 1: Parallel Computers

Page 2

Concurrency vs. True Parallelism

Concurrency is used in systems where more than one user is using a resource (e.g., a CPU or database information) at the same time

In true parallelism, multiple processors are working simultaneously on one application problem

Page 3

Flynn’s Taxonomy – Classification by Control Mechanism

A classification of parallel systems from a “flow of control” perspective

SISD – single instruction, single data
SIMD – single instruction, multiple data
MISD – multiple instructions, single data
MIMD – multiple instructions, multiple data

Page 4

SISD

Single instruction, single data

Sequential programming with one processor, just like you’ve always done

Page 5

SIMD

Single instruction, multiple data

One control unit issuing the same instruction to multiple CPUs that operate simultaneously on their own portions of data

Lock-step, synchronized

Vector and matrix computation lend themselves to an SIMD implementation

Examples of SIMD computers: Illiac IV, MPP, DAP, CM-2, and MasPar MP-2
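The lock-step model can be sketched in miniature: one “control unit” broadcasts a single operation, and every “processor” applies it to its own slice of data in the same step. This is an illustrative simulation only; names like `simd_step` are hypothetical, not a real SIMD API.

```python
# Minimal simulation of SIMD lock-step execution: one instruction,
# applied (conceptually) simultaneously to every processor's data slice.
# All names here are illustrative, not a real SIMD API.

def simd_step(instruction, slices):
    """The control unit broadcasts one instruction; each 'processor'
    applies it to its own portion of the data."""
    return [instruction(s) for s in slices]

# Four "processors", each holding its own portion of a vector.
data = [[1, 2], [3, 4], [5, 6], [7, 8]]

# Instruction: multiply every element by 10.
scaled = simd_step(lambda part: [10 * x for x in part], data)
print(scaled)  # [[10, 20], [30, 40], [50, 60], [70, 80]]
```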

Page 6

MIMD

Multiple instructions, multiple data

Each processor “doing its own thing”

Processors synchronize either through passing messages or writing values to shared memory addresses

Subcategories:
SPMD – single program, multiple data (MPI on a Linux cluster)
MPMD – multiple program, multiple data (PVM)

Examples of MIMD computers – BBN Butterfly, Intel iPSC/1 and iPSC/2, IBM SP, SP2
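The SPMD subcategory can be sketched as follows: the same program text runs on every processor, and each uses its rank to select its share of the data. Threads stand in for cluster nodes here; this is an illustrative sketch, not MPI code.

```python
# SPMD sketch: every "processor" runs the *same* program and branches on
# its rank, MPI-style. Threads stand in for processors; all names here
# are illustrative, not the MPI API.
import threading

NPROCS = 4
data = list(range(100))            # the full problem
partial = [0] * NPROCS             # one result slot per "processor"

def program(rank):
    # Identical program text on every processor; the rank selects the data.
    chunk = data[rank::NPROCS]     # this rank's own portion of the data
    partial[rank] = sum(chunk)

threads = [threading.Thread(target=program, args=(r,)) for r in range(NPROCS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

total = sum(partial)               # combine, as rank 0 might after a reduce
print(total)  # 4950
```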

Page 7

MISD

Multiple instructions, single data

Doesn’t really exist, unless you consider pipelining an MISD configuration

Page 8

Comparison of SIMD and MIMD

It takes a specially-designed computer to do SIMD computing, since one control unit controls multiple processors.

SIMD requires only one copy of a program. MIMD systems have a copy of the program and operating system at each processor.

SIMD computers quickly become obsolete. MIMD systems can be pieced together from the most up-to-date components available.

Page 9

Classification by Communication Mechanism

Shared-address-space systems
“Multiprocessors”, each with its own control unit
Virtual memory makes all memory addresses look like they come from one consistent space, but they don’t necessarily
Processors communicate with reads and writes

Message-passing systems
“Multicomputers”
Separate processors and separate memory addresses
Processors communicate with message passing

Page 10

Shared Memory Address Space

Interprocess communication is done in the memory interface through reads and writes.

A virtual memory address maps to a real address.

Different processors may have memory locally attached to them.

Access could be needed to a processor’s own memory, or to the memory attached to a different processor.

Different instances of memory access could take different amounts of time. Collisions are possible.

UMA (i.e., shared memory) vs. NUMA (i.e., distributed shared memory)
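The read/write communication style described above can be sketched with threads sharing one address space; the lock below is a stand-in for whatever the hardware or runtime does about colliding accesses. All names are illustrative.

```python
# Shared-address-space communication sketch: "processors" (threads here)
# communicate purely by reading and writing a shared location.
# The lock stands in for hardware/runtime collision handling.
import threading

shared = {"counter": 0}
lock = threading.Lock()

def worker(increments):
    for _ in range(increments):
        with lock:                   # serialize conflicting writes
            shared["counter"] += 1   # communicate via a shared address

threads = [threading.Thread(target=worker, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(shared["counter"])  # 4000
```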

Page 11

Message Passing System

Interprocess communication is done at the program level using sends and receives.

Reads and writes refer only to a processor’s local memory.

Data can be packed into long messages before being sent, to compensate for latency.

Global scheduling of messages can help avoid message collisions.
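The packing point can be made concrete with the usual linear cost model for a message: a fixed startup latency plus a per-byte term, so packing many small items into one long message pays the startup cost only once. The constants below are assumed for illustration, not measurements.

```python
# Message-passing sketch plus a standard latency model:
# send time ~ t_startup + n_bytes * t_per_byte.
# The constants are illustrative, not measured.
from queue import Queue

t_startup, t_per_byte = 100e-6, 1e-9   # 100 us startup, 1 ns/byte (assumed)

def send_time(n_bytes, n_messages=1):
    return n_messages * t_startup + n_bytes * t_per_byte

# 1000 separate 8-byte sends vs one packed 8000-byte send:
separate = send_time(8) * 1000
packed = send_time(8000)
assert packed < separate   # packing amortizes the per-message startup

# The send/receive primitives themselves, simulated with a queue:
channel = Queue()
channel.put(list(range(1000)))   # "send" one long, packed message
received = channel.get()         # "receive"
print(len(received))  # 1000
```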

Page 12

Basic Architecture Terms – Clock Speed and Bandwidth

Clock speed of a processor – the maximum number of times per second that the device can say something new

Bandwidth of a transmission medium (i.e., telephone line, cable line, etc.) is defined as the maximum rate at which the medium can change a signal. Bandwidth is measured in cycles per second or Hertz. Bandwidth is determined by the physical properties of the transmission medium, including the material of which it is composed.

Page 13

Basic Architecture Terms – Clock Speed and Bandwidth

Data rate is a measure of the amount of data that can be sent across a transmission medium per unit time. Data rate is determined by two things (1) the bandwidth, and (2) the potential number of different things that can be conveyed each time the signal changes (which, in the case of a bus, is based on the number of parallel data lines).
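As a worked example of this definition, using the 33 MHz, 32-bit PCI figures that appear later in these slides (bits per signal change = number of parallel data lines):

```python
# Worked example: data rate = bandwidth (signal changes per second)
# times the number of bits conveyed per change. For a parallel bus,
# bits per change is the number of data lines.

def data_rate_bits_per_sec(bandwidth_hz, bits_per_change):
    return bandwidth_hz * bits_per_change

# A 33 MHz bus with 32 parallel data lines (original PCI):
pci = data_rate_bits_per_sec(33_000_000, 32)
print(pci)       # 1056000000 bits/sec
print(pci // 8)  # 132000000 bytes/sec, i.e. ~132 MB/sec
```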

Page 14

Basic Architecture Terms -- Bus

A bus is a communication medium to which all processors are connected.

Only one communication at a time is allowed on the bus.

Only one step from any source to any destination.

Bus data rate (sometimes loosely called “bandwidth”) is defined as clock speed times the number of bits transmitted at each clock pulse.

A bus is low-cost, but you can’t have very many processors attached to it.

Page 15

Bus on a Motherboard

The bus transports data among the CPU, memory, and other components.

It consists of electrical circuits called traces, together with adapters or expansion cards.

There’s a main motherboard bus, and then buses for the CPU, memory, SCSI connections, and USB.

Page 16

Types of Buses

Original IBM PC bus – 8-bit parallel, 4.77 MHz clock speed

IBM AT, 1982, introduced the ISA (Industry Standard Architecture) bus, 16-bit parallel, with expansion slots, still compatible with 8-bit; 8 MHz clock speed

IBM PS/2, MCA (Micro Channel Architecture) bus, 32-bit parallel, but not backward compatible; 10 MHz clock speed; didn’t catch on

Page 17

Types of Buses

Compaq and other IBM rivals introduced EISA (Extended Industry Standard Architecture) bus in 1988, 32-bit parallel, 8.2 MHz clock speed; didn’t catch on

VL-Bus (VESA Local Bus), 32-bit parallel, close to the clock speed of the CPU, tied directly to the CPU

The trend moved to specialized buses with higher clock speeds, closer to the CPU’s clock speed, and separate from the system bus – e.g., PCI (Peripheral Component Interconnect)

Page 18

PCI bus

The PCI bus can exist side-by-side with the ISA bus and system bus; in this sense it’s a “local” bus.

Originally 33 MHz, 32 bits. PCI-X is 133 MHz, 64-bit, for a 1 GB/sec data transfer rate.

Supports Plug and Play.

See http://computer.howstuffworks.com/pci.htm

Page 19

Ethernet Bus-Based Network

All nodes branch off a common line.

Each device has an Ethernet address, also known as a MAC address.

All computers receive all data transmissions (in packets). They look to see if the packet is addressed to them, and read it only if it is.

When a computer wants to transmit data, it waits until the line is free.

The CSMA/CD protocol is used (carrier-sense multiple access with collision detection).

Page 20

Basic Architecture Terms -- Ethernet

Ethernet is actually an OSI layer 2 communication protocol. It does not dictate the type of connectivity – could be copper, fiber, or wireless.

Today’s Ethernet is full-duplex, i.e., it has separate lines for send and receive.

IEEE Standard 802.3. Ethernet comes in 10, 100, and 1000 Mb/sec (1 Gb/sec) speeds.

See http://computer.howstuffworks.com/ethernet.htm

Page 21

Basic Architecture Terms -- Hub

Hubs connect computers in a network.

They operate using a broadcast model. When n computers are connected to a hub, hubs simply pass through all network traffic to each of the n computers.

Page 22

Basic Architecture Terms -- Switch

Unlike hubs, switches can look at data packets as they are received, determine the source and destination device, and forward the packet appropriately.

By delivering messages only to the device that the packet was intended for, switches conserve network bandwidth.

See http://howstuffworks.com/lan-switch.htm

Page 23

Basic Architecture Terms -- Myrinet

Packet communication and switching technology, faster than Ethernet.

Myrinet offers a full-duplex 2+2 Gb/sec data rate and low latency. It is used in Linux clusters.

Only 16 of the nodes of WFU’s clusters are connected with Myrinet. The rest are connected with Ethernet, for cost reasons.

Page 24

Classification by Interconnection Network

Static network
Direct links between computers
A bus-based network can be static (if no switches are involved)
Examples include completely connected, line/ring, mesh, tree (regular and fat), and hypercube

Dynamic network
Uses switches
Connections change according to whether a switch is open or closed
Could be arranged in stages (multistage), e.g., an Omega network

Page 25

Hypercube

A d-dimensional hypercube has 2^d nodes.

Each node has a d-bit address.

Neighboring nodes differ in one bit.

Needs a routing algorithm. We’ll try one in class.
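One routing algorithm of the kind the slide mentions is dimension-ordered (e-cube) routing: at each hop, flip the lowest-order bit in which the current address differs from the destination. Since neighbors differ in exactly one bit, every hop is a legal link. A sketch:

```python
# E-cube (dimension-ordered) hypercube routing: repeatedly correct the
# lowest-order bit in which the current node's address differs from the
# destination's. Each correction is one hop along a legal hypercube link.

def ecube_route(src, dst):
    path = [src]
    node = src
    while node != dst:
        diff = node ^ dst       # bits still to be corrected
        lowest = diff & -diff   # lowest-order differing bit
        node ^= lowest          # traverse that dimension's link
        path.append(node)
    return path

# Route from node 000 to node 101 in a 3-cube:
print([format(n, "03b") for n in ecube_route(0b000, 0b101)])
# ['000', '001', '101']
```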

Page 26

Multistage Networks

See notes on Omega network from class.

Page 27

Properties of Network Communication

Diameter of a network – the number of links on a shortest path between the two most distant nodes (i.e., the maximum inter-node distance)

Bisection width of a network – the minimum number of links that must be cut to divide the network into 2 equal parts
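For a d-dimensional hypercube both properties have closed forms: diameter d (the farthest nodes differ in all d bits) and bisection width 2^(d-1) (cut the links between nodes whose top address bit is 0 and those where it is 1). The sketch below checks the diameter by brute-force breadth-first search:

```python
# Diameter and bisection width of a d-dimensional hypercube.
# Diameter is verified by BFS from every node; bisection width uses the
# closed form 2**(d-1) (one crossing link per pair across the top-bit cut).
from collections import deque

def hypercube_diameter(d):
    n = 1 << d
    def farthest(start):
        dist = {start: 0}
        q = deque([start])
        while q:
            u = q.popleft()
            for bit in range(d):
                v = u ^ (1 << bit)   # neighbors differ in exactly one bit
                if v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        return max(dist.values())
    return max(farthest(s) for s in range(n))

def hypercube_bisection_width(d):
    return 1 << (d - 1)

print(hypercube_diameter(4))         # 4
print(hypercube_bisection_width(4))  # 8
```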

Page 28

Properties of Network Communication

Message latency – time taken to prepare the message to be sent (software overhead)

Network latency – time taken for a message to pass through a network

Communication latency – total time taken to send a message, including message and network latency

Deadlock – occurs when packets cannot be forwarded because they are waiting for each other in a circular way

Page 29

Memory Hierarchy

Global memory
Local memory
Cache
Faster, but more expensive
Cache coherence must be maintained

Page 30

Communication Methods

Circuit switching
Packet switching
Wormhole routing

Page 31

Properties of a Parallel Program

Granularity
Speedup
Overhead
Efficiency
Cost
Scalability
Gustafson’s law
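Several of these properties have standard textbook formulas: speedup S = t_serial / t_parallel, efficiency E = S / p, and Gustafson's scaled speedup S(p) = p - f(p - 1) for serial fraction f. The workload numbers below are made up for illustration:

```python
# Standard definitions for three of the properties above.
# The timing numbers are illustrative, not measurements.

def speedup(t_serial, t_parallel):
    return t_serial / t_parallel

def efficiency(t_serial, t_parallel, p):
    return speedup(t_serial, t_parallel) / p

def gustafson_scaled_speedup(p, serial_fraction):
    # Gustafson's law: S(p) = p - f * (p - 1), f = serial fraction
    return p - serial_fraction * (p - 1)

# A job taking 100 s on one processor and 20 s on 8 processors:
print(speedup(100, 20))        # 5.0
print(efficiency(100, 20, 8))  # 0.625
print(gustafson_scaled_speedup(8, 0.05))
```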