TRANSCRIPT
CSC 364/664 Parallel Computation Fall 2003 Burg/Miller/Torgersen
Chapter 1: Parallel Computers
Concurrency vs. True Parallelism
Concurrency is used in systems where more than one user is using a resource at the same time (e.g., a CPU or database information).
In true parallelism, multiple processors work simultaneously on one application problem.
Flynn's Taxonomy – Classification by Control Mechanism
A classification of parallel systems from a "flow of control" perspective
SISD – single instruction, single data
SIMD – single instruction, multiple data
MISD – multiple instructions, single data
MIMD – multiple instructions, multiple data
SISD
Single instruction, single data
Sequential programming with one processor, just like you've always done
SIMD
Single instruction, multiple data
One control unit issuing the same instruction to multiple CPUs that operate simultaneously on their own portions of data
Lock-step, synchronized
Vector and matrix computation lend themselves to an SIMD implementation
Examples of SIMD computers: Illiac IV, MPP, DAP, CM-2, and MasPar MP-2
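The lock-step idea can be sketched in ordinary Python. This is only an illustration of the programming model, not how a real SIMD machine is built: the loop over elements stands in for processors that would all execute the same instruction simultaneously. The name `simd_step` is invented for this sketch.

```python
# SIMD sketch: ONE instruction stream, applied in lock-step to many
# data elements. On a real SIMD machine each element would be updated
# simultaneously by its own processor; here the loop stands in for
# that hardware parallelism.

def simd_step(instruction, data):
    """Apply a single instruction to every element of the data."""
    return [instruction(x) for x in data]

# The same instruction ("add 1") is issued once by the control unit
# and executed on every element at the same step.
data = [10, 20, 30, 40]
result = simd_step(lambda x: x + 1, data)   # [11, 21, 31, 41]
```

Vector and matrix operations fit this model exactly because every element undergoes the same operation.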
MIMD
Multiple instructions, multiple data
Each processor "doing its own thing"
Processors synchronize either through passing messages or writing values to shared memory addresses
Subcategories:
SPMD – single program, multiple data (MPI on a Linux cluster)
MPMD – multiple programs, multiple data (PVM)
Examples of MIMD computers: BBN Butterfly, iPSC/1 and iPSC/2, IBM SP and SP2
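The SPMD subcategory has a characteristic structure: every process runs the same program but branches on its rank (process id) to work on its own slice of the data. A minimal sketch, with the ranks simulated in a loop rather than launched as real MPI processes (`spmd_main` is an invented name):

```python
# SPMD sketch: the SAME program runs on every process; each rank
# selects its own portion of the data. This mirrors the structure of
# an MPI program, with ranks simulated here instead of spawned.

def spmd_main(rank, nprocs, data):
    """The single program: each rank sums its own contiguous slice."""
    chunk = len(data) // nprocs
    lo, hi = rank * chunk, (rank + 1) * chunk
    return sum(data[lo:hi])

data = list(range(8))                                  # global problem data
partials = [spmd_main(r, 4, data) for r in range(4)]   # 4 simulated ranks
total = sum(partials)                                  # combine partial results
```

In a real cluster run, the combining step would be a collective reduction rather than a local `sum`.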
MISD
Multiple instructions, single data
Doesn't really exist, unless you consider pipelining an MISD configuration
Comparison of SIMD and MIMD
It takes a specially designed computer to do SIMD computing, since one control unit controls multiple processors.
SIMD requires only one copy of a program. MIMD systems have a copy of the program and operating system at each processor.
SIMD computers quickly become obsolete. MIMD systems can be pieced together from the most up-to-date components available.
Classification by Communication Mechanism
Shared-address-space systems
"Multiprocessors", each with its own control unit
Virtual memory makes all memory addresses look like they come from one consistent space, but they don't necessarily
Processors communicate with reads and writes
Message-passing systems
"Multicomputers"
Separate processors and separate memory address spaces
Processors communicate with message passing
Shared Memory Address Space
Interprocess communication is done in the memory interface through reads and writes.
A virtual memory address maps to a real address.
Different processors may have memory locally attached to them.
Access could be needed to a processor's own memory, or to the memory attached to a different processor.
Different instances of memory access could take different amounts of time.
Collisions are possible.
UMA (i.e., shared memory) vs. NUMA (i.e., distributed shared memory)
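The shared-address-space style of communication can be sketched with Python threads, which share one address space the way the processors of a multiprocessor do. This is only a model of the programming style: the threads communicate purely by reading and writing the same memory location, with a lock guarding against the write collisions mentioned above.

```python
# Shared-address-space sketch: "processors" (threads) communicate by
# reading and writing the same memory location. The lock serializes
# conflicting writes, the software analogue of avoiding collisions.
import threading

shared = {"counter": 0}          # the shared memory location
lock = threading.Lock()

def worker(increments):
    for _ in range(increments):
        with lock:               # protect the read-modify-write
            shared["counter"] += 1

threads = [threading.Thread(target=worker, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# All communication happened through memory: shared["counter"] is 4000.
```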
Message Passing System
Interprocess communication is done at the program level using sends and receives.
Reads and writes refer only to a processor’s local memory.
Data can be packed into long messages before being sent, to compensate for latency.
Global scheduling of messages can help avoid message collisions.
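The same points can be sketched with explicit sends and receives. Here a queue stands in for the network link between two processors that share no memory, and several small values are packed into one long message to amortize latency, as described above (a real system would use MPI sends and receives):

```python
# Message-passing sketch: no shared memory; communication happens only
# through explicit send (put) and receive (get) over a "link". Packing
# many values into one message amortizes the per-message latency.
import queue
import threading

link = queue.Queue()             # stands in for the network link

def sender(values):
    link.put(values)             # one long message instead of many small sends

def receiver(results):
    message = link.get()         # one receive for the whole packed message
    results.append(sum(message))

results = []
t1 = threading.Thread(target=sender, args=([1, 2, 3, 4],))
t2 = threading.Thread(target=receiver, args=(results,))
t1.start(); t2.start()
t1.join(); t2.join()             # results now holds [10]
```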
Basic Architecture Terms – Clock Speed and Bandwidth
Clock speed of a processor – the maximum number of times per second that a device can say something new
Bandwidth of a transmission medium (e.g., a telephone line or cable line) is defined as the maximum rate at which the medium can change a signal. Bandwidth is measured in cycles per second, or Hertz. Bandwidth is determined by the physical properties of the transmission medium, including the material of which it is composed.
Data rate is a measure of the amount of data that can be sent across a transmission medium per unit time. Data rate is determined by two things: (1) the bandwidth, and (2) the potential number of different things that can be conveyed each time the signal changes (which, in the case of a bus, is based on the number of parallel data lines).
Basic Architecture Terms -- Bus
A bus is a communication medium to which all processors are connected.
Only one communication at a time is allowed on the bus.
Only one step from any source to any destination.
Bus data rate (sometimes loosely called "bandwidth") is defined as clock speed times the number of bits transmitted at each clock pulse.
A bus is low-cost, but you can't have very many processors attached to it.
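The data-rate formula is easy to check against the original PCI figures given later in this chapter (33 MHz clock, 32 data lines):

```python
# Bus data rate = clock speed x number of bits per clock pulse,
# converted to bytes per second.

def bus_data_rate(clock_hz, bits_per_pulse):
    """Data rate in bytes per second."""
    return clock_hz * bits_per_pulse // 8

# Original PCI: 33 MHz, 32 bits wide -> 132,000,000 bytes/sec, i.e.
# about 132 MB/s (the commonly quoted 133 MB/s uses the exact
# 33.33 MHz clock).
pci_rate = bus_data_rate(33_000_000, 32)
```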
Bus on a Motherboard
The bus transports data among the CPU, memory, and other components.
It consists of electrical circuits called traces, and adapters or expansion cards.
There's a main motherboard bus, and then buses for the CPU, memory, SCSI connections, and USB.
Types of Buses
Original IBM PC bus – 8-bit parallel, 4.77 MHz clock speed
The IBM AT (1984) introduced the ISA (Industry Standard Architecture) bus – 16-bit parallel with expansion slots, still compatible with 8-bit cards; 8 MHz clock speed
The IBM PS/2 introduced the MCA (Micro Channel Architecture) bus – 32-bit parallel, but not backward compatible; 10 MHz clock speed; didn't catch on
Compaq and other IBM rivals introduced the EISA (Extended Industry Standard Architecture) bus in 1988 – 32-bit parallel, 8.2 MHz clock speed; didn't catch on
VL-Bus (VESA Local Bus) – 32-bit parallel, close to the clock speed of the CPU, tied directly to the CPU
The trend moved to specialized buses with higher clock speeds, closer to the CPU's clock speed, and separate from the system bus – e.g., PCI (Peripheral Component Interconnect)
PCI bus
The PCI bus can exist side-by-side with the ISA bus and the system bus; in this sense it's a "local" bus.
Originally 33 MHz, 32 bits
PCI-X is 133 MHz, 64 bits, for a 1 GB/sec data transfer rate
Supports Plug and Play
See http://computer.howstuffworks.com/pci.htm
Ethernet Bus-Based Network
All nodes branch off a common line.
Each device has an Ethernet address, also known as a MAC address.
All computers receive all data transmissions (in packets). They look to see if the packet is addressed to them, and read it only if it is.
When a computer wants to transmit data, it waits until the line is free.
The CSMA/CD protocol is used (carrier-sense multiple access with collision detection).
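The control flow of CSMA/CD can be sketched as a toy round-based simulation. This captures only the logic described above: a station transmits when it believes the line is free, and if two stations transmit in the same round the collision is detected and both try again later. Real Ethernet adds randomized exponential backoff, which this sketch omits.

```python
# Toy round-based sketch of CSMA/CD. Each round, every station that
# still has a frame to send independently decides whether to try. If
# exactly one station transmits, the frame is delivered; if more than
# one transmits, that's a collision, detected by all, and everyone
# backs off to retry in a later round.
import random

def csma_cd(stations, rounds, rng):
    delivered = []
    waiting = list(stations)             # stations with a frame to send
    for _ in range(rounds):
        if not waiting:
            break
        trying = [s for s in waiting if rng.random() < 0.5]
        if len(trying) == 1:             # line held by exactly one sender
            delivered.append(trying[0])
            waiting.remove(trying[0])
        # len(trying) > 1 -> collision detected; no one succeeds
    return delivered

sent = csma_cd(["A", "B", "C"], rounds=200, rng=random.Random(1))
```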
Basic Architecture Terms -- Ethernet
Ethernet is actually an OSI layer 2 communication protocol. It does not dictate the type of connectivity – it could be copper, fiber, or wireless.
Today's Ethernet is full-duplex, i.e., it has separate lines for send and receive.
IEEE Standard 802.3
Ethernet comes in 10, 100, and 1000 Mb/sec (1 Gb/sec) speeds.
See http://computer.howstuffworks.com/ethernet.htm
Basic Architecture Terms -- Hub
Hubs connect computers in a network.
They operate using a broadcast model. When n computers are connected to a hub, the hub simply passes all network traffic through to each of the n computers.
Basic Architecture Terms -- Switch
Unlike hubs, switches can look at data packets as they are received, determine the source and destination device, and forward the packet appropriately.
By delivering messages only to the device that the packet was intended for, switches conserve network bandwidth.
See http://howstuffworks.com/lan-switch.htm
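The forwarding behavior described above can be sketched as a learning switch: record which port each source address was seen on, then forward a frame only to the port for its destination, flooding to all other ports only while the destination is still unknown. This is an illustrative model, not a real switch implementation.

```python
# Learning-switch sketch: the switch builds a table mapping source
# addresses to ports as frames arrive, so it can deliver later frames
# to exactly one port instead of broadcasting like a hub.

class Switch:
    def __init__(self, ports):
        self.ports = set(ports)
        self.table = {}                    # address -> port

    def handle(self, frame, in_port):
        src, dst = frame["src"], frame["dst"]
        self.table[src] = in_port          # learn where the sender lives
        if dst in self.table:
            return {self.table[dst]}       # deliver to one port only
        return self.ports - {in_port}      # unknown destination: flood

sw = Switch(ports=[1, 2, 3])
first = sw.handle({"src": "aa", "dst": "bb"}, in_port=1)   # flood: {2, 3}
sw.handle({"src": "bb", "dst": "aa"}, in_port=2)           # switch learns "bb"
second = sw.handle({"src": "aa", "dst": "bb"}, in_port=1)  # targeted: {2}
```

The second frame to "bb" consumes bandwidth on only one link, which is exactly how switches conserve network bandwidth relative to hubs.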
Basic Architecture Terms -- Myrinet
Myrinet is a packet communication and switching technology, faster than Ethernet.
It offers a full-duplex 2+2 Gb/sec data rate and low latency. It is used in Linux clusters.
Only 16 of the nodes of WFU's clusters are connected with Myrinet. The rest are connected with Ethernet, for cost reasons.
Classification by Interconnection Network
Static network
A bus-based network can be static (if no switches are involved)
Direct links between computers
Examples include completely connected, line/ring, mesh, tree (regular and fat), and hypercube
Dynamic network
Uses switches
Connections change according to whether a switch is open or closed
Could be arranged in stages (multistage), e.g., the Omega network
Hypercube
A d-dimensional hypercube has 2^d nodes.
Each node has a d-bit address.
Neighboring nodes differ in one bit.
Needs a routing algorithm. We'll try one in class.
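One standard hypercube routing algorithm (dimension-ordered, or "e-cube", routing, which may or may not be the one tried in class) follows directly from the address properties above: XOR the current address with the destination, and repeatedly flip the lowest-order bit where they differ. Each flip moves the message to a neighboring node, so the route length equals the number of differing bits.

```python
# E-cube routing in a d-dimensional hypercube: correct the differing
# address bits one at a time, lowest dimension first. Each corrected
# bit is one hop to a neighbor, so the path length is the Hamming
# distance between source and destination.

def ecube_route(src, dst, d):
    """Return the list of node addresses visited from src to dst."""
    path = [src]
    node = src
    while node != dst:
        diff = node ^ dst            # bits still to be corrected
        bit = diff & -diff           # lowest-order differing bit
        node ^= bit                  # hop to that neighbor
        path.append(node)
    return path

# In a 3-cube, routing from node 000 to node 101 visits 000 -> 001 -> 101.
route = ecube_route(0b000, 0b101, d=3)
```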
Multistage Networks
See notes on Omega network from class.
Properties of Network Communication
Diameter of a network – the minimum number of links between the two farthest nodes
Bisection width of a network – the number of links that must be cut to divide the network into two equal parts
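Both definitions can be checked on the hypercube from the previous slide, where the diameter is known to be d and the bisection width 2^(d-1). The sketch below computes the diameter by brute-force breadth-first search, and the bisection width by counting the links that cross between the two address halves:

```python
# Diameter and bisection width of a d-dimensional hypercube, computed
# from the definitions rather than the known formulas, so the formulas
# (diameter d, bisection width 2**(d-1)) can be verified.
from collections import deque

def neighbors(node, d):
    return [node ^ (1 << i) for i in range(d)]   # flip each address bit

def diameter(d):
    n = 2 ** d
    worst = 0
    for start in range(n):                       # BFS from every node
        dist = {start: 0}
        q = deque([start])
        while q:
            u = q.popleft()
            for v in neighbors(u, d):
                if v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        worst = max(worst, max(dist.values()))
    return worst

def bisection_width(d):
    half = 2 ** (d - 1)                          # nodes with top bit 0
    return sum(1 for u in range(half)
                 for v in neighbors(u, d) if v >= half)
```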
Message latency – time taken to prepare the message to be sent (software overhead)
Network latency – time taken for a message to pass through a network
Communication latency – total time taken to send a message, including message and network latency
Deadlock – occurs when packets cannot be forwarded because they are waiting for each other in a circular way
Memory Hierarchy
Global memory
Local memory
Cache
Faster, but more expensive
Cache coherence must be maintained
Communication Methods
Circuit switching
Packet switching
Wormhole routing
Properties of a Parallel Program
Granularity
Speedup
Overhead
Efficiency
Cost
Scalability
Gustafson's law
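These terms are defined later in the course; as a preview, here is a minimal calculator for three of them, using the standard definitions (speedup as serial time over parallel time, efficiency as speedup per processor, and Gustafson's scaled speedup s + (1 - s)p for serial fraction s on p processors):

```python
# Standard definitions of three parallel-program properties.

def speedup(t_serial, t_parallel):
    """How many times faster the parallel run is than the serial run."""
    return t_serial / t_parallel

def efficiency(t_serial, t_parallel, p):
    """Speedup per processor: 1.0 means perfect use of all p processors."""
    return speedup(t_serial, t_parallel) / p

def gustafson_speedup(s, p):
    """Scaled speedup for serial fraction s of the scaled workload."""
    return s + (1 - s) * p

# e.g. a job that took 100 s serially and 25 s on 8 processors:
sp = speedup(100, 25)              # 4.0
eff = efficiency(100, 25, 8)       # 0.5 -- half the machine is "wasted"
gs = gustafson_speedup(0.05, 64)   # 0.05 + 0.95 * 64 = 60.85
```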