TRANSCRIPT
CSC 364/664 Parallel Computation Fall 2003 Burg/Miller/Torgersen
Chapter 1: Parallel Computers
Concurrency vs. True Parallelism
Concurrency is used in systems where more than one user is using a resource at the same time (e.g., a CPU or database information).
In true parallelism, multiple processors work simultaneously on one application problem.
Flynn's Taxonomy – Classification by Control Mechanism
A classification of parallel systems from a "flow of control" perspective
SISD – single instruction, single data
SIMD – single instruction, multiple data
MISD – multiple instructions, single data
MIMD – multiple instructions, multiple data
SISD
Single instruction, single data
Sequential programming with one processor, just like you've always done
SIMD
Single instruction, multiple data
One control unit issuing the same instruction to multiple CPUs that operate simultaneously on their own portions of data
Lock-step, synchronized
Vector and matrix computation lend themselves to an SIMD implementation
Examples of SIMD computers: Illiac IV, MPP, DAP, CM-2, and MasPar MP-2
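The lock-step idea can be sketched in ordinary Python. This is only an illustration of the programming model, not how a real SIMD machine is built: the loop over elements stands in for processors that would all execute the same instruction simultaneously. The name `simd_step` is invented for this sketch.

```python
# SIMD sketch: ONE instruction stream, applied in lock-step to many
# data elements. On a real SIMD machine each element would be updated
# simultaneously by its own processor; here the loop stands in for
# that hardware parallelism.

def simd_step(instruction, data):
    """Apply a single instruction to every element of the data."""
    return [instruction(x) for x in data]

# The same instruction ("add 1") is issued once by the control unit
# and executed on every element at the same step.
data = [10, 20, 30, 40]
result = simd_step(lambda x: x + 1, data)   # [11, 21, 31, 41]
```

Vector and matrix operations fit this model exactly because every element undergoes the same operation.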
MIMD
Multiple instructions, multiple data
Each processor "doing its own thing"
Processors synchronize either through passing messages or writing values to shared memory addresses
Subcategories:
SPMD – single program, multiple data (MPI on a Linux cluster)
MPMD – multiple programs, multiple data (PVM)
Examples of MIMD computers: BBN Butterfly, iPSC/1 and iPSC/2, IBM SP and SP2
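The SPMD subcategory has a characteristic structure: every process runs the same program but branches on its rank (process id) to work on its own slice of the data. A minimal sketch, with the ranks simulated in a loop rather than launched as real MPI processes (`spmd_main` is an invented name):

```python
# SPMD sketch: the SAME program runs on every process; each rank
# selects its own portion of the data. This mirrors the structure of
# an MPI program, with ranks simulated here instead of spawned.

def spmd_main(rank, nprocs, data):
    """The single program: each rank sums its own contiguous slice."""
    chunk = len(data) // nprocs
    lo, hi = rank * chunk, (rank + 1) * chunk
    return sum(data[lo:hi])

data = list(range(8))                                  # global problem data
partials = [spmd_main(r, 4, data) for r in range(4)]   # 4 simulated ranks
total = sum(partials)                                  # combine partial results
```

In a real cluster run, the combining step would be a collective reduction rather than a local `sum`.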
MISD
Multiple instructions, single data
Doesn't really exist, unless you consider pipelining an MISD configuration
Comparison of SIMD and MIMD
It takes a specially designed computer to do SIMD computing, since one control unit controls multiple processors.
SIMD requires only one copy of a program. MIMD systems have a copy of the program and operating system at each processor.
SIMD computers quickly become obsolete. MIMD systems can be pieced together from the most up-to-date components available.
Classification by Communication Mechanism
Shared-address-space systems
"Multiprocessors", each with its own control unit
Virtual memory makes all memory addresses look like they come from one consistent space, but they don't necessarily
Processors communicate with reads and writes
Message-passing systems
"Multicomputers"
Separate processors and separate memory address spaces
Processors communicate with message passing
Shared Memory Address Space
Interprocess communication is done in the memory interface through reads and writes.
A virtual memory address maps to a real address.
Different processors may have memory locally attached to them.
Access could be needed to a processor's own memory, or to the memory attached to a different processor.
Different instances of memory access could take different amounts of time.
Collisions are possible.
UMA (i.e., shared memory) vs. NUMA (i.e., distributed shared memory)
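The shared-address-space style of communication can be sketched with Python threads, which share one address space the way the processors of a multiprocessor do. This is only a model of the programming style: the threads communicate purely by reading and writing the same memory location, with a lock guarding against the write collisions mentioned above.

```python
# Shared-address-space sketch: "processors" (threads) communicate by
# reading and writing the same memory location. The lock serializes
# conflicting writes, the software analogue of avoiding collisions.
import threading

shared = {"counter": 0}          # the shared memory location
lock = threading.Lock()

def worker(increments):
    for _ in range(increments):
        with lock:               # protect the read-modify-write
            shared["counter"] += 1

threads = [threading.Thread(target=worker, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# All communication happened through memory: shared["counter"] is 4000.
```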
Message Passing System
Interprocess communication is done at the program level using sends and receives.
Reads and writes refer only to a processor’s local memory.
Data can be packed into long messages before being sent, to compensate for latency.
Global scheduling of messages can help avoid message collisions.
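The same points can be sketched with explicit sends and receives. Here a queue stands in for the network link between two processors that share no memory, and several small values are packed into one long message to amortize latency, as described above (a real system would use MPI sends and receives):

```python
# Message-passing sketch: no shared memory; communication happens only
# through explicit send (put) and receive (get) over a "link". Packing
# many values into one message amortizes the per-message latency.
import queue
import threading

link = queue.Queue()             # stands in for the network link

def sender(values):
    link.put(values)             # one long message instead of many small sends

def receiver(results):
    message = link.get()         # one receive for the whole packed message
    results.append(sum(message))

results = []
t1 = threading.Thread(target=sender, args=([1, 2, 3, 4],))
t2 = threading.Thread(target=receiver, args=(results,))
t1.start(); t2.start()
t1.join(); t2.join()             # results now holds [10]
```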
Basic Architecture Terms – Clock Speed and Bandwidth
Clock speed of a processor – the maximum number of times per second that a device can say something new
Bandwidth of a transmission medium (e.g., a telephone line or cable line) is defined as the maximum rate at which the medium can change a signal. Bandwidth is measured in cycles per second, or Hertz. Bandwidth is determined by the physical properties of the transmission medium, including the material of which it is composed.
Data rate is a measure of the amount of data that can be sent across a transmission medium per unit time. Data rate is determined by two things: (1) the bandwidth, and (2) the potential number of different things that can be conveyed each time the signal changes (which, in the case of a bus, is based on the number of parallel data lines).
Basic Architecture Terms -- Bus
A bus is a communication medium to which all processors are connected.
Only one communication at a time is allowed on the bus.
Only one step from any source to any destination.
Bus data rate (sometimes loosely called "bandwidth") is defined as clock speed times the number of bits transmitted at each clock pulse.
A bus is low-cost, but you can't have very many processors attached to it.
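The data-rate formula is easy to check against the original PCI figures given later in this chapter (33 MHz clock, 32 data lines):

```python
# Bus data rate = clock speed x number of bits per clock pulse,
# converted to bytes per second.

def bus_data_rate(clock_hz, bits_per_pulse):
    """Data rate in bytes per second."""
    return clock_hz * bits_per_pulse // 8

# Original PCI: 33 MHz, 32 bits wide -> 132,000,000 bytes/sec, i.e.
# about 132 MB/s (the commonly quoted 133 MB/s uses the exact
# 33.33 MHz clock).
pci_rate = bus_data_rate(33_000_000, 32)
```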
Bus on a Motherboard
The bus transports data among the CPU, memory, and other components.
It consists of electrical circuits called traces, and adapters or expansion cards.
There's a main motherboard bus, and then buses for the CPU, memory, SCSI connections, and USB.
Types of Buses
Original IBM PC bus – 8-bit parallel, 4.77 MHz clock speed
The IBM AT (1984) introduced the ISA (Industry Standard Architecture) bus – 16-bit parallel with expansion slots, still compatible with 8-bit cards; 8 MHz clock speed
The IBM PS/2 introduced the MCA (Micro Channel Architecture) bus – 32-bit parallel, but not backward compatible; 10 MHz clock speed; didn't catch on
Compaq and other IBM rivals introduced the EISA (Extended Industry Standard Architecture) bus in 1988 – 32-bit parallel, 8.2 MHz clock speed; didn't catch on
VL-Bus (VESA Local Bus) – 32-bit parallel, close to the clock speed of the CPU, tied directly to the CPU
The trend moved to specialized buses with higher clock speeds, closer to the CPU's clock speed, and separate from the system bus – e.g., PCI (Peripheral Component Interconnect)
PCI bus
The PCI bus can exist side-by-side with the ISA bus and the system bus; in this sense it's a "local" bus.
Originally 33 MHz, 32 bits
PCI-X is 133 MHz, 64 bits, for a 1 GB/sec data transfer rate
Supports Plug and Play
See http://computer.howstuffworks.com/pci.htm
Ethernet Bus-Based Network
All nodes branch off a common line.
Each device has an Ethernet address, also known as a MAC address.
All computers receive all data transmissions (in packets). They look to see if the packet is addressed to them, and read it only if it is.
When a computer wants to transmit data, it waits until the line is free.
The CSMA/CD protocol is used (carrier-sense multiple access with collision detection).
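The control flow of CSMA/CD can be sketched as a toy round-based simulation. This captures only the logic described above: a station transmits when it believes the line is free, and if two stations transmit in the same round the collision is detected and both try again later. Real Ethernet adds randomized exponential backoff, which this sketch omits.

```python
# Toy round-based sketch of CSMA/CD. Each round, every station that
# still has a frame to send independently decides whether to try. If
# exactly one station transmits, the frame is delivered; if more than
# one transmits, that's a collision, detected by all, and everyone
# backs off to retry in a later round.
import random

def csma_cd(stations, rounds, rng):
    delivered = []
    waiting = list(stations)             # stations with a frame to send
    for _ in range(rounds):
        if not waiting:
            break
        trying = [s for s in waiting if rng.random() < 0.5]
        if len(trying) == 1:             # line held by exactly one sender
            delivered.append(trying[0])
            waiting.remove(trying[0])
        # len(trying) > 1 -> collision detected; no one succeeds
    return delivered

sent = csma_cd(["A", "B", "C"], rounds=200, rng=random.Random(1))
```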
Basic Architecture Terms -- Ethernet
Ethernet is actually an OSI layer 2 communication protocol. It does not dictate the type of connectivity – it could be copper, fiber, or wireless.
Today's Ethernet is full-duplex, i.e., it has separate lines for send and receive.
IEEE Standard 802.3
Ethernet comes in 10, 100, and 1000 Mb/sec (1 Gb/sec) speeds.
See http://computer.howstuffworks.com/ethernet.htm
Basic Architecture Terms -- Hub
Hubs connect computers in a network.
They operate using a broadcast model. When n computers are connected to a hub, the hub simply passes all network traffic through to each of the n computers.
Basic Architecture Terms -- Switch
Unlike hubs, switches can look at data packets as they are received, determine the source and destination device, and forward the packet appropriately.
By delivering messages only to the device that the packet was intended for, switches conserve network bandwidth.
See http://howstuffworks.com/lan-switch.htm
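The forwarding behavior described above can be sketched as a learning switch: record which port each source address was seen on, then forward a frame only to the port for its destination, flooding to all other ports only while the destination is still unknown. This is an illustrative model, not a real switch implementation.

```python
# Learning-switch sketch: the switch builds a table mapping source
# addresses to ports as frames arrive, so it can deliver later frames
# to exactly one port instead of broadcasting like a hub.

class Switch:
    def __init__(self, ports):
        self.ports = set(ports)
        self.table = {}                    # address -> port

    def handle(self, frame, in_port):
        src, dst = frame["src"], frame["dst"]
        self.table[src] = in_port          # learn where the sender lives
        if dst in self.table:
            return {self.table[dst]}       # deliver to one port only
        return self.ports - {in_port}      # unknown destination: flood

sw = Switch(ports=[1, 2, 3])
first = sw.handle({"src": "aa", "dst": "bb"}, in_port=1)   # flood: {2, 3}
sw.handle({"src": "bb", "dst": "aa"}, in_port=2)           # switch learns "bb"
second = sw.handle({"src": "aa", "dst": "bb"}, in_port=1)  # targeted: {2}
```

The second frame to "bb" consumes bandwidth on only one link, which is exactly how switches conserve network bandwidth relative to hubs.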
Basic Architecture Terms -- Myrinet
Myrinet is a packet communication and switching technology, faster than Ethernet.
It offers a full-duplex 2+2 Gb/sec data rate and low latency. It is used in Linux clusters.
Only 16 of the nodes of WFU's clusters are connected with Myrinet. The rest are connected with Ethernet, for cost reasons.
Classification by Interconnection Network
Static network
A bus-based network can be static (if no switches are involved)
Direct links between computers
Examples include completely connected, line/ring, mesh, tree (regular and fat), and hypercube
Dynamic network
Uses switches
Connections change according to whether a switch is open or closed
Could be arranged in stages (multistage), e.g., the Omega network
Hypercube
A d-dimensional hypercube has 2^d nodes.
Each node has a d-bit address.
Neighboring nodes differ in one bit.
Needs a routing algorithm. We'll try one in class.
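One standard hypercube routing algorithm (dimension-ordered, or "e-cube", routing, which may or may not be the one tried in class) follows directly from the address properties above: XOR the current address with the destination, and repeatedly flip the lowest-order bit where they differ. Each flip moves the message to a neighboring node, so the route length equals the number of differing bits.

```python
# E-cube routing in a d-dimensional hypercube: correct the differing
# address bits one at a time, lowest dimension first. Each corrected
# bit is one hop to a neighbor, so the path length is the Hamming
# distance between source and destination.

def ecube_route(src, dst, d):
    """Return the list of node addresses visited from src to dst."""
    path = [src]
    node = src
    while node != dst:
        diff = node ^ dst            # bits still to be corrected
        bit = diff & -diff           # lowest-order differing bit
        node ^= bit                  # hop to that neighbor
        path.append(node)
    return path

# In a 3-cube, routing from node 000 to node 101 visits 000 -> 001 -> 101.
route = ecube_route(0b000, 0b101, d=3)
```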
Multistage Networks
See notes on Omega network from class.
Properties of Network Communication
Diameter of a network – the minimum number of links between the two farthest nodes
Bisection width of a network – the number of links that must be cut to divide the network into two equal parts
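Both definitions can be checked on the hypercube from the previous slide, where the diameter is known to be d and the bisection width 2^(d-1). The sketch below computes the diameter by brute-force breadth-first search, and the bisection width by counting the links that cross between the two address halves:

```python
# Diameter and bisection width of a d-dimensional hypercube, computed
# from the definitions rather than the known formulas, so the formulas
# (diameter d, bisection width 2**(d-1)) can be verified.
from collections import deque

def neighbors(node, d):
    return [node ^ (1 << i) for i in range(d)]   # flip each address bit

def diameter(d):
    n = 2 ** d
    worst = 0
    for start in range(n):                       # BFS from every node
        dist = {start: 0}
        q = deque([start])
        while q:
            u = q.popleft()
            for v in neighbors(u, d):
                if v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        worst = max(worst, max(dist.values()))
    return worst

def bisection_width(d):
    half = 2 ** (d - 1)                          # nodes with top bit 0
    return sum(1 for u in range(half)
                 for v in neighbors(u, d) if v >= half)
```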
Message latency – time taken to prepare the message to be sent (software overhead)
Network latency – time taken for a message to pass through a network
Communication latency – total time taken to send a message, including message and network latency
Deadlock – occurs when packets cannot be forwarded because they are waiting for each other in a circular way
Memory Hierarchy
Global memory
Local memory
Cache
Faster, but more expensive
Cache coherence must be maintained
Communication Methods
Circuit switching
Packet switching
Wormhole routing
Properties of a Parallel Program
Granularity
Speedup
Overhead
Efficiency
Cost
Scalability
Gustafson's law
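These terms are defined later in the course; as a preview, here is a minimal calculator for three of them, using the standard definitions (speedup as serial time over parallel time, efficiency as speedup per processor, and Gustafson's scaled speedup s + (1 - s)p for serial fraction s on p processors):

```python
# Standard definitions of three parallel-program properties.

def speedup(t_serial, t_parallel):
    """How many times faster the parallel run is than the serial run."""
    return t_serial / t_parallel

def efficiency(t_serial, t_parallel, p):
    """Speedup per processor: 1.0 means perfect use of all p processors."""
    return speedup(t_serial, t_parallel) / p

def gustafson_speedup(s, p):
    """Scaled speedup for serial fraction s of the scaled workload."""
    return s + (1 - s) * p

# e.g. a job that took 100 s serially and 25 s on 8 processors:
sp = speedup(100, 25)              # 4.0
eff = efficiency(100, 25, 8)       # 0.5 -- half the machine is "wasted"
gs = gustafson_speedup(0.05, 64)   # 0.05 + 0.95 * 64 = 60.85
```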