Chapter 2: Parallel Architectures

Upload: melanie-black

Post on 31-Dec-2015


Page 1: Chapter 2 Parallel Architectures. Outline Interconnection networks Interconnection networks Processor arrays Processor arrays Multiprocessors Multiprocessors

Chapter 2

Parallel Architectures

Outline

- Interconnection networks
- Processor arrays
- Multiprocessors
- Multicomputers
- Flynn’s taxonomy

Interconnection Networks

- Uses of interconnection networks
  - Connect processors to shared memory
  - Connect processors to each other
- Interconnection media types
  - Shared medium
  - Switched medium

Shared versus Switched Media

[Figure: shared versus switched media]

Shared Medium

- Allows only one message at a time
- Messages are broadcast
- Each processor “listens” to every message
- Arbitration is decentralized
- Collisions require resending of messages
- Ethernet is an example

Switched Medium

- Supports point-to-point messages between pairs of processors
- Each processor has its own path to the switch
- Advantages over shared media
  - Allows multiple messages to be sent simultaneously
  - Allows scaling of the network to accommodate an increase in processors

Switch Network Topologies

- View switched network as a graph
  - Vertices = processors or switches
  - Edges = communication paths
- Two kinds of topologies
  - Direct
  - Indirect

Direct Topology

- Ratio of switch nodes to processor nodes is 1:1
- Every switch node is connected to
  - 1 processor node
  - At least 1 other switch node

Indirect Topology

- Ratio of switch nodes to processor nodes is greater than 1:1
- Some switches simply connect other switches

Evaluating Switch Topologies

- Diameter: distance between the two farthest nodes
  - Clique K_n is best: d = O(1), but #edges m = O(n^2)
  - m = O(n) in a path P_n or cycle C_n, but d = O(n) as well
- Bisection width: min. number of edges in a cut that divides the network into two roughly equal halves; determines the min. bandwidth of the network
  - K_n’s bisection width is O(n^2) (a balanced cut removes n^2/4 edges), but C_n’s is O(1)
- Degree = number of edges per node
  - A constant-degree board can be mass produced
- Constant edge length? (yes/no)
- Planar? Easier to build
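The two main metrics can be checked by brute force on small graphs. The sketch below is illustrative only: the helper names (`clique`, `path`, `cycle`, `diameter`, `bisection_width`) are made up for this example, the graphs are assumed connected, and the bisection search tries every balanced split, so it is only practical for small even n.

```python
from collections import deque
from itertools import combinations


def clique(n):
    """K_n: every pair of nodes is joined by an edge."""
    return set(combinations(range(n), 2))


def path(n):
    """P_n: node i is joined to node i + 1."""
    return {(i, i + 1) for i in range(n - 1)}


def cycle(n):
    """C_n: a path plus one edge closing the loop."""
    return path(n) | {(0, n - 1)}


def diameter(n, edges):
    """Longest shortest-path distance, found by BFS from every node."""
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    worst = 0
    for start in range(n):
        dist = {start: 0}
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        worst = max(worst, max(dist.values()))
    return worst


def bisection_width(n, edges):
    """Fewest edges cut by any split into two halves of n // 2 nodes."""
    best = len(edges)
    for half in combinations(range(n), n // 2):
        side = set(half)
        cut = sum(1 for u, v in edges if (u in side) != (v in side))
        best = min(best, cut)
    return best


if __name__ == "__main__":
    n = 8
    for name, build in [("clique", clique), ("path", path), ("cycle", cycle)]:
        edges = build(n)
        print(name, len(edges), diameter(n, edges), bisection_width(n, edges))
```

Running it for n = 8 shows the trade-off: the clique has the smallest diameter and the most edges, while the path and cycle are cheap but have a large diameter and a tiny bisection width.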

2-D Mesh Network

- Direct topology
- Switches arranged into a 2-D lattice
- Communication allowed only between neighboring switches
- Variants allow wraparound connections between switches on edge of mesh

2-D Meshes

[Figure: 2-D mesh and torus]

Evaluating 2-D Meshes

- Diameter: Θ(n^(1/2))
- m = Θ(n)
- Bisection width: Θ(n^(1/2))
- Number of edges per switch: 4
- Constant edge length? Yes
- Planar
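For a concrete feel of these orders of growth, the following sketch gives exact counts for a square k x k mesh with n = k^2 switches, with and without wraparound links. The function name `mesh_metrics` is hypothetical, and the closed forms assume k is even and at least 4.

```python
import math


def mesh_metrics(n, wraparound=False):
    """Exact metrics for a k x k mesh, n = k * k, k even and >= 4 (assumed)."""
    k = math.isqrt(n)
    if k * k != n:
        raise ValueError("n must be a perfect square")
    if wraparound:
        # Torus: every switch has 4 links; a cut severs two links per row.
        return {"diameter": 2 * (k // 2), "edges": 2 * n, "bisection_width": 2 * k}
    # Plain mesh: corner-to-corner distance; a cut severs one link per row.
    return {"diameter": 2 * (k - 1), "edges": 2 * k * (k - 1), "bisection_width": k}


if __name__ == "__main__":
    for n in (16, 64, 256):
        print(n, mesh_metrics(n), mesh_metrics(n, wraparound=True))
```

Both diameter and bisection width grow with k = n^(1/2), while the edge count grows linearly in n.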

Binary Tree Network

- Indirect topology
- n = 2^d processor nodes, n-1 switches

[Figure: binary tree network]

Evaluating Binary Tree Network

- Diameter: 2 log n
- m = O(n)
- Bisection width: 1
- Edges / node: 3
- Constant edge length? No
- Planar
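The switch count and diameter can be verified on a small tree. This sketch assumes processors sit at the leaves and switches at the interior nodes, and it uses heap-style numbering (root = 1, children of v are 2v and 2v + 1), which is a convenience chosen for the example rather than anything stated on the slide.

```python
def tree_distance(a, b):
    """Hops between nodes a and b in heap numbering (root = 1)."""
    hops = 0
    while a != b:
        # The larger index is never shallower, so move it up to its parent.
        if a > b:
            a //= 2
        else:
            b //= 2
        hops += 1
    return hops


def binary_tree_network(d):
    """Return (processors, switches, diameter) for a tree with 2**d leaves."""
    n = 2 ** d
    switches = range(1, n)          # interior nodes 1 .. n-1
    processors = range(n, 2 * n)    # leaves n .. 2n-1
    diameter = max(tree_distance(a, b) for a in processors for b in processors)
    return n, len(switches), diameter


if __name__ == "__main__":
    print(binary_tree_network(3))
```

For d = 3 this gives 8 processors, 7 switches, and a diameter of 6 = 2 log 8: the worst case is a message that must climb to the root and descend again.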

Hypertree Network

- Indirect topology
- Shares low diameter of binary tree
- Greatly improves bisection width
- From “front” looks like k-ary tree of height d
- From “side” looks like upside-down binary tree of height d

Hypertree Network

[Figure: hypertree network]

Evaluating 4-ary Hypertree

- Diameter: log n
- Bisection width: n / 2
- Edges / node: 6
- Constant edge length? No

Butterfly Network

- Indirect topology
- n = 2^d processor nodes connected by n(log n + 1) switching nodes

[Figure: butterfly network with processor nodes 0 to 7 and switching nodes labeled (rank, column), from 0,0 to 0,7 in rank 0 through 3,0 to 3,7 in rank 3]

Butterfly Network Routing

[Figure: butterfly network routing]

Evaluating Butterfly Network

- Diameter: log n
- Bisection width: n / 2
- Edges per node: 4
- Constant edge length? No

Hypercube

- Direct topology
- 2 x 2 x … x 2 mesh
- Number of nodes a power of 2
- Node addresses 0, 1, …, 2^k - 1
- Node i connected to k nodes whose addresses differ from i in exactly one bit position
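The neighbor rule maps directly onto bitwise XOR: flipping bit b of an address gives the neighbor along dimension b. A minimal sketch (the function name `hypercube_neighbors` is made up for illustration):

```python
def hypercube_neighbors(i, k):
    """Addresses that differ from i in exactly one of the k bit positions."""
    return [i ^ (1 << bit) for bit in range(k)]


if __name__ == "__main__":
    k = 4
    for node in (0b0000, 0b0101, 0b1111):
        neighbors = hypercube_neighbors(node, k)
        print(format(node, "04b"), [format(x, "04b") for x in neighbors])
```

Every node in a k-dimensional hypercube therefore has exactly k neighbors, one per bit position.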

Hypercube Addressing

[Figure: 4-dimensional hypercube with nodes labeled by the 4-bit addresses 0000 through 1111]


Hypercubes Illustrated

Evaluating Hypercube Network

- Diameter: log n
- Bisection width: n / 2
- Edges per node: log n
- Constant edge length? No
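The log n diameter follows from a simple routing idea: correct the differing address bits one at a time, so the hop count equals the Hamming distance between source and destination. The sketch below assumes this bit-by-bit routing order (lowest bit first); the function names are hypothetical.

```python
def hypercube_route(src, dst, k):
    """Path from src to dst that fixes differing bits from bit 0 upward."""
    route = [src]
    current = src
    for bit in range(k):
        if (current ^ dst) & (1 << bit):
            current ^= 1 << bit      # cross the link along this dimension
            route.append(current)
    return route


def top_bit_cut(k):
    """Links that cross between the halves with top address bit 0 and 1."""
    n = 2 ** k
    top = 1 << (k - 1)
    return sum(1 for i in range(n) if not i & top)  # one link per node pair


if __name__ == "__main__":
    k = 4
    print(hypercube_route(0b0000, 0b1011, k))
    longest = max(len(hypercube_route(s, d, k)) - 1
                  for s in range(2 ** k) for d in range(2 ** k))
    print(longest, top_bit_cut(k))
```

With k = 4 (n = 16) the longest route has 4 = log n hops, and splitting the cube on its top address bit severs 8 = n / 2 links.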

Shuffle-exchange

- Direct topology
- Number of nodes a power of 2
- Nodes have addresses 0, 1, …, 2^k - 1
- Two outgoing links from node i
  - Shuffle link to node LeftCycle(i)
  - Exchange link to node xor(i, 1)
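Both links are one-line bit operations. In this sketch, `left_cycle` is taken to mean a left cyclic rotation of the k-bit address (the usual reading of LeftCycle), and `exchange` flips the lowest bit; the Python names are chosen for this example.

```python
def left_cycle(i, k):
    """Rotate the k-bit address i left by one position."""
    mask = (1 << k) - 1
    return ((i << 1) & mask) | (i >> (k - 1))


def exchange(i):
    """Flip the least significant bit of the address."""
    return i ^ 1


if __name__ == "__main__":
    k = 3
    for i in range(2 ** k):
        print(i, "shuffle ->", left_cycle(i, k), "exchange ->", exchange(i))
```

For k = 3 the shuffle links send 1 to 2, 2 to 4, and 4 back to 1, while nodes 0 and 7 shuffle to themselves.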

Shuffle-exchange Illustrated

[Figure: shuffle-exchange network on nodes 0 to 7]

Shuffle-exchange Addressing

[Figure: 16-node shuffle-exchange network with nodes labeled by the 4-bit addresses 0000 through 1111]

Evaluating Shuffle-exchange

- Diameter: 2 log n - 1
- Bisection width: n / log n
- Edges per node: 2
- Constant edge length? No

Comparing Networks

- All have logarithmic diameter except 2-D mesh
- Hypertree, butterfly, and hypercube have bisection width n / 2
- All have constant edges per node except hypercube
- Only 2-D mesh keeps edge lengths constant as network size increases
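To see the comparison in numbers, the per-network figures from the evaluation slides can be tabulated for a given n. This is only a sketch: the function name `compare_networks` is hypothetical, n is assumed to be a power of 4 so that every topology is defined, and the 2-D mesh entries use n^(1/2) as a stand-in for the Θ(n^(1/2)) growth rate rather than an exact value.

```python
import math


def compare_networks(n):
    """Map network name to (diameter, bisection width, edges per node)."""
    lg = n.bit_length() - 1          # log2(n) for a power of two
    root = math.isqrt(n)             # n^(1/2), order of growth only
    return {
        "2-D mesh": (root, root, 4),
        "binary tree": (2 * lg, 1, 3),
        "4-ary hypertree": (lg, n // 2, 6),
        "butterfly": (lg, n // 2, 4),
        "hypercube": (lg, n // 2, lg),
        "shuffle-exchange": (2 * lg - 1, n / lg, 2),
    }


if __name__ == "__main__":
    for name, (diam, bisect, degree) in compare_networks(64).items():
        print(f"{name:18} diameter={diam:<4} bisection={bisect:<8.1f} edges/node={degree}")
```

The table makes the trade-offs visible: the hypercube pairs a short diameter with a wide bisection but pays with a degree that grows with n.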

Vector Computers

- Vector computer: instruction set includes operations on vectors as well as scalars
- Two ways to implement vector computers
  - Pipelined vector processor: streams data through pipelined arithmetic units (CRAY-I, II)
  - Processor array: many identical, synchronized arithmetic processing elements (MasPar’s MP-I, II)

Why Processor Arrays?

- Historically, high cost of a control unit
- Scientific applications have data parallelism

Processor Array

[Figure: processor array]

Data/instruction Storage

- Front end computer
  - Program
  - Data manipulated sequentially
- Processor array
  - Data manipulated in parallel

Processor Array Performance

- Performance: work done per time unit
- Performance of processor array
  - Speed of processing elements
  - Utilization of processing elements

Performance Example 1

- 1024 processors
- Each adds a pair of integers in 1 µsec
- What is performance when adding two 1024-element vectors (one per processor)?

Performance = 1024 operations / 1 µsec = 1.024 × 10^9 ops/sec

Performance Example 2

- 512 processors
- Each adds two integers in 1 µsec
- Performance adding two vectors of length 600?
- 600 > 512, so 88 processors must each perform a second addition and the whole operation takes 2 µsec

Performance = 600 operations / 2 µsec = 3 × 10^8 ops/sec
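Both examples follow one rule: the array needs ceil(vector length / processors) passes, and every pass costs one add time. The sketch below assumes an add time of 1 microsecond, which is the value the 10^9 ops/sec figure in Example 1 implies; the function name is made up for illustration.

```python
import math


def processor_array_performance(num_processors, vector_length, add_time_sec=1e-6):
    """Operations per second when adding two vectors on a processor array."""
    # Each pass adds at most one element pair per processor.
    passes = math.ceil(vector_length / num_processors)
    return vector_length / (passes * add_time_sec)


if __name__ == "__main__":
    print(processor_array_performance(1024, 1024))  # Example 1
    print(processor_array_performance(512, 600))    # Example 2
```

In Example 2 the second pass keeps only 88 of the 512 processors busy, which is why performance falls well below the 5.12 × 10^8 ops/sec peak of the machine.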

2-D Processor Interconnection Network

[Figure: 2-D processor interconnection network]

Each VLSI chip has 16 processing elements

if (COND) then A else B

[Figures: three slides illustrating execution of if (COND) then A else B on a processor array]
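The figures for these slides did not survive, so the following is only a plausible sketch of the idea, assuming the usual masking scheme for processor arrays: all processing elements evaluate COND, then the elements where it holds execute A while the rest idle, then the roles swap for B. The function name and the equal-cost assumption for A and B are illustrative.

```python
def simd_if_then_else(data, cond, op_a, op_b):
    """Simulate masked execution of `if cond then A else B`, one element per PE."""
    mask = [cond(x) for x in data]
    # Phase 1: PEs whose condition is true run A; the others sit idle.
    after_a = [op_a(x) if m else x for x, m in zip(data, mask)]
    # Phase 2: the remaining PEs run B; the first group sits idle.
    after_b = [x if m else op_b(x) for x, m in zip(after_a, mask)]
    # Each PE is busy in exactly one of the two phases.
    active = sum(mask)
    utilization = (active + (len(data) - active)) / (2 * len(data))
    return after_b, utilization


if __name__ == "__main__":
    result, utilization = simd_if_then_else(
        [1, 2, 3, 4],
        cond=lambda x: x % 2 == 0,
        op_a=lambda x: x * 10,
        op_b=lambda x: -x,
    )
    print(result, utilization)
```

Under these assumptions the two branches run back to back, so each processing element does useful work only half the time.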

Processor Array Shortcomings

- Not all problems are data-parallel
- Speed drops for conditionally executed code
- Don’t adapt to multiple users well
- Do not scale down well to “starter” systems
- Rely on custom VLSI for processors
- Expense of control units has dropped

Multiprocessors

- Multiprocessor: multiple-CPU computer with a shared memory
- Same address on two different CPUs refers to the same memory location
- Avoid three problems of processor arrays
  - Can be built from commodity CPUs
  - Naturally support multiple users
  - Maintain efficiency in conditional code

Centralized Multiprocessor

- Straightforward extension of uniprocessor
- Add CPUs to bus
- All processors share same primary memory
- Memory access time same for all CPUs
  - Uniform memory access (UMA) multiprocessor
  - Symmetrical multiprocessor (SMP): Sequent Balance Series, SGI Power and Challenge series

Centralized Multiprocessor

[Figure: centralized multiprocessor]

Private and Shared Data

- Private data: items used only by a single processor
- Shared data: values used by multiple processors
- In a multiprocessor, processors communicate via shared data values

Problems Associated with Shared Data

- Cache coherence
  - Replicating data across multiple caches reduces contention
  - How to ensure different processors have same value for same address?
- Synchronization
  - Mutual exclusion
  - Barrier

Cache-coherence Problem

[Figure sequence: CPU A and CPU B, each with its own cache, share a memory holding X]

1. Memory: X = 7; neither cache holds X
2. Memory: X = 7; one cache holds 7
3. Memory: X = 7; both caches hold 7
4. Memory: X = 2; the caches hold 7 and 2, so one cached copy is stale
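The sequence can be replayed with a toy model. This sketch assumes write-through caches with no coherence mechanism at all, and it assumes CPU B is the one that writes 2; both are illustrative choices, since the transcript no longer shows which CPU does what. The class name `IncoherentCpu` is hypothetical.

```python
class IncoherentCpu:
    """CPU with a private cache and no coherence protocol (write-through)."""

    def __init__(self, memory):
        self.memory = memory
        self.cache = {}

    def read(self, addr):
        # A miss loads the value from memory; a hit never rechecks memory.
        if addr not in self.cache:
            self.cache[addr] = self.memory[addr]
        return self.cache[addr]

    def write(self, addr, value):
        # Update the local cache and memory, but tell no other cache.
        self.cache[addr] = value
        self.memory[addr] = value


if __name__ == "__main__":
    memory = {"X": 7}
    cpu_a, cpu_b = IncoherentCpu(memory), IncoherentCpu(memory)
    cpu_a.read("X")
    cpu_b.read("X")
    cpu_b.write("X", 2)
    print(cpu_a.read("X"), cpu_b.read("X"), memory["X"])
```

After the write, memory and CPU B agree on the new value while CPU A keeps reading its stale cached copy.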

Write Invalidate Protocol

[Figure sequence: CPU A and CPU B, each with its own cache, share a memory holding X]

1. Memory: X = 7; both caches hold 7; a cache control monitor watches the bus
2. One CPU broadcasts its intent to write X
3. The other cached copy of X is invalidated; one cached copy (7) remains
4. Memory: X = 2; the remaining cache holds 2
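A minimal sketch of the write-invalidate idea follows. It models the bus as an object that forwards an "intent to write" to every other cache, and it assumes write-through to memory because the last slide shows memory already holding the new value. The class names `Bus` and `SnoopingCache` are invented for this example.

```python
class Bus:
    """Shared bus that every cache controller monitors."""

    def __init__(self):
        self.caches = []

    def broadcast_intent_to_write(self, addr, sender):
        # Every other cache drops its copy of the address.
        for cache in self.caches:
            if cache is not sender:
                cache.lines.pop(addr, None)


class SnoopingCache:
    def __init__(self, bus, memory):
        self.bus = bus
        self.memory = memory
        self.lines = {}
        bus.caches.append(self)

    def read(self, addr):
        if addr not in self.lines:          # miss: reload from memory
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]

    def write(self, addr, value):
        self.bus.broadcast_intent_to_write(addr, self)
        self.lines[addr] = value
        self.memory[addr] = value           # write-through (assumed)


if __name__ == "__main__":
    memory = {"X": 7}
    bus = Bus()
    cache_a, cache_b = SnoopingCache(bus, memory), SnoopingCache(bus, memory)
    cache_a.read("X")
    cache_b.read("X")
    cache_a.write("X", 2)
    print("X" in cache_b.lines, cache_b.read("X"))
```

Because the other copy is invalidated before the write completes, the next read by that CPU misses and fetches the up-to-date value.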

Distributed Multiprocessor

- Distribute primary memory among processors
- Increase aggregate memory bandwidth and lower average memory access time
- Allow greater number of processors
- Also called non-uniform memory access (NUMA) multiprocessor: SGI Origin Series

Distributed Multiprocessor

[Figure: distributed multiprocessor]

Cache Coherence

- Some NUMA multiprocessors do not support it in hardware
  - Only instructions, private data in cache
  - Large memory access time variance
- Implementation more difficult
  - No shared memory bus to “snoop”
  - Directory-based protocol needed

Directory-based Protocol

- Distributed directory contains information about cacheable memory blocks
- One directory entry for each cache block
- Each entry has
  - Sharing status
  - Which processors have copies

Sharing Status

- Uncached
  - Block not in any processor’s cache
- Shared
  - Cached by one or more processors
  - Read only
- Exclusive
  - Cached by exactly one processor
  - Processor has written block
  - Copy in memory is obsolete
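A directory entry can be modeled as a status plus one presence bit per processor. The sketch below is one possible representation, not a description of any particular machine: the class names are hypothetical, and the letters U, S, and E are used as short codes for the three states.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class SharingStatus(Enum):
    UNCACHED = "U"
    SHARED = "S"
    EXCLUSIVE = "E"


@dataclass
class DirectoryEntry:
    num_cpus: int
    status: SharingStatus = SharingStatus.UNCACHED
    sharers: List[int] = field(default_factory=list)  # one presence bit per CPU

    def __post_init__(self):
        if not self.sharers:
            self.sharers = [0] * self.num_cpus

    def is_consistent(self):
        """Check that the presence bits match the sharing status."""
        copies = sum(self.sharers)
        if self.status is SharingStatus.UNCACHED:
            return copies == 0
        if self.status is SharingStatus.SHARED:
            return copies >= 1
        return copies == 1                  # exclusive: exactly one owner

    def __str__(self):
        return self.status.value + " " + " ".join(str(b) for b in self.sharers)


if __name__ == "__main__":
    print(DirectoryEntry(3))
    print(DirectoryEntry(3, SharingStatus.SHARED, [1, 0, 1]))
```

The `is_consistent` check encodes the three definitions: no copies when uncached, at least one when shared, exactly one when exclusive.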

Directory-based Protocol

[Figure: CPU 0, CPU 1, and CPU 2, each with a cache, a local memory, and a directory, connected by an interconnection network]

Directory-based Protocol

[Figure: CPU 0, CPU 1, and CPU 2 with their caches, memories, and directories on an interconnection network. Memory holds X = 7. The directory entry for X is U 0 0 0: a sharing status followed by a bit vector with one bit per CPU]

CPU 0 Reads X

[Figure sequence: directory entry for X, memory value, and cached copies at each step]

1. Read miss. Directory: X U 0 0 0; memory: X = 7; no cached copies
2. Directory: X S 1 0 0; memory: X = 7; no cached copies
3. Directory: X S 1 0 0; memory: X = 7; cached copies: 7

CPU 2 Reads X

[Figure sequence: directory entry for X, memory value, and cached copies at each step]

1. Read miss. Directory: X S 1 0 0; memory: X = 7; cached copies: 7
2. Directory: X S 1 0 1; memory: X = 7; cached copies: 7
3. Directory: X S 1 0 1; memory: X = 7; cached copies: 7, 7

CPU 0 Writes 6 to X

[Figure sequence: directory entry for X, memory value, and cached copies at each step]

1. Write miss. Directory: X S 1 0 1; memory: X = 7; cached copies: 7, 7
2. Invalidate. Directory: X S 1 0 1; memory: X = 7; cached copies: 7, 7
3. Directory: X E 1 0 0; memory: X = 7; cached copies: 6

CPU 1 Reads X

[Figure sequence: directory entry for X, memory value, and cached copies at each step]

1. Read miss. Directory: X E 1 0 0; memory: X = 7; cached copies: 6
2. Switch to Shared. Directory: X E 1 0 0; memory: X = 7; cached copies: 6
3. Directory: X E 1 0 0; memory: X = 6; cached copies: 6
4. Directory: X S 1 1 0; memory: X = 6; cached copies: 6, 6

CPU 2 Writes 5 to X

[Figure sequence: directory entry for X, memory value, and cached copies at each step]

1. Write miss. Directory: X S 1 1 0; memory: X = 6; cached copies: 6, 6
2. Invalidate. Directory: X S 1 1 0; memory: X = 6; cached copies: 6, 6
3. Directory: X E 0 0 1; memory: X = 6; cached copies: 5

CPU 0 Writes 4 to X

[Figure sequence: directory entry for X, memory value, and cached copies at each step]

1. Write miss. Directory: X E 0 0 1; memory: X = 6; cached copies: 5
2. Take Away. Directory: X E 1 0 0; memory: X = 6; cached copies: 5
3. Directory: X E 0 1 0; memory: X = 5; cached copies: 5
4. Directory: X E 1 0 0; memory: X = 5; no cached copies
5. Directory: X E 1 0 0; memory: X = 5; cached copies: 5
6. Directory: X E 1 0 0; memory: X = 5; cached copies: 4

Page 85:

CPU 0 Writes Back X Block

[Diagram: CPU 0, CPU 1, and CPU 2 with their caches, memories, and directories on an interconnection network. Message shown: Data Write Back. Directory entry for X: E 1 0 0. Values of X shown: 5, 4, and 4.]

Page 86:

CPU 0 Writes Back X Block

[Diagram: CPU 0, CPU 1, and CPU 2 with their caches, memories, and directories on an interconnection network. Directory entry for X: U 0 0 0. Value of X shown: 4.]
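
The write sequence on the preceding slides can be traced with a small simulation. This is only a sketch under stated assumptions: the `Directory` class, its method names, and the three states U (uncached), S (shared), and E (exclusive) are a hypothetical model of a single block X, not the exact protocol or message format used by any real machine.

```python
class Directory:
    """Hypothetical model of one memory block X and its directory entry."""

    def __init__(self, num_cpus, value):
        self.memory = value                # copy of X held in memory
        self.state = "U"                   # U = uncached, S = shared, E = exclusive
        self.sharers = [0] * num_cpus      # one presence bit per CPU
        self.caches = [None] * num_cpus    # each CPU's cached copy of X, if any

    def _take_away(self):
        """Recall the block from its exclusive owner and update memory."""
        owner = self.sharers.index(1)
        self.memory = self.caches[owner]
        self.caches[owner] = None
        self.sharers[owner] = 0

    def read(self, cpu):
        if self.caches[cpu] is None:       # read miss
            if self.state == "E":
                # The owner supplies the current value and keeps a shared copy.
                self.memory = self.caches[self.sharers.index(1)]
            self.caches[cpu] = self.memory
            self.sharers[cpu] = 1
            self.state = "S"
        return self.caches[cpu]

    def write(self, cpu, value):
        if not (self.state == "E" and self.sharers[cpu]):   # write miss
            if self.state == "E":
                self._take_away()
            for other in range(len(self.caches)):           # invalidate other copies
                if other != cpu:
                    self.caches[other] = None
                    self.sharers[other] = 0
            self.caches[cpu] = self.memory                  # fetch the block
            self.sharers[cpu] = 1
            self.state = "E"
        self.caches[cpu] = value

    def write_back(self, cpu):
        """The exclusive owner returns the block to memory."""
        self.memory = self.caches[cpu]
        self.caches[cpu] = None
        self.sharers[cpu] = 0
        self.state = "U"

    def entry(self):
        return self.state + " " + " ".join(str(bit) for bit in self.sharers)


x = Directory(num_cpus=3, value=6)
x.write(2, 5)                  # CPU 2 writes 5 to X
after_cpu2_write = (x.entry(), x.memory)
x.write(0, 4)                  # CPU 0 writes 4 to X (write miss, take away)
after_cpu0_write = (x.entry(), x.memory)
x.write_back(0)                # CPU 0 writes back the X block
after_write_back = (x.entry(), x.memory)
```

In this model the directory entry moves from `E 0 0 1` to `E 1 0 0` to `U 0 0 0`, and memory is updated only when a block is taken away or written back.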

Page 87:

Multicomputer

Distributed memory multiple-CPU computer
Same address on different processors refers to different physical memory locations
Processors interact through message passing
Commercial multicomputers: iPSC I, II, Intel Paragon, nCUBE I, II
Commodity clusters, e.g., Cheetah
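
The two defining properties above, private address spaces and interaction by message passing, can be illustrated with a toy model. This is a minimal sketch, assuming a hypothetical `Node` class with a dictionary for local memory and a queue for incoming messages; it does not model a real network or any message-passing library.

```python
from collections import deque


class Node:
    """Hypothetical multicomputer node: private memory plus a message inbox."""

    def __init__(self, rank):
        self.rank = rank
        self.memory = {}        # private address space: address -> value
        self.inbox = deque()    # messages delivered by the (simulated) network

    def send(self, dest, payload):
        # The only way to affect another node is to send it a message.
        dest.inbox.append((self.rank, payload))

    def receive(self):
        # Returns (sender rank, payload) for the oldest pending message.
        return self.inbox.popleft()


node0, node1 = Node(0), Node(1)

# The same address names a different physical location on each node.
node0.memory[0x10] = 42
node1.memory[0x10] = 7

# Node 1 cannot read node 0's memory; node 0 must send the value.
node0.send(node1, node0.memory[0x10])
sender, value = node1.receive()
node1.memory[0x20] = value
```

After the exchange, address `0x10` still holds different values on the two nodes, and node 1 has node 0's value only because it arrived in a message.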

Page 88:

Asymmetrical Multicomputer

[Figure: asymmetrical multicomputer]

Page 89:

Asymmetrical MC Advantages

Back-end processors dedicated to parallel computations: easier to understand, model, and tune performance
Only a simple back-end operating system needed: easy for a vendor to create

Page 90:

Asymmetrical MC Disadvantages

Front-end computer is a single point of failure
Single front-end computer limits scalability of system
Primitive operating system in back-end processors makes debugging difficult
Every application requires development of both front-end and back-end programs

Page 91:

Symmetrical Multicomputer

[Figure: symmetrical multicomputer]

Page 92:

Symmetrical MC Advantages

Alleviate performance bottleneck caused by single front-end computer
Better support for debugging
Every processor executes same program
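
The last point, every processor executing the same program, can be sketched as follows. This is an illustrative simulation only: `spmd_program`, the block decomposition, and the sequential loop over ranks are assumptions made for the example, and the shared `data` list stands in for data that a real multicomputer would distribute by message passing.

```python
def spmd_program(rank, size, data):
    """The same program runs on every processor; only the rank differs."""
    chunk = (len(data) + size - 1) // size             # block size, rounded up
    local = data[rank * chunk:(rank + 1) * chunk]      # this processor's share
    return sum(local)


data = list(range(1, 11))    # 1, 2, ..., 10
size = 4                     # number of simulated processors

# Simulate the processors one after another; on a real machine they run concurrently.
partial_sums = [spmd_program(rank, size, data) for rank in range(size)]
total = sum(partial_sums)
```

There is no separate front-end program here: each simulated processor derives its own share of the work from its rank.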

Page 93:

Symmetrical MC Disadvantages

More difficult to maintain illusion of single “parallel computer”
No simple way to balance program development workload among processors
More difficult to achieve high performance when multiple processes on each processor

Page 94:

ParPar Cluster, A Mixed Model

[Figure: ParPar cluster]

Page 95:

Commodity Cluster

Co-located computers
Dedicated to running parallel jobs
No keyboards or displays
Identical operating system
Identical local disk images
Administered as an entity

Page 96:

Network of Workstations

Dispersed computers
First priority: person at keyboard
Parallel jobs run in background
Different operating systems
Different local images
Checkpointing and restarting important

Page 97:

Flynn’s Taxonomy

Instruction stream
Data stream
Single vs. multiple
Four combinations
  SISD
  SIMD
  MISD
  MIMD
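
The four combinations follow mechanically from whether each stream is single or multiple. The function below is a small sketch of that mapping; the name `flynn_class` and its integer arguments are hypothetical choices made for this illustration.

```python
def flynn_class(instruction_streams, data_streams):
    """Return the Flynn category for the given numbers of streams."""
    if instruction_streams < 1 or data_streams < 1:
        raise ValueError("a computer has at least one stream of each kind")
    instr = "S" if instruction_streams == 1 else "M"   # single vs. multiple
    data = "S" if data_streams == 1 else "M"
    return f"{instr}I{data}D"


categories = [
    flynn_class(1, 1),     # one instruction stream, one data stream
    flynn_class(1, 64),    # one instruction stream, many data streams
    flynn_class(4, 1),     # many instruction streams, one data stream
    flynn_class(8, 8),     # many of each
]
```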

Page 98:

SISD

Single Instruction, Single Data
Single-CPU systems
Note: co-processors don’t count
  Functional
  I/O
Example: PCs

Page 99:

SIMD

Single Instruction, Multiple Data
Two architectures fit this category
  Pipelined vector processor (e.g., Cray-1)
  Processor array (e.g., Connection Machine CM-1, MASPAR 1000/2000)
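
The processor-array form of SIMD can be sketched as one broadcast instruction applied by every processing element to its own local data. This is a toy model: `processor_array_step` is a hypothetical helper, and the list comprehension runs sequentially where real hardware would operate on all elements in lockstep.

```python
import operator


def processor_array_step(instruction, local_a, local_b):
    """Apply one broadcast instruction to each processing element's operands."""
    return [instruction(a, b) for a, b in zip(local_a, local_b)]


# Each position represents the local data of one processing element.
a = [1, 2, 3, 4]
b = [10, 20, 30, 40]

sums = processor_array_step(operator.add, a, b)        # single instruction: add
products = processor_array_step(operator.mul, a, b)    # single instruction: multiply
```

Each call issues one instruction, and the number of data items it touches is set by the size of the array rather than by the instruction stream.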

Page 100:

MISD

Multiple Instruction, Single Data
Example: systolic array??

[Figure: MISD example]

Page 101:

MIMD

Multiple Instruction, Multiple Data
Multiple-CPU computers
  Multiprocessors
  Multicomputers

Page 102:

Summary

Commercial parallel computers appeared in the 1980s
Multiple-CPU computers now dominate
Small-scale: Centralized multiprocessors
Large-scale: Distributed memory architectures (multiprocessors or multicomputers)