interconnection network topology design trade-offs

32
Interconnection Network Topology Design Trade-offs

Upload: toshi

Post on 13-Jan-2016

21 views

Category:

Documents


0 download

DESCRIPTION

Interconnection Network Topology Design Trade-offs. Organizational Structure. Processors datapath + control logic control logic determined by examining register transfers in the datapath Networks links switches network interfaces. Link Design/Engineering Space. - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Interconnection Network Topology Design Trade-offs

Interconnection Network Topology Design Trade-offs

Page 2: Interconnection Network Topology Design Trade-offs

2

Organizational Structure

Processors datapath + control logic control logic determined by examining register transfers in the

datapath Networks

links switches network interfaces

Page 3: Interconnection Network Topology Design Trade-offs

3

Link Design/Engineering Space

Cable of one or more wires/fibers with connectors at the ends attached to switches or interfaces

Short: - single logicalvalue at a time

Long: - stream of logicalvalues at a time

Narrow: - control, data and timingmultiplexed on wire

Wide: - control, data and timingon separate wires

Synchronous:- source & dest on sameclock

Asynchronous:- source encodes clock insignal

Page 4: Interconnection Network Topology Design Trade-offs

4

Example: Cray MPPs

T3D: Short, Wide, Synchronous (300 MB/s) 24 bits

16 data, 4 control, 4 reverse direction flow control single 150 MHz clock (including processor) flit = phit = 16 bits two control bits identify flit type (idle and framing)

no-info, routing tag, packet, end-of-packet

T3E: long, wide, asynchronous (500 MB/s) 14 bits, 375 MHz, LVDS flit = 5 phits = 70 bits

64 bits data + 6 control switches operate at 75 MHz framed into 1-word and 8-word read/write request packets

Page 5: Interconnection Network Topology Design Trade-offs

5

Switches

Cross-bar

InputBuffer

Control

OutputPorts

Input Receiver Transmiter

Ports

Routing, Scheduling

OutputBuffer

Page 6: Interconnection Network Topology Design Trade-offs

6

Switch Components

Output ports transmitter (typically drives clock and data)

Input ports synchronizer aligns data signal with local clock domain essentially FIFO buffer

Crossbar connects each input to any output degree limited by area or pinout

Buffering Control logic

complexity depends on routing logic and scheduling algorithm determine output port for each incoming packet arbitrate among inputs directed at same output

Page 7: Interconnection Network Topology Design Trade-offs

7

Interconnection Topologies

Topology

[Regular] [Irregular]

[Static] [Dynamic]

[One-Dimensional]

[Three-Dimensional]

[Two-Dimensional]

[Hypercube] [....][Single-Stage]

[Crossbar][Multistage]

[One-Sided]

[Two-Sided]

[....]

Page 8: Interconnection Network Topology Design Trade-offs

8

Static Connection Topologies

Mesh and Torus Illiac IV, MPP, DAP, CM-2, Paragon k-dimensional mesh N=nk, d=2k, D=k(n-1) wraparound variation - Illiac IV Torus n x n binary torus, d = 4, D = 2 n/2

Hypercubes iPSC, nCube, CM-2 N = 2n, d = n, D = n poor scalability, difficulty in packaging higher-dimensional

hypercubes

Page 9: Interconnection Network Topology Design Trade-offs

9

Dynamic Interconnection Networks

Bus-based networks Crossbar networks Single Stage Networks

Shuffle-exchange N input and N output

Crossbar Recirculating networks

Multi-stage Networks more than one stage of switching elements switching box: straight, exchange, upper broadcast, lower

broadcast network topology and control structure

Page 10: Interconnection Network Topology Design Trade-offs

10

Dynamic Interconnection Networks

Two-sided MIN connecting an arbitrary input to an arbitrary output blocking, rearrangeable, nonblocking networks blocking networks

Data manipulator, Omega, Flip, n-cube, Baseline rearrangeable networks

Benes network nonblocking networks

Clos, Crossbar

Page 11: Interconnection Network Topology Design Trade-offs

11

Interconnection Topologies

Logical Properties: distance, degree

Physcial properties length, width

Fully connected network diameter = 1 degree = N cost?

bus => O(N), but BW is O(1) - actually worse crossbar => O(N2) for BW O(N)

VLSI technology determines switch degree

Page 12: Interconnection Network Topology Design Trade-offs

12

Linear Arrays and Rings

Linear Array Diameter? N-1 Average Distance? 2/3N Bisection bandwidth? 1 Route A -> B given by relative address R = B-A Space O(N)

Torus? Or Ring Examples: FDDI, SCI, FiberChannel Arbitrated Loop, KSR1

Linear Array

Torus

Torus arranged to use short wires

Page 13: Interconnection Network Topology Design Trade-offs

13

Multidimensional Meshes and Tori

d-dimensional array N = kd-1 X ...X kO nodes

described by d-vector of coordinates (id-1, ..., iO)

d-dimensional k-ary mesh: N = kd

k = dN described by d-vector of radix k coordinate

d-dimensional k-ary torus (or k-ary d-cube)?

2D Grid 3D Cube

Page 14: Interconnection Network Topology Design Trade-offs

14

Properties

Routing relative distance: R = (b d-1 - a d-1, ... , b0 - a0 )

traverse ri = b i - a i hops in each dimension dimension-order routing

Average Distance Wire Length? d x 2k/3 for mesh dk/2 for cube

Degree? Bisection bandwidth? Partitioning?

k d-1 bidirectional links Physical layout?

2D in O(N) space Short wires higher dimension?

Page 15: Interconnection Network Topology Design Trade-offs

15

Real World 2D mesh

1824 node Paragon: 16 x 114 array a single cabinet: 16 X 4 array

Page 16: Interconnection Network Topology Design Trade-offs

16

Embeddings in two dimensions

Embed multiple logical dimension in one physical dimension using long wires

6 x 3 x 2

Page 17: Interconnection Network Topology Design Trade-offs

17

Trees

Diameter and ave distance logarithmic k-ary tree, height d = logk N address specified d-vector of radix k coordinates describing path down from root

Fixed degree Route up to common ancestor and down

R = B xor A let i be position of most significant 1 in R, route up i+1 levels down in direction given by low i+1 bits of B

H-tree space is O(N) with O(N) long wires Bisection BW?

Page 18: Interconnection Network Topology Design Trade-offs

18

Fat-Trees

Fatter links (really more of them) as you go up, so bisection BW scales with N

Fat Tree

Page 19: Interconnection Network Topology Design Trade-offs

19

Butterflies

Tree with lots of roots! N log N switches (actually N/2 x logN) Exactly one route from any source to any dest R = A xor B, at level i use ‘straight’ edge if ri=0, otherwise cross edge Bisection N/2 vs N (d-1)/d (d-dimensional mesh) vs 1 (tree)

0

1

2

3

4

16 node butterfly

0 1 0 1

0 1 0 1

0 1

building block

Page 20: Interconnection Network Topology Design Trade-offs

20

Benes network and Fat Tree

Back-to-back butterfly can route all permutations off line

What if you just pick a random mid point?

16-node Benes Network (Unidirectional)

16-node 2-ary Fat-Tree (Bidirectional)

Page 21: Interconnection Network Topology Design Trade-offs

21

Hypercubes

Also called binary n-cubes. # of nodes = N = 2n. O(logN) Hops Good bisection BW Complexity

Out degree is n = logN

0-D 1-D 2-D 3-D 4-D 5-D !

Page 22: Interconnection Network Topology Design Trade-offs

22

Relationship BttrFlies to Hypercubes

Wiring is isomorphic Except that Butterfly always takes log n steps

Page 23: Interconnection Network Topology Design Trade-offs

23

Toplology Summary

All have some “bad permutations” many popular permutations are very bad for meshs (transpose) randomness in wiring or routing makes it hard to find a bad one!

Topology Degree Diameter Ave Dist Bisection D (D ave) @ P=1024

1D Array 2 N-1 N / 3 1 huge

1D Ring 2 N/2 N/4 2

2D Mesh 4 2 (N1/2 - 1) 2/3 N1/2 N1/2 63 (21)

2D Torus 4 N1/2 1/2 N1/2 2N1/2 32 (16)

k-ary n-cube 2n n(k-1) n(k-1)/2 2kn-1 27 (13.5) @n=3

Hypercube n=log N n n/2 N/2 10 (5)

Page 24: Interconnection Network Topology Design Trade-offs

24

Wire Efficient Communication Networks for Multicomputers

What makes a network efficient? Efficient use of the limiting resources Limiting Factors

switches and pins were only considered the limiting factors Wires are limiting factors because of power and delay as well as

density At the board level as well as at the chip level, the system

interconnection is limited by wire density Most of the power dissipated in the networks is CV2f power to used to

drive wires. Most of the delay is propagation delay over wires or RC delay in

driving wires

Page 25: Interconnection Network Topology Design Trade-offs

25

In the 3D world

For n nodes, bisection area is O(n2/3 )

For large n, bisection bandwidth is limited to O(n2/3 ) Bill Dally, IEEE TPDS, [Dal90a] For fixed bisection bandwidth, low-dimensional k-ary d-cubes are

better (otherwise higher is better) i.e., a few short fat wires are better than many long thin wires What about many long fat wires?

Page 26: Interconnection Network Topology Design Trade-offs

262-ary 3-fly

BI = N/2 : high bisection widthBWI = Nw/2din = dout = kd = 2k : low degreeD = d+1 : low diameter

The Design Objective of the Network

To minimize latency and maximize throughput Latency T(,L) :the average time required to deliver a

message Each node injects messages with average length L into the

network at an average rate of bits per cycle. Three independent variables: topology, routing, and flow control

TopologyIndirect Networks (k-ary d-flys: radix k and dimension d) No of processing nodes: N = kd

Page 27: Interconnection Network Topology Design Trade-offs

27

Wire Efficient Topology

Indirect Networks high bisection width, low degree, low diameter, long wires,

symmetry the bisection width B = N/2 does not reflect the actual maximum

wire density for this class of networks: vertical partition (N wires) more accurately reflects the wiring problems

wire area O(N2) : plane mapping - expensive N = kd. As one varies k and d with the number of processing

nodes, N, and BW fixed. the degree and diameter are directly controlled. the channel width remains fixed at w = BW/B=2BW/N. B is independent of the choice of k and d. disadvantage: it prevents the designer from trading off the bandwidth

of a channel against the diameter of the network.

Page 28: Interconnection Network Topology Design Trade-offs

28

Direct Networks (k-ary d-cubes) BD = 2N/k

BWD = 2Nw/k

din = dout = d d = 2d D = dk/2

For small d a low and controllable bisection width (N=kd) low degree high diameter short wires (d 3) wiring complexity O(N)

BI = N/2 : high bisection width

BWI = Nw/2

din = dout = k

d = 2k : low degree

D = d+1 : low diameter

Wire Efficient Topology

Page 29: Interconnection Network Topology Design Trade-offs

29

How Many Dimensions?

d = 2 or d = 3 Short wires, easy to build Many hops, low bisection bandwidth Requires traffic locality

d 4 Harder to build, more wires, longer average length Fewer hops, better bisection bandwidth Can handle non-local traffic

k-ary d-cubes provide a consistent framework for comparison N = kd

scale dimension (d) or nodes per dimension (k) assume cut-through

Page 30: Interconnection Network Topology Design Trade-offs

30

Traditional Scaling: Unloaded Latency(N)

Assumes equal channel width independent of node count or dimension dominated by average distance

0

20

40

60

80

100

120

140

0 2000 4000 6000 8000 10000

Machine Size (N)

Ave

Lat

ency

T(m

=40)

d=2

d=3

d=4

k=2

m/w

0

50

100

150

200

250

0 2000 4000 6000 8000 10000

Machine Size (N)

Ave L

ate

ncy T

(m=140)

Unit routing delay ( = 1) w = 1

Page 31: Interconnection Network Topology Design Trade-offs

31

Real Machines

Wide links, smaller routing delay Tremendous variation

Page 32: Interconnection Network Topology Design Trade-offs

32

Average Distance

but, equal channel width is not equal cost! Higher dimension => more channels

0

10

20

30

40

50

60

70

80

90

100

0 5 10 15 20 25

Dimension

Ave D

ista

nce

256

1024

16384

1048576

ave dist = d (k-1)/2