indirect adaptive routing on large scale interconnection ...isca09.cs.columbia.edu/pres/20.pdfkorean...
TRANSCRIPT
-
1
Indirect Adaptive Routing on Large Scale Interconnection Networks
Nan Jiang, William J. Dally
Computer System Laboratory
Stanford University
John Kim
Korean Advanced Institute of Science and Technology
-
2
Overview
• Indirect adaptive routing (IAR) – Allow adaptive routing decision to be based on local and
remote congestion information
• Main contributions – Three new IAR algorithms for large scale networks – Steady state and transient performance evaluations – Impact of network configurations – Cost of implementation
-
3
Presentation Outline
• Background – The dragonfly network – Adaptive routing
• Indirect adaptive routing algorithms • Performance results • Implementation considerations
-
4
Group 1 Group 0 Group 2 …
Global Network
…
…
…
Local Network
Router 1 Router 2
The Dragonfly Network
• High Radix Network – High radix routers – Small network diameter
• Each router – Three types of channels – Directly connected to a few
other groups • Each group
– Organized by a local network – Large number of global
channels (GC) • Large network with a global
diameter of one
Router 0 p 0 p 1
…
-
5
Routing on the Dragonfly
• Minimal Routing (MIN) 1. Source local network 2. Global network 3. Destination local network
• Some Adversarial traffic congests the global channels – Each group i sends all packets
to group i+1
• Oblivious solution: Valiant’s Algorithm (VAL) – Poor performance on benign
traffic
Router 0
Group 1 Group 0
…
Group 2 …
p 0 p 1
…
…
… Router 1 Router 2 Congestion
-
6
Adaptive Routing
• Choose between the MIN path and a VAL path at the packet source [Singh'05] – Decision metric: path delay – Delay: product of path
distance and path queue depth
• Measuring path queue length is unrealistic
• Use local queues length to approximate path – Require stiff backpressure
Source Router
q 0 q 1
q 2 q 3
MIN GC
VAL GC
Congestion
-
7
Adaptive Routing: Worst Case Traffic
0 0.1 0.2 0.3 0.4 0.5 100
150
200
250
300
350
400
450
Throughput (Flit Injection Rate)
Pack
et L
aten
cy (
Sim
ulat
ion
cycle
s)
Valiantʼs Minimal Adaptive
-
8
Indirect Adaptive Routing
• Improve routing decision through remote congestion information
• Previous method: – Credit round trip [Kim et. al ISCA’08]
• Three new methods: – Reservation – Piggyback – Progressive
-
9
Credit Round Trip (CRT)
• Delay the return of local credits from the congested router
• Creates the illusion of stiffer backpressure
• Drawbacks – Remote congestion is still
inferred through local queues
– Information not up to date Source Router
Congestion
Delayed Credits
Credits
MIN GC
VAL GC
[Kim et. al ISCA’08]
-
10
Reservation (RES)
• Each global channel track the number of incoming MIN packets
• Injected packets creates a reservation flit
• Routing decision based on the reservation outcome
• Drawbacks – Reservation flit flooding – Reservation delay
Source Router
Congestion
RES Flit
RES Failed
MIN GC
VAL GC
-
11
Piggyback (PB)
• Congestion broadcast – Piggybacking on each
packet – Send on idle channels
• Congestion data compression
• Drawbacks – Consumes extra
bandwidth – Congestion information
not up to date (broadcast delay)
Source Router
Congestion
GC Busy
GC Free
MIN GC
VAL GC
-
12
Progressive (PAR)
• MIN routing decisions at the source are not final
• VAL decisions are final • Switch to VAL when
encountering congestion
• Draw backs – Need an additional virtual
channel to avoid deadlock – Add extra hops
Source Router
Congestion
MIN GC
VAL GC
-
13
Experimental Setup
• Fully connected local and global networks – 33 groups – 1,056 nodes
• 10 cycle local channel latency • 100 cycle global channel latency • 10-flit packets
-
14
Steady State Traffic: Uniform Random
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 100
120
140
160
180
200
220
240
260
280
300
Throughput (Flit Injection Rate)
Pack
et L
aten
cy (
Sim
ulat
ion
cycle
s)
Piggyback Credit Round Trip Progressive Reservation Minimal
-
15
Steady State Traffic: Worst Case
0 0.1 0.2 0.3 0.4 0.5 100
150
200
250
300
350
400
450
Throughput (Flit Injection Rate)
Pack
et L
aten
cy (S
imul
atio
n cy
cles)
Piggyback Credit Round Trip Progressive Reservation Valiantʼs
-
16
Transient Traffic: Uniform Random to Worst Case
0 20 40 60 80 100 100
200
300
400
500
Cycles After Transition
Pack
et L
aten
cy
Average Packet Latency per Cycle - UR to WC
Progressive Piggyback
0 20 40 60 80 100 0
50
100
Cycles After Transition % o
f Pac
kets
Rou
ting
Nonm
inim
ally
% Packets Routing Non-minimally per Cycle - UR to WC
Progressive Piggyback
-
17
Network Configuration Considerations
• Packet size – RES requires long packets to amortize reservation flit cost – Routing decision is done on per packet basis
• Channel latency – Affects information delay (CRT, PB) – Affects packet delay (PAR, RES)
• Network size – Affects information bandwidth overhead (RES, PB)
• Global diameter greater than one – Need to exchange congestion information on the global
network
-
18
Cost Considerations
• Credit round trip – Credit delay tracker for every local channel
• Reservation – Reservation counter for every global channel – Additional buffering at the injection port to store packets
waiting for reservation
• Piggyback – Global channel lookup table for every router – Increase in packet size
• Progressive – Extra virtual channel for deadlock avoidance
-
19
Conclusion
• Three new indirect adaptive routing algorithms for large scale networks
• Performance and design evaluation of the algorithms
• Best Algorithm? – Piggyback performed the best under steady state traffic – Progressive responded fastest to transient changes
– Network configurations will affect some algorithm performance – Cost of implementation
-
20
Thank You!
• Questions?
-
21
Adaptive Routing: Uniform Traffic
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 100
120
140
160
180
200
220
240
260
280
300
Throughput - Flit Injection Rate
Pack
et L
aten
cy -
Sim
ulat
ion
cycle
s VAL MIN Adaptive
-
22
Transient Traffic: Worst Case to Uniform Random
-
23
Transient Traffic: Worst Case 1 to Worst Case 10
-
24
1000 Random Permutation Traffic
200 300 0 5
10 15 20 25 30 PB
Packet Latency
% o
f 1K
Perm
utat
ions
200 300 0 5
10 15 20 25 30 CRT
Packet Latency %
of 1
K Pe
rmut
atio
ns
200 300 0 5
10 15 20 25 30 PAR
Packet Latency
% o
f 1K
Perm
utat
ions
200 300 0
5 10 15 20 25 30 RES
Packet Latency
% o
f 1K
Perm
utat
ions
200 300 0 5
10 15 20 25 30 VAL
Packet Latency
% o
f 1K
Perm
utat
ions
-
25
Effect of Packet size on RES: Worst Case Traffic
0 0.1 0.2 0.3 0.4 0.5 0
50
100
150
200
250
300
350
400
450
500
550
Throughput - Flit Injection Rate
Late
ncy
- Sim
ulat
ion
cycle
s
1 Flit 2 Flits 4 Flits 8 Flits
-
26
Large local network: Uniform Random
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 0
50
100
150
200
250
300
350
400
Throughput - Flit Injection Rate
Pack
et L
aten
cy -
Sim
ulat
ion
cycle
s
PB CRT MIN PAR RES
-
27
Large local network: Worst Case
0 0.1 0.2 0.3 0.4 0.5 0
100
200
300
400
500
600
Throughput - Flit Injection Rate
Pack
et L
aten
cy -
Sim
ulat
ion
cycle
s
PB CRT PAR RES VAL