indirect adaptive routing on large scale interconnection ...isca09.cs.columbia.edu/pres/20.pdfkorean...

27
1 Indirect Adaptive Routing on Large Scale Interconnection Networks Nan Jiang, William J. Dally Computer System Laboratory Stanford University John Kim Korean Advanced Institute of Science and Technology

Upload: others

Post on 25-Jan-2021

1 views

Category:

Documents


0 download

TRANSCRIPT

  • 1

    Indirect Adaptive Routing on Large Scale Interconnection Networks

    Nan Jiang, William J. Dally

    Computer System Laboratory

    Stanford University

    John Kim

    Korean Advanced Institute of Science and Technology

  • 2

    Overview

    •  Indirect adaptive routing (IAR) –  Allow adaptive routing decision to be based on local and

    remote congestion information

    •  Main contributions –  Three new IAR algorithms for large scale networks –  Steady state and transient performance evaluations –  Impact of network configurations –  Cost of implementation

  • 3

    Presentation Outline

    • Background – The dragonfly network – Adaptive routing

    •  Indirect adaptive routing algorithms • Performance results •  Implementation considerations

  • 4

    Group 1 Group 0 Group 2 …

    Global Network

    Local Network

    Router 1 Router 2

    The Dragonfly Network

    •  High Radix Network –  High radix routers –  Small network diameter

    •  Each router –  Three types of channels –  Directly connected to a few

    other groups •  Each group

    –  Organized by a local network –  Large number of global

    channels (GC) •  Large network with a global

    diameter of one

    Router 0 p 0 p 1

  • 5

    Routing on the Dragonfly

    •  Minimal Routing (MIN) 1.  Source local network 2.  Global network 3.  Destination local network

    •  Some Adversarial traffic congests the global channels –  Each group i sends all packets

    to group i+1

    •  Oblivious solution: Valiant’s Algorithm (VAL) –  Poor performance on benign

    traffic

    Router 0

    Group 1 Group 0

    Group 2 …

    p 0 p 1

    … Router 1 Router 2 Congestion

  • 6

    Adaptive Routing

    •  Choose between the MIN path and a VAL path at the packet source [Singh'05] –  Decision metric: path delay –  Delay: product of path

    distance and path queue depth

    •  Measuring path queue length is unrealistic

    •  Use local queues length to approximate path –  Require stiff backpressure

    Source Router

    q 0 q 1

    q 2 q 3

    MIN GC

    VAL GC

    Congestion

  • 7

    Adaptive Routing: Worst Case Traffic

    0 0.1 0.2 0.3 0.4 0.5 100

    150

    200

    250

    300

    350

    400

    450

    Throughput (Flit Injection Rate)

    Pack

    et L

    aten

    cy (

    Sim

    ulat

    ion

    cycle

    s)

    Valiantʼs Minimal Adaptive

  • 8

    Indirect Adaptive Routing

    •  Improve routing decision through remote congestion information

    •  Previous method: –  Credit round trip [Kim et. al ISCA’08]

    •  Three new methods: –  Reservation –  Piggyback –  Progressive

  • 9

    Credit Round Trip (CRT)

    •  Delay the return of local credits from the congested router

    •  Creates the illusion of stiffer backpressure

    •  Drawbacks –  Remote congestion is still

    inferred through local queues

    –  Information not up to date Source Router

    Congestion

    Delayed Credits

    Credits

    MIN GC

    VAL GC

    [Kim et. al ISCA’08]

  • 10

    Reservation (RES)

    •  Each global channel track the number of incoming MIN packets

    •  Injected packets creates a reservation flit

    •  Routing decision based on the reservation outcome

    •  Drawbacks –  Reservation flit flooding –  Reservation delay

    Source Router

    Congestion

    RES Flit

    RES Failed

    MIN GC

    VAL GC

  • 11

    Piggyback (PB)

    •  Congestion broadcast –  Piggybacking on each

    packet –  Send on idle channels

    •  Congestion data compression

    •  Drawbacks –  Consumes extra

    bandwidth –  Congestion information

    not up to date (broadcast delay)

    Source Router

    Congestion

    GC Busy

    GC Free

    MIN GC

    VAL GC

  • 12

    Progressive (PAR)

    •  MIN routing decisions at the source are not final

    •  VAL decisions are final •  Switch to VAL when

    encountering congestion

    •  Draw backs –  Need an additional virtual

    channel to avoid deadlock –  Add extra hops

    Source Router

    Congestion

    MIN GC

    VAL GC

  • 13

    Experimental Setup

    •  Fully connected local and global networks – 33 groups – 1,056 nodes

    •  10 cycle local channel latency •  100 cycle global channel latency •  10-flit packets

  • 14

    Steady State Traffic: Uniform Random

    0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 100

    120

    140

    160

    180

    200

    220

    240

    260

    280

    300

    Throughput (Flit Injection Rate)

    Pack

    et L

    aten

    cy (

    Sim

    ulat

    ion

    cycle

    s)

    Piggyback Credit Round Trip Progressive Reservation Minimal

  • 15

    Steady State Traffic: Worst Case

    0 0.1 0.2 0.3 0.4 0.5 100

    150

    200

    250

    300

    350

    400

    450

    Throughput (Flit Injection Rate)

    Pack

    et L

    aten

    cy (S

    imul

    atio

    n cy

    cles)

    Piggyback Credit Round Trip Progressive Reservation Valiantʼs

  • 16

    Transient Traffic: Uniform Random to Worst Case

    0 20 40 60 80 100 100

    200

    300

    400

    500

    Cycles After Transition

    Pack

    et L

    aten

    cy

    Average Packet Latency per Cycle - UR to WC

    Progressive Piggyback

    0 20 40 60 80 100 0

    50

    100

    Cycles After Transition % o

    f Pac

    kets

    Rou

    ting

    Nonm

    inim

    ally

    % Packets Routing Non-minimally per Cycle - UR to WC

    Progressive Piggyback

  • 17

    Network Configuration Considerations

    •  Packet size –  RES requires long packets to amortize reservation flit cost –  Routing decision is done on per packet basis

    •  Channel latency –  Affects information delay (CRT, PB) –  Affects packet delay (PAR, RES)

    •  Network size –  Affects information bandwidth overhead (RES, PB)

    •  Global diameter greater than one –  Need to exchange congestion information on the global

    network

  • 18

    Cost Considerations

    •  Credit round trip –  Credit delay tracker for every local channel

    •  Reservation –  Reservation counter for every global channel –  Additional buffering at the injection port to store packets

    waiting for reservation

    •  Piggyback –  Global channel lookup table for every router –  Increase in packet size

    •  Progressive –  Extra virtual channel for deadlock avoidance

  • 19

    Conclusion

    •  Three new indirect adaptive routing algorithms for large scale networks

    •  Performance and design evaluation of the algorithms

    •  Best Algorithm? –  Piggyback performed the best under steady state traffic –  Progressive responded fastest to transient changes

    –  Network configurations will affect some algorithm performance –  Cost of implementation

  • 20

    Thank You!

    • Questions?

  • 21

    Adaptive Routing: Uniform Traffic

    0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 100

    120

    140

    160

    180

    200

    220

    240

    260

    280

    300

    Throughput - Flit Injection Rate

    Pack

    et L

    aten

    cy -

    Sim

    ulat

    ion

    cycle

    s VAL MIN Adaptive

  • 22

    Transient Traffic: Worst Case to Uniform Random

  • 23

    Transient Traffic: Worst Case 1 to Worst Case 10

  • 24

    1000 Random Permutation Traffic

    200 300 0 5

    10 15 20 25 30 PB

    Packet Latency

    % o

    f 1K

    Perm

    utat

    ions

    200 300 0 5

    10 15 20 25 30 CRT

    Packet Latency %

    of 1

    K Pe

    rmut

    atio

    ns

    200 300 0 5

    10 15 20 25 30 PAR

    Packet Latency

    % o

    f 1K

    Perm

    utat

    ions

    200 300 0

    5 10 15 20 25 30 RES

    Packet Latency

    % o

    f 1K

    Perm

    utat

    ions

    200 300 0 5

    10 15 20 25 30 VAL

    Packet Latency

    % o

    f 1K

    Perm

    utat

    ions

  • 25

    Effect of Packet size on RES: Worst Case Traffic

    0 0.1 0.2 0.3 0.4 0.5 0

    50

    100

    150

    200

    250

    300

    350

    400

    450

    500

    550

    Throughput - Flit Injection Rate

    Late

    ncy

    - Sim

    ulat

    ion

    cycle

    s

    1 Flit 2 Flits 4 Flits 8 Flits

  • 26

    Large local network: Uniform Random

    0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 0

    50

    100

    150

    200

    250

    300

    350

    400

    Throughput - Flit Injection Rate

    Pack

    et L

    aten

    cy -

    Sim

    ulat

    ion

    cycle

    s

    PB CRT MIN PAR RES

  • 27

    Large local network: Worst Case

    0 0.1 0.2 0.3 0.4 0.5 0

    100

    200

    300

    400

    500

    600

    Throughput - Flit Injection Rate

    Pack

    et L

    aten

    cy -

    Sim

    ulat

    ion

    cycle

    s

    PB CRT PAR RES VAL