Elastic-Buffer Flow-Control for On-Chip Networks
George Michelogiannakis, James Balfour, William J. Dally
Computer Systems Laboratory, Stanford University
Introduction
Elastic-buffer (EB) flow-control uses the channels as distributed FIFOs
• Input buffers at routers are not needed
Can provide up to 12% more throughput per unit power
• Equal zero-load latency
Reduces router cycle time by 18%
• Compared to virtual-channel (VC) routers
Outline
Building elastic-buffered channels
• By using what is already there
Router microarchitecture
Deadlock avoidance
Load-sensing for adaptive routing
Evaluation
The Idea
Use the network channels as distributed FIFOs
Use that storage instead of input buffers at routers
• To remove input buffer area and power costs
[Figure: a pipelined channel compared with the same channel used as a FIFO]
Building an Elastic Buffer
To build an EB in a pipelined channel with master-slave flip-flops (FFs):
Use the master and slave latches as separate storage slots by driving their enables independently
[Figure: a master-slave FF and the elastic buffer built from it]
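The latch trick can be illustrated with a small behavioral sketch. It assumes idealized level-sensitive latches and a hand-written enable sequence. `Latch` and `store_two_flits` are hypothetical names and are not part of the original design.

```python
class Latch:
    """Level-sensitive D latch: transparent while enabled, holds its value otherwise."""

    def __init__(self):
        self.q = None

    def tick(self, d, enable):
        if enable:
            self.q = d
        return self.q


def store_two_flits(first, second):
    """Drive a master/slave latch pair with independent enables (hypothetical sequence).

    In an ordinary master-slave FF the two enables are opposite phases of the
    clock, so the slave reopens every cycle and the pair holds one value.
    Here the slave is kept closed (as if the downstream EB were not ready),
    so the pair ends up holding two different flits.
    """
    master, slave = Latch(), Latch()
    master.tick(first, enable=True)    # master captures the first flit
    master.tick(None, enable=False)    # master closes and holds it
    slave.tick(master.q, enable=True)  # slave copies the first flit
    slave.tick(None, enable=False)     # slave closes: downstream is stalled
    master.tick(second, enable=True)   # master captures the second flit
    master.tick(None, enable=False)    # both latches now hold data
    return master.q, slave.q
```

`store_two_flits("A", "B")` returns `("B", "A")`. The older flit waits in the slave and the newer one in the master, so the former FF provides two storage slots.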
How Elastic Buffer Channels Work
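Ready/valid handshake between elastic buffers
• Ready: at least one free storage slot
• Valid: non-empty (driving valid data)
• A flit advances from one EB to the next in a cycle when the upstream EB is valid and the downstream EB is ready
[Figure: example handshake over cycles 1 to 6]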
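The handshake rules can be exercised with a cycle-level sketch of a channel made of EB stages. The sketch makes three assumptions. Each stage has two slots. All ready/valid signals are sampled from the state at the start of a cycle. `EBStage`, `simulate`, and the `sink_ready` callback are hypothetical names chosen for illustration.

```python
from collections import deque


class EBStage:
    """One elastic-buffer stage with two slots (the two latches of a former FF)."""

    def __init__(self, capacity=2):
        self.capacity = capacity
        self.slots = deque()

    @property
    def ready(self):
        # Ready: at least one free storage slot.
        return len(self.slots) < self.capacity

    @property
    def valid(self):
        # Valid: non-empty, so the stage drives valid data downstream.
        return len(self.slots) > 0


def simulate(flits, num_stages, sink_ready, cycles):
    """Cycle-level sketch of a channel built from `num_stages` EB stages.

    `sink_ready(cycle)` says whether the receiver accepts a flit in that cycle.
    Returns the delivered flits (in order) and the final stage states.
    """
    source = deque(flits)
    stages = [EBStage() for _ in range(num_stages)]
    delivered = []
    for cycle in range(cycles):
        # Sample every signal from the start-of-cycle state. This is conservative:
        # a full stage does not accept a flit in the same cycle it sends one.
        readies = [s.ready for s in stages] + [sink_ready(cycle)]
        valids = [bool(source)] + [s.valid for s in stages]
        # Link i joins the upstream side (source or stage i-1) to the downstream
        # side (stage i or the sink); a flit crosses when valid and ready are both high.
        moves = [valids[i] and readies[i] for i in range(num_stages + 1)]
        for i in range(num_stages, -1, -1):  # downstream-first
            if not moves[i]:
                continue
            flit = source.popleft() if i == 0 else stages[i - 1].slots.popleft()
            if i == num_stages:
                delivered.append(flit)
            else:
                stages[i].slots.append(flit)
    return delivered, stages
```

Consider a sink that never accepts (`lambda c: False`), with ten flits and three stages. The channel fills to six flits, two per stage, and holds them in order. Turning the sink back on drains them without loss. Storage in the channel is doing the job of the router input buffer.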
Control Logic Area Overhead
Control logic is implemented as a four-state FSM with 10 gates and 2 FFs
• Cost is amortized over channel width
Example: control logic increases area of a 64-bit channel by 5%
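The 5% example can be turned into a rough scaling rule. This rests on an assumption of ours, not the slide's: channel storage area grows linearly with width W, while the control FSM is a fixed cost per EB. Under that model the overhead is about A_ctrl / (W · A_bit). A 5% overhead at 64 bits then implies A_ctrl ≈ 0.05 × 64 = 3.2 bits' worth of storage area. The helper below is a hypothetical extrapolation, not a reported result.

```python
def control_overhead(width_bits, ctrl_area_in_bit_slices=3.2):
    """Fractional area the EB control adds to a channel of `width_bits` bits.

    Assumption: datapath area is linear in width (one bit slice per bit) and the
    control FSM (10 gates + 2 FFs) costs a fixed number of bit slices. The default
    of 3.2 is back-computed from the slide's 5% at 64 bits (0.05 * 64); it is not
    a measured value.
    """
    if width_bits <= 0:
        raise ValueError("width must be positive")
    return ctrl_area_in_bit_slices / width_bits


for w in (32, 64, 128, 192):
    print(f"{w:4d}-bit channel: {control_overhead(w):.1%} control overhead")
```

Under this model, doubling the width halves the relative overhead. That is the sense in which the control cost is amortized over the channel width.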
Outline
Building elastic-buffered channels
Router microarchitecture
• Use EB flow-control through the router
Deadlock avoidance
Load-sensing for adaptive routing
Evaluation
Use EB Flow-Control Through the Router
[Figure: VC input-buffered router compared with the EB router]
• Input buffer replaced by an input EB
• VC and switch (SW) allocators removed; per-output arbiters used instead
• Three-slot output EB to cover for arbitration done one cycle in advance
• Look-ahead (LA) routing is also applicable to EB networks
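With no VCs, each input presents at most one head flit, and that flit requests a single output. Below is a sketch of the resulting per-output arbitration. The slide only says per-output arbiters replace the VC and switch allocators, so several details are assumptions: the round-robin policy, and the names `RoundRobinArbiter` and `allocate`. The sketch does not model the one-cycle-ahead timing that the three-slot output EB covers.

```python
class RoundRobinArbiter:
    """One arbiter per output port; round-robin priority is an assumption."""

    def __init__(self, num_inputs):
        self.num_inputs = num_inputs
        self.last = num_inputs - 1  # input 0 has the highest priority at first

    def arbitrate(self, requests):
        """`requests` holds one bool per input. Returns the granted input or None."""
        for offset in range(1, self.num_inputs + 1):
            i = (self.last + offset) % self.num_inputs
            if requests[i]:
                self.last = i  # the winner gets the lowest priority next time
                return i
        return None


def allocate(head_routes, num_outputs, arbiters):
    """One cycle of output arbitration.

    `head_routes[i]` is the output requested by input i's head flit, or None.
    Returns {output: granted_input}.
    """
    grants = {}
    for out in range(num_outputs):
        winner = arbiters[out].arbitrate([r == out for r in head_routes])
        if winner is not None:
            grants[out] = winner
    return grants
```

In this sketch every input requests at most one output, so no input can receive two grants in a cycle. The per-output arbiters therefore need no coordination between them, unlike a separable switch allocator in a VC router.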
Outline
Building elastic-buffered channels
Router microarchitecture
Deadlock avoidance
• How to provide isolation without VCs
Load-sensing for adaptive routing
Evaluation
Deadlock Avoidance: Duplicate Channels
No input buffers, hence no virtual channels
Three types of possible deadlocks:
1. Protocol deadlock
2. Cyclic flit dependency in network
Solution: Duplicate physical channels
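A protocol deadlock can be viewed as a cycle in a wait-for graph between resources. The sketch below uses a deliberately simplified model, which is our assumption. A request holding a channel waits until the destination accepts it. The destination accepts only when it has room to inject the reply. The resource names are hypothetical. The check is a standard depth-first cycle search.

```python
def has_cycle(graph):
    """Return True if the directed wait-for graph {node: [successors]} has a cycle."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the DFS stack, finished
    color = {}

    def visit(node):
        color[node] = GRAY
        for nxt in graph.get(node, []):
            state = color.get(nxt, WHITE)
            if state == GRAY:  # back edge: nxt is still on the stack
                return True
            if state == WHITE and visit(nxt):
                return True
        color[node] = BLACK
        return False

    return any(color.get(n, WHITE) == WHITE and visit(n) for n in graph)


# Requests and replies share one physical channel.
shared = {
    "channel": ["reply_slot"],   # requests wait on the destination's reply slot
    "reply_slot": ["channel"],   # the reply needs the same (possibly full) channel
}
# Duplicate physical channels: one for requests, one for replies.
duplicated = {
    "request_channel": ["reply_slot"],
    "reply_slot": ["reply_channel"],
    "reply_channel": [],          # replies are assumed always consumed at their sink
}

print("shared channel has a cycle:", has_cycle(shared))
print("duplicate channels have a cycle:", has_cycle(duplicated))
```

Splitting the channel breaks the self-dependency, so the graph becomes acyclic. Without input buffers, there are no VCs to provide that isolation.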
Deadlock Avoidance: No Interleaving
3. Interleaving deadlock
• New head flits require destination registers
• Occupied destination registers are freed only when their packets' tail flits pass
• Tail flits cannot bypass the new head flit
Solution: Disallow packet interleaving
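One way to disallow interleaving is to lock an output to the input whose head flit won it, until that packet's tail flit has passed. The sketch below makes that assumption. The names `OutputPort` and `drain` are hypothetical, and the rotating offer order exists only to tempt the port into interleaving.

```python
class OutputPort:
    """Output that stays locked to one packet from its head flit to its tail flit."""

    def __init__(self):
        self.owner = None  # input currently holding the port, or None
        self.sent = []

    def try_send(self, input_id, flit):
        """`flit` is (packet_id, kind), kind in {'head', 'body', 'tail', 'single'}.
        Returns True if the flit was forwarded this cycle."""
        packet_id, kind = flit
        if self.owner is None:
            if kind not in ("head", "single"):
                raise ValueError("a free port can only be claimed by a head flit")
            if kind == "head":
                self.owner = input_id  # lock the port for this packet
        elif self.owner != input_id:
            return False  # another packet holds the port: wait, do not interleave
        elif kind == "tail":
            self.owner = None  # tail passes: release the port
        self.sent.append(packet_id)
        return True


def drain(queues):
    """Each cycle, offer the front flit of each input queue in turn; the port
    forwards at most one flit per cycle."""
    port = OutputPort()
    start = 0
    while any(queues):
        for k in range(len(queues)):
            i = (start + k) % len(queues)
            if queues[i] and port.try_send(i, queues[i][0]):
                queues[i].pop(0)
                break
        start = (start + 1) % len(queues)  # rotate the offer order every cycle
    return port.sent
```

Two three-flit packets offered alternately still leave the port back to back: `p0, p0, p0, p1, p1, p1`. A new head flit never sits between a packet's head and its tail.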
Duplicating Channels Between Routers
Duplicate channels with neckdown
• Small improvement (still one switch port), large cost
Duplicate channels with duplicate switch ports
• Excessive cost (switch cost is quadratic)
Dividing Into Sub-Networks More Efficient
Divide into sub-networks
• Double bandwidth, double the cost
• However, more beneficial when the datapath is narrowed to normalize for throughput or power
• Again, due to the quadratic switch cost
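The quadratic-cost argument can be checked with a first-order crossbar model. It assumes crossbar area grows with the square of (ports × width) and that a mesh router has 5 ports. Both are assumptions for illustration, and the units are arbitrary, so only ratios to the baseline are meaningful.

```python
def crossbar_area(ports, width_bits):
    """First-order crossbar area model (assumption): quadratic in ports * width."""
    return (ports * width_bits) ** 2


def compare(ports=5, width=64):
    """Crossbar area of each option relative to a single baseline router."""
    base = crossbar_area(ports, width)
    options = {
        # Duplicate channels with duplicate switch ports: one switch, 2x the ports.
        "duplicate switch ports": crossbar_area(2 * ports, width),
        # Two independent sub-networks, each a full-width copy of the router.
        "2 sub-networks, full width": 2 * crossbar_area(ports, width),
        # Two sub-networks narrowed to half width (same total channel width).
        "2 sub-networks, half width": 2 * crossbar_area(ports, width // 2),
    }
    return {name: area / base for name, area in options.items()}


for name, ratio in compare().items():
    print(f"{name}: {ratio:.2f}x baseline crossbar area")
```

Under this model, duplicating switch ports costs 4x the baseline crossbar area and two full-width sub-networks cost 2x. Two half-width sub-networks cost 0.5x. Neckdown is not modeled here: the duplicate channels share one switch port, so the crossbar term stays at 1x while channel cost still doubles.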
Outline
Building elastic-buffered channels
Router microarchitecture
Deadlock avoidance
Load-sensing for adaptive routing
• Propose a load metric for EB networks
Evaluation
Output Channel Occupancy Load Metric
Flit-buffered networks use the credit count
EB networks measure output channel occupancy
• At a certain segment of the output channel (shown in red)
• Occupancy is decremented when flits leave that segment
• Occupancy is incremented by a packet's length when the routing decision is made
• Packets see other routing decisions made in the same cycle
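Below is a sketch of the occupancy metric and of an adaptive choice that uses it. Several choices are assumptions for illustration: the names `OccupancyCounter` and `route_cycle`, processing same-cycle decisions one after another, and picking the least-occupied productive output.

```python
class OccupancyCounter:
    """Occupancy of the monitored segment of one output channel (hypothetical name)."""

    def __init__(self):
        self.occupancy = 0

    def on_route(self, packet_len):
        # Charged for the whole packet as soon as the routing decision is made.
        self.occupancy += packet_len

    def on_flit_leaves_segment(self):
        # Each flit leaving the monitored segment frees one unit.
        if self.occupancy == 0:
            raise RuntimeError("flit left a segment with nothing charged to it")
        self.occupancy -= 1


def route_cycle(packets, candidates, counters):
    """Route every packet deciding in one cycle to its least-occupied candidate.

    `packets` is a list of (name, length) and `candidates[name]` lists that
    packet's productive outputs. Counters are charged immediately, so a later
    decision in the same cycle already sees the earlier ones.
    """
    choices = {}
    for name, length in packets:
        best = min(candidates[name], key=lambda out: counters[out].occupancy)
        counters[best].on_route(length)
        choices[name] = best
    return choices


counters = {"east": OccupancyCounter(), "north": OccupancyCounter()}
both = ["east", "north"]
print(route_cycle([("a", 4), ("b", 4)], {"a": both, "b": both}, counters))
```

Suppose both packets were charged only at the end of the cycle. Each would then see zero occupancy and, with this tie-break, pick `east`. Charging at decision time spreads them across the two outputs instead.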
Outline
Building elastic-buffered channels
Router microarchitecture
Deadlock avoidance
Load-sensing for adaptive routing
Evaluation
• Compare throughput, power, area, latency, cycle time
Evaluation Methodology
Used a modified version of BookSim
Area/power estimations from a 65nm library
• Input buffers modeled as SRAM cells
• Throughput/power-optimal number of VCs and buffer depth
• Two sub-networks: request and reply
Results averaged over a set of 6 traffic patterns
Constant packet size (512 bits)
Swept channel width from 28 to 192 bits
Low-swing channels: 0.3 of the full-swing repeated-wire traversal power
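With a constant 512-bit packet, the channel-width sweep changes how many flits each packet needs. The helper below assumes one flit per channel width and no extra header or padding bits, which is an illustrative assumption. It also applies the 0.3 low-swing factor from the methodology. The function names are hypothetical.

```python
import math

PACKET_BITS = 512        # constant packet size used in the evaluation
LOW_SWING_FACTOR = 0.3   # low-swing traversal power relative to full-swing repeated wires


def flits_per_packet(channel_width_bits, packet_bits=PACKET_BITS):
    """Flits needed per packet, assuming one flit per channel width and no
    extra header or padding bits (an illustrative assumption)."""
    return math.ceil(packet_bits / channel_width_bits)


def low_swing_traversal_power(full_swing_power):
    """Channel traversal power of a low-swing channel, scaled from full swing."""
    return LOW_SWING_FACTOR * full_swing_power


for width in (28, 64, 128, 192):
    print(width, "bits ->", flits_per_packet(width), "flits per packet")
```

Across the swept range, a packet spans from 19 flits (28-bit channel) down to 3 flits (192-bit channel).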
Throughput-Power Gains in 2D Mesh
EB network improvement:
Same power: 10% increased throughput
Same throughput: 12% reduced power
[Figure: throughput vs. power in a 2D mesh, annotated with the EB throughput gain]
Throughput-Area Gains in 2D Mesh
2% improvement for EB networks
Latency-Throughput in 2D Mesh
Zero-load latency is equal for the EB and VC networks
Power Breakdown: No Input Buffer Power
[Figure: Mesh low-swing power breakdown (W) at a 2% packet injection rate, VC-Buff vs. EBN. Components: output clock, output FF, crossbar control, crossbar power, input buffer write, input buffer read, channel FF, channel clock, channel traversal.]
Area Breakdown: No Input Buffer Area
[Figure: Low-swing mesh area breakdown (mm²), VC-Buff vs. EBN. Components: channel, switch, input, output.]
Router RTL Implementation
EB router: no buffers, VCs, allocators, or credits
• VC router had look-ahead routing
VC router buffers: FF arrays, 2 VCs with 8 slots each
| Aspect | VC router | EB router | Savings |
|---|---|---|---|
| Area (μm²) | 63,515 | 14,730 | 77% |
| Clock period (ns) | 3.3 | 2.7 | 18% |
| Power (mW) | 2.59 | 0.12 | 95% |

45nm LP-CMOS, worst case. Mesh with 5x5 routers, dimension-order routing (DOR), 64-bit datapath.
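The Savings column is consistent with computing 1 − EB/VC for each row. The short check below assumes that definition. The names `results` and `savings` are illustrative.

```python
# Values from the RTL comparison of the VC router and the EB router.
results = {
    "Area (um^2)": (63_515, 14_730),
    "Clock (ns)": (3.3, 2.7),
    "Power (mW)": (2.59, 0.12),
}


def savings(vc, eb):
    """Fractional reduction of the EB router relative to the VC router."""
    return 1 - eb / vc


for aspect, (vc, eb) in results.items():
    print(f"{aspect:12s} {savings(vc, eb):.0%}")
```

The 18% figure is a reduction in clock period. Expressed as clock frequency, the same numbers give 3.3 / 2.7 ≈ 1.22x.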
Conclusions
EB flow-control uses channels as distributed FIFOs
• Removes input buffers from routers
• Uses duplicate physical channels instead of VCs
Increases throughput per unit power by up to 12% with low-swing channels
• The gain depends on what fraction of the overall cost the input buffers constitute
Reduces router cycle time by 18%
Flow-control choice depends on design parameters and priorities
Questions?
Thanks for your attention