SMART: A Single-Cycle Reconfigurable NoC for SoC Applications
-Jyoti Wadhwani
Chia-Hsin Owen Chen, Sunghyun Park, Tushar Krishna, Suvinay Subramaniam, Anantha P. Chandrakasan, Li-Shiuan Peh
Department of Electrical Engineering and Computer Science, MIT, Cambridge
ECE 284 Spring 2013
Challenges with this evolution
• Scaling “compute” is possible: Moore’s Law
• What about the communication network?
More “hops” are bad
• At each hop (router):
  • Latency
  • Power
• At the system level:
  • Delayed responses
  • Delayed injection of fresh requests
  • Overall shutdown
  • Increased power budget
Motivation
[Figure: repeated wires spanning successive 1mm segments]
• NoCs should deliver low latency and high bandwidth with low power and area overhead
• Wires can be driven multiple mm within a cycle using repeaters
  • The number of hops covered in a cycle depends on the repeater circuit and the wire parasitics
• Signaling at low voltage swing can lower energy consumption and propagation delay
• Wire delay is much shorter than a typical router cycle time
  • Flits can traverse multiple hops in a single cycle by bypassing buffering and arbitration at the routers
• Router cycle time = 500ps for a 2GHz clock; full-swing repeated wire delay ≈ 100ps/mm
  • By bypassing the buffers, we can traverse 5mm in 1 clock cycle!
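As a worked check of the 5mm figure, dividing the clock period by the per-mm wire delay gives the distance a flit can cover in one cycle. This simple ratio assumes no margin is reserved for flip-flop setup time or clock skew.

```latex
$$
t_{\text{cycle}} = \frac{1}{2\,\text{GHz}} = 500\,\text{ps}, \qquad
d_{\max} = \frac{t_{\text{cycle}}}{t_{\text{wire}}} = \frac{500\,\text{ps}}{100\,\text{ps/mm}} = 5\,\text{mm}
$$
```

With 1mm tiles, this corresponds to 5 router hops per cycle.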
Approaches to reduce on-chip latency
• Application-specific topology reconfiguration is needed
  • To bypass buffering and arbitration at routers
• The topology can be reconfigured to match application-specific communication patterns at:
  • Design time
    • Requires knowledge of all applications and their communication graphs at design time
    • Overhead: wiring density to support dedicated links
  • Runtime
    • Requires computing contention-free routes that allow flits to bypass the queues
This paper performs online reconfiguration of the network routers at runtime, enabling different applications to run on tailored topologies
SMART Link
• Voltage-lock repeater (VLR): an asynchronous low-swing repeater circuit
  • Enables single-cycle multi-hop link traversal
  • The low-swing link stretches the maximum distance a repeated link can span in a single clock cycle
• Power consumption for transmitting 5.5Gb/s data with BER less than […]:
  • Full-swing repeater: 4.21mW
  • VLR: 3.78mW
• Link delay:
  • Full-swing repeaters: 100ps/mm
  • VLRs: 60ps/mm
• Node X is voltage-locked to swing near the threshold voltage of INV1x without a decrease in drive current
• The low-swing voltage level is determined by transistor sizes and link wire impedance; simulations were performed across process corners
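The SMART Link figures can be turned into per-bit energy and per-cycle reach with a few lines of Python. This is a back-of-the-envelope sketch: the dictionary and function names are hypothetical, the 2GHz clock is taken from the Motivation and Results slides, and no timing margin is subtracted from the clock period.

```python
import math

# Values quoted on the SMART Link slide.
DATA_RATE_GBPS = 5.5  # link data rate, Gb/s
CLOCK_GHZ = 2.0       # NoC clock used elsewhere in the talk (assumption for this link)

LINKS = {
    # name: (power in mW, delay in ps/mm)
    "full-swing": (4.21, 100.0),
    "VLR": (3.78, 60.0),
}


def energy_per_bit_pj(power_mw: float, rate_gbps: float) -> float:
    """mW / (Gb/s) = 1e-3 J/s / 1e9 b/s = 1e-12 J/b, i.e. pJ per bit."""
    return power_mw / rate_gbps


def reach_per_cycle_mm(delay_ps_per_mm: float, clock_ghz: float) -> float:
    """Distance a repeated wire covers in one clock period (no margin)."""
    cycle_ps = 1000.0 / clock_ghz
    return cycle_ps / delay_ps_per_mm


def summarize() -> dict:
    """Map each link type to (energy per bit in pJ, reach per cycle in mm)."""
    rows = {}
    for name, (power_mw, delay) in LINKS.items():
        rows[name] = (
            energy_per_bit_pj(power_mw, DATA_RATE_GBPS),
            reach_per_cycle_mm(delay, CLOCK_GHZ),
        )
    return rows


if __name__ == "__main__":
    for name, (e_pj, reach) in summarize().items():
        print(f"{name:>10}: {e_pj:.3f} pJ/bit, {reach:.2f} mm/cycle, "
              f"{math.floor(reach)} whole 1mm hops")
```

Run as a script, it reports roughly 0.77 pJ/bit and 5mm per cycle for the full-swing link, versus roughly 0.69 pJ/bit and 8.3mm per cycle for the VLR. The VLR figure is consistent with the 8-hop reach quoted in the Results section.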
SMART Router Microarchitecture
SMART Crossbar
[Figure: SMART crossbar with the bypass path labeled]
• If the MUX is preset to connect the incoming link to the crossbar, the bypass path is enabled
• If the MUX is set to connect the input port buffer to the crossbar, the bypass path is disabled
  • The bypass path is disabled when the same output port is shared by multiple input ports
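The MUX preset rule can be captured in a small Python sketch. It is a hypothetical software model, not the paper's hardware: `preset_muxes` and `MuxSel` are illustrative names, and the sketch assumes each input port carries at most one flow in a given configuration.

```python
from collections import Counter
from enum import Enum


class MuxSel(Enum):
    BYPASS = "incoming link -> crossbar"  # bypass path enabled
    BUFFER = "input buffer -> crossbar"   # bypass path disabled; flit is buffered


def preset_muxes(flows):
    """Choose a MUX setting for each input port of one router.

    flows: iterable of (input_port, output_port) pairs crossing this router.
    An input may bypass only if no other input port competes for the same
    output port; otherwise it must be buffered and arbitrate.
    """
    flows = list(flows)
    inputs = [in_port for in_port, _ in flows]
    if len(inputs) != len(set(inputs)):
        raise ValueError("sketch assumes at most one flow per input port")

    # Count how many input ports want each output port.
    demand = Counter(out_port for _, out_port in flows)
    return {
        in_port: MuxSel.BYPASS if demand[out_port] == 1 else MuxSel.BUFFER
        for in_port, out_port in flows
    }
```

For example, `preset_muxes([("West", "East"), ("South", "East"), ("North", "South")])` buffers both West and South, since they compete for the East output, and lets North bypass.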
SMART Flow
• The green and purple flows do not overlap with each other, so they traverse from the source to the destination router in a single clock cycle
• The red and blue flows overlap, so they must stop at routers 9 and 10 to arbitrate for the shared crossbar ports
• Reverse credit mesh network: keeps track of the free VCs at the endpoint of an arbitrary SMART route
  • For the blue flow, routers 3, 7 and 11 forward credits from NIC3 to router 10’s East output port
  • The VC queue of a router tracks the VCs at the input port of a router multiple hops away, not just at its neighbor
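The credit bookkeeping can be sketched as follows. This is a hypothetical Python model under simplifying assumptions: one counter per SMART route, credits returned one at a time, and router names such as `R3` chosen only to echo the blue-flow example. The timing and signaling of the reverse credit mesh are not modeled.

```python
class SmartRouteCredits:
    """Hypothetical credit counter kept at the output port where a SMART route starts.

    It tracks free VCs at the input port of the route's endpoint, which may be
    several hops away, rather than at the immediate neighbor.
    """

    def __init__(self, endpoint_vcs: int, reverse_path: list[str]):
        if endpoint_vcs < 1:
            raise ValueError("endpoint must have at least one VC")
        self.endpoint_vcs = endpoint_vcs
        self.free_vcs = endpoint_vcs            # all endpoint VCs start free
        self.reverse_path = list(reverse_path)  # routers a returning credit crosses

    def try_send(self) -> bool:
        """Claim an endpoint VC for a new packet; refuse if none is known to be free."""
        if self.free_vcs == 0:
            return False
        self.free_vcs -= 1
        return True

    def vc_freed_at_endpoint(self) -> list[str]:
        """Return a credit when an endpoint VC frees up.

        The returned list is the sequence of routers that forwarded the credit
        over the reverse credit network (modeled only as a path, not timed).
        """
        if self.free_vcs >= self.endpoint_vcs:
            raise RuntimeError("more credits returned than VCs exist")
        self.free_vcs += 1
        return list(self.reverse_path)
```

With `SmartRouteCredits(endpoint_vcs=2, reverse_path=["R3", "R7", "R11"])`, the starting output port can send two packets and then stalls until `vc_freed_at_endpoint()` returns a credit forwarded through R3, R7 and R11.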
Results
• SMART is compared against two baselines:
  • Mesh:
    • No reconfiguration
    • Each hop takes 3 cycles in the router and 1 cycle in the link
  • Dedicated:
    • 1-cycle dedicated links tailored to each application
• At 2GHz, the SMART NoC can traverse 8mm within a single clock cycle, i.e. 8 hops with 1mm cores
• SMART is 1.5 cycles off the Dedicated baseline's performance when, for most of the flows, one core acts as the source and another acts as the sink
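A zero-load comparison of the two latency models can be sketched in Python. The Mesh cost (3 router cycles plus 1 link cycle per hop) comes from the slide. The SMART model, in which a contention-free route advances up to 8 hops per cycle, is a simplifying assumption that ignores NIC injection/ejection, arbitration stops and real traffic, so it overstates the savings.

```python
import math

MESH_ROUTER_CYCLES = 3  # per hop, from the Mesh baseline description
MESH_LINK_CYCLES = 1    # per hop
SMART_HPC_MAX = 8       # hops per cycle at 2GHz with 1mm cores (Results slide)


def mesh_latency(hops: int) -> int:
    """Zero-load Mesh latency: every hop pays router plus link cycles."""
    return hops * (MESH_ROUTER_CYCLES + MESH_LINK_CYCLES)


def smart_latency(hops: int, hpc_max: int = SMART_HPC_MAX) -> int:
    """Assumed zero-load SMART latency for a contention-free route.

    The flit advances up to hpc_max hops per cycle; injection/ejection and
    any stops for arbitration are deliberately left out.
    """
    return math.ceil(hops / hpc_max)
```

For a 6-hop route the sketch gives 24 cycles for the Mesh and 1 cycle for SMART. The measured 60% latency saving is smaller because real flows contend and stop at shared routers.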
Results
• The benefits of SMART are larger when certain tasks are tied to specific cores, resulting in longer paths
• The SMART NoC gives 60% latency savings and 2.2X power savings compared to the Mesh
• Power savings come from bypassing buffers, low-voltage signaling and clock gating at the routers
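Reading the headline numbers as relative reductions (an interpretation, since the slide gives no absolute values), they translate to:

```latex
$$
L_{\text{SMART}} = (1 - 0.60)\,L_{\text{Mesh}} = 0.40\,L_{\text{Mesh}}, \qquad
P_{\text{SMART}} = \frac{P_{\text{Mesh}}}{2.2} \approx 0.45\,P_{\text{Mesh}}
$$
```

That is, SMART needs roughly 45% of the Mesh's power.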
Conclusion
• The paper proposes:
  • an NoC architecture that reconfigures and tailors a generic mesh topology for SoC applications at runtime
  • a low-swing clockless repeated link circuit, embedded within the router crossbars, that allows packets to bypass all the way from the source to the destination core within a single clock cycle
Critiques/Comments
• Unlike gate delay, wire delay does not scale down as transistors shrink.
• In multi-mode designs (operating at different voltage levels), and because wire resistance increases with temperature, the repeater transistors must be sized carefully, with simulations across all PVT corners (not just process corners).