SMART: A Single-Cycle Reconfigurable NoC for SoC Applications

-Jyoti Wadhwani

Chia-Hsin Owen Chen, Sunghyun Park, Tushar Krishna, Suvinay Subramanian, Anantha P. Chandrakasan, Li-Shiuan Peh
Department of Electrical Engineering and Computer Science, MIT, Cambridge

Upload: derek-adams

Post on 30-Dec-2015



ECE 284 Spring 2013 2

Evolution of on-chip systems


Challenges with this evolution

Scaling “compute” is possible: Moore’s Law
What about the communication network?


More “hops” are bad

At each hop (router):
• Latency
• Power

At system level:
• Delayed responses
• Delayed injection of fresh requests
• Overall shutdown
• Increased power budget


Motivation

• NoCs should deliver
  • Low latency
  • High bandwidth
  with low power and area overhead
• Signaling at low-voltage swing can lower energy consumption and propagation delay
• Wire delay is much shorter than a typical router cycle time
  • Can traverse multiple hops in a single cycle by bypassing buffering & arbitration at the routers
• Wires can be driven multiple mm within a cycle using repeaters
• Number of hops in a cycle depends on the repeater circuit and wire parasitics

[Figure: link segments labeled 1mm]

Router cycle time = 500ps for a 2GHz clock
Full-swing repeated wire delay ~ 100ps/mm
By bypassing the buffers, we can traverse 5mm in 1 clock cycle!
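
The single-cycle reach quoted on this slide is plain arithmetic: divide the clock period by the per-millimetre wire delay. A minimal Python sketch, assuming the slide's figures (2GHz clock, ~100ps/mm full-swing repeated wire) and assuming the whole clock period is available for wire traversal; the function names are illustrative and do not come from the paper:

```python
import math


def cycle_time_ps(freq_ghz: float) -> float:
    """Clock period in picoseconds for a clock frequency given in GHz."""
    return 1000.0 / freq_ghz


def reach_mm_per_cycle(freq_ghz: float, wire_delay_ps_per_mm: float) -> int:
    """Whole millimetres of repeated wire a signal can cross in one cycle.

    Assumption: no part of the cycle is spent on muxes, setup time or skew.
    """
    return math.floor(cycle_time_ps(freq_ghz) / wire_delay_ps_per_mm)


print(cycle_time_ps(2.0))              # 500.0
print(reach_mm_per_cycle(2.0, 100.0))  # 5
```

With 1mm between routers, the reach in millimetres is also the number of hops that could be covered in one cycle.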


Approaches to reduce on-chip latency

• Application-specific topology reconfiguration needed
  • To bypass the buffering and arbitration at routers
• Topology can be reconfigured to match application-specific communication patterns at
  • Design time
    • Requires knowledge of all applications and their communication graphs at design time
    • Overhead: wiring density to support dedicated links
  • Runtime
    • Computation of contention-free routes allowing flits to bypass the queues

This paper performs online reconfiguration of network routers at runtime, to enable different applications to run on tailored topologies


SMART LINK

• Voltage lock repeater (VLR): asynchronous low-swing repeater circuit
  • For single-cycle multi-hop link traversal
  • Low-swing link stretches the maximum distance spanned by a repeated link in a single clock cycle
• For transmitting 5.5Gb/s data with BER less than [value missing], power consumption for
  • Full-swing repeater is 4.21mW
  • VLR is 3.78mW
• Delay of the link with
  • Full-swing repeaters is 100ps/mm
  • VLRs is 60ps/mm

Node X is voltage-locked to swing near the threshold voltage of INV1x without a decrease in drive current
Low-swing voltage level is determined by transistor sizes and link wire impedance; simulations performed across process corners
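
The power and delay figures on this slide can be turned into relative savings with a few lines of arithmetic. A rough Python sketch, using only the numbers quoted above; treating the power figures as measured at the 5.5Gb/s data rate (so that power divided by data rate gives energy per bit) is an assumption, and all names are illustrative:

```python
DATA_RATE_GBPS = 5.5  # data rate quoted on the slide

# Figures from the slide: power in mW, link delay in ps/mm.
REPEATERS = {
    "full_swing": {"power_mw": 4.21, "delay_ps_per_mm": 100.0},
    "vlr": {"power_mw": 3.78, "delay_ps_per_mm": 60.0},
}


def energy_pj_per_bit(power_mw: float, rate_gbps: float = DATA_RATE_GBPS) -> float:
    """Energy per bit in pJ, since mW / (Gb/s) = pJ/bit."""
    return power_mw / rate_gbps


def relative_saving(baseline: float, improved: float) -> float:
    """Fraction by which `improved` is lower than `baseline`."""
    return (baseline - improved) / baseline


def reach_mm(cycle_ps: float, delay_ps_per_mm: float) -> float:
    """Distance covered in one clock period, ignoring any other overhead."""
    return cycle_ps / delay_ps_per_mm


fs, vlr = REPEATERS["full_swing"], REPEATERS["vlr"]
print(f"power saving: {relative_saving(fs['power_mw'], vlr['power_mw']):.1%}")
print(f"delay saving: {relative_saving(fs['delay_ps_per_mm'], vlr['delay_ps_per_mm']):.1%}")
print(f"VLR energy:   {energy_pj_per_bit(vlr['power_mw']):.3f} pJ/bit")
print(f"VLR reach at 500ps: {reach_mm(500.0, vlr['delay_ps_per_mm']):.2f} mm")
```

At a 500ps cycle the VLR delay of 60ps/mm gives a reach of just over 8mm, which lines up with the 8mm figure in the Results.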


SMART Router Microarchitecture

SMART Crossbar

• If the MUX is preset to connect the incoming link to the crossbar, the bypass path is enabled
• If the MUX is set to connect the input port buffer to the crossbar, the bypass path is disabled
• The bypass path is disabled when the same output port is shared by multiple input ports
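
The preset rule on this slide can be sketched as a per-router check: an input port keeps its bypass path only if the output port it targets is not also targeted by another input port. The representation below (flows through one router as `(input_port, output_port)` pairs, with port names such as `"W"` or `"E"`) is a hypothetical one chosen for illustration, and the sketch covers only the output-port sharing rule stated above:

```python
from collections import defaultdict


def preset_bypass(connections):
    """Decide, per input port of one router, whether its bypass can be preset.

    connections: list of (input_port, output_port) pairs, one per flow that
    passes through this router (hypothetical representation).
    Returns {input_port: True if the bypass path is enabled,
                         False if flits must go through the input buffer}.
    """
    inputs_per_output = defaultdict(set)
    for in_port, out_port in connections:
        inputs_per_output[out_port].add(in_port)

    bypass = {}
    for in_port, out_port in connections:
        shared = len(inputs_per_output[out_port]) > 1
        # Once an input port needs buffering for one flow, it stays buffered.
        bypass[in_port] = bypass.get(in_port, True) and not shared
    return bypass


# West and South both target East, so both are buffered; North bypasses.
print(preset_bypass([("W", "E"), ("S", "E"), ("N", "L")]))
# {'W': False, 'S': False, 'N': True}
```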


SMART Flow

The green and purple flows do not overlap with each other, so they traverse from the source to the destination router in a single clock cycle

The red and blue flows overlap, so they need to be stopped at routers 9 and 10 to arbitrate for the shared crossbar ports

Reverse credit mesh network: keeps track of the free VCs at the endpoint of an arbitrary SMART route

For the blue flow, routers 3, 7 and 11 forward credits from NIC3 to router 10’s East output port

The VC queue of a router keeps track of the VCs at the input port of a router multiple hops away, and not just the neighbor
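
The credit mechanism described on this slide can be sketched as two small pieces: the list of routers that relay a credit back along a SMART route, and a free-VC counter held at the start of the route for an endpoint several hops away. In the Python sketch below, the route `[10, 11, 7, 3]` is inferred from the blue-flow example (credits forwarded by 3, 7 and 11 to router 10), and the class name and VC count are hypothetical:

```python
def credit_forwarders(smart_route):
    """Routers that relay credits back along one SMART route.

    smart_route: routers from the one where the flit was last buffered
    (start of the single-cycle segment) to the last router before the
    destination NIC. Credits travel in the reverse direction.
    Returns (forwarding routers in credit order, router receiving the credit).
    """
    start, *rest = smart_route
    return list(reversed(rest)), start


class RemoteVcTracker:
    """Free-VC count for the endpoint of a SMART route, kept at the output
    port of the router where the route starts (illustrative model)."""

    def __init__(self, num_vcs):
        self.free = num_vcs

    def can_send(self):
        return self.free > 0

    def on_send(self):
        if self.free == 0:
            raise RuntimeError("no free VC at the route endpoint")
        self.free -= 1

    def on_credit(self):
        self.free += 1


print(credit_forwarders([10, 11, 7, 3]))  # ([3, 7, 11], 10)

east_port = RemoteVcTracker(num_vcs=2)    # VC count chosen for illustration
east_port.on_send()
east_port.on_send()
print(east_port.can_send())               # False until a credit returns
east_port.on_credit()
print(east_port.can_send())               # True
```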


Results

• SMART is compared against two baselines:
  • Mesh:
    • No reconfiguration
    • Each hop takes 3 cycles in the router and 1 cycle in the link
  • Dedicated:
    • 1-cycle dedicated links tailored to each application
• At 2GHz, SMART NoC can traverse 8mm within a single clock cycle, i.e. 8 hops with 1mm cores
• SMART is 1.5 cycles off in performance from the Dedicated baseline
  • when one core acts as a source and another acts as a sink for most of the flows
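
A back-of-the-envelope latency model makes the gap between the Mesh baseline and SMART concrete. The Python sketch below uses the per-hop costs from this slide (3 router cycles plus 1 link cycle for Mesh, up to 8 hops per cycle for SMART). Charging each intermediate SMART stop the same 3 router cycles as a Mesh hop is an assumption made for illustration, not a figure from the slides, and NIC injection and ejection are ignored:

```python
MESH_ROUTER_CYCLES = 3        # per hop, Mesh baseline
MESH_LINK_CYCLES = 1          # per hop, Mesh baseline
SMART_MAX_HOPS_PER_CYCLE = 8  # 8mm at 2GHz with 1mm cores


def mesh_cycles(hops):
    """Network latency of the Mesh baseline for a path of `hops` hops."""
    return hops * (MESH_ROUTER_CYCLES + MESH_LINK_CYCLES)


def smart_cycles(segment_hops, stop_cycles=MESH_ROUTER_CYCLES):
    """Network latency of a SMART path split into bypassed segments.

    segment_hops: hop count of each segment between consecutive stops,
    e.g. [6] for a contention-free 6-hop path, [2, 1, 3] for two stops.
    Assumption: each segment takes one cycle and each intermediate stop
    costs `stop_cycles` router cycles.
    """
    for hops in segment_hops:
        if not 1 <= hops <= SMART_MAX_HOPS_PER_CYCLE:
            raise ValueError("a segment must span 1 to 8 hops")
    stops = len(segment_hops) - 1
    return len(segment_hops) + stops * stop_cycles


print(mesh_cycles(6))             # 24
print(smart_cycles([6]))          # 1
print(smart_cycles([2, 1, 3]))    # 9
```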


Results

• Benefits of SMART are seen more when certain tasks are tied to specific cores, resulting in longer paths
• SMART NoC gives 60% latency savings and 2.2X power savings compared to the Mesh
• Power savings are due to bypassing of buffers, low-voltage signaling and clock gating at the routers
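
The two headline numbers are quoted in different forms (a percentage and a ratio), so it can help to put them on the same footing. A small Python sketch, assuming "2.2X power savings" means Mesh power divided by SMART power equals 2.2; the 20-cycle Mesh latency in the usage lines is a made-up input, not a measured result:

```python
def latency_after_saving(mesh_latency, saving_fraction=0.60):
    """Latency left after a fractional saving relative to the Mesh."""
    return mesh_latency * (1.0 - saving_fraction)


def power_reduction_from_ratio(ratio=2.2):
    """Fractional power reduction implied by an 'N x' saving,
    read here as mesh_power / smart_power = N (an assumption)."""
    return 1.0 - 1.0 / ratio


print(f"{power_reduction_from_ratio():.1%} lower power than the Mesh")
print(f"{latency_after_saving(20.0):.1f} cycles for a hypothetical 20-cycle Mesh path")
```

Under that reading, a 2.2X saving corresponds to a little over half of the Mesh power being eliminated.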


Conclusion

• The paper proposes
  • an NoC architecture that reconfigures and tailors a generic mesh topology for SoC applications at runtime
  • a low-swing clockless repeated link circuit embedded within router crossbars that allows packets to bypass all the way from source to destination core within a single clock cycle


Critiques/Comments

• Unlike gate delay, wire delay does not scale as transistors shrink.

• In multi-mode designs (operating at different voltage levels), and with wire resistance increasing as temperature rises, the repeater circuit needs careful transistor sizing, validated by simulating across all PVT corners (not just process corners).


THANK YOU