On-Chip Communication: Networks on Chip (NoCs)
DESCRIPTION
On-Chip Communication: Networks on Chip (NoCs). Sudeep Pasricha, Colorado State University, CS/ECE 561, Fall 2011. Outline: Introduction, NoC Topology, Switching strategies, Router Microarchitecture, Routing algorithms, Flow control schemes, Clocking schemes, QoS, NoC Architecture Examples.
TRANSCRIPT
![Page 1: On-Chip Communication: Networks on Chip (NoCs)](https://reader036.vdocument.in/reader036/viewer/2022062301/568146c4550346895db3ff3c/html5/thumbnails/1.jpg)
On-Chip Communication: Networks on Chip (NoCs)
Sudeep Pasricha, Colorado State University
CS/ECE 561 Fall 2011
![Page 2: On-Chip Communication: Networks on Chip (NoCs)](https://reader036.vdocument.in/reader036/viewer/2022062301/568146c4550346895db3ff3c/html5/thumbnails/2.jpg)
Outline
• Introduction
• NoC Topology
• Switching strategies
• Router Microarchitecture
• Routing algorithms
• Flow control schemes
• Clocking schemes
• QoS
• NoC Architecture Examples
• Status and Open Problems
![Page 3: On-Chip Communication: Networks on Chip (NoCs)](https://reader036.vdocument.in/reader036/viewer/2022062301/568146c4550346895db3ff3c/html5/thumbnails/3.jpg)
Introduction
• Evolution of on-chip communication architectures
![Page 4: On-Chip Communication: Networks on Chip (NoCs)](https://reader036.vdocument.in/reader036/viewer/2022062301/568146c4550346895db3ff3c/html5/thumbnails/4.jpg)
Introduction
• Network-on-chip (NoC) is a packet-switched on-chip communication network designed using a layered methodology
• NoCs use packets to route data from the source to the destination PE via a network fabric that consists of
  – network interfaces (NI)
  – switches (routers)
  – interconnection links (wires)
[Figure labels: registers, ALU, MEM, NI]
![Page 5: On-Chip Communication: Networks on Chip (NoCs)](https://reader036.vdocument.in/reader036/viewer/2022062301/568146c4550346895db3ff3c/html5/thumbnails/5.jpg)
Introduction
• NoCs are an attempt to scale down the concepts of large-scale networks and apply them to the system-on-chip (SoC) domain
• NoC properties
  – Reliable and predictable electrical and physical properties
  – Regular geometry that is scalable
  – Flexible QoS guarantees
  – Higher bandwidth
  – Reusable components: buffers, arbiters, routers, protocol stack
![Page 6: On-Chip Communication: Networks on Chip (NoCs)](https://reader036.vdocument.in/reader036/viewer/2022062301/568146c4550346895db3ff3c/html5/thumbnails/6.jpg)
Introduction
• ISO/OSI network protocol stack model
![Page 7: On-Chip Communication: Networks on Chip (NoCs)](https://reader036.vdocument.in/reader036/viewer/2022062301/568146c4550346895db3ff3c/html5/thumbnails/7.jpg)
Building Blocks: NI
• Session-layer (P2P) interface with nodes
• Back-end manages interface with switches
• Front end: standardized node interface @ session layer. Initiator vs. target distinction is blurred
  1. Supported transactions (e.g. QoS read…)
  2. Degree of parallelism
  3. Session prot. control flow & negotiation
• Back end: NoC-specific backend (layers 1-4)
  1. Physical channel interface
  2. Link-level protocol
  3. Network layer (packetization)
  4. Transport layer (routing)
[Figure: NI between node and switches, with a standard P2P node protocol on the node side, a proprietary link protocol on the switch side, and decoupling logic & synchronization]
![Page 8: On-Chip Communication: Networks on Chip (NoCs)](https://reader036.vdocument.in/reader036/viewer/2022062301/568146c4550346895db3ff3c/html5/thumbnails/8.jpg)
Building Blocks: Switch
• Router or switch: receives and forwards packets
• Buffers have dual function: synchronization & queueing
[Figure: switch block diagram with input buffers & control flow, allocator/arbiter, QoS & routing, crossbar, output buffers & control flow, and data ports with control flow wires]
![Page 9: On-Chip Communication: Networks on Chip (NoCs)](https://reader036.vdocument.in/reader036/viewer/2022062301/568146c4550346895db3ff3c/html5/thumbnails/9.jpg)
On-Chip vs. Off-Chip Networks
• Cost
  – Off-chip: cost is channels, huge pads, expensive connectors, cables, optics
  – On-chip: cost is Si area and power (storage!); wires are not infinite, but plentiful
• Channel characteristics
  – On-chip: wires are short, so latency is comparable with logic; huge amount of bandwidth; can put logic in links; a LOT of uncertainty (process variations) and interference (e.g., noise)
  – Off-chip: wires are long, so link latency dominates; bandwidth is precious; links are strongly decoupled
• Workload
  – On-chip: non-homogeneous traffic, much is known
  – Off-chip: very little is known, and it may change
• Design issues
  – On-chip: must fit in floorplanning (die area constraint)
  – Off-chip: dictates board/rack organization
![Page 10: On-Chip Communication: Networks on Chip (NoCs)](https://reader036.vdocument.in/reader036/viewer/2022062301/568146c4550346895db3ff3c/html5/thumbnails/10.jpg)
NoC Concepts
• Topology
  – How the nodes are connected together
• Switching
  – Allocation of network resources (bandwidth, buffer capacity, …) to information flows
• Routing
  – Path selection between a source and a destination node in a particular topology
• Flow control
  – How the downstream node communicates forwarding availability to the upstream node
![Page 11: On-Chip Communication: Networks on Chip (NoCs)](https://reader036.vdocument.in/reader036/viewer/2022062301/568146c4550346895db3ff3c/html5/thumbnails/11.jpg)
![Page 12: On-Chip Communication: Networks on Chip (NoCs)](https://reader036.vdocument.in/reader036/viewer/2022062301/568146c4550346895db3ff3c/html5/thumbnails/12.jpg)
NoC Topology
1. Direct Topologies
• each node has a direct point-to-point link to a subset of other nodes in the system, called neighboring nodes
• as the number of nodes in the system increases, the total available communication bandwidth also increases
• fundamental trade-off is between connectivity and cost
![Page 13: On-Chip Communication: Networks on Chip (NoCs)](https://reader036.vdocument.in/reader036/viewer/2022062301/568146c4550346895db3ff3c/html5/thumbnails/13.jpg)
NoC Topology
• Most direct network topologies have an orthogonal implementation, where nodes can be arranged in an n-dimensional orthogonal space
  – e.g. n-dimensional mesh, torus, folded torus, hypercube, and octagon
• 2D mesh is the most popular topology
  – all links have the same length, which eases physical design
  – area grows linearly with the number of nodes
  – must be designed in such a way as to avoid traffic accumulating in the center of the mesh
![Page 14: On-Chip Communication: Networks on Chip (NoCs)](https://reader036.vdocument.in/reader036/viewer/2022062301/568146c4550346895db3ff3c/html5/thumbnails/14.jpg)
NoC Topology
• Torus topology, also called a k-ary n-cube, is an n-dimensional grid with k nodes in each dimension
• k-ary 1-cube (1-D torus) is essentially a ring network with k nodes
  – limited scalability, as performance decreases when more nodes are added
• k-ary 2-cube (i.e., 2-D torus) topology is similar to a regular 2D mesh
  – except that nodes at the edges are connected to switches at the opposite edge via wrap-around channels
  – long end-around connections can, however, lead to excessive delays
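The effect of the wrap-around channels on distance can be illustrated with a small hop-count sketch. The function names and the coordinate-tuple convention are assumptions made for this example, not part of the original slides.

```python
def mesh_hops(src, dst):
    """Minimal hop count between two nodes of a mesh (no wrap-around links)."""
    return sum(abs(s - d) for s, d in zip(src, dst))


def torus_hops(src, dst, k):
    """Minimal hop count in a k-ary n-cube: each dimension may use its wrap-around link."""
    return sum(min(abs(s - d), k - abs(s - d)) for s, d in zip(src, dst))


# Corner-to-corner traffic in a 4x4 network (k = 4, n = 2).
print(mesh_hops((0, 0), (3, 3)))      # 6 hops on the mesh
print(torus_hops((0, 0), (3, 3), 4))  # 2 hops on the torus
```

The hop count only captures topology; it does not model the longer wire delay of the wrap-around links mentioned above.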
![Page 15: On-Chip Communication: Networks on Chip (NoCs)](https://reader036.vdocument.in/reader036/viewer/2022062301/568146c4550346895db3ff3c/html5/thumbnails/15.jpg)
NoC Topology
• Folded torus topology overcomes the long-link limitation of a 2-D torus
  – links have the same size
• Meshes and tori can be extended by adding bypass links to increase performance at the cost of higher area
![Page 16: On-Chip Communication: Networks on Chip (NoCs)](https://reader036.vdocument.in/reader036/viewer/2022062301/568146c4550346895db3ff3c/html5/thumbnails/16.jpg)
NoC Topology
• Octagon topology is another example of a direct network
  – messages being sent between any 2 nodes require at most two hops
  – more octagons can be tiled together to accommodate larger designs, by using one of the nodes as a bridge node
![Page 17: On-Chip Communication: Networks on Chip (NoCs)](https://reader036.vdocument.in/reader036/viewer/2022062301/568146c4550346895db3ff3c/html5/thumbnails/17.jpg)
NoC Topology
2. Indirect Topologies
• each node is connected to an external switch, and switches have point-to-point links to other switches
• switches do not perform any information processing, and correspondingly nodes do not perform any packet switching
• e.g. SPIN, crossbar topologies
• Fat tree topology
  – nodes are connected only to the leaves of the tree
  – more links near root, where bandwidth requirements are higher
![Page 18: On-Chip Communication: Networks on Chip (NoCs)](https://reader036.vdocument.in/reader036/viewer/2022062301/568146c4550346895db3ff3c/html5/thumbnails/18.jpg)
NoC Topology
• k-ary n-fly butterfly topology
  – blocking multi-stage network: packets may be temporarily blocked or dropped in the network if contention occurs
  – $k^n$ nodes, and $n$ stages of $k^{n-1}$ $k \times k$ crossbars
  – e.g. 2-ary 3-fly butterfly network
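The sizing rule for a k-ary n-fly can be checked with a few lines of arithmetic. The helper name below is a hypothetical one chosen for this sketch.

```python
def butterfly_size(k, n):
    """Return (terminal nodes, stages, crossbars per stage) for a k-ary n-fly."""
    nodes = k ** n
    stages = n
    switches_per_stage = k ** (n - 1)  # each switch is a k x k crossbar
    return nodes, stages, switches_per_stage


# The 2-ary 3-fly example: 8 nodes, 3 stages, 4 crossbars (2 x 2) per stage.
print(butterfly_size(2, 3))
```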
![Page 19: On-Chip Communication: Networks on Chip (NoCs)](https://reader036.vdocument.in/reader036/viewer/2022062301/568146c4550346895db3ff3c/html5/thumbnails/19.jpg)
NoC Topology
3. Irregular or ad hoc network topologies
• customized for an application
• usually a mix of shared bus, direct, and indirect network topologies
• e.g. reduced mesh, cluster-based hybrid topology
![Page 20: On-Chip Communication: Networks on Chip (NoCs)](https://reader036.vdocument.in/reader036/viewer/2022062301/568146c4550346895db3ff3c/html5/thumbnails/20.jpg)
![Page 21: On-Chip Communication: Networks on Chip (NoCs)](https://reader036.vdocument.in/reader036/viewer/2022062301/568146c4550346895db3ff3c/html5/thumbnails/21.jpg)
Messaging Units
• Data is transmitted based on a hierarchical data structuring mechanism
  – Messages → packets → flits → phits
  – flits and phits are fixed size; packets and data may be variable sized
  – a phit is the unit of data that is transferred on a link in a single cycle
  – typically, phit size = flit size
• Switching: determines “when” messaging units are moved
[Figure: a data/message is divided into packets (fields: Dest Info, Seq #, misc), packets into flits (flow control digits, each with a type field, from head flit to tail flit), and flits into phits (physical flow control digits)]
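The message → packet → flit hierarchy can be sketched as follows. The field names (`dest`, `seq`) and the choice to give every packet a dedicated head flit are assumptions for illustration; real NoCs differ in how header information is laid out.

```python
def packetize(message: bytes, dest, packet_payload: int, flit_size: int):
    """Split a message into packets, and each packet into head/body/tail flits."""
    packets = []
    for seq, start in enumerate(range(0, len(message), packet_payload)):
        payload = message[start:start + packet_payload]
        chunks = [payload[i:i + flit_size] for i in range(0, len(payload), flit_size)]
        flits = [("head", {"dest": dest, "seq": seq})]  # head flit carries routing info
        flits += [("body", c) for c in chunks[:-1]]
        flits.append(("tail", chunks[-1]))               # tail flit closes the packet
        packets.append(flits)
    return packets


# A 10-byte message, 4-byte packet payloads, 2-byte flits.
for packet in packetize(b"abcdefghij", dest=(2, 1), packet_payload=4, flit_size=2):
    print(packet)
```

Only the head flit carries the destination, which is why wormhole switching (discussed later) cannot interleave flits of different packets on one channel.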
![Page 22: On-Chip Communication: Networks on Chip (NoCs)](https://reader036.vdocument.in/reader036/viewer/2022062301/568146c4550346895db3ff3c/html5/thumbnails/22.jpg)
Circuit Switching
• Hardware path setup by a routing header or probe
• End-to-end acknowledgment initiates transfer at full hardware bandwidth
• System is limited by signaling rate along the circuits
• Routing, arbitration and switching overhead experienced once/message
[Figure: link-busy time diagram showing the header probe, acknowledgment, and data phases, annotated with t_r, t_s, t_setup, and t_data]
![Page 23: On-Chip Communication: Networks on Chip (NoCs)](https://reader036.vdocument.in/reader036/viewer/2022062301/568146c4550346895db3ff3c/html5/thumbnails/23.jpg)
Circuit Switching
[Figure: source end node, destination end node, and buffers for “request” tokens]
![Page 24: On-Chip Communication: Networks on Chip (NoCs)](https://reader036.vdocument.in/reader036/viewer/2022062301/568146c4550346895db3ff3c/html5/thumbnails/24.jpg)
Circuit Switching
• Request for circuit establishment (routing and arbitration are performed during this step)
[Figure: source end node, destination end node, and buffers for “request” tokens]
![Page 25: On-Chip Communication: Networks on Chip (NoCs)](https://reader036.vdocument.in/reader036/viewer/2022062301/568146c4550346895db3ff3c/html5/thumbnails/25.jpg)
Circuit Switching
• Request for circuit establishment
• Acknowledgment and circuit establishment (as the token travels back to the source, connections are established)
[Figure: source end node, destination end node, and buffers for “ack” tokens]
![Page 26: On-Chip Communication: Networks on Chip (NoCs)](https://reader036.vdocument.in/reader036/viewer/2022062301/568146c4550346895db3ff3c/html5/thumbnails/26.jpg)
Circuit Switching
• Request for circuit establishment
• Acknowledgment and circuit establishment
• Packet transport (neither routing nor arbitration is required)
[Figure: source end node and destination end node]
![Page 27: On-Chip Communication: Networks on Chip (NoCs)](https://reader036.vdocument.in/reader036/viewer/2022062301/568146c4550346895db3ff3c/html5/thumbnails/27.jpg)
Circuit Switching
• Request for circuit establishment
• Acknowledgment and circuit establishment
• Packet transport
• High contention, low utilization → low throughput
[Figure: source end node and destination end node; X marks a blocked request]
![Page 28: On-Chip Communication: Networks on Chip (NoCs)](https://reader036.vdocument.in/reader036/viewer/2022062301/568146c4550346895db3ff3c/html5/thumbnails/28.jpg)
Virtual Circuit Switching
• Goal: reduce the cost associated with circuit switching
• Multiple virtual circuits (channels) are multiplexed on a single physical link
• Allocate one buffer per virtual link
  – can be expensive due to the large number of shared buffers
• Allocate one buffer per physical link
  – uses time division multiplexing (TDM) to statically schedule usage
  – less expensive routers
[Figure labels: packet, flit, VC, type]
![Page 29: On-Chip Communication: Networks on Chip (NoCs)](https://reader036.vdocument.in/reader036/viewer/2022062301/568146c4550346895db3ff3c/html5/thumbnails/29.jpg)
Virtual Circuit Switching Example
• Use slots to avoid contention
• Divide up bandwidth
• Input 2 for router 1 is output 1 for router 2
[Figure: routers 1, 2, and 3, each with a network interface, and one slot table per router; each table entry names the input (e.g. i1, i3, i4) routed to an output (o1 to o4) at this slot, or “-” when no input is routed]
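A TDM slot table of this kind can be sketched in a few lines. The sketch assumes that a flit advances exactly one router per slot, so a circuit that uses slot s at its first router needs slot s+1 at the next one; the slide does not state this, and the class and function names are hypothetical. Ports are numbered from 0 here.

```python
class SlotTableRouter:
    """Router whose slot table maps (slot, output port) to the input routed there."""

    def __init__(self, name, num_slots, num_outputs):
        self.name = name
        self.table = [[None] * num_outputs for _ in range(num_slots)]


def reserve_circuit(path, start_slot, num_slots):
    """Reserve a virtual circuit along path = [(router, input_port, output_port), ...].

    Hop h uses slot (start_slot + h) % num_slots. Returns False, changing nothing,
    if any needed (slot, output) entry is already taken.
    """
    hops = [(r, (start_slot + h) % num_slots, i, o) for h, (r, i, o) in enumerate(path)]
    if any(r.table[s][o] is not None for r, s, i, o in hops):
        return False
    for r, s, i, o in hops:
        r.table[s][o] = i
    return True


r1 = SlotTableRouter("router 1", num_slots=4, num_outputs=4)
r2 = SlotTableRouter("router 2", num_slots=4, num_outputs=4)

print(reserve_circuit([(r1, 0, 1), (r2, 2, 3)], start_slot=0, num_slots=4))  # True
print(reserve_circuit([(r1, 2, 1)], start_slot=0, num_slots=4))              # False: slot taken
print(reserve_circuit([(r1, 2, 1)], start_slot=1, num_slots=4))              # True
```

Because the schedule is fixed when circuits are set up, routers need no arbitration at run time, which is the source of the "less expensive routers" claim.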
![Page 30: On-Chip Communication: Networks on Chip (NoCs)](https://reader036.vdocument.in/reader036/viewer/2022062301/568146c4550346895db3ff3c/html5/thumbnails/30.jpg)
Packet Switching
• Packets are transmitted from source and make their way independently to receiver
  – possibly along different routes and with different delays
• Zero start-up time, followed by a variable delay due to contention in routers along the packet path
• QoS guarantees are harder to make
• Three main packet switching scheme variants:
  – Store and Forward (SAF) switching
  – Virtual Cut Through (VCT) switching
  – Wormhole (WH) switching
![Page 31: On-Chip Communication: Networks on Chip (NoCs)](https://reader036.vdocument.in/reader036/viewer/2022062301/568146c4550346895db3ff3c/html5/thumbnails/31.jpg)
Packet Switching (Store and Forward)
• Routing, arbitration, switching overheads experienced for each packet
• Increased storage requirements at the nodes
• Packetization and in-order delivery requirements
• Alternative buffering schemes
  – Use of local processor memory
  – Central (to the switch) queues
[Figure: link-busy time diagram showing the message header and message data, annotated with t_r and t_packet]
![Page 32: On-Chip Communication: Networks on Chip (NoCs)](https://reader036.vdocument.in/reader036/viewer/2022062301/568146c4550346895db3ff3c/html5/thumbnails/32.jpg)
Packet Switching (Store and Forward)
• Packets are completely stored before any portion is forwarded
[Figure: source end node, destination end node, and buffers for data packets; the packet is being stored at a switch]
![Page 33: On-Chip Communication: Networks on Chip (NoCs)](https://reader036.vdocument.in/reader036/viewer/2022062301/568146c4550346895db3ff3c/html5/thumbnails/33.jpg)
Packet Switching (Store and Forward)
• Packets are completely stored before any portion is forwarded
• Requirement: buffers must be sized to hold an entire packet (MTU)
[Figure: source end node and destination end node; the packet is stored, then forwarded]
![Page 34: On-Chip Communication: Networks on Chip (NoCs)](https://reader036.vdocument.in/reader036/viewer/2022062301/568146c4550346895db3ff3c/html5/thumbnails/34.jpg)
Packet Switching (Virtual Cut Through)
• Messages cut through to the next router when feasible
• In the absence of blocking, messages are pipelined
  – Pipeline cycle time is the larger of intra-router and inter-router flow control delays
• When the header is blocked, the complete message is buffered at a switch
• High load behavior approaches that of SAF
[Figure: link-busy time diagram showing the packet header and the message packet cutting through the router, annotated with t_r, t_s, t_w, and t_blocking]
![Page 35: On-Chip Communication: Networks on Chip (NoCs)](https://reader036.vdocument.in/reader036/viewer/2022062301/568146c4550346895db3ff3c/html5/thumbnails/35.jpg)
Packet Switching (Virtual Cut Through)
• Portions of a packet may be forwarded (“cut-through”) to the next switch before the entire packet is stored at the current switch
[Figure: source end node, destination end node, routing]
![Page 36: On-Chip Communication: Networks on Chip (NoCs)](https://reader036.vdocument.in/reader036/viewer/2022062301/568146c4550346895db3ff3c/html5/thumbnails/36.jpg)
Packet Switching (Wormhole)
• Messages are pipelined, but buffer space is on the order of a few flits
• Small buffers + message pipelining → small, compact switches/routers
• Supports variable sized messages
• Messages cannot be interleaved over a channel: routing information is only associated with the header
• Base latency is equivalent to that of virtual cut-through
[Figure: link-busy time diagram showing the header flit and single flits, annotated with t_r, t_s, and t_wormhole]
![Page 37: On-Chip Communication: Networks on Chip (NoCs)](https://reader036.vdocument.in/reader036/viewer/2022062301/568146c4550346895db3ff3c/html5/thumbnails/37.jpg)
Virtual Cut Through vs. Wormhole
• Virtual Cut Through: buffers for data packets; requirement: buffers must be sized to hold an entire packet (MTU)
• Wormhole: buffers for flits; packets can be larger than buffers
[Figure: path from source end node to destination end node for each scheme]
![Page 38: On-Chip Communication: Networks on Chip (NoCs)](https://reader036.vdocument.in/reader036/viewer/2022062301/568146c4550346895db3ff3c/html5/thumbnails/38.jpg)
Virtual Cut Through vs. Wormhole
• Virtual Cut Through: buffers for data packets; requirement: buffers must be sized to hold an entire packet (MTU); on a busy link, the packet is completely stored at the switch
• Wormhole: buffers for flits; packets can be larger than buffers; on a busy link, the packet is stored along the path
[Figure: path from source end node to destination end node for each scheme, with a busy link]
![Page 39: On-Chip Communication: Networks on Chip (NoCs)](https://reader036.vdocument.in/reader036/viewer/2022062301/568146c4550346895db3ff3c/html5/thumbnails/39.jpg)
Comparison of Packet Switching Techniques
• SAF packet switching and virtual cut-through
  – Consume network bandwidth proportional to network load
  – VCT behaves like wormhole at low loads and like SAF packet switching at high loads
  – High buffer costs
• Wormhole switching
  – Provides low (unloaded) latency
  – Lower saturation point
  – Higher variance of message latency than SAF packet or VCT switching
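The difference in unloaded latency can be made concrete with a first-order, contention-free model. This is a simplified textbook-style sketch, not the timing model from the slide figures: it assumes a fixed per-hop router delay and ignores wire delay and blocking. Parameter names are illustrative.

```python
def saf_latency(hops, t_router, packet_bits, link_bw):
    """Store-and-forward: the whole packet is serialized again at every hop."""
    return hops * (t_router + packet_bits / link_bw)


def cut_through_latency(hops, t_router, packet_bits, link_bw):
    """Virtual cut-through or wormhole without blocking: serialization is paid once."""
    return hops * t_router + packet_bits / link_bw


# 4 hops, 2-cycle routers, 128-bit packet, 32 bits per cycle per link.
print(saf_latency(4, 2, 128, 32))          # 24.0 cycles
print(cut_through_latency(4, 2, 128, 32))  # 12.0 cycles
```

Under this model the SAF penalty grows with both hop count and packet length, while the pipelined schemes pay the packet length only once.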
![Page 40: On-Chip Communication: Networks on Chip (NoCs)](https://reader036.vdocument.in/reader036/viewer/2022062301/568146c4550346895db3ff3c/html5/thumbnails/40.jpg)
![Page 41: On-Chip Communication: Networks on Chip (NoCs)](https://reader036.vdocument.in/reader036/viewer/2022062301/568146c4550346895db3ff3c/html5/thumbnails/41.jpg)
Generic Router Architecture
![Page 42: On-Chip Communication: Networks on Chip (NoCs)](https://reader036.vdocument.in/reader036/viewer/2022062301/568146c4550346895db3ff3c/html5/thumbnails/42.jpg)
Pipelined Router Microarchitecture
[Figure: five-stage pipelined router. Stage 1: LT & IB (link traversal and input buffering); Stage 2: RC (routing computation); Stage 3: VCA (VC allocation); Stage 4: SA (switch allocation); Stage 5: ST & output buffering. The datapath shows physical channels, link control, input buffers, DEMUX/MUX, crossbar, and output buffers]
![Page 43: On-Chip Communication: Networks on Chip (NoCs)](https://reader036.vdocument.in/reader036/viewer/2022062301/568146c4550346895db3ff3c/html5/thumbnails/43.jpg)
Virtual Circuit Switching Router
[Figure: router with inputs 0 to 4 (input buffers holding VC 1 to VC n), route computation, VC allocator, switch allocator, and a crossbar switch driving outputs 0 to 4; router and link pipeline timelines over the stages RC, VCA, SA, ST, LT]
• RC: Route Computation; VCA: VC Allocation; SA: Switch Allocation; ST: Switch Traversal; LT: Link Traversal
• By using prediction/speculation, the pipeline can be made more compact
![Page 44: On-Chip Communication: Networks on Chip (NoCs)](https://reader036.vdocument.in/reader036/viewer/2022062301/568146c4550346895db3ff3c/html5/thumbnails/44.jpg)
Router Power Dissipation
• Used for a 6x6 mesh
• 4 stage pipeline, > 3 GHz
• Wormhole switching
Source: Partha Kundu, “On-Die Interconnects for Next-Generation CMPs”, talk at On-Chip Interconnection Networks Workshop, Dec 2006
![Page 45: On-Chip Communication: Networks on Chip (NoCs)](https://reader036.vdocument.in/reader036/viewer/2022062301/568146c4550346895db3ff3c/html5/thumbnails/45.jpg)
Head of Line (HOL) Blocking
• Can be a problem in NoC routers!
  – limits the throughput of switches to 58.6% (the classical saturation bound for FIFO input queues under uniform random traffic)
• Solution: Virtual Output Queues (VOQ)
  – input queuing strategy in which each input port maintains a separate queue for each output port
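The throughput limit can be reproduced with a small saturation simulation. The model below is an assumption-laden sketch: every input has a single FIFO that is always backlogged, destinations are uniformly random, and each output serves one randomly chosen contender per slot. For a growing number of ports the measured value approaches the 58.6% figure quoted above; small switches do somewhat better.

```python
import random


def hol_saturation_throughput(n_ports, n_slots, seed=0):
    """Fraction of output slots used by a saturated FIFO input-queued switch."""
    rng = random.Random(seed)
    # Only the destination of each input's head-of-line packet matters.
    hol = [rng.randrange(n_ports) for _ in range(n_ports)]
    delivered = 0
    for _ in range(n_slots):
        contenders = {}
        for inp, out in enumerate(hol):
            contenders.setdefault(out, []).append(inp)
        for out, inputs in contenders.items():
            winner = rng.choice(inputs)           # the output serves one input per slot
            hol[winner] = rng.randrange(n_ports)  # the next packet moves to the head
            delivered += 1
        # Losing inputs keep their blocked head-of-line packet for the next slot.
    return delivered / (n_ports * n_slots)


print(hol_saturation_throughput(32, 5000))
```

With virtual output queues a blocked packet no longer holds back packets bound for idle outputs, which is what removes this limit.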
![Page 46: On-Chip Communication: Networks on Chip (NoCs)](https://reader036.vdocument.in/reader036/viewer/2022062301/568146c4550346895db3ff3c/html5/thumbnails/46.jpg)
![Page 47: On-Chip Communication: Networks on Chip (NoCs)](https://reader036.vdocument.in/reader036/viewer/2022062301/568146c4550346895db3ff3c/html5/thumbnails/47.jpg)
Routing Algorithms
• Responsible for correctly and efficiently routing packets or circuits from the source to the destination
  – Path selection between a source and a destination node in a particular topology
• Ensure load balancing
• Latency minimization
• Flexibility w.r.t. faults in the network
• Deadlock- and livelock-free solutions
[Figure: a path from source S to destination D]
![Page 48: On-Chip Communication: Networks on Chip (NoCs)](https://reader036.vdocument.in/reader036/viewer/2022062301/568146c4550346895db3ff3c/html5/thumbnails/48.jpg)
Static vs. Dynamic Routing
• Static routing: fixed paths are used to transfer data between a particular source and destination
  – does not take into account current state of the network
• Advantages of static routing:
  – easy to implement, since very little additional router logic is required
  – in-order packet delivery if single path is used
• Dynamic routing: routing decisions are made according to the current state of the network
  – considering factors such as availability and load on links
  – path between source and destination may change over time, as traffic conditions and requirements of the application change
  – more resources needed to monitor state of the network and dynamically change routing paths
  – able to better distribute traffic in a network
![Page 49: On-Chip Communication: Networks on Chip (NoCs)](https://reader036.vdocument.in/reader036/viewer/2022062301/568146c4550346895db3ff3c/html5/thumbnails/49.jpg)
Example: Dimension-order Routing
• Static XY routing (commonly used):
  – a deadlock-free shortest-path routing which routes packets in the X-dimension first and then in the Y-dimension
• Used for tori and meshes
• Destination address expressed as absolute coordinates
• For torus, a preferred direction may have to be selected.
• For mesh, the preferred direction is the only valid direction.
[Figure: two 3x4 grids of nodes labeled 00 to 23, with the -x and +y directions marked]
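XY routing on a mesh can be sketched as below. The function name and the (x, y) tuple convention are assumptions for this example; in hardware the same decision is made hop by hop in each router rather than by computing the whole path at once.

```python
def xy_route(src, dst):
    """Return the list of nodes visited by XY routing on a 2D mesh."""
    x, y = src
    path = [(x, y)]
    while x != dst[0]:               # first correct the X coordinate
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:               # then correct the Y coordinate
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


# From node 00 to node 23 in the grid used on the slide.
print(xy_route((0, 0), (2, 3)))
```

Because a packet never turns from the Y dimension back into X, the cyclic channel dependencies that cause deadlock on a mesh cannot form.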
![Page 50: On-Chip Communication: Networks on Chip (NoCs)](https://reader036.vdocument.in/reader036/viewer/2022062301/568146c4550346895db3ff3c/html5/thumbnails/50.jpg)
Example: Adaptive Routing
• Chooses shortest path to destination with minimum congestion
• Path can change over time
  – Because congestion changes over time
[Figure: 3x4 grid of nodes labeled 00 to 23]
![Page 51: On-Chip Communication: Networks on Chip (NoCs)](https://reader036.vdocument.in/reader036/viewer/2022062301/568146c4550346895db3ff3c/html5/thumbnails/51.jpg)
Pitfall: Dynamic Routing
• A locally optimum decision may lead to a globally sub-optimal route
  – To avoid slight congestion in (01-02), packets then incur more congested links
[Figure: 3x4 grid of nodes labeled 00 to 23]
![Page 52: On-Chip Communication: Networks on Chip (NoCs)](https://reader036.vdocument.in/reader036/viewer/2022062301/568146c4550346895db3ff3c/html5/thumbnails/52.jpg)
Non-Algorithmic vs. Algorithmic
• Non-algorithmic schemes
  – Do not use any algorithm in routers to compute route direction
  – e.g. table-based routing schemes: use a simple lookup to a local table that stores route information to find where to send each flit at the input buffers
![Page 53: On-Chip Communication: Networks on Chip (NoCs)](https://reader036.vdocument.in/reader036/viewer/2022062301/568146c4550346895db3ff3c/html5/thumbnails/53.jpg)
Algorithmic routing
• For a given topology and routing algorithm, use simple algorithms to compute the route
  – usually implemented as combinational logic
  – may require info to process in packet header
  – specific for a network configuration
• E.g., 2D Torus
  – Packet header contains signed offset to destination (per dimension)
  – At each hop, update offset in a dimension
  – When x == 0 and y == 0, then at correct processor
[Figure: header fields sx, x, sy, y are each compared with 0 to produce the productive direction vector]
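The offset-based scheme for a 2D torus can be sketched as follows. The tie-breaking rule (prefer the positive direction when both ways around the ring are equally long), the X-before-Y order, and the port names are assumptions for this example.

```python
def signed_offset(src, dst, k):
    """Signed distance from src to dst on a ring of k nodes, taking the shorter way."""
    d = (dst - src) % k
    return d - k if d > k // 2 else d


def route_step(header):
    """One router's decision: return (output port, updated header) for header (dx, dy)."""
    dx, dy = header
    if dx != 0:
        step = 1 if dx > 0 else -1
        return ("+x" if dx > 0 else "-x"), (dx - step, dy)
    if dy != 0:
        step = 1 if dy > 0 else -1
        return ("+y" if dy > 0 else "-y"), (dx, dy - step)
    return "eject", header           # both offsets are zero: at the correct processor


# From (0, 0) to (3, 1) in a 4x4 torus: one wrap-around hop in -x, then one hop in +y.
header = (signed_offset(0, 3, 4), signed_offset(0, 1, 4))
ports = []
while True:
    port, header = route_step(header)
    ports.append(port)
    if port == "eject":
        break
print(ports)
```

Each router only inspects and decrements the offsets, which is why this kind of routing maps well onto simple combinational logic.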
![Page 54: On-Chip Communication: Networks on Chip (NoCs)](https://reader036.vdocument.in/reader036/viewer/2022062301/568146c4550346895db3ff3c/html5/thumbnails/54.jpg)
Distributed vs. Source Routing
• Distributed routing: each packet carries the destination address
  – e.g., XY coordinates or a number identifying the destination node/router
  – routing decisions are made in each router by looking up the destination addresses in a routing table or by executing a hardware function
• Source routing: packet carries routing information
  – pre-computed routing tables are stored at a node's NI
  – routing information is looked up at the source NI and added to the header of the packet (increasing packet size)
  – when a packet arrives at a router, the routing information is extracted from the routing field in the packet header
  – does not require a destination address in a packet, any intermediate routing tables, or functions needed to calculate the route
![Page 55: On-Chip Communication: Networks on Chip (NoCs)](https://reader036.vdocument.in/reader036/viewer/2022062301/568146c4550346895db3ff3c/html5/thumbnails/55.jpg)
Minimal vs. Non-minimal Routing
• Minimal routing: length of routing path from source to destination is the shortest possible length between the 2 nodes
  – source does not start sending a packet if minimal path is not available
• Non-minimal routing: can use longer paths if a minimal path is not available
  – by allowing non-minimal paths, the number of alternative paths is increased, which can be useful for avoiding congestion
  – disadvantage: overhead of additional power consumption
• Minimal adaptive routing is unable to avoid congested links in the absence of minimal path diversity
[Figure: 3x4 grid of nodes labeled 00 to 23]
![Page 56: On-Chip Communication: Networks on Chip (NoCs)](https://reader036.vdocument.in/reader036/viewer/2022062301/568146c4550346895db3ff3c/html5/thumbnails/56.jpg)
Ordered vs. Unordered Routing
• Ordered
  – One route per source-destination (S-D) pair
  – No traffic splitting
• Unordered
  – Potentially multiple routes per source-destination (S-D) pair
[Figure: unordered routing vs. ordered routing]
![Page 57: On-Chip Communication: Networks on Chip (NoCs)](https://reader036.vdocument.in/reader036/viewer/2022062301/568146c4550346895db3ff3c/html5/thumbnails/57.jpg)
Example: Toggle XY (TXY)
• XY is unbalanced: high cost in uniform capacity grids
• Split packets evenly between XY, YX routes
  – Deadlock avoided with 2 VCs
• Near-optimal for symmetric traffic (permutations) [Seo et al. 05; Towles & Dally 02]
• Simple; better balanced
• Split routes: packets of the same flow take different paths
  – Delays may cause out-of-order arrivals
  – Re-ordering buffers are costly
• Does not take into account the traffic pattern
• Static, minimal scheme
• Extension: stochastically bias XY, YX routes at design time
[Figure: a flow f split between the XY and YX routes on virtual channels v1 and v2, with load f/2 on each link]
![Page 58: On-Chip Communication: Networks on Chip (NoCs)](https://reader036.vdocument.in/reader036/viewer/2022062301/568146c4550346895db3ff3c/html5/thumbnails/58.jpg)
Example: Source Toggle XY
• The route is a function of source and destination ID
  – bitwise XOR
• Very simple algorithm
• Maximum capacity is similar to TXY
[Figure: grid of nodes, each labeled with its assigned route, alternating between XY and YX]
![Page 59: On-Chip Communication: Networks on Chip (NoCs)](https://reader036.vdocument.in/reader036/viewer/2022062301/568146c4550346895db3ff3c/html5/thumbnails/59.jpg)
Weighted Ordered Toggle - WOT
• Weighted Ordered Toggle (WOT)
  – Route per S-D pair is chosen at programming time
  – Each source stores a routing bit for each destination
• Objective: minimize max link capacity
• Optimal route assignment is difficult
![Page 60: On-Chip Communication: Networks on Chip (NoCs)](https://reader036.vdocument.in/reader036/viewer/2022062301/568146c4550346895db3ff3c/html5/thumbnails/60.jpg)
WOT Min-max Route Assignment
• Initial assignment: STXY
• Make changes that reduce the capacity:
  – Find most loaded link
  – Among S-D pairs sharing this link, change one that minimizes the max capacity (if possible)
• Sub-optimal
[Figure: sources S1, S2, S3 and destinations D1, D2, D3 on a grid]
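The greedy min-max step can be sketched as below. This is one possible reading of the procedure, under stated assumptions: each flow is a (source, destination, demand) triple on a 2D mesh, its routing bit selects the XY or YX route, and the initial assignment is passed in by the caller rather than derived from STXY. All names are hypothetical.

```python
from collections import Counter


def links(src, dst, order):
    """Directed links used by the XY (order='XY') or YX (order='YX') route."""
    (x, y), out = src, []
    for dim in ((0, 1) if order == "XY" else (1, 0)):
        while (x, y)[dim] != dst[dim]:
            step = 1 if dst[dim] > (x, y)[dim] else -1
            nxt = (x + step, y) if dim == 0 else (x, y + step)
            out.append(((x, y), nxt))
            x, y = nxt
    return out


def max_load(flows, bits):
    """Return (load of the most loaded link, that link)."""
    load = Counter()
    for (src, dst, demand), order in zip(flows, bits):
        for link in links(src, dst, order):
            load[link] += demand
    hot = max(load, key=load.get)
    return load[hot], hot


def wot_assign(flows, bits):
    """Greedily flip routing bits while doing so lowers the maximum link load."""
    bits = list(bits)
    while True:
        worst, hot_link = max_load(flows, bits)
        best = None
        for idx, (src, dst, _) in enumerate(flows):
            if hot_link in links(src, dst, bits[idx]):
                flipped = "YX" if bits[idx] == "XY" else "XY"
                trial = bits[:idx] + [flipped] + bits[idx + 1:]
                cand = max_load(flows, trial)[0]
                if cand < worst and (best is None or cand < best[0]):
                    best = (cand, trial)
        if best is None:             # no flip helps: a (possibly sub-optimal) local minimum
            return bits, worst
        bits = best[1]


flows = [((0, 0), (2, 2), 1), ((0, 0), (2, 1), 1)]
print(max_load(flows, ["XY", "XY"])[0])   # both flows share the first X links
print(wot_assign(flows, ["XY", "XY"]))
```

The loop stops at the first assignment that no single flip improves, which matches the "sub-optimal" caveat on the slide.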
![Page 61: On-Chip Communication: Networks on Chip (NoCs)](https://reader036.vdocument.in/reader036/viewer/2022062301/568146c4550346895db3ff3c/html5/thumbnails/61.jpg)
Deadlock-Free Routing Requirements
• Routing algorithm must ensure freedom from deadlocks
  – common in WH switching
  – e.g. cyclic dependency (shown in the slide figure)
• freedom from deadlocks can be ensured by allocating additional hardware resources or imposing restrictions on the routing
• usually the channel dependency graph (CDG) of the shared network resources is built and analyzed either statically or dynamically
![Page 62: On-Chip Communication: Networks on Chip (NoCs)](https://reader036.vdocument.in/reader036/viewer/2022062301/568146c4550346895db3ff3c/html5/thumbnails/62.jpg)
Dependencies
• When a packet holds a channel and requests another channel, there is a direct dependency between them
• Channel dependency graph D = G(C,E)
• For deterministic routing: single dependency at each node
• For adaptive routing: all requested channels produce dependencies, and dependency graph may contain cycles
[Figure: SAF & VCT dependency vs. wormhole dependency, showing the header flit and data flits]
![Page 63: On-Chip Communication: Networks on Chip (NoCs)](https://reader036.vdocument.in/reader036/viewer/2022062301/568146c4550346895db3ff3c/html5/thumbnails/63.jpg)
Breaking Cyclic Dependencies
• The configuration to the left can deadlock
• Add (virtual) channels
• We can make the channel dependency graph acyclic via routing restrictions (via the routing function)
[Figure: left, a four-node ring n0 to n3 with channels c0 to c3; right, the same ring with each channel split into virtual channels c00 to c03 and c10 to c13]
![Page 64: On-Chip Communication: Networks on Chip (NoCs)](https://reader036.vdocument.in/reader036/viewer/2022062301/568146c4550346895db3ff3c/html5/thumbnails/64.jpg)
Breaking Cyclic Dependencies
• Routing function is $c_{0i}$ when $j \le i$, $c_{1i}$ when $j > i$
  – j is hop distance from source; here we assume i = 3
• Channels $c_{00}$ and $c_{13}$ are unused
• Routing function breaks cycles in the channel dependency graph
[Figure: ring n0 to n3 with virtual channels c00 to c03 and c10 to c13, and the resulting channel dependency graph over c10, c11, c12, c01, c02, c03]
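The effect of the routing function on the channel dependency graph can be checked mechanically. The sketch below assumes a unidirectional four-node ring in which channel c_i leaves node n_i toward n_(i+1), and reads i as the index of the current node and j as the index of the destination node (as in the classic ring example); these conventions, and all function names, are assumptions for illustration.

```python
def ring_route(src, dst, n, use_virtual_channels):
    """Channels used from src to dst on a unidirectional n-node ring (i -> i+1 mod n)."""
    channels, node = [], src
    while node != dst:
        if use_virtual_channels:
            vc = 1 if dst > node else 0          # high channel c1i if j > i, else low c0i
            channels.append(f"c{vc}{node}")
        else:
            channels.append(f"c{node}")
        node = (node + 1) % n
    return channels


def dependency_graph(n, use_virtual_channels):
    """Edge held -> requested for every pair of consecutive channels on any route."""
    edges = {}
    for src in range(n):
        for dst in range(n):
            route = ring_route(src, dst, n, use_virtual_channels)
            for held, requested in zip(route, route[1:]):
                edges.setdefault(held, set()).add(requested)
    return edges


def has_cycle(edges):
    """Depth-first search for a cycle in the dependency graph."""
    state = {}                                   # channel -> "visiting" or "done"

    def visit(node):
        if state.get(node) == "visiting":
            return True
        if state.get(node) == "done":
            return False
        state[node] = "visiting"
        found = any(visit(nxt) for nxt in edges.get(node, ()))
        state[node] = "done"
        return found

    return any(visit(node) for node in list(edges))


print(has_cycle(dependency_graph(4, use_virtual_channels=False)))  # True: can deadlock
print(has_cycle(dependency_graph(4, use_virtual_channels=True)))   # False: acyclic
```

Under these assumptions the channels that appear in the graph are c01, c02, c03, c10, c11, and c12, so c00 and c13 are never used, in line with the slide.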
![Page 65: On-Chip Communication: Networks on Chip (NoCs)](https://reader036.vdocument.in/reader036/viewer/2022062301/568146c4550346895db3ff3c/html5/thumbnails/65.jpg)
Turn Model Based Routing
What is a turn?
  From one dimension to another: 90 degree turn
  To another virtual channel in the same direction: 0 degree turn
  To the reverse direction: 180 degree turn
Turns combine to form cycles
Goal: prohibit the least number of turns to break all possible cycles
Examples: west-first, north-last, negative-first
Partially adaptive schemes
[Figure: abstract cycles; turns allowed under YX routing; turns allowed under west-first routing]
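As an illustration of one of the named examples, here is a sketch of a minimal west-first routing decision for a 2D mesh. The coordinate convention (x grows to the east, y grows to the north) and the function names are assumptions. A packet that needs to go west takes all of its westward hops first; afterwards it may choose adaptively among the remaining productive directions, so no turn into the west direction is ever made.

```python
def west_first(cur, dst):
    """Allowed output directions at router cur=(x, y) for destination dst.
    Assumed convention: x grows to the east, y grows to the north."""
    (x, y), (dx, dy) = cur, dst
    if dx < x:
        return ["W"]                 # all westward hops are taken first
    allowed = []
    if dx > x:
        allowed.append("E")
    if dy > y:
        allowed.append("N")
    if dy < y:
        allowed.append("S")
    return allowed                   # empty list: packet has arrived

STEP = {"W": (-1, 0), "E": (1, 0), "N": (0, 1), "S": (0, -1)}

def walk(src, dst):
    """Follow the first allowed direction at every hop and return the moves."""
    moves, cur = [], src
    while cur != dst:
        d = west_first(cur, dst)[0]
        moves.append(d)
        cur = (cur[0] + STEP[d][0], cur[1] + STEP[d][1])
    return moves

print(west_first((1, 1), (3, 3)))    # two choices: partially adaptive
print(walk((3, 0), (0, 2)))          # westward hops come first, then north
```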
![Page 66: On-Chip Communication: Networks on Chip (NoCs)](https://reader036.vdocument.in/reader036/viewer/2022062301/568146c4550346895db3ff3c/html5/thumbnails/66.jpg)
Other Routing Algorithm Requirements
Routing algorithm must ensure freedom from livelocks
  livelocks are similar to deadlocks, except that states of the resources involved constantly change with regard to one another, without making any progress
  occurs especially when dynamic (adaptive) routing is used
  e.g. can occur in deflection (“hot potato”) routing if a packet is bounced around over and over again between routers and never reaches its destination
  livelocks can be avoided with simple priority rules
Routing algorithm must ensure freedom from starvation
  under scenarios where certain packets are prioritized during routing, some of the low priority packets never reach their intended destination
  can be avoided by using a fair routing algorithm, or reserving some bandwidth for low priority data packets
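One common way to get the fairness mentioned above is round-robin arbitration, where the most recent winner drops to the lowest priority. The sketch below is an assumption about how such an arbiter could look (the class name and interface are hypothetical); it is not taken from the slides.

```python
class RoundRobinArbiter:
    """Grants one requester per cycle; the most recent winner gets lowest priority."""

    def __init__(self, n_ports):
        self.n = n_ports
        self.last = n_ports - 1      # so that port 0 is checked first

    def grant(self, requests):
        """requests: list of bools, one per port. Returns the granted port or None."""
        for offset in range(1, self.n + 1):
            port = (self.last + offset) % self.n
            if requests[port]:
                self.last = port
                return port
        return None

arb = RoundRobinArbiter(3)
grants = [arb.grant([True, True, True]) for _ in range(6)]
print(grants)
```

Because the search always starts just past the previous winner, a port that keeps requesting is passed over at most n - 1 times before it is granted.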
![Page 67: On-Chip Communication: Networks on Chip (NoCs)](https://reader036.vdocument.in/reader036/viewer/2022062301/568146c4550346895db3ff3c/html5/thumbnails/67.jpg)
![Page 68: On-Chip Communication: Networks on Chip (NoCs)](https://reader036.vdocument.in/reader036/viewer/2022062301/568146c4550346895db3ff3c/html5/thumbnails/68.jpg)
Flow Control
Required in non-Circuit Switched networks to deal with congestion (regulate inflow of flits into NoC)
Recover from transmission errors
Commonly used schemes:
  ACK-NACK Flow control
  Credit based Flow control
  Xon/Xoff (STALL-GO) Flow Control
[Figure: “backpressure” across nodes A, B, C: a node blocks, its buffer fills, and "buffer full / don't send" propagates back toward the sender]
![Page 69: On-Chip Communication: Networks on Chip (NoCs)](https://reader036.vdocument.in/reader036/viewer/2022062301/568146c4550346895db3ff3c/html5/thumbnails/69.jpg)
Flow Control Schemes
ACK/NACK
  when flits are sent on a link, a local copy is kept in a buffer by sender
  when ACK received by sender, it deletes copy of flit from its local buffer
  when NACK is received, sender rewinds its output queue and starts resending flits, starting from the corrupted one
  implemented either end-to-end or switch-to-switch
  sender needs to have a buffer of size at least 2N + k
    N is number of buffers encountered between source and destination
    k depends on latency of logic at the sender and receiver
  fault handling support comes at cost of greater power, area overhead
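The sender-side behaviour described above can be sketched as follows. Sequence numbers, cumulative acknowledgement (an ACK or NACK for flit s implies that all earlier flits arrived intact), and the class name are assumptions made to keep the example small.

```python
from collections import deque

class AckNackSender:
    """Keeps a copy of every unacknowledged flit so it can be resent."""

    def __init__(self, flits):
        self.pending = deque(enumerate(flits))   # (sequence number, flit) not yet sent
        self.in_flight = deque()                 # local copies awaiting ACK

    def send(self):
        """Send the next flit and keep a local copy; returns None when idle."""
        if not self.pending:
            return None
        item = self.pending.popleft()
        self.in_flight.append(item)
        return item

    def on_ack(self, seq):
        """ACK: delete the copies of flit seq and of everything before it."""
        while self.in_flight and self.in_flight[0][0] <= seq:
            self.in_flight.popleft()

    def on_nack(self, seq):
        """NACK: rewind so that flit seq and all later flits are sent again."""
        self.on_ack(seq - 1)
        while self.in_flight:
            self.pending.appendleft(self.in_flight.pop())

s = AckNackSender("abcd")
first_three = [s.send() for _ in range(3)]
s.on_ack(0)          # flit 0 arrived intact, its copy is dropped
s.on_nack(1)         # flit 1 was corrupted: rewind to it
print(s.send())      # flit 1 goes out again
```

The `in_flight` queue is the buffer whose minimum size the slide gives as 2N + k; the sketch does not model that bound or any link delay.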
![Page 70: On-Chip Communication: Networks on Chip (NoCs)](https://reader036.vdocument.in/reader036/viewer/2022062301/568146c4550346895db3ff3c/html5/thumbnails/70.jpg)
Flow Control Schemes
Credit based flow control
Sender sends packets whenever credit counter is not zero
[Figure: pipelined transfer from sender to receiver; the sender's credit counter counts down from 10 to 0 while the receiver queue is not serviced]
![Page 71: On-Chip Communication: Networks on Chip (NoCs)](https://reader036.vdocument.in/reader036/viewer/2022062301/568146c4550346895db3ff3c/html5/thumbnails/71.jpg)
Flow Control Schemes
Credit based flow control
Receiver sends credits after they become available
Sender resumes injection
[Figure: pipelined transfer; the credit counter reaches 0 while the receiver queue is not serviced, the receiver returns +5 credits, and the counter continues from 5]
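The two credit slides can be replayed with a small model. It assumes zero link delay and one credit per free receiver slot; the class and method names are hypothetical.

```python
class CreditLink:
    """Sender-side credit counter paired with a receiver queue (zero link delay assumed)."""

    def __init__(self, buffer_slots):
        self.credits = buffer_slots   # one credit per free receiver slot
        self.queue = []               # receiver buffer

    def send(self, flit):
        """Inject only while the credit counter is not zero."""
        if self.credits == 0:
            return False
        self.credits -= 1
        self.queue.append(flit)
        return True

    def service(self, count):
        """Receiver drains up to count flits and returns that many credits."""
        freed = min(count, len(self.queue))
        del self.queue[:freed]
        self.credits += freed
        return freed

link = CreditLink(10)
accepted = sum(link.send(i) for i in range(12))   # receiver queue is not serviced
print(accepted, link.credits)
link.service(5)                                    # receiver returns +5 credits
print(link.credits, link.send("next"))
```

In this model the credit counter plus the queue occupancy always equals the number of buffer slots, so the receiver buffer can never overflow.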
![Page 72: On-Chip Communication: Networks on Chip (NoCs)](https://reader036.vdocument.in/reader036/viewer/2022062301/568146c4550346895db3ff3c/html5/thumbnails/72.jpg)
Flow Control Schemes
Xon/Xoff flow control
A packet is injected if control bit is in Xon
[Figure: pipelined transfer from sender to receiver; receiver queue with Xon and Xoff thresholds; sender control bit set to Xon]
![Page 73: On-Chip Communication: Networks on Chip (NoCs)](https://reader036.vdocument.in/reader036/viewer/2022062301/568146c4550346895db3ff3c/html5/thumbnails/73.jpg)
Flow Control Schemes
Xon/Xoff flow control
When Xoff threshold is reached, an Xoff notification is sent
When in Xoff, sender cannot inject packets
[Figure: pipelined transfer; the receiver queue is not serviced and fills to the Xoff threshold; sender control bit switches to Xoff]
![Page 74: On-Chip Communication: Networks on Chip (NoCs)](https://reader036.vdocument.in/reader036/viewer/2022062301/568146c4550346895db3ff3c/html5/thumbnails/74.jpg)
Flow Control Schemes
Xon/Xoff flow control
When Xon threshold is reached, an Xon notification is sent
[Figure: pipelined transfer; receiver queue with Xon and Xoff thresholds; sender control bit returns to Xon]
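A sketch of the threshold behaviour shown on the three Xon/Xoff slides. Notifications are assumed to reach the sender instantly; a real link would need headroom above the Xoff threshold for flits already in flight. Names and threshold values are illustrative assumptions.

```python
class XonXoffReceiver:
    """Receiver queue that signals Xoff at one threshold and Xon at a lower one."""

    def __init__(self, xoff_threshold, xon_threshold):
        assert xon_threshold < xoff_threshold
        self.xoff_at = xoff_threshold
        self.xon_at = xon_threshold
        self.queue = []
        self.xon = True               # control bit seen by the sender

    def accept(self, flit):
        self.queue.append(flit)
        if len(self.queue) >= self.xoff_at:
            self.xon = False          # Xoff notification

    def drain(self):
        if self.queue:
            self.queue.pop(0)
        if len(self.queue) <= self.xon_at:
            self.xon = True           # Xon notification

def sender_step(rx, flit):
    """Inject a packet only if the control bit is in Xon."""
    if rx.xon:
        rx.accept(flit)
        return True
    return False

rx = XonXoffReceiver(xoff_threshold=6, xon_threshold=2)
sent = sum(sender_step(rx, i) for i in range(10))   # queue is not serviced
print(sent, rx.xon)
for _ in range(3):
    rx.drain()
print(len(rx.queue), rx.xon)    # above the Xon threshold: still Xoff
rx.drain()
print(len(rx.queue), rx.xon)    # Xon threshold reached: sender may resume
```

The gap between the two thresholds is what keeps the control bit from toggling on every flit.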
![Page 75: On-Chip Communication: Networks on Chip (NoCs)](https://reader036.vdocument.in/reader036/viewer/2022062301/568146c4550346895db3ff3c/html5/thumbnails/75.jpg)
Comparing Credit-Based & Xon/Xoff Flow Control
Both schemes can fully utilize buffers
Restart latency is lower for credit-based schemes and therefore
  Credit-based flow control has higher average buffer occupancy at high loads
  Credit-based flow control leads to higher throughput at high loads
  Smaller inter-packet gap
Control traffic is higher for credit schemes
  Block credits can be used to tune link behavior
Buffer sizes are independent of round trip latency for credit schemes (at the expense of performance)
Credit schemes have higher information content useful for QoS schemes
![Page 76: On-Chip Communication: Networks on Chip (NoCs)](https://reader036.vdocument.in/reader036/viewer/2022062301/568146c4550346895db3ff3c/html5/thumbnails/76.jpg)
Some Commercial Examples
Hypertransport: Credit based
  Info packets are not buffered and have guaranteed reception
Infiniband: Credit based
  Link based schemes + end-to-end
Myrinet: Credit based
Ethernet: Xon/Xoff
PCI Express
  Data link layer ack/nack
  Transaction layer is credit based
![Page 77: On-Chip Communication: Networks on Chip (NoCs)](https://reader036.vdocument.in/reader036/viewer/2022062301/568146c4550346895db3ff3c/html5/thumbnails/77.jpg)
![Page 78: On-Chip Communication: Networks on Chip (NoCs)](https://reader036.vdocument.in/reader036/viewer/2022062301/568146c4550346895db3ff3c/html5/thumbnails/78.jpg)
Clocking schemes
Fully synchronous
  single global clock is distributed to synchronize entire chip
  hard to achieve in practice, due to process variations and clock skew
Mesochronous
  local clocks are derived from a global clock
  not sensitive to clock skew
  phase between clock signals in different modules may differ
    deterministic for regular topologies (e.g. mesh)
    non-deterministic for irregular topologies
  synchronizers needed between clock domains
Plesiochronous
  clock signals are produced locally
Asynchronous
  clocks do not have to be present at all
![Page 79: On-Chip Communication: Networks on Chip (NoCs)](https://reader036.vdocument.in/reader036/viewer/2022062301/568146c4550346895db3ff3c/html5/thumbnails/79.jpg)
![Page 80: On-Chip Communication: Networks on Chip (NoCs)](https://reader036.vdocument.in/reader036/viewer/2022062301/568146c4550346895db3ff3c/html5/thumbnails/80.jpg)
Quality of Service (QoS)
QoS refers to the level of commitment for packet delivery
  refers to bounds on performance (bandwidth, delay, and jitter)
Two basic categories
  best effort (BE)
    only correctness and completion of communication is guaranteed
    usually packet switched
    worst case times cannot be guaranteed
  guaranteed service (GS)
    makes a tangible guarantee on performance, in addition to basic guarantees of correctness and completion for communication
    usually (virtual) circuit switched
![Page 81: On-Chip Communication: Networks on Chip (NoCs)](https://reader036.vdocument.in/reader036/viewer/2022062301/568146c4550346895db3ff3c/html5/thumbnails/81.jpg)
![Page 82: On-Chip Communication: Networks on Chip (NoCs)](https://reader036.vdocument.in/reader036/viewer/2022062301/568146c4550346895db3ff3c/html5/thumbnails/82.jpg)
Intel’s Teraflops Research Processor
Goals:
Deliver Tera-scale performance
■ Single precision TFLOP at desktop power
■ Frequency target 5GHz
■ Bi-section B/W order of Terabits/s
■ Link bandwidth in hundreds of GB/s
Prototype two key technologies
■ On-die interconnect fabric
■ 3D stacked memory
Develop a scalable design methodology
■ Tiled design approach
■ Mesochronous clocking
■ Power-aware capability
[Figure: die photo, 21.72mm x 12.64mm, with I/O areas, PLL, and TAP; single tile 1.5mm x 2.0mm]
Technology: 65nm, 1 poly, 8 metal (Cu)
Transistors: 100 Million (full-chip), 1.2 Million (tile)
Die Area: 275mm2 (full-chip), 3mm2 (tile)
C4 bumps #: 8390
[Vangal08]
![Page 83: On-Chip Communication: Networks on Chip (NoCs)](https://reader036.vdocument.in/reader036/viewer/2022062301/568146c4550346895db3ff3c/html5/thumbnails/83.jpg)
Main Building Blocks
Special Purpose Cores
■ High performance Dual FPMACs
2D Mesh Interconnect
■ High bandwidth low latency router
■ Phase-tolerant tile to tile communication
Mesochronous Clocking
■ Modular & scalable
■ Lower power
Workload-aware Power Management
■ Sleep instructions
■ Chip voltage & freq. control
[Figure: tile block diagram: Processing Engine (PE) with 2KB data memory (DMEM), 3KB instruction memory (IMEM), 6-read, 4-write 32-entry RF, two FPMACs (multiply, add, normalize), and RIB; crossbar router with mesochronous interfaces (MSINT), 39-bit links, 40 GB/s]
![Page 84: On-Chip Communication: Networks on Chip (NoCs)](https://reader036.vdocument.in/reader036/viewer/2022062301/568146c4550346895db3ff3c/html5/thumbnails/84.jpg)
Fine-Grain Power Management
Scalable power to match workload demands
Dynamic sleep
STANDBY:
• Memory retains data
• 50% less power/tile
FULL SLEEP:
• Memories fully off
• 80% less power/tile
21 sleep regions per tile (not all shown)
[Figure: tile floorplan showing FP Engine 1, FP Engine 2, Router, Data Memory, Instruction Memory]
FP Engine 1 sleeping: 90% less power
FP Engine 2 sleeping: 90% less power
Router sleeping: 10% less power (stays on to pass traffic)
Data Memory sleeping: 57% less power
Instruction Memory sleeping: 56% less power
![Page 85: On-Chip Communication: Networks on Chip (NoCs)](https://reader036.vdocument.in/reader036/viewer/2022062301/568146c4550346895db3ff3c/html5/thumbnails/85.jpg)
Router Features
5 ports, wormhole, 5-cycle pipeline
39-bit (32 data, 6 ctrl, 1 str) bidirectional mesochronous P2P links per port
2 logical lanes each with 16 flit-buffers
Performance, area, power
  Freq 5.1GHz @ 1.2V
  102GB/s raw bandwidth
  Area 0.34mm2 (65nm)
  Power 945mW (1.2V), 470mW (1V), 98mW (0.75V)
Fine-grained clock-gating + sleep (10 regions)
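The raw bandwidth figure is consistent with counting only the 32 data bits of each link, over all five ports, at the 5.1GHz clock. This decomposition is inferred from the numbers on the slide rather than stated on it:

$$
5\ \text{ports} \times \frac{32\ \text{bits}}{8\ \text{bits/byte}} \times 5.1\ \text{GHz}
= 5 \times 4\ \text{B} \times 5.1 \times 10^{9}\ \text{s}^{-1}
= 102\ \text{GB/s}
$$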
![Page 86: On-Chip Communication: Networks on Chip (NoCs)](https://reader036.vdocument.in/reader036/viewer/2022062301/568146c4550346895db3ff3c/html5/thumbnails/86.jpg)
KAIST BONE Project
[Figure: timeline 2003 to 2007. Star-based KAIST designs: PROTONE (star topology), Slim Spider (hierarchical star), IIS (configurable), Memory Centric NoC (hierarchical star + shared memory). Mesh-based designs for comparison: RAW (MIT), 80-Tile NoC (Intel), baseband processor NoC (STMicro, et al.)]
[KimNOC07]
![Page 87: On-Chip Communication: Networks on Chip (NoCs)](https://reader036.vdocument.in/reader036/viewer/2022062301/568146c4550346895db3ff3c/html5/thumbnails/87.jpg)
On-Chip Serialization
[Figure: PU, network interface, SERDES, and X-bar switch; serialization gives a reduced link width and a reduced X-bar switch, and affects operation frequency, driver size, wire space, capacitance load, energy consumption, coupling capacitance, buffer resource, and switching energy]
→ Proper level of On-chip Serialization improves NoC performance
![Page 88: On-Chip Communication: Networks on Chip (NoCs)](https://reader036.vdocument.in/reader036/viewer/2022062301/568146c4550346895db3ff3c/html5/thumbnails/88.jpg)
Memory-Centric NoC Architecture
Overall Architecture
■ 10 RISC processors
■ 8 dual port memories
■ 4 Channel controllers
■ Hierarchical-star topology packet switching network
■ Mesochronous comm.
[Figure: block diagram: RISC0 to RISC9, Dual Port Mem. 0 to 7 (1.5 KB, ports A and B), Channel Controllers 0 to 3, a control processor (RISC), and an external memory I/F, each attached through an NI to the X-bar switches of the hierarchical star topology network-on-chip (400 MHz); links labeled 36]
![Page 89: On-Chip Communication: Networks on Chip (NoCs)](https://reader036.vdocument.in/reader036/viewer/2022062301/568146c4550346895db3ff3c/html5/thumbnails/89.jpg)
Implementation Results
Chip photograph & results
Power Breakdown
  8 Dual port Memories (905.6 mW)
  Memory Centric NoC (96.8mW)
  RISC Processor (52mW)
  10 RISCs (354 mW)
[Kim07]
![Page 90: On-Chip Communication: Networks on Chip (NoCs)](https://reader036.vdocument.in/reader036/viewer/2022062301/568146c4550346895db3ff3c/html5/thumbnails/90.jpg)
Summary of NoCs in Emerging Processors
| System | Topology | Routing | Switching | Flow ctrl |
|---|---|---|---|---|
| MIT RAW | 2D mesh (32bit) | XY DOR | WH, no VC | Credit |
| UPMC SPIN | Fat Tree (32bit) | Up*/down* | WH, no VC | Credit |
| QuickSilver ACM | H-Tree (32bit) | Up*/down* | 1-flit, no VC | Credit |
| UMass Amherst aSOC | 2D mesh | Shortest-path | Pipelined CS, no VC | Timeslot |
| Sun T1 | Crossbar (128bit) | - | - | ACK/NACK |
| Cell BE EIB | Ring (128bit) | Shortest-path | Pipelined CS, no VC | Credit |
| TRIPS (operand) | 2D mesh (109bit) | YX DOR | 1-flit, no VC | On/off |
| TRIPS (on-chip) | 2D mesh (128bit) | YX DOR | WH, 4 VCs | Credit |
| Intel SCC | 2D torus (32bit) | XY,YX DOR, odd-even TM | WH, no VC | On/off |
| TILE64 iMesh | 2D mesh (32bit) | XY DOR | WH, no VC | Credit |
| Intel 80-core NoC | 2-D mesh (32bit) | Source routing | WH, 2 lanes | On/off |
![Page 91: On-Chip Communication: Networks on Chip (NoCs)](https://reader036.vdocument.in/reader036/viewer/2022062301/568146c4550346895db3ff3c/html5/thumbnails/91.jpg)
![Page 92: On-Chip Communication: Networks on Chip (NoCs)](https://reader036.vdocument.in/reader036/viewer/2022062301/568146c4550346895db3ff3c/html5/thumbnails/92.jpg)
Status and Open Problems
Power
  complex NI and switching/routing logic blocks are power hungry
  several times greater than for current bus-based approaches
Latency
  additional delay to packetize/de-packetize data at NIs
  flow/congestion control and fault tolerance protocol overheads
  delays at the numerous switching stages encountered by packets
  even circuit switching has overhead (e.g. SOCBUS)
  lags behind what can be achieved with bus-based/dedicated wiring
Lack of tools and benchmarks
Simulation speed
  GHz clock frequencies, large network complexity, greater number of PEs slow down simulation
![Page 93: On-Chip Communication: Networks on Chip (NoCs)](https://reader036.vdocument.in/reader036/viewer/2022062301/568146c4550346895db3ff3c/html5/thumbnails/93.jpg)
Trends
NoCs have high latency and power dissipation
What if we used photonic interconnects on chip?
![Page 94: On-Chip Communication: Networks on Chip (NoCs)](https://reader036.vdocument.in/reader036/viewer/2022062301/568146c4550346895db3ff3c/html5/thumbnails/94.jpg)
Trends
Hybrid electro-photonic NoCs
[Figure: normalized power consumption breakdown (thermal tuning, optical components, laser power, electrical links, buffer, crossbar and routing, static) for Torus, Mesh, Firefly, Corona, Optical Mesh, and METEOR]
[Figure: normalized latency, power, throughput, and energy-delay across benchmarks (Ft, Rd, Ch, Ry, Fm, Br, Lu) for Electrical Torus, Electrical Mesh, Corona, Firefly, Hybrid Optical Torus, and METEOR]
[Figure: electrical area (mm2) for 8x8 and 12x12; optical area (mm2) for 8x8 and 12x12 with 128 and 256 waveguides, for Corona, Firefly, Hybrid Optical Torus, and METEOR]
S. Bahirat, S. Pasricha, "METEOR: Hybrid Photonic Ring-Mesh Network-on-Chip for Multicore Architectures", ACM JETC (under review)
S. Bahirat, S. Pasricha, "UC-PHOTON: A Novel Hybrid Photonic Network-on-Chip for Multiple Use-Case Applications", IEEE ISQED 2010 (Best Paper Award)
S. Pasricha, S. Bahirat, "OPAL: A Multi-Layer Hybrid Photonic NoC for 3D ICs", IEEE/ACM ASPDAC 2011
![Page 95: On-Chip Communication: Networks on Chip (NoCs)](https://reader036.vdocument.in/reader036/viewer/2022062301/568146c4550346895db3ff3c/html5/thumbnails/95.jpg)
Trends
What if we use carbon nanotube interconnects on a chip?
■ vs. Cu: Have better conductivity, highly resistant to electromigration and other sources of physical breakdown, can support high current densities
■ Options: SWCNTs, MWCNTs, SWCNT bundle, mixed bundle
■ Exploration results: [Pasricha et al. NanoArch '08, VLSID '09, TVLSI '10]
[Figure: SWCNT (metallic, semiconducting), MWCNT, SWCNT bundle, mixed bundle]
![Page 96: On-Chip Communication: Networks on Chip (NoCs)](https://reader036.vdocument.in/reader036/viewer/2022062301/568146c4550346895db3ff3c/html5/thumbnails/96.jpg)
Summary: Trends
Move towards hybrid interconnection fabrics
  NoC-bus based
  Custom, heterogeneous topologies
New interconnect paradigms
  Optical
  Wireless
  Carbon nanotube
See my research, pubs page for more details
http://www.engr.colostate.edu/~sudeep/pubs/pubs.htm
![Page 97: On-Chip Communication: Networks on Chip (NoCs)](https://reader036.vdocument.in/reader036/viewer/2022062301/568146c4550346895db3ff3c/html5/thumbnails/97.jpg)
Books