Network-on-Chip (1/2)

Ben Abdallah Abderazek
The University of Aizu
E-mail: [email protected]

Hong Kong University of Science and Technology, March 2013



Part 1
- Application requirements
- NoC: A paradigm shift in VLSI design
- Critical problems addressed by NoC
- Traffic abstractions
- Data abstraction
- Network delay modeling

Application Requirements
- Signal processing: hard real time, very regular load, high quality. Typically on DSPs.
- Media processing: hard real time, irregular load, high quality. Typically on SoC/media processors.
- Multimedia: soft real time, irregular load, limited quality. Typically on PC/desktop.
Very challenging.

Packet Processing in Future Internet
- Future Internet: more packets and complex packet processing
- General-purpose processors
- ASIC (large, expensive to develop, not flexible)
- Multicore System-on-Chip (MCSoC): high processing power, supports wire speed, programmable, scalable, wide applications in networking areas

Telecommunication Systems & NoC
- The trend nowadays is to integrate telecommunication systems on complex MCSoCs: network processors, multimedia hubs, and base-band telecom circuits.
- These applications have tight time-to-market and performance constraints.

Typical NP (Network Processor)
[Figure]

Examples of NP Applications
[Figure]

Telecommunication System Requirements
A typical telecommunication system is composed of 4 types of components:
- Software tasks
- Processors executing software
- Specific hardware cores
- Global on-chip communication network. This is one of the most challenging issues.

Technology & Architecture Trends
Technology trends:
- Vast transistor budgets
- Relatively poor interconnect scaling
- Need to manage complexity and power
- Build flexible designs (multi-/general-purpose)

Architectural trends:
- Go parallel!

Communication Reliability
Information transfer is inherently unreliable at the electrical level, due to:
- Timing errors
- Cross-talk
- Electro-magnetic interference (EMI)
- Soft errors
The problem will get increasingly worse as technology scales down.

Wire Delay vs. Logic Delay

Operation                               Delay (0.13 µm)   Delay (0.05 µm)
32-bit ALU operation                    650 ps            250 ps
32-bit register read                    325 ps            125 ps
Read 32 bits from 8 KB RAM              780 ps            300 ps
Transfer 32 bits across chip (10 mm)    1400 ps           2300 ps
Transfer 32 bits across chip (20 mm)    2800 ps           4600 ps

The ratio of global on-chip communication delay to operation delay is about 2:1 at 0.13 µm (1400 ps vs. 650 ps) and grows to about 9:1 in 2010 (2300 ps vs. 250 ps).
Ref: W.J. Dally, HPCA panel presentation, 2002.

On-chip Interconnection Types
- Point-to-point. [Figure: many modules (processors P, memories M, I/O) wired directly to one another.] Issues: leakage power, thermal power, noise.
- Shared bus. [Figure: processors, memories and I/O blocks on one bus; most modules are marked "Wait".]
- Hierarchical bus. [Figure: two buses joined by a bridge; several modules are marked "Wait".]
- Bus matrix. [Figure: processors, memories and I/O on a bus matrix; two modules are marked "Wait".]
- Network-on-Chip -> our main topic in this lecture. [Figure labels: processing element, network interface, router, input buffers, unidirectional links.]

Let us summarize all on-chip interconnection types.

Ref.: Abderazek Ben Abdallah, Multicore Systems-on-Chip: Practical Hardware/Software Design, 2nd Edition, Springer, 2013, ISBN-13: 978-9491216916.

Traditional SoC Nightmare
- Variety of dedicated interfaces
- Design and verification complexity
- Unpredictable performance
- Many underutilized wires
[Figure: DMA, CPU and DSP on a CPU bus; a bridge to a peripheral bus with IO blocks; modules A, B and C; control signals.]

NoC: A Paradigm Shift in VLSI
- From: dedicated signal wires
- To: shared network
[Figure: computing modules connected through network switches (s) by point-to-point links.]
[Figure: example NoC in which network interfaces (NI) attach CPU, DSP, DRAM, DMA, MPEG, Ethernet, accelerator, coprocessor and I/O blocks to the switches.]

NoC Essential
- Communication by packets of bits
- Routing of packets through several hops, via switches
- Efficient sharing of wires
- Parallelism

NoC Operation Example
1) CPU request
2) Packetization and transmission
3) Routing
4) Receipt and unpacketization (AHB, OCP, ... pinout)
5) Device response
6) Packetization and transmission
7) Routing
8) Receipt and unpacketization
[Figure: CPU, network interface, switches, network interface, I/O.]

Characteristics of a Paradigm Shift
- Solves a critical problem
- Step-up in abstraction
- Design is affected:
  - Design becomes more restricted
  - New tools
- The changes enable higher complexity and capacity
- Jump in design productivity
We will look at the problem addressed by NoC.

Don't we already know how to design interconnection networks?
- Many existing network topologies, router designs and theory have already been developed for high-end supercomputers and telecom switches.

- Yes, and we'll cover some of this material, but the trade-offs on-chip lead to very different designs!

Critical problems addressed by NoC
1) Global interconnect design problem: delay, power, noise, scalability, reliability
2) System integration productivity problem
3) Multicore processors: key to power-efficient computing

1(a): NoC and Global Wire Delay
- Long wire delay is dominated by resistance
- Add repeaters
- Repeaters become latches (with clock frequency scaling)
- Latches evolve to NoC routers

1(b): Wire Design for NoC
NoC links:
- Regular
- Point-to-point (no fanout tree)
- Can use transmission-line layout
- Well-defined current return path
- Can be optimized for noise/speed/power: low swing, current mode, ...

1(c): NoC Scalability
For the same performance, compare the wire area and power:

Architecture      Wire area       Power
NoC               O(n)            O(n)
Point-to-point    O(n^2 √n)       O(n √n)
Simple bus        O(n^3 √n)       O(n √n)
Segmented bus     O(n^2 √n)       O(n √n)

1(d): NoC and Communication Reliability
- Fault tolerance & error correction
[Figure: routers connected through micro-modems (UMODEM). Micro-modem blocks: error correction, synchronization, ISI reduction, parallel-to-serial convertor, modulation, link interface, between the input buffer and the interconnect.]
Ref.: A. Morgenshtein, E. Bolotin, I. Cidon, A. Kolodny, R. Ginosar, "Micro-modem reliability solution for NOC communications", ICECS 2004.

1(e): NoC and GALS
- Modules in NoC use different clocks
- May use different supply voltages
- NoC can handle synchronization
- NoC design may be asynchronous: no waste of power when the links and routers are idle

2: NoC and Engineering Productivity
- NoC eliminates ad-hoc global wire engineering
- NoC separates computation from communication
- NoC is a complete platform for system integration, debugging and testing

3: NoC and Multicore
- Uniprocessors cannot provide power-efficient performance growth
  - Interconnect dominates dynamic power
  - Global wire delay doesn't scale
  - Instruction-level parallelism (ILP) is limited
- Power-efficiency requires many parallel local computations
  - Chip Multi Processors (CMP)
  - Thread-Level Parallelism (TLP)
- Network is a natural choice for CMP!
[Figure labels: Gate, Interconnect, Diff.]
[Figure: uniprocessor performance vs. die area (or power).]

Network is a natural choice for multicore.

Why now is the time for NoC?
- Difficulty of DSM wire design
- Productivity pressure
- Multicore

Layers of Abstraction in Network Modeling
- Software layers: application, OS
- Network & transport layers
  - Network topology: e.g. crossbar, ring, mesh, torus, fat tree, ...
  - Switching: circuit / packet switching (SAF, VCT), wormhole
  - Addressing: logical/physical, source/destination, flow, transaction
  - Routing: static/dynamic, distributed/source, deadlock avoidance
  - Quality of Service: e.g. guaranteed-throughput, best-effort
  - Congestion control, end-to-end flow control
- Data link layer
  - Flow control
  - Handling of contention
  - Correction of transmission errors
- Physical layer: wires, drivers, receivers, repeaters, signaling, circuits, ...

How to Select Architecture?
- Architecture choices depend on system needs.
- A large range of solutions!
[Figure: ASIC, FPGA, ASSP and CMP/multicore arranged by reconfiguration rate (at design time, at boot time, during run time) and flexibility (from single application to general purpose or embedded systems).]

Perspective 1: NoC vs. Bus
NoC:
- Aggregate bandwidth grows
- Link speed unaffected by N
- Concurrent spatial reuse
- Pipelining is built-in
- Distributed arbitration
- Separate abstraction layers
However:
- No performance guarantee
- Extra delay in routers
- Area and power overhead?
- Modules need NI
- Unfamiliar methodology
Bus:
- Bandwidth is shared
- Speed goes down as N grows
- No concurrency
- Pipelining is tough
- Central arbitration
- No layers of abstraction (communication and computation are coupled)
However:
- Fairly simple and familiar

Perspective 2: NoC vs. Off-chip Networks
Off-chip networks:
- Cost is in the links
- Latency is tolerable
- Traffic/applications unknown
- Changes at runtime
- Adherence to networking standards
NoC:
- Sensitive to cost: area, power
- Wires are relatively cheap
- Latency is critical
- Traffic may be known a priori
- Design-time specialization
- Custom NoCs are possible

VLSI CAD Problems
- Application mapping
- Floorplanning (placement)
- Routing
- Buffer sizing
- Timing closure
- Simulation
- Testing

VLSI CAD Problems in NoC
- Application mapping (map tasks to cores)
- Floorplanning (within the network)
- Routing (of messages)
- Buffer sizing (size of FIFO queues in the routers)
- Simulation (network simulation, traffic/delay/power modeling)
- Other NoC design problems (topology synthesis, switching, virtual channels, arbitration, flow control, ...)

Traffic Abstractions
- Traffic models are generally captured from actual traces of functional simulation.
- A statistical distribution is often assumed for messages.
[Figure: traffic graph among PE1 to PE12.]

Data Abstractions
[Figure: message -> packets -> flits (head flit, body flits, tail flit) -> phits. Field labels: type, sequence #, VC, data.]

Typical NoC Design Flow
- Determine routing and adjust link capacities

Timing Closure in NoC
[Flowchart: define inter-module traffic -> place modules -> increase link capacities -> QoS satisfied? (No: loop back; Yes: finish).]
- Too low capacity results in poor QoS
- Too high capacity wastes area
- Uniform link capacities are a waste in ASIP systems

NoC Design Requirements
- High-performance interconnect: high throughput, latency, power, area
- Complex functionality: support for virtual channels, QoS, synchronization
- Reliability, high throughput, low latency

Break + Questions

Part II: NoC Building Blocks
- Topology
- Routing Algorithms
- Routing Mechanisms
- Control Flow
- Network Interface
- Router Architecture

NoC Topology
- NoC topology is the connection map between PEs.
- Mainly adopted from large-scale networks and parallel computing
- A good topology fulfills the requirements of the traffic at reasonable cost
- Topology classifications: direct topologies, indirect topologies

Ring: The ring is the least complex topology, where router nodes are connected in a circular fashion, but scalability is poor due to limited connectivity.

Mesh: Meshes have a relatively large average network distance, which can increase power consumption. Moreover, concentrated areas of high traffic within the grid, called hotspots, may occur.

Torus: The 2-D torus is similar to a mesh, but extended with wrap-around links to reduce average network distance. A two-dimensional torus has a similar area footprint to the mesh, with the overhead of the added links.

Irregular: An irregular topology can be derived by altering the connectivity of a regular topology (i.e., one of the topologies described thus far) to form hybrid, cluster-based, or asymmetric topologies.

Direct Topology: Mesh
[Figure: 2-D mesh of PEs, each attached to a router (R).]

The main problem with the mesh topology is its long diameter, which has a negative effect on communication latency.

Torus topology was proposed to reduce the latency of mesh and keep its simplicity.

The only difference between torus and mesh topologies is that the switches on the edges are connected to the switches on the opposite edges through wrap-around channels.
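The effect of the wrap-around channels on diameter and average distance can be checked numerically. The following is only a sketch under stated assumptions: a square k x k network, one PE per router, and minimal-hop distance as the metric. The function names (`hop_distance`, `diameter_and_average`) are hypothetical and not taken from the lecture.

```python
from itertools import product


def hop_distance(a, b, k, wrap):
    """Minimal hop count between nodes a and b of a k x k network.

    wrap=False models a mesh, wrap=True models a torus (assumption:
    wrap-around channels exist in both dimensions).
    """
    total = 0
    for ai, bi in zip(a, b):
        d = abs(ai - bi)
        if wrap:
            # With wrap-around links the packet may go the short way round.
            d = min(d, k - d)
        total += d
    return total


def diameter_and_average(k, wrap):
    """Return (diameter, average distance) over all pairs of distinct nodes."""
    nodes = list(product(range(k), repeat=2))
    dists = [hop_distance(a, b, k, wrap)
             for a in nodes for b in nodes if a != b]
    return max(dists), sum(dists) / len(dists)


mesh_diameter, mesh_avg = diameter_and_average(4, wrap=False)
torus_diameter, torus_avg = diameter_and_average(4, wrap=True)
print("4x4 mesh :", mesh_diameter, round(mesh_avg, 3))
print("4x4 torus:", torus_diameter, round(torus_avg, 3))
```

For a 4x4 network the torus diameter is 4 hops against 6 for the mesh, and the average distance is lower as well, which is the latency argument made above.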

Direct Topology: Torus
[Figure: 2-D torus of PEs and routers (R) with wrap-around channels.]

Direct Topology: Folded Torus
[Figure: torus of PEs and routers (R) and its folded layout.]

Direct Topology: Octagon
[Figure: octagon of eight PEs with a switch (SW).]

Indirect Topology: Fat Tree
[Figure: eight PEs connected through a tree of switches (SW).]

Indirect Topology: k-ary n-fly butterfly network
[Figure: PEs connected through stages of switches (SW).]

Indirect Topology: (m, n, r) symmetric Clos network
[Figure: Clos network of switches (SW).]

How to Select a Topology?
- The application decides the topology type
- If PEs = a few tens: mesh is recommended
- If PEs = 100 or more: hierarchical star is recommended
- Some topologies are better for certain designs than others

NoC Routing
- The routing algorithm determines the path(s) from source to destination.
- Routing must prevent deadlock, livelock, and starvation.

Routing Deadlock
- Without routing restrictions, a resource cycle can occur, which leads to deadlock.

Deadlock, Livelock, and Starvation
Deadlock: A packet does not reach its destination, because it is blocked at some intermediate resource.

Livelock: A packet does not reach its destination, because it enters a cyclic path.

Starvation: A packet does not reach its destination, because some resource does not grant access (while it grants access to other packets).
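The resource cycle behind a routing deadlock can be illustrated with a small wait-for check. This is a sketch only, under the assumption that each held channel waits for at most one other channel; the dictionary layout and the name `find_cycle` are hypothetical and are not part of the lecture.

```python
def find_cycle(waits_for):
    """Return the channels that form a wait-for cycle, or None.

    waits_for maps a channel that is held by a packet to the channel
    that packet is waiting to acquire (hypothetical representation).
    """
    for start in waits_for:
        seen = []
        node = start
        # Follow the chain of dependencies until it ends or repeats.
        while node in waits_for and node not in seen:
            seen.append(node)
            node = waits_for[node]
        if node in seen:
            # The chain came back to a channel already visited: a cycle.
            return seen[seen.index(node):]
    return None


# Four packets, each holding one channel and waiting for the next one.
blocked = {"c0": "c1", "c1": "c2", "c2": "c3", "c3": "c0"}
print(find_cycle(blocked))

# No cycle: the packet holding c2 is not waiting for anything.
progressing = {"c0": "c1", "c1": "c2"}
print(find_cycle(progressing))
```

A routing restriction that prevents deadlock has to guarantee that no such cycle of channel dependencies can form.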

Livelock
[Figure: mesh of PEs and routers (R) with source S, destination D and a congested channel, illustrating livelock.]

Deadlock
[Figure: nodes whose buffers are full of packets labelled by destination (Dest 00, Dest 01, ...). Each packet is blocked (BLOCK) by another one, which results in DEADLOCK.]

Routing Algorithm Attributes
- Number of destinations: unicast, multicast, broadcast?
- Adaptivity: deterministic, oblivious or adaptive
- Implementation (mechanisms): source or node routing? Table or circuit?

Static vs. Adaptive Routing
[Figure: a static route and an adaptive route on a mesh with a congested channel.]

Minimal vs. Non-Minimal
[Figure: a minimal route and a non-minimal route on a mesh.]

Source vs. Distributed
[Figure: source routing and distributed routing on a mesh. With source routing the packet carries the whole route (ENESENNNL in the example) and one symbol is consumed at each hop: NESENNNL, ESENNNL, SENNNL, ENNNL, NNNL, NNL, NL, L. With distributed routing a routing computation is performed along the way.]

Routing Examples
- Dimension-Ordered Routing (XY routing): route from S to D along X, then along Y
- Valiant routing algorithm (VAL): route through a random intermediate node
- ROMM: route through an intermediate node within the bounding box
- O1TURN: 50% XY, 50% YX
- Dynamic XY (DyXY): [Figure: route from S to D on a mesh with a congested channel.]

Summary of Routing Algorithms
- Deterministic algorithms are simple and inexpensive, but they do not utilize path diversity and thus are weak on load balancing.
- Oblivious algorithms often give good results, since they allow good load balancing and their effects are easy to analyse.
- Adaptive algorithms, although in theory superior, are complex and power hungry.
- Latency is the paramount concern.
- Minimal routing is most common for NoC.
- Non-minimal routing can avoid congestion and deliver low latency.
- NoC researchers favor DOR for simplicity and deadlock freedom.
- Here we only cover unicast routing.
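Dimension-ordered (XY) routing from the routing examples can be sketched in a few lines. The sketch assumes a 2-D mesh with (x, y) coordinates, E/W for +x/-x, N/S for +y/-y, and a final L symbol for the local (ejection) port, following the letters that appear in the source-routing figure. The function name `xy_route` is hypothetical.

```python
def xy_route(src, dst):
    """Return the list of output-port symbols for XY routing on a 2-D mesh.

    The packet first travels along X until the column matches,
    then along Y, and finally leaves through the local port L.
    """
    x, y = src
    dst_x, dst_y = dst
    hops = []
    while x != dst_x:
        step = 1 if dst_x > x else -1
        hops.append("E" if step == 1 else "W")
        x += step
    while y != dst_y:
        step = 1 if dst_y > y else -1
        hops.append("N" if step == 1 else "S")
        y += step
    hops.append("L")
    return hops


print(xy_route((0, 0), (2, 1)))
print(xy_route((3, 2), (1, 2)))
```

Because a packet never turns from Y back to X, the path is fixed for a given source and destination, which is why the summary above describes deterministic routing as simple but weak on load balancing.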

Routing Mechanism
- The term routing mechanics refers to the mechanism that is used to implement any routing algorithm.
- Two approaches:
  - Fixed routing tables at the source or at each hop
  - Algorithmic routing, which uses specialized hardware to compute the route or next hop at run-time

Table-based Routing
- Two approaches:
  - Source-table routing implements all-at-once routing by looking up the entire route at the source
  - Node-table routing performs incremental routing by looking up the hop-by-hop routing relation at each node along the route
- Major advantage: a routing table can support any routing relation on any topology.

(SoC Architecture, Lecture 7, 2013-03-04, (c) Axel Jantsch and Ingo Sander)

Table-based Routing
[Figure: example routing mechanism for deterministic source routing NoCs. The NI uses a LUT to store the route map.]

Source Routing
- All routing decisions are made at the source terminal
- To route a packet:
  - the table is indexed using the packet destination
  - a route or a set of routes is returned, and one route is selected
  - the route is prepended and embedded in the packet
- Because of its speed, simplicity and scalability, source routing is very often used for deterministic and oblivious routing

Source Routing - Example
- The example shows a routing table for a 4x2 torus network
- In this example there are two alternative routes for each destination
- Each node has its own routing table

[Figure: 4x2 torus network with nodes 00, 10, 20, 30 and 01, 11, 21, 31.]

Source routing table for node 00 of the 4x2 torus network:

Destination   Route 0   Route 1
00            X         X
10            EX        WWWX
20            EEX       WWX
30            WX        EEEX
01            NX        SX
11            NEX       ENX
21            NEEX      WWNX
31            NWX       WNX

Note: In this example the order of XY should be the opposite, i.e. 21->12.

Example:
- Routing from 00 to 21
- Table is indexed with 21
- Two routes: NEEX and WWNX
- The source arbitrarily selects NEEX

Arbitrary Length Encoding of Source Routes
- Advantage: it can be used for arbitrary-sized networks
- The complexity of routing is moved from the network nodes to the terminal nodes
- But routers must be able to handle arbitrary-length routes

Arbitrary Length Encoding
- Router has 16-bit phits and 32-bit flits
- Route has 13 hops: NENNWNNENNWNN
- Extra symbols:
  - P: phit continuation selector
  - F: flit continuation phit
- The table entries in the terminals must be of arbitrary length
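The source-table lookup from the 4x2 torus example can be sketched as a dictionary lookup followed by a selection. The route strings are copied from the table for node 00; everything else is an assumption made for illustration: node labels are read as x then y, E/W move along x, N/S along y with wrap-around, X marks ejection, and the names `ROUTES_FROM_00`, `select_route` and `follow` are hypothetical.

```python
import random

# Source routing table for node 00 (routes taken from the lecture's table).
ROUTES_FROM_00 = {
    "00": ("X", "X"),
    "10": ("EX", "WWWX"),
    "20": ("EEX", "WWX"),
    "30": ("WX", "EEEX"),
    "01": ("NX", "SX"),
    "11": ("NEX", "ENX"),
    "21": ("NEEX", "WWNX"),
    "31": ("NWX", "WNX"),
}

# Assumed meaning of each port selector symbol as an (x, y) step.
STEP = {"E": (1, 0), "W": (-1, 0), "N": (0, 1), "S": (0, -1)}


def select_route(dest, rng=random):
    """Index the table with the destination and pick one of the routes."""
    return rng.choice(ROUTES_FROM_00[dest])


def follow(route, start=(0, 0), width=4, height=2):
    """Walk a route on the torus and return the label of the exit node."""
    x, y = start
    for symbol in route:
        if symbol == "X":          # X: leave the network at this node
            break
        dx, dy = STEP[symbol]
        x = (x + dx) % width       # wrap-around in x
        y = (y + dy) % height      # wrap-around in y
    return f"{x}{y}"


route = select_route("21")
print(route, "->", follow(route))
```

Walking every entry with `follow` is a cheap way to check that both alternatives in a row really end at the destination that indexes the row.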

Node-Table Routing
- Table-based routing can also be performed by placing the routing table in the routing nodes rather than in the terminals
- Node-table routing is appropriate for adaptive routing algorithms, since it can use state information at each node
- A table lookup is required when a packet arrives at a router, which takes additional time compared to source routing
- Scalability is sacrificed, since different nodes need tables of varying size
- It is difficult to give two packets arriving from different nodes different ways through the network without expanding the tables

Example of Node-Table Routing
- The table shows a set of routing tables
- There are two choices from a source to a destination
[Table: routing table for node 00 of the 4x2 torus network (nodes 00, 10, 20, 30 and 01, 11, 21, 31). Note: ports in bold font are misroutes.]

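Example of Node-Table Routing
[Figure: 4x2 torus network with nodes 00, 10, 20, 30 and 01, 11, 21, 31.]
- A packet passing through node 00 is destined for node 11.
- If the entry for (00->11) is N, the packet goes to 10; if the entry for (10->11) is S, it goes back to 00. The packet then bounces between 00 and 10 (livelock).
- Livelock can occur.

Algorithmic Routing
- Instead of using a table, algorithms can be used to compute the next route.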

- In order to be fast, algorithms are usually not very complicated and are implemented in hardware.

Example: Algorithmic Routing
Dimension-Order Routing
- sx and sy indicate the preferred directions: sx=0 means +x, sx=1 means -x; sy=0 means +y, sy=1 means -y
- Δx and Δy represent the number of hops in the x and y directions
- The PDV (productive direction vector) is used as an input for the selection of a route
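The sign bits, hop counts and PDV described above can be computed as follows. This is a sketch under assumptions that the slide does not state: the network is a torus with kx x ky nodes, ties between the two directions are broken toward the + direction, and the PDV has one entry per direction plus an exit entry. The function names are hypothetical.

```python
def preferred_direction(cur, dst, k):
    """Return (s, delta) for one dimension of a k-node ring.

    s = 0 means the + direction, s = 1 means the - direction;
    delta is the number of hops in that direction.
    """
    forward = (dst - cur) % k
    backward = (cur - dst) % k
    if forward <= backward:        # assumption: ties go to the + direction
        return 0, forward
    return 1, backward


def productive_direction_vector(cur, dst, kx, ky):
    """Mark every direction that moves the packet closer to dst."""
    sx, dx = preferred_direction(cur[0], dst[0], kx)
    sy, dy = preferred_direction(cur[1], dst[1], ky)
    return {
        "+x": dx > 0 and sx == 0,
        "-x": dx > 0 and sx == 1,
        "+y": dy > 0 and sy == 0,
        "-y": dy > 0 and sy == 1,
        "exit": dx == 0 and dy == 0,
    }


def dimension_order_select(pdv):
    """Dimension-order selection: finish x before y, then exit."""
    for port in ("+x", "-x", "+y", "-y", "exit"):
        if pdv[port]:
            return port
    raise ValueError("empty productive direction vector")


pdv = productive_direction_vector((0, 0), (3, 1), kx=4, ky=2)
print(pdv, dimension_order_select(pdv))
```

Selecting the first active x entry before any y entry gives dimension-order behaviour; other selection policies only change `dimension_order_select`, not the PDV computation.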

In the routing logic, the route selection determines the type of the routing, and the PDV indicates which channels advance the packet.

Example: Algorithmic Routing
- Minimal oblivious router: implemented by randomly selecting one of the active bits of the PDV as the selected direction
- Minimal adaptive router: achieved by making the selection based on the length of the respective output queues
- Fully adaptive router: implemented by picking an unproductive direction if the output queue length exceeds a threshold

Exercise
Compression of source routes. In the source routes, each port selector symbol (N, S, W, E, and X) was encoded with three bits. Suggest an alternative encoding to reduce the average length (in bits) required to represent a source route. Justify your encoding in terms of typical routes that might occur on a torus. Also compare the original three bits per symbol with your encoding on the following routes:
- NNNNNEEX
- WNEENWWWWWNX

Part II: NoC Building Blocks
- Topology
- Routing Algorithms
- Routing Mechanisms
- Switching
- Control Flow
- Network Interface
- Router Architecture
Part III: OASIS NoC Real Design

Next lecture