Networks on Silicon:Networks on Silicon:Blessing or Nightmare?Blessing or Nightmare?
Paul WielagePaul WielagePhilips Research Laboratories, The NetherlandsPhilips Research Laboratories, The Netherlands
2EUROMICRO 2002, Paul Wielage
Electronic systems
• Systems on chip are everywhereSystems on chip are everywhere
• Technology advances enable increasingly more complex Technology advances enable increasingly more complex designs designs
Central Question:Central Question: how to exploit deep-how to exploit deep-submicron technologies efficiently?submicron technologies efficiently?
3EUROMICRO 2002, Paul Wielage
Silicon technology roadmap
• intrinsic capabilityintrinsic capability of ICs (transistor count / gate delay) of ICs (transistor count / gate delay)
grows with ~ 50% per year (Moore’s Law)grows with ~ 50% per year (Moore’s Law)
• powerpower limits the performance limits the performance
low power SoClow power SoChigh performance high performance
MPU/SoCMPU/SoC20012001 20042004 20102010 20012001 20042004 20102010
gate length (nm)gate length (nm) 130130 9090 4545 9090 5353 2525
supply voltagesupply voltage 1.21.2 11 0.60.6 1.11.1 11 0.60.6
transistor count (M)transistor count (M) 3.33.3 8.38.3 4040 276276 553553 22122212
chip size (mmchip size (mm22)) 100100 120120 144144 310310 310310 310310
clock frequency (GHz)clock frequency (GHz) 0.150.15 0.30.3 0.60.6 1.71.7 2.42.4 4.74.7
wiring levelswiring levels 66 77 99 77 88 1010
max power (W)max power (W) 0.10.1 0.10.1 0.10.1 130130 160160 218218
Source: ITRS 2001Source: ITRS 2001
4EUROMICRO 2002, Paul Wielage
Design Challenges
Moore’s Law is a nice prophesy, but it is hard work to bring into practiceMoore’s Law is a nice prophesy, but it is hard work to bring into practice
Log # transistors
Time
Technology
Designproductivity
Designgap
paradigm paradigm shiftsshifts
Paradigms shifts in design methodology is the only escapeParadigms shifts in design methodology is the only escape
5EUROMICRO 2002, Paul Wielage
Design challenges
• design productivity and design timedesign productivity and design time
system level design paradigm shifts:system level design paradigm shifts:– component based designcomponent based design
(IP block re-use)(IP block re-use)– platform based designplatform based design
(architecture re-use)(architecture re-use)– networks on siliconnetworks on silicon
(communication-centric view)(communication-centric view)
• dynamic and standby power consumptiondynamic and standby power consumption– low swing signalinglow swing signaling– clock gating / supply switchingclock gating / supply switching– power managementpower management
– multi-Vmulti-Vtt transistors transistors
– new memory technologiesnew memory technologies
6EUROMICRO 2002, Paul Wielage
Component-based design
Design methodology:Design methodology:
• IC is a composite of heterogeneous IP blocks, preferably reusedIC is a composite of heterogeneous IP blocks, preferably reused
– e.g. processors, memories, controllerse.g. processors, memories, controllers
– or even whole sub-systems like MPEG encoders / decodersor even whole sub-systems like MPEG encoders / decoders
• composition by standard interfaces and busescomposition by standard interfaces and buses
– e.g. virtual component interface (VCI) or AHB bus protocole.g. virtual component interface (VCI) or AHB bus protocol
• use wrappers to comply to chosen communication standarduse wrappers to comply to chosen communication standard
– goal: plug and play by means of automatic wrapper generationgoal: plug and play by means of automatic wrapper generation
Weak point: Weak point: how do physical issues influence how do physical issues influence performance and functional-correctness?performance and functional-correctness?
7EUROMICRO 2002, Paul Wielage
SoC design in practice
IC of heterogeneous IP blocks IC of heterogeneous IP blocks
• analoganalog
• storagestorage
• computationcomputation
• communicationcommunication
One-Chip TVOne-Chip TVNexperiaNexperiaTMTM platform platform
• computationcomputation
• communicationcommunication
ProspectProspect
- blocks of 50K-100K gates is do-able till 2010blocks of 50K-100K gates is do-able till 2010
- however in 2010: 1000 < # blocks < 10.000however in 2010: 1000 < # blocks < 10.000
- increasingly difficult with growing # blocksincreasingly difficult with growing # blocks
- speed and energy are crucialspeed and energy are crucial
8EUROMICRO 2002, Paul Wielage
Importance of communication speed
Scaling makes transistors faster but not wires Scaling makes transistors faster but not wires mismatchmismatch
Consequence: Consequence: performance bottleneckperformance bottleneck
• faster processors need more data / instructions and more faster processors need more data / instructions and more instantlyinstantly
• highly concurrent processing makes hiding communication highly concurrent processing makes hiding communication latency difficultlatency difficult
Eventually interconnect will dominate SoC Eventually interconnect will dominate SoC performanceperformance
focus shift from computation to communication required focus shift from computation to communication required
9EUROMICRO 2002, Paul Wielage
SoC interconnect requirements
• scalablescalable– in bandwidth and latency for any system sizein bandwidth and latency for any system size
• flexibleflexible– multiple applications / configurations, various bit-ratesmultiple applications / configurations, various bit-rates– connectivity between each pair of IPsconnectivity between each pair of IPs
• compositionalcompositional– allow to merge two sub-systemsallow to merge two sub-systems
• deep sub-micron robustdeep sub-micron robust– noise, cross-talk, IR drop, soft errorsnoise, cross-talk, IR drop, soft errors– support multiple clock-domains (e.g. GALS)support multiple clock-domains (e.g. GALS)
• efficientefficient– costcost– powerpower
10EUROMICRO 2002, Paul Wielage
Today’s communication solutions
In addition, today’s communication solutions are not deep In addition, today’s communication solutions are not deep submicron proof submicron proof
dedicateddedicatedpoint-to-pointpoint-to-point
shared busshared bus cross-bar switchcross-bar switch
scalablescalableflexibleflexible
compositionalcompositionalefficientefficient
scalablescalable
compositionalcompositional
scalablescalable
flexibleflexible
efficientefficient
M I
S I
M ISI
M I
I Arbiter
interface
DRIVER
interface
RECEIVER
M IM I SI
ArbiterI
IM
S
11EUROMICRO 2002, Paul Wielage
On-chip communication
Novel approach:Novel approach:
• Charles Seitz et.al., “Charles Seitz et.al., “Let’s Route Packets Instead of Let’s Route Packets Instead of WiresWires”, 1990”, 1990
• William J. Dally et.al., “William J. Dally et.al., “Route Packets, Not Wires: Route Packets, Not Wires: On-Chip Interconnection NetworksOn-Chip Interconnection Networks”, DAC 2001”, DAC 2001
• Kees Goossens et.al., “Kees Goossens et.al., “Networks on Silicon: Networks on Silicon: Combining Best-Effort and Guaranteed ServicesCombining Best-Effort and Guaranteed Services”, ”, DATE 2002DATE 2002
12EUROMICRO 2002, Paul Wielage
Networks on Silicona paradigm shift in on-chip communication
Essence of a NoS:Essence of a NoS:
• all IP to IP communication via all IP to IP communication via single networksingle network
• network is network is multi-hopmulti-hop: routers are point-to-point connected: routers are point-to-point connected
• routers forward routers forward data-packetsdata-packets
• router is router is IP blockIP block in itselfin itself
router-router-networknetwork
router noderouter node
packet packet storagestorage
cross-bar cross-bar switchswitch
arbiterarbiter
““functional functional IP”IP”
13EUROMICRO 2002, Paul Wielage
Wire usage in router-networks
The typical extremes are not favorable for chip-wide interconnectThe typical extremes are not favorable for chip-wide interconnect
A router-network uses the proper mix:A router-network uses the proper mix: time-shared & point-to-point time-shared & point-to-point• high utilization, few wireshigh utilization, few wires• high frequency, pipelining & repeater insertion possiblehigh frequency, pipelining & repeater insertion possible
time congestiontime congestion
shared bandwidthshared bandwidth
cost efficientcost efficient
flexibleflexiblecross-talkcross-talk
RC delayRC delay
space congestionspace congestion
dedicated dedicated p-to-pp-to-p
space- space- time time
sharedshared
14EUROMICRO 2002, Paul Wielage
Networks on Silicon
Abstract communication services:Abstract communication services:• transporttransport
– uncorrupted, loss-less, without duplicationuncorrupted, loss-less, without duplication• performanceperformance
– guaranteed throughput, bounded latency and jitterguaranteed throughput, bounded latency and jitter– without guarantees: best-effortwithout guarantees: best-effort
• orderingordering– in-order per transaction, connection, global, …in-order per transaction, connection, global, …
hardware technology
application demands
services
use
off
er
15EUROMICRO 2002, Paul Wielage
NoS characteristics
scalabescalabe– #routers, topology, traffic classes#routers, topology, traffic classes– size, bandwidth, latencysize, bandwidth, latency
flexibleflexible– every IP is reachableevery IP is reachable– servicesservices
compositionalcompositional– merging two networks is again a networkmerging two networks is again a network
deep submicron robustdeep submicron robust– routers are highly reusable: allows for DSM optimizationrouters are highly reusable: allows for DSM optimization– distributed implementation: no global clock requireddistributed implementation: no global clock required
efficientefficient– high wire utilization high wire utilization less wires needed less wires needed
16EUROMICRO 2002, Paul Wielage
Issues of concern
Two issues of many:Two issues of many:
• overhead at interface between network and functional IPoverhead at interface between network and functional IP
• state synchronization at system levelstate synchronization at system level
17EUROMICRO 2002, Paul Wielage
The Æthereal network on silicon
Combination of guaranteed and best-effort servicesCombination of guaranteed and best-effort services
• guaranteed throughput & latencyguaranteed throughput & latency
– circuit switching (time division multiplexed)circuit switching (time division multiplexed)
– ATMATM-like connection set up-like connection set up
• best-effort for efficiencybest-effort for efficiency
– virtual output queuingvirtual output queuing
– worm-hole routingworm-hole routing
Inherently loss-less andInherently loss-less and
ordered transportordered transport
No global signalsNo global signals
priority / arbitration
programming
best-effortrouter
guaranteedrouter
stu
6 port prototype router6 port prototype router
- cmos12cmos12
- 6 Kbit queuing6 Kbit queuing
- 512 TDMA slots512 TDMA slots
18EUROMICRO 2002, Paul Wielage
Blessing or …
NIGHTMAREtoo many resources
NIGHTMAREtoo little performance
dedicated, p2pdedicated, p2p
shared busshared bus
router networkrouter network
Good but maybe too expensive
Promising Promising solution!solution!
Hybrid: Hybrid: routers + busesrouters + buses
19EUROMICRO 2002, Paul Wielage
Summary
1.1. TechnologyTechnology offers tremendous opportunities offers tremendous opportunities
2.2. High demands from future High demands from future applicationsapplications
3.3. CommunicationCommunication is the problem of future SoCs is the problem of future SoCs
4.4. Networks on SiliconNetworks on Silicon is the solution is the solution- technology wisetechnology wise- design wisedesign wise