Tightly-Coupled Multi-Layer Topologies for 3D NoCs
Hiroki Matsutani (Keio Univ, JAPAN)
Michihiro Koibuchi (NII, JAPAN)
Hideharu Amano (Keio Univ, JAPAN)
Outline
• Network-on-Chip (NoC)
– Typical 2D topologies
– 2D vs. 3D
• XNoTs
– New class of 3D topologies
– Definition, Examples
– Deadlock-free routing
• Evaluations
– Throughput
– Area, Energy consumption
Network-on-Chip (NoC)
• Tile architectures
– MIT RAW [Taylor, Micro’02]
– Texas U. TRIPS [Burger, Computer’04]
– Intel 80-tile NoC [Vangal, ISSCC’07]
• Various topologies
– Mesh, Torus, Tree
– Large impact on energy, cost, and performance
An example of tile architecture (ASPLA 90nm CMOS process)
Tile = Processing core + On-chip router
Packet switched network
2D Topologies: Mesh & Torus
• 2-D Mesh
– RAW [Taylor, IEEE Micro’02]
• 2-D Torus
– 2x bandwidth of mesh
[Figure: 2-D mesh and 2-D torus; legend: Router, Core]
2D Topologies: Fat Tree
• Fat Tree (p, q, c)
– p: # of upward links
– q: # of downward links
– c: # of core ports
[Figure: Fat Tree (2,4,2) and Fat Tree (2,4,1); legend: Router, Core]
Network topology should be carefully selected to meet the requirements of the application
2D NoC vs. 3D NoC
• 2D NoCs
– Long wires, distance
– Wire delay
– Packets consume power at links according to their wire length
• 3D NoCs
– Several small wafers or dies are stacked
• Vertical link
– Micro bump [Ezaki, ISSCC’04]
– Through-wafer via [Burns, ISSCC’01]
– Very short (10-50um)
Long horizontal wires in 2D NoCs can be replaced by very short vertical links in 3D NoCs
3D NoCs that have heterogeneous tiers
• Different circuits on each tier
• Different topologies on each tier
[Figure: three stacked tiers (Tier-1, Tier-2, Tier-3) holding a processor array, cache memory, and custom logic, with Fat Tree (2,4,1), Ring, and 2D-Mesh topologies]
(*) A tier refers to a wafer or a die in 3D ICs
How to connect different planar topologies?
How to route packets in heterogeneous 3D NoCs?
We propose a class of topology for heterogeneous 3D NoCs
Multiple network layers are tightly connected by vertical crossbar switches
Existing vertical link designs
• Vertical bus [Li, ISCA’06]
– Single bus (only a single transfer at a time)
– Merit: small # of vertical links
– Demerit: low peak performance
• Vertical crossbar [Kim, ISCA’07]
– Segmented buses (multiple transfers at the same time)
– Merit: similar performance to a true crossbar
– Merit: reasonable # of vertical links
We assume crossbar-based vertical links for 3D NoCs
XNoTs: Xbar-connected Network-on-Tiers
• XNoTs
– Multiple planar topologies
– Connected by crossbars
• Network-on-Tier (NoT)
– A planar topology
– Implemented on a tier
– Bottom NoT provides connectivity to all cores
[Figure: XNoTs as a stack of three Network-on-Tiers; a mesh-based NoT; legend: Router, Core]
Each core and router has a port for a vertical connection
[Figure: a mesh-based XNoTs with a vertical crossbar in each pillar; legend: Router, Core]
All routers and cores in the same pillar are connected by a crossbar
Examples: all tiers have the same topology
[Figure: top and side views of Mesh-based XNoTs, Ring-based XNoTs, and Tree-based XNoTs]
Examples: Heterogeneous XNoTs (1)
• Different topologies are used in each tier
[Figure: top and side views of stacked Fat Tree (2,4,1), Ring, and 2D-Mesh tiers; label: No connectivity]
Packets are transferred via bottom tier (tier-1)
Examples: Heterogeneous XNoTs (2)
• Not every tier has to provide connectivity to all cores
– Except for the bottom tier (i.e., “escape” tier)
[Figure: Top tier (some links are disconnected) above the bottom tier (full connectivity to all cores)]
Packets are transferred via bottom tier (tier-1)
(*) Only the bottom tier must provide full connectivity to all cores
XNoTs: Deadlock-free routing
• Intra-tier comm. (X and Y directions)
– Existing deadlock-free routing is used within a tier, e.g., dimension-order routing (DOR)
– Only tier-0 must guarantee connectivity to all cores
• Inter-tier comm. (Z direction)
– Turns from lower-tier to higher-tier are prohibited
– Unless the next hop is final destination
[Figure: top view and side view of mesh-based XNoTs, marking an allowed turn (OK) and a prohibited turn (NG)]
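The inter-tier rule above can be sketched as a small path checker. This is only an illustration under stated assumptions, not the authors' implementation: the names `Hop`, `vertical_move_allowed`, and `path_is_legal` are hypothetical, a smaller tier index is assumed to mean "closer to the bottom tier", and injection from the source core into any tier is assumed to be unrestricted.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Hop:
    """One hop of a path (hypothetical model, not from the slides)."""
    tier: int                     # smaller index = closer to the bottom tier (assumption)
    is_destination: bool = False  # True only for the final destination core


def vertical_move_allowed(current_tier: int, next_hop: Hop) -> bool:
    """Staying on a tier or moving down is always allowed.
    Moving up is allowed only when the next hop is the final destination."""
    if next_hop.tier <= current_tier:
        return True
    return next_hop.is_destination


def path_is_legal(start_tier: int, hops: Sequence[Hop]) -> bool:
    """start_tier is the tier of the router the packet was injected into."""
    tier = start_tier
    for hop in hops:
        if not vertical_move_allowed(tier, hop):
            return False
        tier = hop.tier
    return True


# Tier 2 -> tier 0 -> destination core on tier 3: the upward move ends at the destination.
print(path_is_legal(2, [Hop(2), Hop(0), Hop(3, is_destination=True)]))  # True
# Tier 0 -> router on tier 1 (not the destination): prohibited upward turn.
print(path_is_legal(0, [Hop(0), Hop(1), Hop(1, is_destination=True)]))  # False
```

Routing inside each tier (for example DOR on a mesh) is left out here; the checker only covers the Z-direction restriction.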
XNoTs: Path selection (random)
• XNoTs routing
– Multiple tiers are available → Alternative paths are available
• Path selection policy
– How to select a single path?
– Random selection → Good load balancing
[Figure: top view and side view of mesh-based XNoTs with three alternative 5-hop paths]
We also proposed some policy-based path selection methods. For more detail, please refer to the paper.
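A minimal sketch of the random policy: given the tiers that offer a path between source and destination, pick one uniformly at random. The function name `select_tier_random` and the idea of passing in a precomputed candidate list are assumptions made for illustration; how the candidates are found is not shown.

```python
import random
from typing import Optional, Sequence


def select_tier_random(candidate_tiers: Sequence[int],
                       rng: Optional[random.Random] = None) -> int:
    """Pick one tier uniformly at random among those that offer a path."""
    if not candidate_tiers:
        raise ValueError("no tier offers a path for this packet")
    if rng is None:
        rng = random.Random()
    return rng.choice(list(candidate_tiers))


# Three tiers each offer a 5-hop path; the packet takes one of them at random.
tier = select_tier_random([0, 1, 2])
print(tier in (0, 1, 2))  # True
```

Passing a seeded `random.Random` instance makes a simulation run reproducible.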
Evaluation: Target topologies (64-core)
• X-Mesh
– (4x4 Mesh) x 4 layers
• X-Torus
– (4x4 Torus) x 4 layers
• X-FT141
– Fat Tree(1,4,1) x 4 layers
• X-FT241
– Fat Tree(2,4,1) x 4 layers
• X-FT441
– Fat Tree(4,4,1) x 4 layers
Fat Tree (p, q, c): p = # of upward links, q = # of downward links, c = # of core ports
[Figure: X-Mesh]
These five topologies are compared with 3D Mesh/Torus
Throughput: Simulation environment
• Grid-based topologies
– 3D-Mesh, X-Mesh
– 3D-Torus, X-Torus
– Dimension-order routing
• Tree-based topologies
– X-FT141, X-FT241, X-FT441
– Up*/down* routing
• Path selection policy
– Random
Simulation parameters:
Packet size   16-flit (1-flit header)
Buffer size   1-flit per channel
Switching     Wormhole switching
Latency       3-cycle per 1-hop
Traffic       Uniform random
(Two virtual channels for tori)
[Figure: X-Mesh (4x4x4)]
Throughput: Simulation results
[Plots: Grid-based XNoTs (X-Torus, X-Mesh, 3D-Torus, 3D-Mesh) and Tree-based XNoTs (X-FT441, X-FT241, X-FT141, with 3D-Torus and 3D-Mesh for comparison)]
No degradation (X-Mesh = 3D-Mesh, X-Torus = 3D-Torus)
Network logic area
• Network area
– Routers & NIs
– Inter-tier vias
• Synthesis of NoC
– 64-core (16-core x 4)
– 0.18um CMOS
• Router architecture
– 1-flit = 32-bit
– Wormhole switching
– 4-stage pipeline
• Inter-tier vias [Li, ISCA’06][Burns, ISSCC’01]
– 1-10um square
– 25um² per layer per 1-bit signal
Inter-tier via area is calculated according to # of vertical links
Typical wormhole router [Matsutani, IPDPS’07] (figure: input ports, buffers, arbiter, crossbar)
Network logic area: Results
[Plot: Network logic area [mm²]]
3D Mesh/Torus require two vertical ports per router (i.e., up & down)
XNoTs require only one vertical port per router (but the # of crossbars increases)
Energy: NoC’s energy model
• Ave. flit energy
– Send 1-flit to dest.
– How much energy [J]?
– $E_{flit} = w H_{ave} (E_{sw} + E_{link})$, with $w$ the flit width in bits and $H_{ave}$ the average hop count
• Parameters
– 6mm square chip
– 64-core (16-core x 4)
– 0.18um CMOS
• Switching energy $E_{sw}$
– 1-bit switching @ Router
– Gate-level sim
– 1.13 [pJ / hop]
• Link energy $E_{link}$
– 1-bit transfer @ Link
– 0.67 [pJ / mm]
• Via energy [Davis, DToC’05]
– 4.34 [fF / via]
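The energy model can be evaluated directly. The sketch below reads the slide's formula as E_flit = w × H_ave × (E_sw + E_link), with w the flit width in bits and H_ave the average hop count. The function name and the default link length of 1.5 mm (a 6 mm chip edge divided by 4 tiles) are assumptions for illustration only; via energy is ignored, and the resulting numbers are not results from the paper.

```python
def average_flit_energy_pj(flit_width_bits: int,
                           avg_hops: float,
                           e_sw_pj_per_hop: float = 1.13,   # per bit, from the slide
                           e_link_pj_per_mm: float = 0.67,  # per bit, from the slide
                           link_length_mm: float = 1.5      # ASSUMPTION: 6 mm / 4 tiles
                           ) -> float:
    """E_flit = w * H_ave * (E_sw + E_link), in pJ. Via energy is not modeled."""
    e_link_pj = e_link_pj_per_mm * link_length_mm
    return flit_width_bits * avg_hops * (e_sw_pj_per_hop + e_link_pj)


# 32-bit flit (as in the router architecture) with a hypothetical average of 4 hops.
print(average_flit_energy_pj(32, 4.0))
```

Because the model is linear in H_ave, any topology that shortens the average hop count lowers the flit energy proportionally.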
Energy: Simulation results
[Plot: Ave. flit energy $E_{flit}$ [pJ], with $E_{sw}$ and $E_{link}$ components, using the same parameters as the energy model]
Hop count is short in XNoTs → low power
Summary: 3D topologies - XNoTs
• Requirements
– Different circuits on each layer
– Different topologies on each layer
– How to connect/route them?
• XNoTs
– Tiers are connected by crossbars
– Arbitrary tiers can be stacked
• Current problem / future work
– We assumed full crossbar as a baseline
– More efficient implementation has been proposed by [Kim, ISCA’07]
– We must revise router architecture
[Figure: stacked Fat Tree, Ring, and 2D-Mesh tiers]
Thank you for your attention
XNoTs: Path selection (QoS)
• Control packets
– In-order delivery is required
– Deterministic routing: dimension-order
– Control packets use tier-1
• Data packets
– In-order delivery is not required
– Large data streams
– Adaptive routing: Duato’s Protocol
– Data packets use tier-2 or tier-3
[Figure: XNoTs (side view) with dimension-order (deterministic) routing on one tier and Duato’s Protocol (adaptive) on two tiers]
Various QoS controls are possible through the path selection algorithm
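The QoS policy on this slide can be sketched as a lookup from packet class to tier and routing mode. The names `PacketClass` and `select_tier_qos` are hypothetical, and choosing between tier-2 and tier-3 at random is an assumption; the slide does not say how a data packet picks between them.

```python
import random
from enum import Enum
from typing import Optional, Tuple


class PacketClass(Enum):
    CONTROL = "control"  # in-order delivery required
    DATA = "data"        # in-order delivery not required


def select_tier_qos(packet_class: PacketClass,
                    rng: Optional[random.Random] = None) -> Tuple[int, str]:
    """Return (tier, routing mode) for a packet class."""
    if packet_class is PacketClass.CONTROL:
        # Deterministic dimension-order routing on tier-1 keeps packets in order.
        return 1, "dimension-order"
    if rng is None:
        rng = random.Random()
    # ASSUMPTION: uniform random choice between the two adaptive tiers.
    return rng.choice([2, 3]), "duato-adaptive"


print(select_tier_qos(PacketClass.CONTROL))  # (1, 'dimension-order')
```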
XNoTs: Path selection (bottom first)
• Heat dissipation is crucial in 3D ICs
• Bottom tier
– Close to the board (good heat dissipation property)
• Bottom tier first
– Tier-0 is used first if there are alternative paths
[Figure: XNoTs (side view); a 3D IC mounted on a board that acts as heat-sink, with the bottom tier closest to the board]
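Under the bottom-first policy, selection reduces to taking the lowest available tier. The sketch below assumes tiers are numbered upward from tier-0 at the board side, as on this slide; the function name `select_tier_bottom_first` is hypothetical.

```python
from typing import Sequence


def select_tier_bottom_first(candidate_tiers: Sequence[int]) -> int:
    """Prefer the tier closest to the board (lowest index) among the candidates."""
    if not candidate_tiers:
        raise ValueError("no tier offers a path for this packet")
    return min(candidate_tiers)


print(select_tier_bottom_first([2, 0, 1]))  # 0
```

Unlike random selection, this concentrates traffic on the bottom tier, so it trades load balancing for better heat dissipation.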
Ideal throughput: Channel bisection
• Number of unidirectional links that cross bisection
• N-core × n-tier, where $N = 2^i \times 2^i$; values below are for $N = 16$

Topology   Channel bisection          1-tier  2-tier  4-tier
X-Mesh     $\min(2^{i+1} n, N n)$     8       16      32
X-Torus    $\min(2^{i+2} n, N n)$     16      32      64
X-FT141    $\min(4 n, N n)$           4       8       16
X-FT241    $\min(2^{i+1} n, N n)$     8       16      32
X-FT441    $\min(4^{i} n, N n)$       16      32      64
3D-Mesh    $\min(2^{i+1} n, N n)$     8       16      32
3D-Torus   $\min(2^{i+2} n, N n)$     16      32      64

No degradation (X-Mesh = 3D-Mesh, X-Torus = 3D-Torus)
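The channel-bisection values can be reproduced with a few lines of code. The sketch assumes the slide's formulas read as min(c·n, N·n), with per-tier factor c equal to 2^(i+1) for meshes and X-FT241, 2^(i+2) for tori, 4 for X-FT141, and 4^i for X-FT441, where N = 2^i × 2^i cores per tier. The function name `channel_bisection` is hypothetical.

```python
def channel_bisection(topology: str, i: int, n: int) -> int:
    """Unidirectional links crossing the bisection for N = 2^i x 2^i cores on n tiers."""
    cores_per_tier = (2 ** i) * (2 ** i)
    per_tier_factor = {
        "X-Mesh": 2 ** (i + 1),
        "X-Torus": 2 ** (i + 2),
        "X-FT141": 4,
        "X-FT241": 2 ** (i + 1),
        "X-FT441": 4 ** i,
        "3D-Mesh": 2 ** (i + 1),
        "3D-Torus": 2 ** (i + 2),
    }
    return min(per_tier_factor[topology] * n, cores_per_tier * n)


# N = 16 cores per tier (i = 2), for 1, 2, and 4 tiers.
for name in ("X-Mesh", "X-Torus", "X-FT141", "X-FT241", "X-FT441"):
    print(name, [channel_bisection(name, 2, n) for n in (1, 2, 4)])
```

With these readings, X-Mesh matches 3D-Mesh and X-Torus matches 3D-Torus for every tier count, which is the "no degradation" observation on the slide.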
3D Topologies: 3D-Mesh
Topology             Average hop count  Channel bisection  Number of routers  Node degree
2D-Mesh (8x8=64)     5.33               16                 64                 5
3D-Mesh (4x4x4=64)   4.00               32                 64                 7
[Figure: 3D-Mesh with Tier-0 to Tier-3]