Post on 14-Jan-2016

Tightly-Coupled Multi-Layer Topologies for 3D NoCs

Hiroki Matsutani (Keio Univ, JAPAN)
Michihiro Koibuchi (NII, JAPAN)
Hideharu Amano (Keio Univ, JAPAN)

Outline
• Network-on-Chip (NoC)
– Typical 2D topologies
– 2D vs. 3D
• XNoTs
– New class of 3D topologies
– Definition, Examples
– Deadlock-free routing
• Evaluations
– Throughput
– Area, Energy consumption

Network-on-Chip (NoC)
• Tile architectures
– MIT RAW [Taylor, Micro’02]
– Texas U. TRIPS [Burger, Computer’04]
– Intel 80-tile NoC [Vangal, ISSCC’07]
• Various topologies
– Mesh, Torus, Tree
– Large impact on energy, cost, and performance

An example of tile architecture (ASPLA 90nm CMOS process)
Tile = Processing core + On-chip router
Packet switched network

2D Topologies: Mesh & Torus
• 2-D Mesh
– RAW [Taylor, IEEE Micro’02]
• 2-D Torus
– 2x bandwidth of mesh
[Figure legend: Router, Core]

2D Topologies: Fat Tree
• Fat Tree (p, q, c)
– p: # of upward links
– q: # of downward links
– c: # of core ports
[Figure: Fat Tree (2,4,2) and Fat Tree (2,4,1); legend: Router, Core]

Network topology should be carefully selected so as to meet the requirements of the application

2D NoC vs. 3D NoC
• 2D NoCs
– Long wires, distance
– Wire delay
– Packets consume power at links according to their wire length
• 3D NoCs
– Several small wafers or dies are stacked
• Vertical link
– Micro bump [Ezaki, ISSCC’04]
– Through-wafer via [Burns, ISSCC’01]
– Very short (10-50um)

Long horizontal wires in 2D NoCs can be replaced by very short vertical links in 3D NoCs

3D NoCs that have heterogeneous tiers
• Different circuits on each tier
– E.g., processor array, cache memory, custom logic
• Different topologies on each tier
– E.g., Fat Tree(2,4,1), Ring, 2D-Mesh
[Figure: three stacked tiers (Tier-1, Tier-2, Tier-3)]
(*) A tier refers to a wafer or a die in 3D ICs

How to connect different planar topologies?
How to route packets in heterogeneous 3D NoCs?

We propose a class of topology for heterogeneous 3D NoCs

Multiple network layers are tightly connected by vertical crossbar switches

Existing vertical link designs
• Vertical bus [Li, ISCA’06]
– Single bus (only a single transfer at a time)
– Merit: small # of vertical links
– Demerit: low peak performance
• Vertical crossbar [Kim, ISCA’07]
– Segmented buses (multiple transfers at the same time)
– Merit: similar performance to a true crossbar
– Merit: reasonable # of vertical links

We assume crossbar-based vertical links for 3D NoCs

XNoTs: Xbar-connected Network-on-Tiers
• XNoTs
– Multiple planar topologies
– Connected by crossbars
• Network-on-Tier (NoT)
– A planar topology
– Implemented on a tier
– Bottom NoT provides connectivity to all cores

[Figure: a mesh-based NoT and a mesh-based XNoTs; legend: Router, Core; each pillar is a vertical crossbar]
Each core and router has a port for a vertical connection
All routers and cores in the same pillar are connected by a crossbar
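The pillar structure can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: the function name `build_pillars`, the tuple encoding of nodes, and the assumption that every tier contributes exactly one core and one router to each pillar (consistent with the 16-core x 4 configuration used later) are all hypothetical.

```python
from itertools import product

def build_pillars(width, height, n_tiers):
    """Group the nodes of a mesh-based XNoTs by pillar (x, y).

    Assumption: each tier places one core and one router at every (x, y),
    and all of them attach to the same vertical crossbar.
    """
    pillars = {}
    for x, y in product(range(width), range(height)):
        cores = [("core", x, y, tier) for tier in range(n_tiers)]
        routers = [("router", x, y, tier) for tier in range(n_tiers)]
        pillars[(x, y)] = cores + routers
    return pillars

# (4x4 mesh) x 4 tiers: 16 pillars, each crossbar serving 4 cores + 4 routers.
pillars = build_pillars(4, 4, 4)
crossbar_ports = len(pillars[(0, 0)])
```

Under these assumptions each node needs only one vertical port, because the crossbar, not the router, provides the fan-out across tiers.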

Examples: all tiers have same topology
[Figure: side views of Mesh-based XNoTs, Ring-based XNoTs, and Tree-based XNoTs]
All routers and cores in the same pillar are connected by a crossbar

Examples: Heterogeneous XNoTs (1)
• Different topologies are used in each tier
[Figure: side view of tiers using Fat Tree(2,4,1), Ring, and 2D-Mesh]
[Figure annotation: No connectivity]
Packets are transferred via bottom tier (tier-1)

Examples: Heterogeneous XNoTs (2)
• Tiers need not provide connectivity to all cores
– Except for the bottom tier (i.e., “escape” tier)
[Figure: Top tier (some links are disconnected) and bottom tier (full connectivity to all cores)]
Packets are transferred via bottom tier (tier-1)
(*) Only the bottom tier must provide full connectivity to all cores

XNoTs: Deadlock-free routing
• Intra-tier comm. (X and Y directions)
– Existing deadlock-free routing is used within a tier
– E.g., dimension-order routing (DOR)
– Only tier-0 must guarantee connectivity to all cores
• Inter-tier comm. (Z direction)
– Turns from lower-tier to higher-tier are prohibited
– Unless the next hop is final destination
[Figure: top view and side view of mesh-based XNoTs, with a permitted turn (OK) and a prohibited turn (NG)]
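The inter-tier rule can be written as a small predicate. This is a minimal sketch under stated assumptions: tier 0 is taken to be the bottom tier, the function name `vertical_hop_allowed` is hypothetical, and the check covers only a packet already travelling in a tier (injection from the source core is not modeled).

```python
def vertical_hop_allowed(current_tier, next_tier, next_is_destination):
    """Return True if a hop from current_tier to next_tier is permitted.

    Moving to a higher tier is prohibited unless the next hop is the
    packet's final destination; staying or moving down is always allowed.
    """
    if next_tier <= current_tier:
        return True
    return next_is_destination
```

For example, a packet on tier 0 may rise to tier 2 only when the node it reaches there is its destination core.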

XNoTs: Path selection (random)
• XNoTs routing
– Multiple tiers are available, so alternative paths are available
• Path selection policy
– How to select a single path?
– Random selection gives good load balancing
[Figure: top view and side view of mesh-based XNoTs showing three alternative 5-hop paths]

We also proposed some policy-based path selection methods. For more detail, please refer to the paper.
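Random path selection amounts to picking, per packet, which tier carries the planar part of the route. The sketch below is an assumption-laden illustration: `select_tier_random` is a hypothetical name, tiers are numbered from 0, and every tier is assumed to offer a path for the packet.

```python
import random

def select_tier_random(n_tiers, rng=random):
    """Pick one tier uniformly at random for the planar part of the path."""
    return rng.randrange(n_tiers)

# Usage: a seeded generator makes the choice reproducible in a simulator.
rng = random.Random(0)
picks = [select_tier_random(4, rng) for _ in range(1000)]
```

Because each tier is equally likely, traffic spreads across the tiers, which is the load-balancing effect the slide refers to.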

Evaluation: Target topologies (64-core)
• X-Mesh
– (4x4 Mesh) x 4 layers
• X-Torus
– (4x4 Torus) x 4 layers
• X-FT141
– Fat Tree(1,4,1) x 4 layers
• X-FT241
– Fat Tree(2,4,1) x 4 layers
• X-FT441
– Fat Tree(4,4,1) x 4 layers
[Figure: X-Mesh]

Fat Tree (p, q, c): p = # of upward links, q = # of downward links, c = # of core ports

These five topologies are compared with 3D Mesh/Torus

Throughput: Simulation environment
• Grid-based topologies
– 3D-Mesh, X-Mesh
– 3D-Torus, X-Torus
– Dimension-order routing
• Tree-based topologies
– X-FT141, X-FT241
– X-FT441
– Up*/down* routing
• Path selection policy
– Random

Packet size   16-flit (1-flit header)
Buffer size   1-flit per channel
Switching     Wormhole switching
Latency       3-cycle per 1-hop
Traffic       Uniform random
(Two virtual channels for tori)

[Figure: X-Mesh (4x4x4)]

Throughput: Simulation results
[Figure: Grid-based XNoTs; curves for X-Torus, X-Mesh, 3D-Torus, 3D-Mesh]
[Figure: Tree-based XNoTs; curves for X-FT441, X-FT241, X-FT141, 3D-Torus, 3D-Mesh]
No degradation (X-Mesh = 3D-Mesh, X-Torus = 3D-Torus)

Network logic area
• Network area
– Routers & NIs
– Inter-tier vias
• Synthesis of NoC
– 64-core (16-core x 4)
– 0.18um CMOS
• Router architecture
– 1-flit = 32-bit
– Wormhole switching
– 4-stage pipeline
• Inter-tier vias [Li, ISCA’06][Burns, ISSCC’01]
– 1-10um square
– 25um^2 per layer per 1-bit signal

Inter-tier via area is calculated according to # of vertical links

[Figure: Typical wormhole router [Matsutani, IPDPS’07], showing input ports, buffers (Buf), arbiter, and crossbar]
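The via-area rule can be illustrated with a one-line calculation. This is a rough sketch, not the authors' area model: it reads the slide's via figure as 25 um^2 per layer per 1-bit signal, assumes each vertical link is one flit (32 bits) wide, and uses a hypothetical function name and a link count that the caller must supply.

```python
def inter_tier_via_area_um2(n_vertical_links, bits_per_link=32,
                            area_per_bit_um2=25.0, layers=1):
    """Estimate total via area as links x bits x per-bit area x layers.

    All defaults are assumptions taken from the slide's stated parameters
    (32-bit flit, 25 um^2 per layer per 1-bit signal).
    """
    return n_vertical_links * bits_per_link * area_per_bit_um2 * layers

# One 32-bit vertical link crossing a single layer.
single_link_area = inter_tier_via_area_um2(1)
```

The estimate scales linearly with the number of vertical links, which is why the link count dominates the via contribution to network area.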

Network logic area: Results
[Figure: Network logic area [mm^2]]
3D Mesh/Torus require 2 ports for vertical links (i.e., up & down)
XNoTs require only 1 port for vertical links (but # of crossbars increases)

Energy: NoC’s energy model
• Ave. flit energy ($E_{flit}$)
– Send 1-flit to dest.
– How much energy [J]?
• Parameters
– 6mm square chip
– 64-core (16-core x 4)
– 0.18um CMOS
• Switching energy ($E_{sw}$)
– 1-bit switching @ Router
– Gate-level sim
– 1.13 [pJ / hop]
• Link energy ($E_{link}$)
– 1-bit transfer @ Link
– 0.67 [pJ / mm]
• Via energy
– 4.34 [fF / via] [Davis, DToC’05]

$E_{flit} = w \cdot H_{ave} \cdot (E_{sw} + E_{link})$

where $w$ is the flit width in bits and $H_{ave}$ is the average hop count.
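The energy model can be evaluated directly once the per-hop quantities are fixed. The sketch below reads the slide's formula as E_flit = w * H_ave * (E_sw + E_link), with w the flit width in bits and H_ave the average hop count. The hop length and hop count in the usage lines are placeholders chosen for illustration, not values from the evaluation, and via energy is left out.

```python
def flit_energy_pj(flit_bits, avg_hops, e_sw_pj, e_link_pj):
    """E_flit = w * H_ave * (E_sw + E_link), energies in pJ per bit per hop."""
    return flit_bits * avg_hops * (e_sw_pj + e_link_pj)

E_SW_PJ = 1.13           # switching energy per bit per hop (from the slide)
LINK_PJ_PER_MM = 0.67    # link energy per bit per mm (from the slide)

hop_length_mm = 1.5      # placeholder: assumed distance between routers
avg_hops = 4.0           # placeholder: assumed average hop count

energy_pj = flit_energy_pj(32, avg_hops, E_SW_PJ,
                           LINK_PJ_PER_MM * hop_length_mm)
```

Since both energy terms are multiplied by the hop count, any topology that shortens the average path lowers the flit energy proportionally.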

Energy: Simulation results
[Figure: Ave. flit energy [pJ]]
Hop count is short in XNoTs, which leads to low power

Summary: 3D topologies - XNoTs
• Requirements
– Different circuits on each layer
– Different topologies on each layer
– How to connect/route them?
• XNoTs
– Tiers are connected by crossbars
– Arbitrary tiers can be stacked
• Current problem / future work
– We assumed full crossbar as a baseline
– More efficient implementation has been proposed by [Kim, ISCA’07]
– We must revise router architecture
[Figure: stacked tiers using Fat Tree, Ring, and 2D-Mesh]

Thank you for your attention

XNoTs: Path selection (QoS)
• Control packets
– In-order delivery is required
– Deterministic routing
– Control packets use tier-1
• Data packets
– In-order delivery is not required
– Large data streams
– Adaptive routing
– Data packets use tier-2 or tier-3
[Figure: XNoTs (side view); one tier uses dimension-order (deterministic) routing and two tiers use Duato’s Protocol (adaptive)]

Various QoS controls are possible by path selection algorithm
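The QoS policy maps a packet class to the tier it may use. The following is a minimal sketch, assuming the slide's tier numbering (tier-1 for control, tier-2 or tier-3 for data); the function name `select_tier_qos`, the string labels, and the random choice between the two data tiers are hypothetical.

```python
import random

def select_tier_qos(packet_kind, rng=random):
    """Choose a tier by packet class.

    Control packets need in-order delivery, so they stay on the tier with
    deterministic routing (tier-1). Data packets tolerate reordering and
    are spread over the adaptive tiers (tier-2 or tier-3).
    """
    if packet_kind == "control":
        return 1
    if packet_kind == "data":
        return rng.choice((2, 3))
    raise ValueError(f"unknown packet kind: {packet_kind!r}")
```

Keeping control traffic on a single deterministic tier preserves ordering, while data traffic can use the extra bandwidth of the other tiers.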

XNoTs: Path selection (bottom first)
• Heat dissipation is crucial in 3D ICs
• Bottom tier
– Close to the board (good heat dissipation property)
• Bottom tier first
– Tier-0 is used first if there are alternative paths
[Figure: XNoTs (side view); the 3D IC sits on the board, which acts as a heat sink next to the bottom tier]

Ideal throughput: Channel bisection
• Number of unidirectional links that cross the bisection
• N-core × n-tier, with $N = 2^i \times 2^i$ (values below are for $N = 16$)

Topology   Channel bisection            1-tier   2-tier   4-tier
X-Mesh     $\min(2^{i+1} n,\ N n)$      8        16       32
X-Torus    $\min(2^{i+2} n,\ N n)$      16       32       64
X-FT141    $\min(4 n,\ N n)$            4        8        16
X-FT241    $\min(2^{i+1} n,\ N n)$      8        16       32
X-FT441    $\min(4^{i} n,\ N n)$        16       32       64
3D-Mesh    $\min(2^{i+1} n,\ N n)$      8        16       32
3D-Torus   $\min(2^{i+2} n,\ N n)$      16       32       64

No degradation (X-Mesh = 3D-Mesh, X-Torus = 3D-Torus)
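The table values can be reproduced from closed-form expressions. The sketch below is a reading of the slide's formulas, which are garbled in this transcript: it assumes N = 2^i x 2^i cores per tier and bisection = min(first term, N * n), with the first term chosen per topology. The function name `channel_bisection` and the exact form of each expression are assumptions, checked here only against the N = 16 values listed above.

```python
def channel_bisection(topology, i, n):
    """Unidirectional links crossing the bisection for an N-core x n-tier
    network, where N = 2**i * 2**i (assumed reading of the slide)."""
    cores_per_tier = (2 ** i) ** 2
    first_term = {
        "X-Mesh": 2 ** (i + 1) * n,
        "X-Torus": 2 ** (i + 2) * n,
        "X-FT141": 4 * n,
        "X-FT241": 2 ** (i + 1) * n,
        "X-FT441": 4 ** i * n,
        "3D-Mesh": 2 ** (i + 1) * n,
        "3D-Torus": 2 ** (i + 2) * n,
    }[topology]
    return min(first_term, cores_per_tier * n)

# N = 16 (i = 2) for 1, 2, and 4 tiers.
x_mesh_row = [channel_bisection("X-Mesh", 2, n) for n in (1, 2, 4)]
```

With i = 2 the X-Mesh and 3D-Mesh rows coincide, as do X-Torus and 3D-Torus, which matches the "no degradation" remark.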

3D Topologies: 3D-Mesh

                     2D-Mesh (8x8=64)   3D-Mesh (4x4x4=64)
Average hop count    5.33               4.00
Channel bisection    16                 32
Number of routers    64                 64
Node degree          5                  7

[Figure: 3D-Mesh with Tier-0, Tier-1, Tier-2, Tier-3]
