Wireless NoC as Interconnection Backbone for Multicore Chips: Promises, Challenges,
and Recent Developments
Part IV: 3D WiNoC Architectures
Mar 24th, 2014 Hiroki Matsutani, "3D WiNoC Architectures", Tutorial at DATE'14 1
Hiroki Matsutani Keio University, Japan
Outline: 3D WiNoC Architectures
• 3D IC technologies: Wired vs. Wireless [5min]
• Prototype systems: Cube-0 & Cube-1 [15min]
• Wireless 3D NoC architectures [15min]
– Ring-based 3D WiNoC – Irregular 3D WiNoC
• Experiment results and Summary [10min]
So far we focused on the 2D WiNoC architecture and its physical link design. This part explores 3D WiNoC architectures, especially the inductive-coupling 3D option.
Design cost of LSI is increasing
• System-on-Chip (SoC)
– Required components are integrated on a single chip
– A different LSI must be developed for each application
• System-in-Package (SiP) or 3D IC
– Required components are stacked for each application
By changing the chips in a package, we can provide a wider range of chip families at modest design cost.
The next slides show techniques for stacking multiple chips.
3D IC technology for going vertical
[Figure: classification of 3D IC technologies by Wired vs. Wireless and by number of stacked chips; axis labels: Scalability, Flexibility]
• Wired, two chips (face-to-face): Microbump
• Wired, more than three chips: Through silicon via
• Wireless, two chips (face-to-face): Capacitive coupling
• Wireless, more than three chips: Inductive coupling
Inductive coupling link for 3D ICs
• Stacking after chip fabrication
– Only known-good dies are selected
– More than 3 chips
– Bonding wires for power supply
• Inductor for transceiver
– Implemented as a square coil with metal in common CMOS
• Footprint of inductor
– Not a serious problem; only metal layers are occupied
We have developed some prototype systems of wireless 3D ICs using inductive coupling.
Note: This part focuses on inter-chip wireless, not the intra-chip wireless introduced in Parts II and III.
An example: MuCCRA-Cube (2008)
• 4 MuCCRA chips are stacked on a PCB
• Technology: 90nm
• Chip thickness: 85um, Glue: 10um
[Figure: MuCCRA chip layout with 16 PEs, Data Memory, Inductive-Coupling Up Link, and Inductive-Coupling Down Link; dimension labels: 5.0mm and 2.5mm]
[Saito,FPL’09]
Stacking method: Staircase stacking
• Inductor has TX/RX/Idle modes (mode change: 1 cycle)
[Figure: chips stacked with a slide offset (slide & stack); labels: Inductor (TX), Inductor (RX), Bonding wire, Pillar]
• Inductive-coupling link
– Local clock @ 4GHz
– Serial data (TxData, TxClk)
– Local clock shared by neighboring chips; no global synchronization
• System clock for NoC: 200MHz
• 35-bit transfer for each clock
We have fabricated some prototype multi-core systems using this wireless technology.
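The clock figures on the staircase-stacking slide imply some simple serialization arithmetic. The sketch below is only an illustration: it assumes that "each clock" means the 200MHz system clock, and the helper name `link_budget` is hypothetical.

```python
def link_budget(local_clock_hz, system_clock_hz, transfer_bits):
    """Return (local cycles per system cycle, payload bit rate in bit/s).

    Assumption: one transfer of `transfer_bits` bits completes in every
    system clock cycle. The slides do not state this explicitly.
    """
    cycles_per_system_clock = local_clock_hz / system_clock_hz
    payload_bit_rate = transfer_bits * system_clock_hz
    return cycles_per_system_clock, payload_bit_rate


cycles, bit_rate = link_budget(4e9, 200e6, 35)
print(cycles)          # local clock cycles available per system clock cycle
print(bit_rate / 1e9)  # payload in Gbit/s under the assumption above
```

Under this assumption the payload rate is higher than one bit per 4GHz local clock cycle. The slides do not say how the bits are distributed over the link, so the sketch leaves that open.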
An example: Cube-0 (2010)
• Test chip for vertical communication schemes
– Vertical point-to-point link between adjacent chips
– Vertical shared bus (broadcast)
• Each chip has
– 2 cores (packet counter)
– 2 routers
– Inductors (P2P ring)
– Inductors (vertical bus)
• Chip size: 2.1mm x 2.1mm
• Process: Fujitsu 65nm (CS202SZ), Voltage: 1.2V, System clock: 200MHz
[Figure: chip layout showing Core 0 & 1, Router 0 & 1, Inductors (P2P), and Inductors (bus)]
[Matsutani, NOCS’11]
[Figure: Stacking for Ring network (slide & stack); inductors labeled TX and RX]
[Figure: Stacking for Vertical bus; inductors labeled TX/RX]
An example: Cube-1 (2012)
• Test chips for building-block 3D systems
– Two chip types: Host CPU chip & Accelerator chip
– We can customize the number & types of chips in a SiP
• Cube-1 Host CPU chip
– Two 3D wireless routers
– MIPS-like CPU
• Cube-1 Accelerator chip
– Two 3D wireless routers
– Processing element array
[Figure: chip layouts showing the MIPS CPU core, the 8x8 PE array, and the inductors]
[Miura, IEEE Micro 13]
• Microphotographs of test chips [Miura, IEEE Micro 13]
[Figure: Host CPU chip, Accelerator chip, and a stack of Host CPU + 3 Accelerators]
• Block diagram of CPU & Accelerator chips
• Inductive-coupling ThruChip Interface (TCI)
Note: Please refer to Part III for antenna design for on-chip wireless.
• Cube-1 demo system [Miura, HotChips’13 Demo]
– PE array chip performs image processing
– CPU chip for control
[Figure: Cube-1 mounted on a motherboard]
Big picture: Wireless 3D NoC
• Arbitrary chips are stacked after fabrication
– Each chip has vertical links at pre-specified locations, but we do not know the internal topology of each chip
– A wireless 3D NoC is required to stack unknown topologies
• Required chips are stacked for each application
[Figure: an example (4 chips); labels: CPU chip from C, Memory chip from A, GPU chip from B]
Note: We can add long-range links to induce small-world effects [See Part I]
Two approaches: Wireless 3D NoC arch
Chips should be added, removed, and swapped for each application.
• Ring-based approach [Matsutani, NOCS’11]
– Easy to add & remove (Good)
– Inefficient hop count (Bad)
– No scalability (Bad)
• Irregular approach [Matsutani, ASPDAC’13]
– We can use any links
– Irregular routing needed
– Plug-and-play protocol
[Figure: two stacks of Chip 0 to Chip 4, one for each approach]
Ring-based 3D wireless NoC
• Chips are connected via unidirectional rings
[Figure: stacked chips forming rings through TX and RX inductors; labels: Pillar, Router to horizontal NoC]
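Because the rings are unidirectional, the hop count between two chips depends on the direction of travel. The following is a minimal sketch, assuming chips are numbered in ring order and packets travel from chip i to chip i+1; the function names are hypothetical.

```python
def ring_hops(src, dst, num_chips):
    """Hops from chip src to chip dst when packets travel only one way around the ring."""
    return (dst - src) % num_chips


def average_ring_hops(num_chips):
    """Average hop count over all ordered pairs of distinct chips."""
    total = sum(ring_hops(s, d, num_chips)
                for s in range(num_chips)
                for d in range(num_chips) if s != d)
    return total / (num_chips * (num_chips - 1))


print(ring_hops(1, 0, 4))    # 3: reaching the previous chip takes almost a full lap
print(average_ring_hops(4))  # 2.0
print(average_ring_hops(8))  # 4.0
```

In this model the average grows as half the number of chips, which is one way to read the "inefficient hop count" and "no scalability" remarks about the ring-based approach.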
Ring approach: Deadlock problem
• Ring inherently includes a cyclic dependency
[Figure: ring of router buffers through the TX/RX inductors, with each buffer marked "Cannot move"]
• No packet can advance
• Deadlock avoidance is needed
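The cyclic dependency can be made concrete with a small dependency graph in which each ring buffer waits for the next one. This is a generic cycle check written for illustration, not code from the tutorial; the graph encoding and the name `has_cycle` are assumptions.

```python
def has_cycle(deps):
    """Detect a cycle in a dependency graph given as {node: [nodes it waits for]}."""
    WHITE, GREY, BLACK = 0, 1, 2          # unvisited, on the current path, finished
    color = {node: WHITE for node in deps}

    def visit(node):
        color[node] = GREY
        for nxt in deps.get(node, []):
            state = color.get(nxt, WHITE)
            if state == GREY:
                return True               # we came back to a buffer on the current path
            if state == WHITE and visit(nxt):
                return True
        color[node] = BLACK
        return False

    return any(color[n] == WHITE and visit(n) for n in deps)


# Four routers on a unidirectional ring: buffer i waits for buffer (i + 1) % 4.
ring = {i: [(i + 1) % 4] for i in range(4)}
# The same routers in an open chain: the last buffer waits for nothing.
chain = {0: [1], 1: [2], 2: [3], 3: []}

print(has_cycle(ring))   # True
print(has_cycle(chain))  # False
```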
Ring approach: Deadlock avoidance
• Adding extra VCs
– Conventional way
– Duplicating buffers
– 2 VCs for each message class
• Bubble flow control [Puente,ICPP’99]
– Buffer space of a single packet must always be reserved in each router
– All message classes share the same buffers
[Figure: with a single buffer per router, a cyclic dependency is formed]
[Figure: with two VCs (VC0 and VC1), the cyclic dependency is cut at the dateline]
2 VCs are required for each message class, and a multi-core uses multiple message classes.
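A minimal sketch of the dateline idea, assuming a single message class, a unidirectional ring, and packets that start on VC0 and move to VC1 when they cross the dateline link. The function names and the graph encoding are hypothetical.

```python
def dateline_dependencies(num_routers, dateline_src=None):
    """Buffer dependency graph of a unidirectional ring.

    Nodes are (router, vc). Without a dateline only VC0 exists. With a
    dateline on the link dateline_src -> next router, a packet switches
    from VC0 to VC1 on that link and never crosses it a second time.
    """
    vcs = [0] if dateline_src is None else [0, 1]
    deps = {(r, vc): [] for r in range(num_routers) for vc in vcs}
    for r in range(num_routers):
        nxt = (r + 1) % num_routers
        if dateline_src is None:
            deps[(r, 0)].append((nxt, 0))
        elif r == dateline_src:
            deps[(r, 0)].append((nxt, 1))   # crossing the dateline: VC0 -> VC1
        else:
            deps[(r, 0)].append((nxt, 0))
            deps[(r, 1)].append((nxt, 1))
    return deps


def is_acyclic(deps):
    """Kahn's algorithm: the graph is acyclic if every node can be removed."""
    indegree = {n: 0 for n in deps}
    for targets in deps.values():
        for t in targets:
            indegree[t] += 1
    ready = [n for n, d in indegree.items() if d == 0]
    removed = 0
    while ready:
        n = ready.pop()
        removed += 1
        for t in deps[n]:
            indegree[t] -= 1
            if indegree[t] == 0:
                ready.append(t)
    return removed == len(deps)


print(is_acyclic(dateline_dependencies(4)))                  # False
print(is_acyclic(dateline_dependencies(4, dateline_src=3)))  # True
```

The second graph has twice as many buffers as the first, which reflects the cost noted on the slide: the buffers are duplicated for every message class.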
[Figure: without a reserved space, deadlock occurs because all buffers are occupied]
[Figure: with bubble flow control, the empty space of one packet is kept; deadlock does not occur unless all buffers are occupied]
We employ bubble flow control for CMPs with multiple message classes.
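The effect of the reserved packet space can be illustrated with a toy model. The sketch assumes one common formulation of the bubble rule (a new packet enters a router's ring buffer only if one packet slot stays empty afterwards); the function names, the slot counts, and the greedy injection loop are hypothetical simplifications.

```python
def fill_ring(num_routers, slots_per_router, use_bubble):
    """Inject packets greedily; return the free packet slots left in each router."""
    free = [slots_per_router] * num_routers
    needed = 2 if use_bubble else 1      # bubble rule: one slot must stay empty
    for r in range(num_routers):
        while free[r] >= needed:
            free[r] -= 1                 # one more packet injected at router r
    return free


def some_packet_can_advance(free, slots_per_router):
    """A packet held in router r can move if the next router has a free slot."""
    n = len(free)
    return any(free[r] < slots_per_router and free[(r + 1) % n] >= 1
               for r in range(n))


full = fill_ring(4, 2, use_bubble=False)
bubbled = fill_ring(4, 2, use_bubble=True)
print(full, some_packet_can_advance(full, 2))        # [0, 0, 0, 0] False
print(bubbled, some_packet_can_advance(bubbled, 2))  # [1, 1, 1, 1] True
```

Unlike the dateline scheme, this model needs no second set of buffers, which matches the slide's point that all message classes can share the same buffers.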
Two approaches: Wireless 3D NoC arch
• Ring-based approach [Matsutani, NOCS’11]: easy to add & remove (Good), inefficient hop count (Bad), no scalability (Bad)
• Irregular approach [Matsutani, ASPDAC’13]: we can use any links (Good), irregular routing needed, plug-and-play protocol
Irregular approach: Ad-hoc topology
• Wireless 3D CMPs
– Various chips are stacked, depending on the application
• Each chip
– Must have vertical links
– May not have horizontal links
– May have VCs for horizontal links
• Ad-hoc wireless 3D NoC
– We cannot predict the network topology, the number of VCs, or the bandwidth before stacking
[Figure: stack of Chip 0 to Chip 7; some chips have no horizontal link; in the extreme case only the bottom chip has horizontal links]
We need a mechanism to route packets even in such cases.
Irregular approach: Up*/down* routing
• Up*/down* (UD) routing [Schroeder, JSAC’91]
– Irregular network routing
– A root node is selected
– Packets go up and then go down
• An example: 4x4 2D mesh (nodes 0 to 15)
– A root node is selected (node 0 in the figure)
– Direction (up or down) is determined
– Routing path is generated
– Down-up turn is prohibited
– It generates imbalanced paths
[Figure: 4x4 mesh with node 0 as the root, the up direction of each link, one allowed path (OK), and one prohibited path (NG)]
Note: Please refer to Part II for routing strategy for irregular WiNoCs.
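The up*/down* steps on the 4x4 mesh can be sketched as follows. This is an illustration under stated assumptions: node IDs are row-major, the up end of a link is the node closer to the root (ties broken by the smaller node ID), and all function names are hypothetical.

```python
from collections import deque


def mesh_links(width, height):
    """Adjacency list of a width x height 2D mesh with row-major node IDs."""
    adj = {n: [] for n in range(width * height)}
    for n in range(width * height):
        x, y = n % width, n // width
        if x + 1 < width:
            adj[n].append(n + 1)
            adj[n + 1].append(n)
        if y + 1 < height:
            adj[n].append(n + width)
            adj[n + width].append(n)
    return adj


def bfs_levels(adj, root):
    """Distance of every reachable node from the root (spanning-tree depth)."""
    level = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in level:
                level[v] = level[u] + 1
                queue.append(v)
    return level


def is_up(level, u, v):
    """Moving u -> v is 'up' if v is closer to the root (ties broken by node ID)."""
    return (level[v], v) < (level[u], u)


def is_legal(level, path):
    """A path is legal if it never makes an up move after a down move."""
    gone_down = False
    for u, v in zip(path, path[1:]):
        if is_up(level, u, v):
            if gone_down:
                return False      # down-up turn is prohibited
        else:
            gone_down = True
    return True


adj = mesh_links(4, 4)
level = bfs_levels(adj, root=0)
print(is_legal(level, [5, 1, 0, 4]))  # True: up, up, then down
print(is_legal(level, [1, 5, 4]))     # False: down (1 -> 5) followed by up (5 -> 4)
```

Since every pair of nodes can always fall back to a path through the root, links near the root tend to carry more traffic, which is consistent with the "imbalanced paths" remark.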
• Another example: 3D NoC with 4 chips
[Figure: four stacked chips (Chip 0 to Chip 3) with routers 0 to 7, a root node, one allowed path (OK), and one prohibited path (NG)]
The best spanning-tree root is selected by exhaustive or heuristic search using communication traces (9 sec for 64 tiles).
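An exhaustive root search can be sketched as below. It is not the authors' tool: the `traffic` dictionary is a hypothetical stand-in for communication traces, the 6-router ring is an invented example rather than the topology in the figure, and the up direction uses the same assumption as before (closer to the root, ties broken by node ID).

```python
from collections import deque


def bfs_levels(adj, root):
    """Distance of every reachable node from the root."""
    level = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in level:
                level[v] = level[u] + 1
                queue.append(v)
    return level


def ud_distance(adj, level, src, dst):
    """Shortest hop count from src to dst that obeys the up*/down* rule."""
    start = (src, False)              # (node, has the packet already gone down?)
    dist = {start: 0}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        u, gone_down = state
        if u == dst:
            return dist[state]
        for v in adj[u]:
            up = (level[v], v) < (level[u], u)
            if up and gone_down:
                continue              # down-up turn is prohibited
            nxt = (v, gone_down or not up)
            if nxt not in dist:
                dist[nxt] = dist[state] + 1
                queue.append(nxt)
    return None


def best_root(adj, traffic):
    """Try every node as root; return (root, cost) with the lowest weighted hop count."""
    best = None
    for root in adj:
        level = bfs_levels(adj, root)
        cost = sum(count * ud_distance(adj, level, s, d)
                   for (s, d), count in traffic.items())
        if best is None or cost < best[1]:
            best = (root, cost)
    return best


ring6 = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
traffic = {(2, 4): 10}                # 10 packets from router 2 to router 4
cost_root0 = 10 * ud_distance(ring6, bfs_levels(ring6, 0), 2, 4)
root, cost = best_root(ring6, traffic)
print(cost_root0, cost)               # 40 20
```

With root 0 the two-hop path 2 -> 3 -> 4 contains a down-up turn, so the packets take the four-hop detour through the root; other roots make the short path legal. A heuristic search would replace the loop over all roots.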
Irregular approach: UD with VCs
• UD routing with multiple VCs [Koibuchi,ICPP’03] [Lysne,TPDS’06]
– Each layer (VC) has its own spanning tree
– Packets can transit multiple layers in descending order
[Figure: the same 4-chip network (Chip 0 to Chip 3, routers 0 to 7) drawn once per VC layer (VC1 and VC0), with roots labeled Root and Root’; paths in both layers are marked OK; "You can use either VC0 or VC1"]
How do we recognize the topology and build multiple spanning trees?
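The layered variant can be sketched by extending the search state with the current VC layer. This is an illustrative model, not the algorithm of the cited papers: it assumes a packet may start in any layer and may drop to a lower-numbered layer at any router, which restarts the up*/down* rule with that layer's tree. All names and the 6-router ring are hypothetical.

```python
from collections import deque


def bfs_levels(adj, root):
    """Distance of every reachable node from the root."""
    level = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in level:
                level[v] = level[u] + 1
                queue.append(v)
    return level


def layered_ud_distance(adj, roots, src, dst):
    """Shortest legal hop count when VC layer i uses the tree rooted at roots[i]."""
    levels = [bfs_levels(adj, root) for root in roots]
    dist = {}
    queue = deque()
    for layer in range(len(roots)):           # the packet may start in any layer
        state = (src, layer, False)
        dist[state] = 0
        queue.append(state)
    while queue:
        state = queue.popleft()
        u, layer, gone_down = state
        if u == dst:
            return dist[state]
        for v in adj[u]:
            for new_layer in range(layer, -1, -1):   # same layer or a lower one
                level = levels[new_layer]
                up = (level[v], v) < (level[u], u)
                was_down = gone_down if new_layer == layer else False
                if up and was_down:
                    continue                  # down-up turn inside one layer
                nxt = (v, new_layer, was_down or not up)
                if nxt not in dist:
                    dist[nxt] = dist[state] + 1
                    queue.append(nxt)
    return None


ring6 = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
print(layered_ud_distance(ring6, [0], 2, 4))     # 4: one tree rooted at router 0
print(layered_ud_distance(ring6, [0, 3], 2, 4))  # 2: a second layer rooted at router 3
```

Allowing only downward layer transitions keeps the layers ordered, so a dependency can never return to a higher layer.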
Full-system CMP simulations
Application performance of the two approaches (ring-based and irregular) is evaluated.
Network topology: Irregular
• The following iteration is performed 1,000 times
– Each tile has a router and a core (e.g., processor or caches)
– Each horizontal link appears with 50% probability
• We examined three cases: 16, 32, and 64 tiles
– 16-tile (2,2,4): 2x2 mesh * 4 chips
– 32-tile (4,2,4): 4x2 mesh * 4 chips
– 64-tile (4,4,4): 4x4 mesh * 4 chips
Among the 1,000 random topologies, the one with the most typical hop count value is selected for the full-system evaluation.
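The topology sampling can be sketched as follows. Several details are assumptions made for illustration: every tile has vertical links to the tiles directly above and below, disconnected samples are skipped, hop counts are plain shortest paths rather than routed paths, and "most typical" is read as closest to the median. The function names are hypothetical.

```python
import random
from collections import deque
from statistics import median


def random_topology(nx, ny, nz, rng, p_horizontal=0.5):
    """nz stacked chips, each an nx x ny grid of tiles with random horizontal links."""
    def tid(x, y, z):
        return (z * ny + y) * nx + x

    adj = {t: set() for t in range(nx * ny * nz)}

    def link(a, b):
        adj[a].add(b)
        adj[b].add(a)

    for z in range(nz):
        for y in range(ny):
            for x in range(nx):
                if z + 1 < nz:                    # vertical link (assumed present)
                    link(tid(x, y, z), tid(x, y, z + 1))
                if x + 1 < nx and rng.random() < p_horizontal:
                    link(tid(x, y, z), tid(x + 1, y, z))
                if y + 1 < ny and rng.random() < p_horizontal:
                    link(tid(x, y, z), tid(x, y + 1, z))
    return adj


def average_hops(adj):
    """Average shortest-path hop count over all ordered pairs; None if disconnected."""
    total = 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        if len(dist) != len(adj):
            return None
        total += sum(dist.values())
    n = len(adj)
    return total / (n * (n - 1))


def most_typical(nx, ny, nz, trials=1000, seed=0):
    """Sample topologies and return (hops, topology) closest to the median hop count."""
    rng = random.Random(seed)
    samples = []
    for _ in range(trials):
        adj = random_topology(nx, ny, nz, rng)
        hops = average_hops(adj)
        if hops is not None:                      # skip disconnected stacks (assumption)
            samples.append((hops, adj))
    target = median(h for h, _ in samples)
    return min(samples, key=lambda s: abs(s[0] - target))


hops, topology = most_typical(2, 2, 4, trials=200)
print(len(topology), round(hops, 3))
```

Calling `most_typical(4, 4, 4)` would correspond to the 64-tile case with the 1,000 iterations mentioned on the slide.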
![Page 47: Part IV: 3D WiNoC Architectures - Keio University · 2014-09-07 · • Adding extra VCs – Conventional way – Duplicating buffers – 2 VCs for each message class • Bubble flow](https://reader033.vdocument.in/reader033/viewer/2022060512/5f2a0e800f64023e77105369/html5/thumbnails/47.jpg)
Parallel programs are running on it
GEMS/Simics is used for full-system simulations
• Ring-based approach [Matsutani, NOCS’11]
  – Easy to add & remove (Good)
  – Inefficient hop count (Bad)
  – No scalability (Bad)
• Irregular approach [Matsutani, ASPDAC’13]
  – We can use any links (Good)
  – Irregular routing needed
  – Plug-and-play protocol
[Figures: two stacks of Chip 0 to Chip 4, one for each approach]
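The slide says only that irregular routing is needed and does not spell out the algorithm. One well-known deadlock-free scheme for irregular topologies is up*/down* routing over a spanning tree. The sketch below uses it purely as an illustration and does not claim it is the exact method of the cited work. The function names and the tie-break by node id are assumptions.

```python
from collections import deque


def bfs_levels(adj, root):
    """Distance of every node from the spanning-tree root."""
    level = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in level:
                level[v] = level[u] + 1
                queue.append(v)
    return level


def is_up(level, u, v):
    """A hop u -> v is 'up' if v is closer to the root (ties broken by node id)."""
    return (level[v], v) < (level[u], u)


def updown_hops(adj, root, src, dst):
    """Shortest legal path length: zero or more up hops, then zero or more down hops."""
    level = bfs_levels(adj, root)
    start = (src, False)  # state = (node, already_went_down)
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u, went_down = queue.popleft()
        if u == dst:
            return dist[(u, went_down)]
        for v in adj[u]:
            up = is_up(level, u, v)
            if went_down and up:
                continue  # a down -> up turn is forbidden
            state = (v, went_down or not up)
            if state not in dist:
                dist[state] = dist[(u, went_down)] + 1
                queue.append(state)
    return None


# Usage: a 5-node ring rooted at node 0.
ring = {i: {(i - 1) % 5, (i + 1) % 5} for i in range(5)}
print(updown_hops(ring, root=0, src=2, dst=4))
```

In this ring the unrestricted shortest path from node 2 to node 4 takes two hops, but it needs a down-to-up turn, so the legal route detours through the root. This is the kind of hop-count penalty that a careful choice of spanning tree tries to reduce.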
![Page 48: Part IV: 3D WiNoC Architectures - Keio University · 2014-09-07 · • Adding extra VCs – Conventional way – Duplicating buffers – 2 VCs for each message class • Bubble flow](https://reader033.vdocument.in/reader033/viewer/2022060512/5f2a0e800f64023e77105369/html5/thumbnails/48.jpg)
Parallel programs are running on it
GEMS/Simics is used for full-system simulations

Table 1: Topologies to be examined

| Topology | Routers | CPUs | L2$ banks | MCs |
|---|---|---|---|---|
| 16-tile | 16 | 4 | 32 | 4 |
| 32-tile | 32 | 8 | 64 | 8 |
| 64-tile | 64 | 8 | 128 | 16 |

Table 2: Simulation parameters

| Parameter | Value |
|---|---|
| L1$ size & latency | 64K / 1 cycle |
| L2$ size & latency | 256K / 6 cycles |
| Memory size & latency | 4G / 160 cycles |
| Router latency | [RC/VSA] [ST] [LT] |
| Router buffer size | 5 flits per VC |
| Protocol | MOESI directory |

Table 3: Application programs

Solaris 9 is running on 8-core UltraSPARC
NPB (IS, DC, CG, MG, EP, LU, UA, SP, BT, FT)
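To give a feel for how the router latency entry in Table 2 affects packet latency, here is a rough zero-load estimate. It assumes that each of the three listed stages ([RC/VSA], [ST], [LT]) takes one cycle, which the slide does not state explicitly, and that a link carries one flit per cycle. The function name and parameters are hypothetical.

```python
def zero_load_latency(hops, packet_flits, cycles_per_hop=3):
    """Cycles for a whole packet to arrive when the network is otherwise idle."""
    # Head flit pays the assumed 3-cycle pipeline ([RC/VSA], [ST], [LT]) at every hop.
    head_latency = hops * cycles_per_hop
    # Remaining flits follow back to back, one per cycle.
    serialization = packet_flits - 1
    return head_latency + serialization


# Usage: a 5-flit packet crossing 4 hops.
print(zero_load_latency(hops=4, packet_flits=5))
```

Under these assumptions every extra hop costs three cycles, which is why the average hop count of a topology matters for application execution time.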
![Page 49: Part IV: 3D WiNoC Architectures - Keio University · 2014-09-07 · • Adding extra VCs – Conventional way – Duplicating buffers – 2 VCs for each message class • Bubble flow](https://reader033.vdocument.in/reader033/viewer/2022060512/5f2a0e800f64023e77105369/html5/thumbnails/49.jpg)
Application exec time: 16-tile
• Ring-based approach (VC flow & Bubble flow controls)
• Irregular approach
• The Irregular approach outperforms the Ring-based one by 10.8% in the 16-tile case.
![Page 50: Part IV: 3D WiNoC Architectures - Keio University · 2014-09-07 · • Adding extra VCs – Conventional way – Duplicating buffers – 2 VCs for each message class • Bubble flow](https://reader033.vdocument.in/reader033/viewer/2022060512/5f2a0e800f64023e77105369/html5/thumbnails/50.jpg)
Application exec time: 64-tile
• Ring-based approach (VC flow & Bubble flow controls)
• Irregular approach
• The Irregular approach outperforms the Ring-based one by 46.0% in the 64-tile case.
Ring has no scalability; the Irregular one improves significantly
![Page 51: Part IV: 3D WiNoC Architectures - Keio University · 2014-09-07 · • Adding extra VCs – Conventional way – Duplicating buffers – 2 VCs for each message class • Bubble flow](https://reader033.vdocument.in/reader033/viewer/2022060512/5f2a0e800f64023e77105369/html5/thumbnails/51.jpg)
Application exec time: 16-tile
• Irregular (50% of horizontal links are implemented)
• 3D mesh (all horizontal links are implemented)
• Performance of the Irregular approach Irr3(min) is close to that of the 3D mesh
![Page 52: Part IV: 3D WiNoC Architectures - Keio University · 2014-09-07 · • Adding extra VCs – Conventional way – Duplicating buffers – 2 VCs for each message class • Bubble flow](https://reader033.vdocument.in/reader033/viewer/2022060512/5f2a0e800f64023e77105369/html5/thumbnails/52.jpg)
Application exec time: 64-tile
• Irregular (50% of horizontal links are implemented)
• 3D mesh (all horizontal links are implemented)
• Performance of the Irregular approach Irr3(min) is close to that of the 3D mesh
Optimized Irr3(min) improves by 15.1% compared to the worst
![Page 53: Part IV: 3D WiNoC Architectures - Keio University · 2014-09-07 · • Adding extra VCs – Conventional way – Duplicating buffers – 2 VCs for each message class • Bubble flow](https://reader033.vdocument.in/reader033/viewer/2022060512/5f2a0e800f64023e77105369/html5/thumbnails/53.jpg)
Experiment results: Cube-1 (2012)
Cube-1 demo system [Miura, HotChips’13 Demo]
• PE array chip performs image processing
• CPU chip for control
[Figure labels: Cube-1, Motherboard]
![Page 54: Part IV: 3D WiNoC Architectures - Keio University · 2014-09-07 · • Adding extra VCs – Conventional way – Duplicating buffers – 2 VCs for each message class • Bubble flow](https://reader033.vdocument.in/reader033/viewer/2022060512/5f2a0e800f64023e77105369/html5/thumbnails/54.jpg)
Experiment results: Cube-1 (2012)
![Page 55: Part IV: 3D WiNoC Architectures - Keio University · 2014-09-07 · • Adding extra VCs – Conventional way – Duplicating buffers – 2 VCs for each message class • Bubble flow](https://reader033.vdocument.in/reader033/viewer/2022060512/5f2a0e800f64023e77105369/html5/thumbnails/55.jpg)
Experiment results: Cube-1 (2012) [Miura, IEEE Micro 13]
Packet error rate: Error-free operation at nominal supply voltage
Power consumption: 5.8mW per 2Gbps channel (at 0.92V)
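The power figure above can be turned into energy per bit with simple arithmetic. The helper below is only a unit-conversion sketch and its name is hypothetical.

```python
def energy_per_bit_pj(power_mw, rate_gbps):
    """Energy per transferred bit in picojoules.

    mW / Gbps = (1e-3 J/s) / (1e9 bit/s) = 1e-12 J/bit = 1 pJ/bit,
    so the ratio of the two slide values is already in pJ/bit.
    """
    return power_mw / rate_gbps


# Usage with the values reported for Cube-1 (5.8mW per 2Gbps channel).
print(f"{energy_per_bit_pj(5.8, 2.0):.1f} pJ/bit")
```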
![Page 56: Part IV: 3D WiNoC Architectures - Keio University · 2014-09-07 · • Adding extra VCs – Conventional way – Duplicating buffers – 2 VCs for each message class • Bubble flow](https://reader033.vdocument.in/reader033/viewer/2022060512/5f2a0e800f64023e77105369/html5/thumbnails/56.jpg)
Summary: 3D WiNoC Architectures
• Inductive-coupling 3D SiP
  – A low-cost alternative to build low-volume custom systems by stacking off-the-shelf known-good-dies
  – No special process technology is required; inductors are implemented with metal layers
• Cube-1: A practical 3D WiNoC system
  – Two types: Host CPU chip & Accelerator chips
  – We can customize number & types of chips in SiP
![Page 57: Part IV: 3D WiNoC Architectures - Keio University · 2014-09-07 · • Adding extra VCs – Conventional way – Duplicating buffers – 2 VCs for each message class • Bubble flow](https://reader033.vdocument.in/reader033/viewer/2022060512/5f2a0e800f64023e77105369/html5/thumbnails/57.jpg)
Future plans: 3D WiNoC Architectures
Scale: Steady to Challenging
Cube-2 design:
• STM 28nm process
• CPU & Accelerator chips
• Memcached accelerator chip for smart sensors
All-in-one TX/RX macro:
• Coil + Routers/buffers
• Coil uses only metal layers; silicon available
Power/heat management:
• Dynamic on/off control of inductors
• Closed-loop control [Elfadel, DATE’13] (See Part I)
Combine 2D & 3D WiNoC:
• mm-wave wireless for intra-chip (See Parts II and III)
![Page 58: Part IV: 3D WiNoC Architectures - Keio University · 2014-09-07 · • Adding extra VCs – Conventional way – Duplicating buffers – 2 VCs for each message class • Bubble flow](https://reader033.vdocument.in/reader033/viewer/2022060512/5f2a0e800f64023e77105369/html5/thumbnails/58.jpg)
Future plans: 3D WiNoC Architectures
Scale: Steady to Challenging
Jumbo inductors:
• Jumbo inductor like a power ring
• <1mm communication
• Relax SiP assembly
Wireless broadcast bus:
• Combine wireless vertical bus & P2P links
• Static/dynamic TDMA vs. CSMA/CD
Cartridge style computer:
• Insert necessary chips
• Power/clk from cartridge
• Inter-chip data transfers use wireless
• Power from wireless ??
Exploiting small-world:
• Add a random NoC chip to 3D WiNoC SiP to shorten path length (See Part I)
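As an illustration of the static TDMA option listed under the wireless broadcast bus, the sketch below gives each chip a fixed slot in round-robin order. It is a toy model under stated assumptions, not a design from the slides: one packet per slot, a slot stays idle when its owner has nothing to send, and the function name is hypothetical.

```python
def run_static_tdma(pending, num_slots):
    """Simulate a shared vertical bus with fixed round-robin slot ownership.

    pending: packets queued at each chip (index = chip id).
    Returns, per slot, the chip that transmitted or None if the slot was idle.
    """
    pending = list(pending)  # do not modify the caller's list
    log = []
    for slot in range(num_slots):
        owner = slot % len(pending)  # static assignment: slot i belongs to chip i mod N
        if pending[owner] > 0:
            pending[owner] -= 1
            log.append(owner)
        else:
            log.append(None)  # the slot is wasted even if other chips have traffic
    return log


# Usage: four chips, only chips 0 and 2 have traffic.
print(run_static_tdma([2, 0, 1, 0], num_slots=8))
```

The idle slots in such a run show why dynamic TDMA or a contention-based scheme such as CSMA/CD may be considered as alternatives when traffic is uneven.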
![Page 59: Part IV: 3D WiNoC Architectures - Keio University · 2014-09-07 · • Adding extra VCs – Conventional way – Duplicating buffers – 2 VCs for each message class • Bubble flow](https://reader033.vdocument.in/reader033/viewer/2022060512/5f2a0e800f64023e77105369/html5/thumbnails/59.jpg)
References (1/2)
• Cube-0: The first real 3D WiNoC
  – H. Matsutani, et al., "A Vertical Bubble Flow Network using Inductive-Coupling for 3-D CMPs", NOCS 2011.
  – Y. Take, et al., "3D NoC with Inductive-Coupling Links for Building-Block SiPs", IEEE Trans. on Computers (2014).
• Cube-1: The heterogeneous 3D WiNoC
  – N. Miura, et al., "A Scalable 3D Heterogeneous Multicore with an Inductive ThruChip Interface", IEEE Micro (2013).
• MuCCRA-Cube: Dynamically reconfigurable processor
  – S. Saito, et al., "MuCCRA-Cube: a 3D Dynamically Reconfigurable Processor with Inductive-Coupling Link", FPL 2009.
![Page 60: Part IV: 3D WiNoC Architectures - Keio University · 2014-09-07 · • Adding extra VCs – Conventional way – Duplicating buffers – 2 VCs for each message class • Bubble flow](https://reader033.vdocument.in/reader033/viewer/2022060512/5f2a0e800f64023e77105369/html5/thumbnails/60.jpg)
References (2/2)
• Vertical bubble flow control on Cube-0
  – H. Matsutani, et al., "A Vertical Bubble Flow Network using Inductive-Coupling for 3-D CMPs", NOCS 2011.
  – Y. Take, et al., "3D NoC with Inductive-Coupling Links for Building-Block SiPs", IEEE Trans. on Computers (2014).
• Spanning trees optimization for 3D WiNoCs
  – H. Matsutani, et al., "A Case for Wireless 3D NoCs for CMPs", ASP-DAC 2013 (Best Paper Award).