![Page 1: 3D WiNoC Architecturesmatutani/papers/matsuta... · 2014. 9. 19. · – Each tile has router and core (e.g., processor or caches) – Each horizontal link appears with 50% (H50)](https://reader036.vdocument.in/reader036/viewer/2022071409/610381bf6324191efb7febac/html5/thumbnails/1.jpg)
Interconnect Enhances Architecture: Evolution of Wireless NoC from Planar to 3D
3D WiNoC Architectures
Sep 18th, 2014 Hiroki Matsutani, "3D WiNoC Architectures", Special Session at NOCS'14 1
Hiroki Matsutani Keio University, Japan
![Page 2: 3D WiNoC Architecturesmatutani/papers/matsuta... · 2014. 9. 19. · – Each tile has router and core (e.g., processor or caches) – Each horizontal link appears with 50% (H50)](https://reader036.vdocument.in/reader036/viewer/2022071409/610381bf6324191efb7febac/html5/thumbnails/2.jpg)
Outline: Wireless 3D NoC architectures
• Wireless 3D NoCs (3D WiNoCs) [8min]
– Wireless 3D IC technology – 3D WiNoC design example (65nm)
• Routing & topology exploration [8min]
– Spanning tree optimization for irregular WiNoCs – Power management via vertical ON/OFF links
• Adding random NoC chip for 3D WiNoCs [5min]
– Adding randomness induces small-world effects – Adding random NoC chip to NoC-less 3D ICs
• Summary [1min] 2 Hiroki Matsutani, "3D WiNoC Architectures", Special Session at NOCS'14
[Matsutani,ASPDAC’13]
[Matsutani,DATE’14]
[Miura,IEEE Micro 13] [Matsutani,NOCS’11]
![Page 3: 3D WiNoC Architecturesmatutani/papers/matsuta... · 2014. 9. 19. · – Each tile has router and core (e.g., processor or caches) – Each horizontal link appears with 50% (H50)](https://reader036.vdocument.in/reader036/viewer/2022071409/610381bf6324191efb7febac/html5/thumbnails/3.jpg)
Design cost of LSI is increasing
3
By changing the chips in a package, we can provide a wider range of chip family with modest design cost
• System-on-Chip (SoC) – Required components are integrated on a single chip – Different LSI must be developed for each application
• System-in-Package (SiP) or 3D IC – Required components are stacked for each application
![Page 4: 3D WiNoC Architecturesmatutani/papers/matsuta... · 2014. 9. 19. · – Each tile has router and core (e.g., processor or caches) – Each horizontal link appears with 50% (H50)](https://reader036.vdocument.in/reader036/viewer/2022071409/610381bf6324191efb7febac/html5/thumbnails/4.jpg)
3D IC technology for going vertical
4
Two
chip
s (f
ace-
to-f
ace)
Microbump
Through silicon via
Capacitive coupling
Inductive coupling
Wired Wireless M
ore
than
th
ree
chip
s
Scalability
Flexibility
![Page 5: 3D WiNoC Architecturesmatutani/papers/matsuta... · 2014. 9. 19. · – Each tile has router and core (e.g., processor or caches) – Each horizontal link appears with 50% (H50)](https://reader036.vdocument.in/reader036/viewer/2022071409/610381bf6324191efb7febac/html5/thumbnails/5.jpg)
Wireless 3D NoC (3D WiNoC) Three chips are stacked wirelessly
Each router has a wireless transceiver
Each router has TX/RX coils (inductors)
Coils are implemented with metal layers
5
Chip#2
Chip#1
Chip#0
TX
RX
![Page 6: 3D WiNoC Architecturesmatutani/papers/matsuta... · 2014. 9. 19. · – Each tile has router and core (e.g., processor or caches) – Each horizontal link appears with 50% (H50)](https://reader036.vdocument.in/reader036/viewer/2022071409/610381bf6324191efb7febac/html5/thumbnails/6.jpg)
Hiroki Matsutani, "3D WiNoC Architectures", Special Session at NOCS'14
An example: Cube-1 (2012)
• Test chips for building-block 3D systems – Two chip types: Host CPU chip & Accelerator chip – We can customize number & types of chips in SiP
• Cube-1 Host CPU chip – Two 3D wireless routers – MIPS-like CPU
• Cube-1 Accelerator chip – Two 3D wireless routers – Processing element array
6
[Miura, IEEE Micro 13]
MIP CPU Core
8x8 PE Array
Inductors
![Page 7: 3D WiNoC Architecturesmatutani/papers/matsuta... · 2014. 9. 19. · – Each tile has router and core (e.g., processor or caches) – Each horizontal link appears with 50% (H50)](https://reader036.vdocument.in/reader036/viewer/2022071409/610381bf6324191efb7febac/html5/thumbnails/7.jpg)
Hiroki Matsutani, "3D WiNoC Architectures", Special Session at NOCS'14
An example: Cube-1 (2012)
7
• Microphotographs of test chips
Host CPU Chip
Accelerator Chip
Host CPU + 3 Accelerators
[Miura, IEEE Micro 13]
We can change the number and types of chips in a package according to application requirements
![Page 8: 3D WiNoC Architecturesmatutani/papers/matsuta... · 2014. 9. 19. · – Each tile has router and core (e.g., processor or caches) – Each horizontal link appears with 50% (H50)](https://reader036.vdocument.in/reader036/viewer/2022071409/610381bf6324191efb7febac/html5/thumbnails/8.jpg)
An example: Cube-0 (2010)
• Test chip for vertical communication schemes – Vertical point-to-point link between adjacent chips – Vertical shared bus (broadcast)
8
2.1mm x 2.1mm
Core 0 & 1
Inductors (bus)
Inductors (P2P)
Router 0 & 1
TX
Stacking for Ring network
RX
TX/RX
Stacking for Vertical bus
[Matsutani, NOCS’11]
Either vertical P2P links or broadcast bus can be formed for 3D WiNoC
![Page 9: 3D WiNoC Architecturesmatutani/papers/matsuta... · 2014. 9. 19. · – Each tile has router and core (e.g., processor or caches) – Each horizontal link appears with 50% (H50)](https://reader036.vdocument.in/reader036/viewer/2022071409/610381bf6324191efb7febac/html5/thumbnails/9.jpg)
Outline: Wireless 3D NoC architectures
9 Hiroki Matsutani, "3D WiNoC Architectures", Special Session at NOCS'14
• Wireless 3D NoCs (3D WiNoCs) [8min]
– Wireless 3D IC technology – 3D WiNoC design example (65nm)
• Routing & topology exploration [8min]
– Spanning tree optimization for irregular WiNoCs – Power management via vertical ON/OFF links
• Adding random NoC chip for 3D WiNoCs [5min]
– Adding randomness induces small-world effects – Adding random NoC chip to NoC-less 3D ICs
• Summary [1min]
[Matsutani,ASPDAC’13]
[Matsutani,DATE’14]
[Miura,IEEE Micro 13] [Matsutani,NOCS’11]
![Page 10: 3D WiNoC Architecturesmatutani/papers/matsuta... · 2014. 9. 19. · – Each tile has router and core (e.g., processor or caches) – Each horizontal link appears with 50% (H50)](https://reader036.vdocument.in/reader036/viewer/2022071409/610381bf6324191efb7febac/html5/thumbnails/10.jpg)
Big picture: Wireless 3D NoC
• Arbitrary chips are stacked after fabrication – Each chip has vertical links at pre-specified locations, but
we do not know internal topology of each chip – Some chips may not have horizontal NoC (vertical link only)
10
Memory chip from C
CPU chip from A
Application chip from B
Required chips are stacked for each application
An example (4 chips)
Inductor (vertical link)
![Page 11: 3D WiNoC Architecturesmatutani/papers/matsuta... · 2014. 9. 19. · – Each tile has router and core (e.g., processor or caches) – Each horizontal link appears with 50% (H50)](https://reader036.vdocument.in/reader036/viewer/2022071409/610381bf6324191efb7febac/html5/thumbnails/11.jpg)
Hiroki Matsutani, "3D WiNoC Architectures", Special Session at NOCS'14 11
Chip 0
Chip 1
Chip 3
Chip 2
6 7
4 5
2 3
0 1
Root
[Schroeder, JSAC’91]
Wireless 3D NoC: Up*/down* routing
• Up*/down* (UD) routing – Irregular network routing – A root node is selected – Direction (up or down) is assigned – Packets go up and then go down
• Example: 4 chips
NG
OK
The best spanning tree root is selected by exhaustive or heuristic using communication traces (9sec for 64-tile)
![Page 12: 3D WiNoC Architecturesmatutani/papers/matsuta... · 2014. 9. 19. · – Each tile has router and core (e.g., processor or caches) – Each horizontal link appears with 50% (H50)](https://reader036.vdocument.in/reader036/viewer/2022071409/610381bf6324191efb7febac/html5/thumbnails/12.jpg)
• UD routing with multiple VCs – Each layer (VC) has its own spanning tree – Packets can transit multiple layers in descent order
12
Chip 0
Chip 1
Chip 3
Chip 2
6 7
4 5
2 3
0 1
Root
OK
Chip 0
Chip 1
Chip 3
Chip 2
Root’
[Koibuchi,ICPP’03] [Lysne,TPDS’06]
VC1 VC0
You can use either VC0 or VC1
6 7
4 5
2 3
0 1
OK
Wireless 3D NoC: UD routing w/ VCs
![Page 13: 3D WiNoC Architecturesmatutani/papers/matsuta... · 2014. 9. 19. · – Each tile has router and core (e.g., processor or caches) – Each horizontal link appears with 50% (H50)](https://reader036.vdocument.in/reader036/viewer/2022071409/610381bf6324191efb7febac/html5/thumbnails/13.jpg)
Hiroki Matsutani, "3D WiNoC Architectures", Special Session at NOCS'14
Full-system simulations: Topology
• We assume 64 tiles (four 4x4 chips) • The following iteration is performed 1,000 times
– Each tile has router and core (e.g., processor or caches) – Each horizontal link appears with 50% (H50) – Each vertical link appears with 100% (V100)
13
Among 1,000 random topologies, one with the most typical hop count value is selected for the full-system evaluation
![Page 14: 3D WiNoC Architecturesmatutani/papers/matsuta... · 2014. 9. 19. · – Each tile has router and core (e.g., processor or caches) – Each horizontal link appears with 50% (H50)](https://reader036.vdocument.in/reader036/viewer/2022071409/610381bf6324191efb7febac/html5/thumbnails/14.jpg)
14
Processor architecture X86-64
L1$ size & latency 32K / 1cycle
L2$ size & latency 256K / 6cycle
Memory size & latency 4G / 160cycle
Router latency [BW] [VSA] [ST] [LT]
Router buffer size 5-flit per VC
Protocol MOESI directory (3VC)
Table 1: Simulation parameters
NPB (BT, CG, EP, FT, IS, LU, MG, SP, UA) Table 2: Application programs
CPU CPU
L2 cache banks
Hiroki Matsutani, "3D WiNoC Architectures", Special Session at NOCS'14
Full-system simulations: Gem5
• H50-V100 topology (most typical one among 1,000) – Each horizontal link appears with 50% (H50) – Each vertical link appears with 100% (V100)
8 CPUs, 32 L2$ banks, and 4 memory controllers
![Page 15: 3D WiNoC Architecturesmatutani/papers/matsuta... · 2014. 9. 19. · – Each tile has router and core (e.g., processor or caches) – Each horizontal link appears with 50% (H50)](https://reader036.vdocument.in/reader036/viewer/2022071409/610381bf6324191efb7febac/html5/thumbnails/15.jpg)
15 Hiroki Matsutani, "3D WiNoC Architectures", Special Session at NOCS'14
Full-system simulations: Gem5
• H50-V100 topology (most typical one among 1,000) – Each horizontal link appears with 50% (H50) – Each vertical link appears with 100% (V100)
• The best spanning tree root is selected for each message class (i.e., 3 roots for 3 message classes)
Message class 0 Message class 1
![Page 16: 3D WiNoC Architecturesmatutani/papers/matsuta... · 2014. 9. 19. · – Each tile has router and core (e.g., processor or caches) – Each horizontal link appears with 50% (H50)](https://reader036.vdocument.in/reader036/viewer/2022071409/610381bf6324191efb7febac/html5/thumbnails/16.jpg)
Hop count: H50-V100 topology
16
• Hop count w/ the worst case spanning tree root • Hop count w/ the best case spanning tree root • Hop count w/ two spanning tree roots (2 VCs) • Ideal hop count (not deadlock-free)
Spanning tree optimization also improves performance by 12.0% compared to the worst
Spanning tree optimization always takes the best case
![Page 17: 3D WiNoC Architecturesmatutani/papers/matsuta... · 2014. 9. 19. · – Each tile has router and core (e.g., processor or caches) – Each horizontal link appears with 50% (H50)](https://reader036.vdocument.in/reader036/viewer/2022071409/610381bf6324191efb7febac/html5/thumbnails/17.jpg)
App exec time: H50-V100 topology
17
• H50-V100 w/ the worst case • H50-V100 w/ the best case • H100-V100 (all horizontal links available = 3D Mesh)
(50% horizontal links)
Spanning tree optimization improves performance by 12.0% compared to the worst
Spanning tree optimization always takes the best case
![Page 18: 3D WiNoC Architecturesmatutani/papers/matsuta... · 2014. 9. 19. · – Each tile has router and core (e.g., processor or caches) – Each horizontal link appears with 50% (H50)](https://reader036.vdocument.in/reader036/viewer/2022071409/610381bf6324191efb7febac/html5/thumbnails/18.jpg)
Energy per flit: H50-V100 topology
18
• H50-V100 w/ the worst case • H50-V100 w/ the best case • H100-V100 (all horizontal links available = 3D Mesh)
(50% horizontal links)
Energy for routers, horizontal links, and vertical links
The next slides will explore vertical ON/OFF links (e.g., H100V50)
![Page 19: 3D WiNoC Architecturesmatutani/papers/matsuta... · 2014. 9. 19. · – Each tile has router and core (e.g., processor or caches) – Each horizontal link appears with 50% (H50)](https://reader036.vdocument.in/reader036/viewer/2022071409/610381bf6324191efb7febac/html5/thumbnails/19.jpg)
Power management: Vertical links
• 5.8mW per 2Gbps channel • Stop power supply to inductors for low-power
– “H100-Vn” topology (n =100, 50, 25, 15) – Performance/power is improved by spanning tree
optimization
19 Hiroki Matsutani, "3D WiNoC Architectures", Special Session at NOCS'14
Cube-1
Motherboard
5.8mW/ch @0.92V
[Miura, IEEE Micro 13]
Fujitsu 65nm CMOS
![Page 20: 3D WiNoC Architecturesmatutani/papers/matsuta... · 2014. 9. 19. · – Each tile has router and core (e.g., processor or caches) – Each horizontal link appears with 50% (H50)](https://reader036.vdocument.in/reader036/viewer/2022071409/610381bf6324191efb7febac/html5/thumbnails/20.jpg)
• H100-Vn topology (n =100, 50, 25, 15) – All horizontal links are available (H100) – (100-n)% of vertical links are power-gated (Vn) – Pick up most typical H100-Vn among 1,000 trials
20 Hiroki Matsutani, "3D WiNoC Architectures", Special Session at NOCS'14
Chip#3
Chip#2
Chip#1
Chip#0
H100-V100
Chip#3
Chip#2
Chip#1
Chip#0
H100-V50
Power mgmt: Vertical ON/OFF links
![Page 21: 3D WiNoC Architecturesmatutani/papers/matsuta... · 2014. 9. 19. · – Each tile has router and core (e.g., processor or caches) – Each horizontal link appears with 50% (H50)](https://reader036.vdocument.in/reader036/viewer/2022071409/610381bf6324191efb7febac/html5/thumbnails/21.jpg)
• Random+SPopt (computation cost: low) – Stop (100-n)% of vertical links randomly – Then, spanning tree optimization is performed to mitigate
the performance penalty • Exhaustive search (computation cost: HUGE)
– Find the least important vertical links and stop them until (100-n)% of vertical links are removed
21
Random+SPopt Exhaustive
Random+SPopt Exhaustive
Hop count (Average) App exec time (Average)
Performance penalty is 2.6% w/ Random+SPopt on H100V50
+2.6%
Power mgmt: Vertical off link selection
![Page 22: 3D WiNoC Architecturesmatutani/papers/matsuta... · 2014. 9. 19. · – Each tile has router and core (e.g., processor or caches) – Each horizontal link appears with 50% (H50)](https://reader036.vdocument.in/reader036/viewer/2022071409/610381bf6324191efb7febac/html5/thumbnails/22.jpg)
Outline: Wireless 3D NoC architectures
22 Hiroki Matsutani, "3D WiNoC Architectures", Special Session at NOCS'14
• Wireless 3D NoCs (3D WiNoCs) [8min]
– Wireless 3D IC technology – 3D WiNoC design example (65nm)
• Routing & topology exploration [8min]
– Spanning tree optimization for irregular WiNoCs – Power management via vertical ON/OFF links
• Adding random NoC chip for 3D WiNoCs [5min]
– Adding randomness induces small-world effects – Adding random NoC chip to NoC-less 3D ICs
• Summary [1min]
[Matsutani,ASPDAC’13]
[Matsutani,DATE’14]
[Miura,IEEE Micro 13] [Matsutani,NOCS’11]
![Page 23: 3D WiNoC Architecturesmatutani/papers/matsuta... · 2014. 9. 19. · – Each tile has router and core (e.g., processor or caches) – Each horizontal link appears with 50% (H50)](https://reader036.vdocument.in/reader036/viewer/2022071409/610381bf6324191efb7febac/html5/thumbnails/23.jpg)
Big picture: 3D WiNoC w/ Random • Router IP macros have the same # of ports
– Unused ports can be used for long-range links • In addition, redundant links can be implemented
– Statically multiplexed by FPGA-like switch boxes – By reconfiguring the switch boxes, an unique random
topology is generated
23
Adding randomness induces small-world effects
![Page 24: 3D WiNoC Architecturesmatutani/papers/matsuta... · 2014. 9. 19. · – Each tile has router and core (e.g., processor or caches) – Each horizontal link appears with 50% (H50)](https://reader036.vdocument.in/reader036/viewer/2022071409/610381bf6324191efb7febac/html5/thumbnails/24.jpg)
Case 1: Random NoC to NoC-less 3D ICs
• Each chip has inductors but does not have 2D NoC – Inductors in the same pillar form a vertical broadcast bus – Horizontal connectivity is not provided at all (i.e., NoC-less)
24
Chip#0 NoC-less
Chip#1 NoC-less
Chip#2 NoC-less
Side view (3 chips)
Chip#2
Chip#1
Chip#0
“― ― ―” Configuration
Vertical broadcast bus
![Page 25: 3D WiNoC Architecturesmatutani/papers/matsuta... · 2014. 9. 19. · – Each tile has router and core (e.g., processor or caches) – Each horizontal link appears with 50% (H50)](https://reader036.vdocument.in/reader036/viewer/2022071409/610381bf6324191efb7febac/html5/thumbnails/25.jpg)
Case 1: Random NoC to NoC-less 3D ICs
• Each chip has inductors but does not have 2D NoC – Inductors in the same pillar form a vertical broadcast bus – Adding a 2D Mesh NoC to such NoC-less 3D IC
25
Chip#0 NoC-less
Chip#1 NoC-less
Chip#2 2D Mesh
Side view (3 chips)
Chip#2
Chip#1
Chip#0
“m ― ―” Configuration
Source
Destination
![Page 26: 3D WiNoC Architecturesmatutani/papers/matsuta... · 2014. 9. 19. · – Each tile has router and core (e.g., processor or caches) – Each horizontal link appears with 50% (H50)](https://reader036.vdocument.in/reader036/viewer/2022071409/610381bf6324191efb7febac/html5/thumbnails/26.jpg)
Case 1: Random NoC to NoC-less 3D ICs
• Each chip has inductors but does not have 2D NoC – Inductors in the same pillar form a vertical broadcast bus – Adding a Random NoC to such NoC-less 3D IC
26
Chip#0 NoC-less
Chip#1 NoC-less
Side view (3 chips)
Chip#2
Chip#1
Chip#0
“r ― ―” Configuration
Chip#2 Random
Source
Destination
![Page 27: 3D WiNoC Architecturesmatutani/papers/matsuta... · 2014. 9. 19. · – Each tile has router and core (e.g., processor or caches) – Each horizontal link appears with 50% (H50)](https://reader036.vdocument.in/reader036/viewer/2022071409/610381bf6324191efb7febac/html5/thumbnails/27.jpg)
Case 2: Replacing regular NoC w/ random
• 3D WiNoC that consists of three 2D Mesh NoC layers – Inductors in neighboring chips form a vertical P2P link – E.g., regular 3D Mesh topology (i.e., regular 3D NoC)
27
Chip#0 2D Mesh
Chip#1 2D Mesh
Chip#2 2D Mesh
Side view (3 chips)
Chip#2
Chip#1
Chip#0
“m m m” Configuration
Vertical point-to-point link
![Page 28: 3D WiNoC Architecturesmatutani/papers/matsuta... · 2014. 9. 19. · – Each tile has router and core (e.g., processor or caches) – Each horizontal link appears with 50% (H50)](https://reader036.vdocument.in/reader036/viewer/2022071409/610381bf6324191efb7febac/html5/thumbnails/28.jpg)
Case 2: Replacing regular NoC w/ random
• 3D WiNoC that consists of three 2D Mesh NoC layers – Inductors in neighboring chips form a vertical P2P link – Replacing regular 2D Mesh with random NoC
28
Chip#0 2D Mesh
Chip#1 2D Mesh
Chip#2 2D Mesh
“m m m” Configuration
Chip#0 2D Mesh
Chip#1 Random
Chip#2 2D Mesh
“m r m” Configuration
![Page 29: 3D WiNoC Architecturesmatutani/papers/matsuta... · 2014. 9. 19. · – Each tile has router and core (e.g., processor or caches) – Each horizontal link appears with 50% (H50)](https://reader036.vdocument.in/reader036/viewer/2022071409/610381bf6324191efb7febac/html5/thumbnails/29.jpg)
Q1: How many random chips do we need?
Number of random chips vs. Average latency
29 # of random chips (16-node) # of random chips (64-node)
P2P Bus
P2P Bus
1 random chip drastically reduces latency
1 or 2 random chips are enough in the P2P case
![Page 30: 3D WiNoC Architecturesmatutani/papers/matsuta... · 2014. 9. 19. · – Each tile has router and core (e.g., processor or caches) – Each horizontal link appears with 50% (H50)](https://reader036.vdocument.in/reader036/viewer/2022071409/610381bf6324191efb7febac/html5/thumbnails/30.jpg)
Q2: How should we design random chip?
30 Max random link length [tile] # of Horizontal router ports
P2P Bus
P2P Bus
Double-length link is enough (equivalent to folded torus)
Max. link length (left) & Horizontal degree (right)
4 ports are enough (equivalent to 2D mesh/torus)
![Page 31: 3D WiNoC Architecturesmatutani/papers/matsuta... · 2014. 9. 19. · – Each tile has router and core (e.g., processor or caches) – Each horizontal link appears with 50% (H50)](https://reader036.vdocument.in/reader036/viewer/2022071409/610381bf6324191efb7febac/html5/thumbnails/31.jpg)
Packet latency: P2P (mmmm mrrm rrrr)
31
m m m m Mesh Mesh Mesh Mesh P2P m r r m Mesh Rand Rand Mesh P2P r r r r Rand Rand Rand Rand P2P - - - m None None None Mesh Bus - - - r None None None Rand Bus m - - r Mesh None None Rand Bus
Aver
age
pack
et la
tenc
y [c
ycle
s]
m m m m m r r m r r r r
rrrr reduces latency by 20.7% compared to mmmm
![Page 32: 3D WiNoC Architecturesmatutani/papers/matsuta... · 2014. 9. 19. · – Each tile has router and core (e.g., processor or caches) – Each horizontal link appears with 50% (H50)](https://reader036.vdocument.in/reader036/viewer/2022071409/610381bf6324191efb7febac/html5/thumbnails/32.jpg)
Packet latency: Bus (---m ---r m--r)
32
m m m m Mesh Mesh Mesh Mesh P2P m r r m Mesh Rand Rand Mesh P2P r r r r Rand Rand Rand Rand P2P - - - m None None None Mesh Bus - - - r None None None Rand Bus m - - r Mesh None None Rand Bus
Aver
age
pack
et la
tenc
y [c
ycle
s]
- - - m - - - r m - - r
---r (adding a random NoC) reduces latency by 26.2% compared to ---m (adding a mesh NoC)
![Page 33: 3D WiNoC Architecturesmatutani/papers/matsuta... · 2014. 9. 19. · – Each tile has router and core (e.g., processor or caches) – Each horizontal link appears with 50% (H50)](https://reader036.vdocument.in/reader036/viewer/2022071409/610381bf6324191efb7febac/html5/thumbnails/33.jpg)
Outline: Wireless 3D NoC architectures
33 Hiroki Matsutani, "3D WiNoC Architectures", Special Session at NOCS'14
• Wireless 3D NoCs (3D WiNoCs) [8min]
– Wireless 3D IC technology – 3D WiNoC design example (65nm)
• Routing & topology exploration [8min]
– Spanning tree optimization for irregular WiNoCs – Power management via vertical ON/OFF links
• Adding random NoC chip for 3D WiNoCs [5min]
– Adding randomness induces small-world effects – Adding random NoC chip to NoC-less 3D ICs
• Summary [1min]
[Matsutani,ASPDAC’13]
[Matsutani,DATE’14]
[Miura,IEEE Micro 13] [Matsutani,NOCS’11]
![Page 34: 3D WiNoC Architecturesmatutani/papers/matsuta... · 2014. 9. 19. · – Each tile has router and core (e.g., processor or caches) – Each horizontal link appears with 50% (H50)](https://reader036.vdocument.in/reader036/viewer/2022071409/610381bf6324191efb7febac/html5/thumbnails/34.jpg)
Summary: 3D WiNoC Architectures
34 Hiroki Matsutani, "3D WiNoC Architectures", Special Session at NOCS'14
• Inductive-coupling 3D SiP – A low cost alternative to build low-volume custom
systems by stacking off-the-shelf known-good-dies – No special process technology is required;
inductors are implemented with metal layers • Cube-1: A practical 3D WiNoC system
– Two types: Host CPU chip & Accelerator chips – We can customize number & types of chips in SiP
![Page 35: 3D WiNoC Architecturesmatutani/papers/matsuta... · 2014. 9. 19. · – Each tile has router and core (e.g., processor or caches) – Each horizontal link appears with 50% (H50)](https://reader036.vdocument.in/reader036/viewer/2022071409/610381bf6324191efb7febac/html5/thumbnails/35.jpg)
Summary: 3D WiNoC Architectures
35 Hiroki Matsutani, "3D WiNoC Architectures", Special Session at NOCS'14
• Routing & topology exploration – Spanning tree optimization for adhoc 3D WiNoCs – Power reduction by randomly stopping vertical links – Performance/power is improved by spanning tree
optimization • Adding randomness shortens comm. latency
– Adding random NoC chip to NoC-less 3D ICs – Replacing regular 2D NoC with random 2D NoC
![Page 36: 3D WiNoC Architecturesmatutani/papers/matsuta... · 2014. 9. 19. · – Each tile has router and core (e.g., processor or caches) – Each horizontal link appears with 50% (H50)](https://reader036.vdocument.in/reader036/viewer/2022071409/610381bf6324191efb7febac/html5/thumbnails/36.jpg)
References (1/2) • Cube-0: The first real 3D WiNoC
– H. Matsutani, et.al., "A Vertical Bubble Flow Network using Inductive-Coupling for 3-D CMPs", NOCS 2011.
– Y. Take, et.al., "3D NoC with Inductive-Coupling Links for Building-Block SiPs", IEEE Trans on Computers (2014).
• Cube-1: The heterogeneous 3D WiNoC – N. Miura, et.al., "A Scalable 3D Heterogeneous Multicore
with an Inductive ThruChip Interface", IEEE Micro (2013).
• MuCCRA-Cube: Dynamically reconfigurable processor – S. Saito, et.al., "MuCCRA-Cube: a 3D Dynamically
Reconfigurable Processor with Inductive-Coupling Link", FPL 2009.
36 Hiroki Matsutani, "3D WiNoC Architectures", Special Session at NOCS'14
![Page 37: 3D WiNoC Architecturesmatutani/papers/matsuta... · 2014. 9. 19. · – Each tile has router and core (e.g., processor or caches) – Each horizontal link appears with 50% (H50)](https://reader036.vdocument.in/reader036/viewer/2022071409/610381bf6324191efb7febac/html5/thumbnails/37.jpg)
References (2/2) • Vertical bubble flow control on Cube-0
– H. Matsutani, et.al., "A Vertical Bubble Flow Network using Inductive-Coupling for 3-D CMPs", NOCS 2011.
– Y. Take, et.al., "3D NoC with Inductive-Coupling Links for Building-Block SiPs", IEEE Trans on Computers (2014).
• Spanning trees optimization for 3D WiNoCs – H. Matsutani, et.al., "A Case for Wireless 3D NoCs for
CMPs", ASP-DAC 2013 (Best Paper Award).
• Small-world effects (randomness) – H. Matsutani, et.al., "Low-Latency Wireless 3D NoCs via
Randomized Shortcut Chips", DATE 2014. – M. Koibuchi, et.al., "A Case for Random Shortcut
Topologies for HPC Interconnects", ISCA 2012. 37 Hiroki Matsutani, "3D WiNoC Architectures", Special Session at NOCS'14
![Page 38: 3D WiNoC Architecturesmatutani/papers/matsuta... · 2014. 9. 19. · – Each tile has router and core (e.g., processor or caches) – Each horizontal link appears with 50% (H50)](https://reader036.vdocument.in/reader036/viewer/2022071409/610381bf6324191efb7febac/html5/thumbnails/38.jpg)
Backup slides
38 Hiroki Matsutani, "3D WiNoC Architectures", Special Session at NOCS'14