performance, cost, and energy evaluation of fat h-tree: a cost-efficient tree-based on-chip network...
![Page 1: Performance, Cost, and Energy Evaluation of Fat H-Tree: A Cost-Efficient Tree-Based On-Chip Network Hiroki Matsutani (Keio Univ, JAPAN) Michihiro Koibuchi](https://reader037.vdocument.in/reader037/viewer/2022110213/5697bf9e1a28abf838c947a3/html5/thumbnails/1.jpg)
Performance, Cost, and Energy Evaluation of Fat H-Tree:
A Cost-Efficient Tree-Based On-Chip Network
Hiroki Matsutani (Keio Univ, JAPAN)
Michihiro Koibuchi (NII, JAPAN)
Hideharu Amano (Keio Univ, JAPAN)
![Page 2: Performance, Cost, and Energy Evaluation of Fat H-Tree: A Cost-Efficient Tree-Based On-Chip Network Hiroki Matsutani (Keio Univ, JAPAN) Michihiro Koibuchi](https://reader037.vdocument.in/reader037/viewer/2022110213/5697bf9e1a28abf838c947a3/html5/thumbnails/2.jpg)
Introduction
• Networks-on-Chip
– Tile architecture
– On-chip routers
– Packet switching
• Various NoC topologies
– Mesh, Torus
– H-Tree, Fat Trees
• Fat H-Tree (FHT)
• Evaluations of FHT
– Performance
– Area
– Energy
[Figure: a mesh-based on-chip network of nine tiles numbered 0 to 8 in a 3x3 grid; each tile is a RISC, DSP, RAM, or I/O block]
We proposed FHT as an alternative to Fat Trees
![Page 3: Performance, Cost, and Energy Evaluation of Fat H-Tree: A Cost-Efficient Tree-Based On-Chip Network Hiroki Matsutani (Keio Univ, JAPAN) Michihiro Koibuchi](https://reader037.vdocument.in/reader037/viewer/2022110213/5697bf9e1a28abf838c947a3/html5/thumbnails/3.jpg)
NoC topologies: Mesh & Torus
• 2-D Mesh
• 2-D Torus
– 2x bandwidth of mesh
RAW [Taylor, IEEE Micro’02]
[Figure legend: Router, Core]
Fat H-Tree is a tree-based topology, but it includes a torus structure
![Page 4: Performance, Cost, and Energy Evaluation of Fat H-Tree: A Cost-Efficient Tree-Based On-Chip Network Hiroki Matsutani (Keio Univ, JAPAN) Michihiro Koibuchi](https://reader037.vdocument.in/reader037/viewer/2022110213/5697bf9e1a28abf838c947a3/html5/thumbnails/4.jpg)
NoC topologies: Fat Trees
• Fat Tree (p, q, c)
– p: # of upward links
– q: # of downward links
– c: # of core ports
[Figure: Fat Tree (2,4,2) and Fat Tree (2,4,1) with rank-1 and rank-2 routers; legend: Router, Core]
Trees are duplicated in both Fat Trees and Fat H-Tree, but the connection patterns of the trees are different!
![Page 5: Performance, Cost, and Energy Evaluation of Fat H-Tree: A Cost-Efficient Tree-Based On-Chip Network Hiroki Matsutani (Keio Univ, JAPAN) Michihiro Koibuchi](https://reader037.vdocument.in/reader037/viewer/2022110213/5697bf9e1a28abf838c947a3/html5/thumbnails/5.jpg)
Outline
• NoC topologies
– Mesh, Torus
– H-Trees, Fat Trees
• Fat H-Tree (FHT)
– Structure
– 2-D layout
– Routing algorithm (DTR)
• Evaluations of FHT
– Network logic area
– Energy consumption
– Throughput
![Page 6: Performance, Cost, and Energy Evaluation of Fat H-Tree: A Cost-Efficient Tree-Based On-Chip Network Hiroki Matsutani (Keio Univ, JAPAN) Michihiro Koibuchi](https://reader037.vdocument.in/reader037/viewer/2022110213/5697bf9e1a28abf838c947a3/html5/thumbnails/6.jpg)
Fat H-Tree: Structure
• Fat H-Tree [Yamada, EUC’04]
– Red Tree (H-Tree)
– Black Tree (H-Tree)
Combining two H-Trees (red & black)
[Figure legend: Router, Core]
The black tree is shifted toward the lower right of the red tree
Because the black tree is shifted, the connection pattern of the trees differs from that of the original Fat Trees
![Page 7: Performance, Cost, and Energy Evaluation of Fat H-Tree: A Cost-Efficient Tree-Based On-Chip Network Hiroki Matsutani (Keio Univ, JAPAN) Michihiro Koibuchi](https://reader037.vdocument.in/reader037/viewer/2022110213/5697bf9e1a28abf838c947a3/html5/thumbnails/7.jpg)
Fat H-Tree: Structure
• Fat H-Tree [Yamada, EUC’04]
– Red Tree (H-Tree)
– Black Tree (H-Tree)
Combining two H-Trees (red & black)
[Figure legend: Router, Core]
Fat H-Tree is formed on red & black trees
![Page 10: Performance, Cost, and Energy Evaluation of Fat H-Tree: A Cost-Efficient Tree-Based On-Chip Network Hiroki Matsutani (Keio Univ, JAPAN) Michihiro Koibuchi](https://reader037.vdocument.in/reader037/viewer/2022110213/5697bf9e1a28abf838c947a3/html5/thumbnails/10.jpg)
Fat H-Tree: Structure
• Fat H-Tree [Yamada, EUC’04]
– Red Tree (H-Tree)
– Black Tree (H-Tree)
Combining two H-Trees (red & black)
[Figure legend: Router, Core; rank-2 and upper routers are omitted in this figure]
Each core is connected to both red & black trees
A ring is formed with cores & rank-1 routers
Torus-level performance by combining only two H-Trees
![Page 11: Performance, Cost, and Energy Evaluation of Fat H-Tree: A Cost-Efficient Tree-Based On-Chip Network Hiroki Matsutani (Keio Univ, JAPAN) Michihiro Koibuchi](https://reader037.vdocument.in/reader037/viewer/2022110213/5697bf9e1a28abf838c947a3/html5/thumbnails/11.jpg)
Fat H-Tree: 2-D layout on VLSI
• Fat H-Tree
– Torus structure (long feedback links across the chip)
– Folded in the same way as the folded layout of a 2-D Torus
[Figure: Fat H-Tree and its folded 2-D layout, which are topologically equivalent; legend: Router, Core]
![Page 12: Performance, Cost, and Energy Evaluation of Fat H-Tree: A Cost-Efficient Tree-Based On-Chip Network Hiroki Matsutani (Keio Univ, JAPAN) Michihiro Koibuchi](https://reader037.vdocument.in/reader037/viewer/2022110213/5697bf9e1a28abf838c947a3/html5/thumbnails/12.jpg)
Fat H-Tree: Routing algorithm
• Paths on a single H-tree
– Only red tree, or
– Only black tree
[Figure: example paths; only red tree: 6-hop; only black tree: 6-hop]
![Page 13: Performance, Cost, and Energy Evaluation of Fat H-Tree: A Cost-Efficient Tree-Based On-Chip Network Hiroki Matsutani (Keio Univ, JAPAN) Michihiro Koibuchi](https://reader037.vdocument.in/reader037/viewer/2022110213/5697bf9e1a28abf838c947a3/html5/thumbnails/13.jpg)
Fat H-Tree: Routing algorithm
• Paths on a single H-tree
– Only red tree, or
– Only black tree
• Paths across trees
– Transit between trees
– Minimum paths
[Figure: the red tree is used first, then the packet transits to the black tree; total 4-hop (minimum)]
Exploiting such paths is the key to improving performance
![Page 14: Performance, Cost, and Energy Evaluation of Fat H-Tree: A Cost-Efficient Tree-Based On-Chip Network Hiroki Matsutani (Keio Univ, JAPAN) Michihiro Koibuchi](https://reader037.vdocument.in/reader037/viewer/2022110213/5697bf9e1a28abf838c947a3/html5/thumbnails/14.jpg)
Fat H-Tree: Dual tree routing (DTR)
• Dual tree routing
– Transit trees for minimum paths
– Cycles across trees
• Deadlock avoidance
– VC# is increased when a packet transits from red to black
[Figure: VC#0 is used before the transit; VC#1 is used after the transit]
Only TWO VCs are sufficient in a 64-node FHT
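The deadlock-avoidance rule on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' router logic: the function name `assign_vcs` and the representation of a path as one tree color per hop are assumptions made here. Only the red-to-black transit raises the VC number, because that is the only case the slide states.

```python
from typing import List


def assign_vcs(hop_trees: List[str]) -> List[int]:
    """Return the virtual-channel number used on each hop of a path.

    hop_trees names the tree ("red" or "black") that carries each hop.
    This representation is hypothetical; the slides define no data format.
    """
    vcs = []
    vc = 0  # assumption: every packet starts on VC#0
    previous = None
    for tree in hop_trees:
        if tree not in ("red", "black"):
            raise ValueError(f"unknown tree: {tree!r}")
        # Rule from the slide: VC# is increased on a red -> black transit.
        if previous == "red" and tree == "black":
            vc += 1
        vcs.append(vc)
        previous = tree
    return vcs


# A 4-hop path that starts on red and transits to black once.
# The two-hops-per-tree split is illustrative only.
print(assign_vcs(["red", "red", "black", "black"]))
```

A path that stays on a single tree never leaves VC#0 under this sketch, which is consistent with two VCs being enough when a packet transits at most once.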
![Page 15: Performance, Cost, and Energy Evaluation of Fat H-Tree: A Cost-Efficient Tree-Based On-Chip Network Hiroki Matsutani (Keio Univ, JAPAN) Michihiro Koibuchi](https://reader037.vdocument.in/reader037/viewer/2022110213/5697bf9e1a28abf838c947a3/html5/thumbnails/15.jpg)
Outline
• NoC topologies
– Mesh, Torus
– H-Trees, Fat Trees
• Fat H-Tree (FHT)
– Structure
– 2-D layout
– Routing algorithm (DTR)
• Evaluations of FHT
– Network logic area
– Energy consumption
– Throughput
![Page 16: Performance, Cost, and Energy Evaluation of Fat H-Tree: A Cost-Efficient Tree-Based On-Chip Network Hiroki Matsutani (Keio Univ, JAPAN) Michihiro Koibuchi](https://reader037.vdocument.in/reader037/viewer/2022110213/5697bf9e1a28abf838c947a3/html5/thumbnails/16.jpg)
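Ideal throughput: Channel bisection
Bandwidth of FHT is much improved by the torus structure

$N = 2^n \times 2^n$

| Topology | Channel bisection | N=16 | N=64 | N=256 |
|---|---|---|---|---|
| HT | $4$ | 4 | 4 | 4 |
| FT1 | $2^{n+1}$ | 8 | 16 | 32 |
| FT2 | $2^{n+2}$ | 16 | 32 | 64 |
| FHT | $2^{n+2} + 8$ | 24 | 40 | 72 |
| Mesh | $2^{n+1}$ | 8 | 16 | 32 |
| Torus | $2^{n+2}$ | 16 | 32 | 64 |

FT1: Fat Tree(2,4,1) FT2: Fat Tree(2,4,2)
For FHT, the $2^{n+2}$ term is due to the torus and the $8$ is due to the two H-Trees.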
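The closed forms on this slide can be checked against the tabulated values with a short Python sketch. The function name `channel_bisection` and the dictionary layout are illustrative choices, not part of the original material; the formulas themselves are the ones listed for $N = 2^n \times 2^n$ cores.

```python
def channel_bisection(topology: str, n: int) -> int:
    """Channel bisection of a network with N = 2**n * 2**n cores."""
    formulas = {
        "HT": 4,                      # a single H-Tree
        "FT1": 2 ** (n + 1),          # Fat Tree(2,4,1)
        "FT2": 2 ** (n + 2),          # Fat Tree(2,4,2)
        "FHT": 2 ** (n + 2) + 8,      # torus part plus two H-Trees
        "Mesh": 2 ** (n + 1),
        "Torus": 2 ** (n + 2),
    }
    return formulas[topology]


for name in ("HT", "FT1", "FT2", "FHT", "Mesh", "Torus"):
    # n = 2, 3, 4 corresponds to N = 16, 64, 256
    print(name, [channel_bisection(name, n) for n in (2, 3, 4)])
```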
![Page 17: Performance, Cost, and Energy Evaluation of Fat H-Tree: A Cost-Efficient Tree-Based On-Chip Network Hiroki Matsutani (Keio Univ, JAPAN) Michihiro Koibuchi](https://reader037.vdocument.in/reader037/viewer/2022110213/5697bf9e1a28abf838c947a3/html5/thumbnails/17.jpg)
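Number of routers
Router count of FHT is less than Fat Tree(2,4,2)

$N = 2^n \times 2^n$

| Topology | Number of routers | N=16 | N=64 | N=256 |
|---|---|---|---|---|
| HT | $(4^n - 1)/3$ | 5 | 21 | 85 |
| FT1 | $(4^n - 2^n)/2$ | 6 | 28 | 120 |
| FT2 | $4^n - 2^n$ | 12 | 56 | 240 |
| FHT | $2(4^n - 1)/3$ | 10 | 42 | 170 |
| Mesh | $N$ | 16 | 64 | 256 |
| Torus | $N$ | 16 | 64 | 256 |

FT1: Fat Tree(2,4,1) FT2: Fat Tree(2,4,2)
Note: the number of NIs is not considered.
FHT requires 2-port NIs for the red & black trees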
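The HT and FHT router counts on this slide can be reproduced by summing routers rank by rank. The sketch below assumes that every H-Tree router has four children, which is consistent with the 5 routers listed for 16 cores (four rank-1 routers plus one rank-2 router); the function names are hypothetical.

```python
def h_tree_routers(n: int) -> int:
    """Routers in one H-Tree over N = 4**n cores, summed rank by rank.

    Assumes each router has four children, so rank r (1..n) holds
    4**(n - r) routers.
    """
    return sum(4 ** (n - rank) for rank in range(1, n + 1))


def fht_routers(n: int) -> int:
    """A Fat H-Tree combines two H-Trees (red and black)."""
    return 2 * h_tree_routers(n)


for n in (2, 3, 4):
    # The rank-by-rank sum is a geometric series equal to (4**n - 1) / 3.
    assert h_tree_routers(n) == (4 ** n - 1) // 3
    print(f"N={4 ** n}: HT={h_tree_routers(n)}, FHT={fht_routers(n)}")
```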
![Page 18: Performance, Cost, and Energy Evaluation of Fat H-Tree: A Cost-Efficient Tree-Based On-Chip Network Hiroki Matsutani (Keio Univ, JAPAN) Michihiro Koibuchi](https://reader037.vdocument.in/reader037/viewer/2022110213/5697bf9e1a28abf838c947a3/html5/thumbnails/18.jpg)
Network logic area (routers & NIs)
• Synthesis of NoC
– 16-core, 64-core
– Design Compiler
– 0.18um CMOS
• Router architecture
– 1-flit = 32-bit
– 4-stage pipeline
– Wormhole, 2VCs
• NI architecture
– In: 2-flit FIFO
– Out: 2-flit FIFO
[Figure: wormhole router with input ports, buffers (2VCs), and a crossbar]
FHT’s NI is implemented as a “router” to forward packets between trees
![Page 19: Performance, Cost, and Energy Evaluation of Fat H-Tree: A Cost-Efficient Tree-Based On-Chip Network Hiroki Matsutani (Keio Univ, JAPAN) Michihiro Koibuchi](https://reader037.vdocument.in/reader037/viewer/2022110213/5697bf9e1a28abf838c947a3/html5/thumbnails/19.jpg)
Network logic area: 16/64-core
[Figure: synthesis result (16-core) and synthesis result (64-core)]
Network logic area of FHT is smaller than Fat Tree(2,4,2)
FHT’s NI is larger than others
![Page 20: Performance, Cost, and Energy Evaluation of Fat H-Tree: A Cost-Efficient Tree-Based On-Chip Network Hiroki Matsutani (Keio Univ, JAPAN) Michihiro Koibuchi](https://reader037.vdocument.in/reader037/viewer/2022110213/5697bf9e1a28abf838c947a3/html5/thumbnails/20.jpg)
Total wire length of all links
• Total unit-length of links
– Core-to-router links
– Router-to-router links
1-unit = distance between neighboring cores
How many unit-links would FHT require?

$N = 2^n \times 2^n$

| Topology | Total unit-links | N=16 | N=64 | N=256 |
|---|---|---|---|---|
| HT | $2(N - 2^n)$ | 24 | 112 | 480 |
| FT1 | $nN$ | 32 | 192 | 1,024 |
| FT2 | $2nN$ | 64 | 384 | 2,048 |
| FHT | $8N - 8(2^{n+1} - 1)$ | 72 | 392 | 1,800 |
| Mesh | $2(N - 2^n)$ | 24 | 112 | 480 |
| Torus | $4(N - 2^n)$ | 48 | 224 | 960 |

FT1: Fat Tree(2,4,1) FT2: Fat Tree(2,4,2)
Wire length of FHT is almost the same as Fat Tree(2,4,2)
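As a rough check, the unit-link totals can be computed from closed forms and converted to millimetres. This is a sketch under stated assumptions: the closed forms are written here in a form that reproduces every tabulated value, the helper names are hypothetical, and the unit length is taken as the 12 mm chip edge divided by the number of cores per side (which gives the 3.0 mm and 1.5 mm units quoted elsewhere in the deck for 16 and 64 cores).

```python
CHIP_EDGE_MM = 12.0  # 12 mm square chip, as stated in the deck


def total_unit_links(topology: str, n: int) -> int:
    """Total link length in units for N = 4**n cores."""
    big_n = 4 ** n   # number of cores
    side = 2 ** n    # cores per chip edge
    formulas = {
        "HT": 2 * (big_n - side),
        "FT1": n * big_n,
        "FT2": 2 * n * big_n,
        "FHT": 8 * big_n - 8 * (2 ** (n + 1) - 1),
        "Mesh": 2 * (big_n - side),
        "Torus": 4 * (big_n - side),
    }
    return formulas[topology]


def total_link_length_mm(topology: str, n: int) -> float:
    """Convert unit-links to millimetres (one unit = core-to-core pitch)."""
    unit_mm = CHIP_EDGE_MM / 2 ** n
    return total_unit_links(topology, n) * unit_mm


for name in ("FT2", "FHT"):
    print(name, [total_unit_links(name, n) for n in (2, 3, 4)],
          total_link_length_mm(name, 2), "mm at 16 cores")
```

The figures are per link, not per wire; a 32-bit link would multiply the wire total accordingly.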
![Page 21: Performance, Cost, and Energy Evaluation of Fat H-Tree: A Cost-Efficient Tree-Based On-Chip Network Hiroki Matsutani (Keio Univ, JAPAN) Michihiro Koibuchi](https://reader037.vdocument.in/reader037/viewer/2022110213/5697bf9e1a28abf838c947a3/html5/thumbnails/21.jpg)
Energy: NoC’s energy model
• Ave. flit energy $E_{flit}$
– Send 1-flit to dest.
– How much energy [J]?
– $E_{flit} = w \cdot H_{ave} \cdot (E_{sw} + E_{link})$ [Wang, DATE’05]
• Parameters
– 12mm square chip
– 16/64-core
– 0.18um CMOS
• Switching energy $E_{sw}$
– 1-bit switching @ router
– Gate-level sim
– 1.88 [pJ / hop] for routers
– 1.27 [pJ / hop] for NI
– 1.45 [pJ / hop] for NI (FHT)
• Link energy $E_{link}$
– 1-bit transfer @ link
– 0.67 [pJ / mm]
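The flit-energy formula can be turned into a small calculator. The sketch below is only an illustration of $E_{flit} = w \cdot H_{ave} \cdot (E_{sw} + E_{link})$: it assumes $w$ is the flit width in bits, uses the router switching energy for every hop (ignoring the separate NI figures), and assumes every hop crosses a link of one fixed length. The function name, the 3.0 mm link length, and the choice of $H_{ave} = 3.20$ (the 16-core FHT value from the average hop count table) are assumptions for the example, not the authors' evaluation setup.

```python
def flit_energy_pj(w_bits: int, h_ave: float,
                   e_sw_pj: float, e_link_pj: float) -> float:
    """Average energy (pJ) to deliver one flit: w * H_ave * (E_sw + E_link)."""
    return w_bits * h_ave * (e_sw_pj + e_link_pj)


# Illustrative inputs (see the assumptions in the text above).
FLIT_WIDTH_BITS = 32          # 1-flit = 32-bit
E_SW_ROUTER_PJ = 1.88         # per bit, per hop, for routers
E_LINK_PJ_PER_MM = 0.67       # per bit, per mm
ASSUMED_LINK_MM = 3.0         # assumed uniform link length
H_AVE_FHT_16 = 3.20           # average hop count, FHT, N=16

energy = flit_energy_pj(FLIT_WIDTH_BITS, H_AVE_FHT_16,
                        E_SW_ROUTER_PJ,
                        E_LINK_PJ_PER_MM * ASSUMED_LINK_MM)
print(f"{energy:.1f} pJ per flit under these assumptions")
```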
![Page 22: Performance, Cost, and Energy Evaluation of Fat H-Tree: A Cost-Efficient Tree-Based On-Chip Network Hiroki Matsutani (Keio Univ, JAPAN) Michihiro Koibuchi](https://reader037.vdocument.in/reader037/viewer/2022110213/5697bf9e1a28abf838c947a3/html5/thumbnails/22.jpg)
Energy consumption: 16/64-core
[Figure: simulation result (16-core) and simulation result (64-core)]
Energy consumption of FHT is less than Fat Tree(2,4,2)
![Page 23: Performance, Cost, and Energy Evaluation of Fat H-Tree: A Cost-Efficient Tree-Based On-Chip Network Hiroki Matsutani (Keio Univ, JAPAN) Michihiro Koibuchi](https://reader037.vdocument.in/reader037/viewer/2022110213/5697bf9e1a28abf838c947a3/html5/thumbnails/23.jpg)
Throughput: Simulation environment
• Flit-level simulation
– Throughput / latency
– 16/64-core
• Topology (routing)
– Mesh, Torus (DOR)
– Fat Trees (up/down)
– Fat H-Tree (DTR)
• Traffic patterns
– Uniform
– NAS Parallel Benchmark: BT.W, SP.W, CG.W, MG.W, IS.W

| Parameter | Value |
|---|---|
| Packet size | 16-flit (1-flit header) |
| Buffer size | 1-flit per channel |
| Switching | Wormhole |
| # of VCs | 2 |
| Latency | 3-cycle per 1-hop |
![Page 24: Performance, Cost, and Energy Evaluation of Fat H-Tree: A Cost-Efficient Tree-Based On-Chip Network Hiroki Matsutani (Keio Univ, JAPAN) Michihiro Koibuchi](https://reader037.vdocument.in/reader037/viewer/2022110213/5697bf9e1a28abf838c947a3/html5/thumbnails/24.jpg)
FHT vs. FTs: Uniform (16/64-core)
[Figure: Uniform (16-core) and Uniform (64-core), comparing FHT (DTR), Fat Tree(2,4,2), and Fat Tree(2,4,1)]
FHT outperforms FT2 in 16-core, but it doesn’t in 64-core
FHT (DTR) causes congestion around the roots of the trees
![Page 25: Performance, Cost, and Energy Evaluation of Fat H-Tree: A Cost-Efficient Tree-Based On-Chip Network Hiroki Matsutani (Keio Univ, JAPAN) Michihiro Koibuchi](https://reader037.vdocument.in/reader037/viewer/2022110213/5697bf9e1a28abf838c947a3/html5/thumbnails/25.jpg)
FHT vs. FTs: BT (16/64-core)
[Figure: BT traffic (16-core) and BT traffic (64-core), comparing FHT (DTR), Fat Tree(2,4,2), and Fat Tree(2,4,1)]
BT has neighboring communications, which is an advantage for FHT (DTR)
FHT (DTR) doesn’t cause congestion around the roots
![Page 26: Performance, Cost, and Energy Evaluation of Fat H-Tree: A Cost-Efficient Tree-Based On-Chip Network Hiroki Matsutani (Keio Univ, JAPAN) Michihiro Koibuchi](https://reader037.vdocument.in/reader037/viewer/2022110213/5697bf9e1a28abf838c947a3/html5/thumbnails/26.jpg)
FHT vs. FTs: MG (16/64-core)
[Figure: MG traffic (16-core) and MG traffic (64-core), comparing FHT (DTR), Fat Tree(2,4,2), and Fat Tree(2,4,1)]
Performance is … FHT (DTR) > FT2 > FT1
![Page 27: Performance, Cost, and Energy Evaluation of Fat H-Tree: A Cost-Efficient Tree-Based On-Chip Network Hiroki Matsutani (Keio Univ, JAPAN) Michihiro Koibuchi](https://reader037.vdocument.in/reader037/viewer/2022110213/5697bf9e1a28abf838c947a3/html5/thumbnails/27.jpg)
Summary: Evaluations of FHT
• Performance
– FHT outperforms Fat Tree (FT2), except for uniform traffic
• Network logic area
– FHT requires 20.5%-28.1% smaller area than FT2
• Energy consumption
– FHT requires 6.7%-7.0% less energy than FT2
• Wire length
– Wire length of FHT is almost the same as FT2
• Ongoing works
– Evaluation in 90nm CMOS
– 3-D layout of FHT for 3-D NoCs
[Figure: three stacked wafers (stacked ICs)]
![Page 28: Performance, Cost, and Energy Evaluation of Fat H-Tree: A Cost-Efficient Tree-Based On-Chip Network Hiroki Matsutani (Keio Univ, JAPAN) Michihiro Koibuchi](https://reader037.vdocument.in/reader037/viewer/2022110213/5697bf9e1a28abf838c947a3/html5/thumbnails/28.jpg)
Thank you for your attention
![Page 29: Performance, Cost, and Energy Evaluation of Fat H-Tree: A Cost-Efficient Tree-Based On-Chip Network Hiroki Matsutani (Keio Univ, JAPAN) Michihiro Koibuchi](https://reader037.vdocument.in/reader037/viewer/2022110213/5697bf9e1a28abf838c947a3/html5/thumbnails/29.jpg)
Feasibility of Fat H-Tree
• Total wire length
– Slightly longer than Fat Trees
– But a lot of wire resources are available on-chip
• Wire delay
– Length of the longest wire is the same as in Fat Trees
[Figure: Fat Tree (2,4,1) and Fat H-Tree]
If Fat Trees are feasible, Fat H-Tree can be implemented with smaller area and higher performance
![Page 30: Performance, Cost, and Energy Evaluation of Fat H-Tree: A Cost-Efficient Tree-Based On-Chip Network Hiroki Matsutani (Keio Univ, JAPAN) Michihiro Koibuchi](https://reader037.vdocument.in/reader037/viewer/2022110213/5697bf9e1a28abf838c947a3/html5/thumbnails/30.jpg)
Routings for FHT: Torus routing (TOR)
• Single tree (STR)
– Select a single tree per packet
– Can’t transit trees
• Dual tree (DTR)
– Transit trees for minimal paths
– VCs are needed
• Torus routing (TOR)
– Use torus formed with rank-1 routers & cores
– VCs are needed
– Can’t use rank-2 or upper routers
[Figure: Fat H-Tree’s torus structure]
TOR avoids congestion around the roots, but its paths are non-minimal
![Page 31: Performance, Cost, and Energy Evaluation of Fat H-Tree: A Cost-Efficient Tree-Based On-Chip Network Hiroki Matsutani (Keio Univ, JAPAN) Michihiro Koibuchi](https://reader037.vdocument.in/reader037/viewer/2022110213/5697bf9e1a28abf838c947a3/html5/thumbnails/31.jpg)
FHT vs. Torus: Uniform (16/64-core)
• FHT (DTR): minimum routing using links around roots
• FHT (TOR): using torus structure (can’t use links around roots)
• 2-D Torus
• 2-D Mesh
[Figure: Uniform (16-core) and Uniform (64-core)]
FHT achieves torus-level throughput using only torus structure
![Page 32: Performance, Cost, and Energy Evaluation of Fat H-Tree: A Cost-Efficient Tree-Based On-Chip Network Hiroki Matsutani (Keio Univ, JAPAN) Michihiro Koibuchi](https://reader037.vdocument.in/reader037/viewer/2022110213/5697bf9e1a28abf838c947a3/html5/thumbnails/32.jpg)
Number of VCs in Dual Tree Routing
• # of VCs required is $H_{max}/4 + 1$
– $H_{max}$ is the longest hop count in the network
• E.g.,
– 16-core FHT requires 2VCs
– 64-core FHT requires 2VCs
– …
VC# is increased when a packet transits from red to black
Two VCs are not so costly…
![Page 33: Performance, Cost, and Energy Evaluation of Fat H-Tree: A Cost-Efficient Tree-Based On-Chip Network Hiroki Matsutani (Keio Univ, JAPAN) Michihiro Koibuchi](https://reader037.vdocument.in/reader037/viewer/2022110213/5697bf9e1a28abf838c947a3/html5/thumbnails/33.jpg)
NIs in Fat H-Tree
• Implemented as a “simplified router”
– Connecting red & black trees
• Routing @ NI is simple
– Forward packets to the other tree if the destination is not this node
[Figure: Fat H-Tree NI; a crossbar connects the processing core, the port for the red tree, and the port for the black tree]
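The NI forwarding rule on this slide is simple enough to write down directly. The sketch below is a hypothetical rendering in Python, not the synthesized design: the function name `ni_forward`, the string port names, and the integer node IDs are all assumptions made for illustration.

```python
def ni_forward(arrival_tree: str, dst: int, my_id: int) -> str:
    """Pick the NI output for a packet that arrived from one of the trees.

    Returns "core" when the packet is addressed to this node; otherwise
    the packet is forwarded to the other tree, as the slide describes.
    """
    if arrival_tree not in ("red", "black"):
        raise ValueError(f"unknown tree: {arrival_tree!r}")
    if dst == my_id:
        return "core"
    return "black" if arrival_tree == "red" else "red"


print(ni_forward("red", dst=5, my_id=5))   # delivered locally
print(ni_forward("red", dst=9, my_id=5))   # transits to the other tree
```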
![Page 36: Performance, Cost, and Energy Evaluation of Fat H-Tree: A Cost-Efficient Tree-Based On-Chip Network Hiroki Matsutani (Keio Univ, JAPAN) Michihiro Koibuchi](https://reader037.vdocument.in/reader037/viewer/2022110213/5697bf9e1a28abf838c947a3/html5/thumbnails/36.jpg)
Average hop count
• Fat H-Tree
– Minimum routing (DTR)

$$H_{ave} = \frac{1}{N^2 - N} \sum_{x, y \in N} H(x, y)$$

| Topology | Routing | N=16 | N=64 | N=256 |
|---|---|---|---|---|
| FT | up/down | 3.60 | 5.43 | 7.36 |
| FHT | DTR | 3.20 | 4.84 | 6.78 |
| Mesh | DOR | 2.67 | 5.33 | 10.67 |
| Torus | DOR | 2.13 | 4.06 | 8.03 |

FT: Fat Trees
FHT offers shorter average hop count than Fat Trees
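The averaging formula on this slide (total hop count over all ordered pairs of distinct nodes, divided by $N^2 - N$) can be evaluated by brute force for the Mesh and Torus rows. The sketch below assumes that $H(x, y)$ is the minimal router-to-router distance, which is what dimension-order routing yields on these two topologies; the function names are illustrative. The tree topologies are not modelled here because the slides do not give enough detail to reconstruct their hop counts.

```python
from itertools import product
from typing import Callable, Tuple

Node = Tuple[int, int]


def mesh_hops(a: Node, b: Node, k: int) -> int:
    """Manhattan distance on a k x k mesh."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def torus_hops(a: Node, b: Node, k: int) -> int:
    """Shortest distance on a k x k torus (wrap-around in each dimension)."""
    dx = abs(a[0] - b[0])
    dy = abs(a[1] - b[1])
    return min(dx, k - dx) + min(dy, k - dy)


def average_hop_count(k: int, hops: Callable[[Node, Node, int], int]) -> float:
    """H_ave = sum of H(x, y) over x != y, divided by N**2 - N."""
    nodes = list(product(range(k), repeat=2))
    n = len(nodes)
    total = sum(hops(x, y, k) for x in nodes for y in nodes if x != y)
    return total / (n * n - n)


for k in (4, 8, 16):  # N = 16, 64, 256
    print(k * k,
          round(average_hop_count(k, mesh_hops), 2),
          round(average_hop_count(k, torus_hops), 2))
```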
![Page 37: Performance, Cost, and Energy Evaluation of Fat H-Tree: A Cost-Efficient Tree-Based On-Chip Network Hiroki Matsutani (Keio Univ, JAPAN) Michihiro Koibuchi](https://reader037.vdocument.in/reader037/viewer/2022110213/5697bf9e1a28abf838c947a3/html5/thumbnails/37.jpg)
Wire length of links
• Case studies
– 16-core (1-unit = 3.0mm)
– 64-core (1-unit = 1.5mm)
Flit-width = 32-bit @ 12mm square chip

Utilization rate of wire resources in 2 metal layers (%)

| Topology | N=16 | N=64 |
|---|---|---|
| HT | 1.6% | 3.7% |
| FT1 | 2.1% | 6.4% |
| FT2 | 4.3% | 12.8% |
| FHT | 4.8% | 13.1% |
| Mesh | 1.6% | 3.7% |
| Torus | 3.2% | 7.5% |

Wire length of FHT is almost the same as Fat Tree(2,4,2)
![Page 38: Performance, Cost, and Energy Evaluation of Fat H-Tree: A Cost-Efficient Tree-Based On-Chip Network Hiroki Matsutani (Keio Univ, JAPAN) Michihiro Koibuchi](https://reader037.vdocument.in/reader037/viewer/2022110213/5697bf9e1a28abf838c947a3/html5/thumbnails/38.jpg)
Routings for FHT: Single tree (STR)
• Single tree (STR)
– Select a single tree per packet
– Can’t transit trees
• Dual tree (DTR)
– Transit trees for minimal paths
– VCs are needed
• Torus routing (TOR)
– Use torus formed with rank-1 routers & cores
– VCs are needed
[Figure: Case 1: red tree, 6-hop; Case 2: black tree, 4-hop]
![Page 39: Performance, Cost, and Energy Evaluation of Fat H-Tree: A Cost-Efficient Tree-Based On-Chip Network Hiroki Matsutani (Keio Univ, JAPAN) Michihiro Koibuchi](https://reader037.vdocument.in/reader037/viewer/2022110213/5697bf9e1a28abf838c947a3/html5/thumbnails/39.jpg)
Routings for FHT: Dual tree (DTR)
• Single tree (STR)
– Select a single tree per packet
– Can’t transit trees
• Dual tree (DTR)
– Transit trees for minimal paths
– VCs are needed
• Torus routing (TOR)
– Use torus formed with rank-1 routers & cores
– VCs are needed
[Figure: the red tree is used first, then the black tree]
VC# is increased when a packet transits from red to black
![Page 40: Performance, Cost, and Energy Evaluation of Fat H-Tree: A Cost-Efficient Tree-Based On-Chip Network Hiroki Matsutani (Keio Univ, JAPAN) Michihiro Koibuchi](https://reader037.vdocument.in/reader037/viewer/2022110213/5697bf9e1a28abf838c947a3/html5/thumbnails/40.jpg)
Fat H-Tree: Structure
• Fat H-Tree [Yamada, EUC’04]
– Red Tree (H-Tree)
– Black Tree (H-Tree)
Combining two H-Trees (red & black)
[Figure legend: Router, Core]
Both edges are connected (folded)
Because the black tree is shifted and folded, the connection pattern of the trees differs from that of the original Fat Trees