michihiro koibuchi*, hiroki matsutani** hideharu amano**. timothy mark pinkston*** * national...
TRANSCRIPT
![Page 1: Michihiro Koibuchi*, Hiroki Matsutani** Hideharu Amano**. Timothy Mark Pinkston*** * National Institute of Informatics, Japan/JST, Japan *Keio University,](https://reader034.vdocument.in/reader034/viewer/2022051620/56649ed35503460f94be3347/html5/thumbnails/1.jpg)
Michihiro Koibuchi*, Hiroki Matsutani**Hideharu Amano**. Timothy Mark Pinkston***
*National Institute of Informatics, Japan/JST, Japan*Keio University, Japan
***University of Southern California
A Lightweight Fault-Tolerant Mechanism for Network-on-Chip
![Page 2: Michihiro Koibuchi*, Hiroki Matsutani** Hideharu Amano**. Timothy Mark Pinkston*** * National Institute of Informatics, Japan/JST, Japan *Keio University,](https://reader034.vdocument.in/reader034/viewer/2022051620/56649ed35503460f94be3347/html5/thumbnails/2.jpg)
Background• Improvement of the die
yield– Circuit Level– Architecture Levele.g. Cell Brd. Eng.
• Play Station 3 : 7SPE• HPC-Purpose: 8SPE
• Fault tolerance of the communication on multi-core systems– Lightweight mechanism
Cell Broadband Engine
Ring busesPPE
SPE SPE SPE SPE
SPE SPE SPE SPE
Cell for PS3 (One SPE is disabled)
![Page 3: Michihiro Koibuchi*, Hiroki Matsutani** Hideharu Amano**. Timothy Mark Pinkston*** * National Institute of Informatics, Japan/JST, Japan *Keio University,](https://reader034.vdocument.in/reader034/viewer/2022051620/56649ed35503460f94be3347/html5/thumbnails/3.jpg)
Outline• Fault patterns on Network-on-Chip (NoC)
• Default-backup path mechanism (DBP)– maintains the connectivity of all healthy PEs, even if the
network includes hard faultsObjective
Provide a highly reliable network using lightweight hardware !
• Evaluation– Energy– Amount of Hardware– Throughput
![Page 4: Michihiro Koibuchi*, Hiroki Matsutani** Hideharu Amano**. Timothy Mark Pinkston*** * National Institute of Informatics, Japan/JST, Japan *Keio University,](https://reader034.vdocument.in/reader034/viewer/2022051620/56649ed35503460f94be3347/html5/thumbnails/4.jpg)
Network-on-Chip (NoC)
(*) Kyoto U/VDEC/ASPLA 90nm CMOS
On-chip routerCore• Processor Core– Largest component– Various fault-
tolerant techniques• Resource sparing• Redundancy
• On-Chip Router– Area is not so large.– Infrastructure that
affects on-chip communication
• Duplication
16-Core Architecture
![Page 5: Michihiro Koibuchi*, Hiroki Matsutani** Hideharu Amano**. Timothy Mark Pinkston*** * National Institute of Informatics, Japan/JST, Japan *Keio University,](https://reader034.vdocument.in/reader034/viewer/2022051620/56649ed35503460f94be3347/html5/thumbnails/5.jpg)
Failures in Communication
• Transient Error (e.g. bit error) – Software layer is responsible, and recoverable
• Link-to-link, and/or end-to-end [Murali,DToC05]• Error detection and/or error correction (e.g. CRC)
• Permanent Error (e.g. hard error)– System avoids using the failed modules
PE PERouter
Router
Hard error!
0100110 0100010
Bit error
![Page 6: Michihiro Koibuchi*, Hiroki Matsutani** Hideharu Amano**. Timothy Mark Pinkston*** * National Institute of Informatics, Japan/JST, Japan *Keio University,](https://reader034.vdocument.in/reader034/viewer/2022051620/56649ed35503460f94be3347/html5/thumbnails/6.jpg)
Router Architecture• Speculative Router [Kim ISCA06]
– Providing fault-tolerance at input buffer, routing computation, and switch allocation unit.
• Dependability for misrouted packets [Thottethodi IPDPS03]
• Channel Reconfiguration[DallyText03, Soteriou ICD04]
Routing Paths• Resource Sparing• Dynamic Reconfiguration• Fault-Tolerant Routing
Each Technique is resilient for portion of possible failures.
- Using them together enables high reliability! But, how about simplicity?
- Hard to recover crossbar failures
Existing NoC Fault-tolerant Techniques
![Page 7: Michihiro Koibuchi*, Hiroki Matsutani** Hideharu Amano**. Timothy Mark Pinkston*** * National Institute of Informatics, Japan/JST, Japan *Keio University,](https://reader034.vdocument.in/reader034/viewer/2022051620/56649ed35503460f94be3347/html5/thumbnails/7.jpg)
Outline• Fault patterns on Network-on-Chip (NoC)
• Default-backup path mechanism (DBP)– maintains the connectivity of all healthy PEs, even if the
network includes hard faultsObjective
Provide a highly reliable network using lightweight hardware !
• Evaluation– Energy– Amount of Hardware– Throughput
![Page 8: Michihiro Koibuchi*, Hiroki Matsutani** Hideharu Amano**. Timothy Mark Pinkston*** * National Institute of Informatics, Japan/JST, Japan *Keio University,](https://reader034.vdocument.in/reader034/viewer/2022051620/56649ed35503460f94be3347/html5/thumbnails/8.jpg)
Motivation
On-chip routerCore• NoC Component– Router, Link Failure
• disabling healthy local PEs
• Segmentation of the network
– NI Failure• Disabling the healthy
local PE
Disabled
The proposed lightweight fault-tolerant technique on a router
maintains network connectivity of all the healthy PEs
Unlike off-chip systems, a faulty module cannot be removed and
replaced
Disabled healthy PE
16-Core Architecture
![Page 9: Michihiro Koibuchi*, Hiroki Matsutani** Hideharu Amano**. Timothy Mark Pinkston*** * National Institute of Informatics, Japan/JST, Japan *Keio University,](https://reader034.vdocument.in/reader034/viewer/2022051620/56649ed35503460f94be3347/html5/thumbnails/9.jpg)
Conventional NoC Router ( 2-D mesh )
• 5-by-5 Router, channel bit-width (flit size) 64-bit
5x5 XBAR
ARBITER
FIFO
FIFO
FIFO
FIFO
FIFOX+
X-
Y+
Y-
CORE
X+
X-
Y+
Y-
CORE
Each input buffer has two
VCs(2x64-bit x 4)
Area (after place and route) is 40 ~ 45 [KGate]; 75% is FIFO[Matsutani.ASP-DAC08]
Each module may fail.
Duplication of all the input ports is too expensive.
![Page 10: Michihiro Koibuchi*, Hiroki Matsutani** Hideharu Amano**. Timothy Mark Pinkston*** * National Institute of Informatics, Japan/JST, Japan *Keio University,](https://reader034.vdocument.in/reader034/viewer/2022051620/56649ed35503460f94be3347/html5/thumbnails/10.jpg)
Minimum Requirements for Communication
5x5 XBAR
ARBITER
FIFO
FIFO
FIFO
FIFO
FIFOX+
X-
Y+
Y-
CORE
X+
X-
Y+
Y-
CORE
• It should send packets to at least one output port
• It should receive packets from at least one input port
To communicate a local core with neighboring cores,
![Page 11: Michihiro Koibuchi*, Hiroki Matsutani** Hideharu Amano**. Timothy Mark Pinkston*** * National Institute of Informatics, Japan/JST, Japan *Keio University,](https://reader034.vdocument.in/reader034/viewer/2022051620/56649ed35503460f94be3347/html5/thumbnails/11.jpg)
Default-backup Path(DBP) Mechanism
5x5 XBAR
ARBITER
FIFO
FIFO
FIFO
FIFO
FIFOX+
X-
Y+
Y-
CORE
X+
X-
Y+
Y-
CORE
• A local core can send packets to at least one output port
• A local core can receive packets from at least one input port
![Page 12: Michihiro Koibuchi*, Hiroki Matsutani** Hideharu Amano**. Timothy Mark Pinkston*** * National Institute of Informatics, Japan/JST, Japan *Keio University,](https://reader034.vdocument.in/reader034/viewer/2022051620/56649ed35503460f94be3347/html5/thumbnails/12.jpg)
5x5 XBAR
ARBITER
FIFO
FIFO
FIFO
FIFO
FIFOX+
X-
Y+
Y-
CORE
X+
X-
Y+
Y-
CORE
Failure
HeadTail
Body
Default-backup Path(DBP) Mechanism
• A local core can send packets to at least one output port
• A local core can receive packets from at least one input port
![Page 13: Michihiro Koibuchi*, Hiroki Matsutani** Hideharu Amano**. Timothy Mark Pinkston*** * National Institute of Informatics, Japan/JST, Japan *Keio University,](https://reader034.vdocument.in/reader034/viewer/2022051620/56649ed35503460f94be3347/html5/thumbnails/13.jpg)
• Cores can communicate with each other, even if router modules fail
• maintain packet transfers from X- direction, o X+ direction
ARBITER
FIFO
FIFO
FIFO
FIFO
FIFOX+
X-
Y+
Y-
CORE
X+
X-
Y+
Y-
CORE
Failure
Behavior of the DBP Mechanism (within a Router)
![Page 14: Michihiro Koibuchi*, Hiroki Matsutani** Hideharu Amano**. Timothy Mark Pinkston*** * National Institute of Informatics, Japan/JST, Japan *Keio University,](https://reader034.vdocument.in/reader034/viewer/2022051620/56649ed35503460f94be3347/html5/thumbnails/14.jpg)
Behavior of the DBP Mechanism ( bypassing Xbar and NI faults )
5x5 XBAR
ARBITER
FIFO
FIFO
FIFO
FIFO
FIFOX+
X-
Y+
Y-
CORE
X+
X-
Y+
Y-
CORE
Failure
Using 3:1 Mux instead of 2:1 mux
![Page 15: Michihiro Koibuchi*, Hiroki Matsutani** Hideharu Amano**. Timothy Mark Pinkston*** * National Institute of Informatics, Japan/JST, Japan *Keio University,](https://reader034.vdocument.in/reader034/viewer/2022051620/56649ed35503460f94be3347/html5/thumbnails/15.jpg)
Another Issue: Network Connectivity
16-Core Architecture
On-chip routerCore
The DBP mechanism provides reliability not only on intra-router datapath but also
on routing paths
• Router, link failure– Disabling healthy local
PEs– Segmentation of the
networks• may disable all the PEs
Dividing into two regions!
![Page 16: Michihiro Koibuchi*, Hiroki Matsutani** Hideharu Amano**. Timothy Mark Pinkston*** * National Institute of Informatics, Japan/JST, Japan *Keio University,](https://reader034.vdocument.in/reader034/viewer/2022051620/56649ed35503460f94be3347/html5/thumbnails/16.jpg)
DBP Mechanism (inter-router behavior)
Router
(omit PEs)
5x5 XBAR
ARBITER
FIFO
FIFO
FIFO
FIFO
FIFOX+
X-
Y+
Y-
CORE
X+
X-
Y+
Y-
CORE
Set the DBP ports along a unidirectional embedded ring topology
Y-
X- X+
Y+ Router
![Page 17: Michihiro Koibuchi*, Hiroki Matsutani** Hideharu Amano**. Timothy Mark Pinkston*** * National Institute of Informatics, Japan/JST, Japan *Keio University,](https://reader034.vdocument.in/reader034/viewer/2022051620/56649ed35503460f94be3347/html5/thumbnails/17.jpg)
Routing Bypasses Faults (e.g., failed crossbar)
Router
Default-backup path is used only at the faulty port
The corresponding network graph
A unidirectional channel on a link
Link
![Page 18: Michihiro Koibuchi*, Hiroki Matsutani** Hideharu Amano**. Timothy Mark Pinkston*** * National Institute of Informatics, Japan/JST, Japan *Keio University,](https://reader034.vdocument.in/reader034/viewer/2022051620/56649ed35503460f94be3347/html5/thumbnails/18.jpg)
DBP Applied to Up*/Down* Routing
S
D
Up*/Down* routing
The router has only a single output port
Existing deadlock-free routing cannot provide the network connectivity, due to the directional routing restrictions
Up
Down DownUp
![Page 19: Michihiro Koibuchi*, Hiroki Matsutani** Hideharu Amano**. Timothy Mark Pinkston*** * National Institute of Informatics, Japan/JST, Japan *Keio University,](https://reader034.vdocument.in/reader034/viewer/2022051620/56649ed35503460f94be3347/html5/thumbnails/19.jpg)
Virtual channel (VC) transition
DBP Routing Mechanism
X
Turn Model[Glass,1992] X
• Guaranteeing deadlock-freedom and connectivity by imposing routing restrictions• Allows packet transfer
along the DBP ring• Allows VC transitions in
increasing order• Uses existing deadlock-
free routing within every virtual-channel network
The Idea is similar to the SAN routing [koibuchi,ICPP03]
We propose a new routing strategy for NoCs with directional routing restrictions!
![Page 20: Michihiro Koibuchi*, Hiroki Matsutani** Hideharu Amano**. Timothy Mark Pinkston*** * National Institute of Informatics, Japan/JST, Japan *Keio University,](https://reader034.vdocument.in/reader034/viewer/2022051620/56649ed35503460f94be3347/html5/thumbnails/20.jpg)
Outline• Fault patterns on Network-on-Chip (NoC)
• Default-backup path mechanism (DBP)– maintains the connectivity of all healthy PEs, even if the
network includes hard faultsObjective
Provide a highly reliable network using lightweight hardware !
• Evaluation– Energy– Amount of Hardware– Throughput
![Page 21: Michihiro Koibuchi*, Hiroki Matsutani** Hideharu Amano**. Timothy Mark Pinkston*** * National Institute of Informatics, Japan/JST, Japan *Keio University,](https://reader034.vdocument.in/reader034/viewer/2022051620/56649ed35503460f94be3347/html5/thumbnails/21.jpg)
Energy: NoC Energy Model
• Ave. flit energy:– Send 1-flit to destination– How much energy[J] ?
• Simulation parameters– 6/12mm square chip
(16/64 cores)– 90nm CMOS
flitE
)( linkswaveflit EEHwE
[Wang, DATE’05]
12mm
![Page 22: Michihiro Koibuchi*, Hiroki Matsutani** Hideharu Amano**. Timothy Mark Pinkston*** * National Institute of Informatics, Japan/JST, Japan *Keio University,](https://reader034.vdocument.in/reader034/viewer/2022051620/56649ed35503460f94be3347/html5/thumbnails/22.jpg)
Energy Consumption
16 cores
As the number of faulty links increases, DBP gracefully increases the energy, due to the increased hop counts
64 cores
almost constant!
![Page 23: Michihiro Koibuchi*, Hiroki Matsutani** Hideharu Amano**. Timothy Mark Pinkston*** * National Institute of Informatics, Japan/JST, Japan *Keio University,](https://reader034.vdocument.in/reader034/viewer/2022051620/56649ed35503460f94be3347/html5/thumbnails/23.jpg)
Amount of Hardware
Area is increased by at most only 11.1% (the 2-VC case)
Total router area of 2-D meshRouter area with various # of ports.
The ratio of additional HW is decreased, as # of ports increases.
![Page 24: Michihiro Koibuchi*, Hiroki Matsutani** Hideharu Amano**. Timothy Mark Pinkston*** * National Institute of Informatics, Japan/JST, Japan *Keio University,](https://reader034.vdocument.in/reader034/viewer/2022051620/56649ed35503460f94be3347/html5/thumbnails/24.jpg)
Performance Evaluation• Network simulation
– Throughput and latency– 16 cores and 64 cores
• Topology – 2-D mesh
• Traffic pattern– Random (as a baseline)
Packet size 16-flit (1-flit header)
Buffer size 1-flit per channel
Switching Wormhole switching
# of VCs 2
Min latency 3-cycle per router
![Page 25: Michihiro Koibuchi*, Hiroki Matsutani** Hideharu Amano**. Timothy Mark Pinkston*** * National Institute of Informatics, Japan/JST, Japan *Keio University,](https://reader034.vdocument.in/reader034/viewer/2022051620/56649ed35503460f94be3347/html5/thumbnails/25.jpg)
Throughput and Latency
Topology is changed from 2-D mesh (no faults) to ring at 48 /224 faults on 16/64 cores
16 cores 64 cores
Throughput is decreased by the increased path hops.
![Page 26: Michihiro Koibuchi*, Hiroki Matsutani** Hideharu Amano**. Timothy Mark Pinkston*** * National Institute of Informatics, Japan/JST, Japan *Keio University,](https://reader034.vdocument.in/reader034/viewer/2022051620/56649ed35503460f94be3347/html5/thumbnails/26.jpg)
Extensions of DBP Mechanism• Faults within the DBP itself and various ports
– Partially duplication– Multiple embedded DBP rings
• Another approach– To improve the latency, a healthy router
enables the DBP
ARBITER
FIFO
FIFO
FIFO
FIFO
FIFOX+
X-
Y+
Y-
CORE
X+
X-
Y+
Y-
CORE
RouterLink faults
Datapath via no crossbar
![Page 27: Michihiro Koibuchi*, Hiroki Matsutani** Hideharu Amano**. Timothy Mark Pinkston*** * National Institute of Informatics, Japan/JST, Japan *Keio University,](https://reader034.vdocument.in/reader034/viewer/2022051620/56649ed35503460f94be3347/html5/thumbnails/27.jpg)
Conclusions• We proposed a lightweight fault-tolerant mechanism,
DBP, for NoCs (architecture level)– Resilient for hardware faults of both intra-router modules
and routing paths– A new routing strategy was developed
– The idea is applicable to various NoC architectures• As well as regular topologies
• Evaluation– Energy consumption
• almost constant by up to 40 faults (64 cores)– Amount of Hardware
• increasing by at most only 11.1% – Throughput performance
• decreasing by the increased path hops
• The DBP serves the role of “lifeline” to increase the lifetime of NoCs