
Page 1: Doctor Dissertation Final Review High Throughput ...web-ext.u-aizu.ac.jp/~benab/publications/theses/Akram...Jan. 14, 2015 Doctor Dissertation Final Review Related work 24 –Vicis

High Throughput Architecture and Routing Algorithms

Towards the Design of Reliable Mesh-based Many-Core Network-on-Chip Systems

Graduate School of Computer Science and Engineering

Adaptive Systems Laboratory

Jan. 14, 2015 Doctor Dissertation Final Review

d8141104, Akram Ben Ahmed

Supervised by Prof. Abderazek Ben Abdallah

Page 2

Outline

• Background

• Research motivation

• Research goal and contributions

• Related work

• Graceful fault-tolerant routing algorithms

• Reliable router architecture and design

• Evaluation

• Conclusion and discussion



Page 4

• System-on-Chip (SoC)

– Required components are integrated on a single chip.

– A different LSI must be developed for each application.

• System-in-Package (SiP) or 3D IC

– Required components are stacked for each application.

The design cost of LSIs is increasing.

Fig.1 System-in-Package [NOCs2014]

Fig.2 Computation power scaling [SoCPaR 2014]

Page 5

Era of Many-core Chips

• To keep up with the demand for computational power, we need to:

– Increase parallelism (ILP/TLP/CLP).

– Provide an efficient, low-power interconnect infrastructure that achieves better scalability, bandwidth, and reliability.

• Gate delay

– Continuous decrease

• Interconnect delay

– Exponential increase

• Constant increase in the number of cores (from multi-core to many-core)

Fig.3 Cores’ number scaling [SoCPaR 2014]

Fig.4 Gate and interconnect delay over time [VLSI2005]

Page 6

Circuit switching

Fig.5 Communication overhead in circuit-switching-based interconnect (point-to-point and bus-based; the timing diagram shows the header probe, acknowledgment, and data phases, with tsetup followed by tdata)

tr = routing time

ts = setup time

Limited bandwidth and significant power overhead due to the long path-setup latency.

M: Memory. P: Processing Element. IO: Input/output peripheral.

Page 7

Network-on-Chip

Fig. 6 Network-on-Chip architecture (a packet is divided into flits: a head flit, body flits, and a tail flit; figure labels: carried information, payload, ending flit)

Multihop communication: every flit is received, buffered, and transmitted (Receive -> Buffer -> Transmit) at every switch.

R: Router. NI: Network interface. PE: Processing Element.

Fig. 7 Conventional router architecture
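The packet/flit split in Fig. 6 can be illustrated with a minimal Python sketch. The flit format is an assumption made only for illustration: the hypothetical `packetize` helper puts a 2-byte destination address in the head flit, fixed-size payload chunks in the body flits, and the last chunk in the tail flit that ends the packet.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Flit:
    kind: str       # "head", "body", or "tail"
    payload: bytes


def packetize(dest: int, payload: bytes, flit_bytes: int) -> List[Flit]:
    """Split one packet into head, body, and tail flits (hypothetical format)."""
    # Fixed-size payload chunks; an empty payload still yields one (empty) tail flit.
    chunks = [payload[i:i + flit_bytes] for i in range(0, len(payload), flit_bytes)] or [b""]
    flits = [Flit("head", dest.to_bytes(2, "big"))]   # head: routing information
    flits += [Flit("body", c) for c in chunks[:-1]]   # body: payload
    flits.append(Flit("tail", chunks[-1]))            # tail: ends the packet
    return flits


for f in packetize(dest=5, payload=b"abcdefgh", flit_bytes=3):
    print(f.kind, f.payload)
```

In a wormhole router, each of these flits is then received, buffered, and transmitted separately at every hop.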

Page 8

Network-on-Chip

Fig. 8 Mesh topology

Fig. 9 Torus topology

Fig. 10 Store & Forward switching

Fig. 11 Wormhole switching

Fig. 12 Routing minimality

Fig. 13 Routing locality

Fig. 14 Routing adaptivity

Fig. 15 Credit-based flow control

Fig. 16 ACK/NACK flow control
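The credit-based flow control of Fig. 15 can be sketched as follows, assuming the textbook scheme in which the upstream router holds one credit per free slot of the downstream input buffer. The `CreditLink` class is a hypothetical single-link model without timing, not the router's actual implementation.

```python
from collections import deque
from typing import Deque


class CreditLink:
    """One upstream->downstream link under credit-based flow control (hypothetical model)."""

    def __init__(self, buffer_depth: int) -> None:
        self.credits = buffer_depth            # one credit per free downstream buffer slot
        self.downstream: Deque[str] = deque()  # flits held in the downstream input buffer

    def try_send(self, flit: str) -> bool:
        if self.credits == 0:                  # no free slot downstream: the sender stalls
            return False
        self.credits -= 1
        self.downstream.append(flit)
        return True

    def downstream_forward(self) -> str:
        # The downstream router forwards its oldest flit, freeing a slot,
        # and returns one credit to the upstream router.
        flit = self.downstream.popleft()
        self.credits += 1
        return flit


link = CreditLink(buffer_depth=2)
print([link.try_send(f) for f in ("a", "b", "c")])  # [True, True, False]
```

Because the sender never transmits without a credit, the downstream buffer cannot overflow and no flit has to be dropped.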

Page 9

Network-on-Chip

Fig. 17 3D-Network-on-Chip architecture (an 8×8 2D mesh compared with a 4×4×4 3D mesh of four layers, Layer1 to Layer4; routers are addressed NM in decimal; lateral links connect routers within a layer and vertical links connect the layers)

Wire length reduction: the a × b 2D layout shrinks to a/2 × b/2 in 3D.

Footprint reduction (2D vs. 3D).

c = die thickness (0.6 mm) + inter-die distance; length ranges shown in the figure: 1 mm ~ 4 mm and 10 μm ~ 200 μm.

Page 10

Network-on-Chip

Fig. 17 3D-Network-on-Chip architecture (the longest minimal path spans 14 hops in the 8×8 2D mesh and 9 hops in the 4×4×4 3D mesh)

Diameter reduction: the diameter is the number of hops that a flit traverses along the longest possible minimal path between a (source, destination) pair.

Packet energy reduction:

n = the number of flits

h = the number of hops
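The 14-hop and 9-hop figures follow directly from the diameter definition: with minimal routing, the worst-case pair sits at opposite corners, so a mesh with k routers along a dimension contributes k - 1 hops in that dimension. A short Python check, assuming plain (non-torus) meshes:

```python
import math
from typing import Sequence


def mesh_diameter(dims: Sequence[int]) -> int:
    """Hops on the longest minimal path of a mesh with dims[i] routers per dimension."""
    # Opposite corners differ by (k - 1) positions in every dimension of size k.
    return sum(k - 1 for k in dims)


def router_count(dims: Sequence[int]) -> int:
    return math.prod(dims)


print("2D 8x8:  ", mesh_diameter([8, 8]), "hops,", router_count([8, 8]), "routers")
print("3D 4x4x4:", mesh_diameter([4, 4, 4]), "hops,", router_count([4, 4, 4]), "routers")
```

Both networks connect 64 routers, but the 3D arrangement cuts the worst-case path from 14 to 9 hops.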

Page 11

Network-on-Chip

Fig. 17 3D-Network-on-Chip architecture

Packet latency reduction:

Lp = Lsender + Ltransport + Lreceiver

The transport latency is tightly dependent on the latency overhead of each hop, so the diameter reduction also shortens the worst-case packet latency.
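To see how the diameter feeds into Lp, the sketch below assumes, purely for illustration, that Ltransport equals the number of hops times a constant per-hop latency. Contention and flit serialization are ignored, and the 3-cycle per-hop value and the 2-cycle sender/receiver latencies are hypothetical numbers.

```python
def packet_latency(l_sender: float, hops: int, per_hop: float, l_receiver: float) -> float:
    """Lp = Lsender + Ltransport + Lreceiver, with Ltransport simplified to hops * per_hop."""
    l_transport = hops * per_hop   # assumed constant cost per hop (router + link)
    return l_sender + l_transport + l_receiver


# Worst-case (diameter) paths of the 8x8 and 4x4x4 meshes, 3 cycles per hop (assumed).
print(packet_latency(2, 14, 3, 2))  # 46
print(packet_latency(2, 9, 3, 2))   # 31
```

Under this simplification, any reduction in per-hop overhead is multiplied by the hop count, which is why the latency of each hop matters so much.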

Page 12

Research motivation

• NoCs are exposed to a variety of manufacturing and design factors that make them vulnerable to different types of faults (permanent, intermittent, and transient).

• Because the NoC is the sole communication medium, it is a single point of failure, which raises a major concern for system reliability.

• Fault tolerance has become imperative for the reliability of many-core systems, and its importance keeps growing as technology scales, especially in NoCs.

Page 13

Research motivation

• There is a need for architectures that combine fault-tolerance and performance aspects in an adaptive manner, adapting to different run-time environments.

• A lack of reliability shows up as corrupted message delivery, unmet timing requirements, or sometimes even the collapse of the entire system.

• Reliability becomes crucial in hard real-time (aeronautics, energy plants) and high-precision calculation (disaster simulation, bio-medical) applications.

Page 14

Research goal and contributions

• To avoid costly packet retransmission, increase chip yield, and ensure the success of the interconnect, I propose in this research a high-throughput architecture and routing algorithms for reliable Network-on-Chip designs:

- Graceful fault-tolerant routing algorithms

- Reliable router architecture

- Hardware design and evaluation

Page 15

Research goal and contributions

1. Graceful fault-tolerant routing algorithms:

– Look-Ahead-Fault-Tolerant (LAFT) [Jnl2]: a high-throughput and lightweight fault-tolerant routing algorithm that handles faults in inter-layer and intra-layer links.

– Hybrid-Look-Ahead-Fault-Tolerant (HLAFT) [Jnl1]: combines local and look-ahead routing to further enhance the router's throughput under worst-case fault scenarios and make the performance degradation as graceful as possible.

Page 16

Research goal and contributions

2. Reliable router architecture:

– Random-Access-Buffer (RAB) [Jnl1,Conf2]:

Recovers from deadlock at low cost without the need for virtual channels.

Efficiently manages correct flit buffering in the presence of failures in the input buffers.

– Traffic-Prediction-Unit (TPU): supports RAB for further performance enhancement by providing an alternative to faulty or congested buffers, relieving the traffic overhead.

– Bypass-Link-on-Demand (BLoD) [Conf1]: ensures fault tolerance in the crossbar links by allocating the appropriate, minimal number of escape channels without considerable area or power overhead.

Page 17

Research goal and contributions

3. Hardware design and evaluation of a 3D-NoC system based on a reliable router:

- Hardware synthesis

- Performance evaluation over various benchmarks: latency, throughput, and reliability

Page 18

Outline

• Background

• Research motivation

• Research goal and contributions

• Related work

• Graceful fault-tolerant routing algorithms

• Reliable router architecture and design

• Evaluation

• Conclusion and discussion

Page 19

Related work

• Faults can be classified by their frequency of occurrence:

– Transient faults: they occur and remain in the system for a limited period of time before disappearing.

– Intermittent faults: transient faults that recur from time to time.

– Permanent faults: they start at a particular time and remain in the system until they are repaired.

• Previous work has tackled faults with two main approaches:

– Routing approach

– Architectural approach

Page 20

Related work

• Routing approach:

– Routing table [Feng 2001]: uses a local routing table for 2D routing and a vector that stores the failure status of the vertical links. Drawback: area and power overhead.

– 4NP-FIRST [Pasricha 2011]: non-minimal fault-tolerant routing for deadlock avoidance. Drawback: additional latency.

– AFRA [DATE 2012]: adaptive routing that considers permanent faults in vertical links only. Drawback: lack of reliability.

– HamFA [DATE 2013]: adaptive fault-tolerant routing based on a Hamiltonian path search algorithm. Drawback: restrictions on fault placement.

Page 21

Related work

– Adaptive-Z [Rahmani 2012]: targeted at Hybrid-3D-NoC. Drawback: not scalable.

– Planar Adaptive Routing (PAR) [Chien 1992]: relies on registers that store fault information and on Virtual Channels (VCs).

– 3D Minimum-Connected-Component (MCC) [Jiang 2008]: an optimized version of PAR. Drawback: area and latency overhead.

• Architectural approach:

– BulletProof [Constantinides 2006]: uses the N-modular redundancy (NMR) technique to replicate the target component. Drawback: very high area and power overhead.

Page 22

Related work

– RoCo [Kim 2006]: decomposes the router using parallel arbiters and small crossbars to provide fault tolerance in the Virtual-channel Allocation (VCA), Switch Allocation (SA), and Routing Computation (RC) stages. It does not consider faults in the input buffer or the crossbar.

– Minimal correction circuitry [Poluri2013]: provides fault tolerance in VCA, SA, and CT by sharing VCs and using redundant crossbar links. It does not consider faults in the buffer. Drawback: lack of reliability.

Page 23

Related work

– Vicis [DeOrio 2012]: a “flexible-fifo” is proposed to handle permanent faults in buffer entries, and a “Crossbar-Bypass-Bus” handles faults in the crossbar. Drawbacks: it does not consider transient and intermittent faults; sharing the single Crossbar-Bypass-Bus adds latency overhead; and its routing algorithm is non-minimal.

Page 24

Outline

• Background

• Research motivation

• Research goal and contributions

• Related work

• Graceful fault-tolerant routing algorithms

• Reliable router architecture and design

• Evaluation

• Conclusion and discussion


Page 25

Look-Ahead-Fault-Tolerant: fault information exchange

• 1-bit fault detection signal: issued when a fault is detected in the incoming flit.

• 6-bit fault information signal: sent to all neighboring nodes, giving the status of the router's links in each direction.

• 6 bits, not 7: we assume that the link connecting the Local port is always valid and cannot be faulty.

• 36 bits of fault information are therefore required for each router (6 bits from each of its 6 neighbors: East, West, North, South, Up, and Down).

Fig. 18 Fault information exchange
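The 36-bit figure is simply 6 neighbors × 6 bits. A minimal sketch of how a router could encode its 6-bit link-status word and pack the six words it receives into one `Fault_in` vector; the direction order and bit positions are hypothetical, since the slides do not specify them.

```python
from typing import Dict, Iterable

# Hypothetical direction order; bit i of a 6-bit word refers to DIRECTIONS[i].
DIRECTIONS = ("North", "East", "South", "West", "Up", "Down")


def encode_link_status(faulty_links: Iterable[str]) -> int:
    """6-bit word sent to every neighbor: bit set = link faulty (Local port omitted)."""
    word = 0
    for d in faulty_links:
        word |= 1 << DIRECTIONS.index(d)
    return word


def pack_fault_in(words_from_neighbors: Dict[str, int]) -> int:
    """Concatenate the 6-bit words received from the six neighbors into 36 bits."""
    fault_in = 0
    for i, neighbor in enumerate(DIRECTIONS):
        fault_in |= (words_from_neighbors.get(neighbor, 0) & 0x3F) << (6 * i)
    return fault_in


def neighbor_link_faulty(fault_in: int, neighbor: str, link: str) -> bool:
    word = (fault_in >> (6 * DIRECTIONS.index(neighbor))) & 0x3F
    return bool((word >> DIRECTIONS.index(link)) & 1)


# The East neighbor reports that its own North link is faulty.
fault_in = pack_fault_in({"East": encode_link_status(["North"])})
print(fault_in.bit_length() <= 36, neighbor_link_faulty(fault_in, "East", "North"))
```

With this vector, a router can check the link status one hop ahead, which is exactly the information look-ahead routing needs.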

Page 26

Look-Ahead-Fault-Tolerant: algorithm

Fig. 19 Look-Ahead-Fault-Tolerant flowchart

Fig. 20 Look-Ahead-Fault-Tolerant pseudo-code

Page 27

Look-Ahead-Fault-Tolerant: Example

Fig. 21 Look-Ahead-Fault-Tolerant example (legend: faulty link, valid link; S: source node, D: destination node, C: current node, N: next node; current out-port and next out-port)

1- The current out-port is read from the flit and the next-node address is computed.

2- The three possible directions are calculated: North, East, and Up.

3- The link status of the three directions is verified, and the faulty path (East) is eliminated.

4- The diversity value of each remaining direction is calculated and the highest one is selected: North = 3 (North, East, and Up); Up = 2 (North and East). North is therefore selected.
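The four steps above can be sketched in Python as follows. This is an illustrative model, not the published LAFT pseudo-code: coordinates are (x, y, z) with East = +x, North = +y, Up = +z (assumed); `faulty_at_next` is the set of faulty output links at the next node, as learned through the fault information exchange; and diversity is taken to be the number of minimal directions still available after the candidate hop, which reproduces the North = 3 / Up = 2 values of the example. The coordinates in the usage line are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

Coord = Tuple[int, int, int]

# Assumed axis convention: East = +x, North = +y, Up = +z.
OFFSETS: Dict[str, Coord] = {
    "East": (1, 0, 0), "West": (-1, 0, 0),
    "North": (0, 1, 0), "South": (0, -1, 0),
    "Up": (0, 0, 1), "Down": (0, 0, -1),
}
AXES = (("East", "West"), ("North", "South"), ("Up", "Down"))


def step(node: Coord, port: str) -> Coord:
    dx, dy, dz = OFFSETS[port]
    return (node[0] + dx, node[1] + dy, node[2] + dz)


def possible_directions(node: Coord, dest: Coord) -> List[str]:
    """Minimal directions: one per dimension not yet aligned with the destination."""
    dirs = []
    for axis, (pos, neg) in enumerate(AXES):
        if node[axis] < dest[axis]:
            dirs.append(pos)
        elif node[axis] > dest[axis]:
            dirs.append(neg)
    return dirs


def laft_next_port(current: Coord, out_port: str, dest: Coord,
                   faulty_at_next: Set[str]) -> Optional[str]:
    nxt = step(current, out_port)                               # 1- next-node address
    candidates = possible_directions(nxt, dest)                 # 2- possible directions
    if not candidates:
        return "Local"                                          # next node is the destination
    valid = [d for d in candidates if d not in faulty_at_next]  # 3- drop faulty links
    if not valid:
        return None  # blocked; the turn-back/non-minimal fallback is not modeled
    # 4- highest diversity wins; max() keeps the first candidate on a tie
    return max(valid, key=lambda d: len(possible_directions(step(nxt, d), dest)))


print(laft_next_port((0, 0, 0), "North", (2, 3, 1), {"East"}))  # North
```

Preferring the highest-diversity direction keeps as many minimal alternatives open as possible for the following hop.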

Page 28

Related publications to Look-Ahead-Fault-Tolerant

• LAFT showed its ability to reduce latency by an average of 32%.

• As the number of faults increases, the performance of LAFT is no longer optimal:

– LAFT only receives fault information within a single-hop range.

• The LAFT algorithm was published in:

– A. Ben Ahmed and A. Ben Abdallah, “Architecture and Design of High-throughput, Low-latency and Fault Tolerant Routing Algorithm for 3D-Network-on-Chip,” The Journal of Supercomputing, 66(3): 1507-1532, December 2013.

– A. Ben Ahmed and A. Ben Abdallah, “Deadlock-Recovery Support for Fault-tolerant Routing Algorithms in 3D-NoC Architectures,” The IEEE 7th International Symposium on Embedded Multicore SoCs (MCSoC-13), pp. 67-72, Tokyo, Japan, September 26-28, 2013.

Page 29

Look-Ahead-Fault-Tolerant: limitations

- Local routing is sometimes preferable because it provides more information about the fault status.

- Look-ahead routing may not provide enough information for routing.

- In the worst case, the route selection might lead to a blocked path, where a turn back or a non-minimal path selection is required.

Fig. 22 Example of Look-Ahead-Fault-Tolerant limitations (worst case; S: source node, C: current node, N: next node, D: destination node)

The solution is to combine look-ahead and local routing to enhance performance.

Page 30

Hybrid-Look-Ahead-Fault-Tolerant: algorithm

Fig. 23 Hybrid-Look-Ahead-Fault-Tolerant flowchart: Start -> Read faults -> Calculate next-node -> Compute 3 possible directions -> Loc-rout-enb? (Yes: Local routing; No: LAFT routing) -> End

Fig. 24 Hybrid-Look-Ahead-Fault-Tolerant pseudo-code

Page 31

Router architecture

Fig.25 3D-OASIS-router architecture (seven input ports: Local, North, East, South, West, Up, and Down; pipeline stages: Buffer Writing (BW), Routing Computation/Switch Allocation (RC/SA), and Crossbar Traversal (CT))

Page 32

Hybrid-Look-Ahead-Fault-Tolerant: architecture

Fig.26 Simplified block diagram of the input-port circuit (modules: FIFO, LAFT, LocalFT routing, sw_req controller, and Fault controller; signals with widths in bits: data_in (77), Fault_in (36), Next (3) + dest (9), New_next (3), sw_req (3), fault (6), Out_port (3), and Loc-rout-enb)

Path not blocked: Loc-rout-enb = 0 (LAFT routing)

Page 33

Hybrid-Look-Ahead-Fault-Tolerant: architecture

Fig.26 Simplified block diagram of the input-port circuit

Path blocked: Loc-rout-enb = 1 (local routing)

Page 34

Hybrid-Look-Ahead-Fault-Tolerant: Example

Fig.27 Hybrid-Look-Ahead-Fault-Tolerant example (steps: read destination and next port -> compute next node -> compute 3 possible directions -> check faults -> decide routing)

- Destination node: (2,2,2)

- Current node: (1,0,1)

- Next_port: North

• Current node: (1,0,1) & Next_port: North

– Next_node_x = cur_x

– Next_node_y = cur_y + 1

– Next_node_z = cur_z

– Next_node = (1,1,1)

• Next_node = (1,1,1) & dest = (2,2,2)

– next_x < dest_x: Possible_x = East

– next_y < dest_y: Possible_y = North

– next_z < dest_z: Possible_z = Up

• Fault = 010011

– Possible_x = faulty

– Possible_y = faulty

– Possible_z = faulty

• All three possible directions are faulty, so the path is blocked: Loc-rout-enb = 1 and local routing is used.
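The worked example can be replayed with a short Python sketch of the Loc-rout-enb decision. It assumes that the fault word describes the links of the next node and that local routing is enabled only when every possible minimal direction is faulty. The MSB-first bit order of the 6-bit word is a hypothetical choice, made so that 010011 decodes to East, North, and Up, because the slides do not give the real order.

```python
from typing import Dict, List, Set, Tuple

Coord = Tuple[int, int, int]

OFFSETS: Dict[str, Coord] = {
    "East": (1, 0, 0), "West": (-1, 0, 0),
    "North": (0, 1, 0), "South": (0, -1, 0),
    "Up": (0, 0, 1), "Down": (0, 0, -1),
}
AXES = (("East", "West"), ("North", "South"), ("Up", "Down"))

# Hypothetical MSB-first order of the 6-bit fault word (not given in the slides).
FAULT_BIT_ORDER = ("Down", "Up", "West", "South", "East", "North")


def decode_fault_word(bits: str) -> Set[str]:
    return {d for d, b in zip(FAULT_BIT_ORDER, bits) if b == "1"}


def possible_directions(node: Coord, dest: Coord) -> List[str]:
    dirs = []
    for axis, (pos, neg) in enumerate(AXES):
        if node[axis] < dest[axis]:
            dirs.append(pos)
        elif node[axis] > dest[axis]:
            dirs.append(neg)
    return dirs


def hlaft_loc_rout_enb(current: Coord, next_port: str, dest: Coord,
                       fault_bits: str) -> Tuple[Coord, List[str], int]:
    dx, dy, dz = OFFSETS[next_port]
    nxt = (current[0] + dx, current[1] + dy, current[2] + dz)  # compute next node
    dirs = possible_directions(nxt, dest)                       # up to 3 directions
    faulty = decode_fault_word(fault_bits)                      # check faults
    blocked = bool(dirs) and all(d in faulty for d in dirs)     # decide routing
    return nxt, dirs, int(blocked)   # 1 -> local routing, 0 -> LAFT routing


print(hlaft_loc_rout_enb((1, 0, 1), "North", (2, 2, 2), "010011"))
# ((1, 1, 1), ['East', 'North', 'Up'], 1)
```

With a fault-free word such as 000000, the same call returns Loc-rout-enb = 0 and LAFT routing is kept.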

Page 35

Hybrid-Look-Ahead-Fault-Tolerant: limitations

• Compared to LAFT, HLAFT made the performance degradation 12% more graceful.

• The HLAFT algorithm was published in the Journal of Parallel and Distributed Computing:

– A. Ben Ahmed and A. Ben Abdallah, “Graceful deadlock-free fault-tolerant routing algorithm for 3D Network-on-Chip architectures,” Journal of Parallel and Distributed Computing, 74(4): 2229-2240, April 2014.

• With a large number of cores and layers, 3D-NoC systems face greater challenges and become more vulnerable to faults.

• Faults can be caused by the increasing area, thermal power, stacking misplacement, etc.

Page 36

Hybrid-Look-Ahead-Fault-Tolerant: limitations

• At such high core density, considering faults only in the inter-router links does not provide optimal reliability.

• Other components, such as the input buffers and the crossbar, should be given greater attention to ensure fault tolerance and enhance system reliability.

• These components consume a large portion of the router's total area and power budget and are therefore vulnerable to failures.

Page 37

Outline

• Background

• Research motivation

• Research goal and contributions

• Related work

• Graceful fault-tolerant routing algorithms

• Reliable router architecture and design

• Evaluation

• Conclusion and discussion


Page 38

3D-Fault-Tolerant-OASIS-NoC router architecture

Fig.28 3D-Fault-Tolerant-OASIS-NoC router block diagram

Page 39

Random-Access-Buffer mechanism

• Timer: if a flit's request is not served after a certain period of time, a flag is issued.

• RAB manager: when receiving the flags, it updates the status register and avoids writing to or reading from the flagged slots.

• FIFO manager: manages the input buffer when neither deadlock nor fault is detected.

• Fault-detect: issues a flag whenever a permanent or transient fault is detected.

Fig.29 Random-Access-Buffer mechanism block diagram

Page 40

Random-Access-Buffer mechanism: example

Status register (SR): keeps the status of the blocking and faulty slots:

00: neither blocking nor faulty

01: blocking

10: transient fault

11: permanent fault

State: SR = 00 00 00 00; buffered flits and their next ports: P1 -> North, P2 -> South, P3 -> East, P4 -> West.

Fig.30 Example showing how Random-Access-Buffer mechanism works (signals: data_in, data_out, Wr_adr, Rd_adr, Next_port, sw_grnt, Timer, and RAB_cntrl)

Page 41

Random-Access-Buffer mechanism: example

State: SR = 11 00 00 00; buffer slots: X (permanent fault), P3 -> East, P2 -> South, P1 -> North.

- In this example, we assume the presence of a permanent fault in one slot.

- The status register indicates 11 for that slot.

Fig.30 Example showing how Random-Access-Buffer mechanism works

Page 42

Random-Access-Buffer mechanism: example

State: SR = 11 00 00 00; Next_port = North; sw_grnt = 0.

- The Timer reports that the flit being processed (P1 -> North) did not get the grant and is blocked.

- The request is dropped and the SR entry of that slot is updated to 01.

Fig.30 Example showing how Random-Access-Buffer mechanism works

Page 43

Random-Access-Buffer mechanism: example

State: SR = 11 00 00 01; Next_port = South; sw_grnt = 1.

- The other flits are processed, requested, granted, and read from the buffer (here P2 -> South).

Fig.30 Example showing how Random-Access-Buffer mechanism works

Page 44

Random-Access-Buffer mechanism: example

State: SR = 11 00 00 01; Next_port = East; sw_grnt = 1.

- The other flits are processed, requested, granted, and read from the buffer (here P3 -> East).

Fig.30 Example showing how Random-Access-Buffer mechanism works

Page 45

Random-Access-Buffer mechanism: example

State: SR = 11 00 00 01; sw_grnt = 1.

- A new incoming flit (P4 -> Up) is stored in the buffer.

- A transient fault is detected in its slot, and the SR entry is updated to 10.

Fig.30 Example showing how Random-Access-Buffer mechanism works

Random-Access-Buffer mechanism: example

[Figure: RAB state with buffered packets P1 (North) and P4 (Up); SR shown as 11 01 00 01, with entries updated to 00]

- The previously blocking packet is checked again, granted, and read from the buffer, and its status register entry is updated to 00.
- The transient fault is removed and its SR entry is updated to 00.

Fig.30 Example showing how Random-Access-Buffer mechanism works
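The three frames above can be summarized as a small behavioural model. This is a minimal Python sketch, not the author's Verilog design: it assumes one two-bit SR entry and one Next_port field per buffer slot, treats sw_grnt as a set of granted output ports, and does not model the Timer, the Wr_adr/Rd_adr pointers, or what happens to a flit already stored in a slot that becomes faulty. The class and method names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Set


class SlotStatus(Enum):
    """2-bit status-register (SR) encoding used in the RAB example."""
    OK = 0b00         # neither blocking nor faulty
    BLOCKING = 0b01   # holds a flit whose output-port request was not granted
    TRANSIENT = 0b10  # transient fault: slot temporarily unusable
    PERMANENT = 0b11  # permanent fault: slot never used again


@dataclass
class Slot:
    status: SlotStatus = SlotStatus.OK
    flit: Optional[str] = None
    next_port: Optional[str] = None


class RandomAccessBuffer:
    """Hypothetical model: any slot can be written or read (no FIFO order)."""

    def __init__(self, depth: int):
        self.slots = [Slot() for _ in range(depth)]

    def write(self, flit: str, next_port: str) -> Optional[int]:
        """Store a flit in the first free, fault-free slot; None means no slot is available."""
        for i, slot in enumerate(self.slots):
            if slot.flit is None and slot.status is SlotStatus.OK:
                slot.flit, slot.next_port = flit, next_port
                return i
        return None

    def read(self, granted_ports: Set[str]) -> Optional[str]:
        """Return the first stored flit (in slot order) whose output port is granted.

        Flits checked before it whose port was not granted are marked BLOCKING,
        but they do not prevent later flits from being read.
        """
        for slot in self.slots:
            if slot.flit is None or slot.status in (SlotStatus.TRANSIENT, SlotStatus.PERMANENT):
                continue
            if slot.next_port in granted_ports:
                flit = slot.flit
                slot.flit, slot.next_port, slot.status = None, None, SlotStatus.OK
                return flit
            slot.status = SlotStatus.BLOCKING
        return None

    def set_fault(self, index: int, permanent: bool = False) -> None:
        """Record a detected fault in the slot's SR entry (10 or 11)."""
        self.slots[index].status = SlotStatus.PERMANENT if permanent else SlotStatus.TRANSIENT

    def clear_transient(self, index: int) -> None:
        """A removed transient fault returns the slot to 00; permanent faults stay."""
        if self.slots[index].status is SlotStatus.TRANSIENT:
            self.slots[index].status = SlotStatus.OK
```

For example, with slot 0 permanently faulty, P1 (North), P2 (South), and P3 (East) are written to slots 1 to 3; `read({"South"})` then returns P2 and marks P1 as BLOCKING instead of stalling the whole buffer behind it.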

3D-Fault-Tolerant-OASIS-NoC router architecture

Fig.28 3D-Fault-Tolerant-OASIS-NoC router block diagram

Traffic Prediction Unit

Fig.31 Traffic-Prediction-Unit block diagram

Traffic Prediction Unit: monitoring interval selection

- Start by selecting an initial interval, t = t0.
- Simulate and evaluate the buffer occupancy BO(t).
- If BO(t(n)) ≈ BO(t(n-1)), take the current interval as the optimal interval and end.
- Otherwise, increment the interval, t(n) = t(n-1) + s, and simulate again.

Fig.32 Monitoring interval selection flowchart
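The flowchart loop can be written as a short search. This is a minimal Python sketch under stated assumptions: `avg_occupancy(t)` is a hypothetical stand-in for "simulate and evaluate BO(t)", "≈" is interpreted as a relative difference of at most `rel_tol` (the slides do not give a threshold), the search stops at an assumed upper bound `t_max`, and the current interval t(n) is returned when the test succeeds (returning t(n-1) would be an equally defensible reading).

```python
from typing import Callable


def select_monitoring_interval(
    avg_occupancy: Callable[[int], float],
    t0: int,
    step: int,
    rel_tol: float = 0.05,
    t_max: int = 100_000,
) -> int:
    """Grow the monitoring interval until the measured buffer occupancy stabilizes.

    Returns the first interval t(n) with |BO(t(n)) - BO(t(n-1))| <= rel_tol * |BO(t(n-1))|,
    or the last interval tried if t_max is reached first.
    """
    t = t0
    prev = avg_occupancy(t)          # BO(t0)
    while t + step <= t_max:
        t += step                    # t(n) = t(n-1) + s
        cur = avg_occupancy(t)       # BO(t(n))
        if abs(cur - prev) <= rel_tol * abs(prev):
            return t                 # BO(t(n)) ≈ BO(t(n-1)): take t(n) as optimal
        prev = cur
    return t
```

For example, with a made-up occupancy curve `lambda t: 3.0 + 10.0 / t`, calling `select_monitoring_interval(curve, t0=10, step=10)` returns 30, the first interval whose occupancy differs from the previous one by no more than 5%.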

3D-Fault-Tolerant-OASIS-NoC router architecture

Fig.28 3D-Fault-Tolerant-OASIS-NoC router block diagram

Bypass-Link-on-Demand

[Figure: crossbar with inputs L_in, N_in, E_in, S_in, W_in, U_in, D_in and outputs L_out, N_out, E_out, S_out, W_out, U_out, D_out; bypass links Bypass-1 to Bypass-n; Ctrl and Fault-control-module (FCM) connected by the Faulty_Cross, Enable_bypass, and disable_crss signals; Crss flag sent to sw_req_ctrl]

• Ctrl: detects the presence of faults and manages the selection between the baseline crossbar and the bypass links.

• Fault-control-module (FCM): decides how many bypass links are needed and disables the faulty baseline links.

• Bypass-1 to Bypass-n: additional escape links used when baseline crossbar links are detected as faulty.

Fig.33 Bypass-Link-on-Demand block diagram

Bypass-Link-on-Demand

[Figure: BLoD crossbar (ports L, N, E, S, W, U, D) with Ctrl, FCM, and Bypass-1 to Bypass-3; a permanent fault on one crossbar link]

• A permanent fault is detected and the Ctrl sends a signal to the FCM.

• The FCM sends the Crss flag signal to sw_req_ctrl in the input port to prevent it from requesting the faulty crossbar link.

Fig.34 Bypass-Link-on-Demand example

Bypass-Link-on-Demand

[Figure: BLoD crossbar with Bypass-1 to Bypass-3; permanent and transient faults marked on crossbar links]

• At the same time, the FCM sends a signal to the Ctrl to enable the bypass link.

• When other faults are detected, other bypass links are enabled.

• When transient faults are removed, the corresponding bypass link is disabled.

Fig.34 Bypass-Link-on-Demand example
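The Ctrl/FCM sequence on these slides can be sketched as a small behavioural model. This is a minimal Python sketch, not the author's Verilog: it assumes a crossbar connection is identified by an (input port, output port) pair, that the FCM simply enables the lowest-numbered free bypass link, and that the Crss flag can be modelled as a set of connections sw_req_ctrl must not request. The class and method names are hypothetical. A faulty connection that finds no free bypass stays unavailable, and this sketch does not reassign it when a bypass is later released.

```python
from typing import Dict, Optional, Set, Tuple

Link = Tuple[str, str]  # (input port, output port) of a crossbar connection, e.g. ("N", "E")


class BypassLinkOnDemand:
    """Hypothetical model of the Ctrl/FCM interaction in BLoD."""

    def __init__(self, n_bypass: int):
        self.n_bypass = n_bypass
        self.assigned: Dict[Link, int] = {}  # faulty crossbar link -> enabled bypass index
        self.permanent: Set[Link] = set()    # links with a permanent fault
        self.crss_flags: Set[Link] = set()   # links sw_req_ctrl must not request

    def fault_detected(self, link: Link, permanent: bool = False) -> Optional[int]:
        """Ctrl reports a fault; FCM raises the Crss flag and enables a free bypass link."""
        self.crss_flags.add(link)
        if permanent:
            self.permanent.add(link)
        if link in self.assigned:
            return self.assigned[link]
        in_use = set(self.assigned.values())
        free = [b for b in range(self.n_bypass) if b not in in_use]
        if not free:
            return None  # all bypass links busy: the faulty link stays disabled
        self.assigned[link] = free[0]
        return free[0]

    def fault_removed(self, link: Link) -> None:
        """A cleared transient fault disables its bypass and re-enables the crossbar link."""
        if link in self.permanent:
            return  # permanent faults keep their bypass
        self.assigned.pop(link, None)
        self.crss_flags.discard(link)

    def path_for(self, link: Link) -> Optional[str]:
        """Which path carries traffic for this connection (None: currently unavailable)."""
        if link in self.assigned:
            return f"Bypass-{self.assigned[link] + 1}"
        if link in self.crss_flags:
            return None
        return "crossbar"
```

With `n_bypass=3`, matching the three bypass links drawn in Fig.34, a permanent fault on the N-to-E connection is routed over Bypass-1 for good, whereas a transient fault's bypass is released as soon as `fault_removed` is called.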


Evaluation methodology

• We evaluate:
  – Hardware complexity
    • Area
    • Total power
  – System performance
    • Latency/flit
    • Throughput
    • Reliability

• Benchmarks:
  – Transpose
  – Uniform
  – Matrix Multiplication
  – JPEG

• We use:
  – Verilog HDL
  – Synopsys Design Compiler
  – Cadence SoC Encounter
  – ModelSim

Configuration and assumptions

Configuration

- Fault rate: 0%, 5%, 10%, and 20%
- Injection rate: 1,000 to 100,000 flits

Assumptions

- Faults cannot occur on links connecting to the local port.
- There exists at least one valid path between a source and destination.
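One way to generate fault patterns that respect both assumptions is sketched below. This is a minimal Python sketch, not the evaluation setup used in the dissertation: it assumes a hypothetical x*y*z 3D mesh, interprets the fault rate as the fraction of inter-router links marked faulty (local-port links are never candidates), draws faults uniformly at random, and enforces the path assumption by re-sampling until the mesh stays connected. Connectivity only guarantees that some physical path exists; it does not guarantee that a given routing algorithm finds it. All names are hypothetical.

```python
import itertools
import random
from collections import deque
from typing import Dict, List, Set, Tuple

Node = Tuple[int, int, int]
Link = Tuple[Node, Node]


def mesh_links(x: int, y: int, z: int) -> List[Link]:
    """Inter-router links of an x*y*z 3D mesh; local-port links are not included."""
    links = []
    for i, j, k in itertools.product(range(x), range(y), range(z)):
        if i + 1 < x:
            links.append(((i, j, k), (i + 1, j, k)))
        if j + 1 < y:
            links.append(((i, j, k), (i, j + 1, k)))
        if k + 1 < z:
            links.append(((i, j, k), (i, j, k + 1)))
    return links


def connected(nodes: List[Node], healthy: List[Link]) -> bool:
    """Breadth-first search: True if every router can still reach every other router."""
    adj: Dict[Node, List[Node]] = {n: [] for n in nodes}
    for a, b in healthy:
        adj[a].append(b)
        adj[b].append(a)
    seen = {nodes[0]}
    queue = deque([nodes[0]])
    while queue:
        for m in adj[queue.popleft()]:
            if m not in seen:
                seen.add(m)
                queue.append(m)
    return len(seen) == len(nodes)


def inject_faults(x: int, y: int, z: int, fault_rate: float,
                  seed: int = 0, max_tries: int = 1000) -> Set[Link]:
    """Mark round(fault_rate * #links) random links faulty while keeping the mesh connected."""
    rng = random.Random(seed)
    nodes = list(itertools.product(range(x), range(y), range(z)))
    links = mesh_links(x, y, z)
    n_faulty = round(fault_rate * len(links))
    for _ in range(max_tries):
        faulty = set(rng.sample(links, n_faulty))
        if connected(nodes, [link for link in links if link not in faulty]):
            return faulty
    raise RuntimeError("no fault pattern satisfying the path assumption was found")
```

For example, `inject_faults(4, 4, 4, 0.20)` marks 29 of the 144 inter-router links of a 4x4x4 mesh as faulty; rejection sampling is cheap here because most random patterns at these rates leave the mesh connected.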

LAFT and HLAFT routing latency/flit evaluation

Fig.35 LAFT and HLAFT latency/flit evaluation with: (a) Transpose (b) Uniform (c) 6x6 Matrix (d) JPEG.

- 0% fault rate (FR): 48% latency reduction compared with XYZ
- 20% FR: 12.6% latency reduction compared with LAFT

LAFT and HLAFT routing throughput evaluation

Fig.36 LAFT and HLAFT throughput evaluation with: (a) Transpose (b) Uniform (c) 6x6 Matrix (d) JPEG.

- 0% fault rate (FR): 43% throughput increase compared with XYZ
- 20% FR: 11.8% throughput improvement compared with LAFT

HLAFT reliability evaluation

• We define reliability as the capability of the system to correctly deliver all packets to their destinations, even in the presence of failures.

Routing | 1 faulty link | 2 faulty links | 3 faulty links
HamFA | 95% | 44% | 20%
AFRA | 33% | 7% | 3%
HLAFT | 100% | 100% | 100%

Table II HLAFT reliability evaluation results

BLoD latency/flit evaluation

Fig.37 BLoD latency/flit evaluation with: (a) Transpose (b) Uniform (c) 6x6 Matrix (d) JPEG.

- Three bypass links appear to be the best number.
- At 20% fault rate, BLoD performs better in three applications.

RAB and TPU latency/flit evaluation

Fig.38 RAB latency/flit evaluation with: (a) Transpose (b) Uniform (c) 6x6 Matrix (d) JPEG.

- At 20% fault rate, RAB+TPU exhibits negligible latency variation.
- Adding the TPU further reduced the latency by 14.05% on average.

Complete 3D-FTO router latency/flit evaluation

Fig.39 3D-FTO latency/flit evaluation with: (a) Transpose (b) Uniform (c) 6x6 Matrix (d) JPEG.

- 37% and 18.5% latency reduction at 0% fault rate, compared with XYZ and LA-XYZ, respectively
- Performs better than XYZ in two applications
- 12.1% latency increase in the remaining two

Complete 3D-FTO router throughput evaluation

Fig.40 3D-FTO throughput evaluation with: (a) Transpose (b) Uniform (c) 6x6 Matrix (d) JPEG.

- Up to 51% and 38.5% throughput improvement at 0% fault rate
- Performs better than XYZ in two applications
- 10.1% throughput decrease in the remaining two

Hardware design results

Table III Router hardware complexity evaluation results

Module | Design | Area (µm²) | Power (µW)
Routing circuit | Proposed (LAFT) | 688 | 111.2
Routing circuit | Proposed (HLAFT) | 772 | 134.4
Routing circuit | Baseline | 609 | 94.2
Crossbar circuit | Proposed (BLoD) | 2443 | 379.3
Crossbar circuit | Baseline | 2085 | 316.7
Input-buffer circuit | Proposed (RAB+TPU) | 4529 | 483.4
Input-buffer circuit | Baseline | 3543 | 373.7
Complete router | Proposed | 10587 | 1175.6
Complete router | Baseline | 7654 | 886.32

Overhead compared with the baseline:
- Routing circuit (LAFT): 12% additional area, 18% power overhead
- Crossbar circuit (BLoD): 17.1% additional area, 19.9% power overhead
- Input-buffer circuit (RAB+TPU): 27.8% additional area, 29.4% power overhead
- Complete router: 38.3% additional area, 32.6% power overhead

Power: 1175.6 µW; number of pins: 557; frequency: 0.9 GHz; voltage: 1.1 V; total code: 2386 lines
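The overhead figures above are relative differences, 100 × (proposed − baseline) / baseline. A minimal Python check using only the Table III values (the function and variable names are illustrative):

```python
def overhead_pct(proposed: float, baseline: float) -> float:
    """Relative overhead of a proposed module over its baseline, in percent."""
    return 100.0 * (proposed - baseline) / baseline


# (module, proposed area, baseline area, proposed power, baseline power), from Table III
TABLE_III = [
    ("Routing circuit (LAFT)", 688, 609, 111.2, 94.2),
    ("Crossbar circuit (BLoD)", 2443, 2085, 379.3, 316.7),
    ("Input-buffer circuit (RAB+TPU)", 4529, 3543, 483.4, 373.7),
    ("Complete router", 10587, 7654, 1175.6, 886.32),
]

for name, area_p, area_b, power_p, power_b in TABLE_III:
    print(f"{name}: area +{overhead_pct(area_p, area_b):.1f}%, "
          f"power +{overhead_pct(power_p, power_b):.1f}%")
```

With these inputs the input-buffer and complete-router rows reproduce 27.8%/29.4% and 38.3%/32.6%; the routing (LAFT) and crossbar rows come out at about 13.0%/18.0% and 17.2%/19.8%, close to the reported 12%/18% and 17.1%/19.9%, likely a rounding difference.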

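Evaluation summary

BC: best case (0% fault rate); WC: worst case (20% fault rate). Latency/flit in µs/flit; throughput in flit/cycle; area in µm²; power in µW.

LAFT (area: 688 proposed, 609 baseline; power: 111.2 proposed, 94.2 baseline)
Benchmark | Latency BC | Latency WC | Latency XYZ | Latency LAXYZ | Throughput BC | Throughput WC | Throughput XYZ | Throughput LAXYZ
Transpose | 159 | 370 | 370 | 320 | 7.81 | 4.5 | 4.27 | 5.4
Uniform | 520 | 950 | 820 | 630 | 6.1 | 1.1 | 2.16 | 4.2
Matrix | 268 | 560 | 460 | 320 | 11.25 | 4.1 | 5.23 | 8.54
JPEG | 620 | 1120 | 1020 | 810 | 8.45 | 5.1 | 6.33 | 7.5

HLAFT (area: 872 proposed, 609 baseline; power: 134.4 proposed, 94.2 baseline)
Benchmark | Latency BC | Latency WC | Latency XYZ | Latency LAXYZ | Throughput BC | Throughput WC | Throughput XYZ | Throughput LAXYZ
Transpose | 158 | 325 | 370 | 320 | 8.8 | 5.2 | 4.27 | 5.4
Uniform | 421 | 850 | 820 | 630 | 6.1 | 1.9 | 2.16 | 4.2
Matrix | 270 | 470 | 460 | 320 | 11.25 | 5.2 | 5.23 | 8.54
JPEG | 620 | 990 | 1020 | 810 | 8.45 | 6.4 | 6.33 | 7.5

BLoD (area: 2443 proposed, 2085 baseline; power: 379.3 proposed, 316.7 baseline)
Benchmark | Latency BC | Latency WC | Latency LAFT | Latency LAXYZ | Throughput BC | Throughput WC | Throughput LAFT | Throughput LAXYZ
Transpose | 940 | 950 | 910 | 1190 | 9.22 | 7.12 | 9.34 | 6.68
Uniform | 10600 | 14600 | 10300 | 12900 | 2.46 | 2.12 | 3.0 | 2.5
Matrix | 930 | 1050 | 905 | 1280 | 2.3 | 2.05 | 2.67 | 1.8
JPEG | 670 | 680 | 630 | 810 | 6.6 | 6.5 | 6.9 | 5.33

RAB+TPU (area: 5529 proposed, 3543 baseline; power: 623.4 proposed, 373.7 baseline)
Benchmark | Latency BC | Latency WC | Latency RAB(WC) | Latency LAXYZ | Throughput BC | Throughput WC | Throughput RAB(WC) | Throughput LAXYZ
Transpose | 9300 | 11700 | 14100 | 11900 | 10.6 | 9.0 | 7.4 | 9.0
Uniform | 106 | 134 | 153 | 129 | 7.15 | 5.2 | 4.32 | 5.45
Matrix | 910 | 1260 | 1390 | 1280 | 9.4 | 8.1 | 7.8 | 8.3
JPEG | 640 | 770 | 910 | 810 | 10.7 | 8.2 | 6.65 | 7.9

Complete 3D-FTO (area: 10587 proposed, 7654 baseline; power: 1175.6 proposed, 886.32 baseline)
Benchmark | Latency BC | Latency WC | Latency XYZ | Latency LAXYZ | Throughput BC | Throughput WC | Throughput XYZ | Throughput LAXYZ
Transpose | 158 | 356 | 376 | 324 | 8.92 | 4.64 | 4.35 | 5.41
Uniform | 427 | 917 | 634 | 821 | 6.15 | 1.88 | 2.17 | 4.22
Matrix | 268 | 453 | 469 | 320 | 11.7 | 4.9 | 4.76 | 8.43
JPEG | 620 | 1192 | 1023 | 811 | 8.45 | 5.21 | 6.32 | 7.45

Table IV Evaluation results summary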


Conclusion

• In this research, I proposed a high-throughput architecture and routing algorithms targeted at reliable Network-on-Chip designs.

• Two efficient routing algorithms, named Look-Ahead-Fault-Tolerant (LAFT) and Hybrid-Look-Ahead-Fault-Tolerant (HLAFT), were presented to ensure fault tolerance in links.

• They exploit look-ahead routing properties to provide graceful performance degradation at high fault rates.

• The Random-Access-Buffer (RAB) mechanism was proposed to ensure deadlock recovery and to handle fault occurrence in the input buffers.

• RAB was complemented with a Traffic-Prediction-Unit (TPU) to further reduce the latency caused by faulty buffer slots.

• To relieve the congestion caused by faults in the crossbar, we developed a technique named Bypass-Link-on-Demand (BLoD).

• For up to 10% fault rate, 3D-FTO reduces the latency by an average of 37% and 18.5% compared with XYZ- and LA-XYZ-based systems, respectively.

• It also provides a throughput improvement of up to 51% and 38% in the absence of faults.

• At 20% fault rate, the proposed router provides better throughput than XYZ in two applications.

• The hardware complexity evaluation results showed that 3D-FTO exhibits 29.3% additional area, 24.6% power overhead, and a negligible speed drop compared with the baseline system.

Discussion

• Further research is needed on diagnosis mechanisms to capture the detailed performance of the whole system.

• Thermal power is one of the major issues in 3D-NoC designs.

– An in-depth thermal power study is necessary.

• Further investigate the reliability and performance of the whole system under more general run-time scenarios.

• Investigate Quality-of-Service (QoS), especially for time-constrained applications.

Publications

• Refereed Journals

– [Jnl1] A. Ben Ahmed and A. Ben Abdallah, “Graceful deadlock-free fault-tolerant routing algorithm for 3D Network-on-Chip architectures”, Journal of Parallel and Distributed Computing, 74(4): 2229-2240, April 2014.

– [Jnl2] A. Ben Ahmed and A. Ben Abdallah, “Architecture and Design of High-throughput, Low-latency and Fault Tolerant Routing Algorithm for 3D-Network-on-Chip”, The Journal of Supercomputing, 66(3): 1507-1532, December 2013.

• Refereed International conferences

– [Conf1] A. Ben Ahmed, M. Meyer, Y. Okuyama, A. Ben Abdallah, “Adaptive Error- and Traffic-aware Router Architecture for Electrical 3D Network-on-Chip Systems”, The IEEE 8th International Symposium on Embedded Multicore SoCs (MCSoC-14), pp. 197-204, Aizu-Wakamatsu, Japan, September 23-25, 2014.

– [Conf2] A. Ben Ahmed and A. Ben Abdallah, “Deadlock-Recovery Support for Fault-tolerant Routing Algorithms in 3D-NoC Architectures”, The IEEE 7th International Symposium on Embedded Multicore SoCs (MCSoC-13), pp. 67-72, Tokyo, Japan, September 26-28, 2013.

– [Conf3] A. Ben Ahmed, T. Ochi, Sh. Miura, A. Ben Abdallah, “Run-Time Monitoring Mechanism for Efficient Design of Application-specific NoC Architectures in Multi/Manycore Era”, The 6th International Workshop on Engineering Parallel and Multicore Systems, pp. 440-445, Taichung, Taiwan, July 3-5, 2013.

– [Conf4] A. Ben Ahmed, A. Ben Abdallah, “Low-overhead Routing Algorithm for 3D Network-on-Chip”, The Third International Conference on Networking and Computing (ICNC-12), pp. 23-32, Okinawa, Japan, December 20-22, 2012.

– [Conf5] A. Ben Ahmed, A. Ben Abdallah, “LA-XYZ: Low Latency, High Throughput Look-Ahead Routing Algorithm for 3D Network-on-Chip (3D-NoC) Architecture”, The IEEE 6th International Symposium on Embedded Multicore SoCs (MCSoC-12), pp. 167-174, Aizu-Wakamatsu, Japan, September 20-22, 2012.

– [Conf6] A. Ben Ahmed, K. Mori and A. Ben Abdallah, “ONoC-SPL Customized Network-on-Chip (NoC) Architecture and Prototyping for Data-intensive Computation Applications”, The 4th International Conference on Awareness Science and Technology (iCAST-2012), pp. 257-262, Seoul, South Korea, August 21-24, 2012.

– [Conf7] A. Ben Ahmed, A. Ben Abdallah, K. Kuroda, “Architecture and Design of Efficient 3D Network-on-Chip (3D NoC) for Custom Multi-Core SoC”, Fifth International Conference on Broadband and Wireless Computing, Communication and Applications (BWCCA-2010), pp. 67-73, Fukuoka, Japan, November 4-6, 2010 (Best Paper Award).

Under Review Publications

• Refereed Journals

– A. Ben Ahmed, A. Ben Abdallah, “Adaptive Fault-Tolerant Architecture and Routing Algorithm for Reliable Many-Core 3D-NoC systems”, submitted to the Journal of Parallel and Distributed Computing in February 2014.

References

- [NOCS2014] H. Matsutani. 3D WiNoC Architectures. The 8th ACM/IEEE International Symposium on Networks-on-Chip (NOCS'14), Special Session, September 2014.

- [SoCPaR 2014] A. Ben Abdallah. On-Chip Optical Interconnects: Prospects and Challenges. Invited Talk, 6th International Conference of Soft Computing and Pattern Recognition, August 11-14, 2014.

- [VLSI2005] M. El-Moursy and E. Friedman. Shielding effect of on-chip interconnect inductance. In Proc. of the Great Lakes Symposium on VLSI, pages 165-170, April 2003.

- [Loi 2008] I. Loi et al. A low-overhead fault tolerance scheme for TSV-based 3D network on chip links. In Proc. of the 2008 IEEE/ACM International Conference on Computer-Aided Design, pages 598-602, 2008.

- [Parisha 2009] S. Pasricha. Exploring serial vertical interconnects for 3D ICs. In Proc. of the 46th ACM/IEEE Design Automation Conference, pages 581-586, July 2009.

- [Rahmani 2012] A.-M. Rahmani et al. Design and Management of High-performance, Reliable and Thermal-aware 3D Networks-on-Chip. IET Circuits, Devices & Systems, 6(5):308-321, September 2012.

- [Chien 1992] A. A. Chien and J. H. Kim. Planar-adaptive Routing: Low-cost Adaptive Networks for Multiprocessors. The 19th Annual International Symposium on Computer Architecture, pages 268-277, 1992.

- [Jiang 2008] Z. Jiang, J. Wu and D. Wang. A New Fault Information Model for Fault-Tolerant Adaptive and Minimal Routing in 3-D Meshes. IEEE Transactions on Reliability, 57(1):149-162, March 2008.

- [Feng 2011] Ch. Feng et al. A Low-Overhead Fault-Aware Deflection Routing Algorithm for 3D Network-on-Chip. IEEE Computer Society Annual Symposium on VLSI, pages 19-24, July 2011.

- [Pasricha 2011] S. Pasricha and Y. Zou. A Low Overhead Fault Tolerant Routing Scheme for 3D Networks-on-Chip. The 12th International Symposium on Quality Electronic Design, pages 1-8, March 2011.

- [DATE 2012] S. Akbari, A. Shafieey, M. Fathy and R. Berangi. AFRA: A Low Cost High Performance Reliable Routing for 3D Mesh NoCs. Design, Automation & Test in Europe Conference & Exhibition, pages 332-337, March 2012.

- [DATE 2013] M. Ebrahimi, M. Daneshtalab, and J. Plosila. Fault-Tolerant Routing Algorithm for 3D NoC Using Hamiltonian Path Strategy. In Proc. of Design, Automation & Test in Europe Conference & Exhibition (DATE), pages 1601-1604, March 2013.

- [Constantinides 2006] K. Constantinides et al. BulletProof: A defect-tolerant CMP switch architecture. In Proc. of the 12th International Symposium on High-Performance Computer Architecture (HPCA), pages 5-16, 2006.

- [Kim 2006] J. Kim et al. A Gracefully Degrading and Energy-Efficient Modular Router Architecture for On-Chip Networks. In Proc. of the 33rd International Symposium on Computer Architecture (ISCA), 2006.

- [Poluri 2013] P. Poluri and A. Louri. In Proc. of the 25th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), pages 49-56, October 23-26, 2013.

- [DeOrio 2012] A. DeOrio, D. Fick, V. Bertacco, D. Sylvester, D. Blaauw, J. Hu, and G. Chen. A Reliable Routing Architecture and Algorithm for NoCs. IEEE Transactions on CAD of Integrated Circuits and Systems, 31(5):726-739, May 2012.

Thank you for your attention