ARIADNE: Agnostic Reconfiguration In A Disconnected Network Environment

Konstantinos Aisopos (Princeton, MIT), Andrew DeOrio (Michigan), Li-Shiuan Peh (MIT), Valeria Bertacco (Michigan)


Post on 22-Feb-2016



TRANSCRIPT

Page 1: ARIADNE: Agnostic Reconfiguration In A Disconnected Network Environment

Konstantinos Aisopos (Princeton, MIT), Andrew DeOrio (Michigan), Li-Shiuan Peh (MIT), Valeria Bertacco (Michigan)

Page 2: What is “reconfiguration”?

• Silicon technologies move into the nanometer regime … transistors become unreliable
• For architects: permanent faults
• Our focus in this talk: Network-on-Chip
  – cannot resend
  – need to re-route around the fault
• Reconfiguration: “the process of replacing the routing algorithm”

Shekhar Borkar: in future chips of 100 billion transistors, 10% of transistors will eventually fail over the lifetime of the chip

[Figure: mesh with source S and destination D; tile labeled P, P$, S$, NIC, R]

Page 3: Why is reconfiguration challenging?

• XY routing

[Figure: mesh with source S and destination D; axes X and Y]

Page 4: Why is reconfiguration challenging?

• XY routing

Agnostic Reconfiguration algorithm In A Disconnected Network Environment

[Figure: mesh with source S and destination D]
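The challenge with XY routing can be made concrete with a small sketch. The code below is an illustration only, not taken from the talk: the mesh coordinates, the `xy_route` helper, and the chosen faulty link are all assumptions. It shows that dimension-ordered routing permits exactly one next hop at every router, so a single permanent link fault on that path leaves the packet with no legal move.

```python
# Sketch: dimension-ordered XY routing on a 2D mesh, with an optional set of
# faulty links. Coordinates and the fault location are illustrative only.

def xy_route(src, dst, faulty_links=frozenset()):
    """Return the hop list from src to dst, or None if XY routing is blocked."""
    (x, y), path = src, [src]
    while (x, y) != dst:
        if x != dst[0]:                       # first correct the X coordinate
            nxt = (x + (1 if dst[0] > x else -1), y)
        else:                                 # then correct the Y coordinate
            nxt = (x, y + (1 if dst[1] > y else -1))
        if frozenset(((x, y), nxt)) in faulty_links:
            return None                       # the only permitted hop is broken
        x, y = nxt
        path.append(nxt)
    return path

# One broken link between routers (1, 0) and (2, 0).
fault = {frozenset(((1, 0), (2, 0)))}
healthy = xy_route((0, 0), (3, 2))
blocked = xy_route((0, 0), (3, 2), fault)
```

Here `healthy` is the usual X-then-Y path, while `blocked` is `None` even though the mesh is still physically connected, which is why the routing algorithm itself has to be replaced.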

Page 6: Outline

• Motivation
• Ariadne
  – Baseline
  – Deadlocks
  – Synchronization
• Evaluation
  – Overhead
  – Performance
  – Reliability
• Conclusions

Page 7: How will S find a path to D?

[Figure: mesh with source S and destination D; the path is marked “?”]

Page 8: How will S find a path to D?

[Figure: mesh with source S and destination D; routing tables (RT) shown with entries D: E, D: S, D: W, D: N]

Page 9: How will S find a path to D?

[Figure: mesh with source S and destination D; routing tables (RT) shown with entries D: E,S; D: S; D: W,N; D: E,N; D: W,S; D: N]

Page 10: How will S find a path to D?

[Figure: mesh with source S and destination D]

Page 11: How will S find a path to D?

[Figure: mesh with source S and destination D; routing tables (RT) shown with entries D: W, D: W, D: S]

Page 12: ARIADNE: baseline

• Upon a fault that changes the topology…
  – a node can let everyone know how it can be reached with a single broadcast
  – N nodes can let everyone know how they can be reached with N broadcasts
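The baseline idea can be sketched in software as a flood per destination. This is a minimal model under stated assumptions, not the hardware described in the talk: `mesh_links`, `broadcast`, and `reconfigure` are hypothetical names, row 0 is assumed to be the northern edge, and deadlock avoidance is deliberately ignored at this stage. Each node stores, for every destination, the port on which that destination's broadcast first arrived.

```python
from collections import deque

def mesh_links(width, height, faulty=()):
    """Map each node to {port: neighbor} for a 2D mesh, skipping faulty links."""
    # Row 0 is taken to be the northern edge; this orientation is an assumption.
    steps = {"E": (1, 0), "W": (-1, 0), "S": (0, 1), "N": (0, -1)}
    broken = {frozenset(link) for link in faulty}
    links = {}
    for y in range(height):
        for x in range(width):
            ports = {}
            for port, (dx, dy) in steps.items():
                nb = (x + dx, y + dy)
                inside = 0 <= nb[0] < width and 0 <= nb[1] < height
                if inside and frozenset(((x, y), nb)) not in broken:
                    ports[port] = nb
            links[(x, y)] = ports
    return links

def broadcast(links, origin):
    """Flood from origin; each node records the port the flag first arrived on."""
    toward = {}                    # node -> output port leading back to origin
    seen, queue = {origin}, deque([origin])
    while queue:
        node = queue.popleft()
        for nb in links[node].values():
            if nb not in seen:
                seen.add(nb)
                # nb reaches origin through the port that points back at node
                toward[nb] = next(p for p, n in links[nb].items() if n == node)
                queue.append(nb)
    return toward

def reconfigure(links):
    """N broadcasts: tables[node][dest] is the port node uses to reach dest."""
    tables = {node: {} for node in links}
    for dest in links:
        for node, port in broadcast(links, dest).items():
            tables[node][dest] = port
    return tables

# 4x4 mesh with the link between (1, 0) and (2, 0) broken.
links = mesh_links(4, 4, faulty=[((1, 0), (2, 0))])
tables = reconfigure(links)
```

With the broken link, router (1, 0) no longer reaches its former eastern neighbor (2, 0) directly; its table entry for that destination points south, around the fault.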

Page 13: ARIADNE: baseline

• Upon a fault that changes the topology…
  – Every node broadcasts “in-turn” to let others know how it can be reached

Broadcast order: fault detector 18 (1st), 19 (2nd), 20 (3rd), …, 17 (last)

[Figure: 8x8 mesh with statically assigned node IDs 0 to 63]

Page 14: ARIADNE: baseline

• Upon a fault that changes the topology…
  – Every node broadcasts “in-turn” to let others know how it can be reached

Broadcast order: fault detector 18 (1st), 19 (2nd), 20 (3rd), …, 17 (last)

• Issues:
  – deadlock avoidance
  – synchronization (when to broadcast, multiple detectors)

Page 15: ARIADNE: deadlocks

[Figure: mesh with source S and destination D]

Page 16: ARIADNE: deadlocks

[Figure: mesh with three source/destination pairs, each labeled S and D]

Page 17: ARIADNE: deadlocks

first bcast ONLY: nodes are assigned ranks
• 0: bcaster (“root”)
• 1: immediate neighbors
• 2: 2-hop neighbors
• 3: 3-hop neighbors

up*/down*: disable routes where rank goes higher and then lower (figure labels: higher, up, down, lower)

unique ordering: among nodes with same rank, arbitrarily select a higher one

assume routing circle: 1 node will have higher rank than its neighbors, breaking the circular route

[Figure: 8x8 mesh with node IDs 0 to 63; root marked r; nodes 18, 20, and 26 labeled separately]

Page 19: ARIADNE: deadlocks

first bcast ONLY: nodes are assigned ranks
• 0: bcaster (“root”)
• 1: immediate neighbors
• 2: 2-hop neighbors
• 3: 3-hop neighbors

up*/down*: disable routes where rank goes higher and then lower (figure labels: higher, up, down, lower)

unique ordering: among nodes with same rank, arbitrarily select a higher one

assume routing circle: 1 node will have higher rank than its neighbors, breaking the circular route

connectivity: can reach any node via the root

[Figure: root marked r; source/destination pairs labeled S and D]
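The ranking and turn rule above can be sketched as follows. This is an illustration under assumptions rather than the authors' implementation: ties between equal ranks are broken by node ID (the slides only say the choice is arbitrary), and the helper names `assign_ranks`, `order`, `turn_enabled`, and `reachable` are hypothetical. A turn a -> b -> c is treated as disabled when b is ordered above both of its neighbors on the route, which is the "one node higher than its neighbors" case that breaks every routing circle.

```python
from collections import deque

def assign_ranks(adj, root):
    """First broadcast: rank = hop distance from the broadcasting root."""
    rank, queue = {root: 0}, deque([root])
    while queue:
        node = queue.popleft()
        for nb in adj[node]:
            if nb not in rank:
                rank[nb] = rank[node] + 1
                queue.append(nb)
    return rank

def order(rank, node):
    # Ties between equal ranks are broken by node ID (an assumed tie-break).
    return (rank[node], node)

def turn_enabled(rank, a, b, c):
    """A route a -> b -> c is disabled when b is ordered above both a and c."""
    return not (order(rank, b) > order(rank, a) and order(rank, b) > order(rank, c))

def reachable(adj, rank, src, dst):
    """Search over (previous, current) states using only enabled turns."""
    if src == dst:
        return True
    queue, seen = deque((src, nb) for nb in adj[src]), set()
    while queue:
        prev, cur = queue.popleft()
        if cur == dst:
            return True
        if (prev, cur) in seen:
            continue
        seen.add((prev, cur))
        queue.extend((cur, nxt) for nxt in adj[cur]
                     if nxt != prev and turn_enabled(rank, prev, cur, nxt))
    return False

# A 4-node ring 0-1-2-3-0, the smallest topology with a routing circle.
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
rank = assign_ranks(ring, root=0)
```

In this ring, node 2 ends up ordered above both of its neighbors, so the turn 1 -> 2 -> 3 is disabled and the circle is broken, yet every pair of nodes stays connected through the root.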

Page 20: ARIADNE: deadlocks

• Upon a fault that changes the topology…
  – Every node broadcasts “in-turn” to let others know how it can be reached
• Issues:
  – deadlock avoidance
  – synchronization

Page 21: ARIADNE: deadlocks

• Upon a fault that changes the topology…
  – Every node broadcasts “in-turn” to let others know how it can be reached
  – RULE: (i) first broadcast ranks nodes (ii) remaining broadcasts spread ONLY via enabled turns
• Issues:
  – synchronization
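One way to model the RULE in software is a flood in which every node forwards a destination's flag once, and only across turns that are enabled. The sketch below is an assumption-laden illustration, not the talk's hardware: the ordering uses (rank, node ID) with an assumed ID tie-break, tables store a single next-hop neighbor rather than a port, and `bfs_order`, `restricted_broadcast`, and `build_tables` are hypothetical names. Because the disabled-turn test is symmetric, a packet that retraces a broadcast path in reverse also uses only enabled turns.

```python
from collections import deque

def bfs_order(adj, root):
    """Rank nodes by hop distance from root; (rank, id) gives a unique order."""
    rank, queue = {root: 0}, deque([root])
    while queue:
        node = queue.popleft()
        for nb in adj[node]:
            if nb not in rank:
                rank[nb] = rank[node] + 1
                queue.append(nb)
    return {node: (r, node) for node, r in rank.items()}

def restricted_broadcast(adj, order, dest):
    """Flood from dest; each node forwards once, and only over enabled turns."""
    next_hop = {}                       # node -> neighbor to use toward dest
    came_from = {dest: None}
    queue = deque([dest])
    while queue:
        cur = queue.popleft()
        prev = came_from[cur]
        for nxt in adj[cur]:
            if nxt in came_from:
                continue                # nxt already holds an entry for dest
            # Disabled turn: cur is ordered above both prev and nxt.
            if (prev is not None and order[cur] > order[prev]
                    and order[cur] > order[nxt]):
                continue
            came_from[nxt] = cur
            next_hop[nxt] = cur
            queue.append(nxt)
    return next_hop

def build_tables(adj, root):
    """tables[dest][node] is the neighbor node forwards to when heading to dest."""
    order = bfs_order(adj, root)
    return order, {d: restricted_broadcast(adj, order, d) for d in adj}

# 3x3 mesh, nodes numbered row by row, with the link between 4 and 5 removed.
adj = {0: [1, 3], 1: [0, 2, 4], 2: [1, 5], 3: [0, 4, 6], 4: [1, 3, 7],
       5: [2, 8], 6: [3, 7], 7: [4, 6, 8], 8: [5, 7]}
order, tables = build_tables(adj, root=0)
```

In this example node 7 reaches node 5 along 7, 4, 1, 2, 5 instead of the shorter 7, 8, 5, because the turn at node 8 is disabled. That detour is the price of deadlock freedom in this sketch.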

Page 22: ARIADNE: synchronization

• “in turn” broadcasts: when does previous broadcast complete? can broadcasts overlap?
• how does the recipient of a flag know the broadcasting node?

[Figure: routers connected by 1-bit flag wires; NO arbitration]

Page 23: ARIADNE: synchronization

Solution: Atomic Broadcasts

• Nodes utilize the cycle count as a global reference point
• Each node is assigned a unique broadcast slot from the “global” cycle counter

Page 24: ARIADNE: synchronization

[Figure: 16 nodes with IDs 0 to 15]

Cycle count (same for all nodes): the low-order bits form two fields of log(16) bits each, the bcast node (upper field) and the bcast cycle (lower field). Each slot is sized for the longest (in hops) broadcast.

Example with node 5 detecting the fault, shown as (bcast node, bcast cycle):
• 5 waits for (5, 0)
• (5, 0): 5 initiates bcast
• (5, 15): 5’s bcast completes
• (6, 0): 6 initiates bcast
• (6, 15): 6’s bcast completes
• …
• (4, 15): 4’s bcast completes

Reconfiguration completes in (16)^2 = (number of nodes)^2 cycles
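The slot arithmetic can be written out as a short sketch. It assumes a power-of-two node count and the field layout read off the example above (bcast node in the upper field, bcast cycle in the lower one); the function names are hypothetical and the code models only the counter decoding, not the flag wires.

```python
def slot(cycle_count, num_nodes):
    """Split the low-order counter bits into (bcast node, bcast cycle)."""
    bits = (num_nodes - 1).bit_length()         # log2(N) for power-of-two N
    bcast_cycle = cycle_count & (num_nodes - 1)
    bcast_node = (cycle_count >> bits) & (num_nodes - 1)
    return bcast_node, bcast_cycle

def cycles_until_slot(cycle_count, node, num_nodes):
    """Cycles a node waits until its own slot begins at bcast cycle 0."""
    period = num_nodes * num_nodes              # all N slots of N cycles each
    start = node * num_nodes
    return (start - cycle_count) % period

def reconfiguration_length(num_nodes):
    """N atomic broadcasts of N cycles each."""
    return num_nodes * num_nodes

# Node 5 detects a fault at cycle 70 in a 16-node network.
wait = cycles_until_slot(70, 5, 16)
first_slot = slot(70 + wait, 16)
total = reconfiguration_length(16)
```

Since every node derives the same (bcast node, bcast cycle) pair from the shared counter, a recipient of a 1-bit flag can tell whose broadcast it belongs to without any arbitration.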

Page 25: ARIADNE: synchronization

[Figure: 16 nodes with IDs 0 to 15; cycle count (same for all nodes) split into bcast node and bcast cycle fields]

Example with two fault detectors, nodes 5 and 8, shown as (bcast node, bcast cycle):
• 5 waits for (5, 0); 8 waits for (8, 0)
• (5, 0): 5 initiates bcast
• (5, 1): 1st hop
• (5, 2): 2nd hop
• …
• 8 resigns from becoming the root node
• (5, 15): 5’s bcast completes

(!) we need to reconfigure only once, even for multiple faults

Page 27: Evaluation: Overhead

Evaluation criteria: overhead, performance, reliability

• On-chip routing algorithms for irregular topologies
  – Immunet (V. Puente, ISCA’04): reserves an escape VC for deadlock freedom (routes deterministically in a ring)
  – Vicis routing algo (D. Fick, DATE’09): exceptions to turn model to apply it to an arbitrary topology
  – ARIADNE

• Synthesized a baseline 5-stage pipelined router (5 ports, 2 VCs, 5-flit buffer/VC) with Synopsys Design Compiler (IBM 130nm target library)
• Router area (mm²): baseline = 2.708, Ariadne = 2.761, Vicis = 2.748, Immunet = 2.870
• Area overhead over baseline: Vicis 1.5%, Ariadne 2.0%, Immunet 6.0%
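The overhead percentages follow directly from the reported router areas. The short check below uses only the four area values from the slide; the dictionary and function names are illustrative.

```python
# Router areas in mm^2 as reported on the slide.
areas_mm2 = {"baseline": 2.708, "Ariadne": 2.761, "Vicis": 2.748, "Immunet": 2.870}

def overhead_percent(design):
    """Area overhead of a design relative to the baseline router, in percent."""
    base = areas_mm2["baseline"]
    return 100.0 * (areas_mm2[design] - base) / base

overheads = {name: round(overhead_percent(name), 1)
             for name in areas_mm2 if name != "baseline"}
```

Rounded to one decimal place, this gives 2.0 for Ariadne, 1.5 for Vicis, and 6.0 for Immunet, matching the three percentages shown.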

Page 28: Evaluation: Performance

• Experimental Setup: Garnet + GEMS

Network Architecture (GARNET)
network topology: 8x8 2D mesh
memory controllers: 4 at chip corners
channel width: 64 bits
router architecture: 5-stage pipeline
router ports, VCs: 5, 2 (private)
router buffers/port: 5-flit for each VC

System Configuration (GEMS)
processors: in-order SPARC cores
coherence: MOESI protocol
L1 caching: private unified 32KB/node, ways: 2, latency: 3 cycles
L2 caching: shared distributed 1MB/node, ways: 16, latency: 15 cycles

[Figure: average latency (cycles) vs. injected faults (0 to 100) for Ariadne (average), Vicis (average), and Immunet (average); average over 10 PARSEC benchmarks, 1000 fault configurations; lower is better; annotations: “deadlocks”, “traffic routing in a ring”]

Page 29: Evaluation: Performance + Reliability

Evaluation criteria: overhead, performance, reliability

• On-chip routing algorithms for irregular topologies
  – Immunet (V. Puente, ISCA’04): reserves an escape VC for deadlock freedom (routes deterministically in a ring)
  – Vicis routing algo (D. Fick, DATE’09): exceptions to turn model to apply it to an arbitrary topology
  – ARIADNE

Area overhead labels: 1.5%, 2.0%, 6.0%

Page 31: Conclusions

We have presented Ariadne, a reconfiguration algorithm that:
• provides deadlock-free routing paths in irregular network topologies that result from faulty links
• is implemented in a fully distributed mode, resulting in simple hardware and low complexity
• enables a trade-off between performance and reliable functionality on unreliable silicon

Page 32: Thank You!

Questions?

The Greek legend of Princess Ariadne

“Ariadne (Αριάδνη), was the daughter of King Minos of Crete. Minos attacked Athens after his son was killed there. The Athenians asked for terms, and were required to sacrifice seven young men and seven maidens every nine years to the Minotaur, a monster with the head of a bull on the body of a man. One year, the sacrificial party included Theseus, a young man who volunteered to come and kill the Minotaur. Ariadne fell in love at first sight, and helped him by giving him a ball of red fleece thread that she was spinning, to find his way out of the Minotaur's labyrinth.” [source: wikipedia]

…similarly to Princess Ariadne, our Ariadne algorithm helps packets find their way in the labyrinth-like topology of a faulty network.

Page 33: BACKUP SLIDES

Page 34: Evaluation: Results (reconfiguration dynamics)

[Figure: average packet latency in last 1K cycles (in cycles, axis 0 to 1200) vs. time (cycles, axis 0 to 50K); series: 0 to 1 faults, 0 to 10 faults, 0 to 50 faults; markers: reconfiguration initiated, reconfiguration completed]

Page 35: Ariadne Architecture

Page 36: Ariadne vs. off-chip approaches

• Off-chip networks utilize centralized software algorithms for reconfiguration
  (-) topology needs to be communicated to a central node
  (-) interfacing with OS
  (-) updated routing tables need to be delivered back to each node

[Figure: network with central node C and OS]

Page 37: Ariadne: Motivation

• Many transistor failures are expected to occur in NoCs fabricated at advanced technology nodes
• These failures will result in router link failures and disconnected routers

[Figure: # link failures (axis 0 to 60) vs. # transistor failures (axis 0 to 100)]

Page 38: Evaluation: Related Work

• On-chip routing algorithms for irregular topologies implemented for comparison

Immunet (V. Puente, ISCA 2004)
• 1 escape VC reserved for deadlock freedom: routes deterministically in a ring
• Other VCs route adaptively. BUT, if no other VC is available, a packet switches to escape VC
• Annotations: latency when traffic; 6% overhead (3 routing tables); reliability

Vicis routing algorithm (D. Fick, DATE 2009)

[Figure: ring route between source S and destination D]

Page 39: Evaluation: Related Work

• On-chip routing algorithms for irregular topologies implemented for comparison

Immunet (V. Puente, ISCA 2004)
• 1 escape VC reserved for deadlock freedom: routes deterministically in a ring
• Other VCs route adaptively. BUT, if no other VC is available, a packet switches to escape VC
• Annotations: latency when traffic; 6% overhead (3 routing tables); reliability

Vicis routing algorithm (D. Fick, DATE 2009)
• No faults? use turn model
• Upon fault occurrence: re-enable disabled turns to increase connectivity

[Figure: three turn examples, each labeled “(assuming north last)”; one marked “?”]

Page 40:

[Figure: 8x8 mesh with node IDs 0 to 63; nodes 18, 20, and 26 labeled separately]