Fast Leader (Full) Recovery despite Dynamic Faults

Ajoy K. Datta, Stéphane Devismes, Lawrence L. Larmore, Sébastien Tixeuil


Page 1: Fast Leader (Full) Recovery despite Dynamic Faults

Fast Leader (Full) Recovery despite Dynamic Faults

Ajoy K. Datta

Stéphane Devismes

Lawrence L. Larmore

Sébastien Tixeuil

Page 2: Fast Leader (Full) Recovery despite Dynamic Faults

Joint Work

ICDCN, 04/01/2013, Mumbai

Ajoy K. Datta & Lawrence L. Larmore

Sébastien Tixeuil

Pages 3-10: Fast Leader (Full) Recovery despite Dynamic Faults

Self-Stabilization [Dijkstra,74]

A fault = a process state corruption

Recover after any number of transient faults
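As a concrete illustration of self-stabilization, the sketch below simulates the K-state token ring from the cited Dijkstra paper; it is not taken from the slides. The function names `privileged`, `step`, and `stabilize`, the ring size, and the random central scheduler are assumptions made for this example.

```python
import random

def privileged(x):
    """Indices of the processes that currently hold a privilege (token)."""
    n = len(x)
    return [i for i in range(n)
            if (i == 0 and x[0] == x[n - 1]) or (i > 0 and x[i] != x[i - 1])]

def step(x, K, rng):
    """Let one privileged process, picked by a central scheduler, move."""
    i = rng.choice(privileged(x))
    if i == 0:
        x[0] = (x[0] + 1) % K      # the distinguished process increments
    else:
        x[i] = x[i - 1]            # every other process copies its predecessor

def stabilize(x, K, rng, max_steps=10_000):
    """Run until exactly one privilege remains; return the number of steps."""
    steps = 0
    while len(privileged(x)) != 1 and steps < max_steps:
        step(x, K, rng)
        steps += 1
    return steps

rng = random.Random(0)
n, K = 8, 9                                  # K >= n states per process
ring = [rng.randrange(K) for _ in range(n)]  # arbitrary (corrupted) start
stabilize(ring, K, rng)
print(len(privileged(ring)))                 # number of tokens after recovery
```

Whatever the initial states are, the ring ends up with a single circulating privilege and keeps it. The algorithm relies on one distinguished process (index 0), so it does not apply to anonymous rings.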

Pages 11-12: Fast Leader (Full) Recovery despite Dynamic Faults

Price of the Versatility

1. Several impossibility results
   – E.g., Leader Election and Token Circulation in anonymous networks

2. The stabilization time usually depends on global parameters (diameter, size of the network, …)

Pages 13-15: Fast Leader (Full) Recovery despite Dynamic Faults

When only a few faults hit the system

• Self-Stabilization: Ω(D) rounds

• Stronger forms:
  – Fault Containment [Ghosh et al, Dist Comp 2007]
  – k-adaptive Self-Stabilization [Burman et al, OPODIS’05]

• Weakened forms:
  – k-stabilization [Beauquier et al, PODC’98]

Page 16: Fast Leader (Full) Recovery despite Dynamic Faults

Fault-Containment

• Pros
  – Self-stabilizing
  – If f ≤ k faults, stabilization time in O(f) rounds
  – Containment radius
  – Fault gap is small

• Cons (currently)
  – k=1, or
  – Surrounded by a majority of correct processes, or
  – Synchronous setting, or
  – Probabilistic recovery

Pages 17-18: Fast Leader (Full) Recovery despite Dynamic Faults

Fault gap

• The minimum time between consecutive faulty transitions to have O(f) recovery time

[Figure: timeline alternating between Legitimate and Illegitimate configurations. Faults separated by ≥ fault gap: recovery in O(f). Faults separated by < fault gap: recovery in Ω(D).]

Page 19: Fast Leader (Full) Recovery despite Dynamic Faults

Time-Adaptive Self-stabilization

• Self-Stabilization

• If the Hamming distance to a legitimate configuration is f ≤ k, i.e., f ≤ k faults occur simultaneously (static faults):
  – “output” stabilization in O(f) rounds

Pages 20-21: Fast Leader (Full) Recovery despite Dynamic Faults

Output vs. State Stabilization

[Figure: timeline starting from an Illegitimate configuration after f ≤ k faults. Correct Output is reached in O(f); a Legitimate configuration is reached only after Ω(D).]

The fault gap depends on global parameters

Page 22: Fast Leader (Full) Recovery despite Dynamic Faults

k-Stabilization (first definition)

If the Hamming distance to a legitimate configuration is f ≤ k, i.e., f ≤ k faults occur simultaneously, the system eventually recovers.

Otherwise, no guarantee.
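The Hamming distance used in this definition can be sketched as follows. Representing a configuration as a list of per-process states, and the helper names `hamming_distance` and `covered_by_k_stabilization`, are assumptions made for illustration.

```python
def hamming_distance(config, legitimate):
    """Number of processes whose state differs between two configurations."""
    if len(config) != len(legitimate):
        raise ValueError("configurations must have the same number of processes")
    return sum(a != b for a, b in zip(config, legitimate))

def covered_by_k_stabilization(config, legitimate_configs, k):
    """True if config is within k corrupted processes of a legitimate configuration."""
    return any(hamming_distance(config, leg) <= k for leg in legitimate_configs)

legit = [[0, 0, 0, 0, 0]]
print(covered_by_k_stabilization([0, 7, 0, 0, 3], legit, k=2))  # 2 corrupted states
print(covered_by_k_stabilization([1, 7, 0, 0, 3], legit, k=2))  # 3 corrupted states
```

Under this first definition, recovery is promised only for configurations for which the check returns true.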

Page 23: Fast Leader (Full) Recovery despite Dynamic Faults

k-Stabilization (first definition)

• Pros
  – Can solve more problems than self-stabilization
  – Usually, only-k-dependent stabilization time
  – Usually, only-k-dependent fault gap

• Cons
  – Not self-stabilizing
  – Static faults: f ≤ k faults should occur in a single transition

Page 24: Fast Leader (Full) Recovery despite Dynamic Faults

Our definition of k-stabilization

• Faulty transition = one process state corruption

• Dynamic faults:
  – If f ≤ k faulty transitions occur in an arbitrary manner
    • The system eventually recovers

Page 25: Fast Leader (Full) Recovery despite Dynamic Faults

Our definition of k-stabilization

[Figure: timeline between Legitimate and Illegitimate configurations. Single faults (1 fault, 1 fault, 1 fault) occur at separate times, f ≤ k faults in total.]

Page 26: Fast Leader (Full) Recovery despite Dynamic Faults

Our contribution

• Leader recovery protocol
  – On an anonymous (yet oriented) ring
  – Asynchronous atomic read/write
  – k-stabilizing if n ≥ 18k + 1
  – Stabilization time O(k²) rounds
  – Log(k) bits per process
  – This problem is unsolvable in the self-stabilizing setting

Pages 27-35: Fast Leader (Full) Recovery despite Dynamic Faults

Our contribution

The system starts in a legitimate configuration where one process is elected

Some faulty transitions occur in an arbitrary manner

Fault propagation

If n ≥ 18k + 1, the system recovers the same leader in O(k²) rounds

Page 36: Fast Leader (Full) Recovery despite Dynamic Faults

Fault gap

[Figure: timeline alternating between Legitimate and Illegitimate configurations, with two bursts of f ≤ k faulty transitions. Labels: 0, 0, and O(k²) rounds.]

Page 37: Fast Leader (Full) Recovery despite Dynamic Faults

Main ideas of the algorithm


Page 38: Fast Leader (Full) Recovery despite Dynamic Faults

Vote = Relative Address ∈ {-3k..3k} ∪ {⊥}

[Figure: ring of processes labeled with their votes: 0, then 1, 2, 3, … and -1, -2, -3, … up to distance 3k, and ⊥ for the remaining processes.]

Interval of relevance: 6k+1 votes
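A minimal sketch of this vote layout, under stated assumptions: processes sit at positions 0..n-1 of the oriented ring, a vote v held at position p designates position (p + v) mod n, and `None` stands for ⊥. The sign convention and the function names are hypothetical; the slides do not fix them.

```python
def legitimate_votes(n, k, leader):
    """Vote of every process in a legitimate configuration (None stands for ⊥)."""
    votes = []
    for p in range(n):
        # Signed offset from p to the leader, normalized to (-n/2, n/2].
        offset = (leader - p) % n
        if offset > n // 2:
            offset -= n
        votes.append(offset if abs(offset) <= 3 * k else None)
    return votes

def interval_of_relevance(n, k, candidate):
    """The 6k+1 ring positions within distance 3k of the candidate."""
    return [(candidate + d) % n for d in range(-3 * k, 3 * k + 1)]

n, k, leader = 19, 1, 5
votes = legitimate_votes(n, k, leader)
print(votes)
print(interval_of_relevance(n, k, leader))
```

With n = 19 and k = 1, exactly 6k+1 = 7 processes hold a vote, and each of those votes designates position 5.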

Pages 39-42: Fast Leader (Full) Recovery despite Dynamic Faults

After k faults

[Figure: the same ring of votes. Some votes are corrupted step by step: a vote 2 becomes 0, then the votes 0 and -1 become 1 and 0.]

At most 3k processes change their votes

Always a majority of votes for the previous leader
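The majority claim follows from a short count, assuming (as the preceding slides suggest) that all 6k+1 processes in the leader's interval of relevance vote for it before the faults:

```latex
\[
\underbrace{(6k+1)}_{\text{votes in the interval}}
\;-\; \underbrace{3k}_{\text{changed votes}}
\;=\; 3k+1 \;>\; \frac{6k+1}{2}.
\]
```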

Page 43: Fast Leader (Full) Recovery despite Dynamic Faults

Rumors

[Figure: a process with Vote = 1 and Rumor = 1]

In a legitimate state, Vote = Rumor for every process

Main idea:
• Vote: hard to change
• Rumor: easy to change

Page 44: Fast Leader (Full) Recovery despite Dynamic Faults

Rumors

[Figure: a process whose Vote and Rumor differ (values 1 and 2)]

If Rumor ≠ Vote
• If Rumor ≠ ⊥
  • Candidate ← Rumor
• Else
  • Candidate ← Vote
Initiate Query(Candidate)
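The candidate rule above can be sketched as a small function. `None` stands for ⊥, and the name `choose_candidate` and its return shape are assumptions for illustration, not part of the protocol description.

```python
from typing import Optional, Tuple

BOTTOM = None  # stands for ⊥

def choose_candidate(vote: Optional[int], rumor: Optional[int]) -> Tuple[bool, Optional[int]]:
    """Return (initiate_query, candidate) following the rule on the slide."""
    if rumor == vote:
        return (False, BOTTOM)          # Vote = Rumor: nothing to check
    # Prefer the rumor when there is one, otherwise re-check the vote.
    candidate = rumor if rumor is not BOTTOM else vote
    return (True, candidate)

print(choose_candidate(1, 1))       # vote and rumor agree
print(choose_candidate(1, 2))       # a rumor contradicts the vote
print(choose_candidate(1, BOTTOM))  # the rumor was erased
```

When a query is initiated, the candidate is never ⊥: if the rumor is ⊥ and differs from the vote, the vote is a proper relative address.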

Page 45: Fast Leader (Full) Recovery despite Dynamic Faults

Rumors

[Figure: a process whose Vote and Rumor differ (values 1 and 2)]

Query(Candidate) traverses the interval of relevance of the candidate (6k+1 processes) and counts the votes for the candidate
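The counting step can be sketched centrally, ignoring how the query actually travels around the ring. Assumptions: positions are 0..n-1, a vote v at position p designates position (p + v) mod n, `None` stands for ⊥, and `count_votes` is a hypothetical helper name.

```python
def count_votes(votes, candidate, k):
    """Count processes in the candidate's interval of relevance that vote for it."""
    n = len(votes)
    count = 0
    for d in range(-3 * k, 3 * k + 1):
        p = (candidate + d) % n
        # Assumed convention: a vote v at position p designates position p + v.
        if votes[p] is not None and (p + votes[p]) % n == candidate:
            count += 1
    return count

n, k, leader = 19, 1, 5
votes = [None] * n
for d in range(-3 * k, 3 * k + 1):
    votes[(leader + d) % n] = -d          # legitimate votes for the leader
votes[4], votes[6], votes[7] = 0, 0, 3    # 3k = 3 corrupted votes
print(count_votes(votes, leader, k))      # votes still designating the leader
print(count_votes(votes, 4, k))           # votes designating position 4
```

In this run, 3k corrupted votes still leave the previous leader with 3k+1 votes, while the spurious candidate at position 4 collects a single vote.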

Page 46: Fast Leader (Full) Recovery despite Dynamic Faults

Query Return

• If at least 3k+1 votes for the Candidate
  – If Rumor ≠ ⊥ and Rumor ≠ Candidate
    • Initiate a Denial of Rumor in its interval of relevance
  – Vote ← Candidate
  – Rumor ← Candidate

• Else
  – If Rumor = Candidate, then Rumor ← ⊥
  – Initiate a Denial of Candidate in its interval of relevance
  – If Vote = Candidate, then Vote ← ⊥
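The Query Return rule can be sketched as a local update on the initiating process. The `Process` class, the function name `on_query_return`, and the convention of returning the values to deny are assumptions made for illustration. The first condition is read here as "Rumor is neither ⊥ nor the Candidate", which is an interpretation of the slide.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Process:
    vote: Optional[int]    # None stands for ⊥
    rumor: Optional[int]

def on_query_return(proc: Process, candidate: int, count: int, k: int) -> List[int]:
    """Apply the Query Return rule; return the values for which a Denial is initiated."""
    denials = []
    if count >= 3 * k + 1:
        # Candidate confirmed: kill a competing rumor, then adopt the candidate.
        if proc.rumor is not None and proc.rumor != candidate:
            denials.append(proc.rumor)
        proc.vote = candidate
        proc.rumor = candidate
    else:
        # Candidate rejected: forget it and deny it in its interval of relevance.
        if proc.rumor == candidate:
            proc.rumor = None
        denials.append(candidate)
        if proc.vote == candidate:
            proc.vote = None
    return denials

p = Process(vote=1, rumor=2)
print(on_query_return(p, candidate=2, count=2, k=1), p)  # rumor 2 is rejected
print(on_query_return(p, candidate=1, count=5, k=1), p)  # vote 1 is confirmed
```

In the example, a false rumor is first denied and erased, after which the old vote is checked, confirmed, and copied back into the rumor.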

Page 47: Fast Leader (Full) Recovery despite Dynamic Faults

Query Tracks


Page 48: Fast Leader (Full) Recovery despite Dynamic Faults

Other tracks

• Denial (to kill a rumor)

• To manage lost queries
  – Probe wave
  – Report

(see the paper)

Pages 49-53: Fast Leader (Full) Recovery despite Dynamic Faults

Deadlock Prevention

• Each pair of neighboring processes shares a resource
  – Think of a chopstick between 2 philosophers

• Only a process that holds both its left and right resources can initiate a query

• So, at any time, there are at most n/2 pending initiated queries

• Now, we can have up to 9k rogue queries, i.e., non-initiated queries

• So, n > n/2 + 9k, that is, n ≥ 18k + 1
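The last step is plain arithmetic over the integers. Reading the condition as "the at most n/2 initiated queries plus the at most 9k rogue queries must number fewer than n" is an interpretation of the slide:

```latex
\[
n > \frac{n}{2} + 9k
\;\iff\; \frac{n}{2} > 9k
\;\iff\; n > 18k
\;\iff\; n \ge 18k + 1
\qquad (n, k \in \mathbb{N}).
\]
```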

Page 54: Fast Leader (Full) Recovery despite Dynamic Faults

Conclusion

• Less restrictive definition of k-stabilization

• Using this definition, we solve a problem having no self-stabilizing solution:
  – Leader recovery protocol
    • On an anonymous (yet oriented) ring
    • Only-k-dependent complexity:
      – Stabilization time O(k²) rounds
      – Log(k) bits per process

Page 55: Fast Leader (Full) Recovery despite Dynamic Faults

Thank You!