The weakest failure detector question in distributed computing
Petr Kouznetsov
Distributed Programming Lab
EPFL
Outline
• Impossibility results and failure detectors
• Model: asynchronous system with failure detectors
• The weakest failure detector question and the CHT proof
• Determining the weakest failure detectors for various problems (implementing shared memory, solving consensus, solving non-blocking atomic commit (NBAC), boosting the consensus power of atomic objects)
Centralized computing
[Figure: clients accessing a centralized computing unit]
Distributed computing
[Figure: clients accessing a distributed computing unit]
Redundancy and synchronization
[Figure: the distributed computing unit]
The distributed implementation should create the illusion of a centralized one: the components (processes) must be synchronized in a consistent way.
Consensus
Processes propose values and must agree on a common value in a non-trivial manner:
Agreement: no two correct processes decide differently
Validity: every decided value is a proposed value
Termination: every correct process eventually decides
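To make the three properties concrete, here is a minimal Python sketch that checks them on the outcome of one finished run. It is only an illustration: the function `check_consensus` and the way a run is summarized (a dictionary of proposals, a dictionary of decisions, and the set of correct processes) are assumptions, and Termination, which talks about "eventually", is approximated by "every correct process has decided by the end of the recorded run".

```python
from typing import Dict, Hashable, Set

Pid = int


def check_consensus(proposals: Dict[Pid, Hashable],
                    decisions: Dict[Pid, Hashable],
                    correct: Set[Pid]) -> Dict[str, bool]:
    """Check Agreement, Validity and Termination on one recorded run.

    proposals: value proposed by each process
    decisions: value decided by each process that decided (crashed ones may be absent)
    correct:   processes that never crash in this run
    """
    correct_decisions = {decisions[p] for p in correct if p in decisions}
    proposed = set(proposals.values())
    return {
        # Agreement: no two correct processes decide differently
        "agreement": len(correct_decisions) <= 1,
        # Validity: every decided value is a proposed value
        "validity": all(v in proposed for v in decisions.values()),
        # Termination (finite approximation): every correct process has decided
        "termination": all(p in decisions for p in correct),
    }
```

For example, a run in which p1 proposes 0, p2 and p3 propose 1, p3 crashes before deciding, and p1 and p2 both decide 1 satisfies all three properties.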
Ideal computing
The consistency and progress of the implementation are preserved even if:
• processes can fail by crashing;
• the system is asynchronous: communication delays and processing speeds are unbounded.
(There is no bound Δ such that, after taking Δ local steps, a process is sure to “hear” from every correct process.)
FLP impossibility
Consensus is impossible in an asynchronous system if at least one process might crash.
[Fischer, Lynch and Paterson, 1985]
Adding (some) synchrony
Consensus is impossible in a system with asynchronous processing or asynchronous communication if at least one process might crash. [Dolev, Dwork, Stockmeyer, 1987]
(… in a shared memory system [Loui, Abu-Amara, 1987])
Why? It is impossible to distinguish a crashed process from a “sleeping” one, no matter how many steps you take.
Adding partial synchrony
Assume that in every execution there is an upper bound on the time to execute a processing step and to communicate a message.
Consensus is solvable if a majority of processes are correct.
(If communication is synchronous and processing is partially synchronous, then consensus is solvable for any number of failures.)
[Dwork, Lynch, Stockmeyer, 1988]
Adding less synchrony
Assume we (eventually) have a leader, i.e., eventually all processes that take “enough” steps will “hear” from some correct process.
Eventual leader abstraction Ω
At every process, Ω outputs a process identifier.
Eventually, the same correct process identifier is permanently output at all correct processes.
[Figure: timelines of processes 1 to 4 showing their Ω outputs; the outputs differ at first and eventually all become 3]
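A finite trace can only approximate “eventually”, but the property can still be illustrated. The hypothetical helper below takes, for each correct process, the sequence of identifiers its Ω module output at common sampling points, and returns the first sampling point from which all correct processes output the same correct identifier until the end of the trace (or None if the trace has not stabilized). The trace format is an assumption of this sketch.

```python
from typing import Dict, List, Optional, Set


def omega_stable_from(trace: Dict[int, List[int]], correct: Set[int]) -> Optional[int]:
    """Earliest sample index from which every correct process outputs the same
    correct leader until the end of the trace; None if there is no such index.

    trace[p][t] is the identifier output by Omega at process p at sample t.
    Only correct processes are inspected; crashed ones may be absent from trace.
    """
    if not correct:
        return None
    length = min(len(trace[p]) for p in correct)
    if length == 0:
        return None
    last = {trace[p][length - 1] for p in correct}
    if len(last) != 1:
        return None                      # correct processes still disagree
    leader = next(iter(last))
    if leader not in correct:
        return None                      # the common output is a crashed process
    t = length
    # Walk backwards while every correct process still outputs `leader`.
    while t > 0 and all(trace[p][t - 1] == leader for p in correct):
        t -= 1
    return t
```

For instance, with the trace {1: [1, 2, 3, 3], 2: [4, 3, 3, 3], 3: [3, 1, 3, 3]} and correct processes {1, 2, 3}, the helper returns 2: from the third sample on, every correct process outputs 3.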
Ω is sufficient for consensus!
Consensus is solvable in an asynchronous system equipped with Ω, where a majority of processes are correct.
[Lam90,CT91]
(If communication is synchronous, then consensus is solvable for any number of failures.)
[DLS88,LH94]
The question
What is the smallest amount of synchrony that must be introduced into the asynchronous system to solve an unsolvable problem?
General system model
Processes p1,…,pn communicate through reliable message-passing channels. (*)
In addition, every process can query its failure detector module, which produces some (possibly incomplete and inaccurate) information about failures.
(*) Later we also consider registers and atomic objects of a given power.
Failure detector modules
[Figure: processes p, q and r, each equipped with its own failure detector module FD]
Failure detectors
[Figure: process p queries its module FD and obtains information on failures, e.g., fail(q)]
The information output to the processes depends only on failures.
Example: perfect failure detector P
At each process, P outputs a set of suspected process identifiers.
Eventually, every crashed process is permanently suspected by every correct process
No process is suspected before it crashes
[Figure: example outputs of P at processes 1 to 4: Ø before process 4 crashes, {4} afterwards]
Example: failure signal failure detector FS
At each process, FS outputs green or red.
If red is output, then a failure previously occurred.
If a failure occurs, then eventually red is output at all correct processes.
[Figure: example outputs of FS at processes 1 to 4: green before a failure, red after it]
Environments
An environment E specifies when and where failures might occur
Examples: a majority of processes are correct; at most one process crashes.
Failure detector reductions
Failure detector D is weaker than failure detector D’ if D can be extracted from D’, i.e., there exists an algorithm that simulates D using D’.
[Figure: processes p, q and r, each running an algorithm that uses its local module D’ to emulate D]
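As a small, standard example of a reduction (it does not appear in the slides), both Ω and FS can be extracted from the perfect failure detector P: at each process, output as leader the smallest identifier that P does not suspect, and output red as soon as P suspects someone. This sketch assumes processes are numbered 1 to n and that at least one process is correct; the function names are hypothetical.

```python
from typing import FrozenSet


def omega_from_p(n: int, suspected: FrozenSet[int]) -> int:
    """Leader output of Omega, emulated from one output of P at some process.

    Processes are numbered 1..n. Because P never suspects a process before it
    crashes and eventually suspects every crashed process, the smallest
    unsuspected identifier eventually becomes the smallest correct identifier,
    the same at every correct process.
    """
    for pid in range(1, n + 1):
        if pid not in suspected:
            return pid
    raise ValueError("P suspects every process; at least one must be correct")


def fs_from_p(suspected: FrozenSet[int]) -> str:
    """FS output emulated from P: red iff P currently suspects someone.

    A non-empty suspicion set implies a crash already happened (accuracy of P);
    after a crash, P eventually suspects the crashed process (completeness).
    """
    return "red" if suspected else "green"
```

With n = 4, before any crash both emulations output 1 and green; once P suspects process 1, they output 2 and red.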
The weakest failure detector
D is the weakest failure detector to solve problem M in an environment E if and only if:
D is sufficient for M in E: D can be used to solve M in E
D is necessary for M in E: D is weaker than any failure detector D’ that can be used to solve M in E
The question
Given a problem M and an environment E,
what is the weakest failure detector for solving M in E?
The CHT result
The CHT Theorem: If a failure detector D implements consensus, then D implements Ω.
Corollary: Ω is the weakest failure detector for consensus with a majority of correct processes.
[Chandra, Hadzilacos and Toueg, 1996]
The CHT (96) Proof
Assume D implements consensus: i.e., there is some algorithm A that uses D to implement consensus
We build an algorithm T that uses A and D to implement Ω
• NB. Implementing Ω means that every process trusts some process so that eventually all correct processes permanently trust the same correct process
Algorithm T in 5 acts
• (1) The exchange
• (2) The simulation
• (3) The tagging
• (4) The stabilization
• (5) The extraction
(1) The Exchange
• Every process periodically queries its failure detector module (D) and sends all outputs it has seen to all
• A process builds a growing DAG using the outputs provided by other processes
• A vertex of the DAG is a pair: (process, failure detector value)
• An edge (p1,d1) -> (p2,d2) means that p1 saw d1 before p2 saw d2
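The exchange can be sketched as a small data structure. The class `SampleDag` and its methods are hypothetical; message passing is abstracted into merging a copy of another process's DAG, and each vertex carries a local sequence number (an assumption of this sketch, in the spirit of the CHT construction) so that a process sampling the same value twice gets two distinct vertices. An edge u -> v records that u was already known to the process that took sample v.

```python
from dataclasses import dataclass, field
from typing import Hashable, Set, Tuple

Vertex = Tuple[int, Hashable, int]   # (process, FD value, local sequence number)
Edge = Tuple[Vertex, Vertex]


@dataclass
class SampleDag:
    owner: int
    vertices: Set[Vertex] = field(default_factory=set)
    edges: Set[Edge] = field(default_factory=set)
    seq: int = 0

    def add_sample(self, fd_value: Hashable) -> Vertex:
        """Record a fresh query of the local FD module.

        Every vertex already known was seen before this sample, so an edge
        is added from each of them to the new vertex.
        """
        self.seq += 1
        v = (self.owner, fd_value, self.seq)
        self.edges |= {(u, v) for u in self.vertices}
        self.vertices.add(v)
        return v

    def merge(self, other: "SampleDag") -> None:
        """Incorporate a DAG received from another process (set union)."""
        self.vertices |= other.vertices
        self.edges |= other.edges
```

Replaying the exchange: p1 samples d1, p2 samples d2, they merge each other's DAG, and then p1 samples d3. The new vertex (1, "d3", 2) gets incoming edges from both (1, "d1", 1) and (2, "d2", 1), while d1 and d2 remain unordered.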
(1) The Exchange
[Figure: p1 samples d1 and p2 samples d2; after the exchange, both DAGs contain (p1,d1) and (p2,d2), and the later samples (p1,d3) and (p2,d4) extend the DAG]
(2) The Simulation
• Every process pi uses its DAG to simulate runs of A in the system, i.e., every process locally plays the role of all other processes
• Whenever pi updates its DAG, pi triggers runs of A for:
  ◦ all paths in the DAG, and
  ◦ all input vectors I0, I1, …, In, where Ij makes processes p1, …, pj propose 1 and the rest propose 0
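The n + 1 input vectors are straightforward to generate. The sketch below (the function name is hypothetical) also illustrates the property that the extraction argument relies on: consecutive vectors Ij-1 and Ij differ only in the input of pj.

```python
from typing import List


def input_vectors(n: int) -> List[List[int]]:
    """Return I0, ..., In for processes p1..pn.

    Position i (0-based) holds the proposal of p(i+1). In Ij, processes
    p1..pj propose 1 and p(j+1)..pn propose 0, so consecutive vectors
    differ only in the input of a single process.
    """
    return [[1 if i < j else 0 for i in range(n)] for j in range(n + 1)]
```

For example, input_vectors(3) returns [[0, 0, 0], [1, 0, 0], [1, 1, 0], [1, 1, 1]].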
(2) The Simulation
[Figure: simulated runs of A on paths of the DAG involving p1 and p2, for input vectors I0, I1 and I2, each ending with Decide(0) or Decide(1)]
(3) The Tagging
Periodically, every process pi looks at the results of the simulations, i.e., the decisions reached in the simulated runs of A.
For every vector Ij, pi gathers all decisions and tags Ij:
• 0-valent if only 0 is decided starting from Ij
• 1-valent if only 1 is decided starting from Ij
• bivalent if both 0 and 1 can be decided (in different simulated runs)
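The tagging rule is a simple function of the set of values decided in the simulated runs that start from a given vector. In this hypothetical sketch, the extra tag "untagged", for a vector from which no simulated run has decided yet, is an assumption not stated on the slide.

```python
from typing import Iterable


def tag(decisions: Iterable[int]) -> str:
    """Tag an input vector from the decisions of the simulated runs starting from it.

    Returns "0-valent", "1-valent", "bivalent", or "untagged" (an assumption of
    this sketch) if no simulated run starting from the vector has decided yet.
    """
    seen = set(decisions)
    if not seen:
        return "untagged"
    if seen == {0}:
        return "0-valent"
    if seen == {1}:
        return "1-valent"
    if seen == {0, 1}:
        return "bivalent"
    raise ValueError(f"unexpected decision values: {seen}")
```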
(3) The Tagging
Notice that:
• by validity of consensus, I0 is 0-valent and In is 1-valent;
• a 0-valent or 1-valent input vector can only become bivalent;
• a bivalent input vector stays bivalent forever.
(3) The Tagging
• There is some index k in the sequence I1, …, In such that Ik-1 is 0-valent and Ik is not: k is called the critical index
• If Ik is 1-valent, then pi trusts pk
(we do not consider here the more complicated case when Ik is bivalent)
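Given the tags of I0, …, In, locating the critical index and the trusted process takes a few lines. This hypothetical sketch returns the smallest such index (the one that, by the stabilization argument, can only decrease over time) and, like the slide, leaves out the bivalent case.

```python
from typing import List, Optional


def critical_index(tags: List[str]) -> Optional[int]:
    """Smallest k >= 1 such that I(k-1) is 0-valent and Ik is not.

    tags[j] is the current tag of Ij ("0-valent", "1-valent", "bivalent" or
    "untagged"). Returns None while some vector is still untagged.
    """
    if "untagged" in tags:
        return None
    for k in range(1, len(tags)):
        if tags[k - 1] == "0-valent" and tags[k] != "0-valent":
            return k
    return None


def trusted_process(tags: List[str]) -> Optional[int]:
    """Index k of the process pk that pi trusts when Ik is 1-valent, else None."""
    k = critical_index(tags)
    if k is not None and tags[k] == "1-valent":
        return k                          # pi trusts pk
    return None                           # bivalent case: not covered here
```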
(4) The Stabilization
• Eventually, the critical index at a given process does not change anymore: this is because the index can only decrease and cannot go lower than 1
• All DAGs converge to the same infinite DAG and the same critical index k is eventually computed at all processes
(5) The Extraction
Assume that k is such that Ik-1 is 0-valent and Ik is 1-valent
Thus, eventually, all correct processes permanently trust pk
Claim: pk is correct
(5) The Extraction
Proof: (by contradiction) Assume pk is faulty
Then there is a simulated run r of A starting from Ik in which pk takes no steps. Ik-1 and Ik differ only in the input value of pk, so r is also a possible run of A starting from Ik-1: pi cannot distinguish the two.
Hence the value decided in r would have to be 1 (Ik is 1-valent) and 0 (Ik-1 is 0-valent) at the same time: a contradiction.
(5) The Extraction
Assume now that k is such that Ik-1 is 0-valent and Ik is bivalent
Claim: there exists an algorithm that eventually deduces a correct process from simulated runs starting from Ik
Finally
Eventually, all correct processes trust the same correct process:
Ω is emulated!
Problem: implementing a register
A register is an object accessed through reads and writes:
• write(v) stores v in the register and returns ok;
• read returns the last value written to the register.
NB In an asynchronous system, a register can be implemented if and only if a majority of processes are correct [ABD95].
Quorum failure detector Σ
At each process, Σ outputs a set of processes
Any two sets (output at any times and at any processes) intersect.
Eventually every set contains only correct processes.
NB Given a majority of correct processes, Σ can be implemented in an asynchronous system.
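The NB can be illustrated with the usual majority-based construction, sketched here under assumptions of our own: in every round, a process sends a request to all and outputs the set of the first ⌊n/2⌋ + 1 processes whose replies arrive. Any two majorities intersect, and once all faulty processes have crashed, only correct processes reply. The helper names and the representation of a round as an arrival order of replies are hypothetical, and the second function only checks Σ's two properties on a finite list of outputs.

```python
from itertools import combinations
from typing import FrozenSet, List, Sequence, Set


def sigma_round_output(n: int, replies_in_order: Sequence[int]) -> FrozenSet[int]:
    """One round of the majority-based emulation of Sigma.

    replies_in_order lists the processes whose reply to this round's request
    arrived, in arrival order. The process waits for the first n // 2 + 1.
    """
    quorum = n // 2 + 1
    if len(set(replies_in_order)) < quorum:
        raise RuntimeError("fewer than a majority replied: the round would block")
    out: List[int] = []
    for p in replies_in_order:
        if p not in out:
            out.append(p)
        if len(out) == quorum:
            break
    return frozenset(out)


def satisfies_sigma(outputs: List[FrozenSet[int]], correct: Set[int],
                    stable_from: int) -> bool:
    """Check Sigma on a finite list of outputs (any processes, any times):
    every two outputs intersect, and the outputs from index stable_from on
    contain only correct processes."""
    intersect = all(a & b for a, b in combinations(outputs, 2))
    eventually_correct = all(s <= correct for s in outputs[stable_from:])
    return intersect and eventually_correct
```

For example, with n = 5 and process 5 crashing after the first round, the outputs {1, 2, 5}, {1, 3, 4} and {2, 3, 4} pairwise intersect, and from the second round on they contain only correct processes.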
Σ is sufficient to implement registers
Adapt the “correct-majority-based” algorithm of [ABD95] to implement a (1-reader, 1-writer) atomic register using Σ: substitute « process p waits until a majority of processes reply » with « process p waits until all processes in Σ reply ».
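A minimal, single-threaded simulation of this substitution, under stated assumptions: the register is replicated at every process, “send to all and wait for replies” is collapsed into directly accessing the replicas of the processes in the current output of Σ, and concurrency, messages and crashes are not modeled. The class `SigmaRegister` and its interface are hypothetical.

```python
from typing import Callable, Dict, FrozenSet, Hashable, Iterable, Optional, Tuple

Stamped = Tuple[int, Optional[Hashable]]   # (timestamp, value)


class SigmaRegister:
    """Sequential sketch of a 1-writer 1-reader register in the style of [ABD95],
    where every "wait for a majority" is replaced by "wait for all processes in Sigma".

    replicas[p] is the (timestamp, value) pair stored at process p; sigma()
    returns the current output of Sigma (any two outputs must intersect).
    """

    def __init__(self, processes: Iterable[int], sigma: Callable[[], FrozenSet[int]]):
        self.replicas: Dict[int, Stamped] = {p: (0, None) for p in processes}
        self.sigma = sigma
        self.write_ts = 0                     # the writer's timestamp counter
        self.last_read: Stamped = (0, None)   # newest pair the reader has returned

    def write(self, v: Hashable) -> str:
        self.write_ts += 1
        # Send (ts, v) to all; the processes in Sigma store it and acknowledge.
        for p in self.sigma():
            if self.replicas[p][0] < self.write_ts:
                self.replicas[p] = (self.write_ts, v)
        return "ok"

    def read(self) -> Optional[Hashable]:
        # Query all; take the newest pair among the replies of the processes in Sigma.
        newest = max((self.replicas[p] for p in self.sigma()), key=lambda s: s[0])
        # The single reader never goes back to an older value than it already returned.
        if newest[0] > self.last_read[0]:
            self.last_read = newest
        return self.last_read[1]
```

If the quorums used by writes and reads alternate between {2, 3} and {1, 2}, every read still sees the latest write through process 2. A pair of non-intersecting quorums would let a read miss it, which is what Σ's intersection property rules out.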
Σ is necessary to implement registers
Let A be any implementation of registers that uses some failure detector D. We must show that Σ can be extracted from D.
Each write operation involves a set of “participants”: the processes that help the operation take effect (w.r.t. A and D)
Claim: the set of participants includes at least one correct process
Extraction algorithm
Every process p periodically:
• writes in its register the participant sets of its previous writes;
• reads the participant sets of other processes;
• outputs the participant set of its previous write and, for every known participant set S, one live process in S.
All output sets intersect and eventually contain only correct processes.
Emulating Σ: the reduction algorithm
Let Pi(k) be the set of participants in the k-th write operation by process i.
Round k:   // invariant: Ei = ∪ Pi(j) over all j < k
  write(Ei) to register Ri
  Ei := Ei ∪ Pi(k)   // Pi(k): participants of this write
  send (k,?) to all
  for every j = 1,…,n: wait until (k,ack) is received from at least one process in every set S read in register Rj
  current output of Σ := {all processes from which (k,ack) was received} ∪ Pi(k-1)
Emulating Σ: the proof intuition
For any round k, process i stores all Pi(k’) (k’ < k) in Ri and includes Pi(k-1) in its emulated set Σi
=> any process j that reads Ri afterwards includes at least one process from Pi(k-1) in its emulated set Σj
=> every two emulated sets intersect.
Eventually, only correct processes send acks
=> eventually, the emulated set includes only correct processes.
Registers: the weakest failure detector
Σ is the weakest failure detector to implement atomic registers, in any environment
Consensus <=> registers + Ω
• Registers + Ω => Consensus: Ω can be used to solve consensus with registers, in any environment [LH94]
• Consensus => Registers: any consensus algorithm can be used to implement registers, in any environment [Lam86,Sch90]
• Consensus => Ω: Ω can be extracted from any failure detector D that solves consensus, in any environment [CHT96]
Consensus: the weakest failure detector
• Consensus <=> registers + Ω (in any environment)
• Σ is the weakest FD to implement registers (in any environment)
Thus, (Ω, Σ) is the weakest failure detector to solve consensus, in any environment.
Problem: quittable consensus (QC)
QC is like consensus except that, if a failure occurs, then processes can agree either
• on one of the proposed values (as in consensus), or
• on the special value Q (« Quit »).
Quittable consensus (QC)
propose(v) (v in {0,1}) returns a value in {0,1,Q} (Q stands for « quit »)
Agreement: no two processes return different values
Termination: every correct process eventually returns a value
Validity: only a value v in {0,1,Q} can be returned;
• if v in {0,1}, then some process previously proposed v;
• if v = Q, then a failure previously occurred.
Failure detector Ψ
For some initial period of time, Ψ outputs some predefined value Ø. Eventually,
• Ψ behaves like (Ω,Σ), or
• (only if a failure occurs) Ψ behaves like FS (outputs red).
NB: If a failure occurs, Ψ can choose to behave like (Ω,Σ) or like FS (the choice is the same at all processes).
Ψ is sufficient to solve QC
Propose(v)   // v in {0,1}
  wait until Ψ ≠ Ø
  if Ψ = red then return Q   // Ψ behaves like FS
  d := ConsPropose(v)   // Ψ behaves like (Ω,Σ): run a consensus algorithm
  return d
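The pseudocode translates almost line for line into Python. In this sketch the interfaces are assumptions: `psi` is a hypothetical callable modeling the local Ψ module (None stands for Ø), `cons_propose` stands for any consensus algorithm that uses (Ω,Σ), and “wait until” is modeled by polling.

```python
import time
from typing import Any, Callable, Union

Q = "Q"  # the special "quit" value


def qc_propose(v: int,
               psi: Callable[[], Any],
               cons_propose: Callable[[int], int],
               poll_interval: float = 0.01) -> Union[int, str]:
    """Quittable consensus using Psi, following the slide's pseudocode.

    psi() returns None while Psi outputs Ø, the string "red" if Psi behaves
    like FS, and any other value if Psi behaves like (Omega, Sigma).
    cons_propose(v) is some consensus algorithm that uses (Omega, Sigma).
    """
    if v not in (0, 1):
        raise ValueError("v must be 0 or 1")
    out = psi()
    while out is None:               # wait until Psi != Ø
        time.sleep(poll_interval)
        out = psi()
    if out == "red":                 # Psi behaves like FS: a failure occurred
        return Q
    return cons_propose(v)           # Psi behaves like (Omega, Sigma)
```

Agreement relies on the NB on the Ψ slide: Ψ makes the same choice at all processes, so no process returns Q while another one returns a consensus decision.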
Ψ is necessary to solve QC
Let A be a QC algorithm that uses a failure detector D.
We must show that Ψ can be extracted from A and D.
Simulating runs of A
Every process periodically samples D and exchanges its FD samples with other processes
=> using these FD samples, the process locally simulates runs of A [CHT96]
[Figure: processes p, q and r, each sampling its module D and simulating runs of A]
Extracting Ψ
Each process pi runs the simulation until, for every j=1,…,n, there is a simulated run starting from Ij in which pi decides.
If pi decides Q in one of the simulated runs: propose 0 to QC.
Otherwise, propose 1 to QC.
If QC decides 0 or Q: output red. Otherwise, it is possible to output (Ω,Σ).
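The decision logic of this step can be written down directly. In this hypothetical sketch, `simulated_decisions` holds the decisions of pi in the simulated runs (one per input vector), and `qc_propose` stands for a real invocation of the QC algorithm A; the simulation itself is not modeled.

```python
from typing import Callable, Iterable, Union

Q = "Q"  # the special "quit" value


def extract_psi_mode(simulated_decisions: Iterable[Union[int, str]],
                     qc_propose: Callable[[int], Union[int, str]]) -> str:
    """Return "red" (the emulated Psi behaves like FS) or "omega_sigma"
    (the emulated Psi goes on to output (Omega, Sigma))."""
    decisions = list(simulated_decisions)
    # Propose 0 if some simulated run decided Q, and 1 otherwise.
    proposal = 0 if Q in decisions else 1
    result = qc_propose(proposal)
    return "red" if result in (0, Q) else "omega_sigma"
```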
Extracting (Ω,Σ)
If there are “enough” simulated runs of A in which non-Q values are decided, then it is possible to extract (Ω,Σ).
• Extracting Ω: like in CHT, locating a critical index, etc. (by construction, a critical index exists)
• Extracting Σ: a novel technique
QC: the weakest failure detector
Ψ is the weakest failure detector to solve QC, in any environment
Problem: NBAC
A set of processes need to agree on whether to commit or to abort a transaction.
Initially, each process votes Yes (“I want to commit”) or No (“We must abort”)
Eventually, processes must reach a common decision (Commit or Abort).
Problem: NBAC
Agreement: no two processes return different values
Termination: every correct process eventually returns a value
Validity: a value in {Commit, Abort} is returned;
• if Commit is returned, then every process voted Yes;
• if Abort is returned, then some process voted No or a failure previously occurred.
NBAC <=> QC + FS
• NBAC => QC: any algorithm for NBAC can be used to solve QC
• NBAC => FS: any algorithm for NBAC can be used to extract FS
• QC + FS => NBAC: given (a) any algorithm for QC and (b) FS, we can solve NBAC
(QC, FS) => NBAC
Given (a) any algorithm for QC and (b) FS, we can solve NBAC:
send v to all   // v is the local vote
wait until all votes are received or FS outputs red   // i.e., until all votes are in or a failure occurs
if all votes are received and are Yes then proposal := 1   // propose to commit
else proposal := 0   // propose to abort
if QC.Propose(proposal) returns 1 then return Commit
else return Abort
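A direct Python transcription, under assumptions: message passing and the two black boxes are passed in as hypothetical callables (`send_to_all`, `received_votes`, `fs`, `qc_propose`), and “wait until” is modeled by polling. Note that a QC decision of Q also leads to Abort, which NBAC's validity allows because Q is only returned after a failure.

```python
import time
from typing import Any, Callable, Dict


def nbac(vote: str,
         send_to_all: Callable[[str], None],
         received_votes: Callable[[], Dict[int, str]],
         n: int,
         fs: Callable[[], str],
         qc_propose: Callable[[int], Any],
         poll_interval: float = 0.01) -> str:
    """NBAC from QC and FS, following the slide's pseudocode.

    received_votes() returns the votes received so far (including the local
    one), keyed by process; fs() returns "green" or "red"; qc_propose(b)
    returns 0, 1 or "Q".
    """
    send_to_all(vote)
    # Wait until all votes are received or a failure occurs.
    while len(received_votes()) < n and fs() != "red":
        time.sleep(poll_interval)
    votes = received_votes()
    all_yes = len(votes) == n and all(v == "Yes" for v in votes.values())
    proposal = 1 if all_yes else 0        # 1: propose to commit, 0: propose to abort
    return "Commit" if qc_propose(proposal) == 1 else "Abort"
```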
NBAC: the weakest failure detector
• NBAC <=> QC + FS (in any environment)
• Ψ is the weakest FD to solve QC (in any environment)
Thus, (Ψ, FS) is the weakest failure detector to solve NBAC, in any environment.
Problem: boosting consensus power
Assume that processes communicate through atomic (wait-free linearizable) objects.
An object type specifies the interface of the object:
• the set of states;
• the set of operations;
• the set of possible state transitions.
Problem: boosting consensus power
Consensus power [Herlihy, 1991] of an object type T is the maximum number of processes that can solve consensus using atomic objects of type T and registers.
cons(Register) = 1, cons(T&S) = 2 (test-and-set), cons(C&S) = ∞ (compare-and-swap)
By definition, given a type T with consensus power n, n+1 processes cannot solve consensus using objects of type T and registers.
Problem: boosting consensus power
Setting: n + 1 processes, registers, and shared objects of type T with cons(T) = n.
What is the weakest failure detector D to solve consensus?
Neiger’s conjecture [Nei95]
Ω(k) outputs a set of at most k processes such that, eventually, all correct processes detect the same set, and this set includes at least one correct process.
• Ω(k+1) is weaker than Ω(k).
• Ω(n) is sufficient to solve (n + 1)-process consensus using objects of type T and registers.
Is Ω(n) necessary?
Partial response
Yes, if T is one-shot deterministic:
• deterministic: every operation triggers exactly one transition;
• one-shot: at most one operation on an object of type T is allowed for every process.
Partial response
Theorem: Ω(n) is necessary to implement wait-free (n + 1)-process consensus with registers and objects of a one-shot deterministic type T such that cons(T) ≤ n.
Corollary: Ω(n) is necessary to implement (n + 1)-process consensus using registers and (n − 1)-resilient objects of any type.
The sources
• C. Delporte-Gallet, H. Fauconnier, R. Guerraoui, V. Hadzilacos, P. Kouznetsov, and S. Toueg. The weakest failure detectors to solve certain fundamental problems in distributed computing. PODC 2004.
• R. Guerraoui and P. Kouznetsov. Failure detectors and type boosters. DISC 2003.
• C. Delporte-Gallet, H. Fauconnier, R. Guerraoui, and P. Kouznetsov. Mutual exclusion in asynchronous systems with failure detectors. To appear in JPDC 2005.
Thank you!