Reliable Distributed Systems
Membership
Agreement on Membership
Recall our approach:
Detecting failure is a lost cause: too many things can mimic failure
To be accurate we would end up waiting for a process to recover
Substitute agreement on membership
Now we can drop a process because it isn't fast enough
This can seem "arbitrary", e.g. A kills B...
The GMS implements this service for everyone else
Architecture
Membership Agreement: "join/leave" and "P seems to be unresponsive"
2PC-like protocols use membership changes instead of failure notification
Applications use replicated data for high availability
Architecture
[Diagram: GMS processes A, B, C, D handle "join", "leave", and "A seems to have failed" events, producing the sequence of membership views {A}, {A,B,D}, {A,D}, {A,D,C}, {D,C}; application processes X, Y, Z consume these views.]
Contrast dynamic with static model
Static model: fixed set of processes "tied" to resources
Processes may be unreachable (while failed or partitioned away) but later recover
Think: "cluster of PCs"
Dynamic model: changing set of processes launched while the system runs; some fail or terminate
Failed processes never recover (a partitioned process may reconnect, but uses a new pid)
A dynamic process can still own a physical resource, allowing us to emulate a static model
Consistency options
Could require that the system always be consistent with actions taken at a process, even if that process fails immediately after taking the action
This property is needed in systems that take external actions, like advising an air traffic controller
May not be needed in high-availability systems
The alternative is to require that the operational part of the system remain continuously self-consistent
Obstacles to progress
Fischer, Lynch and Paterson (FLP) result: proof that agreement protocols cannot be both externally consistent and live in asynchronous environments
Suggests that the choice between internal consistency and external consistency is a fundamental one!
Can show that this result also applies to dynamic membership problems
Usual response to FLP: Chandra/Toueg
Consider the system as having a failure detector that provides input to the basic system itself
Agreement protocols within the system are considered safe and live if they satisfy their safety properties and are live whenever the failure detector is live
Babaoglu expresses a similar result in terms of reachability of processes: protocols are live during periods of reachability
Towards an Alternative
In this lecture, focus on systems with self-defined membership
Idea is that if p can't talk to q, p will initiate a membership change that removes q from p's "membership view" of the system
Illustrated on next slide
Commit protocol from when we discussed transactions
[Diagram: 2PC exchange — "ok to commit?", "ok" — with some processes left at "decision unknown!" and "vote unknown!"]
Suppose this is a partitioning failure
[Same diagram, but the processes stuck at "decision unknown!" / "vote unknown!" are merely partitioned away from the rest.]
Do these processes actually need to be consistent with the others?
Primary partition concept
Idea is to identify the notion of "the system" with a unique component of the partitioned system
Call this distinguished component the "primary" partition of the system as a whole
The primary partition can speak with authority for the system as a whole
Non-primary partitions have weaker consistency guarantees and limited ability to initiate new actions
Ricciardi: Group Membership Protocol
For use in a group membership service (usually just a few processes that run on behalf of the whole system)
The GMS tracks its own membership; its members use this to maintain the membership list for the whole system
All users of the service see subsequences of a single system-wide group membership history
The GMS also tracks the primary partition
GMP protocol itself
Used only to track membership of the "core" GMS
Designates one GMS member as the coordinator
Switches between 2PC and 3PC:
2PC if the coordinator didn't fail and other members failed or are joining
3PC if the coordinator failed and some other member is taking over as the new coordinator
Question: how to avoid "logical partitioning"?
GMS majority requirement
To move from system "view" i to view i+1, the GMS requires explicit acknowledgement by a majority of the processes in view i
If it can't get a majority, the GMS loses its claim to "primaryness"
Dahlia Malkhi has extended the GMP to support partitioning and remerging; a similar idea is used by Yair Amir and others in the Totem system
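A minimal sketch of the majority rule, not from the slides (the names `View`, `can_install_next_view` are invented): view i+1 can be installed only after a majority of view i's members acknowledge it; without a majority, this GMS component loses its claim to primaryness.

```python
from dataclasses import dataclass
from typing import FrozenSet, Set

@dataclass(frozen=True)
class View:
    view_id: int                 # "view i"
    members: FrozenSet[str]      # processes belonging to view i

def can_install_next_view(current: View, acks: Set[str]) -> bool:
    """View i+1 may be installed only if a majority of view i has acknowledged it."""
    voters = acks & current.members                  # only members of view i count
    return len(voters) > len(current.members) // 2

def install_next_view(current: View, proposed: Set[str], acks: Set[str]) -> View:
    if not can_install_next_view(current, acks):
        # No majority: this GMS component loses its claim to "primaryness".
        raise RuntimeError(f"no majority of view {current.view_id}; not primary")
    return View(current.view_id + 1, frozenset(proposed))

# Example: view 3 = {p,q,r,s,t}; acks from p, q, r are a majority of 5,
# so view 4 = {p,q,r,s} (dropping t) may be installed.
v3 = View(3, frozenset({"p", "q", "r", "s", "t"}))
v4 = install_next_view(v3, {"p", "q", "r", "s"}, acks={"p", "q", "r"})
```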
GMS in Action
[Timeline diagram: processes p0 through p5.]
p0 is the initial coordinator. p1 and p2 join, then p3...p5 join. But p0 fails during join protocol, and later so does p3. Notice use of majority consent to avoid partitioning!
GMS in Action
[Timeline diagram: processes p0 through p5. p0 is the coordinator and runs 2-phase commit; when p0 fails, p1 takes over using a 3-phase commit; p1, now the new coordinator, continues with 2-phase commit.]
What if the system has thousands of processes?
Idea is to build a GMS subsystem that runs on just a few nodes
GMS members track themselves
Other processes ask to be admitted to the system, or for faulty processes to be excluded
The GMS treats overall system membership as a form of replicated data that it manages and reports to its "listeners"
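A minimal sketch of the "listeners" idea, with invented names (`GMSFrontEnd`, `register_listener`) rather than the lecture's API: the GMS holds the system membership list and pushes each new view to registered callbacks, so every listener sees the same view sequence.

```python
from typing import Callable, List, Set

ViewListener = Callable[[int, Set[str]], None]        # callback(view_id, members)

class GMSFrontEnd:
    """Toy GMS front end: manages the membership list and notifies listeners."""
    def __init__(self, initial_members: Set[str]):
        self.view_id = 0
        self.members = set(initial_members)
        self.listeners: List[ViewListener] = []

    def register_listener(self, listener: ViewListener) -> None:
        self.listeners.append(listener)
        listener(self.view_id, set(self.members))      # hand over the current view at once

    def _install(self, members: Set[str]) -> None:
        self.view_id += 1
        self.members = set(members)
        for listener in self.listeners:                # every listener sees the same sequence
            listener(self.view_id, set(self.members))

    def admit(self, pid: str) -> None:
        self._install(self.members | {pid})

    def exclude(self, pid: str) -> None:
        self._install(self.members - {pid})

# Example: an application process just prints the views it is told about.
gms = GMSFrontEnd({"x", "y"})
gms.register_listener(lambda vid, m: print("view", vid, sorted(m)))
gms.admit("z")
gms.exclude("x")
```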
Uses of membership?
If we rewire TCP and RPC to use membership changes as the trigger for breaking connections, we can eliminate split-brain problems!
But nobody really does this: the problem is that networks lack standard GMS subsystems today
But we can still use the idea ourselves
Replicated data within groups
A very general requirement:
Data actually managed by the group
Inputs and outputs, in a server replicated for fault-tolerance
Coordination and synchronization data
We will see how to solve this, and then use the solution to implement "process groups", which are subgroups of the overall system membership
Replicated data
Assume that we have a (dynamically defined) group of processes G and that its members manage a replicated data item
Goal: update by sending a multicast to G
Should be able to safely read any copy "locally"
Consider the situation where members of G may fail or recover
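A minimal sketch of the usage pattern, with invented names (`ReplicatedDict`, the `multicast` hook is left abstract), not the lecture's code: writes go out as a multicast to the group G and are applied on delivery, reads return the local copy.

```python
from typing import Any, Callable, Dict

class ReplicatedDict:
    """Each member of G holds a copy; updates are multicast, reads are local."""
    def __init__(self, multicast: Callable[[Dict[str, Any]], None]):
        self._multicast = multicast      # sends an update message to every member of G
        self._copy: Dict[str, Any] = {}

    def update(self, key: str, value: Any) -> None:
        # The update is not applied directly; it is multicast and applied on delivery,
        # so every replica applies the same updates.
        self._multicast({"op": "set", "key": key, "value": value})

    def on_deliver(self, msg: Dict[str, Any]) -> None:
        if msg["op"] == "set":
            self._copy[msg["key"]] = msg["value"]

    def read(self, key: str) -> Any:
        return self._copy.get(key)       # local read, no communication

# Example with a trivial "loopback" multicast that delivers only to this replica:
r = ReplicatedDict(multicast=lambda m: r.on_deliver(m))
r.update("x", 1)
assert r.read("x") == 1
```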
Some Initial Assumptions
For now, assume that we work directly on the real network, not using Ricciardi's GMS
Later we will need to put the GMS back in to solve a problem this raises; but for now, the model is the very simple one: processes that communicate using messages, an asynchronous network, crash failures
We'll also need our own implementation of TCP-style reliable point-to-point channels, using the GMS as input
Process group model
Initially, we'll assume we are simply given the model
Later we will see that we can use reliable multicast to implement the model
First approximation: a process group is defined by a series of "views" of its membership
All members see the same sequence of view changes; failures and joins are reported by changing the membership
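A minimal sketch of this first approximation, with invented names (`GroupView`, `ProcessGroup`): a group is just an ordered series of views, and every member appends the same views in the same order.

```python
from dataclasses import dataclass
from typing import FrozenSet, List

@dataclass(frozen=True)
class GroupView:
    view_id: int
    members: FrozenSet[str]

class ProcessGroup:
    """A group defined by the sequence of views installed so far."""
    def __init__(self, initial_members):
        self.history: List[GroupView] = [GroupView(0, frozenset(initial_members))]

    def current(self) -> GroupView:
        return self.history[-1]

    def member_joins(self, pid: str) -> GroupView:
        return self._install(self.current().members | {pid})

    def member_fails(self, pid: str) -> GroupView:
        return self._install(self.current().members - {pid})

    def _install(self, members) -> GroupView:
        view = GroupView(self.current().view_id + 1, frozenset(members))
        self.history.append(view)   # every member appends the same view in the same order
        return view

# Example: p and q start the group; r, s and later t join; p fails.
g = ProcessGroup({"p", "q"})
g.member_joins("r"); g.member_joins("s")
g.member_fails("p")
g.member_joins("t")
```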
Process groups with joins, failures
[Timeline diagram: processes p, q, r, s, t. r and s request to join and are added with a state transfer; p crashes; t requests to join and is added with a state transfer. Resulting view sequence: G0={p,q}, G1={p,q,r,s}, G2={q,r,s}, G3={q,r,s,t}.]
State transfer
Method for passing information about the state of a group to a joining member
Looks instantaneous, at the time the member is added to the view
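A minimal sketch of state transfer under my own framing (the names `Member`, `checkpoint`, `on_view` are invented): when the view that adds a joiner is installed, one existing member sends a checkpoint of the group state as of that view, and the joiner initializes from it before applying later messages, which makes the transfer look instantaneous at the view boundary.

```python
import json
from typing import Any, Dict, Set

class Member:
    """Toy group member supporting state transfer at a view change."""
    def __init__(self, pid: str):
        self.pid = pid
        self.state: Dict[str, Any] = {}
        self.view_members: Set[str] = set()

    def checkpoint(self) -> bytes:
        # Serialize the replicated state as of the view that adds the joiner.
        return json.dumps(self.state).encode()

    def on_view(self, view_id: int, members: Set[str], joiners: Set[str], transport) -> None:
        old = self.view_members
        self.view_members = set(members)
        if old and self.pid == min(old):          # some deterministic rule picks the sender
            for j in joiners:
                transport.send(j, ("STATE", view_id, self.checkpoint()))

    def on_state(self, view_id: int, blob: bytes) -> None:
        # The joiner installs the checkpoint, then applies only messages delivered
        # in this view and later ones.
        self.state = json.loads(blob.decode())
```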
Outline of treatment
First, look at reliability and failure atomicity
Next, look at options for "ordering" in group multicast
Next, discuss implementation of the group view mechanisms themselves
Finally, return to state transfer
Outcome: process groups, group communication, state transfer, and fault-tolerance properties
Atomic delivery
Atomic or failure-atomic delivery:
If any process receives the message and remains operational, all operational destinations receive it
[Diagram: processes p, q, r, s; multicasts a and b. All processes that receive a subsequently fail; all processes receive b.]
Additional properties
A multicast is dynamically uniform if:
If any process delivers the multicast, then all group members that don't fail will deliver it (even if that initial recipient fails immediately after delivery)
Otherwise we say that the multicast is "not uniform"
Uniform and non-uniform delivery
[Two diagrams over processes p, q, r, s with multicasts a and b, some processes failing: the left shows uniform delivery of a and b; the right shows non-uniform delivery of a.]
Stronger properties cost more
Weaker ordering guarantees are cheaper than stronger ones
Non-uniform delivery is cheap; dynamic uniformity is costly
Dynamic membership is cheap; static membership is more costly
Conceptual cost graph
[Graph: cost versus ordering, from "less ordered" through "local total order" to "global total order". The cheap region is the non-uniform, dynamic group; the expensive region is the uniform, static group. Example points: an asynchronous and non-uniform "cbcast" to a dynamically defined group versus a uniform and globally total "abcast" in a static group.]
cbcast in Horus: 85,000/second, 85us latency sender to dest
Total, safe abcast in Totem or Transis: 600/second, 750ms latency sender to dest
Implementing multicast primitives
Initially assume a static process group
Crash failures: permanent failures; a process fails by crashing, undetectably. No GMS (at first).
Unreliable communication: messages can be lost in the channels
... looks like the asynchronous model of FLP
Failures?
Message loss: overcome with retransmission
Process failures: assume processes "crash" silently
Network failures: also called "partitioning"
We can't distinguish between these cases!
[Diagram: a network partition separates p and q; p times out and decides "q failed!", while q times out and decides "p failed!".]
Multicast by "flooding"
All recipients echo the message to all other recipients: O(n²) messages exchanged
Reject duplicates on the basis of the message id
When can we garbage collect the id?
[Diagram, shown over several animation steps: processes p, q, r, s; p multicasts a, some processes fail, and the recipients echo a to all others.]
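A minimal sketch of flooding with duplicate suppression, with invented names (`FloodingNode`, `Network`) and an in-process "network" for illustration: on first receipt of a message id a process delivers it and echoes it to every other member; later copies are discarded.

```python
from typing import Dict, List, Set, Tuple

class FloodingNode:
    """Toy flooding multicast: echo every new message to all peers, drop duplicates."""
    def __init__(self, pid: str, peers: Set[str], network: "Network"):
        self.pid = pid
        self.peers = peers
        self.network = network
        self.seen: Set[Tuple[str, int]] = set()   # ids we must remember (the gc problem!)
        self.delivered: List[str] = []

    def multicast(self, seqno: int, payload: str) -> None:
        self._handle((self.pid, seqno), payload)

    def receive(self, msg_id: Tuple[str, int], payload: str) -> None:
        self._handle(msg_id, payload)

    def _handle(self, msg_id, payload) -> None:
        if msg_id in self.seen:
            return                                 # duplicate: already delivered and echoed
        self.seen.add(msg_id)
        self.delivered.append(payload)
        for peer in self.peers - {self.pid}:       # echo to everyone else: O(n^2) overall
            self.network.send(peer, msg_id, payload)

class Network:
    """Delivers immediately to non-crashed nodes; messages to crashed nodes vanish."""
    def __init__(self):
        self.nodes: Dict[str, FloodingNode] = {}
    def send(self, dest, msg_id, payload):
        if dest in self.nodes:
            self.nodes[dest].receive(msg_id, payload)

net = Network()
members = {"p", "q", "r", "s"}
net.nodes = {pid: FloodingNode(pid, members, net) for pid in members}
del net.nodes["s"]                  # s has crashed
net.nodes["p"].multicast(1, "a")    # survivors q and r still deliver "a" exactly once
```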
Garbage collection issue
Must remember the id as long as we might still see a duplicate copy
If no process fails: garbage collect after the message has been echoed by all destinations
Very similar to the 3PC protocol
... correctness of this protocol depends upon having an accurate way to detect failure! Return to this point in a few minutes.
"Lazy" flooding and garbage collection
Idea is to delay "non-urgent" messages
Recipients delay the echo in the hope that the sender will confirm successful delivery: O(n) messages
[Diagram: p multicasts a to q, r, s; the recipients acknowledge ("ack...") back to p instead of echoing.]
"Lazy" flooding
Recipients delay the echo in the hope that the sender will confirm successful delivery: O(n) messages
Notice that garbage collection occurs in the 3rd phase
[Diagram: p multicasts a to q, r, s; the recipients reply "ack..."; p announces "all got it..."; in a third phase the destinations garbage collect the id (some processes fail during the run).]
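A minimal sketch of the three phases, a simplification of my own with invented names (`LazySender`, `LazyReceiver`): the sender multicasts the data, recipients ack, the sender announces "all got it" once every destination acked, and a final notice lets everyone garbage collect the message id.

```python
from typing import Dict, List, Set, Tuple

Msg = Tuple[str, int]   # (kind, seqno)

class LazySender:
    """Sender-driven reliable multicast: DATA, then ALL_GOT_IT, then GC."""
    def __init__(self, dests: Set[str]):
        self.dests = dests
        self.acked: Dict[int, Set[str]] = {}

    def phase1(self, seqno: int) -> List[Tuple[str, Msg]]:
        self.acked[seqno] = set()
        return [(d, ("DATA", seqno)) for d in self.dests]

    def on_ack(self, seqno: int, frm: str) -> List[Tuple[str, Msg]]:
        self.acked[seqno].add(frm)
        if self.acked[seqno] != self.dests:
            return []
        # Phase 2 and (possibly delayed or piggybacked) phase 3, once all destinations acked.
        return [(d, ("ALL_GOT_IT", seqno)) for d in self.dests] + \
               [(d, ("GC", seqno)) for d in self.dests]

class LazyReceiver:
    def __init__(self):
        self.remembered: Set[int] = set()   # ids kept for duplicate suppression

    def on_message(self, msg: Msg) -> List[Msg]:
        kind, seqno = msg
        if kind == "DATA":
            if seqno not in self.remembered:
                self.remembered.add(seqno)          # deliver the payload here
            return [("ACK", seqno)]                 # ack back to the sender
        if kind == "GC":
            self.remembered.discard(seqno)          # finally safe to forget the id
        return []                                   # ALL_GOT_IT needs no reply
```

If the sender fails before the later phases, the recipients fall back to flooding, which is exactly where the garbage-collection problem of the next slides comes from.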
"Lazy" flooding, delayed phases
"Background" acknowledgements (not shown)
Piggyback the 2nd and 3rd phases on other multicasts
[Diagram: p multicasts m1 to q, r, s; the next slide shows later multicasts piggybacking the status of earlier ones.]
"Lazy" flooding, delayed phases
"Background" acknowledgements (not shown)
Piggyback the 2nd and 3rd phases on other multicasts
Reliable multicasts now look cheap!
[Diagram: p multicasts m1, m2, m3, m4 to q, r, s; m2 carries "all got m1", m3 carries "gc m1", m4 carries "gc m2"; one process fails during the run.]
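A minimal sketch of the piggybacking idea, with an invented message layout (`Multicast`, `PiggybackSender`): the sender attaches the stability ("all got") and garbage-collection notices for earlier messages to the header of each new multicast, so the 2nd and 3rd phases cost no extra messages.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

@dataclass
class Multicast:
    seqno: int
    payload: str
    all_got: List[int] = field(default_factory=list)   # piggybacked phase-2 notices
    gc: List[int] = field(default_factory=list)        # piggybacked phase-3 notices

class PiggybackSender:
    def __init__(self, dests: Set[str]):
        self.dests = dests
        self.acks: Dict[int, Set[str]] = {}
        self.pending_all_got: List[int] = []
        self.pending_gc: List[int] = []
        self.next_seq = 0

    def record_ack(self, seqno: int, frm: str) -> None:
        self.acks.setdefault(seqno, set()).add(frm)
        if self.acks[seqno] == self.dests:
            self.pending_all_got.append(seqno)          # stable; announce on next multicast

    def send(self, payload: str) -> Multicast:
        self.next_seq += 1
        msg = Multicast(self.next_seq, payload,
                        all_got=self.pending_all_got, gc=self.pending_gc)
        # Messages announced as "all got" on this multicast can be garbage
        # collected one round later, once everyone has seen that announcement.
        self.pending_gc = self.pending_all_got
        self.pending_all_got = []
        return msg

# Example matching the figure: m2 carries "all got m1", m3 carries "gc m1".
s = PiggybackSender({"q", "r", "s"})
m1 = s.send("m1")
for d in {"q", "r", "s"}:
    s.record_ack(m1.seqno, d)
m2 = s.send("m2")   # m2.all_got == [1]
m3 = s.send("m3")   # m3.gc == [1]
```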
Lazy scheme continued
If the sender fails, recipients switch to the flood-style algorithm...
... but now we have the same garbage collection problem: if the sender fails we may never be able to garbage collect the id!
The problem is caused by the lack of a failure detector
Garbage collection with inaccurate failure detections
... we lack an accurate way to detect failure
If a process seems to fail but is really still operational and merely partitioned away, the connection might later be fixed
That process might "wake up" and send a duplicate
Hence, if we are not sure a process has failed, we can't garbage collect our duplicate-suppression data yet!
Exploiting a failure detector
Suppose that we had a failstop environment
Process group membership managed by an oracle, perhaps the GMS we saw earlier
Failures reported as "new group views"
All see the same sequence of views: G = {p,q,r,s}, {p,r,s}, {r,s}
Now we can assume failures are accurately detected
Now our lazy scheme works!
Garbage collect when all non-faulty processes are known to have received the message
Use process ranking to pick a new "coordinator" if the initial one fails
Cost only reaches O(n²) if many processes fail during the protocol
Can delay the 2nd and 3rd rounds if desired
Also link the GMS to the point-to-point channel implementation
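A minimal sketch of the ranking rule, using alphabetical order as a stand-in for rank (the function names are invented): with accurate, view-based failure reporting, the lowest-ranked surviving member of the current view takes over as coordinator and finishes the protocol.

```python
from typing import List, Optional

def coordinator(view_members: List[str]) -> str:
    """Rank = position in a fixed ordering of the view; rank 0 is the coordinator."""
    return sorted(view_members)[0]

def new_coordinator_after_view_change(old_view: List[str], new_view: List[str]) -> Optional[str]:
    """If the old coordinator was dropped from the view, the next-ranked survivor takes over."""
    old = coordinator(old_view)
    if old in new_view:
        return None                      # coordinator survived; nothing to do
    return coordinator(new_view)         # lowest-ranked survivor becomes coordinator

# Example matching the view sequence two slides back: {p,q,r,s} -> {p,r,s} -> {r,s}
assert new_coordinator_after_view_change(["p", "q", "r", "s"], ["p", "r", "s"]) is None
assert new_coordinator_after_view_change(["p", "r", "s"], ["r", "s"]) == "r"
```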
Failure Detectors
Needed as "input" to the GMS. For now, just assume we have one, perhaps Vogels' failure investigator
In practice many systems use "timeout", but timeout is not safe for our purposes
Feeding detections through the group membership service converts inaccurate failure detections into what look like failstop failures for processes within the system
Cutting Channels to Failed Processes
When a process is dropped from the membership, break the connection to it
This effectively eliminates the risk of "late" delivery of duplicate messages, etc.
Makes a partitioning failure look like a failstop failure
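A minimal sketch of wiring the channel layer to the membership reports, with an invented interface (`ChannelLayer`, `on_new_view`): on each new view, connections to processes that were dropped are closed, so a merely partitioned process cannot later deliver stale messages over an old channel.

```python
from typing import Dict, Set

class Channel:
    """Stand-in for a TCP-style connection."""
    def __init__(self, peer: str):
        self.peer, self.open = peer, True
    def close(self) -> None:
        self.open = False

class ChannelLayer:
    def __init__(self):
        self.channels: Dict[str, Channel] = {}

    def connect(self, peer: str) -> Channel:
        return self.channels.setdefault(peer, Channel(peer))

    def on_new_view(self, members: Set[str]) -> None:
        # Cut every channel to a process the GMS has dropped from the view.
        for peer in list(self.channels):
            if peer not in members:
                self.channels.pop(peer).close()

layer = ChannelLayer()
layer.connect("p"); layer.connect("q")
layer.on_new_view({"q", "r"})        # p was dropped: its channel is closed
assert "p" not in layer.channels
```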
Dynamic uniformity
This property requires an extra phase of communication
Phase 1: distribute the message
Phase 2: deliver once all non-faulty processes have received it in phase 1
Insight: no process delivers a message until all have received it
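A minimal sketch of the two phases, with invented names (`UniformReceiver`, `UniformSender`): each recipient buffers the message on receipt and delivers it only after being told that every non-faulty member has received it.

```python
from typing import Dict, List, Set

class UniformReceiver:
    """Buffer in phase 1; deliver only on the phase-2 'everyone has it' notice."""
    def __init__(self):
        self.buffered: Dict[int, str] = {}
        self.delivered: List[str] = []

    def on_phase1(self, seqno: int, payload: str) -> None:
        self.buffered[seqno] = payload            # received, but NOT yet delivered

    def on_phase2(self, seqno: int) -> None:
        if seqno in self.buffered:
            self.delivered.append(self.buffered.pop(seqno))

class UniformSender:
    def __init__(self, members: Set[str]):
        self.members = members
        self.received_by: Dict[int, Set[str]] = {}

    def on_recv_ack(self, seqno: int, frm: str) -> bool:
        """Returns True once phase 2 (delivery) may be announced for seqno."""
        self.received_by.setdefault(seqno, set()).add(frm)
        # In the full protocol, 'members' shrinks as the GMS reports failures,
        # so we only wait for processes still in the current view.
        return self.received_by[seqno] == self.members
```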
Summary
We know how to build a GMS that tracks its own membership
We know how to build an unordered reliable multicast
Actually, it is "sender-ordered": messages from different senders can be delivered in arbitrary orders
And we know how to support various forms of uniformity
Next: multicast ordering