CS 194: Distributed Systems
Process Resilience, Reliable Group Communication
Scott Shenker and Ion Stoica
Computer Science Division
Department of Electrical Engineering and Computer Sciences
University of California, Berkeley
Berkeley, CA 94720-1776
Some definitions…
Availability: probability the system operates correctly at any given moment
Reliability: ability to run correctly for a long interval of time
Safety: failure to operate correctly does not lead to catastrophic failures
Maintainability: ability to “easily” repair a failed system
… and Some More Definitions (Failure Models)
Crash failure: a server halts, but works correctly until it halts
Omission failure: a server fails to respond to a request
Timing failure: a server’s response falls outside the specified time interval
Response failure: server’s response is incorrect
Arbitrary (Byzantine) failure: server produces arbitrary response at arbitrary times
Masking Failures: Redundancy
How many failures can this design tolerate?
Example: Open Shortest Path First (OSPF) over Broadcast Networks
1) Each node sends its route advertisements to the multicast group DR-rtrs
- Both the designated router (DR) and the backup designated router (BDR) subscribe to this group
2) The DR floods route advertisements back to all routers
- Sent to the all-rtrs multicast group, to which all nodes subscribe
[Figure: broadcast network with the DR and BDR marked]
Agreement in Faulty Systems
Many things can go wrong…
Communication
- Message transmission can be unreliable
- Time taken to deliver a message is unbounded
- Adversary can intercept messages
Processes
- Can fail or team up to produce wrong results
Agreement is very hard, sometimes impossible, to achieve!
Two-Army Problem
“Two blue armies need to attack the white army simultaneously to win; otherwise they will be defeated. The blue armies can communicate only across the area controlled by the white army, which can intercept the messengers.”
What is the solution?
Byzantine Agreement [Lamport et al. (1982)]
Goal:
- Each process learns the true values sent by correct processes
Assumptions:
- Every message that is sent is delivered correctly
- The receiver knows who sent the message
- Message delivery time is bounded
Byzantine Agreement Result
In a system with m faulty processes, agreement can be achieved only if 2m+1 processes are functioning correctly (3m+1 processes in total)
Note: This result only guarantees that each process receives the true values sent by correct processes; it does not identify which processes are correct!
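As a quick check of the bound above, the following Python sketch tests whether a group of n processes, m of them faulty, still has at least 2m+1 correct members. The function name `agreement_possible` and its parameters are illustrative assumptions, not part of the lecture.

```python
def agreement_possible(n: int, m: int) -> bool:
    """Return True if n processes, m of them faulty, satisfy the bound.

    The bound asks for at least 2m + 1 correctly functioning processes,
    i.e. n - m >= 2m + 1, which is the same as n >= 3m + 1.
    """
    if n < 0 or m < 0 or m > n:
        raise ValueError("need 0 <= m <= n")
    return n - m >= 2 * m + 1


if __name__ == "__main__":
    for n, m in [(3, 1), (4, 1), (6, 2), (7, 2)]:
        print(f"n={n}, m={m}: agreement possible = {agreement_possible(n, m)}")
```

With one faulty process, three processes are too few and four are enough.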
Byzantine Generals Problem: Example
Phase 1: Generals announce their troop strengths to each other
[Figure: P1 sends its value, 1, to P2, P3, and P4]
[Figure: P2 sends its value, 2, to P1, P3, and P4]
[Figure: P4 sends its value, 4, to P1, P2, and P3]
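Byzantine Generals Problem: Example
Phase 2: Each general constructs a vector with all troop strengths
P3 is faulty and reports x to P1, y to P2, and z to P4
Vector (P1, P2, P3, P4) at P1: (1, 2, x, 4)
Vector (P1, P2, P3, P4) at P2: (1, 2, y, 4)
Vector (P1, P2, P3, P4) at P4: (1, 2, z, 4)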
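Byzantine Generals Problem: Example
Phase 3: Generals send their vectors to each other and take a majority vote on each element
P3 (faulty) sends (a, b, c, d) to P1, (e, f, g, h) to P2, and (h, i, j, k) to P4
Vectors received by P1: (1, 2, y, 4) from P2, (a, b, c, d) from P3, (1, 2, z, 4) from P4
Vectors received by P2: (1, 2, x, 4) from P1, (e, f, g, h) from P3, (1, 2, z, 4) from P4
Vectors received by P4: (1, 2, x, 4) from P1, (1, 2, y, 4) from P2, (h, i, j, k) from P3
Result at each of P1, P2, and P4: (1, 2, ?, 4)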
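The three phases of this example can be replayed in a few lines of Python. This is a minimal sketch of the four-general illustration only, not a general Byzantine agreement implementation; `run_generals`, `majority`, and the `lie` callback are hypothetical names, and P3's true strength of 3 is an assumed value that never reaches the other generals.

```python
from collections import Counter

UNKNOWN = "?"


def majority(values):
    """Return the value reported by a strict majority, or UNKNOWN if none exists."""
    value, count = Counter(values).most_common(1)[0]
    return value if count > len(values) / 2 else UNKNOWN


def run_generals(true_values, faulty, lie):
    """Simulate phases 1-3 and return each correct process's final vector.

    true_values: dict mapping process name -> its real troop strength
    faulty:      set of names of faulty processes
    lie:         hypothetical callable (sender, receiver, slot) -> arbitrary value
    """
    procs = sorted(true_values)
    correct = [p for p in procs if p not in faulty]

    # Phases 1 and 2: each correct process q records what every sender s announced.
    vector = {
        q: [true_values[s] if s not in faulty else lie(s, q, s) for s in procs]
        for q in correct
    }

    # Phase 3: q collects one vector from every other process and votes per slot.
    result = {}
    for q in correct:
        received = []
        for r in procs:
            if r == q:
                continue
            if r in faulty:
                received.append([lie(r, q, s) for s in procs])
            else:
                received.append(vector[r])
        result[q] = [majority(column) for column in zip(*received)]
    return result


if __name__ == "__main__":
    strengths = {"P1": 1, "P2": 2, "P3": 3, "P4": 4}

    def liar(sender, receiver, slot):
        # A faulty sender tells every receiver something different about every slot.
        return f"{sender}->{receiver}:{slot}"

    for process, vec in run_generals(strengths, {"P3"}, liar).items():
        print(process, vec)
```

Each correct general ends with the same vector, in which only the faulty general's slot is unknown. Running the same vote with three generals, one of them faulty, leaves every slot without a majority.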
Reliable Group Communication
Reliable multicast: all nonfaulty processes which do not join/leave during communication receive the message
Atomic multicast: all messages are delivered in the same order to all processes
Reliable multicast: (N)ACK Implosion
(Positive) acknowledgements
- Ack every n received packets
- What happens for multicast?
Negative acknowledgements
- Send a NACK only when data is lost
- Assume packet 2 is lost
[Figure: source S multicasts packets 1, 2, 3 to receivers R1, R2, R3]
Reliable multicast: (N)ACK Implosion
When a packet is lost, all receivers in the sub-tree rooted at the link where the loss occurred send NACKs
[Figure: packet 2 is lost; R1, R2, and R3 each receive packet 3 and send a NACK for packet 2 toward S]
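To see how quickly NACKs pile up, the sketch below lists every receiver that would send a NACK when one link of a multicast tree drops a packet. The tree shape, the `children` mapping, and the function name are assumptions made for illustration; every node below the lossy link is treated as a receiver.

```python
def receivers_sending_nack(children, lost_below):
    """Return the nodes in the sub-tree rooted at the downstream end of the lossy link.

    children:   dict mapping a node to the list of its children in the multicast tree
    lost_below: the node at the downstream end of the link that dropped the packet
    """
    affected = []
    stack = [lost_below]
    while stack:
        node = stack.pop()
        affected.append(node)
        stack.extend(children.get(node, []))
    return sorted(affected)


if __name__ == "__main__":
    # Hypothetical tree: S feeds R1, which feeds R2 and R3.
    tree = {"S": ["R1"], "R1": ["R2", "R3"]}
    print(receivers_sending_nack(tree, "R1"))  # loss on the link S -> R1
    print(receivers_sending_nack(tree, "R3"))  # loss on the link R1 -> R3
```

A loss close to the source affects the whole group, so the number of NACKs arriving at S grows with the group size.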
Scalable Reliable Multicast (SRM) [Floyd et al. ’95]
Receivers use timers to send NACKs and retransmissions
- Randomized: prevents implosion
- Uses latency estimates
  • A short timer causes duplicates when there is reordering
  • A long timer causes excess delay
Any node retransmits
- Sender can use its bandwidth more efficiently
- Overall group throughput is higher
Duplicate NACK/retransmission suppression
Inter-node Latency Estimation
Every node estimates latency to every other node
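Uses session reports
- Assume symmetric latency
- What happens when the group becomes very large?
[Figure: timeline of a session-report exchange between A and B, showing times t1 and t2 and delay d]
$d_{A,B} = (t_2 - t_1 - d)/2$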
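The estimate dA,B = (t2 – t1 – d)/2 is easy to compute directly. The sketch below assumes one reading of the figure: t1 is when A sends its session report, t2 is when A receives B's reply, and d is how long B held the report before replying. The function name is hypothetical.

```python
def estimate_one_way_latency(t1: float, t2: float, d: float) -> float:
    """Estimate the latency between A and B as (t2 - t1 - d) / 2.

    t1: time at which A sent its session report (A's clock)
    t2: time at which A received B's reply (A's clock)
    d:  time B held the report before replying (B's clock)
    Latency is assumed to be symmetric.
    """
    round_trip = t2 - t1 - d
    if round_trip < 0:
        raise ValueError("inconsistent timestamps: t2 - t1 is smaller than d")
    return round_trip / 2


if __name__ == "__main__":
    # Times in milliseconds: sent at 100, reply received at 160, B held it for 20.
    print(estimate_one_way_latency(100, 160, 20))
```

Because only the difference t2 – t1 on A's clock and the duration d on B's clock are used, the two clocks do not need to be synchronized.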
Repair Request Timer Randomization
Chosen from the uniform distribution on $2^{i}\,[C_1 d_{S,A},\ (C_1 + C_2)\, d_{S,A}]$
- A – node that lost the packet
- S – source
- $C_1$, $C_2$ – constants
- $d_{S,A}$ – latency between source (S) and A
- i – iteration of repair request tries seen
Algorithm
- Detect loss → set timer
- Receive request for same data → cancel timer, set new timer
- Timer expires → send repair request
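A request timer drawn uniformly from 2^i [C1·dS,A, (C1 + C2)·dS,A] can be sketched as follows. The function names and the `rng` parameter are assumptions for illustration; the interval itself follows the definitions on this slide.

```python
import random


def request_timer_interval(c1, c2, d_sa, i):
    """Return (low, high) = 2**i * [c1 * d_sa, (c1 + c2) * d_sa]."""
    scale = 2 ** i
    return scale * c1 * d_sa, scale * (c1 + c2) * d_sa


def draw_request_timer(c1, c2, d_sa, i, rng=random):
    """Draw one timer value uniformly from the request timer interval."""
    low, high = request_timer_interval(c1, c2, d_sa, i)
    return rng.uniform(low, high)


if __name__ == "__main__":
    rng = random.Random(42)
    for attempt in range(3):
        low, high = request_timer_interval(c1=2, c2=2, d_sa=10, i=attempt)
        timer = draw_request_timer(2, 2, 10, attempt, rng)
        print(f"i={attempt}: interval=[{low}, {high}], timer={timer:.1f}")
```

Each additional try i doubles both ends of the interval, so a node that keeps seeing requests for the same data backs off exponentially.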
Timer Randomization
Repair timer is similar
- Every node that receives a repair request sets a repair timer
- Latency estimate is between the node and the node requesting repair
- Timer is chosen from the uniform distribution on $2^{i}\,[D_1 d_{R,A},\ (D_1 + D_2)\, d_{R,A}]$
  • $D_1$, $D_2$ – constants
  • $d_{R,A}$ – latency between node requesting repair (R) and A
Timer properties – minimize probability of duplicate packets
- Reduce likelihood of implosion (duplicates still possible)
- Reduce delay to repair
Chain Topology
C1 = D1 = 1, C2 = D2 = 0
All link distances are 1
[Figure: chain of nodes L2, L1, R1, R2, R3 with the source marked; the timeline shows data/repair request, data out of order, request TO, repair request, repair TO, and repair]
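The deterministic suppression in a chain can be checked with a little arithmetic. The sketch below assumes the packet is lost on the link just before R1, that each node Ri notices the gap when the next packet reaches it, and that with C2 = 0 each request timer equals C1 times the node's distance from the source. The function and parameter names are hypothetical.

```python
def chain_request_senders(num_right, dist_source_to_r1, c1=1.0):
    """Return the nodes that send a repair request in a chain with unit links.

    num_right:         number of nodes R1..Rk downstream of the lossy link
    dist_source_to_r1: distance (in links) from the source to R1
    Time is measured from the moment R1 detects the loss.
    """
    # Ri detects the loss i - 1 time units after R1, then waits c1 * distance.
    expire = {
        i: (i - 1) + c1 * (dist_source_to_r1 + i - 1)
        for i in range(1, num_right + 1)
    }
    senders = []
    for i in sorted(expire, key=lambda k: expire[k]):
        # A request sent by Rj reaches Ri after |i - j| time units.
        heard_earlier = any(expire[j] + abs(i - j) < expire[i] for j in senders)
        if not heard_earlier:
            senders.append(i)
    return [f"R{i}" for i in senders]


if __name__ == "__main__":
    print(chain_request_senders(num_right=3, dist_source_to_r1=2, c1=1.0))
    print(chain_request_senders(num_right=3, dist_source_to_r1=2, c1=0.0))
```

With C1 = 1 the timer of R1 always expires first and its request reaches every node further down the chain before their own timers fire, so only R1 sends a request. With C1 = 0 every node fires as soon as it detects the loss and nothing is suppressed.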
Star Topology
C1 = D1 = 0
Tradeoff between (1) number of requests and (2) time to receive the repair
C2 <= 1
- E(# of requests) = g – 1
C2 > 1
- E(# of requests) = 1 + (g-2)/C2
- E(time until first timer expires) = 2C2/g
$C_2 = \sqrt{g}$
- E(# of requests) $\approx \sqrt{g}$
- E(time until first timer expires) = $2/\sqrt{g}$
[Figure: star topology with nodes N1, N2, N3, N4, ..., Ng and the source marked]
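The star-topology expressions E(# of requests) = 1 + (g-2)/C2 and E(time until first timer expires) = 2C2/g can be compared against a small Monte Carlo run. This is a sketch under stated assumptions: the packet is lost on the source's own link, the other g – 1 members detect the loss at the same instant, every member is two unit links from the source, and C1 = 0. The function name and defaults are hypothetical.

```python
import random


def simulate_star(g, c2, trials=5000, seed=0):
    """Estimate the request count and first-timer expiry in a star with g members.

    Each of the g - 1 members that lost the packet draws a timer uniformly
    from [0, 2 * c2]. Returns (average number of requests sent,
    average time at which the first timer expires).
    """
    if g < 2:
        raise ValueError("need at least a source and one receiver")
    rng = random.Random(seed)
    total_requests = 0
    total_first = 0.0
    for _ in range(trials):
        timers = [rng.uniform(0.0, 2.0 * c2) for _ in range(g - 1)]
        first = min(timers)
        # The first request needs 2 time units to reach the other members;
        # any timer that expires before then is not suppressed.
        total_requests += sum(1 for t in timers if t < first + 2.0)
        total_first += first
    return total_requests / trials, total_first / trials


if __name__ == "__main__":
    g, c2 = 50, 10.0
    requests, first = simulate_star(g, c2)
    print(f"simulated requests: {requests:.2f}, formula: {1 + (g - 2) / c2:.2f}")
    print(f"simulated first expiry: {first:.3f}, formula: {2 * c2 / g:.3f}")
```

The simulated request count tends to sit slightly above 1 + (g-2)/C2, because the members that remain after the first expiry are spread over a somewhat shorter interval than 2·C2.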
Bounded Degree Tree
Use both
- Deterministic suppression (chain topology)
- Probabilistic suppression (star topology)
Large C2/C1 → fewer duplicate requests, but larger repair time
Large C1 → fewer duplicate requests
Small C1 → smaller repair time
Adaptive Timers
C and D parameters depend on topology and congestion → choose adaptively
After sending a request:
- Decrease start of request timer interval
Before each new request timer is set:
- If requests were sent in previous rounds, and any duplicate requests were from further away:
  • Decrease request timer interval
- Else if average duplicate requests high:
  • Increase request timer interval
- Else if average duplicate requests low and average request delay too high:
  • Decrease request timer interval
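The adaptation rules above translate into a short decision function. In this sketch all thresholds and step sizes are hypothetical, since the slide gives no numbers, and "request timer interval" is modeled as the width C2 while "start of the interval" is modeled as C1; both modeling choices are assumptions.

```python
from dataclasses import dataclass

# Hypothetical tuning constants; the slide does not give numeric values.
STEP = 0.1
HIGH_DUPS = 1.0
LOW_DUPS = 0.25
HIGH_DELAY = 1.0
MIN_C2 = 0.5


@dataclass
class RequestTimerParams:
    c1: float  # start of the request timer interval (assumed meaning)
    c2: float  # width of the request timer interval (assumed meaning)


def after_sending_request(p: RequestTimerParams) -> None:
    """Decrease the start of the request timer interval."""
    p.c1 = max(0.0, p.c1 - STEP)


def before_new_request_timer(p, sent_before, dup_from_further_away,
                             avg_dups, avg_delay) -> None:
    """Apply the three-way rule before a new request timer is set."""
    if sent_before and dup_from_further_away:
        p.c2 = max(MIN_C2, p.c2 - STEP)   # decrease interval
    elif avg_dups >= HIGH_DUPS:
        p.c2 = p.c2 + STEP                # increase interval
    elif avg_dups < LOW_DUPS and avg_delay > HIGH_DELAY:
        p.c2 = max(MIN_C2, p.c2 - STEP)   # decrease interval


if __name__ == "__main__":
    params = RequestTimerParams(c1=2.0, c2=2.0)
    after_sending_request(params)
    before_new_request_timer(params, sent_before=False,
                             dup_from_further_away=False,
                             avg_dups=2.0, avg_delay=0.5)
    print(params)
```

The floors on C1 and C2 keep the interval non-negative and non-empty; they are a design choice of this sketch rather than something stated in the lecture.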
Atomic Multicast
All messages are delivered in the same order to “all” processes
Group view: the set of processes known by the sender when it multicasts the message
Virtually synchronous multicast: a message multicast to a group view G is delivered to all nonfaulty processes in G
- If sender fails after sending the message, the message may be delivered to no one
Virtual Synchronous Multicast
Virtual Synchrony Implementation [Birman et al., 1991]
The logical organization of a distributed system to distinguish between message receipt and message delivery
Virtual Synchrony Implementation [Birman et al., 1991]
Only stable messages are delivered
Stable message: a message received by all processes in the message’s group view
Assumptions (can be ensured by using TCP):
- Point-to-point communication is reliable
- Point-to-point communication ensures FIFO-ordering
Virtual Synchrony Implementation: Example
Gi = {P1, P2, P3, P4, P5}
P5 fails
P1 detects that P5 has failed
P1 sends a “view change” message to every process in Gi+1 = {P1, P2, P3, P4}
[Figure: P1 sends a change view message to P2, P3, and P4; P5 has failed]
Virtual Synchrony Implementation: Example
Every process
- Sends each unstable message m from Gi to members in Gi+1
- Marks m as being stable
- Sends a flush message to mark that all unstable messages have been sent
[Figure: processes P1 through P5 exchanging unstable messages and flush messages]
Virtual Synchrony Implementation: Example
Every process
- After receiving a flush message from every other process in Gi+1, installs Gi+1
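The view change from Gi to Gi+1 can be sketched as a small synchronous simulation: each surviving process forwards its unstable messages, marks them stable, sends a flush, and installs the new view once every other survivor has flushed. The `Process` class and `change_view` function are hypothetical, and the sketch ignores ordering constraints and real message passing.

```python
class Process:
    def __init__(self, name, view):
        self.name = name
        self.view = frozenset(view)
        self.unstable = []       # received in the current view, not yet known stable
        self.delivered = []      # handed to the application
        self.flushed_by = set()  # peers whose flush message has arrived

    def receive_forwarded(self, msg):
        # A copy of a message this process already holds is a duplicate: discard it.
        if msg not in self.unstable and msg not in self.delivered:
            self.delivered.append(msg)

    def flush(self, peers):
        for msg in self.unstable:
            for peer in peers:
                peer.receive_forwarded(msg)
            self.delivered.append(msg)  # mark msg stable and deliver it locally
        self.unstable = []
        for peer in peers:
            peer.flushed_by.add(self.name)

    def try_install(self, new_view):
        # Install only after a flush from every other member of the new view.
        if self.flushed_by >= set(new_view) - {self.name}:
            self.view = frozenset(new_view)
            self.flushed_by.clear()
            return True
        return False


def change_view(processes, new_view):
    members = [processes[n] for n in sorted(new_view)]
    for p in members:
        p.flush([q for q in members if q is not p])
    installed = [p.try_install(new_view) for p in members]
    return all(installed)


if __name__ == "__main__":
    names = ["P1", "P2", "P3", "P4", "P5"]
    procs = {n: Process(n, names) for n in names}
    # P5 multicast m and crashed; only P2 received it, so m is unstable at P2.
    procs["P2"].unstable.append("m")
    print(change_view(procs, {"P1", "P2", "P3", "P4"}))
    print({n: procs[n].delivered for n in names})
```

In this run every survivor delivers m before installing the new view, even though the sender crashed after reaching only one process.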
Message Ordering
FIFO-order: messages from the same process are delivered in the same order they were sent
Causal-order: potential causality between different messages is preserved
Total-order: all processes receive messages in the same order
Total ordering does not imply causality or FIFO!
Atomicity is orthogonal to ordering
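FIFO-ordered delivery is commonly implemented with per-sender sequence numbers and a hold-back queue. The sketch below illustrates that idea; the sequence numbers, the `FifoReceiver` class, and its method names are assumptions and do not come from the lecture.

```python
class FifoReceiver:
    """Deliver each sender's messages in the order that sender sent them."""

    def __init__(self):
        self.next_seq = {}   # sender -> next sequence number to deliver
        self.held = {}       # sender -> {seq: payload} waiting for earlier messages
        self.delivered = []  # (sender, payload) in delivery order

    def receive(self, sender, seq, payload):
        expected = self.next_seq.setdefault(sender, 0)
        if seq < expected:
            return  # already delivered: duplicate
        pending = self.held.setdefault(sender, {})
        pending[seq] = payload
        # Deliver as many consecutive messages from this sender as possible.
        while self.next_seq[sender] in pending:
            n = self.next_seq[sender]
            self.delivered.append((sender, pending.pop(n)))
            self.next_seq[sender] = n + 1


if __name__ == "__main__":
    r = FifoReceiver()
    r.receive("A", 1, "a1")  # held back: a0 has not arrived yet
    r.receive("B", 0, "b0")  # delivered at once
    r.receive("A", 0, "a0")  # releases a0, then a1
    print(r.delivered)
```

Messages from different senders are interleaved in arrival order, so two receivers may see different interleavings: FIFO order alone does not give total order.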
Message Ordering and Atomicity
| Multicast | Basic Message Ordering | Total-ordered Delivery? |
| --- | --- | --- |
| Reliable multicast | None | No |
| FIFO multicast | FIFO-ordered delivery | No |
| Causal multicast | Causal-ordered delivery | No |
| Atomic multicast | None | Yes |
| FIFO atomic multicast | FIFO-ordered delivery | Yes |
| Causal atomic multicast | Causal-ordered delivery | Yes |