Practical Byzantine Fault Tolerance and Proactive Recovery
Miguel Castro and Barbara Liskov, ACM TOCS ’02. Presented by: Imranul Hoque.
Problem
• Computer systems provide crucial services
• Computer systems fail
  – Natural disasters
  – Hardware failures
  – Software errors
  – Malicious attacks
• Need highly available service
Replicate to increase availability
Assumptions are a Problem
• Behavior of faulty process
  – Before BFT: fail-stop failure [Paxos, VS Replication]
  – BFT: Byzantine failure
• Synchrony
  – Before BFT: synchronous system [Rampart, SecureRing]
  – BFT: asynchronous system! Safety without synchrony; liveness with eventual time bound
• Number of faults
  – Before BFT: bounded
  – BFT: unbounded if less than one third fail in a time period; proactive and frequent recovery [When will this not work?]
Contributions
• Practical replication algorithm
  – Weak assumption
  – Good performance [Really?]
• Implementation
  – BFT: A generic replication toolkit
  – BFS: A replicated file system
• Performance evaluation
Byzantine Fault Tolerance in Asynchronous Environment
Challenges
[Figure: two clients issue Request A and Request B; the requests are then numbered 1: Request A and 2: Request B]
State Machine Replication
[Figure: every replica holds the same ordered requests from the two clients, 1: Request A followed by 2: Request B]
How to assign sequence numbers to requests?
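A minimal sketch of why the ordering matters in state machine replication: if every replica is deterministic and applies the same requests in the same sequence-number order, all replicas end in the same state. The `KeyValueReplica` class, the key-value request format, and `run_replicas` are hypothetical names chosen for illustration; they are not part of BFT.

```python
from dataclasses import dataclass, field


@dataclass
class KeyValueReplica:
    """A deterministic state machine (hypothetical example service)."""
    state: dict = field(default_factory=dict)
    last_executed: int = 0

    def execute(self, seq: int, request: tuple) -> None:
        # Requests must be applied in sequence-number order with no gaps.
        if seq != self.last_executed + 1:
            raise ValueError(f"expected sequence {self.last_executed + 1}, got {seq}")
        key, value = request
        self.state[key] = value
        self.last_executed = seq


def run_replicas(ordered_requests: list, n_replicas: int = 4) -> list:
    """Apply the same ordered log to every replica and return their states."""
    replicas = [KeyValueReplica() for _ in range(n_replicas)]
    for seq, request in ordered_requests:
        for replica in replicas:
            replica.execute(seq, request)
    return [replica.state for replica in replicas]


# 1: Request A, 2: Request B, both writing the same key.
log = [(1, ("x", "A")), (2, ("x", "B"))]
states = run_replicas(log)
```

Because both requests write the same key, swapping their order on a single replica would leave that replica with a different final value, which is the divergence the sequence numbers prevent.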
Primary Backup Mechanism
[Figure: two clients and the replicas in View 0, with the requests ordered as 1: Request A and 2: Request B]
What if the primary is faulty?
• Agreeing on sequence number
• Agreeing on changing the primary (view change)
Agreement
• Certificate: set of messages from a quorum
• Algorithm steps are justified by certificates
[Figure: Quorum A and Quorum B]
Quorums have at least 2f + 1 replicas
Quorums intersect in at least one correct replica
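The intersection property can be checked by brute force for small f. The sketch below assumes n = 3f + 1 replicas (the group size used elsewhere in these slides) and enumerates every pair of quorums of size 2f + 1; the function names are illustrative only.

```python
from itertools import combinations


def min_quorum_overlap(f: int) -> int:
    """Smallest intersection of any two quorums of size 2f + 1 among n = 3f + 1 replicas."""
    n = 3 * f + 1
    quorum_size = 2 * f + 1
    quorums = [set(q) for q in combinations(range(n), quorum_size)]
    return min(len(a & b) for a in quorums for b in quorums)


def overlap_has_correct_replica(f: int) -> bool:
    # Even if all f faulty replicas sit in the overlap, at least one member remains.
    return min_quorum_overlap(f) - f >= 1
```

For f = 1 there are four replicas and quorums of three, so any two quorums share at least two replicas, of which at most one can be faulty.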
Algorithm Components
• Normal case operation
• View changes
• Garbage collection
• Recovery
All have to be designed to work together
Normal Case Operation
• Three phase algorithm:
  – PRE-PREPARE picks order of requests
  – PREPARE ensures order within views
  – COMMIT ensures order across views
• Replicas remember messages in log
• Messages are authenticated
  – {.}σk denotes a message sent by k
Quadratic message exchange
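A rough count shows where the quadratic cost comes from. This sketch assumes the primary sends one PRE-PREPARE to each backup, each backup sends a PREPARE to every other replica, and every replica sends a COMMIT to every other replica; client requests and replies are left out, and no failures or optimizations are modeled.

```python
def normal_case_messages(n: int) -> dict:
    """Count replica-to-replica messages for one request among n replicas."""
    pre_prepare = n - 1            # primary -> each backup
    prepare = (n - 1) * (n - 1)    # each backup -> every other replica
    commit = n * (n - 1)           # every replica -> every other replica
    return {
        "pre_prepare": pre_prepare,
        "prepare": prepare,
        "commit": commit,
        "total": pre_prepare + prepare + commit,
    }
```

Under these assumptions, `normal_case_messages(4)["total"]` is 24 and `normal_case_messages(7)["total"]` is 84, so the total grows roughly with the square of n.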
Pre-prepare Phase
[Figure: message diagram with Primary: Replica 0, Replica 1, Replica 2, and Replica 3 (Fail). For Request: m, the primary sends {PRE-PREPARE, v, n, m}σ0.]
Prepare Phase
[Figure: message diagram for Request: m with Primary: Replica 0, Replica 1, Replica 2, and Replica 3 (Fail). After the PRE-PREPARE, Replica 1 sends {PREPARE, v, n, D(m), 1}σ1.]
Accepted PRE-PREPARE
Collect PRE-PREPARE + 2f matching PREPARE
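The collection rule for this phase can be sketched as a predicate over a replica's message log. The `Message` tuple and `is_prepared` function are hypothetical helpers; authentication checks are omitted, and PREPAREs are counted by distinct sender (which, as an assumption here, may include the replica's own PREPARE).

```python
from typing import NamedTuple


class Message(NamedTuple):
    kind: str      # "PRE-PREPARE" or "PREPARE"
    view: int
    seq: int
    digest: str    # stands in for D(m)
    sender: int


def is_prepared(log: list, view: int, seq: int, digest: str, f: int) -> bool:
    """True if the log holds a PRE-PREPARE plus 2f matching PREPAREs from distinct senders."""
    def matches(msg: Message, kind: str) -> bool:
        return (msg.kind, msg.view, msg.seq, msg.digest) == (kind, view, seq, digest)

    has_pre_prepare = any(matches(m, "PRE-PREPARE") for m in log)
    prepare_senders = {m.sender for m in log if matches(m, "PREPARE")}
    return has_pre_prepare and len(prepare_senders) >= 2 * f
```

With f = 1 and Replica 3 failed, a log containing the primary's PRE-PREPARE and PREPAREs from Replicas 1 and 2 satisfies the predicate.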
Commit Phase
[Figure: message diagram for Request: m with Primary: Replica 0, Replica 1, Replica 2, and Replica 3 (Fail). After PRE-PREPARE and PREPARE, Replica 2 sends {COMMIT, v, n, D(m)}σ2.]
Collect 2f+1 matching COMMIT: execute and reply
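A small sketch of the counting step: group COMMIT messages by (view, sequence number, digest) and report the groups backed by 2f + 1 distinct senders. The tuple layout and the `committable` name are assumptions for illustration; a real replica would also require the request to be prepared and would verify each message's authenticity.

```python
from collections import defaultdict


def committable(commits: list, f: int) -> set:
    """Return (view, seq, digest) triples with 2f + 1 COMMITs from distinct replicas."""
    senders = defaultdict(set)
    for view, seq, digest, sender in commits:
        senders[(view, seq, digest)].add(sender)
    return {key for key, who in senders.items() if len(who) >= 2 * f + 1}
```

Counting distinct senders means a faulty replica cannot reach the threshold on its own by repeating its COMMIT.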
View Change
• Provide liveness when primary fails
  – Timeouts trigger view changes
  – Select new primary (= view number mod 3f+1)
• Brief protocol
  – Replicas send VIEW-CHANGE message along with the requests they prepared so far
  – New primary collects 2f+1 VIEW-CHANGE messages
  – Constructs information about committed requests in previous views
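The two numeric rules on this slide are easy to sketch. The code assumes replicas are numbered 0 to 3f and that VIEW-CHANGE messages are tracked as a set of sender ids; both function names are hypothetical.

```python
def primary_of(view: int, f: int) -> int:
    """Primary for a view: view number mod (3f + 1)."""
    return view % (3 * f + 1)


def new_primary_ready(view_change_senders: set, f: int) -> bool:
    """True once VIEW-CHANGE messages from 2f + 1 distinct replicas have arrived."""
    return len(view_change_senders) >= 2 * f + 1
```

With f = 1, views 0, 1, 2, 3 are led by Replicas 0, 1, 2, 3, and view 4 wraps around to Replica 0 again.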
View Change Safety
• Goal: No two different committed requests with the same sequence number across views
[Figure: the Quorum for the Committed Certificate (m, v, n) overlaps the View Change Quorum; at least one correct replica in the overlap has the Prepared Certificate (m, v, n)]
Recovery
• Corrective measure for faulty replicas
  – Proactive and frequent recovery
  – All replicas can fail if at most f fail in a window
• System administrator performs recovery, or
• Automatic recovery from network attacks
  – Secure co-processor
  – Read-only memory
  – Watchdog timer
Clients will not get reply if more than f replicas are recovering
Sketch of Recovery Protocol
• Save state
• Reboot with correct code and restore state
  – Replica has correct code without losing state
• Change keys for incoming messages
  – Prevent attacker from impersonating others
• Send recovery request r
  – Others change incoming keys when r executes
• Check state and fetch out-of-date or corrupt items
  – Replica has correct up-to-date state
Performance
• Andrew benchmark
  – Andrew100 and Andrew500
• 4 machines: 600 MHz, Pentium III
• 3 Systems
  – BFS: based on BFT
  – NO-REP: BFS without replication
  – NFS: NFS-V2 implementation in Linux
No experiment with faulty replicas
Scalability issue: only 4 & 7 replicas
Benchmark Results (w/o PR)
Without view change and faulty replica!
Benchmark Results (with PR)
Recovery Period
Recovery is staggered!
Related Works
Fault Tolerance
• Fail Stop Fault Tolerance
  – Paxos, 1989 (TR)
  – VS Replication, PODC 1988
• Byzantine Fault Tolerance
  – Byzantine Agreement: Rampart (TPDS 1995), SecureRing (HICSS 1998), BFT (TOCS ’02), BASE (TOCS ’03)
  – Byzantine Quorums: Malkhi-Reiter (JDC 1998), Phalanx (SRDS 1998), Fleet (ToKDI ’00), Q/U (SOSP ’05)
  – Hybrid Quorum: HQ Replication (OSDI ’06)
Today’s Talk
Replication in Wide Area Network
Prof. Keith Marzullo
Distinguished Lectureship Series
Today @ 4PM – 1404 SC
Questions?
Backup Slides
• Optimization
• Proof of 3f+1 optimality
• Garbage Collection
• View Change with MAC
• Detailed Recovery Protocol
• State Transfer
Optimization
• Using MAC instead of Digital Signatures
• Replying with digest
• Tentative execution of Read-Write requests
• Tentative execution of Read-only requests
• Piggyback COMMIT with next PRE-PREPARE
• Request batching
Proof of 3f+1 Optimality
• f faulty replicas
• Should reply after n-f replicas respond to ensure liveness
• Two n-f quorums intersect in at least n-2f replicas
• Worst case scenario: out of (n-2f), f are faulty
• Two quorums should have at least 1 non-faulty replica in common
  – Therefore, n-3f >= 1, i.e. n >= 3f+1
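The counting argument can be written out as a short derivation. This is a sketch of the same steps, where Q_1 and Q_2 are names introduced here for two arbitrary quorums of size n - f:

```latex
\begin{align*}
|Q_1 \cap Q_2| &\ge |Q_1| + |Q_2| - n = 2(n - f) - n = n - 2f \\
\text{correct replicas in } Q_1 \cap Q_2 &\ge (n - 2f) - f = n - 3f \\
n - 3f \ge 1 &\iff n \ge 3f + 1
\end{align*}
```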
Garbage Collection
• Replicas should remove from their logs information about executed operations.
  – Even if i has executed a request, it may not be safe to remove that request's information because of a subsequent view change.
  – Use periodic checkpointing (after K requests).
Garbage Collection
• When replica i produces a checkpoint it multicasts {CHECKPOINT, n, d, i}αi.
  – n is the highest sequence number of the requests i has executed.
  – d is a digest of the current state.
• Each replica collects such messages until it has a quorum certificate with 2f + 1 authenticated CHECKPOINT messages with the same n and d from distinct i.
  – Called the stable certificate: all other replicas will be able to obtain a weak certificate proving that its stable checkpoint is correct if they need to fetch it.
Garbage Collection
• Once a replica has a stable certificate for a checkpoint with sequence number n, then that checkpoint is stable.
  – The replica can discard all entries in the log with sequence numbers less than n and all checkpointed states earlier than n.
• The checkpoint can be used to define high and low water marks for sequence numbers:
  – L = n
  – H = n + LOG for a log size LOG.
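The checkpoint and water-mark rules can be combined into one small sketch. `ReplicaLog` is a hypothetical class: it counts CHECKPOINT messages by distinct sender, declares a checkpoint stable at 2f + 1 matching messages, truncates the log, and moves the water marks. The exact window test (L < seq <= H) is an assumption, and message authentication is not modeled.

```python
from collections import defaultdict


class ReplicaLog:
    """Hypothetical per-replica log with checkpoint-driven garbage collection."""

    def __init__(self, f: int, log_size: int):
        self.f = f
        self.log_size = log_size           # LOG on the slide
        self.low = 0                       # L: sequence number of the stable checkpoint
        self.entries = {}                  # sequence number -> logged request info
        self.votes = defaultdict(set)      # (n, d) -> replicas that sent CHECKPOINT

    @property
    def high(self) -> int:
        return self.low + self.log_size    # H = L + LOG

    def in_window(self, seq: int) -> bool:
        # Assumption: sequence numbers are accepted when L < seq <= H.
        return self.low < seq <= self.high

    def on_checkpoint(self, n: int, d: str, sender: int) -> bool:
        """Record a CHECKPOINT message; return True if it made checkpoint n stable."""
        if n <= self.low:
            return False
        self.votes[(n, d)].add(sender)
        if len(self.votes[(n, d)]) < 2 * self.f + 1:
            return False
        self.low = n
        # Following the slide, drop entries with sequence numbers less than n.
        self.entries = {s: r for s, r in self.entries.items() if s >= n}
        self.votes = defaultdict(set, {k: v for k, v in self.votes.items() if k[0] > n})
        return True
```

Tying the water marks to the stable checkpoint bounds the log: a replica never holds entries more than LOG sequence numbers past the last state that 2f + 1 replicas agreed on.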
View Change with MAC
• Replica has logged per message m:
  – m = {REQUEST, o, t, c}αc
  – {PRE-PREPARE, v, n, D(m)}αp
    • m is pre-prepared at this replica
  – {PREPARE, v, n, D(m), i}αi from 2f replicas
    • m is prepared at this replica
  – {COMMIT, v, n, i}αi from 2f+1 replicas
    • m is committed at this replica
View Change with MAC
• At a high level:
  – The primary of view v + 1 reads information about stable and prepared certificates from a quorum
  – It then computes the stable checkpoint and fills in the command sequence
  – It sends this information to the backups.
  – The backups verify this information.
View Change with MAC
• Each replica i maintains two sets P and Q:
  – P contains tuples {n, d, v} such that i has collected a prepared certificate for a request with digest d for view v, sequence n, and there is no such request with sequence n for a later view.
  – Q contains tuples {n, d, v} such that i has pre-prepared a request with digest d for view v, sequence n, and there is no such request with sequence n for a later view.
View Change with MAC
• When i suspects the primary for view v has failed:
  – Enter view v + 1
  – Multicast {VIEW-CHANGE, v+1, h, C, P, Q, i}αi
  – h is the sequence number of the stable checkpoint of i.
  – C is a set of checkpoints at i with their sequence number.
View Change with MAC
• Replicas collect VIEW-CHANGE messages and acknowledge them to primary p of view v + 1.
  – A replica accepts a VIEW-CHANGE message only if the views in its P and Q are earlier than the new view number.
  – {VIEW-CHANGE-ACK, v+1, i, j, d}μi
  – i is sender of ack.
  – j is source of the VIEW-CHANGE message.
  – d is the digest of the VIEW-CHANGE message.
View Change with MAC
• The primary p:
  – Stores VIEW-CHANGE from i in S[i] if it gets 2f − 1 corresponding VIEW-CHANGE-ACKs.
  – Together these are called a view-change certificate.
  – Chooses from S a checkpoint and sets of requests.
  – Tries to do so whenever S is updated.
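The bookkeeping for S can be sketched as follows. `ViewChangeCollector` is a hypothetical helper for the primary of view v + 1: it holds each VIEW-CHANGE message until 2f − 1 acknowledgements with a matching digest arrive from distinct replicas. It assumes a replica never acknowledges its own message and skips MAC verification and the later selection of checkpoints and requests.

```python
from collections import defaultdict


class ViewChangeCollector:
    """Bookkeeping at the primary of view v + 1 (hypothetical helper)."""

    def __init__(self, f: int):
        self.f = f
        self.received = {}              # j -> (digest, VIEW-CHANGE message)
        self.acks = defaultdict(set)    # (j, digest) -> senders of matching acks
        self.S = {}                     # j -> accepted VIEW-CHANGE message

    def on_view_change(self, j: int, digest: str, message: object) -> None:
        # Keep the first VIEW-CHANGE seen from replica j.
        self.received.setdefault(j, (digest, message))
        self._try_accept(j)

    def on_ack(self, i: int, j: int, digest: str) -> None:
        if i != j:                      # assumption: no self-acknowledgement
            self.acks[(j, digest)].add(i)
        self._try_accept(j)

    def _try_accept(self, j: int) -> None:
        if j in self.S or j not in self.received:
            return
        digest, message = self.received[j]
        # 2f - 1 acks from distinct replicas complete the view-change certificate.
        if len(self.acks[(j, digest)]) >= 2 * self.f - 1:
            self.S[j] = message
```

Acknowledgements may arrive before the VIEW-CHANGE message itself, so the sketch re-checks the threshold on both kinds of event.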
View Change with MAC
• Select checkpoint for starting state for processing requests in the new view.
  – Picks checkpoint with the highest number h from the set of checkpoints that are known to be correct (at least f + 1 of them) and that have numbers higher than the low water mark of at least f + 1 non-faulty replicas (so ordering information is available).
View Change with MAC
• Select a request to pre-prepare in the new view for each n between h and h + LOG.
  – If m committed in an earlier view, then choose m.
  – If there is a quorum of replicas that did not prepare any request for n, then p pre-proposes a null command.
• Multicast decision to backups.
  – Each can verify by running the same decision procedure.
Recovery Protocol
• Save state
• Reboot with correct code and restore state
  – Replica has correct code without losing state
• Change keys for incoming messages
  – Prevent attacker from impersonating others
• Estimate high-water mark in a correct log (H)
  – Bound sequence number of bad information by H
Recovery Protocol
• Send recovery request r
  – Others change incoming keys when r executes
  – Bound sequence number of forged messages by Hr (high water mark in log when r executes)
• Participate in algorithm
  – May be needed to complete recovery if not faulty
• Check state and fetch out-of-date or corrupt items
• End: checkpoint max(H, Hr) is stable
  – Replica has correct up-to-date state
State Transfer