Dept. of Computer Science & Engineering, The Chinese University of Hong Kong
Paxos: The Part-Time Parliament
CHEN Xinyu
2011-04-04
Outline
Background
The single-decree protocol
Fault-tolerant distributed system
Conclusion
The Parliament
The primary task was to determine the law, a numbered sequence of passed decrees
A decree was passed if and only if a majority of legislators voted for the decree
Constraints
The acoustics of the Chamber were poor
  Legislators communicated only by messenger
Part-time: no one in Paxos was willing to devote his life to the Parliament
  Legislators continually wandered in and out of the parliamentary Chamber
No secretary
  Each legislator maintained a ledger in which he recorded the numbered sequence of decrees that were passed
Messengers
  Messages may be delayed, lost, or duplicated
Preconditions
Mutual trust
  Legislators were willing to pass any decree that was proposed
  Messengers did not garble messages
When legislators and messengers remained in the Chamber
  Legislators reacted promptly to any messages
  Messengers delivered messages in a timely fashion
Resources for each legislator
  A sturdy ledger, to record the decrees and to write notes reminding himself of the current progress
  Enough funds to hire as many messengers as he needed
  Timers
The Single-Decree Protocol
A decree was chosen through a series of numbered ballots
In each ballot, a legislator had the choice only of voting for the decree or not voting
Each ballot was associated with a set of legislators called a quorum
A ballot succeeded if and only if every legislator in the quorum voted for the decree
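The ballot rule can be captured in a small Python record. This is a minimal illustration only: the class name `Ballot`, its field names, the legislator letters, and the decree name are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ballot:
    number: int        # unique ballot number
    decree: str        # the single decree this ballot is about
    quorum: frozenset  # legislators whose votes are required
    voters: frozenset  # quorum members who voted for the decree

    def succeeded(self) -> bool:
        # The only choices are voting for the decree or not voting, so the
        # ballot succeeds iff every quorum member is among the voters.
        return self.quorum <= self.voters


# Hypothetical legislators A, B, C and a made-up decree.
b = Ballot(2, "olive-tax", frozenset("ABC"), frozenset("AB"))
print(b.succeeded())  # False: C did not vote
```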
Requirements
Consistency: no two ledgers could contain contradictory information
Progress: if a majority of the legislators were in the Chamber, and no one entered or left the Chamber for a sufficiently long period of time, then any decree proposed by a legislator in the Chamber would be passed, and every decree that had been passed would appear in the ledger of every legislator in the Chamber
Achieving Consistency
Each ballot has a unique ballot number
The quorums of any two ballots have at least one legislator in common
For every ballot B, if any legislator in B’s quorum voted in an earlier ballot, then the decree of B equals the decree of the latest of those earlier ballots
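As a sketch, the three conditions can be checked mechanically over a finite list of ballots. The `Ballot` record, the `satisfies_conditions` function, and the three-legislator history are hypothetical, introduced only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Ballot:
    number: int
    decree: str
    quorum: frozenset
    voters: frozenset


def satisfies_conditions(ballots: List[Ballot]) -> bool:
    # 1. Each ballot has a unique ballot number.
    numbers = [b.number for b in ballots]
    if len(set(numbers)) != len(numbers):
        return False
    # 2. The quorums of any two ballots share at least one legislator.
    for i, a in enumerate(ballots):
        for b in ballots[i + 1:]:
            if not (a.quorum & b.quorum):
                return False
    # 3. If any member of B's quorum voted in an earlier ballot, B's decree
    #    must equal the decree of the latest such earlier ballot.
    for b in ballots:
        earlier = [e for e in ballots
                   if e.number < b.number and (e.voters & b.quorum)]
        if earlier:
            latest = max(earlier, key=lambda e: e.number)
            if b.decree != latest.decree:
                return False
    return True


history = [
    Ballot(1, "x", frozenset("AB"), frozenset("A")),
    Ballot(2, "y", frozenset("BC"), frozenset("BC")),  # no quorum member voted before
    Ballot(3, "y", frozenset("AC"), frozenset()),      # latest earlier vote by A or C: ballot 2
]
print(satisfies_conditions(history))  # True
```

Changing the decree of ballot 3 to "x" makes the check fail, because the latest earlier vote by a member of its quorum (C, in ballot 2) was for "y".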
A Sequence of Ballots
[Figure: a sequence of ballots, tabulated by Ballot #, Decree, and Quorum and voters]
If a ballot B is successful, then any later ballot is for the same decree as B
This follows, by induction on the later ballots, from the quorum-intersection condition together with the rule that for every ballot B, if any legislator in B's quorum voted in an earlier ballot, then the decree of B equals the decree of the latest of those earlier ballots
Three Roles
Proposer: a legislator who initiates a ballot
  How does he choose a ballot's number, decree, and quorum?
  Notes:
    pnumber: the largest ballot number he has proposed
    pdecree: the decree proposed for ballot pnumber
Acceptor: a legislator in the quorum
  How does he decide whether or not to vote?
  Notes:
    number: the largest ballot number he has received
    vnumber: the largest ballot number in which he has cast a vote
    vdecree: the decree he voted to accept in ballot vnumber
Learner: a legislator in the Parliament, or a citizen
Proposer
Ballot number
  Assign each of the N legislators a unique id l between 0 and N-1
  Choose the smallest ballot number s larger than any he has seen such that s mod N = l
Quorum
  A simple majority
  A weighted majority: any set of legislators whose total weight was more than half the total weight of all legislators
  …
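Both rules are small enough to write down directly. In this sketch the function names are hypothetical, `max_seen` is the largest ballot number the proposer has seen (with -1 meaning none), and a simple majority is the special case in which every weight is 1.

```python
from typing import Dict, Iterable


def next_ballot_number(l: int, n: int, max_seen: int) -> int:
    """Smallest s > max_seen with s mod n == l (l is the proposer's id)."""
    s = max_seen + 1
    return s + (l - s) % n  # Python's % is non-negative for n > 0


def is_weighted_majority(quorum: Iterable[str], weights: Dict[str, int]) -> bool:
    """True if the quorum holds more than half of the total weight."""
    total = sum(weights.values())
    return 2 * sum(weights[x] for x in quorum) > total


print(next_ballot_number(0, 5, 4), next_ballot_number(4, 5, 5))  # 5 9
```

With N = 5, legislator 0 that has seen ballot 4 next uses ballot 5, and legislator 4 that has seen ballot 5 next uses ballot 9, matching the ballot numbers 0, 4, 5 and 9 in the examples. Any two weighted majorities share a legislator, since two disjoint sets cannot both hold more than half of the total weight.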
Phase 1: Prepare
Phase 1a (proposer → acceptor)
  pnumber = …
  pdecree = …
  msg: prepare(pnumber)
Phase 1b (acceptor → proposer)
  if (pnumber > number)
    number = pnumber
    msg: promise(number, vnumber, vdecree)
    (He thereby promised that he would not cast a vote for a decree with ballot number less than number)
  else if (pnumber < number) && (different proposers)
    msg: reject(number)
  else if (pnumber == number)
    ignore
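A minimal Python sketch of the acceptor side of Phase 1b. It assumes, as one reading of "different proposers", that the proposer of a ballot is identified by its ballot number mod N, following the numbering rule for proposers. The `Acceptor` class and `on_prepare` method are hypothetical names, and -1 stands for "no ballot yet" (shown as "-" in the examples).

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Acceptor:
    n: int                         # total number of legislators N
    number: int = -1               # largest ballot number received (-1: none)
    vnumber: int = -1              # largest ballot number voted in (-1 shows as "-")
    vdecree: Optional[str] = None  # decree voted for in ballot vnumber

    def on_prepare(self, pnumber: int) -> Optional[Tuple]:
        """Phase 1b: answer prepare(pnumber); None means send nothing."""
        if pnumber > self.number:
            # Promise never to vote in a ballot numbered below pnumber.
            self.number = pnumber
            return ("promise", self.number, self.vnumber, self.vdecree)
        if pnumber < self.number and pnumber % self.n != self.number % self.n:
            # Assumption: the proposer of ballot b is legislator b mod N.
            return ("reject", self.number)
        # pnumber == number (a duplicated prepare) or a stale one from the same proposer.
        return None


a = Acceptor(n=5)
print(a.on_prepare(0))  # ('promise', 0, -1, None)
print(a.on_prepare(4))  # ('promise', 4, -1, None)
print(a.on_prepare(0))  # ('reject', 4)
```

The final call mirrors the reject(4) in the livelock example: a prepare for ballot 0 arrives after the acceptor has already promised ballot 4.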
Phase 2: Propose
Phase 2a (proposer → acceptor)
  on receiving promise(number, vnumber, vdecree):
  if (pnumber == number) && majority(number)
    if (any vdecree != null)
      pdecree = the vdecree with the largest vnumber (there is only one such value)
    msg: propose(pnumber, pdecree)
Phase 2b (acceptor → learner or proposer)
  if (pnumber >= number) && (vnumber < pnumber)
    number = vnumber = pnumber
    vdecree = pdecree
    msg: vote(vnumber, vdecree) to the learners
  else if (pnumber < number)
    msg: reject(number) to the proposer
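The proposer's choice of decree in Phase 2a and the acceptor's vote in Phase 2b can be sketched as follows. The comparison operators in the Phase 2b guard (`>=` and `<`) are this sketch's reading of that condition; `choose_decree`, `own_decree`, and the decree strings are hypothetical. Promises are keyed by acceptor id so that a duplicated promise message is counted only once.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

Promise = Tuple[int, int, Optional[str]]  # (number, vnumber, vdecree)


def choose_decree(promises: Dict[int, Promise], pnumber: int,
                  own_decree: str, n: int) -> Optional[str]:
    """Phase 2a: return the decree to propose, or None if no majority yet."""
    replies = [p for p in promises.values() if p[0] == pnumber]
    if 2 * len(replies) <= n:
        return None
    voted = [(vnumber, vdecree) for _, vnumber, vdecree in replies
             if vdecree is not None]
    if voted:
        # Re-propose the decree of the latest ballot any promiser voted in.
        return max(voted, key=lambda v: v[0])[1]
    return own_decree  # nobody has voted yet: free to propose his own


@dataclass
class Acceptor:
    number: int = -1
    vnumber: int = -1
    vdecree: Optional[str] = None

    def on_propose(self, pnumber: int, pdecree: str) -> Optional[Tuple]:
        """Phase 2b: vote, reject, or (for a duplicate propose) do nothing."""
        if pnumber >= self.number and self.vnumber < pnumber:
            self.number = self.vnumber = pnumber
            self.vdecree = pdecree
            return ("vote", self.vnumber, self.vdecree)  # to the learners
        if pnumber < self.number:
            return ("reject", self.number)               # to the proposer
        return None


print(choose_decree({1: (9, 0, "d0"), 2: (9, -1, None), 3: (9, -1, None)}, 9, "mine", 5))  # d0
```

As in Example 2, where promise(9, 0, ) reports an earlier vote in ballot 0, the proposer of ballot 9 must carry that earlier decree forward instead of its own.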
Phase 3: Learn
Phase 3 (learner)
  if majority(vnumber)   (votes for ballot vnumber have arrived from a majority)
    Legislator: updates his ledger with vdecree
    Citizen: is informed of vdecree
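A learner only needs to count votes per ballot. In this minimal sketch (class and method names are hypothetical), votes are keyed by acceptor id, so a duplicated vote message does not count twice. Every vote in one ballot carries that ballot's single decree, so a majority for one vnumber fixes the decree.

```python
from typing import Dict, Optional


class Learner:
    def __init__(self, n: int) -> None:
        self.n = n                                  # total number of legislators
        self.votes: Dict[int, Dict[int, str]] = {}  # vnumber -> {acceptor id: vdecree}
        self.ledger: Optional[str] = None           # the passed decree, once learned

    def on_vote(self, acceptor_id: int, vnumber: int, vdecree: str) -> Optional[str]:
        # A duplicated vote overwrites the same dict entry, so it counts once.
        self.votes.setdefault(vnumber, {})[acceptor_id] = vdecree
        if self.ledger is None and 2 * len(self.votes[vnumber]) > self.n:
            self.ledger = vdecree  # a majority voted in the same ballot
        return self.ledger


learner = Learner(3)
learner.on_vote(1, 0, "d")
learner.on_vote(1, 0, "d")          # duplicate message from acceptor 1
print(learner.on_vote(2, 0, "d"))   # d
```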
Example 1
[Figure: message sequence among legislators 1, 2 and 3 for ballot 0: prepare(0), promise(0, -, null), propose(0, ), vote(0, ). Messages: prepare(pnumber), promise(number, vnumber, vdecree), propose(pnumber, pdecree), vote(vnumber, vdecree), reject(number)]
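Putting the three phases together, the ballot in Example 1 can be replayed in-process. This sketch assumes reliable, prompt messengers (every message is a direct function call) and collapses rejected or ignored messages into `None`; `run_ballot` and the decree strings are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Acceptor:
    number: int = -1
    vnumber: int = -1
    vdecree: Optional[str] = None

    def prepare(self, pnumber: int):
        if pnumber > self.number:
            self.number = pnumber
            return (self.number, self.vnumber, self.vdecree)  # promise
        return None                                            # reject or ignore

    def propose(self, pnumber: int, pdecree: str):
        if pnumber >= self.number and self.vnumber < pnumber:
            self.number = self.vnumber = pnumber
            self.vdecree = pdecree
            return (self.vnumber, self.vdecree)                # vote
        return None


def run_ballot(acceptors: List[Acceptor], pnumber: int, decree: str) -> Optional[str]:
    """One ballot with reliable messengers; returns the passed decree or None."""
    n = len(acceptors)
    # Phase 1: prepare(pnumber) -> promise(number, vnumber, vdecree)
    promises = [p for p in (a.prepare(pnumber) for a in acceptors) if p is not None]
    if 2 * len(promises) <= n:
        return None
    voted = [(vn, vd) for _, vn, vd in promises if vd is not None]
    if voted:
        decree = max(voted, key=lambda v: v[0])[1]
    # Phase 2: propose(pnumber, pdecree) -> vote(vnumber, vdecree)
    votes = [v for v in (a.propose(pnumber, decree) for a in acceptors) if v is not None]
    # Phase 3: learners accept the decree once a majority has voted for it
    return decree if 2 * len(votes) > n else None


legislators = [Acceptor() for _ in range(3)]
print(run_ballot(legislators, 0, "decree-A"))  # decree-A
print(run_ballot(legislators, 3, "decree-B"))  # decree-A
```

The second call shows consistency at work: ballot 3 learns of the votes in ballot 0 through the promises and re-proposes "decree-A".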
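Example 2
[Figure: message sequence among legislators 1 to 5 and citizens. Ballot 0: prepare(0), promise(0, -, null), propose(0, ), vote(0, ). Ballot 4: prepare(4), promise(4, -, null). Ballot 9: prepare(9), promise(9, 0, ), promise(9, -, null), propose(9, ), vote(9, ); votes are also delivered to citizens]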
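Livelock
[Figure: message sequence among legislators 1 to 5 in which two proposers keep preempting each other. Ballot 0: prepare(0), promise(0, -, null), propose(0, ), vote(0, ). Ballot 4: prepare(4), promise(4, -, null), propose(4, ), vote(4, ). Ballot 5: prepare(5), promise(5, -, null), promise(5, 0, ). Ballot 9: prepare(9). Rejections: reject(4), reject(5)]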
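The livelock can be reproduced with a deliberately unlucky schedule in which each proposer's prepare reaches every acceptor just before the other proposer's propose. This is a simplified sketch under that assumed schedule (in the diagram some votes are still cast); proposers 0 and 4 pick ballot numbers with the s mod N = l rule, and all function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Acceptor:
    number: int = -1
    vnumber: int = -1
    vdecree: Optional[str] = None


def prepare_all(acceptors: List[Acceptor], pnumber: int) -> None:
    for a in acceptors:
        if pnumber > a.number:
            a.number = pnumber  # promise: no votes in ballots below pnumber


def propose_all(acceptors: List[Acceptor], pnumber: int, decree: str) -> bool:
    votes = 0
    for a in acceptors:
        if pnumber >= a.number and a.vnumber < pnumber:
            a.number = a.vnumber = pnumber
            a.vdecree = decree
            votes += 1
    return 2 * votes > len(acceptors)  # True iff the decree passed


def next_ballot(l: int, n: int, max_seen: int) -> int:
    s = max_seen + 1
    return s + (l - s) % n


def duel(rounds: int, n: int = 5) -> Tuple[List[int], int]:
    """Proposers 0 and 4 alternate; returns (ballots tried, ballots passed)."""
    acceptors = [Acceptor() for _ in range(n)]
    b0 = next_ballot(0, n, -1)
    prepare_all(acceptors, b0)
    tried, passed = [b0], 0
    for _ in range(rounds):
        b4 = next_ballot(4, n, b0)   # proposer 4 preempts ballot b0
        prepare_all(acceptors, b4)
        tried.append(b4)
        passed += propose_all(acceptors, b0, "decree-0")  # rejected: b0 < b4
        b0 = next_ballot(0, n, b4)   # after reject(b4), proposer 0 goes higher
        prepare_all(acceptors, b0)
        tried.append(b0)
        passed += propose_all(acceptors, b4, "decree-4")  # rejected: b4 < b0
    return tried, passed


print(duel(2))  # ([0, 4, 5, 9, 10], 0)
```

No ballot ever gathers a majority of votes, yet no ledger becomes inconsistent; only progress is lost.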
President Selection
The progress condition would be met if only a single proposer (the president), who did not leave the Chamber, were initiating ballots
Having multiple presidents could only impede progress; it could not cause inconsistency
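The slides do not say how the president is chosen. One simple rule, assumed here purely for illustration, is to treat as president the legislator with the largest id among those heard from within a timeout. The `select_president` function and its parameters are hypothetical.

```python
from typing import Dict, Optional


def select_president(last_heard: Dict[int, float], now: float,
                     timeout: float) -> Optional[int]:
    """Largest legislator id heard from within `timeout` seconds, or None."""
    alive = [lid for lid, t in last_heard.items() if now - t <= timeout]
    return max(alive) if alive else None


# Legislator 4 has been silent for 8 s, so legislator 3 is chosen.
print(select_president({0: 9.0, 3: 9.5, 4: 2.0}, now=10.0, timeout=3.0))  # 3
```

Legislators whose views of who is present differ may briefly pick different presidents. By the argument above, that can only delay progress, not break consistency.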
Fault-Tolerant Distributed System
A single server has lower availability; hence multiple server replicas
Legislators: multiple unreliable server replicas
  Proposer: acts on behalf of a client
  Acceptor: a working server replica
  Learner: all server replicas
Messenger: an unreliable communication path with non-Byzantine faults (messages lost, out of order, duplicated)
Decree: a user command submitted to the server replicas
Law (a numbered sequence of passed decrees): the server replica state, which needs to be consistent among replicas
Ledger: stable storage
  Save messages before they are sent out
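A common way to turn the single-decree protocol into a replicated server is to run one instance per position (slot) of the numbered sequence and have each replica apply chosen commands strictly in slot order. The sketch below assumes commands are hypothetical (key, value) writes and that `learn` is called once Phase 3 has fixed a slot's decree; stable storage is not modeled, and the `Replica` class is a hypothetical name.

```python
from typing import Dict, Tuple

Command = Tuple[str, str]  # hypothetical (key, value) write


class Replica:
    def __init__(self) -> None:
        self.chosen: Dict[int, Command] = {}  # slot -> command passed by Paxos
        self.next_slot = 0                    # first slot not yet applied
        self.state: Dict[str, str] = {}       # the replicated key-value state

    def learn(self, slot: int, command: Command) -> None:
        self.chosen[slot] = command
        # Apply commands in slot order, stopping at the first undecided slot.
        while self.next_slot in self.chosen:
            key, value = self.chosen[self.next_slot]
            self.state[key] = value
            self.next_slot += 1


r1, r2 = Replica(), Replica()
r1.learn(0, ("x", "1"))
r1.learn(1, ("x", "2"))
r2.learn(1, ("x", "2"))  # slot 0 still undecided: nothing applied yet
r2.learn(0, ("x", "1"))
print(r1.state == r2.state == {"x": "2"})  # True
```

Replicas that learn the same slots in different orders still end in the same state, because a command is applied only after every earlier slot is known.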
Conclusion
Paxos: a consensus protocol proposed by Leslie Lamport in 1989
Key ideas: quorums (majorities); Phase 1 (prepare) proposes no decree
Used in: Google Chubby lock service, Hadoop ZooKeeper (Zab), Scalien Keyspace (key-value NoSQL store), Oracle Berkeley DB replication, …