distributed commit dr. yingwu zhu. failures in a distributed system consistency requires agreement...

Distributed Commit

Dr. Yingwu Zhu

Failures in a distributed system

• Consistency requires agreement among multiple servers– Is transaction X committed?– Have all servers applied update X to a replica?

• Achieving agreement w/ failures is hard– Impossible to distinguish host vs. network failures

• This class: – all-or-nothingall-or-nothing atomicity in distributed systems

Distributed Commit Problem

• Some applications perform operations on multiple databases– Transfer funds between two bank accounts– Debiting one account and crediting another

• We would like a guarantee that either all the databases get updated, or none does

• Distributed Commit Problem (all-or-noneall-or-none semantics):– Operation is committed when all participants can perform

it– Once a commit decision is reached, this requirement holds

even if some participants fail and later recover

Transaction

Transaction behave as one operation• Atomicity: all-or-none, if transaction failed

then no changes apply to the database• Consistency: there is no violation of the

database integrity constraints• Isolation: partial results are hidden (due to

incomplete transactions)• Durability: the effects of transactions that

were committed are permanent

Example

Bank A Bank B

Transfer $1000From A:$3000To B:$2000

• Clients want all-or-nothing transactions– Transfer either happens or not at all

client

Strawman solution

Bank A Bank B

Transfer $1000From A:$3000To B:$2000

client

Transactioncoordinator

Strawman solution

• What can go wrong?– A does not have enough money– B’s account no longer exists– B has crashed– Coordinator crashes

client transactioncoordinator

bank A bank B

doneA=A-1000

B=B+1000

One-Phase Commit

• A coordinator tells all other processes (participants) whether or not to perform the operation in question

• Problem: – If one participant fails to perform the operation,

no way to tell the coordinator– Violate the all-or-noneall-or-none rule!

Two-Phase Commit (2PC) Overview

• Assumes a coordinator that initiates the commit/abort

• Each participant votes if it is ready to commit– Until the commit actually occurs, the update is

considered temporary (placed in temp area)temporary (placed in temp area)– The participant is permitted to discard a pending

update • Until all participants vote “ok”, a participant can abort

• Coordinator decides outcome and informs all participants

2PC: More Details• Operates in rounds• Coordinator assigns unique identifiers for each

protocol run. How? It’s time to use logical clocks: run identifier can be process ID and the value of logical clock

• Messages carry the identifier of protocol run they are part of

• Since lots of messages must be stored, a garbage collection must be performed, the challenge is to determine when it is safe to remove the information

Participant States• Initial state: pi is not aware

that protocol started, ends when pi received the ready_to_commit and it is ready to send its Ok

• Prepared to commit: pi sent its Ok, saves in temp area and waits for the final decision (Commit or Abort) from coordinator

• Commit or abort state: pi knows the final decision, it must execute it

2PC State Transition

(a) The finite state machine for the coordinator in 2PC.(b) The finite state machine for a participant.Timeout mechanism is used here for coordinator and participants. Coordinator

blocked in “WAIT”, participant blocked in “INIT”

• Ideal world: coordinator and participants never fail.

• How 2PC works?

2PC: Back to Reality

• Each participant can fail at any time

• Coordinator can fail at any time

• Question: how to make 2PC work?

Problem Solving Step-by-Step

• Step1: assume the coordinator never fails but the participant could fail at any time

• Step2: assume the coordinator could fail at any time

Step1: what can go wrong on participants!

• Initial state (INIT): if pi crashes before it received vote-request. It does not send it’s vote back, the coordinator will abort the protocol (not enough votes are received). – implemented by timeouts.

• Prepared to commit(READY): if pi crashes before it learns the outcome, resources remained blocked. It is critical that a crashed participant learns the outcome of pending operations when it comes back: How?

• COMMIT or ABORT state: pi crashes before executing, it must complete the commit or abort repeatedly in spite of being interrupted by failures: How?

Group Discussions

• How to modify the 2PC to address the participant failures?

What modifications are needed?• A process must remember

in what state it was before crashing resume pointresume point

• A process must find out the outcome (by contacting the coordinator) redeem redeem orderorder

• The coordinator must find out when a process indeed completed the decision, since it can crash before executing it garbage collection @ garbage collection @ coordinatorcoordinator

2PC: Overcome Participant Failures• Coordinator:

– Multicast: vote-request– Collect replies/votes

• All vote-commit => log ‘commit’ to ‘outcomes’ table and send commit• Else => log ‘abort’ send abort

– Collect acknowledgments– Garbage-collect protocol outcome information

• Participant:– vote-request => log its vote and send vote-(commit/abort)– commit => make changes permanent, send acknowledgment– abort => delete temp area– After failure:

• For each pending protocol: contact coordinator (or other participants) to learn outcome

Step 2: What if coordinator fails?

• If coordinator crashed during first phase (WAIT):– some participants will be ready to commit– others will not be able to (they voted on abort)– other processes may not know what the state is

• If coordinator crashed during its decision or before sending it out:– some processes will be in READY state– some others will know the outcome

Group Discussions

• How to overcome the coordinator failures?

Improvement• If coordinator fails, processes are blocked

waiting for it to recover• After the coordinator recovers, there are

pendingpending protocols that must be finished• Coordinator must remember its state

before crashing – Write INIT & WAIT state on permanent

storage– write GLOBAL_COMMIT or GLOBAL_ABORT

on permanent storage before sending commit or abort decision to other processes and push these operations through

• Participants may see duplicated messages (due to message re-transmission by coordinator)

2PC: Overcome Coordinator Failures (1)

Coordinator:• Multicast: vote-request• Collect replies/votes

– All vote-commit => log ‘commit’ to ‘outcomes’ table, wait until safe on persistent storage and send commit

– Else => log ‘abort’, send abort• Collect acknowledgments• Garbage collect protocol outcome information

After failure:• For each pending protocol in outcomes table

– Possibly re-transmit VOTE_REQUEST if in WAIT– Send outcome (commit or abort) – Wait for acknowledgments– Garbage collect outcome information

2PC: Overcome Coordinator Failures (2)

Participant: first time message received• Vote-request

– save to temp area and reply its vote –(commit / abort)• Global_commit

– make changes permanent• Global_abort

– delete temp area

Message is a duplicate (recovering coordinator)– Send acknowledgment

After failure:• For each pending protocol:

– contact coordinator to learn outcome

2PC: Coordinator

Outline of the steps taken by the coordinator in 2PC protocolOutline of the steps taken by the coordinator in 2PC protocol..

2PC: participant

• The steps taken by a participant process in 2PC.

2PC: decision query from other participants

State of Q Action by P

COMMIT Transition to COMMIT

ABORT Transition to ABORT

INIT Transition to ABORT

READY Contact another participant*

Problem with 2PC

• The crash of the coordinator may block participants to reach a final decision until it recovers– during the decision stage– All participants in READY

status, cannot cooperatively decide the final decision!

– Solution: 3PC

Another example of 2PC

2 PC• Phase 1 (voting phase)– (1) coordinator sends canCommit? to participants– (2) participant replies with vote (Yes or No); before voting

Yes prepares to commit by saving objects in temp area, and if No

aborts• Phase 2 (completion according to outcome of vote)– (3) coordinator collects votes (including own)

• if no failures and all Yes, sends doCommit to participants• otherwise, sends doAbort to participants

– (4) participants that voted Yes wait for doCommit or doAbort and act accordingly; confirm their action to coordinator by sending haveCommitted

Communication in 2 PC

3 PC Overview• Remember that 2 PC blocks when coordinator

crashes during the decision stage– Participants are blocked until the coordinator

recovers!• Guarantees that the protocol will not block when

only fail-stop failures occur– Avoid blocks in 2 PC– Model is not realistic, but still interesting to look at

• A process fails only by crashing, crashes are accurately detectable

• Requires a fourth round for garbage collection

3PC Key Idea

• Introduces an additional round of communication and delays to prepare-to-prepare-to-commitcommit state to ensure that the state of the system can always be deduced by a subset of a subset of alive participantsalive participants that can communicate with each other– before the commit, coordinator tells all

participants that everyone sent Oks (ready_commit)

3PC: Coordinator

Coordinator:• Multicast: vote-request• Collect votes/replies– All commit => log ‘precommit’ and send precommit– Else => log ‘abort’, send abort

• Collect acks from non-failed participants– All ‘ready-commit’ => log commit and send global-

commit• Collect acknowledgements• Garbage collect protocol outcome information

3PC: ParticipantParticipant: logs state on each message• Vote-request

– save to temp area and reply vote-(commit/abort)• precommit

– Enter precommit state, send ack (ready-commit)• commit

– make changes permanent• abort

– delete temp area

After failure:• Collect participant state information• All precommit or any committed

– Push forward the commit• Else

– Push back the abort

3PC State Transition

• (a) The finite state machine for the coordinator in 3PC. (b) The finite state machine for a participant.

Question: Now can participants be blocked in READY status?Question: Now can participants be blocked in READY status?

Check if 2PC’s problem has been solved?

• A participant can be blocked in– INIT: abort (no participant in PRECOMMIT, why?)– READY: if a majority of participants in READY, safe to

abort– PRECOMMIT: if all participants in PRECOMMIT, then

COMMIT, otherwise safe to abort• In summary, if a participant in READY, all crashed

participants can only recover to INIT, ABORT, or PRECOMMIT, which allows surviving processes can always come to a final decision

Summary Slides

2 PC• Is a blockingblocking protocol• Consists of a coordinator and participants.

1. Coordinator multicasts a VOTE_REQUEST message to all participants.2. When a participant receives a VOTE_REQUEST message, it replies

(unicast) with either VOTE_COMMIT or VOTE_ABORT.• A VOTE_COMMIT response is essentially a contractual guarantee that it

will be able to commit.3. Coordinator collects all votes. If all are VOTE_COMMIT, then it

multicasts a GLOBAL_COMMIT message. Otherwise, it will multicast a GLOBAL_ABORT message.

4. When a participant receives GLOBAL_COMMIT, it locally commits; if it receives GLOBAL_ABORT, it locally aborts.

2 PC FSMs

• Where does the waiting/blocking occur?– Coordinator-WAIT– Participant-INIT– Participant-READY

Coordinator Participant

Two-Phase Commit Recovery

• What happens in case of a crash? How do we detect a crash?– If timeout in Coordinator-WAIT, then abort.– If timout in Participant-INIT, then abort.– If timout in Participant-READY, then need to find out if globally

committed or aborted.• Just wait for Coordinator to recover.• Check with others.

Coordinator Participant

Wait StateWait States

Two-Phase Commit Recovery

• If in Participant-READY, and we wish to check with others:– If Q is in COMMIT, then commit. If Q is in ABORT, then

ABORT.– If Q in INIT, then can safely ABORT.– If all in READY, nothing can be done. 3 PC

Coordinator

Wait State

Participant

Wait States

distributed commit dr. yingwu zhu. failures in a distributed system consistency requires agreement...

client slide

commit decision

coordinator all

distributed commit problem

commit problem all

information slide

permanent slide

twophase commit

Documents

web caching dr. yingwu zhu. what is web caching introducing...

topological data processing for distributed sensor networks...

face detection and head...

synchronization dr. yingwu zhu distributed computing

divide-and-conquer dr. yingwu zhu p65-74, p83-88, p93-96,...

integrating semantics-based access mechanisms with p2p file...

towards efficient load balancing in structured p2p systems...

challenges, design and analysis of a large-scale p2p-vod...

drafting behind akamai (travelocity-based detouring) dr....

chapter 8 memory management dr. yingwu zhu. outline...

tree balancing: avl trees dr. yingwu zhu. recall in bst the...

search: binary search trees dr. yingwu zhu. linear search...

face recognition ying wu yingwu@ece.northwestern.edu...

application of honey bee mating optimization on distribution...

lecture 12: distributed transactions haibin zhu, phd....

search: binary search trees dr. yingwu zhu. review: linear...

overview dr. yingwu zhu. what is an os ? a program that acts...

review of stacks and queues dr. yingwu zhu. our focus only...

collaborative human computing zack zhu march 31, 2010...

algorithm complexity analysis: big-o notation (chapter 10.4)...