eecs 591 distributed systems

19
EECS 591 DISTRIBUTED SYSTEMS Manos Kapritsos Fall 2021 Slides by: Lorenzo Alvisi

Upload: others

Post on 03-May-2022

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: EECS 591 DISTRIBUTED SYSTEMS

EECS 591DISTRIBUTED SYSTEMS

Manos KapritsosFall 2021

Slides by: Lorenzo Alvisi

Page 2: EECS 591 DISTRIBUTED SYSTEMS

When all Ack’s have been received: := Commit send Commit to all

I. sends VOTE-REQ to all participants Coordinator Participant

2. sends to Coordinatorif = No then := Abort

halt send Precommit to allelse := Abort send Abort to all who voted Yes

halt

3. if (all votes are Yes) then

4. if received Precommit then send Ack

5. collect Ack from all participants

6. When receives Commit, sets := Commit and halts

3-PHASE COMMIT

Page 3: EECS 591 DISTRIBUTED SYSTEMS

TIMEOUT ACTIONS

Step 2: is waiting for VOTE-REQ from the coordinator

Coordinator Participant

Step 3: Coordinator is waiting for vote from participants

Step 4: is waiting for Precommit

Step 5: Coordinator is waiting for Ack’s

Step 6: is waiting for Commit

Page 4: EECS 591 DISTRIBUTED SYSTEMS

TIMEOUT ACTIONS

Step 2: is waiting for VOTE-REQ from the coordinator

Same as in 2PC

Coordinator Participant

Step 3: Coordinator is waiting for vote from participants

Step 4: is waiting for Precommit

Step 5: Coordinator is waiting for Ack’s

Step 6: is waiting for Commit

Page 5: EECS 591 DISTRIBUTED SYSTEMS

TIMEOUT ACTIONS

Step 2: is waiting for VOTE-REQ from the coordinator

Same as in 2PC

Coordinator Participant

Step 3: Coordinator is waiting for vote from participants

Same as in 2PCStep 4: is waiting for Precommit

Step 5: Coordinator is waiting for Ack’s

Step 6: is waiting for Commit

Page 6: EECS 591 DISTRIBUTED SYSTEMS

TIMEOUT ACTIONS

Step 2: is waiting for VOTE-REQ from the coordinator

Same as in 2PC

Coordinator Participant

Step 3: Coordinator is waiting for vote from participants

Same as in 2PCStep 4: is waiting for Precommit

Run termination protocol

Step 5: Coordinator is waiting for Ack’s

Step 6: is waiting for Commit

Page 7: EECS 591 DISTRIBUTED SYSTEMS

TIMEOUT ACTIONS

Step 2: is waiting for VOTE-REQ from the coordinator

Same as in 2PC

Coordinator Participant

Step 3: Coordinator is waiting for vote from participants

Same as in 2PCStep 4: is waiting for Precommit

Run termination protocol

Step 5: Coordinator is waiting for Ack’s

Coordinator sends Commit Step 6: is waiting for Commit

Page 8: EECS 591 DISTRIBUTED SYSTEMS

TIMEOUT ACTIONS

Step 2: is waiting for VOTE-REQ from the coordinator

Same as in 2PC

Coordinator Participant

Step 3: Coordinator is waiting for vote from participants

Same as in 2PCStep 4: is waiting for Precommit

Run termination protocol

Step 5: Coordinator is waiting for Ack’s

Coordinator sends Commit Step 6: is waiting for Commit

Run termination protocol

Page 9: EECS 591 DISTRIBUTED SYSTEMS

TIMEOUT ACTIONS

Step 2: is waiting for VOTE-REQ from the coordinator

Same as in 2PC

Coordinator Participant

Step 3: Coordinator is waiting for vote from participants

Same as in 2PCStep 4: is waiting for Precommit

Run termination protocol

Step 5: Coordinator is waiting for Ack’s

Coordinator sends Commit Step 6: is waiting for Commit

Run termination protocol

Participant knows what they will receive…but the NB property can be violated!

Page 10: EECS 591 DISTRIBUTED SYSTEMS

TERMINATION PROTOCOL:PROCESS STATES

At any time while running 3PC, each participantcan be in exactly one of these four states:

Aborted

Uncertain

Pre-committed

Committed

Not voted, voted No, received Abort

Voted Yes but not received Precommit

Received Precommit, not Commit

Received Commit

Page 11: EECS 591 DISTRIBUTED SYSTEMS

NOT ALL STATES ARE COMPATIBLE

Aborted

Uncertain

Pre-committed

Committed

Aborted Uncertain Pre-committed Committed

Page 12: EECS 591 DISTRIBUTED SYSTEMS

When times out, it starts an election protocol to elect a new coordinator

The new coordinator sends STATE-REQ to all processes that participated in the election

The new coordinator collects the states and follows a set of termination rules

TERMINATION PROTOCOL

Page 13: EECS 591 DISTRIBUTED SYSTEMS

to elect a new coordinator

The new coordinator sends STATE-REQ to all processes that participated in the election

The new coordinator collects the states and follows a set of termination rules

TERMINATION PROTOCOL

TR1: if some process decided Abort, thendecide Abortsend Abort to allhalt

TR2: if some process decided Commit, thendecide Commitsend Commit to allhalt

TR3: if all processes that reported state are uncertain, thendecide Abortsend Abort to allhalt

TR4: if some process is pre-committed, but none committed, thensend Precommit to uncertain processeswait for Ack’ssend Commit to allhalt

Page 14: EECS 591 DISTRIBUTED SYSTEMS

TERMINATION PROTOCOL AND FAILURES

Processes can fail while executing the termination protocol

if times out on , it can just ignore

if fails, a new coordinator is elected and the protocol is restarted (election protocol to follow)

total failures will need special care

Page 15: EECS 591 DISTRIBUTED SYSTEMS

RECOVERING

If fails before sending Yes, decide Abort

If fails after having decided, follow decisionIf fails after voting Yes, but before receiving decision value

asks other processes for help3PC is non-blocking: will receive a response with the decision

If has received Precommit

still needs to ask other processes (cannot just Commit)

No need to log Precommit!(or is there?)

Page 16: EECS 591 DISTRIBUTED SYSTEMS

THE ELECTION PROTOCOL

Processes agree on linear ordering (e.g. by pid)Each process maintains a set of all processes that it believes to be operationalWhen detects failure of , it removes from and chooses smallest in to be the new coordinatorIf , then is the new coordinatorOtherwise, sends UR-ELECTED to

Page 17: EECS 591 DISTRIBUTED SYSTEMS

TOTAL FAILURE

Suppose that is the first process to recover and that is uncertain. Can decide Abort?

Some process could have decided Commit after crashed!

is blocked until some process recovers such that either can recover independently is the last process to fail: then can simply invoke the termination protocol

Page 18: EECS 591 DISTRIBUTED SYSTEMS

DETERMINING THE LAST PROCESS TO FAIL

Suppose a set of processes has recoveredDoes contain the last process to fail?

the last process to fail is in the set of every processso the last process to fail must be in

contains the last process to fail if:

Page 19: EECS 591 DISTRIBUTED SYSTEMS

ADMINISTRIVIA

I will email you homework #1 later today

Due next Monday 9/27 before class by email to Tony and me

Research project

Declare your team by Oct 1st (by email to me)

Declare your topic by Oct 8th (by email to me)

Not sure what to do? Come talk to me.