distributed transaction recovery. 2 goal all_or_nothing semantics of atomicity must be extended to a...

Distributed Transaction Recovery

2

Goal

• All_or_nothing semantics of atomicity must be extended to a transaction’s updates on multiple servers

• Distributed system can fail partially• The solution is to set up a special handshake protocol

between servers from a family of distributed commit protocols

• The protocol implies that there are certain circumstances under which a failed server must communicate to other servers during its restart to find out the systemwide decision about the termination status of in-doubt transactions (uncertain transactions)

3

Goal

• Servers are no longer autonomous in that they can always independently recover „in splendid isolation”

• Commit protocols make very few assumptions about how the various servers implement their local recovery

• The requirement there is that the rules of the commit protocol are followed by all parties of a distributed system

• The 2PC protocol has been standardized and is generally accepted in the software industry

4

Distributed Commit • The problem:

How can we ensure atomicity of a distributed transaction?

Example:Consider a chain of stores belonging to a supermarket. Suppose a manager of the chain wants to query all the stores, find the inventory of pants at each, and issue instructions to move pants from store to store in order to balance the inventory. The whole operation is done by a single global transaction T that has component Ti at each ith store and component To at the office where the manager is located

5

Distributed Commit

The sequence of activity performed by T are summarized below:

• Component To is created at the site of the manager• To sends messages to all the stores instructing them to

create components Ti• Each Ti executes a query at store i to discover the number

of pants in inventory and reports this number to To• To takes these numbers and determines what shipments of

pants are desired. To sends messages such as “store 10 should ship 100 pants to store 7” to the appropriate stores

• Stores receiving instructions update their inventory and perform the shipments

6

Failed Coordination of a Distributed

Transaction (1)

Active

Committed

Aborted

Crash/recover

T1

Active

Committed

Aborted

Crash/recover

T2

Both T1 and T2 are ready to commit

7

Failed Coordination of a Distributed

Transaction (2)

Active

Committed

Aborted

Crash/recover

T1

Active

Committed

Aborted

Crash/recover

T2

T1 committed; T2: crashes and recovers

What we need is some new state that has more flexibility than this simple state diagram

8

Transaction State Diagram with

Durable Prepared State

Active

Committed

Aborted

Crash/recover

Precommitted

9

Precommitted state

• Precommitted:

a transaction becomes precommitted as a result of a request by a distributed transaction coordinator (TM). From a precommitted state, the transaction can enter either committed state or aborted state. In the event of a system crash and subsequent recovery, a transaction in a precommitted state returns to precommitted state.

10

Two-Phase Commit Protocol

• We assume that the atomicity mechanisms at each site assure that either the local component commits or it has no effect on the database state at that site – i.e. local components of a global transaction are atomic

• By enforcing the rule that either all components of a distributed transaction commit or none does, we make the distributed transaction itself atomic

11

Two-Phase Commit Protocol

• Some assumptions about the 2PC protocol: – Each site logs actions at that site, but there is no global

log

– One site, coordinator, plays a special role in deciding whether or not the distributed transaction can commit

– 2PC protocol involves sending certain messages between the coordinator and the other sites. As the message is sent, it is logged at the sending site, to aid in recovery if should it be necessary

12

Two-Phase Commit Protocol – Phase 1

1. The coordinator places a log record <Prepare T> on the log at its site

2. The coordinator sends to each component’s site the message <prepare T>

3. Each site receiving the message <prepare T> decides whether to commit or abort its component of T. The site can delay if the component has not yet completed its activity, but must eventually send a response

4. If a site wants to commit its component of T, it must enter a state called precommitted. Once in the precommitted state, the site cannot abort its component of T without a directive to do so from the coordinator.

13


The following steps are done to become precommitted: – Perform whatever steps are necessary to be sure the local

component of T will not have to abort, even if there is a system failure followed by recovery at the site. So, all appropriate actions regarding the log must be taken so that T will be redone rather than undone in a recovery

– Place the record <Ready T> on the local log and flush the log to disk

– Send to the coordinator the message <ready T>

5. If a site wants to abort its component of T, then it logs the record <Don’t commit T> and sends the message <don’t commit T> to the coordinator. It is safe to abort the component at this time, since T will surely abort even if only one component wants to abort

14


1. If the coordinator has received <ready T> from all components of T, then it decides to commit T. The coordinator:

– Logs <Commit T> at its site, and

– Sends message <commit T> to all sites involved in T

2. If the coordinator has received <don’t commit T> from one or more sites, then it decides to abort T. The coordinator:

– Logs <Abort T> at its site, and

– Sends message <abort T> to all sites involved in T

15


3. If a site receives a <commit T> message, it commits the component of T at that site, logging <Commit T> as it does

4. If a site receives the message <abort T>, it aborts the component of T at that site, and writes the log record <Abort T>

16

Failure model

• Message losses: a message does nor arrive at the destination process because of a network failure

• Message duplications: some network component may end up duplicating a message

• Transient process failures: one or more of the involved processes, participants or coordinators, exhibit a soft crash and need to be restarted, but without any damage to data on secondary storage

• Idea: distribute the responsibilities for handling various failure classes among transactional federation and the underlying communication system

17

Failure model

• 2PC does not make any assumptions about the underlying communication system (datagrams as well as session oriented protocols)

• Omission failures versus commission failures – the later ones would lead to the class of distributed consensus protocols known as Byzantine agreement

• We take into account only omission failures

18

Prepared log entry for in-doubt transactions

• Participant that replies „yes” to the coordinator’s poll in the first message round can then crashes.

• The participant can no longer perform local crash recovery as if there were no distributed transactions at all

• The restarted participant needs to check back with the coordinator first before it can decide to consider the transaction commited.

• Thus, every participant that votes „yes” must actually be prepared to go either way – redo or undo teh updates.

19

Two Phase Commit (cd)

There are three places in 2PC where a process is waiting for a message:– the participant is waiting for the <prepare T> message

from the coordinator

– the coordinator is waiting for the <ready T> message from all the participants

– the participant is waiting for the <commit T> or <abort T> message from the coordinator

20

Two Phase Commit (cd)

Timeout Actions :– ad.1. unilateral abort after time_out (participant)

– ad.2. unilateral abort after time_out (coordinator)

– ad.3. the site can be blocked

21

Restart and Termination Protocols

• 2PC is robust with respect to failures in the sense that each failed and restarted process resumes its work in the last remembered state

• 2PC is not robust with respect to failures in the sense that all processes guarantee active progress toward a global commit or rollback

• Two extensions of the basic 2PC– A restart protocol – specifies how failed and restarted protocol

should proceed

– A termination protocol – specifies how a process should behave upon a timeout while it is waiting for some message

22

Restart and Termination Protocols

• As all participants follow the same protocol, we have to specify four cases:

1. The coordinator restart protocol – the continuation of the coordinator’s protocol after a coordinator restart

2. The coordinator termination protocol – the coordinator behavior upon timeout

3. The participant restart protocol – the continuation of the participant’s protocol after a participant restart

4. The participant termination protocol – the participant behavior upon timeout

23

Termination protocol for 2PC • The simplest termination protocol is the following:

The participant P remains blocked until it can re-establish communication with the coordinator. Then the coordinator can tell P the appropriate decision

Drawback: P may be unnecessarily blocked for a long time • Participants can exchange messages directly (without the

mediation of the coordinator)• We assume that coordinator attaches the list of the

participant’s identities to the <prepare T> message it sends to each of them

• The fact that participants do not know each other before receiving the message is of no consequence, since a participant may unilaterally decide to abort transaction

24

Termination protocol for 2PC

• The cooperative termination protocol :

A participant P (initiator) that times out while in its uncertainty period sends a <decision request> message to every other process Q (responder) to inquire whether Q either knows the decision or can unilaterally reach one

• Q has already decided Commit (or Abort) – Q sends a commit or abort to P, and P decides accordingly

• Q has not voted yet – Q can unilaterally decide Abort. It then sends an <abort T> to P, and P decides Abort

• Q has voted Yes but has not yet reached a decision

25

2PC with restart and termination protocols

• Statechart for 2PC with restart and termination protocols:– F transitions are triggered during restart after a process

failure. Once the process’s last state is determined from the log entries on the process’s stable log, the transition is made without any further preconditions

– T transitions are triggered upon timeout and are also made without further preconditions

26

2PC statechart - coordinator

initial

collecting

prepare_to_commitT or F

commit abort

C-pending A-pending

forgotten

Ack

commitT or F

abort

T or F

T or F

variant

27

2PC statechart - participant

initial

prepared

committed aborted

Prepared/ yes

commit/ack

abort/ack

T or F

Commit/ack Abort/ack

Prepared/ no

T or F

28

Communication Topologies for 2PC • The communication topology of a protocol is the

specification of who sends messages to whom

Centralized 2PC Coordinator

Participants

29

Communication Topologies for 2PC

Decentralized 2PC (reduce time complexity)

Linear 2PC (reduce message complexity)

30

Three Phase Commit (3PC)

• The protocol is non-blocking in the absence of total site failures

• The protocol may cause inconsistent decisions to be reached in the event of communication failures

We assume that only site failures occur • All operational sites can communicate with each other • Process that times out waiting for a message from process

q knows that q is down and therefore that no processing can be taking place there (no other process can be communicating with q)

31


• Which of the processes involved in a transaction takes the role of the coordinator ?

• The choice of the coordinator depends on:– Transaction initiator client PC, application server, Web server,

database server, ORB

– Reliability and speed of participants – how many participants does the transaction involve?

– Communication topology and protocol

• The simplest way – transaction initiator is chosen as the coordinator

• If the initiator is a client, it often makes more sense to choose a data server as a coordinator

32


Nonblocking property:

If any operational process is uncertain then no process (whether operational or failed) can have decided to Commit

If the operational sites discover they are all uncertain, they can decide Abort, safe in their knowledge that the failed processes had not decided Commit. When the failed processes recover they can be told to decide Abort. This way blocking is prevented

• Does 2PC obeys NB property? NO

33


1. The coordinator sends a <prepare T> message to all participants

2. When a participant receives a <prepare T> message, it responds with YES or NO message, depending on its vote. If a participant sends NO, it decides Abort and stops

3. The coordinator collects the vote messages from all participants. If any of them was NO or if the coordinator voted No, then the coordinator decides Abort, sends <abort T> to all participants that voted YES, and stops. Otherwise, the coordinator sends <pre-commit T> messages to all participants

34


4. A participant that voted Yes waits for <pre-commit T> or <abort T> message from the coordinator. If it receives an <abort T> , the participant decides Abort and stops. If it receives a <pre-commit T> , it responds with an ACK message to the coordinator

5. The coordinator collects the ACKs. When they have all been received, it decides Commit, sends <commit T> to all participants, and stops

6. A participant waits for a <commit T> message from the coordinator. When it receives that message, it decides Commit and stops

35

3PC – timeout actions

What a process should do when it times out depends on the message it is was waiting for:

1. In step (2) participants wait for <prepare T> message

2. In step (3) the coordinator waits for the votes

3. In step (4) participants wait for a <pre-commit T> or <abort T> message

4. In step (5) the coordinator waits for ACKs

5. In step (6) participants wait for <commit T> message

36

3PC – timeout actions

• Cases (1) and (2) are handled exactly as in 2PC

• In case (4) the coordinator may decide Commit while some failed participant is uncertain

• In case (3) the participant must communicate with other processes to reach a consistent decision

• In case (5) the participant must communicate with other processes to reach a consistent decision. Why can participant ignore the timeout and simply decide commit? The participant could violates condition NB (the coordinator failed after sending pre-commit to p but before sending it to q

37

Termination protocol for 3PC • The state of the process relative to the message it has sent

or received:– Aborted: the process has not voted No, has voted No, or has

received <abort T> – Uncertain: the process is in its uncertain period – Committable: the process has received a <pre-commit T>, but has

not yet received a <commit T>– Committed: the process has received a <commit T>

• When a participant times out in cases (3) or (5) it initiates an election protocol

• The election protocol involves all processes that are operational and results in the “election” of a new coordinator (the old one must have failed, otherwise no participant would have timed out!)

38


The coordinator sends a <state-req> message to all participants that participated in the election. Each participant responds to this message by sending its state. The coordinator collects these states and proceeds according to the following termination protocol:

TR1: If some process is Aborted, the coordinator decides Abort, sends <abort T> messages to all participants, and stops.

TR2: If some process is Committed, the coordinator decides Commit, sends <commit T> messages to all participants, and stops.

39


TR3: If all processes that reported their state are Uncertain, the coordinator decides Abort, sends <abort T> messages to all participants, and stops.TR4: If some process is Committable but none is Committed, the coordinator first sends <pre-commit T> messages to all processes that reported Uncertain, and waits for acknowledgements from these processes. After having received these acknowledgements the coordinator decides Commit, sends <commit T> messages to all participants, and stops.

A participant that receives a <commit T> (or <abort T>) message, decides Commit (or Abort), and stops

40

Correctness of 3PC and its

Termination Protocol

Theorem: In the absence of total failures, 3PC and its termination protocol never cause processes to block

Theorem: Under 3PC and its termination protocol, all operational processes reach the same decision

41

Tree 2PC algorithm

• Which of the processes involved in a transaction takes the role of the coordinator ?

• LAN vs WAN – Everybody can talk to everybody

– The initiator does not know all participants (and probably cannot communicate with them directly and efficiently)

• Observation:– During the execution of a transaction, the involved processes

dynamically form a tree with the transaction initiator as the root; each edge in the tree corresponds to a dynamically established communication link

42

Tree 2PC algorithm• In the hierarchical commit protocol the message flow and

writing of log entries follows from the two roles of an intermediate node, participant with regard to its caller and coordinator for its subtree

initiator

Process 0

Process 1 Process 3

Process 2 Process 4 Process 5

Communication during commit protocol

43

initiator Process 1 Process 2 Process 3 Process 4 Process 5

prepare

prepare

prepareprepare

prepare

Yes

Yes

Yes

Yes

Yes

commit

commit

..........

44

Optimized Algorithms for Distributed Commit

What we can do:

1. Reducing the number of messages and the number of log writes

2. Shortening the „critical path” from the begin of the commit protocol to the point when local locks can be released

3. Reducing the probability of blocking

45


• The number of messages and forced log writes can be reduced by introducing specific conventions for the presumed behavior of a process (i.e. default reaction) in the absence of more explicit information

e.g. If a participant did not force a commit or abort log entry and thus could lose this info in a local crash, this participant could by default contact the coordinator to obtain missing information (but coordinator has already truncated its log and forgotten the transaction)

46


• Basic 2PC:– 4n messages; 2N+2 forced log entries (coordinator - begin and

commit or rollback log entry)

• When a certain piece of information about transaction’s behavior is missing – two variations of 2PC– Presumed abort (PA)

– Presumed commit (PC)

• Presumed abort 2PC has been considered as the method of choice and has been selected for the industry standard XA

47

Heterogeneous Commit Coordinators • As networks grow together, applications increasingly must

access servers and data residing on heterogeneous systems, performing transactions that involve multiple nodes of a network. These nodes have different TP monitors and different commit protocols

• If each transaction manager supports a standard and open commit protocol, then portability and interoperability among them are relatively easy to achieve

• There are two standard commit protocols and application programming interfaces: – LU6.2 (IBM) - embodied by CICS and CICS API

– OSI-TP - combined with X/Open DTP

48

Closed versus Open Transaction

Managers

• Some transaction managers have an open commit protocol: resource managers can participate in the commit decision, and the commit message formats and protocols are public

CICS, Decdtm, TOPEND (NCR), Transarc TM • Some transaction managers have a closed commit protocol:

resource managers cannot participate in the commit decision. The term is used to describe TP systems that have only private protocols and therefore cannot cooperate with other transaction processing systems

IBM’s IMS, Tandem’s TMF

distributed transaction recovery. 2 goal all_or_nothing semantics of atomicity must be extended to a...

Documents

distributed transaction

database state

new state

aborted state

precommitted state returns

distributed systemthe

inventory of pants

single global transaction