part14 crash

8/2/2019 Part14 Crash


Crash Recovery

In case of a system crash (failure), we require a recovery scheme to:
- detect failures
- restore the database to a consistent state

Failure Types

Volatile Storage
- main memory and cache
- normally does not survive a crash



Failure Types

Nonvolatile Storage
- usually survives a crash
- examples: disk and magnetic tape
- exceptions: head crash, etc.

Stable Storage
- "never" lost (in theory; in practice this can only be approximated)
- approximated by replicating on several nonvolatile media with independent failure modes



Failure Types

Logical Error
- program-related error: divide by zero, overflow, access to non-existent memory, etc.
- the transaction can often be restarted after a software fix is made

System Error
- example: deadlock, or some undesirable system state has been entered
- re-execution is often possible



Failure Types

System Crash
- some hardware problem; contents of volatile memory are lost, ...

Disk Failure
- head crash, etc.
- error during data transfer (sometimes recoverable)



Basic Terminology

input(X)
- transfer the physical block where data item X resides into main memory

output(X)
- transfer the buffer block on which X resides onto its physical block (disk)

read(X, xi)
- assign the value of X to local variable xi:
  - if the block in which X resides is not in main memory, then issue an input(X)
  - assign xi the value of data item X from the buffer block



Basic Terminology

write(X, xi)
- assign the value of local variable xi to data item X in the buffer block:
  - if the block in which X resides is not in main memory, then issue an input(X) first
  - assign xi to X in the buffer block
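
The four operations can be sketched as a toy buffer manager. This is only an illustration under stated assumptions: the names `disk`, `buffer`, `block_of`, `input_block` and `output_block`, and the one-item-per-block layout, are hypothetical and do not come from the slides.

```python
# Toy model of input/output/read/write; all names and the layout are hypothetical.
disk = {"blk0": {"X": 10}}      # physical blocks on disk
buffer = {}                     # buffer blocks in main memory
block_of = {"X": "blk0"}        # which block each data item resides on

def input_block(x):
    """input(X): copy the physical block holding X into main memory."""
    blk = block_of[x]
    buffer[blk] = dict(disk[blk])

def output_block(x):
    """output(X): copy the buffer block holding X back onto disk."""
    blk = block_of[x]
    disk[blk] = dict(buffer[blk])

def read(x):
    """read(X, xi): return X from the buffer, issuing input(X) if needed."""
    if block_of[x] not in buffer:
        input_block(x)
    return buffer[block_of[x]][x]

def write(x, value):
    """write(X, xi): update X in the buffer only; the disk is not touched."""
    if block_of[x] not in buffer:
        input_block(x)
    buffer[block_of[x]][x] = value

xi = read("X")
write("X", xi + 1)
disk_before_output = disk["blk0"]["X"]   # still the old value
output_block("X")
disk_after_output = disk["blk0"]["X"]    # now matches the buffer
print(disk_before_output, disk_after_output)  # 10 11
```

Note that write changes only the buffer; the new value reaches the disk only when an output is issued, which is what makes crashes between the two dangerous.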



Example: consider a banking system where $50 is withdrawn from account A and deposited into account B:

    read(A, a1)
    a1 = a1 - 50
    write(A, a1)
    read(B, b1)
    b1 = b1 + 50
    write(B, b1)



Failure Modes

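A failure can leave the database in an inconsistent state, e.g.:
- a failure after output(A) but before output(B): the withdrawal from A is on disk but the deposit into B is not
- before output(A) and output(B) have executed, the physical database blocks and the memory blocks differ; this is a problem if a crash occurs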
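
The first failure mode can be made concrete with a small simulation. It is a sketch only: the starting balances (A = 1000, B = 2000) and the use of plain dictionaries for the disk and the buffer are assumptions for illustration.

```python
# Hypothetical crash simulation: the volatile buffer is lost, the disk survives.
disk = {"A": 1000, "B": 2000}     # assumed starting balances
buffer = dict(disk)               # both items have already been read into memory
total_before = disk["A"] + disk["B"]

buffer["A"] -= 50                 # write(A, a1)
buffer["B"] += 50                 # write(B, b1)

disk["A"] = buffer["A"]           # output(A) reaches the disk
buffer = None                     # crash: main memory contents vanish
# output(B) is never executed

total_after_crash = disk["A"] + disk["B"]
print(total_before, total_after_crash)  # 3000 2950
```

The $50 has left A but never arrived in B, so the sum of the two accounts on disk is no longer preserved.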



Transaction

- a basic program unit
- its execution preserves database consistency: the database is consistent both before and after its execution
- a transaction may not always complete; it may be aborted for various reasons
- the database must then be restored (rolled back) to the state before the transaction started
- the transaction must be atomic: either all of its instructions are completed or none are performed



Crash Recovery Methods

Incremental Log with Deferred Updates
- during transaction execution, all writes are deferred until the partial commit stage
- all updates are recorded in the log, which is written to stable storage
- the log is used to update the database after the transaction commits

For example, let A = 1000 and B = 2000 at the start:

    T1                Log
    ----------------  ----------------
                      <T1, start>
    read(A, a)
    a := a - 50
    write(A, a)       <T1, A, 950>
    read(B, b)
    b := b + 50
    write(B, b)       <T1, B, 2050>
                      <T1, commit>
    . . . (other transactions)
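
The deferred-update idea can be sketched in a few lines, using A = 1000 and B = 2000 as in the example. The tuple layout of the log records and the function names `run_t1` and `apply_committed` are assumptions made for this sketch.

```python
# Deferred updates: writes go to the log only; the database changes after commit.
log = []                          # stands in for the log on stable storage
db = {"A": 1000, "B": 2000}       # starting values from the example

def run_t1():
    log.append(("T1", "start"))
    a = db["A"]                   # read(A, a)
    a = a - 50
    log.append(("T1", "A", a))    # write(A, a): deferred, only logged
    b = db["B"]                   # read(B, b)
    b = b + 50
    log.append(("T1", "B", b))    # write(B, b): deferred, only logged
    log.append(("T1", "commit"))

def apply_committed(txn):
    """Apply txn's logged new values to the database, but only if it committed."""
    if (txn, "commit") not in log:
        return
    for rec in log:
        if rec[0] == txn and len(rec) == 3:
            _, item, new_value = rec
            db[item] = new_value

run_t1()
db_before_apply = dict(db)        # database is still unchanged here
apply_committed("T1")
print(db_before_apply, db)
```

Because the database is untouched until the commit record exists, an uncommitted transaction never needs to be undone under this scheme.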



Recovery Procedure

redo(Ti)
- sets all data items updated by Ti to their new values
- Ti needs a redo if both <Ti, start> and <Ti, commit> are found in the log
- redo is idempotent: it can be executed more than once with the same final result. This matters, for example, if the system crashes while performing a recovery.
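
A minimal sketch of redo and of its idempotence follows. The log contents, the tuple layout of the records, and the helper `needs_redo` are hypothetical; T2 is included only to show a transaction without a commit record.

```python
def needs_redo(log, txn):
    """A transaction is redone only if both its start and commit records exist."""
    return (txn, "start") in log and (txn, "commit") in log

def redo(log, db, txn):
    """Set every item updated by txn to its logged new value."""
    for rec in log:
        if rec[0] == txn and len(rec) == 3:
            db[rec[1]] = rec[2]

# Assumed log: T1 committed, T2 did not.
log = [("T1", "start"), ("T1", "A", 950), ("T1", "B", 2050), ("T1", "commit"),
       ("T2", "start"), ("T2", "C", 600)]
db = {"A": 1000, "B": 2000, "C": 700}

for txn in ("T1", "T2"):
    if needs_redo(log, txn):
        redo(log, db, txn)
after_first_recovery = dict(db)

redo(log, db, "T1")               # repeated redo, as after a crash during recovery
print(after_first_recovery == db)  # True
```

Redo assigns absolute new values rather than applying increments, which is why repeating it cannot change the outcome.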



Crash Recovery Methods

Incremental Log with Immediate Update
- all updates are applied to the database as they occur; we keep an incremental log of all changes
- <Ti, start> is written to stable storage when Ti begins
- for each write, a record <Ti, data_item, old_value, new_value> is written to stable storage before any output(X) is performed (e.g. for write(X, 950), new_value is 950)
- when Ti partially commits, <Ti, commit> is written to the log



Recovery Procedure

[Incremental Log with Immediate Update]

redo(Ti)
- the same as before: set the updated items to their new values

undo(Ti)
- performed if the log contains <Ti, start> but no <Ti, commit> is found
- restores the items updated by Ti to their old values
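
The undo/redo decision for immediate update can be sketched as below. The log, the disk state at the time of the crash, and the function name `recover` are assumptions; performing the undo pass before the redo pass is also a choice made for this sketch.

```python
def recover(log, db):
    """Undo transactions with a start but no commit; redo committed ones."""
    started = {r[0] for r in log if r[1:] == ("start",)}
    committed = {r[0] for r in log if r[1:] == ("commit",)}
    # undo: newest to oldest, restore the old values
    for rec in reversed(log):
        if len(rec) == 4 and rec[0] in started - committed:
            db[rec[1]] = rec[2]
    # redo: oldest to newest, reapply the new values
    for rec in log:
        if len(rec) == 4 and rec[0] in committed:
            db[rec[1]] = rec[3]

# Assumed log records: (txn, item, old_value, new_value)
log = [("T0", "start"), ("T0", "A", 1000, 950), ("T0", "B", 2000, 2050),
       ("T0", "commit"),
       ("T1", "start"), ("T1", "C", 700, 600)]      # crash before T1 commits

# Assumed disk state at the crash: A and C were output, B was not.
db = {"A": 950, "B": 2000, "C": 600}
recover(log, db)
print(db)  # {'A': 950, 'B': 2050, 'C': 700}
```

T0's missing update to B is redone and T1's premature update to C is undone, leaving the database in the state after T0 and before T1.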



Checkpoints

Recovery with logs requires the entire log to be scanned:
- the search time grows with the log size
- many of the redone transactions are unnecessary, since their updates have already been written to disk

We can maintain periodic checkpoints:
- save all log records currently residing in main memory (if any) onto stable storage
- output all modified buffer blocks to disk
- output a <checkpoint> record to the log on stable storage



Checkpoints - recovery

Recovery:
- find the last transaction Ti that started executing before the last checkpoint
- the redo and undo operations apply only to Ti and to the transactions Tj that started after it
- this is much less time consuming than scanning the entire log
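
How a checkpoint shortens the part of the log that recovery must examine can be sketched as follows, assuming serial execution. The record layout, the sample log, and the function name `recovery_suffix` are hypothetical, and the log is assumed to contain at least one checkpoint record.

```python
def recovery_suffix(log):
    """Return the tail of the log that recovery has to examine."""
    # position of the most recent checkpoint record
    cp = max(i for i, r in enumerate(log) if r == ("checkpoint",))
    # last transaction that started before that checkpoint
    starts = [i for i in range(cp) if log[i][1:] == ("start",)]
    begin = starts[-1] if starts else cp
    return log[begin:]

log = [("T0", "start"), ("T0", "A", 1000, 950), ("T0", "commit"),
       ("T1", "start"), ("T1", "B", 2000, 2050),
       ("checkpoint",),
       ("T1", "C", 700, 600), ("T1", "commit"),
       ("T2", "start"), ("T2", "D", 10, 20)]

suffix = recovery_suffix(log)
txns = sorted({r[0] for r in suffix if r != ("checkpoint",)})
print(txns)  # ['T1', 'T2']
```

T0 finished before the checkpoint, so its updates are already on disk and its records are skipped entirely.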



Buffer Management

OSs with virtual memory have paging schemes that evict resident pages as required.

This may work against us:
- the OS may evict a modified block before Ti commits
- as well, log records are often kept in main memory until a buffer block is full before being sent to stable storage
- if a crash occurs now, an inconsistency may result
- OSs rarely support database requirements



Buffer Management

- it may be possible for the db manager to allocate an area of memory and manage it independently of the OS (i.e. memory reserved for database use only)
- thus <Ti, data_item, old_value, new_value> must be written to stable storage before output of the block on which the item resides (all such entries)
- before output of a block in main memory, all log records pertaining to the block must be written to stable storage first
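
The rule that log records must reach stable storage before the block they describe can be sketched with a tiny buffer manager. Everything here is an assumption for illustration: the names `stable_log`, `log_buffer`, `buffer_blocks`, the extra block field in each log record, and the choice to flush the log in order up to the last record touching the block.

```python
stable_log = []                   # log records already on stable storage
log_buffer = []                   # log records still in main memory
disk = {}                         # blocks that have been output
buffer_blocks = {"blk1": {"X": 1000}, "blk2": {"Y": 5}}

def write(txn, blk, item, new_value):
    """Update an item in its buffer block and log the change in memory."""
    old_value = buffer_blocks[blk][item]
    log_buffer.append((txn, blk, item, old_value, new_value))
    buffer_blocks[blk][item] = new_value

def output(blk):
    """Output a block, forcing the log records that pertain to it first."""
    # Flush, in order, everything up to the last record that touches blk.
    last = max((i for i, r in enumerate(log_buffer) if r[1] == blk), default=-1)
    stable_log.extend(log_buffer[:last + 1])
    del log_buffer[:last + 1]
    disk[blk] = dict(buffer_blocks[blk])

write("T1", "blk1", "X", 950)
write("T1", "blk2", "Y", 6)
output("blk1")
print(stable_log)   # the record for X is stable before blk1 is on disk
print(log_buffer)   # the record for Y may still wait in memory
```

Only the records needed for blk1 are forced; the record for blk2 can stay in the log buffer until blk2 itself is output or the buffer fills.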



Shadow Paging

- the database is partitioned into a number of fixed-length blocks (pages)
- we can use a page table to translate each logical block into its physical block

[Figure: logical page table with entries 1, 2, 3, ..., n, each pointing to a physical page on disk]

We maintain two page tables:
- current page table: used by Ti
- shadow page table: a copy of the table before Ti executes, never changed during execution of Ti, and stored in stable storage

[Figure: logical page table and physical pages on disk]



Shadow Paging

Example: a write(X, xi) is issued and X resides on the k-th page:
- if the k-th page is not in memory, then issue an input(X)
- if this is the first write to the k-th page:
  - find a free page on disk
  - modify the current page table so that the k-th entry points to the new page
- assign xi to X in the buffer page



Shadow Paging

- the shadow page table is stored in non-volatile storage just prior to the execution of Ti; we can recover it after a crash
- when Ti commits, the current page table becomes the new shadow page table
- if the current page table is lost in a crash, it is simple to roll the system back to the last consistent state
- the overhead of log records is eliminated



Shadow Paging

Recovery is fast since there are no redos or undos to perform.

In order to commit a transaction:
- output all modified buffer pages in main memory to disk
- output the current page table to disk (do not overwrite the shadow page table; it may be needed for recovery if a crash occurs now)
- write the disk address of the current page table to stable storage; this overwrites the address of the previous shadow page table
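
Shadow paging can be sketched with two small page tables. This is an illustration under assumptions: pages are held in a dictionary, free pages are handed out by a simple counter, and replacing `shadow_table` in `commit` stands in for the single write of the page-table address to stable storage.

```python
pages = {0: "old-A", 1: "old-B"}       # physical pages on disk (assumed contents)
shadow_table = [0, 1]                  # logical page k -> physical page, stable
current_table = list(shadow_table)     # working copy used by Ti
next_free = 2                          # next unused physical page

def write_page(k, data):
    """The first write to logical page k is redirected to a free physical page."""
    global next_free
    if current_table[k] == shadow_table[k]:
        current_table[k] = next_free
        next_free += 1
    pages[current_table[k]] = data

def commit():
    """The current page table becomes the new shadow page table."""
    global shadow_table
    shadow_table = list(current_table)

def crash():
    """The current table is lost; restart from the intact shadow table."""
    global current_table
    current_table = list(shadow_table)

write_page(1, "new-B")
seen_by_shadow = pages[shadow_table[1]]    # old version is still reachable
crash()
after_crash = pages[current_table[1]]      # rolled back without any undo

write_page(1, "newer-B")
commit()
after_commit = pages[shadow_table[1]]
garbage = set(pages) - set(shadow_table)   # unreferenced physical pages
print(seen_by_shadow, after_crash, after_commit, sorted(garbage))
```

After the commit, physical pages 1 and 2 are no longer referenced by any table, which is the lost space that periodic garbage collection has to reclaim.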



Shadow Paging

Disadvantages:
- data fragmentation: the database becomes scattered over the disk (slow sequential access); it may need to be repacked to maintain fast sequential access
- garbage collection: after a commit, the old version of the data is no longer reachable (unreferenced) but is not part of the free space either; we must perform periodic garbage collection to recover the lost disk space



Loss of Non-volatile Storage

- typically does not occur frequently
- do a periodic dump from disk to magnetic tape (or another archival medium)
- recover to the point of the last dump, then follow the log to restore the database



Recovery with Concurrent Transactions

The scheme depends on the concurrency-control scheme used. Basically, to roll back a transaction, we must undo its updates.

Situation:
- T0 is rolled back: a data item B that it updated must be restored to its old value; log-based recovery systems can use the undo information in the log
- but if T1 performed another update to B before T0 is rolled back, then T1's update is lost when T0 is rolled back

Thus we require that if T updates data item B, then no other transaction may update B until T either commits or is rolled back.

This can be ensured with a strict two-phase locking scheme (exclusive locks held until the end of the transaction).




Transaction Rollbacks

- transaction Ti is rolled back by scanning the log backwards
- for every entry <Ti, Xj, V1, V2> found, the data item Xj is restored to its old value V1 (it is possible that Ti performed several updates to Xj)
- continue the scan until <Ti, start> is found
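
The backward scan can be sketched as follows. The log, the database contents, and the function name `rollback` are hypothetical; T1 updates X twice to show why the scan must run from the newest record to the oldest.

```python
def rollback(log, db, txn):
    """Undo txn by scanning the log backwards until its start record."""
    for rec in reversed(log):
        if rec[0] != txn:
            continue                      # records of other transactions
        if rec[1:] == ("start",):
            break                         # reached the start record: done
        if len(rec) == 4:
            _, item, old_value, _new_value = rec
            db[item] = old_value

# Assumed log records: (txn, item, old_value, new_value)
log = [("T1", "start"), ("T1", "X", 10, 20),
       ("T2", "start"), ("T2", "Y", 1, 2),
       ("T1", "X", 20, 30)]
db = {"X": 30, "Y": 2}

rollback(log, db, "T1")
print(db)  # {'X': 10, 'Y': 2}
```

Scanning backwards restores X first to 20 and then to 10; a forward scan would leave X at 20, which is not the value from before T1 started.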




Checkpoints
- the recovery scheme is more complex with concurrent transaction execution than in the previous form: several transactions may have been active at the last checkpoint
- we require that the checkpoint log entry be <checkpoint L>, where L is a list of the transactions active at the time of the checkpoint
- as before, it is assumed that transactions do not perform updates to either the log or the buffer blocks for the duration of the checkpoint




Restart Recovery
- initially, create two empty lists, undo-list and redo-list, for the transactions requiring these operations
- next, scan the log backwards until the first <checkpoint L> record is found; during the scan:
  - for each <Ti, commit> found, add Ti to the redo-list
  - for each <Ti, start> found, if Ti is not in the redo-list, then add it to the undo-list
- next, check the list L in the checkpoint record:
  - for each Ti in L, if Ti is not in the redo-list, then add Ti to the undo-list




Once the two lists have been constructed:
- rescan the log from the most recent record backwards, performing an undo for each log record that belongs to a transaction on the undo-list (the log records of redo-list transactions are ignored); stop the scan when <Ti, start> has been found for every transaction Ti in the undo-list
- locate the most recent <checkpoint L> record again
- scan the log forward from it and perform a redo for each record that belongs to a transaction on the redo-list; ignore the log records of transactions in the undo-list
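
The whole restart procedure can be sketched in one function. The record layout, the sample log, the assumed disk state at the crash, and the name `restart_recovery` are hypothetical; the log is assumed to contain at least one checkpoint record, and each item is updated by only one active transaction at a time, as strict two-phase locking would ensure.

```python
def restart_recovery(log, db):
    """Build undo-list and redo-list, then undo backwards and redo forwards."""
    cp = max(i for i, r in enumerate(log) if r[0] == "checkpoint")
    undo_list, redo_list = [], []
    # Backward scan from the end of the log to the last checkpoint record.
    for rec in reversed(log[cp + 1:]):
        if rec[1:] == ("commit",):
            redo_list.append(rec[0])
        elif rec[1:] == ("start",) and rec[0] not in redo_list:
            undo_list.append(rec[0])
    # Transactions active at the checkpoint that never committed.
    for txn in log[cp][1]:
        if txn not in redo_list:
            undo_list.append(txn)
    # Undo pass: backwards, until every undo-list start record has been seen.
    pending = set(undo_list)
    for rec in reversed(log):
        if not pending:
            break
        if rec[0] in pending:
            if rec[1:] == ("start",):
                pending.discard(rec[0])
            elif len(rec) == 4:
                db[rec[1]] = rec[2]       # restore the old value
    # Redo pass: forwards from the checkpoint.
    for rec in log[cp + 1:]:
        if len(rec) == 4 and rec[0] in redo_list:
            db[rec[1]] = rec[3]           # reapply the new value
    return undo_list, redo_list

log = [("T1", "start"), ("T1", "A", 1000, 950),
       ("T2", "start"), ("T2", "B", 2000, 2050),
       ("checkpoint", ["T1", "T2"]),
       ("T1", "C", 700, 600), ("T1", "commit"),
       ("T3", "start"), ("T3", "D", 10, 20),
       ("T2", "B", 2050, 2100)]           # crash happens here
db = {"A": 950, "B": 2100, "C": 700, "D": 20}   # assumed disk state at the crash

undo_list, redo_list = restart_recovery(log, db)
print(undo_list, redo_list)   # ['T3', 'T2'] ['T1']
print(db)                     # {'A': 950, 'B': 2000, 'C': 600, 'D': 10}
```

T2 was active at the checkpoint and never committed, so its undo reaches back past the checkpoint to its start record, while the redo of T1 only needs the records after the checkpoint.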
