TRANSCRIPT
Data & Database Administration
Instructor: Engr. Sana Ziafat
Database Recovery and Backup
Databases are often damaged or lost because of system problems, which may be caused by:
- Human error
- Hardware failure
- Incorrect or invalid data
- Program errors
- Viruses
- Network failures
- Conflicting transactions
- Natural disasters
Mechanisms for restoring a database quickly and accurately after loss or damage are known as database recovery.
Back-up Facilities
- Automatic dump facility that produces a backup copy of the entire database
- Periodic backup (e.g., nightly, weekly), stored in a secure, off-site location
- Incremental back-up
- Cold backup: the database is shut down during the backup
- Hot backup: a selected portion is shut down and backed up at a given time
- Backup strategies are chosen based on demands
Backup Facilities (Contd.)
- Backups of static or less frequently changing data may be taken less often.
- Incremental backups, which record only the changes made since the last full backup, may be taken on an interim basis. Incremental backups take less time.
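The idea of an incremental backup, copying only what changed since the last full backup, can be sketched in a few lines. This is a minimal illustration under simplifying assumptions (flat directory, modification times as the change indicator), not a real backup tool:

```python
import os
import shutil

def incremental_backup(src_dir, dest_dir, last_full_backup):
    """Copy only the files modified since the last full backup.

    last_full_backup is a timestamp (seconds since the epoch)."""
    os.makedirs(dest_dir, exist_ok=True)
    copied = []
    for name in os.listdir(src_dir):
        path = os.path.join(src_dir, name)
        # A file changed after the last full backup must be re-copied.
        if os.path.isfile(path) and os.path.getmtime(path) > last_full_backup:
            shutil.copy2(path, os.path.join(dest_dir, name))
            copied.append(name)
    return copied
```

Because only the changed files are read and written, the incremental run finishes much faster than re-dumping the entire database.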
Journalizing Facilities
A DBMS must provide journalizing facilities to keep an audit trail of transactions and database changes.
- Transaction: a discrete unit of work that must be completely processed, or not processed at all, within a computer system; e.g., entering a customer record.
- In the event of failure, a consistent database state can be re-established by using the information in the journals together with the most recent complete backup.
- There are two types of journals or logs: the transaction log and the database change log.
Journalizing Facilities (Contd.)
- The journals form an audit trail of transactions and database updates. They are kept on disk or tape and must themselves be backed up.
- Transaction log: a record of essential data for each transaction processed against the database (typical contents are listed below).
- Database change log: images of the updated data. The before-image is a copy of the data before modification; the after-image is a copy after modification.
Journalizing Facilities (Contd.)
Data typically recorded in the transaction log for each transaction include:
- Transaction code or ID
- Action or type of transaction (e.g., insert)
- Time of the transaction
- Terminal number or user ID
- Input data values
- Tables and records accessed
- Records modified
- Possibly the old and new field values
[Figure: Database audit trail. The DBMS records a copy of each transaction in the transaction log and a copy of the affected data in the database change log; together with the current database and its backup copy, these support transaction processing and recovery actions.]
Checkpoint Facilities
- A facility by which the DBMS periodically refuses to accept any new transactions.
- All transactions in progress are completed and the journal files are brought up to date.
- At this point the system is in a quiet state, and the database and transaction logs are synchronized.
- The DBMS writes a special record (the checkpoint record) to the log file, which acts like a snapshot of the state of the database.
- The checkpoint record contains the information necessary to restart the system.
Checkpoint Facilities (Contd.)
- Any dirty blocks (memory pages containing changes not yet written to disk) are written from memory to disk storage, ensuring that all changes made prior to the checkpoint have reached long-term storage.
- The DBMS may perform checkpoints automatically or in response to requests from user application programs.
- Checkpoints should be taken frequently (several times an hour).
- In case of failure, it is often possible to resume processing from the most recent checkpoint.
- Thus only a few minutes of processing work must be repeated, compared with several hours for a complete restart of the day's processing.
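The checkpoint sequence described above (quiesce, flush dirty pages, write a checkpoint record) can be sketched as follows. The `buffer_pool`, `disk`, and `log` structures here are hypothetical simplifications of real DBMS internals:

```python
def take_checkpoint(buffer_pool, disk, log):
    """Quiescent checkpoint: flush dirty pages, then record a snapshot marker.

    buffer_pool maps page ids to {"data": ..., "dirty": bool};
    disk maps page ids to stored data; log is an append-only list."""
    # 1. Write every dirty page (changed in memory, not yet on disk) to disk.
    for page_id, page in buffer_pool.items():
        if page["dirty"]:
            disk[page_id] = page["data"]
            page["dirty"] = False
    # 2. Append a checkpoint record to the log; a restart can begin from here
    #    instead of replaying the whole day's processing.
    log.append({"type": "CHECKPOINT", "pages_on_disk": sorted(disk)})
```

After the checkpoint record is written, everything before it is guaranteed to be on long-term storage, so recovery only needs to examine log entries that follow it.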
Recovery Manager
- A module of the DBMS that restores the database to a correct condition when a failure occurs and then resumes the processing of user requests.
- The type of restart used depends on the nature of the failure.
- The recovery manager uses the logs as well as the backup copy to restore the database.
Recovery and Restart Procedures
- Disk mirroring: switch between identical copies of the database. Fastest recovery; used when high availability of data is required.
- Restore/rerun: reprocess transactions against the backup copy. Simple, but a last resort, with two disadvantages: (1) the sequence of transactions may differ from the original run, and (2) the time required to reprocess the transactions.
- Transaction integrity: commit or abort all of a transaction's changes (business transactions are well-defined sequences of steps). Transaction boundaries are defined with BEGIN TRANSACTION and COMMIT. Transactions should satisfy the four ACID properties.
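Transaction boundaries and the abort path can be seen with Python's built-in sqlite3 module; a minimal sketch (the `account` table and the simulated failure are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.isolation_level = None  # autocommit mode: we set boundaries explicitly
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES (1, 1000)")

try:
    conn.execute("BEGIN TRANSACTION")
    conn.execute("UPDATE account SET balance = balance - 200 WHERE id = 1")
    # Simulate a failure (crash, violated business rule) before COMMIT:
    raise RuntimeError("failure mid-transaction")
except RuntimeError:
    conn.execute("ROLLBACK")  # abort: every change inside the boundary is undone

balance = conn.execute("SELECT balance FROM account WHERE id = 1").fetchone()[0]
print(balance)  # 1000 -- the partial withdrawal never became visible
```

Because the failure occurred between BEGIN TRANSACTION and COMMIT, the DBMS discards the partial update rather than leaving the account in a half-processed state.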
Transaction ACID Properties
- Atomic: the transaction cannot be subdivided; it is processed entirely or not at all.
- Consistent: constraints that held before the transaction still hold after it. Example: an inventory-level constraint.
- Isolated: database changes are not revealed to other users until the transaction has completed; the on-hand inventory is known only after the inventory transaction finishes.
- Durable: committed database changes are permanent.
Recovery and Restart Procedures (Contd.)
- Backward recovery (rollback): apply the before-images to reverse the changes that were made.
- Forward recovery (rollforward): apply the most recent after-images to a backup copy (much faster and more accurate than restore/rerun).
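Both techniques can be illustrated with a toy change log of before- and after-images; the log format here is invented for the sketch:

```python
def rollback(db, change_log, txn_id):
    """Backward recovery: apply the before-images of one transaction,
    newest entry first, to undo its changes in place."""
    for entry in reversed(change_log):
        if entry["txn"] == txn_id:
            db[entry["key"]] = entry["before"]

def roll_forward(backup, change_log):
    """Forward recovery: start from the backup copy and re-apply every
    after-image in order, rebuilding the current state."""
    db = dict(backup)  # never touch the backup itself
    for entry in change_log:
        db[entry["key"]] = entry["after"]
    return db
```

Rollforward avoids reprocessing the transactions themselves: it only replays their final effects, which is why it is faster than restore/rerun.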
[Figure: Basic recovery techniques. (a) Rollback; (b) Rollforward.]
Database Failure Responses (1)
Aborted transactions
- The transaction terminates abnormally due to human error, hardware failure, loss of transmission, etc.
- Preferred recovery: rollback, done automatically by the DBMS.
Incorrect data
- Difficult to detect; needs human intervention (e.g., wrong grades or payments).
- Preferred recovery: rollback.
- Alternative 1: rerun the transactions, excluding the inaccurate data updates.
- Alternative 2: compensating transactions, with human intervention to correct the error.
Database Failure Responses (2)
System failure (database intact)
- A system crash due to power loss or system software failure.
- Preferred recovery: switch to a duplicate database.
- Alternative 1: rollback. Alternative 2: restart from the most recent checkpoint.
Database destruction (database is lost or destroyed)
- Preferred recovery: switch to a duplicate database.
- Alternative 1: rollforward to the state before the loss. Alternative 2: reprocess the transactions.
What is Concurrency Control?
The process of managing simultaneous operations against a database so that data integrity is maintained and the operations do not interfere with each other in a multi-user environment.
Background
Most DBMSs run in a multi-user environment, where several users are able to share the data contained in a database.
No data integrity problems arise as long as the data is only being read.
But problems can arise when several users are updating the data.
When more than one transaction is being processed against a DB at the same time, the transactions are considered to be concurrent.
The actions that must be made to ensure data integrity are called concurrency control actions.
Background (Contd.)
Remember that a CPU can process only one instruction at a time.
As new transactions are submitted while other processing is occurring against the database, the transactions are usually interleaved.
The CPU switches among the transactions so that some portion of each transaction is performed as the CPU addresses each transaction in turn.
Problems without Concurrency Control
The lost-update problem can occur when multiple users attempt to update the database. Example: two users with a joint account try to withdraw cash at the same time, using ATMs at different locations.

  User A                                 User B
  1. Read account balance ($1000)
                                         1. Read account balance ($1000)
  2. Withdraw $200
                                         2. Withdraw $300
  3. Write account balance ($800)
                                         3. Write account balance ($700)  <- ERROR

(time runs downward)
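The interleaving above can be reproduced in a few lines. Each "user" reads the balance, computes a new value, and writes it back; because both read before either writes, User B's write silently discards User A's update:

```python
account = {"balance": 1000}

# Both users read the same starting balance before either one writes.
read_a = account["balance"]           # User A reads $1000
read_b = account["balance"]           # User B reads $1000

account["balance"] = read_a - 200     # User A writes $800
account["balance"] = read_b - 300     # User B writes $700: A's update is lost

print(account["balance"])  # 700, not the correct 500
```

The $200 withdrawal vanished without any error being raised, which is what makes lost updates dangerous.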
Problems without Concurrency Control (Contd.)
Inconsistent-read problems can occur when one user reads data that has been partially updated by another user. Such a read will be incorrect; this is also called a dirty read or an unrepeatable read.
Serializability
- Concurrent transactions need to be processed in isolation so that they do not interfere with each other.
- If each transaction were processed entirely before another began, no interference would occur.
- Procedures that emulate this behavior are called serializable.
- Serializable schedules process transactions so that the results are the same as if they had been processed one after the other.
- Schedules are designed so that transactions that will not interfere with each other can run in parallel; e.g., transactions that access data in different tables will not conflict.
- Serializability can be achieved in different ways, but locking mechanisms are the most common.
Locking
- Locking means that any data retrieved by a user for updating must be locked, i.e., denied to other users, until the update is completed or aborted.
- The locking mechanism enforces a sequential updating process that prevents erroneous updates.
- It is the pessimistic approach to concurrency control.
Concurrency Control through Locking

  User A                                 User B
  1. Lock account balance
  2. Read account balance ($1000)
                                         1. Read account balance (denied)
  3. Withdraw $200
  4. Write account balance ($800)
  5. Unlock account balance
                                         2. Lock account balance
                                         3. Read account balance ($800)
                                         4. Withdraw $300
                                         5. Write account balance ($500)
                                         6. Unlock account balance

(time runs downward)
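The same two withdrawals serialize correctly once the read-compute-write sequence runs under a lock. A minimal sketch, with Python's `threading.Lock` standing in for the DBMS record lock:

```python
import threading

account = {"balance": 1000}
lock = threading.Lock()

def withdraw(amount):
    with lock:                                  # lock the record; the other
        balance = account["balance"]            # user's read is delayed until
        account["balance"] = balance - amount   # we write and release the lock

t1 = threading.Thread(target=withdraw, args=(200,))
t2 = threading.Thread(target=withdraw, args=(300,))
t1.start(); t2.start()
t1.join(); t2.join()
print(account["balance"])  # 500 -- both withdrawals are applied
```

Whichever thread acquires the lock second sees the balance already reduced by the first, so no update is lost.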
Locking Levels
There are different locking levels (granularity):
- Database: the entire database is locked and becomes unavailable. This level can be used during backups.
- Table: the entire table containing a requested record is locked. This level can be used for bulk updates that affect the whole table (e.g., a salary raise).
- Block or page: the physical storage block (or page) containing the requested record is locked. This level is commonly implemented.
- Record level: only the requested record (row) is locked; all other records in the table remain available.
- Field level: only the particular field (column) in the requested record is locked.
Types of Locks
- Shared locks (S locks, or read locks) allow other transactions to read, but not update, a record or resource. Placing a shared lock prevents another user from placing an exclusive lock, but not a shared lock, on that record.
- Exclusive locks (X locks, or write locks) prevent other transactions from reading, and therefore from updating, a record until it is unlocked. An exclusive lock should be placed when updating a record; it prevents another user from placing any type of lock on that record.
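These compatibility rules reduce to a tiny check; a sketch using the "S" and "X" mode names from the text:

```python
def locks_compatible(held, requested):
    """Return True if a newly requested lock can coexist with one already held.

    Shared + shared is the only compatible pair; an exclusive lock on
    either side blocks the other request."""
    return held == "S" and requested == "S"
```

A real lock manager keeps such a compatibility matrix per lockable resource and queues any request that is not compatible with the locks currently held.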
Deadlocks
- Locking may prevent erroneous updates, but it can lead to the deadlock problem.
- Deadlock: two or more transactions lock a common set of resources, and each one waits for another to unlock a resource it needs.
- Deadlock prevention: users must lock all required records at the beginning of the transaction.
- Deadlock resolution: the DBMS detects and breaks the deadlock by backing out one of the deadlocked transactions. Any changes made by that transaction up to the time of the deadlock are removed, and the transaction is restarted when the resources become available.
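The prevention strategy, acquiring every required lock up front, can be sketched as follows. The trick of sorting the lock requests into one global order is an implementation detail added here; it guarantees that no circular wait can form:

```python
import threading

def run_transaction(locks_needed, work):
    """Acquire all required locks at the start, in a fixed global order,
    run the work, then release everything. Because every transaction
    sorts its requests the same way, no circular wait (deadlock) can form."""
    ordered = sorted(locks_needed, key=id)  # one global order for all callers
    for lk in ordered:
        lk.acquire()
    try:
        return work()
    finally:
        for lk in reversed(ordered):
            lk.release()
```

Two transactions that declare the same locks in opposite orders would deadlock with naive acquisition; here both acquire in the sorted order, so one simply waits for the other to finish.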
Versioning
- The optimistic approach to concurrency control.
- Each transaction is restricted to a view of the database as of the time that transaction started, and when a transaction modifies a record, the DBMS creates a new record version instead of overwriting the old record.
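A record-versioning store can be sketched as follows: every update appends a new timestamped version, and a reader can ask for the value as of the moment its transaction started. The integer clock is a simplification of the timestamps real versioning systems use:

```python
class VersionedStore:
    """Updates append a (timestamp, value) version; readers see a snapshot."""

    def __init__(self):
        self.versions = {}  # key -> list of (ts, value), oldest first
        self.clock = 0      # logical timestamp, incremented on every write

    def write(self, key, value):
        # Create a new version instead of overwriting the old record.
        self.clock += 1
        self.versions.setdefault(key, []).append((self.clock, value))

    def read(self, key, as_of=None):
        """Return the newest value written at or before `as_of`
        (default: the latest version)."""
        if as_of is None:
            as_of = self.clock
        for ts, value in reversed(self.versions.get(key, [])):
            if ts <= as_of:
                return value
        return None
```

A transaction that started before a concurrent update keeps reading the old version, so readers never block writers; conflicts are checked only when the transaction tries to commit, which is why this approach is called optimistic.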