TRANSCRIPT
Data & Database Administration
Instructor: Engr. Sana Ziafat
Database Recovery and Backup
Databases are often damaged or lost because of system problems, which may be caused by:
- Human error
- Hardware failure
- Incorrect or invalid data
- Program errors
- Viruses
- Network failures
- Conflicting transactions
- Natural disasters
Mechanisms for restoring a database quickly and accurately after loss or damage are known as database recovery.
Back-up Facilities
- Automatic dump facility that produces a backup copy of the entire database
- Periodic backup (e.g., nightly, weekly), stored in a secure, off-site location
- Incremental back-up
- Cold backup: the database is shut down during the backup
- Hot backup: a selected portion is shut down and backed up at a given time
- Backup strategies are chosen based on demands
Backup Facilities (Contd.)
- Backups of static or less frequently changing data may be taken less often.
- Incremental backups, which record only the changes made since the last full backup, may be taken on an interim basis. Incremental backups take less time.
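The idea of an incremental backup, copying only what changed since the last full backup, can be sketched in a few lines. This is a minimal illustration under simplifying assumptions (flat directory, modification times as the change indicator), not a real backup tool:

```python
import os
import shutil

def incremental_backup(src_dir, dest_dir, last_full_backup):
    """Copy only the files modified since the last full backup.

    last_full_backup is a timestamp (seconds since the epoch)."""
    os.makedirs(dest_dir, exist_ok=True)
    copied = []
    for name in os.listdir(src_dir):
        path = os.path.join(src_dir, name)
        # A file changed after the last full backup must be re-copied.
        if os.path.isfile(path) and os.path.getmtime(path) > last_full_backup:
            shutil.copy2(path, os.path.join(dest_dir, name))
            copied.append(name)
    return copied
```

Because only the changed files are read and written, the incremental run finishes much faster than re-dumping the entire database.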
Journalizing Facilities
A DBMS must provide journalizing facilities to keep an audit trail of transactions and database changes.
- Transaction: a discrete unit of work that must be completely processed, or not processed at all, within a computer system; e.g., entering a customer record.
- In the event of failure, a consistent database state can be re-established by using the information in the journals together with the most recent complete backup.
- There are two types of journals or logs: the transaction log and the database change log.
Journalizing Facilities (Contd.)
- The journals form an audit trail of transactions and database updates. They are kept on disk or tape and must themselves be backed up.
- Transaction log: a record of essential data for each transaction processed against the database (typical contents are listed below).
- Database change log: images of the updated data. The before-image is a copy of the data before modification; the after-image is a copy after modification.
Journalizing Facilities (Contd.)
Data typically recorded in the transaction log for each transaction include:
- Transaction code or ID
- Action or type of transaction (e.g., insert)
- Time of the transaction
- Terminal number or user ID
- Input data values
- Tables and records accessed
- Records modified
- Possibly the old and new field values
[Figure: Database audit trail. The DBMS records a copy of each transaction in the transaction log and a copy of the affected data in the database change log; together with the current database and its backup copy, these support transaction processing and recovery actions.]
Checkpoint Facilities
- A facility by which the DBMS periodically refuses to accept any new transactions.
- All transactions in progress are completed and the journal files are brought up to date.
- At this point the system is in a quiet state, and the database and transaction logs are synchronized.
- The DBMS writes a special record (the checkpoint record) to the log file, which acts like a snapshot of the state of the database.
- The checkpoint record contains the information necessary to restart the system.
Checkpoint Facilities (Contd.)
- Any dirty blocks (memory pages containing changes not yet written to disk) are written from memory to disk storage, ensuring that all changes made prior to the checkpoint have reached long-term storage.
- The DBMS may perform checkpoints automatically or in response to requests from user application programs.
- Checkpoints should be taken frequently (several times an hour).
- In case of failure, it is often possible to resume processing from the most recent checkpoint.
- Thus only a few minutes of processing work must be repeated, compared with several hours for a complete restart of the day's processing.
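The checkpoint sequence described above (quiesce, flush dirty pages, write a checkpoint record) can be sketched as follows. The `buffer_pool`, `disk`, and `log` structures here are hypothetical simplifications of real DBMS internals:

```python
def take_checkpoint(buffer_pool, disk, log):
    """Quiescent checkpoint: flush dirty pages, then record a snapshot marker.

    buffer_pool maps page ids to {"data": ..., "dirty": bool};
    disk maps page ids to stored data; log is an append-only list."""
    # 1. Write every dirty page (changed in memory, not yet on disk) to disk.
    for page_id, page in buffer_pool.items():
        if page["dirty"]:
            disk[page_id] = page["data"]
            page["dirty"] = False
    # 2. Append a checkpoint record to the log; a restart can begin from here
    #    instead of replaying the whole day's processing.
    log.append({"type": "CHECKPOINT", "pages_on_disk": sorted(disk)})
```

After the checkpoint record is written, everything before it is guaranteed to be on long-term storage, so recovery only needs to examine log entries that follow it.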
Recovery Manager
- A module of the DBMS that restores the database to a correct condition when a failure occurs and then resumes the processing of user requests.
- The type of restart used depends on the nature of the failure.
- The recovery manager uses the logs as well as the backup copy to restore the database.
Recovery and Restart Procedures
- Disk mirroring: switch between identical copies of the database. Fastest recovery; used when high availability of data is required.
- Restore/rerun: reprocess transactions against the backup copy. Simple, but a last resort, with two disadvantages: (1) the sequence of transactions may differ from the original run, and (2) the time required to reprocess the transactions.
- Transaction integrity: commit or abort all of a transaction's changes (business transactions are well-defined sequences of steps). Transaction boundaries are defined with BEGIN TRANSACTION and COMMIT. Transactions should satisfy the four ACID properties.
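Transaction boundaries and the abort path can be seen with Python's built-in sqlite3 module; a minimal sketch (the `account` table and the simulated failure are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.isolation_level = None  # autocommit mode: we set boundaries explicitly
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES (1, 1000)")

try:
    conn.execute("BEGIN TRANSACTION")
    conn.execute("UPDATE account SET balance = balance - 200 WHERE id = 1")
    # Simulate a failure (crash, violated business rule) before COMMIT:
    raise RuntimeError("failure mid-transaction")
except RuntimeError:
    conn.execute("ROLLBACK")  # abort: every change inside the boundary is undone

balance = conn.execute("SELECT balance FROM account WHERE id = 1").fetchone()[0]
print(balance)  # 1000 -- the partial withdrawal never became visible
```

Because the failure occurred between BEGIN TRANSACTION and COMMIT, the DBMS discards the partial update rather than leaving the account in a half-processed state.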
Transaction ACID Properties
- Atomic: the transaction cannot be subdivided; it is processed entirely or not at all.
- Consistent: constraints that held before the transaction still hold after it. Example: an inventory-level constraint.
- Isolated: database changes are not revealed to other users until the transaction has completed; the on-hand inventory is known only after the inventory transaction finishes.
- Durable: committed database changes are permanent.
Recovery and Restart Procedures (Contd.)
- Backward recovery (rollback): apply the before-images to reverse the changes that were made.
- Forward recovery (rollforward): apply the most recent after-images to a backup copy (much faster and more accurate than restore/rerun).
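Both techniques can be illustrated with a toy change log of before- and after-images; the log format here is invented for the sketch:

```python
def rollback(db, change_log, txn_id):
    """Backward recovery: apply the before-images of one transaction,
    newest entry first, to undo its changes in place."""
    for entry in reversed(change_log):
        if entry["txn"] == txn_id:
            db[entry["key"]] = entry["before"]

def roll_forward(backup, change_log):
    """Forward recovery: start from the backup copy and re-apply every
    after-image in order, rebuilding the current state."""
    db = dict(backup)  # never touch the backup itself
    for entry in change_log:
        db[entry["key"]] = entry["after"]
    return db
```

Rollforward avoids reprocessing the transactions themselves: it only replays their final effects, which is why it is faster than restore/rerun.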
[Figure: Basic recovery techniques. (a) Rollback; (b) Rollforward.]
Database Failure Responses (1)
Aborted transactions
- The transaction terminates abnormally due to human error, hardware failure, loss of transmission, etc.
- Preferred recovery: rollback, done automatically by the DBMS.
Incorrect data
- Difficult to detect; needs human intervention (e.g., wrong grades or payments).
- Preferred recovery: rollback.
- Alternative 1: rerun the transactions, excluding the inaccurate data updates.
- Alternative 2: compensating transactions, with human intervention to correct the error.
Database Failure Responses (2)
System failure (database intact)
- A system crash due to power loss or system software failure.
- Preferred recovery: switch to a duplicate database.
- Alternative 1: rollback. Alternative 2: restart from the most recent checkpoint.
Database destruction (database is lost or destroyed)
- Preferred recovery: switch to a duplicate database.
- Alternative 1: rollforward to the state before the loss. Alternative 2: reprocess the transactions.
What is Concurrency Control?
The process of managing simultaneous operations against a database so that data integrity is maintained and the operations do not interfere with each other in a multi-user environment.
Background
Most DBMSs run in a multi-user environment, where several users are able to share the data contained in a database.
No data integrity problems arise as long as the data is only being read.
But problems can arise when several users are updating the data.
When more than one transaction is being processed against a DB at the same time, the transactions are considered to be concurrent.
The actions that must be made to ensure data integrity are called concurrency control actions.
Background (Contd.)
Remember that a CPU can process only one instruction at a time.
As new transactions are submitted while other processing is occurring against the database, the transactions are usually interleaved.
The CPU switches among the transactions so that some portion of each transaction is performed as the CPU addresses each transaction in turn.
Problems without Concurrency Control
The lost-update problem can occur when multiple users attempt to update the database. Example: two users with a joint account try to withdraw cash at the same time, using ATMs at different locations.

  User A                                 User B
  1. Read account balance ($1000)
                                         1. Read account balance ($1000)
  2. Withdraw $200
                                         2. Withdraw $300
  3. Write account balance ($800)
                                         3. Write account balance ($700)  <- ERROR

(time runs downward)
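The interleaving above can be reproduced in a few lines. Each "user" reads the balance, computes a new value, and writes it back; because both read before either writes, User B's write silently discards User A's update:

```python
account = {"balance": 1000}

# Both users read the same starting balance before either one writes.
read_a = account["balance"]           # User A reads $1000
read_b = account["balance"]           # User B reads $1000

account["balance"] = read_a - 200     # User A writes $800
account["balance"] = read_b - 300     # User B writes $700: A's update is lost

print(account["balance"])  # 700, not the correct 500
```

The $200 withdrawal vanished without any error being raised, which is what makes lost updates dangerous.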
Problems without Concurrency Control (Contd.)
Inconsistent-read problems can occur when one user reads data that has been partially updated by another user. Such a read will be incorrect; this is also called a dirty read or an unrepeatable read.
Serializability
- Concurrent transactions need to be processed in isolation so that they do not interfere with each other.
- If each transaction were processed entirely before another began, no interference would occur.
- Procedures that emulate this behavior are called serializable.
- Serializable schedules process transactions so that the results are the same as if they had been processed one after the other.
- Schedules are designed so that transactions that will not interfere with each other can run in parallel; e.g., transactions that access data in different tables will not conflict.
- Serializability can be achieved in different ways, but locking mechanisms are the most common.
Locking
- Locking means that any data retrieved by a user for updating must be locked, i.e., denied to other users, until the update is completed or aborted.
- The locking mechanism enforces a sequential updating process that prevents erroneous updates.
- It is the pessimistic approach to concurrency control.
Concurrency Control through Locking

  User A                                 User B
  1. Lock account balance
  2. Read account balance ($1000)
                                         1. Read account balance (denied)
  3. Withdraw $200
  4. Write account balance ($800)
  5. Unlock account balance
                                         2. Lock account balance
                                         3. Read account balance ($800)
                                         4. Withdraw $300
                                         5. Write account balance ($500)
                                         6. Unlock account balance

(time runs downward)
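The same two withdrawals serialize correctly once the read-compute-write sequence runs under a lock. A minimal sketch, with Python's `threading.Lock` standing in for the DBMS record lock:

```python
import threading

account = {"balance": 1000}
lock = threading.Lock()

def withdraw(amount):
    with lock:                                  # lock the record; the other
        balance = account["balance"]            # user's read is delayed until
        account["balance"] = balance - amount   # we write and release the lock

t1 = threading.Thread(target=withdraw, args=(200,))
t2 = threading.Thread(target=withdraw, args=(300,))
t1.start(); t2.start()
t1.join(); t2.join()
print(account["balance"])  # 500 -- both withdrawals are applied
```

Whichever thread acquires the lock second sees the balance already reduced by the first, so no update is lost.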
Locking Levels
There are different locking levels (granularity):
- Database: the entire database is locked and becomes unavailable. This level can be used during backups.
- Table: the entire table containing a requested record is locked. This level can be used for bulk updates that affect the whole table (e.g., a salary raise).
- Block or page: the physical storage block (or page) containing the requested record is locked. This level is commonly implemented.
- Record level: only the requested record (row) is locked; all other records in the table remain available.
- Field level: only the particular field (column) in the requested record is locked.
Types of Locks
- Shared locks (S locks, or read locks) allow other transactions to read, but not update, a record or resource. Placing a shared lock prevents another user from placing an exclusive lock, but not a shared lock, on that record.
- Exclusive locks (X locks, or write locks) prevent other transactions from reading, and therefore from updating, a record until it is unlocked. An exclusive lock should be placed when updating a record; it prevents another user from placing any type of lock on that record.
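These compatibility rules reduce to a tiny check; a sketch using the "S" and "X" mode names from the text:

```python
def locks_compatible(held, requested):
    """Return True if a newly requested lock can coexist with one already held.

    Shared + shared is the only compatible pair; an exclusive lock on
    either side blocks the other request."""
    return held == "S" and requested == "S"
```

A real lock manager keeps such a compatibility matrix per lockable resource and queues any request that is not compatible with the locks currently held.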
Deadlocks
- Locking may prevent erroneous updates, but it can lead to the deadlock problem.
- Deadlock: two or more transactions lock a common set of resources, and each one waits for another to unlock a resource it needs.
- Deadlock prevention: users must lock all required records at the beginning of the transaction.
- Deadlock resolution: the DBMS detects and breaks the deadlock by backing out one of the deadlocked transactions. Any changes made by that transaction up to the time of the deadlock are removed, and the transaction is restarted when the resources become available.
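The prevention strategy, acquiring every required lock up front, can be sketched as follows. The trick of sorting the lock requests into one global order is an implementation detail added here; it guarantees that no circular wait can form:

```python
import threading

def run_transaction(locks_needed, work):
    """Acquire all required locks at the start, in a fixed global order,
    run the work, then release everything. Because every transaction
    sorts its requests the same way, no circular wait (deadlock) can form."""
    ordered = sorted(locks_needed, key=id)  # one global order for all callers
    for lk in ordered:
        lk.acquire()
    try:
        return work()
    finally:
        for lk in reversed(ordered):
            lk.release()
```

Two transactions that declare the same locks in opposite orders would deadlock with naive acquisition; here both acquire in the sorted order, so one simply waits for the other to finish.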
Versioning
- The optimistic approach to concurrency control.
- Each transaction is restricted to a view of the database as of the time that transaction started, and when a transaction modifies a record, the DBMS creates a new record version instead of overwriting the old record.
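A record-versioning store can be sketched as follows: every update appends a new timestamped version, and a reader can ask for the value as of the moment its transaction started. The integer clock is a simplification of the timestamps real versioning systems use:

```python
class VersionedStore:
    """Updates append a (timestamp, value) version; readers see a snapshot."""

    def __init__(self):
        self.versions = {}  # key -> list of (ts, value), oldest first
        self.clock = 0      # logical timestamp, incremented on every write

    def write(self, key, value):
        # Create a new version instead of overwriting the old record.
        self.clock += 1
        self.versions.setdefault(key, []).append((self.clock, value))

    def read(self, key, as_of=None):
        """Return the newest value written at or before `as_of`
        (default: the latest version)."""
        if as_of is None:
            as_of = self.clock
        for ts, value in reversed(self.versions.get(key, [])):
            if ts <= as_of:
                return value
        return None
```

A transaction that started before a concurrent update keeps reading the old version, so readers never block writers; conflicts are checked only when the transaction tries to commit, which is why this approach is called optimistic.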