oracle berkeley db - transactional data storage (tds) tutorial

54
1 A Use-Case Based Tutorial Oracle Berkeley DB Transactional Data Store

Upload: gregory-burd

Post on 30-Nov-2014

4.811 views

Category:

Documents


2 download

DESCRIPTION

 

TRANSCRIPT

Page 1: Oracle Berkeley DB - Transactional Data Storage (TDS) Tutorial

11

A Use-Case Based Tutorial

Oracle Berkeley DB Transactional Data Store

Page 2: Oracle Berkeley DB - Transactional Data Storage (TDS) Tutorial

2

Part III: TransactionalData Store

• Overview• What is TDS?• When is TDS appropriate?

• Case Studies• Web services

Page 3: Oracle Berkeley DB - Transactional Data Storage (TDS) Tutorial

3

What is TDS?

• Recoverable data management: recovery from application or system failure.

• Ability to group operations into transactions: all operations in transaction either succeed or fail.

• Concurrency control: multiple readers and writers operate on the database simultaneously.

Page 4: Oracle Berkeley DB - Transactional Data Storage (TDS) Tutorial

4

Transactions Provide ACID

• Atomic: multiple operations appear as a single operation• all or none of the operations applied

• Consistent: no access sees a partially completed transaction

• Isolated: system behaves as if there is no concurrency

• Durable: modifications persist after failureApplications can relax ACID for performance

(more on this later)

Page 5: Oracle Berkeley DB - Transactional Data Storage (TDS) Tutorial

5

Transactions

• All operations can be transactionally protected• Database create, remove, rename• Key/Data pair get, put, update

• Why should you care?• Never lose data after system or application failure (durability)• Group multiple updates into a single operation (atomicity)• Roll back changes (abort transaction if something odd

happens)• Hide changes from other users until completed (consistency)• Changes made in other transactions while your transaction is

underway aren’t visible unless you want to see them (isolation)

Page 6: Oracle Berkeley DB - Transactional Data Storage (TDS) Tutorial

6

Transaction Terminology

• Thread of control• Process or true thread

• Free-threaded• Object protected for simultaneous access by multiple threads

• Deadlock• Two or more threads of control request mutually exclusive locks• Blocked threads cannot proceed until a thread releases its locks• They are stuck forever.

• Transaction• One or more operations grouped into a single unit of work

Page 7: Oracle Berkeley DB - Transactional Data Storage (TDS) Tutorial

7

Transaction Terminology

• Transaction abort• The unit of work is backed out/rolled back/undone

• Transaction commit• The unit of work is permanently done

• System or application failure• Unexpected application exit, for whatever reason

• Recovery• Making databases consistent after failure so they can be

used again• Must fix data and metadata

Page 8: Oracle Berkeley DB - Transactional Data Storage (TDS) Tutorial

8

Transactional APIs (1)

• New flags to DB_ENV->open• DB_INIT_TXN, DB_INIT_LOCK, DB_INIT_LOG,

DB_RECOVER, DB_RECOVER_FATAL

• New configuration options• DB_ENV->set_tx_max: max number of concurrent transactions• DB_ENV->set_lk_detect: set deadlock resolution policy• DB_ENV->set_tx_timestamp: recover to a timestamp• DB_ENV->set_timeout: transaction timeout

Page 9: Oracle Berkeley DB - Transactional Data Storage (TDS) Tutorial

9

Transactional APIs (2)

• Transaction calls• DB_ENV->txn_begin• DB_TXN->commit• DB_TXN->abort• DB_TXN->prepare

• New error returns• DB_RUNRECOVERY• DB_DEADLOCK

• Utility functions• DB_ENV->txn_checkpoint• DB_ENV->lk_detect• DB_ENV->log_archive

Page 10: Oracle Berkeley DB - Transactional Data Storage (TDS) Tutorial

10

What goes in a transaction?

• All updates must be transaction-protected.• DB->put, DBC->c_put• DBC->c_del, DBC->c_del

• All databases that will be accessed inside transactions must be opened/created in a transaction.

• File system operations• DB_ENV->dbremove• DB_ENV->dbrename

Page 11: Oracle Berkeley DB - Transactional Data Storage (TDS) Tutorial

11

What About Reads?

• Read operations can go in transactions• May not always have to

• A read that is a part of read-modify-write should be transaction protected.

• Reads that must be consistent should be transaction-protected.• Other reads might use weaker semantics

• DB_READ_COMMITTED: never see uncommitted data• DB_READ_UNCOMMITTED: may see uncommitted data

Page 12: Oracle Berkeley DB - Transactional Data Storage (TDS) Tutorial

12

Anatomy of an Application

• Create/Open/Recover environment• Open database handles• Spawn utility threads• Spawn worker threads

• Begin transaction• Do database operations (insert/delete/update/etc.)• Commit or abort transaction• Do it again if appropriate (main loop)

• Close database handles• Close environment

Page 13: Oracle Berkeley DB - Transactional Data Storage (TDS) Tutorial

13

Create/Open/Recover Environment

if ((ret = db_env_create(&DB_ENV, 0)) != 0) …error_handling…

/* Configure the environment. */flags = DB_CREATE |DB_INIT_LOG | DB_INIT_TXN | DB_INIT_LOCK | DB_INIT_MPOOL | DB_RECOVER;if ((ret = DB_ENV->open(DB_ENV, HOME, flags, 0)) != 0) …error_handling…

Page 14: Oracle Berkeley DB - Transactional Data Storage (TDS) Tutorial

14

Create/Open Database

• Must specify DB_ENV to database open• That DB_ENV must have been opened with DB_INIT_TXN

• Database open must be transactional• Specify a DB_TXN in the open• Specify DB_AUTO_COMMIT in the open• Open environment with DB_AUTO_COMMIT

Page 15: Oracle Berkeley DB - Transactional Data Storage (TDS) Tutorial

15

Transaction Operations

• Begin transaction with DB_ENV->txn_begin()• DB_TXN->commit() commits the transaction• Releases all locks• All log records are written to disk (by default)

• DB_TXN->abort() aborts the transaction• Releases all locks• Modifications are rolled back

• Must close all cursors before commit or abort

Page 16: Oracle Berkeley DB - Transactional Data Storage (TDS) Tutorial

16

Transaction Differences

• Without transactions:ret = dbp->put(dbp, NULL, &key, &data, 0);

… error handling …• With transactions

ret = DB_ENV->txn_begin(DB_ENV, NULL, &txn, 0);… error handling …

ret = dbp->put(dbp, txn, &key, &data, 0);… error handling …

ret = txn->commit(txn, 0);… error handling …

• Note: if the DB handle was opened transactionally, then we will automatically wrap every modification operation in a transaction (DB_AUTO_COMMIT behavior).

Page 17: Oracle Berkeley DB - Transactional Data Storage (TDS) Tutorial

17

Transactional Cursors

• Specify transaction handle on cursor creation• Cursor operations are performed inside that

transaction• Cursors must be closed before commit or abort

DBC *dbc = NULL;DB_TXN *txn = NULL;

ret = dbenv->txn_begin(dbenv, NULL, &txn, 0);if (ret != 0) … error handling …

ret = dbp->cursor(dbp, txn, &dbc, 0);if (ret != 0) … error handling …

Page 18: Oracle Berkeley DB - Transactional Data Storage (TDS) Tutorial

18

Transactional Iteration

ret = dbp->cursor(dbp, txn, &dbc, 0); … error handling …

while ((ret = dbc->c_get(dbc, &key, &data, DB_NEXT)) == 0) … process record: write new data, delete old data …

if (ret == DB_NOTFOUND) ret = 0;if ((temp_ret = dbc->close(dbc)) != 0 && ret == 0) ret = temp_ret;if (ret == 0) ret = txn->commit(txn, 0);else (void) txn->abort(txn);if (ret != 0) … error handling …

Page 19: Oracle Berkeley DB - Transactional Data Storage (TDS) Tutorial

19

Nested Transactions

• Split large transactions into smaller ones• Children can be individually aborted.

• Create nested transaction• Pass parent DB_TXN handle to DB_ENV->txn_begin()

• Parent transaction with active children can only• Create more children• Commit or abort

• If parent commits (aborts), all children commit (abort)• Child transaction’s locks

• Won’t conflict with the parent’s locks• Will conflict with other children’s (siblings) locks

Page 20: Oracle Berkeley DB - Transactional Data Storage (TDS) Tutorial

20

Deadlocks

• What are deadlocks?• Deadlock resolution• Deadlock detection • Dealing with deadlocks• Deadlock avoidance

Page 21: Oracle Berkeley DB - Transactional Data Storage (TDS) Tutorial

21

Defining Deadlocks

• Consider two threads• Thread 1: Write A, Write B• Thread 2: Write B, Write A

• Let’s say that the sequence of operations is• T1: Write lock A• T2: Write lock B• T1: Request lock on B (blocks)• T2: Request lock on A (blocks

• Neither thread can make forward progress

Page 22: Oracle Berkeley DB - Transactional Data Storage (TDS) Tutorial

22

Conditions for Deadlock

• Exclusive access• Block when access is unavailable• May request additional resources while

holding resources• The graph of “who is waiting for whom” has a

cycle in it.

Page 23: Oracle Berkeley DB - Transactional Data Storage (TDS) Tutorial

23

Deadlock Resolution

• Two techniques:• Assume a sufficiently long wait is a deadlock and self-abort• Detect and selectively abort

• Berkeley DB supports both• Timeouts

• Use DB_ENV->set_timeout to specify a maximum length of time that a lock or transaction should block.

• Return DB_LOCK_NOTGRANTED if the timeout expires before a lock is granted

Page 24: Oracle Berkeley DB - Transactional Data Storage (TDS) Tutorial

24

Deadlock Detection

• Two step process:• Traverse waits-for graph looking for cycles• If cycle, select victim to abort

• Looking for cycles• Synchronously: checked on each blocking lock

DB_ENV->set_lk_detect()• + Immediate detection and notification• - Higher CPU cost

• Asynchronously: run detector threadDB_ENV->lock_detect() or db_deadlock utility program• - Detected only when thread of control runs• + Lower CPU cost

Page 25: Oracle Berkeley DB - Transactional Data Storage (TDS) Tutorial

25

Picking who to abort

• Ideally want to limit wasted effort.• Deadlock resolution policy is configurable

• Default is DB_LOCK_RANDOM

• Other optionsDB_LOCK_MINLOCKSDB_LOCK_MINWRITE (# of write locks, not write ops)DB_LOCK_YOUNGESTDB_LOCK_OLDEST

• The victim• Gets a DB_LOCK_DEADLOCK error return• Must immediately (close all cursors and) abort

Page 26: Oracle Berkeley DB - Transactional Data Storage (TDS) Tutorial

26

Dealing with Deadlocks

• Transactional applications must code for deadlocks.• Typical response to deadlock is to retry• Repeated retries may indicate a serious problem

Page 27: Oracle Berkeley DB - Transactional Data Storage (TDS) Tutorial

27

Deadlock Example

for (fail = 0; fail < MAXIMUM_RETRY; fail++) { ret = DB_ENV->txn_begin(DB_ENV, NULL, &txn, 0); … error_handling …

ret = db->put(db, txn, &key, &data, 0);

if (ret == 0) {… commit the transaction …return 0; /* Success! */

} else { /* DB_LOCK_DEADLOCK or something else */ … abort the transaction … }}

DB_ENV->err(DB_ENV, ret, “Maximum retry limit exceeded”);return (ret); /* Retry limit reached; give up. */

Page 28: Oracle Berkeley DB - Transactional Data Storage (TDS) Tutorial

28

Deadlock Avoidance

• Good practices• Keep transactions short• Read/write databases in the same order in all transactions• Limit the number of concurrent writers• Use DB_RMW for read-modify-write operations• DB_READ_UNCOMMITTED for readers• DB_READ_COMMITTED for cursors• DB_REVSPLITOFF for cyclical Btrees (grow/shrink/grow/…)

• Debugging• db_stat –Co• db_stat -Cl• “Deadlock Debugging” section of the Reference Guide

Page 29: Oracle Berkeley DB - Transactional Data Storage (TDS) Tutorial

29

Subsystems and Utilities

• Checkpoint• Logging• Backups• Recovery

Page 30: Oracle Berkeley DB - Transactional Data Storage (TDS) Tutorial

30

Checkpoints

• Recall that Berkeley DB maintains a cache of database pages.

• Checkpoints write dirty pages from the cache into the databases• Transaction commit only writes the log

• Checkpoints:• Permit log file reclamation• Reduce recovery time• Block other operations minimally

• Checkpoint is I/O intensive

• Typically, checkpoint in a separate thread• Command line db_checkpoint• Use DB_ENV->txn_checkpoint()

Page 31: Oracle Berkeley DB - Transactional Data Storage (TDS) Tutorial

31

Controlling Checkpoints

• Log file size dictates unit of log reclamation• Use DB_ENV->set_lg_max

• Checkpoint method and utility controlled by• Force checkpoint• More than kbytes of log have been written• More than min minutes have passed

• Reduce checkpoint impact by keeping mpool clean• Use DB_ENV->memp_trickle

Page 32: Oracle Berkeley DB - Transactional Data Storage (TDS) Tutorial

32

Logging

• Database modifications are described in log records.• Log records describe physical changes to pages.• Log records are indexed by log sequence numbers (LSNs)• The log is comprised of one or more log files.• Logs are named log.NNNNNNNNNN• Logs can be maintained on-disk or in-memory

• On-disk logs stored in DB_HOME directory• Or in location configured by DB_ENV->set_lg_dir()

• Log records flushed by DB_TXN->commit()

For best performance and recoverability properties,place logs and databases on separate disks

Page 33: Oracle Berkeley DB - Transactional Data Storage (TDS) Tutorial

33

Logging Configuration

• DB_ENV->set_lg_max()• Set the size of the individual log files • On disk logs: default is 10MB• In-memory logs: default is 256KB

• DB_ENV->set_lg_bsize()• Set the size of the in-memory log buffer• On-disk logs: default is 32KB• In-memory logs: default is 1MB

• DB_ENV->set_lg_regionmax()• Set the size of the logging subsystem’s shared region

(default is 60KB)• Region holds filenames

Page 34: Oracle Berkeley DB - Transactional Data Storage (TDS) Tutorial

34

Backups

• Full backup: Copy database and log files• Standard

• Pause database operations and perform the backup• Creates a snapshot at a known point in time

• Hot• Backup while database(s) are still active• Creates a snapshot at fuzzy point in time

• Incremental backup• Copy log files to be replayed/recovered against a full backup• Hot failover

Page 35: Oracle Berkeley DB - Transactional Data Storage (TDS) Tutorial

35

Standard Backup

1. Commit or abort all on-going transactions

2. Pause all database writes

3. Force a checkpoint

4. Copy all database files to backup locationFind active database files with DB_ENV->log_archive()or db_archive program with DB_ARCH_DATA flag

5. Copy the last log file to backup locationDB_ENV->log_archive() or db_archive with DB_ARCH_LOG identifies all of the log files

Page 36: Oracle Berkeley DB - Transactional Data Storage (TDS) Tutorial

36

Hot Backup

1. Do not stop database operations2. Copy all database files to backup location

1. Use DB_ENV->log_archive() with DB_ARCH_DATA or db_archive –s

2. Database files may be modified during the backup, so the copy utility must read each database page atomically.

3. Copy all log files to backup location

The order of operations must be preserved. Copy databases first and then log files.

Page 37: Oracle Berkeley DB - Transactional Data Storage (TDS) Tutorial

37

Log File Removal

• Remove unused log files to regain disk space• db_archive –l or • DB_ENV->log_archive() to identify unused log

files

• To allow catastrophic recovery• Move to backup media; do not simply delete them

• Never remove all of the transaction logs• Do not remove active transaction logs

Page 38: Oracle Berkeley DB - Transactional Data Storage (TDS) Tutorial

38

Recovery

• Write-ahead logging • Log records always written to disk before database updates

• Recovery• Database changes validated against the log

• Redo committed operations not in the database• Undo aborted operations that are in the

database• Removes and re-initializes the environment files • Must be single-threaded• Other threads of control must wait for recovery

Page 39: Oracle Berkeley DB - Transactional Data Storage (TDS) Tutorial

39

When to Run Recovery

• DB_RUNRECOVERY error • Subsequent API calls return DB_RUNRECOVERY• Restart application and run recovery

• Always perform recovery at application startup• Prevent spreading corruption

Page 40: Oracle Berkeley DB - Transactional Data Storage (TDS) Tutorial

40

Kinds of Recovery

• Normal recovery assumes no loss of media• Reviews log records since the last checkpoint• More frequent checkpoints mean faster recovery• DB_RECOVER flag when opening the environment or

db_recover program

• Catastrophic recovery does the same, but:• Reviews all available log files• Catastrophic recovery can take awhile• DB_RECOVER_FATAL flag or db_recover –c

• Recovery to a timestamp is available

Page 41: Oracle Berkeley DB - Transactional Data Storage (TDS) Tutorial

41

Recovery Procedures

• Simplest case• Re-create the databases, no need for recovery

• Normal recovery• Up-to-date database and log files are available• Not used when creating hot backups

• Catastrophic recovery• Database or log files destroyed or corrupted• Normal recovery fails for any reason• Used when creating hot backups

Page 42: Oracle Berkeley DB - Transactional Data Storage (TDS) Tutorial

42

Catastrophic Recovery

• Often performed in a directory other than where the database files previously lived

• Copy the most recent snapshot of the database and log files to the recovery directory

• Copy any newer log files into the recovery directory• Log files must be recovered in sequential order• Run db_recover -c

• Or call DB_ENV->open() with DB_RECOVER_FATAL

Page 43: Oracle Berkeley DB - Transactional Data Storage (TDS) Tutorial

43

Maintaining a Hot Standby

1. db_archive -s to identify all the active database files 2. Copy identified database files to the backup/failover directory3. Archive all existing log files from the backup directory 4. Use db_archive (no option) in the active environment to identify the

inactive logs and move them to the backup directory5. Use db_archive -l in the active environment to identify all active

log files and copy these to the hot failover directory6. Run db_recover -c against the hot failover directory to

catastrophically recover the environment

• Steps 2-5 can be repeated as often as desired• If step 1 is repeated, steps 2-5 must follow to ensure

consistency

Page 44: Oracle Berkeley DB - Transactional Data Storage (TDS) Tutorial

44

Programming Practices

• Disk guarantees• Application structures• Locking• Special errors

Page 45: Oracle Berkeley DB - Transactional Data Storage (TDS) Tutorial

45

Disk I/O Integrity Guarantees

• Disk writes are performed a database page at a time• Berkeley DB assumes pages are atomically written• If the OS requires multiple writes per page

• A partial page can be written, possibly corrupting the database

• To guard against this:• Set database page size equal to file system page size OR• Configure the environment to perform checksums*• Use DB->set_flags() with the DB_CHKSUM flag• If corruption is detected run catastrophic recovery

* Checksums can be computationally expensive

Page 46: Oracle Berkeley DB - Transactional Data Storage (TDS) Tutorial

46

Disk I/O Integrity Guarantees

• Berkeley DB relies on the integrity of the underlying software and hardware interfaces• Berkeley DB assumes POSIX compliance, especially with regard to

disk I/O• Avoid hardware that performs partial writes or “lies” about writes

(acknowledges them when the data hits the disk cache and does not guarantee that the cache can be written under all circumstances).

• Best practice: store database files and log files on physically separate disks

Page 47: Oracle Berkeley DB - Transactional Data Storage (TDS) Tutorial

47

Transactional Applications

• Must handle recovery gracefully• Always run recovery, normally on startup, before other

threads/processes join the environment• Use DB_REGISTER to indicate if recovery is

necessary.• All DB_ENV handles must be opened with DB_REGISTER for

this to work.• If DB_REGISTER and DB_RECOVER are both set, recovery

will only be run if necessary.

• Use DB_ENV->failchk() after any thread/process failure to determine if it is safe to continue.

Page 48: Oracle Berkeley DB - Transactional Data Storage (TDS) Tutorial

48

Single-process Applications

• A single process, multi-threaded application• First thread opens the database environment

• Run recovery at this time• First thread opens databases• First thread spawns subsequent threads• Subsequent threads share DB_ENV and DB handles• Last thread to exit closes the DB_ENV and DB handles

• If any thread exits abnormally• Some thread calls DB_ENV->failchk()• Return value dictates action: continue or run recovery

Page 49: Oracle Berkeley DB - Transactional Data Storage (TDS) Tutorial

49

Multi-process Applications #1

• Order of process start up must be controlled• Recovery must happen before anything else• Recovery must be single-threaded

• Processes themselves may be threaded• Processes maintain their own DB_ENV and DB handles• If thread of control exits abnormally

• All threads should exit database environment• Recovery must be run

• Automate with DB_REGISTER flag in DB_ENV->open()

Page 50: Oracle Berkeley DB - Transactional Data Storage (TDS) Tutorial

50

Multi-process Applications #2

Often easiest to have a single monitoring process• Monitor is responsible for running recovery• Monitor starts other processes after recovery• Monitor is on a timer or waits for other

processes to die• At which time it uses failchk() mechanism• If failchk() returns DB_RUNRECOVERY:

• Monitor runs recovery• Other processes die when the environment is recreated• Monitor kills stubborn or slow processes• Monitor re-starts the system

Page 51: Oracle Berkeley DB - Transactional Data Storage (TDS) Tutorial

51

Locking in Transactions

• Initialize locking subsystem• Use the flag DB_INIT_LOCK to DB_ENV->open

• Pages or records are read- or write-locked• Conflicting lock requests wait until object available• Locking within transactions

• Locks are released at the end of the transaction• Earlier if specified (DB_READ_COMMITTED, DB_READ_UNCOMMITTED).

• Locking without transactions• Locks are released at the end of the operation

Page 52: Oracle Berkeley DB - Transactional Data Storage (TDS) Tutorial

52

Additional Locking Options

• Use lock timeouts instead of or in addition to regular deadlock detection• DB_LOCK_NOTGRANTED

• (optional) is returned when a lock times out• DB_LOCK_DEADLOCK

• returned when a lock times out• or the transaction is selected for deadlock resolution

• Accuracy of timeout depends on how often deadlock detection is performed

Page 53: Oracle Berkeley DB - Transactional Data Storage (TDS) Tutorial

53

Special Errors

• Catastrophic error: should never happen• Subsequent calls to the API will return the

same error• DB->set_paniccall() orDB_ENV->set_paniccall()• Callback function when catastrophic error occurs

• Exit and restart application• Run recovery if you are using an environment

Page 54: Oracle Berkeley DB - Transactional Data Storage (TDS) Tutorial

54

End of Part II