CM20145: File Structure and Indexing

Dr Alwyn Barry, Dr Joanna Bryson

Last Time
• Locking & Two-Phase Locking
  • Review
  • How Locks Work
• Other Possible Mechanisms
  • Graph-Based Protocols
  • Timestamp-Based Protocols
  • Validation-Based Protocols
• Further Refinements
  • Multiple Granularity
  • Multi-version Schemes
• Deadlock Handling
  • Prevention
  • Cures
• Locking & Indexes

Now: File Structure & Indexing

Overview
• Storage Access & Buffers
• File Organization
  • Fixed Length Records
  • Variable Length
• Organization of Records in Files
  • Sequential & Clustering
• Data-Dictionary Storage
• Intro to Indexing
  • Basic Concepts
  • Ordered Indices
    • Dense & Sparse
    • Multilevel
    • Primary & Secondary

Storage Hierarchy (Lecture 9)

©Silberschatz, Korth and Sudarshan

Modifications & additions by S Bird, J Bryson

Storage Access
• Blocks
  • A fixed-length unit.
  • The unit of both storage allocation and data transfer.
  • Database files are organized into blocks.
• Buffer
  • The portion of main memory available to store copies of disk blocks.
• Buffer Manager
  • The subsystem responsible for allocating buffer space in main memory.
• Block Transfers
  • Want to minimize the number of block transfers between disk and memory.
  • Keep as many blocks as possible in main memory.

Trans. Server Processes (Lec 9)

Buffer Manager
• Called when a program needs a block from disk.
• If the block is already in the buffer:
  • The requesting program is given the address of the block in main memory.
• Otherwise:
  1. The buffer manager allocates space in the buffer for the block:
     • Discard some other block, if necessary, to make space.
     • The discarded block is written back to disk if it was modified since it was last fetched.
  2. The buffer manager:
     • Reads the block from disk into the buffer.
     • Passes the block’s new main-memory address to the requesting program.
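The request path above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class and method names (`BufferManager`, `get_block`, `mark_dirty`) are hypothetical, the "disk" is a plain dictionary, and the victim is chosen by LRU, which the slide does not prescribe.

```python
from collections import OrderedDict


class BufferManager:
    """Toy buffer manager: a fixed number of frames in front of a 'disk' dict."""

    def __init__(self, disk, capacity):
        self.disk = disk              # block id -> bytes (stands in for the disk)
        self.capacity = capacity      # number of blocks the buffer can hold
        self.frames = OrderedDict()   # block id -> [contents, dirty], oldest use first
        self.disk_reads = 0
        self.disk_writes = 0

    def get_block(self, block_id):
        """Return the in-memory copy of a block, fetching it from disk if needed."""
        if block_id in self.frames:
            self.frames.move_to_end(block_id)       # already buffered: no disk access
            return self.frames[block_id][0]
        if len(self.frames) >= self.capacity:
            self._evict()                           # make space for the new block
        contents = bytearray(self.disk[block_id])   # read the block into the buffer
        self.disk_reads += 1
        self.frames[block_id] = [contents, False]
        return contents

    def mark_dirty(self, block_id):
        """Record that the buffered copy was modified since it was fetched."""
        self.frames[block_id][1] = True

    def _evict(self):
        # Assumption: discard the least recently used block.
        victim_id, (contents, dirty) = self.frames.popitem(last=False)
        if dirty:                                   # write back only if modified
            self.disk[victim_id] = bytes(contents)
            self.disk_writes += 1
```

A clean block costs nothing to discard; only a block marked dirty causes a write when it is evicted.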

Buffer-Replacement Policies
• Most operating systems replace the block using the least recently used (LRU) strategy.
  • Past usage often predicts future usage.
• But database queries have well-defined access patterns (e.g. sequential scans).
  • A database system can predict future references from information in a query.
• LRU can be a bad strategy for access patterns that repeatedly scan tables.
  • E.g. the join of r and s computed by nested loops:
      for each tuple tr of r do
        for each tuple ts of s do
          if the tuples tr and ts match …
• Want a mixed replacement strategy with information provided by the query optimizer.

Buffer-Replacement Policies (2)
• Pinned block (e.g. for the inner loop, s)
  • A memory block that is not allowed to be written back to disk.
• Toss-immediately strategy (e.g. for r)
  • Frees the space occupied by a block as soon as its final tuple has been processed.
• Most recently used (MRU) strategy (for s)
  • If we know we are iterating through a table, then the most recently used block will go unused for the longest.
• The buffer manager can use statistical information about the probability that a request will reference a particular relation.
  • E.g. the data dictionary is frequently accessed.
  • Heuristic: always keep data-dictionary blocks pinned in main memory.
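To see why LRU suits repeated scans poorly, the following sketch counts block fetches for a made-up reference string: an inner relation of 4 blocks scanned 3 times through a 3-block buffer. The function name `count_misses` and the numbers are illustrative assumptions, and pinning is not modelled.

```python
def count_misses(references, capacity, policy):
    """Count block fetches for a reference string under 'LRU' or 'MRU' replacement."""
    buffer = []        # block ids ordered from least to most recently used
    misses = 0
    for block in references:
        if block in buffer:
            buffer.remove(block)          # hit: re-appended below as most recent
        else:
            misses += 1
            if len(buffer) >= capacity:
                # LRU evicts the front of the list, MRU evicts the back.
                buffer.pop(0 if policy == "LRU" else -1)
        buffer.append(block)
    return misses


# Inner relation s occupies blocks 0..3 and is scanned three times.
scan = [0, 1, 2, 3] * 3
lru = count_misses(scan, 3, "LRU")
mru = count_misses(scan, 3, "MRU")
print(lru, mru)   # 12 6
```

Under LRU every reference misses, because each block is evicted just before the scan comes back to it; MRU keeps most of the older blocks resident.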

File Organization
• The database is stored as a collection of files.
  • Each file is a sequence of records.
  • A record is a sequence of fields.
• One approach:
  • assume record size is fixed,
  • each file has records of one particular type only,
  • different files are used for different relations.
• This case is the easiest to implement; we will consider variable-length fields later.

Fixed-Length Records
• Simple approach:
  • Store record i starting from byte n × (i – 1), where n is the size of each record.
  • Record access is simple, but records may cross block boundaries.
  • Modification: do not allow records to cross block boundaries.
• Deletion of record i. Alternatives:
  • move records i+1, …, n to i, …, n – 1,
  • move record n to i,
  • do not move records, but link all free records on a free list.
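The address arithmetic for both layouts can be sketched as follows. The function names and the example sizes (40-byte records, 100-byte blocks) are assumptions chosen to make the block-crossing problem visible.

```python
def record_offset(i, record_size):
    """Byte offset of record i (1-based) in the simple layout: n * (i - 1)."""
    return record_size * (i - 1)


def record_location(i, record_size, block_size):
    """(block number, offset within block) when records may not cross blocks."""
    per_block = block_size // record_size        # whole records that fit in a block
    block_no, slot = divmod(i - 1, per_block)
    return block_no, slot * record_size


# With 40-byte records and 100-byte blocks, record 3 would occupy bytes 80..119
# in the simple layout and so straddle the boundary at byte 100.
print(record_offset(3, 40))           # 80
print(record_location(3, 40, 100))    # (1, 0)
```

The second layout leaves 20 bytes unused at the end of every block in this example, which is the price of never splitting a record.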

Free Lists
• Store the address of the first deleted record in the file header.
• Then store the address of the second deleted record in the location of the first, etc.
• The addresses are pointers to free space.
• More space-efficient representation: store the free-list addresses in the attribute space of the deleted records, rather than adding a pointer field to every record. (No pointers are stored in in-use records.)
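A minimal sketch of a free list threaded through deleted slots is given below. The class name `FixedLengthFile`, the `("FREE", next)` marker and the choice to push newly freed slots onto the front of the list are all assumptions made for illustration.

```python
class FixedLengthFile:
    """Fixed-length record slots with a free list kept inside the deleted slots."""

    def __init__(self):
        self.slots = []          # each slot holds a record or a free-list link
        self.free_head = None    # 'file header': slot number of the first free slot

    def insert(self, record):
        """Store a record, reusing a free slot if one exists; return its slot number."""
        if self.free_head is None:
            self.slots.append(record)            # no free slot: extend the file
            return len(self.slots) - 1
        slot = self.free_head
        self.free_head = self.slots[slot][1]     # header now points to the next free slot
        self.slots[slot] = record
        return slot

    def delete(self, slot):
        """Free a slot, storing the link in the space the record used to occupy."""
        self.slots[slot] = ("FREE", self.free_head)
        self.free_head = slot
```

Because the link lives in the freed record's own space, records that are in use carry no pointer overhead.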

Variable-Length Records
• Variable-length records arise in database systems in several ways:
  • Storage of multiple record types in one file.
  • Record types that allow variable lengths for one or more fields.
  • Record types that allow repeating fields: used in some older data models, not First Normal Form!
• Approaches:
  a) Byte strings,
  b) Slotted pages,
  c) Fixed-length representation: reserved space,
  d) Fixed-length representation: pointers.

a) Byte-String Representation
• Attach an end-of-record control character to the end of each record.
• Difficulties with insertions and deletions.

b) Slotted Page Representation
• The slotted page header contains:
  • the number of record entries,
  • the end of free space in the block,
  • the location and size of each record.
• Records can be moved around within a page to keep them contiguous, with no empty space between them; the entry in the header must then be updated.
• Pointers should not point directly to a record; instead they should point to the record’s entry in the header.
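The sketch below shows one way the header entries and record movement could fit together. It is an assumption-laden toy: the class name `SlottedPage` is hypothetical, records are packed from the end of the page, and the space taken by the header itself is not charged against the page.

```python
class SlottedPage:
    """Toy slotted page: header entries (offset, size) plus records packed at the end."""

    def __init__(self, size):
        self.data = bytearray(size)
        self.free_end = size      # end of free space; records occupy data[free_end:]
        self.entries = []         # header: entry number -> (offset, size), None if deleted

    def insert(self, record):
        """Store a record and return its entry number (not its byte offset)."""
        if len(record) > self.free_end:
            raise ValueError("page full")
        self.free_end -= len(record)
        self.data[self.free_end:self.free_end + len(record)] = record
        self.entries.append((self.free_end, len(record)))
        return len(self.entries) - 1

    def read(self, entry_no):
        offset, size = self.entries[entry_no]
        return bytes(self.data[offset:offset + size])

    def delete(self, entry_no):
        """Remove a record and slide lower-addressed records up to close the hole."""
        offset, size = self.entries[entry_no]
        self.data[self.free_end + size:offset + size] = self.data[self.free_end:offset]
        self.entries[entry_no] = None
        for k, entry in enumerate(self.entries):
            if entry is not None and entry[0] < offset:
                self.entries[k] = (entry[0] + size, entry[1])   # record moved up
        self.free_end += size
```

Callers hold entry numbers, so a record can move within the page without any outside pointer becoming stale.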

c) Fixed Length Rep: reserved space
• Fixed-length representation, two methods:
  • reserved space,
  • pointers.
• Reserved space: can use fixed-length records of a known maximum length; unused space in shorter records is filled with a null or end-of-record symbol.

d) Fixed Length Rep: Pointers
• Pointer method
  • A variable-length record is represented by a list of fixed-length records, chained together via pointers.
  • Can be used even if the maximum record length is not known.
  • May waste less space.

d) Fixed Length Rep: Pointers (2)
• Space might still be wasted in all records except the first in a chain.
• Solution is to allow two kinds of block in the file:
  • Anchor block: contains the first records of chains.
  • Overflow block: contains records other than those that are the first records of chains.

Organization of Records in Files
• Heap (won’t consider this)
  • A record can be placed anywhere in the file where there is space.
• Sequential
  • Store records in sequential order, based on the value of the search key of each record.
• Clustering file organization
  • Records of several different relations can be stored in the same file.
  • Storing related records on the same block minimizes I/O.
• Hashing (next lecture)
  • A hash function is computed on some attribute of each record;
  • the result of the function specifies which block the record should be placed in.

Sequential File Organization
• Suitable for applications that require sequential processing of the entire file.
• The records in the file are ordered by a search key.

Sequential File Organization (2)
• Deletion: use pointer chains.
• Insertion: locate the position where the record is to be inserted.
  • If there is free space, insert there.
  • If there is no free space, insert the record in an overflow block.
  • In either case, the pointer chain must be updated.
• Need to reorganize the file occasionally to restore sequential order.

Sequential File Organization (3)

It is necessary to fill up the space once a record has been deleted

Shift everything up

Move last record

Clustering File Organization
• A simple file structure stores each relation in a separate file.
• Can instead store several relations in one file using a clustering file organization.
• E.g., clustering organization of customer and depositor:

Clustering File Organization (2)
• Good for queries involving the join depositor ⋈ customer, and for queries involving one single customer and their accounts.
• Bad for queries involving only customer.
• Results in variable-size records.

Add Pointer Chains

Clustering File Organization (3)

Data Dictionary Storage
The data dictionary (or system catalog) stores metadata (data about data), such as:
• Information about relations:
  • names of relations,
  • names and types of attributes of each relation,
  • names and definitions of views,
  • integrity constraints.
• User / accounting information, e.g. passwords.
• Statistical and descriptive data:
  • number of tuples in each relation,
  • access frequency.
• Physical file organization information:
  • how the relation is stored (sequential/hash/…),
  • physical location of the relation: operating system file name, or disk addresses of blocks containing the relation’s records.
• Information about indices.

Data Dictionary Storage (2)
• Catalog structure: use either
  • specialized data structures designed for efficient access, or
  • a set of relations, with existing system features used to ensure efficient access.
• The latter alternative is usually preferred.
• A possible catalog representation:
  Relation-metadata = (relation-name, number-of-attributes, storage-organization, location)
  Attribute-metadata = (attribute-name, relation-name, domain-type, position, length)
  User-metadata = (user-name, encrypted-password, group)
  Index-metadata = (index-name, relation-name, index-type, index-attributes)
  View-metadata = (view-name, definition)

Indexing: Basic Concepts
• Indexing mechanisms are used to speed up access to desired data.
  • E.g., author catalog in a library.
• Search key: an attribute or set of attributes used to look up records.
• An index file consists of records (called index entries) of the form (search-key, pointer).
• An index is much smaller than the original file.
• Two basic kinds of indices:
  • Ordered indices: search keys are stored in sorted order.
  • Hash indices: search keys are distributed uniformly across buckets using a hash function.

Index Evaluation Metrics
• Types: access types supported efficiently. E.g.,
  • records with a specified value in the attribute,
  • records with an attribute value falling in a specified range.
• Time:
  • access time,
  • insertion time,
  • deletion time.
• Space:
  • space overhead.

Ordered Indices
• Ordered index:
  • Index entries are stored sorted on the search-key value. E.g., author catalog in a library.
• Primary index:
  • In a sequentially ordered file, the index whose search key specifies the sequential order of the file.
  • Also called a clustering index.
  • The search key of a primary index is usually, but not necessarily, the primary key.
• Secondary index:
  • An index whose search key specifies an order different from the sequential order of the file.
  • Also called a non-clustering index.
• Index-sequential file:
  • Ordered sequential file with a primary index.
  • Efficient for both random access and sequential search.

Dense Index Files: Example
• Dense index: an index record appears for every search-key value in the file.

Dense Index Files: Updates
• Deletion:
  • If the deleted record was the only record with its search-key value, delete the search-key value from the index too.
• Insertion:
  • Perform a lookup using the search-key value appearing in the record to be inserted.
  • If the search-key value does not appear in the index, insert it.
• Multilevel update algorithms are simple extensions of the single-level algorithms.

Sparse Index Files
• Sparse index:
  • Contains index records for only some search-key values.
  • Applicable only when records are sequentially ordered on the search key.
• To locate a record with search-key value K:
  • Find the index record with the largest search-key value ≤ K.
  • Search the file sequentially, starting at the record to which that index record points.
• Less space and less maintenance overhead for insertions and deletions than a dense index.
• Generally slower than a dense index for locating records.
• Good tradeoff: a sparse index with an index entry for every block in the file, corresponding to the least search-key value in the block.
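The lookup procedure for the one-entry-per-block tradeoff can be sketched with a binary search over the index. The sketch assumes unique search keys, represents blocks as Python lists, and picks the index entry with the largest key less than or equal to K so that a key which heads a block is still found; the function name and sample data are hypothetical.

```python
from bisect import bisect_right


def sparse_lookup(index, blocks, key):
    """Look up `key` via a sparse index holding one entry per block.

    index  : sorted list of (least search key in block, block number)
    blocks : list of blocks, each a list of (key, record); file sorted by key
    """
    keys = [k for k, _ in index]
    pos = bisect_right(keys, key) - 1    # entry with the largest key <= `key`
    if pos < 0:
        return None                      # smaller than every key in the file
    for block_no in range(index[pos][1], len(blocks)):   # sequential search from there
        for k, record in blocks[block_no]:
            if k == key:
                return record
            if k > key:
                return None              # passed the place where the key would be
    return None


blocks = [[(10, "a"), (20, "b")], [(30, "c"), (40, "d")], [(50, "e")]]
index = [(10, 0), (30, 1), (50, 2)]
print(sparse_lookup(index, blocks, 40))   # d
```

Only three index entries cover five records here, and any lookup reads at most one data block after the index search.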

Sparse Index Files: Example

Sparse Index Files: Deletion
• If the deleted record was the only record in the file with its particular search-key value, the search-key value is deleted from the index too, if it has an entry there.
  • The entry is deleted by replacing it with the next search-key value in the file, in search-key order.
  • If the next search-key value already has an index entry, just delete the entry without replacement.

Sparse Index Files: Insertion
• Single-level index insertion:
  • Perform a lookup using the search-key value appearing in the record to be inserted.
  • If the index stores an entry for each block of the file, no change needs to be made to the index unless a new block is created.
  • If a new block is created, the first search-key value appearing in the new block is inserted into the index.
• Multilevel insertion algorithms are simple extensions.

Multilevel Index
• If the primary index doesn’t fit in memory, access becomes expensive.
• To reduce the number of disk accesses to index records, treat the primary index kept on disk as a sequential file and construct a sparse index on it.
  • Outer index: a sparse index of the primary index.
  • Inner index: the primary index file.
• If the outer index is too large to fit in main memory, create another level.
• Must update indices at all levels on insertion or deletion!
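A two-level lookup might look like the sketch below. It assumes unique keys, an inner index that is itself sparse (one entry per data block) and stored in blocks of two entries, and an outer index small enough to stay in memory; all function names and data are hypothetical.

```python
from bisect import bisect_right


def build_sparse_level(blocks):
    """One entry (least key in block, block number) per block of the level below."""
    return [(block[0][0], n) for n, block in enumerate(blocks)]


def chunk(entries, per_block):
    """Split a list of entries into blocks of at most `per_block` entries."""
    return [entries[i:i + per_block] for i in range(0, len(entries), per_block)]


def find_block(entries, key):
    """Pointer stored with the largest key <= `key` (first entry if key is smaller)."""
    keys = [k for k, _ in entries]
    return entries[max(bisect_right(keys, key) - 1, 0)][1]


def multilevel_lookup(outer, inner_blocks, data_blocks, key):
    inner_no = find_block(outer, key)                    # outer index: in memory
    data_no = find_block(inner_blocks[inner_no], key)    # one inner-index block read
    for k, record in data_blocks[data_no]:               # one data block read
        if k == key:
            return record
    return None


data_blocks = [[(10, "a"), (20, "b")], [(30, "c"), (40, "d")],
               [(50, "e"), (60, "f")], [(70, "g"), (80, "h")]]
inner_blocks = chunk(build_sparse_level(data_blocks), 2)
outer = build_sparse_level(inner_blocks)
print(multilevel_lookup(outer, inner_blocks, data_blocks, 60))   # f
```

Each lookup touches one block per index level plus one data block, rather than scanning the whole inner index.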

Secondary Indices
• Often want to find records based on the values of fields other than the search key of the primary index.
• Example: suppose the account database is stored sequentially by account number, and we want to, e.g.,
  1. find all accounts in a particular branch,
  2. find all accounts with balances in a specified range.
• A secondary index would have an index record for each search-key value.
  • The index record points to a bucket that contains pointers to all the actual records with that particular search-key value.
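The bucket structure can be sketched as below, using list positions as stand-ins for record pointers. The function names and the record layout (dictionaries with `account` and `balance` fields) are assumptions for illustration only.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


def build_secondary_index(records, field):
    """Dense secondary index: sorted list of (value, bucket of record positions)."""
    buckets = defaultdict(list)
    for position, record in enumerate(records):
        buckets[record[field]].append(position)   # bucket collects all matching records
    return sorted(buckets.items())


def lookup_range(index, records, low, high):
    """All records whose indexed value lies in the closed range [low, high]."""
    values = [v for v, _ in index]
    start, stop = bisect_left(values, low), bisect_right(values, high)
    return [records[p] for _, bucket in index[start:stop] for p in bucket]
```

For instance, with accounts stored in account-number order, an index built on `balance` answers a range query by following each bucket's pointers, and consecutive results may sit in different blocks of the file.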

Example: Secondary Index

Secondary index on the balance field of account

Primary and Secondary Indices
• Indices offer substantial benefits when searching for records.
• When a file is modified, every index on the file must be updated.
  • Updating indices imposes overhead on database modification.
• A sequential scan using the primary index is efficient, but a sequential scan using a secondary index is expensive:
  • each record access may fetch a new block from disk.
• Secondary indices have to be dense.

Summary
• Storage Access & Buffers
• File Organization






Next: Indexing & Hashing

Reading & Exercises
• Reading:
  • Silberschatz Chapter 11.5 – 11.8
  • Connolly & Begg 2.4, 20.3.2, 17.2 (sort of), Appendix C (this also covers Lecture 17)
• Exercises:
  • Silberschatz 11.7, 14, 17, 20
  • C & B: no exercises in the appendix, borrow Silberschatz!

Changing Whole Relations
• If two-phase locking is used:
  • A delete operation may be performed only if the transaction deleting the tuple has an exclusive lock on the tuple to be deleted.
  • A transaction that inserts a new tuple into the database is given an X-lock on the tuple.
• Insertions and deletions can lead to the phantom phenomenon.
  • A transaction that scans a relation (e.g., find all Perryridge accounts) and a transaction that inserts a tuple in the relation (e.g., insert a new Perryridge account) may conflict without accessing any tuple in common.
  • Non-serializable schedules can result! (The find transaction may not see the new account, yet be serialized after the insert transaction.)

Insert and Delete Operations
• A transaction scanning a relation is reading information about what tuples the relation contains, while another transaction changes the same information.
  • Something should be locked!
• One solution:
  • Associate a data item with the relation, to represent the information about what tuples the relation contains.
  • Transactions scanning the relation acquire a shared lock on the data item.
  • Transactions inserting or deleting a tuple acquire an exclusive lock on the data item. (Note: locks on the data item do not conflict with locks on individual tuples.)
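The effect of the extra data item can be illustrated with a very small lock table. This is a sketch under simplifying assumptions: the class `LockTable` is hypothetical, conflicting requests are refused rather than queued, and the two-phase discipline itself is left to the caller.

```python
class LockTable:
    """Minimal S/X lock table; conflicting requests are refused, not queued."""

    def __init__(self):
        self.holders = {}   # data item -> {transaction: "S" or "X"}

    def acquire(self, txn, item, mode):
        """Try to lock `item` in mode 'S' or 'X'; return True if granted."""
        others = {t: m for t, m in self.holders.get(item, {}).items() if t != txn}
        # S is compatible only with S; X is compatible with nothing.
        if others and (mode == "X" or "X" in others.values()):
            return False
        held = self.holders.setdefault(item, {})
        if held.get(txn) != "X":          # never downgrade an X lock already held
            held[txn] = mode
        return True

    def release_all(self, txn):
        """Drop every lock held by `txn` (e.g. at commit)."""
        for held in self.holders.values():
            held.pop(txn, None)


locks = LockTable()
scan_ok = locks.acquire("T1", "account: tuple set", "S")     # T1 scans the relation
insert_ok = locks.acquire("T2", "account: tuple set", "X")   # T2 wants to insert
print(scan_ok, insert_ok)   # True False
```

T2 is held back until T1 releases its shared lock, so the scan and the insert can no longer interleave without one of them noticing the other.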

Locking for Inserts & Deletes
• The previous protocol provides very low concurrency for insertions/deletions.
• Index locking protocols provide higher concurrency while preventing the phantom phenomenon, by requiring locks on certain index buckets.

Index Locking Protocol
• Every relation must have at least one index. Access to a relation must be made only through one of the indices on the relation.
• A transaction Ti that performs a lookup must lock all the index buckets that it accesses, in S-mode.
• A transaction Ti may not insert a tuple ti into a relation r without updating all indices to r.
• Ti must perform a lookup on every index to find all index buckets that could possibly have contained a pointer to tuple ti, had it existed already, and obtain locks in X-mode on all these index buckets. Ti must also obtain locks in X-mode on all index buckets that it modifies.
• The rules of the two-phase locking protocol must be observed.

Concurrency in Index Structures
• Indices are unlike other database items in that their only job is to help in accessing data.
• Index structures are typically accessed very often, much more than other database items.
• Treating index structures like other database items leads to low concurrency. Two-phase locking on an index may result in transactions executing practically one at a time.
• It is acceptable to have nonserializable concurrent access to an index as long as the accuracy of the index is maintained.
• There are index concurrency protocols where locks on internal nodes are released early, and not in a two-phase fashion.
