physical database

85
Physical Database

Upload: jillian-adriano

Post on 30-Dec-2015

28 views

Category:

Documents


0 download

DESCRIPTION

Physical Database. Physical Design Activities. Minimize the number of disk accesses Minimize the amount of space used to store the data Provide for data integrity and protect the data from loss. Disk Storage Devices. - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Physical Database

Physical Database

Page 2: Physical Database

Physical Design Activities

• Minimize the number of disk accesses

• Minimize the amount of space used to store the data

• Provide for data integrity and protect the data from loss

Page 3: Physical Database

Disk Storage Devices

• Disk units are referred to as direct access storage device or DASD ( daz_dee)

• Offer high storage capacity and low cost• Data stored as magnetized areas on magnetic

disk surfaces.• A disk pack contains several magnetic disks

connected to a rotating spindle.• For disk packs, the tracks with the same diameter

on the various surfaces are called a cylinder. (a set of tracks that can be accessed w/o moving arms)

Page 4: Physical Database

Cont’• Disks are divided into concentric circular tracks on each disk

surface. Track capacity vary typically from 4 to 50 Kbytes. (about 800 tracks)

• Each concentric circle stores the same amount of information, so bits are packed more densely on the smaller-diameter tracks.

• A track is divided into sectors or blocks (or data page). The division of a track into equal-sized blocks is set by the OS during disk formatting. So, The block size B is fixed for each system.

• A block is a unit of space read/written by the disk head. Typical block sizes range from 512 bytes to 4096 bytes. A whole block is transferred between disk and main memory for processing. (Typical Unix sys: 2 KB; mainframe: 4KB; 1.44 floppy: 512B)

This means the bigger the block size, the faster the data read/write time. However, the storage could also be wasted.  

Page 5: Physical Database

Cont’

• In Oracle, the overhead to manage a block is approximately 90 bytes for a table, and 161 bytes for an index (about 80% are used for data).

• Blocks are separated by fixed-size interblock gaps.• A section of memory set aside to receive a block of data

is called a buffer. Typically, a buffer size is the same as the block size. There are many buffers in commercial systems. (500-3000).

• A read-write head moves to the track that contains the block to be transferred (seek time). Disk rotation moves the block under the read-write head for reading or writing (rotational delay or latency).

Page 6: Physical Database

Cont’• Reading or writing a disk block is time-consuming because of the

seek time s and rotational delay (latency) rd.• Thus, the access time (the average time required to access a

single data record) will be: Average_Seek_time + Avg_Latency + Block_Transfer_time where Block_Transfer_time = Transfer_Rate x Block_Size

• Typical access time ranges from 10-60 milliseconds• The seek time >> rotational delay >> BTT.

So, it is common to transfer several consecutive blocks on the same track or cylinder. (to eliminate ST and RD)

Page 7: Physical Database

Cont’

• Locating data on a disk is a major bottle neck in db applications.

• A physical disk block address consists of a surface number, track number (within surface), and block number (within track).

Page 8: Physical Database

Cont’• Mainframe systems call each disk unit a volume, and each volume

has a volume table of content (VTOC). • In the MS-DOS, the data entry info. is stored into a file allocation

table (FAT). • Typically, relative addressing is used. Each record has a relative

record number. The first record is 0 or certain displacement, etc.  • To speed up processing, double buffering can be used. Once one

buffer is full, the CPU can start processing; at the same time, the I/O processor can be reading and transferring the next block into a different buffer.

Page 9: Physical Database

Files of Records

• A file is a sequence of records, where each record is a collection of data values (or data items).

• A file descriptor (or file header) includes info. that describes the file, such as field names and their data types, and the address of the file blocks on disk.

• Records are stored on disk blocks. The blocking factor bfr for a file is the average number of records stored in a disk block.

Page 10: Physical Database

Cont’• A file can have fixed-length records or variable-length

records. • File records can be unspanned (no record can span two

blocks) or spanned (a record can be stored in more than one block).

• In a file of fixed-length records, all records have the same format. Usually, unspanned blocking is used with such files.

• Files of variable-length records require additional info. to be stored in each record, such as separator characters and field types. Usually spanned blocking is used with such files.

• The physical disk blocks that are allocated to hold the records of a file can be contiguous, linked, or indexed

Page 11: Physical Database

Typical Operations on Files

OPEN, FIND, FIND NEXT, READ, INSERT, DELETE, MODIFY, CLOSE, REORGANIZE, READ_ORDERED

Page 12: Physical Database

Primary File Organization

• Unordered Files

• Ordered Files

• Hashed Files

Page 13: Physical Database

Unordered Files

• Also called a heap or a pile file, new records are inserted at the end of the file.

• To search for a record, a linear search through the file records is necessary. This requires reading and searching half the file blocks on the average, and is hence quite expensive.

• Record insertion is quite efficient.

• Reading the records in order of a particular field requires sorting the file records.

Page 14: Physical Database

Ordered Files

• Also called a sequential ( ordered) file.• File records are kept sorted by the values of an ordering

field. • Insertion is expensive; records must be inserted in the

correct order. It is common to keep a separate unordered overflow (or transaction) file for new records to improve insertion efficiency; this is periodically merged with the main ordered file.

• A binary search can be used to search for a record on its ordering field value. This requires reading and searching log2 (b) blocks on the average, an improvement over linear search.

• Reading the records in order of the ordering field is efficient

Page 15: Physical Database

Hashed (Direct) Files

Static External Hashing• Hashing refers to the process of converting attribute

values directly into addresses.• Hashing for disk files is called external hashing• The file blocks are divided into M equal-sized

buckets, numbered bucket0, bucket1 , ...., bucketM-1. Typically, a bucket corresponds to one (or a fixed number of) disk blocks.

• One of the file fields is designated to be the hash key of the file.

Page 16: Physical Database

Cont’

• The record with hash key value K is stored in bucket i, where i = h(K), and h is the hash function. 

• Collisions occur when a new record hashes to a bucket that is already full. An overflow file is kept for storing such records. Overflow records that hash to each bucket can be linked together.

• To reduce overflow records, a hash file is typically kept 70-80% full. If we expect to have r records to store in the table, we should choose M locations for the address space such that (r/M) is between 0.7 and 0.9.

Page 17: Physical Database

Cont’

• Typically, M is a prime number since it distributes randomly via MOD fn.

• The hash fn h should distribute the records uniformly among the buckets; otherwise search time will be increased because many overflow records will exist.

• Main advantages: search is efficient; in most cases, needs a single block access to retrieve that record (fastest possible access, given the hash value)

Page 18: Physical Database

Disadvantage

• No ordering of records

• No search other than hash field

• Fixed buckets M is a problem if # of records grows or shrinks

Page 19: Physical Database

Cont’

• Only OpenIngres and ORACLE support hashing.

• Oracle supports hashing in terms of table clustering

Page 20: Physical Database

Index Structures for Files 1. Iindexes as Access Paths

2. Types of Single-level Indexe :

Primary Indexes, Clustering Indexes, Secondary Index

3. Multi-level Indexes

4. Using B-Trees and B+-Trees as Dynamic Multi-level Indexes

5. Indexes and Performance Optimization

Page 21: Physical Database

Indexes as Access Paths • A single-level index is an auxiliary file that makes it

more efficient to search for a record in the data file.

• The index is usually specified on one field of the file (could be multiple fields)

- Oracle allows up to 16 attributes (or max 1000-2000 bytes)

 • One form of an index is a file of entries <field_value,

pointer_to_record>, which is ordered by the field value so that a binary search can be used.

 •

Page 22: Physical Database

Cont’

The index is called an access path on the field.

 

• Indexes speed up retrieval of data.

- Most beneficial for read-intensive and large files

- Write/update-intensive (volatile) files produce overheads

Page 23: Physical Database

Types of Indexes

• Primary Index

• Clustering Index

• Secondary Index

• Multi-Level Indexes

Page 24: Physical Database

Primary Index

• Defined on an ordered data file whose records are fixed length with 2 fields

(ordering key field + pointer to a disk block)

• The data file is physically ordered on a Primary Key or an ordering field

 • Includes one index entry (or index record) for

each block in the data file (Not every ordering field value appears as an index

entry)

Page 25: Physical Database

Cont’

• The index entry has the ordering field value for the first record in the block, which is called the block anchor

• A similar scheme can use the last record in a block

Page 26: Physical Database

Cont’

A primary index is an example of a nondense (sparse) index;

• A Nondense index: has an entry for each disk block of the data file

• A Dense index: has an entry for each record

Page 27: Physical Database

Cont’

• Major problems: insertion and deletion of records

-  Use either unordered overflow file or a linked list of overflow records for each block in the data file.

Page 28: Physical Database

ExampleSuppose we have an ordered file EMP (Name,

SSN, Address, Job, Sal, ....), record size R = 100 bytes (fixed and unspanned), block size B = 2048 bytes, #records r = 30,000 records, Blocking factor Bfr = B/R = CEIL(2048/100) = 20 records/block

CEIL(X) returns a next higher integer value after truncation.

E.g., CEIL(5.6) returns 6; or CEIL(12.01) returns 13.

The number of file blocks b = CEIL(r/Bfr) = CEIL(30000/20) = 1500 blocks

Page 29: Physical Database

Cont’

(a) No index; use linear search:

An average linear search cost of (b/2) = (1500)/2 = 750 block accesses

(b) No index; use binary search: A binary search on the data file would

need log2 b = log2 (1500) = 11 block accesses

Page 30: Physical Database

Cont’(c) Index; use binary search:

Suppose Length(SSN) = 9 bytes and a block pointer P = 6 bytes

Index entry size IE_Size = (9 +6) = 15 bytes

The Bfr for the index is (B/IE_size) = BOTTOM(2048/15)

=BOTTOM(136.53) = 136 entries/block

The total # of index entries = 1500

The number of index blocks = CEIL(1500/136) = 12 blocks

A binary search on the index file would need CEIL(log2 b) =

CEIL(log2 (12)) = 4 block accesses

To search for a record using the index, needs one additional block access. So, a total of 4 + 1 = 5 block accesses is needed

Page 31: Physical Database

Cont’

• Comparison:

(Linear_Search: No_Index_&_BS : Index_&_BS) = (750 : 11 : 5)

• BS = Binary Search

Page 32: Physical Database

Clustering Index • Defined on a data file physically ordered on a non-key

field that does not have a distinct value for each record.

 • A relation can have at most one clustering index. • The field is called the clustering field. • Includes one index entry (index record) for each

distinct value of the field • The index entry points to the first data block that

contains records w/ that indexed field value

Page 33: Physical Database

Secondary Index • Defined on an unordered data file

• Can be defined on a key field or a non-key field

• Includes one index entry for each record in the data file: dense index.

 • A block anchor cannot be used in the secondary

index because the records of the data file are not physically ordered by values of the secondary key field.

Page 34: Physical Database

Cont’• A secondary index for a key field is a dense

index (one entry/record)

• A secondary index for a non-key field could be densed or nondensed.

 • A secondary index provides a logical ordering

on the records by the indexing field. • There can be many secondary indexes for a

data file • A secondary index is called an inverted index

Page 35: Physical Database

Multi-Level Indexes

Because a single-level index is an ordered file, we can create a primary index to the index itself.

 • We call the original index the first-level index and

the index to the index the second-level index • We can repeat this process, creating a 3rd,

4th, ..., top level until all entries of the top level fit in one disk block

 

Page 36: Physical Database

Cont’• A multi-level index can be created for any type of

first-level index (primary, clustering, secondary) as long as the first-level index consists of more than one disk block

 • Such a multi-level index is a form of search tree;

however, insertion and deletion of new index entries is a severe problem because every level of the index is an ordered file

 • Hence, most multi-level indexes use B-tree or B+

tree data structures, which leave space in each tree node (disk block) to allow for new index entries

Page 37: Physical Database

Example of Multi-level Tree Index

a)  A table with 1 M records, Index value: 8 bytes and Pointer to row ID: 6 bytes, So, 14 bytes/index entry

b)  1 block = 2 KB, Assume the block header overhead is 150 bytes and the fill factor is 0.80., Actual 1 block space = (block size - block header overhead)* fill_factor = (2 KB – 150) * 0.80 = (2048-150)*0.8 = 1518 bytes, Index Blocking Factor, BF = BOTTOM(1518/14) = 108 index records

Page 38: Physical Database

Cont’

c) #disk blocks necessary for storing 1 M recordsCEIL (1,000,000/108) = 9,260 blocks (disk blocks)  Therefore, we have 9,260 blocks at the leaf level

and 9,260 node pointers at the directory level to the leaf level

Page 39: Physical Database

Cont’

d) The space for directory entries at a higher level: CEIL (9,260/108) = 86 blocks

 e) Now we can create a root of the index tree with

a single block. Thus, the index tree for this example consists of 3 levels:

Root: 1 block2nd level: 86 blocks3rd level: 9,260 blocks

Page 40: Physical Database

Cont’

f) Finding the index entry needs only 3 block accesses in this example.

log108 (1 million) < 3This is more efficient than disk binary search

log2 (1 million) = 20 block access

In general, for a tree of depth K with the blocking factor BF,

The total #leaf-level entries = (BF)^kThe max. #records for 3 level index tree with BF = 108 is(108)^3 = 1,259,712 records

Page 41: Physical Database

Cont’The access to the data requires:( #access to the index) + ( 1 additional disk access

to the data file ) = (3 disk accesses to the index ) + ( 1 to the data file) = 4 disk accesses.

This means over 1 M records can be searched by 4 disk accesses. If we assume that 1 disk access = 0.025 sec, 4 disk access = 0.1 sec + a few calculations 0.1 seconds , Note that we used a simplified method for estimating the access time

Page 42: Physical Database

B Trees and B+ Trees as Dynamic Multi-level Indexes

A tree data structure• A node in a search tree of order P:

- Each node contain at most P-1 search values and P pointers

- Two constraints:

1. Within each node, K1 <K2< .... < Kq-1

2. Subtree nodes has larger values than the LHS search value and smaller values than the RHS search value

- Example search tree of order P= 3

Page 43: Physical Database

Cont’

• A tree is balanced if all of its leaf nodes are at the same level

- Ensures that no nodes will be at very high levels (many block accesses)

- Deletion may leave some nodes nearly empty (storage waste)

 • B Tree and B+ Tree structures are variations of

search trees that allow efficient insertion and deletion of new search values

Page 44: Physical Database

Cont’

• In B Tree and B+ Tree structures, each node corresponds to a disk block (or a disk block + offset)

•  Each node is kept between half-full and completely full

•   An insertion into a node that is not full is quite efficient; if a node is full, the insertion causes a split into two nodes

Page 45: Physical Database

Cont’

• Splitting may propagate to other tree levels

• A deletion is quite efficient if a node does not become less than half full

•  If a deletion causes a node to become less than half full, it must be merged with neighboring nodes

Page 46: Physical Database

B tree: • B trees and its variations were designed to

reside on disk, partially memory resident (the root node)

• It may contain an entry for every record  • In a B tree, pointers to data records exist at all

levels of the tree• B tree insertion and deletion algorithm

guarantees that the tree will always be balanced (performance vs. complexity).

Page 47: Physical Database

B+ tree: • In a B+ tree, all pointers to data records

exists only at the leaf-level nodes•  The leaf nodes have an entry for every

search value w/ a pointer to the record•  The leaf nodes of the B+ trees are usually

linked together to provide ordered access on the search field

•  Some search field values are repeated in the internal nodes, Causes duplicate key values and more maintenance cost

•   Faster sequential processing

Page 48: Physical Database

B* tree:

• A B+ tree whose nodes are at least two-thirds filled

•  Increase storage utilization and speed-up in search

•  Increase update cost (node full occurs more frequently)

•  Used in Oracle

Page 49: Physical Database

Discussion: • Most systems allow users to choose a fill factor b/n 0.5 to 1

 

• The PK constraint is enforced by creating an index for the primary key

 

• A fully inverted file is a file with a secondary index for every field

 

• The most popular index organization is B trees or B+ trees

 

• An ordered file with multi-level primary index on its ordering key field is called an Indexed Sequential File

Page 50: Physical Database

Cont’

• IBM's ISAM (Indexed Sequential Access Method) has a 2-level index:

- Cylinder index (the key value of an anchor record for each cylinder of a disk pack and a pointer to the track index)

- Track index (the key value of an anchor record for each track and a pointer to the track)

• IBM's VSAM (Virtual Storage Access Method) is similar to B+ tree structure

Page 51: Physical Database

Recommendation for Physical Design

• Do not make physical design decisions too soon.

• Do not tie the logical design to physical constraints that may well change

• Optimize the physical design for top 20% of queries and transaction types that are most frequently executed.

Page 52: Physical Database

Recommendation for Indexing - Create indexes for PK, FK, CK, frequently-used attributes in

queries (i.e., name), The uniqueness of values (PK, CK)

- Do not create an index for FK which is not used for selection condition. eg Student(SSN, Name, Grade, Bdate, Zip) and ZIP(Zip, City, State): Find address of all seniors (don't create an index for ZIP in Student) , Find all names living in Wayne (create an index for ZIP in Student)

Page 53: Physical Database

Cont’• For multi-attribute indexes .Attribute with most variety first (selectivity) .If equal variety, attribute with most often accessed

first .Weak entity: "main-attr+ minor-attr" such as

owner PK + partial key (In this case, do not create another index on owner PK in the WE table)

. When two attributes frequently occur at the same time in a query

. A composite index will be used only when the leading column appears in WHERE clause.

Page 54: Physical Database

Cont’• Do not create indexes for attributes with small

#values . Boolean value ----- poor choice, slow down

performance, (e.g., gender: each value will have 50% of table size; big overhead)

. Phone number ----- good choice . Area code ---------- marginal . Small tables (less than 30 or 100 records) (Better left unindexed except to enforce uniqueness

in the PK) . Small # of blocks: (less than 8 data blocks in Oracle)

Page 55: Physical Database

Heuristics . A frequent retrieval with less than 1% retrieval is a

good candidate.. A frequent retrieval with 1-5% retrieval should be

considered.. An occasional retrieval with over 2% retrieval is not

a good choice.. The more static a file, the more attractive. The larger the record, the greater the value of

indexing. Don't create more than 3 indexes for a volatile file in

general. Create as many indexes as needed for static files

Page 56: Physical Database

When to create indexes

• When the table is initially loaded with data

• This method will create the index much faster for the sort algorithm will sort all the index values at once and then insert them into a B* tree.

Page 57: Physical Database

NULL and Indexes .Columns that are NULL will not appear in an index

(Except in cluster index)

. Indexes with more than one column will have an entry if any of the columns are not NULL

. Since an index does not include NULL value, some queries will be faster to run if you leave a numeric column NULL, rather than zero

SELECT Name, Commission

FROM Commission

WHERE Commission > 0;

Page 58: Physical Database

Guidelines for Not Using Indexes - If the index is never used by the optimizer

- If more than 10-20% of the rows are to be retrieved

- If the column contains only one, two or three unique values.

- If the column to be indexed is too long (more than 20 bytes)

- If the overhead of maintaining the index is greater than the benefit.

Page 59: Physical Database

LOGICAL STRUCTURES

Schema Objects:

Base table, view, sequences, indexes, clusters, database links, synonyms, procedures, packages

 

Page 60: Physical Database

LOGICAL STORAGE UNITS

TableSpaces

- A tablespace is a logical data organization unit that contains one or more db objects (DD, tables, indexes, or clusters).

- A db consists of one or more tablespaces.

- Every db has at least one tablespace: SYSTEM that holds the DDs, the names and locations of all the tablespaces, tables, indexes, and clusters for this db

- Physically, a tablespace has one or more data files.

 

Page 61: Physical Database

Cont’Example

• CREATE TABLESPACE talbot DATAFILE 'HOME.ONE‘ SIZE 1000K DEFAULT STORAGE (INITIAL 25K NEXT 10K MINEXTENTS 1 MAXEXTENTS 100 PCTINCREASE 50); 

• /* Declaring a table within a tablespace */

• CREATE TABLE ledger (ActionDateDATE, ........)

TABLESPACE talbot;

Page 62: Physical Database

Segments

A segment is a logical storage space for a db object.

- A tabelspace contains many segments.

- Types of segments in Oracle:

Table, Index, Rollback, Temporary, Partition, Cluster

Page 63: Physical Database

Extents

- A segment consists of one or many extents.

- An extent is a set of contiguous data blocks used to store a particular type of information.

- Each segment has an initial disk space set aside called the initial extent.

- The server allocates an additional extent to a segment whose space is full.

- Extents cannot span datafiles.

Page 64: Physical Database

Blocks • A block is the smallest logical (also physical) unit of

storage.

- Several blocks comprise an extent.• A block contains:

. fixed block header (block address, segment type, the table and rows that use the block, etc about 113 bytes.),

. variable block header

. table directory (stores info about the tables in the cluster)

. the actual data with 2 bytes of row overhead, and

. free space (PCTFREE): percent of the block space reserved for update

- DB_FILE_MULTIBLOCK_READ_COUNT in INIT.ORA shows the #of multi-blocks that can be read into buffer cache by a single read request during full table scan.

Page 65: Physical Database

Controlling Space Usage at BLOCK Level

- Identifying the block size:

SQLDBA> SHOW PARAMETERS db_block_size;

SQL> SELECT SEGMENT_NAME, BYTES, BLOCKS, BYTES/BLOCKS Block_Size

FROM USER_SEGMENT;

Page 66: Physical Database

Cont’

- Computing the number of blocks

ANALYZE TABLE mytable COMPUTE STATISTICS;

SELECT BLOCKS FROM USER_TABLES

WHERE Table_Name = ‘MYTABLE’;

Page 67: Physical Database

- CREATE TABLE product ( ....... ) TABLESPACE users PCTFREE 20 PCTUSED 60 STOARGE (INITIAL 10M NEXT 5K PCTINCREASE 0); - PCTFREE Parameter. "Percent free" means what percentage of a block to keep free for

changes to rows in the block. The purpose of PCTFREE is to minimize disk I/O by reducing row

chaining and row migration. Row migration: moving an entire row to a new block after an

update of a row (VARCHAR2 or NULL): occurs when PCTFREE is too small.

. Row chaining: the splitting of a row across multiple blocks (LONG or LONG RAW)

. The default = 10

. Used for both tables and indexes.

Page 68: Physical Database

Cont’PCTFREE and Tables

. The free space must be large enough to handle updates.

. Lower PCTFREE can cause row migration.

. The more volatile your data, the larger PCTFREE should be.

. For tables with no updates, PCTFREE should be zero.

Page 69: Physical Database

PCTFREE and Indexes

. Specifies the free space that should be left within the leaf blocks

Example: Employee index on last_name

If we have 10,000 employees and expect 2000 more, PCTFREE might be 25

This avoids the split of index leaf blocks.

If we build an index whose index key is higher than all current values, PCTFREE could be zero

Page 70: Physical Database

DISPLAYING STORAGE INFORMATION with DD VIEWS

• USER_EXTENT Extents of segments belonging to the user• USER_FREE_SPACE Free extents in a tablespace accessible to the user• USER_SEGMENTS Storage allocation for segments belonging to the user• USER_TABLESPACES Description of accessible tablespaces• USER_USERS Information about the current user• USER_TS_QUOTAS Tablespace quotas for the user

Page 71: Physical Database

PCTUSED Parameter. When the percentage of a data block falls down to the

value of PCTUSED, the block becomes available for inserts

. Used only for tables, not for indexes.

. The default value of PCTUSED = 40

. The higher the PCTUSED, the fuller the data block

. The smaller the gap between PCTUSED and PCTFREE, the more number of blocks on the free list even if none of them have enough space to accommodate most inserts.

. To avoid the bottleneck, leave some slack space b/n PCTFREE and PCTUSED

Page 72: Physical Database

Indexing in Oracle

• There are some differences between indexing theory we discussed in this chapter and those implemented in Oracle.

• Oracle does not have Primary index as we discussed. The index created by the primary key clause is a dense secondary index. The reason is Oracle does not sort the rows based on the value of primary key even though you create the primary key. Rows are just appended as you insert into a table.

Page 73: Physical Database

Oracle automatically creates a dense secondary index when PRIMARY KEY or UNIQUE clause is used:

. CREATE TABLE employee (SSN CHAR(9) PRIMARY KEY ……..);

. ALTER TABLE employee ADD CONSTRAINT emp_PK PRIMARY KEY (SSN).

Page 74: Physical Database

- If you drop the primary key, the unique index is automatically dropped.

 

-  However, if you sort the data by the primary key using CTAS (CREATE TABLE xyz AS SELECT …..), the index on the primary key will be a primary index as discussed in the book. Note that a primary index will speed up your query processing based on the primary key.

Page 75: Physical Database

-  Indexes in Oracle are stored in a B* tree with doubly-linked leaf blocks.

-   An index may have max. 16 column values in a composite index.

-   Hashing is supported only in the clustering tables.-   Recent version of Oracle also supports Bitmap indexes

for OLAP or data warehousing applications. Bitmap indexes are only for low-cardinality attributes (that has a small number of unique values) such as SEX, REGION, STATE, or CODE-like attributes.

Page 76: Physical Database

Index Usage - Indexes speed up query processing on larger tables, but

could cause overhead for volatile files- Small tables ( |rows| <1000 or less than 8 blocks) may

not be beneficial with indexes- Users do not have to specify indexes in queries- Indexes are typically automatically used when mentioned in

WHERE clause- Indexes are stored in data dictionary called

USER_INDEXES (IND).- Indexes are automatically updated when the table is updated- - Indexes are stored in data dictionary called

USER_INDEXES (IND)- - Indexes can be created for any data type except LONG

Page 77: Physical Database

When Indexes are Ignored?

-         When there is no WHERE clause

-         When WHERE clause contains IS NOT NULL or IS NULL

-         When the comparison is <> (or !=)

When an indexed column is modified by a function such as SUBSTR or ||

Page 78: Physical Database

Storage Allocation in Oracle - Oracle only allocates whole blocks, not parts of

blocks

-  Oracle allocates sets of blocks, usually I multiple of five blocks

-  Oracle may allocate larger or smaller sets of blocks depending on the available free space in the tablespace.

-   Oracle uses 7 bytes for a DATE values and 3 bytes for a NUMBER. IN storage estimation, use 8 bytes for a DATE and 4 bytes for a NUMBER including column overhead.

Page 79: Physical Database

INDEX Command in Oracle

CREATE [UNIQUE | BITMAP] INDEX index-name ON { table-name (attribute [ASC | DESC] list) | CLUSTER cluster_name }

[TABLESPACE <tablespace>]

[ INITRANS <integer> ]

[ MAXTRANS <integer> ]

[ PCTFREE <integer> ]

[PCTUSED <integer>]

[STORAGE <storage clause>]

Page 80: Physical Database

[ ASC | DESC ]- ASC and DESC means ascending and descending- This is only for DB2 compatibility, but has no effect- Indexes in Oracle are always ascending  [ TABLESPACE ]- Specifies the table space in which the index is created- Store the table and index in different tablespaces- This allow concurrent access to the table and the index [ INITRANS ]- Number of transaction entries allocated to each block of the index- [Every transaction that modifies the block requires a transaction

entry]- The minimum and default value is 2.- 23 * INITRANS bytes are used as a block header.

Page 81: Physical Database

[ MAXTRANS ]- Max. number of transaction entries allocated to each block of the index- Max number of concurrent transactions that can update an index block- Max and default is 255. [ PCTFREE value1 ]- The percentage of free space left for updates on the index page- The default value is 10 (i.e., each index page is 90% full)- Use higher value for a volatile file- Use a lower value for a stable file [ PCTUSED ]- The percentage of a block that Oracle attempts to fill before it allocates another

block.- The default is 40. [ STORAGE ] clause- If the index will occupy less than 25 blocks, then one extent for the segment will

be allocated[Extents are contiguous blocks]

Page 82: Physical Database

- INITIAL -- First extent in bytes. K or M can be used -- The default value is 5 data blocks. Minimum is 2.- NEXT -- The size of the next extent that will be allocated to the

index.- PCTINCREASE -- The percent by which each extent grows over the

previous extent.- MINEXTENTS -- The total number of extents that are created when a

segment is created [One segment corresponds to database objects]- MAXEXTENTS -- The total number of extents that can be created for the

segment.

Page 83: Physical Database

EXAMPLE:CUST (Cust_num, Fname, Lname, Address, City, State, Zip, Phone)

• Creating an index on a single attribute CREATE INDEX Cust_Lname_IDX ON Cust

(Lname);/* Cust_Lname_IDX is a user-defined index file

name *//* Index name can be up to 30 char long *//* Good naming convention:

TableName_AttributeName *//* Oracle creates a B* tree index using Lname. */

Page 84: Physical Database

Creating an index on multi-attribuesCREATE INDEX Cust_Lname_Zip_IDX ON Cust (Cname, Zip ASC);/* You can have up to 16 attributes in one index *//* You can have columns in only one table *//* Entries must not exceed 1/3 of the data block size *//* Put the most restrictive column first *//* Put the most frequently queried column first */

 Creating a unique index to enforce a unique constraint

CREATE UNIQUE INDEX Cust_num_IDX ON Cust (Cust_Num);/* Declaring an attribute as PRIMARY KEY or UNIQUE automatically

create a unique index */

Page 85: Physical Database

Creating an index with parametersCREATE INDEX Cust_Lname_IDX ON Cust (Lname)

TABLESPACE cust_data_spacePCTFREE 20PCTUSED 50STORAGE ( INITIAL 5K

NEXT 5KMINEXTENTS 1 MAXEXTENTS 50

PCTINCREASE 0);

Dropping an indexDROP INDEX Cust_Lname_Zip_IDX;