![Page 1: CS 440 Database Management Systems Lecture 6: Data storage & access methods 1](https://reader035.vdocument.in/reader035/viewer/2022062600/5a4d1b517f8b9ab0599a78e5/html5/thumbnails/1.jpg)
CS 440 Database Management Systems
Lecture 6: Data storage & access methods
Database System Implementation
[Figure: design pipeline. User Requirements → Conceptual Design (Entity Relationship (ER) Model) → Schema (Relational Model) → Physical Storage (Files and Indexes)]
The advantage of RDBMS
• It separates the logical level (schema) from the physical level (implementation).
• Physical data independence
  – Users do not worry about how their data is stored and processed on the physical devices.
  – It is all SQL!
  – Their queries work over (almost) all RDBMS deployments.
DBMS Architecture
[Figure: DBMS architecture]
• Users: User / Web Forms / Applications / DBA, issuing queries and transactions
• Query path: Query Parser → Query Rewriter → Query Optimizer → Query Executor
• Storage path: Files & Access Methods → Buffer Manager → Storage Manager → Storage
• Transaction Manager: Logging & Recovery, Lock Manager
• Main Memory: Buffers, Lock Tables
• Process manager
Challenges at the physical level
• Processor: 10000 – 100000 MIPS
• Main memory: around 10 Gb/sec
• Secondary storage: higher capacity and durability
• Disk random access
  – Seek time + rotational latency + transfer time
  – Seek time: 4 ms – 15 ms!
  – Rotational latency: 2 ms – 7 ms!
  – Transfer rate: at most 1000 Mb/sec
  – Read, write in blocks.
Gloomy future: Moore's law
• Processor speed and the cost and maximum capacity of storage improve exponentially over time.
• But storage (main and secondary) access time improves much more slowly.
Random access versus sequential access
• Disk random access: seek time + rotational latency + transfer time.
• Disk sequential access: reading blocks next to each other
  – No seek time or rotational latency
  – Much faster than random access
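To make the gap concrete, here is a rough back-of-the-envelope sketch of the two cost formulas. The specific values (10 ms seek, 4 ms rotational latency, 4096-byte blocks, and reading 1000 Mb/sec as 125,000,000 bytes per second) are assumptions picked from within the ranges quoted in this lecture, and the sequential case is assumed to pay for one initial positioning of the head.

```python
def random_read_ms(n_blocks, block_bytes, seek_ms, rotation_ms, bytes_per_sec):
    """Random access: every block pays seek + rotational latency + transfer."""
    transfer_ms = block_bytes / bytes_per_sec * 1000.0
    return n_blocks * (seek_ms + rotation_ms + transfer_ms)


def sequential_read_ms(n_blocks, block_bytes, seek_ms, rotation_ms, bytes_per_sec):
    """Sequential access (assumption): position once, then only transfer time."""
    transfer_ms = block_bytes / bytes_per_sec * 1000.0
    return seek_ms + rotation_ms + n_blocks * transfer_ms


# Hypothetical parameters chosen inside the ranges given in the lecture.
PARAMS = dict(block_bytes=4096, seek_ms=10.0, rotation_ms=4.0,
              bytes_per_sec=125_000_000)

random_ms = random_read_ms(1000, **PARAMS)
sequential_ms = sequential_read_ms(1000, **PARAMS)
print(f"random: {random_ms:.1f} ms, sequential: {sequential_ms:.1f} ms")
```

Under these assumptions, reading 1000 blocks takes roughly 14 seconds with random access and under 50 ms sequentially; the exact ratio depends entirely on the chosen parameters.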
Units of data on physical device
• Fields: data items
• Records
• Blocks
• Files
Fields
• Fixed size
  – Integer, Boolean, …
• Variable length
  – Varchar, …
  – Null terminated
  – Size at the beginning of the string
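The two variable-length encodings above can be sketched as follows. This is only an illustration: the UTF-8 encoding, the 2-byte little-endian length prefix, and the function names are assumptions, not a format prescribed by the lecture.

```python
import struct


def encode_null_terminated(text):
    """Variable-length field ended by a zero byte."""
    data = text.encode("utf-8")
    if b"\x00" in data:
        raise ValueError("value may not contain the terminator byte")
    return data + b"\x00"


def decode_null_terminated(buf, offset=0):
    """Return (value, offset of the next field)."""
    end = buf.index(b"\x00", offset)
    return buf[offset:end].decode("utf-8"), end + 1


def encode_length_prefixed(text):
    """Variable-length field with its size stored at the beginning."""
    data = text.encode("utf-8")
    return struct.pack("<H", len(data)) + data  # assumed 2-byte length


def decode_length_prefixed(buf, offset=0):
    """Return (value, offset of the next field)."""
    (length,) = struct.unpack_from("<H", buf, offset)
    start = offset + 2
    return buf[start:start + length].decode("utf-8"), start + length
```

The length-prefixed form lets a reader skip over a field without scanning it, while the null-terminated form saves the prefix bytes but forbids the terminator inside the value.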
Records: sets of fields
• Schema
  – Number of fields, types of fields, order, …
• Fixed format and length
  – Record holds only the data items
• Variable format and length
  – Record holds fields and their size, type, … information
• Range of formats in between
Record header
• Pointer to the record schema (record type)
• Record size
• Timestamp
• Other info …
Blocks
• Collection of records
• Reduces the number of I/O accesses
• Different from OS blocks
  – Why should RDBMS manage its own blocks?
    • It knows the access pattern better than OS.
• Separating records in a block
  – Fixed size records: no worry!
  – Markers between records
  – Keep record size information in records or block header.
Spanned versus un-spanned
• Un-spanned
  – Each record belongs to only one block
• Spanned
  – Records may be stored across multiple blocks
  – Saves space
  – The only way to deal with large records and fields: blob, image, …
Block header
• Data about block
• File, relation, DB IDs
• Block ID and type
• Record directory
• Pointer to free space
• Timestamp
• Other info …
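A minimal sketch of how a record directory and a free-space pointer might work together in a block. It is a simplification under stated assumptions: the class and attribute names are hypothetical, and the header is kept in Python attributes rather than serialized into the block bytes.

```python
class Block:
    """Toy un-spanned block with a record directory and a free-space pointer."""

    def __init__(self, size=4096):
        self.data = bytearray(size)
        self.directory = []    # one (offset, length) entry per record
        self.free_pointer = 0  # first unused byte in the block

    def insert(self, record):
        """Store the record; return its slot number, or None if it does not fit."""
        if self.free_pointer + len(record) > len(self.data):
            return None
        offset = self.free_pointer
        self.data[offset:offset + len(record)] = record
        self.directory.append((offset, len(record)))
        self.free_pointer += len(record)
        return len(self.directory) - 1

    def read(self, slot):
        """Fetch a record through its directory entry."""
        offset, length = self.directory[slot]
        return bytes(self.data[offset:offset + length])


block = Block(size=16)  # tiny block so the overflow case is easy to see
first = block.insert(b"Bud")
second = block.insert(b"Miller")
```

Because the directory records each offset and length, variable-length records need no markers between them, and a (block ID, slot) pair can serve as a record id.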
Heap versus sorted files
• Heap files
  – There is not any order in the file
  – New blocks are inserted at the end of the file.
• Sorted files
  – Order blocks (and records) based on some key.
  – Physically contiguous or using links to the next blocks.
Average cost of data operations
• Insertion
  – Heap files are more efficient.
  – Overflow areas for sorted files.
• Search for a record or a range of records
  – Sorted files are more efficient.
• Deletion
  – Heap files are more efficient
  – Although we find the record faster in the sorted file.
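The trade-off can be illustrated with in-memory lists standing in for files. This is a sketch, not a cost model: it counts records examined rather than block I/Os, and all function names are hypothetical.

```python
import bisect


def heap_insert(records, key):
    records.append(key)  # new data simply goes at the end


def heap_search(records, key):
    """Linear scan; returns (found, number of records examined)."""
    for examined, current in enumerate(records, start=1):
        if current == key:
            return True, examined
    return False, len(records)


def sorted_insert(records, key):
    bisect.insort(records, key)  # later records shift to keep the order


def sorted_search(records, key):
    """Binary search; returns (found, number of records probed)."""
    low, high, probes = 0, len(records), 0
    while low < high:
        mid = (low + high) // 2
        probes += 1
        if records[mid] == key:
            return True, probes
        if records[mid] < key:
            low = mid + 1
        else:
            high = mid
    return False, probes


def sorted_range(records, low, high):
    """All keys in [low, high], located with two binary searches."""
    return records[bisect.bisect_left(records, low):bisect.bisect_right(records, high)]


heap, ordered = [], []
for key in range(1023, -1, -1):  # insert keys 1023 down to 0
    heap_insert(heap, key)
    sorted_insert(ordered, key)
```

In this run every sorted insert lands at the front and shifts all existing keys, while the heap insert is a plain append; searches show the opposite picture, with the sorted file needing about log2(n) probes.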
Row versus column stores
• We have talked about row store
  – All fields of a record are stored together.
SSN1 Name1 Age1 Salary1
SSN2 Name2 Age2 Salary2
SSN3 Name3 Age3 Salary3
Row versus column stores
• We can store the fields in columns.
  – We can store SSNs implicitly.
SSN1 Name1
SSN2 Name2
SSN3 Name3

SSN1 Age1
SSN2 Age2
SSN3 Age3

SSN1 Salary1
SSN2 Salary2
SSN3 Salary3
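A small sketch of the same relation in both layouts. The ages and salaries are made-up placeholder values, and the helper names are hypothetical; the record identifier is implicit in the position within each column.

```python
FIELDS = ("ssn", "name", "age", "salary")

# Row store: all fields of a record are kept together (placeholder values).
ROWS = [
    ("SSN1", "Name1", 31, 5000),
    ("SSN2", "Name2", 45, 7000),
    ("SSN3", "Name3", 28, 4000),
]


def to_column_store(rows, fields):
    """One list per field; position i in every list belongs to record i."""
    return {field: [row[i] for row in rows] for i, field in enumerate(fields)}


def fetch_record(columns, fields, position):
    """Rebuilding a full record touches every column."""
    return tuple(columns[field][position] for field in fields)


columns = to_column_store(ROWS, FIELDS)

# An analytical query reads only the column it needs.
average_age = sum(columns["age"]) / len(columns["age"])
```

The aggregate scans a single list, whereas reassembling one record needs a lookup in every column, which hints at why each layout suits a different workload.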
Row versus column store
• Column store
  – Compact storage
  – Faster reads on data analysis and mining operations
• Row store
  – Faster writes
  – Faster reads for record access (OLTP)
• Further reading
  – Mike Stonebraker et al., “C-Store: A Column-oriented DBMS”, VLDB’05.
Access paths
• The methods that the RDBMS uses to retrieve the data.
• Attribute value(s) → Tuple(s)
Types of search queries
• Point query over Beers(name, manf)
    SELECT *
    FROM Beers
    WHERE name = 'Bud';
• Range query over Sells(bar, beer, price)
    SELECT *
    FROM Sells
    WHERE price > 2 AND price < 10;
Types of access paths
• Full table scan
  – Heap files
  – Inefficient for both point and range queries.
• Sequential access
  – Sorted files
  – Efficient for both point and range queries.
  – Inefficient to maintain
• Middle ground?
Indexing
• An old idea
Index
• A data structure that speeds up selecting tuples in a relation based on some search keys.
• Search key
  – A subset of the attributes in a relation
  – May not be the same as the (primary) key
• Entries in an index
  – (k, r)
  – k is the search key.
  – r is the pointer to a record (record id).
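The (k, r) entries can be sketched as a sorted list, with r taken to be a (block number, slot) pair. The Beers contents and the `manf1`-style manufacturer values are placeholders, and the function names are hypothetical.

```python
import bisect

# Data file: a list of blocks, each holding Beers(name, manf) records.
# The records are in no particular order (a heap file).
BEERS = [
    [("Miller", "manf2"), ("Bud", "manf1")],
    [("Coors", "manf3")],
]


def build_dense_index(data_file, key_of):
    """One (k, r) entry per record, sorted by search key k."""
    entries = [
        (key_of(record), (block_no, slot))
        for block_no, block in enumerate(data_file)
        for slot, record in enumerate(block)
    ]
    entries.sort()
    return entries


def lookup(index, data_file, key):
    """Binary-search the index, then follow each record id into the data file."""
    pos = bisect.bisect_left(index, (key,))
    matches = []
    while pos < len(index) and index[pos][0] == key:
        block_no, slot = index[pos][1]
        matches.append(data_file[block_no][slot])
        pos += 1
    return matches


name_index = build_dense_index(BEERS, key_of=lambda record: record[0])
```

Calling `lookup(name_index, BEERS, "Bud")` answers the point query on name without scanning the whole data file.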
Index
• Data file stores the table data.
• Index file stores the index data structure.
• Index file is smaller than the data file.
• Ideally, the index should fit in the main memory.
[Figure: an index file with entries 10, 20, ..., 80, each pointing to the record with that key in the data file (10, 20, 30, ...)]
Index categorizations
• Clustered vs. unclustered
  – Clustered: records are stored according to the index order.
  – Unclustered: records are stored in another order, or not any order.
• Dense vs. sparse
  – Dense: each record is pointed to by an entry in the index.
  – Sparse: each block has an entry in the index.
  – Size versus time tradeoff.
• Primary vs. secondary
  – Primary: the primary key is the search key.
  – Secondary: other attributes are the search key.
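Index categorizations
• Clustered and dense
[Figure: dense index with one entry per key (10, 20, 30, ..., 80), each pointing to the matching record in the data file, which is sorted on the same key (10, 20, 30, ...)]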
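Index categorizations
• Clustered and sparse
[Figure: sparse index with entries 10, 30, 50, 70, 90, 110, one per data block; the data file is sorted (10, 20, 30, ..., 80) with two records per block]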
Duplicate search keys
• Clustered and dense
[Figure: dense index with one entry per distinct key (10, 20, 30, 40, 50, 60); data file sorted with duplicates: 10, 10, 10, 20, 20, 30, 40, 50]
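Duplicate search keys
• Clustered and sparse:
  – Any problem?
[Figure: sparse index holding the first key of each block (10, 10, 20, 40, ...); data file with two records per block: (10, 10), (10, 20), (20, 30), (40, 50)]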
Duplicate search keys
• Clustered and sparse:
  – Point to the lowest new search key in every block
[Figure: sparse index with entries 10, 20, 30, 40, ...; data file with two records per block: (10, 10), (10, 20), (20, 30), (40, 50)]
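A sketch of the "lowest new search key in every block" rule, using the data file from the figure with two records per block. The function names and the lookup strategy (start at the last index entry not greater than the key, then scan forward) are assumptions for illustration.

```python
import bisect

# Sorted data file with duplicates, two records per block.
BLOCKS = [[10, 10], [10, 20], [20, 30], [40, 50]]


def build_sparse_index(blocks):
    """At most one entry per block: the lowest key not seen in earlier blocks."""
    index = []
    largest_seen = None
    for block_no, block in enumerate(blocks):
        new_keys = [k for k in block if largest_seen is None or k > largest_seen]
        if new_keys:
            index.append((new_keys[0], block_no))
        largest_seen = block[-1]
    return index


def lookup(blocks, index, key):
    """Return the (block, slot) positions of every record with this key."""
    keys = [k for k, _ in index]
    pos = bisect.bisect_right(keys, key) - 1  # last entry with k <= key
    if pos < 0:
        return []
    block_no = index[pos][1]
    hits = []
    # Duplicates may continue into following blocks, so keep scanning
    # while a block still starts at or below the key.
    while block_no < len(blocks) and blocks[block_no][0] <= key:
        hits += [(block_no, slot)
                 for slot, k in enumerate(blocks[block_no]) if k == key]
        block_no += 1
    return hits


sparse_index = build_sparse_index(BLOCKS)
```

With this rule the entry for 20 points at the second block, where 20 first appears, so a search for 20 does not skip the copy stored before the block that begins with 20.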
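Unclustered Index
• Dense / sparse?
[Figure: index sorted on the search key with duplicates (10, 10, 10, 20, 20, 30, 30, 40), pointing into a data file that is not sorted on that key (30, 10, 20, 30, 10, 20, 10, 40)]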
Well known index structures
• B+ trees:
  – Very popular
• Hash tables:
  – Not frequently used
B+ trees
• The index of a very large data file gets too large.
• How about building an index for the index file?
• A multi-level index, or a tree
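B+ trees
• Degree (order) of the tree: d
• Each node (except the root) stores between d and 2d keys.
[Figure: a non-leaf node with keys 10, 32, 94 whose four child pointers cover the key ranges [A, 10), [10, 32), [32, 94), [94, B); leaf nodes such as (12, 28, 32) and (39, 41, 65) point to the records]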
Example (d = 2)
[Figure: B+ tree with root (60); internal nodes (19, 50) and (80, 90, 110); leaf nodes (12, 13, 17), (19, 21, 30, 40), (50, 52), (60, 65, 72), ...; each leaf key points to its record]
B+ tree tuning
• How to choose the value of d?
  – Each node should fit in a block.
• Example
  – Key value: 8 bytes
  – Record pointer: 16 bytes
  – Block size: 4096 bytes
  – 2d * 8 + (2d + 1) * 16 <= 4096
  – d <= 85
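The arithmetic in the example generalizes directly. In this sketch the function name is hypothetical, and the only constraint modeled is the one from the slide: a node with 2d keys and 2d + 1 pointers must fit in one block.

```python
def max_degree(block_size, key_size, pointer_size):
    """Largest d with 2d * key_size + (2d + 1) * pointer_size <= block_size."""
    # Rearranged: 2d * (key_size + pointer_size) <= block_size - pointer_size
    return (block_size - pointer_size) // (2 * (key_size + pointer_size))


d = max_degree(block_size=4096, key_size=8, pointer_size=16)
used_bytes = 2 * d * 8 + (2 * d + 1) * 16
```

For the values on the slide this gives d = 85, with 4096 bytes used by a full node; a real node would also reserve room for a header, which this sketch ignores.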
Retrieving tuples using B+ tree
• Point queries
  – Start from the root and follow the links to the leaf.
• Range queries
  – Find the lowest point in the range.
  – Then, follow the links between the nodes.
• The top levels are kept in the buffer pool.
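Both query types can be sketched on a small tree. The `Node` class and function names are hypothetical, duplicates are ignored, and the sample tree is just the left subtree of the lecture's example (keys 19 and 50 over three leaves), with a key equal to a separator routed to the right child.

```python
import bisect


class Node:
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children  # None for leaf nodes
        self.next = None          # link to the next leaf (leaves only)


def find_leaf(node, key):
    """Start from the root and follow the links down to a leaf."""
    while node.children is not None:
        node = node.children[bisect.bisect_right(node.keys, key)]
    return node


def point_query(root, key):
    return key in find_leaf(root, key).keys


def range_query(root, low, high):
    """Find the leaf for the lowest point, then follow the leaf links."""
    leaf = find_leaf(root, low)
    result = []
    while leaf is not None:
        for key in leaf.keys:
            if key > high:
                return result
            if key >= low:
                result.append(key)
        leaf = leaf.next
    return result


leaves = [Node([12, 13, 17]), Node([19, 21, 30, 40]), Node([50, 52])]
for left, right in zip(leaves, leaves[1:]):
    left.next = right
root = Node([19, 50], children=leaves)
```

A point query costs one node visit per level, and a range query adds only the leaves that overlap the range.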
B+ tree and index categories
• B+ tree index could be
  – Dense / sparse?
  – Clustered / unclustered?
Inserting a new key
• Pick the proper leaf node and insert the key.
• If the node contains more than 2d keys, split the node and insert the new node (with its separating key) in the parent.
  – If leaf level, add K3 to the right node
[Figure: node (K1 K2 K3 K4 K5) with pointers R0 ... R5 splits into (K1 K2) with R0 R1 R2 and (K4 K5) with R3 R4 R5; (K3, pointer to the new node) goes to the parent]
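The split rule can be sketched on plain key lists. Pointers are left out, the function names are hypothetical, and only a node holding exactly 2d + 1 keys is split.

```python
import bisect


def split_node(keys, d, is_leaf):
    """Split an overfull node; return (left keys, key for the parent, right keys)."""
    assert len(keys) == 2 * d + 1
    left, middle, right = keys[:d], keys[d], keys[d + 1:]
    if is_leaf:
        # At the leaf level the middle key also stays in the right node.
        right = [middle] + right
    return left, middle, right


def insert_into_leaf(keys, key, d):
    """Insert in order; split only if the leaf now holds more than 2d keys."""
    keys = list(keys)
    bisect.insort(keys, key)
    if len(keys) <= 2 * d:
        return keys, None, None
    return split_node(keys, d, is_leaf=True)
```

For example, with d = 2, `insert_into_leaf([19, 21, 30, 40], 20, 2)` yields the leaves (19, 20) and (21, 30, 40) and sends 21 up to the parent.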
Insertion
Insert K = 18
[Figure: the example tree (d = 2) before the insert: root (60); internal nodes (19, 50) and (80, 90, 110); leaves (12, 13, 17), (19, 21, 30, 40), (50, 52), (60, 65, 72), ...]
Insertion
Insert K = 18
[Figure: 18 is added to the first leaf, which becomes (12, 13, 17, 18); the rest of the tree is unchanged]
Insertion
Insert K = 20
[Figure: 20 is added to the second leaf, which becomes (19, 20, 21, 30, 40)]
Insertion
Need to split the node
[Figure: leaf (19, 20, 21, 30, 40) now holds five keys, more than 2d = 4]
Insertion
Split and update the parent node.
What if we need to split the root?
[Figure: the leaf splits into (19, 20) and (21, 30, 40); key 21 is inserted in the parent, which becomes (19, 21, 50)]
Deletion
Delete K = 21
[Figure: tree before the delete: root (60); internal nodes (19, 21, 50) and (80, 90, 110); leaves (12, 13, 17, 18), (19, 20), (21, 30, 40), (50, 52), (60, 65, 72), ...]
Deletion
Note: K = 21 may still remain in the internal levels
[Figure: leaf (21, 30, 40) becomes (30, 40); the parent still holds (19, 21, 50)]
Deletion
Delete K = 20
[Figure: tree before the delete: internal node (19, 21, 50); leaves (12, 13, 17, 18), (19, 20), (30, 40), (50, 52), ...]
Deletion
We need to update the number of keys on the node.
Borrow from siblings: rotate
[Figure: after removing 20, leaf (19) holds one key, fewer than d = 2; its left sibling is (12, 13, 17, 18)]
Deletion
We need to update the number of keys on the node.
Borrow from siblings: rotate
[Figure: key 18 moves from the left sibling to the underfull leaf; the leaves become (12, 13, 17) and (18, 19)]
Deletion
We need to update the number of keys on the node.
Borrow from siblings: rotate
[Figure: the separator key in the parent changes from 19 to 18; the parent becomes (18, 21, 50)]
Deletion
What if we cannot borrow from siblings?
Example: delete K = 30
[Figure: internal node (18, 21, 50); leaves (12, 13, 17), (18, 19), (30, 40), (50, 52), ...]
Deletion
What if we cannot borrow from siblings?
Merge with a sibling.
[Figure: after removing 30, leaf (40) holds one key; its siblings (18, 19) and (50, 52) hold only d = 2 keys each, so neither can lend one]
Deletion
What if we cannot borrow from siblings?
Merge siblings!
[Figure: leaves (18, 19) and (40) are merged into (18, 19, 40)]
Deletion
What to do with the dangling key and pointer? Simply remove them.
[Figure: key 21 in the parent (18, 21, 50) and the pointer to the emptied leaf are left dangling after the merge]
Deletion
Final tree
[Figure: root (60); internal nodes (18, 50) and (80, 90, 110); leaves (12, 13, 17), (18, 19, 40), (50, 52), (60, 65, 72), ...]
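The two repair strategies from the deletion walkthrough can be sketched for leaves as follows. This is a simplified illustration: the function name is hypothetical, only the left sibling is considered, and pointers and parent updates are reduced to returning the new separator key.

```python
def fix_leaf_underflow(left, node, d):
    """Repair a leaf with fewer than d keys using its left sibling.

    Returns (action, new left keys, new node keys, new separator key).
    """
    if len(left) > d:
        # Borrow (rotate): move the sibling's largest key over and make it
        # the separator between the two leaves in the parent.
        new_left, new_node = left[:-1], [left[-1]] + node
        return "borrow", new_left, new_node, new_node[0]
    # The sibling has no key to spare: merge, and let the caller remove
    # the dangling separator key and pointer from the parent.
    return "merge", left + node, [], None
```

Applied to the walkthrough with d = 2, `fix_leaf_underflow([12, 13, 17, 18], [19], 2)` borrows 18 and makes it the new separator, while `fix_leaf_underflow([18, 19], [40], 2)` merges the leaves into (18, 19, 40).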
Index creation
CREATE TABLE Person(ID int, Name varchar(50), Pos int, Age int);
-- Default index type is normally B-tree.
CREATE INDEX Person_ID ON Person(ID);
-- Cluster the table on the Person_ID index (PostgreSQL syntax).
CLUSTER Person USING Person_ID;
-- Multi-attribute index
CREATE INDEX Pos_Age ON Person(Pos, Age);
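One way to check whether a query actually uses an index is to ask the system for its plan. The sketch below uses Python's built-in `sqlite3` module purely for convenience; SQLite is an assumption here, it has no CLUSTER command, and the `ID int` column is assumed so that the Person_ID index has something to index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Person(ID int, Name varchar(50), Pos int, Age int)")
conn.execute("CREATE INDEX Person_ID ON Person(ID)")
conn.execute("CREATE INDEX Pos_Age ON Person(Pos, Age)")  # multi-attribute index

# Ask SQLite how it would run a query that constrains both indexed attributes.
plan_rows = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM Person WHERE Pos = 3 AND Age = 40"
).fetchall()

# The last column of each plan row is a human-readable description.
plan_text = " ".join(row[-1] for row in plan_rows)
print(plan_text)
conn.close()
```

The printed plan names the index the optimizer picked; the exact wording differs between SQLite versions, and other systems expose the same information through their own EXPLAIN variants.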
Index selection
• Let’s index every attribute on every table to speed up all queries!
• Indexes generally slow down data manipulation
  – INSERT, DELETE, UPDATE.
Index selection
• Given a query workload and a schema, find the set of indexes that optimizes the execution.
• The query workload:
  – Queries and their frequencies.
  – Queries are both data retrieval (SELECT) and data manipulation (INSERT, UPDATE, DELETE).
Index selection
• Part of physical database design
  – File structure, indexing, tuning queries, …
• Physical database design may affect logical design!
  – Change the schema to run the queries faster
Index selection
• Generally a hard problem.
• RDBMS vendors provide wizards:
  – Started with AutoAdmin project for SQL Server
  – SQL Server / Oracle Index Tuning Wizard
  – DB2 Index Advisor
• They try many configurations and pick the one that minimizes the time and overheads.
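The "try many configurations" idea can be illustrated with a toy exhaustive search. Everything numeric here is invented for illustration: the cost constants, the workload, and the cost model itself are hypothetical and are not how any of the named tools estimate cost.

```python
from itertools import combinations


def workload_cost(workload, config, scan_cost=100.0, probe_cost=3.0,
                  maintain_cost=5.0):
    """Toy cost: frequency-weighted sum over all statements in the workload."""
    total = 0.0
    for kind, attribute, frequency in workload:
        if kind == "select":
            # A selection is cheap only if its attribute is indexed.
            total += frequency * (probe_cost if attribute in config else scan_cost)
        else:
            # Data manipulation must also maintain every index.
            total += frequency * (1.0 + maintain_cost * len(config))
    return total


def best_configuration(workload, candidates):
    """Enumerate every subset of candidate indexes and keep the cheapest."""
    configs = (frozenset(subset)
               for size in range(len(candidates) + 1)
               for subset in combinations(candidates, size))
    return min(configs, key=lambda config: workload_cost(workload, config))


# Hypothetical workload: (statement kind, attribute, frequency).
WORKLOAD = [("select", "name", 50), ("select", "price", 1), ("insert", None, 40)]
best = best_configuration(WORKLOAD, ["name", "price"])
```

In this made-up workload only the index on name pays for itself: the rarely queried price attribute does not justify the extra maintenance on each insert. Exhaustive enumeration grows exponentially with the number of candidates, which is one reason the problem is hard in general.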