Post on 22-Dec-2015
CS 255: Database System Principles
slides: B-trees
By:- Arunesh Joshi Id:-006538558
Agenda
• The features and different functionalities of B-Trees in terms of index structure
• The Structure of B-Trees
• Applications of B-Trees
• Lookup in B-Trees
• Range Queries
• Insertion into B-Trees
• Deletion from a B-Tree
• Efficiency of B-Trees
B-Trees
A B-tree organizes its blocks into a tree. The tree is balanced, meaning that all paths from the root to a leaf have the same length. Typically, there are three layers in a B-tree: the root, an intermediate layer, and leaves, but any number of layers is possible.
Functionalities of B-Trees
• B-Trees automatically maintain as many levels of index as is appropriate for the size of the file being indexed.
• B-Trees manage the space on the blocks they use so that every block is between half used and completely full. No overflow blocks are needed.
Structure of B-Trees
• A B-Tree typically has three layers: the root, an intermediate layer, and the leaves
• In a B-Tree, each block has space for n search-key values and n+1 pointers
[next slide explains the structure of a B-Tree]
B-Tree Example n=3
[Figure: three-level B-tree]
Root: 100
Non-leaf nodes: [30] [120 150 180]
Leaves: [3 5 11] [30 35] [100 101 110] [120 130] [150 156 179] [180 200]
Sample non-leaf
Keys: 57 81 95
Pointer 1: to keys k < 57
Pointer 2: to keys 57 ≤ k < 81
Pointer 3: to keys 81 ≤ k < 95
Pointer 4: to keys k ≥ 95
Sample leaf node:
Keys: 57 81 95 (the leaf is reached from a non-leaf node)
Pointer 1: to record with key 57
Pointer 2: to record with key 81
Pointer 3: to record with key 95
Last pointer: to next leaf in sequence
In textbook’s notation n=3
[Figure: a leaf holding keys 30 and 35 and a non-leaf holding key 30, each drawn in the textbook’s notation]
Size of nodes: n+1 pointers, n keys (fixed)
Don’t want nodes to be too empty
• Use at least
Non-leaf: ⌈(n+1)/2⌉ pointers
Leaf: ⌊(n+1)/2⌋ pointers to data
n=3
            Full node      Min. node
Non-leaf    120 150 180    30
Leaf        3 5 11         30 35
(Annotation in the figure: a pointer counts even if null.)
B-tree rules (tree of order n)
(1) All leaves at same lowest level (balanced tree)
(2) Pointers in leaves point to records, except for the “sequence pointer”
Number of pointers/keys for B+tree

                      Max ptrs   Max keys   Min ptrs→data   Min keys
Non-leaf (non-root)   n+1        n          ⌈(n+1)/2⌉       ⌈(n+1)/2⌉ - 1
Leaf (non-root)       n+1        n          ⌊(n+1)/2⌋       ⌊(n+1)/2⌋
Root                  n+1        n          1               1
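The bounds in this table can be evaluated for a concrete n. The following is a minimal Python sketch, not code from the slides: the function name `occupancy_bounds` is made up for illustration, and it assumes the textbook convention of a ceiling for non-leaf nodes and a floor for leaves.

```python
def occupancy_bounds(n):
    """Return {node kind: (max ptrs, max keys, min ptrs, min keys)} for order n."""
    ceil_half = (n + 2) // 2    # ceil((n+1)/2) in integer arithmetic
    floor_half = (n + 1) // 2   # floor((n+1)/2)
    return {
        "non-leaf": (n + 1, n, ceil_half, ceil_half - 1),
        "leaf": (n + 1, n, floor_half, floor_half),
        "root": (n + 1, n, 1, 1),   # root minimums as given in the table
    }

for kind, bounds in occupancy_bounds(3).items():
    print(kind, bounds)
```

For n=3 the non-leaf and leaf minimums are both 2 pointers; the two rows only differ when n is even, for example n=4.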
Applications of B-trees
1. The search key of the B-tree is the primary key for the data file, and the index is dense. That is, there is one key-pointer pair in a leaf for every record of the data file. The data file may or may not be sorted by primary key.
2. The data file is sorted by its primary key, and the B-tree is a sparse index with one key-pointer pair at a leaf for each block of the data file.
3. The data file is sorted by an attribute that is not a key, and this attribute is the search key for the B-tree. For each key value K that appears in the data file there is one key-pointer pair at a leaf. That pointer goes to the first of the records that have K as their sort-key value.
Lookup in B-Trees
• Suppose we want to find a record with search key 40.
• Start at the root. The root key is 13 and 40 ≥ 13, so the search follows the pointer to the right subtree.
• Repeat the same comparison at each node until a leaf is reached.
Looking for key “40” (not present)
[Figure: B-tree searched for key 40]
Root: 13
Non-leaf nodes: [7] [23 31 43]
Leaves: [2 3 5] [7 11] [13 17 19] [23 29] [31 37 41] [43 47]
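The lookup procedure can be sketched in Python as follows. This is an illustration under stated assumptions, not the slides' own code: the tuple-based node layout, the helper `leaf`, and the names `TREE` and `lookup` are hypothetical, and the grouping of keys into nodes is reconstructed from the figure.

```python
from bisect import bisect_right

# A node is (keys, children) for a non-leaf, or (keys, None) for a leaf.
def leaf(*keys):
    return (list(keys), None)

TREE = ([13], [
    ([7], [leaf(2, 3, 5), leaf(7, 11)]),
    ([23, 31, 43], [leaf(13, 17, 19), leaf(23, 29),
                    leaf(31, 37, 41), leaf(43, 47)]),
])

def lookup(node, key):
    """Return True if key is stored in a leaf of the tree rooted at node."""
    keys, children = node
    while children is not None:
        # Child i holds the keys k with keys[i-1] <= k < keys[i].
        node = children[bisect_right(keys, key)]
        keys, children = node
    return key in keys

print(lookup(TREE, 40))   # False: the search ends in leaf [31 37 41]
print(lookup(TREE, 41))   # True
```

A real index would return the record pointer stored next to the key rather than a boolean; the traversal is the same.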
Range Queries
• B-trees support queries that ask for a range of values, for example:
SELECT * FROM R WHERE R.k >= 10 AND R.k <= 25;
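A range query of this kind finds the leaf where the lower bound would be and then walks the leaves in sequence until a key exceeds the upper bound. The sketch below is a simplified Python illustration: the list `LEAVES` and the function `range_query` are hypothetical names, the leaf level is modelled as a plain list, and stepping to the next list element stands in for following a sequence pointer.

```python
from bisect import bisect_right

# Leaf level of a B-tree, left to right (sample keys, chosen for illustration).
LEAVES = [[2, 3, 5], [7, 11], [13, 17, 19], [23, 29], [31, 37, 41], [43, 47]]

def range_query(leaves, low, high):
    """Return all keys k with low <= k <= high, in sorted order."""
    # Locate the leaf that could hold `low`. A real B-tree would do this
    # with a root-to-leaf lookup instead of a search over first keys.
    first_keys = [lf[0] for lf in leaves]
    start = max(bisect_right(first_keys, low) - 1, 0)
    result = []
    for lf in leaves[start:]:
        for k in lf:
            if k > high:
                return result   # past the range: stop scanning
            if k >= low:
                result.append(k)
    return result

print(range_query(LEAVES, 10, 25))   # [11, 13, 17, 19, 23]
```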
Insert into B-tree
(a) simple case – space available in leaf
(b) leaf overflow
(c) non-leaf overflow
(d) new root
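(a) Insert key = 32, n=3
[Figure: part of the tree; non-leaf keys shown: 30, 100]
Leaves before: [3 5 11] [30 31]
Leaves after: [3 5 11] [30 31 32]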
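(b) Insert key = 7, n=3
[Figure: part of the tree; non-leaf keys shown: 30, 100]
Leaves before: [3 5 11] [30 31]
Leaves after: [3 5] [7 11] [30 31]
Key 7 is added to the parent, next to 30.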
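(c) Insert key = 160, n=3
[Figure: part of the tree]
Before: root key 100; non-leaf [120 150 180]; leaves [150 156 179] [180 200]
After: leaf [150 156 179] splits into [150 156] [160 179]; the non-leaf overflows and splits into [120 150] and [180]; key 160 moves up to the root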
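(d) New root, insert 45, n=3
Before: root [10 20 30]; leaves [1 2 3] [10 12] [20 25] [30 32 40]
After: leaf [30 32 40] splits into [30 32] [40 45]; the old root overflows and splits into [10 20] and [40]; key 30 goes into a new root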
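The leaf-level step, which is all of cases (a) and (b) and the first step of cases (c) and (d), can be sketched in Python. This is a minimal illustration rather than a full insertion algorithm: the function name `insert_into_leaf` and the list-based leaf are assumptions, and the split point assumes the first ⌈(n+1)/2⌉ keys stay in the original leaf. Splitting non-leaf nodes and creating a new root are not covered.

```python
from bisect import insort

def insert_into_leaf(leaf, key, n):
    """Insert key into a sorted leaf of order n.

    Returns (left, right, separator). When the leaf does not overflow,
    right and separator are None.
    """
    keys = list(leaf)
    insort(keys, key)
    if len(keys) <= n:
        return keys, None, None          # case (a): space available
    # Case (b): n+1 keys. Keep the first ceil((n+1)/2) keys in the old
    # leaf and move the rest to a new leaf; the new leaf's smallest key
    # is the separator that must be inserted into the parent.
    mid = (n + 2) // 2
    left, right = keys[:mid], keys[mid:]
    return left, right, right[0]

print(insert_into_leaf([30, 31], 32, 3))     # ([30, 31, 32], None, None)
print(insert_into_leaf([3, 5, 11], 7, 3))    # ([3, 5], [7, 11], 7)
```

The same call with leaf [150, 156, 179] and key 160 returns separator 160, which is the key that then overflows the non-leaf node in case (c).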
Deletion from B-tree
(a) Simple case - no example
(b) Coalesce with neighbor (sibling)
(c) Re-distribute keys
(d) Cases (b) or (c) at non-leaf
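(b) Coalesce with sibling – Delete 50, n=4
Before: parent [10 40 100]; leaves [10 20 30] [40 50]
After: leaf [40] is too empty and merges into its sibling, giving [10 20 30 40]; key 40 is removed from the parent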
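(c) Redistribute keys – Delete 50, n=4
Before: parent [10 40 100]; leaves [10 20 30 35] [40 50]
After: key 35 moves to the right leaf, giving [10 20 30] [35 40]; the parent key 40 becomes 35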
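Cases (a), (b), and (c) at the leaf level can be sketched in Python. The policy here (coalesce when the remaining keys fit in one leaf, otherwise move one key over from the left sibling) is an assumption chosen to reproduce the two Delete 50 examples; other orderings of the two options are possible. The name `delete_from_leaf` and the list-based leaves are hypothetical, and parent updates are only described in comments.

```python
def delete_from_leaf(sibling, leaf, key, n):
    """Delete key from `leaf`, whose left neighbor is `sibling`.

    Returns (action, sibling, leaf). After "coalesce" the leaf is None
    and all remaining keys are in the sibling.
    """
    min_keys = (n + 1) // 2          # floor((n+1)/2) keys in a non-root leaf
    sibling = list(sibling)
    leaf = [k for k in leaf if k != key]
    if len(leaf) >= min_keys:
        return "simple", sibling, leaf
    if len(sibling) + len(leaf) <= n:
        # Everything fits in one leaf: merge. The parent loses one
        # key-pointer pair, which may make the parent too empty in turn.
        return "coalesce", sibling + leaf, None
    # Otherwise borrow the sibling's largest key; the parent's separator
    # must then be replaced by the new first key of `leaf`.
    leaf.insert(0, sibling.pop())
    return "redistribute", sibling, leaf

print(delete_from_leaf([10, 20, 30], [40, 50], 50, 4))
print(delete_from_leaf([10, 20, 30, 35], [40, 50], 50, 4))
```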
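(d) Non-leaf coalesce – Delete 37, n=4
Before: root [25]; non-leaf nodes [10 20] [30 40]; leaves [1 3] [10 14] [20 22] [25 26] [30 37] [40 45]
After: leaf [30] merges with [40 45], giving [30 40 45]; non-leaf [30 40] shrinks to [40] and is too empty, so it coalesces with [10 20], pulling 25 down from the old root; the merged node [10 20 25 40] is the new root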
B-tree deletions in practice
– Often, coalescing is not implemented
– Too hard and not worth it!
Why do we take 3 as the number of levels of a B-tree?
Suppose our blocks are 4096 bytes. Also let keys be integers of 4 bytes and let pointers be 8 bytes. If there is no header information kept on the blocks, then we want to find the largest integer value of n such that
4n + 8(n + 1) ≤ 4096. That value is n = 340, so 340 key-pointer pairs fit in one block for our example data. Suppose that the average block has an occupancy midway between the minimum and maximum, i.e., a typical block has 255 pointers. With a root, 255 children, and 255 × 255 = 65,025 leaves, we shall have among those leaves 255³, or about 16.6 million, pointers to records. That is, files with up to 16.6 million records can be accommodated by a 3-level B-tree.
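The arithmetic of this estimate can be checked with a few lines of Python. The function name `max_keys_per_block` is made up for illustration; the block, key, and pointer sizes and the typical fan-out of 255 are the values stated in the text.

```python
def max_keys_per_block(block_bytes, key_bytes, ptr_bytes):
    """Largest n with key_bytes*n + ptr_bytes*(n+1) <= block_bytes."""
    return (block_bytes - ptr_bytes) // (key_bytes + ptr_bytes)

n = max_keys_per_block(4096, 4, 8)
fanout = 255                       # typical pointers per block, from the text
print(n)                           # 340
print(fanout ** 2)                 # 65025 leaves
print(fanout ** 3)                 # 16581375 record pointers, about 16.6 million
```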
Thank you for bearing with me.