data structures and algorithms course’s slides: hierarchical data structures algis
Data Structures and Algorithms
Course’s slides: Hierarchical data structures
www.mif.vu.lt/~algis
Trees
Linear access time of linked lists is prohibitive
Does there exist any simple data structure for which the running time of most operations (search, insert, delete) is O(log N)?
Trees
A tree is a collection of nodes
The collection can be empty
(recursive definition) If not empty, a tree consists of a distinguished node r (the root), and zero or more nonempty subtrees T1, T2, ...., Tk, each of whose roots are connected by a directed edge from r
Some Terminologies
Child and parent: every node except the root has one parent; a node can have an arbitrary number of children
Leaves: nodes with no children
Siblings: nodes with the same parent
Some Terminologies
Path length: number of edges on the path
Depth of a node: length of the unique path from the root to that node. The depth of a tree is equal to the depth of the deepest leaf
Height of a node: length of the longest path from that node to a leaf; all leaves are at height 0. The height of a tree is equal to the height of the root
Ancestor and descendant; proper ancestor and proper descendant
Example: UNIX Directory
Binary Trees
A tree in which no node can have more than two children
The depth of an “average” binary tree is considerably smaller than N, even though in the worst case the depth can be as large as N – 1.
Example: Expression Trees
Leaves are operands (constants or variables)
The other nodes (internal nodes) contain operators
Will not be a binary tree if some operators are not binary
Tree traversal
Used to print out the data in a tree in a certain order
Pre-order traversal
Print the data at the root
Recursively print out all data in the left subtree
Recursively print out all data in the right subtree
Preorder, Postorder and Inorder
Preorder traversal
node, left, right: gives the prefix expression
++a*bc*+*defg
Preorder, Postorder and Inorder
Postorder traversal
left, right, node: gives the postfix expression
abc*+de*f+g*+
Inorder traversal
left, node, right: gives the infix expression
a+b*c+d*e+f*g
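The three traversal orders can be sketched in a few lines of Python. This is a minimal illustration, not the course's own code: the `Node` class and the function names are assumptions, and the expression tree is rebuilt by hand so that it matches the three expressions shown on the slides.

```python
class Node:
    """A binary tree node holding a one-character operator or operand."""
    def __init__(self, value, left=None, right=None):
        self.value, self.left, self.right = value, left, right

def preorder(n):   # node, left, right
    return "" if n is None else n.value + preorder(n.left) + preorder(n.right)

def postorder(n):  # left, right, node
    return "" if n is None else postorder(n.left) + postorder(n.right) + n.value

def inorder(n):    # left, node, right
    return "" if n is None else inorder(n.left) + n.value + inorder(n.right)

# Expression tree for (a + b*c) + ((d*e + f) * g)
tree = Node("+",
            Node("+", Node("a"), Node("*", Node("b"), Node("c"))),
            Node("*",
                 Node("+", Node("*", Node("d"), Node("e")), Node("f")),
                 Node("g")))

print(preorder(tree))   # prefix form
print(postorder(tree))  # postfix form
print(inorder(tree))    # infix form (without parentheses)
```

Note that the inorder output drops the parentheses, so by itself it does not determine the tree.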
Preorder, Postorder and Inorder
Binary Trees
Possible operations on the Binary Tree ADT
parent
left_child, right_child
sibling
root, etc
Implementation
Because a binary tree has at most two children, we can keep direct pointers to them
Compare: Implementation of a general tree
Binary Search Trees
Stores keys in the nodes in a way so that searching, insertion and deletion can be done efficiently.
Binary search tree property
For every node X, all the keys in its left subtree are smaller than the key value in X, and all the keys in its right subtree are larger than the key value in X
Binary Search Trees
[Figure: two example trees, one labelled “A binary search tree” and one labelled “Not a binary search tree”]
Binary search trees
Average depth of a node is O(log N); maximum depth of a node is O(N)
Two binary search trees representing the same set:
Searching BST
If we are searching for 15, then we are done.
If we are searching for a key < 15, then we should search in the left subtree.
If we are searching for a key > 15, then we should search in the right subtree.
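The search rule above can be sketched as a short loop. This is an illustrative sketch only: the `Node` class, the `search` name and the small hand-built tree rooted at 15 are assumptions, not taken from the slides.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def search(node, key):
    """Return the node holding key, or None if key is absent."""
    while node is not None and node.key != key:
        # Smaller keys live in the left subtree, larger keys in the right.
        node = node.left if key < node.key else node.right
    return node

# Small example tree with 15 at the root
root = Node(15,
            Node(6, Node(3), Node(7)),
            Node(18, Node(17), Node(20)))

found = search(root, 7)
missing = search(root, 8)
```

Each step descends one level, so the number of comparisons is bounded by the height of the tree.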
Inorder traversal of BST
Print out all the keys in sorted order
Inorder: 2, 3, 4, 6, 7, 9, 13, 15, 17, 18, 20
findMin/findMax
Return the node containing the smallest element in the tree
Start at the root and go left as long as there is a left child. The stopping point is the smallest element
Similarly for findMax
Time complexity = O(height of the tree)
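A possible sketch of findMin and findMax, assuming a simple `Node` class with `key`, `left` and `right` fields (the names are hypothetical):

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def find_min(node):
    """Follow left links until there is no left child."""
    if node is None:
        return None
    while node.left is not None:
        node = node.left
    return node

def find_max(node):
    """Mirror image of find_min: follow right links."""
    if node is None:
        return None
    while node.right is not None:
        node = node.right
    return node

root = Node(15,
            Node(6, Node(3), Node(7)),
            Node(18, Node(17), Node(20)))
smallest = find_min(root).key
largest = find_max(root).key
```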
Insert
Proceed down the tree as you would with a find
If X is found, do nothing (or update something)
Otherwise, insert X at the last spot on the path traversed
Time complexity = O(height of the tree)
Delete
When we delete a node, we need to consider how we take care of the children of the deleted node.
This has to be done such that the property of the search tree is maintained.
Delete
Three cases:
(1) the node is a leaf
Delete it immediately
(2) the node has one child
Adjust a pointer from the parent to bypass that node
(3) the node has 2 children
replace the key of that node with the minimum element at the right subtree
delete the minimum element
Has either no child or only right child because if it has a left child, that left child would be smaller and would have been chosen. So invoke case 1 or 2
Time complexity = O(height of the tree)
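The three deletion cases can be sketched as follows. This is a minimal sketch under assumed names (`Node`, `delete`, `inorder`); the example tree is hypothetical and chosen so that a two-children deletion is exercised.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def delete(node, key):
    """Delete key from the subtree rooted at node and return the new subtree root."""
    if node is None:
        return None                      # key not present
    if key < node.key:
        node.left = delete(node.left, key)
    elif key > node.key:
        node.right = delete(node.right, key)
    elif node.left is None:              # cases 1 and 2: leaf, or only a right child
        return node.right
    elif node.right is None:             # case 2: only a left child
        return node.left
    else:                                # case 3: two children
        succ = node.right
        while succ.left is not None:     # minimum of the right subtree
            succ = succ.left
        node.key = succ.key
        node.right = delete(node.right, succ.key)   # falls into case 1 or 2
    return node

def inorder(node):
    return [] if node is None else inorder(node.left) + [node.key] + inorder(node.right)

root = Node(6, Node(2, Node(1), Node(4, Node(3))), Node(8))
root = delete(root, 2)       # node with two children
after_two_children = inorder(root)
root = delete(root, 8)       # leaf
after_leaf = inorder(root)
```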
AVL Trees - Lecture 8 25 12/26/03
Binary search tree – best time
All BST operations are O(d), where d is tree depth
minimum d is $d = \lfloor \log_2 N \rfloor$ for a binary tree with N nodes
What is the best case tree? What is the worst case tree?
So, best case running time of BST operations is O(log N)
Binary Search Tree - Worst Time
Worst case running time is O(N)
What happens when you insert elements in ascending order?
Insert: 2, 4, 6, 8, 10, 12 into an empty BST
Problem: Lack of “balance”: compare depths of left and right subtree
Unbalanced degenerate tree
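A quick experiment makes the degenerate case concrete. The sketch below uses a plain, unbalanced BST insert (names are assumptions) and compares the ascending order from the slide with a second insertion order of the same keys, which is my own choice for illustration.

```python
class Node:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def insert(node, key):
    if node is None:
        return Node(key)
    if key < node.key:
        node.left = insert(node.left, key)
    elif key > node.key:
        node.right = insert(node.right, key)
    return node

def height(node):
    """Height in edges; an empty tree has height -1."""
    return -1 if node is None else 1 + max(height(node.left), height(node.right))

def build(keys):
    root = None
    for k in keys:
        root = insert(root, k)
    return root

ascending = build([2, 4, 6, 8, 10, 12])    # every node has only a right child
shuffled = build([6, 2, 10, 4, 8, 12])     # same keys, friendlier order
print(height(ascending), height(shuffled))
```

With ascending input the tree is effectively a linked list, so its height is N - 1.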
Balanced and unbalanced BST
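[Figure: three binary search trees of different shapes: a five-node tree with root 4, a degenerate five-node chain starting at 1, and a seven-node tree with root 4 and children 2 and 6]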
Is this “balanced”?
Approaches to balancing trees
Don't balance
May end up with some nodes very deep
Strict balance
The tree must always be balanced perfectly
Pretty good balance
Only allow a little out of balance
Adjust on access
Self-adjusting
Balancing binary search trees
Many algorithms exist for keeping binary search trees balanced
Adelson-Velskii and Landis (AVL) trees (height-balanced trees)
Splay trees and other self-adjusting trees
B-trees and other multiway search trees
Perfect balance
Want a complete tree after every operation
tree is full except possibly in the lower right
This is expensive
For example, insert 2 in the tree on the left and then rebuild as a complete tree
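[Figure: the tree with root 6, children 4 and 9, and leaves 1, 5, 8, shown before and after “Insert 2 & complete tree”; the rebuilt tree has root 5, children 2 and 8, and leaves 1, 4, 6, 9]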
AVL - good but not perfect balance
AVL trees are height-balanced binary search trees
Balance factor of a node: height(left subtree) - height(right subtree)
An AVL tree has balance factor calculated at every node
For every node, heights of left and right subtree can differ by no more than 1
Store current heights in each node
Height of an AVL tree
N(h) = minimum number of nodes in an AVL tree of height h.
Basis
$N(0) = 1$, $N(1) = 2$
Induction
$N(h) = N(h-1) + N(h-2) + 1$
Solution (recall Fibonacci analysis)
$N(h) > \phi^h$ ($\phi \approx 1.62$)
[Figure: a minimal AVL tree of height h, whose root has subtrees of heights h-1 and h-2]
Height of an AVL Tree
$N(h) > \phi^h$ ($\phi \approx 1.62$)
Suppose we have n nodes in an AVL tree of height h.
$n \ge N(h)$ (because N(h) was the minimum)
$n > \phi^h$, hence $\log_\phi n > h$ (relatively well balanced tree!!)
$h < 1.44 \log_2 n$ (i.e., Find takes O(log n))
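The recurrence and the resulting height bound can be checked numerically. The sketch below is only a sanity check under my own naming (`min_nodes`, `PHI`); it evaluates N(h) from the recurrence and tests the inequalities for heights 1 to 40. For h = 0 the bound holds with equality, N(0) = 1 = φ^0, which is why the check starts at h = 1.

```python
import math

PHI = (1 + math.sqrt(5)) / 2   # golden ratio, about 1.618

def min_nodes(h):
    """Minimum number of nodes in an AVL tree of height h: N(h) = N(h-1) + N(h-2) + 1."""
    a, b = 1, 2                # N(0), N(1)
    if h == 0:
        return a
    for _ in range(h - 1):
        a, b = b, a + b + 1
    return b

first_values = [min_nodes(h) for h in range(6)]

# N(h) > phi^h, equivalently h < log_phi N(h), which is about 1.44 * log2 N(h)
bound_holds = all(min_nodes(h) > PHI ** h for h in range(1, 41))
height_bound_holds = all(h < math.log(min_nodes(h), PHI) for h in range(1, 41))
```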
Node Heights
[Figure: node heights in two AVL trees, Tree A (AVL) and Tree B (AVL), both with root 6 and children 4 and 9; the root has height 2 and balance factor 1 - 0 = 1]
height of node = h
balance factor = hleft - hright
empty height = -1
Node heights after insert 7
[Figure: node heights after inserting 7 into Tree A and Tree B. Tree A (AVL) stays balanced; Tree B (not AVL) has a node with balance factor 1 - (-1) = 2]
height of node = h
balance factor = hleft - hright
empty height = -1
Insert and rotation in AVL trees
Insert operation may cause balance factor to become 2 or –2 for some node
only nodes on the path from insertion point to root node have possibly changed in height
So after the Insert, go back up to the root node by node, updating heights
If a new balance factor (the difference hleft-hright) is 2 or –2, adjust tree by rotation around the node
Single Rotation in an AVL Tree
[Figure: the tree with node heights before and after a single rotation following the insertion of 7]
Insertions in AVL trees
Let the node that needs rebalancing be α.
There are 4 cases:
Outside Cases (require single rotation):
1. Insertion into left subtree of left child of α.
2. Insertion into right subtree of right child of α.
Inside Cases (require double rotation):
3. Insertion into right subtree of left child of α.
4. Insertion into left subtree of right child of α.
The rebalancing is performed through four separate rotation algorithms.
AVL Insertion: Outside Case
Consider a valid AVL subtree
[Figure: node j with left child k; k has subtrees X and Y, j has right subtree Z; X, Y and Z each have height h]
Inserting into X destroys the AVL property at node j
[Figure: X now has height h+1; Y and Z have height h]
Do a “right rotation”
[Figure: single right rotation, lifting k above j]
“Right rotation” done! (“Left rotation” is mirror symmetric)
Outside Case Completed
AVL property has been restored!
[Figure: k is now the subtree root with left subtree X (height h+1) and right child j; j has subtrees Y and Z (height h)]
AVL Insertion: Inside Case
Consider a valid AVL subtree
[Figure: node j with left child k; k has subtrees X and Y, j has right subtree Z; X, Y and Z each have height h]
Inserting into Y destroys the AVL property at node j
[Figure: Y now has height h+1; X and Z have height h]
Does “right rotation” restore balance?
“Right rotation” does not restore balance… now k is out of balance
[Figure: after the right rotation, k is the root with left subtree X (height h) and right child j; j has subtrees Y (height h+1) and Z (height h)]
Consider the structure of subtree Y…
Y = node i and subtrees V and W
[Figure: Y drawn as node i with subtrees V and W, each of height h or h-1]
We will do a left-right “double rotation” . . .
Double rotation : first rotation
left rotation complete
[Figure: i is now the left child of j and k is the left child of i; k has subtrees X and V, i has right subtree W, j has right subtree Z]
Double rotation : second rotation
Now do a right rotation
right rotation complete
Balance has been restored
[Figure: i is the subtree root with children k and j; k has subtrees X and V, j has subtrees W and Z; X and Z have height h, V and W have height h or h-1]
Implementation
[Figure: node layout with fields key, balance (1, 0, -1), left and right]
No need to keep the height; just the difference in height, i.e. the balance factor; this has to be modified on the path of insertion even if you don’t perform rotations
Once you have performed a rotation (single or double) you won’t need to go back up the tree
Single Rotation
RotateFromRight(n : reference node pointer) {
  p : node pointer;
  p := n.right;
  n.right := p.left;
  p.left := n;
  n := p
}
[Figure: node n with left subtree X and a right child that has subtrees Y and Z]
You also need to modify the heights or balance factors of n and p
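As a sketch of how the height update fits into the rotation, here is a Python version of RotateFromRight that stores heights in the nodes. The class and helper names (`Node`, `height`, `update`) are assumptions; the slides' pseudocode updates a reference parameter, whereas this version returns the new subtree root.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right
        self.height = 0

def height(n):
    return -1 if n is None else n.height   # empty height = -1

def update(n):
    n.height = 1 + max(height(n.left), height(n.right))

def rotate_from_right(n):
    """Lift n's right child p above n and return p as the new subtree root."""
    p = n.right
    n.right = p.left
    p.left = n
    update(n)   # n is now below p, so fix its height first
    update(p)
    return p

# Right-leaning chain 1 -> 2 -> 3, built bottom-up so heights are correct
c = Node(3)
b = Node(2, right=c); update(b)
a = Node(1, right=b); update(a)
root = rotate_from_right(a)
```

The order of the two `update` calls matters: the height of p depends on the already corrected height of n.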
Double Rotation
Implement Double Rotation in two lines.
DoubleRotateFromRight(n : reference node pointer) {????}
[Figure: node n with subtrees X, V, W and Z]
Insertion in AVL Trees
Insert at the leaf (as for all BST)
only nodes on the path from insertion point to root node have possibly changed in height
So after the Insert, go back up to the root node by node, updating heights
If a new balance factor (the difference hleft-hright) is 2 or –2, adjust tree by rotation around the node
Insert in BST
Insert(T : reference tree pointer, x : element) : integer {
  if T = null then
    T := new tree; T.data := x; return 1; // the links to children are null
  case
    T.data = x : return 0; // Duplicate, do nothing
    T.data > x : return Insert(T.left, x);
    T.data < x : return Insert(T.right, x);
  endcase
}
Insert in AVL trees
Insert(T : reference tree pointer, x : element) : {
  if T = null then
    {T := new tree; T.data := x; T.height := 0; return;}
  case
    T.data = x : return; // Duplicate, do nothing
    T.data > x : Insert(T.left, x);
                 if ((height(T.left) - height(T.right)) = 2) {
                   if (T.left.data > x) then // outside case
                     T = RotatefromLeft(T);
                   else // inside case
                     T = DoubleRotatefromLeft(T);
                 }
    T.data < x : Insert(T.right, x);
                 code similar to the left case
  endcase
  T.height := max(height(T.left), height(T.right)) + 1;
  return;
}
Example of Insertions in an AVL Tree
[Figure: AVL tree with root 20, children 10 and 30, and 30's children 25 and 35, annotated with node heights]
Insert 5, 40
Example of Insertions in an AVL Tree
[Figure: the tree before and after inserting 5 (under 10) and 40 (under 35); no rotation is needed]
Now Insert 45
Single rotation (outside case)
[Figure: inserting 45 causes an imbalance at node 35; a single rotation makes 40 the parent of 35 and 45]
Now Insert 34
Double rotation (inside case)
[Figure: insertion of 34 causes an imbalance at node 30; a double rotation makes 35 the parent of 30 and 40, with 25 and 34 under 30 and 45 under 40]
AVL Tree Deletion
Similar but more complex than insertion
Rotations and double rotations needed to rebalance
Imbalance may propagate upward so that many rotations may be needed.
Pros and Cons of AVL Trees
Arguments for AVL trees:
1. Search is O(log N) since AVL trees are always balanced.
2. Insertion and deletions are also O(log N).
3. The height balancing adds no more than a constant factor to the speed of insertion.
Arguments against using AVL trees:
4. Difficult to program & debug; more space for balance factor.
5. Asymptotically faster but rebalancing costs time.
6. Most large searches are done in database systems on disk and use other structures (e.g. B-trees).
7. May be OK to have O(N) for a single operation if total run time for many consecutive operations is fast (e.g. Splay trees).
Double Rotation Solution
DoubleRotateFromRight(n : reference node pointer) {
  RotateFromLeft(n.right);
  RotateFromRight(n);
}
[Figure: node n with subtrees X, V, W and Z]
Outline
Balanced Search Trees
• 2-3 Trees
• 2-3-4 Trees
• Red-Black Trees
Why care about advanced implementations?
Same entries, different insertion sequence:
Not good! Would like to keep tree balanced.
2-3 Trees
Features
each internal node has either 2 or 3 children
all leaves are at the same level
2-3 Trees with Ordered Nodes
[Figure: a 2-node and a 3-node]
• leaf node can be either a 2-node or a 3-node
Example of 2-3 Tree
Traversing a 2-3 Tree
inorder(in ttTree: TwoThreeTree)
  if (ttTree’s root node r is a leaf)
    visit the data item(s)
  else if (r has two data items) {
    inorder(left subtree of ttTree’s root)
    visit the first data item
    inorder(middle subtree of ttTree’s root)
    visit the second data item
    inorder(right subtree of ttTree’s root)
  }
  else {
    inorder(left subtree of ttTree’s root)
    visit the data item
    inorder(right subtree of ttTree’s root)
  }
Searching a 2-3 tree
retrieveItem(in ttTree: TwoThreeTree, in searchKey: KeyType, out treeItem: TreeItemType): boolean
  if (searchKey is in ttTree’s root node r) {
    treeItem = the data portion of r
    return true
  }
  else if (r is a leaf)
    return false
  else {
    return retrieveItem(appropriate subtree, searchKey, treeItem)
  }
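The two pseudocode routines above might look like this in Python. It is a sketch under assumptions: the `TTNode` class, the list-based layout (one or two sorted keys, and either no children or one more child than keys) and the small example tree are all hypothetical.

```python
class TTNode:
    def __init__(self, keys, children=()):
        self.keys = list(keys)           # 1 or 2 keys, in sorted order
        self.children = list(children)   # empty for a leaf, else len(keys) + 1 subtrees

def inorder(node):
    """Return all keys in sorted order."""
    if not node.children:                # leaf: visit the data item(s)
        return list(node.keys)
    out = []
    for i, key in enumerate(node.keys):
        out += inorder(node.children[i])
        out.append(key)
    out += inorder(node.children[-1])    # rightmost subtree
    return out

def retrieve(node, search_key):
    """Return True if search_key is stored somewhere below node."""
    while True:
        if search_key in node.keys:
            return True
        if not node.children:            # reached a leaf without finding the key
            return False
        # Pick the appropriate subtree: count the keys smaller than search_key.
        node = node.children[sum(k < search_key for k in node.keys)]

tree = TTNode([50, 90], [TTNode([10, 20]), TTNode([70]), TTNode([100, 120])])
all_keys = inorder(tree)
```

Because all leaves are at the same level, `retrieve` visits at most height + 1 nodes.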
What did we gain?
What is the time efficiency of searching for an item?
Gain: Ease of Keeping the Tree Balanced
[Figure: a binary search tree and a 2-3 tree, both shown after inserting items 39, 38, ... 32]
Inserting Items: Insert 39
Inserting Items: Insert 38
insert in leaf
divide leaf and move middle value up to parent
result
Inserting Items: Insert 37
Inserting Items: Insert 36
insert in leaf
divide leaf and move middle value up to parent
overcrowded node
Inserting Items: ... still inserting 36
divide overcrowded node, move middle value up to parent, attach children to smallest and largest
result
Inserting Items: After Insertion of 35, 34, 33
Inserting so far
Inserting Items: How do we insert 32?
Inserting Items: creating a new root if necessary; tree grows at the root
Inserting Items: Final Result
Deleting Items: Delete 70
Deleting Items: Deleting 70: swap 70 with inorder successor (80)
Deleting Items: Deleting 70: ... get rid of 70
Deleting Items: Result
Deleting Items: Delete 100
Deleting Items: Deleting 100
Deleting Items: Result
Deleting Items: Delete 80
Deleting Items: Deleting 80 ...
Deleting Items: Final Result
comparison with binary search tree
Deletion Algorithm I
Deleting item I:
1. Locate node n, which contains item I
2. If node n is not a leaf, swap I with inorder successor
   deletion always begins at a leaf
3. If leaf node n contains another item, just delete item I
   else
     try to redistribute nodes from siblings (see next slide)
     if not possible, merge node (see next slide)
Deletion Algorithm II
Redistribution
A sibling has 2 items: redistribute item between siblings and parent
Merging
No sibling has 2 items: merge node, move item from parent to sibling
Deletion Algorithm III
Redistribution
Internal node n has no item left: redistribute
Merging
Redistribution not possible: merge node, move item from parent to sibling, adopt child of n
If n's parent ends up without item, apply process recursively
Deletion Algorithm IV
If merging process reaches the root and root is without item, delete root
Operations of 2-3 Trees
all operations have time complexity O(log n)
2-3-4 Trees
• similar to 2-3 trees
• 4-nodes can have 3 items and 4 children
[Figure: a 4-node]
2-3-4 Tree example
2-3-4 Tree: Insertion
Insertion procedure:
• similar to insertion in 2-3 trees
• items are inserted at the leaves
• since a 4-node cannot take another item, 4-nodes are split up during insertion process
Strategy
• on the way from the root down to the leaf: split up all 4-nodes "on the way"
insertion can be done in one pass (remember: in 2-3 trees, a reverse pass might be necessary)
2-3-4 Tree: Insertion
Inserting 60, 30, 10, 20, 50, 40, 70, 80, 15, 90, 100
Inserting 60, 30, 10, 20 ...
Inserting 50, 40 ...
Inserting 70 ...
Inserting 80, 15 ...
Inserting 90 ...
Inserting 100 ...
2-3-4 Tree: Insertion Procedure
Splitting 4-nodes during Insertion
2-3-4 Tree: Insertion Procedure
Splitting a 4-node whose parent is a 2-node during insertion
2-3-4 Tree: Insertion Procedure
Splitting a 4-node whose parent is a 3-node during insertion
2-3-4 Tree: Deletion
Deletion procedure:
• similar to deletion in 2-3 trees
• items are deleted at the leaves: swap item of internal node with inorder successor
• note: a 2-node leaf creates a problem
Strategy (different strategies possible)
• on the way from the root down to the leaf: turn 2-nodes (except root) into 3-nodes
deletion can be done in one pass (remember: in 2-3 trees, a reverse pass might be necessary)
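2-3-4 Tree: Deletion
Turning a 2-node into a 3-node ...
Case 1: an adjacent sibling has 2 or 3 items
"steal" item from sibling by rotating items and moving subtree
[Figure: "rotation": parent (30 50) with children (10 20) and (40) becomes parent (20 50) with children (10) and (30 40); the subtree containing 25 moves from the sibling to the enlarged node]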
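2-3-4 Tree: Deletion
Turning a 2-node into a 3-node ...
Case 2: each adjacent sibling has only one item
"steal" item from parent and merge node with sibling (note: parent has at least two items, unless it is the root)
[Figure: merging: parent (30 50) with children (10) and (40) becomes parent (50) with child (10 30 40); the subtrees containing 25 and 35 stay attached to the merged node]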
2-3-4 Tree: Deletion Practice
Delete 32, 35, 40, 38, 39, 37, 60
Red-Black Tree
• binary-search-tree representation of 2-3-4 tree
• 3- and 4-nodes are represented by equivalent binary trees
• red and black child pointers are used to distinguish between original 2-nodes and 2-nodes that represent 3- and 4-nodes
Red-Black Representation of 4-node
Red-Black Representation of 3-node
Red-Black Tree Example
Red-Black Tree Example
Red-Black Tree Operations
Traversals same as in binary search trees
Insertion and Deletion analogous to 2-3-4 tree
need to split 4-nodes
need to merge 2-nodes
Splitting a 4-node that is a root
Splitting a 4-node whose parent is a 2-node
Splitting a 4-node whose parent is a 3-node
Splitting a 4-node whose parent is a 3-node
Splitting a 4-node whose parent is a 3-node
Motivation for B-Trees
So far we have assumed that we can store an entire data structure in main memory
What if we have so much data that it won’t fit?
We will have to use disk storage but when this happens our time complexity fails
The problem is that Big-Oh analysis assumes that all operations take roughly equal time
This is not the case when disk access is involved
Motivation (cont.)
Assume that a disk spins at 3600 RPM
In 1 minute it makes 3600 revolutions, hence one revolution occurs in 1/60 of a second, or 16.7ms
On average what we want is half way round this disk – it will take 8ms
This sounds good until you realize that we get 120 disk accesses a second – the same time as 25 million instructions
In other words, one disk access takes about the same time as 200,000 instructions
It is worth executing lots of instructions to avoid a disk access
Motivation (cont.)
Assume that we use a binary tree to store all the details of people in Canada (about 32 million records)
We still end up with a very deep tree with lots of different disk accesses; log2 32,000,000 is about 25, so this takes about 0.21 seconds (if there is only one user of the program)
We know we can’t improve on the log n for a binary tree
But, the solution is to use more branches and thus less height!
As branching increases, depth decreases
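The arithmetic on the last few slides can be reproduced in a few lines. The figures for disk speed, instruction rate and record count come from the slides; the variable names, the helper `levels_needed` and the branching factor of 101 used for comparison are my own choices for illustration.

```python
import math

rpm = 3600
rev_ms = 60_000 / rpm                       # one revolution, in milliseconds
avg_access_ms = rev_ms / 2                  # on average half a revolution
accesses_per_sec = 1000 / avg_access_ms
instr_per_sec = 25_000_000                  # instruction rate quoted on the slide
instr_per_access = instr_per_sec / accesses_per_sec

records = 32_000_000

def levels_needed(n, branching):
    """Smallest number of levels L with branching**L >= n."""
    return math.ceil(math.log(n, branching))

binary_depth = levels_needed(records, 2)
binary_search_s = binary_depth * avg_access_ms / 1000
wide_depth = levels_needed(records, 101)    # illustrative branching factor

print(f"{rev_ms:.1f} ms per revolution, {accesses_per_sec:.0f} accesses per second")
print(f"about {instr_per_access:,.0f} instructions per disk access")
print(f"binary tree: {binary_depth} levels, about {binary_search_s:.2f} s per search")
print(f"101-way tree: {wide_depth} levels")
```

The comparison in the last two lines is the point of the motivation: with roughly a hundred branches per node, the same data needs only a handful of levels instead of about 25.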
Definition of a B-tree
A B-tree of order m is an m-way tree (i.e., a tree where each node may have up to m children) in which:
1. the number of keys in each non-leaf node is one less than the number of its children and these keys partition the keys in the children in the fashion of a search tree
2. all leaves are on the same level
3. all non-leaf nodes except the root have at least ⌈m / 2⌉ children
4. the root is either a leaf node, or it has from two to m children
5. a leaf node contains no more than m – 1 keys
The number m should always be odd
An example B-Tree
[Figure: B-tree with root (26); second level (6 12) and (42 51 62); leaves (1 2 4), (7 8), (13 15 18 25), (27 29), (45 46 48), (53 55 60), (64 70 90)]
A B-tree of order 5 containing 26 items
Note that all the leaves are at the same level
Constructing a B-tree
Suppose we start with an empty B-tree and keys arrive in the following order:
1 12 8 2 25 6 14 28 17 7 52 16 48 68 3 26 29 53 55 45
We want to construct a B-tree of order 5
The first four items go into the root:
[Figure: root (1 2 8 12)]
To put the fifth item in the root would violate condition 5
Therefore, when 25 arrives, pick the middle key to make a new root
Constructing a B-tree (contd.)
Add 25 to the tree
[Figure: root (1 2 8 12 25). Exceeds Order. Promote middle and split.]
Constructing a B-tree (contd.)
6, 14, 28 get added to the leaf nodes:
[Figure: root (8) with leaves (1 2) and (12 25) becomes root (8) with leaves (1 2 6) and (12 14 25 28)]
Constructing a B-tree (contd.)
Adding 17 to the right leaf node would over-fill it, so we take the middle key, promote it (to the root) and split the leaf
[Figure: root (8 17); leaves (1 2 6), (12 14), (25 28)]
Constructing a B-tree (contd.)
7, 52, 16, 48 get added to the leaf nodes
[Figure: root (8 17); leaves (1 2 6 7), (12 14 16), (25 28 48 52)]
Constructing a B-tree (contd.)
Adding 68 causes us to split the right most leaf, promoting 48 to the root
[Figure: root (8 17 48); leaves (1 2 6 7), (12 14 16), (25 28), (52 68)]
Constructing a B-tree (contd.)
Adding 3 causes us to split the left most leaf
[Figure: root (3 8 17 48); leaves (1 2), (6 7), (12 14 16), (25 28), (52 68)]
Constructing a B-tree (contd.)
26, 29, 53, 55 then go into the leaves
[Figure: root (3 8 17 48); leaves (1 2), (6 7), (12 14 16), (25 26 28 29), (52 53 55 68)]
Constructing a B-tree (contd.)
Adding 45 increases the tree's level
[Figure: the leaf (25 26 28 29 45) exceeds order: promote middle and split. The root (3 8 17 28 48) then exceeds order: promote middle and split. Result: root (17); second level (3 8) and (28 48); leaves (1 2), (6 7), (12 14 16), (25 26), (29 45), (52 53 55 68)]
Inserting into a B-Tree
Attempt to insert the new key into a leaf
If this would result in that leaf becoming too big, split the leaf into two, promoting the middle key to the leaf’s parent
If this would result in the parent becoming too big, split the parent into two, promoting the middle key
This strategy might have to be repeated all the way to the top
If necessary, the root is split in two and the middle key is promoted to a new root, making the tree one level higher
Exercise in Inserting a B-Tree
Insert the following keys to a 5-way B-tree:
3, 7, 9, 23, 45, 1, 5, 14, 25, 24, 13, 11, 8, 19, 4, 31, 35, 56
Removal from a B-tree
During insertion, the key always goes into a leaf. For deletion we wish to remove from a leaf. There are three possible ways we can do this:
1 - If the key is already in a leaf node, and removing it doesn’t cause that leaf node to have too few keys, then simply remove the key to be deleted.
2 - If the key is not in a leaf then it is guaranteed (by the nature of a B-tree) that its predecessor or successor will be in a leaf -- in this case we can delete the key and promote the predecessor or successor key to the non-leaf deleted key’s position.
Removal from a B-tree (2)
If (1) or (2) lead to a leaf node containing less than the minimum number of keys then we have to look at the siblings immediately adjacent to the leaf in question:
3: if one of them has more than the minimum number of keys then we can promote one of its keys to the parent and take the parent key into our lacking leaf
4: if neither of them has more than the minimum number of keys then the lacking leaf and one of its neighbours can be combined with their shared parent key (the opposite of promoting a key) and the new leaf will have the correct number of keys; if this step leaves the parent with too few keys then we repeat the process up to the root itself, if required
Type #1: Simple leaf deletion
Assuming a 5-way B-Tree, as before...
[Figure: root (12 29 52); leaves (2 7 9), (15 22), (31 43), (56 69 72)]
Delete 2: Since there are enough keys in the node, just delete it
Type #2: Simple non-leaf deletion
[Figure: root (12 29 52); leaves (7 9), (15 22), (31 43), (56 69 72)]
Delete 52
Borrow the predecessor or (in this case) successor: 56
Type #4: Too few keys in node and its siblings
[Figure: root (12 29 56); leaves (7 9), (15 22), (31 43), (69 72)]
Delete 72
Too few keys!
Join back together
[Figure: root (12 29); leaves (7 9), (15 22), (31 43 56 69)]
Type #3: Enough siblings
[Figure: root (12 29); leaves (7 9), (15 22), (31 43 56 69)]
Delete 22
Demote root key and promote leaf key
[Figure: root (12 31); leaves (7 9), (15 29), (43 56 69)]
Exercise in Removal from a B-Tree
Given 5-way B-tree created by these data (last exercise):
3, 7, 9, 23, 45, 1, 5, 14, 25, 24, 13, 11, 8, 19, 4, 31, 35, 56
Add these further keys: 2, 6,12
Delete these keys: 4, 5, 7, 3, 14
Analysis of B-Trees
The maximum number of items in a B-tree of order m and height h:
root: $m - 1$
level 1: $m(m - 1)$
level 2: $m^2(m - 1)$
. . .
level h: $m^h(m - 1)$
So, the total number of items is
$(1 + m + m^2 + m^3 + \dots + m^h)(m - 1) = [(m^{h+1} - 1)/(m - 1)](m - 1) = m^{h+1} - 1$
When m = 5 and h = 2 this gives $5^3 - 1 = 124$
Reasons for using B-Trees
When searching tables held on disc, the cost of each disc transfer is high but doesn't depend much on the amount of data transferred, especially if consecutive items are transferred
If we use a B-tree of order 101, say, we can transfer each node in one disc read operation
A B-tree of order 101 and height 3 can hold $101^4 - 1$ items (approximately 100 million) and any item can be accessed with 3 disc reads (assuming we hold the root in memory)
If we take m = 3, we get a 2-3 tree, in which non-leaf nodes have two or three children (i.e., one or two keys)
B-Trees are always balanced (since the leaves are all at the same level), so 2-3 trees make a good type of balanced tree
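The capacity formula from the analysis slide is easy to check by summing the levels directly. The function name `max_items` is an assumption; the two parameter pairs are the ones quoted in the slides.

```python
def max_items(m, h):
    """Maximum number of keys in a B-tree of order m and height h, summed level by level."""
    # Level i has at most m**i nodes, each holding at most m - 1 keys.
    return sum(m ** level * (m - 1) for level in range(h + 1))

small = max_items(5, 2)      # order 5, height 2
large = max_items(101, 3)    # order 101, height 3
print(small, large)
```

The level-by-level sum agrees with the closed form m^(h+1) - 1 for both cases.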