external sorting and searching
1
External Sorting and Searching
B-Trees, etc.
2
m-Way Search Trees
In a binary search tree, there is one key value per node and two children.
There is no reason why I couldn’t have (at most) m-1 key values per node and m children.
Such trees are called m-way search trees.
3
m-Way Search Tree Example
Here is a 3-way search tree; each node has a maximum of 3 children.
Root: [120, 240]
Children: [97] [200] [360, 440]
4
m-Way Search Tree Example II
Here is another one.
120, 240
360, 440
97
500
5
m-Way Time Complexity
Clearly, the worst-case search and insert time for an m-way search tree is still O(n), because nothing keeps the tree balanced.
The number of nodes visited is O(n/m).
For each, we must look at up to m-1 key values.
We could search those in O(log2(m)) time with a binary search, yielding, at best, O(n/m * log2(m)).
Of course, as n gets much larger than m, this is still O(n).
6
B-Trees
What I want is a height-balanced m-way search tree to achieve the best search time.
These are called B-Trees.
As with height-balanced BSTs, we will have a re-balancing algorithm to run after every insert and delete.
7
B-Tree Properties
The root may have between 2 and m children.
All other nodes must have between $\lceil m/2 \rceil$ and m children.
A node that has k children will have k-1 key values.
Thus, the root may have only 2 children; all other nodes must be at least half full.
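As a quick check of these bounds, here is a minimal sketch that computes the allowed range of children and key values for a node of a B-tree of order m. The function names `child_bounds` and `key_bounds` are made up for illustration, and the sketch assumes "half full" means the ceiling of m/2.

```python
def child_bounds(m, is_root=False):
    """Return (min_children, max_children) for a node in a B-tree of order m."""
    # Assumption: "half full" means ceil(m/2); (m + 1) // 2 is an integer ceiling.
    minimum = 2 if is_root else (m + 1) // 2
    return minimum, m


def key_bounds(m, is_root=False):
    """A node with k children holds k-1 key values."""
    lo, hi = child_bounds(m, is_root)
    return lo - 1, hi - 1


print(child_bounds(3))   # (2, 3): the "2-3 tree" case
print(key_bounds(5))     # (2, 4)
```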
8
B-Tree Properties II
If a B-Tree node has k children ($T_0, T_1, \ldots, T_{k-1}$) and k-1 ordered key values ($D_1, D_2, \ldots, D_{k-1}$), then all the key values in $T_i$ are greater than $D_i$ but less than $D_{i+1}$ for $i = 1, \ldots, k-2$.
All the key values in $T_0$ are less than $D_1$.
All the key values in $T_{k-1}$ are greater than $D_{k-1}$.
This simply means it is a search tree.
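The ordering property is what makes a search possible. The following is a minimal sketch of that search, not a definitive implementation: the `Node` class and `search` function are hypothetical names, keys are 0-indexed (so `children[i]` holds the keys between `keys[i-1]` and `keys[i]`), and a missing child is represented by `None`.

```python
from bisect import bisect_left


class Node:
    """One m-way search tree node: sorted keys plus len(keys) + 1 child slots."""

    def __init__(self, keys, children=None):
        self.keys = keys
        # None stands for an empty subtree (a failed search ends there).
        self.children = children or [None] * (len(keys) + 1)


def search(node, key):
    """Return True if key occurs in the subtree rooted at node."""
    while node is not None:
        i = bisect_left(node.keys, key)       # binary search inside the node
        if i < len(node.keys) and node.keys[i] == key:
            return True
        node = node.children[i]               # descend into the i-th subtree
    return False


# The 3-way search tree from the first example slide.
root = Node([120, 240], [Node([97]), Node([200]), Node([360, 440])])
print(search(root, 200), search(root, 300))   # True False
```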
9
B-Tree Insertion
All insertions are done at the terminal level.
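First, search for the terminal-level node into which the new key value should be inserted, and add the key value there.
If the node now holds no more than m-1 key values (so its number of children does not exceed m), stop.
If it now holds m key values (so the number of children does exceed m)...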
10
B-Tree Node Splitting
Split this node into two nodes:
Take the middle value out.
Create one node with the lower half of the key values and one with the upper half.
Insert the middle value into the parent node.
Continue recursively until either the node can hold the new key value, or you split the root.
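The insert-and-split steps above can be sketched as follows. This is an illustrative sketch under assumptions, not the slides' own code: `BNode`, `insert`, and `_insert` are hypothetical names, key values are assumed distinct, and an empty `children` list marks a terminal node. The usage lines feed in the key values 120, 360, 240, 200, 97, 440, 280 with m = 3.

```python
from bisect import bisect_left, insort


class BNode:
    def __init__(self, keys=None, children=None):
        self.keys = keys or []
        self.children = children or []    # empty list: terminal-level node


def _insert(node, key, m):
    """Insert key below node. Return (middle_key, new_right_node) if node split."""
    if not node.children:
        insort(node.keys, key)            # all insertions happen at the terminal level
    else:
        i = bisect_left(node.keys, key)
        split = _insert(node.children[i], key, m)
        if split:                         # child split: absorb its middle value
            mid, right = split
            node.keys.insert(i, mid)
            node.children.insert(i + 1, right)
    if len(node.keys) <= m - 1:
        return None                       # node can hold the key: stop
    h = len(node.keys) // 2               # overflow: take the middle value out
    mid = node.keys[h]
    right = BNode(node.keys[h + 1:], node.children[h + 1:])
    node.keys = node.keys[:h]
    node.children = node.children[:h + 1]
    return mid, right


def insert(root, key, m):
    """Insert key into the tree and return the (possibly new) root."""
    if root is None:
        return BNode([key])
    split = _insert(root, key, m)
    if split:                             # the root itself split: create a new root
        mid, right = split
        return BNode([mid], [root, right])
    return root


root = None
for k in [120, 360, 240, 200, 97, 440, 280]:
    root = insert(root, k, 3)
print(root.keys, [c.keys for c in root.children])   # [240] [[120], [360]]
```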
11
B-Tree Insert Example
A B-Tree of order 3 (i.e. m=3) is the smallest possible.
It is also the easiest to draw, so we’ll use this order for our example.
This is also called a “2-3 Tree” because each node may have a maximum of 2 key values and 3 children.
12
B-Tree Example
Insert 120. A new root node is created and this value is placed into it.
120
Key values left to insert: 360, 240, 200, 97, 440, 280
13
B-Tree Example
Insert 360. It goes into the root. No further action is required.
120, 360
Key values left to insert: 200, 97, 440, 280
14
B-Tree Example
Insert 240. It goes into the root. Since this node has 3 values, it must be split.
120, 240, 360
Key values left to insert: 200, 97, 440, 280
15
B-Tree Example
This shows the result of the split. 120 and 360 go into nodes by themselves, and 240 is placed into a new root node.
Root: [240]
Children: [120] [360]
Key values left to insert: 200, 97, 440, 280
16
B-Tree Example
Insert value 200. It goes into the node with 120. No further action is required.
Root: [240]
Children: [120, 200] [360]
Key values left to insert: 97, 440, 280
17
B-Tree Example
Insert value 97. It goes into the node with 120 and 200. Since this node contains too many values, it must be split.
Root: [240]
Children: [97, 120, 200] [360]
Key values left to insert: 440, 280
18
B-Tree Example
This shows the result of the split. 97 and 200 are placed into their own nodes, and 120 is moved up to the parent. The parent node is OK.
Root: [120, 240]
Children: [97] [200] [360]
Key values left to insert: 440, 280
19
B-Tree Example
Insert 440. It goes into the node with 360. No further action is required.
Root: [120, 240]
Children: [97] [200] [360, 440]
Key values left to insert: 280
20
B-Tree Example
Insert the value 280. It goes into the node with 360 and 440. Since this node has 3 values, it must be split.
Root: [120, 240]
Children: [97] [200] [280, 360, 440]
Key values left to insert: DONE
21
B-Tree Example
This shows the result of the split. 280 and 440 go into nodes by themselves, and 360 is moved up to the parent node.
Root: [120, 240, 360]
Children: [97] [200] [280] [440]
22
B-Tree Example
The parent node must be split as well. Because it is the root, we must create a new root node.
Root: [240]
Level 2: [120] [360]
Leaves: under [120]: [97] [200]; under [360]: [280] [440]
23
Time Complexity
What is the order of a B-tree search? To answer this, we need to determine the worst case number of levels in a B-Tree of order m that has n key values.
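Let’s look at the minimum number of nodes per level:
Level 1 (the root) must have 1 node;
Level 2 must have at least 2 nodes;
Level 3 must have at least $2\lceil m/2 \rceil$ nodes;
Level 4 must have at least $2\lceil m/2 \rceil^{2}$ nodes;
Level L must have at least $2\lceil m/2 \rceil^{L-2}$ nodes.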
24
Time Complexity II
Observation: in any list of n elements, there are n+1 ways for the search to fail.
In a B-tree, all the ways to fail are at level L+1 (these are sometimes called Failure Nodes).
Thus, this is a relationship between the number of key values and the height of the tree:
25
Time Complexity III
Because the previous analysis is a worst case, the number of nodes at level L+1 must be less than or equal to N+1:
$$2\lceil m/2 \rceil^{L-1} \le N+1$$
$$\lceil m/2 \rceil^{L-1} \le (N+1)/2$$
$$L-1 \le \log_{\lceil m/2 \rceil}[(N+1)/2]$$
$$L \le \log_{\lceil m/2 \rceil}[(N+1)/2] + 1$$
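To get a feel for this bound, here is a small sketch that evaluates the largest L satisfying the first inequality, using integer arithmetic to avoid floating-point logarithms. The function name `max_levels` is hypothetical, and the sketch assumes m is at least 3 and N is at least 1.

```python
def max_levels(n_keys, m):
    """Largest L such that 2 * ceil(m/2)**(L-1) <= n_keys + 1."""
    half = (m + 1) // 2            # integer ceiling of m/2; assumes m >= 3
    levels = 1                     # a tree holding at least one key has a root
    failure_nodes = 2 * half       # minimum failure nodes below a 2-level tree
    while failure_nodes <= n_keys + 1:
        levels += 1
        failure_nodes *= half
    return levels


print(max_levels(7, 3))            # 3: matches the 7-key order-3 example
print(max_levels(1_000_000, 200))  # 3
```

With a large order, even a million key values need only a handful of levels, which is the point the later slides on disk accesses build on.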
26
Time Complexity IV
One node at each level must be accessed, so L gives the number of nodes to access.
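In this worst case each node contains $\lceil m/2 \rceil - 1$ key values, which can be binary-searched, so the total number of comparisons is
$$\left(\log_{\lceil m/2 \rceil}[(N+1)/2] + 1\right) \cdot \log_2\left(\lceil m/2 \rceil - 1\right)$$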
27
Fun With Math
Removing the constants, we may say this search is
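$$O\left(\log_{\lceil m/2 \rceil} N \cdot \log_2 \lceil m/2 \rceil\right) = O\left(\frac{\log_2 N}{\log_2 \lceil m/2 \rceil} \cdot \log_2 \lceil m/2 \rceil\right) = O(\log_2 N)$$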
28
Summing it up:
WHAT??? ALL THIS WORK FOR THE SAME ORDER AS AN AVL-TREE!!!
What’s going on here???
29
What Really Happens
Remember this is external sorting and searching: the data lives on disk, so accessing the information and doing comparisons have very different costs.
Each node in the B-tree is stored in a “block” on the disk; a “block” is the minimum amount of information which can be retrieved with one disk access.
30
What Really Happens II
Thus, the number of disk accesses is the bottleneck; this is given by L.
A B-tree is built on a field of a data file to speed access to that field.
A “Clustered” or “Primary” B-tree stores the entire record of the file in the B-Tree.
An “Unclustered” or “Secondary” B-tree stores the field’s value and the record number in the node.
31
What Really Happens III
It is the secondary B-trees that one usually means when one says “B-tree”.
Thus, to do a search for a record on a field which has a B-tree:
Search the B-tree for the key value.
When found, retrieve its associated record number.
Retrieve that record from the data file.
32
A Real Example.
What follows is a real example of how a B-tree is used.
33
Sample Data File
Course | Teacher | Schedule#
CS 470 | Prof. Green | 23
CS 471 | Prof. Green | 45
CS 472 | Prof. Green | 46
CS 473 | Prof. Smith | 100
CS 474 | Prof. Smith | 110
CS 475 | Prof. Smith | 120
CS 476 | Prof. Green | 140
CS 477 | Prof. Green | 210
34
B-Tree on Schedule#
This is the way we would normally view it:
Root: [100]
Level 2: [45] [120]
Leaves: under [45]: [23] [46]; under [120]: [110] [140, 210]
35
B-Tree on Schedule#
This is how it really looks in a file:
Rec# | Child Ptr 1 | Key value 1 | Data Ptr 1 | Child Ptr 2 | Key value 2 | Data Ptr 2 | Child Ptr 3
1 | 2 | 100 | 4 | 6 | 0 | 0 | 0
2 | 3 | 45 | 2 | 4 | 0 | 0 | 0
3 | 0 | 23 | 1 | 0 | 0 | 0 | 0
4 | 0 | 46 | 3 | 0 | 0 | 0 | 0
5 | 0 | 110 | 5 | 0 | 0 | 0 | 0
6 | 5 | 120 | 6 | 7 | 0 | 0 | 0
7 | 0 | 140 | 7 | 0 | 210 | 8 | 0
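The file layout above can be searched directly. The sketch below makes a few assumptions that the slides do not state outright: a pointer value of 0 means "no child", a key value of 0 means "unused slot", the root is stored in record 1, and the data file records are numbered 1 to 8 in the order listed in the sample data file. The names `INDEX`, `DATA`, and `lookup` are made up for illustration; each read of `INDEX[rec]` stands in for one disk access.

```python
# rec# -> (child1, key1, data1, child2, key2, data2, child3)
INDEX = {
    1: (2, 100, 4, 6, 0, 0, 0),
    2: (3, 45, 2, 4, 0, 0, 0),
    3: (0, 23, 1, 0, 0, 0, 0),
    4: (0, 46, 3, 0, 0, 0, 0),
    5: (0, 110, 5, 0, 0, 0, 0),
    6: (5, 120, 6, 7, 0, 0, 0),
    7: (0, 140, 7, 0, 210, 8, 0),
}

# Data file records (assumed to be numbered in the order listed).
DATA = {
    1: ("CS 470", "Prof. Green", 23), 2: ("CS 471", "Prof. Green", 45),
    3: ("CS 472", "Prof. Green", 46), 4: ("CS 473", "Prof. Smith", 100),
    5: ("CS 474", "Prof. Smith", 110), 6: ("CS 475", "Prof. Smith", 120),
    7: ("CS 476", "Prof. Green", 140), 8: ("CS 477", "Prof. Green", 210),
}


def lookup(schedule_no, root=1):
    """Return (data record number or None, number of index records read)."""
    rec, accesses = root, 0
    while rec != 0:
        c1, k1, d1, c2, k2, d2, c3 = INDEX[rec]   # one simulated disk access
        accesses += 1
        if schedule_no == k1:
            return d1, accesses
        if schedule_no < k1:
            rec = c1
        elif k2 == 0 or schedule_no < k2:         # second key slot unused, or key lies between
            rec = c2
        elif schedule_no == k2:
            return d2, accesses
        else:
            rec = c3
    return None, accesses


rec_no, reads = lookup(210)
print(DATA[rec_no], reads)    # ('CS 477', 'Prof. Green', 210) 3
```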
36
Deleting in a B-tree
To delete from a B-Tree, first locate the key value with the normal search routine.
If the key value is not located in a terminal node, replace it with its in-order successor and delete the in-order successor from the terminal node that holds it.
Thus, all deletes which reduce the number of key values occur at the terminal level.
37
Deleting From the Terminal Level
Good news: because there are no children to worry about, we can just remove it from the list.
Bad news: what if this removal reduces the number of children below $\lceil m/2 \rceil$?
Reality: at some point we will need to reduce the number of nodes...
38
The “Borrow” Algorithm
When a node is reduced below $\lceil m/2 \rceil$ children, first try to borrow a key value from one of its neighbors.
If a neighbor has more than the minimum, then rotate the appropriate key to the parent and the appropriate key from the parent down to the reduced child.
39
Borrow Example
Suppose I want to delete 200 from this B-tree of order 3.
To do so, rotate 240 into the middle child, and 360 up to the root:
Root: [120, 240]
Children: [97] [200] [360, 440]
40
Borrow Example
This shows the result.
Problem: what if I now want to delete 240? Borrowing won’t work...
Root: [120, 360]
Children: [97] [240] [440]
41
Combining Nodes
When borrowing won’t work, combine the node with the key value from the parent AND the neighbor node with minimum children.
Repeat the deletion algorithm from the parent, looking first to borrow if possible.
Now, let’s delete 240...
42
Combining Example
First, remove 240.
Root: [120, 360]
Children: [97] [240] [440]
43
Combining Example
Next, attempt to borrow.
Borrowing fails.
Combine empty node with 360 and 440.
Root: [120, 360]
Children: [97] <empty> [440]
44
Combining Example
This shows the result.
The parent is OK, so we are done...
Root: [120]
Children: [97] [360, 440]
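The borrow and combine steps from the two examples above can be sketched for the terminal level as follows. This is a simplified illustration under stated assumptions: the function name `fix_leaf` is hypothetical, the parent is a plain list of key values, each terminal node is a plain list, the right neighbor is tried before the left one, and the caller is responsible for repeating the check on the parent after a combine.

```python
def fix_leaf(parent_keys, leaves, i, min_keys=1):
    """Repair terminal node i after a deletion. Returns 'ok', 'borrow' or 'combine'."""
    if len(leaves[i]) >= min_keys:
        return "ok"
    # Borrow from the right neighbor: parent key comes down, neighbor's smallest goes up.
    if i + 1 < len(leaves) and len(leaves[i + 1]) > min_keys:
        leaves[i].append(parent_keys[i])
        parent_keys[i] = leaves[i + 1].pop(0)
        return "borrow"
    # Borrow from the left neighbor: parent key comes down, neighbor's largest goes up.
    if i > 0 and len(leaves[i - 1]) > min_keys:
        leaves[i].insert(0, parent_keys[i - 1])
        parent_keys[i - 1] = leaves[i - 1].pop()
        return "borrow"
    # Combine: merge two adjacent nodes with the parent key that separates them.
    j = i if i + 1 < len(leaves) else i - 1
    merged = leaves[j] + [parent_keys.pop(j)] + leaves[j + 1]
    leaves[j:j + 2] = [merged]
    return "combine"                      # the parent lost a key and may need fixing too


# Borrow example: 200 has just been removed from the middle node.
keys, kids = [120, 240], [[97], [], [360, 440]]
print(fix_leaf(keys, kids, 1), keys, kids)   # borrow [120, 360] [[97], [240], [440]]

# Combine example: 240 has just been removed from the middle node.
keys2, kids2 = [120, 360], [[97], [], [440]]
print(fix_leaf(keys2, kids2, 1), keys2, kids2)   # combine [120] [[97], [360, 440]]
```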
45
A Larger Example
Delete 280.
This is a “borrow” case:
Root: [260]
Level 2: [120, 180] [360]
Leaves: under [120, 180]: [97] [150] [200]; under [360]: [280] [440, 500]
46
A Larger Example
Delete 360.
This is a “combine” case:
Root: [260]
Level 2: [120, 180] [440]
Leaves: under [120, 180]: [97] [150] [200]; under [440]: [360] [500]
47
A Larger Example
First, remove 360...
Root: [260]
Level 2: [120, 180] [440]
Leaves: under [120, 180]: [97] [150] [200]; under [440]: <empty> [500]
48
A Larger Example
Next combine node with its neighbor (500) and 440 from the parent...
Root: [260]
Level 2: [120, 180] [440]
Leaves: under [120, 180]: [97] [150] [200]; under [440]: <empty> [500]
49
A Larger Example
Parent now has a problem...
This is a borrow case:
Root: [260]
Level 2: [120, 180] <empty>
Leaves: under [120, 180]: [97] [150] [200]; under <empty>: [440, 500]
50
A Larger Example
Children must now be considered. What do I do with the node with 200?
Root: [180]
Level 2: [120] [260]
Leaves: under [120]: [97] [150]; unattached: [200]; under [260]: [440, 500]
51
A Larger Example
Link it under 260.
Now, delete 97...
Root: [180]
Level 2: [120] [260]
Leaves: under [120]: [97] [150]; under [260]: [200] [440, 500]
52
A Larger Example
This is a combine case, so bring 120 down and combine with 150...
Root: [180]
Level 2: [120] [260]
Leaves: under [120]: <empty> [150]; under [260]: [200] [440, 500]
53
A Larger Example
The parent now has a problem.
This is a combine case:
Root: [180]
Level 2: <empty> [260]
Leaves: under <empty>: [120, 150]; under [260]: [200] [440, 500]
54
A Larger Example
The old root is now empty; what to do with it?
Root: <empty>
Level 2: [180, 260]
Leaves: [120, 150] [200] [440, 500]
55
A Larger Example
Just dispose of it properly.
Root: [180, 260]
Children: [120, 150] [200] [440, 500]