Chapter 8: Multiway Trees
This chapter studies multiway trees, which can be used for external search.
When the data to be searched is large, it is impractical to load all of it into memory.
Searching in high-speed memory is much faster than searching on external devices (hard disks, CDs, etc.).
The idea of external search: each step reads one block of information into memory and uses it to decide which block to read next.
Data Structure 2015 R. Wei 2
Definition An m-way tree has the following properties.
• Each node has 0 to m subtrees.
• A node with k ≤ m subtrees contains k − 1 data entries.
• The keys of the data entries are ordered: key_1 ≤ key_2 ≤ ... ≤ key_{k−1}.
• The key values in the first subtree (the 0th subtree) are all less than key_1; the key values in the ith subtree are all greater than or equal to key_i but less than key_{i+1} (the last subtree has no upper bound within the node).
• All subtrees are themselves multiway trees.
A 2-way tree is a BST.
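The ordering rule above can be illustrated with a minimal sketch. It assumes integer keys and a hypothetical `MWayNode` layout with an assumed order `M`; none of these names come from the slides. The function picks the subtree to descend into for a given target.

```c
#include <stddef.h>

#define M 4  /* assumed order, for illustration only */

typedef struct MWayNode {
    int numKeys;                  /* number of keys in use (k - 1) */
    int keys[M - 1];              /* sorted in ascending order */
    struct MWayNode *subtree[M];  /* subtree[i] holds keys in [keys[i-1], keys[i]) */
} MWayNode;

/* Return the index of the subtree that may contain target:
   0 if target < keys[0], otherwise the largest i with keys[i-1] <= target. */
int mway_child_index(const MWayNode *node, int target)
{
    int i = 0;
    while (i < node->numKeys && target >= node->keys[i])
        i++;
    return i;
}
```

With keys 10, 20, 30 in a node, a target of 25 is sent to subtree 2, because 20 ≤ 25 < 30.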
We also want the multiway tree to be balanced.
Definition A B-tree is an m-way tree with the following additional
properties:
• The root is either a leaf or it has 2 to m subtrees.
• All internal nodes have at least ⌈m/2⌉ nonnull subtrees.
• All leaf nodes are at the same level.
• A leaf node has at least ⌈m/2⌉ − 1 entries.
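As a minimal sketch (the function names are hypothetical, not from the slides), the bounds in this definition can be computed with integer arithmetic, using (m + 1) / 2 for ⌈m/2⌉:

```c
/* Minimum number of subtrees of a non-root internal node: ceil(m / 2). */
int btree_min_subtrees(int m)
{
    return (m + 1) / 2;
}

/* Minimum number of entries in a non-root node: ceil(m / 2) - 1. */
int btree_min_entries(int m)
{
    return (m + 1) / 2 - 1;
}

/* Maximum number of entries in any node: m - 1. */
int btree_max_entries(int m)
{
    return m - 1;
}
```

For a B-tree of order 5, each non-root node therefore holds between 2 and 4 entries.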
Data structure of a B-tree.
The structure of an m-way tree node: each entry contains data and a pointer to its right subtree. A node contains a first pointer to the subtree whose keys are less than the key of the first entry, a count of the entries currently in the node, and an array of entries. A node holds at most m − 1 entries, so an array of m − 1 entries suffices (the C declarations later in this chapter use ORDER - 1); an array of size m also works and leaves one spare slot.
The main operations on B-trees are insert, delete, traverse, and search.
B-tree insertion:
B-tree insertion takes place at a leaf node.
• Locate the leaf node where the data belongs.
• If the node is not full (it has fewer than m − 1 entries), insert the data into this node.
• If the node is full (the overflow condition), split the node into two nodes and move the median entry up to the parent.
A B-tree grows from the bottom up.
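The non-full case can be sketched as an insertion into a sorted array. This is an illustration with plain integer keys and a hypothetical function name, not the chapter's `_insertEntry`:

```c
/* Insert key into the sorted array keys[0 .. *count - 1], shifting larger
   keys one slot to the right. The caller must make sure there is room
   for one more key. */
void node_insert_key(int keys[], int *count, int key)
{
    int i = *count;
    while (i > 0 && keys[i - 1] > key) {
        keys[i] = keys[i - 1];
        i--;
    }
    keys[i] = key;
    (*count)++;
}
```

Inserting 20 into a node holding 10 and 30 shifts 30 one slot to the right and leaves 10, 20, 30.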
The algorithm of B-tree insert:
• If the B-tree is empty, create the root and insert the first entry.
• If the B-tree is not empty, call the insert node algorithm, which finds the location, inserts the entry, and performs any necessary updates (on overflow, split the node and install the median entry in the parent, and so on).
• If the root needs to split, create a new root.
Algorithm BTreeInsert (tree, data)
    if (tree empty)
        create new node
        set left subtree of node to null
        move data to first entry in new node
        set subtree of first entry to null
        set tree root to address of new node
        set number of entries to 1
    else
        insertNode (tree, data, upEntry)
    end if
    if (tree higher)
        create new node
        move upEntry to first entry in new node
        set left subtree of the new node to tree
        set tree root to new node
        set number of entries to 1
    end if
Algorithm searchNode (nodePtr, target)
    if (target < key in first entry)
        return 0
    end if
    set walker to number of entries - 1
    loop (target < entry key[walker])
        decrement walker
    end loop
    return walker
This function returns the index of the last entry whose key ≤ target, or 0 if the target is less than the key of the first entry in the node.
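A minimal C sketch of searchNode, assuming plain integer keys instead of the chapter's generic data pointers and compare callback (the name `search_node` is hypothetical):

```c
/* Return the index of the last key <= target, or 0 if target is smaller
   than every key. Assumes numKeys >= 1 and keys sorted in ascending order. */
int search_node(const int keys[], int numKeys, int target)
{
    if (target < keys[0])
        return 0;

    int walker = numKeys - 1;
    while (target < keys[walker])
        walker--;
    return walker;
}
```

A return value of 0 is ambiguous: the target may be less than the first key or may belong at or after it. The caller has to compare the target with the first key again to decide between the first subtree and the right subtree of entry 0.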
Algorithm splitNode (node, entryNdx, newEntryLow, upEntry)
    create new node
    move high entries to new node
    if (entryNdx < minimum entries)
        insert upEntry in node
    else
        insert upEntry in new node
    end if
    move median data to upEntry
    make new node first pointer the right subtree of median data
    make new node the right subtree of upEntry
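The effect of a split on the keys can be sketched as follows. This is a simplified illustration: it uses integer keys, an assumed order of 5, and a temporary array, whereas splitNode moves entries in place. All names are hypothetical.

```c
#define ORDER 5                          /* assumed order */
#define MAX_KEYS (ORDER - 1)
#define MIN_KEYS ((ORDER + 1) / 2 - 1)   /* ceil(ORDER / 2) - 1 */

/* Split a full, sorted key array plus one new key.
   On return, left[0 .. MIN_KEYS - 1] stays in the old node, *median moves
   up to the parent, and right[] receives the remaining keys.
   Returns the number of keys written to right[]. */
int split_keys(const int full[MAX_KEYS], int newKey,
               int left[MIN_KEYS], int *median, int right[MAX_KEYS])
{
    int all[MAX_KEYS + 1];
    int n = 0;
    int i = 0;

    /* Merge newKey into a temporary array of ORDER sorted keys. */
    while (i < MAX_KEYS && full[i] <= newKey)
        all[n++] = full[i++];
    all[n++] = newKey;
    while (i < MAX_KEYS)
        all[n++] = full[i++];

    /* Low keys stay, the median goes up, high keys go to the new node. */
    for (i = 0; i < MIN_KEYS; i++)
        left[i] = all[i];
    *median = all[MIN_KEYS];

    int r = 0;
    for (i = MIN_KEYS + 1; i < n; i++)
        right[r++] = all[i];
    return r;
}
```

Splitting a node holding 10, 20, 30, 40 when 25 arrives leaves 10 and 20 in the old node, sends 25 up to the parent, and puts 30 and 40 in the new node.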
B-tree deletion:
B-tree deletion is a little more complicated than insertion.
• Search for the data to be deleted. If it cannot be found, print an error message and quit.
• If the data is found, delete it. Two cases must be considered: the data is in a leaf node or in a non-leaf node.
• If an underflow occurs after the deletion (a leaf node has fewer than ⌈m/2⌉ − 1 entries, or an internal node has fewer than ⌈m/2⌉ non-null subtrees), the tree must be adjusted.
The following algorithm deletes an entry. It handles two special situations: the tree is empty, and the root is empty after the deletion. The details of how to treat underflow are left to the algorithm delete.
Algorithm BTreeDelete (tree, dltKey)
    if (tree empty)
        return false
    end if
    delete (tree, dltKey, success)
    if (success)
        if (tree number of entries zero)
            set tree to left subtree
        end if
    end if
    return success
The following algorithm deletes the entry from a leaf node and returns true if the node underflows.
Algorithm deleteEntry (node, entryNdx)
    delete entry at entryNdx from node
    shift entries after deleted entry to left
    if (number of entries less than minimum entries)
        return true
    else
        return false
    end if
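The slides call `_deleteEntry` in the C code but do not show its body. A minimal sketch of the idea, assuming integer keys, an assumed order of 5, and a hypothetical function name:

```c
#include <stdbool.h>

#define ORDER 5                             /* assumed order */
#define MIN_ENTRIES ((ORDER + 1) / 2 - 1)   /* ceil(ORDER / 2) - 1 */

/* Remove keys[entryNdx] by shifting the later keys one slot to the left.
   Returns true if the node now has fewer than MIN_ENTRIES keys. */
bool delete_entry(int keys[], int *numEntries, int entryNdx)
{
    for (int i = entryNdx; i < *numEntries - 1; i++)
        keys[i] = keys[i + 1];
    (*numEntries)--;
    return *numEntries < MIN_ENTRIES;
}
```

With order 5 the minimum is 2 entries, so deleting from a node with 3 entries reports no underflow, while deleting from a node with 2 entries does.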
When deleting an entry in an internal node, we must find substitute data. We use the immediate predecessor, which is the largest entry in the left subtree of the entry to be deleted. Within that subtree, the largest entry is found by following the rightmost subtree pointers down to a leaf and taking its last entry.
Algorithm deleteMid (node, entryNdx, subtree)
    if (no rightmost subtree) // predecessor in a leaf node
        move predecessor's data to deleted entry and remove it from the leaf
        set underflow if node entries less than minimum
    else
        set underflow to deleteMid (node, entryNdx, right subtree)
        if (underflow)
            set underflow to reflow (subtree, index of last entry)
        end if
    end if
    return underflow
When a node underflows, we need to make an adjustment, which we call reflow. Suppose one subtree of a root entry contains the underflowed node. There are two situations to consider:
• If the sibling subtree has more entries than the minimum number, then we move an entry through the root to the underflowed node. This is called balance.
• If the sibling subtree has only the minimum number of entries, then we combine the two nodes into one node together with the root entry. This is called combine.
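The choice between the two situations can be sketched as a small decision function. The enum and function names are hypothetical; the order of the checks (right sibling first, then left) mirrors the reflow algorithm.

```c
typedef enum { BORROW_RIGHT, BORROW_LEFT, COMBINE } ReflowAction;

/* Decide how to repair an underflow given the entry counts of the two
   subtrees on either side of a root entry. */
ReflowAction choose_reflow(int leftEntries, int rightEntries, int minEntries)
{
    if (rightEntries > minEntries)
        return BORROW_RIGHT;   /* balance: take an entry from the right */
    if (leftEntries > minEntries)
        return BORROW_LEFT;    /* balance: take an entry from the left */
    return COMBINE;            /* neither can spare an entry */
}
```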
Algorithm reflow (root, entryNdx)
    if (rightTree entries greater than minimum entries)
        borrowRight (root, entryNdx, leftTree, rightTree)
        set underflow to false
    else if (leftTree entries greater than minimum entries)
        borrowLeft (root, entryNdx, leftTree, rightTree)
        set underflow to false
    else
        combine (root, entryNdx, leftTree, rightTree)
        if (root numEntries less than minimum entries)
            set underflow to true
        else
            set underflow to false
        end if
    end if
    return underflow
Algorithm borrowLeft (root, entryNdx, left, right)
    shift all entries in right one position to the right
    move root data to first entry in right
    move right first pointer to right subtree of first entry
    move left last right pointer to right first pointer
    move left last entry data to root at entryNdx
In the above algorithm, when an entry is moved the corresponding pointers are also adjusted. (To see why, consider the case where the underflowed node is not a leaf.) The algorithm borrowRight is similar.
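The slides give C code for `_borrowRight` but not for `_borrowLeft`. The following is one possible body, written from the pseudocode above. It repeats the chapter's ENTRY and NODE declarations so that it stands alone; the value of `ORDER` and the name `borrow_left` are assumptions.

```c
#include <stddef.h>

#define ORDER 5   /* assumed order, for illustration only */

typedef struct
{
    void*        dataPtr;
    struct node* rightPtr;
} ENTRY;

typedef struct node
{
    struct node* firstPtr;
    int          numEntries;
    ENTRY        entries[ORDER - 1];
} NODE;

void borrow_left (NODE* root, int entryNdx,
                  NODE* leftTreePtr, NODE* rightTreePtr)
{
    /* Shift all entries in the right node one slot to the right. */
    for (int shifter = rightTreePtr->numEntries; shifter > 0; shifter--)
        rightTreePtr->entries[shifter] = rightTreePtr->entries[shifter - 1];

    /* Move the root entry's data down into the first slot of right; the
       old first pointer of right becomes that entry's right subtree. */
    rightTreePtr->entries[0].dataPtr  = root->entries[entryNdx].dataPtr;
    rightTreePtr->entries[0].rightPtr = rightTreePtr->firstPtr;
    ++rightTreePtr->numEntries;

    /* The last entry of left moves up: its right subtree becomes the first
       pointer of right, and its data replaces the root entry's data. */
    int fromNdx = leftTreePtr->numEntries - 1;
    rightTreePtr->firstPtr = leftTreePtr->entries[fromNdx].rightPtr;
    root->entries[entryNdx].dataPtr = leftTreePtr->entries[fromNdx].dataPtr;
    --leftTreePtr->numEntries;
}
```

The right subtree pointer of the root entry is left unchanged, since it still points to the same right node.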
Algorithm combine (root, entryNdx, left, right)
    move parent entry to first open entry in left subtree
    move right subtree first pointer to moved parent entry's right subtree
    move entries from right subtree to end of left subtree
    shift root data to left
As with a BST, a B-tree is traversed in order. The difference is that, except in leaf nodes, the entries of a node are not all processed at the same time: each entry is processed between the traversals of the subtrees on either side of it.
Algorithm BTreeTraversal (root)
    set scanCount to 0
    set nextSubTree to root left subtree
    loop (scanCount <= number of entries)
        if (nextSubTree not null)
            BTreeTraversal (nextSubTree)
        end if
        if (scanCount < number of entries)
            process (entry[scanCount])
            set nextSubTree to current entry right subtree
        end if
        increment scanCount
    end loop
The B-tree search algorithm follows the same idea as searching a binary search tree, but we must first find the node and then find the entry within that node. In general, a search therefore needs to return both the node and the location of the entry in that node.
A recursive method is used to find the node. Within the node, the target is compared with the entries from the last entry to the first.
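// Headers needed by the C code in this chapter
#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>
// ORDER and MIN_ENTRIES are used by the code but not defined in the slides;
// the values here are assumptions (MIN_ENTRIES is ceil(ORDER / 2) - 1).
#define ORDER 5
#define MIN_ENTRIES (((ORDER + 1) / 2) - 1)
typedef struct
{
    void*        dataPtr;   // pointer to the user's data
    struct node* rightPtr;  // subtree with keys >= this entry's key
} ENTRY;
typedef struct node
{
    struct node* firstPtr;    // subtree with keys < first entry's key
    int          numEntries;  // number of entries currently in use
    ENTRY        entries[ORDER - 1];
} NODE;
typedef struct
{
    int   count;  // total number of entries in the tree
    NODE* root;
    int   (*compare) (void* argu1, void* argu2);
} BTREE;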
void* BTree_Search (BTREE* tree, void* targetPtr)
{
    if (tree->root)
        return _search (tree, targetPtr, tree->root);
    else
        return NULL;
} // BTree_Search
void* _search (BTREE* tree, void* targetPtr, NODE* root)
{
    int entryNo;

    if (!root)
        return NULL;

    // Target is less than the first entry: search the first subtree
    if (tree->compare(targetPtr, root->entries[0].dataPtr) < 0)
        return _search (tree, targetPtr, root->firstPtr);

    // Scan from the last entry down to the first entry <= target
    entryNo = root->numEntries - 1;
    while (tree->compare(targetPtr, root->entries[entryNo].dataPtr) < 0)
        entryNo--;

    if (tree->compare(targetPtr, root->entries[entryNo].dataPtr) == 0)
        return (root->entries[entryNo].dataPtr);

    return (_search (tree, targetPtr, root->entries[entryNo].rightPtr));
} // _search
void BTree_Traverse (BTREE* tree,
                     void (*process) (void* dataPtr))
{
    if (tree->root)
        _traverse (tree->root, process);
    return;
} // end BTree_Traverse
void _traverse (NODE* root,
                void (*process) (void* dataPtr))
{
    int   scanCount;
    NODE* ptr;

    scanCount = 0;
    ptr = root->firstPtr;
    while (scanCount <= root->numEntries)
    {
        if (ptr)
            _traverse (ptr, process);
        // Subtree processed -- get next entry
        if (scanCount < root->numEntries)
        {
            process (root->entries[scanCount].dataPtr);
            ptr = root->entries[scanCount].rightPtr;
        } // if scanCount
        scanCount++;
    } // while
    return;
} // _traverse
void BTree_Insert (BTREE* tree, void* dataInPtr)
{
    bool  taller;
    NODE* newPtr;
    ENTRY upEntry;

    if (tree->root == NULL)
    {
        // Empty tree. Insert first node
        if ((newPtr = (NODE*)malloc(sizeof (NODE))))
        {
            newPtr->firstPtr            = NULL;
            newPtr->numEntries          = 1;
            newPtr->entries[0].dataPtr  = dataInPtr;
            newPtr->entries[0].rightPtr = NULL;
            tree->root                  = newPtr;
            (tree->count)++;
            for (int i = 1; i < ORDER - 1; i++)
            {
                newPtr->entries[i].dataPtr  = NULL;
                newPtr->entries[i].rightPtr = NULL;
            } // for
            return;
        } // if malloc
        else
            printf("Overflow error 100 in BTree_Insert\a\n"),
                exit (100);
    } // if empty tree

    taller = _insert (tree, tree->root, dataInPtr, &upEntry);
    if (taller)
    {
        // Tree has grown. Create new root
        newPtr = (NODE*)malloc(sizeof(NODE));
        if (newPtr)
        {
            newPtr->entries[0] = upEntry;
            newPtr->firstPtr   = tree->root;
            newPtr->numEntries = 1;
            tree->root         = newPtr;
        } // if newPtr
        else
            printf("Overflow error 101\a\n"),
                exit (100);
    } // if taller

    (tree->count)++;
    return;
} // BTree_Insert
bool _insert (BTREE* tree, NODE* root,
              void* dataInPtr, ENTRY* upEntry)
{
    int   compResult;
    int   entryNdx;
    bool  taller;
    NODE* subtreePtr;

    if (!root)
    {
        // Below a leaf: hand the new data up to the caller
        (*upEntry).dataPtr  = dataInPtr;
        (*upEntry).rightPtr = NULL;
        return true; // tree taller
    } // if NULL tree

    entryNdx   = _searchNode (tree, root, dataInPtr);
    compResult = tree->compare(dataInPtr,
                               root->entries[entryNdx].dataPtr);
    if (entryNdx <= 0 && compResult < 0)
        // in node's first subtree
        subtreePtr = root->firstPtr;
    else
        // in entry's right subtree
        subtreePtr = root->entries[entryNdx].rightPtr;

    taller = _insert (tree, subtreePtr, dataInPtr, upEntry);
    if (taller)
    {
        if (root->numEntries >= ORDER - 1)
        {
            // Need to create new node
            _splitNode (root, entryNdx, compResult, upEntry);
            taller = true;
        } // node full
        else
        {
            if (compResult >= 0)
                // New data >= current entry -- insert after
                _insertEntry(root, entryNdx + 1, *upEntry);
            else
                // Insert before current entry
                _insertEntry(root, entryNdx, *upEntry);
            (root->numEntries)++;
            taller = false;
        } // else
    } // if taller
    return taller;
} // _insert
void _splitNode (NODE* node, int entryNdx,
                 int compResult, ENTRY* upEntry)
{
    int   fromNdx;
    int   toNdx;
    NODE* rightPtr;

    rightPtr = (NODE*)malloc(sizeof (NODE));
    if (!rightPtr)
        printf("Overflow Error 101 in _splitNode\a\n"),
            exit (100);

    // Decide where the high entries to be moved start
    if (entryNdx < MIN_ENTRIES)
        fromNdx = MIN_ENTRIES;
    else
        fromNdx = MIN_ENTRIES + 1;

    // Move high entries to the new right node
    toNdx = 0;
    rightPtr->numEntries = node->numEntries - fromNdx;
    while (fromNdx < node->numEntries)
        rightPtr->entries[toNdx++] = node->entries[fromNdx++];
    node->numEntries = node->numEntries - rightPtr->numEntries;

    // Insert the new entry into the left or the right node
    if (entryNdx < MIN_ENTRIES)
    {
        if (compResult < 0)
            _insertEntry (node, entryNdx, *upEntry);
        else
            _insertEntry (node, entryNdx + 1, *upEntry);
    } // if
    else
    {
        _insertEntry (rightPtr, entryNdx - MIN_ENTRIES, *upEntry);
        (rightPtr->numEntries)++;
        (node->numEntries)--;
    } // else

    // The median entry moves up; the new node becomes its right subtree
    upEntry->dataPtr   = node->entries[MIN_ENTRIES].dataPtr;
    upEntry->rightPtr  = rightPtr;
    rightPtr->firstPtr = node->entries[MIN_ENTRIES].rightPtr;
    return;
} // _splitNode
bool BTree_Delete (BTREE* tree, void* dltKey)
{
    bool  success;
    NODE* dltPtr;

    if (!tree->root)
        return false;

    _delete (tree, tree->root, dltKey, &success);

    if (success)
    {
        (tree->count)--;
        if (tree->root->numEntries == 0)
        {
            // Root is empty: its first subtree becomes the new root
            dltPtr     = tree->root;
            tree->root = tree->root->firstPtr;
            free (dltPtr);
        } // root empty
    } // success
    return success;
} // BTree_Delete
bool _delete (BTREE* tree, NODE* root,
              void* dltKeyPtr, bool* success)
{
    NODE* leftPtr;
    NODE* subTreePtr;
    int   entryNdx;
    bool  underflow;

    if (!root)
    {
        *success = false;
        return false;
    } // null tree

    entryNdx = _searchNode (tree, root, dltKeyPtr);
    if (tree->compare(dltKeyPtr, root->entries[entryNdx].dataPtr) == 0)
    {
        // Found entry to be deleted
        *success = true;
        if (root->entries[entryNdx].rightPtr == NULL)
            // Entry is in a leaf node
            underflow = _deleteEntry (root, entryNdx);
        else
        {
            // Entry is in an internal node
            if (entryNdx > 0)
                leftPtr = root->entries[entryNdx - 1].rightPtr;
            else
                leftPtr = root->firstPtr;
            underflow = _deleteMid (root, entryNdx, leftPtr);
            if (underflow)
                underflow = _reFlow (root, entryNdx);
        } // else internal node
    } // if found entry
    else
    {
        if (tree->compare (dltKeyPtr, root->entries[0].dataPtr) < 0)
            subTreePtr = root->firstPtr;
        else
            subTreePtr = root->entries[entryNdx].rightPtr;
        underflow = _delete (tree, subTreePtr, dltKeyPtr, success);
        if (underflow)
            underflow = _reFlow (root, entryNdx);
    } // else not found
    return underflow;
} // _delete
bool _deleteMid (NODE* root,
                 int   entryNdx,
                 NODE* subtreePtr)
{
    int  dltNdx;
    int  rightNdx;
    bool underflow;

    if (subtreePtr->firstPtr == NULL)
    {
        // leaf located. Exchange data & delete leaf
        dltNdx = subtreePtr->numEntries - 1;
        root->entries[entryNdx].dataPtr =
            subtreePtr->entries[dltNdx].dataPtr;
        --subtreePtr->numEntries;
        underflow = subtreePtr->numEntries < MIN_ENTRIES;
    } // if leaf
    else
    {
        // Not located. Traverse right for predecessor
        rightNdx  = subtreePtr->numEntries - 1;
        underflow = _deleteMid (root, entryNdx,
                        subtreePtr->entries[rightNdx].rightPtr);
        if (underflow)
            underflow = _reFlow (subtreePtr, rightNdx);
    } // else traverse right
    return underflow;
} // _deleteMid
bool _reFlow (NODE* root, int entryNdx)
{
    NODE* leftTreePtr;
    NODE* rightTreePtr;
    bool  underflow;

    if (entryNdx == 0)
        leftTreePtr = root->firstPtr;
    else
        leftTreePtr = root->entries[entryNdx - 1].rightPtr;
    rightTreePtr = root->entries[entryNdx].rightPtr;

    if (rightTreePtr->numEntries > MIN_ENTRIES)
    {
        _borrowRight (root, entryNdx, leftTreePtr, rightTreePtr);
        underflow = false;
    } // if borrow right
    else
    {
        // Can't borrow from right--try left
        if (leftTreePtr->numEntries > MIN_ENTRIES)
        {
            _borrowLeft (root, entryNdx, leftTreePtr, rightTreePtr);
            underflow = false;
        } // if borrow left
        else
        {
            // Can't borrow. Must combine nodes.
            _combine (root, entryNdx, leftTreePtr, rightTreePtr);
            underflow = (root->numEntries < MIN_ENTRIES);
        } // else combine
    } // else borrow right
    return underflow;
} // _reFlow
void _borrowRight (NODE* root,
                   int   entryNdx,
                   NODE* leftTreePtr,
                   NODE* rightTreePtr)
{
    int toNdx;
    int shifter;

    // Move the root entry down to the end of the left node
    toNdx = leftTreePtr->numEntries;
    leftTreePtr->entries[toNdx].dataPtr
        = root->entries[entryNdx].dataPtr;
    leftTreePtr->entries[toNdx].rightPtr
        = rightTreePtr->firstPtr;
    ++leftTreePtr->numEntries;

    // Move the first entry of the right node up to the root
    root->entries[entryNdx].dataPtr
        = rightTreePtr->entries[0].dataPtr;
    rightTreePtr->firstPtr
        = rightTreePtr->entries[0].rightPtr;

    // Close the gap in the right node
    shifter = 0;
    while (shifter < rightTreePtr->numEntries - 1)
    {
        rightTreePtr->entries[shifter]
            = rightTreePtr->entries[shifter + 1];
        ++shifter;
    } // while
    --rightTreePtr->numEntries;
    return;
} // _borrowRight
void _combine (NODE* root, int entryNdx,
               NODE* leftTreePtr, NODE* rightTreePtr)
{
    int toNdx;
    int fromNdx;
    int shifter;

    // Move the root entry down to the first open slot of the left node
    toNdx = leftTreePtr->numEntries;
    leftTreePtr->entries[toNdx].dataPtr
        = root->entries[entryNdx].dataPtr;
    leftTreePtr->entries[toNdx].rightPtr
        = rightTreePtr->firstPtr;
    ++leftTreePtr->numEntries;
    --root->numEntries;

    // Append all entries of the right node to the left node
    fromNdx = 0;
    toNdx++;
    while (fromNdx < rightTreePtr->numEntries)
        leftTreePtr->entries[toNdx++]
            = rightTreePtr->entries[fromNdx++];
    leftTreePtr->numEntries += rightTreePtr->numEntries;
    free (rightTreePtr);

    // Close the gap left in the root
    shifter = entryNdx;
    while (shifter < root->numEntries)
    {
        root->entries[shifter] = root->entries[shifter + 1];
        shifter++;
    } // while
    return;
} // _combine
BTREE* BTree_Create (int (*compare) (void* argu1, void* argu2))
{
    BTREE* tree;

    tree = (BTREE*) malloc (sizeof (BTREE));
    if (tree)
    {
        tree->root    = NULL;
        tree->count   = 0;
        tree->compare = compare;
    } // if
    return tree;
} // BTree_Create
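BTree_Create expects a comparison callback. As a sketch, assuming the data entries are pointers to int (the function name `compareInt` is hypothetical), such a callback could look like this:

```c
/* Comparison callback for a B-tree whose data entries point to ints.
   Returns a negative value, zero, or a positive value when *argu1 is
   less than, equal to, or greater than *argu2. */
int compareInt (void* argu1, void* argu2)
{
    int a = *(int*)argu1;
    int b = *(int*)argu2;
    return (a > b) - (a < b);
}
```

A tree of integers would then be created with `BTree_Create (compareInt)`. Comparing with `(a > b) - (a < b)` instead of `a - b` avoids integer overflow for operands of large magnitude.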
void BTree_Print (BTREE* tree)
{
    _print (tree->root, 0);
    return;
} // BTree_Print
void _print (NODE* root, int level)
{
    int   scanCount;
    NODE* ptr;
    void* voidPtr;

    if (root)
    {
        // Print from the last entry to the first (largest keys first)
        scanCount = root->numEntries - 1;
        while (scanCount >= 0)
        {
            ptr = root->entries[scanCount].rightPtr;
            // Test for subtree
            if (ptr)
                _print (ptr, level + 1);
            // Subtree processed -- print current entry
            printf("(%02d)", level);
            for (int i = 1; i <= level; i++ )
                printf (" ." );
            // Assumes the data entries point to ints
            voidPtr = root->entries[scanCount].dataPtr;
            printf("%4d", *((int*)voidPtr));
            printf("\t--Node: %p\n", (void*)root);
            scanCount--;
        } // while
        // Process first pointer
        if (root->firstPtr)
            _print (root->firstPtr, level + 1);
    } // if root
    return;
} // _print
Some special B-trees and variations:
• 2-3 tree: a B-tree of order 3 (suitable for internal search).
• 2-3-4 tree: a B-tree of order 4 (suitable for internal search).
• B*tree: when a node overflows, instead of splitting it immediately, the tree first tries to redistribute the data among the node's siblings.
• B+tree: some data must be processed both randomly and sequentially. In a B+tree, all data are stored in the leaf nodes; the keys in the internal nodes are used only for searching. Each leaf node has one additional pointer that points to the next leaf node.
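The sequential access that a B+tree offers can be sketched by walking the chain of leaf nodes. The `Leaf` layout, its capacity, and the function name below are assumptions made for illustration with integer keys:

```c
#include <stddef.h>

#define LEAF_CAPACITY 4   /* assumed capacity, for illustration only */

typedef struct Leaf {
    int numKeys;
    int keys[LEAF_CAPACITY];
    struct Leaf *next;    /* next leaf in key order, NULL at the end */
} Leaf;

/* Copy all keys, in ascending order, into out[] by following the leaf
   chain; no internal node is visited. Stops after maxOut keys.
   Returns the number of keys copied. */
int collect_keys(const Leaf *first, int out[], int maxOut)
{
    int n = 0;
    for (const Leaf *leaf = first; leaf != NULL; leaf = leaf->next)
        for (int i = 0; i < leaf->numKeys && n < maxOut; i++)
            out[n++] = leaf->keys[i];
    return n;
}
```

Random access still starts at the root and uses the internal keys to reach a leaf; only the sequential pass uses the extra pointer.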
Tries
A trie is a multiway tree in which a key is searched as a sequence of characters (letters or digits, for example).
For example, to search for the key begin, we first find b, then be, then beg, and so on.
For keys made of English letters, the root has 26 children and each node may have at most 26 children, so the trie is based on a 26-way tree. In English, no words begin with 'bb', 'bc', 'bf', 'bg', and so on, so the corresponding nodes can be pruned.
To prune the tree, we cut all of the branches that are not needed. For example, if no key starts with the letter X, then at level 0 the X pointer is null. Similarly, after the letter Q the only valid letter is U, so all the pointers in the Q branch except U are set to null.
As an example, consider a trie over only 5 letters: A, B, C, E, and T. Each node contains an array of 5 pointers, one per letter; a pointer is non-null only if the corresponding letter occurs at that position in some key. In this example, most pointers are null.
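A node of the 5-letter example can be sketched as follows. The type and function names are hypothetical; a null pointer stands for a pruned branch.

```c
#include <stddef.h>

#define ALPHABET_SIZE 5   /* A, B, C, E, T as in the example */

typedef struct TrieNode5 {
    struct TrieNode5 *child[ALPHABET_SIZE];   /* NULL means pruned branch */
} TrieNode5;

/* Map a letter to its slot in child[], or -1 if it is not in the alphabet. */
int letter_slot(char letter)
{
    static const char alphabet[ALPHABET_SIZE] = { 'A', 'B', 'C', 'E', 'T' };
    for (int i = 0; i < ALPHABET_SIZE; i++)
        if (alphabet[i] == letter)
            return i;
    return -1;
}

/* Return 1 if the node has a branch for the letter, 0 otherwise. */
int has_branch(const TrieNode5 *node, char letter)
{
    int slot = letter_slot(letter);
    return slot >= 0 && node->child[slot] != NULL;
}
```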
Algorithm searchTrie (dictionary, word)
    set root to dictionary
    set ltrNdx to 0
    loop (root not null)
        if (root entry equals word)
            return true
        end if
        if (ltrNdx >= word length)
            return false
        end if
        set chNdx to word[ltrNdx]
        set root to chNdx subtree
        increment ltrNdx
    end loop
    return false
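A C sketch of searchTrie, under stated assumptions: keys consist only of the lowercase letters a to z, each node stores a pointer to the word that ends there, and the node layout and all names (`TrieNode`, `trie_insert`, `trie_search`) are hypothetical. A small insert routine is included only so that the search can be exercised.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define LETTERS 26   /* assumes keys use only 'a'..'z' */

typedef struct TrieNode {
    const char *entry;                 /* word ending at this node, or NULL */
    struct TrieNode *child[LETTERS];   /* NULL means pruned branch */
} TrieNode;

/* Allocate a node with no entry and all branches pruned. */
static TrieNode *trie_new_node(void)
{
    TrieNode *node = malloc(sizeof *node);
    if (node != NULL) {
        node->entry = NULL;
        for (int i = 0; i < LETTERS; i++)
            node->child[i] = NULL;
    }
    return node;
}

/* Add a word (stored by pointer, not copied); false if out of memory. */
bool trie_insert(TrieNode *root, const char *word)
{
    for (const char *p = word; *p != '\0'; p++) {
        int chNdx = *p - 'a';
        if (root->child[chNdx] == NULL) {
            root->child[chNdx] = trie_new_node();
            if (root->child[chNdx] == NULL)
                return false;
        }
        root = root->child[chNdx];
    }
    root->entry = word;
    return true;
}

/* Follow the letters of word from the root, as in searchTrie. */
bool trie_search(const TrieNode *root, const char *word)
{
    size_t ltrNdx = 0;
    size_t length = strlen(word);

    while (root != NULL) {
        if (root->entry != NULL && strcmp(root->entry, word) == 0)
            return true;
        if (ltrNdx >= length)
            return false;
        int chNdx = word[ltrNdx] - 'a';
        root = root->child[chNdx];
        ltrNdx++;
    }
    return false;
}
```

After inserting begin and be, a search for beg follows the b, e, and g branches but finds no entry at the g node, so it returns false.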