balanced search trees
Post on 05-Jan-2016
Balanced Search Trees
15-211 Fundamental Data Structures and Algorithms
Margaret Reid-Miller
3 February 2005
Plan
Today: 2-3-4 trees
Red-Black trees
Reading: For today: Chapters 13.3-4
Reminder: HW1 due tonight!!!
HW2 will be available soon
AVL-tree Review
AVL-Trees
What is the key restriction on a binary search tree that keeps an AVL tree balanced?
[Figure: example binary search trees, each marked OK or not OK as an AVL tree.]
AVL-Trees
Height balanced:
For each node, the heights of the left and right subtrees differ by at most 1, a representational invariant.
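The height-balance invariant can be checked directly. The following is a minimal sketch, not the course's code: the names `AvlCheck`, `Node`, `height`, and `isBalanced` are hypothetical, and heights are recomputed on every call for simplicity, whereas a real AVL tree stores the height in each node.

```java
// Hypothetical names, chosen for illustration only.
class AvlCheck {
    static class Node {
        int key;
        Node left, right;
        Node(int key, Node left, Node right) {
            this.key = key;
            this.left = left;
            this.right = right;
        }
    }

    // Assumption: an empty subtree has height -1, so a single leaf has height 0.
    static int height(Node n) {
        return n == null ? -1 : 1 + Math.max(height(n.left), height(n.right));
    }

    // True if, at every node, the left and right subtree heights differ by at most 1.
    static boolean isBalanced(Node n) {
        if (n == null) return true;
        int diff = Math.abs(height(n.left) - height(n.right));
        return diff <= 1 && isBalanced(n.left) && isBalanced(n.right);
    }

    public static void main(String[] args) {
        Node ok = new Node(5, new Node(3, null, null), new Node(7, null, null));
        Node chain = new Node(1, null, new Node(2, null, new Node(3, null, null)));
        System.out.println(isBalanced(ok));     // true
        System.out.println(isBalanced(chain));  // false
    }
}
```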
What is the mechanism to rebalance an AVL tree that an insert has put out of balance?
The single rotation
Rotate at the deepest out-of-balance node. “Pulls” the child up one level.
[Figure: single rotation, before and after, with subtrees X, Y and Z.]
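A single rotation can be sketched as a few pointer updates. This is an illustrative sketch with hypothetical names (`SingleRotation`, `rotateRight`, `rotateLeft`); it omits the height bookkeeping an AVL tree would also need.

```java
// Hypothetical names; height fields are omitted to keep the sketch short.
class SingleRotation {
    static class Node {
        int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    // Rotate right around z: z's left child x is pulled up one level.
    static Node rotateRight(Node z) {
        Node x = z.left;
        z.left = x.right;   // x's right subtree moves under z
        x.right = z;
        return x;           // x is the new root of this subtree
    }

    // Mirror image: z's right child x is pulled up one level.
    static Node rotateLeft(Node z) {
        Node x = z.right;
        z.right = x.left;   // x's left subtree moves under z
        x.left = z;
        return x;
    }

    public static void main(String[] args) {
        // Left-leaning chain 3 -> 2 -> 1, out of balance at 3.
        Node z = new Node(3);
        z.left = new Node(2);
        z.left.left = new Node(1);
        Node root = rotateRight(z);
        System.out.println(root.key + " " + root.left.key + " " + root.right.key); // 2 1 3
    }
}
```

The caller is responsible for re-attaching the returned node where the old subtree root used to hang.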
The double rotation
First rotate around child node, then around the parent node.
[Figure: double rotation, first step, with subtrees X, Y1, Y2 and Z.]
Double rotation cont’d
Result is to “pull” the grandchild node up two levels.
[Figure: double rotation, second step, with subtrees X, Y1, Y2 and Z.]
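The double rotation described above is two single rotations in sequence. A minimal sketch for the left-right case follows; the class and method names are hypothetical and height updates are again left out.

```java
// Hypothetical names; only the left-right case is shown (right-left is its mirror image).
class DoubleRotation {
    static class Node {
        int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    static Node rotateRight(Node n) {
        Node c = n.left;
        n.left = c.right;
        c.right = n;
        return c;
    }

    static Node rotateLeft(Node n) {
        Node c = n.right;
        n.right = c.left;
        c.left = n;
        return c;
    }

    // z is out of balance because its left child has a tall right subtree.
    static Node rotateLeftRight(Node z) {
        z.left = rotateLeft(z.left);  // first rotate around the child
        return rotateRight(z);        // then rotate around the parent
    }

    public static void main(String[] args) {
        // Zig-zag shape: 3 at the top, 1 as its left child, 2 as the right child of 1.
        Node z = new Node(3);
        z.left = new Node(1);
        z.left.right = new Node(2);
        Node root = rotateLeftRight(z);
        System.out.println(root.key + " " + root.left.key + " " + root.right.key); // 2 1 3
    }
}
```

In the demo, the grandchild 2 ends up at the top of the subtree, two levels above where it started.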
AVL Tree Summary
Each node maintains a lazy deletion flag and the height of its subtree.
The height of an AVL tree is at most 45% greater than the minimum possible height for the same number of nodes.
Requires at most one single or double rotation to regain balance after an insert.
Thus, guarantees O(log N) time for search and insert.
2-3-4 Trees
Balanced 2-3-4 Trees
Maintain height balance in all subtrees. Depth property.
But allow nodes in the tree to expand to accommodate inserts.
In particular, nodes can have 2, 3 or 4 children. Node-size property.
E.g., a 4-node has 3 keys, which split the key range into 4 intervals.
2-3-4 tree search
Search is similar to a binary search.
E.g., search for B
[Figure: 2-3-4 tree with root node G M Q; the leaves include A C and H.]
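Searching a 2-3-4 tree generalizes binary search: within a node, scan the keys to find either the key itself or the interval that selects the next link. The sketch below assumes a hypothetical `Node` with a sorted `keys` array and a `children` array that is `null` at the leaves; none of these names come from the slides.

```java
// Hypothetical representation: keys are sorted; a leaf has children == null.
class Tree234Search {
    static class Node {
        char[] keys;      // 1 to 3 keys in sorted order
        Node[] children;  // null for a leaf, otherwise keys.length + 1 links
        Node(char[] keys, Node[] children) {
            this.keys = keys;
            this.children = children;
        }
    }

    static boolean contains(Node node, char key) {
        while (node != null) {
            int i = 0;
            // Find the first key that is not smaller than the search key.
            while (i < node.keys.length && key > node.keys[i]) i++;
            if (i < node.keys.length && key == node.keys[i]) return true;
            // Otherwise follow link i, which covers the interval containing key.
            node = (node.children == null) ? null : node.children[i];
        }
        return false;
    }

    public static void main(String[] args) {
        // A small tree loosely modeled on the slide; the leaf holding N is an assumption.
        Node root = new Node(new char[] {'G', 'M', 'Q'}, new Node[] {
            new Node(new char[] {'A', 'C'}, null),
            new Node(new char[] {'H'}, null),
            new Node(new char[] {'N'}, null),
            new Node(new char[] {'S', 'W'}, null)
        });
        System.out.println(contains(root, 'B')); // false
        System.out.println(contains(root, 'C')); // true
    }
}
```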
2-3-4 Tree Insert
To insert, first search for a leaf node in which to put the key.
E.g., insert U
[Figure: the tree before and after inserting U; the leaf S W becomes S U W.]
2-3-4 Tree Insert
May need to split a node.
E.g., insert T
[Figure: the tree before and after inserting T into the leaf S U W, which splits.]
2-3-4 Tree Insert
/* Either returns an empty node or a new root */
public Node BUinsert(int key) {
    if (isEmptyNode()) return new Node(key);

    /* Search for leaf to put key into */
    Node child = findChild(key);   // down which link?
    Node upNode = child.BUinsert(key);

    /* upNode is empty, the key at a leaf node, or
     * the result of a 4-node split that needs to be
     * propagated up. */
    if (upNode.isEmptyNode()) return upNode;
    else return addToNode(upNode);   // split?
}
Cascading splits
When inserting a key into a 4-node, the 4-node splits and a key moves up to the parent node.
This new key may in turn cause the parent to split, moving a key up to the grandparent, and so on up to the root.
When would this happen?
Is there a way to avoid these cascading splits?
Bottom-up 2-3-4 trees
This BUinsert is called a bottom-up version of insert, since splits occur as we go back up the tree after the recursive calls.
Work occurs before and after the recursive calls.
Preemptive Split
Every time we find a 4-node while traveling down a search path, we split the 4-node.
Note: Two 2-nodes have the same number of children as one 4-node.
Changes are local to the split node (no cascading).
Guaranteed to find a 2-node or 3-node at the leaf.
Splitting a root node creates a new root.
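The split itself can be sketched as follows. This is an illustrative sketch under stated assumptions, not the course's implementation: `PreemptiveSplit`, `splitChild`, and `splitRoot` are hypothetical names, and nodes hold their keys and links in lists. Because the parent was already split on the way down if it was a 4-node, it always has room for the key that moves up.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names; keys and links are kept in lists for brevity.
class PreemptiveSplit {
    static class Node {
        List<Integer> keys = new ArrayList<>();   // sorted, 1 to 3 keys
        List<Node> children = new ArrayList<>();  // empty for a leaf
        boolean isFourNode() { return keys.size() == 3; }
    }

    // Split the 4-node at parent.children.get(i). The parent must not be a 4-node.
    static void splitChild(Node parent, int i) {
        Node full = parent.children.get(i);
        Node left = new Node();
        Node right = new Node();
        left.keys.add(full.keys.get(0));
        right.keys.add(full.keys.get(2));
        if (!full.children.isEmpty()) {
            // The four links are shared out between the two new 2-nodes.
            left.children.addAll(full.children.subList(0, 2));
            right.children.addAll(full.children.subList(2, 4));
        }
        parent.keys.add(i, full.keys.get(1));  // the middle key moves up
        parent.children.set(i, left);
        parent.children.add(i + 1, right);
    }

    // Splitting a 4-node root creates a new root, so the tree grows by one level.
    static Node splitRoot(Node root) {
        Node newRoot = new Node();
        newRoot.children.add(root);
        splitChild(newRoot, 0);
        return newRoot;
    }
}
```

All changes touch only the split node and its parent, which is why no cascade is possible.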
2-3-4 Tree Height
What is the height of the tree?
At most log2 N + 1
Why?
The maximum depth is when every node is a 2-node. Since every leaf has the same depth, the tree is complete and has depth log2 N + 1.
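The bound can be written out as a short derivation. Here h denotes the number of levels in the tree (an assumption about how the slide counts height). A tree of 2-nodes with h levels holds the fewest keys possible, and any 3-node or 4-node only adds keys without adding levels:

```latex
N \;\ge\; 2^{h} - 1
\quad\Longrightarrow\quad
h \;\le\; \log_2 (N + 1) \;\le\; \log_2 N + 1 \qquad (N \ge 1)
```

The last step uses N + 1 ≤ 2N for N ≥ 1.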
Number of splits
How many splits does an insertion require?
At most log2 N + 1 splits.
Seems to require less than one split on average when tree is built from a random permutation. Trees tend to have few 4-nodes.
Top-down 2-3-4 trees
The second method is called top-down as splits occur on the way down the tree.
All the work occurs before the recursive calls and no work occurs after them. This is called tail recursion, which can be turned into a loop and is therefore much more efficient.
Can AVL trees be made tail recursive?
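To illustrate what tail recursion buys, here is a sketch using plain binary search tree lookup, with hypothetical names throughout. In the recursive version the recursive call is the last action, so nothing needs to be remembered on the way back up and the method can be rewritten mechanically as a loop. Java implementations generally do not perform this rewrite automatically, so it is done by hand here.

```java
// Hypothetical names; lookup is used because it is the simplest tail-recursive tree operation.
class TailRecursionDemo {
    static class Node {
        int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    // Tail-recursive: the recursive call is the very last thing the method does.
    static boolean searchRecursive(Node n, int key) {
        if (n == null) return false;
        if (key == n.key) return true;
        return searchRecursive(key < n.key ? n.left : n.right, key);
    }

    // The same logic as a loop: the parameter update replaces the recursive call.
    static boolean searchLoop(Node n, int key) {
        while (n != null) {
            if (key == n.key) return true;
            n = key < n.key ? n.left : n.right;
        }
        return false;
    }

    public static void main(String[] args) {
        Node root = new Node(5);
        root.left = new Node(3);
        root.right = new Node(8);
        System.out.println(searchRecursive(root, 3) + " " + searchLoop(root, 3)); // true true
        System.out.println(searchRecursive(root, 4) + " " + searchLoop(root, 4)); // false false
    }
}
```

A bottom-up insert cannot be rewritten this way, because work such as splitting or rebalancing remains after each recursive call returns.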
2-3-4 trees
Advantages:
Guaranteed O(log N) time for search and insert.
Issues:
Awkward to maintain three types of nodes.
Need to modify the standard search on binary trees.
Splits need to move links between nodes.
Code has many cases to handle.
Red Black Trees
Red-Black trees
A red-black tree is a binary tree representation of a 2-3-4 tree using red and black nodes.
[Figure: a 2-3-4 tree and its equivalent red-black tree.]
Red-black tree properties
A Red-Black tree is a binary search tree where
Every node is colored either red or black.
Note: Every 2-3-4 node corresponds to one black node.
The root node is black.
Red nodes always have black parents (and black children).
Every path from the root to a leaf has the same number of black nodes.
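The color properties listed above can be checked with one recursive pass. The sketch below uses hypothetical names (`RedBlackCheck`, `blackHeight`, `isValid`) and checks only the color rules; it assumes the keys already satisfy the binary search tree ordering.

```java
// Hypothetical names; only the color properties are checked, not key ordering.
class RedBlackCheck {
    static class Node {
        int key;
        boolean red;
        Node left, right;
        Node(int key, boolean red, Node left, Node right) {
            this.key = key;
            this.red = red;
            this.left = left;
            this.right = right;
        }
    }

    static boolean isRed(Node n) { return n != null && n.red; }

    // Returns the number of black nodes on every path from n down to an empty link,
    // or -1 if a red node has a red child or two paths disagree on the count.
    static int blackHeight(Node n) {
        if (n == null) return 0;
        if (n.red && (isRed(n.left) || isRed(n.right))) return -1;
        int l = blackHeight(n.left);
        int r = blackHeight(n.right);
        if (l < 0 || r < 0 || l != r) return -1;
        return l + (n.red ? 0 : 1);
    }

    static boolean isValid(Node root) {
        return !isRed(root) && blackHeight(root) >= 0;
    }

    public static void main(String[] args) {
        Node good = new Node(5, false, new Node(3, true, null, null), new Node(7, true, null, null));
        Node bad = new Node(5, false, new Node(3, false, null, null), null);
        System.out.println(isValid(good)); // true
        System.out.println(isValid(bad));  // false
    }
}
```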
Red-black tree height
What is the height of a red-black tree?
It is at most 2 log N + 2 since it can be at most twice as high as its corresponding 2-3-4 tree, which has height at most log N + 1.
[Figure: example red-black tree with keys 3, 5, 6, 7 and 9.]
Red-black Tree Search
Search is the same as for binary search trees. Color is irrelevant.
Search guaranteed to take O(log N) time.
Search typically occurs more frequently than insert.
Red-black Tree Insert
Simple 4-node test (2 red children?)
Few splits as most 4-nodes tend to be near the leaves.
Some 4-node splits require only changing the color of three nodes.
Rotations needed only when a 4-node has a 3-node parent.
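The 4-node test and the color-only split can be sketched in a few lines. The names `ColorFlip`, `isFourNode`, and `flipColors` are hypothetical. The sketch covers only the recoloring case; it does not perform the rotations needed when the recolored node ends up with a red parent.

```java
// Hypothetical names; rotations for the red-parent case are not shown.
class ColorFlip {
    static class Node {
        int key;
        boolean red;
        Node left, right;
        Node(int key, boolean red) {
            this.key = key;
            this.red = red;
        }
    }

    static boolean isRed(Node n) { return n != null && n.red; }

    // A 4-node is a black node with two red children.
    static boolean isFourNode(Node n) {
        return n != null && !n.red && isRed(n.left) && isRed(n.right);
    }

    // Split a 4-node by changing the color of three nodes: the middle key
    // "moves up" by turning red, and the two halves become black 2-nodes.
    static void flipColors(Node n) {
        n.red = true;
        n.left.red = false;
        n.right.red = false;
    }

    public static void main(String[] args) {
        Node n = new Node(5, false);
        n.left = new Node(3, true);
        n.right = new Node(7, true);
        System.out.println(isFourNode(n)); // true
        flipColors(n);
        System.out.println(isFourNode(n)); // false
    }
}
```

If the flipped node is the root, it is recolored black afterwards so that the root stays black.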
Red-black Tree Summary
Advantages:
Guaranteed O(log N) time for search and insert.
Little overhead for balancing.
Trees are nearly optimal.
Top-down implementation can be made tail-recursive, so very efficient.
B-Trees
B-trees
A generalization of 2-3-4 trees.
Used for very large dictionaries where the data are maintained on disks.
Since disk lookups are very SLOW, want to read as few disk pages as possible.
Want really shallow depth trees!
B-trees Key Idea
Make the nodes in the trees have a huge number of links, k-way.
Typically choose k so that a node fills a disk page.
As with 2-3-4 trees, not all the nodes have k links. Some may have as few as k/2 links.
When a node overflows, split the node.
B-trees
Takes O(log_{k/2} N) probes for search and insert.
Typically about 2-3 probes (disk accesses)
E.g., for N < 125 million and k = 1000, the height of the tree is less than 3.
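The arithmetic behind this example, assuming the worst case in which every node has only k/2 = 500 links:

```latex
\left(\tfrac{k}{2}\right)^{3} = 500^{3} = 1.25 \times 10^{8}
\quad\Longrightarrow\quad
\log_{k/2} N < 3 \quad \text{for } N < 1.25 \times 10^{8}
```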
As all searches go through the root node, usually keep the root node in memory.
Many variants
Common in many large database systems.
Conclusion
AVL trees have the disadvantage that insert is not tail recursive.
2-3-4 trees are not practical, but are a good way to think about other approaches.
Red-black trees are very efficient and have guaranteed O(log N) insert and search.
B-trees have very shallow depth to minimize the number of disk reads needed for huge databases.