Balanced Search Trees 15-211 Fundamental Data Structures and Algorithms Margaret Reid-Miller 3 February 2005


Page 1: Balanced Search Trees

Balanced Search Trees

15-211 Fundamental Data Structures and Algorithms

Margaret Reid-Miller

3 February 2005

Page 2: Balanced Search Trees

Plan

Today: 2-3-4 trees

Red-Black trees

Reading: for today, Chapters 13.3-4

Reminder: HW1 due tonight!!!

HW2 will be available soon

Page 3: Balanced Search Trees

AVL-tree Review

Page 4: Balanced Search Trees

AVL-Trees

What is the key restriction on a binary search tree that keeps an AVL tree balanced?

[Figure: example binary search trees labeled "OK" and "not OK" as AVL trees]

Page 5: Balanced Search Trees

AVL-Trees

Height balanced:

For each node, the heights of the left and right subtrees differ by at most 1 (a representation invariant).

What is the mechanism to rebalance an AVL tree that an insert has put out of balance?
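As a sketch of the height-balance invariant, the Java below checks every node bottom-up, treating an empty subtree as height -1 and a single node as height 0. The class, node layout and method names are illustrative assumptions, not code from the course.

```java
// A sketch of the AVL height-balance check; names are illustrative.
class AvlCheck {
    static final class Node {
        final int key;
        final Node left, right;
        Node(int key, Node left, Node right) {
            this.key = key;
            this.left = left;
            this.right = right;
        }
        Node(int key) { this(key, null, null); }
    }

    // Sentinel meaning "some node in this subtree violates the invariant".
    private static final int UNBALANCED = Integer.MIN_VALUE;

    // True if, at every node, the left and right subtree heights differ by at most 1.
    static boolean isHeightBalanced(Node root) {
        return checkedHeight(root) != UNBALANCED;
    }

    // Returns the height of n's subtree (-1 if empty, 0 for a single node),
    // or UNBALANCED as soon as any node fails the invariant.
    private static int checkedHeight(Node n) {
        if (n == null) return -1;
        int hl = checkedHeight(n.left);
        if (hl == UNBALANCED) return UNBALANCED;
        int hr = checkedHeight(n.right);
        if (hr == UNBALANCED) return UNBALANCED;
        if (Math.abs(hl - hr) > 1) return UNBALANCED;
        return 1 + Math.max(hl, hr);
    }
}
```

For example, a tree with root 5, left child 3 (children 2 and 4) and right child 6 passes, while the chain 1, 2, 3 built by always going right fails at the root.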

Page 6: Balanced Search Trees

The single rotation

Rotate at the deepest out-of-balance node. This “pulls” the child up one level.

[Figure: single rotation, before and after, with subtrees X, Y and Z]
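A minimal Java sketch of the single rotation, assuming each node caches the height of its subtree (as the AVL summary below describes). The names `rotateWithLeftChild` and `rotateWithRightChild` and the node fields are illustrative choices; the two methods are mirror images of each other.

```java
// A sketch of the AVL single rotation; names are illustrative.
class SingleRotation {
    static final class Node {
        int key;
        int height;  // height of this node's subtree (a single node has height 0)
        Node left, right;
        Node(int key, Node left, Node right) {
            this.key = key;
            this.left = left;
            this.right = right;
            updateHeight(this);
        }
    }

    static int height(Node n) { return n == null ? -1 : n.height; }

    static void updateHeight(Node n) {
        n.height = 1 + Math.max(height(n.left), height(n.right));
    }

    // k2 is the out-of-balance node and k1 its left child. k1 is pulled up one
    // level; k1's right subtree (keys between k1 and k2) becomes k2's left subtree.
    static Node rotateWithLeftChild(Node k2) {
        Node k1 = k2.left;
        k2.left = k1.right;
        k1.right = k2;
        updateHeight(k2);  // k2 now sits below k1, so fix its height first
        updateHeight(k1);
        return k1;         // new root of this subtree
    }

    // Mirror image: pull k1's right child up one level.
    static Node rotateWithRightChild(Node k1) {
        Node k2 = k1.right;
        k1.right = k2.left;
        k2.left = k1;
        updateHeight(k1);
        updateHeight(k2);
        return k2;
    }
}
```

Applied to the left-leaning chain 3, 2, 1, `rotateWithLeftChild` returns 2 as the new subtree root with children 1 and 3, and the subtree height drops from 2 to 1.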

Page 7: Balanced Search Trees

The double rotation

First rotate around the child node, then around the parent node.

[Figure: double rotation on a tree with subtrees X, Y1, Y2 and Z]

Page 8: Balanced Search Trees

Double rotation cont’d

Result is to “pull” the grandchild node up two levels.

[Figure: double rotation, before and after, with subtrees X, Y1, Y2 and Z]
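The double rotation can be sketched as two single rotations in sequence. In the Java below (illustrative names, same assumed height-caching node as a plain AVL sketch would use), `doubleWithLeftChild` handles the case where the left child's right subtree is too tall; `doubleWithRightChild` is its mirror image. Either way the grandchild ends up two levels higher, as the root of the subtree.

```java
// A sketch of the AVL double rotation; names are illustrative.
class DoubleRotation {
    static final class Node {
        int key;
        int height;  // subtree height (a single node has height 0)
        Node left, right;
        Node(int key, Node left, Node right) {
            this.key = key;
            this.left = left;
            this.right = right;
            updateHeight(this);
        }
    }

    static int height(Node n) { return n == null ? -1 : n.height; }

    static void updateHeight(Node n) {
        n.height = 1 + Math.max(height(n.left), height(n.right));
    }

    // Single rotation pulling k2's left child up one level.
    static Node rotateWithLeftChild(Node k2) {
        Node k1 = k2.left;
        k2.left = k1.right;
        k1.right = k2;
        updateHeight(k2);
        updateHeight(k1);
        return k1;
    }

    // Single rotation pulling k1's right child up one level.
    static Node rotateWithRightChild(Node k1) {
        Node k2 = k1.right;
        k1.right = k2.left;
        k2.left = k1;
        updateHeight(k1);
        updateHeight(k2);
        return k2;
    }

    // k3's left child's RIGHT subtree is too tall. First rotate around the child,
    // then around k3: the grandchild rises two levels to become the subtree root.
    static Node doubleWithLeftChild(Node k3) {
        k3.left = rotateWithRightChild(k3.left);
        return rotateWithLeftChild(k3);
    }

    // Mirror image: k1's right child's LEFT subtree is too tall.
    static Node doubleWithRightChild(Node k1) {
        k1.right = rotateWithLeftChild(k1.right);
        return rotateWithRightChild(k1);
    }
}
```

For example, starting from 3 with left child 1 whose right child is 2, `doubleWithLeftChild` returns 2 with children 1 and 3. A single rotation at 3 would not fix this shape, which is why the double rotation is needed.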

Page 9: Balanced Search Trees

AVL Tree Summary

Each node maintains a lazy-deletion flag and the height of its subtree.

The height of an AVL tree is at most 45% greater than the minimum possible height for the same number of nodes.

Requires at most one single or double rotation to regain balance after an insert.

Thus, guarantees O(log N) time for search and insert.
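Where does the 45% figure come from? The fewest nodes an AVL tree of height h can have satisfies N(h) = N(h-1) + N(h-2) + 1, because the sparsest such tree has subtrees of heights h-1 and h-2. These counts are Fibonacci numbers minus one, so they grow like 1.618^h. The Java sketch below (class and method names are illustrative) computes them and compares h with ⌊log2 N⌋, the height of the shallowest binary tree on N nodes.

```java
// A sketch relating AVL height to node count; names are illustrative.
class AvlHeightBound {
    // Fewest nodes in an AVL tree of height h >= 0 (a single node has height 0):
    // N(h) = N(h-1) + N(h-2) + 1, with N(-1) = 0 (empty tree) and N(0) = 1.
    static long minNodes(int h) {
        long prev = 0;  // N(i-1)
        long cur = 1;   // N(i), starting at i = 0
        for (int i = 1; i <= h; i++) {
            long next = cur + prev + 1;
            prev = cur;
            cur = next;
        }
        return cur;
    }

    // Height of the shallowest binary tree with n >= 1 nodes: floor(log2 n).
    static int minHeight(long n) {
        return 63 - Long.numberOfLeadingZeros(n);
    }

    public static void main(String[] args) {
        for (int h = 10; h <= 80; h += 10) {
            long n = minNodes(h);
            System.out.printf("h=%d  minNodes=%d  ratio=%.3f%n",
                    h, n, (double) h / minHeight(n));
        }
    }
}
```

For h = 40 the minimum is 433,494,436 nodes, and the shallowest binary tree on that many nodes has height 28, a ratio of about 1.43. For large h the ratio approaches roughly 1.44.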

Page 10: Balanced Search Trees

2-3-4 Trees

Page 11: Balanced Search Trees

Balanced 2-3-4 Trees

Maintain perfect height balance in all subtrees: every leaf is at the same depth. Depth property.

But allow nodes in the tree to expand to accommodate inserts.

In particular, nodes can have 2, 3 or 4 children. Node-size property.

E.g., a 4-node has 3 keys, which split the keys into 4 intervals.

Page 12: Balanced Search Trees

2-3-4 tree search

Search is similar to a binary search.

E.g., search for B

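[Figure: a 2-3-4 tree with root G M Q and leaves including A C and H; the search for B follows the link left of G to the leaf A C]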
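A sketch of the search in Java, assuming each node stores its 1 to 3 keys in sorted order plus one child link per key interval (the `Node` layout and the `contains` name are illustrative). At each node, scan the keys to find the interval the target falls in, then follow that link.

```java
// A sketch of 2-3-4 tree search; the node layout is an illustrative assumption.
class Search234 {
    static final class Node {
        final char[] keys;      // 1 to 3 keys in increasing order
        final Node[] children;  // none at a leaf, else keys.length + 1 subtrees
        Node(char[] keys, Node... children) {
            this.keys = keys;
            this.children = children;
        }
        boolean isLeaf() { return children.length == 0; }
    }

    // Like binary search, except each node compares against up to 3 keys to
    // choose one of up to 4 intervals, then follows the link for that interval.
    static boolean contains(Node node, char key) {
        while (node != null) {
            int i = 0;
            while (i < node.keys.length && key > node.keys[i]) i++;
            if (i < node.keys.length && key == node.keys[i]) return true;
            if (node.isLeaf()) return false;  // reached a leaf without a match
            node = node.children[i];           // descend into interval i
        }
        return false;
    }
}
```

With a small example tree (root G M Q and leaves A C, H, O and S W), searching for B goes left of G to the leaf A C, finds that B falls between A and C, and returns `false` there.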

Page 14: Balanced Search Trees

2-3-4 Tree Insert

To insert, first search for a leaf node in which to put the key.

E.g., insert U

[Figure: the tree before and after inserting U; the root is G M Q, and U is added to the leaf holding S and W, giving S U W]

Page 15: Balanced Search Trees

2-3-4 Tree Insert

May need to split a node. E.g., insert T

[Figure: inserting T into a tree with root G Q; the full leaf S U W splits, U moves up to give root G Q U, and T then joins S]

Page 16: Balanced Search Trees

2-3-4 Tree Insert

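/* Either returns an empty node or a new root */
// isEmptyNode, findChild and addToNode are Node helpers not shown on the slide.
public Node BUinsert(int key) {
    // Empty node: wrap the key in a new node (the new root if the tree
    // was empty, otherwise handed back up to the leaf above).
    if (isEmptyNode()) return new Node(key);

    /* Search for leaf to put key into */
    Node child = findChild(key);          // down which link?
    Node upNode = child.BUinsert(key);

    /* upNode is empty, the key at a leaf node, or
     * the result of a 4-node split that needs to be
     * propagated up. */
    if (upNode.isEmptyNode()) return upNode;  // nothing to add here
    else return addToNode(upNode);            // add upNode's key here; split?
}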
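The slide leaves `BUinsert`'s helpers unspecified, so here is a self-contained Java sketch of the same bottom-up idea under its own assumptions: keys are `int`s, a node holds lists of keys and children, and instead of an "empty node" the recursive call returns `null` when nothing propagates, or a `Split` (the key moving up plus the new right sibling) otherwise. A node may briefly overflow to four keys and is then split. Promoting the third of the four keys is an arbitrary choice here; promoting either inner key keeps the tree valid. All names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

// A sketch of bottom-up 2-3-4 insertion; class and method names are illustrative.
class BottomUp234 {
    static final class Node {
        final List<Integer> keys = new ArrayList<>();   // sorted; 1 to 3 between inserts
        final List<Node> children = new ArrayList<>();  // empty at a leaf
        boolean isLeaf() { return children.isEmpty(); }
    }

    // What an overflowing node passes up: the promoted key and its new right sibling.
    static final class Split {
        final int upKey;
        final Node right;
        Split(int upKey, Node right) { this.upKey = upKey; this.right = right; }
    }

    Node root;  // null for an empty tree

    void insert(int key) {
        if (root == null) {
            root = new Node();
            root.keys.add(key);
            return;
        }
        Split s = insert(root, key);
        if (s != null) {  // the cascade reached the root: grow a new root
            Node newRoot = new Node();
            newRoot.keys.add(s.upKey);
            newRoot.children.add(root);
            newRoot.children.add(s.right);
            root = newRoot;
        }
    }

    // Before the recursive call: choose the child. After it: absorb any key
    // pushed up from below, and split this node if it now holds 4 keys.
    private static Split insert(Node node, int key) {
        int i = 0;
        while (i < node.keys.size() && key > node.keys.get(i)) i++;
        if (i < node.keys.size() && node.keys.get(i) == key) return null;  // duplicate
        if (node.isLeaf()) {
            node.keys.add(i, key);
        } else {
            Split s = insert(node.children.get(i), key);
            if (s == null) return null;  // nothing propagates up
            node.keys.add(i, s.upKey);
            node.children.add(i + 1, s.right);
        }
        return node.keys.size() > 3 ? split(node) : null;
    }

    // Split a node holding keys k0 k1 k2 k3: k0 k1 stay, k2 moves up, k3 goes right.
    private static Split split(Node node) {
        int up = node.keys.get(2);
        Node right = new Node();
        right.keys.add(node.keys.get(3));
        node.keys.subList(2, 4).clear();
        if (!node.isLeaf()) {  // 5 children: the left node keeps 3, the right takes 2
            right.children.addAll(node.children.subList(3, 5));
            node.children.subList(3, 5).clear();
        }
        return new Split(up, right);
    }

    // In-order list of all keys (should come out sorted).
    List<Integer> keysInOrder() {
        List<Integer> out = new ArrayList<>();
        collect(root, out);
        return out;
    }

    private static void collect(Node n, List<Integer> out) {
        if (n == null) return;
        for (int i = 0; i < n.keys.size(); i++) {
            if (!n.isLeaf()) collect(n.children.get(i), out);
            out.add(n.keys.get(i));
        }
        if (!n.isLeaf()) collect(n.children.get(n.keys.size()), out);
    }

    // Depth property: returns the common leaf depth, or -1 if leaves differ.
    static int leafDepth(Node n) {
        if (n.isLeaf()) return 0;
        int depth = leafDepth(n.children.get(0));
        for (Node c : n.children) {
            if (depth < 0 || leafDepth(c) != depth) return -1;
        }
        return depth + 1;
    }
}
```

Inserting 1 through 4 in order, for instance, overflows the single leaf to 1 2 3 4, which splits so that 3 becomes a new root over the leaves 1 2 and 4. The tree only ever grows at the root, which is why every leaf stays at the same depth.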

Page 17: Balanced Search Trees

Cascading splits

When inserting a key into a 4-node, the 4-node splits and a key moves up to the parent node.

This new key may in turn cause the parent to split, moving a key up to the grandparent, and so on up to the root.

When would this happen?

Is there a way to avoid these cascading splits?

Page 18: Balanced Search Trees

Bottom-up 2-3-4 trees

This BUinsert is called a bottom-up version of insert, since splits occur as we go back up the tree after the recursive calls.

Work occurs before and after the recursive calls.

Page 19: Balanced Search Trees

Preemptive Split

Every time we find a 4-node while traveling down a search path, we split the 4-node.

Note: Two 2-nodes have the same number of children as one 4-node.

Changes are local to the split node (no cascading).

Guaranteed to find a 2-node or 3-node at the leaf.

Splitting a root node creates a new root.
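A Java sketch of top-down insertion with preemptive splits, under similar assumptions (integer keys, list-based nodes, illustrative names). Every full node on the search path is split before the search steps into it, so the parent always has room for the promoted middle key and the leaf always has room for the new key. Nothing remains to be done on the way back up, so the descent is written as a plain loop.

```java
import java.util.ArrayList;
import java.util.List;

// A sketch of top-down 2-3-4 insertion with preemptive splits; names are illustrative.
class TopDown234 {
    static final class Node {
        final List<Integer> keys = new ArrayList<>();   // sorted, at most 3
        final List<Node> children = new ArrayList<>();  // empty at a leaf
        boolean isLeaf() { return children.isEmpty(); }
        boolean isFull() { return keys.size() == 3; }   // a 4-node
    }

    Node root = new Node();  // an empty leaf stands for the empty tree

    void insert(int key) {
        if (root.isFull()) {  // splitting the root creates a new root
            Node newRoot = new Node();
            newRoot.children.add(root);
            splitChild(newRoot, 0);
            root = newRoot;
        }
        Node node = root;     // invariant: node is never full
        while (true) {
            int i = 0;
            while (i < node.keys.size() && key > node.keys.get(i)) i++;
            if (i < node.keys.size() && node.keys.get(i) == key) return;  // already present
            if (node.isLeaf()) {  // a leaf reached here is never full
                node.keys.add(i, key);
                return;
            }
            if (node.children.get(i).isFull()) {
                splitChild(node, i);  // local change: only node and that child are touched
                if (node.keys.get(i) == key) return;
                if (key > node.keys.get(i)) i++;
            }
            node = node.children.get(i);  // descend; no work on the way back up
        }
    }

    // Split the full child at index i of a non-full parent: the middle key moves
    // into the parent, leaving two 2-nodes with the same 4 links between them.
    private static void splitChild(Node parent, int i) {
        Node full = parent.children.get(i);
        Node right = new Node();
        int middle = full.keys.get(1);
        right.keys.add(full.keys.get(2));
        full.keys.subList(1, 3).clear();
        if (!full.isLeaf()) {
            right.children.addAll(full.children.subList(2, 4));
            full.children.subList(2, 4).clear();
        }
        parent.keys.add(i, middle);
        parent.children.add(i + 1, right);
    }

    // In-order list of all keys (should come out sorted).
    List<Integer> keysInOrder() {
        List<Integer> out = new ArrayList<>();
        collect(root, out);
        return out;
    }

    private static void collect(Node n, List<Integer> out) {
        for (int i = 0; i < n.keys.size(); i++) {
            if (!n.isLeaf()) collect(n.children.get(i), out);
            out.add(n.keys.get(i));
        }
        if (!n.isLeaf()) collect(n.children.get(n.keys.size()), out);
    }

    // Depth property: returns the common leaf depth, or -1 if leaves differ.
    static int leafDepth(Node n) {
        if (n.isLeaf()) return 0;
        int depth = leafDepth(n.children.get(0));
        for (Node c : n.children) {
            if (depth < 0 || leafDepth(c) != depth) return -1;
        }
        return depth + 1;
    }
}
```

Note that this sketch splits any 4-node it meets on the way down, even when the insert would not have overflowed it. That is the price of avoiding cascading splits.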

Page 20: Balanced Search Trees

2-3-4 Tree Height

What is the height of the tree?

At most log2 N + 1

Why?

The depth is greatest when every node is a 2-node. Since every leaf has the same depth, the tree is then a complete binary tree, with depth at most log2 N + 1.
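The same bound can be derived by counting keys. Count depth in levels, so a single node has depth 1. Every node holds at least one key and every leaf sits at the same depth d, so level i (counting from 0 at the root) contains at least 2^i keys:

```latex
N \;\ge\; \sum_{i=0}^{d-1} 2^{i} \;=\; 2^{d}-1
\quad\Longrightarrow\quad
d \;\le\; \log_2 (N+1) \;\le\; \log_2 N + 1 \qquad (N \ge 1)
```

The last step uses N + 1 ≤ 2N, which holds for every N ≥ 1.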

Page 21: Balanced Search Trees

Number of splits

How many splits does an insertion require?

At most log2 N + 1 splits.

Seems to require fewer than one split on average when the tree is built from a random permutation. Trees tend to have few 4-nodes.

Page 22: Balanced Search Trees

Top-down 2-3-4 trees

The second method is called top-down, as splits occur on the way down the tree.

All the work occurs before the recursive calls, and no work occurs after them. This is called tail recursion, which is much more efficient because it can be turned into a simple loop.

Can AVL trees be made tail recursive?

Page 23: Balanced Search Trees

2-3-4 trees

Advantages: Guaranteed O(log N) time for search and insert.

Issues: Awkward to maintain three types of nodes.

Need to modify the standard search on binary trees.

Splits need to move links between nodes.

Code has many cases to handle.

Page 24: Balanced Search Trees

Red Black Trees

Page 25: Balanced Search Trees

Red-Black trees

A red-black tree is a binary tree representation of a 2-3-4 tree using red and black nodes.

[Figure: a 2-3-4 tree and its equivalent red-black tree]

Page 26: Balanced Search Trees

Red-black tree properties

A Red-Black tree is a binary search tree where

Every node is colored either red or black. Note: every 2-3-4 node corresponds to one black node.

The root node is black.

Red nodes always have black parents (and black children).

Every path from the root to a leaf has the same number of black nodes.
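These properties can be checked mechanically. The Java sketch below (illustrative names, not course code) treats `null` links as the leaves and also enforces search-tree order. It returns the common number of black nodes on every path below a node, or -1 as soon as a property fails. A black node together with its red children corresponds to one 2-3-4 node, so this black count is the depth of the corresponding 2-3-4 tree.

```java
// A sketch that checks the red-black properties; names are illustrative.
class RedBlackCheck {
    static final boolean RED = true, BLACK = false;

    static final class Node {
        final int key;
        final boolean color;
        final Node left, right;
        Node(int key, boolean color, Node left, Node right) {
            this.key = key;
            this.color = color;
            this.left = left;
            this.right = right;
        }
    }

    static Node red(int key, Node left, Node right) { return new Node(key, RED, left, right); }
    static Node black(int key, Node left, Node right) { return new Node(key, BLACK, left, right); }

    static boolean isValid(Node root) {
        if (root != null && root.color == RED) return false;  // the root is black
        return blackHeight(root, Long.MIN_VALUE, Long.MAX_VALUE) >= 0;
    }

    private static boolean isRed(Node n) { return n != null && n.color == RED; }

    // Number of black nodes on every path from n down to a null link, or -1 if
    // the subtree breaks search order, has a red node with a red child, or has
    // paths with different black counts.
    private static int blackHeight(Node n, long lo, long hi) {
        if (n == null) return 0;
        if (n.key <= lo || n.key >= hi) return -1;                            // BST order
        if (n.color == RED && (isRed(n.left) || isRed(n.right))) return -1;  // red has black children
        int bl = blackHeight(n.left, lo, n.key);
        int br = blackHeight(n.right, n.key, hi);
        if (bl < 0 || br < 0 || bl != br) return -1;                          // equal black counts
        return bl + (n.color == BLACK ? 1 : 0);
    }
}
```

For example, a black 6 whose children are black 5 (with red left child 3) and black 9 (with red left child 7) passes, with two black nodes on every path. Recoloring 5 red would create a red node with a red child, and the check would reject it.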

Page 27: Balanced Search Trees

Red-black tree height

What is the height of a red-black tree?

It is at most 2 log N + 2 since it can be at most twice as high as its corresponding 2-3-4 tree, which has height at most log N + 1.

[Figure: example red-black tree with keys 3, 5, 6, 7 and 9]

Page 28: Balanced Search Trees

Red-black Tree Search

Search is the same as for binary search trees. Color is irrelevant.

Search guaranteed to take O(log N) time.

Search typically occurs more frequently than insert.

Page 29: Balanced Search Trees

Red-black Tree Insert

Simple 4-node test (2 red children?)

Few splits as most 4-nodes tend to be near the leaves.

Some 4-node splits require only changing the color of three nodes.

Rotations needed only when a 4-node has a 3-node parent.
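The 4-node test and the recolor-only split can be sketched in Java as follows (illustrative names, not the course's code). Recoloring makes the 4-node's middle key red, which in 2-3-4 terms moves it up into its parent's node, while every path through the node still passes exactly one black node among the three. The sketch covers only the recoloring. If the node's parent is also red afterwards, the rotations mentioned above are still needed, and a root that turns red is simply recolored black.

```java
// A sketch of the red-black 4-node test and recolor-only split; names are illustrative.
class ColorFlip {
    static final boolean RED = true, BLACK = false;

    static final class Node {
        final int key;
        boolean color;
        Node left, right;
        Node(int key, boolean color, Node left, Node right) {
            this.key = key;
            this.color = color;
            this.left = left;
            this.right = right;
        }
    }

    static boolean isRed(Node n) { return n != null && n.color == RED; }

    // Simple 4-node test: a black node with two red children is a 2-3-4 4-node.
    static boolean isFourNode(Node n) {
        return n != null && !isRed(n) && isRed(n.left) && isRed(n.right);
    }

    // Split the 4-node by changing the color of three nodes: the middle key (n)
    // turns red and joins its parent's 2-3-4 node; the outer keys become 2-nodes.
    // Not shown: recoloring a red root black, and rotating if n's parent is red.
    static void splitFourNode(Node n) {
        n.color = RED;
        n.left.color = BLACK;
        n.right.color = BLACK;
    }
}
```

Before the flip, a path through black 5 and its red child 3 counts one black node; afterwards it passes red 5 and black 3, still one black node, so the equal-black-count property is preserved.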

Page 30: Balanced Search Trees

Red-black Tree Summary

Advantages:

Guaranteed O(log N) time for search and insert.

Little overhead for balancing.

Trees are nearly optimal.

Top-down implementation can be made tail-recursive, so very efficient.

Page 31: Balanced Search Trees

B-Trees

Page 32: Balanced Search Trees

B-trees

A generalization of 2-3-4 trees.

Used for very large dictionaries where the data are maintained on disks.

Since disk lookups are very SLOW, want to read as few disk pages as possible.

Want really shallow depth trees!

Page 33: Balanced Search Trees

B-trees Key Idea

Make the nodes in the trees have a huge number of links, k-way.

Typically choose k so that a node fills a disk page.

As with 2-3-4 trees, not all the nodes have k links. Some may have as few as k/2 links.

When a node overflows, split the node.

Page 34: Balanced Search Trees

B-trees

Takes O(log_{k/2} N) probes for search and insert.

Typically about 2-3 probes (disk accesses)

E.g., for N < 125 million and k = 1000, the height of the tree is less than 3, since every node has at least k/2 = 500 links and 500^3 = 125 million.

As all searches go through the root node, usually keep the root node in memory.

Many variants

Common in many large database systems.
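The depth arithmetic above can be checked quickly. The Java sketch below (illustrative names) computes log_{k/2} N directly and, as an integer cross-check, the number of levels of (k/2)-way fan-out needed to reach N keys. It assumes k ≥ 4 and an N small enough that the running product does not overflow a `long`.

```java
// A sketch of the B-tree depth arithmetic; names are illustrative.
class BTreeDepth {
    // log_{k/2} N: with at least k/2 links per node, each level multiplies
    // the number of reachable keys by at least k/2.
    static double depthBound(long n, int k) {
        return Math.log(n) / Math.log(k / 2.0);
    }

    // Integer cross-check: fewest levels L with (k/2)^L >= n.
    // Assumes k >= 4 (so the fan-out is at least 2) and no overflow.
    static int levelsNeeded(long n, int k) {
        long reach = 1;
        int levels = 0;
        while (reach < n) {
            reach *= k / 2;
            levels++;
        }
        return levels;
    }
}
```

For N = 124,999,999 and k = 1000, `depthBound` is just under 3 while `levelsNeeded` returns 3, consistent with "about 2-3 probes". Keeping the root in memory saves one of those disk accesses.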

Page 35: Balanced Search Trees

Conclusion

AVL trees have the disadvantage that insert is not tail recursive.

2-3-4 trees are not practical, but are a good way to think about other approaches.

Red-black trees are very efficient and have guaranteed O(log N) insert and search.

B-trees have very shallow depth to minimize the number of disk reads needed for huge databases.