csc212 dtst tdata structure - city university of new...

33
CSC212 Dt St t CSC212 Dt St t Data Structure Data Structure Lecture 15 Lecture 15 B-Trees and the Set Class Trees and the Set Class Instructor: George Instructor: George Wolberg Wolberg Department of Computer Science Department of Computer Science City College of New York City College of New York @ @ George Wolberg, 2016 George Wolberg, 2016 1

Upload: vanmien

Post on 05-May-2018

221 views

Category:

Documents


2 download

TRANSCRIPT

CSC212 D t St tCSC212 D t St tData Structure Data Structure

Lecture 15Lecture 15BB--Trees and the Set ClassTrees and the Set Class

Instructor: George Instructor: George WolbergWolbergDepartment of Computer Science Department of Computer Science

City College of New YorkCity College of New York

@ @ George Wolberg, 2016George Wolberg, 2016 11

TopicsTopicsTopicsTopics

Why BWhy B--TreeTreeThe problem of an unbalanced treeThe problem of an unbalanced treeThe problem of an unbalanced treeThe problem of an unbalanced tree

The BThe B--Tree RulesTree RulesThe Set Class ADT ith BThe Set Class ADT ith B TreesTreesThe Set Class ADT with BThe Set Class ADT with B--TreesTrees Search for an Item in a BSearch for an Item in a B--TreeTree Insert an Item in a BInsert an Item in a B--Tree (*)Tree (*)Remove a Item from a BRemove a Item from a B--Tree (*)Tree (*)

@ @ George Wolberg, 2016George Wolberg, 2016 22

( )( )

The problem of an unbalanced BSTThe problem of an unbalanced BSTThe problem of an unbalanced BSTThe problem of an unbalanced BST

Maximum depth of a BST with n entires: nMaximum depth of a BST with n entires: n--11

1

2An Example: An Example:

Insert 1, 2, 3,4,5 in Insert 1, 2, 3,4,5 in 3

4that order into a bag that order into a bag using a BSTusing a BST

4

5

@ @ George Wolberg, 2016George Wolberg, 2016 33

Run BagTest!Run BagTest!

Worst-Case Times for BSTsWorst-Case Times for BSTsWorst-Case Times for BSTsWorst-Case Times for BSTs

Adding, deleting or searching for an entry in Adding, deleting or searching for an entry in a BST with n entries is O(d) in the worst a BST with n entries is O(d) in the worst ( )( )case, where d is the depth of the BSTcase, where d is the depth of the BST

Since d is no more than nSince d is no more than n--1 the operations1 the operations Since d is no more than nSince d is no more than n 1, the operations 1, the operations in the worst case is (nin the worst case is (n--1).1).

Conclusion: the worst case time for the addConclusion: the worst case time for the addConclusion: the worst case time for the add, Conclusion: the worst case time for the add, delete or search operation of a BST is O(n)delete or search operation of a BST is O(n)

@ @ George Wolberg, 2016George Wolberg, 2016 44

Solutions to the problemSolutions to the problemSolutions to the problemSolutions to the problem

Solution 1Solution 1 Periodically balance the search treePeriodically balance the search tree Periodically balance the search treePeriodically balance the search tree Project 10.9, page 516Project 10.9, page 516

Solution 2Solution 2 Solution 2Solution 2A particular kind of tree : BA particular kind of tree : B--TreeTree d b B & M C i ht i 1972d b B & M C i ht i 1972proposed by Bayer & McCreight in 1972proposed by Bayer & McCreight in 1972

@ @ George Wolberg, 2016George Wolberg, 2016 55

The B-Tree BasicsThe B-Tree BasicsThe B-Tree BasicsThe B-Tree Basics

Similar to a binary search tree (BST)Similar to a binary search tree (BST) where the implementation requires the ability to where the implementation requires the ability to

compare two entries via acompare two entries via a lessless than operator (<)than operator (<)compare two entries via a compare two entries via a lessless--than operator (<)than operator (<) But a BBut a B--tree is NOT a BST tree is NOT a BST –– in fact it is not even in fact it is not even

a binary tree a binary tree yy BB--tree nodes have many (more than two) childrentree nodes have many (more than two) children

Another important propertyAnother important property each node contains more than just a single entryeach node contains more than just a single entry

Advantages:Advantages:E h d dE h d d

@ @ George Wolberg, 2016George Wolberg, 2016 66

Easy to search, and not too deepEasy to search, and not too deep

Applications: bag and setApplications: bag and setApplications: bag and setApplications: bag and set

The DifferenceThe Difference two or more equal entries can occur many timestwo or more equal entries can occur many times two or more equal entries can occur many times two or more equal entries can occur many times

in a in a bag,bag, but not in a but not in a setsetC++ STL: set and multiset (= bag)C++ STL: set and multiset (= bag)( g)( g)

The BThe B--Tree Rules for a Set Tree Rules for a Set We will look at a “We will look at a “setset formulation” of the Bformulation” of the B--We will look at a We will look at a set set formulation of the Bformulation of the B--

Tree rules, but keep in mind that a “Tree rules, but keep in mind that a “bagbagformulation” is also possible formulation” is also possible

@ @ George Wolberg, 2016George Wolberg, 2016 77

pp

The B-Tree RulesThe B-Tree RulesThe B-Tree RulesThe B-Tree Rules

The entries in a BThe entries in a B--tree nodetree nodeBB--tree Rule 1tree Rule 1: The root may have as few as : The root may have as few as

one entry (or 0 entry if no children); every other one entry (or 0 entry if no children); every other node has at least MINIMUM entriesnode has at least MINIMUM entries

BB tree Rule 2tree Rule 2 Th i b fTh i b fBB--tree Rule 2tree Rule 2: The maximum number of : The maximum number of entries in a node is 2* MINIMUM.entries in a node is 2* MINIMUM.

BB--tree Rule 3tree Rule 3: The entries of each B: The entries of each B--tree nodetree nodeBB tree Rule 3tree Rule 3: The entries of each B: The entries of each B tree node tree node are stored in a partially filled array, sorted from are stored in a partially filled array, sorted from the smallest to the largest.the smallest to the largest.

@ @ George Wolberg, 2016George Wolberg, 2016 88

The B-Tree Rules (cont )The B-Tree Rules (cont )The B-Tree Rules (cont.)The B-Tree Rules (cont.)

The The subtreessubtrees below a Bbelow a B--tree nodetree nodeBB--tree Rule 4tree Rule 4: The number of the: The number of the subtreessubtrees belowbelowBB tree Rule 4tree Rule 4: The number of the : The number of the subtreessubtrees below below

a nona non--leaf node with n entries is always n+1leaf node with n entries is always n+1BB--tree Rule 5tree Rule 5: For any non: For any non--leaf node:leaf node:yy

(a). An entry at index (a). An entry at index ii is greater than all the entries in is greater than all the entries in subtreesubtree number number ii of the nodeof the node

(b) An entry at index (b) An entry at index ii is less than all the entries in is less than all the entries in subtreesubtree number i+1 of the nodenumber i+1 of the node

@ @ George Wolberg, 2016George Wolberg, 2016 99

An Example of B-TreeAn Example of B-TreeAn Example of B-TreeAn Example of B-Tree

[0] [1][0] [1]93 and 10793 and 107[0] [1][0] [1]

subtree subtree number 0number 0 subtree subtree

number 1number 1

subtree subtree number 2number 2

each entry each entry < 93< 93 each entry each entry

(93,107)(93,107)each entry each entry > 107> 107

@ @ George Wolberg, 2016George Wolberg, 2016 1010What kind traversal can print a sorted list?What kind traversal can print a sorted list?

The B-Tree Rules (cont )The B-Tree Rules (cont )The B-Tree Rules (cont.)The B-Tree Rules (cont.)

A BA B--tree is balancedtree is balancedBB--tree Rule 6tree Rule 6: Every leaf in a B: Every leaf in a B--tree has thetree has theBB tree Rule 6tree Rule 6: Every leaf in a B: Every leaf in a B tree has the tree has the

same depthsame depth

This rule ensures that a BThis rule ensures that a B--tree is balancedtree is balanced

@ @ George Wolberg, 2016George Wolberg, 2016 1111

Another Example MINIMUM = 1Another Example MINIMUM = 1Another Example, MINIMUM = 1Another Example, MINIMUM = 1

66

2 and 42 and 4 992 and 42 and 4 99

7 and 87 and 8 1010553311

@ @ George Wolberg, 2016George Wolberg, 2016 1212Can you verify that all 6 rules are satisfied?Can you verify that all 6 rules are satisfied?

The set ADT with a B-TreeThe set ADT with a B-TreeThe set ADT with a B-TreeThe set ADT with a B-Treeset.hset.h (p 528(p 528--529)529) template <class Item>template <class Item>

class setclass set{{

Combine fixed size Combine fixed size array with linked array with linked nodesnodes

{{public:public:

... ...... ...bool insert(const Item& entry);bool insert(const Item& entry);

data[]data[] *subset[]*subset[]

number of entries number of entries varyvary

bool insert(const Item& entry);bool insert(const Item& entry);std::size_t erase(const Item& target);std::size_t erase(const Item& target);std::size_t count(const Item& target) const;std::size_t count(const Item& target) const;

private:private:varyvary data_countdata_count up to 200!up to 200!

number of childrennumber of children

pp// MEMBER CONSTANTS// MEMBER CONSTANTSstatic const std::size_t MINIMUM = 200;static const std::size_t MINIMUM = 200;static const std::size_t MAXIMUM = 2 * MINIMUM;static const std::size_t MAXIMUM = 2 * MINIMUM;

number of children number of children varyvary child_countchild_count = data_count+1?= data_count+1?

// MEMBER VARIABLES// MEMBER VARIABLESstd::size_t data_count;std::size_t data_count;Item data[MAXIMUM+1]; // why +1? Item data[MAXIMUM+1]; // why +1? --for insert/erasefor insert/erasestd::size t child count;std::size t child count;

@ @ George Wolberg, 2016George Wolberg, 2016 1313

std::size_t child_count;std::size_t child_count;set *subset[MAXIMUM+2]; // why +2? set *subset[MAXIMUM+2]; // why +2? -- one moreone more

};};

Invariant for the set ClassInvariant for the set ClassInvariant for the set ClassInvariant for the set Class

The entries of a set is stored in a BThe entries of a set is stored in a B--tree, satisfying tree, satisfying the six Bthe six B--tree rules.tree rules.

The number of entries in a node is stored in The number of entries in a node is stored in data_count, and the entries are stored in data[0] data_count, and the entries are stored in data[0] h h d dh h d dthrough data[data_countthrough data[data_count--1]1]

The number of subtrees of a node is stored in The number of subtrees of a node is stored in hild t d th bt i t d b thild t d th bt i t d b tchild_count, and the subtrees are pointed by set child_count, and the subtrees are pointed by set

pointers subset[0] through subset[child_countpointers subset[0] through subset[child_count--1]1]

@ @ George Wolberg, 2016George Wolberg, 2016 1414

Search for a Item in a B-TreeSearch for a Item in a B-TreeSearch for a Item in a B-TreeSearch for a Item in a B-Tree

Prototype:Prototype: std::size t count(const Item& target) const;std::size t count(const Item& target) const; std::size_t count(const Item& target) const;std::size_t count(const Item& target) const;

PostPost condition:condition: PostPost--condition: condition: Returns the number of items equal to the targetReturns the number of items equal to the target

( i h f )( i h f ) (either 0 or 1 for a set).(either 0 or 1 for a set).

@ @ George Wolberg, 2016George Wolberg, 2016 1515

Searching for an Item: countSearching for an Item: countSearching for an Item: countSearching for an Item: countsearch for 10: cout << count (10);search for 10: cout << count (10);

Start at the root.1) locate i so that !(data[i]<target)2) If (d [i] i )

6 and 176 and 17

2) If (data[i] is target) return 1;

else if (no children)19 and 2219 and 2244 1212

else if (no children) return 0;

else 2 and 32 and 3 2525161655 1010 20201818

return subset[i]->count (target);88

@ @ George Wolberg, 2016George Wolberg, 2016 1616

Searching for an Item: countSearching for an Item: countSearching for an Item: countSearching for an Item: countsearch for 10: cout << count (10);search for 10: cout << count (10);

i = 1i = 1Start at the root.1) locate i so that !(data[i]<target)2) If (d [i] i )

6 and 176 and 17

i = 1i = 1

2) If (data[i] is target) return 1;

else if (no children)19 and 2219 and 2244 1212

else if (no children) return 0;

else 2 and 32 and 3 2525161655 1010 20201818

return subset[i]->count (target);88

@ @ George Wolberg, 2016George Wolberg, 2016 1717

Searching for an Item: countSearching for an Item: countSearching for an Item: countSearching for an Item: countsearch for 10: cout << count (10);search for 10: cout << count (10);

i 1i 1Start at the root.1) locate i so that !(data[i]<target)2) If (d [i] i )

6 and 176 and 17i = 1i = 1

2) If (data[i] is target) return 1;

else if (no children)19 and 2219 and 2244 1212

else if (no children) return 0;

else 2 and 32 and 3 2525161655 1010 20201818

return subset[i]->count (target);88

subset[1]subset[1]

@ @ George Wolberg, 2016George Wolberg, 2016 1818

[ ][ ]

Searching for an Item: countSearching for an Item: countSearching for an Item: countSearching for an Item: countsearch for 10: cout << count (10);search for 10: cout << count (10);

Start at the root.1) locate i so that !(data[i]<target)2) If (d [i] i )

6 and 176 and 17

2) If (data[i] is target) return 1;

else if (no children)19 and 2219 and 2244 1212

i = 0i = 0

else if (no children) return 0;

else 2 and 32 and 3 2525161655 1010 20201818

return subset[i]->count (target);88

@ @ George Wolberg, 2016George Wolberg, 2016 1919

Searching for an Item: countSearching for an Item: countSearching for an Item: countSearching for an Item: countsearch for 10: cout << count (10);search for 10: cout << count (10);

Start at the root.1) locate i so that !(data[i]<target)2) If (d [i] i )

6 and 176 and 17

2) If (data[i] is target) return 1;

else if (no children)19 and 2219 and 2244 1212

i = 0i = 0

else if (no children) return 0;

else 2 and 32 and 3 2525161655 1010 20201818

return subset[i]->count (target);88

subset[0]subset[0]

@ @ George Wolberg, 2016George Wolberg, 2016 2020

[ ][ ]

Searching for an Item: countSearching for an Item: countSearching for an Item: countSearching for an Item: countsearch for 10: cout << count (10);search for 10: cout << count (10);

Start at the root.1) locate i so that !(data[i]<target)2) If (d [i] i )

6 and 176 and 17

2) If (data[i] is target) return 1;

else if (no children)19 and 2219 and 2244 1212

else if (no children) return 0;

else 2 and 32 and 3 2525161655 1010 20201818

return subset[i]->count (target);88

i = 0i = 0data[i] is target !data[i] is target !

@ @ George Wolberg, 2016George Wolberg, 2016 2121

[ ] g[ ] g

Insert a Item into a B-TreeInsert a Item into a B-TreeInsert a Item into a B-TreeInsert a Item into a B-Tree

Prototype:Prototype:bool insert(const Item& entry);bool insert(const Item& entry);

PostPost--condition: condition: osos co d o :co d o : If an equal entry was already in the set, the set If an equal entry was already in the set, the set

is unchanged and the return value is false. is unchanged and the return value is false. Otherwise, entry was added to the set and the Otherwise, entry was added to the set and the

return value is true. return value is true.

@ @ George Wolberg, 2016George Wolberg, 2016 2222

Insert an Item in a B-TreeInsert an Item in a B-TreeInsert an Item in a B-TreeInsert an Item in a B-Treeinsert (11);insert (11);

i = 1i = 1Start at the root.1) locate i so that !(data[i]<entry)2) If (data[i] is entry)

6 and 176 and 17

i = 1i = 1

2) If (data[i] is entry) return false; // no work!

else if (no children) 19 and 2219 and 2244 1212i = 0i = 0

insert entry at i;return true;

else 2 and 32 and 3 2525161655 1010 20201818else return subset[i]->insert (entry);

88

i = 0i = 0data[i] is target !data[i] is target !

@ @ George Wolberg, 2016George Wolberg, 2016 2323

[ ] g[ ] g

Insert an Item in a B-TreeInsert an Item in a B-TreeInsert an Item in a B-TreeInsert an Item in a B-Treeinsert (11); // MIN = 1 insert (11); // MIN = 1 --> MAX = 2> MAX = 2

i = 1i = 1Start at the root.1) locate i so that !(data[i]<entry)2) If (data[i] is entry)

6 and 176 and 17

i = 1i = 1

2) If (data[i] is entry) return false; // no work!

else if (no children) 19 and 2219 and 2244 1212i = 0i = 0

insert entry at i;return true;

else 2 and 32 and 3 2525161655 1010 20201818else return subset[i]->insert (entry);

88

i = 1i = 1data[0] < entry !data[0] < entry !

@ @ George Wolberg, 2016George Wolberg, 2016 2424

[ ] y[ ] y

Insert an Item in a B-TreeInsert an Item in a B-TreeInsert an Item in a B-TreeInsert an Item in a B-Treeinsert (11); // MIN = 1 insert (11); // MIN = 1 --> MAX = 2> MAX = 2

i = 1i = 1Start at the root.1) locate i so that !(data[i]<entry)2) If (data[i] is entry)

6 and 176 and 17

i = 1i = 1

2) If (data[i] is entry) return false; // no work!

else if (no children) 19 and 2219 and 2244 1212i = 0i = 0

insert entry at i;return true;

else 2 and 32 and 3 2525161655 10 & 1110 & 11 20201818else return subset[i]->insert (entry);

1616 88

i = 1i = 1put entry in data[1]put entry in data[1]

@ @ George Wolberg, 2016George Wolberg, 2016 2525

p y [ ]p y [ ]

Insert an Item in a B-TreeInsert an Item in a B-TreeInsert an Item in a B-TreeInsert an Item in a B-Treeinsert (1); // MIN = 1 insert (1); // MIN = 1 --> MAX = 2> MAX = 2

i = 0i = 0Start at the root.1) locate i so that !(data[i]<entry)2) If (data[i] is entry)

6 and 176 and 17

i = 0i = 0

i = 0i = 0

2) If (data[i] is entry) return false; // no work!

else if (no children) 19 and 2219 and 2244 1212

i = 0i = 0

insert entry at i;return true;

else 2 and 32 and 3 2525161655 10 & 1110 & 11 20201818else return subset[i]->insert (entry);

1616 88ii = 0= 0 => put entry in data[0]=> put entry in data[0]

@ @ George Wolberg, 2016George Wolberg, 2016 2626

Insert an Item in a B-TreeInsert an Item in a B-TreeInsert an Item in a B-TreeInsert an Item in a B-Treeinsert (1); // MIN = 1 insert (1); // MIN = 1 --> MAX = 2> MAX = 2

i = 0i = 0Start at the root.1) locate i so that !(data[i]<entry)2) If (data[i] is entry)

6 and 176 and 17

i = 0i = 0

i = 0i = 0

2) If (data[i] is entry) return false; // no work!

else if (no children) 19 and 2219 and 2244 1212

i = 0i = 0

insert entry at i;return true;

else 1, 2 and 31, 2 and 3 2525161655 10 & 1110 & 11 20201818else return subset[i]->insert (entry);

a node has MAX+1 = 3 entries!a node has MAX+1 = 3 entries!

,, 1616 88ii = 0= 0

@ @ George Wolberg, 2016George Wolberg, 2016 2727

Insert an Item in a B-TreeInsert an Item in a B-TreeInsert an Item in a B-TreeInsert an Item in a B-Treeinsert (1); // MIN = 1 insert (1); // MIN = 1 --> MAX = 2> MAX = 2

Fix the node with MAX+1 entries 6 and 176 and 17

split the node into two from the middle

19 and 2219 and 2244 1212

move the middle entry up

1, 2 and 31, 2 and 3 2525161655 10 & 1110 & 11 20201818

a node has MAX+1 = 3 entries!a node has MAX+1 = 3 entries!

,, 1616 88

@ @ George Wolberg, 2016George Wolberg, 2016 2828

Insert an Item in a B-TreeInsert an Item in a B-TreeInsert an Item in a B-TreeInsert an Item in a B-Treeinsert (1); // MIN = 1 insert (1); // MIN = 1 --> MAX = 2> MAX = 2

Fix the node with MAX+1 entries 6 and 176 and 17

split the node into two from the middle

19 and 2219 and 222 and 42 and 4 1212

move the middle entry up

33 2525161655 10 & 1110 & 11 2020181811

Note: This shall be done recursively... the recursive functionNote: This shall be done recursively... the recursive function

1616 88

@ @ George Wolberg, 2016George Wolberg, 2016 2929

Note: This shall be done recursively... the recursive function Note: This shall be done recursively... the recursive function returns the middle entry to the root of the subset.returns the middle entry to the root of the subset.

Inserting an Item into a B-TreeInserting an Item into a B-TreeInserting an Item into a B-TreeInserting an Item into a B-Tree

What if the node already What if the node already has has MAXIMUM MAXIMUM number of items?number of items?

Solution Solution –– loose insertionloose insertion (p 551 (p 551 –– 557)557)A loose insert may A loose insert may result result in MAX +1 entries in the in MAX +1 entries in the

root of a subsetroot of a subsetTwo steps to fix the problem:Two steps to fix the problem:

fi ifi i b h bl h f hb h bl h f h fix it fix it –– but the problem may move to the root of the setbut the problem may move to the root of the set fix the root of the setfix the root of the set

@ @ George Wolberg, 2016George Wolberg, 2016 3030

Erasing an Item from a B-TreeErasing an Item from a B-TreeErasing an Item from a B-TreeErasing an Item from a B-Tree

Prototype:Prototype: std::std::size tsize t erase(const Item& target);erase(const Item& target); std::std::size_tsize_t erase(const Item& target);erase(const Item& target);

PostPost--Condition:Condition: If target was in the set then it has beenIf target was in the set then it has been If target was in the set, then it has been If target was in the set, then it has been

removed from the set and the return value is 1.removed from the set and the return value is 1. Otherwise the set is unchanged and theOtherwise the set is unchanged and the returnreturn Otherwise the set is unchanged and the Otherwise the set is unchanged and the return return

value is zero.value is zero.

@ @ George Wolberg, 2016George Wolberg, 2016 3131

Erasing an Item from a B-TreeErasing an Item from a B-TreeErasing an Item from a B-TreeErasing an Item from a B-Tree

Similarly, after “loose erase”, the root of a Similarly, after “loose erase”, the root of a subset may just have MINIMUM subset may just have MINIMUM ––1 entries1 entriesy jy j

Solution: (p557 Solution: (p557 –– 562)562)Fix theFix the shortageshortage of the subset rootof the subset root but this maybut this mayFix the Fix the shortageshortage of the subset root of the subset root –– but this may but this may

move the problem to the root of the entire setmove the problem to the root of the entire setFix theFix the rootroot of the entire set (tree)of the entire set (tree)Fix the Fix the rootroot of the entire set (tree)of the entire set (tree)

@ @ George Wolberg, 2016George Wolberg, 2016 3232

SummarySummarySummarySummary

A BA B--tree is a tree for sorting entries following the tree is a tree for sorting entries following the six rulessix rules

i b l di b l d l f il f i h hh h BB--Tree is balanced Tree is balanced -- every leaf in a Bevery leaf in a B--tree has the tree has the same depthsame depth

Adding erasing and searching an item in a BAdding erasing and searching an item in a B treetree Adding, erasing and searching an item in a BAdding, erasing and searching an item in a B--tree tree have worsthave worst--case time O(log n), where n is the case time O(log n), where n is the number of entriesnumber of entries

However the implementation of adding and However the implementation of adding and erasing an item in a Berasing an item in a B--tree is not a trivial task.tree is not a trivial task.

@ @ George Wolberg, 2016George Wolberg, 2016 3333