+ Data Structures B-tree Jibrael Jos : Sep 2009


Avoid Taking Printout : Use RTF Outline in case needed

+ Agenda

Introduction
Multiway Trees
B Tree
Application
Structure
Algo : Insert / Delete


+ B Tree

Critic

Maths

Summation

Series

Variations B*, B+

Application Industry


+ Binary Search Tree

What happens if data is loaded into a binary search tree in these orders?

23, 32, 45, 11, 43, 41

1, 2, 3, 4, 5, 6, 7, 8

What is an AVL tree?
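A quick way to answer the first question is to build a plain, unbalanced BST from each sequence and measure its height. This Python sketch (the names `insert`, `height` and `build` are illustrative, not from the slides) shows that the sorted input 1..8 produces a chain whose height equals the number of keys, which is exactly the degeneration an AVL tree prevents by rebalancing.

```python
class BSTNode:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def insert(root, key):
    """Plain, unbalanced BST insert; returns the (possibly new) root."""
    if root is None:
        return BSTNode(key)
    if key < root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    return root

def height(root):
    """Number of nodes on the longest root-to-leaf path."""
    if root is None:
        return 0
    return 1 + max(height(root.left), height(root.right))

def build(keys):
    root = None
    for k in keys:
        root = insert(root, k)
    return root

print(height(build([23, 32, 45, 11, 43, 41])))  # 5
print(height(build([1, 2, 3, 4, 5, 6, 7, 8])))  # 8: every node has only a right child
```

Eight sorted keys give height 8, so a search costs as much as scanning a linked list.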


+ Multiway Trees

[Figure: a node with keys K1 and K2 and three subtrees: keys < K1; keys >= K1 and < K2; keys >= K2]

+ m-way trees

Reduce the depth of the tree to O(log_m n) with m-way trees

m children, m-1 keys per node

m = 10: 10^6 keys in 6 levels vs 20 for a binary tree

but ........

[Figure: a 4-way tree whose root holds keys K1 K2 K3 and whose four children each hold three keys K1 K2 K3]

+ m-way trees

But you have to search through up to m-1 keys in each node!

That reduces your gain from having fewer levels!
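To put rough numbers on this trade-off, the sketch below counts levels as the smallest h with m^h >= n (the slide's log_m n approximation; a full tree of h levels actually holds m^h - 1 keys) and assumes a linear scan of up to m-1 keys per node. The function names are illustrative.

```python
def levels_needed(n: int, m: int) -> int:
    """Smallest h with m**h >= n, i.e. roughly log_m(n) levels (exact integers)."""
    h, capacity = 0, 1
    while capacity < n:
        capacity *= m
        h += 1
    return h

def worst_case_comparisons(n: int, m: int) -> int:
    """Levels times a linear scan of up to m-1 keys in each node."""
    return levels_needed(n, m) * (m - 1)

n = 10**6
print(levels_needed(n, 2), levels_needed(n, 10))                    # 20 6
print(worst_case_comparisons(n, 2), worst_case_comparisons(n, 10))  # 20 54
```

With n = 10^6 the 10-way tree needs 6 levels instead of 20, but up to 54 key comparisons instead of 20. The win for B-trees comes from each level being one disk access, not from fewer comparisons.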

+ m-way trees

[Figure: an m-way search tree whose root holds 50, 100, 150; other keys shown: 35, 45, 60, 70, 75, 85, 90, 95, 110, 120, 125, 135, 175]

+ B-trees

All leaves are on the same level

All nodes except the root and the leaves have at least ⌈m/2⌉ children and at most m children

Anand B

Each node is at least half full of keys
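These rules can be checked mechanically. Below is a hedged Python sketch, assuming a node is just a sorted key list plus a child list (an illustrative representation, not the deck's firstPtr/entries layout). It checks only the shape rules on this slide (same-level leaves, child counts, half-full nodes) and not key ordering across levels.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    keys: list
    children: list = field(default_factory=list)   # empty for a leaf

def is_btree(root: Node, m: int) -> bool:
    """Check the B-tree shape rules for order m (at most m children per node)."""
    min_children = (m + 1) // 2                    # ceil(m/2)
    leaf_depths = set()

    def ok(node: Node, depth: int, is_root: bool) -> bool:
        if node.keys != sorted(node.keys) or len(node.keys) > m - 1:
            return False
        if not node.children:                      # leaf: remember its level
            leaf_depths.add(depth)
            return is_root or len(node.keys) >= min_children - 1
        if len(node.children) != len(node.keys) + 1:
            return False
        if len(node.children) < (2 if is_root else min_children):
            return False
        return all(ok(c, depth + 1, False) for c in node.children)

    # valid only if every node passes and all leaves sit on one level
    return ok(root, 0, True) and len(leaf_depths) == 1

good = Node([21, 78, 102],
            [Node([11, 14]), Node([63, 74]), Node([85, 97]), Node([125, 135])])
print(is_btree(good, 5))  # True
```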

+ BTREE

[Figure: order-5 B-tree with root [21 102] and leaves [11 14], [74 78 85 97], [125 135]]


+ Multiway Tree

M-ary tree

3 levels:

Cylinder, Track, Record: Indexed sequential (RDBMS)

Suited to tables that change little


+ BTree

If the tree has 3 levels and m = 199, what is N?

How many splits per insertion?
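Reading N as the number of keys (an assumption; the slide does not define it), the two extremes follow from the definition: every node full (m-1 keys, m children) gives the maximum, and a root with one key plus every other node at the minimum of ⌈m/2⌉-1 keys gives the minimum. A small Python sketch:

```python
def max_keys(m: int, levels: int) -> int:
    """Every node full: m-1 keys and m children per node."""
    nodes = sum(m**i for i in range(levels))   # 1 + m + m^2 + ...
    return nodes * (m - 1)                      # equals m**levels - 1

def min_keys(m: int, levels: int) -> int:
    """Root has 1 key (2 children); other nodes have ceil(m/2) children."""
    half = (m + 1) // 2                         # ceil(m/2)
    return 2 * half ** (levels - 1) - 1

print(max_keys(199, 3))  # 7880598
print(min_keys(199, 3))  # 19999
```

On the second question: one insertion splits at most one node per level, so at most 3 splits here (the last being a root split that adds a level), and usually none.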


+ Multiway Trees : Application

NDPL, Delhi: electricity billing for 3 lakh consumers, with the table indexed as a B-tree

UCO Bank, Jaipur: one DD took 10 minutes to print; the saviour was a B-tree

+ B-trees - Insertion

B-tree property: a block is at least half full of keys

Inserting into a block that is already full (m-1 keys) leaves it with m keys, so the block overflows:

split the block

promote one key (the median) to the parent

split the parent if necessary

if the root is split, the tree becomes one level deeper
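A minimal sketch of this insert in Python, assuming order 5 (at most 4 keys per node) and a node represented as a sorted key list plus a child list rather than the slides' firstPtr/entries layout; the names are illustrative. The recursion hands the promoted median back to the parent, and a root split creates a new root, which is the only way the tree grows taller.

```python
from bisect import bisect_left
from dataclasses import dataclass, field

ORDER = 5   # max children per node, so at most ORDER - 1 = 4 keys

@dataclass
class Node:
    keys: list
    children: list = field(default_factory=list)   # empty for a leaf

def _insert(node: Node, key):
    """Insert key below node; return (median, new_right) if node split."""
    i = bisect_left(node.keys, key)
    if not node.children:
        node.keys.insert(i, key)           # leaf: place the key directly
    else:
        split = _insert(node.children[i], key)
        if split is None:
            return None
        median, right = split              # child split: take its median
        node.keys.insert(i, median)
        node.children.insert(i + 1, right)
    if len(node.keys) < ORDER:             # still fits: no overflow
        return None
    mid = len(node.keys) // 2              # overflow: split at the median
    median = node.keys[mid]
    right = Node(node.keys[mid + 1:], node.children[mid + 1:])
    node.keys, node.children = node.keys[:mid], node.children[:mid + 1]
    return median, right

def insert(root: Node, key) -> Node:
    """Insert key; if the root splits, the tree becomes one level deeper."""
    split = _insert(root, key)
    if split is None:
        return root
    median, right = split
    return Node([median], [root, right])

def example_tree() -> Node:
    return Node([21, 102], [Node([11, 14]), Node([74, 78, 85, 97]), Node([125, 135])])

root = insert(example_tree(), 63)
print(root.keys, [c.keys for c in root.children])
# [21, 78, 102] [[11, 14], [63, 74], [85, 97], [125, 135]]
```

Inserting 99 into the same starting tree promotes 85 instead, giving root [21 85 102].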

+ Insert Node

[Figure: inserting 63 into the order-5 B-tree with root [21 102] and leaves [11 14], [74 78 85 97], [125 135]]

+ After Insert 63

[Figure: root [21 78 102]; leaves [11 14], [63 74], [85 97], [125 135]]

+ Insert Node

[Figure: inserting 99 into the order-5 B-tree with root [21 102] and leaves [11 14], [74 78 85 97], [125 135]]

+ After Insert 99

[Figure: root [21 85 102]; leaves [11 14], [74 78], [97 99], [125 135]]

+ Split Node

[Figure: the full node [74 78 85 97] (numEntries = 4) and the new entry 63, which goes in at index 0]


+ Structure of Btree

node
   firstPtr
   numEntries
   Entries[1 .. M-1]
end node

entry
   key
   rightPtr
end entry
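A direct Python transcription of this layout, as a hedged sketch: `Entry`, `Node`, `search_node` and `search` are illustrative names, entries are 0-based rather than 1..M-1, and `search_node` is assumed to return the index of the last entry whose key is <= the target (-1 meaning "go down firstPtr"), which matches how the delete pseudocode picks firstPtr or rightPtr.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Entry:
    key: int
    right_ptr: Optional["Node"] = None     # subtree holding keys >= key

@dataclass
class Node:
    first_ptr: Optional["Node"] = None     # subtree holding keys < first key
    entries: list = field(default_factory=list)   # Entry objects, sorted by key

    @property
    def num_entries(self) -> int:
        return len(self.entries)

def search_node(node: Node, target: int) -> int:
    """Index of the last entry with key <= target, or -1 if target is smaller."""
    ndx = -1
    for i, entry in enumerate(node.entries):
        if entry.key > target:
            break
        ndx = i
    return ndx

def search(root: Optional[Node], target: int) -> bool:
    node = root
    while node is not None:
        ndx = search_node(node, target)
        if ndx >= 0 and node.entries[ndx].key == target:
            return True
        # below the first key: follow firstPtr, otherwise the entry's rightPtr
        node = node.first_ptr if ndx < 0 else node.entries[ndx].right_ptr
    return False

def leaf(*keys: int) -> Node:
    return Node(entries=[Entry(k) for k in keys])

root = Node(leaf(11, 14), [Entry(21, leaf(74, 78, 85, 97)),
                           Entry(102, leaf(125, 135))])
print(root.num_entries, search(root, 85), search(root, 90))  # 2 True False
```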

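+ Split Node : Final

[Figure: final split after inserting 63: the node keeps [63 74], the median entry 78 is promoted, and a new right node holds [85 97]; labels: node, rightPtr, median, entry, fromNdx, toNdx]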

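+ Split Node : Final

[Figure: final split after inserting 99: the node keeps [74 78], the median entry 85 is promoted, and a new right node holds [97 99]; labels: node, rightPtr, median, entry, fromNdx, toNdx]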
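In the firstPtr/rightPtr layout the subtle step is where the median's own subtree goes. This hedged sketch (illustrative `Entry`/`Node` names) shows one way: the median's old rightPtr becomes the new node's firstPtr, and the promoted median's rightPtr then points at the new node. List slicing stands in for the slides' fromNdx/toNdx copying loop.

```python
from dataclasses import dataclass, field
from typing import Any

@dataclass
class Entry:
    key: int
    right_ptr: Any = None                  # subtree with keys >= key

@dataclass
class Node:
    first_ptr: Any = None                  # subtree with keys < first key
    entries: list = field(default_factory=list)

def split_node(node: Node) -> Entry:
    """Split an overfull node in place and return the median entry to promote."""
    mid = len(node.entries) // 2           # index 2 when there are 5 entries
    median = node.entries[mid]
    # The median's old rightPtr becomes the new node's firstPtr ...
    right = Node(first_ptr=median.right_ptr, entries=node.entries[mid + 1:])
    node.entries = node.entries[:mid]
    # ... and the promoted median now points at the new right node.
    median.right_ptr = right
    return median

node = Node(entries=[Entry(k) for k in (63, 74, 78, 85, 97)])
median = split_node(node)
print(median.key, [e.key for e in node.entries],
      [e.key for e in median.right_ptr.entries])   # 78 [63, 74] [85, 97]
```

The parent then inserts the returned entry in key order, exactly as it would any other entry.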

+ Traversal

[Figure: B-tree with root [21 78] and leaves [11 14], [42 45 63 74], [85 95]]
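In-order traversal visits subtree 0, key 0, subtree 1, key 1, and so on, ending with the last subtree, so the keys come out sorted. A Python sketch on the tree in the figure, using an illustrative keys-plus-children node:

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    keys: list
    children: list = field(default_factory=list)   # empty for a leaf

def inorder(node: Node):
    """Yield keys in sorted order: subtree 0, key 0, subtree 1, key 1, ..."""
    for i, key in enumerate(node.keys):
        if node.children:
            yield from inorder(node.children[i])
        yield key
    if node.children:
        yield from inorder(node.children[-1])   # subtree right of the last key

root = Node([21, 78], [Node([11, 14]), Node([42, 45, 63, 74]), Node([85, 95])])
print(list(inorder(root)))
# [11, 14, 21, 42, 45, 63, 74, 78, 85, 95]
```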


+ Agenda

Delete
Delete Walk Through
Reflow
Borrow Left
Borrow Right
Combine
Delete Mid


+ Delete : For 78

BTree Delete
Delete()
Delete()
Delete Mid()
Reflow()
Reflow()
If shorter, delete root

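[Figure: order-5 B-tree: root [42]; internal nodes [16 21] and [57 78]; leaves under [57 78]: [45 52], [63 74], [85 97] (subtrees of [16 21] not shown)]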


+ Btree Delete

if (root null)
   print ("Attempt to delete from null tree")
else
   shorter = delete (root, target)
   if shorter
      delete root
return root

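[Figure: root [42]; internal nodes [16 21] and [57 78]; leaves [45 52], [63 74], [85 97]]

Target = 78

Call stack: B (B = BTree Delete, D = Delete, DM = Delete Mid)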


+ Delete(root, deleteKey)

if (root null)
   data does not exist
else
   entryNdx = searchNode(root, deleteKey)
   if found entry to be deleted
      if leaf node
         underflow = deleteEntry()
      else
         underflow = deleteMid(left)
      if underflow
         underflow = reflow()

[Figure: root [42]; internal nodes [16 21] and [57 78]; leaves [45 52], [63 74], [85 97]]

Target = 78

Call stack: B D


+ Delete : Else Part

   else
      if deleteKey less than first entry
         subtree = firstPtr
      else
         subtree = rightPtr
      underflow = delete(subtree, deleteKey)
      if underflow
         underflow = reflow()
return underflow

[Figure: root [42]; internal nodes [16 21] and [57 78]; leaves [45 52], [63 74], [85 97]]

Target = 78

Call stack: B D


+ Delete(root, deleteKey)

if (root null)
   data does not exist
else
   entryNdx = searchNode(root, deleteKey)
   if found entry to be deleted
      if leaf node
         underflow = deleteEntry()
      else
         underflow = deleteMid(root, entryNdx, left)
      if underflow
         underflow = reflow(root, entryNdx)

[Figure: root [42]; internal nodes [16 21] and [57 78]; leaves [45 52], [63 74], [85 97]]

Target = 78

Call stack: B D D DM


[Figure: root [42]; internal nodes [16 21] and [57 74]; leaves [45 52], [63] (1 entry, underflow), [85 97]]

74 replaces 78

Call stack: B D D


After Reflow

[Figure: root [42]; internal nodes [16 21] and [57] (1 entry, underflow); leaves under [57]: [45 52], [63 74 85 97]]

Call stack: B D D


+ Delete : Else Part

   else
      if deleteKey less than first entry
         subtree = firstPtr
      else
         subtree = rightPtr
      underflow = delete(subtree, deleteKey)
      if underflow
         underflow = reflow(root, entryNdx)
return underflow

Before Reflow

[Figure: root [42]; internal nodes [16 21] and [57] (1 entry); leaves under [57]: [45 52], [63 74 85 97]]

Call stack: B D


After Reflow

[Figure: the root now has 0 entries and a single child [16 21 42 57]; leaves shown: [45 52], [63 74 85 97]]

Call stack: B D


[Figure: the root has 0 entries and a single child [16 21 42 57]; leaves shown: [45 52], [63 74 85 97]]

Call stack: B


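[Figure: the empty root is deleted and the tree is one level shorter: [16 21 42 57] is the new root; leaves shown: [45 52], [63 74 85 97]]

Call stack: B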
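Putting the walkthrough together, here is a minimal Python sketch of the deck's deletion algorithm, assuming order 5 (min 2 keys) and a node represented as sorted keys plus a child list instead of the firstPtr/entries layout. Function names mirror the slides (`delete`, `reflow`, `btree_delete`; `delete_max` plays the role of Delete Mid) but are illustrative. The slides do not show the leaves under [16 21], so the example fills them with hypothetical keys 3, 9, 17, 19, 25, 30 just to make the tree valid.

```python
from bisect import bisect_left
from dataclasses import dataclass, field

ORDER = 5                          # max children per node
MIN_KEYS = (ORDER + 1) // 2 - 1    # ceil(m/2) - 1 = 2 keys for order 5

@dataclass
class Node:
    keys: list
    children: list = field(default_factory=list)   # empty for a leaf

def reflow(node: Node, i: int) -> bool:
    """Repair LT = children[i] / RT = children[i+1] around node.keys[i].
    Returns True if node itself is left with too few keys."""
    lt, rt = node.children[i], node.children[i + 1]
    if len(rt.keys) > MIN_KEYS:                 # 1: borrow right
        lt.keys.append(node.keys[i])
        node.keys[i] = rt.keys.pop(0)
        if rt.children:
            lt.children.append(rt.children.pop(0))
        return False
    if len(lt.keys) > MIN_KEYS:                 # 2: borrow left
        rt.keys.insert(0, node.keys[i])
        node.keys[i] = lt.keys.pop()
        if lt.children:
            rt.children.insert(0, lt.children.pop())
        return False
    lt.keys += [node.keys.pop(i)] + rt.keys     # 3: combine into LT
    lt.children += rt.children
    node.children.pop(i + 1)
    return len(node.keys) < MIN_KEYS

def delete_max(node: Node):
    """Delete Mid helper: remove the largest key; return (key, underflow)."""
    if not node.children:
        key = node.keys.pop()
        return key, len(node.keys) < MIN_KEYS
    key, under = delete_max(node.children[-1])
    if under:
        under = reflow(node, len(node.keys) - 1)
    return key, under

def delete(node: Node, key) -> bool:
    """Delete key from this subtree; return True if node underflowed."""
    i = bisect_left(node.keys, key)
    if i < len(node.keys) and node.keys[i] == key:
        if not node.children:                   # leaf: delete the entry
            node.keys.pop(i)
            return len(node.keys) < MIN_KEYS
        # internal: the predecessor replaces the key (Delete Mid)
        node.keys[i], under = delete_max(node.children[i])
        return reflow(node, i) if under else False
    if not node.children:                       # key is not in the tree
        return False
    under = delete(node.children[i], key)
    return reflow(node, max(i - 1, 0)) if under else False

def btree_delete(root: Node, key) -> Node:
    """If the root ends up with no keys, the tree becomes one level shorter."""
    delete(root, key)
    if not root.keys and root.children:
        return root.children[0]
    return root

def leaf(*keys):
    return Node(list(keys))

# Leaves 3 9 / 17 19 / 25 30 are hypothetical fillers under [16 21].
root = Node([42], [
    Node([16, 21], [leaf(3, 9), leaf(17, 19), leaf(25, 30)]),
    Node([57, 78], [leaf(45, 52), leaf(63, 74), leaf(85, 97)]),
])
root = btree_delete(root, 78)
print(root.keys)                         # [16, 21, 42, 57]
print([c.keys for c in root.children])
# [[3, 9], [17, 19], [25, 30], [45, 52], [63, 74, 85, 97]]
```

The run follows the slides step by step: 74 replaces 78, [63] underflows, the leaves combine into [63 74 85 97], [57] underflows, the second reflow combines into [16 21 42 57], and the emptied root is dropped.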


+ Delete : Reflow

1: Try to borrow from the right.

2: If 1 fails, try to borrow from the left.

3: If neither can lend (1 and 2 failed), combine.


+ Delete Reflow

underflow = false
if RT->no > min entries
   BorrowRight(root, entryNdx, LT, RT)
else if LT->no > min entries
   BorrowLeft(root, entryNdx, LT, RT)
else
   combine(root, entryNdx, LT, RT)
   if root->no < min entries
      underflow = true
return underflow
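The three reflow cases on small examples, as a Python sketch with order 5 (min 2 entries). Children are string labels standing in for subtrees, so the code only moves them around; the key 45 in the borrow-left example and every label not shown on the slides are hypothetical. The borrow-left case shows the point of the Borrow Left figure: the subtree holding keys >= 74 and < 78 moves from the left node to become the first subtree of the right node.

```python
from dataclasses import dataclass, field

MIN_ENTRIES = 2          # order 5: ceil(5/2) - 1

@dataclass
class Node:
    keys: list
    children: list = field(default_factory=list)   # subtrees (labels here)

def reflow(parent: Node, i: int) -> bool:
    """LT = children[i], RT = children[i+1], separated by parent.keys[i]."""
    lt, rt = parent.children[i], parent.children[i + 1]
    if len(rt.keys) > MIN_ENTRIES:                 # borrow right
        lt.keys.append(parent.keys[i])
        parent.keys[i] = rt.keys.pop(0)
        if rt.children:                            # RT's first subtree moves to LT
            lt.children.append(rt.children.pop(0))
        return False
    if len(lt.keys) > MIN_ENTRIES:                 # borrow left
        rt.keys.insert(0, parent.keys[i])
        parent.keys[i] = lt.keys.pop()
        if lt.children:                            # LT's last subtree moves to RT
            rt.children.insert(0, lt.children.pop())
        return False
    lt.keys += [parent.keys.pop(i)] + rt.keys      # combine RT into LT
    lt.children += rt.children
    parent.children.pop(i + 1)
    return len(parent.keys) < MIN_ENTRIES          # parent may underflow now

# Borrow left: 45 and the string subtree labels are hypothetical.
lt = Node([45, 63, 74], ["<45", ">=45 <63", ">=63 <74", ">=74 <78"])
rt = Node([85], [">=78 <85", ">=85"])
parent = Node([78], [lt, rt])
print(reflow(parent, 0), parent.keys, lt.keys, rt.keys, rt.children[0])
# False [74] [45, 63] [78, 85] >=74 <78

# Combine (as in the Combine slides); the subtrees of [42 45] are hypothetical labels.
lt2 = Node([42, 45], ["<42", ">=42 <45", ">=45 <57"])
rt2 = Node([63], ["59 61", "65 71"])
parent2 = Node([21, 57, 78], ["<21", lt2, rt2, ">=78"])
print(reflow(parent2, 1), parent2.keys, lt2.keys)
# False [21, 78] [42, 45, 57, 63]
```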


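+ Borrow Left

[Figure: borrowing from the left sibling around parent key 78; the left node holds 63 74, the right node holds 85; subtree labels: Node >= 74 < 78 and Node >= 78 < 85]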


+ Combine

[Figure: parent [21 57 78] (3 entries); left node [42 45] (2 entries); underflowed node [63] (1 entry) with subtrees [59 61] and [65 71]]


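+ Combine

[Figure: separator 57 copied down into the left node, now [42 45 57] (3 entries); parent still [21 57 78]; [63] with subtrees [59 61] and [65 71]]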


+ Combine

[Figure: 63 moved into the left node, now [42 45 57 63] (4 entries); parent still [21 57 78]; subtrees [59 61] and [65 71]]


+ Combine

[Figure: 57 removed from the parent, now [21 78] (2 entries); combined node [42 45 57 63] (4 entries); subtrees [59 61] and [65 71]]


+ Delete Mid

if leaf
   exchange data and delete leaf entry
else
   traverse right to locate predecessor
   deleteMid(right)
   if underflow
      reflow
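Finding the predecessor amounts to "go to the left subtree of the key, then keep going right until a leaf". A Python sketch covering both cases on the next slides; the name `rightmost_key` is illustrative, and the leaves [58 60] and [65 70] in Case 2 are hypothetical fillers the slide does not show.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    keys: list
    children: list = field(default_factory=list)   # empty for a leaf

def rightmost_key(subtree: Node):
    """Predecessor of a separator = largest key in its left subtree:
    keep following the last child until a leaf is reached."""
    node = subtree
    while node.children:
        node = node.children[-1]
    return node.keys[-1]

# Case 1: the left subtree of 78 is the leaf [63 74]
case1 = Node([63, 74])
# Case 2: [63 74] has children; [58 60] and [65 70] are hypothetical fillers
case2 = Node([63, 74], [Node([58, 60]), Node([65, 70]), Node([75, 76])])
print(rightmost_key(case1), rightmost_key(case2))  # 74 76
```

In Case 2 the descent has to pass through an internal node, which is why Delete Mid calls itself recursively and may reflow on the way back up.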


+ Delete Mid

[Figure: root [42]; internal nodes [16 21] and [57 78]; leaves [45 52], [63 74], [85 97]]

Case 1: To delete 78, we replace it with 74


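+ Delete Mid

[Figure: root [42]; internal nodes [16 21] and [57 78]; between 57 and 78, node [63 74] has a further child [75 76] to the right of 74; other nodes [45 52], [85 97]]

Case 2: To delete 78, we replace it with 76

Hence Delete Mid is called recursively to locate the predecessor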


+ Order

Order   Min     Max
3       2       3
4       2       4
5       3       5
6       3       6
…       …       …
m       ⌈m/2⌉   m

(Min and Max are numbers of subtrees for a non-root node)


+ Get the Order Right

Max keys per node is 4

Max subtrees is 5, so the order is 5

Min subtrees is 3

Min keys is 2

[Figure: order-5 B-tree with root [16 21 42 57] (4 keys); leaves shown: [45 52] (2 keys), [63 74 85 97] (4 keys)]
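The table follows from two formulas: a node of order m has at most m subtrees and m-1 keys, and, unless it is the root, at least ⌈m/2⌉ subtrees and ⌈m/2⌉-1 keys. A small sketch (the name `order_limits` is illustrative) that regenerates the rows:

```python
def order_limits(m: int) -> dict:
    """Subtree and key limits for a non-root node of a B-tree of order m."""
    min_sub = (m + 1) // 2      # ceil(m/2) without floating point
    return {"min_subtrees": min_sub, "max_subtrees": m,
            "min_keys": min_sub - 1, "max_keys": m - 1}

for m in range(3, 7):
    lim = order_limits(m)
    print(m, lim["min_subtrees"], lim["max_subtrees"])
# 3 2 3
# 4 2 4
# 5 3 5
# 6 3 6
print(order_limits(5))
# {'min_subtrees': 3, 'max_subtrees': 5, 'min_keys': 2, 'max_keys': 4}
```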


+ 2-3 Tree

Order 3: so how many keys can a node hold?

This rule is valid for non-root nodes

The root can have 0, 2, or 3 subtrees


+ 2-3 Tree

[Figure: 2-3 tree example with root [42], nodes [16] and [57 78], and leaves [45 52], [63], [85 97]]


+ 2-3-4 Tree

Order 4: so how many keys can a node hold?

This rule is valid for non-root nodes

The root can have 0, 2, 3, or 4 subtrees


+ Structure of B+ tree

Non-leaf node
   firstPtr
   numEntries
   Entries[1 .. M-1]
end

entry
   key
   rightPtr
end entry

Leaf node
   firstPtr
   numEntries
   Entries[1 .. M-1]
   Next Leaf Node
end


+ B+ Tree

[Figure: B+ tree with root [42], internal node [57 78], and leaves [45 52], [63 74], [85 97]]

Implies there are more nodes


+ B* Tree

Space usage

B-tree nodes can be up to 50% (1/2) empty

So the rule is modified to two-thirds (2/3) full

Also, when a node overflows, instead of being split immediately its entries are redistributed with its siblings

And even when a split does happen, the entries are distributed equally among the siblings (pg 462)

+ B+-trees

In a B+ tree, all the keys in the internal nodes are dummies: only the keys in the leaves point to “real” data

Linking the leaves gives the ability to scan the collection in order without passing through the higher nodes
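A Python sketch of that in-order scan over linked leaves, assuming a hypothetical `Leaf` with a `next_leaf` link (the "Next Leaf Node" field of the B+ leaf structure). A real B+ tree would first descend from the root once to the leaf where the range starts; this sketch simply starts at the leftmost leaf.

```python
from dataclasses import dataclass
from typing import Iterator, Optional

@dataclass
class Leaf:
    keys: list
    next_leaf: Optional["Leaf"] = None   # link to the next leaf in key order

def scan(first: Leaf, lo: int, hi: int) -> Iterator[int]:
    """Yield keys in [lo, hi] by walking the leaf chain, never the upper levels."""
    leaf: Optional[Leaf] = first
    while leaf is not None:
        for k in leaf.keys:
            if k > hi:
                return                   # keys are sorted, so we can stop
            if k >= lo:
                yield k
        leaf = leaf.next_leaf

c = Leaf([85, 97])
b = Leaf([63, 74], c)
a = Leaf([45, 52], b)
print(list(scan(a, 50, 90)))   # [52, 63, 74, 85]
```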