Ch 4d B+ trees, Mark McKenney

Upload: maia-benjamin

Post on 02-Jan-2016

Page 1: Ch 4d B+ trees

Ch 4d: B+ trees

Mark McKenney

Page 2: Ch 4d B+ trees

Lots of trees, but what happens when memory fills up? Performance tanks!

All the trees we have seen so far assume that they fit in memory. When memory fills, disk paging comes into play.

To traverse a tree, we need to access nodes that are stored non-sequentially in memory.

How big is a node? (A couple of ints and pointers are around 4+4+8+8 = 24 bytes.)

What is the minimum amount of data that can be read from memory? (Usually a word.)

What is the minimum amount of data that can be read from disk? (Usually a page: 4 KB.)

So, if each node is stored on its own page, we are wasting 4096-24 = 4072 bytes per read. 257 reads require about 1 MB of data transfer for about 6 KB of actual data.
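The arithmetic above can be checked with a short sketch. The function name `transfer_stats` is made up for illustration; the page and node sizes are the ones quoted on the slide.

```python
PAGE_BYTES = 4096            # one disk page, as on the slide
NODE_BYTES = 4 + 4 + 8 + 8   # two ints and two pointers = 24 bytes


def transfer_stats(reads, page_bytes=PAGE_BYTES, node_bytes=NODE_BYTES):
    """Return (bytes moved from disk, bytes of node data actually used)."""
    return reads * page_bytes, reads * node_bytes


moved, useful = transfer_stats(257)
print(PAGE_BYTES - NODE_BYTES)   # 4072 bytes wasted per read
print(moved, useful)             # 1052672 6168
print(round(moved / 2**20, 3))   # 1.004 (MiB moved)
```

So 257 one-node-per-page reads move just over 1 MiB to deliver roughly 6 KB of node data.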

Page 3: Ch 4d B+ trees

So, let's generalize a binary tree to disk: a B tree. Actually, a B+ tree.

B trees came out first, and are harder and more complicated.

Approach:
Make a node the size of a disk page (fixed!)
Make sure that no node is too empty
Make sure that the tree is balanced

What if the actual data is too big to fit in a disk page?
Use a key to index the actual data, and store the data on disk in a separate file

Advantages:
Maximum disk performance
Persistence!
Buffering in terms of disk pages
This is all very database oriented

Page 4: Ch 4d B+ trees

B+tree example, n=3

[Figure: root with key 100; non-leaf nodes 30 and 120 150 180; leaves 3 5 11, 30 35, 100 101 110, 120 130, 150 156 179, 180 200. The keys form the tree, which is stored in its own file; the data is in a separate file.]

Page 5: Ch 4d B+ trees

Sample non-leaf node

[Figure: a node with keys 57, 81, 95 and four pointers: to keys k < 57; to keys 57 ≤ k < 81; to keys 81 ≤ k < 95; to keys k ≥ 95.]
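The routing rule for a non-leaf node (follow pointer i when the search key lies between key i-1 and key i) can be sketched with the standard `bisect` module. The helper name `child_index` is an illustrative assumption, not something from the slides.

```python
from bisect import bisect_right


def child_index(keys, k):
    """Index of the pointer to follow for search key k in a non-leaf node.

    With keys [57, 81, 95]: index 0 covers k < 57, index 1 covers
    57 <= k < 81, index 2 covers 81 <= k < 95, index 3 covers k >= 95.
    """
    return bisect_right(keys, k)


keys = [57, 81, 95]
print([child_index(keys, k) for k in (10, 57, 80, 81, 95, 200)])
# [0, 1, 1, 2, 3, 3]
```

`bisect_right` is used so that a search key equal to a separator goes to the pointer on its right, matching the "57 ≤ k < 81" ranges.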

Page 6: Ch 4d B+ trees

Sample leaf node

[Figure: a leaf with keys 57, 81, 95, reached from a non-leaf node. It has one pointer per key (to the record with key 57, to the record with key 81, to the record with key 95) plus a pointer to the next leaf in sequence.]

Page 7: Ch 4d B+ trees

Size of nodes (fixed):

n+1 pointers

n keys
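Since a node is one disk page holding n keys and n+1 pointers, n follows from the page size. The sketch below assumes 4-byte keys and 8-byte pointers (the int and pointer sizes used earlier in the deck) and ignores any per-page header; `max_keys` is a hypothetical helper name.

```python
def max_keys(page_bytes, key_bytes, ptr_bytes):
    """Largest n such that n keys and n+1 pointers fit in one page.

    Solves n*key_bytes + (n+1)*ptr_bytes <= page_bytes for n.
    """
    return (page_bytes - ptr_bytes) // (key_bytes + ptr_bytes)


n = max_keys(4096, 4, 8)
print(n)                      # 340
print(n * 4 + (n + 1) * 8)    # 4088 bytes used out of 4096
```

Under these assumptions a 4 KB page gives a fan-out of a few hundred, which is why the tree stays so shallow.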

Page 8: Ch 4d B+ trees

Don't want nodes to be too empty

Use at least:

Non-leaf: ⌈(n+1)/2⌉ pointers

Leaf: ⌊(n+1)/2⌋ pointers to data

Page 9: Ch 4d B+ trees

Full node vs. minimum node, n=3

[Figure: non-leaf: full node 120 150 180, minimum node 30. Leaf: full node 3 5 11, minimum node 30 35. The leaf's sequence pointer counts even if null.]

Page 10: Ch 4d B+ trees

B+tree rules (tree of order n)

(1) All leaves at the same lowest level (balanced tree)

(2) Pointers in leaves point to records, except for the "sequence pointer"

Page 11: Ch 4d B+ trees

(3) Number of pointers/keys for B+tree

                      Max ptrs   Max keys   Min ptrs→data   Min keys
Non-leaf (non-root)   n+1        n          ⌈(n+1)/2⌉       ⌈(n+1)/2⌉ - 1
Leaf (non-root)       n+1        n          ⌊(n+1)/2⌋       ⌊(n+1)/2⌋
Root                  n+1        n          1               1
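The occupancy table can be turned into a small lookup. This is a sketch that assumes the usual textbook rounding (the non-leaf minimum rounds (n+1)/2 up, the leaf minimum rounds it down); the function name `occupancy` and the dictionary layout are illustrative only.

```python
def occupancy(n):
    """Pointer/key limits for a B+tree of order n, per node type."""
    ceil_half = (n + 2) // 2    # ceil((n+1)/2) using integer arithmetic
    floor_half = (n + 1) // 2   # floor((n+1)/2)
    return {
        "non-leaf": {"max_ptrs": n + 1, "max_keys": n,
                     "min_ptrs": ceil_half, "min_keys": ceil_half - 1},
        "leaf": {"max_ptrs": n + 1, "max_keys": n,
                 "min_ptrs": floor_half, "min_keys": floor_half},
        "root": {"max_ptrs": n + 1, "max_keys": n,
                 "min_ptrs": 1, "min_keys": 1},
    }


for n in (3, 4):
    limits = occupancy(n)
    print(n, limits["non-leaf"]["min_ptrs"], limits["leaf"]["min_keys"])
# 3 2 2
# 4 3 2
```

For n=3 this matches the minimum-node figure (a non-leaf with one key and two pointers, a leaf with two keys), and for n=4 a leaf left with a single key is below the minimum.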

Page 12: Ch 4d B+ trees

Insert into B+tree

(a) simple case: space available in leaf

(b) leaf overflow

(c) non-leaf overflow

(d) new root
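The four insertion cases listed above can be sketched in one recursive routine. This is a minimal in-memory illustration, not the deck's implementation: the `Node` class, the split points, and the function names are assumptions, record pointers are omitted, and duplicate keys are not checked.

```python
from bisect import bisect_left, bisect_right

N = 3  # maximum keys per node, as in the n=3 examples


class Node:
    def __init__(self, leaf, keys=None, children=None):
        self.leaf = leaf
        self.keys = keys or []
        self.children = children or []   # child nodes (non-leaf only)
        self.next = None                 # sequence pointer (leaf only)


def _insert(node, key):
    """Insert key below node; return (separator, new right node) if node split."""
    if node.leaf:
        node.keys.insert(bisect_left(node.keys, key), key)   # (a) simple case
        if len(node.keys) <= N:
            return None
        mid = (len(node.keys) + 1) // 2                      # (b) leaf overflow
        right = Node(True, node.keys[mid:])
        node.keys = node.keys[:mid]
        right.next, node.next = node.next, right
        return right.keys[0], right      # separator is copied up
    i = bisect_right(node.keys, key)
    split = _insert(node.children[i], key)
    if split is None:
        return None
    sep, right = split
    node.keys.insert(i, sep)
    node.children.insert(i + 1, right)
    if len(node.keys) <= N:
        return None
    mid = len(node.keys) // 2                                # (c) non-leaf overflow
    up = node.keys[mid]                  # separator is moved up, not copied
    new_right = Node(False, node.keys[mid + 1:], node.children[mid + 1:])
    node.keys, node.children = node.keys[:mid], node.children[:mid + 1]
    return up, new_right


def insert(root, key):
    """Insert key and return the (possibly new) root."""
    split = _insert(root, key)
    if split is None:
        return root
    sep, right = split
    return Node(False, [sep], [root, right])                 # (d) new root


# A root 10 20 30 over leaves 1 2 3 | 10 12 | 20 25 | 30 32 40, then insert 45.
leaves = [Node(True, ks) for ks in ([1, 2, 3], [10, 12], [20, 25], [30, 32, 40])]
root = insert(Node(False, [10, 20, 30], leaves), 45)
print(root.keys, [c.keys for c in root.children])   # [30] [[10, 20], [40]]
```

Note the asymmetry: a leaf split copies the separator into the parent (the key must stay in a leaf), while a non-leaf split moves it up.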

Page 13: Ch 4d B+ trees

(a) Insert key = 32, n=3

[Figure: before the insert. Leaves 3 5 11 and 30 31; parent key 30; root key 100.]

Page 14: Ch 4d B+ trees

(a) Insert key = 32, n=3

[Figure: after the insert. Key 32 goes into the free slot of leaf 30 31, giving 30 31 32; no other node changes.]

Page 15: Ch 4d B+ trees

(b) Insert key = 7, n=3

[Figure: before the insert. Leaf 3 5 11 is full; next leaf 30 31; parent key 30; root key 100.]

Page 16: Ch 4d B+ trees

(b) Insert key = 7, n=3

[Figure: the full leaf splits in two: 3 5 stay in the original leaf and 7 starts the new leaf.]

Page 17: Ch 4d B+ trees

(b) Insert key = 7, n=3

[Figure: after the split, key 7 is added to the parent as the separator for the new leaf, next to the existing key 30.]

Page 18: Ch 4d B+ trees

(c) Insert key = 160, n=3

[Figure: before the insert. Root key 100; non-leaf node 120 150 180 (full); leaves 150 156 179 (full) and 180 200.]

Page 19: Ch 4d B+ trees

(c) Insert key = 160, n=3

[Figure: the full leaf 150 156 179 splits; the new leaf holds 160 179.]

Page 20: Ch 4d B+ trees

(c) Insert key = 160, n=3

[Figure: the non-leaf node 120 150 180 has no room for the new separator, so it splits as well; a new non-leaf node holding 180 appears.]

Page 21: Ch 4d B+ trees

(c) Insert key = 160, n=3

[Figure: after the insert. Key 160 moves up into the root next to 100; the new non-leaf node holds 180; the new leaf holds 160 179.]

Page 22: Ch 4d B+ trees

(d) New root, insert 45, n=3

[Figure: before the insert. Root 10 20 30 (full); leaves 1 2 3, 10 12, 20 25, and 30 32 40 (full).]

Page 23: Ch 4d B+ trees

(d) New root, insert 45, n=3

[Figure: the full leaf 30 32 40 splits; the new leaf holds 40 45.]

Page 24: Ch 4d B+ trees

(d) New root, insert 45, n=3

[Figure: separator key 40 must go into the old root 10 20 30, which is full, so that node splits too; a new non-leaf node holding 40 appears.]

Page 25: Ch 4d B+ trees

(d) New root, insert 45, n=3

[Figure: after the insert. A new root holding key 30 is created above the two non-leaf nodes; the new non-leaf node holds 40 and the new leaf holds 40 45.]

Page 26: Ch 4d B+ trees

Deletion from B+tree

(a) Simple case: no example

(b) Coalesce with neighbor (sibling)

(c) Re-distribute keys

(d) Cases (b) or (c) at non-leaf

Page 27: Ch 4d B+ trees

(b) Coalesce with sibling: delete 50, n=4

[Figure: before the delete. Parent keys 10 40 100; leaves 10 20 30 and 40 50.]

Page 28: Ch 4d B+ trees

(b) Coalesce with sibling: delete 50, n=4

[Figure: after the delete. The leaf left holding only 40 is too empty, so 40 moves into the sibling leaf 10 20 30.]

Page 29: Ch 4d B+ trees

(c) Redistribute keys: delete 50, n=4

[Figure: before the delete. Parent keys 10 40 100; leaves 10 20 30 35 and 40 50.]

Page 30: Ch 4d B+ trees

(c) Redistribute keys: delete 50, n=4

[Figure: after the delete. Key 35 moves from the sibling leaf into the leaf that lost 50, and the parent's separator key becomes 35.]

Page 31: Ch 4d B+ trees

(d) Non-leaf coalesce: delete 37, n=4

[Figure: before the delete. Root key 25; non-leaf nodes 10 20 and 30 40; leaves 1 3, 10 14, 20 22, 25 26, 30 37, 40 45.]

Page 32: Ch 4d B+ trees

(d) Non-leaf coalesce: delete 37, n=4

[Figure: the leaf left holding only 30 is too empty, so key 30 moves into a sibling leaf.]

Page 33: Ch 4d B+ trees

(d) Non-leaf coalesce: delete 37, n=4

[Figure: the parent non-leaf node is left with the single key 40, which is below the minimum for a non-leaf node.]

Page 34: Ch 4d B+ trees

(d) Non-leaf coalesce: delete 37, n=4

[Figure: the two non-leaf nodes coalesce, pulling key 25 down from the old root; the merged node becomes the new root.]

Page 35: Ch 4d B+ trees

B+tree deletions in practice

Often, coalescing is not implemented. Too hard and not worth it!

Page 36: Ch 4d B+ trees

Characteristics

B+ trees are typically short and bushy.
We want searches to touch few nodes, since the nodes are on disk.

For 100 elements in a node:
A tree of height 1 can index 100 items
A tree of height 2 can index 100 * 100 = 10,000 items
A tree of height 3 can index 100 * 100 * 100 = 1,000,000 items

So, we can find an item in that tree by looking at 3 nodes, despite the huge number of items. That equates to 3 disk reads. Very IO efficient.

Databases make heavy use of B trees (usually B+ trees).
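The height figures follow directly from the fan-out. A small sketch, with `indexable_items` as an illustrative name and the fan-out of 100 taken from the slide:

```python
def indexable_items(height, fanout=100):
    """Number of items a tree of the given height can index at this fan-out."""
    return fanout ** height


for h in (1, 2, 3):
    print(h, indexable_items(h))
# 1 100
# 2 10000
# 3 1000000
```

Each extra level multiplies the capacity by the fan-out while adding only one more disk read per search.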

Page 37: Ch 4d B+ trees

A final note

How to locate an element in a node?

They are sorted... use a binary search!
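A hand-written binary search over one node's sorted keys might look like the sketch below. The probe counter is added only to show how few comparisons a node needs; the function name and return shape are assumptions for illustration.

```python
def binary_search(keys, k):
    """Return (index of k in keys or None, number of probes made)."""
    lo, hi, probes = 0, len(keys), 0
    while lo < hi:
        mid = (lo + hi) // 2
        probes += 1
        if keys[mid] < k:
            lo = mid + 1      # k can only be to the right of mid
        else:
            hi = mid          # k is at mid or to its left
    found = lo < len(keys) and keys[lo] == k
    return (lo if found else None), probes


node_keys = list(range(0, 200, 2))     # a node with 100 sorted keys
print(binary_search(node_keys, 84))    # index 42, found in at most 7 probes
print(binary_search(node_keys, 85)[0]) # None: 85 is not in the node
```

For a 100-key node the loop runs at most 7 times, so searching inside a node is cheap compared with the disk read that fetched it.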

Page 38: Ch 4d B+ trees

So... complexity?

We now have a new type of complexity: IO complexity.

IOs are disk (secondary storage) IOs, the slowest IOs in a computer system. So we need an IO complexity as well as a computational complexity, but IO complexity reigns.

For a B+ tree holding n items, with a minimum of a and a maximum of b children per node, and a block size (disk page size) of B:

Number of leaf blocks is O(n/B)

IO complexity for all operations is O(log_B n)

Height of tree is at least log_b n (every node full) and at most log_a n (every node at minimum occupancy)

Time complexity to find is f(b) log_b n when every node is full and f(a) log_a n when every node is at minimum occupancy, where f(x) is the time to find an element in a node with x entries
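To make the height bounds concrete, the sketch below counts levels with integer arithmetic, assuming a and b are the minimum and maximum number of children per node. The function names and the sample values (a=50, b=100, one million items) are illustrative assumptions.

```python
def levels(n_items, fanout):
    """Smallest h with fanout**h >= n_items, i.e. ceil(log_fanout(n_items))."""
    h, capacity = 0, 1
    while capacity < n_items:
        h += 1
        capacity *= fanout
    return h


def height_bounds(n_items, a, b):
    """(best case, worst case) number of levels, one disk read per level."""
    return levels(n_items, b), levels(n_items, a)


print(height_bounds(1_000_000, a=50, b=100))   # (3, 4)
```

Even in the worst case, with every node only half full, the search costs a single extra disk read.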

Page 39: Ch 4d B+ trees

Always remember your bandwidth

http://hothardware.com/News/Homing-Pigeon-Faster-Than-Internet-in-Data-Transfer/

Time to transfer 4 GB at 2.04 Mbits per second is:

4 hours, 39 minutes, and 37 seconds

Time to transfer 2.57 PB (about 2,570,000 GB) at 2.04 Mbits per second is:

130,821 days, 12 hours, 32 minutes, 13.54 seconds, or about 358 years!

Size of a hard drive: 0.01 cubic foot

Cargo capacity of a Toyota Yaris: 25.7 cubic feet

Number of hard drives I can transport: 2570

If these are 1 TB hard drives, that's 2.57 PB, or roughly 20.56 petabits

Time to drive to Chicago: 5 hours = 18,000 seconds

Which gives a bandwidth of 1.14 Tbits/second, or about 142 GB/second
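The numbers above can be reproduced with a few lines. This sketch assumes 1 GB = 2^30 bytes and a link speed of 2,048,000 bits per second for the network transfer (those values reproduce the 4 h 39 min 37 s figure), and decimal terabytes for the drives; all names are illustrative.

```python
def transfer_seconds(n_bytes, bits_per_second):
    """Time to push n_bytes through a link of the given speed."""
    return n_bytes * 8 / bits_per_second


def car_bandwidth_bits_per_s(drives, drive_bytes, trip_seconds):
    """Effective bandwidth of driving a load of drives to the destination."""
    return drives * drive_bytes * 8 / trip_seconds


t = int(transfer_seconds(4 * 2**30, 2_048_000))
hours, rest = divmod(t, 3600)
print(hours, rest // 60, rest % 60)    # 4 39 37

bps = car_bandwidth_bits_per_s(2570, 10**12, 5 * 3600)
print(round(bps / 1e12, 2))            # 1.14 (Tbits/second)
print(round(bps / 8 / 1e9, 1))         # 142.8 (GB/second)
```

The car wins by roughly six orders of magnitude on bandwidth, though its latency is five hours.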

And so the saying is: “Never underestimate the bandwidth of a station wagon loaded with hard drives hurtling down the highway at 70mph”