csc 261/461 –database systems lecture 16 · 2017-03-23 · file types •unordered records (heap...

Post on 08-Apr-2020

1 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

CSC 261/461 – Database SystemsLecture 16

Spring 2017MW 3:25 pm – 4:40 pm

January 18 – May 3Dewey 1101

Announcement

• Project 1 Milepost 3 is out

CSC261,Spring2017,UR

File Types

• Unordered Records (Heap Files)

• Ordered Records (Sorted Files)

• Hash Files

CSC261,Spring2017,UR

Average Access Times for a File of b Blocks under Basic File Organizations

1. EXTERNAL MERGE SORT

5

What you will learn about in this section

1. Externalmergesort

2. Externalmergesortonlargerfiles

3. Optimizationsforsorting

6

Recap: External Merge Algorithm

• Suppose we want to merge two sorted files both much larger than main memory (i.e. the buffer)

• We can use the external merge algorithm to merge files of arbitrary length in 2*(N+M) IO operations with only 3 buffer pages!

External Merge Sort

Why are Sort Algorithms Important?

• Data requested from DB in sorted order is extremely common– e.g., find students in increasing GPA order

• Why not just use quicksort in main memory??–What about if we need to sort 1TB of data with 1GB of

RAM…

Aclassicproblemincomputerscience!

So how do we sort big files?

1. Split into chunks small enough to sort in memory (“runs”)

2. Merge pairs (or groups) of runs using the external merge algorithm

3. Keep merging the resulting runs (each time = a “pass”) until left with one sorted file!

External Merge Sort Algorithm

27,24 3,1

Example:• 3Buffer

pages• 6-pagefile

Disk MainMemory

Buffer

18,22

F1

F2

33,12 55,3144,10

1. Split into chunks small enough to sort in memory

Orangefile=unsorted

External Merge Sort Algorithm

27,24 3,1

Disk MainMemory

Buffer

18,22

F1

F2

33,12 55,3144,10

1. Split into chunks small enough to sort in memory

Example:• 3Buffer

pages• 6-pagefile

Orangefile=unsorted

External Merge Sort Algorithm

27,24 3,1

Disk

MainMemory

Buffer

18,22

F1

F233,12 55,3144,10

1. Split into chunks small enough to sort in memory

Example:• 3Buffer

pages• 6-pagefile

Orangefile=unsorted

External Merge Sort Algorithm

27,24 3,1

Disk

MainMemory

Buffer

18,22

F1

F231,33 44,5510,12

Example:• 3Buffer

pages• 6-pagefile

1. Split into chunks small enough to sort in memory

Orangefile=unsorted

External Merge Sort Algorithm

Disk MainMemory

BufferF1

F2

31,33 44,5510,12

AndsimilarlyforF2

27,24 3,118,2218,22 24,271,3

1. Splitintochunkssmallenoughtosortinmemory

Example:• 3Buffer

pages• 6-pagefileEachsortedfileisacalledarun

External Merge Sort Algorithm

Disk MainMemory

BufferF1

F2

2. Now just run the external merge algorithm & we’re done!

31,33 44,5510,12

18,22 24,271,3

Example:• 3Buffer

pages• 6-pagefile

Calculating IO Cost

For 3 buffer pages, 6 page file:

1. Split into two 3-page files and sort in memory 1. = 1 R + 1 W for each file = 2*(3 + 3) = 12 IO operations

2. Merge each pair of sorted chunks using the external merge algorithm 1. = 2*(3 + 3) = 12 IO operations

3. Total cost = 24 IO

Running External Merge Sort on Larger Files

Disk

31,33 44,5510,12

18,43 24,2745,38

Assumewestillonlyhave3 bufferpages(Buffernotpictured)

31,33 47,5510,12

18,22 23,2041,3

31,33 39,5542,46

18,23 24,271,3

48,33 44,4010,12

18,22 24,2716,31

Running External Merge Sort on Larger Files

Disk

31,33 44,5510,12

18,43 24,2745,38

31,33 47,5510,12

18,22 23,2041,3

31,33 39,5542,46

18,23 24,271,3

48,33 44,4010,12

18,22 24,2716,31

1.Splitintofilessmallenoughtosortinbuffer…

Assumewestillonlyhave3 bufferpages(Buffernotpictured)

Running External Merge Sort on Larger Files

Disk

31,33 44,5510,12

27,38 43,4518,24

31,33 47,5510,12

20,22 23,413,18

39,42 46,5531,33

18,23 24,271,3

33,40 44,4810,12

22,24 27,3116,18

1.Splitintofilessmallenoughtosortinbuffer…andsort

Assumewestillonlyhave3 bufferpages(Buffernotpictured)

Calleachofthesesortedfilesarun

Running External Merge Sort on Larger Files

Disk

31,33 44,5510,12

27,38 43,4518,24

31,33 47,5510,12

20,22 23,413,18

39,42 46,5531,33

18,23 24,271,3

33,40 44,4810,12

22,24 27,3116,18

2.Nowmergepairsof(sorted)files…theresultingfileswillbesorted!

Disk

18,24 27,3110,12

43,44 45,5533,38

12,18 20,223,10

33,41 47,5523,31

18,23 24,271,3

39,42 46,5531,33

16,18 22,2410,12

33,40 44,4827,31

Assumewestillonlyhave3 bufferpages(Buffernotpictured)

Running External Merge Sort on Larger Files

Disk

31,33 44,5510,12

27,38 43,4518,24

31,33 47,5510,12

20,22 23,413,18

39,42 46,5531,33

18,23 24,271,3

33,40 44,4810,12

22,24 27,3116,18

3.Andrepeat…

Disk

18,24 27,3110,12

43,44 45,5533,38

12,18 20,223,10

33,41 47,5523,31

18,23 24,271,3

39,42 46,5531,33

16,18 22,2410,12

33,40 44,4827,31

Disk

10,12 12,183,10

22,23 24,2718,20

33,33 38,4131,31

45,47 55,5543,44

10,12 16,181,3

23,24 24,2718,22

31,33 33,3927,31

44,46 48,5540,42

Assumewestillonlyhave3 bufferpages(Buffernotpictured)

Calleachofthesestepsapass

Running External Merge Sort on Larger Files

Disk

31,33 44,5510,12

27,38 43,4518,24

31,33 47,5510,12

20,22 23,413,18

39,42 46,5531,33

18,23 24,271,3

33,40 44,4810,12

22,24 27,3116,18

4.Andrepeat!

Disk

18,24 27,3110,12

43,44 45,5533,38

12,18 20,223,10

33,41 47,5523,31

18,23 24,271,3

39,42 46,5531,33

16,18 22,2410,12

33,40 44,4827,31

Disk

10,12 12,183,10

22,23 24,2718,20

33,33 38,4131,31

45,47 55,5543,44

10,12 16,181,3

23,24 24,2718,22

31,33 33,3927,31

44,46 48,5540,42

Disk

3,10 10,101,3

12,16 18,1812,12

20,22 22,2318,18

24,24 27,2723,24

31,31 31,3327,31

33,38 39,4033,33

43,44 44,4541,42

48,55 55,5546,47

Simplified 3-page Buffer Version

Assume for simplicity that we split an N-page file into N single-page runs and sort these; then:

• First pass: Merge N/2 pairs of runs each of length 1 page

• Second pass: Merge N/4 pairs of runs each of length 2 pages

• In general, for N pages, we do 𝒍𝒐𝒈𝟐 𝑵 passes– +1 for the initial split & sort

• Each pass involves reading in & writing out all the pages = 2N IO

Unsortedinputfile

Split&sort

Merge

Merge

Sorted!

à 2N*( 𝒍𝒐𝒈𝟐 𝑵 +1)totalIOcost!

Using B+1 buffer pages to reduce # of passes

Suppose we have B+1 buffer pages now; we can:

1. Increase length of initial runs. Sort B+1 at a time!At the beginning, we can split the N pages into runs of length B+1 and sort these in memory

2𝑁( log, 𝑁 + 1)

IOCost:

Startingwithrunsoflength1

2𝑁( log,𝑵

𝑩 + 𝟏 + 1)

StartingwithrunsoflengthB+1

Using B+1 buffer pages to reduce # of passes

Suppose we have B+1 buffer pages now; we can:

2. Perform a B-way merge. On each pass, we can merge groups of B runs at a time (vs. merging pairs of runs)!

IOCost:

2𝑁( log, 𝑁 + 1) 2𝑁( log,𝑵

𝑩 + 𝟏 + 1)

Startingwithrunsoflength1

StartingwithrunsoflengthB+1

2𝑁( log2𝑵

𝑩 + 𝟏 + 1)

PerformingB-waymerges

INDEXING

27

What you will learn about in this section

1. Indexes:Motivation

2. Indexes:Basics

28

Index Motivation

• Suppose we want to search for people of a specific age

• First idea: Sort the records by age… we know how to do this fast!

• How many IO operations to search over N sorted records?– Simple scan: O(N)– Binary search: O(𝐥𝐨𝐠𝟐 𝑵)

Person(name, age)

Couldwegetevencheapersearch?E.g.gofrom𝐥𝐨𝐠𝟐 𝑵à 𝐥𝐨𝐠𝟐𝟎𝟎 𝑵?

Index Motivation

• What about if we want to insert a new person, but keep the list sorted?

• We would have to potentially shift N records, requiring up to ~ 2*N/P IO operations (where P = # of records per page)!– We could leave some “slack” in the pages…

4,5 6,71,3 3,4 5,61,2

2

7,

Couldwegetfasterinsertions?

Index Motivation

• What about if we want to be able to search quickly along multiple attributes (e.g. not just age)?–We could keep multiple copies of the records, each sorted

by one attribute set… this would take a lot of space

Canwegetfastsearchovermultipleattribute(sets)withouttakingtoomuchspace?

We’llcreateseparatedatastructurescalledindexes toaddressallthesepoints

Further Motivation for Indexes: NoSQL!

• NoSQL engines are (basically) just indexes!

– A lot more is left to the user in NoSQL… one of the primary remaining functions of the DBMS is still to provide index over the data records, for the reasons we just saw!

– Sometimes use B+ Trees, sometimes hash indexes

IndexesarecriticalacrossallDBMStypes

Indexes: High-level

• An index on a file speeds up selections on the search key fields for the index.– Search key properties

• Any subset of fields• is not the same as key of a relation

• Example: Onwhichattributeswouldyoubuild

indexes?Product(name, maker, price)

More precisely

• An index is a data structure mapping search keys to sets of rows in a database table

– Provides efficient lookup & retrieval by search key value-usually much faster than searching through all the rows of the database table

• An index can store:– Full rows it points to (primary index) or – Pointers to those rows (secondary index)

Operations on an Index

• Search: Quickly find all records which meet some conditionon the search key attributes–More sophisticated variants as well. Why?

Indexingisonethemostimportantfeaturesprovidedbyadatabaseforperformance

Conceptual Example

Whatifwewanttoreturnallbookspublishedafter1867?Theabovetablemightbeveryexpensivetosearchoverrow-by-row…

SELECT *FROM Russian_NovelsWHERE Published > 1867

BID Title Author Published Full_text

001 WarandPeace Tolstoy 1869 …

002 CrimeandPunishment

Dostoyevsky 1866 …

003 AnnaKarenina Tolstoy 1877 …

Russian_Novels

Conceptual Example

BID Title Author Published Full_text

001 WarandPeace Tolstoy 1869 …

002 CrimeandPunishment

Dostoyevsky 1866 …

003 AnnaKarenina Tolstoy 1877 …

Published BID

1866 002

1869 001

1877 003

Maintainanindexforthis,andsearchoverthat!

Russian_NovelsBy_Yr_Index

Whymightjustkeepingthetablesortedbyyearnotbegoodenough?

Conceptual Example

BID Title Author Published Full_text

001 WarandPeace Tolstoy 1869 …

002 CrimeandPunishment

Dostoyevsky 1866 …

003 AnnaKarenina Tolstoy 1877 …

Published BID

1866 002

1869 001

1877 003

Indexesshownhereastables,butinrealitywewillusemoreefficientdatastructures…

Russian_NovelsBy_Yr_Index

Author Title BID

Dostoyevsky Crime andPunishment

002

Tolstoy AnnaKarenina 003

Tolstoy War andPeace 001

By_Author_Title_Index Canhavemultipleindexestosupportmultiplesearchkeys

Covering Indexes

Published BID

1866 002

1869 001

1877 003

By_Yr_Index

Wesaythatanindexiscovering foraspecificquery iftheindexcontainsalltheneededattributes-meaningthequerycanbeansweredusingtheindexalone!

The“needed”attributesaretheunionofthoseintheSELECTandWHEREclauses…

SELECT Published, BIDFROM Russian_NovelsWHERE Published > 1867

Example:

Primary Indexes: Index for Sorted (Ordered) Files

CSC261,Spring2017,UR

Clustering Indexes (Index for Sorted (on non-key) Files)

CSC261,Spring2017,UR

Secondary Indexes (on a key field)

• Secondary means of accessing a data file

• File records could be ordered, unordered, or hashed

CSC261,Spring2017,UR

Secondary Indexes (on a non-key field)Extra level of indirection

• Provides logical ordering– Though records are not

physically ordered

CSC261,Spring2017,UR

High-level Categories of Index Types

• Multilevel Indexes– Very good for range queries, sorted data– Some old databases only implemented B-Trees– We will mostly look at a variant called B+ Trees

• Hash Tables – Very good for searching

Realdifferencebetweenstructures:costsofopsdetermineswhichindexyoupickandwhy

MULTILEVEL INDEXES

45

What you will learn about in this section

1. ISAM

2. B+Tree

46

1. ISAM

47

ISAM

• Indexed Sequential Access Method

– For an index with𝑏9blocks• Earlier: log,b;block

access• Now: log<=b; block

access• (𝑓𝑜 = 𝑓𝑎𝑛𝑜𝑢𝑡)

1. B+ TREES

49

What you will learn about in this section

1. B+Trees:Basics

2. B+Trees:Design&Cost

3. ClusteredIndexes

50

B+ Trees

• Search trees – B does not mean binary!

• Idea in B Trees:–make 1 node = 1 physical page– Balanced, height adjusted tree (not the B either)

• Idea in B+ Trees:–Make leaves into a linked list (for range queries)

B+ Tree Basics

10 20 30

Eachnon-leaf(“interior”)node has≥ dand≤2dkeys*

*exceptforrootnode,whichcanhavebetween1and2dkeys

Parameterd=degreeTheminimumnumberofkeyaninteriornodecanhave

B+ Tree Basics

10 20 30

k<10

10≤ 𝑘<20

20≤ 𝑘<3030≤ 𝑘

Thenkeysinanodedefinen+1ranges

B+ Tree Basics

10 20 30

Non-leaforinternalnode

22 25 28

Foreachrange,inanon-leafnode,thereisapointer toanothernodewithkeysinthatrange

B+ Tree Basics

10 20 30

Leafnodesalsohavebetweendand2dkeys,andaredifferentinthat:

22 25 28 29

Leaf nodes

32 34 37 38

Non-leaforinternalnode

12 17

B+ Tree Basics

10 20 30

22 25 28 29

Leaf nodes

32 34 37 38

Non-leaforinternalnode

12 17

Leafnodesalsohavebetweendand2dkeys,andaredifferentinthat:

Theirkeyslotscontainpointerstodatarecords

21 22 27 28 30 33 35 3715

11

B+ Tree Basics

10 20 30

22 25 28 29

Leaf nodes

32 34 37 38

Non-leaforinternalnode

12 17

21 22 27 28 30 33 35 3715

11

Leafnodesalsohavebetweendand2dkeys,andaredifferentinthat:

Theirkeyslotscontainpointerstodatarecords

Theycontainapointertothenextleafnodeaswell,forfastersequentialtraversal

B+ Tree Basics

10 20 30

22 25 28 29

Leaf nodes

32 34 37 38

Non-leaforinternalnode

12 17

Notethatthepointersattheleaflevelwillbetotheactualdatarecords(rows).

Wemighttruncatetheseforsimplerdisplay(asbefore)…

Name:JohnAge:21

Name:JakeAge:15

Name:BobAge:27

Name:SallyAge:28

Name:SueAge:33

Name:JessAge:35

Name:AlfAge:37Name:Joe

Age:11

Name:BessAge:22

Name:SalAge:30

Some finer points of B+ Trees

Searching a B+ Tree

• For exact key values:–Start at the root–Proceed down, to the leaf

• For range queries:–As above–Then sequential traversal

SELECT nameFROM peopleWHERE age = 25

SELECT nameFROM peopleWHERE 20 <= ageAND age <= 30

B+ Tree Exact Search Animation

80

20 60 100 120 140

10 15 18 20 30 40 50 60 65 80 85 90

10 12 15 20 28 30 40 60 63 80 84 89

K=30?

30<80

30in[20,60)

Tothedata!

Notallnodespictured

30in[30,40)

B+ Tree Range Search Animation

80

20 60 100 120 140

10 15 18 20 30 40 50 60 65 80 85 90

10 12 15 20 28 30 40 59 63 80 84 89

Kin[30,85]?

30<80

30in[20,60)

Tothedata!

Notallnodespictured

30in[30,40)

B+ Tree Design

• How large is d?

• Example:– Key size = 4 bytes– Pointer size = 8 bytes– Block size = 4096 bytes

• We want each node to fit on a single block/page– 2d x 4 + (2d+1) x 8 <= 4096 à d <= 170

B+ Tree: High Fanout = Smaller & Lower IO

• As compared to e.g. binary search trees, B+ Trees have high fanout(between d+1 and 2d+1)

• This means that the depth of the tree is small à getting to any element requires very few IO operations!– Also can often store most or all of the

B+ Tree in main memory!

Thefanout isdefinedasthenumberofpointerstochildnodescomingoutofanode

Notethatfanout isdynamic-we’lloftenassumeit’sconstantjusttocomeupwithapproximateeqns!

Simple Cost Model for Search• Let:

– f = fanout, which is in [d+1, 2d+1] (we’ll assume it’s constant for our cost model…)– N = the total number of pages we need to index– F = fill-factor (usually ~= 2/3)

• Our B+ Tree needs to have room to index N / F pages!– We have the fill factor in order to leave some open slots for faster insertions

• What height (h) does our B+ Tree need to be?– h=1 à Just the root node- room to index f pages– h=2 à f leaf nodes- room to index f2 pages– h=3 à f2 leaf nodes- room to index f3 pages– …– h à fh-1 leaf nodes- room to index fh pages!

àWeneedaB+Treeofheighth= logH

IJ

Fast Insertions & Self-Balancing

– Same cost as exact search– Self-balancing: B+ Tree remains balanced (with respect to

height) even after insert

B+Treesalso(relatively)fastforsingleinsertions!However,canbecomebottleneckifmanyinsertions(iffill-

factorslackisusedup…)

Insertion

• Perform a search to determine what bucket the new record should go into.

• If the bucket is not full (at most (b-1) entries after the insertion), add the record.

• Otherwise, split the bucket.– Allocate new leaf and move half the bucket's elements to the new bucket.– Insert the new leaf's smallest key and address into the parent.– If the parent is full, split it too.

• Add the middle key to the parent node.– Repeat until a parent is found that need not split.

• If the root splits, create a new root which has one key and two pointers. (That is, the value that gets pushed to the new root gets removed from the original node)

• Note: B-trees grow at the root and not at the leave

CSC261,Spring2017,UR

Insertion (Insert 85)

CSC261,Spring2017,UR

b=f=branchingfactor/fan-out=3

20 40 80 90

40 80

Insertion (Insert 85)

CSC261,Spring2017,UR

b=f=branchingfactor/fan-out=3

20 40 80 90

40 80

85

Thisiswhatwewouldlike.Butthemaximumnumberofkeysinanynodeis(3-1)=2So,split.

Insertion (Insert 85)

CSC261,Spring2017,UR

b=f=branchingfactor/fan-out=3

20 40 80

40 80

85

Allocatenewleafandmovehalfthebucket'selementstothenewbucket.

90

Insertion (Insert 85)

CSC261,Spring2017,UR

b=f=branchingfactor/fan-out=3

20 40 80

40 80

85

Insertthenewleaf'ssmallestkeyandaddressintotheparent.

90

85

Insertion (Insert 85)

CSC261,Spring2017,UR

b=f=branchingfactor/fan-out=3

20 40 80

40 80

85

Thisisnotallowed,astheparentisfull.Needtosplit

90

85

Insertion (Insert 85)

CSC261,Spring2017,UR

b=f=branchingfactor/fan-out=3

20 40 80

40

80

85

Iftheparentisfull,splitittoo.

Addthemiddlekeytotheparentnode.

Repeatuntilaparentisfoundthatneednotsplit

90

85

Insertion (Insert 85)

CSC261,Spring2017,UR

b=f=branchingfactor/fan-out=3

20 40 80

40

80

85

Iftherootsplits,createanewrootwhichhasonekeyandtwopointers.

(Thatis,thevaluethatgetspushedtothenewrootgetsremovedfromtheoriginalnode)

90

85

Deletion

• Start at root, find leaf L where entry belongs.• Remove the entry.– If L is at least half-full, done!– If L has fewer entries than it should,

• If sibling (adjacent node with same parent as L) is more than half-full, re-distribute, borrowing an entry from it.

• Otherwise, sibling is exactly half-full, so we can merge L and sibling.• If merge occurred, must delete entry (pointing to L or sibling)

from parent of L.• Merge could propagate to root, decreasing height.

• https://www.cs.usfca.edu/~galles/visualization/BPlusTree.html

• The degree in this visualization is actually fan-out f or branching factor b.

CSC261,Spring2017,UR

• Find 86

CSC261,Spring2017,UR

• Delete it

CSC261,Spring2017,UR

• Stealing from right sibling (redistribute).• Modify the parent node

CSC261,Spring2017,UR

• Finally,

CSC261,Spring2017,UR

Acknowledgement

• Some of the slides in this presentation are taken from the slides provided by the authors.

• Many of these slides are taken from cs145 course offered byStanford University.

CSC261,Spring2017,UR

top related