CSC 261/461 – Database Systems, Lecture 16
Spring 2017, MW 3:25 pm – 4:40 pm
January 18 – May 3, Dewey 1101
Announcement
• Project 1 Milepost 3 is out
CSC261,Spring2017,UR
File Types
• Unordered Records (Heap Files)
• Ordered Records (Sorted Files)
• Hash Files
Average Access Times for a File of b Blocks under Basic File Organizations
1. EXTERNAL MERGE SORT
What you will learn about in this section
1. External merge sort
2. External merge sort on larger files
3. Optimizations for sorting
Recap: External Merge Algorithm
• Suppose we want to merge two sorted files both much larger than main memory (i.e. the buffer)
• We can use the external merge algorithm to merge files of arbitrary length in 2*(N+M) IO operations with only 3 buffer pages!
External Merge Sort
Why are Sort Algorithms Important?
• Data requested from DB in sorted order is extremely common
– e.g., find students in increasing GPA order
• Why not just use quicksort in main memory?
– What if we need to sort 1 TB of data with 1 GB of RAM…
A classic problem in computer science!
So how do we sort big files?
1. Split into chunks small enough to sort in memory (“runs”)
2. Merge pairs (or groups) of runs using the external merge algorithm
3. Keep merging the resulting runs (each time = a “pass”) until left with one sorted file!
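The three steps above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: a "page" is modelled as a small Python list, the "disk" is a list of pages, and the names `split_into_runs`, `merge_two_runs` and `external_merge_sort` are hypothetical, not from the course material. The merge keeps one page per input run and one output page in memory, mirroring the 3-buffer-page setting.

```python
from typing import List

Page = List[int]   # one disk page holding a fixed number of values
Run = List[Page]   # a sorted file, stored as a list of pages


def split_into_runs(pages: List[Page], buffer_pages: int, page_size: int) -> List[Run]:
    """Step 1: read buffer_pages pages at a time, sort them in memory, write a run."""
    runs: List[Run] = []
    for start in range(0, len(pages), buffer_pages):
        values = sorted(v for page in pages[start:start + buffer_pages] for v in page)
        runs.append([values[i:i + page_size] for i in range(0, len(values), page_size)])
    return runs


def merge_two_runs(left: Run, right: Run, page_size: int) -> Run:
    """Step 2: external merge using two input pages and one output page."""
    out: Run = []
    out_page: Page = []
    left_buf: Page = []
    right_buf: Page = []
    li = ri = 0  # index of the next page to read from each run
    while True:
        if not left_buf and li < len(left):
            left_buf, li = list(left[li]), li + 1      # "read" one page
        if not right_buf and ri < len(right):
            right_buf, ri = list(right[ri]), ri + 1    # "read" one page
        if not left_buf and not right_buf:
            break
        if not right_buf or (left_buf and left_buf[0] <= right_buf[0]):
            out_page.append(left_buf.pop(0))
        else:
            out_page.append(right_buf.pop(0))
        if len(out_page) == page_size:                 # output page full: "write" it
            out.append(out_page)
            out_page = []
    if out_page:
        out.append(out_page)
    return out


def external_merge_sort(pages: List[Page], buffer_pages: int = 3) -> Run:
    """Steps 1-3: build runs, then merge pairs of runs until one run is left."""
    page_size = len(pages[0])  # assumes a non-empty file with equal-sized pages
    runs = split_into_runs(pages, buffer_pages, page_size)
    while len(runs) > 1:       # each iteration of this loop is one pass
        merged = []
        for i in range(0, len(runs), 2):
            if i + 1 < len(runs):
                merged.append(merge_two_runs(runs[i], runs[i + 1], page_size))
            else:
                merged.append(runs[i])  # odd run out is carried to the next pass
        runs = merged
    return runs[0]
```

Calling `external_merge_sort` on the six two-value pages used in the example slides that follow yields a single sorted six-page run.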
External Merge Sort Algorithm
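Example:
• 3 buffer pages
• 6-page file
1. Split into chunks small enough to sort in memory
[Figure: the disk holds two unsorted 3-page files, F1 and F2, with pages (27,24 | 3,1 | 18,22) and (33,12 | 55,31 | 44,10); main memory holds the 3-page buffer. Orange file = unsorted.]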
External Merge Sort Algorithm
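Example:
• 3 buffer pages
• 6-page file
1. Split into chunks small enough to sort in memory
[Figure: one file has been read into the buffer, sorted, and written back to disk as 10,12 | 31,33 | 44,55; the other file (27,24 | 3,1 | 18,22) is still unsorted. Orange file = unsorted.]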
External Merge Sort Algorithm
Example:
• 3 buffer pages
• 6-page file
1. Split into chunks small enough to sort in memory
And similarly for F2
Each sorted file is called a run
[Figure: both files on disk are now sorted runs: 10,12 | 31,33 | 44,55 and 1,3 | 18,22 | 24,27.]
External Merge Sort Algorithm
Example:
• 3 buffer pages
• 6-page file
2. Now just run the external merge algorithm & we’re done!
[Figure: the two sorted runs on disk, 10,12 | 31,33 | 44,55 and 1,3 | 18,22 | 24,27, are merged through the buffer.]
Calculating IO Cost
For 3 buffer pages, 6 page file:
1. Split into two 3-page files and sort in memory
– = 1 R + 1 W per page of each file = 2*(3 + 3) = 12 IO operations
2. Merge each pair of sorted chunks using the external merge algorithm
– = 2*(3 + 3) = 12 IO operations
3. Total cost = 24 IO
Running External Merge Sort on Larger Files
Assume we still only have 3 buffer pages (buffer not pictured)
1. Split into files small enough to sort in buffer…
[Figure: the disk holds a 24-page unsorted file, split into eight 3-page files.]
Running External Merge Sort on Larger Files
Assume we still only have 3 buffer pages (buffer not pictured)
1. Split into files small enough to sort in buffer… and sort
Call each of these sorted files a run
Disk (eight sorted 3-page runs):
10,12 | 31,33 | 44,55
18,24 | 27,38 | 43,45
10,12 | 31,33 | 47,55
3,18 | 20,22 | 23,41
31,33 | 39,42 | 46,55
1,3 | 18,23 | 24,27
10,12 | 33,40 | 44,48
16,18 | 22,24 | 27,31
Running External Merge Sort on Larger Files
Assume we still only have 3 buffer pages (buffer not pictured)
2. Now merge pairs of (sorted) files… the resulting files will be sorted!
Disk (before, eight sorted 3-page runs):
10,12 | 31,33 | 44,55
18,24 | 27,38 | 43,45
10,12 | 31,33 | 47,55
3,18 | 20,22 | 23,41
31,33 | 39,42 | 46,55
1,3 | 18,23 | 24,27
10,12 | 33,40 | 44,48
16,18 | 22,24 | 27,31
Disk (after, four sorted 6-page runs):
10,12 | 18,24 | 27,31 | 33,38 | 43,44 | 45,55
3,10 | 12,18 | 20,22 | 23,31 | 33,41 | 47,55
1,3 | 18,23 | 24,27 | 31,33 | 39,42 | 46,55
10,12 | 16,18 | 22,24 | 27,31 | 33,40 | 44,48
Running External Merge Sort on Larger Files
Assume we still only have 3 buffer pages (buffer not pictured)
3. And repeat…
Call each of these steps a pass
[Figure: the eight 3-page runs and the four 6-page runs from the earlier steps, followed by the result of this pass.]
Disk (two sorted 12-page runs):
3,10 | 10,12 | 12,18 | 18,20 | 22,23 | 24,27 | 31,31 | 33,33 | 38,41 | 43,44 | 45,47 | 55,55
1,3 | 10,12 | 16,18 | 18,22 | 23,24 | 24,27 | 27,31 | 31,33 | 33,39 | 40,42 | 44,46 | 48,55
Running External Merge Sort on Larger Files
4. And repeat!
[Figure: the runs from each earlier pass (eight 3-page runs, four 6-page runs, two 12-page runs), followed by the final result.]
Disk (one sorted 24-page file):
1,3 | 3,10 | 10,10
12,12 | 12,16 | 18,18
18,18 | 20,22 | 22,23
23,24 | 24,24 | 27,27
27,31 | 31,31 | 31,33
33,33 | 33,38 | 39,40
41,42 | 43,44 | 44,45
46,47 | 48,55 | 55,55
Simplified 3-page Buffer Version
Assume for simplicity that we split an N-page file into N single-page runs and sort these; then:
• First pass: Merge N/2 pairs of runs each of length 1 page
• Second pass: Merge N/4 pairs of runs each of length 2 pages
• In general, for N pages, we do $\lceil \log_2 N \rceil$ passes
– +1 for the initial split & sort
• Each pass involves reading in & writing out all the pages = 2N IO
[Figure: unsorted input file → split & sort → merge → merge → sorted!]
→ $2N(\lceil \log_2 N \rceil + 1)$ total IO cost!
Using B+1 buffer pages to reduce # of passes
Suppose we have B+1 buffer pages now; we can:
1. Increase length of initial runs. Sort B+1 at a time! At the beginning, we can split the N pages into runs of length B+1 and sort these in memory
IO cost:
• Starting with runs of length 1: $2N(\lceil \log_2 N \rceil + 1)$
• Starting with runs of length B+1: $2N(\lceil \log_2 \frac{N}{B+1} \rceil + 1)$
Using B+1 buffer pages to reduce # of passes
Suppose we have B+1 buffer pages now; we can:
2. Perform a B-way merge. On each pass, we can merge groups of B runs at a time (vs. merging pairs of runs)!
IO cost:
• Starting with runs of length 1: $2N(\lceil \log_2 N \rceil + 1)$
• Starting with runs of length B+1: $2N(\lceil \log_2 \frac{N}{B+1} \rceil + 1)$
• Performing B-way merges: $2N(\lceil \log_B \frac{N}{B+1} \rceil + 1)$
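To make the three IO-cost variants concrete, here is a small sketch that counts passes with integer arithmetic instead of evaluating the logarithm directly. The function names `merge_passes` and `sort_io_cost` are made up for illustration; the model assumes every pass reads and writes all N pages exactly once, as in the slides.

```python
def merge_passes(num_runs: int, fan_in: int) -> int:
    """Number of merge passes needed to reduce num_runs runs to a single run."""
    passes = 0
    while num_runs > 1:
        num_runs = -(-num_runs // fan_in)  # ceiling division: runs left after one pass
        passes += 1
    return passes


def sort_io_cost(n_pages: int, initial_run_len: int = 1, fan_in: int = 2) -> int:
    """2N * (merge passes + 1), the +1 being the initial split & sort pass."""
    initial_runs = -(-n_pages // initial_run_len)
    return 2 * n_pages * (merge_passes(initial_runs, fan_in) + 1)


# 6-page file, 3 buffer pages, pairwise merges (the earlier worked example)
print(sort_io_cost(6, initial_run_len=3, fan_in=2))      # 24

# N = 1000 pages with B + 1 = 5 buffer pages
B = 4
print(sort_io_cost(1000))                                # runs of length 1, 2-way merge
print(sort_io_cost(1000, initial_run_len=B + 1))         # runs of length B+1, 2-way merge
print(sort_io_cost(1000, initial_run_len=B + 1, fan_in=B))  # B-way merge
```

The loop in `merge_passes` gives the same value as the ceiling of the logarithm in the formulas, without floating-point rounding concerns.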
INDEXING
What you will learn about in this section
1. Indexes: Motivation
2. Indexes: Basics
Index Motivation
• Suppose we want to search for people of a specific age
• First idea: Sort the records by age… we know how to do this fast!
• How many IO operations to search over N sorted records?
– Simple scan: O(N)
– Binary search: O($\log_2 N$)
Person(name, age)
Could we get even cheaper search? E.g. go from $\log_2 N$ → $\log_{200} N$?
Index Motivation
• What about if we want to insert a new person, but keep the list sorted?
• We would have to potentially shift N records, requiring up to ~ 2*N/P IO operations (where P = # of records per page)!
– We could leave some “slack” in the pages…
[Figure: a row of sorted pages, with a new record being inserted and the records after it shifted across pages.]
Could we get faster insertions?
Index Motivation
• What about if we want to be able to search quickly along multiple attributes (e.g. not just age)?
– We could keep multiple copies of the records, each sorted by one attribute set… this would take a lot of space
Can we get fast search over multiple attribute (sets) without taking too much space?
We’ll create separate data structures called indexes to address all these points
Further Motivation for Indexes: NoSQL!
• NoSQL engines are (basically) just indexes!
– A lot more is left to the user in NoSQL… one of the primary remaining functions of the DBMS is still to provide an index over the data records, for the reasons we just saw!
– Sometimes use B+ Trees, sometimes hash indexes
Indexes are critical across all DBMS types
Indexes: High-level
• An index on a file speeds up selections on the search key fields for the index.
– Search key properties
  • Any subset of fields
  • Is not the same as the key of a relation
• Example: On which attributes would you build indexes?
Product(name, maker, price)
More precisely
• An index is a data structure mapping search keys to sets of rows in a database table
– Provides efficient lookup & retrieval by search key value, usually much faster than searching through all the rows of the database table
• An index can store:
– Full rows it points to (primary index) or
– Pointers to those rows (secondary index)
Operations on an Index
• Search: Quickly find all records which meet some condition on the search key attributes
– More sophisticated variants as well. Why?
Indexing is one of the most important features provided by a database for performance
Conceptual Example
What if we want to return all books published after 1867? The table below might be very expensive to search over row-by-row…
SELECT * FROM Russian_Novels WHERE Published > 1867
Russian_Novels
BID  Title                 Author       Published  Full_text
001  War and Peace         Tolstoy      1869       …
002  Crime and Punishment  Dostoyevsky  1866       …
003  Anna Karenina         Tolstoy      1877       …
Conceptual Example
Russian_Novels
BID  Title                 Author       Published  Full_text
001  War and Peace         Tolstoy      1869       …
002  Crime and Punishment  Dostoyevsky  1866       …
003  Anna Karenina         Tolstoy      1877       …
By_Yr_Index
Published  BID
1866       002
1869       001
1877       003
Maintain an index for this, and search over that!
Why might just keeping the table sorted by year not be good enough?
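As a rough illustration of "maintain an index and search over that", the sketch below keeps the table as a Python dict keyed by BID and the index as a sorted list of (Published, BID) pairs, searched with the standard-library `bisect` module. The variable and function names are illustrative assumptions; a real DBMS would use a disk-based structure rather than an in-memory list.

```python
from bisect import bisect_right

# The table, keyed by BID (Full_text omitted for brevity).
russian_novels = {
    "001": {"Title": "War and Peace", "Author": "Tolstoy", "Published": 1869},
    "002": {"Title": "Crime and Punishment", "Author": "Dostoyevsky", "Published": 1866},
    "003": {"Title": "Anna Karenina", "Author": "Tolstoy", "Published": 1877},
}

# By_Yr_Index: (Published, BID) pairs kept in sorted order.
by_yr_index = sorted((row["Published"], bid) for bid, row in russian_novels.items())
index_years = [year for year, _ in by_yr_index]


def published_after(year: int) -> list:
    """Answer WHERE Published > year by binary search on the index, then fetch rows."""
    start = bisect_right(index_years, year)  # first index entry with Published > year
    return [russian_novels[bid]["Title"] for _, bid in by_yr_index[start:]]


print(published_after(1867))  # ['War and Peace', 'Anna Karenina']
```

Only the small index is searched; the wide rows (including the full text) are touched only for the matching BIDs.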
Conceptual Example
Russian_Novels
BID  Title                 Author       Published  Full_text
001  War and Peace         Tolstoy      1869       …
002  Crime and Punishment  Dostoyevsky  1866       …
003  Anna Karenina         Tolstoy      1877       …
By_Yr_Index
Published  BID
1866       002
1869       001
1877       003
By_Author_Title_Index
Author       Title                 BID
Dostoyevsky  Crime and Punishment  002
Tolstoy      Anna Karenina         003
Tolstoy      War and Peace         001
Indexes shown here as tables, but in reality we will use more efficient data structures…
Can have multiple indexes to support multiple search keys
Covering Indexes
By_Yr_Index
Published  BID
1866       002
1869       001
1877       003
We say that an index is covering for a specific query if the index contains all the needed attributes, meaning the query can be answered using the index alone!
The “needed” attributes are the union of those in the SELECT and WHERE clauses…
Example:
SELECT Published, BID FROM Russian_Novels WHERE Published > 1867
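The covering condition is just a subset test, which the following sketch spells out. The helper `is_covering` is a hypothetical name used for illustration, and attribute lists are passed in by hand rather than parsed from SQL.

```python
def is_covering(index_attrs, select_attrs, where_attrs) -> bool:
    """True if every attribute the query needs is stored in the index."""
    needed = set(select_attrs) | set(where_attrs)  # union of SELECT and WHERE attributes
    return needed <= set(index_attrs)


# By_Yr_Index stores (Published, BID)
print(is_covering(["Published", "BID"], ["Published", "BID"], ["Published"]))  # True
print(is_covering(["Published", "BID"], ["Title"], ["Published"]))             # False
```

In the second call the query asks for Title, which the index does not store, so the table rows would still have to be fetched.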
Primary Indexes: Index for Sorted (Ordered) Files
Clustering Indexes (Index for Sorted (on non-key) Files)
Secondary Indexes (on a key field)
• Secondary means of accessing a data file
• File records could be ordered, unordered, or hashed
Secondary Indexes (on a non-key field)
Extra level of indirection
• Provides logical ordering
– Though records are not physically ordered
High-level Categories of Index Types
• Multilevel Indexes
– Very good for range queries, sorted data
– Some old databases only implemented B-Trees
– We will mostly look at a variant called B+ Trees
• Hash Tables
– Very good for searching
Real difference between structures: costs of ops determines which index you pick and why
MULTILEVEL INDEXES
What you will learn about in this section
1. ISAM
2. B+ Tree
1. ISAM
ISAM
• Indexed Sequential Access Method
– For an index with $b_i$ blocks
  • Earlier: $\log_2 b_i$ block accesses
  • Now: $\log_{fo} b_i$ block accesses
  • ($fo$ = fanout)
1. B+ TREES
What you will learn about in this section
1. B+ Trees: Basics
2. B+ Trees: Design & Cost
3. Clustered Indexes
B+ Trees
• Search trees – B does not mean binary!
• Idea in B Trees:
– Make 1 node = 1 physical page
– Balanced, height adjusted tree (not the B either)
• Idea in B+ Trees:
– Make leaves into a linked list (for range queries)
B+ Tree Basics
[Figure: an interior node with keys 10 | 20 | 30]
Parameter d = degree
The minimum number of keys an interior node can have
Each non-leaf (“interior”) node has ≥ d and ≤ 2d keys*
*except for root node, which can have between 1 and 2d keys
B+ Tree Basics
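[Figure: an interior node with keys 10 | 20 | 30; its four pointers cover the ranges k < 10, 10 ≤ k < 20, 20 ≤ k < 30, and 30 ≤ k]
The n keys in a node define n+1 ranges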
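One way to see how n keys define n+1 ranges is to compute which range a search key falls into. The sketch below uses `bisect_right` from the Python standard library; the function name `child_slot` is an illustrative assumption.

```python
from bisect import bisect_right


def child_slot(keys, k) -> int:
    """Index (0..len(keys)) of the range, and hence the child pointer, that k falls into."""
    return bisect_right(keys, k)


keys = [10, 20, 30]
print(child_slot(keys, 5))    # 0  -> k < 10
print(child_slot(keys, 10))   # 1  -> 10 <= k < 20
print(child_slot(keys, 25))   # 2  -> 20 <= k < 30
print(child_slot(keys, 30))   # 3  -> 30 <= k
```

Using `bisect_right` (rather than `bisect_left`) sends a key equal to a separator into the range on its right, which matches the half-open ranges shown on the slide.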
B+ Tree Basics
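[Figure: a non-leaf (internal) node with keys 10 | 20 | 30; the pointer for the range 20 ≤ k < 30 leads to a node with keys 22 | 25 | 28]
For each range in a non-leaf node, there is a pointer to another node with keys in that range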
B+ Tree Basics
[Figure: a non-leaf (internal) node with keys 10 | 20 | 30, above leaf nodes 12 | 17, 22 | 25 | 28 | 29, and 32 | 34 | 37 | 38]
Leaf nodes also have between d and 2d keys, and are different in that:
B+ Tree Basics
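[Figure: a non-leaf (internal) node with keys 10 | 20 | 30, above leaf nodes 12 | 17, 22 | 25 | 28 | 29, and 32 | 34 | 37 | 38; below the leaves, data records with keys 11, 15, 21, 22, 27, 28, 30, 33, 35, 37]
Leaf nodes also have between d and 2d keys, and are different in that:
Their key slots contain pointers to data records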
B+ Tree Basics
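[Figure: a non-leaf (internal) node with keys 10 | 20 | 30, above leaf nodes 12 | 17, 22 | 25 | 28 | 29, and 32 | 34 | 37 | 38; below the leaves, data records with keys 11, 15, 21, 22, 27, 28, 30, 33, 35, 37]
Leaf nodes also have between d and 2d keys, and are different in that:
Their key slots contain pointers to data records
They contain a pointer to the next leaf node as well, for faster sequential traversal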
B+ Tree Basics
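[Figure: a non-leaf (internal) node with keys 10 | 20 | 30, above leaf nodes 12 | 17, 22 | 25 | 28 | 29, and 32 | 34 | 37 | 38; the leaf pointers lead to the data records listed below]
Note that the pointers at the leaf level will be to the actual data records (rows).
We might truncate these for simpler display (as before)…
Name: Joe, Age: 11
Name: Jake, Age: 15
Name: John, Age: 21
Name: Bess, Age: 22
Name: Bob, Age: 27
Name: Sally, Age: 28
Name: Sal, Age: 30
Name: Sue, Age: 33
Name: Jess, Age: 35
Name: Alf, Age: 37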
Some finer points of B+ Trees
Searching a B+ Tree
• For exact key values:
– Start at the root
– Proceed down, to the leaf
• For range queries:
– As above
– Then sequential traversal
SELECT name FROM people WHERE age = 25
SELECT name FROM people WHERE 20 <= age AND age <= 30
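Both kinds of search can be sketched on a tiny hand-built tree. Everything here is an illustrative assumption rather than the course's implementation: the class names `Leaf` and `Interior`, the two-level tree shape, and the simplification that keys are unique and each leaf stores the record (a name) directly instead of a pointer to it. The names and ages are those of the data records shown on the earlier slide.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Leaf:
    keys: List[int]
    records: List[str]             # stand-in for pointers to data records
    next: Optional["Leaf"] = None  # linked list of leaves for range scans


@dataclass
class Interior:
    keys: List[int]
    children: list                 # len(children) == len(keys) + 1


def find_leaf(node, k: int) -> Leaf:
    """Start at the root and proceed down to the leaf whose range contains k."""
    while isinstance(node, Interior):
        node = node.children[bisect_right(node.keys, k)]
    return node


def search(root, k: int) -> Optional[str]:
    """Exact-match search: WHERE age = k."""
    leaf = find_leaf(root, k)
    i = bisect_left(leaf.keys, k)
    return leaf.records[i] if i < len(leaf.keys) and leaf.keys[i] == k else None


def range_search(root, lo: int, hi: int) -> List[str]:
    """Range search: WHERE lo <= age AND age <= hi, via sequential leaf traversal."""
    leaf, out = find_leaf(root, lo), []
    while leaf is not None:
        for key, rec in zip(leaf.keys, leaf.records):
            if key > hi:
                return out
            if key >= lo:
                out.append(rec)
        leaf = leaf.next
    return out


# Hand-built example tree over age.
l3 = Leaf([30, 33, 35, 37], ["Sal", "Sue", "Jess", "Alf"])
l2 = Leaf([21, 22, 27, 28], ["John", "Bess", "Bob", "Sally"], next=l3)
l1 = Leaf([11, 15], ["Joe", "Jake"], next=l2)
root = Interior([20, 30], [l1, l2, l3])

print(search(root, 25))            # None (nobody is 25)
print(range_search(root, 20, 30))  # ['John', 'Bess', 'Bob', 'Sally', 'Sal']
```

The range search descends the tree only once, for the lower bound, and then follows the `next` pointers, which is exactly why the leaves are linked.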
B+ Tree Exact Search Animation
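K = 30?
• 30 < 80
• 30 in [20, 60)
• 30 in [30, 40)
• To the data!
[Figure: B+ tree with root 80, interior nodes 20 | 60 and 100 | 120 | 140, leaf keys 10 15 18 20 30 40 50 60 65 80 85 90, and data records 10 12 15 20 28 30 40 60 63 80 84 89. Not all nodes pictured.]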
B+ Tree Range Search Animation
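K in [30, 85]?
• 30 < 80
• 30 in [20, 60)
• 30 in [30, 40)
• To the data!
[Figure: B+ tree with root 80, interior nodes 20 | 60 and 100 | 120 | 140, leaf keys 10 15 18 20 30 40 50 60 65 80 85 90, and data records 10 12 15 20 28 30 40 59 63 80 84 89. Not all nodes pictured.]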
B+ Tree Design
• How large is d?
• Example:
– Key size = 4 bytes
– Pointer size = 8 bytes
– Block size = 4096 bytes
• We want each node to fit on a single block/page
– 2d x 4 + (2d+1) x 8 <= 4096 → d <= 170
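The inequality above can be solved for d mechanically. The helper below (`max_degree` is a made-up name) rearranges 2d·key_size + (2d+1)·pointer_size <= block_size into d <= (block_size − pointer_size) / (2·(key_size + pointer_size)) and rounds down.

```python
def max_degree(key_size: int, pointer_size: int, block_size: int) -> int:
    """Largest d such that 2d keys and 2d+1 pointers fit in one block."""
    return (block_size - pointer_size) // (2 * (key_size + pointer_size))


d = max_degree(key_size=4, pointer_size=8, block_size=4096)
print(d)                               # 170
print(2 * d * 4 + (2 * d + 1) * 8)     # 4088 bytes used, which fits in 4096
```

With d = 171 the node would need 4112 bytes, so 170 is indeed the largest degree that fits.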
B+ Tree: High Fanout = Smaller & Lower IO
• As compared to e.g. binary search trees, B+ Trees have high fanout (between d+1 and 2d+1)
• This means that the depth of the tree is small → getting to any element requires very few IO operations!
– Also can often store most or all of the B+ Tree in main memory!
The fanout is defined as the number of pointers to child nodes coming out of a node
Note that fanout is dynamic; we’ll often assume it’s constant just to come up with approximate eqns!
Simple Cost Model for Search
• Let:
– f = fanout, which is in [d+1, 2d+1] (we’ll assume it’s constant for our cost model…)
– N = the total number of pages we need to index
– F = fill-factor (usually ~= 2/3)
• Our B+ Tree needs to have room to index N / F pages!
– We have the fill factor in order to leave some open slots for faster insertions
• What height (h) does our B+ Tree need to be?
– h=1 → Just the root node: room to index f pages
– h=2 → f leaf nodes: room to index $f^2$ pages
– h=3 → $f^2$ leaf nodes: room to index $f^3$ pages
– …
– h → $f^{h-1}$ leaf nodes: room to index $f^h$ pages!
→ We need a B+ Tree of height h = $\lceil \log_f \frac{N}{F} \rceil$
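A quick way to get a feel for the height formula is to evaluate it. The sketch below follows the slide's cost model (a tree of height h has room to index f^h pages, and we need room for N / F pages); the function name `btree_height` is hypothetical, and `fractions.Fraction` is used for the fill factor only to avoid floating-point rounding in the division.

```python
import math
from fractions import Fraction


def btree_height(n_pages: int, fanout: int, fill_factor: Fraction = Fraction(2, 3)) -> int:
    """Smallest h with fanout**h >= n_pages / fill_factor (constant-fanout model)."""
    needed = math.ceil(Fraction(n_pages) / fill_factor)  # room we must provide: N / F
    height, capacity = 1, fanout                         # h = 1: just the root
    while capacity < needed:
        capacity *= fanout                               # one more level multiplies capacity by f
        height += 1
    return height


# With d = 170, the maximum fanout is 2d + 1 = 341.
print(btree_height(100, 341))        # 1
print(btree_height(70_000, 341))     # 2
print(btree_height(1_000_000, 341))  # 3
```

Even a million pages need only three levels at this fanout, which is the point of the high-fanout design.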
Fast Insertions & Self-Balancing
– Same cost as exact search
– Self-balancing: B+ Tree remains balanced (with respect to height) even after insert
B+ Trees also (relatively) fast for single insertions! However, can become bottleneck if many insertions (if fill-factor slack is used up…)
Insertion
• Perform a search to determine what bucket the new record should go into.
• If the bucket is not full (at most (b-1) entries after the insertion), add the record.
• Otherwise, split the bucket.
– Allocate new leaf and move half the bucket's elements to the new bucket.
– Insert the new leaf's smallest key and address into the parent.
– If the parent is full, split it too.
  • Add the middle key to the parent node.
– Repeat until a parent is found that need not split.
• If the root splits, create a new root which has one key and two pointers. (That is, the value that gets pushed to the new root gets removed from the original node)
• Note: B-trees grow at the root and not at the leaves
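The two split rules can be sketched on plain key lists. This is a simplified illustration, not a full B+ tree: nodes are Python lists of keys, child and record pointers are omitted, and the function names are hypothetical. The choice of which half receives the extra key is an assumption, picked so that the result matches the "Insert 85" example with b = 3.

```python
from typing import List, Optional, Tuple


def insert_into_leaf(leaf: List[int], key: int, b: int
                     ) -> Tuple[List[int], Optional[List[int]], Optional[int]]:
    """Insert key into a leaf holding at most b-1 keys.

    Returns (leaf, None, None) if it fits, otherwise (left, right, separator),
    where separator is the new leaf's smallest key, copied up into the parent.
    """
    keys = sorted(leaf + [key])
    if len(keys) <= b - 1:
        return keys, None, None
    mid = len(keys) // 2
    left, right = keys[:mid], keys[mid:]
    return left, right, right[0]


def split_interior(keys: List[int]) -> Tuple[List[int], int, List[int]]:
    """Split an overfull interior node; the middle key moves up and is removed here."""
    mid = len(keys) // 2
    return keys[:mid], keys[mid], keys[mid + 1:]


# Insert 85 into the leaf [80, 90] with b = 3 (at most 2 keys per node)
print(insert_into_leaf([80, 90], 85, b=3))  # ([80], [85, 90], 85)

# The parent [40, 80] receives 85 and becomes overfull: [40, 80, 85]
print(split_interior([40, 80, 85]))         # ([40], 80, [85])
```

Note the asymmetry: a leaf split copies the separator up (85 stays in the new leaf), while an interior split moves the middle key up (80 no longer appears in either half).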
Insertion (Insert 85)
b = f = branching factor / fan-out = 3
[Figure: B+ tree with root keys 40 | 80 and leaves 20, 40, and 80 | 90]
Insertion (Insert 85)
b = f = branching factor / fan-out = 3
[Figure: root keys 40 | 80; 85 placed into the leaf 80 | 90, giving 80 | 85 | 90]
This is what we would like. But the maximum number of keys in any node is (3-1) = 2. So, split.
Insertion (Insert 85)
b = f = branching factor / fan-out = 3
Allocate new leaf and move half the bucket's elements to the new bucket.
[Figure: root keys 40 | 80; the overfull leaf is split into two leaves, 80 and 85 | 90]
Insertion (Insert 85)
b = f = branching factor / fan-out = 3
Insert the new leaf's smallest key and address into the parent.
[Figure: 85 is added to the root, giving 40 | 80 | 85; leaves 20, 40, 80, and 85 | 90]
Insertion (Insert 85)
b = f = branching factor / fan-out = 3
This is not allowed, as the parent is full. Need to split.
[Figure: root with three keys 40 | 80 | 85; leaves 20, 40, 80, and 85 | 90]
Insertion (Insert 85)
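b = f = branching factor / fan-out = 3
If the parent is full, split it too.
Add the middle key to the parent node.
Repeat until a parent is found that need not split
[Figure: the node 40 | 80 | 85 is split into 40 and 85, with the middle key 80 moving up; leaves 20, 40, 80, and 85 | 90]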
Insertion (Insert 85)
b = f = branching factor / fan-out = 3
If the root splits, create a new root which has one key and two pointers.
(That is, the value that gets pushed to the new root gets removed from the original node)
[Figure: new root 80 with children 40 and 85; leaves 20, 40, 80, and 85 | 90]
Deletion
• Start at root, find leaf L where entry belongs.
• Remove the entry.
– If L is at least half-full, done!
– If L has fewer entries than it should,
  • If sibling (adjacent node with same parent as L) is more than half-full, re-distribute, borrowing an entry from it.
  • Otherwise, sibling is exactly half-full, so we can merge L and sibling.
• If merge occurred, must delete entry (pointing to L or sibling) from parent of L.
• Merge could propagate to root, decreasing height.
• https://www.cs.usfca.edu/~galles/visualization/BPlusTree.html
• The degree in this visualization is actually fan-out f or branching factor b.
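The leaf-level case analysis for deletion (done, redistribute, or merge) can be sketched as follows. This is a deliberately narrow illustration: it only looks at a leaf and its right sibling, represents both as plain key lists, and leaves the parent update to the caller. The function name, the `min_keys` parameter, and the sample key values are assumptions for the example.

```python
from typing import List, Tuple


def delete_from_leaf(leaf: List[int], right_sibling: List[int], key: int, min_keys: int
                     ) -> Tuple[str, List[int], List[int]]:
    """Remove key from leaf, then redistribute from or merge with the right sibling."""
    leaf = [k for k in leaf if k != key]
    if len(leaf) >= min_keys:
        return "done", leaf, right_sibling                   # still at least half-full
    if len(right_sibling) > min_keys:
        # Borrow the sibling's smallest entry; the parent's separator must then
        # be updated to the sibling's new smallest key.
        return "redistributed", leaf + right_sibling[:1], right_sibling[1:]
    # Sibling is exactly half-full: merge, and the parent loses one entry.
    return "merged", leaf + right_sibling, []


print(delete_from_leaf([85, 86, 87], [90, 95], 86, min_keys=2))  # ('done', [85, 87], [90, 95])
print(delete_from_leaf([85, 86], [90, 95, 99], 86, min_keys=2))  # ('redistributed', [85, 90], [95, 99])
print(delete_from_leaf([85, 86], [90, 95], 86, min_keys=2))      # ('merged', [85, 90, 95], [])
```

A full implementation would also consider the left sibling and would repeat the same check on the parent after a merge, since the merge can propagate up to the root.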
• Find 86
• Delete it
• Stealing from right sibling (redistribute).
• Modify the parent node
• Finally,
Acknowledgement
• Some of the slides in this presentation are taken from the slides provided by the authors.
• Many of these slides are taken from the cs145 course offered by Stanford University.