CSC 556 – DBMS II, Spring 2013

April 10 & 17, 2013: Storage media hierarchies, external sorts, B-trees




Storage medium abstraction

• Implementing a storage medium as a subclass of an abstract interface lets you prototype the storage medium semi-independently of the DBMS structures built atop it.
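A minimal C++ sketch of that idea, assuming a byte-addressed medium. `StorageMedium` and `CoreStorage` are hypothetical names, not the course's classes. DBMS code written only against the abstract base could later be pointed at a file-backed subclass without changes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <vector>

// Hypothetical abstract storage interface: B-trees, sort queues, etc.
// talk only to this, so the medium underneath can be swapped.
class StorageMedium {
public:
    virtual ~StorageMedium() = default;
    // Copy 'len' bytes starting at byte offset 'where' into 'buf'.
    virtual void read(unsigned long where, void *buf, std::size_t len) = 0;
    // Copy 'len' bytes from 'buf' to byte offset 'where', growing as needed.
    virtual void write(unsigned long where, const void *buf, std::size_t len) = 0;
};

// In-core prototype of a medium, backed by a growable byte vector.
class CoreStorage : public StorageMedium {
public:
    void read(unsigned long where, void *buf, std::size_t len) override {
        if (where + len > bytes_.size()) throw std::out_of_range("read past end");
        std::memcpy(buf, bytes_.data() + where, len);
    }
    void write(unsigned long where, const void *buf, std::size_t len) override {
        if (where + len > bytes_.size()) bytes_.resize(where + len);
        std::memcpy(bytes_.data() + where, buf, len);
    }
private:
    std::vector<unsigned char> bytes_;
};
```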


Sequential File I/O

• A queue abstraction supports sequential file I/O or core (in-memory) I/O via a series of enqueue, peek & dequeue calls. The peek call supplies one-record lookahead.

• External sorts, i.e., sorts where the data do not fit into memory, rely on this approach.

• To delimit variable-length records, a record-size entry can precede each record in the file, OR

• A sentinel value can mark each record's end.
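The radix-sort slides call `enqueue`, `peek`, `dequeue` and `canPeek` on an `interface_queueOfInts`, but the class itself is not shown. The sketch below is a guess at its shape (the `peek(bool &)` status flag is inferred from those calls), plus a hypothetical in-core subclass `coreQueueOfInts` backed by `std::deque`. A file-backed subclass would instead keep the next record in a one-record lookahead buffer.

```cpp
#include <cassert>
#include <deque>

// Hypothetical reconstruction of the queue interface used on the radix-sort slides.
class interface_queueOfInts {
public:
    virtual ~interface_queueOfInts() = default;
    virtual void enqueue(int value) = 0;
    // Returns the head without removing it; sets ok = false if the queue is empty.
    virtual int peek(bool &ok) = 0;
    virtual void dequeue() = 0;    // discards the head record
    virtual bool canPeek() = 0;    // true if a lookahead record exists
};

// Core (in-memory) implementation of the interface.
class coreQueueOfInts : public interface_queueOfInts {
public:
    void enqueue(int value) override { items_.push_back(value); }
    int peek(bool &ok) override {
        ok = !items_.empty();
        return ok ? items_.front() : 0;
    }
    void dequeue() override {
        assert(!items_.empty());
        items_.pop_front();
    }
    bool canPeek() override { return !items_.empty(); }
private:
    std::deque<int> items_;
};
```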


Direct Access File I/O

• Direct access (a.k.a. random access) file I/O uses a seek system call to locate a position within a direct-access (binary) data file.

• man lseek, fseek, ftell

• Offsets are relative to the start of the file (0), the current seek position, or the end of the file (SEEK_SET, SEEK_CUR, SEEK_END).

• The Unix 3C library (fseek, ftell) builds atop the level 2 system calls (lseek). A DBMS may call the level 2 system calls directly.

• It treats the file as an array where a seek address is an offset into the array.

• Applications attempt to keep contiguous records in contiguous blocks.
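A sketch of direct access over fixed-width records using the 3C calls `fseek` and `ftell`, treating the file as an array indexed by record number. `Record`, `writeRecord`, `readRecord` and `recordCount` are hypothetical names; a DBMS could make the same offset computation with `lseek` on a file descriptor.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical fixed-width record (two ints, so no padding in practice).
struct Record {
    int key;
    int value;
};

// Record i lives at byte offset i * sizeof(Record): the file acts as an array.
bool writeRecord(std::FILE *f, long i, const Record &r) {
    if (std::fseek(f, i * static_cast<long>(sizeof(Record)), SEEK_SET) != 0) return false;
    return std::fwrite(&r, sizeof(Record), 1, f) == 1;
}

// Seek straight to record i and read it, without scanning earlier records.
bool readRecord(std::FILE *f, long i, Record &r) {
    if (std::fseek(f, i * static_cast<long>(sizeof(Record)), SEEK_SET) != 0) return false;
    return std::fread(&r, sizeof(Record), 1, f) == 1;
}

// Number of records, from the end-of-file offset reported by ftell.
long recordCount(std::FILE *f) {
    if (std::fseek(f, 0, SEEK_END) != 0) return -1;
    return std::ftell(f) / static_cast<long>(sizeof(Record));
}
```

`std::tmpfile()` gives a scratch update-mode file for trying this out. Seeking before every read or write also satisfies stdio's rule that a positioning call must separate output from input on an update stream.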


Merge sort & Sequential File I/O

• The split phase partitions alternate runs from a queue into two helper queues, where the initial queue is the file to be sorted, and the helpers are temporary files.

• A run is a sorted subsequence.

• The merge phase doubles the run length by merging peer runs from the helper queues back into the initial queue.

• Natural-length merge sort inspects the data to locate run boundaries. It may stumble onto big runs, which reduces the number of passes.

• Another variant uses an internal sort such as quicksort to build larger initial runs in memory.

• A file-based sort is an external sort.
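A minimal in-core sketch of the two-way split/merge sort, with `std::deque<int>` standing in for the file to be sorted and the two temporary files. The function names are hypothetical. It uses fixed run lengths 1, 2, 4, ... (not natural-length runs), which is what the April 10 example traces.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

using Queue = std::deque<int>;   // stand-in for a sequential file

// Split phase: deal alternate runs of length 'run' from mainq into tmp0 / tmp1.
static void splitRuns(Queue &mainq, Queue &tmp0, Queue &tmp1, std::size_t run) {
    bool toFirst = true;
    while (!mainq.empty()) {
        Queue &dest = toFirst ? tmp0 : tmp1;
        for (std::size_t i = 0; i < run && !mainq.empty(); ++i) {
            dest.push_back(mainq.front());
            mainq.pop_front();
        }
        toFirst = !toFirst;
    }
}

// Merge phase: merge each pair of peer runs back into mainq,
// producing runs of length 2 * run.
static void mergeRuns(Queue &mainq, Queue &tmp0, Queue &tmp1, std::size_t run) {
    while (!tmp0.empty() || !tmp1.empty()) {
        std::size_t n0 = 0, n1 = 0;   // records consumed from each peer run
        while ((n0 < run && !tmp0.empty()) || (n1 < run && !tmp1.empty())) {
            bool take0;
            if (n0 == run || tmp0.empty()) take0 = false;        // run 0 exhausted
            else if (n1 == run || tmp1.empty()) take0 = true;    // run 1 exhausted
            else take0 = tmp0.front() <= tmp1.front();           // <= keeps it stable
            Queue &src = take0 ? tmp0 : tmp1;
            mainq.push_back(src.front());
            src.pop_front();
            if (take0) ++n0; else ++n1;
        }
    }
}

// One split + merge pass per run length 1, 2, 4, ... until one run remains.
void mergeSort(Queue &mainq) {
    Queue tmp0, tmp1;
    for (std::size_t run = 1; run < mainq.size(); run *= 2) {
        splitRuns(mainq, tmp0, tmp1, run);
        mergeRuns(mainq, tmp0, tmp1, run);
    }
}
```

On the ten values from the April 10 example this makes four passes (run lengths 1, 2, 4, 8), and the first split leaves -17 5 18 62 1 in tmp0 and -101 0 29 666 -1 in tmp1.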

April 10 example of merge sort

• End of split phase with run length = 1.

• End-of-run record is underlined.

• Destination of this phase is in bold.

main -17 -101 5 0 18 29 62 666 1 -1

tmp0 -17 5 18 62 1

tmp1 -101 0 29 666 -1

April 10 example of merge sort

• End of split phase with run length = 1.

• End of merge phase grows run length to 2.

main -17 -101 5 0 18 29 62 666 1 -1

tmp0 -17 5 18 62 1

tmp1 -101 0 29 666 -1

main -101 -17 0 5 18 29 62 666 -1 1

tmp0 -17 5 18 62 1

tmp1 -101 0 29 666 -1

April 10 example of merge sort

• End of split phase with run length = 2.

• End of merge phase grows run length to 4.

main -101 -17 0 5 18 29 62 666 -1 1

tmp0 -101 -17 18 29 -1 1

tmp1 0 5 62 666

main -101 -17 0 5 18 29 62 666 -1 1

tmp0 -101 -17 18 29 -1 1

tmp1 0 5 62 666

April 10 example of merge sort

• End of split phase with run length = 4.

• End of merge phase grows run length to 8.

main -101 -17 0 5 18 29 62 666 -1 1

tmp0 -101 -17 0 5 -1 1

tmp1 18 29 62 666

main -101 -17 0 5 18 29 62 666 -1 1

tmp0 -101 -17 0 5 -1 1

tmp1 18 29 62 666

April 10 example of merge sort

• End of split phase with run length = 8.

• End of merge phase grows run length to 16.

main -101 -17 0 5 18 29 62 666 -1 1

tmp0 -101 -17 0 5 18 29 62 666

tmp1 -1 1

main -101 -17 -1 0 1 5 18 29 62 666

tmp0 -101 -17 0 5 18 29 62 666

tmp1 -1 1


Merge sort is O(n log(n))

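• Picture a merge sort as a tree growing up from N runs of length 1 to 1 run of length N. The tree has about log2(N) levels (one split/merge pass per level, since each pass doubles the run length), and each pass copies all N records, so the total work is O(N log(N)).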
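Counting record transfers makes the bound concrete: each pass moves every record twice (once in the split phase, once in the merge phase), and the run length doubles per pass. Applied to the ten-record April 10 example:

```latex
\text{passes} = \lceil \log_2 N \rceil, \qquad
\text{record transfers} = 2N \, \lceil \log_2 N \rceil = O(N \log N)

N = 10:\quad \lceil \log_2 10 \rceil = 4 \text{ passes (run lengths } 1 \to 2 \to 4 \to 8 \to 16\text{)}, \qquad
2 \cdot 10 \cdot 4 = 80 \text{ record transfers}
```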


April 10 radix10 sort (bucket sort)

• This sort also uses sequential file I/O.

• Initial sequence of integers.

• Sequence normalized to non-negative values by adding 101 (the magnitude of the smallest value, -101), with enough digits (three) to accommodate the largest: 666 + 101 = 767.

main -17 -101 5 0 18 29 62 666 1 -1

main 084 000 106 101 119 130 163 767 102 100

Number temp queues = radix

main 084 000 106 101 119 130 163 767 102 100

tmp0 000 130 100

tmp1 101

tmp2 102

tmp3 163

tmp4 084

tmp5

tmp6 106

tmp7 767

tmp8

tmp9 119

Number temp queues = radix

main 000 130 100 101 102 163 084 106 767 119

tmp0 000 100 101 102 106

tmp1 119

tmp2

tmp3 130

tmp4

tmp5

tmp6 163 767

tmp7

tmp8 084

tmp9

Number temp queues = radix

main 000 100 101 102 106 119 130 163 767 084

tmp0 000 084

tmp1 100 101 102 106 119 130 163

tmp2

tmp3

tmp4

tmp5

tmp6

tmp7 767

tmp8

tmp9


Number temp queues = radix

• Sort is O(N x D) for N items and D digits, but

• D is a constant and does not affect growth rate.

• Radix sort is therefore O(N) on data size.

• It requires a fixed-bit-width key field.

• Fixed-width fields are required for a relational DBMS.

• It has a lot of copy overhead: every pass copies all N items out to the temporary queues and back.

• To reduce the number of passes, use a radix with many bits, at the cost of many (smaller) temporary files.

main 000 084 100 101 102 106 119 130 163 767

final -101 -17 -1 0 1 5 18 29 62 666
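A compact C++ sketch of the same radix-10 procedure, assuming `std::deque` queues in place of temporary files: normalize by subtracting the minimum, do one split/merge pass per decimal digit (least significant first), then undo the offset. `radix10Sort` is a hypothetical name.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// LSD radix-10 sort through queues: normalize, one split/merge pass per digit.
std::vector<int> radix10Sort(const std::vector<int> &input) {
    if (input.empty()) return {};
    // Normalize to non-negative keys by subtracting the minimum (adds 101 in the example).
    const long long offset = -static_cast<long long>(*std::min_element(input.begin(), input.end()));
    std::deque<long long> mainq;
    long long largest = 0;
    for (int v : input) {
        const long long key = v + offset;
        mainq.push_back(key);
        largest = std::max(largest, key);
    }
    std::array<std::deque<long long>, 10> tmp;   // tmp0 .. tmp9
    for (long long place = 1; largest / place > 0; place *= 10) {
        // Split phase: append each key to the queue for its current digit.
        while (!mainq.empty()) {
            const long long key = mainq.front();
            mainq.pop_front();
            tmp[static_cast<std::size_t>((key / place) % 10)].push_back(key);
        }
        // Merge phase: drain tmp0 .. tmp9 back into main, in digit order.
        for (auto &q : tmp) {
            while (!q.empty()) {
                mainq.push_back(q.front());
                q.pop_front();
            }
        }
    }
    std::vector<int> result;   // undo the normalization
    result.reserve(mainq.size());
    for (long long key : mainq) result.push_back(static_cast<int>(key - offset));
    return result;
}
```

Because each split appends to the end of a queue and each merge drains the queues in order, keys with equal digits keep their relative order, which is what makes the later (more significant) passes correct.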

Base-2^numbits code.

void radixsort(interface_queueOfInts *queueToSort, int numbits,
        interface_queueOfInts * temporaryQueues[]) {
    // One temporary queue per digit value; requires 0 < numbits < 32.
    const int numqueues = (1 << numbits);
    int mask = (1 << numbits) - 1;            // low numbits bits set
    const int allbits = sizeof(int) * 8 ;     // bits per sorted value
    // One split/merge pass per numbits-wide digit, least significant first.
    for (int shifter = 0 ; shifter < allbits ; shifter += numbits) {
        splitphase(queueToSort, temporaryQueues, shifter, mask);
        mergephase(queueToSort, temporaryQueues, numqueues);
    }
}

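Base-2^numbits code.

static void splitphase(interface_queueOfInts *merger,
        interface_queueOfInts * splitter[], int shifter, int bitmask) {
    bool ignoreme ;   // We only peek on queues with data, so ignore the status.
    while (merger->canPeek()) {   // while there are values to split
        int value = merger->peek(ignoreme);
        merger->dequeue();
        // Flip the sign bit so negative values sort before non-negative ones
        // (two's complement); shifting an unsigned value avoids sign extension.
        unsigned key = static_cast<unsigned>(value) ^ ~(~0u >> 1);
        int qid = static_cast<int>((key >> shifter) & static_cast<unsigned>(bitmask));
        splitter[qid]->enqueue(value);
    }
}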

Base-2^numbits code.

static void mergephase(interface_queueOfInts *merger,
        interface_queueOfInts * splitter[], int numqueues) {
    bool ignoreme ;
    // Drain temporary queues 0 .. numqueues-1 in order; each queue preserves
    // arrival order, which keeps the sort stable from pass to pass.
    for (int qtodrain = 0 ; qtodrain < numqueues ; qtodrain++) {
        while (splitter[qtodrain]->canPeek()) {
            merger->enqueue(splitter[qtodrain]->peek(ignoreme));
            splitter[qtodrain]->dequeue();
        }
    }
}


MultiSet is a B+-tree Map over various storage subclasses

• DataMine/mset/MultiSet.h


Relational DBMS

• Flat, fixed width records (tuples) fit into contiguous memory locations & file blocks.

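• Take the least common multiple (LCM) of the record size and block size, and allocate storage in units of that size, so each allocation unit holds a whole number of records and a whole number of blocks.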

• Unix lseek and fcntl (which also provides record locking) are the primary system calls. Windows has counterparts. Low-level cylinder allocation on the disk is managed by the operating system.

fixed0 fixed1 fixed2 fixed3 fixed4 fixed5 fixed6 fixed7 fixed8
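The LCM arithmetic, sketched with C++17 `std::lcm`. `AllocUnit` and `allocationUnit` are hypothetical names. With 96-byte records and 4,096-byte blocks the unit is 12,288 bytes: 128 records in 3 blocks.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>

// Allocation unit = lcm(record size, block size): a whole number of both.
struct AllocUnit {
    std::size_t bytes;     // size of one allocation unit
    std::size_t records;   // records per unit
    std::size_t blocks;    // disk blocks per unit
};

AllocUnit allocationUnit(std::size_t recordSize, std::size_t blockSize) {
    const std::size_t unit = std::lcm(recordSize, blockSize);   // C++17, <numeric>
    return AllocUnit{unit, unit / recordSize, unit / blockSize};
}
```

Records may still straddle block boundaries inside a unit; the LCM only guarantees that every unit begins on both a record boundary and a block boundary.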


B-trees and index files

• B-trees are balanced multiway search trees, not binary trees; the degree D is typically much greater than 2, as determined by the block size.

• How many B-tree node records fit into a disk block?

• Each node holds between D/2 and D entries.

• The root is an exception. It can hold < D/2 entries.

• In B+-trees the leaves hold the actual data, typically as seek indices into the contiguous database file. B+-trees also link the leaves into a sorted chain for range-based serial access.

• B+-tree interior nodes hold pointers to children.

• A B-tree grows from the bottom up: an insertion that overflows a node splits it, splits may propagate upward, and the tree gains a level only when the root splits.
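One way to answer "how many node records fit into a disk block?", under stated assumptions: a node laid out like the `treenode` struct (one parent `location`, then D child/key pairs), 8-byte keys and locations, and no padding. Both functions are hypothetical helpers, not MultiSet code.

```cpp
#include <cassert>
#include <cstddef>

// Degree D that fits in one block: a fixed header, then D (key, child) pairs.
// Assumes headerSize < blockSize.
constexpr std::size_t degreeForBlock(std::size_t blockSize, std::size_t headerSize,
                                     std::size_t keySize, std::size_t locationSize) {
    return (blockSize - headerSize) / (keySize + locationSize);
}

// Minimum number of levels needed to index n entries when every node holds
// D entries (a best case; half-full nodes need more levels). Assumes D >= 2.
std::size_t minLevels(std::size_t n, std::size_t D) {
    std::size_t levels = 1, capacity = D;
    while (capacity < n) {
        capacity *= D;
        ++levels;
    }
    return levels;
}
```

With 4,096-byte blocks this gives D = (4096 - 8) / 16 = 255, and three levels already cover a million entries, so a lookup needs only a handful of disk reads.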


MultiSet

• MultiSet uses sets of keys or multisets of keys (duplicate keys allowed) to map to application data elements.

• Search includes ==, <, <=, > or >= key.

• First or last key occurrence for actual multisets.

• Greatest lesser value when key is not present.

• Also supports least greater value.

• Serial linked list at leaves (B+ tree) supports duplicate keys & iterating over results, including the following operations.

• A result is a MultiSet amenable to union, intersection and set difference with other MultiSet objects.
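The search variants map onto binary searches over a sorted run of keys with duplicates, such as the chained B+-tree leaves. This sketch uses `std::lower_bound` and `std::upper_bound` on a `std::vector<int>`; it is not MultiSet's code, and the function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

using Keys = std::vector<int>;   // sorted, duplicates allowed

// Index of the first occurrence of key, if present.
std::optional<std::size_t> firstOccurrence(const Keys &k, int key) {
    auto it = std::lower_bound(k.begin(), k.end(), key);
    if (it == k.end() || *it != key) return std::nullopt;
    return static_cast<std::size_t>(it - k.begin());
}

// Index of the last occurrence of key, if present.
std::optional<std::size_t> lastOccurrence(const Keys &k, int key) {
    auto it = std::upper_bound(k.begin(), k.end(), key);
    if (it == k.begin() || *(it - 1) != key) return std::nullopt;
    return static_cast<std::size_t>(it - k.begin() - 1);
}

// Greatest key strictly less than key (the "< key" search).
std::optional<std::size_t> greatestLesser(const Keys &k, int key) {
    auto it = std::lower_bound(k.begin(), k.end(), key);
    if (it == k.begin()) return std::nullopt;
    return static_cast<std::size_t>(it - k.begin() - 1);
}

// Least key strictly greater than key (the "> key" search).
std::optional<std::size_t> leastGreater(const Keys &k, int key) {
    auto it = std::upper_bound(k.begin(), k.end(), key);
    if (it == k.end()) return std::nullopt;
    return static_cast<std::size_t>(it - k.begin());
}
```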

MultiSet.h

typedef unsigned long location ;
const unsigned BTREEDEGREE = 16 ;

template <class KeyType> struct treenode {   // basic internal node
    location parent ;                  // type treenode
    location child[BTREEDEGREE] ;      // treenodes or leafnodes
    KeyType key[BTREEDEGREE] ;         // min keys for those children
} ;

template <class KeyType> struct leafnode {   // leaf connects treenodes to treeelems
    location parent ;                  // type treenode
    location prev, next ;              // siblings or cousins
    location child[BTREEDEGREE] ;      // treeelems
    KeyType key[BTREEDEGREE] ;         // keys for those children
} ;

MultiSet.h

template <class ElementType> struct treeelem {   /* a btree leaf element */
    ElementType element ;   // main contents
    location next ;         // next avail. if needed for a free list
} ;

template <class ElementType, class KeyType>
class MultiSet {
    /* ElementType is the type of the set's elements. KeyType is the type,
       typically part or possibly all of an ElementType object, that
       constitutes a search key.

    ... */


Abstract location

• Location is an unsigned long that is either a seek offset (file) or a cast of an object pointer.

• MultiSet records the depth of the tree.

• Interior nodes are type treenode.

• Leaf nodes are type leafnode.

• Leaf nodes point to treeelem application data.

• Those treeelem records may hold the application data itself or record indices into another file of flat data records.


Searching through the B-tree

• findleaf uses findsubtree on interior nodes.

• findsubtree uses an O(log n) binary search within an interior node.

• findleaf calls findsubtree in a loop until hitting the leaves.

• The multi-key version finds the first or last instance.

• findslot returns the array index inside a node.
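A sketch of the descent, assuming unique int keys and a hypothetical in-memory `Node` in which `key[i]` is the minimum key of `child[i]`, as the `treenode` comment describes. The names echo findsubtree, findleaf and findslot, but the code is not MultiSet's; a first-occurrence search over duplicates would need a different bound.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical node: interior nodes pair each child with its minimum key;
// leaves hold the sorted keys themselves.
struct Node {
    bool leaf = false;
    std::vector<int> key;        // sorted
    std::vector<Node *> child;   // same length as key for interior nodes
};

// Binary search for the last child whose minimum key is <= key
// (child 0 if key is below every minimum).
std::size_t findSubtree(const Node &n, int key) {
    auto it = std::upper_bound(n.key.begin(), n.key.end(), key);
    return it == n.key.begin() ? 0 : static_cast<std::size_t>(it - n.key.begin() - 1);
}

// Descend from the root, one binary search per level, until reaching a leaf.
const Node *findLeaf(const Node *root, int key) {
    const Node *n = root;
    while (!n->leaf) n = n->child[findSubtree(*n, key)];
    return n;
}

// Array index of key inside a leaf, or -1 if absent.
long findSlot(const Node &leaf, int key) {
    auto it = std::lower_bound(leaf.key.begin(), leaf.key.end(), key);
    return (it != leaf.key.end() && *it == key) ? static_cast<long>(it - leaf.key.begin()) : -1;
}
```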


Insertion in the B-tree

• randominsert starts at the root.

• Initial element is a special case.

• Root node is always special because it may contain fewer than degree / 2 entries.

• Otherwise, if it fits into a node, put it there.

• When the node is full, split into two nodes of size N/2 and N/2 + 1.

• Add the new node to its parent.

• If that parent node was already full, split it recursively.
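The split step in isolation, assuming an even degree N (BTREEDEGREE = 16 is even): insert into a full node of N sorted keys, then cut the N + 1 keys into N/2 and N/2 + 1. `SplitResult` and `insertAndSplit` are hypothetical; the returned separator is the right node's minimum key, which the caller would add to the parent.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct SplitResult {
    std::vector<int> left;    // stays in the original node: N/2 keys
    std::vector<int> right;   // new sibling node: N/2 + 1 keys
    int separator;            // min key of 'right', to add to the parent
};

// Insert newKey into a full node of N sorted keys, then split the N + 1 keys.
SplitResult insertAndSplit(std::vector<int> keys, int newKey) {
    keys.insert(std::upper_bound(keys.begin(), keys.end(), newKey), newKey);
    const std::size_t half = (keys.size() - 1) / 2;   // N/2
    SplitResult r;
    r.left.assign(keys.begin(), keys.begin() + half);
    r.right.assign(keys.begin() + half, keys.end());
    r.separator = r.right.front();
    return r;
}
```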


Deletion from a B-tree

• There is no problem when number of entries remains >= N/2. Just slide entries above the deleted entry down to cover the deleted one.

• When entries < N/2, try rebalancing by sharing entries with a neighbor in the serial chain (sibling or cousin).

• If there are too few entries for that, merge the two nodes into one. The other node is then empty, so delete it from its parent.

• If the parent's entries drop below N/2, perform the delete recursively on the parent.

• See deleteelem and deleteleaf in MultiSet.
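A leaf-level sketch of those underflow cases, checking only the right neighbor, with `minEntries` standing in for N/2. `deleteKey` is hypothetical; real code also needs a left-neighbor case for the last leaf, and after borrowing it must update the parent's key for the neighbor, whose minimum changed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Delete the key at 'slot' from a leaf, handling underflow against the right
// neighbor. Returns true if the neighbor was merged into 'node' and must now be
// deleted from the parent (which may underflow in turn).
bool deleteKey(std::vector<int> &node, std::vector<int> &rightNeighbor,
               std::size_t slot, std::size_t minEntries) {
    node.erase(node.begin() + static_cast<std::ptrdiff_t>(slot));   // slide entries down
    if (node.size() >= minEntries) return false;                    // no underflow
    if (rightNeighbor.size() > minEntries) {
        // Neighbor can spare one: borrow its smallest entry.
        node.push_back(rightNeighbor.front());
        rightNeighbor.erase(rightNeighbor.begin());
        return false;
    }
    // Too few to share: merge the neighbor into this node; it becomes empty.
    node.insert(node.end(), rightNeighbor.begin(), rightNeighbor.end());
    rightNeighbor.clear();
    return true;
}
```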


http://en.wikipedia.org/wiki/B-tree

• keeps keys in sorted order for sequential traversing

• uses a hierarchical index to minimize the number of disk reads

• uses partially full blocks to speed insertions and deletions

• keeps the index balanced with an elegant recursive algorithm

• minimizes waste by making sure the interior nodes are at least half full


CoreMultiSet & FileMultiSet

• storage read / write via abstract location

• readnode / readleaf / readelem

• writenode / writeleaf / writeelem

• allocnode / allocleaf / allocelem

• freenode / freeleaf / freeelem

• FileMultiSet maintains free lists of the above.

• Flat file structure in main relational file makes storage management of free list “easy.”
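A sketch of why a flat file of fixed-size records makes the free list easy: a `std::vector` simulates the file, and each freed slot's `next` field chains the list, the role `treeelem::next` plays. `Slot`, `SlotFile`, `alloc` and `freeSlot` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical fixed-size record slot.
struct Slot {
    int element;
    unsigned long next;   // next free slot while on the free list
};

class SlotFile {
public:
    static constexpr unsigned long NIL = ~0ul;   // end of the free list

    // Reuse a freed slot if one exists; otherwise append to the "file".
    unsigned long alloc() {
        if (freeHead_ != NIL) {
            unsigned long loc = freeHead_;
            freeHead_ = slots_[loc].next;
            return loc;
        }
        slots_.push_back(Slot{0, NIL});
        return slots_.size() - 1;
    }

    // Push a slot onto the head of the free list.
    void freeSlot(unsigned long loc) {
        assert(loc < slots_.size());
        slots_[loc].next = freeHead_;
        freeHead_ = loc;
    }

    Slot &at(unsigned long loc) { return slots_[loc]; }
    std::size_t size() const { return slots_.size(); }

private:
    std::vector<Slot> slots_;
    unsigned long freeHead_ = NIL;
};
```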


Other Indexing – Skip Lists

• Skip lists support probabilistic log(N) lookup, insertion & deletion of a key -> value mapping.

• We will review Pugh’s paper from the 1990s.

• A skip list links each mapping in a series of key-sorted linked lists, in which higher-order lists contain fewer members, acting as “highways” & “boulevards” in locating keys.

• Skip lists provide better support for concurrency than some balanced tree algorithms by avoiding global tree restructuring on rebalancing.
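A minimal skip list in the spirit of Pugh's design, assuming int keys and values, at most 16 levels, and promotion probability 1/2 per level (coin flips from a fixed-seed `std::mt19937`). `SkipList` is a hypothetical name; it supports search and insert only, with no deletion or concurrency control.

```cpp
#include <cassert>
#include <memory>
#include <random>
#include <vector>

class SkipList {
    static constexpr int MAX_LEVEL = 16;
    struct Node {
        int key, value;
        std::vector<Node *> next;   // next[i] = successor in the level-i list
        Node(int k, int v, int levels) : key(k), value(v), next(levels, nullptr) {}
    };
    Node head_{0, 0, MAX_LEVEL};                 // sentinel; its key is unused
    std::vector<std::unique_ptr<Node>> owned_;   // owns every inserted node
    int level_ = 1;                              // levels currently in use
    std::mt19937 rng_{12345};                    // fixed seed for repeatability

    int randomLevel() {
        int lvl = 1;
        while (lvl < MAX_LEVEL && (rng_() & 1u)) ++lvl;   // coin flips
        return lvl;
    }

public:
    // Returns a pointer to the value, or nullptr if key is absent.
    const int *find(int key) const {
        const Node *x = &head_;
        for (int i = level_ - 1; i >= 0; --i)   // start on the sparsest "highway"
            while (x->next[i] && x->next[i]->key < key) x = x->next[i];
        x = x->next[0];
        return (x && x->key == key) ? &x->value : nullptr;
    }

    // Inserts key -> value, or updates the value if key is present.
    void insert(int key, int value) {
        Node *update[MAX_LEVEL];   // last node before key on each level
        Node *x = &head_;
        for (int i = level_ - 1; i >= 0; --i) {
            while (x->next[i] && x->next[i]->key < key) x = x->next[i];
            update[i] = x;
        }
        if (x->next[0] && x->next[0]->key == key) {
            x->next[0]->value = value;
            return;
        }
        const int lvl = randomLevel();
        for (int i = level_; i < lvl; ++i) update[i] = &head_;
        if (lvl > level_) level_ = lvl;
        owned_.push_back(std::make_unique<Node>(key, value, lvl));
        Node *n = owned_.back().get();
        for (int i = 0; i < lvl; ++i) {   // splice into each of its lists
            n->next[i] = update[i]->next[i];
            update[i]->next[i] = n;
        }
    }

    // Keys in order, read from the bottom (level-0) list.
    std::vector<int> keys() const {
        std::vector<int> out;
        for (const Node *x = head_.next[0]; x; x = x->next[0]) out.push_back(x->key);
        return out;
    }
};
```

An insert only relinks the few predecessors recorded in `update`, which is the local-change property the slide credits for easier concurrency.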


Other Indexing – Hashing

• Hashing approaches O(1) (constant-time) lookup for an ideal hash function.

• The ideal hash function preserves all of the bits of distinguishing information in a search key, while folding them into fewer bits to use as a lookup index into an array, an index file, or a flat file of fixed-width records.

• Hash tables resolve key collisions either by storing colliding elements in a linked list (chained hashing) or by rehashing to a new bucket (open-address hashing).

• Hashing supports only == tests, not <, <=, >, >=.
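A sketch of open-address hashing with linear probing into a fixed array of slots, the in-core analogue of hashing into a flat file of fixed-width records. `OpenHash` is a hypothetical name; it omits deletion and resizing, and lookup is by equality only.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <optional>
#include <vector>

class OpenHash {
    struct Slot { bool used = false; int key = 0; int value = 0; };
    std::vector<Slot> slots_;
public:
    explicit OpenHash(std::size_t capacity) : slots_(capacity) { assert(capacity > 0); }

    // Inserts or updates key; returns false if every slot is taken.
    bool put(int key, int value) {
        const std::size_t h = std::hash<int>{}(key) % slots_.size();
        for (std::size_t probe = 0; probe < slots_.size(); ++probe) {
            Slot &s = slots_[(h + probe) % slots_.size()];   // collision: try next slot
            if (!s.used || s.key == key) {
                s = Slot{true, key, value};
                return true;
            }
        }
        return false;
    }

    // Equality lookup only: there is no ordering to support <, <=, >, >=.
    std::optional<int> get(int key) const {
        const std::size_t h = std::hash<int>{}(key) % slots_.size();
        for (std::size_t probe = 0; probe < slots_.size(); ++probe) {
            const Slot &s = slots_[(h + probe) % slots_.size()];
            if (!s.used) return std::nullopt;   // an empty slot ends the probe chain
            if (s.key == key) return s.value;
        }
        return std::nullopt;
    }
};
```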


Other Indexing – Sorting

• When a sequence of fixed-size records or indices is sorted in a file of contiguous records, binary search is a viable option for locating a key.

• Hashing and sorting are particularly appropriate for indexing on-the-fly, temporary result sets to be combined via intersection, union or set difference.

• Approximate O(1) hashing is fast when combining result sets based only on equality.

• Sort-based search and merging are appropriate when a query requests a result set sorted on an attribute.
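Combining result sets of record indices, sketched both ways: sort-based with the standard `std::set_intersection`, `std::set_union` and `std::set_difference`, and hash-based with `std::unordered_set`. The function names are hypothetical, and each input is assumed to contain no duplicate indices.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <unordered_set>
#include <vector>

using Ids = std::vector<int>;   // record indices from one result set

// Sort-based: linear merges over sorted inputs; the output stays sorted.
Ids sortedIntersect(Ids a, Ids b) {
    std::sort(a.begin(), a.end());
    std::sort(b.begin(), b.end());
    Ids out;
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(), std::back_inserter(out));
    return out;
}

Ids sortedUnion(Ids a, Ids b) {
    std::sort(a.begin(), a.end());
    std::sort(b.begin(), b.end());
    Ids out;
    std::set_union(a.begin(), a.end(), b.begin(), b.end(), std::back_inserter(out));
    return out;
}

Ids sortedDifference(Ids a, Ids b) {
    std::sort(a.begin(), a.end());
    std::sort(b.begin(), b.end());
    Ids out;
    std::set_difference(a.begin(), a.end(), b.begin(), b.end(), std::back_inserter(out));
    return out;
}

// Hash-based: expected O(1) per membership test, equality only;
// the output follows the order of 'a', not sorted order.
Ids hashedIntersect(const Ids &a, const Ids &b) {
    const std::unordered_set<int> inB(b.begin(), b.end());
    Ids out;
    for (int id : a)
        if (inB.count(id)) out.push_back(id);
    return out;
}
```

The sorted versions pay O(n log n) up front but hand back an ordered result, which matters when the query also asks for ORDER-BY on that attribute.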


Query Processing (Elmasri & Navathe chapter 19)

• Translate SQL statements into abstract syntax tree using basic compilation techniques.

• Interpret the abstract syntax tree.

• ORDER-BY and elimination of duplicate tuples in a PROJECT are supported by external sort.

• Duplicate elimination can use low-level memcmp byte comparison when comparing fixed-size records.

• If the sort is based on an index key or composite index key, or if the query does not entail PROJECT, then a B+-tree index avoids the need for a sort.
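A sketch of sort-based duplicate elimination with `memcmp`, assuming tuples stored as fixed-size byte arrays. The assumption matters: if a record struct has padding bytes, `memcmp` compares those too, so every byte must be initialized. `Tuple` and `eliminateDuplicates` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstring>
#include <vector>

// Hypothetical fixed-size tuple: a plain byte array, so there are no padding
// bytes and memcmp sees exactly the stored attribute bytes.
struct Tuple {
    char bytes[12];
};

// Sort by raw bytes, then drop adjacent byte-identical tuples.
void eliminateDuplicates(std::vector<Tuple> &tuples) {
    auto less = [](const Tuple &a, const Tuple &b) {
        return std::memcmp(a.bytes, b.bytes, sizeof a.bytes) < 0;
    };
    auto same = [](const Tuple &a, const Tuple &b) {
        return std::memcmp(a.bytes, b.bytes, sizeof a.bytes) == 0;
    };
    std::sort(tuples.begin(), tuples.end(), less);
    tuples.erase(std::unique(tuples.begin(), tuples.end(), same), tuples.end());
}
```

Byte order is a consistent total order, which is all duplicate elimination needs; it is not necessarily attribute order (little-endian integers, for example), so ORDER-BY still needs a type-aware comparison.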


SELECT Processing

• Use indexed fields as primary search keys.

• Use index-based set intersection, union and difference operations to support AND, OR and NOT.

• Use slow (O(n)) sequential search, or external sorting (O(n log(n))) combined with binary search where appropriate (ORDER-BY or duplicate elimination).

• Use hashing for == matching.

• Use composite search key indexing or hashing.

• Utilize selectivity where possible: evaluate the most selective conditions first.


JOIN Processing

• Nested-loop join is the brute-force approach (O(NK) for relations of N and K tuples).

• Single-loop join applies when the join attributes of one relation are indexed: loop over the other relation and probe the index.

• Sort-merge join requires records to be sorted on the join attributes, and then merged.

• Partition-hash join hashes the smaller of the two contributing relations into a chained hash table.

• The larger relation is then hashed on the join attributes to retrieve matching tuples from the smaller.
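Three of those strategies, sketched in memory on a hypothetical two-column `Row`. The hash join here is a simple build/probe join without a partitioning phase, so it is not a full partition-hash join; all function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

struct Row { int key; std::string payload; };                     // joined on 'key'
using Rel = std::vector<Row>;
using Pairs = std::vector<std::pair<std::string, std::string>>;   // matched payloads

// Nested loop: compare every pair, O(N*K).
Pairs nestedLoopJoin(const Rel &r, const Rel &s) {
    Pairs out;
    for (const Row &a : r)
        for (const Row &b : s)
            if (a.key == b.key) out.emplace_back(a.payload, b.payload);
    return out;
}

// Sort-merge: sort both on the join key, then advance two cursors.
Pairs sortMergeJoin(Rel r, Rel s) {
    auto byKey = [](const Row &a, const Row &b) { return a.key < b.key; };
    std::sort(r.begin(), r.end(), byKey);
    std::sort(s.begin(), s.end(), byKey);
    Pairs out;
    std::size_t i = 0, j = 0;
    while (i < r.size() && j < s.size()) {
        if (r[i].key < s[j].key) ++i;
        else if (s[j].key < r[i].key) ++j;
        else {
            // Equal keys: emit the cross product of the two equal-key groups.
            std::size_t jEnd = j;
            while (jEnd < s.size() && s[jEnd].key == r[i].key) ++jEnd;
            for (; i < r.size() && r[i].key == s[j].key; ++i)
                for (std::size_t k = j; k < jEnd; ++k) out.emplace_back(r[i].payload, s[k].payload);
            j = jEnd;
        }
    }
    return out;
}

// Hash join: build a chained table on the smaller relation, probe with the larger.
Pairs hashJoin(const Rel &large, const Rel &small) {
    std::unordered_multimap<int, const Row *> table;
    for (const Row &b : small) table.emplace(b.key, &b);
    Pairs out;
    for (const Row &a : large) {
        auto range = table.equal_range(a.key);
        for (auto it = range.first; it != range.second; ++it)
            out.emplace_back(a.payload, it->second->payload);
    }
    return out;
}
```

All three return the same matches, possibly in different orders; only the sort-merge version leaves its output ordered on the join key.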


PROJECT Processing

• If the projection list includes a distinguishing key, the projected tuples are already unique. Just select the projected attributes from each tuple of the query results.

• Otherwise, DISTINCT projections require sorting on the entire tuple to eliminate duplicates.

• It is also possible to hash on entire tuples to eliminate duplicates.


Approaches to Query Optimization

• Pipelined or stream-based processing of query stages across multiple thread / memory units.

• Compiler optimization techniques such as common subexpression elimination.

• Functional approaches such as lazy evaluation.

• Use meta-data and heuristics such as the size of contributing relations (smaller is faster and may fit into memory) and key distribution data.


Costs to consider

• Access cost (disk I/O), e.g., NFS vs. local disk.

• Disk storage cost for intermediate files.

• Computation cost (algorithmic complexity, O(?)).

• Memory usage cost (avoid thrashing).

• Communication cost for distributed systems.

• Maintenance cost in terms of resources & availability of the database.


Physical Database Design

• Chapter 20 in Elmasri & Navathe textbook.

• Accumulate query statistics & data mine them.

• What attributes to index?

• When to use a clustered index on a non-key attribute?

• The actual data file can be physically ordered on only one key.

• Hashing works well on equality-only joins.

• Dynamic hashing for volatile files.


Physical Database Techniques

• CREATE [ UNIQUE ] INDEX <index name> ON <table name> (<column name> [ <order> ] { , <column name> [ <order> ] } ) [ CLUSTER ] ;

• Denormalization demotes normalized tables to weaker normal forms for increased speed.

• Vertical partitioning splits a relation's attributes across separate tables to speed queries that project only some of them.

• Horizontal partitioning splits a relation's tuples across separate tables (for example, by ranges of an indexed attribute) to speed selections.


Collect statistics

• Storage statistics include data about table spaces, index spaces, buffer pools. DBMS may require preallocation and sizing / tuning for clustering.

• I/O and device performance statistics include read / write (paging) activity on disk extents, hot spots, and thrashing of core memory.

• NFS/local/in-core, number of network interface cards, amount of core memory, cache, memory topology.

• Query / transaction statistics help determine attributes to index & query distributions.

Tuning queries

• Precompiled queries offer opportunities for profiling and speed improvement.

• Avoid generating larger intermediate result sets when smaller ones are available.

• Avoid nested queries that generate large cross-products in favor of sequential queries. The p. 737 example below potentially searches all of M for each tuple from E.

Nested query:

SELECT Ssn
FROM EMPLOYEE E
WHERE Salary = ( SELECT MAX(Salary)
                 FROM EMPLOYEE AS M
                 WHERE M.Dno = E.Dno ) ;

Sequential rewrite:

SELECT MAX(Salary) AS High_salary, Dno
INTO TEMP
FROM EMPLOYEE
GROUP BY Dno ;

SELECT EMPLOYEE.Ssn
FROM EMPLOYEE, TEMP
WHERE EMPLOYEE.Salary = TEMP.High_salary
  AND EMPLOYEE.Dno = TEMP.Dno ;