![Page 1: Database Management Systems · Advantages of DBMSs •Abstract data representation •Relational model •Data storage is delegated to the DBMS •Functional query language (SQL)](https://reader033.vdocument.in/reader033/viewer/2022042214/5eb9ddb88872cd3a1f6dc0e3/html5/thumbnails/1.jpg)
Database Management Systems
Marco Serafini
COMPSCI 590S1. Lecture 7
![Page 2: Database Management Systems · Advantages of DBMSs •Abstract data representation •Relational model •Data storage is delegated to the DBMS •Functional query language (SQL)](https://reader033.vdocument.in/reader033/viewer/2022042214/5eb9ddb88872cd3a1f6dc0e3/html5/thumbnails/2.jpg)
2
MapReduce vs. DBMSs
![Page 3: Database Management Systems · Advantages of DBMSs •Abstract data representation •Relational model •Data storage is delegated to the DBMS •Functional query language (SQL)](https://reader033.vdocument.in/reader033/viewer/2022042214/5eb9ddb88872cd3a1f6dc0e3/html5/thumbnails/3.jpg)
33
Advantages of DBMSs• Abstract data representation
• Relational model• Data storage is delegated to the DBMS
• Functional query language (SQL)• Queries specified as simple relational operators• Actual query execution delegated to the DBMS…• … including parallelism, distribution, pipelining etc.
• Support for indexing
![Page 4: Database Management Systems · Advantages of DBMSs •Abstract data representation •Relational model •Data storage is delegated to the DBMS •Functional query language (SQL)](https://reader033.vdocument.in/reader033/viewer/2022042214/5eb9ddb88872cd3a1f6dc0e3/html5/thumbnails/4.jpg)
44
Disadvantages of DBMSs• SQL is a limited interface for complex analytics
• E.g. image analysis, creating maps• Need to define a schema for data a priori• High cost of loading data and indexing
• Can be amortized only if same data and schema reused• Too complex for “one shot” analytics
![Page 5: Database Management Systems · Advantages of DBMSs •Abstract data representation •Relational model •Data storage is delegated to the DBMS •Functional query language (SQL)](https://reader033.vdocument.in/reader033/viewer/2022042214/5eb9ddb88872cd3a1f6dc0e3/html5/thumbnails/5.jpg)
55
Advantages of MapReduce• Support for arbitrary UDFs• Support for a variety of arbitrary data formats• Simple API• Scalability
![Page 6: Database Management Systems · Advantages of DBMSs •Abstract data representation •Relational model •Data storage is delegated to the DBMS •Functional query language (SQL)](https://reader033.vdocument.in/reader033/viewer/2022042214/5eb9ddb88872cd3a1f6dc0e3/html5/thumbnails/6.jpg)
66
Disadvantages• Many of the optimizations of DBMS must be reimplemented by the application, for example
• Indices• Query execution plans (logical + physical)• Column-based architecture• Data format specifications (ProtoBuf)• Compression
![Page 7: Database Management Systems · Advantages of DBMSs •Abstract data representation •Relational model •Data storage is delegated to the DBMS •Functional query language (SQL)](https://reader033.vdocument.in/reader033/viewer/2022042214/5eb9ddb88872cd3a1f6dc0e3/html5/thumbnails/7.jpg)
7
Sorting
![Page 8: Database Management Systems · Advantages of DBMSs •Abstract data representation •Relational model •Data storage is delegated to the DBMS •Functional query language (SQL)](https://reader033.vdocument.in/reader033/viewer/2022042214/5eb9ddb88872cd3a1f6dc0e3/html5/thumbnails/8.jpg)
88
External Sorting• Problem
• Large table with two columns sorted on column one• Table does not fit in memory• Sort by column two
• Q: How to sort?• “Out-of-core” or “external” algorithm
![Page 9: Database Management Systems · Advantages of DBMSs •Abstract data representation •Relational model •Data storage is delegated to the DBMS •Functional query language (SQL)](https://reader033.vdocument.in/reader033/viewer/2022042214/5eb9ddb88872cd3a1f6dc0e3/html5/thumbnails/9.jpg)
99
External Sorting1. Split file in chunks2. Load each chunk in memory3. Sort and write back4. Recursive merge sort
• log(n) passes
![Page 10: Database Management Systems · Advantages of DBMSs •Abstract data representation •Relational model •Data storage is delegated to the DBMS •Functional query language (SQL)](https://reader033.vdocument.in/reader033/viewer/2022042214/5eb9ddb88872cd3a1f6dc0e3/html5/thumbnails/10.jpg)
10
Indexing
![Page 11: Database Management Systems · Advantages of DBMSs •Abstract data representation •Relational model •Data storage is delegated to the DBMS •Functional query language (SQL)](https://reader033.vdocument.in/reader033/viewer/2022042214/5eb9ddb88872cd3a1f6dc0e3/html5/thumbnails/11.jpg)
1111
Index• Index maps a value (of a column) to a row• Primary index: Maps to primary key of table (unique)• Secondary index: Maps to other columns (duplicates)
Index
Table
![Page 12: Database Management Systems · Advantages of DBMSs •Abstract data representation •Relational model •Data storage is delegated to the DBMS •Functional query language (SQL)](https://reader033.vdocument.in/reader033/viewer/2022042214/5eb9ddb88872cd3a1f6dc0e3/html5/thumbnails/12.jpg)
1212
Which Data Structure for Index?• A hash function?• A tree?• Q: Which one is better? When?
![Page 13: Database Management Systems · Advantages of DBMSs •Abstract data representation •Relational model •Data storage is delegated to the DBMS •Functional query language (SQL)](https://reader033.vdocument.in/reader033/viewer/2022042214/5eb9ddb88872cd3a1f6dc0e3/html5/thumbnails/13.jpg)
1313
Hash Index
• Advantages• Fast reads and writes
• Disadvantages• No order• Collisions
Tuple
Tuple Tuple
Tuple
Tuple
Buckets Entries
![Page 14: Database Management Systems · Advantages of DBMSs •Abstract data representation •Relational model •Data storage is delegated to the DBMS •Functional query language (SQL)](https://reader033.vdocument.in/reader033/viewer/2022042214/5eb9ddb88872cd3a1f6dc0e3/html5/thumbnails/14.jpg)
14
Binary Tree• Q: Problem?• Balance
• Balanced binary trees• E.g. R/B trees
• Locality• Space consumption
13
7
3 10
12
15
![Page 15: Database Management Systems · Advantages of DBMSs •Abstract data representation •Relational model •Data storage is delegated to the DBMS •Functional query language (SQL)](https://reader033.vdocument.in/reader033/viewer/2022042214/5eb9ddb88872cd3a1f6dc0e3/html5/thumbnails/15.jpg)
15
B+ Trees• Designed for out-of-core use
• Large nodes• Sorted (for scans)
13 30 48 61
2 5 10 12 13 17 24 28
< 13 [13,30)
…
pointers to file positions(or actual data)
pointers to file positions(or actual data)
Internal node
Leaves
![Page 16: Database Management Systems · Advantages of DBMSs •Abstract data representation •Relational model •Data storage is delegated to the DBMS •Functional query language (SQL)](https://reader033.vdocument.in/reader033/viewer/2022042214/5eb9ddb88872cd3a1f6dc0e3/html5/thumbnails/16.jpg)
1616
Clustered vs. Non-Clustered Index• Clustered index
• Reordering index elements changes physical layout• E.g. a B-Tree where leaf nodes are actual data pages• Primary index often clustered, one clustered index per table• A table without clustered index: heap (don’t be confused!)
• Non-Clustered index• Contains pointers to actual physical location• Can have multiple non-clustered indices for the same table
![Page 17: Database Management Systems · Advantages of DBMSs •Abstract data representation •Relational model •Data storage is delegated to the DBMS •Functional query language (SQL)](https://reader033.vdocument.in/reader033/viewer/2022042214/5eb9ddb88872cd3a1f6dc0e3/html5/thumbnails/17.jpg)
17
Flash/SSD• Most of these algorithms expressed considering disks• What about Flash/SSD?
• Faster but…• Coarse read/write granularity: pages (~8 kB or more)• Wear leveling
• A block can be overwritten only a finite number of times• Controller must spread writes evenly
![Page 18: Database Management Systems · Advantages of DBMSs •Abstract data representation •Relational model •Data storage is delegated to the DBMS •Functional query language (SQL)](https://reader033.vdocument.in/reader033/viewer/2022042214/5eb9ddb88872cd3a1f6dc0e3/html5/thumbnails/18.jpg)
18
Query Execution
![Page 19: Database Management Systems · Advantages of DBMSs •Abstract data representation •Relational model •Data storage is delegated to the DBMS •Functional query language (SQL)](https://reader033.vdocument.in/reader033/viewer/2022042214/5eb9ddb88872cd3a1f6dc0e3/html5/thumbnails/19.jpg)
19
Query Execution• SQL: Domain Specific Language
• Restricted language for specific task• In this case: relational algebra• Physical data independence: changes in physical storage are transparent to applications
• One SQL statement, many possible executions• Logical plan: reorder the operators• Physical plan: different ways to store the data
![Page 20: Database Management Systems · Advantages of DBMSs •Abstract data representation •Relational model •Data storage is delegated to the DBMS •Functional query language (SQL)](https://reader033.vdocument.in/reader033/viewer/2022042214/5eb9ddb88872cd3a1f6dc0e3/html5/thumbnails/20.jpg)
2020
ExampleSELECT e.cidFROM Enrollment e, Student sWHERE s.name= ’John Smith’
and e.sid = s.sid
TablesStudent(sid, name)Enrollment(sid, cid)
![Page 21: Database Management Systems · Advantages of DBMSs •Abstract data representation •Relational model •Data storage is delegated to the DBMS •Functional query language (SQL)](https://reader033.vdocument.in/reader033/viewer/2022042214/5eb9ddb88872cd3a1f6dc0e3/html5/thumbnails/21.jpg)
2121
Logical Plan• Plan in terms of relational operators
!".$%&
'(.)*+",-…-
⋈(.(%&,".(%&
Student Enrollment
!".$%&
'(.)*+",-…-
⋈(.(%&,".(%&
Student Enrollment
![Page 22: Database Management Systems · Advantages of DBMSs •Abstract data representation •Relational model •Data storage is delegated to the DBMS •Functional query language (SQL)](https://reader033.vdocument.in/reader033/viewer/2022042214/5eb9ddb88872cd3a1f6dc0e3/html5/thumbnails/22.jpg)
22
Physical Plan• Local plan annotated with physical operators
!".$%&
'(.)*+",-…-
⋈(.(%&,".(%&
Student Enrollment(File scan)
(Pipeline select and write temporary table T)
(Block nested loop join)
(File scan)
(Pipeline project) !".$%&
'(.)*+",-…-
⋈(.(%&,".(%&
Student Enrollment(Index lookup on name)
(Use index)
(Index nested loop join)
(Index lookup on sid)
(Pipeline project)
![Page 23: Database Management Systems · Advantages of DBMSs •Abstract data representation •Relational model •Data storage is delegated to the DBMS •Functional query language (SQL)](https://reader033.vdocument.in/reader033/viewer/2022042214/5eb9ddb88872cd3a1f6dc0e3/html5/thumbnails/23.jpg)
23
Log Structured Merge Trees
![Page 24: Database Management Systems · Advantages of DBMSs •Abstract data representation •Relational model •Data storage is delegated to the DBMS •Functional query language (SQL)](https://reader033.vdocument.in/reader033/viewer/2022042214/5eb9ddb88872cd3a1f6dc0e3/html5/thumbnails/24.jpg)
2424
How Good are B+ trees?• Q: Are they good for reading? Why?• Q: Are they good for writing? Why?
![Page 25: Database Management Systems · Advantages of DBMSs •Abstract data representation •Relational model •Data storage is delegated to the DBMS •Functional query language (SQL)](https://reader033.vdocument.in/reader033/viewer/2022042214/5eb9ddb88872cd3a1f6dc0e3/html5/thumbnails/25.jpg)
25
Log Structured Merge Trees• Increasingly popular data structure for DBMSs
• RocksDB (Facebook)• LevelDB (Google)
• Goals• Fast data ingestion• Leverage large memory for caching
• Problems• Write and read amplification
![Page 26: Database Management Systems · Advantages of DBMSs •Abstract data representation •Relational model •Data storage is delegated to the DBMS •Functional query language (SQL)](https://reader033.vdocument.in/reader033/viewer/2022042214/5eb9ddb88872cd3a1f6dc0e3/html5/thumbnails/26.jpg)
2626
LSMT Data Structures• Memtable
• Binary tree or skiplist à sorted• Receives writes and serves reads
• Log files (runs)• L0: dump of memtable• Li: merge of multiple Li-1 runs
![Page 27: Database Management Systems · Advantages of DBMSs •Abstract data representation •Relational model •Data storage is delegated to the DBMS •Functional query language (SQL)](https://reader033.vdocument.in/reader033/viewer/2022042214/5eb9ddb88872cd3a1f6dc0e3/html5/thumbnails/27.jpg)
2727
Operations• Writes go to memtable• Reads
• Search memtables and read caches (if available)• Search log files in reverse chronological order• Bloom filters – indices in log files
• Periodically dump memtable to L0• Periodically merge from Li-1 to Li