DATA ANALYTICS USING DEEP LEARNING GT 8803 // FALL 2019 // JOY ARULRAJ LECTURE #06: DISK-CENTRIC AND IN-MEMORY DATABASE SYSTEMS


ADMINISTRIVIA

• Project ideas
  – List shared on Piazza
  – Start looking for teammates!
  – Sign up for discussion slots during office hours

LAST CLASS

• History of DBMSs
  – In a way though, it really was a history of data models
• Data Models
  – Hierarchical data model (tree) (IMS)
  – Network data model (graph) (CODASYL)
  – Relational data model (tables) (System R, INGRES)
• Overarching theme about all these systems
  – They were all disk-based DBMSs

TODAY’S AGENDA

• Disk-centric DBMSs
• In-Memory DBMSs

DISK-CENTRIC DBMSs

ANATOMY OF A DATABASE SYSTEM

• Process Manager
  – Connection Manager + Admission Control
• Query Processor
  – Query Parser
  – Query Optimizer
  – Query Executor
• Transactional Storage Manager
  – Lock Manager (Concurrency Control)
  – Access Methods (or Indexes)
  – Buffer Pool Manager
  – Log Manager
• Shared Utilities
  – Memory Manager + Disk Manager
  – Networking Manager

Source: Anatomy of a Database System

ANATOMY OF A DATABASE SYSTEM

• Process Manager
  – Manages client connections
• Query Processor
  – Parse, plan and execute queries on top of storage manager
• Transactional Storage Manager
  – Knits together buffer management, concurrency control, logging and recovery
• Shared Utilities
  – Manage hardware resources across threads

TOPICS

• Implications of availability of large DRAM chips for database systems
  – Buffer Management
  – Query Processing
  – Concurrency Control
  – Logging and Recovery

BACKGROUND

• Much of the history of DBMSs is about dealing with the limitations of hardware.
• Hardware was very different when the original DBMSs were designed:
  – Uniprocessor (single-core CPU)
  – RAM was severely limited (a few MB).
  – The database had to be stored on disk.
  – Disk is slow. No seriously, I mean really slow.

BACKGROUND

• But now DRAM capacities are large enough that most databases can fit in memory.
  – Structured data sets are smaller (e.g., tables with numeric data).
  – Unstructured data sets are larger (e.g., videos).
• So why not just use a "traditional" disk-oriented DBMS with a really large cache?

DISK-ORIENTED DBMS OVERHEAD

Measured CPU instructions:

BUFFER POOL    34%
LATCHING       14%
LOCKING        16%
LOGGING        12%
B-TREE KEYS    16%
REAL WORK       7%

OLTP THROUGH THE LOOKING GLASS, AND WHAT WE FOUND THERE. SIGMOD, pp. 981-992, 2008.

BUFFER MANAGEMENT

• The primary storage location of the database is on non-volatile storage (e.g., SSD).
  – The database is stored in a file as a collection of fixed-length blocks called slotted pages on disk.
• The system uses a volatile in-memory buffer pool to cache blocks fetched from disk.
  – Its job is to manage the movement of those blocks back and forth between disk and memory.

BUFFER MANAGEMENT

• When a query accesses a page, the DBMS checks to see if that page is already in memory in a buffer pool.
  – If it’s not, then the DBMS has to retrieve it from disk and copy it into a free frame in the buffer pool.
  – If there are no free frames, then find a page to evict guided by the page replacement policy.
  – If the page being evicted is dirty, then the DBMS has to write it back to disk to ensure the durability (ACID) of data.
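The fetch, evict, and write-back steps above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular DBMS implements it: the `BufferPool` class, the dict standing in for the database file, the 4 KB page size, and the choice of LRU as the replacement policy are all assumptions made for the example.

```python
from collections import OrderedDict

PAGE_SIZE = 4096  # assumed page size for the example


class BufferPool:
    """Toy buffer pool with a fixed number of frames and LRU replacement."""

    def __init__(self, disk, num_frames):
        self.disk = disk              # dict page_id -> bytes, stands in for the database file
        self.num_frames = num_frames
        self.frames = OrderedDict()   # page_id -> bytearray, least recently used first
        self.dirty = set()            # page ids modified since they were loaded
        self.disk_reads = 0
        self.disk_writes = 0

    def fetch(self, page_id):
        if page_id in self.frames:                # page already in memory
            self.frames.move_to_end(page_id)      # mark as most recently used
            return self.frames[page_id]
        if len(self.frames) >= self.num_frames:   # no free frame: evict one
            self._evict()
        self.disk_reads += 1
        self.frames[page_id] = bytearray(self.disk[page_id])
        return self.frames[page_id]

    def mark_dirty(self, page_id):
        self.dirty.add(page_id)

    def _evict(self):
        victim, data = self.frames.popitem(last=False)  # least recently used page
        if victim in self.dirty:                        # dirty page: write back first
            self.disk[victim] = bytes(data)
            self.disk_writes += 1
            self.dirty.discard(victim)


disk = {pid: bytes([pid]) * PAGE_SIZE for pid in range(4)}
pool = BufferPool(disk, num_frames=2)
pool.fetch(0)[0] = 99   # modify page 0 in memory
pool.mark_dirty(0)
pool.fetch(1)
pool.fetch(2)           # pool is full, so page 0 is evicted and written back
```

After the last call, page 0 has been flushed to `disk` with the modified byte, and the pool holds pages 1 and 2.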

BUFFER MANAGEMENT

• Page replacement policy is a differentiating factor between open-source and commercial DBMSs.
  – What kind of data does it contain?
  – Is the page dirty?
  – How likely is the page to be accessed in the near future?
  – Examples: LRU, LFU, CLOCK, ARC

BUFFER MANAGEMENT

• Once the page is in memory, the DBMS translates any on-disk addresses to their in-memory addresses.

Page Identifier [#100] → Page Pointer [0x5050]

BUFFER MANAGEMENT

[Figure: animation of a tuple lookup. Components shown: Index (yields Page Id + Slot #), Page Table, Buffer Pool (holding page6 and page4, later joined by page2 and then by page1), and Database (On-Disk) with Slotted Pages page0, page1, page2.]

BUFFER MANAGEMENT

• Every tuple access has to go through the buffer pool manager regardless of whether that data will always be in memory.
  – Always have to translate a tuple’s record id to its memory location.
  – Worker thread has to pin pages that it needs to make sure that they are not swapped to disk.
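A rough sketch of the translation and pinning overhead described above. The names `Frame`, `PinnedPool`, and `read_tuple` are hypothetical, a record id is assumed to be a (page id, slot number) pair, and eviction itself is left out; the point is only that every tuple read goes through a page-table lookup and a pin/unpin pair.

```python
class Frame:
    """An in-memory copy of a page plus its pin count."""

    def __init__(self, slots):
        self.slots = slots      # list of tuples; list index = slot number
        self.pin_count = 0


class PinnedPool:
    def __init__(self, disk_pages):
        self.disk_pages = disk_pages   # page_id -> list of tuples (stand-in for disk)
        self.page_table = {}           # page_id -> Frame

    def pin(self, page_id):
        frame = self.page_table.get(page_id)
        if frame is None:              # not in memory yet: load it
            frame = Frame(list(self.disk_pages[page_id]))
            self.page_table[page_id] = frame
        frame.pin_count += 1
        return frame

    def unpin(self, page_id):
        self.page_table[page_id].pin_count -= 1

    def evictable(self):
        """Only pages with no pins may be swapped out."""
        return sorted(p for p, f in self.page_table.items() if f.pin_count == 0)

    def read_tuple(self, record_id):
        page_id, slot = record_id      # record id = page id + slot number
        frame = self.pin(page_id)      # translate and pin before touching the data
        try:
            return frame.slots[slot]
        finally:
            self.unpin(page_id)        # release the pin once done


pool = PinnedPool({100: [("a", 1), ("b", 2)], 101: [("c", 3)]})
pool.pin(101)                               # a worker is still using page 101
value = pool.read_tuple((100, 1))
evictable_while_pinned = pool.evictable()   # page 101 is protected by its pin
pool.unpin(101)
```

Even when both pages stay in memory forever, each `read_tuple` call still pays for the lookup and the pin bookkeeping.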

BUFFER MANAGEMENT

• Q: What do we gain by managing an in-memory buffer?
  – A: Accelerate query processing by storing frequently-accessed pages in fast memory.
• Q: Can we “learn” an optimal page replacement policy?
  – A: Recent paper from Google on learning memory accesses based on LSTM models.

QUERY PROCESSING

• Tuple-at-a-time
  → Each operator calls next on their child to get the next tuple to process.
• Operator-at-a-time
  → Each operator materializes their entire output for their parent operator.
• Vector-at-a-time
  → Each operator calls next on their child to get the next chunk of data to process.

SELECT A.id, B.value
FROM A, B
WHERE A.id = B.id
AND B.value > 100

[Figure: query plan over tables A and B with operators ⋈ (A.id = B.id), σ (value > 100), and π (A.id, B.value).]

QUERY PROCESSING

• The best strategy for executing a query plan in a disk-centric DBMS
  – Sequential scans over a table are much faster than random accesses.
• The traditional tuple-at-a-time iterator model works well
  – Because the full output of an operator may not fit in limited memory.
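The tuple-at-a-time iterator model maps naturally onto Python generators: each operator pulls one tuple at a time from its child. The sketch below runs the example query (A joined with B on id, keeping B.value > 100). The operator names, the dict-based rows, and the choice of a hash join with the filter pushed below the join are assumptions for illustration.

```python
def scan(table):
    """Leaf operator: emit one row at a time."""
    for row in table:
        yield row


def select(child, predicate):
    """Pass through only rows that satisfy the predicate."""
    for row in child:
        if predicate(row):
            yield row


def hash_join(left, right, left_key, right_key):
    """Build a hash table on the left input, then probe it one right row at a time."""
    table = {}
    for left_row in left:
        table.setdefault(left_row[left_key], []).append(left_row)
    for right_row in right:
        for left_row in table.get(right_row[right_key], []):
            joined = {f"A.{k}": v for k, v in left_row.items()}
            joined.update({f"B.{k}": v for k, v in right_row.items()})
            yield joined


def project(child, columns):
    for row in child:
        yield tuple(row[c] for c in columns)


A = [{"id": 1}, {"id": 2}, {"id": 3}]
B = [{"id": 1, "value": 50}, {"id": 2, "value": 150}, {"id": 3, "value": 300}]

plan = project(
    hash_join(scan(A), select(scan(B), lambda r: r["value"] > 100), "id", "id"),
    ["A.id", "B.value"],
)
result = list(plan)   # pulling from the root drives the whole plan
```

Only the hash table for the build side is held in memory; probe-side rows flow through the plan one at a time, which is why this model suits a system with limited memory.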

CONCURRENCY CONTROL

• In a disk-oriented DBMS, the system assumes that a txn could stall at any time when it tries to access data that is not in memory.

CONCURRENCY CONTROL

• Execute other txns at the same time so that if one txn stalls then others can keep running.
  – This is not because the DBMS is trying to use all cores in the CPU (still focusing on single-core CPUs).
  – We do this to let the system make forward progress by executing another txn while the current txn is waiting for data to be fetched from disk.

CONCURRENCY CONTROL

• Concurrency control policy
  – Responsible for deciding how to interleave operations of concurrent transactions in such a way that it appears as if they are running serially.
  – This property is referred to as serializability of transactions.

CONCURRENCY CONTROL

• Concurrency control policy
  – DBMS has to set locks and latches to ensure the highest level of isolation (ACID) between transactions.
  – Locks are stored in a separate data structure (lock table) to avoid being swapped to disk.
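A lock table of the kind mentioned above can be sketched as a dictionary kept apart from the data pages. Everything here is a simplifying assumption: the `LockTable` class, the two modes (shared "S" and exclusive "X"), and the non-blocking `try_lock` that returns False instead of queueing the requester.

```python
from collections import defaultdict

# Only two shared locks on the same object are compatible.
COMPATIBLE = {("S", "S")}


class LockTable:
    def __init__(self):
        self.held = defaultdict(dict)   # object name -> {txn id: mode}

    def try_lock(self, txn, obj, mode):
        """Grant the lock if it is compatible with every other holder."""
        holders = self.held[obj]
        for other, other_mode in holders.items():
            if other != txn and (other_mode, mode) not in COMPATIBLE:
                return False            # conflict: caller must wait or abort
        if holders.get(txn) != "X":     # keep an X lock; otherwise grant or upgrade
            holders[txn] = mode
        return True

    def release_all(self, txn):
        """Release every lock held by a txn (e.g., at commit or abort)."""
        for holders in self.held.values():
            holders.pop(txn, None)


locks = LockTable()
a = locks.try_lock("T1", "row:1", "S")   # granted
b = locks.try_lock("T2", "row:1", "S")   # granted, shared with T1
c = locks.try_lock("T2", "row:1", "X")   # denied while T1 holds S
locks.release_all("T1")
d = locks.try_lock("T2", "row:1", "X")   # upgrade now succeeds
```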

LOGGING & RECOVERY

• The logging and recovery protocol helps ensure the atomicity and durability properties (ACID).
  – Durability: Changes made by committed transactions must be present in the database after recovering from a power failure.
  – Atomicity: Changes made by uncommitted (in-progress/aborted) transactions must not be present in the database after recovering from a power failure.

LOGGING & RECOVERY

• DBMSs use STEAL and NO-FORCE buffer pool management policies.
  – STEAL: DBMS can flush pages dirtied by uncommitted transactions to disk.
  – NO-FORCE: DBMS is not required to flush all pages dirtied by committed transactions to disk.
  – So all page modifications have to be flushed to the write-ahead log (WAL) before a txn can commit.

LOGGING & RECOVERY

• Each log entry contains the before and after images of modified tuples.
  – STEAL: Modifications made by uncommitted transactions that are flushed to disk have to be rolled back.
  – NO-FORCE: Modifications made by committed transactions might not have been flushed to disk.

LOGGING & RECOVERY

• Each log entry contains the before and after images of modified tuples.
  – Recording the before and after images in the log is critical to ensuring atomicity and durability.
  – Lots of work to keep track of log sequence numbers (LSNs) all throughout the DBMS.
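To see why both images are needed under STEAL + NO-FORCE, here is a deliberately simplified recovery sketch. It is not ARIES: there are no LSNs or checkpoints, the log is a plain list of tuples in an assumed format, and it assumes an uncommitted txn never overwrote a value that a later committed txn also wrote (as strict locking would guarantee).

```python
def recover(disk, log):
    """Bring `disk` (key -> value) to a consistent state using the log.

    Log records (assumed format):
      ("UPDATE", txn, key, before_image, after_image)
      ("COMMIT", txn)
    """
    committed = {rec[1] for rec in log if rec[0] == "COMMIT"}

    # Redo pass (needed because of NO-FORCE): reapply after-images of
    # committed txns, since their pages may never have reached disk.
    for rec in log:
        if rec[0] == "UPDATE" and rec[1] in committed:
            _, _, key, _, after = rec
            disk[key] = after

    # Undo pass (needed because of STEAL): restore before-images of
    # uncommitted txns, newest first, since their pages may be on disk.
    for rec in reversed(log):
        if rec[0] == "UPDATE" and rec[1] not in committed:
            _, _, key, before, _ = rec
            disk[key] = before

    return disk


log = [
    ("UPDATE", "T1", "A", 10, 20),
    ("UPDATE", "T2", "B", 5, 7),
    ("COMMIT", "T1"),
]
# State at the crash: T1's committed change to A was never flushed (NO-FORCE),
# while T2's uncommitted change to B was flushed (STEAL).
disk_at_crash = {"A": 10, "B": 7}
recovered = recover(dict(disk_at_crash), log)
```

After recovery, A carries T1's committed value and B is back to its value before T2 touched it.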

LOGGING & RECOVERY

• Q: What would happen if we use a NO-STEAL policy?
  – A: Cannot support large transactions that make changes larger than the buffer pool.
• Q: What would happen if we use a FORCE policy?
  – A: Performance would drop by orders of magnitude since the DBMS would need to randomly write to disk all the time.

TAKEAWAYS

• Disk-oriented DBMSs do a lot of extra stuff because they are predicated on the assumption that data has to reside on disk.
• In-memory DBMSs maximize performance by optimizing these protocols and algorithms.

IN-MEMORY DBMSs

IN-MEMORY DBMSs

• Assume that the primary storage location of the database is permanently in memory.
• Early ideas proposed in the 1980s but it is now feasible because DRAM prices are low and capacities are high.

BOTTLENECKS

• If I/O is no longer the slowest resource, much of the DBMS’s architecture will have to change to account for other bottlenecks:
  – Locking/latching
  – Cache misses
  – Predicate evaluations
  – Data movement & copying
  – Networking (between application & DBMS)

STORAGE ACCESS LATENCIES

                L3       DRAM    SSD          HDD
Read Latency    ~20 ns   60 ns   25,000 ns    10,000,000 ns
Write Latency   ~20 ns   60 ns   300,000 ns   10,000,000 ns

LET’S TALK ABOUT STORAGE & RECOVERY METHODS FOR NON-VOLATILE MEMORY DATABASE SYSTEMS. SIGMOD, pp. 707-722, 2015.

STORAGE ACCESS LATENCIES

• Jim Gray’s analogy:
  → Reading from L3 cache: Reading a book on a table
  → Reading from HDD: Flying to Pluto to read that book
• Because everything fits in DRAM, we can do more sophisticated things in software.
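One rough way to get a feel for these gaps is to rescale the read latencies from the table so that an L3 read takes one second. The snippet below only does that arithmetic on the numbers quoted in the slides; the one-second scale is an arbitrary choice for illustration.

```python
# Read latencies in nanoseconds, as quoted in the latency table.
read_latency_ns = {"L3": 20, "DRAM": 60, "SSD": 25_000, "HDD": 10_000_000}

# Rescale so that one L3 read takes one "second".
scaled_seconds = {
    name: ns / read_latency_ns["L3"] for name, ns in read_latency_ns.items()
}

ssd_minutes = scaled_seconds["SSD"] / 60
hdd_days = scaled_seconds["HDD"] / (60 * 60 * 24)

for name, seconds in scaled_seconds.items():
    print(f"{name:>4}: {seconds:>10,.0f} s")
print(f"SSD is about {ssd_minutes:.0f} minutes, HDD about {hdd_days:.1f} days")
```

On this scale a DRAM read takes 3 seconds, while an HDD read takes several days.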

BUFFER MANAGEMENT

• An in-memory DBMS does not need to store the database in slotted pages but it will still organize tuples in blocks:
  – Direct memory pointers vs. tuple identifiers
  – Separate pools for fixed-length (e.g., numeric data) and variable-length data (e.g., images)
  – Use checksums to detect software errors that trash the database.

BUFFER MANAGEMENT

[Figure: Index, Fixed-Length Data Blocks, and Variable-Length Data Blocks; the index entry holds a Memory Address.]
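The layout in the figure can be approximated in Python as follows. Since Python has no raw pointers, byte offsets stand in for memory addresses; the `Table` class, the row layout `ROW`, and the single growable block per pool are assumptions made for the sketch.

```python
import struct

# Fixed-length row: id (int64), offset into the var-len pool (int64), length (uint32).
ROW = struct.Struct("<qqI")


class Table:
    def __init__(self):
        self.fixed = bytearray()    # fixed-length data block
        self.varlen = bytearray()   # variable-length data pool
        self.index = {}             # key -> offset of the row in `fixed` (the "address")

    def insert(self, key, payload):
        var_off = len(self.varlen)
        self.varlen += payload                       # append the variable-length value
        row_off = len(self.fixed)
        self.fixed += ROW.pack(key, var_off, len(payload))
        self.index[key] = row_off                    # index stores the location directly

    def get(self, key):
        row_off = self.index[key]                    # no page table, no slot lookup
        _, var_off, length = ROW.unpack_from(self.fixed, row_off)
        return bytes(self.varlen[var_off:var_off + length])


t = Table()
t.insert(1, b"hello")
t.insert(2, b"in-memory")
second = t.get(2)
```

Compared with the disk-oriented path, a lookup here follows the index straight to the row without translating a record id through a page table.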

BUFFER MANAGEMENT

• DRAM is fast, but not all data is accessed with the same frequency or in the same manner.
  – Hot Data: OLTP Operations (Tweets posted yesterday)
  – Cold Data: OLAP Queries (Tweets posted last year)
• We will study techniques for how to bring back disk-resident data without slowing down the entire system.


QUERY PROCESSING

• The best strategy for executing a query plan in a DBMS changes when all of the data is already in memory.
  – Sequential scans are no longer significantly faster than random access.
• The traditional tuple-at-a-time iterator model is too slow because of function calls.
  – This problem is more significant in OLAP DBMSs.
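The vector-at-a-time model keeps the pull-based structure but amortizes each call over a chunk of rows. A minimal sketch, with hypothetical operator names and an arbitrary vector size of 4:

```python
def scan_vectors(rows, vector_size):
    """Leaf operator: emit chunks of up to `vector_size` rows per call."""
    for start in range(0, len(rows), vector_size):
        yield rows[start:start + vector_size]


def select_vectors(child, predicate):
    """Filter a whole chunk per call instead of one tuple per call."""
    for batch in child:
        out = [row for row in batch if predicate(row)]
        if out:                      # skip chunks with no qualifying rows
            yield out


rows = [{"id": i, "value": i * 30} for i in range(10)]
batches = list(select_vectors(scan_vectors(rows, 4), lambda r: r["value"] > 100))
batch_ids = [[r["id"] for r in batch] for batch in batches]
```

With ten input rows the scan is asked for data three times rather than ten, so the per-call overhead is spread across each chunk.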

Page 69: DATA ANALYTICS USING DEEP LEARNINGjarulraj/courses/8803-f19/slides/06-in... · DATA ANALYTICS USING DEEP LEARNING GT 8803 // FALL 2019 // JOY ARULRAJ LECTURE #06: DISK- CENTRIC AND

GT 8803 // Fall 2019

Q U E R Y P R O C E S S I N G

69

Page 70: DATA ANALYTICS USING DEEP LEARNINGjarulraj/courses/8803-f19/slides/06-in... · DATA ANALYTICS USING DEEP LEARNING GT 8803 // FALL 2019 // JOY ARULRAJ LECTURE #06: DISK- CENTRIC AND

GT 8803 // Fall 2019

QUERY PROCESSING

• Q: Query processing in in-memory systems: sequential scans or random accesses?

– A: Sequential scans are no longer significantly faster than random access.

• Q: Will the traditional tuple-at-a-time iterator work well now?

– A: No, too slow because of function calls (virtual table lookups).
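To make the function-call argument concrete, the sketch below counts how many next() calls a consumer issues when it pulls one row per call versus one batch per call. The `BatchScan` class, the `drain` helper, and the row counts are hypothetical names and values chosen for illustration. Python has no C++-style virtual tables, so the call counter is only a stand-in for per-call dispatch overhead.

```python
from typing import List, Optional


class BatchScan:
    """Scan operator that returns up to batch_size rows per next() call."""

    def __init__(self, rows: List[int], batch_size: int) -> None:
        self.rows = rows
        self.batch_size = batch_size
        self.pos = 0
        self.calls = 0  # number of next() calls made by the parent

    def next(self) -> Optional[List[int]]:
        self.calls += 1
        if self.pos >= len(self.rows):
            return None  # end of input
        batch = self.rows[self.pos:self.pos + self.batch_size]
        self.pos += len(batch)
        return batch


def drain(op: BatchScan) -> int:
    """Parent operator: sums every value produced by the child."""
    total = 0
    while True:
        batch = op.next()
        if batch is None:
            return total
        total += sum(batch)


rows = list(range(1000))
tuple_scan = BatchScan(rows, batch_size=1)     # tuple-at-a-time
vector_scan = BatchScan(rows, batch_size=100)  # vector-at-a-time
tuple_total = drain(tuple_scan)
vector_total = drain(vector_scan)
```

Both scans produce the same total, but the batch size of 1 needs one call per row plus a final end-of-input call, while the batch size of 100 needs roughly a hundredth as many.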

CONCURRENCY CONTROL

• Observation: The cost of a txn acquiring a lock is the same as the cost of accessing data (since the lock data is also in memory).

• An in-memory DBMS may want to detect conflicts at a different granularity.

– Fine-grained locking allows for better concurrency but requires more locks.

– Coarse-grained locking requires fewer locks but limits the amount of concurrency.
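The granularity trade-off above can be illustrated with a toy lock table. This is a sketch under simplifying assumptions: exclusive locks only, no waiting or deadlock handling, and hypothetical names (`LockTable`, `run`, the `accounts` table, txns `T1` and `T2`).

```python
from typing import Dict, Hashable, Tuple


class LockTable:
    """Maps a lock key to the txn that holds it (exclusive locks only)."""

    def __init__(self, granularity: str) -> None:
        if granularity not in ("tuple", "table"):
            raise ValueError("granularity must be 'tuple' or 'table'")
        self.granularity = granularity
        self.owners: Dict[Hashable, str] = {}

    def _key(self, table: str, tuple_id: int) -> Hashable:
        # Fine-grained: one lock per tuple. Coarse-grained: one lock per table.
        return (table, tuple_id) if self.granularity == "tuple" else table

    def acquire(self, txn: str, table: str, tuple_id: int) -> bool:
        key = self._key(table, tuple_id)
        owner = self.owners.setdefault(key, txn)
        return owner == txn  # False means a conflict with another txn


def run(granularity: str) -> Tuple[bool, bool, int]:
    locks = LockTable(granularity)
    # T1 writes tuples 1 and 2; T2 writes tuples 3 and 4 of the same table.
    t1_ok = all(locks.acquire("T1", "accounts", i) for i in (1, 2))
    t2_ok = all(locks.acquire("T2", "accounts", i) for i in (3, 4))
    return t1_ok, t2_ok, len(locks.owners)
```

With tuple granularity both txns proceed and four locks are held; with table granularity only one lock exists, but T2 conflicts with T1 even though they touch disjoint tuples.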

CONCURRENCY CONTROL

• The DBMS can store locking information about each tuple together with its data.

– This helps with CPU cache locality.

– Mutexes are too slow. Need to use CAS (compare-and-swap) instructions.
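The idea of a lock word stored next to the tuple and acquired with compare-and-swap can be sketched as below. Python exposes no hardware CAS primitive, so `AtomicWord` is a simulation that uses a private guard to make the operation atomic; a real engine would use an atomic instruction instead (for example through `std::atomic` in C++). All names here are hypothetical.

```python
import threading
from typing import Dict, List, Tuple


class AtomicWord:
    """Simulated atomic integer; the guard stands in for a hardware CAS."""

    def __init__(self, value: int = 0) -> None:
        self._value = value
        self._guard = threading.Lock()  # simulation detail only

    def compare_and_swap(self, expected: int, new: int) -> bool:
        with self._guard:
            if self._value == expected:
                self._value = new
                return True
            return False

    def load(self) -> int:
        with self._guard:
            return self._value


class TupleSlot:
    """Tuple whose lock word lives alongside its data."""

    def __init__(self, data: Dict[str, int]) -> None:
        self.lock_word = AtomicWord(0)  # 0 = unlocked, otherwise owner txn id
        self.data = data


def try_lock(slot: TupleSlot, txn_id: int) -> bool:
    return slot.lock_word.compare_and_swap(0, txn_id)


def unlock(slot: TupleSlot, txn_id: int) -> bool:
    return slot.lock_word.compare_and_swap(txn_id, 0)


def race(n_threads: int = 8) -> Tuple[TupleSlot, List[int]]:
    """Several txns try to lock the same tuple once; nobody unlocks."""
    slot = TupleSlot({"id": 1, "value": 150})
    winners: List[int] = []

    def worker(txn_id: int) -> None:
        if try_lock(slot, txn_id):
            winners.append(txn_id)

    threads = [threading.Thread(target=worker, args=(i + 1,)) for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return slot, winners
```

Exactly one txn wins the race, and a losing txn learns that immediately from the failed swap rather than blocking on a mutex.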

CONCURRENCY CONTROL

• Disk-oriented DBMSs

– Stalling during disk I/O

• Memory-oriented DBMSs

– New bottleneck is contention caused by txns executing on multiple cores trying to access data at the same time.

LOGGING & RECOVERY

• The DBMS still needs a write-ahead log (WAL) on disk since the system could halt at any time.

– Use group commit to batch log entries and flush them together to amortize fsync cost.

– May be possible to use more lightweight logging schemes (e.g., only store redo information, NO-STEAL).

– But since there are no "dirty" pages, there is no need to maintain log sequence numbers (LSNs) all throughout the system.
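Group commit can be sketched as a log writer that buffers records and issues one fsync per batch. This is a minimal single-threaded illustration with hypothetical names (`GroupCommitLog`, `demo`) and a made-up record format. A production implementation would typically also flush on a timeout and make each txn wait for its batch to be durable before acknowledging the commit.

```python
import os
import tempfile
from typing import List, Tuple


class GroupCommitLog:
    """Buffers redo records and flushes them with one fsync per group."""

    def __init__(self, path: str, group_size: int) -> None:
        self.fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
        self.group_size = group_size
        self.pending: List[bytes] = []
        self.fsync_calls = 0

    def append(self, record: str) -> None:
        self.pending.append(record.encode() + b"\n")
        if len(self.pending) >= self.group_size:
            self.flush()

    def flush(self) -> None:
        if not self.pending:
            return
        os.write(self.fd, b"".join(self.pending))  # one write for the batch
        os.fsync(self.fd)                          # one fsync for the batch
        self.fsync_calls += 1
        self.pending.clear()

    def close(self) -> None:
        self.flush()  # flush any partial group before closing
        os.close(self.fd)


def demo(n_records: int = 10, group_size: int = 4) -> Tuple[int, int]:
    """Returns (number of fsync calls, number of records on disk)."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "wal.log")
        log = GroupCommitLog(path, group_size)
        for i in range(n_records):
            log.append(f"REDO txn={i} set value={i * 10}")
        log.close()
        with open(path, "rb") as f:
            lines = f.read().splitlines()
    return log.fsync_calls, len(lines)
```

With ten records and a group size of four, the log is synced three times instead of ten; a group size of one degenerates to one fsync per record.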

LOGGING & RECOVERY

• The system also still takes checkpoints to speed up recovery time.

• Different methods for checkpointing:

– Old idea: Maintain a second copy of the database in memory that is updated by replaying the WAL.

– Switch to a special “copy-on-write” mode and then write a dump of the database to disk.

– Fork the DBMS process and then have the child process write its contents to disk (using virtual memory).
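The fork-based method can be sketched as follows, assuming a POSIX system where `os.fork` is available. The in-memory "database" is just a dict, the JSON checkpoint format and the function names are hypothetical, and the sketch ignores coordination with the log and with in-flight txns.

```python
import json
import os
import tempfile
from typing import Dict, Tuple


def fork_checkpoint(db: Dict[str, int], path: str) -> int:
    """Forks; the child dumps its view of db to path. Returns the child pid."""
    pid = os.fork()
    if pid == 0:
        # Child: its address space is a copy-on-write snapshot taken at fork.
        status = 1
        try:
            with open(path, "w") as f:
                json.dump(db, f)
                f.flush()
                os.fsync(f.fileno())
            status = 0
        finally:
            os._exit(status)  # exit without running the parent's cleanup code
    return pid


def demo() -> Tuple[Dict[str, int], Dict[str, int], int]:
    db = {"1": 50, "2": 150}
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "checkpoint.json")
        child = fork_checkpoint(db, path)
        db["2"] = 999                    # parent keeps processing txns
        _, status = os.waitpid(child, 0)
        with open(path) as f:
            snapshot = json.load(f)
    return snapshot, db, status
```

The checkpoint reflects the database as of the fork, while the parent's later update is visible only in its own memory. The operating system's copy-on-write pages do the snapshotting, which is what "using virtual memory" refers to.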

SUMMARY

• Disk-oriented DBMSs are a relic of the past.

– Most structured databases fit entirely in DRAM on a single machine.

• The world has finally become comfortable with in-memory data storage and processing.

ANATOMY OF A DATABASE SYSTEM

[Figure: DBMS architecture diagram with a "Query" input]

• Components: Connection Manager + Admission Control; Query Parser; Query Optimizer; Query Executor; Lock Manager (Concurrency Control); Access Methods (or Indexes); Buffer Pool Manager; Log Manager; Memory Manager + Disk Manager; Networking Manager

• Group labels: Process Manager; Query Processor; Transactional Storage Manager; Shared Utilities

Source: Anatomy of a Database System

NEXT LECTURE

• Data Storage

• Assigned Reading

– BlazeIt: Fast Exploratory Video Queries using Neural Networks
