basic db architectures & layouts - harvard...

73
basic db architectures & layouts prof. Stratos Idreos HTTP://DASLAB.SEAS.HARVARD.EDU/CLASSES/CS165/ class 4

Upload: others

Post on 04-Aug-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: basic db architectures & layouts - Harvard Universitydaslab.seas.harvard.edu/.../CS165Fall2016Class4.pdf · CS165, Fall 2016 Stratos Idreos 2 /40 videos for sections 3 & 4 are online

basic db architectures & layoutsprof. Stratos Idreos

HTTP://DASLAB.SEAS.HARVARD.EDU/CLASSES/CS165/

class 4

Page 2: basic db architectures & layouts - Harvard Universitydaslab.seas.harvard.edu/.../CS165Fall2016Class4.pdf · CS165, Fall 2016 Stratos Idreos 2 /40 videos for sections 3 & 4 are online

CS165, Fall 2016 Stratos Idreos /402

videos for sections 3 & 4 are onlinecheck back every week (1-2 sections weekly)

there is a schedule but more will be added as we go

Page 3: basic db architectures & layouts - Harvard Universitydaslab.seas.harvard.edu/.../CS165Fall2016Class4.pdf · CS165, Fall 2016 Stratos Idreos 2 /40 videos for sections 3 & 4 are online

CS165, Fall 2016 Stratos Idreos /403

database kernel

optimizer

parser

execution

storage buffer pool

in/out

thread pool

transactions

disk

memory

cpu

applications

sql

from last time

Page 4: basic db architectures & layouts - Harvard Universitydaslab.seas.harvard.edu/.../CS165Fall2016Class4.pdf · CS165, Fall 2016 Stratos Idreos 2 /40 videos for sections 3 & 4 are online

CS165, Fall 2016 Stratos Idreos /404

query plan

database kernel

data data data

algo

rithm

s/op

erat

ors

Page 5: basic db architectures & layouts - Harvard Universitydaslab.seas.harvard.edu/.../CS165Fall2016Class4.pdf · CS165, Fall 2016 Stratos Idreos 2 /40 videos for sections 3 & 4 are online

CS165, Fall 2016 Stratos Idreos /405

select name from student

where GPA>3.0

select GPA>3.0

project name

student(id,name,GPA,address,class,…)

result

Page 6: basic db architectures & layouts - Harvard Universitydaslab.seas.harvard.edu/.../CS165Fall2016Class4.pdf · CS165, Fall 2016 Stratos Idreos 2 /40 videos for sections 3 & 4 are online

CS165, Fall 2016 Stratos Idreos /405

select name from student

where GPA>3.0

select GPA>3.0

project name

student(id,name,GPA,address,class,…)

result

scan all the data?

Page 7: basic db architectures & layouts - Harvard Universitydaslab.seas.harvard.edu/.../CS165Fall2016Class4.pdf · CS165, Fall 2016 Stratos Idreos 2 /40 videos for sections 3 & 4 are online

CS165, Fall 2016 Stratos Idreos /405

select name from student

where GPA>3.0

select GPA>3.0

project name

student(id,name,GPA,address,class,…)

result

scan all the data?

logical plan

Page 8: basic db architectures & layouts - Harvard Universitydaslab.seas.harvard.edu/.../CS165Fall2016Class4.pdf · CS165, Fall 2016 Stratos Idreos 2 /40 videos for sections 3 & 4 are online

CS165, Fall 2016 Stratos Idreos /406

select name from student

where GPA>3.0

scan GPA>3.0

project name

result

studentsstudents

index scan GPA>3.0

project name

result

physical plans

Page 9: basic db architectures & layouts - Harvard Universitydaslab.seas.harvard.edu/.../CS165Fall2016Class4.pdf · CS165, Fall 2016 Stratos Idreos 2 /40 videos for sections 3 & 4 are online

CS165, Fall 2016 Stratos Idreos /407

select avg(GPA) from student

where class=2017

select year=2017

project GPA

student(id,name,GPA,address,class,…)

avg GPA

result

Page 10: basic db architectures & layouts - Harvard Universitydaslab.seas.harvard.edu/.../CS165Fall2016Class4.pdf · CS165, Fall 2016 Stratos Idreos 2 /40 videos for sections 3 & 4 are online

CS165, Fall 2016 Stratos Idreos /407

select avg(GPA) from student

where class=2017

select year=2017

project GPA

student(id,name,GPA,address,class,…)

avg GPA

result

data layout

algorithms/ operators

interfaces

data flow

Page 11: basic db architectures & layouts - Harvard Universitydaslab.seas.harvard.edu/.../CS165Fall2016Class4.pdf · CS165, Fall 2016 Stratos Idreos 2 /40 videos for sections 3 & 4 are online

CS165, Fall 2016 Stratos Idreos /408

professor(id,name,…)

course(id,name, profId,…)

student(id,name,…)

give me all students enrolled in cs165select student.name from student, enrolled, course where course.name=“cs165” and enrolled.courseId=course.id and student.id=enrolled.studentId

enrolled(studentId,

courseId,…)

Page 12: basic db architectures & layouts - Harvard Universitydaslab.seas.harvard.edu/.../CS165Fall2016Class4.pdf · CS165, Fall 2016 Stratos Idreos 2 /40 videos for sections 3 & 4 are online

CS165, Fall 2016 Stratos Idreos /409

select course.name=“cs165”

join enrolled.courseid=course.id

student enrolled course

join student.id=enrolled.studentid

project student.name

good plan

select student.name from student, enrolled, course where course.name=“cs165” and enrolled.courseId=course.id and student.id=enrolled.studentId

Page 13: basic db architectures & layouts - Harvard Universitydaslab.seas.harvard.edu/.../CS165Fall2016Class4.pdf · CS165, Fall 2016 Stratos Idreos 2 /40 videos for sections 3 & 4 are online

CS165, Fall 2016 Stratos Idreos /4010

select course.name=“cs165”

join enrolled.courseid=course.id

student enrolled course

join student.id=enrolled.studentid

project student.name

select student.name from student, enrolled, course where course.name=“cs165” and enrolled.courseId=course.id and student.id=enrolled.studentId

pushing selects down

Page 14: basic db architectures & layouts - Harvard Universitydaslab.seas.harvard.edu/.../CS165Fall2016Class4.pdf · CS165, Fall 2016 Stratos Idreos 2 /40 videos for sections 3 & 4 are online

CS165, Fall 2016 Stratos Idreos /4011

select min(A) from R where B<10 and C<80

internal languagelogical plan

optimizer rules/cost model/statistics

physical plan execution

internal language

Page 15: basic db architectures & layouts - Harvard Universitydaslab.seas.harvard.edu/.../CS165Fall2016Class4.pdf · CS165, Fall 2016 Stratos Idreos 2 /40 videos for sections 3 & 4 are online

CS165, Fall 2016 Stratos Idreos /4011

select min(A) from R where B<10 and C<80

internal languagelogical plan

optimizer rules/cost model/statistics

physical plan execution

internal language

Page 16: basic db architectures & layouts - Harvard Universitydaslab.seas.harvard.edu/.../CS165Fall2016Class4.pdf · CS165, Fall 2016 Stratos Idreos 2 /40 videos for sections 3 & 4 are online

CS165, Fall 2016 Stratos Idreos /4011

select min(A) from R where B<10 and C<80

internal languagelogical plan

optimizer rules/cost model/statistics

physical plan execution

internal language

project interface/level of abstraction

Page 17: basic db architectures & layouts - Harvard Universitydaslab.seas.harvard.edu/.../CS165Fall2016Class4.pdf · CS165, Fall 2016 Stratos Idreos 2 /40 videos for sections 3 & 4 are online

CS165, Fall 2016 Stratos Idreos /4012

optimizer

execution

storage

tuning

db kernel

can DBAs make wrong decisions?

can optimizers make

wrong decisions?

Page 18: basic db architectures & layouts - Harvard Universitydaslab.seas.harvard.edu/.../CS165Fall2016Class4.pdf · CS165, Fall 2016 Stratos Idreos 2 /40 videos for sections 3 & 4 are online

CS165, Fall 2016 Stratos Idreos /4013

memory hierarchy

data layouts

column-stores basics

Page 19: basic db architectures & layouts - Harvard Universitydaslab.seas.harvard.edu/.../CS165Fall2016Class4.pdf · CS165, Fall 2016 Stratos Idreos 2 /40 videos for sections 3 & 4 are online

CS165, Fall 2016 Stratos Idreos /4014

system where db runs

applications

sql

caches

cpu - cpu - cpu - cpu

memory

disk - disk - disk - disk

cpu registers

smal

ler/f

aste

r

+ flash

+ non volatile memory

memory hierarchy

Page 20: basic db architectures & layouts - Harvard Universitydaslab.seas.harvard.edu/.../CS165Fall2016Class4.pdf · CS165, Fall 2016 Stratos Idreos 2 /40 videos for sections 3 & 4 are online

CS165, Fall 2016 Stratos Idreos /4015

Jim Gray, IBM, Tandem, DEC, Microsoft ACM Turing award ACM SIGMOD Edgar F. Codd Innovations Award

disk100Kx Pluto

2 years

memory100x New York1.5 hours

on board cache10x this building

10 min

on chip cache2x this room

1 min

registers my head~0

Page 21: basic db architectures & layouts - Harvard Universitydaslab.seas.harvard.edu/.../CS165Fall2016Class4.pdf · CS165, Fall 2016 Stratos Idreos 2 /40 videos for sections 3 & 4 are online

CS165, Fall 2016 Stratos Idreos /4016

registers

on chip cache

on board cache

memory

disk

CPU

memory wall

chea

per

fast

er

SRAM

DRAM

~1ns

~10ns

~100ns

cache miss: looking for something which is not in the cache

memory miss: looking for something which is not in memory

time

speed cpu

mem

Page 22: basic db architectures & layouts - Harvard Universitydaslab.seas.harvard.edu/.../CS165Fall2016Class4.pdf · CS165, Fall 2016 Stratos Idreos 2 /40 videos for sections 3 & 4 are online

CS165, Fall 2016 Stratos Idreos /40

data missesinstruction misses

17

design of storage/access methods/algorithms should minimize:

touch/access only what you need

Page 23: basic db architectures & layouts - Harvard Universitydaslab.seas.harvard.edu/.../CS165Fall2016Class4.pdf · CS165, Fall 2016 Stratos Idreos 2 /40 videos for sections 3 & 4 are online

CS165, Fall 2016 Stratos Idreos /4018

random access & page-based access

need to only read x… but have to read all of page 1

page1 page2 page3

data value x

registers

on chip cache

on board cache

memory

disk

CPU

data

mov

e

Page 24: basic db architectures & layouts - Harvard Universitydaslab.seas.harvard.edu/.../CS165Fall2016Class4.pdf · CS165, Fall 2016 Stratos Idreos 2 /40 videos for sections 3 & 4 are online

CS165, Fall 2016 Stratos Idreos /4019

…5 10 6 4 12 2 8 9 7 6 7 11 3 9 6

memory level N

memory level N-1

query x<5

page size: 5x8 bytes

5 10 6 4 12

(size=120 bytes)

2 8 9 7 6 7 11 3 9 6

Page 25: basic db architectures & layouts - Harvard Universitydaslab.seas.harvard.edu/.../CS165Fall2016Class4.pdf · CS165, Fall 2016 Stratos Idreos 2 /40 videos for sections 3 & 4 are online

CS165, Fall 2016 Stratos Idreos /4019

…5 10 6 4 12 2 8 9 7 6 7 11 3 9 6

memory level N

memory level N-1

query x<5

page size: 5x8 bytes

scan

5 10 6 4 12(size=120 bytes)

2 8 9 7 6 7 11 3 9 6

Page 26: basic db architectures & layouts - Harvard Universitydaslab.seas.harvard.edu/.../CS165Fall2016Class4.pdf · CS165, Fall 2016 Stratos Idreos 2 /40 videos for sections 3 & 4 are online

CS165, Fall 2016 Stratos Idreos /4019

…5 10 6 4 12 2 8 9 7 6 7 11 3 9 6

memory level N

memory level N-1

query x<5

page size: 5x8 bytes

scan

5 10 6 4 12(size=120 bytes)

2 8 9 7 6

4

7 11 3 9 6

Page 27: basic db architectures & layouts - Harvard Universitydaslab.seas.harvard.edu/.../CS165Fall2016Class4.pdf · CS165, Fall 2016 Stratos Idreos 2 /40 videos for sections 3 & 4 are online

CS165, Fall 2016 Stratos Idreos /4019

…5 10 6 4 12 2 8 9 7 6 7 11 3 9 6

memory level N

memory level N-1

query x<5

page size: 5x8 bytes

scan

40 bytes

5 10 6 4 12(size=120 bytes)

2 8 9 7 6

4

7 11 3 9 6

Page 28: basic db architectures & layouts - Harvard Universitydaslab.seas.harvard.edu/.../CS165Fall2016Class4.pdf · CS165, Fall 2016 Stratos Idreos 2 /40 videos for sections 3 & 4 are online

CS165, Fall 2016 Stratos Idreos /4019

…5 10 6 4 12 2 8 9 7 6 7 11 3 9 6

memory level N

memory level N-1

query x<5

page size: 5x8 bytes

scan scan

40 bytes

5 10 6 4 12(size=120 bytes) 2 8 9 7 6 4

7 11 3 9 6

Page 29: basic db architectures & layouts - Harvard Universitydaslab.seas.harvard.edu/.../CS165Fall2016Class4.pdf · CS165, Fall 2016 Stratos Idreos 2 /40 videos for sections 3 & 4 are online

CS165, Fall 2016 Stratos Idreos /4019

…5 10 6 4 12 2 8 9 7 6 7 11 3 9 6

memory level N

memory level N-1

query x<5

page size: 5x8 bytes

scan scan

40 bytes

5 10 6 4 12(size=120 bytes) 2 8 9 7 6 4 2

7 11 3 9 6

Page 30: basic db architectures & layouts - Harvard Universitydaslab.seas.harvard.edu/.../CS165Fall2016Class4.pdf · CS165, Fall 2016 Stratos Idreos 2 /40 videos for sections 3 & 4 are online

CS165, Fall 2016 Stratos Idreos /4019

…5 10 6 4 12 2 8 9 7 6 7 11 3 9 6

memory level N

memory level N-1

query x<5

page size: 5x8 bytes

scan scan

5 10 6 4 12(size=120 bytes) 2 8 9 7 6 4 2

7 11 3 9 6

80 bytes

Page 31: basic db architectures & layouts - Harvard Universitydaslab.seas.harvard.edu/.../CS165Fall2016Class4.pdf · CS165, Fall 2016 Stratos Idreos 2 /40 videos for sections 3 & 4 are online

CS165, Fall 2016 Stratos Idreos /4019

…5 10 6 4 12 2 8 9 7 6 7 11 3 9 6

memory level N

memory level N-1

query x<5

page size: 5x8 bytes

(size=120 bytes) 2 8 9 7 6 4 27 11 3 9 6

scan

80 bytes

Page 32: basic db architectures & layouts - Harvard Universitydaslab.seas.harvard.edu/.../CS165Fall2016Class4.pdf · CS165, Fall 2016 Stratos Idreos 2 /40 videos for sections 3 & 4 are online

CS165, Fall 2016 Stratos Idreos /4019

…5 10 6 4 12 2 8 9 7 6 7 11 3 9 6

memory level N

memory level N-1

query x<5

page size: 5x8 bytes

(size=120 bytes) 2 8 9 7 6 4 27 11 3 9 6

scan

3

80 bytes

Page 33: basic db architectures & layouts - Harvard Universitydaslab.seas.harvard.edu/.../CS165Fall2016Class4.pdf · CS165, Fall 2016 Stratos Idreos 2 /40 videos for sections 3 & 4 are online

CS165, Fall 2016 Stratos Idreos /4019

…5 10 6 4 12 2 8 9 7 6 7 11 3 9 6

memory level N

memory level N-1

query x<5

page size: 5x8 bytes

(size=120 bytes) 2 8 9 7 6 4 27 11 3 9 6

scan

3

120 bytes

Page 34: basic db architectures & layouts - Harvard Universitydaslab.seas.harvard.edu/.../CS165Fall2016Class4.pdf · CS165, Fall 2016 Stratos Idreos 2 /40 videos for sections 3 & 4 are online

CS165, Fall 2016 Stratos Idreos /4020

…5 10 6 4 12 2 8 9 7 6 7 11 3 9 6

memory level N

memory level N-1

query x<5

page size: 5x8 bytes

5 10 6 4 12

(size=120 bytes)

2 8 9 7 6 7 11 3 9 6

an oracle gives us the positions

Page 35: basic db architectures & layouts - Harvard Universitydaslab.seas.harvard.edu/.../CS165Fall2016Class4.pdf · CS165, Fall 2016 Stratos Idreos 2 /40 videos for sections 3 & 4 are online

CS165, Fall 2016 Stratos Idreos /4020

…5 10 6 4 12 2 8 9 7 6 7 11 3 9 6

memory level N

memory level N-1

query x<5

page size: 5x8 bytes

oracle

5 10 6 4 12(size=120 bytes)

2 8 9 7 6 7 11 3 9 6

an oracle gives us the positions

Page 36: basic db architectures & layouts - Harvard Universitydaslab.seas.harvard.edu/.../CS165Fall2016Class4.pdf · CS165, Fall 2016 Stratos Idreos 2 /40 videos for sections 3 & 4 are online

CS165, Fall 2016 Stratos Idreos /4020

…5 10 6 4 12 2 8 9 7 6 7 11 3 9 6

memory level N

memory level N-1

query x<5

page size: 5x8 bytes

oracle

5 10 6 4 12(size=120 bytes)

2 8 9 7 6

4

7 11 3 9 6

an oracle gives us the positions

Page 37: basic db architectures & layouts - Harvard Universitydaslab.seas.harvard.edu/.../CS165Fall2016Class4.pdf · CS165, Fall 2016 Stratos Idreos 2 /40 videos for sections 3 & 4 are online

CS165, Fall 2016 Stratos Idreos /4020

…5 10 6 4 12 2 8 9 7 6 7 11 3 9 6

memory level N

memory level N-1

query x<5

page size: 5x8 bytes

oracle

40 bytes

5 10 6 4 12(size=120 bytes)

2 8 9 7 6

4

7 11 3 9 6

an oracle gives us the positions

Page 38: basic db architectures & layouts - Harvard Universitydaslab.seas.harvard.edu/.../CS165Fall2016Class4.pdf · CS165, Fall 2016 Stratos Idreos 2 /40 videos for sections 3 & 4 are online

CS165, Fall 2016 Stratos Idreos /4020

…5 10 6 4 12 2 8 9 7 6 7 11 3 9 6

memory level N

memory level N-1

query x<5

page size: 5x8 bytes

oracle oracle

40 bytes

5 10 6 4 12(size=120 bytes) 2 8 9 7 6 4

7 11 3 9 6

an oracle gives us the positions

Page 39: basic db architectures & layouts - Harvard Universitydaslab.seas.harvard.edu/.../CS165Fall2016Class4.pdf · CS165, Fall 2016 Stratos Idreos 2 /40 videos for sections 3 & 4 are online

CS165, Fall 2016 Stratos Idreos /4020

…5 10 6 4 12 2 8 9 7 6 7 11 3 9 6

memory level N

memory level N-1

query x<5

page size: 5x8 bytes

oracle oracle

40 bytes

5 10 6 4 12(size=120 bytes) 2 8 9 7 6 4 2

7 11 3 9 6

an oracle gives us the positions

Page 40: basic db architectures & layouts - Harvard Universitydaslab.seas.harvard.edu/.../CS165Fall2016Class4.pdf · CS165, Fall 2016 Stratos Idreos 2 /40 videos for sections 3 & 4 are online

CS165, Fall 2016 Stratos Idreos /4020

…5 10 6 4 12 2 8 9 7 6 7 11 3 9 6

memory level N

memory level N-1

query x<5

page size: 5x8 bytes

oracle oracle

5 10 6 4 12(size=120 bytes) 2 8 9 7 6 4 2

7 11 3 9 6

80 bytesan oracle gives us the positions

Page 41: basic db architectures & layouts - Harvard Universitydaslab.seas.harvard.edu/.../CS165Fall2016Class4.pdf · CS165, Fall 2016 Stratos Idreos 2 /40 videos for sections 3 & 4 are online

CS165, Fall 2016 Stratos Idreos /4020

…5 10 6 4 12 2 8 9 7 6 7 11 3 9 6

memory level N

memory level N-1

query x<5

page size: 5x8 bytes

(size=120 bytes) 2 8 9 7 6 4 27 11 3 9 6

oracle

80 bytesan oracle gives us the positions

Page 42: basic db architectures & layouts - Harvard Universitydaslab.seas.harvard.edu/.../CS165Fall2016Class4.pdf · CS165, Fall 2016 Stratos Idreos 2 /40 videos for sections 3 & 4 are online

CS165, Fall 2016 Stratos Idreos /4020

…5 10 6 4 12 2 8 9 7 6 7 11 3 9 6

memory level N

memory level N-1

query x<5

page size: 5x8 bytes

(size=120 bytes) 2 8 9 7 6 4 27 11 3 9 6

oracle

3

80 bytesan oracle gives us the positions

Page 43: basic db architectures & layouts - Harvard Universitydaslab.seas.harvard.edu/.../CS165Fall2016Class4.pdf · CS165, Fall 2016 Stratos Idreos 2 /40 videos for sections 3 & 4 are online

CS165, Fall 2016 Stratos Idreos /4020

…5 10 6 4 12 2 8 9 7 6 7 11 3 9 6

memory level N

memory level N-1

query x<5

page size: 5x8 bytes

(size=120 bytes) 2 8 9 7 6 4 27 11 3 9 6

oracle

3

120 bytesan oracle gives us the positions

Page 44: basic db architectures & layouts - Harvard Universitydaslab.seas.harvard.edu/.../CS165Fall2016Class4.pdf · CS165, Fall 2016 Stratos Idreos 2 /40 videos for sections 3 & 4 are online

CS165, Fall 2016 Stratos Idreos /4021

when does it make sense to have an oracle

scan=120bytes vs oracle=120bytes (and there is no such thing as an Oracle so Oracle is not for free…)

Page 45: basic db architectures & layouts - Harvard Universitydaslab.seas.harvard.edu/.../CS165Fall2016Class4.pdf · CS165, Fall 2016 Stratos Idreos 2 /40 videos for sections 3 & 4 are online

CS165, Fall 2016 Stratos Idreos /4022

sequential access: read one block; consume it completely; discard it; read next

what is next?

in parallel/prefetching

hardware/software can better predict/buffer sequential pages to be read

1 2 3 4

Page 46: basic db architectures & layouts - Harvard Universitydaslab.seas.harvard.edu/.../CS165Fall2016Class4.pdf · CS165, Fall 2016 Stratos Idreos 2 /40 videos for sections 3 & 4 are online

CS165, Fall 2016 Stratos Idreos /4023

random access: read one block; consume it partially; discard it; might have to read it again in future; read “random” next;

Page 47: basic db architectures & layouts - Harvard Universitydaslab.seas.harvard.edu/.../CS165Fall2016Class4.pdf · CS165, Fall 2016 Stratos Idreos 2 /40 videos for sections 3 & 4 are online

CS165, Fall 2016 Stratos Idreos /4024

level N-1

level Nbuffer pool remember hot blocks

why not use OS caching

Page 48: basic db architectures & layouts - Harvard Universitydaslab.seas.harvard.edu/.../CS165Fall2016Class4.pdf · CS165, Fall 2016 Stratos Idreos 2 /40 videos for sections 3 & 4 are online

CS165, Fall 2016 Stratos Idreos /4025

device block size

os block size

dbms block size

os and db will typically refer to pages

Page 49: basic db architectures & layouts - Harvard Universitydaslab.seas.harvard.edu/.../CS165Fall2016Class4.pdf · CS165, Fall 2016 Stratos Idreos 2 /40 videos for sections 3 & 4 are online

CS165, Fall 2016 Stratos Idreos /4026

employee(id:int, name:varchar(50), office:char(5), telephone:char(10), city:varchar(30), salary:int)

(1, name1, office1, tel1, city1, salary1) (2, name2, office2, tel2, city2, salary2) (3, name3, office3, tel3, city3, salary3) (4, name4, office4, tel4, city4, salary4) (5, name5, office5, tel5, city5, salary5) (6, name6, office6, tel6, city6, salary6) (7, name7, office7, tel7, city7, salary7) (8, name8, office8, tel8, city8, salary8) (9, name9, office9, tel9, city9, salary9)

data storage blocks < pages < files

file

remember: the way we store data defines the best possible way we can access it

Page 50: basic db architectures & layouts - Harvard Universitydaslab.seas.harvard.edu/.../CS165Fall2016Class4.pdf · CS165, Fall 2016 Stratos Idreos 2 /40 videos for sections 3 & 4 are online

CS165, Fall 2016 Stratos Idreos /4027

employee(id:int, name:varchar(50), office:char(5), telephone:char(10), city:varchar(30), salary:int)

(1, name1, office1, tel1, city1, salary1) (2, name2, office2, tel2, city2, salary2) (3, name3, office3, tel3, city3, salary3) (4, name4, office4, tel4, city4, salary4) (5, name5, office5, tel5, city5, salary5) (6, name6, office6, tel6, city6, salary6) (7, name7, office7, tel7, city7, salary7) (8, name8, office8, tel8, city8, salary8) (9, name9, office9, tel9, city9, salary9)

header

row1row2

row3…

slotted pages

page

Page 51: basic db architectures & layouts - Harvard Universitydaslab.seas.harvard.edu/.../CS165Fall2016Class4.pdf · CS165, Fall 2016 Stratos Idreos 2 /40 videos for sections 3 & 4 are online

CS165, Fall 2016 Stratos Idreos /4028

free_offset, N, offset1-length1, offset2-lenght2,…

free space

slotted page

scan null

update var length

Page 52: basic db architectures & layouts - Harvard Universitydaslab.seas.harvard.edu/.../CS165Fall2016Class4.pdf · CS165, Fall 2016 Stratos Idreos 2 /40 videos for sections 3 & 4 are online

CS165, Fall 2016 Stratos Idreos /4029

some things to “worry” abouthow much data we transfer through the memory hierarchy how many computations we do

level Nlevel N-1…cpu

Page 53: basic db architectures & layouts - Harvard Universitydaslab.seas.harvard.edu/.../CS165Fall2016Class4.pdf · CS165, Fall 2016 Stratos Idreos 2 /40 videos for sections 3 & 4 are online

CS165, Fall 2016 Stratos Idreos /4030

row-storeABCD

stored continuously

one page contains all fields of multiple attributes

select A,B,C,Dselect A

file

Page 54: basic db architectures & layouts - Harvard Universitydaslab.seas.harvard.edu/.../CS165Fall2016Class4.pdf · CS165, Fall 2016 Stratos Idreos 2 /40 videos for sections 3 & 4 are online

CS165, Fall 2016 Stratos Idreos /4031

row-store column-storeABCD A B C D

stored continuously

one page contains fields of a single attribute

select A,B,C,Dselect A

Page 55: basic db architectures & layouts - Harvard Universitydaslab.seas.harvard.edu/.../CS165Fall2016Class4.pdf · CS165, Fall 2016 Stratos Idreos 2 /40 videos for sections 3 & 4 are online

CS165, Fall 2016 Stratos Idreos /4032

history/timeline

~1960s

rows

1970: column storage ideas start

appearing

rows rows rows rows

1985: first rather complete column-store model

~2000: open source complete system

rows

2005-now: more ideas and industry adoption of column-

store designs

rows

monetdb

c-store,vertica,vectorwise and then

ibm,microsoft,oracle, and more

Page 56: basic db architectures & layouts - Harvard Universitydaslab.seas.harvard.edu/.../CS165Fall2016Class4.pdf · CS165, Fall 2016 Stratos Idreos 2 /40 videos for sections 3 & 4 are online

CS165, Fall 2016 Stratos Idreos /4033

column-store with materialized IDs

ID A ID B ID Cheader

R(A,B,C)

row1row2

row3

good idea

Page 57: basic db architectures & layouts - Harvard Universitydaslab.seas.harvard.edu/.../CS165Fall2016Class4.pdf · CS165, Fall 2016 Stratos Idreos 2 /40 videos for sections 3 & 4 are online

CS165, Fall 2016 Stratos Idreos /4034

a1 a2 a3 a4 a5 a6

b1 b2 b3 b4 b5 b6

c1 c2 c3 c4 c5 c6

virtual ids/ positional alignment

positional lookups/joinsA(i) = A + i * width(A)

tuple 1tuple 2tuple 3tuple 4tuple 5tuple 6

A B C

fixed-width + dense

columns do not need to have the

same width

Page 58: basic db architectures & layouts - Harvard Universitydaslab.seas.harvard.edu/.../CS165Fall2016Class4.pdf · CS165, Fall 2016 Stratos Idreos 2 /40 videos for sections 3 & 4 are online

CS165, Fall 2016 Stratos Idreos /4035

disk memoryA B C D

A

ABCrow-store

engine

ok so now we can selectively read columns but how do we process them?

early tuple reconstruction/materialization

column-store

engine

option1

option2

Page 59: basic db architectures & layouts - Harvard Universitydaslab.seas.harvard.edu/.../CS165Fall2016Class4.pdf · CS165, Fall 2016 Stratos Idreos 2 /40 videos for sections 3 & 4 are online

CS165, Fall 2016 Stratos Idreos /4036

registers

on chip cache

on board cache

memory

disk

CPU

memory wall

chea

per

fast

er

SRAM

DRAM

~1ns

~10ns

~100ns

it is not just memory and disk

we want to move as few data items as possible all the way up to the CPU

Page 60: basic db architectures & layouts - Harvard Universitydaslab.seas.harvard.edu/.../CS165Fall2016Class4.pdf · CS165, Fall 2016 Stratos Idreos 2 /40 videos for sections 3 & 4 are online

CS165, Fall 2016 Stratos Idreos /4037

select min(C) from R where A<10 & B<20

A B C Ddisk memory

write the query plan and the code/logic of each operator

do not forget about intermediate results describe data layouts at each step

(milestone1 of project)

Page 61: basic db architectures & layouts - Harvard Universitydaslab.seas.harvard.edu/.../CS165Fall2016Class4.pdf · CS165, Fall 2016 Stratos Idreos 2 /40 videos for sections 3 & 4 are online

CS165, Fall 2016 Stratos Idreos /4037

select min(C) from R where A<10 & B<20

A B C Ddisk memory

write the query plan and the code/logic of each operator

do not forget about intermediate results describe data layouts at each step

(milestone1 of project)

no precise final answer is OKunderstanding what matters is key

concepts & designs will be repeated >>1

Page 62: basic db architectures & layouts - Harvard Universitydaslab.seas.harvard.edu/.../CS165Fall2016Class4.pdf · CS165, Fall 2016 Stratos Idreos 2 /40 videos for sections 3 & 4 are online

CS165, Fall 2016 Stratos Idreos /4038

select min(C) from R where A<10 & B<20

A B C Ddisk memory

late reconstruction/materialization

Page 63: basic db architectures & layouts - Harvard Universitydaslab.seas.harvard.edu/.../CS165Fall2016Class4.pdf · CS165, Fall 2016 Stratos Idreos 2 /40 videos for sections 3 & 4 are online

CS165, Fall 2016 Stratos Idreos /4038

select min(C) from R where A<10 & B<20

A<10A B C Ddisk memory

late reconstruction/materialization

Page 64: basic db architectures & layouts - Harvard Universitydaslab.seas.harvard.edu/.../CS165Fall2016Class4.pdf · CS165, Fall 2016 Stratos Idreos 2 /40 videos for sections 3 & 4 are online

CS165, Fall 2016 Stratos Idreos /4038

select min(C) from R where A<10 & B<20

A<10A B C D1: int *input=A2: for (i=0;i<tuples;i++,input++) 3: if *input<104: *output=i5: output++

A<10

disk memory

late reconstruction/materialization

Page 65: basic db architectures & layouts - Harvard Universitydaslab.seas.harvard.edu/.../CS165Fall2016Class4.pdf · CS165, Fall 2016 Stratos Idreos 2 /40 videos for sections 3 & 4 are online

CS165, Fall 2016 Stratos Idreos /4038

select min(C) from R where A<10 & B<20

A<10A B C D IDsdisk memory

late reconstruction/materialization

Page 66: basic db architectures & layouts - Harvard Universitydaslab.seas.harvard.edu/.../CS165Fall2016Class4.pdf · CS165, Fall 2016 Stratos Idreos 2 /40 videos for sections 3 & 4 are online

CS165, Fall 2016 Stratos Idreos /4038

select min(C) from R where A<10 & B<20

A<10A B C D IDs Bdisk memory

late reconstruction/materialization

Page 67: basic db architectures & layouts - Harvard Universitydaslab.seas.harvard.edu/.../CS165Fall2016Class4.pdf · CS165, Fall 2016 Stratos Idreos 2 /40 videos for sections 3 & 4 are online

CS165, Fall 2016 Stratos Idreos /4038

select min(C) from R where A<10 & B<20

B<20A<10A B C D IDs Bdisk memory

late reconstruction/materialization

Page 68: basic db architectures & layouts - Harvard Universitydaslab.seas.harvard.edu/.../CS165Fall2016Class4.pdf · CS165, Fall 2016 Stratos Idreos 2 /40 videos for sections 3 & 4 are online

CS165, Fall 2016 Stratos Idreos /4038

select min(C) from R where A<10 & B<20

B<20A<10A B C D IDs B CIDsdisk memory

late reconstruction/materialization

Page 69: basic db architectures & layouts - Harvard Universitydaslab.seas.harvard.edu/.../CS165Fall2016Class4.pdf · CS165, Fall 2016 Stratos Idreos 2 /40 videos for sections 3 & 4 are online

CS165, Fall 2016 Stratos Idreos /4038

select min(C) from R where A<10 & B<20

B<20 minCA<10A B C D IDs B CIDsdisk memory

late reconstruction/materialization

Page 70: basic db architectures & layouts - Harvard Universitydaslab.seas.harvard.edu/.../CS165Fall2016Class4.pdf · CS165, Fall 2016 Stratos Idreos 2 /40 videos for sections 3 & 4 are online

CS165, Fall 2016 Stratos Idreos /4038

select min(C) from R where A<10 & B<20

B<20 minCA<10A B C D IDs B CIDsdisk memory

late reconstruction/materialization

always sequential access patterns memory contains only what is needed at any point in time

Page 71: basic db architectures & layouts - Harvard Universitydaslab.seas.harvard.edu/.../CS165Fall2016Class4.pdf · CS165, Fall 2016 Stratos Idreos 2 /40 videos for sections 3 & 4 are online

CS165, Fall 2016 Stratos Idreos /4039

column-stores vs row-stores

it all starts with how we store the data

still basic concepts are the same

moving data is a major cost component

it is not just about disk…

the whole memory hierarchy matters

Notes to remember

Page 72: basic db architectures & layouts - Harvard Universitydaslab.seas.harvard.edu/.../CS165Fall2016Class4.pdf · CS165, Fall 2016 Stratos Idreos 2 /40 videos for sections 3 & 4 are online

CS165, Fall 2016 Stratos Idreos /4040

Architecture of a Database System (Sections 1,2,3,4)by J. Hellerstein, M. Stonebraker and J. Hamilton

The Design and Implementation of Modern Column-store Database Systemsby D. Abadi, P. Boncz, S. Harizopoulos, S. Idreos, S. Madden

please keep up with reading!

Page 73: basic db architectures & layouts - Harvard Universitydaslab.seas.harvard.edu/.../CS165Fall2016Class4.pdf · CS165, Fall 2016 Stratos Idreos 2 /40 videos for sections 3 & 4 are online

DATA SYSTEMSprof. Stratos Idreos

class 4

basic db architectures & layouts