models & intro to db architectures - data systems...

49
Models & Intro to DB Architectures prof. Stratos Idreos HTTP://DASLAB.SEAS.HARVARD.EDU/CLASSES/CS165/ class 3

Upload: others

Post on 10-Aug-2020

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Models & Intro to DB Architectures - Data Systems Laboratorydaslab.seas.harvard.edu/.../cs165/doc/class_slides/... · design data structures and algorithms = access methods study

Models & Intro to DB Architecturesprof. Stratos Idreos

HTTP://DASLAB.SEAS.HARVARD.EDU/CLASSES/CS165/

class 3

Page 2: Models & Intro to DB Architectures - Data Systems Laboratorydaslab.seas.harvard.edu/.../cs165/doc/class_slides/... · design data structures and algorithms = access methods study

CS165, Fall 2016 Stratos Idreos /49

welcome brave cs165 students!242+44

Page 3: Models & Intro to DB Architectures - Data Systems Laboratorydaslab.seas.harvard.edu/.../cs165/doc/class_slides/... · design data structures and algorithms = access methods study

CS165, Fall 2016 Stratos Idreos /493

NO LAPTOP/PHONE POLICYclass is based on participation!

+ there is enough evidence that laptops and phones slow you down (check syllabus for more info)

we will bring a copy of the slides for every one in each time class so you can follow and keep notes

Page 4: Models & Intro to DB Architectures - Data Systems Laboratorydaslab.seas.harvard.edu/.../cs165/doc/class_slides/... · design data structures and algorithms = access methods study

CS165, Fall 2016 Stratos Idreos /494

database kernel

data data data

algo

rithm

s/op

erat

ors

applications

sql

disk

memory

cpu

Page 5: Models & Intro to DB Architectures - Data Systems Laboratorydaslab.seas.harvard.edu/.../cs165/doc/class_slides/... · design data structures and algorithms = access methods study

CS165, Fall 2016 Stratos Idreos /495

a “simple” example

assume an array of N integers: find all positions where value>x

data

qualifying positions

exists in all systems: sql, nosql, newsqlselect operator

even the simplest tasks are actually far from trivial

no obvious solutions; just a taste of what to come5

Page 6: Models & Intro to DB Architectures - Data Systems Laboratorydaslab.seas.harvard.edu/.../cs165/doc/class_slides/... · design data structures and algorithms = access methods study

CS165, Fall 2016 Stratos Idreos /496

design data structures and algorithms = access methods

study design tradeoffs with respect to modern hardware application requirements & complete system design

what we will do

next couple of weeksvery basics of data models and languages (today)

basics of db architectures (today next couple of classes) column-stores and hardware-conscious designs

Page 7: Models & Intro to DB Architectures - Data Systems Laboratorydaslab.seas.harvard.edu/.../cs165/doc/class_slides/... · design data structures and algorithms = access methods study

CS165, Fall 2016 Stratos Idreos /497

Page 8: Models & Intro to DB Architectures - Data Systems Laboratorydaslab.seas.harvard.edu/.../cs165/doc/class_slides/... · design data structures and algorithms = access methods study

CS165, Fall 2016 Stratos Idreos /498

for project:install MonetDB & PostgreSQL/MySQL and play with SQL

explain + SQL query to see query plans in MonetDB (use read-only mode)

repeat throughout the semester compare with your system in terms of performance

this is part of final deliverable

& several logistics and tools(code.seas, gdb, perf, valgrind, testing infrastructure)

do not underestimate this…

Page 9: Models & Intro to DB Architectures - Data Systems Laboratorydaslab.seas.harvard.edu/.../cs165/doc/class_slides/... · design data structures and algorithms = access methods study

CS165, Fall 2016 Stratos Idreos /49

what should I be doing?

9

do P0 register to Piazza and stay logged in check syllabus/website carefully check project timeline and plan around it keep up with reading (goes fast) register for notes today (tmr we will do assignments)

come to Labs and OH frequently

Page 10: Models & Intro to DB Architectures - Data Systems Laboratorydaslab.seas.harvard.edu/.../cs165/doc/class_slides/... · design data structures and algorithms = access methods study

CS165, Fall 2016 Stratos Idreos /4910

how to read research papers

1) abstract-intro-related work-conclusions what is the problem why is it important why past solutions do not work what is the core idea what is success

2) core part-analysis basic idea what matters any gaps?

goal: by the end of the semester understand these papers fully

3) follow a few citations and repeat

Page 11: Models & Intro to DB Architectures - Data Systems Laboratorydaslab.seas.harvard.edu/.../cs165/doc/class_slides/... · design data structures and algorithms = access methods study

CS165, Fall 2016 Stratos Idreos /4911

we want you to have fun!data systems is an exciting field!

tell us how you are keeping up tell us what you need to better follow the class tell us your suggestions about how to improve the class

Page 12: Models & Intro to DB Architectures - Data Systems Laboratorydaslab.seas.harvard.edu/.../cs165/doc/class_slides/... · design data structures and algorithms = access methods study

CS165, Fall 2016 Stratos Idreos /49

the evolution of data-driven applications

12

photo of Anand here

use own data (recommendations)(1)

use public data (search)(2)

use social data (social networks)(3)

all of the above (all of the above)(4) training data

(machine learning)(5)

Anand Rajaraman

Page 13: Models & Intro to DB Architectures - Data Systems Laboratorydaslab.seas.harvard.edu/.../cs165/doc/class_slides/... · design data structures and algorithms = access methods study

CS165, Fall 2016 Stratos Idreos /4913

amazon

google

linkedin

Cyborg

bank X

the cyborg knows & manages

all your models

food for (big data-driven) thought Anand Rajaraman

Page 14: Models & Intro to DB Architectures - Data Systems Laboratorydaslab.seas.harvard.edu/.../cs165/doc/class_slides/... · design data structures and algorithms = access methods study

CS165, Fall 2016 Stratos Idreos /4914

the biggest transportations companies have no cars

the biggest data companies have no data

the biggest hotel companies have no hotels

food for (big data-driven) thought

more about this and other thoughts in our first brainstorming session (TBA)

Page 15: Models & Intro to DB Architectures - Data Systems Laboratorydaslab.seas.harvard.edu/.../cs165/doc/class_slides/... · design data structures and algorithms = access methods study

CS165, Fall 2016 Stratos Idreos /4915

logical design

physical design

system design

Page 16: Models & Intro to DB Architectures - Data Systems Laboratorydaslab.seas.harvard.edu/.../cs165/doc/class_slides/... · design data structures and algorithms = access methods study

CS165, Fall 2016 Stratos Idreos /4916

essential steps in using a database system

clean schema load tune

query

experts/system admins

user/apps

Page 17: Models & Intro to DB Architectures - Data Systems Laboratorydaslab.seas.harvard.edu/.../cs165/doc/class_slides/... · design data structures and algorithms = access methods study

CS165, Fall 2016 Stratos Idreos /4917

relational model+SQL

professors(id,name,…)

courses(id,name, profId,…)

students(id,name,…)

database

table/relationcolumn/attribute

give me the names of all students: select name from students

create table for professors: create table professors (id:integer, name: char(40), telephone: char(10), …) insert into professors (76897689, “john smith”, …)

where GPA>3.0

key

Page 18: Models & Intro to DB Architectures - Data Systems Laboratorydaslab.seas.harvard.edu/.../cs165/doc/class_slides/... · design data structures and algorithms = access methods study

CS165, Fall 2016 Stratos Idreos /4918

employee(id:int, name:varchar(50), office:char(5), telephone:char(10), city:varchar(30), salary:int)

(1, name1, office1, tel1, city1, salary1) (2, name2, office2, tel2, city2, salary2) (3, name3, office3, tel3, city3, salary3) (4, name4, office4, tel4, city4, salary4) (5, name5, office5, tel5, city5, salary5) (6, name6, office6, tel6, city6, salary6) (7, name7, office7, tel7, city7, salary7) (8, name8, office8, tel8, city8, salary8) (9, name9, office9, NULL, city9, salary9)

schema

data

cardinality=9

SQL:insert into employee (1, name1, office1, tel1, city1, salary1)

value does not exist

Page 19: Models & Intro to DB Architectures - Data Systems Laboratorydaslab.seas.harvard.edu/.../cs165/doc/class_slides/... · design data structures and algorithms = access methods study

CS165, Fall 2016 Stratos Idreos /4919

relational model+SQL

professor(id,name,…)

course(id,name, profId,…)

student(id,name,…)

database

give me all students enrolled in cs165select student.name from student, enrolled, course where course.name=“cs165” and enrolled.courseId=course.id and student.id=enrolled.studentId

enrolled(studentId,

courseId,…) foreign key

join

Page 20: Models & Intro to DB Architectures - Data Systems Laboratorydaslab.seas.harvard.edu/.../cs165/doc/class_slides/... · design data structures and algorithms = access methods study

CS165, Fall 2016 Stratos Idreos /4920

student(id,name,…)

enrolled(studentId,courseId,…)

how do we join

Page 21: Models & Intro to DB Architectures - Data Systems Laboratorydaslab.seas.harvard.edu/.../cs165/doc/class_slides/... · design data structures and algorithms = access methods study

CS165, Fall 2016 Stratos Idreos /49

normalization

21

AllData(student ID,student name,student address, course name, grade, professor name, professor ID, professor telephone,…)

good

say schema about university db contains one table…

duplicates - tons of data - updates - but no joins

Page 22: Models & Intro to DB Architectures - Data Systems Laboratorydaslab.seas.harvard.edu/.../cs165/doc/class_slides/... · design data structures and algorithms = access methods study

CS165, Fall 2016 Stratos Idreos /4922

fact table(id1,id2,…)

dimension table 1(id1,…)

dimension table 2(id2,…)

star schema

Page 23: Models & Intro to DB Architectures - Data Systems Laboratorydaslab.seas.harvard.edu/.../cs165/doc/class_slides/... · design data structures and algorithms = access methods study

CS165, Fall 2016 Stratos Idreos /4923

snowflake schema

Page 24: Models & Intro to DB Architectures - Data Systems Laboratorydaslab.seas.harvard.edu/.../cs165/doc/class_slides/... · design data structures and algorithms = access methods study

CS165, Fall 2016 Stratos Idreos /4924

Alex Liu, class 2016 165/265 project

adaptive denormalization

1st prize in ACM SIGMOD undergrad research competition

Special Interest Group on Management of Data

Page 25: Models & Intro to DB Architectures - Data Systems Laboratorydaslab.seas.harvard.edu/.../cs165/doc/class_slides/... · design data structures and algorithms = access methods study

CS165, Fall 2016 Stratos Idreos /4925

NORMALIZED DATA DENORMALIZED DATA

good for updates, storage but we need joins

only fast scans but expensive to create,

storage & updates

Page 26: Models & Intro to DB Architectures - Data Systems Laboratorydaslab.seas.harvard.edu/.../cs165/doc/class_slides/... · design data structures and algorithms = access methods study

CS165, Fall 2016 Stratos Idreos /4926

adaptive denormalization

continuously physically reorganize data based on incoming query patterns (joins)

normalized data

denormalized fragments queries only need to fast scan

possible denormalized space

Page 27: Models & Intro to DB Architectures - Data Systems Laboratorydaslab.seas.harvard.edu/.../cs165/doc/class_slides/... · design data structures and algorithms = access methods study

CS165, Fall 2016 Stratos Idreos /4927

create table employee (id:integer, name:varchar(50) not null, office:char(5), telephone:char(10), city:varchar(30), salary:integer, primary key (id) check (salary<100000))

constraints

at most 5 charsmust have a value

must be uniquemust not become rich

when and how do we enforce constraints

Page 28: Models & Intro to DB Architectures - Data Systems Laboratorydaslab.seas.harvard.edu/.../cs165/doc/class_slides/... · design data structures and algorithms = access methods study

CS165, Fall 2016 Stratos Idreos /4928

select max(GPA),avg(GPA),min(GPA) from students

aggregations

select R.a - R.b + R.c from R

math

select * from R where R.a IN (select b from S where C<10)

nested

select * from R where a =10UNIONselect * from B where b =20

set ops

more SQL examples

Page 29: Models & Intro to DB Architectures - Data Systems Laboratorydaslab.seas.harvard.edu/.../cs165/doc/class_slides/... · design data structures and algorithms = access methods study

CS165, Fall 2016 Stratos Idreos /4929

select avg(GPA), class, major from students where GPA>3.0 and class>1990 group by class, major order by class

Page 30: Models & Intro to DB Architectures - Data Systems Laboratorydaslab.seas.harvard.edu/.../cs165/doc/class_slides/... · design data structures and algorithms = access methods study

CS165, Fall 2016 Stratos Idreos /4930

Employee(id:int, name:varchar(50), office:char(5),

telephone:char(10), city:varchar(30), salary:int)

base table

Employee-Berlin-Managerselect * from employee where city=“berlin”

view to be used by managers in Berlin

Employee-Berlin-Allselect id,name,city,office from employee where city=“berlin”

view to be used by all employees in Berlin

how should we store views

Page 31: Models & Intro to DB Architectures - Data Systems Laboratorydaslab.seas.harvard.edu/.../cs165/doc/class_slides/... · design data structures and algorithms = access methods study

CS165, Fall 2016 Stratos Idreos /49

why are models great?

31

Page 32: Models & Intro to DB Architectures - Data Systems Laboratorydaslab.seas.harvard.edu/.../cs165/doc/class_slides/... · design data structures and algorithms = access methods study

CS165, Fall 2016 Stratos Idreos /49

other models?

32

Page 33: Models & Intro to DB Architectures - Data Systems Laboratorydaslab.seas.harvard.edu/.../cs165/doc/class_slides/... · design data structures and algorithms = access methods study

CS165, Fall 2016 Stratos Idreos /4933

it is summer 2017 - now you know all about data systems

you are building an augmented reality startup using Google Glass

people wearing Google Glass can tag places/objects - voice/image recognition works fine

tagging means assigning values, comments, etc to an object

you can then query this data - again assume voice recognition works fine and a black box translates natural language to SQL

how does the schema of your app look like? (tables, attributes, keys, relationships) (assume a limited working environment/features, say walking around Harvard square/yard)

describe 2 interesting queries in SQL

Page 34: Models & Intro to DB Architectures - Data Systems Laboratorydaslab.seas.harvard.edu/.../cs165/doc/class_slides/... · design data structures and algorithms = access methods study

CS165, Fall 2016 Stratos Idreos /4934

user(id,name,location,device,…)

object(id,name,location,telephone,date,url,color,taste,…many more)

comment(id,user_id,oject_id,text,…)

likes_object(user_id,oject_id,)

likes_comment(user_id,comment_id)

a possible example

trust(user_id,user_id)

q1: get all places where jenny said “awesome” q2: get all users that like what I like and are close by

select object.location from object, user where user.name = “jenny” and

comment.user_id=user.id and comment.text LIKE “%awesome%”

select user.name, user.location from user, likes_object as L1, likes_object as L2 where L1.user_id=my_id and L1.object_id=L2.object_id and L2.user_id !=my_id

and user.id=L2.user_id and close(user.location,mylocation)=true

Page 35: Models & Intro to DB Architectures - Data Systems Laboratorydaslab.seas.harvard.edu/.../cs165/doc/class_slides/... · design data structures and algorithms = access methods study

CS165, Fall 2016 Stratos Idreos /4935

object(id,name,location,telephone,date,url,color,taste,…many more)

how do we store the object table?

what if we want to add another kind of

object?

open research and business problem

Page 36: Models & Intro to DB Architectures - Data Systems Laboratorydaslab.seas.harvard.edu/.../cs165/doc/class_slides/... · design data structures and algorithms = access methods study

CS165, Fall 2016 Stratos Idreos /49

design

36

logical design

physical design

system design

Page 37: Models & Intro to DB Architectures - Data Systems Laboratorydaslab.seas.harvard.edu/.../cs165/doc/class_slides/... · design data structures and algorithms = access methods study

CS165, Fall 2016 Stratos Idreos /49

so do db systems “just work”?

37

declarative interface ask what you want

db system

Page 38: Models & Intro to DB Architectures - Data Systems Laboratorydaslab.seas.harvard.edu/.../cs165/doc/class_slides/... · design data structures and algorithms = access methods study

CS165, Fall 2016 Stratos Idreos /4938

declarative interface ask what you want

db system

DBAindexes/views/tuning knobsbut … db cracking, adaptive* ideas

Page 39: Models & Intro to DB Architectures - Data Systems Laboratorydaslab.seas.harvard.edu/.../cs165/doc/class_slides/... · design data structures and algorithms = access methods study

CS165, Fall 2016 Stratos Idreos /4939

clean schema load tune

query

experts/system admins

user/apps

essential steps in using a database system

Page 40: Models & Intro to DB Architectures - Data Systems Laboratorydaslab.seas.harvard.edu/.../cs165/doc/class_slides/... · design data structures and algorithms = access methods study

CS165, Fall 2016 Stratos Idreos /49

design

40

logical design

physical design

system design next up: db architectures 101

Page 41: Models & Intro to DB Architectures - Data Systems Laboratorydaslab.seas.harvard.edu/.../cs165/doc/class_slides/... · design data structures and algorithms = access methods study

CS165, Fall 2016 Stratos Idreos /4941

design/implement numerous possible algorithms + data representations

choose the bestdata source, algorithms and path for each query

database kernel

data data data

algo

rithm

s/op

erat

ors

applications

sql

Page 42: Models & Intro to DB Architectures - Data Systems Laboratorydaslab.seas.harvard.edu/.../cs165/doc/class_slides/... · design data structures and algorithms = access methods study

CS165, Fall 2016 Stratos Idreos /4942

database kernel

data data data

algo

rithm

s/op

erat

ors

select min(A) from R where B<10 and C<80

optimizer

parser

execution

storage

Page 43: Models & Intro to DB Architectures - Data Systems Laboratorydaslab.seas.harvard.edu/.../cs165/doc/class_slides/... · design data structures and algorithms = access methods study

CS165, Fall 2016 Stratos Idreos /4943

database kernel

applications

sql

optimizer

parser

execution

storage

in/out

admission

Page 44: Models & Intro to DB Architectures - Data Systems Laboratorydaslab.seas.harvard.edu/.../cs165/doc/class_slides/... · design data structures and algorithms = access methods study

CS165, Fall 2016 Stratos Idreos /4944

database kernel

applications

sql

client programs

thread1

thread2

thread3 db

program

thread pool

thread5

thread4

Page 45: Models & Intro to DB Architectures - Data Systems Laboratorydaslab.seas.harvard.edu/.../cs165/doc/class_slides/... · design data structures and algorithms = access methods study

CS165, Fall 2016 Stratos Idreos /4945

database kernel

applications

sql

optimizer

parser

execution

storage buffer pool

in/out

thread pool

transactions

disk

memory

cpu

is it “good” to have modules

Page 46: Models & Intro to DB Architectures - Data Systems Laboratorydaslab.seas.harvard.edu/.../cs165/doc/class_slides/... · design data structures and algorithms = access methods study

CS165, Fall 2016 Stratos Idreos /4946

models help create the right abstractions

models help create >>1 applications over the same data

we first need to clean, structure and load data

data systems consist of software components

Notes to remember

Page 47: Models & Intro to DB Architectures - Data Systems Laboratorydaslab.seas.harvard.edu/.../cs165/doc/class_slides/... · design data structures and algorithms = access methods study

CS165, Fall 2016 Stratos Idreos /4947

readingtextbook: chapters 1, 3 (-3.5), 5 (-5.8,-5.9) intro + relational model + SQL

browse “the Fourth Paradigm”

Page 48: Models & Intro to DB Architectures - Data Systems Laboratorydaslab.seas.harvard.edu/.../cs165/doc/class_slides/... · design data structures and algorithms = access methods study

CS165, Fall 2016 Stratos Idreos /4948

readings for next 3 classes

Architecture of a Database System (Sections 1,2,3,4)by J. Hellerstein, M. Stonebraker and J. Hamilton

The Design and Implementation of Modern Column-store Database Systemsby D. Abadi, P. Boncz, S. Harizopoulos, S. Idreos, S. Madden

Page 49: Models & Intro to DB Architectures - Data Systems Laboratorydaslab.seas.harvard.edu/.../cs165/doc/class_slides/... · design data structures and algorithms = access methods study

Models & Intro to DB Architectures

DATA SYSTEMSprof. Stratos Idreos

class 3