class 14 joins - Harvard University (daslab.seas.harvard.edu/.../cs165/doc/class_slides/...)


joins

prof. Stratos Idreos

HTTP://DASLAB.SEAS.HARVARD.EDU/CLASSES/CS165/

class 14


CS165, Fall 2017, Stratos Idreos

Milestone1: quite involved but easy on the algorithmic side
Milestone2: easy once you understand we are just batching data
Milestone3: not easy, not easy (all concepts/tools needed)
Milestone4: better than M3 but still heavy on concepts
Milestone5: should be quick

Testing server: will run twice a day as of this week

Remember: limited chances of success if you try to do this alone

Lab marathon: once more in a couple of weeks


FINAL REPORT CONTAINS EXPERIMENTAL ANALYSIS

HOW TO DO EXPERIMENTS?

find out what matters, test by changing one thing at a time

say we want to test the select operator, to compare scan vs secondary index


do not listen to youtube while you run experiments!

close all apps, recreate the same environment every time

create scripts for everything

ISOLATE PERFORMANCE AS BEST AS POSSIBLE
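a minimal sketch of what such an experiment script could look like, in Python. It compares a scan against a sorted secondary index for the select operator and changes only one thing (the data size) between runs. All names (`scan_select`, `index_select`, `run_experiment`) and the sorted-list "index" are assumptions for illustration, not part of the course code.

```python
import bisect
import random
import statistics
import time


def scan_select(column, lo, hi):
    """Full scan: return positions whose value v satisfies lo <= v < hi."""
    return [pos for pos, v in enumerate(column) if lo <= v < hi]


def build_secondary_index(column):
    """Sorted (value, position) pairs; a stand-in for a real secondary index."""
    return sorted((v, pos) for pos, v in enumerate(column))


def index_select(index, lo, hi):
    """Binary search for the qualifying range, then return the positions."""
    start = bisect.bisect_left(index, (lo, -1))
    end = bisect.bisect_left(index, (hi, -1))
    return sorted(pos for _, pos in index[start:end])


def median_time(fn, *args, repeats=5):
    """Run fn several times and report the median wall-clock time (secs)."""
    times = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn(*args)
        times.append(time.perf_counter() - t0)
    return statistics.median(times)


def run_experiment(sizes, lo=100, hi=200, seed=42):
    """Vary only the data size; query range, seed and repeats stay fixed."""
    rows = []
    for n in sizes:
        rng = random.Random(seed)
        column = [rng.randrange(0, 10_000) for _ in range(n)]
        index = build_secondary_index(column)
        # Both access paths must return the same answer before we time them.
        assert scan_select(column, lo, hi) == index_select(index, lo, hi)
        rows.append((n,
                     median_time(scan_select, column, lo, hi),
                     median_time(index_select, index, lo, hi)))
    return rows


if __name__ == "__main__":
    for n, t_scan, t_index in run_experiment([1_000, 10_000, 100_000]):
        print(f"{n:>8} rows  scan={t_scan:.6f}s  index={t_index:.6f}s")
```

fixing the seed and taking the median of several runs is one way to recreate the same environment every time; the measured times still depend on the machine.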


examples for final evaluation

e.g., to test the select operator

[figure: two plots of response time (secs) vs. data size (GB), one labeled "OK" and one labeled "not OK"]


examples for final evaluation

e.g., to test shared scans

[figure: two plots of throughput (q/s) vs. # of queries, one labeled "OK" and one labeled "not OK"]


Midterm2: Nov 15 -> Nov 20?

Midterm1: overall great performance!
If you did not score 90+, please consider joining more for OH!
If you did score 90+, please consider joining more for OH!

all quizzes, all discussions, all “Read” readings

extra weekend OH will be announced


so far

database kernel
data, data, data
algorithms/operators
disk, memory, cpu

columns, rows, hybrids, trees

scan, binary search, tuple reconstruction, min, max, search b-tree, etc.

early/late tuple reconstruction, tuple-at-a-time, vectorized or bulk processing, intermediates format, pushing selects down, etc.


joins (project = m4)


fact table(id1,id2,…)

dimension table 1(id1,…)

dimension table 2(id2,…)

star schema

avoid duplicates - minimize update cost - but we have to do joins


snowflake schema


database

professor(id, name, …)
course(id, name, profId, …)
student(id, name, …)
enrolled(studentId, courseId, …)

foreign key

give me all students enrolled in cs165

select student.name from student, enrolled, course where course.name=“cs165” and enrolled.courseId=course.id and student.id=enrolled.studentId
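a small runnable sketch of the query above, using Python's built-in sqlite3 module. The sample rows (prof A, alice, bob, carol, cs265) are made up for illustration, and only the columns named on the slide are declared.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE professor(id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE course(id INTEGER PRIMARY KEY, name TEXT,
                        profId INTEGER REFERENCES professor(id));
    CREATE TABLE student(id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE enrolled(studentId INTEGER REFERENCES student(id),
                          courseId INTEGER REFERENCES course(id));

    -- made-up sample rows
    INSERT INTO professor VALUES (1, 'prof A');
    INSERT INTO course VALUES (10, 'cs165', 1), (11, 'cs265', 1);
    INSERT INTO student VALUES (100, 'alice'), (101, 'bob'), (102, 'carol');
    INSERT INTO enrolled VALUES (100, 10), (101, 11), (102, 10);
""")

# Same shape as the slide's query: two equi joins plus one selection.
query = """
    SELECT student.name
    FROM student, enrolled, course
    WHERE course.name = 'cs165'
      AND enrolled.courseId = course.id
      AND student.id = enrolled.studentId
"""
names = sorted(row[0] for row in conn.execute(query))
print(names)
```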


table 1, table 2

join

foreign key referencing table 2 primary key

find all tuples where FK=PK

join: glue the data back together

equi join


join

input (key, payload):
1,a1,b1,c1
2,a2,b2,c2
3,a3,b3,c3
4,a4,b4,c4
1,a5,b5,c5

input (key, payload):
1,d1,e1,f1
1,d2,e2,f2
2,d3,e3,f3
2,d4,e4,f4
2,d5,e5,f5
3,d6,e6,f6

join result:
1,d1,e1,f1,a1,b1,c1
1,d1,e1,f1,a5,b5,c5
1,d2,e2,f2,a1,b1,c1
1,d2,e2,f2,a5,b5,c5
2,d3,e3,f3,a2,b2,c2
2,d4,e4,f4,a2,b2,c2
2,d5,e5,f5,a2,b2,c2
3,d6,e6,f6,a3,b3,c3
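one way to produce the join result above is a plain nested-loop join, sketched here in Python. The slide does not name an algorithm, so the choice of nested loops (and the names `nested_loop_join`, `table_a`, `table_d`) is an assumption for illustration.

```python
def nested_loop_join(outer, inner):
    """Equi join on the first field (the key) of each tuple.

    For every outer tuple, scan the whole inner input and emit
    key + outer payload + inner payload for each key match.
    """
    result = []
    for o in outer:
        for i in inner:
            if o[0] == i[0]:
                result.append(o + i[1:])
    return result


table_a = [
    (1, "a1", "b1", "c1"),
    (2, "a2", "b2", "c2"),
    (3, "a3", "b3", "c3"),
    (4, "a4", "b4", "c4"),
    (1, "a5", "b5", "c5"),
]
table_d = [
    (1, "d1", "e1", "f1"),
    (1, "d2", "e2", "f2"),
    (2, "d3", "e3", "f3"),
    (2, "d4", "e4", "f4"),
    (2, "d5", "e5", "f5"),
    (3, "d6", "e6", "f6"),
]

# The d-table is the outer input, which matches the row order on the slide.
join_result = nested_loop_join(table_d, table_a)
for row in join_result:
    print(",".join(str(field) for field in row))
```

key 4 appears only in the first input, so it produces no output row.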


join

inputs:
v1, a1      v3, b1
v2, a2      v1, b2

inner join:
v1, a1, b2

left outer:
v1, a1, b2
v2, a2, null

right outer:
v1, a1, b2
v3, null, b1
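the three join variants can be sketched in a few lines of Python. The function name `join` and its `how` parameter are assumptions for illustration; `None` plays the role of null.

```python
def join(left, right, how="inner"):
    """Join two lists of (key, payload) pairs.

    how = "inner": only matching keys
    how = "left":  also keep unmatched left tuples, padded with None
    how = "right": also keep unmatched right tuples, padded with None
    """
    out = []
    matched_right = set()
    for lkey, lval in left:
        found = False
        for idx, (rkey, rval) in enumerate(right):
            if lkey == rkey:
                out.append((lkey, lval, rval))
                matched_right.add(idx)
                found = True
        if not found and how == "left":
            out.append((lkey, lval, None))
    if how == "right":
        for idx, (rkey, rval) in enumerate(right):
            if idx not in matched_right:
                out.append((rkey, None, rval))
    return out


left = [("v1", "a1"), ("v2", "a2")]
right = [("v3", "b1"), ("v1", "b2")]

inner = join(left, right, "inner")
left_outer = join(left, right, "left")
right_outer = join(left, right, "right")
print(inner, left_outer, right_outer, sep="\n")
```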


good plan

project student.name
  join student.id=enrolled.studentId
    student
    join enrolled.courseId=course.id
      enrolled
      select course.name=“cs165”
        course

select student.name from student, enrolled, course where course.name=“cs165” and enrolled.courseId=course.id and student.id=enrolled.studentId


pushing selects down
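a small Python sketch of why pushing selects down helps: both plans return the same names, but the plan that filters on the course name first hands a smaller intermediate to the joins. The sample rows and function names are made up for illustration.

```python
# (id, name), (studentId, courseId), (id, name): made-up sample data
students = [(100, "alice"), (101, "bob"), (102, "carol")]
enrolled = [(100, 10), (101, 11), (102, 10)]
courses = [(10, "cs165"), (11, "cs265")]


def plan_select_last():
    """Join everything first, apply the course name filter at the end."""
    ec = [(e, c) for e in enrolled for c in courses if e[1] == c[0]]
    sec = [(s, e, c) for (e, c) in ec for s in students if s[0] == e[0]]
    names = [s[1] for (s, e, c) in sec if c[1] == "cs165"]
    return names, len(ec)


def plan_select_pushed_down():
    """Filter courses first so the joins only see qualifying tuples."""
    cs165 = [c for c in courses if c[1] == "cs165"]
    ec = [(e, c) for e in enrolled for c in cs165 if e[1] == c[0]]
    names = [s[1] for (e, c) in ec for s in students if s[0] == e[0]]
    return names, len(ec)


names_late, intermediate_late = plan_select_last()
names_early, intermediate_early = plan_select_pushed_down()
print(names_late, intermediate_late)
print(names_early, intermediate_early)
```

on this toy data the first join produces 3 tuples when the select comes last and 2 tuples when it is pushed down.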


R(A,B,C,D) - S(A,E,F,G)

select max(R.D),min(S.G) from R,S where R.A=S.A and R.C<10 and S.F>30

plan operators:
select R.C, select S.F
fetch R.A, fetch S.A
join
max R.D, min S.G

block operator

access patterns
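a minimal column-at-a-time sketch of this plan in Python. Each operator consumes and produces whole lists (positions or values). The helper names (`select`, `fetch`, `join`), the fetch of R.D and S.G after the join, and the sample values are assumptions for illustration.

```python
# Made-up column data; each relation is a dict of equally long columns.
R = {"A": [1, 2, 3, 4], "B": [0, 0, 0, 0],
     "C": [5, 20, 7, 9], "D": [100, 200, 300, 400]}
S = {"A": [3, 1, 4, 2], "E": [0, 0, 0, 0],
     "F": [40, 10, 50, 60], "G": [7, 8, 9, 6]}


def select(column, predicate):
    """Return the positions of all qualifying values."""
    return [pos for pos, v in enumerate(column) if predicate(v)]


def fetch(column, positions):
    """Return the values stored at the given positions."""
    return [column[pos] for pos in positions]


def join(left_keys, right_keys):
    """Return (i, j) index pairs into the two key lists where keys match."""
    return [(i, j)
            for i, lk in enumerate(left_keys)
            for j, rk in enumerate(right_keys)
            if lk == rk]


r_pos = select(R["C"], lambda v: v < 10)    # select R.C
s_pos = select(S["F"], lambda v: v > 30)    # select S.F
r_keys = fetch(R["A"], r_pos)               # fetch R.A
s_keys = fetch(S["A"], s_pos)               # fetch S.A
pairs = join(r_keys, s_keys)                # join on R.A = S.A

# Map join output back to base positions, then fetch the aggregate columns.
r_match = [r_pos[i] for i, _ in pairs]
s_match = [s_pos[j] for _, j in pairs]
max_d = max(fetch(R["D"], r_match))         # max R.D
min_g = min(fetch(S["G"], s_match))         # min S.G
print(max_d, min_g)
```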


Query and Query Plan (MAL Algebra)

select sum(R.a) from R, S where R.c = S.b and 5<R.a<20 and 40<R.b<50 and 30<S.a<40

Initial Status

Relation R
pos   Ra   Rb   Rc
1     3    12   12
2     16   34   34
3     56   75   53
4     9    45   23
5     11   49   78
6     27   58   65
7     8    97   33
8     41   75   21
9     19   42   29
10    35   55   0

Relation S
pos   Sa   Sb
1     17   11
2     49   35
3     58   62
4     99   44
5     64   29
6     37   78
7     53   19
8     61   81
9     32   26
10    50   23

Plan, with the intermediate result of each step. Steps 3 and 5 use the corrected ranges shown on the slide, select(inter2,40,50) and select(Sa,50,65); the range on Sa in the plan and in the data differs from the 30<S.a<40 written in the query text.

 1. inter1 = select(Ra,5,20)                          positions 2, 4, 5, 7, 9
 2. inter2 = reconstruct(Rb,inter1)                   (2,34) (4,45) (5,49) (7,97) (9,42)
 3. inter3 = select(inter2,40,50)                     positions 4, 5, 9
 4. join_input_R = reconstruct(Rc,inter3)             (4,23) (5,78) (9,29)
 5. inter4 = select(Sa,50,65)                         positions 3, 5, 7, 8, 10
 6. inter5 = reconstruct(Sb,inter4)                   (3,62) (5,29) (7,19) (8,81) (10,23)
 7. join_input_S = reverse(inter5)                    (62,3) (29,5) (19,7) (81,8) (23,10)
 8. join_res_R_S = join(join_input_R,join_input_S)    (4,10) (9,5)
 9. inter6 = voidTail(join_res_R_S)                   4, 9
10. inter7 = reconstruct(Ra,inter6)                   (4,9) (9,19)
11. result = sum(inter7)                              28
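the eleven plan steps can be replayed with a short Python sketch. This is not MonetDB's MAL; the helpers are simplified stand-ins that assume 1-based positions, inclusive range bounds (the slide's intermediates keep the value 50 for select(Sa,50,65)), and intermediates stored as lists of (position, value) pairs. The column values are read off the slide's initial status.

```python
Ra = [3, 16, 56, 9, 11, 27, 8, 41, 19, 35]
Rb = [12, 34, 75, 45, 49, 58, 97, 75, 42, 55]
Rc = [12, 34, 53, 23, 78, 65, 33, 21, 29, 0]
Sa = [17, 49, 58, 99, 64, 37, 53, 61, 32, 50]
Sb = [11, 35, 62, 44, 29, 78, 19, 81, 26, 23]


def as_pairs(column):
    """View a base column as (position, value) pairs, positions from 1."""
    return [(pos, v) for pos, v in enumerate(column, start=1)]


def select(pairs, lo, hi):
    """Positions whose value lies in [lo, hi] (inclusive: an assumption)."""
    return [pos for pos, v in pairs if lo <= v <= hi]


def reconstruct(column, positions):
    """Fetch the column values at the given positions."""
    return [(pos, column[pos - 1]) for pos in positions]


def reverse(pairs):
    """Swap head and tail: (position, value) becomes (value, position)."""
    return [(v, pos) for pos, v in pairs]


def join(left, right):
    """Match left tails with right heads; emit (left head, right tail)."""
    return [(lh, rt) for lh, lt in left for rh, rt in right if lt == rh]


def void_tail(pairs):
    """Drop the tail, keeping only the head (here: positions in R)."""
    return [head for head, _ in pairs]


inter1 = select(as_pairs(Ra), 5, 20)             # step 1
inter2 = reconstruct(Rb, inter1)                 # step 2
inter3 = select(inter2, 40, 50)                  # step 3
join_input_R = reconstruct(Rc, inter3)           # step 4
inter4 = select(as_pairs(Sa), 50, 65)            # step 5
inter5 = reconstruct(Sb, inter4)                 # step 6
join_input_S = reverse(inter5)                   # step 7
join_res_R_S = join(join_input_R, join_input_S)  # step 8
inter6 = void_tail(join_res_R_S)                 # step 9
inter7 = reconstruct(Ra, inter6)                 # step 10
result = sum(v for _, v in inter7)               # step 11
print(result)
```

every operator works on one column at a time and materializes its output, which is why the plan needs explicit reconstruct steps to bring in the next column.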

project m4

Page 39: class 14 joins - Harvard Universitydaslab.seas.harvard.edu/.../cs165/doc/class_slides/...v3, null, b1 join /34 CS165, Fall 2017 Stratos Idreos 16 select courses.name=“cs165”

/34CS165, Fall 2017 Stratos Idreos 22

12347545495897754255

11356244297819812623

Relation R

Ra

RbRelation SSa Sb

select sum(R.a) from R, S where R.c = S.b and 5<R.a<20 and 40<R.b<50 and 30<S.a<40

Initial Status

1234532378653321290

Rc

31656911278

411935

Ra 24579

inter1

select(Ra,5,20)

31656911278

411935

24579

inter1 12347545495897754255

3445499742

3445499742

Rb inter2 reconstruct(Rb,inter1)

inter2 inter3 24579

24579

459

select(inter2,30,40)

inter3

459

1234532378653321290

Rc join_input_R

237829

459

join_input_R

237829

459

reconstruct(Rc,inter3)

1. inter1 = select(Ra,5,20)2. inter2 = reconstruct(Rb,inter1)3. inter3 = select(inter2,30,40)4. join_input_R = reconstruct(Rc,inter3)5. inter4 = select(Sa,55,65)6. inter5 = reconstruct(Sb,inter4)7. join_input_S = reverse(inter5) 8. join_res_R_S = join(join_input_R,join_input_S) 9. inter6 = voidTail(join_res_R_S)10. inter7 = reconstruct(Ra,inter6)11. result = sum(inter7)

17495899643753613250

Sa 3578

10

inter4 select(Sa,55,65)

17495899643753613250

6229198123

inter5reconstruct(Sb,inter4)

3578

10

3578

10

inter4 11356244297819812623

Sb6229198123

inter53578

10

6229198123

join_input_S 3578

10

reverse(inter5)

6229198123

join_input_S 3578

10

49

105

join_res_ R_S

49

105

join(join_input_R,join_input_S)join_res_ R_S

49

inter6voidTail(join_res_R_S)

49

inter6

Ra

31656911278

411935

919

inter7

919

inter7 reconstruct(Ra,inter6)

28

resultsum(inter7)

Query and Query Plan (MAL Algebra)

(1) (2) (4)(3)

(5) (6) (7)

(8) (9)

(10) (11)

select(inter2,40,50)

select(Sa,50,65)



nested loops

for all tuples of one side check all tuples of the other side

new resL[]; new resR[]; k=0
for (i=0;i<L;i++)
  for (j=0;j<R;j++)
    if L[i]==R[j]
      resL[k]=i
      resR[k++]=j

[Figure: columns L and R]
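A minimal runnable version of the nested loops pseudocode, written in Python as a sketch. The function name nested_loops_join and the sample key values in the usage note are made up for illustration; the output is the pair of position lists resL and resR from the slide.

```python
def nested_loops_join(left, right):
    """Return (res_l, res_r): positions i, j with left[i] == right[j]."""
    res_l, res_r = [], []
    for i, left_val in enumerate(left):        # outer: every value of L
        for j, right_val in enumerate(right):  # inner: every value of R
            if left_val == right_val:
                res_l.append(i)
                res_r.append(j)
    return res_l, res_r
```

For example, nested_loops_join([7, 3, 7], [3, 7]) pairs positions (0,1), (1,0) and (2,1).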


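join: outer(L), inner(R)

probe the red side for better data locality

say red fits in Level1: hold inner pages, stream outer pages

Total footprint=L+R+res (bytes), R.pages<=Level1.pages-2
(the 2 remaining Level1 pages hold the current outer page and the current res page)

[Figure: Level1 and Level2, with res]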


what if not all data fits in main-memory?

what if not all data fits in L3 cache?

what if not all data fits in L2 cache?

what if not all data fits in L1 cache?

can we utilize >1 cores?


join: for every L.key = R.key pair return [L.pos,R.pos]

L: key,payload
1,a1,b1,c1
2,a2,b2,c2
3,a3,b3,c3
4,a4,b4,c4
1,a5,b5,c5

R: key,payload
1,d1,e1,f1
1,d2,e2,f2
2,d3,e3,f3
2,d4,e4,f4
2,d5,e5,f5
3,d6,e6,f6

data/results stored one column-at-a-time

[Figure: CPU, level 1, level 2]

R.size > Level1.size, L.size > Level1.size, R.size+L.size << Level2.size, Level1 block size = Level2 block size

1) design a nested loops join algorithm and give its I/O cost
2) which column should be the inner and why?
3) describe optimizations to minimize Level1 misses

Quickly, if there is time, think of the following:
4) can we use sorting?
5) how would you use a b-tree to do a join?


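nested loops: comp O(LxR), I/O O(L/lp+Lx(R/rp))

new resL[]; new resR[]; k=0
for (i=0;i<L;i++)
  for (j=0;j<R;j++)
    if L[i]==R[j]
      resL[k]=i
      resR[k++]=j

lp = LeftEntriesThatFitInOnePage
rp = RightEntriesThatFitInOnePage
L = number of values in L column
R = number of values in R column

The outer is read once (L/lp pages); the inner is read in full (R/rp pages) once for each of the L outer values.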


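zig zag

alternate the scan direction of the inner on every pass, so the pages read last in one pass are read first in the next

[Figure: outer and inner columns with inner pages 1, 2, 3, shown for phase 1 and phase 2]

A number of pages will still be in LLC!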
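The effect of zig zag can be checked with a small simulation. The sketch below counts page misses for repeated passes over the inner pages, assuming an LRU replacement policy (the slides do not name the policy) and a cache that holds only inner pages. The function name count_inner_misses and its parameters are hypothetical.

```python
from collections import OrderedDict


def count_inner_misses(inner_pages, passes, cache_pages, zig_zag):
    """Count misses over `passes` scans of the inner through an LRU cache."""
    cache = OrderedDict()  # page id -> True, least recently used first
    misses = 0
    for scan in range(passes):
        order = list(range(inner_pages))
        if zig_zag and scan % 2 == 1:
            order.reverse()  # every second pass runs backwards
        for page in order:
            if page in cache:
                cache.move_to_end(page)  # hit: mark as most recently used
            else:
                misses += 1
                cache[page] = True
                if len(cache) > cache_pages:
                    cache.popitem(last=False)  # evict least recently used
    return misses
```

With 6 inner pages, 3 passes and room for 4 pages, the one-directional scan misses on every access (18 misses), while zig zag misses 6 times on the first pass and only 2 times on each later pass (10 misses).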



nested loops: comp O(LxR), I/O O(L/lp+Lx(R/rp))

I/O with zig zag:
L/lp + R/rp, if R/rp<=LLC-2
L/lp + Lx(R/rp-(LLC-2)), if R/rp<=2x(LLC-2)

lp = LeftEntriesThatFitInOnePage
rp = RightEntriesThatFitInOnePage
L = number of values in L column
R = number of values in R column
LLC = number of pages that fit in the last-level cache
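To get a feel for these formulas, they can be evaluated directly. The sketch below is a plain transcription of the cost expressions into Python; rounding page counts up with ceil and the function names are assumptions. For inners larger than 2x(LLC-2) pages the function falls back to the plain nested loops cost, following the later remark in the slides that zig zag no longer works in that case.

```python
from math import ceil


def io_nested_loops(L, R, lp, rp):
    """I/O of plain nested loops: read L once, read all of R per L value."""
    return ceil(L / lp) + L * ceil(R / rp)


def io_zig_zag(L, R, lp, rp, llc):
    """I/O of nested loops with zig zag; llc is the cache size in pages."""
    outer_pages = ceil(L / lp)
    inner_pages = ceil(R / rp)
    kept = llc - 2  # pages of the inner that can stay cached
    if inner_pages <= kept:
        return outer_pages + inner_pages
    if inner_pages <= 2 * kept:
        return outer_pages + L * (inner_pages - kept)
    return io_nested_loops(L, R, lp, rp)  # fallback, see lead-in
```

For L = R = 1000 values with 100 values per page, the plain cost is 10010 page reads; with llc = 12 the inner stays cached and the cost drops to 20, and with llc = 8 it is 4010.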



blocked nested loops

new resL[]; new resR[]; k=0
for (i=0;i<L;i=i+lp)
  for (j=0;j<R;j=j+rp)
    for (r=i;r<min(i+lp,L);r++)
      for (m=j;m<min(j+rp,R);m++)
        if L[r]==R[m]
          resL[k]=r
          resR[k++]=m

nested loops: comp O(LxR), I/O O(L/lp+Lx(R/rp))
blocked nested loops: comp O(LxR), I/O O(L/lp+(L/lp)x(R/rp))

I/O with zig zag:
L/lp + R/rp, if R/rp<=LLC-2
L/lp + (L/lp)x(R/rp-(LLC-2)), if R/rp<=2x(LLC-2)

But if R/rp>2x(LLC-2) zig zag does not work anymore… and so we can flip inner/outer decision
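The blocked variant can be sketched in Python as follows. The function name is hypothetical, lp and rp are the per-page entry counts from the slides, and the min() bounds are an addition so that a last, partially filled page does not run past the end of a column.

```python
def blocked_nested_loops_join(left, right, lp, rp):
    """Nested loops join that works page by page on both inputs."""
    res_l, res_r = [], []
    for i in range(0, len(left), lp):           # one outer page at a time
        for j in range(0, len(right), rp):      # one inner page at a time
            for r in range(i, min(i + lp, len(left))):
                for m in range(j, min(j + rp, len(right))):
                    if left[r] == right[m]:
                        res_l.append(r)
                        res_r.append(m)
    return res_l, res_r
```

The result contains the same position pairs as plain nested loops, but in a different order, because each inner page is now compared against a whole outer page before the next inner page is touched.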


sort merge join

sort on key (left), sort on key (right), merge

while left and right still have tuples
  if left.val < right.val
    left++
  else if left.val > right.val
    right++
  else
    add to result, left++, right++

+ handle duplicates on both sides!

perfect if data is sorted or needs to be sorted anyway for the rest of the plan
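One possible way to handle duplicates on both sides is to find the run of equal keys on each side and emit every combination before moving on. The Python sketch below does this; it is one approach among several, the function name is hypothetical, and it sorts position lists rather than the columns themselves so that the result still reports original positions.

```python
def sort_merge_join(left, right):
    """Return (res_l, res_r) with original positions of all equal-key pairs."""
    # Sort positions by key; the columns themselves stay untouched.
    l_order = sorted(range(len(left)), key=lambda i: left[i])
    r_order = sorted(range(len(right)), key=lambda j: right[j])
    res_l, res_r = [], []
    i = j = 0
    while i < len(l_order) and j < len(r_order):
        l_val, r_val = left[l_order[i]], right[r_order[j]]
        if l_val < r_val:
            i += 1
        elif l_val > r_val:
            j += 1
        else:
            # Find the end of the run of equal keys on each side.
            i_end = i
            while i_end < len(l_order) and left[l_order[i_end]] == l_val:
                i_end += 1
            j_end = j
            while j_end < len(r_order) and right[r_order[j_end]] == l_val:
                j_end += 1
            # Emit every combination of the two runs.
            for l_pos in l_order[i:i_end]:
                for r_pos in r_order[j:j_end]:
                    res_l.append(l_pos)
                    res_r.append(r_pos)
            i, j = i_end, j_end
    return res_l, res_r
```

With left keys 1,2,3,4,1 and right keys 1,1,2,2,2,3 this yields 8 pairs: 4 for key 1, 3 for key 2 and 1 for key 3.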


utilize index

new resL[]; new resR[]; k=0
for (i=0;i<L;i++)
  jk=R.btree.probe(L[i])
  if (jk!=null)
    resL[k]=i
    resR[k++]=jk.pos

[Figure: columns L and R]
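The index join can be tried out without a real b-tree. In the Python sketch below, a sorted list of keys searched with the standard bisect module stands in for R.btree; this is a substitution for illustration, not how the course project implements its index. Unlike the pseudocode, probe returns all matching positions, so duplicates in R are covered. All function names are hypothetical.

```python
import bisect


def build_sorted_index(column):
    """Stand-in for a b-tree: keys in sorted order plus their positions."""
    entries = sorted((value, pos) for pos, value in enumerate(column))
    keys = [value for value, _ in entries]
    positions = [pos for _, pos in entries]
    return keys, positions


def probe(index, key):
    """Return the positions of all entries equal to key."""
    keys, positions = index
    lo = bisect.bisect_left(keys, key)
    hi = bisect.bisect_right(keys, key)
    return positions[lo:hi]


def index_join(left, right_index):
    """For every value of L, probe the index on R instead of scanning R."""
    res_l, res_r = [], []
    for i, value in enumerate(left):
        for pos in probe(right_index, value):
            res_l.append(i)
            res_r.append(pos)
    return res_l, res_r
```

Each probe costs a binary search instead of a full pass over R, which is the point of using the index.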


Read: textbook: chapters 4, 14


DATA SYSTEMS

prof. Stratos Idreos

class 14

joins