class 8 indexing & sorting - harvard...

55
indexing & sorting prof. Stratos Idreos HTTP://DASLAB.SEAS.HARVARD.EDU/CLASSES/CS165/ class 8

Upload: others

Post on 15-Mar-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: class 8 indexing & sorting - Harvard SEASdaslab.seas.harvard.edu/classes/cs165/doc/class_slides/... · 2019-05-13 · CS165, Fall 2017 Stratos Idreos 8 /33 midterms how to prepare

indexing & sortingprof. Stratos Idreos

HTTP://DASLAB.SEAS.HARVARD.EDU/CLASSES/CS165/

class 8

Page 2: class 8 indexing & sorting - Harvard SEASdaslab.seas.harvard.edu/classes/cs165/doc/class_slides/... · 2019-05-13 · CS165, Fall 2017 Stratos Idreos 8 /33 midterms how to prepare

CS165, Fall 2017 Stratos Idreos /332

first part done: basic concepts in modern systems

coming up: indexing and fast scans

Page 3: class 8 indexing & sorting - Harvard SEASdaslab.seas.harvard.edu/classes/cs165/doc/class_slides/... · 2019-05-13 · CS165, Fall 2017 Stratos Idreos 8 /33 midterms how to prepare

CS165, Fall 2017 Stratos Idreos /333

registers

on chip cache2x

on board cache10x

memory100x

disk100Kx

Jim Gray, IBM, Tandem, DEC, Microsoft ACM Turing award ACM SIGMOD Edgar F. Codd Innovations award

Pluto2 years

New York1.5 hours

this building10 min

this room1 min

my head~0

Page 4: class 8 indexing & sorting - Harvard SEASdaslab.seas.harvard.edu/classes/cs165/doc/class_slides/... · 2019-05-13 · CS165, Fall 2017 Stratos Idreos 8 /33 midterms how to prepare

CS165, Fall 2017 Stratos Idreos /334

random access & page-based access

need to only read x… but have to read all of page 1

page1 page2 page3

data value x

registers

on chip cache

on board cache

memory

disk

CPU

data

mov

e

Page 5: class 8 indexing & sorting - Harvard SEASdaslab.seas.harvard.edu/classes/cs165/doc/class_slides/... · 2019-05-13 · CS165, Fall 2017 Stratos Idreos 8 /33 midterms how to prepare

CS165, Fall 2017 Stratos Idreos /335

column-storage

A B C D

it all starts with how we layout the data (bits)

row-store and column-store are just two extremes in the design space

row-storage

A B C D

Page 6: class 8 indexing & sorting - Harvard SEASdaslab.seas.harvard.edu/classes/cs165/doc/class_slides/... · 2019-05-13 · CS165, Fall 2017 Stratos Idreos 8 /33 midterms how to prepare

CS165, Fall 2017 Stratos Idreos /336

as you are starting your projects, remembercome to office hours/lab - read/post in piazza

distributed API, DSL, code is supposed to help you start fast diverging is perfectly OK

functionality goal: select max(R.a), min(S.a) from R, S where R.j=S.j and R.b<20 and S.c>10 and S.d<50

+updates and persistency

performance goal: scalability (cores/queries)

cache conscious

Page 7: class 8 indexing & sorting - Harvard SEASdaslab.seas.harvard.edu/classes/cs165/doc/class_slides/... · 2019-05-13 · CS165, Fall 2017 Stratos Idreos 8 /33 midterms how to prepare

CS165, Fall 2017 Stratos Idreos /337

vectorwised processing: how to

select max(A) from R where B<20

p=select(B,null,20) a=fetch(A,p) res=max(a)

j=0; for(i=0; i<B.size; i+vector.size){

p=select(B,i,vector.size,null,20) a=fetch(A,p) rv[j++]=max(a)

} res=max(rv)

rewrite to

Extra: Enhanced stream processing in a DBMS kernelErietta Liarou, Stratos Idreos, Stefan Manegold, Martin Kersten In Proc. of the International Conf. on Extending Database Technology, 2013

optimizer assume optimizer does the rewriting and focus on analysis of property X - vectorwised vs column-at-a-time

take plans from here:

edge cases not included :)

Page 8: class 8 indexing & sorting - Harvard SEASdaslab.seas.harvard.edu/classes/cs165/doc/class_slides/... · 2019-05-13 · CS165, Fall 2017 Stratos Idreos 8 /33 midterms how to prepare

CS165, Fall 2017 Stratos Idreos /338

midtermshow to prepare

open book, notes, no laptop/discussion

material from lectures, “browse/read” readings

check all quizzes and questions

quiz-like questions - no exact answer

expectations: describe the design space - chose what you think is the best approach (>1 if we ask for it) and then analyze in detail all requests - if you made the wrong choice in the begging it is OK - but say so if you find out in the end and explain as much as possible

explain all steps and tradeoffs

Saturday & Sunday before midterm: Office hours 10am-3/5pm with each one of the five TFs and Stratos (noon-1pm)

10/11

Page 9: class 8 indexing & sorting - Harvard SEASdaslab.seas.harvard.edu/classes/cs165/doc/class_slides/... · 2019-05-13 · CS165, Fall 2017 Stratos Idreos 8 /33 midterms how to prepare

CS165, Fall 2017 Stratos Idreos /339

today+3data access made better

Page 10: class 8 indexing & sorting - Harvard SEASdaslab.seas.harvard.edu/classes/cs165/doc/class_slides/... · 2019-05-13 · CS165, Fall 2017 Stratos Idreos 8 /33 midterms how to prepare

CS165, Fall 2017 Stratos Idreos /3310

select

data data data

join

aggr

selectselect

join

it all starts with the select operator

it touches all the data

Page 11: class 8 indexing & sorting - Harvard SEASdaslab.seas.harvard.edu/classes/cs165/doc/class_slides/... · 2019-05-13 · CS165, Fall 2017 Stratos Idreos 8 /33 midterms how to prepare

CS165, Fall 2017 Stratos Idreos /3311

filtering data: point/range queriesindex

data

index knows structure of the data

an alternative data representation (data structure) of all or part of the data

Page 12: class 8 indexing & sorting - Harvard SEASdaslab.seas.harvard.edu/classes/cs165/doc/class_slides/... · 2019-05-13 · CS165, Fall 2017 Stratos Idreos 8 /33 midterms how to prepare

CS165, Fall 2017 Stratos Idreos /3312

why bother with creating/maintaining another data structure?

but wait, why not just sort the data (array) +

binary search?

Page 13: class 8 indexing & sorting - Harvard SEASdaslab.seas.harvard.edu/classes/cs165/doc/class_slides/... · 2019-05-13 · CS165, Fall 2017 Stratos Idreos 8 /33 midterms how to prepare

CS165, Fall 2017 Stratos Idreos /3313

ok let’s go with sorting for a while

A B C

initial state columns in

insertion order

sorted A B C

select B+C from R where A<10

Page 14: class 8 indexing & sorting - Harvard SEASdaslab.seas.harvard.edu/classes/cs165/doc/class_slides/... · 2019-05-13 · CS165, Fall 2017 Stratos Idreos 8 /33 midterms how to prepare

CS165, Fall 2017 Stratos Idreos /3314

a1 a2 a3 a4 a5

b1 b2 b3 b4 b5

c1 c2 c3 c4 c5

A B Ca5 a3 a2 a1 a4

Ab1 b2 b3 b4 b5

c1 c2 c3 c4 c5

B Cvalues are out of order

Page 15: class 8 indexing & sorting - Harvard SEASdaslab.seas.harvard.edu/classes/cs165/doc/class_slides/... · 2019-05-13 · CS165, Fall 2017 Stratos Idreos 8 /33 midterms how to prepare

CS165, Fall 2017 Stratos Idreos /3314

a1 a2 a3 a4 a5

b1 b2 b3 b4 b5

c1 c2 c3 c4 c5

A B Ca5 a3 a2 a1 a4

Ab1 b2 b3 b4 b5

c1 c2 c3 c4 c5

B Cvalues are out of order

5 3 2 1 4

Page 16: class 8 indexing & sorting - Harvard SEASdaslab.seas.harvard.edu/classes/cs165/doc/class_slides/... · 2019-05-13 · CS165, Fall 2017 Stratos Idreos 8 /33 midterms how to prepare

CS165, Fall 2017 Stratos Idreos /3314

a1 a2 a3 a4 a5

b1 b2 b3 b4 b5

c1 c2 c3 c4 c5

A B Ca5 a3 a2 a1 a4

Ab1 b2 b3 b4 b5

c1 c2 c3 c4 c5

B Cvalues are out of order

5 3 2 1 4

a5 a3 a2 a1 a4

A5 3 2 1 4

select2 1 4

b1 b2 b3 b4 b5

B

intermediate out of order

Page 17: class 8 indexing & sorting - Harvard SEASdaslab.seas.harvard.edu/classes/cs165/doc/class_slides/... · 2019-05-13 · CS165, Fall 2017 Stratos Idreos 8 /33 midterms how to prepare

CS165, Fall 2017 Stratos Idreos /3314

a1 a2 a3 a4 a5

b1 b2 b3 b4 b5

c1 c2 c3 c4 c5

A B Ca5 a3 a2 a1 a4

Ab1 b2 b3 b4 b5

c1 c2 c3 c4 c5

B Cvalues are out of order

5 3 2 1 4

a5 a3 a2 a1 a4

A5 3 2 1 4

select2 1 4

b1 b2 b3 b4 b5

B

intermediate out of order

sort or even better cluster at page boundaries

Page 18: class 8 indexing & sorting - Harvard SEASdaslab.seas.harvard.edu/classes/cs165/doc/class_slides/... · 2019-05-13 · CS165, Fall 2017 Stratos Idreos 8 /33 midterms how to prepare

CS165, Fall 2017 Stratos Idreos /3315

database kernel

data data data

algo

rithm

s/op

erat

ors

applications

sql

disk

memory

cpu

Page 19: class 8 indexing & sorting - Harvard SEASdaslab.seas.harvard.edu/classes/cs165/doc/class_slides/... · 2019-05-13 · CS165, Fall 2017 Stratos Idreos 8 /33 midterms how to prepare

CS165, Fall 2017 Stratos Idreos /3316

A B C

initial state columns in

insertion order

sorted A B C

sorted A B C

propagate order of A

Page 20: class 8 indexing & sorting - Harvard SEASdaslab.seas.harvard.edu/classes/cs165/doc/class_slides/... · 2019-05-13 · CS165, Fall 2017 Stratos Idreos 8 /33 midterms how to prepare

CS165, Fall 2017 Stratos Idreos /3317

pos1 pos2

1 0 1 0

A

sort

edselect max(D),min(E) from R where (A>10 and A<40) and (B>20 and B<60)

binary search for 10 & 40

B

for all B values between pos1 & 2: if B>20 and B<60 mark bit vector at pos i

maxD

D

for each marked position max(D)

avoid scan of A avoid TR on B work on a restricted area across all columns good for memory hierarchy

Page 21: class 8 indexing & sorting - Harvard SEASdaslab.seas.harvard.edu/classes/cs165/doc/class_slides/... · 2019-05-13 · CS165, Fall 2017 Stratos Idreos 8 /33 midterms how to prepare

CS165, Fall 2017 Stratos Idreos /3318

A

sort

edselect max(D),min(E) from R where (A>10 and A<40) or (B>20 and B<60)

binary search for 10 & 40

B

for all B values outside pos1 & 2: if B>20 and B<60 mark bit vector at pos i

maxD

D

for each marked position max(D)

0 0 0 1 1 1 1 1 0 0 0 0

0 1 0 1 1 1 1 1 0 1 1 0

Page 22: class 8 indexing & sorting - Harvard SEASdaslab.seas.harvard.edu/classes/cs165/doc/class_slides/... · 2019-05-13 · CS165, Fall 2017 Stratos Idreos 8 /33 midterms how to prepare

CS165, Fall 2017 Stratos Idreos /3319

A B C

base data

A B C

sort

ed

B A C

sort

ed

queries that filter on A benefit

…C-Store: A Column-oriented DBMSMichael Stonebraker, Daniel J. Abadi, Adam Batkin, Xuedong Chen, Mitch Cherniack, Miguel Ferreira, Edmond Lau, Amerson Lin, Samuel Madden, Elizabeth J. O'Neil, Patrick E. O'Neil, Alex Rasin, Nga Tran, Stanley B. Zdonik In Proc. of the Very Large Databases Conference (VLDB), 2005

queries that filter on B benefit

Page 23: class 8 indexing & sorting - Harvard SEASdaslab.seas.harvard.edu/classes/cs165/doc/class_slides/... · 2019-05-13 · CS165, Fall 2017 Stratos Idreos 8 /33 midterms how to prepare

CS165, Fall 2017 Stratos Idreos /33

initial state columns in

insertion order

20

A B C

base data

A B C

sort

ed

B A C

sort

ed

space overhead - update overhead - which ones to build?

Page 24: class 8 indexing & sorting - Harvard SEASdaslab.seas.harvard.edu/classes/cs165/doc/class_slides/... · 2019-05-13 · CS165, Fall 2017 Stratos Idreos 8 /33 midterms how to prepare

CS165, Fall 2017 Stratos Idreos /3321

declarative interface ask what you want

db system

DBAindexes/views/tuning knobs

Page 25: class 8 indexing & sorting - Harvard SEASdaslab.seas.harvard.edu/classes/cs165/doc/class_slides/... · 2019-05-13 · CS165, Fall 2017 Stratos Idreos 8 /33 midterms how to prepare

CS165, Fall 2017 Stratos Idreos /33

online

22

initial state columns in

insertion order

A B C

base datastorage budget<<smaller than the

possible set of projections

Browse: Self-organizing tuple reconstruction in column-storesStratos Idreos, Martin Kersten, Stefan Manegold In Proc. of the ACM SIGMOD Inter. Conference on Management of Data, 2009

Page 26: class 8 indexing & sorting - Harvard SEASdaslab.seas.harvard.edu/classes/cs165/doc/class_slides/... · 2019-05-13 · CS165, Fall 2017 Stratos Idreos 8 /33 midterms how to prepare

CS165, Fall 2017 Stratos Idreos /3323

L1 memory

L2 memory

CPU

(assume simplified memory hierarchy)

cost to sort array Cs?cost to find a value once sorted Ca?optimized algorithm to minimize Cs & Ca

data does not fit in L1 memory; it fits in L2 CPU can read/write directly from/to L1 only

Page 27: class 8 indexing & sorting - Harvard SEASdaslab.seas.harvard.edu/classes/cs165/doc/class_slides/... · 2019-05-13 · CS165, Fall 2017 Stratos Idreos 8 /33 midterms how to prepare

CS165, Fall 2017 Stratos Idreos /3324

memory level L

memory level L+1

(size=3 pages)

initial state: 8 unordered pages

Page 28: class 8 indexing & sorting - Harvard SEASdaslab.seas.harvard.edu/classes/cs165/doc/class_slides/... · 2019-05-13 · CS165, Fall 2017 Stratos Idreos 8 /33 midterms how to prepare

CS165, Fall 2017 Stratos Idreos /3324

memory level L

memory level L+1

(size=3 pages)

quicksort in place

initial state: 8 unordered pages

Page 29: class 8 indexing & sorting - Harvard SEASdaslab.seas.harvard.edu/classes/cs165/doc/class_slides/... · 2019-05-13 · CS165, Fall 2017 Stratos Idreos 8 /33 midterms how to prepare

CS165, Fall 2017 Stratos Idreos /3324

memory level L

memory level L+1

(size=3 pages)

initial state: 8 unordered pages

Page 30: class 8 indexing & sorting - Harvard SEASdaslab.seas.harvard.edu/classes/cs165/doc/class_slides/... · 2019-05-13 · CS165, Fall 2017 Stratos Idreos 8 /33 midterms how to prepare

CS165, Fall 2017 Stratos Idreos /3324

memory level L

memory level L+1

(size=3 pages)

quicksort in place

initial state: 8 unordered pages

Page 31: class 8 indexing & sorting - Harvard SEASdaslab.seas.harvard.edu/classes/cs165/doc/class_slides/... · 2019-05-13 · CS165, Fall 2017 Stratos Idreos 8 /33 midterms how to prepare

CS165, Fall 2017 Stratos Idreos /3324

memory level L

memory level L+1

(size=3 pages)

initial state: 8 unordered pages

Page 32: class 8 indexing & sorting - Harvard SEASdaslab.seas.harvard.edu/classes/cs165/doc/class_slides/... · 2019-05-13 · CS165, Fall 2017 Stratos Idreos 8 /33 midterms how to prepare

CS165, Fall 2017 Stratos Idreos /3324

memory level L

memory level L+1

(size=3 pages)

quicksort in place

initial state: 8 unordered pages

Page 33: class 8 indexing & sorting - Harvard SEASdaslab.seas.harvard.edu/classes/cs165/doc/class_slides/... · 2019-05-13 · CS165, Fall 2017 Stratos Idreos 8 /33 midterms how to prepare

CS165, Fall 2017 Stratos Idreos /3324

memory level L

memory level L+1

(size=3 pages)

initial state: 8 unordered pages

each page is now sorted we read and wrote every page once

data movement cost is 2N pages

Page 34: class 8 indexing & sorting - Harvard SEASdaslab.seas.harvard.edu/classes/cs165/doc/class_slides/... · 2019-05-13 · CS165, Fall 2017 Stratos Idreos 8 /33 midterms how to prepare

CS165, Fall 2017 Stratos Idreos /3325

memory level L

memory level L+1

(size=3 pages)

initial state: 8 sorted pages

Page 35: class 8 indexing & sorting - Harvard SEASdaslab.seas.harvard.edu/classes/cs165/doc/class_slides/... · 2019-05-13 · CS165, Fall 2017 Stratos Idreos 8 /33 midterms how to prepare

CS165, Fall 2017 Stratos Idreos /3325

memory level L

memory level L+1

(size=3 pages)

merge to new page

initial state: 8 sorted pages

Page 36: class 8 indexing & sorting - Harvard SEASdaslab.seas.harvard.edu/classes/cs165/doc/class_slides/... · 2019-05-13 · CS165, Fall 2017 Stratos Idreos 8 /33 midterms how to prepare

CS165, Fall 2017 Stratos Idreos /3325

memory level L

memory level L+1

(size=3 pages)

merge to new page

initial state: 8 sorted pages

Page 37: class 8 indexing & sorting - Harvard SEASdaslab.seas.harvard.edu/classes/cs165/doc/class_slides/... · 2019-05-13 · CS165, Fall 2017 Stratos Idreos 8 /33 midterms how to prepare

CS165, Fall 2017 Stratos Idreos /3325

memory level L

memory level L+1

(size=3 pages)

merge to new page

initial state: 8 sorted pages

Page 38: class 8 indexing & sorting - Harvard SEASdaslab.seas.harvard.edu/classes/cs165/doc/class_slides/... · 2019-05-13 · CS165, Fall 2017 Stratos Idreos 8 /33 midterms how to prepare

CS165, Fall 2017 Stratos Idreos /3325

memory level L

memory level L+1

(size=3 pages)

merge to new page

initial state: 8 sorted pages

Page 39: class 8 indexing & sorting - Harvard SEASdaslab.seas.harvard.edu/classes/cs165/doc/class_slides/... · 2019-05-13 · CS165, Fall 2017 Stratos Idreos 8 /33 midterms how to prepare

CS165, Fall 2017 Stratos Idreos /3325

memory level L

memory level L+1

(size=3 pages)

merge to new page

initial state: 8 sorted pages

Page 40: class 8 indexing & sorting - Harvard SEASdaslab.seas.harvard.edu/classes/cs165/doc/class_slides/... · 2019-05-13 · CS165, Fall 2017 Stratos Idreos 8 /33 midterms how to prepare

CS165, Fall 2017 Stratos Idreos /3325

memory level L

memory level L+1

(size=3 pages)

initial state: 8 sorted pages

Page 41: class 8 indexing & sorting - Harvard SEASdaslab.seas.harvard.edu/classes/cs165/doc/class_slides/... · 2019-05-13 · CS165, Fall 2017 Stratos Idreos 8 /33 midterms how to prepare

CS165, Fall 2017 Stratos Idreos /3325

memory level L

memory level L+1

(size=3 pages)

initial state: 8 sorted pages

each pair of pages is now sorted we read and wrote every page once

data movement cost is 2N pages (total 2N+2N)

Page 42: class 8 indexing & sorting - Harvard SEASdaslab.seas.harvard.edu/classes/cs165/doc/class_slides/... · 2019-05-13 · CS165, Fall 2017 Stratos Idreos 8 /33 midterms how to prepare

CS165, Fall 2017 Stratos Idreos /3326

1 pass to sort each page (2N pages)

1 pass to merge into 2 sorted pages (2N pages)

1 pass to merge into 4 sorted pages (2N pages)

1 pass to merge into 8 sorted pages (2N pages)

Page 43: class 8 indexing & sorting - Harvard SEASdaslab.seas.harvard.edu/classes/cs165/doc/class_slides/... · 2019-05-13 · CS165, Fall 2017 Stratos Idreos 8 /33 midterms how to prepare

CS165, Fall 2017 Stratos Idreos /3326

1 pass to sort each page (2N pages)

1 pass to merge into 2 sorted pages (2N pages)

1 pass to merge into 4 sorted pages (2N pages)

1 pass to merge into 8 sorted pages (2N pages)

Page 44: class 8 indexing & sorting - Harvard SEASdaslab.seas.harvard.edu/classes/cs165/doc/class_slides/... · 2019-05-13 · CS165, Fall 2017 Stratos Idreos 8 /33 midterms how to prepare

CS165, Fall 2017 Stratos Idreos /3326

1 pass to sort each page (2N pages)

1 pass to merge into 2 sorted pages (2N pages)

1 pass to merge into 4 sorted pages (2N pages)

1 pass to merge into 8 sorted pages (2N pages)

log2(N)

Page 45: class 8 indexing & sorting - Harvard SEASdaslab.seas.harvard.edu/classes/cs165/doc/class_slides/... · 2019-05-13 · CS165, Fall 2017 Stratos Idreos 8 /33 midterms how to prepare

CS165, Fall 2017 Stratos Idreos /3326

1 pass to sort each page (2N pages)

1 pass to merge into 2 sorted pages (2N pages)

1 pass to merge into 4 sorted pages (2N pages)

1 pass to merge into 8 sorted pages (2N pages)

log2(N)+1

Page 46: class 8 indexing & sorting - Harvard SEASdaslab.seas.harvard.edu/classes/cs165/doc/class_slides/... · 2019-05-13 · CS165, Fall 2017 Stratos Idreos 8 /33 midterms how to prepare

CS165, Fall 2017 Stratos Idreos /3326

1 pass to sort each page (2N pages)

1 pass to merge into 2 sorted pages (2N pages)

1 pass to merge into 4 sorted pages (2N pages)

1 pass to merge into 8 sorted pages (2N pages)

2N(log2(N)+1)

Page 47: class 8 indexing & sorting - Harvard SEASdaslab.seas.harvard.edu/classes/cs165/doc/class_slides/... · 2019-05-13 · CS165, Fall 2017 Stratos Idreos 8 /33 midterms how to prepare

CS165, Fall 2017 Stratos Idreos /3326

1 pass to sort each page (2N pages)

1 pass to merge into 2 sorted pages (2N pages)

1 pass to merge into 4 sorted pages (2N pages)

1 pass to merge into 8 sorted pages (2N pages)

2N(log2(N)+1) x bytesPerPage

Page 48: class 8 indexing & sorting - Harvard SEASdaslab.seas.harvard.edu/classes/cs165/doc/class_slides/... · 2019-05-13 · CS165, Fall 2017 Stratos Idreos 8 /33 midterms how to prepare

CS165, Fall 2017 Stratos Idreos /3327

in general, we have M pages in memory (not just 3), so

2N(log2(N)+1) -> 2N(logM-1(N)+1)

Page 49: class 8 indexing & sorting - Harvard SEASdaslab.seas.harvard.edu/classes/cs165/doc/class_slides/... · 2019-05-13 · CS165, Fall 2017 Stratos Idreos 8 /33 midterms how to prepare

CS165, Fall 2017 Stratos Idreos /3327

in general, we have M pages in memory (not just 3), so

2N(log2(N)+1) -> 2N(logM-1(N)+1)

in our first pass we can immediately sort groups of M pages

2N(logM-1(N)+1) -> 2N(logM-1(N/M)+1)

Page 50: class 8 indexing & sorting - Harvard SEASdaslab.seas.harvard.edu/classes/cs165/doc/class_slides/... · 2019-05-13 · CS165, Fall 2017 Stratos Idreos 8 /33 midterms how to prepare

CS165, Fall 2017 Stratos Idreos /3328

data size: N pages memory size: M pages

how much memory M do we need to sort N data in p passes only?

or

how much data can we sort in p passes if we have M memory?

logM-1(N/M)+1<=p

Page 51: class 8 indexing & sorting - Harvard SEASdaslab.seas.harvard.edu/classes/cs165/doc/class_slides/... · 2019-05-13 · CS165, Fall 2017 Stratos Idreos 8 /33 midterms how to prepare

CS165, Fall 2017 Stratos Idreos /33

previous discussion holds for all levels of memory hierarchy

29

Page 52: class 8 indexing & sorting - Harvard SEASdaslab.seas.harvard.edu/classes/cs165/doc/class_slides/... · 2019-05-13 · CS165, Fall 2017 Stratos Idreos 8 /33 midterms how to prepare

CS165, Fall 2017 Stratos Idreos /3330

other usage of sorting, e.g.,:order by group by sort-merge join remove duplicates sort/cluster ids/positions to avoid random access

Page 53: class 8 indexing & sorting - Harvard SEASdaslab.seas.harvard.edu/classes/cs165/doc/class_slides/... · 2019-05-13 · CS165, Fall 2017 Stratos Idreos 8 /33 midterms how to prepare

CS165, Fall 2017 Stratos Idreos /3331

Indexing helps navigate data faster than scan

Indexing is (some times) just another way to organize data

We need to consider all levels of memory hierarchy

when we design our algorithms

and to optimally use all available bytes

Notes to remember

Page 54: class 8 indexing & sorting - Harvard SEASdaslab.seas.harvard.edu/classes/cs165/doc/class_slides/... · 2019-05-13 · CS165, Fall 2017 Stratos Idreos 8 /33 midterms how to prepare

CS165, Fall 2017 Stratos Idreos /3332

Read textbook: Chapter 13

Browse: Self-organizing tuple reconstruction in column-storesStratos Idreos, Martin Kersten, Stefan Manegold In Proc. of the ACM SIGMOD Inter. Conference on Management of Data, 2009

Page 55: class 8 indexing & sorting - Harvard SEASdaslab.seas.harvard.edu/classes/cs165/doc/class_slides/... · 2019-05-13 · CS165, Fall 2017 Stratos Idreos 8 /33 midterms how to prepare

DATA SYSTEMSprof. Stratos Idreos

class 8

indexing & sorting