class 5 column stores 2 - harvard university

Post on 19-Apr-2022

3 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

column stores 2.0prof. Stratos Idreos

HTTP://DASLAB.SEAS.HARVARD.EDU/CLASSES/CS165/

class 5

CS165, Fall 2017 Stratos Idreos /282

what just happened?where is my data?

email, cloud, social media, …

can we design systems that let us know what is going on?

worth thinking about…

CS165, Fall 2017 Stratos Idreos /283

cool papers 2.0

The Case for RodentStore: An Adaptive, Declarative Storage SystemPhilippe Cudré-Mauroux, Eugene Wu, Samuel Madden In Proc. of the Inter. Conference on Innovative Data Systems Research (CIDR), 2009

Abstraction Without Regret in Database Systems Building: a ManifestoChristoph KochIEEE Data Eng. Bull. 37(1): 70-79 (2014)

dbTouch: Analytics at your FingertipsStratos Idreos and Erietta Liarou In Proc. of the Inter. Conference on Innovative Data Systems Research (CIDR), 2013

CS165, Fall 2017 Stratos Idreos /284

design doc think, design, create 1-2 page PDF doc and ask for feedback mandatory M1-M3, optional afterwards

submit through Canvas

do not worry about perfection: fail fast wrong ideas ok if you eventually find out they are wrong :) (holds for midterms as well)

CS165, Fall 2017 Stratos Idreos /285

Jim Gray, IBM, Tandem, DEC, Microsoft ACM Turing award ACM SIGMOD Edgar F. Codd Innovations Award

disk100Kx Pluto

2 years

memory100x New York1.5 hours

on board cache10x this building

10 min

on chip cache2x this room

1 min

registers my head~0

CS165, Fall 2017 Stratos Idreos /28

the way we store data defines the possible (efficient) access methods

6

CS165, Fall 2017 Stratos Idreos /287

free_offset, N, offset1-length1, offset2-lenght2,…

free space

slotted page

scan null

update var length

CS165, Fall 2017 Stratos Idreos /288

row-store column-storeABC D A B C D

CS165, Fall 2017 Stratos Idreos /289

a1 a2 a3 a4 a5 a6

b1 b2 b3 b4 b5 b6

c1 c2 c3 c4 c5 c6

virtual ids/ positional alignment

positional lookups/joinsA(i) = A + i * width(A)

tuple 1tuple 2tuple 3tuple 4tuple 5tuple 6

A B C

fixed-width + dense

columns do not need to have the

same width

CS165, Fall 2017 Stratos Idreos /28

todaycolumn-stores 2.0

10

CS165, Fall 2017 Stratos Idreos /2811

select min(C) from R where A<10 & B<20

B<20 minCA<10A B C D IDs B CIDsdisk memory

query plan = select -> fetch -> select -> fetch - > min

sequential access patterns, max 1 if

CS165, Fall 2017 Stratos Idreos /2811

select min(C) from R where A<10 & B<20

B<20 minCA<10A B C D IDs B CIDsdisk memory

query plan = select -> fetch -> select -> fetch - > min

sequential access patterns, max 1 if

CS165, Fall 2017 Stratos Idreos /2811

select min(C) from R where A<10 & B<20

B<20 minCA<10A B C D IDs B CIDsdisk memory

query plan = select -> fetch -> select -> fetch - > min

sequential access patterns, max 1 if

CS165, Fall 2017 Stratos Idreos /2811

select min(C) from R where A<10 & B<20

B<20 minCA<10A B C D IDs B CIDsdisk memory

query plan = select -> fetch -> select -> fetch - > min

sequential access patterns, max 1 if

CS165, Fall 2017 Stratos Idreos /2811

select min(C) from R where A<10 & B<20

B<20 minCA<10A B C D IDs B CIDsdisk memory

query plan = select -> fetch -> select -> fetch - > min

sequential access patterns, max 1 if

CS165, Fall 2017 Stratos Idreos /2811

select min(C) from R where A<10 & B<20

B<20 minCA<10A B C D IDs B CIDsdisk memory

query plan = select -> fetch -> select -> fetch - > min

sequential access patterns, max 1 if

CS165, Fall 2017 Stratos Idreos /2812

working over fixed width & dense columns

for (i=0;i<size;i++) if column[i]>v inter1[j++]=i

no function calls, no indirections, no auxiliary data, min ifs easy to prefetch next data values

for (i=0;i<size;i++) inter2[j++]=column[inter1[i]]

select

fetch

with data being memory resident these become significant cost components

CS165, Fall 2017 Stratos Idreos /2813

B<20 minCA<10 IDs B CIDs

alt1) start with B alt2) scan A & B independently and merge alt3) store intermediates as bit vectors - not positions …

CS165, Fall 2017 Stratos Idreos /2813

B<20 minCA<10 IDs B CIDs

alt1) start with B alt2) scan A & B independently and merge alt3) store intermediates as bit vectors - not positions …

project: basic one + more if you decide to invest in this area midterm: basic one + 2-3 alternatives

CS165, Fall 2017 Stratos Idreos /2814

B<20 minCA<10 IDs B CIDs

late tuple reconstruction/materialization only reconstruct to present results

no need to assemble tuples minimize memory footprint minimize data we are moving up the memory hierarchy but requires new processing engine

CS165, Fall 2017 Stratos Idreos /2815

disk memoryA B C D

A

ABCrow-store

engineearly tuple

reconstruction/materialization

option1

option2

column-store

engine

CS165, Fall 2017 Stratos Idreos /2816

possible data flow patternstuple at a time block/vector at a time column at a time

B<20 minCA<10 IDs B CIDs

CS165, Fall 2017 Stratos Idreos /2817

select min(C) from R where A<10 & B<20

B<20 minCA<10A B C D IDs B CIDs

A B C D B<20 minCA<10 IDs B CIDs

column-

vector-

CS165, Fall 2017 Stratos Idreos /2818

CEO/Co-founder of Vectorwise (now Actian) now: “changing the world, one terabyte at a time” co-founder of Snowflake

the beer analogy

Marcin Zukowski, PhD

CS165, Fall 2017 Stratos Idreos /2819

registers

on chip cache

on board cache

memory

disk

CPU

chea

per

fast

erop1 op2

query plan

A B

A Bop3

A

size of vector

CS165, Fall 2017 Stratos Idreos /2820

tuple at a time - good for minimizing memory footprint bulk processing - good minimizing functional overhead

vectorized processing - somewhere in between

CS165, Fall 2017 Stratos Idreos /2821

history/timeline

~1960s

tuple at a time

1980s: ideas about block processing

2005: vectorwise

tuple at a time tuple at a time

>2010: industry adoption

CS165, Fall 2017 Stratos Idreos /28

project: column-at-a-time

bonus: vectorized processing

22

CS165, Fall 2017 Stratos Idreos /2823

update row7=(A=a,B=b,C=c,D=d)

row-store column-storeABCD A B C D

vs

which is better to update and why? how much does it cost to update a single row? (think about pages, data movement) how to update in column-stores? (query plan + algorithms)

CS165, Fall 2017 Stratos Idreos /28

A

24

A B C D

B C D

base data pending updates

updatequery

periodically

CS165, Fall 2017 Stratos Idreos /2825

A B C D

columns copy rows copy

fractured mirrors

ABCD

optimizer

query

A case for fractured mirrorsRavishankar Ramamurthy, David J. DeWitt, Qi Su Very Large Databases Journal, 12(2): 89-101, 2003

CS165, Fall 2017 Stratos Idreos /2826

column-stores great for analytics

row-stores great for transactions

still basic concepts are the same

hybrids possible

keep access patterns sequential

and simple (min ifs)

Notes to remember

CS165, Fall 2017 Stratos Idreos /2827

reading

Read: The Design and Implementation of Modern Column-store Database Systems (Sections: all -4.6 & 4.8)by D. Abadi, P. Boncz, S. Harizopoulos, S. Idreos, S. Madden

Read: IEEE Data Engineering Bulletin, 35(1), March 2012 Special Issue on Column-stores (9 short overview papers)

CS165, Fall 2017 Stratos Idreos /2828

research papers

Read: Database Architecture Optimized for the New Bottleneck: Memory Access Peter Boncz, Stefan Manegold, Martin Kersten In Proc. of the Very Large Databases Conference (VLDB), 1999

Browse: MonetDB/X100: Hyper-Pipelining Query Execution Peter A. Boncz, Marcin Zukowski, Niels NesIn Proc. of the Inter. Conference on Innovative Data Systems Research (CIDR), 2005Browse: Materialization Strategies in a Column-Oriented DBMSDaniel Abadi, Daniel Myers, David DeWitt, Samuel Madden In Proc. of the Inter. Conference on Data Engineering (ICDE), 2007

Browse: Self-organizing tuple reconstruction in column-storesStratos Idreos, Martin Kersten, Stefan Manegold In Proc. of the ACM SIGMOD Inter. Conference on Management of Data, 2009

DATA SYSTEMSprof. Stratos Idreos

class 5

column-stores 2.0

top related