john kubiatowicz and anthony d. joseph electrical engineering and computer sciences

43
EECS 262a Advanced Topics in Computer Systems Lecture 16 C-Store / DB Cracking October 28 th , 2013 John Kubiatowicz and Anthony D. Joseph Electrical Engineering and Computer Sciences University of California, Berkeley http://www.eecs.berkeley.edu/~kubitron/cs262

Upload: bruce-nguyen

Post on 02-Jan-2016

17 views

Category:

Documents


0 download

DESCRIPTION

EECS 262a Advanced Topics in Computer Systems Lecture 16 C-Store / DB Cracking October 28 th , 2013. John Kubiatowicz and Anthony D. Joseph Electrical Engineering and Computer Sciences University of California, Berkeley http://www.eecs.berkeley.edu/~kubitron/cs262. Today’s Papers. - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: John Kubiatowicz and Anthony D. Joseph Electrical Engineering and Computer Sciences

EECS 262a Advanced Topics in Computer Systems

Lecture 16

C-Store / DB CrackingOctober 28th, 2013

John Kubiatowicz and Anthony D. JosephElectrical Engineering and Computer Sciences

University of California, Berkeley

http://www.eecs.berkeley.edu/~kubitron/cs262

Page 2: John Kubiatowicz and Anthony D. Joseph Electrical Engineering and Computer Sciences

10/28/2013 2cs262a-S13 Lecture-16

Today’s Papers

• C-Store: A Column-oriented DBMS* Mike Stonebraker, Daniel J. Abadi, Adam Batkin, Xuedong Chen, Mitch Cherniack, Miguel Ferreira, Edmond Lau, Amerson Lin, Sam Madden, Elizabeth O’Neil, Pat O’Neil, Alex Rasin, Nga Tran, Stan Zdonik. Appears in Proceedings of the ACM Conference on Very Large Databases(VLDB), 2005

• Database Cracking+

Stratos Idreos, Martin L. Kersten, and Stefan Manegold. Appears in the 3rd Biennial Conference on Innovative Data Systems Research (CIDR), January 7-10, 2007

• Wednesday: Comparison of PDBMS, CS, MR

• Thoughts?

*Some slides based on slides from Jianlin Feng, School of Software, Sun Yat-Sen University+Some slides based on slides from Stratos Idreos, CWI Amsterdam, The Netherlands

Page 3: John Kubiatowicz and Anthony D. Joseph Electrical Engineering and Computer Sciences

10/28/2013 3cs262a-S13 Lecture-16

Relational Data: Logical View

Name Age Department Salary

Bob 25 Math 10K

Bill 27 EECS 50K

Jill 24 Biology 80K

Page 4: John Kubiatowicz and Anthony D. Joseph Electrical Engineering and Computer Sciences

10/28/2013 4cs262a-S13 Lecture-16

C-Store: A Column-oriented DBMS

• Physical layout:

Row store or Column store

Record 1

Record 2Column

1

Column

2

Record 3

Column

3

Relation or Tables

Page 5: John Kubiatowicz and Anthony D. Joseph Electrical Engineering and Computer Sciences

10/28/2013 5cs262a-S13 Lecture-16

Row Stores

• On-Line Transaction Processing (OLTP)– ATM, POS in supermarkets

• Characteristics of OLTP applications: – Transactions that involve small numbers of records (or tuples)– Frequent updates (including queries)– Many users – Fast response times

• OLTP needs a write-optimized row store– Insert and delete a record in one physical write

• Easy to add new record, but might read unnecessary data (wasted memory and I/O bandwidth)

Page 6: John Kubiatowicz and Anthony D. Joseph Electrical Engineering and Computer Sciences

10/28/2013 6cs262a-S13 Lecture-16

Row Store: Columns Stored Together

Record id = <page id, slot #>

Page iRid = (i,N)

Rid = (i,2)

Rid = (i,1)

Pointerto startof freespaceSLOT DIRECTORY

N . . . 2 1

20 16 24 N

# slotsSlot Array

Data

Page 7: John Kubiatowicz and Anthony D. Joseph Electrical Engineering and Computer Sciences

10/28/2013 7cs262a-S13 Lecture-16

Current DBMS Gold Standard

• Store columns from one record contiguously on disk

• Use B-tree indexing

• Use small (e.g. 4K) disk blocks

• Align fields on byte or word boundaries

• Conventional (row-oriented) query optimizer and executor (technology from 1979)

• Aries-style transactions

Page 8: John Kubiatowicz and Anthony D. Joseph Electrical Engineering and Computer Sciences

10/28/2013 8cs262a-S13 Lecture-16

OLAP and Data Warehouses

• On-Line Analytical Processing (OLAP)– Flexible reporting for Business Intelligence

• Characteristics of OLAP applications– Transactions that involve large numbers of records– Frequent Ad-hoc queries and infrequent updates – A few decision-making users – Fast response times

• Data Warehouses facilitate reporting and analysis– Read-Mostly

• Other read-mostly applications– Customer Relationship Management (Siebel, Oracle) – Catalog Search in E-Commerce (Amazon.com, Bestbuy.com)

Page 9: John Kubiatowicz and Anthony D. Joseph Electrical Engineering and Computer Sciences

10/28/2013 9cs262a-S13 Lecture-16

Column Stores

• Logical data model: Relational Model

• Key Intuition: Only read relevant columns– Example: Ad-hoc queries read 2 columns out of 20

• Multiple prior column store implementations– Sybase IQ (early ’90s, bitmap index)– Addamark (i.e., SenSage, for Event Log data warehouse)– KDB (Column-stores for financial services companies)– MonetDB (Hyper-Pipelining Query Execution, CIDR’05)

• Only read necessary data, but might need multiple seeks

Page 10: John Kubiatowicz and Anthony D. Joseph Electrical Engineering and Computer Sciences

10/28/2013 11cs262a-S13 Lecture-16

Read-Optimized Databases

• Effect of column-orientation on performance?– Read-less, seek more so depends on prefetching, query selectivity,

tuple width, competing query traffic

45

…37

Joe

…Sue

1

…2

column stores

1 Joe 45

… … …2 Sue 37

row stores

Sybase IQMonetDBKDBC-Store

SQL ServerDB2Oracle

Page 11: John Kubiatowicz and Anthony D. Joseph Electrical Engineering and Computer Sciences

10/28/2013 12cs262a-S13 Lecture-16

Rows versus Columns

1 Joe 45

2 Sue 37… … … single file

project

Joe 45

1

2

Joe

Sue

45

37

……

3 files

Joe

45reconstruct

Joe 45

seek

column storesrow stores

Page 12: John Kubiatowicz and Anthony D. Joseph Electrical Engineering and Computer Sciences

10/28/2013 13cs262a-S13 Lecture-16

C-Store Technical Ideas

• Column Store with some “novel” ideas (below)

• Only materialized views on each relation (perhaps many)

• Active data compression

• Column-oriented query executor and optimizer

• Shared-nothing architecture

• Replication-based concurrency control and recovery

Writeable Store (WS)

Read-optimized Store (RS)

Tuple Mover

Read-optimized Store (RS)Read-optimized Store

(RS)Read-optimized Store (RS)

Page 13: John Kubiatowicz and Anthony D. Joseph Electrical Engineering and Computer Sciences

10/28/2013 14cs262a-S13 Lecture-16

Architecture of Vertica C-Store

Page 14: John Kubiatowicz and Anthony D. Joseph Electrical Engineering and Computer Sciences

10/28/2013 15cs262a-S13 Lecture-16

Basic Concepts

• A logical table is physically represented as a set of projections

• Each projection consists of a set of columns– Columns are stored separately, along with a common sort order

defined by SORT KEY

• Each column appears in at least one projection

• A column can have different sort orders if it is stored in multiple projections

Page 15: John Kubiatowicz and Anthony D. Joseph Electrical Engineering and Computer Sciences

10/28/2013 16cs262a-S13 Lecture-16

Example C-Store Projection

• LINEITEM(shipdate, quantity, retflag, suppkey | shipdate, quantity, retflag)– First sorted by shipdate– Second sorted by quantity– Third sorted by retflag

• Sorting increases locality of data– Favors compression techniques such as Run-

Length Encoding (see Elephant paper)

Page 16: John Kubiatowicz and Anthony D. Joseph Electrical Engineering and Computer Sciences

10/28/2013 17cs262a-S13 Lecture-16

C-Store Operators• Selection

– Produce bitmaps that can be efficiently combined

• Mask– Materialize a set of values from a column and a bitmap

• Permute– Reorder a column using a join index

• Projection– Free operation!– Two columns in the same order can be concatenated for

free

• Join– Produces positions rather than values

Page 17: John Kubiatowicz and Anthony D. Joseph Electrical Engineering and Computer Sciences

10/28/2013 18cs262a-S13 Lecture-16

Example: Join over Two Columns

Page 18: John Kubiatowicz and Anthony D. Joseph Electrical Engineering and Computer Sciences

10/28/2013 19cs262a-S13 Lecture-16

Column Store has High Compressibility

• Each attribute is stored in a separate column– Related values are compressible (versus values of separate attributes)

• Compression benefits– Reduces the data sizes– Improves disk (and memory) I/O performance by:

» reducing seek times (related data stored nearer together)» reducing transfer times (less data to read/write)» increasing buffer hit rate (buffer can hold larger fraction of data)

Page 19: John Kubiatowicz and Anthony D. Joseph Electrical Engineering and Computer Sciences

10/28/2013 20cs262a-S13 Lecture-16

Compression Methods

• Dictionary

• Bit-pack– Pack several attributes inside a 4-byte word– Use as many bits as max-value

• Delta– Base value per page– Arithmetic differences

• No Run-Length Encoding (unlike Elephant paper)

… ‘low’ …

… ‘high’ …

… ‘low’ …

… ‘normal’ …

… 00 …

… 10 …

… 00 …

… 01 …

Page 20: John Kubiatowicz and Anthony D. Joseph Electrical Engineering and Computer Sciences

10/28/2013 21cs262a-S13 Lecture-16

C-Store Use of Snapshot Isolation

• Snapshot Isolation for Fast OLAP/Data Warehousing– Allows very fast transactions without locks– Can read large consistent snapshot of database

• Divide into RS and WS stores– Read Store is Optimized for Fast Read-Only Transactions– Write Store is Optimized for Transactional Updates

• Low Water Mark (LWM)– Represents earliest epoch at which read-only transactions can run– RS contains tuples added before LWM

• High Water Mark (HWM)– Represents latest epoch at which read-only transactions can run

Page 21: John Kubiatowicz and Anthony D. Joseph Electrical Engineering and Computer Sciences

10/28/2013 22cs262a-S13 Lecture-16

Other Ideas

• K-safety: Can handle up to K-1 failures– Every piece of data replicated K times– Different projections sorted in different ways

• Join Tables– Construct original tuples given covering projects– Vertica gave up on Join Tables – too expensive, require super-projections

Page 22: John Kubiatowicz and Anthony D. Joseph Electrical Engineering and Computer Sciences

10/28/2013 23cs262a-S13 Lecture-16

Evaluation?

• Series of 7 queries against C-Store vs two commercial DBMS– C-Store faster In all cases, sometimes significantly

• Why so much faster?– Column Representation – avoid extra reads– Overlapping projections – multiple orderings of column as appropriate– Better data compression– Query operators operating on compressed data

Page 23: John Kubiatowicz and Anthony D. Joseph Electrical Engineering and Computer Sciences

10/28/2013 24cs262a-S13 Lecture-16

Summary

• Columns outperform rows in listed workloads– Reasons:

» Column representation avoids reads of unused attributes» Query operators operate on compressed representation,

mitigating storage barrier problem» Avoids memory-bandwidth bottleneck in rows

• Storing overlapping projections, rather than the whole table allows storage of multiple orderings of a column as appropriate

• Better compression of data allows more orderings in the same space

• Results from other papers:– Prefetching is key to columns outperforming rows– Systems with competing traffic favor columns

Page 24: John Kubiatowicz and Anthony D. Joseph Electrical Engineering and Computer Sciences

10/28/2013 25cs262a-S13 Lecture-16

Is this a good paper?

• What were the authors’ goals?• What about the evaluation/metrics?• Did they convince you that this was a good

system/approach?• Were there any red-flags?• What mistakes did they make?• Does the system/approach meet the “Test of Time”

challenge?• How would you review this paper today?

Page 25: John Kubiatowicz and Anthony D. Joseph Electrical Engineering and Computer Sciences

BREAK

Page 26: John Kubiatowicz and Anthony D. Joseph Electrical Engineering and Computer Sciences

10/28/2013 27cs262a-S13 Lecture-16

DB Physical Organization Problem

• Many DBMS perform well and efficiently only after being tuned by a DBA

• DBA decides:– Which indexes to build? – On which data parts? – and when to build them?

Page 27: John Kubiatowicz and Anthony D. Joseph Electrical Engineering and Computer Sciences

10/28/2013 28cs262a-S13 Lecture-16

Timeline

• Sample workload

• Analyze performance

• Prepare estimated physical design

• Queries

Very complex and time consuming process!

What about:

• Dynamic, changing workloads?

• Very Large Databases?

Page 28: John Kubiatowicz and Anthony D. Joseph Electrical Engineering and Computer Sciences

10/28/2013 29cs262a-S13 Lecture-16

Database Cracking

• Solve challenges of dynamic environments:– Remove all tuning and physical design steps, but still get

similar performance as a fully tuned system

• How?

• Design new auto-tuning kernels– DBA with cracking (operators, plans, structures, etc.)

Page 29: John Kubiatowicz and Anthony D. Joseph Electrical Engineering and Computer Sciences

10/28/2013 30cs262a-S13 Lecture-16

Database Cracking

• No monitoring• No preparation• No external tools• No full indexes• No human involvement

• Continuous on-the-fly physical reorganization– Partial, incremental, adaptive indexing

• Designed for modern column-stores

Page 30: John Kubiatowicz and Anthony D. Joseph Electrical Engineering and Computer Sciences

10/28/2013 31cs262a-S13 Lecture-16

Cracking Example• Each query is treated as an advice on how data

should be stored– Triggers physical re-organization of the database

Q1: select * from R

where R.A > 10

and R.A < 14

Column A

13

16

4

9

2

12

7

1

19

3

14

11

8

6

Page 31: John Kubiatowicz and Anthony D. Joseph Electrical Engineering and Computer Sciences

10/28/2013 32cs262a-S13 Lecture-16

Cracking Design

• The first time a range query is posed on an attribute A, a cracking DBMS makes a copy of column A, called the cracker column of A

• A cracker column is continuously physically re-organized based on queries that need to touch attribute such as the result is in a contiguous space

• For each cracker column, there is a cracker index

Page 32: John Kubiatowicz and Anthony D. Joseph Electrical Engineering and Computer Sciences

10/28/2013 33cs262a-S13 Lecture-16

Cracking Algorithms• Two types of cracking algorithms based on select’s

where clause

where X < # where # < X < # Split a piece into Split a piece into two new pieces three new pieces

Page 33: John Kubiatowicz and Anthony D. Joseph Electrical Engineering and Computer Sciences

10/28/2013 34cs262a-S13 Lecture-16

Cracker Select Operator

• Traditional select operator– Scans the column– Returns a new column that contains the qualifying values

• The cracker select operator– Searches the cracker index– Physically re-organizes the pieces found– Updates the cracker index– Returns a slice of the cracker column as the result

• More steps but faster because analyzes less data

Page 34: John Kubiatowicz and Anthony D. Joseph Electrical Engineering and Computer Sciences

10/28/2013 35cs262a-S13 Lecture-16

Cracker Column A

4

9

2

7

1

3

8

6

13

12

11

16

19

14

Cracking Example• Each query is treated as advice on how data should

be stored– Physically reorganize based on the selection predicate

Q1: select * from R

where R.A > 10

and R.A < 14

Column A

13

16

4

9

2

12

7

1

19

3

14

11

8

6

Piece 1:

A ≤ 10

Piece 2:

10 < A < 14

Piece 3:

14 ≤ A

Page 35: John Kubiatowicz and Anthony D. Joseph Electrical Engineering and Computer Sciences

10/28/2013 36cs262a-S13 Lecture-16

Cracker Column A

4

9

2

7

1

3

8

6

13

12

11

16

19

14

Cracking Example• Each query is treated as advice on how data should

be stored– Physically reorganize based on the selection predicate

Q1: select * from R

where R.A > 10

and R.A < 14

Column A

13

16

4

9

2

12

7

1

19

3

14

11

8

6

Piece 1:

A ≤ 10

Piece 2:

10 < A < 14

Piece 3:

14 ≤ A

Page 36: John Kubiatowicz and Anthony D. Joseph Electrical Engineering and Computer Sciences

10/28/2013 37cs262a-S13 Lecture-16

Cracker Column A

4

9

2

7

1

3

8

6

13

12

11

16

19

14

Cracking Example• Each query is treated as advice on how data should

be stored– Physically reorganize based on the selection predicate

Q1: select * from R

where R.A > 10

and R.A < 14

Column A

13

16

4

9

2

12

7

1

19

3

14

11

8

6

Piece 1:

A ≤ 10

Piece 2:

10 < A < 14

Piece 3:

14 ≤ A

Res

ult

tupl

es

Gain knowledge on how to organize

dataon-the-fly within

the select-operator

Improve data access for

future queries

Page 37: John Kubiatowicz and Anthony D. Joseph Electrical Engineering and Computer Sciences

10/28/2013 38cs262a-S13 Lecture-16

Cracker Column A

4

9

2

7

1

3

8

6

13

12

11

16

19

14

Cracking Example• Each query is treated as advice on how data should

be stored– Physically reorganize based on the selection predicate

Q1: select * from R

where R.A > 10

and R.A < 14

Column A

13

16

4

9

2

12

7

1

19

3

14

11

8

6

Piece 1:

A ≤ 10

Piece 2:

10 < A < 14

Piece 3:

14 ≤ A

Q1

Q2: select * from R

where R.A > 7

and R.A ≤16

Page 38: John Kubiatowicz and Anthony D. Joseph Electrical Engineering and Computer Sciences

10/28/2013 39cs262a-S13 Lecture-16

Cracker Column A

4

9

2

7

1

3

8

6

13

12

11

16

19

14

Cracking Example• Each query is treated as advice on how data should

be stored– Physically reorganize based on the selection predicate

Q1: select * from R

where R.A > 10

and R.A < 14

Column A

13

16

4

9

2

12

7

1

19

3

14

11

8

6

Piece 1:

A ≤ 10

Piece 2:

10 < A < 14

Piece 3:

14 ≤ A

Q1

Q2: select * from R

where R.A > 7

and R.A ≤16

Cracker Column A

4

2

1

3

6

7

9

8

13

12

11

14

16

19

Piece 1:

A ≤ 7

Piece 3:

10<A<14

Piece 4:

14≤A≤16

Q2Piece 2:

7 < A ≤10

Piece 5:

16 < A

Page 39: John Kubiatowicz and Anthony D. Joseph Electrical Engineering and Computer Sciences

10/28/2013 40cs262a-S13 Lecture-16

Cracker Column A

4

9

2

7

1

3

8

6

13

12

11

16

19

14

Cracking Example• Each query is treated as advice on how data should

be stored– Physically reorganize based on the selection predicate

Q1: select * from R

where R.A > 10

and R.A < 14

Column A

13

16

4

9

2

12

7

1

19

3

14

11

8

6

Piece 1:

A ≤ 10

Piece 2:

10 < A < 14

Piece 3:

14 ≤ A

Q1

Q2: select * from R

where R.A > 7

and R.A ≤16

Cracker Column A

4

2

1

3

6

7

9

8

13

12

11

14

16

19

Piece 1:

A ≤ 7

Piece 3:

10<A<14

Piece 4:

14≤A≤16

Q2Piece 2:

7 < A ≤10

Piece 5:

16 < A

Page 40: John Kubiatowicz and Anthony D. Joseph Electrical Engineering and Computer Sciences

10/28/2013 41cs262a-S13 Lecture-16

Cracker Column A

4

9

2

7

1

3

8

6

13

12

11

16

19

14

Cracking Example• Each query is treated as advice on how data should

be stored– Physically reorganize based on the selection predicate

Q1: select * from R

where R.A > 10

and R.A < 14

Column A

13

16

4

9

2

12

7

1

19

3

14

11

8

6

Piece 1:

A ≤ 10

Piece 2:

10 < A < 14

Piece 3:

14 ≤ A

Q1

Q2: select * from R

where R.A > 7

and R.A ≤16

Cracker Column A

4

2

1

3

6

7

9

8

13

12

11

14

16

19

Piece 1:

A ≤ 7

Piece 3:

10<A<14

Piece 4:

14≤A≤16

Q2Piece 2:

7 < A ≤10

Piece 5:

16 < A

Result

tuples

The more cracking,

the more learned

Page 41: John Kubiatowicz and Anthony D. Joseph Electrical Engineering and Computer Sciences

10/28/2013 42cs262a-S13 Lecture-16

Self-Organizing Behavior (Count(*) range query)

Page 42: John Kubiatowicz and Anthony D. Joseph Electrical Engineering and Computer Sciences

10/28/2013 43cs262a-S13 Lecture-16

Self-Organizing Behavior (TPC-H Query 6)• TPC-H is an ad-hoc, decision support benchmark

– Business oriented ad-hoc queries, concurrent data modifications

• Example: – Tell me the amount of revenue increase that would have resulted from eliminating

certain company-wide discounts in a given percentage range in a given year

• Workload:– Database load– Execution of 22 read-only

queries in both single and multi-user mode

– Execution of 2 refresh functions

Page 43: John Kubiatowicz and Anthony D. Joseph Electrical Engineering and Computer Sciences

10/28/2013 44cs262a-S13 Lecture-16

Is this a good paper?

• What were the authors’ goals?• What about the evaluation/metrics?• Did they convince you that this was a good

system/approach?• Were there any red-flags?• What mistakes did they make?• Does the system/approach meet the “Test of Time”

challenge?• How would you review this paper today?