databases in the cloudmhhammou/15440-f14/lectures/dat… · database warehouses •complete history...

33
DATABASES IN THE CLOUD @andy_pavlo CMU-Q 15-440 December 3 rd , 2014

Upload: others

Post on 09-Aug-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Databases in the Cloudmhhammou/15440-f14/lectures/Dat… · Database Warehouses •Complete history of OLTP databases. •Complex queries that analyze large segments of fact tables

DATABASES IN THE CLOUD @andy_pavlo CMU-Q 15-440 December 3rd, 2014

Page 2: Databases in the Cloudmhhammou/15440-f14/lectures/Dat… · Database Warehouses •Complete history of OLTP databases. •Complex queries that analyze large segments of fact tables

OLTP vs. OLAP databases.

Source: https://www.flickr.com/photos/adesigna/3237575990

Page 3: Databases in the Cloudmhhammou/15440-f14/lectures/Dat… · Database Warehouses •Complete history of OLTP databases. •Complex queries that analyze large segments of fact tables

On-line Transaction Processing

• Fast operations that ingest new data and then update state using ACID transactions.

• Only access a small amount of data. • Volume: 1k to 1m txn/sec • Latency: >1-50 ms • Database Size: 100s GB to 10s TB

3

Page 4: Databases in the Cloudmhhammou/15440-f14/lectures/Dat… · Database Warehouses •Complete history of OLTP databases. •Complex queries that analyze large segments of fact tables

Example

• -line game in the OLTP database.

4

Game Application Framework

Click Stream

Game Updates

OLTP DBMS Pre-computed model decides the next level the player is shown.

Page 5: Databases in the Cloudmhhammou/15440-f14/lectures/Dat… · Database Warehouses •Complete history of OLTP databases. •Complex queries that analyze large segments of fact tables

Example

• -line game in the OLTP database.

5

Game Application Framework

Click Stream

Game Updates

OLTP DBMS

Real-time Monitoring

Page 6: Databases in the Cloudmhhammou/15440-f14/lectures/Dat… · Database Warehouses •Complete history of OLTP databases. •Complex queries that analyze large segments of fact tables

Database Warehouses

• Complete history of OLTP databases. • Complex queries that analyze large

segments of fact tables and combine them with dimension tables.

• Volume: A couple queries per second • Latency: 1-60 seconds • Database Size: 100s TB to 10s PB

6

Page 7: Databases in the Cloudmhhammou/15440-f14/lectures/Dat… · Database Warehouses •Complete history of OLTP databases. •Complex queries that analyze large segments of fact tables

Example

• Compute model used to guide OLTP DBMS decisions from historical data.

7

Game Application Framework

Click Stream

Game Updates

OLTP DBMS OLAP DBMS ETL

New Model

Page 8: Databases in the Cloudmhhammou/15440-f14/lectures/Dat… · Database Warehouses •Complete history of OLTP databases. •Complex queries that analyze large segments of fact tables

OLTP vs. OLAP

• Storage Format: – OLTP → Row-oriented – OLAP → Column-oriented

• Primary Database Location: – OLTP → In-Memory – OLAP → Disks

• Workloads: – OLTP → Write-Heavy – OLAP → Read-Only 8

Page 9: Databases in the Cloudmhhammou/15440-f14/lectures/Dat… · Database Warehouses •Complete history of OLTP databases. •Complex queries that analyze large segments of fact tables

Things to consider with databases in the cloud.

Source: https://www.flickr.com/photos/arvidnn/15285491335

Page 10: Databases in the Cloudmhhammou/15440-f14/lectures/Dat… · Database Warehouses •Complete history of OLTP databases. •Complex queries that analyze large segments of fact tables

Good Things

• Better Resource Utilization • Elastic Scaling • Database-as-a-Service Offerings

10

Page 11: Databases in the Cloudmhhammou/15440-f14/lectures/Dat… · Database Warehouses •Complete history of OLTP databases. •Complex queries that analyze large segments of fact tables

Better Resource Utilization

• Combine multiple silos onto overprovisioned resources.

• Public platform providers achieve better economies of scale.

• Database machines are (mostly) dead. • Optimal multi-tenant placement is a difficult

problem.

11

Page 12: Databases in the Cloudmhhammou/15440-f14/lectures/Dat… · Database Warehouses •Complete history of OLTP databases. •Complex queries that analyze large segments of fact tables

Elastic Scaling

• Automatically provision new resources on the fly as needed.

• Scaling up vs. Scaling out. • Difficult for OLTP DBMS to continue

processing transactions while data migrates.

12

Page 13: Databases in the Cloudmhhammou/15440-f14/lectures/Dat… · Database Warehouses •Complete history of OLTP databases. •Complex queries that analyze large segments of fact tables

OLTP Scale-out Example

13

Elapsed Time

TPC-C Benchmark on H-Store (Fall 2014) Scaling from 3 to 4 nodes

E-Store: Fine-Grained Elastic Partitioning for Distributed Transaction Processing R.Taft, E.Mansour, M.Serafini, J.Duggan, A.J. Elmore, A.Aboulnaga, A.Pavlo, M.Stonebraker Proceedings of the VLDB Endowment, vol. 8, iss. 3, pages. 245 256, November 2014.

Page 14: Databases in the Cloudmhhammou/15440-f14/lectures/Dat… · Database Warehouses •Complete history of OLTP databases. •Complex queries that analyze large segments of fact tables

Database-as-a-Service

• Cloud provider manages physical configuration of a DBMS.

• Ideal for applications that are co-located in

• Combine private data with curated databases (i.e., data marts)

14

Page 15: Databases in the Cloudmhhammou/15440-f14/lectures/Dat… · Database Warehouses •Complete history of OLTP databases. •Complex queries that analyze large segments of fact tables

Bad Things

• I/O Virtualization • File system Replication • Security + Privacy Concerns • Performance Variance

15

Page 16: Databases in the Cloudmhhammou/15440-f14/lectures/Dat… · Database Warehouses •Complete history of OLTP databases. •Complex queries that analyze large segments of fact tables

I/O Virtualization

• Distributed file system stores data transparently across multiple nodes.

• • This causes a DBMS pull data to query

push query to data

16

Page 17: Databases in the Cloudmhhammou/15440-f14/lectures/Dat… · Database Warehouses •Complete history of OLTP databases. •Complex queries that analyze large segments of fact tables

OLAP I/O Virtualization

17

SELECT YEAR(o_date) AS o_year, AVG(o_amount) FROM orders GROUP BY o_year ORDER BY o_year ASC

OLAP DBMS

Terabytes!

Distrib

ute

d F

ilesy

stem

Page 18: Databases in the Cloudmhhammou/15440-f14/lectures/Dat… · Database Warehouses •Complete history of OLTP databases. •Complex queries that analyze large segments of fact tables

OLAP I/O Virtualization

18

SELECT YEAR(o_date) AS o_year, AVG(o_amount) FROM orders GROUP BY o_year ORDER BY o_year ASC

OLAP DBMS Bytes!

Distrib

ute

d F

ilesy

stem

Page 19: Databases in the Cloudmhhammou/15440-f14/lectures/Dat… · Database Warehouses •Complete history of OLTP databases. •Complex queries that analyze large segments of fact tables

File System Replication

• The DBMS should not rely on file system replication for durability.

• OLTP systems maintain replicas in-memory. • OLAP systems can store copies of tables in

different ways on replica nodes.

19

Page 20: Databases in the Cloudmhhammou/15440-f14/lectures/Dat… · Database Warehouses •Complete history of OLTP databases. •Complex queries that analyze large segments of fact tables

OLAP Replication

20

OLAP DBMS

Table 1: name

Table 2: name

Table 1: id

Table 2: id

Sort Order

Sort Order

Re

plica

#1

R

ep

lica #

2

Page 21: Databases in the Cloudmhhammou/15440-f14/lectures/Dat… · Database Warehouses •Complete history of OLTP databases. •Complex queries that analyze large segments of fact tables

OLAP Replication

21

OLAP DBMS

Table 1: name

Table 2: name

Table 1: id

Table 2: id

Sort Order

Table1.name ⨝

Table2.name

Sort Order

Re

plica

#1

R

ep

lica #

2

Page 22: Databases in the Cloudmhhammou/15440-f14/lectures/Dat… · Database Warehouses •Complete history of OLTP databases. •Complex queries that analyze large segments of fact tables

Security + Privacy Concerns

• No truly encrypted solution exists. • Many companies are unable to use public

cloud platforms.

22

Page 23: Databases in the Cloudmhhammou/15440-f14/lectures/Dat… · Database Warehouses •Complete history of OLTP databases. •Complex queries that analyze large segments of fact tables

Performance Variance

• DBMSs are sensitive to changes in underlying hardware performance.

•large fluctuations in performance.

23

Page 24: Databases in the Cloudmhhammou/15440-f14/lectures/Dat… · Database Warehouses •Complete history of OLTP databases. •Complex queries that analyze large segments of fact tables

OLTP Performance Variance

24 YCSB on MySQL (Winter 2012) Medium EC2 Instances

35% Difference

OLTP-Bench: An Extensible Testbed for Benchmarking Relational Databases Djellel Eddine Difallah, Andrew Pavlo, Carlo Curino, Philippe Cudre-Mauroux Proceedings of the VLDB Endowment, vol. 7, pages. 277 288, December 2013.

Page 25: Databases in the Cloudmhhammou/15440-f14/lectures/Dat… · Database Warehouses •Complete history of OLTP databases. •Complex queries that analyze large segments of fact tables

Cloud database vendors.

Source: https://www.flickr.com/photos/alestra/8891585632

Page 26: Databases in the Cloudmhhammou/15440-f14/lectures/Dat… · Database Warehouses •Complete history of OLTP databases. •Complex queries that analyze large segments of fact tables

Important Features

• Automatic Back-ups • Geo-replication • Elasticity / Live Reconfiguration • Efficient Multi-Tenancy • Workload Awareness

26

Page 27: Databases in the Cloudmhhammou/15440-f14/lectures/Dat… · Database Warehouses •Complete history of OLTP databases. •Complex queries that analyze large segments of fact tables

Cloud Database Vendors

• Cloud-friendly systems • Database-as-a-Service (DBaaS)

27

Page 28: Databases in the Cloudmhhammou/15440-f14/lectures/Dat… · Database Warehouses •Complete history of OLTP databases. •Complex queries that analyze large segments of fact tables

Cloud-friendly DBMSs

• Most DBMS vendors make it easy to deploy on cloud platforms.

• Others provide support for easy scale-out in a cloud environment.

• More than just pre-configured instances.

28

Page 29: Databases in the Cloudmhhammou/15440-f14/lectures/Dat… · Database Warehouses •Complete history of OLTP databases. •Complex queries that analyze large segments of fact tables

OLTP DBaaS

• Amazon RDS / Aurora • Microsoft Azure • Google Cloud SQL • Database.com • ClearDB • GenieDB • Clustrix

29

Page 30: Databases in the Cloudmhhammou/15440-f14/lectures/Dat… · Database Warehouses •Complete history of OLTP databases. •Complex queries that analyze large segments of fact tables

OLAP DBaaS

• Amazon Redshift • Google BigQuery • Microsoft Azure • Snowflake

30

Page 31: Databases in the Cloudmhhammou/15440-f14/lectures/Dat… · Database Warehouses •Complete history of OLTP databases. •Complex queries that analyze large segments of fact tables

Parting Thoughts

• The cloud does not magically make database problems go away.

•DBMS on the cloud.

• AFAIK, there is no truly autonomous DBMS as of yet.

31

Page 33: Databases in the Cloudmhhammou/15440-f14/lectures/Dat… · Database Warehouses •Complete history of OLTP databases. •Complex queries that analyze large segments of fact tables

END @andy_pavlo