research faculty summit 2018...through large-scale machine learning sigmod 2017 165 508 562 736 686...

Systems | Fueling future disruptions Research Faculty Summit 2018


Systems | Fueling future disruptions

Research Faculty Summit 2018


Part #1: Background
Part #2: Engineering
Part #3: Oracle Rant


AUTONOMOUS DBMSs: SELF-ADAPTIVE DATABASES

1970-1990s: Self-Adaptive Databases

SELECT * FROM A JOIN B ON A.ID = B.ID
WHERE A.VAL > 123 AND B.NAME LIKE 'XY%'

[Figure: labels Admin and Tuning Algorithm; columns A.ID, A.VAL, B.ID, B.NAME; annotations +100, +200, +50.]

→ Index Selection
→ Partitioning / Sharding
→ Data Placement


[Figure: Number of Knobs by year, 2000 to 2016 (y-axis 0 to 600); labeled values 541 and 291.]

AUTONOMOUS DBMSs: SELF-TUNING DATABASES

1990-2000s: Self-Tuning Databases

SELECT * FROM A JOIN B ON A.ID = B.ID
WHERE A.VAL > 123 AND B.NAME LIKE 'XY%'

[Figure: labels Admin, Tuning Algorithm, and AutoAdmin; columns A.ID, A.VAL, B.ID, B.NAME.]

→ Index Selection
→ Partitioning / Sharding
→ Data Placement
→ Knob Configuration


AUTONOMOUS DBMSs: CLOUD MANAGED DATABASES

2010s: Cloud Databases

→ Initial Placement
→ Tenant Migration


Why is this Previous Work Insufficient?


AUTONOMOUS DBMSs: A BRIEF HISTORY

Problem #1: Human Judgements
Problem #2: Reactionary Measures
Problem #3: No Transfer Learning


What is Different this Time?


CARNEGIE MELLON UNIVERSITY: RESEARCH PROJECTS

OtterTune: Existing Systems
Peloton: New System


OtterTune

Database Tuning-as-a-Service
→ Automatically generate DBMS knob configurations.
→ Reuse data from previous tuning sessions.

ottertune.cs.cmu.edu


OTTERTUNE: TPC-C TUNING

AUTOMATIC DATABASE MANAGEMENT SYSTEM TUNING THROUGH LARGE-SCALE MACHINE LEARNING, SIGMOD 2017

[Figure: two bar charts of Throughput (txn/sec), y-axis 0 to 1000. Legend: Default, RDS, DBA, Scripts, OtterTune. Bar labels in extraction order: first chart 165, 508, 562, 736, 686; second chart 426, 845, 714, 843, 946.]


Peloton

Self-Driving Database System
→ In-memory DBMS with integrated planning framework.
→ Designed for autonomous operations.

pelotondb.io


Design Considerations for Autonomous Operation


AUTONOMOUS DBMS: DESIGN CONSIDERATIONS

– Configuration Knobs
– Internal Metrics
– Action Engineering


CONFIGURATION KNOBS: UNTUNABLE KNOBS

Anything that requires a human value judgement should be marked as off-limits to autonomous components.

– File Paths / Network Info
– Durability / Isolation Levels
– Hardware Usage
– Recovery Time
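
One way to picture the off-limits rule is a knob catalog in which each entry carries a flag that autonomous components must respect. The sketch below is illustrative only: the `Knob` class, the `tunable` flag, and every knob name are hypothetical and are not taken from any particular DBMS.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Knob:
    name: str
    tunable: bool      # False = needs a human value judgement
    reason: str = ""   # why the knob is off-limits, if it is


# Hypothetical catalog; the off-limits categories follow the slide's list.
CATALOG = [
    Knob("data_directory", False, "file path"),
    Knob("listen_address", False, "network info"),
    Knob("isolation_level", False, "durability / isolation level"),
    Knob("max_memory_bytes", False, "hardware usage"),
    Knob("recovery_time_target_sec", False, "recovery time"),
    Knob("sort_buffer_bytes", True),
    Knob("optimizer_search_depth", True),
]


def tunable_knobs(catalog):
    """Return the names of knobs an autonomous component may change."""
    return [knob.name for knob in catalog if knob.tunable]


print(tunable_knobs(CATALOG))
```

Keeping the flag in the catalog, rather than in the tuner, means any tuning component that reads the catalog sees the same restrictions.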


CONFIGURATION KNOBS: HOW TO CHANGE

The autonomous components need hints about how to change a knob.

– Min/max ranges.
– Separate knobs to enable/disable a feature.
– Non-uniform deltas.

[Figure: knob scale marked 1 KB, 1 MB, 1 GB, 1 TB, with step sizes +100 KB, +100 MB, +100 GB.]
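
The min/max and non-uniform-delta hints can be sketched as a step function whose increment depends on the knob's current magnitude, using the 100 KB / 100 MB / 100 GB steps shown on the slide. The function name `next_value`, the `STEPS` table, and the use of binary units (1 KB = 1024 bytes) are assumptions made for illustration.

```python
KB, MB, GB, TB = 1024, 1024**2, 1024**3, 1024**4

# (exclusive upper bound of the range, step size within that range).
# Values below 1 MB move in 100 KB steps, below 1 GB in 100 MB steps,
# and below 1 TB in 100 GB steps.
STEPS = [(MB, 100 * KB), (GB, 100 * MB), (TB, 100 * GB)]


def next_value(current, min_value=KB, max_value=TB):
    """Return the next larger setting, clamped to [min_value, max_value]."""
    for upper, step in STEPS:
        if current < upper:
            return max(min_value, min(current + step, max_value))
    return max_value  # already at or beyond the largest range


print(next_value(1 * KB) // KB, "KB")
print(next_value(1 * MB) // MB, "MB")
```

A fixed step would either take far too many iterations to cross from kilobytes to gigabytes or overshoot small settings; scaling the step with the value avoids both.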


INTERNAL METRICS: SUB-COMPONENTS

If the DBMS has sub-components that are tunable, then it must expose separate metrics for those components.

Bad Example:


INTERNAL METRICS: SUB-COMPONENTS

RocksDB
– Column Family Knobs
– Column Family Metrics (Missing: Reads, Writes)
– Global Metrics
– Aggregated Metrics
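
A minimal sketch of the alternative: counters are recorded per sub-component, and the aggregated view is derived from them rather than being the only thing exposed. This is not RocksDB's API; the `MetricsRegistry` class and the component names `cf_users` and `cf_orders` are hypothetical.

```python
from collections import defaultdict


class MetricsRegistry:
    """Counters kept per sub-component, with a global view derived on demand."""

    def __init__(self):
        self._counters = defaultdict(lambda: defaultdict(int))

    def incr(self, component, metric, amount=1):
        """Record `amount` events of `metric` for one sub-component."""
        self._counters[component][metric] += amount

    def component(self, component):
        """Metrics for a single sub-component (empty if never recorded)."""
        return dict(self._counters.get(component, {}))

    def aggregated(self):
        """Sum of each metric across all sub-components."""
        total = defaultdict(int)
        for metrics in self._counters.values():
            for name, value in metrics.items():
                total[name] += value
        return dict(total)


registry = MetricsRegistry()
registry.incr("cf_users", "reads", 3)
registry.incr("cf_orders", "reads", 5)
registry.incr("cf_orders", "writes", 2)
print(registry.component("cf_users"))
print(registry.aggregated())
```

With only the aggregated numbers, a tuner that changes a knob on one component cannot tell whether that component's reads or writes moved.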


ACTION ENGINEERING: NO SHUTDOWN

No action should ever require the DBMS to restart in order for it to take effect. The commercial systems are much better at this than the open-source systems.


ACTION ENGINEERING: REPLICA EXPLORATION

Allow replica configurations to diverge from each other.

[Figure: one Master and its Replicas.]
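
A rough sketch of what divergent replica configurations make possible: a candidate configuration is tried on one replica and kept only if it scores better there, while the other replicas stay on their current settings. The `explore` function, the `measure` callback, and the `buffer_mb` knob are hypothetical; a real system would measure an actual workload rather than call a function.

```python
def explore(replica_configs, trial_replica, candidate, measure):
    """Return per-replica configs after testing `candidate` on one replica.

    `measure(config)` is a caller-supplied function returning a score where
    higher is better (for example, throughput observed under that config).
    """
    configs = dict(replica_configs)  # leave the caller's mapping untouched
    baseline = measure(configs[trial_replica])
    score = measure(candidate)
    if score > baseline:
        configs[trial_replica] = candidate  # this replica now diverges
    return configs


current = {"r1": {"buffer_mb": 64}, "r2": {"buffer_mb": 64}}
updated = explore(current, "r2", {"buffer_mb": 128},
                  measure=lambda cfg: cfg["buffer_mb"])  # stand-in score
print(updated)
```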


What About Oracle's Self-Driving DBMS?


ORACLE: SELF-DRIVING DBMS

– Automatic Indexing
– Automatic Recovery
– Automatic Scaling
– Automatic Query Tuning

[Figure labels: September 2017, January 2017.]

Problem #2: Reactionary Measures


CONCLUSION: MAIN TAKEAWAYS

True autonomous DBMSs are achievable in the next decade. You should think about how each new feature can be controlled by a machine.


@andy_pavlo


OTTERTUNE: AUTOMATIC DBMS TUNING SERVICE

[Figure: architecture diagram. Components: Target Database, Controller, Collector, Tuning Manager, Knob Analyzer, Metric Analyzer, Configuration Recommender, Internal Repository. Step label: Install Agent. Inset plot with axis label "99th %-tile (sec)".]

PELOTON: THE SELF-DRIVING DBMS

[Figure: "The Brain". Components: Workload History, Forecast Models, Search Tree, Action Catalog, Action Sequence, Target Database.]

QUERY-BASED WORKLOAD FORECASTING FOR SELF-DRIVING DATABASE MANAGEMENT SYSTEMS, SIGMOD 2018


PELOTON: BUS TRACKING APP WITH ONE-HOUR HORIZON

QUERY-BASED WORKLOAD FORECASTING FOR SELF-DRIVING DATABASE MANAGEMENT SYSTEMS, SIGMOD 2018

[Figure: Queries Per Hour (0 to 60000) from 9-Jan to 17-Jan, Actual vs. Predicted, using the Ensemble (LR+RNN) model.]


ACTION ENGINEERING: NOTIFICATIONS

Provide a notification callback to indicate when an action starts and when it completes. This is harder for changes that can be used before the action completes.
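
The start/complete callback could look roughly like the wrapper below, which reports both events around a unit of work. The `run_action` function, the event names, and the `create_index` action are hypothetical, and the sketch runs synchronously; it does not address the harder case of changes that become usable before the action finishes.

```python
def run_action(name, work, notify):
    """Run `work()` and report its lifecycle through `notify(event, name)`."""
    notify("start", name)
    try:
        work()
    except Exception:
        notify("failed", name)  # let the listener know the action did not finish
        raise
    notify("complete", name)


events = []
run_action(
    "create_index",
    work=lambda: events.append(("working", "create_index")),
    notify=lambda event, name: events.append((event, name)),
)
print(events)
```

With both timestamps available, a planning component can attribute a change in metrics to the action rather than to the workload.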


Thank you!
