data analytics using deep learningjarulraj/courses/8803-f... · gt 8803 // fall 2018 paper...

45
DATA ANALYTICS USING DEEP LEARNING GT 8803 // FALL 2018 // JOY ARULRAJ LECTURE #19: LEARNING STATE REPRESENTATIONS FOR QUERY OPTIMIZATION WITH DEEP REINFORCEMENT LEARNING

Upload: others

Post on 04-Jun-2020

6 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: DATA ANALYTICS USING DEEP LEARNINGjarulraj/courses/8803-f... · GT 8803 // Fall 2018 PAPER •Learning State Representations for Query Optimization with Deep Reinforcement Learning

DATA ANALYTICS USING DEEP LEARNINGGT 8803 // FALL 2018 // JOY ARULRAJ

L E C T U R E # 1 9 :L E A R N I N G S T A T E R E P R E S E N T A T I O N S F O R Q U E R Y O P T I M I Z A T I O N W I T H D E E P R E I N F O R C E M E N T L E A R N I N G

Page 2: DATA ANALYTICS USING DEEP LEARNINGjarulraj/courses/8803-f... · GT 8803 // Fall 2018 PAPER •Learning State Representations for Query Optimization with Deep Reinforcement Learning

GT 8803 // Fall 2018

P A P E R

• Learning State Representations for Query Optimization with Deep Reinforcement Learning– Jennifer Ortiz, Magdalena Balazinska, Johannes

Gehrke, S. Sathiya Keerthi– University of Washington , Microsoft , Criteo

Research

• Key Topics– Deep reinforcement learning– Query optimization

2

Page 3: DATA ANALYTICS USING DEEP LEARNINGjarulraj/courses/8803-f... · GT 8803 // Fall 2018 PAPER •Learning State Representations for Query Optimization with Deep Reinforcement Learning

GT 8803 // Fall 2018

R E L A T E D L I N K S

• Paper - https://arxiv.org/abs/1803.08604

• J Ortiz - https://homes.cs.washington.edu/~jortiz16/• M Balazinska - https://www.cs.washington.edu/people/faculty/magda• J Gehrke - http://www.cs.cornell.edu/johannes/• SS Keerthi - http://www.keerthis.com/

3

Page 4: DATA ANALYTICS USING DEEP LEARNINGjarulraj/courses/8803-f... · GT 8803 // Fall 2018 PAPER •Learning State Representations for Query Optimization with Deep Reinforcement Learning

GT 8803 // Fall 2018

A G E N D A

• Problem Overview• Background• Key Ideas• Technical Details• Experiments• Discussion

4

Page 5: DATA ANALYTICS USING DEEP LEARNINGjarulraj/courses/8803-f... · GT 8803 // Fall 2018 PAPER •Learning State Representations for Query Optimization with Deep Reinforcement Learning

GT 8803 // Fall 2018

T O D A Y ’ s P A P E R

5

Query

ReinforcementLearning

DeepLearning

QueryCardinality

Database

Page 6: DATA ANALYTICS USING DEEP LEARNINGjarulraj/courses/8803-f... · GT 8803 // Fall 2018 PAPER •Learning State Representations for Query Optimization with Deep Reinforcement Learning

GT 8803 // Fall 2018

P r o b l e m O v e r v i e w

• Query Optimization is still a difficult problem

• Deep Reinforcement Learning (DRL) is an evolving approach to solve complex problems.

• Can DRL be used to improve query plan optimization?

6

Page 7: DATA ANALYTICS USING DEEP LEARNINGjarulraj/courses/8803-f... · GT 8803 // Fall 2018 PAPER •Learning State Representations for Query Optimization with Deep Reinforcement Learning

GT 8803 // Fall 2018

P R O B L E M O V E R V I E W

• Contribution #1: Generate a model that determine a subquery’s cardinality

• Contribution #2: Use reinforcement learning as a Markov process to propose a query plan

• Some Challenges: – State isn’t obvious like in some contexts (e.g.

games)– Choosing the reward can be tricky

7

Page 8: DATA ANALYTICS USING DEEP LEARNINGjarulraj/courses/8803-f... · GT 8803 // Fall 2018 PAPER •Learning State Representations for Query Optimization with Deep Reinforcement Learning

GT 8803 // Fall 2018

B a c k g r o u n d : Q u e r y O p t i m i z a t i o n

• Ongoing problem in database systems research

• Current systems still aren’t great - Why???– Plans must be efficient in time and resources -

tradeoffs– Current DBMSs make simplified assumptions

• Avoid multidimensional/complex methods– Result -> Estimation errors and poor query plans

8

Page 9: DATA ANALYTICS USING DEEP LEARNINGjarulraj/courses/8803-f... · GT 8803 // Fall 2018 PAPER •Learning State Representations for Query Optimization with Deep Reinforcement Learning

GT 8803 // Fall 2018

B a c k g r o u n d : Q u e r y O p t i m i z a t i o n

– Join order• When join includes more than

2 relations, join time can vary depending on size of relation

– Subquery optimization• group by, exists operators can

often be simplified, but…• can be computationally

complex to determine– Cardinality estimation

• Hard to map predicates as new data comes in

• Requires stats to be updated

9

EXPRESSLEARNING- DATABASEMANAGEMENTSYSTEMShttps://en.wikipedia.org/wiki/Query_optimization

Page 10: DATA ANALYTICS USING DEEP LEARNINGjarulraj/courses/8803-f... · GT 8803 // Fall 2018 PAPER •Learning State Representations for Query Optimization with Deep Reinforcement Learning

GT 8803 // Fall 2018

B a c k g r o u n d : Q u e r y O p t i m i z a t i o n

• Commonly used approaches– Data sketches– Sampling– Histograms– Heuristics

10

Page 11: DATA ANALYTICS USING DEEP LEARNINGjarulraj/courses/8803-f... · GT 8803 // Fall 2018 PAPER •Learning State Representations for Query Optimization with Deep Reinforcement Learning

GT 8803 // Fall 2018

B A C K G R O U N D : D E E P L E A R N I N G

• What is it?– Maps input x to output y though a series of hidden

layers.– Transforms data into representations

• e.g. images of cats become pixels – Hidden layers apply of series of functions– Errors decrease over time via backpropagation

11

Page 12: DATA ANALYTICS USING DEEP LEARNINGjarulraj/courses/8803-f... · GT 8803 // Fall 2018 PAPER •Learning State Representations for Query Optimization with Deep Reinforcement Learning

GT 8803 // Fall 2018

B A C K G R O U N D : D E E P L E A R N I N G

• What is it good for?– Machine translation– Object detection– Winning games– Much more…

12

Page 13: DATA ANALYTICS USING DEEP LEARNINGjarulraj/courses/8803-f... · GT 8803 // Fall 2018 PAPER •Learning State Representations for Query Optimization with Deep Reinforcement Learning

GT 8803 // Fall 2018

B A C K G R O U N D : D E E P L E A R N I N G

• Why?– Performs well across multiple domains– We have improved, cheaper hardware and large

datasets for training– It’s good at finding patterns that aren’t obvious to

humans (even domain experts) – Libraries

• PyTorch, TensorFlow, Keras

13

Page 14: DATA ANALYTICS USING DEEP LEARNINGjarulraj/courses/8803-f... · GT 8803 // Fall 2018 PAPER •Learning State Representations for Query Optimization with Deep Reinforcement Learning

GT 8803 // Fall 2018

B A C K G R O U N D : R E I N F O R C E M E N T L E A R N I N G

• What is it?– Agents – the learner in the model– States – condition of the

environment– Actions – Inputs from the agent

(based on previous learning or trial/error)– Rewards – Feedback to agent to

reward (or not)

14

http://introtodeeplearning.com/materials/2018_6S191_Lecture5.pdf

Page 15: DATA ANALYTICS USING DEEP LEARNINGjarulraj/courses/8803-f... · GT 8803 // Fall 2018 PAPER •Learning State Representations for Query Optimization with Deep Reinforcement Learning

GT 8803 // Fall 2018

B A C K G R O U N D : R E I N F O R C E M E N T L E A R N I N G

• What is it good for?– Beating Atari games– Training autonomous

vehicles, robots– Optimizing stocks,

gambling, auction bids, etc.

15

Page 16: DATA ANALYTICS USING DEEP LEARNINGjarulraj/courses/8803-f... · GT 8803 // Fall 2018 PAPER •Learning State Representations for Query Optimization with Deep Reinforcement Learning

GT 8803 // Fall 2018

B A C K G R O U N D : R E I N F O R C E M E N T L E A R N I N G

• Why?– May perform better than brute-force deep

learning models– Agents can use trial/error or greedy approaches

to optimize reward– Can be good in complex state spaces because

you don’t have have to provide fully labeled outputs for the model to train on; can just provide more simpler rewards

16

Page 17: DATA ANALYTICS USING DEEP LEARNINGjarulraj/courses/8803-f... · GT 8803 // Fall 2018 PAPER •Learning State Representations for Query Optimization with Deep Reinforcement Learning

GT 8803 // Fall 2018

K E Y I D E A

Can deep reinforcement learning be used to learn query

representations?

17

Page 18: DATA ANALYTICS USING DEEP LEARNINGjarulraj/courses/8803-f... · GT 8803 // Fall 2018 PAPER •Learning State Representations for Query Optimization with Deep Reinforcement Learning

GT 8803 // Fall 2018

S U B Q U E R Y L E A R N I N G V I A D R L

18

ℎ"

#"ℎ"$%

State t+1State t

RepresentationofDatabaseProperties

Queryoperationattimet

SubqueryRepresentation

⋈State t+2

⋈⋈

Queryoperationattimet+1

StateTransitionFunction ""#$

SubqueryRepresentation

Action t Action t+1

%&'%&(&)*+*,-)+**,/&*

%&'%&(&)*+*,-)+**,/&*+1

+0*,-)+**,/&*

Page 19: DATA ANALYTICS USING DEEP LEARNINGjarulraj/courses/8803-f... · GT 8803 // Fall 2018 PAPER •Learning State Representations for Query Optimization with Deep Reinforcement Learning

GT 8803 // Fall 2018

E X A M P L E

19

⋈𝜎

𝜎⋈select*fromcustomersC,ordersOwhereC.col1=O.col1andO.col1<=10

Representation of Database Properties

State t

Query operation at time t

Representation of Subquery

State t+1

Query operation at time t+1

𝜎Representation of

Subquery

State t+2

Page 20: DATA ANALYTICS USING DEEP LEARNINGjarulraj/courses/8803-f... · GT 8803 // Fall 2018 PAPER •Learning State Representations for Query Optimization with Deep Reinforcement Learning

GT 8803 // Fall 2018

A P P R O A C H

• Map query and database to a feature vector

• Two options:– Transform values using deep networks and output

cardinality– Recursive approach taking subquery (ht) and

operation (at) as input

20

(Q,D)

Page 21: DATA ANALYTICS USING DEEP LEARNINGjarulraj/courses/8803-f... · GT 8803 // Fall 2018 PAPER •Learning State Representations for Query Optimization with Deep Reinforcement Learning

GT 8803 // Fall 2018

M O R E O N A P P R O A C H

• Two options:– Transform values using deep networks and output

cardinality• Needs lots of data – very sparse• Recursive approach is selected– Recursive approach taking subquery (ht) and

operation (at) as input• ht is learned by the model• Thus we have NNST model that learns based on NNObserved

and NNInit

21

Page 22: DATA ANALYTICS USING DEEP LEARNINGjarulraj/courses/8803-f... · GT 8803 // Fall 2018 PAPER •Learning State Representations for Query Optimization with Deep Reinforcement Learning

GT 8803 // Fall 2018

H O W T O E N C O D E D A T A

22

Page 23: DATA ANALYTICS USING DEEP LEARNINGjarulraj/courses/8803-f... · GT 8803 // Fall 2018 PAPER •Learning State Representations for Query Optimization with Deep Reinforcement Learning

GT 8803 // Fall 2018

T E C H N I C A L D E T A I L

23

Usecardinality asanobservedvariable

QueryOperation

Priorsubqueryrepresentation

Page 24: DATA ANALYTICS USING DEEP LEARNINGjarulraj/courses/8803-f... · GT 8803 // Fall 2018 PAPER •Learning State Representations for Query Optimization with Deep Reinforcement Learning

GT 8803 // Fall 2018

S T E P S

• NNInit =𝑓 𝑥%, 𝑎%x=databaseproperties(min/maxvalues,#distinctvalues,1Dhistogram)a= singlerelationaloperator(=≠<>≤≥)

• NNST = 𝑓 ℎ1, 𝑎1h=latentrepresentationofmodelitself(asubquery)a=singlerelationaloperation()

• NNObservedMappingfromhiddenstatetoobservedvariablesattime t

24

Page 25: DATA ANALYTICS USING DEEP LEARNINGjarulraj/courses/8803-f... · GT 8803 // Fall 2018 PAPER •Learning State Representations for Query Optimization with Deep Reinforcement Learning

GT 8803 // Fall 2018

E x p e r i m e n t s

• Uses IMDB dataset – 3 GB– Real data (has skew and correlations

between columns)

• TensorFlow (Python) • Baseline estimates against SQL

Server

25

Page 26: DATA ANALYTICS USING DEEP LEARNINGjarulraj/courses/8803-f... · GT 8803 // Fall 2018 PAPER •Learning State Representations for Query Optimization with Deep Reinforcement Learning

GT 8803 // Fall 2018

E x p e r i m e n t # 1

• Train init function with properties from IMDB• 20K queries (15K train/5K test)• Model uses stochastic gradient descent (SGD)• Learning rate of .01• 50 hidden nodes in hidden layer

26

Page 27: DATA ANALYTICS USING DEEP LEARNINGjarulraj/courses/8803-f... · GT 8803 // Fall 2018 PAPER •Learning State Representations for Query Optimization with Deep Reinforcement Learning

GT 8803 // Fall 2018

E x p e r i m e n t # 1

27

• Fewer epochs == greater errors• m=3, 6th epoch similar to SQL Server• > 6th epoch, outperforms SQL Server• Greater cardinality == longer to converge

(outperforms SQL Server by 9th epoch)

Page 28: DATA ANALYTICS USING DEEP LEARNINGjarulraj/courses/8803-f... · GT 8803 // Fall 2018 PAPER •Learning State Representations for Query Optimization with Deep Reinforcement Learning

GT 8803 // Fall 2018

E x p e r i m e n t # 1

28

Page 29: DATA ANALYTICS USING DEEP LEARNINGjarulraj/courses/8803-f... · GT 8803 // Fall 2018 PAPER •Learning State Representations for Query Optimization with Deep Reinforcement Learning

GT 8803 // Fall 2018

E X P E R I M E N T # 2

• Combined models• Select and join operation – Where a is the join ( )

• Hidden state is able to store enough info to predict cardinality

29

Page 30: DATA ANALYTICS USING DEEP LEARNINGjarulraj/courses/8803-f... · GT 8803 // Fall 2018 PAPER •Learning State Representations for Query Optimization with Deep Reinforcement Learning

GT 8803 // Fall 2018

E X P E R I M E N T # 2

30

Page 31: DATA ANALYTICS USING DEEP LEARNINGjarulraj/courses/8803-f... · GT 8803 // Fall 2018 PAPER •Learning State Representations for Query Optimization with Deep Reinforcement Learning

GT 8803 // Fall 2018

N E X T S T E P S

31

Can subquery representations be used to build query plans?

Page 32: DATA ANALYTICS USING DEEP LEARNINGjarulraj/courses/8803-f... · GT 8803 // Fall 2018 PAPER •Learning State Representations for Query Optimization with Deep Reinforcement Learning

GT 8803 // Fall 2018

G O A L

• Given a database D and a query Q, train a model that can learn to predict subquery cardinalities (and the best join)…

32

State t+1

SubqueryRepresentation

⋈ 𝐴Action 1

? ⋈ 𝐵Action 2

⋈ 𝐶Action 3

Page 33: DATA ANALYTICS USING DEEP LEARNINGjarulraj/courses/8803-f... · GT 8803 // Fall 2018 PAPER •Learning State Representations for Query Optimization with Deep Reinforcement Learning

GT 8803 // Fall 2018

A S S U M P T I O N S

• Model-free environment where probabilities between states are unknown

• Each state encodes operations that have already been done

• The model needs a good reward to be successful

• Need to determine optimal policy

33

Page 34: DATA ANALYTICS USING DEEP LEARNINGjarulraj/courses/8803-f... · GT 8803 // Fall 2018 PAPER •Learning State Representations for Query Optimization with Deep Reinforcement Learning

GT 8803 // Fall 2018

E x a m p l e

34

Page 35: DATA ANALYTICS USING DEEP LEARNINGjarulraj/courses/8803-f... · GT 8803 // Fall 2018 PAPER •Learning State Representations for Query Optimization with Deep Reinforcement Learning

GT 8803 // Fall 2018

A P P R O A C H

• For all relations in a database, assume a set of relations with attributes

• Vector at time t, represents equi-join predicates and 1D selection predicates– e.g. if a predicate exists, set value to 1, otherwise 0

35

Page 36: DATA ANALYTICS USING DEEP LEARNINGjarulraj/courses/8803-f... · GT 8803 // Fall 2018 PAPER •Learning State Representations for Query Optimization with Deep Reinforcement Learning

GT 8803 // Fall 2018

H o w t o r e w a r d ?

• Can be given at each state or at the end.• Option 1:– Minimize cost based on existing query estimators

• Option 2:– Use cardinality from learned model– Experimental

36

Page 37: DATA ANALYTICS USING DEEP LEARNINGjarulraj/courses/8803-f... · GT 8803 // Fall 2018 PAPER •Learning State Representations for Query Optimization with Deep Reinforcement Learning

GT 8803 // Fall 2018

Q - L E A R N I N G

• Init with random values• For each state, the next value of Q comes

from:– Current value of Q– Learning rate– Reward– Max value for a reward given a greedy policy– Discount factor

37

Page 38: DATA ANALYTICS USING DEEP LEARNINGjarulraj/courses/8803-f... · GT 8803 // Fall 2018 PAPER •Learning State Representations for Query Optimization with Deep Reinforcement Learning

GT 8803 // Fall 2018

Q - L e a r n i n g

38

https://medium.freecodecamp.org/an-introduction-to-q-learning-reinforcement-learning-14ac0b4493cc

Page 39: DATA ANALYTICS USING DEEP LEARNINGjarulraj/courses/8803-f... · GT 8803 // Fall 2018 PAPER •Learning State Representations for Query Optimization with Deep Reinforcement Learning

GT 8803 // Fall 2018

O P E N P R O B L E M S

• How to choose rewards?• State space is large and Q-learning can be

impractical– Need to approximate solutions

39

Page 40: DATA ANALYTICS USING DEEP LEARNINGjarulraj/courses/8803-f... · GT 8803 // Fall 2018 PAPER •Learning State Representations for Query Optimization with Deep Reinforcement Learning

GT 8803 // Fall 2018

R e l a t e d W o r k

• Eliminate optimizer• Use RL for query processing• Feedback loop on optimizer• Neural networks to estimate cardinality• Neural networks to build fast indexes• DRL to determine join order

40

Page 41: DATA ANALYTICS USING DEEP LEARNINGjarulraj/courses/8803-f... · GT 8803 // Fall 2018 PAPER •Learning State Representations for Query Optimization with Deep Reinforcement Learning

GT 8803 // Fall 2018

S T R E N G T H S

• Deep learning is a more feasible approach than manually written queries

• Unique approach with using recursive model• Deep learning models can approximate and

exceed performance of industry-standard optimizers

41

Page 42: DATA ANALYTICS USING DEEP LEARNINGjarulraj/courses/8803-f... · GT 8803 // Fall 2018 PAPER •Learning State Representations for Query Optimization with Deep Reinforcement Learning

GT 8803 // Fall 2018

W E A K N E S S E S

• Q-Learning is impractical and difficult– Large state space– Reward selection problem

• Evaluating query plans takes time, but so does training iterative models, would be valuable to compare.

42

Page 43: DATA ANALYTICS USING DEEP LEARNINGjarulraj/courses/8803-f... · GT 8803 // Fall 2018 PAPER •Learning State Representations for Query Optimization with Deep Reinforcement Learning

GT 8803 // Fall 2018

M O D E L D E T A I L S ( N O T I N P A P E R )

43

Page 44: DATA ANALYTICS USING DEEP LEARNINGjarulraj/courses/8803-f... · GT 8803 // Fall 2018 PAPER •Learning State Representations for Query Optimization with Deep Reinforcement Learning

GT 8803 // Fall 2018

F u r t h e r D i s c u s s i o n

• Strategies to pick a reward function?• For actions, discuss a value-based recursive

approach vs. a policy gradient approach.• Is there a way to pick the most representative

queries to reduce state space?

44

Page 45: DATA ANALYTICS USING DEEP LEARNINGjarulraj/courses/8803-f... · GT 8803 // Fall 2018 PAPER •Learning State Representations for Query Optimization with Deep Reinforcement Learning

GT 8803 // Fall 2018

r e f e r e n c e s

1. Slides from Jennifer Ortiz, “Deep Learning for Query Plan Resource and Cost Estimation”, Teradata Analytics Universe, 2018.

2. LIMITED, I. E. (2012). EXPRESS LEARNING - DATABASE MANAGEMENT SYSTEMS. S.l.: PEARSON EDUCATION INDIA.

3. MIT 6.S191: Introduction to Deep Learning, http://introtodeeplearning.com/