Performance Tuning in Computer Systems with Machine Learning

Eiko Yoneki

http://www.cl.cam.ac.uk/~ey204

Systems Research Group, University of Cambridge Computer Laboratory

Alan Turing Institute

Slides: …ey204/pubs/talks/2019_12_11_RAIS.pdf

Tuning Computer Systems is Complex

Complex configuration parameter space / increasing # of parameters

Configurations need tuning to optimise resource utilisation

Cluster Workload Management

A poorly tuned system degrades performance when processing massive data

Compiler Optimisation

Complex and High Dimension Parameter Space

Device Allocation for Distributed Training

UBER

Parameter Space of Task Scheduler

Tuning distributed SGD scheduler over TensorFlow

10 heterogeneous machines with ~32 parameters

~10^53 possible valid configurations

Objective function: minimise distributed SGD iteration time

Computer Systems Optimisation

What is performance?

Resource usage (e.g. time, power)

Computational properties (e.g. accuracy, fairness, latency)

How do we improve it?

Manual tuning

Runtime autotuning

Static time autotuning

Manual Tuning: Profiling

Always the first step

Simplest case: Poor man’s profiler

Debugger + Pause

Higher level tools

perf, VTune, gprof…

Distributed profiling: a difficult active research area

No clock synchronisation guarantee

Many resources to consider

System logs can be leveraged

Tune the implementation based on profiling (profiling never captures all interactions)
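As a minimal sketch of the profiling step, the following uses Python's built-in `cProfile` and `pstats` modules. The functions `slow_sum` and `workload` are hypothetical stand-ins for the code being tuned, not part of the talk.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    # Deliberately wasteful: materialises a list before summing it.
    return sum([i * i for i in range(n)])


def workload():
    # Hypothetical workload: repeated calls to slow_sum dominate the run time.
    return [slow_sum(20_000) for _ in range(20)]


def profile_workload():
    profiler = cProfile.Profile()
    profiler.enable()
    workload()
    profiler.disable()

    # Sort by cumulative time and render the top entries as text.
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(5)
    return buffer.getvalue()


report = profile_workload()
```

Printing `report` lists the functions with the largest cumulative time, which is usually where manual tuning should start.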

Static time Autotuning

Especially useful when:

There is a variety of environments (hardware, input distributions)

The parameter space is difficult to explore manually

Defining a parameter space

e.g. PetaBricks: A Language and Compiler for Algorithmic Choice (2009)

BNF-like language for parameter space

Uses an evolutionary algorithm for optimisation

Applied to sorting and matrix multiplication
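The idea of algorithmic choice tuned by an evolutionary algorithm can be sketched in a few lines. This is an illustration only, not PetaBricks' language or its actual search: the parameter space is a single `cutoff` at which merge sort switches to insertion sort, the objective is an operation count rather than measured time, and `CALL_OVERHEAD` is an assumed cost per recursive split.

```python
import random

CALL_OVERHEAD = 4  # assumed fixed cost charged per recursive split


def hybrid_sort(items, cutoff, cost):
    """Merge sort that falls back to insertion sort at or below `cutoff` items.

    `cost` is a one-element list used as a mutable operation counter.
    """
    if len(items) <= max(cutoff, 1):
        result = list(items)
        for i in range(1, len(result)):
            j = i
            while j > 0:
                cost[0] += 1  # one comparison
                if result[j - 1] <= result[j]:
                    break
                result[j - 1], result[j] = result[j], result[j - 1]
                j -= 1
        return result
    cost[0] += CALL_OVERHEAD
    mid = len(items) // 2
    left = hybrid_sort(items[:mid], cutoff, cost)
    right = hybrid_sort(items[mid:], cutoff, cost)
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        cost[0] += 1  # one comparison
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    return merged + left[i:] + right[j:]


def measure(cutoff, data):
    """Objective function: operation count of sorting `data` with this cutoff."""
    cost = [0]
    hybrid_sort(data, cutoff, cost)
    return cost[0]


def evolve_cutoff(data, generations=10, population_size=8, seed=0):
    rng = random.Random(seed)
    population = [rng.randint(1, 64) for _ in range(population_size)]
    for _ in range(generations):
        ranked = sorted(population, key=lambda c: measure(c, data))
        parents = ranked[: population_size // 2]  # selection: keep the best half
        children = [max(1, p + rng.randint(-4, 4)) for p in parents]  # mutation
        population = parents + children
    return min(population, key=lambda c: measure(c, data))


data_rng = random.Random(1)
DATA = [data_rng.randint(0, 10_000) for _ in range(300)]
best_cutoff = evolve_cutoff(DATA)
```

Because the best half of each generation is carried over unchanged, the best cutoff found never gets worse from one generation to the next.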

Auto-tuning systems

Properties:

Many dimensions (30+)

Expensive objective function

Understanding of the underlying behaviour

[Diagram labels: Hardware, System, Application, Input data, Flags]

Auto-tuning Complex Systems

Many dimensions

Expensive objective function

Hand-crafted solutions impractical (e.g. extensive offline analysis)

Blackbox optimisation can surpass human expert-level tuning:

Grid search θ ∈ [1, 2, 3, …]

Evolutionary approaches (e.g. )

Hill-climbing (e.g. )

Bayesian optimisation (e.g. )

Moving down this list, the methods go from 1000s of evaluations of the objective function to fewer samples with more expensive computation per sample
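To make the difference in evaluation counts concrete, here is a small sketch comparing grid search with hill-climbing. The `objective` is a hypothetical convex stand-in for an expensive benchmark, and the parameter names `threads` and `batch` are illustrative only; on a non-convex objective hill-climbing can stop in a local optimum.

```python
import itertools

THREADS = list(range(1, 33))      # hypothetical parameter: 1..32
BATCHES = list(range(8, 65, 8))   # hypothetical parameter: 8, 16, ..., 64


def objective(threads, batch):
    # Hypothetical stand-in for an expensive benchmark: lower is better.
    return (threads - 12) ** 2 + 0.5 * (batch - 40) ** 2 + 3.0


def grid_search():
    """Evaluate every configuration and keep the best one."""
    evaluations = 0
    best = None
    for config in itertools.product(THREADS, BATCHES):
        evaluations += 1
        value = objective(*config)
        if best is None or value < best[0]:
            best = (value, config)
    return best[1], evaluations


def hill_climb(start=(0, 0)):
    """Move to the first improving neighbour until no neighbour improves."""
    i, j = start  # indices into THREADS and BATCHES
    current = objective(THREADS[i], BATCHES[j])
    evaluations = 1
    improved = True
    while improved:
        improved = False
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ni, nj = i + di, j + dj
            if 0 <= ni < len(THREADS) and 0 <= nj < len(BATCHES):
                evaluations += 1
                value = objective(THREADS[ni], BATCHES[nj])
                if value < current:
                    i, j, current, improved = ni, nj, value, True
                    break
    return (THREADS[i], BATCHES[j]), evaluations


grid_best, grid_evals = grid_search()
hill_best, hill_evals = hill_climb()
```

On this toy problem both methods reach the same configuration, but hill-climbing needs far fewer evaluations than the exhaustive grid. That matters when each evaluation is a long benchmark run.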

Deep Learning, Machine Learning, and AI…

Deep learning: e.g. CNN, LSTM

Machine learning: e.g. Logistic regression, Neural Networks, Bayesian, Reinforcement Learning..

Machine learning: a set of methods for creating models that describe or predict something about the world. It does so by learning those models from data.

Bayesian optimisation

[Figure sequence: step-by-step illustration of Bayesian optimisation; axes: Domain, Objective]

Iteratively build a probabilistic model of objective function

① Find promising point (parameter values with high performance value in the model)

② Evaluate the objective function at that point

③ Update the model to reflect this new measurement

Pros:

✓ Data efficient: converges in few iterations

✓ Able to deal with noisy observations

Cons:

✗ In many dimensions, model does not converge to the objective function
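The three-step loop can be sketched with a small Gaussian process written in plain Python. Everything here is an assumption made for illustration: the RBF kernel with unit length scale, the lower-confidence-bound acquisition with `kappa = 2`, the one-dimensional candidate grid, and the quadratic `objective` standing in for an expensive measurement to be minimised. It is not the model used in the talk.

```python
import math


def rbf(a, b, length_scale=1.0):
    """Squared-exponential kernel: nearby inputs get similar predictions."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def cholesky(matrix):
    """Lower-triangular factor L with L L^T = matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower


def solve_lower(lower, rhs):
    """Forward substitution for L y = rhs."""
    y = []
    for i, row in enumerate(lower):
        y.append((rhs[i] - sum(row[k] * y[k] for k in range(i))) / row[i])
    return y


def solve_upper(lower, rhs):
    """Back substitution for L^T x = rhs."""
    n = len(lower)
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(lower[k][i] * x[k] for k in range(i + 1, n))
        x[i] = (rhs[i] - s) / lower[i][i]
    return x


def fit(xs, ys, noise=1e-4):
    """Condition a GP on the observations (ys are centred on their mean)."""
    mean_y = sum(ys) / len(ys)
    gram = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
            for i, a in enumerate(xs)]
    lower = cholesky(gram)
    alpha = solve_upper(lower, solve_lower(lower, [y - mean_y for y in ys]))
    return lower, alpha, mean_y


def predict(model, xs, x_new):
    """Posterior mean and standard deviation at x_new."""
    lower, alpha, mean_y = model
    k_star = [rbf(x_new, a) for a in xs]
    mean = mean_y + sum(k * a for k, a in zip(k_star, alpha))
    v = solve_lower(lower, k_star)
    variance = max(1.0 - sum(t * t for t in v), 0.0)
    return mean, math.sqrt(variance)


def objective(x):
    # Hypothetical stand-in for an expensive measurement (lower is better).
    return 4.3 + 0.4 * (x - 3.7) ** 2


def bayesian_optimise(candidates, n_iterations=12, kappa=2.0):
    xs = [candidates[0], candidates[-1]]  # two initial measurements
    ys = [objective(x) for x in xs]
    for _ in range(n_iterations):
        model = fit(xs, ys)

        def lower_confidence_bound(x):
            mean, std = predict(model, xs, x)
            return mean - kappa * std

        remaining = [c for c in candidates if c not in xs]
        x_next = min(remaining, key=lower_confidence_bound)  # 1. promising point
        y_next = objective(x_next)                           # 2. evaluate
        xs.append(x_next)                                    # 3. update the data
        ys.append(y_next)
    best = min(range(len(xs)), key=lambda i: ys[i])
    return xs[best], ys[best], len(xs)


CANDIDATES = [i / 10 for i in range(61)]
best_x, best_y, n_evaluations = bayesian_optimise(CANDIDATES)
```

The acquisition trades off a low predicted mean against high uncertainty, so the loop spends its small budget of evaluations (14 here) on points that are either promising or unexplored.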

Structured Bayesian Optimisation

Probabilistic model in Probabilistic Programming: User-given probabilistic model of parameter space

Extend current Probabilistic C++ with various inference algorithms, multi objectives and other language support (e.g. Python)

Probabilistic Model

Probabilistic models incorporate random variables and probability distributions into the model

Deterministic model gives a single possible outcome

Probabilistic model gives a probability distribution

Used for various probabilistic logic inference (e.g. MCMC-based inference, Bayesian inference…)

Python based PP:

Pyro: https://pyro.ai/examples

Edward: http://edwardlib.org
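The contrast between a deterministic and a probabilistic model can be shown without any probabilistic programming library. The sketch below is an assumption-laden toy: the latency formula, the Gaussian uncertainty on the slope, and the noise level are all invented for illustration and use only Python's standard library, not Pyro or Edward.

```python
import random
import statistics


def deterministic_latency(requests_per_s):
    # Deterministic model: one input gives exactly one predicted outcome.
    return 2.0 + 0.01 * requests_per_s


def probabilistic_latency(requests_per_s, n_samples=1000, seed=0):
    # Probabilistic model: the slope and the measurement noise are random
    # variables, so the prediction is a distribution, represented by samples.
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        slope = rng.gauss(0.01, 0.002)
        noise = rng.gauss(0.0, 0.1)
        samples.append(2.0 + slope * requests_per_s + noise)
    return samples


point_estimate = deterministic_latency(100)
samples = probabilistic_latency(100)
mean_latency = statistics.mean(samples)
spread = statistics.stdev(samples)
```

The spread of the samples is what a tuner can use to decide where a further measurement would be most informative.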

Performance Improvement from Structure

1. User-given probabilistic model structured in semi-parametric model using Directed Acyclic Graph

2. Sub-Optimisation in numerical optimisation

Exploit structure to split problem into smaller optimisations

(enables nested optimisation)

Use decomposition mechanisms

Semi-parametric Model

Easy to use and well suited to SBO

Understand general trend of Objective function

High precision in region of Optimum for finding highest performance

Too restrictive

Too generic

Just right

Example:

Cassandra's garbage collection

Minimise 99th percentile latency of Cassandra

Cassandra

JVM

Garbage collection flags:

● Young generation size

● Survivor ratio

● Max tenuring threshold

Define DAG Model

Tune JVM parameters of a database (Cassandra) to minimise latency

Define a directed acyclic graph (DAG) of models

[Diagram: GC Flags → GC Rate Model → GC rate; GC Flags → GC Average Duration Model → average GC duration; GC rate and average GC duration → Latency Model → 99th percentile latency]

DAG model in BOAT

    struct CassandraModel : public DAGModel<CassandraModel> {
      void model(int ygs, int sr, int mtt) {
        // Calculate the size of the heap regions
        double es = ygs * sr / (sr + 2.0);  // Eden space's size
        double ss = ygs / (sr + 2.0);       // Survivor space's size

        // Define the dataflow between semi-parametric models
        double rate = output("rate", rate_model, es);
        double duration = output("duration", duration_model,
                                 es, ss, mtt);
        double latency = output("latency", latency_model,
                                rate, duration, es, ss, mtt);
      }

      ProbEngine<GCRateModel> rate_model;
      ProbEngine<GCDurationModel> duration_model;
      ProbEngine<LatencyModel> latency_model;
    };
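The heap-region arithmetic in the model above can be checked on its own. This sketch repeats the two formulas in Python; the function name `heap_regions` and the example values (a 1024 MB young generation, survivor ratio 8) are assumptions chosen for illustration. The `+ 2.0` reflects that the young generation holds one Eden space and two survivor spaces, with the survivor ratio giving Eden size relative to one survivor space.

```python
def heap_regions(young_gen_size, survivor_ratio):
    """Split the young generation into Eden and one survivor space."""
    eden = young_gen_size * survivor_ratio / (survivor_ratio + 2.0)
    survivor = young_gen_size / (survivor_ratio + 2.0)
    return eden, survivor


# Hypothetical configuration: 1024 MB young generation, survivor ratio 8.
eden_mb, survivor_mb = heap_regions(1024, 8)
```

With these values Eden takes eight tenths of the young generation and each survivor space one tenth, so Eden plus two survivor spaces adds back up to the full 1024 MB.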

GC Rate Semi-parametric model

Evaluation: Garbage collection

Evaluation: Neural networks (SGD) scheduling

Load balancing, worker allocation over 10 machines = 30 parameters

Use TensorFlow

[Diagram labels: Machine models (tm1, tm2, tm3, tm4), Communication model, max, Predicted time]

Evaluation: Neural networks scheduling

Default configuration: 9.82s

OpenTuner: 8.71s

BOAT: 4.31s

Existing systems don’t converge!
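The three times reported above translate into speedups over the default configuration as follows. The dictionary layout and variable names are illustrative; only the three numbers come from the slide.

```python
# Times in seconds, as reported on the slide.
times = {"Default configuration": 9.82, "OpenTuner": 8.71, "BOAT": 4.31}

baseline = times["Default configuration"]
speedups = {name: baseline / t for name, t in times.items()}
```

This gives roughly 1.13x for OpenTuner and roughly 2.28x for BOAT relative to the default configuration.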

Case Studies

Task Scheduling in Cluster Computing

JVM Garbage Collector

Neural Network Hyper-parameter tuning

LLVM Compiler

ASIC/SoC Design

Limitations of Bayesian Optimisation

Not efficient for modelling dynamic and/or combinatorial problems

LLVM Compiler pass list optimisation (BaysOpt vs Random Search)

[Plot: Run Time (s) vs Iteration]

Computer Systems Optimisation Models

Long-term planning: requires a model of how actions affect future states. Only a few system optimisations fall into this category, e.g. network routing optimisation.

Short-term dynamic control: major system components are under dynamic load, such as resource allocation and stream processing, where the future load is not statistically dependent on the current load. BaysOpt is sufficient to optimise distinct workloads. For dynamic workloads, Reinforcement Learning would perform better.

Combinatorial optimisation: a set of options to be selected from a larger set under potential rules of combination. There is no straightforward similarity between different combinations. Many problems in device assignment, indexing and compiler optimisation fall into this category. BaysOpt cannot be easily applied. The options are learning online via random sampling if the task is cheap, RL + pre-training if the task is expensive, or massively parallel online training if the resources are available.

Many systems problems are combinatorial in nature
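A compiler pass list is a convenient example of a combinatorial space. The sketch below counts the ordered pass lists that can be built from a handful of passes and runs the random-sampling baseline over them. The pass names are used only as labels, and `run_time` is an invented cost model in which some orderings help; neither reflects measurements of a real compiler.

```python
import math
import random

PASSES = ["inline", "dce", "gvn", "licm", "unroll", "sroa"]  # labels only


def count_pass_lists(n_passes, max_length):
    """Number of ordered lists of distinct passes with length 1..max_length."""
    return sum(math.perm(n_passes, k) for k in range(1, max_length + 1))


def comes_before(pass_list, first, second):
    return (first in pass_list and second in pass_list
            and pass_list.index(first) < pass_list.index(second))


def run_time(pass_list):
    # Hypothetical cost model: ordering matters, and each pass adds a small cost.
    t = 10.0 + 0.1 * len(pass_list)
    if "inline" in pass_list:
        t -= 1.0
    if comes_before(pass_list, "inline", "dce"):
        t -= 2.0
    if comes_before(pass_list, "licm", "unroll"):
        t -= 1.5
    return t


def random_search(n_samples=200, seed=0):
    """Sample random pass lists and keep the fastest one seen."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_samples):
        length = rng.randint(1, len(PASSES))
        candidate = rng.sample(PASSES, length)
        t = run_time(candidate)
        if best is None or t < best[0]:
            best = (t, candidate)
    return best


space_size = count_pass_lists(len(PASSES), len(PASSES))
best_time, best_list = random_search()
```

Even six passes give 1956 ordered lists, and two lists that differ by one swap can have very different cost, which is why a smooth surrogate model over this space is hard to build.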

Deep Reinforcement Learning for Optimisation

Deep RL provides attractive framework for differentiable control

Blackbox optimisation for dynamic/combinatorial problems

Trained model can continuously make decisions on new instances

Problems:

Difficult task: make right decision in large discrete action spaces

Exploration in production system is unstable/unpredictable

Simulations can oversimplify problem and are expensive to build

Long online training to build a model…

Many deep learning tools, no standard library for modern RL (~2014-2018)

Some standard flavours emerge but mostly tightly coupled logic/execution

e.g. TensorForce/RLgraph: 20-30K downloads

A brief history of Deep RL software

1. Gen (2014-16): Loose research scripts (e.g. DQN), high expertise required, only specific simulators

2. Gen (2016-17): OpenAI gym gives unified task interface, reference implementations (e.g. OpenAI baselines)

3. Gen (2017-18): Generic declarative APIs, distributed abstractions (Ray RLlib), some standard flavours emerge

Problems: Tightly coupled execution/logic, testing, reuse,..

Problem: Controlling dynamic behaviour

Reinforcement Learning

Agent interacts with dynamic environment

Goal: Maximise expected rewards over agent’s lifetime

Notion of Planning/Control, not single static configuration

What makes RL different from other ML paradigms?

There is no supervisor, only a reward signal

Feedback is delayed, not instantaneous

Time really matters (sequential)

Agent’s actions affect the subsequent data it receives

The paradigm most similar to the human brain’s behaviour…
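These properties show up even in a toy setting. The sketch below runs tabular Q-learning on a hypothetical five-state chain where the only reward arrives on reaching the last state, so feedback is delayed and there is no supervisor. The environment, the purely random behaviour policy during training, and all hyper-parameters are assumptions for illustration; Q-learning is off-policy, so it can still learn the greedy policy from random exploration.

```python
import random

N_STATES = 5        # positions 0..4; reaching state 4 ends the episode
ACTIONS = (-1, +1)  # move left or right
GOAL = N_STATES - 1


def step(state, action):
    """Environment dynamics: the reward is only given on reaching the goal."""
    next_state = min(max(state + action, 0), GOAL)
    done = next_state == GOAL
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def train(episodes=500, alpha=0.5, gamma=0.9, max_steps=200, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action index]
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            a = rng.randrange(len(ACTIONS))  # explore uniformly at random
            next_state, reward, done = step(state, ACTIONS[a])
            # The agent's action decides which state (data) it sees next.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
            if done:
                break
    return q


q_table = train()
# Greedy policy for the non-terminal states: the action with the highest value.
policy = [ACTIONS[max((0, 1), key=lambda a: q_table[s][a])] for s in range(GOAL)]
```

The learned policy moves right in every state, even though only the final move is ever rewarded directly: the discounted reward propagates backwards through the value table.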

Where are the applications?

RL Workloads

Unlike supervised learning, not a single dominant execution pattern

Distributed workloads: Hierarchies of sync/async data exchange

Algorithms highly sensitive to hyper-parameters

From large scale parallel training (e.g. AlphaGo) to single core

RL in Computer Systems: Practical Considerations

Action spaces do not scale:

Systems problems often combinatorial

Exploration in production system not a good idea

Unstable, unpredictable

Simulations can oversimplify problem

Expensive to build, not justified versus gain

Online steps take too long

Deep Reinforcement Learning for Optimisation

New programming model: Separation of logical dataflow from execution

(no standardised interface)

Automated graph generation/transformation

RLgraph: Modular Dataflow Composition

RLgraph: Separate Local and Distributed Execution

High performance computation graphs for RL with different distributed backends

Evaluation: Distributed training

Evaluation: Distributed TensorFlow (DM 3D task)

Performance (Atari Pong) – APEX DQN based

Left: Distributed sample performance Right: Time to solve Pong (Score ~21)

LIFT: Learning from Traces

Idea:

Task may be hard to scale, human can give examples

Ground model with demonstrations

Difficulty: Combining imperfect examples and experience

Results (IMDB data set)

Query latencies: mean (left) 99th percentile (right)

Learning from demonstrations and pre-training reduces online training time

Optimising DNN Computation with Graph Substitutions

TASO (SOSP, 2019): Performance improvement by transformation of computation graphs

In progress: use of Reinforcement Learning

Case Studies

Packet Classification with RL

Match a network packet to a rule from a set of rules

Objective: minimise the classification time and memory footprint

Deep RL solution to build decision trees

DB compound indexing

Stream Processing

Cluster Scheduling

Traffic Signal Control
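For the packet classification case study, the task itself (matching a packet to the first applicable rule) can be sketched as a linear scan. This is the naive baseline whose classification time grows with the number of rules, not the learned decision tree. The `Rule` fields, the rule names, and the packet dictionary layout are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str
    src_ports: range
    dst_ports: range
    protocol: str  # "tcp", "udp", or "*" for any protocol


def classify(packet, rules):
    """Return the name of the first (highest-priority) matching rule, or None."""
    for rule in rules:
        if (packet["src_port"] in rule.src_ports
                and packet["dst_port"] in rule.dst_ports
                and rule.protocol in ("*", packet["protocol"])):
            return rule.name
    return None


# Hypothetical rule set, ordered from highest to lowest priority.
RULES = [
    Rule("allow-dns", range(0, 65536), range(53, 54), "udp"),
    Rule("allow-web", range(1024, 65536), range(443, 444), "tcp"),
    Rule("default-deny", range(0, 65536), range(0, 65536), "*"),
]

web_match = classify({"src_port": 40000, "dst_port": 443, "protocol": "tcp"}, RULES)
ssh_match = classify({"src_port": 40000, "dst_port": 22, "protocol": "tcp"}, RULES)
```

A decision tree over the packet fields avoids scanning every rule, at the price of extra memory. That is the time versus memory trade-off named in the objective above.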

PARK: RL Opensource Platform

AutoML: Neural Architecture Search

Current: ML expertise + Data + Computation

AutoML aims to turn this into: Data + 100 x Computation

Use of Reinforcement Learning, Evolutionary Algorithms

..and tune network model?

Graph transformation

Compression

+ Hyper parameter tuning

Tuning Complex Computer Systems

BOAT: Building Auto-Tuners with Structured Bayesian Optimization, WWW 2017. (Morning Paper, 2017.5.18) https://github.com/VDalibard/BOAT

RLgraph: Modular Computation Graphs for Deep Reinforcement Learning. SysML 2019. (https://arxiv.org/abs/1810.09028) RLgraph https://github.com/rlgraph/rlgraph

LIFT: Reinforcement Learning in Computer Systems by Learning From Demonstrations. (https://arxiv.org/abs/1808.07903)

Wield: Systematic Reinforcement Learning with Progressive Randomization. 2019. (https://arxiv.org/abs/1909.06844)