Introduction to Deep Learning with Tensorflow
Yunhao (Robin) Tang, Department of IEOR, Columbia University
February 25, 2019
IEOR 8100: Reinforcement Learning 1/36
Summary
What we will cover...
Simple Example: Linear Regression with Tensorflow
Intro to Deep Learning
Deep Learning with Tensorflow
Beyond
If time permits, we will also cover
OpenAI Gym
Implement basic DQN
Info about Tensorflow
TF is an open source project officially supported by Google.
Programming interfaces: Python, C++, C, ... Python is the most developed and best documented.
Latest version: 1.12 (same time last year: 1.5). Commonly used version: ≈1.0 (depending on other dependencies).
Can be installed via pip. Best supported on macOS and Linux.
[Figure: TensorFlow paper]
Simple Example: Linear Regression
Linear Regression: N data points {(x_i, y_i)}_{i=1}^N, x ∈ R^k, y ∈ R.
Specify Architecture: consider a linear function for regression with parameters slope θ ∈ R^k and bias θ_0 ∈ R. The prediction is ŷ_i = θ^T x_i + θ_0.
Define Loss: the loss function to minimize is J = (1/N) ∑_{i=1}^N (ŷ_i − y_i)^2.
Gradient Descent: θ ← θ − α∇_θ J, θ_0 ← θ_0 − α∇_{θ_0} J.
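The three steps above can be sketched in plain Python (TensorFlow would compute the gradients automatically; here they are written out by hand). The true slope 2, bias 1, noise scale, and learning rate α = 0.1 are illustrative choices, not from the slides:

```python
import random

random.seed(0)

# Synthetic 1-D data: y = 2x + 1 + noise (true parameters are assumptions)
N = 200
xs = [random.uniform(-1, 1) for _ in range(N)]
ys = [2.0 * x + 1.0 + random.gauss(0, 0.1) for x in xs]

theta, theta0 = 0.0, 0.0   # parameters: slope and bias
alpha = 0.1                # learning rate

for step in range(500):
    # Predictions: y_hat_i = theta * x_i + theta0
    preds = [theta * x + theta0 for x in xs]
    # Gradients of J = (1/N) sum (y_hat_i - y_i)^2:
    #   dJ/dtheta  = (2/N) sum (y_hat_i - y_i) * x_i
    #   dJ/dtheta0 = (2/N) sum (y_hat_i - y_i)
    g_theta  = 2.0 / N * sum((p - y) * x for p, y, x in zip(preds, ys, xs))
    g_theta0 = 2.0 / N * sum(p - y for p, y in zip(preds, ys))
    theta  -= alpha * g_theta
    theta0 -= alpha * g_theta0
```

After 500 steps the parameters should sit close to the true slope and bias.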
Linear Regression: Data
[Figures: true parameters; generated data]
Linear Regression with Tensorflow
Let us just focus on defining architecture...
[Figure: linear regression TF code]
Linear Regression with Tensorflow
We launch the training...
[Figures: training loss; fitted line]
Intro to DL
Consider data sets with a complex relation {(x_i, y_i)}_{i=1}^N, where linear regression will not work.
MLP (Multi-Layer Perceptron): stack multiple layers of linear transformations and nonlinear functions.
Input x ∈ R^6; the first layer applies a linear transformation W_1 x + b_1 ∈ R^4, then a nonlinear function: h_1 = σ(W_1 x + b_1) ∈ R^4. The nonlinear function is applied elementwise.
The second layer has W_2, b_2 and transforms h_1 as h_2 = σ(W_2 h_1 + b_2) ∈ R^3.
The final output is y = W_3 σ(W_2 σ(W_1 x + b_1) + b_2) + b_3 ∈ R.
[Figures: MLP architecture; nonlinear activation functions]
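A minimal forward pass matching these shapes (6 → 4 → 3 → 1), in plain Python with tanh as an example elementwise σ; the random weight values and the input x are placeholders, not from the slides:

```python
import math
import random

random.seed(0)

def matvec(W, x, b):
    # Linear transformation: W x + b
    return [sum(wij * xj for wij, xj in zip(row, x)) + bi
            for row, bi in zip(W, b)]

def sigma(v):
    # Elementwise nonlinearity (tanh chosen as an example activation)
    return [math.tanh(vi) for vi in v]

def rand_layer(n_out, n_in):
    # Random weights, zero biases (placeholder initialization)
    W = [[random.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    return W, [0.0] * n_out

W1, b1 = rand_layer(4, 6)   # first layer:  R^6 -> R^4
W2, b2 = rand_layer(3, 4)   # second layer: R^4 -> R^3
W3, b3 = rand_layer(1, 3)   # output layer: R^3 -> R

x = [0.5] * 6                      # input x in R^6 (placeholder)
h1 = sigma(matvec(W1, x, b1))      # h1 = sigma(W1 x + b1) in R^4
h2 = sigma(matvec(W2, h1, b2))     # h2 = sigma(W2 h1 + b2) in R^3
y = matvec(W3, h2, b3)[0]          # y = W3 h2 + b3, a scalar
```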
Intro to DL
Architecture: defines a complex input-to-output relation with stacked layers of linear transformations and nonlinear functions (activations).
Architectures are adapted to the data structure at hand; prior knowledge enters the modeling through architecture design: MLP, CNN (images), RNN (sequence data: audio, language, ...).
Parameters: weights W_1, W_2, W_3 and biases b_1, b_2, b_3 (cf. linear regression: slope θ and bias θ_0).
DL algorithm
Specify Architecture: how many layers, and how many hidden units per layer? ŷ_i = f_θ(x_i) with generic parameters θ (weights and biases).
Define Loss: the loss function to minimize is J = (1/N) ∑_{i=1}^N (ŷ_i − y_i)^2.
Gradient Descent: θ ← θ − α∇_θ J.
Algorithm 1 Generic DL regression
1: Input: model architecture, data {(x_i, y_i)}, learning rate α
2: Initialize: model parameters θ = {W_i, b_i}
3: for t = 1, 2, ..., T do
4:     Compute predictions ŷ_i = f_θ(x_i)
5:     Compute loss J = (1/N) ∑_i (ŷ_i − y_i)^2
6:     Gradient update θ ← θ − α∇_θ J
7: end for
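This generic loop can be sketched in plain Python. Here ∇_θ J is approximated by finite differences, standing in for the automatic differentiation TensorFlow performs; the tiny linear model f_θ, the noiseless data y = 3x − 0.5, and the learning rate are placeholder choices:

```python
def f(theta, x):
    # Placeholder "architecture": f_theta(x) = theta[0] * x + theta[1]
    return theta[0] * x + theta[1]

def loss(theta, data):
    # J = (1/N) sum (f_theta(x_i) - y_i)^2
    return sum((f(theta, x) - y) ** 2 for x, y in data) / len(data)

def num_grad(theta, data, eps=1e-6):
    # Central finite differences, standing in for autodiff
    g = []
    for i in range(len(theta)):
        tp = theta[:]; tp[i] += eps
        tm = theta[:]; tm[i] -= eps
        g.append((loss(tp, data) - loss(tm, data)) / (2 * eps))
    return g

# Noiseless toy data: y = 3x - 0.5 on a grid over [-1, 1]
data = [(k / 10.0, 3.0 * (k / 10.0) - 0.5) for k in range(-10, 11)]

theta = [0.0, 0.0]   # initialize parameters
alpha = 0.2          # learning rate
for t in range(1000):
    g = num_grad(theta, data)                          # compute gradient
    theta = [p - alpha * gi for p, gi in zip(theta, g)]  # gradient update
```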
DL Regression: Data
Generate data using y = x^3 + x^2 + sin(10x) + noise. Linear regression will fail here.
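The data generation in plain Python (the slide's code figure is not reproduced; the input range [−1, 1], sample size, and noise scale 0.1 are assumptions):

```python
import math
import random

random.seed(0)

N = 500
xs = [random.uniform(-1.0, 1.0) for _ in range(N)]
# y = x^3 + x^2 + sin(10x) + Gaussian noise
ys = [x ** 3 + x ** 2 + math.sin(10 * x) + random.gauss(0, 0.1) for x in xs]
```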
[Figures: data-generation code; data plot]
DL Regression with Tensorflow
Specify the Architecture and Loss using Tensorflow syntax, just like linear regression.
[Figure: TF code for the DL regression model]
Train the model just like linear regression.
DL Regression with Tensorflow
We launch the training...
[Figures: training loss; fitted curve]
Batch training
When data set is large, can only compute gradients on mini-batches.
Sample a batch of data → Compute gradient on the batch → Update parameters.
Batch size: too small: high variance in data stream, unstable training; too large:too costly to compute gradients.
IEOR 8100: Reinforcement Learning 14/36
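The sample → gradient → update loop above can be sketched in a few lines of numpy for linear regression with an MSE loss (the learning rate and batch size below are hypothetical choices):

```python
import numpy as np

rng = np.random.default_rng(0)

# A synthetic data set large enough that mini-batches make sense.
N = 10_000
X = rng.normal(size=(N, 3))
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=N)

w = np.zeros(3)
lr, batch_size = 0.1, 64

for step in range(500):
    # Sample a batch of data ...
    idx = rng.integers(0, N, size=batch_size)
    xb, yb = X[idx], y[idx]
    # ... compute the gradient of the MSE loss on the batch ...
    grad = 2.0 * xb.T @ (xb @ w - yb) / batch_size
    # ... and update the parameters.
    w -= lr * grad
```

With a reasonable batch size the noisy batch gradients still drive `w` toward `true_w`; shrinking the batch makes each step cheaper but noisier, which is exactly the trade-off on the slide.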
Auto-differentiation

Taking gradients in a linear model is easy.

Question: how do we take gradients for an MLP? ∇W1 J is not as straightforward as in linear regression:

J = (1/N) ∑i (ŷi − yi)²,   ŷi = W3 σ(W2 σ(W1 xi + b1) + b2) + b3
Auto-differentiation

Tensorflow provides automatic differentiation (autodiff), i.e. automatic gradient computation. Users specify the forward computation y = fθ(x), and the program internally builds a way to compute ∇θ y. This is neither symbolic computation nor finite differences.

Example: start with a, b and compute e via c = a + b, d = b + 1, e = c · d. Consider

∂e/∂a = (∂e/∂d)(∂d/∂a) + (∂e/∂c)(∂c/∂a)

(here ∂d/∂a = 0 and ∂c/∂a = 1, so ∂e/∂a = ∂e/∂c = d).

(Figure: computation graph)
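The chain-rule bookkeeping above is exactly what reverse-mode autodiff automates. As a hand-unrolled sketch of the slide's example (plain Python, with concrete values a = 2, b = 1 chosen for illustration):

```python
# Forward pass: record the value at every node of the graph.
a, b = 2.0, 1.0
c = a + b          # c = a + b
d = b + 1.0        # d = b + 1
e = c * d          # e = c * d

# Backward pass: propagate ∂e/∂(node) from the output back.
de_de = 1.0
de_dc = de_de * d                   # e = c*d  =>  ∂e/∂c = d
de_dd = de_de * c                   # e = c*d  =>  ∂e/∂d = c
de_da = de_dc * 1.0                 # c = a+b  =>  ∂c/∂a = 1
de_db = de_dc * 1.0 + de_dd * 1.0   # b feeds both c and d, so gradients add
```

Each backward line only uses a local derivative of one op; long-range gradients like ∂e/∂b fall out of summing over paths through the graph.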
DL with Tensorflow

High-level idea: the computation graph. The forward computation specifies local operations that connect locally related variables, so local gradients can be computed.

forward computation → local forward ops → local gradients → long-range gradients.

This is just like errors back-propagating from the output to the inputs and parameters: back-propagation.

Useful link: http://deeplearning.stanford.edu/wiki/index.php/UFLDL_Tutorial
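To see that chaining local gradients really reproduces the long-range derivative, we can back-propagate through a one-hidden-layer network by hand and compare against a finite-difference estimate (a numpy sketch, not TensorFlow's internals; sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(3, 4))
W2 = rng.normal(size=(4, 1))
x = rng.normal(size=(1, 3))
y = np.array([[1.0]])

def loss(W1):
    h = np.tanh(x @ W1)
    return float(np.sum((h @ W2 - y) ** 2))

# Back-propagation: chain local gradients from the loss backward.
h = np.tanh(x @ W1)
diff = h @ W2 - y                              # dL/d(output) = 2*diff
dW1 = x.T @ ((2.0 * diff @ W2.T) * (1.0 - h ** 2))

# Finite-difference check on a single entry of W1.
eps = 1e-6
E = np.zeros_like(W1)
E[0, 0] = eps
fd = (loss(W1 + E) - loss(W1 - E)) / (2.0 * eps)
```

The slide's point is that Tensorflow does this chaining for you for arbitrary graphs; the finite-difference value here is only a sanity check, not how autodiff works.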
DL with Tensorflow

What we have covered from the code...

Placeholder: tf objects that specify users' inputs into the graph; literally, placeholders for data.

Variable: tf objects that represent parameters that can be updated through autodiff; embedded inside the graph.

All expressions computed from primitive variables and placeholders are vertices in the graph.
DL with Tensorflow

What we have not yet covered from the code: how do we launch the training, and how do we use the gradients computed by Tensorflow to update parameters?

TF expression: python objects that define how to update the graph (the variables in the graph) in a single step.

Optimizer: python objects that specify gradient updates of variables.

Session: python objects that launch the training.

And many more...
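As a rough picture of the Optimizer's role (a plain-Python sketch of the job tf.train optimizers do, not TensorFlow's actual implementation): it owns an update rule and applies it to each variable given that variable's gradient.

```python
import numpy as np

class SGDOptimizer:
    """Sketch of the role an optimizer object plays: given
    (gradient, variable) pairs, apply one gradient-descent step."""
    def __init__(self, learning_rate):
        self.lr = learning_rate

    def apply_gradients(self, grads_and_vars):
        for g, v in grads_and_vars:
            v -= self.lr * g   # in-place update, like assigning to a tf.Variable

# One step on J(w) = ||w||^2, whose gradient is 2w.
w = np.array([1.0, -2.0])
opt = SGDOptimizer(learning_rate=0.25)
opt.apply_gradients([(2.0 * w, w)])
```

In Tensorflow the optimizer additionally builds the update as a graph op (e.g. via `minimize(loss)`), so that running that op inside a session performs the step.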
Launch the training

Define the loss and the optimizer, which specify how to update model parameters during training.

Define a session to launch the training.

Initialize variables.

Feed data into the computation graph.

(Figure: launch the training)
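The four steps above fit in one short script. A minimal sketch in the TF 1.x-style API the slides use (accessed via the `tf.compat.v1` shim on TF 2; the data, learning rate, and step count are illustrative):

```python
import numpy as np
import tensorflow.compat.v1 as tf  # TF 1.x graph-mode API
tf.disable_eager_execution()

# Toy data: y = 3x + 1.
rng = np.random.default_rng(0)
X = rng.normal(size=(256, 1)).astype(np.float32)
Y = 3.0 * X + 1.0

# Build the graph: placeholders for data, variables for parameters.
x = tf.placeholder(tf.float32, [None, 1])
y = tf.placeholder(tf.float32, [None, 1])
w = tf.Variable(tf.zeros([1, 1]))
b = tf.Variable(tf.zeros([1]))
y_hat = tf.matmul(x, w) + b

# Define the loss and the optimizer.
loss = tf.reduce_mean(tf.square(y_hat - y))
train_op = tf.train.GradientDescentOptimizer(0.1).minimize(loss)

# Define a session, initialize variables, and feed data in.
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(200):
        _, l = sess.run([train_op, loss], feed_dict={x: X, y: Y})
```

Note the division of labor: building the graph defines *what* to compute, and `sess.run` with `feed_dict` decides *when* and on *which* data.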
Training: whole landscape

(Figure: launch the training)
Beyond Regression

Classification: regression on probabilities (image classification)

Structured Prediction: regression on high-dimensional objects (autoencoders)

Reinforcement Learning: DQN, policy gradients...
DQN is analogous to regression; PG is analogous to classification.
Hyper-parameters

Getting a DL model to work involves a lot of hyper-parameters.

Optimization: the optimization subroutine that updates parameters with gradients. Beyond simple sgd there are rmsprop, adam, adagrad... adam is popular. The learning rate also matters.

Architecture: number of layers, hidden units per layer.

Nonlinear function: sigmoid, relu, tanh...

Initialization: initialization of the variables W, b. Arbitrary initializations will fail; use xavier initialization, truncated normal...

Hand tuning or cross validation is needed to select good hyper-parameters.

Other topics: batch normalization, dropout, stochastic layers...
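For instance, Xavier (Glorot) uniform initialization scales the weight range by fan-in and fan-out so activations keep a sensible variance across layers. A numpy sketch of the formula (Tensorflow ships this as a built-in initializer):

```python
import numpy as np

def xavier_uniform(fan_in, fan_out, rng):
    # Glorot & Bengio (2010): sample W ~ U(-limit, limit)
    # with limit = sqrt(6 / (fan_in + fan_out)).
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

rng = np.random.default_rng(0)
W = xavier_uniform(128, 64, rng)
```

Compared with an arbitrary scale, this keeps early-layer outputs from saturating sigmoids or exploding through deep stacks, which is why the "arbitrary initializations will fail" warning above matters in practice.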
![Page 80: Introduction to Deep Learning with Tensor owIntroduction to Deep Learning with Tensor ow Yunhao (Robin) Tang Department of IEOR, Columbia University February 25, 2019 IEOR 8100: Reinforcement](https://reader030.vdocument.in/reader030/viewer/2022040104/5ec97e4eb216631abf2786f8/html5/thumbnails/80.jpg)
PreliminaryIntro to DL with Tensorflow
Simple ExampleIntro to DLBeyond Tensorflow
Hyper-parameters
Get a DL model to work involves a lot of hyper-parameters.
Optimization: optimization subroutines that update parameters with gradients.
beyond simple sgd, there are rmsprop, adam, adagrad... adam is popular.learning rate
Architectures: number of layers, hidden units per layers
Nonlinear function: sigmoid, relu, tanh...
Initialization: initialization of variables W , b.arbitrary initializations will fail.xavier initialization, truncated normal...
Need hand tuning or cross validation to select good hyper-parameters.
Other topics: batch-normalization, dropout, stochastic layers...
IEOR 8100: Reinforcement Learning 23/36
![Page 81: Introduction to Deep Learning with Tensor owIntroduction to Deep Learning with Tensor ow Yunhao (Robin) Tang Department of IEOR, Columbia University February 25, 2019 IEOR 8100: Reinforcement](https://reader030.vdocument.in/reader030/viewer/2022040104/5ec97e4eb216631abf2786f8/html5/thumbnails/81.jpg)
PreliminaryIntro to DL with Tensorflow
Simple ExampleIntro to DLBeyond Tensorflow
Hyper-parameters
Get a DL model to work involves a lot of hyper-parameters.
Optimization: optimization subroutines that update parameters with gradients.beyond simple sgd, there are rmsprop, adam, adagrad... adam is popular.
learning rate
Architectures: number of layers, hidden units per layers
Nonlinear function: sigmoid, relu, tanh...
Initialization: initialization of variables W , b.arbitrary initializations will fail.xavier initialization, truncated normal...
Need hand tuning or cross validation to select good hyper-parameters.
Other topics: batch-normalization, dropout, stochastic layers...
IEOR 8100: Reinforcement Learning 23/36
![Page 82: Introduction to Deep Learning with Tensor owIntroduction to Deep Learning with Tensor ow Yunhao (Robin) Tang Department of IEOR, Columbia University February 25, 2019 IEOR 8100: Reinforcement](https://reader030.vdocument.in/reader030/viewer/2022040104/5ec97e4eb216631abf2786f8/html5/thumbnails/82.jpg)
PreliminaryIntro to DL with Tensorflow
Simple ExampleIntro to DLBeyond Tensorflow
Hyper-parameters
Get a DL model to work involves a lot of hyper-parameters.
Optimization: optimization subroutines that update parameters with gradients.beyond simple sgd, there are rmsprop, adam, adagrad... adam is popular.learning rate
Architectures: number of layers, hidden units per layers
Nonlinear function: sigmoid, relu, tanh...
Initialization: initialization of variables W , b.arbitrary initializations will fail.xavier initialization, truncated normal...
Need hand tuning or cross validation to select good hyper-parameters.
Other topics: batch-normalization, dropout, stochastic layers...
IEOR 8100: Reinforcement Learning 23/36
Beyond Tensorflow
Other autodiff software: PyTorch / Chainer / Theano / Caffe. PyTorch and Chainer allow dynamic graph building; the others use static graph building. They differ in strengths and weaknesses: GPU support, distributed computing, model-building flexibility, ease of debugging...
High-level interface: Keras. It uses TF or Theano as a backend and lets you specify model architectures more easily.
High-level interfaces within Tensorflow itself: e.g. tensorflow.contrib
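The dynamic-vs-static distinction can be sketched in plain Python (illustrative only, not any library's real API): dynamic ("define-by-run") frameworks execute each operation immediately, while static ("define-then-run") frameworks first build a symbolic graph and evaluate it later, like `sess.run` with a feed dict in classic TF:

```python
# Dynamic style: operations execute immediately as Python runs.
def dynamic_forward(x, w, b):
    return x * w + b  # computed right away

# Static style: first build a graph of symbolic nodes, then execute it later.
class Node:
    def __init__(self, op=None, inputs=()):
        self.op, self.inputs = op, inputs

def placeholder():  return Node()
def mul(a, b):      return Node(op="mul", inputs=(a, b))
def add(a, b):      return Node(op="add", inputs=(a, b))

def run(node, feed):
    """Evaluate a graph node given values for its placeholders."""
    if node in feed:                      # placeholder: look up fed value
        return feed[node]
    args = [run(i, feed) for i in node.inputs]
    return args[0] * args[1] if node.op == "mul" else args[0] + args[1]

# Build the graph once...
x, w, b = placeholder(), placeholder(), placeholder()
y = add(mul(x, w), b)
# ...then run it many times with different inputs.
print(run(y, {x: 3.0, w: 2.0, b: 1.0}))  # 7.0
print(dynamic_forward(3.0, 2.0, 1.0))    # 7.0
```

The static style enables whole-graph optimization and easy serialization; the dynamic style is easier to debug with ordinary Python tools, which is the trade-off the slide alludes to.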
IEOR 8100: Reinforcement Learning 24/36
Beyond Tensorflow
[Figure: low-level vs. high-level interfaces]
IEOR 8100: Reinforcement Learning 25/36
Additional Resources
TF official website: https://www.tensorflow.org/
TF Basic tutorial: https://www.tensorflow.org/tutorials/
Keras doc: https://keras.io/
TF tutorials: https://github.com/aymericdamien/TensorFlow-Examples
IEOR 8100: Reinforcement Learning 26/36
OpenAI Gym - very brief intro
[Figures: classic control, humanoid, Atari game (Pong)]
IEOR 8100: Reinforcement Learning 27/36
OpenAI Gym - very brief intro
Gym is a testbed for RL algorithms. Docs: https://github.com/openai/gym
Make an environment: env = gym.make("CartPole-v0")
Initialize: obs = env.reset() returns the first obs; the first action is chosen based on it.
Take a step: obs, reward, done, info = env.step(action). Based on obs we choose which action to take; reward is the reward collected during this one-step transition; done is True or False, indicating whether the episode has terminated; info holds additional info about the env, normally an empty dictionary.
Display the state: env.render() only displays the current state; render in a loop to display consecutive states.
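The steps above combine into the standard interaction loop. So that the sketch runs without gym installed, a tiny hypothetical stub environment stands in for gym here; with gym you would write env = gym.make("CartPole-v0") instead and the loop body would be unchanged:

```python
import random

class StubEnv:
    """Toy stand-in for a Gym env: reward 1 per step, ends after 10 steps."""
    def reset(self):
        self.t = 0
        return 0.0                      # first observation
    def step(self, action):
        self.t += 1
        obs = float(self.t)             # next observation
        reward = 1.0                    # reward for this one-step transition
        done = self.t >= 10             # True once the episode terminates
        info = {}                       # extra env info, usually empty
        return obs, reward, done, info

env = StubEnv()
obs = env.reset()                       # first obs; first action based on it
total_reward, done = 0.0, False
while not done:
    action = random.choice([0, 1])      # here: a uniformly random policy
    obs, reward, done, info = env.step(action)
    total_reward += reward
    # env.render() would go here, inside the loop, to show each state
print(total_reward)
```

Replacing the random `action` with the argmax of a learned Q-network turns this loop into the data-collection step of DQN.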
IEOR 8100: Reinforcement Learning 28/36
Beyond OpenAI Gym
There are many other open-source projects that provide RL benchmarks:
Continuous control: DeepMind Control Suite, Roboschool
Gym-like: rllab
Video game: VizDoom
Cooler video game: DeepMind StarCraft II Learning Environment
Or build your own environments...
IEOR 8100: Reinforcement Learning 29/36
Basic DQN [Mnih, 2013]
MDP with state and action spaces S, A
Instant reward r
Policy π : S → A
Maximize cumulative reward E_π[∑_{t=0}^∞ γ^t r_t]
Action value function Q^π(s, a) for state s, action a, under policy π.
Bellman error: E[(Q^π(s_t, a_t) − max_a E[r_t + γ Q^π(s_{t+1}, a)])^2]
The Bellman error is zero iff Q^π is the optimal action value function Q*(s, a).
IEOR 8100: Reinforcement Learning 30/36
Basic DQN: Conventional Q learning
Start with any Q vector Q^(0)
Contractive operator T defined as
(T Q)(s_t, a_t) := max_a E[r_t + γ Q(s_{t+1}, a)]
Apply the contractive operator: Q^(t+1) ← T Q^(t)
Will converge under mild conditions, to the fixed point
Q = T Q
which gives zero Bellman error.
IEOR 8100: Reinforcement Learning 31/36
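When S and A are small, the iteration Q^(t+1) ← T Q^(t) can be run directly. A minimal numpy sketch on a hypothetical two-state, two-action deterministic MDP (P and R below are made up for illustration):

```python
import numpy as np

# Toy deterministic MDP: P[s, a] is the next state, R[s, a] the instant reward.
P = np.array([[0, 1],
              [0, 1]])
R = np.array([[0.0, 1.0],
              [0.0, 2.0]])
gamma = 0.9

Q = np.zeros((2, 2))                        # start with any Q vector Q^(0)
for _ in range(1000):
    # apply the contractive operator: (TQ)(s,a) = r(s,a) + gamma * max_a' Q(s',a')
    Q_new = R + gamma * Q[P].max(axis=-1)   # Q[P][s,a] = Q(next state of (s,a), ·)
    if np.max(np.abs(Q_new - Q)) < 1e-10:   # reached the fixed point Q = TQ
        break
    Q = Q_new
```

For this MDP the fixed point is Q(1,1) = 2/(1 − γ) = 20, Q(0,1) = 1 + γ·20 = 19, and Q(0,0) = Q(1,0) = γ·19 = 17.1, so the iterates converge geometrically at rate γ.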
Basic DQN
Approximate Q^π with a neural net Q_θ(s, a) ≈ Q^π(s, a) with parameters θ.
Discrete action space |A| < ∞: input state s, output |A| values; the i-th value represents the approximate action value for the i-th action.
Continuous action space |A| = ∞: input state s and action a, output one value representing Q^π(s, a). Not our focus here.
Minimize the Bellman error
E_π[(Q_θ(s_i, a_i) − r_i − γ max_a Q_θ(s'_i, a))^2]
Sample based, given tuples {(s_i, a_i, r_i, s'_i)}_{i=1}^N:
min_θ (1/N) ∑_{i=1}^N (Q_θ(s_i, a_i) − r_i − γ max_a Q_θ(s'_i, a))^2
IEOR 8100: Reinforcement Learning 32/36
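The sample-based objective is easy to write down concretely. A numpy sketch, with a hypothetical linear Q function standing in for the neural net (all shapes and names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

n_actions, obs_dim, N = 2, 4, 32
theta = rng.normal(size=(obs_dim, n_actions))

def Q(theta, s):
    """Linear Q: one value per action, i-th column = value of the i-th action."""
    return s @ theta                          # shape (batch, |A|)

# A batch of N transition tuples (s_i, a_i, r_i, s'_i), random for illustration.
s  = rng.normal(size=(N, obs_dim))
a  = rng.integers(0, n_actions, size=N)
r  = rng.normal(size=N)
s2 = rng.normal(size=(N, obs_dim))
gamma = 0.99

# Sample-based Bellman error:
# (1/N) * sum_i (Q_theta(s_i, a_i) - r_i - gamma * max_a Q_theta(s'_i, a))^2
q_sa   = Q(theta, s)[np.arange(N), a]         # Q_theta(s_i, a_i)
target = r + gamma * Q(theta, s2).max(axis=1) # r_i + gamma * max_a Q_theta(s'_i, a)
bellman_error = np.mean((q_sa - target) ** 2)
```

Minimizing this quantity over θ (by SGD on minibatches) is exactly the DQN objective above; with a neural net, only the definition of `Q` changes.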
Basic DQN
Instability in optimizing
min_θ (1/N) ∑_{i=1}^N (Q_θ(s_i, a_i) − r_i − γ max_a Q_θ(s'_i, a))^2
Two techniques to alleviate instability: experience replay and target network.
Experience Replay: store experience tuples (s_i, a_i, r_i, s'_i) in a replay buffer B; during training, sample batches of experience {(s_i, a_i, r_i, s'_i)}_{i=1}^b from B and update parameters using SGD.
Target Network: the training target r_i + γ max_a Q_θ(s'_i, a) is non-stationary. Make it stationary by introducing a slowly updated target network with parameters θ⁻ and computing the target as r_i + γ max_a Q_{θ⁻}(s'_i, a).
IEOR 8100: Reinforcement Learning 33/36
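Both techniques are a few lines of code. A minimal sketch (the class and function names here are illustrative, not from the slides):

```python
import random
from collections import deque

class ReplayBuffer:
    """Store experience tuples; sample uniform minibatches for SGD updates."""
    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)   # oldest experience evicted first

    def store(self, s, a, r, s_next):
        self.buffer.append((s, a, r, s_next))

    def sample(self, batch_size):
        # uniform sampling breaks the temporal correlation of consecutive steps
        return random.sample(self.buffer, batch_size)

def update_target(theta, theta_minus, tau=1.0):
    """Slowly update the target net: tau=1.0 is the hard copy used in the
    original DQN; tau < 1 gives Polyak averaging, a common variant."""
    return [tau * w + (1 - tau) * w_m for w, w_m in zip(theta, theta_minus)]

buf = ReplayBuffer()
for t in range(100):
    buf.store(t, 0, 1.0, t + 1)
batch = buf.sample(32)   # minibatch of (s, a, r, s') for one SGD step
```

In training, the online parameters θ are updated every step from a sampled batch, while `update_target` is called only every fixed number of steps, so the target r_i + γ max_a Q_{θ⁻}(s'_i, a) stays stationary in between.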
Basic DQN
Naive exploration: ε-greedy
In state s, with probability ε take a random action; with probability 1 − ε take the greedy action arg max_a Q_θ(s, a)
More advanced exploration: noisy nets, parameter space noise, Bayesian updates...
Learning rate: a constant, e.g. α = 0.001, rather than a Robbins-Monro (RM) stepsize scheme.
IEOR 8100: Reinforcement Learning 34/36
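The ε-greedy rule above is one conditional (a minimal sketch; the function name is illustrative):

```python
import random

def epsilon_greedy(q_values, epsilon):
    """With probability epsilon take a random action; otherwise take the
    greedy action argmax_a Q_theta(s, a) over the given action values."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))            # explore
    return max(range(len(q_values)), key=lambda a: q_values[a])  # exploit
```

In practice ε is often annealed from a large value (e.g. 1.0) down to a small constant over the first phase of training, so early experience is diverse and later behavior is mostly greedy.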
Basic DQN
[Figure: DQN pseudocode]
IEOR 8100: Reinforcement Learning 35/36
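Putting the previous slides together, the DQN training loop can be sketched as follows. This is an illustrative stand-in for the pseudocode figure, not the original: a tabular Q on a toy 5-state chain replaces the neural net, but the structure (ε-greedy acting, replay buffer, minibatch Bellman updates, periodic target-net copies) is the same:

```python
import random
from collections import deque
import numpy as np

rng = random.Random(0)
n_states, n_actions = 5, 2
gamma, alpha, eps = 0.9, 0.1, 0.2

def env_step(s, a):
    # deterministic chain: action 1 moves right, action 0 moves left;
    # reward 1 for reaching the rightmost (terminal) state
    s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    return s_next, float(s_next == n_states - 1), s_next == n_states - 1

def greedy(q_row):
    best = q_row.max()
    return rng.choice([a for a in range(n_actions) if q_row[a] == best])

Q = np.zeros((n_states, n_actions))   # online "network" Q_theta
Q_target = Q.copy()                   # slowly updated target net Q_theta-
buffer = deque(maxlen=1000)           # experience replay buffer B

for episode in range(200):
    s = 0
    for _ in range(100):              # cap episode length
        # epsilon-greedy action selection
        a = rng.randrange(n_actions) if rng.random() < eps else greedy(Q[s])
        s_next, r, done = env_step(s, a)
        buffer.append((s, a, r, s_next, done))
        s = s_next
        # sample a minibatch and take an SGD-like step on the Bellman error
        for (si, ai, ri, s2, d) in rng.sample(list(buffer), min(8, len(buffer))):
            target = ri + (0.0 if d else gamma * Q_target[s2].max())
            Q[si, ai] += alpha * (target - Q[si, ai])
        if done:
            break
    if episode % 10 == 0:
        Q_target = Q.copy()           # periodic hard update of the target net
```

With a neural net, the tabular update line becomes one SGD step on the squared Bellman error over the sampled batch, and the target-net copy becomes a parameter copy; nothing else in the loop changes.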
End
Thanks!
IEOR 8100: Reinforcement Learning 36/36