Introduction to Deep Learning with Tensorflow
Yunhao (Robin) Tang, Department of IEOR, Columbia University
February 25, 2019
IEOR 8100: Reinforcement Learning 1/36
Summary
What we will cover...
Simple Example: Linear Regression with Tensorflow
Intro to Deep Learning
Deep Learning with Tensorflow
Beyond
If time permits, we will also cover
OpenAI Gym
Implement basic DQN
Info about Tensorflow
TF is an open source project officially supported by Google.
Programming interfaces: Python, C++, C, ... Python is the most developed and best documented.
Latest version: 1.12 (same time last year: 1.5). Commonly used version: ≈1.0 (depending on other dependencies).
Can be installed via pip. Best supported on macOS and Linux.
[Figure: TensorFlow paper]
Simple Example: Linear Regression
Linear Regression: N data points {(x_i, y_i)}_{i=1}^N, x ∈ R^k, y ∈ R.
Specify Architecture: consider a linear function for regression with parameters slope θ ∈ R^k and bias θ_0 ∈ R. The prediction is ŷ_i = θ^T x_i + θ_0.
Define Loss: the loss function to minimize is J = (1/N) ∑_{i=1}^N (ŷ_i − y_i)^2.
Gradient Descent: θ ← θ − α∇_θ J, θ_0 ← θ_0 − α∇_{θ_0} J.
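The three steps above can be sketched in plain Python (TensorFlow would compute the gradients automatically; here they are written out by hand). The true slope 2, bias 1, noise scale, and learning rate α = 0.1 are illustrative choices, not from the slides:

```python
import random

random.seed(0)

# Synthetic 1-D data: y = 2x + 1 + noise (true parameters are assumptions)
N = 200
xs = [random.uniform(-1, 1) for _ in range(N)]
ys = [2.0 * x + 1.0 + random.gauss(0, 0.1) for x in xs]

theta, theta0 = 0.0, 0.0   # parameters: slope and bias
alpha = 0.1                # learning rate

for step in range(500):
    # Predictions: y_hat_i = theta * x_i + theta0
    preds = [theta * x + theta0 for x in xs]
    # Gradients of J = (1/N) sum (y_hat_i - y_i)^2:
    #   dJ/dtheta  = (2/N) sum (y_hat_i - y_i) * x_i
    #   dJ/dtheta0 = (2/N) sum (y_hat_i - y_i)
    g_theta  = 2.0 / N * sum((p - y) * x for p, y, x in zip(preds, ys, xs))
    g_theta0 = 2.0 / N * sum(p - y for p, y in zip(preds, ys))
    theta  -= alpha * g_theta
    theta0 -= alpha * g_theta0
```

After 500 steps the parameters should sit close to the true slope and bias.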
Linear Regression: Data
[Figures: true parameters; generated data]
Linear Regression with Tensorflow
Let us just focus on defining architecture...
[Figure: linear regression TF code]
Linear Regression with Tensorflow
We launch the training...
[Figures: training loss; fitted line]
Intro to DL
Consider data sets with a complex relation {(x_i, y_i)}_{i=1}^N, where linear regression will not work.
MLP (Multi-Layer Perceptron): stack multiple layers of linear transformations and nonlinear functions.
Input x ∈ R^6; the first layer applies a linear transformation W_1 x + b_1 ∈ R^4, then a nonlinear function: h_1 = σ(W_1 x + b_1) ∈ R^4. The nonlinear function is applied elementwise.
The second layer has W_2, b_2 and transforms h_1 as h_2 = σ(W_2 h_1 + b_2) ∈ R^3.
The final output is y = W_3 σ(W_2 σ(W_1 x + b_1) + b_2) + b_3 ∈ R.
[Figures: MLP architecture; nonlinear activation functions]
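A minimal forward pass matching these shapes (6 → 4 → 3 → 1), in plain Python with tanh as an example elementwise σ; the random weight values and the input x are placeholders, not from the slides:

```python
import math
import random

random.seed(0)

def matvec(W, x, b):
    # Linear transformation: W x + b
    return [sum(wij * xj for wij, xj in zip(row, x)) + bi
            for row, bi in zip(W, b)]

def sigma(v):
    # Elementwise nonlinearity (tanh chosen as an example activation)
    return [math.tanh(vi) for vi in v]

def rand_layer(n_out, n_in):
    # Random weights, zero biases (placeholder initialization)
    W = [[random.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    return W, [0.0] * n_out

W1, b1 = rand_layer(4, 6)   # first layer:  R^6 -> R^4
W2, b2 = rand_layer(3, 4)   # second layer: R^4 -> R^3
W3, b3 = rand_layer(1, 3)   # output layer: R^3 -> R

x = [0.5] * 6                      # input x in R^6 (placeholder)
h1 = sigma(matvec(W1, x, b1))      # h1 = sigma(W1 x + b1) in R^4
h2 = sigma(matvec(W2, h1, b2))     # h2 = sigma(W2 h1 + b2) in R^3
y = matvec(W3, h2, b3)[0]          # y = W3 h2 + b3, a scalar
```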
Intro to DL
Architecture: defines a complex input-to-output relation with stacked layers of linear transformations and nonlinear functions (activations).
Architectures are adapted to the data structure at hand; prior knowledge enters the modeling through architecture design: MLP, CNN (images), RNN (sequence data: audio, language, ...).
Parameters: weights W_1, W_2, W_3 and biases b_1, b_2, b_3 (cf. linear regression: slope θ and bias θ_0).
DL algorithm
Specify Architecture: how many layers, and how many hidden units per layer? ŷ_i = f_θ(x_i) with generic parameters θ (weights and biases).
Define Loss: the loss function to minimize is J = (1/N) ∑_{i=1}^N (ŷ_i − y_i)^2.
Gradient Descent: θ ← θ − α∇_θ J.
Algorithm 1 Generic DL regression
1: Input: model architecture, data {(x_i, y_i)}, learning rate α
2: Initialize: model parameters θ = {W_i, b_i}
3: for t = 1, 2, ..., T do
4:     Compute predictions ŷ_i = f_θ(x_i)
5:     Compute loss J = (1/N) ∑_i (ŷ_i − y_i)^2
6:     Gradient update θ ← θ − α∇_θ J
7: end for
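This generic loop can be sketched in plain Python. Here ∇_θ J is approximated by finite differences, standing in for the automatic differentiation TensorFlow performs; the tiny linear model f_θ, the noiseless data y = 3x − 0.5, and the learning rate are placeholder choices:

```python
def f(theta, x):
    # Placeholder "architecture": f_theta(x) = theta[0] * x + theta[1]
    return theta[0] * x + theta[1]

def loss(theta, data):
    # J = (1/N) sum (f_theta(x_i) - y_i)^2
    return sum((f(theta, x) - y) ** 2 for x, y in data) / len(data)

def num_grad(theta, data, eps=1e-6):
    # Central finite differences, standing in for autodiff
    g = []
    for i in range(len(theta)):
        tp = theta[:]; tp[i] += eps
        tm = theta[:]; tm[i] -= eps
        g.append((loss(tp, data) - loss(tm, data)) / (2 * eps))
    return g

# Noiseless toy data: y = 3x - 0.5 on a grid over [-1, 1]
data = [(k / 10.0, 3.0 * (k / 10.0) - 0.5) for k in range(-10, 11)]

theta = [0.0, 0.0]   # initialize parameters
alpha = 0.2          # learning rate
for t in range(1000):
    g = num_grad(theta, data)                          # compute gradient
    theta = [p - alpha * gi for p, gi in zip(theta, g)]  # gradient update
```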
DL Regression: Data
Generate data using y = x^3 + x^2 + sin(10x) + noise. Linear regression will fail here.
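The data generation in plain Python (the slide's code figure is not reproduced; the input range [−1, 1], sample size, and noise scale 0.1 are assumptions):

```python
import math
import random

random.seed(0)

N = 500
xs = [random.uniform(-1.0, 1.0) for _ in range(N)]
# y = x^3 + x^2 + sin(10x) + Gaussian noise
ys = [x ** 3 + x ** 2 + math.sin(10 * x) + random.gauss(0, 0.1) for x in xs]
```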
[Figures: data-generation code; data plot]
DL Regression with Tensorflow
Specify the Architecture and Loss using Tensorflow syntax, just like linear regression.
[Figure: TF code for the DL regression model]
Train the model just like linear regression.
DL Regression with Tensorflow
We launch the training...
[Figures: training loss; fitted curve]
Batch training
When data set is large, can only compute gradients on mini-batches.
Sample a batch of data → Compute gradient on the batch → Update parameters.
Batch size: too small: high variance in data stream, unstable training; too large:too costly to compute gradients.
IEOR 8100: Reinforcement Learning 14/36
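The sample → gradient → update loop above can be sketched in a few lines of numpy for linear regression with an MSE loss (the learning rate and batch size below are hypothetical choices):

```python
import numpy as np

rng = np.random.default_rng(0)

# A synthetic data set large enough that mini-batches make sense.
N = 10_000
X = rng.normal(size=(N, 3))
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=N)

w = np.zeros(3)
lr, batch_size = 0.1, 64

for step in range(500):
    # Sample a batch of data ...
    idx = rng.integers(0, N, size=batch_size)
    xb, yb = X[idx], y[idx]
    # ... compute the gradient of the MSE loss on the batch ...
    grad = 2.0 * xb.T @ (xb @ w - yb) / batch_size
    # ... and update the parameters.
    w -= lr * grad
```

With a reasonable batch size the noisy batch gradients still drive `w` toward `true_w`; shrinking the batch makes each step cheaper but noisier, which is exactly the trade-off on the slide.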
Auto-differentiation

Taking gradients in a linear model is easy.

Question: how do we take gradients for an MLP? ∇W1 J is not as straightforward as in linear regression:

J = (1/N) ∑i (ŷi − yi)²,   ŷi = W3 σ(W2 σ(W1 xi + b1) + b2) + b3
Auto-differentiation

Tensorflow provides automatic differentiation (autodiff), i.e. automatic gradient computation. Users specify the forward computation y = fθ(x), and the program internally builds a way to compute ∇θ y. This is neither symbolic computation nor finite differences.

Example: start with a, b and compute e via c = a + b, d = b + 1, e = c · d. Consider

∂e/∂a = (∂e/∂d)(∂d/∂a) + (∂e/∂c)(∂c/∂a)

(here ∂d/∂a = 0 and ∂c/∂a = 1, so ∂e/∂a = ∂e/∂c = d).

(Figure: computation graph)
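The chain-rule bookkeeping above is exactly what reverse-mode autodiff automates. As a hand-unrolled sketch of the slide's example (plain Python, with concrete values a = 2, b = 1 chosen for illustration):

```python
# Forward pass: record the value at every node of the graph.
a, b = 2.0, 1.0
c = a + b          # c = a + b
d = b + 1.0        # d = b + 1
e = c * d          # e = c * d

# Backward pass: propagate ∂e/∂(node) from the output back.
de_de = 1.0
de_dc = de_de * d                   # e = c*d  =>  ∂e/∂c = d
de_dd = de_de * c                   # e = c*d  =>  ∂e/∂d = c
de_da = de_dc * 1.0                 # c = a+b  =>  ∂c/∂a = 1
de_db = de_dc * 1.0 + de_dd * 1.0   # b feeds both c and d, so gradients add
```

Each backward line only uses a local derivative of one op; long-range gradients like ∂e/∂b fall out of summing over paths through the graph.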
DL with Tensorflow

High-level idea: the computation graph. The forward computation specifies local operations that connect locally related variables, so local gradients can be computed.

forward computation → local forward ops → local gradients → long-range gradients.

This is just like errors back-propagating from the output to the inputs and parameters: back-propagation.

Useful link: http://deeplearning.stanford.edu/wiki/index.php/UFLDL_Tutorial
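To see that chaining local gradients really reproduces the long-range derivative, we can back-propagate through a one-hidden-layer network by hand and compare against a finite-difference estimate (a numpy sketch, not TensorFlow's internals; sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(3, 4))
W2 = rng.normal(size=(4, 1))
x = rng.normal(size=(1, 3))
y = np.array([[1.0]])

def loss(W1):
    h = np.tanh(x @ W1)
    return float(np.sum((h @ W2 - y) ** 2))

# Back-propagation: chain local gradients from the loss backward.
h = np.tanh(x @ W1)
diff = h @ W2 - y                              # dL/d(output) = 2*diff
dW1 = x.T @ ((2.0 * diff @ W2.T) * (1.0 - h ** 2))

# Finite-difference check on a single entry of W1.
eps = 1e-6
E = np.zeros_like(W1)
E[0, 0] = eps
fd = (loss(W1 + E) - loss(W1 - E)) / (2.0 * eps)
```

The slide's point is that Tensorflow does this chaining for you for arbitrary graphs; the finite-difference value here is only a sanity check, not how autodiff works.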
DL with Tensorflow

What we have covered from the code...

Placeholder: tf objects that specify users' inputs into the graph; literally, placeholders for data.

Variable: tf objects that represent parameters that can be updated through autodiff; embedded inside the graph.

All expressions computed from primitive variables and placeholders are vertices in the graph.
DL with Tensorflow

What we have not yet covered from the code: how do we launch the training, and how do we use the gradients computed by Tensorflow to update parameters?

TF expression: python objects that define how to update the graph (the variables in the graph) in a single step.

Optimizer: python objects that specify gradient updates of variables.

Session: python objects that launch the training.

And many more...
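As a rough picture of the Optimizer's role (a plain-Python sketch of the job tf.train optimizers do, not TensorFlow's actual implementation): it owns an update rule and applies it to each variable given that variable's gradient.

```python
import numpy as np

class SGDOptimizer:
    """Sketch of the role an optimizer object plays: given
    (gradient, variable) pairs, apply one gradient-descent step."""
    def __init__(self, learning_rate):
        self.lr = learning_rate

    def apply_gradients(self, grads_and_vars):
        for g, v in grads_and_vars:
            v -= self.lr * g   # in-place update, like assigning to a tf.Variable

# One step on J(w) = ||w||^2, whose gradient is 2w.
w = np.array([1.0, -2.0])
opt = SGDOptimizer(learning_rate=0.25)
opt.apply_gradients([(2.0 * w, w)])
```

In Tensorflow the optimizer additionally builds the update as a graph op (e.g. via `minimize(loss)`), so that running that op inside a session performs the step.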
Launch the training

Define the loss and the optimizer, which specify how to update model parameters during training.

Define a session to launch the training.

Initialize variables.

Feed data into the computation graph.

(Figure: launch the training)
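The four steps above fit in one short script. A minimal sketch in the TF 1.x-style API the slides use (accessed via the `tf.compat.v1` shim on TF 2; the data, learning rate, and step count are illustrative):

```python
import numpy as np
import tensorflow.compat.v1 as tf  # TF 1.x graph-mode API
tf.disable_eager_execution()

# Toy data: y = 3x + 1.
rng = np.random.default_rng(0)
X = rng.normal(size=(256, 1)).astype(np.float32)
Y = 3.0 * X + 1.0

# Build the graph: placeholders for data, variables for parameters.
x = tf.placeholder(tf.float32, [None, 1])
y = tf.placeholder(tf.float32, [None, 1])
w = tf.Variable(tf.zeros([1, 1]))
b = tf.Variable(tf.zeros([1]))
y_hat = tf.matmul(x, w) + b

# Define the loss and the optimizer.
loss = tf.reduce_mean(tf.square(y_hat - y))
train_op = tf.train.GradientDescentOptimizer(0.1).minimize(loss)

# Define a session, initialize variables, and feed data in.
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(200):
        _, l = sess.run([train_op, loss], feed_dict={x: X, y: Y})
```

Note the division of labor: building the graph defines *what* to compute, and `sess.run` with `feed_dict` decides *when* and on *which* data.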
Training: whole landscape

(Figure: launch the training)
Beyond Regression

Classification: regression on probabilities (image classification)

Structured Prediction: regression on high-dimensional objects (autoencoders)

Reinforcement Learning: DQN, policy gradients...
DQN is analogous to regression; PG is analogous to classification.
Hyper-parameters

Getting a DL model to work involves a lot of hyper-parameters.

Optimization: the optimization subroutine that updates parameters with gradients. Beyond simple sgd there are rmsprop, adam, adagrad... adam is popular. The learning rate also matters.

Architecture: number of layers, hidden units per layer.

Nonlinear function: sigmoid, relu, tanh...

Initialization: initialization of the variables W, b. Arbitrary initializations will fail; use xavier initialization, truncated normal...

Hand tuning or cross validation is needed to select good hyper-parameters.

Other topics: batch normalization, dropout, stochastic layers...
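For instance, Xavier (Glorot) uniform initialization scales the weight range by fan-in and fan-out so activations keep a sensible variance across layers. A numpy sketch of the formula (Tensorflow ships this as a built-in initializer):

```python
import numpy as np

def xavier_uniform(fan_in, fan_out, rng):
    # Glorot & Bengio (2010): sample W ~ U(-limit, limit)
    # with limit = sqrt(6 / (fan_in + fan_out)).
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

rng = np.random.default_rng(0)
W = xavier_uniform(128, 64, rng)
```

Compared with an arbitrary scale, this keeps early-layer outputs from saturating sigmoids or exploding through deep stacks, which is why the "arbitrary initializations will fail" warning above matters in practice.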
![Page 80: Introduction to Deep Learning with Tensor owIntroduction to Deep Learning with Tensor ow Yunhao (Robin) Tang Department of IEOR, Columbia University February 25, 2019 IEOR 8100: Reinforcement](https://reader030.vdocument.in/reader030/viewer/2022040104/5ec97e4eb216631abf2786f8/html5/thumbnails/80.jpg)
PreliminaryIntro to DL with Tensorflow
Simple ExampleIntro to DLBeyond Tensorflow
Hyper-parameters
Get a DL model to work involves a lot of hyper-parameters.
Optimization: optimization subroutines that update parameters with gradients.
beyond simple sgd, there are rmsprop, adam, adagrad... adam is popular.learning rate
Architectures: number of layers, hidden units per layers
Nonlinear function: sigmoid, relu, tanh...
Initialization: initialization of variables W , b.arbitrary initializations will fail.xavier initialization, truncated normal...
Need hand tuning or cross validation to select good hyper-parameters.
Other topics: batch-normalization, dropout, stochastic layers...
IEOR 8100: Reinforcement Learning 23/36
![Page 81: Introduction to Deep Learning with Tensor owIntroduction to Deep Learning with Tensor ow Yunhao (Robin) Tang Department of IEOR, Columbia University February 25, 2019 IEOR 8100: Reinforcement](https://reader030.vdocument.in/reader030/viewer/2022040104/5ec97e4eb216631abf2786f8/html5/thumbnails/81.jpg)
PreliminaryIntro to DL with Tensorflow
Simple ExampleIntro to DLBeyond Tensorflow
Hyper-parameters
Get a DL model to work involves a lot of hyper-parameters.
Optimization: optimization subroutines that update parameters with gradients.beyond simple sgd, there are rmsprop, adam, adagrad... adam is popular.
learning rate
Architectures: number of layers, hidden units per layers
Nonlinear function: sigmoid, relu, tanh...
Initialization: initialization of variables W , b.arbitrary initializations will fail.xavier initialization, truncated normal...
Need hand tuning or cross validation to select good hyper-parameters.
Other topics: batch-normalization, dropout, stochastic layers...
IEOR 8100: Reinforcement Learning 23/36
![Page 82: Introduction to Deep Learning with Tensor owIntroduction to Deep Learning with Tensor ow Yunhao (Robin) Tang Department of IEOR, Columbia University February 25, 2019 IEOR 8100: Reinforcement](https://reader030.vdocument.in/reader030/viewer/2022040104/5ec97e4eb216631abf2786f8/html5/thumbnails/82.jpg)
PreliminaryIntro to DL with Tensorflow
Simple ExampleIntro to DLBeyond Tensorflow
Hyper-parameters
Get a DL model to work involves a lot of hyper-parameters.
Optimization: optimization subroutines that update parameters with gradients.beyond simple sgd, there are rmsprop, adam, adagrad... adam is popular.learning rate
Architectures: number of layers, hidden units per layers
Nonlinear function: sigmoid, relu, tanh...
Initialization: initialization of variables W , b.arbitrary initializations will fail.xavier initialization, truncated normal...
Need hand tuning or cross validation to select good hyper-parameters.
Other topics: batch-normalization, dropout, stochastic layers...
IEOR 8100: Reinforcement Learning 23/36
Beyond Tensorflow
Other autodiff software: PyTorch / Chainer / Theano / Caffe. PyTorch and Chainer allow dynamic graph building; the others use static graph building. They differ in strengths and weaknesses: GPU support, distributed computing, model-building flexibility, ease of debugging...
High-level interface: Keras. It uses TF or Theano as a backend and lets you specify model architectures more easily.
High-level interfaces within Tensorflow itself: e.g. tensorflow.contrib
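The dynamic-vs-static distinction can be sketched in plain Python (illustrative only, not any library's real API): dynamic ("define-by-run") frameworks execute each operation immediately, while static ("define-then-run") frameworks first build a symbolic graph and evaluate it later, like `sess.run` with a feed dict in classic TF:

```python
# Dynamic style: operations execute immediately as Python runs.
def dynamic_forward(x, w, b):
    return x * w + b  # computed right away

# Static style: first build a graph of symbolic nodes, then execute it later.
class Node:
    def __init__(self, op=None, inputs=()):
        self.op, self.inputs = op, inputs

def placeholder():  return Node()
def mul(a, b):      return Node(op="mul", inputs=(a, b))
def add(a, b):      return Node(op="add", inputs=(a, b))

def run(node, feed):
    """Evaluate a graph node given values for its placeholders."""
    if node in feed:                      # placeholder: look up fed value
        return feed[node]
    args = [run(i, feed) for i in node.inputs]
    return args[0] * args[1] if node.op == "mul" else args[0] + args[1]

# Build the graph once...
x, w, b = placeholder(), placeholder(), placeholder()
y = add(mul(x, w), b)
# ...then run it many times with different inputs.
print(run(y, {x: 3.0, w: 2.0, b: 1.0}))  # 7.0
print(dynamic_forward(3.0, 2.0, 1.0))    # 7.0
```

The static style enables whole-graph optimization and easy serialization; the dynamic style is easier to debug with ordinary Python tools, which is the trade-off the slide alludes to.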
IEOR 8100: Reinforcement Learning 24/36
Beyond Tensorflow
[Figure: low-level vs. high-level interfaces]
IEOR 8100: Reinforcement Learning 25/36
Additional Resources
TF official website: https://www.tensorflow.org/
TF Basic tutorial: https://www.tensorflow.org/tutorials/
Keras doc: https://keras.io/
TF tutorials: https://github.com/aymericdamien/TensorFlow-Examples
IEOR 8100: Reinforcement Learning 26/36
OpenAI Gym - very brief intro
[Figures: classic control, humanoid, Atari game (Pong)]
IEOR 8100: Reinforcement Learning 27/36
OpenAI Gym - very brief intro
Gym is a testbed for RL algorithms. Docs: https://github.com/openai/gym
Make an environment: env = gym.make("CartPole-v0")
Initialize: obs = env.reset() returns the first obs; the first action is chosen based on it.
Take a step: obs, reward, done, info = env.step(action). Based on obs we choose which action to take; reward is the reward collected during this one-step transition; done is True or False, indicating whether the episode has terminated; info holds additional info about the env, normally an empty dictionary.
Display the state: env.render() only displays the current state; render in a loop to display consecutive states.
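The steps above combine into the standard interaction loop. So that the sketch runs without gym installed, a tiny hypothetical stub environment stands in for gym here; with gym you would write env = gym.make("CartPole-v0") instead and the loop body would be unchanged:

```python
import random

class StubEnv:
    """Toy stand-in for a Gym env: reward 1 per step, ends after 10 steps."""
    def reset(self):
        self.t = 0
        return 0.0                      # first observation
    def step(self, action):
        self.t += 1
        obs = float(self.t)             # next observation
        reward = 1.0                    # reward for this one-step transition
        done = self.t >= 10             # True once the episode terminates
        info = {}                       # extra env info, usually empty
        return obs, reward, done, info

env = StubEnv()
obs = env.reset()                       # first obs; first action based on it
total_reward, done = 0.0, False
while not done:
    action = random.choice([0, 1])      # here: a uniformly random policy
    obs, reward, done, info = env.step(action)
    total_reward += reward
    # env.render() would go here, inside the loop, to show each state
print(total_reward)
```

Replacing the random `action` with the argmax of a learned Q-network turns this loop into the data-collection step of DQN.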
IEOR 8100: Reinforcement Learning 28/36
Beyond OpenAI Gym
There are many other open-source projects that provide RL benchmarks:
Continuous control: DeepMind Control Suite, Roboschool
Gym-like: rllab
Video game: VizDoom
Cooler video game: DeepMind StarCraft II Learning Environment
Or build your own environments...
IEOR 8100: Reinforcement Learning 29/36
Basic DQN [Mnih, 2013]
MDP with state and action spaces S, A
Instant reward r
Policy π : S → A
Maximize cumulative reward E_π[∑_{t=0}^∞ γ^t r_t]
Action value function Q^π(s, a) for state s, action a, under policy π.
Bellman error: E[(Q^π(s_t, a_t) − max_a E[r_t + γ Q^π(s_{t+1}, a)])^2]
The Bellman error is zero iff Q^π is the optimal action value function Q*(s, a).
IEOR 8100: Reinforcement Learning 30/36
Basic DQN: Conventional Q learning
Start with any Q vector Q^(0)
Contractive operator T defined as
(T Q)(s_t, a_t) := max_a E[r_t + γ Q(s_{t+1}, a)]
Apply the contractive operator: Q^(t+1) ← T Q^(t)
Will converge under mild conditions, to the fixed point
Q = T Q
which gives zero Bellman error.
IEOR 8100: Reinforcement Learning 31/36
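When S and A are small, the iteration Q^(t+1) ← T Q^(t) can be run directly. A minimal numpy sketch on a hypothetical two-state, two-action deterministic MDP (P and R below are made up for illustration):

```python
import numpy as np

# Toy deterministic MDP: P[s, a] is the next state, R[s, a] the instant reward.
P = np.array([[0, 1],
              [0, 1]])
R = np.array([[0.0, 1.0],
              [0.0, 2.0]])
gamma = 0.9

Q = np.zeros((2, 2))                        # start with any Q vector Q^(0)
for _ in range(1000):
    # apply the contractive operator: (TQ)(s,a) = r(s,a) + gamma * max_a' Q(s',a')
    Q_new = R + gamma * Q[P].max(axis=-1)   # Q[P][s,a] = Q(next state of (s,a), ·)
    if np.max(np.abs(Q_new - Q)) < 1e-10:   # reached the fixed point Q = TQ
        break
    Q = Q_new
```

For this MDP the fixed point is Q(1,1) = 2/(1 − γ) = 20, Q(0,1) = 1 + γ·20 = 19, and Q(0,0) = Q(1,0) = γ·19 = 17.1, so the iterates converge geometrically at rate γ.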
Basic DQN
Approximate Q^π with a neural net Q_θ(s, a) ≈ Q^π(s, a) with parameters θ.
Discrete action space |A| < ∞: input state s, output |A| values; the i-th value represents the approximate action value for the i-th action.
Continuous action space |A| = ∞: input state s and action a, output one value representing Q^π(s, a). Not our focus here.
Minimize the Bellman error
E_π[(Q_θ(s_i, a_i) − r_i − γ max_a Q_θ(s'_i, a))^2]
Sample based, given tuples {(s_i, a_i, r_i, s'_i)}_{i=1}^N:
min_θ (1/N) ∑_{i=1}^N (Q_θ(s_i, a_i) − r_i − γ max_a Q_θ(s'_i, a))^2
IEOR 8100: Reinforcement Learning 32/36
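The sample-based objective is easy to write down concretely. A numpy sketch, with a hypothetical linear Q function standing in for the neural net (all shapes and names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

n_actions, obs_dim, N = 2, 4, 32
theta = rng.normal(size=(obs_dim, n_actions))

def Q(theta, s):
    """Linear Q: one value per action, i-th column = value of the i-th action."""
    return s @ theta                          # shape (batch, |A|)

# A batch of N transition tuples (s_i, a_i, r_i, s'_i), random for illustration.
s  = rng.normal(size=(N, obs_dim))
a  = rng.integers(0, n_actions, size=N)
r  = rng.normal(size=N)
s2 = rng.normal(size=(N, obs_dim))
gamma = 0.99

# Sample-based Bellman error:
# (1/N) * sum_i (Q_theta(s_i, a_i) - r_i - gamma * max_a Q_theta(s'_i, a))^2
q_sa   = Q(theta, s)[np.arange(N), a]         # Q_theta(s_i, a_i)
target = r + gamma * Q(theta, s2).max(axis=1) # r_i + gamma * max_a Q_theta(s'_i, a)
bellman_error = np.mean((q_sa - target) ** 2)
```

Minimizing this quantity over θ (by SGD on minibatches) is exactly the DQN objective above; with a neural net, only the definition of `Q` changes.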
Basic DQN
Instability in optimizing
min_θ (1/N) ∑_{i=1}^N (Q_θ(s_i, a_i) − r_i − γ max_a Q_θ(s'_i, a))^2
Two techniques to alleviate instability: experience replay and target network.
Experience Replay: store experience tuples (s_i, a_i, r_i, s'_i) in a replay buffer B; during training, sample batches of experience {(s_i, a_i, r_i, s'_i)}_{i=1}^b from B and update parameters using SGD.
Target Network: the training target r_i + γ max_a Q_θ(s'_i, a) is non-stationary. Make it stationary by introducing a slowly updated target network with parameters θ⁻ and computing the target as r_i + γ max_a Q_{θ⁻}(s'_i, a).
IEOR 8100: Reinforcement Learning 33/36
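Both techniques are a few lines of code. A minimal sketch (the class and function names here are illustrative, not from the slides):

```python
import random
from collections import deque

class ReplayBuffer:
    """Store experience tuples; sample uniform minibatches for SGD updates."""
    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)   # oldest experience evicted first

    def store(self, s, a, r, s_next):
        self.buffer.append((s, a, r, s_next))

    def sample(self, batch_size):
        # uniform sampling breaks the temporal correlation of consecutive steps
        return random.sample(self.buffer, batch_size)

def update_target(theta, theta_minus, tau=1.0):
    """Slowly update the target net: tau=1.0 is the hard copy used in the
    original DQN; tau < 1 gives Polyak averaging, a common variant."""
    return [tau * w + (1 - tau) * w_m for w, w_m in zip(theta, theta_minus)]

buf = ReplayBuffer()
for t in range(100):
    buf.store(t, 0, 1.0, t + 1)
batch = buf.sample(32)   # minibatch of (s, a, r, s') for one SGD step
```

In training, the online parameters θ are updated every step from a sampled batch, while `update_target` is called only every fixed number of steps, so the target r_i + γ max_a Q_{θ⁻}(s'_i, a) stays stationary in between.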
Basic DQN
Naive exploration: ε-greedy
In state s, with probability ε take a random action; with probability 1 − ε take the greedy action arg max_a Q_θ(s, a)
More advanced exploration: noisy nets, parameter space noise, Bayesian updates...
Learning rate: a constant, e.g. α = 0.001, rather than a Robbins-Monro (RM) stepsize scheme.
IEOR 8100: Reinforcement Learning 34/36
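The ε-greedy rule above is one conditional (a minimal sketch; the function name is illustrative):

```python
import random

def epsilon_greedy(q_values, epsilon):
    """With probability epsilon take a random action; otherwise take the
    greedy action argmax_a Q_theta(s, a) over the given action values."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))            # explore
    return max(range(len(q_values)), key=lambda a: q_values[a])  # exploit
```

In practice ε is often annealed from a large value (e.g. 1.0) down to a small constant over the first phase of training, so early experience is diverse and later behavior is mostly greedy.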
Basic DQN
[Figure: DQN pseudocode]
IEOR 8100: Reinforcement Learning 35/36
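Putting the previous slides together, the DQN training loop can be sketched as follows. This is an illustrative stand-in for the pseudocode figure, not the original: a tabular Q on a toy 5-state chain replaces the neural net, but the structure (ε-greedy acting, replay buffer, minibatch Bellman updates, periodic target-net copies) is the same:

```python
import random
from collections import deque
import numpy as np

rng = random.Random(0)
n_states, n_actions = 5, 2
gamma, alpha, eps = 0.9, 0.1, 0.2

def env_step(s, a):
    # deterministic chain: action 1 moves right, action 0 moves left;
    # reward 1 for reaching the rightmost (terminal) state
    s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    return s_next, float(s_next == n_states - 1), s_next == n_states - 1

def greedy(q_row):
    best = q_row.max()
    return rng.choice([a for a in range(n_actions) if q_row[a] == best])

Q = np.zeros((n_states, n_actions))   # online "network" Q_theta
Q_target = Q.copy()                   # slowly updated target net Q_theta-
buffer = deque(maxlen=1000)           # experience replay buffer B

for episode in range(200):
    s = 0
    for _ in range(100):              # cap episode length
        # epsilon-greedy action selection
        a = rng.randrange(n_actions) if rng.random() < eps else greedy(Q[s])
        s_next, r, done = env_step(s, a)
        buffer.append((s, a, r, s_next, done))
        s = s_next
        # sample a minibatch and take an SGD-like step on the Bellman error
        for (si, ai, ri, s2, d) in rng.sample(list(buffer), min(8, len(buffer))):
            target = ri + (0.0 if d else gamma * Q_target[s2].max())
            Q[si, ai] += alpha * (target - Q[si, ai])
        if done:
            break
    if episode % 10 == 0:
        Q_target = Q.copy()           # periodic hard update of the target net
```

With a neural net, the tabular update line becomes one SGD step on the squared Bellman error over the sampled batch, and the target-net copy becomes a parameter copy; nothing else in the loop changes.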
End
Thanks!
IEOR 8100: Reinforcement Learning 36/36