
Page 1:

CS885 Reinforcement Learning

Lecture 4a: May 11, 2018

Deep Neural Networks

[GBC] Chap. 6, 7, 8

CS885 Spring 2018, Pascal Poupart, University of Waterloo

Page 2:


Quick recap

• Markov Decision Processes: value iteration
  $V(s) \leftarrow \max_a R(s,a) + \gamma \sum_{s'} \Pr(s' \mid s, a)\, V(s')$

• Reinforcement Learning: Q-learning
  $Q(s,a) \leftarrow Q(s,a) + \alpha \big[ r + \gamma \max_{a'} Q(s',a') - Q(s,a) \big]$

• Complexity depends on number of states and actions

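Below is a minimal NumPy sketch of both updates on a toy two-state, two-action MDP; the transition probabilities, rewards, discount, and step size are illustrative assumptions, not values from the lecture.

```python
import numpy as np

# Toy MDP (illustrative numbers): 2 states, 2 actions
gamma = 0.9
P = np.array([[[0.8, 0.2], [0.1, 0.9]],   # P[s, a, s'] = Pr(s' | s, a)
              [[0.5, 0.5], [0.3, 0.7]]])
R = np.array([[1.0, 0.0], [0.0, 2.0]])    # R[s, a]

# Value iteration: V(s) <- max_a [ R(s,a) + gamma * sum_s' Pr(s'|s,a) V(s') ]
V = np.zeros(2)
for _ in range(100):
    V = np.max(R + gamma * (P @ V), axis=1)

# One tabular Q-learning update after observing (s, a, r, s')
Q = np.zeros((2, 2))
alpha = 0.1
s, a, r, s_next = 0, 1, 0.0, 1
Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
```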

Page 3:


Large State Spaces

• Computer Go: $3^{361}$ states

• Inverted pendulum: $(x, \dot{x}, \theta, \dot{\theta})$, a 4-dimensional continuous state space

• Atari: 210 x 160 x 3 dimensions (pixel values)


Page 4:


Functions to be Approximated

• Policy: $\pi : s \to a$

• Q-function: $Q(s, a) \in \mathbb{R}$

• Value function: $V(s) \in \mathbb{R}$


Page 5:


Q-function Approximation

• Let $s = (x_1, x_2, \ldots, x_n)^T$

• Linear: $Q(s, a) \approx \sum_i w_{ai}\, x_i$

• Non-linear (e.g., neural network): $Q(s, a) \approx g(x; w)$

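A minimal sketch of the linear approximation; the feature/action counts and the semi-gradient weight update used to fit it are illustrative assumptions (the slide itself only defines the approximation, not how to train it).

```python
import numpy as np

n_features, n_actions = 4, 2            # illustrative sizes
W = np.zeros((n_actions, n_features))   # one weight vector per action

def q_linear(x, a):
    """Linear approximation: Q(s, a) ~ sum_i w_{a i} x_i."""
    return W[a] @ x

def q_update(x, a, r, x_next, alpha=0.01, gamma=0.99):
    """Semi-gradient Q-learning step: since dQ/dW[a] = x,
    the weights move along x scaled by the TD error."""
    target = r + gamma * max(q_linear(x_next, b) for b in range(n_actions))
    W[a] += alpha * (target - q_linear(x, a)) * x
```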

Page 6:

Traditional Neural Network

• Network of units (computational neurons) linked by weighted edges

• Each unit computes: $z = h(w^T x + b)$
  – Inputs: $x$
  – Output: $z$
  – Weights (parameters): $w$
  – Bias: $b$
  – Activation function (usually non-linear): $h$


Page 7:

One Hidden Layer Architecture

• Feed-forward neural network

• Hidden units: $z_j = h^{(1)}\big( w_j^{(1)T} x + b_j^{(1)} \big)$

• Output units: $y_k = h^{(2)}\big( w_k^{(2)T} z + b_k^{(2)} \big)$

• Overall: $y_k = h^{(2)}\Big( \sum_j w_{kj}^{(2)}\, h^{(1)}\big( \sum_i w_{ji}^{(1)} x_i + b_j^{(1)} \big) + b_k^{(2)} \Big)$


[Figure: network with inputs $x_1, x_2$, hidden units $z_1, z_2$, and output $y_1$; first-layer weights $w_{11}^{(1)}, w_{12}^{(1)}, w_{21}^{(1)}, w_{22}^{(1)}$ with biases $b_1^{(1)}, b_2^{(1)}$; second-layer weights $w_{11}^{(2)}, w_{12}^{(2)}$ with bias $b_1^{(2)}$; layers labeled input, hidden, output.]
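A minimal NumPy sketch of this forward pass, sized to match the figure (two inputs, two hidden units, one output); tanh hidden units and an identity output are assumed choices, not mandated by the slide.

```python
import numpy as np

def forward(x, W1, b1, W2, b2):
    """y = h2(W2 @ h1(W1 @ x + b1) + b2) with h1 = tanh, h2 = identity."""
    z = np.tanh(W1 @ x + b1)   # hidden units z_j
    return W2 @ z + b2         # output units y_k

rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((2, 2)), np.zeros(2)   # w_ji^(1), b_j^(1)
W2, b2 = rng.standard_normal((1, 2)), np.zeros(1)   # w_kj^(2), b_k^(2)
print(forward(np.array([0.5, -1.0]), W1, b1, W2, b2))
```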

Page 8:

Traditional activation functions $h$

• Threshold: $h(a) = \begin{cases} 1 & a \ge 0 \\ -1 & a < 0 \end{cases}$

• Sigmoid: $h(a) = \sigma(a) = \frac{1}{1 + e^{-a}}$

• Gaussian: $h(a) = e^{-\frac{1}{2} \left( \frac{a - \mu}{\sigma} \right)^2}$

• Tanh: $h(a) = \tanh(a) = \frac{e^a - e^{-a}}{e^a + e^{-a}}$

• Identity: $h(a) = a$

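The same five activations as NumPy functions; the Gaussian's mean and width are illustrative defaults.

```python
import numpy as np

def threshold(a):
    return np.where(a >= 0, 1.0, -1.0)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gaussian(a, mu=0.0, s=1.0):        # mu, s: illustrative defaults
    return np.exp(-0.5 * ((a - mu) / s) ** 2)

def tanh(a):
    return np.tanh(a)

def identity(a):
    return a
```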

Page 9:

Universal function approximation

• Theorem: Neural networks with at least one hidden layer of sufficiently many sigmoid/tanh/Gaussian units can approximate any function arbitrarily closely.



Page 10:

Minimize least squared error

• Minimize error function
  $E(W) = \frac{1}{2} \sum_n E_n(W)^2 = \frac{1}{2} \sum_n \big( f(x_n; W) - y_n \big)^2$
  where $f$ is the function encoded by the neural net

• Train by gradient descent (a.k.a. backpropagation)
  – For each example $(x_n, y_n)$, adjust the weights as follows:
    $w_{ij} \leftarrow w_{ij} - \eta \frac{\partial E_n}{\partial w_{ij}}$

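A minimal sketch of one such gradient-descent step for the one-hidden-layer network from earlier, assuming tanh hidden units, identity outputs, and squared error; the gradient expressions come from applying the chain rule to those assumed choices.

```python
import numpy as np

def backprop_step(x, y, W1, b1, W2, b2, eta=0.1):
    """One step of w <- w - eta * dE_n/dw on E_n = 0.5 * ||f(x; W) - y||^2."""
    # Forward pass
    a1 = W1 @ x + b1
    z = np.tanh(a1)
    f = W2 @ z + b2
    # Backward pass (chain rule)
    delta2 = f - y                          # dE_n/df for squared error
    delta1 = (W2.T @ delta2) * (1 - z**2)   # tanh'(a1) = 1 - tanh(a1)^2
    # Gradient-descent updates
    W2 -= eta * np.outer(delta2, z)
    b2 -= eta * delta2
    W1 -= eta * np.outer(delta1, x)
    b1 -= eta * delta1
```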

Page 11:

Deep Neural Networks

• Definition: neural network with many hidden layers

• Advantage: high expressivity

• Challenges:
  – How should we train a deep neural network?
  – How can we avoid overfitting?


Page 12:

Mixture of Gaussians

• Deep neural network (hierarchical mixture)

• Shallow neural network (flat mixture)


Page 13:


Image Classification

• ImageNet Large Scale Visual Recognition Challenge

[Bar chart, reconstructed as a table: ILSVRC classification error (%) by system]

    System (year)           Classification error (%)
    NEC (2010)              28.2
    XRCE (2011)             25.8
    AlexNet (2012)          16.4
    ZF (2013)               11.7
    VGG (2014)               7.3
    GoogLeNet (2014)         6.7
    ResNet (2015)            3.57
    GoogLeNet-v4 (2016)      3.07
    Human                    5.1

NEC and XRCE used features + SVMs; the systems from AlexNet onward are deep convolutional neural nets, with the chart annotating network depths 5, 8, 19, 22, and 152.


Page 14:


Vanishing Gradients

• Deep neural networks of sigmoid and hyperbolic units often suffer from vanishing gradients

[Figure: deep network annotated with a large gradient near the output, a medium gradient in the middle layers, and a small gradient near the input.]


Page 15:


Sigmoid and hyperbolic units

• Derivative is always at most 1 (at most 1/4 for the sigmoid, at most 1 for tanh)

[Plots: sigmoid and hyperbolic tangent functions.]


Page 16:


Simple Example

• $y = \sigma\big( w_4\, \sigma( w_3\, \sigma( w_2\, \sigma( w_1 x ) ) ) \big)$

• Common weight initialization in $(-1, 1)$

• Sigmoid function and its derivative are always less than 1

• This leads to vanishing gradients (with $a_i$ the pre-activation of layer $i$):

  $\frac{\partial y}{\partial w_4} = \sigma'(a_4)\, \sigma(a_3)$

  $\frac{\partial y}{\partial w_3} = \sigma'(a_4)\, w_4\, \sigma'(a_3)\, \sigma(a_2) \le \frac{\partial y}{\partial w_4}$

  $\frac{\partial y}{\partial w_2} = \sigma'(a_4)\, w_4\, \sigma'(a_3)\, w_3\, \sigma'(a_2)\, \sigma(a_1) \le \frac{\partial y}{\partial w_3}$

  $\frac{\partial y}{\partial w_1} = \sigma'(a_4)\, w_4\, \sigma'(a_3)\, w_3\, \sigma'(a_2)\, w_2\, \sigma'(a_1)\, x \le \frac{\partial y}{\partial w_2}$

[Figure: chain network $x \to h_1 \to h_2 \to h_3 \to y$ with weights $w_1, w_2, w_3, w_4$.]

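A small numeric check of the pattern above: with weights drawn from $(-1, 1)$ as on the slide (the input value and seed are illustrative), the gradient magnitudes shrink from $w_4$ back to $w_1$.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def dsigmoid(a):
    s = sigmoid(a)
    return s * (1.0 - s)

rng = np.random.default_rng(1)
w = rng.uniform(-1, 1, size=4)   # w[0] = w1, ..., w[3] = w4
x = 1.0

# Forward pass y = sig(w4 sig(w3 sig(w2 sig(w1 x)))), recording a1..a4
a = np.zeros(4)
h = x
for i in range(4):
    a[i] = w[i] * h
    h = sigmoid(a[i])

# dy/dw_i = sig'(a_i) * (layer i input) * prod_{j > i} w_j sig'(a_j)
grads = []
for i in range(4):
    g = dsigmoid(a[i]) * (x if i == 0 else sigmoid(a[i - 1]))
    for j in range(i + 1, 4):
        g *= w[j] * dsigmoid(a[j])
    grads.append(g)

print(grads)   # magnitudes shrink toward the earlier weights
```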

Page 17:


Mitigating Vanishing Gradients

• Some popular solutions:
  – Pre-training
  – Rectified linear units
  – Batch normalization
  – Skip connections


Page 18:


Rectified Linear Units

• Rectified linear: $h(a) = \max(0, a)$
  – Gradient is 0 or 1
  – Sparse computation

• Soft version ("softplus"): $h(a) = \log(1 + e^a)$

• Warning: softplus does not prevent vanishing gradients (gradient $< 1$)

[Plot: rectified linear and softplus functions.]
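A short NumPy comparison of the two gradients at a few arbitrary evaluation points; it shows why ReLU avoids shrinkage on active units while softplus does not.

```python
import numpy as np

a = np.array([-2.0, -0.5, 0.5, 2.0])   # arbitrary evaluation points

relu = np.maximum(0.0, a)
softplus = np.log1p(np.exp(a))

relu_grad = (a > 0).astype(float)          # exactly 0 or 1
softplus_grad = 1.0 / (1.0 + np.exp(-a))   # = sigmoid(a), always < 1

print(relu_grad)       # [0. 0. 1. 1.]
print(softplus_grad)   # strictly below 1, so stacked layers still shrink gradients
```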