
Page 1:

CS885 Reinforcement Learning

Lecture 4a: May 11, 2018

Deep Neural Networks

[GBC] Chap. 6, 7, 8

CS885 Spring 2018, Pascal Poupart, University of Waterloo

Page 2:


Quick recap

• Markov Decision Processes: value iteration
  $V(s) \leftarrow \max_a R(s,a) + \gamma \sum_{s'} \Pr(s' \mid s, a)\, V(s')$

• Reinforcement Learning: Q-learning
  $Q(s,a) \leftarrow Q(s,a) + \alpha \big[ r + \gamma \max_{a'} Q(s',a') - Q(s,a) \big]$

• Complexity depends on number of states and actions

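Below is a minimal NumPy sketch of both updates on a toy two-state, two-action MDP; the transition probabilities, rewards, discount, and step size are illustrative assumptions, not values from the lecture.

```python
import numpy as np

# Toy MDP (illustrative numbers): 2 states, 2 actions
gamma = 0.9
P = np.array([[[0.8, 0.2], [0.1, 0.9]],   # P[s, a, s'] = Pr(s' | s, a)
              [[0.5, 0.5], [0.3, 0.7]]])
R = np.array([[1.0, 0.0], [0.0, 2.0]])    # R[s, a]

# Value iteration: V(s) <- max_a [ R(s,a) + gamma * sum_s' Pr(s'|s,a) V(s') ]
V = np.zeros(2)
for _ in range(100):
    V = np.max(R + gamma * (P @ V), axis=1)

# One tabular Q-learning update after observing (s, a, r, s')
Q = np.zeros((2, 2))
alpha = 0.1
s, a, r, s_next = 0, 1, 0.0, 1
Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
```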

Page 3:


Large State Spaces

• Computer Go: $3^{361}$ states

• Inverted pendulum: $(x, \dot{x}, \theta, \dot{\theta})$, a 4-dimensional continuous state space

• Atari: 210 x 160 x 3 dimensions (pixel values)


Page 4:


Functions to be Approximated

• Policy: $\pi : s \to a$

• Q-function: $Q(s, a) \in \mathbb{R}$

• Value function: $V(s) \in \mathbb{R}$


Page 5:


Q-function Approximation

• Let $s = (x_1, x_2, \ldots, x_n)^T$

• Linear: $Q(s, a) \approx \sum_i w_{ai}\, x_i$

• Non-linear (e.g., neural network): $Q(s, a) \approx g(x; w)$

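A minimal sketch of the linear approximation; the feature/action counts and the semi-gradient weight update used to fit it are illustrative assumptions (the slide itself only defines the approximation, not how to train it).

```python
import numpy as np

n_features, n_actions = 4, 2            # illustrative sizes
W = np.zeros((n_actions, n_features))   # one weight vector per action

def q_linear(x, a):
    """Linear approximation: Q(s, a) ~ sum_i w_{a i} x_i."""
    return W[a] @ x

def q_update(x, a, r, x_next, alpha=0.01, gamma=0.99):
    """Semi-gradient Q-learning step: since dQ/dW[a] = x,
    the weights move along x scaled by the TD error."""
    target = r + gamma * max(q_linear(x_next, b) for b in range(n_actions))
    W[a] += alpha * (target - q_linear(x, a)) * x
```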

Page 6:

Traditional Neural Network

• Network of units (computational neurons) linked by weighted edges

• Each unit computes: $z = h(w^T x + b)$
  – Inputs: $x$
  – Output: $z$
  – Weights (parameters): $w$
  – Bias: $b$
  – Activation function (usually non-linear): $h$


Page 7:

One Hidden Layer Architecture

• Feed-forward neural network

• Hidden units: $z_j = h^{(1)}\big( w_j^{(1)T} x + b_j^{(1)} \big)$

• Output units: $y_k = h^{(2)}\big( w_k^{(2)T} z + b_k^{(2)} \big)$

• Overall: $y_k = h^{(2)}\Big( \sum_j w_{kj}^{(2)}\, h^{(1)}\big( \sum_i w_{ji}^{(1)} x_i + b_j^{(1)} \big) + b_k^{(2)} \Big)$


[Figure: network with inputs $x_1, x_2$, hidden units $z_1, z_2$, and output $y_1$; first-layer weights $w_{11}^{(1)}, w_{12}^{(1)}, w_{21}^{(1)}, w_{22}^{(1)}$ with biases $b_1^{(1)}, b_2^{(1)}$; second-layer weights $w_{11}^{(2)}, w_{12}^{(2)}$ with bias $b_1^{(2)}$; layers labeled input, hidden, output.]
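A minimal NumPy sketch of this forward pass, sized to match the figure (two inputs, two hidden units, one output); tanh hidden units and an identity output are assumed choices, not mandated by the slide.

```python
import numpy as np

def forward(x, W1, b1, W2, b2):
    """y = h2(W2 @ h1(W1 @ x + b1) + b2) with h1 = tanh, h2 = identity."""
    z = np.tanh(W1 @ x + b1)   # hidden units z_j
    return W2 @ z + b2         # output units y_k

rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((2, 2)), np.zeros(2)   # w_ji^(1), b_j^(1)
W2, b2 = rng.standard_normal((1, 2)), np.zeros(1)   # w_kj^(2), b_k^(2)
print(forward(np.array([0.5, -1.0]), W1, b1, W2, b2))
```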

Page 8:

Traditional activation functions $h$

• Threshold: $h(a) = \begin{cases} 1 & a \ge 0 \\ -1 & a < 0 \end{cases}$

• Sigmoid: $h(a) = \sigma(a) = \frac{1}{1 + e^{-a}}$

• Gaussian: $h(a) = e^{-\frac{1}{2} \left( \frac{a - \mu}{\sigma} \right)^2}$

• Tanh: $h(a) = \tanh(a) = \frac{e^a - e^{-a}}{e^a + e^{-a}}$

• Identity: $h(a) = a$

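The same five activations as NumPy functions; the Gaussian's mean and width are illustrative defaults.

```python
import numpy as np

def threshold(a):
    return np.where(a >= 0, 1.0, -1.0)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gaussian(a, mu=0.0, s=1.0):        # mu, s: illustrative defaults
    return np.exp(-0.5 * ((a - mu) / s) ** 2)

def tanh(a):
    return np.tanh(a)

def identity(a):
    return a
```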

Page 9:

Universal function approximation

• Theorem: Neural networks with at least one hidden layer of sufficiently many sigmoid/tanh/Gaussian units can approximate any function arbitrarily closely.



Page 10:

Minimize least squared error

• Minimize error function
  $E(W) = \frac{1}{2} \sum_n E_n(W)^2 = \frac{1}{2} \sum_n \big( f(x_n; W) - y_n \big)^2$
  where $f$ is the function encoded by the neural net

• Train by gradient descent (a.k.a. backpropagation)
  – For each example $(x_n, y_n)$, adjust the weights as follows:
    $w_{ij} \leftarrow w_{ij} - \eta \frac{\partial E_n}{\partial w_{ij}}$

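A minimal sketch of one such gradient-descent step for the one-hidden-layer network from earlier, assuming tanh hidden units, identity outputs, and squared error; the gradient expressions come from applying the chain rule to those assumed choices.

```python
import numpy as np

def backprop_step(x, y, W1, b1, W2, b2, eta=0.1):
    """One step of w <- w - eta * dE_n/dw on E_n = 0.5 * ||f(x; W) - y||^2."""
    # Forward pass
    a1 = W1 @ x + b1
    z = np.tanh(a1)
    f = W2 @ z + b2
    # Backward pass (chain rule)
    delta2 = f - y                          # dE_n/df for squared error
    delta1 = (W2.T @ delta2) * (1 - z**2)   # tanh'(a1) = 1 - tanh(a1)^2
    # Gradient-descent updates
    W2 -= eta * np.outer(delta2, z)
    b2 -= eta * delta2
    W1 -= eta * np.outer(delta1, x)
    b1 -= eta * delta1
```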

Page 11:

Deep Neural Networks

• Definition: neural network with many hidden layers

• Advantage: high expressivity

• Challenges:
  – How should we train a deep neural network?
  – How can we avoid overfitting?


Page 12:

Mixture of Gaussians

• Deep neural network (hierarchical mixture)

• Shallow neural network (flat mixture)


Page 13:


Image Classification

• ImageNet Large Scale Visual Recognition Challenge

[Bar chart, reconstructed as a table: ILSVRC classification error (%) by system]

    System (year)           Classification error (%)
    NEC (2010)              28.2
    XRCE (2011)             25.8
    AlexNet (2012)          16.4
    ZF (2013)               11.7
    VGG (2014)               7.3
    GoogLeNet (2014)         6.7
    ResNet (2015)            3.57
    GoogLeNet-v4 (2016)      3.07
    Human                    5.1

NEC and XRCE used features + SVMs; the systems from AlexNet onward are deep convolutional neural nets, with the chart annotating network depths 5, 8, 19, 22, and 152.


Page 14:


Vanishing Gradients

• Deep neural networks of sigmoid and hyperbolic units often suffer from vanishing gradients

[Figure: deep network annotated with a large gradient near the output, a medium gradient in the middle layers, and a small gradient near the input.]


Page 15:


Sigmoid and hyperbolic units

• Derivative is always at most 1 (at most 1/4 for the sigmoid, at most 1 for tanh)

[Plots: sigmoid and hyperbolic tangent functions.]


Page 16:


Simple Example

• $y = \sigma\big( w_4\, \sigma( w_3\, \sigma( w_2\, \sigma( w_1 x ) ) ) \big)$

• Common weight initialization in $(-1, 1)$

• Sigmoid function and its derivative are always less than 1

• This leads to vanishing gradients (with $a_i$ the pre-activation of layer $i$):

  $\frac{\partial y}{\partial w_4} = \sigma'(a_4)\, \sigma(a_3)$

  $\frac{\partial y}{\partial w_3} = \sigma'(a_4)\, w_4\, \sigma'(a_3)\, \sigma(a_2) \le \frac{\partial y}{\partial w_4}$

  $\frac{\partial y}{\partial w_2} = \sigma'(a_4)\, w_4\, \sigma'(a_3)\, w_3\, \sigma'(a_2)\, \sigma(a_1) \le \frac{\partial y}{\partial w_3}$

  $\frac{\partial y}{\partial w_1} = \sigma'(a_4)\, w_4\, \sigma'(a_3)\, w_3\, \sigma'(a_2)\, w_2\, \sigma'(a_1)\, x \le \frac{\partial y}{\partial w_2}$

[Figure: chain network $x \to h_1 \to h_2 \to h_3 \to y$ with weights $w_1, w_2, w_3, w_4$.]

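A small numeric check of the pattern above: with weights drawn from $(-1, 1)$ as on the slide (the input value and seed are illustrative), the gradient magnitudes shrink from $w_4$ back to $w_1$.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def dsigmoid(a):
    s = sigmoid(a)
    return s * (1.0 - s)

rng = np.random.default_rng(1)
w = rng.uniform(-1, 1, size=4)   # w[0] = w1, ..., w[3] = w4
x = 1.0

# Forward pass y = sig(w4 sig(w3 sig(w2 sig(w1 x)))), recording a1..a4
a = np.zeros(4)
h = x
for i in range(4):
    a[i] = w[i] * h
    h = sigmoid(a[i])

# dy/dw_i = sig'(a_i) * (layer i input) * prod_{j > i} w_j sig'(a_j)
grads = []
for i in range(4):
    g = dsigmoid(a[i]) * (x if i == 0 else sigmoid(a[i - 1]))
    for j in range(i + 1, 4):
        g *= w[j] * dsigmoid(a[j])
    grads.append(g)

print(grads)   # magnitudes shrink toward the earlier weights
```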

Page 17:


Mitigating Vanishing Gradients

• Some popular solutions:
  – Pre-training
  – Rectified linear units
  – Batch normalization
  – Skip connections


Page 18:


Rectified Linear Units

• Rectified linear: $h(a) = \max(0, a)$
  – Gradient is 0 or 1
  – Sparse computation

• Soft version ("softplus"): $h(a) = \log(1 + e^a)$

• Warning: softplus does not prevent vanishing gradients (gradient $< 1$)

[Plot: rectified linear and softplus functions.]
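A short NumPy comparison of the two gradients at a few arbitrary evaluation points; it shows why ReLU avoids shrinkage on active units while softplus does not.

```python
import numpy as np

a = np.array([-2.0, -0.5, 0.5, 2.0])   # arbitrary evaluation points

relu = np.maximum(0.0, a)
softplus = np.log1p(np.exp(a))

relu_grad = (a > 0).astype(float)          # exactly 0 or 1
softplus_grad = 1.0 / (1.0 + np.exp(-a))   # = sigmoid(a), always < 1

print(relu_grad)       # [0. 0. 1. 1.]
print(softplus_grad)   # strictly below 1, so stacked layers still shrink gradients
```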