TRANSCRIPT
CS885 Reinforcement Learning
Lecture 4a: May 11, 2018
Deep Neural Networks
[GBC] Chap. 6, 7, 8
Quick recap
• Markov Decision Processes: value iteration
  $V(s) \leftarrow \max_a R(s,a) + \gamma \sum_{s'} \Pr(s'|s,a)\, V(s')$
• Reinforcement Learning: Q-learning
  $Q(s,a) \leftarrow Q(s,a) + \alpha \big[ r + \gamma \max_{a'} Q(s',a') - Q(s,a) \big]$
• Complexity depends on number of states and actions
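A minimal sketch of the tabular Q-learning update above in Python/NumPy (the state/action counts and the example call are illustrative assumptions):

    import numpy as np

    n_states, n_actions = 100, 4
    Q = np.zeros((n_states, n_actions))   # one entry per (s, a) pair

    def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
        # Q(s,a) <- Q(s,a) + alpha [ r + gamma max_a' Q(s',a') - Q(s,a) ]
        td_target = r + gamma * np.max(Q[s_next])
        Q[s, a] += alpha * (td_target - Q[s, a])

    q_learning_update(Q, s=3, a=1, r=1.0, s_next=7)

The table itself makes the complexity point: memory and updates scale with the number of states times the number of actions.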
Large State Spaces
• Computer Go: $3^{361}$ states
• Inverted pendulum: $(x, \dot{x}, \theta, \dot{\theta})$
  – 4-dimensional continuous state space
• Atari: 210x160x3 dimensions (pixel values)
Functions to be Approximated
• Policy: $\pi: s \rightarrow a$
• Q-function: $Q(s,a) \in \Re$
• Value function: $V(s) \in \Re$
Q-function Approximation
• Let $s = (x_1, x_2, \ldots, x_n)^T$
• Linear: $Q(s,a) \approx \sum_i w_{ai} x_i$
• Non-linear (e.g., neural network): $Q(s,a) \approx g(x; w)$
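A sketch of both approximators in Python/NumPy (the feature size, hidden width, and tanh network are illustrative assumptions, not the lecture's exact parameterization):

    import numpy as np

    n_features, n_actions, n_hidden = 8, 4, 16

    # Linear: one weight vector per action, Q(s,a) = sum_i w_ai x_i
    W_lin = np.zeros((n_actions, n_features))
    def q_linear(x, a):
        return W_lin[a] @ x

    # Non-linear: a small network g(x; w) producing one Q-value per action
    W1 = 0.1 * np.random.randn(n_hidden, n_features); b1 = np.zeros(n_hidden)
    W2 = 0.1 * np.random.randn(n_actions, n_hidden);  b2 = np.zeros(n_actions)
    def q_network(x):
        h = np.tanh(W1 @ x + b1)
        return W2 @ h + b2

    x = np.random.randn(n_features)       # feature vector for some state s
    print(q_linear(x, a=0), q_network(x))

Either way, the parameter count no longer grows with the number of states, only with the dimensionality of the features.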
Traditional Neural Network
• Network of units (computational neurons) linked by weighted edges
• Each unit computes: $z = h(w^T x + b)$
  – Inputs: $x$
  – Output: $z$
  – Weights (parameters): $w$
  – Bias: $b$
  – Activation function (usually non-linear): $h$
One Hidden Layer Architecture
• Feed-forward neural network
• Hidden units: $z_j = h^{(1)}\big(w_j^{(1)T} x + b_j^{(1)}\big)$
• Output units: $y_k = h^{(2)}\big(w_k^{(2)T} z + b_k^{(2)}\big)$
• Overall: $y_k = h^{(2)}\Big(\sum_j w_{kj}^{(2)}\, h^{(1)}\big(\sum_i w_{ji}^{(1)} x_i + b_j^{(1)}\big) + b_k^{(2)}\Big)$
[Figure: feed-forward network with inputs $x_1, x_2$, hidden units $z_1, z_2$, and output $y_1$; edges carry weights $w_{ji}^{(1)}$ and $w_{kj}^{(2)}$, constant inputs of 1 carry the biases $b_1^{(1)}, b_2^{(1)}, b_1^{(2)}$; layers labeled input, hidden, output]
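A minimal NumPy forward pass for this architecture (the tanh hidden activation and identity output are assumptions for the sketch):

    import numpy as np

    def forward(x, W1, b1, W2, b2):
        z = np.tanh(W1 @ x + b1)   # hidden units: z_j = h1(w_j . x + b_j)
        y = W2 @ z + b2            # output units with identity h2
        return y

    # Shapes matching the figure: 2 inputs, 2 hidden units, 1 output
    W1 = np.random.randn(2, 2); b1 = np.zeros(2)
    W2 = np.random.randn(1, 2); b2 = np.zeros(1)
    print(forward(np.array([0.5, -1.0]), W1, b1, W2, b2))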
Traditional activation functions $h$
• Threshold: $h(a) = \begin{cases} 1 & a \ge 0 \\ -1 & a < 0 \end{cases}$
• Sigmoid: $h(a) = \sigma(a) = \frac{1}{1+e^{-a}}$
• Gaussian: $h(a) = e^{-\frac{a^2}{2\sigma^2}}$
• Tanh: $h(a) = \tanh(a) = \frac{e^a - e^{-a}}{e^a + e^{-a}}$
• Identity: $h(a) = a$
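Each of these is a one-liner in NumPy; a sketch (the Gaussian width $\sigma$ is an assumed parameter):

    import numpy as np

    threshold = lambda a: np.where(a >= 0, 1.0, -1.0)
    sigmoid   = lambda a: 1.0 / (1.0 + np.exp(-a))
    gaussian  = lambda a, s=1.0: np.exp(-a**2 / (2.0 * s**2))
    tanh      = np.tanh
    identity  = lambda a: a

    print(sigmoid(np.array([-1.0, 0.0, 1.0])))   # [0.269 0.5 0.731]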
Universal function approximation
• Theorem: Neural networks with at least one hidden layer of sufficiently many sigmoid/tanh/Gaussian units can approximate any function arbitrarily closely.
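The intuition behind the theorem can be demonstrated in a few lines: two shifted, steep sigmoids form a localized "bump", and a weighted sum of such bumps can trace any continuous target curve. A sketch (the bump edges and steepness are arbitrary choices):

    import numpy as np

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    def bump(x, left=-1.0, right=1.0, steepness=20.0):
        # Difference of two shifted sigmoids: ~1 on [left, right], ~0 outside
        return sigmoid(steepness * (x - left)) - sigmoid(steepness * (x - right))

    x = np.linspace(-3.0, 3.0, 7)
    print(np.round(bump(x), 3))   # near 1 inside [-1, 1], near 0 outside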
Minimize least squared error
• Minimize error function
  $E(W) = \frac{1}{2} \sum_n E_n(W)^2 = \frac{1}{2} \sum_n \big( f(x_n; W) - y_n \big)^2$
  where $f$ is the function encoded by the neural net
• Train by gradient descent (a.k.a. backpropagation)
  – For each example $(x_n, y_n)$, adjust the weights as follows:
    $w_{ji} \leftarrow w_{ji} - \eta \frac{\partial E_n}{\partial w_{ji}}$
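A sketch of one such per-example gradient step for the one-hidden-layer network above (the tanh hidden layer, identity output, and learning rate are illustrative assumptions):

    import numpy as np

    def sgd_step(x, y, W1, b1, W2, b2, lr=0.01):
        # Forward pass
        z = np.tanh(W1 @ x + b1)             # hidden activations
        f = W2 @ z + b2                      # network output f(x; W)
        # Backward pass for E_n = 1/2 (f - y)^2
        d_f  = f - y                         # dE_n/df
        d_W2 = np.outer(d_f, z)
        d_b2 = d_f
        d_z  = (W2.T @ d_f) * (1.0 - z**2)   # chain rule through tanh
        d_W1 = np.outer(d_z, x)
        d_b1 = d_z
        # Gradient descent: w <- w - eta dE_n/dw (updates the arrays in place)
        W1 -= lr * d_W1; b1 -= lr * d_b1
        W2 -= lr * d_W2; b2 -= lr * d_b2

    W1 = 0.1 * np.random.randn(2, 2); b1 = np.zeros(2)
    W2 = 0.1 * np.random.randn(1, 2); b2 = np.zeros(1)
    sgd_step(np.array([0.5, -1.0]), np.array([1.0]), W1, b1, W2, b2)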
Deep Neural Networks
• Definition: neural network with many hidden layers
• Advantage: high expressivity
• Challenges:
  – How should we train a deep neural network?
  – How can we avoid overfitting?
Mixture of Gaussians
• Deep neural network (hierarchical mixture)
• Shallow neural network (flat mixture)
Image Classification
• ImageNet Large Scale Visual Recognition Challenge
• Results (classification error, %):

  System        Year   Error (%)
  NEC           2010   28.2
  XRCE          2011   25.8
  AlexNet       2012   16.4
  ZF            2013   11.7
  VGG           2014    7.3
  GoogLeNet     2014    6.7
  ResNet        2015    3.57
  GoogLeNet-v4  2016    3.07
  Human         --      5.1

• NEC and XRCE used hand-crafted features + SVMs; the systems from AlexNet onward are deep convolutional neural nets.
• Network depths labeled on the chart: 5, 8, 19, 22, and 152 layers (AlexNet: 8, VGG: 19, GoogLeNet: 22, ResNet: 152).
Vanishing Gradients
• Deep neural networks of sigmoid and hyperbolic units often suffer from vanishing gradients
[Figure: deep network with per-layer gradient magnitudes; large gradient near the output, medium gradient in intermediate layers, small gradient in the early layers]
Sigmoid and hyperbolic units
• Derivative is always less than 1 (at most 1/4 for the sigmoid, and at most 1, only at 0, for tanh)
[Figure: plots of the sigmoid and hyperbolic tangent functions and their derivatives]
Simple Example
• $y = \sigma\big(w_4\, \sigma\big(w_3\, \sigma\big(w_2\, \sigma(w_1 x)\big)\big)\big)$
• Common weight initialization in $(-1, 1)$
• Sigmoid function and its derivative are always less than 1
• This leads to vanishing gradients:
  $\frac{\partial y}{\partial w_4} = \sigma'(a_4)\, \sigma(a_3)$
  $\frac{\partial y}{\partial w_3} = \sigma'(a_4)\, w_4\, \sigma'(a_3)\, \sigma(a_2) \le \frac{\partial y}{\partial w_4}$
  $\frac{\partial y}{\partial w_2} = \sigma'(a_4)\, w_4\, \sigma'(a_3)\, w_3\, \sigma'(a_2)\, \sigma(a_1) \le \frac{\partial y}{\partial w_3}$
  $\frac{\partial y}{\partial w_1} = \sigma'(a_4)\, w_4\, \sigma'(a_3)\, w_3\, \sigma'(a_2)\, w_2\, \sigma'(a_1)\, x \le \frac{\partial y}{\partial w_2}$
  where $a_i$ denotes the input to the $i$-th sigmoid
[Figure: chain $x \rightarrow h_1 \rightarrow h_2 \rightarrow h_3 \rightarrow y$ with weights $w_1, w_2, w_3, w_4$]
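A quick numeric check of this effect (the chain length and random seed are arbitrary; weights drawn from $(-1, 1)$ as on the slide):

    import numpy as np

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    rng = np.random.default_rng(0)
    ws = rng.uniform(-1.0, 1.0, size=10)   # weights initialized in (-1, 1)

    h, factor = 1.0, 1.0   # input x = 1; running product of backward factors
    for w in ws:
        a = w * h
        # each layer multiplies the early-layer gradient by roughly sigma'(a) * w;
        # |sigma'(a)| <= 1/4 and |w| < 1, so the product shrinks geometrically
        factor *= sigmoid(a) * (1.0 - sigmoid(a)) * w
        h = sigmoid(a)

    print(factor)   # vanishingly small after 10 layers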
Mitigating Vanishing Gradients
• Some popular solutions:
  – Pre-training
  – Rectified linear units
  – Batch normalization
  – Skip connections
Rectified Linear Units
• Rectified linear: $h(a) = \max(0, a)$
  – Gradient is 0 or 1
  – Sparse computation
• Soft version ("Softplus"): $h(a) = \log(1 + e^a)$
• Warning: softplus does not prevent vanishing gradients (gradient < 1)
[Figure: plots of the rectified linear and softplus functions]
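A sketch contrasting the two gradients in NumPy (the sample points are arbitrary):

    import numpy as np

    relu          = lambda a: np.maximum(0.0, a)
    relu_grad     = lambda a: (a > 0).astype(float)      # exactly 0 or 1
    softplus      = lambda a: np.log1p(np.exp(a))
    softplus_grad = lambda a: 1.0 / (1.0 + np.exp(-a))   # the sigmoid, always < 1

    a = np.array([-2.0, 0.5, 3.0])
    print(relu_grad(a))       # [0. 1. 1.] -- where active, gradients pass through unchanged
    print(softplus_grad(a))   # strictly below 1, so deep products can still vanish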