reinforcement learning architectures for deep · 2016. 11. 23. · basic background - reinforcement...
Dueling Network Architectures for Deep Reinforcement Learning
Paper by: Ziyu Wang, Tom Schaul, Matteo Hessel, Hado van Hasselt, Marc Lanctot, Nando de Freitas.
Aqeel Labash
Basic Background
- Reinforcement Learning: Reinforcement Learning is a type of Machine Learning, and thereby also a branch of Artificial Intelligence. It allows machines and software agents to automatically determine the ideal behaviour within a specific context, in order to maximize its performance. Simple reward feedback is required for the agent to learn its behaviour; this is known as the reinforcement signal. Source
- Goal: to find a policy that maximizes reward. Source
Simple Terminologies
[Figure: agent and environment diagram; labels: Actions, Environment, States (source)]
Takeaway: the model consists of:
- S: a discrete set of states.
- A: a discrete set of actions.
- R: the reward (reinforcement signal). source
Q-learning Algorithm
- Action-state value function Q(S, A): a function that takes a state and an action and returns the value of that pair. source
- Q-learning: a family of algorithms for estimating the state-action values.
- Update rule:
source
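As a minimal sketch of the update rule, here is the standard textbook form of tabular Q-learning, Q(s, a) ← Q(s, a) + α [r + γ max over a' of Q(s', a') − Q(s, a)]. The notation may differ from the one on the original slide, and the function name, the learning rate `alpha`, the discount `gamma`, and the toy table size are assumptions chosen for illustration.

```python
import numpy as np

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99, done=False):
    """Apply one tabular Q-learning update in place and return the TD error."""
    # Bootstrap from the best next-state value unless the episode has ended.
    target = r if done else r + gamma * np.max(Q[s_next])
    td_error = target - Q[s, a]
    # Move the current estimate a fraction alpha toward the target.
    Q[s, a] += alpha * td_error
    return td_error

# Hypothetical toy problem: 3 states, 2 actions, all values start at zero.
Q = np.zeros((3, 2))
td = q_learning_update(Q, s=0, a=1, r=1.0, s_next=2)
print(Q[0, 1], td)
```

The table `Q` has one row per state and one column per action, which is exactly the representation that stops scaling once the number of states becomes large.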
Introduction
- New network architecture for reinforcement learning.
- Main feature: explicitly separates state values and action advantages.
- Benefit: the network can be combined with existing or future reinforcement learning algorithms.
Introduction
- Can learn which states are (or are not) valuable.
- Can more quickly identify the correct action during policy evaluation as redundant or similar actions are added to the learning problem.
- The saliency maps shown on the slide were generated by computing the Jacobians of the trained value and advantage streams with respect to the input video.
Background: Stochastic Policy
But...
- Did you notice something?
- Up to now we have still been using the old-fashioned tabular approach, which is impractical when the number of states is large.
Deep Q-Network (DQN)
Experience replay
- Instead of training only on the current transition, we record many experiences and use them for training.
Experience Replay
- Increases data efficiency through reuse of experience samples in multiple updates.
- It reduces variance as uniform sampling from the replay buffer reduces the correlation among the samples used in the update.
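The idea above can be sketched as a fixed-size buffer with uniform sampling. This is a minimal illustration, not the implementation used in the paper; the class name `ReplayBuffer`, its method names, and the toy transitions are hypothetical.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size store of transitions with uniform random sampling."""

    def __init__(self, capacity):
        # A deque with maxlen silently drops the oldest transition when full.
        self.storage = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.storage.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement breaks up the temporal
        # correlation between consecutive transitions.
        return random.sample(list(self.storage), batch_size)

    def __len__(self):
        return len(self.storage)

# Toy usage: capacity 3, so only the last three of five transitions survive.
buffer = ReplayBuffer(capacity=3)
for t in range(5):
    buffer.add(state=t, action=0, reward=1.0, next_state=t + 1, done=False)
batch = buffer.sample(2)
print(len(buffer), batch)
```

Each stored transition can be drawn in many different minibatches, which is where the data-efficiency gain comes from.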
Double Deep Q-Network - DDQN
- DQN can be overoptimistic because the max operator uses the same values both to select and to evaluate an action.
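To make the difference concrete, the sketch below computes the bootstrap target for a single transition both ways. It assumes the usual setup with an online network and a separate target network; the function names and the Q-value arrays are made up for illustration and do not come from the slides.

```python
import numpy as np

def dqn_target(reward, q_target_next, gamma=0.99):
    """DQN: the target network both selects and evaluates the next action."""
    return reward + gamma * np.max(q_target_next)

def double_dqn_target(reward, q_online_next, q_target_next, gamma=0.99):
    """Double DQN: the online network selects, the target network evaluates."""
    best_action = int(np.argmax(q_online_next))
    return reward + gamma * q_target_next[best_action]

# Hypothetical next-state Q-values from the two networks (3 actions).
q_online_next = np.array([1.0, 2.0, 0.5])
q_target_next = np.array([1.5, 1.0, 3.0])

y_dqn = dqn_target(0.0, q_target_next, gamma=1.0)
y_ddqn = double_dqn_target(0.0, q_online_next, q_target_next, gamma=1.0)
print(y_dqn, y_ddqn)
```

In this toy case the DQN target picks up the largest (possibly overestimated) target-network value, while Double DQN evaluates the action the online network prefers, which gives a smaller target.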
Prioritized Replay
- Increase the replay probability of experience tuples that have a high expected learning progress.
(modified) source
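One common way to realize this is the proportional variant of prioritized replay, which uses the absolute TD error as a proxy for expected learning progress. The slide does not state a formula, so treat the following as a sketch under that assumption; the exponent `alpha`, the constant `eps`, and the example TD errors are illustrative, and the importance-sampling correction that usually accompanies this scheme is left out.

```python
import numpy as np

def replay_probabilities(td_errors, alpha=0.6, eps=1e-6):
    """Sampling probabilities proportional to (|TD error| + eps) ** alpha."""
    # eps keeps transitions with zero error from never being replayed.
    priorities = (np.abs(td_errors) + eps) ** alpha
    return priorities / priorities.sum()

# Hypothetical TD errors for four stored transitions.
td_errors = np.array([0.1, 2.0, 0.5, 0.0])
probs = replay_probabilities(td_errors)

# Draw a minibatch of indices according to those probabilities.
rng = np.random.default_rng(0)
batch_indices = rng.choice(len(td_errors), size=2, p=probs)
print(probs, batch_indices)
```

Setting `alpha=0` recovers uniform sampling, so the exponent controls how strongly the prioritization is applied.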
The Dueling Network Architecture
Dueling Network Output
Value stream: fully connected layers output a scalar.
Advantage stream: fully connected layers output an |A|-dimensional vector.
Note: V and A cannot be recovered uniquely from Q alone, so the advantage function estimator is forced to be zero at the chosen action.
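The note can be illustrated numerically. With a plain sum Q = V + A, moving a constant from A into V leaves Q unchanged, so the two streams are not identifiable. Subtracting the maximum advantage, as in the paper's max-based aggregation, pins the advantage of the greedy action to zero. The numbers and the function name `combine_max` below are made up for illustration.

```python
import numpy as np

def combine_max(value, advantages):
    """Q(s, a) = V(s) + (A(s, a) - max over a' of A(s, a'))."""
    return value + (advantages - np.max(advantages))

value = 2.0
advantages = np.array([0.5, -1.0, 0.0])

# Naive sum: shifting a constant between V and A gives the same Q,
# so V and A cannot be told apart from Q alone.
q_naive = value + advantages
q_shifted = (value + 10.0) + (advantages - 10.0)

# Max-based aggregation: the greedy action has zero advantage,
# so the largest Q-value equals V.
q_max = combine_max(value, advantages)
print(q_naive, q_shifted, q_max)
```

After this normalization, the scalar stream can be read as an estimate of the state value and the vector stream as an estimate of the advantages.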
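Dueling Network Architecture
- As an alternative to (8), the following formula works better:
$$Q(s, a) = V(s) + \Big(A(s, a) - \frac{1}{|\mathcal{A}|}\sum_{a'} A(s, a')\Big)$$
- Pro: the advantages only need to change as fast as their mean, which makes optimization more stable.
- Con: V and A lose their original semantics because they are now off-target by a constant.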
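A batched sketch of the mean-subtracted aggregation follows. It only covers the final combining step, not the convolutional layers or the two fully connected streams, and the array shapes, values, and the name `combine_mean` are assumptions for illustration.

```python
import numpy as np

def combine_mean(values, advantages):
    """Combine the two streams by subtracting the mean advantage.

    values:     shape (batch, 1), output of the scalar stream
    advantages: shape (batch, n_actions), output of the vector stream
    """
    mean_adv = advantages.mean(axis=1, keepdims=True)
    return values + (advantages - mean_adv)

# Hypothetical stream outputs for a batch of two states and three actions.
values = np.array([[2.0], [-1.0]])
advantages = np.array([[0.5, -1.0, 0.0],
                       [3.0, 1.0, 2.0]])
q_values = combine_mean(values, advantages)
print(q_values)
```

Subtracting the mean shifts every Q-value in a state by the same amount, so the ranking of actions (and therefore the greedy policy) is unchanged, while the average of the Q-values in each row equals the output of the scalar stream.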
Clipping
To isolate the contributions of the dueling architecture, we re-train DDQN with a single stream network. We apply gradient clipping (we clip the gradients to have their norm less than or equal to 10), and use 1024 hidden units for the first fully-connected layer of the network so that both architectures (dueling and single) have roughly the same number of parameters. We refer to this re-trained model as Single Clip, while the original trained model of van Hasselt et al. (2015) is referred to as Single.
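Clipping gradients to a maximum norm can be sketched as below. The text only states the threshold of 10; whether the norm is taken jointly over all parameters (as assumed here) or per tensor is not specified, and the function name and the gradient values are hypothetical.

```python
import numpy as np

def clip_by_global_norm(grads, max_norm=10.0):
    """Rescale a list of gradient arrays so their joint L2 norm is <= max_norm."""
    global_norm = np.sqrt(sum(float(np.sum(g ** 2)) for g in grads))
    # Gradients already inside the limit are left unchanged (scale = 1).
    scale = min(1.0, max_norm / (global_norm + 1e-12))
    return [g * scale for g in grads], global_norm

# Hypothetical gradients for two parameter tensors; the joint norm is 50.
grads = [np.array([30.0, 40.0]), np.array([[0.0]])]
clipped, norm_before = clip_by_global_norm(grads)
norm_after = np.sqrt(sum(float(np.sum(g ** 2)) for g in clipped))
print(norm_before, norm_after)
```

Rescaling all tensors by one common factor keeps the direction of the update and only shortens the step.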
Results: Policy Evaluation
- = 0.001
- MLP
[Figure: single-stream and dueling network diagrams; layer-size labels: 50, 50, 50, 50, 25, 25]
Results
Results - Atari
Performance Rule.
Results - Atari
With prioritized replay
Happy birthday Raul
Info
- Deep Q-Network (DQN)
Questions?
Open Questions
- Where do you think this model will fail or not converge?
- Are there any conditions on the environment for this network to have an advantage over other networks?
- What do you think about using this model on this environment?
Now it’s your turn ^_^
Am Done...