robot control with deep reinforcement learning · Ө 1 Ө 2 Ө 3 Ө n x y z Ө x Ө y Ө z forward...
TRANSCRIPT
![Page 1: Robot control with Deep Reinforcement Learning · Ө 1 Ө 2 Ө 3 Ө n X Y Z Ө X Ө Y Ө Z Forward Kinematic Forward and Inverse Kinematic... Inverse Kinematic Joint Space Cartesian](https://reader030.vdocument.in/reader030/viewer/2022040418/5dd0ea8ed6be591ccb6352c9/html5/thumbnails/1.jpg)
Robot controlwith Deep Reinforcement Learning
Hadi Beik-Mohammadi
INTELLIGENT ROBOTICS SEMINAR TALK, DECEMBER 2017
![Page 2: Robot control with Deep Reinforcement Learning · Ө 1 Ө 2 Ө 3 Ө n X Y Z Ө X Ө Y Ө Z Forward Kinematic Forward and Inverse Kinematic... Inverse Kinematic Joint Space Cartesian](https://reader030.vdocument.in/reader030/viewer/2022040418/5dd0ea8ed6be591ccb6352c9/html5/thumbnails/2.jpg)
• Inverse and Forward Kinematic
• How to Learn a behavior
• Methods
Inverse Recurrent Model
Deep Deterministic Policy Gradient
• Conclusion
INTELLIGENT ROBOTICS SEMINAR TALK, DECEMBER 2017 1
![Page 3: Robot control with Deep Reinforcement Learning · Ө 1 Ө 2 Ө 3 Ө n X Y Z Ө X Ө Y Ө Z Forward Kinematic Forward and Inverse Kinematic... Inverse Kinematic Joint Space Cartesian](https://reader030.vdocument.in/reader030/viewer/2022040418/5dd0ea8ed6be591ccb6352c9/html5/thumbnails/3.jpg)
End Effector
Joint 1
Joint 0
Joint 0Join
t 1
0 360
360
0 300X (CM)
200
Y (
CM
)
Joint SpaceCartesian Space
Target
End Effector
Target
TargetTarget
Target
INTELLIGENT ROBOTICS SEMINAR TALK, DECEMBER 2017
80
290
2
![Page 4: Robot control with Deep Reinforcement Learning · Ө 1 Ө 2 Ө 3 Ө n X Y Z Ө X Ө Y Ө Z Forward Kinematic Forward and Inverse Kinematic... Inverse Kinematic Joint Space Cartesian](https://reader030.vdocument.in/reader030/viewer/2022040418/5dd0ea8ed6be591ccb6352c9/html5/thumbnails/4.jpg)
Ө 1
Ө 2
Ө 3
Ө n
X
Y
Z
Ө X
Ө Y
Ө Z
Forward Kinematic
Forward and Inverse Kinematic
.
.
.Inverse Kinematic
Joint Space Cartesian Space
INTELLIGENT ROBOTICS SEMINAR TALK, DECEMBER 2017 3
![Page 5: Robot control with Deep Reinforcement Learning · Ө 1 Ө 2 Ө 3 Ө n X Y Z Ө X Ө Y Ө Z Forward Kinematic Forward and Inverse Kinematic... Inverse Kinematic Joint Space Cartesian](https://reader030.vdocument.in/reader030/viewer/2022040418/5dd0ea8ed6be591ccb6352c9/html5/thumbnails/5.jpg)
How to build agents that learn behaviors in a dynamic world?
The brain evolved, not to think or feel, but to control movement
Daniel Wolpert, nice TED talk
Learning a behavior:
Learning to map sequences of observations to actions, for a particular goal
INTELLIGENT ROBOTICS SEMINAR TALK, DECEMBER 2017 [3] 4
![Page 6: Robot control with Deep Reinforcement Learning · Ө 1 Ө 2 Ө 3 Ө n X Y Z Ө X Ө Y Ө Z Forward Kinematic Forward and Inverse Kinematic... Inverse Kinematic Joint Space Cartesian](https://reader030.vdocument.in/reader030/viewer/2022040418/5dd0ea8ed6be591ccb6352c9/html5/thumbnails/6.jpg)
What supervision does an agent need to learn purposeful
behaviors in dynamic environments?
• Rewards: sparse feedback from the environment whether the
desired behavior is achieved
• Demonstrations
• Specifications/Attributes of good behavior
INTELLIGENT ROBOTICS SEMINAR TALK, DECEMBER 2017 5
![Page 7: Robot control with Deep Reinforcement Learning · Ө 1 Ө 2 Ө 3 Ө n X Y Z Ө X Ө Y Ө Z Forward Kinematic Forward and Inverse Kinematic... Inverse Kinematic Joint Space Cartesian](https://reader030.vdocument.in/reader030/viewer/2022040418/5dd0ea8ed6be591ccb6352c9/html5/thumbnails/7.jpg)
Inverse Recurrent Model (IRM)[1]• Control Snake-Like Many Joint Robot Arms
• BPTT on recurrent forward models
• Recurrent Neural Networks LSTM
• Offline
Deep Deterministic Policy Gradient (DDPG)[2]• Deep Reinforcement Learning Method
• Actor Critic Network
• Continuous Action Domain
• Model Free
• Online
ME
TH
OD
S
INTELLIGENT ROBOTICS SEMINAR TALK, DECEMBER 2017 6
![Page 8: Robot control with Deep Reinforcement Learning · Ө 1 Ө 2 Ө 3 Ө n X Y Z Ө X Ө Y Ө Z Forward Kinematic Forward and Inverse Kinematic... Inverse Kinematic Joint Space Cartesian](https://reader030.vdocument.in/reader030/viewer/2022040418/5dd0ea8ed6be591ccb6352c9/html5/thumbnails/8.jpg)
Inverse Recurrent Model (IRM)
[1]
INTELLIGENT ROBOTICS SEMINAR TALK, DECEMBER 2017 7
![Page 9: Robot control with Deep Reinforcement Learning · Ө 1 Ө 2 Ө 3 Ө n X Y Z Ө X Ө Y Ө Z Forward Kinematic Forward and Inverse Kinematic... Inverse Kinematic Joint Space Cartesian](https://reader030.vdocument.in/reader030/viewer/2022040418/5dd0ea8ed6be591ccb6352c9/html5/thumbnails/9.jpg)
Deep Deterministic Policy Gradient (DDPG)
CRITIC NETWORK
ACTOR NETWORK
ENVIRONMENT
ACTION
ACTION
STATE
STATE
TD
State-Value Function
Policy Function
INTELLIGENT ROBOTICS SEMINAR TALK, DECEMBER 2017 8
![Page 10: Robot control with Deep Reinforcement Learning · Ө 1 Ө 2 Ө 3 Ө n X Y Z Ө X Ө Y Ө Z Forward Kinematic Forward and Inverse Kinematic... Inverse Kinematic Joint Space Cartesian](https://reader030.vdocument.in/reader030/viewer/2022040418/5dd0ea8ed6be591ccb6352c9/html5/thumbnails/10.jpg)
Deep Deterministic Policy Gradient (DDPG)
https://youtu.be/tJBIqkC1wWM
INTELLIGENT ROBOTICS SEMINAR TALK, DECEMBER 2017 9
![Page 11: Robot control with Deep Reinforcement Learning · Ө 1 Ө 2 Ө 3 Ө n X Y Z Ө X Ө Y Ө Z Forward Kinematic Forward and Inverse Kinematic... Inverse Kinematic Joint Space Cartesian](https://reader030.vdocument.in/reader030/viewer/2022040418/5dd0ea8ed6be591ccb6352c9/html5/thumbnails/11.jpg)
Rewarding
End Effector
Joint 1
Joint 0
Dist (t)
Reward 1 = Gaussian(Dist(t))
https://www.sfu.ca/sonic-studio/handbook/Graphics/Gaussian.gif
INTELLIGENT ROBOTICS SEMINAR TALK, DECEMBER 2017
Reward 2 = Dist(t-1) – Dist(t)
10
![Page 12: Robot control with Deep Reinforcement Learning · Ө 1 Ө 2 Ө 3 Ө n X Y Z Ө X Ө Y Ө Z Forward Kinematic Forward and Inverse Kinematic... Inverse Kinematic Joint Space Cartesian](https://reader030.vdocument.in/reader030/viewer/2022040418/5dd0ea8ed6be591ccb6352c9/html5/thumbnails/12.jpg)
INTELLIGENT ROBOTICS SEMINAR TALK, DECEMBER 2017 11
![Page 13: Robot control with Deep Reinforcement Learning · Ө 1 Ө 2 Ө 3 Ө n X Y Z Ө X Ө Y Ө Z Forward Kinematic Forward and Inverse Kinematic... Inverse Kinematic Joint Space Cartesian](https://reader030.vdocument.in/reader030/viewer/2022040418/5dd0ea8ed6be591ccb6352c9/html5/thumbnails/13.jpg)
2 DOF Manipulator Actor Critic maps during Learning
INTELLIGENT ROBOTICS SEMINAR TALK, DECEMBER 2017 12
![Page 14: Robot control with Deep Reinforcement Learning · Ө 1 Ө 2 Ө 3 Ө n X Y Z Ө X Ө Y Ө Z Forward Kinematic Forward and Inverse Kinematic... Inverse Kinematic Joint Space Cartesian](https://reader030.vdocument.in/reader030/viewer/2022040418/5dd0ea8ed6be591ccb6352c9/html5/thumbnails/14.jpg)
Pros:
• Operate over continuous action spaces
• Algorithm can learn policies end-to-end
• Model-Free
Cons:
• No Proof for learning
• No Guarantee for results
• Requires a large number of training episodes to find solutions
INTELLIGENT ROBOTICS SEMINAR TALK, DECEMBER 2017 13
![Page 15: Robot control with Deep Reinforcement Learning · Ө 1 Ө 2 Ө 3 Ө n X Y Z Ө X Ө Y Ө Z Forward Kinematic Forward and Inverse Kinematic... Inverse Kinematic Joint Space Cartesian](https://reader030.vdocument.in/reader030/viewer/2022040418/5dd0ea8ed6be591ccb6352c9/html5/thumbnails/15.jpg)
References
• [1] Sebastian Otte , Adrian Zwiener , and Martin V. Butz, Inherently Constraint-Aware Control of Many-Joint Robot Arms with Inverse Recurrent Models
• [2] Continuous control with deep reinforcement learning, Timothy P. Lillicrap, Jonathan J. Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, Daan Wierstra
• [3] Deep Reinforcement Learning and Control, Spring 2017, CMU 10703
INTELLIGENT ROBOTICS SEMINAR TALK, DECEMBER 2017INTELLIGENT ROBOTICS SEMINAR TALK, DECEMBER 2017 14