

Page 1: COMP 4180: Intelligent Mobile Robotics Reinforcement Learning

COMP 4180: Intelligent Mobile Robotics

Reinforcement Learning

Jacky Baltes

Department of Computer Science

University of Manitoba

Email: [email protected]

http://www4.cs.umanitoba.ca/~jacky/...Teaching/Courses/COMP_4180-IntelligentMobileRobotics/current/index.php

Page 2

Outline

● Reinforcement Learning Problem
  – Dynamic Programming
  – Control learning
  – Control policies that choose optimal actions
  – Q Learning
  – Convergence

● Monte-Carlo Methods

● Temporal Difference Learning

Page 3

Control Learning

Page 4

Example: TD-Gammon

Page 5

Reinforcement Learning Problem

Page 6

Markov Decision Processes

Page 7

Agent's Learning Task

Page 8

State Value Function

Page 9

Bellman Equation (Deterministic Case)
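
The equation on this slide did not survive extraction. As a hedged reconstruction, assuming the deterministic notation that appears later in the deck (a transition function δ: S × A → S) together with an assumed reward function r(s, a) and discount factor γ, the value of a fixed policy π is usually written as:

```latex
V^{\pi}(s) = r\bigl(s, \pi(s)\bigr) + \gamma \, V^{\pi}\bigl(\delta(s, \pi(s))\bigr), \qquad 0 \le \gamma < 1
```

Read left to right: the value of a state is the immediate reward for the action the policy picks, plus the discounted value of the single successor state that action leads to.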

Page 10

Example

Page 11

Example

Page 12

Iterative Policy Evaluation

Page 13

Iterative Policy Evaluation
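
The slide body is missing, so the following is only a minimal sketch of iterative policy evaluation for a deterministic MDP. The function name `evaluate_policy`, the dictionary-based model, and the 4-state corridor are assumptions made for illustration and are not taken from the course material.

```python
def evaluate_policy(policy, delta, reward, terminal, gamma=0.9, tol=1e-8):
    """Iterative policy evaluation for a deterministic MDP.

    policy:   dict state -> action
    delta:    dict (state, action) -> next state
    reward:   dict (state, action) -> immediate reward
    terminal: set of absorbing states (their value stays 0)
    """
    V = {s: 0.0 for s in set(policy) | terminal}
    while True:
        biggest_change = 0.0
        for s, a in policy.items():
            # Bellman backup for the action the policy prescribes in s.
            new_v = reward[(s, a)] + gamma * V[delta[(s, a)]]
            biggest_change = max(biggest_change, abs(new_v - V[s]))
            V[s] = new_v
        if biggest_change < tol:
            return V


# Hypothetical corridor 0 -> 1 -> 2 -> 3; reward 1 for stepping into state 3.
policy = {0: "right", 1: "right", 2: "right"}
delta = {(0, "right"): 1, (1, "right"): 2, (2, "right"): 3}
reward = {(0, "right"): 0.0, (1, "right"): 0.0, (2, "right"): 1.0}
V = evaluate_policy(policy, delta, reward, terminal={3})
```

With γ = 0.9 the sweeps settle on values of 1.0, 0.9 and 0.81 for states 2, 1 and 0, which is the discounted distance to the reward.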

Page 14

What to learn?

Page 15

Q (Action-Value) Function

Page 16

Q (Action-Value) Function

Page 17
Page 18

Bellman Equation (Deterministic Case)

Page 19

Optimal Value Functions

Page 20

Policy Improvement

Page 21

Example

Page 22

Example

Page 23

Generalized Policy Iteration

Page 24

Value Iteration: Q-Learning
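
Since only the slide title survives, here is a hedged sketch of value iteration on the action-value function for a deterministic world with a known model. The backup used is Q(s, a) ← r(s, a) + γ max over a' of Q(δ(s, a), a'). The function name `q_value_iteration` and the corridor world are illustrative assumptions.

```python
def q_value_iteration(states, actions, delta, reward, terminal, gamma=0.9, sweeps=100):
    """Repeatedly apply the deterministic Bellman optimality backup to Q."""
    Q = {(s, a): 0.0 for s in states for a in actions}
    for _ in range(sweeps):
        for s in states:
            if s in terminal:
                continue  # no actions are taken from absorbing states
            for a in actions:
                s_next = delta[(s, a)]
                best_next = max(Q[(s_next, a2)] for a2 in actions)
                Q[(s, a)] = reward[(s, a)] + gamma * best_next
    return Q


# Hypothetical corridor with states 0..3; state 3 is the goal.
states, actions, terminal = [0, 1, 2, 3], ["left", "right"], {3}
delta = {(s, "left"): max(s - 1, 0) for s in states}
delta.update({(s, "right"): min(s + 1, 3) for s in states})
reward = {(s, a): (1.0 if s != 3 and delta[(s, a)] == 3 else 0.0)
          for s in states for a in actions}

Q = q_value_iteration(states, actions, delta, reward, terminal)
# Greedy policy read off the converged table.
greedy = {s: max(actions, key=lambda a: Q[(s, a)]) for s in states if s not in terminal}
```

Because the model (δ and r) is given, no exploration is needed here; the greedy policy is read directly from the table once the sweeps stop changing it.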

Page 25

Non-deterministic Case

Page 26

Bellman Equations (Non-deterministic Case)

Page 27

Value Iteration: Q-Learning

Page 28

Example

Page 29

Example

Page 30

Reinforcement Learning

Page 31

Monte-Carlo Methods: Policy Evaluation

Page 32

Monte Carlo Method: Policy Evaluation
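
As a rough illustration of Monte Carlo policy evaluation, the sketch below averages the return observed after the first visit to each state. The first-visit variant, the function name `mc_first_visit`, and the two hand-written episodes are assumptions; the slide may have shown a different variant.

```python
from collections import defaultdict


def mc_first_visit(episodes, gamma=1.0):
    """First-visit Monte Carlo estimate of V.

    episodes: list of episodes, each a list of (state, reward) pairs,
              where reward is what was received after leaving that state.
    """
    returns = defaultdict(list)
    for episode in episodes:
        # Work backwards to get the discounted return from every time step.
        G = 0.0
        returns_at = [0.0] * len(episode)
        for t in range(len(episode) - 1, -1, -1):
            G = episode[t][1] + gamma * G
            returns_at[t] = G
        # Record the return only for the first occurrence of each state.
        seen = set()
        for t, (s, _) in enumerate(episode):
            if s not in seen:
                seen.add(s)
                returns[s].append(returns_at[t])
    return {s: sum(gs) / len(gs) for s, gs in returns.items()}


# Two made-up episodes generated by some fixed policy.
episodes = [
    [("A", 0.0), ("B", 1.0)],
    [("A", 0.0), ("A", 0.0), ("B", 3.0)],
]
V = mc_first_visit(episodes)
```

Note that an estimate only changes once a complete episode is available, because the return is not known until the episode ends.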

Page 33

Temporal Difference (TD) Learning

Page 34

TD(0): Policy Evaluation

Page 35

TD(0): Policy Evaluation
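
A minimal sketch of tabular TD(0) policy evaluation, using the standard update V(s) ← V(s) + α [r + γ V(s') − V(s)]. The function name `td0_evaluate`, the transition tuple layout, and the replayed corridor episode are assumptions for illustration only.

```python
def td0_evaluate(episodes, alpha=0.5, gamma=0.9):
    """Tabular TD(0) policy evaluation.

    episodes: iterable of episodes; each episode is a list of
              (state, reward, next_state, done) transitions produced
              by following the policy being evaluated.
    """
    V = {}
    for episode in episodes:
        for s, r, s_next, done in episode:
            # Bootstrapped target: one observed reward plus the current
            # estimate of the successor (0 if the episode has ended).
            target = r + (0.0 if done else gamma * V.get(s_next, 0.0))
            V[s] = V.get(s, 0.0) + alpha * (target - V.get(s, 0.0))
    return V


# Hypothetical corridor 0 -> 1 -> 2 -> 3 with reward 1 on the last step,
# replayed 200 times as if the same policy were run repeatedly.
corridor_episode = [(0, 0.0, 1, False), (1, 0.0, 2, False), (2, 1.0, 3, True)]
V = td0_evaluate([corridor_episode] * 200)
```

Unlike the Monte Carlo approach, each update happens after a single step, so learning does not have to wait for the end of an episode.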

Page 36

ε-Greedy Policy
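
An ε-greedy policy picks a uniformly random action with probability ε and the currently best-valued action otherwise. The sketch below shows one common way to write it; the function name `epsilon_greedy` and the example action values are hypothetical.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick an action from q_values (dict: action -> estimated value).

    With probability epsilon explore uniformly at random,
    otherwise exploit the action with the highest estimate.
    """
    if rng.random() < epsilon:
        return rng.choice(list(q_values))
    return max(q_values, key=q_values.get)


# Hypothetical action-value estimates for a single state.
q = {"left": 0.2, "forward": 0.7, "right": 0.1}
rng = random.Random(0)
picks = [epsilon_greedy(q, 0.2, rng) for _ in range(10000)]
greedy_fraction = picks.count("forward") / len(picks)
```

With three actions and ε = 0.2, the best action is chosen with probability 0.8 + 0.2/3, since the random branch can also land on it.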

Page 37

SARSA Policy Iteration
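
The following is a hedged sketch of SARSA, the on-policy TD control method named in the slide title. The corridor environment, the helper names `step` and `choose`, and all parameter values are assumptions chosen to keep the example small.

```python
import random

ACTIONS = ("left", "right")
GOAL = 3  # hypothetical corridor with states 0..3; reaching 3 ends the episode


def step(s, a):
    """Deterministic corridor dynamics: returns (next_state, reward)."""
    s_next = max(s - 1, 0) if a == "left" else s + 1
    return s_next, (1.0 if s_next == GOAL else 0.0)


def choose(Q, s, epsilon, rng):
    """Epsilon-greedy action choice with random tie-breaking."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(Q[(s, a)] for a in ACTIONS)
    return rng.choice([a for a in ACTIONS if Q[(s, a)] == best])


def sarsa(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in range(GOAL + 1) for a in ACTIONS}
    for _ in range(episodes):
        s = 0
        a = choose(Q, s, epsilon, rng)
        while s != GOAL:
            s_next, r = step(s, a)
            a_next = choose(Q, s_next, epsilon, rng)
            # On-policy target: uses the action the agent will really take next.
            # Q at the goal is never updated, so it stays 0 and ends the bootstrap.
            Q[(s, a)] += alpha * (r + gamma * Q[(s_next, a_next)] - Q[(s, a)])
            s, a = s_next, a_next
    return Q


Q = sarsa()
```

The name comes from the quintuple (s, a, r, s', a') used in each update. Because a' is drawn from the same ε-greedy policy that controls the agent, the learned values reflect the cost of exploration.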

Page 38

SARSA Example

Page 39

SARSA Example V(s)

Page 40

SARSA Example Q(s,a)

Page 41

Rotational Inverted Pendulum

Rotational Inverted Pendulum Stabilization Demo, Tor Aamodt
http://www.eecg.utoronto.ca/~aamodt/BAScThesis/RLsim.htm

Page 42

Q-Learning (Off-Policy TD)

Page 43

Q-Learning (Off Policy Iteration)
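
A minimal sketch of tabular Q-learning, assuming a small deterministic corridor world. The environment, the helper name `step`, and the parameter values are illustrative assumptions and are not taken from the slides.

```python
import random

ACTIONS = ("left", "right")
GOAL = 3  # hypothetical corridor with states 0..3; reaching 3 ends the episode


def step(s, a):
    """Deterministic corridor dynamics: returns (next_state, reward)."""
    s_next = max(s - 1, 0) if a == "left" else s + 1
    return s_next, (1.0 if s_next == GOAL else 0.0)


def q_learning(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in range(GOAL + 1) for a in ACTIONS}
    for _ in range(episodes):
        s = 0
        while s != GOAL:
            # Behaviour policy: epsilon-greedy with random tie-breaking.
            if rng.random() < epsilon:
                a = rng.choice(ACTIONS)
            else:
                best = max(Q[(s, b)] for b in ACTIONS)
                a = rng.choice([b for b in ACTIONS if Q[(s, b)] == best])
            s_next, r = step(s, a)
            # Off-policy target: greedy value of the successor state,
            # regardless of which action is actually taken next.
            target = r + gamma * max(Q[(s_next, b)] for b in ACTIONS)
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            s = s_next
    return Q


Q = q_learning()
greedy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(GOAL)}
```

The only difference from an on-policy method such as SARSA is the `max` in the target: the agent explores with ε-greedy behaviour but learns the values of the greedy policy.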

Page 44

TD vs Monte Carlo

Page 45

Temporal Difference Learning

Page 46

Monte Carlo Method

Page 47

N-Step return
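
The n-step return sits between the one-step TD target and the full Monte Carlo return: it sums n discounted rewards and then bootstraps from the value estimate of the state reached. The sketch below assumes the usual definition; the function name `n_step_return` and the sample trajectory are hypothetical.

```python
def n_step_return(rewards, states, V, t, n, gamma=0.9):
    """n-step return starting at time t.

    rewards[k] is the reward on the transition states[k] -> states[k + 1],
    so states has one more entry than rewards.
    """
    T = len(rewards)                 # number of transitions in the episode
    end = min(t + n, T)              # do not look past the end of the episode
    G = sum(gamma ** (k - t) * rewards[k] for k in range(t, end))
    if end < T:
        # Episode not finished after n steps: bootstrap from the estimate.
        G += gamma ** (end - t) * V[states[end]]
    return G


# Made-up trajectory and value estimates.
states = [0, 1, 2, 3]
rewards = [0.0, 0.0, 1.0]
V = {0: 0.5, 1: 0.6, 2: 0.7, 3: 0.0}

g1 = n_step_return(rewards, states, V, t=0, n=1)   # one-step TD target
g2 = n_step_return(rewards, states, V, t=0, n=2)
g3 = n_step_return(rewards, states, V, t=0, n=3)   # reaches the end: Monte Carlo return
```

With n = 1 this is the TD(0) target, and once n covers the rest of the episode it equals the Monte Carlo return.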

Page 48

TD(λ) Learning

Page 49

Eligibility Traces

Page 50

On-line TD(λ)
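
As a hedged sketch of on-line TD(λ) for policy evaluation, the code below keeps an accumulating eligibility trace per state and spreads each TD error over recently visited states. Accumulating (rather than replacing) traces, the function name `td_lambda_episode`, and the corridor episode are assumptions.

```python
def td_lambda_episode(V, transitions, alpha=0.1, gamma=0.9, lam=0.8):
    """Run one episode of tabular on-line TD(lambda), updating V in place.

    V:           dict state -> value estimate (must contain every state seen)
    transitions: list of (state, reward, next_state, done)
    """
    trace = {s: 0.0 for s in V}
    for s, r, s_next, done in transitions:
        target = r + (0.0 if done else gamma * V[s_next])
        td_error = target - V[s]
        trace[s] += 1.0                      # accumulating trace for the visited state
        for x in V:
            V[x] += alpha * td_error * trace[x]
            trace[x] *= gamma * lam          # every trace decays each step
    return V


# Hypothetical corridor 0 -> 1 -> 2 -> 3 with reward 1 on the final step.
episode = [(0, 0.0, 1, False), (1, 0.0, 2, False), (2, 1.0, 3, True)]
V = td_lambda_episode({s: 0.0 for s in range(4)}, episode)
V_td0 = td_lambda_episode({s: 0.0 for s in range(4)}, episode, lam=0.0)
```

After a single episode the reward has already been credited to states 1 and 0 (scaled by γλ and (γλ)²), whereas with λ = 0 only state 2 changes, which is the one-step TD(0) behaviour.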

Page 51

Function Approximation

Page 52

Function Approximation

Page 53

Stochastic Gradient Descent

Page 54

Convergence

Page 55

Subtleties and Ongoing Research

● Replace the Q̂ table with a neural net or other generalizer

● Handle cases where the state is only partially observable

● Design optimal exploration strategies

● Extend to continuous action, state

● Learn and use δ̂: S × A → S

● Relationship to dynamic programming

Page 56

References

● Reinforcement Learning: An Introduction. Richard S. Sutton, Andrew G. Barto. MIT Press, 1998. http://www-anw.cs.umass.edu/~rich/book/the-book.html

● Neuro-Dynamic Programming. Dimitri Bertsekas, John Tsitsiklis. Athena Scientific, 1996.

● Reinforcement Learning: A Tutorial. M. Harmon, S. Harmon.

● Reinforcement Learning: A Survey. L. Kaelbling et al. Journal of Artificial Intelligence Research, Vol. 4, pp. 237-285.

● How to Make Software Agents Do the Right Thing: An Introduction to Reinforcement Learning. S. Singh, P. Norvig, D. Cohn.

● Reinforcement Learning Software:
  – http://www-anw.cs.umass.edu/~rich/software.html
  – http://www.cse.msu.edu/rlr/domains.html

● Reinforcement Learning for Humanoid Robots

● Frank Hoffman. http://www.nada.kth.se/kurser/kth/2D1431/02/index.html