reinforcement learning in continuous state and action...

32
Reinforcement learning Challenge & contribution Examples Conclusions Reinforcement Learning in Continuous State and Action Spaces Lucian Bu¸ soniu Advisers: Robert Babuška, Bart De Schutter Delft University of Technology Center for Systems and Control Project: Interactive Collaborative Information Systems 13 January 2009

Upload: others

Post on 06-Jul-2020

9 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Reinforcement Learning in Continuous State and Action Spacesbusoniu.net/files/repository/defense-beamer.pdf · Reinforcement learning Challenge & contribution Examples Conclusions

Reinforcement learning Challenge & contribution Examples Conclusions

Reinforcement Learning inContinuous State and Action Spaces

Lucian BusoniuAdvisers: Robert Babuška, Bart De Schutter

Delft University of TechnologyCenter for Systems and Control

Project: Interactive Collaborative Information Systems

13 January 2009

Page 2: Reinforcement Learning in Continuous State and Action Spacesbusoniu.net/files/repository/defense-beamer.pdf · Reinforcement learning Challenge & contribution Examples Conclusions

Reinforcement learning Challenge & contribution Examples Conclusions

Motivation and focus

Learning can find solutions that:are hard to determine a prioriimprove over time

Reinforcement learning:uses reward signal as performance feedbackcan work without prior knowledge

Exact RL solutions only in discrete cases:this thesis: continuous spacesusing approximate solutions

Page 3: Reinforcement Learning in Continuous State and Action Spacesbusoniu.net/files/repository/defense-beamer.pdf · Reinforcement learning Challenge & contribution Examples Conclusions

Reinforcement learning Challenge & contribution Examples Conclusions

Motivation and focus

Learning can find solutions that:are hard to determine a prioriimprove over time

Reinforcement learning:uses reward signal as performance feedbackcan work without prior knowledge

Exact RL solutions only in discrete cases:this thesis: continuous spacesusing approximate solutions

Page 4: Reinforcement Learning in Continuous State and Action Spacesbusoniu.net/files/repository/defense-beamer.pdf · Reinforcement learning Challenge & contribution Examples Conclusions

Reinforcement learning Challenge & contribution Examples Conclusions

Motivation and focus

Learning can find solutions that:are hard to determine a prioriimprove over time

Reinforcement learning:uses reward signal as performance feedbackcan work without prior knowledge

Exact RL solutions only in discrete cases:this thesis: continuous spacesusing approximate solutions

Page 5: Reinforcement Learning in Continuous State and Action Spacesbusoniu.net/files/repository/defense-beamer.pdf · Reinforcement learning Challenge & contribution Examples Conclusions

Reinforcement learning Challenge & contribution Examples Conclusions

Outline

1 Reinforcement learning

2 Challenge & contribution

3 Examples

4 Conclusions

Page 6: Reinforcement Learning in Continuous State and Action Spacesbusoniu.net/files/repository/defense-beamer.pdf · Reinforcement learning Challenge & contribution Examples Conclusions

Reinforcement learning Challenge & contribution Examples Conclusions

RL problem

Optimal control problemExample: robot should move to goal in shortest time

Page 7: Reinforcement Learning in Continuous State and Action Spacesbusoniu.net/files/repository/defense-beamer.pdf · Reinforcement learning Challenge & contribution Examples Conclusions

Reinforcement learning Challenge & contribution Examples Conclusions

Elements of RL

Robot in given state (position X , Y )Robot takes action (e.g., move forward)

and reaches new stateReceives reward = quality of state transition

Page 8: Reinforcement Learning in Continuous State and Action Spacesbusoniu.net/files/repository/defense-beamer.pdf · Reinforcement learning Challenge & contribution Examples Conclusions

Reinforcement learning Challenge & contribution Examples Conclusions

Elements of RL

Robot in given state (position X , Y )Robot takes action (e.g., move forward)

and reaches new stateReceives reward = quality of state transition

Page 9: Reinforcement Learning in Continuous State and Action Spacesbusoniu.net/files/repository/defense-beamer.pdf · Reinforcement learning Challenge & contribution Examples Conclusions

Reinforcement learning Challenge & contribution Examples Conclusions

Elements of RL

Robot in given state (position X , Y )Robot takes action (e.g., move forward)

and reaches new stateReceives reward = quality of state transition

Page 10: Reinforcement Learning in Continuous State and Action Spacesbusoniu.net/files/repository/defense-beamer.pdf · Reinforcement learning Challenge & contribution Examples Conclusions

Reinforcement learning Challenge & contribution Examples Conclusions

Elements of RL

Robot in given state (position X , Y )Robot takes action (e.g., move forward)

and reaches new stateReceives reward = quality of state transition

Page 11: Reinforcement Learning in Continuous State and Action Spacesbusoniu.net/files/repository/defense-beamer.pdf · Reinforcement learning Challenge & contribution Examples Conclusions

Reinforcement learning Challenge & contribution Examples Conclusions

Policy

Control policy: what action to take in every state

u = h(x)

E.g., in state [9, 1], move forwardin state [2, 10], move right

Page 12: Reinforcement Learning in Continuous State and Action Spacesbusoniu.net/files/repository/defense-beamer.pdf · Reinforcement learning Challenge & contribution Examples Conclusions

Reinforcement learning Challenge & contribution Examples Conclusions

Policy

Control policy: what action to take in every state

u = h(x)

E.g., in state [9, 1], move forwardin state [2, 10], move right

Page 13: Reinforcement Learning in Continuous State and Action Spacesbusoniu.net/files/repository/defense-beamer.pdf · Reinforcement learning Challenge & contribution Examples Conclusions

Reinforcement learning Challenge & contribution Examples Conclusions

Performance criterion

Reward = one-step performanceReturn = long-term performance, along trajectory

R(x , u) = r1 + γr2 + γ2r3 + γ3r4 + . . .

0 < γ < 1 discount factor

Page 14: Reinforcement Learning in Continuous State and Action Spacesbusoniu.net/files/repository/defense-beamer.pdf · Reinforcement learning Challenge & contribution Examples Conclusions

Reinforcement learning Challenge & contribution Examples Conclusions

Performance criterion

Reward = one-step performanceReturn = long-term performance, along trajectory

R(x , u) = r1 + γr2 + γ2r3 + γ3r4 + . . .

0 < γ < 1 discount factor

Page 15: Reinforcement Learning in Continuous State and Action Spacesbusoniu.net/files/repository/defense-beamer.pdf · Reinforcement learning Challenge & contribution Examples Conclusions

Reinforcement learning Challenge & contribution Examples Conclusions

Optimality

Goal: obtain maximal return:R∗(x , u) = r1 + discounted rewards

along best trajectory starting in x1

Optimal policy h∗ can be computed from R∗

Page 16: Reinforcement Learning in Continuous State and Action Spacesbusoniu.net/files/repository/defense-beamer.pdf · Reinforcement learning Challenge & contribution Examples Conclusions

Reinforcement learning Challenge & contribution Examples Conclusions

Optimality

Goal: obtain maximal return:R∗(x , u) = r1 + discounted rewards

along best trajectory starting in x1

Optimal policy h∗ can be computed from R∗

Page 17: Reinforcement Learning in Continuous State and Action Spacesbusoniu.net/files/repository/defense-beamer.pdf · Reinforcement learning Challenge & contribution Examples Conclusions

Reinforcement learning Challenge & contribution Examples Conclusions

Algorithms

Many algorithms available to find optimal R∗, h∗

Some require prior knowledge about problem, or data:

Others work with no prior knowledge,and collect data by online interaction:

Page 18: Reinforcement Learning in Continuous State and Action Spacesbusoniu.net/files/repository/defense-beamer.pdf · Reinforcement learning Challenge & contribution Examples Conclusions

Reinforcement learning Challenge & contribution Examples Conclusions

Algorithms

Many algorithms available to find optimal R∗, h∗

Some require prior knowledge about problem, or data:

Others work with no prior knowledge,and collect data by online interaction:

Page 19: Reinforcement Learning in Continuous State and Action Spacesbusoniu.net/files/repository/defense-beamer.pdf · Reinforcement learning Challenge & contribution Examples Conclusions

Reinforcement learning Challenge & contribution Examples Conclusions

Algorithms

Many algorithms available to find optimal R∗, h∗

Some require prior knowledge about problem, or data:

Others work with no prior knowledge,and collect data by online interaction:

Page 20: Reinforcement Learning in Continuous State and Action Spacesbusoniu.net/files/repository/defense-beamer.pdf · Reinforcement learning Challenge & contribution Examples Conclusions

Reinforcement learning Challenge & contribution Examples Conclusions

1 Reinforcement learning

2 Challenge & contribution

3 Examples

4 Conclusions

Page 21: Reinforcement Learning in Continuous State and Action Spacesbusoniu.net/files/repository/defense-beamer.pdf · Reinforcement learning Challenge & contribution Examples Conclusions

Reinforcement learning Challenge & contribution Examples Conclusions

Challenge

Classical algorithms have to store R(x , u)for every combination of state x and action u

Only possible for when number of combinations is small

However, x and u often continuous⇒ infinitely many combinations!

E.g., for robot, x = [X , Y ] continuous

Page 22: Reinforcement Learning in Continuous State and Action Spacesbusoniu.net/files/repository/defense-beamer.pdf · Reinforcement learning Challenge & contribution Examples Conclusions

Reinforcement learning Challenge & contribution Examples Conclusions

Challenge

Classical algorithms have to store R(x , u)for every combination of state x and action u

Only possible for when number of combinations is small

However, x and u often continuous⇒ infinitely many combinations!

E.g., for robot, x = [X , Y ] continuous

Page 23: Reinforcement Learning in Continuous State and Action Spacesbusoniu.net/files/repository/defense-beamer.pdf · Reinforcement learning Challenge & contribution Examples Conclusions

Reinforcement learning Challenge & contribution Examples Conclusions

Challenge

Classical algorithms have to store R(x , u)for every combination of state x and action u

Only possible for when number of combinations is small

However, x and u often continuous⇒ infinitely many combinations!

E.g., for robot, x = [X , Y ] continuous

Page 24: Reinforcement Learning in Continuous State and Action Spacesbusoniu.net/files/repository/defense-beamer.pdf · Reinforcement learning Challenge & contribution Examples Conclusions

Reinforcement learning Challenge & contribution Examples Conclusions

Challenge

Classical algorithms have to store R(x , u)for every combination of state x and action u

Only possible for when number of combinations is small

However, x and u often continuous⇒ infinitely many combinations!

E.g., for robot, x = [X , Y ] continuous

Page 25: Reinforcement Learning in Continuous State and Action Spacesbusoniu.net/files/repository/defense-beamer.pdf · Reinforcement learning Challenge & contribution Examples Conclusions

Reinforcement learning Challenge & contribution Examples Conclusions

Contribution

Algorithms for reinforcement learningin problems with continuous states and actions

Using approximate representations of returns

Page 26: Reinforcement Learning in Continuous State and Action Spacesbusoniu.net/files/repository/defense-beamer.pdf · Reinforcement learning Challenge & contribution Examples Conclusions

Reinforcement learning Challenge & contribution Examples Conclusions

Example: RL for inverted pendulum

Goal: point up and stay thereDifficulty: insufficient power, need to swing back & forthReward: the closer to vertical, the larger

Replay learned policy

Page 27: Reinforcement Learning in Continuous State and Action Spacesbusoniu.net/files/repository/defense-beamer.pdf · Reinforcement learning Challenge & contribution Examples Conclusions

Reinforcement learning Challenge & contribution Examples Conclusions

Example: RL for robot goalkeeper

Catch ball using only video camera image

Page 28: Reinforcement Learning in Continuous State and Action Spacesbusoniu.net/files/repository/defense-beamer.pdf · Reinforcement learning Challenge & contribution Examples Conclusions

Reinforcement learning Challenge & contribution Examples Conclusions

1 Reinforcement learning

2 Challenge & contribution

3 Examples

4 Conclusions

Page 29: Reinforcement Learning in Continuous State and Action Spacesbusoniu.net/files/repository/defense-beamer.pdf · Reinforcement learning Challenge & contribution Examples Conclusions

Reinforcement learning Challenge & contribution Examples Conclusions

Summary

Reinforcement learning: very general framework

Can learn from interaction, without prior knowledge

However:Classical RL algorithms do not workwhen states, actions continuous

Need to use approximate representations (this thesis)

Page 30: Reinforcement Learning in Continuous State and Action Spacesbusoniu.net/files/repository/defense-beamer.pdf · Reinforcement learning Challenge & contribution Examples Conclusions

Reinforcement learning Challenge & contribution Examples Conclusions

Summary

Reinforcement learning: very general framework

Can learn from interaction, without prior knowledge

However:Classical RL algorithms do not workwhen states, actions continuous

Need to use approximate representations (this thesis)

Page 31: Reinforcement Learning in Continuous State and Action Spacesbusoniu.net/files/repository/defense-beamer.pdf · Reinforcement learning Challenge & contribution Examples Conclusions

Reinforcement learning Challenge & contribution Examples Conclusions

Application areas

1 Control engineering: optimal & learning control(e.g., robot control)

2 Computer science: intelligent agents

3 Economics

4 etc.

Page 32: Reinforcement Learning in Continuous State and Action Spacesbusoniu.net/files/repository/defense-beamer.pdf · Reinforcement learning Challenge & contribution Examples Conclusions

Reinforcement learning Challenge & contribution Examples Conclusions

Thank you

Thank you!Questions?