Using Advice to Transfer Knowledge Acquired in
One Reinforcement Learning Task to Another
Lisa Torrey, Trevor Walker, Jude Shavlik
University of Wisconsin-Madison, USA
Richard Maclin, University of Minnesota-Duluth, USA
Our Goal
Transfer knowledge…
… between reinforcement learning tasks
… employing SVM function approximators
… using advice
Transfer
Learn first task → knowledge acquired → learn related task

Exploit previously learned models
Improve learning of new tasks

[Diagram: learning curves of performance vs. experience, with and without transfer]
Reinforcement Learning
state … action … reward … new state

Q-function: Q_action(state) gives the value of taking an action from a state
Policy: take the action with the maximum Q_action(state)

(Example rewards in the diagram: +2, −1, 0)
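As a minimal illustration (not from the slides), a greedy policy over per-action Q-function approximators might look like the sketch below; the q_models dictionary and its callables are hypothetical.

```python
# Minimal sketch: a greedy policy over per-action Q-function approximators,
# one regression model per action. Not the authors' code.

def greedy_action(state_features, q_models):
    """Pick the action whose Q-model scores the state highest."""
    return max(q_models, key=lambda action: q_models[action](state_features))

# Hypothetical usage: q_models maps action names to callables that
# estimate Q_action(state) from a feature vector.
q_models = {
    "pass": lambda f: 0.3 * f[0] + 0.1,
    "hold": lambda f: 0.2 * f[1] - 0.05,
}
print(greedy_action([1.0, 2.0], q_models))
```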
Advice for Transfer
Task A solution: "Based on what worked in Task A, I suggest…"

Task B learner: "I'll try it, but if it doesn't work I'll do something else."
Advice improves RL performance
Advice can be refined or even discarded
Transfer Process
Task A experience + advice from user (optional) → Task A Q-functions
Mapping from user: Task A → Task B
Task A Q-functions + mapping → transfer advice
Transfer advice + Task B experience + advice from user (optional) → Task B Q-functions
RoboCup Soccer Tasks
KeepAway: keep the ball from opponents [Stone & Sutton, ICML 2001]

BreakAway: score a goal [Maclin et al., AAAI 2005]
RL in RoboCup Tasks
|          | KeepAway                           | BreakAway                                               |
|----------|------------------------------------|---------------------------------------------------------|
| Features | distances and angles among players | distances and angles among players and the goal, plus time left |
| Actions  | Pass, Hold                         | Pass, Move, Shoot                                       |
| Rewards  | +1 each time step                  | +2, 0, or −1 at the end                                 |
Transfer Process
Task A experience → Task A Q-functions
Mapping from user: Task A → Task B
Task A Q-functions + mapping → transfer advice
Transfer advice + Task B experience → Task B Q-functions
Approximating Q-Functions
Given examples: state features S_i = ⟨f_1, …, f_n⟩ and estimated values y ≈ Q_action(S_i)

Learn linear coefficients: y = w_1 f_1 + … + w_n f_n + b

Non-linearity from Boolean tile features: tile_{i,lower,upper} = 1 if lower ≤ f_i < upper
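An illustrative sketch (not the authors' code) of such a linear Q approximator over raw features plus Boolean tile features; the tile boundaries and weights below are made up.

```python
# Linear Q estimate over raw features plus Boolean tile features.
import numpy as np

def tile_features(f, boundaries):
    """tile_{i,lower,upper} = 1 if lower <= f[i] < upper, else 0."""
    tiles = []
    for i, edges in enumerate(boundaries):
        for lower, upper in zip(edges[:-1], edges[1:]):
            tiles.append(1.0 if lower <= f[i] < upper else 0.0)
    return np.array(tiles)

def q_estimate(f, w, b, boundaries):
    """y = w . [f, tiles(f)] + b"""
    x = np.concatenate([f, tile_features(f, boundaries)])
    return float(np.dot(w, x)) + b

# Hypothetical usage with two raw features and made-up tile boundaries.
boundaries = [[0, 5, 10, 15], [0, 30, 60, 90]]
f = np.array([7.0, 45.0])
w = np.random.rand(2 + 3 + 3)   # raw features + 3 tiles per feature
print(q_estimate(f, w, 0.1, boundaries))
```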
Support Vector Regression
Linear program (given states S and Q-estimates y):

minimize ||w||1 + |b| + C ||k||1
such that y − k ≤ Sw + b ≤ y + k
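A small sketch of this 1-norm support vector regression as a linear program, written with the cvxpy modeling library; the authors' actual solver setup is not specified here, and the data and the value of C are made up.

```python
# Sketch of the 1-norm SVR linear program: minimize ||w||_1 + |b| + C*||k||_1
# subject to y - k <= S w + b <= y + k. Data and C are illustrative only.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
S = rng.random((50, 5))          # one row of state features per example
y = rng.random(50)               # Bellman-backup Q estimates for one action
C = 1.0                          # penalty on the slack variables k

w = cp.Variable(5)
b = cp.Variable()
k = cp.Variable(50, nonneg=True)

residual = S @ w + b - y
problem = cp.Problem(
    cp.Minimize(cp.norm1(w) + cp.abs(b) + C * cp.norm1(k)),
    [residual <= k, residual >= -k],
)
problem.solve()
print(w.value, b.value)
```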
Transfer Process
Task A experience → Task A Q-functions
Mapping from user: Task A → Task B
Task A Q-functions + mapping → transfer advice
Transfer advice + Task B experience → Task B Q-functions
Advice Example
Need only follow advice approximately
Add soft constraints to linear program
if distance_to_goal ≤ 10
and shot_angle ≥ 30
then prefer shoot over all other actions
Incorporating Advice [Maclin et al., AAAI 2005]
if v_11 f_1 + … + v_1n f_n ≤ d_1
…
and v_m1 f_1 + … + v_mn f_n ≤ d_m
then Q_shoot > Q_other for all other actions
Advice and Q-functions share the same language: linear expressions of the features
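To make the "soft constraints" idea concrete, here is one way such a rule can be compiled into the linear program, following the general knowledge-based approach of Maclin et al. (2005); the slack and penalty notation below is an illustrative reconstruction, not copied from the paper.

```latex
% Preconditions stacked as a matrix: the advice region is {f : Bf \le d}.
% Requirement on that region: Q_{shoot}(f) \ge Q_{other}(f) + \beta.
% By linear-programming duality, a sufficient condition is that some u \ge 0 satisfies
%   B^{\top}u + (w_{shoot} - w_{other}) = 0  and  -d^{\top}u + b_{shoot} - b_{other} \ge \beta.
% Softening both with penalized slacks z and \zeta gives constraints added to the LP:
\begin{align*}
  -z \;\le\; B^{\top}u + (w_{\mathrm{shoot}} - w_{\mathrm{other}}) \;\le\; z,\\
  -d^{\top}u + b_{\mathrm{shoot}} - b_{\mathrm{other}} + \zeta \;\ge\; \beta,\\
  u \ge 0, \quad z \ge 0, \quad \zeta \ge 0,
\end{align*}
% with \|z\|_1 and \zeta (times penalty weights) added to the objective being minimized.
```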
Transfer Process
Task A experience → Task A Q-functions
Mapping from user: Task A → Task B
Task A Q-functions + mapping → transfer advice
Transfer advice + Task B experience → Task B Q-functions
Expressing Policy with Advice
Old Q-functions: Q_hold_ball(s), Q_pass_near(s), Q_pass_far(s)

Advice expressing the policy:

if Q_hold_ball(s) > Q_pass_near(s)
and Q_hold_ball(s) > Q_pass_far(s)
then prefer hold_ball over all other actions
Mapping Actions
Mapping from user:
hold_ball → move
pass_near → pass_near
pass_far → pass_near
Old Q-functions: Q_hold_ball(s), Q_pass_near(s), Q_pass_far(s)

Mapped policy:

if Q_hold_ball(s) > Q_pass_near(s)
and Q_hold_ball(s) > Q_pass_far(s)
then prefer move over all other actions
Mapping Features
Q-function mapping (using the feature mapping from the user):

Q_hold_ball(s) = w_1 · dist_keeper1 + w_2 · dist_taker2 + …
Q′_hold_ball(s) = w_1 · dist_attacker1 + w_2 · MAX_DIST + …
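A minimal sketch (not the authors' code) of how the user's feature mapping could be applied to evaluate an old linear Q-function on new-task states; MAX_DIST and the dictionary entries mirror the example above but are otherwise hypothetical.

```python
# Hypothetical encoding of the user-supplied feature mapping: each old feature
# either renames to a new feature or, lacking an analog, becomes a constant.
MAX_DIST = 30.0  # illustrative constant for features with no BreakAway analog

feature_mapping = {
    "dist_keeper1": "dist_attacker1",
    "dist_taker2": MAX_DIST,
}

def mapped_q(old_weights, old_bias, new_state):
    """Evaluate an old-task linear Q-function on a new-task state
    by translating each old feature through feature_mapping."""
    total = old_bias
    for old_feature, weight in old_weights.items():
        target = feature_mapping[old_feature]
        value = new_state[target] if isinstance(target, str) else target
        total += weight * value
    return total

# Example with made-up weights and a made-up BreakAway state.
print(mapped_q({"dist_keeper1": 0.7, "dist_taker2": -0.2}, 0.1,
               {"dist_attacker1": 12.0}))
```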
Transfer Example
Old model:
Q_x = w_x1 f_1 + w_x2 f_2 + b_x
Q_y = w_y1 f_1 + b_y
Q_z = w_z2 f_2 + b_z

Mapped model:
Q′_x = w_x1 f′_1 + w_x2 f′_2 + b_x
Q′_y = w_y1 f′_1 + b_y
Q′_z = w_z2 f′_2 + b_z

Advice:
if Q′_x > Q′_y
and Q′_x > Q′_z
then prefer x′

Advice (expanded):
if w_x1 f′_1 + w_x2 f′_2 + b_x > w_y1 f′_1 + b_y
and w_x1 f′_1 + w_x2 f′_2 + b_x > w_z2 f′_2 + b_z
then prefer x′ over all other actions
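Putting the pieces together, here is a small sketch (an illustration under assumed data structures, not the authors' implementation) that expands mapped old Q-functions into a transfer-advice rule like the one above.

```python
# Build the expanded transfer-advice rule from old linear Q-functions and the
# user's action/feature mappings. Data structures here are assumptions.

def linear_expr(weights, bias):
    """Render a linear expression such as '0.8*f'1 + 0.4*f'2 + 0.1' as a string."""
    terms = [f"{w}*{feat}" for feat, w in weights.items()]
    terms.append(str(bias))
    return " + ".join(terms)

def transfer_advice(old_models, feature_mapping, action_mapping, preferred):
    """Emit: if Q'_preferred > Q'_a for every other old action a,
    then prefer the mapped new action."""
    def mapped(model):
        weights, bias = model
        return ({feature_mapping[f]: w for f, w in weights.items()}, bias)

    pref_w, pref_b = mapped(old_models[preferred])
    conditions = []
    for action, model in old_models.items():
        if action == preferred:
            continue
        w, b = mapped(model)
        conditions.append(f"{linear_expr(pref_w, pref_b)} > {linear_expr(w, b)}")
    return ("if " + "\n   and ".join(conditions)
            + f"\nthen prefer {action_mapping[preferred]} over all other actions")

# Hypothetical usage mirroring the slide's example (actions x, y, z; f1, f2 -> f'1, f'2).
old_models = {
    "x": ({"f1": 0.8, "f2": 0.4}, 0.1),
    "y": ({"f1": 0.5}, 0.2),
    "z": ({"f2": 0.9}, -0.3),
}
feature_mapping = {"f1": "f'1", "f2": "f'2"}
action_mapping = {"x": "x'", "y": "y'", "z": "z'"}
print(transfer_advice(old_models, feature_mapping, action_mapping, "x"))
```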
Transfer Experiment
Between RoboCup subtasks: from 3-on-2 KeepAway to 2-on-1 BreakAway

Two simultaneous mappings: transfer passing skills, and map passing skills to shooting
Experiment Mappings
Play a moving KeepAway game: Pass → Pass, Hold → Move

Pretend a teammate is standing in the goal (an imaginary teammate): Pass → Shoot
Experimental Methodology
Averaged over 10 BreakAway runs
Transfer: advice from one KeepAway model
Control: runs without advice
Results
[Figure: Probability(Score Goal) vs. Games Played (0–10,000); y-axis 0 to 0.8, comparing runs with transfer advice against control runs]
Analysis
Transfer advice helps: BreakAway learners are 7% more likely to score a goal after learning

Improvement is delayed: the advantage begins after 2500 games

Some advice rules apply rarely: the preconditions for the shoot advice are not often met
Related Work: Transfer
Remember action subsequences [Singh, ML 1992]
Restrict action choices [Sherstov & Stone, AAAI 2005]
Transfer Q-values directly in KeepAway [Taylor & Stone, AAMAS 2005]
Related Work: Advice
“Take action A now” [Clouse & Utgoff, ICML 1992]

“In situations S, action A has value X” [Maclin & Shavlik, ML 1996]

“In situations S, prefer action A over B” [Maclin et al., AAAI 2005]
Future Work
Increase speed of linear-program solving
Decrease sensitivity to imperfect advice
Extract advice from kernel-based models
Help user map actions and features
Conclusions
Transfer exploits previously learned models to improve learning of new tasks
Advice is an appealing way to transfer
Linear regression approach incorporates advice straightforwardly
Transferring a policy accommodates different reward structures
Acknowledgements
DARPA grant HR0011-04-1-0007
United States Naval Research Laboratory grant N00173-04-1-G026
Michael Ferris, Olvi Mangasarian, Ted Wild