reinforcement learning learning action representations for11-16-00)-11-16-20-4522... · learned...

31
Learning Action Representations for Reinforcement Learning Yash Chandak Georgios Theocharous James Kostas Scott Jordan Philip Thomas

Upload: others

Post on 03-Jul-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Reinforcement Learning Learning Action Representations for11-16-00)-11-16-20-4522... · Learned representations of 212 actions. Policy decomposition. Case 1: Action representations

Learning Action Representations for Reinforcement Learning

Yash Chandak

Georgios Theocharous

James Kostas

Scott Jordan

Philip Thomas

Page 2: Reinforcement Learning Learning Action Representations for11-16-00)-11-16-20-4522... · Learned representations of 212 actions. Policy decomposition. Case 1: Action representations

Reinforcement Learning

Page 3: Reinforcement Learning Learning Action Representations for11-16-00)-11-16-20-4522... · Learned representations of 212 actions. Policy decomposition. Case 1: Action representations

Problem Statement

Thousands of possible actions!

Page 4: Reinforcement Learning Learning Action Representations for11-16-00)-11-16-20-4522... · Learned representations of 212 actions. Policy decomposition. Case 1: Action representations

Problem Statement

Thousands of possible actions!

● Personalized tutoring systems

Page 5: Reinforcement Learning Learning Action Representations for11-16-00)-11-16-20-4522... · Learned representations of 212 actions. Policy decomposition. Case 1: Action representations

Problem Statement

Thousands of possible actions!

● Personalized tutoring systems● Advertisement/marketing

Page 6: Reinforcement Learning Learning Action Representations for11-16-00)-11-16-20-4522... · Learned representations of 212 actions. Policy decomposition. Case 1: Action representations

Problem Statement

Thousands of possible actions!

● Personalized tutoring systems● Advertisement/marketing● Medical treatment - drug

prescription

Page 7: Reinforcement Learning Learning Action Representations for11-16-00)-11-16-20-4522... · Learned representations of 212 actions. Policy decomposition. Case 1: Action representations

Problem Statement

Thousands of possible actions!

● Personalized tutoring systems● Advertisement/marketing● Medical treatment - drug

prescription● Portfolio management

Page 8: Reinforcement Learning Learning Action Representations for11-16-00)-11-16-20-4522... · Learned representations of 212 actions. Policy decomposition. Case 1: Action representations

Problem Statement

Thousands of possible actions!

● Personalized tutoring systems● Advertisement/marketing● Medical treatment - drug

prescription● Portfolio management● Video/Songs recommendation

Page 9: Reinforcement Learning Learning Action Representations for11-16-00)-11-16-20-4522... · Learned representations of 212 actions. Policy decomposition. Case 1: Action representations

Problem Statement

Thousands of possible actions!

● Personalized tutoring systems● Advertisement/marketing● Medical treatment - drug

prescription● Portfolio management● Video/Songs recommendation● …● …● Option selection

Page 10: Reinforcement Learning Learning Action Representations for11-16-00)-11-16-20-4522... · Learned representations of 212 actions. Policy decomposition. Case 1: Action representations

Problem Statement

Thousands of possible actions!

● Personalized tutoring systems● Advertisement/marketing● Medical treatment - drug

prescription● Portfolio management● Video/Songs recommendation● …● …● Option selection

Page 11: Reinforcement Learning Learning Action Representations for11-16-00)-11-16-20-4522... · Learned representations of 212 actions. Policy decomposition. Case 1: Action representations

- Actions are not independent discrete quantities.

Key Insights

Page 12: Reinforcement Learning Learning Action Representations for11-16-00)-11-16-20-4522... · Learned representations of 212 actions. Policy decomposition. Case 1: Action representations

- Actions are not independent discrete quantities.

- There is a low dimensional structure underlying their behavior pattern.

Key Insights

Page 13: Reinforcement Learning Learning Action Representations for11-16-00)-11-16-20-4522... · Learned representations of 212 actions. Policy decomposition. Case 1: Action representations

- Actions are not independent discrete quantities.

- There is a low dimensional structure underlying their behavior pattern.

- This structure can be learned independent of the reward.

Key Insights

Page 14: Reinforcement Learning Learning Action Representations for11-16-00)-11-16-20-4522... · Learned representations of 212 actions. Policy decomposition. Case 1: Action representations

- Actions are not independent discrete quantities.

- There is a low dimensional structure underlying their behavior pattern.

- This structure can be learned independent of the reward.

- Instead of raw actions, agent can act in this space of behavior and feedback can be generalized to similar actions.

Key Insights

Page 15: Reinforcement Learning Learning Action Representations for11-16-00)-11-16-20-4522... · Learned representations of 212 actions. Policy decomposition. Case 1: Action representations

Proposed Method

Page 16: Reinforcement Learning Learning Action Representations for11-16-00)-11-16-20-4522... · Learned representations of 212 actions. Policy decomposition. Case 1: Action representations

Algorithm

(a) Supervised learning of action representations.

Page 17: Reinforcement Learning Learning Action Representations for11-16-00)-11-16-20-4522... · Learned representations of 212 actions. Policy decomposition. Case 1: Action representations

Algorithm

(a) Supervised learning of action representations.(b) Learning internal policy with policy gradients.

Page 18: Reinforcement Learning Learning Action Representations for11-16-00)-11-16-20-4522... · Learned representations of 212 actions. Policy decomposition. Case 1: Action representations
Page 19: Reinforcement Learning Learning Action Representations for11-16-00)-11-16-20-4522... · Learned representations of 212 actions. Policy decomposition. Case 1: Action representations

Results

Page 20: Reinforcement Learning Learning Action Representations for11-16-00)-11-16-20-4522... · Learned representations of 212 actions. Policy decomposition. Case 1: Action representations

Results

Page 21: Reinforcement Learning Learning Action Representations for11-16-00)-11-16-20-4522... · Learned representations of 212 actions. Policy decomposition. Case 1: Action representations

Results

Page 22: Reinforcement Learning Learning Action Representations for11-16-00)-11-16-20-4522... · Learned representations of 212 actions. Policy decomposition. Case 1: Action representations

Results

Page 23: Reinforcement Learning Learning Action Representations for11-16-00)-11-16-20-4522... · Learned representations of 212 actions. Policy decomposition. Case 1: Action representations

Real-world Applications at Adobe

Actions = 1498 tutorials

HelpX Photoshop

Actions = 1843 tools

Page 24: Reinforcement Learning Learning Action Representations for11-16-00)-11-16-20-4522... · Learned representations of 212 actions. Policy decomposition. Case 1: Action representations

Poster#112Today

Page 25: Reinforcement Learning Learning Action Representations for11-16-00)-11-16-20-4522... · Learned representations of 212 actions. Policy decomposition. Case 1: Action representations
Page 26: Reinforcement Learning Learning Action Representations for11-16-00)-11-16-20-4522... · Learned representations of 212 actions. Policy decomposition. Case 1: Action representations

Results (Action representations)

Maze domain

Actual behavior of 212 actions

Learned representations of 212 actions

Page 27: Reinforcement Learning Learning Action Representations for11-16-00)-11-16-20-4522... · Learned representations of 212 actions. Policy decomposition. Case 1: Action representations

Policy decomposition

Page 28: Reinforcement Learning Learning Action Representations for11-16-00)-11-16-20-4522... · Learned representations of 212 actions. Policy decomposition. Case 1: Action representations

Case 1: Action representations are known

- The internal policy acts in the space of action representations

- Any existing policy gradient algorithm can be used to improve its local performance, independent of the mapping function.

Page 29: Reinforcement Learning Learning Action Representations for11-16-00)-11-16-20-4522... · Learned representations of 212 actions. Policy decomposition. Case 1: Action representations

Case 2: Learning action representations

- P(a|e) required to map representation to action can be learned by satisfying the earlier assumption:

- We parameterize P(a|e) and P(e|s,s’) with learnable functions f and g, respectively.

- Observed transition tuples are from the required distribution.

- Parameters can be learned by minimizing the stochastic KL divergence.

- Procedure is independent of reward.

Page 30: Reinforcement Learning Learning Action Representations for11-16-00)-11-16-20-4522... · Learned representations of 212 actions. Policy decomposition. Case 1: Action representations

Toy Maze:

- Agent in continuous state with n actuators.- 2n actions. Exponentially large action space.- Long horizon and single goal reward.

Adobe Datasets:

- N-gram based multi-time step user behavior model from passive data.- Rewards defined using a surrogate objective.- Photoshop tool recommendation (1843 tools)- HelpX tutorial recommendation (1498 tutorials)

Experiments

Page 31: Reinforcement Learning Learning Action Representations for11-16-00)-11-16-20-4522... · Learned representations of 212 actions. Policy decomposition. Case 1: Action representations

Advantages

- Exploits structure in space of actions.

- Quick generalization of feedback to similar actions.

- Less parameters updated using high variance policy gradients.

- Drop-in extension for existing policy gradient algorithms.