
Learning Action Representations for Reinforcement Learning

Yash Chandak

Georgios Theocharous

James Kostas

Scott Jordan

Philip Thomas

Reinforcement Learning

Problem Statement

Thousands of possible actions!

● Personalized tutoring systems
● Advertisement/marketing
● Medical treatment - drug prescription
● Portfolio management
● Video/song recommendation
● …
● Option selection

Key Insights

- Actions are not independent discrete quantities.

- There is a low-dimensional structure underlying their behavior patterns.

- This structure can be learned independently of the reward.

- Instead of raw actions, the agent can act in this space of behaviors, and feedback can be generalized to similar actions.

Proposed Method

Algorithm

(a) Supervised learning of action representations.
(b) Learning the internal policy with policy gradients.
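To make the two phases concrete, here is a minimal, hypothetical PyTorch-style sketch; the module names f, g, and pi_i and all sizes are illustrative assumptions, not taken from the slides:

    # Hypothetical sketch of the two-phase algorithm; names and sizes are
    # illustrative assumptions, not quoted from the paper.
    import torch
    import torch.nn as nn

    N_ACTIONS, D_STATE, D_EMBED = 1000, 8, 2

    # f(a|e): decodes an action representation e into a distribution over actions.
    f = nn.Sequential(nn.Linear(D_EMBED, 64), nn.ReLU(), nn.Linear(64, N_ACTIONS))
    # g(e|s,s'): infers the representation behind an observed transition.
    g = nn.Sequential(nn.Linear(2 * D_STATE, 64), nn.ReLU(), nn.Linear(64, D_EMBED))
    # pi_i(e|s): internal policy over the representation space (Gaussian head).
    pi_i = nn.Sequential(nn.Linear(D_STATE, 64), nn.ReLU(), nn.Linear(64, 2 * D_EMBED))

    # Phase (a): supervised, reward-free learning of f and g from (s, a, s') tuples.
    def representation_loss(s, a, s_next):
        e = g(torch.cat([s, s_next], dim=-1))             # point estimate of e
        log_p = torch.log_softmax(f(e), dim=-1)           # model of P(a|s,s')
        return -log_p.gather(-1, a.unsqueeze(-1)).mean()  # max likelihood ~ min KL

    # Phase (b): improve pi_i with any policy gradient method while f stays fixed;
    # a sampled e is mapped to a concrete action through f (see Case 1 below).

Minimizing this negative log-likelihood over observed transitions corresponds to the KL-divergence minimization described in the Policy decomposition section below.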

Results

Real-world Applications at Adobe

HelpX: tutorial recommendation (actions = 1498 tutorials)

Photoshop: tool recommendation (actions = 1843 tools)

Poster #112, today

Results (Action representations)

Maze domain: actual behavior of the 212 actions vs. their learned representations (figure).

Policy decomposition
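In sketch form, and in notation assumed here rather than quoted from the slides, the overall policy \pi_o composes an internal policy \pi_i over representations e with the mapping P(a|e) from representations to actions:

    \pi_o(a \mid s) = \int_{\mathcal{E}} P(a \mid e)\, \pi_i(e \mid s)\, de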

Case 1: Action representations are known

- The internal policy acts in the space of action representations (see the sketch after this list).

- Any existing policy gradient algorithm can be used to improve its local performance, independent of the mapping function.
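Continuing the hypothetical sketch above, action selection under a Gaussian internal policy might look as follows; the greedy decode through f is one reasonable choice, assumed here rather than quoted from the paper:

    def select_action(s):
        """Sample e from the internal policy, then map it to a discrete action."""
        mu, log_std = pi_i(s).chunk(2, dim=-1)
        dist = torch.distributions.Normal(mu, log_std.exp())
        e = dist.sample()                # act in the representation space
        a = f(e).argmax(dim=-1)          # decode e to an action greedily via f
        # The policy gradient differentiates the log-probability of e; since f
        # is held fixed, any policy gradient algorithm applies unchanged.
        return a, dist.log_prob(e).sum(dim=-1)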

Case 2: Learning action representations

- P(a|e), required to map a representation to an action, can be learned by satisfying the earlier assumption (see the reconstruction after this list).

- We parameterize P(a|e) and P(e|s,s') with learnable functions f and g, respectively.

- Observed transition tuples are samples from the required distribution.

- Parameters can be learned by minimizing a stochastic estimate of the KL divergence.

- The procedure is independent of the reward.
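Spelling this out in assumed notation (a reconstruction, with \mathcal{E} the representation space), the model of the transition-conditioned action distribution and the training objective are:

    \hat{P}(a \mid s, s') = \int_{\mathcal{E}} f(a \mid e)\, g(e \mid s, s')\, de,
    \qquad
    \min_{f,\, g}\; \mathbb{E}_{(s,\, a,\, s')}\!\left[ D_{\mathrm{KL}}\!\left( P(a \mid s, s') \,\middle\|\, \hat{P}(a \mid s, s') \right) \right].

Since the expectation is over observed transition tuples, minimizing this KL divergence reduces to maximizing the log-likelihood of the observed actions, which is why no reward signal is needed.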

Experiments

Toy Maze:

- Agent in a continuous state space with n actuators.
- 2^n actions: an exponentially large action space.
- Long horizon and a single goal reward.

Adobe Datasets:

- N-gram based multi-time-step user behavior model built from passive data.
- Rewards defined using a surrogate objective.
- Photoshop tool recommendation (1843 tools).
- HelpX tutorial recommendation (1498 tutorials).

Advantages

- Exploits structure in the space of actions.

- Quick generalization of feedback to similar actions.

- Fewer parameters updated using high-variance policy gradients.

- Drop-in extension for existing policy gradient algorithms.
