learning latent space continuous models for ... - icml11-14-00)-11-15-05-5163... · battlezone...

17
DeepMDP Learning Latent Space Continuous Models for Representation Learning Carles Gelada, Saurabh Kumar, Jacob Buckman, Ofir Nachum, Marc G. Bellemare

Upload: others

Post on 30-Apr-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Learning Latent Space Continuous Models for ... - ICML11-14-00)-11-15-05-5163... · BattleZone Gravitar Carnival CrazyClimber BankHeist Centipede Hero Qbert RoadRunner TimePilot Tutankham

DeepMDPLearning Latent Space Continuous Models for

Representation Learning

Carles Gelada, Saurabh Kumar, Jacob Buckman,

Ofir Nachum, Marc G. Bellemare

Page 2: Learning Latent Space Continuous Models for ... - ICML11-14-00)-11-15-05-5163... · BattleZone Gravitar Carnival CrazyClimber BankHeist Centipede Hero Qbert RoadRunner TimePilot Tutankham

212

Simple Representations for RL

Page 3: Learning Latent Space Continuous Models for ... - ICML11-14-00)-11-15-05-5163... · BattleZone Gravitar Carnival CrazyClimber BankHeist Centipede Hero Qbert RoadRunner TimePilot Tutankham

Neural networks

DeepMDPLatent Space Model:

MDP:

& trained via the following two losses:

Page 4: Learning Latent Space Continuous Models for ... - ICML11-14-00)-11-15-05-5163... · BattleZone Gravitar Carnival CrazyClimber BankHeist Centipede Hero Qbert RoadRunner TimePilot Tutankham

Reward Loss

Page 5: Learning Latent Space Continuous Models for ... - ICML11-14-00)-11-15-05-5163... · BattleZone Gravitar Carnival CrazyClimber BankHeist Centipede Hero Qbert RoadRunner TimePilot Tutankham

Transition Loss

Page 6: Learning Latent Space Continuous Models for ... - ICML11-14-00)-11-15-05-5163... · BattleZone Gravitar Carnival CrazyClimber BankHeist Centipede Hero Qbert RoadRunner TimePilot Tutankham

Tractable Losses

Page 7: Learning Latent Space Continuous Models for ... - ICML11-14-00)-11-15-05-5163... · BattleZone Gravitar Carnival CrazyClimber BankHeist Centipede Hero Qbert RoadRunner TimePilot Tutankham

Deep Policies

Page 8: Learning Latent Space Continuous Models for ... - ICML11-14-00)-11-15-05-5163... · BattleZone Gravitar Carnival CrazyClimber BankHeist Centipede Hero Qbert RoadRunner TimePilot Tutankham

Representation Quality

Page 9: Learning Latent Space Continuous Models for ... - ICML11-14-00)-11-15-05-5163... · BattleZone Gravitar Carnival CrazyClimber BankHeist Centipede Hero Qbert RoadRunner TimePilot Tutankham

Only Discards:

Ferns, N., Panangaden, P., and Precup, D. Metrics for Finite Markov Decision

Processes. In Proceedings of the 20th Conference on Uncertainty in Artificial

Intelligence, UAI ’04, pp. 162–169, 2004.

Page 10: Learning Latent Space Continuous Models for ... - ICML11-14-00)-11-15-05-5163... · BattleZone Gravitar Carnival CrazyClimber BankHeist Centipede Hero Qbert RoadRunner TimePilot Tutankham

Phi as a Representation

Page 11: Learning Latent Space Continuous Models for ... - ICML11-14-00)-11-15-05-5163... · BattleZone Gravitar Carnival CrazyClimber BankHeist Centipede Hero Qbert RoadRunner TimePilot Tutankham

Donut World

Page 12: Learning Latent Space Continuous Models for ... - ICML11-14-00)-11-15-05-5163... · BattleZone Gravitar Carnival CrazyClimber BankHeist Centipede Hero Qbert RoadRunner TimePilot Tutankham

DeepMDP on Donut World

2D latent space

+

DeepMDP losses

Page 13: Learning Latent Space Continuous Models for ... - ICML11-14-00)-11-15-05-5163... · BattleZone Gravitar Carnival CrazyClimber BankHeist Centipede Hero Qbert RoadRunner TimePilot Tutankham

DeepMDP on Donut World

Visualization of latent distance

Page 14: Learning Latent Space Continuous Models for ... - ICML11-14-00)-11-15-05-5163... · BattleZone Gravitar Carnival CrazyClimber BankHeist Centipede Hero Qbert RoadRunner TimePilot Tutankham

DeepMDPAuxiliary Task

Base C51 agent

+

DeepMDP losses

Page 15: Learning Latent Space Continuous Models for ... - ICML11-14-00)-11-15-05-5163... · BattleZone Gravitar Carnival CrazyClimber BankHeist Centipede Hero Qbert RoadRunner TimePilot Tutankham

DeepMDPAuxiliary Task

Base C51 agent

+

DeepMDP losses

Page 16: Learning Latent Space Continuous Models for ... - ICML11-14-00)-11-15-05-5163... · BattleZone Gravitar Carnival CrazyClimber BankHeist Centipede Hero Qbert RoadRunner TimePilot Tutankham

● DeepMDPs as Models of the Environment

● Norm-MMD Metrics and their Associated Smoothness

Page 17: Learning Latent Space Continuous Models for ... - ICML11-14-00)-11-15-05-5163... · BattleZone Gravitar Carnival CrazyClimber BankHeist Centipede Hero Qbert RoadRunner TimePilot Tutankham

Thanks For Listening

Poster #108