lab 4: q-learning (table) - github pages · lab 4: q-learning (table) exploit&exploration and...

11

Lab 4: Q-learning (table) exploit&exploration and discounted future reward Reinforcement Learning with TensorFlow&OpenAI Gym Sung Kim <[email protected]>

Upload: others

Post on 10-Oct-2019

8 views

Category:

Documents

0 download

Report

Download

Embed Size (px):

TRANSCRIPT

Page 1: Lab 4: Q-learning (table) - GitHub Pages · Lab 4: Q-learning (table) exploit&exploration and discounted future reward Reinforcement Learning with TensorFlow&OpenAI Gym Sung Kim

Lab 4: Q-learning (table)exploit&exploration and discounted future reward

Reinforcement Learning with TensorFlow&OpenAI GymSung Kim <[email protected]>

Page 2: Lab 4: Q-learning (table) - GitHub Pages · Lab 4: Q-learning (table) exploit&exploration and discounted future reward Reinforcement Learning with TensorFlow&OpenAI Gym Sung Kim

Exploit VS Exploration: decaying E-greedy

for i in range (1000)

e = 0.1 / (i+1)

if random(1) < e: a = random

else: a = argmax(Q(s, a))

Page 3: Lab 4: Q-learning (table) - GitHub Pages · Lab 4: Q-learning (table) exploit&exploration and discounted future reward Reinforcement Learning with TensorFlow&OpenAI Gym Sung Kim

Exploit VS Exploration: add random noise

0.5 0.6 0.3 0.2 0.5

for i in range (1000) a = argmax(Q(s, a) + random_values / (i+1))

Page 4: Lab 4: Q-learning (table) - GitHub Pages · Lab 4: Q-learning (table) exploit&exploration and discounted future reward Reinforcement Learning with TensorFlow&OpenAI Gym Sung Kim

Discounted reward ( = 0.9)

1

Page 5: Lab 4: Q-learning (table) - GitHub Pages · Lab 4: Q-learning (table) exploit&exploration and discounted future reward Reinforcement Learning with TensorFlow&OpenAI Gym Sung Kim

Q-learning algorithm

Page 6: Lab 4: Q-learning (table) - GitHub Pages · Lab 4: Q-learning (table) exploit&exploration and discounted future reward Reinforcement Learning with TensorFlow&OpenAI Gym Sung Kim

Code: setup

https://medium.com/emergent-future/simple-reinforcement-learning-with-tensorflow-part-0-q-learning-with-tables-and-neural-networks-d195264329d0#.pjz9g59ap

Page 7: Lab 4: Q-learning (table) - GitHub Pages · Lab 4: Q-learning (table) exploit&exploration and discounted future reward Reinforcement Learning with TensorFlow&OpenAI Gym Sung Kim

Code: Q learning

https://medium.com/emergent-future/simple-reinforcement-learning-with-tensorflow-part-0-q-learning-with-tables-and-neural-networks-d195264329d0#.pjz9g59ap

Page 8: Lab 4: Q-learning (table) - GitHub Pages · Lab 4: Q-learning (table) exploit&exploration and discounted future reward Reinforcement Learning with TensorFlow&OpenAI Gym Sung Kim

Code: results

https://medium.com/emergent-future/simple-reinforcement-learning-with-tensorflow-part-0-q-learning-with-tables-and-neural-networks-d195264329d0#.pjz9g59ap

Page 9: Lab 4: Q-learning (table) - GitHub Pages · Lab 4: Q-learning (table) exploit&exploration and discounted future reward Reinforcement Learning with TensorFlow&OpenAI Gym Sung Kim

Code: e-greedy

https://medium.com/emergent-future/simple-reinforcement-learning-with-tensorflow-part-0-q-learning-with-tables-and-neural-networks-d195264329d0#.pjz9g59ap

Page 10: Lab 4: Q-learning (table) - GitHub Pages · Lab 4: Q-learning (table) exploit&exploration and discounted future reward Reinforcement Learning with TensorFlow&OpenAI Gym Sung Kim

Code: e-greedy results

https://medium.com/emergent-future/simple-reinforcement-learning-with-tensorflow-part-0-q-learning-with-tables-and-neural-networks-d195264329d0#.pjz9g59ap

Page 11: Lab 4: Q-learning (table) - GitHub Pages · Lab 4: Q-learning (table) exploit&exploration and discounted future reward Reinforcement Learning with TensorFlow&OpenAI Gym Sung Kim

Next

Nondeterministic/

Stochastic worlds

OpenAI Five Model Architecture · OpenAI Five Model Architecture (06/06/2018) Title: dota_network_diagram Created Date: 6/24/2018 4:00:19 PM

ut s OpenAI + DotA 2kanmy/courses/6101_1820/s13.pdf · ut s OpenAI Rapid[1] … a general-purpose RL training system . ut s Proximal Policy Optimization[1, 4, 5]

Lecture 7: DQN - GitHub Pages · PDF fileLecture 7: DQN Reinforcement Learning with TensorFlow&OpenAI Gym ... Deep Q-Networks ... Deep Reinforcement Learning, David Silver,

Adversarial Approaches to Bayesian Learning and Bayesian ... · Learning and Bayesian Approaches to Adversarial Robustness Ian Goodfellow, OpenAI Research Scientist NIPS 2016 Workshop

Lab 7: DQN 1 (NIPS 2013) - GitHub Pages · Lab 7: DQN 1 (NIPS 2013) Reinforcement Learning with TensorFlow&OpenAI Gym Sung Kim

ns-3 meets OpenAI Gym: The Playground for …...ns-3 meets OpenAI Gym MSWiM ’19, Nov 25–29, 2019, Miami Beach, USA over Ethernet or WiFi network devices. Based on the core concepts,

IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, VOL. 39, …people.csail.mit.edu/hunkim/papers/kim-tse2013.pdf · information about a specific software failure, including the failure symptoms,

Extending the OpenAI Gym for robotics: a toolkit for ...erlerobotics.com/whitepaper/robot_gym.pdfExtending the OpenAI Gym for robotics: a toolkit for reinforcement learning using ROS

Working in OpenAI Environments Designing Your Own · Designing Your Own Mike Rudd CS 885 Guest Lecture May 18, 2018. OpenAI* •Not-for-profit, funded by private ... Building Your

Extending the OpenAI Gym for robotics: a toolkit for ... · OpenAI Gym [1] is a is a toolkit for reinforcement learning research that has recently gained popularity in the machine

Lecture 4: Q-learning (table) - GitHub Pages · Lecture 4: Q-learning (table) exploit&exploration and discounted future reward Reinforcement Learning with TensorFlow&OpenAI Gym Sung

Lecture 7: DQN - GitHub Pageshunkim.github.io/ml/RL/rl07.pdf · Lecture 7: DQN Reinforcement Learning with TensorFlow&OpenAI Gym Sung Kim Q-function Approximation:

IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, VOL. 34, NO. …people.csail.mit.edu/hunkim/papers/kim-tse-2008.pdf · 2015-09-04 · classification uses a machine learning classifier

Lab 2: Playing OpenAI Gym Games - GitHub Pages · 2017-10-02 · Lab 2: Playing OpenAI Gym Games Reinforcement Learning with TensorFlow&OpenAI Gym Sung Kim

Can OpenAI Codex and Other Large Language Models Help Us

Learning with Opponent-Learning Awareness · 2018-06-18 · Learning with Opponent-Learning Awareness Jakob Foerster†,‡ University of Oxford Richard Y. Chen† OpenAI Maruan Al-Shedivat‡

Lecture 5: Windy Frozen Lake Nondeterministic world!hunkim.github.io/ml/RL/rl05.pdf · Lecture 5: Windy Frozen Lake Nondeterministic world! Reinforcement Learning with TensorFlow&OpenAI

Catacomb : A database backed WebDAV and DASL repositorywebdav.org/papers/catacomb-apachecon2002.pdf · Title: Catacomb : A database backed WebDAV and DASL repository Author: hunkim

Long-Term Planning and Situational Awareness in OpenAI Five · 2019. 12. 17. · 2 Approach 2.1 OpenAI Five In this paper, we focus speciﬁcally on OpenAI Five [8], a model trained

Model-Based Reinforcement Learning via Meta-Policy ...h2t.anthropomatik.kit.edu/pdf/Rothfuss2018.pdf · Jonas Rothfuss KIT, UC Berkeley [email protected] John Schulman OpenAI

Lab 3: Dummy Q-learning (table) - GitHub Pages · PDF fileLab 3: Dummy Q-learning (table) Reinforcement Learning with TensorFlow&OpenAI Gym Sung Kim

Proximal Policy Optimization Algorithms€¦ · Proximal Policy Optimization Algorithms John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, Oleg Klimov OpenAI fjoschu, filip,

On First-Order Meta-Learning Algorithms · On First-Order Meta-Learning Algorithms Alex Nichol and Joshua Achiam and John Schulman OpenAI falex, jachiam, [email protected] Abstract

Lecture 1: Introduction - GitHub Pageshunkim.github.io/ml/RL/rl01.pdf · Lecture 1: Introduction Reinforcement Learning with TensorFlow&OpenAI Gym Sung Kim

Jonas Schneider, Head of Engineering for Robotics, OpenAI

Lecture 1: Introduction - GitHub Pages 1: Introduction Reinforcement Learning with TensorFlow&OpenAI Gym Sung Kim Nature of Learning •We learn from past--

10-703 Deep RL and Controls OpenAI Gym Recitation API Basic Datatypes ... Minecraft. VirtualEnv Installation ... 10-703 Deep RL and Controls OpenAI Gym Recitation Author: Devin Schwab

Measuring the Algorithmic Efficiency of Neural NetworksMeasuring the Algorithmic Efﬁciency of Neural Networks Danny Hernandez OpenAI [email protected] Tom B. Brown OpenAI [email protected]

Lab 7: DQN 1 (NIPS 2013) - GitHub Pageshunkim.github.io/ml/RL/rl07-l1.pdf · Lab 7: DQN 1 (NIPS 2013) Reinforcement Learning with TensorFlow&OpenAI Gym Sung Kim

Geometry-Aware Neural Renderingpapers.nips.cc/paper/9331-geometry-aware-neural-rendering.pdf · Geometry-Aware Neural Rendering Josh Tobin OpenAI & UC Berkeley [email protected] OpenAI

Ian Goodfellow, OpenAI Research Scientist Guest lecture for CS 294

Lab 4: Q-learning (table) - GitHub Pageshunkim.github.io/ml/RL/rl-l04.pdfLab 4: Q-learning (table) exploit&exploration and discounted future reward Reinforcement Learning with TensorFlow&OpenAI

State Common Entrance Test Cell - fe2019.mahacet.orgfe2019.mahacet.org/CAP-I/CAPR-I_EN3207.pdf · 95 693790.6428261EN19333354GHANGREKAR ATHARVA VINAY M OPENAI 96 753789.8902170EN19247083ANKIT

ML with Tensorflow - GitHub Pageshunkim.github.io/ml/lec7.pdf · 2017-10-02 · Lecture 7-1 Application & Tips: Learning rate, data preprocessing, overﬁtting Sung Kim

DeepAM: Migrate APIs with Multi-modal Sequence to …hunkim/papers/gu-ijcai2017.pdfDeepAM: Migrate APIs with Multi-modal Sequence to Sequence Learning Xiaodong Gu1, Hongyu Zhang2,