Lecture 3: Q-learning (table)
Reinforcement Learning with TensorFlow & OpenAI Gym
Sung Kim <[email protected]>
Try Frozen Lake, Real Game?
Frozen Lake: Random?
Frozen Lake: Even if you know the way, ask. ("아는 길도 물어가라" — a Korean proverb: ask your way even along a road you know)
Q-function (state-action value function)

Q(state, action)
(1) state, (2) action → (3) quality (reward)
Policy using Q-function

Q(state, action), e.g.:
Q(s1, LEFT) = 0, Q(s1, RIGHT) = 0.5, Q(s1, UP) = 0, Q(s1, DOWN) = 0.3
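Given these example Q-values, acting by the Q-function just means picking the action with the largest value — a minimal sketch (the dictionary below holds the numbers from the slide):

```python
# Greedy policy: pick the action whose Q-value is largest.
# Q-values are the example numbers from the slide.
q = {"LEFT": 0.0, "RIGHT": 0.5, "UP": 0.0, "DOWN": 0.3}

best_action = max(q, key=q.get)  # argmax over actions
print(best_action)  # RIGHT
```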
Optimal Policy, and Max Q

Q(state, action)
Max Q = max over a of Q(s, a)
Optimal policy: π*(s) = argmax_a Q(s, a)
Frozen Lake: optimal policy with Q
Frozen Lake: optimal policy with Q
[Diagram: state s, action a, next state s′]
Finding, Learning Q

• Assume (believe) Q in s′ exists!
• My condition:
  - I am in state s
  - when I do action a, I'll go to s′
  - when I do action a, I'll get reward r
  - Q in s′, Q(s′, a′), exists!
• How can we express Q(s, a) using Q(s′, a′)?
Learning Q(s, a)?
[Diagram: from state s, taking action a yields reward r and next state s′]
State, action, reward

S F F F
F H F H
F F F H
H F F G

(S = start, F = frozen/safe, H = hole, G = goal)
Future reward
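The slide's picture summarizes standard bookkeeping: the total future reward from time t is the sum of all rewards from t onward, which can be written recursively:

```latex
R_t = r_t + r_{t+1} + \cdots + r_n = r_t + R_{t+1}
```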
Learning Q(s, a)?
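The answer this lecture uses — the "dummy" update, with no learning rate and no discount factor yet — expresses Q(s, a) through the immediate reward plus the best Q-value of the next state:

```latex
\hat{Q}(s, a) \leftarrow r + \max_{a'} \hat{Q}(s', a')
```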
Learning Q(s, a): 16×4 Table
16 states and 4 actions (up, down, left, right)
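Such a table is just a 16×4 array of numbers; in NumPy (the variable name is my own):

```python
import numpy as np

# One row per state (16), one column per action (4: up, down, left, right).
Q = np.zeros([16, 4])

print(Q.shape)  # (16, 4)
```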
Learning Q(s, a): Table (initial Q values are 0)

[Table: the full 16×4 Q-table with every entry 0]
Learning Q(s, a) Table (with many trials), initial Q values are 0
Learning Q(s, a) Table (with many trials), initial Q values are 0

Q(s14, a_RIGHT) = r = 1
Q(s13, a_RIGHT) = r + max_a Q(s14, a) = 0 + max(0, 0, 1, 0) = 1
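These two backups can be checked numerically. The indices below follow the slide's state numbering (s14 is the square just left of the goal); the action index chosen for RIGHT is my own labeling:

```python
import numpy as np

Q = np.zeros([16, 4])   # 16 states x 4 actions, all zero initially
RIGHT = 2               # hypothetical action index for "right"

# Stepping into the goal from state 14: Q(s14, RIGHT) = r = 1
Q[14, RIGHT] = 1

# One step earlier, in state 13: Q(s13, RIGHT) = r + max_a Q(s14, a)
r = 0
Q[13, RIGHT] = r + np.max(Q[14, :])   # 0 + max(0, 0, 1, 0) = 1

print(Q[13, RIGHT])  # 1.0
```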
Learning Q(s, a) Table: one success! (initial Q values are 0)

[Table: entries of 1 along the one successful path; all other entries still 0]
Learning Q(s, a) Table: optimal policy

[Table: following the actions with Q = 1 traces the path from S to G]
Dummy Q-learning algorithm
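A runnable sketch of the dummy algorithm: initialize the table to 0, act (here: randomly, for exploration), and apply Q(s, a) ← r + max Q(s′, ·) after every step. To keep the sketch self-contained rather than depending on OpenAI Gym, a deterministic 4×4 Frozen Lake with the slides' map is hand-coded below; all helper names and the episode count are my own choices:

```python
import numpy as np

# Deterministic 4x4 Frozen Lake from the slides (S=start, F=frozen, H=hole, G=goal).
MAP = "SFFF" "FHFH" "FFFH" "HFFG"
LEFT, DOWN, RIGHT, UP = 0, 1, 2, 3

def step(s, a):
    """Apply action a in state s; return (next_state, reward, done)."""
    row, col = divmod(s, 4)
    if a == LEFT:  col = max(col - 1, 0)
    if a == DOWN:  row = min(row + 1, 3)
    if a == RIGHT: col = min(col + 1, 3)
    if a == UP:    row = max(row - 1, 0)
    s2 = row * 4 + col
    tile = MAP[s2]
    return s2, float(tile == "G"), tile in "GH"   # reward 1 only at the goal

rng = np.random.default_rng(0)
Q = np.zeros([16, 4])             # the 16x4 table, all zeros initially

for episode in range(5000):
    s, done = 0, False
    for t in range(100):          # step cap for safety
        a = int(rng.integers(4))  # explore: pick a random action
        s2, r, done = step(s, a)
        Q[s, a] = r + np.max(Q[s2])   # dummy update: no learning rate, no discount
        s = s2
        if done:
            break
```

Note that without a discount factor, every safe route eventually saturates at Q = 1, so the greedy path is not unique — the table still reaches the goal, but ties between actions remain unresolved.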
Next
Lab: Dummy Q-learning Table