lab 4: q-learning (table) - github pageshunkim.github.io/ml/rl/rl-l04.pdflab 4: q-learning (table)...

11

Lab 4: Q-learning (table) exploit&exploration and discounted future reward Reinforcement Learning with TensorFlow&OpenAI Gym Sung Kim <[email protected]>

Upload: others

Post on 12-Jul-2020

8 views

Category:

Documents

0 download

Report

Download

Embed Size (px):

TRANSCRIPT

Page 1: Lab 4: Q-learning (table) - GitHub Pageshunkim.github.io/ml/RL/rl-l04.pdfLab 4: Q-learning (table) exploit&exploration and discounted future reward Reinforcement Learning with TensorFlow&OpenAI

Lab 4: Q-learning (table)exploit&exploration and discounted future reward

Reinforcement Learning with TensorFlow&OpenAI GymSung Kim <[email protected]>

Page 2: Lab 4: Q-learning (table) - GitHub Pageshunkim.github.io/ml/RL/rl-l04.pdfLab 4: Q-learning (table) exploit&exploration and discounted future reward Reinforcement Learning with TensorFlow&OpenAI

Exploit VS Exploration: decaying E-greedy

for i in range (1000)

e = 0.1 / (i+1)

if random(1) < e: a = random

else: a = argmax(Q(s, a))

Page 3: Lab 4: Q-learning (table) - GitHub Pageshunkim.github.io/ml/RL/rl-l04.pdfLab 4: Q-learning (table) exploit&exploration and discounted future reward Reinforcement Learning with TensorFlow&OpenAI

Exploit VS Exploration: add random noise

0.5 0.6 0.3 0.2 0.5

for i in range (1000) a = argmax(Q(s, a) + random_values / (i+1))

Page 4: Lab 4: Q-learning (table) - GitHub Pageshunkim.github.io/ml/RL/rl-l04.pdfLab 4: Q-learning (table) exploit&exploration and discounted future reward Reinforcement Learning with TensorFlow&OpenAI

Discounted reward ( = 0.9)

1

Page 5: Lab 4: Q-learning (table) - GitHub Pageshunkim.github.io/ml/RL/rl-l04.pdfLab 4: Q-learning (table) exploit&exploration and discounted future reward Reinforcement Learning with TensorFlow&OpenAI

Q-learning algorithm

Page 6: Lab 4: Q-learning (table) - GitHub Pageshunkim.github.io/ml/RL/rl-l04.pdfLab 4: Q-learning (table) exploit&exploration and discounted future reward Reinforcement Learning with TensorFlow&OpenAI

Code: setup

https://medium.com/emergent-future/simple-reinforcement-learning-with-tensorflow-part-0-q-learning-with-tables-and-neural-networks-d195264329d0#.pjz9g59ap

Page 7: Lab 4: Q-learning (table) - GitHub Pageshunkim.github.io/ml/RL/rl-l04.pdfLab 4: Q-learning (table) exploit&exploration and discounted future reward Reinforcement Learning with TensorFlow&OpenAI

Code: Q learning

https://medium.com/emergent-future/simple-reinforcement-learning-with-tensorflow-part-0-q-learning-with-tables-and-neural-networks-d195264329d0#.pjz9g59ap

Page 8: Lab 4: Q-learning (table) - GitHub Pageshunkim.github.io/ml/RL/rl-l04.pdfLab 4: Q-learning (table) exploit&exploration and discounted future reward Reinforcement Learning with TensorFlow&OpenAI

Code: results

https://medium.com/emergent-future/simple-reinforcement-learning-with-tensorflow-part-0-q-learning-with-tables-and-neural-networks-d195264329d0#.pjz9g59ap

Page 9: Lab 4: Q-learning (table) - GitHub Pageshunkim.github.io/ml/RL/rl-l04.pdfLab 4: Q-learning (table) exploit&exploration and discounted future reward Reinforcement Learning with TensorFlow&OpenAI

Code: e-greedy

https://medium.com/emergent-future/simple-reinforcement-learning-with-tensorflow-part-0-q-learning-with-tables-and-neural-networks-d195264329d0#.pjz9g59ap

Page 10: Lab 4: Q-learning (table) - GitHub Pageshunkim.github.io/ml/RL/rl-l04.pdfLab 4: Q-learning (table) exploit&exploration and discounted future reward Reinforcement Learning with TensorFlow&OpenAI

Code: e-greedy results

https://medium.com/emergent-future/simple-reinforcement-learning-with-tensorflow-part-0-q-learning-with-tables-and-neural-networks-d195264329d0#.pjz9g59ap

Page 11: Lab 4: Q-learning (table) - GitHub Pageshunkim.github.io/ml/RL/rl-l04.pdfLab 4: Q-learning (table) exploit&exploration and discounted future reward Reinforcement Learning with TensorFlow&OpenAI

Next

Nondeterministic/

Stochastic worlds

Lab 7: DQN 1 (NIPS 2013) - GitHub Pageshunkim.github.io/ml/RL/rl07-l1.pdf · Lab 7: DQN 1 (NIPS 2013) Reinforcement Learning with TensorFlow&OpenAI Gym Sung Kim

t learning RL Barto Sutton forthcoming Bertsek · 2005. 1. 10. · Reinforcemen t learning RL Barto Sutton forthcoming Bertsek as Tsitsik lis applies naturally to the case of autonomous

Deep Reinforcement Learning - David Silver · Reinforcement Learning: AI = RL RL is a general-purpose framework for arti cial intelligence I RL is for an agent with the capacity to

Reinforcement Learning (RL)

Lecture 12: Policy Optimization IIierg6130/2019/slide/lecture12.pdf · 2 Policy-based RL: solve RL mainly through learning 1 Machine learning and deep learning 2 Representative algorithms:

Bandits, Active Learning, Bayesian RL and Global

Learning to Optimize Join Queries with Deep RL€¦ · Learning to Optimize Join Queries with Deep RL Join Optimization DQ: Deep Q-learning for join optimization Discussion CS294

© Jude Shavlik 2006, David Page 2007 CS 760 – Machine Learning (UW-Madison)RL Lecture, Slide 1 Reinforcement Learning (RL) Consider an “agent” embedded

A2-RL: Aesthetics Aware Reinforcement Learning for Image …openaccess.thecvf.com/content_cvpr_2018/papers/Li_A2-RL... · 2018-06-11 · A2-RL: Aesthetics Aware Reinforcement Learning

RC and RL Circuits - Electronics â€“ Online Distance Learning

A Brief Introduction to Reinforcement Learningais.informatik.uni-freiburg.de/.../deep_learning... · Outline • Characteristics of Reinforcement Learning (RL) • The RL Problem

Reinforcement Learning Dealing with Complexity and Safety in RL

IERG 5350 Reinforcement Learning Lecture 1: Course Overview · •RL algorithms: Q-learning, policy gradients, actor-critic. •Others advanced topics: inverse RL, imitation learning,

Reinforcement Learning - RL in finite MDPshome.deib.polimi.it/restelli/MyWebSite/pdf/rl4.pdf · Reinforcement Learning RL in ﬁnite MDPs ... i 0 i = 1and P i 0 2 i

Continuous Action Reinforcement Learning Automata, an RL

Reinforcement Learning. Overview Introduction Q-learning Exploration Exploitation Evaluating RL algorithms On-Policy learning: SARSA Model-based

ML with Tensorflow - GitHub Pageshunkim.github.io/ml/lec0.pdf · Goals •Basic understanding of machine learning algorithms Linear regression, Logistic regression (classiﬁcation)-Neural

Multiagent Learning Paradigmspstone/Papers/bib2html-links/LNAI18-Tuyls.pdf1. Online RL towards individual utility 2. Online RL towards social welfare 3. Co-evolutionary learning 4

Tutorial: Deep Reinforcement Learning€¦ · Reinforcement Learning in a nutshell RL is a general-purpose framework for decision-making I RL is for an agent with the capacity to

Excerpt from Blue Jasmine - Edl · 2018-07-28 · 205053P SAMPLES OF STANDARDS STUDENTS ARE LEARNING THIS NINE WEEKS: 5TH GRADE ELA STANDARDS: RL.5.1, RL.5.2, RL.5.3, RL.5.4, RL.5.5,

Deep Reinforcement Learning - GitHub Pages · 1 Q-Learning 2 Value based Deep RL 3 Policy based Deep RL 4 Lessons Srivatsan Srinivasan (IACS) Deep RL March 27, 2019 2 / 31. Q-Learning

SM RL: S M R LEARNING IN UNSTABLE ENVIRONMENTS

Sequence-to-Sequence Reinforcement Learning Seq2seq RL for

Lab 2: Playing OpenAI Gym Games - GitHub Pageshunkim.github.io/ml/RL/rl-l02.pdf · Lab 2: Playing OpenAI Gym Games Reinforcement Learning with TensorFlow&OpenAI Gym Sung Kim

RL-CycleGAN: Reinforcement Learning Aware Simulation-to-Real · 2020. 6. 29. · RL-CycleGAN: Reinforcement Learning Aware Simulation-To-Real Kanishka Rao1, Chris Harris1, Alex Irpan1,

Lab 6-1: Q Network - GitHub Pageshunkim.github.io/ml/RL/rl06-l1.pdfPlaying Atari with Deep Reinforcement Learning - University of Toronto by V Mnih et al. Algorithm Playing Atari with

An introduction to reinforcement learning (rl)

ML with Tensorflow - GitHub Pageshunkim.github.io/ml/lec7.pdf · 2017-10-02 · Lecture 7-1 Application & Tips: Learning rate, data preprocessing, overﬁtting Sung Kim

Logically-Constrained Neural Fitted Q-iteration · Reinforcement Learning (RL) is a successful paradigm for training ... mediate application of LTL in RL is for safe learning: for

RL-CycleGAN: Reinforcement Learning Aware Simulation-to-Realopenaccess.thecvf.com/content_CVPR_2020/papers/Rao_RL... · 2020-06-07 · RL-CycleGAN: Reinforcement Learning Aware Simulation-To-Real

Soar-RL: Reinforcement Learning and Soar

Lab 6-2: Q Network for Cart Pole - GitHub Pageshunkim.github.io/ml/RL/rl06-l2.pdf · 2017-10-02 · import numpy as np import tensor flow as tf import gym gym. make( CartPole—vØ

Manix Au - Automatic State Construction using DT for RL agen · Manix Au 04.03.2004 . 8 Introduction To The Thesis Reinforcement Learning Reinforcement Learning (RL) is a computational

A2-RL: Aesthetics Aware Reinforcement Learning for Image … · 2018-03-13 · A2-RL: Aesthetics Aware Reinforcement Learning for Image Cropping Debang Li 1;2, Huikai Wu , Junge Zhang

AI-based Decision Support for Sustainable Operation of ......Learning (RL), a machine learning algorithm suitable for complex and large scale optimization problems [25]. RL outperforms