![Page 1: Lab 4: Q-learning (table) - GitHub Pageshunkim.github.io/ml/RL/rl-l04.pdfLab 4: Q-learning (table) exploit&exploration and discounted future reward Reinforcement Learning with TensorFlow&OpenAI](https://reader034.vdocument.in/reader034/viewer/2022042910/5f3fefc9c58d2b57f27b2c1f/html5/thumbnails/1.jpg)
Lab 4: Q-learning (table)
Exploit & exploration and discounted future reward
Reinforcement Learning with TensorFlow & OpenAI Gym
Sung Kim <[email protected]>
Exploit VS Exploration: decaying E-greedy
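for i in range(1000):
    e = 0.1 / (i + 1)           # exploration rate decays with each episode
    if random(1) < e:
        a = random              # explore: pick a random action
    else:
        a = argmax(Q(s, a))     # exploit: pick the best-known action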
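The pseudocode above can be sketched as a runnable function. This is a minimal illustration, not the lab's actual code: the function name `epsilon_greedy_action`, the `q_row` argument (one row of a Q-table, i.e. the Q-values of every action in the current state), and the injectable `rng` are assumptions made for the example.

```python
import random

def epsilon_greedy_action(q_row, episode, rng=random):
    """Pick an action index from one Q-table row using decaying e-greedy.

    q_row   -- list of Q-values, one per action, for the current state
    episode -- zero-based episode counter (the `i` of the slide)
    rng     -- object with random() and randrange(); defaults to the random module
    """
    e = 0.1 / (episode + 1)  # same schedule as the slide: e shrinks every episode
    if rng.random() < e:
        # Explore: choose any action uniformly at random.
        return rng.randrange(len(q_row))
    # Exploit: choose the action with the highest Q-value (first one on ties).
    return max(range(len(q_row)), key=lambda a: q_row[a])
```

With this schedule the chance of a random action is 10% in the first episode and 1% by the tenth, so the agent explores early and mostly exploits later.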
Exploit VS Exploration: add random noise
0.5 0.6 0.3 0.2 0.5
for i in range(1000):
    a = argmax(Q(s, a) + random_values / (i + 1))
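A runnable version of the noise-based selection might look like the sketch below. The slide does not say how `random_values` is drawn; standard-normal noise is assumed here, and the names `noisy_greedy_action`, `q_row`, and `rng` are hypothetical.

```python
import random

def noisy_greedy_action(q_row, episode, rng=random):
    """Pick the argmax of Q-values plus noise that shrinks as episodes pass.

    Assumption: the noise is standard normal; the slide only says "random_values".
    """
    scale = 1.0 / (episode + 1)  # noise is divided by (i + 1), as on the slide
    noisy = [q + rng.gauss(0.0, 1.0) * scale for q in q_row]
    return max(range(len(noisy)), key=lambda a: noisy[a])
```

Unlike e-greedy, which picks uniformly when it explores, this variant still favors actions whose Q-values are already high: a second-best action is more likely to be tried than a clearly bad one.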
Discounted reward (γ = 0.9)
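To make the effect of the discount factor concrete, the sketch below computes the value of a reward of 1 that arrives now, one step later, and two steps later. The function name `discounted_return` and the reward sequences are illustrative assumptions; only the value 0.9 comes from the slide.

```python
GAMMA = 0.9  # discount factor from the slide

def discounted_return(rewards, gamma=GAMMA):
    """Sum a reward sequence, weighting the reward t steps ahead by gamma**t."""
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total

# A single reward of 1 reached after 0, 1, and 2 steps (assumed example).
values = [discounted_return([0] * k + [1]) for k in range(3)]
print(values)  # roughly 1, 0.9, and 0.81
```

Because a later reward is worth less, a path that reaches the goal in fewer steps ends up with higher Q-values than a detour to the same goal.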
Q-learning algorithm
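The slide's algorithm listing did not survive extraction, so the following is only a sketch of tabular Q-learning for a deterministic world, combining the decaying-noise exploration and the discounted update Q(s, a) <- r + γ max Q(s', a'). The five-state corridor, the `step` and `train` functions, and the reward of 1 at the goal are all assumptions made so the example is self-contained.

```python
import random

N_STATES = 5        # hypothetical corridor: states 0..4, goal at state 4
ACTIONS = (-1, +1)  # action 0 moves left, action 1 moves right
GAMMA = 0.9         # discount factor

def step(state, action_index):
    """Deterministic toy environment: returns (next_state, reward, done)."""
    next_state = min(max(state + ACTIONS[action_index], 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done

def train(episodes=200, seed=0):
    rng = random.Random(seed)
    # Q-table initialised to zeros: one row per state, one column per action.
    Q = [[0.0] * len(ACTIONS) for _ in range(N_STATES)]
    for i in range(episodes):
        s, done = 0, False
        while not done:
            # Explore with noise that decays as 1 / (i + 1).
            noisy = [q + rng.gauss(0.0, 1.0) / (i + 1) for q in Q[s]]
            a = max(range(len(ACTIONS)), key=lambda k: noisy[k])
            s2, r, done = step(s, a)
            # Deterministic update: immediate reward plus discounted best future value.
            Q[s][a] = r + GAMMA * max(Q[s2])
            s = s2
    return Q

if __name__ == "__main__":
    for state, row in enumerate(train()):
        print(state, row)
```

There is no learning rate in this update because the world is assumed deterministic: the same action in the same state always yields the same reward and next state, so the new estimate can overwrite the old one.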
Code: setup
https://medium.com/emergent-future/simple-reinforcement-learning-with-tensorflow-part-0-q-learning-with-tables-and-neural-networks-d195264329d0#.pjz9g59ap
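The setup code on this slide was an image and is not recoverable here. As a rough stand-in, the sketch below builds a deterministic 4x4 grid in the FrozenLake style and a zero-initialised Q-table. The layout, the action order, and the `reset` and `step` helpers are assumptions for illustration; they mimic the general shape of a Gym environment but are not the Gym API.

```python
# Hypothetical stand-in for the environment setup; not the Gym API.
GRID = ["SFFF",
        "FHFH",
        "FFFH",
        "HFFG"]  # S: start, F: free, H: hole, G: goal (assumed layout)
N_ROWS, N_COLS = len(GRID), len(GRID[0])
N_STATES = N_ROWS * N_COLS
MOVES = [(0, -1), (1, 0), (0, 1), (-1, 0)]  # left, down, right, up

def reset():
    """Start every episode in the top-left cell (state 0)."""
    return 0

def step(state, action):
    """Move deterministically; return (next_state, reward, done)."""
    row, col = divmod(state, N_COLS)
    d_row, d_col = MOVES[action]
    # Moves that would leave the grid keep the agent on the edge cell.
    row = min(max(row + d_row, 0), N_ROWS - 1)
    col = min(max(col + d_col, 0), N_COLS - 1)
    cell = GRID[row][col]
    reward = 1.0 if cell == "G" else 0.0
    done = cell in "HG"  # episode ends in a hole or at the goal
    return row * N_COLS + col, reward, done

# Q-table: one row per state, one column per action, all zeros to start.
Q = [[0.0] * len(MOVES) for _ in range(N_STATES)]
```

For example, `step(reset(), 2)` moves right from the start cell and returns the next state, a reward of 0, and `done = False`.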
Code: Q-learning
Code: results
Code: e-greedy
Code: e-greedy results
Next: Nondeterministic/stochastic worlds