Lecture 3: Q-learning (table)
Reinforcement Learning with TensorFlow & OpenAI Gym
Sung Kim <[email protected]>
Try Frozen Lake, Real Game?
Frozen Lake: Random?
Frozen Lake: Even if you know the way, ask. ("아는 길도 물어가라" — a Korean proverb: ask your way even along a road you know)
Q-function (state-action value function)

Q(state, action)
(1) state, (2) action → (3) quality (reward)
Policy using Q-function

Q(state, action), e.g.:
Q(s1, LEFT) = 0, Q(s1, RIGHT) = 0.5, Q(s1, UP) = 0, Q(s1, DOWN) = 0.3
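Given these example Q-values, acting by the Q-function just means picking the action with the largest value — a minimal sketch (the dictionary below holds the numbers from the slide):

```python
# Greedy policy: pick the action whose Q-value is largest.
# Q-values are the example numbers from the slide.
q = {"LEFT": 0.0, "RIGHT": 0.5, "UP": 0.0, "DOWN": 0.3}

best_action = max(q, key=q.get)  # argmax over actions
print(best_action)  # RIGHT
```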
Optimal Policy, and Max Q

Q(state, action)
Max Q = max over a of Q(s, a)
Optimal policy: π*(s) = argmax_a Q(s, a)
Frozen Lake: optimal policy with Q
Frozen Lake: optimal policy with Q
[Diagram: state s, action a, next state s′]
Finding, Learning Q

• Assume (believe) Q in s′ exists!
• My condition:
  - I am in state s
  - when I do action a, I'll go to s′
  - when I do action a, I'll get reward r
  - Q in s′, Q(s′, a′), exists!
• How can we express Q(s, a) using Q(s′, a′)?
Learning Q(s, a)?
[Diagram: from state s, taking action a yields reward r and next state s′]
State, action, reward

S F F F
F H F H
F F F H
H F F G

(S = start, F = frozen/safe, H = hole, G = goal)
Future reward
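The slide's picture summarizes standard bookkeeping: the total future reward from time t is the sum of all rewards from t onward, which can be written recursively:

```latex
R_t = r_t + r_{t+1} + \cdots + r_n = r_t + R_{t+1}
```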
Learning Q(s, a)?
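The answer this lecture uses — the "dummy" update, with no learning rate and no discount factor yet — expresses Q(s, a) through the immediate reward plus the best Q-value of the next state:

```latex
\hat{Q}(s, a) \leftarrow r + \max_{a'} \hat{Q}(s', a')
```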
Learning Q(s, a): 16×4 Table
16 states and 4 actions (up, down, left, right)
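Such a table is just a 16×4 array of numbers; in NumPy (the variable name is my own):

```python
import numpy as np

# One row per state (16), one column per action (4: up, down, left, right).
Q = np.zeros([16, 4])

print(Q.shape)  # (16, 4)
```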
Learning Q(s, a): Table (initial Q values are 0)

[Table: the full 16×4 Q-table with every entry 0]
Learning Q(s, a) Table (with many trials), initial Q values are 0
Learning Q(s, a) Table (with many trials), initial Q values are 0

Q(s14, a_RIGHT) = r = 1
Q(s13, a_RIGHT) = r + max_a Q(s14, a) = 0 + max(0, 0, 1, 0) = 1
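These two backups can be checked numerically. The indices below follow the slide's state numbering (s14 is the square just left of the goal); the action index chosen for RIGHT is my own labeling:

```python
import numpy as np

Q = np.zeros([16, 4])   # 16 states x 4 actions, all zero initially
RIGHT = 2               # hypothetical action index for "right"

# Stepping into the goal from state 14: Q(s14, RIGHT) = r = 1
Q[14, RIGHT] = 1

# One step earlier, in state 13: Q(s13, RIGHT) = r + max_a Q(s14, a)
r = 0
Q[13, RIGHT] = r + np.max(Q[14, :])   # 0 + max(0, 0, 1, 0) = 1

print(Q[13, RIGHT])  # 1.0
```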
Learning Q(s, a) Table: one success! (initial Q values are 0)

[Table: entries of 1 along the one successful path; all other entries still 0]
Learning Q(s, a) Table: optimal policy

[Table: following the actions with Q = 1 traces the path from S to G]
Dummy Q-learning algorithm
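A runnable sketch of the dummy algorithm: initialize the table to 0, act (here: randomly, for exploration), and apply Q(s, a) ← r + max Q(s′, ·) after every step. To keep the sketch self-contained rather than depending on OpenAI Gym, a deterministic 4×4 Frozen Lake with the slides' map is hand-coded below; all helper names and the episode count are my own choices:

```python
import numpy as np

# Deterministic 4x4 Frozen Lake from the slides (S=start, F=frozen, H=hole, G=goal).
MAP = "SFFF" "FHFH" "FFFH" "HFFG"
LEFT, DOWN, RIGHT, UP = 0, 1, 2, 3

def step(s, a):
    """Apply action a in state s; return (next_state, reward, done)."""
    row, col = divmod(s, 4)
    if a == LEFT:  col = max(col - 1, 0)
    if a == DOWN:  row = min(row + 1, 3)
    if a == RIGHT: col = min(col + 1, 3)
    if a == UP:    row = max(row - 1, 0)
    s2 = row * 4 + col
    tile = MAP[s2]
    return s2, float(tile == "G"), tile in "GH"   # reward 1 only at the goal

rng = np.random.default_rng(0)
Q = np.zeros([16, 4])             # the 16x4 table, all zeros initially

for episode in range(5000):
    s, done = 0, False
    for t in range(100):          # step cap for safety
        a = int(rng.integers(4))  # explore: pick a random action
        s2, r, done = step(s, a)
        Q[s, a] = r + np.max(Q[s2])   # dummy update: no learning rate, no discount
        s = s2
        if done:
            break
```

Note that without a discount factor, every safe route eventually saturates at Q = 1, so the greedy path is not unique — the table still reaches the goal, but ties between actions remain unresolved.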
Next
Lab: Dummy Q-learning Table