Part 4: Monte Carlo Learning to solve Blackjack
Introduction to Reinforcement Learning
R. S. Sutton and A. G. Barto: Reinforcement Learning: An Introduction 1
Monte Carlo Methods for Solving MDPs
❐ Monte Carlo methods are learning methods: experience → values, policy
❐ Monte Carlo methods can be used in two ways:
  – On-line: no model necessary and still attains optimality
  – Simulated: planning without a full probability model
❐ Monte Carlo methods learn from complete sample returns
  – Only defined for episodic tasks
Monte Carlo Policy Evaluation
❐ Goal: learn vπ
❐ Given: some number of episodes under π which contain s
❐ Idea: average returns observed after visits to s
❐ Every-Visit MC: average returns for every time s is visited in an episode
❐ First-visit MC: average returns only for first time s is visited in an episode
❐ Both converge asymptotically to vπ
First-visit Monte Carlo policy evaluation
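The algorithm itself appears on the slide only as an image. A minimal Python sketch of first-visit MC prediction, assuming a `generate_episode` function that samples one episode under the policy π being evaluated, could look like this (all names are illustrative):

```python
from collections import defaultdict

def first_visit_mc(generate_episode, num_episodes, gamma=1.0):
    """Estimate v_pi by averaging returns after the first visit to each state.

    generate_episode() is assumed to return [(state, reward), ...] for one
    episode sampled under pi, where reward follows leaving that state.
    """
    returns_sum = defaultdict(float)
    returns_count = defaultdict(int)
    for _ in range(num_episodes):
        episode = generate_episode()
        # Compute the return G following each time step, working backwards.
        G = 0.0
        returns_after = []
        for state, reward in reversed(episode):
            G = gamma * G + reward
            returns_after.append((state, G))
        returns_after.reverse()
        # First-visit: only the first occurrence of each state counts.
        seen = set()
        for state, G in returns_after:
            if state not in seen:
                seen.add(state)
                returns_sum[state] += G
                returns_count[state] += 1
    return {s: returns_sum[s] / returns_count[s] for s in returns_sum}
```

Switching the first-visit check off (dropping the `seen` set) turns this into every-visit MC; both variants converge asymptotically to vπ.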
Backup diagram for Monte Carlo
❐ Entire episode included
❐ Only one choice at each state
❐ Time required to estimate one state does not depend on the total number of states
Blackjack example
❐ Object: have your card sum be greater than the dealer's without exceeding 21.
❐ States (200 of them):
  – current sum (12–21)
  – dealer's showing card (ace–10)
  – do I have a usable ace?
❐ Reward: +1 for winning, 0 for a draw, −1 for losing
❐ Actions: stick (stop receiving cards), hit (receive another card)
❐ Policy: stick if my sum is 20 or 21, else hit
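A toy simulator for this setup can make the episode structure concrete. This sketch assumes an infinite deck and a dealer who hits below 17, as in the book's example; the function names are illustrative:

```python
import random

def draw_card():
    # Infinite deck: cards 1..10, with 10 standing in for all face cards.
    return min(random.randint(1, 13), 10)

def hand_value(cards):
    """Return (sum, usable_ace), counting one ace as 11 when it doesn't bust."""
    total = sum(cards)
    if 1 in cards and total + 10 <= 21:
        return total + 10, True
    return total, False

def play_episode():
    """One hand under the slide's policy: stick on 20 or 21, else hit.

    Returns +1 for a win, 0 for a draw, -1 for a loss.
    """
    player = [draw_card(), draw_card()]
    dealer = [draw_card(), draw_card()]
    while hand_value(player)[0] < 20:       # policy: hit below 20
        player.append(draw_card())
    if hand_value(player)[0] > 21:
        return -1                           # player busts
    while hand_value(dealer)[0] < 17:       # dealer hits below 17
        dealer.append(draw_card())
    p, d = hand_value(player)[0], hand_value(dealer)[0]
    if d > 21 or p > d:
        return +1
    return 0 if p == d else -1
```

Each call to `play_episode` yields one complete sample return, which is exactly what MC policy evaluation averages.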
Blackjack value functions
Monte Carlo Exploring Starts
Fixed point is optimal policy π*
Now proven (almost)
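The MC ES algorithm on this slide alternates evaluation and greedy improvement, starting each episode from a randomly chosen state–action pair. A minimal sketch, assuming a `simulate(state, action, policy)` function that returns one episode as `[(s, a, r), ...]` (these names are assumptions, not from the slide):

```python
import random
from collections import defaultdict

def mc_exploring_starts(states, actions, simulate, num_episodes, gamma=1.0):
    """Monte Carlo control with exploring starts (sketch).

    simulate(s0, a0, policy) is assumed to return one episode
    [(s0, a0, r1), (s1, a1, r2), ...] that begins by taking a0 in s0
    and thereafter follows `policy`.
    """
    Q = defaultdict(float)
    counts = defaultdict(int)
    policy = {s: random.choice(actions) for s in states}
    for _ in range(num_episodes):
        # Exploring start: every (s, a) pair has probability > 0 of starting.
        s0, a0 = random.choice(states), random.choice(actions)
        episode = simulate(s0, a0, policy)
        G = 0.0
        for t in range(len(episode) - 1, -1, -1):
            s, a, r = episode[t]
            G = gamma * G + r
            # First-visit update for (s, a).
            if (s, a) not in {(x, y) for x, y, _ in episode[:t]}:
                counts[(s, a)] += 1
                Q[(s, a)] += (G - Q[(s, a)]) / counts[(s, a)]
                # Policy improvement: greedy with respect to Q.
                policy[s] = max(actions, key=lambda b: Q[(s, b)])
    return Q, policy
```

The greedy policy is the fixed point of this loop: once it stops changing, it is greedy with respect to its own value function, i.e. π*.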
Blackjack example continued
❐ Exploring starts
❐ Initial policy as described before
On-policy Monte Carlo Control
❐ On-policy: learn about the policy currently being executed
❐ How do we get rid of exploring starts?
  – Need soft policies: π(a|s) > 0 for all s and a
  – e.g. ε-soft (ε-greedy) policy:
      π(a|s) = 1 − ε + ε/|A(s)|  for the greedy action
      π(a|s) = ε/|A(s)|          for each non-max action
❐ Similar to GPI: move policy towards greedy policy (i.e. ε-greedy)
❐ Converges to best ε-soft policy
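The ε-soft probabilities above can be sketched directly (function names are illustrative):

```python
import random

def epsilon_soft_probs(q_values, epsilon):
    """Action probabilities for an epsilon-greedy (hence epsilon-soft) policy.

    q_values: dict mapping action -> Q(s, a) for one state s.
    The greedy action gets 1 - eps + eps/|A(s)|; every other action
    gets eps/|A(s)|, so all probabilities stay strictly positive.
    """
    n = len(q_values)
    greedy = max(q_values, key=q_values.get)
    base = epsilon / n
    return {a: base + (1.0 - epsilon if a == greedy else 0.0)
            for a in q_values}

def sample_action(q_values, epsilon):
    """Draw one action from the epsilon-soft distribution."""
    probs = epsilon_soft_probs(q_values, epsilon)
    actions, weights = zip(*probs.items())
    return random.choices(actions, weights=weights)[0]
```

Because every action keeps probability at least ε/|A(s)|, exploration is maintained without exploring starts.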
Summary for Monte Carlo Methods
❐ Monte Carlo methods learn directly from experience
  – On-line: no model necessary and still attains optimality
  – Simulated: no need for a full model
❐ Monte Carlo methods learn from complete sample returns
  – Only defined for episodic tasks
❐ The Monte Carlo exploring starts algorithm avoids the vexing issue of maintaining exploration