choosemove=16 19 v =30 points. legalmoves= 16 19, or simulatemove = 11 15, or …. 16 19,

Upload: daphne-dunkum

Post on 14-Dec-2015

TRANSCRIPT

Page 1: ChooseMove=16  19 V =30 points. LegalMoves= 16  19, or SimulateMove = 11  15, or …. 16  19,

ChooseMove = 16 19

V = 30 points

Page 2:

LegalMoves = 16 19, or 11 15, or ….

SimulateMove = 16 19

Page 3:

For all legal moves:

- simulate that move

- check the value of the result

Pick the best move

LegalMoves = 12 16, or 11 15, or ….

SimulateMove = 12 16

V = 30 points
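The move-selection loop on this slide can be sketched in a few lines. This is only an illustration: `choose_move` and the three callbacks are hypothetical names, and the toy stand-ins at the bottom replace a real checkers engine, which the slides do not show.

```python
from typing import Callable, Iterable, Tuple, TypeVar

Board = TypeVar("Board")
Move = Tuple[int, int]  # e.g. (16, 19): from square 16 to square 19


def choose_move(
    board: Board,
    legal_moves: Callable[[Board], Iterable[Move]],
    simulate_move: Callable[[Board, Move], Board],
    value: Callable[[Board], float],
) -> Move:
    """Return the legal move whose resulting board has the highest value."""
    best_move, best_score = None, float("-inf")
    for move in legal_moves(board):          # for all legal moves
        result = simulate_move(board, move)  # simulate that move
        score = value(result)                # check the value of the result
        if score > best_score:
            best_move, best_score = move, score
    if best_move is None:
        raise ValueError("no legal moves from this board")
    return best_move                         # pick the best move


# Toy stand-ins (assumptions): a "board" is just the tuple of moves played.
def toy_legal_moves(board):
    return [(16, 19), (11, 15)]


def toy_simulate(board, move):
    return board + (move,)


def toy_value(board):
    # Pretend the board reached by 16 19 is worth 30 points, any other 10.
    return 30.0 if board[-1] == (16, 19) else 10.0


print(choose_move((), toy_legal_moves, toy_simulate, toy_value))
```

Passing the three functions in as arguments keeps the loop independent of how boards are stored, so the same loop works once V is replaced by a learned approximation.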

Page 4:

We’ll make our computer program learn the function V

- so when it sees a board, it will assign a score to the board

- it has to learn how to assign the correct score to a board

First, we have to define how our function V is supposed to behave…

Page 5:

…but what about boards in the middle of a game?

[Diagram: board b, successor(b), …, final board b’ with V(b’) = 100]

Page 6:

…but what about boards in the middle of a game?

[Diagram: board b, successor(b), …, final board b’ with V(b’) = 100; the boards leading to it are labelled V(b) = V(b’) = 100]

Page 7:

[Diagram: board b, successor(b), …, final boards b’. Labels: V(b’) = 100; V(b) = V(b’) = 100; V(b) = V(b’) = 0; V(b’) = -100]

NB: if this board state occurs in 50% wins and 50% losses then V(b) will eventually be 0

Page 8:

[Diagram repeated from page 7]

NB: if this board state occurs in 75% wins and 25% losses then V(b) will eventually be 50

Page 9:

[Diagram repeated from page 7]

so each board’s score will eventually reflect the mix of games that you can win, lose or draw from that board (assuming an ideal scoring function; in fact we often only approximate this)
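The two NB figures follow from averaging the final scores over the games that pass through the board. A small sketch of that arithmetic, assuming a win scores 100 and a loss -100 as on the slides; the draw score of 0 and the name `expected_score` are assumptions.

```python
def expected_score(p_win: float, p_lose: float, p_draw: float = 0.0,
                   win: float = 100.0, lose: float = -100.0,
                   draw: float = 0.0) -> float:
    """Average final score over the games that pass through a board."""
    if abs(p_win + p_lose + p_draw - 1.0) > 1e-9:
        raise ValueError("the three fractions must sum to 1")
    return p_win * win + p_lose * lose + p_draw * draw


print(expected_score(0.50, 0.50))  # 50% wins, 50% losses
print(expected_score(0.75, 0.25))  # 75% wins, 25% losses
```

The first call gives 0 and the second 50, matching the two notes: 0.75 × 100 + 0.25 × (-100) = 50.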

Page 10:

…but what about boards in the middle of a game?

V(b) = V(b’) = 100

that’s our definition of V – this is how our computer program’s V should learn how to behave…

Page 11:

But in the middle of a game, how can you calculate the path to the final board?

- generate all possibilities? works for tic-tac-toe, but for chess …takes too long!

- approximate V and call it V’ (Vhat)

V’(b) ≈ V(b) …so we’ll learn an approximation to V

[Diagram repeated from page 6]

Page 12:

Now we’ll choose our representation of V’

(what is a board anyway…?)

1) big table?

input: board | score

[board diagram] | 50

[board diagram] | 90

….
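As a rough sketch of the big-table option, a dictionary can map each board to its score. The board encoding here (a short tuple of integers) is purely an assumption for illustration; only the scores 50 and 90 come from the slide.

```python
from typing import Dict, Tuple

# Hypothetical encoding: one integer per playable square
# (0 = empty, 1 = white piece, 2 = red piece). A real checkers
# board has 32 playable squares; four are used here to keep it short.
Board = Tuple[int, ...]

score_table: Dict[Board, float] = {
    (1, 1, 0, 2): 50.0,
    (1, 0, 1, 2): 90.0,
}


def table_value(board: Board) -> float:
    """Look up a stored score; a board that was never stored has no entry."""
    return score_table[board]


print(table_value((1, 1, 0, 2)))
print((0, 1, 1, 2) in score_table)  # an unseen board is simply missing
```

The table needs one entry per board, so it says nothing about boards it has not seen, which is one reason to look at more compact representations.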

Page 13:

2) CLIPS rules?

IF my piece is near a side edge

THEN score = 80

IF my piece is near the opponent’s edge

THEN score = 90

….

Page 14:

3) polynomial function?

we define some variables…

X1 = number of white pieces on board

X2 = number of red pieces

X3 = number of white kings

X4 = number of red kings

X5 = number of white pieces threatened by red (can be captured on red’s next turn)

X6 = number of red pieces threatened by white

e.g. X1 = 12, X2 = 11, X3 = 0, X4 = 0, X5 = 1, X6 = 0

Page 15:

3) polynomial function?

arrange as linear combination: (quadratic function)

V’(b) = w0 + w1 x1 + w2 x2 + w3 x3 + w4 x4 + w5 x5 + w6 x6

Page 16:

3) polynomial function?

Computer program will change the values of the weights – it will learn what the weights should be to give the correct score for each board
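A minimal sketch of the weighted-sum representation, assuming V’ is a constant w0 plus one weight per variable. The feature values are the example from the slides; w0 = 23 and w1 = 0.5 are the values used elsewhere in the slides, while w2 to w6 and the name `v_hat` are made up for illustration.

```python
from typing import Sequence


def v_hat(weights: Sequence[float], features: Sequence[float]) -> float:
    """V'(b) = w0 + w1*x1 + ... + wn*xn for a board described by x1..xn."""
    if len(weights) != len(features) + 1:
        raise ValueError("need one weight per feature plus the constant w0")
    return weights[0] + sum(w * x for w, x in zip(weights[1:], features))


# x1..x6 for the example board: 12 white pieces, 11 red pieces,
# no kings, 1 white piece threatened, 0 red pieces threatened.
features = [12, 11, 0, 0, 1, 0]

# w0 and w1 as in the slides; w2..w6 are illustrative assumptions.
weights = [23.0, 0.5, -0.5, 2.0, -2.0, -1.0, 1.0]

print(v_hat(weights, features))  # 23 + 6 - 5.5 + 0 + 0 - 1 + 0
```

Learning then means changing the seven numbers in `weights`; the function itself never changes.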

Page 17:

learning process – playing and learning at same time

computer will play against itself

initialise V’ with random weights (w0=23 etc.)

V’(b) = 23 + 0.5 x1 +…

Page 18:

start a new game - for each board

(a) calculate V’ on all possible legal moves

board 1

LegalMoves = 12 16, or 11 15, or ….

SimulateMove = 12 16

For example, one of the boards resulting from a legal move might be

b = <x1=12, x2=12,…,x6=1>

V’(<x1=12, x2=12,…,x6=1>) = 23 + 0.5 x1 +… = 30

Page 19:

(a) calculate V’ on all possible legal moves

First time round, V’ is essentially random (because we set the weights randomly); as it learns, V’ should pick better successors

Page 20:

(a) calculate V’ on all possible legal moves

(b) pick the successor with the highest score

(c) evaluate error

[Diagram: board 1, red: 12 16, board 2]

At this point

board 1 = b

board 2 = successor(b)

Page 21:

(d) modify each weight to correct error

Page 22:

(d) modify each weight to correct error

[Weight-update equation not preserved in the transcript. Its parts were annotated: error; weight to update; learning rate; modify weight in proportion to size of attribute value]
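The equation on this slide did not survive in the transcript. A minimal sketch of one update that fits its annotations, assuming it is the standard least-mean-squares (LMS) rule: each weight moves by learning rate × error × attribute value, with error = training value - V’(b). The names `lms_update` and `learning_rate`, the rate 0.001, and the zero weights w2 to w6 are all assumptions.

```python
from typing import List, Sequence


def v_hat(weights: Sequence[float], features: Sequence[float]) -> float:
    """V'(b) = w0 + w1*x1 + ... + wn*xn."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], features))


def lms_update(weights: Sequence[float], features: Sequence[float],
               v_train: float, learning_rate: float = 0.001) -> List[float]:
    """Return new weights after one LMS step towards the training value."""
    error = v_train - v_hat(weights, features)  # how far off the score was
    xs = [1.0, *features]                       # x0 = 1 so that w0 moves too
    # Each weight changes in proportion to its own attribute value.
    return [w + learning_rate * error * x for w, x in zip(weights, xs)]


features = [12, 11, 0, 0, 1, 0]                # x1..x6 from the slides' example
weights = [23.0, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0]  # w2..w6 = 0 is an assumption

# Pretend the training value for this board is 100 (the score of a won game).
new_weights = lms_update(weights, features, v_train=100.0)

print(v_hat(weights, features))      # score before the update
print(v_hat(new_weights, features))  # score after: closer to 100
```

Weights whose attribute value is 0 on this board (x3, x4, x6) do not move at all, which is what "in proportion to size of attribute value" implies.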

Page 23:

(d) modify each weight to correct error

At this point the change probably won’t be in a useful direction, because the weights are all still random

Page 24:

learning process – playing and learning at same time

computer will play against itself

initialise V’ with random weights (w0=23 etc.)

start a new game - for each board

(a) calculate V’ on all possible legal moves

(b) pick the successor with the highest score

(c) evaluate error

(d) modify each weight to correct error

(e) repeat until end-of-game

[Diagram: board 1, red: 12 16, board 2, …., red wins]

Page 25:

(e) repeat until end-of-game

[Diagram: board 1, red: 12 16, board 2, …., red wins]

b is a final board state

Vtrain(b) = 100 …because we won

Page 26:

(e) repeat until end-of-game

now we’re shifting our V’ towards something useful (a little bit)

Page 27:

(e) repeat until end-of-game

1 million games later and our V’ might now predict a useful score for any board that we might see
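Steps (a) to (e) can be put together in one loop. The sketch below is not checkers: it uses a made-up counting game (start at 0, add 1 or 2 each move, landing on 5 wins with score 100, landing on 6 loses with score -100) so that it runs on its own. It also assumes that the training value of a non-final board is V’ of the chosen successor, that a final board's training value is the game result, and that there is no opponent. All function names and the one-indicator-per-position features are hypothetical.

```python
import random
from typing import List, Optional, Sequence, Tuple


def v_hat(weights: Sequence[float], xs: Sequence[float]) -> float:
    """Weighted sum of the attribute values (no separate constant here)."""
    return sum(w * x for w, x in zip(weights, xs))


def lms_update(weights, xs, v_train: float, rate: float) -> List[float]:
    error = v_train - v_hat(weights, xs)
    return [w + rate * error * x for w, x in zip(weights, xs)]


# --- toy game, an assumption standing in for checkers ---
def successors(n: int) -> List[int]:
    return [n + 1, n + 2]


def features(n: int) -> List[float]:
    return [1.0 if i == n else 0.0 for i in range(7)]  # positions 0..6


def final_score(n: int) -> Optional[float]:
    if n == 5:
        return 100.0   # win
    if n == 6:
        return -100.0  # loss
    return None        # game still running


def play_and_learn(weights: List[float],
                   rate: float = 0.1) -> Tuple[List[float], float]:
    """Play one game greedily with V', adjusting the weights after each move."""
    board = 0
    while final_score(board) is None:                 # (e) until end-of-game
        # (a) score every successor, (b) pick the highest
        nxt = max(successors(board),
                  key=lambda b: v_hat(weights, features(b)))
        v_train = v_hat(weights, features(nxt))       # (c) target for `board`
        weights = lms_update(weights, features(board), v_train, rate)  # (d)
        board = nxt
    score = final_score(board)
    # Final board: the training value is the actual result of the game.
    weights = lms_update(weights, features(board), score, rate)
    return weights, score


random.seed(0)
weights = [random.uniform(-1.0, 1.0) for _ in range(7)]  # random start
for _ in range(1000):
    weights, score = play_and_learn(weights)
print(score, [round(w, 1) for w in weights])
```

In this toy game the score of the winning position is pulled towards 100 game after game, and that value then spreads backwards to the positions that lead to it, which is the "shifting V’ towards something useful" effect on a very small scale. Real self-play also has to deal with the opponent's replies, which this sketch leaves out.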