Evolving Board Game Players Without Using Expert Knowledge




A presentation of research by Amit Benbassat
Advisor: Moshe Sipper

Includes results:
A. Benbassat and M. Sipper, “Evolving Lose-Checkers Players using Genetic Programming,” IEEE Conference on Computational Intelligence and Games (CIG'10), 2010.
New, as yet unpublished results.

Synopsis

Tree-based GP in a nutshell.
Applying tree-based GP to Lose Checkers.
Expanding work to other games.
Available projects.


A Bit About Tree-Based GP

A method of solving problems by evolving solver programs.

The programs are represented in memory in tree form (i.e. the genomes are trees).

Initially promoted mostly through the efforts of John Koza.


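Tree-Based GP

Turning expressions into a tree-shaped data structure:
(X + 1) – (√X)
IF (X ≤ 3) THEN ((X + Y) + 3) ELSE ((X * Y) * X)

[Figure: the two expressions drawn as trees. The first has root – with children + (X, 1) and SQRT (X). The second has root IFT with children ≤ (X, 3), + (+ (X, Y), 3) and * (* (X, Y), X).]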
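A minimal sketch of how the two expressions above could be stored and evaluated as trees. It assumes nested Python tuples for nodes; the names `TREE_A`, `TREE_B`, and `evaluate` are illustrative and not taken from the presented system.

```python
import math

# A node is either a leaf (a variable name or a numeric constant)
# or a tuple (operator, child_1, ..., child_n).
TREE_A = ("-", ("+", "X", 1), ("SQRT", "X"))     # (X + 1) - sqrt(X)
TREE_B = ("IFT",
          ("<=", "X", 3),                        # condition
          ("+", ("+", "X", "Y"), 3),             # THEN branch
          ("*", ("*", "X", "Y"), "X"))           # ELSE branch

BINARY_OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "<=": lambda a, b: a <= b,
}

def evaluate(node, env):
    """Recursively evaluate a tree for the variable bindings in env."""
    if isinstance(node, str):            # variable leaf
        return env[node]
    if not isinstance(node, tuple):      # constant leaf
        return node
    op, *children = node
    if op == "IFT":                      # evaluate only the branch that is taken
        cond, then_branch, else_branch = children
        chosen = then_branch if evaluate(cond, env) else else_branch
        return evaluate(chosen, env)
    if op == "SQRT":
        return math.sqrt(evaluate(children[0], env))
    left, right = (evaluate(child, env) for child in children)
    return BINARY_OPS[op](left, right)
```

Calling `evaluate(TREE_B, {"X": 2, "Y": 5})` follows the THEN branch because X ≤ 3 holds.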

Generic Genetic Operators: Self-Replication

[Figure: the IFT example tree shown twice, as a parent and an identical copy.]

Generic Genetic Operators: Rebuild Mutation

[Figure: the IFT example tree next to a small replacement subtree with leaves Y and 4.]
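The slide does not spell out the operator, so the following is only a sketch under a stated assumption: rebuild mutation picks a random node and replaces the subtree rooted there with a freshly grown random subtree. The primitive set, the 0.3 leaf probability, and all function names are hypothetical.

```python
import random

FUNCTIONS = {"+": 2, "*": 2}     # operator -> arity (illustrative primitive set)
TERMINALS = ["X", "Y", 3, 4]

def random_tree(depth, rng):
    """Grow a random tree of at most the given depth."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(TERMINALS)
    op = rng.choice(sorted(FUNCTIONS))
    return (op,) + tuple(random_tree(depth - 1, rng) for _ in range(FUNCTIONS[op]))

def paths(tree, prefix=()):
    """Yield the path (sequence of child indices) to every node in the tree."""
    yield prefix
    if isinstance(tree, tuple):
        for i, child in enumerate(tree[1:], start=1):
            yield from paths(child, prefix + (i,))

def replace_at(tree, path, new_subtree):
    """Return a copy of tree with the node at path replaced by new_subtree."""
    if not path:
        return new_subtree
    i = path[0]
    return tree[:i] + (replace_at(tree[i], path[1:], new_subtree),) + tree[i + 1:]

def rebuild_mutation(tree, rng, max_depth=2):
    """Pick a random node and rebuild the subtree rooted there from scratch."""
    target = rng.choice(list(paths(tree)))
    return replace_at(tree, target, random_tree(max_depth, rng))
```

Because trees are immutable tuples here, the parent is left untouched and the mutant is returned as a new tree.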

Generic Genetic Operators: Two-Way Crossover

[Figure: two-way crossover between the IFT example tree and the (X + 1) – (√X) example tree, each receiving a subtree from the other.]
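A small sketch of two-way (subtree-swapping) crossover on tuple-encoded trees. The helper names are hypothetical, and the sketch ignores node types, which a strongly typed system would have to respect when picking crossover points.

```python
import random

def paths(tree, prefix=()):
    """Yield the path (sequence of child indices) to every node in the tree."""
    yield prefix
    if isinstance(tree, tuple):
        for i, child in enumerate(tree[1:], start=1):
            yield from paths(child, prefix + (i,))

def get_at(tree, path):
    """Return the subtree found by following path from the root."""
    for i in path:
        tree = tree[i]
    return tree

def replace_at(tree, path, new_subtree):
    """Return a copy of tree with the node at path replaced by new_subtree."""
    if not path:
        return new_subtree
    i = path[0]
    return tree[:i] + (replace_at(tree[i], path[1:], new_subtree),) + tree[i + 1:]

def two_way_crossover(a, b, rng):
    """Swap one random subtree of a with one random subtree of b."""
    pa = rng.choice(list(paths(a)))
    pb = rng.choice(list(paths(b)))
    return (replace_at(a, pa, get_at(b, pb)),
            replace_at(b, pb, get_at(a, pa)))
```

Both offspring change, and the total number of nodes across the pair is preserved because the two subtrees simply trade places.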

Synopsis

Previous results in games using GP and GAs.
Applying tree-based GP to Lose Checkers.
    Design.
    Algorithm and operators.
    Results.
Expanding work to other games.
Conclusions and future work.

Applying GP to Lose Checkers: From Genotype to Phenotype

Used strongly typed tree-based GP.
Trees are seen as board-state evaluators.
The individual players are built around the evaluator, using it (integrated with alpha-beta search) to decide which move to take.
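To illustrate how a board-state evaluator can be wrapped in alpha-beta search to form a player, here is a hedged sketch. It uses a toy Nim game as a stand-in, since the Lose Checkers engine is not shown in the slides; the class `Nim`, the functions `alphabeta` and `choose_move`, and the constant evaluator in the usage note are all assumptions made for illustration.

```python
import math

class Nim:
    """Toy stand-in game (not Lose Checkers): take 1-3 stones, taking the last stone wins."""

    @staticmethod
    def moves(stones):
        return [n for n in (1, 2, 3) if n <= stones]

    @staticmethod
    def apply(stones, n):
        return stones - n

    @staticmethod
    def is_terminal(stones):
        return stones == 0

    @staticmethod
    def terminal_value(stones):
        return -1000.0   # the player to move faces an empty pile, so the opponent has won

def alphabeta(game, state, depth, alpha, beta, evaluate):
    """Negamax alpha-beta: value of state from the point of view of the player to move."""
    if game.is_terminal(state):
        return game.terminal_value(state)
    if depth == 0:
        return evaluate(state)           # the evolved tree would be called here
    best = -math.inf
    for move in game.moves(state):
        value = -alphabeta(game, game.apply(state, move), depth - 1, -beta, -alpha, evaluate)
        best = max(best, value)
        alpha = max(alpha, value)
        if alpha >= beta:                # the opponent will never allow this line
            break
    return best

def choose_move(game, state, depth, evaluate):
    """Pick the move whose resulting position searches best at the given lookahead."""
    best_move, best_value = None, -math.inf
    for move in game.moves(state):
        value = -alphabeta(game, game.apply(state, move), depth - 1,
                           -math.inf, math.inf, evaluate)
        if value > best_value:
            best_move, best_value = move, value
    return best_move
```

With an uninformative evaluator such as `lambda state: 0.0`, `choose_move(Nim, 5, 3, lambda state: 0.0)` still finds the winning move because the search reaches terminal positions; an evolved evaluator matters when the lookahead cannot reach the end of the game.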


Terminal Nodes


Terminal Nodes (cont’d)


Function Nodes


Applying GP to Lose Checkers

Algorithm:
Generate a random population consisting of individuals of tree height 5 for generation 0.
Repeat for each generation i:
    Evaluate fitness.
    Selection().
    Procreation(XOprob, mutProb).
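The loop above can be sketched as a short driver in which every step is passed in as a callable. The function name `evolve` and all parameter names and signatures are assumptions made for this sketch; in the presented system the `random_individual` step would build trees of height 5.

```python
import random

def evolve(random_individual, evaluate_fitness, select, procreate,
           pop_size, generations, xo_prob, mut_prob, rng):
    """Generational loop with each step supplied by the caller."""
    population = [random_individual(rng) for _ in range(pop_size)]    # generation 0
    for _ in range(generations):
        fitness = evaluate_fitness(population)                        # one score per individual
        scored = list(zip(population, fitness))
        parents = select(scored, rng)                                 # Selection()
        population = procreate(parents, xo_prob, mut_prob, rng)       # Procreation(XOprob, mutProb)
    return population
```

Passing `(individual, fitness)` pairs to the later steps is a design choice of the sketch: procreation needs the parents' fitness to decide between its crossover variants.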

Fitness Calculations

The system supports a sequence of guides.
    Each guide has a number of rounds assigned to it.
    Each guide has a number of games per round assigned to it.
The system also supports play between individuals in the population (referred to in the EA literature as coevolution), with a parameter coPlayNum for the number of games.
Players get 1 fitness point for winning a game and 0.5 points for a draw.

Fitness Calculations (cont’d)

for each guide i do
    for j ← 1 to guide i’s number of rounds do
        Have every individual in the population deemed fit enough play as many games against guide i as guide i’s round size.
Have every individual in the population play coPlayNum games as black against coPlayNum random opponents in the population.
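A sketch of the fitness procedure above, under several labeled assumptions: the individual plays black against a guide, both sides of a co-play game receive points, co-play opponents are drawn with replacement, and the "fit enough" test is a caller-supplied predicate on the score so far. The names `compute_fitness`, `play`, and `fit_enough` are hypothetical.

```python
import random

def compute_fitness(population, guides, co_play_num, play, rng,
                    fit_enough=lambda score: True):
    """Accumulate fitness points from guide play and co-play.

    guides: list of (guide_player, rounds, games_per_round) tuples.
    play(black, white): plays one game and returns (black_points, white_points),
    where a win is worth 1 point and a draw 0.5 points.
    """
    scores = [0.0] * len(population)
    for guide, rounds, games_per_round in guides:
        for _ in range(rounds):
            for i, individual in enumerate(population):
                if not fit_enough(scores[i]):      # assumption: threshold on score so far
                    continue
                for _ in range(games_per_round):
                    scores[i] += play(individual, guide)[0]
    for i, individual in enumerate(population):
        others = [j for j in range(len(population)) if j != i]
        for j in rng.choices(others, k=co_play_num):   # assumption: with replacement
            black_points, white_points = play(individual, population[j])
            scores[i] += black_points
            scores[j] += white_points              # assumption: the opponent scores too
    return scores
```

Since every game hands out exactly one point in total, the sum of co-play scores equals the number of co-play games, which gives a quick sanity check on an implementation.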

Selection

Repeat until the number of parents selected is equal to the original population size:
    Randomly choose two different individuals from the population: I1 and I2.
    if I1.Fitness > I2.Fitness then
        Select a copy of I1 for the parent population.
    else
        Select a copy of I2 for the parent population.
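The selection scheme above is a binary tournament. A minimal sketch follows; the function name and the parallel `fitness` list are assumptions of the sketch.

```python
import copy
import random

def select_parents(population, fitness, rng):
    """Binary tournament: fill the parent pool with copies of pairwise winners."""
    parents = []
    while len(parents) < len(population):
        i1, i2 = rng.sample(range(len(population)), 2)    # two different individuals
        winner = i1 if fitness[i1] > fitness[i2] else i2  # ties go to I2, as in the pseudocode
        parents.append(copy.deepcopy(population[winner]))
    return parents
```

One consequence worth noting: an individual whose fitness is strictly the lowest can never win a tournament, so it never reaches the parent population.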

Genetic Operators: Local Mutation

Every tree node N returning a floating-point value was assigned a number.
This number was initialized to 1.0 and acted as a factor for the return value.
Local mutation is a slight change in the node’s factor.

[Figure: a + node with children A and B and factor f1, returning f1*(A+B), shown beside the same node with factor f2, returning f2*(A+B).]
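A sketch of per-node factors and local mutation. The slides only say the change is "slight", so the Gaussian step and its size `sigma` are assumptions, as are the class and function names.

```python
import random
from dataclasses import dataclass, field

@dataclass
class Node:
    op: str                                        # "+" or "const" in this sketch
    children: list = field(default_factory=list)
    value: float = 0.0                             # used only by "const" leaves
    factor: float = 1.0                            # multiplies whatever the node returns

    def evaluate(self):
        if self.op == "const":
            raw = self.value
        else:                                      # "+"
            raw = sum(child.evaluate() for child in self.children)
        return self.factor * raw

def local_mutation(node, rng, sigma=0.1):
    """Nudge the node's factor by a small random step (assumed Gaussian)."""
    node.factor += rng.gauss(0.0, sigma)
```

For a `+` node over leaves A = 2 and B = 3, the node returns 5.0 with the initial factor 1.0 and `factor * 5.0` after any mutation, so the tree's shape stays fixed while its numeric behaviour is tuned.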

Genetic Operators: One-Way Crossover

[Figure: one-way crossover between the IFT example tree and the (X + 1) – (√X) example tree; a subtree from one tree is copied into the other.]
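A sketch of one-way crossover on tuple-encoded trees, assuming it copies a random subtree of the donor over a random node of the receiver while leaving the donor unchanged. Helper names are hypothetical, and type constraints of a strongly typed system are ignored.

```python
import random

def paths(tree, prefix=()):
    """Yield the path (sequence of child indices) to every node in the tree."""
    yield prefix
    if isinstance(tree, tuple):
        for i, child in enumerate(tree[1:], start=1):
            yield from paths(child, prefix + (i,))

def get_at(tree, path):
    """Return the subtree found by following path from the root."""
    for i in path:
        tree = tree[i]
    return tree

def replace_at(tree, path, new_subtree):
    """Return a copy of tree with the node at path replaced by new_subtree."""
    if not path:
        return new_subtree
    i = path[0]
    return tree[:i] + (replace_at(tree[i], path[1:], new_subtree),) + tree[i + 1:]

def one_way_crossover(donor, receiver, rng):
    """Copy a random donor subtree over a random receiver node; return the new receiver."""
    subtree = get_at(donor, rng.choice(list(paths(donor))))
    target = rng.choice(list(paths(receiver)))
    return replace_at(receiver, target, subtree)
```

Unlike the two-way variant, only one offspring is produced, so genetic material flows in a single direction.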

Procreation(XOprob, mutProb)

While there remain at least 2 unselected individuals:
    Find two unselected individuals I1, I2 at random.
    With probability XOprob:
        If I1.Fitness > I2.Fitness
            use one-way XO to transfer genes from I1 to I2.
        Else
            use two-way XO between I1 and I2.
For each individual I1 in the population:
    With probability mutProb, choose a node in I1’s tree at random and mutate it by either rebuild or local mutation.
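The procreation step above can be sketched with the genetic operators passed in as callables. The function signature is an assumption, as is the reading of `local_mut_prob` as the chance of picking local mutation over rebuild mutation.

```python
import random

def procreate(parents, fitness, xo_prob, mut_prob, rng,
              one_way_xo, two_way_xo, rebuild, local, local_mut_prob=0.5):
    """Pair up parents at random, apply crossover, then mutate.

    one_way_xo(donor, receiver, rng) returns the new receiver;
    two_way_xo(a, b, rng) returns both offspring;
    rebuild(ind, rng) and local(ind, rng) return the mutated individual.
    """
    pool = list(parents)
    unselected = list(range(len(pool)))
    rng.shuffle(unselected)                       # random pairing of unselected individuals
    while len(unselected) >= 2:
        i1, i2 = unselected.pop(), unselected.pop()
        if rng.random() < xo_prob:
            if fitness[i1] > fitness[i2]:
                pool[i2] = one_way_xo(pool[i1], pool[i2], rng)
            else:
                pool[i1], pool[i2] = two_way_xo(pool[i1], pool[i2], rng)
    for i in range(len(pool)):
        if rng.random() < mut_prob:
            mutate = local if rng.random() < local_mut_prob else rebuild
            pool[i] = mutate(pool[i], rng)
    return pool
```

The one-way branch lets a clearly fitter individual pass material to a weaker one without being disturbed itself, while the two-way branch treats both parents symmetrically.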

Opponents

There is no known simple evaluation function for Lose Checkers.
All hand-crafted players used the random function to evaluate non-trivial board-states.
Two types of opponents were written in code:
    The random player.
    An α-β player of depth d with a random evaluation function.

Quality of α-β Players

To ensure that α-β players using a random evaluation function are indeed proficient players, their performance was tested.
Each test tournament consists of 10000 games.

1st player    2nd player    1st player win ratio
αβ2           Random        0.9665
αβ3           αβ2           0.8502
αβ8           αβ3           0.5873
αβ5           αβ3           0.82535
αβ5           αβ8           0.5562

Results with Search Against α-β Players

Using lookahead 3, playing 1000 games against αβ3.

Run ID    Fitness Eval    vs. αβ3
r00044    50Co            744.0
r00046    50Co            698.5
r00047    50Co            765.5
r00048    50Co            696.5
r00049    50Co            781.5
r00056    50Co            721.0
r00057    50Co            786.5
r00058    50Co            697.0
r00060    50Co            737.0
r00061    50Co            737.0

Results with Search Against α-β Players (cont’d)

Using lookahead 3, playing against various opponents.

Run ID    vs. αβ3    vs. αβ4    vs. αβ6    vs. αβ8
r00044    744.0      944.5      816.0      758.0
r00047    765.5      899.0      722.5      476.0
r00049    781.5      915.0      809.0      735.5
r00057    786.5      909.0      745.5      399.5
r00060    737.0      897.0      627.0      408.5
r00061    737.0      947.0      781.5      715.5

Results with Search Against α-β Players: Parameters

Run parameters:
Population 150, 120 generations.
No guide play, 50 co-play games as black, search depth 3.
Maximum tree depth:
    12 in runs 44A-49A.
    14 in runs 56A-61A.
XO_Prob 0.8, mutProb 0.2, local_muteProb 0.5.

Evolving Players using Deeper Search

Results with players using lookahead 4.

Run ID    vs. αβ5    vs. αβ6    vs. αβ8
r00064    582.0      603.5      395.0
r00065    537.0      782.5      561.5
r00066    567.0      757.5      483.5
r00067    598.5      723.0      385.5
r00068    548.0      787.0      524.0
r00069    573.5      715.5      523.0
r00070    577.0      691.5      476.0
r00071    551.5      582.5      401.5

Results with Search Against α-β Players: Parameters

Run parameters:
Population 50, 70 generations.
Guide play: 20 games (in 2 rounds of 10) against αβ5.
20 co-play games as black.
Search depth 4.
Maximum tree depth of 10.
XO_Prob 0.8, mutProb 0.2, local_muteProb 0.5.

The Role of Mobility

Initial runs with search produced tepid results.
The introduction of the mobility terminal greatly improved those results.
Mobility is a general principle that applies to many board games and is often associated with a high level of play.

Synopsis

Tree-based GP in a nutshell.
Applying tree-based GP to Lose Checkers.
Expanding work to other games.
    New results in Lose Checkers.
    10x10 Checkers.
    Reversi.
    Dodgem.
Conclusions and future work.

New Results in Lose Checkers

Results with players using lookahead 4.

Run ID    Fitness Eval    vs. αβ5
r00090    10αβ2_20Co      632.0
r00091    10αβ2_20Co      645.0
r00096    25Co            608.0
r00097    25Co            575.0
r00098    40Co            575.5
r00099    40Co            633.5

New Results in Lose Checkers (cont’d)

Run parameters:
Population: 120-150.
Generations: 90-100.
Guide play: 10 games against αβ2 in two of the runs.
20-40 co-play games as black.
Search depth 4.
Maximum tree depth of 14.
XO_Prob 0.8, mutProb 0.2, local_muteProb 0.5.

10x10 Checkers

10x10 board.
Objective: To eliminate all opponent pieces or render all opponent pieces immobile.
Rules: As in 8x8 version.

Quality of α-β Players

Evolved players were tested against α-β players that chose a material evaluation function at random for each turn.
To ensure that α-β players are indeed proficient players, their performance was tested.
Each test tournament consists of 10000 games.

1st player    2nd player    1st player win ratio
αβ2           Random        0.99885
αβ3           αβ2           0.5229
αβ5           αβ3           0.876

10x10 Checkers Results

Run ID    Fitness Eval    Search Depth    vs. αβ3
r00084    50Co            3               889.0
r00085    50Co            3               927.0
r00092    25Co            2               732.0
r00093    25Co            2               615.5
r00094    25Co            2               554.0
r00095    25Co            2               631.0

10x10 Checkers Results (cont’d)

Run parameters:
Population: 100-150.
Generations: 100.
No guide play.
25-50 co-play games as black.
Search depth 4.
Maximum tree depth 13-14.
XO_Prob 0.8, mutProb 0.2, local_muteProb 0.5.

8x8 Reversi

Popular board game, AKA Othello.
8x8 board.
Each piece has a black side and a white side.
Each player places a piece on her turn, flipping trapped opponent pieces.
Objective: Maximize the number of friendly pieces on the board.

Reversi Specific Terminals

Node Name              Return Type    Return Value
EnemyCornerCount       F              Number of corners occupied by opponent
FriendlyCornerCount    F              Number of corners occupied by player
CornerCount            F              FriendlyCornerCount − EnemyCornerCount
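The three corner terminals can be sketched directly from their definitions. The board encoding (an 8x8 list of lists holding "B", "W", or ".") and the reading of return type F as a floating-point value are assumptions of this sketch, as are the function names.

```python
CORNERS = [(0, 0), (0, 7), (7, 0), (7, 7)]

def friendly_corner_count(board, player):
    """Number of corners occupied by the given player."""
    return float(sum(1 for r, c in CORNERS if board[r][c] == player))

def enemy_corner_count(board, player):
    """Number of corners occupied by the given player's opponent."""
    opponent = "W" if player == "B" else "B"
    return float(sum(1 for r, c in CORNERS if board[r][c] == opponent))

def corner_count(board, player):
    """FriendlyCornerCount minus EnemyCornerCount."""
    return friendly_corner_count(board, player) - enemy_corner_count(board, player)
```

A board with black on two corners and white on one gives `corner_count(board, "B") == 1.0` and the mirror value `-1.0` from white's side.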

Quality of α-β Players

Evolved players were tested against α-β players that chose a material evaluation function at random for each turn.
To ensure that α-β players are indeed proficient players, their performance was tested.
Each test tournament consists of 10000 games.

1st player    2nd player    1st player win ratio
αβ2           Random        0.8471
αβ3           αβ2           0.6004
αβ5           αβ3           0.7509
αβ7           αβ5           0.7662

Reversi Results

Run ID    Fitness Eval    Search Depth    vs. αβ5    vs. αβ7
r00100    25Co            4               875.0      758.5
r00101    25Co            4               957.5      803.0
r00102    40Co            4               942.5      640.5
r00103    40Co            4               905.5      711.5
r00108    40Co            4               956.0      760.0
r00109    40Co            4               912.5      826.0
r00110    40Co            4               953.5      730.5
r00111    40Co            4               961.0      815.5

Reversi Results (cont’d)

Run parameters:
Population: 120.
Generations: 100.
No guide play.
25-40 co-play games as black.
Search depth 4.
Maximum tree depth of 14.
XO_Prob 0.8, mutProb 0.2, local_muteProb 0.5.


Dodgem


Synopsis

Tree-based GP in a nutshell.
Applying tree-based GP to Lose Checkers.
Expanding work to other games.
Available projects.

Your mission (should you decide to accept it)

1. Choose a game.
2. Write game program in C and interface with Java system.
3. Write game-specific terminal nodes and adjustments if necessary.
4. Run it, document results, produce report.


Games


My Current Areas of Interest

Games with high branching factor.
Games with random element.
Multiplayer games.
Games with partial information.

Another project

I want to check my selective crossover operator.
Adapt system to a toy problem.
Execute runs with selective XO and with typical XO using several parameter sets.
Compare and analyze results.
Write report.