Evolving Board Game Players Without Using Expert Knowledge

A presentation of research by Amit Benbassat. Advisor: Moshe Sipper.

Includes results:

A. Benbassat and M. Sipper, “Evolving Lose-Checkers Players using Genetic Programming,” IEEE Conference on Computational Intelligence and Games (CIG'10), 2010.

New, as yet unpublished results.

Synopsis

Tree-based GP in a nutshell.
Applying tree-based GP to Lose Checkers.
Expanding work to other games.
Available projects.

A Bit About Tree-Based GP

A method of solving problems by evolving solver programs.

The programs are represented in memory in tree form (i.e. the genomes are trees).

Initially promoted mostly through the efforts of John Koza.

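Tree-Based GP

Turning expressions into a tree-shaped data structure:

(X + 1) − (√X)
IF (X ≤ 3) THEN ((X + Y) + 3) ELSE ((X * Y) * X)

[Figure: the two expressions drawn as trees. The first has − at the root, with children + (over X and 1) and SQRT (over X). The second has IFT at the root, with children ≤ (over X and 3), + (over X + Y and 3), and * (over X * Y and X).]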
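To make the tree form concrete, here is a minimal sketch in Python of the two expressions as trees, with a recursive evaluator. The nested-tuple encoding and the name `evaluate` are assumptions chosen for illustration; the presentation does not say how its own system stores trees.

```python
import math

# Hypothetical encoding: a terminal is a variable name or a number,
# and a function node is a tuple (operator, child, child, ...).
EXPR_1 = ("-", ("+", "X", 1), ("SQRT", "X"))
EXPR_2 = ("IFT", ("<=", "X", 3),
          ("+", ("+", "X", "Y"), 3),
          ("*", ("*", "X", "Y"), "X"))

def evaluate(tree, env):
    """Recursively evaluate a tree given the variable bindings in env."""
    if isinstance(tree, str):            # variable terminal
        return env[tree]
    if isinstance(tree, (int, float)):   # constant terminal
        return tree
    op, *children = tree
    if op == "IFT":                      # evaluate only the branch that is taken
        cond, then_branch, else_branch = children
        chosen = then_branch if evaluate(cond, env) else else_branch
        return evaluate(chosen, env)
    args = [evaluate(child, env) for child in children]
    if op == "+":
        return args[0] + args[1]
    if op == "-":
        return args[0] - args[1]
    if op == "*":
        return args[0] * args[1]
    if op == "<=":
        return args[0] <= args[1]
    if op == "SQRT":
        return math.sqrt(args[0])
    raise ValueError(f"unknown operator: {op}")

print(evaluate(EXPR_1, {"X": 4}))          # (4 + 1) - sqrt(4) = 3.0
print(evaluate(EXPR_2, {"X": 2, "Y": 5}))  # X <= 3, so (2 + 5) + 3 = 10
```

Because every node is either a terminal or an operator with children, the genetic operators on the following slides can work on whole subtrees without parsing any text.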

Generic Genetic Operators: Self-Replication

[Figure: the IFT tree is copied unchanged.]

Generic Genetic Operators: Rebuild Mutation

[Figure: the IFT tree, together with a newly built subtree (Y − 4) that replaces one of its subtrees.]

Generic Genetic Operators: Two-Way Crossover

[Figure: two parent trees exchange subtrees with each other.]
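The three generic operators can be sketched on a simple tree encoding as follows. This is an illustration under stated assumptions, not the presenter's implementation: trees are nested Python lists, the helper names (`paths`, `get`, `replace`, `random_tree`) are hypothetical, nodes are picked uniformly at random, and the type constraints of strongly typed GP are ignored.

```python
import copy
import random

def paths(tree, prefix=()):
    """Yield the path (tuple of child indices) of every node in the tree."""
    yield prefix
    if isinstance(tree, list):
        for i, child in enumerate(tree[1:], start=1):
            yield from paths(child, prefix + (i,))

def get(tree, path):
    """Return the subtree found by following path from the root."""
    for i in path:
        tree = tree[i]
    return tree

def replace(tree, path, subtree):
    """Return a copy of tree in which the node at path is replaced by subtree."""
    if not path:
        return copy.deepcopy(subtree)
    result = copy.deepcopy(tree)
    parent = get(result, path[:-1])
    parent[path[-1]] = copy.deepcopy(subtree)
    return result

def random_tree(depth, rng):
    """Build a small random arithmetic tree (illustrative node set only)."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(["X", "Y", 1, 3, 4])
    op = rng.choice(["+", "-", "*"])
    return [op, random_tree(depth - 1, rng), random_tree(depth - 1, rng)]

def self_replicate(tree):
    """Self-replication: the child is an unchanged copy of the parent."""
    return copy.deepcopy(tree)

def rebuild_mutation(tree, rng, depth=2):
    """Rebuild mutation: replace a random subtree with a newly built one."""
    target = rng.choice(list(paths(tree)))
    return replace(tree, target, random_tree(depth, rng))

def two_way_crossover(a, b, rng):
    """Two-way crossover: the two parents swap one subtree each."""
    pa = rng.choice(list(paths(a)))
    pb = rng.choice(list(paths(b)))
    return replace(a, pa, get(b, pb)), replace(b, pb, get(a, pa))

rng = random.Random(1)
a = ["-", ["+", "X", 1], ["SQRT", "X"]]
b = ["IFT", ["<=", "X", 3], ["+", ["+", "X", "Y"], 3], ["*", ["*", "X", "Y"], "X"]]
child_a, child_b = two_way_crossover(a, b, rng)
print(child_a)
print(child_b)
print(rebuild_mutation(b, rng))
```

All three functions return new trees and leave their inputs untouched, which keeps the parents available if they are selected more than once.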

Synopsis

Previous results in games using GP and GAs.
Applying tree-based GP to Lose Checkers:
  Design.
  Algorithm and operators.
  Results.
Expanding work to other games.
Conclusions and future work.

Applying GP to Lose Checkers: From Genotype to Phenotype

Used strongly typed tree-based GP.
Trees are seen as board-state evaluators.
The individual players are built around the evaluator, using it (integrated with alpha-beta search) to decide which move to take.
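As a rough sketch of how a board-state evaluator can be wrapped in alpha-beta search to pick a move, the Python code below uses a toy take-away game in place of Lose Checkers. The game interface (`moves`, `apply`, `is_terminal`), the `TakeAwayGame` class, and `toy_evaluator` are all assumptions made for this example; in the system described here, the evolved tree would play the role of the evaluator.

```python
import math

def alphabeta(state, depth, alpha, beta, maximizing, game, evaluate):
    """Depth-limited alpha-beta search; leaves are scored by evaluate(state)."""
    if depth == 0 or game.is_terminal(state):
        return evaluate(state)
    if maximizing:
        best = -math.inf
        for move in game.moves(state):
            child = game.apply(state, move)
            best = max(best, alphabeta(child, depth - 1, alpha, beta, False, game, evaluate))
            alpha = max(alpha, best)
            if alpha >= beta:   # the minimizing side will avoid this line
                break
        return best
    best = math.inf
    for move in game.moves(state):
        child = game.apply(state, move)
        best = min(best, alphabeta(child, depth - 1, alpha, beta, True, game, evaluate))
        beta = min(beta, best)
        if alpha >= beta:       # the maximizing side will avoid this line
            break
    return best

def choose_move(state, depth, game, evaluate):
    """Return the move whose resulting position scores highest for the mover."""
    best_move, best_value = None, -math.inf
    for move in game.moves(state):
        child = game.apply(state, move)
        value = alphabeta(child, depth - 1, -math.inf, math.inf, False, game, evaluate)
        if best_move is None or value > best_value:
            best_move, best_value = move, value
    return best_move

class TakeAwayGame:
    """Toy stand-in game: remove 1 or 2 stones; taking the last stone wins.
    A state is (stones_left, player_to_move)."""
    def moves(self, state):
        return [n for n in (1, 2) if n <= state[0]]
    def apply(self, state, move):
        stones, player = state
        return (stones - move, 1 - player)
    def is_terminal(self, state):
        return state[0] == 0

def toy_evaluator(state):
    """Score a position from player 0's point of view."""
    stones, player_to_move = state
    if stones == 0:
        # The player who is NOT to move took the last stone and won.
        return 1.0 if player_to_move == 1 else -1.0
    return 0.0  # non-terminal states are scored neutrally in this toy

print(choose_move((4, 0), 4, TakeAwayGame(), toy_evaluator))  # leaves 3 stones
```

The search itself never looks inside the evaluator, so swapping in a different evaluation function changes the player without touching the search code.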

Terminal Nodes

Terminal Nodes (cont’d)

Function Nodes

Applying GP to Lose Checkers

Algorithm:
  Generate a random population consisting of individuals of tree height 5 for generation 0.
  Repeat for each generation i:
    Evaluate fitness.
    Selection().
    Procreation(XOprob, mutProb).

Fitness Calculations

The system supports a sequence of guides.
Each guide has a number of rounds assigned to it.
Each guide has a number of games per round assigned to it.
The system also supports play between individuals in the population (referred to in the EA literature as coevolution), with a parameter coPlayNum for the number of games.
Players get 1 fitness point for winning a game and 0.5 points for a draw.

Fitness Calculations (cont’d)

for each guide i do
  for j ← 1 to guide i’s number of rounds do
    Have every individual in the population deemed fit enough play guide i’s round-size games against guide i.
Have every individual in the population play coPlayNum games as black against coPlayNum random opponents in the population.
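The fitness procedure might look roughly like the Python sketch below. Several points are assumptions, because the slides do not specify them: `play_game(black, white)` is a hypothetical callback returning the black player's points (1, 0.5, or 0); individuals play black against guides too; only the black player is credited in co-play; co-play opponents are drawn with replacement; and the "deemed fit enough" filter is left out because its criterion is not given.

```python
import random
from dataclasses import dataclass

@dataclass
class Guide:
    player: object          # the fixed guide opponent
    rounds: int             # number of rounds assigned to this guide
    games_per_round: int    # games each individual plays per round

def evaluate_fitness(population, guides, co_play_num, play_game, rng):
    """Return one fitness score per individual (1 per win, 0.5 per draw)."""
    fitness = [0.0] * len(population)
    # Guide play: every individual meets every guide, round by round.
    # (The "deemed fit enough" filter from the slide is omitted here.)
    for guide in guides:
        for _ in range(guide.rounds):
            for idx, individual in enumerate(population):
                for _ in range(guide.games_per_round):
                    fitness[idx] += play_game(individual, guide.player)
    # Co-play: each individual plays co_play_num games as black
    # against randomly chosen other members of the population.
    for idx, individual in enumerate(population):
        others = [i for i in range(len(population)) if i != idx]
        for _ in range(co_play_num):
            opponent = population[rng.choice(others)]
            fitness[idx] += play_game(individual, opponent)
    return fitness

# Toy usage: "players" are numbers and the larger number always wins.
def toy_game(black, white):
    return 1.0 if black > white else 0.5 if black == white else 0.0

scores = evaluate_fitness([1, 2, 3], [Guide(player=2, rounds=2, games_per_round=1)],
                          co_play_num=2, play_game=toy_game, rng=random.Random(0))
print(scores)
```

With guide play and co-play as separate loops, a run with "no guide play" is simply a call with an empty guide list.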

Selection

Repeat until the number of parents selected is equal to the original population size:
  Randomly choose two different individuals from the population: I1 and I2.
  if I1.Fitness > I2.Fitness then
    Select a copy of I1 for the parent population.
  else
    Select a copy of I2 for the parent population.
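This selection scheme is a tournament of size two. A minimal Python sketch, assuming fitness values are kept in a list parallel to the population (the function name `select_parents` is hypothetical):

```python
import copy
import random

def select_parents(population, fitness, rng):
    """Fill a parent population by repeated two-individual tournaments."""
    parents = []
    while len(parents) < len(population):
        # Pick two different individuals at random.
        i1, i2 = rng.sample(range(len(population)), 2)
        # The fitter one is copied; on a tie the second is taken,
        # mirroring the if/else on the slide.
        winner = i1 if fitness[i1] > fitness[i2] else i2
        parents.append(copy.deepcopy(population[winner]))
    return parents

parents = select_parents(["a", "b", "c"], [1.0, 2.0, 3.0], random.Random(0))
print(parents)
```

An individual with the lowest fitness can never win a tournament, while the best individual is not guaranteed a place; it only wins every tournament it happens to be drawn into.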

Genetic Operators: Local Mutation

Every tree node N returning a floating-point value was assigned a number.

This number was initialized to 1.0 and acted as a factor for the return value.

Local mutation is a slight change in the node’s factor.

[Figure: a + node with children A and B and factor f1 returns f1*(A+B); after local mutation the same node has factor f2 and returns f2*(A+B).]
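A small Python sketch of the per-node factor and of local mutation follows. The `Node` class and the uniform perturbation of at most `step` are assumptions; the slides say only that the change is slight, not how it is drawn.

```python
import random
from dataclasses import dataclass, field

@dataclass
class Node:
    op: str                                   # "+" or "const" in this toy
    children: list = field(default_factory=list)
    value: float = 0.0                        # used by "const" nodes only
    factor: float = 1.0                       # multiplies the node's return value

    def evaluate(self):
        if self.op == "const":
            raw = self.value
        elif self.op == "+":
            raw = sum(child.evaluate() for child in self.children)
        else:
            raise ValueError(f"unknown op: {self.op}")
        return self.factor * raw

def local_mutation(node, rng, step=0.1):
    """Nudge the node's factor by a small random amount (assumed uniform)."""
    node.factor += rng.uniform(-step, step)

tree = Node("+", [Node("const", value=2.0), Node("const", value=3.0)])
print(tree.evaluate())            # factor is 1.0, so this is 2.0 + 3.0 = 5.0
local_mutation(tree, random.Random(0))
print(tree.factor, tree.evaluate())
```

Unlike rebuild mutation, this leaves the shape of the tree intact and only rescales the contribution of one node.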

Genetic Operators: One-Way Crossover

[Figure: a subtree is transferred from one parent tree into the other.]
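One-way crossover can be sketched in Python as copying a subtree from a donor into a recipient while leaving the donor as it was. As before, the nested-list encoding, the helper names, and uniform node choice are assumptions for illustration, and type constraints are ignored.

```python
import copy
import random

def paths(tree, prefix=()):
    """Yield the path (tuple of child indices) of every node in the tree."""
    yield prefix
    if isinstance(tree, list):
        for i, child in enumerate(tree[1:], start=1):
            yield from paths(child, prefix + (i,))

def get(tree, path):
    """Return the subtree found by following path from the root."""
    for i in path:
        tree = tree[i]
    return tree

def one_way_crossover(donor, recipient, rng):
    """Return a copy of recipient with one subtree overwritten by a donor subtree."""
    src = rng.choice(list(paths(donor)))
    dst = rng.choice(list(paths(recipient)))
    graft = copy.deepcopy(get(donor, src))
    if not dst:                       # the whole recipient is replaced
        return graft
    result = copy.deepcopy(recipient)
    parent = get(result, dst[:-1])
    parent[dst[-1]] = graft
    return result

donor = ["-", ["+", "X", 1], ["SQRT", "X"]]
recipient = ["IFT", ["<=", "X", 3], ["+", ["+", "X", "Y"], 3], ["-", "Y", 4]]
print(one_way_crossover(donor, recipient, random.Random(2)))
```

Only the recipient's offspring changes, so genetic material flows in a single direction, which is what distinguishes this operator from the two-way version.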

Procreation(XOprob, mutProb)

While there remain at least 2 unselected individuals:
  Find two unselected individuals I1, I2 at random.
  With probability XOprob:
    If I1.Fitness > I2.Fitness:
      Use one-way XO to transfer genes from I1 to I2.
    Else:
      Use two-way XO between I1 and I2.
For each individual I1 in the population:
  With probability mutProb, choose a node in I1’s tree at random and mutate it by either rebuild or local mutation.
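The control flow of the procreation step can be sketched in Python as below. The operators themselves are passed in as callables (`one_way_xo`, `two_way_xo`, `mutate`), which are hypothetical stand-ins assumed to modify individuals in place; only the pairing and probability logic from the pseudocode is shown.

```python
import random

def procreate(population, fitness, xo_prob, mut_prob,
              one_way_xo, two_way_xo, mutate, rng):
    """Apply crossover to random pairs, then mutation to each individual."""
    unselected = list(range(len(population)))
    rng.shuffle(unselected)            # random pairing without repeats
    while len(unselected) >= 2:
        i1 = unselected.pop()
        i2 = unselected.pop()
        if rng.random() < xo_prob:
            if fitness[i1] > fitness[i2]:
                # Transfer genes from the fitter I1 into I2.
                one_way_xo(population[i1], population[i2])
            else:
                two_way_xo(population[i1], population[i2])
    for individual in population:
        if rng.random() < mut_prob:
            # mutate is expected to pick a random node and apply either
            # rebuild or local mutation to it.
            mutate(individual)

# Toy usage with operators that only log what they were asked to do.
log = []
procreate(
    population=[["a"], ["b"], ["c"], ["d"]],
    fitness=[1.0, 2.0, 3.0, 4.0],
    xo_prob=0.8, mut_prob=0.2,
    one_way_xo=lambda src, dst: log.append(("one-way", src, dst)),
    two_way_xo=lambda x, y: log.append(("two-way", x, y)),
    mutate=lambda ind: log.append(("mutate", ind)),
    rng=random.Random(0),
)
print(log)
```

With an odd population size, one individual is left unpaired in a given generation and can only be changed by mutation.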

Opponents

There is no known simple evaluation function for Lose Checkers.
All hand-crafted players used the random function to evaluate non-trivial board-states.
Two types of opponents were written in code:
  The random player.
  An α-β player of depth d with a random evaluation function.

Quality of α-β Players

To ensure that α-β players using a random evaluation function are indeed proficient players, their performance was tested.
Each test tournament consists of 10000 games.

1st player   2nd player   1st player win ratio
αβ2          Random       0.9665
αβ3          αβ2          0.8502
αβ8          αβ3          0.5873
αβ5          αβ3          0.82535
αβ5          αβ8          0.5562

Results with Search Against α-β Players

Using lookahead 3, playing 1000 games against αβ3.

Run ID    Fitness Eval   vs. αβ3
r00044    50Co           744.0
r00046    50Co           698.5
r00047    50Co           765.5
r00048    50Co           696.5
r00049    50Co           781.5
r00056    50Co           721.0
r00057    50Co           786.5
r00058    50Co           697.0
r00060    50Co           737.0
r00061    50Co           737.0

Results with Search Against α-β Players (cont’d)

Using lookahead 3, playing against various opponents.

Run ID    vs. αβ3   vs. αβ4   vs. αβ6   vs. αβ8
r00044    744.0     944.5     816.0     758.0
r00047    765.5     899.0     722.5     476.0
r00049    781.5     915.0     809.0     735.5
r00057    786.5     909.0     745.5     399.5
r00060    737.0     897.0     627.0     408.5
r00061    737.0     947.0     781.5     715.5

Results with Search Against α-β Players: Parameters

Run parameters:
Population 150, 120 generations.
No guide play, 50 co-play games as black, search depth 3.
Maximum tree depth: 12 in runs 44A-49A, 14 in runs 56A-61A.
XO_Prob 0.8, mutProb 0.2, local_muteProb 0.5.

Evolving Players using Deeper Search

Results with players using lookahead 4.

Run ID    vs. αβ5   vs. αβ6   vs. αβ8
r00064    582.0     603.5     395.0
r00065    537.0     782.5     561.5
r00066    567.0     757.5     483.5
r00067    598.5     723.0     385.5
r00068    548.0     787.0     524.0
r00069    573.5     715.5     523.0
r00070    577.0     691.5     476.0
r00071    551.5     582.5     401.5

Results with Search Against α-β Players: Parameters

Run parameters:
Population 50, 70 generations.
Guide play: 20 games (in 2 rounds of 10) against αβ5.
20 co-play games as black.
Search depth 4.
Maximum tree depth of 10.
XO_Prob 0.8, mutProb 0.2, local_muteProb 0.5.

The Role of Mobility

Initial runs with search produced tepid results.
The introduction of the mobility terminal greatly improved those results.
Mobility is a general principle that applies to many board games and is often associated with a high level of play.


Synopsis

Applying tree-based GP to Lose Checkers.
Expanding work to other games:
  New results in Lose Checkers.
  10x10 Checkers.
  Reversi.
  Dodgem.
Conclusions and future work.

New Results in Lose Checkers

Results with players using lookahead 4.

Run ID    Fitness Eval   vs. αβ5
r00090    10αβ2_20Co     632.0
r00091    10αβ2_20Co     645.0
r00096    25Co           608.0
r00097    25Co           575.0
r00098    40Co           575.5
r00099    40Co           633.5

New Results in Lose Checkers (cont’d)

Run parameters:
Population: 120-150.
Generations: 90-100.
Guide play: 10 games against αβ2 in two of the runs.
20-40 co-play games as black.
Search depth 4.
Maximum tree depth of 14.
XO_Prob 0.8, mutProb 0.2, local_muteProb 0.5.

10x10 Checkers

10x10 board.
Objective: To eliminate all opponent pieces or render all opponent pieces immobile.
Rules: As in the 8x8 version.

Quality of α-β Players

Evolved players were tested against α-β players that chose a material evaluation function at random for each turn.
To ensure that α-β players are indeed proficient players, their performance was tested.

1st player   2nd player   1st player win ratio
αβ2          Random       0.99885
αβ3          αβ2          0.5229
αβ5          αβ3          0.876

10x10 Checkers Results

Run ID    Fitness Eval   Search Depth   vs. αβ3
r00084    50Co           3              889.0
r00085    50Co           3              927.0
r00092    25Co           2              732.0
r00093    25Co           2              615.5
r00094    25Co           2              554.0
r00095    25Co           2              631.0

10x10 Checkers Results (cont’d)

Run parameters:
Population: 100-150.
Generations: 100.
No guide play.
25-50 co-play games as black.
Search depth 4.
Maximum tree depth 13-14.
XO_Prob 0.8, mutProb 0.2, local_muteProb 0.5.

8x8 Reversi

Popular board game, AKA Othello.
8x8 board.
Each piece has a black side and a white side.
Each player places a piece on her turn, flipping trapped opponent pieces.
Objective: Maximize the number of friendly pieces on the board.

Reversi Specific Terminals

Node Name             Return Type   Return Value
EnemyCornerCount      F             Number of corners occupied by opponent
FriendlyCornerCount   F             Number of corners occupied by player
CornerCount           F             FriendlyCornerCount − EnemyCornerCount
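The three corner terminals could be computed along these lines in Python. The board encoding (an 8x8 list of lists holding "B", "W", or None) and the function names are assumptions, as is reading the return type F as floating point.

```python
CORNERS = [(0, 0), (0, 7), (7, 0), (7, 7)]

def opponent_of(player):
    """Return the other colour ("B" for black, "W" for white)."""
    return "W" if player == "B" else "B"

def friendly_corner_count(board, player):
    """Number of corners occupied by the player."""
    return float(sum(board[r][c] == player for r, c in CORNERS))

def enemy_corner_count(board, player):
    """Number of corners occupied by the opponent."""
    return friendly_corner_count(board, opponent_of(player))

def corner_count(board, player):
    """FriendlyCornerCount minus EnemyCornerCount."""
    return friendly_corner_count(board, player) - enemy_corner_count(board, player)

# Example position: black holds two corners, white holds one.
board = [[None] * 8 for _ in range(8)]
board[0][0] = "B"
board[7][7] = "B"
board[0][7] = "W"
print(corner_count(board, "B"))   # 2 - 1 = 1.0
print(corner_count(board, "W"))   # 1 - 2 = -1.0
```

Each function takes the player's colour as an argument, so the same terminal works whichever side the evolved individual is playing.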

Quality of α-β Players

Evolved players were tested against α-β players that chose a material evaluation function at random for each turn.
To ensure that α-β players are indeed proficient players, their performance was tested.

1st player   2nd player   1st player win ratio
αβ2          Random       0.8471
αβ3          αβ2          0.6004
αβ5          αβ3          0.7509
αβ7          αβ5          0.7662


Reversi Results

Run ID    Fitness Eval   Search Depth   vs. αβ5   vs. αβ7
r00100    25Co           4              875.0     758.5
r00101    25Co           4              957.5     803.0
r00102    40Co           4              942.5     640.5
r00103    40Co           4              905.5     711.5
r00108    40Co           4              956.0     760.0
r00109    40Co           4              912.5     826.0
r00110    40Co           4              953.5     730.5
r00111    40Co           4              961.0     815.5

Reversi Results (cont’d)

Run parameters:
Population: 120.
Generations: 100.
No guide play.
25-40 co-play games as black.
Search depth 4.
Maximum tree depth of 14.
XO_Prob 0.8, mutProb 0.2, local_muteProb 0.5.

Dodgem

Synopsis

Applying tree-based GP to Lose Checkers.
Expanding work to other games.
Available projects.

Your mission (should you decide to accept it)

1. Choose a game.
2. Write game program in C and interface with Java system.
3. Write game-specific terminal nodes and adjustments if necessary.
4. Run it, document results, produce report.

Games

My Current Areas of Interest.

Games with high branching factor.
Games with random element.
Multiplayer games.
Games with partial information.

Another project.

I want to check my selective crossover operator.
Adapt system to a toy problem.
Execute runs with selective XO and with typical XO using several parameter sets.
Compare and analyze results.
Write report.
