
EVOLVING & ADAPTING OPPONENTS IN PACMAN

PRESENTED BY:

DEBARGHYA MAJUMDAR, SOMDAS BANDYOPADHYAY, KAUSTAV DEY BISWAS

TOPICS TO BE DISCUSSED

PacMan
  The Game
  Areas of Intelligence
  Classical Ghost Algorithm
Basic Neuro-Evolution
  The Model
  The ANN
  Offline Learning
  Online Learning
Neuro-Evolution of Augmenting Topologies
  The Model
  The ANN
  Training Algorithm
  Experiments
  Discussions
Conclusions
References

PACMAN: THE GAME

Classical PacMan released by Namco (Japan) in 1980
Single-player predator/prey game
Player plays as the PacMan:
  Navigates through a 2D maze
  Eats pellets
  Avoids 'Ghosts'
Ghosts played by the computer:
  Usually 4 in number
  Chase the PacMan and try to catch it
Winning / losing:
  PacMan wins if it has eaten all pellets or survived for 180 sec
  PacMan loses if caught by a Ghost
Special power-pills:
  Allow PacMan to eat the Ghosts for a short period of time
  Eaten Ghosts respawn

PACMAN: AREAS OF INTELLIGENCE

Programming the PacMan to replace the human player:
  Heavily researched field
  Gives insight into AI, but doesn't add value to the game engine

Making the Ghosts more intelligent:
  Strives to:
    Make the Ghosts more efficient killers
    Incorporate teamwork
    Minimize the score of the PacMan
    Make the Ghosts learn & adapt
    Make the game more 'interesting'
  Adds value to the game
  Scarcely researched field
  Currently our topic of interest

PACMAN: THE CLASSICAL GHOST ALGORITHM

Two modes of play:
  Attack: chase down Pacman
  Scatter: break away when Pacman has a power-pill

Decide the target cell on reaching an intersection:
  Attack target:
    Red Ghost: current position of Pacman
    Pink Ghost: cell 4 positions ahead of Pacman
    Blue Ghost: cell 2 positions ahead of Pacman
    Orange Ghost: targets Pacman when far away, retires to its corner when nearby
  Scatter target:
    Pseudo-random behaviour

Move so as to minimize distance from the target cell (sketched below)
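A minimal sketch of this targeting rule in Python. The helper names (`ahead_of`, `choose_move`), the use of Manhattan distance, and the orange Ghost's 8-cell threshold are illustrative assumptions rather than the exact arcade logic.

```python
# Sketch of the classical targeting rule; helper names, the Pacman direction
# vector, and the orange Ghost's 8-cell threshold are illustrative assumptions.
from typing import List, Tuple

Cell = Tuple[int, int]

def manhattan(a: Cell, b: Cell) -> int:
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def ahead_of(pacman: Cell, direction: Cell, n: int) -> Cell:
    """Cell n positions ahead of Pacman along its current direction of travel."""
    return (pacman[0] + n * direction[0], pacman[1] + n * direction[1])

def attack_target(ghost: str, ghost_pos: Cell, pacman: Cell,
                  pac_dir: Cell, home_corner: Cell) -> Cell:
    if ghost == "red":
        return pacman                          # chase Pacman's current position
    if ghost == "pink":
        return ahead_of(pacman, pac_dir, 4)    # cell 4 positions ahead of Pacman
    if ghost == "blue":
        return ahead_of(pacman, pac_dir, 2)    # cell 2 positions ahead of Pacman
    # Orange: target Pacman when far away, retire to its corner when nearby.
    return pacman if manhattan(ghost_pos, pacman) > 8 else home_corner

def choose_move(adjacent_cells: List[Cell], target: Cell) -> Cell:
    # On reaching an intersection, move to the cell that minimizes distance to the target.
    return min(adjacent_cells, key=lambda c: manhattan(c, target))
```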

PACMAN: MAKING THE GHOSTS INTELLIGENT

We will discuss two approaches for programming intelligence, learnability & adaptability into the Ghosts:

Local optimization using basic Neuro-Evolution

Global optimization using Neuro-Evolution of Augmenting Topologies (NEAT)

BASIC NEURO-EVOLUTION: THE MODEL

Game field modeled as a grid; each square can contain any of:
  Wall block
  Pellet
  Pacman
  Ghost(s)
Power-pills not incorporated, to keep things simple
Ghosts have only local information about the grid
Game proceeds in cycles of 2 steps:
  Gather information about the environment
  Make a move by processing the information
Each Ghost controlled independently by a dedicated ANN
ANNs trained by an evolutionary algorithm (GA):
  Offline (train & deploy)
  Online (learn as you go)
Weight adjustment is the only means of training the ANNs

BASIC NEURO-EVOLUTION: THE ANN

A 4-5-4 feed-forward neural controller, comprising sigmoid neurons, is employed to manage the Ghosts' motion.

Inputs: Using their sensors, Ghosts inspect the environment from their own point of view and decide their next action. Each Ghost receives input information from its environment, expressed in the neural network's input array of dimension 4.

Outputs: Four scores, one for each direction; the direction with the maximum score is selected.

BASIC NEURO-EVOLUTION: THE ANN – INPUTS

The input array consists of:

  Δx,P = x_g − x_p
  Δy,P = y_g − y_p
  Δx,C = x_g − x_c
  Δy,C = y_g − y_c

where (x_g, y_g), (x_p, y_p) and (x_c, y_c) are the Cartesian co-ordinates of the current Ghost's, Pacman's and closest Ghost's current positions respectively. These inputs and the 4-5-4 forward pass are sketched in code below.

Picture courtesy – Ref.[1] (network diagram: inputs Δx,P, Δy,P, Δx,C, Δy,C; outputs UP, DOWN, LEFT, RIGHT)
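To make the controller concrete, here is a minimal NumPy sketch of the input construction and the 4-5-4 forward pass described above. The weight values are random placeholders drawn from [-5, 5] (the initialization range used later for offline learning), bias terms are omitted, and the function names are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def ghost_inputs(ghost, pacman, closest_ghost):
    """Relative-position input array of dimension 4 (the deltas above)."""
    (xg, yg), (xp, yp), (xc, yc) = ghost, pacman, closest_ghost
    return np.array([xg - xp, yg - yp, xg - xc, yg - yc], dtype=float)

def ghost_move(inputs, w_hidden, w_out):
    """4-5-4 feed-forward pass; returns the direction with the maximum score."""
    hidden = sigmoid(w_hidden @ inputs)      # 5 sigmoid hidden neurons
    scores = sigmoid(w_out @ hidden)         # one score per direction
    return ["UP", "DOWN", "LEFT", "RIGHT"][int(np.argmax(scores))]

# Example with placeholder random weights in [-5, 5].
rng = np.random.default_rng(0)
w_hidden = rng.uniform(-5, 5, size=(5, 4))
w_out = rng.uniform(-5, 5, size=(4, 5))
x = ghost_inputs(ghost=(3, 4), pacman=(10, 7), closest_ghost=(2, 2))
print(ghost_move(x, w_hidden, w_out))
```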

BASIC NEURO-EVOLUTION: THE NETWORK

BASIC NEURO-EVOLUTION: OFFLINE LEARNING

An off-line evolutionary learning approach is used to produce 'good' (i.e. high-performance) initial behaviors for the online learning mechanism.

The neural networks that determine the behavior of the Ghosts are themselves evolved.

The evolving process is limited to the connection weights of the neural network.

The evolutionary procedure is based on a genetic algorithm.

Each Ghost has a genome that encodes the connection weights of its neural network.

A population of neural networks (Ghosts) is initialized randomly with initial uniformly distributed random connection weights that lie within [-5, 5].

BASIC NEURO-EVOLUTION: OFFLINE LEARNING – ALGORITHM

At each generation:

Step 1

Every Ghost in the population is cloned 4 times.

These 4 clones are placed in the PacMan game field and play N games, each one for an evaluation period of t simulation steps.

The outcome of these games is the time taken to kill Pacman, t_k, for each game.

BASIC NEURO-EVOLUTION: OFFLINE LEARNING – ALGORITHM (CONTD.)

Step 2

Each Ghost is evaluated for each game and its fitness value is E{f} over the N games, where f is given by:

  f = Σ_{i=1..4} [1 − (t_k,i / t)]

with t_k,i the time taken by the Ghost's i-th clone to kill Pacman.

The fitness function f promotes Pacman-killing behaviors capable of achieving high performance values.

BASIC NEURO-EVOLUTION: OFFLINE LEARNING – ALGORITHM (CONTD.)

Step 3

A pure elitism selection method is used, where only the 10% best-fit solutions determine the members of the intermediate population and, therefore, are able to breed.

Step 4

Each parent clones an equal number of offspring in order to replace the non-picked solutions from elitism.

BASIC NEURO-EVOLUTION: OFFLINE LEARNING – ALGORITHM (CONTD.)

Step 5

Mutation occurs in each gene (connection weight) of each offspring’s genome with a small probability. A uniform random distribution is used to define the mutated value of the connection weight.

The algorithm terminates when a predetermined number of generations g is reached (e.g. g = 1000), and the best-fit Ghost's connection weights are saved. The full offline loop is sketched below.

BASIC NEURO-EVOLUTION: OFFLINE LEARNING

Pros:
  Computationally efficient – minimal in-game computation
  Can be tailor-made for specific maps

Cons:
  Cannot adapt to changing maps
  May overfit to the training player's characteristics

BASIC NEURO-EVOLUTION: ONLINE LEARNING

This learning approach is based on the idea of Ghosts that learn while they are playing against Pacman.

In other words, the Ghosts become reactive to any player's behavior and learn from the player's strategy, instead of being predictable and uninteresting.

Furthermore, this approach’s additional objective is to keep the game’s interest at high levels as long as it is being played.

BASIC NEURO-EVOLUTION: ONLINE LEARNING (CONTD)

Beginning from any initial group of homogeneous offline-trained (OLT) Ghosts, the online learning (OLL) mechanism attempts to transform them into a group of heterogeneous Ghosts that are interesting to play against.

An OLT Ghost is cloned 4 times and its clones are placed in the Pacman game field to play against a selected Pacman type of player.

BASIC NEURO-EVOLUTION: ONLINE LEARNING – ALGORITHM

At each generation:

Step 1

Each Ghost is evaluated every t simulation steps while the game is played. The fitness function is:

  f = d_1 − d_t

where d_i is the distance between the Ghost and Pacman at the i-th simulation step.

This fitness function promotes Ghosts that move towards Pacman within an evaluation period of t simulation steps.

BASIC NEURO-EVOLUTION: ONLINE LEARNING – ALGORITHM (CONTD.)

Step 2

A pure elitism selection method is used where only the fittest solution is able to breed. The best-fit parent clones an offspring.

Step 3

Mutation occurs in each gene (connection weight) of each offspring’s genome with a probability that is inversely proportional to the entropy of the group of ghosts.

BASIC NEURO-EVOLUTION: ONLINE LEARNING – ALGORITHM (CONTD.)

Step 4

The cloned offspring is evaluated briefly in offline mode, that is, by replacing the worst-fit member of the population and playing a short offline game (i.e. with no visualization of the actions) of t simulation steps.

The fitness values of the mutated offspring and the worst-fit Ghost are compared and the better one is kept for the next generation.
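Putting the four online steps together, here is a small sketch of one online generation. The distance traces, the `evaluate_offline` helper, the cell-occupancy entropy measure, and the constant in the mutation probability are illustrative assumptions; the f = d_1 − d_t fitness, the single-parent elitism, and the offspring-vs-worst replacement follow the slides.

```python
import numpy as np

def online_fitness(distances):
    """f = d_1 - d_t: how much closer this Ghost got to Pacman over the window."""
    return distances[0] - distances[-1]

def group_entropy(cells):
    """Diversity of the Ghost group, here taken as entropy of cell occupancy (an assumption)."""
    _, counts = np.unique(np.asarray(cells), axis=0, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log(p)).sum())

def online_generation(genomes, distance_traces, ghost_cells, evaluate_offline, rng):
    # Step 1: evaluate every Ghost over the last t simulation steps.
    fit = np.array([online_fitness(d) for d in distance_traces])
    best, worst = int(np.argmax(fit)), int(np.argmin(fit))
    # Step 2: pure elitism -- only the fittest Ghost breeds, cloning one offspring.
    child = genomes[best].copy()
    # Step 3: mutation probability inversely proportional to the group's entropy
    # (the constant 0.1 is an illustrative choice).
    p_mut = min(1.0, 0.1 / (group_entropy(ghost_cells) + 1e-6))
    mask = rng.random(child.shape) < p_mut
    child[mask] = rng.uniform(-5, 5, size=int(mask.sum()))
    # Step 4: evaluate the offspring in a short offline game in place of the
    # worst-fit Ghost; keep whichever of the two scores better.
    if evaluate_offline(child) > fit[worst]:
        genomes[worst] = child
    return genomes
```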

BASIC NEURO-EVOLUTION: ONLINE LEARNING

Pros:
  Adapts easily to varying maps and player mindsets
  Highly generalized

Cons:
  Slow due to intensive computations at run-time
  May take some time to re-train on new maps

NEURO-EVOLUTION OF AUGMENTING TOPOLOGIES: THE MODEL

Takes into account teamwork and strategic formations
Operates on global data
Has three modes of operation:
  Chasing (pursuing Pacman)
  Fleeing (evading Pacman when Pacman has consumed a power-pill)
  Returning (returning to the hideout to be restored)
Optimizes the team of Ghosts as a whole
Each Ghost controlled independently by a dedicated ANN
ANNs trained by an evolutionary algorithm (GA)
ANN training affects:
  Weights of the edges
  Interconnection of the perceptrons (ANN topology)
Ghosts trained in real time:
  Training proceeds in parallel with the game
  Adapts the Ghosts over short time slices
Ghosts classified according to their distance from Pacman
Each distance class has a dedicated ANN population which evolves genetically
Multiple populations aid in heterogeneous strategy development

NEURO-EVOLUTION OF AUGMENTING TOPOLOGIES: THE ANN

Each ANN represents a Ghost
Input:
  Current status of the Ghost
  Current status of the closest Ghost
  Current status of the Ghost closest to Pacman
  Distances to objects of interest (Pacman, Ghost, power-pill, pellet, intersection, etc.)
  Distances between Pacman & objects of interest (Ghost, power-pill, pellet, intersection, etc.)
Output:
  Score of a cell
  Applied 4 times, once for each adjacent cell
  Cell with maximum score selected for making the move (sketched below)
Connections:
  Minimally connected
  Evolve through NEAT
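A sketch of how such a network is used to pick a move: it is applied once per adjacent cell and the Ghost moves to the highest-scoring cell. The `features_for(cell)` helper is a hypothetical stand-in for the input list above; `net_activate` is the network's forward pass (for example, the `activate` method of a feed-forward network in the neat-python library).

```python
from typing import Callable, List, Sequence, Tuple

Cell = Tuple[int, int]

def choose_cell(adjacent_cells: List[Cell],
                features_for: Callable[[Cell], Sequence[float]],
                net_activate: Callable[[Sequence[float]], Sequence[float]]) -> Cell:
    """Apply the evolved network once per adjacent cell and move to the best one.

    features_for(cell) is a hypothetical helper that builds the input vector for
    that cell (Ghost status, closest-Ghost status, distances to Pacman,
    power-pills, pellets, intersections, ...). net_activate is the network's
    forward pass; its single output is interpreted as the score of that cell.
    """
    scores = [net_activate(features_for(c))[0] for c in adjacent_cells]
    return adjacent_cells[scores.index(max(scores))]
```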

NEURO-EVOLUTION OF AUGMENTING TOPOLOGIES: TRAINING ALGORITHM

Initialize: A number of random neural network populations are generated, each corresponding to Ghosts classified according to their distance to Pacman.

Game divided into time slices of a small number of moves. Gn represents the state of the game beginning at time slice n.

NEURO-EVOLUTION OF AUGMENTING TOPOLOGIES: TRAINING ALGORITHM (CONTD.)

Algorithm:

Mark a ghost for learning during current time slice, beginning at Gn.

Look ahead (based on the models of the other Ghosts and Pacman) and, through simulated play, store the game state as it is expected to be at the beginning of the next slice (eG_n+1). This will be the starting state for the NEAT simulation runs.

The fitness of a Ghost strategy is determined by evaluating the game state we expect to reach when the strategy is used in place of the marked Ghost (eG_n+2). This is an evaluation of the end state; various fitness schemes are considered.

In parallel to the running of the actual game, run NEAT until the actual game reaches Gn+1.

The best individual from the simulations is substituted into the game, replacing the marked ghost.

Repeat the process for the next ghost in turn.
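The per-slice loop can be sketched as follows. Every helper here (`game`, `lookahead`, `neat_step`, `evaluate_state`, the distance-class lookup) is a hypothetical placeholder for the paper's simulator, and NEAT is shown sequentially where the actual system runs it in parallel with the live game.

```python
def train_over_time_slices(game, populations, slice_len, neat_step, lookahead, evaluate_state):
    """One learning pass per time slice; all helper names are assumptions.

    game           -- live game object, advancing slice_len moves per slice
    populations    -- NEAT populations keyed by distance class
    neat_step      -- runs some NEAT generations on a population, returns the best genome
    lookahead      -- simulates play from a state for slice_len moves (using models
                      of the other Ghosts and Pacman), returning the expected state
    evaluate_state -- fitness of an end state (e.g. Pacman lives, distances)
    """
    ghost_ids = list(range(4))
    turn = 0
    while not game.over():
        marked = ghost_ids[turn % len(ghost_ids)]      # mark one Ghost per slice
        g_n = game.state()
        e_g_next = lookahead(g_n, slice_len)           # expected state eG_{n+1}

        def candidate_fitness(genome):
            # Substitute the candidate strategy for the marked Ghost and evaluate
            # the end state eG_{n+2} expected to be reached from eG_{n+1}.
            e_g_end = lookahead(e_g_next, slice_len, override={marked: genome})
            return evaluate_state(e_g_end)

        pop = populations[game.distance_class(marked)]
        best = neat_step(pop, candidate_fitness)       # runs while the real game plays the slice
        game.advance(slice_len)                        # the real game reaches G_{n+1}
        game.set_controller(marked, best)              # swap in the best individual
        turn += 1
```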

NEURO-EVOLUTION OF AUGMENTING TOPOLOGIES: TRAINING ALGORITHM – PICTORIAL REPRESENTATION

Picture courtesy – Ref.[2]

NEURO-EVOLUTION OF AUGMENTING TOPOLOGIES: EXPERIMENT 1 – CHASING AND EVADING PACMAN

Improvement over the Classical AI
Ghosts tend to form clusters, reducing effectiveness

Rank 1: Pacman's number of lives

Rank 2: Σ_{i=1..n} dist(Ghost_c,i, Pacman) − Σ_{i=1..n} dist(Ghost_f,i, Pacman)
        (total distance of the chasing Ghosts to Pacman against that of the fleeing Ghosts,
        where Ghost_c,i is the i-th chasing Ghost and Ghost_f,i the i-th fleeing Ghost)

                Score     Lives Lost
Classical AI    4808.4    1.44
Experiment 1    4127.6    1.12

Table courtesy – Ref.[2]
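The rank list above suggests a lexicographic fitness: Rank 1 is compared first and lower ranks only break ties. That combination rule, the orientation (fewer remaining Pacman lives is better for the Ghosts), and the numbers below are all assumptions for illustration.

```python
# Hypothetical lexicographic comparison of candidate Ghost strategies.
# Each candidate is scored as a tuple (Rank 1, Rank 2, ...); Python compares
# tuples element by element, so ordering by the tuple applies the ranks in turn.
def rank_key(pacman_lives, rank2_value):
    # Assumed orientation: fewer remaining Pacman lives is better for the Ghosts,
    # and a smaller Rank-2 distance term is better as well.
    return (pacman_lives, rank2_value)

candidates = {
    "strategy_A": rank_key(pacman_lives=1, rank2_value=230.0),  # made-up values
    "strategy_B": rank_key(pacman_lives=1, rank2_value=180.0),
    "strategy_C": rank_key(pacman_lives=2, rank2_value=90.0),
}
best = min(candidates, key=candidates.get)
print(best)  # strategy_B: ties with A on Rank 1, wins on Rank 2
```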

NEURO-EVOLUTION OF AUGMENTING TOPOLOGIES: EXPERIMENT 2 – REMAINING DISPERSED

Inefficient as compared to Experiment 1
Ghosts tend to oscillate in dispersed locations

Rank 1: Pacman's number of lives

Rank 2: min_{1≤i≤n} dist(Ghost_c,i, Pacman) − max_{1≤i≤n} dist(Ghost_f,i, Pacman)
        (distance of the closest chasing Ghost to Pacman against that of the farthest fleeing Ghost)

Rank 3: Σ_{i,j=1..n} dist(Ghost_c,i, Ghost_c,j) + Σ_{i,j=1..n} dist(Ghost_f,i, Ghost_f,j)
        (pairwise distances among the chasing Ghosts plus pairwise distances among the fleeing Ghosts)

                Score     Lives Lost
Classical AI    4808.4    1.44
Experiment 1    4127.6    1.12
Experiment 2    4930.8    1.52

Table courtesy – Ref.[2]

NEURO-EVOLUTION OF AUGMENTING TOPOLOGIES: EXPERIMENT 3 – PROTECTION BEHAVIOUR

Teamwork improved
Ghosts committing 'suicide'!

Rank 1: Pacman's number of lives

Rank 2: count(Ghost_r)  (number of returning Ghosts)

Rank 3: count(Ghost_f)  (number of fleeing Ghosts)

Rank 4: min_{1≤i≤n} dist(Ghost_c,i, Pacman) − max_{1≤i≤n} dist(Ghost_f,i, Pacman)

Rank 5: Σ_{i,j=1..n} dist(Ghost_c,i, Ghost_c,j) + Σ_{i,j=1..n} dist(Ghost_f,i, Ghost_f,j)

                Score     Lives Lost
Classical AI    4808.4    1.44
Experiment 1    4127.6    1.12
Experiment 2    4930.8    1.52
Experiment 3    4271.6    1.64

Table courtesy – Ref.[2]

NEURO-EVOLUTION OF AUGMENTING TOPOLOGIES: EXPERIMENT 4 – AMBUSHING PACMAN

Kill rate significantly increased

Rank 1: Pacman's number of lives

Rank 2: Intersections 'controlled' by Pacman

Rank 3: min_{1≤i≤n} dist(Ghost_c,i, Pacman) − max_{1≤i≤n} dist(Ghost_f,i, Pacman)

Rank 4: Σ_{i,j=1..n} dist(Ghost_c,i, Ghost_c,j) + Σ_{i,j=1..n} dist(Ghost_f,i, Ghost_f,j)

Rank 5: Pacman's score

                Score     Lives Lost
Classical AI    4808.4    1.44
Experiment 1    4127.6    1.12
Experiment 2    4930.8    1.52
Experiment 3    4271.6    1.64
Experiment 4    4494.4    1.96

Table courtesy – Ref.[2]

NEURO-EVOLUTION OF AUGMENTING TOPOLOGIES: DISCUSSIONS

Uses high-level global data about the state of the game

Reduces computational lag by looking ahead and employing parallelism

Encourages the system to learn short-term strategies, rather than generalized long-term ones

Lays out basic fitness schemes, opening up a horizon for many more

Demonstrates complex team behaviours

CONCLUSION

PacMan serves as a good test-bed for programming intelligent agents

Generalized strategies applicable to a vast class of predator/prey games

Programming Ghosts gives good insight into efficient team strategies

REFERENCES

[1] G. N. Yannakakis, and J. Hallam, "Evolving Opponents for Interesting Interactive Computer Games,'' in Proceedings of the 8th International Conference on the Simulation of Adaptive Behavior (SAB'04); From Animals to Animats 8, pp. 499-508, Los Angeles, CA, USA, July 13-17, 2004. The MIT Press.

[2] M. Wittkamp, L. Barone, and P. Hingston, "Using NEAT for Continuous Adaptation and Teamwork Formation in Pacman," in Proceedings of the 2008 IEEE Symposium on Computational Intelligence and Games (CIG '08), 2008.

[3] K. O. Stanley and R. Miikkulainen, "Evolving Neural Networks through Augmenting Topologies," Evolutionary Computation, vol. 10, no. 2, pp. 99-127, 2002. The MIT Press.