EVOLVING & ADAPTING OPPONENTS IN PACMAN
PRESENTED BY:
DEBARGHYA MAJUMDAR, SOMDAS BANDYOPADHYAY, KAUSTAV DEY BISWAS
TOPICS TO BE DISCUSSED
PacMan: The Game; Areas of Intelligence; Classical Ghost Algorithm
Basic Neuro-Evolution: The Model; The ANN; Offline Learning; Online Learning
Neuro-Evolution of Augmenting Topologies: The Model; The ANN; Training Algorithm; Experiments; Discussions
Conclusions
References
PACMAN: THE GAME Classical PacMan released by Namco (Japan) in 1980 Single player predator/prey game Player plays as the PacMan:
Navigates through a 2D maze Eats pellets Avoids ‘Ghosts’
Ghosts played by the computer Usually 4 in number Chase the PacMan and try to catch it
Winning / Losing: PacMan wins if it has eaten all pellets or survived for 180 sec; PacMan loses if caught by a Ghost
Special Power-pills: Allow PacMan to eat the Ghosts for a short period of time; eaten Ghosts respawn
PACMAN: AREAS OF INTELLIGENCE
Programming the PacMan to replace the human player: Heavily researched field; gives insight into AI, but doesn't add value to the game engine
Making the Ghosts more intelligent
Strives to: Make the Ghosts more efficient killers; Incorporate teamwork; Minimize the PacMan's score; Make the Ghosts learn & adapt; Make the game more 'interesting'
Adds value to the game; Scarcely researched field; Currently our topic of interest
PACMAN: THE CLASSICAL GHOST ALGORITHM
Two modes of play: Attack: Chase down Pacman Scatter: Break away when Pacman has a power-pill
Decide target cell on reaching an intersection:
Attack target:
Red Ghost: Current position of Pacman
Pink Ghost: Cell 4 positions ahead of Pacman
Blue Ghost: Cell 2 positions ahead of Pacman
Orange Ghost: Targets Pacman when far away, retires to its corner when nearby
Scatter target: Pseudo-random behaviour
Move so as to minimize distance from the target cell
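The attack-target rules above can be sketched in Python. The direction vector, corner cell, and the orange Ghost's distance threshold are illustrative assumptions, not the original Namco logic:

```python
# Sketch of the classical per-ghost targeting rules.
# Positions are (x, y) grid cells; `pacman_dir` is Pacman's current
# unit direction vector. The threshold for the orange ghost is assumed.

def attack_target(ghost, ghost_pos, pacman_pos, pacman_dir, corner):
    px, py = pacman_pos
    dx, dy = pacman_dir
    if ghost == "red":                      # chase Pacman's current cell
        return pacman_pos
    if ghost == "pink":                     # aim 4 cells ahead of Pacman
        return (px + 4 * dx, py + 4 * dy)
    if ghost == "blue":                     # aim 2 cells ahead of Pacman
        return (px + 2 * dx, py + 2 * dy)
    if ghost == "orange":                   # chase when far, retire when near
        gx, gy = ghost_pos
        far = abs(gx - px) + abs(gy - py) > 8   # assumed distance threshold
        return pacman_pos if far else corner
    raise ValueError(ghost)
```

On reaching an intersection, each Ghost then picks the legal move that minimizes distance to its target cell.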
PACMAN: MAKING THE GHOSTS INTELLIGENT
We will discuss two approaches for programming intelligence, learnability & adaptability into the Ghosts:
Local optimization using basic Neuro-Evolution
Global optimization using Neuro-Evolution of Augmenting Topologies (NEAT)
BASIC NEURO-EVOLUTION: THE MODEL
Game field modeled as a grid. Each square can contain any of: Wall block, Pellet, Pacman, Ghost(s)
Power-pills not incorporated to keep things simple Ghosts have only local info about the grid Game proceeds in cycles of 2 steps:
Gather information about environment Make a move by processing the information
Each Ghost controlled independently by a dedicated ANN ANNs trained by Evolutionary Algorithm (GA):
Offline (Train & Deploy) Online (Learn as you go)
Weight-adjusting is the only means of training the ANNs
BASIC NEURO-EVOLUTION: THE ANN
A 4-5-4 feed-forward neural controller, comprising sigmoid neurons, is employed to manage each Ghost's motion.
Inputs: Using their sensors, Ghosts inspect the environment from their own point of view and decide their next action. Each Ghost receives input information from its environment expressed in the neural network's input array of dimension 4.
Outputs: Four scores, one corresponding to each direction. The direction with the maximum score is selected.
BASIC NEURO-EVOLUTION: THE ANN – INPUTS
The input array consists of:

∆x,P = x_g – x_p
∆y,P = y_g – y_p
∆x,C = x_g – x_c
∆y,C = y_g – y_c

where (x_g, y_g), (x_p, y_p) and (x_c, y_c) are the Cartesian co-ordinates of the current positions of the current Ghost, Pacman, and the closest other Ghost, respectively.
Picture courtesy – Ref.[1]
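A minimal sketch of the 4-5-4 sigmoid controller and its input array. The weights here are random stand-ins for the evolved ones, and the direction names are assumptions:

```python
import math, random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def make_net(n_in=4, n_hid=5, n_out=4, lo=-5.0, hi=5.0):
    # uniformly distributed connection weights in [-5, 5], as in the
    # offline initialization described later
    w1 = [[random.uniform(lo, hi) for _ in range(n_in)] for _ in range(n_hid)]
    w2 = [[random.uniform(lo, hi) for _ in range(n_hid)] for _ in range(n_out)]
    return w1, w2

def forward(net, inputs):
    w1, w2 = net
    hidden = [sigmoid(sum(w * x for w, x in zip(row, inputs))) for row in w1]
    return [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w2]

def choose_move(net, ghost, pacman, closest_ghost):
    (xg, yg), (xp, yp), (xc, yc) = ghost, pacman, closest_ghost
    inputs = [xg - xp, yg - yp, xg - xc, yg - yc]  # the 4-D input array
    scores = forward(net, inputs)                  # one score per direction
    directions = ["up", "down", "left", "right"]
    return directions[scores.index(max(scores))]
```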
BASIC NEURO-EVOLUTION: OFFLINE LEARNING
An off-line evolutionary learning approach is used in order to produce some 'good' (i.e. in terms of performance) initial behaviors for the online learning mechanism.
The neural networks that determine the behavior of the Ghosts are themselves evolved.
The evolving process is limited to the connection weights of the neural network.
The evolutionary procedure is based on genetic algorithm
Each Ghost has a genome that encodes the connection weights of its neural network.
A population of neural networks (Ghosts) is initialized randomly with initial uniformly distributed random connection weights that lie within [-5, 5].
BASIC NEURO-EVOLUTION: OFFLINE LEARNING – ALGORITHM
At each generation:
Step 1
Every Ghost in the population is cloned 4 times.
These 4 clones are placed in the PacMan game field and play N games, each one for an evaluation period of t simulation steps.
The outcome of these games is the time taken to kill Pacman, t_k, recorded for each game.
BASIC NEURO-EVOLUTION: OFFLINE LEARNING – ALGORITHM (CONTD.)
Step 2
Each Ghost is evaluated for each game, and its fitness value is E{f} over the N games, where f for a single game is:

f = Σ (i = 1 to 4) [1 – (t_k,i / t)]

where t_k,i is the time the i-th clone takes to kill Pacman within the evaluation period of t simulation steps.

By the use of the fitness function f, we promote Pacman-killing behaviors capable of achieving a high performance value.
BASIC NEURO-EVOLUTION: OFFLINE LEARNING – ALGORITHM (CONTD.)
Step 3
A pure elitism selection method is used where only the 10% best-fit solutions determine the members of the intermediate population and, therefore, are able to breed.
Step 4
Each parent clones an equal number of offspring in order to replace the solutions not picked by elitism.
BASIC NEURO-EVOLUTION: OFFLINE LEARNING – ALGORITHM (CONTD.)
Step 5
Mutation occurs in each gene (connection weight) of each offspring’s genome with a small probability. A uniform random distribution is used to define the mutated value of the connection weight.
The algorithm terminates when a predetermined number of generations g is reached (e.g. g = 1000), and the best-fit Ghost's connection weights are saved.
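Steps 1-5 can be sketched as a single GA generation. The clone-and-play evaluation is abstracted into an `evaluate` callback returning E{f}, and the per-gene mutation probability `P_MUT` is an assumed value (the slides only say "small"):

```python
import random

ELITE_FRAC = 0.10        # pure elitism: only the top 10% breed
P_MUT = 0.02             # assumed small per-gene mutation probability

def next_generation(population, evaluate):
    # Steps 1-2: evaluate every genome (4 clones x N games abstracted
    # into `evaluate`, which returns the mean fitness E{f})
    scored = sorted(population, key=evaluate, reverse=True)
    # Step 3: the 10% best-fit solutions form the intermediate population
    n_elite = max(1, int(len(scored) * ELITE_FRAC))
    parents = scored[:n_elite]
    # Step 4: each parent clones an equal share of offspring to replace
    # the non-picked solutions
    offspring = []
    while len(parents) + len(offspring) < len(population):
        parent = parents[len(offspring) % n_elite]
        child = list(parent)
        # Step 5: per-gene mutation, uniform redraw within [-5, 5]
        for i in range(len(child)):
            if random.random() < P_MUT:
                child[i] = random.uniform(-5.0, 5.0)
        offspring.append(child)
    return parents + offspring
```

Running this loop for g generations and saving the fittest genome completes the offline phase.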
BASIC NEURO-EVOLUTION: OFFLINE LEARNING
Pros: Computationally efficient – minimal in-game computation; can be tailor-made for specific maps
Cons: Cannot adapt to changing maps; may overfit to the training player's characteristics
BASIC NEURO-EVOLUTION: ONLINE LEARNING
This learning approach is based on the idea of Ghosts that learn while they are playing against Pacman.
In other words, Ghosts that are reactive to any player's behavior and learn from the player's strategy, instead of being predictable and uninteresting.
Furthermore, this approach’s additional objective is to keep the game’s interest at high levels as long as it is being played.
BASIC NEURO-EVOLUTION: ONLINE LEARNING (CONTD)
Beginning from any initial group of homogeneous offline trained (OLT) Ghosts, the OLL mechanism attempts to transform them into a group of heterogeneous Ghosts that are interesting to play against.
An OLT Ghost is cloned 4 times and its clones are placed in the Pacman game field to play against a selected Pacman type of player.
BASIC NEURO-EVOLUTION: ONLINE LEARNING – ALGORITHM
At each generation:
Step 1
Each ghost is evaluated every t simulation steps while the game is played. The fitness function is :-
f = d_1 – d_t

where d_i is the distance between the Ghost and Pacman at the i-th simulation step.

This fitness function promotes Ghosts that move towards Pacman within an evaluation period of t simulation steps.
BASIC NEURO-EVOLUTION: ONLINE LEARNING – ALGORITHM (CONTD.)
Step 2
A pure elitism selection method is used where only the fittest solution is able to breed. The best-fit parent clones an offspring.
Step 3
Mutation occurs in each gene (connection weight) of each offspring’s genome with a probability that is inversely proportional to the entropy of the group of ghosts.
BASIC NEURO-EVOLUTION: ONLINE LEARNING – ALGORITHM (CONTD.)
Step 4
The cloned offspring is evaluated briefly in offline mode, that is, by replacing the worst-fit member of the population and playing a short offline (i.e. no visualization of the actions) game of t simulation steps.
The fitness values of the mutated offspring and the worst-fit Ghost are compared and the better one is kept for the next generation.
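One online generation can be sketched as follows. The entropy-dependent mutation probability is stated only as "inversely proportional to the entropy of the group", so the form base/entropy below is an assumption:

```python
import random

def online_fitness(distances):
    # f = d_1 - d_t: reward closing in on Pacman over the window;
    # distances[i] = ghost-to-Pacman distance at simulation step i
    return distances[0] - distances[-1]

def mutation_prob(entropy, base=0.4, eps=1e-6):
    # assumed form: inversely proportional to the group's entropy
    return min(1.0, base / (entropy + eps))

def online_step(population, fitnesses, group_entropy, offline_eval):
    # Step 2: pure elitism -- only the single fittest ghost breeds
    best = max(range(len(population)), key=lambda i: fitnesses[i])
    worst = min(range(len(population)), key=lambda i: fitnesses[i])
    child = list(population[best])
    # Step 3: entropy-dependent mutation of each connection weight
    p = mutation_prob(group_entropy)
    for i in range(len(child)):
        if random.random() < p:
            child[i] = random.uniform(-5.0, 5.0)
    # Step 4: short offline game; the offspring replaces the worst-fit
    # member only if it scores better
    if offline_eval(child) > fitnesses[worst]:
        population[worst] = child
    return population
```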
BASIC NEURO-EVOLUTION: ONLINE LEARNING
Pros: Adapts easily to varying maps and player mindsets; highly generalized
Cons: Slow due to intensive computations at run-time; may take some time to re-train on new maps
NEURO-EVOLUTION OF AUGMENTING TOPOLOGIES: THE MODEL
Takes into account teamwork and strategic formations
Operates on global data
Has three modes of operation: Chasing (pursuing Pacman); Fleeing (evading Pacman when Pacman has consumed a power-pill); Returning (going back to the hideout to be restored)
Optimizes the team of Ghosts as a whole
Each Ghost controlled independently by a dedicated ANN
ANNs trained by an Evolutionary Algorithm (GA)
ANN training affects: Weights of the edges; Interconnection of the perceptrons (ANN topology)
Ghosts trained in real-time: Training proceeds in parallel with the game; adapts the Ghosts over short time slices
Ghosts classified according to their distance from Pacman; each distance class has a dedicated ANN population which evolves genetically; multiple populations aid in heterogeneous strategy development
NEURO-EVOLUTION OF AUGMENTING TOPOLOGIES: THE ANN
Each ANN represents a Ghost
Input:
Current status of the Ghost
Current status of the closest Ghost
Current status of the Ghost closest to Pacman
Distances to objects of interest (Pacman, Ghost, Power-pill, Pellet, Intersection, etc.)
Distances between Pacman & objects of interest (Ghost, Power-pill, Pellet, Intersection, etc.)
Output:
Score of a cell; applied 4 times, once for each adjacent cell; the cell with the maximum score is selected for the move
Connections: Minimally connected; evolves through NEAT
NEURO-EVOLUTION OF AUGMENTING TOPOLOGIES: TRAINING ALGORITHM
Initialize: A number of random neural network populations is generated, each corresponding to ghosts classified according to their distance to Pacman.
Game divided into time slices of a small number of moves. Gn represents the state of the game beginning at time slice n.
NEURO-EVOLUTION OF AUGMENTING TOPOLOGIES: TRAINING ALGORITHM (CONTD.) Algorithm:
Mark a ghost for learning during current time slice, beginning at Gn.
Look ahead (based on the models of the other ghosts and Pacman) and store the game state as expected to be like at the beginning of the next slice through simulated play (eGn+1 ). This will be the starting state for the NEAT simulation runs.
The fitness of a ghost strategy is determined by evaluating the game state that we expect to reach when the strategy is used in place of the marked ghost (eGn+2 ). This evaluation is an evaluation of the end state. Various fitness schemes are considered.
In parallel to the running of the actual game, run NEAT until the actual game reaches Gn+1.
The best individual from the simulations is substituted into the game, replacing the marked ghost.
Repeat the process for the next ghost in turn.
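The per-time-slice loop above can be sketched structurally. The game simulation, the ghost/Pacman models, and the NEAT search itself are stubbed out as callbacks; this shows only the look-ahead bookkeeping, not a real NEAT engine:

```python
# Structural sketch of the per-time-slice NEAT training step.
# `simulate_slice` advances a game state one slice using the models of
# the other ghosts and Pacman; `run_neat` stands in for the NEAT search;
# `evaluate_state` scores an end state under the chosen fitness scheme.

def train_time_slice(game_state, marked_ghost, run_neat, simulate_slice,
                     evaluate_state):
    # Look ahead one slice to get eG(n+1), the starting state for the
    # simulated NEAT evaluation runs.
    expected_next = simulate_slice(game_state)

    def fitness(candidate):
        # Substitute the candidate for the marked ghost and evaluate the
        # end state eG(n+2) reached one further slice ahead.
        end_state = simulate_slice(expected_next,
                                   override={marked_ghost: candidate})
        return evaluate_state(end_state)

    # In the full system NEAT runs in parallel with the real game until
    # it reaches G(n+1); then the best individual replaces the marked ghost.
    return run_neat(fitness)
```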
NEURO-EVOLUTION OF AUGMENTING TOPOLOGIES: TRAINING ALGORITHM – PICTORIAL REPRESENTATION
Picture courtesy – Ref.[2]
NEURO-EVOLUTION OF AUGMENTING TOPOLOGIES: EXPERIMENT 1 – CHASING AND EVADING PACMAN
Improvement over Classical AI Ghosts tend to form clusters, reducing effectiveness
Rank 1: Pacman's number of lives

Rank 2: Σ (i = 1 to n) dist(Ghost_ci, Pacman) – Σ (i = 1 to n) dist(Ghost_fi, Pacman)
(Ghost_c: chasing Ghosts, whose distance to Pacman is minimized; Ghost_f: fleeing Ghosts, whose distance is maximized)

             | Score  | Lives Lost
Classical AI | 4808.4 | 1.44
Experiment 1 | 4127.6 | 1.12

Table courtesy – Ref.[2]
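The ranked objectives in these experiments amount to a lexicographic comparison: Rank 1 decides, and lower ranks only break ties. A sketch (the candidate representation and objective extraction are illustrative assumptions):

```python
# Lexicographic comparison of ranked fitness objectives.
# objectives: list of (value, maximize) pairs, Rank 1 first.

def rank_key(objectives):
    # Negate maximized terms so that a smaller tuple is always better;
    # Python compares tuples element by element, i.e. lexicographically.
    return tuple(-v if maximize else v for v, maximize in objectives)

def best_strategy(candidates, objectives_of):
    return min(candidates, key=lambda c: rank_key(objectives_of(c)))
```

For example, with Rank 1 = Pacman's lives (minimized) and Rank 2 = summed chase distance (minimized), a strategy that costs Pacman a life always beats one that merely closes distance.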
NEURO-EVOLUTION OF AUGMENTING TOPOLOGIES: EXPERIMENT 2 – REMAINING DISPERSED
Inefficient as compared to Experiment 1 Ghosts tend to oscillate in dispersed locations
Rank 1: Pacman's number of lives

Rank 2: min (1 ≤ i ≤ n) dist(Ghost_ci, Pacman) – max (1 ≤ i ≤ n) dist(Ghost_fi, Pacman)

Rank 3: Σ (i = 1 to n–1) dist(Ghost_ci, Ghost_ci+1) + Σ (i = 1 to n–1) dist(Ghost_fi, Ghost_fi+1)
(inter-Ghost distances maximized to keep the Ghosts dispersed)

             | Score  | Lives Lost
Classical AI | 4808.4 | 1.44
Experiment 1 | 4127.6 | 1.12
Experiment 2 | 4930.8 | 1.52

Table courtesy – Ref.[2]
NEURO-EVOLUTION OF AUGMENTING TOPOLOGIES: EXPERIMENT 3 – PROTECTION BEHAVIOUR
Teamwork improved Ghosts committing ‘suicide’!
Rank 1: Pacman's number of lives

Rank 2: count(Ghost_r)

Rank 3: count(Ghost_f)

Rank 4: min (1 ≤ i ≤ n) dist(Ghost_ci, Pacman) – max (1 ≤ i ≤ n) dist(Ghost_fi, Pacman)

Rank 5: Σ (i = 1 to n–1) dist(Ghost_ci, Ghost_ci+1) + Σ (i = 1 to n–1) dist(Ghost_fi, Ghost_fi+1)

             | Score  | Lives Lost
Classical AI | 4808.4 | 1.44
Experiment 1 | 4127.6 | 1.12
Experiment 2 | 4930.8 | 1.52
Experiment 3 | 4271.6 | 1.64

Table courtesy – Ref.[2]
NEURO-EVOLUTION OF AUGMENTING TOPOLOGIES: EXPERIMENT 4 – AMBUSHING PACMAN
Kill rate significantly increased
Rank 1: Pacman's number of lives

Rank 2: Intersections 'controlled' by Pacman

Rank 3: min (1 ≤ i ≤ n) dist(Ghost_ci, Pacman) – max (1 ≤ i ≤ n) dist(Ghost_fi, Pacman)

Rank 4: Σ (i = 1 to n–1) dist(Ghost_ci, Ghost_ci+1) + Σ (i = 1 to n–1) dist(Ghost_fi, Ghost_fi+1)

Rank 5: Pacman's score

             | Score  | Lives Lost
Classical AI | 4808.4 | 1.44
Experiment 1 | 4127.6 | 1.12
Experiment 2 | 4930.8 | 1.52
Experiment 3 | 4271.6 | 1.64
Experiment 4 | 4494.4 | 1.96

Table courtesy – Ref.[2]
NEURO-EVOLUTION OF AUGMENTING TOPOLOGIES: DISCUSSIONS
Uses high-level global data about the state of the game
Reduces computational lag by looking ahead and employing parallelism
Encourages system to learn short-term strategies, rather than generalized long-term ones
Sets out basic fitness schemes, opening up a horizon for many more
Demonstrates complex team behaviours
CONCLUSION
PacMan serves as a good test-bed for programming intelligent agents
Generalized strategies applicable to a vast class of predator/prey games
Programming Ghosts gives good insight into efficient team strategies
REFERENCES
[1] G. N. Yannakakis, and J. Hallam, "Evolving Opponents for Interesting Interactive Computer Games,'' in Proceedings of the 8th International Conference on the Simulation of Adaptive Behavior (SAB'04); From Animals to Animats 8, pp. 499-508, Los Angeles, CA, USA, July 13-17, 2004. The MIT Press.
[2] M. Wittkamp, L. Barone, and P. Hingston, "Using NEAT for Continuous Adaptation and Teamwork Formation in Pacman," in Proceedings of the 2008 IEEE Symposium on Computational Intelligence and Games (CIG '08).
[3] K. O. Stanley and R. Miikkulainen, "Evolving Neural Networks through Augmenting Topologies," Evolutionary Computation, vol. 10, no. 2, pp. 99-127, 2002. The MIT Press.