
Game Playing

Why study games? Game playing: an idealized world of hostile agents attempting to diminish one's well-being. Reasons to study games:

Modeling strategic and adversarial problems is of general interest (e.g. economic situations).
Handling opponents introduces uncertainty and requires contingency plans.
Problems are usually complex and very often viewed as an indicator of intelligence.

Characteristics:
Well-formalized problems: clear description of the environment.
Common-sense knowledge is not required.
Rules are fixed.
Number of nodes in the tree might be high, but memorizing the past is not needed.

Why Study Games?

Games offer:
Intellectual Engagement
Abstraction
Representability
Performance Measure

Not all games are suitable for AI research. We will restrict ourselves to two-person, perfect-information board games.

Game Playing as Search

Playing a game involves searching for the best move.

Board games clearly involve notions like start state, goal state, operators, etc. We can thus usefully import problem-solving techniques that we have already met.

There are nevertheless important differences from standard search problems.

Special Characteristics of Game Playing Search

The main differences are the uncertainties introduced by:

Until now we assumed the situation is not going to change while we search. However…
Presence of an opponent: one does not know what the opponent will do until he/she does it. Game-playing programs must solve the contingency problem.

Complexity. Most interesting games are simply too complex to solve by exhaustive means. Chess, for example, has an average branching factor of 35. Uncertainty also arises from not having the resources to compute a move which is guaranteed to be the best.

AI and game playing

Game playing (especially chess and checkers) was the first test application of AI.
It involves a different type of search problem than we have considered up to now: a solution is not a path, but simply the next move.
The best move depends on what the opponent might do (adversary search).


Two-player games: motivation

Previous heuristics and search procedures are only useful for single-player games

no notion of turns: one or more cooperative agents
does not take into account adversarial moves

Games are ideal to explore adversarial strategies

well-defined, abstract rules
most are formulated as search problems
really hard combinatorial problems -- chess!

Two-player games

The search tree is the same for each player.
Even levels i are moves of player A.
Odd levels i+1 are moves of player B.

Each player searches for a goal (different for each) at their level.
Each player evaluates the states according to their heuristic function.
A's best move brings B to the worst state.
A searches for its best move assuming B will also search for its best move.

MinMax search strategy

Search for A's best next move, so that no matter what B does (in particular, choosing its best move) A will be better off.
At each step, evaluate the value of all descendants: take the maximum if it is A's turn, or the minimum if it is B's turn.
We need the estimated values d moves ahead:

generate all nodes to level d (BFS)
propagate Min-Max values up from the leaves (a minimal sketch follows below)
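A minimal Python sketch of this depth-bounded Min-Max procedure. The helpers successors(state) and evaluate(state) are assumptions standing in for whatever the specific game provides; they are not part of the slides.

```python
def minimax_value(state, depth, maximizing):
    """Depth-bounded minimax: estimate the value of `state`, looking `depth` plies ahead.
    `successors` and `evaluate` are assumed game-specific helpers (placeholders)."""
    children = successors(state)
    if depth == 0 or not children:          # depth bound reached or terminal state
        return evaluate(state)
    values = [minimax_value(c, depth - 1, not maximizing) for c in children]
    return max(values) if maximizing else min(values)

def best_move(state, depth):
    """Pick A's (MAX's) move whose subtree has the highest minimax value."""
    return max(successors(state),
               key=lambda c: minimax_value(c, depth - 1, maximizing=False))
```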

Typical case

2-person game
Players alternate moves
Zero-sum: one player's loss is the other's gain
Perfect information: both players have access to complete information about the state of the game. No information is hidden from either player.
No chance (e.g., using dice) involved
Examples: Tic-Tac-Toe, Checkers, Chess, Go, Nim, Othello
Not: Bridge, Solitaire, Backgammon, ...

How to play a game

A way to play such a game is to:
Consider all the legal moves you can make
Compute the new position resulting from each move
Evaluate each resulting position and determine which is best
Make that move
Wait for your opponent to move and repeat

Key problems are (a minimal interface sketch follows below):
Representing the "board"
Generating all legal next boards
Evaluating a position
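One way to organize these three key problems, sketched in Python; the class and method names are illustrative assumptions, not a standard API.

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class Board:
    """Key problem 1: representing the "board" (here just a tuple of cells plus the side to move)."""
    cells: tuple
    to_move: str  # "MAX" or "MIN"

class Game:
    def legal_moves(self, board: Board) -> List[Board]:
        """Key problem 2: generate all legal next boards."""
        raise NotImplementedError

    def evaluate(self, board: Board) -> float:
        """Key problem 3: evaluate a position (exact utility or heuristic estimate)."""
        raise NotImplementedError
```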

Ingredients of 2-Person Games

Players: We call them Max and Min.

Initial State: Includes board position and whose turn it is.

Operators: These correspond to legal moves.

Terminal Test: A test applied to a board position which determines whether the game is over. In chess, for example, this would be a checkmate or stalemate situation.

Utility Function: A function which assigns a numeric value to a terminal state. For example, in chess the outcome is win (+1), lose (-1) or draw (0). Note that by convention, we always measure utility relative to Max.


Normal and Game Search Problem

Normal search problem: Max searches for a sequence of moves yielding a winning position and then makes the first move in the sequence.

Game search problem: Clearly, this is not feasible in a game situation where Min's moves must be taken into consideration. Max must devise a strategy which leads to a winning position no matter what moves Min makes.

Chess as a First Choice

It provides proof that a machine can actually do something that was thought to require intelligence.

It has simple rules.

The world state is fully accessible to the program.

The computer representation can be correct in every relevant detail.

Some games

• tic-tac-toe

• checkers

• Go

• Othello

• chess

• poker

• bridge

Complexity of Searching

The presence of an opponent makes the decision problem more complicated.

Games are usually much too hard to solve.

Games penalize inefficiency very severely.

Things to Come…

Perfect Decisions in Two-Person Games

Imperfect Decisions

Alpha-Beta Pruning

Games That Include an Element of Chance

Perfect Decisions in Two-Player Games


Games as Search Problem

Some games can normally be defined in the form of a tree.

Branching factor is usually an average of the possible number of moves at each node.

This is a simple search problem: a player must search this search tree and reach a leaf node with a favorable outcome.

Two Player Game

Two players: Max and Min.
The objective of both Max and Min is to optimize winnings.

Max must reach a terminal state with the highest utility.
Min must reach a terminal state with the lowest utility.

The game ends when either Max or Min has reached a terminal state.
Upon reaching a terminal state, points may be awarded or sometimes deducted.

Search Problem Revisited

Simple problem is to reach a favorable terminal state

The problem is not so simple... Max must reach a terminal state with as high a utility as possible regardless of Min's moves.

Max must develop a strategy that determines the best possible move for each move Min makes.

Game Playing - Minimax

Game Playing

An opponent tries to thwart your every move

1944 - John von Neumann outlined a search method (Minimax) that maximises your position whilst minimising your opponent's.

Game Playing – Example

Nim (a simple game):
Start with a single pile of tokens.
At each move the player must select a pile and divide the tokens into two non-empty, non-equal piles.


Game Playing - Minimax

Starting with 7 tokens, the game is small enough that we can draw the entire game tree

The “game tree” to describe all possible games follows:


Level 0: 7
Level 1: 6-1  5-2  4-3
Level 2: 5-1-1  4-2-1  3-2-2  3-3-1
Level 3: 4-1-1-1  3-2-1-1  2-2-2-1
Level 4: 3-1-1-1-1  2-2-1-1-1
Level 5: 2-1-1-1-1-1
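A short Python sketch that generates exactly these successor states, assuming a state is represented as a sorted tuple of pile sizes (the function names are illustrative):

```python
def nim_successors(piles):
    """All states reachable by splitting one pile into two non-empty, unequal piles."""
    successors = set()
    for i, p in enumerate(piles):
        for a in range(1, (p + 1) // 2):          # a < p - a guarantees unequal piles
            rest = piles[:i] + piles[i + 1:]
            successors.add(tuple(sorted(rest + (a, p - a))))
    return sorted(successors)

def print_game_tree_levels(start=(7,)):
    """Print the levels of the Nim game tree, as in the listing above."""
    level = {start}
    while level:
        print(sorted(level))
        level = {s for state in level for s in nim_successors(state)}

print_game_tree_levels()
```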

Game Playing - Minimax

Conventionally, in discussion of minimax, have two players “MAX” and “MIN”

The utility function is taken to be the utility for MAX

Larger values are better for MAX.

Game Playing – Nim

Remember that larger values are taken to be better for MAX.
Assume that we use a utility function of:

1 = a win for MAX
0 = a win for MIN

We only compare values, “larger or smaller”, so the actual sizes do not matter

in other games might use {+1,0,-1} for {win,draw,lose}.

Game Playing – Minimax

Basic idea of minimax:

Player MAX is going to take the best move available

Will select the next state to be the one with the highest utility

Hence, value of a MAX node is the MAXIMUM of the values of the next possible states

i.e. the maximum of its children in the search tree

Game Playing – Minimax

Player MIN is going to take the best move available for MIN

i.e. the worst available for MAX

Will select the next state to be the one with the lowest utility

recall, higher utility values are better for MAX and so worse for MIN

Hence, value of a MIN node is the MINIMUM of the values of the next possible states

i.e. the minimum of its children in the search tree

Game Playing – Minimax Summary

A “MAX” move takes the best move for MAX –so takes the MAX utility of the children

A “MIN” move takes the best for min – hence the worst for MAX – so takes the MIN utility of the children

Games alternate in play between MIN and MAX


[Figure: the Nim game tree above, annotated with alternating MIN and MAX levels; the minimax values (1 = win for MAX, 0 = loss for MAX) are propagated up from the leaves.]

Game Playing – Use of Minimax

The Min node has value +1

All moves by MIN lead to a state of value +1 for MAX.
MIN cannot avoid losing.

From the values on the tree one can read off the best moves for each player

make sure you know how to extract these best moves ("perfect lines of play"); a minimal sketch follows below
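A minimal sketch of extracting such a line of play, reusing the hypothetical minimax_value and successors helpers sketched earlier:

```python
def principal_line(state, depth, maximizing=True):
    """Follow, from the root, the child with the best minimax value at each level."""
    line = [state]
    while depth > 0 and successors(state):
        pick = max if maximizing else min
        state = pick(successors(state),
                     key=lambda c: minimax_value(c, depth - 1, not maximizing))
        line.append(state)
        depth -= 1
        maximizing = not maximizing
    return line
```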

Game Playing – Bounded Minimax

For real games, search trees are much bigger and deeper than Nim

Cannot possibly evaluate the entire tree

Have to put a bound on the depth of the search

Game Playing – Bounded Minimax

The terminal states are no longer a definite win/loss.
(Actually they are really a definite win/draw/loss, but with reasonable computer resources we cannot determine which.)

Have to heuristically/approximately evaluate the quality of the positions of the states

Evaluation of the utility function is expensive if it is not a clear win or loss

Game Playing – Bounded Minimax

Next Slide:

Artificial example of minimax bounded

Evaluate “terminal position” after all possible moves by MAX

(The numbers are invented, and just to illustrate the working of minimax)

[Figure: one-ply example. MAX node A has two children B and C, whose utility values 1 and -3 are obtained by an evaluation function; A therefore takes the value 1.]


Game Playing – Bounded Minimax

Example of minimax with bounded depth

Evaluate “terminal position” after all possible moves in the order:

1. MAX (a.k.a. "agent")
2. MIN (a.k.a. "opponent")
3. MAX

(The numbers are invented, and just to illustrate the working of minimax.) Assuming MAX plays first, complete the MIN/MAX tree:

[Figure: three-ply example. Leaf utilities obtained by an evaluation function: 4, -5 | -5, 1 | -7, 2 | -3, -8. MAX nodes D, E, F, G take the values 4, 1, 2, -3; MIN nodes B and C take 1 and -3; the root MAX node A takes 1.]

Game Playing – Bounded Minimax

If both players play their best moves, then which “line” does the play follow?


Game Playing – Perfect Play

Note that the line of perfect play leads to a terminal node with the same value as the root node.
All intermediate nodes also have that same value.
Essentially, this is the meaning of the value at the root node.
Caveat: this only applies if the tree is not expanded further after a move, because then the terminals will change and so values can change.

Two-Ply Game: Revisited

[Figure: two-ply tree. Leaf values: 3, 12, 8 | 2, 4, 6 | 14, 5, 2. MIN node values: 3, 2, 2. Root MAX value: 3.]


An Analysis

This algorithm is only good for games with a low branching factor. Why?

In general, the complexity is O(b^d), where:
b = average branching factor
d = number of plies

Is There Another Way?

Take chess: on average it has
35 branches and
usually at least 100 moves, so the game space is:

• 35^100

Is this a realistic game space to search? Since time is an important factor in game playing, searching this game space is highly undesirable.

Imperfect Decisions

Why is it Imperfect?

Many games produce very large search trees.

Without knowledge of the terminal states the program is taking a guess as to which path to take.

Cutoffs must be implemented due to time restrictions, imposed either by the computer or by the game situation.

Evaluation Functions

A function that returns an estimate of the expected utility of the game from a given position.

Given the present situation give an estimate as to the value of the next move.

The performance of a game-playing program is dependent on the quality of the evaluation functions.

How to Judge Quality

Evaluation functions must agree with the utility functions on the terminal states.

It must not take too long (trade-off between accuracy and time cost).

Should reflect actual chance of winning.


Design

Evaluation functions differ depending on the nature of the game.

Encode the quality of a position in a number that is representable within the framework of the given language.

Design a heuristic for the value, in a given position, of any object in the game.

Different Types

Material Advantage Evaluation Functions

Values of the pieces are judged independently of other pieces on the board. A value is returned based on the material value of the computer minus the material value of the player.

Weighted Linear Functions

• w1·f1 + w2·f2 + … + wn·fn

The w's are the weights of the pieces.
The f's are features of the particular position.
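A minimal sketch of such a weighted linear (material) evaluation in Python. The weight values and the piece-count representation are illustrative assumptions, not taken from the slides.

```python
# Illustrative material weights (an assumption for the example, not from the slides).
WEIGHTS = {"pawn": 1, "knight": 3, "bishop": 3, "rook": 5, "queen": 9}

def material_eval(my_counts, opp_counts):
    """Weighted linear evaluation: sum of w_i * f_i, where each feature f_i is
    (my piece count - opponent's piece count) for piece type i."""
    return sum(w * (my_counts.get(p, 0) - opp_counts.get(p, 0))
               for p, w in WEIGHTS.items())

# Example: up one knight, down one pawn -> 3*(+1) + 1*(-1) = 2
print(material_eval({"pawn": 7, "knight": 2}, {"pawn": 8, "knight": 1}))
```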

Example

Chess: Material value – each piece on the board is worth some value (Pawn = , Knight = 3, etc.) www.imsa.edu/~stendahl/comp/txt/gnuchess.txt

Othello: value given to the # of pieces of a certain color on the board and the # of opponent pieces that will be converted. lglwww.epfl.ch/~wolf/java/html/Othello-desc.html

Different Types

Use probability of winning as the value to return.

If A has a 100% chance of winning then its value to return is 1.00

Cutoff Search

Cutting off searches at a fixed depth, dependent on time.
The deeper the search, the more information is available to the program and the more accurate the evaluation functions.

Iterative deepening: when time runs out, the program returns the result of the deepest completed search.
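A minimal sketch of that iterative-deepening loop, reusing the hypothetical best_move routine sketched earlier and an illustrative time budget:

```python
import time

def iterative_deepening_move(state, time_limit_s=1.0, max_depth=64):
    """Search depth 1, 2, 3, ... and return the move found by the deepest
    search completed before the time budget runs out."""
    deadline = time.monotonic() + time_limit_s
    best = None
    for depth in range(1, max_depth + 1):
        best = best_move(state, depth)       # assumed depth-bounded search (see earlier sketch)
        if time.monotonic() > deadline:      # budget exhausted: stop deepening
            break
    return best
```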

Is searching a node deeper better than searching more nodes?

Consequences

The evaluation function might return an incorrect value.
If the search is cut off and the next move involves a capture, then the value that is returned may be incorrect.

Horizon problem: moves that are pushed deeper into the search tree may result in an oversight by the evaluation function.


Improvements to Cutoff

Evaluation functions should only be applied to quiescent positions.

Quiescent position: a position that is unlikely to exhibit wild swings in value in the near future.

Non-quiescent positions should be expanded until one is reached. This extra search is called a quiescence search.

This will provide more information about that one node in the search tree, but may result in the loss of information about the other nodes. (A minimal sketch of the quiescence idea follows below.)
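A minimal sketch of the quiescence idea, assuming game-specific helpers is_quiescent(state), noisy_successors(state) (e.g. capture moves only), and evaluate(state); the names are illustrative, not a fixed API:

```python
def quiescence_value(state, maximizing):
    """Expand non-quiescent positions (e.g. positions with pending captures)
    until a quiescent one is reached, then apply the static evaluation."""
    if is_quiescent(state):                   # assumed game-specific test
        return evaluate(state)
    children = noisy_successors(state)        # e.g. only capture/forcing moves
    if not children:
        return evaluate(state)
    values = [quiescence_value(c, not maximizing) for c in children]
    return max(values) if maximizing else min(values)
```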

Alpha-Beta Pruning

Pruning

What is pruning? The process of eliminating a branch of the search tree from consideration without examining it.

Why prune?
To avoid searching nodes that cannot influence the final decision.
To speed up the search process.

Alpha-Beta Pruning

A particular technique to find the optimal solution according to a limited-depth search using evaluation functions.
Returns the same choice as minimax with cutoff, but examines fewer nodes.
Gets its name from the two variables that are passed along during the search, which restrict the set of possible solutions.

Definitions

Alpha: the value of the best choice (highest value) found so far along the path for MAX.
Beta: the value of the best choice (lowest value) found so far along the path for MIN.

Implementation

Set the root node's alpha to negative infinity and beta to positive infinity.
Search depth-first, propagating alpha and beta values down to all nodes visited, until reaching the desired depth.
Apply the evaluation function to get the utility of this node.
If the parent of this node is a MAX node, and the utility calculated is greater than the parent's current alpha value, replace that alpha value with this utility.


Implementation (Cont’d)

If the parent of this node is a MIN node, and the utility calculated is less than the parent's current beta value, replace that beta value with this utility.
Based on these updated values, compare the alpha and beta values of the parent node to determine whether to look at any more children or to backtrack up the tree.
Continue the depth-first search in this way until all potentially better paths have been evaluated. (A minimal code sketch follows below.)
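A minimal sketch of this procedure in Python, again assuming the hypothetical game helpers successors(state) and evaluate(state):

```python
import math

def alphabeta(state, depth, alpha, beta, maximizing):
    """Depth-bounded minimax with alpha-beta pruning.
    Returns the same value as plain minimax but prunes a branch
    as soon as alpha >= beta."""
    children = successors(state)
    if depth == 0 or not children:
        return evaluate(state)
    if maximizing:
        value = -math.inf
        for child in children:
            value = max(value, alphabeta(child, depth - 1, alpha, beta, False))
            alpha = max(alpha, value)         # raise alpha at MAX nodes
            if alpha >= beta:
                break                         # cutoff: MIN will never allow this branch
        return value
    else:
        value = math.inf
        for child in children:
            value = min(value, alphabeta(child, depth - 1, alpha, beta, True))
            beta = min(beta, value)           # lower beta at MIN nodes
            if alpha >= beta:
                break                         # cutoff: MAX will never allow this branch
        return value

# Called from the root as, for example:
# alphabeta(start_state, depth=4, alpha=-math.inf, beta=math.inf, maximizing=True)
```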

Example: Depth = 4

[Figure: step-by-step alpha-beta trace on an example tree. Each node is annotated with its current (α, β) window, initialized to α = −∞, β = +∞ at the root; α is raised at MAX nodes, β is lowered at MIN nodes, and a branch is pruned as soon as α ≥ β.]

Effectiveness

The effectiveness depends on the order in which the search progresses.
If b is the branching factor and d is the depth of the search, the best case for alpha-beta is O(b^(d/2)), compared with minimax, which is O(b^d).

Problems

If there is only one legal move, this algorithm will still generate an entire search tree.
Designed to identify a "best" move, not to differentiate between other moves.
Overlooks moves that forfeit something early for a better position later.
Evaluation of utility usually not exact.
Assumes opponent will always choose the best possible move.

Games that Include an Element of Chance

Chance Nodes

Many games have unpredictable outcomes caused by actions such as throwing dice or randomizing a condition.

Such games must include chance nodes in addition to MIN and MAX nodes.

For each node, instead of a definite utility or evaluation, we can only calculate an expected value.


Inclusion of Chance Nodes: Calculating Expected Value

For the terminal nodes, we apply the utility function.

We can calculate the expected value of a MAX move by applying an expectimax value to each chance node at the same ply.

After calculating the expected value of a chance node, we can apply the normal minimax-value formula.

Expectimax Function
Provided we are at a chance node preceding MAX's turn, we can calculate the expected utility for MAX as follows:

Let d_i be a possible dice roll or random event, where P(d_i) represents the probability of that event occurring.
If we let S(d_i) denote the set of legal positions generated by dice roll d_i, we have the expectimax function defined as follows:

expectimax(C) = Σ_i P(d_i) · max_{s ∈ S(d_i)} utility(s)

Here max_{s ∈ S(d_i)} returns the value of the move MAX will pick out of all the choices available.
Alternately, you can define an expectimin function for chance nodes preceding MIN's move.
Together they are called the expectiminimax function. (A minimal sketch follows below.)
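A minimal Python sketch of expectiminimax, assuming hypothetical game helpers: chance_outcomes(state) yields (probability, outcome_state) pairs, successors(state) lists a player's moves, next_node_type(state) reports which node type comes next, and evaluate(state) scores a position:

```python
def expectiminimax(state, depth, node_type):
    """Evaluate a game tree containing MAX, MIN, and CHANCE nodes.
    All helper functions are assumed game-specific placeholders."""
    if depth == 0:
        return evaluate(state)
    if node_type == "CHANCE":
        # Expected value: sum over outcomes of P(d_i) * value after outcome d_i.
        return sum(p * expectiminimax(s, depth - 1, next_node_type(s))
                   for p, s in chance_outcomes(state))
    children = successors(state)
    if not children:
        return evaluate(state)
    values = [expectiminimax(c, depth - 1, next_node_type(c)) for c in children]
    return max(values) if node_type == "MAX" else min(values)
```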

Application to an Example

[Figure: worked expectiminimax tree with alternating MAX, chance, MIN, MAX, and chance levels. The chance nodes use outcome probabilities 0.6 and 0.4; intermediate expected values include 3.6, 3.0, 5.8, and 4.4, and the value 3.56 is computed at the top.]

Chance Nodes: Differences

For minimax, any order-preserving transformation of the leaf values does not affect the decision.
However, when chance nodes are introduced, only positive linear transformations will keep the same decision.

Complexity of Expectiminimax

Where minimax does O(b^m), expectiminimax will take O(b^m · n^m), where n is the number of distinct rolls and m is the search depth.

The extra cost makes it unrealistic to look too far ahead.

How much this affects our ability to look ahead depends on how many random events can occur (or possible dice rolls).


Wrapping Things Up

Things to Consider

Calculating optimal decisions is intractable in most cases; thus all algorithms must make some assumptions and approximations.

The standard approach based on minimax, evaluation functions, and alpha-beta pruning is just one way of doing things.

These search techniques do not reflect how humans actually play games.

Demonstrating A Problem

Given this two-ply tree, the minimax algorithm will select the right-most branch, since it forces a minimum value of no less than 100.
This relies on the assumption that 100, 101, and 102 are in fact actually better than 99.

Summary

We defined the game in terms of a search.
Discussion of two-player games given perfect information (minimax).
Using cut-off to meet time constraints.
Optimizations using alpha-beta pruning to arrive at the same conclusion as minimax would have.
Complexity of adding chance to the decision tree.