Page 1: Lecture 6: Game Playing Heshaam Faili hfaili@ece.ut.ac.ir University of Tehran Two-player games Minmax search algorithm Alpha-Beta pruning Games with chance

Lecture 6: Game Playing

Heshaam Faili, hfaili@ece.ut.ac.ir

University of Tehran

- Two-player games
- Minmax search algorithm
- Alpha-Beta pruning
- Games with chance
- State of the art

Two-player games: motivation

Multi-agent environments: cooperative or competitive
- Zero-sum games
- Two-player or multi-player games
- Adversarial games

Previous heuristics and search procedures are only useful for single-player games:
- no notion of turns: one or more cooperative agents
- they do not take adversarial moves into account

Games are ideal for exploring adversarial strategies:
- well-defined, abstract rules
- most can be formulated as search problems
- really hard combinatorial problems (e.g., chess!)

Two-player games

The search tree is the same for both players:
- even levels i are moves of player A
- odd levels i+1 are moves of player B

Each player searches for a goal (different for each) at their own levels.

Each player evaluates states according to their own heuristic function.

A’s best move brings B to the worst state: A searches for its best move assuming B will also search for its best move.

Structure of Game

- Initial state
- Successor function
- Terminal test
- Utility function
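The four components can be made concrete with a small sketch. It assumes a hypothetical take-away game that is not part of the lecture: players alternately remove 1 to 3 stones, and whoever takes the last stone wins. The class, method, and field names are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class State:
    stones: int    # stones left in the pile
    to_move: str   # "MAX" or "MIN"


class TakeAwayGame:
    """Hypothetical game: remove 1 to 3 stones; whoever takes the last stone wins."""

    def __init__(self, stones=5):
        # Initial state: the full pile, MAX to move.
        self.initial_state = State(stones, "MAX")

    def successors(self, state):
        # Successor function: (move, resulting state) pairs for every legal move.
        nxt = "MIN" if state.to_move == "MAX" else "MAX"
        return [(k, State(state.stones - k, nxt))
                for k in (1, 2, 3) if k <= state.stones]

    def terminal_test(self, state):
        # Terminal test: the game is over when the pile is empty.
        return state.stones == 0

    def utility(self, state):
        # Utility from MAX's point of view: at an empty pile the player
        # to move has lost, because the opponent took the last stone.
        return -1 if state.to_move == "MAX" else 1


game = TakeAwayGame(5)
print(game.successors(game.initial_state))
```

Keeping the four components as separate methods lets any search procedure (MinMax, alpha-beta) be written once against this interface.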

Game Tree and MinMax search strategy

Search for A’s best next move, so that no matter what B does (in particular, choosing its best move) A will be better off

At each step, evaluate the value of all descendants: take the maximum if it is A’s turn, or the minimum if it is B’s turn

We need the estimated values d moves ahead: generate all nodes to level d (BFS), then propagate Min-Max values up from the leaves.
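A minimal sketch of this propagation, assuming the depth-d tree has already been generated and each leaf already holds its Eval value. The nested-list representation is a hypothetical choice for illustration. The recursion is depth-first rather than generating each level breadth-first; the backed-up values are the same.

```python
def minimax(node, maximizing=True):
    """Back up MinMax values from the leaves of an explicit game tree.

    A node is either a number (a leaf already scored by Eval) or a list of
    child nodes; levels alternate between MAX and MIN, starting with MAX.
    """
    if not isinstance(node, list):
        return node
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)


# Hypothetical depth-2 tree: MAX at the root, three MIN nodes below it.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(minimax(tree))  # 3  (the MIN values are 3, 2, 2)
```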

Illustration of MinMax principle

MinMax algorithm

while actual (the current node) is not a terminal (game-over) state do
1. Generate the game tree to level m from actual
2. Apply the utility function to the leaves
3. Propagate the values of each node up, layer by layer, according to the MinMax principle, up to the root
4. Choose the action with the maximum value at the root (the minmax decision) and move to the next state (actual := Apply(op, game))
end

Complexity: $O(b^m)$, where b is the branching factor
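Steps 3 and 4 of the loop above can be sketched as follows, assuming a hypothetical nested-list tree (numbers are leaf evaluations, lists are internal nodes, MAX moves at the root). Counting evaluated leaves on a complete tree illustrates the b^m cost.

```python
def minimax_decision(root):
    """Return (index of the best root action, number of leaves evaluated)."""
    leaves = 0

    def value(node, maximizing):
        nonlocal leaves
        if not isinstance(node, list):  # leaf: already holds its utility/Eval
            leaves += 1
            return node
        vals = [value(child, not maximizing) for child in node]
        return max(vals) if maximizing else min(vals)

    # Step 3: back up values; each child of the MAX root is a MIN node.
    root_values = [value(child, False) for child in root]
    # Step 4: the minmax decision is the action with the maximum backed-up value.
    best = max(range(len(root)), key=lambda i: root_values[i])
    return best, leaves


def complete_tree(b, m):
    """Hypothetical complete tree with branching factor b and depth m (all leaves 0)."""
    return 0 if m == 0 else [complete_tree(b, m - 1) for _ in range(b)]


print(minimax_decision([[3, 12, 8], [2, 4, 6], [14, 5, 2]]))  # (0, 9)
print(minimax_decision(complete_tree(3, 4))[1])               # 81 = 3**4 leaves
```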

MinMax search on Tic-Tac-Toe

Evaluation function Eval(n) for A:
- infinity if n is a win state for A (Max)
- -infinity if n is a win state for B (Min)
- otherwise, (# of 3-moves for A) - (# of 3-moves for B)

A 3-move for a player is a row, column, or diagonal that is still open to that player (it contains none of the opponent's marks).

A is X

Eval(s) = 6 - 4 = 2
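A sketch of this evaluation function, assuming a board is a hypothetical 9-character string read row by row ('X', 'O', or '.'). The board in the usage line (X in the center, O on the top edge) is one position that yields 6 - 4; it is not necessarily the board drawn on the slide.

```python
import math

# The 8 lines of the board: rows, columns, diagonals (cells 0..8 row by row).
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]


def open_lines(board, player):
    """Number of 3-moves for player: lines containing none of the opponent's marks."""
    opponent = "O" if player == "X" else "X"
    return sum(1 for line in LINES if all(board[i] != opponent for i in line))


def wins(board, player):
    return any(all(board[i] == player for i in line) for line in LINES)


def eval_board(board):
    """Eval(n) for A, who plays X (MAX)."""
    if wins(board, "X"):
        return math.inf
    if wins(board, "O"):
        return -math.inf
    return open_lines(board, "X") - open_lines(board, "O")


board = ".O..X...."  # O on the top edge, X in the center
print(open_lines(board, "X"), open_lines(board, "O"), eval_board(board))  # 6 4 2
```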

Tic-Tac-Toe MinMax search, d=2

Tic-Tac-Toe MinMax search, d=4

Tic-Tac-Toe MinMax search, d=6

MinMax search
- m is the look-ahead factor: how many moves ahead of the current one we compute.
- MinMax is optimal if d <= m (d is the depth of the solution).
- IDS can be adapted to save memory.
- Horizon problem: node n can be the best at level m but disastrous at level m+1!
- Quiescent states: states whose evaluation-function values show only small variations. Explore nonquiescent states further.
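A sketch of extending the cut-off at nonquiescent states, on a hypothetical hand-built tree where each node carries a static Eval value and a quiet flag (in a real program the flag would come from the game, e.g., whether a capture is pending). The example is constructed so that a plain depth-1 search falls into the horizon problem. The cap of 4 extra plies is an arbitrary assumption.

```python
from typing import NamedTuple, Tuple


class Node(NamedTuple):
    static_eval: float               # hypothetical Eval(n) at this node
    quiet: bool                      # hypothetical quiescence flag
    children: Tuple["Node", ...] = ()


def search(node, depth, maximizing, use_quiescence, extra=4):
    """Depth-limited MinMax; optionally keeps expanding nonquiescent frontier nodes."""
    if not node.children:
        return node.static_eval
    if depth <= 0:
        # Normal cut-off: evaluate, unless the state is nonquiescent and
        # extensions are enabled and still available.
        if not use_quiescence or node.quiet or extra <= 0:
            return node.static_eval
        extra -= 1  # spend one ply of quiescence extension
    values = [search(c, depth - 1, not maximizing, use_quiescence, extra)
              for c in node.children]
    return max(values) if maximizing else min(values)


def best_move(root, depth, use_quiescence):
    """Index of MAX's best move at the root."""
    values = [search(c, depth - 1, False, use_quiescence) for c in root.children]
    return max(range(len(values)), key=values.__getitem__)


# Move 0 looks good statically (5) but the opponent's reply is disastrous (-10);
# move 1 is a quiet position worth 2.
root = Node(0, True, (
    Node(5, False, (Node(-10, True),)),
    Node(2, True, (Node(2, True),)),
))

print(best_move(root, 1, use_quiescence=False))  # 0: fooled by the horizon
print(best_move(root, 1, use_quiescence=True))   # 1
```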

Multi player games

Concepts: alliances, cooperative games

Alpha-Beta Pruning

A disadvantage of MinMax search is that it has to expand every node in the subtree to depth m. Can we make a correct MinMax decision without looking at every node?

We want to prune the tree: stop exploring subtrees whose values will not influence the final MinMax root decision.

Observation: all nodes whose values are greater (smaller) than the current minimum (maximum) need not be explored!

Alpha-Beta pruning

Alpha-Beta search cutoff rules

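Keep track of and update two values:
- alpha: the highest value found so far along the path for MAX (a lower bound on MAX's value)
- beta: the lowest value found so far along the path for MIN (an upper bound on MIN's value)

Rule: do not expand node n further when beta <= alpha; a MIN node then returns alpha, and a MAX node returns beta.

(alpha, beta) is propagated from parent to child. In MIN nodes, beta is sent to the parent; in MAX nodes, alpha is sent to the parent.

In MIN nodes, beta = min(beta, value of the child), starting from the parent's beta; in MAX nodes, alpha = max(alpha, value of the child), starting from the parent's alpha.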

Alpha-Beta pruning

function Max-Value(state, game, alpha, beta) returns the minmax value of state
    if Cutoff-Test(state) then return Eval(state)
    for each s in Successor(state) do
        alpha := Max(alpha, Min-Value(s, game, alpha, beta))
        if beta <= alpha then return beta
    end
    return alpha

function Min-Value(state, game, alpha, beta) returns the minmax value of state
    if Cutoff-Test(state) then return Eval(state)
    for each s in Successor(state) do
        beta := Min(beta, Max-Value(s, game, alpha, beta))
        if beta <= alpha then return alpha
    end
    return beta
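A runnable Python transcription of the two functions above, assuming a hypothetical nested-list tree in which a number is a leaf (so Cutoff-Test means "is this a leaf?" and Eval is the number itself). Like the pseudocode, it returns alpha or beta on a cutoff, so only the value returned at the root is guaranteed to equal the MinMax value. A counter records how many leaves were evaluated.

```python
import math


def alpha_beta(tree):
    """Alpha-beta with MAX at the root; returns (root value, leaves evaluated)."""
    visited = 0

    def max_value(node, alpha, beta):
        nonlocal visited
        if not isinstance(node, list):  # Cutoff-Test: a leaf holds its Eval
            visited += 1
            return node
        for child in node:
            alpha = max(alpha, min_value(child, alpha, beta))
            if beta <= alpha:
                return beta
        return alpha

    def min_value(node, alpha, beta):
        nonlocal visited
        if not isinstance(node, list):
            visited += 1
            return node
        for child in node:
            beta = min(beta, max_value(child, alpha, beta))
            if beta <= alpha:
                return alpha
        return beta

    return max_value(tree, -math.inf, math.inf), visited


print(alpha_beta([[3, 12, 8], [2, 4, 6], [14, 5, 2]]))  # (3, 7)
```

On this tree plain MinMax evaluates all 9 leaves; here the second MIN node is cut off after its first leaf (2 <= alpha = 3), so only 7 are evaluated and the root value is still 3.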

Alpha-Beta pruning in Tic-Tac-Toe

Imperfect, real-time decisions

Searching all the way to terminal states is not practical:
- time is limited
- we should cut off nodes earlier (cut-off test) and decide when to apply EVAL

Evaluation function: returns an estimate of the expected utility from a given node.
- it should order terminal nodes in the same way as the utility function
- its computation should not take long
- for non-terminal nodes, it should be strongly correlated with the actual chances of winning

Evaluation functions

Compute various features of the state (e.g., the number of pawns).

The features define categories of states; each category leads to some proportion of wins, losses, and draws.
Example: in one category, 72% of positions lead to a win (+1), 20% to a loss (-1), and 8% to a draw (0).
Expected value = 72% * 1 + 20% * (-1) + 8% * 0 = 0.52

This requires too many categories and too much experience. Instead, compute a numerical contribution from each feature and combine these numbers.

In chess: each pawn: 1, knight and bishop: 3, rook: 5, queen: 9; good pawn structure: 1, queen safety: 1

Weighted linear function: $\mathrm{Eval}(s) = w_1 f_1(s) + w_2 f_2(s) + \dots + w_n f_n(s) = \sum_{i=1}^{n} w_i f_i(s)$

Non-linear functions: a linear function assumes the features are independent of each other, which is often not the case.

- A bishop is more useful at the end of the game than at the beginning.

- Two bishops together are worth more than twice the value of a single bishop.

- These functions are approximated from ages of human experience ...

- For other problems, where no human experience exists, machine learning techniques should be used …
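A sketch of the weighted linear function using the chess weights listed above. It assumes each feature f_i(s) is White's count minus Black's count, and it treats good pawn structure and queen safety as hypothetical 0/1 flags per side; real programs tune both the features and the weights.

```python
# Weights from the slide; the two positional features are 0/1 flags per side.
WEIGHTS = {
    "pawn": 1, "knight": 3, "bishop": 3, "rook": 5, "queen": 9,
    "good_pawn_structure": 1, "queen_safety": 1,
}


def linear_eval(white, black):
    """Eval(s) = sum_i w_i * f_i(s), where f_i(s) = white[i] - black[i]."""
    return sum(w * (white.get(name, 0) - black.get(name, 0))
               for name, w in WEIGHTS.items())


start = {"pawn": 8, "knight": 2, "bishop": 2, "rook": 2, "queen": 1}
print(linear_eval(start, start))                  # 0
print(linear_eval(start, {**start, "rook": 1}))   # 5: White is a rook up
print(linear_eval({**start, "good_pawn_structure": 1},
                  {**start, "knight": 1}))        # 4: knight (3) + pawn structure (1)
```

Because each term is added independently, this form cannot express the bishop examples above (a value that depends on the game phase or on another piece); that is exactly what the non-linear functions are for.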

Cut-off search

If Cutoff-Test(state, depth) then return EVAL(state)
- fixed depth limit (cut off when depth > d)
- use IDS until time runs out and return the result of the deepest completed search (see next slide)
- the cutoff should distinguish quiescent states

Forward pruning: some moves at a given node are pruned immediately without further consideration.
- It can be very dangerous; it should be limited to symmetric moves ..
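A sketch of the cut-off test combined with iterative deepening under a time budget. It assumes a hypothetical take-away game as a toy example (remove 1 to 3 stones; whoever takes the last stone wins) and a neutral Eval of 0 at the cut-off. For simplicity, the clock is checked only between iterations, so an iteration that has started always finishes; a real program would also abort mid-iteration and discard the unfinished result.

```python
import time


def successors(stones):
    """Hypothetical take-away game: remove 1 to 3 stones."""
    return [stones - k for k in (1, 2, 3) if k <= stones]


def cutoff_test(stones, depth):
    return stones == 0 or depth == 0


def value(stones, depth, maximizing):
    if cutoff_test(stones, depth):
        if stones == 0:
            # The opponent took the last stone, so the player to move has lost.
            return -1 if maximizing else 1
        return 0  # Eval: hypothetical neutral estimate for a non-terminal state
    vals = [value(s, depth - 1, not maximizing) for s in successors(stones)]
    return max(vals) if maximizing else min(vals)


def iterative_deepening(stones, time_limit, max_depth=50):
    """Return ((value, move), depth) from the deepest completed iteration."""
    deadline = time.monotonic() + time_limit
    best, completed = None, 0
    for depth in range(1, max_depth + 1):
        if time.monotonic() >= deadline:
            break
        scored = [(value(s, depth - 1, False), stones - s) for s in successors(stones)]
        best = max(scored, key=lambda vm: vm[0])  # first move with the best value
        completed = depth
    return best, completed


print(iterative_deepening(5, time_limit=5.0, max_depth=6))  # ((1, 1), 6)
```

At depth 1 and 2 every move still looks like a draw (value 0 or worse); from depth 3 on, the search finds that taking 1 stone (leaving 4) wins.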

IDS error …

Practical extensions

Even when all these techniques are combined, a problem remains. E.g., in chess, with the standard time per move (3 minutes) a program can process about 200 million nodes, which is about five plies; with alpha-beta pruning this becomes about 10 plies.

- The evaluation function measures the expected utility of a move, like the heuristic functions seen before
- Cut-off criteria by "goodness", not by depth: selective deepening
- Quiescent-state expansion; the horizon problem
- Precompiled evaluations for the beginning and end of the game
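A quick check of the five-ply and ten-ply figures, assuming an average chess branching factor of about 35 (a commonly quoted figure that is not stated on the slide) and, for alpha-beta, the best-case cost of roughly b^(m/2) nodes under perfect move ordering. Solving b^m = N for m gives the reachable depth.

```python
import math

NODES = 200_000_000  # nodes searchable per move, from the slide
B = 35               # assumed average branching factor for chess

# Plain MinMax: b**m = N  =>  m = log(N) / log(b)
plain_plies = math.log(NODES) / math.log(B)
# Alpha-beta, best case: b**(m/2) = N  =>  m = log(N) / log(sqrt(b))
alpha_beta_plies = math.log(NODES) / math.log(math.sqrt(B))

print(f"minimax:    {plain_plies:.1f} plies")       # about 5.4
print(f"alpha-beta: {alpha_beta_plies:.1f} plies")  # about 10.8
```

With imperfect move ordering the effective branching factor of alpha-beta lies between sqrt(b) and b, so 10 plies is an optimistic bound.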

Games with chance

Backgammon board

Search tree with probabilities

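Games with chance

Chance nodes: nodes where chance events happen (rolling dice, flipping a coin, etc.)

Evaluate the expected value by averaging over outcome probabilities:
- C is a chance node
- P(d_i) is the probability of rolling d_i (1, 2, …, 12)
- S(C, d_i) is the set of positions generated by applying all legal moves for roll d_i to C

$\text{expectimax}(C) = \sum_i P(d_i) \max_{s \in S(C, d_i)} \text{utility}(s)$

$\text{expectimin}(C) = \sum_i P(d_i) \min_{s \in S(C, d_i)} \text{utility}(s)$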
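A sketch of these averaging rules as one recursive function (often called expectiminimax), assuming a hypothetical hand-built tree: chance nodes store (probability, child) pairs, and the child reached after each roll d_i is a MAX or MIN node that chooses among the positions S(C, d_i).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    kind: str                     # "max", "min", "chance", or "leaf"
    value: float = 0.0            # utility/Eval, used only by leaves
    children: List = field(default_factory=list)  # chance: (prob, Node) pairs


def expectiminimax(node):
    if node.kind == "leaf":
        return node.value
    if node.kind == "max":
        return max(expectiminimax(c) for c in node.children)
    if node.kind == "min":
        return min(expectiminimax(c) for c in node.children)
    # Chance node: average the outcomes, weighted by P(d_i).
    return sum(p * expectiminimax(c) for p, c in node.children)


def leaf(v):
    return Node("leaf", v)


# MAX is about to flip a (hypothetical) fair coin; after each outcome it picks a position.
c_max = Node("chance", children=[
    (0.5, Node("max", children=[leaf(3), leaf(5)])),
    (0.5, Node("max", children=[leaf(-2), leaf(1)])),
])
print(expectiminimax(c_max))  # 0.5 * 5 + 0.5 * 1 = 3.0
```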

Position evaluation in games with chance

State-of-the-art game programs

Championship programs for Checkers, Chess, Backgammon, Othello, Go, and more

Combination of brute-force search, heuristics, and game databases

Some programs attempt to improve or learn the heuristic function by comparing its estimates with actual outcomes

Some programs attempt to discover rules and heuristic functions
