
Page 1: Search with Other Agents - Sharif

Soleymani

Search with Other Agents
CE417: Introduction to Artificial Intelligence
Sharif University of Technology
Fall 2020

“Artificial Intelligence: A Modern Approach”, 3rd Edition, Chapter 5

Most slides have been adopted from Klein and Abbeel, CS188, UC Berkeley.

Page 2: Search with Other Agents - Sharif

Adversarial Games

2

Page 3: Search with Other Agents - Sharif

Zero-Sum Games ☺

Checkers: 1950: First computer player. 1994: First computer champion: Chinook ended 40-year reign of human champion Marion Tinsley using complete 8-piece endgame. 2007: Checkers solved!

Chess: 1997: Deep Blue defeats human champion Garry Kasparov in a six-game match. Deep Blue examined 200M positions per second, used very sophisticated evaluation and undisclosed methods for extending some lines of search up to 40 ply. Current programs are even better, if less historic.

Go: Human champions are now starting to be challenged by machines, though the best humans still beat the best machines. In Go, b > 300! Classic programs use pattern knowledge bases, but big recent advances use Monte Carlo (randomized) expansion methods.

3

Page 4: Search with Other Agents - Sharif

Zero-Sum Games ☺

Checkers: 1950: First computer player. 1994: First computer champion: Chinook ended 40-year reign of human champion Marion Tinsley using complete 8-piece endgame. 2007: Checkers solved!

Chess: 1997: Deep Blue defeats human champion Garry Kasparov in a six-game match. Deep Blue examined 200M positions per second, used very sophisticated evaluation and undisclosed methods for extending some lines of search up to 40 ply. Current programs are even better, if less historic.

Go: 2016: AlphaGo defeats human champion. Uses Monte Carlo Tree Search, learned evaluation function by deep learning.

4

Page 5: Search with Other Agents - Sharif

Many different kinds of games!

Axes:

Deterministic or stochastic?

One, two, or more players?

Zero sum?

Perfect information (can you see the state)?

Want algorithms for calculating a strategy (policy) which recommends a move from each state

Types of Games

5

Page 6: Search with Other Agents - Sharif

Zero-Sum Games

Zero-Sum Games

Agents have opposite utilities (values on outcomes)

Lets us think of a single value that one maximizes and the other minimizes

Adversarial, pure competition

General Games

Agents have independent utilities (values on outcomes)

Cooperation, indifference, competition, and more are all possible

More later on non-zero-sum games

6

Page 7: Search with Other Agents - Sharif

Primary assumptions

We start with these games:

Two-player

Turn taking

agents act alternately

Zero-sum

agents’ goals are in conflict: sum of utility values at the end of the game is zero or constant

Perfect information

fully observable

7

Examples: Tic-tac-toe, chess, checkers

Page 8: Search with Other Agents - Sharif

Deterministic Games

Many possible formalizations, one is:

States: S (start at s0)

Players: P = {1...N} (usually take turns)

Actions: A (may depend on player / state)

Transition Function: S × A → S

Terminal Test: S → {t, f}

Terminal Utilities: S × P → R

Solution for a player is a policy: S → A

8

Page 9: Search with Other Agents - Sharif

Single-Agent Trees

(Figure: single-agent tree with terminal values 2, 0, 2, 6, 4, 6, …; root value 8)

9

Page 10: Search with Other Agents - Sharif

Value of a State

(Figure: terminal states with values 2, 0, 2, 6, 4, 6, …; non-terminal root with value 8)

Value of a state: The best achievable outcome (utility) from that state

10

Page 11: Search with Other Agents - Sharif

Adversarial Search

11

Page 12: Search with Other Agents - Sharif

Game tree (tic-tac-toe)

12

Two players: P1 and P2 (P1 is now searching to find a good move)

Zero-sum games: P1 gets U(t), P2 gets C − U(t) for terminal node t

Utilities from the point of view of P1

1-ply = half move

(Game tree figure: plies alternate between P1 and P2; P1 plays X, P2 plays O)

Page 13: Search with Other Agents - Sharif

Adversarial Game Trees

-20 -8 -18 -5 -10 +4… … -20 +8

13

Page 14: Search with Other Agents - Sharif

Minimax Values

+8  -10  -5  -8

States Under Agent’s Control:

Terminal States:

States Under Opponent’s Control:

14

Page 15: Search with Other Agents - Sharif

Tic-Tac-Toe Game Tree

15

Page 16: Search with Other Agents - Sharif

Adversarial Search (Minimax)

Opponent is assumed optimal

Minimax search:

A state-space search tree

Players alternate turns

Compute each node’s minimax value: the best achievable utility against a rational (optimal) adversary

(Figure: terminal values 8, 2, 5, 6; min-node values 2 and 5; max root value 5)

max

min

Terminal values: part of the game

16

Page 17: Search with Other Agents - Sharif

Minimax

MINIMAX(s) shows the best achievable outcome of being in state s (assumption: optimal opponent)

17

MINIMAX(s) =
  UTILITY(s, MAX)                               if TERMINAL_TEST(s)
  max_{a in ACTIONS(s)} MINIMAX(RESULT(s, a))   if PLAYER(s) = MAX
  min_{a in ACTIONS(s)} MINIMAX(RESULT(s, a))   if PLAYER(s) = MIN

(MINIMAX(s) is the utility of being in state s; figure: min-node values 3, 2, 2 and root value 3)

Page 18: Search with Other Agents - Sharif

Minimax (Cont.)

18

Optimal strategy: move to the state with highest minimax value

Best achievable payoff against best play

Maximizes the worst-case outcome for MAX

It works for zero-sum games

Page 19: Search with Other Agents - Sharif

Minimax Implementation

def min-value(state):
    initialize v = +∞
    for each successor of state:
        v = min(v, max-value(successor))
    return v

def max-value(state):
    initialize v = -∞
    for each successor of state:
        v = max(v, min-value(successor))
    return v

19

Page 20: Search with Other Agents - Sharif

Minimax Implementation (Dispatch)

def value(state):

if the state is a terminal state: return the state’s utility

if the next agent is MAX: return max-value(state)

if the next agent is MIN: return min-value(state)

def min-value(state):
    initialize v = +∞
    for each successor of state:
        v = min(v, value(successor))
    return v

def max-value(state):
    initialize v = -∞
    for each successor of state:
        v = max(v, value(successor))
    return v
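Below is a minimal runnable Python sketch of this dispatch structure. The Game interface used here (is_terminal, utility, to_move, successors) is an assumption for illustration, not something defined in the slides.

import math

def minimax_value(game, state):
    # Dispatch on the kind of node, mirroring value(state) above.
    if game.is_terminal(state):
        return game.utility(state)          # terminal: utility from MAX's point of view
    if game.to_move(state) == "MAX":
        return max_value(game, state)
    return min_value(game, state)

def max_value(game, state):
    v = -math.inf
    for successor in game.successors(state):
        v = max(v, minimax_value(game, successor))
    return v

def min_value(game, state):
    v = math.inf
    for successor in game.successors(state):
        v = min(v, minimax_value(game, successor))
    return v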

20

Page 21: Search with Other Agents - Sharif

Minimax algorithm

21

Depth first search

function MINIMAX_DECISION(state) returns an action
    return argmax_{a in ACTIONS(state)} MIN_VALUE(RESULT(state, a))

function MAX_VALUE(state) returns a utility value
    if TERMINAL_TEST(state) then return UTILITY(state)
    v ← −∞
    for each a in ACTIONS(state) do
        v ← MAX(v, MIN_VALUE(RESULT(state, a)))
    return v

function MIN_VALUE(state) returns a utility value
    if TERMINAL_TEST(state) then return UTILITY(state)
    v ← +∞
    for each a in ACTIONS(state) do
        v ← MIN(v, MAX_VALUE(RESULT(state, a)))
    return v

Page 22: Search with Other Agents - Sharif

Properties of minimax

Complete? Yes (when tree is finite)

Optimal? Yes (against an optimal opponent)

Time complexity: O(b^m)

Space complexity: O(bm) (depth-first exploration)

For chess, b ≈ 35, m > 50 for reasonable games

Finding exact solution is completely infeasible

22

Page 23: Search with Other Agents - Sharif

Minimax Properties

Optimal against a perfect player. Otherwise?

10 10 9 100

max

min

23

Page 24: Search with Other Agents - Sharif

Video of Demo Min vs. Exp (Min)

24

Page 25: Search with Other Agents - Sharif

Video of Demo Min vs. Exp (Exp)

25

Page 26: Search with Other Agents - Sharif

Game Tree Pruning

26

Page 27: Search with Other Agents - Sharif

Pruning

27

Correct minimax decision without looking at every node in the game tree

α-β pruning

Branch & bound algorithm

Prunes away branches that cannot influence the final decision

Page 28: Search with Other Agents - Sharif

α-β pruning example

28

Page 29: Search with Other Agents - Sharif

α-β pruning example

29

Page 30: Search with Other Agents - Sharif

α-β pruning example

30

Page 31: Search with Other Agents - Sharif

α-β pruning example

31

Page 32: Search with Other Agents - Sharif

α-β pruning example

32

Page 33: Search with Other Agents - Sharif

Alpha-Beta Pruning

General configuration (MIN version)

We’re computing the MIN-VALUE at some node n

We’re looping over n’s children

Who cares about n’s value? MAX

Let α be the best value that MAX can get at any choice point along the current path from the root

If n becomes worse than α, MAX will avoid it, so we can stop considering n’s other children (it’s already bad enough that it won’t be played)

MAX version is symmetric

(Figure: alternating MAX/MIN layers, with α established at MAX choice points above the MIN node n)

33

Page 34: Search with Other Agents - Sharif

α-β pruning

34

Assuming depth-first generation of tree

We prune node n when the player has a better choice m at (the parent or) any ancestor of n

Two types of pruning (cuts):

pruning of max nodes (α-cuts)

pruning of min nodes (β-cuts)

Page 35: Search with Other Agents - Sharif

Why is it called α-β?

α: Value of the best (highest) choice found so far at any choice point along the path for MAX

β: Value of the best (lowest) choice found so far at any choice point along the path for MIN

Updating α and β during the search process

For a MAX node, once the value of this node is known to be more than the current β (v ≥ β), its remaining branches are pruned.

For a MIN node, once the value of this node is known to be less than the current α (v ≤ α), its remaining branches are pruned.

35

Page 36: Search with Other Agents - Sharif

Alpha-Beta Quiz 2

36

Page 37: Search with Other Agents - Sharif

Alpha-Beta Quiz 2

(Quiz figure: node values 10, 10, ≥ 100, 2, ≤ 2)

37

Page 38: Search with Other Agents - Sharif

α-β pruning (another example)

38

(Figure: example tree with leaf values 1, 5, 2, 3 and node bounds such as ≤ 2 and ≥ 5)

Page 39: Search with Other Agents - Sharif

Alpha-Beta Implementation

def min-value(state, α, β):
    initialize v = +∞
    for each successor of state:
        v = min(v, value(successor, α, β))
        if v ≤ α return v
        β = min(β, v)
    return v

def max-value(state, α, β):
    initialize v = -∞
    for each successor of state:
        v = max(v, value(successor, α, β))
        if v ≥ β return v
        α = max(α, v)
    return v

α: MAX’s best option on path to root

β: MIN’s best option on path to root
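A runnable Python sketch of the same pruning logic, using the hypothetical Game interface assumed in the earlier minimax sketch (for illustration only):

import math

def alphabeta_value(game, state, alpha=-math.inf, beta=math.inf):
    if game.is_terminal(state):
        return game.utility(state)
    if game.to_move(state) == "MAX":
        v = -math.inf
        for successor in game.successors(state):
            v = max(v, alphabeta_value(game, successor, alpha, beta))
            if v >= beta:           # MIN above already has a better option: prune
                return v
            alpha = max(alpha, v)
        return v
    v = math.inf
    for successor in game.successors(state):
        v = min(v, alphabeta_value(game, successor, alpha, beta))
        if v <= alpha:              # MAX above already has a better option: prune
            return v
        beta = min(beta, v)
    return v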

39

Page 40: Search with Other Agents - Sharif

40

function ALPHA_BETA_SEARCH(state) returns an action
    v ← MAX_VALUE(state, −∞, +∞)
    return the action in ACTIONS(state) with value v

function MAX_VALUE(state, α, β) returns a utility value
    if TERMINAL_TEST(state) then return UTILITY(state)
    v ← −∞
    for each a in ACTIONS(state) do
        v ← MAX(v, MIN_VALUE(RESULT(state, a), α, β))
        if v ≥ β then return v
        α ← MAX(α, v)
    return v

function MIN_VALUE(state, α, β) returns a utility value
    if TERMINAL_TEST(state) then return UTILITY(state)
    v ← +∞
    for each a in ACTIONS(state) do
        v ← MIN(v, MAX_VALUE(RESULT(state, a), α, β))
        if v ≤ α then return v
        β ← MIN(β, v)
    return v

Page 41: Search with Other Agents - Sharif

Alpha-Beta Pruning Properties

This pruning has no effect on the minimax value computed for the root!

Values of intermediate nodes might be wrong

Important: children of the root may have the wrong value

So the most naïve version won’t let you do action selection

Good child ordering improves effectiveness of pruning

With “perfect ordering”:

Time complexity drops to O(b^(m/2))

Doubles solvable depth!

Full search of, e.g. chess, is still hopeless…

This is a simple example of metareasoning (computing about what to compute)

10 10 0

max

min

41

Page 42: Search with Other Agents - Sharif

Order of moves

Good move ordering improves effectiveness of pruning

Best order: time complexity is O(b^(m/2))

42

?

Page 43: Search with Other Agents - Sharif

Computational time limit (example)

43

100 secs is allowed for each move (game rule)

10^4 nodes/sec (processor speed)

We can explore just 10^6 nodes for each move

b^m = 10^6, b = 35 ⟹ m ≈ 4

(4-ply look-ahead is a hopeless chess player!)

α-β reaches about depth 8 – decent chess program

It is still hopeless
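A quick check of this arithmetic (a small illustrative calculation, not part of the slides):

import math

nodes_per_second = 1e4                     # assumed processor speed from the slide
time_per_move = 100                        # seconds allowed per move
budget = nodes_per_second * time_per_move  # 1e6 nodes per move

b = 35                                     # chess branching factor
m = math.log(budget) / math.log(b)         # solve b**m = budget for the depth m
print(round(m, 2))                         # about 3.9, i.e. roughly 4 plies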

Page 44: Search with Other Agents - Sharif

Resource Limits

44

Page 45: Search with Other Agents - Sharif

Resource Limits

Problem: In realistic games, cannot search to leaves!

Solution: Depth-limited search. Instead, search only to a limited depth in the tree

Replace terminal utilities with an evaluation function for non-terminal positions

Guarantee of optimal play is gone

More plies makes a BIG difference

Use iterative deepening for an anytime algorithm

? ? ? ?

-1 -2 4 9

4

min

max

-2 4

45

Page 46: Search with Other Agents - Sharif

Computational time limit: Solution

46

Cut off the search and apply a heuristic evaluation function

cutoff test: turns non-terminal nodes into terminal leaves

Cut off test instead of terminal test (e.g., depth limit)

evaluation function: estimated desirability of a state

Heuristic function evaluation instead of utility function

This approach does not guarantee optimality.

Page 47: Search with Other Agents - Sharif

Heuristic minimax

47

H_MINIMAX(s, d) =
  EVAL(s, MAX)                                          if CUTOFF_TEST(s, d)
  max_{a in ACTIONS(s)} H_MINIMAX(RESULT(s, a), d + 1)  if PLAYER(s) = MAX
  min_{a in ACTIONS(s)} H_MINIMAX(RESULT(s, a), d + 1)  if PLAYER(s) = MIN

Page 48: Search with Other Agents - Sharif

Depth Matters

Evaluation functions are always imperfect

The deeper in the tree the evaluation function is buried, the less the quality of the evaluation function matters

An important example of the tradeoff between complexity of features and complexity of computation

48

Page 49: Search with Other Agents - Sharif

Video of Demo Limited Depth (2)

49

Page 50: Search with Other Agents - Sharif

Video of Demo Limited Depth (10)

50

Page 51: Search with Other Agents - Sharif

Evaluation Functions

51

Page 52: Search with Other Agents - Sharif

Evaluation functions

52

For terminal states, it should order them in the same way as the true utility function.

For non-terminal states, it should be strongly correlated with the actual chances of winning.

Its computation must not be too expensive.

Page 53: Search with Other Agents - Sharif

Evaluation functions based on features

53

Example: features for evaluation of the chess states

Number of each kind of piece: number of white pawns, black pawns, white queens, black queens, etc.

King safety

Good pawn structure

…

Page 54: Search with Other Agents - Sharif

Evaluation Functions

Evaluation functions score non-terminals in depth-limited search

Ideal function: returns the actual minimax value of the position

In practice: typically weighted linear sum of features:

e.g. f1(s) = (num white queens – num black queens), etc.
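A small sketch of such a weighted linear evaluation, Eval(s) = w1·f1(s) + … + wn·fn(s); the feature functions, state fields, and weights below are made up for illustration:

def queen_diff(state):
    return state.num_white_queens - state.num_black_queens   # f1(s) from the slide

def pawn_diff(state):
    return state.num_white_pawns - state.num_black_pawns     # an assumed extra feature

FEATURES = [queen_diff, pawn_diff]
WEIGHTS = [9.0, 1.0]                                          # assumed weights

def evaluate(state):
    # Weighted linear sum of features, scored from White's point of view.
    return sum(w * f(state) for w, f in zip(WEIGHTS, FEATURES))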

54

Page 55: Search with Other Agents - Sharif

Evaluation for Pacman

[Demo: thrashing d=2, thrashing d=2 (fixed evaluation function), smart ghosts coordinate (L6D6,7,8,10)]

55

Page 56: Search with Other Agents - Sharif

Video of Demo Thrashing (d=3)

56

Page 57: Search with Other Agents - Sharif

Why Pacman Starves

A danger of replanning agents! He knows his score will go up by eating the dot now (west, east)

He knows his score will go up just as much by eating the dot later (east, west)

There are no point-scoring opportunities after eating the dot (within the horizon, two here)

Therefore, waiting seems just as good as eating: he may go east, then back west in the next round of replanning!

57

Page 58: Search with Other Agents - Sharif

Fixing the evaluation function

Evaluation function:

# of eaten dots + 1/(distance to the closest dot)
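A sketch of that fixed evaluation function; the state fields used here (eaten_dots, pacman_pos, remaining_dots) are assumed for illustration:

def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def evaluate(state):
    if not state.remaining_dots:
        return state.eaten_dots
    # Assumes a dot on Pacman's square is already eaten, so the distance is at least 1.
    closest = min(manhattan(state.pacman_pos, dot) for dot in state.remaining_dots)
    # Being closer to the next dot now breaks the tie that made Pacman thrash.
    return state.eaten_dots + 1.0 / closest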

58

Page 59: Search with Other Agents - Sharif

Video of Demo Thrashing -- Fixed (d=3)

59

Page 60: Search with Other Agents - Sharif

Video of Demo Smart Ghosts (Coordination)

60

Page 61: Search with Other Agents - Sharif

Video of Demo Smart Ghosts (Coordination) – Zoomed In

61

Page 62: Search with Other Agents - Sharif

Other Game Types

62

Page 63: Search with Other Agents - Sharif

Multi-Agent Utilities

What if the game is not zero-sum, or has multiple players?

Generalization of minimax: Terminals have utility tuples

Node values are also utility tuples

Each player maximizes its own component

Can give rise to cooperation and competition dynamically…

(Figure: three-player tree with terminal utility tuples 1,6,6  7,1,2  6,1,2  7,2,1  5,1,7  1,5,2  7,7,1  5,2,5 and root value 1,6,6)
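A minimal sketch of minimax generalized to utility tuples, where each player maximizes its own component; the Game interface (is_terminal, utility_tuple, to_move returning a player index, successors) is assumed for illustration:

def multi_player_value(game, state):
    if game.is_terminal(state):
        return game.utility_tuple(state)                   # one utility per player
    player = game.to_move(state)                           # index of the player moving here
    child_values = [multi_player_value(game, s) for s in game.successors(state)]
    return max(child_values, key=lambda tup: tup[player])  # maximize own component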

63

Page 64: Search with Other Agents - Sharif

Uncertain Outcomes

64

Page 65: Search with Other Agents - Sharif

Probabilities

65

Page 66: Search with Other Agents - Sharif

Reminder: Probabilities

A random variable represents an event whose outcome is unknown

A probability distribution is an assignment of weights to outcomes

Example: Traffic on freeway

Random variable: T = whether there’s traffic

Outcomes: T in {none, light, heavy}

Distribution: P(T=none) = 0.25, P(T=light) = 0.50, P(T=heavy) = 0.25

Some laws of probability (more later):

Probabilities are always non-negative

Probabilities over all possible outcomes sum to one

As we get more evidence, probabilities may change:

P(T=heavy) = 0.25, P(T=heavy | Hour=8am) = 0.60

We’ll talk about methods for reasoning and updating probabilities later


66

Page 67: Search with Other Agents - Sharif

The expected value of a function of a random variable is the average, weighted by the probability distribution over outcomes

Example: How long to get to the airport?

Reminder: Expectations

Probability: 0.25, 0.50, 0.25
Time: 20 min, 30 min, 60 min
Expected time: 0.25 × 20 + 0.50 × 30 + 0.25 × 60 = 35 min
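The same calculation in a couple of lines of Python (illustrative only):

times = [20, 30, 60]                 # minutes
probs = [0.25, 0.50, 0.25]
expected = sum(p * t for p, t in zip(probs, times))
print(expected)                      # 35.0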

67

Page 68: Search with Other Agents - Sharif

Mixed Layer Types

E.g. Backgammon

Expectiminimax

Environment is an extra “random agent” player that moves after each min/max agent

Each node computes the appropriate combination of its children

68

Page 69: Search with Other Agents - Sharif

Stochastic games

69

EXPECT_MINIMAX(s) =
  UTILITY(s, MAX)                                     if TERMINAL_TEST(s)
  max_{a in ACTIONS(s)} EXPECT_MINIMAX(RESULT(s, a))  if PLAYER(s) = MAX
  min_{a in ACTIONS(s)} EXPECT_MINIMAX(RESULT(s, a))  if PLAYER(s) = MIN
  Σ_r P(r) EXPECT_MINIMAX(RESULT(s, r))               if PLAYER(s) = CHANCE

Page 70: Search with Other Agents - Sharif

Stochastic games: Backgammon

70

Page 71: Search with Other Agents - Sharif

Stochastic games

71

Expected utility: Chance nodes take average (expectation) over all possible outcomes.

It is consistent with the definition of rational agents trying to maximize expected utility.

(Figure: leaf values 2, 3, 1, 4; chance-node values 2.1 and 1.3, each the average of its children’s values weighted by their probabilities)

Page 72: Search with Other Agents - Sharif

Evaluation functions for stochastic games

72

An order preserving transformation on leaf values is not a sufficient condition.

Evaluation function must be a positive linear transformation of the expected utility of a position.

Page 73: Search with Other Agents - Sharif

Properties of search space for stochastic games

73

O(b^m n^m). Backgammon: b ≈ 20 (can be up to 4000 for double dice rolls), n = 21 (no. of different possible dice rolls)

3 plies is manageable (≈ 10^8 nodes)

Probability of reaching a given node decreases enormously by increasing the depth (multiplying probabilities)

Forming detailed plans of actions may be pointless

Limiting depth is not so damaging, particularly when the probability values (for each non-deterministic situation) are close to each other

But pruning is not straightforward.

Page 74: Search with Other Agents - Sharif

Example: Backgammon

Dice rolls increase b: 21 possible rolls with 2 dice

Backgammon: 20 legal moves

Depth 2 = 20 × (21 × 20)^3 = 1.2 × 10^9

As depth increases, probability of reaching a given

search node shrinks

So usefulness of search is diminished

So limiting depth is less damaging

But pruning is trickier…

Historic AI: TDGammon uses depth-2 search + very good evaluation function + reinforcement learning

Image: Wikipedia

74

1st AI world champion in any game!

Page 75: Search with Other Agents - Sharif

State-of-the-art game programs

Chess (b ≈ 35): In 1997, Deep Blue defeated Kasparov.

Ran on a parallel computer doing alpha-beta search.

Reaches depth 14 plies routinely, with techniques to extend the effective search depth.

Hydra: reaches depth 18 plies using more heuristics.

Checkers (b < 10): Chinook (ran on a regular PC and uses alpha-beta search) ended the 40-year reign of human world champion Tinsley in 1994.

Since 2007, Chinook has been able to play perfectly by using alpha-beta search combined with a database of 39 trillion endgame positions.

75

Page 76: Search with Other Agents - Sharif

State-of-the-art game programs (Cont.)

76

Othello (b is usually between 5 and 15): Logistello defeated the human world champion by six games to none in 1997.

Human champions are no match for computers at Othello.

Go (b > 300): Human champions refuse to compete against computers (current programs are still at advanced amateur level).

MOGO avoided alpha-beta search and used Monte Carlo rollouts.

AlphaGo (2016) has beaten professionals without handicaps.

Page 77: Search with Other Agents - Sharif

State-of-the-art game programs (Cont.)

77

Backgammon (stochastic): TD-Gammon (1992) was competitive with top human players. Depth 2 or 3 search along with a good evaluation function developed by learning methods.

Bridge (partially observable, multiplayer): In 1998, GIB was 12th in a field of 35 in the par contest at the human world championship.

In 2005, Jack defeated three out of seven top champion pairs. Overall, it lost by a small margin.

Scrabble (partially observable & stochastic): In 2006, Quackle defeated the former world champion 3-2.

Page 78: Search with Other Agents - Sharif

Utilities & Expectimax

Page 79: Search with Other Agents - Sharif

Utilities

79

Page 80: Search with Other Agents - Sharif

Utilities

Utilities are functions from outcomes (states of the world) to real numbers that describe an agent’s preferences

Where do utilities come from? In a game, may be simple (+1/-1)

Utilities summarize the agent’s goals

Theorem: any “rational” preferences can be summarized as a utility function

We hard-wire utilities and let behaviors emerge

Why don’t we let agents pick utilities?

Why don’t we prescribe behaviors?

80

Page 81: Search with Other Agents - Sharif

Utilities: Uncertain Outcomes

Getting ice cream

Get Single Get Double

Oops Whew!

81

Page 82: Search with Other Agents - Sharif

Expectimax Search (One Player Game)

Why wouldn’t we know what the result of an action will be?

Explicit randomness: rolling dice

Unpredictable opponents: the ghosts respond randomly

Actions can fail: when moving a robot, wheels might slip

Values should now reflect average-case (expectimax) outcomes, not worst-case outcomes

Expectimax search: compute the average score under optimal play

Max nodes as in minimax search

Chance nodes are like min nodes but the outcome is uncertain

Calculate their expected utilities, i.e. take weighted average (expectation) of children

Soon, we’ll learn how to formalize the underlying uncertain-result problems as Markov Decision Processes

10 4 5 7

max

chance

10 10 9 100

82

Page 83: Search with Other Agents - Sharif

Expectimax Pseudocode: One Player

def value(state):

if the state is a terminal state: return the state’s utility

if the next agent is MAX: return max-value(state)

if the next agent is EXP: return exp-value(state)

def exp-value(state):
    initialize v = 0
    for each successor of state:
        p = probability(successor)
        v += p * value(successor)
    return v

def max-value(state):
    initialize v = -∞
    for each successor of state:
        v = max(v, value(successor))
    return v
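A runnable sketch of one-player expectimax; the Game interface here (is_terminal, utility, to_move returning "MAX" or "EXP", successors, and chance_successors yielding (probability, successor) pairs) is assumed for illustration:

def expectimax_value(game, state):
    if game.is_terminal(state):
        return game.utility(state)
    if game.to_move(state) == "MAX":
        return max(expectimax_value(game, s) for s in game.successors(state))
    # Chance node: probability-weighted average of the children's values.
    return sum(p * expectimax_value(game, s) for p, s in game.chance_successors(state))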

83

Page 84: Search with Other Agents - Sharif

Expectimax Pseudocode

def exp-value(state):
    initialize v = 0
    for each successor of state:
        p = probability(successor)
        v += p * value(successor)
    return v

(Figure: chance node with successors of values 8, 24, -12 and probabilities 1/2, 1/3, 1/6)

v = (1/2) (8) + (1/3) (24) + (1/6) (-12) = 10

84

Page 85: Search with Other Agents - Sharif

Expectimax Pruning?

12 93 2

85

Page 86: Search with Other Agents - Sharif

What Utilities to Use?

For worst-case minimax reasoning, terminal function scale doesn’t matter

We just want better states to have higher evaluations (get the ordering right)

We call this insensitivity to monotonic transformations

For average-case expectimax reasoning, we need magnitudes to be meaningful

(Example: leaf values 0, 40, 20, 30 and, after the monotonic transformation x², leaf values 0, 1600, 400, 900)

86

Page 87: Search with Other Agents - Sharif

Expectimax Search

Why wouldn’t we know what the result of an action will be?

Explicit randomness: rolling dice

Unpredictable opponents: the ghosts respond randomly

Unpredictable humans: humans are not perfect

Actions can fail: when moving a robot, wheels might slip

Values should now reflect average-case (expectimax) outcomes, not worst-case (minimax) outcomes

Expectimax search: compute the average score under optimal play

Max nodes as in minimax search

Chance nodes are like min nodes but the outcome is uncertain

Calculate their expected utilities, i.e. take weighted average (expectation) of children

Later, we’ll learn how to formalize the underlying uncertain-result problems as Markov Decision Processes

10 4 5 7

max

chance

10 10 9 100

[Demo: min vs exp (L7D1,2)]

87

Page 88: Search with Other Agents - Sharif

Worst-Case vs. Average Case (One Player Game)

10 10 9 100

max

min

Idea: Uncertain outcomes controlled by chance, not an adversary!

88

Page 89: Search with Other Agents - Sharif

Video of Demo Minimax vs Expectimax (Min)

89

Page 90: Search with Other Agents - Sharif

Video of Demo Minimax vs Expectimax (Exp)

90

Page 91: Search with Other Agents - Sharif

In expectimax search, we have a probabilistic model of how the opponent (or environment) will behave in any state

could be a simple uniform distribution (roll a die)

could be sophisticated and require a great deal of computation

a chance node for any outcome out of our control: opponent or environment

The model might say that adversarial actions are likely!

What Probabilities to Use?

Having a probabilistic belief about another agent’s action does not mean that the agent is flipping any coins!

91

Page 92: Search with Other Agents - Sharif

What Probabilities to Use?

Having a probabilistic belief about another agent’s action does not mean that the agent is flipping any coins!

92

In expectimax search, we have a probabilistic model of how the opponent (or environment) will behave in any state

could be a simple uniform distribution (roll a die)

could be sophisticated and require a great deal of computation

a chance node for any outcome out of our control: opponent or environment

The model might say that adversarial actions are likely!

For now, assume each chance node magically comes along with probabilities that specify the distribution over its outcomes

Page 93: Search with Other Agents - Sharif

Quiz: Informed Probabilities

Let’s say you know that your opponent is actually running a depth 2 minimax, using the result 80% of the time, and moving randomly otherwise

Question: What tree search should you use?

0.1 0.9

Answer: Expectimax!

To figure out EACH chance node’s probabilities, you have to run a simulation of your opponent

This kind of thing gets very slow very quickly

Even worse if you have to simulate your opponent simulating you…

… except for minimax and maximax, which have the nice property that it all collapses into one game tree

This is basically how you would model a human, except for their utility: their utility might be the same as yours (i.e. you try to help them, but they are depth 2 and noisy), or they might have a slightly different utility (like another person navigating in the office)

93

Page 94: Search with Other Agents - Sharif

Modeling Assumptions

94

Page 95: Search with Other Agents - Sharif

The Dangers of Optimism and Pessimism

Dangerous Optimism: Assuming chance when the world is adversarial

Dangerous Pessimism: Assuming the worst case when it’s not likely

95

Page 96: Search with Other Agents - Sharif

Video of Demo World Assumptions

Random Ghost – Expectimax Pacman

96

Page 97: Search with Other Agents - Sharif

Video of Demo World Assumptions

Adversarial Ghost – Minimax Pacman

97

Page 98: Search with Other Agents - Sharif

Video of Demo World Assumptions

Adversarial Ghost – Expectimax Pacman

98

Page 99: Search with Other Agents - Sharif

Video of Demo World Assumptions

Random Ghost – Minimax Pacman

99

Page 100: Search with Other Agents - Sharif

Assumptions vs. Reality

Results from playing 5 games:

                     Adversarial Ghost            Random Ghost
Minimax Pacman       Won 5/5, Avg. Score: 483     Won 5/5, Avg. Score: 493
Expectimax Pacman    Won 1/5, Avg. Score: -303    Won 5/5, Avg. Score: 503

Pacman used depth 4 search with an eval function that avoids trouble
Ghost used depth 2 search with an eval function that seeks Pacman

100

Page 101: Search with Other Agents - Sharif

Maximum Expected Utility

Why should we average utilities? Why not minimax?

Principle of maximum expected utility:

A rational agent should choose the action that maximizes its expected utility, given its knowledge

Questions:

Where do utilities come from?

How do we know such utilities even exist?

How do we know that averaging even makes sense?

What if our behavior (preferences) can’t be described by utilities?

101