
Page 1: Search with Other Agents - Sharif

Soleymani

Search with Other Agents
CE417: Introduction to Artificial Intelligence
Sharif University of Technology
Fall 2020

“Artificial Intelligence: A Modern Approach”, 3rd Edition, Chapter 5

Most slides have been adopted from Klein and Abbeel, CS188, UC Berkeley.

Page 2: Search with Other Agents - Sharif

Adversarial Games

2

Page 3: Search with Other Agents - Sharif

Zero-Sum Games ☺

Checkers: 1950: First computer player. 1994: First computer champion: Chinook ended 40-year reign of human champion Marion Tinsley using complete 8-piece endgame. 2007: Checkers solved!

Chess: 1997: Deep Blue defeats human champion Garry Kasparov in a six-game match. Deep Blue examined 200M positions per second, used very sophisticated evaluation and undisclosed methods for extending some lines of search up to 40 ply. Current programs are even better, if less historic.

Go: Human champions are now starting to be challenged by machines, though the best humans still beat the best machines. In Go, b > 300! Classic programs use pattern knowledge bases, but big recent advances use Monte Carlo (randomized) expansion methods.

3

Page 4: Search with Other Agents - Sharif

Zero-Sum Games ☺

Checkers: 1950: First computer player. 1994: First computer champion: Chinook ended 40-year reign of human champion Marion Tinsley using complete 8-piece endgame. 2007: Checkers solved!

Chess: 1997: Deep Blue defeats human champion Garry Kasparov in a six-game match. Deep Blue examined 200M positions per second, used very sophisticated evaluation and undisclosed methods for extending some lines of search up to 40 ply. Current programs are even better, if less historic.

Go: 2016: AlphaGo defeats human champion. Uses Monte Carlo Tree Search, learned evaluation function by deep learning.

4

Page 5: Search with Other Agents - Sharif

Many different kinds of games!

Axes:

Deterministic or stochastic?

One, two, or more players?

Zero sum?

Perfect information (can you see the state)?

Want algorithms for calculating a strategy (policy) which recommends a move from each state

Types of Games

5

Page 6: Search with Other Agents - Sharif

Zero-Sum Games

Zero-Sum Games

Agents have opposite utilities (values on outcomes)

Lets us think of a single value that one maximizes and the other minimizes

Adversarial, pure competition

General Games

Agents have independent utilities (values on outcomes)

Cooperation, indifference, competition, and more are all possible

More later on non-zero-sum games

6

Page 7: Search with Other Agents - Sharif

Primary assumptions

We start with these games:

Two-player

Turn taking

agents act alternately

Zero-sum

agents’ goals are in conflict: sum of utility values at the end of the game is zero or constant

Perfect information

fully observable

7

Examples: Tic-tac-toe, chess, checkers

Page 8: Search with Other Agents - Sharif

Deterministic Games

Many possible formalizations, one is:

States: S (start at s0)

Players: P = {1...N} (usually take turns)

Actions: A (may depend on player / state)

Transition Function: S × A → S

Terminal Test: S → {t, f}

Terminal Utilities: S × P → R

Solution for a player is a policy: S → A

8

Page 9: Search with Other Agents - Sharif

Single-Agent Trees

(Figure: single-agent tree with terminal values 2, 0, 2, 6, 4, 6, …; root value 8)

9

Page 10: Search with Other Agents - Sharif

Value of a State

(Figure: terminal states with values 2, 0, 2, 6, 4, 6, …; non-terminal root with value 8)

Value of a state: The best achievable outcome (utility) from that state

10

Page 11: Search with Other Agents - Sharif

Adversarial Search

11

Page 12: Search with Other Agents - Sharif

Game tree (tic-tac-toe)

12

Two players: P1 and P2 (P1 is now searching to find a good move)

Zero-sum games: P1 gets U(t), P2 gets C − U(t) for terminal node t

Utilities from the point of view of P1

1-ply = half move

(Game tree figure: plies alternate between P1 and P2; P1 plays X, P2 plays O)

Page 13: Search with Other Agents - Sharif

Adversarial Game Trees

-20 -8 -18 -5 -10 +4… … -20 +8

13

Page 14: Search with Other Agents - Sharif

Minimax Values

+8  -10  -5  -8

States Under Agent’s Control:

Terminal States:

States Under Opponent’s Control:

14

Page 15: Search with Other Agents - Sharif

Tic-Tac-Toe Game Tree

15

Page 16: Search with Other Agents - Sharif

Adversarial Search (Minimax)

Opponent is assumed optimal

Minimax search:

A state-space search tree

Players alternate turns

Compute each node’s minimax value: the best achievable utility against a rational (optimal) adversary

(Figure: terminal values 8, 2, 5, 6; min-node values 2 and 5; max root value 5)

max

min

Terminal values: part of the game

16

Page 17: Search with Other Agents - Sharif

Minimax

MINIMAX(s) shows the best achievable outcome of being in state s (assumption: optimal opponent)

17

MINIMAX(s) =
  UTILITY(s, MAX)                               if TERMINAL_TEST(s)
  max_{a in ACTIONS(s)} MINIMAX(RESULT(s, a))   if PLAYER(s) = MAX
  min_{a in ACTIONS(s)} MINIMAX(RESULT(s, a))   if PLAYER(s) = MIN

(MINIMAX(s) is the utility of being in state s; figure: min-node values 3, 2, 2 and root value 3)

Page 18: Search with Other Agents - Sharif

Minimax (Cont.)

18

Optimal strategy: move to the state with highest minimax value

Best achievable payoff against best play

Maximizes the worst-case outcome for MAX

It works for zero-sum games

Page 19: Search with Other Agents - Sharif

Minimax Implementation

def min-value(state):
    initialize v = +∞
    for each successor of state:
        v = min(v, max-value(successor))
    return v

def max-value(state):
    initialize v = -∞
    for each successor of state:
        v = max(v, min-value(successor))
    return v

19

Page 20: Search with Other Agents - Sharif

Minimax Implementation (Dispatch)

def value(state):

if the state is a terminal state: return the state’s utility

if the next agent is MAX: return max-value(state)

if the next agent is MIN: return min-value(state)

def min-value(state):
    initialize v = +∞
    for each successor of state:
        v = min(v, value(successor))
    return v

def max-value(state):
    initialize v = -∞
    for each successor of state:
        v = max(v, value(successor))
    return v
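Below is a minimal runnable Python sketch of this dispatch structure. The Game interface used here (is_terminal, utility, to_move, successors) is an assumption for illustration, not something defined in the slides.

import math

def minimax_value(game, state):
    # Dispatch on the kind of node, mirroring value(state) above.
    if game.is_terminal(state):
        return game.utility(state)          # terminal: utility from MAX's point of view
    if game.to_move(state) == "MAX":
        return max_value(game, state)
    return min_value(game, state)

def max_value(game, state):
    v = -math.inf
    for successor in game.successors(state):
        v = max(v, minimax_value(game, successor))
    return v

def min_value(game, state):
    v = math.inf
    for successor in game.successors(state):
        v = min(v, minimax_value(game, successor))
    return v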

20

Page 21: Search with Other Agents - Sharif

Minimax algorithm

21

Depth first search

function MINIMAX_DECISION(state) returns an action
    return argmax_{a in ACTIONS(state)} MIN_VALUE(RESULT(state, a))

function MAX_VALUE(state) returns a utility value
    if TERMINAL_TEST(state) then return UTILITY(state)
    v ← −∞
    for each a in ACTIONS(state) do
        v ← MAX(v, MIN_VALUE(RESULT(state, a)))
    return v

function MIN_VALUE(state) returns a utility value
    if TERMINAL_TEST(state) then return UTILITY(state)
    v ← +∞
    for each a in ACTIONS(state) do
        v ← MIN(v, MAX_VALUE(RESULT(state, a)))
    return v

Page 22: Search with Other Agents - Sharif

Properties of minimax

Complete? Yes (when tree is finite)

Optimal? Yes (against an optimal opponent)

Time complexity: O(b^m)

Space complexity: O(bm) (depth-first exploration)

For chess, b ≈ 35, m > 50 for reasonable games

Finding exact solution is completely infeasible

22

Page 23: Search with Other Agents - Sharif

Minimax Properties

Optimal against a perfect player. Otherwise?

10 10 9 100

max

min

23

Page 24: Search with Other Agents - Sharif

Video of Demo Min vs. Exp (Min)

24

Page 25: Search with Other Agents - Sharif

Video of Demo Min vs. Exp (Exp)

25

Page 26: Search with Other Agents - Sharif

Game Tree Pruning

26

Page 27: Search with Other Agents - Sharif

Pruning

27

Correct minimax decision without looking at every node in the game tree

α-β pruning

Branch & bound algorithm

Prunes away branches that cannot influence the final decision

Page 28: Search with Other Agents - Sharif

α-β pruning example

28

Page 29: Search with Other Agents - Sharif

α-β pruning example

29

Page 30: Search with Other Agents - Sharif

α-β pruning example

30

Page 31: Search with Other Agents - Sharif

α-β pruning example

31

Page 32: Search with Other Agents - Sharif

α-β pruning example

32

Page 33: Search with Other Agents - Sharif

Alpha-Beta Pruning

General configuration (MIN version)

We’re computing the MIN-VALUE at some node n

We’re looping over n’s children

Who cares about n’s value? MAX

Let α be the best value that MAX can get at any choice point along the current path from the root

If n becomes worse than α, MAX will avoid it, so we can stop considering n’s other children (it’s already bad enough that it won’t be played)

MAX version is symmetric

(Figure: alternating MAX/MIN layers, with α established at MAX choice points above the MIN node n)

33

Page 34: Search with Other Agents - Sharif

α-β pruning

34

Assuming depth-first generation of tree

We prune node n when the player has a better choice m at (the parent or) any ancestor of n

Two types of pruning (cuts):

pruning of max nodes (α-cuts)

pruning of min nodes (β-cuts)

Page 35: Search with Other Agents - Sharif

Why is it called α-β?

α: Value of the best (highest) choice found so far at any choice point along the path for MAX

β: Value of the best (lowest) choice found so far at any choice point along the path for MIN

Updating α and β during the search process

For a MAX node, once the value of this node is known to be more than the current β (v ≥ β), its remaining branches are pruned.

For a MIN node, once the value of this node is known to be less than the current α (v ≤ α), its remaining branches are pruned.

35

Page 36: Search with Other Agents - Sharif

Alpha-Beta Quiz 2

36

Page 37: Search with Other Agents - Sharif

Alpha-Beta Quiz 2

(Quiz figure: node values 10, 10, ≥ 100, 2, ≤ 2)

37

Page 38: Search with Other Agents - Sharif

α-β pruning (another example)

38

(Figure: example tree with leaf values 1, 5, 2, 3 and node bounds such as ≤ 2 and ≥ 5)

Page 39: Search with Other Agents - Sharif

Alpha-Beta Implementation

def min-value(state, α, β):
    initialize v = +∞
    for each successor of state:
        v = min(v, value(successor, α, β))
        if v ≤ α return v
        β = min(β, v)
    return v

def max-value(state, α, β):
    initialize v = -∞
    for each successor of state:
        v = max(v, value(successor, α, β))
        if v ≥ β return v
        α = max(α, v)
    return v

α: MAX’s best option on path to root

β: MIN’s best option on path to root
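A runnable Python sketch of the same pruning logic, using the hypothetical Game interface assumed in the earlier minimax sketch (for illustration only):

import math

def alphabeta_value(game, state, alpha=-math.inf, beta=math.inf):
    if game.is_terminal(state):
        return game.utility(state)
    if game.to_move(state) == "MAX":
        v = -math.inf
        for successor in game.successors(state):
            v = max(v, alphabeta_value(game, successor, alpha, beta))
            if v >= beta:           # MIN above already has a better option: prune
                return v
            alpha = max(alpha, v)
        return v
    v = math.inf
    for successor in game.successors(state):
        v = min(v, alphabeta_value(game, successor, alpha, beta))
        if v <= alpha:              # MAX above already has a better option: prune
            return v
        beta = min(beta, v)
    return v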

39

Page 40: Search with Other Agents - Sharif

40

function ALPHA_BETA_SEARCH(state) returns an action
    v ← MAX_VALUE(state, −∞, +∞)
    return the action in ACTIONS(state) with value v

function MAX_VALUE(state, α, β) returns a utility value
    if TERMINAL_TEST(state) then return UTILITY(state)
    v ← −∞
    for each a in ACTIONS(state) do
        v ← MAX(v, MIN_VALUE(RESULT(state, a), α, β))
        if v ≥ β then return v
        α ← MAX(α, v)
    return v

function MIN_VALUE(state, α, β) returns a utility value
    if TERMINAL_TEST(state) then return UTILITY(state)
    v ← +∞
    for each a in ACTIONS(state) do
        v ← MIN(v, MAX_VALUE(RESULT(state, a), α, β))
        if v ≤ α then return v
        β ← MIN(β, v)
    return v

Page 41: Search with Other Agents - Sharif

Alpha-Beta Pruning Properties

This pruning has no effect on the minimax value computed for the root!

Values of intermediate nodes might be wrong

Important: children of the root may have the wrong value

So the most naïve version won’t let you do action selection

Good child ordering improves effectiveness of pruning

With “perfect ordering”:

Time complexity drops to O(b^(m/2))

Doubles solvable depth!

Full search of, e.g. chess, is still hopeless…

This is a simple example of metareasoning (computing about what to compute)

10 10 0

max

min

41

Page 42: Search with Other Agents - Sharif

Order of moves

Good move ordering improves effectiveness of pruning

Best order: time complexity is O(b^(m/2))

42

?

Page 43: Search with Other Agents - Sharif

Computational time limit (example)

43

100 secs is allowed for each move (game rule)

10^4 nodes/sec (processor speed)

We can explore just 10^6 nodes for each move

b^m = 10^6, b = 35 ⟹ m ≈ 4

(4-ply look-ahead is a hopeless chess player!)

α-β reaches about depth 8 – decent chess program

It is still hopeless
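A quick check of this arithmetic (a small illustrative calculation, not part of the slides):

import math

nodes_per_second = 1e4                     # assumed processor speed from the slide
time_per_move = 100                        # seconds allowed per move
budget = nodes_per_second * time_per_move  # 1e6 nodes per move

b = 35                                     # chess branching factor
m = math.log(budget) / math.log(b)         # solve b**m = budget for the depth m
print(round(m, 2))                         # about 3.9, i.e. roughly 4 plies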

Page 44: Search with Other Agents - Sharif

Resource Limits

44

Page 45: Search with Other Agents - Sharif

Resource Limits

Problem: In realistic games, cannot search to leaves!

Solution: Depth-limited search. Instead, search only to a limited depth in the tree

Replace terminal utilities with an evaluation function for non-terminal positions

Guarantee of optimal play is gone

More plies makes a BIG difference

Use iterative deepening for an anytime algorithm

? ? ? ?

-1 -2 4 9

4

min

max

-2 4

45

Page 46: Search with Other Agents - Sharif

Computational time limit: Solution

46

Cut off the search and apply a heuristic evaluation function

cutoff test: turns non-terminal nodes into terminal leaves

Cut off test instead of terminal test (e.g., depth limit)

evaluation function: estimated desirability of a state

Heuristic function evaluation instead of utility function

This approach does not guarantee optimality.

Page 47: Search with Other Agents - Sharif

Heuristic minimax

47

H_MINIMAX(s, d) =
  EVAL(s, MAX)                                          if CUTOFF_TEST(s, d)
  max_{a in ACTIONS(s)} H_MINIMAX(RESULT(s, a), d + 1)  if PLAYER(s) = MAX
  min_{a in ACTIONS(s)} H_MINIMAX(RESULT(s, a), d + 1)  if PLAYER(s) = MIN

Page 48: Search with Other Agents - Sharif

Depth Matters

Evaluation functions are always imperfect

The deeper in the tree the evaluation function is buried, the less the quality of the evaluation function matters

An important example of the tradeoff between complexity of features and complexity of computation

48

Page 49: Search with Other Agents - Sharif

Video of Demo Limited Depth (2)

49

Page 50: Search with Other Agents - Sharif

Video of Demo Limited Depth (10)

50

Page 51: Search with Other Agents - Sharif

Evaluation Functions

51

Page 52: Search with Other Agents - Sharif

Evaluation functions

52

For terminal states, it should order them in the same way as the true utility function.

For non-terminal states, it should be strongly correlated with the actual chances of winning.

Its computation must not be too expensive.

Page 53: Search with Other Agents - Sharif

Evaluation functions based on features

53

Example: features for evaluation of the chess states

Number of each kind of piece: number of white pawns, black pawns, white queens, black queens, etc.

King safety

Good pawn structure

…

Page 54: Search with Other Agents - Sharif

Evaluation Functions

Evaluation functions score non-terminals in depth-limited search

Ideal function: returns the actual minimax value of the position

In practice: typically weighted linear sum of features:

e.g. f1(s) = (num white queens – num black queens), etc.
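A small sketch of such a weighted linear evaluation, Eval(s) = w1·f1(s) + … + wn·fn(s); the feature functions, state fields, and weights below are made up for illustration:

def queen_diff(state):
    return state.num_white_queens - state.num_black_queens   # f1(s) from the slide

def pawn_diff(state):
    return state.num_white_pawns - state.num_black_pawns     # an assumed extra feature

FEATURES = [queen_diff, pawn_diff]
WEIGHTS = [9.0, 1.0]                                          # assumed weights

def evaluate(state):
    # Weighted linear sum of features, scored from White's point of view.
    return sum(w * f(state) for w, f in zip(WEIGHTS, FEATURES))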

54

Page 55: Search with Other Agents - Sharif

Evaluation for Pacman

[Demo: thrashing d=2, thrashing d=2 (fixed evaluation function), smart ghosts coordinate (L6D6,7,8,10)]

55

Page 56: Search with Other Agents - Sharif

Video of Demo Thrashing (d=3)

56

Page 57: Search with Other Agents - Sharif

Why Pacman Starves

A danger of replanning agents! He knows his score will go up by eating the dot now (west, east)

He knows his score will go up just as much by eating the dot later (east, west)

There are no point-scoring opportunities after eating the dot (within the horizon, two here)

Therefore, waiting seems just as good as eating: he may go east, then back west in the next round of replanning!

57

Page 58: Search with Other Agents - Sharif

Fixing the evaluation function

Evaluation function:

# of eaten dots + 1/(distance to the closest dot)
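A sketch of that fixed evaluation function; the state fields used here (eaten_dots, pacman_pos, remaining_dots) are assumed for illustration:

def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def evaluate(state):
    if not state.remaining_dots:
        return state.eaten_dots
    # Assumes a dot on Pacman's square is already eaten, so the distance is at least 1.
    closest = min(manhattan(state.pacman_pos, dot) for dot in state.remaining_dots)
    # Being closer to the next dot now breaks the tie that made Pacman thrash.
    return state.eaten_dots + 1.0 / closest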

58

Page 59: Search with Other Agents - Sharif

Video of Demo Thrashing -- Fixed (d=3)

59

Page 60: Search with Other Agents - Sharif

Video of Demo Smart Ghosts (Coordination)

60

Page 61: Search with Other Agents - Sharif

Video of Demo Smart Ghosts (Coordination) – Zoomed In

61

Page 62: Search with Other Agents - Sharif

Other Game Types

62

Page 63: Search with Other Agents - Sharif

Multi-Agent Utilities

What if the game is not zero-sum, or has multiple players?

Generalization of minimax: Terminals have utility tuples

Node values are also utility tuples

Each player maximizes its own component

Can give rise to cooperation and competition dynamically…

(Figure: three-player tree with terminal utility tuples 1,6,6  7,1,2  6,1,2  7,2,1  5,1,7  1,5,2  7,7,1  5,2,5 and root value 1,6,6)
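A minimal sketch of minimax generalized to utility tuples, where each player maximizes its own component; the Game interface (is_terminal, utility_tuple, to_move returning a player index, successors) is assumed for illustration:

def multi_player_value(game, state):
    if game.is_terminal(state):
        return game.utility_tuple(state)                   # one utility per player
    player = game.to_move(state)                           # index of the player moving here
    child_values = [multi_player_value(game, s) for s in game.successors(state)]
    return max(child_values, key=lambda tup: tup[player])  # maximize own component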

63

Page 64: Search with Other Agents - Sharif

Uncertain Outcomes

64

Page 65: Search with Other Agents - Sharif

Probabilities

65

Page 66: Search with Other Agents - Sharif

Reminder: Probabilities

A random variable represents an event whose outcome is unknown

A probability distribution is an assignment of weights to outcomes

Example: Traffic on freeway

Random variable: T = whether there’s traffic

Outcomes: T in {none, light, heavy}

Distribution: P(T=none) = 0.25, P(T=light) = 0.50, P(T=heavy) = 0.25

Some laws of probability (more later):

Probabilities are always non-negative

Probabilities over all possible outcomes sum to one

As we get more evidence, probabilities may change:

P(T=heavy) = 0.25, P(T=heavy | Hour=8am) = 0.60

We’ll talk about methods for reasoning and updating probabilities later


66

Page 67: Search with Other Agents - Sharif

The expected value of a function of a random variable is the average, weighted by the probability distribution over outcomes

Example: How long to get to the airport?

Reminder: Expectations

Probability: 0.25, 0.50, 0.25
Time: 20 min, 30 min, 60 min
Expected time: 0.25 × 20 + 0.50 × 30 + 0.25 × 60 = 35 min
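The same calculation in a couple of lines of Python (illustrative only):

times = [20, 30, 60]                 # minutes
probs = [0.25, 0.50, 0.25]
expected = sum(p * t for p, t in zip(probs, times))
print(expected)                      # 35.0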

67

Page 68: Search with Other Agents - Sharif

Mixed Layer Types

E.g. Backgammon

Expectiminimax

Environment is an extra “random agent” player that moves after each min/max agent

Each node computes the appropriate combination of its children

68

Page 69: Search with Other Agents - Sharif

Stochastic games

69

EXPECT_MINIMAX(s) =
  UTILITY(s, MAX)                                     if TERMINAL_TEST(s)
  max_{a in ACTIONS(s)} EXPECT_MINIMAX(RESULT(s, a))  if PLAYER(s) = MAX
  min_{a in ACTIONS(s)} EXPECT_MINIMAX(RESULT(s, a))  if PLAYER(s) = MIN
  Σ_r P(r) EXPECT_MINIMAX(RESULT(s, r))               if PLAYER(s) = CHANCE

Page 70: Search with Other Agents - Sharif

Stochastic games: Backgammon

70

Page 71: Search with Other Agents - Sharif

Stochastic games

71

Expected utility: Chance nodes take average (expectation) over all possible outcomes.

It is consistent with the definition of rational agents trying to maximize expected utility.

(Figure: leaf values 2, 3, 1, 4; chance-node values 2.1 and 1.3, each the average of its children’s values weighted by their probabilities)

Page 72: Search with Other Agents - Sharif

Evaluation functions for stochastic games

72

An order preserving transformation on leaf values is not a sufficient condition.

Evaluation function must be a positive linear transformation of the expected utility of a position.

Page 73: Search with Other Agents - Sharif

Properties of search space for stochastic games

73

O(b^m n^m). Backgammon: b ≈ 20 (can be up to 4000 for double dice rolls), n = 21 (no. of different possible dice rolls)

3 plies is manageable (≈ 10^8 nodes)

Probability of reaching a given node decreases enormously by increasing the depth (multiplying probabilities)

Forming detailed plans of actions may be pointless

Limiting depth is not so damaging, particularly when the probability values (for each non-deterministic situation) are close to each other

But pruning is not straightforward.

Page 74: Search with Other Agents - Sharif

Example: Backgammon

Dice rolls increase b: 21 possible rolls with 2 dice

Backgammon: 20 legal moves

Depth 2 = 20 × (21 × 20)^3 = 1.2 × 10^9

As depth increases, probability of reaching a given

search node shrinks

So usefulness of search is diminished

So limiting depth is less damaging

But pruning is trickier…

Historic AI: TDGammon uses depth-2 search + very good evaluation function + reinforcement learning

Image: Wikipedia

74

1st AI world champion in any game!

Page 75: Search with Other Agents - Sharif

State-of-the-art game programs

Chess (b ≈ 35): In 1997, Deep Blue defeated Kasparov.

Ran on a parallel computer doing alpha-beta search.

Reaches depth 14 plies routinely, with techniques to extend the effective search depth.

Hydra: reaches depth 18 plies using more heuristics.

Checkers (b < 10): Chinook (ran on a regular PC and uses alpha-beta search) ended the 40-year reign of human world champion Tinsley in 1994.

Since 2007, Chinook has been able to play perfectly by using alpha-beta search combined with a database of 39 trillion endgame positions.

75

Page 76: Search with Other Agents - Sharif

State-of-the-art game programs (Cont.)

76

Othello (b is usually between 5 and 15): Logistello defeated the human world champion by six games to none in 1997.

Human champions are no match for computers at Othello.

Go (b > 300): Human champions refuse to compete against computers (current programs are still at advanced amateur level).

MOGO avoided alpha-beta search and used Monte Carlo rollouts.

AlphaGo (2016) has beaten professionals without handicaps.

Page 77: Search with Other Agents - Sharif

State-of-the-art game programs (Cont.)

77

Backgammon (stochastic): TD-Gammon (1992) was competitive with top human players. Depth 2 or 3 search along with a good evaluation function developed by learning methods.

Bridge (partially observable, multiplayer): In 1998, GIB was 12th in a field of 35 in the par contest at the human world championship.

In 2005, Jack defeated three out of seven top champion pairs. Overall, it lost by a small margin.

Scrabble (partially observable & stochastic): In 2006, Quackle defeated the former world champion 3-2.

Page 78: Search with Other Agents - Sharif

Utilities & Expectimax

Page 79: Search with Other Agents - Sharif

Utilities

79

Page 80: Search with Other Agents - Sharif

Utilities

Utilities are functions from outcomes (states of the world) to real numbers that describe an agent’s preferences

Where do utilities come from? In a game, may be simple (+1/-1)

Utilities summarize the agent’s goals

Theorem: any “rational” preferences can be summarized as a utility function

We hard-wire utilities and let behaviors emerge

Why don’t we let agents pick utilities?

Why don’t we prescribe behaviors?

80

Page 81: Search with Other Agents - Sharif

Utilities: Uncertain Outcomes

Getting ice cream

Get Single Get Double

Oops Whew!

81

Page 82: Search with Other Agents - Sharif

Expectimax Search (One Player Game)

Why wouldn’t we know what the result of an action will be?

Explicit randomness: rolling dice

Unpredictable opponents: the ghosts respond randomly

Actions can fail: when moving a robot, wheels might slip

Values should now reflect average-case (expectimax) outcomes, not worst-case outcomes

Expectimax search: compute the average score under optimal play

Max nodes as in minimax search

Chance nodes are like min nodes but the outcome is uncertain

Calculate their expected utilities, i.e. take weighted average (expectation) of children

Soon, we’ll learn how to formalize the underlying uncertain-result problems as Markov Decision Processes

10 4 5 7

max

chance

10 10 9 100

82

Page 83: Search with Other Agents - Sharif

Expectimax Pseudocode: One Player

def value(state):

if the state is a terminal state: return the state’s utility

if the next agent is MAX: return max-value(state)

if the next agent is EXP: return exp-value(state)

def exp-value(state):
    initialize v = 0
    for each successor of state:
        p = probability(successor)
        v += p * value(successor)
    return v

def max-value(state):
    initialize v = -∞
    for each successor of state:
        v = max(v, value(successor))
    return v
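A runnable sketch of one-player expectimax; the Game interface here (is_terminal, utility, to_move returning "MAX" or "EXP", successors, and chance_successors yielding (probability, successor) pairs) is assumed for illustration:

def expectimax_value(game, state):
    if game.is_terminal(state):
        return game.utility(state)
    if game.to_move(state) == "MAX":
        return max(expectimax_value(game, s) for s in game.successors(state))
    # Chance node: probability-weighted average of the children's values.
    return sum(p * expectimax_value(game, s) for p, s in game.chance_successors(state))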

83

Page 84: Search with Other Agents - Sharif

Expectimax Pseudocode

def exp-value(state):
    initialize v = 0
    for each successor of state:
        p = probability(successor)
        v += p * value(successor)
    return v

(Figure: chance node with successors of values 8, 24, -12 and probabilities 1/2, 1/3, 1/6)

v = (1/2) (8) + (1/3) (24) + (1/6) (-12) = 10

84

Page 85: Search with Other Agents - Sharif

Expectimax Pruning?

12 93 2

85

Page 86: Search with Other Agents - Sharif

What Utilities to Use?

For worst-case minimax reasoning, terminal function scale doesn’t matter

We just want better states to have higher evaluations (get the ordering right)

We call this insensitivity to monotonic transformations

For average-case expectimax reasoning, we need magnitudes to be meaningful

(Example: leaf values 0, 40, 20, 30 and, after the monotonic transformation x², leaf values 0, 1600, 400, 900)

86

Page 87: Search with Other Agents - Sharif

Expectimax Search

Why wouldn’t we know what the result of an action will be?

Explicit randomness: rolling dice

Unpredictable opponents: the ghosts respond randomly

Unpredictable humans: humans are not perfect

Actions can fail: when moving a robot, wheels might slip

Values should now reflect average-case (expectimax) outcomes, not worst-case (minimax) outcomes

Expectimax search: compute the average score under optimal play

Max nodes as in minimax search

Chance nodes are like min nodes but the outcome is uncertain

Calculate their expected utilities, i.e. take weighted average (expectation) of children

Later, we’ll learn how to formalize the underlying uncertain-result problems as Markov Decision Processes

10 4 5 7

max

chance

10 10 9 100

[Demo: min vs exp (L7D1,2)]

87

Page 88: Search with Other Agents - Sharif

Worst-Case vs. Average Case (One Player Game)

10 10 9 100

max

min

Idea: Uncertain outcomes controlled by chance, not an adversary!

88

Page 89: Search with Other Agents - Sharif

Video of Demo Minimax vs Expectimax (Min)

89

Page 90: Search with Other Agents - Sharif

Video of Demo Minimax vs Expectimax (Exp)

90

Page 91: Search with Other Agents - Sharif

In expectimax search, we have a probabilistic model of how the opponent (or environment) will behave in any state

could be a simple uniform distribution (roll a die)

could be sophisticated and require a great deal of computation

a chance node for any outcome out of our control: opponent or environment

The model might say that adversarial actions are likely!

What Probabilities to Use?

Having a probabilistic belief about another agent’s action does not mean that the agent is flipping any coins!

91

Page 92: Search with Other Agents - Sharif

What Probabilities to Use?

Having a probabilistic belief about another agent’s action does not mean that the agent is flipping any coins!

92

In expectimax search, we have a probabilistic model of how the opponent (or environment) will behave in any state

could be a simple uniform distribution (roll a die)

could be sophisticated and require a great deal of computation

a chance node for any outcome out of our control: opponent or environment

The model might say that adversarial actions are likely!

For now, assume each chance node magically comes along with probabilities that specify the distribution over its outcomes

Page 93: Search with Other Agents - Sharif

Quiz: Informed Probabilities

Let’s say you know that your opponent is actually running a depth 2 minimax, using the result 80% of the time, and moving randomly otherwise

Question: What tree search should you use?

0.1 0.9

Answer: Expectimax!

To figure out EACH chance node’s probabilities, you have to run a simulation of your opponent

This kind of thing gets very slow very quickly

Even worse if you have to simulate your opponent simulating you…

… except for minimax and maximax, which have the nice property that it all collapses into one game tree

This is basically how you would model a human, except for their utility: their utility might be the same as yours (i.e. you try to help them, but they are depth 2 and noisy), or they might have a slightly different utility (like another person navigating in the office)

93

Page 94: Search with Other Agents - Sharif

Modeling Assumptions

94

Page 95: Search with Other Agents - Sharif

The Dangers of Optimism and Pessimism

Dangerous Optimism: Assuming chance when the world is adversarial

Dangerous Pessimism: Assuming the worst case when it’s not likely

95

Page 96: Search with Other Agents - Sharif

Video of Demo World Assumptions

Random Ghost – Expectimax Pacman

96

Page 97: Search with Other Agents - Sharif

Video of Demo World Assumptions

Adversarial Ghost – Minimax Pacman

97

Page 98: Search with Other Agents - Sharif

Video of Demo World Assumptions

Adversarial Ghost – Expectimax Pacman

98

Page 99: Search with Other Agents - Sharif

Video of Demo World Assumptions

Random Ghost – Minimax Pacman

99

Page 100: Search with Other Agents - Sharif

Assumptions vs. Reality

Results from playing 5 games:

                     Adversarial Ghost            Random Ghost
Minimax Pacman       Won 5/5, Avg. Score: 483     Won 5/5, Avg. Score: 493
Expectimax Pacman    Won 1/5, Avg. Score: -303    Won 5/5, Avg. Score: 503

Pacman used depth 4 search with an eval function that avoids trouble
Ghost used depth 2 search with an eval function that seeks Pacman

100

Page 101: Search with Other Agents - Sharif

Maximum Expected Utility

Why should we average utilities? Why not minimax?

Principle of maximum expected utility:

A rational agent should choose the action that maximizes its expected utility, given its knowledge

Questions:

Where do utilities come from?

How do we know such utilities even exist?

How do we know that averaging even makes sense?

What if our behavior (preferences) can’t be described by utilities?

101