Learning Opponent-type Probabilities for PrOM search
Jeroen Donkers
IKAT, Universiteit Maastricht
6th Computer Olympiad, August 20, 2001
Contents
• OM search and PrOM search
• Learning for PrOM search
• Off-line Learning
• On-line Learning
• Conclusions & Future research
OM search
– MAX player uses evaluation function V0
– Opponent uses a different evaluation function (Vop)
– At MIN nodes: predict which move the opponent will select (using standard search and Vop)
– At MAX nodes: pick the move that maximizes the search value (based on V0)
– At leaf nodes: use V0
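As an illustration, here is a minimal sketch of OM search along these lines; the game-tree interface (node.children(), node.is_leaf()) and the evaluation functions v0 and v_op are hypothetical placeholders, not the author's implementation:

    # Plain minimax with evaluation function v (values from MAX's viewpoint).
    def minimax(node, depth, v, maximizing):
        if depth == 0 or node.is_leaf():
            return v(node)
        values = [minimax(c, depth - 1, v, not maximizing)
                  for c in node.children()]
        return max(values) if maximizing else min(values)

    def om_search(node, depth, v0, v_op, max_to_move):
        if depth == 0 or node.is_leaf():
            return v0(node)                     # leaf nodes: use V0
        if max_to_move:
            # MAX nodes: pick the move that maximizes the search value (V0).
            return max(om_search(c, depth - 1, v0, v_op, False)
                       for c in node.children())
        # MIN nodes: predict the opponent's move with standard search on Vop
        # (the opponent minimizes Vop as seen from MAX's viewpoint) ...
        predicted = min(node.children(),
                        key=lambda c: minimax(c, depth - 1, v_op, True))
        # ... and continue the search below the predicted move.
        return om_search(predicted, depth - 1, v0, v_op, True)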
PrOM search
• Extended Opponent Model:
– a set of opponent types (e.g. evaluation functions)
– a probability distribution over this set
• Interpretation: at every move, the opponent uses a random device to pick one of the opponent types, and plays using the selected type.
PrOM search algorithm
• At MIN nodes: determine for every opponent type which move it would select.
• Compute the MAX player's value for each of these moves.
• Use the opponent-type probabilities to compute the expected value of the MIN node.
• At MAX nodes: select the maximum child.
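A matching sketch of PrOM search, reusing the hypothetical interface and the minimax helper from the OM-search sketch above; opponent_types is a list of evaluation functions and probs the probability distribution over them:

    def prom_search(node, depth, v0, opponent_types, probs, max_to_move=True):
        if depth == 0 or node.is_leaf():
            return v0(node)
        if max_to_move:
            # MAX nodes: select the maximum child.
            return max(prom_search(c, depth - 1, v0, opponent_types, probs, False)
                       for c in node.children())
        # MIN nodes: determine for every opponent type which move it would
        # select, compute the MAX player's value of that move, and weight
        # it by the opponent-type probability.
        expected = 0.0
        for v_op, p in zip(opponent_types, probs):
            predicted = min(node.children(),
                            key=lambda c: minimax(c, depth - 1, v_op, True))
            expected += p * prom_search(predicted, depth - 1, v0,
                                        opponent_types, probs, True)
        return expected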
Learning in PrOM search
• How do we assess the probabilities of the opponent types?
– Off-line: use games previously played by the opponent to estimate the probabilities (much time and, possibly, much data available).
– On-line: use the observed moves during a game to adjust the probabilities (little time and few observations; prior probabilities are needed).
Off-Line Learning
• Ultimate learning goal: find P**(opp) for a given opponent and given opponent types such that PrOM search plays best against that opponent.
• Assumption: PrOM search plays best if P** = P*, where P*(opp) is the mixed strategy that best predicts the opponent's moves.
Off-Line Learning
• How to obtain P*(opp)?
• Input: a set of positions and the moves that the given opponent and all the given opponent types would select.
• “Algorithm”: P*(opp_i) = N_i / N, where N_i counts the positions on which type i agrees with the opponent and N the positions used.
• But: leave out all ambiguous positions (e.g. when more than one opponent type agrees with the opponent)!
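A sketch of this estimator, under the added assumption that a position is also left out when no opponent type agrees with the opponent; observations is a hypothetical list of (opponent's move, list of the types' moves) pairs:

    def estimate_probabilities(observations, n_types):
        counts = [0] * n_types                # N_i per opponent type
        total = 0                             # N: positions used
        for opp_move, type_moves in observations:
            matching = [i for i, m in enumerate(type_moves) if m == opp_move]
            if len(matching) != 1:
                continue                      # ambiguous or unexplained: leave out
            counts[matching[0]] += 1
            total += 1
        if total == 0:
            return [1.0 / n_types] * n_types  # no evidence: fall back to uniform
        return [c / total for c in counts]    # P*(opp_i) = N_i / N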
Off-Line Learning
• Case 1: The opponent is using a mixed strategy P#(opp) of the given opponent types.
– Effective learning is possible (P*(opp) → P#(opp)).
– More difficult if the opponent types are not independent.
[Figure: learned probabilities when not leaving out ambiguous events; 5 opponent types, P = (a,b,b,b,b), 20 moves, 100–100,000 runs, 100 samples.]
[Figure: learned probabilities when leaving out ambiguous events; 5 opponent types, P = (a,b,b,b,b), 20 moves, 10–100,000 runs, 100 samples.]
[Figure: varying the number of opponent types (2–20); P = (a,b,b,b,b), 20 moves, 100,000 runs, 100 samples.]
Off-Line Learning
• Case 2: The opponent is using a different strategy.
– Opponent types behave randomly but dependently (the distribution of type i depends on type i−1).
– The real opponent selects a fixed move.
[Figures: the opponent's selection (0–18) against the learned probabilities of opp0–opp4 (stacked, 0–100%) and against the learning error (−log(error), 0–6) for sample sizes 10^1–10^5.]
Fast On-Line Learning
• At the principal MIN node, only the best move for every opponent type is needed.
• Increase the probability of an opponent type slightly if the observed move is the same as the move selected by this opponent type only. Normalize all probabilities.
• Drift to one opponent type is possible.
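A sketch of this update; the step size delta is a hypothetical tuning parameter:

    def fast_update(probs, type_moves, observed_move, delta=0.05):
        # Bump the probability of an opponent type only if the observed
        # move matches the move selected by that type alone.
        matching = [i for i, m in enumerate(type_moves) if m == observed_move]
        if len(matching) == 1:
            probs[matching[0]] += delta
        # Normalize all probabilities.
        s = sum(probs)
        return [p / s for p in probs]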
Slower On-Line Learning: Naive Bayesian (Duda & Hart '73)
• Compute the value of every move at the principal MIN node for every opponent type.
• Transform these values into conditional probabilities P(move | opp).
• Compute P(opp | move_obs) using P*(opp) (Bayes' rule).
• Take P*(opp) ← a · P*(opp) + (1 − a) · P(opp | move_obs).
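A sketch of this update; the softmax used to turn move values into P(move | opp) and the constants a and temp are assumptions for illustration, not the exact choices on the slide:

    import math

    def bayes_update(priors, move_values, observed_move, a=0.9, temp=1.0):
        # move_values[i] maps each move at the principal MIN node to its
        # search value (from MAX's viewpoint) under opponent type i.
        posterior = []
        for prior, values in zip(priors, move_values):
            # Softmax assumption: the opponent (a minimizer from MAX's
            # viewpoint) prefers moves with lower values.
            exps = {m: math.exp(-v / temp) for m, v in values.items()}
            p_move = exps[observed_move] / sum(exps.values())
            posterior.append(prior * p_move)      # Bayes' rule, unnormalized
        z = sum(posterior)
        posterior = [p / z for p in posterior]    # P(opp | move_obs)
        # Mix: P*(opp) <- a * P*(opp) + (1 - a) * P(opp | move_obs)
        return [a * p0 + (1 - a) * p1 for p0, p1 in zip(priors, posterior)]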
Naïve Bayesian Learning
• In the end, drifting to 1–0 probabilities will almost always occur.
• Parameter a is very important for the actual performance:
– amount of change in the probabilities
– convergence
– drifting speed
• It should be tuned in a real setting.
Conclusions
• Effective off-line learning of the probabilities is possible when ambiguous events are disregarded.
• Off-line learning also works if the opponent does not use a mixed strategy of known opponent types.
• On-line learning must be tuned precisely to a given situation.
Future Research
• PrOM search and learning in real game playing:
– Zanzibar Bao (8×4 mancala)
– LOA (some experiments with OM search done)
– Chess endgames