Post on 20-Dec-2015
Learning Shape in Computer Go
David Silver
A brief introduction to Go
Black and white take turns to place down stones
Once played, a stone cannot move
The aim is to surround the most territory
Usually played on 19x19 board
Capturing
The lines radiating from a stone are called liberties
If a connected group of stones has all of its liberties removed then it is captured
Captured stones are removed from the board
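The capture rule above can be sketched in a few lines. This is a minimal illustration, not code from the talk: the board representation (a list of lists of ".", "B", "W") and the function names are assumptions made for the example.

```python
# Hypothetical board encoding, chosen for this sketch only.
EMPTY, BLACK, WHITE = ".", "B", "W"

def neighbours(point, size):
    """Yield the on-board points joined to `point` by a line."""
    r, c = point
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        q = (r + dr, c + dc)
        if 0 <= q[0] < size and 0 <= q[1] < size:
            yield q

def group_and_liberties(board, start):
    """Flood-fill the connected group containing the stone at `start`.

    Returns (stones, liberties) as two sets of points.
    """
    colour = board[start[0]][start[1]]
    size = len(board)
    stones, liberties, frontier = {start}, set(), [start]
    while frontier:
        p = frontier.pop()
        for q in neighbours(p, size):
            cell = board[q[0]][q[1]]
            if cell == EMPTY:
                liberties.add(q)
            elif cell == colour and q not in stones:
                stones.add(q)
                frontier.append(q)
    return stones, liberties

def remove_if_captured(board, start):
    """Remove the group at `start` if it has no liberties; return stones removed."""
    stones, liberties = group_and_liberties(board, start)
    if liberties:
        return 0
    for r, c in stones:
        board[r][c] = EMPTY
    return len(stones)
```

In Atari Go, a non-zero return value from `remove_if_captured` after a move would end the game in favour of the capturing player.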
Atari Go (Capture Go)
Atari Go is a simplified version of Go
The winner is the first player to capture
Often used to teach Go to beginners
Circumvents several tricky issues:
The game only finishing by agreement
Ko (local repetitions of position)
Seki (local stalemates)
Computer Go
Computer Go programs are very weak
Search space is too large for brute force techniques
No good evaluation functions
Human intuition (shape knowledge) has proven difficult to capture.
Why not learn shape knowledge?
And use it to learn an evaluation function?
Local shape
Local shape describes a pattern of stones
It is used extensively by current Computer Go programs (pattern databases)
Inputting local shape by hand takes many years of hard labour
We would like to:
Learn local shapes by trial and error
Assign a value for the goodness of a shape
Just how good is a particular shape?
Enumerating local shapes
In these experiments all possible local shapes are used as features
Up to a small maximum size (e.g. 2x2)
A local shape is defined to be:
A particular configuration of stones
At a canonical position on the board
Local shapes are used as binary features by the learning algorithm
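As a rough illustration of what enumerating every local shape involves, the sketch below treats a feature as a (top, left, configuration) triple and counts how many exist for a given board and window size. The encoding (0 empty, 1 black, 2 white), the function names, and the decision to count the all-empty window as a shape are assumptions made for this example.

```python
from itertools import product

# Hypothetical cell encoding for this sketch.
EMPTY, BLACK, WHITE = 0, 1, 2

def enumerate_shapes(width, height):
    """All 3**(width*height) stone configurations of a width x height window."""
    return list(product((EMPTY, BLACK, WHITE), repeat=width * height))

def window(board, top, left, width, height):
    """Read the window's cells row by row as a flat tuple."""
    return tuple(board[top + r][left + c]
                 for r in range(height) for c in range(width))

def active_features(board, width, height):
    """Binary features that are 'on': one (top, left, configuration) per window position."""
    size = len(board)
    return {
        (top, left, window(board, top, left, width, height))
        for top in range(size - height + 1)
        for left in range(size - width + 1)
    }

def feature_count(size, width, height):
    """Total number of location-specific features for one window size."""
    positions = (size - height + 1) * (size - width + 1)
    return positions * 3 ** (width * height)
```

Under these assumptions a 5x5 board has 16 positions for a 2x2 window and 81 configurations per position, so 1296 features in total, of which exactly 16 are active in any one position.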
Invariances
Each canonical local shape can be:
Rotated
Reflected
Inverted
So each position may cause updates to multiple instances of each feature.
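The three transformations can be sketched as follows. The slides do not define "inverted"; this example assumes it means swapping black and white stones. The shape representation (a tuple of rows) and all names are likewise assumptions.

```python
# Hypothetical cell encoding for this sketch.
EMPTY, BLACK, WHITE = 0, 1, 2

def rotate(shape):
    """Rotate a shape (tuple of rows) by 90 degrees clockwise."""
    return tuple(zip(*shape[::-1]))

def reflect(shape):
    """Mirror a shape left to right."""
    return tuple(row[::-1] for row in shape)

def invert(shape):
    """Swap the colours of all stones (assumed meaning of 'inverted')."""
    swap = {EMPTY: EMPTY, BLACK: WHITE, WHITE: BLACK}
    return tuple(tuple(swap[s] for s in row) for row in shape)

def variants(shape):
    """All distinct images of a shape under rotation, reflection and inversion."""
    seen = set()
    current = tuple(tuple(row) for row in shape)
    for _ in range(4):
        current = rotate(current)
        for s in (current, reflect(current)):
            seen.add(s)
            seen.add(invert(s))
    return seen
```

A single black stone in the corner of a 2x2 window has 8 variants here (4 corners, 2 colours), which is one way to see why a single position can touch several instances of the same canonical feature. How weights are shared across colour-inverted variants is not stated in the slides.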
Algorithm
Value function is learnt for afterstates
Move selection is done by 1-ply greedy search (ε = 0) over value function
Active local shapes are identified
Linear combination is taken
Sigmoid squashing function is applied
Backups are performed using TD(0)
Reward of +1 for winning, 0 for losing
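The steps above can be sketched as a small value-function class. This is an illustrative reading of the slide, not the author's implementation: the step size, the inclusion of the sigmoid gradient in the update, and all names are assumptions. The TD(0) target is assumed to be the value of the next afterstate, or the final reward (1 or 0) at the end of the game.

```python
import math
from collections import defaultdict

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class ShapeValueFunction:
    """Sigmoid of a linear combination of binary local-shape features."""

    def __init__(self, step_size=0.1):  # step size is an assumed value
        self.weights = defaultdict(float)  # one weight per feature, initially 0
        self.step_size = step_size

    def value(self, features):
        """Estimated probability of winning from an afterstate."""
        return sigmoid(sum(self.weights[f] for f in features))

    def td0_update(self, features, target):
        """Move value(features) towards `target`; return the TD error.

        `target` is value(next afterstate) mid-game, or the reward at the end.
        """
        v = self.value(features)
        delta = target - v
        grad = v * (1.0 - v)  # derivative of the sigmoid (assumed to be used)
        for f in features:
            self.weights[f] += self.step_size * delta * grad
        return delta

def greedy_move(vf, afterstates):
    """1-ply greedy selection: `afterstates` maps move -> active features."""
    return max(afterstates, key=lambda move: vf.value(afterstates[move]))
```

With all weights at zero every afterstate is valued at 0.5, so early play is effectively arbitrary until wins and losses start to shift the weights.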
Value function approximation
Training procedure
The challenge: Learn to beat the average liberty player
So the learning algorithm was trained specifically against the average liberty player
The problem: learning is very slow, since the agent almost never wins any games by chance.
The solution: mix in a proportion of random moves until the agent wins 50% of all games.
Reduce the proportion of randomness as the agent learns to win more games.
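One way to realise this schedule is a simple controller that nudges the proportion of random moves so the agent's recent win rate stays near 50%. The slides do not give the actual rule, nor say whose moves are randomised, so the window length, step size, starting proportion and class name below are all assumptions.

```python
from collections import deque

class RandomMoveSchedule:
    """Adjust the proportion of random moves to hold the win rate near a target."""

    def __init__(self, proportion=0.5, step=0.01, window=100, target=0.5):
        self.proportion = proportion          # current fraction of random moves
        self.step = step                      # assumed adjustment per game
        self.target = target                  # desired win rate (50% on the slide)
        self.results = deque(maxlen=window)   # recent outcomes, 1 = agent won

    def record(self, agent_won):
        """Record one game result and return the updated proportion."""
        self.results.append(1 if agent_won else 0)
        if len(self.results) < self.results.maxlen:
            return self.proportion  # wait until the window is full
        win_rate = sum(self.results) / len(self.results)
        if win_rate > self.target:
            # Winning more than the target: reduce the randomness.
            self.proportion = max(0.0, self.proportion - self.step)
        elif win_rate < self.target:
            # Winning less than the target: mix in more random moves.
            self.proportion = min(1.0, self.proportion + self.step)
        return self.proportion
```

As the agent improves, the win rate rises above the target and the proportion falls towards zero, at which point the agent is playing the unmodified opponent.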
Results for different shape sizes
[Figure: percentage wins against training games (thousands), one curve per shape size: 1x1, 2x1, 2x2, 3x2, 3x3]
[Figure: percentage random moves against training games (thousands), one curve per shape size: 1x1, 2x1, 2x2, 3x2, 3x3]
Results for different board sizes
[Figure: percentage wins against training games (thousands), one curve per board size: 5x5, 6x6, 7x7]
Shapes learned (1x1)
Shapes learned (2x2)
Shapes learned (3x3)
Conclusions
Local shape information is sufficient to beat a naïve rule-based player
Significant shapes can be learned
The ‘goodness’ of shapes can be learned
A linear threshold unit can provide a reasonable evaluation function
Enumerating all local shapes reaches a natural limit at 3x3
Training methodology is crucial
Future work
Learn shapes selectively rather than enumerating all possible shapes
Learn shapes to answer specific questions:
Can black B4 be captured?
Can white connect A2 to D5?
Learn non-local shape:
Use connectivity relationships
Build hierarchies of shapes