A Scalable Machine Learning Approach to Go
Pierre Baldi and Lin Wu
UC Irvine
Contents
• Introduction to Go
• Existing approaches
• Our approach
• Results
• Conclusion & future work
What is Go?
• Black and White alternate placing stones
• Stones with zero liberties are removed
• The player with more territory wins
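The capture rule above can be illustrated with a short flood fill. This is only a sketch: the board encoding (a list of strings using "B", "W", and "." for empty) and the function name `group_and_liberties` are assumptions made for illustration and are not taken from the slides.

```python
def group_and_liberties(board, r, c):
    """Return the connected group containing (r, c) and its set of liberties.

    `board` is a list of equal-length strings: "B", "W", or "." (empty).
    This encoding is a hypothetical choice for the example.
    """
    color = board[r][c]
    n = len(board)
    group, liberties, stack = {(r, c)}, set(), [(r, c)]
    while stack:
        y, x = stack.pop()
        # Visit the four orthogonal neighbours of the current stone.
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if not (0 <= ny < n and 0 <= nx < n):
                continue  # off the board
            if board[ny][nx] == ".":
                liberties.add((ny, nx))  # an empty neighbour is a liberty
            elif board[ny][nx] == color and (ny, nx) not in group:
                group.add((ny, nx))  # same colour: extend the group
                stack.append((ny, nx))
    return group, liberties


# A white stone surrounded on all four sides has no liberties left.
example = [".B.",
           "BWB",
           ".B."]
```

A group whose liberty set is empty would be removed from the board; here `group_and_liberties(example, 1, 1)` reports an empty set for the white stone.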
Why is Go interesting?
• Go is a hard game for computers
– The best Go programs are easily defeated by an average human amateur
• Other board games have expert-level programs
– Chess: Deep Blue (1997) & FRITZ (2002)
– Checkers: Chinook (1994)
– Othello (Reversi): Logistello (2002)
– Backgammon: TD-GAMMON (1992)
Why is Go interesting for AI?
• Poses unique opportunities and challenges for AI and machine learning
– Hard to build a high-quality evaluation function
– Large branching factor: 200-300, compared with 35-40 for chess
Existing approaches
• Hard-coded programs
• Evaluate the next move by playing a large number of random games
• Use machine learning to learn the evaluation function
Existing approaches ── Hard-coded programs
• Hand-tailored pattern libraries
• Hard-coded rules to choose among multiple pattern hits
• Tactical search (or reading)
• E.g. “Many Faces of Go”, “GnuGo”
Existing approaches ── Hard-coded programs
• Pros:
– Good performance
• Cons:
– Intensive manual work
– Pattern library is never complete
– Hard to manage and improve
Existing approaches ── Random games
• Play a huge number of random games from the given position
• Use the game results to evaluate all legal moves
• Choose the legal move with the best evaluation
• E.g. Gobble, Go81
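The random-games idea can be sketched in a few lines. To keep the example short and runnable, tic-tac-toe is swapped in for Go; the board encoding, the function names, and the playout count are all assumptions for illustration and do not describe how Gobble or Go81 are implemented.

```python
import random

WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
             (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]


def winner(board):
    """Return "X" or "O" if a line is complete, else None."""
    for a, b, c in WIN_LINES:
        if board[a] != "." and board[a] == board[b] == board[c]:
            return board[a]
    return None


def legal_moves(board):
    return [i for i, s in enumerate(board) if s == "."]


def play(board, move, player):
    return board[:move] + player + board[move + 1:]


def random_playout(board, to_move, rng):
    """Play uniformly random moves until the game ends; return the winner."""
    while winner(board) is None and legal_moves(board):
        board = play(board, rng.choice(legal_moves(board)), to_move)
        to_move = "O" if to_move == "X" else "X"
    return winner(board)


def evaluate_moves(board, player, n_games=200, seed=0):
    """Score each legal move by its win rate over random playouts."""
    rng = random.Random(seed)
    opponent = "O" if player == "X" else "X"
    scores = {}
    for move in legal_moves(board):
        after = play(board, move, player)
        wins = sum(random_playout(after, opponent, rng) == player
                   for _ in range(n_games))
        scores[move] = wins / n_games
    return scores


def best_move(board, player, **kwargs):
    """Choose the legal move with the best playout evaluation."""
    scores = evaluate_moves(board, player, **kwargs)
    return max(scores, key=scores.get)
```

For the position `"XX.OO...."` with X to move, `best_move` picks square 2, the immediate win, because every playout from that position is already decided. The cost grows with the number of legal moves times the number of playouts, which is one way to see why the approach was limited to small boards.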
Existing approaches ── Random games
• Pros
– Easy to implement
– Reasonable performance
• Cons
– Small boards only; does not scale to the standard 19x19 board
Existing approaches ── Machine learning
• Schraudolph et al., 1994
– TD(0)
– Neural network
• Graepel et al., 2001
– Condensed graph by common fate property
– SVM
• Stern, Graepel, and MacKay, 2005
– Conditional Markov random field
Existing approaches ── Machine learning
• Pros:
– Learns automatically
• Cons:
– Poor performance
Our approach
• Use scalable algorithms to learn high-quality evaluation functions automatically
• Imitate the human evaluation process
Our approach ── Human evaluation process
• Three key components
– The understanding of patterns
– The ability to combine patterns
– The ability to relate strategic rewards to tactical ones
Our approach ── System components
• 3x3 pattern library
– Learn tactical patterns automatically
• A structure-rich Recursive Neural Network
– Propagate interactions between patterns
– Learn the correlation between strategic rewards (targets) and tactical rewards (inputs)
Our approach ── RNN architecture
• Six planes
– One input plane
– One output plane
– Four hidden planes
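Our approach ── Update sequence
$$
\begin{aligned}
O_{i,j} &= N_O\left(I_{i,j},\, H^{NW}_{i,j},\, H^{NE}_{i,j},\, H^{SW}_{i,j},\, H^{SE}_{i,j}\right) \\
H^{NE}_{i,j} &= N_{NE}\left(I_{i,j},\, H^{NE}_{i \pm 1,\, j},\, H^{NE}_{i,\, j \pm 1}\right) \\
H^{NW}_{i,j} &= N_{NW}\left(I_{i,j},\, H^{NW}_{i \pm 1,\, j},\, H^{NW}_{i,\, j \pm 1}\right) \\
H^{SW}_{i,j} &= N_{SW}\left(I_{i,j},\, H^{SW}_{i \pm 1,\, j},\, H^{SW}_{i,\, j \pm 1}\right) \\
H^{SE}_{i,j} &= N_{SE}\left(I_{i,j},\, H^{SE}_{i \pm 1,\, j},\, H^{SE}_{i,\, j \pm 1}\right)
\end{aligned}
$$
where $N_{NE}$, $N_{NW}$, $N_{SW}$, and $N_{SE}$ are the same function by weight sharing. In each hidden plane the two $\pm 1$ offsets take a fixed sign determined by that plane's direction of propagation.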
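The update sequence can be sketched as a forward pass over four hidden planes, each propagating away from one corner of the board. Everything concrete here is an assumption made for illustration: scalar inputs and hidden states, a `tanh` unit for the shared hidden function, a sigmoid output, the weight values, and in particular the mapping from plane names to offset signs.

```python
import math

# Parent offsets (di, dj) for each hidden plane: a cell (i, j) reads its
# plane's state at (i + di, j) and (i, j + dj). Which sign pair belongs to
# which compass name is an assumption of this sketch.
PLANES = {"NE": (-1, -1), "NW": (+1, -1), "SW": (+1, +1), "SE": (-1, +1)}


def hidden_plane(inputs, di, dj, w):
    """Fill one hidden plane so that parents are always computed first."""
    n = len(inputs)
    h = [[0.0] * n for _ in range(n)]
    rows = list(range(n)) if di < 0 else list(range(n - 1, -1, -1))
    cols = list(range(n)) if dj < 0 else list(range(n - 1, -1, -1))
    for i in rows:
        for j in cols:
            # Off-board parents contribute a zero state.
            a = h[i + di][j] if 0 <= i + di < n else 0.0
            b = h[i][j + dj] if 0 <= j + dj < n else 0.0
            h[i][j] = math.tanh(w[0] * inputs[i][j] + w[1] * a + w[2] * b + w[3])
    return h


def forward(inputs, w_hidden, w_out):
    """Combine the input and the four hidden planes into the output plane."""
    # The same w_hidden is used for all four planes (weight sharing).
    planes = [hidden_plane(inputs, di, dj, w_hidden) for di, dj in PLANES.values()]
    n = len(inputs)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            z = w_out[0] * inputs[i][j] + w_out[5]
            z += sum(w_out[k + 1] * planes[k][i][j] for k in range(4))
            out[i][j] = 1.0 / (1.0 + math.exp(-z))
    return out
```

Because each plane sweeps the whole board from one corner, a change in the input at one corner can reach the output at the opposite corner, which is how interactions between distant patterns propagate.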
Our approach ── Provide relevant inputs
• For intersections
– Intersection type: black, white, or empty
– Influence: influence from the same & opposite color
– Pattern stability: a statistical value calculated from 3x3 patterns
• For groups
– Number of eyes
– Number of 1st, 2nd, 3rd, and 4th order liberties
– Number of liberties of the 1st and 2nd weakest opponents
Our approach ── Pattern stability (I)
• The 9x9 board is split into 10 unique locations for 3x3 patterns, with mirror and rotation symmetries considered
• Stability is measured for each intersection of each pattern within each unique location
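One way to arrive at the count of 10 is to enumerate the centers of all 3x3 windows that fit entirely on a 9x9 board and merge those related by rotation or mirroring. The sketch below does that; the assumption that windows lie fully on the board, and the helper names, are mine rather than the slides'.

```python
def canonical(r, c, n=9):
    """Smallest image of (r, c) under the 8 rotations/mirrors of an n x n board."""
    m = n - 1
    images = []
    for _ in range(4):
        r, c = c, m - r              # rotate by 90 degrees
        images.append((r, c))
        images.append((r, m - c))    # mirror image of this rotation
    return min(images)


def unique_locations(n=9):
    """Distinct 3x3 window centers up to symmetry (windows fully on the board)."""
    centers = [(r, c) for r in range(1, n - 1) for c in range(1, n - 1)]
    return sorted({canonical(r, c, n) for r, c in centers})
```

Under these assumptions `len(unique_locations())` is 10, matching the number quoted on the slide: the 49 possible centers fall into 10 symmetry classes.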
Our approach ── Pattern stability (II)
• Ten unique pattern locations
Our approach ── Pattern stability (III)
The pattern stability $S_i(p)$ for grid point $i$ of pattern $p$ is
$$
S_i(p) = \frac{NB_i(p) - NW_i(p)}{NB_i(p) + NW_i(p) + C}
$$
where $NB_i(p)$ (or $NW_i(p)$) is the number of times that pattern $p$ ends with a black (or white) stone at intersection $i$ at the end of the games in the training data. $C$ is a regularization constant set at 1.
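As a small worked example, the stability can be computed from end-of-game counts. This sketch assumes the stability is the difference between the black and white counts divided by their sum plus the regularization constant C = 1; the exact operators are an assumption here, as are the function names and the "B"/"W"/"." outcome encoding.

```python
from collections import Counter


def pattern_stability(n_black, n_white, c=1.0):
    """Stability of one grid point of one pattern (assumed form of the formula)."""
    return (n_black - n_white) / (n_black + n_white + c)


def stability_from_outcomes(outcomes, c=1.0):
    """Count final colors ("B", "W", or "." for empty) and compute the stability."""
    counts = Counter(outcomes)
    return pattern_stability(counts["B"], counts["W"], c)
```

With 8 black endings and 2 white endings the value is 6/11, about 0.55. A point that was never observed gets 0, since the constant C keeps the denominator away from zero.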
Our approach ── Pattern stability results (I)
Our approach ── Pattern stability results (II)
Results ── Validation error
Results ── Results on move predictions
Results ── Matched move (I)
Results ── Matched move (II)
Conclusion & Future work