AI in Games: Achievements and Challenges
Yuandong Tian, Facebook AI Research


Page 1: AI in Games: Achievements and Challenges

AI in Games: Achievements and Challenges

Yuandong Tian, Facebook AI Research

Page 2:

Game as a Vehicle of AI

• Less safety and ethical concerns
• Faster than real time
• Infinite supply of fully labeled data
• Controllable and replicable
• Low cost per sample
• Complicated dynamics with simple rules

Page 3:

Game as a Vehicle of AI

• Abstract game to real-world
• Algorithms are slow and data-inefficient; require a lot of resources
• Hard to benchmark the progress

Page 4:

Game as a Vehicle of AI

Better Games

• Abstract game to real-world
• Algorithms are slow and data-inefficient; require a lot of resources
• Hard to benchmark the progress

Page 5:

Game Spectrum

[Timeline: 1970s, 1980s ("good old days"), 1990s, 2000s, 2010s]

Page 6:

Game Spectrum

Chess, Go, Poker

Page 7:

Game Spectrum

Pong (1972), Breakout (1978)

Page 8:

Game Spectrum

Super Mario Bros. (1985), Contra (1987)

Page 9:

Game Spectrum

Doom (1993), KOF '94 (1994), StarCraft (1998)

Page 10:

Game Spectrum

Counter-Strike (2000), The Sims 3 (2009)

Page 11:

Game Spectrum

StarCraft II (2010), GTA V (2013), Final Fantasy XV (2016)

Page 12:

Game as a Vehicle of AI

Better Algorithm/System. Better Environment.

• Abstract game to real-world
• Algorithms are slow and data-inefficient; require a lot of resources
• Hard to benchmark the progress

Page 13:

Our Work

• DarkForest Go engine (Yuandong Tian, Yan Zhu, ICLR 2016)
• Doom AI (Yuxin Wu, Yuandong Tian, ICLR 2017)
• ELF: Extensive, Lightweight and Flexible framework (Yuandong Tian et al., arXiv)

Better Algorithm/System. Better Environment.

Page 14:

How Game AI Works

Even with a super-supercomputer, it is not possible to search the entire space.

Page 15:

How Game AI Works

Extensive search + evaluate the consequence.

Even with a super-supercomputer, it is not possible to search the entire space.

[Figure: a search tree expanded from the current game situation; leaves are labeled "Black wins" or "White wins". Position from Lufei Ruan vs. Yifan Hou (2010).]

Page 16:

How Game AI Works

Extensive search: how many actions do you have per step?

• Checkers: a few possible moves
• Poker: a few possible moves. Counterfactual Regret Minimization [Libratus, DeepStack]
• Chess: 30-40 possible moves. Alpha-beta pruning + iterative deepening [major chess engines]
• Go: 100-200 possible moves. Monte Carlo Tree Search + UCB exploration [major Go engines]
• StarCraft: 50^100 possible moves. ???
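The chess-engine recipe named above can be made concrete. Below is a minimal sketch of alpha-beta pruning wrapped in iterative deepening over a toy game tree; the tree, the leaf evaluations, and all function names are illustrative, not taken from any real engine.

```python
# Alpha-beta pruning: minimax search that skips branches which cannot
# change the final decision. Iterative deepening searches depth 1, 2, ...;
# in a real engine each pass seeds move ordering for the next, deeper pass.

def alphabeta(node, depth, alpha, beta, maximizing, children, value):
    kids = children.get(node, [])
    if depth == 0 or not kids:
        return value[node]                 # static evaluation at the horizon
    if maximizing:
        best = float("-inf")
        for child in kids:
            best = max(best, alphabeta(child, depth - 1, alpha, beta, False,
                                       children, value))
            alpha = max(alpha, best)
            if alpha >= beta:              # beta cutoff: opponent avoids this
                break
        return best
    else:
        best = float("inf")
        for child in kids:
            best = min(best, alphabeta(child, depth - 1, alpha, beta, True,
                                       children, value))
            beta = min(beta, best)
            if alpha >= beta:              # alpha cutoff
                break
        return best

def iterative_deepening(root, max_depth, children, value):
    result = None
    for d in range(1, max_depth + 1):      # deepen one ply at a time
        result = alphabeta(root, d, float("-inf"), float("inf"), True,
                           children, value)
    return result

# Toy tree: the root is the maximizing player; leaves hold evaluations.
children = {"r": ["a", "b"], "a": ["a1", "a2"], "b": ["b1", "b2"]}
value = {"a1": 3, "a2": 5, "b1": 2, "b2": 9, "a": 3, "b": 2, "r": 0}
print(iterative_deepening("r", 2, children, value))  # prints 3
```

Note how the `b2` leaf is never evaluated at depth 2: once `b1` gives the minimizer 2, the branch can no longer beat the 3 already guaranteed at `a`.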

Page 17:

How Game AI Works

Evaluate the consequence: how complicated is the game situation? How deep is the game?
(Chess, Go, Poker, StarCraft)

• Linear function for situation evaluation [Stockfish]
• Deep value network [AlphaGo, DeepStack]
• Random game playout with simple rules [Zen, CrazyStone, DarkForest]
• Endgame database
• Rule-based

Page 18:

How to Model the Policy/Value Function?

Traditional approach:
• Many manual steps
• Conflicting parameters, not scalable
• Needs strong domain knowledge

Deep learning:
• End-to-end training
• Lots of data, less tuning
• Minimal domain knowledge
• Amazing performance

The function is non-smooth and high-dimensional, and sensitive to the situation: one stone changed in Go leads to a different game.

Page 19:

Case Study: AlphaGo

• Computation: train with many GPUs; run inference with TPUs.
• Policy network: trained supervised from human replays; self-play network trained with RL.
• High-quality playout/rollout policy: 2 microseconds per move, 24.2% (~30%) accuracy; thousands of times faster than DCNN prediction.
• Value network: predicts the game outcome for the current situation; trained on 30M self-play games.

“Mastering the game of Go with deep neural networks and tree search”, Silver et al, Nature 2016

Page 20:

AlphaGo
• Policy network SL (trained with human games)

“Mastering the game of Go with deep neural networks and tree search”, Silver et al, Nature 2016

Page 21:

AlphaGo
• Fast rollout (2 microseconds per move), ~30% accuracy

“Mastering the game of Go with deep neural networks and tree search”, Silver et al, Nature 2016

Page 22:

Monte Carlo Tree Search

[Figure: the three stages (a)-(c) of MCTS over a tree whose nodes store win/visit counts (e.g. 22/40 at the root): (a) selection by the tree policy, (b) expansion and simulation by the default policy, (c) backup of the outcome along the visited path (root becomes 23/41).]

Tree policy / default policy: aggregate win rates, and search towards the good nodes (PUCT).
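The selection step can be sketched in a few lines: each child keeps wins/visits counts like those in the figure, and the tree policy scores children by win rate plus an exploration bonus. A minimal sketch of both UCB1 and the PUCT variant (which, as in AlphaGo, adds a policy prior); the counts mirror the figure, but the priors and constants are made up for illustration.

```python
import math

def ucb1(wins, visits, parent_visits, c=1.4):
    """Classic UCB1: win rate plus a visit-count exploration bonus."""
    if visits == 0:
        return float("inf")          # always try unvisited children first
    return wins / visits + c * math.sqrt(math.log(parent_visits) / visits)

def puct(wins, visits, prior, parent_visits, c=1.0):
    """PUCT variant: a policy network's prior probability steers exploration."""
    q = wins / visits if visits > 0 else 0.0
    return q + c * prior * math.sqrt(parent_visits) / (1 + visits)

# Children as (wins, visits, prior) under a parent with 40 visits.
children = {"a": (10, 18, 0.5), "b": (9, 10, 0.3), "c": (1, 8, 0.2)}
best_puct = max(children, key=lambda m: puct(*children[m], parent_visits=40))
best_ucb = max(children, key=lambda m: ucb1(children[m][0], children[m][1], 40))
print(best_puct, best_ucb)
```

Child "b" wins under both rules here: it has the best win rate (9/10) and is not yet heavily visited, so exploitation and exploration agree.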

Page 23:

AlphaGo

• Value network (trained on 30M self-play games)
• How is the data collected?

Game start -> (sampling the SL network: more diverse moves) -> uniform sampling of one move -> current state -> (sampling the RL network: higher win rate) -> game terminates

“Mastering the game of Go with deep neural networks and tree search”, Silver et al, Nature 2016
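The collection recipe on this slide can be sketched as a small function: play the opening with the (diverse) SL policy, make one uniformly random move, record that state, finish the game with the (strong) RL policy, and use the final outcome as the value target. One pair per game keeps positions decorrelated. All policy/game functions below are toy stand-ins, not the real AlphaGo components.

```python
import random

def collect_value_sample(sl_policy, rl_policy, uniform_policy, step, done, z, u):
    """Return one (state, outcome) training pair for the value network."""
    s = 0
    for _ in range(u - 1):              # moves 1..U-1: sample the SL network
        s = step(s, sl_policy(s))
    s = step(s, uniform_policy(s))      # move U: one uniformly random move
    sample = s                          # the value net trains on this state
    while not done(s):                  # moves U+1..end: sample the RL network
        s = step(s, rl_policy(s))
    return sample, z(s)                 # target = final game outcome

# Toy stand-ins: the state just counts moves; the game ends after 10 moves
# with outcome +1.
sample, target = collect_value_sample(
    sl_policy=lambda s: 0,
    rl_policy=lambda s: 0,
    uniform_policy=lambda s: random.choice([0, 1]),
    step=lambda s, a: s + 1,
    done=lambda s: s >= 10,
    z=lambda s: +1,
    u=random.randint(1, 9))
print(sample, target)  # sampled position index and its outcome label
```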

Page 24:

AlphaGo
• Value network (trained on 30M self-play games)

“Mastering the game of Go with deep neural networks and tree search”, Silver et al, Nature 2016

Page 25:

AlphaGo

“Mastering the game of Go with deep neural networks and tree search”, Silver et al, Nature 2016

Page 26:

Our Work

Page 27:

Our Computer Go Player: DarkForest

• DCNN as a tree policy
• Predict the next k moves (rather than only the next move)
• Trained on the 170k-game KGS dataset / 80k GoGoD games, 57.1% accuracy
• KGS 3d without search (0.1 s per move)
• Released 3 months before AlphaGo; used <1% of AlphaGo's GPUs (per Aja Huang)

Yuandong Tian and Yan Zhu, ICLR 2016


Page 28:

Our Computer Go Player: DarkForest

Features used for the DCNN:
• Our/enemy liberties
• Ko location
• Our/enemy stones / empty places
• Our/enemy stone history
• Opponent rank

Page 29:

Pure DCNN

Win rate between DCNN and open-source engines.

• darkforest: only uses the top-1 prediction; trained on KGS
• darkfores1: uses top-3 predictions; trained on GoGoD
• darkfores2: darkfores1 with fine-tuning

Page 30:

Monte Carlo Tree Search

[Figure repeated from Page 22: MCTS selection, expansion, and backup over per-node win/visit counts, with tree policy and default policy.]

Aggregate win rates, and search towards the good nodes.

Page 31:

DCNN + MCTS

Win rate between DCNN+MCTS and open-source engines: 94.2%.

darkfmcts3: top-3/5 moves, 75k rollouts, ~12 sec/move, KGS 5d

Page 32:

Our Computer Go Player: DarkForest

• DCNN + MCTS: use the top 3/5 moves from the DCNN, 75k rollouts.
• Stable KGS 5d. Open source.
• 3rd place in the KGS January Tournament.
• 2nd place in the 9th UEC Computer Go Competition (not this time :))

DarkForest versus Koichi Kobayashi (9p)

https://github.com/facebookresearch/darkforestGo

Page 33:

Win-rate analysis using DarkForest (AlphaGo versus Lee Sedol)

New version of DarkForest on the ELF platform:
https://github.com/facebookresearch/ELF/tree/master/go

Page 34:

First-Person Shooter (FPS) Games

Play the game from the raw image!

Yuxin Wu, Yuandong Tian, ICLR 2017


Page 35:

Network Structure

Simple frame stacking is very useful (rather than using an LSTM).
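Frame stacking is simple enough to sketch directly: instead of carrying recurrent state, concatenate the last k frames along the channel axis so the network sees short-term motion (e.g. enemy movement) in a single feed-forward pass. A minimal sketch; the class name, k=4, and the tiny 1x1 "frames" are illustrative.

```python
from collections import deque

class FrameStacker:
    """Keep the k most recent frames and expose them as one observation."""

    def __init__(self, k=4):
        self.k = k
        self.frames = deque(maxlen=k)

    def reset(self, frame):
        # At episode start, repeat the first frame k times.
        self.frames = deque([frame] * self.k, maxlen=self.k)
        return self.observation()

    def push(self, frame):
        self.frames.append(frame)   # the oldest frame is dropped automatically
        return self.observation()

    def observation(self):
        # Stacked along a leading axis: conceptually shape (k, H, W).
        return list(self.frames)

stacker = FrameStacker(k=4)
obs = stacker.reset([[0.0]])
for t in range(1, 6):
    obs = stacker.push([[float(t)]])
print([f[0][0] for f in obs])  # the 4 most recent frames: [2.0, 3.0, 4.0, 5.0]
```

In a real agent the stacked frames would be fed to the convolutional network as k input channels; the stacker itself is all the "memory" the model gets.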

Page 36:

Actor-Critic Models

• Update the policy network: encourage actions leading to states with higher-than-expected value.
• Update the value network: encourage the value function to converge to the true cumulative rewards.
• Keep the diversity of actions.

[Figure: a trajectory from s0 to sT, with rewards along the way and the bootstrap value V(sT) at the end.]
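The three bullets above correspond to three loss terms computed on one n-step trajectory s0..sT. A pure-Python sketch (gradients and networks omitted, all inputs made up): the advantage, the return minus the critic's estimate, scales the policy term; the critic regresses toward the bootstrapped return; an entropy bonus keeps the action distribution diverse.

```python
import math

def a3c_losses(rewards, values, value_sT, logp_actions, probs,
               gamma=0.99, beta=0.01):
    """Combined actor-critic loss for one trajectory (A3C-style)."""
    # Bootstrapped n-step return: R_t = r_t + gamma * R_{t+1}, R_T = V(s_T).
    R = value_sT
    policy_loss = value_loss = entropy = 0.0
    for t in reversed(range(len(rewards))):
        R = rewards[t] + gamma * R
        advantage = R - values[t]          # higher than expected => reinforce
        policy_loss += -logp_actions[t] * advantage
        value_loss += advantage ** 2       # pull V(s_t) toward the return
        entropy += -sum(p * math.log(p) for p in probs[t] if p > 0)
    return policy_loss + 0.5 * value_loss - beta * entropy

# Toy trajectory of 3 steps with a reward only at the end, as in MiniRTS
# where the reward arrives only when the game is over.
loss = a3c_losses(
    rewards=[0.0, 0.0, 1.0],
    values=[0.4, 0.5, 0.6],
    value_sT=0.0,
    logp_actions=[math.log(0.5)] * 3,
    probs=[[0.5, 0.5]] * 3)
print(loss)
```

In a real implementation each term would be backpropagated through the shared policy/value network; here the function only shows how the scalars are assembled.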

Page 37:

Curriculum Training

From simple to complicated.
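One common way to go "from simple to complicated" is an opponent curriculum: early in training the agent mostly faces an easy rule-based opponent, and the share of the harder opponent grows over time (the deck later mentions annealing a SIMPLE_AI mixture over the first 99% of training). A small sketch; the schedule shape and names are assumptions.

```python
import random

def pick_opponent(progress, easy="AI_SIMPLE", hard="trained_ai",
                  anneal_until=0.99):
    """progress in [0, 1]; probability of the hard opponent ramps 0 -> 1."""
    p_hard = min(progress / anneal_until, 1.0)
    return hard if random.random() < p_hard else easy

random.seed(0)
# Early in training the hard opponent is rare; late it dominates.
early = sum(pick_opponent(0.05) == "trained_ai" for _ in range(1000))
late = sum(pick_opponent(0.95) == "trained_ai" for _ in range(1000))
print(early, late)
```

Linear annealing is only one choice; step schedules or win-rate-triggered promotion to the harder opponent follow the same pattern.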

Page 38:

Curriculum Training

Page 39:

VizDoom AI Competition 2016 (Track 1)

Rank  Bot     1   2    3    4   5   6   7   8   9    10  11  Total frags
1     F1      56  62   n/a  54  47  43  47  55  50   48  50  559
2     Arnold  36  34   42   36  36  45  36  39  n/a  33  36  413
3     CLYDE   37  n/a  38   32  37  30  46  42  33   24  44  393

We won first place!

Videos:
https://www.youtube.com/watch?v=94EPSjQH38Y
https://www.youtube.com/watch?v=Qv4esGWOg7w&t=394s

Page 40:

Page 41:

Visualization of Value Functions

Best 4 frames (agent is about to shoot the enemy).

Worst 4 frames (agent missed the shot and is out of ammo).

Page 42:

ELF: Extensive, Lightweight and Flexible Framework for Game Research

• Extensive: any game with a C++ interface can be incorporated.
• Lightweight: fast (Mini-RTS runs at 40K FPS per core); minimal resource usage (1 GPU + several CPUs); fast training (a couple of hours for an RTS game).
• Flexible: environment-actor topology; parametrized game environments; choice of different RL methods.

Yuandong Tian, Qucheng Gong, Wendy Shang, Yuxin Wu, Larry Zitnick (NIPS 2017 Oral)

arXiv: https://arxiv.org/abs/1707.01067
https://github.com/facebookresearch/ELF

Page 43:

How an RL System Works

[Diagram: Game 1 ... Game N, each running in its own process (Process 1 ... Process N), feed a replay buffer; consumers in Python run the actor, the model, and the optimizer.]

Page 44:

ELF Design

Plug-and-play; no need to worry about concurrency anymore.

[Diagram: producers (Game 1 ... Game N, in C++), each with a history buffer, feed a daemon (batch collector); consumers in Python (actor, model, optimizer) receive batches with history info and send replies back to the games.]
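The producer/consumer pattern in the diagram can be sketched with threads and queues: many game loops block after sending their state, a daemon groups pending states into a batch, and one consumer answers the whole batch at once so the model runs on batched input. A toy Python sketch of the idea, not the real C++ implementation; the dummy "action" is just the constant 1.

```python
import queue
import threading

request_q = queue.Queue()                       # states from all games
reply_qs = {i: queue.Queue() for i in range(4)} # per-game reply channels
results = {}

def game_thread(game_id, steps=3):
    state = 0
    for _ in range(steps):
        request_q.put((game_id, state))    # producer: send current state ...
        state += reply_qs[game_id].get()   # ... and block until the reply
    results[game_id] = state

def daemon(batch_size, total_requests):
    served = 0
    while served < total_requests:
        # Batch collector: wait until a full batch of states has arrived.
        batch = [request_q.get() for _ in range(batch_size)]
        for game_id, state in batch:       # stand-in for one batched actor call
            reply_qs[game_id].put(1)       # dummy action routed back to its game
            served += 1

n_games, steps = 4, 3
games = [threading.Thread(target=game_thread, args=(i, steps))
         for i in range(n_games)]
d = threading.Thread(target=daemon, args=(n_games, n_games * steps))
for t in games:
    t.start()
d.start()
for t in games:
    t.join()
d.join()
print(results)  # every game advanced `steps` steps
```

The point of the design is that the Python side only ever sees full batches, so GPU inference stays efficient no matter how many games run concurrently.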

Page 45:

Page 46:

Possible Usage

• Game research: board games (Chess, Go, etc.), real-time strategy games
• Complicated RL algorithms
• Discrete/continuous control
• Robotics
• Dialog and Q&A systems

Page 47:

Initialization

Page 48:

Main Loop

Page 49:

Training

Page 50:

Self-Play

Page 51:

Multi-Agent

Page 52:

Monte-Carlo Tree Search

[Figure repeated from Page 22: MCTS tree with win/visit counts at each node.]

Page 53:

Flexible Environment-Actor Topology

(a) One-to-One: vanilla A3C
(b) Many-to-One: BatchA3C, GA3C
(c) One-to-Many: self-play, Monte-Carlo Tree Search

[Diagram: environments and actors wired in each of the three topologies.]

Page 54:

RLPytorch

• An RL platform in PyTorch
• A3C in 30 lines
• Interfacing with dicts

Page 55:

Architecture Hierarchy

ELF: an extensive framework that can host many games.
Specific game engines: RTS engine, ALE, Go (DarkForest).
Environments: Mini-RTS, Capture the Flag, Tower Defense (RTS engine); Pong, Breakout (ALE).

Page 56:

A Miniature RTS Engine

[Map: your base, your barracks, workers, and resources versus the enemy base and units, under fog of war.]

Game name         Descriptions                                                        Avg game length
Mini-RTS          Gather resources and build troops to destroy the opponent's base.  1000-6000 ticks
Capture the Flag  Capture the flag and bring it to your own base.                    1000-4000 ticks
Tower Defense     Build defensive towers to block the enemy invasion.                1000-2000 ticks

Page 57:

Simulation Speed (frames per second)

Platform  ALE   RLE  Universe  Malmo
FPS       6000  530  60        120

Platform  DeepMind Lab                       VizDoom  TorchCraft            Mini-RTS
FPS       287 (C) / 866 (G), 6 CPU + 1 GPU   7,000    2,000 (frameskip=50)  40,000

Page 58:

Training AI

Network: (Conv + BN + ReLU) x4, with policy and value heads.

Input: game visualization and game internal data (respecting fog of war): locations of all workers, all melee tanks, and all range tanks; HP portion; resources.

Using internal game data and A3C. The reward is only available once the game is over.

Page 59:

MiniRTS

• Base: building that can build workers and collect resources.
• Resource: unit that contains 1000 minerals.
• Worker: can build barracks and gather resources; low movement speed and low attack damage.
• Barracks: building that can build melee and range attackers.
• Melee attacker: tank with high HP, medium movement speed, short attack range, high attack damage.
• Range attacker: tank with low HP, high movement speed, long attack range, and medium attack damage.

Page 60:

Training AI: 9 discrete actions.

No.  Action name         Descriptions
1    IDLE                Do nothing.
2    BUILDWORKER         If the base is idle, build a worker.
3    BUILDBARRACK        Move a worker (gathering or idle) to an empty place and build a barrack.
4    BUILDMELEEATTACKER  If we have an idle barrack, build a melee attacker.
5    BUILDRANGEATTACKER  If we have an idle barrack, build a range attacker.
6    HITANDRUN           If we have range attackers, move towards the opponent base and attack; take advantage of their long attack range and high movement speed to hit and run if the enemy counter-attacks.
7    ATTACK              All melee and range attackers attack the opponent's base.
8    ATTACKINRANGE       All melee and range attackers attack enemies in sight.
9    ALLDEFEND           All troops attack enemy troops near the base and resource.

Page 61:

Win Rate Against Rule-Based AI

Frame skip (how often the AI makes decisions):

Frame skip  AI_SIMPLE    AI_HIT_AND_RUN
50          68.4 (±4.3)  63.6 (±7.9)
20          61.4 (±5.8)  55.4 (±4.7)
10          52.8 (±2.4)  51.1 (±5.0)

Network architecture (Conv + BN + ReLU):

                 SIMPLE (median)  SIMPLE (mean/std)  HIT_AND_RUN (median)  HIT_AND_RUN (mean/std)
ReLU             52.8             54.7 (±4.2)        60.4                  57.0 (±6.8)
Leaky ReLU       59.8             61.0 (±2.6)        60.2                  60.3 (±3.3)
ReLU + BN        61.0             64.4 (±7.4)        55.6                  57.5 (±6.8)
Leaky ReLU + BN  72.2             68.4 (±4.3)        65.5                  63.6 (±7.9)

Page 62:

Effect of T-steps

Large T is better.

Page 63:

Transfer Learning and Curriculum Training

                          AI_SIMPLE     AI_HIT_AND_RUN  Combined (50% SIMPLE + 50% H&R)
SIMPLE                    68.4 (±4.3)   26.6 (±7.6)     47.5 (±5.1)
HIT_AND_RUN               34.6 (±13.1)  63.6 (±7.9)     49.1 (±10.5)
Combined (no curriculum)  49.4 (±10.0)  46.0 (±15.3)    47.7 (±11.0)
Combined                  51.8 (±10.6)  54.7 (±11.2)    53.2 (±8.5)

                             AI_SIMPLE    AI_HIT_AND_RUN  CAPTURE_THE_FLAG
Without curriculum training  66.0 (±2.4)  54.4 (±15.9)    54.2 (±20.0)
With curriculum training     68.4 (±4.3)  63.6 (±7.9)     59.9 (±7.4)

[Plot: curriculum as a mixture of SIMPLE_AI and the trained AI, annealed over the first 99% of training time; highest win rate against AI_SIMPLE: 80%.]

Page 64:

Monte Carlo Tree Search

        MiniRTS (AI_SIMPLE)  MiniRTS (HIT_AND_RUN)
Random  24.2 (±3.9)          25.9 (±0.6)
MCTS    73.2 (±0.6)          62.7 (±2.0)

MCTS uses complete information and perfect dynamics. MCTS evaluation is repeated on 1000 games, using 800 rollouts.

Page 65:

Page 66:

Ongoing Work

• One framework for different games.
• DarkForest remastered: https://github.com/facebookresearch/ELF/tree/master/go
• Richer game scenarios for MiniRTS: multiple bases (expand? rush? defend?); more complicated units; a Lua interface for easier modification of the game.
• Realistic action space: one command per unit.
• Model-based reinforcement learning: MCTS with perfect information and perfect dynamics also achieves ~70% win rate.
• Self-play (trained AI versus trained AI).

Page 67:

Thanks!