A Tutorial on Computational Game Theory
NIPS 2002
Michael Kearns
Computer and Information Science
University of Pennsylvania
For an updated and expanded version of these slides, visit http://www.cis.upenn.edu/~mkearns/nips02tutorial

Thanks To:
• Avrim Blum
• Dean Foster
• Sham Kakade
• Jon Kleinberg
• Daphne Koller
• John Langford
• Michael Littman
• Yishay Mansour
• Andrew Ng
• Luis Ortiz
• David Parkes
• Lawrence Saul
• Rob Schapire
• Yoav Shoham
• Satinder Singh
• Moshe Tennenholtz
• Manfred Warmuth

Road Map (1)
• Examples of Strategic Conflict as Matrix Games
• Basic Definitions of (Matrix) Game Theory
• Notions of Equilibrium: Overview
• Definition and Existence of Nash Equilibria
• Computing Nash Equilibria for Matrix Games
• Graphical Models for Multiplayer Game Theory
• Computing Nash Equilibria in Graphical Games

Road Map (2)
• Other Equilibrium Concepts:
  - Correlated Equilibria
  - Correlated Equilibria and Graphical Games
  - Evolutionary Stable Strategies
  - Nash's Bargaining Problem, Cooperative Equilibria
• Learning in Repeated Games
  - Classical Approaches; Regret Minimizing Algorithms
• Games with State
  - Connections to Reinforcement Learning
• Other Directions and Conclusions

Example: Prisoner's Dilemma
• Two suspects in a crime are interrogated in separate rooms
• Each has two choices: confess or deny
• With no confessions, enough evidence to convict on lesser charge; one confession enough to establish guilt
• Police offer plea bargains for confessing
• Encode strategic conflict as a payoff matrix:

  payoffs    confess    deny
  confess    -3, -3     0, -4
  deny       -4, 0      -1, -1

• What should happen?

Example: Hawks and Doves
• Two players compete for a valuable resource
• Each has a confrontational strategy ("hawk") and a conciliatory strategy ("dove")
• Value of resource is $V$; cost of losing a confrontation is $C$
• Suppose $C > V$ (think nuclear first strike)
• Encode strategic conflict as a payoff matrix:

  payoffs    hawk                 dove
  hawk       (V-C)/2, (V-C)/2     V, 0
  dove       0, V                 V/2, V/2

• What should happen?

A (Weak) Metaphor
• Actions of the players can be viewed as (binary) variables
• Under any reasonable notion of "rationality", the payoff matrix imposes constraints on the joint behavior of these two variables
• Instead of being probabilistic, these constraints are strategic
• Instead of computing conditional distributions given the other actions, players optimize their payoff
• Players are selfish and play their best response

Basics of Game Theory
• Set of players $i = 1, \ldots, n$ (assume $n = 2$ for now)
• Each player has a set of $m$ basic actions or pure strategies (such as "hawk" or "dove")
• Notation: $a_i$ will denote the pure strategy chosen by player $i$
• Joint action: $\vec{a}$
• Payoff to player $i$ given by matrix or table $M_i(\vec{a})$
• Goal of players: maximize their own payoff

Notions of Equilibria: Overview (1)
• An equilibrium among the players is a strategic standoff
• No player can improve on their current strategy
• But under what model of communication, coordination, and collusion among the players?
• All standard equilibrium notions are descriptive rather than prescriptive

Notions of Equilibria: Overview (2)
• No communication or bargaining: Nash Equilibria
• Communication via correlation or shared randomness: Correlated Equilibria
• Full communication and coalitions: (Assorted) Cooperative Equilibria
• Equilibrium under evolutionary dynamics: Evolutionary Stable Strategy
• We'll begin with Nash Equilibria

Mixed Strategies
• Need to introduce mixed strategies
• Each player $i$ has an independent distribution $p_i$ over their pure strategies ($p_i \in [0,1]$ in 2-action case)
• Use $\vec{p} = (p_1, \ldots, p_n)$ to denote the product distribution induced over joint action $\vec{a}$
• Use $\vec{a} \sim \vec{p}$ to indicate $\vec{a}$ distributed according to $\vec{p}$
• Expected return to player $i$: $E_{\vec{a} \sim \vec{p}}[M_i(\vec{a})]$
• (What about more general distributions over $\vec{a}$?)
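
The expected return under a product distribution can be computed by brute-force enumeration of joint actions. The sketch below assumes a payoff table stored as a Python dict keyed by joint-action tuples; the function name and the encoding (0 = confess, 1 = deny) are illustrative choices, not part of the slides.

```python
from itertools import product

def expected_payoff(M_i, p):
    """Expected payoff E_{a ~ p}[M_i(a)] under the product distribution p.

    M_i maps each joint action (a tuple of action indices) to player i's payoff.
    p is a list of mixed strategies, one probability list per player.
    """
    total = 0.0
    action_sets = [range(len(p_j)) for p_j in p]
    for joint in product(*action_sets):
        prob = 1.0
        for j, a_j in enumerate(joint):
            prob *= p[j][a_j]  # independence: multiply per-player probabilities
        total += prob * M_i[joint]
    return total

# Prisoner's Dilemma payoffs to player 1 (assumed encoding: 0 = confess, 1 = deny).
M1 = {(0, 0): -3, (0, 1): 0, (1, 0): -4, (1, 1): -1}
uniform = [[0.5, 0.5], [0.5, 0.5]]
print(expected_payoff(M1, uniform))  # average of the four table entries
```

The loop visits all $m^n$ joint actions, so this is only practical for small games.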

Nash Equilibria
• A product distribution $\vec{p}$ such that no player has a unilateral incentive to deviate
• All players know all payoff matrices
• Informal: no communication, deals or collusion allowed; everyone for themselves
• Let $\vec{p}[i : p'_i]$ denote $\vec{p}$ with $p_i$ replaced by $p'_i$
• Formally: $\vec{p}$ is a Nash equilibrium (NE) if for every player $i$, and every mixed strategy $p'_i$,
  $$E_{\vec{a} \sim \vec{p}}[M_i(\vec{a})] \ge E_{\vec{a} \sim \vec{p}[i : p'_i]}[M_i(\vec{a})]$$
• Nash 1951: NE always exist in mixed strategies
• Players can announce their strategies

Approximate Nash Equilibria
• A set of mixed strategies $(p_1, \ldots, p_n)$ such that no player has "too much" unilateral incentive to deviate
• Formally: $\vec{p}$ is an $\epsilon$-Nash equilibrium ($\epsilon$-NE) if for every player $i$, and every mixed strategy $p'_i$,
  $$E_{\vec{a} \sim \vec{p}}[M_i(\vec{a})] \ge E_{\vec{a} \sim \vec{p}[i : p'_i]}[M_i(\vec{a})] - \epsilon$$
• Motivation: inertia, cost of change, ...
• Computational advantages
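
Both the exact and the approximate NE conditions can be checked numerically for a 2-player game. Because a player's expected payoff is linear in their own mixed strategy, it is enough to test pure deviations. The sketch below returns the smallest $\epsilon$ for which a strategy pair is an $\epsilon$-NE (zero means an exact NE). The function names and the matching-pennies matrices are illustrative assumptions, not taken from the slides.

```python
def payoff(M, p1, p2):
    """Expected payoff sum_{a,b} p1[a] * M[a][b] * p2[b]."""
    return sum(p1[a] * M[a][b] * p2[b]
               for a in range(len(p1)) for b in range(len(p2)))

def max_deviation_gain(M1, M2, p1, p2):
    """Largest gain any single player can get by deviating unilaterally.

    M1[a][b] and M2[a][b] are the payoffs to players 1 and 2 when player 1
    plays a and player 2 plays b. Only pure deviations are tried, which
    suffices because payoffs are linear in a player's own mixed strategy.
    """
    m1, m2 = len(p1), len(p2)
    base1, base2 = payoff(M1, p1, p2), payoff(M2, p1, p2)
    gains = [0.0]
    for a in range(m1):
        pure = [1.0 if k == a else 0.0 for k in range(m1)]
        gains.append(payoff(M1, pure, p2) - base1)
    for b in range(m2):
        pure = [1.0 if k == b else 0.0 for k in range(m2)]
        gains.append(payoff(M2, p1, pure) - base2)
    return max(gains)

# Matching pennies (illustrative example): player 1 wins on a match.
MP1 = [[1, -1], [-1, 1]]
MP2 = [[-1, 1], [1, -1]]
print(max_deviation_gain(MP1, MP2, [0.5, 0.5], [0.5, 0.5]))  # exact NE: no gain
print(max_deviation_gain(MP1, MP2, [0.6, 0.4], [0.5, 0.5]))  # player 2 can gain
```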

NE for Prisoner's Dilemma
• Recall payoff matrix:

  payoffs    confess    deny
  confess    -3, -3     0, -4
  deny       -4, 0      -1, -1

• One (pure) NE: (confess, confess)
• Failure to cooperate despite benefits
• Source of great and enduring angst in game theory
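
The pure NE of a small matrix game can be found by checking every cell for mutual best responses. This is a minimal sketch; the function name and the index encoding (0 = confess, 1 = deny) are assumptions made for illustration.

```python
def pure_nash_equilibria(M1, M2):
    """Return all pure-strategy NE (a, b) of a 2-player matrix game.

    M1[a][b] and M2[a][b] are the payoffs to the row and column player.
    """
    rows, cols = len(M1), len(M1[0])
    equilibria = []
    for a in range(rows):
        for b in range(cols):
            # Row player cannot do better by switching rows in column b.
            row_best = all(M1[a][b] >= M1[a2][b] for a2 in range(rows))
            # Column player cannot do better by switching columns in row a.
            col_best = all(M2[a][b] >= M2[a][b2] for b2 in range(cols))
            if row_best and col_best:
                equilibria.append((a, b))
    return equilibria

ACTIONS = ["confess", "deny"]
PD1 = [[-3, 0], [-4, -1]]   # payoffs to the row player
PD2 = [[-3, -4], [0, -1]]   # payoffs to the column player
for a, b in pure_nash_equilibria(PD1, PD2):
    print(ACTIONS[a], ACTIONS[b])
```

Confessing is a best response to either action of the opponent, which is why (confess, confess) is the only cell that survives the check.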

NE for Hawks and Doves
• Recall payoff matrix ($V < C$):

  payoffs    hawk                 dove
  hawk       (V-C)/2, (V-C)/2     V, 0
  dove       0, V                 V/2, V/2

• Three NE:
  - pure: (hawk, dove)
  - pure: (dove, hawk)
  - mixed: ($\Pr[\text{hawk}] = V/C$, $\Pr[\text{hawk}] = V/C$)
• Rock-Paper-Scissors: Only mixed NE
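
The mixed NE can be recovered from an indifference argument: if the opponent plays hawk with probability $q$, a player is willing to randomize only when hawk and dove earn the same expected payoff. The symbol $q$ is introduced here for the derivation and does not appear on the slide.

```latex
\begin{align*}
\underbrace{q\,\tfrac{V-C}{2} + (1-q)\,V}_{\text{expected payoff of hawk}}
  &= \underbrace{q \cdot 0 + (1-q)\,\tfrac{V}{2}}_{\text{expected payoff of dove}} \\
q\,\tfrac{V-C}{2} + (1-q)\,\tfrac{V}{2} &= 0 \\
\tfrac{V}{2} - q\,\tfrac{C}{2} &= 0
  \quad\Longrightarrow\quad q = \frac{V}{C}
\end{align*}
```

Since $V < C$, this $q$ lies strictly between 0 and 1, and by symmetry both players use it.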

NE Existence Intuition
• Suppose that $\vec{p}$ is not a NE
• For some player $i$, there must be some pure strategy giving higher return against $\vec{p}$ than $p_i$
• For each such player, shift some of the weight of $p_i$ to this pure strategy
• Leave all other $p_j$ alone
• Formalize as continuous mapping $\vec{p} \to F(\vec{p})$
• Brouwer Fixed Point Theorem: continuous mapping $F$ of a compact convex set into itself must possess $\vec{p}^*$ such that $F(\vec{p}^*) = \vec{p}^*$
• One-dimensional case easy, high-dimensional difficult
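
One concrete choice of $F$ is the map from Nash's proof, which adds to each pure strategy its positive gain over the current return and then renormalizes. The slide does not say which map is intended, so treat this 2-player sketch as one possible instantiation; the function and variable names are illustrative.

```python
def nash_map(M1, M2, p1, p2):
    """Apply one step of a continuous improvement map F to (p1, p2).

    Each pure strategy receives extra weight equal to its positive gain over
    the player's current expected return; the result is renormalized.
    A pair (p1, p2) is a fixed point exactly when no pure strategy has a gain.
    """
    def shift(p_own, pure_returns, current):
        gains = [max(0.0, r - current) for r in pure_returns]
        total = 1.0 + sum(gains)
        return [(p + g) / total for p, g in zip(p_own, gains)]

    m1, m2 = len(p1), len(p2)
    # Return of each pure strategy against the opponent's current mix.
    returns1 = [sum(M1[a][b] * p2[b] for b in range(m2)) for a in range(m1)]
    returns2 = [sum(p1[a] * M2[a][b] for a in range(m1)) for b in range(m2)]
    cur1 = sum(p1[a] * returns1[a] for a in range(m1))
    cur2 = sum(p2[b] * returns2[b] for b in range(m2))
    return shift(p1, returns1, cur1), shift(p2, returns2, cur2)

# Matching pennies (illustrative): the uniform pair is left unchanged by F.
MP1 = [[1, -1], [-1, 1]]
MP2 = [[-1, 1], [1, -1]]
print(nash_map(MP1, MP2, [0.5, 0.5], [0.5, 0.5]))
print(nash_map(MP1, MP2, [1.0, 0.0], [1.0, 0.0]))  # player 2 shifts weight
```

Brouwer's theorem only guarantees that a fixed point exists. Iterating this map is not guaranteed to converge to one.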

Some NE Facts
• Existence not guaranteed in pure strategies
• May be multiple NE
• In multiplayer case, may be exponentially many NE
• Suppose $(p_1, p_2)$ and $(p'_1, p'_2)$ are two NE
• Zero-sum: $(p_1, p'_2)$ and $(p'_1, p_2)$ also NE, and give players same payoffs (games have a unique value)
• General sum: $(p_1, p'_2)$ may not be a NE; different NE may give different payoffs
• Which will be chosen?
  - dynamics, additional criteria, structure of interaction?

Computing NE
• Inputs:
  - Payoff matrices $M_i$
  - Note: each has $m^n$ entries ($n$ players, $m$ actions each)
• Output:
  - Any NE?
  - All NE? (output size)
  - Some particular NE?

Complexity Status of Computing a NE (1)
• Zero-sum, 2-player case (input size $m^2$):
  - Linear Programming
  - Polynomial time solution
• General-sum case, 2 players (input size $m^2$):
  - Closely related to Linear Complementarity Problems
  - Can be solved with the Lemke-Howson algorithm
  - Exponential worst-case running time
  - Probably not in P, but probably not NP-complete?

Complexity Status of Computing a NE (2)
• Maximizing sum of rewards NP-complete for 2 players
• General-sum case, multiplayer (input size $m^n$):
  - Simplicial subdivision methods (Scarf's algorithm)
  - Exponential worst-case running time
  - Not clear small action spaces ($m = 2$) help
• Missing: compact models of large player and action spaces

2-Player, Zero-Sum Case: LP Formulation
• Assume 2 players, $M = M_1 = -M_2$
• Let $p_1 = (p_1^1, \ldots, p_1^m)$ and $p_2$ be mixed strategies
• Minimax theorem says:
  $$\max_{p_1} \min_{p_2} \{p_1 M p_2\} = \min_{p_2} \max_{p_1} \{p_1 M p_2\}$$
• Solved by standard LP methods
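
One standard way to write the max-min side as a linear program is shown below. The inner minimum over $p_2$ is always attained at a pure strategy, so it can be replaced by one constraint per column $b$ of $M$. The auxiliary variable $v$ (the guaranteed payoff) and the entry notation $M_{ab}$ are introduced here and are not on the slide.

```latex
\begin{align*}
\text{maximize}\quad & v \\
\text{subject to}\quad & \sum_{a=1}^{m} p_1^a\, M_{ab} \;\ge\; v
    && \text{for every pure strategy } b \text{ of player 2} \\
& \sum_{a=1}^{m} p_1^a = 1, \qquad p_1^a \ge 0
    && \text{for every } a
\end{align*}
```

The optimal $v$ is the value of the game. Player 2's equilibrium strategy comes from the symmetric program that minimizes the largest row payoff.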

General Sum Case: A Sampling Folk Theorem
• Suppose $(p_1, p_2)$ is a NE
• Idea: let $\hat{p}_i$ be an empirical distribution obtained by sampling $p_i$
• If we sample enough, $\hat{p}_i$ and $p_i$ will get nearly identical returns against any opponent strategy (uniform convergence)
• Thus, $(\hat{p}_1, \hat{p}_2)$ will be $\epsilon$-NE
• From Chernoff bounds, only about $(1/\epsilon^2) \log(m)$ samples suffice
• Yields $m^{(1/\epsilon^2) \log(m)}$ algorithm for approximate NE
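
The algorithm implied by the last bullet is an exhaustive search: every empirical distribution of $k$ samples is a multiset of $k$ actions, so one can enumerate all such pairs and test each for the $\epsilon$-NE property. The sketch below assumes that reading; the function names and the matching-pennies example are illustrative.

```python
from itertools import combinations_with_replacement

def k_uniform_strategies(m, k):
    """Yield every mixed strategy over m actions that is an empirical
    distribution of k samples (a multiset of k actions)."""
    for sample in combinations_with_replacement(range(m), k):
        yield [sample.count(a) / k for a in range(m)]

def max_gain(M1, M2, p1, p2):
    """Largest unilateral gain available to either player at (p1, p2)."""
    rows, cols = len(p1), len(p2)
    r1 = [sum(M1[a][b] * p2[b] for b in range(cols)) for a in range(rows)]
    r2 = [sum(p1[a] * M2[a][b] for a in range(rows)) for b in range(cols)]
    cur1 = sum(p1[a] * r1[a] for a in range(rows))
    cur2 = sum(p2[b] * r2[b] for b in range(cols))
    return max(max(r1) - cur1, max(r2) - cur2)

def find_approx_ne(M1, M2, k, eps):
    """Search all pairs of k-sample empirical strategies for an eps-NE."""
    rows, cols = len(M1), len(M1[0])
    for p1 in k_uniform_strategies(rows, k):
        for p2 in k_uniform_strategies(cols, k):
            if max_gain(M1, M2, p1, p2) <= eps:
                return p1, p2
    return None  # no eps-NE among k-sample strategies

MP1 = [[1, -1], [-1, 1]]
MP2 = [[-1, 1], [1, -1]]
print(find_approx_ne(MP1, MP2, k=2, eps=1e-9))
print(find_approx_ne(MP1, MP2, k=1, eps=0.5))  # pure strategies are not enough
```

The number of multisets grows roughly like $m^k$, which is where the running time on the slide comes from once $k$ is set to about $(1/\epsilon^2)\log(m)$.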

Compact Models for Multiplayer Games
• Even in 2-player games, computational barriers appear
• Multiplayer games make things even worse
• Maybe we need better representations
• See accompanying PowerPoint presentation.

Correlated Equilibria
• NE $\vec{p}$ is a product distribution over the joint action $\vec{a}$
• Suffices to guarantee existence of NE
• Now let $P$ be an arbitrary joint distribution over $\vec{a}$
• Informal intuition: assuming all others play "their part" of $P$, $i$ has no unilateral incentive to deviate from $P$
• Let $\vec{a}_{-i}$ denote all actions except $a_i$
• Say that $P$ is a Correlated Equilibrium (CE) if for any player $i$, and any actions $a, a'$ for $i$:
  $$\sum_{\vec{a}_{-i}} P(\vec{a}_{-i} \mid a_i = a)\, M_i(\vec{a}_{-i}, a) \ge \sum_{\vec{a}_{-i}} P(\vec{a}_{-i} \mid a_i = a)\, M_i(\vec{a}_{-i}, a')$$
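
The CE condition can be checked directly for a small game. The sketch below multiplies each inequality through by $P(a_i = a)$, which avoids dividing by zero when an action is never recommended. The function name, the dict encoding, and the Hawks and Doves numbers ($V = 2$, $C = 4$, 0 = hawk, 1 = dove) are illustrative assumptions.

```python
def is_correlated_equilibrium(P, M, n_actions, tol=1e-12):
    """Check the CE inequalities for a joint distribution P.

    P maps joint actions (tuples) to probabilities; M[i] maps joint actions
    to player i's payoff. For each player i and actions a, a_dev, the sum of
    P(joint) * (M_i(joint with a_dev) - M_i(joint)) over joints with
    joint[i] == a must not be positive.
    """
    for i in range(len(M)):
        for a in range(n_actions):
            for a_dev in range(n_actions):
                gain = 0.0
                for joint, prob in P.items():
                    if joint[i] != a:
                        continue
                    deviated = joint[:i] + (a_dev,) + joint[i + 1:]
                    gain += prob * (M[i][deviated] - M[i][joint])
                if gain > tol:
                    return False
    return True

# Hawks and Doves with V = 2, C = 4 (0 = hawk, 1 = dove).
HD = [
    {(0, 0): -1, (0, 1): 2, (1, 0): 0, (1, 1): 1},  # payoffs to player 1
    {(0, 0): -1, (0, 1): 0, (1, 0): 2, (1, 1): 1},  # payoffs to player 2
]
coin_flip = {(0, 1): 0.5, (1, 0): 0.5}   # mixture of the two pure NE
both_hawk = {(0, 0): 1.0}
print(is_correlated_equilibrium(coin_flip, HD, 2))
print(is_correlated_equilibrium(both_hawk, HD, 2))
```

The coin-flip distribution is not a product distribution, yet neither player gains by ignoring the recommendation they receive.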
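
Advantages of CE
• Conceptual: Some CE payoff vectors not achievable by NE
• Everyday example: traffic signal
• CE allows "cooperation" via shared randomization
• Any mixture of NE is a CE, but there are other CE as well
• Computational: note that
  $$\sum_{\vec{a}_{-i}} \frac{P(\vec{a}_{-i}, a_i = a)}{P(a_i = a)}\, M_i(\vec{a}_{-i}, a) \ge \sum_{\vec{a}_{-i}} \frac{P(\vec{a}_{-i}, a_i = a)}{P(a_i = a)}\, M_i(\vec{a}_{-i}, a')$$
  is linear in variables $P(\vec{a}_{-i}, a_i = a) = P(\vec{a})$ once both sides are multiplied by $P(a_i = a)$
• Thus have just a linear feasibility problem
• 2-player case: compute CE in polynomial time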
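
Written out in full, the feasibility problem has one variable $P(\vec{a})$ per joint action. The system below is a sketch of that program, obtained by multiplying the CE inequality by $P(a_i = a)$ and adding the constraints that make $P$ a probability distribution.

```latex
\begin{align*}
\text{find}\quad & P(\vec{a}) \text{ for every joint action } \vec{a} \text{ such that} \\
& \sum_{\vec{a}_{-i}} P(\vec{a}_{-i}, a)\,
    \bigl[ M_i(\vec{a}_{-i}, a) - M_i(\vec{a}_{-i}, a') \bigr] \;\ge\; 0
    && \text{for all players } i \text{ and actions } a, a' \\
& \sum_{\vec{a}} P(\vec{a}) = 1, \qquad P(\vec{a}) \ge 0
    && \text{for all } \vec{a}
\end{align*}
```

With 2 players there are $m^2$ variables and on the order of $m^2$ incentive constraints, so the program has size polynomial in the input.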

Correlated Equilibria and Graphical Games
• No matter how complex the game, NE factor
• Thus, NE always have compact representations
• Any mixture of NE is a CE
• Thus, even simple games can have CE of arbitrary complexity
• How do we represent the CE of a graphical game?
• Restrict attention to CE up to expected payoff equivalence

Markov Nets and Graphical Games
• Let $G$ be the graph of a graphical game
• Can define a Markov net $MN(G)$:
  - Form cliques of local neighborhoods in $G$
  - For each clique $C$, introduce potential function $\psi_C \ge 0$ on just the settings in $C$
  - Markov net semantics: $\Pr[\vec{a}] = (1/Z) \prod_C \psi_C(\vec{a}_C)$
• For any CE of a game with graph $G$, there is a CE with identical expected payoffs representable in $MN(G)$
• Link between strategic and probabilistic structure
• If $G$ is a tree, can compute a (random) CE efficiently
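
The Markov net semantics can be made concrete with a brute-force computation on a tiny graph. The sketch below assumes a 3-player chain 0 - 1 - 2 with binary actions, one clique per local neighborhood, and a made-up potential that favors agreement. None of these specifics come from the slides.

```python
from itertools import product

def markov_net_distribution(cliques, potentials, n_players, n_actions=2):
    """Return Pr[a] = (1/Z) * prod_C psi_C(a_C) as a dict over joint actions.

    cliques is a list of tuples of player indices; potentials is a matching
    list of functions from the clique's action tuple to a non-negative number.
    """
    weights = {}
    for joint in product(range(n_actions), repeat=n_players):
        w = 1.0
        for clique, psi in zip(cliques, potentials):
            w *= psi(tuple(joint[i] for i in clique))
        weights[joint] = w
    Z = sum(weights.values())  # partition function
    return {joint: w / Z for joint, w in weights.items()}

def agree(a_C):
    """Hypothetical potential: weight 2 if everyone in the clique agrees."""
    return 2.0 if len(set(a_C)) == 1 else 1.0

# Local neighborhoods of the chain 0 - 1 - 2 (each player plus its neighbors).
cliques = [(0, 1), (0, 1, 2), (1, 2)]
P = markov_net_distribution(cliques, [agree, agree, agree], n_players=3)
print(P[(0, 0, 0)], P[(0, 1, 0)])
```

Enumerating all joint actions is exponential in the number of players. The point of the representation is that the potentials themselves stay small when neighborhoods are small.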

Evolutionary Game Theory
• A different model of multiplayer games
• Assume an infinite population of players, but who meet in random, pairwise confrontations
• Assume symmetric payoff matrix $M$ (as in Hawks and Doves)
• Let $P$ be the distribution over actions induced by the (averaged) population mixed strategies $p_i$
• Then fitness of $p_i$ is expected return against $P$
• Assume evolutionary dynamics: the higher the fitness of $p_i$, the more offspring player $i$ has in the next generation

Evolutionary Stable Strategies
• Let $P$ be the population mixed strategy
• Let $Q$ be an invading "mutant" population
• Let $M(P, Q)$ be the expected payoff to a random player from $P$ facing a random player from $Q$
• Suppose population is $(1 - \epsilon) P + \epsilon Q$
• Fitness of incumbent population: $(1 - \epsilon) M(P, P) + \epsilon M(P, Q)$
• Fitness of invading population: $(1 - \epsilon) M(Q, P) + \epsilon M(Q, Q)$
• Say $P$ is an ESS if for any $Q \ne P$ and sufficiently small $\epsilon > 0$,
  $$(1 - \epsilon) M(P, P) + \epsilon M(P, Q) > (1 - \epsilon) M(Q, P) + \epsilon M(Q, Q)$$
• Either $M(P, P) > M(Q, P)$, or $M(P, P) = M(Q, P)$ and $M(P, Q) > M(Q, Q)$
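
The two-case condition in the last bullet follows from the inequality above it by collecting terms and letting $\epsilon$ shrink. A short derivation:

```latex
\begin{align*}
(1-\epsilon)\,M(P,P) + \epsilon\,M(P,Q) &> (1-\epsilon)\,M(Q,P) + \epsilon\,M(Q,Q) \\
\iff\quad (1-\epsilon)\,\bigl[M(P,P) - M(Q,P)\bigr]
   + \epsilon\,\bigl[M(P,Q) - M(Q,Q)\bigr] &> 0
\end{align*}
```

For sufficiently small $\epsilon$ the first bracket dominates. If $M(P,P) > M(Q,P)$ the inequality holds, and if $M(P,P) < M(Q,P)$ it fails. If the first bracket is exactly zero, the sign is decided by the second bracket, which gives the requirement $M(P,Q) > M(Q,Q)$.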

ESS for Hawks and Doves
• Recall payoff matrix ($V < C$):

  payoffs    hawk                 dove
  hawk       (V-C)/2, (V-C)/2     V, 0
  dove       0, V                 V/2, V/2

• ESS: $P(\text{hawk}) = V/C$
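
The ESS claim can be spot-checked with exact rational arithmetic by testing the two-case ESS condition against a grid of mutant strategies. This is a finite check, not a proof. The function names, the values $V = 2$, $C = 4$, and the grid of mutants are assumptions made for illustration.

```python
from fractions import Fraction

def hawk_dove_payoff(x, y, V, C):
    """Expected payoff M(x, y) to a player who plays hawk with probability x
    against an opponent who plays hawk with probability y."""
    return (x * y * Fraction(V - C, 2)               # hawk meets hawk
            + x * (1 - y) * V                        # hawk meets dove
            + (1 - x) * (1 - y) * Fraction(V, 2))    # dove meets dove
    # dove meets hawk pays 0, so that term is omitted

def resists_invasion(p, q, V, C):
    """Two-case ESS condition for incumbent p against a mutant q != p."""
    def M(x, y):
        return hawk_dove_payoff(x, y, V, C)
    if M(p, p) > M(q, p):
        return True
    return M(p, p) == M(q, p) and M(p, q) > M(q, q)

V, C = 2, 4
p_star = Fraction(V, C)
mutants = [Fraction(k, 20) for k in range(21) if Fraction(k, 20) != p_star]
print(all(resists_invasion(p_star, q, V, C) for q in mutants))
print(resists_invasion(Fraction(1), Fraction(1, 2), V, C))  # pure hawk as incumbent
```

Every mutant ties against the incumbent mix $V/C$, so the second case decides the outcome. Working the algebra through, $M(P,Q) - M(Q,Q) = (C/2)(p - q)^2$, which is positive whenever $q \ne p$.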

Remarks on ESS
• Do not always exist!
• Special type of (symmetric) NE
• Biological field studies
• Sources of randomization
• Mixed strategies vs. population averages
• Market models

Richer Game Representations
• Have said quite a lot about single-shot matrix games
• What about:
  - Repeated games
  - Games with state (chess, checkers)
  - Stochastic games (multi-player MDPs)
• Can always (painfully) express in normal form
• Normal form equilibria concepts relevant

Repeated Games
• Still have underlying game matrices
• Now play the single-shot game repeatedly, examine cumulative or average reward
• Game has no internal state (though players might)
• Relevant detail: how many rounds of play?

Learning in Repeated Games
• "Classical" algorithms:
  - Fictitious Play: best response to empirical distribution of opponent play
  - Various (stochastic) gradient approaches
• Common question: when will such dynamics converge to NE?
• Positive results fairly restrictive
• Generalizations to parametric strategy representations?
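
Fictitious play is short enough to sketch in full. In the version below both players move simultaneously, ties are broken toward the lowest action index, and the first round (with no history) defaults to action 0. These details, along with the function name and the matching-pennies example, are assumptions the slide does not specify.

```python
def fictitious_play(M1, M2, rounds):
    """Simultaneous fictitious play in a 2-player matrix game.

    Each round, every player best-responds to the empirical distribution of
    the opponent's past actions. Returns both players' empirical frequencies.
    """
    rows, cols = len(M1), len(M1[0])
    counts1, counts2 = [0] * rows, [0] * cols
    for _ in range(rounds):
        # Unnormalized expected payoffs against the opponent's action counts.
        score1 = [sum(M1[a][b] * counts2[b] for b in range(cols))
                  for a in range(rows)]
        score2 = [sum(M2[a][b] * counts1[a] for a in range(rows))
                  for b in range(cols)]
        a = score1.index(max(score1))  # ties go to the lowest index
        b = score2.index(max(score2))
        counts1[a] += 1
        counts2[b] += 1
    return [c / rounds for c in counts1], [c / rounds for c in counts2]

# Matching pennies (illustrative zero-sum game).
MP1 = [[1, -1], [-1, 1]]
MP2 = [[-1, 1], [1, -1]]
freq1, freq2 = fictitious_play(MP1, MP2, rounds=10000)
print(freq1, freq2)
```

In zero-sum games the empirical frequencies are known to approach a NE, although the actual sequence of plays keeps cycling. In general-sum games no such guarantee holds, which is the restriction the slide alludes to.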

Exponential Updates and Regret Minimization
• View repeated play as a sequence of trials against an arbitrary opponent
• Maintain a weight on each pure strategy
• On each trial, multiply each weight by a factor exponentially decreasing in its regret
• General setting: near-minimization of regret on sequence, but no guarantee of NE
• Zero-sum case: two "copies" will converge to NE
• Regret minimization and NE vs. CE
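
A minimal sketch of such an update follows, in the style of the Hedge / multiplicative weights family. Here "regret" on a trial is taken to be the payoff shortfall of a pure strategy relative to the best pure strategy on that trial, and the learner sees the full payoff column after each trial. Both are assumptions, as are the function name and the learning rate `eta`.

```python
import math

def exponential_weights(payoff_rows, opponent_actions, eta=0.5):
    """Run an exponential-weights update against a sequence of opponent actions.

    payoff_rows[a][b] is our payoff for playing a when the opponent plays b.
    Returns the final mixed strategy (the normalized weights).
    """
    m = len(payoff_rows)
    weights = [1.0 / m] * m
    for b in opponent_actions:
        best = max(payoff_rows[a][b] for a in range(m))
        for a in range(m):
            shortfall = best - payoff_rows[a][b]  # per-trial regret of action a
            weights[a] *= math.exp(-eta * shortfall)
        total = sum(weights)
        weights = [w / total for w in weights]  # renormalize each trial
    return weights

# Matching pennies payoffs to player 1; the opponent always plays action 0.
MP1 = [[1, -1], [-1, 1]]
strategy = exponential_weights(MP1, [0] * 20)
print(strategy)  # nearly all weight ends up on action 0
```

The played strategy on each trial would be a sample from the current weights. Only the weight dynamics are shown here.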

Repeated Games and Bounded Rationality
• Consider restricting the complexity of strategies in $T$ rounds of a repeated game
• Example: next action computed by a finite state machine on the history of play so far
• New equilibria may arise from the restriction
• Prisoner's Dilemma: if number of states is $o(\log(T))$, mutual cooperation (denial) becomes a NE

Games with State
• Standard board games: chess, checkers
• Often feature partial or hidden information (poker)
• Might involve randomization (backgammon)

Stochastic Games
• Generalize MDPs to multiple players
• At each state $s$, have payoff matrix $M_i^s$ for player $i$
• Immediate reward to $i$ at state $s$ under joint action $\vec{a}$ is $M_i^s(\vec{a})$
• Markovian dynamics: $P(s' \mid s, \vec{a})$
• Discounted sum of rewards
• Every player has a policy $\pi_i(s)$
• Generalize optimal policy to (Nash) equilibrium $(\pi_1, \ldots, \pi_n)$
• Don't just have to worry about influence on future state, but everyone else's policy
• Exploration even more challenging

Stochastic Games and RL
• For fixed policies of opponents, can define value functions
• What happens when independent Q-learners play?
• Results with different amounts and type of shared info
• Generalization of $E^3$ algorithm to stochastic games
• Generalization of sparse sampling methods
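
The first bullet can be illustrated with iterative policy evaluation: once every player's policy is fixed, the game looks like a Markov chain with rewards from one player's point of view. The sketch below assumes deterministic policies and a made-up two-state game. The function signature, state names, rewards, and discount factor are all hypothetical.

```python
def evaluate_joint_policy(states, policies, reward_i, transition,
                          gamma=0.9, sweeps=200):
    """Discounted value function of player i under fixed deterministic policies.

    policies[j][s] is player j's action in state s.
    reward_i(s, joint) is the immediate reward to player i in state s.
    transition(s, joint) returns a dict mapping next states to probabilities.
    """
    V = {s: 0.0 for s in states}
    for _ in range(sweeps):
        new_V = {}
        for s in states:
            joint = tuple(pi[s] for pi in policies)
            future = sum(prob * V[s2]
                         for s2, prob in transition(s, joint).items())
            new_V[s] = reward_i(s, joint) + gamma * future
        V = new_V
    return V

# Hypothetical game: the state alternates A -> B -> A, and player i earns 1
# in state A when both players choose action 0.
states = ["A", "B"]
policies = [{"A": 0, "B": 0}, {"A": 0, "B": 0}]

def reward_i(s, joint):
    return 1.0 if s == "A" and joint == (0, 0) else 0.0

def transition(s, joint):
    return {"B": 1.0} if s == "A" else {"A": 1.0}

values = evaluate_joint_policy(states, policies, reward_i, transition, gamma=0.5)
print(values)
```

With $\gamma = 0.5$ the values solve $V(A) = 1 + \gamma V(B)$ and $V(B) = \gamma V(A)$, giving $V(A) = 4/3$ and $V(B) = 2/3$.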

Conclusions
• Classical game theory a rich and varied formalism for strategic reasoning, a complement to more passive reasoning
• Like probability theory, provides sound foundations but lacks emphasis on representation and computation
• Computational game theory aims to provide these emphases
• Many substantive connections to NIPS topics already under way (graphical models, learning algorithms, dynamical systems, reinforcement learning)...
• ...but even more lie ahead.
• Come find me to chat about open problems!

Contact Information
• Email: [email protected]
• Web: www.cis.upenn.edu/~mkearns
• This tutorial: www.cis.upenn.edu/~mkearns/nips02tutorial
  - will morph into Penn course page
• COLT/SVM 2003 special session on game theory