nips - university of pennsylvaniamkearns/teaching/cgt/nips.pdf · a t uto rial on computational...

A Tutorial on Computational Game Theory

NIPS 2002

Michael Kearns
Computer and Information Science
University of Pennsylvania
[email protected]

For an updated and expanded version of these slides, visit http://www.cis.upenn.edu/~mkearns/nips02tutorial

Thanks To:

• Avrim Blum
• Dean Foster
• Sham Kakade
• Jon Kleinberg
• Daphne Koller
• John Langford
• Michael Littman
• Yishay Mansour
• Andrew Ng
• Luis Ortiz
• David Parkes
• Lawrence Saul
• Rob Schapire
• Yoav Shoham
• Satinder Singh
• Moshe Tennenholtz
• Manfred Warmuth

Road Map (1)

• Examples of Strategic Conflict as Matrix Games
• Basic Definitions of (Matrix) Game Theory
• Notions of Equilibrium: Overview
• Definition and Existence of Nash Equilibria
• Computing Nash Equilibria for Matrix Games
• Graphical Models for Multiplayer Game Theory
• Computing Nash Equilibria in Graphical Games

Road Map (2)

• Other Equilibrium Concepts:
  - Correlated Equilibria
  - Correlated Equilibria and Graphical Games
  - Evolutionary Stable Strategies
  - Nash's Bargaining Problem, Cooperative Equilibria
• Learning in Repeated Games
  - Classical Approaches; Regret Minimizing Algorithms
• Games with State
  - Connections to Reinforcement Learning
• Other Directions and Conclusions

Example: Prisoner's Dilemma

• Two suspects in a crime are interrogated in separate rooms
• Each has two choices: confess or deny
• With no confessions, enough evidence to convict on lesser charge; one confession enough to establish guilt
• Police offer plea bargains for confessing
• Encode strategic conflict as a payoff matrix:

  payoffs    confess    deny
  confess    -3, -3     0, -4
  deny       -4, 0      -1, -1

• What should happen?

Example: Hawks and Doves

• Two players compete for a valuable resource
• Each has a confrontational strategy ("hawk") and a conciliatory strategy ("dove")
• Value of resource is $V$; cost of losing a confrontation is $C$
• Suppose $C > V$ (think nuclear first strike)
• Encode strategic conflict as a payoff matrix:

  payoffs    hawk                dove
  hawk       (V-C)/2, (V-C)/2    V, 0
  dove       0, V                V/2, V/2

• What should happen?

A (Weak) Metaphor

• Actions of the players can be viewed as (binary) variables
• Under any reasonable notion of "rationality", the payoff matrix imposes constraints on the joint behavior of these two variables
• Instead of being probabilistic, these constraints are strategic
• Instead of computing conditional distributions given the other actions, players optimize their payoff
• Players are selfish and play their best response

Basics of Game Theory

• Set of players $i = 1, \ldots, n$ (assume $n = 2$ for now)
• Each player has a set of $m$ basic actions or pure strategies (such as "hawk" or "dove")
• Notation: $a_i$ will denote the pure strategy chosen by player $i$
• Joint action: $\vec{a}$
• Payoff to player $i$ given by matrix or table $M_i(\vec{a})$
• Goal of players: maximize their own payoff

Notions of Equilibria: Overview (1)

• An equilibrium among the players is a strategic standoff
• No player can improve on their current strategy
• But under what model of communication, coordination, and collusion among the players?
• All standard equilibrium notions are descriptive rather than prescriptive

Notions of Equilibria: Overview (2)

• No communication or bargaining: Nash Equilibria
• Communication via correlation or shared randomness: Correlated Equilibria
• Full communication and coalitions: (Assorted) Cooperative Equilibria
• Equilibrium under evolutionary dynamics: Evolutionary Stable Strategy
• We'll begin with Nash Equilibria

Mixed Strategies

• Need to introduce mixed strategies
• Each player $i$ has an independent distribution $p_i$ over their pure strategies ($p_i \in [0, 1]$ in the 2-action case)
• Use $\vec{p} = (p_1, \ldots, p_n)$ to denote the product distribution induced over joint action $\vec{a}$
• Use $\vec{a} \sim \vec{p}$ to indicate $\vec{a}$ distributed according to $\vec{p}$
• Expected return to player $i$: $E_{\vec{a} \sim \vec{p}}[M_i(\vec{a})]$
• (What about more general distributions over $\vec{a}$?)

Nash Equilibria

• A product distribution $\vec{p}$ such that no player has a unilateral incentive to deviate
• All players know all payoff matrices
• Informal: no communication, deals or collusion allowed; everyone for themselves
• Let $\vec{p}[i : p'_i]$ denote $\vec{p}$ with $p_i$ replaced by $p'_i$
• Formally: $\vec{p}$ is a Nash equilibrium (NE) if for every player $i$, and every mixed strategy $p'_i$,
  $E_{\vec{a} \sim \vec{p}}[M_i(\vec{a})] \geq E_{\vec{a} \sim \vec{p}[i : p'_i]}[M_i(\vec{a})]$
• Nash 1951: NE always exist in mixed strategies
• Players can announce their strategies

Approximate Nash Equilibria

• A set of mixed strategies $(p_1, \ldots, p_n)$ such that no player has "too much" unilateral incentive to deviate
• Formally: $\vec{p}$ is an $\epsilon$-Nash equilibrium ($\epsilon$-NE) if for every player $i$, and every mixed strategy $p'_i$,
  $E_{\vec{a} \sim \vec{p}}[M_i(\vec{a})] \geq E_{\vec{a} \sim \vec{p}[i : p'_i]}[M_i(\vec{a})] - \epsilon$
• Motivation: inertia, cost of change, ...
• Computational advantages
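
The NE and $\epsilon$-NE conditions can be checked numerically. The following is a minimal sketch, assuming payoffs are stored as dictionaries keyed by joint-action tuples (a representation chosen here for illustration, not taken from the slides). Because expected payoff is linear in $p'_i$, it is enough to test deviations to pure strategies; the function names are hypothetical.

```python
from itertools import product

def expected_payoff(M, strategies):
    """Expected payoff E_{a ~ p}[M(a)] under the product distribution `strategies`.

    M maps joint-action tuples to a payoff; strategies[i][a] is Pr[player i plays a].
    """
    total = 0.0
    for joint in product(*(range(len(p)) for p in strategies)):
        prob = 1.0
        for p, a in zip(strategies, joint):
            prob *= p[a]
        total += prob * M[joint]
    return total

def max_deviation_gain(payoffs, strategies):
    """Smallest eps >= 0 for which `strategies` is an eps-NE.

    Only pure deviations are tried: a mixed deviation is a convex combination
    of pure ones, so it can never gain more than the best pure deviation.
    """
    worst = 0.0
    for i, M in enumerate(payoffs):
        base = expected_payoff(M, strategies)
        for a in range(len(strategies[i])):
            pure = [1.0 if b == a else 0.0 for b in range(len(strategies[i]))]
            deviated = strategies[:i] + [pure] + strategies[i + 1:]
            worst = max(worst, expected_payoff(M, deviated) - base)
    return worst

# Prisoner's Dilemma from the slides; action 0 = confess, action 1 = deny.
M1 = {(0, 0): -3, (0, 1): 0, (1, 0): -4, (1, 1): -1}
M2 = {(0, 0): -3, (0, 1): -4, (1, 0): 0, (1, 1): -1}

eps_confess = max_deviation_gain([M1, M2], [[1.0, 0.0], [1.0, 0.0]])
eps_deny = max_deviation_gain([M1, M2], [[0.0, 1.0], [0.0, 1.0]])
```

Here `eps_confess` is 0 (mutual confession is an exact NE), while `eps_deny` is 1: each player gains one unit by confessing when the other denies.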

NE for Prisoner's Dilemma

• Recall payoff matrix:

  payoffs    confess    deny
  confess    -3, -3     0, -4
  deny       -4, 0      -1, -1

• One (pure) NE: (confess, confess)
• Failure to cooperate despite benefits
• Source of great and enduring angst in game theory
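
The claim that (confess, confess) is the only pure NE can be confirmed by brute force. This is a small sketch that enumerates all four joint actions and keeps those where neither player has a profitable unilateral deviation; the table layout and function name are illustrative assumptions.

```python
ACTIONS = ["confess", "deny"]

# PAYOFF[(a1, a2)] = (payoff to player 1, payoff to player 2), as in the slide.
PAYOFF = {
    ("confess", "confess"): (-3, -3),
    ("confess", "deny"): (0, -4),
    ("deny", "confess"): (-4, 0),
    ("deny", "deny"): (-1, -1),
}

def pure_nash_equilibria(actions, payoff):
    """Return every joint action from which no player gains by deviating alone."""
    equilibria = []
    for a1 in actions:
        for a2 in actions:
            u1, u2 = payoff[(a1, a2)]
            # Player 1 varies the first coordinate, player 2 the second.
            best1 = all(payoff[(d, a2)][0] <= u1 for d in actions)
            best2 = all(payoff[(a1, d)][1] <= u2 for d in actions)
            if best1 and best2:
                equilibria.append((a1, a2))
    return equilibria

pd_equilibria = pure_nash_equilibria(ACTIONS, PAYOFF)
```

Mutual denial pays both players more (-1 each instead of -3), yet it is not in the result because each player would switch to confessing.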

NE for Hawks and Doves

• Recall payoff matrix ($V < C$):

  payoffs    hawk                dove
  hawk       (V-C)/2, (V-C)/2    V, 0
  dove       0, V                V/2, V/2

• Three NE:
  - pure: (hawk, dove)
  - pure: (dove, hawk)
  - mixed: (Pr[hawk] = V/C, Pr[hawk] = V/C)
• Rock-Paper-Scissors: Only mixed NE
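
The mixed NE comes from an indifference condition: when the opponent plays hawk with probability $V/C$, hawk and dove earn the same expected payoff, so no deviation helps. A minimal check with exact arithmetic, using the illustrative values $V = 2$, $C = 6$ (assumptions, any $V < C$ would do):

```python
from fractions import Fraction

def row_payoff_table(V, C):
    """Row player's payoffs in Hawks and Doves, as in the slide's matrix."""
    return {
        ("hawk", "hawk"): (V - C) / 2,
        ("hawk", "dove"): V,
        ("dove", "hawk"): Fraction(0),
        ("dove", "dove"): V / 2,
    }

def payoff_vs_mix(action, p_hawk, table):
    """Expected payoff of a pure action against an opponent playing hawk w.p. p_hawk."""
    return p_hawk * table[(action, "hawk")] + (1 - p_hawk) * table[(action, "dove")]

V, C = Fraction(2), Fraction(6)   # illustrative values with V < C
table = row_payoff_table(V, C)
p_star = V / C                    # claimed equilibrium probability of hawk

hawk_value = payoff_vs_mix("hawk", p_star, table)
dove_value = payoff_vs_mix("dove", p_star, table)

# Against a more hawkish opponent (Pr[hawk] = 1/2 > V/C), dove is strictly better.
hawk_vs_half = payoff_vs_mix("hawk", Fraction(1, 2), table)
dove_vs_half = payoff_vs_mix("dove", Fraction(1, 2), table)
```

With these values both actions earn exactly 2/3 at `p_star`, which is what makes mixing a best response.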

NE Existence Intuition

• Suppose that $\vec{p}$ is not a NE
• For some player $i$, must be some pure strategy giving higher return against $\vec{p}$ than $p_i$
• For each such player, shift some of the weight of $p_i$ to this pure strategy
• Leave all other $p_j$ alone
• Formalize as continuous mapping $\vec{p} \to F(\vec{p})$
• Brouwer Fixed Point Theorem: continuous mapping $F$ of a compact convex set into itself must possess $\vec{p}^*$ such that $F(\vec{p}^*) = \vec{p}^*$
• One-dimensional case easy, high-dimensional difficult
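
The slides do not say which map $F$ is meant. One standard choice is the improvement map from Nash's 1951 proof: each player adds weight to every pure strategy in proportion to how much it beats the current expected payoff, then renormalizes. The sketch below implements that map for two players under this assumption; the function names and matrix layout are illustrative. Its fixed points are exactly the NE.

```python
def expected(M, p1, p2):
    """Expected payoff of matrix M[a][b] under mixed strategies p1 (rows), p2 (columns)."""
    return sum(p1[a] * p2[b] * M[a][b] for a in range(len(p1)) for b in range(len(p2)))

def improve(p, pure_values, current):
    """Shift weight toward pure strategies that beat the current payoff, then renormalize."""
    gains = [max(0.0, v - current) for v in pure_values]
    total = 1.0 + sum(gains)
    return [(p[a] + gains[a]) / total for a in range(len(p))]

def nash_map(A, B, p1, p2):
    """One application of F; A[a][b] and B[a][b] are the payoffs to players 1 and 2."""
    pure1 = [sum(p2[b] * A[a][b] for b in range(len(p2))) for a in range(len(p1))]
    pure2 = [sum(p1[a] * B[a][b] for a in range(len(p1))) for b in range(len(p2))]
    return (improve(p1, pure1, expected(A, p1, p2)),
            improve(p2, pure2, expected(B, p1, p2)))

# Hawks and Doves with illustrative values V = 2, C = 6 (index 0 = hawk, 1 = dove).
A = [[-2.0, 2.0], [0.0, 1.0]]
B = [[-2.0, 0.0], [2.0, 1.0]]

mixed = [1 / 3, 2 / 3]                       # the mixed NE, Pr[hawk] = V/C
fixed1, fixed2 = nash_map(A, B, mixed, mixed)
moved1, moved2 = nash_map(A, B, [0.0, 1.0], [0.0, 1.0])  # (dove, dove) is not a NE
```

At the mixed NE the map leaves both strategies unchanged, while from (dove, dove) each player moves half of their weight onto hawk.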

Some NE Facts

• Existence not guaranteed in pure strategies
• May be multiple NE
• In multiplayer case, may be exponentially many NE
• Suppose $(p_1, p_2)$ and $(p'_1, p'_2)$ are two NE
• Zero-sum: $(p_1, p'_2)$ and $(p'_1, p_2)$ also NE, and give players same payoffs (games have a unique value)
• General sum: $(p_1, p'_2)$ may not be a NE; different NE may give different payoffs
• Which will be chosen?
  - dynamics, additional criteria, structure of interaction?

Computing NE

• Inputs:
  - Payoff matrices $M_i$
  - Note: each has $m^n$ entries ($n$ players, $m$ actions each)
• Output:
  - Any NE?
  - All NE? (output size)
  - Some particular NE?

Complexity Status of Computing a NE (1)

• Zero-sum, 2-player case (input size $m^2$):
  - Linear Programming
  - Polynomial time solution
• General-sum case, 2 players (input size $m^2$):
  - Closely related to Linear Complementarity Problems
  - Can be solved with the Lemke-Howson algorithm
  - Exponential worst-case running time
  - Probably not in P, but probably not NP-complete?

Complexity Status of Computing a NE (2)

• Maximizing sum of rewards NP-complete for 2 players
• General-sum case, multiplayer (input size $m^n$):
  - Simplicial subdivision methods (Scarf's algorithm)
  - Exponential worst-case running time
  - Not clear small action spaces ($m = 2$) help
• Missing: compact models of large player and action spaces

2-Player, Zero-Sum Case: LP Formulation

• Assume 2 players, $M = M_1 = -M_2$
• Let $p_1 = (p_1^1, \ldots, p_1^m)$ and $p_2$ be mixed strategies
• Minimax theorem says:
  $\max_{p_1} \min_{p_2} \{p_1 M p_2\} = \min_{p_2} \max_{p_1} \{p_1 M p_2\}$
• Solved by standard LP methods
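
The slide does not spell out the linear program. One standard way to write the max-min side is shown below, where the auxiliary variable $v$ (an assumption introduced here) is the payoff player 1 can guarantee. Since the inner minimum over $p_2$ is always attained at a pure strategy, one constraint per column $k$ of $M$ suffices:

```latex
\begin{aligned}
\max_{v,\; p_1} \quad & v \\
\text{subject to} \quad & \sum_{j=1}^{m} p_1^j \, M_{jk} \;\geq\; v
      \quad \text{for every pure strategy } k \text{ of player 2}, \\
& \sum_{j=1}^{m} p_1^j = 1, \qquad p_1^j \geq 0 \quad \text{for } j = 1, \ldots, m .
\end{aligned}
```

The min-max side is the dual program for player 2, and LP duality gives the equality stated by the minimax theorem.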

General Sum Case: A Sampling Folk Theorem

• Suppose $(p_1, p_2)$ is a NE
• Idea: let $\hat{p}_i$ be an empirical distribution by sampling $p_i$
• If we sample enough, $\hat{p}_i$ and $p_i$ will get nearly identical returns against any opponent strategy (uniform convergence)
• Thus, $(\hat{p}_1, \hat{p}_2)$ will be $\epsilon$-NE
• From Chernoff bounds, only about $(1/\epsilon^2)\log(m)$ samples suffice
• Yields an $m^{(1/\epsilon^2)\log(m)}$ algorithm for approximate NE
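
The sampling step can be illustrated directly. The sketch below draws $k$ samples from a mixed strategy, forms the empirical distribution $\hat{p}$, and measures how far its payoff against each pure opponent strategy is from that of $p$. The function names, the seed, and the example game are assumptions for illustration. Because $\hat{p}$ is a multiset of $k$ pure strategies, searching over all such multisets for both players is what gives the $m^{O(k)}$ running time.

```python
import random
from collections import Counter

def empirical_strategy(p, k, rng):
    """Empirical distribution of k independent draws from the mixed strategy p."""
    draws = rng.choices(range(len(p)), weights=p, k=k)
    counts = Counter(draws)
    return [counts[a] / k for a in range(len(p))]

def max_payoff_gap(M, p, p_hat):
    """Largest payoff difference between p and p_hat over the opponent's pure strategies.

    M[a][b] is the row player's payoff. Checking pure columns is enough because
    payoffs are linear in the opponent's mixed strategy.
    """
    gaps = []
    for b in range(len(M[0])):
        exact = sum(p[a] * M[a][b] for a in range(len(p)))
        approx = sum(p_hat[a] * M[a][b] for a in range(len(p)))
        gaps.append(abs(exact - approx))
    return max(gaps)

# Hawks and Doves row payoffs with illustrative V = 2, C = 6, and its mixed NE strategy.
M = [[-2.0, 2.0], [0.0, 1.0]]
p = [1 / 3, 2 / 3]

rng = random.Random(0)            # fixed seed only so the sketch is repeatable
p_hat = empirical_strategy(p, 20000, rng)
gap = max_payoff_gap(M, p, p_hat)
```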

Compact Models for Multiplayer Games

• Even in 2-player games, computational barriers appear
• Multiplayer games make things even worse
• Maybe we need better representations
• See accompanying PowerPoint presentation.


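Correlated Equilibria

• NE $\vec{p}$ is a product distribution over the joint action $\vec{a}$
• Suffices to guarantee existence of NE
• Now let $P$ be an arbitrary joint distribution over $\vec{a}$
• Informal intuition: assuming all others play "their part" of $P$, $i$ has no unilateral incentive to deviate from $P$
• Let $\vec{a}_{-i}$ denote all actions except $a_i$
• Say that $P$ is a Correlated Equilibrium (CE) if for any player $i$, and any actions $a, a'$ for $i$:
  $\sum_{\vec{a}_{-i}} P(\vec{a}_{-i} \mid a_i = a)\, M_i(\vec{a}_{-i}, a) \geq \sum_{\vec{a}_{-i}} P(\vec{a}_{-i} \mid a_i = a)\, M_i(\vec{a}_{-i}, a')$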

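Advantages of CE

• Conceptual: Some CE payoff vectors not achievable by NE
• Everyday example: traffic signal
• CE allows "cooperation" via shared randomization
• Any mixture of NE is a CE, but there are other CE as well
• Computational: note that
  $\sum_{\vec{a}_{-i}} \frac{P(\vec{a}_{-i}, a_i = a)}{P(a_i = a)}\, M_i(\vec{a}_{-i}, a) \geq \sum_{\vec{a}_{-i}} \frac{P(\vec{a}_{-i}, a_i = a)}{P(a_i = a)}\, M_i(\vec{a}_{-i}, a')$
  is linear in variables $P(\vec{a}_{-i}, a_i = a) = P(\vec{a})$ once both sides are multiplied by $P(a_i = a)$
• Thus have just a linear feasibility problem
• 2-player case: compute CE in polynomial time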
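
The linear form of the CE constraints is easy to test for a given joint distribution. The sketch below does so for two players, assuming joint distributions and payoffs are stored as dictionaries keyed by action pairs (an illustrative representation). The example is a traffic-signal style mixture in Hawks and Doves with assumed values $V = 2$, $C = 6$: a fair coin picks between the two pure NE.

```python
def is_correlated_equilibrium(P, payoffs, actions, tol=1e-12):
    """Check sum_{a_-i} P(a_-i, a) * [M_i(a_-i, a') - M_i(a_-i, a)] <= 0 for all i, a, a'.

    P maps (a1, a2) to a probability (missing keys count as 0);
    payoffs[i] maps (a1, a2) to player i's payoff.
    """
    for i in (0, 1):
        for a in actions:              # action recommended to player i
            for a_dev in actions:      # action player i considers instead
                gain = 0.0
                for other in actions:
                    joint = (a, other) if i == 0 else (other, a)
                    dev = (a_dev, other) if i == 0 else (other, a_dev)
                    gain += P.get(joint, 0.0) * (payoffs[i][dev] - payoffs[i][joint])
                if gain > tol:
                    return False
    return True

ACTIONS = ["hawk", "dove"]
M1 = {("hawk", "hawk"): -2, ("hawk", "dove"): 2, ("dove", "hawk"): 0, ("dove", "dove"): 1}
M2 = {("hawk", "hawk"): -2, ("hawk", "dove"): 0, ("dove", "hawk"): 2, ("dove", "dove"): 1}

# Shared coin flip between the two pure NE: not a product distribution, yet a CE.
signal = {("hawk", "dove"): 0.5, ("dove", "hawk"): 0.5}
signal_is_ce = is_correlated_equilibrium(signal, [M1, M2], ACTIONS)

# Always (dove, dove) is not a CE: either player gains by switching to hawk.
both_dove_is_ce = is_correlated_equilibrium({("dove", "dove"): 1.0}, [M1, M2], ACTIONS)
```

Finding a CE rather than checking one means treating the entries of `P` as unknowns in these same inequalities, which is the linear feasibility problem mentioned above.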

Correlated Equilibria and Graphical Games

• No matter how complex the game, NE factor (they are product distributions)
• Thus, NE always have compact representations
• Any mixture of NE is a CE
• Thus, even simple games can have CE of arbitrary complexity
• How do we represent the CE of a graphical game?
• Restrict attention to CE up to expected payoff equivalence

Markov Nets and Graphical Games

• Let $G$ be the graph of a graphical game
• Can define a Markov net $MN(G)$:
  - Form cliques of local neighborhoods in $G$
  - For each clique $C$, introduce potential function $\psi_C \geq 0$ on just the settings in $C$
  - Markov net semantics: $\Pr[\vec{a}] = (1/Z) \prod_C \psi_C(\vec{a}_C)$
• For any CE of a game with graph $G$, there is a CE with identical expected payoffs representable in $MN(G)$
• Link between strategic and probabilistic structure
• If $G$ is a tree, can compute a (random) CE efficiently
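
The Markov net semantics can be made concrete on a tiny example. The sketch below computes the normalized product of clique potentials over binary actions by brute-force enumeration. The three-player chain and the `agree` potential are hypothetical; in a graphical game the cliques would be the local neighborhoods of $G$ and the potentials would come from the CE being represented.

```python
from itertools import product

def markov_net_distribution(num_vars, cliques):
    """Return Pr[a] = (1/Z) * prod_C psi_C(a_C) over all binary joint actions a.

    cliques is a list of (variable_indices, potential_function) pairs; each
    potential receives the tuple of values of its own variables only.
    """
    weights = {}
    for a in product((0, 1), repeat=num_vars):
        w = 1.0
        for indices, psi in cliques:
            w *= psi(tuple(a[j] for j in indices))
        weights[a] = w
    Z = sum(weights.values())      # partition function
    return {a: w / Z for a, w in weights.items()}

def agree(values):
    """Hypothetical potential that favors neighbors choosing the same action."""
    return 2.0 if values[0] == values[1] else 1.0

# Chain 0 - 1 - 2 with one potential per edge.
dist = markov_net_distribution(3, [((0, 1), agree), ((1, 2), agree)])
```

Enumeration is exponential in the number of players, so this only illustrates the semantics, not the efficient tree algorithm mentioned in the last bullet.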

Evolutionary Game Theory

• A different model of multiplayer games
• Assume an infinite population of players, but ones that meet in random, pairwise confrontations
• Assume symmetric payoff matrix $M$ (as in Hawks and Doves)
• Let $P$ be the distribution over actions induced by the (averaged) population mixed strategies $p_i$
• Then fitness of $p_i$ is expected return against $P$
• Assume evolutionary dynamics: the higher the fitness of $p_i$, the more offspring player $i$ has in the next generation

Evolutionary Stable Strategies

• Let $P$ be the population mixed strategy
• Let $Q$ be an invading "mutant" population
• Let $M(P, Q)$ be the expected payoff to a random player from $P$ facing a random player from $Q$
• Suppose population is $(1 - \epsilon)P + \epsilon Q$
• Fitness of incumbent population: $(1 - \epsilon)M(P, P) + \epsilon M(P, Q)$
• Fitness of invading population: $(1 - \epsilon)M(Q, P) + \epsilon M(Q, Q)$
• Say $P$ is an ESS if for any $Q \neq P$ and sufficiently small $\epsilon > 0$,
  $(1 - \epsilon)M(P, P) + \epsilon M(P, Q) > (1 - \epsilon)M(Q, P) + \epsilon M(Q, Q)$
• Either $M(P, P) > M(Q, P)$, or $M(P, P) = M(Q, P)$ and $M(P, Q) > M(Q, Q)$

ESS for Hawks and Doves

• Recall payoff matrix ($V < C$):

  payoffs    hawk                dove
  hawk       (V-C)/2, (V-C)/2    V, 0
  dove       0, V                V/2, V/2

• ESS: $P(\text{hawk}) = V/C$
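
The ESS claim can be checked against the two-part condition from the previous slide. The sketch below represents a population by its probability of playing hawk and uses exact fractions, with the illustrative values $V = 2$, $C = 6$ (assumptions; the function names are hypothetical). It tests a grid of mutant populations rather than all of them, so it is a sanity check, not a proof.

```python
from fractions import Fraction

def M(p, q, V, C):
    """Expected payoff to a player from population p (Pr[hawk] = p) facing one from q."""
    return (p * q * (V - C) / 2        # hawk meets hawk
            + p * (1 - q) * V          # hawk meets dove
            + (1 - p) * (1 - q) * V / 2)  # dove meets dove (dove vs hawk pays 0)

def resists_invasion(p, q, V, C):
    """ESS condition for incumbent p against mutant q != p."""
    if M(p, p, V, C) > M(q, p, V, C):
        return True
    return M(p, p, V, C) == M(q, p, V, C) and M(p, q, V, C) > M(q, q, V, C)

V, C = Fraction(2), Fraction(6)
p_star = V / C

# Mutants with Pr[hawk] in {0, 0.1, ..., 1}; none of these equals V/C = 1/3.
mutants = [Fraction(k, 10) for k in range(11)]
mixed_is_stable = all(resists_invasion(p_star, q, V, C) for q in mutants)

# An all-hawk population is invaded by doves, since (V - C)/2 < 0.
hawks_are_stable = resists_invasion(Fraction(1), Fraction(0), V, C)
```

At `p_star` every mutant earns the same as the incumbent against the incumbent population, so stability rests entirely on the second clause, $M(P, Q) > M(Q, Q)$.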

Remarks on ESS

• Do not always exist!
• Special type of (symmetric) NE
• Biological field studies
• Sources of randomization
• Mixed strategies vs. population averages
• Market models

Richer Game Representations

• Have said quite a lot about single-shot matrix games
• What about:
  - Repeated games
  - Games with state (chess, checkers)
  - Stochastic games (multi-player MDPs)
• Can always (painfully) express in normal form
• Normal form equilibria concepts relevant

Repeated Games

• Still have underlying game matrices
• Now play the single-shot game repeatedly, examine cumulative or average reward
• Game has no internal state (though players might)
• Relevant detail: how many rounds of play?

Learning in Repeated Games

• "Classical" algorithms:
  - Fictitious Play: best response to empirical distribution of opponent play
  - Various (stochastic) gradient approaches
• Common question: when will such dynamics converge to NE?
• Positive results fairly restrictive
• Generalizations to parametric strategy representations?

Exponential Updates and Regret Minimization

• View repeated play as a sequence of trials against an arbitrary opponent
• Maintain a weight on each pure strategy
• On each trial, multiply each weight by a factor exponentially decreasing in its regret
• General setting: near-minimization of regret on sequence, but no guarantee of NE
• Zero-sum case: two "copies" will converge to NE
• Regret minimization and NE vs. CE
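
The zero-sum claim can be seen in a small experiment. The sketch below runs two copies of an exponential-weights (Hedge-style) update against each other in Matching Pennies, a game chosen here for illustration. It multiplies each weight by $\exp(\eta \cdot \text{payoff})$, which is the same update up to normalization as a factor decreasing in regret; the learning rate, round count, and starting weights are assumptions. It is the time-averaged strategies, not the last iterates, that approach the NE.

```python
import math

def hedge_self_play(M, rounds, eta, w1, w2):
    """Two exponential-weights learners in a zero-sum game; returns average strategies.

    M[a][b] is the payoff to player 1; player 2 receives -M[a][b].
    Each learner sees the expected payoff of every pure action against the
    opponent's current mixed strategy (full-information feedback).
    """
    m1, m2 = len(M), len(M[0])
    avg1, avg2 = [0.0] * m1, [0.0] * m2
    for _ in range(rounds):
        s1, s2 = sum(w1), sum(w2)
        p1 = [w / s1 for w in w1]
        p2 = [w / s2 for w in w2]
        for a in range(m1):
            avg1[a] += p1[a] / rounds
        for b in range(m2):
            avg2[b] += p2[b] / rounds
        u1 = [sum(M[a][b] * p2[b] for b in range(m2)) for a in range(m1)]
        u2 = [-sum(M[a][b] * p1[a] for a in range(m1)) for b in range(m2)]
        # Multiplicative update; starting from the normalized p keeps weights bounded.
        w1 = [p1[a] * math.exp(eta * u1[a]) for a in range(m1)]
        w2 = [p2[b] * math.exp(eta * u2[b]) for b in range(m2)]
    return avg1, avg2

# Matching Pennies: the unique NE has both players mixing 50/50.
PENNIES = [[1.0, -1.0], [-1.0, 1.0]]

# Deliberately lopsided starting weights, so the learners begin far from the NE.
avg1, avg2 = hedge_self_play(PENNIES, rounds=20000, eta=0.02,
                             w1=[0.9, 0.1], w2=[0.2, 0.8])
```

In general-sum games the same procedure still keeps each player's regret small, but as the slide notes this does not by itself guarantee convergence to a NE.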

Repeated Games and Bounded Rationality

• Consider restricting the complexity of strategies in $T$ rounds of a repeated game
• Example: next action computed by a finite state machine on the history of play so far
• New equilibria may arise from the restriction
• Prisoner's Dilemma: if number of states is $o(\log(T))$, mutual cooperation (denial) becomes a NE

Games with State

• Standard board games: chess, checkers
• Often feature partial or hidden information (poker)
• Might involve randomization (backgammon)

Stochastic Games

• Generalize MDPs to multiple players
• At each state $s$, have payoff matrix $M_i^s$ for player $i$
• Immediate reward to $i$ at state $s$ under joint action $\vec{a}$ is $M_i^s(\vec{a})$
• Markovian dynamics: $P(s' \mid s, \vec{a})$
• Discounted sum of rewards
• Every player has a policy $\pi_i(s)$
• Generalize optimal policy to (Nash) equilibrium $(\pi_1, \ldots, \pi_n)$
• Don't just have to worry about influence on future state, but everyone else's policy
• Exploration even more challenging

Stochastic Games and RL

• For fixed policies of opponents, can define value functions
• What happens when independent Q-learners play?
• Results with different amounts and type of shared info
• Generalization of $E^3$ algorithm to stochastic games
• Generalization of sparse sampling methods

Conclusions

• Classical game theory a rich and varied formalism for strategic reasoning, a complement to more passive reasoning
• Like probability theory, provides sound foundations but lacks emphasis on representation and computation
• Computational game theory aims to provide these emphases
• Many substantive connections to NIPS topics already under way (graphical models, learning algorithms, dynamical systems, reinforcement learning)...
• ...but even more lie ahead.
• Come find me to chat about open problems!

Contact Information

• Email: [email protected]
• Web: www.cis.upenn.edu/~mkearns
• This tutorial: www.cis.upenn.edu/~mkearns/nips02tutorial
  - will morph into Penn course page
• COLT/SVM 2003 special session on game theory
