1 learning by duopoly agents steve kimbrough fred murphy informs, november 7, 2006, 8:00-9:30 file:...

1

Learning by Duopoly Agents

Steve Kimbrough

Fred Murphy

INFORMS, November 7, 2006, 8:00-9:30

File: kimbrough-murphy-informs-2006fm-1.ppt

2

Abstract as Published

Title: Learning by Duopoly Agents in Bidding for Day-Ahead Electricity Supply

Presenting Author: Fred Murphy, Professor, Temple University, The Fox School of Business, 108 Speakman Hall, 1810 N. 13th Street, Philadelphia PA 19122, United States, [email protected]

Co-Author: Steve Kimbrough, Professor, University of Pennsylvania, 3730 Walnut Street, Suite 500, Philadelphia PA 19104, United States, [email protected]

Abstract: Standardly, bids by distinct firms in the day-ahead market for electricity are combined to produce a kinked supply curve. We report results from an agent-based model in which a stochastic demand curve for electricity is given exogenously and in which two agents learn to bid to supply electricity. We report on the design of the model, the behavior of various learning regimes, and the conditions under which tacit collusion by the bidding agents may be arrived at, sustained, and destroyed.

3

Problems with Classic Economic Models

• Stylized so that it is possible to derive analytic results (the world is complex)

• Assumes actors have full information (no one has this)

• Assumes actors have a clear objective function to maximize profits (after one accounting course, it is clear that the definition of profit is unclear)

4

Properties of a Basic Economic Agent

• Has a measure of success

• Has a data stream to measure its success

• Does simple experiments to learn how its actions affect success, or observes the consequences of outside sources of variation

• Operates in a potentially noisy environment

We term this Probe and Adjust (PandA)

Note that the first three properties are the minimal set of properties for an economic agent to improve

5

PandA as an Algorithm

• Adjusts a continuous parameter (price, quantity, etc.)

• Parameters: currentLevel, delta, epsilon, and epochLength

• Activity proceeds in epochs, each made up of episodes. In each episode the agent plays (“bids”) currentLevel + e, where e is drawn uniformly from [-delta, delta]. The agent records its returns from playing above or below currentLevel.

• After epochLength episodes, the current epoch concludes, and the agent adjusts currentLevel up or down by epsilon, depending on whether playing up or down yielded better rewards.

NB. PandA agents explore & exploit, with the tradeoff specified by the PandA parameters (which can also be learned).
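The PandA steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the class name `PandAAgent`, the helper `run`, the use of mean rewards to decide which direction was "better", and the rule that ties leave currentLevel unchanged are all choices made here for the example. The toy reward function with a peak at 3.0 is likewise hypothetical.

```python
import random


class PandAAgent:
    """Probe-and-Adjust learner for a single continuous parameter (sketch)."""

    def __init__(self, current_level, delta, epsilon, epoch_length, seed=None):
        self.current_level = current_level  # base value the agent probes around
        self.delta = delta                  # half-width of the probe interval
        self.epsilon = epsilon              # step applied at the end of an epoch
        self.epoch_length = epoch_length    # number of episodes per epoch
        self._rng = random.Random(seed)
        self._offset = 0.0
        self._up, self._down = [], []       # rewards from probes above / below

    def bid(self):
        """Play currentLevel + e, with e drawn uniformly from [-delta, delta]."""
        self._offset = self._rng.uniform(-self.delta, self.delta)
        return self.current_level + self._offset

    def record(self, reward):
        """Store the reward for the last bid; adjust once the epoch is full."""
        (self._up if self._offset >= 0 else self._down).append(reward)
        if len(self._up) + len(self._down) >= self.epoch_length:
            self._adjust()

    def _adjust(self):
        # Assumption: "better" means a higher mean reward. Ties, or an epoch
        # in which all probes fell on one side, leave currentLevel unchanged.
        if self._up and self._down:
            mean_up = sum(self._up) / len(self._up)
            mean_down = sum(self._down) / len(self._down)
            if mean_up > mean_down:
                self.current_level += self.epsilon
            elif mean_down > mean_up:
                self.current_level -= self.epsilon
        self._up, self._down = [], []


def run(agent, reward_fn, episodes):
    """Let the agent bid `episodes` times against a reward function."""
    for _ in range(episodes):
        agent.record(reward_fn(agent.bid()))
    return agent.current_level


# Toy reward with a known peak at 3.0 (illustrative only).
agent = PandAAgent(current_level=0.0, delta=0.5, epsilon=0.1,
                   epoch_length=20, seed=1)
print(run(agent, lambda x: -(x - 3.0) ** 2, episodes=8000))
```

In this sketch delta controls how widely the agent explores and epsilon how quickly it moves, which is one way to read the explore/exploit tradeoff noted above.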

6

Three Market Contexts

• Monopoly

• Oligopoly

• Perfect competition

We are focusing on oligopoly, where the theory is admittedly unsatisfactory.

But first…monopoly

7

Monopoly

• Market context: agent is the only supplier

• Agent properties:

– Measure of success is classic definition of profits

– The data stream is the profits associated with the quantity offered (Agent does not know the demand curve)

– Agent tries different quantities and adjusts the base quantity around which it experiments

8

9

Monopoly Results

• Simple model, PandA, quickly arrives at the vicinity of the monopoly position.

• PandA is robust under stochasticity. Can track random-walk changes in the demand function.

• PandA performance depends on the parameters delta, epsilon, and epochLength, but these can be tuned or learned by the agent.

• Key point: Here is ONE learning policy that is computationally & epistemically undemanding and that arrives at something close to the monopoly position. Reproduces classical theory with much less stringent assumptions.
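For reference, the "monopoly position" can be made concrete with a textbook benchmark. The slides do not specify a demand or cost function, so the linear inverse demand and constant unit cost below, and the symbols a, b, and c, are assumptions made only for illustration.

```latex
\begin{align*}
P(Q) &= a - bQ, \qquad \pi(Q) = (a - bQ - c)\,Q \\
\frac{d\pi}{dQ} &= a - c - 2bQ = 0
  \;\Rightarrow\; Q^{m} = \frac{a - c}{2b}, \qquad P^{m} = \frac{a + c}{2}
\end{align*}
```

Under these assumptions a PandA agent that observes only its own profits would be expected to hover near Q^m without ever being told a, b, or c.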

10

Duopoly & Oligopoly: Cournot Competition

• Market structure: two or more agents offer quantities into the marketplace

• Agent properties:

– Measure of success is the classic profit calculation

– Data stream for each player is its decisions and profits

– Each agent tries different quantities and adjusts the base quantity around which it experiments

11

Cournot Results

• PandA players quickly arrive at the vicinity of the Cournot equilibrium.

• PandA is robust under stochasticity. Can track random-walk changes in the demand function.

• PandA performance depends on the parameters delta, epsilon, and epochLength, but these can be tuned by the agent.

• Key point: Reproduces classical theory with much less stringent assumptions.
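A small simulation can illustrate the Cournot setting described above. It is a sketch under assumptions that are not on the slides: linear inverse demand `price = a - b * (q1 + q2)`, constant unit cost `c`, a single epoch clock shared by both agents, and mean profit as the comparison rule. The function names and default parameter values are hypothetical. The textbook symmetric equilibrium quantity for this demand, (a - c) / ((n + 1) b), is included only as a benchmark.

```python
import random


def cournot_panda(a=10.0, b=1.0, c=1.0, start=(1.0, 1.0), delta=0.3,
                  epsilon=0.05, epoch_length=40, epochs=300, seed=0):
    """Two PandA agents choose quantities; returns their final base quantities.

    Assumed market (not from the slides): price = a - b * total quantity,
    constant unit cost c, and all agents share one epoch clock.
    """
    rng = random.Random(seed)
    levels = list(start)
    for _ in range(epochs):
        up = [[] for _ in levels]      # profits when probing above the base
        down = [[] for _ in levels]    # profits when probing below the base
        for _ in range(epoch_length):
            offsets = [rng.uniform(-delta, delta) for _ in levels]
            bids = [max(0.0, q + e) for q, e in zip(levels, offsets)]
            price = max(0.0, a - b * sum(bids))
            for i, (q, e) in enumerate(zip(bids, offsets)):
                profit = (price - c) * q   # each agent sees only its own profit
                (up[i] if e >= 0 else down[i]).append(profit)
        # End of epoch: each agent moves toward the side with higher mean profit.
        for i in range(len(levels)):
            if up[i] and down[i]:
                diff = sum(up[i]) / len(up[i]) - sum(down[i]) / len(down[i])
                if diff > 0:
                    levels[i] += epsilon
                elif diff < 0:
                    levels[i] -= epsilon
    return levels


def cournot_equilibrium_quantity(a, b, c, n=2):
    """Textbook symmetric Cournot quantity per firm for linear demand."""
    return (a - c) / ((n + 1) * b)


print(cournot_panda(), cournot_equilibrium_quantity(10.0, 1.0, 1.0))
```

Neither agent is given the demand curve or the rival's quantity; the rival's probing simply shows up as noise in each agent's own profit stream.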

13

Duopoly & Oligopoly: Bertrand Competition

• Market structure: two or more players offer prices. The agent with the lowest price wins the whole market

• Agent properties:

– Measure of success is the classic profit calculation

– Data stream is its decisions and profits

– Agent tries different prices and adjusts the base price around which it experiments.

15

Bertrand Results

• Players stochastically split the market at the monopoly price

• This is the reverse of classic Bertrand

• The reason is simple: at the competitive equilibrium, profits are 0. A nonzero probability of nonzero profits leads agents to raise prices. Through their random choices they split the market.
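The reasoning above can be illustrated with a minimal payoff sketch for the winner-takes-all rule. The function name `bertrand_profits`, the linear demand `max(0, a - b * price)`, the constant unit cost, and the equal split among sellers tied at the lowest price are all assumptions for this example; the slides state only that the lowest price wins the whole market.

```python
def bertrand_profits(prices, cost=1.0, a=10.0, b=1.0):
    """Profit of each seller when the lowest price takes the whole market.

    Assumptions (not from the slides): demand is max(0, a - b * price),
    unit cost is constant, and sellers tied at the lowest price share
    demand equally.
    """
    low = min(prices)
    winners = [i for i, p in enumerate(prices) if p == low]
    demand = max(0.0, a - b * low)
    share = demand / len(winners)
    return [(low - cost) * share if i in winners else 0.0
            for i in range(len(prices))]


# At the competitive price (price == cost) nobody earns anything ...
print(bertrand_profits([1.0, 1.0]))
# ... so a probe above cost cannot do worse than zero, and it earns a
# positive profit whenever the rival's probe happens to land even higher.
print(bertrand_profits([1.2, 1.5]))
# A probe below cost that wins the market loses money.
print(bertrand_profits([0.8, 1.0]))
```

Under these assumptions, an agent sitting at the competitive price sees upward probes pay at least as well as downward ones, which is consistent with the upward drift in prices reported on the slide.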

16

Further Bertrand Results

• PandA players bid prices

• But now there are more than 2 players. What happens?

• Depends on the number of players and on the epochLengths they use.

• Broadly: if players are added and/or everyone’s epochLength is shortened, a tipping point is eventually reached and the agents “race to the bottom”, achieving the classic Bertrand result.

• If one or a few players are more “patient” (have longer epochLengths), this will mitigate the effect, even in the presence of impatient players.

17

18

Where We Stand

• Results both confirm and differ from classic models

• Market designs we have used are simple

• We can show mathematically that the agents are essentially calculating stochastic gradients and moving in the optimizing direction. Thus, our current results can be derived using classic economic tools

• Any early losses from experimenting are more than compensated by higher long-run returns from learning (the explore/exploit tradeoff)
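One way to see the stochastic-gradient claim is the following sketch. It assumes a twice-differentiable expected reward π, probes e drawn uniformly from [-δ, δ] around the current level x, and a comparison of mean rewards from upward and downward probes; the notation is introduced here and does not come from the slides.

```latex
\begin{align*}
\Delta(x) &= \mathbb{E}[\pi(x+e) \mid e > 0] - \mathbb{E}[\pi(x+e) \mid e < 0] \\
\pi(x+e) &\approx \pi(x) + \pi'(x)\,e + \tfrac{1}{2}\pi''(x)\,e^{2} \\
\mathbb{E}[e \mid e > 0] &= \tfrac{\delta}{2}, \quad
\mathbb{E}[e \mid e < 0] = -\tfrac{\delta}{2}, \quad
\mathbb{E}[e^{2} \mid e > 0] = \mathbb{E}[e^{2} \mid e < 0] = \tfrac{\delta^{2}}{3} \\
\Delta(x) &\approx \pi'(x)\,\delta \\
x &\leftarrow x + \epsilon \,\operatorname{sign}\bigl(\hat{\Delta}(x)\bigr)
\end{align*}
```

The second-order terms cancel because they are the same on both sides, so the sample difference estimates the sign of the gradient, and each epoch ends with a fixed-size step of epsilon in that direction.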

19

Where We Want to Go

• More complex markets with no simple analytic results

• Agency theory results tested using incented agents

• A richer set of behaviors to better understand the consequences of market power and the potential for tacit collusion.

• Alternative agents with minimum capability

20

A Framework for Future Research

• Start with agents having the minimum capability and restrict added capabilities to those of real managers (e.g., schemas for organizing data streams)

• Reproduce basic economic results

– If the results differ, prove why.

• Only then move to complicated markets where analytic results cannot be derived

An analogy is the relationship between queuing theory and simulation
