
Stackelberg Plans

Thomas J. Sargent and John Stachurski

February 7, 2021

1 Contents

• Overview 2
• Duopoly 3
• The Stackelberg Problem 4
• Stackelberg Plan 5
• Recursive Representation of Stackelberg Plan 6
• Computing the Stackelberg Plan 7
• Exhibiting Time Inconsistency of Stackelberg Plan 8
• Recursive Formulation of the Follower's Problem 9
• Markov Perfect Equilibrium 10
• MPE vs. Stackelberg 11

In addition to what's in Anaconda, this lecture will need the following libraries:

In [1]: !pip install --upgrade quantecon

2 Overview

This notebook formulates and computes a plan that a Stackelberg leader uses to manipulate forward-looking decisions of a Stackelberg follower that depend on continuation sequences of decisions made once and for all by the Stackelberg leader at time 0.

To facilitate computation and interpretation, we formulate things in a context that allows us to apply dynamic programming for linear-quadratic models.

From the beginning, we carry along a linear-quadratic model of duopoly in which firms face adjustment costs that make them want to forecast actions of other firms that influence future prices.

Letโ€™s start with some standard imports:

In [2]: import numpy as np
import numpy.linalg as la
import quantecon as qe
from quantecon import LQ
import matplotlib.pyplot as plt
%matplotlib inline



3 Duopoly

Time is discrete and is indexed by $t = 0, 1, \ldots$.

Two firms produce a single good whose demand is governed by the linear inverse demand curve

$$
p_t = a_0 - a_1 (q_{1t} + q_{2t})
$$

where $q_{it}$ is output of firm $i$ at time $t$ and $a_0$ and $a_1$ are both positive.

$q_{10}, q_{20}$ are given numbers that serve as initial conditions at time 0.

By incurring a cost of change

$$
\gamma v_{it}^2
$$

where $\gamma > 0$, firm $i$ can change its output according to

$$
q_{it+1} = q_{it} + v_{it}
$$

Firm $i$'s profits at time $t$ equal

$$
\pi_{it} = p_t q_{it} - \gamma v_{it}^2
$$

Firm $i$ wants to maximize the present value of its profits

$$
\sum_{t=0}^\infty \beta^t \pi_{it}
$$

where $\beta \in (0, 1)$ is a time discount factor.

3.1 Stackelberg Leader and Follower

Each firm $i = 1, 2$ chooses a sequence $\vec q_i \equiv \{q_{it+1}\}_{t=0}^\infty$ once and for all at time 0.

We let firm 2 be a Stackelberg leader and firm 1 be a Stackelberg follower.

The leader firm 2 goes first and chooses $\{q_{2t+1}\}_{t=0}^\infty$ once and for all at time 0.

Knowing that firm 2 has chosen $\{q_{2t+1}\}_{t=0}^\infty$, the follower firm 1 goes second and chooses $\{q_{1t+1}\}_{t=0}^\infty$ once and for all at time 0.

In choosing $\vec q_2$, firm 2 takes into account that firm 1 will base its choice of $\vec q_1$ on firm 2's choice of $\vec q_2$.


3.2 Abstract Statement of the Leader's and Follower's Problems

We can express firm 1's problem as

$$
\max_{\vec q_1} \Pi_1(\vec q_1; \vec q_2)
$$

where the appearance behind the semi-colon indicates that $\vec q_2$ is given.

Firm 1's problem induces the best response mapping

$$
\vec q_1 = B(\vec q_2)
$$

(Here $B$ maps a sequence into a sequence.)

The Stackelberg leader's problem is

$$
\max_{\vec q_2} \Pi_2(B(\vec q_2), \vec q_2)
$$

whose maximizer is a sequence $\vec q_2$ that depends on the initial conditions $q_{10}, q_{20}$ and the parameters of the model $a_0, a_1, \gamma$.

This formulation captures key features of the model

• Both firms make once-and-for-all choices at time 0.
• This is true even though both firms are choosing sequences of quantities that are indexed by time.
• The Stackelberg leader chooses first within time 0, knowing that the Stackelberg follower will choose second within time 0.

While our abstract formulation reveals the timing protocol and equilibrium concept well, it obscures details that must be addressed when we want to compute and interpret a Stackelberg plan and the follower's best response to it.

To gain insights about these things, we study them in more detail.

3.3 Firms' Problems

Firm 1 acts as if firm 2's sequence $\{q_{2t+1}\}_{t=0}^\infty$ is given and beyond its control.

Firm 2 knows that firm 1 chooses second and takes this into account in choosing $\{q_{2t+1}\}_{t=0}^\infty$.

In the spirit of working backward, we study firm 1's problem first, taking $\{q_{2t+1}\}_{t=0}^\infty$ as given.

We can formulate firm 1's optimum problem in terms of the Lagrangian

$$
L = \sum_{t=0}^\infty \beta^t \left\{ a_0 q_{1t} - a_1 q_{1t}^2 - a_1 q_{1t} q_{2t} - \gamma v_{1t}^2 + \lambda_t \left[ q_{1t} + v_{1t} - q_{1t+1} \right] \right\}
$$

Firm 1 seeks a maximum with respect to $\{q_{1t+1}, v_{1t}\}_{t=0}^\infty$ and a minimum with respect to $\{\lambda_t\}_{t=0}^\infty$.

We approach this problem using methods described in Ljungqvist and Sargent RMT5 chapter 2, appendix A and Macroeconomic Theory, 2nd edition, chapter IX.


First-order conditions for this problem are

๐œ•๐ฟ๐œ•๐‘ž1๐‘ก

= ๐‘Ž0 โˆ’ 2๐‘Ž1๐‘ž1๐‘ก โˆ’ ๐‘Ž1๐‘ž2๐‘ก + ๐œ†๐‘ก โˆ’ ๐›ฝโˆ’1๐œ†๐‘กโˆ’1 = 0, ๐‘ก โ‰ฅ 1

๐œ•๐ฟ๐œ•๐‘ฃ1๐‘ก

= โˆ’2๐›พ๐‘ฃ1๐‘ก + ๐œ†๐‘ก = 0, ๐‘ก โ‰ฅ 0

These first-order conditions and the constraint $q_{1t+1} = q_{1t} + v_{1t}$ can be rearranged to take the form

๐‘ฃ1๐‘ก = ๐›ฝ๐‘ฃ1๐‘ก+1 + ๐›ฝ๐‘Ž02๐›พ โˆ’ ๐›ฝ๐‘Ž1

๐›พ ๐‘ž1๐‘ก+1 โˆ’ ๐›ฝ๐‘Ž12๐›พ ๐‘ž2๐‘ก+1

๐‘ž๐‘ก+1 = ๐‘ž1๐‘ก + ๐‘ฃ1๐‘ก

We can substitute the second equation into the first equation to obtain

(๐‘ž1๐‘ก+1 โˆ’ ๐‘ž1๐‘ก) = ๐›ฝ(๐‘ž1๐‘ก+2 โˆ’ ๐‘ž1๐‘ก+1) + ๐‘0 โˆ’ ๐‘1๐‘ž1๐‘ก+1 โˆ’ ๐‘2๐‘ž2๐‘ก+1

where ๐‘0 = ๐›ฝ๐‘Ž02๐›พ , ๐‘1 = ๐›ฝ๐‘Ž1

๐›พ , ๐‘2 = ๐›ฝ๐‘Ž12๐›พ .

This equation can in turn be rearranged to become the second-order difference equation

๐‘ž1๐‘ก + (1 + ๐›ฝ + ๐‘1)๐‘ž1๐‘ก+1 โˆ’ ๐›ฝ๐‘ž1๐‘ก+2 = ๐‘0 โˆ’ ๐‘2๐‘ž2๐‘ก+1 (1)

Equation (1) is a second-order difference equation in the sequence $\vec q_1$ whose solution we want.

It satisfies two boundary conditions:

• an initial condition that $q_{1,0}$ is given
• a terminal condition requiring that $\lim_{T \to +\infty} \beta^T q_{1t}^2 < +\infty$

Using the lag operators described in chapter IX of Macroeconomic Theory, Second edition (1987), difference equation (1) can be written as

$$
\beta \left( 1 - \frac{1 + \beta + c_1}{\beta} L + \beta^{-1} L^2 \right) q_{1t+2} = -c_0 + c_2 q_{2t+1}
$$

The polynomial in the lag operator on the left side can be factored as

$$
\left( 1 - \frac{1 + \beta + c_1}{\beta} L + \beta^{-1} L^2 \right) = (1 - \delta_1 L)(1 - \delta_2 L) \qquad (2)
$$

where $0 < \delta_1 < 1 < \frac{1}{\sqrt{\beta}} < \delta_2$.

Because $\delta_2 > \frac{1}{\sqrt{\beta}}$, the operator $(1 - \delta_2 L)$ contributes an unstable component if solved backwards but a stable component if solved forwards.

Mechanically, write

(1 โˆ’ ๐›ฟ2๐ฟ) = โˆ’๐›ฟ2๐ฟ(1 โˆ’ ๐›ฟโˆ’12 ๐ฟโˆ’1)

and compute the following inverse operator


[โˆ’๐›ฟ2๐ฟ(1 โˆ’ ๐›ฟโˆ’12 ๐ฟโˆ’1)]โˆ’1 = โˆ’๐›ฟ2(1 โˆ’ ๐›ฟ2

โˆ’1)โˆ’1๐ฟโˆ’1

Operating on both sides of equation (2) with $\beta^{-1}$ times this inverse operator gives the follower's decision rule for setting $q_{1t+1}$ in the feedback-feedforward form.

๐‘ž1๐‘ก+1 = ๐›ฟ1๐‘ž1๐‘ก โˆ’ ๐‘0๐›ฟโˆ’12 ๐›ฝโˆ’1 1

1 โˆ’ ๐›ฟโˆ’12

+ ๐‘2๐›ฟโˆ’12 ๐›ฝโˆ’1

โˆžโˆ‘๐‘—=0

๐›ฟ๐‘—2๐‘ž2๐‘ก+๐‘—+1, ๐‘ก โ‰ฅ 0 (3)

The problem of the Stackelberg leader firm 2 is to choose the sequence $\{q_{2t+1}\}_{t=0}^\infty$ to maximize its discounted profits

$$
\sum_{t=0}^\infty \beta^t \left\{ (a_0 - a_1 (q_{1t} + q_{2t})) q_{2t} - \gamma (q_{2t+1} - q_{2t})^2 \right\}
$$

subject to the sequence of constraints (3) for $t \geq 0$.

We can put a sequence $\{\theta_t\}_{t=0}^\infty$ of Lagrange multipliers on the sequence of equations (3) and formulate the following Lagrangian for the Stackelberg leader firm 2's problem

$$
\begin{aligned}
\tilde L &= \sum_{t=0}^\infty \beta^t \left\{ (a_0 - a_1 (q_{1t} + q_{2t})) q_{2t} - \gamma (q_{2t+1} - q_{2t})^2 \right\} \\
&\quad + \sum_{t=0}^\infty \beta^t \theta_t \left\{ \delta_1 q_{1t} - c_0 \delta_2^{-1} \beta^{-1} \frac{1}{1 - \delta_2^{-1}} + c_2 \delta_2^{-1} \beta^{-1} \sum_{j=0}^\infty \delta_2^{-j} q_{2t+j+1} - q_{1t+1} \right\}
\end{aligned} \qquad (4)
$$

subject to initial conditions for $q_{1t}, q_{2t}$ at $t = 0$.

Comments: We have formulated the Stackelberg problem in a space of sequences.

The max-min problem associated with Lagrangian (4) is unpleasant because the time $t$ component of firm 1's payoff function depends on the entire future of its choices of $\{q_{1t+j}\}_{j=0}^\infty$.

This renders a direct attack on the problem cumbersome.

Therefore, below, we will formulate the Stackelberg leader's problem recursively.

We'll put our little duopoly model into a broader class of models with the same conceptual structure.

4 The Stackelberg Problem

We formulate a class of linear-quadratic Stackelberg leader-follower problems of which our duopoly model is an instance.

We use the optimal linear regulator (a.k.a. the linear-quadratic dynamic programming problem described in LQ Dynamic Programming problems) to represent a Stackelberg leader's problem recursively.

Let $z_t$ be an $n_z \times 1$ vector of natural state variables.

Let $x_t$ be an $n_x \times 1$ vector of endogenous forward-looking variables that are physically free to jump at $t$.


In our duopoly example $x_t = v_{1t}$, the time $t$ decision of the Stackelberg follower.

Let $u_t$ be a vector of decisions chosen by the Stackelberg leader at $t$.

The $z_t$ vector is inherited physically from the past.

But $x_t$ is a decision made by the Stackelberg follower at time $t$ that is the follower's best response to the choice of an entire sequence of decisions made by the Stackelberg leader at time $t = 0$.

Let

๐‘ฆ๐‘ก = [๐‘ง๐‘ก๐‘ฅ๐‘ก

]

Represent the Stackelberg leader's one-period loss function as

$$
r(y, u) = y' R y + u' Q u
$$

Subject to an initial condition for $z_0$, but not for $x_0$, the Stackelberg leader wants to maximize

$$
-\sum_{t=0}^\infty \beta^t r(y_t, u_t) \qquad (5)
$$

The Stackelberg leader faces the model

[ ๐ผ 0๐บ21 ๐บ22

] [๐‘ง๐‘ก+1๐‘ฅ๐‘ก+1

] = [๐ด11 ๐ด12๐ด21 ๐ด22

] [๐‘ง๐‘ก๐‘ฅ๐‘ก

] + ๐‘ข๐‘ก (6)

We assume that the matrix [ ๐ผ 0๐บ21 ๐บ22

] on the left side of equation (6) is invertible, so that

we can multiply both sides by its inverse to obtain

[๐‘ง๐‘ก+1๐‘ฅ๐‘ก+1

] = [๐ด11 ๐ด12๐ด21 ๐ด22

] [๐‘ง๐‘ก๐‘ฅ๐‘ก

] + ๐ต๐‘ข๐‘ก (7)

or

๐‘ฆ๐‘ก+1 = ๐ด๐‘ฆ๐‘ก + ๐ต๐‘ข๐‘ก (8)

4.1 Interpretation of the Second Block of Equations

The Stackelberg follower's best response mapping is summarized by the second block of equations of (7).

In particular, these equations are the first-order conditions of the Stackelberg follower's optimization problem (i.e., its Euler equations).

These Euler equations summarize the forward-looking aspect of the follower's behavior and express how its time $t$ decision depends on the leader's actions at times $s \geq t$.


When combined with a stability condition to be imposed below, the Euler equations summarize the follower's best response to the sequence of actions by the leader.

The Stackelberg leader maximizes (5) by choosing sequences $\{u_t, x_t, z_{t+1}\}_{t=0}^\infty$ subject to (8) and an initial condition for $z_0$.

Note that we have an initial condition for $z_0$ but not for $x_0$.

$x_0$ is among the variables to be chosen at time 0 by the Stackelberg leader.

The Stackelberg leader uses its understanding of the responses restricted by (8) to manipulate the follower's decisions.

4.2 More Mechanical Details

For any vector $a_t$, define $\vec a_t = [a_t, a_{t+1}, \ldots]$.

Define a feasible set of $(\vec y_1, \vec u_0)$ sequences

$$
\Omega(y_0) = \left\{ (\vec y_1, \vec u_0) : y_{t+1} = A y_t + B u_t, \ \forall t \geq 0 \right\}
$$

Please remember that the follower's Euler equation is embedded in the system of dynamic equations $y_{t+1} = A y_t + B u_t$.

Note that in the definition of $\Omega(y_0)$, $y_0$ is taken as given.

Although it is taken as given in $\Omega(y_0)$, eventually, the $x_0$ component of $y_0$ will be chosen by the Stackelberg leader.

4.3 Two Subproblems

Once again we use backward induction.

We express the Stackelberg problem in terms of two subproblems.

Subproblem 1 is solved by a continuation Stackelberg leader at each date $t \geq 0$.

Subproblem 2 is solved by the Stackelberg leader at $t = 0$.

The two subproblems are designed

• to respect the protocol in which the follower chooses $\vec q_1$ after seeing $\vec q_2$ chosen by the leader
• to make the leader choose $\vec q_2$ while respecting that $\vec q_1$ will be the follower's best response to $\vec q_2$
• to represent the leader's problem recursively by artfully choosing the state variables confronting and the control variables available to the leader

4.3.1 Subproblem 1

๐‘ฃ(๐‘ฆ0) = max( ๐‘ฆ1,0)โˆˆฮฉ(๐‘ฆ0)

โˆ’โˆž

โˆ‘๐‘ก=0

๐›ฝ๐‘ก๐‘Ÿ(๐‘ฆ๐‘ก, ๐‘ข๐‘ก)


4.3.2 Subproblem 2

๐‘ค(๐‘ง0) = max๐‘ฅ0

๐‘ฃ(๐‘ฆ0)

Subproblem 1 takes the vector of forward-looking variables ๐‘ฅ0 as given.

Subproblem 2 optimizes over ๐‘ฅ0.

The value function ๐‘ค(๐‘ง0) tells the value of the Stackelberg plan as a function of the vector ofnatural state variables at time 0, ๐‘ง0.

4.4 Two Bellman Equations

We now describe Bellman equations for $v(y)$ and $w(z_0)$.

4.4.1 Subproblem 1

The value function $v(y)$ in subproblem 1 satisfies the Bellman equation

$$
v(y) = \max_{u, y^*} \left\{ -r(y, u) + \beta v(y^*) \right\} \qquad (9)
$$

where the maximization is subject to

$$
y^* = A y + B u
$$

and $y^*$ denotes next period's value.

Substituting $v(y) = -y' P y$ into Bellman equation (9) gives

$$
-y' P y = \max_{u, y^*} \left\{ -y' R y - u' Q u - \beta y^{*\prime} P y^* \right\}
$$

which, as in the linear regulator lecture, gives rise to the algebraic matrix Riccati equation

๐‘ƒ = ๐‘… + ๐›ฝ๐ดโ€ฒ๐‘ƒ๐ด โˆ’ ๐›ฝ2๐ดโ€ฒ๐‘ƒ๐ต(๐‘„ + ๐›ฝ๐ตโ€ฒ๐‘ƒ๐ต)โˆ’1๐ตโ€ฒ๐‘ƒ๐ด

and the optimal decision rule coefficient vector

๐น = ๐›ฝ(๐‘„ + ๐›ฝ๐ตโ€ฒ๐‘ƒ๐ต)โˆ’1๐ตโ€ฒ๐‘ƒ๐ด

where the optimal decision rule is

๐‘ข๐‘ก = โˆ’๐น๐‘ฆ๐‘ก


4.4.2 Subproblem 2

We find an optimal $x_0$ by equating to zero the gradient of $v(y_0)$ with respect to $x_0$:

$$
-2 P_{21} z_0 - 2 P_{22} x_0 = 0,
$$

which implies that

$$
x_0 = -P_{22}^{-1} P_{21} z_0
$$

5 Stackelberg Plan

Now let's map our duopoly model into the above setup.

We will formulate a state space system

๐‘ฆ๐‘ก = [๐‘ง๐‘ก๐‘ฅ๐‘ก

]

where in this instance ๐‘ฅ๐‘ก = ๐‘ฃ1๐‘ก, the time ๐‘ก decision of the follower firm 1.

5.1 Calculations to Prepare Duopoly Model

Now we'll proceed to cast our duopoly model within the framework of the more general linear-quadratic structure described above.

That will allow us to compute a Stackelberg plan simply by enlisting a Riccati equation to solve a linear-quadratic dynamic program.

As emphasized above, firm 1 acts as if firm 2's decisions $\{q_{2t+1}, v_{2t}\}_{t=0}^\infty$ are given and beyond its control.

5.2 Firm 1's Problem

We again formulate firm 1's optimum problem in terms of the Lagrangian

$$
L = \sum_{t=0}^\infty \beta^t \left\{ a_0 q_{1t} - a_1 q_{1t}^2 - a_1 q_{1t} q_{2t} - \gamma v_{1t}^2 + \lambda_t \left[ q_{1t} + v_{1t} - q_{1t+1} \right] \right\}
$$

Firm 1 seeks a maximum with respect to $\{q_{1t+1}, v_{1t}\}_{t=0}^\infty$ and a minimum with respect to $\{\lambda_t\}_{t=0}^\infty$.

First-order conditions for this problem are

๐œ•๐ฟ๐œ•๐‘ž1๐‘ก

= ๐‘Ž0 โˆ’ 2๐‘Ž1๐‘ž1๐‘ก โˆ’ ๐‘Ž1๐‘ž2๐‘ก + ๐œ†๐‘ก โˆ’ ๐›ฝโˆ’1๐œ†๐‘กโˆ’1 = 0, ๐‘ก โ‰ฅ 1

๐œ•๐ฟ๐œ•๐‘ฃ1๐‘ก

= โˆ’2๐›พ๐‘ฃ1๐‘ก + ๐œ†๐‘ก = 0, ๐‘ก โ‰ฅ 0


These first-order conditions and the constraint $q_{1t+1} = q_{1t} + v_{1t}$ can be rearranged to take the form

๐‘ฃ1๐‘ก = ๐›ฝ๐‘ฃ1๐‘ก+1 + ๐›ฝ๐‘Ž02๐›พ โˆ’ ๐›ฝ๐‘Ž1

๐›พ ๐‘ž1๐‘ก+1 โˆ’ ๐›ฝ๐‘Ž12๐›พ ๐‘ž2๐‘ก+1

๐‘ž๐‘ก+1 = ๐‘ž1๐‘ก + ๐‘ฃ1๐‘ก

We use these two equations as components of the following linear system that confronts a Stackelberg continuation leader at time $t$

$$
\begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
\frac{\beta a_0}{2\gamma} & -\frac{\beta a_1}{2\gamma} & -\frac{\beta a_1}{\gamma} & \beta
\end{bmatrix}
\begin{bmatrix} 1 \\ q_{2t+1} \\ q_{1t+1} \\ v_{1t+1} \end{bmatrix}
=
\begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 1 \\
0 & 0 & 0 & 1
\end{bmatrix}
\begin{bmatrix} 1 \\ q_{2t} \\ q_{1t} \\ v_{1t} \end{bmatrix}
+
\begin{bmatrix} 0 \\ 1 \\ 0 \\ 0 \end{bmatrix}
v_{2t}
$$

Time ๐‘ก revenues of firm 2 are ๐œ‹2๐‘ก = ๐‘Ž0๐‘ž2๐‘ก โˆ’ ๐‘Ž1๐‘ž22๐‘ก โˆ’ ๐‘Ž1๐‘ž1๐‘ก๐‘ž2๐‘ก which evidently equal

๐‘งโ€ฒ๐‘ก๐‘…1๐‘ง๐‘ก โ‰ก โŽกโŽข

โŽฃ

1๐‘ž2๐‘ก๐‘ž1๐‘ก

โŽคโŽฅโŽฆ

โ€ฒ

โŽกโŽขโŽฃ

0 ๐‘Ž02 0

๐‘Ž02 โˆ’๐‘Ž1 โˆ’๐‘Ž1

20 โˆ’๐‘Ž1

2 0โŽคโŽฅโŽฆ

โŽกโŽขโŽฃ

1๐‘ž2๐‘ก๐‘ž1๐‘ก

โŽคโŽฅโŽฆ

If we set $Q = \gamma$, then firm 2's period $t$ profits can then be written

$$
y_t' R y_t - Q v_{2t}^2
$$

where

$$
y_t = \begin{bmatrix} z_t \\ x_t \end{bmatrix}
$$

with $x_t = v_{1t}$ and

$$
R = \begin{bmatrix} R_1 & 0 \\ 0 & 0 \end{bmatrix}
$$

We'll report results of implementing this code soon.

But first, we want to represent the Stackelberg leader's optimal choices recursively.

It is important to do this for several reasons:

• properly to interpret a representation of the Stackelberg leader's choice as a sequence of history-dependent functions
• to formulate a recursive version of the follower's choice problem

First, let's get a recursive representation of the Stackelberg leader's choice of $\vec q_2$ for our duopoly model.


6 Recursive Representation of Stackelberg Plan

In order to attain an appropriate representation of the Stackelberg leader's history-dependent plan, we will employ what amounts to a version of the Big K, little k device often used in macroeconomics by distinguishing $z_t$, which depends partly on decisions $x_t$ of the followers, from another vector $\check z_t$, which does not.

We will use $\check z_t$ and its history $\check z^t = [\check z_t, \check z_{t-1}, \ldots, \check z_0]$ to describe the sequence of the Stackelberg leader's decisions that the Stackelberg follower takes as given.

Thus, we let $\check y_t' = \begin{bmatrix} \check z_t' & \check x_t' \end{bmatrix}$ with initial condition $\check z_0 = z_0$ given.

That we distinguish $\check z_t$ from $z_t$ is part and parcel of the Big K, little k device in this instance.

We have demonstrated that a Stackelberg plan for $\{u_t\}_{t=0}^\infty$ has a recursive representation

$$
\begin{aligned}
\check x_0 &= -P_{22}^{-1} P_{21} z_0 \\
u_t &= -F \check y_t, \quad t \geq 0 \\
\check y_{t+1} &= (A - BF) \check y_t, \quad t \geq 0
\end{aligned}
$$

From this representation, we can deduce the sequence of functions $\sigma = \{\sigma_t(\check z^t)\}_{t=0}^\infty$ that comprise a Stackelberg plan.

For convenience, let $\check A \equiv A - BF$ and partition $\check A$ conformably to the partition $\check y_t = \begin{bmatrix} \check z_t \\ \check x_t \end{bmatrix}$ as

[๐ด11 ๐ด12๐ด21 ๐ด22

]

Let ๐ป00 โ‰ก โˆ’๐‘ƒ โˆ’1

22 ๐‘ƒ21 so that ๐‘ฅ0 = ๐ป00 ๐‘ง0.

Then iterations on ๐‘ฆ๐‘ก+1 = ๐ด ๐‘ฆ๐‘ก starting from initial condition ๐‘ฆ0 = [ ๐‘ง0๐ป0

0 ๐‘ง0] imply that for

๐‘ก โ‰ฅ 1

๐‘ฅ๐‘ก =๐‘ก

โˆ‘๐‘—=1

๐ป๐‘ก๐‘— ๐‘ง๐‘กโˆ’๐‘—

where

๐ป๐‘ก1 = ๐ด21

๐ป๐‘ก2 = ๐ด22 ๐ด21

โ‹ฎ โ‹ฎ๐ป๐‘ก

๐‘กโˆ’1 = ๐ด๐‘กโˆ’222 ๐ด21

๐ป๐‘ก๐‘ก = ๐ด๐‘กโˆ’1

22 ( ๐ด21 + ๐ด22๐ป00 )

An optimal decision rule for the Stackelberg leader's choice of $u_t$ is

$$
u_t = -F \check y_t \equiv -\begin{bmatrix} F_z & F_x \end{bmatrix} \begin{bmatrix} \check z_t \\ \check x_t \end{bmatrix}
$$


or

๐‘ข๐‘ก = โˆ’๐น๐‘ง ๐‘ง๐‘ก โˆ’ ๐น๐‘ฅ๐‘ก

โˆ‘๐‘—=1

๐ป๐‘ก๐‘—๐‘ง๐‘กโˆ’๐‘— = ๐œŽ๐‘ก( ๐‘ง๐‘ก) (10)

Representation (10) confirms that whenever ๐น๐‘ฅ โ‰  0, the typical situation, the time ๐‘ก compo-nent ๐œŽ๐‘ก of a Stackelberg plan is history-dependent, meaning that the Stackelberg leaderโ€™schoice ๐‘ข๐‘ก depends not just on ๐‘ง๐‘ก but on components of ๐‘ง๐‘กโˆ’1.
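
As a concrete check of representation (10), the sketch below reconstructs the follower decision from the history of $\check z$'s using the $H_j^t$ matrices defined above. It is a hypothetical helper, not part of the lecture's code; it presumes the closed-loop matrix $\check A = A - BF$ and $H_0^0 = -P_{22}^{-1} P_{21}$ from the duopoly computations of the next section, where the $z$-block is 3-dimensional and the $x$-block is a scalar.

import numpy as np

def follower_x_from_history(A_check, H00, z_history, t):
    # Hypothetical helper: reconstruct the follower decision x_t at t >= 1
    # from the history [z_0, ..., z_{t-1}] using the H_j^t matrices above.
    # A_check = A - B @ F with a 3-dimensional z-block and a scalar x-block;
    # H00 = -P22^{-1} @ P21; z_history is a list of (3, 1) column vectors.
    A21 = A_check[3:, :3]
    A22 = A_check[3:, 3:]
    x_t = np.zeros((1, 1))
    for j in range(1, t + 1):
        if j < t:
            H = np.linalg.matrix_power(A22, j - 1) @ A21                  # H_j^t
        else:
            H = np.linalg.matrix_power(A22, t - 1) @ (A21 + A22 @ H00)    # H_t^t
        x_t += H @ z_history[t - j]
    return x_t

Once the computations of the next section have produced A, B, F, H_0_0 and the simulated path yt, one could verify, for example, that follower_x_from_history(A - B @ F, H_0_0, [yt[:3, [s]] for s in range(6)], 5) is close to yt[3, 5].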

6.1 Comments and Interpretations

After all, at the end of the day, it will turn out that because we set $\check z_0 = z_0$, it will be true that $\check z_t = z_t$ for all $t \geq 0$.

Then why did we distinguish $\check z_t$ from $z_t$?

The answer is that if we want to present to the Stackelberg follower a history-dependent representation of the Stackelberg leader's sequence $\vec q_2$, we must use representation (10) cast in terms of the history $\check z^t$ and not a corresponding representation cast in terms of $z^t$.

6.2 Dynamic Programming and Time Consistency of Follower's Problem

Given the sequence $\vec q_2$ chosen by the Stackelberg leader in our duopoly model, it turns out that the Stackelberg follower's problem is recursive in the natural state variables that confront a follower at any time $t \geq 0$.

This means that the follower's plan is time consistent.

To verify these claims, we'll formulate a recursive version of a follower's problem that builds on our recursive representation of the Stackelberg leader's plan and our use of the Big K, little k idea.

6.3 Recursive Formulation of a Follower's Problem

We now use what amounts to another "Big $K$, little $k$" trick (see rational expectations equilibrium) to formulate a recursive version of a follower's problem cast in terms of an ordinary Bellman equation.

Firm 1, the follower, faces $\{q_{2t}\}_{t=0}^\infty$ as a given quantity sequence chosen by the leader and believes that its output price at $t$ satisfies

$$
p_t = a_0 - a_1 (q_{1t} + q_{2t}), \quad t \geq 0
$$

Our challenge is to represent $\{q_{2t}\}_{t=0}^\infty$ as a given sequence.

To do so, recall that under the Stackelberg plan, firm 2 sets output according to the $q_{2t}$ component of

๐‘ฆ๐‘ก+1 =โŽกโŽขโŽขโŽฃ

1๐‘ž2๐‘ก๐‘ž1๐‘ก๐‘ฅ๐‘ก

โŽคโŽฅโŽฅโŽฆ


which is governed by

๐‘ฆ๐‘ก+1 = (๐ด โˆ’ ๐ต๐น)๐‘ฆ๐‘ก

To obtain a recursive representation of a $\{q_{2t}\}$ sequence that is exogenous to firm 1, we define a state $\tilde y_t$

$$
\tilde y_t = \begin{bmatrix} 1 \\ q_{2t} \\ \tilde q_{1t} \\ \tilde x_t \end{bmatrix}
$$

that evolves according to

๐‘ฆ๐‘ก+1 = (๐ด โˆ’ ๐ต๐น) ๐‘ฆ๐‘ก

subject to the initial condition ๐‘ž10 = ๐‘ž10 and ๐‘ฅ0 = ๐‘ฅ0 where ๐‘ฅ0 = โˆ’๐‘ƒ โˆ’122 ๐‘ƒ21 as stated above.

Firm 1's state vector is

$$
X_t = \begin{bmatrix} \tilde y_t \\ q_{1t} \end{bmatrix}
$$

It follows that the follower firm 1 faces the law of motion

[ ๐‘ฆ๐‘ก+1๐‘ž1๐‘ก+1

] = [๐ด โˆ’ ๐ต๐น 00 1] [ ๐‘ฆ๐‘ก

๐‘ž1๐‘ก] + [0

1] ๐‘ฅ๐‘ก (11)

This specification assures that from the point of view of firm 1, $q_{2t}$ is an exogenous process.

Here

โ€ข ๐‘ž1๐‘ก, ๐‘ฅ๐‘ก play the role of Big Kโ€ข ๐‘ž1๐‘ก, ๐‘ฅ๐‘ก play the role of little k

The time ๐‘ก component of firm 1โ€™s objective is

โ€ฒ๐‘ก๐‘ฅ๐‘ก โˆ’ ๐‘ฅ2

๐‘ก =โŽกโŽขโŽขโŽขโŽฃ

1๐‘ž2๐‘ก

๐‘ž1๐‘ก๐‘ฅ๐‘ก

๐‘ž1๐‘ก

โŽคโŽฅโŽฅโŽฅโŽฆ

โ€ฒ

โŽกโŽขโŽขโŽขโŽฃ

0 0 0 0 ๐‘Ž02

0 0 0 0 โˆ’๐‘Ž12

0 0 0 0 00 0 0 0 0๐‘Ž02 โˆ’๐‘Ž1

2 0 0 โˆ’๐‘Ž1

โŽคโŽฅโŽฅโŽฅโŽฆ

โŽกโŽขโŽขโŽขโŽฃ

1๐‘ž2๐‘ก

๐‘ž1๐‘ก๐‘ฅ๐‘ก

๐‘ž1๐‘ก

โŽคโŽฅโŽฅโŽฅโŽฆ

โˆ’ ๐›พ๐‘ฅ2๐‘ก

Firm 1's optimal decision rule is

$$
x_t = -\tilde F X_t
$$

and its state evolves according to

$$
X_{t+1} = (\tilde A - \tilde B \tilde F) X_t
$$


under its optimal decision rule.

Later we shall compute $\tilde F$ and verify that when we set

๐‘‹0 =โŽกโŽขโŽขโŽขโŽฃ

1๐‘ž20๐‘ž10๐‘ฅ0๐‘ž10

โŽคโŽฅโŽฅโŽฅโŽฆ

we recover

$$
x_0 = -\tilde F X_0
$$

which will verify that we have properly set up a recursive representation of the follower's problem facing the Stackelberg leader's $\vec q_2$.

6.4 Time Consistency of Follower's Plan

Since the follower can solve its problem using dynamic programming, its problem is recursive in what for it are the natural state variables, namely

$$
\begin{bmatrix} 1 \\ q_{2t} \\ q_{10} \\ x_0 \end{bmatrix}
$$

It follows that the follower's plan is time consistent.

7 Computing the Stackelberg Plan

Here is our code to compute a Stackelberg plan via a linear-quadratic dynamic program as outlined above

In [3]: # Parameters
a0 = 10
a1 = 2
β = 0.96
γ = 120
n = 300
tol0 = 1e-8
tol1 = 1e-16
tol2 = 1e-2

βs = np.ones(n)
βs[1:] = β
βs = βs.cumprod()


In [4]: # In LQ form
Alhs = np.eye(4)

# Euler equation coefficients
Alhs[3, :] = β * a0 / (2 * γ), -β * a1 / (2 * γ), -β * a1 / γ, β

Arhs = np.eye(4)
Arhs[2, 3] = 1

Alhsinv = la.inv(Alhs)

A = Alhsinv @ Arhs

B = Alhsinv @ np.array([[0, 1, 0, 0]]).T

R = np.array([[0, -a0 / 2, 0, 0],
              [-a0 / 2, a1, a1 / 2, 0],
              [0, a1 / 2, 0, 0],
              [0, 0, 0, 0]])

Q = np.array([[γ]])

# Solve using QE's LQ class
# LQ solves minimization problems which is why the sign of R and Q was changed
lq = LQ(Q, R, A, B, beta=β)
P, F, d = lq.stationary_values(method='doubling')

P22 = P[3:, 3:]
P21 = P[3:, :3]
P22inv = la.inv(P22)
H_0_0 = -P22inv @ P21

# Simulate forward
π_leader = np.zeros(n)

z0 = np.array([[1, 1, 1]]).T
x0 = H_0_0 @ z0
y0 = np.vstack((z0, x0))

yt, ut = lq.compute_sequence(y0, ts_length=n)[:2]

π_matrix = (R + F.T @ Q @ F)

for t in range(n):
    π_leader[t] = -(yt[:, t].T @ π_matrix @ yt[:, t])

# Display policies
print("Computed policy for Stackelberg leader\n")
print(f"F = {F}")

Computed policy for Stackelberg leader

F = [[-1.58004454 0.29461313 0.67480938 6.53970594]]


7.1 Implied Time Series for Price and Quantities

The following code plots the price and quantities

In [5]: q_leader = yt[1, :-1]
q_follower = yt[2, :-1]
q = q_leader + q_follower  # Total output, Stackelberg
p = a0 - a1 * q            # Price, Stackelberg

fig, ax = plt.subplots(figsize=(9, 5.8))
ax.plot(range(n), q_leader, 'b-', lw=2, label='leader output')
ax.plot(range(n), q_follower, 'r-', lw=2, label='follower output')
ax.plot(range(n), p, 'g-', lw=2, label='price')
ax.set_title('Output and prices, Stackelberg duopoly')
ax.legend(frameon=False)
ax.set_xlabel('t')
plt.show()

7.2 Value of Stackelberg Leader

We'll compute the present value earned by the Stackelberg leader.

We'll compute it two ways (they give identical answers – just a check on coding and thinking)

In [6]: v_leader_forward = np.sum(βs * π_leader)
v_leader_direct = -yt[:, 0].T @ P @ yt[:, 0]

# Display values
print("Computed values for the Stackelberg leader at t=0:\n")
print(f"v_leader_forward(forward sim) = {v_leader_forward:.4f}")
print(f"v_leader_direct (direct) = {v_leader_direct:.4f}")

Computed values for the Stackelberg leader at t=0:

v_leader_forward(forward sim) = 150.0316
v_leader_direct (direct) = 150.0324

In [7]: # Manually checks whether P is approximately a fixed point
P_next = (R + F.T @ Q @ F + β * (A - B @ F).T @ P @ (A - B @ F))
(P - P_next < tol0).all()

Out[7]: True

In [8]: # Manually checks whether two different ways of computing the
# value function give approximately the same answer
v_expanded = -((y0.T @ R @ y0 + ut[:, 0].T @ Q @ ut[:, 0] +
                β * (y0.T @ (A - B @ F).T @ P @ (A - B @ F) @ y0)))
(v_leader_direct - v_expanded < tol0)[0, 0]

Out[8]: True

8 Exhibiting Time Inconsistency of Stackelberg Plan

In the code below we compare two values

• the continuation value $-y_t' P y_t$ earned by a continuation Stackelberg leader who inherits state $y_t$ at $t$
• the value of a reborn Stackelberg leader who inherits state $z_t$ at $t$ and sets $x_t = -P_{22}^{-1} P_{21} z_t$

The difference between these two values is a tell-tale sign of the time inconsistency of the Stackelberg plan

In [9]: # Compute value function over time with a reset at time t
vt_leader = np.zeros(n)
vt_reset_leader = np.empty_like(vt_leader)

yt_reset = yt.copy()
yt_reset[-1, :] = (H_0_0 @ yt[:3, :])

for t in range(n):
    vt_leader[t] = -yt[:, t].T @ P @ yt[:, t]
    vt_reset_leader[t] = -yt_reset[:, t].T @ P @ yt_reset[:, t]

In [10]: fig, axes = plt.subplots(3, 1, figsize=(10, 7))

axes[0].plot(range(n+1), (- F @ yt).flatten(), 'bo',
             label='Stackelberg leader', ms=2)
axes[0].plot(range(n+1), (- F @ yt_reset).flatten(), 'ro',
             label='continuation leader at t', ms=2)
axes[0].set(title=r'Leader control variable $u_t$', xlabel='t')
axes[0].legend()

axes[1].plot(range(n+1), yt[3, :], 'bo', ms=2)
axes[1].plot(range(n+1), yt_reset[3, :], 'ro', ms=2)
axes[1].set(title=r'Follower control variable $x_t$', xlabel='t')

axes[2].plot(range(n), vt_leader, 'bo', ms=2)
axes[2].plot(range(n), vt_reset_leader, 'ro', ms=2)
axes[2].set(title=r'Leader value function $v(y_t)$', xlabel='t')

plt.tight_layout()
plt.show()

9 Recursive Formulation of the Follower's Problem

We now formulate and compute the recursive version of the follower's problem.

We check that the recursive Big $K$, little $k$ formulation of the follower's problem produces the same output path $\vec q_1$ that we computed when we solved the Stackelberg problem

In [11]: A_tilde = np.eye(5)
A_tilde[:4, :4] = A - B @ F

R_tilde = np.array([[0, 0, 0, 0, -a0 / 2],
                    [0, 0, 0, 0, a1 / 2],
                    [0, 0, 0, 0, 0],
                    [0, 0, 0, 0, 0],
                    [-a0 / 2, a1 / 2, 0, 0, a1]])

Q_tilde = Q
B_tilde = np.array([[0, 0, 0, 0, 1]]).T

lq_tilde = LQ(Q_tilde, R_tilde, A_tilde, B_tilde, beta=β)
P_tilde, F_tilde, d_tilde = lq_tilde.stationary_values(method='doubling')

y0_tilde = np.vstack((y0, y0[2]))
yt_tilde = lq_tilde.compute_sequence(y0_tilde, ts_length=n)[0]

In [12]: # Checks that the recursive formulation of the follower's problem gives
# the same solution as the original Stackelberg problem
fig, ax = plt.subplots()
ax.plot(yt_tilde[4], 'r', label="q_tilde")
ax.plot(yt_tilde[2], 'b', label="q")
ax.legend()
plt.show()

Note: Variables with _tilde are obtained from solving the follower's problem – those without are from the Stackelberg problem

In [13]: # Maximum absolute difference in quantities over time between
# the first and second solution methods
np.max(np.abs(yt_tilde[4] - yt_tilde[2]))

Out[13]: 6.661338147750939e-16

In [14]: # x0 == x0_tilde
yt[:, 0][-1] - (yt_tilde[:, 1] - yt_tilde[:, 0])[-1] < tol0

Out[14]: True


9.1 Explanation of Alignment

If we inspect the coefficients in the decision rule $-\tilde F$, we can spot the reason that the follower chooses to set $x_t = \tilde x_t$ when it sets $x_t = -\tilde F X_t$ in the recursive formulation of the follower's problem.

Can you spot what features of $\tilde F$ imply this?

Hint: remember the components of $X_t$

In [15]: # Policy function in the follower's problem
F_tilde.round(4)

Out[15]: array([[-0. , 0. , -0.1032, -1. , 0.1032]])

In [16]: # Value function in the Stackelberg problem
P

Out[16]: array([[  963.54083615,  -194.60534465,  -511.62197962, -5258.22585724],
       [ -194.60534465,    37.3535753 ,    81.97712513,   784.76471234],
       [ -511.62197962,    81.97712513,   247.34333344,  2517.05126111],
       [-5258.22585724,   784.76471234,  2517.05126111, 25556.16504097]])

In [17]: # Value function in the follower's problem
P_tilde

Out[17]: array([[-1.81991134e+01,  2.58003020e+00,  1.56048755e+01,  1.51229815e+02, -5.00000000e+00],
       [ 2.58003020e+00, -9.69465925e-01, -5.26007958e+00, -5.09764310e+01,  1.00000000e+00],
       [ 1.56048755e+01, -5.26007958e+00, -3.22759027e+01, -3.12791908e+02, -1.23823802e+01],
       [ 1.51229815e+02, -5.09764310e+01, -3.12791908e+02, -3.03132584e+03, -1.20000000e+02],
       [-5.00000000e+00,  1.00000000e+00, -1.23823802e+01, -1.20000000e+02,  1.43823802e+01]])

In [18]: # Manually check that P is an approximate fixed point
(P - ((R + F.T @ Q @ F) + β * (A - B @ F).T @ P @ (A - B @ F)) < tol0).all()

Out[18]: True

In [19]: # Compute `P_guess` using `F_tilde_star`
F_tilde_star = -np.array([[0, 0, 0, 1, 0]])
P_guess = np.zeros((5, 5))

for i in range(1000):
    P_guess = ((R_tilde + F_tilde_star.T @ Q @ F_tilde_star) +
               β * (A_tilde - B_tilde @ F_tilde_star).T @ P_guess
               @ (A_tilde - B_tilde @ F_tilde_star))

In [20]: # Value function in the follower's problem
-(y0_tilde.T @ P_tilde @ y0_tilde)[0, 0]


Out[20]: 112.65590740578058

In [21]: # Value function with `P_guess`
-(y0_tilde.T @ P_guess @ y0_tilde)[0, 0]

Out[21]: 112.6559074057807

In [22]: # Compute policy using policy iteration algorithm
F_iter = (β * la.inv(Q + β * B_tilde.T @ P_guess @ B_tilde)
          @ B_tilde.T @ P_guess @ A_tilde)

for i in range(100):
    # Compute P_iter
    P_iter = np.zeros((5, 5))
    for j in range(1000):
        P_iter = ((R_tilde + F_iter.T @ Q @ F_iter) + β
                  * (A_tilde - B_tilde @ F_iter).T @ P_iter
                  @ (A_tilde - B_tilde @ F_iter))

    # Update F_iter
    F_iter = (β * la.inv(Q + β * B_tilde.T @ P_iter @ B_tilde)
              @ B_tilde.T @ P_iter @ A_tilde)

dist_vec = (P_iter - ((R_tilde + F_iter.T @ Q @ F_iter)
            + β * (A_tilde - B_tilde @ F_iter).T @ P_iter
            @ (A_tilde - B_tilde @ F_iter)))

if np.max(np.abs(dist_vec)) < 1e-8:
    dist_vec2 = (F_iter - (β * la.inv(Q + β * B_tilde.T @ P_iter @ B_tilde)
                 @ B_tilde.T @ P_iter @ A_tilde))

    if np.max(np.abs(dist_vec2)) < 1e-8:
        F_iter
    else:
        print("The policy didn't converge: try increasing the number of \
outer loop iterations")
else:
    print("`P_iter` didn't converge: try increasing the number of inner \
loop iterations")

In [23]: # Simulate the system using `F_tilde_star` and check that it gives the
# same result as the original solution

yt_tilde_star = np.zeros((n, 5))
yt_tilde_star[0, :] = y0_tilde.flatten()

for t in range(n-1):
    yt_tilde_star[t+1, :] = (A_tilde - B_tilde @ F_tilde_star) \
                            @ yt_tilde_star[t, :]

fig, ax = plt.subplots()
ax.plot(yt_tilde_star[:, 4], 'r', label="q_tilde")
ax.plot(yt_tilde[2], 'b', label="q")
ax.legend()
plt.show()


In [24]: # Maximum absolute difference
np.max(np.abs(yt_tilde_star[:, 4] - yt_tilde[2, :-1]))

Out[24]: 0.0

10 Markov Perfect Equilibrium

The state vector is

๐‘ง๐‘ก = โŽกโŽขโŽฃ

1๐‘ž2๐‘ก๐‘ž1๐‘ก

โŽคโŽฅโŽฆ

and the state transition dynamics are

๐‘ง๐‘ก+1 = ๐ด๐‘ง๐‘ก + ๐ต1๐‘ฃ1๐‘ก + ๐ต2๐‘ฃ2๐‘ก

where ๐ด is a 3 ร— 3 identity matrix and

๐ต1 = โŽกโŽขโŽฃ

001โŽคโŽฅโŽฆ

, ๐ต2 = โŽกโŽขโŽฃ

010โŽคโŽฅโŽฆ

The Markov perfect decision rules are

๐‘ฃ1๐‘ก = โˆ’๐น1๐‘ง๐‘ก, ๐‘ฃ2๐‘ก = โˆ’๐น2๐‘ง๐‘ก


and in the Markov perfect equilibrium, the state evolves according to

๐‘ง๐‘ก+1 = (๐ด โˆ’ ๐ต1๐น1 โˆ’ ๐ต2๐น2)๐‘ง๐‘ก

In [25]: # In LQ form
A = np.eye(3)
B1 = np.array([[0], [0], [1]])
B2 = np.array([[0], [1], [0]])

R1 = np.array([[0, 0, -a0 / 2],
               [0, 0, a1 / 2],
               [-a0 / 2, a1 / 2, a1]])

R2 = np.array([[0, -a0 / 2, 0],
               [-a0 / 2, a1, a1 / 2],
               [0, a1 / 2, 0]])

Q1 = Q2 = γ
S1 = S2 = W1 = W2 = M1 = M2 = 0.0

# Solve using QE's nnash function
F1, F2, P1, P2 = qe.nnash(A, B1, B2, R1, R2, Q1,
                          Q2, S1, S2, W1, W2, M1,
                          M2, beta=β, tol=tol1)

# Simulate forward
AF = A - B1 @ F1 - B2 @ F2
z = np.empty((3, n))
z[:, 0] = 1, 1, 1
for t in range(n-1):
    z[:, t+1] = AF @ z[:, t]

# Display policies
print("Computed policies for firm 1 and firm 2:\n")
print(f"F1 = {F1}")
print(f"F2 = {F2}")

Computed policies for firm 1 and firm 2:

F1 = [[-0.22701363 0.03129874 0.09447113]]
F2 = [[-0.22701363 0.09447113 0.03129874]]

In [26]: q1 = z[1, :]
q2 = z[2, :]
q = q1 + q2        # Total output, MPE
p = a0 - a1 * q    # Price, MPE

fig, ax = plt.subplots(figsize=(9, 5.8))
ax.plot(range(n), q, 'b-', lw=2, label='total output')
ax.plot(range(n), p, 'g-', lw=2, label='price')
ax.set_title('Output and prices, duopoly MPE')
ax.legend(frameon=False)
ax.set_xlabel('t')
plt.show()


In [27]: # Computes the maximum difference between the two quantities of the two firms
np.max(np.abs(q1 - q2))

Out[27]: 6.8833827526759706e-15

In [28]: # Compute values
u1 = (- F1 @ z).flatten()
u2 = (- F2 @ z).flatten()

π_1 = p * q1 - γ * (u1) ** 2
π_2 = p * q2 - γ * (u2) ** 2

v1_forward = np.sum(βs * π_1)
v2_forward = np.sum(βs * π_2)

v1_direct = (- z[:, 0].T @ P1 @ z[:, 0])
v2_direct = (- z[:, 0].T @ P2 @ z[:, 0])

# Display values
print("Computed values for firm 1 and firm 2:\n")
print(f"v1(forward sim) = {v1_forward:.4f}; v1 (direct) = {v1_direct:.4f}")
print(f"v2 (forward sim) = {v2_forward:.4f}; v2 (direct) = {v2_direct:.4f}")

Computed values for firm 1 and firm 2:

v1(forward sim) = 133.3303; v1 (direct) = 133.3296
v2 (forward sim) = 133.3303; v2 (direct) = 133.3296


In [29]: # Sanity check
Λ1 = A - B2 @ F2
lq1 = qe.LQ(Q1, R1, Λ1, B1, beta=β)
P1_ih, F1_ih, d = lq1.stationary_values()

v2_direct_alt = - z[:, 0].T @ lq1.P @ z[:, 0] + lq1.d

(np.abs(v2_direct - v2_direct_alt) < tol2).all()

Out[29]: True

11 MPE vs. Stackelberg

In [30]: vt_MPE = np.zeros(n)
vt_follower = np.zeros(n)

for t in range(n):
    vt_MPE[t] = -z[:, t].T @ P1 @ z[:, t]
    vt_follower[t] = -yt_tilde[:, t].T @ P_tilde @ yt_tilde[:, t]

fig, ax = plt.subplots()
ax.plot(vt_MPE, 'b', label='MPE')
ax.plot(vt_leader, 'r', label='Stackelberg leader')
ax.plot(vt_follower, 'g', label='Stackelberg follower')
ax.set_title(r'MPE vs. Stackelberg Value Function')
ax.set_xlabel('t')
ax.legend(loc=(1.05, 0))
plt.show()

In [31]: # Display values
print("Computed values:\n")
print(f"vt_leader(y0) = {vt_leader[0]:.4f}")
print(f"vt_follower(y0) = {vt_follower[0]:.4f}")
print(f"vt_MPE(y0) = {vt_MPE[0]:.4f}")


Computed values:

vt_leader(y0) = 150.0324
vt_follower(y0) = 112.6559
vt_MPE(y0) = 133.3296

In [32]: # Compute the difference in total value between the Stackelberg and the MPE
vt_leader[0] + vt_follower[0] - 2 * vt_MPE[0]

Out[32]: -3.970942562087714

