Lecture 8, 11/08/07 Information, Control & Games, Fall 07, Copyright P. B. Luh, S.-C. Chang
Information, Control & Games: Lecture #8, Hierarchical Games

Last Two Times
• Finite Nash Games
  – Feedback Games and Behavior Strategies
• Infinite Nash Games
  – Open-Loop, Feedback, Closed-Loop Nash Equilibria
• Cooperative Games
  – Coalitional games
  – Redistribution of payoffs
  – Unanimity game and the core
  – Majority, vote trading, Landowner & workers game
  – Shapley Value under Differential Marginal Contributions
  – Cooperative Game and Risk

Next Time
• 11/15 No Class
• 11/22 Midterm exam
Today
• Finite Hierarchical Games
  – Motivating Examples
  – Solution Concept
  – Examples and Results on Finite Games
  – An Example of Single-Act Infinite Games
• The Inducible Region Approach
  – Approach for Single-Stage Problems
  – Principle of Optimality
  – Multi-Stage Games
• Team Decision Theory
  – A Motivating Example
  – A Formal Model and Solution Methodology
  – A Canonical Example
• Reading Assignments for Today
1. Text, Section 6.2
2. T. S. Chang and P. B. Luh, "Derivation of Necessary and Sufficient Conditions for Single-Stage Stackelberg Games via the Inducible Region Concept," IEEE Transactions on Automatic Control, Vol. AC-29, No. 1, Jan. 1984, pp. 63-66.
3. P. B. Luh, S. C. Chang, and T. S. Chang, "Solutions and Properties of Multi-Stage Stackelberg Games," Automatica, Vol. 20, No. 2, March 1984, pp. 251-256.
4. Relevant sections of T. Basar and G. J. Olsder, Dynamic Noncooperative Game Theory.
5. Y. C. Ho, "Team Decision Theory and Information Structures," Proceedings of the IEEE, Vol. 68, No. 6, June 1980, pp. 644-654.
Hierarchical Games: Motivating Examples
• Consider the Dating Game discussed previously:

  Gentleman \ Lady   Opera      Football
  Opera              (-1, -2)   (0, 0)
  Football           (0, 0)     (-2, -1)

– There are two Nash solutions, which are not equivalent (they do not have the same pair of costs) and not interchangeable (mixing the various Nash choices may not end up with a Nash solution)
Q. Suppose the lady is in a dominating position and can announce and then impose her strategy, knowing that the gentleman will react "rationally." What should she announce? Why?
– The lady would announce and impose Opera
– The DM holding the powerful position to announce and then impose his/her strategy is the Leader
– The Followers then react rationally to the Leader's strategy
– This is a Hierarchical, Leader-Follower, or Stackelberg Game
Q. Who is the leader? For example, after marriage, would the lady remain the leader?
– One earns the leader position, or holds it by the authority of a position (which is still earned)
Q. Other examples of hierarchical games?
– Grading for this course:
  • Allocation of 100 points: what is important
  • Homework: grading 2 randomly selected problems
  • Term project: extra credit for cross-discipline teaming
– What would happen if a student said, "I am not going to hand in any homework assignment for this course"?
– Other examples: seat belt laws, speed limits
Solution Concept
• Consider a two-person problem with 1 leader (DM1) and 1 follower (DM2)
  – DM1: strategy γ1 ∈ Γ1, cost function J1(γ1, γ2)
  – DM2: strategy γ2 ∈ Γ2, cost function J2(γ1, γ2)
SGD. How to describe the hierarchical game concept?
• For a given γ1, DM2 reacts rationally by minimizing his/her cost, i.e., min_{γ2∈Γ2} J2(γ1, γ2)
  – DM2's rational reaction set:
    R2(γ1) ≜ {ζ ∈ Γ2 | J2(γ1, ζ) ≤ J2(γ1, γ2) ∀γ2 ∈ Γ2}
• The leader needs to find the best strategy γ1S to minimize J1(γ1, γ2), taking into account the follower's reactions:
    min_{γ1∈Γ1} J1(γ1, γ2), s.t. γ2 ∈ R2(γ1)
Q. If you were DM1, what would you do?
– We need some behavioral assumptions
– The leader is assumed to be conservative, safeguarding against the worst case
Q. What if there are multiple elements in R2(γ1)?
Example:

  DM1 \ DM2   L         M        R
  L           (0, 0)    (1, 0)   (3, 1)
  R           (2, -1)   (2, 0)   (-1, -1)

– Select L, for the worst-case costs (1, 0). Mathematically?
    min_{γ1∈Γ1} max_{γ2∈R2(γ1)} J1(γ1, γ2)
  γ1S ~ the Stackelberg strategy for the leader
  J1S = J1(γ1S, γ2S) ~ the Stackelberg cost
Q. Is the problem easier or more difficult as compared to Nash?
• The problem is difficult even if R2(γ1) is a singleton for every γ1
  – It is difficult to characterize the reaction set R2(γ1)
  – It is difficult to optimize J1(γ1, γ2) s.t. γ2 ∈ R2(γ1)
Graphical Interpretation
Q. What is R2(u1)? What is the solution with DM1 as the leader (S1)? What is R1(u2)? The solution with DM2 as the leader (S2)? The Nash equilibrium (N)?

[Figure: u1-u2 plane with level curves for DM1 and DM2, the reaction curves R1 and R2, and the points S1, S2, and N]

Q. How do we compare S1 with N for DM1? Why?
– S1 is no worse than N for DM1 if R2(γ1) is a singleton for each γ1
– N is the intersection of R1 and R2, while S1 attains the best J1 on R2
– The same is true for DM2
Examples
Example 1. A Matrix Game
Q. What is the solution when DM1 is the leader? With DM2 as the leader? The Nash solutions?

  DM1 \ DM2   γ21        γ22       γ23       γ24
  L           (-4, -1)   (2, 0)    (0, 1)    (2, -1)
  M           (-3, -2)   (0, 3)    (0, -3)   (-3, -2)
  R           (4, -1)    (1, 0)    (1, 0)    (-2, -1)

– γ1S = M, with costs (0, -3)
– γ2S = γ24, with costs (-3, -2)
– The Nash equilibrium points (the intersection of R1 and R2) are (L, γ21) with costs (-4, -1) and (M, γ23) with costs (0, -3)
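The conservative leader computation can be sketched in a few lines of code. The following is a minimal illustration of the min-max rule (ours, not from the text): for each leader action it builds the follower's rational reaction set and applies the worst-case tie-break; the function name `stackelberg` and the dictionary encoding are our own choices.

```python
# Stackelberg solution of a finite bimatrix game with a conservative leader:
# for each leader action, find the follower's rational reaction set (all
# cost-minimizing replies), take the worst leader cost over that set, and
# let the leader pick the action minimizing this guaranteed cost.

def stackelberg(costs, leader=0):
    """costs[(a1, a2)] = (J1, J2); leader = 0 (DM1) or 1 (DM2)."""
    follower = 1 - leader
    acts = [sorted({k[i] for k in costs}) for i in (0, 1)]
    best_val, best_act = None, None
    for a in acts[leader]:
        def pair(b):  # costs when leader plays a and follower plays b
            return costs[(a, b)] if leader == 0 else costs[(b, a)]
        # Follower's rational reaction set against leader action a
        fmin = min(pair(b)[follower] for b in acts[follower])
        reaction = [b for b in acts[follower] if pair(b)[follower] == fmin]
        worst = max(pair(b)[leader] for b in reaction)  # conservative rule
        if best_val is None or worst < best_val:
            best_val, best_act = worst, a
    return best_act, best_val

# Example 1 above: rows L, M, R for DM1; columns 1..4 for DM2
G = {('L', 1): (-4, -1), ('L', 2): (2, 0), ('L', 3): (0, 1), ('L', 4): (2, -1),
     ('M', 1): (-3, -2), ('M', 2): (0, 3), ('M', 3): (0, -3), ('M', 4): (-3, -2),
     ('R', 1): (4, -1), ('R', 2): (1, 0), ('R', 3): (1, 0), ('R', 4): (-2, -1)}

print(stackelberg(G, leader=0))  # ('M', 0): DM1 leads, guaranteed J1 = 0
print(stackelberg(G, leader=1))  # (4, -2): DM2 leads with column 4, J2 = -2
```

Both outputs agree with the solutions stated above: M with costs (0, -3) when DM1 leads, and γ24 with costs (-3, -2) when DM2 leads.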
Example 2. A Game in Extensive Form

[Game tree: DM1 moves first (L or R); DM2 observes the move and then chooses L or R. Outcomes (J1, J2): (L, L) → (6, -6), (L, R) → (3, -7), (R, L) → (4, -10), (R, R) → (10, -5)]

Q. Who is the leader? What is the solution when DM1 is the leader? When DM2 is the leader? Nash?
– DM1 as the leader: γ1S = L, with costs (3, -7) ~ Easy
– DM2 can also be the leader ~ The leader does not have to move first; he has to announce his strategy first and then impose it
– DM2 has 2² = 4 strategies:

            γ21        γ22       γ23       γ24
  DM1 = L   L          L         R         R
  DM1 = R   L          R         L         R
  Outcome   (4, -10)   (6, -6)   (3, -7)   (3, -7)

– Nash (by backward induction): γ2N plays R after L and L after R (i.e., γ23), γ1N = L, with costs (3, -7)
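The conversion to normal form can be scripted directly. A minimal sketch (ours; the leaf assignment follows the tree placeholder above): enumerate DM2's four strategies, compute DM1's rational reaction to each (worst case for ties), and pick DM2's best column.

```python
from itertools import product

# Extensive-form game: DM1 moves first (L/R), DM2 replies (L/R).
# leaf[(DM1's move, DM2's move)] = (J1, J2); both players minimize.
leaf = {('L', 'L'): (6, -6), ('L', 'R'): (3, -7),
        ('R', 'L'): (4, -10), ('R', 'R'): (10, -5)}

best = None
for s2 in product('LR', repeat=2):          # s2 = (reply to L, reply to R)
    g2 = dict(zip('LR', s2))                # DM2's strategy as a map
    # DM1's rational reaction to the announced strategy: minimize J1
    j1 = {a: leaf[(a, g2[a])][0] for a in 'LR'}
    m = min(j1.values())
    reaction = [a for a in 'LR' if j1[a] == m]
    worst_j2 = max(leaf[(a, g2[a])][1] for a in reaction)  # conservative
    if best is None or worst_j2 < best[0]:
        best = (worst_j2, s2)

print(best)  # (-10, ('L', 'L')): DM2 announces "always L", outcome (4, -10)
```

As the leader, DM2 can thus guarantee J2 = -10, strictly better than the Nash value of -7: leadership pays even for the player who moves second.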
Example 2. (Continued)
SGD. The problem with DM2 as the leader was solved by converting it to normal form. Can it be solved directly in extensive form? How?
• We will come back to this in the next hour
Relevant Results on Finite Games
Theorem 3.3. Every two-person finite game admits a pure Stackelberg strategy for the leader.
Proof. Intuitively clear from the finiteness of Γ1 and Γ2.
Proposition 3.16. For a given two-person finite game, if
– a pure Nash solution (γ1, γ2) exists, and
– R2(γ1) is a singleton for every γ1 ∈ Γ1,
then J1S ≤ J1N.
Proof. By contradiction.
An Example of Single-Act Infinite Games
Problem Formulation
• Two manufacturers, M1 and M2, produce a single product type with the same manufacturing technology
  – Mi produces quantity ui at cost Ci = c·ui + d
  – d > 0 is the setup cost, and c > 0 is the unit production cost
• The price is determined by the demand-supply relationship: p = a − b(u1 + u2), with a > 0 and b > 0
• Profit for M1: p·u1 − C1 = −J1(u1, u2), or
    J1(u1, u2) = [b(u1 + u2) − a]u1 + c·u1 + d
  – Similarly, J2(u1, u2) = [b(u1 + u2) − a]u2 + c·u2 + d
• Several cases will be examined:
  – Single manufacturer (monopoly case, with u2 ≡ 0)
  – Nash equilibrium
  – Stackelberg solution with M1 as the leader

Case 1. Single Manufacturer
Q. How to solve the problem?
• Necessary condition: dJ1/du1 = 2b·u1 − (a − c) = 0 ⟹ u1 = (a − c)/2b
• Suppose a = 50, b = 1, c = 2, and d = 10. Then
    u1M = (a − c)/2b = 24, pM = a − b·u1M = 50 − 24 = 26
    C1M = c·u1M + d = 2·24 + 10 = 58
    J1M = −pM·u1M + C1M = −26·24 + 58 = −566
Case 2. Nash Equilibrium
Q. How to solve the problem?
• Necessary conditions (recall J1(u1, u2) = [b(u1 + u2) − a]u1 + c·u1 + d):
    ∂J1/∂u1 = 2b·u1 + b·u2 − (a − c) = 0
    ∂J2/∂u2 = b·u1 + 2b·u2 − (a − c) = 0
    ⟹ u1N = u2N = (a − c)/3b ~ Symmetric
• With the same a = 50, b = 1, c = 2, and d = 10, then what?
    u1N = u2N = (a − c)/3b = 16, u1N + u2N = 32 > u1M = 24
    pN = a − b(u1N + u2N) = 50 − 32 = 18 < pM = 26
    C1N = C2N = c·u1N + d = 2·16 + 10 = 42, C1N + C2N = 84 > 58
    J1N = J2N = −18·16 + 42 = −246
    −(J1N + J2N) = 492 < −J1M = 566
Case 3. Stackelberg Solution with M1 as the Leader
SGD. How to solve the problem?
• Reaction of M2:
    ∂J2/∂u2 = 0 ⟹ u2 = (a − c − b·u1)/2b
• M1's problem: min J1(u1, u2), s.t. u2 = (a − c − b·u1)/2b. How to solve it?
    J1(u1, u2) = [b(u1 + u2) − a]u1 + c·u1 + d
               = (b·u1 − a − c)u1/2 + c·u1 + d
    dJ1/du1 = b·u1 − 0.5(a − c) = 0 ⟹ u1S = (a − c)/2b
• From here, one can get
    u2S = (a − c − b·u1S)/2b = (a − c)/4b = u1S/2
    pS = a − b(u1S + u2S) = (a + 3c)/4
• With the same a = 50, b = 1, c = 2, and d = 10, then what?
    u1S = (a − c)/2b = 24 = u1M > u1N = 16
    u2S = 0.5·u1S = 12 < u2N = 16
    u1S + u2S = 36 > u1N + u2N = 32 > u1M = 24
    pS = (a + 3c)/4 = 56/4 = 14 < pN = 18 < pM = 26
    C1S = c·u1S + d = 2·24 + 10 = 58
    C2S = c·u2S + d = 2·12 + 10 = 34
    J1S = −pS·u1S + C1S = −14·24 + 58 = −278
    −J1S = 278 > −J1N = 246; −J1S = 278 < −J1M = 566
    J2S = −pS·u2S + C2S = −14·12 + 34 = −134
    −J2S = 134 < −J2N = 246
    −(J1S + J2S) = 412 < −(J1N + J2N) = 492 < −J1M = 566
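The three cases can be checked numerically. A small sketch (ours): it recomputes the monopoly, Nash, and Stackelberg quantities and profits from the formulas above, with a = 50, b = 1, c = 2, d = 10.

```python
# Duopoly example: compare monopoly, Nash, and Stackelberg (M1 leads).
# Price p = a - b(u1 + u2); cost Ci = c*ui + d; Ji = -(profit of Mi).
a, b, c, d = 50.0, 1.0, 2.0, 10.0

def profits(u1, u2):
    p = a - b * (u1 + u2)
    return p * u1 - (c * u1 + d), p * u2 - (c * u2 + d)

# Case 1: monopoly (u2 = 0)
uM = (a - c) / (2 * b)
pM = profits(uM, 0.0)[0]                    # 566.0

# Case 2: Nash equilibrium (symmetric first-order conditions)
uN = (a - c) / (3 * b)
pN1, pN2 = profits(uN, uN)                  # 246.0 each

# Case 3: Stackelberg, M1 leads; follower reacts u2 = (a - c - b*u1)/(2b)
u1S = (a - c) / (2 * b)
u2S = (a - c - b * u1S) / (2 * b)
pS1, pS2 = profits(u1S, u2S)                # 278.0 and 134.0

print(pM, pN1, pS1, pS2)  # 566.0 246.0 278.0 134.0
```

The ordering matches the slide: the leader earns more than at the Nash point (278 > 246), the follower less (134 < 246), and total industry profit shrinks from monopoly to Nash to Stackelberg.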
Today
• Finite Hierarchical Games
  – Motivating Examples
  – Solution Concept
  – Examples and Results on Finite Games
  – An Example of Single-Act Infinite Games
• The Inducible Region Approach
  – Approach for Single-Stage Problems
  – Principle of Optimality
  – Multi-Stage Games
• Team Decision Theory
  – A Motivating Example
  – A Formal Model and Solution Methodology
  – A Canonical Example
The Inducible Region Approach
• Change DM2 to DM0, with the costs now stated as (J0, J1)
• If DM1 is the leader, the problem is quite simple
• The best DM0 can do depends on his/her ability to influence DM1's cost
Q. What is the worst that DM0 can penalize DM1 with?

[Game tree: DM1 moves (L or R); DM0 observes the move and replies (L or R). Outcomes (J0, J1): under DM1 = L, (3, 8) and (-3, 7); under DM1 = R, (-9, 10) and (10, 3)]

SGD. Now, with DM0 as the leader, we want to solve the problem directly in extensive form. How?
    min_{u1} max_{u0} J1(u0, u1) → M, the maximum penalizing strategy, with outcome (u0M, u1M)
– M is the worst value that DM1 could ever get. Implications?
– Any outcome with J1 > M cannot be achieved
– Conversely, any outcome with J1(u0, u1) < M is "inducible," i.e., there exists a strategy γ0 so that (u0, u1) is the resulting outcome
– On the boundary J1(u0, u1) = M:
  • (u0M, u1M) itself is inducible
  • An outcome with J0(u0, u1) < J0(u0M, u1M) is not inducible (by the behavioral assumption); e.g., in the tree variant where the (10, 3) leaf is replaced by (-9, 8), the outcome (-9, 8) has J1 = M but cannot be induced
  • An outcome with J0(u0, u1) > J0(u0M, u1M), e.g., (6, 8), is inducible but not worthwhile to consider
DM0's Optimization Problem
• Select the minimal cost within the "inducible region":
    IR = {(u0, u1) | J1(u0, u1) ≤ M}, with the boundary analyzed as above
    min_{(u0, u1)∈IR} J0(u0, u1)
– If u1 = u1S, then u0 = u0S
– If u1 ≠ u1S, make the resulting J1 > 7, to induce DM1 to select u1S
⟹ (u0S, u1S) = (L, L), with (J0S, J1S) = (-3, 7)
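For this small tree, the whole inducible-region computation fits in a few lines. A sketch (ours; the leaf-to-action assignment is our assumption, chosen so that the result matches the stated solution (u0S, u1S) = (L, L)):

```python
# Inducible region approach on the 2x2 extensive-form game.
# outcomes[(u1, u0)] = (J0, J1); DM1 picks u1 first, DM0 replies with u0.
outcomes = {('L', 'L'): (-3, 7), ('L', 'R'): (3, 8),
            ('R', 'L'): (10, 3), ('R', 'R'): (-9, 10)}

# Maximum penalizing strategy: for each u1, DM0 maximizes J1
pen = {u1: max('LR', key=lambda u0: outcomes[(u1, u0)][1]) for u1 in 'LR'}
u1M = min('LR', key=lambda u1: outcomes[(u1, pen[u1])][1])
M = outcomes[(u1M, pen[u1M])][1]   # worst value DM1 can be forced to: 8

# Inducible region: all outcomes with J1 < M strictly, plus the
# max-penalizing point itself on the boundary
IR = [k for k, v in outcomes.items() if v[1] < M] + [(u1M, pen[u1M])]

# DM0 selects the minimal J0 over IR
u1S, u0S = min(IR, key=lambda k: outcomes[k][0])
print(M, (u0S, u1S), outcomes[(u1S, u0S)])  # 8 ('L', 'L') (-3, 7)
```

The announced strategy is then: play u0S when DM1 plays u1S, and the maximum penalizing reply pen[u1] otherwise, which drives J1 to 10 > 7 off the path.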
• To construct DM0's strategy γ0S:
Q. How to interpret the approach graphically?

[Figure: u0-u1 plane showing the curve J1(u0, u1) = M delineating IR, the maximum penalizing strategy γ0M(u1) = arg max_{u0} J1(u0, u1), the level curve J1(u0, u1) = J1S, and the point (u0S, u1S) = arg min_{IR} J0(u0, u1)]

Q. How to construct γ0S?
– Any curve u0 = γ0S(u1) works, as long as it stays outside the level curve J1(u0, u1) = J1S but is tangent to it at (u0S, u1S)
– It could be linear, nonlinear, or even discontinuous
Example
Problem Formulation
    J0 = ½u0² − ½u0·u1 + u1² + a·u1
    J1 = (u0 − ½)² + 4(u1 − ½)²
– with u0, u1 ∈ [0, 1], and "a" a parameter to be varied

[Figure: the unit square in the u0-u1 plane, with the elliptical region IR centered at (½, ½)]

• Delineate IR, find (u0S, u1S), and construct a γ0S
Delineation of IR
    M ≜ min_{u1} max_{u0} J1(u0, u1)
• As an intermediate step: max_{u0} J1(u0, u1) → γ0M(u1). What is it?
– It is easy to see that γ0M(u1) = 0 or 1 (whichever is farther from u0 = ½), and the minimizing u1 is ½
– Consequently, M = max_{u0} (u0 − ½)² = (½)² = ¼, and
    IR = {(u0, u1) | (u0 − ½)² + 4(u1 − ½)² ≤ ¼}
Finding (u0S, u1S)
– min_{(u0, u1)∈IR} J0(u0, u1) ~ May not be an easy task
Q. Any shortcut?
– Find the "team solution," i.e., select both u0 and u1 to minimize J0 (DM0 and DM1 acting as a team, the best possible)
– If the team solution is in IR, then we are almost done
– Otherwise, we have to perform the "hard" constrained optimization
• First, find the "team solution" of J0 = ½u0² − ½u0·u1 + u1² + a·u1:
    ∂J0/∂u0 = u0 − ½u1 = 0 ⟹ u0 = ½u1
    ∂J0/∂u1 = −½u0 + 2u1 + a = 0 ⟹ (7/4)u1 + a = 0
– Consequently, u0 = −(2/7)a and u1 = −(4/7)a
– If (−2a/7, −4a/7) ∈ IR, then it is (u0S, u1S)
Q. Is (−2a/7, −4a/7) in IR?
    J1 = (u0 − ½)² + 4(u1 − ½)²
       = (−2a/7 − ½)² + 4(−4a/7 − ½)²
       = (4a²/49 + 2a/7 + ¼) + 4(16a²/49 + 4a/7 + ¼)
       = (68/49)a² + (18/7)a + 5/4 ≤ ¼
– Therefore, (−2a/7, −4a/7) ∈ IR iff
    (68/49)a² + (18/7)a + 1 ≤ 0, or
    a² + (63/34)a + 49/68 ≤ 0, or
    a ∈ [(−63 − √637)/68, (−63 + √637)/68] ≈ [−1.298, −0.555] ≜ A
• If a ∈ A, then u0S = −(2/7)a and u1S = −(4/7)a
• A possible γ0S:
  – If u1 = −(4/7)a, then u0 = −(2/7)a
  – Otherwise, u0 = 0
• Suppose a = −1, and we want to find a linear γ0S:
    u0S = −(2/7)a = 2/7; u1S = −(4/7)a = 4/7
Q. How to find a linear γ0S?
– Find the level curve of J1 through (u0S, u1S):
    J1 = (u0 − ½)² + 4(u1 − ½)² = 13/196
– Find the tangent line of the level curve at (u0S, u1S):
    dJ1/du1 = 2(u0 − ½)(du0/du1) + 8(u1 − ½) = 0
    du0/du1 at (u0S, u1S) = −8(u1S − ½)/[2(u0S − ½)] = 4/3
– The tangent line at (u0S, u1S) is then described by
    (u0 − 2/7) = (4/3)(u1 − 4/7), or
    u0 = (4/3)u1 − 10/21 for u1 ∈ [5/14, 1], and u0 = 0 otherwise
– Or simply: u0 = 2/7 for u1 = 4/7, and u0 = 0 otherwise. Credible?
Q. What to do if a ∉ A?
• Then we have to solve the constrained problem:
    min J0 = ½u0² − ½u0·u1 + u1² + a·u1
    s.t. (u0 − ½)² + 4(u1 − ½)² ≤ ¼
– With a nonlinear constraint, this is not an easy problem
– Left as an exercise
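The a = −1 case can be verified numerically. A sketch (ours): it checks that the team solution lies in IR, and that against the announced tangent-line strategy, u1 = 4/7 is indeed the follower's best reply (on a fine grid).

```python
# Inducible region example with a = -1: verify the team solution and
# the linear (tangent-line) leader strategy gamma0.
a = -1.0
u0S, u1S = -2 * a / 7, -4 * a / 7          # team solution (2/7, 4/7)

def J1(u0, u1):
    return (u0 - 0.5) ** 2 + 4 * (u1 - 0.5) ** 2

# The team solution lies inside IR = {J1 <= 1/4} since a is in A
assert J1(u0S, u1S) <= 0.25

def gamma0(u1):
    # tangent line to the level curve J1 = 13/196 at (2/7, 4/7)
    return 4.0 / 3.0 * u1 - 10.0 / 21.0 if u1 >= 5.0 / 14.0 else 0.0

# Follower's best reply to the announced strategy, on a fine grid
grid = [i / 10000.0 for i in range(10001)]
best = min(grid, key=lambda u1: J1(gamma0(u1), u1))
print(abs(best - u1S) < 1e-3)   # True: DM1 is induced to play u1S
```

By convexity, every point of the tangent line other than the tangency point lies outside the J1 = 13/196 ellipse, which is why this announcement induces exactly u1S.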
Principle of Optimality
The Issue
• Nash games in extensive form are solved by backward induction. This is based on the Principle of Optimality:
  – An optimal strategy has the property that, whatever the initial state and time are, all remaining decisions must also constitute an optimal strategy
• For hierarchical games, we have not been able to use backward induction
Q. Why? Does the Principle of Optimality still hold here?

[Timeline: DM0 announces γ0; DM1 selects u1; DM0 then sets u0 = γ0(u1)]

– Once γ0S is announced and u1 is observed, DM0 should set u0 = γ0S(u1). Is there an incentive for DM0 to deviate from this?
Example (Slightly modified)

[Game tree: DM1 chooses L or R; DM0 then chooses L or R. Outcomes (J0, J1): under DM1 = L, (-10, 9) and (-3, 5); under DM1 = R, (-2, -2) and (4, 7)]

Q. Now suppose we are the leader, DM0. We announced the strategy, and DM1 selected L. What should we do?
– There is a strong incentive for DM0 to select R instead of carrying out what was announced
– Along the optimal path, there is an incentive for DM0 to deviate
Q. Suppose for some unknown reason DM1 selects R. What should we do?
– There is also a strong incentive for DM0 to select L instead of carrying out what was announced
– Off the optimal path, there is an incentive for DM0 to deviate
• The Principle of Optimality does not hold for hierarchical games
• Multi-stage games cannot be solved one stage at a time, back from the terminal stage, as in backward induction
• There exists an inherent incentive for the leader to deviate from what was announced, even in the absence of uncertainties or any unforeseeable interruptions
• Government/CEO credibility is at risk ~ a very familiar phenomenon
Multi-Stage Hierarchical Games
• Since the Principle of Optimality does not hold, a multi-stage problem cannot be solved by backward induction
Q. What to do?
– The inducible region approach can be extended. How?
– The basic ideas of IR presented earlier still hold here
The General Approach
– Delineate IR
  • A worst-case analysis for the follower
– Find min_{(u0, u1)∈IR} J0(u0, u1)
  • A parameter optimization problem that may not be easy to solve
– Construct γ0S
  • Theoretically not hard, but there may be practical difficulties
[Figure: a two-stage game tree in which DM1 moves (U or D) and DM0 replies at each stage. The 16 leaves carry costs (J0, J1): (2, 4.5), (3, 9), (4, 10), (5, 7), (-100, 7), (3, 3), (5, 1), (-2, 9), (0, 5), (-100, 10), (3, 5), (7, 6), (4, 2), (5, 3), (9, 5), (-100, 4). The maximum penalizing value Mt at each node is obtained by backward induction]

SGD. What is the worst that DM0 can penalize DM1 with?
    min_{u11} max_{u01} [min_{u12} max_{u02} J1(u0, u1)] = min_{u11} max_{u01} [M2]
    ⟹ M = 6
γ0tM(u1t) ~ "the maximum penalizing strategy"
• Obtained by backward induction
Q. Which outcomes are inducible, and which are not?
– For any (u0, u1), if J1(u0, u1) > Mt for any t (any stage) along the path, u1 will never be selected
– Any (u0, u1) such that J1(u0, u1) < Mt for all t (all stages) along the path is "inducible"
– The boundary J1(u0, u1) = Mt must be analyzed carefully, as before
    IR = {(u0, u1) | J1(u0, u1) ≤ Mt for all t along the path}
Finding (u0S, u1S)
– min_{(u0, u1)∈IR} J0(u0, u1)
  • A parameter optimization problem, here with the resulting outcome (0, 5)
Constructing γ0S
– If u1t = u1tS, then u0t = u0tS
– Otherwise, try to make the resulting J1 greater than J1S
  • One way is to use the maximum penalizing strategy γ0tM(u1t)
  • Other strategies might be better in case of deviation by DM1
  • E.g., under the maximum penalizing strategy, (U, D, U, U) would result in (J0, J1) = (-100, 7)!
Today
• Finite Hierarchical Games
  – Motivating Examples
  – Solution Concept
  – Examples and Results on Finite Games
  – An Example of Single-Act Infinite Games
• The Inducible Region Approach
  – Approach for Single-Stage Problems
  – Principle of Optimality
  – Multi-Stage Games
• Team Decision Theory
  – A Motivating Example
  – A Formal Model and Solution Methodology
  – A Canonical Example
Team Decision Theory
• Decentralized decision-making, where the DMs have access to different information and are responsible for different decisions, but share the same objective

A Motivating Example
• Problem Context
  – Back to the old days, with no radio or telephone
  – Mr. B of Boston and Mr. N of NYC, each knowing only his local weather, have to decide whether to go to Hartford today
  – The meeting requires good weather at Hartford. If it rains there, they waste the trip
Q. Should they go or not?
• Mr. B and Mr. N share the same objective (cost) function:

  Shine at Hartford:            Rain at Hartford:
  B \ N   Go    Don't           B \ N   Go   Don't
  Go      -10   3               Go      4    2
  Don't   3     0               Don't   2    -5

• The only information available is the local weather: ξB at Boston and ξN at NYC. ξB, ξN, and ξH are correlated:

  ξB   R     R     R    R    S    S    S     S
  ξN   R     R     S    S    R    R    S     S
  ξH   R     S     R    S    R    S    R     S
  Pr.  0.25  0.05  0.1  0.1  0.1  0.1  0.05  0.25
Q. What are the possible strategies for Mr. B? For Mr. N?
• B: 4 strategies; N: 4 strategies

            γB1   γB2   γB3   γB4             γN1   γN2   γN3   γN4
  S in B    G     G     D     D      S in N   G     G     D     D
  R in B    G     D     G     D      R in N   G     D     G     D

• Off-line coordination ~ We want to find the best pair of strategies. How?
• Game in normal form: compute the expected cost for each pair of strategies (16 of them), and select the best pair:
    J(γB, γN) = E[J(γB(ξB), γN(ξN), ξH)]
              = Σ_{ξB, ξN, ξH} J(γB(ξB), γN(ξN), ξH)·Pr(ξB, ξN, ξH)
– For example,
    J(γB1, γN1) = 0.25·4 + 0.05·(-10) + 0.1·4 + 0.1·(-10) + 0.1·4 + 0.1·(-10) + 0.05·4 + 0.25·(-10) = -3
    J(γB1, γN2) = 0.25·2 + 0.05·3 + 0.1·4 + 0.1·(-10) + 0.1·2 + 0.1·3 + 0.05·4 + 0.25·(-10) = -1.75
• The pair with the minimum expected cost is the solution:
    (γB*, γN*) = (γB1, γN1), with J* = -3
– With pre-game coordination, there is no difficulty associated with the non-uniqueness of solutions
Q. Is there a systematic way to find the optimal solution?
– Not an easy task. We shall present a formal model and then go over several examples
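The 16-pair search above is easy to automate. A minimal sketch (ours): each strategy maps the local weather to Go/Don't, and the expected cost is summed over the eight joint weather outcomes.

```python
from itertools import product

# Joint distribution of weather at Boston (B), NYC (N), Hartford (H)
prob = {('R','R','R'): 0.25, ('R','R','S'): 0.05, ('R','S','R'): 0.1,
        ('R','S','S'): 0.1,  ('S','R','R'): 0.1,  ('S','R','S'): 0.1,
        ('S','S','R'): 0.05, ('S','S','S'): 0.25}

# Shared team cost: cost[H][(B's action, N's action)]
cost = {'S': {('G','G'): -10, ('G','D'): 3, ('D','G'): 3, ('D','D'): 0},
        'R': {('G','G'): 4,   ('G','D'): 2, ('D','G'): 2, ('D','D'): -5}}

# A strategy maps the local weather ('S' or 'R') to an action ('G' or 'D')
strategies = [dict(zip('SR', s)) for s in product('GD', repeat=2)]

def expected_cost(gB, gN):
    return sum(p * cost[h][(gB[b], gN[n])] for (b, n, h), p in prob.items())

best = min(product(strategies, repeat=2), key=lambda g: expected_cost(*g))
print(expected_cost(*best), best)  # -3.0, with both always choosing 'G'
```

This confirms the slide's answer: the "always Go" pair (γB1, γN1) achieves the minimum expected cost of -3.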
A Formal Model and Solution Methodology

A Formal Model
• A set of DMs: {1, 2, .., N}
  – Without loss of generality, assume that each DM observes one measurement, makes one decision, and then leaves
• A set of uncertainties ξ = {ξ1, ξ2, .., ξm}, with ξj ∈ Ξj
  – Nature's decisions
• A set of observations: z = {z1, z2, .., zN}
  – zn = ηn(ξ, u) ∈ Zn ~ DM n's observation, subject to causality
  – If zn = ηn(ξ) for all n, then the information is static
• A set of decision variables: u = {u1, u2, .., uN}
  – un ∈ Un ~ DM n's decision
  – γn: Zn → Un ~ DM n's strategy, un = γn(zn)
  – γn ∈ Γn ~ the set of admissible strategies, e.g., linear strategies
• The cost function J: U × Ξ → R
  – For a given realization of ξ:
    J(u, ξ) = J(u1, u2, .., uN, ξ) ~ Extensive form
            = J(γ1(z1), γ2(z2), .., γN(zN), ξ)
            = J(γ, ξ) ~ Normal form
    J(γ1, γ2, .., γN) = E[J(γ, ξ)] ~ Expected cost function
• The problem: min_{γ1, γ2, .., γN} J(γ1, γ2, .., γN)
Q. How to solve the problem?

Solution Methodology
• Functional (as opposed to parameter) optimization
• There is no systematic way to solve it, except under special conditions
• Possible methods:
  – Brute-force exhaustive search
  – Impose more structure, e.g., best linear/quadratic strategies
  – Relax the conditions, e.g., person-by-person optimality
Example:

         γ21   γ22   γ23
  γ11    0     0     -1
  γ12    0     -3    0
  γ13    -1    0     -2

  J(γ12, γ22) = -3 ~ Optimal team solution
  J(γ13, γ23) = -2 ~ Person-by-person optimal ~ a Nash solution
• Team optimality ⟹ person-by-person optimality, but not vice versa
• (γ1*, γ2*, .., γN*) is a person-by-person optimal solution iff
    J(γ1*, .., γn*, .., γN*) ≤ J(γ1*, .., γn-1*, γn, γn+1*, .., γN*) ∀n and ∀γn ∈ Γn
  – Equivalently, min_{γn} J(γ1*, .., γn, .., γN*) → γn* ∀n
  – Each such minimization is an optimal control problem, and may be solvable
  – The DMs could coordinate off-line, with no difficulty from the non-uniqueness of solutions
• Consequently, one way to find (γ1*, γ2*, .., γN*):
    Start → guess (γ1g, .., γNg) → min_{γn} J(γ1g, .., γn, .., γNg) → γn* → is γn* = γng ∀n? → Yes: a person-by-person optimal solution / No: revise the guess and repeat
  – There is no systematic way to revise the guess
  – Proving convergence is quite difficult
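The guess-and-improve loop can be illustrated on the 3×3 table above. A sketch (ours): coordinate descent, one DM at a time, converges to a person-by-person optimal point, which may or may not be the team optimum depending on the initial guess.

```python
# Person-by-person optimization by coordinate descent on the 3x3
# team table J[i][j] from the example (both DMs minimize the same J).
J = [[0, 0, -1],
     [0, -3, 0],
     [-1, 0, -2]]

def pbpo(i, j):
    """Iterate one-DM-at-a-time minimization until no one can improve."""
    while True:
        i2 = min(range(3), key=lambda k: J[k][j])   # DM1's best reply
        j2 = min(range(3), key=lambda k: J[i2][k])  # DM2's best reply
        if (i2, j2) == (i, j):
            return i, j, J[i][j]
        i, j = i2, j2

print(pbpo(0, 0))  # converges to (2, 2) with J = -2: PBPO, not team optimal
print(pbpo(1, 1))  # stays at (1, 1) with J = -3: the team optimum
```

Starting from (γ11, γ21), the iteration gets stuck at (γ13, γ23) with cost -2, illustrating why person-by-person optimality is only a necessary condition for team optimality.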
Today
• The Inducible Region Approach
  – Approach for Single-Stage Problems
  – Principle of Optimality
  – Multi-Stage Games
• Team Decision Theory
  – A Motivating Example
  – A Formal Model and Solution Methodology
  – A Canonical Example
• Paper Review by Priscillia Hunt, Selini Katsaiti, Dinesh Padmanabhan, Ivailo Kotzev
A Canonical Example
Problem Formulation
• Two DMs: DM0 with control u0, and DM1 with control u1
    y0 = x0 + b·v0, b > 0
    y1 = x0 + c·u0 + d·v1, c ≥ 0, d > 0
  – x0, v0, v1 are independent random variables, with x0 ~ N(0, σ²), v0 ~ N(0, 1), and v1 ~ N(0, 1)
  – The information structure is yet to be specified
  – The cost function:
    J = E{½(x0 + a·u0 + u1)² + h·u0² + g·u1²}, with a, h, g ≥ 0
• By appropriately assigning values to the parameters, the above can represent different problems. We shall consider a few

The Static Case
– Let a = b = d = 1, c = 0, and h = g = ½:
    J = E{½(x0 + u0 + u1)² + ½u0² + ½u1²}
  with information sets
    z0 = {y0}, where y0 = x0 + v0
    z1 = {y1}, where y1 = x0 + v1
– With a random initial state x0, two noisy measurements are made: y0 by DM0, and y1 by DM1
– This is a static information structure, since both y0 and y1 are independent of the decisions
Q. How to solve it?
• We shall find the linear person-by-person optimal solution first, and then show that it is also team optimal
Linear Person-by-Person Optimal Solution
– Assume u0* = k0·y0 and u1* = k1·y1, with k0 and k1 yet to be determined

DM0's perspective
• If u1* = k1·y1 = k1(x0 + v1), then min_{γ0} J(γ0, γ1*) → γ0*:
    J = E{½(x0 + u0 + u1)² + ½u0² + ½u1²}
      = E{E{½[x0 + u0 + k1(x0 + v1)]² + ½u0² + ½[k1(x0 + v1)]² | y0}}
    ∂J/∂u0 = 0 = 2u0 + E[(k1 + 1)x0 + k1·v1 | y0]
           = 2u0 + (k1 + 1)E[x0 | y0]
    ⟹ u0* = −½(k1 + 1)E[x0 | y0]
~ What is this? Since x0 and y0 are jointly Gaussian,
    [x0, y0]ᵀ ~ N(0, [[σ², σ²], [σ², σ² + 1]]),
    E[x0 | y0] = E[x0] + Σ_{x0y0}·Σ_{y0y0}⁻¹·(y0 − E[y0]) = σ²·y0/(σ² + 1)
– Consequently,
    u0* = −(k1 + 1)σ²·y0/[2(σ² + 1)] = k0·y0
    ⟹ 2(σ² + 1)k0 + σ²·k1 = −σ²

DM1's perspective
– Similarly, if u0* = k0·y0, then min_{γ1} J(γ0*, γ1) → γ1*:
    u1* = −½(k0 + 1)E[x0 | y1] = −(k0 + 1)σ²·y1/[2(σ² + 1)] = k1·y1
    ⟹ σ²·k0 + 2(σ² + 1)k1 = −σ²

Determining k0 and k1
– Two unknowns, k0 and k1, and two linear conditions:
    [[2(σ² + 1), σ²], [σ², 2(σ² + 1)]]·[k0, k1]ᵀ = [−σ², −σ²]ᵀ
• The determinant is nonzero ⟹ a unique solution always exists
• Once k0 and k1 are solved for, the solution is obtained
Q. Is the solution team optimal? Why or why not?
Team Optimal Solution
• The above solution is also team optimal, in view of the strict convexity of J
– For a given realization of ξ (including x0, v0, and v1), y0 and y1 are determined, with u0* = k0·y0 and u1* = k1·y1
– J(u0, u1, ξ) = ½(x0 + u0 + u1)² + ½u0² + ½u1² is strictly convex in (u0, u1), so
    J(u0, u1, ξ) > J(u0*, u1*, ξ) + (∂J/∂u0)(u0 − u0*) + (∂J/∂u1)(u1 − u1*) for all (u0, u1) ≠ (u0*, u1*)
– Taking expectations, the first-order terms vanish by the person-by-person optimality conditions, so
    E[J(γ0(y0), γ1(y1), ξ)] > E[J(γ0*(y0), γ1*(y1), ξ)] for all (γ0, γ1) ≠ (γ0*, γ1*)
⟹ (γ0*, γ1*) is team optimal

Special Case When σ = 2
• In this case (σ² = 4), the equations for k0 and k1 become
    [[1, 0.4], [0.4, 1]]·[k0, k1]ᵀ = [−0.4, −0.4]ᵀ
    ⟹ k0 = k1 = −0.4/(1 + 0.4) = −2/7
    u0* = −(2/7)y0, and u1* = −(2/7)y1
    J_static = E{½(x0 + u0 + u1)² + ½u0² + ½u1²}
             = ½E{[(3/7)x0 − (2/7)v0 − (2/7)v1]² + [(2/7)(x0 + v0)]² + [(2/7)(x0 + v1)]²}
             = ½{(36/49 + 4/49 + 4/49) + 20/49 + 20/49}
             = 42/49 = 6/7 ≈ 0.857
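The gains and the resulting cost are easy to check numerically. A sketch (ours), for σ = 2 (σ² = 4): the symmetric 2×2 system collapses to one equation, and the expected cost follows in closed form from the independence of x0, v0, v1.

```python
# Static LQG team: solve for the linear strategy gains k0 = k1 = k and
# evaluate the expected team cost, for sigma^2 = 4 (sigma = 2).
s2 = 4.0  # variance of x0

# 2(s2+1)*k0 + s2*k1 = -s2 and s2*k0 + 2(s2+1)*k1 = -s2;
# by symmetry k0 = k1 = k, so (2(s2+1) + s2)*k = -s2
k = -s2 / (2 * (s2 + 1) + s2)      # k = -2/7
assert abs(k + 2.0 / 7.0) < 1e-12

# Expected cost with u0 = k*(x0+v0), u1 = k*(x0+v1):
# x0+u0+u1 = (1+2k)*x0 + k*v0 + k*v1, with x0, v0, v1 independent
J = 0.5 * (((1 + 2 * k) ** 2) * s2 + 2 * k ** 2    # E[(x0+u0+u1)^2]
           + 2 * (k ** 2) * (s2 + 1))              # E[u0^2] + E[u1^2]
print(J)  # 42/49 = 6/7 = 0.857...
```

The printed value matches the closed-form J_static = 6/7 derived above.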
General Results
Proposition 1
• For an LQG static team decision problem with
    J = ½uᵀQu + uᵀSξ, Q > 0, ξ ~ N(0, Σ), and y = Hξ,
  the team optimal solution exists, is unique, is linear in the measurements, and can be obtained by solving the person-by-person optimal problem
Today
• Finite Hierarchical Games
  – Motivating Examples
  – Solution Concept
  – Examples and Results on Finite Games
  – An Example of Single-Act Infinite Games
• The Inducible Region Approach
  – Approach for Single-Stage Problems
  – Principle of Optimality
  – Multi-Stage Games
• Team Decision Theory
  – A Motivating Example
  – A Formal Model and Solution Methodology
  – A Canonical Example
Next Time
• 11/14 No Class
• 11/21 Midterm exam