TRANSCRIPT
Information collection in a linear program
Ilya O. Ryzhov, Warren B. Powell
Operations Research and Financial Engineering, Princeton University
Princeton, NJ 08544, USA
International Conference on Stochastic Programming, August 17, 2010
Outline
1 Introduction
2 Mathematical model
3 The knowledge gradient algorithm
  Derivation
  Computation
  Theory: asymptotic optimality
4 Experimental results
5 Conclusions
Motivation: emergency response
Our goal is to find the shortest (least congested) path across a network

This is an LP where each objective coefficient represents the congestion on an edge

We can measure the local congestion on an individual edge (e.g. from the air) and change our estimate of the congestion on that edge
Motivation: agricultural planning
We solve an LP to maximize total crop yield subject to acreage constraints in different fields

The exact yield from planting a certain field is unknown

Before settling on a plan, we perform expensive soil tests on different fields to improve our beliefs about the yield
LP formulation
We consider an LP in standard form,
V(c) = max_x c^T x
       s.t. Ax = b
            x ≥ 0,

where the vector c ∈ R^M is unknown

We have a Bayesian prior belief about c in which the coefficients are correlated

We can measure a coefficient (e.g. perform a soil test) and observe a result that changes our beliefs

We are given N measurements to learn the true optimal value V(c)... what should we measure?
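For concreteness, the standard-form LP can be solved with scipy on a small made-up instance; the constraint matrix A, right-hand side b, and objective c below are hypothetical, in the spirit of the acreage example:

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical 3-variable instance of the standard form
#   V(c) = max_x c^T x   s.t.  Ax = b, x >= 0
A = np.array([[1.0, 1.0, 1.0]])    # one acreage-style budget constraint
b = np.array([1.0])
c = np.array([3.0, 2.0, 1.0])      # unknown in the talk; fixed here for illustration

# linprog minimizes, so negate the objective to maximize c^T x
res = linprog(-c, A_eq=A, b_eq=b, bounds=[(0, None)] * 3, method="highs")
x_opt, V = res.x, -res.fun
# The optimum puts all weight on the largest coefficient: x = (1, 0, 0), V(c) = 3
```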
The effect of learning
Changing our estimate of a single objective coefficient can drastically change what we believe to be the optimal solution
Consider the shortest-path problem:
Correlated beliefs in optimal learning
By measuring one coefficient, we can obtain information about many other coefficients. In a traffic network, if edge (i, j) is congested, it is likely that edges into i and out of j are congested
Correlated beliefs in optimal learning
Correlations are modeled using a covariance matrix
We assume c ∼ N(c^0, Σ^0). Example:

Σ^0 = [ 12  6  3 ]
      [  6  7  4 ]
      [  3  4 15 ]

The value Σ^0_j,k represents our belief about the covariance of coefficients j and k
A quick literature review
Stochastic linear programming
  - Theoretical properties of the expected optimal value of a stochastic LP (Madansky 1960, Itami 1974)
  - Approximate algorithms for multi-stage problems (Birge 1982)

Parametric linear programming/sensitivity analysis
  - Linear programs with varying objective coefficients (Jansen et al. 1997)

Optimal learning
  - Simple underlying optimization models, e.g. ranking and selection (Bechhofer et al. 1995) and multi-armed bandits (Gittins 1989)
  - Recent work on learning with correlated beliefs (Frazier et al. 2009) and independent beliefs on graphs (Ryzhov & Powell 2010)
Our work synthesizes and builds on concepts from all of these areas.
Preliminaries
We assume that the feasible region is known and bounded
Let x(c) be the optimal solution, i.e. the solution of

V(c) = max_x c^T x
       s.t. Ax = b
            x ≥ 0

By strong duality, the dual LP has the same optimal value:

V(c) = min_y b^T y
       s.t. A^T y − s = c
            s ≥ 0

Let y(c) and s(c) denote the optimal dual solution
Learning with correlated Bayesian beliefs
At first, we believe that c ∼ N(c^0, Σ^0)

We measure the jth coefficient and observe

c^1_j ∼ N(c_j, λ_j).

As a result, our beliefs change:

c^1 = c^0 + [ (c^1_j − c^0_j) / (λ_j + Σ^0_jj) ] Σ^0 e_j

Σ^1 = Σ^0 − (Σ^0 e_j e_j^T Σ^0) / (λ_j + Σ^0_jj)

Repeat the process to obtain c^2, c^3, ...

The vector e_j is given by e_j = (0, ..., 1, ..., 0)^T, where component j is equal to 1.
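The update equations above translate directly into numpy. In this sketch, Σ^0 is the example covariance matrix from the earlier slide, while the prior means c^0, the noise variances λ, and the observed value are hypothetical:

```python
import numpy as np

# Correlated Bayesian update after measuring coefficient j:
#   c^1     = c^0 + (c1_j - c0_j) / (lambda_j + Sigma0_jj) * Sigma0 e_j
#   Sigma^1 = Sigma^0 - (Sigma0 e_j e_j^T Sigma0) / (lambda_j + Sigma0_jj)
c0 = np.array([10.0, 8.0, 12.0])          # hypothetical prior means
Sigma0 = np.array([[12.0, 6.0, 3.0],      # example covariance from the slides
                   [6.0, 7.0, 4.0],
                   [3.0, 4.0, 15.0]])
lam = np.array([2.0, 2.0, 2.0])           # hypothetical measurement noise

def update(c, Sigma, j, obs, lam_j):
    """One measurement of coefficient j with observed value obs."""
    denom = lam_j + Sigma[j, j]
    col = Sigma[:, j]                      # Sigma e_j
    c_new = c + (obs - c[j]) / denom * col
    Sigma_new = Sigma - np.outer(col, col) / denom
    return c_new, Sigma_new

c1, Sigma1 = update(c0, Sigma0, j=0, obs=13.0, lam_j=lam[0])
```

Note that the correlated update moves every mean, not just c_j, and always shrinks the posterior covariance.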
Dynamic programming formulation
The optimal measurement strategy can be described using Bellman's equation:

V^{*,N}(c^N, Σ^N) = V(c^N)

V^{*,n}(c^n, Σ^n) = max_j IE[ V^{*,n+1}(c^{n+1}, Σ^{n+1}) | c^n, Σ^n, j^n = j ]

The optimal measurement J^{*,n}(c^n, Σ^n) is the choice of j that achieves the max in V^{*,n}(c^n, Σ^n)

Due to the curse of dimensionality, this equation is computationally intractable
Definition
Originally developed for ranking and selection (Gupta & Miescke1996, Frazier et al. 2008)
The KG decision rule is given by

J^{KG,n}(c^n, Σ^n) = arg max_j IE^n_j [ V(c^{n+1}) − V(c^n) ]

The KG factor

ν^{KG,n}_j = IE^n_j [ V(c^{n+1}) − V(c^n) ]

is the expected improvement in our estimate of the optimal value of the LP that is achieved by measuring j

The future beliefs c^{n+1} are random at time n, meaning that KG computes the expected value of a stochastic LP
Derivation
It can be shown (Frazier et al. 2009) that, given c^n and Σ^n, and given that we measure j at time n, the conditional distribution of c^{n+1} is

c^{n+1} ∼ c^n + [ Σ^n e_j / √(λ_j + Σ^n_jj) ] · Z

where Z is a one-dimensional standard normal random variable.

Thus, the KG factor becomes

ν^{KG,n}_j = IE[ V(c^n + Δc^n_j · Z) ] − V(c^n)

where Δc^n_j = Σ^n e_j / √(λ_j + Σ^n_jj).
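This representation suggests a brute-force sanity check: sample Z and average the resolved LP values. A sketch on a toy instance, where A, b, c^n, and the direction Δc^n_j are all made up:

```python
import numpy as np
from scipy.optimize import linprog

# Monte Carlo estimate of nu = E[V(c^n + dc * Z)] - V(c^n) on a toy
# 3-variable LP; all data here are hypothetical.
rng = np.random.default_rng(0)
A, b = np.array([[1.0, 1.0, 1.0]]), np.array([1.0])
cn = np.array([3.0, 2.0, 1.0])
dc = np.array([1.0, 2.0, 0.0])   # stands in for Sigma^n e_j / sqrt(lambda_j + Sigma^n_jj)

def V(c):
    """Optimal value of max c^T x s.t. Ax = b, x >= 0 (linprog minimizes)."""
    return -linprog(-c, A_eq=A, b_eq=b, bounds=[(0, None)] * 3, method="highs").fun

Z = rng.standard_normal(1000)
nu_mc = np.mean([V(cn + z * dc) for z in Z]) - V(cn)
# nu is nonnegative: V is convex in c, so E V(c + dc Z) >= V(c) by Jensen
```

Each sample requires a full LP solve, which is exactly what the exact breakpoint-based formula derived later in the talk avoids.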
Graphical illustration
The solution x(c^n + z Δc^n_j) is constant if z is in a certain interval

Varying z rotates the level curve of the objective function
Derivation (continued)
The set of values of z for which x(c^n + z Δc^n_j) is constant is known (Hadigheh & Terlaky 2006) as the invariant support set

Let −∞ = z_1 < z_2 < ... < z_I = ∞ be a partition of the real line into invariant support sets

Let x_i = x(c^n + z Δc^n_j) for z ∈ (z_i, z_{i+1})

Then,

IE V(c^n + Δc^n_j · Z) = Σ_i ∫_{z_i}^{z_{i+1}} [ (c^n + z Δc^n_j)^T x_i ] φ(z) dz

where φ is the standard normal pdf
Graphical illustration
The optimal solution x(c^n + z Δc^n_j) changes at the breakpoints z_i

The level curve of c^n + z_i Δc^n_j is tangent to a face of the polyhedron
The KG formula
After some algebra, we can obtain an expression

ν^{KG,n}_j = Σ_i (b_{i+1} − b_i) f(−|z_i|)

where
  - b_i = (Δc^n_j)^T x_i
  - f(z) = z Φ(z) + φ(z)
  - Φ is the standard normal cdf

This formula gives the exact value of the KG factor, provided that we can compute the breakpoints z_i of the piecewise linear function
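The formula is cheap to evaluate once the pieces are known. A sketch on a hypothetical piecewise-linear V(c^n + z Δc^n_j) = max_i (a_i + b_i z), with the lines sorted by slope (slopes assumed distinct), cross-checked against direct numerical integration:

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

# Hypothetical invariant solutions: intercepts a_i = (c^n)^T x_i and
# slopes b_i = (dc^n_j)^T x_i, sorted by increasing slope.
a = np.array([1.0, 3.0, 2.0])
b = np.array([0.0, 1.0, 2.0])

f = lambda z: z * norm.cdf(z) + norm.pdf(z)
zi = (a[:-1] - a[1:]) / (b[1:] - b[:-1])          # interior breakpoints
nu = np.sum((b[1:] - b[:-1]) * f(-np.abs(zi)))    # exact KG factor

# Cross-check: nu should equal E max_i(a_i + b_i Z) - max_i a_i
g = lambda z: np.max(a + b * z) * norm.pdf(z)
nu_quad = quad(g, -np.inf, np.inf)[0] - np.max(a)
```

Here the breakpoints come out to z_1 = −2 and z_2 = 1, and both computations agree.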
Computation of the breakpoints
At time n, we start with one optimal solution x(c^n) for z = 0

We determine whether z = 0 is itself a breakpoint by solving two LPs

z^− = min_{y,s,z} z
      s.t. A^T y − s − z Δc^n_j = c^n
           x(c^n)^T s = 0
           s ≥ 0,

and

z^+ = max_{y,s,z} z
      s.t. A^T y − s − z Δc^n_j = c^n
           x(c^n)^T s = 0
           s ≥ 0

The values z^−, z^+ are the smallest and largest values of z for which x(c^n) is optimal (Roos et al. 1997)
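These two LPs can be set up directly in scipy by stacking the variables as [y, s, z]; the complementarity condition x(c^n)^T s = 0 is imposed by fixing s_i = 0 wherever x_i(c^n) > 0. The instance below reuses a made-up 3-variable LP:

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical data: max c^T x s.t. x1 + x2 + x3 = 1, x >= 0,
# with x(c^n) = (1, 0, 0) optimal at z = 0.
A = np.array([[1.0, 1.0, 1.0]])
cn = np.array([3.0, 2.0, 1.0])
dc = np.array([1.0, 2.0, 0.0])
x_star = np.array([1.0, 0.0, 0.0])

m, M = A.shape
# Equality rows: A^T y - s - z * dc = c^n, over stacked variables [y, s, z]
A_eq = np.hstack([A.T, -np.eye(M), -dc[:, None]])
bounds = ([(None, None)] * m                                   # y free
          + [(0.0, 0.0) if x_star[i] > 1e-9 else (0.0, None)   # forces x^T s = 0
             for i in range(M)]
          + [(None, None)])                                    # z free
obj = np.zeros(m + M + 1)
obj[-1] = 1.0                                                  # objective: z

# If either LP were unbounded, the corresponding z^- or z^+ would be infinite.
z_minus = linprog(obj, A_eq=A_eq, b_eq=cn, bounds=bounds, method="highs").fun
z_plus = -linprog(-obj, A_eq=A_eq, b_eq=cn, bounds=bounds, method="highs").fun
# Here x(c^n) remains optimal exactly for z in [-2, 1]
```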
Computation of the breakpoints
If either z^− or z^+ is equal to zero, then z = 0 is a breakpoint

Suppose that z^− = 0 and z^+ > 0
Finding the neighbouring extreme point
The point x_l(c^n) is the optimal solution to the LP

V_l(c^n) = min_x (Δc^n_j)^T x
           s.t. Ax = b
                (s(c^n))^T x = 0
                x ≥ 0

The quantity (Δc^n_j)^T x_l(c^n) is the left derivative of the piecewise linear function at the breakpoint z = 0

The right derivative is (Δc^n_j)^T x(c^n) itself
Finding the next breakpoint
However, x_l(c^n) is also optimal at two breakpoints, zero and z_l(c^n)
Finding the next breakpoint
This next breakpoint is the optimal value of the LP

z_l(c^n) = min_{y,s,z} z
           s.t. A^T y − s − z Δc^n_j = c^n
                (x_l(c^n))^T s = 0
                s ≥ 0.

This LP is identical to the one we used to find z^−, but with x(c^n) replaced by x_l(c^n)

We can now find a new z^− and repeat the procedure until z^− = −∞
Other cases
If z^− < 0 and z^+ = 0, we can find the neighbouring extreme point x_u(c^n) by solving

V_u(c^n) = max_x (Δc^n_j)^T x
           s.t. Ax = b
                (s(c^n))^T x = 0
                x ≥ 0

The next breakpoint is the optimal value of an LP

z_u(c^n) = max_{y,s,z} z
           s.t. A^T y − s − z Δc^n_j = c^n
                (x_u(c^n))^T s = 0
                s ≥ 0

Again, the process can be repeated until z^+ = ∞
Other cases
z^− < 0 < z^+: zero is not a breakpoint, but both z^− and z^+ are

z^− = z^+ = 0: zero is a breakpoint, but x(c^n) is not an extreme point
Summary of algorithm for computing KG factors
Given a set of beliefs (c^n, Σ^n), do the following for j = 1, ..., M:

1 Let z = 0 and solve for x(c^n), y(c^n) and s(c^n).
2 Solve two LPs to obtain z^−, z^+ and decide whether z = 0 is a breakpoint.
3 Solve a sequence of LPs to obtain the entire vector z of breakpoints and the set x of invariant solutions.
4 Compute ν^{KG,n}_j using z and x.

Finally, we measure the coefficient with the largest ν^{KG,n}_j.
Asymptotic optimality property of KG
Proposition

For any measurement strategy π,

IE^π V(c^N) ≤ IE V(c).

Theorem

lim_{N→∞} IE^{KG} V(c^N) = IE V(c).

Recall that our objective is to maximize IE^π V(c^N)

As N → ∞, the KG method achieves the highest possible value
Experimental results: shortest-path problem
Ten layered graphs (22 nodes, 50 edges)
Ten larger layered graphs (38 nodes, 102 edges)
Conclusions
We have proposed a new class of optimal learning problems where the underlying optimization model is a linear program

We have derived a knowledge gradient method for deciding what to measure in this setting

The KG method computes the value of a single measurement exactly and is asymptotically optimal

The algorithm for finding breakpoints terminates in finite time, but is computationally expensive
References
Bechhofer, R., Santner, T. & Goldsman, D. (1995) Design and Analysis of Experiments for Statistical Selection, Screening and Multiple Comparisons. John Wiley and Sons, New York.

Birge, J. (1982) “The value of the stochastic solution in stochastic linear programs with fixed recourse.” Mathematical Programming 24, 314–325.

Frazier, P.I., Powell, W. & Dayanik, S. (2008) “A knowledge-gradient policy for sequential information collection.” SIAM J. on Control and Optimization 47:5, 2410–2439.

Frazier, P.I., Powell, W.B. & Dayanik, S. (2009) “The knowledge-gradient policy for correlated normal rewards.” INFORMS J. on Computing 21:4, 599–613.
Gittins, J.C. (1989) Multi-Armed Bandit Allocation Indices. John Wiley and Sons, New York.

Gupta, S. & Miescke, K. (1996) “Bayesian look ahead one stage sampling allocation for selecting the best population.” J. of Statistical Planning and Inference 54, 229–244.

Hadigheh, A. & Terlaky, T. (2006) “Sensitivity analysis in linear optimization: Invariant support set intervals.” European Journal of Operational Research 169:3, 1158–1175.

Itami, H. (1974) “Expected Value of a Stochastic Linear Program and the Degree of Uncertainty of Parameters.” Management Science 21:3, 291–301.
Jansen, B., de Jong, J., Roos, C. & Terlaky, T. (1997) “Sensitivity analysis in linear programming: just be careful!” European Journal of Operational Research 101:1, 15–28.

Madansky, A. (1960) “Inequalities for stochastic linear programming problems.” Management Science 6:2, 197–204.

Roos, C., Terlaky, T. & Vial, J. (1997) Theory and Algorithms for Linear Optimization: An Interior Point Approach. John Wiley and Sons, Chichester, UK.

Ryzhov, I.O. & Powell, W.B. (2010) “Information collection on a graph.” To appear in Operations Research.