
Sequential Bayes-optimal Policies for Multiple Comparisons with a Control

Jing Xie and Peter I. Frazier

Operations Research & Information Engineering, Cornell University

Sunday, November 13, 2011

Junior Faculty Interest Group (JFIG) Paper Competition

INFORMS Annual Meeting, Charlotte, NC

What is Multiple Comparisons with a Control?

We have a stochastic simulator.

Given a set of input parameters x , it provides a random sample y(x).

For which inputs x is E [y(x)] > 0?

[Figure: scatter plot of simulation samples; labels: y(n) and f(x).]

In other words, find the level set of x ↦ E[f(x)].

MCC appears in Ambulance Positioning

We must allocate ambulances across 11 bases in the city of Edmonton.

Which allocations satisfy mandated minimums for percentage of calls answered in time, under a variety of different possible call arrival rates?

[Figure: axes labeled "Call Arrival Rate" (2/hr to 20/hr) and "Allocations of Ambulances to Bases"; legend: >70% on time, <70% on time.]

[Thanks to Shane Henderson and Matt Maxwell for providing the ambulance simulation]

MCC appears in Algorithm Development

A researcher who develops a new algorithm would like to know:

In which problem settings is average-case performance better with Algorithm A than with Algorithm B?

[Figure: axes labeled "M" (0 to 100) and "VKG−VIE" (−0.005 to 0.03); legend: N=10*M, N=3*M, N=1*M.]

Given Samples, Estimating the Level Set is Well-Understood

[Figure: samples of y(x) plotted for alternatives x=1 to x=10.]

We Provide an Optimal Way to Choose Where to Sample

If our simulator is complex and takes a long time to run, the number of samples we can take is limited.

This makes accurate MCC more difficult.

Where should we place our limited samples to estimate the level set as accurately as possible?

Our contribution: We provide an answer to this question with Bayes-optimal performance.

The Optimal Policy Puts Samples Where They Help Most

[Figure: samples of y(x) placed by the optimal policy for alternatives x=1 to x=10.]

Mathematical Model

We have alternatives x = 1, ..., k.

Samples from alternative x are Normal(µ_x, σ_x^2).

µ_x is unknown, while σ_x^2 is assumed known (can be relaxed).

We have an independent normal Bayesian prior on each µ_x.

Sampling continues until an external deadline requires it to stop.

We assume this deadline is unknown and geometrically distributed.

When sampling stops, we estimate the level set {x : µ_x > 0} based on the samples. The reward is the number of alternatives correctly classified.
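The model above relies on the standard conjugate update for a normal mean with known sampling variance. A minimal sketch follows; the function names and all numbers are illustrative assumptions, and classifying by the sign of the posterior mean is one natural reading of "estimate the level set", not something the slides spell out.

```python
def update_posterior(mean, var, y, noise_var):
    """Conjugate normal update of the belief on mu_x after observing y."""
    precision = 1.0 / var + 1.0 / noise_var
    new_var = 1.0 / precision
    new_mean = new_var * (mean / var + y / noise_var)
    return new_mean, new_var


def estimate_level_set(post_means):
    """Assumption: put alternative x in the level set when its posterior mean is > 0."""
    return {x for x, m in enumerate(post_means, start=1) if m > 0}


def reward(estimated_set, true_means):
    """Number of alternatives classified correctly."""
    return sum((x in estimated_set) == (mu > 0)
               for x, mu in enumerate(true_means, start=1))


# Illustrative example with k = 3 alternatives (all numbers are made up).
m1, v1 = update_posterior(mean=0.0, var=1.0, y=0.8, noise_var=1.0)
post_means = [m1, -0.3, 0.1]
level_set = estimate_level_set(post_means)
correct = reward(level_set, true_means=[0.5, -0.2, -0.4])
print(m1, v1, level_set, correct)
```

In this toy example alternative 3 is misclassified, so the reward is 2 out of 3.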

The Optimal Policy Maximizes Average-Case Accuracy

A policy π is a rule for choosing where to sample next, based on previous observations.

Let R be the number of alternatives classified correctly when sampling stops.

$E^{\pi}[R \mid \vec{\mu}]$ is the performance under policy π and true mean vector $\vec{\mu}$.

$\int E^{\pi}[R \mid \vec{\mu}] \, P(d\vec{\mu}) = E^{\pi}[R]$ is the Bayes, or average-case, performance.

We wish to find the policy that maximizes this.
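The average-case performance of any fixed policy can be estimated by Monte Carlo: draw the true means from the prior, run the policy until the deadline, and count correct classifications. The sketch below is an assumption-laden illustration, not the authors' code: it uses an equal-allocation (round-robin) baseline rather than the optimal policy, a zero-mean prior, and the convention that before each sample, sampling continues with probability `alpha`.

```python
import random


def simulate_once(rng, k, prior_var, noise_var, alpha):
    """One run: draw true means from the prior, sample until the deadline, score."""
    true_means = [rng.gauss(0.0, prior_var ** 0.5) for _ in range(k)]
    post = [(0.0, prior_var)] * k  # (mean, variance) per alternative
    n = 0
    # Assumption: before each sample, sampling continues with probability alpha.
    while rng.random() < alpha:
        x = n % k  # equal allocation (round-robin), a simple baseline policy
        y = rng.gauss(true_means[x], noise_var ** 0.5)
        m, v = post[x]
        new_v = 1.0 / (1.0 / v + 1.0 / noise_var)
        post[x] = (new_v * (m / v + y / noise_var), new_v)
        n += 1
    # Reward: number of alternatives whose sign is classified correctly.
    return sum((m > 0) == (mu > 0) for (m, _), mu in zip(post, true_means))


def bayes_performance(k=5, prior_var=1.0, noise_var=1.0, alpha=0.95,
                      runs=2000, seed=0):
    """Monte Carlo estimate of E[R] for the round-robin baseline."""
    rng = random.Random(seed)
    total = sum(simulate_once(rng, k, prior_var, noise_var, alpha)
                for _ in range(runs))
    return total / runs


avg_correct = bayes_performance()
print(avg_correct)
```

Swapping the round-robin line for another sampling rule gives a like-for-like comparison of policies under the same prior.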

Finding the Optimal Policy Means Solving a Dynamic Program

We wish to find the policy π that solves:

$$\sup_{\pi} E^{\pi}[R]$$

The solution is characterized theoretically via dynamic programming.

The curse of dimensionality usually makes computing the solution to such dynamic programs intractable.

We Rewrite the Problem as a Bandit Problem

The expected reward is the expected number of alternatives correctly classified at the end.

We decompose this expected reward into an infinite sum of discounted expected one-step rewards

$$E^{\pi}[R] = R_0 + E^{\pi}\left[ \sum_{n=1}^{\infty} \alpha^{n-1} R_n \right].$$

Here,

α is the parameter of the geometric distribution of the deadline.

R_0 is the expected reward if we stop after taking no samples.

R_n is the expected one-step improvement, due to sampling, of the probability of correctly classifying the alternative sampled.
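The one-step improvement described above can be computed numerically for a single alternative. The sketch below assumes a Normal(mean, var) posterior, classification by the sign of the posterior mean, and simple grid quadrature; the function names and the grid settings are illustrative choices, and any discounting convention folded into R_n on the slides is not modeled.

```python
import math


def norm_cdf(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_correct(mean, var):
    """P(sign of mu is classified correctly) under a Normal(mean, var) posterior."""
    return norm_cdf(abs(mean) / math.sqrt(var))


def one_step_improvement(mean, var, noise_var, grid=4001, width=8.0):
    """Expected gain in prob_correct from one more sample of this alternative."""
    new_var = 1.0 / (1.0 / var + 1.0 / noise_var)
    # Before the sample is seen, the next posterior mean is Normal(mean, var - new_var).
    spread = math.sqrt(var - new_var)
    h = 2.0 * width / (grid - 1)
    expected_after = 0.0
    for i in range(grid):
        z = -width + i * h
        weight = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi) * h
        expected_after += weight * prob_correct(mean + spread * z, new_var)
    return expected_after - prob_correct(mean, var)


gain_uncertain = one_step_improvement(mean=0.0, var=1.0, noise_var=1.0)
gain_confident = one_step_improvement(mean=3.0, var=1.0, noise_var=1.0)
print(gain_uncertain, gain_confident)  # the first is about 0.25
```

An alternative whose posterior mean sits at zero gains far more from a sample than one already far from zero, which is the intuition behind placing samples where they help most.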

We Can Compute the Optimal Policy

Written in this way, the problem becomes a multi-armed bandit problem.

(Gittins & Jones 1974) shows the optimal solution is

$$\arg\max_{x} \nu_x(S_{nx}),$$

where S_{nx} is a parameterization of the Bayesian posterior on µ_x.

The Gittins index ν_x(·) is defined in terms of a single-alternative version of the problem

$$\nu_x(s) = \sup_{\tau > 0} E\left[ \frac{\sum_{n=1}^{\tau} \alpha^{n-1} R_n}{\sum_{n=1}^{\tau} \alpha^{n-1}} \,\middle|\, S_{0x} = s,\ x_1 = \cdots = x_{\tau} = x \right].$$

We can compute Gittins indices efficiently because the single-alternative problem is much smaller than the full DP.
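The index rule above reduces each decision to an argmax over per-alternative index values. The sketch below shows only that selection step with a caller-supplied `index_fn`; computing true Gittins indices requires solving the single-alternative stopping problem and is not attempted here. The `toy_index` function and the state layout are hypothetical stand-ins, not the Gittins index.

```python
from typing import Callable, Dict, Tuple

State = Tuple[float, float]  # assumed posterior summary: (mean, variance)


def choose_next(states: Dict[int, State],
                index_fn: Callable[[int, State], float]) -> int:
    """Return the alternative x with the largest index value index_fn(x, S_x)."""
    return max(states, key=lambda x: index_fn(x, states[x]))


def toy_index(x: int, state: State) -> float:
    """Hypothetical stand-in: favors uncertain alternatives with means near zero."""
    mean, var = state
    return var ** 0.5 - abs(mean)


# Made-up posterior states for three alternatives.
states = {1: (0.9, 0.2), 2: (0.1, 0.5), 3: (-0.6, 1.0)}
chosen = choose_next(states, toy_index)
print(chosen)
```

Because each index depends only on its own alternative's state, the per-step cost grows linearly in the number of alternatives once the index function is available.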

The Optimal Policy Improves Accuracy in Ambulance Positioning

PE = pure exploration (sample at random);

MV = max variance (equal allocation);

OPT = optimal policy.

[Figure: comparison of PE, MV, and OPT; axis label: Time.]

Conclusion: The Optimal Policy Saves Time

The Multiple Comparisons with a Control problem appears in many different simulation applications.

We found the optimal method for deciding where to sample.

This allows accurately characterizing level sets more quickly and with fewer simulation samples.

Thank You

Any questions?

Literature Review

Much of the literature on this topic has focused on providing non-Bayesian statistical guarantees on accuracy.

One-stage policies: Tukey 1953, ...

Two-stage policies: Dudewicz and Ramberg 1972, ...

Fully sequential policies: Welsch 1977, Holm 1979, Hsu and Edwards 1983, Gopalan and Berry 1998, ...

We focus on sampling sequentially in a Bayes-optimal way.
