respondingtounexpectedoverloads inlarge...

15
MANAGEMENT SCIENCE Vol. 55, No. 8, August 2009, pp. 1353–1367 issn 0025-1909 eissn 1526-5501 09 5508 1353 inf orms ® doi 10.1287/mnsc.1090.1025 © 2009 INFORMS Responding to Unexpected Overloads in Large-Scale Service Systems Ohad Perry, Ward Whitt Department of Industrial Engineering and Operations Research, Columbia University, New York, New York 10027 {[email protected], [email protected]} W e consider how two networked large-scale service systems that normally operate separately, such as call centers, can help each other when one encounters an unexpected overload and is unable to immediately increase its own staffing. Our proposed control activates serving some customers from the other system when a ratio of the two queue lengths (numbers of waiting customers) exceeds a threshold. Two thresholds, one for each direction of sharing, automatically detect the overload condition and prevent undesired sharing under normal loads. After a threshold has been exceeded, the control aims to keep the ratio of the two queue lengths at a specified value. To gain insight, we introduce an idealized stochastic model with two customer classes and two associated service pools containing large numbers of agents. To set the important queue-ratio parameters, we consider an approximating deterministic fluid model. We determine queue-ratio parameters that minimize convex costs for this fluid model. We perform simulation experiments to show that the control is effective for the original stochastic model. Indeed, the simulations show that the proposed queue-ratio control with thresholds outperforms the optimal fixed partition of the servers given known fixed arrival rates during the overload, even though the proposed control does not use information about the arrival rates. Key words : service systems; call centers; overload controls; queue-ratio routing, many-server queues; deterministic fluid models History : Received August 27, 2008; accepted March 15, 2009, by Paul H. Zipkin, operations and supply chain management. Published online in Articles in Advance June 18, 2009. 1. Introduction In a large-scale service system, such as a call cen- ter, under normal circumstances the arrival rates vary by time of day in a predictable way, and the staffing responds to that anticipated pattern, typically with fixed staffing levels over specified time intervals; see Aksin et al. (2007) and Gans et al. (2003) for background. However, occasionally, for various rea- sons, there may be unforeseen surges in demand, going significantly beyond the usual fluctuations, and lasting for a significant period of time. A demand surge might occur because of a catastrophic event in emergency response, a system failure experienced by an alternative service provider, or an unanticipated intense television advertising campaign in retail. Such unexpected demand surges typically cause conges- tion that cannot be eliminated entirely. Because the demand surge is sudden and unexpected, it may not be possible to immediately change the staffing level. Fortunately, there may be an opportunity to allevi- ate the congestion caused by the overload by getting help from another service system, which ordinarily operates independently. For example, with the reduc- tion of telecommunication costs, it is more and more common to have networked call centers, often geo- graphically dispersed, even on different continents. Such sharing is typically possible among different hospitals in a metropolitan area. It is often desirable to operate these service systems separately, but their connection provides opportunities, in particular, to provide assistance under overloads. In this paper we consider how that might be done and how to assess the costs and benefits. An important consideration is that we typically do not want sharing under normal loads. One reason is that it is easier to manage the different facilities separately, e.g., by maintaining clear accountability. Another reason is that the agents in each service facil- ity may be less effective and/or less efficient serving the customers from the other system, because each requires specialized skills not required for the other. We want to consider the case in which serving the other class is possible, but that there are penalties for doing so. We will assume that the service rates are slower for nondesignated agents. The proposed overload control applies directly to separate service systems run by a single organization, but could also be adopted by two different organiza- tions by mutual agreement. Our analysis provides use- ful information about the likely consequences of any agreement, which should facilitate making the agree- ment. Current practice for call centers (that we are 1353 INFORMS holds copyright to this article and distributed this copy as a courtesy to the author(s). Additional information, including rights and permission policies, is available at http://journals.informs.org/.

Upload: others

Post on 16-May-2020

5 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: RespondingtoUnexpectedOverloads inLarge …users.iems.northwestern.edu/~perry/Responding.pdfMANAGEMENT SCIENCE Vol.55,No.8,August2009,pp.1353–1367 issn0025-1909 eissn1526-5501 09

MANAGEMENT SCIENCEVol. 55, No. 8, August 2009, pp. 1353–1367issn 0025-1909 �eissn 1526-5501 �09 �5508 �1353

informs ®

doi 10.1287/mnsc.1090.1025©2009 INFORMS

Responding to Unexpected Overloadsin Large-Scale Service Systems

Ohad Perry, Ward WhittDepartment of Industrial Engineering and Operations Research, Columbia University,

New York, New York 10027 {[email protected], [email protected]}

We consider how two networked large-scale service systems that normally operate separately, such as callcenters, can help each other when one encounters an unexpected overload and is unable to immediately

increase its own staffing. Our proposed control activates serving some customers from the other system whena ratio of the two queue lengths (numbers of waiting customers) exceeds a threshold. Two thresholds, one foreach direction of sharing, automatically detect the overload condition and prevent undesired sharing undernormal loads. After a threshold has been exceeded, the control aims to keep the ratio of the two queue lengthsat a specified value. To gain insight, we introduce an idealized stochastic model with two customer classes andtwo associated service pools containing large numbers of agents. To set the important queue-ratio parameters,we consider an approximating deterministic fluid model. We determine queue-ratio parameters that minimizeconvex costs for this fluid model. We perform simulation experiments to show that the control is effective for theoriginal stochastic model. Indeed, the simulations show that the proposed queue-ratio control with thresholdsoutperforms the optimal fixed partition of the servers given known fixed arrival rates during the overload, eventhough the proposed control does not use information about the arrival rates.

Key words : service systems; call centers; overload controls; queue-ratio routing, many-server queues;deterministic fluid models

History : Received August 27, 2008; accepted March 15, 2009, by Paul H. Zipkin, operations and supply chainmanagement. Published online in Articles in Advance June 18, 2009.

1. IntroductionIn a large-scale service system, such as a call cen-ter, under normal circumstances the arrival rates varyby time of day in a predictable way, and the staffingresponds to that anticipated pattern, typically withfixed staffing levels over specified time intervals;see Aksin et al. (2007) and Gans et al. (2003) forbackground. However, occasionally, for various rea-sons, there may be unforeseen surges in demand,going significantly beyond the usual fluctuations, andlasting for a significant period of time. A demandsurge might occur because of a catastrophic event inemergency response, a system failure experienced byan alternative service provider, or an unanticipatedintense television advertising campaign in retail. Suchunexpected demand surges typically cause conges-tion that cannot be eliminated entirely. Because thedemand surge is sudden and unexpected, it may notbe possible to immediately change the staffing level.Fortunately, there may be an opportunity to allevi-

ate the congestion caused by the overload by gettinghelp from another service system, which ordinarilyoperates independently. For example, with the reduc-tion of telecommunication costs, it is more and morecommon to have networked call centers, often geo-graphically dispersed, even on different continents.

Such sharing is typically possible among differenthospitals in a metropolitan area. It is often desirableto operate these service systems separately, but theirconnection provides opportunities, in particular, toprovide assistance under overloads. In this paper weconsider how that might be done and how to assessthe costs and benefits.An important consideration is that we typically do

not want sharing under normal loads. One reasonis that it is easier to manage the different facilitiesseparately, e.g., by maintaining clear accountability.Another reason is that the agents in each service facil-ity may be less effective and/or less efficient servingthe customers from the other system, because eachrequires specialized skills not required for the other.We want to consider the case in which serving theother class is possible, but that there are penalties fordoing so. We will assume that the service rates areslower for nondesignated agents.The proposed overload control applies directly to

separate service systems run by a single organization,but could also be adopted by two different organiza-tions by mutual agreement. Our analysis provides use-ful information about the likely consequences of anyagreement, which should facilitate making the agree-ment. Current practice for call centers (that we are

1353

INFORMS

holds

copyrightto

this

article

and

distrib

uted

this

copy

asa

courtesy

tothe

author(s).

Add

ition

alinform

ation,

includ

ingrig

htsan

dpe

rmission

policies,

isav

ailableat

http://journa

ls.in

form

s.org/.

Page 2: RespondingtoUnexpectedOverloads inLarge …users.iems.northwestern.edu/~perry/Responding.pdfMANAGEMENT SCIENCE Vol.55,No.8,August2009,pp.1353–1367 issn0025-1909 eissn1526-5501 09

Perry and Whitt: Responding to Unexpected Overloads in Large-Scale Service Systems1354 Management Science 55(8), pp. 1353–1367, © 2009 INFORMS

aware of) is limited to sharing within a single organi-zation, and then only manually or on a regular basisunder normal loading. Load-balancing schemes usedin practice are described in §5.3 of Gans et al. (2003).Thus, our goal is to develop a control to automati-

cally detect when an overload has occurred (in eithersystem, or in both) and, then, before the staffing lev-els can be changed, reduce the resulting congestionby activating appropriate sharing from agents in theother system. We also want to prevent undesired shar-ing under normal loads. By focusing on this overloadproblem, we aim to contribute new insight into thelongstanding question about the costs and benefits ofresource pooling; see §4.2 of Aksin et al. (2007) andreferences therein. Here we focus on a situation wherewe want to turn on and off the pooling.

1.1. Organization of the PaperWe start in §2 with a literature review. Next, in§3, we introduce our proposed modelling approach.As an idealized model of two large-scale service sys-tems, which ordinarily operate separately, but havethe capability of serving customers from the othersystem, we consider the Markovian X call-centermodel having two homogeneous customer classesand two homogeneous agent pools, where all theagents are cross trained but serve the other class inef-ficiently. For clarity, we provide a concrete example.We then introduce a cost framework to evaluate alter-native controls. We indicate how we specify an over-load incident and how we evaluate the performanceconsequence.In §4, we introduce the proposed control, which is

a variant of the queue-ratio controls introduced byGurvich and Whitt (2009a, b). After reviewing thatcontrol (without thresholds), we show that it can per-form very poorly for this unintended application,because it can induce inefficient sharing simultane-ously in both directions. We then introduce our pro-posed alternative, which includes two thresholds, onefor each direction of sharing.In §5, we introduce a deterministic fluid model to

approximate the overloaded system after the over-load incident has occurred. We then introduce a con-vex cost structure and show how to select queue-ratiofunctions to minimize the long-run average cost inthe overload incident for the fluid model. We thendevelop a numerical algorithm to compute the opti-mal queue-ratio functions for arbitrary convex costfunctions. We exhibit explicit formulas for the optimalqueue-ratio functions for special structured separablecost functions in §EC.4 in the e-companion.1

1 An electronic companion to this paper is available as part of the on-line version that can be found at http://mansci.journal.informs.org/.

In §6, we discuss how to set the threshold param-eters. In §7, we conduct simulation experiments toshow that the optimal control for the fluid model iseffective for the stochastic X model. Finally, we stateour conclusions in §8. Supporting material appears inthe e-companion.

2. Literature ReviewIn this paper, we contribute to the literature onoverload (or congestion) control in queueing sys-tems. There is a substantial literature studying con-trols that route (or assign) customers (or jobs) toservers, possibly exploiting thresholds. Many of thesepapers, such as Bell and Williams (2005) and refer-ences therein, focus on single-server systems withoutcustomer abandonment, whereas we focus on many-server systems with customer abandonment; we onlydiscuss the many-server literature. (The distinction isbetween routing to one of several servers, as opposedto routing to one of several pools of servers.) It is nowunderstood that the presence of many servers changesthe problem; e.g., see Gurvich and Whitt (2009a). Onefeature of many-server systems with customer aban-donment we will exploit is the rate at which the tran-sient distribution approaches its steady-state limit:it tends to be much faster for many-server queues.In particular, the systems we consider tend to reachsteady state in a few mean service times; we elaboratein §EC.1. Hence, in our analysis of performance dur-ing an overload incident, we approximate using thenew steady state, determined by the new arrival rates(assumed constant). Customer abandonments ensurethat the system remains stable.Our paper can also be viewed as a contribution to

the call-routing problem for multiclass and multisitecall centers with skill-based routing; see §5 of Ganset al. (2003) and §§2.3.3, 4.1, and 4.2 of Aksin et al.(2007). Others have proposed responding to stochasticfluctuations and unexpected overloads by modulat-ing demand in different ways: (i) admission control;(ii) making delay announcements that may inducecustomers to leave, use a different service channel(e.g., e-mail instead of voice), or call back later; and(iii) acting to reduce service times, e.g., by curtailingcross-selling activities; see §3 of Aksin et al. (2007) andArmony and Gurvich (2006).In contrast, our paper relates to the larger liter-

ature exploiting server flexibility (supply-side man-agement). One approach is to have extra temporaryservers available on short notice; see Bhandari et al.(2008) and references therein. Instead, we proposeusing servers that are already working; i.e., we pro-pose a form of resource pooling, which exploits crosstraining; see §4.2 of Aksin et al. (2007) and §5.1 of Ganset al. (2003). As should be anticipated, though, our

INFORMS

holds

copyrightto

this

article

and

distrib

uted

this

copy

asa

courtesy

tothe

author(s).

Add

ition

alinform

ation,

includ

ingrig

htsan

dpe

rmission

policies,

isav

ailableat

http://journa

ls.in

form

s.org/.

Page 3: RespondingtoUnexpectedOverloads inLarge …users.iems.northwestern.edu/~perry/Responding.pdfMANAGEMENT SCIENCE Vol.55,No.8,August2009,pp.1353–1367 issn0025-1909 eissn1526-5501 09

Perry and Whitt: Responding to Unexpected Overloads in Large-Scale Service SystemsManagement Science 55(8), pp. 1353–1367, © 2009 INFORMS 1355

control tends to be more effective in alleviating con-gestion (rather than just balancing the service degra-dation) when the less-loaded system actually has someslack. Our work draws on the queue-ratio control pro-posed in Gurvich and Whitt (2009a, b), which appliesto very general network topologies. Here we considerthe relatively difficult X model, allowing sharing inboth directions (see Figure 1), but our approach makesthe model behave more like the N model; see Tezcanand Dai (2009).However, we make significant departures from the

previous literature. First, we want resource sharingonly in the presence of the unanticipated overload,and only in the proper direction, which depends onthe nature of the overload. Hence, we turn on andoff the sharing. Second, we regard the overload asa rare, exceptional, unanticipated event, rather thana stochastic fluctuation in demand. Thus, we thinkthat it is inappropriate to perform a long-run steady-state analysis of system performance with alternat-ing normal and overload periods (although that couldbe done). Instead, we focus on a single overload inisolation.Because the system tends to be overloaded, even

after sharing has been activated, system performancetends to be well approximated by deterministicfluid approximations, as in Whitt (2004). From aheavy-traffic perspective, the system operates in theso-called efficiency driven (ED) many-server heavy-traffic regime, instead of the quality-and-efficiency-driven (QED) regime; see Garnett et al. (2002) andGurvich and Whitt (2009a, b). Our paper also relatesto the literature on arrival-rate uncertainty; see §4.4of Gans et al. (2003) and §2.4 of Aksin et al. (2007).Arrival-rate uncertainty also tends to make determin-istic fluid approximations remarkably accurate; e.g.,see Whitt (2006), Bassamboo and Zeevi (2009), andreferences therein.In closing, we mention the large literature on detec-

tion outside queueing, such as control charts in statis-tical quality control, sequential analysis, and changepoint problems, but we make no direct contact with it.However, detecting the new arrival rate is not theonly issue: in simulations, our proposed control out-performs the optimal fixed partition of the serversgiven known arrival rates during the overload, eventhough our proposed control does not use direct infor-mation about the arrival rates; see §7.2.

3. Modelling Approach3.1. X ModelAs an idealized model of two separate service sys-tems with the capability of sharing, we consider theX model, depicted in Figure 1. The X model has twohomogeneous customer classes and two homogeneous

Figure 1 X Call-Center Model

Customer class 1

λ1 λ2

µ11

Arrivals

µ21

Same

µ12µ22

OtherSame

Routing

Service pool 2

Queues

Abandonment

θ1 θ2

m1agents

m2agents

Other

Abandonment

Class-dependentservice rates

Customer class 2

Service pool 1

agent pools. We assume that each customer class hasa service pool primarily dedicated to it, but all agentsare cross trained, so that they can handle calls fromthe other class, even though they may do so ineffi-ciently or ineffectively. Under normal loading (at ornear forecasted arrival rates), we want each class to beserved only by its designated agents, without any helpfrom cross-trained agents in the other service pool. Weassume that staffing has been performed in standardways, so that the number of agents in each pool is ade-quate to meet performance targets at forecasted arrivalrates. However, we also want to automatically acti-vate sharing when there are unexpected unbalancedoverloads, either when only one class is overloadedor when both classes are overloaded but one is muchmore overloaded than the other.More specifically, in this paper, we consider a fully

Markovian model. Customers from the two classesarrive according to independent Poisson processeswith arrival rates �1 and �2. There is a queue for eachcustomer class, with customers from each class enter-ing service in order of arrival. We assume that waitingcustomers have limited patience. A class i customerwill abandon if he does not start service before arandom time that is exponentially distributed withmean 1/�i. There are two service pools, with pool jhaving mj homogeneous servers working in parallel.The service times are mutually independent exponen-tial random variables, but the mean may depend onboth the customer class and the service pool. The meanservice time for a class i customer served by a type jagent is 1/i j . Let the service times, abandonmenttimes, and arrival processes be mutually independent.Let Qi�t� be the number of class i customers in queueand let Zi j �t� be the number of type j agents busy

INFORMS

holds

copyrightto

this

article

and

distrib

uted

this

copy

asa

courtesy

tothe

author(s).

Add

ition

alinform

ation,

includ

ingrig

htsan

dpe

rmission

policies,

isav

ailableat

http://journa

ls.in

form

s.org/.

Page 4: RespondingtoUnexpectedOverloads inLarge …users.iems.northwestern.edu/~perry/Responding.pdfMANAGEMENT SCIENCE Vol.55,No.8,August2009,pp.1353–1367 issn0025-1909 eissn1526-5501 09

Perry and Whitt: Responding to Unexpected Overloads in Large-Scale Service Systems1356 Management Science 55(8), pp. 1353–1367, © 2009 INFORMS

serving class i customers, at time t. With the assump-tions above, the stochastic process �Qi�t�Zi j �t�� i =12� j = 12� becomes a six-dimensional continuous-time Markov chain, given any routing policy thatdepends on this six-dimensional state.In this context, under normal loading we want each

class served only by agents from its own designatedservice pool; i.e., we want Z12�t�≈Z21�t�≈ 0 for all t.One possible reason is that the value of service byagents from the other pool might be less, perhapsbecause they lack specialized skills. Another possi-ble reason is that service by the cross-trained agentsis less efficient; we might have the strong inefficient-sharing condition

11 >12 and 22 >21� (1)

We examine the inefficient-sharing case. Through-out this paper, we assume the basic inefficient-sharingcondition

1122 ≥1221� (2)

Clearly, condition (1) implies condition (2). These con-ditions play a role in §5.2.In this X-model setting with inefficient sharing, we

suppose that an unexpected overload occurs at someunanticipated time that changes the arrival rates. Weassume that we are unable to immediately change thestaffing levels in response to that unexpected over-load. However, we do have the option of allowingsome of the cross-trained agents from the less-loadedservice pool serve customers from the more over-loaded customer class. In addition, we do not knowthe new arrival rates when the overload occurs. Thus,we need to develop a control that depends on thesystem history; in some way we must discover thatthe arrival rates have indeed changed. That is chal-lenging, because stochastic fluctuations under normalloading may make us think that the arrival rates havechanged when in fact they have not. We illustratewith the following example.Example 1. To illustrate, consider a symmetric

model with forecasted arrival rates �1 = �2 = 90 perunit of time, where the mean service time for cus-tomers served by designated agents is −1

11 = −122 =

1�0, whereas the mean service time for customersserved by agents from the other pool is −1

12 = −121 =

1�25. We measure time in units of mean service timesby designated agents, which for discussion we take tobe five minutes. Notice that condition (1) holds here:For all agents, the mean time required to serve theother class is 25% greater than the mean time requiredto serve an agent’s own class. Let customers abandonat rate �1 = �2 = 0�4.Because serving the other class is less efficient, with

these parameters it makes sense to operate the systemas two separate systems. Following standard staffing

methods for a single-class single-pool M/M/m +Mmodel, we may assign m1 = m2 = 100 agents to thetwo service pools. That makes the traffic intensities�1 ≡ �1/m111 = �2 = 0�90, which we regard as nor-mal loading. With this staffing, standard algorithmsshow that steady-state performance is quite good:82% of the arrivals enter service immediately uponarrival without joining the queue, only 0.5% of thearrivals abandon, the average size of each queueis 1�1, and the expected conditional waiting time,given that the customer is served, is only 0.012 (about3.6 seconds with a mean service time of five minutes).Now suppose that, at some unanticipated time, the

arrival rate for class 1 jumps to �1 = 130, whereas thearrival rate for class 2 remains at �2 = 90. If class 1receives no help from pool 2, then class 1 experiencessevere congestion. Assuming that the system reachessteady state after this shift in arrival rate (which doesnot take very long, approximately a few mean servicetimes, as confirmed by simulations; see §EC.1), almostall class 1 customers must wait before starting service,23% of the class 1 customers abandon, the averagesize of the class 1 queue becomes 75, the expectedconditional waiting time given that a class 1 customeris served is 0�65 (3�25 minutes).If, as system managers, we were able to recognize

that the class 1 arrival rate had shifted to 130, thenwe might elect to reassign some of the class 2 agents.For example, we might let 25 of the pool 2 agentsbe devoted to serving class 1. That increases the totalservice rate responding to the class 1 arrival rate of130 from 100 to 100+ �1/1�25�25 = 120, and leaves atotal service rate of 100 − 25 = 75 to respond to theclass 2 arrival rate of 90. Because sharing is inefficient,we must sacrifice 25 units of service rate for class 2 togain 20 units of service rate for class 1.Assuming that the two classes can be modelled

as M/M/m + M queues (which is only approxi-mately correct for class 1 because its servers havebecome heterogenous), we can analyze the perfor-mance, e.g., by Whitt (2005). The pair of abandon-ment probabilities for the two classes changes from(0�230�005) to (0�080�17); the pair of mean queuelengths for the two classes changes from (751�1) to(2638); and the pair of conditional expected wait-ing times given that the customer is served changesfrom (0�650�012) to (0�2050�450) (1.03 minutes and2�25 minutes, respectively). In this paper we developa control that responds in a similar way, but does soautomatically without having to know that the arrivalrates made that specific shift, and without making afixed partition of the agents.

3.2. Analysis with a Cost FunctionThe advantage of such sharing, or any other controlthat produces similar sharing by the inefficient cross-trained agents, depends on the cost of the congestion

INFORMS

holds

copyrightto

this

article

and

distrib

uted

this

copy

asa

courtesy

tothe

author(s).

Add

ition

alinform

ation,

includ

ingrig

htsan

dpe

rmission

policies,

isav

ailableat

http://journa

ls.in

form

s.org/.

Page 5: RespondingtoUnexpectedOverloads inLarge …users.iems.northwestern.edu/~perry/Responding.pdfMANAGEMENT SCIENCE Vol.55,No.8,August2009,pp.1353–1367 issn0025-1909 eissn1526-5501 09

Perry and Whitt: Responding to Unexpected Overloads in Large-Scale Service SystemsManagement Science 55(8), pp. 1353–1367, © 2009 INFORMS 1357

experienced. To assess that cost, we will assume thatthere is a cost function C, with C�Q1�t�Q2�t�� repre-senting the expected cost rate incurred at time t if thevector of queue lengths at time t is (Q1�t�Q2�t�). Ifthe overload incident takes place over the time inter-val �a b�, then the expected total cost would be

CT ≡ E[∫ b

aC�Q1�t�Q2�t�� dt

]

=∫ b

aE�C�Q1�t�Q2�t��� dt� (3)

We assume that the cost function C is convex andstrictly increasing. The convexity explains why wemight want to share when one class is much moreoverloaded than the other, no matter which class isoverloaded.In this context, our goal is to choose a routing pol-

icy, which may allow assignments to cross-trainedagents, to achieve low (near-minimum) expected totalcost for all possible overload incidents and resultingstochastic processes (Q1�t�Q2�t�), although produc-ing only a negligible amount of sharing under nor-mal loading. To define what we mean by an “over-load incident,” we can first specify an interval �a c�over which the arrival-rate vector (�1�t��2�t�) differsfrom the nominal vector. (We assume that the arrivalprocess is a nonhomogeneous Poisson process withthese new arrival rates.) However, we should alsoinclude an additional interval �c b� after time c toallow the vector queue length (Q1�t�Q2�t�) to returnto its nominal steady-state value. (Engineering judge-ment is required.) In our analysis, we simplify byrestricting attention to scenarios, as in the exampleabove, in which the pair of arrival rates (�1�2) makesa sudden unexpected shift at some time, and remainsat the new vector for a significant duration, so thatthe system reaches a new steady state at the newarrival-rate vector. (Customer abandonment ensuresthat the system reaches steady state for any arrival-rate vector.) Our control applies more generally.For such scenarios, we simplify by reexpressing our

goal as minimizing the expected steady-state cost;i.e., we aim to minimize CT ≡ E�C�Q1Q2��, where(Q1Q2) is the vector of steady-state queue lengthsassociated with the new arrival-rate vector associatedwith the overload. We will use this steady-state over-load framework to set the control parameters anddemonstrate effectiveness, but the control applies toother overload scenarios. For this steady-state anal-ysis to be effective, it is important that the systemapproaches the new steady state associated with theoverload relatively quickly. As illustrated in the con-crete example above, this tends to happen in a fewmean service times. We discuss this important pointfurther in §EC.1.

In the context of Example 1, we might have a shiftin arrival rates lasting five hours. It might not be pos-sible to change the staffing in response, because it isin the middle of the same day. The initial transientperiod might last three mean service times or 15 min-utes, which is 5% of the total overload incident. Theremight then be a recovery period lasting about fivemean service times or 25 minutes, after which the sys-tem returns to steady state. For such overloads, thesteady state is evidently reasonable, and it is essentialfor tractability. Even with this simplifying approxima-tion, the control problem for the stochastic system isvery difficult. We will get an approximate solutiononly after exploiting a fluid approximation in additionto this steady-state analysis; see §5.2. Even with thatapproximation, the analysis with a general increasingconvex cost function gets complicated; see §5.2. How-ever, as a byproduct, there is a very nice, simple story(explicit formulas for everything), provided that weassume a separable quadratic power cost function; seeProposition 5.

4. Proposed ControlWe start by briefly reviewing the fixed-queue-ratio(FQR) routing rule from Gurvich and Whitt (2009b)and then we show that the FQR rule without thresh-olds can perform poorly with inefficient sharing,where the conditions in the theorems of Gurvich andWhitt (2009b) are violated. Then we introduce ourproposed modification of FQR to treat unexpectedoverloads. It involves general queue-ratio functions,as in Gurvich and Whitt (2009a), and thresholds, oneof each for each direction of sharing.

4.1. FQR and Its Difficulties withInefficient Sharing

With two queues, FQR can be implemented by con-sidering a (weighted) queue-difference stochastic processD�t�≡Q1�t�− rQ2�t�, t ≥ 0, where r is a single target-ratio parameter that management can set. With FQRfor the X model, a newly available agent in eitherservice pool serves the customer at the head of theclass 1 (class 2) queue if D�t� > 0 (D�t� < 0), and servesthe customer at the head of its own queue if D�t� =0. The goal of FQR is to maintain a nearly constantqueue ratio: Q1�t�/Q2�t� ≈ r throughout time. Whenr = 1, FQR coincides with serving the longer queue.Under regularity conditions, the FQR control has

two very desirable features for large-scale service sys-tems, which makes it possible to reduce the multiclassmultipool staffing-and-routing problem to the well-understood, single-class single-pool staffing problem.First, if the required conditions are satisfied, then FQRtends to produce state-space collapse (SSC); i.e., for theX model, the two-dimensional queue-length vector

INFORMS

holds

copyrightto

this

article

and

distrib

uted

this

copy

asa

courtesy

tothe

author(s).

Add

ition

alinform

ation,

includ

ingrig

htsan

dpe

rmission

policies,

isav

ailableat

http://journa

ls.in

form

s.org/.

Page 6: RespondingtoUnexpectedOverloads inLarge …users.iems.northwestern.edu/~perry/Responding.pdfMANAGEMENT SCIENCE Vol.55,No.8,August2009,pp.1353–1367 issn0025-1909 eissn1526-5501 09

Perry and Whitt: Responding to Unexpected Overloads in Large-Scale Service Systems1358 Management Science 55(8), pp. 1353–1367, © 2009 INFORMS

(Q1�t�Q2�t�) tends to evolve approximately as a one-dimensional process determined by the total queuelength Q!�t� ≡ Q1�t� + Q2�t�. In particular, Qi�t� ≈piQ!�t� for i = 12, where p1 = r/�1 + r� = 1 − p2;e.g., see Figure EC.8 in §EC.2. Moreover, it does soin a way such that all three stochastic processes—Q!�t�, Q1�t�, and Q2�t�—remain appropriately sta-ble as t → . Indeed, Gurvich and Whitt (2009b)show that, under regularity conditions, FQR achievesSSC asymptotically in the QED many-server heavy-traffic limiting regime. Second, with FQR, it is possi-ble to choose the ratio parameter r (or, equivalently,the queue proportions pi) to determine the optimallevel of staffing to achieve desired service-level dif-ferentiation; i.e., staffing costs are minimized sub-ject to meeting class-dependent delay targets P�Wi >Ti� = %; see §EC.2 and Gurvich and Whitt (2009b).Gurvich and Whitt (2009a) also showed how to staffto minimize convex costs under normal loading. Inthat case, the asymptotically optimal control in theQED regime is not FQR, but a state-dependent gen-eralization: the queue-and-idleness-ratio (QIR) control.Our optimal queue ratios for the fluid model underoverloading with convex costs are of the same state-dependent form.However, in our setting, where service provided

by nondesignated agents is inefficient, neither FQRnor QIR, without the extra thresholds, is appropri-ate in normal loading, because they induce unde-sired sharing. Because of the inefficient sharing, thesystem is not work conserving; sharing causes therequired workload to increase. Indeed, the conditionsin the key theorems of Gurvich and Whitt (2009a,b) are violated. In fact, those conditions are actu-ally needed to maintain stability. (However, for FQRwithout the thresholds, SSC is still achieved; the twoqueues explode together.)Example 2. To illustrate, consider the X model

with parameters m1 = m2 = 100, 11 = 22 = 1�0,12 = 21 = 0�8, �1 = �2 = 0�99, and �1 = �2 = 0�0(no abandonment). Because the traffic intensities are�i = �i/mii i = 0�99, the two separate systems with-out sharing are stable (with mean queue length 85and mean waiting time 0�85). However, if we use FQRwith r = 1, then inefficient sharing is generated, sothat a significant proportion of each agent pool is busyserving the other class. As a consequence, the arrivalrate actually exceeds the service rate and the queuelengths diverge to infinity. Here, there still is SSC, butthe two queue lengths diverge together.This difficulty when FQR is applied inappropri-

ately is illustrated by Figures 2 and 3. They show thesample paths of Q1�t� and Z21�t�, starting empty, inone simulation run. After an initial transient period,the number of agents serving the other class fluctu-ates around E�Z12�= E�Z21�≈ 39, whereas the queue

Figure 2 Sample Path of Z2�1�t� for FQR

0 200 400 600 800 1,0000

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8Proportion of type 1 servers serving class 2 customers

Time

Pro

port

ion

grows in an approximately linear rate; the simulationestimate is E�Qi�t��≈ 6�8t, t ≥ 0. (These numerical val-ues are estimated from multiple simulation runs. Theconfidence intervals are less than 1%. We develop ana-lytical approximations to describe this behavior in asubsequent paper.)Customer abandonment necessarily prevents the

queues from exploding. Even in the worst case, whenall agents are dedicated to the wrong class, the sys-tem would be stable. However, there still is perfor-mance degradation, e.g., with �1 = �2 = 0�2 and r = 1about 39% of the agents in each pool are busy serv-ing customers from the other class which causes thequeues to grow from 10, if there is no sharing, to 34.More details appear in §EC.2.

4.2. Proposed Control: FQR-THere is the lesson from the previous subsection: If weare going to use a queue-ratio control, then we need to

Figure 3 Sample Path of Q1�t� for FQR

0 200 400 600 800 1,0000

1,000

2,000

3,000

4,000

5,000

6,000

7,000Number of customers in class 1 queue

Time

Num

ber

in q

ueue

Queue 16.8t

INFORMS

holds

copyrightto

this

article

and

distrib

uted

this

copy

asa

courtesy

tothe

author(s).

Add

ition

alinform

ation,

includ

ingrig

htsan

dpe

rmission

policies,

isav

ailableat

http://journa

ls.in

form

s.org/.

Page 7: RespondingtoUnexpectedOverloads inLarge …users.iems.northwestern.edu/~perry/Responding.pdfMANAGEMENT SCIENCE Vol.55,No.8,August2009,pp.1353–1367 issn0025-1909 eissn1526-5501 09

Perry and Whitt: Responding to Unexpected Overloads in Large-Scale Service SystemsManagement Science 55(8), pp. 1353–1367, © 2009 INFORMS 1359

take extra measures to prevent sharing under normalloading. First, we want to prevent simultaneous inef-ficient sharing in both directions. Hence, we restrictthe routing to one-way sharing at any time: We do notallow a newly available type 2 agent to serve a wait-ing class 1 customer if there are any type 1 agentsbusy serving class 2 customers. And similarly in theother direction. (However, over time, the direction ofone-way sharing may change; we are not consideringthe so-called N model, which only allows one-waysharing in one fixed direction.)From cost considerations, discussed in §5, we want

to allow different ratio parameters r12 and r21 forthe different ways we may share. (In general, wemay need more complicated ratio functions or, equiv-alently, sharing regions; see §5, especially Figure 4.)To permit sharing only in the presence of unbal-anced overloads, we suggest fixed-queue-ratio routingwith thresholds (FQR-T). In addition to the two ratioparameters r12 and r21, we introduce two positivethresholds &12 and &21. We then define two queue-difference stochastic processes:

D12�t�≡Q1�t�− r12Q2�t� and

D21�t�≡ r21Q2�t�−Q1�t��(4)

As long as D12�t� < &12 and D21�t� < &21, we do notallow any sharing, i.e., we only let agents serve cus-tomers from their designated class.However, available pool 2 agents are assigned to

class 1 customers when D12�t� ≥ &12, provided thatno pool 1 agents are still serving a class 2 customer.As soon as the first pool 2 agent is assigned to servea class 1 customer, we drop the threshold &12, butkeep the other threshold &21. (We could elect to addanother threshold for the sharing; see §EC.4.) Onceone-way sharing has been activated with pool 2 help-ing class 1, we use ordinary FQR with ratio param-eter r12. Upon service completion, a newly availabletype 2 agent serves the customer at the head of theclass 1 queue (the class 1 customer who has waitedthe longest) if D12�t� > 0; otherwise the agent servesa customer from his own class. In this phase, pool 1agents only serve class 1 customers. Only one-waysharing in this direction will be allowed until eitherthe class 1 queue becomes empty or the other differ-ence process crosses the other threshold, i.e., whenD21�t�≥ &21. As soon as either of these events occurs,newly available pool 2 agents are only assigned toclass 2 and the threshold &12 is reinstated.We can initiate sharing in the opposite direction

when first D21�t�≥ &21 and there are no class 2 agentsserving class 1 customers. At the first time both condi-tions are satisfied, we start sharing with a pool 2 agentserving a class 1 customer. When that first assign-ment takes place, we remove the threshold &21 and

again use FQR with one-way sharing, but now withthe ratio parameter r21.Upon arrival, a class i customer is routed to pool i if

there are idle servers; otherwise the arrival goes to theend of the class i queue. An arrival might increasethe queue to a point that sharing is activated. Thenthe first customer in queue is served by the other class(presumably the agent that has been idle the longest,but we do not focus on individual agents).The queue-difference stochastic processes in (4) will

never provide any instantaneous motivation to haveagents of both types simultaneously inefficiently serv-ing the other class if r12 ≥ r21. That property will besatisfied when we apply a cost function to specify theratio parameters in §5.2.To illustrate how FQR-T performs in normal loading

(heavy load, but not overloaded), we again considerExample 2 with abandonments at rate �i = 0�2. We letr12 = r21 = 1, so that there is no change from FQRabove, but nowwe add thresholds &12 = &21 = 10. Theperformance is greatly improved with FQR-T com-pared to FQR without thresholds: E�Z12� = E�Z21� ≈2�0 for FQR-T, whereas E�Z12� = E�Z21� ≈ 39 forFQR. As a consequence, the performance for FQR-Tis almost the same as without sharing. In particular,with FQR-T, the abandonment rate is slightly higherthan without sharing (2�5% compared to 2�0%), but theaverage queue length is actually less �9�4 comparedto 10�0). In fact, FQR-T can outperform no sharingwith larger threshold values, because of the resource-pooling effect. For more details, see §EC.2.

5. Fluid Approximation for SteadyState of the X Model

To obtain a tractable characterization of performancefor FQR-T and find good queue-ratio parameters, wenow introduce a deterministic fluid approximation.To describe the steady-state behavior of our modelwhen there is no sharing, we first discuss the caseof a single customer class served by a single servicepool—the classical M/M/m+M model, with arrivalrate �, individual service rate , and abandonmentrate �. Afterward we treat the more general X model.

5.1. One Class and One PoolFor theM/M/m+M model, the approximating deter-ministic fluid model has been studied in Whitt (2004)via many-server heavy-traffic limits. Here we willderive the simple steady-state formulas directly. Weassume that input and output (which we call fluid)occurs deterministically at the specified rates. Wethink of the system as large and thus regard the num-ber of customers and servers as continuous quanti-ties as well. Thus, fluid arrives deterministically andcontinuously at constant rate �. Fluid also is served

INFORMS

holds

copyrightto

this

article

and

distrib

uted

this

copy

asa

courtesy

tothe

author(s).

Add

ition

alinform

ation,

includ

ingrig

htsan

dpe

rmission

policies,

isav

ailableat

http://journa

ls.in

form

s.org/.

Page 8: RespondingtoUnexpectedOverloads inLarge …users.iems.northwestern.edu/~perry/Responding.pdfMANAGEMENT SCIENCE Vol.55,No.8,August2009,pp.1353–1367 issn0025-1909 eissn1526-5501 09

Perry and Whitt: Responding to Unexpected Overloads in Large-Scale Service Systems1360 Management Science 55(8), pp. 1353–1367, © 2009 INFORMS

and abandons deterministically and continuously atrates that are directly proportional to the number ofbusy servers and the queue length, respectively. If the“number” of busy servers is x, then fluid is served atrate x; if the queue length is q, then fluid abandonsat rate q�.We say that the system is overloaded if the input

rate exceeds the maximum possible total service rate.Given m servers, each working at rate , the max-imum possible total service rate is m. Thus, thesystem is overloaded if � > m, and not overloadedotherwise. If the system is overloaded, then in steadystate all servers will be busy and there will be a queueof waiting fluid, with content q, which can be deter-mined simply be equating the rate in to the rate out,including customer abandonment: rate in≡ �=m+q� ≡ rate out. As an immediate consequence, we getq = ��−m�/�. If the system is not overloaded, i.e.,if �≤m, then there will be no queue. Then we candescribe the steady state via the amount of spare ser-vice capacity (number of idle servers), s, which againcan be determined by equating the rate in to the rateout: rate in≡ �= �m− s�≡ rate out. As an immediateconsequence, we get s = m− ��/�. Without directlyspecifying whether or not the system is overloaded,we can write

q = ��−m�+

�and s =

(m− �

)+ (5)

where �x�+ ≡max*x0+. We always have the comple-mentarity relation qs = 0.From the point of view of our analysis, we regard �

as an unknown parameter, but we consider theremaining parametersm, , and � as fixed and known.For any given �, we can compute q and s as indicatedabove. With our overload control problem in mind, itis significant that we can recover � from the pair (q s),because we want to learn about � by observing (q s).If q > 0 and s = 0, then necessarily we are overloaded,and � = �q +m; if q = 0 and s > 0, then necessarilywe are underloaded (which includes normally loaded),and � = �m− s�; if q = 0 and s = 0, then necessarilywe are critically loaded, and � = m; we cannot haveq > 0 and s > 0. For an overloaded fluid queue, � isan increasing linear function of q; for an underloadedqueue, � is a decreasing linear function of s.As discussed in Whitt (2004), we can also describe

the transient behavior of the fluid model and deter-mine other performance measures. For example, if thefluid model is overloaded, then the associated approx-imate potential steady-state waiting time (virtual wait-ing time for a customer with infinite patience) is w =log ��/m�/�1 = log ���/�1, where �≡ �/m is the traf-fic intensity, satisfying �> 1; see (2.26) of Whitt (2004).Note that an increasing convex function of w is an

increasing convex function of � for �≥m. Because �

is a positive linear function of q under overloads, wesee that an increasing convex function of w itself is aconvex increasing function of q, as we have assumedin our optimization formulation. Similarly, the aban-donment rate in the overloaded fluid model is �q =�−m, so the abandonment rate is an increasing lin-ear function of q under overloads.

5.2. Optimal Solution for the X Fluid ModelThe X fluid model is a natural generalization ofthe single-class single-pool fluid model above. Nowwe have two deterministic arrival rates �1 and �2,one for each class, with the additional parame-ters *mj �ii j� i = 12� j = 12+. Closely parallelingthe discussion above, we will be characterizing thesteady-state performance in terms of the quantities(Q1Q2 S1 S2), where Qi is the fluid content at theclass i queue, whereas Sj is the amount of spare capac-ity at pool j .The steady-state behavior of the X fluid model

depends on the number of agents from each poolassigned to (and actually busy serving customersfrom) each customer class, i.e., the deterministic vec-tor (Z11Z12Z21Z22), where Zi j is the number ofpool j agents assigned to serve class i customers,which is regarded as a continuous variable. To belegitimate assignments, we must have Zi j ≥ 0 for alli and j with Z11 + Z21 ≤ m1 and Z12+Z22 ≤m2.Because these agents are actually busy serving cus-tomers, we must also have �1 ≥ Z1111 + Z1212and �2 ≥ Z2121 + Z2222. Once we assign valuesto these variables Zi j , we reduce the X model totwo single-class single-pool models. The arrival ratefor class i is �i, whereas the service rate for classi is Zi1i1 + Zi2i2. Class i is then overloaded ifand only if �i > Zi1i1 + Zi2i2, in which case thesteady-state fluid content in the class i is

Qi =�i−Zi1i1−Zi2i2

�i� (6)

If class i is not overloaded, then Qi = 0. The sparecapacity in pool j in steady state is Sj = mj − Z1 j −Z2 j ≥ 0, j = 12.In this X fluid model setting, for known arrival

rates, our initial goal is to determine the mini-mum cost C∗��1�2�, which is the minimum ofC�Q1�Z11Z12�Q2�Z21Z22�� for specified arrival-rate vector (�1�2), which we denote simply byC�Z11Z12Z21Z22�, over all feasible fixed assign-ment vectors �Z11Z12Z21Z22� in �4 with Qi ≡Qi�Zi1Zi2� defined in (6). We let the asterisk denotethe optimal solution. (We do not consider moregeneral controls.) We will apply the optimal solu-tion to find the optimal state-dependent queue-ratiofunctions.

INFORMS

holds

copyrightto

this

article

and

distrib

uted

this

copy

asa

courtesy

tothe

author(s).

Add

ition

alinform

ation,

includ

ingrig

htsan

dpe

rmission

policies,

isav

ailableat

http://journa

ls.in

form

s.org/.

Page 9: RespondingtoUnexpectedOverloads inLarge …users.iems.northwestern.edu/~perry/Responding.pdfMANAGEMENT SCIENCE Vol.55,No.8,August2009,pp.1353–1367 issn0025-1909 eissn1526-5501 09

Perry and Whitt: Responding to Unexpected Overloads in Large-Scale Service SystemsManagement Science 55(8), pp. 1353–1367, © 2009 INFORMS 1361

Let qi be the queue length of class i and let si bethe spare capacity in pool i when there is no shar-ing. They can be expressed as in (5), with formulasdepending on i. In the fluid model, we regard thesystem as being in normal loading if neither queueis overloaded without sharing, i.e., if q1 = q2 = 0, butthe amount of spare capacity is not too large. Becausethe cost function is increasing and convex, under nor-mal loading we achieve the minimum cost by lettingZ12 = Z21 = 0 (no sharing) to obtain Qi = 0 for i =12. The unexpected overload means that either q1 > 0or q2 > 0, or both. Henceforth we assume that to bethe case.The natural model state is (�1�2), but an equiva-

lent representation is (q1 s1 q2 s2), where we alwayshave the complementarity relation q1s1 = q2s2 = 0.If qi > 0, then �i = mii i + qi�i; if si > 0, then �i =�mi− si�i i. This alternative representation impliesthat, for the X fluid model, we can determine thearrival rates by observing the queue lengths and sparecapacities.Let Z∗

ij be the optimal value of the variable Zi j . Westart by stating some basic propositions, which serveto simplify our X-fluid-model optimization problem.We first reduce the number of variables from four totwo. The following is immediate.

Proposition 1 (No Idle Agents). If we do not haveQ∗1 = Q∗

2 = 0, then there should be no idle agents, i.e.,S∗j = 0 or, equivalently, Z∗

1 j +Z∗2 j =mj for j = 12.

As a consequence of Proposition 1, if q1 > 0, q2 = 0,and s2 > 0, then necessarily Z∗

12 > 0. Moreover, eitherZ∗12 ≥ s2 or Q∗

1 =Q∗2 = 0.

We next show that inefficient sharing implies notwo-way sharing.

Proposition 2 (One-Way Sharing). Because theservice rates satisfy the inefficient-sharing condition1122 ≥ 1221 in (2), it suffices to consider one-waysharing; i.e., Z∗

12Z∗21 = 0.

Proof. Suppose that Z12 > 0 and Z21 > 0, so thatwe have sharing in both directions. It suffices toassume that Q1 > 0 and Q2 > 0. We will show that,for appropriate positive variables x12 and x21, if wereplace (Z12Z21) by (Z12 − x12Z21 − x21), thenboth queue lengths will decrease until one of thevariables Z12 − x12 or Z21 − x21 becomes zero orboth queues become empty. We define x21 as anappropriate constant multiple of x12, so that wehave a single real variable. To do so, let .i ≡ �i −Zi1i1−Zi2i2 > 0 for i= 12. Then let x21 ≡ /x12,where /≡ �.212+.122�/�.211+.121�. Then weconsider what happens as we increase x12, assum-ing that / remains constant. Let 0i ≡ �i�Qi�0� −Qi�x12��, where Qi�x12� denotes Qi with the initial

vector of sharing levels (Z12Z21) replaced by (Z12−x12Z21−/x12). Then

01 = x12.1(1122−1221.211+.121

)and

02 = x12.2(1122−1221.211+.121

)�

(7)

Clearly, 0i ≥ 0 for both i if and only if inequal-ity (2) holds. Moreover, from (6) and (7), we seethat both queues become empty at the same level ofx12. Hence, we can decrease both variables Z12 andZ21 by increasing x12 until one of these variablesbecomes zero or both queue lengths simultaneouslybecome zero. �

As a consequence of Proposition 2, we can re-express the basic optimization problem, first, in termsof two convex real-valued functions of a single realvariable, C12 and C21, and second, in terms of a sin-gle combined convex function of a single real vari-able, Cc. Let 1A be the indicator function of the set A;i.e., 1A�x�= 1 if x ∈A and 1A�x�= 0 otherwise. We putthe short proof required in §EC.3.1.

Proposition 3 (Single-Variable Functions). Be-cause the inefficient-sharing condition (2) holds, the opti-mal cost can be expressed as

C∗��1�2� = C∗�q1 s1 q2 s2�

= min*C12�Z12�C21�Z21�+

= min*Cc�Z12−Z21�+ (8)

over Z12 and Z21 such that 0≤Z12 ≤m2, 0≤Z21 ≤m1,and Z12Z21 = 0, where

C12�Z12�

≡C12�Z12��1�2�

≡C(��1−m111−Z1212�+

�1��2−�m2−Z12�22�+

�2

)

≡C12�Z12�q1s1=0 q2s2�

≡C(�q1−12Z12�+

�1�q2−s222+22Z12�+

�2

) (9)

Cc�Z12−Z21�≡C12�Z12−Z21�1*Z12−Z21≥0++C21�−�Z12−Z21��1*Z12−Z21<0+

=C12�Z12�1*Z12>0++C21�Z21�1*Z21>0++C�q1q2�1*Z12=Z21=0+ (10)

with qi and si defined in (5), satisfying q1s2 = q2s2 = 0,and C21�Z21� defined analogously to C12�Z12� in (9).The functions C12 and C21 are continuous strictly con-vex functions of the single real variables Z12 and Z21over their domain. If, in addition, the stronger inefficient-sharing condition 11 >12 and 22 >21 in (1) holds,

INFORMS

holds

copyrightto

this

article

and

distrib

uted

this

copy

asa

courtesy

tothe

author(s).

Add

ition

alinform

ation,

includ

ingrig

htsan

dpe

rmission

policies,

isav

ailableat

http://journa

ls.in

form

s.org/.

Page 10: RespondingtoUnexpectedOverloads inLarge …users.iems.northwestern.edu/~perry/Responding.pdfMANAGEMENT SCIENCE Vol.55,No.8,August2009,pp.1353–1367 issn0025-1909 eissn1526-5501 09

Perry and Whitt: Responding to Unexpected Overloads in Large-Scale Service Systems1362 Management Science 55(8), pp. 1353–1367, © 2009 INFORMS

then Cc is also a continuous strictly convex function of thesingle real variable Z12−Z21 over the domain specified inthis proposition.

Corollary 1 (Three Intervals). If the stronger in-efficient-sharing condition (1) holds, then for each pair ofarrival rates (�1�2) or initial state (q1 s1 q2 s2) (with-out sharing), there are two thresholds 212 ≥ 221 such thatexactly one of the following occurs:

�i� Z∗12>0 and Z∗

21=0 for Z12−Z21>212�ii� Z∗

21>0 and Z∗12=0 for Z12−Z21<221

�iii� Z∗12=Z∗

21=0 for 221≤Z12−Z21≤212�(11)

The value of Corollary 1 will be clear when weturn our attention to the queue ratio r below. Wecan apply Proposition 2 to get further simplification ifthere is initially spare capacity. Then, from the begin-ning, we know that we can only have sharing withhelp provided by the pool with spare capacity; i.e.,if q1 > 0> s2, then Z∗

12 > 0 and Z∗21 = 0, so that it suf-

fices to minimize C12�Z12�.It is natural to have the cost function C be smooth,

in which case the optimal solution can be found bysimple calculus. Proposition EC.1 concludes that, ifthe optimal solution found by calculus falls outsidethe feasible set, then the actual optimum value isobtained at the nearest boundary point.It is easy to see that there is a one-to-one corre-

spondence between the queue ratio r ≡Q1/Q2 and thereal variable Z12−Z21 used to specify the optimiza-tion problem in Proposition 3. That implies that thereis a one-to-one correspondence between the fixed-agent-allocation optimization problem (choosing Z12and Z21) and the (fixed) queue-ratio control problem(choosing state-dependent queue-ratio functions r12and r21) in the fluid-model context. We establish itformally in §EC.3.3.Finally, we provide a basis for an efficient algo-

rithm to determine the equivalent optimal controls. Todo so, we effectively reduce the dimension from twoto one by observing that special weighted sums ofthe queue lengths (and corresponding weighted sumsof the arrival rates) are independent of the agent-assignment variables Z12 and Z21. We only state theresult for Z12; the corresponding result for Z21 isstated in §EC.3.4. The proof is verification by directcomputation, so we omit it. For understanding, it maybe helpful to refer to Figure 4 in §5.3.

Proposition 4 (Constant Weighted QueueLengths). Let

a12 ≡22�112�2

and a12 ≡1222

� (12)

Consider any initial state (�1�2), or equivalently(q1 s1 q2 s2), with s1 = 0. Then

w12 ≡ a12(�1−m111

�1

)+(�2−m222

�2

)

= a12q1+ q2−s222�2

= a12Q1�Z12�+Q2�Z12�−S2�Z12�22

�2(13)

for all Z12 with 0≤Z12 ≤m2.

Proposition 4 implies that the locus of all non-negative queue-length vectors �Q1Q2� ≡ �Q1�Z12�Q2�Z12�� associated with initial state (�1�2), orequivalently (q1 s1 q2 s2), with s1 = 0, is on the line*�Q1Q2�3 a12Q1 + Q2 = w12+ in the nonnegativequadrant. Thus, for any nonnegative constant w12,the optimal queue-length vector (Q∗

1Q∗2) and the opti-

mal queue-ratio r∗12 ≡ Q∗1/Q

∗2 restricted to one-way

sharing (Z21 = 0) are the same for all initial states(q1 s1 q2 s2) with s1 = 0 satisfying (13) provided thatq1 ≥ Q∗

1. In that case, a12Q∗1 + Q∗

2 = w12. That sameoptimal queue-length vector and optimal queue ratioholds for all arrival pairs (�1�2) where s1 = 0, Z21 = 0and

�1+ a12�2 = �w12

≡ �1�2w12+ a12�2m111+ �1m222a12�2

� (14)

And similarly for sharing in the other direction; see§EC.3.4.

5.3. Computing the OptimalQueue-Ratio Functions

We now demonstrate how to numerically find theoptimal state-dependent queue ratios r∗12 and r∗21as functions of the fluid state (Q1 S1Q2 S2). Withthe thresholds, this gives us a state-dependent queue-ratio control with thresholds (QR-T). To illustrate, weconsider a (nonseparable) quadratic cost function ofthe form

C�Q1Q2�= 3Q21+ 2Q2

2+Q1Q2+ 10Q1+ 5Q2� (15)

For any vector of arrival rates (�1�2) we can assignone, and only one, point in the (Q1Q2) plane, whichrepresents the queue lengths associated with thesearrival rates, when there is no sharing. To representspare capacity, we allow negative values; i.e., −Qiis shorthand for −Sii i/�i. (We actually plot (Q1 −S111/�1Q2 − S222/�2) even though the axes aresimply labelled Qi.)We apply Proposition 4 to find the optimal queue

ratios. We first consider when pool 2 helps class 1.

INFORMS

holds

copyrightto

this

article

and

distrib

uted

this

copy

asa

courtesy

tothe

author(s).

Add

ition

alinform

ation,

includ

ingrig

htsan

dpe

rmission

policies,

isav

ailableat

http://journa

ls.in

form

s.org/.

Page 11: RespondingtoUnexpectedOverloads inLarge …users.iems.northwestern.edu/~perry/Responding.pdfMANAGEMENT SCIENCE Vol.55,No.8,August2009,pp.1353–1367 issn0025-1909 eissn1526-5501 09

Perry and Whitt: Responding to Unexpected Overloads in Large-Scale Service SystemsManagement Science 55(8), pp. 1353–1367, © 2009 INFORMS 1363

Figure 4 Curves of Optimal Queue Ratios for an X Model withParameters �i� i = 1, �1�2 = �2�1 = 08, �i = 03, mi = 100,and Quadratic Cost Function C�Q1�Q2� from (15)

–50 0 50 100 150 200–50

0

50

100

150

200

250

300

Q1

Q2 Ratio = 1/r1, 2

Ratio = 1/r2,1Slope = –a1, 2

Slope = –a2,1

Notes. Also shown are two possible initial states, depicted by stars, and thelines of constant weighted queue lengths that relate the initial states to (recip-rocals of) the optimal queue ratios, depicted by circles. Negative queuesstand for spare capacity.

To treat that case, we let �2 =m222, so that class 2has no queue before pool 2 helps class 1. We thenassume that �1 >m111 so that class 1 is overloaded.We then choose a large set of positive weighted arrivalsums * �w1

12 � � � �wn12+ and find the optimal queue ratiofor each. In the first step, we let �1 ≡ �w12 − a12�2,using (14). We then write (15) as a function of Z12,take its derivative and find the optimal Z∗

12. Plug-ging Z∗

12 in the queue equations gives us the optimalqueue lengths (for the specific arrival rates), and theoptimal queue ratio r∗12. We repeat this for every �wi12to get the curve 1/r∗12 depicted in Figure 4. To findthe curve 1/r∗21 we go through essentially the sameprocedure for Z∗

21.Figure 4 simultaneously depicts the three optimal

sharing regions in the two-dimensional state spaceand the two curves of optimal queue ratios. It wasgenerated using Matlab on a system with the follow-ing parameters: m1 =m2 = 100, 11 = 22 = 1, 12 =21 = 0�8, and �1 = �2 = 0�3. In addition, Figure 4shows how to find the optimal queue ratio for twopossible initial queue lengths denoted by stars. Whenthe initial queue-length vector is �Q1Q2� = �1500�(equivalently, �1 = 145 and �2 = 100), then the optimalqueue-length vector is �Q∗

1Q∗2�= �76�591�8� and the

optimal queue ratio is r∗12 ≡Q∗1/Q

∗2 = 0�83. This opti-

mal queue ratio is the intersection of the curve 1/r12with the line with slope −a12 that passes through(1500) (the circle on the 1/r12 curve). When the ini-tial queue-length vector is �Q1Q2�= �0150� (equiv-alently, �1 = 100 and �2 = 145), we get r∗21 = 0�41 and�Q∗

1Q∗2� = �46�6112�8�. The optimal queue ratio is

also the intersection of the curve 1/r21 with the line

with slope −a21 that passes through (0150) (the cir-cle on the 1/r21 curve).Both the 1/r∗12 and 1/r∗21 curves seem to be lin-

ear, although that is actually not quite the case; ther∗i j ’s are not constants for this cost function. For exam-ple, we already noted that, for ��1�2� = �145100�the optimal queue ratio is r∗12 = 0�83. If we change �1to 110 then the optimal ratio becomes 0�80. For theother sharing direction, if ��1�2� = �100145�, thenr∗21 = 0�41, but if we change �2 to 110, then the opti-mal ratio changes to 0�38.The fact that the two optimal-ratio curves are nearly

linear in Figure 4 suggests that we can approxi-mate the optimal queue-ratio function by fixed queueratios, depending only on the direction of sharing; i.e.,we can use FQR-T with only two values: one for r12and the other for r21. In our example we may chooseto use r12 = 0�8 and r21 = 0�4. The cost for using anearly optimal ratio is very small in the fluid approx-imation, and even smaller in the stochastic system.To understand when the optimal queue-ratio func-

tions are nearly linear, as in the example above,and what the structure should be more generally,we investigate structured cost functions in §EC.4. Weobtain explicit analytical expressions in special cases.We focus on separable cost functions: C�Q1Q2� =C1�Q1� + C2�Q2�, where each component cost func-tion Ci is strictly convex, strictly increasing andtwice differentiable. For example, we find that QR-Treduces to FQR-T exactly when Ci�Qi� = ciQnii withn1 = n2; the case n1 = n2 = 2 is close to (15).

Proposition 5 (Explicit Solution).WhenC�Q1Q2�= c1Q2

1 + c2Q22, FQR-T is optimal for the X fluid model

with

r∗12 ≡a12c2c1

= c222�1c112�2

r∗21 ≡a21c2c1

= c221�1c111�2

Z∗12=

�c112�1��q1−�s111/�1��−�c222�2��q2−�s222/�2��c1

212/�1+c2222/�2

Z∗21=

c221�2�q2−�s222�/�2�−c111�1�q1−�s111�/�1�c1

211+c2221

(16)

In Proposition 5, the cost is specified by a singleparameter: the ratio c1/c2 specifies the relative impor-tance of the two queues. (The remaining parameteris equivalent to choosing the monetary units.) Finally,we caution that other cases (e.g., linear costs) can bequite different; see §EC.4.

5.4. Application to the Stochastic ModelWe can directly apply the QR-T control derived aboveto the stochastic model. Figure 4 identifies threesharing regions to apply to the stochastic process(Q1�t� S1�t�Q2�t� S2�t�) once sharing has been acti-vated. There are two regions for each direction of

INFORMS

holds

copyrightto

this

article

and

distrib

uted

this

copy

asa

courtesy

tothe

author(s).

Add

ition

alinform

ation,

includ

ingrig

htsan

dpe

rmission

policies,

isav

ailableat

http://journa

ls.in

form

s.org/.

Page 12: RespondingtoUnexpectedOverloads inLarge …users.iems.northwestern.edu/~perry/Responding.pdfMANAGEMENT SCIENCE Vol.55,No.8,August2009,pp.1353–1367 issn0025-1909 eissn1526-5501 09

Perry and Whitt: Responding to Unexpected Overloads in Large-Scale Service Systems1364 Management Science 55(8), pp. 1353–1367, © 2009 INFORMS

sharing; e.g., if sharing has been activated with pool 2helping class 1, available pool 2 agents serve class 1customers when the queue-length vector falls in thelower right region, whereas there is no sharing inthe other two regions. The way to share is describedin §4.2.

6. Choosing the ThresholdsWe now consider how to choose the thresholds &12and &21. These thresholds have two importantroles: first, they automatically detect when the sys-tem becomes overloaded and, second, they preventunwanted sharing in normal loading. If the thresh-olds are too large, then the queues may not reachthem during the overload. (Abandonments necessar-ily keep the queues from increasing without bound,even under overloads.) On the other hand, as dis-cussed in §4.1, if the thresholds are too small, thensharing may be activated too often, so that we mayget inefficient sharing.Unfortunately, the fluid analysis cannot reveal the

“right” size of the thresholds, because the fluidqueues are empty under normal loading. We needto understand the extent of the stochastic fluctua-tions, something that is not captured by the fluidapproximation. At this point, it is convenient to applymany-server heavy-traffic limits to gain additionalinsight. To understand the general idea, it suffices torefer to established limits for the basic M/M/n+Mmodel, as in Garnett et al. (2002) and Whitt (2004).There, both fluid models and refined diffusion processmodels are obtained as limits as the scale increases,where scale is measured by the number n of servers.What is unusual here, though, is that we are simulta-neously interested in the QED regime and the ED oroverloaded regime.The QED regime is appropriate to describe nor-

mal loading, which is what prevails before the over-load occurs, whereas the ED regime is appropriateto describe the overloaded system. In both cases, thearrival rate is allowed to grow as n → , whereasthe service rate and abandonment rate are held fixed.The important insight is that the queue lengths tendto be of order O�

√n� in the QED regime, as depicted

by the diffusion limit in the QED regime, whereasthe queue lengths tend to be of order O�n� in the EDregime, as depicted by the fluid limit in the ED regime.Thus, to prevent unwanted sharing when the sys-

tem is normally loaded, we should choose the thresh-olds to be of size bigger than O�

√n�. That ensures

that the weighted-queue-difference processes, D12and D21, will not move above the thresholds byrandom fluctuations. On the other hand, we shouldchoose the thresholds to be o�n�, so that the thresh-olds will be asymptotically negligible compared to the

O�n� fluid content. Then, asymptotically, they will beexceeded instantaneously when the overload occursand they will not significantly alter the queue ratios.From this simple reasoning, we see that it sufficesto have &�n�i j = O�np� as n→ for 1/2 < p < 1. (Inci-dentally, that scaling also makes the thresholds outof reach in normal loading in the law-of-the-iteratedlogarithm scaling of �n log logn�1/2.)This asymptotic analysis shows that the thresh-

olds chosen in this way are asymptotically optimal,both during normal loading and during overload inci-dents. Asymptotically, the thresholds will be exceedednegligibly often during normal loading; asymptoti-cally, the thresholds do not alter the optimal aver-age cost in the overload incidents. For the case ofnormal loading, we can apply the QED results inGarnett et al. (2002); for the overload incidents, theED results in Whitt (2004) provide only heuristic sup-port, because they apply only to the M/M/n + Mmodel. We intend to prove the ED fluid model for theX model with FQR-T in a subsequent paper.Of course, we actually have a system with one

fixed n. When we want to apply the theory to a realsystem, with a finite number of agents, it becomeshard to distinguish between O�n� and O�

√n�. For

example, if n= 100, then both 10= 0�1n and 10=√n.

Thus, as in any application of asymptotic results, weshould numerically verify that the values chosen areappropriate, and refine them if necessary, for whichwe can use simulation. For example, for a system with100 agents in each pool (and abandonment rates lessthan service rates), we found that &i j = 10 is effective.We found that the performance is not too sensitive tothe choice of the thresholds, provided that they areneither too small nor too large. We present simulationresults for FQR-T under normal loading in §EC.5.2,including a sensitivity analysis for the thresholds.

7. Simulation ExperimentsOur analysis has been based on a fluid approxima-tion of a stochastic system. It remains to show thatthe fluid approximation is suitably accurate for thestochastic system and that the optimal control for thefluid model works well in the stochastic system. Forthose purposes, we conduct simulation experiments.

7.1. Accuracy of Fluid ApproximationIn this subsection, we investigate the accuracy of theapproximation. To show how the accuracy increasesas the system becomes larger, we simulated threecases, each case represents an element in a sequenceof queueing systems indexed by n, scaled to satisfya many-server heavy-traffic limit in the ED regime asn→. We use the same fixed service and abandon-ment rates as before (i i = 1, 12 = 21 = 0�8 and�i = 0�3). We consider a fixed queue ratio r = 1. We

INFORMS

holds

copyrightto

this

article

and

distrib

uted

this

copy

asa

courtesy

tothe

author(s).

Add

ition

alinform

ation,

includ

ingrig

htsan

dpe

rmission

policies,

isav

ailableat

http://journa

ls.in

form

s.org/.

Page 13: RespondingtoUnexpectedOverloads inLarge …users.iems.northwestern.edu/~perry/Responding.pdfMANAGEMENT SCIENCE Vol.55,No.8,August2009,pp.1353–1367 issn0025-1909 eissn1526-5501 09

Perry and Whitt: Responding to Unexpected Overloads in Large-Scale Service SystemsManagement Science 55(8), pp. 1353–1367, © 2009 INFORMS 1365

Table 1 Comparison of Steady-State Performance Measures in FluidApproximation with Corresponding Simulation Results forthe Markovian X Model

n= 25 n= 100 n= 400

Perf. meas. Approx. Sim. Approx. Sim. Approx. Sim.

Q1 139 135 556 528 2222 2167±04 ±12 ±70

Q1/n 056 054 056 053 056 054±002 ±001 ±002

Q2 139 157 556 584 2222 2231±05 ±12 ±70

Q2/n 056 063 056 058 056 056±002 ±001 ±002

Ratio 10 098 10 090 10 096±002 000 000

Z1�2 42 48 167 177 667 664±02 ±03 ±22

Z1�2/n 017 019 017 018 017 017±001 ±000 ±001

Cost 137 179 1935 2028 2996 2998(thousands) ±001 ±081 ±192

Notes. For each n, there are n agents in each pool, with �1 = 13n, �2 = n,�i� i = 1, �1�2 = �2�1 = 08, and �i = 03. The thresholds �1�2 = �2�1 are3�10�30 for n= 25�100�400, respectively.

let the arrival rates be �1 = 1�3n and �2 = n, whenthere are n agents in each service pool. The threecases we consider are n = 25100400. We let thethresholds for these three values of n be &12 = &21 =31030, respectively. (The thresholds were droppedwhen exceeded.)Table 1 shows the results. Each result is the average

of five independent simulation runs having 300,000arrivals in each run. The half-width of 95% confidenceintervals, calculated using the t random variable withfour degrees of freedom, are also given.To show both the actual performance and the con-

vergence to the fluid limit as n increases, we dis-play both the direct values and the scaled values,dividing by n. Because the scaled values tend to benearly independent of n, we witness the heavy-trafficfluid limit. We see that the approximations get betteras n increases, but they are already not too bad whenn= 25.

7.2. Comparing the Two ControlsIn the fluid analysis, choosing the number of agents ineach pool that are helping customers from the otherclass is equivalent to choosing the queue ratio, ri j .However, that is not the case in the actual stochasticsystem. With specified numbers of agents serving cus-tomers from the other class, the queue ratio fluctuatesrandomly. With specified queue ratios, the numbersof agents helping the other class fluctuates randomly.Moreover, with specified numbers of agents servingcustomers from the other class, the two queue-lengthprocesses evolve independently. In sharp contrast,

Figure 5 Comparison of Cost of Using FQR-T with Different Ratiosand Cost of Fixing the Equivalent Z1�2s

18,000

18,500

19,000

19,500

20,000

20,500

21,000

21,500

22,000

22,500

23,000

Cos

t

r costZ cost

r cost 20,508.6

Z cost

1 2 3 4 5

21,466.3

20,281.5

21,345.4

19,729.7

20,856.1

21,160.4

21,417.0

22,306.2

22,632.3

Notes. In order, the five ratios considered are r = 12�1�083�06�04,where r = 083 is the optimal queue ratio according to the fluid model. Basedon the fluid model, the equivalent Z1�2s are, respectively, 15�17�19�22�25,where each Z1�2 has been rounded up to the smallest integer greater than itsactual value. The average cost of five independent simulation runs is writtenbeneath each point. The cost function is (15), and the system parameters aremi = 100, �1 = 130, �2 = 100, �i� i = 1, �1�2 = �2�1 = 08, and �i = 03.

with specified queue ratios, the queue-length pro-cesses are strongly dependent, as in Figure EC.8. Thissuggests that there is a big difference between the twocontrols in a real, stochastic system. We thus expectthe average cost under FQR-T to be different than theaverage cost when fixing Zi j . We conducted simula-tion experiments to compare the two controls.To compare the two controls—FQR-T and fixed

Zi j—we simulated a system with mi = 100 agents ineach pool, arrival rates �1 = 130 and �2 = 100, ser-vice rates 11 = 22 = 1 and 12 = 21 = 0�8, andabandonment rates �1 = �2 = 0�3. Because class 1 isoverloaded, we took &21 = &12 = 10, but once we goover the threshold &12, we drop it, so that it becomes&12 = 0.Figure 5 presents simulation results comparing the

two average costs for five different cases: (1) r12 =1�2Z12 = 15; (2) r12 = 1Z12 = 17; (3) r12 = 0�83,Z12 = 19 (optimal point); (4) r12 = 0�6Z12 = 22; and(5) r12 = 0�4Z12 = 25. For each point, we fixed thequeue-ratio r12, and used FQR-T with this ratio. Foreach such r12, there is an equivalent Z12 in the fluidequations. Because this Z12 is not necessarily an inte-ger, we rounded it up to the smallest integer largerthan Z12, i.e., we used �Z12� in each simulation ofthe fixed-Zi j control. According to the fluid approxi-mation, the optimal queue ratio is r12 = 0�83, and the

INFORMS

holds

copyrightto

this

article

and

distrib

uted

this

copy

asa

courtesy

tothe

author(s).

Add

ition

alinform

ation,

includ

ingrig

htsan

dpe

rmission

policies,

isav

ailableat

http://journa

ls.in

form

s.org/.

Page 14: RespondingtoUnexpectedOverloads inLarge …users.iems.northwestern.edu/~perry/Responding.pdfMANAGEMENT SCIENCE Vol.55,No.8,August2009,pp.1353–1367 issn0025-1909 eissn1526-5501 09

Perry and Whitt: Responding to Unexpected Overloads in Large-Scale Service Systems1366 Management Science 55(8), pp. 1353–1367, © 2009 INFORMS

respective optimal Z12 is equal to 18�4, rounded up to19 in the simulation experiments.For each case, we conducted five independent sim-

ulation runs using FQR-T, and five independent sim-ulation runs with a fixed Z12, each run with 300,000arrivals. The independent replications make it pos-sible to reliably estimate confidence intervals usingthe t statistic with four degrees of freedom. Thelarge number of arrivals ensures that the transientbehavior in the beginning of the simulation, beforereaching steady state, does not affect the final sim-ulation estimates of the steady-state averages. Addi-tional simulation results are given in Table EC.1 in§EC.5, including the half-width of 95% confidenceintervals and a comparison of the simulation to thefluid approximation.There are several interesting observations to be

made: First, the r-cost curve lies significantly belowthe Z-cost curve, which shows that FQR-T is a supe-rior control. At the optimal point for FQR-T, the aver-age cost under FQR-T is about 5�4% smaller than theaverage cost under the fixed-Z12 control.Secondly, FQR-T tends to be a more robust con-

trol. Small changes in r do not produce large changesin the cost. Note that the largest r value here (1�2)is three times as large as the smallest r value (0�4),whereas the largest Z value here (25) is only 1�6 timesas large as the smallest Z value (15). Moreover, theaverage costs when using FQR-T with r12 = 1�2 andr12 = 1 are still smaller than the cost of fixing Z12 atits optimal value. For further discussion, see §EC.5.

8. ConclusionsIn this paper, we studied ways to respond to unex-pected overloads in large-scale service systems. Weconsidered the Markovian X model with two cus-tomer classes and two service pools, assuming thatagents are more effective serving customers from theirown class than customers from the other class, asspecified by the inefficient-sharing conditions in (1)and (2). Thus, we want negligible sharing under nor-mal loads, but we want to activate sharing when thereis an unexpected overload at an unanticipated time,without knowing what the new arrival rates will be.The main ideas for analyzing the performance and

determining appropriate queue-ratio functions for theQR-T and FQR-T controls we propose are (i) to usesteady-state analysis and (ii) to apply an approximat-ing deterministic fluid model. The QR-T and FQR-Tcontrols proposed for the actual stochastic system aredirect applications of the corresponding optimal con-trols derived for the fluid model in §5.2. We devel-oped an algorithm to find the optimal queue-ratiocurves for a general convex cost function in Propo-sition 4 and §5.3. The resulting QR-T control is eas-ily understood as a partition of the state space into

three sharing regions, as depicted in Figure 4, withtwo regions for each direction of sharing.In Proposition 5 we also provided strong justifica-

tion for FQR-T when the cost function has the formC�Q1Q2� = c1Q2

1 +Q22 for some constants c1 and c2.

In that case, we proved that FQR-T is optimal for thefluid model (i.e., the optimal QR-T control reduces toan instance of FQR-T) and exhibited the explicit opti-mal queue-ratio parameters. Then the optimal queue-ratio parameters depend on the cost function C onlyvia the single parameter c1/c2, which succinctly cap-tures the relative importance of the two queues. Forother sharing regions, see §EC.4.The main ideas for gaining insight into appropriate

threshold values were to apply (i) many-server heavy-traffic asymptotics and (ii) simulation. Heuristically,we showed that the thresholds should be asymptot-ically optimal simultaneously for periods of normalloading and for periods of overload. Asymptotically,no trade-off need be made. The requirement is thatthe thresholds should be of order O�np� as n→ ,where 1/2 < p < 1 and n is the system scale factor.We used simulation to verify that the thresholds workwell for given finite n.Our FQR-T (or QR-T) control is appealing for sev-

eral reasons. First, it is automatic and simple; weneed not directly discover the arrival rates to find outwhen overloads occur, and then decide what amountof sharing should be done. Instead, FQR-T automati-cally detects the time the system becomes overloaded,and then automatically enforces the optimal ratio, byobserving only the size of the two queues. It is eas-ier to use the information about the queues, which isreadily available, than to use information about thearrival rates, which is not readily available. Moreover,simulation experiments indicate that FQR-T performsbetter (produces lower expected costs) than fixing Zi jat their optimal values, even with known arrival rates;see Figure 5.It remains to mathematically prove that the fluid

model with QR-T or FQR-T arises as the many-serverheavy-traffic limit of scaled overloaded queueing sys-tems as the scale increases. It also remains to showthat FQR-T is asymptotically optimal among all pos-sible controls in such a limit. It remains to developsimilar controls for more complex multiclass multi-pool systems with nonexponential service-time andabandonment-time distributions.

9. Electronic CompanionAn electronic companion to this paper is available aspart of the online version that can be found at http://mansci.journal.informs.org/.

AcknowledgmentsThis research was supported by National Science Founda-tion Grant DMI-0457095.

INFORMS

holds

copyrightto

this

article

and

distrib

uted

this

copy

asa

courtesy

tothe

author(s).

Add

ition

alinform

ation,

includ

ingrig

htsan

dpe

rmission

policies,

isav

ailableat

http://journa

ls.in

form

s.org/.

Page 15: RespondingtoUnexpectedOverloads inLarge …users.iems.northwestern.edu/~perry/Responding.pdfMANAGEMENT SCIENCE Vol.55,No.8,August2009,pp.1353–1367 issn0025-1909 eissn1526-5501 09

Perry and Whitt: Responding to Unexpected Overloads in Large-Scale Service SystemsManagement Science 55(8), pp. 1353–1367, © 2009 INFORMS 1367

ReferencesAksin, Z., M. Armony, V. Mehrotra. 2007. The modern call center:

A multi-disciplinary perspective on operations managementresearch. Production Oper. Management 16(6) 665–688.

Armony, M., I. Gurvich. 2006. When promotions meet opera-tions: Cross-selling and its effect on call-center performance.Working paper, Stern School of Business, New York University,New York.

Bassamboo, A., A. Zeevi. 2009. On a data-driven method forstaffing large call centers. Oper. Res. 57(3) 714–726.

Bell, S. L., R. J. Williams. 2005. Dynamic scheduling of a parallelserver system in heavy traffic with complete resource pooling:Asymptotic optimality of a thrshold policy. Electronic J. Prob.10 1044–1115.

Bhandari, A., A. Scheller-Wolf, M. Harchol-Balter. 2008. An exactand efficient algorithm for the constrained dynamic operatorstaffing problem for call centers.Management Sci. 54(2) 339–353.

Gans, N., G. Koole, A. Mandelbaum. 2003. Telephone call centers:Tutorial, review and research prospects. Manufacturing ServiceOper. Management 5(2) 79–141.

Garnett, O., A. Mandelbaum, M. I. Reiman. 2002. Designing a callcenter with impatient customers. Manufacturing Service Oper.Management 4(3) 208–227.

Gurvich, I., W. Whitt. 2009a. Scheduling flexible servers with con-vex delay costs in many-server service systems. ManufacturingService Oper. Management. 11(2) 237–253.

Gurvich, I., W. Whitt. 2009b. Service-level differentiation in many-server service systems via queue-ratio routing. Oper. Res.Forthcoming.

Tezcan, T., J. G. Dai. 2009. Dynamic control of N -systems withmany servers: Asymptotic optimality of a static priority policyin heavy traffic. Oper. Res. Forthcoming.

Whitt, W. 2004. Efficiency-driven heavy-traffic approximations formany-server queues with abandonments. Management Sci.50(10) 1449–1461.

Whitt, W. 2005. Engineering solution of a basic call-center model.Management Sci. 51(2) 221–235.

Whitt, W. 2006. Staffing a call center with uncertain arrival rate andabsenteeism. Production Oper. Management 15(1) 88–102.

INFORMS

holds

copyrightto

this

article

and

distrib

uted

this

copy

asa

courtesy

tothe

author(s).

Add

ition

alinform

ation,

includ

ingrig

htsan

dpe

rmission

policies,

isav

ailableat

http://journa

ls.in

form

s.org/.