queueing models of case managers - university of alberta

Queueing Models of Case Managers

Fernanda Campello, Armann IngolfssonSchool of Business, University of Alberta, Edmonton, T6G 2R6 [email protected]

Robert A. ShumskyTuck School of Business at Dartmouth, Hanover, NH 03755 [email protected]

Many service systems use case managers, servers who are assigned multiple customers and have frequent,

repeated interactions with each customer until the customer’s service is completed. Examples may be found

in healthcare (emergency department physicians), contact centers (agents handling multiple on-line chats

simultaneously) and social welfare agencies (social workers with multiple clients). We propose a stochastic

model of a baseline case-manager system, formulate models that provide performance bounds and stability

conditions for the baseline system, and develop two approximations, one of which is based on a two-time-scale

approach. Numerical experiments and analysis of the approximations show that increasing case throughput

by increasing the probability of case completion can lead to much greater waiting-time reductions than

increasing service speed. Many systems place an upper limit on the number of customers simultaneously

handled by each case manager. We examine the impact of these case-load limits on waiting time and describe

effective, heuristic methods for setting these limits.

1. Introduction

Many service systems employ case managers: customer service agents in a contact center who

manage multiple on-line chats at once; parole officers and social workers who meet with clients

in crisis; and emergency department (ED) physicians who treat multiple patients simultaneously.

Case-manager systems are popular because they can provide highly customized service and can

avoid errors and delays due to handoffs.

We define a case manager as a server who is assigned multiple customers and repeatedly interacts

with those customers. Interactions between an individual customer and the case manager are

usually interspersed by external delays that do not require the manager’s attention, e.g., the delay

while an on-line chat customer composes a message, the time a parole officer’s client stays out of

trouble, and the wait for a test result to be returned to the ED physician. Many of these systems

place an upper limit on the number of customers assigned to each case manager at one time, and

this leads to the formation of a pre-assignment queue for customers who have not yet been assigned

to a case manager.

1

:2

Despite the use of case managers in a wide variety of service systems, when compared to the

analysis of standard multi-server systems there has been relatively little academic research on

case-manager systems (we review the important existing literature in Section 3). In practice, the

analysis and management of case-manager systems is often rudimentary. For example, one method

for setting caseloads proposed in both the academic literature and industry handbooks is a simple

deterministic calculation: divide the number of hours a case manager is available per month by

the average time required per case per month. Variations on this deterministic approach have been

suggested by Yamatani et al. (2009) for social workers, Ostrom and Kauder (1996) for judges in

the Colorado State Courts, and the American Prosecutors Research Institute (APRI 2002) for

prosecuting attorneys. The web site of the Child Welfare League of America (CWLA 2013) states

that “Although the field could benefit from a standardized caseload/workload model, currently

there is no tested and universally accepted formula ... Yet, the CWLA standards most requested

are those that provide recommended caseload and/or workload sizes.” Our models are intended

to fill this need. In particular, existing standards and models do not capture the variable and

unpredictable nature of the work (Yamatani et al. 2009). Our models incorporate this randomness

and can be used to assess the impact of caseload limits on throughput and pre-assignment delay.

In this paper we make the following contributions. (1) We develop new bounds and approxima-

tions for case-manager systems. One approximation is a novel application of two-time-scale (T )

analysis, and a modification of this approximation leads to a tractable balanced (B) approximation

that is a birth-death process. (2) We demonstrate how to apply case-manager models to a variety

of environments: ED, chat, and social work systems. (3) We demonstrate that the approximations

provide accurate estimates of system performance measures over a wide range of parameters, and

that the simple B approximation provides particularly robust recommendations for setting caseload

limits. (4) We use the approximations to derive insights on the behavior and management of case-

manager systems. In particular, we see that two methods for improving throughput (increasing

resolution probability and increasing service rates) can have dramatically different impacts on sys-

tem performance. We also see that increasing caseload limits may or may not decrease waiting

time, even in the absence of server slowdown as caseloads rise, because changes in the caseload

limit shift the balance between two phenomena: the benefits of server pooling vs. higher server

utilization at higher caseloads.

2. Definitions and Models

In our system the service provided to a given customer, which we refer to as a case, is composed

of a random number of processing steps, all of which are handled by the same case manager

(server). When a processing step is finished either the case is completed and leaves the system or

:3

the case waits for the completion of an external delay that does not require the case manager’s

attention before the next processing step can begin. In an ED, for example, the processing steps are

encounters with the patient’s assigned physician, the external delays are diagnostic tests or requests

for other information, and a particular case is completed when the patient is either discharged or

admitted to the hospital.

Figure 1 shows our baseline model. Customers arrive according to a Poisson process with rate λ

to a pre-assignment queue where they wait to be assigned to one of N case managers who each have

a maximum caseload M. When a case manager completes a case, then another case, if available, is

assigned from the pre-assignment queue to that case manager. If the case manager is busy, the new

case joins a FCFS internal queue. Otherwise, the new case immediately begins the first processing

step with the case manager. The duration of each processing step is exponentially distributed with

rate µTot ≡ µ+µ′, where µ is the case completion rate and µ′ is the rate at which the case moves

to an external delay. Therefore, the probability that a case is completed after each processing step

is γ = µ/µTot. Otherwise, with probability 1− γ, the case moves to an exponentially distributed

external delay with mean 1/λ′. Note that a case manager handles multiple cases simultaneously: if

one case is in an external delay then the manager works on another case that is not in an external

delay, if one is available.

Our notational convention is to use λ and µ for the rates at which cases arrive and are completed

by busy servers and to use primes (λ′ and µ′) for rates at which cases cycle around before they are

completed. The parameters µTot and γ are uniquely determined by the parameters µ and µ′ and

we will use these two parametrizations interchangeably. The (µTot, γ) parametrization corresponds

more closely to empirical data, whereas the (µ,µ′) parametrization is useful for model formulation

and approximation. We use calligraphic fonts (S,R,P,B, and T ) to label the three systems and

two approximations that we define.

If multiple case managers are below their case limits when a case arrives, then that case is

immediately sent to a manager with the smallest caseload. We refer to this scheme as the join-the-

smallest-caseload (JSC) routing policy. Note that the JSC policy may not be the optimal policy,

although Tezcan and Zhang (2014) find that the JSC policy is asymptotically optimal for a similar

system. We refer to the baseline system as the S system because of this Smallest-caseload policy.

The S system can be represented by a Markov chain. For each case manager u∈ 0, . . . ,N, we

define two sets of state variables: the caseloads ku ∈ 0, . . . ,M and the number of cases currently

waiting for or receiving service, ju ∈ 0, . . . , ku. We also define state variable i as the total number

of cases in the system. The state space (i, j1, . . . , jN , k1, . . . , kN) : i≥ 0, j1 ≤ k1 ≤M, . . . , jN ≤ kN ≤

M has [(M + 2)(M + 1)/2]N

states with i≤NM and (M +1)N states for each value of i, i >NM .

:4

Figure 1 The baseline case-manager system S

The state space grows exponentially with the number of case managers, which makes this Markov

chain representation of organizations with a large number of case managers computationally chal-

lenging, even if there is a limit on the size of the pre-assignment queue (for example, the Children,

Youth and Families Department of Pittsburgh described in Yamatani et al. (2009) has N = 112 case

managers, each with a caseload limit M = 25, producing a system with 10285 states, not counting

states with a positive pre-assignment queue). Even for systems where γ = 1 and M =∞ (the case

managers are parallel exponential servers) and N > 2, the computation of performance measures

under join-shortest-queue (JSQ) routing (equivalent to our JSC) requires various approximations

(Lin and Raghavendra 1996, Nelson and Philips 1989). We solve small S systems numerically and

we use simulation for larger systems. We also formulate four models that are substantially easier

to analyze and generate interesting insights into system performance: two that seem to provide

bounds on the S system (R = random and P = pooled) and two that approximate the S system

(T = two-time-scale and B = balanced).

In theR system (Figure 2), new case arrivals are routed randomly to one of the N case managers,

so that new cases arrive to each case manager according to a Poisson process with rate λ/N .

The random routing eliminates the caseload balancing benefits of the JSC policy. If the manager’s

caseload equals M , then a new arrival to that case manager waits in a pre-assignment queue

associated with that particular manager. The term “pre-assignment queue” is used here to match

the analogous queue in the S system.

In the P system (Figure 3), cases are not assigned to a particular server; they may use any server

for each processing step, which reduces forced server idleness. If the total number of customers in

:5

Figure 2 The R system.

Figure 3 The P system.

service, in the internal queue, and in external delay is greater than NM , then an arriving customer

waits in a pre-assignment queue. Otherwise, if all servers are busy the customer waits in a first-

come-first-served internal queue that is common to all N case managers. As we will see in Section

3, the P system has frequently been used to describe hospital ward operations.

In the T approximation, we assume that a case manager with caseload ku functions as an

exponential server with service rate φ(ku) equal to the steady-state service completion rate in

a related single-server finite-source (M/M/1//ku) queueing model, which is justified if the “fast

rates” λ′ and µ′ are much larger than the “slow rates” λ and µ. This approximation has a smaller

state space than the original S system but is still a challenge to solve. The B approximation (Figure

4) builds on the T approximation, by adding the assumption that arrivals are routed and cases are

transferred between case managers so that managers always have caseloads that are within one case

:6

Figure 4 The B approximation.

of each other. This Balancing assumption results in a simple birth-death process approximation,

as we discuss in Section 6.

Throughout this paper we will refer to the performance measures shown in Table 1.

Table 1 Performance measure definitions for models M = P,S,B,R,T .

Expected Number Expected Time

Pre-Assignment: LMa WMa

Internal Queue: LMq WMq

External Delay: LMe TMe = (1/λ′)(1/γ− 1)

Service: SM 1/µ

Total in System: LM TM

3. Literature Review

There is a rich and growing literature on healthcare operations that is closely related to our models.

In particular, several researchers have proposed and analyzed models that are similar to our Psystem. Yom-Tov and Mandelbaum (2014) propose solutions to ED nurse and physician staffing

problems based on the application of time-varying fluid and diffusion approximations to a pooled

system with unlimited caseload. To support capacity planning decisions in an oncology ward, Yom-

Tov (2010) uses a pooled model with a finite caseload, where patients are blocked when the system

reaches the caseload limit. de Vericourt and Jennings (2011) examine the efficiency of nurse-to-

patient ratio policies for nurse staffing using a closed M/M/s//n queueing system (which is similar

to our pooled system, but with a fixed number of customers and no pre-assignment queue) to

model medical units. Yankovic and Green (2011) examine a finite-source queueing model with two

:7

sets of servers: nurses and beds. The variable population size allows them to include the potential

change in the number of patients during a work shift. Huang et al. (2012) investigate routing

policies for initial and subsequent patient encounters with ED physicians, modeled as a pooled set

of servers with no caseload limit and no external delays, and analyzed in the conventional heavy

traffic regime. de Vericourt and Zhou (2005) describe a general model of a call center in which

a customer may revisit the system if the customer’s problem is not resolved on the first call. As

in our P system (and distinct from our S system), all of these models assume that any customer

can be treated by any server. On the other hand, in Apte et al. (1999), case managers receive

independent streams of jobs, as in the R system.

Primary care physicians may also be seen as case managers: they have their own patients (their

‘panel’) who repeatedly visit the physician for examination or treatment. Green and Savin (2008)

model a single physician using a single-server queueing model, where the arrival rate to the physi-

cian is proportional to the panel size. This is a reasonable model because panel sizes are large (in

the thousands) and the probability of arrival for any particular patient on any particular day is

small. Our model, however, is designed for systems where the servers have small caseloads (1-30

customers rather than thousands) and customers may return relatively quickly to the case man-

ager. In addition, we model the process of assigning a customer to one of multiple case managers

when a customer first enters the system, while Green and Savin (2008) focus on a single physician.

Models closest to our S system are found in Saghafian et al. (2014, 2012), Dobson et al. (2013),

Tezcan and Zhang (2014), and Luo and Zhang (2013). Saghafian et al. (2014) model an ED as a

case-manager system, as we define it, and disaggregate the analysis to “Phase 1” (similar to our

pre-assignment queue) and “Phase 2” (with repeated testing and interactions with a physician).

They model Phase 1 as a priority M/G/1 queue and focus on the triage decision, that is, whether to

prioritize patients with simple or complex conditions. They analyze Phase 2 as a Markov Decision

Process and focus on how a physician chooses the next patient. In our S model, we integrate Phases

1-2, but assume that all patients are homogeneous. Saghafian et al. (2012) use a model similar to

that in Saghafian et al. (2014) to examine how patients should be routed (or “streamed”) through

an ED, depending on whether the patient is likely to be discharged or admitted to the hospital.

Dobson et al. (2013) (hereafter DTT) examine a case-manager system that is also motivated

by an ED. Their model allows for limited capacity to serve customers in external delay, service

interruptions from customers in external delay, and distinct service-time distributions for the initial

vs. subsequent encounters between a customer and a case manager. Both DTT and this paper

use simulation to analyze systems with separate (non-pooled) case managers. This paper differs

from DTT in terms of both methodology and focus. This paper models the bounding systems as

quasi-birth-death (QBD) processes and uses two-time-scale analysis to develop approximations,

:8

while DTT use high-caseload asymptotic analysis to examine the performance of single-server

and pooled systems. DTT focus on the optimal control of the system—whether the case manager

should prioritize new customers or returning customers—while we focus on system stability and

the determination of caseload limits.

The models in Tezcan and Zhang (2014) and Luo and Zhang (2013) are motivated by customer

service chat and instant messaging systems in which each agent simultaneously serves multiple

customers. In both papers, the system is approximated with a processor sharing model, that is,

each agent’s capacity is infinitely divisible and all customers are served simultaneously. Tezcan and

Zhang (2014) examine the optimal routing policy, and they find that under certain conditions the

optimal policy is similar to the JSC policy for our S system. Luo and Zhang (2013) focus on the

transient and steady-state system behavior, given a routing policy. Both papers derive their results

using a many-server asymptotic analysis. These processor-sharing models are built upon general

functions that describe each manager’s case completion rate, given caseloads. Our models instead

describe the specific interactions between customers and case managers, allowing us to predict the

impact of changes in customer or manager behavior (such as average duration of external delays or

probability of service completion) on system performance. The case-completion-rate function φ(ku)

in our T and B approximations has a similar interpretation to the case-completion-rate functions

in Tezcan and Zhang (2014) and Luo and Zhang (2013).

The two-time-scale approach that we use for the T approximation has been widely applied to

manufacturing, communications, and financial systems; see references in Yin and Zhang (2013).

This approach, however, has rarely been used to study service systems. Ramakrishnan et al. (2005)

and Shi et al. (2014) examine patient flow between a hospital’s ED and in-patient beds using a

different type of two-time-scale analysis—one in which the speed of the fast time scale does not

approach infinity, as it does in the approach we use. The B approximation is related to Gilbert’s

(1996) “perpetual backlog” system—a finite-source model of a single case manager that assumes

the manager is always at the caseload limit.

Finally, KC (2013) empirically examines the effect of caseload levels (or “multitasking”) on the

productivity and service quality of ED physicians. Coviello et al. (2014) perform a similar analysis

of the impact of ”task juggling” on the productivity of Italian judges. We will return to these

results in Section 9.

4. Analysis of the Bounding Systems

In this section, we focus on theR and P systems, which we believe provide lower and upper bounds,

respectively, on S system performance. Our numerical studies support this hypothesis. In addition,

these easy-to-analyze systems enable us to quickly determine ranges of parameters for which the

:9

case-manager system is stable, as well as the range of performance measures we could expect to

find in the S system. In particular, the R and P system bounds dramatically reduce the number of

simulations needed to analyze the S system. The bounds also help us to understand the dynamics

of the case-manager system, identifying when there is considerable advantage due to the pooling

effect when routing to the server with the smallest caseload, and when this advantage is small and

the case-manager system performs close to a random routing system.

4.1. Comparing the R, S, and P systems

The P system does not have fixed customer-server assignments. A customer at the head of the

internal queue is served by the first available server without having to wait for a particular server

to be free. Therefore, a server is less likely to be idle due to an empty internal queue in the P

system than in the S system, which has fixed customer-server assignments. For this reason, we

expect queue lengths and waiting times to be smaller in the P system than in the S system. Pooling

resources that work at the same rate is known to be beneficial in many settings. For example,

Smith and Whitt (1981) show that pooling two M/M/s loss systems with the same service time

distribution is beneficial (but pooling might not be beneficial if the service time distributions are

different). Based on these considerations, we propose:

Conjecture 1. For S and P systems with the same parameters (N,M,λ,λ′, µ,µ′), T S ≥ TP

The routing in the S system is state-dependent, using dynamic caseload information for each

manager in an attempt to achieve a more balanced distribution of caseloads among managers than

in the R system. In a system with better balanced caseloads, the chances of having an idle server

should be smaller, so we expect performance measures such as queue lengths and waiting times to

be smaller in the S system than in the R system. Therefore, we propose:

Conjecture 2. For S and R systems with the same parameters (N,M,λ,λ′, µ,µ′), TR ≥ T S

These relationships have been established for the special case where γ = 1 and M →∞. In this case,

the R system corresponds to N parallel, independent, and identical M/M/1 queues, the S system

corresponds to a JSQ system with N parallel exponential servers and the P system corresponds to

an M/M/N system. Nelson and Philips (1989) argue that the number of customers in the S system

is stochastically larger than number of customers in the P system, and the S system has a lower

expected response time than the R system. This relationship between S and R also holds true

for more general service time distributions with non-decreasing hazard rate (Weber 1978). (Whitt

(1986) discusses service time distributions for which JSQ is not optimal, however.) Our conjectured

bounds hold true for all computational experiments we have done so far, up to simulation error.

:10

4.2. QBD formulations of the R and P Systems

We formulate the subsystem for each individual case manager in the R system as a QBD process

(Latouche and Ramaswami 1999). We begin with state variables i= the total number of cases in

the system (in the pre-assignment queue or assigned to the case manager) and j = the number of

cases in the internal queue or in service. We use i and j to determine the pre-assignment queue

length la ≡ (i−M)+, the caseload q= min(i,M), and an indicator variable s= min(j,1) that equals

one if the manager is busy and zero otherwise. The possible transitions are:

• New case arrival: (i, j)→ (i+ 1, j+ 1) with rate λ/N , when i <M , and (i, j)→ (i+ 1, j) with

rate λ/N , when i≥M .

• Service completion that results in case completion: (i, j)→ (i− 1, j − 1) with rate sµ when

i≤M , and (i, j)→ (i− 1, j) with rate sµ, when i >M .

• Service completion that does not result in case completion: (i, j)→ (i, j− 1) with rate sµ′.

• Completion of external delay: (i, j)→ (i, j+ 1) with rate (q− j)λ′.To formulate the subsystem as a QBD, we transform state variables i and j into the standard

QBD state variables: The level, which we define as ` = 0 if i < M and ` = la + 1 otherwise, and

the phase, which we define as p= j. We order the states, (`, p), lexicographically and we organize

the transition rates in the general block tridiagonal form of a QBD infinitesimal generator, for

M=R,P:

Q=

BM1 BM0BM2 AM1 AM0

AM2 AM1 AM0

AM2 AM1. . .

. . .. . .

. (1)

The diagonal matrix blocks correspond to transitions where the level does not change whereas

the off-diagonal blocks correspond to transitions where the level increases (above the diagonal) or

decreases (below the diagonal) by one. The R and P systems both have infinitesimal generators

with this general form. Appendix E defines the matrix blocks BR0 , BR1 , and BR2 for transitions out

of, within, and into the (M + 1)M/2 boundary states, where the level equals zero. The R system

repeating matrix blocks AR0 , AR1 , and AR2 are square matrices of order M + 1 as follows (using ∆

for generic diagonal elements in AR1 and AR):

AR0 =λ

NI,AR1 =

∆ Mλ′

µ′ ∆ (M − 1)λ′

. . .. . .

. . .µ′ ∆ λ′

µ′ ∆

,AR2 = µ

0

1. . .

1

, (2)

The matrix AR =AR0 +AR1 +AR2 is the infinitesimal generator for a finite-source single-server queue

with M customers that we analyze in Section 5 when we investigate the stability of the R system.

:11

We define the P system similarly to the R system, with the same state variables i and j, for the

total number of customers in the system and the total number of customers in service or waiting in

an internal queue, respectively. The auxiliary state variables are computed as la = (i−NM)+, q=

min(i,NM), and s= min(j,N). The possible transitions are the same as for the R system and the

matrix blocks (shown in Appendix E) have similar structures. The sum AP of the repeating matrix

blocks corresponds to the Markov chain of a finite-source N -server queue with NM customers,

which will play a role in our analysis of the stability of the P system in Section 5.

Let πM` ,M =R,P be a row vector of stationary probabilities for level `, ` ≥ 0. These vectors

satisfy the matrix-geometric recursion

πM`+1 = πM` RM, `≥ 1, (3)

where the rate matrix RM is the minimal nonnegative solution of the nonlinear matrix equation

AM0 +RMAM1 + (RM)2AM2 = 0,M=R,P. (4)

We compute RM using the modified SS method (Gun 1989) and we compute πM0 and πM1 through

standard QBD analysis, as detailed in Appendix E.

Expressions (5)-(6) provide formulas to compute the average pre-assignment queue length,

LMa ,M=R,P, for the R and P systems (the queue length is aggregated over all case managers

for the R system, for easier comparison to the other systems).

LRa =N∞∑n=1

(n− 1)πRn e=NπR1 RR(I −RR)−2e, (5)

LPa =∞∑n=1

(n− 1)πPn e= πP1 RP(I −RP)−2e, (6)

where e is a column vector of ones. Appendix E provides similar closed-form expressions for the

other performance measures, for the R and P systems.

5. Stability Conditions

Let λMlim be the largest external arrival rate that systemM=R,S,P can accommodate without the

expected length of the pre-assignment queue growing without bound. We refer to [0, λMlim) as the

system M stability region. Intuitively, we expect the external arrival rate limit to be the product

of three components:

1. The number of case managers, N ,

2. The rate at which a case manager clears cases when busy, µ,

3. The probability that a case manager is busy, if the external arrival rate is sufficiently high to

not limit the case manager’s busy probability.

:12

The product of the first two components, Nµ, is the rate at which the system could clear cases if

all case managers were always busy. The product of the first and third components can be viewed

as E[SMlim], the steady state expected number of busy servers in a limiting system with all case

managers at full caseload (for the P system, this means a system caseload of NM). We expect the

P system to have a larger stability region than the R and S systems, because the P system avoids

situations with an idle case manager, while at the same time a case is waiting in internal delay.

In this section, we first demonstrate that the stability regions for the three systems coincide

in the special case when M = 1 and that the R and P stability regions coincide in the limiting

case when M approaches infinity. Then we formally prove that the limit on the external arrival

rate for the R and P systems can be expressed as the product of the three components that we

have mentioned and that P has a larger stability region than R. We conjecture that the R and S

systems have the same stability regions and we provide numerical support for this conjecture.

When M = 1, a case will never wait for a case manager—its entire time after leaving the

pre-assignment queue will consist of processing steps and external delays, without any internal

delays. The average total time from leaving the pre-assignment queue until case completion is

(1/γ)(1/µTot) + (1/γ − 1)(1/λ′) = 1/µ+ (1/µ)(µ′/λ′) and out of this total, the average time that

some case manager is busy with the case is 1/µ. It follows that the proportion of time that a case

manager is busy, if the system is fully loaded, is

1µ

1µ

+ 1µµ′

λ′

=1

1 + µ′

λ′

. (7)

Therefore, the external arrival rate limit is λMlim =Nµ/(1 +µ′/λ′) for M=R,S,P.

When M approaches infinity, then the R and P systems become open Jackson networks and

straightforward analysis (Appendix A) shows that λMlim =Nµ, that is, the external arrival rate limit

equals the rate at which the system can clear cases if all case managers are busy at all times.

We provide general expressions for the external arrival rate limits for the R and P systems in

Theorem 1. We use a general QBD ergodicity condition (Latouche and Ramaswami 1999) to prove

the validity of these expressions.

Theorem 1. The R and P systems are stable if and only if λ< λMlim for M=R,P, where

λMlim = µE[SMlim], M=R,P (8)

and SMlim is the steady state number of busy servers in a limiting system for system M=R,P.

The limiting system Rlim for R is a collection of N independent and identical single-server

finite-source Markovian queueing systems (M/M/1/./M) with population size M . The limiting

system Plim for P is an N -server finite-source Markovian queueing system (M/M/N/./NM) with

:13

1

1.5

2

2.5

3

3.5

4

4.5

5

5.5

1 2 3 4 5 6 7 8

Stability Lim

it for λ

Caseload limit (M)

P: λ' = 9.6R & S: λ' = 9.6P: λ' = 5.1R & S: λ' = 5.1P: λ' = 2.1R & S: λ' = 2.1

Figure 5 New-case arrival rate stability limits for maximum caseloads of 1 to 8 cases, for R, S, and P systems

with N = 2 case managers, µTot = 7.5, γ = 1/3, λ′ = 2.1,5.1, and 9.6.

population size NM . The service rate is µ′ and the average time until arrival is 1/λ′ for each

customer in the population, for both limiting systems. The steady state expected number of busy

servers in these two systems can be expressed as:

E[SRlim] = N

(M∑i=0

mini,1ωRi

), (9)

E[SPlim] = N

(NM∑i=0

mini/N,1ωPi

), (10)

where ωMi is the steady state probability of state i in the Markov chain corresponding to matrix

block AM, for M=R,P.

Proof: See Appendix B

Fig. 5 shows that P has a larger stability region than R for caseload limits M between 1 and

∞ and confirms that their stability regions coincide when M = 1 and when M →∞. This figure

was generated by using the expressions in Theorem 1 to compute λMlim,M=R,P for systems with

N = 2 case managers, with parameters µTot = 7.5, γ = 1/3, λ′ = 2.1,5.1, and 9.6, and maximum

caseload limits varying from 1 to 8. The stability limits increase when λ′ increases, because less

time in external delay leads to less forced server idleness.

The S system can be formulated as a QBD process, by combining the caseload and queue length

state variables for all case managers into a single phase variable with finite (but large) range. We

:14

used this approach to numerically compute stability limits, λSlim, for the S system. We only need

to generate the repeating matrix blocks (not the boundary matrix blocks) to compute stability

limits. The S system repeating matrix blocks are square matrices of order (M + 1)N . We verified

numerically that the R and S systems have exactly the same stability limits for all values of M

and λ′ that are shown in Fig. 5 (and for many other cases, see Appendix F.2). This numerical

evidence leads us to:

Conjecture 3. For S and R systems with the same parameters (N,M,λ′, µ,µ′), λSlim = λRlim.

In addition to the numerical evidence, we observe that if the arrival rate of new cases is sufficiently

high, one would expect the internal queues of the R and S systems to behave in the same way. For

such highly loaded systems, each case manager would operate, most of the time, as a single-server

M -customer finite-source queue, in both the R and the S systems. The numerical results that we

report in Section 8 (in particular, Fig. 11 (b)) is consistent with these arguments.

We conclude this section by proving that λPlim ≥ λRlim.

Theorem 2. Let SMlim(t) be the number of busy servers and QMlim(t) be the number of customers

waiting for service at time t in an Mlim system, where M=R,P. If both the Rlim and the Plim

systems start empty (SRlim(0) = QRlim(0) = SPlim(0) = QPlim(0) = 0), then SPlim ≥st SRlim, which implies

that λPlim = µE[SPlim]≥ µE[SRlim] = λRlim.

Proof: See Appendix C.

6. The Two-Time-Scale and Balanced Approximations

At a high level, the S system is an N -server system with arrival rate λ and per-server service rate

µ. At a lower level, each case manager corresponds to a single-server finite-source queueing system

with parameters λ′, µ′ and a randomly varying population size. In this section, we use two-time-

scale analysis to develop the T approximation for the S system, by assuming that the high-level

system operates on a slow time scale and the low-level systems operate on a fast time scale. An

additional balancing assumption leads to our highly tractable B approximation.

We begin our development of the T approximation by recalling the state description for the

S system, with state variables i (total number of cases in the system), ju, and ku, u = 1, . . . ,N ,

where ju is the number of cases in the internal queue or in service and ku is the caseload for case

manager u. We borrow QBD terminology, referring to `≡ (i, k1, . . . , kN) as the level and (j1, . . . , jN)

as the phase and order the state variables as (i, k1, . . . , kN , j1, . . . , jN). The resulting process is not

necessarily a QBD process, because the level might not be skip-free. We assume that k1 ≤ · · · ≤ kNand that if ku = ku+1 then ju ≤ ju+1 for u= 1, . . . ,N−1, which is without loss of generality, because

the individual case managers have identical characteristics.

:15

The state space Ω for the reformulated S system is the set of vectors m= (i, k1, . . . , kN , j1, . . . , jN)

that satisfy the assumptions in the preceding paragraph. We order the states lexicographically,

we partition Ω into subsets Ω` based on the value of the level `, and we partition the vector of

transient probabilities p(t) = (p`(t)) and the vector of stationary probabilities π= (π`) in the same

way. In order to economize on notation, we interpret the index ` to either represent the vector

(i, k1, . . . , kN) or an integer label (0,1, ...) for the lexicographically ordered level vectors. Using the

latter interpretation, we write the infinitesimal generator, with one block row and one block column

corresponding to each level set Ω`, as follows:

Q=

Q00 Q01 · · ·

Q10 Q11 · · ·...

.... . .

. (11)

The diagonal matrix blocks Q`` correspond to transitions where the level does not change whereas

the off-diagonal blocks correspond to transitions where the level increases by n ≥ 1 (Q`,`+n) or

decreases by n (Q`,`−n). See Appendix F.3 for details regarding the S system transitions.

We represent the S system as having two time scales, following Yin and Zhang (2013, Sec. 4.3),

by replacing the fast parameters with λ′/ε and µ′/ε, where ε = 1 corresponds to the original S

system. The time index t represents the slow time scale and t/ε represents the fast time scale.

Our T approximation is based on the limiting behavior of the S system as ε→ 0, which implies

1/ε→∞ and the fast time scale becomes infinitely fast. We write the infinitesimal generator as

Qε =1

εQ+ Q=

1

ε

Q0

Q1

. . .

+

D00 Q01 · · ·

Q10 D11 · · ·...

.... . .

, (12)

where the off-diagonal matrix blocks Q`,`′ are from (11); Q` +D`` =Q``; and the D`` blocks are

diagonal matrices, with entries chosen to ensure that Q is stochastic, which also ensures that each

of the blocks Q` as well as Q are stochastic. The vector pε(t) of transient probabilities satisfies the

forward equation: dpε(t)/dt= pε(t)Qε, with pε(0) = p(0). We expand pε(t) as

pε(t) = outer expansion + initial layer correction + remainder =ϕ0(t) +ψ0(t/ε) + remainder. (13)

The outer expansion, ϕ0(t), approximates the steady state solution, limt→∞ pε(t), which is what

we would like to obtain. The initial layer correction, ψ0(t/ε), approximates the initial transient

solution. The outer expansion can, in general, be a function of time, but it is constant in our case

because Qε is time independent.

:16

Outer expansion: The outer expansion is a solution to ϕ0Q= 0,ϕ0e= 1. Since Q is block diagonal,

ϕ0Q= 0 reduces to ϕ`0Q` = 0. Each block Q` where `= ik1 . . . kN is the generator for a collection of

N independent single-server finite-source systems, where system u has ku customers. Let ν` be the

unique solution to ν`Q` = 0, ν`e= 1, that is, ν` is the steady-state probability vector for the `-th

collection of finite-source systems. An entry in this vector that corresponds to state vector m is the

product of terms from the steady-state distributions for the N independent systems, multiplied

by the number of combinations of the states of the N independent systems that results in the

state vector m. Concatenating the ν` vectors into ν = (ν0, ν1, . . . ) results in a vector that satisfies

ϕ0Q= 0, but is not a probability distribution, because the terms in each component ν` add up to

one, and there are infinitely many components. To obtain a probability vector, we multiply each of

the components ν` with a scalar θ`0 that we interpret as the approximate steady-state probability

of level `. That is, we express ϕ0 as ϕ0 = (θ00ν0, θ10ν

1, . . . ). We obtain the θ`0 from

θ0Q= 0, θ0e= 1, Q= diag(ν0, ν1, . . . )× Q×diag(1,1, . . . ), (14)

where Q is an “average” of Q with respect to ν = (ν0, ν1, . . . ). The blocks in the block-diagonal

matrix diag(1,1, . . . ) are column vectors of ones. We aggregate each level into a single state, and

we view the aggregated process as a Markov chain with infinitesimal generator Q.

The generator Q can be expressed as:

Q=

ν0D00

1 ν0Q011 · · ·

ν1Q101 ν1D11

1 · · ·...

.... . .

. (15)

A non-zero superdiagonal entry ν`Q`,`+n1, n≥ 1, corresponds to the level increasing from ` to `+n.

The level can only increase through the arrival of a new case, which implies that ν`Q`,`+n1= λ/|A|,

where A is the set of case managers with minimal caseload for level `. A non-zero subdiagonal entry

ν`Q`,`−n1, n≥ 1 corresponds to the level decreasing from ` to `−n, which can only occur through a

case completion. The nonzero entries in Q`,`+n are of the form s(m)µ where s(m) =∑

u∈C 1ju > 0

is the number of busy servers in the set C of case managers for which a case completion would

change the level from ` to `−n and m is the state vector for the origin state. The quantity Q`,`−n1

is a column vector in which the j-th entry is the sum of the j-th row in Q`,`−n, which equals

s(mj)µ where mj is the state vector corresponding to level ` and phase j. Therefore, ν`Q`,`−n1=

µ∑

j s(mj)ν`j , where

∑j s(mj)ν

`j =∑

u∈C β(λ′/µ′, ku) is the expected number of busy servers in C,

using β(λ′/µ′, k) for the expected server utilization a single-server, k-customer finite-source system

with rates λ′ and µ′. The diagonal entries ν`D``1 are chosen so that each row sum in Q equals

zero. See Appendix G for an algorithm to construct the matrix Q.

:17

0.25

0.30

0.35

0.40

0.45

0.50

0.55

0.01 0.1 1 10

Pre‐assignmen

t wait

S system

Two‐time‐scale

Balanced

0.25

0.26

0.27

0.28

0.29

0.01 0.1 1 10

Internal W

ait

Figure 6 Pre-assignment and internal delays, as a function of ε, for the S system and the T and B

approximations.

The matrix Q is the infinitesimal generator for the T approximation Markov chain with states

`= 0,1, . . . that correspond to the levels for the S system. For states where i≥NM , the T process

is a birth-death process, with birth rate λ and death rate U ≡ Nβ(λ′/µ′,M)µ. By solving this

process, we obtain the probabilities θ00, θ01, . . . , which we interpret as the probabilities of levels

0,1, . . . . We obtain the approximate probability distribution of states corresponding to level ` as

θ0`ν`, that is, by multiplying the probability of level ` with the steady state distribution for the

collection of finite-source systems at level `.

Initial-layer correction: The initial-layer correction is, in general, obtained from ψ0(τ) = (p(0)−

ϕ0(0)) exp(Qτ). We are only interested in the steady-state solution and therefore we can view the

initial value p(0) as a free parameter. We set p(0) = ϕ0(0), which implies that the initial-layer

correction vanishes: ψ0(τ) = 0.

The two-time-scale approximation is exact in the limit, as stated formally in the following the-

orem (a special case of Prop. 4.28 in Yin and Zhang, 2013).

Theorem 3. The remainder pε(t)−ϕ0 satisfies limε→0 sup0≤t≤T |pε(t)−ϕ0|= 0, for any T > 0.

Example: Fig. 6 shows how the pre-assignment and internal delays vary with ε, for N = M =

2, λ = 0.9, and µ = λ′ = µ′ = 1. For these parameters, the relative errors for both measures are

within 10% for ε≤ 1. The T errors increase with ε, as we expect from Theorem 3.

We show in Appendix H that the cardinality of the state space for the T approximation grows

subexponentially with N (and M). Despite this, the state space grows quickly—as an example,

for M = N = 10, the number of levels is 1.8 × 105. This motivates our development of the B

approximation, which adds the following balancing assumption to the T approximation:

Assumption 1. Balanced caseloads: The caseloads ku and kv of any two case managers u and

v are equal, if possible, and otherwise differ by at most one case.

Appendix I describes a case transfer mechanism that enforces Assumption 1. Under this assump-

tion, the T approximation reduces to our B approximation; a Markovian birth-death process for

:18

the total number of customers in the system, i. The birth rate in all states is λ. We obtain the

death rates by decomposing the total number of customers in the system as

i= la +N0kmin +N1(kmin + 1), (16)

where la = (i−NM)+ is the pre-assignment queue length, kmin is the minimum caseload of any

manager, andNi is the number of managers with caseload kmin+i, i= 0,1. We must have∑1

0Ni =N

and∑1

0Ni(kmin + i) = min(i,NM), which implies kmin = (min(i,NM)−N1)/N , N1 = min(i,NM)

mod N , and N0 =N −N1. The resulting death rate di equals

di =

(N0β

(λ′

µ′, kmin

)+N1β

(λ′

µ′, kmin + 1

))µ, i= 1,2, . . . (17)

The death rate saturates at di =U =Nβ(λ′/µ′,M)µ for i≥MN . Like the T process, the B process

has a geometrically-decaying tail, and like both the R system and the T process, the B process is

stable if and only if λ<U .

If λ < U , then standard birth-death process derivations provide formulas for the steady-state

probabilities of states 0 and NM and the average pre-assignment queue length:

π0 =

(1 +

NM−1∑i=1

λi∏i

j=1 dj+

λNM∏NM

i=1 di

U

U −λ

)−1, πNM =

λNM∏NM

i=1 diπ0, LBa = πNM

λU

(U −λ)2. (18)

We approximate the internal queue length for a case manager with caseload k as the steady-

state expected queue length η(λ′/µ′, k) in a single-server k-customer system. The overall expected

number in internal queues is:

LBq = π0

MN−1∑i=1

(λi∏i

j=1 dj

)[N0η

(λ′

µ′, kmin

)+N1η

(λ′

µ′, kmin + 1

)]+πNMNη

(λ′

µ′ ,M)U

U −λ. (19)

By Little’s Law, the expected pre-assignment and internal waits are WBa =LBa/λ and WB

q =LBq /λ.

Example, continued: Fig. 7 shows state transition diagrams for the T and B processes, with

N =M = 2. Fig. 6 shows the pre-assignment and internal delays for the B approximation as well as

the T approximation and the S system. The T approximation is usually closer to the S system than

the B approximation. We perform more thorough tests of approximation accuracy in Section 7.

Appendix J provides analysis of the T and B approximations for the N =M = 1 special case.

The B approximation may be used to efficiently calculate performance measures for case manager

systems of any realistic size. For example, using the numerical techniques in Ingolfsson and Tang

(2012), the system with 112 case managers from Yamatani et al. (2009) required less than a second

to solve.

:19

000 101

211

202

312 422 522 …

λ λ λ λ λ λ

λβ(1)μ β(2)μ

2β(1)μ

β(1)μ

β(2)μ

U U U = 2β(2)μ

2TS

000 101 211 312 422 522 …

λ λ λ λ λ

β(1)μ 2β(1)μ U U U = 2β(2)μ

B

λ

β(1)μ + β(2)μ

Figure 7 State transition diagrams for the T and B approximations, with N =M = 2, and β(λ′/µ′, k)

abbreviated to β(k). States are shown as ik1k2.

7. Calibrating the Models and Testing the Approximations

In this section we derive case-manager-model parameters to represent three environments: an ED,

a live-chat customer support center, and the social work department in an urban hospital (Table 2).

We use data from published research, industry documents, and interviews with case managers

and their supervisors. While the data are gathered from just a few facilities, we believe that

the parameters provide reasonable starting points for our analysis. We test the accuracy of the

approximations for these base cases as well as for a wider range of parameters. We find that

the approximations are accurate for the base cases and we identify parameter regions over which

approximation accuracy is high and other regions over which accuracy degrades.

Table 2 Base case parameters and model results

ED Chat Social Work

time unit hours minutes weeks

λ 8.6 3.2 12.6

λ′ 1.8 0.51 1.0

# visits to server (1/γ) 1.9 7.8 10

µTot 5.9 2.7 26.7

µ 3.2 0.3 2.7

µ′ 2.7 2.4 24.0

N 3 20 7

M 5 3 20

λ/λRlim 0.91 0.91 0.91

S simulation [95% CI] [0.576,0.577] [1.220,1.266] [0.145,0.149]

Wa T (% from simulation mean) 0.572 (-1%) 1.173 (-6%) —

B (% from simulation mean) 0.560 (-3%) 1.170 (-6%) 0.140 (-5%)

S simulation [95% CI] [0.613,0.614] [1.020,1.022] [0.5776,0.5784]

Wq T (% from simulation mean) 0.616 (0.4%) 1.027 (0.6%) —

B (% from simulation mean) 0.601 (-2%) 1.018 (-0.3%) 0.580 (0.3%)

:20

For the ED base-case parameters, we use information from Graff et al. (1993), who studied how

physician service time in an ED varies with patient category, length of stay, and intensity of service.

The physicians recorded the beginning and ending times of each interaction with a patient as well

as the length of stay: the time between patient registration in the ED and patient release. While

these ED data were partitioned into five customer types, we use the aggregate data to directly

calculate µTot and γ. We assume N = 3 physicians (typical for a small to medium-sized ED) with

a caseload limit of M = 5 patients (based on KC (2013), who found that physician performance

declines significantly when caseloads climb above 5). See Appendix K for information on how we

find values of λ and λ′ that are consistent with the data in Graff et al. (1993) and additional

details on how we derived the other parameters. As we see in Table 2, each patient visits a doctor

on average 1.9 times, and a fully utilized doctor has just under 6 interactions with patients per

hour. We use the ratio λ/λRlim between the patient arrival rate and the R system stability limit as

a measure of congestion, which is justified by Conjecture 3, that λSlim = λRlim. Under these base-case

parameters, λ/λRlim = 0.91. We set λ to achieve the same level of congestion for the following two

base cases.

The Chat base-case parameters are taken from three sources: interviews with a manager of a

chat contact center, an interview with a chat agent, and transcripts of chat sessions. The manager

provided us with the rate at which a typical chat agent types responses to customers, in words/min.

The agent told us that his maximum number of simultaneous chats is M = 3. We combine this

information with data gleaned from ten sample chat transcripts posted at ewriteonline.com

(O’Flahavan 2012). These transcripts captured interactions between customers and agents for

a variety of utilities and on-line retailers, such as Comcast, Nordstrom and Zappos. We derive

parameters µTot, γ, and λ′ by counting customer-agent interactions, counting words per message,

and analyzing time stamps in the chat transcripts. We find, for example, that the average number

of agent chat messages per customer in the transcripts is 7.8, with 2.7 messages typed per minute

(Table 2). We set N = 20, to represent a small to medium-sized chat center.

The Social Work base-case parameters are derived from interviews with a social worker in an

urban hospital in the U.S. This case manager’s maximum caseload is M = 20 patients and her

department has N = 7 social workers. She sees each patient on average once per week for approxi-

mately 10 weeks (a number influenced, in part, by limits on insurance coverage). Each patient visit

requires approximately 90 minutes of work, including travel time, the actual visit, and subsequent

paperwork. Assuming a 40-hour work week, the case manager handles 26.7 patient visits per week

(Table 2). We reiterate that these base-case parameters are not precise estimates for a particular

system but instead will help us to define a reasonable range for tests of our approximations.

:21

The last six rows of Table 2 illustrate the quality of the approximations for the base cases,

measured against simulation. We see that the T and B approximations provide accurate waiting-

time estimates. For the ED and Chat systems the T results are usually slightly more accurate than

the B results. For the Social Work system it was impossible to numerically solve for the steady-

state values of the 888,029 elements in the state space of the T approximation.1 For both the T

and B models, the error for Wa is usually significantly larger than the error for Wq—often an order

of magnitude larger in both absolute and relative terms. In addition, we will see that while Wq is

fairly stable over our parameter ranges, Wa may vary wildly. Therefore, for the remainder of this

section we focus on approximation accuracy for Wa.

In Section 6 we saw how the accuracy of the T and B approximations depends on the ratio

of the slow and fast time scales. In developing the T approximation, we used the parameter ε to

vary this ratio for a particular system. The parameter ε, however, does not allow us to compare

the slow/fast ratios between different case-manager systems. Therefore, we propose the fraction

(λ + Nµ)/[N(µ′ + Mλ′)] as an absolute measure of the ratio between the slow and fast event

frequencies. The numerator is the slow event frequency assuming all N servers are busy. The

denominator is the fast event frequency assuming that all servers are busy and have a full caseload,

with all customers in external delay. We expect that as this ratio increases, the approximation

accuracy will decline. In addition, Yom-Tov (2010) shows that the dynamics of a case-manager

loss system are heavily influenced by the offered load on the server; in our notation, the inverse of

Yom-Tov’s offered load is µ′/λ′.

Table 3 Results from the N = 2 Experiments: Mean (Max) B approximation % absolute error for Wa

λ/(Nµ) = 0.95, avg. Wa = 38.3 λ/(Nµ) = 0.7, avg. Wa = 1.6

µ′/λ′ 0.1 1 5 10 0.1 1 5 10

slow/fast 0− 1 2.0(2.1) 1.2(1.3) 1.0(2.4) 1.5(3.9) 8.1(8.6) 5.0(5.6) 5.5(17) 8.9(20)

ratio: 1− 5 2.0(2.0) 1.3(1.6) 5.5(10) 10(16) 8.2(8.6) 6.4(8.4) 23(32) 40(53)

λ+NµN(µ′+Mλ′) 5− 11 1.8(1.8) 1.6(2.4) 10(18) 18(28) 7.8(8.2) 7.8(12) 47(64) 66(76)

We tested the approximations’ accuracy by varying all parameters while ensuring that both

the slow/fast ratio and µ′/λ′ cover a wide range. In one set of 144 experiments, we set N = 2,

1 For the results in Sections 7-9 we programmed the calculations for the T , B, P, R, and small instances of the Smodel in Matlab. The computation time per instance was negligible for the B approximation, less than a second foreach of the R and P systems, and significantly longer for the T approximation and the S system, depending on theproblem size. When necessary, we simulated the S system using the Arena simulation software. For each instance,we simulated 100 replications, each of which had a 500-hour warmup period, followed by 2,000 simulated hours.These simulations required roughly 12 minutes of computation time per instance. This was usually sufficient to createconfidence intervals that were visually indistinguishable from the mean values in the figures in this paper.

:22

Table 4 Results from the N = 2 Experiments: Mean (Max) T approximation % absolute error for Wa


µ′/λ′ 0.1 1 5 10 0.1 1 5 10

slow/fast 0− 1 < .00(< .00) .04(.15) .59(2.2) 1.2(3.7) .06(.10) .44(1.3) 4.2(16) 8.4(20)

ratio: 1− 5 .02(.03) .27(.70) 5.0(10) 10(16) .04(.05) 1.9(4.8) 22(31) 40(53)

λ+NµN(µ′+Mλ′) 5− 11 .03(.03) .54(1.4) 10(18) 18(28) .02(.04) 3.4(8.7) 46(63) 66(76)

which allowed us to solve the S system numerically rather than use simulation. We fixed µ′ = 1

and we varied λ′ to obtain µ′/λ′ = 0.1,1,5,10. We varied µ and λ to achieve customer loads of

λ/(Nµ) = 0.7,0.9,0.95 and slow/fast ratios from near 0 to above 10. To set the caseload limit

M , we started with MPlim, the smallest caseload limit for which a pooled system is stable, and we

set M = MPlim +X for X = 1,2,3 (although for the caseload-limit experiments described later in

Section 8.2 we will vary M over a wider range). We calculated performance measures for the S,

R, P, B, T models for each experiment. Henceforth we will call these the “N = 2 Experiments.”

Table 3 summarizes the N = 2 results for the B approximation for λ/(Nµ) = 0.95 and 0.7. As

an example, for λ/(Nµ) = 0.95, µ′/λ′ = 5, and a slow/fast ratio in the range 1− 5, the average

absolute approximation error for Wa from the B approximation is 5.5% and the maximum error is

10%. The table shows that an increase in either ratio can increase the approximation error, with

a strong interaction effect. Table 3 further shows that the relative approximation error declines as

congestion rises. When λ/(Nµ) = 0.95, there is a significant pre-assignment queue (Wa = 38.3 on

average) and the relative approximation error tends to be small. When λ/(Nµ) = 0.7, Wa = 1.6,

and the relative error is larger. The results for λ/(Nµ) = 0.9 fall between those for λ/(Nµ) = 0.95

and 0.7.

Table 4 summarizes the performance of the T approximation and shows that it is more accurate

than the B approximation, particularly for small values of µ′/λ′. The overall pattern of errors,

however, is the same as shown in Table 3, with increasing errors as the two ratios grow larger.

We conducted additional experiments, including 384 with N = 1, 105 with N = 3, and 6 large

systems that interpolate among the base cases (Appendix L). These experiments produced results

similar to those described above. Although the errors from the B system may be large for some

parameters, we show in Section 9 and in Appendix L that this simple approximation provides

extremely reliable results when used to set caseload limits that meet a waiting-time target.

8. Insights Into System Behavior

In this section we use our models to generate insights about three aspects of the dynamics of

case-manager systems: How methods for increasing throughput affect waiting times, the impact of

caseload limits, and the role of pooling in system performance.

:23

8.1. Increasing Throughput: Resolution Probability vs. Service Rate

A fully utilized case manager completes cases at the rate γµTot. Therefore, to increase case-manager

productivity, one could increase either γ or µTot. For example, a technical support center using chat

might either create tools to help agents solve customer problems over fewer iterations (increase γ)

or hire agents with faster composition and typing speeds (increase µTot). One might hypothesize

that a particular percentage increase in either γ or µTot would have the same impact on system

wait times. Theorem 4 shows, however, that in the B approximation an increase in γ has a larger

impact.

Theorem 4. The sojourn time WB(µTot, γ) and the average pre-assignment wait WBa (µTot, γ)

under the B approximation, expressed as functions of the parameters µTot and γ, satisfy

WB(µTot(1 + y), γ) ≥ WB(µTot, γ(1 + y)),

WBa (µTot(1 + y), γ) ≥ WB

a (µTot, γ(1 + y)),

for y ∈ [0, µ′/µ).

Proof: Appendix D

While Theorem 4 applies to the B approximation, we find numerically that the same phenomenon

holds in the S system. For example, in Figure 8 we begin with the baseline Chat system and

increase both µTot (top lines) and γ (bottom lines). We see that the impact of an increase in γ can

be substantially larger than that of an increase in µTot, for both the B approximation and for the

simulated S system. For the ED system, the relative benefit of an increase in γ vs. µTot is less than

that shown in Figure 8, but the relative benefit is greater for the Social Work system. Of course,

it may not be equally costly to increase the two parameters by the same percentage, but managers

should keep in mind that the marginal value of increasing γ can be much larger.

8.2. Varying Caseload Limits

Varying the caseload limit M adjusts the tradeoff between pre-assignment delay and internal delay.

A higher M increases case manager utilization which may lead, on the one hand, to longer internal

delays and, on the other hand, to greater system capacity and shorter pre-assignment delays.

Figure 9 illustrates this tradeoff for variations from the ED base case, and shows that the impact

of changes in M on pre-assignment delay tend to dominate the impact on internal delay, so that

the total delay declines as M rises.

Let Mmin be the caseload limit that minimizes the total delay, producing Wmin. Figure 9 illus-

trates that Mmin =∞ and total delay declines as M rises for the ED base case—a situation that

we see in 50% of the N = 2 Experiments. For 15% of the experiments, total delay rises with M , so

:24

Figure 8 In the Chat system, the impact of increasing the completion probability is larger than the impact of

increasing the processing rate

.

0.00.20.40.60.81.01.21.41.61.82.0

4 5 6 7 8 9 10 11 12 13 14 15

Waitin

g tim

e (hrs.)

Caseload limit (M)

Total

Internal

Pre‐assignment

Figure 9 Average total, internal, and pre-assignment waits for the ED base case

that Mmin =MSlim, the smallest caseload limit for which the S system is stable, as in Figure 10 (a).

For the remaining 35% of the experiments, Mmin falls between MSlim and ∞, as in Figure 10(b).

These three situations—falling, rising, and nonmonotonic delay as M rises—correspond to different

pooling regimes, as we explain next.

8.3. The Impact of Partial Pooling

To provide customers with continuity of care—service from a single agent—the S system uses

only partial pooling. The S-system pre-assignment queue is pooled, in the sense that customers

are assigned to any server using state-dependent routing—in contrast to the R system. Pooling

ends, however, once customers are assigned to a case manager; in contrast to the P system, a case

manager can be idle because all of her cases are externally delayed while other case managers have

:25

9.009.029.049.069.089.109.129.149.169.189.20

3 4 5 6 7 8 9

Waitin

g tim

e (hrs.)

Caseload limit (M)(a) '/' = 0.1

Total wait

223.5223.6223.6223.7223.7223.8223.8223.9223.9224.0224.0

5 6 7 8 9 10 11

Waitin

g tim

e (hrs.)

Caseload limit (M)(b) '/' = 1

Total wait

Figure 10 Total average wait, as the caseload limit varies, from two experiments.

0.0

5.0

10.0

15.0

20.0

7.60 8.45 9.30

Avg Pre‐Assig

nmen

t Wait (hrs)

Arrival rate (per hr) and λ/λlimR ratio

R

S

2TS

B

P

λ =λ/λlimR = 0.806 0.986

(a)

0.00.10.20.30.40.50.60.70.80.91.0

7.60 8.45 9.30

Avg Internal W

ait (hrs)

Arrival rate (per hr) and λ/λlimR ratio

R

S

2TS

B

P

λ = λ/λlimR = 0.806 0.986

(b)

Figure 11 Average waits for the R, S, B, and P systems when the new case arrival rate λ varies from 7.6 to

9.3 per hour for the ED base case

.

nonempty internal queues. Therefore, it is not obvious whether pre-assignment and internal delays

in the S system will be closer to the R or P systems.

Figure 11 provides one demonstration of the impact of partial pooling in case-manager systems.

To generate the figure, we vary from the ED base case to allow λ to approach the R system

stability limit (λ/λRlim approaches 1), where λRlim = 9.44. The pre-assignment wait grows without

bound as the arrival rate approaches the system capacity while the internal wait increases more

slowly. When λ= 9.3 (99% of λRlim), the full pooling benefits of the P system reduce the average

pre-assignment delay eightfold compared to the R system (from 21.2 to 2.8 hours). In this case the

S system achieves most of the benefits of pooling, with a 6.12-hour average pre-assignment delay.

The N = 2 Experiments show the same pattern, that the S system provides most, but not all, of

the benefits of pooling. Over the 144 experiments, the pre-assignment delay in the P (S) system

was, on average, 24% (33%) of the pre-assignment delay in the R system.

:26

Partial pooling has a different impact on internal delay than on pre-assignment delay. When

λ/λRlim is low, the state-dependent routing from the pre-assignment queue helps to keep lightly-

loaded managers busy, so that the internal delay of the S system can move towards that of the

P system (see the left-hand side of Figure 11 (b)). As λ/λRlim approaches 1, however, most new

cases wait in the pre-assignment queue before being routed to the first available case manager,

thus removing the benefits of state-dependent routing and closing the gap between the S and R

systems. Note that the ratio λ/λRlim can also be varied by changing γ, µTot and λ′ (see Theorem 1).

As these parameters vary, the results are similar to those seen in Figure 11.

Now recall the three delay patterns we observed as we varied the caseload limit M (Figures 9-10).

Suppose we have an S system with either zero-duration external delays (1/λ′ = 0) or no external

delays (γ = 1, or equivalently, µ′ = 0). In either case a manager with at least one assigned case is

never forced to be idle, so there is no need to make M larger than its minimum value of M = 1.

Therefore, M = 1 maximizes the benefit from the state-dependent routing. In general, as the ratio

µ′/λ′ declines, the optimal caseload limit approaches the minimum value for which the system is

stable. Such is the case in Figure 10 (a), where µ′/λ′ has a low value of 0.1 and Mmin =MSlim.

In contrast, if average external delays durations are long (or if customers tend to require many

external delays) and caseloads are too low, then the case manager is often starved, reducing

throughput and increasing pre-assignment delay. In the ED base case, µ′/λ′ = 1.5, and we see in

Figure 9 that Mmin =∞. Finally, Figure 10 (b) shows a transitional case with µ′/λ′ = 1.0, in which

the advantages of pre-assignment pooling balance the throughput advantages of high caseloads at

an intermediate value of M .

9. Setting Caseload Limits

Research on multitasking indicates that increased caseloads can have a negative impact on service

quality (KC 2013, Coviello et al. 2014). Therefore it would be useful to identify reasonable caseload

limits that avoid the negative impact of multitasking while keeping the average total wait below a

target. In this section we demonstrate that the B approximation can be used to set caseload limits,

and we compare the results from the B approximation with three simpler benchmark heuristics,

described below. We will see that the B approximation is remarkably accurate for setting caseloads,

even when its waiting-time accuracy begins to decline, as in parts of Table 3.

To test our approximations and heuristics, for each experiment, we identified MS10%: The smallest

caseload limit such that the average total wait in the S system is at most 10% above the minimum

total delay achieved at Mmin. Specifically, if Mmin =MSlim then MS

10% =MSlim. If Mmin >MS

lim then

we begin with M =Mmin and decrement M by one case at a time until we identify the smallest

value of M such that the average total wait is less than 1.1Wmin. We use a similar procedure

:27

to identify MB10%, the smallest caseload limit that brings the average total waiting time in the B

approximation below 1.1Wmin (to make the test complete, Wmin is also calculated using the B

approximation). We also compute caseload limits using the following three benchmark heuristics:

1. Deterministic (MDet): Yamatani et al. (2009) propose a simple method for setting caseload

limits: Divide the time, χ, that a case manager is available per month by the time per month that

each case requires. We reinterpret this advice in the context of our model. The amount of time

each case requires per month from the case manager is χ multiplied by the proportion of time

that case requires from its case manager while assigned, that is, χ× [(1/µTot)/(1/µTot +1/λ′)]. The

recommended caseload limit is therefore:

MDet =

⌈χ

χ(1/µTot)/(1/µTot + 1/λ′)

⌉=

⌈1/µTot + 1/λ′

1/µTot

⌉. (20)

This approach implicitly assumes (i) that there is no variability in the system and (ii) that the

case manager is always working on the maximum possible caseload.

2. Deterministic-80% (MDet,80): Under the assumption of no variability, the previous approx-

imation would fully load the server. As a rough adjustment to take variability into account, let

MDet,80 = d0.8MDete.

3. Service-Delay Ratio (MServ): Neither of the previous heuristics take into account the

possible forced idleness to the server due to external delays. As a rough method to account for

idleness, let the maximum caseload be the ratio of the mean time in service plus external delay to

the mean time in service:

MServ =

⌈1/µTot + (1/γ− 1)(1/λ′)

1/µTot

⌉. (21)

Table 5 shows the recommended caseload limits for our three base case systems: ED, chat and

social work. The limits generated from the B approximation are at most one case away from the

limits generated from simulation. For the ED, two of the three heuristics recommend caseload

limits that are too small, producing long waits or an unstable system. For the chat and social work

systems, all three heuristics produce limits that are dramatically too large.

Table 5 Caseload-limit setting for the base cases. UNS = Unstable.

Caseload limit Total wait under each caseload limit

(from simulation of S system)

Base case MS10% MB10% MDet MDet,80 MServ Base case MS10% MB10% MDet MDet,80 MServ

ED 5 6 5 5 4 3 1.204 1.088 1.204 1.204 1.95 UNS

Chat 3 4 4 7 6 6 2.264 1.084 1.084 1.084 1.084 1.084

Social 20 21 21 28 23 25 0.728 0.62 0.62 0.583 0.586 0.584

Work

:28

We obtained similar results when using the parameters from the N = 2 Experiments. Table 6

summarizes these results, and details about each of the experiments are in Appendix L. The B

approximation produces extremely accurate results: 90% of the time the suggested caseload limit

using the balanced approximation precisely matches the suggested limit from the S system, and

for the remaining 10% of cases the balanced approximation’s suggestion is only one case off. None

of the benchmark heuristics are consistently accurate. All miss the suggested caseload limit over

80% of the time; sometimes dramatically. For particular instances, the MDet, MDet,80, and MServ

heuristics suggest caseload limits that are 127, 99, and 81 cases too high, respectively. We obtain

similar results for the tests described in Appendix L.

Table 6 Accuracy of caseload-limit-setting methods

Method: MB10% MDet MDet,80 MServ

% cases M =MS10% 90 13 6 17

Mean |M −MS10%| 0.1 20.1 15.6 9.0

Min (M −MS10%) -1 -6 -9 -7

Max (M −MS10%) 0 127 99 81

In these experiments we set caseload limits by setting a constraint on the total time in system.

An alternative method is to constrain the time until first contact with the case manager and/or

constrain the internal waiting time, thus trading off pre-assignment and post-assignment delay, as

in Figure 9. An experiment with these alternative constraints using the ED system shows that the

B approximation again recommends caseload limits that are always identical, or within 1 case, of

the recommendation from simulation.

10. Conclusions

We develop a stochastic model of a case-manager system. Exact analysis of this baseline Markov

chain model, which has two state variables for every case manager, is difficult because of the curse

of dimensionality. We therefore formulate more tractable bounds, corresponding stability limits,

and system approximations. The stability limits are particularly useful when setting up simulation

experiments, for they allow one to quickly identify reasonable system parameter regions.

Our numerical experiments show that, in general, the performance of our case-manager system

(with cases assigned to particular managers) is often closer in performance to that of a fully-pooled

system than to a system with fully independent case managers. Because our system is partially

pooled, however, increasing caseloads may increase or decrease delays in the system. We find

that the caseload limit that minimizes expected delay depends crucially on the ratio µ′/λ′, with

lower values of this ratio associated with higher delay-minimizing caseload limits. Analysis of the

balanced approximation, and numerical experiments, show that increasing the probability of case

:29

resolution can have a dramatically larger impact on performance than a comparable increase in

server speed.

We also found that the simple balanced approximation provides accurate performance measures

over a wide range of parameters, and is extremely robust when used to set caseload limits. Bench-

mark heuristics found in the academic literature, industry studies and industry field guides perform

quite poorly. These calculations ignore the impact of system parameters (such as the external

delay) and may recommend caseload limits that are either unreasonably high or are so low that

the system is unstable. Finally, another advantage of the balanced approximation is that it is easily

adapted to incorporate relationships between the manager’s caseload and the case completion rate,

such as the ones documented in KC (2013) and Coviello et al. (2014).

Our models ignore several important aspects of reality. Arrival processes are likely to be non-

stationary for case-managers systems like EDs and chat systems. It might be possible to predict

the workload required and the complexity of a case when it arrives, through triage. Case managers

are likely to be heterogeneous in terms of experience and expertise. Case managers can experience

burnout and they can be concerned about fairness in the allocation of cases (see, for example,

Department of Health and Safety 2011). We focus on workload but the quality of the work is also

important. All of these issues would benefit from further investigation.

References

APRI. 2002. How many cases should a prosecutor handle? Results of the National Workload Assessment

Project. Tech. rep., American Prosecutors Research Institute, Alexandria, Virginia. Available at:

www.ndaa.org/pdf/How%20Many%20Cases.pdf.

Apte, U. M., C.M. Beath, C. Goh. 1999. An analysis of the production line versus the case manager approach

to information intensive services. Decision Sciences 30(4) 1105–1129.

Bhaskaran, B.G. 1986. Almost sure comparison of birth and death processes with application to M/M/s

queueing systems. Queueing Systems 1(1) 103–127.

Coviello, D., A. Ichino, N. Persico. 2014. Time allocation and task juggling. The American Economic Review

104(2) 609–623.

CWLA. 2013. Recommended caseload standards. www.cwla.org/newsevents/news030304cwlacaseload.

htm.

de Vericourt, F., O. B. Jennings. 2011. Nurse staffing in medical units: A queueing perspective. Operations

Research 59(6) 1320–1331.

de Vericourt, F., Y-P Zhou. 2005. Managing response time in a call-routing problem with service failure.

Operations Research 53(6) 968–981.

:30

Department of Health, Social Services, Public Safety. 2011. Children’s social service: Case management

model. Tech. rep., Department of Health, Social Services, and Public Safety, Belfast, United Kingdom.

Available at: http://www.dhsspsni.gov.uk/oss-guide-rit-6-2011.pdf.

Dobson, G., T. Tezcan, V. Tilson. 2013. Optimal workflow decisions for investigators in systems with

interruptions. Management Science 59(5) 1125–1141.

Erdos, P. 1942. On an elementary proof of some asymptotic formulas in the theory of partitions. Annals of

Mathematics 43(3) 437–450.

Graff, L.G., S. Wolf, R. Dinwoodie, D. Buono, D. Mucci. 1993. Emergency physician workload: A time study.

Annals of Emergency Medicine 22(7) 1156–1163.

Green, L. V., S. Savin. 2008. Reducing delays for medical appointments: A queueing approach. Operations

Research 56(6) 1526–1538.

Gun, L. 1989. Experimental results on matrix-analytical solution techniques—extensions and comparisons.

Stochastic Models 5(4) 669–682.

Hardy, G. H., S. Ramanujan. 1918. Asymptotic formulaæ in combinatory analysis. Proceedings of the London

Mathematical Society 2(1) 75–115.

Harel, A. 1990. Convexity properties of the Erlang loss formula. Operations Research 38(3) 499–505.

Huang, J., B. Carmeli, A. Mandelbaum. 2012. Control of patient flow in emergency departments, or multiclass

queues with deadlines and feedback. Tech. rep. Submitted.

Ingolfsson, A., L. Tang. 2012. Efficient and reliable computation of birth-death process performance measures.

INFORMS Journal on Computing 24(1) 29–41.

Jackson, J. R. 1957. Networks of waiting lines. Operations Research 5(4) 518–521.

KC, D. S. 2013. Does multitasking improve performance? Evidence from the emergency department. Man-

ufacturing & Service Operations Management 16(2) 168–183.

Kimura, T. 1993. Duality between the Erlang loss system and a finite source queue. Operations Research

Letters 13(3) 169–173.

Latouche, G, V Ramaswami. 1999. Introduction to Matrix Analytic Methods in Stochastic Modeling . Society

for Industrial and Applied Mathematics, Philadelphia PA.

Lin, H-C, C.S. Raghavendra. 1996. An approximate analysis of the join the shortest queue (JSQ) policy.

IEEE Transactions on Parallel and Distributed Systems 7(3) 301–307.

Luo, J., J. Zhang. 2013. Staffing and control of instant messaging contact centers. Operations Research

61(2) 328–343.

Nathanson, M. B. 2000. Elementary methods in number theory , vol. 195. Springer.

Nelson, R.D., T.K. Philips. 1989. An approximation to the response time for shortest queue routing. Per-

formance Evaluation Review 1(1) 181–189.

:31

O’Flahavan, L. 2012. Write better chat to customers: Chat transcripts for review. E-WRITE whitepaper,

retrieved January 30, 2014, from ewriteonline.com/category/whitepapers/.

Ostrom, B., N. Kauder. 1996. Examining the work of state courts, 1995: A national perspective from the

court statistics project. Tech. Rep. NCSC Publication Number R-191, National Center for State Courts.

Ramakrishnan, M., D. Sier, P.G. Taylor. 2005. A two-time-scale model for hospital patient flow. IMA Journal

of Management Mathematics 16 197–215.

Ross, S. 1996. Stochastic Processes. 2nd ed. Wiley, New York.

Saghafian, S., W. J. Hopp, M. P. Van Oyen, J. S. Desmond, S. L. Kronick. 2012. Patient streaming as a

mechanism for improving responsiveness in emergency departments. Operations Research 60(5) 1080–

1097.

Saghafian, S., W. J. Hopp, M. P. Van Oyen, J. S. Desmond, S. L. Kronick. 2014. Complexity-augmented

triage: A tool for improving patient safety and operational efficiency. Manufacturing & Service Opera-

tions Management 16(3) 329–345.

Shi, P., M. C. Chou, J.G. Dai, D. Ding, J. Sim. 2014. Models and insights for hospital inpatient operations:

Time-dependent ed boarding time. Tech. rep., Working paper.

Smith, D. R., W. Whitt. 1981. Resource sharing for efficiency in traffic systems. Bell System Technical

Journal 60(13) 39–55.

Tezcan, T., J. Zhang. 2014. Routing and staffing in customer service chat systems with impatient customers.

Operations Research 62(4) 943–956.

Weber, R. R. 1978. On the optimal assignment of customers to parallel servers. Journal of Applied Probability

15(2) 406–413.

Whitt, W. 1986. Deciding which queue to join: Some counter examples. Operations Research 34(1) 55–62.

Yamatani, H., R. Engel, S. Spjeldnes. 2009. Child welfare worker caseload: What’s just right? Social Work

54(4) 361–368.

Yankovic, N., L. V. Green. 2011. Identifying good nursing levels: A queueing approach. Operations Research

59(4) 942–955.

Yin, G. G., Q. Zhang. 2013. Continuous-time Markov Chains and Applications: A Two-time-scale Approach,

vol. 37. Springer.

Yom-Tov, G. B. 2010. Queues in hospitals: Queueing networks with reentering customers in the QED regime.

PhD thesis, Technion - Israel Institute of Technology .

Yom-Tov, G. B., A. Mandelbaum. 2014. Erlang-R: A time-varying queue with ReEntrant customers, in

support of healthcare staffing. Manufacturing & Service Operations Management 16(2) 283–299.

:32

1/ ′

Tot

1

Λ

Λ

λλ/

1/ ′

Tot1

Λ

Λ

λ

Tot

.

.

.

λ

(a) (b)

Figure 12 Jackson network for (a) a single manager in an R system with unlimited caseload and (b) for a P

system with unlimited caseload.

Appendix A: Stability Limits for R and P Systems with M →∞

A single case manager in an R system with unlimited caseload can be represented by the two-node Jackson

network (Jackson 1957) in Fig. 12 (a). In this Jackson network flow balance requires Λ1γ = λ/N , where Λ1

is the overall arrival rate to the case manager. In order for the network to be stable, both nodes need to

be stable. The external delay node has infinitely many servers, so it will always be stable. The service node

will be stable if Λ1/µTot = λ/(Nµ)< 1⇒ λ<Nµ. The stability limit λRlim =Nµ is the rate with which cases

leave the system if the case managers are never idle. When M is infinite, a case manager’s capacity is never

reduced because of forced idleness while there are cases available to work on.

A P system with N case managers and unlimited caseload can be represented by the Jackson network in

Fig. 12 (b). In this Jackson network Λ1γ = λ. In order for the service node to be stable we need Λ1/(NµTot) =

λ/(Nµ)< 1⇒ λ<Nµ. We conclude that as M tends to infinity, the stability limits for the R and P systems

converge to the same value.

Appendix B: Proof of Theorem 1

The general QBD ergodicity condition that we use (Latouche and Ramaswami 1999, pg. 155) is that ωA0e <

ωA2e, where ω is the steady state probability vector corresponding to the transition matrix A=A0 +A1 +A2,

satisfying ωA= 0 and ωe= 1; A0, A1, and A2 are the repeating matrix blocks for the QBD.

Using the matrix blocks from (2) for the R system, ωRAR0 e < ωRAR2 e reduces to λ<Nµ(1−ωR0 ), where

ωR0 is the steady state probability of the first state in the Markov chain corresponding to matrix block

AR. Inspection of the matrix block AR reveals that it corresponds to a birth-death process. The system

can be viewed as an M/M/1/./M finite-source queueing system. With this interpretation, the sum of the

probabilities of all but the “idle server” state equals the probability that the single server in this queueing

system is busy. We refer to a collection of N such systems as Rlim, because this collection of single-server

finite-source queueing systems describes how the R system would work if the external arrival rate was

sufficiently large to ensure that all N case managers had a full caseload of M at all times. This proves (8)

for M=R and E[SRlim] as given in (9).

The proof of (8) for M=P and (10) follows the same steps. Inspection of the matrix AP in (44) reveals

that it is the transition matrix for an M/M/N/./NM system. We refer to this system as Plim and note that

:33

it corresponds to how the P system would operate if the external arrival rate was large enough to ensure

that the system had a full caseload of NM at all times. The ergodicity condition ωPAP0 e < ωPAP2 e reduces

to λ<Nµ(∑NM

i=0 mini/N,1ωPi ), where ωPi is the steady state probability of state i in the Plim system. The

summation in parentheses is the steady state expected proportion of busy servers in the Plim system.

Appendix C: Proof of Theorem 2

For t= 0 it is true that SPlim(t)≥st SRlim(t). Assume that SPlim(t)≥st S

Rlim(t) for t ∈ [0, t′] and that SPlim(t′) =

SRlim(t′) = b′ > 0. We will prove, using a coupling argument, that the desired order will continue to hold after

the next event after time t′.

If QPlim(t′) > 0, then the Plim system has one or more waiting customers, which implies that all of the

servers in that system are busy, or SPlim(t′) = SRlim(t′) =N . Therefore, an arrival to either Plim or Rlim will

not change the number of busy servers. A departure from Plim will not change Slim (because there is at least

one waiting customer in that system) and a departure from Rlim will either leave SRlim unchanged or reduce

it by one, depending on whether the server that completes service has a waiting customer or not. Thus, the

desired ordering of SPlim and SRlim is maintained regardless of what the subsequent event is.

If QPlim(t′) = 0, then it follows that QRlim(t′)≥ 0 =QPlim(t′), which implies that Plim has more customers in

external delay (NM − b′) than Rlim (NM − b′−QRlim(t′)). We have the following distributions for the time

until the next event after t′ of each type:

Next arrival to Plim after t′: aP(t′)∼ exp(NM − b′)λ′ (22)

Next arrival to Rlim after t′: aR(t′)∼ exp

[NM − b′−QRlim(t′)]λ′

(23)

Next departure from Plim after t′: dP(t′)∼ expb′µ′ (24)

Next departure from Rlim after t′: dR(t′)∼ expb′µ′ (25)

Note that immediately after t′, customers arrive to the queue in Plim at the same or a higher rate than

they arrive to a queue in Rlim. Therefore, we can couple Plim and Rlim as follows. After t′ we let Plim

run freely. If the next event after t′ in Plim is a departure, then we let a departure occur in Rlim with

probability 1. If the next event after t′ in Plim is an arrival, then we let an arrival occur in Rlim with

probability p= (NM−b′−QRlim(t′))/(NM−b′). This construction ensures the proper distributions for dR(t′)

and aR(t′) and keeps the sample path of the number of busy servers in Plim at or above the sample path of

the number of busy servers in Rlim with probability 1 at all times. Therefore, SPlim ≥st SRlim, which implies

that E[SPlim]≥E[SRlim] (Ross 1996, Lemma 9.1.1).

Appendix D: Proof of Theorem 4

Lemma 1. The Erlang loss probability B(k,a) for an M/G/s/s system with k servers and offered load a

is increasing in a for a> 0.

Proof: The derivative of B(.) with respect to ρ≡ a/k can be expressed as (Harel 1990, Expression (2)):

∂B(k,a)

∂ρ= (1− ρ+ ρB(k,a))aB(k,a). (26)

:34

Harel (1990, Lemma 1) show that 1− ρ+ ρB(k,a)> 0, which implies that ∂B(.)/∂ρ> 0. Therefore,

∂B(k,a)

∂a=∂ρ

∂a

∂B(k,a)

∂ρ=

1

k

∂B(k,a)

∂ρ> 0, (27)

which proves the lemma.

Proof of Theorem 4: Recall that µTot = µ+µ′, γ = µ/µTot, µ= γµTot, and µ′ = (1−γ)µTot. We translate

the two perturbations to (µTot, γ) mentioned in the Theorem into equivalent perturbations to (µ,µ′) and to

the ratio λ′/µ′ as follows, for y ∈ [0, µ′/µ):

Perturbation 1: (µTot, γ)→ (µTot(1 + y), γ)⇔ (µ,µ′)→ (µ(1 + y), µ′(1 + y))⇒ λ′

µ′→ λ′

µ′+µ′y, (28)

Perturbation 2: (µTot, γ)→ (µTot, γ(1 + y))⇔ (µ,µ′)→ (µ(1 + y), µ′−µy)⇒ λ′

µ′→ λ′

µ′−µy. (29)

The case-clearing rate of a server with k assigned cases, under the B approximation, is β(λ′/µ′, k)µ. Recall

that β(λ′/µ′, k) is the steady-state expected server utilization for a single-server, k-customer system. We

use the fact that β(λ′/µ′, k) decreases when the fast service rate µ′ increases, holding λ′ and k constant.

We prove this fact as follows. Using a duality result from Kimura (1993), β(λ′/µ′, k) = 1−B(k,µ′/λ′). As

µ′ increases, the offered load a= µ′/λ′ for the loss system increases, B(k,µ′/λ′) increases (Lemma 1), and

β(λ′/µ′, k) decreases.

The case-clearing rates with k assigned cases, under the two perturbations, are:

Perturbation 1: µ(1 + y)β

(λ′

µ′+µ′y, k

), (30)

Perturbation 2: µ(1 + y)β

(λ′

µ′−µy,k

). (31)

Perturbation 1 has a higher fast service rate (µ′+µ′y > µ′−µy) as long as y > 0 and all other parameters in

the case-clearing rate expressions are the same for the two perturbations, which implies that the Perturbation

1 case-clearing rate is higher (or equal, if y= 0).

The B-approximation birth-death processes under Perturbations 1 and 2 have equal birth rates and equal

or higher death rates under Perturbation 1 (see (17)). It follows (Bhaskaran 1986, Thm. 1) that LB(µTot(1+

y), γ)≥LB(µTot, γ(1 + y)) and therefore, by Little’s Law, WB(µTot(1 + y), γ)≥WB(µTot, γ(1 + y)).

A minor modification of the proof of Theorem 4 (b) in Bhaskaran (1986) shows that WBa (µTot(1 +y), γ)≥

WBa (µTot, γ(1 + y)). Specifically, we view the B-approximation birth-death process as describing an M/M/s

queue with s=NM servers and state-dependent service rates µi = di/min(i,NM). The modification involves

directly defining the inter-departure-time random variables while all NM servers are busy as exponential

random variables with mean 1/(Nβ(λ′/µ′,M)µ), rather than defining these random variables as the minimum

of s service time random variables. With this modification, the remaining steps in the proof of Theorem 4

(b) in Bhaskaran (1986) follow, keeping in mind that for the virtual waiting time, the only death rates that

matter are the constant death rates di for i≥NM .

:35

Appendices that do not appear in the paper begin here.

Appendix E: Computing steady state probabilities and performance measures forthe R and P systems

E.1. R system

The R system boundary matrix blocks are:

BR0 =λ

N

[0x,1 0x,M0M,1 IM

], BR1 =

∆ U1

L1 D1

. . .

. . .. . . UM−1

LM−1 DM−1

, BR2 = µ

[01,y−M 01,M

0M,y−M IM

], (32)

where x=(M − 1)M

2, y=

(M + 1)M

2,Un =

λ

N[0n,1|In] , Ln = µ

[01,n

In

],Dn =

∆ nλ′

µ′ ∆ (n− 1)λ′

. . .. . .

. . .

µ′ ∆ λ′

µ′ ∆

.Example: For N =M = 2, the states are:

i j la q s level phase0 0 0 0 0 0 01 0 0 1 0 0 01 1 0 1 1 0 1i 0 i− 2 2 0 i− 1 0 for i≥ 2i 1 i− 2 2 1 i− 1 1i 2 i− 2 2 1 i− 1 2

The first three states are the boundary states. The boundary matrix blocks are:

BR0 =λ

2

11

, BR1 =

∆ λ/2∆ λ′

µ µ′ ∆

, BR2 = µ

11

.The repeating matrix blocks are:

AR0 =λ

2

11

1

, AR1 =

∆ 2λ′

µ′ ∆ λ′

µ′ ∆

, AR2 = µ

11

.The vectors πR0 and πR1 are obtained from the boundary conditions

πR0 BR1 +πR1 B

R2 = 0, (33)

πR0 BR0 +πR1 A

R1 +πR2 A

R2 = 0, (34)

and the normalization condition

πR0 e+

∞∑n=1

πRn e= πR0 e+πR1

∞∑n=1

(RR)n−1e= πR0 e+πR1 (I −RR)−1e= 1, (35)

where AR0 , AR1 , and AR2 are defined in Section 4.2. Let iR0 be the column vector of the number of customers

assigned to a manager and jR0 be the column vector of the number of customers in internal queue or in

service in the boundary states. We compute system performance measures as:

Average caseload:

LRc =N

(πR0 i

R0 +

∞∑n=1

MπRn e

)=N

(πR0 i

R0 +MπR1

∞∑n=1

(RR)n−1e

)=N

(πR0 i

R0 +MπR1 (I −RR)−1e

). (36)

:36

Average internal queue length:

LRq =N

πR0 (jR0 − e)+ +

∞∑n=1

πRn

001...

M − 1

=N

πR0 (jR0 − e)+ +πR1 (I −RR)−1

001...

M − 1

. (37)

Average number of busy servers:

SR =N

πR0 minjR0 ,1+

∞∑n=1

πRn

01...1

=N

πR0 minjR0 ,1+πR1 (I −RR)−1

01...1

. (38)

Average number of cases in external delay:

LRe =LRc −LRq −SR. (39)

Average pre-assignment queue length:

LRa =N

∞∑n=1

(n− 1)πRn e=NπR1 RR(I −RR)−2e. (40)

Average pre-assignment wait, average internal wait, and average total system time:

WRa =

LRaλ,WR

q =LRqλ,TR =

LRc +LRaλ

, (41)

from Little’s Law.

E.2. P system

The P system boundary matrix blocks BP0 , BP1 , and BP2 are:

BP0 = λ

[0x,1 0x,NM

0NM,1 INM

],BP1 =

∆ U1

L1 D1

. . .

. . .. . . UNM−1

LNM−1 DNM−1

,BP2 = µ

[01,x 01,NM

0NM,x C(NM,N)

], (42)

where

x = (NM − 1)NM/2

C(v,w) =

min1,w

min2,w. . .

minv,w

Un = λ [0n,1|In]

Ln = µ

[01,n

C(n,N)

]

Dn =

∆ nλ′

min1,Nµ′ ∆ (n− 1)λ′

min2,Nµ′. . .

. . .

. . . ∆ λ′

minn,Nµ′ ∆

.

:37

The repeating matrix blocks are:

AP0 = λI,AP1 =

∆ NMλ′

µ′ ∆ (NM − 1)λ′

. . .. . .

. . .

Nµ′ ∆ (N − 1)Mλ′

. . .. . .

. . .

Nµ′ ∆ λ′

Nµ′ ∆

,AP2 = µ

01

. . .

N. . .

NN

(43)

AP =AP0 +AP1 +AP2 =

∆ NMλ′

µ′ ∆ (NM − 1)λ′

. . .. . .

. . .

Nµ′ ∆ (N − 1)Mλ′

. . .. . .

. . .

Nµ′ ∆ λ′

Nµ′ ∆

(44)

Example: For N =M = 2, the states are:

i j la q s level phase0 0 0 0 0 0 01 0 0 1 0 0 01 1 0 1 1 0 12 0 0 2 0 0 02 1 0 2 1 0 12 2 0 2 2 0 23 0 0 2 0 0 03 1 0 3 1 0 13 2 0 3 2 0 23 3 0 3 3 0 3i 0 i− 4 4 0 i− 3 0 for i≥ 4i 1 i− 4 4 1 i− 3 1i 2 i− 4 4 2 i− 3 2i 3 i− 4 4 3 i− 3 3i 4 i− 4 4 4 i− 3 4

The first ten states are the boundary states. The boundary matrix blocks are as follows. We use horizontal

and vertical lines to separate values of i, to clarify the structure of the blocks.

BP0 = λ

1

11

1

, BP1 =

∆ λ∆ λ′ λ

µ µ′ ∆ λ∆ 2λ′ λ

µ µ′ ∆ λ′ λ2µ 2µ′ ∆ λ

∆ 3λ′

µ µ′ ∆ 2λ′

2µ 2µ′ ∆ λ′

2µ 2µ′ ∆

,BP2 = µ

12

22

.

The repeating matrix blocks are:

AP0 = λ

1

11

11

, AP1 =

∆ 4λ′

µ′ ∆ 3λ′

2µ′ ∆ 2λ′

2µ′ ∆ λ′

2µ′ ∆

, AP2 = µ

12

22

.

:38

The vectors πP0 and πP1 can be obtained from the boundary conditions (33)-(34) and the normalization

condition (35), with R replaced by P. Let iP0 be the column vector of the total caseloads and jP0 be the

column vector of the number of customers in internal queue or in service in the boundary states. We compute

the system performance measures as:

Average caseload:

LPc = πP0 iP0 +

∞∑n=1

NMπPn e= πP0 iP0 +NMπP1 (I −RP)−1e. (45)

Average internal queue length:

LPq = πP0 maxjP0 − e,0+

∞∑n=1

πPn

001...

NM − 1

= πP0 maxjP0 − e,0+πP1 (I −RP)−1

001...

NM − 1

. (46)

Average number of busy servers:

SP = πP0 minjP0 ,N+

∞∑n=1

πPn

0...N...N

= πP0 minjP0 ,N+πP1 (I −RP)−1

0...N...N

(47)

Average number of cases in external delay:

LPe =LPc −LPq −SP (48)

Average length of pre-assignment queue:

LPa =

∞∑n=1

(n− 1)πPn e= π1RP(I −RP)−2e (49)

Average pre-assignment wait, average internal wait, and average total system time:

WPa =

LPaλ,WP

q =LPqλ,TP =

LPc +LPaλ

, (50)

from Little’s Law.

Appendix F: Formulating the S system as a Markov chain

Recall that the S system state variables are i, the total number of customers in the system; ku, the caseload

of case manager u= 1, . . . ,N ; and ju, the internal queue length, including the customer in service if any, of

case manager u= 1, . . . ,N . In this section, we provide details on the different ways in which we formulate the

S system as a Markov chain. First, we let the level ` equal 0 for i <NM and i−NM + 1 for i≥NM ; and

we let the phase, p, equal the vector of caseload and internal queue length variables, (k1, . . . , kN , j1, . . . , jN).

This formulation results in a QBD process. We solve this QBD numerically for N = 2 (and various values of

M), as discussed in Section F.1. We present an algorithm to determine the stability limit, λSlim, for any value

of N and M , as discussed in Section F.2. Second, we redefine the level, `, to be the vector (i, k1, . . . , kN),

and the phase, p, to be the vector (j1, . . . , jN), and we impose the ordering assumptions that we discussed

in Section 6. This formulation is not a QBD process but it helps us develop the T and B approximations.

We provide details regarding this formulation in Section F.3

:39

F.1. S System as a QBD for N =M = 2

The possible transitions are:

Arrival of a new case:

(l, k1, k2, j1, j2)→

(l+ 1, k1 + 1, k2, j1 + 1, j2), when k1 <k2 ≤M (rate λ) or k1 = k2 <M (rate λ/2)(l+ 1, k1, k2 + 1, j1, j2 + 1), when k2 <k1 ≤M (rate λ) or k1 = k2 <M (rate λ/2)(l+ 1, k1, k2, j1, j2), when k1, k2 ≥M (rate λ)

(51)

Service completion that results in case completion:

(l, k1, k2, j1, j2)→

(l− 1, k1− 1, k2, j1− 1, j2), when j1 > 0 and l≤ 2M (rate µ)(l− 1, k1, k2− 1, j1, j2− 1), when j2 > 0 and l≤ 2M (rate µ)(l− 1, k1, k2, j1, j2), when j1, j2 > 0 and l > 2M (rate 2µ),

or minj1, j2= 0, maxj1, j2> 0, and l > 2M (rate µ)(52)

Service completion that does not result in case completion:

(l, k1, k2, j1, j2)→

(l, k1, k2, j1− 1, j2), when j1 > 0 (rate µ′)(l, k1, k2, j1, j2− 1), when j2 > 0 (rate µ′)

(53)

Completion of external delay:

(l, k1, k2, j1, j2)→

(l, k1, k2, j1 + 1, j2), when [k1− j1]> 0 (rate [k1− j1]λ′)(l, k1, k2, j1, j2 + 1), when [k2− j2]> 0 (rate [k2− j2]λ′)

(54)

The states are:i k1 k2 j1 j2 level phase0 0 0 0 0 0 00001 1 0 0 0 0 10001 1 0 1 0 0 10101 0 1 0 0 0 01001 0 1 0 1 0 01012 1 1 0 0 0 11002 1 1 0 1 0 11012 1 1 1 0 0 11112 1 1 1 1 0 11112 2 0 0 0 0 20002 2 0 1 0 0 20102 2 0 2 0 0 20202 0 2 0 0 0 02002 0 2 0 1 0 02012 0 2 0 2 0 02023 2 1 0 0 0 21003 2 1 0 1 0 21013 2 1 1 0 0 21103 2 1 1 1 0 21113 2 1 2 0 0 21203 2 1 2 1 0 21213 1 2 0 0 0 12003 1 2 0 1 0 12013 1 2 0 2 0 12023 1 2 1 0 0 12103 1 2 1 1 0 12113 1 2 1 2 0 1212i 2 2 0 0 i− 3 2200 for i≥ 4i 2 2 0 1 i− 3 2201i 2 2 0 2 i− 3 2202i 2 2 1 0 i− 3 2210i 2 2 1 1 i− 3 2211i 2 2 1 2 i− 3 2212i 2 2 2 0 i− 3 2220i 2 2 2 1 i− 3 2221i 2 2 2 2 i− 3 2222. . .

The first 27 states are the boundary states. The boundary matrix blocks are as follows. We use vertical

and horizontal lines to separate values of i, and, for i≥ 4, to separate values of j1—in order to clarify the

:40

structure of the matrix blocks.

BS0 = λ

11

11

11

11

11

11

, BS1 =

∆ λ/2 λ/2

∆ λ′ λµ µ′ ∆ λ

∆ λ′ λµ µ′ ∆ λ

∆ λ′ λ′ λ/2 λ/2µ µ′ ∆ λ′ λ/2 λ/2

µ µ′ ∆ λ′ λ/2 λ/2µ µ µ′ µ′ ∆ λ/2 λ/2

∆ 2λ′ λµ µ′ ∆ λ′ λ

µ µ′ ∆ λ∆ 2λ′ λ

µ µ′ ∆ λ′ λµ µ′ ∆ λ

∆ λ′ 2λ′

µ µ′ ∆ 2λ′

µ µ′ ∆ λ′ λ′

µ µ µ′ µ′ ∆ λ′

µ µ′ ∆ λ′

µ µ µ′ µ′ ∆∆ 2λ′ λ′

µ µ′ ∆ λ′ λ′

µ µ′ ∆ λ′

µ µ′ ∆ 2λ′

µ µ µ′ µ′ ∆ λ′

µ µ µ′ µ′ ∆

,

BS2 = µ

1

11

1 11 1

11 1

1 1

.The repeating matrix blocks are:

AS0 = λ

1

11

11

11

11

, AS1 =

∆ 2λ′ 2λ′

µ′ ∆ λ′ 2λ′

µ′ ∆ 2λ′

µ′ ∆ 2λ′ λ′

µ′ µ′ ∆ λ′ λ′

µ′ µ′ ∆ λ′

µ′ ∆ 2λ′

µ′ µ′ ∆ λ′

µ′ µ′ ∆

, AS2 = µ

1

11

22

12

2

.F.2. Numerical Evaluation of Stability Limits for the S System

We focus on the part of the state space where all case managers are at full caseload (k1 = · · ·= kN =M). In

that part of the state space, we set the level to `= la+ 1 and we set the phase to p= (j1, . . . , jN)—the vector

of internal queue lengths, including the customer in service, if any. We ignore the caseload state variables

because they are constant. The possible transitions are:

A1: New case arrival: (`, p)→ (`+ 1, p), at rate λ.

S1: Service completion that results in a case completion: (`, p)→ (`− 1, p), at rate∑N

u=1 min(ju,1)µ.

A2: External delay completion: (`, p)→ (`, p+ eu), at rate (M − ju)λ′.

S2: Service completion by case manager u that does not result in a case completion: (`, p)→ (`, p− eu),

at rate min(ju,1)µ′

We would like to develop an algorithm to generate the repeating matrix blocks, AS0 ,AS1 ,A

S2 ,A

S for any

N ≥ 1, so that we can determine the stability condition for S. Transition type A1 is associated with matrix

block AS0 (` increases by one), transition type S1 is associated with matrix block AS2 (` decreases by one), and

transition types (A2, S2) are associated with matrix block AS1 (` does not change). The stability condition

that we want to use is the following (Latouche and Ramaswami 1999, Theorem 7.2.3): ωAS0 e < ωAS2 e, where

ω solves ωAS = 0, ωe= 1.

:41

The A1 and S1 transitions do not change the phase, which implies that AS0 and AS2 are diagonal matrices.

The A1 transition rate is the same in all phases, which means that AS0 = λI. Therefore, ωAS0 e = ωλIe =

λωe= λ. The S1 transition rate depends on the phase—in particular, it depends on how many case managers

are busy in phase p. Therefore, we can write AS2 = µC2 where C2 is a diagonal matrix with diagonal entries

of the form∑N

u=1 min(ju,1). It follows that ωAS2 = µωC2e. The vector C2e contains the number of busy case

managers in each phase. With these simplifications, the stability condition becomes

λ< λSlim = µωC2e (55)

Given that AS0 and AS2 are diagonal matrices, we see that the off-diagonal entries in AS = AS0 +AS1 +AS2

depend only on the off-diagonal entries in AS1 , and those entries involve only the fast rates (λ′ and µ′), not

the slow rates (λ and µ). Thus, ω depends only on λ′ and µ′. Since ω is the steady-state distribution for the

Markov chain induced by the infinitesimal generator AS , whose parameters depend only on λ′ and µ′, we

can set µ′ = 1 without loss of generality, or equivalently, ω depends only on the ratio r = λ′/µ′, not on the

separate values of λ′ and µ′.

The algorithm to compute λSlim is as follows.

1. Generate a list of phases p= (j1, . . . , jN) by allowing each ju to take on all integer values from 0 to M .

The resulting list has (M + 1)N entries.

2. Sum the entries in each phase to obtain the vector C2e of the number of busy case managers.

3. Generate the matrix AS by looping through the list of phases and generating entries corresponding to

all A2 and S2 transitions that can occur from the current phase and then adding diagonal entries that ensure

zero row sums. AS is a square matrix of order (M + 1)N with at most 2N + 1 nonzero entries per row.

4. Solve ωAS = 0, ωe= 1 to obtain ω.

5. Calculate λSlim as µωC2e.

We can interpret the expression µωC2e similarly to the expressions in Theorem 1. First, we have the case

completion rate µ. Second, we have the inner product of the probability vector ω and the vector C2e of the

number of busy case managers. This inner product can be viewed as an expected value of a random variable

whose values correspond to the entries in C2e and hence, ωC2e can be interpreted as the expected number

of busy case managers, or E[SSlim].

The one thing we have not been able to do is to analytically characterize the distribution ω as corresponding

to a collection of N independent single-server M -customer finite-source systems, but we conjecture this to

be the case.

Comparing Expression (8) for λRlim and Expression (55) for λSlim, we see that to numerically check Conjec-

ture 3, we can ignore µ and simply check the following: Does N times the steady-state server busy probability

in a single-server M -customer finite-source system with parameter r equal ωC2e? The latter expression

depends only on N , M , and r. We have done this comparison numerically for a set of experiments where

N ranges from 1 to 5, M ranges from 1 to 5, and r takes the values 0.01, 0.1, 1, 10, and 100. In all cases,

|λSlim−λBlim|< 10−13.

:42

F.3. Possible Transitions for the T Approximation Reformulation of the S System

We define three operators in order to simplify the listing of possible transitions: Operator g, which sorts

the case manager state variables k1 . . . kNj1 . . . jN for the case managers, first by caseload, and second, by

number in the internal queue in service and operators c(v) and d(v) for the positions in the state vector of

the state variables kv and jv for a case manager v. This allows us to increase (or decrease) the value of these

state variables by adding (or subtracting) the unit vectors ec(v) and ed(v).

The possible transitions are:

(Slow) New case arrival that is immediately assigned: If one or more case managers are below full caseload

(i <NM) then let A be the set of case managers v with kv = min(k1, . . . , kN). Then we have transitions from

m→ g(m+ e1 + ec(v) + ed(v)) with rate λ/|A| for each v ∈A. That is, we break ties for the lowest caseload

randomly.

(Slow) New case arrival that must wait for assignment: If all case managers are at full caseload (i≥NM)

then we have transitions from m→ g(m+ e1) with rate λ.

(Slow) Case completion when no cases are waiting for assignment: If the pre-assignment queue is empty

(i≤NM) then for each busy case manager v (where jv > 0), we have transitions m→ g(m− ec(v) − ed(v))

with rate µ.

(Slow) Case completion when some cases are waiting for assignment: If the pre-assignment queue is nonempty

(i >NM) then for each busy case manager v (where jv > 0), we have transitions m→ g(m−e1) with rate µ.

(Fast) Service completion that does not result in case completion: For each busy case manager v (where

jv > 0), we have transitions m→ g(m− ed(v)) with rate µ′.

(Fast) Completion of external delay: For every case manager v, we have transitions m→ g(m+ ed(v)) with

rate (kv − jv)λ′.

Appendix G: Pseudo Code for Generating Q

1. Generate the levels matrix L =

i0 k01 . . . k0

N

......

. . ....

inL knL1 . . . knL

N

, where nL + 1 is the number of levels for which

i≤NM + 1.

2. Initialize matrix Q← 0(nL+1)×(nL+1).

3. For each level ` from 0 to nL do

kmin←minL(`, j)|2≤ j ≤N + 1 (Initialize minimum caseload among managers in level `)

C← 01×(M+1) (Initialize vector to store the number of managers with caseloads 0,1, . . . ,M .)

For each case manager v:

If L(`, v+ 1) = kmin, then nmin← v (Identify nmin, manager that would receive an arriving case)

C(L(`, v+ 1) + 1)←C(L(`, v+ 1) + 1) + 1 (Count managers with caseloads 0, . . . ,M)

Return

For each y from 0 to nL do

If L(y, ·) =L(`, ·) + e1 + enmin+1, then Q(`, y))← λ

:43

For each v from 1 to N do

If L(y, ·) =L(`, ·)− e1− ev+1, then

Q(`, y)←C(L(`, v+ 1) + 1)β(λ′/µ′,L(`, v+ 1))µ

Return

Return

Return

Return

Q(nL, nL− 1)← 1

Q(nL− 1, nL)← 1/(1−x), where x= λ/(Nβ(λ′/µ′,M)µ)

For each ` from 0 to nL do

Q(`, `)←−∑nL

j=1Q(`, j)

Return

Appendix H: Cardinality of the T Approximation State Space

The level is `= (i, k1, . . . , kN). Given the constraints∑N

u=1 ku = i,0≤ k1 ≤ · · · ≤ kN ≤M , how many levels are

there for a given i < NM? Call the number of levels n(i). Finding n(i) is related to the problem of finding

the number p(i) of partitions of an integer i. A partition of an integer is a way to express the integer as the

sum of integers (Nathanson 2000). The number of levels n(i) is the number of partitions of i, restricted to

have at most N parts where each part is at most M . Therefore, n(i)≤ p(i). Here is one approximate formula

for p(i) (Hardy and Ramanujan 1918, Erdos 1942):

p(i) =1

4√

3i

(exp

(π√

2/3))√i

(56)

This means that p(i), and therefore n(i), grows subexponentially with i. The total number of states in the

T approximation, not counting the geometric tail, is bounded by:

n(0) + ...+n(NM)≤ p(0) + ...+ p(NM)≤ (NM + 1)p(NM)

≤ (NM + 1)1

4√

3NM

(exp

(π√

2/3))√NM

(57)

which grows subexponentially with N (and with M). This upper bound is far from tight, as Figure 13

illustrates. As an example, for M =N = 10, the number of levels is 1.8×105 and the upper bound is 2×1010.

Appendix I: B Approximation Balancing Mechanism

The following is one mechanism that ensures that caseloads for any two case managers differ by at most

one case. If the pre-assignment queue is empty and case manager u completes a job and is left with ku jobs,

compare ku with the caseload of the case manager v with the largest number of cases, uv. If kv − ku > 1,

then move one case from case manager v to case manager u. If the pre-assignment queue is occupied when

a case manager completes a job, then she pulls a case from the pre-assignment queue. If a new case arrives

and finds the pre-assignment queue empty, then assign the case to a server with the smallest caseload. If all

caseloads ku =M , then an arriving case waits in the pre-assignment queue.

:44

1.E+001.E+011.E+021.E+031.E+041.E+051.E+061.E+071.E+081.E+091.E+101.E+11

1 2 3 4 5 6 7 8 9 10N

Number of levels

Upper bound

Figure 13 Number of levels, for i <NM , for the T approximation with M = 10, as a function of N .

Appendix J: Impact of ε on T Approximation Accuracy when N =M = 1

If N = M = 1, then we can view the S system as an M/G/1 queue. The “service” of each case consists

of an alternating sequence of processing steps and external delays, until the case is completed. The next

case does not get access to the server until the complete sequence of steps are completed for the currently

assigned case. If we can compute the mean and variance of the M/G/1 service time then we can use the

Pollaczek-Khinchine formula to obtain the average pre-assignment delay.

Let S = P0 + E1 + P1 + · · · + En + PN be the M/G/1 service time random variable, that is, the total

time that a case ties up the server, where the Pis are the processing durations and the Eis are the external

delay durations. The number of external delays, N , is geometrically distributed with mean E[N ] = (1−γ)/γ

and variance var[N ] = (1 − γ)/γ2. Straightforward algebra reveals that E[S] = (1/µ)(1 + µ′/λ′), var[S] =

(1/µ2)((λ′)2 + 2µ′(λ′+µ) + (µ′)2)/(λ′)2, and the squared coefficient of variation is SCV[S] = 1 + 2µµ′/(λ′+

µ′)2. The mean service time depends only on the ratio of the fast transition rates, but the SCV increases as

the fast transition rates (λ′ and µ′) decrease together by the same factor.

To focus on the impact of the speed of the fast system, let r= λ′/µ′, λ′ = r/ε, and µ′ = 1/ε. Then:

E[S] =1

µ

(1 +

1

r

)(58)

SCV[S] = 1 +2µε

(r+ 1)2(59)

Holding r constant, we see that SCV[S]→ 1 as ε→ 0 and SCV[S]→∞ as ε→∞.

One form of the Pollaczek-Khinchine formula is:

Wa =1 + SCV[S]

2× λ(E[S])2

1−λE[S]=

(1 +

µε

(r+ 1)2

)×

λµ

(1 + 1

r

)2µ−λ

(1 + 1

r

) . (60)

We see that the average pre-assignment wait increases linearly with ε. Even though the analysis assumes

N =M = 1, this finding is consistent with the N =M = 2 graph in the left panel of Figure 6.

When N =M = 1, the T and B approximations coincide, because there is only one server, so balancing is

not an issue. Both approximations reduce to an M/M/1 queue, with the following service rate:

µ×Prserver busy in 1-server 1-customer finite-source system= µ× r

r+ 1(61)

:45

Using this service rate in the M/M/1 delay formula, we obtain the Wa value corresponding to ε= 0 in (60).

This confirms what we already knew from Theorem 3, that the T approximation is exact in the limit as

ε→ 0 when N =M = 1, and shows that the same is true of the B approximation in this special case.

Appendix K: Parameters for Emergency Department Base Case

Here we describe how parameters for a case-manager model of an Emergency Department (ED) may be

inferred from partial data. In practice, administrative data and observational studies for case-manager sys-

tems may not capture sufficient information for direct estimation of all system parameters (M , N , λ, λ′,

µTot, and γ). For example, in an ED, administrative data might track a patient’s total length of stay (LOS)

and the times of consultations with physicians but might not include information about when a patient’s

external delay (a diagnostic imaging test, for example) ends and internal delay (waiting for a consultation

with the assigned physician) begins. In this appendix, we illustrate how one might address these difficulties.

We use information from a time study of emergency physician workload by Graff et al. (1993). We view

physicians as case managers. Graff et al. (1993) studied how physician service time varies with patient service

category, length of stay, and intensity of service. The physicians in their study (from a university-affiliated

community teaching hospital) recorded the beginning and ending times of each interaction with a patient,

as well as the LOS—the time between patient registration in the ED and patient release.

Table 7 lists statistics from Graff et al. for five patient types. The aggregate patient averages in Table 7

permit direct estimation of the average number of processing steps and the average service time per processing

step, as follows:

Average number of processing steps =1

γ= 1.86 ⇒ γ = 0.54 (62)

Average physician service time =1

µTot

=total service time

average number of steps=

0.32 hrs.

1.86(63)

= 0.17 hrs. = 10.3 minutes ⇒ µTot = 5.91/hr.

Table 7 Data from Graff et al. (1993). All times are in hours

Patient type Number Avg. service Avg. # of γ LOS Avg. # of ext. T −Tstime (Ts) steps (1/γ) (T ) delays (Ne)

Nonselected 514 0.40 2.20 0.45 2.17 1.20 1.76Walk-in 637 0.16 1.30 0.77 0.98 0.30 0.82

Obs. 52 0.93 6.30 0.16 12.41 5.30 11.48Lac. repair 102 0.42 1.10 0.91 1.60 0.10 1.18

Critical 42 0.53 2.60 0.38 2.92 1.60 2.39Total 1347

Wtd. avg. 0.32 1.86 0.54 1.98 0.86 1.67

The data do not allow direct estimation of the external arrival rate (λ) and the average external delay

(1/λ′). We can use the S model, however, to determine values for (λ′, λ) that are consistent with the 1.98-hour

average total LOS from Graff et al. We decompose the total LOS as follows:

Total LOS = Pre-assignment delay + internal delay + service time + external delay (64)

= 1.98 hours.

:46

Figure 14 Contour of cases satisfying (65) along with the stability limits

.

After substituting direct estimates for the average total LOS and the average service time, we are left with

Pre-assignment delay + internal delay + external delay =Wa(λ,λ′) +Wq(λ,λ

′) +Te(λ,λ′) (65)

= 1.67 hours.

We can use the S model to identify (λ′, λ) pairs that satisfy (65) and are, therefore, consistent with the data in

Graff et al. (1993), but first we must set base-case values for N and M . We assume N = 3 physicians (typical

for a small to medium-sized ED) with a maximum caseload of M = 5 patients (based on the empirical study

by KC (2013), which found that when caseloads climb above 5, physician performance declined significantly).

After fixing N , M , µTot, and γ, we first varied λ′ and computed the stability limits for the R and Psystems, as shown in Figure 14. Then we simulated the S system for several (λ′, λ) pairs that fell within

the R system stability region. Figure 14 shows several such pairs that satisfy (65), up to simulation error.

These pairs form an approximate contour along which (65) is satisfied, and this contour lies entirely within

the R system stability region. The complete set of values corresponding to the (λ′, λ) pair that we chose

for our base case are λ= 8.6/hour, λ′ = 1.8/hour, µTot = 5.91/hour, γ = 0.54, M = 5, and N = 3. With the

S model, these values result in a physician utilization of 90%, average pre-assignment wait of 0.58 hours,

average internal wait of 0.61 hours, and average external delay of 0.47 hours—values that appear plausible

for an ED.

Appendix L: Additional Tests for B and T Approximation Accuracy

This section describes a series of experiments to test the accuracy of the B and T approximations and to

assess the B approximation’s usefulness for establishing caseload limits.

L.1. Approximation Accuracy for Variations from the Base Case

Figure 15 shows the relative accuracy of Wa generated from the B approximation, as compared to the

simulation mean, for the three base cases and for 6 variations from the base-case systems. The 6 variations

were chosen to define a wide range of slow/fast ratios (the fraction (λ+Nµ)/[N(µ′+Mλ′)]) and µ′/λ′. The

Figure shows that if both ratios are large, approximation accuracy declines significantly. The T approximation

produces the same pattern.

:47

Figure 15 Bubble width and labels are % absolute difference between the Wa from simulation and Wa from

the B approximation

L.2. Approximation Accuracy and Caseload Setting for Systems with N = 1

These experiments are similar to the N = 2 Experiments in Section 7, but with N = 1. We set µ′ = 1, and

µ′/λ′ = 0.1,1,5,10. We also choose values of µ and λ so that the customer load λ/µ= 0.7,0.9,0.95 and so

that the slow/fast ratio varied from near 0 to above 10. Finally, for each combination of [µ′, λ′, µ,λ] we set

the caseload limit M . Let MPlim be the smallest caseload limits for which a pooled system is stable. We set

M =MPlim +X for X = 1,3. This produces 96 experiments. Note that for N = 1, the R, S, and P systems

are identical. Table 8 summarizes results for the B approximation’s accuracy for Wa, where “s-f ratio” refers

to the slow/fast ratio.

Table 8 Results from the N = 1 Experiments: Mean (Max) B system % absolute approximation error for Wa


µ′/λ′ 0.1 1 5 10 0.1 1 5 10

0− 1 < .00(< .00) .01(.02) .3(.9) .7(2.1) < .00(< .01) 0.3(1.1) 3.7(17) 7.0(16)

s-f

rati

o

1− 5 < .00(< .00) .04(.10) 2.4(4.3) 6.0(10) .01(.03) 2.8(5.5) 19(21) 37(47)

5− 11 < .00(< .00) .21(.21) 4.6(8.2) 11(18) .06(.06) 5.3(10.4) 51(67) 69(78)

In general, the approximation is more accurate here than in the N = 2 Experiments. This may be due to

the fact that the average values for Wa here are larger, and the approximation tends to be less accurate for

small waiting times. The overall pattern of the results is similar to the pattern from the N = 2 Experiments:

the approximation is highly accurate if either µ′/λ′ or the slow-fast ratio is small but deteriorates if both

ratios are large.

We also tested whether the B approximation could be used to set accurate caseloads. As in Section 9,

for each experiment we identified MS10%, the smallest caseload limit such that the average total wait in the

S system is at most 10% above the minimum total delay achieved at Mmin. Because with N = 1 there is

:48

Figure 16 Recommended caseloads from the S model (MS10%) versus caseload limits from the B approximation

and the benchmark heuristics

no pre-assignment pooling, Mmin =∞ in all cases. Therefore, we searched for MS10% by beginning at MPlim

and increasing M until the average total delay is less than 1.1Wmin. We use the B approximation to find

the analogous caseload limit, MB10%, and we compute caseload limits using the three benchmark heuristics:

MDet,MDet,80, and MServ. Table 9 shows that, for these experiments, the B approximation is far superior

to the heuristics for setting caseloads. For 83% of the experiments, the approximation identified the optimal

caseload, and for the other 17% of experiments it undershot the optimal caseload by just 1 case. The best

heuristic, MServ, only identified the optimal caseload 17% of the time, and at times was 8 cases too low and

up to 81 cases too high.

Table 9 Accuracy of caseload limit-setting methods

Method: MB10% MDet MDet,80 MServ

% cases M =MS10% 83 10 6 17

Mean |M −MS10%| 0.2 19.8 15.4 9.0

Min (M −MS10%) -1 -6 -9 -8

Max (M −MS10%) 0 125 97 81

Figure 17 shows that the B approximation is accurate over a wide range of caseloads: note the clustering

of the B-approximation caseload limit recommendations on the diagonal. The recommendations using the

heuristics, however, are usually far off of the diagonal.

L.3. Further Information for Caseload Setting for Systems with N = 2

Table 10 provides further information about caseload setting for the N = 2 experiments.

L.4. Approximation Accuracy and Caseload Setting for Systems with N = 3

We also tested the B approximation over 105 experiments, all with N = 3. For each experiment we ran

simulation experiments to identify MS10% and used the B approximation to find MB10%. We also used the

:49

Table 10 More information about caseload setting experiments for N = 2. The case worker utilization is

λ/(Nµ). The parameter µ′ is fixed at 1. UNS = unstable.Caseload limits Total wait under each caseload limit

λ′ γ λ µ utilization MS10% MB

10% MDet MDet,80 MServ MS10% MB

10% MDet MDet,80 MServ

10 0.55 1.73 1.24 0.7 2 2 2 1 2 0.89 0.89 0.89 UNS 0.8910 0.93 17.29 12.35 0.7 2 2 3 2 2 0.09 0.09 0.09 0.09 0.0910 0.98 86.47 61.76 0.7 2 2 8 6 2 0.02 0.02 0.02 0.02 0.0210 0.99 172.94 123.53 0.7 2 2 14 11 2 0.01 0.01 0.01 0.01 0.0110 0.53 1.99 1.11 0.9 2 2 2 1 2 4.18 4.18 4.18 UNS 4.1810 0.92 19.89 11.05 0.9 2 2 3 2 2 0.42 0.42 0.4 0.42 0.4210 0.98 99.47 55.26 0.9 2 2 7 6 2 0.08 0.08 0.08 0.08 0.0810 0.99 198.95 110.53 0.9 2 2 13 10 2 0.04 0.04 0.04 0.04 0.0410 0.52 2.05 1.08 0.95 3 3 2 1 2 8.81 8.81 UNS UNS UNS10 0.92 20.46 10.77 0.95 3 3 3 2 2 0.88 0.88 0.88 UNS UNS10 0.98 102.31 53.85 0.95 3 3 7 6 2 0.18 0.18 0.18 0.18 UNS10 0.99 204.62 107.69 0.95 3 3 12 10 2 0.09 0.09 0.09 0.09 UNS1 0.19 0.33 0.24 0.7 4 4 3 2 7 5.51 5.51 6.32 UNS 5.441 0.7 3.29 2.35 0.7 4 4 5 4 3 0.55 0.55 0.54 0.55 0.651 0.92 16.47 11.76 0.7 4 4 14 12 3 0.11 0.11 0.11 0.11 0.141 0.96 32.94 23.53 0.7 4 4 26 21 3 0.06 0.06 0.05 0.05 0.081 0.17 0.38 0.21 0.9 5 5 3 2 7 22.86 22.86 UNS UNS 22.371 0.68 3.79 2.11 0.9 5 5 5 4 3 2.29 2.29 2.29 2.61 UNS1 0.91 18.95 10.53 0.9 5 5 13 11 3 0.46 0.46 0.45 0.45 UNS1 0.95 37.89 21.05 0.9 5 5 24 19 3 0.23 0.23 0.22 0.22 UNS1 0.17 0.39 0.21 0.95 5 5 3 2 7 50.31 50.31 UNS UNS 47.511 0.67 3.9 2.05 0.95 5 5 5 4 3 5.04 5.04 5.04 6.88 UNS1 0.91 19.49 10.26 0.95 5 5 13 10 3 1.01 1.01 0.95 0.95 UNS1 0.95 38.97 20.51 0.95 5 5 23 19 3 0.51 0.51 0.47 0.47 UNS

0.2 0.12 0.2 0.14 0.7 8 8 7 6 42 12 12 14.38 24.86 11.110.2 0.59 1.98 1.41 0.7 8 8 14 11 10 1.21 1.21 1.1 1.1 1.110.2 0.88 9.88 7.06 0.7 9 8 42 34 7 0.23 0.25 0.22 0.22 0.370.2 0.93 19.76 14.12 0.7 9 8 77 62 7 0.11 0.13 0.11 0.11 0.230.2 0.11 0.23 0.13 0.9 11 11 7 6 46 43.29 43.29 UNS UNS 41.050.2 0.56 2.27 1.26 0.9 11 11 13 10 10 4.33 4.33 4.12 4.78 4.780.2 0.86 11.37 6.32 0.9 11 11 38 31 7 0.87 0.87 0.82 0.82 UNS0.2 0.93 22.74 12.63 0.9 11 11 70 56 7 0.44 0.44 0.41 0.41 UNS0.2 0.11 0.23 0.12 0.95 12 12 7 6 47 88.52 88.52 UNS UNS 83.520.2 0.55 2.34 1.23 0.95 12 12 13 10 11 8.86 8.86 8.52 12.94 9.820.2 0.86 11.69 6.15 0.95 12 12 37 30 7 1.78 1.78 1.66 1.66 UNS0.2 0.92 23.38 12.31 0.95 12 12 68 55 7 0.9 0.9 0.83 0.83 UNS0.1 0.11 0.18 0.13 0.7 13 13 13 10 89 14.2 14.2 14.2 33.8 13.390.1 0.56 1.81 1.29 0.7 13 13 24 20 19 1.42 1.42 1.33 1.33 1.330.1 0.87 9.06 6.47 0.7 14 13 76 61 13 0.27 0.3 0.26 0.26 0.30.1 0.93 18.12 12.94 0.7 14 13 141 113 12 0.14 0.16 0.13 0.13 0.220.1 0.1 0.21 0.12 0.9 17 17 13 10 98 51.36 51.36 271.74 UNS 47.990.1 0.54 2.08 1.16 0.9 17 17 23 19 20 5.14 5.14 4.78 4.83 4.780.1 0.85 10.42 5.79 0.9 17 17 69 56 13 1.05 1.05 0.95 0.95 9.60.1 0.92 20.84 11.58 0.9 18 17 127 102 12 0.5 0.54 0.48 0.48 UNS0.1 0.1 0.21 0.11 0.95 19 19 13 10 100 100.29 100.29 UNS UNS 95.160.1 0.53 2.14 1.13 0.95 19 19 23 18 20 10.03 10.03 9.5 10.72 9.710.1 0.85 10.72 5.64 0.95 19 19 68 54 13 2.02 2.02 1.9 1.9 UNS0.1 0.92 21.44 11.28 0.95 19 19 124 100 12 1.02 1.02 0.95 0.95 UNS

deterministic heuristic and the pooled-system stability limit (MPlim, when this caseload limit resulted in a

stable system) to compute caseload limits.

We ran two series of experiments: Series A, with lightly loaded systems and low recommended caseload

limits and Series B, with heavily loaded systems and high recommended caseload limits. We list the param-

eters for the caseload experiments in Tables 12 (Series A) and 13 (Series B). We controlled the system load

via the ratio λ/(Nµ), which corresponds to the case-manager utilization for a system with M =∞. The

experiments covered a wide range of parameter values that might be seen in healthcare settings, for example,

1/λ′ varied from 23 minutes to 1 hour in Series A and from 2 to 4 hours in Series B. The parameter sets

were primarily constructed using a full factorial design, but with unstable systems eliminated and a few

experiments added to widen the range of recommended caseloads.

Table 11 and Figure 17 summarize the results of the experiments. The fourth and fifth lines of Table 11

and the clustering of the B-system caseload limit recommendations on the diagonal in Figure 17 show that

MB10% provides us with an accurate method for setting caseload limits. The balanced model caseload limits

usually match the exact MS10% (75% of cases in Series A and 88% of cases in Series B) and they differ from

:50

Figure 17 Recommended caseloads from the S system simulation (MS10%) versus caseload limits from the

deterministic heuristic (MDet), the B approximation (MB10%), and the P system stability limit (MPlim)

MS10% by at most 1 in all cases. The deterministic approach, on the other hand, is a poor approximation. The

deterministic caseload limit MDet matches MS10% in only 10% of the Series A cases and 4% of the Series B

cases and MDet is often an overestimate, by up to 10 cases. Figure 17 also shows that MP10% often significantly

underestimates the recommended caseload limit.

The B approximation is less successful at providing precise performance measure estimates, given the

recommended caseload. From Table 11, the B-approximation average and maximum absolute errors for total

wait, compared to the S-system simulation, were 9% and 34% in Series A, respectively. The performance

of the approximation was much better in Series B (1%, 6%). Note, however, that in Series A the absolute

waiting times were extremely small, so that the absolute total waiting time error produced by the B system

was also small, averaging 0.9 minutes.

Table 11 Summary of numerical experiments.

Series A Series BNumber of cases 81 24Average for MP

lim 1.8 11.8Average for λ/(Nµ) 0.56 0.92% cases MB

10% =MS10% 75% 88%

Max |MB10%−MS

10%| 1 1Avg. abs % system time error by B, given MB

10% 2% 0.4%Max. abs. % system time error by B, given MB

10% 7% 3%Avg. % waiting time error by B, given MB

10% 9% 1%Max. abs. % waiting time error by B, given MB

10% 34% 6%% cases MDet =MS

10% 10% 4%Max |MDet−MS

10%| 9 10

:51

Table 12 Parameters for Series A (N = 3 case managers and M = 5 cases in all experiments).

Exp. # λ′ γ λ µTot Exp. # λ′ γ λ µTot Exp. # λ′ γ λ µTot

1 0.95 0.54 7.60 5.91 28 1.80 0.54 7.60 5.91 55 2.65 0.54 7.60 5.912 0.95 0.54 7.60 7.00 29 1.80 0.54 7.60 7.00 56 2.65 0.54 7.60 7.003 0.95 0.54 7.60 9.00 30 1.80 0.54 7.60 9.00 57 2.65 0.54 7.60 9.004 0.95 0.54 8.60 5.91 31 1.80 0.54 8.60 5.91 58 2.65 0.54 8.60 5.915 0.95 0.54 8.60 7.00 32 1.80 0.54 8.60 7.00 59 2.65 0.54 8.60 7.006 0.95 0.54 8.60 9.00 33 1.80 0.54 8.60 9.00 60 2.65 0.54 8.60 9.007 0.95 0.54 9.30 5.91 34 1.80 0.54 9.30 5.91 61 2.65 0.54 9.30 5.918 0.95 0.54 9.30 7.00 35 1.80 0.54 9.30 7.00 62 2.65 0.54 9.30 7.009 0.95 0.54 9.30 9.00 36 1.80 0.54 9.30 9.00 63 2.65 0.54 9.30 9.0010 0.95 0.75 7.60 5.91 37 1.80 0.75 7.60 5.91 64 2.65 0.75 7.60 5.9111 0.95 0.75 7.60 7.00 38 1.80 0.75 7.60 7.00 65 2.65 0.75 7.60 7.0012 0.95 0.75 7.60 9.00 39 1.80 0.75 7.60 9.00 66 2.65 0.75 7.60 9.0013 0.95 0.75 8.60 5.91 40 1.80 0.75 8.60 5.91 67 2.65 0.75 8.60 5.9114 0.95 0.75 8.60 7.00 41 1.80 0.75 8.60 7.00 68 2.65 0.75 8.60 7.0015 0.95 0.75 8.60 9.00 42 1.80 0.75 8.60 9.00 69 2.65 0.75 8.60 9.0016 0.95 0.75 9.30 5.91 43 1.80 0.75 9.30 5.91 70 2.65 0.75 9.30 5.9117 0.95 0.75 9.30 7.00 44 1.80 0.75 9.30 7.00 71 2.65 0.75 9.30 7.0018 0.95 0.75 9.30 9.00 45 1.80 0.75 9.30 9.00 72 2.65 0.75 9.30 9.0019 0.95 0.95 7.60 5.91 46 1.80 0.95 7.60 5.91 73 2.65 0.95 7.60 5.9120 0.95 0.95 7.60 7.00 47 1.80 0.95 7.60 7.00 74 2.65 0.95 7.60 7.0021 0.95 0.95 7.60 9.00 48 1.80 0.95 7.60 9.00 75 2.65 0.95 7.60 9.0022 0.95 0.95 8.60 5.91 49 1.80 0.95 8.60 5.91 76 2.65 0.95 8.60 5.9123 0.95 0.95 8.60 7.00 50 1.80 0.95 8.60 7.00 77 2.65 0.95 8.60 7.0024 0.95 0.95 8.60 9.00 51 1.80 0.95 8.60 9.00 78 2.65 0.95 8.60 9.0025 0.95 0.95 9.30 5.91 52 1.80 0.95 9.30 5.91 79 2.65 0.95 9.30 5.9126 0.95 0.95 9.30 7.00 53 1.80 0.95 9.30 7.00 80 2.65 0.95 9.30 7.0027 0.95 0.95 9.30 9.00 54 1.80 0.95 9.30 9.00 81 2.65 0.95 9.30 9.00

Table 13 Parameters for Series B (N = 3 case managers and M = 5 cases in all experiments).

Exp. # λ′ γ λ µTot

1 0.25 0.20 3.40 5.912 0.40 0.20 3.40 5.913 0.50 0.20 3.40 5.914 0.25 0.30 5.00 5.915 0.40 0.30 5.00 5.916 0.50 0.30 5.00 5.917 0.25 0.40 6.90 5.918 0.40 0.40 6.90 5.919 0.50 0.40 6.90 5.9110 0.25 0.50 6.90 5.9111 0.40 0.50 6.90 5.9112 0.50 0.50 6.90 5.9113 0.25 0.20 3.01 5.9114 0.40 0.20 3.01 5.9115 0.50 0.20 3.01 5.9116 0.25 0.30 4.52 5.9117 0.40 0.30 4.52 5.9118 0.50 0.30 4.52 5.9119 0.25 0.40 6.03 5.9120 0.40 0.40 6.03 5.9121 0.50 0.40 6.03 5.9122 0.25 0.50 7.54 5.9123 0.40 0.50 7.54 5.9124 0.50 0.50 7.54 5.91

queueing models of case managers - university of alberta

Documents