Performance Evaluation 36–37 (1999) 249–266
www.elsevier.com/locate/peva

Convergence of a dynamic policy for buffer management in shared buffer ATM switches

Supriya Sharma a, Yannis Viniotis b,*

a Strategic Network Planning, Alcatel USA, Plano, TX 75075, USA
b Department of ECE, North Carolina State University, Raleigh, NC 27695-7911, USA
Abstract
Lack of effective buffer management can lead to severe cell loss in shared buffer ATM switches. With the aim of reducing cell loss, we study a class of non-anticipative buffer management policies that always admit cells to the buffer while there is space in it and may push out a cell when the buffer becomes full. We propose a dynamic algorithm that operates without any knowledge of the arrival process; we consider a two-class system and show that the algorithm is optimal when the arrivals to each class are a superposition of identically distributed and independent Bernoulli processes. © 1999 Published by Elsevier Science B.V. All rights reserved.
Keywords: ATM shared buffer switches; Cell loss; Dynamic pushout algorithms; Convergence of Markov chains; Buffer management
1. Introduction
In this paper we study a 2 × 2 shared buffer ATM switch, shown in Fig. 1. Each arrival, called a cell in this paper, has a predetermined destination, either output line 1 or output line 2. If the output line is busy at the time of cell arrival, the cell waits in a finite (size $K$) common buffer that is shared between cells of the two output lines. The transmission time of all cells is constant (and equal to 2.83 µs for ATM lines at 155 Mbps). Even though the cells of the two classes share a common storage of size $K$ cells, each output line has its own logical queue. There is a buffer management policy, $f$, which controls the entry of cells into the buffer and their departure from it. This policy is the focus of our study.

Fig. 1. The shared buffer switch.

* Corresponding author. E-mail: [email protected]

0166-5316/99/$ – see front matter © 1999 Published by Elsevier Science B.V. All rights reserved.
PII: S0166-5316(99)00021-8
Previous work (e.g. [4,5]) shows that lack of effective buffer management can lead to severe cell loss in the buffer. For example, with complete sharing of the buffer space between the two output lines, the entire buffer can be dominated by cells of only one of the output lines, which causes the arriving cells of the other output line to be lost, resulting in unfairness and excessive loss, since one line may be forced to idle. Our goal, then, is to find policies that regulate the queue lengths of the output lines and thus reduce the cell loss probability of the buffer. The class of policies we consider has the following structure: policies always admit cells to the buffer while there is space in it and may push out a cell when the buffer becomes full. In other words, a policy may remove a previously accepted cell from the buffer and insert the newly arrived one in its place. It is assumed that the policies do not have any knowledge of the future (arrivals). In [7] it was determined that, under certain assumptions, the optimal policy (the one that results in the least probability of cell loss) is static. Static policies are problematic to implement in real systems, since they require knowledge of the probability mass function of the arrival process.
In this paper, we propose a dynamic buffer management policy that operates without knowledge of the arrival process parameters; instead, it uses estimates of the optimal policy parameters. Our main result, Theorem 11, is a proof that these estimates (and consequently the performance of the dynamic policy) converge to those of the optimal one.

In Section 2 we present a discrete-time queueing model for a switch with two inputs and outputs. The static optimal policy, $f^*$, and the optimality criterion are stated in Section 3. In Section 4, we introduce the measurements collected by the dynamic policy and define this policy formally. Finally, in Section 5, we present the proof of convergence.
2. The stochastic model
A queueing model for the shared buffer ATM switch is depicted in Fig. 2. The output lines of the system are represented by two servers, called $X$ and $Y$, serving cells from a common buffer of size $K > 2$ (which includes the space for cells in service). Time is discrete, denoted by $n \in \mathbb{N}_0 = \{0, 1, 2, \ldots\}$ or $n \in \mathbb{N} = \{1, 2, \ldots\}$. Arrivals take place at the end of a time slot, the length of which is equal to the service time of a cell.

Fig. 2. The queueing model.
The class $F$ of buffer management policies that we consider is the set of all policies $f$ that are non-anticipative and work conserving. As we shall see, since the policies are work conserving, Eqs. (3)–(5) and (10) hold true.
The arrival process on each input line is modeled as an iid Bernoulli process; the destination of the cell (server $X$ or $Y$) is also determined in an iid manner. Let the random processes $\{A^X_n\}, \{A^Y_n\}$ denote the number of arrivals to servers $X$ and $Y$ (also called class $X$ and $Y$), respectively, in slot $n \in \mathbb{N}_0$. Since the arrivals to a server are the superposition of two Bernoulli processes (the arrivals on the input lines), there can be at most two arrivals in a slot; i.e.,

$$A^X_n, A^Y_n \geq 0, \quad A^X_n + A^Y_n \leq 2. \tag{1}$$

For a fixed $n \in \mathbb{N}_0$, the vector $(A^X_n, A^Y_n)$ has the following known probability mass function:

$$P\{(A^X_n, A^Y_n) = (i, j)\} = r_{ij}, \quad i, j \geq 0,\ i + j \leq 2. \tag{2}$$
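As an illustration of Eq. (2), the following sketch computes a pmf $r_{ij}$ under a hypothetical parameterization in which input line $l$ carries a cell with probability $p_l$ and each cell is routed to output $X$ with probability $q$; these parameters are illustrative assumptions, not part of the model above, which only assumes the joint pmf is known:

```python
from itertools import product
from math import comb

def arrival_pmf(p1, p2, q):
    """Joint pmf r_ij = P{A_X = i, A_Y = j} for one slot, assuming input
    line l delivers a cell with probability p_l and each cell goes to
    output X with probability q (hypothetical, for illustration)."""
    r = {(i, j): 0.0 for i in range(3) for j in range(3) if i + j <= 2}
    for c1, c2 in product((0, 1), repeat=2):       # cell on each input line?
        p_cells = (p1 if c1 else 1 - p1) * (p2 if c2 else 1 - p2)
        n = c1 + c2                                # total arrivals this slot
        for i in range(n + 1):                     # i of them routed to X
            r[(i, n - i)] += p_cells * comb(n, i) * q**i * (1 - q)**(n - i)
    return r

pmf = arrival_pmf(0.4, 0.6, 0.5)
assert abs(sum(pmf.values()) - 1.0) < 1e-12        # a proper pmf
```

Note that the support is exactly $\{(i, j): i, j \geq 0,\ i + j \leq 2\}$, matching Eq. (1).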
Consider a buffer management policy $f \in F$. Let the sequence $\{X^f_n, Y^f_n\}$, $n \in \mathbb{N}_0$, called the buffer process, denote the number of cells of each server in the buffer at the beginning of the $n$th time slot. The state space of the sequence is given by $S = \{(x, y): x, y \geq 0,\ x + y \leq K\}$, where $K$ is a fixed positive constant.

Define the sequence $\{\hat X^f_n, \hat Y^f_n\}$, $n \in \mathbb{N}_0$, as follows. Let $(x)^+ = \max(0, x)$; then

$$(\hat X^f_0, \hat Y^f_0) = (X^f_0, Y^f_0),$$
$$\hat X^f_n = (X^f_{n-1} - 1)^+ + A^X_{n-1}, \quad n \in \mathbb{N}, \tag{3}$$
$$\hat Y^f_n = (Y^f_{n-1} - 1)^+ + A^Y_{n-1}, \quad n \in \mathbb{N}.$$

This sequence is, therefore, the demand for storage in slot $n$, but is called the overflow process for reasons that will be apparent later. Together with the buffer process, the overflow process defines the queueing model of the shared buffer ATM switch.
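The recursion of Eq. (3) is simple enough to restate directly in code; this is only a transcription of the equation, with the previous slot's buffer contents and arrivals as inputs:

```python
def overflow_step(x_prev, y_prev, ax, ay):
    """Demand for storage per Eq. (3): each non-empty queue serves one
    cell, then the previous slot's arrivals are added."""
    pos = lambda v: max(0, v)                      # (x)^+ = max(0, x)
    return pos(x_prev - 1) + ax, pos(y_prev - 1) + ay

# From (X, Y) = (0, K) with two class-Y arrivals, the demand is (0, K+1):
K = 8
assert overflow_step(0, K, 0, 2) == (0, K + 1)
```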
Define the sequence $\{\sigma^f_k\}$, $k \in \mathbb{N}_0$, as follows. Let $\sigma^f_0 = 0$, and

$$\sigma^f_k = \min\{n > \sigma^f_{k-1}: \hat X^f_n + \hat Y^f_n > K\}, \quad k \in \mathbb{N}. \tag{4}$$

$\sigma^f_k$ is the $k$th return time to the set $\{\hat X^f_n + \hat Y^f_n > K\}$. At this time, the demand for buffer space exceeds the capacity of the buffer and cell loss takes place. Since our buffer management policies accept all cells into the buffer while it is not full, the state of the buffer in the $n$th slot, $(X^f_n, Y^f_n)$, is the same as the demand for space, $(\hat X^f_n, \hat Y^f_n)$, except at cell loss times; i.e.,

$$(X^f_n, Y^f_n) = (\hat X^f_n, \hat Y^f_n), \quad n \neq \sigma^f_k,\ k \in \mathbb{N}. \tag{5}$$

The decision of the buffer management policy, $f$ (regarding the class of cell to be pushed out), determines the value of $(X^f_{\sigma^f_k}, Y^f_{\sigma^f_k})$, $\forall k \in \mathbb{N}$, and is explained in the next section.

It can be seen that the event $\{\sigma^f_k = m\}$ depends entirely on $\{\hat X^f_n, \hat Y^f_n\}_{n=0}^{m}$. Therefore, $\{\sigma^f_k\}$, $k \in \mathbb{N}_0$, is a sequence of stopping times [1,6] on the process $\{\hat X^f_n, \hat Y^f_n\}$.

Since the time slots at which cell loss takes place can be determined by examining the $\{\hat X^f_n, \hat Y^f_n\}$ process, it is this process that is of particular interest to us, and it shall be the focus of this paper. In the next subsection we examine how loss takes place in the shared buffer, while in Section 2.2 we investigate some properties of the overflow process.
2.1. Cell loss and policy actions
In order to find out exactly how cell loss takes place, we consider the total number of cells in the buffer, given by $X^f_n + Y^f_n$ or $\hat X^f_n + \hat Y^f_n$. From Eq. (3), we can easily determine the conditions necessary for cell loss to occur. These conditions are therefore stated without proof in the following lemma.

Lemma 1. There can be at most one cell loss in any time slot, and this cell loss can take place only from the vertices of the state space, $(0, K)$ and $(K, 0)$; i.e., at all cell loss times $n = \sigma^f_k$, $\forall k \in \mathbb{N}$, we have that:

$$\hat X^f_n + \hat Y^f_n = K + 1,$$
$$(X^f_{n-1}, Y^f_{n-1}) = (K, 0) \text{ or } (0, K),$$
$$A^X_{n-1} + A^Y_{n-1} = 2.$$

Therefore, from Lemma 1, we can see that when $(X^f_{n-1}, Y^f_{n-1}) = (0, K)$, we get that:

$$(\hat X^f_n, \hat Y^f_n) = \begin{cases} (0, K+1), & \text{if } A^X_{n-1} = 0,\ A^Y_{n-1} = 2, \\ (1, K), & \text{if } A^X_{n-1} = 1,\ A^Y_{n-1} = 1, \\ (2, K-1), & \text{if } A^X_{n-1} = 2,\ A^Y_{n-1} = 0. \end{cases} \tag{6}$$

When $(X^f_{n-1}, Y^f_{n-1}) = (K, 0)$, we get that:

$$(\hat X^f_n, \hat Y^f_n) = \begin{cases} (K+1, 0), & \text{if } A^X_{n-1} = 2,\ A^Y_{n-1} = 0, \\ (K, 1), & \text{if } A^X_{n-1} = 1,\ A^Y_{n-1} = 1, \\ (K-1, 2), & \text{if } A^X_{n-1} = 0,\ A^Y_{n-1} = 2. \end{cases} \tag{7}$$
Thus, the state space of the process $(\hat X^f_{\sigma^f_k}, \hat Y^f_{\sigma^f_k})$, $k \in \mathbb{N}$, is given by $\Lambda$, where

$$\Lambda = \{(0, K+1), (1, K), (2, K-1), (K+1, 0), (K, 1), (K-1, 2)\}. \tag{8}$$
Recall that the state of the buffer process at an overflow instant is determined by the actions of the buffer management policy. In order to formally define a buffer management policy, we need to determine the states of $(X^f_{\sigma^f_k}, Y^f_{\sigma^f_k})$, $k \in \mathbb{N}$. Since, by Lemma 1, there can be only one cell loss at a loss instant, the policy $f$ has to drop or push out only one cell. Therefore, $\forall k \in \mathbb{N}$, $(X^f_{\sigma^f_k}, Y^f_{\sigma^f_k})$ is constrained to lie within the following bounds:

$$X^f_{\sigma^f_k} \in \{\hat X^f_{\sigma^f_k},\ \hat X^f_{\sigma^f_k} - 1\}, \quad Y^f_{\sigma^f_k} \in \{\hat Y^f_{\sigma^f_k},\ \hat Y^f_{\sigma^f_k} - 1\}. \tag{9}$$

From Eq. (9), the state space of $(X^f_{\sigma^f_k}, Y^f_{\sigma^f_k})$ can be easily determined as the following subset of $S$:

$$\{(0, K), (1, K-1), (2, K-2), (K-2, 2), (K-1, 1), (K, 0)\}. \tag{10}$$
A buffer management policy, $f$, is then defined by the sequence of states at cell loss times, $\{\hat X^f_{\sigma^f_k}, \hat Y^f_{\sigma^f_k}, X^f_{\sigma^f_k}, Y^f_{\sigma^f_k}\}$, $k \in \mathbb{N}_0$.
2.2. Properties of $\{\hat X^f_n, \hat Y^f_n\}$

We now study a few properties of the overflow process that will be useful in Sections 4.1 and 5. The first property characterizes the passage from a buffer content of $c$ cells to one of $c + 1$ cells. The second states the one-step transition probabilities of the overflow process, while the third gives its recurrence properties.

Lemma 2. The total number of cells in the buffer may increase by only one cell in a slot, provided that the buffer is not empty and one server idles; i.e., if at some time $n_0$ that is not a loss instant we have that

$$\hat X^f_{n_0} + \hat Y^f_{n_0} = c, \quad 1 \leq c \leq K,$$
$$\tau = \min\{n > n_0: \hat X^f_n + \hat Y^f_n = c + 1\},$$

then

$$\hat X^f_n + \hat Y^f_n \leq c, \quad n_0 \leq n < \tau,$$
$$\hat X^f_{\tau-1} = 0 \text{ or } \hat Y^f_{\tau-1} = 0.$$
Proof. Since $\hat X^f_{n_0} + \hat Y^f_{n_0} \leq K$, we know that $(\hat X^f_{n_0}, \hat Y^f_{n_0}) = (X^f_{n_0}, Y^f_{n_0})$. Therefore, by expanding Eq. (3), at time $n = n_0$,

$$\hat X^f_{n+1} + \hat Y^f_{n+1} = \begin{cases} X^f_n + Y^f_n - 2 + A^X_n + A^Y_n, & \text{if } X^f_n, Y^f_n > 0, \\ Y^f_n - 1 + A^X_n + A^Y_n, & \text{if } X^f_n = 0,\ Y^f_n > 0, \\ X^f_n - 1 + A^X_n + A^Y_n, & \text{if } X^f_n > 0,\ Y^f_n = 0, \\ A^X_n + A^Y_n, & \text{if } X^f_n = 0,\ Y^f_n = 0. \end{cases} \tag{11}$$

From Eq. (1) we have $A^X_{n_0} + A^Y_{n_0} \leq 2$, and we can easily see that

$$\hat X^f_{n_0+1} + \hat Y^f_{n_0+1} \leq \hat X^f_{n_0} + \hat Y^f_{n_0} = c$$

when $X^f_{n_0}, Y^f_{n_0} > 0$. In the next two cases, when one of the two servers idles, we see that we may have $\hat X^f_{n_0+1} + \hat Y^f_{n_0+1} = c + 1$ and $\tau = n_0 + 1$ if $A^X_{n_0} + A^Y_{n_0} = 2$. In this case, we see that the statement of the lemma is satisfied.

If $\tau > n_0 + 1$, then $\hat X^f_{n_0+1} + \hat Y^f_{n_0+1} \leq c \leq K$, and we get $(\hat X^f_{n_0+1}, \hat Y^f_{n_0+1}) = (X^f_{n_0+1}, Y^f_{n_0+1})$. Then, at time $n = n_0 + 1$, the arguments made for time $n = n_0$ can be applied again. Continuing inductively, we get the required result. $\square$
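The case analysis of Eq. (11) can be checked exhaustively for a small $K$; this sketch verifies the content of Lemma 2: starting from a non-empty buffer, the total demand grows by at most one per slot, and grows only when a server idles.

```python
def next_total(x, y, ax, ay):
    """Total demand in the next slot, per Eq. (11): each busy server
    removes one cell, then the slot's arrivals are added."""
    served = (1 if x > 0 else 0) + (1 if y > 0 else 0)
    return x + y - served + ax + ay

K = 6
for x in range(K + 1):
    for y in range(K + 1 - x):
        if x + y == 0:
            continue                       # Lemma 2 assumes c >= 1
        for ax in range(3):
            for ay in range(3 - ax):       # Eq. (1): ax + ay <= 2
                growth = next_total(x, y, ax, ay) - (x + y)
                assert growth <= 1
                if growth == 1:            # growth requires an idle server
                    assert x == 0 or y == 0
```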
For the particular case that $c = K$, $\tau$ is also a cell loss time. It can easily be seen that at time $n = \tau - 1$ we must have either $(\hat X^f_n, \hat Y^f_n) = (K, 0)$ or $(0, K)$. However, this property does not hold true at a general loss instant; i.e., we cannot claim that $(\hat X^f_{\sigma^f_k - 1}, \hat Y^f_{\sigma^f_k - 1}) = (K, 0)$ or $(0, K)$. This is because we may have $\sigma^f_{k-1} = \sigma^f_k - 1$, in which case $\hat X^f_{\sigma^f_k - 1} + \hat Y^f_{\sigma^f_k - 1} > K$. This problem is circumvented by the use of the state of the buffer process, $(X^f_n, Y^f_n)$, at time $\sigma^f_k - 1$, as in Lemma 1.

For any $(x_1, y_1) \in S$ and $(x_2, y_2) \in S \cup \Lambda$, $n \in \mathbb{N}_0$, we define the following function:

$$p((x_1, y_1), (x_2, y_2)) = P\{A^X_n = x_2 - (x_1 - 1)^+,\ A^Y_n = y_2 - (y_1 - 1)^+\}. \tag{12}$$
We can see from Eq. (2) that for $x = x_2 - (x_1 - 1)^+$ and $y = y_2 - (y_1 - 1)^+$, $p((x_1, y_1), (x_2, y_2)) = r_{xy}$. The next lemma (reminiscent of the Strong Markov Property [1,3]) proves that the function $p((x_1, y_1), (x_2, y_2))$ is the one-step transition probability of the overflow process from state $(x_1, y_1)$ to $(x_2, y_2)$. Our choice of state space for $(x_1, y_1)$ makes it clear that the transition probabilities at loss instants are not considered, as those transitions depend upon the specific policy used.

Lemma 3. For any stopping time $\tau$ on the process $\{\hat X^f_n, \hat Y^f_n\}$ that is not a loss instant ($\tau \neq \sigma^f_k$, $\forall k \in \mathbb{N}_0$), the conditional distribution of the vector $(\hat X^f_n, \hat Y^f_n)$ at time $\tau + 1$, given the entire past, depends only on the state of the process at time $\tau$ and is stationary; i.e., for $(x_{\tau+1}, y_{\tau+1}) \in S \cup \Lambda$, $(x_\tau, y_\tau) \in S$,

$$P\{(\hat X^f_{\tau+1}, \hat Y^f_{\tau+1}) = (x_{\tau+1}, y_{\tau+1}) \mid (\hat X^f_\tau, \hat Y^f_\tau) = (x_\tau, y_\tau), \ldots, (\hat X^f_0, \hat Y^f_0) = (x_0, y_0)\} = p((x_\tau, y_\tau), (x_{\tau+1}, y_{\tau+1})).$$
Proof. Since $\tau$ is not a loss instant, $x_\tau + y_\tau \leq K$, and we can use Eqs. (3) and (5) to write:

$$\hat X^f_{\tau+1} = (\hat X^f_\tau - 1)^+ + A^X_\tau,$$
$$\hat Y^f_{\tau+1} = (\hat Y^f_\tau - 1)^+ + A^Y_\tau.$$

For simplicity, we define:

$$C = \{(\hat X^f_{\tau+1}, \hat Y^f_{\tau+1}) = (x_{\tau+1}, y_{\tau+1})\},$$
$$B = \{(\hat X^f_\tau, \hat Y^f_\tau) = (x_\tau, y_\tau), \ldots, (\hat X^f_0, \hat Y^f_0) = (x_0, y_0)\},$$
$$x = x_{\tau+1} - (x_\tau - 1)^+, \quad y = y_{\tau+1} - (y_\tau - 1)^+.$$

Then, $P\{C \mid B\} = P\{A^X_\tau = x,\ A^Y_\tau = y \mid B\}$. Therefore,

$$P\{C \mid B\} = \sum_{j=0}^{\infty} P\{A^X_\tau = x,\ A^Y_\tau = y \mid \tau = j,\ B\} \cdot P\{\tau = j,\ B\} / P\{B\}.$$

Now,

$$P\{A^X_\tau = x,\ A^Y_\tau = y \mid \tau = j,\ B\} = P\{A^X_j = x,\ A^Y_j = y \mid (\hat X^f_j, \hat Y^f_j) = (x_\tau, y_\tau), \ldots, (\hat X^f_0, \hat Y^f_0) = (x_0, y_0)\}.$$

From Eq. (3), we know that $(\hat X^f_j, \hat Y^f_j)$ depends upon $\{A^X_i, A^Y_i\}$, $i \leq j - 1$, and the policy $f$. Since arrivals are iid, $(A^X_j, A^Y_j)$ is independent of $\{A^X_i, A^Y_i\}$, $i \leq j - 1$. Since $f$ is non-anticipative, $(A^X_j, A^Y_j)$ is independent of $\{\hat X^f_i, \hat Y^f_i\}$, $i \leq j$. Therefore,

$$P\{A^X_j = x,\ A^Y_j = y \mid (\hat X^f_j, \hat Y^f_j) = (x_\tau, y_\tau), \ldots, (\hat X^f_0, \hat Y^f_0) = (x_0, y_0)\} = P\{A^X_j = x,\ A^Y_j = y\} = r_{xy},$$

$$P\{C \mid B\} = p((x_\tau, y_\tau), (x_{\tau+1}, y_{\tau+1})) \cdot \sum_{j=0}^{\infty} P\{\tau = j,\ B\} / P\{B\} = p((x_\tau, y_\tau), (x_{\tau+1}, y_{\tau+1})). \quad \square$$
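For a concrete view of Eq. (12), here is a minimal sketch of the one-step transition probability; the pmf `r` below is an arbitrary illustrative choice satisfying Eqs. (1) and (2), not data from the paper:

```python
def p_trans(state1, state2, r):
    """p((x1, y1), (x2, y2)) of Eq. (12): the probability that the slot's
    arrivals equal (x2 - (x1-1)^+, y2 - (y1-1)^+); zero if infeasible."""
    (x1, y1), (x2, y2) = state1, state2
    need = (x2 - max(0, x1 - 1), y2 - max(0, y1 - 1))
    return r.get(need, 0.0)

# An illustrative pmf r_ij with support {i, j >= 0, i + j <= 2}:
r = {(0, 0): 0.25, (1, 0): 0.2, (0, 1): 0.2,
     (1, 1): 0.15, (2, 0): 0.1, (0, 2): 0.1}

assert p_trans((3, 2), (3, 2), r) == 0.15   # needs arrivals (1, 1)
assert p_trans((3, 2), (4, 2), r) == 0.0    # would need arrivals (2, 1)
```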
In [7], the following lemma is proved.

Lemma 4. All states in the state space of $\{\hat X^f_n, \hat Y^f_n\}$ are visited infinitely often.
3. The optimality criterion and the optimal policy
Consider two systems controlled by policies $f$ and $g \in F$. Define a sequence of stopping times, $\{\sigma^{fg}_k\}$, $k \in \mathbb{N}_0$, on the stochastic process $\{\hat X^f_n, \hat Y^f_n, \hat X^g_n, \hat Y^g_n\}$, as:

$$\sigma^{fg}_0 = 0, \quad \sigma^{fg}_k = \min\{n > \sigma^{fg}_{k-1}: \hat X^f_n + \hat Y^f_n > K \text{ or } \hat X^g_n + \hat Y^g_n > K\}.$$

Therefore, $\sigma^{fg}_k$ is the $k$th time that there is a loss in either system.

The cell loss under systems $f$ and $g$ at time $\sigma^{fg}_k$, $k \in \mathbb{N}$, is defined as:

$$L^f_k = \hat X^f_{\sigma^{fg}_k} + \hat Y^f_{\sigma^{fg}_k} - K, \quad L^g_k = \hat X^g_{\sigma^{fg}_k} + \hat Y^g_{\sigma^{fg}_k} - K.$$

We define as our optimality criterion the minimization of the average expected cell loss in the system; i.e., $f^*$ is optimal if the following inequality is satisfied:

$$\lim_{n \to \infty} \frac{1}{n} \sum_{k=1}^{n} E L^{f^*}_k \leq \lim_{n \to \infty} \frac{1}{n} \sum_{k=1}^{n} E L^g_k, \quad \forall g \in F.$$
In [7], it is proved that the optimal policy, $f^*$, has the following structure. For every $n = \sigma^{f^*}_k$, $k \in \mathbb{N}$:

If $(\hat X^{f^*}_n, \hat Y^{f^*}_n) = (0, K+1)$, then $(X^{f^*}_n, Y^{f^*}_n) = (0, K)$.

If $(\hat X^{f^*}_n, \hat Y^{f^*}_n) = (1, K)$, then $(X^{f^*}_n, Y^{f^*}_n) = (1, K-1)$.

If $(\hat X^{f^*}_n, \hat Y^{f^*}_n) = (K+1, 0)$, then $(X^{f^*}_n, Y^{f^*}_n) = (K, 0)$.

If $(\hat X^{f^*}_n, \hat Y^{f^*}_n) = (K, 1)$, then $(X^{f^*}_n, Y^{f^*}_n) = (K-1, 1)$.

If $(\hat X^{f^*}_n, \hat Y^{f^*}_n) = (2, K-1)$, then

$$(X^{f^*}_n, Y^{f^*}_n) = \begin{cases} (1, K-1), & \text{if } \gamma(1, K-2) < \delta(1, K-2), \\ (2, K-2), & \text{otherwise}. \end{cases}$$

If $(\hat X^{f^*}_n, \hat Y^{f^*}_n) = (K-1, 2)$, then

$$(X^{f^*}_n, Y^{f^*}_n) = \begin{cases} (K-1, 1), & \text{if } \delta(K-2, 1) < \gamma(K-2, 1), \\ (K-2, 2), & \text{otherwise}. \end{cases}$$

In the above expressions, $\gamma(x, y)$ and $\delta(x, y)$, for $(x, y) \in S \setminus \{(0, 0)\}$, are taboo probabilities that are defined as follows: let $\tau = \min\{n > 0: \hat X^f_n + \hat Y^f_n = \hat X^f_0 + \hat Y^f_0 + 1\}$; then

$$\gamma(x, y) = P\{\hat Y^f_n > 0,\ 0 \leq n < \tau \mid \hat X^f_0 = x,\ \hat Y^f_0 = y\},$$
$$\delta(x, y) = P\{\hat X^f_n > 0,\ 0 \leq n < \tau \mid \hat X^f_0 = x,\ \hat Y^f_0 = y\}. \tag{13}$$
The definition of $\tau$ is similar to that in Lemma 2, with $c = \hat X^f_0 + \hat Y^f_0$. From the lemma, we know that for $0 \leq n < \tau$ we have $\hat X^f_n + \hat Y^f_n \leq x + y \leq K$. Therefore, the actions of the policy do not affect $\{\hat X^f_n, \hat Y^f_n\}_{n=0}^{\tau}$, which is why the probabilities $\gamma(x, y)$ and $\delta(x, y)$ are independent of the policy $f$. Lemma 2 tells us that one of $\hat X^f_{\tau-1}$ and $\hat Y^f_{\tau-1}$ must be 0; $\gamma(x, y)$ and $\delta(x, y)$ give us the probability that only one of the two events occurs. It can easily be seen that the optimal policy chooses to push out cells from the class whose server has the smaller probability of going idle before the next cell loss takes place. For more details, see [7].
Thus, with knowledge of $\gamma(1, K-2)$, $\delta(1, K-2)$, $\gamma(K-2, 1)$, and $\delta(K-2, 1)$, a buffer policy is able to make optimal decisions. The optimal policy, $f^*$, is static and Markov, in the sense that the decision taken at an overflow time depends only upon the state of the overflow process at that time, and, given the same state at different overflow times, $f^*$ takes the same decision. However, it requires knowledge of the probability mass function of the arrival process, something that is unlikely to be available in a real network. Therefore, we present in the next section a dynamic algorithm, which uses estimates for $\gamma(\cdot)$ and $\delta(\cdot)$ that are available to the policy at time $\sigma^f_k$. As we will show in Section 5, this dynamic algorithm is also optimal.
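The structure of the static policy above can be summarized as a lookup. In the sketch below, `gamma` and `delta` denote the two taboo probabilities of Eq. (13), supplied as callables; how they would be obtained without the arrival pmf is exactly the difficulty the next section addresses.

```python
def optimal_pushout(overflow_state, K, gamma, delta):
    """Post-loss buffer state chosen by the static optimal policy f*.
    In four overflow states the action is fixed; in the other two it
    compares the taboo probabilities of Eq. (13)."""
    fixed = {(0, K + 1): (0, K), (1, K): (1, K - 1),
             (K + 1, 0): (K, 0), (K, 1): (K - 1, 1)}
    if overflow_state in fixed:
        return fixed[overflow_state]
    if overflow_state == (2, K - 1):
        return (1, K - 1) if gamma(1, K - 2) < delta(1, K - 2) else (2, K - 2)
    if overflow_state == (K - 1, 2):
        return (K - 1, 1) if delta(K - 2, 1) < gamma(K - 2, 1) else (K - 2, 2)
    raise ValueError("not an overflow state of Eq. (8)")

# With dummy taboo probabilities that favour keeping class-Y cells:
assert optimal_pushout((2, 9), 10, lambda x, y: 0.2, lambda x, y: 0.7) == (1, 9)
```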
4. The dynamic policy
In this section, we describe the system information that the dynamic policy must measure in order to obtain the estimates it needs, and then formally define the policy. For simplicity of notation, we retain the symbol $f$ to denote the (yet unstated) dynamic policy.
4.1. Measurements taken by the policy
First we define the sequences $\{S^f_k\}$, $k \in \mathbb{N}_0$, and $\{T^f_k\}$, $k \in \mathbb{N}_0$. For any $(a, b) \in S \setminus \{(0, 0)\}$ and $k \in \mathbb{N}$, let

$$S^f_0 = T^f_0 = 0,$$
$$S^f_k = \min\{n \geq T^f_{k-1}: (\hat X^f_n, \hat Y^f_n) = (a, b)\}, \tag{14}$$
$$T^f_k = \min\{n > S^f_k: \hat X^f_n + \hat Y^f_n = a + b + 1\}.$$

$S^f_k$ and $T^f_k$ are actually functions of $(a, b)$, but the argument $(a, b)$ is dropped for simplicity of notation. The same is true of all the other variables to follow in this section.

$S^f_k$ is not the $k$th return time to the state $(a, b)$; it is the first time that the overflow process reaches the state $(a, b)$ after time $T^f_{k-1}$. At the times $\{S^f_{k-1} + 1, S^f_{k-1} + 2, \ldots, T^f_{k-1} - 1\}$, the state $(a, b)$ may be visited several times; however, from the definition in Eq. (14), none of these instants may be $S^f_k$.

Similarly, $T^f_k$ is not the $k$th return time to the set $\{\hat X^f_n + \hat Y^f_n = a + b + 1\}$ (referred to hereafter as state $(a + b + 1)$); it is the first time that the state $a + b + 1$ is visited after time $S^f_k$. In the times $\{T^f_{k-1} + 1, \ldots, S^f_k\}$, state $a + b + 1$ may be visited several times, with none of them being equal to $T^f_k$. However, it can be seen that both $\{S^f_k\}$, $k \in \mathbb{N}_0$, and $\{T^f_k\}$, $k \in \mathbb{N}_0$, are sequences of stopping times on the $\{\hat X^f_n, \hat Y^f_n\}$ process.

Since from Lemma 4 we know that all states in the state space of $\{\hat X^f_n, \hat Y^f_n\}$ are visited infinitely often, we get the following result, stated as a lemma.

Lemma 5. The random variables $S^f_k - T^f_{k-1}$ and $T^f_k - S^f_k$ are finite with probability 1.

Thus the time axis is divided into non-overlapping, non-contiguous cycles of time $\{S^f_k, \ldots, T^f_k\}$. The term $k$th cycle is used to refer to the interval $\{S^f_k, \ldots, T^f_k\}$, $k \in \mathbb{N}_0$. Since these cycles have been proven to be finite with probability 1 (wp 1), the random variables defined below are finite wp 1 too.
Let $\|A\|$ denote the cardinality of a set $A$. Define the sequence of sets $\{A^f_k\}$, $k \in \mathbb{N}_0$, as $A^f_0 = \emptyset$, and

$$A^f_k = \{n \in \{S^f_k, S^f_k + 1, \ldots, T^f_k\}: (\hat X^f_n, \hat Y^f_n) = (a, b)\}.$$

The sequence of random variables $\{H^f_k\}$, $k \in \mathbb{N}_0$, is defined as $H^f_k = \|A^f_k\|$. The random variable $H^f_k$ counts the number of times that the state $(a, b)$ is visited in the $k$th cycle. By the definition of $S^f_k$ in Eq. (14), $H^f_k \geq 1$ for $k \neq 0$. From Lemma 5, all cycles are of finite length wp 1, and thus

$$P\{H^f_k < \infty\} = 1. \tag{15}$$

Define the sequence of random variables $\{C^f_n\}$, $n \in \mathbb{N}_0$, as follows:

$$C^f_n = \max\{i \geq 0: T^f_i \leq n,\ i \in \mathbb{N}_0\}. \tag{16}$$

$C^f_n$ counts the number of cycles completed in $\{0, \ldots, n\}$. Since the length of the 0th interval is 0, it is not really counted as a cycle, and thus $C^f_n = 0$ for $n < T^f_1$. It can be easily seen that the (integer) sequence $\{C^f_n\}$ is non-decreasing. However, by Lemma 5, since the length of cycles is finite wp 1, the number of times that any integer is repeated must be finite wp 1 too.

Define the sequence of random variables $\{J^f_k\}$, $k \in \mathbb{N}$, as follows:

$$J^f_k = \max\Big\{m \geq 0: \sum_{i=0}^{m} H^f_i < k,\ m \in \mathbb{N}_0\Big\} + 1. \tag{17}$$

$J^f_k$ denotes the cycle in which the $k$th visit to state $(a, b)$ takes place. Since we know by Eq. (15) that the number of visits per cycle ($H^f_k$) is finite wp 1, $J^f_k$ has to increase with $k$, though not strictly.

Define the sequence of random variables $\{N^f_k\}$, $k \in \mathbb{N}$, as:

$$N^f_1 = S^f_1,$$
$$N^f_k = \min\{n > N^f_{k-1}: (\hat X^f_n, \hat Y^f_n) = (a, b),\ n \in \mathbb{N}_0\}, \quad k = 2, 3, \ldots \tag{18}$$

$N^f_k$ is the $k$th return time of the state $(a, b)$. It is a strictly increasing random sequence (therefore $\lim_{k \to \infty} N^f_k = \infty$ wp 1), with the property that

$$S^f_{J^f_k} \leq N^f_k < T^f_{J^f_k}. \tag{19}$$

The above equation merely says that the time of the $k$th visit to state $(a, b)$ must lie between the beginning and end of the cycle in which it takes place (i.e., cycle $J^f_k$). By the definitions of the cycles in Eq. (14), we know that none of these visits can take place outside the cycles.

From Lemma 2, we know that between every visit to the state $(a, b)$, say the $k$th one, and the end of the cycle in which it occurs ($T^f_{J^f_k}$), either the state $\{\hat X^f_n = 0\}$ or $\{\hat Y^f_n = 0\}$ or both must be visited; i.e.,

$$\big\{\hat X^f_n > 0,\ \hat Y^f_n > 0,\ n = N^f_k, \ldots, T^f_{J^f_k}\big\} = \emptyset, \quad k \in \mathbb{N}. \tag{20}$$
The variables $\{I^{fx}_k\}$ and $\{I^{fy}_k\}$, $k \in \mathbb{N}$, that we introduce next track whether the state $\{\hat X^f_n = 0\}$ or the state $\{\hat Y^f_n = 0\}$ is visited exclusively after the $k$th visit to state $(a, b)$. More formally,

$$I^{fx}_k = \begin{cases} 1, & \text{if } \hat X^f_n > 0,\ \forall n = N^f_k, N^f_k + 1, \ldots, T^f_{J^f_k} - 1, \\ 0, & \text{otherwise}. \end{cases} \tag{21}$$

$$I^{fy}_k = \begin{cases} 1, & \text{if } \hat Y^f_n > 0,\ \forall n = N^f_k, N^f_k + 1, \ldots, T^f_{J^f_k} - 1, \\ 0, & \text{otherwise}. \end{cases} \tag{22}$$

From Eq. (20), it can be seen that for any fixed $k$, at most one of $I^{fx}_k$ and $I^{fy}_k$ will be equal to 1. If both states $\{\hat X^f_n = 0\}$ and $\{\hat Y^f_n = 0\}$ are visited, then these random variables will both be 0.

Define the sequence of random variables $\{G^f_k\}$, $k \in \mathbb{N}$, as:

$$G^f_k = \sum_{i=1}^{k} H^f_i. \tag{23}$$

So $G^f_k$ counts the number of times that the state $(a, b)$ is visited in the first $k$ cycles, viz. in the time interval $\{0, \ldots, T^f_k\}$. Since $H^f_i \geq 1$ for $i \neq 0$, we know that $G^f_k$ is a positive, strictly increasing integer sequence.
4.2. Estimates used by the dynamic policy
We are now in a position to define $\{\tilde\delta^f_k(a, b)\}$, $k \in \mathbb{N}_0$, and $\{\tilde\gamma^f_k(a, b)\}$, $k \in \mathbb{N}_0$, the estimates for $\delta(a, b)$ and $\gamma(a, b)$, respectively. These are calculated at the end of every cycle as:

$$\tilde\delta^f_0(a, b) = \tilde\gamma^f_0(a, b) = 0,$$
$$\tilde\delta^f_k(a, b) = \frac{\sum_{i=1}^{G^f_k} I^{fx}_i}{G^f_k}, \quad \tilde\gamma^f_k(a, b) = \frac{\sum_{i=1}^{G^f_k} I^{fy}_i}{G^f_k}, \quad k \in \mathbb{N}. \tag{24}$$

Since $G^f_k \geq 1$, $\forall k \in \mathbb{N}$, the estimates are well defined. Thus, we have expressed the estimates as a time average, or Cesàro sum, of a sequence of random variables.
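To make the bookkeeping of Eqs. (14)–(24) concrete, the following sketch scans a given sample path of the overflow process, forms the cycles for a fixed $(a, b)$, and returns the two estimates of Eq. (24) (written here as a `(delta, gamma)` pair for the two taboo probabilities). It is an offline illustration of what the policy computes online, and the example path is hand-made:

```python
def estimate_taboo(path, a, b):
    """Computes (delta-tilde, gamma-tilde) of Eq. (24) from a list of
    overflow states (x, y). Each visit to (a, b) inside a cycle opens an
    indicator window (Eqs. (21)-(22)); all open windows are settled when
    the cycle ends, i.e. when the total first reaches a + b + 1."""
    target, top = (a, b), a + b + 1
    in_cycle = False
    x_ok, y_ok = [], []            # per-window "coordinate stayed > 0" flags
    ix_sum = iy_sum = visits = 0   # running sums of I^fx, I^fy and G
    for x, y in path:
        if not in_cycle:
            if (x, y) != target:   # outside the {S_k, ..., T_k} cycles
                continue
            in_cycle = True        # an S_k: a new cycle starts
        if x + y == top:           # a T_k: settle every open window
            ix_sum += sum(x_ok)
            iy_sum += sum(y_ok)
            visits += len(x_ok)
            x_ok, y_ok, in_cycle = [], [], False
            continue
        if (x, y) == target:       # a visit to (a, b): open a window
            x_ok.append(True)
            y_ok.append(True)
        x_ok = [ok and x > 0 for ok in x_ok]
        y_ok = [ok and y > 0 for ok in y_ok]
    return (0.0, 0.0) if visits == 0 else (ix_sum / visits, iy_sum / visits)

# Two hand-made cycles for (a, b) = (1, 1): in the first, Y-hat hits 0
# before the total reaches 3; in the second, X-hat does.
path = [(2, 0), (1, 1), (1, 0), (2, 1), (0, 1), (1, 1), (0, 2), (1, 2)]
assert estimate_taboo(path, 1, 1) == (0.5, 0.5)
```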
4.3. The decision made by the dynamic policy
When the overflow state is one of $\{(0, K+1), (1, K), (K+1, 0), (K, 1)\}$, the actions of the (static) optimal policy, $f^*$, are fixed. In these cases, our dynamic policy follows the actions of the optimal policy. In the other two states, $\{(2, K-1), (K-1, 2)\}$, it uses estimates as shown in the following equations:

If $(\hat X^f_{\sigma^f_k}, \hat Y^f_{\sigma^f_k}) = (2, K-1)$, then

$$(X^f_{\sigma^f_k}, Y^f_{\sigma^f_k}) = \begin{cases} (1, K-1), & \text{if } \tilde\gamma^f_{C^f_{\sigma^f_k}}(1, K-2) < \tilde\delta^f_{C^f_{\sigma^f_k}}(1, K-2), \\ (2, K-2), & \text{otherwise}. \end{cases}$$

If $(\hat X^f_{\sigma^f_k}, \hat Y^f_{\sigma^f_k}) = (K-1, 2)$, then

$$(X^f_{\sigma^f_k}, Y^f_{\sigma^f_k}) = \begin{cases} (K-1, 1), & \text{if } \tilde\delta^f_{C^f_{\sigma^f_k}}(K-2, 1) < \tilde\gamma^f_{C^f_{\sigma^f_k}}(K-2, 1), \\ (K-2, 2), & \text{otherwise}. \end{cases}$$

$\tilde\delta^f_k$ and $\tilde\gamma^f_k$ are updated at the end of every cycle, i.e., at the times $T^f_k$, $k \in \mathbb{N}$ (one of the times that the state $\{\hat X^f_n + \hat Y^f_n = K\}$ is visited). However, the policy itself uses these estimates only at loss instants. We know the $k$th one occurs at time $\sigma^f_k$, $k \in \mathbb{N}$, which is also the $k$th time that the state $\{\hat X^f_n + \hat Y^f_n = K + 1\}$ is visited. Since by Lemma 2 we know that during a cycle the total number of cells in the buffer (the value of $\hat X^f_n + \hat Y^f_n$) cannot exceed its value at the beginning, cell loss can only take place between cycles, not during a cycle. And since at the time of the $k$th loss, $\sigma^f_k$, there have been $C^f_{\sigma^f_k}$ complete cycles, the estimates used by the policy at this time are $\tilde\delta^f_{C^f_{\sigma^f_k}}$ and $\tilde\gamma^f_{C^f_{\sigma^f_k}}$. (An example of a sequence of estimates that policy $f$ might use is $\tilde\delta^f_0, \tilde\delta^f_0, \tilde\delta^f_1, \tilde\delta^f_3, \tilde\delta^f_7, \tilde\delta^f_7, \ldots$.) In order to relate our policy $f$ to the static optimal policy, $f^*$, we will first show that:

$$\lim_{k \to \infty} \tilde\delta^f_{C^f_{\sigma^f_k}}(a, b) = \delta(a, b), \quad \lim_{k \to \infty} \tilde\gamma^f_{C^f_{\sigma^f_k}}(a, b) = \gamma(a, b). \tag{25}$$

This has to be shown for both $(a, b) = (K-2, 1)$ and $(1, K-2)$. It should be remembered that for each value of $(a, b)$, a different set of cycles and variables as defined in Section 4.1 is obtained. In the next section, we deal with the convergence for the general case $(a, b) \in S \setminus \{(0, 0)\}$.
5. Proof of convergence
To prove Eq. (25), we prove the stronger result

$$\lim_{k \to \infty} \tilde\delta^f_k(a, b) = \delta(a, b), \quad \lim_{k \to \infty} \tilde\gamma^f_k(a, b) = \gamma(a, b). \tag{26}$$

First we note that $\{\sigma^f_k\}$ is a strictly increasing sequence with $\lim_{k \to \infty} \sigma^f_k = \infty$. This follows from the facts that the set $\{\hat X^f_n + \hat Y^f_n > K\}$ is visited infinitely often (see Lemma 4) and that, by Lemma 1, there is at most one cell loss in any time slot. Since $\{C^f_k\}$ is a non-decreasing integer sequence, so is $\{C^f_{\sigma^f_k}\}$. Therefore, $\{\tilde\delta^f_{C^f_{\sigma^f_k}}\}$ is composed of a subsequence of $\{\tilde\delta^f_k\}$ in which certain terms are repeated, but only a finite number of times. The same relation applies to $\{\tilde\gamma^f_{C^f_{\sigma^f_k}}\}$ and $\{\tilde\gamma^f_k\}$.
From Eq. (24), Eq. (26) can be restated as the following:

$$\lim_{k \to \infty} \frac{\sum_{i=1}^{G^f_k} I^{fx}_i}{G^f_k} = \delta(a, b), \quad \lim_{k \to \infty} \frac{\sum_{i=1}^{G^f_k} I^{fy}_i}{G^f_k} = \gamma(a, b). \tag{27}$$

Since $\{I^{fx}_k\}$ and $\{I^{fy}_k\}$ are not iid sequences, we show that they are regenerative in order to prove convergence of their time averages. However, both processes are defined upon the $\{\hat X^f_n, \hat Y^f_n\}$ process, and since the decisions taken by the dynamic policy at times $\sigma^f_k$, $k \in \mathbb{N}$, depend upon the entire past of the process, $\{I^{fx}_k\}_{k=1}^{\infty}$ and $\{I^{fy}_k\}_{k=1}^{\infty}$ are not necessarily regenerative.

We first extract a regenerative (and Markov) process, $\{\tilde X^f_n, \tilde Y^f_n\}$, from $\{\hat X^f_n, \hat Y^f_n\}$ by concatenating all the $\{S^f_k, \ldots, T^f_k\}$ cycles (or deleting the $\{T^f_k + 1, \ldots, S^f_{k+1} - 1\}$ periods). Then we show that the $I^{fx}_k$ and $I^{fy}_k$ variables remain unchanged if redefined on this new process. This enables us to use properties of Markov chains to prove the required regenerative properties.
5.1. The $\{\tilde X^f_n, \tilde Y^f_n\}$ process

This process is defined as:

$$\tilde X^f_n = \hat X^f_{\eta^f_n}, \quad \tilde Y^f_n = \hat Y^f_{\eta^f_n}, \quad n \in \mathbb{N}_0. \tag{28}$$

$\eta^f_n$ can be explained in the following way: if a counter were started at 0, incrementing in every time slot that occurs within an $\{S^f_k, \ldots, T^f_k\}$ cycle of the $\{\hat X^f_n, \hat Y^f_n\}$ process, then $\eta^f_n$ would be the value of time at which the counter equalled $n$. An example is given in Fig. 3. $\eta^f_n$ is defined formally as follows: set $\eta^f_0 = S^f_1$, and

$$\eta^f_n = \begin{cases} \eta^f_{n-1} + 1, & \text{if } \hat X^f_{\eta^f_{n-1}} + \hat Y^f_{\eta^f_{n-1}} \neq a + b + 1, \\ \min\{i > \eta^f_{n-1}: (\hat X^f_i, \hat Y^f_i) = (a, b)\}, & \text{otherwise}. \end{cases} \tag{29}$$

Lemma 6. $\{\eta^f_n\}$ is a sequence of stopping times on $\{\hat X^f_n, \hat Y^f_n\}$.

Proof. By induction. Since $S^f_1$ is a stopping time, so is $\eta^f_0$. Suppose that $\eta^f_{n-1}$ is a stopping time. From the definition, it can be seen that the event $\{\eta^f_n \leq i\}$ can be determined by observing the values of $\{\hat X^f_m, \hat Y^f_m\}$, $\eta^f_{n-1} \leq m \leq i$. Therefore, $\eta^f_n$ is a stopping time on $\{\hat X^f_n, \hat Y^f_n\}$. $\square$
5.2. The Markov property of $\{\tilde X^f_n, \tilde Y^f_n\}$

Now we use the property of $\{\hat X^f_n, \hat Y^f_n\}$ stated in Lemma 3 to show that $\{\tilde X^f_n, \tilde Y^f_n\}$ is a Markov chain.
Lemma 7. The process $\{\tilde X^f_n, \tilde Y^f_n\}$ is a time-homogeneous Markov chain with state space $\tilde S = \{(x, y): x, y \geq 0,\ x + y \leq a + b + 1\}$ and one-step transition probability from state $(x, y)$ to state $(\bar x, \bar y)$ denoted by $\tilde p((x, y), (\bar x, \bar y))$, given in Eqs. (30) and (31).

Proof. Consider the transition from state $(\tilde X^f_{n-1}, \tilde Y^f_{n-1}) = (x_{n-1}, y_{n-1})$ to state $(\tilde X^f_n, \tilde Y^f_n) = (x_n, y_n)$. For $x_{n-1} + y_{n-1} = a + b + 1$, from Eq. (14), we know that $n - 1$ is the end of some cycle. So $(\tilde X^f_n, \tilde Y^f_n)$ can only be $(a, b)$; i.e., whenever the process reaches the state $\tilde X^f_n + \tilde Y^f_n = a + b + 1$, it is forced to return to state $(a, b)$ in the next time slot. Also, from Lemma 2, $\hat X^f_n + \hat Y^f_n \leq a + b + 1$ for all $n$ that occur within an $\{S^f_k, \ldots, T^f_k\}$ cycle. Therefore, $\tilde X^f_n + \tilde Y^f_n \leq a + b + 1$, $\forall n$.

To write out the one-step transition probabilities, we consider two cases:

• $x_{n-1} + y_{n-1} = a + b + 1$. For $(x, y), (\bar x, \bar y) \in \tilde S$ and $x + y = a + b + 1$ we define

$$\tilde p((x, y), (\bar x, \bar y)) = \begin{cases} 1, & \text{if } (\bar x, \bar y) = (a, b), \\ 0, & \text{if } (\bar x, \bar y) \neq (a, b). \end{cases} \tag{30}$$

We see that:

$$P\{(\tilde X^f_n, \tilde Y^f_n) = (x_n, y_n) \mid (\tilde X^f_{n-1}, \tilde Y^f_{n-1}) = (x_{n-1}, y_{n-1})\} = \tilde p((x_{n-1}, y_{n-1}), (x_n, y_n)).$$

From Eqs. (28) and (29),

$$P\{(\tilde X^f_n, \tilde Y^f_n) = (x_n, y_n) \mid (\tilde X^f_{n-1}, \tilde Y^f_{n-1}) = (x_{n-1}, y_{n-1}), \ldots, (\tilde X^f_0, \tilde Y^f_0) = (x_0, y_0)\} = \tilde p((x_{n-1}, y_{n-1}), (x_n, y_n)).$$

• $x_{n-1} + y_{n-1} < a + b + 1$.

$$P\{(\tilde X^f_n, \tilde Y^f_n) = (x_n, y_n) \mid (\tilde X^f_{n-1}, \tilde Y^f_{n-1}) = (x_{n-1}, y_{n-1})\} = P\{(\hat X^f_{\eta^f_n}, \hat Y^f_{\eta^f_n}) = (x_n, y_n) \mid (\hat X^f_{\eta^f_{n-1}}, \hat Y^f_{\eta^f_{n-1}}) = (x_{n-1}, y_{n-1})\}.$$
Fig. 3. The new process.
Since $\hat X^f_{\eta^f_{n-1}} + \hat Y^f_{\eta^f_{n-1}} = x_{n-1} + y_{n-1} < a + b + 1$, from Eq. (29), $\eta^f_n = \eta^f_{n-1} + 1$. The left-hand side of the above equation can instead be written as:

$$P\{(\hat X^f_{\eta^f_{n-1}+1}, \hat Y^f_{\eta^f_{n-1}+1}) = (x_n, y_n) \mid (\hat X^f_{\eta^f_{n-1}}, \hat Y^f_{\eta^f_{n-1}}) = (x_{n-1}, y_{n-1})\}.$$

Now, by Lemma 6, $\eta^f_{n-1}$ is a stopping time ($\eta^f_{n-1}$ is a stopping time and should not be confused with $\eta^f_n - 1$, which is not) on $\{\hat X^f_n, \hat Y^f_n\}$. Therefore we can apply the property of Lemma 3 to get:

$$P\{(\hat X^f_{\eta^f_{n-1}+1}, \hat Y^f_{\eta^f_{n-1}+1}) = (x_n, y_n) \mid (\hat X^f_{\eta^f_{n-1}}, \hat Y^f_{\eta^f_{n-1}}) = (x_{n-1}, y_{n-1})\} = p((x_{n-1}, y_{n-1}), (x_n, y_n)).$$

For $(x, y), (\bar x, \bar y) \in \tilde S$ and $x + y < a + b + 1$ we define

$$\tilde p((x, y), (\bar x, \bar y)) = p((x, y), (\bar x, \bar y)). \tag{31}$$

Therefore,

$$P\{(\tilde X^f_n, \tilde Y^f_n) = (x_n, y_n) \mid (\tilde X^f_{n-1}, \tilde Y^f_{n-1}) = (x_{n-1}, y_{n-1})\} = \tilde p((x_{n-1}, y_{n-1}), (x_n, y_n)).$$

Again, by using Lemma 3, we can extend the result given above to the case when the entire past of the process is provided:

$$P\{(\tilde X^f_n, \tilde Y^f_n) = (x_n, y_n) \mid (\tilde X^f_{n-1}, \tilde Y^f_{n-1}) = (x_{n-1}, y_{n-1}),\ \{(\tilde X^f_m, \tilde Y^f_m) = (x_m, y_m)\},\ m \leq n - 2\}$$
$$= P\{(\hat X^f_{\eta^f_n}, \hat Y^f_{\eta^f_n}) = (x_n, y_n) \mid (\hat X^f_{\eta^f_{n-1}}, \hat Y^f_{\eta^f_{n-1}}) = (x_{n-1}, y_{n-1}),\ \{(\hat X^f_{\eta^f_m}, \hat Y^f_{\eta^f_m}) = (x_m, y_m)\},\ m \leq n - 2\}$$
$$= P\{(\hat X^f_{\eta^f_n}, \hat Y^f_{\eta^f_n}) = (x_n, y_n) \mid (\hat X^f_{\eta^f_{n-1}}, \hat Y^f_{\eta^f_{n-1}}) = (x_{n-1}, y_{n-1})\}$$
$$= \tilde p((x_{n-1}, y_{n-1}), (x_n, y_n)). \quad \square$$
5.3. The measurements on the new process
Define variables $\tilde S^f_k, \tilde T^f_k, \tilde N^f_k$, $k \in \mathbb N_0$, on the $\{\tilde X^f_n, \tilde Y^f_n\}$ process in the same manner that $S^f_k, T^f_k, N^f_k$ were defined on the $\{\hat X^f_n, \hat Y^f_n\}$ process in Section 4.1:
$$\tilde S^f_0 = 0, \quad \tilde T^f_0 = 0, \quad \tilde N^f_0 = 0,$$
$$\tilde N^f_k = \min\{n \ge \tilde N^f_{k-1} : (\tilde X^f_n, \tilde Y^f_n) = (a, b)\},$$
$$\tilde S^f_k = \min\{n \ge \tilde T^f_{k-1} : (\tilde X^f_n, \tilde Y^f_n) = (a, b)\},$$
$$\tilde T^f_k = \min\{n > \tilde S^f_k : \tilde X^f_n + \tilde Y^f_n = a + b + 1\}, \quad k \in \mathbb N.$$
Lemma 8. The realizations of the random variables $H^f_k$, $J^f_k$, $I^{fx}_k$, $I^{fy}_k$ and $G^f_k$ are unchanged if the stochastic process they are defined on is changed from $\{\hat X^f_n, \hat Y^f_n\}$ to $\{\tilde X^f_n, \tilde Y^f_n\}$.

The proof of the above lemma is trivial and hence omitted.
Finally, we come to the results for which we had to define a new process composed of the cycles and show that it is a Markov Chain.
Lemma 9. The random variables $\{I^{fx}_k\}$ (respectively $\{I^{fy}_k\}$), $\forall k \in \mathbb N$, have identical distribution, given by $P\{I^{fx}_k = 1\} = \pi(a, b)$ (respectively $P\{I^{fy}_k = 1\} = \psi(a, b)$).
Proof. $\tilde N^f_k$ is the $k$th return time of the $\{\tilde X^f_n, \tilde Y^f_n\}$ process to the state $(a, b)$. Therefore, $(\tilde X^f_{\tilde N^f_k}, \tilde Y^f_{\tilde N^f_k}) = (a, b)$, $\forall k \in \mathbb N$. The stochastic process $\{\tilde X^f_n, \tilde Y^f_n\}$, which is a Markov Chain by Lemma 7, regenerates at all $\{\tilde N^f_k\}$ instants by the Strong Markov Property. Also, we know that $\tilde N^f_1 = 0$, i.e., $(\tilde X_0, \tilde Y_0) = (a, b)$. Therefore, from [1], $\{\tilde X^f_n, \tilde Y^f_n\}$, $n \ge \tilde N^f_k$, is stochastically equivalent to (has the same joint distribution as) $\{\tilde X^f_n, \tilde Y^f_n\}$, $n \in \mathbb N_0$.

Consider the process $\{I^{fx}_k\}$, $k \in \mathbb N$, defined on the $\{\tilde X^f_n, \tilde Y^f_n\}$ process. $I^{fx}_k$ is a function of the post-$\tilde N^f_k$ process, i.e., of $\{\tilde X^f_n, \tilde Y^f_n\}$, $n \ge \tilde N^f_k$, which has a common joint distribution for all $k$. Therefore, for any $k \in \mathbb N$, the random variables $I^{fx}_k$ too have a common marginal distribution, which by definition is
$$\begin{aligned}
P\{I^{fx}_k = 1\} &= P\{\tilde X^f_n > 0, \ \forall n = \tilde N^f_k, \tilde N^f_k + 1, \ldots, \tilde T^f_{J^f_k} - 1\} \\
&= P\{\tilde X^f_n > 0, \ \forall n = \tilde N^f_1, \tilde N^f_1 + 1, \ldots, \tilde T^f_{J^f_1} - 1\} \\
&= P\{\tilde X^f_n > 0, \ \forall n = 0, 1, \ldots, \tilde T^f_1 - 1\} \\
&= P\{\hat X^f_n > 0, \ \forall n = S^f_1, \ldots, T^f_1 - 1\} = \pi(a, b).
\end{aligned}$$
The last equality follows from the definition of $\pi(a, b)$ in Eq. (13) and a trivial extension of Lemma 3. Similarly, we can show that $P\{I^{fy}_k = 1\} = \psi(a, b)$, $\forall k \in \mathbb N$. $\square$
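Concretely, the indicator $I^{fx}_k$ checks whether the class-$x$ occupancy stays positive from the $k$th visit to $(a, b)$ until just before the enclosing cycle ends. The sketch below is hypothetical: the function name, trajectory, and window indices are invented for illustration.

```python
# Hypothetical sketch: I_k^x is 1 iff the class-x occupancy stays positive
# over the window from the k-th visit up to (but excluding) the cycle end.

def indicator_x(path, visit_n, cycle_end_t):
    """1 if x_n > 0 for all n = visit_n, ..., cycle_end_t - 1, else 0."""
    return int(all(path[n][0] > 0 for n in range(visit_n, cycle_end_t)))

# Invented trajectory: x empties at n = 1, so the first window's indicator is 0.
path = [(1, 2), (0, 3), (1, 3), (1, 2), (2, 2)]
i_first = indicator_x(path, 0, 2)  # x drops to 0 inside the window
i_later = indicator_x(path, 3, 5)  # x stays positive throughout
```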
Lemma 10. The random processes $\{I^{fx}_k\}$ and $\{I^{fy}_k\}$, $k \in \mathbb N$, regenerate at all $G_k + 1$ instants.
Proof. It can easily be seen that the indicator variables $I^{fx}_k$ are not independent. However, from the definition of $I^{fx}_k$, its dependence on the $\{\tilde X^f_n, \tilde Y^f_n\}$ process starts and ends with the cycle in which the $k$th visit takes place. Therefore, if we show that the cycles are independent, then indicator variables for visits in different cycles are independent.
From its definition, $\tilde S^f_k$ is a strictly increasing sequence of stopping times on the Markov Chain $\{\tilde X^f_n, \tilde Y^f_n\}$, with the property that
$$(\tilde X^f_{\tilde S^f_k}, \tilde Y^f_{\tilde S^f_k}) = (a, b), \quad k \in \mathbb N.$$
Therefore, by the corollary to Theorem 13.3 of [3], the collections $\{\tilde X^f_n, \tilde Y^f_n\}$, $\tilde S^f_k \le n \le \tilde T^f_k$, $k \in \mathbb N$, are independent.
Therefore, the beginning of every cycle in the $\{\tilde X^f_n, \tilde Y^f_n\}$ process marks a regeneration in the indicator process. At $n = \tilde S^f_{k+1}$, the beginning of the $(k+1)$st cycle, there have been $G_k + 1$ visits to the state $(a, b)$. Therefore, the process $\{I^{fx}_k\}$ regenerates at all $G_k + 1$ instants. The same is true of the $\{I^{fy}_k\}$ process. $\square$
Our main result is the following:
Theorem 11. The estimates $\tilde\pi^f_k(a, b)$ and $\tilde\psi^f_k(a, b)$ converge to $\pi(a, b)$ and $\psi(a, b)$ respectively, wp 1, i.e.,
$$\lim_{k \to \infty} \tilde\pi^f_k(a, b) = \pi(a, b), \qquad \lim_{k \to \infty} \tilde\psi^f_k(a, b) = \psi(a, b).$$
Moreover, the average expected loss incurred under the dynamic policy $f$ converges to that incurred by the optimal static policy $f^*$, i.e.,
$$\lim_{n \to \infty} \frac{1}{n} \sum_{k=1}^{n} E[L^f_k] = \lim_{n \to \infty} \frac{1}{n} \sum_{k=1}^{n} E[L^{f^*}_k].$$
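The second claim rests on a Cesàro-averaging observation: once the per-slot expected losses under the two policies agree beyond some finite time, their running averages share the same limit. A minimal numeric sketch, in which the loss values and the cutoff $k_2 = 50$ are invented for illustration:

```python
# Invented numeric sketch of the Cesaro-averaging step: two per-slot expected
# loss sequences that coincide after a finite cutoff k2 have running averages
# with the same limit; the transient contributes O(k2/n) and vanishes.

def cesaro(seq_fn, n):
    """Running average (1/n) * sum_{k=1}^{n} seq_fn(k)."""
    return sum(seq_fn(k) for k in range(1, n + 1)) / n

k2 = 50
f_loss = lambda k: 1.0 if k <= k2 else 0.2   # dynamic policy: transient, then agrees
fstar_loss = lambda k: 0.2                   # optimal static policy: constant
gap = abs(cesaro(f_loss, 100_000) - cesaro(fstar_loss, 100_000))
```

Here the residual gap is $0.8 \cdot k_2 / n = 0.0004$, which shrinks to zero as $n$ grows.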
Proof. Since by Lemma 10 the processes $\{I^{fx}_k\}$, $k \in \mathbb N$, and $\{I^{fy}_k\}$, $k \in \mathbb N$, are regenerative, and by Lemma 9 their distribution is known, we can apply the theorem on convergence of time-averages for regenerative processes from [8]. Therefore,
$$\lim_{k \to \infty} \tilde\pi^f_k(a, b) = \lim_{k \to \infty} \frac{\sum_{i=1}^{G^f_k} I^{fx}_i}{G^f_k} = E[I^{fx}_k] = \pi(a, b).$$
Similarly,
$$\lim_{k \to \infty} \tilde\psi^f_k(a, b) = \lim_{k \to \infty} \frac{\sum_{i=1}^{G^f_k} I^{fy}_i}{G^f_k} = E[I^{fy}_k] = \psi(a, b).$$
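The time-average estimator above can be exercised on a surrogate: for iid Bernoulli($\pi$) indicator values, the running fraction of ones converges to $\pi$ by the strong law of large numbers. The parameters below ($\pi = 0.7$, the seed, and the number of visits) are invented for illustration.

```python
# Surrogate check (invented parameters): the running fraction of ones among
# iid Bernoulli(pi) indicators -- the analogue of the estimate pi~_k(a, b) --
# concentrates around pi as the number of observed visits grows.
import random

random.seed(1)
pi = 0.7
G = 100_000  # number of observed visits (illustrative)
indicators = [1 if random.random() < pi else 0 for _ in range(G)]
estimate = sum(indicators) / G  # analogue of pi~_G(a, b)
```

With $10^5$ samples the standard deviation of the estimate is about $0.0014$, so it sits well within $0.01$ of $\pi$.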
For the second result, we note from Section 4.3 that the decision made by the dynamic policy is determined by the sign of the difference $\tilde\pi^f_k(a, b) - \tilde\psi^f_k(a, b)$. If this sign is the same as that of $\pi(a, b) - \psi(a, b)$ (for both $(a, b) = (2, K-1)$ and $(a, b) = (K-1, 2)$), then the dynamic policy makes the same decision as the optimal static policy, $f^*$.
For any one value of $(a, b)$, say $(a, b) = (2, K-1)$, assume without loss of generality that the sign of $\pi(a, b) - \psi(a, b)$ is positive. Then, since $\tilde\pi^f_k(a, b)$ converges to $\pi(a, b)$ and $\tilde\psi^f_k(a, b)$ to $\psi(a, b)$, there must be a positive integer time, say $k_0$, such that
$$|\tilde\pi^f_k(a, b) - \pi(a, b)| < \frac{\pi(a, b) - \psi(a, b)}{2}, \quad \forall k > k_0,$$
$$|\tilde\psi^f_k(a, b) - \psi(a, b)| < \frac{\pi(a, b) - \psi(a, b)}{2}, \quad \forall k > k_0.$$
Therefore, we get that $\tilde\pi^f_k(a, b) - \tilde\psi^f_k(a, b) > 0$, $\forall k > k_0$, i.e., $\tilde\pi^f_k(a, b) - \tilde\psi^f_k(a, b)$ has the same sign as $\pi(a, b) - \psi(a, b)$ after time $k_0$. A similar relationship can be proved for the other value, $(a, b) = (K-1, 2)$. Let the time after which that result holds be $k_1 \in \mathbb N_0$, and let $k_2$ be the maximum of $k_0$ and $k_1$. We can see that after time $k_2$, the dynamic policy, $f$, starts taking the same decisions as the optimal policy, $f^*$, with the result
that $\{\hat X^f_n, \hat Y^f_n\}$, $n \ge k_2$, has the same finite-dimensional joint probability mass function as $\{\hat X^{f^*}_n, \hat Y^{f^*}_n\}$, $n \in \mathbb N_0$. This fact proves the second claim of the theorem. $\square$
6. Conclusions
In this paper, we considered the problem of minimizing average expected cell losses in an ATM shared buffer switch with two input and two output ports. Based on a static policy, we defined a dynamic buffer management policy that does not require knowledge of the pmf of the input arrival process. We have shown that when the input processes to the switch are iid Bernoulli (with unknown parameters), the estimates used by the dynamic policy converge to the parameters used by the optimal static policy; therefore, the cost incurred by the dynamic policy converges to that incurred by the optimal policy, proving that the dynamic policy is also optimal.
The main difficulty in this work lay first in identifying the estimates to be used, and then in proving their regenerative properties. However, the dynamic policy itself is simple to implement and has the advantage over the optimal static policy that it does not need to know the pmf of the arrival process, information that is unlikely to be available in a real network. Arrival processes to switches in real networks may not always be iid Bernoulli. When these processes are modeled by Discrete Batch Markov Arrival Processes [2], the optimal static policy has been identified in [7] to have a form similar to that of the optimal static policy for the iid Bernoulli case. This leads us to believe that a dynamic policy similar to the one presented in this paper can be proven to be optimal as well. However, this work is left for the future.
References
[1] R. Bhattacharya, E. Waymire, Stochastic Processes and Applications, Wiley, New York, 1990.
[2] C. Blondia, O. Casals, Performance analysis of statistical multiplexing of VBR sources, in: Proc. IEEE Infocom, Florence, Italy, May 1992, Vol. 2, pp. 828–838 (6C.2).
[3] K.-L. Chung, Markov Chains with Stationary Transition Probabilities, Springer, Berlin, 1960.
[4] M.G. Hluchyj, M.J. Karol, Queueing in high-performance packet switching, IEEE J. Selected Areas Commun. 6 (9) (1988) 1587–1597.
[5] F. Kamoun, L. Kleinrock, Analysis of shared finite storage in a computer network node environment under general traffic conditions, IEEE Trans. Commun. 28 (1980) 992–1003.
[6] S. Karlin, H.M. Taylor, A First Course in Stochastic Processes, 2nd ed., Academic Press, San Diego, CA, 1975.
[7] S. Sharma, Optimal buffer management in shared buffer ATM switches, Ph.D. Thesis, North Carolina State University, Raleigh, NC, 1996.
[8] R.W. Wolff, Stochastic Modeling and the Theory of Queues, Prentice-Hall, Englewood Cliffs, NJ, 1989.
Supriya Sharma received a B.E. from VRCE, Nagpur, India, in '91, an M.S. from Clemson Univ., S.C., in '93, and a Ph.D. from NCSU, Raleigh, N.C., in '96. She currently works in the Strategic Network Planning group in Alcatel USA. Her research interests are data network architecture, traffic management, quality of service, and performance analysis of networking software. Supriya was awarded the National Merit Scholarship from the Govt. of India from '86 to '91, the Motorola Scholarship in '91, and an Internet Achievement Award for her work on Patricia trees at IBM, RTP, in '97. Her e-mail address is [email protected]
Yannis Viniotis received his Ph.D. from the University of Maryland, College Park, in 1988 and is currently Associate Professor of Electrical and Computer Engineering at North Carolina State University. Dr. Viniotis is the author of over fifty technical publications, including an engineering textbook on probability and random processes (published by McGraw-Hill, 1998). He has served as the guest editor of a special issue of the Performance Evaluation Journal and as the co-editor of the proceedings of two international conferences in computer networking. His research interests include quality of service issues in high speed networks, multicast routing, traffic management, and design and analysis of stochastic algorithms. Dr. Viniotis is the Associate Director of the Masters in Computer Networking Program, a program offered jointly by the Departments of Electrical and Computer Engineering, Computer Science, and the School of Business at NCSU; he is also the co-founder of a startup networking company in Research Triangle Park that specializes in ASIC implementation of integrated QoS solutions for IP and ATM networks.