

Page 1: [IEEE 2011 IEEE 19th International Workshop on Quality of Service (IWQoS) - San Jose, CA, USA (2011.06.6-2011.06.7)] 2011 IEEE Nineteenth IEEE International Workshop on Quality of

Multi-Tiered, Burstiness-Aware Bandwidth Estimation and Scheduling for VBR Video Flows

Ritesh Kalle
IIIT, Bangalore, India

UmaMaheswari C. Devi, Shivkumar Kalyanaraman
IBM Research – India, Bangalore, India

Abstract: The increasing demand for high-quality streaming video delivered to mobile clients necessitates efficient bandwidth utilization and allocation at not only the wireless channel but also the wired backhaul of broadband wireless networks. In this context, we propose techniques for increasing the link utilization and enhancing the quality-of-experience (QoE) for end users while multiplexing video streams over a wired link. For increasing the link utilization, we present a generic multi-tiered bandwidth estimation and scheduling scheme that can guarantee lower bounds on loss for flows at lower tiers. This scheme can be used for supporting heterogeneous loss classes, differentiated losses for different layers of video streams, or per-flow guarantees using lower aggregate bandwidth than schemes proposed in the literature. For enhancing the end-user QoE, we present a scheme for minimizing correlated losses and improving the smoothness of video quality by minimizing the maximum loss suffered by any logical unit of a stream and also the variability in loss across the length of the stream. In simulations performed using MPEG-4 sources, our loss-minimization approach could lower the maximum loss by a factor of five and the loss variance by more than an order of magnitude. Our multi-tiered scheme could lower the estimated bandwidth and improve statistical multiplexing gains by 10–20% with three classes, 5–20% with two classes, and over 30% in the context of providing deterministic per-flow guarantees.

I. INTRODUCTION

A meteoric rise in video traffic comprising both live and stored video over wireless and mobile networks has been predicted for the immediate future [1]. Delivering the increasing wireless streaming traffic while also meeting end-user expectations on quality is expected to significantly increase the stress on the wired backhaul of wireless broadband networks. Wireless network operators are therefore seeking to deploy optimizations that can ease this pressure. In this context, we consider efficient bandwidth estimation and multiplexing schemes to increase the utilization over wired links and improve QoE to end users.

Digital video streams are typically variable-bit-rate (VBR) encoded due to the higher quality, lower delay, and lower bandwidth advantages of VBR over constant-bit-rate (CBR) encoding [2]. VBR video however is, in general, extremely bursty, with burstiness extending over multiple time scales (long-range dependence) that, when transported over a network, cannot in general be smoothed without introducing significant delay.

Video streams often differ in the QoS they require due to, e.g., differences in the service cost, encoding type, etc. Similarly, different segments or layers of a video stream can require different levels of QoS. For example, end-user QoE is more sensitive to loss of intra-coded I frames than the other two types of frames. One common approach for supporting heterogeneous QoS (e.g., different loss rates) to different classes of streams or sub-streams is to completely isolate them by classes and provide per-class reservations. However, since VBR video is bursty, and burstiness is only lowered but not eliminated by multiplexing, there is bound to be considerable slack in each of the classes. With strict per-class reservations, this slack is, in general, not reclaimed in a predictable way, leading to resource under-utilization. A second approach for scheduling heterogeneous classes is through the use of static-priority algorithms, which has been considered in [3] and [4]. Since strict prioritization penalizes lower-priority classes, the bandwidth needed to meet QoS requirements for heterogeneous classes tends to be higher with static-priority scheduling than if there were some isolation among the classes.

Contributions: In this paper, we first propose a bandwidth estimation and scheduling scheme that seeks a middle ground between complete isolation and strict prioritization for providing differentiated stochastic guarantees on loss to different classes of flows or segments of a flow. The proposed scheme provides reservations for subsets of classes and reclaims slack in a predictable manner. The scheme is multi-tiered, with the QoS classes arranged, in order, from the most stringent to the least stringent. Bandwidth estimation and scheduling proceeds upwards from the most-stringent QoS class at the lower-most tier to the least-stringent class. A schematic of the proposed approach is shown in Fig. 1(b). At each tier or level $i$, the aggregate reservation $B_i$ needed for all the classes up to that level is determined. During runtime scheduling, slack from the lower levels is reclaimed so that the QoS of all the classes can be met with less aggregate bandwidth. The above scheme can also be used to realize more efficiently the two-tier reservation framework proposed recently in [5] for providing deterministic per-flow bandwidth guarantees. Apart from lowering the total bandwidth needs, the proposed scheme can also help isolate less-bursty flows from flows with higher burstiness.

As a second contribution, we consider smoothing the video quality by minimizing correlated losses (i.e., loss clumps) that are confined to a sequence of frames. Since VBR video exhibits rate variability even with multiplexing and smoothing, burst spikes in the incoming traffic, and hence, losses, are not uncommon when the QoS guaranteed is stochastic. Incurring heavy losses in a sequence of frames can lead to a huge deterioration in quality for the end user in those frames. To alleviate such deterioration, we present a scheme that actively identifies future bursts and streams ahead bursty sections early on by imposing small losses in the earlier frames. Graceful degradation to end-user QoE can be achieved by ensuring that the imposed losses are to the least valuable sections only.

978-1-4577-0103-0/11/$26.00 © 2011 IEEE

Fig. 1: Schematic representations of the (a) per-class reservation and (b) proposed multi-tier bandwidth estimation and scheduling approaches. $b_i$ denotes the bandwidth reserved for Class $i$ under the per-class reservation approach, while $B_i$, the cumulative bandwidth reserved for classes $1, \ldots, i$, for all $i$, under the multi-tiered approach.

It should be noted that the complexity of scheduling with our proposed approaches is higher than simpler schemes such as FIFO. However, the advent of workload and network-optimized wire-speed processors and the planned/projected improvement to their processing abilities should make it less of a challenge to implement such frameworks that can improve bandwidth utilization and QoE at the cost of a higher computation overhead [6], [7].

Organization: The rest of the paper is organized as follows. Sec. II describes related work. Our system model is presented in Sec. III. Secs. IV and V present and empirically evaluate our proposed multi-tiered bandwidth estimation and scheduling approach and our burstiness-aware scheduling approach to minimize correlated losses, respectively. Sec. VI concludes.

II. RELATED WORK

Bandwidth estimation and scheduling for VBR video has been an active field of research with rich literature. One of the well-studied bandwidth management techniques is smoothing [8], which reduces video burstiness and rate variability by pre-fetching frames into client buffers of non-trivial sizes to improve utilization. Smoothing is mainly applicable for stored video that does not include interactivity. Smoothing of live video is limited by the tolerable latency, which restricts the amount of look-ahead possible [9]. In addition to adding to the latency, smoothed streams are known to continue to exhibit long-term rate variability that leaves open opportunities to exploit statistical multiplexing gains [10]. Closely related to smoothing is work on dynamic bandwidth renegotiation proposed by schemes such as Renegotiated CBR (RCBR) [11] and REnegotiated Deterministic VBR (RED-VBR) [12]. Our proposed approach can be adapted for use in conjunction with smoothing and dynamic renegotiation to lower the aggregate bandwidth needs of those schemes when multiple QoS classes are involved.

There have been studies that show that less-bursty sources incur greater losses than those with larger burstiness when sources with different burstiness are multiplexed [13], and that narrow bandwidth streams are more severely affected than wide bandwidth streams [14]. The methods of this paper can serve to provide isolation among different types of flows while minimizing bandwidth under-utilization.

Techniques for providing differentiated stochastic delay/loss bounds to different classes of VBR flows are limited to providing independent reservations for the various classes or an aggregate reservation that is arrived at by assuming the most stringent of the QoS requirements of the concerned classes [15]. Both of these methods can significantly under-utilize the available bandwidth. As mentioned in Sec. I, multiplexing heterogeneous classes under static-priority scheduling while providing deterministic and stochastic guarantees on delay bounds has been considered in [3] and [4], respectively. Static-priority scheduling tends to under-utilize system resources due to strict and fixed prioritization among the classes that makes it hard to meet the performance requirements of lower-priority classes. It is well known that for periodic workloads, single-resource utilization under static-priority schedulers decreases with the number of distinct priorities, and approaches 69%, in comparison to the 100% utilization bound of dynamic-priority schedulers such as earliest-deadline-first (EDF) [16]. As mentioned in [4], the bounds provided therein on delay are loose, and it should be noted that the empirical evaluation provided in [4] is for a single-class system only.

The video quality smoothing technique that we propose in Sec. V is somewhat related to adaptive streaming. Broadly, adaptive streaming attempts to match the output rate of a stream to the available bandwidth. One class of adaptive streaming approaches adjusts the parameters of a scalable video encoder so that the encoded stream rate is as desired [17], [18], [19]. A second approach considers ordering all the packets within a slot by priority and dropping the least-priority ones during periods of congestion [20]. Our approach can be used when it is not feasible to change the encoded stream and can complement the priority-dropping approach by letting future bursty slots borrow bandwidth from the earlier slots.

Our smoothing technique is also related to collaborative prefetching techniques proposed in [21] and [22] and references therein. Like our approach, collaborative prefetching is also concerned with distributing the spare bandwidth among a group of videos for streaming their future frames, but considers dropping entire frames only. Further, unlike our approach, it does not attempt to reduce the loss variance over the length of a video.

III. SYSTEM MODEL

We consider a video-streaming server streaming video objects (stored or live) over a bandwidth-constrained wired link to a destination node, for further dissemination to end clients. Videos are variable-bit-rate (VBR) encoded but have a constant frame rate. We assume that the streaming system allows multiple but a small number of QoS levels, each specified by a loss ratio, which is the proportion of the total video that may be lost. Our scheme may also be adapted for supporting classes with varying bounds on probability of loss or tolerance to delay. (The probability of loss is the probability with which a video frame can be lost.)

We assume a discrete-time slotted transmission system with an appropriate slot duration, e.g., frame or GoP¹ duration (based on tolerable delay). We assume that the content spanning a slot is available at the beginning of that slot and any part of the content not delivered by the end of its slot is dropped. In other words, multiplexing is buffer-less (but for holding the current slot's contents). Content from multiple sources may be transmitted in a slot.

Each stream's traffic descriptor is given by a discrete marginal distribution of its bandwidth obtained using the histogram method [23]. Following [10], the marginal distribution of a video source is given by a $K$-state discrete random variable with probability distribution given by a set of $K$ $(p_k, r_k)$ pairs. $p_k$ denotes the probability that the stream is in state $k$, and $r_k$, the amount of traffic generated per transmission slot in this state. $p_k$ can be obtained by dividing the peak traffic per slot of the video (i.e., the segment size) into $K$ equi-sized bins, and determining the ratio of the number of segments whose sizes fall in each of the bins to the total number of segments in the video. $r_k$ is given by the "upper limit" of the $k$th bin's range. While the above descriptor can be obtained accurately for stored video, techniques proposed in the context of measurement-based admission control (MBAC) [24] can be used for obtaining approximate estimates of true distributions at run time for live video. A more parsimonious 3-state, 5-parameter model that is an upper bound to any marginal distribution [10] may be used instead as the traffic descriptor, if the computational overhead associated with verbose distributions is a concern.
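As an illustration of the histogram-based descriptor described above, the following is a minimal Python sketch, not the authors' implementation. The function name `histogram_descriptor`, the sample segment sizes, and the choice to assign a segment lying exactly on a bin boundary to the lower bin are all assumptions made for this example; segment sizes are assumed to be positive integers (e.g., bytes per slot).

```python
def histogram_descriptor(segment_sizes, num_bins):
    """Return K = num_bins pairs (p_k, r_k) for one stream.

    The range (0, peak] is split into equal-width bins; r_k is the upper
    limit of bin k and p_k the fraction of segments falling in that bin.
    """
    peak = max(segment_sizes)
    counts = [0] * num_bins
    for size in segment_sizes:
        # 1-based index of the bin (lower, upper] containing this size;
        # integer ceiling division avoids floating-point edge cases.
        k = max(1, -(-size * num_bins // peak))
        counts[k - 1] += 1
    total = len(segment_sizes)
    return [(counts[k] / total, (k + 1) * peak / num_bins)
            for k in range(num_bins)]


# Hypothetical per-slot segment sizes, for illustration only.
sizes = [10, 20, 30, 40, 50, 50, 5, 25]
descriptor = histogram_descriptor(sizes, num_bins=5)
```

Because every segment is represented by the upper limit of its bin, the mean of the resulting descriptor is never smaller than the true mean segment size, so the descriptor errs on the conservative side.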

Given a set of stored or live video streams, with traffic descriptors as specified above, to be transmitted from a video server to a given destination node, we first propose a method for estimating the aggregate bandwidth needed for the streams, when the streams have different tolerance to loss. We also propose an associated runtime algorithm for scheduling and apportioning the aggregate bandwidth among the streams so that each stream's bound on loss ratio is not violated. We then consider the issue of minimizing correlated losses when the multiplexed sources burst simultaneously and propose a method for smoothing the delivery of a stream over its lifetime.

The backhaul of a wireless broadband network, such as a 3G or all-IP 4G network, has a tree topology in which every node has a single immediate upstream node (for traffic in the direction towards the wireless edge). Resource allocation can hence be simpler than in a mesh network, with traffic aggregation over a slot required at just the ingress node. Assuming streaming video servers or proxies are located at the ingress, all the flows destined to an egress node, typically a base station, at the wireless edge may be treated as an aggregate and allocated aggregate resources end-to-end (till the edge). Intermediate nodes merely have to participate in admission control to ensure that enough bandwidth is available at their outgoing links and forward all the traffic admitted at the ingress.

¹ GoP, or group of pictures, denotes a sequence of frames beginning with an I frame followed by a fixed number of P and B frames arranged in a specified way. A video stream is composed of a sequence of GoPs.

Towards the problem of efficient bandwidth estimation and scheduling, we consider $C$ different QoS classes of video streams where each class $i$, $1 \le i \le C$, consists of $N_i$ streams. Each QoS class is associated with a loss ratio $q_i$. The QoS classes are assumed to be ordered from the most stringent to the least stringent loss ratio, i.e., $q_i \le q_{i+1}$, for all $1 \le i < C$. The sources that belong to a class are homogeneous with respect to their loss rate tolerance only and are not required to have identical traffic descriptors.

IV. MULTI-TIERED BANDWIDTH ESTIMATION AND SCHEDULING OF VBR VIDEO CLASSES

In this section, we first propose an efficient bandwidth estimation scheme for determining the aggregate bandwidth for VBR video classes with different loss bounds and an associated runtime scheduling algorithm for ensuring the loss bounds of all the classes.

A. Bandwidth Estimation and Scheduling Algorithms

Bandwidth estimation for flows with identical loss bounds: Our bandwidth estimation algorithm for multiple classes builds upon the Chernoff-bounds based estimation scheme for VBR video flows with identical QoS specification (loss bounds), and hence, a brief description of it is in order. (Complete descriptions of the approach can be found, among other sources, in [10] and [25].) Consider $N$ video sources, where the traffic of source $i$ is characterized by a $K_i$-state discrete random variable $a_i$, whose probability mass function is given by a set of $K_i$ $(p_{i,k}, r_{i,k})$ pairs, $1 \le k \le K_i$, as described in Sec. III. Let the total traffic from all the sources at any time slot be $A = \sum_{i=1}^{N} a_i$. Then, as explained in [10] and [25], the aggregate bandwidth $b^*$ that is needed to approximately satisfy a given loss ratio bound $q$ at a buffer-less multiplexer as derived from Chernoff bounds is

\[
b^* = \sum_{i=1}^{N} \frac{M_i'(\theta^*)}{M_i(\theta^*)}, \tag{1}
\]

where $\theta^*$ is the solution to the equation

\[
\ln(q) \approx \Lambda(\theta) - \theta\Lambda'(\theta) - 2\ln(\theta) - \frac{1}{2}\ln(\Lambda''(\theta)) - \frac{1}{2}\ln(2\pi) - \ln(E[A]). \tag{2}
\]

In the above equation, $\Lambda(\theta) = \sum_{i=1}^{N} \ln(M_i(\theta))$, where $M_i(\theta) = \sum_{k=1}^{K_i} p_{i,k}\, e^{\theta r_{i,k}}$ is the moment generating function of the random variable $a_i$.

The loss rate $q$, which denotes the proportion of data lost from an arriving stream, is given by $\frac{E[(A - b^*)^+]}{E[A]}$ [26], where $E[\cdot]$ denotes the expectation function and $X^+$ denotes $\max(0, X)$.
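To make the use of (1) and (2) concrete, the following is a minimal Python sketch that solves (2) for $\theta^*$ by bisection and then evaluates (1). It is an illustration under stated assumptions, not the procedure used in [10] or [25]: the function names, the bracket `[theta_lo, theta_hi]`, and the sample sources are hypothetical, and the code assumes the right-hand side of (2) minus $\ln(q)$ changes sign inside the bracket (it raises an error otherwise). Rates are assumed to be scaled so that `exp(theta * r)` does not overflow.

```python
import math


def log_loss_estimate(sources, theta):
    """Return (right-hand side of Eq. (2), Lambda'(theta)).

    Each source is a list of (p_k, r_k) pairs: state probability and
    traffic generated per slot in that state.
    """
    lam = d1 = d2 = mean = 0.0
    for src in sources:
        m0 = sum(p * math.exp(theta * r) for p, r in src)          # M_i
        m1 = sum(p * r * math.exp(theta * r) for p, r in src)      # M_i'
        m2 = sum(p * r * r * math.exp(theta * r) for p, r in src)  # M_i''
        lam += math.log(m0)             # Lambda
        d1 += m1 / m0                   # Lambda', the sum in Eq. (1)
        d2 += m2 / m0 - (m1 / m0) ** 2  # Lambda''
        mean += sum(p * r for p, r in src)
    rhs = (lam - theta * d1 - 2.0 * math.log(theta)
           - 0.5 * math.log(d2) - 0.5 * math.log(2.0 * math.pi)
           - math.log(mean))
    return rhs, d1


def chernoff_bandwidth(sources, q, theta_lo=1e-6, theta_hi=1.0, iters=200):
    """Bisect Eq. (2) for theta*, then return (b*, theta*) via Eq. (1)."""
    target = math.log(q)
    f_lo = log_loss_estimate(sources, theta_lo)[0] - target
    f_hi = log_loss_estimate(sources, theta_hi)[0] - target
    if f_lo <= 0.0 or f_hi >= 0.0:
        raise ValueError("no sign change in [theta_lo, theta_hi]; widen the bracket")
    for _ in range(iters):
        mid = 0.5 * (theta_lo + theta_hi)
        if log_loss_estimate(sources, mid)[0] - target > 0.0:
            theta_lo = mid
        else:
            theta_hi = mid
    theta_star = 0.5 * (theta_lo + theta_hi)
    return log_loss_estimate(sources, theta_star)[1], theta_star


# Hypothetical example: 20 identical two-state sources (mean 2, peak 3 each).
sources = [[(0.5, 1.0), (0.5, 3.0)] for _ in range(20)]
b_star, theta_star = chernoff_bandwidth(sources, q=1e-3)
```

For any $\theta > 0$ the value returned by (1) lies strictly between the aggregate mean and the aggregate peak, so the estimate can be sanity-checked against those two bounds.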

We will denote the aggregation of the sequence of all classes $1, \ldots, i$ using the symbol $C_i$. Let $m_i$ denote the cumulative mean bandwidth per slot of all the $N_i$ flows in Class $i$, and $M_i$, that of all flows of all classes in $C_i$.

Algorithm 1 $A^{\mathrm{est}}_{\mathrm{bw}}$: Bandwidth Estimation for Loss Rate Classes

1: /* $m_i$ denotes the mean payload size per slot of all the $N_i$ flows in Class $i$. $M_i := \sum_{k=1}^{i} m_k$ */
2: for $i := 1$ to $C$ do
3:     $q^{\mathrm{eff}}_i = \frac{\sum_{k=1}^{i} m_k \cdot q_k}{\sum_{k=1}^{i} m_k} = \frac{\sum_{k=1}^{i} m_k \cdot q_k}{M_i}$;
4:     Determine the aggregate bandwidth $B_i$ needed for the $N_1 + \ldots + N_i$ flows in Classes $1 \ldots i$ ($C_i$) using (1) and (2) assuming an effective loss rate bound of $q^{\mathrm{eff}}_i$ for all the classes
5: end for
6: return vector $\langle B_1, B_2, \ldots, B_C \rangle$;

Algorithm 2 $A_{\mathrm{sch}}$: Per-slot Scheduling Algorithm

1: /* Inputs: bandwidth allocation vector $\langle B_1, \ldots, B_C \rangle$ returned by $A^{\mathrm{est}}_{\mathrm{bw}}$ and the vector $\langle a_1, \ldots, a_C \rangle$ of incoming bytes for classes $1, \ldots, C$; $B_i$ is specified in bytes per slot. */
2: $X_0 := 0$;
3: for $i := 1$ to $C$ do
4:     $x_i := \min(a_i, (B_i - X_{i-1}))$;
5:     $X_i := X_{i-1} + x_i$;
6:     if $X_i < B_i$ then
7:         Use the remaining bandwidth of $B_i - X_i$ for transmitting the pending bytes of classes $1, \ldots, i-1$ in that order;
8:         Update $X_1, \ldots, X_{i-1}$ as appropriate;
9:     end if
10: end for

Bandwidth estimation for flows with non-identical loss bounds: Our bandwidth estimation method for heterogeneous QoS classes of VBR streams, which builds upon the estimation for a single class, is provided in Algorithm 1. The algorithm returns a bandwidth vector $\langle B_1, B_2, \ldots, B_C \rangle$, where $B_i \le B_{i+1}$, for $i < C$, and $B_i$ denotes the aggregate bandwidth reserved for flows in all classes in $C_i$ in each slot. The bandwidth needed for ensuring the stipulated loss rate bound for the $N_1$ flows of Class 1 is obtained using (1) in conjunction with (2) for $q = q_1$. Bandwidth estimate $B_i$ for the $i$th class aggregate $C_i$ is obtained using an effective loss rate bound of $q^{\mathrm{eff}}_i = \frac{\sum_{k=1}^{i} m_k \cdot q_k}{M_i}$ for all the classes in $C_i$.
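The effective loss rate bound is simply a mean-weighted average of the per-class bounds over the classes seen so far. The short Python sketch below computes it for every tier; the function name and the sample values of $m_k$ and $q_k$ are hypothetical and chosen only for illustration.

```python
def effective_loss_bounds(mean_per_class, loss_bound_per_class):
    """Return [q_eff_1, ..., q_eff_C] as in line 3 of Algorithm 1.

    mean_per_class[k] is m_k (mean bytes per slot of Class k+1) and
    loss_bound_per_class[k] is q_k, ordered most to least stringent.
    """
    q_eff = []
    weighted = 0.0    # running sum of m_k * q_k
    total_mean = 0.0  # running sum of m_k, i.e., M_i
    for m_k, q_k in zip(mean_per_class, loss_bound_per_class):
        weighted += m_k * q_k
        total_mean += m_k
        q_eff.append(weighted / total_mean)
    return q_eff


# Hypothetical three-class example.
m = [100.0, 200.0, 100.0]
q = [0.001, 0.01, 0.05]
q_eff = effective_loss_bounds(m, q)
```

With these sample inputs the effective bounds are 0.001, 0.007, and 0.01775: each tier's bound is looser than that of the tier below it, which is what allows the aggregate estimate for $C_i$ to be smaller than the sum of per-class estimates.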

The way the aggregate bandwidth $B_i$ is apportioned among the individual classes at run time is given by the per-slot scheduler $A_{\mathrm{sch}}$ (Algorithm 2), which we describe next.

Per-slot scheduling algorithm: For the sake of conciseness, but at the risk of slightly overloading notation, in the remainder of the section we will let $a_i$ denote the total number of incoming bytes from all flows in Class $i$ in the slot under consideration. The total number of bytes from all flows of all classes in $C_i$ shall be denoted $A_i$. Let $x_i$ denote the number of bytes assured to Class $i$, that is, the number of bytes allocated for transmission from Class $i$ before any slack from the upper classes (classes $> i$) is handed down, and $X_i$, the total number of bytes scheduled for transmission from all classes in $C_i$. Note that unlike $x_i$, $X_i$ includes slack from the upper classes as well. Without loss of generality, we will assume that $B_i$ is in units of bytes per slot. We will assume that the bandwidth of the outgoing link is at least $B_C$.

Fig. 1(b) of Sec. I provides a schematic of the multi-tiered run-time allocation. Allocation proceeds sequentially from Class 1 so that in each slot, $C_i$ is assured at least $B_i$ bytes in all, and each Class $i$ is assured at least $B_i - B_{i-1}$ bytes (subject to that many bytes arriving in that slot). Class $i$ would have slack if its total incoming bytes $a_i < B_i - B_{i-1}$, and we say that there is slack at level $i$ (all classes in $C_i$ considered together) if $A_i < B_i$ holds. Any slack that class $i$ may have is handed down to the lower classes $1, \ldots, i-1$ subject to the total allocation to $C_i$ not exceeding $B_i$. Such distribution is not needed to meet the loss rate bounds; its purpose is to ensure that the allocation to $C_i$ is at least $B_i$. Any slack that may still be present at level $i$ is then allocated to class $i+1$ to accommodate its burst. Such slack allocation is necessary to ensure that class-wise QoS guarantees are met.

One of the main ways in which our scheme differs from traditional multi-class scheduling approaches is in the slack distribution policy. Fig. 1(a) shows a weighted round-robin scheme for providing per-class reservations. This scheme lacks an explicit policy on how slack is reclaimed, and hence, the aggregate bandwidth needed tends to be higher.

Referring to Algorithm 2, which provides a complete listing, in each slot, Class 1 is assured $\min(a_1, B_1)$ bytes, the minimum of its total incoming bytes in that slot and the bandwidth that should be reserved for it to meet its QoS. In iteration $i$ of the for loop in line 3, Class $i$ is allocated its assured $x_i$ bytes given by $\min(a_i, B_i - X_{i-1})$. Since at the beginning of iteration $i$, $X_{i-1} \le B_{i-1}$ holds (refer to the proof of Theorem 1 below), Class $i$ is assured at least $B_i - B_{i-1}$ bytes (if $a_i \ge B_i - B_{i-1}$), but can be allocated more if $X_{i-1} < B_{i-1}$. In other words, as described earlier, Class $i$ is allocated slack, if any, from lower classes (level $i-1$) to the extent it can consume. If $X_i < B_i$ holds after Class $i$ is allocated its assured bytes, then it implies that there is some slack in Class $i$. This slack is handed down to the lower classes $1, \ldots, i-1$ before the next class is considered.

The worst-case complexity of the algorithm to apportion the available bandwidth among $C$ classes is $O(C^2)$.
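The per-slot allocation described above can be sketched in Python as follows. This is an illustrative reading of Algorithm 2, not the authors' code: the function name `schedule_slot` and the sample inputs are hypothetical, and line 8 of the listing ("update as appropriate") is interpreted here as adding each handed-down byte to both the receiving class's allocation and the running total for the current level.

```python
def schedule_slot(B, a):
    """Apportion one slot's bandwidth among C classes.

    B[i] is the cumulative reservation for classes 1..i+1 (bytes per slot,
    non-decreasing); a[i] is the bytes arriving for Class i+1 in this slot.
    Returns the bytes scheduled for each class.
    """
    num_classes = len(B)
    sent = [0] * num_classes
    total = 0  # X_{i-1}: bytes scheduled so far for all lower classes
    for i in range(num_classes):
        # Assured bytes x_i of the current class (line 4 of Algorithm 2).
        sent[i] = min(a[i], B[i] - total)
        total += sent[i]
        # Hand any slack at this level down to classes 1..i-1, in order.
        slack = B[i] - total
        for k in range(i):
            if slack <= 0:
                break
            extra = min(slack, a[k] - sent[k])
            sent[k] += extra
            total += extra
            slack -= extra
    return sent


# Hypothetical reservations and arrivals for three classes.
B = [10, 18, 30]
bursty_low = schedule_slot(B, [14, 5, 20])  # Class 2 has slack, Class 1 bursts
bursty_top = schedule_slot(B, [4, 3, 30])   # lower levels have slack
```

In the first call, Class 2's unused 3 bytes are handed down to Class 1; in the second, the slack left at levels 1 and 2 lets Class 3 send 23 bytes instead of the 12 it would get from $B_3 - B_2$ alone. The nested loop over lower classes is what gives the $O(C^2)$ worst case.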

We now show that our scheme can guarantee the bound on loss rate for each class.

Theorem 1. The bandwidth estimation algorithm $A^{\mathrm{est}}_{\mathrm{bw}}$ and per-slot scheduling algorithm $A_{\mathrm{sch}}$ can together ensure the needed bounds on loss rates for each of a set of $C$ QoS classes of video streams (with parameters as described earlier).

Proof: The proof of this theorem is by induction over the class index $i$. Referring to Algorithm $A_{\mathrm{sch}}$, we will denote the value of $X_i$ at the beginning of iteration $j$ (and hence end of iteration $j-1$) by $X_i^j$. We will refer to the bandwidth allotted to classes in $C_i$ before slack from the upper classes $i+1, \ldots, C$ is assigned to them as the total bandwidth assured to all the classes in $C_i$ put together. Note that $X_i^{i+1}$ denotes this value. In addition to inductively showing that the loss rate bound $q_i$ of each Class $i$ can be ensured, we will also show that the aggregate assured allocation to $C_i$ at the end of iteration $i$, $X_i^{i+1}$, is $\min(A_i, B_i)$.

Base Case $i = 1$: The loss rate bound of $q_1$ for Class 1 can easily be seen to hold by observing the following: $q^{\mathrm{eff}}_1 = q_1$ in line 3 of Algorithm $A^{\mathrm{est}}_{\mathrm{bw}}$, $B_1$ is the bandwidth needed for Class 1 to have a loss rate $\approx q_1$, and Class 1 is assured the minimum of $B_1$ and incoming bytes (i.e., $X_1^2$ is $\min(A_1, B_1)$) in each slot (refer line 4). (Class 1 can be allocated more than $\min(A_1, B_1)$ if $A_1 > B_1$ and the upper classes have slack.)


Induction Step: Assume that the loss rate for each class kof the classes in Ci−1 ≈ qk. Also assume that X i

i−1, thebandwidth assured for Ci−1, is min(Ai−1, Bi−1) bytes. Fromline 4 of Algorithm Aest

bw, if classes in Ci together are reserved abandwidth of Bi, then the combined loss rate for Ci is ≈ qeffi .If ai ≥ Bi −Xi−1, then since X i

i−1 = min(Ai−1, Bi−1) and,by line 4 of Asch, xi = min(ai, Bi −Xi−1), we have in theith iteration, Xi = X i

i−1 + xi = Bi = min(Ai, Bi). On theother hand, if ai < Bi − Xi−1, then Xi < Bi holds afterline 5. In this case, the distribution of the slack Bi − Xi toCi−1 in line 7 ensures Xi = min(Ai, Bi) at the end of theith iteration. Thus, all the classes in Ci are together assured aloss rate ≈ qeffi . That is,

E[(Ai −Bi)+]

∑ik=1 mi

=E[(Ai −Bi)

+]

Mi≈ qeffi . (3)

Note that the bandwidth assured to Class $i$ in any slot is $x_i$ and our proof obligation is to show that the loss rate bound of Class $i$, $E[(a_i - x_i)^+]/m_i$, is very close to $q_i$. Since $A_i = A_{i-1} + a_i$, we have $A_i - B_i = A_{i-1} + a_i - (X_{i-1}^{i} + B_i - X_{i-1}^{i}) = (A_{i-1} - X_{i-1}^{i}) + (a_i - (B_i - X_{i-1}^{i}))$. Thus,

$$\begin{aligned}
a_i - (B_i - X_{i-1}^{i}) &= (A_i - B_i) - (A_{i-1} - X_{i-1}^{i}) \\
\Rightarrow\ (a_i - (B_i - X_{i-1}^{i}))^+ &= ((A_i - B_i) - (A_{i-1} - X_{i-1}^{i}))^+ \\
&= ((A_i - B_i)^+ - (A_{i-1} - X_{i-1}^{i})^+)^+ \quad (\text{since } A_{i-1} \ge X_{i-1}^{i}).
\end{aligned}$$

Hence, $E[(a_i - (B_i - X_{i-1}^{i}))^+] = E[((A_i - B_i)^+ - (A_{i-1} - X_{i-1}^{i})^+)^+]$, which, by linearity of expectation yields

$$E[(a_i - (B_i - X_{i-1}^{i}))^+] = (E[(A_i - B_i)^+] - E[(A_{i-1} - X_{i-1}^{i})^+])^+. \tag{4}$$

From line 4 of Algorithm $A_{\mathrm{sch}}$, $x_i = \min(a_i, B_i - X_{i-1})$. Hence,

$$(a_i - (B_i - X_{i-1}^{i}))^+ = (a_i - x_i), \tag{5}$$

the deficit in allocation to Class $i$. By (3), $E[(A_i - B_i)^+] \approx M_i \cdot q_i^{\mathrm{eff}}$. By the induction hypothesis, the loss rate for the initial $i-1$ classes is $\approx q_{i-1}^{\mathrm{eff}}$, i.e., $E[(A_{i-1} - X_{i-1}^{i})^+] \approx M_{i-1} \cdot q_{i-1}^{\mathrm{eff}}$. Hence, by (4) and (5), $E[a_i - x_i] \approx M_i \cdot q_i^{\mathrm{eff}} - M_{i-1} \cdot q_{i-1}^{\mathrm{eff}} = m_i \cdot q_i$. Thus, $E[(a_i - (B_i - X_{i-1}^{i}))^+] = E[a_i - x_i] \approx m_i \cdot q_i$, and the loss rate for Class $i$, $E[(a_i - x_i)^+]/m_i \approx q_i$.
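The pointwise identity used in the induction step, namely that the positive part of $a_i - (B_i - X)$ equals the positive part of $(A_i - B_i)^+ - (A_{i-1} - X)^+$ whenever $A_{i-1} \ge X$, can be checked numerically. The following is a minimal sketch under stated assumptions: the function names, the uniform random inputs, and the choice $B_i \ge B_{i-1}$ are illustrative and are not taken from the paper.

```python
import random


def pos(x):
    """Positive part (x)^+ = max(x, 0)."""
    return max(x, 0.0)


def check_identity(trials=10_000, seed=0):
    """Randomly test (a_i - (B_i - X))^+ == ((A_i - B_i)^+ - (A_prev - X)^+)^+
    with X = min(A_prev, B_prev), as assumed in the induction hypothesis."""
    rng = random.Random(seed)
    for _ in range(trials):
        A_prev = rng.uniform(0, 100)        # A_{i-1}: bytes arriving for classes 1..i-1
        B_prev = rng.uniform(0, 100)        # B_{i-1}: bandwidth estimated for classes 1..i-1
        X_prev = min(A_prev, B_prev)        # assured allocation to classes 1..i-1
        a_i = rng.uniform(0, 100)           # bytes arriving for Class i
        B_i = B_prev + rng.uniform(0, 100)  # assumption: cumulative estimates do not decrease
        A_i = A_prev + a_i
        lhs = pos(a_i - (B_i - X_prev))
        rhs = pos(pos(A_i - B_i) - pos(A_prev - X_prev))
        if abs(lhs - rhs) > 1e-9:           # tolerance for floating-point rounding
            return False
    return True
```

The check only covers the per-slot identity; the step from this identity to equation (4) involves expectations and is not exercised by the sketch.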

B. Empirical Evaluation

We evaluated the bandwidth savings and statistical multiplexing gains (SMG) achievable by our approach (in comparison to a per-class reservation approach) using MPEG-4 traces available at http://trace.eas.asu.edu/TRACE/stat.html [27]. Bandwidth savings for one set of evaluations with two and three classes are presented in Fig. 3. (SMG numbers are very close to bandwidth savings and have been omitted due to lack of space.) For this evaluation, we used traces of soccer, starship troopers, and silence of the lambs videos. The traffic characteristics of these sources are provided in Fig. 2. The frame rate of these videos is 25 fps and their GoP is 12 frames long. Our slot duration was set to one GoP duration, that is, 480 ms. The first two sources were used for the evaluation with two classes, while all three were used for the 3-class evaluation. We evaluated with the sources smoothed over a GoP as well as when they were optimally smoothed (over the entire source length) using the approach in [8]. The number of bins was set to five for each source when determining its marginal probability distribution using the histogram method [23].
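As an illustration of the five-bin marginal distribution mentioned above, the sketch below builds an equal-width histogram over per-slot (per-GoP) sizes. It is a simplified stand-in and does not reproduce the method of [23]; equal-width bins, the use of bin midpoints as representative rates, and the function name `histogram_pmf` are all assumptions.

```python
def histogram_pmf(sizes, num_bins=5):
    """Return a list of (representative_size, probability) pairs.

    Assumption: equal-width bins between the smallest and largest size,
    with each bin represented by its midpoint.
    """
    lo, hi = min(sizes), max(sizes)
    if hi == lo:
        # Constant-rate source: a single point mass.
        return [(float(lo), 1.0)]
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for s in sizes:
        # The largest size falls exactly on the upper edge; clamp it into the last bin.
        idx = min(int((s - lo) / width), num_bins - 1)
        counts[idx] += 1
    n = len(sizes)
    return [(lo + (k + 0.5) * width, counts[k] / n) for k in range(num_bins)]
```

For example, `histogram_pmf([100, 120, 300, 310, 500])` places two sizes in the first bin, two in the third, and one in the fifth.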

Video                   Mean Rate   Peak Rate
soccer                  531.2       1444.08
starship troopers       289.26      934.056
silence of the lambs    276.14      1717.44
star wars               132.15      572.80
sony demo               977.36      3447.2

Fig. 2: Statistical characteristics of the MPEG-4 traces used in evaluations. Rates are in Kb/GoP.
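A quick way to compare the burstiness of the traces in Fig. 2 is the peak-to-mean ratio. The sketch below computes it from the tabulated values; the dictionary layout and the function name `peak_to_mean` are illustrative assumptions, and the ratio itself is a derived quantity that the paper does not report.

```python
# (mean rate, peak rate) in Kb/GoP, copied from Fig. 2.
TRACES = {
    "soccer": (531.2, 1444.08),
    "starship troopers": (289.26, 934.056),
    "silence of the lambs": (276.14, 1717.44),
    "star wars": (132.15, 572.80),
    "sony demo": (977.36, 3447.2),
}


def peak_to_mean(traces):
    """Return {trace name: peak rate / mean rate}."""
    return {name: peak / mean for name, (mean, peak) in traces.items()}


ratios = peak_to_mean(TRACES)
burstiest = max(ratios, key=ratios.get)  # trace with the largest peak-to-mean ratio
```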

In the 2-class experiments, we increased the total number of sources per class from 10 to 100, with the two classes containing the indicated number of sources of soccer and starship troopers, respectively. We conducted experiments with different combinations of loss rates for the two classes. We will refer to the ratio of the base 10 logarithm of the loss rate of the more stringent class ($q_2$) to that of the less stringent class ($q_1$) as the QoS ratio of the classes. Informally, the QoS ratio denotes the difference in orders of magnitude between the loss rates of the two classes. Figs. 3(a) and (b) show the bandwidth savings when sources are smoothed over a GoP and when they are optimally smoothed (using adequate receiver buffers) using the approach in [8], respectively.

In both cases, in general, bandwidth savings decrease with decreasing QoS ratio and increasing number of sources. This is because as the QoS ratio decreases, the heterogeneity of the classes decreases, and as the number of sources increases, the aggregate traffic becomes smoother, lowering the amount of per-class slack. When smoothed over a GoP, savings close to 20% are achieved with 10 sources and a QoS ratio of 8. Even for a more realistic loss rate ratio of 3, close to 10% and 15% savings are achieved with 25 and 10 sources, respectively. Fig. 3(b) shows the savings when the sources are optimally smoothed. Savings in this case are somewhat diminished but still considerable when the QoS ratio is at least 5.

The total number of sources in the 3-class experiment ranged from 30 to 150. The QoS ratio was set to 3. Results in Fig. 3(c) show that savings range from 10% to 20%.

Differential QoS by frame sizes: Recently, a two-stage resource-allocation framework for providing deterministic per-flow guarantees on bandwidth for video flows was proposed in [5]. In that framework, per-flow reservations are made at the first stage for reliably delivering a fraction ($p$th-percentile) of segments (groups of frames), and a shared aggregate bandwidth is provided at the second level for serving peak demands (for segments requiring additional bandwidth over the reserved amount) of all the flows. The reserved bandwidth is given by the largest of the sizes of the $p$th-percentile segments (with segments ordered by size), and the shared bandwidth is determined using the Markov inequality for guaranteeing a desired loss probability for the bursty segments. This scheme can be seen as a special case of our multi-tiered framework. The total bandwidth needed for providing equivalent guarantees can be lowered using our approach, wherein the first-level sub-streams are also considered while determining the aggregate bandwidth at the second level and slack from the first level is used at the second level. We compared our approach with


[Figure 3: three panels plotting savings in bandwidth (%). (a) Sources smoothed over a GoP and (b) optimally smoothed sources, each for 10, 25, 50, 75, and 100 sources; X-axis: log10 of loss rates for the two classes. (c) Sources smoothed over a GoP, 3 classes, for 10, 25, and 50 sources; X-axis: loss rates for the three classes.]

Fig. 3: Bandwidth savings with the proposed multi-tiered approach with (a) & (b) two classes and (c) three classes. Loss rates (in %) for the classes are indicated along the X-axis.

that in [5] using the soccer trace, while requiring that at least 98.67% of the GoPs be delivered without loss. Savings achieved are quite significant and are plotted in Fig. 4.

[Figure 4: savings in bandwidth (%) versus loss rate for the second tier (1e−8 to 1e−1), for 10, 25, 50, 75, and 100 sources.]

Fig. 4: Bandwidth savings in the context of differential QoS by frame size.
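To make the two-stage idea of [5] summarized above concrete, the sketch below computes a per-flow reservation at the $p$th-percentile segment size and a shared pool sized with the Markov inequality, $P(Y \ge S) \le E[Y]/S$, where $Y$ is the total excess demand in a slot. This is one plausible reading of the description, not the algorithm of [5] and not the multi-tiered scheme of this paper; the nearest-rank percentile rule, the function name `two_tier_bandwidth`, and the parameters `p` and `eps` are assumptions.

```python
import math


def two_tier_bandwidth(flows, p=0.9, eps=0.01):
    """flows: list of per-flow lists of segment sizes.

    Returns (reservations, shared), where reservations[f] is the size of the
    p-th percentile segment of flow f and shared is a pool S chosen so that
    E[total excess] / S <= eps (Markov inequality).
    """
    reservations = []
    mean_total_excess = 0.0
    for sizes in flows:
        ordered = sorted(sizes)
        # Nearest-rank percentile (assumption): the ceil(p * n)-th smallest segment.
        rank = max(1, math.ceil(p * len(ordered)))
        reserved = ordered[rank - 1]
        reservations.append(reserved)
        # Mean demand above the reservation; excess means add up across flows.
        mean_total_excess += sum(max(s - reserved, 0) for s in sizes) / len(sizes)
    shared = mean_total_excess / eps
    return reservations, shared
```

The Markov bound is typically loose, which is consistent with the observation in the text that the total bandwidth can be lowered by also accounting for first-level slack.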

Differential QoS to different frame types: Our multi-tiered approach can also be used to provide differentiated QoS to different parts or sub-streams of individual flows, e.g., the I, P, and B frames. For simplicity and without loss of generality, assume that all the flows require identical loss rate bounds for any given type of frame (I, P, or B). Then, our approach can be used for ensuring the desired bounds using three tiers or classes by constructing three (sub-)streams corresponding to the three frame types from each stream. To lower the pessimism, all the frames from the streams in the lower classes can be included in the higher ones also, and the bandwidth estimation algorithms appropriately modified to account for the difference. Our evaluation using 25 instances of the sony demo trace yielded a bandwidth savings of 25% when the loss rates for the I and P frames were set to 0% and that for the B frames to 0.1%, and a savings of 10% when the loss rate for P frames was also increased to 0.1% (in comparison to an approach that makes exclusive reservations for the three frame types). Since it is desirable to avoid losing the I-frames, we chose a loss rate of 0% for the I-frames.

V. IMPROVING QOE BY LOWERING CORRELATED LOSSES

We now turn to run-time techniques for lowering correlated losses or loss clumps during bursts in order to improve the smoothness of video quality and hence end-user quality-of-experience (QoE).

It is known that multiplexing of video sources cannot totally eliminate rate variability. Hence, when a CBR channel is allocated for the multiplexed sources, losses in the bursty slots could be quite high, leading to a sharp drop in the end-user quality for the associated frames. When detailed frame traces are available either for the entire stream length, as is possible with stored video, or for a limited number of future frames, e.g., as is the case with near-real-time video, future bursts can be predicted, and there is the possibility of ameliorating loss spikes. We present one scheme for doing so in this section. For ease of description, we assume that all $N$ flows belong to a single QoS class, but the approach can be extended to multiple classes as well.

We assume that per-slot allocation among streams is fair in the sense that the bandwidth allocated to a stream in a slot is proportional to its total incoming bytes in that slot. In other words, the total allocation $b_i$ to stream $i$ in slot $j$ is given by $b_i = B \cdot \frac{a_{i,j}}{A_j}$, where $B$ is the total bandwidth available for the aggregate stream, $a_{i,j}$ is the total traffic due to stream $i$ in slot $j$, and $A_j = \sum_{i=1}^{N} a_{i,j}$. It is easy to see that with such an allocation, in each slot, the loss rate for all the streams would be equal. As in Sec. IV, we assume that the traffic pertaining to a slot should be transmitted by the end of the slot. Any part of the traffic that could not be transmitted is dropped.
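The proportional allocation described above, and the fact that it equalizes per-stream loss rates within a slot, can be sketched as follows. The function names are hypothetical, and capping each allocation at the stream's own arrivals when the slot is underloaded is an added assumption.

```python
def proportional_allocation(arrivals, B):
    """Allocate bandwidth B among streams in proportion to their arrivals in one slot.

    arrivals[i] is the traffic of stream i in the slot. Each allocation is
    capped at the stream's own arrivals (assumption for underloaded slots).
    """
    total = sum(arrivals)
    if total == 0:
        return [0.0 for _ in arrivals]
    return [min(a, B * a / total) for a in arrivals]


def loss_rates(arrivals, alloc):
    """Per-stream loss rate in the slot: the fraction of arrivals not transmitted."""
    return [1.0 - b / a if a > 0 else 0.0 for a, b in zip(arrivals, alloc)]
```

With arrivals of 100, 60, and 40 bytes and $B = 150$, the allocations are 75, 45, and 30 bytes, so every stream sees a 25% loss rate.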

Loss spikes can be lowered by streaming portions of future bursts early on. Delayed streaming is not possible due to the restriction that the contents of a slot should be transmitted by its end. Since the earlier slots are not guaranteed to contain slack, one way of making room for the bursty frames is to incur some losses in the earlier slots, provided doing so would not adversely impact the end-user quality of those frames. As discussed in Sec. II, the approach suggested is related to adaptive streaming.

Problem formulation: We formulate the problem of minimizing loss spikes and variability as follows. As discussed earlier, the loss rate for all streams in a given slot can be made equal by allocating bandwidth to the streams in proportion to their cumulative frame sizes. We will hence focus on minimizing the maximum loss and loss variability for the aggregate stream. Let $\vec{\ell} = \langle \ell_1, \ell_2, \ldots, \ell_M \rangle$ denote the aggregate loss vector where $\ell_1 \ge \ell_2 \ge \ldots \ge \ell_M$ denote the loss rates in $M$ slots arranged in non-increasing order. (Note that $\ell_i$ does not correspond to the loss in the $i$th slot, but the $i$th largest loss over the $M$ slots. We will let $r_i$ denote the loss rate in the $i$th slot.) Then, the objective is to determine an optimal feasible vector $\vec{\ell}$ for the stream such that for any other vector $\vec{\ell}'$, $\sum_{i=1}^{k} \ell_i \le \sum_{i=1}^{k} \ell'_i$, $k = 1, \ldots, M$. By this definition, since $\ell_1 \le \ell'_1$ for all $\vec{\ell}'$, we first seek to minimize the maximum loss rate over all slots. Next, over all allocations that minimize the maximum loss rate, we seek an allocation that minimizes the next largest loss rate, and so on.
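The optimality criterion above compares prefix sums of loss vectors sorted in non-increasing order. A minimal sketch of that comparison follows; the function names and the small tolerance for floating-point rounding are assumptions.

```python
def sorted_prefix_sums(losses):
    """Prefix sums of the loss rates after sorting them in non-increasing order."""
    out, running = [], 0.0
    for x in sorted(losses, reverse=True):
        running += x
        out.append(running)
    return out


def dominates(losses, other, tol=1e-12):
    """True if every sorted prefix sum of `losses` is at most that of `other`.

    Both vectors must cover the same number of slots.
    """
    if len(losses) != len(other):
        raise ValueError("loss vectors must have the same length")
    return all(p <= q + tol
               for p, q in zip(sorted_prefix_sums(losses), sorted_prefix_sums(other)))
```

For instance, spreading a total loss of 0.3 evenly over three slots dominates concentrating it in a single slot, but not the other way around.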

To obtain an optimal loss vector for a given traffic sequence, note that for any two consecutive slots with raw loss rates $r_1 = \max(0, 1 - \frac{B}{A_1})$ and $r_2 = \max(0, 1 - \frac{B}{A_2})$, where $r_1 < r_2$, the optimal rates are given by $r_1 = r_2 = \max(0, 1 - \frac{2B}{A_1 + A_2})$. These rates are obtained by streaming part of the contents of slot 2 in slot 1 and increasing the loss suffered by slot 1. This idea can be extended to a longer $M$-slot sequence as follows. The loss rate for the first $k$ slots is given by $r_1 = r_2 = \ldots = r_k = r$, where

$$r = \max_{j=1,\ldots,M} \left\{ \max\left\{ 0,\ 1 - \frac{j \cdot B}{\sum_{i=1}^{j} A_i} \right\} \right\}, \qquad k = \max\left\{ j \;\middle|\; \max\left\{ 0,\ 1 - \frac{j \cdot B}{\sum_{i=1}^{j} A_i} \right\} = r \right\}.$$

Note that $k$ is the latest slot such that grouping slots 1 through $k$ and streaming their contents over the $k$ slots to equalize their losses, by pre-streaming when needed, yields the highest loss rate $r_1$ for slot 1, and $r_1$ is chosen as the loss rate for the first $k$ slots. The loss rates for the remainder of the traffic sequence can be obtained by applying the above procedure to the rest of the slots. For $M$ slots, the worst-case complexity of the above procedure is $O(M^2)$.

For illustration, consider an example with $M = 6$ slots, and let the number of incoming bytes be $A_1 = 150$, $A_2 = 150$, $A_3 = 200$, $A_4 = 130$, $A_5 = 180$, and $A_6 = 100$. Let the available bandwidth be 150 bytes per slot. In the absence of loss minimization attempts, slots 3 and 5 would suffer loss rates of 25% and 16.67%, respectively, while the remaining slots would not have any loss. The loss rate for slot 3 can be brought down to $\frac{(A_1 + A_2 + A_3) - 3 \cdot B}{A_1 + A_2 + A_3}$, which is 10%, by streaming 30 bytes of $A_3$ in slot 2. Doing so would require shedding 15 bytes each of slots 1 and 2, that is, increasing their loss rates to 10%. Similarly, the loss rate for slot 5 can be brought down to about 3.2% by streaming 24 of its bytes in advance in slot 4 and dropping six and four bytes of $A_5$ and $A_4$, respectively. Since the loss rates for $A_4$ and $A_5$ are lower than those of the initial three slots, it is not possible to lower the loss rates for any of these slots any further. Thus, slots 1–3 form a group while slots 4 and 5 form the second.
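The quadratic-time procedure described above (repeatedly find the prefix of the remaining slots with the highest equalized loss rate, breaking ties toward the latest slot) can be sketched as follows. The function name and the 0-based indexing are assumptions, and the code is an illustration rather than the authors' implementation.

```python
def group_loss_rates(A, B):
    """Per-slot loss rates from the O(M^2) grouping procedure.

    A[i] is the number of incoming bytes in slot i (0-based) and B is the
    bandwidth available per slot.
    """
    M = len(A)
    rates = [0.0] * M
    start = 0
    while start < M:
        best_r, best_k, total = -1.0, start, 0.0
        for j in range(start, M):
            total += A[j]
            n = j - start + 1  # number of slots in the candidate group
            r = max(0.0, 1.0 - n * B / total) if total > 0 else 0.0
            if r >= best_r:    # ">=" keeps the latest slot among ties
                best_r, best_k = r, j
        for j in range(start, best_k + 1):
            rates[j] = best_r
        start = best_k + 1     # repeat on the remaining slots
    return rates


example = group_loss_rates([150, 150, 200, 130, 180, 100], 150)
```

On the six-slot example, the first three slots receive a loss rate of 10%, slots 4 and 5 receive about 3.2%, and slot 6 has no loss, matching the groups identified in the text.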

The main challenge in lowering loss variability lies in efficiently determining adjacent slots that should be merged to form “slot groups.” An $O(M)$-time algorithm for accomplishing it is provided in the listing in Algorithm 3 ($A^{\min}_{\mathrm{loss}}$). The algorithm can also be modified to respect the maximum permissible loss rates per slot (as opposed to rendering them equal) when feasible.

Detailed description of $A^{\min}_{\mathrm{loss}}$: As opposed to the $O(M^2)$ procedure described above, which requires a pass over all the remaining slots to identify the next group (with equal loss rates for its slots) regardless of the length of the group, $A^{\min}_{\mathrm{loss}}$ makes just a single pass over the slots to identify all the groups. $A^{\min}_{\mathrm{loss}}$ achieves this by maintaining partial groups and merging two or more of the trailing groups as each successive slot is considered.

To give more specifics, the algorithm computes the raw transmission rate, denoted curr_tx_frac and given by $B/A_i$,

Algorithm 3 $A^{\min}_{\mathrm{loss}}$: Algorithm to minimize correlated losses

 1: /* B: bandwidth available per slot; A_i: incoming bytes in slot i; M: total slots */
 2: num_grps := 0; /* number of groups of consecutive slots */
 3: start_slot, end_slot : array [1..M] of integer;
 4: tx_frac, loss_rate : array [1..M] of real;
 5: for i := 1 to M do
 6:     curr_tx_frac := B/A_i; /* tx fraction for slot i's contents */
 7:     /* if tx_frac is less than preceding group's, borrow b/w from the preceding group */
 8:     if num_grps > 0 ∧ curr_tx_frac < 1 ∧ curr_tx_frac < tx_frac[num_grps] then
 9:         k := num_grps;
10:         slots := 1;
11:         /* Merge as many groups as needed */
12:         while k > 0 ∧ curr_tx_frac < 1 ∧ curr_tx_frac < tx_frac[k] do
13:             prev_slots := end_slot[k] − start_slot[k] + 1; /* num slots in the previous group */
14:             curr_tx_frac := (prev_slots + slots) / ((1/tx_frac[k]) × prev_slots + (1/curr_tx_frac) × slots); /* tx_frac for the merged group */
15:             slots := slots + prev_slots; /* num slots in the merged group */
16:             k := k − 1;
17:         end while
18:         end_slot[k + 1] := i; tx_frac[k + 1] := curr_tx_frac; /* record the merged group's end and tx_frac */
19:         num_grps := k + 1; /* adjust the number of groups */
20:     else
21:         /* if slot i's tx_frac is higher than the previous group's, place slot i in its own group */
22:         num_grps := num_grps + 1;
23:         tx_frac[num_grps] := curr_tx_frac;
24:         start_slot[num_grps] := end_slot[num_grps] := i;
25:     end if
26: end for
27: /* Compute per-slot loss rates loss_rate using loss_rate = 1 − min(1, tx_frac); */

for the next slot $i$ and compares it with that of the last group in the list of constructed groups (tx_frac[num_grps]) (line 8). (We allow tx_frac to be larger than 1.0 to facilitate computing the transmission rate when two groups are merged.) If curr_tx_frac is smaller, then slot $i$ is merged with the last group so that the loss rate for it is lowered. Further, the remaining groups are scanned in the reverse order, merging the current group with the last group from the list until a group with a smaller value for tx_frac is encountered or all the groups have been examined (while loop in line 12). When two groups are merged, tx_frac for the merged group is appropriately adjusted (set to the ratio of total bandwidth available to the total incoming bytes in all the slots of the group). On the other hand, if curr_tx_frac for the next slot $i$ is in fact larger than that of the last group, then slot $i$ is placed in a group of its own, and the algorithm proceeds to examine the next slot (lines 22–24).
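The single-pass grouping just described can be sketched in Python with a stack of groups. This is a hedged illustration rather than a transcription of Algorithm 3: it stores each group's total bytes instead of its transmission fraction, so the merged fraction is recomputed as (number of slots × B) / (total bytes) rather than through the harmonic-mean update of line 14, and the function names and 0-based indices are assumptions.

```python
def min_loss_groups(A, B):
    """Single-pass grouping of slots; returns a list of [start, end, total_bytes].

    A[i] is the number of incoming bytes in slot i (0-based) and B is the
    bandwidth available per slot.
    """
    def tx_frac(group):
        start, end, total = group
        n = end - start + 1
        return n * B / total if total > 0 else float("inf")

    groups = []  # stack of groups, oldest first
    for i, a in enumerate(A):
        cur = [i, i, a]
        # Merge backwards while the current group would lose data and is worse
        # off than the group preceding it.
        while groups and tx_frac(cur) < 1.0 and tx_frac(cur) < tx_frac(groups[-1]):
            prev = groups.pop()
            cur = [prev[0], i, prev[2] + cur[2]]
        groups.append(cur)
    return groups


def per_slot_loss(A, B):
    """Loss rate of each slot: 1 - min(1, tx fraction of its group)."""
    rates = [0.0] * len(A)
    for start, end, total in min_loss_groups(A, B):
        n = end - start + 1
        frac = n * B / total if total > 0 else float("inf")
        for j in range(start, end + 1):
            rates[j] = 1.0 - min(1.0, frac)
    return rates


slot_groups = min_loss_groups([150, 150, 200, 130, 180, 100], 150)
slot_losses = per_slot_loss([150, 150, 200, 130, 180, 100], 150)
```

Each slot is pushed once and each merge pops one group, so the total work over all slots is linear in the number of slots.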

Algorithm complexity: The $O(M)$ complexity for the algorithm follows from the $M$ iterations of the for loop at line 5 and the fact that the while loop at line 12 (within the for loop) is executed at most $M$ times in all, over all the iterations of the for loop enclosing it. To see the latter, note that the while loop is executed only if curr_tx_frac (fraction of transmitted bytes for the current group) is less than tx_frac of the last


[Figure 5: four panels plotting loss fraction (0 to 0.2) against slot number (0 to 6000). (a) Vanilla streaming. (b) Loss minimization with disjoint windows. (c) Work-ahead streaming with sliding windows. (d) Loss minimization with sliding windows.]

Fig. 5: Per-slot loss rates with an epoch length of 50 slots.

group, in which case the current group is merged with the last group, lowering the total number of groups by one. Each step within the while loop requires constant time. Hence, since each execution of the while loop lowers the number of groups by one, the bound on the number of executions easily follows by noting that for $M$ slots, the total number of groups is at most $M$.

It can be shown that the above algorithm returns an optimal loss vector per our definition of optimality provided earlier.

Loss minimization over epochs: The above algorithm can be successively invoked at the beginning of disjoint epochs or windows of $M$ consecutive slots to minimize losses over them. The length of an epoch may be based on the number of future slots for which trace details are available. While such information is readily available for the entire length of a stored video, it can be expected to be available for a reasonable amount of time into the future for near-live video as well.

Minimizing losses over disjoint epochs has the disadvantage that slack cannot be distributed across epochs. This drawback can be overcome by using sliding epochs (of the same size) as opposed to disjoint epochs. Note that unlike increasing the epoch length, a sliding epoch does not increase the amount of look-ahead that is needed. The use of a sliding epoch would however necessitate that the algorithm be invoked at the beginning of each slot. The first invocation (at the beginning of the first slot) would minimize losses over the first $M$ slots. The remaining invocations need only consider one future slot (which is $M-1$ slots away from the current slot) and adjust the rates of the groups already constructed based on the incoming bytes in the new slot, and the amount of content, if any, already streamed from the current and later slots. Hence, during all invocations except the first, the for loop in line 5 will be executed just once. It can be shown that the amortized per-slot complexity is $O(1)$.

Empirical Results: We evaluated our loss minimization approach using 50 time-shifted and wrapped-around copies of the MPEG-4 encoded star wars trace available at [27]. The frame rate for the trace used is 25 fps and its GoP is 12 frames long. Our slot duration was set to one GoP duration, that is, 480

[Figure 6: two panels plotting loss fraction (0 to 0.2) against slot number (0 to 6000): work-ahead streaming with sliding windows, and loss minimization with sliding windows.]

Fig. 6: Per-slot loss rates with an epoch length of 250 slots.

ms. Peak and mean rates can be found in Fig. 2. We report results for traces smoothed over one GoP and those optimally smoothed over 50 GoPs.

We determined the per-slot loss rates for the aggregated trace separately under (a) vanilla streaming (without loss minimization or work-ahead streaming), wherein in each slot at most $B$ of the incoming bytes of that slot are streamed, (b) max loss minimization streaming with disjoint epochs, (c) opportunistic work-ahead streaming but without loss minimization, and (d) max loss minimization streaming with sliding epochs. In opportunistic work-ahead streaming, we allow future frames to be streamed early if there is slack in an earlier slot but do not subject the earlier slot to any losses. For all the schemes, the available bandwidth per slot, $B$, was set to 840 KB.
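The two baseline schemes, (a) vanilla streaming and (c) opportunistic work-ahead streaming, can be sketched as below. The sketch is an interpretation of the descriptions above, not the simulator used for the experiments: the function names, the greedy order in which spare capacity is given to upcoming slots, and the convention that `window` counts the current slot are assumptions.

```python
def vanilla_loss(A, B):
    """Scheme (a): each slot streams at most B of its own bytes."""
    return [max(0.0, 1.0 - B / a) if a > 0 else 0.0 for a in A]


def workahead_loss(A, B, window):
    """Scheme (c): spare capacity in a slot pre-streams bytes of later slots.

    Earlier slots are never subjected to extra loss. `window` is the number
    of slots visible at a time, including the current one (assumption).
    """
    M = len(A)
    remaining = list(A)  # bytes of each slot not yet streamed
    loss = [0.0] * M
    for i in range(M):
        cap = B
        sent = min(cap, remaining[i])  # the slot's own traffic goes first
        cap -= sent
        remaining[i] -= sent
        if A[i] > 0:
            loss[i] = remaining[i] / A[i]  # bytes unsent at the end of the slot are dropped
        # Leftover capacity is offered to upcoming slots, nearest first.
        for j in range(i + 1, min(M, i + window)):
            if cap <= 0:
                break
            pre = min(cap, remaining[j])
            remaining[j] -= pre
            cap -= pre
    return loss


A_demo = [150, 150, 200, 130, 180, 100]  # six-slot example from the text
loss_a = vanilla_loss(A_demo, 150)
loss_c = workahead_loss(A_demo, 150, window=50)
```

On the six-slot example, work-ahead streaming uses the 20 spare bytes of slot 4 to lower the loss in slot 5, but the 25% loss in slot 3 is unchanged because no earlier slot has slack.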

Per-slot loss rates under the different streaming schemes for traces smoothed over one GoP, with an epoch length of 50 (for schemes (b)–(d)), are plotted in Fig. 5. The cumulative distribution of per-slot loss rates is plotted in Fig. 7. We see that the loss-minimization approach with disjoint epochs (in inset (b)) reduces both the number and magnitude of the loss spikes, in comparison to vanilla streaming (in inset (a)). The spikes are not fully eliminated due to the limited epoch length. Inset (c) shows the loss rates when streaming ahead using sliding windows (but with no attempt to minimize losses over the slots) is employed. The use of sliding windows with work-ahead streaming significantly reduces the percentage of slots incurring losses, but the per-slot loss rate remains high. The per-slot loss rates are brought down drastically with our proposed approach using sliding windows (inset (d)).

Fig. 6 shows results for schemes (c) and (d) when the epoch length is increased to 250 slots (which amounts to 2 minutes, which may still be reasonable for near-live video). (Results for schemes (a) and (b) are omitted due to lack of space.) Increasing the epoch length favors the loss-minimization approach, both with disjoint (not shown) and sliding windows. With disjoint windows, the percentage of slots with losses, the maximum loss, and the loss per slot are all reduced. With sliding windows, the magnitude of the losses is lowered only slightly, while the percentage of slots with losses increases slightly. This is due to the fact that with sliding windows, the effective number of slots over which loss is distributed is much larger than the epoch length, and hence, the scope for improvement is smaller when the epoch length increases. From Fig. 7, it can be seen that while the fraction of slots incurring losses is lower with scheme (c), it has a long tail in that the loss rate is high for most of the lossy slots. In comparison, the per-slot loss rates are lower by around a factor of five under our proposed approach (scheme (d)).


[Figure 7: two panels plotting the percentile of slots (0.6 to 1) against loss fraction per slot (0 to 0.1) for the star wars trace, with curves for vanilla, loss min (disjoint epochs), workahead, and loss min (sliding epochs). (a) Epoch size: 50 slots. (b) Epoch size: 250 slots.]

Fig. 7: Cumulative distribution of per-slot loss rates for GoP-smoothed star wars trace. (a) 50-slot epoch. (b) 250-slot epoch.

[Figure 8: two panels plotting loss fraction (0 to 0.2) against slot number (0 to 6000): work-ahead streaming with sliding windows, and loss minimization with sliding windows.]

Fig. 8: Per-slot loss rates with an epoch length of 50 slots for traces optimally smoothed (over 50-slot sliding windows).

Finally, Fig. 8 plots the losses under schemes (c) and (d) when the streams have been optimally smoothed over 50 GoPs (i.e., 600 frames). Note that though losses under the work-ahead streaming approach (scheme (c)) are lowered in this case, our scheme is still capable of reducing the maximum per-slot loss rate by more than a factor of 2.

VI. CONCLUSION

We have presented bandwidth estimation and allocation algorithms for multiple classes of video flows with different loss tolerances, with the objective of lowering the aggregate bandwidth needed to meet the QoS needs of all the classes. We have also proposed an approach for minimizing the maximum loss and variation in loss across the length of a video in order to improve the video quality. In our numerical experiments, the proposed methods could lower the total bandwidth needs by over 20% and significantly lower the maximum loss and loss variability. Our schemes can be instantiated across the wired backhaul of a 3G/4G wireless network, to ease the stress due to the sharp increase in wireless video traffic. The tree topology of the wireless backhaul and the advent of wire-speed processors should make it less of a challenge to implement the proposed schemes that have higher computational overhead than simpler schemes.

Several directions for future work exist. One is to extend the proposed schemes for wireless links to provide a complete end-to-end solution. Another is to enhance our loss minimization approach to optimize directly for end-user QoE metrics such as PSNR.

REFERENCES

[1] Cisco Systems Inc., “Cisco visual networking index: Global mobile data traffic forecast update, 2009-2014.”

[2] T. V. Lakshman, A. Ortega, and A. R. Reibman, “VBR video: Tradeoffs and potentials,” Proceedings of the IEEE, vol. 86, no. 5, pp. 952–973, May 1998.

[3] E. W. Knightly and H. Zhang, “D-BIND: An accurate traffic model for providing QoS guarantees to VBR traffic,” in INFOCOM, April 1997, pp. 219–231.

[4] E. W. Knightly, “H-BIND: A new approach to providing statistical performance guarantees to VBR traffic,” in INFOCOM, March 1996, pp. 1091–1099.

[5] J. M. Londono and A. Bestavros, “A two-tiered on-line server-side bandwidth reservation framework for the real-time delivery of multiple video streams,” in Proceedings of Multimedia Computing and Networking, January 2009.

[6] D. P. LaPotin, S. Daijavad, S. Johnson, et al., “Workload and network-optimized computing systems,” IBM Journal of Research and Development, vol. 54, no. 1, pp. 1:1–1:12, January-February 2010.

[7] H. Franke, J. Xenidis, C. Basso, B. M. Bass, S. S. Woodward, J. D. Brown, and C. L. Johnson, “Introduction to the wire-speed processor and architecture,” IBM Journal of Research and Development, vol. 54, no. 1, pp. 3:1–3:11, January-February 2010.

[8] J. D. Salehi, Z.-L. Zhang, J. Kurose, and D. Towsley, “Supporting stored video: Reducing rate variability and end-to-end resource requirements through optimal smoothing,” IEEE/ACM Transactions on Networking, vol. 6, no. 4, pp. 397–410, August 1998.

[9] S. Sen, J. Rexford, J. Dey, J. Kurose, and D. Towsley, “Online smoothing of live, variable-bit-rate video,” IEEE Transactions on Multimedia, vol. 2, no. 1, pp. 37–48, March 2000.

[10] Z.-L. Zhang, J. Kurose, J. D. Salehi, and D. Towsley, “Smoothing, statistical multiplexing, and call admission control for stored video,” IEEE Journal on Selected Areas in Communications, vol. 15, no. 6, pp. 1148–1166, August 1997.

[11] M. Grossglauser, S. Keshav, and N. C. Tse, “RCBR: A simple and efficient service for multiple time-scale traffic,” IEEE/ACM Transactions on Networking, vol. 5, no. 6, pp. 741–755, December 1997.

[12] H. Zhang and E. W. Knightly, “RED-VBR: A renegotiation-based approach to support delay-sensitive VBR video,” ACM Multimedia Systems Journal, vol. 5, no. 3, pp. 164–176, May 1997.

[13] J. Kim, R. Simha, and T. Suda, “Analysis of a finite buffer queue with heterogeneous Markov modulated arrival processes: a study of traffic burstiness and priority packet discarding,” Computer Networks and ISDN Systems, vol. 28, no. 5, pp. 653–673, March 1996.

[14] V. Trecordi and G. Verticale, “Per-flow delay performance in a FIFO scheduler fed by policed UDP sources,” Computer Communications, pp. 309–316, 2000.

[15] T. Yang, D.-H. Tsang, and P. McCabe, “Cell scheduling and bandwidth allocation for heterogeneous VBR video conferencing traffic,” in IEEE Globecom, April 1995, pp. 377–377.

[16] C. Liu and J. Layland, “Scheduling algorithms for multiprogramming in a hard real-time environment,” Journal of the ACM, vol. 20, no. 1, pp. 46–61, January 1973.

[17] J.-C. Bolot and T. Turletti, “Experience with control mechanisms for packet video in the Internet,” ACM SIGCOMM Computer Communication Review, vol. 28, no. 1, pp. 4–15, January 1998.

[18] G.-M. Su and M. Wu, “Efficient bandwidth resource allocation for low-delay multiuser video streaming,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 15, no. 9, pp. 1124–1137, September 2005.

[19] X. M. Zhang, A. Vetro, Y. Q. Shi, and H. Sun, “Constant quality constrained rate allocation for FGS coded video,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, no. 2, pp. 121–130, February 2003.

[20] J. Huang, C. Krasic, J. Walpole, and W.-C. Feng, “Adaptive live video streaming by priority drop,” in IEEE Conference on Advanced Video and Signal Based Surveillance, July 2003, pp. 342–347.

[21] Z. Antoniou and I. Stavrakakis, “An efficient deadline-credit-based transport scheme for prerecorded semisoft continuous media applications,” IEEE/ACM Transactions on Networking, vol. 10, no. 5, pp. 630–643, October 2002.

[22] S. Oh, B. Kulapala, A. Richa, and M. Reisslein, “Continuous-time collaborative prefetching of continuous media,” IEEE Transactions on Broadcasting, vol. 54, no. 1, pp. 36–52, March 2008.

[23] P. Skelly, M. Schwartz, and S. Dixit, “A histogram-based model for video traffic behavior in an ATM multiplexer,” IEEE/ACM Transactions on Networking, vol. 1, no. 4, pp. 446–459, August 1993.

[24] D. Tse and M. Grossglauser, “Measurement-based call admission control: Analysis and simulation,” in INFOCOM, April 1997, pp. 981–989.

[25] W.-C. Poon and K.-T. Lo, “The study on statistical multiplexing of homogeneous VBR-MPEG video streams,” Computer Communications, vol. 22, no. 15-16, pp. 1457–1467, September 1999.

[26] F. P. Kelly, “Notes on effective bandwidths,” Stochastic Networks: Theory and Applications, pp. 141–168, 1996.

[27] F. Fitzek and M. Reisslein, “MPEG-4 and H.263 video traces for network performance evaluation,” IEEE Network, vol. 15, no. 6, pp. 40–54, December 2001.