
1638 IEEE TRANSACTIONS ON MULTIMEDIA, VOL. 15, NO. 7, NOVEMBER 2013

Proxy-Based Multi-Stream Scalable Video Adaptation Over Wireless Networks Using Subjective Quality and Rate Models

Hao Hu, Xiaoqing Zhu, Yao Wang, Fellow, IEEE, Rong Pan, Jiang Zhu, and Flavio Bonomi

Abstract: Despite growing maturity in broadband mobile networks, wireless video streaming remains a challenging task, especially in highly dynamic environments. Rapidly changing wireless link qualities, highly variable round trip delays, and unpredictable traffic contention patterns often hamper the performance of conventional end-to-end rate adaptation techniques such as TCP-friendly rate control (TFRC). Furthermore, existing approaches tend to treat all flows leaving the network edge equally, without accounting for heterogeneity in the underlying wireless link qualities or the different rate utilities of the video streams. In this paper, we present a proxy-based solution for adapting the scalable video streams at the edge of a wireless network, which can respond quickly to highly dynamic wireless links. Our design adopts the recently standardized scalable video coding (SVC) technique for lightweight rate adaptation at the edge. Leveraging previously developed rate and quality models of scalable video with both temporal and amplitude scalability, we derive the rate-quality model that gives the maximum quality achievable under a given rate by choosing the optimal frame rate and quantization stepsize. The proxy iteratively allocates rates of different video streams to maximize a weighted sum of video qualities associated with different streams, based on the periodically observed link throughputs and the sending buffer status. The temporal and amplitude layers included in each video are determined to optimize the quality while satisfying the rate assignment. Simulation studies show that our scheme consistently outperforms TFRC in terms of agility to track link qualities and overall subjective quality of all streams. In addition, the proposed scheme supports differential services for different streams, and competes fairly with TCP flows.

Index Terms: Scalable video coding (SVC), subjective video quality model, video rate adaptation, wireless video streaming.

Manuscript received November 09, 2011; revised September 25, 2012; accepted December 25, 2012. Date of publication June 04, 2013; date of current version October 11, 2013. This work was supported in part by a gift award from Cisco Systems, Inc. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Zhihai (Henry) He.

H. Hu, X. Zhu, R. Pan, J. Zhu, and F. Bonomi are with the Advanced Architecture and Research Group, Cisco Systems, San Jose, CA 95134 USA.

Y. Wang is with the Department of Electrical and Computer Engineering, Polytechnic Institute of NYU, Brooklyn, NY 11201 USA.

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TMM.2013.2266092

1520-9210 © 2013 IEEE

I. INTRODUCTION

Recent years have seen a proliferation of smart phones and constant bandwidth upgrades in broadband mobile networks. These two factors combined have fueled the rapid growth of mobile media traffic. The study in [1] predicts that by 2015, two-thirds of the world's mobile data will be video. On the other hand, mobile media streaming remains a daunting task, especially for users in a highly dynamic environment. The presence of heterogeneous access networks and high user mobility contribute to the wide fluctuations of wireless link qualities in terms of their throughputs and latencies. As multiple video streaming sessions share the same access node (e.g., a cellular base station or a WiFi access point), the system also needs to allocate wireless channel resources wisely among competing traffic flows. It is therefore of crucial importance to have an effective video rate adaptation scheme that strives for the best possible viewing experience of individual users in the face of wide link quality fluctuations and dynamic network traffic patterns.

The challenges are multifold. First, rate adaptation for streaming video needs to closely track fluctuations in the available wireless link bandwidth. Conventional techniques such as TCP-friendly rate control (TFRC) [2], however, typically rely on end-to-end packet statistics and fall behind abrupt changes in the underlying network conditions. Second, existing approaches achieve fairness by allocating equal rates to all competing flows, whereas video streams naturally differ in their utilities of rate depending on their contents. For instance, it would be desirable for an action movie sequence to be streamed at a higher rate than a head-and-shoulder news clip competing over the same bottleneck wireless link. Such content-aware allocation is missing in today's systems. Third, clients connecting to the same access node may experience different throughputs over their respective wireless links, due to factors such as distance and channel fading characteristics. Without proper in-network information, rate adaptation decisions made at the senders can easily lead to inefficient resource sharing. More specifically, packet transmissions over a low-quality wireless link can block the access node from adequately serving other streams over higher-quality links, a problem commonly known as head-of-line blocking [3].

In this paper, we address the above issues in a novel rate adaptation scheme for streaming video over a highly dynamic environment. Our design introduces a proxy at the edge of the network, right where congestion over the wireless links occurs. This allows the rate adaptation module to constantly monitor the bottleneck buffer level, which, in turn, reflects variations in the throughput and delay of wireless links for all receivers. To strike a balance between computational complexity and efficiency, we adopt the latest H.264/SVC standard [4] for lightweight in-network rate adaptation. Although the standard offers spatial, temporal and amplitude scalability, we only make use of temporal and amplitude scalability in the present work. Compared with nonscalable H.264 coding, the relative bit rate increase at the same fidelity for amplitude scalability can be as low as 10% for all supported rate points when spanning a bitrate range with a factor of 2–3 between the lowest and highest supported rate point [5]. To provide a wider selection of rate points while maintaining high coding efficiency, temporal scalability can be considered, as providing temporal scalability usually does not have any negative impact on coding efficiency [5]. By combined usage of temporal scalability and amplitude scalability, a wide bitrate range (with a factor of more than 10) is allowed. The resulting scalable video stream can be decoded at different frame rates (FR) and quantization stepsizes (QS). We further leverage the parametric models from our prior work [6] to explicitly account for the impact of FR and QS on rate and subjective quality of the scalably encoded stream. These models enable our system to choose the best combination of FR and QS, and correspondingly the temporal and amplitude layers, given a rate constraint for each stream. The adaptation of both FR and QS supports video delivery over a wide range of rates, which is important because of the wide ranging channel conditions over wireless networks.

The goal of the video adaptation module at a proxy node is to maximize the overall viewing experience of all traversing streams. We show that the problem can be decomposed into two steps: i) to allocate the video rate for each stream based on their respective rate-quality relations and wireless link throughputs and the common bottleneck buffer level; and ii) to extract video packets belonging to the appropriate temporal and amplitude layers from each scalable video stream based on the allocated rate. Given the optimal rate-quality tradeoff derived from the original rate and quality models, the first subproblem of multi-stream rate allocation is solved by maximizing the weighted sum of user qualities under a total network utilization constraint. We propose an iterative solution, whereby the per-stream rate is calculated based on periodic updates of bottleneck buffer level and relative link throughputs. The second subproblem can be solved offline, by using the original rate and quality parametric models to pre-order the video temporal and amplitude layers so that each additional layer offers maximum quality improvement for the rate increment.

The main contributions of this study include: i) We derived a novel analytical rate-quality model, which gives the maximum achievable quality for a given video rate, for scalable video with both temporal and amplitude scalability; the model is highly accurate for a variety of video content and does not require any content-dependent parameters except the rate of the complete scalable stream; ii) We presented an efficient method for pre-ordering the temporal and amplitude layers of a scalable stream to achieve rate-quality optimality; iii) We proposed a proxy-based video adaptation architecture for multi-stream video streaming to wireless nodes, which can react quickly to changes in the link conditions of the wireless nodes; iv) We derived an iterative multi-stream rate allocation scheme at the proxy that can maximize a weighted sum of received video quality at all receivers, given the link bandwidths of all receivers. Extensive simulation studies confirm that the proposed scheme consistently outperforms conventional TFRC-based rate adaptation used in the Datagram Congestion Control Protocol (DCCP) [7].

The rest of the paper is organized as follows. The next section presents a review of related work. Then, we provide background on scalable video coding and our prior work in modeling the rate and subjective quality of SVC streams. In Section IV, we derive the optimal rate-quality tradeoff based on the prior subjective quality and rate models, and present an algorithm for ordering the temporal and amplitude layers to achieve rate-quality optimality while considering the layer dependency in SVC. In Section V, we describe the framework for maximizing a weighted sum of the subjective quality for all participating streams, subject to the channel constraint and video coding constraints. An iterative rate allocation scheme is proposed to solve the quality maximization problem. In Section VI we address the practical system designs. Section VII compares the performance of the proposed scheme against TFRC-based rate adaptation in DCCP under various network scenarios. Our contributions and future work are summarized in Section VIII.

II. RELATED WORK

A. Subjective Quality and Rate Modeling

There has been extensive research exploring the impact of frame rate and quantization step size, individually and jointly, on the perceptual quality [6], [8]–[14]. However, there is not a widely adopted quality model that considers explicitly the effect of both FR and QS. In this work, we choose to use the quality model developed in [6], [14], as this model was shown to be highly accurate not only over the authors' own dataset, but also over several other datasets. In addition, this model is analytically simple, with only two parameters that can be estimated accurately from features computed from the original video [14].

To the best of our knowledge, no prior work except [6] has proposed rate models that consider the impact of FR and QS explicitly. By using both the quality and rate models in [6], we develop in this work a rate-quality model. Using both the rate and quality models, we also develop an efficient algorithm to pre-order the temporal and amplitude layers to achieve rate-quality optimality.

B. Video Adaptation and Rate Allocation

It has long been agreed that the video streaming rate needs some form of adaptation to match the time-varying wireless channel capacity [15], to provide a better user experience. At the encoder end, techniques such as adaptive encoder rate control [16], [17], transcoding [18], and bitstream switching [19] have been proposed to dynamically adjust the video rate. A viable alternative to this is scalable video coding, whereby a stream only needs to be encoded once yet can be flexibly decoded at several different target rates [20]. Such a design greatly facilitates on-the-fly adaptation of the spatial resolution, temporal rate, and frame quality (controlled by QS) of the transmitted video stream. The recently standardized SVC extension in H.264, in particular, has succeeded in achieving coding efficiency comparable to that of the non-scalable encoding in H.264/AVC [5]. Hence it is especially appealing for video streaming in a mobile environment [21]. However, given a target rate, there are many possible combinations of SVC spatial, temporal and amplitude layers that can lead to different perceptual quality. Therefore, adaptation of SVC streams subject to a rate constraint is not a trivial problem. Using the rate and quality models in [6], we can pre-order the temporal and amplitude layers in an SVC stream to reach rate-quality optimality, greatly simplifying the SVC adaptation problem.

In terms of the underlying rate control protocol/algorithm, many conventional schemes rely on equation-based TCP-friendly rate control (TFRC) [2] for regulating the rate of each stream [22], [23]. However, these end-to-end schemes often suffer from slow convergence when the bottleneck link bandwidth changes rapidly, and lead to allocation results oblivious of video content characteristics. The work in [24] proposed a TCP-friendly video transport protocol targeting the wireless environment, but it is still content-agnostic. The works in [25]–[28] rely on a video rate-distortion model to solve network resource allocation while providing video content-awareness. The model is, however, only applicable for videos at a fixed frame rate. Cross-layer design is also investigated in the literature to improve video adaptation over wireless [29]–[32]. In a very recent work [33], proxy-assisted video adaptation is considered for the case where multiple streams share a common backbone network and the video rate-distortion model is used. Our work stands apart from existing approaches by combining the rate and quality adaptation capability of H.264/SVC with a rate-quality tradeoff model that considers the effect of both frame rate and quantization stepsize on the rate and quality. In addition, link bandwidth heterogeneity is considered in the rate allocation. The proposed algorithm enables a fast converging, quality optimized rate allocation at the proxy node. The proposed system is capable of closely following dynamics in the wireless link bandwidth, tailoring the rate allocation for each stream based on its own rate-quality tradeoff and the effective link bandwidth, and choosing the optimal combination of frame rate and quantization stepsize that maximizes the quality for a given rate.

III. SVC BACKGROUND AND PRIOR WORK IN QUALITY AND RATE MODELING

A. SVC Coding Scheme and Rate Adaptation

Scalable video coding schemes have been advocated for video adaptation to network and terminal capabilities due to their low complexity and flexibility [20]. This approach eliminates computationally demanding transcoding processes at video servers or intermediate proxies by simply extracting appropriate bitstreams according to network or terminal constraints. It is shown that the latest SVC standard [4] can achieve coding efficiency comparable to that of the state-of-the-art H.264/AVC non-scalable coding [34].

In SVC, the motion-compensated transform coding architecture is extended to achieve a wide range of spatio-temporal and amplitude scalabilities. Fig. 1 illustrates the typical structure of a group of pictures (GOP) that implements only temporal and amplitude scalabilities. Each frame in the video sequence is encoded into multiple amplitude layers with decreasing QS $q$. Inter-frame prediction among the pictures in each amplitude layer follows a dyadic pattern, leading to several temporal layers with increasing FR $t$. This approach, also known as coarse granularity scalability (CGS), allows the stream to be flexibly decoded at various combinations of amplitude and temporal levels without introducing any mismatch error in the decoding process. In the example of Fig. 1, for instance, the stream supports 3 amplitude layers and 4 temporal layers, thereby allowing 12 rate-quality tradeoff points. If at a certain point the server or proxy decides to send a given combination of amplitude and temporal layers, then all chunks carrying the corresponding labels will be extracted for sending, e.g., the shaded chunks in the GOP. In the bitstream, each chunk represents a network abstraction layer (NAL) unit, containing bits at that layer from a single video frame.

Fig. 1. Structure of a group of pictures (GOP) in an H.264/SVC stream encoded with the coarse granularity scalability (CGS) approach. The GOP length is 8 frames in this example. The stream supports 3 amplitude layers and 4 temporal layers.

With CGS, video adaptation is allowed at GOP boundaries. In the same example, suppose the highest frame rate is 30 frames per second (fps); then the video rate can be switched every 8/30 seconds. There is a tradeoff between granularity of the adaptation interval and coding efficiency, since a smaller GOP size leads to a lower compression ratio. The typical choice of GOP size ranges between 8 and 32 for video streaming.
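The numbers in the example above can be reproduced with a short sketch. It assumes a purely dyadic temporal hierarchy in which each temporal layer doubles the frame rate; the function names are hypothetical and introduced only for illustration.

```python
from fractions import Fraction


def temporal_layer_frame_rates(max_fps, num_temporal_layers):
    """Frame rates of a dyadic hierarchy, lowest temporal layer first.

    Assumes each additional temporal layer doubles the frame rate.
    """
    return [Fraction(max_fps, 2 ** k)
            for k in reversed(range(num_temporal_layers))]


def adaptation_interval(gop_length, max_fps):
    """Seconds between GOP boundaries, i.e. between rate switches."""
    return Fraction(gop_length, max_fps)


# Example of Fig. 1: GOP of 8 frames, 30 fps, 4 temporal and 3 amplitude layers.
rates = temporal_layer_frame_rates(30, 4)      # 3.75, 7.5, 15 and 30 fps
interval = adaptation_interval(8, 30)          # 8/30 s between switches
operating_points = 3 * len(rates)              # amplitude x temporal layers
```

Exact fractions are used so that the 8/30 s interval is not blurred by floating-point rounding.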

B. Modeling Subjective Quality and Rate of SVC

In [6], [14], the authors studied the impact of FR and QS on the subjective quality and bitrate of scalable video. Based on the mean opinion score (MOS) obtained from subjective quality tests, it is observed that the impact of FR and that of QS on the MOS is separable. In other words, the video quality can be quantified as the product of a metric that assesses the quality of a quantized video at the highest frame rate, based on the QS $q$, and a temporal correction factor which scales the quality assessment according to the actual FR $t$. The normalized subjective quality at any QS $q$ and FR $t$ can be written as:

(1)

where the two content-dependent model parameters govern the quantization term and the temporal correction factor, respectively; $q_{\min}$ denotes the minimum QS; and $t_{\max}$ denotes the maximum FR. Here quality is normalized with respect to the quality achievable at $(q_{\min}, t_{\max})$. Note that the quantization parameter indicates how fast the quality drops with increasing $q$, with a larger value suggesting a faster drop. On the other hand, the temporal parameter determines how fast the quality reduces when $t$ decreases, with a smaller value incurring a faster drop.

The parameter values depend on the motion and texture characteristics of the video. The parameter values for 7 test sequences used in [14] are given in Table I. As can be seen, the quantization parameter ranges between 0.05–0.2, while the temporal parameter varies between 5–9, for these sequences, which cover a large range of motion and texture characteristics.

TABLE I
PARAMETERS FOR RATE MODEL, QUALITY MODEL, AND RATE-QUALITY MODEL AND THE RMSE FOR RATE-QUALITY MODEL

For the same set of encoded sequences, their bitrates are recorded and the influence of $q$ and $t$ on the bitrate is analyzed. Following the same decomposition approach, it is shown that the bitrate can also be modeled as the product of two functions of $q$ and $t$, respectively. The overall rate model is:

(2)

where the two content-dependent model parameters are associated with $q$ and $t$, respectively, and $R_{\max}$ corresponds to the bitrate at $(q_{\min}, t_{\max})$. Here, the first parameter characterizes how fast the bitrate reduces when $q$ increases. On the other hand, the second parameter controls the bitrate decaying speed when $t$ decreases. Table I shows that the first parameter varies over 1.0–1.3, while the second ranges between 0.4–0.8.

How to estimate the values of the model parameters based on the video characteristics can be found in [35]. Basically, two content features, i.e., average frame difference and average motion vector magnitude, when linearly combined, suffice to give an acceptable prediction accuracy. These features, and consequently the model parameters, can be computed along with the encoding operation at the video server, with a complexity no more than that of motion estimation.
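As a rough illustration of the separable structure described in this subsection, the sketch below evaluates one possible pair of quality and rate models. The functional forms (an exponential in $q$ times an inverted exponential in $t$ for quality, and a power law in both variables for rate) are assumptions chosen to match the qualitative description and the quoted parameter ranges; they are not taken from the text above. The names `alpha`, `beta`, `a`, `b` and all numeric values are likewise hypothetical.

```python
import math


def normalized_quality(q, t, q_min, t_max, alpha, beta):
    """Assumed separable quality model, equal to 1 at (q_min, t_max).

    alpha: larger value -> quality drops faster as q grows.
    beta:  smaller value -> quality drops faster as t shrinks.
    """
    quant_term = math.exp(-alpha * q / q_min) / math.exp(-alpha)
    temporal_term = (1 - math.exp(-beta * t / t_max)) / (1 - math.exp(-beta))
    return quant_term * temporal_term


def bitrate(q, t, q_min, t_max, r_max, a, b):
    """Assumed separable rate model, equal to r_max at (q_min, t_max)."""
    return r_max * (q / q_min) ** (-a) * (t / t_max) ** b


# Illustrative values only, picked inside the parameter ranges quoted above.
Q_MIN, T_MAX, R_MAX = 16.0, 30.0, 1000.0
full = normalized_quality(Q_MIN, T_MAX, Q_MIN, T_MAX, alpha=0.1, beta=7.0)
coarse = normalized_quality(2 * Q_MIN, T_MAX, Q_MIN, T_MAX, alpha=0.1, beta=7.0)
half_fr_rate = bitrate(Q_MIN, T_MAX / 2, Q_MIN, T_MAX, R_MAX, a=1.2, b=0.6)
```

Under these assumptions, doubling the QS lowers quality only mildly, while halving the frame rate cuts the bitrate by a factor of $0.5^{b}$.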

IV. RATE-QUALITY MODELING AND QUALITY OPTIMIZED ORDERING OF SVC LAYERS

A. Deriving Rate-Quality Model

As decreasing FR or increasing QS will lead to decreasing bitrate and vice versa, there might be multiple combinations of FR and QS that satisfy a given bitrate constraint $R$. But the associated quality is different. It is then desirable to find the combination that gives the best quality while satisfying the constraint. By finding the optimal $(q, t)$ and the corresponding maximum achievable normalized quality for each possible $R$, we arrive at the rate-quality model, to be denoted as $Q(R)$.

First, we define some notations that will be used later: the normalized bitrate $\tilde{R} = R/R_{\max}$, the normalized FR $\tilde{t} = t/t_{\max}$, and the relative QS $\tilde{q} = q/q_{\min}$. Given a rate constraint $R$, it can be derived from (2) that $\tilde{R}$ is determined by $\tilde{q}$ and $\tilde{t}$, which is equivalent to expressing $\tilde{q}$ as a function of $\tilde{R}$ and $\tilde{t}$. Then, we can rewrite (1) in terms of $(\tilde{R}, \tilde{t})$ as

(3)

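To find the optimal FR which maximizes $Q(\tilde{R}, \tilde{t})$ for a given $\tilde{R}$, we solve $\partial Q / \partial \tilde{t} = 0$. Together with (2), we have the following relation between the optimal $(\tilde{q}, \tilde{t})$ and the rate constraint $\tilde{R}$: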

(4)

(5)
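To make the stationarity argument concrete, the following worked derivation assumes specific forms for (1) and (2): an exponential quantization term and an inverted-exponential temporal term for quality, and a power law for rate. These forms and the symbols $\alpha$, $\beta$, $a$, $b$ are assumptions introduced here for illustration, so the result should be read as a sketch rather than as the exact content of (4) and (5).

```latex
% Assumed models (normalized variables \tilde{q} \ge 1, 0 < \tilde{t} \le 1):
%   Q = e^{-\alpha(\tilde{q}-1)} \, \frac{1-e^{-\beta\tilde{t}}}{1-e^{-\beta}},
%   \tilde{R} = \tilde{q}^{-a}\,\tilde{t}^{\,b}.
\begin{align*}
\tilde{q} &= \tilde{R}^{-1/a}\,\tilde{t}^{\,b/a}
  && \text{(rate constraint solved for } \tilde{q}\text{)} \\
\ln Q(\tilde{R},\tilde{t}) &= \alpha\bigl(1-\tilde{R}^{-1/a}\tilde{t}^{\,b/a}\bigr)
  + \ln\bigl(1-e^{-\beta\tilde{t}}\bigr) - \ln\bigl(1-e^{-\beta}\bigr) \\
\frac{\partial \ln Q}{\partial \tilde{t}} &=
  -\frac{\alpha b}{a}\,\tilde{R}^{-1/a}\,\tilde{t}^{\,b/a-1}
  + \frac{\beta}{e^{\beta\tilde{t}}-1} = 0 \\
\Rightarrow\quad \tilde{q} &= \frac{a\beta}{\alpha b}\,
  \frac{\tilde{t}}{e^{\beta\tilde{t}}-1},
  \qquad \tilde{R} = \tilde{q}^{-a}\,\tilde{t}^{\,b}.
\end{align*}
```

Under these assumptions $\tilde{t}/(e^{\beta\tilde{t}}-1)$ decreases in $\tilde{t}$, so a higher optimal frame rate goes with a smaller optimal QS, which agrees with the monotonic behavior described in the text.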

Note here, as $\tilde{t} \le 1$ and $\tilde{q} \ge 1$, if the resulting $\tilde{q}$ is less than 1, the constraint $\tilde{q} \ge 1$ is active and then $\tilde{q}$ is clipped to 1; a new $\tilde{t}$ is calculated using (5). The above equations give a criterion for choosing the FR and QS under any rate constraint. It is easy to verify that the optimal solution is unique, and that the optimal $\tilde{t}$ is monotonically increasing while the optimal $\tilde{q}$ is monotonically decreasing as $\tilde{R}$ increases.

Then, the best normalized quality can be derived from the optimal $(\tilde{q}, \tilde{t})$ by using (1). Although it is hard to derive closed-form relations between the rate constraint $\tilde{R}$ and the optimal $\tilde{q}$, $\tilde{t}$, and $Q$, we can easily compute $\tilde{q}$, $\tilde{t}$, and $Q$ for any given $\tilde{R}$ based on (4) and (5) numerically (for example, using the "solve" function in MATLAB). For notational simplicity, in the sequel, we will omit the superscript that marks optimality, and use $Q(\tilde{R})$ to denote the optimal rate-quality tradeoff. Fig. 2 shows the optimal $(q, t)$ and the corresponding $Q$ versus bitrate for two sequences, FOREMAN and FOOTBALL.

Fig. 3 shows the numerically computed optimal rate-quality tradeoff curves in terms of normalized rate for seven sequences with various video characteristics. We found that these curves can be closely approximated with the following exponential function

(6)


where the two model parameters shape the curve. The first controls the overall quality dropping rate, with a larger value indicating a faster drop. The second impacts the quality gain over bitrate increment; in other words, a larger value implies a smaller bit increment for the same quality gain. In order to estimate the two parameters, we have to first determine a set of the optimal $(\tilde{R}, Q)$ points by using (3) and (4), then employ a standard non-linear regression method. For pre-encoded video, this can be done offline. For real-time live video, this is still feasible since we only need to re-estimate the parameters every few seconds or longer. Table I summarizes the model parameters for each sequence and the model accuracy in terms of the root-mean-square error (RMSE). The fitting curve for each individual sequence matches its data very accurately and hence is not shown separately in Fig. 3.

Fig. 2. Optimal FR and QS (left axis) and the corresponding optimal normalized quality (right axis) versus bitrate. These results assume that both FR and QS can take on any value in their respective ranges. (a) FOREMAN; (b) FOOTBALL.

Fig. 3. Normalized optimal rate-quality tradeoff curves for seven sequences: AKIYO, CITY, CREW, FOOTBALL, FOREMAN, ICE and WATERFALL. The solid line gives a unified rate-quality model.

Because the optimal $Q(\tilde{R})$ curves for different sequences in Fig. 3 are very close to each other, we further propose to use a unified model with the same parameter set for all sequences. The unified parameter values are obtained using least squares fitting to the data from all sequences. This unified model is also shown in Fig. 3 and summarized in Table I. As will be shown later, the rate allocation results obtained using the unified model are very similar to those obtained using content-dependent values for the two parameters. Note that although we propose to use the same two parameters, the maximum rate $R_{\max}$ is still video-sequence dependent. We can rewrite $Q(\tilde{R})$ as a function of the absolute rate $R$ as

(7)
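The numerical search for the optimal operating point can be sketched without a symbolic solver. The example below grid-searches the normalized frame rate, derives the QS from the rate constraint, and clips the QS at its minimum. It assumes an exponential/power-law form for the quality and rate models; the constants `ALPHA`, `BETA`, `A`, `B` are hypothetical values, not parameters reported in Table I.

```python
import math

# Hypothetical model parameters (assumptions for illustration only).
ALPHA, BETA = 0.1, 7.0   # quality model: QS term, FR term
A, B = 1.2, 0.6          # rate model: QS exponent, FR exponent


def quality(q_rel, t_norm):
    """Assumed normalized quality for relative QS >= 1 and 0 < t_norm <= 1."""
    temporal = (1 - math.exp(-BETA * t_norm)) / (1 - math.exp(-BETA))
    return math.exp(-ALPHA * (q_rel - 1.0)) * temporal


def best_operating_point(r_norm, steps=2000):
    """Return (quality, q_rel, t_norm) maximizing quality within r_norm."""
    best = None
    for k in range(1, steps + 1):
        t_norm = k / steps
        # QS that spends exactly r_norm at this frame rate; it cannot go
        # below the minimum QS, so clip at 1 (rate then stays under budget).
        q_rel = max(1.0, (t_norm ** B / r_norm) ** (1.0 / A))
        candidate = (quality(q_rel, t_norm), q_rel, t_norm)
        if best is None or candidate[0] > best[0]:
            best = candidate
    return best


low = best_operating_point(0.1)
mid = best_operating_point(0.5)
top = best_operating_point(1.0)
```

Sampling `best_operating_point` over a range of normalized rates yields a rate-quality curve that can then be fitted by non-linear regression, as described above.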

B. Rate-Quality Optimized Ordering of SVC Layers

To efficiently stream a pre-coded scalable video where the target bit rate is changing dynamically, it is desirable to pre-order the SVC layers in a rate-quality optimized manner, so that each additional layer yields the maximum possible quality improvement. With such a pre-ordered SVC stream, the proxy can simply keep sending additional layers until the rate target is reached. Noting that each SVC layer (together with its previous layers) corresponds to a feasible $(q, t)$ pair, the problem is equivalent to ordering the feasible $(q, t)$ pairs, subject to the decoding dependency constraint. In this section, we discuss how to employ the rate and quality models given in Section IV-A to optimize the ordering of $(q, t)$ pairs.

We have shown that (4) and (5) can be employed to determine the optimal $(q, t)$ which gives the best quality under a rate constraint. However, the resulting $(q, t)$ might not always be feasible. For example, if an SVC stream is encoded with a dyadic temporal prediction structure, the feasible FR doubles with each step from the lowest to the highest FR. Similarly, there are only a small number of amplitude layers in a typical SVC stream, corresponding to a few discrete levels of QS. In the following, we discuss how to take into account such practical limitations.

Suppose the stream has a given number of temporal layers and amplitude layers, each corresponding to one feasible choice of FR or QS. We can construct a table which gives all possible combinations, each indicated by a quadruplet $(q, t, R, Q)$. If some combination in this table has higher $R$ but lower $Q$ than at least one other combination, then it is clearly not rate-quality optimal. We can eliminate these points and order the remaining points in increasing rates with two steps. The first step is to sort all points in terms of their rates from low to high. Then, starting from the point with the second lowest rate to the end (the first point corresponds to the base layer), we compare the quality of the current point to that of the previously kept point; we remove the current point if its quality is less or equal, and otherwise keep it. We refer to the table that contains the remaining entries as the pruned table. Generating it requires one sort followed by a single pass over the points.

The points in the pruned table have the property that as the rate increases, the quality also increases. Some of these points may not be optimal in the sense that they do not provide the maximum possible quality improvement for the incurred rate increment. Furthermore, some points in the pruned table may not satisfy the SVC decoding dependency. For example, a current point may have a FR that is lower than the previous point, or a QS that is higher than the previous point. There are multiple ways to remove the non-feasible points in the pruned table, based on the rationale that the next feasible point can be either increasing in FR or decreasing in QS or both. We use the following Algorithm 1 to remove the non-optimal and non-feasible points in the pruned table and create the rate-quality optimized table.


TABLE II
ENTRIES OF THE RATE-QUALITY OPTIMIZED TABLE FOR SEVEN SEQUENCES. EACH ENTRY IS A QUADRUPLET OF $(q, t, R, Q)$

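Algorithm 1 Generate rate-quality optimized table

put the first (lowest-rate) point of the pruned table into the optimized table; take it as the current point.
while the current point is not the last point of the pruned table do
    for each point after the current point in the pruned table do
        if the point has lower FR or higher QS than the last point in the optimized table then
            continue;
        end if
        calculate the quality improvement per rate increment from the current point and this point;
    end for
    find the point which gives the highest value;
    put this point into the optimized table;
    take this point as the current point;
end while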
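The two-step pruning and the greedy ordering of Algorithm 1 can be sketched as follows. This is one reading of the procedure, assuming that the value compared in the inner loop is the quality gain per unit of added rate, that all points have distinct rates, and that the loop stops when no feasible successor remains. The sample points are made up for illustration and are not entries of Table II.

```python
def prune_monotone(points):
    """Sort (qs, fr, rate, quality) points by rate and keep only those
    whose quality strictly exceeds that of the previously kept point."""
    ordered = sorted(points, key=lambda p: p[2])
    kept = [ordered[0]]                      # base layer is always kept
    for point in ordered[1:]:
        if point[3] > kept[-1][3]:
            kept.append(point)
    return kept


def order_layers(pruned):
    """Greedy selection: from the current point, pick the later point that
    respects layer dependency and has the steepest quality/rate slope."""
    optimized = [pruned[0]]
    i = 0
    while i < len(pruned) - 1:
        last_qs, last_fr, last_rate, last_quality = optimized[-1]
        best_j, best_slope = None, float("-inf")
        for j in range(i + 1, len(pruned)):
            qs, fr, rate, quality = pruned[j]
            if fr < last_fr or qs > last_qs:
                continue                     # violates decoding dependency
            slope = (quality - last_quality) / (rate - last_rate)
            if slope > best_slope:
                best_j, best_slope = j, slope
        if best_j is None:
            break                            # no feasible successor left
        optimized.append(pruned[best_j])
        i = best_j
    return optimized


# Hypothetical (QS, FR in fps, rate in kbps, normalized quality) points.
points = [
    (40, 7.5, 100, 0.30), (40, 15, 150, 0.45), (24, 7.5, 180, 0.40),
    (40, 30, 220, 0.55), (24, 15, 260, 0.70), (24, 30, 400, 0.85),
    (16, 15, 450, 0.80), (16, 30, 700, 1.00),
]
pruned = prune_monotone(points)
optimized = order_layers(pruned)
```

In this made-up example the point at 220 kbps survives the first pruning step but is skipped by the greedy pass, because jumping directly to the 260 kbps point yields a steeper quality gain.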

Fig. 4 shows the rate-quality relations of the points contained in the tables, together with the corresponding value calculated for each point using Algorithm 1. Clearly, many feasible points are not optimal and are removed to get the pruned table. A few points in the pruned table are still not optimal or do not satisfy the decoding dependency, and are removed to get the rate-quality optimized table. Note that the removed points in the pruned table tend to have a much smaller quality improvement compared to a neighbor point with slightly larger rate. As the target rate increases, the FR monotonically increases while the QS monotonically decreases, thereby satisfying dependency across the layers. The points in the optimized table follow the concave shape of the rate-quality curves in Fig. 3 very closely, indicating that ordering SVC layers based on these points yields a near-optimal rate-quality tradeoff. Table II summarizes the entries in the optimized table for all seven sequences.

Fig. 4. Quality optimized table for the sequence FOOTBALL. (a) Rate-quality tradeoff of points in the tables; (b) corresponding values calculated by Algorithm 1 for points in the table.

V. QUALITY MAXIMIZATION FRAMEWORK AND ALGORITHMS FOR RATE ALLOCATION

In this section, we develop a subjective quality maximization framework for rate allocation among multiple wireless receivers under the same wireless access node. The framework is based on the rate-quality model in (7) and can be easily adapted to accommodate other models.

A. Problem Formulation

Consider a set of video receivers sharing a common access node. For each receiver $i$ experiencing its own wireless link throughput, the video of interest can be adapted to have an equivalent coding parameter setting indicated by $(q_i, t_i)$, which results in a subjective quality of $Q_i(q_i, t_i)$ and a video bitrate of $R_i(q_i, t_i)$. In addition, suppose there is a set of background flows, each generating traffic at its own rate. From a system-wide point of view, the optimal network utilization is achieved by solving the following utility maximization problem.

(8)

(9)

(10)


Here $q_i$ and $t_i$ denote the possible choices of QS and FR for the video requested by receiver $i$, respectively. The objective function (8) is the weighted sum of subjective quality over all video receivers, where important receivers will be assigned a larger weight. The constraint (9) ensures that the aggregated channel utilization time is below the maximal system utilization ratio (which we assume to be 1 here, but can in general be less than 1). Constraint (10) specifies the coding parameter constraints.

The computational complexity involved in solving the optimization problem increases with the dimension of the coding parameter set and the number of video streams. Besides, these coding parameters are usually integers. Thus, the problem becomes combinatorial and hence computationally expensive. Since every stream can independently take any feasible combination of $q$ and $t$, the number of possible combinations grows exponentially with the number of video streams.

We instead propose to solve a relaxed form of this problem by allowing continuous choices of video rates, and by leveraging the optimal rate-quality tradeoff developed in Section IV-A. After obtaining candidate sending rates from the first step, we can then simply pump pre-ordered video packets into the network as described in Section IV-B.

To solve for the candidate sending rate for each video receiver, we reformulate the problem as follows.

(11)

(12)

(13)

where the lower bound on $R_i$ denotes the base layer bitrate for video $i$.
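For orientation, one way to write a relaxed problem of this kind is given below. It is a sketch consistent with the verbal description (weighted quality sum, utilization-time budget of 1, base layer lower bound), not a transcription of (11)–(13). The symbols $w_i$ (stream weight), $C_i$ (link throughput of receiver $i$), $r_j$ and $c_j$ (rate and link throughput of background flow $j$), and $R_i^{\mathrm{base}}$ are names assumed here for illustration.

```latex
\begin{align*}
\max_{\{R_i\}} \quad & \sum_{i} w_i \, Q_i(R_i) \\
\text{s.t.} \quad & \sum_{i} \frac{R_i}{C_i} + \sum_{j} \frac{r_j}{c_j} \le 1, \\
& R_i \ge R_i^{\mathrm{base}} \quad \text{for every video stream } i .
\end{align*}
```

Each term $R_i / C_i$ is the fraction of air time needed to carry stream $i$, so the first constraint states that the total time share of video and background traffic cannot exceed the channel.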

B. Iterative Solution

Claim 1: In practical video rate regions, the problem (11)–(13) is a concave maximization problem.

a) Proof: To justify this claim, we show the concave region for $Q(R)$, or equivalently, $Q(\tilde{R})$. Requiring the second derivative of $Q(\tilde{R})$ in (6) to be non-positive gives the concave region of $Q(\tilde{R})$, which is bounded from below by a threshold on the normalized rate. With the parameters shown in Table I, the normalized quality at this lower boundary is always less than 0.1. As a normalized quality of 0.1 (e.g., 1 on a 10-point rating scale) is considered very annoying and unacceptable, the video stream should never be extracted at a normalized rate lower than this bound. With an SVC video, the lowest rate is the base layer rate. In our test videos, the base layers all have normalized rate above 0.016 with normalized quality above 0.13, as indicated in Table II. Therefore, we claim that for the practically meaningful range of rate, $Q(R)$, or equivalently $Q(\tilde{R})$, is concave. As the weighted summation in (11) preserves the concavity, the overall objective function is concave.

Concavity of the objective function (11) ensures that the local maximum is also the global maximum. We propose to use an iterative algorithm to approach the optimal point. The reason behind this is threefold: 1) if the bottleneck link is highly dynamic with time-varying bandwidth and delay, the results obtained by directly solving problem (11)–(13) given past observations (or estimates) of the link throughputs and traffic rates could be highly skewed; 2) directly solving problem (11)–(13) incurs much larger complexity and overhead; 3) directly solving problem (11)–(13) requires accurate observations (or estimates) of the link throughputs and traffic rates; otherwise, the results can be invalid even if computed at a high frequency. Iterative solutions, on the other hand, are generally more robust against variations of the network conditions and measurement errors [36].

We keep track of two aggregate quantities at the access point: the instantaneous incoming rate and the required serving time. From these, the instantaneous effective link outgoing rate is obtained. Following the same idea as the primal-dual algorithm in network rate control [37], [38], we propose the following two iterative steps:

(14)

(15)

Here are two scaling factors; is the first derivative of w.r.t. . denotes the price of using the link and

.

Intuitively, the value of increases when the network is temporarily over-congested, leading to a negative or slower increment of , whereas temporary underutilization of the network results in decreased and consequently higher from all contributing streams.

Theorem 1: The iterative algorithm of (14), (15) will converge to an equilibrium point which solves the problem (11)–(13) in a time-sharing wireless network under static channel conditions.

b) Proof: The constrained optimization problem of (11)–(13) can be converted to maximizing the following objective function

where are Lagrange multipliers. The Karush-Kuhn-Tucker (KKT) conditions are as follows:

(16)

(17)

(18)

(19)


Note that the algorithm of (14), (15) is essentially a gradient descent algorithm, which stabilizes at a certain point , i.e., when and . Next, we show that the equilibrium point achieved by the algorithm of (14), (15) solves the problem (11)–(13) by examining the satisfaction of its KKT conditions.

Case 1: if and , the point satisfies

Then, we have the corresponding for the KKT condition.

Case 2: if and for some , we have and for the KKT condition.

Case 3: if and for some , we have and for the KKT condition.

Case 4: if , then according to (14). So, we have , and for the KKT condition.
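As an illustration of the kind of primal-dual iteration referred to in (14), (15), the following is a minimal Python sketch under stated assumptions. It is not the exact update of this paper: the utility w_i log(R_i) is only a stand-in for the rate-quality model, the constraint is assumed to take the time-sharing form sum_i R_i / C_i <= 1, and the function name primal_dual_allocate and the step sizes k and g are hypothetical.

```python
def primal_dual_allocate(weights, capacities, r_min, r_max,
                         k=0.05, g=0.05, steps=5000):
    """Generic primal-dual rate allocation (illustrative only).

    Maximizes sum_i w_i * log(R_i) subject to sum_i R_i / C_i <= 1 and
    r_min_i <= R_i <= r_max_i. The log utility is an assumed stand-in
    for a concave rate-quality function; r_min_i must be positive.
    """
    rates = list(r_min)   # start every stream at its lowest allowed rate
    price = 0.0           # link price (dual variable), kept non-negative
    for _ in range(steps):
        # Fraction of air time requested by all streams at the current rates.
        load = sum(r / c for r, c in zip(rates, capacities))
        new_rates = []
        for w, c, r, lo, hi in zip(weights, capacities, rates, r_min, r_max):
            # Marginal utility minus the price of the air time this stream uses.
            gradient = w / r - price / c
            # Gradient step, projected back onto the allowed rate interval.
            new_rates.append(min(hi, max(lo, r + k * gradient)))
        # The price rises when the link is over-booked and falls otherwise.
        price = max(0.0, price + g * (load - 1.0))
        rates = new_rates
    return rates, price
```

For two equally weighted streams with link rates 2 and 6 (arbitrary units), the optimum of this stand-in problem is R = (1, 3) with price 2, and the iteration should settle close to those values, e.g. `primal_dual_allocate([1.0, 1.0], [2.0, 6.0], [0.1, 0.1], [2.0, 6.0])`.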

VI. PRACTICAL SYSTEM DESIGN

Previously, we have developed the mathematical framework for solving the system-wide quality maximization problem. In this section, we focus on how to implement the proposed iterative solution in a practical wireless video streaming system.

A. Architecture Overview

In a wireless video streaming system, the video server maintains the original video bitstreams. Upon requests for certain video contents, the server sends the (sub)stream through the IP network to the wireless access node, via which the (sub)stream is served to end-users. The wireless access node is usually the bottleneck where congestion is likely to occur, as it is shared by many users and has relatively low bandwidth capacity compared to the IP network. In order to track the wireless link status, an end-to-end feedback mechanism would be needed to inform the video adapter if it is at the server. However, the end-to-end delay is typically large in a wireless environment.

To track the link status with agility, we envision a proxy node colocated with the access node. It is in charge of tracking the time-varying status of the wireless access link, while dynamically adapting the traversing scalable video streams. Fig. 5 provides an architectural overview of the proxy-based video streaming system. The benefit of such a design is multifold. First, it requires no additional modifications at either the video servers or the mobile clients; video adaptation is performed at the proxy and is agnostic to both ends. Second, since the proxy node is located right at the bottleneck wireless node, it can react much more agilely than end-to-end adaptation schemes in the face of abrupt changes over the wireless hop. Furthermore, the proxy node has knowledge and control of all traffic traversing the bottleneck wireless link, and is therefore well-positioned to optimize the allocated rate across the competing streams in a more holistic manner.

Fig. 5. Architecture overview of the adaptive video streaming system. A proxy node at the edge of the network performs video adaptation before relaying video streams to mobile clients.

The main drawback of this architecture is the potentially large wasted bandwidth on the IP network. By the time adaptation happens at the proxy, the required video (sub)streams are expected to have arrived. Therefore, the server needs to send the whole bitstream to the proxy or send over-provisioned (sub)streams based on some prediction algorithms. Considering that bandwidth is abundant in the core network and that caching servers are prevalently deployed in the network, this is acceptable.

B. Implementing the Algorithm at the Proxy

The iterative algorithm consists of two processes. The first process, which follows (14), updates the streaming rate given the observation of and , which is the vector containing the observed link throughputs of all video receivers. The second process, with (15), determines a new link price given the observation of the sending rates in the last update interval from all users and the effective serving rate . Note here, depends on both and . Suppose the proxy adapts the video stream every seconds; we write (15) in a discretized form,

(20)

The middle term in (20), i.e., , is in fact the evolution of queue length at the access node. At time index , the new stream rate for video is calculated as

(21)

and according to the rate-quality model in (7), is given by


Fig. 6. Main components in proxy-based adaptation architecture.

We implement (20) and (21) in two separate modules, the link buffer monitor and the video adapter, both at the proxy. Fig. 6 shows the diagram of the two modules and the signaling in between. The link buffer monitor checks the bottleneck queue length once every seconds. It is also responsible for estimating the link throughput for each receiver . In our system, the packets’ inter-departure time at the interface queue is inspected and used to derive the instantaneous throughput of the link that transports the packet under consideration (by dividing the packet length by the inter-departure time for that packet). Then, the link throughput can be estimated by averaging over a number of packets.

The optimal rate allocation module calculates the new stream rate based on the feedback from the link buffer monitor and the video rate-quality parameters embedded in the SVC stream. Then, the SVC stream is adapted to the new rate by simply sending video packets up to the target rate, assuming the stream is pre-ordered in the quality-optimized manner, as discussed in Section IV-B. Note that the pre-ordering should ideally be done at the video encoder so that the video streams arriving at the proxy are already in optimal order. However, it is possible to have the proxy order the SVC layers if the video servers do not have pre-ordered streams and are agnostic of the rate-quality model adopted by the proposed proxy-based adaptation system.
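The per-receiver throughput estimate performed by the link buffer monitor can be sketched as follows. This is a minimal Python illustration, not the authors' implementation: the class name LinkThroughputEstimator and its method names are hypothetical, and the default window of 16 packets is taken from the simulation settings in Section VII-A.

```python
from collections import deque


class LinkThroughputEstimator:
    """Averages per-packet throughput samples for one receiver (illustrative)."""

    def __init__(self, window=16):
        # Keep only the most recent `window` instantaneous samples.
        self.samples = deque(maxlen=window)
        self.last_departure = None

    def on_departure(self, time_s, size_bytes):
        """Record a packet leaving the interface queue at time_s (seconds)."""
        if self.last_departure is not None:
            gap = time_s - self.last_departure
            if gap > 0:
                # Instantaneous throughput: packet length / inter-departure time.
                self.samples.append(size_bytes * 8 / gap)
        self.last_departure = time_s

    def estimate_bps(self):
        """Return the mean of the stored samples in bit/s, or None if empty."""
        if not self.samples:
            return None
        return sum(self.samples) / len(self.samples)
```

In this sketch, 500-byte packets departing every 4 ms yield an estimate of about 1 Mbps. The inter-departure time only reflects the link rate while the queue is backlogged, which is an assumption the sketch does not check.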

C. Discussions

1) Feedback Interval: For CGS video streams, the rate adaptation can only be carried out at the boundary of GOPs. For example, with a CGS encoded video stream with GOP size of 16, the rate switching can only be done every 16/30 ≈ 0.53 seconds assuming 30 frames per second. On the other hand, for the proxy to accurately track the link status, it should run the iterative algorithm of (20), (21) at a much faster update frequency. To cope with this situation, we run the iterative algorithm with a short update interval, but only adapt the video rate at the beginning of each new GOP. A filter may be applied to obtain a smoother sending rate. In our simulation, the smoothed sending rate at the beginning of each GOP is calculated as a weighted sum of the current sending rate determined from (21) and the smoothed sending rate for the previous GOP, with coefficients 0.8 and 0.2, respectively. Then, the video rate is adapted according to the most recent smoothed sending rate.

2) Iterative Algorithm vs. Exhaustive Search: We carried out some numerical case studies and found that the iterative algorithm incurs at most 5% efficiency loss (in terms of the total utility at the same rate) compared with the brute-force exhaustive search approach, as shown in Fig. 7. The loss is largely due to the discrete nature of the feasible rate points. An allocated rate (which is determined assuming any rate is achievable) is not always fully utilized. The complexity for exhaustive search is on the order of , while the iterative algorithm is at each iteration. The corresponding running time for the MATLAB scripts is 500 ms and 0.17 ms, respectively. Through our extensive simulations, we found that the discrete nature of the feasible rate points does not impact the algorithm convergence speed much.

Fig. 7. Performance comparison of the iterative algorithm with the exhaustive search. Three sequences (AKIYO, FOREMAN and FOOTBALL) are requested by three receivers and all the receivers have the same link throughput, ranging from 300 kbps to 3 Mbps. Five amplitude layers and five temporal layers are generated, thereby allowing 25 discrete quality-rate points. But only one of those points given in Table II is chosen for each allocated rate for a sequence.

3) Limitation of the Effective Link Bandwidth Estimation Method: As we assume there is no cross-layer information, i.e., information from the MAC layer and below, exposed to the proxy, the proxy is agnostic to the packet loss below the link layer. This limitation will result in inaccurate bandwidth estimation, especially when there are severe packet losses. For example, in a WiFi system, a packet may undergo several retransmissions and finally be dropped at the MAC layer, in which case the instantaneous link throughput is 0. However, the proxy has only observed the packet sojourn time at the MAC layer, and calculates the instantaneous link throughput as the packet size divided by the sojourn time (which is equivalent to the packet inter-departure time at the interface queue if other processing overheads are negligible). Thus, the resulting estimation does not consider potential packet loss after the maximum retransmission limit is reached.

4) Equalizing Receivers’ Subjective Qualities: In the formulation (11)–(13), the objective is to maximize the aggregated weighted system utility. When all the weights are equal, the qualities at the receivers are generally not equal. By appropriately choosing the weights, we can equalize the quality of all receivers, or let some receivers enjoy higher quality. Without detailed derivations, we can show that setting

for each receiver leads to equalized subjective qualities at the receivers. We conducted some experiments to verify this and the results can be found in Section VII-C-3.

Fig. 8. Simulation topology setup. a) Topology for the first set of simulations; b) topology for the second set of simulations.

5) Sensitivity to Parameters and in the Rate-Quality Model: We have conducted a theoretical analysis, investigating the sensitivity of the rate allocation results to the accuracy of the parameters and . This study showed that the relative error in the allocated rate is less than 15% when we use the fixed parameters in the unified model for all sequences. This analysis is not included here due to space limitations. We also conducted some of the experiments described in Section VII-C-1 both with the content-dependent parameters and with fixed parameters, and found that the results are very close. One specific comparison is given in Figs. 11(a) and 11(b), to be discussed later. Both the analysis and simulation results suggest that in practice it is viable to use the unified Rate-Quality model while performing rate allocation.
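To make the brute-force baseline of item 2 concrete, the following Python sketch enumerates every combination of discrete rate points and keeps the feasible one with the largest weighted utility. It is only an illustration under assumptions: the log utility stands in for the rate-quality model, feasibility is assumed to mean sum_i R_i / C_i <= 1, and the name exhaustive_allocate and the example rate points are hypothetical rather than taken from Table II.

```python
import itertools
import math


def exhaustive_allocate(rate_points, weights, capacities):
    """Brute-force search over all combinations of discrete rate points.

    rate_points[i] lists the feasible rates of stream i. The number of
    combinations visited is the product of the per-stream list lengths.
    """
    best_rates, best_utility = None, -math.inf
    for combo in itertools.product(*rate_points):
        # Assumed time-sharing feasibility check (small tolerance for rounding).
        load = sum(r / c for r, c in zip(combo, capacities))
        if load > 1.0 + 1e-9:
            continue
        # Assumed stand-in utility: weighted sum of log rates.
        utility = sum(w * math.log(r) for w, r in zip(weights, combo))
        if utility > best_utility:
            best_rates, best_utility = combo, utility
    return best_rates, best_utility
```

With three streams and 25 points each, as in the setting of Fig. 7, this loop would visit 25^3 = 15625 combinations, whereas one pass of an iterative update touches each stream once.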

VII. PERFORMANCE EVALUATION

In this section, we evaluate our system design with extensive simulations based on the ns-2 [39] simulator. We implemented video adaptation agents and a video player emulator which generates a video playback trace. We conducted two sets of simulations. The first set, aimed at evaluating the effectiveness of the proxy-based design in adapting the sending rate based on the time-varying channel condition, is driven by a real-world wireless measurement trace. The impact on background traffic (e.g., TCP) is also evaluated. The second set of simulations investigates the effectiveness of the proposed approach in rate allocation among multiple receivers based on both channel conditions and video characteristics. We assume the channel conditions of all receivers are static so as to isolate the effect of channel dynamics.

A. Common Simulation Settings

We use real video packet traces to drive the simulations. Two typical video sequences are used in the simulations: FOREMAN and FOOTBALL, representing a slow-to-medium motion clip and a highly intense motion clip, respectively. Both sequences have a spatial resolution of 352 × 288 pixels (CIF) and a temporal rate of 30 fps. We encode the sequences with JSVM version 9.12 [40] to generate SVC streams with 5 CGS layers and 5 temporal layers, and then pre-order the packets in a quality-optimized manner as explained in Section IV-B. The feasible rate points are as shown previously in Table II. Note that the maximum rates needed to achieve the highest quality are very different for the two videos, 0.8 Mbps and 2.1 Mbps, respectively. This means that to achieve similar quality, FOOTBALL should be allocated a much higher rate.

We choose TFRC as a comparison rate control mechanism, as it is considered suitable for media streaming. We applied the Datagram Congestion Control Protocol (DCCP) patch for ns-2 (footnote 1), which contains a TFRC implementation. DCCP does not require reliable in-order delivery of packets and the stream rate is determined solely by TFRC.

In our simulations, we choose , where denotes total number of receivers, and for every receiver unless otherwise stated. The proxy senses the interface queue every 50 ms and estimates the effective bandwidth for each link by averaging over the 16 most recent packets. The original candidate sending rate for each video is updated using (21). A smoothed candidate sending rate is calculated as discussed in Section VI-C-1 and the actual video rate is only changed at the beginning of each GOP. The rate point among those in Table II that is closest to the smoothed candidate rate from below is chosen. The packet size is set to 500 bytes; if a video NAL unit is larger than that, it is segmented into several packets. The access point queue size is set to 75 packets as suggested in [41].

We set the video playback delay to 2 seconds, which means the player starts 2 seconds after receiving the first video packet. If any packet belonging to a NAL unit is lost during transmission, the entire NAL unit is discarded, as well as all other NAL units that depend on it. Remaining NAL units that meet the playback deadline are used to determine the highest decodable temporal layer and amplitude layer, which correspond to a certain FR and QS . This pair is used to derive the normalized quality using (1).
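The GOP-level smoothing of Section VI-C-1 and the selection of a rate point "from below" described above can be sketched as follows. This is a minimal Python illustration: the function names are hypothetical, the rate values in the usage note are made up rather than taken from Table II, and falling back to the lowest rate point when the target is below every feasible point is an assumption, not something stated in the text.

```python
def smooth_rate(candidate, previous_smoothed, a=0.8, b=0.2):
    """Weighted sum of the current candidate rate and the previous smoothed rate."""
    return a * candidate + b * previous_smoothed


def pick_rate_point(feasible_rates, target):
    """Return the largest feasible rate not exceeding target.

    If every feasible rate exceeds target, fall back to the lowest one
    (assumed here to correspond to the base layer).
    """
    at_or_below = [r for r in feasible_rates if r <= target]
    return max(at_or_below) if at_or_below else min(feasible_rates)
```

For example, with a previous smoothed rate of 500 kbps and a new candidate of 1000 kbps, the smoothed rate is about 900 kbps, and among hypothetical rate points of 200, 450, 800, and 1400 kbps the 800 kbps point would be selected for the next GOP.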

B. One Receiver With a Dynamic Wireless Environment

In this set of simulations, we are interested in the responsiveness of our system design to the dynamic behavior of the wireless link. The most important aspect is how fast the proposed design can track and react to the wireless link quality changes. It is also important to study how video traffic interacts with background TCP flows. Only the FOOTBALL sequence is used in this part.

1) Setup: Fig. 8(a) illustrates the simulation topology setup, where the video server is attached to node N1 and the proxy is located at the AP. A video client agent and a video player agent are attached to node M0. There is also background TCP traffic generated by an FTP session from node N2 to node M0, traversing the same AP. The wired link between node N0 and the AP has 100 Mbps bandwidth and 2 ms delay; the wired link between N1 and N0 has 10 Mbps and 18 ms delay, while the wired link between N2 and N0 has 10 Mbps and 10 ms delay; the wireless link between the AP and M0 is driven by a real-world measurement trace obtained when driving around Mountain View, CA. The average speed of the vehicle was around 20 mph. Fig. 9 shows the signal-to-noise ratio (SNR) (in dB) and PHY rate (in Mbps) over time. These two traces are used to generate the packet error rate (PER) and calculate transmission time. Note here, the effective link rate is always less than the PHY rate due to retransmission at the MAC layer up to a preset retransmission limit. The video server starts streaming at time 0 while the FTP session starts at time 80. Both flows end at time 200.

1 http://eugen.dedu.free.fr/ns2/dccp-ns2.34.patch

Fig. 9. Wireless SNR and PHY rate traces from a real-world measurement. The trace was collected while driving around Mountain View, CA and the average speed of the vehicle was around 20 mph. (a) Signal-to-noise ratio; (b) PHY rate.

Fig. 10. Performance comparison of proxy-based adaptation and TFRC. The average sending rates for proxy-based adaptation and TFRC are 1601 kbps and 1072 kbps respectively. The average playback rates over time for proxy-based adaptation and TFRC are 1333 kbps and 919 kbps respectively (not shown here). The average normalized qualities over time for proxy-based adaptation and TFRC are 0.66 and 0.47 respectively. (a) Sending rate; (b) normalized quality; (c) background TCP throughput.

2) Proxy Adaptation vs TFRC: Fig. 10 shows the performance comparison for proxy-based adaptation and the TFRC scheme. From Fig. 10(a), it can be seen that when the channel condition is good and stable, e.g., around time 50, 90 and 150, TFRC achieves good performance. However, when the channel quality changes dynamically, TFRC cannot react agilely and converges slowly, for example during time 75–85 and 125–140. On the other hand, proxy-based adaptation can adapt the sending rate quickly, and the resulting video playback quality is significantly improved over the TFRC scheme. When the channel condition changes, the proxy-based adaptation can agilely follow and stabilize at a high streaming rate. In terms of playback quality, the proxy-based adaptation provides higher quality than TFRC most of the time, and the average quality over the entire duration is much higher, as shown in Fig. 10(b). It is worth noting here that since the proxy does not have information from the MAC layer, it does not know whether a packet is dropped or not, and the link throughput estimation may not be accurate under severe packet loss. This can lead the proxy-based adaptation to react improperly when continuous packet losses happen. For example, around time 110–120, the proxy is still sending out video packets which are all dropped (which can be inferred from the zero quality during that time period in Fig. 10(b)). When there are other flows destined to good wireless channels, the system will unnecessarily lower the rate assigned to those flows. To alleviate this problem, either cross-layer information exchange or user feedback can be considered so that the proxy can estimate the link throughput more accurately. We defer this to our future study. Fig. 10(c) shows the throughput of the background TCP traffic over time. TCP throughputs are very similar under both types of competing video streams. This indicates that the proxy-based adaptation is also TCP-friendly.

C. Multiple Receivers With Static Behavior

In this second part, we remove the randomness of the wireless network and use fixed wireless links to drive the simulator. The topology setup is given in Fig. 8(b), where two or more servers stream videos via individual wireless links to the receivers, depending on the specific scenario. In each scenario, we set different link conditions for different receivers. In this set of simulations, there is no background traffic injected.

1) Content-Aware Rate Allocation Among Two Users: We consider two nodes M1 and M2 sharing the same AP node. The link between AP and M1 has a PHY rate of 1 Mbps while the link between AP and M2 has a PHY rate of 11 Mbps. M1 streams the FOREMAN sequence starting from time 0 and lasting for 200 seconds, whereas M2 starts to stream FOOTBALL at time 20 seconds and finishes at time 250 seconds. Fig. 11 shows the sending rate for the two competing video streams. As can be seen in Fig. 11(a), with the proxy-adapt approach (with equal weights for the two videos) the rate allocation of the two streams converges in a short time period, with FOOTBALL being allocated more rate. After session 1 ends, session 2 quickly jumps to the full video rate. On the other hand, TFRC converges to an equilibrium point which gives similar rates to both sessions, as can be seen in Fig. 11(c). Some rate variations are observed for the second stream (FOOTBALL) due to the limited rate choices. TFRC also requires more time to converge; e.g., when the first stream ends, TFRC takes 3 more seconds to converge to the full video rate for the second stream.

The results shown in Fig. 11(a) were obtained with content-dependent parameter values for and in the rate-quality model. To examine the sensitivity of the rate allocation results to these parameters, we also ran the simulations with the unified model, and the results are shown in Fig. 11(b). Comparing these two figures, we can see that the rate allocation stabilizes at the same level for both settings. This suggests that it is viable to use fixed parameters for the Rate-Quality model while performing rate allocation.

Fig. 11. Comparison of video playback rate achieved by TFRC, and the proposed proxy-based scheme. In this simulation, the FOREMAN sequence and the FOOTBALL sequence share a base station with PHY rate of 1 Mbps and 11 Mbps respectively. FOREMAN lasts between 0–200 sec, FOOTBALL between 20–250 sec. Same weights are used for both receivers. (a) Proxy-based (non-unified model); (b) proxy-based (unified model); (c) TFRC.

2) Performance With Varying Number of Users: In this scenario, an AP is shared by multiple users, each receiving one video stream. The number of concurrent users ranges from 4 to 36, with half of them receiving FOREMAN and the others receiving FOOTBALL. In both groups of receivers, half of them have a good link condition of 54 Mbps PHY rate and the others have a fair link condition of 6 Mbps PHY rate. Equal weights are used for all receivers and they all start at time 0. The total simulation time is 100 seconds. Results are averaged over a duration of 60 seconds after the system has reached convergence.

Fig. 12 shows the average video rate and the resulting normalized quality for each category of receivers. Comparing Fig. 12(a) and Fig. 12(b), we can easily notice the effectiveness of the content-aware and link throughput-aware rate adaptation of the proxy-based approach. Fig. 12(a) shows that, with TFRC, all users receive similar video rates regardless of their video characteristics and link status. This also leads to head-of-line blocking, especially when the system is overloaded with a large number of receivers. As can be seen in Fig. 12(a), the video rates for all receivers are similar to that of the receiver with 6 Mbps PHY rate. Fig. 12(b) confirms that with the proxy-based scheme, FOOTBALL users will receive higher rates than FOREMAN users if their channel conditions are the same, whereas users with poor channel quality will receive lower rates than those with good channel quality. The resulting average normalized qualities for both scenarios are shown in Fig. 12(c) and Fig. 12(d). With TFRC, the more complex video (FOOTBALL) is delivered with a lower quality. Also, receivers of the same video content have similar quality even though their link throughputs are different. On the other hand, with the proxy-based scheme, received quality depends not only on the video content but also on the link throughput. It is also clear that the proxy-based scheme gives larger aggregated quality than TFRC.

Fig. 12. Normalized quality and video rate seen by users. The number of users ranges from 4 to 36 where half of them receive FOREMAN and the others receive FOOTBALL. In addition, half of the members within each group have 6 Mbps PHY rate and the others have 54 Mbps PHY rate. (a) Average video rate for each category of users: TFRC case; (b) average video rate for each category of users: Proxy-adapt case; (c) average normalized quality for each category of users: TFRC case; (d) average normalized quality for each category of users: Proxy-adapt case.

3) Differential Services: Previous results for the proxy-based approach are all obtained with equal weights for all users. Although the sending rates are content-aware, with higher rates allocated to more complex video, the received qualities are not equalized. By assigning different weights to different users, the proxy-based scheme provides the capability of differential services or equalizing receivers’ qualities. We consider an AP being shared by 9 users divided into three groups. The first group, denoted as G1, receives FOREMAN; the second group, G2, receives FOOTBALL; and the third group, G3, also receives FOOTBALL. Two sets of weightings are investigated, as shown in Table III. The first one (denoted as Proxy-1), targeting equal subjective quality, has weights 1, 2.6 and 2.6 respectively for the three groups. These weights are chosen as discussed in Section VI-C-4. The second set (denoted as Proxy-2) has weights 1, 2.6 and 10 respectively for the three groups, so that G1 and G2 receive similar quality and G3 receives a higher quality. In practice, the second set may represent the case where one group of users paid a premium price for better video services. All users have a link PHY rate of 12 Mbps. As shown in Fig. 13, with the proxy-based scheme and the first set of weights (Proxy-1), all FOOTBALL receivers obtain higher rates than FOREMAN receivers and all receivers receive very similar quality. With the second set of weights (Proxy-2), the premium users are able to receive a much higher quality than the other users. TFRC cannot provide either equal or differentiated quality, as it attempts to allocate the same rates to all users. As a result, while FOREMAN users enjoy a higher quality, all FOOTBALL users experience much lower quality.

TABLE III. WEIGHT SETTINGS FOR DIFFERENTIAL SERVICES

Fig. 13. Users’ receiving rate and quality under the Proxy-based and TFRC schemes with heterogeneous weight assignment. (a) Rate; (b) quality.

VIII. CONCLUSIONS AND FUTURE WORK

We proposed a proxy-based, subjective-quality-aware content-adaptation solution targeting wireless edge networks. We first derived a novel analytical rate-quality model which can be utilized for quality-aware network optimization, followed by a video chunk pre-ordering method that achieves rate-quality optimality. Then, we investigated an iterative multi-stream rate allocation scheme at the proxy that can maximize a weighted sum of received video quality at all receivers, given the link bandwidths of all receivers. The algorithm explicitly accounts for heterogeneity in both the video streams and the underlying wireless link capacities experienced by different users. It can converge very fast and accurately track the available network resources. Requiring only the queue length at the congested access point and the throughputs of individual outgoing video streams, it does not require any additional feedback from the receivers. The proposed scheme also readily supports differentiated service for users of different relative importance levels, yet competes fairly against conventional TCP flows.

We note that the rate adaptation framework proposed in this work can potentially be extended to other video streaming scenarios, with or without the support of scalable video coding. For instance, one can adopt the same notion of maximizing overall viewing experience for adaptive HTTP video streaming, in which case the proxy either prefetches multiple pre-encoded versions of the same video segment, or generates them on the fly from a received high-quality single-layer video. Furthermore, the proposed iterative rate-allocation scheme is not limited to our rate-quality model; rather, it is applicable to any concave rate-quality function. It can be extended to accommodate spatial scalability as well.

REFERENCES[1] Cisco, Visual Networking Index: Global Mobile Data Traffic Forecast

Update, 2010–2015, Feb. 2011.[2] S. Floyd,M. Handley, J. Pahdye, and J.Widmer, “Tcp friendly rate con-

trol (TFRC): Protocol specification,” RFC 5348 (Proposed Standard),Sep. 2008.

[3] S. Lu, V. Bharghavan, and R. Srikant, “Fair scheduling in wirelesspacket networks,” IEEE/ACM Trans. Netw., vol. 7, no. 4, pp. 473–489,Aug. 1999.

[4] ITU-T Recommendation H.264-ISO/IEC 14496-10(AVC), AdvancedVideo Coding for Generic Audiovisual Services, Amendment 3: Scal-able Video Coding, ITU-T and ISO/IEC JTC 1, 2005.

[5] H. Schwarz, D. Marpe, and T. Wiegand, “Overview of the scalablevideo coding extension of H.264/AVC,” IEEE Trans. Circuits Syst.Video Technol., vol. 17, no. 9, pp. 1103–1120, Sep. 2007.

[6] Y. Wang, Z. Ma, and Y.-F. Ou, “Modeling rate and perceptual qualityof scalable video as functions of quantization and frame rate and itsapplication in scalable video adaptation,” in Proc. Packet Video, May2009, pp. 1–9.

[7] E. Kohler, M. Handley, and S. Floyd, “Datagram congestion controlprotocol (DCCP),” RFC 4340, Mar. 2006.

[8] H. Chen and J. E. Thropp, “Review of low frame rate effects on humanperformance,” IEEE Trans. Syst., Man, Cybern. A: Syst. Humans, vol.37, no. 6, pp. 1063–1076, Nov. 2007.

[9] Y. Wang, S.-F. Chang, and A. C. Loui, “Subjective preference ofspatio-temporal rate in video adaptation using multi-dimensionalscalable coding,” in Proc. IEEE Int. Conf. Multimedia and Expo, Jun.2004, pp. 1719–1722.


[10] K.-C. Yang, C. C. Guest, K. El-Maleh, and P. K. Das, “Perceptual tem-poral quality metric for compressed video,” IEEE Trans. Multimedia,vol. 9, no. 7, pp. 1526–1535, Nov. 2007.

[11] Q. Huynh-Thu andM. Ghanbari, “Temporal aspect of perceived qualityin mobile video broadcasting,” IEEE Trans. Broadcast., vol. 54, no. 3,pp. 641–651, Sep. 2008.

[12] S. H. Jin, C. S. Kim, D. J. Seo, and Y. M. Ro, “Quality measurementmodeling on scalable video applications,” in Proc. IEEE Int. WorkshopMultimedia Signal Processing, Oct. 2007, pp. 131–134.

[13] K. Yamagishi and T. Hayashi, “QRP08-1: Opinion model for esti-mating video quality of videophone services,” in Proc. IEEE GlobalCommunications Conf., Nov. 2006, pp. 1–5.

[14] Y.-F. Ou, Z. Ma, T. Liu, and Y. Wang, “Perceptual quality assessmentof video considering both frame rate and quantization artifacts,” IEEETrans. Circuits Syst. Video Technol., vol. 21, no. 3, pp. 286–298, Mar.2011.

[15] Q. Zhang, W. Zhu, and Y. Zhang, “End-to-end QoS for video deliveryover wireless internet,” Proc. IEEE, vol. 93, no. 1, pp. 123–134, Jan.2005.

[16] J. Cabrera and A. Ortega, “Stochastic rate control of video coders forwireless channels,” IEEE Trans. Circuits Syst. Video Technol., vol. 12,no. 6, pp. 496–510, Jun. 2002.

[17] L. Haratcherev, J. Taal, K. Langendoen, R. Lagendijk, and H. Sips,“Optimized video streaming over 802.11 by cross-layer signaling,”IEEE Commun. Mag., vol. 44, no. 1, pp. 115–121, Jan. 2006.

[18] P. van Beek and M. U. Demircin, “Delay-constrained rate adaptationfor robust video transmission over home networks,” in Proc. IEEE Int.Conf. Image Processing, Sep. 2005, pp. 173–176.

[19] T. Stockhammer, M. Walter, and G. Liebl, “Optimized h. 264-basedbitstream switching for wireless video streaming,” in Proc. IEEE Int.Conf. Multimedia and Expo, Jul. 2005, pp. 1396–1399.

[20] J. R. Ohm, “Advances in scalable video coding,” Proc. IEEE, vol. 93,no. 1, pp. 42–56, Jan. 2005.

[21] T. Schierl, T. Stockhammer, and T. Wiegand, “Mobile video transmis-sion using scalable video coding,” IEEE Trans. Circuits Syst. VideoTechnol., vol. 17, no. 9, pp. 1204–1217, Sep. 2007.

[22] M. Chen and A. Zakhor, “Rate control for streaming video over wire-less,” in Proc. IEEE Int. Conf. Computer Communications, Mar. 2004,pp. 1181–1190.

[23] H. Luo, D. Wu, S. Ci, H. Sharif, and H. Tang, “TFRC-based ratecontrol for real-time video streaming over wireless multi-hop meshnetworks,” in Proc. IEEE Int. Conf. Communications, Jun. 2009,pp. 1–5.

[24] G. Yang, T. Sun, M. Gerla, M. Y. Sanadidi, and L.-J. Chen, “Smoothand efficient real-time video transport in the presence of wireless er-rors,” ACM Trans. Multimedia Comput., Commun., Applicat., vol. 2,no. 2, pp. 109–126, May 2006.

[25] J. Chakareski and P. Frossard, “Rate-distortion optimized distributedpacket scheduling of multiple video streams over shared communica-tion resources,” IEEE Trans. Multimedia, vol. 8, no. 2, pp. 207–218,April 2006.

[26] L. Zhou, X. Wang, W. Tu, G. Muntean, and B. Geller, “Distributed scheduling scheme for video streaming over multi-channel multi-radio multi-hop wireless networks,” IEEE J. Select. Areas Commun., vol. 28, no. 3, pp. 409–419, Apr. 2010.

[27] X. Zhu, R. Pan, N. Dukkipati, V. Subramanian, and F. Bonomi, “Layered internet video engineering (LIVE): Network-assisted bandwidth sharing and transient loss protection for scalable video streaming,” in Proc. IEEE Int. Conf. Computer Communications Mini-Conference, Mar. 2010.

[28] X. Zhu and B. Girod, “Distributed media-aware rate allocation for wireless video streaming,” IEEE Trans. Circuits Syst. Video Technol., vol. 20, no. 11, pp. 1462–1474, Jan. 2010.

[29] S. Khan, S. Duhovnikov, E. Steinbach, and W. Kellerer, “MOS-based multiuser multiapplication cross-layer optimization for mobile multimedia communication,” Adv. Multimedia, vol. 2007, no. 1, pp. 1–11, Jan. 2007.

[30] A. Ksentini, M. Naimi, and A. Gueroui, “Toward an improvement of H.264 video transmission over IEEE 802.11e through a cross-layer architecture,” IEEE Commun. Mag., vol. 44, no. 1, pp. 107–114, Jan. 2006.

[31] M. van der Schaar and N. Shankar, “Cross-layer wireless multimedia transmission: Challenges, principles and new paradigms,” IEEE Wireless Commun., vol. 12, no. 4, pp. 50–58, Aug. 2005.

[32] H. Zhang, Y. Zheng, M. Khojastepour, and S. Rangarajan, “Cross-layer optimization for streaming scalable video over fading wireless networks,” IEEE J. Select. Areas Commun., vol. 28, no. 3, pp. 344–353, Apr. 2010.

[33] J. Chakareski, “In-network packet scheduling and rate allocation: A content delivery perspective,” IEEE Trans. Multimedia, vol. 13, no. 5, pp. 1092–1102, Oct. 2011.

[34] M. Wien, H. Schwarz, and T. Oelbaum, “Performance analysis of SVC,” IEEE Trans. Circuits Syst. Video Technol., vol. 17, no. 9, pp. 1194–1203, Sep. 2007.

[35] Z. Ma, M. Xu, Y.-F. Ou, and Y. Wang, “Modeling rate and perceptual quality of video as functions of quantization and frame rate and its applications,” IEEE Trans. Circuits Syst. Video Technol., vol. 22, no. 5, pp. 671–682, May 2012.

[36] S. Ulukus and R. D. Yates, “Stochastic power control for cellular radio systems,” IEEE Trans. Commun., vol. 46, no. 6, pp. 784–798, Jun. 1998.

[37] F. Kelly, A. Maulloo, and D. Tan, “Rate control in communication networks: Shadow prices, proportional fairness and stability,” J. Oper. Res. Soc., vol. 49, no. 3, pp. 237–252, Mar. 1998.

[38] R. Srikant, The Mathematics of Internet Congestion Control. Cambridge, MA, USA: Birkhauser, 2003.

[39] Ns-2 Network Simulator. [Online]. Available: http://www.isi.edu/nsnam/ns/.

[40] JSVM SVC Reference Software. [Online]. Available: http://ip.hhi.de/imagecom_G1/savce/downloads/.

[41] Cisco, QoS on Wireless LAN Controllers and Lightweight APs Configuration Example. [Online]. Available: http://www.cisco.com/en/US/tech/tk722/tk809/technologies_configuration_example09186a00807e9717.shtml.

Hao Hu received the B.S. degree from Nankai University and the M.S. degree from Tianjin University in 2005 and 2007, respectively, and the Ph.D. degree from Polytechnic Institute of New York University in January 2012.

He is currently with the Advanced Architecture and Research group, Cisco Systems, San Jose, CA. He interned at Corporate Research, Thomson Inc., NJ, in 2008 and at Cisco Systems, CA, in 2011. His research interests include video QoE, video streaming, and adaptation.

Xiaoqing Zhu is currently a member of the Advanced Architecture & Research Group at Cisco Systems Inc. She received the B.Eng. degree in Electronics Engineering from Tsinghua University, Beijing, China, in 2001. She received both the M.S. and Ph.D. degrees in Electrical Engineering from Stanford University, CA, USA, in 2002 and 2009, respectively. She interned with the IBM Almaden Research Center in 2003, and was at Sharp Labs of America in the summer of 2006. Dr. Zhu was awarded the Stanford Graduate Fellowship from 2001 to 2005. She was a recipient of the best student paper award in ACM Multimedia 2007.

Dr. Zhu’s research interests lie at the intersection of multimedia signal processing, wireless communications, and networking. She has served as a reviewer for many journals and magazines, including IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS, IEEE TRANSACTIONS ON MULTIMEDIA, IEEE COMMUNICATIONS MAGAZINE, and IEEE NETWORK MAGAZINE. She has also helped organize various conferences and workshops, such as IEEE GLOBECOM, IEEE International Conference on Computing, Networking and Communication (ICNC), and SPIE Visual Communications and Image Processing (VCIP). She served as guest editor for IEEE Technical Committee on Multimedia Communications (MMTC) E-Letter, IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, and IEEE TRANSACTIONS ON MULTIMEDIA.


Yao Wang (M’90–SM’98–F’04) received the B.S. and M.S. degrees in electronic engineering from Tsinghua University, Beijing, China, in 1983 and 1985, respectively, and the Ph.D. degree in electrical and computer engineering from the University of California, Santa Barbara, in 1990.

Since 1990, she has been with the Electrical and Computer Engineering Faculty, Polytechnic University, Brooklyn, NY (now Polytechnic Institute of New York University). She is the leading author of the textbook Video Processing and Communications (Prentice-Hall, 2001). Her current research interests include video coding and networked video applications, medical imaging, and pattern recognition.

Dr. Wang has served as an Associate Editor for the IEEE TRANSACTIONS ON MULTIMEDIA and the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY. She received the New York City Mayor Award for Excellence in Science and Technology in the Young Investigator Category in 2000. She was a co-winner of the IEEE Communications Society Leonard G. Abraham Prize Paper Award in the Field of Communications Systems in 2004. She received the Overseas Outstanding Young Investigator Award from the National Natural Science Foundation of China in 2005 and was named the Yangtze River Lecture Scholar by the Ministry of Education of China in 2007.

Rong Pan received her Ph.D. degree from the Electrical Engineering Department at Stanford University in 2002. Currently, she is a Distinguished Engineer/Senior Director at Cisco, where she heads a team in the advanced architecture and research division. She is an author of more than 30 technical papers and an inventor of 27 patents. Her work and innovations have been widely recognized. She is a key inventor of the QCN algorithm, which is now an IEEE 802.1 standard on congestion notification for Data Center Ethernet. Her other algorithms, such as AFD (a simple, approximate scheme to Fair Queueing), have had major impact on multiple Cisco flagship products with a combined revenue of more than $10 B. The CHOKe algorithm that she developed as part of her Ph.D. thesis has become a standard QoS feature in Linux Kernel since version 3.2.2. Currently, she is working on the buffer bloat problem in the Internet and leading Cisco’s effort at IETF on this topic. She has won a best paper award and served as a program committee member at IEEE conferences, and she will serve as the technical chair at IEEE High Performance Switching and Routing Conference 2014.

Jiang Zhu is a senior technical leader in Advanced Architecture and Research group at Cisco Systems, Inc. He has over 15 years of industrial experience building large-scale distributed media systems. His research focuses on adaptive content networking, large-scale data systems, software defined networking (SDN), cloud service orchestrations and applications of data mining and machine learning in these fields. He did his doctoral study focusing on SDN and OpenFlow in High Performance Networking Group at Stanford University. He also received his M.S. degrees in Electrical Engineering and in Management Science & Engineering from Stanford University. Before Stanford, he obtained his M.S. in Computer Science from DePaul University and B.Eng. in Automation from Tsinghua University.

Flavio Bonomi is a Cisco Fellow, Vice President, and is the Head of the Advanced Architecture and Research Organization at Cisco Systems, in San Jose, CA. Over the past years, he has led a number of Cisco’s Advanced Architecture activities, and contributed to the establishment of Cisco’s virtual, distributed Research Organization, including architects and researchers embedded in a wide number of organizations across Cisco, and collaborating with a growing network of industry and university partners.

He received his Ph.D. in Electrical Engineering in 1985, and a Master in Electrical Engineering in 1981 from Cornell University in Ithaca, NY. He received his Electrical Engineering Degree from Pavia University, in Italy.
