

International Journal of Innovative Computing, Information and Control
ICIC International ©2009 ISSN 1349-4198
Volume 5, Number 7, July 2009, pp. 1—IHMSP07-07

FUZZY JOINT ENCODING AND STATISTICAL MULTIPLEXING OF MULTIPLE VIDEO SOURCES WITH INDEPENDENT QUALITY OF SERVICES FOR STREAMING OVER DVB-H

Mehdi Rezaei1, Imed Bouazizi2 and Moncef Gabbouj1

1Department of Signal Processing
Tampere University of Technology
P.O.Box 553, FI-33101 Tampere, Finland
[email protected]; [email protected]

2Nokia Research Center
Tampere, Finland
[email protected]

Received February 2008; revised July 2008

Abstract. A novel fuzzy joint video encoding and statistical multiplexing (StatMux) method for broadcasting over DVB-H (Digital Video Broadcasting for Handhelds) channels is proposed to decrease end-to-end delay in a broadcast system. DVB-H uses a time-sliced transmission scheme to reduce the power consumed by radio reception in handheld receivers. Because of time slicing, the channel changing delay, i.e. the delay of changing from one audio-visual service to another, and consequently the end-to-end delay become significant. The proposed video encoding method decreases the buffering delays that constitute the major part of the end-to-end delay by implementing StatMux over the video services. Unlike conventional methods of this kind, the proposed method allows the multiplexed services to have independent bit rates and qualities of service. Moreover, the computational complexity of the proposed method is much lower than that of conventional methods. Although the proposed method has been designed and tested for a DVB-H broadcast system, it can be deployed in other video broadcast systems in which a number of video services are encoded and broadcast simultaneously. Simulation results show that the proposed method can considerably decrease end-to-end delay without any cost in the overall quality of the compressed video.

Keywords: Broadcasting, Fuzzy logic control, Rate control, Statistical multiplexing, Streaming, Video coding

1. Introduction. Digital Video Broadcasting for Handheld terminals (DVB-H) is an ETSI specification for delivering broadcast services to battery-powered handheld receivers [1-4]. DVB-H is mainly based on the DVB-T specification for digital terrestrial television. However, it adds a number of features designed to account for the limited battery life of handheld devices and the particular environments in which such receivers operate [5,6].

Services used in mobile handheld terminals require relatively low bit rates. The estimated maximum bit rate for streaming video using advanced compression technology like H.264/AVC is in the order of a few hundred kilobits per second. A DVB-T transmission system usually provides a bit rate of up to 8 Mbps or more. This makes it possible to significantly reduce the average power consumption of a DVB-H receiver by introducing a scheme based on time division multiplexing, called time-slicing. To reduce the power consumption in handheld terminals, the service data is time-sliced and then sent through the channel as bursts at a significantly higher bit rate than the bit rate of the audio-visual service. Time-slicing enables a receiver to stay active during only a small fraction of the time, while receiving bursts of a requested service. This significantly reduces the power consumed by radio reception. DVB-H also employs additional forward error correction to further improve the mobile and indoor reception performance of DVB-T.

Channel changing delay in DVB-H refers to the time between the start of switching to a new channel and the start of the media rendering [7]. Channel changing delay includes several parts: Arrival Delay (delay until the arrival of the desired burst), Reception Delay (reception duration of the desired burst), Decapsulation Buffering Delay, Decoder Refresh Delay (delay to the first random access point), and Decoder Buffering Delay (initial buffering period of the Coded Picture Buffer). The decapsulation buffering delay includes two buffering delays, one for the Multi-protocol Decapsulation Buffer (MDB) and one for the RTP (Real-time Transport Protocol) Decapsulation Buffer (RDB). The decapsulation buffering delay is required to compensate for the variations of burst size, and the decoder buffering delay is needed to compensate for the variations of bit rate. Moreover, another delay is needed for synchronization between the associated streams (e.g. audio and video).

One of the significant factors in channel changing delay is the arrival delay. The arrival delay depends on the time-slicing parameters that define the power consumption of DVB-H receivers. The lower the receiver power consumption, the higher the arrival delay. Another factor in channel changing delay is the delay required to compensate for the variation in bit rate. Video streaming over a DVB-H system exploits the advantages of variable bit rate video. For most video contents, a variable bit rate (VBR) video can provide better visual quality and coding efficiency than a constant bit rate video [8]. Higher quality and compression performance can be obtained by allowing more variation in bit rate, at the cost of a higher buffering delay.

When VBR bit streams are broadcast in DVB-H, statistical multiplexing (StatMux) is beneficial for reducing end-to-end delay and maximizing the utilization of transmission bandwidth. StatMux in DVB-H can be implemented in conjunction with encoding, at the encoders where a number of services are encoded and broadcast simultaneously, and/or in conjunction with time-slicing, at a network element called the IP Multiprotocol Encapsulator. Depending on the implementation method, StatMux can affect different parts of the end-to-end delay. Implementation of StatMux at the IP encapsulator is out of the scope of this paper. The proposed joint encoding and StatMux method affects the buffering delay that is required at the IP encapsulator before encapsulation and the decoder initial buffering delay that is required before decoding.

The major problem of joint video encoding and StatMux is how to allocate the available bit budget among the video sources that share the common channel bandwidth and are jointly encoded. The conventional joint encoding methods follow two main approaches: forward analysis and modeling. In forward analysis, preprocessing is performed on the video sources to gather statistics about their coding complexity. The actual coding process then operates based on the statistics obtained by the preprocessing. In the modeling approach, the performance of the video encoder and the coding complexity of the video sources are first modeled, and then the bits allocated to the video sources are controlled based on these models while the models are updated during encoding. See the methods proposed in [9-11] as examples of the two approaches. The system presented in [9] consists of several preprocessors and video encoders. Each preprocessor analyzes a video source and derives picture statistics. Using these statistics, a joint rate controller dynamically calculates the bit rate for each encoder based on the relative complexities of the sources. Another bit allocation method for joint coding of multiple video sources is presented in [10]. In this method, the input video sources are divided into Super GOPs (a number of GOPs) and Super Frames (a set of frames, one from each source), and then the bit budget is distributed hierarchically between the super GOPs, super frames and frames according to their relative complexities while the encoder and decoder buffers are prevented from overflowing and underflowing. Finally, using a rate-distortion model, a quantization parameter is calculated for each frame according to the bits allocated to the frame. A similar approach to that in [10] is presented in [11].

In this paper we propose a novel fuzzy joint encoding and StatMux method that differs from the conventional methods. The proposed method does not use any preprocessing or model for control. It utilizes a number of fuzzy controllers to control the bit rates of the encoded bit streams simultaneously in real time and to decrease the buffering delays and consequently the end-to-end delay of the broadcast system. The proposed joint video encoding and multiplexing method builds on a combination and modification of the rate control methods proposed in [12,13] and [14] for independent and joint video encoding, respectively.

The paper is organized as follows: Section 2 presents an overview of the proposed method. Detailed information about the proposed method is presented in Section 3 and Section 4. Simulation results are provided in Section 5. The paper ends with conclusions in Section 6.

Figure 1. Block diagram of proposed joint encoding system

2. Proposed Joint Video Encoding and Statistical Multiplexing Method. The proposed joint encoding and StatMux method is implemented by a joint video rate control system. Figure 1 shows the simplified block diagram of the rate control system. A number of video sources are encoded simultaneously, each by one encoder. The rate control system utilizes a number of fuzzy controllers to control the bit rate of each encoded bit stream and also the bit rate of the aggregate bit stream. The proposed system is a real-time control system without any look-ahead or preprocessing. Because it relies on fuzzy controllers, it has a very low complexity in comparison to the conventional methods. Unlike the conventional methods, the proposed method allows the multiplexed bit streams to have independent qualities of service and different bit rates. The proposed method can be tuned to allow only short-term exchanges of bit budget between the bit streams, in which case long-term exchanges of bits between the bit streams are prevented. In this case, the average quality of each encoded bit stream remains the same as in the independent encoding case. Alternatively, it can be tuned to allow long-term exchanges of bit budget between the bit streams, similar to the conventional methods.

According to the proposed method, an independent video rate controller (IRC) is used for encoding each video bit stream to guarantee a VBR bit stream with an average bit rate and a buffering constraint. The encoded bit streams are multiplexed and moved to a virtual joint buffer. The data is removed from the joint buffer at a constant bit rate matched to the target bandwidth of the transmission channel. The occupancy of the joint buffer is used as a feedback signal by a joint rate controller (JRC). The JRC controls the variations in the bit rate of the aggregate bit stream to guarantee a limited buffering delay. Less variation in the bit rate of the aggregate bit stream means smaller buffering delays. The proposed method not only decreases the decoder buffering delay by allocating a variable bandwidth instead of a fixed bandwidth to each bit stream, but also decreases the buffering delay that is required before the start of encapsulation at the IP encapsulator of the DVB-H network. These two delays are related to each other such that any reduction in the decoder buffering delay means a similar reduction in the buffering delay at the IP encapsulator.

The video encoders are controlled by adjusting the Quantization Parameter (QP) on a picture basis. The QP is mainly controlled by the IRCs, while the JRC adds a small positive or negative value to the QP values determined by the IRCs according to the occupancy of the joint buffer and the bit rate of the aggregate bit stream as

$$Q_n = Q_n^{IRC} + \Delta Q_J \tag{1}$$

where $Q_n$ denotes the QP used by the $n$th encoder, $Q_n^{IRC}$ is the QP calculated by the $n$th independent rate controller, and $\Delta Q_J$ represents the output of the JRC. More details about the IRCs and the JRC are presented in Section 3 and Section 4, respectively.
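As a minimal sketch of how (1) could be applied per encoder, the snippet below adds one shared JRC offset to each IRC output. The function name, the rounding step, and the clipping to 0..51 (the QP range of 8-bit H.264/AVC) are assumptions of this sketch and are not stated in the text; the numeric values are hypothetical.

```python
def encoder_qp(q_irc: float, delta_q_jrc: float,
               qp_min: int = 0, qp_max: int = 51) -> int:
    """Combine the IRC output and the JRC offset as in (1).

    Rounding and clipping are assumptions of this sketch; 0..51 is the
    QP range of 8-bit H.264/AVC.
    """
    qp = round(q_irc + delta_q_jrc)
    return max(qp_min, min(qp_max, qp))


# Three encoders keep their own IRC outputs but share one JRC offset.
irc_outputs = [28.4, 33.0, 30.6]   # hypothetical Q_n^IRC values
delta_q_j = 1.0                    # hypothetical JRC output
qps = [encoder_qp(q, delta_q_j) for q in irc_outputs]
```

Because the same offset is added to every stream, the JRC shifts all encoders together while the differences between their IRC outputs are preserved.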

3. Fuzzy Independent Rate Controller. The IRC controls the bit rate of a bit stream by adjusting the QP on a picture basis. It utilizes a fuzzy rate controller and several other tools to calculate the QP for different types of video pictures. Although only intra-prediction pictures (I-pictures) and reference inter-prediction pictures (P-pictures) are explained here, the algorithm is easily extended to support other types of pictures as well. The IRC utilizes a virtual buffer to impose a buffering constraint on the bit stream.

The IRC can be functionally divided into two main parts. The first part utilizes the fuzzy controller to compute the QP of P-pictures. The second part of the algorithm uses other feedback signals from the uncompressed and compressed video to calculate the QP of I-pictures. The I-pictures at scene boundaries are treated differently from the normal I-pictures at the periodic random access points. In VBR, the bit allocation to I-pictures has a remarkable impact on the overall rate-distortion (RD) performance. Therefore, the QP of I-pictures should be computed very carefully. The key point in the proposed IRC is to prevent unnecessary variations in quality while the buffer constraint is observed. Detailed information about the calculation of QP is presented in the sequel.

Figure 2. Block diagram of the fuzzy IRC


3.1. Output of IRC for P-pictures. The output of the IRC is the main part of the QP that is used for encoding a video picture as in (1). The outputs of the IRC ($Q_n^{IRC}$) for P-pictures and I-pictures are denoted as $Q_P$ and $Q_I$, respectively, in the sequel. The output of the IRC for P-pictures is defined by the fuzzy controller. Figure 2 depicts the block diagram of the proposed rate control system for P-pictures. The fuzzy controller and the virtual buffer (Buf.1 to Buf.N in Figure 1) are the basic elements of the control system. The fuzzy controller controls the bit rate of the encoded bit stream by adjusting the variations of QP, and it has been optimized to prevent unnecessary fluctuations of QP. In the computation of QP, it is assumed that consecutive video frames have a similar degree of coding complexity (except at scene cuts). Therefore, the coding complexity of the previously encoded picture is used as an estimate of the coding complexity of the picture currently being encoded, and the QP of the current picture is computed from the QP of the previously encoded picture with a small variation that is defined by the fuzzy controller. The fuzzy controller uses two feedback signals, one about the buffer fullness and one about the bit rate. Furthermore, a low pass filter (LPF) smooths the feedback signal about the bit rate in order to smooth the variations in the output of the fuzzy controller. The output of the IRC for the current P-picture ($Q_P$) is the sum of the QP used for encoding the previous picture and the output of the fuzzy controller ($\Delta Q_F$)

$$Q_P(i) = Q_P(i-1) + \Delta Q_F(i) \tag{2}$$

From the system point of view, the main part of the IRC output is the delayed version of the previous output, and the variation of the IRC output is adjusted by the fuzzy controller. In this approach, the RD performance of the previously encoded pictures is in fact used as a reference point for encoding the next picture and, if necessary, a small adjustment relative to the reference point is computed. The main advantage of this approach is that, in the small range around the reference point, all the nonlinear functions that exist in the system can be assumed to be linear without losing computational accuracy.
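The remark about the delayed output can be made concrete by unrolling the recursion in (2). The following is only a worked restatement, assuming some initial value $Q_P(0)$ that the text does not specify:

$$Q_P(i) = Q_P(0) + \sum_{k=1}^{i} \Delta Q_F(k)$$

In other words, the fuzzy controller supplies increments and the IRC accumulates them, so a zero controller output leaves the QP of the previous picture unchanged.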

3.2. Virtual buffer. The virtual buffer used by the IRC simulates the buffering process of the decoder at the receiver side. Although it utilizes a simple model, it is nearly identical to the hypothetical reference decoder models used in different video coding standards. The occupancy of the virtual buffer is updated after encoding each video picture as follows

$$O_B(i+1) = O_B(i) - B(i) + R_T/F \tag{3}$$

where $O_B(i)$ denotes the occupancy of the buffer before encoding the $i$th picture, $B(i)$ is the number of bits consumed by the $i$th encoded picture (P or I), $R_T$ indicates the target average bit rate for the bit stream, and $F$ represents the video frame rate. Note that the virtual buffer models the decoder buffer at the receiver side. Therefore, the inputs to this buffer correspond to the outputs from a buffer that operates at the encoder side.
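The update in (3) can be sketched in a few lines. The function name, the buffer size, and all numeric values below are hypothetical and chosen only for illustration; the check against the buffer bounds is likewise an illustration of the buffering constraint, not the authors' implementation.

```python
def update_virtual_buffer(occupancy: float, picture_bits: float,
                          target_bit_rate: float, frame_rate: float) -> float:
    """One step of (3): O_B(i+1) = O_B(i) - B(i) + R_T / F."""
    return occupancy - picture_bits + target_bit_rate / frame_rate


# Hypothetical setup: 128 kbps target, 15 fps, 256 kbit buffer, half full.
R_T, F, S_B = 128_000.0, 15.0, 256_000.0
occupancy = 0.5 * S_B
violations = 0
for bits in [30_000, 6_000, 7_000, 9_000]:   # one I-picture, then P-pictures
    occupancy = update_virtual_buffer(occupancy, bits, R_T, F)
    if not 0.0 <= occupancy <= S_B:
        violations += 1   # the modeled decoder buffer would under- or overflow
```

Since the buffer models the decoder side, a large picture lowers the occupancy and the constant channel rate refills it by $R_T/F$ bits per picture interval.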

3.3. Fuzzy controller. A study of conventional rate control approaches shows that many heuristic functions coexist with the nonlinear RD models in the rate control process. See, for example, the rate controller for VBR video proposed in [15]. As a new approach, a fuzzy controller is selected for the proposed system because the nonlinear functions and the complexities that exist in the rate control task can be simply included in the fuzzy rules and fuzzy membership functions (MSFs). Generally, a fuzzy controller can be designed based on expert experience or it can learn from examples. Therefore, a fuzzy controller is a good option for making use of the many heuristic results on video rate control. Moreover, according to the block diagram shown in Figure 2, a controller is required that defines a small quantized value based on rough measurements of the bit rate and buffer fullness. These properties make the task well suited to a fuzzy controller.


The fuzzy controller has two input signals, which are the normalized values of the buffer occupancy and the bit rate of P-pictures. The buffer occupancy is normalized by the buffer size, and the bit rate of P-pictures is normalized by the target bit rate for P-pictures. In VBR the bit budget consumed by P-pictures can be very different from the bit budget consumed by I-pictures, so, depending on the frequency of I-pictures in the bit stream, the target bit rate of P-pictures can be very different from the overall target bit rate. A precise value for the target bit rate of P-pictures is therefore estimated and used for the normalization. The fuzzy inputs are defined as

$$x_1 = O_B/S_B \tag{4}$$

$$x_2 = \frac{B_P F}{R_T}\left(1 + \frac{X_{IP} - 1}{I_I}\right) \tag{5}$$

where $S_B$ is the buffer size, $B_P$ denotes the bit budget consumed by the previous encoded P-picture, and $I_I$ stands for the interval of periodic I-pictures in the bit stream in terms of the number of pictures. $X_{IP}$ indicates the coding complexity of I-pictures relative to P-pictures and is computed as

$$X_{IP} = B_I/B_P \tag{6}$$

where $B_I$ and $B_P$ here denote the average bit budgets consumed by the encoded I-pictures and P-pictures, respectively, in the current scene. If the previous encoded picture is an intra picture, the value of $B_P$ in (5) is reset to the value of $B_I/X_{IP}$. To suppress fluctuations of QP that result from short-term variations in the complexity of video pictures, the low pass filter (LPF) smooths the variation of $B_P$ before it is input to the fuzzy controller. The transfer function of the LPF is

$$H(z) = \frac{m}{m + 1 - z^{-1}} \tag{7}$$

where $m$ is a constant value; good results are obtained with $m = 1.2$.

All the fuzzy rules used are summarized in Table 1. The content of the table specifies the output of the fuzzy controller. The letters H, L, M and V correspond to the fuzzy descriptors High, Low, Medium and Very, respectively. The descriptor Very is repeated to make new descriptors, and the number before V shows the number of repetitions. As an example from the table: if $x_1$ is VL and $x_2$ is H, then the output is 3VH (Very Very Very High). The input signals are specified by their fuzzy membership functions. Nine and seven membership functions have been used for the two inputs $x_1$ and $x_2$, respectively. The fuzzy rules and membership functions have been designed based on experience from our previous rate control algorithms presented in [12-17]. The asymmetric structures in the table of fuzzy rules and in the fuzzy MSFs are related to a number of facts that affect the operation of the rate controller. The nonlinearity of the RD function and the difference between the bit budgets of I-pictures and P-pictures are two key points that cause the asymmetry in the structures. The other key point is that the gain of the control loop is a function of the buffer conditions. A more aggressive control is required when the buffer fullness is close to critical conditions, to prevent underflow and overflow, and a looser control is preferred when the buffer fullness is far from the critical conditions, to prevent unnecessary variations in the quality of the encoded video. After the preliminary design of the fuzzy system, an optimization process was performed to fine tune the fuzzy membership functions. In the optimization process several parameters including average bit rate, average PSNR, average QP, and standard deviation of PSNR were considered. The final shapes of the membership functions are shown in Figure 3. The desired central values for the output of the fuzzy system corresponding to the fuzzy rules in Table 1 are listed in Table 2.

A well-known and simple fuzzy system with two inputs using a product inference engine, singleton fuzzifier, and center-average defuzzifier, as in [18], was used:

$$f(x_1, x_2) = \frac{\sum_{i_1=1}^{N_1}\sum_{i_2=1}^{N_2} y^{i_1 i_2}\,\mu_{A_1^{i_1}}(x_1)\,\mu_{A_2^{i_2}}(x_2)}{\sum_{i_1=1}^{N_1}\sum_{i_2=1}^{N_2} \mu_{A_1^{i_1}}(x_1)\,\mu_{A_2^{i_2}}(x_2)} \tag{8}$$

where $f(x_1, x_2)$ denotes the approximated output and $\{A_i^1, A_i^2, \cdots, A_i^{N_i}\}_{i=1,2}$ are fuzzy sets with membership functions $\{\mu_{A_1^{i_1}}(x_1)\}_{1 \le i_1 \le N_1}$ and $\{\mu_{A_2^{i_2}}(x_2)\}_{1 \le i_2 \le N_2}$ defined for the inputs $x_1$ and $x_2$, respectively. The desired central outputs are denoted by $y^{i_1 i_2}$. More information about the derivation steps of the fuzzy system is presented in [18] and [19].

The output of the fuzzy system is passed through a gain control block that adaptively tunes the gain of the feedback loop according to the buffer size and the video content properties as

$$\Delta Q_F = \alpha \times \frac{R_T}{S_B} \times f(x_1, x_2) \tag{9}$$

where $\alpha$ is a coefficient that can be used for fine tuning of the rate control algorithm according to the video content properties.
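The input stage described by (4), (5) and (7) can be sketched as follows. The difference equation is obtained directly from (7): $(m+1)\,y[n] - y[n-1] = m\,x[n]$. The class and function names, the initial filter state, and the numeric values in the usage lines are assumptions of this sketch.

```python
class LowPassFilter:
    """First-order recursive filter with H(z) = m / (m + 1 - z^-1), as in (7)."""

    def __init__(self, m: float = 1.2, initial: float = 0.0):
        self.m = m
        self.state = initial   # initial state is an assumption of this sketch

    def step(self, x: float) -> float:
        # From (m + 1) * y[n] - y[n-1] = m * x[n].
        self.state = (self.m * x + self.state) / (self.m + 1.0)
        return self.state


def fuzzy_inputs(occupancy: float, buffer_size: float, p_bits: float,
                 frame_rate: float, target_bit_rate: float,
                 x_ip: float, i_interval: int) -> tuple:
    """Normalized controller inputs x1 (4) and x2 (5)."""
    x1 = occupancy / buffer_size
    x2 = (p_bits * frame_rate / target_bit_rate) * (1.0 + (x_ip - 1.0) / i_interval)
    return x1, x2


# Hypothetical usage: smooth the bits of recent P-pictures, then normalize.
lpf = LowPassFilter(m=1.2, initial=7_500.0)
for bits in [7_000, 9_500, 7_200]:
    smoothed = lpf.step(bits)
x1, x2 = fuzzy_inputs(128_000.0, 256_000.0, smoothed, 15.0, 128_000.0, 5.0, 30)
```

The filter has unit gain for a constant input, so a steady P-picture size passes through unchanged, and $x_2 = 1$ corresponds to P-pictures that exactly meet their share of the target bit rate.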

Table 1. Summarization of the IF-THEN fuzzy rules for IRCs

x2 \ x1   XL    VVL   VL    L     ML    M     MH    H     VH
VH        6VH   5VH   4VH   3VH   VVH   VH    H     MH    M
H         5VH   4VH   3VH   VVH   VH    H     MH    M     ML
MH        4VH   3VH   VVH   VH    H     MH    M     ML    L
M         3VH   VVH   VH    H     MH    M     ML    L     VL
ML        VVH   VH    H     MH    M     ML    L     VL    VVL
L         VH    H     MH    M     ML    L     VL    VVL   3VL
VL        H     MH    M     ML    L     VL    VVL   3VL   4VL

Table 2. Desired central values of fuzzy output in IRCs

x2 \ x1   3VL   VVL   VL    L     ML    M     MH    H     VH
VH         8     7     6     5     4     3     2     1     0
H          7     6     5     4     3     2     1     0    −1
MH         6     5     4     3     2     1     0    −1    −2
M          5     4     3     2     1     0    −1    −2    −3
ML         4     3     2     1     0    −1    −2    −3    −4
L          3     2     1     0    −1    −2    −3    −4    −5
VL         2     1     0    −1    −2    −3    −4    −5    −6
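The inference in (8) and the gain stage in (9) can be illustrated with the central values of Table 2. The optimized membership functions of Figure 3 are not reproduced here, so this sketch substitutes uniformly spaced triangular membership functions on assumed input ranges ($x_1$ in [0, 1], $x_2$ in [0, 2]); those shapes, the ranges, and all names and numbers are assumptions and will not match the asymmetric functions the authors tuned.

```python
X1_LEVELS, X2_LEVELS = 9, 7   # nine sets for x1, seven for x2

# Central values of Table 2. Row j is the x2 level counted from VL (j = 0)
# to VH (j = 6); column k is the x1 level counted from lowest to highest.
CENTERS = [[j + 2 - k for k in range(X1_LEVELS)] for j in range(X2_LEVELS)]


def linspace(start: float, stop: float, count: int) -> list:
    step = (stop - start) / (count - 1)
    return [start + i * step for i in range(count)]


X1_PEAKS = linspace(0.0, 1.0, X1_LEVELS)   # ASSUMED peak positions
X2_PEAKS = linspace(0.0, 2.0, X2_LEVELS)   # ASSUMED peak positions


def triangular_memberships(x: float, peaks: list) -> list:
    """Degrees of membership in ASSUMED uniform triangular sets."""
    width = peaks[1] - peaks[0]
    x = min(max(x, peaks[0]), peaks[-1])   # saturate outside the covered range
    return [max(0.0, 1.0 - abs(x - p) / width) for p in peaks]


def fuzzy_output(x1: float, x2: float) -> float:
    """Product inference with center-average defuzzification, as in (8)."""
    mu1 = triangular_memberships(x1, X1_PEAKS)
    mu2 = triangular_memberships(x2, X2_PEAKS)
    num = den = 0.0
    for j, m2 in enumerate(mu2):
        for k, m1 in enumerate(mu1):
            weight = m1 * m2
            num += CENTERS[j][k] * weight
            den += weight
    return num / den


def delta_q_f(x1: float, x2: float, alpha: float,
              target_bit_rate: float, buffer_size: float) -> float:
    """Gain stage of (9)."""
    return alpha * target_bit_rate / buffer_size * fuzzy_output(x1, x2)
```

With these assumed sets, an input pair that sits exactly on two peaks returns the corresponding entry of Table 2, and inputs between peaks are interpolated between neighboring entries.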

3.4. Output of IRC for I-pictures. The output of the IRC for I-pictures is computed based on the picture coding complexity and scene cut information. There are two types of I-pictures in the bit stream: periodic I-pictures, which are placed at a constant frequency, and I-pictures that are inserted at the beginning of scenes, i.e. at scene cuts. The output of the IRC for both types of I-pictures is formulated as

$$Q_I = Q_R + \Delta Q_X \tag{10}$$

where $Q_I$ denotes the output and $Q_R$ is a reference value for $Q_I$ that is computed differently for the two types of I-pictures. The variation of $Q_I$ around the reference $Q_R$ is imposed by a controlling signal $\Delta Q_X$. The controlling signal adapts the QP of I-pictures according to the coding complexity of the video frame. While $Q_R$ defines a reference value for the QP, the controlling signal makes small variations around the reference value. More details about the controlling signal and the reference $Q_R$ are presented in the sequel.

Figure 3. Membership functions of the IRCs fuzzy inputs

3.4.1. Reference value. The reference value ($Q_R$) is handled differently for the two types of I-pictures. For the periodic I-pictures, where the subsequent pictures have a high degree of correlation in terms of content and complexity, the idea is to have I-pictures with a quality as close as possible to the quality of the neighboring pictures. Applying a low pass filter similar to (7) to the QP of the previous encoded pictures gives a local average value that can be used as the reference value for the current I-picture. However, using a similar QP for encoding the I-picture and the neighboring P-pictures results in a higher quality for the I-picture than for the P-pictures. This difference is acceptable and is useful for the overall quality. The low pass filter prevents larger differences that might otherwise exist between the quality of the I-picture and that of the P-pictures.

The I-pictures at scene cuts may or may not be correlated in terms of complexity and/or content with the previous encoded pictures. Therefore, an estimate that ignores the previous encoded frames, or one that is based only on them, may waste bit budget or degrade quality. From this point of view, estimating a suitable QP for an I-picture at a scene cut is quite challenging. As a simple solution, the reference value for the first I-picture at a scene cut is calculated as

$$Q_R = \left(\bar{Q} + Q_m\right)/2 \tag{11}$$

where $\bar{Q}$ is a local average as for periodic I-pictures and $Q_m$ is a constant QP in the middle range, e.g. 26-34 for H.264/AVC, serving as a global average over various video contents. The local average $\bar{Q}$ keeps the quality of the I-picture close to that of the previous encoded pictures when there is some correlation between the two consecutive scenes in terms of content. $Q_m$ guarantees the allocation of a bit budget in the middle range if there is no correlation between consecutive scenes in terms of coding complexity.
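The two cases for the reference value can be sketched as below. The text only says the local average comes from a filter "similar to (7)", so reusing the same recursion with $m = 1.2$ is an assumption, as are the function names and the choice $Q_m = 30$ (one value inside the 26-34 range mentioned above).

```python
def local_average_qp(previous_qps: list, m: float = 1.2) -> float:
    """Recursive local average of recent QPs, using the form of (7) (ASSUMED)."""
    avg = float(previous_qps[0])
    for qp in previous_qps[1:]:
        avg = (m * qp + avg) / (m + 1.0)
    return avg


def reference_qp(previous_qps: list, at_scene_cut: bool,
                 q_mid: float = 30.0) -> float:
    """Q_R: the local average for periodic I-pictures, (11) at a scene cut."""
    q_bar = local_average_qp(previous_qps)
    if at_scene_cut:
        return (q_bar + q_mid) / 2.0
    return q_bar


# Hypothetical QPs of the last few encoded pictures.
recent = [36, 36, 36]
periodic_ref = reference_qp(recent, at_scene_cut=False)
scene_cut_ref = reference_qp(recent, at_scene_cut=True)
```

For a steady QP of 36, the periodic reference stays near 36, while the scene-cut reference is pulled halfway towards the mid-range constant.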


3.4.2. Coding complexity adaptation. The complexity adaptation signal $\Delta Q_X$ controls the output of the IRC for the I-picture around the reference value according to the coding complexity of the picture. To compute the small variations around the reference value with low complexity, it is accurate enough to use a simple first-order RD model

$$R = X/D \tag{12}$$

where $R$ and $D$ denote the rate and distortion, respectively, and $X$ stands for the coding complexity. For small variations of QP around the reference point, an approximately linear relation between QP and distortion can be assumed, and the RD model above can be rewritten as

$$R = X/Q \tag{13}$$

Using the RD model (13) and taking the average values of the QP and the complexity of I-pictures as a reference point, the complexity adaptation signal based on a drift from the reference can be derived as

$$\Delta Q_X = A_X \bar{Q}_I \left(\frac{X}{\bar{X}} - 1\right) \tag{14}$$

where $\bar{Q}_I$ denotes the average value of the QP of all encoded I-pictures and $A_X$ is an experimentally defined constant (typically about 0.3) called the complexity adaptation factor. $X$ denotes the coding complexity of the I-picture and $\bar{X}$ stands for the average value of the coding complexity over all encoded I-pictures. Various criteria, such as variance, can be used to estimate the coding complexity. An accurate measure for the coding complexity of I-pictures was proposed in our previous work [16].
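A small numeric sketch of (14) combined with (10) is given below. The function names and input values are hypothetical; the default $A_X = 0.3$ is the typical value quoted above.

```python
def complexity_adaptation(avg_i_qp: float, complexity: float,
                          avg_complexity: float, a_x: float = 0.3) -> float:
    """Delta Q_X of (14): drift of the picture complexity from its average."""
    return a_x * avg_i_qp * (complexity / avg_complexity - 1.0)


def i_picture_qp(q_ref: float, avg_i_qp: float, complexity: float,
                 avg_complexity: float, a_x: float = 0.3) -> float:
    """Q_I of (10): the reference value plus the complexity adaptation signal."""
    return q_ref + complexity_adaptation(avg_i_qp, complexity,
                                         avg_complexity, a_x)


# Hypothetical I-picture that is 50% more complex than the running average.
q_i = i_picture_qp(q_ref=32.0, avg_i_qp=30.0,
                   complexity=1.5, avg_complexity=1.0)
```

A picture of average complexity leaves the reference unchanged, while a more complex picture receives a higher QP, which limits the extra bits it would otherwise consume under the model in (13).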

4. Fuzzy Joint Rate Controller. The JRC produces an output that is added to the IRC outputs to compute the QPs used by the encoders. The output of the JRC modifies the QPs to control the variations in the bit rate of the aggregate bit stream. It utilizes a fuzzy controller with feedback from the occupancy of the joint buffer and from the bit rate of the aggregate bit stream. More details about the JRC are presented in the sequel.

4.1. Virtual joint buffer. The virtual joint buffer operates like the individual buffers used by the IRCs, but it is applied to the aggregate bit stream. The buffer occupancy is updated after encoding each set of corresponding pictures (the m-th picture of each source) as

$$O_{JB}(m) = O_{JB}(m-1) - \sum_{i=1}^{N} B_i + \sum_{i=1}^{N} \frac{R_i}{F} \qquad (15)$$

where $O_{JB}$ denotes the joint buffer occupancy, $B_i$ is the number of bits consumed by the encoded picture of the i-th source, $R_i$ is the target bit rate of the i-th bit stream, N is the number of sources, and F is the frame rate of the bit streams.
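The update in (15) can be sketched in a few lines. This is a minimal illustration, not the authors' implementation; the function name and the sample bit counts are hypothetical, and bits and bits per second are assumed as units.

```python
def update_joint_buffer(occupancy, picture_bits, target_rates, frame_rate):
    """Apply Eq. (15) once, after the m-th picture of every source is encoded.

    occupancy    -- joint buffer occupancy before the update, in bits
    picture_bits -- bits consumed by the m-th picture of each source (B_i)
    target_rates -- target bit rate of each bit stream, in bits/s (R_i)
    frame_rate   -- common frame rate of the bit streams, in frames/s (F)
    """
    if len(picture_bits) != len(target_rates):
        raise ValueError("need one bit count and one target rate per source")
    consumed = sum(picture_bits)              # sum of B_i
    budget = sum(target_rates) / frame_rate   # sum of R_i / F
    return occupancy - consumed + budget


# Hypothetical example: four 300 kb/s streams at 15 fps.
new_occupancy = update_joint_buffer(
    occupancy=100_000,
    picture_bits=[25_000, 18_000, 22_000, 30_000],
    target_rates=[300_000] * 4,
    frame_rate=15,
)
```

In this example the four pictures consume 95 000 bits against a per-picture budget of 80 000 bits, so the occupancy drops from 100 000 to 85 000 bits.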

4.2. Joint rate controller. The output of the JRC is defined by a fuzzy controller. Because each bit stream uses an independent controller with a buffer constraint, the multiplexed bit stream is, without any other control, constrained by a joint buffer whose size equals the sum of the sizes of the individual buffers used by the IRCs. The idea is to use a virtual buffer with a size $S_{JB}$ that is as small as possible and to let the JRC act only when the buffer condition is critical. The fuzzy controller has been designed so that the JRC has a non-zero output only when the buffer state is critical. This minimizes both the interaction between the encoders and the variations in the quality of each bit stream. The fuzzy controller has two input signals:

$$y_1 = \frac{O_{JB}}{S_{JB}} \qquad (16)$$

$$y_2 = \frac{F \sum_{i=1}^{N} B_i}{\sum_{i=1}^{N} R_i} \qquad (17)$$

where $S_{JB}$ denotes the size of the joint buffer.
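The two controller inputs in (16) and (17) are simple ratios: the first is the normalized buffer occupancy and the second is the instantaneous aggregate bit rate relative to the aggregate target. The sketch below is illustrative only; the function name and the sample numbers are assumptions.

```python
def jrc_inputs(occupancy, buffer_size, picture_bits, target_rates, frame_rate):
    """Return (y1, y2) as defined in Eqs. (16) and (17)."""
    # y1: joint buffer occupancy normalized by the joint buffer size.
    y1 = occupancy / buffer_size
    # y2: bits produced for the current pictures, converted to a bit rate
    # (times F), relative to the sum of the target bit rates.
    y2 = frame_rate * sum(picture_bits) / sum(target_rates)
    return y1, y2


# Hypothetical example: four 300 kb/s streams at 15 fps, 170 kbit joint buffer.
y1, y2 = jrc_inputs(
    occupancy=85_000,
    buffer_size=170_000,
    picture_bits=[25_000, 18_000, 22_000, 30_000],
    target_rates=[300_000] * 4,
    frame_rate=15,
)
```

Here the buffer is half full and the current pictures run at roughly 1.19 times the aggregate target rate, so y2 exceeds 1 exactly when the sources overshoot their combined budget.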

Table 3. Summary of the IF-THEN fuzzy rules for the JRC

          y1:   VL    L     ML    M     MH    H     VH
y2 = H          VH    H     MH    M     M     M     ML
y2 = M          H     MH    M     M     M     ML    L
y2 = L          MH    M     M     M     ML    L     VL

All the fuzzy rules are summarized in Table 3, whose entries specify the output of the controller. The letters H, L, M and V correspond to the fuzzy descriptors High, Low, Medium and Very, respectively. For example, the table states that if $y_1$ is VL and $y_2$ is M, then the output is H. Seven and three MSFs have been used for the two inputs $y_1$ and $y_2$, respectively. As for the IRCs, the linguistic fuzzy rules and MSFs were designed based on theoretical and experimental results. Furthermore, an optimization process was performed to fine-tune the fuzzy MSFs. The final shapes of the MSFs are shown in Figure 4. The desired central values of the fuzzy system output corresponding to VL, L, ML, M, MH, H and VH in Table 3 are −3, −2, −1, 0, 1, 2 and 3, respectively. A fuzzy system similar to (8) was used for the JRC. Moreover, the output of the fuzzy system is tuned adaptively according to the buffer size as

$$\Delta Q_J = \beta \times \frac{R}{S_{JB}} \times f(y_1, y_2) \qquad (18)$$

where β is a constant coefficient, typically about 0.3.

When the number of multiplexed services is small and the joint buffer is also relatively small, it is useful to shift the periodic IDR pictures in time across the bit streams to prevent unnecessary variations in QP and quality.
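To make the rule base of Table 3 and the scaling in (18) concrete, the following sketch evaluates a small fuzzy system over the two inputs. Several parts are assumptions and not taken from the paper: the tuned MSFs of Figure 4 are replaced by evenly spaced triangular functions with hypothetical peak positions, product inference with center-average defuzzification is assumed because (8) is only referenced here, and R is treated as a caller-supplied rate since it is not defined in this section. Only the rule table and the central output values −3 to 3 come from the text.

```python
# Central output values for the linguistic labels (from the text).
CENTERS = {"VL": -3, "L": -2, "ML": -1, "M": 0, "MH": 1, "H": 2, "VH": 3}

# Rule base of Table 3: one row per y2 label, columns ordered by the
# y1 labels VL, L, ML, M, MH, H, VH.
RULES = {
    "H": ["VH", "H", "MH", "M", "M", "M", "ML"],
    "M": ["H", "MH", "M", "M", "M", "ML", "L"],
    "L": ["MH", "M", "M", "M", "ML", "L", "VL"],
}

# ASSUMPTION: hypothetical triangular MSF peaks, not the shapes of Figure 4.
Y1_PEAKS = [i / 6 for i in range(7)]         # VL ... VH spread over [0, 1]
Y2_PEAKS = {"L": 0.5, "M": 1.0, "H": 1.5}    # around the target ratio 1.0


def triangle(value, peak, half_width):
    """Triangular membership degree of `value` for a set centered at `peak`."""
    return max(0.0, 1.0 - abs(value - peak) / half_width)


def fuzzy_output(y1, y2):
    """Center-average defuzzified output f(y1, y2) (assumed inference scheme)."""
    y1 = min(max(y1, 0.0), 1.0)   # clamp into the range covered by the MSFs
    y2 = min(max(y2, 0.5), 1.5)
    weighted_sum = 0.0
    weight_total = 0.0
    for y2_label, row in RULES.items():
        w2 = triangle(y2, Y2_PEAKS[y2_label], 0.5)
        for y1_peak, out_label in zip(Y1_PEAKS, row):
            w = w2 * triangle(y1, y1_peak, 1 / 6)   # product inference
            weighted_sum += w * CENTERS[out_label]
            weight_total += w
    return weighted_sum / weight_total


def joint_qp_offset(y1, y2, rate, buffer_size, beta=0.3):
    """Eq. (18): scale the fuzzy output by beta * R / S_JB."""
    return beta * rate / buffer_size * fuzzy_output(y1, y2)
```

With these assumed MSFs, `fuzzy_output(0.0, 1.0)` reproduces the example rule from the text (y1 is VL and y2 is M gives H, central value 2), and the output is zero at a half-full buffer with the bit rate on target, which matches the intent that the JRC acts only when the buffer state is critical.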

5. Simulation Results. To evaluate the performance of the proposed joint video encoding and multiplexing method, a set of simulations was performed. Four long (60-second) video sequences with a frame rate of 15 fps, QVGA picture format, and different contents were encoded by two methods at a target bit rate of 300 kb/s each. First, the sequences were encoded independently, each with an independent rate controller of the kind used as IRCs in the proposed system. Second, they were encoded with the proposed joint rate control system. Then, DVB-H encapsulation, transmission, and reception were simulated on the two sets of encoded bit streams for a constant bit rate channel with a bandwidth of 1200 kb/s. The required decoder buffer size, the decoder buffering delay, and the PSNR of the luminance component were measured for both sets. The simulation results are presented in Table 4. On average, the proposed method provides a 38% reduction in the required decoder buffer size and a 62% reduction in the decoder buffering delay at the expense of a 0.02 dB degradation in quality. Due to the symmetric operation of the decoder buffer and the IP encapsulator buffer, the same percentage reduction is expected in the buffering delay of the IP encapsulator.

Figure 4. Membership functions of the JRC fuzzy inputs

Table 4. Simulation results on 4 video sequences (S1, S2, S3 and S4), 300 kb/s for each, 15 fps, QVGA, 1200 kb/s channel bandwidth

            Independent Encoded                       Joint Encoded
Sequence    PSNR (dB)   Delay (s)   Buffer (kbit)     PSNR (dB)   Delay (s)   Buffer (kbit)
S1          38.86       0.56        269               38.83       0.23        170
S2          41.41       0.37        190               41.41       0.24        180
S3          37.02       0.39        211               37.00       0.25        176
S4          40.08       1.25        477               40.07       0.27        184
Average     39.35       0.65        287               39.33       0.25        178
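The reported average reductions can be checked directly from the "Average" row of Table 4. The short sketch below only redoes that arithmetic; the helper name is introduced here for illustration.

```python
# Average row of Table 4 (independent vs. joint encoding).
independent = {"psnr_db": 39.35, "delay_s": 0.65, "buffer_kbit": 287}
joint = {"psnr_db": 39.33, "delay_s": 0.25, "buffer_kbit": 178}


def reduction_percent(before, after):
    """Relative reduction from `before` to `after`, in percent."""
    return 100.0 * (before - after) / before


buffer_saving = reduction_percent(independent["buffer_kbit"], joint["buffer_kbit"])
delay_saving = reduction_percent(independent["delay_s"], joint["delay_s"])
psnr_loss = independent["psnr_db"] - joint["psnr_db"]

print(f"buffer size reduction: {buffer_saving:.1f}%")
print(f"buffering delay reduction: {delay_saving:.1f}%")
print(f"PSNR loss: {psnr_loss:.2f} dB")
```

Rounded to whole percentages, the results agree with the 38% and 62% reductions and the 0.02 dB quality loss quoted in the text.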

Sample graphical simulation results are depicted in Figure 5. The frame size and the PSNR of a bit stream are shown for the two cases, independent encoding and joint encoding. The PSNR graphs of the two cases are closely similar: the bit rate and the PSNR change only when the buffer state is critical. The overall results show that the proposed method can control the bit rate of the aggregate bit stream without a considerable effect on video quality. Figure 6 shows sample reconstructed video frames encoded by the proposed method. The video sequences used for the simulations include many scene cuts with very different contents in terms of coding complexity and motion, which makes them very challenging for rate control. Note that the well-known video sequences used in the standardization process are not suitable for evaluating the proposed method because they are short and have homogeneous contents.

Figure 5. Simulation results of independent and joint video encoding

Figure 6. Sample reconstructed video frames encoded by the proposed method

To evaluate the proposed method in terms of computational complexity, the computational complexity of the IRC was compared with that of the rate control algorithm in the Joint Model (JM) of the H.264/AVC standard [20]. The first 100 video frames of four well-known video sequences, Foreman, Carphone, News, and Hall, were encoded with the two rate control algorithms. The processing times consumed by the controllers were measured with high accuracy using the processor clock (Intel Pentium 4, 2.8 GHz). To minimize measurement errors caused by the time-sharing operation of the processor, the encoding was repeated 10 times and the minimum measured value was selected for each sequence. The measured results are shown in Table 5. Averaged over the video sequences, the JM rate controller consumes about 384 μs (microseconds) of processing time per frame, while the IRC consumes about 15 μs per frame. These numbers can be scaled by the number of bit streams to estimate the computational complexity of the whole rate control system. According to these results, there is a large difference between the computational complexity of the proposed joint rate control method and that of similar conventional methods.

The overall simulation results show that the proposed fuzzy joint encoding and multiplexing method can considerably decrease the end-to-end delay of a DVB-H broadcast system at a negligible cost in service quality (0.02 dB in average PSNR) and with very low computational complexity.

Table 5. Comparison of the IRC and JM rate controller

                          Processing time (μs)
Sequence                  JM        IRC
Foreman                   39327     1493
Carphone                  38136     1506
News                      38116     1504
Hall                      38379     1508
Average over sequences    38489     1503
Average per frame         384       15
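The per-frame figures in Table 5 follow from the per-sequence totals, each measured over 100 frames. The sketch below repeats that arithmetic and scales the IRC cost to four bit streams as an example; the helper name and the choice of four streams (the number used in the simulations) are illustrative assumptions.

```python
# Processing times from Table 5, in microseconds per 100 encoded frames.
jm_us = {"Foreman": 39327, "Carphone": 38136, "News": 38116, "Hall": 38379}
irc_us = {"Foreman": 1493, "Carphone": 1506, "News": 1504, "Hall": 1508}
FRAMES_PER_SEQUENCE = 100


def per_frame_average(totals, frames=FRAMES_PER_SEQUENCE):
    """Average processing time per frame over all sequences, in microseconds."""
    return sum(totals.values()) / len(totals) / frames


jm_per_frame = per_frame_average(jm_us)
irc_per_frame = per_frame_average(irc_us)
speedup = jm_per_frame / irc_per_frame

# Rough cost of running one IRC per bit stream for four multiplexed streams.
irc_four_streams = 4 * irc_per_frame

print(f"JM: {jm_per_frame:.1f} us/frame, IRC: {irc_per_frame:.1f} us/frame")
print(f"ratio JM/IRC: {speedup:.1f}")
```

Truncated to whole microseconds, the averages match the 384 μs and 15 μs in the table, which puts the IRC at roughly one twenty-fifth of the JM rate controller's cost per frame.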

6. Conclusions. A method based on fuzzy controllers was proposed for joint video encoding and statistical multiplexing of multiple video bit streams. It decreases the end-to-end delay in a broadcast system in which a number of services are encoded and broadcast simultaneously. The proposed method exploits the advantages of statistical multiplexing while allowing the broadcast services to have independent bit rates and quality of service. It has low computational complexity, and it can decrease and control the buffering delays of the broadcast system without any cost in the overall quality of the compressed video. Because delay and bandwidth are resources that can be traded against each other in a broadcast system, the proposed method can also be used to decrease the overall bandwidth consumed by the broadcast services.

Acknowledgment. This work was supported by Nokia and the Academy of Finland, Finnish Center of Excellence Program 2006-2011, under Project 213462.

REFERENCES

[1] ETSI, Digital video broadcasting (DVB): Transmission systems for handheld terminals, ETSI Standard, EN302304 V1.1.1, 2004.

[2] M. Kornfeld, DVB-H: The emerging standard for mobile data communication, Proc. of the IEEE Symp. on Consumer Electronics, pp.193-198, 2004.

[3] G. May, The IP datacast system-overview and mobility aspects, Proc. of the IEEE Symp. on Consumer Electronics, pp.509-514, 2004.

[4] G. Faria, J. A. Henriksson, E. Stare and P. Talmola, DVB-H: Digital broadcast services to handheld devices, Proc. of the IEEE, vol.94, no.1, 2006.

[5] ETSI, Digital video broadcasting (DVB): Framing structure, channel coding and modulation for digital terrestrial television, ETSI Standard, EN300744 V1.5.1, 2004.

[6] U. Ladebusch and C. A. Liss, Terrestrial DVB (DVB-T): A broadcast technology for stationary portable and mobile use, Proc. of the IEEE, vol.94, no.1, pp.183-193, 2006.

[7] M. Rezaei, I. Bouazizi, V. K. M. Vadakital and M. Gabbouj, Optimal channel changing delay for mobile TV over DVB-H, Proc. of the IEEE Conf. on Portable Information Devices, Orlando, USA, pp.1-5, 2007.

[8] T. V. Lakshman, A. Ortega and A. R. Reibman, VBR video: Tradeoffs and potentials, Proc. of the IEEE, vol.86, no.5, pp.952-973, 1998.

[9] L. Boroczky, A. Y. Ngai and E. F. Westermann, Joint rate control with look-ahead for multi-program video coding, IEEE Transactions on Circuits and Systems for Video Technology, vol.10, no.7, pp.1159-1163, 2000.


[10] L. Wang and A. Vincent, Bit allocation and constraints for joint coding of multiple video programs, IEEE Transactions on Circuits and Systems for Video Technology, vol.9, no.6, pp.949-959, 1999.

[11] H. Xiong, J. Sun, S. Yu, C. Luo and J. Zhou, Design and implementation of multiplexing rate control in broadband access network TV transmission system, IEEE Transactions on Consumer Electronics, vol.50, no.3, pp.849-855, 2004.

[12] M. Rezaei, M. Gabbouj and I. Bouazizi, Delay constrained fuzzy rate control for video streaming over DVB-H, Proc. of the IEEE Conf. on Intelligent Information Hiding and Multimedia Signal Processing, Pasadena, California, USA, pp.223-227, 2006.

[13] M. Rezaei, M. M. Hannuksela and M. Gabbouj, Semi-fuzzy rate controller for variable bit rate video, IEEE Transactions on Circuits and Systems for Video Technology, vol.18, no.5, pp.633-645, 2008.

[14] M. Rezaei, I. Bouazizi and M. Gabbouj, Fuzzy joint encoding and statistical multiplexing of multiple video sources with independent quality of services for streaming over DVB-H, Proc. of the IEEE Conf. on Intelligent Information Hiding and Multimedia Signal Processing, Kaohsiung, Taiwan, pp.542-545, 2007.

[15] M. Rezaei, S. Wenger and M. Gabbouj, Video rate control for streaming and local recording optimized for mobile devices, Proc. of the IEEE Symp. on Personal Indoor and Mobile Radio Communications, Berlin, vol.4, pp.2284-2288, 2005.

[16] M. Rezaei, S. Wenger and M. Gabbouj, Analyzed rate distortion model in standard video codecs for rate control, Proc. of the IEEE Workshop on Signal Processing Systems, Athens, Greece, pp.550-555, 2005.

[17] M. Rezaei, M. M. Hannuksela and M. Gabbouj, Low-complexity fuzzy video rate controller for streaming, Proc. of the IEEE Conf. on Acoustic, Speech and Signal Processing, Toulouse, France, vol.2, pp.897-900, 2006.

[18] L. X. Wang, Adaptive Fuzzy Systems and Control: Design and Stability Analysis, Prentice-Hall, Englewood Cliffs, NJ, 1994.

[19] L. X. Wang, Stable adaptive fuzzy control of nonlinear systems, IEEE Trans. Fuzzy Systems, vol.1, no.2, pp.146-155, 1993.

[20] G. Sullivan, T. Wiegand and K. P. Lim, Joint model reference encoding methods and decoding concealment methods, Joint Video Team (JVT) of ISO/IEC MPEG and ITU-T VCEG, Document JVT-I049, San Diego, USA, 2003.