

Information Flows in Video Coding

Jia Wang*    Xiaolin Wu†

*Department of Electronic Engineering, Shanghai Jiao Tong Univ., Shanghai, 200240, China, [email protected]
†Department of Electrical & Computer Engineering, McMaster University, Canada, L8S 4K1, [email protected]

Abstract

We study the information theoretical performance of common video coding methodologies at the frame level. Via an abstraction of consecutive video frames as correlated random variables, many existing video coding techniques, including the baseline of MPEG-x and H.26x, scalable coding and distributed video coding, have corresponding information theoretical models. The theoretical achievable rate distortion regions have been completely solved for some of these systems, while for others they remain open. We show that the achievable rate region of sequential coding equals that of predictive coding for Markov sources. We give a theoretical analysis of the coding efficiency of B frames in the popular hybrid video coding architecture, bringing new understanding to the current practice. We also find that distributed sequential video coding generally incurs a performance loss if the source is not Markov.

1 Introduction

With the ubiquity of network-based and wireless visual communications, video coding is facing new challenges. Among them are stringent demands on low complexity (e.g., for power conservation on battery-operated mobile devices) and on system flexibility to suit heterogeneous networks. These requirements motivated the study of new video coding methodologies, including distributed video coding, scalable video coding, multiple description video coding, etc. It was not until recently that information theoretical models for video coding were proposed and studied. In [5], Viswanathan and Berger introduced the concept of sequential coding of correlated sources, which is an extension of the traditional successive refinement problem [1] [3], and provided an information theoretical model for the IPP... frame coding structure commonly found in hybrid video coding systems. In [2] and [10], achievable rate regions for sequential coding of more than two sources were studied. In [8], sequential coding is further extended to the multi-stage sequential coding problem, which is a theoretical model for scalable video coding.

In abstraction, video coding can be considered as a special case of network coding. The frames are the input sources to the encoder-decoder network, in which the nodes are the encoders and decoders of individual frames and the edges signify the data communications between encoders and decoders (network nodes). In this perspective, we borrow the notion of “information flow” from the network coding literature. For information flows in the above network model of video coding, we study not only the rate distortion limits of an encoder-decoder system, but also strategies to approach them. By treating each frame as a single random variable, video coding falls into the category of multi-user source coding in information theory, and as such new insights can be brought to the design of video coding systems.


Figure 1: Information flows in two different video coding systems. (a) System 1: sequential coding. (b) System 2: predictive coding.

The rest of the paper is organized as follows. In section 2, we examine the theoretical results of the IPP... structured video coding, and show that the seemingly different forms of achievable regions are equivalent for Markov sources. In section 3, we analyze the efficiency of coding B frames. The main result is that the causal strategy commonly adopted by current video coding systems is not theoretically optimal. The paper concludes in section 4, with a discussion of the inherent rate loss of distributed sequential coding.

2 Sequential Coding of Many Correlated Sources

The information flow of video coding is modeled as system 1 in Fig. 1. Actually, system 1 provides only an upper bound for a practical video codec (cf. [5]). A precise characterization of video information flow depends on how many frames are used in the encoder and decoder. System 2 in Fig. 1 shows a different system, referred to as predictive coding in [10]. The motivation behind this system is that in practice, people usually do not use the full information of the previous frames in encoding the current frame. The achievable region of system 1 is known, while that of system 2 remains open. But we will show that if the sources satisfy a Markov condition, the achievable regions of both systems are the same. Formal statements of both systems can be found in [2], [10] and are omitted here. In the following development we use IID sequences generated from the source distribution.

Achievable region of system 1: Let $R_1^*(D^L)$ be the set of all achievable rate $L$-tuples $R^L = (R_1, R_2, \ldots, R_L)$ at distortion level $D^L = (D_1, D_2, \ldots, D_L)$. Let $\mathcal{R}_1(D^L)$ be the set consisting of all rate $L$-tuples $R^L$ such that there exist random variables $Y_i$ correlated with $X_i$ and functions $g_i$, satisfying

$$R_i \geq I(X_1^i; Y_i \mid Y_1^{i-1})$$
$$D_i \geq E\,d(X_i, \hat{X}_i), \quad i = 1, \ldots, L$$

where $X_i^j = (X_i, X_{i+1}, \ldots, X_j)$ for $i < j$, $\hat{X}_i = g_i(Y_1^i)$ for some deterministic function $g_i$, and $X_{i+1}^L - (X_1^i, Y_1^{i-1}) - Y_i$ forms a Markov chain. Denote by $\mathrm{co}(\mathcal{R}_1(D^L))$ the convex closure of $\mathcal{R}_1(D^L)$. The following theorem is already given in [2] and [10].

(Footnote: The last rate constraint is different from that used in [10], but the equivalence of the two regions is easy to prove.)

Theorem 1 $R_1^*(D^L) = \mathrm{co}(\mathcal{R}_1(D^L))$.

Remark: Since $I(Y_1^i; X_{i+1}^L \mid X_1^i) = I(Y_1^{i-1}; X_{i+1}^L \mid X_1^i) + I(Y_i; X_{i+1}^L \mid X_1^i Y_1^{i-1})$ and $Y_1^{i-1} - X_1^{i-1} - X_i^L$ implies $Y_1^{i-1} - X_1^i - X_{i+1}^L$, one can show by induction that the Markov chains $X_{i+1}^L - (X_1^i, Y_1^{i-1}) - Y_i$ are equivalent to $Y_1^i - X_1^i - X_{i+1}^L$, $i = 1, \ldots, L$.

An achievable region of system 2 (inner bound): Let $R_2^*(D^L)$ be the set of all achievable rate $L$-tuples at distortion level $D^L$. Let $\mathcal{R}_2(D^L)$ be the set consisting of all rate $L$-tuples $R^L$ such that there exist random variables $Y_i$ correlated with $X_i$ and functions $g_i$, satisfying

$$R_i \geq I(X_i; Y_i \mid Y_1^{i-1})$$
$$D_i \geq E\,d(X_i, \hat{X}_i),$$

where $\hat{X}_i = g_i(Y_1^i)$ for some deterministic function $g_i$, and $(X_1^{i-1}, X_{i+1}^L) - (X_i, Y_1^{i-1}) - Y_i$ forms a Markov chain. Denote by $\mathrm{co}(\mathcal{R}_2(D^L))$ the convex closure of $\mathcal{R}_2(D^L)$.

Theorem 2 $R_2^*(D^L) \supseteq \mathrm{co}(\mathcal{R}_2(D^L))$.

The proof of this theorem follows from standard procedures in information theory and is omitted here. It is obvious that $R_1^*(D^L) \supseteq R_2^*(D^L)$, and generally speaking, system 1 has a larger achievable region than system 2. But for Markov sources, the two systems have the same achievable region, which is stated in the following theorem.

Theorem 3 If the sources $\{X_i\}_{i=1}^L$ form a Markov chain $X_1 - X_2 - \cdots - X_L$ in this order, then $R_1^*(D^L) = R_2^*(D^L)$.

Remark: In [10], the authors show that for a stationary and ergodic source, the minimum total rates of the two systems are the same.

Proof: It is obvious that $R_1^*(D^L) \supseteq R_2^*(D^L)$. So, according to the previous theorems, the assertion of this theorem stands if we can show $\mathcal{R}_1(D^L) = \mathcal{R}_2(D^L)$.

1) $\mathcal{R}_1(D^L) \supseteq \mathcal{R}_2(D^L)$. This part is easy to prove, since the Markov condition $(X_1^{i-1}, X_{i+1}^L) - (X_i, Y_1^{i-1}) - Y_i$ implies $X_{i+1}^L - (X_1^i, Y_1^{i-1}) - Y_i$ and $I(X_i; Y_i \mid Y_1^{i-1}) = I(X_1^i; Y_i \mid Y_1^{i-1})$.

2) $\mathcal{R}_1(D^L) \subseteq \mathcal{R}_2(D^L)$. In order to prove this relation, we introduce a new region below. Let $\mathcal{R}_0(D^L)$ be the set consisting of all rate $L$-tuples such that there exist random variables $Y_i$ correlated with $X_i$ and functions $g_i$, satisfying

$$R_i \geq I(X_i; Y_i \mid Y_1^{i-1}) \quad (1)$$
$$D_i \geq E\,d(X_i, \hat{X}_i), \quad (2)$$
$$X_{i+1}^L - (X_1^i, Y_1^{i-1}) - Y_i \quad (3)$$

where $\hat{X}_i = g_i(Y_1^i)$ for some deterministic function $g_i$.

For any point in $\mathcal{R}_0(D^L)$, there exist correlated random variables $Y_i$, $i = 1, \ldots, L$, such that relations (1)–(3) hold. In the remark below Theorem 1, we have shown that $Y_1^i - X_1^i - X_{i+1}^L$. Then, according to the Markov assumption $X_1 - X_2 - \cdots - X_L$ and

$$I(X_1^{i-1} Y_1^i; X_{i+1}^L \mid X_i) = I(X_1^{i-1}; X_{i+1}^L \mid X_i) + I(Y_1^i; X_{i+1}^L \mid X_1^i),$$

we have

$$(X_1^{i-1}, Y_1^i) - X_i - X_{i+1}^L,$$

which yields

$$Y_i - (X_i, Y_1^{i-1}) - X_{i+1}^L. \quad (4)$$

In the following, we show $\mathcal{R}_2(D^L) \supseteq \mathcal{R}_0(D^L)$. Define

$$p'(x_1, \ldots, x_L, y_1, \ldots, y_L) = p(x_1, \ldots, x_L, y_1) \prod_{i=1}^{L-1} p(y_{i+1} \mid x_{i+1}, y_1, \ldots, y_i).$$

It can be verified that $p'(x_1, \ldots, x_L, y_1, \ldots, y_L)$ is a pmf. We denote by $Y_1', \ldots, Y_L'$ the auxiliary random variables correlated with $X_1, \ldots, X_L$ according to $p'(x_1, \ldots, x_L, y_1, \ldots, y_L)$. It is easy to check the following Markov conditions:

$$(X_1^{i-1}, X_{i+1}^L) - (X_i, Y_1'^{\,i-1}) - Y_i'.$$

We have the following:

$$p'(x_1, y_1) = p(x_1, y_1),$$
$$p'(x_2, y_1, y_2) = p(x_2, y_1, y_2),$$

and for $i > 2$,

$$\begin{aligned}
p'(x_i, y_1, \ldots, y_i) &= \sum_{\substack{x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_L \\ y_{i+1}, \ldots, y_L}} \left[ p(x_1, \ldots, x_L, y_1) \prod_{j=1}^{L-1} p(y_{j+1} \mid x_{j+1}, y_1, \ldots, y_j) \right] \\
&= \sum_{x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_L} \left[ p(x_1, \ldots, x_L, y_1) \prod_{j=1}^{i-1} p(y_{j+1} \mid x_{j+1}, y_1, \ldots, y_j) \right] \\
&= \sum_{x_2, x_3, \ldots, x_{i-1}} \left[ p(x_2, \ldots, x_i, y_1) \prod_{j=1}^{i-1} p(y_{j+1} \mid x_{j+1}, y_1, \ldots, y_j) \right] \\
&= p(x_i, y_1, \ldots, y_i),
\end{aligned}$$

where we have used (4) in the last equality. Thus

$$p'(x_i, y_1, \ldots, y_i) = p(x_i, y_1, \ldots, y_i), \quad i = 1, \ldots, L,$$

which guarantees

$$I(X_i; Y_i \mid Y_1^{i-1}) = I(X_i; Y_i' \mid Y_1'^{\,i-1}).$$

It is now clear that for any point in $\mathcal{R}_0(D^L)$, there exist auxiliary random variables $Y_1', \ldots, Y_L'$ such that the conditions of $\mathcal{R}_2(D^L)$ hold. Thus $\mathcal{R}_0(D^L) \subseteq \mathcal{R}_2(D^L)$. Since it is obvious that $\mathcal{R}_1(D^L) \subseteq \mathcal{R}_0(D^L)$, we have $\mathcal{R}_1(D^L) \subseteq \mathcal{R}_2(D^L)$.

From the proof, one can see that $\mathcal{R}_0(D^L) = \mathcal{R}_1(D^L) = \mathcal{R}_2(D^L)$ for Markov sources. An immediate consequence of this result is that for Markov sources, the minimum total rate has the following simple expression, the proof of which uses the region $\mathcal{R}_0(D^L)$ and is omitted.

Corollary 1 The minimum total rate for Markov sources is given by

$$\min_{\substack{X_{i+1}^L - (X_1^i, Y_1^{i-1}) - Y_i \\ E\,d(X_i, Y_i) \leq D_i}} \; \sum_{j=1}^{L} I(X_j; Y_j \mid Y_1^{j-1}).$$
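To make this expression concrete, the following minimal numerical sketch (ours, not from the paper) evaluates the stage-by-stage rates for $L = 2$ unit-variance jointly Gaussian Markov sources under MSE. It assumes the standard RD-achieving forward test channel $Y_1 = (1 - D_1) X_1 + W$ with $W \sim N(0, D_1(1 - D_1))$; for Gaussian Markov sources this causal, stage-by-stage coding attains the minimum total rate (cf. [4] and section 3).

```python
import numpy as np

# A minimal sketch (ours) of Corollary 1 for L = 2 jointly Gaussian Markov
# sources with unit variances, correlation rho, and MSE distortion.
# Assumption: the RD-optimal forward test channel Y1 = (1 - D1) X1 + W,
# W ~ N(0, D1 (1 - D1)), so Var(Y1) = 1 - D1 and E[X2 Y1] = rho (1 - D1).

def causal_total_rate(rho, D1, D2):
    # Stage 1: Gaussian rate-distortion function of X1 at distortion D1.
    R1 = 0.5 * np.log2(1.0 / D1)
    # LMMSE residual variance of X2 given the stage-1 reconstruction Y1.
    var_x2_given_y1 = 1.0 - (rho * (1.0 - D1)) ** 2 / (1.0 - D1)
    # Stage 2: conditional rate-distortion of X2 given Y1 (assumes
    # D2 <= var_x2_given_y1, otherwise the rate is zero).
    R2 = 0.5 * np.log2(var_x2_given_y1 / D2)
    return R1, R2

R1, R2 = causal_total_rate(rho=0.5, D1=0.5, D2=0.5)
print(f"R1 = {R1:.3f}, R2 = {R2:.3f}, total = {R1 + R2:.3f} bits")
```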


Figure 2: (a) The information flow of a B-frame structure. (b) An equivalent diagram of the B-frame in the coding order.

3 Rate Distortion Performance of B-frame Coding

In this section, we discuss the information flows of the so-called B frames that are commonly found in hybrid video coding systems, such as MPEG and H.264.

The aim of studying the information flow of B frames is to characterize the coding efficiency of B-frame coding. According to the previous section, the achievable region of the information flow of B frames is now known. In the following, we give results for Gaussian sources. In practice, say, for broadcasting purposes, one usually uses two B frames between the I and P frames. In the following, we first give the achievable region of the strategy in which the two B frames are coded jointly. The information flow diagram of this joint B frame coding is shown in Fig. 2.

(Footnote: The joint coding of the two B frames is reasonable since the delay of the coding system depends only on the number of B frames.)

In the following, we first give the achievable rate region for two B frames. Since the two B frames are jointly coded, we combine them to form a compound source. The achievable region of one B frame is then a special case.

For convenience, let $X_1$, $X_2$ and $(X_3, X_4)$ be the frames in the coding order, i.e., $(X_3, X_4)$ is the compound B frame. According to the previous result, the achievable region of the coding system of Fig. 2(b) for arbitrary sources is given in the following, for arbitrary auxiliary random variables $\hat{X}_i$, $i = 1, \ldots, 4$:

$$R_1 \geq I(X_1; \hat{X}_1)$$
$$R_2 \geq I(X_1, X_2; \hat{X}_2 \mid \hat{X}_1)$$
$$R_3 \geq I(X_3, X_4; \hat{X}_3, \hat{X}_4 \mid \hat{X}_1, \hat{X}_2)$$
$$\hat{X}_1 - X_1 - (X_2, X_3, X_4)$$
$$(\hat{X}_1, \hat{X}_2) - (X_1, X_2) - (X_3, X_4)$$
$$E\,d(X_i, g_i(\hat{X}_1^i)) \leq D_i, \quad i = 1, 2, 3, 4.$$

Denote $D^4 = (D_1, D_2, D_3, D_4)$ and

$$\mathcal{Y}_1 = \{(Y_1, Y_2, Y_3, Y_4) : Y_1 - X_1 - (X_2, X_3, X_4),\; (Y_1, Y_2) - (X_1, X_2) - (X_3, X_4),\; \exists g_i \text{ s.t. } E[(X_i - g_i(Y_1^i))^2] \leq D_i,\; i = 1, 2, 3, 4\}.$$

It is clear that the minimum total rate is given by

$$R_t(D^4) = \min_{(\hat{X}_1, \hat{X}_2, \hat{X}_3, \hat{X}_4) \in \mathcal{Y}_1} \left( I(X_1; \hat{X}_1) + I(X_1, X_2; \hat{X}_2 \mid \hat{X}_1) + I(X_3, X_4; \hat{X}_3, \hat{X}_4 \mid \hat{X}_1 \hat{X}_2) \right).$$


In the above expression, the auxiliary random variables are not restricted to be Gaussian. But in the following, we will present several lemmas and show that the minimum total rate is indeed achieved when all the variables are Gaussian. First, we find it convenient to use the concept of Wide-Sense Markov (WSM) chains.

(Footnote: The proof of the corresponding lemma in [2] is incomplete, since conditional uncorrelatedness does not guarantee the Markovity of the Gaussian variables with the same covariance matrix.)

Definition 1 Three random variables $X$, $Y$ and $Z$ form a Wide-Sense Markov chain in this order, denoted by $X \Leftrightarrow Y \Leftrightarrow Z$, if

$$\hat{E}(X \mid YZ) = \hat{E}(X \mid Y),$$

where $\hat{E}(X \mid Y)$ denotes the linear MMSE estimator of $X$ given $Y$.

(Footnote: Throughout this paper, we use $X - Y - Z$ to denote (strict-sense) Markov chains and $X \Leftrightarrow Y \Leftrightarrow Z$ to denote WSM chains.)

We list several important properties of WSM below.
1) For any random variables $X$, $Y$, and arbitrary measurable function $g$, $X \Leftrightarrow \hat{E}(X \mid Y) \Leftrightarrow g(Y)$.
2) For jointly Gaussian random variables, WSM is equivalent to strict-sense Markov (SSM).
3) SSM does not imply WSM, and vice versa.
4) If $X - Y - Z$ form a Markov chain in this order and $X$ and $Y$ are jointly Gaussian, then $X \Leftrightarrow Y \Leftrightarrow Z$. More generally, if $X - Y - Z$ form a Markov chain in this order and the conditional expectation $E(X \mid Y)$ is a linear function of $Y$, then $X \Leftrightarrow Y \Leftrightarrow Z$.
5) Given a WSM chain, there always exists a Gaussian SSM chain with the same covariance matrix.
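As a quick illustration (ours, not the paper's), the WSM condition of Definition 1 can be checked numerically from a covariance matrix: $X \Leftrightarrow Y \Leftrightarrow Z$ holds exactly when the linear MMSE estimator of $X$ from $(Y, Z)$ places zero weight on $Z$. A minimal sketch, assuming an invertible covariance of $(Y, Z)$:

```python
import numpy as np

# A numerical check (ours) of the WSM condition in Definition 1:
# X <=> Y <=> Z holds iff the LMMSE estimator of X from (Y, Z) puts
# zero weight on Z. K is the covariance of (X, Y, Z) in that order.

def is_wsm(K, tol=1e-9):
    K_yz = K[1:, 1:]                     # Cov of (Y, Z)
    k_x_yz = K[0, 1:]                    # Cov of X with (Y, Z)
    w = np.linalg.solve(K_yz, k_x_yz)    # LMMSE weights on (Y, Z)
    return abs(w[1]) < tol               # zero weight on Z <=> WSM

# For jointly Gaussian variables SSM and WSM coincide (property 2):
# a Gaussian chain X - Y - Z satisfies Cov(X,Z) = Cov(X,Y) Cov(Y,Z).
K = np.array([[1.0, 0.8, 0.4],
              [0.8, 1.0, 0.5],
              [0.4, 0.5, 1.0]])
print(is_wsm(K))  # True
```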

Define

$$\mathcal{Y}_2 = \{(Y_1, Y_2, Y_3, Y_4) : Y_1 - X_1 - (X_2, X_3, X_4),\; (Y_1, Y_2) - (X_1, X_2) - (X_3, X_4),\; X_2 \Leftrightarrow Y_2 \Leftrightarrow Y_1,\; X_3 \Leftrightarrow Y_3 \Leftrightarrow (Y_1, Y_2),\; X_4 \Leftrightarrow Y_4 \Leftrightarrow (Y_1, Y_2),\; E[(X_i - Y_i)^2] \leq D_i,\; i = 1, 2, 3, 4\}.$$

Lemma 1 The minimum total rate of the IBBP coding structure with joint coding of the two B frames is

$$R_t(D^4) = \min_{(\hat{X}_1, \hat{X}_2, \hat{X}_3, \hat{X}_4) \in \mathcal{Y}_2} \left( I(X_1; \hat{X}_1) + I(X_1 X_2; \hat{X}_2 \mid \hat{X}_1) + I(X_3 X_4; \hat{X}_3 \hat{X}_4 \mid \hat{X}_1 \hat{X}_2) \right).$$

Furthermore, the minimum total rate is achieved only if $\hat{X}_i = \hat{E}(X_i \mid \hat{X}_1^i)$, $i = 1, 2$, and $\hat{X}_i = \hat{E}(X_i \mid \hat{X}_1^4)$, $i = 3, 4$.

Proof: If $\hat{X}_1 - X_1 - (X_2, X_3, X_4)$ and $(\hat{X}_1, \hat{X}_2) - (X_1, X_2) - (X_3, X_4)$, then

$$\begin{aligned}
& I(X_1; \hat{X}_1) + I(X_1 X_2; \hat{X}_2 \mid \hat{X}_1) + I(X_3 X_4; \hat{X}_3 \hat{X}_4 \mid \hat{X}_1 \hat{X}_2) \\
&= I(X_1 X_2 X_3 X_4; \hat{X}_1) + I(X_1 X_2 X_3 X_4; \hat{X}_2 \mid \hat{X}_1) + I(X_3 X_4; \hat{X}_3 \hat{X}_4 \mid \hat{X}_1 \hat{X}_2) \\
&= I(X_1 X_2 X_3 X_4; \hat{X}_1 \hat{X}_2) + I(X_3 X_4; \hat{X}_3 \hat{X}_4 \mid \hat{X}_1 \hat{X}_2) \\
&= I(X_3 X_4; \hat{X}_1 \hat{X}_2) + I(X_1 X_2; \hat{X}_1 \hat{X}_2 \mid X_3 X_4) + I(X_3 X_4; \hat{X}_3 \hat{X}_4 \mid \hat{X}_1 \hat{X}_2) \\
&= I(X_3 X_4; \hat{X}_1 \hat{X}_2 \hat{X}_3 \hat{X}_4) + I(X_1 X_2; \hat{X}_1 \hat{X}_2 \mid X_3 X_4). \quad (5)
\end{aligned}$$

We have

$$\begin{aligned}
& \min_{(\hat{X}_1, \hat{X}_2, \hat{X}_3, \hat{X}_4) \in \mathcal{Y}_2} \left( I(X_1; \hat{X}_1) + I(X_1 X_2; \hat{X}_2 \mid \hat{X}_1) + I(X_3 X_4; \hat{X}_3 \hat{X}_4 \mid \hat{X}_1 \hat{X}_2) \right) \\
&\overset{(a)}{\geq} \min_{(\hat{X}_1, \hat{X}_2, \hat{X}_3, \hat{X}_4) \in \mathcal{Y}_1} \left( I(X_1; \hat{X}_1) + I(X_1 X_2; \hat{X}_2 \mid \hat{X}_1) + I(X_3 X_4; \hat{X}_3 \hat{X}_4 \mid \hat{X}_1 \hat{X}_2) \right) \\
&\overset{(b)}{=} \min_{(\hat{X}_1, \hat{X}_2, \hat{X}_3, \hat{X}_4) \in \mathcal{Y}_1} \left( I(X_3 X_4; \hat{X}_1 \hat{X}_2 \hat{X}_3 \hat{X}_4) + I(X_1 X_2; \hat{X}_1 \hat{X}_2 \mid X_3 X_4) \right) \\
&\overset{(c)}{\geq} \min_{(Y_1, Y_2, Y_3, Y_4) \in \mathcal{Y}_2} \left( I(X_3 X_4; Y_1 Y_2 Y_3 Y_4) + I(X_1 X_2; Y_1 Y_2 \mid X_3 X_4) \right) \\
&\overset{(d)}{=} \min_{(Y_1, Y_2, Y_3, Y_4) \in \mathcal{Y}_2} \left( I(X_1; Y_1) + I(X_1 X_2; Y_2 \mid Y_1) + I(X_3 X_4; Y_3 Y_4 \mid Y_1 Y_2) \right), \quad (6)
\end{aligned}$$

where (a) is because $\mathcal{Y}_1 \supseteq \mathcal{Y}_2$; (b) and (d) are because of (5); and (c) is because for any random variables $\hat{X}_1, \hat{X}_2, \hat{X}_3, \hat{X}_4$, one can set $Y_i = \hat{E}(X_i \mid \hat{X}_1^i)$, $i = 1, 2$, and $Y_i = \hat{E}(X_i \mid \hat{X}_1^4)$, $i = 3, 4$. Furthermore, $X_2 \Leftrightarrow Y_2 \Leftrightarrow Y_1$ and $X_i \Leftrightarrow Y_i \Leftrightarrow (Y_1, Y_2)$, $i = 3, 4$, hold according to property 1 of WSM. Since the first and last terms of (6) are equal, both inequalities must hold with equality. The second half of the lemma is easy to prove.

Lemma 2 [9] Let $\mathbf{X}$ and $\mathbf{Y}$ be random (column) vectors. Then

$$E\left[ (\mathbf{X} - \hat{E}(\mathbf{X} \mid \mathbf{Y}))(\mathbf{X} - \hat{E}(\mathbf{X} \mid \mathbf{Y}))^T \right] \preceq E\left[ (\mathbf{X} - g(\mathbf{Y}))(\mathbf{X} - g(\mathbf{Y}))^T \right],$$

where $g(\mathbf{Y})$ is an arbitrary measurable vector function with the same dimension as $\mathbf{X}$.

(Footnote: For two square matrices $A$ and $B$ of the same dimension, $A \preceq B$ means that $B - A$ is positive semi-definite.)
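A quick Monte-Carlo illustration of Lemma 2 (ours, with an arbitrarily chosen covariance matrix): the error covariance of the LMMSE estimator is dominated, in the positive semi-definite order, by that of any other estimator, here the naive choice $g(\mathbf{Y}) = \mathbf{Y}$.

```python
import numpy as np

# A numerical illustration (ours) of Lemma 2: the LMMSE estimator
# E^(X|Y) = W Y has the smallest error covariance in the PSD order.

rng = np.random.default_rng(1)
n = 500_000
K = np.array([[2.0, 0.5, 0.8, 0.1],   # covariance of (X1, X2, Y1, Y2)
              [0.5, 1.0, 0.3, 0.6],
              [0.8, 0.3, 1.5, 0.2],
              [0.1, 0.6, 0.2, 1.0]])
S = rng.multivariate_normal(np.zeros(4), K, size=n)
X, Y = S[:, :2], S[:, 2:]

W = K[:2, 2:] @ np.linalg.inv(K[2:, 2:])  # LMMSE weight matrix
E_opt = X - Y @ W.T                       # LMMSE estimation error
E_g = X - Y                               # error of the estimator g(Y) = Y

C_opt = E_opt.T @ E_opt / n
C_g = E_g.T @ E_g / n
print(np.linalg.eigvalsh(C_g - C_opt))    # all eigenvalues >= 0 (Lemma 2)
```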

We consider a compound B frame of the following form:

$$\begin{pmatrix} X_3 \\ X_4 \end{pmatrix} = A \begin{pmatrix} X_1 \\ X_2 \end{pmatrix} + \begin{pmatrix} N_1 \\ N_2 \end{pmatrix} \quad (7)$$

where for simplicity we assume $\det(A) = 1$, and $N_1, N_2$ are zero-mean Gaussian random variables independent of $X_1, X_2$. The covariance matrix of $(N_1, N_2)$ is denoted by $K_{N_1 N_2}$.

Lemma 3 If $\mathbf{D}_{X_1 X_2 \mid \hat{X}_1 \hat{X}_2} = \mathbf{D}_{12}$, then

$$H(X_3 X_4 \mid \hat{X}_1 \hat{X}_2) - H(X_1 X_2 \mid \hat{X}_1 \hat{X}_2) \geq \frac{1}{2} \log \frac{|\mathbf{D}_{12} + K_{N_1 N_2}|}{|\mathbf{D}_{12}|}.$$


Proof: Parallel to [7], using the identity $H(A\mathbf{X}) = H(\mathbf{X}) + \log \det(A)$.

Given random variables $(\hat{X}_1, \hat{X}_2, \hat{X}_3, \hat{X}_4) \in \mathcal{Y}_2$, and assuming $\hat{X}_i = \hat{E}(X_i \mid \hat{X}_1^i)$, $i = 1, 2$, and $\hat{X}_i = \hat{E}(X_i \mid \hat{X}_1^4)$, $i = 3, 4$, we have the following covariance matrices:

$$K_1 \equiv K_{X_1, X_2, \hat{X}_1, \hat{X}_2} = \begin{pmatrix} 1 & \rho_{12} & 1 - D_1 & x \\ \rho_{12} & 1 & a & 1 - D_2 \\ 1 - D_1 & a & 1 - D_1 & b \\ x & 1 - D_2 & b & 1 - D_2 \end{pmatrix} \quad (8)$$

where $a$ is determined by the WSM chain $X_1 \Leftrightarrow \hat{X}_1 \Leftrightarrow \hat{X}_2$ and $b$ is determined by the WSM chain $X_2 \Leftrightarrow \hat{X}_2 \Leftrightarrow \hat{X}_1$. The parameter $x$ is to be determined.

$$K_2 \equiv K_{X_3, X_4, \hat{X}_1, \hat{X}_2, \hat{X}_3, \hat{X}_4} = \begin{pmatrix} K_{X_3 X_4 \hat{X}_1 \hat{X}_2} & \begin{matrix} 1 - D_3 & y_2 \\ y_1 & 1 - D_4 \\ c_1 & c_2 \\ d_1 & d_2 \end{matrix} \\ \begin{matrix} 1 - D_3 & y_1 & c_1 & d_1 \\ y_2 & 1 - D_4 & c_2 & d_2 \end{matrix} & \begin{matrix} 1 - D_3 & z \\ z & 1 - D_4 \end{matrix} \end{pmatrix} \quad (9)$$

where $K_{X_3 X_4 \hat{X}_1 \hat{X}_2}$ is determined by (7) and $K_{X_1 X_2 \hat{X}_1 \hat{X}_2}$. The parameters $c_1$ and $d_1$ are determined by the WSM chain $X_3 \Leftrightarrow \hat{X}_3 \Leftrightarrow (\hat{X}_1, \hat{X}_2)$, and the parameters $c_2$ and $d_2$ are determined by the WSM chain $X_4 \Leftrightarrow \hat{X}_4 \Leftrightarrow (\hat{X}_1, \hat{X}_2)$. The parameters $y_1$, $y_2$ and $z$ are to be determined. Let $\mathbf{D}_{12}(x) = \mathbf{D}_{X_1 X_2 \mid \hat{X}_1 \hat{X}_2}$ and $\mathbf{D}_{34}(x, y_1, y_2, z) = \mathbf{D}_{X_3 X_4 \mid \hat{X}_1 \hat{X}_2 \hat{X}_3 \hat{X}_4}$, where $\mathbf{D}_{X_1 X_2 \mid \hat{X}_1 \hat{X}_2}$ and $\mathbf{D}_{X_3 X_4 \mid \hat{X}_1 \hat{X}_2 \hat{X}_3 \hat{X}_4}$ are the linear conditional covariance matrices derived from $K_1$ and $K_2$. Also, introduce the notation

$$\mathcal{A} = \{(x, y_1, y_2, z) : K_1 \succeq 0,\; K_2 \succeq 0\}.$$

Theorem 4 The minimum total rate of the Gaussian compound B frame coding is

$$R_t(D^4) = \min_{(x, y_1, y_2, z) \in \mathcal{A}} \frac{1}{2} \log \frac{|K_{34}|}{|\mathbf{D}_{34}(x, y_1, y_2, z)|} + \frac{1}{2} \log \frac{|\mathbf{D}_{12}(x) + K_{N_1 N_2}|}{|\mathbf{D}_{12}(x)|}.$$

Proof: We only sketch the proof here. Let $\mathbf{D}_{12} = \mathbf{D}_{X_1 X_2 \mid \hat{X}_1 \hat{X}_2}$ and $\mathbf{D}_{34} = \mathbf{D}_{X_3 X_4 \mid \hat{X}_1 \hat{X}_2 \hat{X}_3 \hat{X}_4}$. It follows from Lemma 2 that

$$\mathbf{D}_{12} \preceq \mathbf{D}_{12}(x), \quad \mathbf{D}_{34} \preceq \mathbf{D}_{34}(x, y_1, y_2, z).$$

Then, according to Lemma 1,

$$\begin{aligned}
R_t(D^4) &\geq I(X_1; \hat{X}_1) + I(X_1 X_2; \hat{X}_2 \mid \hat{X}_1) + I(X_3 X_4; \hat{X}_3 \hat{X}_4 \mid \hat{X}_1 \hat{X}_2) \\
&= I(X_3 X_4; \hat{X}_1 \hat{X}_2 \hat{X}_3 \hat{X}_4) - I(X_3 X_4; \hat{X}_1 \hat{X}_2) + I(X_1 X_2; \hat{X}_1 \hat{X}_2) \\
&\geq \frac{1}{2} \log \frac{|K_{34}|}{|\mathbf{D}_{34}|} + \frac{1}{2} \log \frac{|\mathbf{D}_{12} + K_{N_1 N_2}|}{|\mathbf{D}_{12}|} \\
&\geq \frac{1}{2} \log \frac{|K_{34}|}{|\mathbf{D}_{34}(x, y_1, y_2, z)|} + \frac{1}{2} \log \frac{|\mathbf{D}_{12}(x) + K_{N_1 N_2}|}{|\mathbf{D}_{12}(x)|}. \quad (10)
\end{aligned}$$

The achievability of the right side of (10) is easy to verify.
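For numerical work, the objective of Theorem 4 is straightforward to evaluate once the conditional covariances are in hand. The helper below is our own sketch (in bits, with $K_{34}$ denoting the covariance of $(X_3, X_4)$); a numerical solver would minimize it over $(x, y_1, y_2, z) \in \mathcal{A}$:

```python
import numpy as np

# A sketch (ours) of the Theorem 4 objective: given the candidate
# conditional covariances D12(x) and D34(x, y1, y2, z), the source
# covariance K34 of (X3, X4), and K_N1N2, return the total rate in bits.

def theorem4_total_rate(K34, D34, D12, K_N):
    r_b = 0.5 * np.log2(np.linalg.det(K34) / np.linalg.det(D34))
    r_p = 0.5 * np.log2(np.linalg.det(D12 + K_N) / np.linalg.det(D12))
    return r_b + r_p
```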


Define

$$\mathcal{Y}_3 = \{(Y_1, Y_2, Y_3) : (X_1, X_2, X_3, Y_1, Y_2, Y_3) \text{ are jointly Gaussian},\; Y_1 - X_1 - (X_2, X_3),\; (Y_1, Y_2) - (X_1, X_2) - X_3,\; X_2 - Y_2 - Y_1,\; X_3 - Y_3 - (Y_1, Y_2),\; E[(X_i - Y_i)^2] \leq D_i,\; i = 1, 2, 3\}.$$

Corollary 2 The minimum total rate of the Gaussian one-B-frame coding is

$$\min_{(Y_1, Y_2, Y_3) \in \mathcal{Y}_3} \left( I(X_3; Y_3 \mid Y_1 Y_2) + I(X_1 X_2; Y_1 Y_2) \right).$$

Proof: In order to use the above theorem to prove this corollary, we introduce an auxiliary source $X_4$. The introduction of $X_4$ does not increase the minimum total rate of the original problem, since the distortion matrix of $(X_3, X_4)$ can be arbitrary. It can be verified that the minimum value of the right side of (10) for fixed $x$ is achieved when $X_4 \Leftrightarrow (X_3, \hat{X}_1, \hat{X}_2) \Leftrightarrow \hat{X}_3$ and $(X_3, X_4) \Leftrightarrow (\hat{X}_1, \hat{X}_2, \hat{X}_3) \Leftrightarrow \hat{X}_4$. The detailed proof is omitted.

Remark: According to the above corollary, to determine the minimum total rate of the one-B-frame coding problem one only needs to find the optimal $x$. Numerical results show that there is a performance loss when coding the two B frames separately, using Corollary 2, compared with joint coding of the two B frames, characterized by Theorem 4.

An example (the causal strategy is not optimal): In the following, we show that even for one-B-frame coding, the commonly used strategy is not optimal. Let $X_1$, $X_2$ and $X_3$ be jointly Gaussian with zero mean and unit variance. The correlation coefficient of $X_1$ and $X_2$ is $\rho_{12} = 0.5$, and $X_3$ is determined by

$$X_3 = 0.5 X_1 + 0.5 X_2 + N,$$

where $N$ is a Gaussian random variable independent of $X_1$ and $X_2$ with $\sigma_N^2 = 0.25$. The distortions at the three levels are $D_1 = D_2 = D_3 = 0.5$.

The optimal solution of $x$ is $x = 0.4182$. Accordingly, the three rates are $R_1 = I(X_1; Y_1) = 0.5$ bits, $R_2 = I(X_1 X_2; Y_2 \mid Y_1) = 0.1415$ bits, and $R_3 = I(X_3; Y_3 \mid Y_1 Y_2) = 0.031$ bits. Thus the minimum total rate is 0.673 bits. But if at the second stage the encoder performs an optimal sequential coding with only two layers, the rates are $R_1 = 0.5$ bits, $R_2 = 0.11$ bits, and $R_3 = 0.087$ bits, respectively, and the total rate is 0.697 bits.
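The following short sanity check (ours, not from the paper) simulates the example's source model and tallies the two quoted totals; it verifies the source statistics and the arithmetic, not the optimized rates themselves, which require solving the minimization of Theorem 4.

```python
import numpy as np

# A Monte-Carlo sanity check (ours) of the example's source model:
# unit-variance X1, X2 with rho12 = 0.5, and X3 = 0.5 X1 + 0.5 X2 + N.

rng = np.random.default_rng(0)
n = 1_000_000
X12 = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.5], [0.5, 1.0]], size=n)
N = rng.normal(0.0, 0.5, size=n)   # std 0.5, i.e. sigma_N^2 = 0.25
X3 = 0.5 * X12[:, 0] + 0.5 * X12[:, 1] + N

print(np.var(X3))            # ~0.875 = 0.25 + 0.25 + 2(0.25)(0.5) + 0.25
print(0.5 + 0.1415 + 0.031)  # 0.6725 ~ 0.673 bits: jointly optimal coding
print(0.5 + 0.11 + 0.087)    # 0.697 bits: causal strategy, a real gap
```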

If the Gaussian sources satisfy the Markov condition, it is known that to achieve the minimum total rate, the optimal coding at each stage is to achieve the (conditional) rate-distortion function given the previous reconstructions, without considering future layers [4]; we call this the causal strategy. Intuitively, it was speculated that this coding strategy is still optimal for non-Markov Gaussian sources. But this is wrong! The above example demonstrates that in a Gaussian three-layer coding system, optimal sequential coding of the first two layers might not be globally optimal. This result should be of interest to the video coding community.

4 Conclusion and Discussion

Figure 3: An example where distributed coding (a) has a performance loss compared with sequential coding (b).

Having analyzed performance bounds of some video coding models, we conclude this paper by pointing out the rate loss of distributed coding versus centralized coding for Gaussian (non-Markov) sources, which we refer to as the essential rate loss. This is demonstrated by a simple example in Fig. 3. The depicted distributed sequential coding system and the centralized coding system each contain encoder-decoder pairs, with two sources $X$ and $Y$. In the figure, dashed decoders have a large distortion (here we do not care about that distortion), while dashed encoders have an output bit-rate of zero. As such, the two systems degrade to the well-known multi-terminal source coding problem and the joint coding of two sources, respectively. According to the recent result of Wagner [6], the achievable region of the left system is smaller than that of the right system, whose minimum total rate is exactly the rate-distortion function of the joint source $(X, Y)$.

Therefore, distributed video coding cannot achieve the same performance as centralized video coding when the Markov chain condition does not hold. In future research we will investigate whether the essential rate loss is the lower bound for any sources with the same encoder-decoder structure.

Acknowledgement

J. Wang’s work was supported in part by NSFC under grant 60802020.

References

[1] W. H. R. Equitz and T. M. Cover, “Successive refinement of information,” IEEE Trans. Inf. Theory, vol. 37, pp. 269-275, Mar. 1991.

[2] N. Ma and P. Ishwar, “On delayed sequential coding of correlated sources,” arXiv:cs/0701197v2, Sept. 30, 2008.

[3] B. Rimoldi, “Successive refinement of information: characterization of the achievable rates,” IEEE Trans. Inf. Theory, vol. 40, pp. 253-259, Jan. 1994.

[4] Y. Sermadevi, J. Chen, S. S. Hemami, and T. Berger, “When is bit allocation for predictive video coding easy?” in Proc. IEEE Data Compression Conf., pp. 289-298, Utah, March 2005.

[5] H. Viswanathan and T. Berger, “Sequential coding of correlated sources,” IEEE Trans. Inf. Theory, vol. 46, pp. 236-246, Jan. 2000.

[6] A. B. Wagner, S. Tavildar, and P. Viswanath, “Rate region of the quadratic Gaussian two-encoder source-coding problem,” IEEE Trans. Inf. Theory, vol. 54, pp. 1938-1961, May 2008.

[7] H. Wang and P. Viswanath, “Vector Gaussian multiple description with individual and central receivers,” IEEE Trans. Inf. Theory, vol. 53, pp. 2133-2153, Jun. 2007.

[8] J. Wang, X. Wu, J. Sun, and S. Yu, “On multi-stage sequential coding of correlated sources,” in Proc. IEEE Data Compression Conf., pp. 253-262, Utah, March 2007.

[9] J. Wang, J. Chen, and X. Wu, “On the minimum sum rate of Gaussian multiterminal source coding: new proofs,” in Proc. IEEE Int. Symp. Information Theory, pp. 1463-1467, Seoul, Korea, 2009.

[10] E.-h. Yang, L. Zheng, D. He, and Z. Zhang, “On the rate distortion theory for causal video coding,” in Proc. IEEE Information Theory and Applications Workshop, pp. 385-391, 2009.
