Improved interview video error concealment on whole frame packet loss

Ting-Lan Lin a,*, Tsung-En Chang a, Gui-Xiang Huang a, Chi-Chan Chou a, Uday Singh Thakur b

a Department of Electronic Engineering, Chung Yuan Christian University, No. 200, Zhongbei Rd., Zhongli City, Taoyuan County 320, Taiwan, ROC
b Institut für Nachrichtentechnik, RWTH Aachen University, Germany

* Corresponding author. Fax: +886 03 265 4699. E-mail addresses: [email protected] (T.-L. Lin), [email protected] (T.-E. Chang), [email protected] (G.-X. Huang), [email protected] (C.-C. Chou), [email protected] (U.S. Thakur).

J. Vis. Commun. Image R. 25 (2014) 1811–1822. http://dx.doi.org/10.1016/j.jvcir.2014.09.006

Article history: Received 26 November 2013; Accepted 9 September 2014; Available online 21 September 2014.

Keywords: Multiview video with depth; Error concealment; Illumination compensation; Whole frame loss; DIBR (Depth image based rendering); Lost motion estimation; Hole filling; Reference view selection

Abstract

An improved DIBR-based (Depth image based rendering) whole frame error concealment method for multiview video with depth is designed. An optimal reference view selection is first proposed. The paper further includes three modified parts for the DIBRed pixels. First, the missing 1-to-1 pixels are concealed by the pixels from another view; the light differences between views are handled using the motion vector of the projected coordinate and a reverse DIBR procedure. Second, the generation of the many-to-1 pixels is improved via their depth information. Third, the hole pixels are filled using estimated motion vectors derived efficiently from a weighted function of the neighboring available motion vectors and their distances to the target hole pixel. The experimental results show that, compared to the state-of-the-art method, the combined system of the four proposed methods is superior and improves the performance by 5.53 dB at maximum.

© 2014 Elsevier Inc. All rights reserved.

1. Introduction

When compressed bitstreams are transmitted over a bandwidth-limited communication channel, bit errors or packet losses occur. In video coding, the information needed to reconstruct a specific frame (for example, the motion vectors and the reference frame numbers for motion compensation) is contained in that frame's packet, not in its neighboring (inter-view or inter-frame) frame packets. Therefore, if a frame packet is missing, all the information (for example, the motion vectors) needed to recover that frame is missing, and the frame cannot be reconstructed from the reference frame by the standard decoder mechanism. An error concealment method is therefore needed. Error concealment techniques use the available information in the spatial and/or temporal neighborhood to recover the lost blocks. Kim et al. [1] use the INTRA mode information in the neighboring blocks to estimate the edge direction in the lost block. Block-matching-type error concealment methods [2,3] have been developed. The algorithm proposed by Qian et al. [4] adaptively decides the order in which the lost blocks are error-concealed. Multiple reference frames are used to improve the error concealment performance by Chen et al. [5]. The method proposed by Chen et al. [6] minimizes a joint spatial–temporal cost function to obtain the lost motion vector. By analyzing the differences between neighboring motion vectors, the estimated motion vectors are improved by recursive motion vector refinement [7]. Zhang's method [8] utilizes the available motion vectors to find an autoregressive model relating reference pixels and current pixels in order to recover lost pixels. Statistical methods and models have also been applied in this area. Gaussian Mixture Models and Mixture Model-Least Squares are used in the works of Persson et al. [9,10]. A B-spline method is used in [11]. Additionally, the work proposed by Zheng and Chau [12] uses a polynomial model for the estimation of the lost motion vectors. Chen et al. [13] make use of the surrounding motion vectors to form a 2-dimensional plane to estimate the lost motion vectors. An adaptive search range for motion estimation is studied and used for motion recovery by Xu and Zhou [14]. In the work of Tröger and Kaup [15], a lower-resolution version of the video is also transmitted, so that when packets are lost the decoder can use the low-resolution version for error concealment. A classic and efficient method called motion vector extrapolation (MVE) [16] was proposed, and this efficient method is further extended in recent literature [17,18]. Zhou et al. [18] in particular improve the motion vectors estimated by MVE using the received/available motion vectors in the spatial neighborhood.

Recently, the research area of multiview video with depth has emerged, and error concealment methods for it have also been developed. The transmission system of multiview video with depth is shown in Fig. 1. The work by Yan [19] uses the received depth information, along with the BMA (block matching algorithm), to recover the lost pixels. Liu et al. [20] first examine the characteristics of the received depth map, and with this information the lost motion vectors can be recovered more accurately. In the work of Yang et al. [21], the lost pixels in one view are recovered from the received frame in another view; the illumination differences between the views are considered and investigated. However, the works discussed above consider losses of MBs in a checkerboard manner; that is, there must be four (top, bottom, left and right) received neighboring MBs for the above methods to work. This means they cannot be used in the scenario of whole frame losses, which is a more realistic situation for current video codecs with very high compression rates. The work of Yan and Zhou [22] considers whole frame loss; its error concealment algorithm uses a motion extrapolation technique and a received depth map to recover the lost frame from the previous frame. In the work of Chung et al. [23], both color frames and depth frames are lost, and the color frames are recovered not only from the previous frame but also from another view using the technique of DIBR (depth-image-based rendering) with illumination compensation between frames.

Fig. 1. System of multiview video with depth.

In this work, we aim to modify and improve the error concealment method of Chung et al. [23] in the scenario of whole frame loss. In particular, we aim to improve the quality of the pixels generated by the DIBR process used in [23]. DIBR is a technique to generate a virtual view using the pixels from a reference view. The coordinate mapping between the reference view and the virtual view is computed from the camera parameters and the depth information in both views; details of the computation can be found in [24]. There are three types of pixel mapping in DIBR: 1-to-1 pixels, many-to-1 pixels, and hole pixels. In the 1-to-1 mapping situation, there is only one pixel location W in the reference view mapping to the target pixel location in the virtual view. In this case, the pixel placed at the target location in the virtual view is simply copied from the pixel at W in the reference frame. In the many-to-1 mapping situation, there are multiple pixel locations in the reference view mapping to the target pixel location in the virtual view. Defining the multiple pixel locations in the reference view as a set P, it is important to find out which pixel in the set P is the best one to copy to the target location in the virtual view. If there is no pixel location in the reference view mapping to the target pixel location in the virtual view, the target pixel location is called a hole. In our approach, we develop algorithms with four novelties to improve the DIBRed pixels generated by Chung's work [23]:

1. We propose an optimal reference view selection method for the DIBR to generate the virtual view.


2. We challenge the assumptions that Chung's work made about the pixel estimation and the illumination compensation scheme during the error concealment process for 1-to-1 pixels. We propose a novel pixel estimation and illumination compensation procedure that is more realistic.

3. For the DIBR process for the many-to-1 pixels, we use the depth information to choose the better pixel. A baseline method is compared against.

4. Finally, a novel hole-filling algorithm is proposed to finish the error concealment procedure. The hole-filling algorithm makes use of the available motion vectors surrounding the hole pixel.

We published preliminary work on error concealment using improved illumination compensation in [25], which corresponds to point 2 above. In the current paper, we further propose points 1, 3 and 4 to improve the overall system. The improved performance of the proposed method is demonstrated in the experimental results on more views and videos. Note that in this work, the error concealment method is designed under the assumption that the depth frames are always available at the decoder. This can be achieved by adding more channel coding protection to the depth frames, or by transmitting the depth frames through a more reliable channel. A deeper study of the situation in which the depth frames are also lost is envisioned as future work.

This paper is structured as follows. Section 2 discusses the state-of-the-art whole frame error concealment method [23] for multiview video with depth. Section 3 proposes an optimal DIBR reference view selection method. The proposed modification of the 1-to-1 pixels in DIBR is developed in Section 4. Note that Sections 2 and 4 are partially taken from our prior work in [25]. For the many-to-1 pixels in DIBR, an improved method is presented in Section 5, where a comparison with a baseline method is also made. Section 6 proposes a novel method to hole-fill the images after DIBR using neighboring available motion vectors; the performance of a benchmarking algorithm is also compared and discussed. Experimental results are demonstrated and discussed in Section 7. Section 8 concludes the paper, and Section 9 provides a vision of future work.

2. State-of-the-art whole frame loss error concealment in multiview video with depth

This section briefly discusses the state-of-the-art multiview error concealment method for whole frame loss [23] for 1-to-1 pixels, many-to-1 pixels, and hole pixels.

A two-view video system is considered in Fig. 2: the left and right views with color and depth frames. Superscripts $l$ and $r$ are used to denote the left view and right view, and subscripts $c$ and $d$ are used to denote the color frame and depth frame. For example, $F_c^l$ means the color frame in the left view at time $t$. In the current work, the color frame $F_c^r$ and depth frame $F_d^r$ in the right view at time $t$ are lost, and $F_c^r$ is to be error-concealed first by the DIBR (depth-image based rendering) technique [24]. That is, with the information of the available $F_c^l$ and $F_d^l$, DIBR can generate the view coordinate projection to produce the estimated view $F_c^r(t)$.

By the technique of DIBR, as discussed in the Introduction, three types of pixels are generated: 1-to-1 pixels, many-to-1 pixels, and hole pixels. The state-of-the-art methods [23] to address these three types of pixels are introduced here. For a 1-to-1 pixel $a$ in $F_c^r$, its mapping coordinate $b$ in $F_c^l$ is found by DIBR. With the available motion vector $mv(b)$ associated with $b$, a pointed pixel $c$ in $F_c^l(t-1)$ can be found. The illumination change $e(b)$ from $b$ in $F_c^l$ to $c$ in $F_c^l(t-1)$ is found by

$$e(b) = \mathrm{Color}(b) - \mathrm{Color}(c) \qquad (1)$$

Fig. 2. Whole frame error concealment in [23]. 1-to-1 mapped pixels determined by DIBR are considered.

In the work of [23], it is assumed that the illumination change across frames for $a$ in the right view can be approximated by the illumination change across frames for $b$ in the left view:

$$e(a) \cong e(b) \qquad (2)$$

Additionally, the motion vector of pixel $a$ in $F_c^r$ is assumed to be approximated by the motion vector of pixel $b$ in $F_c^l$:

$$mv(a) \cong mv(b) \qquad (3)$$

After obtaining the estimated motion vector $mv(a)$ and the estimated illumination change $e(a)$, the estimated pixel at $a$ can be found by

$$\mathrm{Color}(a) = F_c^r\!\left(t-1,\; a + mv(a)\right) + e(a) \qquad (4)$$

After the 1-to-1 pixels are estimated, motion vector extrapolation from the previous frame is used for the many-to-1 pixels and hole pixels. In Fig. 3, for a many-to-1 pixel or hole pixel, denoted as $e$, in the frame $F_c^r(t)$, the frame $F_c^r(t-1)$ and its motion vectors are considered. For each pixel in $F_c^r(t-1)$, its corresponding motion vector pointing to $F_c^r(t-2)$ is used to reversely project onto $F_c^r(t)$. If a many-to-1 pixel or hole pixel $e$ in $F_c^r(t)$ is projected onto by $N$ pixels (points $f$ and $g$ in this example) from $F_c^r(t-1)$, the color value of $e$ is derived as the average value of the $N$ pixels in $F_c^r(t-1)$.
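For concreteness, the per-pixel rule of Eqs. (1)–(4) can be sketched as follows. This is a minimal illustration, not code from [23]; the frame arrays (left_t, left_t1, right_t1), the dense motion field and the pixel coordinates are assumed inputs, with motion vectors pointing from time t to time t−1.

```python
def chung_1to1(a, b, mv_left, left_t, left_t1, right_t1):
    """Estimate the lost pixel at location a in F_c^r(t), following
    Eqs. (1)-(4): b is a's DIBR-mapped coordinate in F_c^l(t), mv_left is
    the per-pixel motion field of the left view (pointing to time t-1),
    and the frames are 2-D luma arrays indexed by (row, col)."""
    mv_a = mv_left[b]                                  # Eq. (3): mv(a) ~ mv(b)
    c = (b[0] + int(mv_a[0]), b[1] + int(mv_a[1]))     # pixel pointed to in F_c^l(t-1)
    e_b = float(left_t[b]) - float(left_t1[c])         # Eq. (1): illumination change
    e_a = e_b                                          # Eq. (2): e(a) ~ e(b)
    ref = (a[0] + int(mv_a[0]), a[1] + int(mv_a[1]))   # a + mv(a) in F_c^r(t-1)
    return float(right_t1[ref]) + e_a                  # Eq. (4)
```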

3. Proposed DIBR reference view selection

When a frame is lost, DIBR is performed to generate the pixels (1-to-1, many-to-1, and hole pixels) of the lost frame in the virtual view from a reference view frame. A reference view can be any view other than the view at which the frame is lost. In this section, we propose a reference view selection algorithm to optimally select the reference view for the best possible final image quality.

Fig. 3. Whole frame error concealment in [23]. Many-to-1 pixels and hole pixels determined by DIBR are considered.

When the current view is lost, the proposed method needs to choose a reference view from which to perform DIBR for the error concealment. The criterion for choosing the reference view among all other views is how easily the hole pixels generated from that view can be concealed. For example, in Fig. 4, assume view 4 is lost. We choose view 0 (a farther view) as the reference view to generate the virtual view 4, as shown on the left of Fig. 4, and we also choose view 3 (a closer view) as the reference view to generate the virtual view 4, as shown on the right of Fig. 4. We can observe that the virtual view 4 on the left has a larger number of hole pixels than the one on the right. As can be seen in the figure, the larger the holes are, the more difficult it is to recover the whole frame.

Therefore, we use the number of holes in a generated virtual view as the criterion for reference view selection. We define the hole pixel percentage as the number of hole pixels divided by the total number of pixels in a frame. The view that yields the least hole pixel percentage in the virtual view is chosen as the reference view from which the proposed method performs DIBR. For example, if view 4 is missing, we consider views 0, 1, 2, 3, 5, 6 and 7 as candidates for the reference view. We generate the virtual view 4 from each of views 0, 1, 2, 3, 5, 6 and 7 individually, so we have 7 virtual versions of view 4. For each of these 7 virtual views, we compute the hole pixel percentage. If the virtual view 4 generated from view 5 has the least hole pixel percentage, view 5 is selected as the reference view in our method.

A more general expression of the reference view selection is as follows. For the lost view $j$, we define a view set $W_j$ to be all the views excluding view $j$. A hole pixel percentage $u_j^i$ is defined as a function of the number of hole pixels in virtual view $j$ generated from candidate reference view $i \in W_j$:

$$u_j^i = \frac{\text{number of hole pixels in virtual view } j \text{ from candidate reference view } i}{\text{total number of pixels in a frame}} \times 100\%$$

The candidate reference view $i^*$ that has the least $u_j^i$ will be the optimal reference view from which to perform DIBR to generate virtual view $j$. This optimization problem of reference view selection can be formulated as:

$$i^* = \arg\min_{\forall i \in W_j} u_j^i$$
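A minimal sketch of this selection rule is given below, assuming a caller-supplied dibr() renderer that warps a candidate reference view to the lost view and marks unfilled (hole) locations with NaN; the function names and the hole marker are illustrative assumptions, not part of the paper.

```python
import numpy as np

def hole_percentage(virtual_view):
    """u_j^i: percentage of hole (unmapped) pixels in a DIBRed virtual view,
    assuming holes are marked with NaN."""
    return 100.0 * np.isnan(virtual_view).mean()

def select_reference_view(lost_view, views, dibr):
    """i* = argmin over candidate views i != j of the hole percentage of the
    virtual view j rendered from view i.  `views` maps a view index to its
    (color, depth) frames; `dibr(color, depth, src, dst)` is an assumed
    renderer returning the virtual view with NaN holes."""
    candidates = [i for i in views if i != lost_view]
    return min(candidates,
               key=lambda i: hole_percentage(dibr(*views[i], i, lost_view)))
```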

4. Novel pixel estimation and illumination compensation for 1-to-1 pixels

In this section, we evaluate the assumptions used in the 1-to-1 pixel estimation process in [23], and improve these assumptions to propose a novel algorithm.

For 1-to-1 pixels, the error concealment method in [23] operates under two assumptions:

(1) For the same particular spatial location, the motion vector in one view is the same as the motion vector in another view.

(2) The illumination difference between frames in one view is the same as that in another view.

However, these assumptions are not realistic; due to the difference in shooting angle between the two camera views, the motion vectors in different views should not be the same, and the illumination difference between frames should not be the same for different views. In this work, we design an error concealment algorithm with better assumptions. The improved method can be explained by Fig. 5. The frame $F_c^r(t)$ is lost during video transmission and is to be error-concealed by the DIBR technique from the information in $F_c^l$. We concentrate on the 1-to-1 case. To recover the 1-to-1 pixel $a$ in $F_c^r$, the corresponding coordinate $b$ in $F_c^l$ is determined by the DIBR method. Again, the motion vector of $b$ is used to point to the reference point $c$ in $F_c^l(t-1)$. What is different from the method in [23] is that we do not assume that the motion vector $mv(a)$ of $a$ in $F_c^r$ is the same as the motion vector $mv(b)$ of $b$ in $F_c^l$; this assumption is too strong and unrealistic, because the motion vectors of corresponding positions in different views should not be the same due to the difference in shooting angle between the views. Instead, we use the pixel value at position $b$, derived by the DIBR method, as the initial pixel value of $a$. This is a geometric answer given by the coordinate projection, and it is better than assuming shared motion vectors across different views. This is stage one of the proposed method for 1-to-1 pixels.

Fig. 4. Comparison of the size of the holes in the virtual view 4 when the reference view is (a) farther (view 0) and (b) closer (view 3).

Fig. 5. Proposed whole frame error concealment method with improved pixel estimation and illumination compensation. 1-to-1 mapped pixels determined by DIBR are considered.

Now we consider the illumination problem as stage two. In [23], the illumination compensation is considered across frames, and it is assumed that this compensation is valid across different views. However, the illumination differences across frames for different views should be different due to the different shooting angles. In the proposed method, we make a different assumption: we assume that the illumination differences across the two views are the same for consecutive frames. This is a more realistic assumption because the illumination condition (differences) across different views should be almost the same for consecutive frames (only 1/30 s apart). Therefore, we find the point $h$ in $F_c^r(t-1)$ by coordinate projection, using a reverse DIBR technique from the point $c$ in $F_c^l(t-1)$. The illumination compensation at time $t-1$ between the left view and the right view is computed as

$$D(h) = \mathrm{Color}(h) - \mathrm{Color}(c) \qquad (5)$$

As discussed previously, the illumination compensation $D(a)$ for the corresponding points $a$ and $b$ at time $t$ can be approximated by

$$D(a) \cong D(h) \qquad (6)$$

Therefore, this illumination compensation $D(a)$ in frame $t$ is used to modify the initial estimate of $a$ from stage one:

$$\mathrm{Color}(a) = \mathrm{Color}(b) + D(a) \cong \mathrm{Color}(b) + D(h) \qquad (7)$$
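The two-stage rule of this section can be sketched as below. It is an illustrative sketch under assumed inputs: b is the DIBR-mapped coordinate of the lost pixel, c is the pixel pointed to by mv(b) in the previous left frame, and reverse_dibr() is an assumed routine implementing the reverse projection that yields h.

```python
def proposed_1to1(b, c, left_t, left_t1, right_t1, reverse_dibr):
    """Two-stage estimate of a lost 1-to-1 pixel in F_c^r(t).
    Stage one: take the DIBR-mapped pixel b in F_c^l(t) as the initial value.
    Stage two: add the inter-view illumination difference D(h) measured at
    time t-1, Eqs. (5)-(7).  c is the pixel pointed to by mv(b) in
    F_c^l(t-1); reverse_dibr(c) maps c to h in F_c^r(t-1)."""
    initial = float(left_t[b])                      # stage one: geometric estimate
    h = reverse_dibr(c)                             # corresponding point in F_c^r(t-1)
    d_h = float(right_t1[h]) - float(left_t1[c])    # Eq. (5): D(h)
    return initial + d_h                            # Eqs. (6)-(7): Color(b) + D(h)
```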

5. Pixel processing for the many-to-1 pixels in the DIBR procedure

In Section 4, we discussed how we improve the 1-to-1 pixel quality of [23] with a proper assumption for illumination compensation. In this section, we discuss the procedure to process the many-to-1 pixels during DIBR.

In [23], the many-to-1 pixels are filled by the motion vector extrapolation method from the previous frame, as discussed in Section 2. However, the motion vector extrapolation method in [23] does not perform well for this case. In fact, a better method would be to use the pixel information from another view. When we synthesize the virtual view using DIBR from a reference view, there will be multiple pixel locations in the reference view being mapped to the same location in the virtual view. This is the many-to-1 mapping situation. It is caused by the shift of viewpoint, which makes those pixels fall on the same point in the new view. This is easier to understand with the following example. For a clearer demonstration, views that are far apart are used (view 0 and view 7). In Fig. 6, in the left view (as reference view), the woman and the man are not at the same pixel location. However, in the right view (as virtual view), part of the woman (left hand) overlaps with part of the man (elbow), indicating that both parts fall at the same pixel locations. It turns out that the part that is exposed to us (occupying the location) is the part that is closer to us, which is the man's elbow in this example. In other words, among the parts that map to the same pixel location in the virtual view, the part that has the larger depth value (closer to the viewer) should occupy the location. To be specific, as shown in Fig. 8, many-to-1 mapping means that, during the DIBR, more than one pixel location ($a$, $b$, and $c$ in this example) in the left frame $F_c^l$ maps to the same target location ($d$ in this example) in the right frame $F_c^r$, and the question becomes which of the many candidates to choose as the best final pixel. Based on the geometric relation, the corresponding depth values of locations $a$, $b$, and $c$ in the left depth frame $F_d^l$ are first checked, and then the best choice is the location that has the largest depth value. We define this method as LargestDepth (a similar technique is used in [24]).
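A minimal sketch of the LargestDepth rule follows, assuming the warping step collects, for each virtual-view location, the (depth, color) pairs of all reference-view pixels that map there; this data layout is an assumption for illustration.

```python
def largest_depth(candidates):
    """Resolve a many-to-1 mapping: `candidates` is a list of
    (depth_value, color_value) pairs from the reference view that all map
    to the same virtual-view location.  The candidate with the largest
    depth value (closest to the viewer in this convention) wins."""
    _, color = max(candidates, key=lambda dc: dc[0])
    return color
```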

Fig. 6. Demonstration of the same frame (35th) in different views: (a) view 7 (left view, as reference view); (b) view 0 (right view, as virtual view).

To show the correctness of the above explanation, we devise a baseline algorithm for performance comparison. Assume there is a many-to-1 case in which three different coordinates $A$, $B$ and $C$ in view 1 are mapped to the same coordinate $D$ in view 2 (a 3-to-1 case). Consider the four neighboring (or closest available) pixels of $D$ to be $E$, $F$, $G$ and $H$. For a candidate $A$ among the many-to-1 pixels, we compute the absolute depth difference between $A$ and $E$, between $A$ and $F$, between $A$ and $G$, and between $A$ and $H$. The sum of these four absolute depth differences is denoted as $Error_A$. By the same procedure, we can compute $Error_B$ and $Error_C$ for the remaining many-to-1 candidates $B$ and $C$. If $Error_A$ is the smallest among the three in this example, then the color pixel of $A$ will occupy the location of $D$ in view 2. We define this method as DepthDifference. We compare the actual performance of LargestDepth and DepthDifference in the following example, which runs DIBR from view 0 to view 1. In Fig. 7, the yellow pixels in subfigure (c) are the many-to-1 pixel locations in the virtual view, and we focus on the performance of the two methods in this area (other areas are 1-to-1 and hole pixels, and they are processed in the same way for the two methods since they are not the focus of this example). Subfigure (d) is the image resulting from the DepthDifference method. We can see that for the many-to-1 pixels, the pixel whose depth is similar to that of the neighboring pixels in the virtual view is selected, and the resulting image in the many-to-1 area becomes similar to the background (which makes up a larger portion of the pixels surrounding the hole pixels). This is a visually bad result. On the other hand, the image resulting from LargestDepth is shown in subfigure (e). By selecting the pixel with the largest depth, the resulting image agrees better with human perception and is therefore more visually pleasant. Therefore, the LargestDepth method is used in the proposed method.
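For comparison, a sketch of the DepthDifference baseline under the same assumed data layout: each candidate is scored by the sum of its absolute depth differences to the four closest available neighbours of the target location, and the smallest score wins.

```python
def depth_difference(candidates, neighbor_depths):
    """Baseline resolution of a many-to-1 mapping: `candidates` holds
    (depth_value, color_value) pairs mapping to the same location D, and
    `neighbor_depths` holds the depths of D's four closest available
    neighbours (E, F, G, H).  The candidate with the smallest summed
    absolute depth difference to the neighbours wins."""
    def score(candidate):
        depth, _ = candidate
        return sum(abs(depth - nd) for nd in neighbor_depths)
    _, color = min(candidates, key=score)
    return color
```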

6. Proposed hole-filling method using distance-weighted motion vectors

After improving the 1-to-1 and many-to-1 pixel estimation during DIBR, the final step is to estimate the hole pixels. Hole pixels are produced when no pixel coordinate in another view maps to the current pixel location, and we need to use the information of another frame/view to estimate the missing pixels. In this section, we aim to use the reference pixels in the previous frame of the same view to compensate for the hole pixels. A benchmarking algorithm is also devised and discussed to show the efficiency of the proposed method.

6.1. Proposed efficient hole-filling method

As shown in Fig. 9, a hole area (gray) is shown in the right frame $F_c^r$, and we want to estimate the motion vectors of the hole pixels in that area. Take the hole pixel $a$ in the hole area, for example. We propose using the motion vectors of the immediately neighboring available locations at the top, bottom, left and right of the hole pixel $a$; they are locations $b$, $c$, $d$, and $e$ in Fig. 9. We denote the motion vectors of these neighboring locations as $mv(b)$, $mv(c)$, $mv(d)$ and $mv(e)$, respectively. We propose finding the missing motion vector of the hole pixel $a$ as the sum of the four motion vectors of the neighboring locations, each weighted by the inverse of the spatial distance between the neighboring location and the hole pixel location:

$$\widehat{mv}(a) = w\left( mv(b)\cdot\frac{1}{ab} + mv(c)\cdot\frac{1}{ac} + mv(d)\cdot\frac{1}{ad} + mv(e)\cdot\frac{1}{ae} \right)$$

where $ax$ denotes the Euclidean distance between location $a$ and location $x$. The weighting is the inverse of the distance because the farther apart two locations are, the less correlated their motion vectors are. $w$ is the normalizing factor:

$$w = \frac{1}{\frac{1}{ab} + \frac{1}{ac} + \frac{1}{ad} + \frac{1}{ae}}$$

After $\widehat{mv}(a)$ is obtained, the missing pixel at location $a$ can be recovered by

$$\mathrm{Color}(a) = F_c^r\!\left(t-1,\; a + \widehat{mv}(a)\right)$$

The procedure is repeated for all the missing pixels in a hole. The experimental section shows that, for the hole pixels produced by DIBR, this technique is better than the motion vector extrapolation from the previous frame proposed in Chung's work [23].
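A compact sketch of this hole-filling rule follows; the neighbour list, the motion-vector field and the previous right-view frame are assumed inputs, and the sub-pixel result is simply rounded for indexing.

```python
import numpy as np

def fill_hole_pixel(a, neighbors, mv_field, right_t1):
    """Recover the hole pixel at location a (row, col).  `neighbors` are the
    closest available locations above, below, left and right of a,
    `mv_field` holds their motion vectors (pointing to time t-1), and
    `right_t1` is the previous frame F_c^r(t-1)."""
    dists = np.array([np.hypot(a[0] - n[0], a[1] - n[1]) for n in neighbors])
    weights = 1.0 / dists                              # inverse-distance weights
    mvs = np.array([mv_field[n] for n in neighbors], dtype=float)
    mv_hat = (weights[:, None] * mvs).sum(axis=0) / weights.sum()
    r = int(round(a[0] + mv_hat[0]))                   # a + mv_hat(a), rounded
    c = int(round(a[1] + mv_hat[1]))
    return right_t1[r, c]                              # Color(a) = F_c^r(t-1, a + mv_hat(a))
```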

6.2. A benchmarking Adaptive Weighting Estimation Algorithm (AWEA)

In the proposed method, the missing motion vector $\widehat{mv}(a)$ of hole pixel $a$ is estimated by the following equation:

$$\widehat{mv}(a) = W\left( q_b\, mv(b) + q_c\, mv(c) + q_d\, mv(d) + q_e\, mv(e) \right)$$

The notation $q_x = \frac{1}{ax}$ stands for the weighting coefficient on the motion vector of pixel $x$, and $ax$ denotes the Euclidean distance between location $a$ and location $x$; $W$ is the normalizing factor. In this case, the weighting $q_x$ on the neighboring motion vector at location $x$ is simply the inverse of the distance between pixel $a$ and the neighboring pixel $x$. In this subsection, we aim to find a more systematic way to derive the weightings on the neighboring motion vectors. We design an Adaptive Weighting Estimation Algorithm (AWEA).

We aim to know, when two locations $a$ and $x$ are $ax$ pixels apart, what the likelihood is that $mv(a)$ and $mv(x)$ are identical. When frame $t$ is lost, the motion vector information in frame $t$ is lost, so we perform the data analysis of motion vectors on frame $t-1$. In frame $t-1$, we choose a fixed center point $A$ and consider all the pixels to its left. For a particular pixel $X$ to its left, we compute the absolute difference between the motion vectors $mv(A)$ and $mv(X)$, denoted MVdiff, and record the Euclidean distance between the two pixel locations, denoted Dist. We now have one pair of information (MVdiff, Dist). We perform this procedure for all the pixels to the left of $A$. Assuming there are $K$ pixels to its left, there will be $K$ pairs of information (MVdiff, Dist). Similarly, the process is repeated for the pixels to the right of, above, and below the center point $A$. Assuming there are $J$ pixels to the right, $L$ pixels above, and $R$ pixels below, along with the $K$ pixels to the left, there will be $N_A = (J + L + R + K)$ pairs of information (MVdiff, Dist) for this center point $A$. We then move the center point to another location, for example point $B$, and find $N_B$ pairs of information (MVdiff, Dist). The process can be repeated for all the pixels in frame $t-1$ as the center point, giving a total of $N = (N_A + N_B + N_C + \ldots)$ pairs of information (MVdiff, Dist). Among the $N$ pairs, we focus only on the pairs with MVdiff = 0, i.e., the case in which the two compared motion vectors are identical (difference is 0). Assume there are $Q$ such pairs with MVdiff = 0, among which the number of instances where $a$ and $x$ are $ax$ apart is $n_{ax}$. The probability that the motion vectors of $a$ and $x$ are the same when they are $ax$ apart can then be computed as

$$p_x = \frac{n_{ax}}{Q} \times 100\%$$

This probability is used to replace the weighting coefficient on $mv(x)$ for $\widehat{mv}(a)$; the higher the probability, the more likely that the two motion vectors are the same, and the more weight should be put on $mv(x)$ for predicting $mv(a)$. Therefore, for AWEA, the weighting coefficient is defined as $l_x = p_x$. Hence, the motion estimation equation can be rewritten as follows for AWEA:

$$\widehat{mv}(a) = W\left( l_b\, mv(b) + l_c\, mv(c) + l_d\, mv(d) + l_e\, mv(e) \right)$$

$$W = \frac{1}{l_b + l_c + l_d + l_e}$$

Fig. 7. The generation of virtual view 1 from view 0 using the DepthDifference and LargestDepth methods: (a) true view 1 (left); (b) true view 0 (right); (c) virtual view 1 from view 0, with yellow pixels indicating the many-to-1 pixels; (d) virtual view 1 from view 0 by the DepthDifference method; (e) virtual view 1 from view 0 by the LargestDepth method.

Fig. 8. The processing diagram for the many-to-1 pixels in the DIBR procedure.

Fig. 9. Proposed inter-frame method for filling the hole produced during the DIBR procedure.

Even though AWEA is a more systematic way to derive the weightings, its complexity is much higher than that of the proposed method in Section 6.1. Moreover, the experimental section shows that AWEA is better than the proposed method only in some cases (close to 50% of the cases) and only by small margins.
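For reference, the AWEA statistics could be gathered roughly as sketched below, assuming a dense per-pixel motion field for frame t−1; restricting the pair collection to a maximum axial distance is an illustrative simplification of the full procedure described above.

```python
import numpy as np
from collections import Counter

def awea_weights(mv_field, max_dist=8):
    """Empirical probability p(d) that two motion vectors in the previous
    frame are identical given that their locations are d pixels apart along
    an image axis (the AWEA data analysis, simplified to a maximum axial
    distance).  `mv_field` has shape (H, W, 2)."""
    same = Counter()                                   # n_ax per distance d
    for d in range(1, max_dist + 1):
        for shifted, ref in (
            (mv_field[:, d:], mv_field[:, :-d]),       # horizontally separated pairs
            (mv_field[d:, :], mv_field[:-d, :]),       # vertically separated pairs
        ):
            same[d] += int(np.all(shifted == ref, axis=-1).sum())
    q = sum(same.values())                             # Q: all pairs with MVdiff = 0
    return {d: n / q for d, n in same.items()} if q else {}
```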

7. Experimental results

In this section, we describe the experimental settings in detail. We compare the error concealment performance of the combined system of the four proposed methods (optimal reference view selection in Section 3, 1-to-1 pixels in Section 4, many-to-1 pixels in Section 5 and hole pixels in Section 6), denoted by Proposed, against the state-of-the-art method proposed by Chung et al. [23], denoted by Chung. We also compare with our prior work [25], denoted by Prior (with only the algorithm for 1-to-1 pixels in Section 4). To benchmark the efficiency of the proposed method, AWEA is also compared.

The video encoder is H.264 (JM 18.0). We use IPPPP as the encoding GOP structure, with a GOP size of 15 frames. We use QP = 16 (quantization parameter), and variable bit rate is enabled. The videos we consider are ballet, breakdance, balloons and kendo with different views. For ballet and breakdance, views 0 to 7 are complete with both color frames and depth frames. However, for balloons and kendo, only a limited number of views is complete, containing both color frames and depth frames, so only those views are tested. (The sequences are downloaded from Nagoya University [26].) For packet losses, in this paper we consider whole frame losses. For each view video, we drop frames with a frame loss rate of 1/15 (random drops in our experiment), and the losses are concealed by the 4 compared methods. We evaluate the performance of the 4 methods by the average PSNR (between reconstructed frames and the original uncompressed frames) over the frames in the video.
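For reference, the PSNR used for evaluation is the standard definition sketched below (with the per-video score being the average over the concealed frames); this is generic measurement code, not taken from the paper.

```python
import numpy as np

def psnr(original, reconstructed, peak=255.0):
    """PSNR in dB between an original and a reconstructed 8-bit frame."""
    diff = original.astype(np.float64) - reconstructed.astype(np.float64)
    mse = np.mean(diff ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

# Per-video score: average over the concealed frames, e.g.
# avg_psnr = np.mean([psnr(o, r) for o, r in zip(originals, reconstructions)])
```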

Tables 1–4 demonstrate the average PSNR comparison results for the ballet, breakdance, balloons and kendo sequences, respectively. The comparisons are made among Chung's method, the Prior method, and the Proposed method. In each table, Gain1 is defined as the performance gain of Prior over Chung's, Gain2 is defined as the performance gain of the Proposed method over Chung's, and Gain3 is defined as the performance gain of the AWEA method over Proposed. The first column is the view at which the frame is lost.

We first focus on the comparisons between Chung's method and the Prior method, as indicated by Gain1. For breakdance in Table 2, we can see that Prior is always better than Chung's for the different views; Gain1 is always positive. Out of 7 views, 6 have a Gain1 larger than 1 dB, and the largest gain is up to 1.60 dB. This means that the assumption of the proposed 1-to-1 method in Prior is more valid than the one in Chung's method for this video. However, for the ballet video in Table 1, Prior is not always better than Chung's method; out of 7 views, Prior wins 4 cases and loses 3. The reason that Chung's method sometimes performs better on this ballet video may be the following: ballet is a relatively slow video with a large portion of static background, and many motion vectors are essentially zero, even in different views, so the assumption in Chung's work that the motion vectors in different views are the same is valid for many frames of this video. Therefore, Chung's work sometimes performs slightly better than the Prior work on this video. One thing to note is that the losing cases are of very small magnitude (−0.32 dB at worst). A similar situation, in which the Prior method is not always better than Chung's method, can be observed for the balloons and kendo sequences in Tables 3 and 4, respectively; half of the Gain1 values are positive and half are negative. To conclude, most of the time Prior is better than Chung's work (Gain1 is positive in 14 out of 20 cases over the 4 sequences in the 4 tables), the losing cases are of very small magnitude (−0.32 dB at worst), and the winning cases can reach as high as a 1.60 dB improvement.

Even though the proposed 1-to-1 algorithm in Prior loses slightly in some cases, after being combined with the subsequently proposed many-to-1 and hole-filling algorithms in the Proposed method, we perform better than Chung's work in all cases. Let us focus on the comparisons between the Proposed method and the state-of-the-art Chung's method, demonstrated by Gain2 in the 4 tables. For the ballet sequence in Table 1, we can see that the Gain2 values are all positive, demonstrating good performance in all cases.


Table 1. Average performance comparisons (PSNR in dB) for ballet sequence.

View at which the frame is lost | Chung's | Prior | Gain1 (Prior − Chung) | Proposed | Gain2 (Proposed − Chung) | AWEA | Gain3 (AWEA − Proposed)
View 1 | 29.49 | 29.38 | −0.11 | 30.21 | 0.72 | 30.14 | −0.07
View 2 | 30.59 | 30.37 | −0.22 | 30.79 | 0.20 | 30.66 | −0.13
View 3 | 30.65 | 30.70 | 0.05 | 31.21 | 0.56 | 31.09 | −0.12
View 4 | 29.13 | 29.22 | 0.09 | 29.97 | 0.84 | 29.82 | −0.15
View 5 | 30.22 | 29.90 | −0.32 | 30.95 | 0.73 | 30.48 | −0.47
View 6 | 30.56 | 30.58 | 0.02 | 31.38 | 0.82 | 31.18 | −0.20
View 7 | 29.90 | 29.95 | 0.05 | 31.46 | 1.56 | 31.49 | 0.03

Table 2. Average performance comparisons (PSNR in dB) for breakdance sequence.

View at which the frame is lost | Chung's | Prior | Gain1 (Prior − Chung) | Proposed | Gain2 (Proposed − Chung) | AWEA | Gain3 (AWEA − Proposed)
View 1 | 26.20 | 27.60 | 1.40 | 29.86 | 3.66 | 29.98 | 0.12
View 2 | 26.49 | 27.61 | 1.12 | 29.81 | 3.32 | 29.93 | 0.12
View 3 | 27.07 | 27.97 | 0.90 | 30.35 | 3.28 | 30.41 | 0.06
View 4 | 26.75 | 28.01 | 1.26 | 31.39 | 4.64 | 31.43 | 0.04
View 5 | 27.30 | 28.90 | 1.60 | 31.46 | 4.16 | 31.54 | 0.08
View 6 | 26.54 | 27.76 | 1.22 | 30.26 | 3.72 | 30.39 | 0.13
View 7 | 26.56 | 27.97 | 1.41 | 30.12 | 3.56 | 30.26 | 0.14

Table 3. Average performance comparisons (PSNR in dB) for balloons sequence.

View at which the frame is lost | Chung's | Prior | Gain1 (Prior − Chung) | Proposed | Gain2 (Proposed − Chung) | AWEA | Gain3 (AWEA − Proposed)
View 1 | 29.55 | 29.52 | −0.03 | 35.08 | 5.53 | 34.99 | −0.09
View 3 | 30.82 | 30.77 | −0.05 | 35.24 | 4.42 | 35.13 | −0.11
View 5 | 30.61 | 30.66 | 0.05 | 35.80 | 5.19 | 35.72 | −0.08

Table 4. Average performance comparisons (PSNR in dB) for kendo sequence.

View at which the frame is lost | Chung's | Prior | Gain1 (Prior − Chung) | Proposed | Gain2 (Proposed − Chung) | AWEA | Gain3 (AWEA − Proposed)
View 1 | 30.89 | 31.68 | 0.79 | 33.20 | 2.31 | 33.11 | −0.09
View 3 | 30.79 | 31.59 | 0.80 | 34.10 | 3.31 | 34.19 | 0.09
View 5 | 32.66 | 32.63 | −0.03 | 34.11 | 1.45 | 34.20 | 0.09

The magnitudes of Gain2 are not large (0.20–1.56 dB). This is because the proposed 1-to-1 method does not gain much, as shown by Gain1, so the improvement from the subsequently proposed many-to-1 and hole-filling algorithms can contribute only a limited amount to the overall result of the Proposed system. For breakdance in Table 2, the Gain2 values are also always positive, and the magnitudes are more significant; they range from 3.28 dB to 4.64 dB. We obtain these large Gain2 values for the breakdance sequence because, in addition to the improvement from the proposed many-to-1 and hole-filling methods, the 1-to-1 method also performs better than Chung's 1-to-1 method in all cases, as shown by Gain1 being all positive. For the balloons sequence in Table 3, again, the Proposed method always outperforms Chung's method, by up to 5.53 dB. Similarly for kendo in Table 4, the Gain2 values are always positive, by as much as 3.31 dB. For these two sequences, even though Prior (the proposed 1-to-1 method) is not always better than Chung's method (Gain1 in Tables 3 and 4), the Proposed system (including the proposed 1-to-1, many-to-1 and hole-filling methods) always performs better than Chung's method. Therefore, the Proposed method robustly outperforms the state-of-the-art Chung's method [23] for all the views in all the videos.

To benchmark the efficiency of the Proposed method, we compare the performance of the Proposed method and AWEA. In each table, Gain3 shows how much better AWEA is than Proposed. For the ballet sequence, Gain3 is mostly negative. For breakdance, Gain3 is all positive. For balloons, Gain3 is all negative, and for kendo, Gain3 is negative in one out of three cases. Therefore, the complex statistical AWEA method is not always better. The reason may be that the true underlying weighting function in the lost frame t cannot be perfectly characterized by the likelihood function found in the previous frame t − 1 by our approach in AWEA, and sometimes the simple inverse-distance weighting is closer to the characteristic of the true underlying weighting function in the lost frame t. Also, from the tables, the magnitudes of Gain3 (positive or negative) are very small, ranging from −0.47 dB to 0.14 dB. Thus the performances of the Proposed and AWEA methods are close. To summarize, despite the large computational complexity of AWEA, it is not always better than the Proposed method, and the performances of the two are mostly very close. Therefore, even though the proposed method is very simple (the weighting is simply the inverse of the distance), it has performance similar to the more complicated statistical-analysis-based AWEA method.

To examine the numerical comparisons among Chung, Prior, Proposed and AWEA from another point of view, we compute the CoRR (correlation) between the original frame pixels and the frame pixels reconstructed by the individual error concealment methods. We report all the comparisons using the CoRR measurement in Table 5 for ballet, Table 6 for breakdance, Table 7 for balloons, and Table 8 for kendo. We can observe that the trends in CoRR are very similar to those in PSNR in Tables 1–4.


Table 5. Average performance comparisons (Corr, correlation) for ballet sequence.

View at which the frame is lost | Chung's | Prior | Gain1 (Prior − Chung) | Proposed | Gain2 (Proposed − Chung) | AWEA | Gain3 (AWEA − Proposed)
View 1 | 0.9572 | 0.9585 | 0.0013 | 0.9673 | 0.0101 | 0.9667 | −0.0006
View 2 | 0.9620 | 0.9600 | −0.0020 | 0.9655 | 0.0035 | 0.9644 | −0.0011
View 3 | 0.9588 | 0.9605 | 0.0017 | 0.9672 | 0.0084 | 0.9664 | −0.0008
View 4 | 0.9453 | 0.9444 | −0.0009 | 0.9531 | 0.0078 | 0.9516 | −0.0015
View 5 | 0.9560 | 0.9533 | −0.0027 | 0.9647 | 0.0087 | 0.9613 | −0.0034
View 6 | 0.9632 | 0.9623 | −0.0009 | 0.9715 | 0.0083 | 0.9706 | −0.0009
View 7 | 0.9589 | 0.9598 | 0.0009 | 0.9724 | 0.0135 | 0.9728 | 0.0004

Table 6. Average performance comparisons (Corr, correlation) for breakdance sequence.

View at which the frame is lost | Chung's | Prior | Gain1 (Prior − Chung) | Proposed | Gain2 (Proposed − Chung) | AWEA | Gain3 (AWEA − Proposed)
View 1 | 0.9159 | 0.9386 | 0.0227 | 0.9632 | 0.0473 | 0.9642 | 0.0010
View 2 | 0.9154 | 0.9328 | 0.0174 | 0.9608 | 0.0454 | 0.9620 | 0.0012
View 3 | 0.9219 | 0.9346 | 0.0127 | 0.9637 | 0.0418 | 0.9641 | 0.0004
View 4 | 0.9196 | 0.9366 | 0.0170 | 0.9714 | 0.0518 | 0.9717 | 0.0003
View 5 | 0.9283 | 0.9465 | 0.0182 | 0.9713 | 0.0430 | 0.9718 | 0.0005
View 6 | 0.9131 | 0.9284 | 0.0153 | 0.9614 | 0.0483 | 0.9625 | 0.0011
View 7 | 0.9042 | 0.9283 | 0.0241 | 0.9576 | 0.0534 | 0.9585 | 0.0009

Table 7. Average performance comparisons (Corr, correlation) for balloons sequence.

View at which the frame is lost | Chung's | Prior | Gain1 (Prior − Chung) | Proposed | Gain2 (Proposed − Chung) | AWEA | Gain3 (AWEA − Proposed)
View 1 | 0.9793 | 0.9791 | −0.0002 | 0.9943 | 0.0150 | 0.9942 | −0.0001
View 3 | 0.9848 | 0.9845 | −0.0003 | 0.9944 | 0.0096 | 0.9943 | −0.0001
View 5 | 0.9838 | 0.9839 | 0.0001 | 0.9949 | 0.0111 | 0.9948 | −0.0001

Table 8. Average performance comparisons (Corr, correlation) for kendo sequence.

View at which the frame is lost | Chung's | Prior | Gain1 (Prior − Chung) | Proposed | Gain2 (Proposed − Chung) | AWEA | Gain3 (AWEA − Proposed)
View 1 | 0.9941 | 0.9950 | 0.0009 | 0.9964 | 0.0023 | 0.9965 | 0.0001
View 3 | 0.9939 | 0.9949 | 0.0010 | 0.9970 | 0.0031 | 0.9971 | 0.0001
View 5 | 0.9955 | 0.9956 | 0.0001 | 0.9971 | 0.0016 | 0.9972 | 0.0001

The Prior method is better than Chung's method most of the time (14 cases out of 20), with the largest winning margin being 0.0241 and the largest losing margin being only −0.0027, as can be seen in Gain1. The Proposed system (including the proposed 1-to-1, many-to-1 and hole-filling methods) is always better than Chung's method, as indicated by Gain2; the values are all positive and can be as high as 0.0534, showing the outstanding performance of the Proposed method. Finally, for the comparison between the Proposed and AWEA methods in Gain3, out of 20 cases the complicated AWEA performs better in only 11 cases, with the largest winning margin being only 0.0012. This again means that the Proposed method is very efficient; its complexity is much lower than that of AWEA, and its performance is almost the same.

For visual comparisons, Figs. 10–17 show the error-concealed frames produced by the different methods; Figs. 10 and 11 are for ballet, Figs. 12 and 13 are for breakdance, Figs. 14 and 15 are for balloons, and Figs. 16 and 17 are for kendo. In Fig. 10, frame 64 of ballet is shown. The error-concealed image produced by the Proposed method maintains better object integrity, including the turning angle of the head and the leg part, as circled, while the images produced by the Prior and Chung's methods show the wrong angle of the head turning. This indicates that the Proposed method recovers the pixels better. In Fig. 11, for frame 22, as circled in white, the Proposed method produces not only better quality at the woman's arm location but also more correct background pixels, compared to Chung's method. Similar situations can be observed for the breakdance sequence. In Fig. 12, for frame 24, better pixels and object integrity are maintained in the error-concealed image by the Proposed method in Fig. 12(d), especially in the lower white-circled area. For the upper white-circled area (shoe), the work of Chung clearly chooses the wrong pixels through the motion vector from another view. For the same area, the Proposed method in Fig. 12(d) chooses more correct pixels, providing better integrity of the estimated shoe pixels. In Fig. 13, for frame 51, it can be observed that for the towel (circled), the work of Chung does not perform well; in the towel circled in Fig. 13(b), the object integrity of the towel is not maintained, so the image quality in that area is not satisfactory. However, the Proposed method conceals the error with a much better visual result; in Fig. 13(d), the Proposed method shows a better ability to keep the object integrity (of the white towel) and produces more correct pixel values (as shown by PSNR). In Fig. 14, for frame 35 of the balloons video, it can be observed that on the white-circled side of the pants, the Proposed method provides a smooth edge of the pants while Chung's method cannot. In Fig. 15, for frame 50, again the Proposed method produces better and smoother edges of the balloon and the pants (circled) than Chung's method. For the kendo sequence, for frame 79, Fig. 16 shows that the Proposed method gives better integrity of the arm pixels in the white-dashed circle than Chung's method. Finally, in Fig. 17, for frame 5, the edge area in the white circle produced by the Proposed method is smoother than that produced by Chung's method. These results show that, compared to the state-of-the-art Chung's method [23], the Proposed method not only improves the objective video quality in PSNR, but also produces better motion vectors and visually more correct pixels for the missing image.

Fig. 10. Reconstructed frame 64 of ballet video: (a) original frame; (b) error-concealed by Chung (PSNR = 28.57 dB); (c) error-concealed by Prior (PSNR = 28.65 dB); (d) error-concealed by Proposed (PSNR = 31.28 dB). Prior is better than Chung by 0.08 dB, and the Proposed is better than Chung by 2.71 dB.

Fig. 11. Reconstructed frame 22 of ballet video: (a) original frame; (b) error-concealed by Chung (PSNR = 30.20 dB); (c) error-concealed by Prior (PSNR = 30.12 dB); (d) error-concealed by Proposed (PSNR = 30.92 dB). Prior is worse than Chung by 0.08 dB, and the Proposed is better than Chung by 0.72 dB.

Fig. 12. Reconstructed frame 24 of breakdance video: (a) original frame; (b) error-concealed by Chung (PSNR = 24.90 dB); (c) error-concealed by Prior (PSNR = 26.17 dB); (d) error-concealed by Proposed (PSNR = 28.34 dB). Prior is better than Chung by 1.27 dB, and the Proposed is better than Chung by 3.44 dB.

Fig. 13. Reconstructed frame 51 of breakdance video: (a) original frame; (b) error-concealed by Chung (PSNR = 30.32 dB); (c) error-concealed by Prior (PSNR = 31.51 dB); (d) error-concealed by Proposed (PSNR = 33.84 dB). Prior is better than Chung by 1.19 dB, and the Proposed is better than Chung by 3.52 dB.

Fig. 14. Reconstructed frame 35 of balloons video: (a) original frame; (b) error-concealed by Chung (PSNR = 31.76 dB); (c) error-concealed by Prior (PSNR = 31.97 dB); (d) error-concealed by Proposed (PSNR = 36.74 dB). Prior is better than Chung by 0.21 dB, and the Proposed is better than Chung by 4.98 dB.

Fig. 15. Reconstructed frame 50 of balloons video: (a) original frame; (b) error-concealed by Chung (PSNR = 29.17 dB); (c) error-concealed by Prior (PSNR = 29.18 dB); (d) error-concealed by Proposed (PSNR = 35.77 dB). Prior is better than Chung by 0.01 dB, and the Proposed is better than Chung by 6.60 dB.

Fig. 16. Reconstructed frame 79 of kendo video: (a) original frame; (b) error-concealed by Chung (PSNR = 32.18 dB); (c) error-concealed by Prior (PSNR = 33.56 dB); (d) error-concealed by Proposed (PSNR = 35.78 dB). Prior is better than Chung by 1.38 dB, and the Proposed is better than Chung by 3.60 dB.

Fig. 17. Reconstructed frame 5 of kendo video: (a) original frame; (b) error-concealed by Chung (PSNR = 32.60 dB); (c) error-concealed by Prior (PSNR = 32.92 dB); (d) error-concealed by Proposed (PSNR = 33.90 dB). Prior is better than Chung by 0.32 dB, and the Proposed is better than Chung by 1.30 dB.

8. Conclusions

We propose several improved algorithms to perform error concealment for whole frame loss in multiview videos with depth. The technique of DIBR is used and improved. First, an optimal reference view selection is developed. Then, for the 1-to-1 mapped pixels, we use the correctly mapped pixels from another view as the initially estimated pixels; the illumination compensation across views for the current frame is estimated by that of the previous frame, derived from the motion vectors of the mapped pixel and a reverse DIBR. The assumptions behind these ideas are more realistic than the assumptions made by the previous state-of-the-art method. The many-to-1 pixels are generated with a better decision using depth information, producing better pixel quality. The hole pixels are efficiently recovered with estimated motion vectors computed from the neighboring available motion vectors, weighted by the inverse of the spatial distance. The experimental results show that the proposed combined method always outperforms the previous state-of-the-art method for all the test views and videos, and the maximum improvement can be as high as 5.53 dB.

9. Future work

When the whole frame and its associated motion vectors are missing, the proposed error concealment method is performed. After the use of DIBR and the proposed 1-to-1 and many-to-1 pixel estimation, hole pixels remain to be error-concealed. We proposed a hole-filling algorithm in this paper, and it is desirable to fill the holes by also applying existing single-view, partial-frame-loss error concealment methods. In existing methods, partial frame loss drops pixel information in a checkerboard pattern, and the recovery of a lost block can use the help of spatially available neighbors. The analogy to our case is that the recovered 1-to-1 and many-to-1 pixels are the spatially available neighbors, and the hole pixels are the lost blocks. By this analogy, existing partial-frame-loss error concealment algorithms can be applied and modified to improve the hole-filling performance in our experiments.

Another direction for future work is to consider a tougher assumption than the one under which the proposed algorithm is designed; currently, the depth frames must be available at the decoder for the proposed DIBR-based error concealment method to work. Under the tougher assumption that the depth frames are lost during transmission, an additional algorithm to first error-conceal the missing depth frame is required. We therefore aim to design a depth frame error concealment method so that the proposed system can deal with this more realistic packet loss situation.

Acknowledgments

This research is supported by the National Science Council, Taiwan under Grants NSC 101-2221-E-033-036 and NSC 102-2221-E-033-018, and by the Ministry of Science and Technology, Taiwan under Grant MOST 103-2221-E-033-020.

References

[1] M. Kim, H. Lee, S. Sull, Spatial error concealment for H.264 using sequential directional interpolation, IEEE Trans. Consum. Electron. 54 (2008) 1811–1818.
[2] T.S. Valente, C. Dufour, F. Groliere, D. Snook, An efficient error concealment implementation for MPEG4 video streams, IEEE Trans. Consum. Electron. 47 (2001) 568–578.
[3] Y.-K. Wang, M.M. Hannuksela, V. Varsa, A. Hourunranta, M. Gabbouj, The error concealment feature in the H.26L test model, IEEE Int. Conf. Image Process. 2 (2002) 729–732.
[4] X. Qian, G. Liu, H. Wang, Recovering connected error region based on adaptive error concealment order determination, IEEE Trans. Multimedia 11 (2009) 683–695.
[5] C. Chen, Y. Liu, Z. Yang, J. Bu, X. Deng, Multi-frame error concealment for H.264/AVC frames with complexity adaptation, IEEE Trans. Consum. Electron. 54 (2008) 1422–1429.
[6] Y. Chen, Y. Hu, O.C. Au, H. Li, C.W. Chen, Video error concealment using spatio-temporal boundary matching and partial differential equation, IEEE Trans. Multimedia 10 (2008) 2–15.
[7] J.-T. Chien, G.-L. Li, M.-J. Chen, Effective error concealment algorithm of whole frame loss for H.264 video coding standard by recursive motion vector refinement, IEEE Trans. Consum. Electron. 56 (2010) 1689–1695.
[8] Y. Zhang, X. Xiang, D. Zhao, S. Ma, W. Gao, Packet video error concealment with auto regressive model, IEEE Trans. Circ. Syst. Video Technol. 22 (2011) 1–14.
[9] D. Persson, T. Eriksson, P. Hedelin, Packet video error concealment with Gaussian mixture models, IEEE Trans. Image Process. 17 (2008) 145–154.
[10] D. Persson, T. Eriksson, Mixture model- and least squares-based packet video error concealment, IEEE Trans. Image Process. 18 (2009) 1048–1054.
[11] K. Seth, V. Kamakoti, S. Srinivasan, Efficient motion vector recovery algorithm for H.264 using B-spline approximation, IEEE Trans. Broadcast. 56 (2010) 467–480.
[12] J. Zheng, L.P. Chau, Efficient motion vector recovery algorithm for H.264 based on a polynomial model, IEEE Trans. Multimedia 7 (2005) 507–513.
[13] X. Chen, Y.Y. Chung, C. Bae, X. He, W.-C. Yeh, An efficient error concealment algorithm for H.264/AVC using regression modeling-based prediction, IEEE Trans. Consum. Electron. 56 (2010) 2694–2701.
[14] Y. Xu, Y. Zhou, Adaptive temporal error concealment scheme for H.264/AVC video decoder, IEEE Trans. Consum. Electron. 54 (2008) 1846–1851.
[15] T. Tröger, A. Kaup, Inter-sequence error concealment techniques for multi-broadcast TV reception, IEEE Trans. Broadcast. 57 (2011) 777–793.
[16] Q. Peng, T. Yang, C. Zhu, Block-based temporal error concealment for video packet using motion vector extrapolation, in: IEEE International Conference on Communications, Circuits and Systems and West Sino Expositions, 2002, pp. 10–14.
[17] B. Yan, H. Gharavi, A hybrid frame concealment algorithm for H.264/AVC, IEEE Trans. Image Process. 19 (2010) 98–107.
[18] J. Zhou, B. Yan, H. Gharavi, Efficient motion vector interpolation for error concealment of H.264/AVC, IEEE Trans. Broadcast. 57 (2011) 75–80.
[19] B. Yan, A novel H.264 based motion vector recovery method for 3D video transmission, IEEE Trans. Consum. Electron. 53 (2007).
[20] Y. Liu, J. Wang, H. Zhang, Depth image-based temporal error concealment for 3-D video transmission, IEEE Trans. Circ. Syst. Video Technol. 20 (2010) 600–604.
[21] S. Yang, Y. Zhao, S. Wang, H. Chen, Error concealment for stereoscopic video using illumination compensation, IEEE Trans. Consum. Electron. 57 (2011) 1907–1914.
[22] B. Yan, J. Zhou, Efficient frame concealment for depth image-based 3-D video transmission, IEEE Trans. Multimedia 14 (2012) 936–941.
[23] T.-Y. Chung, S. Sull, C.-S. Kim, Frame loss concealment for stereoscopic video plus depth sequences, IEEE Trans. Consum. Electron. 57 (2011) 1336–1344.
[24] K. Müller, A. Smolic, K. Dix, P. Merkle, P. Kauff, T. Wiegand, View synthesis for advanced 3D video systems, EURASIP J. Image Video Process. 2008 (438148) (2008).
[25] T.-L. Lin, T.-E. Chang, G.-S. Huang, C.-C. Chou, Multiview video error concealment with improved pixel estimation and illumination compensation, in: IEEE International Symposium on Intelligent Signal Processing and Communication Systems, November 2013.
[26] Nagoya University sequences, <http://www.fujii.nuee.nagoya-u.ac.jp/multiview-data/>.