Efficient Prediction Structure for Multi-view Video Coding

Philipp Merkle, Aljoscha Smolic, Karsten Müller, Thomas Wiegand

CSVT 2007



Outline

Multi-view video coding (MVC) introduction
Requirements and test conditions for MVC
Prediction structures
Experimental results
Conclusion

MVC Introduction

MVC: Multi-view Video Coding

Multi-view video (MVV): video captured by a system that uses multiple camera views of the same scene.

Usage: 3DTV, free viewpoint video (FVV), etc.

Requirements for MVC

Temporal random access
View random access
Scalability
Backward compatibility
Quality consistency
Parallel processing

Temporal and inter-view correlation

[Figure: temporal, inter-view, and temporal/inter-view mixed prediction modes]

Temporal and inter-view correlation analysis

An H.264/AVC encoder was used with the following settings:
Motion compensation block size of 16×16
Search range of ±32 pixels
Lagrange parameter (λ) of 29.5

ΔJ denotes the decrease of the average J in comparison to temporal prediction only.

Simply including temporal and inter-view prediction modes

Temporal and inter-view correlation analysis (cont’d)

Lagrangian cost function:

$$J = D + \lambda R \quad (1)$$

D denotes distortion.
R denotes the number of bits to transmit all components of the motion vector.

For each block $S_i$ in a picture, the algorithm chooses the MV $m_i$ within a search range $M$ that minimizes $J$:

$$m_i = \arg\min_{m \in M} \{ D(S_i, m) + \lambda R(S_i, m) \} \quad (2)$$

The distortion in the subject macroblock B is calculated by:

$$D(S_i, m) = \sum_{(x, y) \in B} \left| s(x, y, t) - s'(x - m_x, y - m_y, t - m_t) \right|^2 \quad (3)$$
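The search described by the Lagrangian cost function above can be sketched in Python as follows. This is a minimal illustration, not the encoder used in the experiments: it searches a single reference picture (so the temporal component of the vector is fixed), and `mv_bits` is a hypothetical stand-in rate model based on signed Exp-Golomb code lengths. The names `mv_bits`, `block_ssd` and `motion_search` are assumptions made for this example.

```python
def mv_bits(v):
    """Bits of a signed Exp-Golomb code for v (assumed stand-in for R)."""
    code_num = 2 * abs(v) - (1 if v > 0 else 0)
    return 2 * ((code_num + 1).bit_length() - 1) + 1


def block_ssd(cur, ref, top, left, dx, dy, block):
    """Distortion D: sum of squared differences between the block in `cur`
    and the block displaced by (dx, dy) in the reference picture `ref`."""
    total = 0
    for y in range(top, top + block):
        for x in range(left, left + block):
            diff = cur[y][x] - ref[y - dy][x - dx]
            total += diff * diff
    return total


def motion_search(cur, ref, top, left, block=16, search=32, lam=29.5):
    """Full search for the vector minimizing J = D + lam * R."""
    height, width = len(ref), len(ref[0])
    best_cost, best_mv = None, None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidates whose reference block leaves the picture.
            if not (0 <= top - dy and top - dy + block <= height):
                continue
            if not (0 <= left - dx and left - dx + block <= width):
                continue
            d = block_ssd(cur, ref, top, left, dx, dy, block)
            r = mv_bits(dx) + mv_bits(dy)
            cost = d + lam * r
            if best_cost is None or cost < best_cost:
                best_cost, best_mv = cost, (dx, dy)
    return best_mv, best_cost
```

Passing a picture of a neighbouring view as `ref` instead of a temporally preceding picture turns the same search into an inter-view search, which is the comparison the correlation analysis relies on.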

Test data and test conditions

1D camera arrangement: Ballroom, Exit, Rena, Race1, Uli (line); Breakdancers (arc)
2D camera arrangement: Flamenco2 (cross), AkkoKayo (array)

Use 5 to 16 camera views.

Target high-quality TV-type video (640×480 or 1024×768) rather than limited-channel communication-type video.

Knowledge – hierarchical B picture, QP cascading

Hierarchical B picture, key picture, non-key picture:

[Figure: hierarchical B pictures between two key pictures]

QP cascading [1]:

$$QP_k = QP_{k-1} + (k = 1 \;?\; 4 : 1)$$

[1] “Analysis of hierarchical B pictures and MCTF”, IEEE International Conference on Multimedia and Expo (ICME 2006), Toronto, Ontario, Canada, July 2006
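A small Python sketch of the QP cascading rule, reading it as QP_k = QP_(k-1) + 4 for k = 1 and + 1 for every deeper level. It assumes that level 0 holds the key pictures and that the GOP length is a power of two (a dyadic hierarchy). The function names and the example QP of 26 are illustrative assumptions, not values from the slides.

```python
def cascaded_qp(qp_key, levels):
    """QP per temporal level: QP_k = QP_(k-1) + (4 if k == 1 else 1)."""
    qps = [qp_key]  # level 0: key pictures
    for k in range(1, levels + 1):
        qps.append(qps[-1] + (4 if k == 1 else 1))
    return qps


def temporal_level(position, gop_length):
    """Temporal level of a picture in a dyadic hierarchical-B GOP.

    Assumes gop_length is a power of two; level 0 is a key picture.
    """
    pos = position % gop_length
    if pos == 0:
        return 0
    level = gop_length.bit_length() - 1  # deepest level, log2(gop_length)
    while pos % 2 == 0:
        pos //= 2
        level -= 1
    return level


# QP of every picture in one GOP of length 8, key pictures coded at QP 26.
qps = cascaded_qp(26, 3)
gop_qps = [qps[temporal_level(p, 8)] for p in range(9)]
```

Pictures deeper in the hierarchy are referenced by fewer other pictures, so coding them with a larger QP costs little in overall quality.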

Knowledge – DPB size

The Decoded Picture Buffer (DPB) size is increased to [2]:

$$2 \cdot \text{GOP\_length} + \text{number\_of\_views}$$

[2] “Efficient Compression of Multi-view Video Exploiting Inter-view Dependencies Based on H.264/AVC”, IEEE International Conference on Multimedia and Expo (ICME 2006), Toronto, Ontario, Canada, July 2006

Memory-efficient reordering of multi-view input for compression
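As a quick worked example of the DPB size rule, assuming the expression reads 2 * GOP_length + number_of_views (the operator is not legible in the transcript, so the sum is an assumption), the buffer size in pictures can be computed as below. The function name and the example values are hypothetical.

```python
def dpb_size(gop_length, number_of_views):
    """DPB size in pictures, assuming 2 * GOP_length + number_of_views."""
    return 2 * gop_length + number_of_views


# Example values only: a GOP length of 12 and 8 camera views.
example = dpb_size(12, 8)
```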

Two tasks

1. To adapt the multi-view prediction schemes to the specific camera arrangements of the test data sets.

2. To adapt the prediction structures to the random access specification.

Prediction structure

Simulcast coding structure

To allow synchronization and random access, all key pictures are coded in intra mode.

Prediction structure (cont’d)

The first view (S0) is called the base view; its key pictures remain I frames.

Prediction structure (cont’d)

Alternative structures of inter-view prediction for key pictures: KS_IPP, KS_PIP, KS_IBP

[Figure: KS_IPP, KS_PIP and KS_IBP structures for a linear camera arrangement and a 2D camera array]
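One way to read the three structure names for a linear camera arrangement is as the sequence of key-picture types across the views. The sketch below follows that reading and is an assumption, not a description taken from the slides: KS_IPP starts with an I picture in the first view, KS_PIP places the I picture in the middle view, and KS_IBP alternates B and P pictures after the first view. The function name `key_picture_types` is hypothetical.

```python
def key_picture_types(structure, num_views):
    """Key-picture type per view for a linear camera arrangement (assumed reading)."""
    if structure == "KS_IPP":
        return ["I"] + ["P"] * (num_views - 1)
    if structure == "KS_PIP":
        types = ["P"] * num_views
        types[num_views // 2] = "I"  # middle view chosen as an assumption
        return types
    if structure == "KS_IBP":
        types = ["I"]
        for view in range(1, num_views):
            # A B picture needs a coded neighbour on both sides, so the
            # last view falls back to P when it has no right neighbour.
            is_b = view % 2 == 1 and view < num_views - 1
            types.append("B" if is_b else "P")
        return types
    raise ValueError("unknown structure: " + structure)
```

For example, `key_picture_types("KS_IBP", 5)` gives the pattern I, B, P, B, P under this reading.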

Prediction structure (cont’d)

Inter-view prediction for key and non-key pictures

[Figure: AS_IPP mode]

Experimental results – objective evaluation


Ballroom test result

Average coding gains compared with anchor coding

Experimental results – subjective evaluation

Different bit-rates were selected for the different data sets.

Ballroom test result

Race1 test result

Experimental results – subjective evaluation

AS_IBP outperforms the anchors significantly.
The gain decreases slightly with higher bit-rates.

Average results over all test sequences

Influence of camera density

Uses the Rena sequence, which consists of 16 linearly arranged cameras with a 5 cm distance between two adjacent cameras.

Repeated for each shifted set of 9 adjacent cameras.

The structures are applied to every time instance of the MVV sequence without temporal prediction.
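The shifted camera sets can be enumerated with a short Python sketch. It assumes cameras are indexed 0 to 15 from one end of the line and that a "shifted set" is a sliding window of 9 adjacent cameras; the function name is hypothetical.

```python
def shifted_camera_sets(num_cameras=16, set_size=9):
    """All windows of `set_size` adjacent cameras on a linear rig."""
    last_start = num_cameras - set_size
    return [list(range(start, start + set_size)) for start in range(last_start + 1)]


camera_sets = shifted_camera_sets()
# With 5 cm between adjacent cameras, each set of 9 spans 8 * 5 = 40 cm.
span_cm = (len(camera_sets[0]) - 1) * 5
```

With 16 cameras this yields 8 sets, so each measurement can be averaged over 8 repetitions of the experiment.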


Results of experiments on camera density

Coding gain increases with decreasing camera distance and decreasing reconstruction quality.


Results of experiments on camera density (cont’d)

Average per-camera rate relative to the one-camera case

A larger QP value leads to a larger coding gain.

Conclusion

The resulting multi-view prediction structures achieve significant coding gains and are highly flexible.

Parallel processing is supported by the presented sequential processing approach.

Problems:
Large disparities between the different views of multi-view video sequences
Illumination and color inconsistencies across views