
Video Compression using Computer Vision
Presented by Yehuda Dar

Advanced Topics in Computer Vision (048921) Winter 2011-2012

Video Compression Basics

Fundamental tradeoff among:
• Bit-rate
• Distortion
• Computational complexity

Video Compression Basics

Utilized redundancies:
• Spatial
• Temporal
• Psycho-visual
• Statistical

H.264 Overview

H.264 Redundancy Utilization

Redundancy      Utilization   Means
Spatial         High          Transform coding; intra coding (spatial prediction)
Temporal        High          Motion estimation & compensation
Psycho-visual   Medium        YCbCr color space; 4:2:0 sampling; DC / AC coefficient quantization
Statistical     High          Entropy coding
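As an illustration of the 4:2:0 sampling entry, here is a minimal sketch of chroma subsampling. It assumes each chroma plane (Cb or Cr) is a 2D list with even dimensions and uses plain 2x2 averaging as the downsampling filter; real encoders may use other filters, and `subsample_420` is a hypothetical helper name.

```python
def subsample_420(chroma_plane):
    """Halve a chroma plane in both directions by averaging 2x2 blocks.

    chroma_plane: 2D list of numbers with even height and width.
    The 2x2 mean is an assumed filter, chosen for simplicity.
    """
    height, width = len(chroma_plane), len(chroma_plane[0])
    if height % 2 or width % 2:
        raise ValueError("plane dimensions must be even")
    return [
        [
            (chroma_plane[r][c] + chroma_plane[r][c + 1]
             + chroma_plane[r + 1][c] + chroma_plane[r + 1][c + 1]) / 4
            for c in range(0, width, 2)
        ]
        for r in range(0, height, 2)
    ]
```

After this step each chroma plane holds a quarter of the samples of the luma plane, which is where the psycho-visual saving comes from.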

Compression using Computer Vision

Motivation:

Better utilization of the psycho-visual redundancy

Application-specific compression methods

Exploring new approaches

A Review of:

A Scheme for Attentional Video Compression
R. Gupta and S. Chaundhury
PAMI 2011

Method Outline

• Salient region detection
• Foveated video coding
• Integration into H.264

Foveated image coding demonstration
Figure from Guo & Zhang, Trans. Image Process., 2010

Saliency Map
Step 1: Creating a 3D Feature Map

Feature type   Calculation method                                   Based on
Global         Color spatial variance                               Liu et al, CVPR 2007
Local          Center-surround multi-scale ratio of dissimilarity   Huang et al, ICPR 2010
Rarity         Pulse-DCT                                            Yu et al, ICDL 2009

Relevance Vector Machine (RVM)

Used here as a binary classifier

Advantages over support vector machine (SVM):
• Provides posterior probabilities
• Better generalization ability
• Faster decisions

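Saliency Map
Step 2: Unify Features using RVM

Training Procedure for MBs:
• Sample: the global, local and rarity feature maps are each averaged over the MB, giving the vector $\left(\mathrm{avg}_{\mathrm{global}},\ \mathrm{avg}_{\mathrm{local}},\ \mathrm{avg}_{\mathrm{rarity}}\right)^{T}$
• Label: the ground-truth pixels of the MB are counted to decide 'salient' / 'non salient'
• The (sample, label) pairs are used to train the RVM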
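The construction of per-MB training pairs can be sketched as follows. This is only an illustration under assumptions: the three feature maps and the ground-truth mask are 2D lists of equal size, MBs are 16x16, and an MB is labeled salient when at least half of its ground-truth pixels are salient. The `min_salient_fraction` threshold and the function names are hypothetical; the slides only state that ground-truth pixels are counted.

```python
MB = 16  # H.264 macroblock size in pixels


def block_mean(plane, top, left, size=MB):
    """Mean value of a size x size block of a 2D list."""
    total = 0.0
    for row in plane[top:top + size]:
        total += sum(row[left:left + size])
    return total / (size * size)


def mb_training_set(global_map, local_map, rarity_map, truth_mask,
                    size=MB, min_salient_fraction=0.5):
    """Build one (sample, label) pair per macroblock.

    sample: (avg_global, avg_local, avg_rarity) over the MB.
    label:  1 ('salient') or 0 ('non salient'), decided by the fraction of
            ground-truth salient pixels (assumed rule, see lead-in).
    truth_mask holds 1 for salient pixels and 0 otherwise.
    """
    height, width = len(truth_mask), len(truth_mask[0])
    samples, labels = [], []
    for top in range(0, height - size + 1, size):
        for left in range(0, width - size + 1, size):
            samples.append((block_mean(global_map, top, left, size),
                            block_mean(local_map, top, left, size),
                            block_mean(rarity_map, top, left, size)))
            salient_fraction = block_mean(truth_mask, top, left, size)
            labels.append(1 if salient_fraction >= min_salient_fraction else 0)
    return samples, labels
```

The resulting lists would then be handed to whatever RVM implementation is used for training; no particular RVM library is assumed here.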

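Saliency Map
Step 2: Unify Features using RVM

Trained RVM Usage:
• New input: the MB feature vector $\left(\mathrm{avg}_{\mathrm{global}},\ \mathrm{avg}_{\mathrm{local}},\ \mathrm{avg}_{\mathrm{rarity}}\right)^{T}$
• Outputs of the RVM:
  - Binary label: 'salient' / 'non salient'
  - Probability: used as the relative saliency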

Saliency Map: Result Comparison

Panels: input; global; local [Huang et al, ICPR 2010]; rarity [Yu et al, ICDL 2009]; proposed; [Harel et al, NIPS 2006]; [Bruce & Tsotsos, NIPS 2006]

Figures from Gupta & Chaundhury, PAMI 2011

Saliency Map: ROC Curve

Figure from Gupta & Chaundhury, PAMI 2011

Curves: proposed; [Harel et al, NIPS 2006]

Integration Into H.264: Calculation of Saliency Values

Recalculating saliency map only when it significantly changes

Mutual-information between successive frames indicates changes in saliency:
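A minimal sketch of how mutual information between two successive frames might be estimated and used as a change detector. It assumes frames are flat sequences of 8-bit luma values, uses a coarse joint histogram of co-located pixels, and treats low mutual information as a sign that the saliency map should be recomputed. The bin count, the `threshold_bits` value and both function names are illustrative assumptions, not taken from the paper.

```python
import math
from collections import Counter


def mutual_information(frame_a, frame_b, levels=256, bins=16):
    """Mutual information (in bits) between co-located pixels of two frames.

    frame_a, frame_b: equal-length flat sequences of ints in [0, levels).
    Pixel values are quantized into `bins` histogram bins first.
    """
    if len(frame_a) != len(frame_b) or not frame_a:
        raise ValueError("frames must be non-empty and of equal length")
    step = levels // bins
    n = len(frame_a)
    joint = Counter((a // step, b // step) for a, b in zip(frame_a, frame_b))
    hist_a = Counter(a // step for a in frame_a)
    hist_b = Counter(b // step for b in frame_b)
    mi = 0.0
    for (i, j), count in joint.items():
        p_ij = count / n
        # p_ij / (p_i * p_j), written with raw counts
        mi += p_ij * math.log2(p_ij * n * n / (hist_a[i] * hist_b[j]))
    return mi


def saliency_needs_update(prev_frame, cur_frame, threshold_bits=1.0):
    """Assumed rule: recompute saliency when frames share little information."""
    return mutual_information(prev_frame, cur_frame) < threshold_bits
```

Identical frames give the maximum value (the entropy of the quantized frame), while unrelated frames give a value near zero, so a drop in this measure suggests a scene change.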


Integration Into H.264: Propagation of Saliency Values

For inter-coded MBs, the saliency value is a weighted average of the saliency values of the reference-frame MBs pointed to by the motion vector
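The propagation rule can be sketched as below. The sketch assumes whole-pixel motion vectors, 16x16 MBs, and weights proportional to the area of overlap between the motion-compensated block and each reference MB it covers; the slides do not specify the weights, so this choice and the name `propagate_saliency` are assumptions.

```python
MB = 16  # macroblock size in pixels


def propagate_saliency(ref_saliency, mb_row, mb_col, mv_x, mv_y, size=MB):
    """Saliency of an inter-coded MB from the reference frame's MB saliencies.

    ref_saliency: 2D list with one saliency value per reference-frame MB.
    (mv_x, mv_y): motion vector in whole pixels.
    The referenced block overlaps up to four MBs; their values are averaged
    with overlap-area weights (assumed weighting).
    """
    rows, cols = len(ref_saliency), len(ref_saliency[0])
    # Top-left pixel of the referenced block, clamped to stay inside the frame.
    x = min(max(mb_col * size + mv_x, 0), (cols - 1) * size)
    y = min(max(mb_row * size + mv_y, 0), (rows - 1) * size)
    col0, off_x = divmod(x, size)
    row0, off_y = divmod(y, size)
    total = 0.0
    for d_row, h in ((0, size - off_y), (1, off_y)):
        for d_col, w in ((0, size - off_x), (1, off_x)):
            if h and w:  # skip neighbours with zero overlap
                total += h * w * ref_saliency[row0 + d_row][col0 + d_col]
    return total / (size * size)
```

For example, a block displaced by half an MB horizontally takes half of its saliency from each of the two reference MBs it straddles.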


Integration Into H.264: Salient-Adaptive Quantization

• Non-uniform bit-allocation
• Smaller saliency value => coarser quantization
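One way to picture saliency-adaptive quantization is a per-MB quantization parameter (QP) that grows as saliency falls. The sketch below assumes saliency is a value in [0, 1] (for instance the RVM probability) and maps it linearly to an offset around a base QP, clamped to the 0 to 51 QP range of 8-bit H.264. The linear mapping, `base_qp`, and `max_offset` are illustrative assumptions rather than the paper's rule.

```python
def saliency_to_qp(saliency, base_qp=28, max_offset=8):
    """Map an MB saliency in [0, 1] to an H.264 QP.

    Low saliency  -> QP above base_qp (coarser quantization, fewer bits).
    High saliency -> QP below base_qp (finer quantization, more bits).
    The linear mapping is an assumed example.
    """
    if not 0.0 <= saliency <= 1.0:
        raise ValueError("saliency must lie in [0, 1]")
    offset = round((1.0 - 2.0 * saliency) * max_offset)
    return min(51, max(0, base_qp + offset))
```

With the default parameters, a fully salient MB is coded at QP 20 and a fully non-salient one at QP 36, so bits are shifted toward the regions viewers are expected to look at.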

Integration Into H.264

Figure from Gupta & Chaundhury, PAMI 2011

Paper Evaluation

Novelty:
• Methods for: saliency map, saliency value propagation

Assumption:
• All the MBs in P-frames are inter-coded (problematic)

Writing level:
• Good
• Partially self-contained

Paper Evaluation

Feasibility:
• Higher complexity than H.264 encoders
• Not for real-time encoders
• Useful at low bit-rates
• Objects entering the scene may be considered unimportant

Experimental evaluation:
• Saliency:
  - Visual comparison: good
  - ROC curve comparison: partial
• Compression: none (authors' future direction)

Future Directions

Improving encoding complexity
• Less complex saliency method

Better object entrance treatment
• Using mutual-information of frame areas

Treat intra-coded MBs in P-frames

A Review of:

3D Models Coding and Morphing for Efficient Video Compression
F. Galpin, R. Balter, L. Morin, K. Deguchi
CVPR 2004

Method Outline

• 3D model extraction
• 3D model-based video coding
• Reconstruction using adaptive geometric morphing

3D Models Stream Generation

Figure from Galpin et al, CVPR 2004

Stream Compression

Three data types to compress:
• 3D model
• Texture images
• Camera parameters

Texture Image Compression


Reconstruction Process:

3D Model Compression

The 3D model originates from a decimated depth map

Compressed by:
• Wavelet transform
• Depth-adaptive quantization

Figures from Galpin et al, CVPR 2004

Video Reconstruction: Texture Fading


Video Reconstruction: Texture Fading

Panels: without texture fading; with texture fading


Video Reconstruction: Geometric Morphing

Improving 3D model interpolation


Video Reconstruction: Geometric Morphing

Panels: regular interpolation; interpolation with geometric morphing


Result Comparison with H.264

Paper Evaluation

Novelty:
• Compression using unknown 3D model

Assumptions:
• Static scene
• Moving monocular camera
• Neglected camera rotation
• GOP intrinsic parameters are fixed

Writing level:
• Good
• Not self-contained

Paper Evaluation

Feasibility:
• Only for static scene video
• High encoder / decoder complexity
• Unsuitable for real-time use
• Useful at very low bit-rates

Experimental evaluation:
• Sufficient visual comparison with H.264
• No run-time information

Future Directions

Treat moving objects

Improve complexity
• At least for real-time decoding

Approach Comparison

                         Attention    3D model
Video type               Any          Static scene
Bit-rates useful at      Low          Very low
Encoder complexity       High         High
Decoder complexity       Regular      High
Integration in H.264     Possible     Unsuitable
Overall evaluation       Promising    Inferior
