Video Compression using Computer Vision
Presented by Yehuda Dar
Advanced Topics in Computer Vision (048921), Winter 2011-2012
Video Compression Basics
Fundamental tradeoff among:
• Bit-rate
• Distortion
• Computational complexity
Video Compression Basics
Utilized redundancies:
• Spatial
• Temporal
• Psycho-visual
• Statistical
H.264 Overview
H.264 Redundancy Utilization
Redundancy      Utilization   Means
Spatial         High          Transform coding; intra coding (spatial prediction)
Temporal        High          Motion estimation & compensation
Psycho-visual   Medium        YCbCr color space; 4:2:0 sampling; DC / AC coefficient quantization
Statistical     High          Entropy coding
Compression using Computer Vision
Motivation:
Better utilization of the psycho-visual redundancy
Application-specific compression methods
Exploring new approaches
A Review of:
A Scheme for Attentional Video Compression
R. Gupta and S. Chaudhury, PAMI 2011
Method Outline
• Salient region detection
• Foveated video coding
• Integration into H.264
Foveated image coding demonstration
Figure from Guo & Zhang, Trans. Image Process., 2010
Saliency Map, Step 1: Creating a 3D Feature Map
Feature type   Calculation method                                    Based on
Global         Color spatial variance                                Liu et al, CVPR 2007
Local          Center-surround multi-scale ratio of dissimilarity    Huang et al, ICPR 2010
Rarity         Pulse-DCT                                             Yu et al, ICDL 2009
Relevance Vector Machine (RVM)
Used here as a binary classifier
Advantages over support-vector-machine (SVM):
• Provides posterior probabilities
• Better generalization ability
• Faster decisions
Saliency Map, Step 2: Unify Features using RVM
Training Procedure for MBs:
• Sample: for each MB, the global, local and rarity feature maps are averaged over the MB, giving the vector (avg global, avg local, avg rarity)^T
• Label: 'salient' / 'non-salient', obtained by counting the ground-truth salient pixels in the MB
• Each (sample, label) pair is given to the RVM
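The construction of training samples for macroblocks can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `mb_samples`, the 16x16 macroblock size, and the `min_fraction` rule for turning a ground-truth pixel count into a label are all hypothetical, since the slides do not state the actual labeling threshold.

```python
import numpy as np

MB = 16  # assumed macroblock size in pixels (H.264 macroblocks are 16x16)

def mb_samples(global_map, local_map, rarity_map, ground_truth=None, min_fraction=0.5):
    """Build one 3-element feature vector per macroblock (hypothetical helper).

    Each vector holds the MB averages of the global, local and rarity maps.
    If a ground-truth mask is given, an MB is labeled salient (1) when at
    least `min_fraction` of its pixels are salient; this threshold is an
    assumption made for the sketch, not a value from the reviewed paper.
    """
    h, w = global_map.shape
    rows, cols = h // MB, w // MB

    def mb_mean(a):
        # Crop to whole macroblocks, then average inside each MB.
        a = a[:rows * MB, :cols * MB].astype(float)
        return a.reshape(rows, MB, cols, MB).mean(axis=(1, 3))

    samples = np.stack(
        [mb_mean(global_map), mb_mean(local_map), mb_mean(rarity_map)], axis=-1
    ).reshape(-1, 3)  # row-major MB order

    if ground_truth is None:
        return samples, None

    salient_fraction = mb_mean(ground_truth > 0)  # counted salient pixels / MB area
    labels = (salient_fraction.reshape(-1) >= min_fraction).astype(int)
    return samples, labels
```

The same function, called without a ground-truth mask, would produce the input vectors for an already trained classifier.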
Saliency Map, Step 2: Unify Features using RVM
Trained RVM Usage:
• New input: the MB vector (avg global, avg local, avg rarity)^T
• RVM outputs:
  • Binary label: 'salient' / 'non-salient'
  • Probability, used as the relative saliency
Saliency Map: Result Comparison
Panels: input; global; local [Huang et al, ICPR 2010]; rarity [Yu et al, ICDL 2009]; proposed; [Harel et al, NIPS 2006]; [Bruce & Tsotsos, NIPS 2006]
Figures from Gupta & Chaudhury, PAMI 2011
Saliency Map: ROC Curve
Curves compared: proposed method vs. [Harel et al, NIPS 2006]
Figure from Gupta & Chaudhury, PAMI 2011
Integration Into H.264: Calculation of Saliency Values
• The saliency map is recalculated only when it changes significantly
• Mutual information between successive frames indicates changes in saliency
Figures from Gupta & Chaudhury, PAMI 2011
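As an illustration of the mutual-information test, the sketch below estimates the mutual information of two grayscale frames from their joint histogram and flags a recalculation when it drops below a threshold. The function names, the 32-bin histogram, and the threshold rule are assumptions made for this example; the slides do not give the paper's exact decision rule.

```python
import numpy as np

def mutual_information(frame_a, frame_b, bins=32):
    """Mutual information (in bits) between two equally sized 8-bit grayscale frames."""
    joint, _, _ = np.histogram2d(
        frame_a.ravel(), frame_b.ravel(), bins=bins, range=[[0, 256], [0, 256]]
    )
    p_ab = joint / joint.sum()             # joint distribution of co-located intensities
    p_a = p_ab.sum(axis=1, keepdims=True)  # marginal of frame_a, shape (bins, 1)
    p_b = p_ab.sum(axis=0, keepdims=True)  # marginal of frame_b, shape (1, bins)
    nz = p_ab > 0                          # skip empty cells to avoid log(0)
    return float(np.sum(p_ab[nz] * np.log2(p_ab[nz] / (p_a @ p_b)[nz])))

def needs_new_saliency_map(prev_frame, cur_frame, threshold):
    """Hypothetical rule: low mutual information suggests the scene changed."""
    return mutual_information(prev_frame, cur_frame) < threshold
```

A frame compared with itself yields its own (binned) entropy, while unrelated frames yield a value near zero, so the threshold would sit somewhere between the two and need tuning per sequence.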
Integration Into H.264: Propagation of Saliency Values
For inter-coded MBs, the saliency value is a weighted average of the saliency values of the MBs pointed to by the motion vector
Figures from Gupta & Chaudhury, PAMI 2011
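A minimal sketch of this propagation step follows. It assumes integer-pixel motion vectors, 16x16 macroblocks, and weights proportional to the overlap area between the referenced block and each reference-frame MB; the slides only say "weighted average", so the area weighting and the name `propagate_saliency` are assumptions.

```python
import numpy as np

MB = 16  # assumed macroblock size in pixels

def propagate_saliency(ref_saliency, mb_row, mb_col, mv):
    """Saliency of an inter-coded MB from the MBs its motion vector points at.

    ref_saliency: 2D array with one saliency value per MB of the reference frame.
    mv: (dx, dy) motion vector in whole pixels (assumption: no sub-pixel motion).
    """
    rows, cols = ref_saliency.shape
    dx, dy = mv
    # Top-left pixel of the referenced block, clamped to stay inside the frame.
    x = min(max(mb_col * MB + dx, 0), (cols - 1) * MB)
    y = min(max(mb_row * MB + dy, 0), (rows - 1) * MB)
    c0, r0 = x // MB, y // MB
    fx, fy = x % MB, y % MB  # offset of the block inside MB (r0, c0)

    total = 0.0
    # The block overlaps at most four MBs; weight each by its overlap area.
    for r, wy in ((r0, MB - fy), (r0 + 1, fy)):
        for c, wx in ((c0, MB - fx), (c0 + 1, fx)):
            if wy and wx:
                total += wy * wx * ref_saliency[r, c]
    return total / (MB * MB)
```

For example, a block displaced by half a macroblock horizontally takes half of its saliency from each of the two MBs it straddles.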
Integration Into H.264: Salient-Adaptive Quantization
• Non-uniform bit allocation
• Smaller saliency value => coarser quantization
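One simple way to realize this rule is to raise the quantization parameter (QP) of less salient MBs. The sketch below uses a linear mapping from relative saliency to a QP offset; the mapping, the `max_offset` value, and the function name `mb_qp` are illustrative assumptions and not the paper's scheme. The clamp to 0..51 reflects the QP range of 8-bit H.264, where an increase of 6 doubles the quantization step.

```python
def mb_qp(base_qp, saliency, max_offset=6):
    """Map a relative saliency in [0, 1] to a per-MB QP (hypothetical linear rule).

    saliency = 1 keeps the base QP; saliency = 0 adds `max_offset`,
    i.e. the least salient MBs are quantized most coarsely.
    """
    s = min(max(saliency, 0.0), 1.0)
    qp = base_qp + round((1.0 - s) * max_offset)
    return min(max(qp, 0), 51)  # valid QP range for 8-bit H.264
```

With `base_qp=30`, a fully salient MB stays at QP 30 while a fully non-salient MB is coded at QP 36.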
Integration Into H.264
Figure from Gupta & Chaudhury, PAMI 2011
Paper Evaluation
Novelty:
• Methods for: saliency map computation, saliency value propagation
Assumption:
• All the MBs in P-frames are inter-coded (problematic)
Writing level:
• Good
• Partially self-contained
Paper Evaluation
Feasibility:
• Higher complexity than H.264 encoders
• Not for real-time encoders
• Useful at low bit-rates
• Objects entering the scene may be considered unimportant
Experimental evaluation:
• Saliency: visual comparison: good; ROC curve comparison: partial
• Compression: none (authors' future direction)
Future Directions
• Reduce encoding complexity: use a less complex saliency method
• Better treatment of objects entering the scene: use mutual information of frame areas
• Treat intra-coded MBs in P-frames
A Review of:
3D Models Coding and Morphing for Efficient Video Compression
F. Galpin, R. Balter, L. Morin, K. Deguchi, CVPR 2004
Method Outline
• 3D model extraction
• 3D model-based video coding
• Reconstruction using adaptive geometric morphing
3D Models Stream Generation
Figure from Galpin et al, CVPR 2004
Stream Compression
Three data types to compress:
• 3D model
• Texture images
• Camera parameters
Texture Image Compression
Figure from Galpin et al, CVPR 2004
Reconstruction Process:
3D Model Compression
The 3D model is derived from a decimated depth map
Compressed by:
• Wavelet transform
• Depth-adaptive quantization
Figures from Galpin et al, CVPR 2004
Video Reconstruction: Texture Fading
Figure from Galpin et al, CVPR 2004
Video Reconstruction: Texture Fading
Panels: without texture fading; with texture fading
Figures from Galpin et al, CVPR 2004
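The idea of fading between textures can be illustrated with a plain linear cross-fade. This is a sketch under a strong assumption: the slides do not describe how the fading is computed, so the linear blend, the parameter `t` (relative position of the reconstructed frame between two texture images), and the name `fade_textures` are all hypothetical.

```python
import numpy as np

def fade_textures(tex_prev, tex_next, t):
    """Linearly blend two equally sized 8-bit texture images (assumed scheme).

    t = 0 returns tex_prev, t = 1 returns tex_next; values in between give
    a gradual transition instead of an abrupt texture switch.
    """
    t = float(np.clip(t, 0.0, 1.0))
    blended = (1.0 - t) * tex_prev.astype(float) + t * tex_next.astype(float)
    return np.rint(blended).astype(np.uint8)
```

Sweeping `t` from 0 to 1 over the frames between two texture images would avoid a visible jump at the point where the texture changes.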
Video Reconstruction: Geometric Morphing
Improving 3D model interpolation
Figure from Galpin et al, CVPR 2004
Video Reconstruction: Geometric Morphing
Panels: regular interpolation; interpolation with geometric morphing
Figures from Galpin et al, CVPR 2004
Result Comparison with H.264
Paper Evaluation
Novelty:
• Compression using unknown 3D model
Assumptions:
• Static scene
• Moving monocular camera
• Neglected camera rotation
• GOP intrinsic parameters are fixed
Writing level:
• Good
• Not self-contained
Paper Evaluation
Feasibility:
• Only for static scene video
• High encoder/decoder complexity
• Unsuitable for real-time use
• Useful at very low bit-rates
Experimental evaluation:
• Sufficient visual comparison with H.264
• No run-time information
Future Directions
Treat moving objects
Reduce complexity, at least enough for real-time decoding
Approach Comparison
Criterion               Attention     3D model
Video type              Any           Static scene
Bit-rates useful at     Low           Very low
Encoder complexity      High          High
Decoder complexity      Regular       High
Integration in H.264    Possible      Unsuitable
Overall evaluation      Promising     Inferior