TRANSCRIPT
M. Wu: ENEE631 Digital Image Processing (Spring'09)
Video Content Analysis and Streaming
Spring ’09 Instructor: Min Wu
Electrical and Computer Engineering Department,
University of Maryland, College Park
bb.eng.umd.edu (select ENEE631 S’09) [email protected]
ENEE631 Spring'09, Lecture 19 (4/13/2009)
Overview and Logistics
Last Time:
– General methodologies on motion analysis
– Optical flow equations

Today:
– Wrap up motion analysis
– Video content analysis: basic framework; temporal segmentation; compressed-domain processing
– A quick guide on video communications
UMCP ENEE631 Slides (created by M.Wu © 2004)
Review: Optical Flow Equation
Orthogonal decomposition of the flow vector v
– Projection along the normal direction ~ v_n, i.e., along the image gradient ∇f's direction
– Projection along the tangent direction ~ v_t, i.e., orthogonal to the image gradient ∇f
Optical flow equation (O.F.E.) along the normal direction:

  v_n ||∇f|| + ∂f/∂t = 0
From Wang’s Preprint Fig.6.2
UMCP ENEE631 Slides (created by M.Wu © 2001)
  v_n = −(∂f/∂t) / ||∇f||
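The normal-flow relation above can be sketched numerically. The following is an illustrative Python sketch (not from the slides): the function name and the discrete-derivative choices are my own, and the finite differences only crudely approximate the continuous derivatives.

```python
import numpy as np

def normal_flow(f0, f1):
    """Estimate the normal-flow component v_n = -(df/dt) / ||grad f||
    from two consecutive frames (illustrative sketch)."""
    ft = f1.astype(float) - f0.astype(float)   # temporal derivative
    fy, fx = np.gradient(f0.astype(float))     # spatial gradient
    grad_mag = np.hypot(fx, fy)
    # v_n is defined only where the gradient is nonzero (aperture problem:
    # over constant regions the motion vector is indeterminate)
    return np.where(grad_mag > 1e-6, -ft / np.maximum(grad_mag, 1e-6), 0.0)

# A vertical edge moving right: v_n is positive on the edge, zero in flat regions
f0 = np.zeros((8, 8)); f0[:, 4:] = 1.0
f1 = np.zeros((8, 8)); f1[:, 5:] = 1.0
vn = normal_flow(f0, f1)
```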
Ambiguity in Motion Estimation
One equation, two unknowns
– The tangent-direction component of the motion vector is undetermined
– "Aperture problem": the aperture is the small window over which the constant-intensity assumption is applied; the MV can be estimated only if the aperture contains 2+ different gradient directions (e.g. corners)
– Usually need additional constraints, e.g. spatial smoothness of the motion field

Indeterminate motion vector over constant regions (||∇f|| = 0)
– Reliable motion estimation only for regions with brightness variations (e.g. edges or non-flat textures)
From Wang’s Preprint Fig.6.3
General Methodologies for Motion Estimation
Two categories: feature-based vs. intensity-based estimation

Feature-based
– Step 1: establish correspondences between feature pairs
– Step 2: estimate the parameters of a chosen motion model by least-squares fitting of the correspondences
– Good for global/camera motion describable by parametric models
– Common models: affine, projective, … (Wang Sec.5.5.2-5.5.4); applications: image mosaicking, synthesis of multiple views

Intensity-based
– Apply the optical flow equation (or a variation of it) to local regions
– Good for non-simple motion and multiple objects
– Applications: video coding, motion prediction and filtering
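Step 2 of the feature-based approach (least-squares fitting of a parametric model to correspondences) can be sketched for the affine case. This is an illustrative Python sketch, not course code; the function name and setup are my own.

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares fit of an affine motion model dst ~ A @ src + b
    from point correspondences (step 2 of the feature-based method)."""
    src = np.asarray(src, float)
    dst = np.asarray(dst, float)
    n = len(src)
    # Each correspondence contributes two linear equations in the
    # six affine parameters (a11, a12, b1, a21, a22, b2).
    M = np.zeros((2 * n, 6))
    M[0::2, 0:2] = src; M[0::2, 2] = 1.0   # x' = a11*x + a12*y + b1
    M[1::2, 3:5] = src; M[1::2, 5] = 1.0   # y' = a21*x + a22*y + b2
    p, *_ = np.linalg.lstsq(M, dst.reshape(-1), rcond=None)
    A = np.array([[p[0], p[1]], [p[3], p[4]]])
    b = np.array([p[2], p[5]])
    return A, b

# Recover a known affine motion from four exact correspondences
src = np.array([[0, 0], [1, 0], [0, 1], [1, 1]], float)
A_true = np.array([[1.1, 0.2], [-0.1, 0.9]])
b_true = np.array([2.0, 3.0])
dst = src @ A_true.T + b_true
A_est, b_est = fit_affine(src, dst)
```

With noisy correspondences the same call returns the least-squares estimate rather than an exact recovery, which is the usual situation in mosaicking.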
Motion Estimation Criteria
Criterion based on the displaced frame difference
– e.g. in the block-matching approach

Criterion based on the optical flow equation

Other criteria and considerations
– Smoothness constraints
– Bayesian criterion
Commonly Used Optimization Methods
For minimizing the previously defined M.E. error function
Exhaustive search
– MAD often used for computational simplicity
– Guaranteed global optimality, at the expense of computational complexity
– Fast algorithms exist for sub-optimal solutions

Gradient-based search (Appendix B of Wang's book)
– MSE often used for mathematical tractability (differentiable)
– Iterative approach: refine an estimate along the negative gradient direction of the objective function
– Generally converges only to a local optimum, so a good initial estimate is required
– The method used to estimate the gradient also affects accuracy and robustness
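Exhaustive search under the MAD criterion can be sketched as below. This is an illustrative Python sketch (names and the boundary handling are my own choices), not an optimized implementation.

```python
import numpy as np

def block_match_mad(cur, ref, top, left, bsize=8, srange=4):
    """Exhaustive-search block matching under the MAD criterion:
    return the displacement (dy, dx) minimizing the mean absolute
    difference between the current block and reference candidates."""
    block = cur[top:top + bsize, left:left + bsize].astype(float)
    best_mad, best_mv = np.inf, (0, 0)
    for dy in range(-srange, srange + 1):
        for dx in range(-srange, srange + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + bsize > ref.shape[0] or x + bsize > ref.shape[1]:
                continue  # skip candidates falling outside the reference frame
            cand = ref[y:y + bsize, x:x + bsize].astype(float)
            mad = float(np.mean(np.abs(block - cand)))
            if mad < best_mad:
                best_mad, best_mv = mad, (dy, dx)
    return best_mv, best_mad

# Shift a random frame down 2 and right 3; matching should undo the shift
rng = np.random.default_rng(0)
ref = rng.random((32, 32))
cur = np.roll(ref, (2, 3), axis=(0, 1))
mv, mad = block_match_mad(cur, ref, 10, 10)
```

The fast sub-optimal algorithms mentioned above (e.g. coarse-to-fine searches) reduce the candidate set instead of scanning the full (2·srange+1)² window.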
Various Motion Estimation Approaches
Pixel-based motion estimation (Wang Sec.6.3)
– Estimate one MV for every pixel
– Use the optical flow equation relation to construct the M.E. criterion
– Add smoothness constraints on the motion field to deal with the aperture problem and avoid poor MV estimates

Block matching
– Correlation method (Wang Sec.6.4.5)

Deformable block matching (Wang Sec.6.5)
– Use richer block-based motion models than the translational model, e.g. affine/bilinear/projective mapping for each block (Sec.5.5); a square block in the current frame matches a non-square region in the reference

Mesh-based motion estimation (Wang Sec.6.6)
Video Content Analysis
Recall: MPEG-7
"Multimedia Content Description Interface"
– Not a video coding/compression standard like the previous MPEGs
– Emphasizes how to describe video content for efficient indexing, search, and retrieval

Standardizes the content description mechanism
– Descriptors, Description Schemes, and the Description Definition Language
– Commonly used visual descriptors: color, texture, shape, …
Figure from MPEG-7 Document N4031 (March 2001)
UMCP ENEE408G Slides (created by M.Wu & R.Liu © 2002)
Introduction to Video Content Analysis
Teach the computer to "understand" video content
– Define features the computer can learn to measure and compare: color (RGB values or other color coordinates), motion (magnitude and direction), shape (contours), texture and patterns
– Give example correspondences so the computer can learn: build connections between features and higher-level semantics/concepts using statistical classification and recognition techniques

Video understanding
1. Break a video sequence into chunks, each with consistent content ~ a "shot"
2. Group similar shots into scenes that represent certain events
3. Describe connections among scenes via storyboards or scene graphs
4. Associate each shot/scene with representative features/semantics for future queries
Video Understanding (step 1)
– Break a video sequence into chunks, each with consistent content ~ a "shot"
From Yeung-Yeo-Liu: STG (Princeton)
Video Understanding (step 2)
– Group similar shots into scenes
From Yeung-Yeo-Liu: STG (Princeton)
Video Understanding (step 3)
– Describe connections among scenes via storyboards or scene graphs
From Yeung-Yeo-Liu: STG (Princeton)
Video Temporal Segmentation
A first step toward video content understanding
– Select "key frames" to represent each shot for indexing/retrieval
– The sequence of shot durations serves as a "signature" for a video

Two types of transitions
– "Cut" ~ abrupt transition
– Gradual transition: fade-out and fade-in; dissolve; wipe

Detecting transitions
– Detecting a cut is relatively easy: check the frame-wise difference
– Detect dissolves and fades by checking linearity: f(t) = f0·(1 − t/T) + f1·(t/T)
– Detecting a wipe is harder: exploit transition patterns, or linearity of the color histogram
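The linearity check for dissolves and fades can be sketched as follows: fit the linear model f(t) = f0·(1 − t/T) + f1·(t/T) from the endpoint frames of a candidate window and measure how well the interior frames match. Illustrative Python, with names of my own choosing; real detectors combine this with other evidence.

```python
import numpy as np

def dissolve_linearity_error(frames):
    """Check a candidate window against the linear dissolve model
    f(t) = f0*(1 - t/T) + f1*(t/T): predict interior frames from the
    two endpoint frames and return the mean absolute deviation."""
    T = len(frames) - 1
    f0 = frames[0].astype(float)
    f1 = frames[-1].astype(float)
    err = 0.0
    for t in range(1, T):
        model = f0 * (1 - t / T) + f1 * (t / T)
        err += float(np.mean(np.abs(frames[t].astype(float) - model)))
    return err / max(T - 1, 1)

# A synthetic dissolve fits the model; a cut inside the same window does not
dissolve = [np.full((4, 4), 25.0 * t) for t in range(5)]
cut = [np.zeros((4, 4))] * 2 + [np.full((4, 4), 100.0)] * 3
```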
Detect Dissolve via Linearity in Pixel Changes
Dissolve: a linear combination of frames g and h

Detect straight lines in DC-frame space
– Correlation detection on triplets
[Figure: per-pixel intensity trajectories from g_k to h_k across a dissolve, shown for Pixels 1-3]
From talks by Joyce-Liu (Princeton)
Examples of Wipes
UMCP ENEE408G Slides (created by M.Wu © 2002)
Wipe Detection (1)
– Convert the 2-D problem to 1-D by projection: a common strategy in feature extraction and analysis in image processing
– Perform horizontal, vertical, and diagonal projections to detect diverse wipe types
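The projection step can be sketched in a few lines. Illustrative Python (the function name is my own); during a left-to-right wipe, the column profile changes only in the columns the wipe boundary has crossed, which is the localized pattern a detector exploits.

```python
import numpy as np

def projections(frame):
    """Reduce a 2-D frame to two 1-D signatures by summing along rows
    and columns; wipe boundaries then show up as localized changes in
    these profiles between frames. (Diagonal projection is analogous.)"""
    f = frame.astype(float)
    return f.sum(axis=1), f.sum(axis=0)   # row profile, column profile

row_profile, col_profile = projections(np.arange(12).reshape(3, 4))
```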
Review: Color Histogram
Generalizes the luminance histogram

What is a color histogram?
– Count the number of pixels with the same color
– Plot color value vs. the corresponding pixel count

Gives an idea of the dominant color and the color distribution
– Ignores the exact spatial location of each color value
– Useful in image and video analysis

Color histograms can be used to:
– Detect gradual shot transitions, especially fancy wipes
– Measure content similarity between images / video shots
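A minimal sketch of a histogram and one common similarity measure (histogram intersection) follows; illustrative Python, single-channel for brevity, with names of my own choosing.

```python
import numpy as np

def color_histogram(img, bins=8):
    """Count pixels per quantized value and normalize; the spatial
    layout of the pixels is deliberately discarded. Shown for one
    (luminance) channel; per-channel or joint RGB histograms follow
    the same pattern."""
    hist, _ = np.histogram(img, bins=bins, range=(0, 256))
    return hist / hist.sum()

def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; 1.0 for identical distributions."""
    return float(np.minimum(h1, h2).sum())

dark = color_histogram(np.zeros((4, 4)))
bright = color_histogram(np.full((4, 4), 255.0))
```

Because spatial layout is ignored, two shots of the same scene under camera motion score as similar, which is exactly what shot grouping wants.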
Wipe Detection (2)
Diverse and fancy wipes
Linear change in color histogram
Ref: Joyce & Liu, IEEE Trans. Multimedia, 2006.
[Figure: histogram-bin trajectories from G_k to H_k across a wipe, shown for Bins 1-3]
From talks by Joyce-Liu (Princeton)
Types of Transitions
– [above] Transition types offered by Adobe Premiere– See also transition demos provided by PowerPoint
From talks by Joyce-Liu (Princeton)
Video transition collection (Dr. Rob Joyce)
Compressed-Domain Processing
Does video analysis have to decompress the whole video?

Use I & P frames only to reduce computation and enhance robustness in scene change detection:
… I B B P B B P B B P B B I B B P …

Working in the compressed domain
– Process video with only partial decoding (inverse VLC, etc.), without a full decode (IDCT), to save computation
– A low-resolution version provides enough information for transition detection ⇒ the "DC image"
DC Image
– Put the DC of each block together
– Already contains most of the information in the video
DC Frame
Example From Joyce-Liu (Princeton)
Fast Extraction of DC Image From MPEG-1
I-frame
– Put together the DC coefficient from each block (and apply proper scaling)

Predictive (P/B) frame
– Fast approximation of the reference block's DC
– Add the DC of the motion-compensation residue (recall that the DCT is a linear transform)
– See Yeo-Liu's paper for more derivations on the approximations (DC; DC+2AC)
DCT[P_cur](0,0) = DCT[P_ref](0,0) + DCT[P_diff](0,0)

DCT[P_ref](0,0) ≈ (1/64) · Σ_{i=1..4} h_i · w_i · DCT[P_ref_i](0,0)

[Figure: the motion-compensated reference region overlaps four coded blocks P_ref_1 … P_ref_4; h_i and w_i are the heights and widths of the overlaps]
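For intuition, note that the DC coefficient of an 8×8 DCT block equals the block mean up to a scale factor, so a DC image built from a decoded frame is just the per-block mean. The sketch below is illustrative Python showing that pixel-domain equivalent, not the fast MPEG-domain extraction itself (which obtains the same values from entropy-decoded coefficients without any IDCT).

```python
import numpy as np

def dc_image(frame, bsize=8):
    """One value per 8x8 block: the block mean, which equals the DCT
    DC coefficient up to scaling. (The fast method above obtains the
    same values directly from the bitstream without an IDCT.)"""
    h8, w8 = frame.shape[0] // bsize, frame.shape[1] // bsize
    f = frame[:h8 * bsize, :w8 * bsize].astype(float)
    # Split into (h8, bsize, w8, bsize) blocks and average each block
    return f.reshape(h8, bsize, w8, bsize).mean(axis=(1, 3))

frame = np.zeros((16, 16))
frame[:8, :8] = 10.0
dc = dc_image(frame)
```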
Compressed-Domain Scene Change Detection
Compare nearby frames
– Take the pixel-wise difference of nearby DC frames
– Or take the pixel-wise difference every N frames to accumulate more change ⇒ useful for detecting gradual transitions

Observe the pixel-wise difference for different frame pairs
– Peaks at cuts, and plateaus at gradual transitions
Figure from Yeo-Liu CSVT’95 paper
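The comparison at a chosen frame spacing can be sketched as follows; illustrative Python with a name of my own choosing. A spacing of 1 yields sharp peaks at cuts, while a larger spacing accumulates the small per-frame changes of a gradual transition into a plateau.

```python
import numpy as np

def dc_frame_diffs(dc_frames, step=1):
    """Mean absolute difference between DC frames spaced `step` apart.
    step=1 peaks at cuts; a larger step accumulates the small
    per-frame changes of a gradual transition into a plateau."""
    return [float(np.mean(np.abs(dc_frames[k + step].astype(float)
                                 - dc_frames[k].astype(float))))
            for k in range(len(dc_frames) - step)]

# Linearly dissolving DC frames: per-frame change 10, two-frame change 20
ramp = [np.full((2, 2), 10.0 * k) for k in range(5)]
```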
Scene Change Detection (cont'd)
Figure from Yeo-Liu CSVT’95 paper
– Identify candidate locations for gradual transitions
– Can further exploit the linearity in DC frames ⇒ helps differentiate gradual transitions from motion
Summary on Video Temporal Segmentation
A first step toward video content understanding
Two types of transitions
– "Cut" ~ abrupt transition
– Gradual transition: fade-out and fade-in; dissolve; wipe

Detecting transitions can be done on "DC images" without full decompression
– Detecting a cut is relatively easy ~ check the frame-wise difference
– Detect dissolves and fades by checking linearity: f(t) = f0·(1 − t/T) + f1·(t/T)
– Detecting a wipe is harder: exploit transition patterns, or linearity of the color histogram
Video Communications
MM + Data Comm. = Effective MM Communications?
Multimedia vs. generic data
– Perceptual no-difference vs. bit-by-bit accuracy
– Unequal importance within multimedia data
– High data volume and real-time requirements

Need to consider the interplay between source coding and transmission, and make use of MM-specific properties

E.g., wireless video needs a "good" compression algorithm that can:
– Support scalable video compression rates (from 10 to several hundred kbps)
– Be robust to transmission errors and channel impairments
– Minimize end-to-end delay
– Handle missing frames intelligently
Error-Resilient Coding with Localized Synch Marker
To reduce error propagation

[Block diagram: input sequence → H.263 encoder with LRM → channel with random noise → H.263 decoder with MB detection and error concealment → output sequence; decoded-quality comparison: H.263 with FRM vs. H.263 with LRM]
(From D. Lun @ HK PolyUniv. Short Course 6/01)
Issues in Video Communications/Streaming
Source coding aspects
– Rate-distortion tradeoff and bit allocation in an R-D optimal sense
– Scalable coding and Fine Granular Scalability (FGS)
– Multiple description coding
– Error-resilient source coding

Channel coding aspects ~ see ENEE626 for general theory
– Unequal Error Protection (UEP) channel codes
– Embedded modulation for achieving UEP

Joint source-channel approaches
– Jointly select source and channel coding parameters to optimize end-to-end distortion
– Wisely map source codewords to channel symbols
– Take advantage of the channel's non-uniform characteristics for UEP

Bandwidth resource determination, allocation & adaptation
Reading References
Video temporal segmentation for content analysis
– Yeo & Liu, IEEE Trans. CSVT, Dec. 1995 (DC image & scene change detection)
– Joyce & Liu, IEEE Trans. Multimedia, 2006 (wipe detection)

Video communications
– Wang's video textbook: Chapters 14, 15
– Woods' book: Chapter 12