TRANSCRIPT
M. Wu: ENEE631 Digital Image Processing (Spring'09)
Video Content Analysis and Streaming
Spring ’09 Instructor: Min Wu
Electrical and Computer Engineering Department,
University of Maryland, College Park
bb.eng.umd.edu (select ENEE631 S’09) [email protected]
ENEE631 Spring'09, Lecture 19 (4/13/2009)
Overview and Logistics
Last Time:
– General methodologies on motion analysis
– Optical flow equations

Today:
– Wrap up motion analysis
– Video content analysis: basic framework; temporal segmentation; compressed-domain processing
– A quick guide on video communications
UMCP ENEE631 Slides (created by M.Wu © 2004)
Review: Optical Flow Equation
Orthogonal decomposition of the flow vector v
– Projection along the normal direction ~ v_n, i.e., along the image gradient ∇f's direction
– Projection along the tangent direction ~ v_t, i.e., orthogonal to the image gradient ∇f
Optical flow equation (O.F.E.) along the normal direction:

  v_n ||∇f|| + ∂f/∂t = 0
From Wang’s Preprint Fig.6.2
UMCP ENEE631 Slides (created by M.Wu © 2001)
  v_n = −(∂f/∂t) / ||∇f||
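The normal-flow relation above can be sketched numerically. The following is an illustrative Python sketch (not from the slides): the function name and the discrete-derivative choices are my own, and the finite differences only crudely approximate the continuous derivatives.

```python
import numpy as np

def normal_flow(f0, f1):
    """Estimate the normal-flow component v_n = -(df/dt) / ||grad f||
    from two consecutive frames (illustrative sketch)."""
    ft = f1.astype(float) - f0.astype(float)   # temporal derivative
    fy, fx = np.gradient(f0.astype(float))     # spatial gradient
    grad_mag = np.hypot(fx, fy)
    # v_n is defined only where the gradient is nonzero (aperture problem:
    # over constant regions the motion vector is indeterminate)
    return np.where(grad_mag > 1e-6, -ft / np.maximum(grad_mag, 1e-6), 0.0)

# A vertical edge moving right: v_n is positive on the edge, zero in flat regions
f0 = np.zeros((8, 8)); f0[:, 4:] = 1.0
f1 = np.zeros((8, 8)); f1[:, 5:] = 1.0
vn = normal_flow(f0, f1)
```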
Ambiguity in Motion Estimation
One equation, two unknowns
– The tangent-direction component of the motion vector is undetermined
– "Aperture problem": the aperture is the small window over which the constant-intensity assumption is applied; the MV can be estimated only if the aperture contains 2+ different gradient directions (e.g. corners)
– Usually need additional constraints, e.g. spatial smoothness of the motion field

Indeterminate motion vector over constant regions (||∇f|| = 0)
– Reliable motion estimation only for regions with brightness variations (e.g. edges or non-flat textures)
From Wang’s Preprint Fig.6.3
General Methodologies for Motion Estimation
Two categories: feature-based vs. intensity-based estimation

Feature-based
– Step 1: establish correspondences between feature pairs
– Step 2: estimate the parameters of a chosen motion model by least-squares fitting of the correspondences
– Good for global/camera motion describable by parametric models
– Common models: affine, projective, … (Wang Sec.5.5.2-5.5.4); applications: image mosaicking, synthesis of multiple views

Intensity-based
– Apply the optical flow equation (or a variation of it) to local regions
– Good for non-simple motion and multiple objects
– Applications: video coding, motion prediction and filtering
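Step 2 of the feature-based approach (least-squares fitting of a parametric model to correspondences) can be sketched for the affine case. This is an illustrative Python sketch, not course code; the function name and setup are my own.

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares fit of an affine motion model dst ~ A @ src + b
    from point correspondences (step 2 of the feature-based method)."""
    src = np.asarray(src, float)
    dst = np.asarray(dst, float)
    n = len(src)
    # Each correspondence contributes two linear equations in the
    # six affine parameters (a11, a12, b1, a21, a22, b2).
    M = np.zeros((2 * n, 6))
    M[0::2, 0:2] = src; M[0::2, 2] = 1.0   # x' = a11*x + a12*y + b1
    M[1::2, 3:5] = src; M[1::2, 5] = 1.0   # y' = a21*x + a22*y + b2
    p, *_ = np.linalg.lstsq(M, dst.reshape(-1), rcond=None)
    A = np.array([[p[0], p[1]], [p[3], p[4]]])
    b = np.array([p[2], p[5]])
    return A, b

# Recover a known affine motion from four exact correspondences
src = np.array([[0, 0], [1, 0], [0, 1], [1, 1]], float)
A_true = np.array([[1.1, 0.2], [-0.1, 0.9]])
b_true = np.array([2.0, 3.0])
dst = src @ A_true.T + b_true
A_est, b_est = fit_affine(src, dst)
```

With noisy correspondences the same call returns the least-squares estimate rather than an exact recovery, which is the usual situation in mosaicking.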
Motion Estimation Criteria
Criterion based on the displaced frame difference
– e.g. in the block-matching approach

Criterion based on the optical flow equation

Other criteria and considerations
– Smoothness constraints
– Bayesian criterion
Commonly Used Optimization Methods
For minimizing the previously defined M.E. error function
Exhaustive search
– MAD often used for computational simplicity
– Guaranteed global optimality, at the expense of computational complexity
– Fast algorithms exist for sub-optimal solutions

Gradient-based search (Appendix B of Wang's book)
– MSE often used for mathematical tractability (differentiable)
– Iterative approach: refine an estimate along the negative gradient direction of the objective function
– Generally converges only to a local optimum, so a good initial estimate is required
– The method used to estimate the gradient also affects accuracy and robustness
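Exhaustive search under the MAD criterion can be sketched as below. This is an illustrative Python sketch (names and the boundary handling are my own choices), not an optimized implementation.

```python
import numpy as np

def block_match_mad(cur, ref, top, left, bsize=8, srange=4):
    """Exhaustive-search block matching under the MAD criterion:
    return the displacement (dy, dx) minimizing the mean absolute
    difference between the current block and reference candidates."""
    block = cur[top:top + bsize, left:left + bsize].astype(float)
    best_mad, best_mv = np.inf, (0, 0)
    for dy in range(-srange, srange + 1):
        for dx in range(-srange, srange + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + bsize > ref.shape[0] or x + bsize > ref.shape[1]:
                continue  # skip candidates falling outside the reference frame
            cand = ref[y:y + bsize, x:x + bsize].astype(float)
            mad = float(np.mean(np.abs(block - cand)))
            if mad < best_mad:
                best_mad, best_mv = mad, (dy, dx)
    return best_mv, best_mad

# Shift a random frame down 2 and right 3; matching should undo the shift
rng = np.random.default_rng(0)
ref = rng.random((32, 32))
cur = np.roll(ref, (2, 3), axis=(0, 1))
mv, mad = block_match_mad(cur, ref, 10, 10)
```

The fast sub-optimal algorithms mentioned above (e.g. coarse-to-fine searches) reduce the candidate set instead of scanning the full (2·srange+1)² window.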
Various Motion Estimation Approaches
Pixel-based motion estimation (Wang Sec.6.3)
– Estimate one MV for every pixel
– Use the optical flow equation relation to construct the M.E. criterion
– Add smoothness constraints on the motion field to deal with the aperture problem and avoid poor MV estimates

Block matching
– Correlation method (Wang Sec.6.4.5)

Deformable block matching (Wang Sec.6.5)
– Use richer block-based motion models than the translational model, e.g. affine/bilinear/projective mapping for each block (Sec.5.5); a square block in the current frame matches a non-square region in the reference

Mesh-based motion estimation (Wang Sec.6.6)
Video Content Analysis
Recall: MPEG-7
"Multimedia Content Description Interface"
– Not a video coding/compression standard like the previous MPEGs
– Emphasizes how to describe video content for efficient indexing, search, and retrieval

Standardizes the content description mechanism
– Descriptors, Description Schemes, and the Description Definition Language
– Commonly used visual descriptors: color, texture, shape, …
Figure from MPEG-7 Document N4031 (March 2001)
UMCP ENEE408G Slides (created by M.Wu & R.Liu © 2002)
Introduction to Video Content Analysis
Teach the computer to "understand" video content
– Define features the computer can learn to measure and compare: color (RGB values or other color coordinates), motion (magnitude and direction), shape (contours), texture and patterns
– Give example correspondences so the computer can learn: build connections between features and higher-level semantics/concepts using statistical classification and recognition techniques

Video understanding
1. Break a video sequence into chunks, each with consistent content ~ a "shot"
2. Group similar shots into scenes that represent certain events
3. Describe connections among scenes via storyboards or scene graphs
4. Associate each shot/scene with representative features/semantics for future queries
Video Understanding (step 1)
– Break a video sequence into chunks, each with consistent content ~ a "shot"
From Yeung-Yeo-Liu: STG (Princeton)
Video Understanding (step 2)
– Group similar shots into scenes
From Yeung-Yeo-Liu: STG (Princeton)
Video Understanding (step 3)
– Describe connections among scenes via storyboards or scene graphs
From Yeung-Yeo-Liu: STG (Princeton)
Video Temporal Segmentation
A first step toward video content understanding
– Select "key frames" to represent each shot for indexing/retrieval
– The sequence of shot durations serves as a "signature" for a video

Two types of transitions
– "Cut" ~ abrupt transition
– Gradual transition: fade-out and fade-in; dissolve; wipe

Detecting transitions
– Detecting a cut is relatively easy: check the frame-wise difference
– Detect dissolves and fades by checking linearity: f(t) = f0·(1 − t/T) + f1·(t/T)
– Detecting a wipe is harder: exploit transition patterns, or linearity of the color histogram
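The linearity check for dissolves and fades can be sketched as follows: fit the linear model f(t) = f0·(1 − t/T) + f1·(t/T) from the endpoint frames of a candidate window and measure how well the interior frames match. Illustrative Python, with names of my own choosing; real detectors combine this with other evidence.

```python
import numpy as np

def dissolve_linearity_error(frames):
    """Check a candidate window against the linear dissolve model
    f(t) = f0*(1 - t/T) + f1*(t/T): predict interior frames from the
    two endpoint frames and return the mean absolute deviation."""
    T = len(frames) - 1
    f0 = frames[0].astype(float)
    f1 = frames[-1].astype(float)
    err = 0.0
    for t in range(1, T):
        model = f0 * (1 - t / T) + f1 * (t / T)
        err += float(np.mean(np.abs(frames[t].astype(float) - model)))
    return err / max(T - 1, 1)

# A synthetic dissolve fits the model; a cut inside the same window does not
dissolve = [np.full((4, 4), 25.0 * t) for t in range(5)]
cut = [np.zeros((4, 4))] * 2 + [np.full((4, 4), 100.0)] * 3
```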
Detect Dissolve via Linearity in Pixel Changes
Dissolve: a linear combination of frames g and h

Detect straight lines in DC-frame space
– Correlation detection on triplets
[Figure: per-pixel intensity trajectories from g_k to h_k across a dissolve, shown for Pixels 1-3]
From talks by Joyce-Liu (Princeton)
Examples of Wipes
UMCP ENEE408G Slides (created by M.Wu © 2002)
Wipe Detection (1)
– Convert the 2-D problem to 1-D by projection: a common strategy in feature extraction and analysis in image processing
– Perform horizontal, vertical, and diagonal projections to detect diverse wipe types
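The projection step can be sketched in a few lines. Illustrative Python (the function name is my own); during a left-to-right wipe, the column profile changes only in the columns the wipe boundary has crossed, which is the localized pattern a detector exploits.

```python
import numpy as np

def projections(frame):
    """Reduce a 2-D frame to two 1-D signatures by summing along rows
    and columns; wipe boundaries then show up as localized changes in
    these profiles between frames. (Diagonal projection is analogous.)"""
    f = frame.astype(float)
    return f.sum(axis=1), f.sum(axis=0)   # row profile, column profile

row_profile, col_profile = projections(np.arange(12).reshape(3, 4))
```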
Review: Color Histogram
Generalizes the luminance histogram

What is a color histogram?
– Count the number of pixels with the same color
– Plot color value vs. the corresponding pixel count

Gives an idea of the dominant color and the color distribution
– Ignores the exact spatial location of each color value
– Useful in image and video analysis

Color histograms can be used to:
– Detect gradual shot transitions, especially fancy wipes
– Measure content similarity between images / video shots
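A minimal sketch of a histogram and one common similarity measure (histogram intersection) follows; illustrative Python, single-channel for brevity, with names of my own choosing.

```python
import numpy as np

def color_histogram(img, bins=8):
    """Count pixels per quantized value and normalize; the spatial
    layout of the pixels is deliberately discarded. Shown for one
    (luminance) channel; per-channel or joint RGB histograms follow
    the same pattern."""
    hist, _ = np.histogram(img, bins=bins, range=(0, 256))
    return hist / hist.sum()

def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; 1.0 for identical distributions."""
    return float(np.minimum(h1, h2).sum())

dark = color_histogram(np.zeros((4, 4)))
bright = color_histogram(np.full((4, 4), 255.0))
```

Because spatial layout is ignored, two shots of the same scene under camera motion score as similar, which is exactly what shot grouping wants.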
Wipe Detection (2)
Diverse and fancy wipes
Linear change in color histogram
Ref: Joyce & Liu, IEEE Trans. Multimedia, 2006.
[Figure: histogram-bin trajectories from G_k to H_k across a wipe, shown for Bins 1-3]
From talks by Joyce-Liu (Princeton)
Types of Transitions
– [above] Transition types offered by Adobe Premiere– See also transition demos provided by PowerPoint
From talks by Joyce-Liu (Princeton)
Video transition collection (Dr. Rob Joyce)
Compressed-Domain Processing
Does video analysis have to decompress the whole video?

Use I & P frames only to reduce computation and enhance robustness in scene change detection:
… I B B P B B P B B P B B I B B P …

Working in the compressed domain
– Process video with only partial decoding (inverse VLC, etc.), without a full decode (IDCT), to save computation
– A low-resolution version provides enough information for transition detection ⇒ the "DC image"
DC Image
– Put the DC of each block together
– Already contains most of the information in the video
DC Frame
Example From Joyce-Liu (Princeton)
Fast Extraction of DC Image From MPEG-1
I-frame
– Put together the DC coefficient from each block (and apply proper scaling)

Predictive (P/B) frame
– Fast approximation of the reference block's DC
– Add the DC of the motion-compensation residue (recall that the DCT is a linear transform)
– See Yeo-Liu's paper for more derivations on the approximations (DC; DC+2AC)
DCT[P_cur](0,0) = DCT[P_ref](0,0) + DCT[P_diff](0,0)

DCT[P_ref](0,0) ≈ (1/64) · Σ_{i=1..4} h_i · w_i · DCT[P_ref_i](0,0)

[Figure: the motion-compensated reference region overlaps four coded blocks P_ref_1 … P_ref_4; h_i and w_i are the heights and widths of the overlaps]
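For intuition, note that the DC coefficient of an 8×8 DCT block equals the block mean up to a scale factor, so a DC image built from a decoded frame is just the per-block mean. The sketch below is illustrative Python showing that pixel-domain equivalent, not the fast MPEG-domain extraction itself (which obtains the same values from entropy-decoded coefficients without any IDCT).

```python
import numpy as np

def dc_image(frame, bsize=8):
    """One value per 8x8 block: the block mean, which equals the DCT
    DC coefficient up to scaling. (The fast method above obtains the
    same values directly from the bitstream without an IDCT.)"""
    h8, w8 = frame.shape[0] // bsize, frame.shape[1] // bsize
    f = frame[:h8 * bsize, :w8 * bsize].astype(float)
    # Split into (h8, bsize, w8, bsize) blocks and average each block
    return f.reshape(h8, bsize, w8, bsize).mean(axis=(1, 3))

frame = np.zeros((16, 16))
frame[:8, :8] = 10.0
dc = dc_image(frame)
```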
Compressed-Domain Scene Change Detection
Compare nearby frames
– Take the pixel-wise difference of nearby DC frames
– Or take the pixel-wise difference every N frames to accumulate more change ⇒ useful for detecting gradual transitions

Observe the pixel-wise difference for different frame pairs
– Peaks at cuts, and plateaus at gradual transitions
Figure from Yeo-Liu CSVT’95 paper
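The comparison at a chosen frame spacing can be sketched as follows; illustrative Python with a name of my own choosing. A spacing of 1 yields sharp peaks at cuts, while a larger spacing accumulates the small per-frame changes of a gradual transition into a plateau.

```python
import numpy as np

def dc_frame_diffs(dc_frames, step=1):
    """Mean absolute difference between DC frames spaced `step` apart.
    step=1 peaks at cuts; a larger step accumulates the small
    per-frame changes of a gradual transition into a plateau."""
    return [float(np.mean(np.abs(dc_frames[k + step].astype(float)
                                 - dc_frames[k].astype(float))))
            for k in range(len(dc_frames) - step)]

# Linearly dissolving DC frames: per-frame change 10, two-frame change 20
ramp = [np.full((2, 2), 10.0 * k) for k in range(5)]
```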
Scene Change Detection (cont'd)
Figure from Yeo-Liu CSVT’95 paper
– Identify candidate locations for gradual transitions
– Can further exploit the linearity in DC frames ⇒ helps differentiate gradual transitions from motion
Summary on Video Temporal Segmentation
A first step toward video content understanding
Two types of transitions
– "Cut" ~ abrupt transition
– Gradual transition: fade-out and fade-in; dissolve; wipe

Detecting transitions can be done on "DC images" without full decompression
– Detecting a cut is relatively easy ~ check the frame-wise difference
– Detect dissolves and fades by checking linearity: f(t) = f0·(1 − t/T) + f1·(t/T)
– Detecting a wipe is harder: exploit transition patterns, or linearity of the color histogram
Video Communications
MM + Data Comm. = Effective MM Communications?
Multimedia vs. generic data
– Perceptual no-difference vs. bit-by-bit accuracy
– Unequal importance within multimedia data
– High data volume and real-time requirements

Need to consider the interplay between source coding and transmission, and make use of MM-specific properties

E.g., wireless video needs a "good" compression algorithm that can:
– Support scalable video compression rates (from 10 to several hundred kbps)
– Be robust to transmission errors and channel impairments
– Minimize end-to-end delay
– Handle missing frames intelligently
Error-Resilient Coding with Localized Synch Marker
To reduce error propagation

[Block diagram: input sequence → H.263 encoder with LRM → channel with random noise → H.263 decoder with MB detection and error concealment → output sequence; decoded-quality comparison: H.263 with FRM vs. H.263 with LRM]
(From D. Lun @ HK PolyUniv. Short Course 6/01)
Issues in Video Communications/Streaming
Source coding aspects
– Rate-distortion tradeoff and bit allocation in an R-D optimal sense
– Scalable coding and Fine Granular Scalability (FGS)
– Multiple description coding
– Error-resilient source coding

Channel coding aspects ~ see ENEE626 for general theory
– Unequal Error Protection (UEP) channel codes
– Embedded modulation for achieving UEP

Joint source-channel approaches
– Jointly select source and channel coding parameters to optimize end-to-end distortion
– Wisely map source codewords to channel symbols
– Take advantage of the channel's non-uniform characteristics for UEP

Bandwidth resource determination, allocation & adaptation
Reading References
Video temporal segmentation for content analysis
– Yeo & Liu, IEEE Trans. CSVT, Dec. 1995 (DC image & scene change detection)
– Joyce & Liu, IEEE Trans. Multimedia, 2006 (wipe detection)

Video communications
– Wang's video textbook: Chapters 14, 15
– Woods' book: Chapter 12