
M. Wu: ENEE631 Digital Image Processing (Spring'09)

Basics on Video Coding

Spring ’09 Instructor: Min Wu

Electrical and Computer Engineering Department,

University of Maryland, College Park

bb.eng.umd.edu (select ENEE631 S’09)

ENEE631 Spring’09, Lecture 15 (3/30/2009)

M. Wu: ENEE631 Digital Image Processing (Spring'09) Lec15 – Hybrid Video Coding [2]

Overview and Logistics

Last Time:
– Bit allocation issues in image compression
– Optimal transform: KLT ~ unitary transform that decorrelates the data; optimal MMSE approximation under basis restriction

Comments on issues arising from the mid-term exam
– Linearity and shift invariance: check by definition
   Is piecewise linear stretching a linear operation?
   If boundary effects are ignored, are median filtering and point operations (including histogram-based processing) shift invariant?
   Give examples of shift-variant operations
– Quantization: MMSE criterion vs. minmax criterion

Today:
– Image interpolation
– Video coding: exploit temporal and spatial redundancy

UMCP ENEE631 Slides (created by M.Wu © 2004)


Image Interpolation: A Quick Extension from 1-D Interpolation

Useful in image enlargement, rotation, motion estimation, etc.



Examples of Image Interpolation

Figure: 4x zoom (nearest neighbor) vs. 4x zoom (bilinear)


Interpolation / Zooming

How to make up the new pixels?

Replication according to the nearest neighbor
– Simple but leaves zig-zag boundaries
   (reflects spectrum artifacts; equivalent to interlacing zeros and then low-pass filtering with a constant mask)

Bilinear interpolation
– Extend 1-D linear interpolation: (1-a) f1 + a f2
– Do two horizontal and one vertical 1-D interpolation:
   F(p’, q’) = (1-a) [ (1-b) F(p, q) + b F(p, q+1) ] + a [ (1-b) F(p+1, q) + b F(p+1, q+1) ]
– For zoom-in by 2 in each dimension, the new pixel at the center of four known pixels (a = b = 0.5):
   F(p’, q’) = 0.5 [0.5 F(p,q) + 0.5 F(p,q+1)] + 0.5 [0.5 F(p+1,q) + 0.5 F(p+1,q+1)]
– => equivalent to fitting F(x, y) = r x + s y + u xy + v; solve for the parameters using the 4 known pixels

Figure: new pixel (p’, q’) inside the square of known pixels (p, q), (p, q+1), (p+1, q), (p+1, q+1), at vertical offset a and horizontal offset b; in 1-D, f1 and f2 get weights 1-a and a
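A minimal Python sketch of the two zooming rules above, assuming the image is a list of lists of gray levels indexed F[row][col], with a as the vertical (row) offset and b as the horizontal (column) offset as in the F(p’, q’) formula. The function names and the choice to clamp the +1 neighbors at the last row/column are illustrative assumptions.

```python
import math


def zoom_nearest(F, k):
    """Pixel replication: every input pixel becomes a k x k block."""
    rows, cols = len(F), len(F[0])
    return [[F[i // k][j // k] for j in range(cols * k)] for i in range(rows * k)]


def bilinear(F, y, x):
    """Bilinear interpolation of F at fractional position (y, x).

    p, q are the integer parts; a, b are the vertical and horizontal
    fractional offsets, matching F(p', q') on the slide.
    """
    p, q = math.floor(y), math.floor(x)
    a, b = y - p, x - q
    # Clamp the +1 neighbors so the last row/column can still be sampled.
    p1 = min(p + 1, len(F) - 1)
    q1 = min(q + 1, len(F[0]) - 1)
    top = (1 - b) * F[p][q] + b * F[p][q1]       # horizontal 1-D interpolation, row p
    bottom = (1 - b) * F[p1][q] + b * F[p1][q1]  # horizontal 1-D interpolation, row p+1
    return (1 - a) * top + a * bottom            # vertical 1-D interpolation


def zoom_bilinear(F, k):
    """Zoom by an integer factor k, keeping the original pixels on the new grid."""
    rows, cols = len(F), len(F[0])
    return [[bilinear(F, i / k, j / k) for j in range((cols - 1) * k + 1)]
            for i in range((rows - 1) * k + 1)]


F = [[0, 10], [20, 30]]
print(zoom_nearest(F, 2))   # [[0, 0, 10, 10], [0, 0, 10, 10], [20, 20, 30, 30], [20, 20, 30, 30]]
print(zoom_bilinear(F, 2))  # [[0.0, 5.0, 10.0], [10.0, 15.0, 20.0], [20.0, 25.0, 30.0]]
```

The center value 15.0 is the average of the four known pixels, as the a = b = 0.5 case of the formula predicts.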

UMCP ENEE631/408G Slides (created by M.Wu © 2001/2002)


Review: 1-D Frequency-Domain Interpretation

From Crochiere-Rabiner “Multirate DSP” book Fig.2.15-16


Frequency-Domain Interpretation

Review multirate signal processing (ENEE630)

For images: extend to the 2-D transform

Downsampling
– Aliasing as spectral replicas become closer
– LPF to avoid aliasing

Upsampling
– Upsampling with zero interlacing ~ replicated spectrum
– LPF to filter out the spectral replicas in the high-frequency part
– Ideal filter vs. practical filters

Nearest-neighbor approach for 2x zoom: use the 2x2 mask
   1 1
   1 1
[think] What equivalent filter is used for bilinear interpolation?
   1/4 1/2 1/4
   1/2  1  1/2
   1/4 1/2 1/4

Sampling rate conversion with rational rate M / N
– Upsample with zero interlacing by M => LPF => downsample by N

UMCP ENEE631 Slides (created by M.Wu © 2001)
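The equivalence can be checked with a short Python sketch, assuming 2x zoom, zero padding outside the image, and plain lists of lists: interlacing zeros and convolving with the 2x2 constant mask reproduces pixel replication, while the 3x3 mask reproduces bilinear interpolation away from the last row and column (where zero padding causes a boundary effect). The helper names and the `origin` convention are illustrative choices.

```python
def upsample_zero(F, k=2):
    """Zero interlacing: F[i][j] goes to (k*i, k*j); all other samples are 0."""
    rows, cols = len(F), len(F[0])
    U = [[0.0] * (cols * k) for _ in range(rows * k)]
    for i in range(rows):
        for j in range(cols):
            U[k * i][k * j] = F[i][j]
    return U


def convolve2d(U, H, origin):
    """Same-size 2-D convolution with zero padding.

    origin = (oi, oj) is the mask index treated as (0, 0):
    out[i][j] = sum over (m, n) of H[m][n] * U[i - (m - oi)][j - (n - oj)].
    """
    rows, cols = len(U), len(U[0])
    oi, oj = origin
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            s = 0.0
            for m in range(len(H)):
                for n in range(len(H[0])):
                    y, x = i - (m - oi), j - (n - oj)
                    if 0 <= y < rows and 0 <= x < cols:
                        s += H[m][n] * U[y][x]
            out[i][j] = s
    return out


NEAREST = [[1, 1], [1, 1]]  # 2x2 constant mask
BILINEAR = [[0.25, 0.5, 0.25], [0.5, 1.0, 0.5], [0.25, 0.5, 0.25]]

F = [[0, 10], [20, 30]]
U = upsample_zero(F)
replicated = convolve2d(U, NEAREST, origin=(0, 0))
interpolated = convolve2d(U, BILINEAR, origin=(1, 1))
print(replicated)                              # each pixel repeated in a 2x2 block
print([row[:3] for row in interpolated[:3]])   # interior matches bilinear zoom
```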


More on Interpolation

Other filters

– Bicubic interpolation (3rd-order polynomial in the index variables), based on a 16-pixel (4x4) neighborhood

– Can build p-th order interpolation by recursive filtering: after upsampling by p, convolve with the linear interpolation filter p times
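One common concrete choice for the bicubic filter is Keys' cubic convolution kernel; the slide does not name a specific kernel, so using it here (with a = -0.5 and border-pixel replication) is an assumption. The Python sketch weights the 4x4 = 16-pixel neighborhood separably, once along rows and once along columns.

```python
import math


def keys_kernel(x, a=-0.5):
    """Cubic convolution kernel (Keys); a = -0.5 is a common choice."""
    x = abs(x)
    if x <= 1:
        return (a + 2) * x ** 3 - (a + 3) * x ** 2 + 1
    if x < 2:
        return a * x ** 3 - 5 * a * x ** 2 + 8 * a * x - 4 * a
    return 0.0


def bicubic(F, y, x, a=-0.5):
    """Interpolate F at (y, x) from the 4x4 (16-pixel) neighborhood around it."""
    rows, cols = len(F), len(F[0])
    p, q = math.floor(y), math.floor(x)
    total = 0.0
    for m in range(p - 1, p + 3):
        wy = keys_kernel(y - m, a)
        mm = min(max(m, 0), rows - 1)  # replicate border pixels
        for n in range(q - 1, q + 3):
            nn = min(max(n, 0), cols - 1)
            total += wy * keys_kernel(x - n, a) * F[mm][nn]
    return total


# A linear ramp F[i][j] = i + 2j is reproduced in the interior.
F = [[i + 2 * j for j in range(5)] for i in range(5)]
print(bicubic(F, 1.5, 2.25))  # close to 1.5 + 2 * 2.25 = 6.0
```

Because the kernel is 1 at 0 and 0 at every other integer, the known pixels are returned unchanged at integer positions.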

Interpolation that avoids blurred edges and textures
– Sharpening
– Edge-preserving interpolation
   (recent research papers in ICIP and Trans. on Image Proc.)

=> Will discuss more on 2-D sampling and frequency domain interpretation in a few lectures

UMCP ENEE631 Slides (created by M.Wu © 2001/2004)


From Image Coding to Video Coding



Review

Basic tools for compression
– PCM coding, entropy coding, run-length coding
– Quantization and truncation
– Predictive coding
– Transform coding: DCT-based

JPEG image compression
– 8x8 block-DCT-based transform coding
– Uses predictive coding, quantization, run-length coding, and entropy coding

Today: digital video and video compression

UMCP ENEE408G Slides (created by M.Wu & R.Liu © 2002)


Bring in Motion: Video (Motion Pictures)

Capturing video
– Video as a 3-D signal: 2 spatial dimensions & a time dimension; continuous I(x, y, t) => discrete I(m, n, t_k)
– Frame by frame => image sequence

Encode digital video
– Simplest way ~ compress each frame image individually, e.g., “motion-JPEG”: only spatial redundancy is exploited and reduced
– How about temporal redundancy? Is differential coding good? Pixel-by-pixel differences could still be large due to motion

Need better prediction
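To see why a plain frame difference is a weak predictor, a small Python sketch with a synthetic textured frame (a hypothetical test pattern) and a known 1-pixel horizontal motion compares the mean absolute difference before and after shifting the reference to follow the motion.

```python
def mad_shifted(cur, ref, dx):
    """Mean |cur[i][j] - ref[i][j - dx]| over pixels where column j - dx exists."""
    rows, cols = len(cur), len(cur[0])
    diffs = [abs(cur[i][j] - ref[i][j - dx])
             for i in range(rows) for j in range(cols) if 0 <= j - dx < cols]
    return sum(diffs) / len(diffs)


# Synthetic textured frame (hypothetical test pattern).
prev = [[(7 * i + 13 * j) % 32 for j in range(8)] for i in range(8)]
# Current frame: the same content moved 1 pixel to the right.
cur = [[prev[i][j - 1] if j > 0 else prev[i][0] for j in range(8)] for i in range(8)]

plain = mad_shifted(cur, prev, 0)        # pixel-by-pixel frame difference
compensated = mad_shifted(cur, prev, 1)  # difference after compensating the 1-pixel motion
print(plain, compensated)                # plain is large; compensated is 0.0
```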



Video Examples

1. NASA shuttle

2. “Talking Head”


Exploit Temporal Redundancy – 1st Try

– Difference between corresponding pixels of two video frames

From Gonzalez-Woods 3/e Fig. 8.34-8.35


Explore Motion

From Gonzalez-Woods 3/e Fig. 8.37


Motion Estimation

Helps understand the content of an image sequence
– Useful for surveillance

Stabilizing video by detecting and removing small, noisy global motions
– For building stabilizers in camcorders

Reducing temporal redundancy of video for compression
[What estimation accuracy and resolution are necessary for this purpose?]
– One motion displacement vector per picture? (extreme case: DPCM)
– One vector per pixel?

=> Tradeoff: (1) effectiveness & complexity in approximating commonly seen motions; (2) overhead in describing the motion model.

UMCP ENEE408G Slides (created by M.Wu & R.Liu © 2002; 2007)


Block-Matching by Exhaustive Search

Modeling: assume movements are block-based translations

Search every possibility over a specified range for the best matching block
– MAD (mean absolute difference) is often used for simplicity

=> Flash Demo (by Dr. Ken Lam @ Hong Kong PolyTech Univ.)

From Wang’s Preprint Fig. 6.6
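A minimal Python sketch of exhaustive block matching under the block-translation model, using MAD as the matching criterion. Frames are lists of lists, (bi, bj) is the top-left corner of the current block, and candidates that fall outside the reference frame are simply skipped; these conventions and the synthetic random frames in the demo are assumptions for illustration.

```python
import random


def mad(cur, ref, bi, bj, ri, rj, N):
    """Mean absolute difference between the N x N block of cur at (bi, bj)
    and the N x N block of ref at (ri, rj)."""
    total = 0
    for i in range(N):
        for j in range(N):
            total += abs(cur[bi + i][bj + j] - ref[ri + i][rj + j])
    return total / (N * N)


def full_search(cur, ref, bi, bj, N, R):
    """Exhaustive block matching with integer-pel step and range +/-R.

    Returns (best MAD, dy, dx): the block of ref at (bi + dy, bj + dx)
    best predicts the block of cur at (bi, bj).
    """
    H, W = len(ref), len(ref[0])
    best = None
    for dy in range(-R, R + 1):
        for dx in range(-R, R + 1):
            ri, rj = bi + dy, bj + dx
            if ri < 0 or rj < 0 or ri + N > H or rj + N > W:
                continue  # candidate falls outside the reference frame
            err = mad(cur, ref, bi, bj, ri, rj, N)
            if best is None or err < best[0]:
                best = (err, dy, dx)
    return best


# Synthetic test: random reference, current frame displaced so that
# cur[i][j] = ref[i - 2][j + 3] (wrapping around the borders).
rng = random.Random(0)
ref = [[rng.randrange(256) for _ in range(32)] for _ in range(32)]
cur = [[ref[(i - 2) % 32][(j + 3) % 32] for j in range(32)] for i in range(32)]
print(full_search(cur, ref, 12, 12, N=8, R=4))  # expected (0.0, -2, 3)
```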


Motion Compensation

– Helps reduce temporal redundancy of video

Figure panels: previous frame, current frame, predicted frame, prediction error frame


Revised from R.Liu Seminar Course ’00 @ UMD


Complexity of Exhaustive Block-Matching

Assumptions
– Block size N x N and image size S = M1 x M2
– Search step size is 1 pixel ~ “integer-pel accuracy”
– Search range +/-R pixels both horizontally and vertically

Computation complexity
– # Candidate matching blocks = (2R+1)^2
– # Operations for computing MAD for one block ~ O(N^2)
– # Operations for MV estimation per block ~ O((2R+1)^2 N^2)
– # Blocks = S / N^2
– Total # operations for entire frame ~ O((2R+1)^2 S)
   i.e., the overall computation load is independent of block size! Block size instead affects the encoding bit rate and the effectiveness of motion compensation.

E.g., M1 = M2 = 512, N = 16, R = 16, 30 fps => on the order of 33^2 x 512^2 x 30 ≈ 8.56 x 10^9 operations per second!
– Was difficult for real-time estimation, but possible with parallel hardware
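The arithmetic in the example can be reproduced with a few lines of Python; the function name is hypothetical, and like the slide it counts roughly N^2 operations per MAD and ignores boundary blocks whose search window would be clipped.

```python
def full_search_ops(M1, M2, N, R, fps):
    """Operation count for exhaustive integer-pel block matching,
    counting ~N^2 operations per MAD evaluation as on the slide."""
    candidates = (2 * R + 1) ** 2        # candidate blocks per block
    ops_per_block = candidates * N * N   # O((2R+1)^2 N^2)
    blocks = (M1 * M2) // (N * N)        # S / N^2 blocks per frame
    return ops_per_block * blocks * fps  # O((2R+1)^2 S) per frame, times fps


print(full_search_ops(512, 512, 16, 16, 30))  # 8564244480, i.e. about 8.56 x 10^9
print(full_search_ops(512, 512, 8, 16, 30))   # same value: block size cancels out
```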



Exhaustive Search: Cons and Pros

Pros
– Guaranteed optimality within the search range and motion model

Cons
– Can only search among finitely many candidates: what if the motion is “fractional”?
– High computation complexity: on the order of [search-range size x image size] for a 1-pixel step size

How to improve accuracy?
– Include blocks at fractional translations as candidates => requires interpolation

How to improve speed?
– Try to exclude unlikely candidates



Fractional Accuracy Search for Block Matching

For motion accuracy of 1/K pixel
– Upsample (interpolate) the reference frame by a factor of K
– Search for the best matching block in the upsampled reference frame

Half-pel accuracy ~ K = 2
– Significant accuracy improvement over integer-pel (esp. for low-resolution video)
– Complexity increase

(From Wang’s Preprint Fig.6.7)
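A minimal Python sketch of half-pel (K = 2) block matching: the reference is upsampled once, and candidate blocks are read from the upsampled grid with a step of 2, so a displacement of d half-pels means d/2 pixels. Using bilinear interpolation for the upsampling, and the synthetic frames in the demo, are assumptions; real codecs may use other interpolation filters.

```python
import random


def upsample2_bilinear(F):
    """Interpolate F by K = 2: output sample (u, v) sits at (u / 2, v / 2)."""
    H, W = len(F), len(F[0])
    out = []
    for u in range(2 * H - 1):
        i0, a = u // 2, (u % 2) / 2
        i1 = min(i0 + 1, H - 1)
        row = []
        for v in range(2 * W - 1):
            j0, b = v // 2, (v % 2) / 2
            j1 = min(j0 + 1, W - 1)
            row.append((1 - a) * ((1 - b) * F[i0][j0] + b * F[i0][j1])
                       + a * ((1 - b) * F[i1][j0] + b * F[i1][j1]))
        out.append(row)
    return out


def half_pel_search(cur, ref, bi, bj, N, R):
    """Full search with 1/2-pel accuracy over a +/-R pixel range.

    Returns (best MAD, dy, dx) with dy, dx in pixels.
    """
    up = upsample2_bilinear(ref)
    H2, W2 = len(up), len(up[0])
    best = None
    for dy in range(-2 * R, 2 * R + 1):  # half-pel units
        for dx in range(-2 * R, 2 * R + 1):
            top, left = 2 * bi + dy, 2 * bj + dx
            if top < 0 or left < 0 or top + 2 * (N - 1) >= H2 or left + 2 * (N - 1) >= W2:
                continue  # candidate block leaves the upsampled frame
            err = sum(abs(cur[bi + i][bj + j] - up[top + 2 * i][left + 2 * j])
                      for i in range(N) for j in range(N)) / (N * N)
            if best is None or err < best[0]:
                best = (err, dy / 2, dx / 2)
    return best


# Synthetic test: the current frame is the reference sampled at (i + 0.5, j - 1.5).
rng = random.Random(1)
ref = [[rng.randrange(256) for _ in range(24)] for _ in range(24)]
up = upsample2_bilinear(ref)
cur = [[up[2 * i + 1][2 * j - 3] if 2 * i + 1 < len(up) and 2 * j - 3 >= 0 else 0
        for j in range(24)] for i in range(24)]
print(half_pel_search(cur, ref, 8, 8, N=8, R=2))  # expected (0.0, 0.5, -1.5)
```

The search visits (4R+1)^2 candidates instead of (2R+1)^2, which is the complexity increase noted above.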



Fractional Accuracy for Motion: Example

From Gonzalez-Woods 3/e Fig. 8.38

Figure panels: no motion compensation; 1-pixel precision; ½-pixel precision; ¼-pixel precision


Fast Algorithms for Block Matching

Basic ideas
– Matching errors near the best match are generally smaller than those far away
– Skip candidates that are unlikely to give a good match




Fast Algorithm: 3-Step Search

Search candidates at 8 neighbor positions

Step size cut down by 2 after each iteration
– Start with a step size of approx. half of the max. search range

Figure: 3-step search grid over dx, dy in [-6, +6], with candidate positions M1–M24; resulting motion vector {dx, dy} = {1, 6}

Total number of computations: 9 + 8 x 2 = 25 (3-step) vs. (2R+1)^2 = 169 (full search, R = 6)

(Fig. from Ken Lam – HK Poly Univ. short course in summer’2001)


=> See Flash demo by Jane Kim (UMD)
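The search logic can be sketched in Python by passing the matching error as a function of the candidate vector, so the same routine works with MAD or any other criterion. The starting step of 4 (giving steps 4, 2, 1) and the toy error surface in the usage line are assumptions; on real frames the error surface need not be unimodal, and then a fast search can miss the true minimum.

```python
def three_step_search(cost, first_step=4):
    """Three-step search over motion vectors (dx, dy).

    cost(dx, dy) returns the matching error (e.g. MAD) of the candidate
    displaced by (dx, dy). The step size is halved after each iteration.
    Returns the best (dx, dy) and the number of cost evaluations.
    """
    best = (0, 0)
    best_err = cost(0, 0)
    evals = 1
    step = first_step
    while step >= 1:
        cx, cy = best  # search the 8 neighbors of the current best position
        for ddy in (-step, 0, step):
            for ddx in (-step, 0, step):
                if ddx == 0 and ddy == 0:
                    continue  # the center was already evaluated
                err = cost(cx + ddx, cy + ddy)
                evals += 1
                if err < best_err:
                    best, best_err = (cx + ddx, cy + ddy), err
        step //= 2
    return best, evals


# Toy error surface with its minimum at {dx, dy} = {1, 6} (hypothetical).
print(three_step_search(lambda dx, dy: abs(dx - 1) + abs(dy - 6)))  # ((1, 6), 25)
print((2 * 6 + 1) ** 2)                                             # 169 for full search, R = 6
```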


Hierarchical Block Matching

Problem with fast search at full resolution
– Small misalignment may give a high displacement error (EDFD), esp. for texture and edge blocks

Hierarchical (multi-resolution) block matching
– Match at coarse resolution to narrow down the search range
– Match at high resolution to refine the motion estimate

Figure: lowest resolution, medium resolution, original resolution
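A two-level Python sketch of the idea, assuming 2x2 averaging to build the coarse level, a half-size block and half the search range there, and a +/-1 pixel refinement at full resolution. These choices and the synthetic random frames are illustrative assumptions rather than a specific published scheme.

```python
import random


def downsample2(F):
    """Half resolution by 2x2 averaging (a simple low-pass + decimation)."""
    return [[(F[2 * i][2 * j] + F[2 * i][2 * j + 1] + F[2 * i + 1][2 * j] + F[2 * i + 1][2 * j + 1]) / 4
             for j in range(len(F[0]) // 2)] for i in range(len(F) // 2)]


def search(cur, ref, bi, bj, N, center, R):
    """Full search of (dy, dx) within +/-R of center using MAD; returns (MAD, dy, dx)."""
    H, W = len(ref), len(ref[0])
    cy, cx = center
    best = None
    for dy in range(cy - R, cy + R + 1):
        for dx in range(cx - R, cx + R + 1):
            ri, rj = bi + dy, bj + dx
            if ri < 0 or rj < 0 or ri + N > H or rj + N > W:
                continue  # candidate falls outside the reference frame
            err = sum(abs(cur[bi + i][bj + j] - ref[ri + i][rj + j])
                      for i in range(N) for j in range(N)) / (N * N)
            if best is None or err < best[0]:
                best = (err, dy, dx)
    return best


def hierarchical_search(cur, ref, bi, bj, N, R):
    """Two-level block matching: coarse search at half resolution, then refine."""
    # Level 1: half resolution, half-size block, half the search range.
    _, dy1, dx1 = search(downsample2(cur), downsample2(ref), bi // 2, bj // 2, N // 2, (0, 0), R // 2)
    # Level 0: full resolution, refine +/-1 pixel around the scaled-up coarse vector.
    return search(cur, ref, bi, bj, N, (2 * dy1, 2 * dx1), 1)


rng = random.Random(2)
ref = [[rng.randrange(256) for _ in range(64)] for _ in range(64)]
cur = [[ref[(i - 4) % 64][(j + 6) % 64] for j in range(64)] for i in range(64)]
print(hierarchical_search(cur, ref, 24, 24, N=16, R=8))  # expected (0.0, -4, 6)
```

Here the coarse level checks 9 x 9 candidates and the refinement 3 x 3, versus 17 x 17 for a full-resolution search over the same range.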




Summary of Today’s Lecture

Interpolation

Block-based motion estimation and compensation

Next lecture: video compression through hybrid coding
=> Given what we discussed, how to design a video codec?
– Exploit spatial redundancy via transform coding
– Exploit temporal redundancy via predictive coding ~ motion estimation and compensation

Reading assignment
– Gonzalez’s 3/e book 2.4.4 (interpolation); 8.2.9 (motion compensation)
– To explore further: Wang’s video textbook 9.3.1, 6.4



Hybrid Coding for Video



DCT-M.E. Hybrid Video Coding

“Hybrid” ~ combined transform coding & predictive coding

Spatial redundancy removal
– Use DCT-based transform coding for the reference frame

Temporal redundancy removal
– Use motion-based predictive coding for the next frames: estimate motion and use the reference frame to predict; only encode the MV & prediction residue (“motion compensation residue”)

(From Princeton EE330 S’01 by B.Liu)



Review: Predictive Coding with Quantization

Consider: high correlation between successive samples

Predictive coding
– Basic principle: remove redundancy between successive pixels and only encode the residual between the actual and predicted values
– Residue usually has a much smaller dynamic range
   Allows fewer quantization levels for the same MSE => get compression
– Compression efficiency depends on intersample redundancy

First try:

Figure (block diagram): Encoder: the prediction u’P(n) = f[u(n-1)] is subtracted from u(n) to give e(n), which the quantizer maps to eQ(n). Decoder: eQ(n) is added to the prediction uP(n) = f[uQ(n-1)] to give uQ(n).

Any problem with this codec?



Predictive Coding (cont’d)

Problem with 1st try
– Inputs to the predictor are different at the encoder and decoder: the decoder doesn’t know u(n)!
– Mismatch error could propagate to future reconstructed samples

Solution: Differential PCM (DPCM)
– Use the quantized sequence uQ(n) for prediction at both encoder and decoder
– Simple predictor f[x] = x
– Prediction error e(n)
– Quantized prediction error eQ(n)
– Distortion d(n) = e(n) - eQ(n)
   Since u(n) = uP(n) + e(n) and uQ(n) = uP(n) + eQ(n), the reconstruction error u(n) - uQ(n) equals d(n), so it does not accumulate

Figure (block diagram): Encoder: e(n) = u(n) - uP(n) is quantized to eQ(n); uQ(n) = eQ(n) + uP(n) feeds the predictor, uP(n) = f[uQ(n-1)]. Decoder: uQ(n) = eQ(n) + uP(n) with the same predictor uP(n) = f[uQ(n-1)].


Note: the “Predictor” contains a one-step buffer as input to the prediction
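A minimal Python sketch contrasting the two codecs, assuming the simple predictor f[x] = x, a uniform rounding quantizer, and a zero initial prediction; the function names and test signal are hypothetical. With a slowly rising input, every residue of the 1st try quantizes to zero, so the decoder's reconstruction drifts away from u(n); in DPCM the reconstruction error of each sample equals its own quantization error e(n) - eQ(n) and does not accumulate.

```python
def quantize(x, step=1.0):
    """Uniform quantizer: round to the nearest multiple of step (illustrative choice)."""
    return step * round(x / step)


def first_try(u, step=1.0):
    """Encoder predicts from the original u(n-1); the decoder can only use uQ(n-1)."""
    rec, u_prev, uq_prev = [], 0.0, 0.0
    for x in u:
        eq = quantize(x - u_prev, step)  # encoder: e(n) = u(n) - u(n-1), then quantize
        uq_prev = uq_prev + eq           # decoder: uQ(n) = uQ(n-1) + eQ(n)
        rec.append(uq_prev)
        u_prev = x
    return rec


def dpcm(u, step=1.0):
    """Closed loop: encoder and decoder both predict from uQ(n-1)."""
    rec, uq_prev = [], 0.0
    for x in u:
        eq = quantize(x - uq_prev, step)  # e(n) = u(n) - uQ(n-1), quantized to eQ(n)
        uq_prev = uq_prev + eq            # uQ(n) = uQ(n-1) + eQ(n), identical at the decoder
        rec.append(uq_prev)
    return rec


u = [0.3 * n for n in range(20)]  # slowly rising test signal (hypothetical)
err_first = max(abs(a - b) for a, b in zip(u, first_try(u)))
err_dpcm = max(abs(a - b) for a, b in zip(u, dpcm(u)))
print(err_first, err_dpcm)  # the 1st try drifts; DPCM stays within step / 2
```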


Hybrid MC-DCT Video Encoder
(From R.Liu’s Handbook Fig.2.18)

• Intra-frame: encoded without prediction
• Inter-frame: predictively encoded => use the quantized (reconstructed) frames as the reference for computing the residue



Hybrid MC-DCT Video Decoder

(From R.Liu’s Handbook Fig.2.18)



Hybrid Video Coding: Problems to Be Solved

Not all regions are easily inferable from the previous frame
– Occlusion ~ solvable by backward prediction using future frames as reference
– Adaptively decide whether to use prediction or not

Drifting and error propagation
– Solution: encode reference regions or frames from time to time (“intra coding”)

Random access: e.g., want to get the 95th frame
– Solution: encode a frame without prediction from time to time

How to allocate bits?
– Based on visual model and statistics: JPEG-like quant. steps; entropy coding
– Consider constant or variable bit-rate requirement: constant bit rate (CBR) vs. variable bit rate (VBR)

Wrap up all solutions ~ MPEG-like codec



Background Reviews on Video Acquisition and Display


Video Camera

Frame-by-frame capturing

CCD sensors (Charge-Coupled Devices)
– 2-D array of solid-state sensors
– Each sensor corresponds to a pixel
– Stored in a buffer and sequentially read out
– Small and light => widely used



Video Display

CRT (Cathode Ray Tube)
– Large dynamic range
– Bulky for large displays: CRT physical depth has to be similar to screen width

LCD flat-panel display
– Uses an electrical field to change the optical properties, and hence the brightness/color, of the liquid crystal
– Generating the electrical field by an array of transistors: active-matrix thin-film transistors by plasma

“Active-matrix display” (also known as TFT) has a transistor located at each pixel, allowing the display to be switched more frequently and requiring less current to control pixel luminance. A passive-matrix LCD has a grid of conductors with pixels located at the grid intersections.



Composite vs. Component Video

Component video
– Three separate signals for tristimulus color representation or luminance-chrominance representation
– Pro: higher quality
– Con: needs high bandwidth and synchronization

Composite video
– Multiplex into a single signal
– Historical reason: transmitting color TV through a monochrome channel
– Pro: saves bandwidth
– Con: cross talk

S-video: luminance signal + single multiplexed chrominance signal



Analog Video Raster

Line-by-line “raster scan”
– Represent the image frame line by line with a 1-D analog waveform
– Synchronization signal for horizontal and vertical retrace



Forming Picture on TV Tube (Monochrome)

How many lines?

From B.Liu EE330S’01 Princeton


How Many TV Lines?

Determined by the spatial frequency response of the HVS (recall Lecture 2)
– Two dots cannot be resolved if the viewing distance > 2000 x their separation (~0.03 degree viewing angle)
– N = 500 lines for viewing distance D = 4H (H: picture height)

From B.Liu EE330S’01 Princeton
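The slide's numbers follow from a one-line relation, sketched in Python assuming H is the picture height and D the viewing distance: lines spaced more finely than D / 2000 cannot be resolved, so the useful number of lines is N = 2000 H / D. The function name is hypothetical.

```python
import math


def max_useful_lines(distance_over_height, ratio=2000):
    """Lines per picture height at which adjacent lines are just unresolvable.

    Two lines are unresolvable when the viewing distance exceeds `ratio` times
    their separation, so the separation H / N should equal D / ratio,
    giving N = ratio * H / D.
    """
    return ratio / distance_over_height


print(max_useful_lines(4))                # 500.0 for D = 4H
print(math.degrees(math.atan(1 / 2000)))  # about 0.0286 degree viewing angle
```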


Progressive vs. Interlaced Scan

From B.Liu EE330S’01 Princeton


Analog Color TV Systems

Historical notes
– Color TV system had to be compatible with the earlier monochrome TV system

3 formats
– NTSC ~ North America + Japan/Taiwan
– PAL ~ Western Europe + Asia (China) + Middle East
– SECAM ~ Eastern Europe + France
– What format is used in your home country?

From Wang’s Preprint Fig.1.5



Comparison of Three Analog TV Systems
– Spatial and temporal resolution
– Color coordinate
– Signal bandwidth
– Multiplexing of luminance, chrominance, and audio

(From Wang’s Book Preprint)



NTSC

4:3 aspect ratio (width:height)

525 lines/frame, 2:1 interlace at a field rate of 59.94 Hz
– 483 active lines per frame; vertical retrace takes the time of 9 lines
– The rest is used for the broadcaster’s info., like closed captions

YIQ color coordinate for transmission
– RGB primaries slightly different from PAL
– Orthogonal chrominance: I ~ orange-to-cyan; Q ~ green-to-purple (needs less bandwidth)

Multiplexing over 6 MHz total bandwidth
– Artifacts due to cross talk between luminance and chrominance


NTSC 6MHz Bandwidth

From Wang’s Book Preprint Fig.1.6(b)


Analog Video Recording

Comparison of common formats

From Wang’s Book Preprint Table 1.2


Digital Video Formats

ITU-R BT.601 recommendation (ITU-R: International Telecommunication Union, Radiocommunication Sector)
– Y Cb Cr coordinate and four subsampling formats
– Downsampled chrominance

Wang’s Book Preprint Fig.1.8
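To make the subsampling formats concrete, a short Python sketch counting Y, Cb, and Cr samples per frame. It assumes the four formats in the figure are 4:4:4, 4:2:2, 4:1:1, and 4:2:0, and uses a 720 x 480 luminance frame purely as an example size; the table and function names are hypothetical.

```python
# Horizontal and vertical factors by which Cb and Cr are each subsampled.
SUBSAMPLING = {
    "4:4:4": (1, 1),  # full chroma resolution
    "4:2:2": (2, 1),  # half horizontal chroma resolution
    "4:1:1": (4, 1),  # quarter horizontal chroma resolution
    "4:2:0": (2, 2),  # half horizontal and half vertical chroma resolution
}


def samples_per_frame(width, height, fmt):
    """Y + Cb + Cr samples in one frame for a given subsampling format."""
    fh, fv = SUBSAMPLING[fmt]
    luma = width * height
    chroma = 2 * (width // fh) * (height // fv)  # Cb and Cr planes
    return luma + chroma


for fmt in SUBSAMPLING:
    print(fmt, samples_per_frame(720, 480, fmt))
```

4:1:1 and 4:2:0 keep the same number of chroma samples; they differ in whether the reduction is purely horizontal or split between both directions.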



Summary: Source Video Formats



Application Requirements

