
Introduction to JPEG and MPEG

Ingemar J. Cox

University College London

Nov 27th 2006 Ingemar J. Cox

UCL Adastral Park Postgraduate Campus

Outline

Elementary information theory

Lossless compression

Quantization

Fundamentals of images

Discrete Cosine Transform (DCT)

JPEG

MPEG-1, MPEG-2



Bibliography

D. MacKay, “Information Theory, Inference, and Learning Algorithms”, Cambridge University Press, 2003. http://www.inference.phy.cam.ac.uk/itprnn/book.html

W. B. Pennebaker and J. L. Mitchell, “JPEG Still Image Data Compression Standard”, Chapman & Hall, 1993 (ISBN 0-442-01272-1).

G. K. Wallace, “The JPEG Still-Picture Compression Standard”, IEEE Trans. on Consumer Electronics, 38, 1, 18-34, 1992.

http://en.wikipedia.org/wiki/JPEG



Bibliography

http://en.wikipedia.org/wiki/MPEG-2

T. Sikora, “MPEG Digital Video-Coding Standards”, IEEE Signal Processing Magazine, 82-100, September 1997


Elementary Information Theory




How much information does a symbol convey?

Intuitively, the more unpredictable or surprising it is, the more information is conveyed.

Conversely, if we strongly expected something, and it occurs, we have not learnt very much




If p is the probability that a symbol will occur

Then the amount of information, I, conveyed is:

$$I = \log_2 \frac{1}{p}$$

The information, I, is measured in bits

It is the optimum code length for the symbol




The entropy, H, is the average information per symbol

$$H = \sum_s p(s) \log_2\left(\frac{1}{p(s)}\right)$$

Provides a lower bound on the average number of bits per symbol that lossless compression can achieve



Elementary Information theory

A simple example. Suppose we need to transmit four possible weather conditions:

1. Sunny

2. Cloudy

3. Rainy

4. Snowy

If all conditions are equally likely, p(s) = 0.25 and H = 2, i.e. we need a minimum of 2 bits per symbol



Elementary information theory

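Suppose instead that it is:

1. Sunny 0.5 of the time

2. Cloudy 0.25 of the time

3. Rainy 0.125 of the time, and

4. Snowy 0.125 of the time

Then the entropy is

$$\begin{aligned}
H &= 0.5 \log_2\frac{1}{0.5} + 0.25 \log_2\frac{1}{0.25} + 2 \times 0.125 \log_2\frac{1}{0.125} \\
H &= 0.5 \times 1 + 0.25 \times 2 + 2 \times 0.125 \times 3 \\
H &= 0.5 + 0.5 + 0.75 = 1.75
\end{aligned}$$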
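The two weather examples can be checked numerically. The following is a minimal sketch; the function name `entropy` and the variable names are illustrative choices, not part of the slides.

```python
import math

def entropy(probabilities):
    """Average information in bits per symbol: H = sum of p * log2(1/p)."""
    return sum(p * math.log2(1.0 / p) for p in probabilities if p > 0)

uniform = [0.25, 0.25, 0.25, 0.25]   # all four conditions equally likely
skewed = [0.5, 0.25, 0.125, 0.125]   # sunny, cloudy, rainy, snowy

print(entropy(uniform))  # 2.0
print(entropy(skewed))   # 1.75
```

The skewed source needs fewer bits per symbol on average because the likely symbols carry little information.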




Variable length codewords

Huffman code – integer code lengths

Arithmetic codes – non-integer code lengths




Huffman code

Weather   Probability   Information (bits)   Integer code
Sunny     0.5           1                    0
Cloudy    0.25          2                    10
Rainy     0.125         3                    110
Snowy     0.125         3                    111
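A code with these lengths can be built by repeatedly merging the two least likely symbols. The sketch below is one possible construction, assuming a simple heap-based approach; `huffman_code` is a hypothetical helper name, and the exact bit patterns depend on how ties are broken, although the code lengths do not for this example.

```python
import heapq
import itertools

def huffman_code(probabilities):
    """Return {symbol: bit string} by repeatedly merging the two least likely nodes."""
    counter = itertools.count()  # tie-breaker so the heap never compares dicts
    heap = [(p, next(counter), {symbol: ""}) for symbol, p in probabilities.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, codes0 = heapq.heappop(heap)
        p1, _, codes1 = heapq.heappop(heap)
        # Prefix one subtree with 0 and the other with 1, then merge them.
        merged = {s: "0" + c for s, c in codes0.items()}
        merged.update({s: "1" + c for s, c in codes1.items()})
        heapq.heappush(heap, (p0 + p1, next(counter), merged))
    return heap[0][2]

weather = {"Sunny": 0.5, "Cloudy": 0.25, "Rainy": 0.125, "Snowy": 0.125}
code = huffman_code(weather)
average_length = sum(p * len(code[s]) for s, p in weather.items())
print(code, average_length)
```

For this source the average code length equals the entropy of 1.75 bits, because every probability is a power of 1/2.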




The previous illustration is an example of a lossless code, i.e. we are able to recover the information exactly
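Exact recovery can be illustrated with a round trip through the weather code. This is a small sketch; `encode`, `decode`, and the sample message are illustrative, and the decoder relies on the code being prefix-free (no codeword is the start of another).

```python
CODE = {"Sunny": "0", "Cloudy": "10", "Rainy": "110", "Snowy": "111"}

def encode(symbols, code=CODE):
    """Concatenate the codeword of each symbol."""
    return "".join(code[s] for s in symbols)

def decode(bits, code=CODE):
    """Read bits until they form a codeword, emit its symbol, and start again."""
    inverse = {word: symbol for symbol, word in code.items()}
    symbols, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:  # prefix-free: the first match is the only match
            symbols.append(inverse[current])
            current = ""
    if current:
        raise ValueError("bit stream ended in the middle of a codeword")
    return symbols

message = ["Sunny", "Rainy", "Cloudy", "Sunny", "Snowy"]
bits = encode(message)
print(bits, decode(bits) == message)
```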




Note that we have assumed that each symbol is independent of the other symbols, i.e. the current symbol provides no information regarding the next symbol



Quantization

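Quantization is the process of approximating a continuous range of values (or a large set of discrete values) by a (much) smaller set of values

$$Q(x, \Delta) = \mathrm{Round}\left(\frac{x}{\Delta}\right), \qquad \mathrm{Round}(y) = \lfloor y + 0.5 \rfloor$$

Where Round(y) rounds y to the nearest integer

Δ is the quantization stepsize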



Quantization

Example: Δ = 2

[Figure: number line of input values x from -5 to 5, mapped to quantizer indices -2, -1, 0, 1, 2 and reconstructed values -4, -2, 0, 2, 4]
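The example with a step size of 2 can be sketched in a few lines. The names `quantize` and `dequantize` are illustrative, and rounding ties upward with `floor(y + 0.5)` is an assumption about how Round is meant to behave at exact halves.

```python
import math

def quantize(x, step):
    """Index of the nearest multiple of `step` (ties are rounded up, by assumption)."""
    return math.floor(x / step + 0.5)

def dequantize(index, step):
    """Reconstructed value: the quantizer index scaled back by the step size."""
    return index * step

step = 2
for x in [-3.2, -0.9, 0.4, 1.0, 2.7, 4.1]:
    q = quantize(x, step)
    print(x, q, dequantize(q, step))
```

Many different inputs map to the same index, so the original value cannot be recovered from the index alone.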



Quantization

Quantization plays an important role in lossy compression

This is where the loss happens

Fundamentals of Images




An image consists of pixels (picture elements)

Each pixel represents luminance (and colour)
  Typically, 8 bits per pixel




Colour
  Colour spaces (representations)
    RGB (red-green-blue)
    CMY (cyan-magenta-yellow)
    YUV
      • Y = 0.3R + 0.6G + 0.1B (luminance)
      • U = B - Y (blue colour difference)
      • V = R - Y (red colour difference)
Greyscale
Binary
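A conversion between RGB and a luminance plus colour-difference representation might look as follows. This sketch uses the rounded luminance weights from the slide and leaves the two colour differences unscaled; the function names are hypothetical, and broadcast standards apply additional scale factors that are not shown here.

```python
def rgb_to_luma_chroma(r, g, b):
    """Return luminance and the two colour differences (R - Y, B - Y)."""
    y = 0.3 * r + 0.6 * g + 0.1 * b
    return y, r - y, b - y

def luma_chroma_to_rgb(y, r_minus_y, b_minus_y):
    """Invert the transform: recover R and B directly, then solve Y for G."""
    r = r_minus_y + y
    b = b_minus_y + y
    g = (y - 0.3 * r - 0.1 * b) / 0.6
    return r, g, b

print(rgb_to_luma_chroma(128, 128, 128))  # a grey pixel has near-zero colour differences
print(luma_chroma_to_rgb(*rgb_to_luma_chroma(200, 50, 25)))
```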




A TV frame is about 640x480 pixels

If each pixel is represented by 8 bits for each colour, then the total image size is 640×480×3 = 921,600 bytes or about 7.4 Mbits

At 30 frames per second, this would be about 221 Mbits/second
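The arithmetic behind these figures, written out as a short sketch (variable names are illustrative):

```python
width, height = 640, 480
bytes_per_pixel = 3        # 8 bits for each of three colour components
frames_per_second = 30

bytes_per_frame = width * height * bytes_per_pixel
bits_per_frame = bytes_per_frame * 8
bits_per_second = bits_per_frame * frames_per_second

print(bytes_per_frame)         # 921600
print(bits_per_frame / 1e6)    # 7.3728 (Mbit per frame)
print(bits_per_second / 1e6)   # 221.184 (Mbit per second)
```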




Do we need all these bits?




Here is an image represented with 8-bits per pixel




Here is the same image at 7-bits per pixel




And at 6-bits per pixel
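One simple way to produce such reduced-precision images is to discard the least significant bits of each pixel. The slides do not say how their images were generated, so the following is only an assumed method, with `reduce_bit_depth` as a hypothetical helper.

```python
def reduce_bit_depth(pixel, bits):
    """Keep only the `bits` most significant bits of an 8-bit pixel value."""
    shift = 8 - bits
    return (pixel >> shift) << shift

row = [0, 37, 100, 200, 255]
print([reduce_bit_depth(p, 7) for p in row])
print([reduce_bit_depth(p, 6) for p in row])
```

At 6 bits per pixel only 64 distinct grey levels remain, which is a uniform quantization with a step size of 4.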








Do we need all these bits? No!

The previous example illustrated the eye’s sensitivity to luminance

We can build a perceptual model
  Only code what is important to the human visual system (HVS)
  Usually a function of spatial frequency



Fundamentals of Images

Just as audio has temporal frequencies

Images have spatial frequencies

Transforms
  Fourier transform
  Discrete cosine transform
  Wavelet transform
  Hadamard transform



Discrete cosine transform

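Forward DCT

$$S(u) = \frac{C(u)}{2} \sum_{n=0}^{N-1} s(n) \cos\left(\frac{u\pi}{8}(n + 0.5)\right)$$

Inverse DCT

$$s(n) = \sum_{u=0}^{N-1} \frac{C(u)}{2} S(u) \cos\left(\frac{u\pi}{8}(n + 0.5)\right)$$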
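A direct, unoptimised implementation of the 8-point forward and inverse transforms is sketched below. It assumes N = 8 and the usual normalisation C(0) = 1/√2, C(u) = 1 otherwise; the test signal is arbitrary and not taken from the slides.

```python
import math

N = 8

def c(u):
    """Normalisation constant: 1/sqrt(2) for the DC term, 1 otherwise."""
    return 1 / math.sqrt(2) if u == 0 else 1.0

def dct(s):
    """Forward 8-point DCT of the sequence s."""
    return [c(u) / 2 * sum(s[n] * math.cos(math.pi * u * (n + 0.5) / N)
                           for n in range(N))
            for u in range(N)]

def idct(S):
    """Inverse 8-point DCT of the coefficient sequence S."""
    return [sum(c(u) / 2 * S[u] * math.cos(math.pi * u * (n + 0.5) / N)
                for u in range(N))
            for n in range(N)]

signal = [3, 5, 2, 7, 1, 8, 4, 6]   # arbitrary illustrative samples
coefficients = dct(signal)
recovered = idct(coefficients)
print(coefficients)
print(recovered)
```

Without quantization the inverse transform returns the original samples up to floating-point rounding, so the transform itself loses nothing.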



Basis functions

DC term



Basis functions

First term



Basis functions

Second term



Basis functions

Third term



Basis functions

Fourth term



Basis functions

Fifth term



Basis functions

Sixth term



Basis functions

Seventh term

DCT Example



Example

Signal



Example

DCT coefficients are: 4.2426 0 -3.1543 0 0 0 -0.2242 0



Example: DCT decomposition

DC term




2nd AC term




6th AC term



Example: summation of DCT terms

First two non-zero coefficients



Example: summation of DCT terms

All 3 non-zero coefficients



Example

What if we quantize DCT coefficients? Δ = 1

Quantized DCT coefficients are: 4 0 -3 0 0 0 0 0
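The effect of quantizing the coefficients can be reproduced numerically. The signal plot is not recoverable from the text, so the samples below are an assumption: the sequence 0, 1, 2, 3, 3, 2, 1, 0 happens to give the coefficients listed in the example. The helper names and the use of `floor(y + 0.5)` for rounding are also illustrative choices.

```python
import math

N = 8

def c(u):
    return 1 / math.sqrt(2) if u == 0 else 1.0

def dct(s):
    return [c(u) / 2 * sum(s[n] * math.cos(math.pi * u * (n + 0.5) / N)
                           for n in range(N))
            for u in range(N)]

def idct(S):
    return [sum(c(u) / 2 * S[u] * math.cos(math.pi * u * (n + 0.5) / N)
                for u in range(N))
            for n in range(N)]

signal = [0, 1, 2, 3, 3, 2, 1, 0]   # assumed samples, see the note above
coefficients = dct(signal)

step = 1                            # quantization stepsize
quantized = [math.floor(S / step + 0.5) for S in coefficients]
reconstruction = idct([q * step for q in quantized])
max_error = max(abs(a - b) for a, b in zip(signal, reconstruction))

print(quantized)        # [4, 0, -3, 0, 0, 0, 0, 0]
print(reconstruction)
print(max_error)
```

Only two non-zero integers need to be coded, and the reconstruction stays close to the assumed signal.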



Example

Approximate reconstruction



Example

Exact reconstruction



2-D DCT Transform

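Let i(x,y) represent an image with N rows and M columns

Its DCT I(u,v) is given by

$$I(u,v) = \frac{1}{4} C(u) C(v) \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} i(x,y) \cos\left(\frac{(2x+1)u\pi}{16}\right) \cos\left(\frac{(2y+1)v\pi}{16}\right)$$

where

$$C(0) = \frac{1}{\sqrt{2}}, \qquad C(u) = 1 \text{ for } u > 0$$

(the constants 1/4 and 16 correspond to an 8×8 block, N = M = 8)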
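For an 8×8 block the 2-D transform can be evaluated directly from its definition. The sketch below is deliberately naive (four nested loops) and assumes zero-based pixel indices and the normalisation C(0) = 1/√2, C(u) = 1 otherwise; `dct2_8x8` is a hypothetical name, and practical codecs use fast separable algorithms instead.

```python
import math

def c(u):
    return 1 / math.sqrt(2) if u == 0 else 1.0

def dct2_8x8(block):
    """2-D DCT of an 8x8 block given as block[y][x]; returns coeffs[v][u]."""
    coeffs = [[0.0] * 8 for _ in range(8)]
    for v in range(8):
        for u in range(8):
            total = 0.0
            for y in range(8):
                for x in range(8):
                    total += (block[y][x]
                              * math.cos((2 * x + 1) * u * math.pi / 16)
                              * math.cos((2 * y + 1) * v * math.pi / 16))
            coeffs[v][u] = 0.25 * c(u) * c(v) * total
    return coeffs

flat = [[1] * 8 for _ in range(8)]   # a constant block has only a DC component
coeffs = dct2_8x8(flat)
print(round(coeffs[0][0], 6))
```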




Discrete cosine transform
  Coefficients are approximately uncorrelated
    Except DC term
    C.f. original 8×8 pixel block
  Concentrates more power in the low frequency coefficients
  Computationally efficient
Block-based DCT
  Compute DCT on 8×8 blocks of pixels




Basis functions for the 8×8 DCT (courtesy Wikipedia)

Fundamentals of JPEG




[Block diagram]

Encoder: DCT → Quantizer → Entropy coder → Compressed image data

Decoder: Compressed image data → Entropy decoder → Dequantizer → IDCT




JPEG works on 8×8 blocks

Extract 8×8 block of pixels

Convert to DCT domain

Quantize each coefficient
  Different stepsize for each coefficient
  Based on sensitivity of human visual system

Order coefficients in zig-zag order

Entropy code the quantized values




A common quantization table is

16 11 10 16 24 40 51 61

12 12 14 19 26 58 60 55

14 13 16 24 40 57 69 56

14 17 22 29 51 87 80 62

18 22 37 56 68 109 103 77

24 35 55 64 81 104 113 92

49 64 78 87 103 121 120 101

72 92 95 98 112 100 103 99
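Applying the table means dividing each DCT coefficient by the stepsize at the same position and rounding. The following sketch assumes coefficients stored as `coeffs[v][u]` and uses made-up coefficient values; the function names are hypothetical.

```python
import math

QUANT_TABLE = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]

def quantize_block(coeffs, table=QUANT_TABLE):
    """Divide each coefficient by its own stepsize and round to the nearest integer."""
    return [[math.floor(coeffs[v][u] / table[v][u] + 0.5) for u in range(8)]
            for v in range(8)]

def dequantize_block(indices, table=QUANT_TABLE):
    """Scale each index back by its stepsize (the decoder side)."""
    return [[indices[v][u] * table[v][u] for u in range(8)] for v in range(8)]

coeffs = [[0.0] * 8 for _ in range(8)]   # illustrative values, not from the slides
coeffs[0][0] = 210.0                     # DC term
coeffs[0][1] = -30.0                     # low-frequency AC term
coeffs[7][7] = 40.0                      # highest-frequency AC term

indices = quantize_block(coeffs)
print(indices[0][0], indices[0][1], indices[7][7])
print(dequantize_block(indices)[0][0])
```

The large stepsizes towards the bottom right remove high-frequency detail that the eye is less sensitive to.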




Zig-zag ordering

0 1 5 6 14 15 27 28

2 4 7 13 16 26 29 42

3 8 12 17 25 30 41 43

9 11 18 24 31 40 44 53

10 19 23 32 39 45 52 54

20 22 33 38 46 51 55 60

21 34 37 47 50 56 59 61

35 36 48 49 57 58 62 63
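The table above gives, for each position in the block, its place in the scan. The same ordering can be generated by walking the anti-diagonals and alternating direction, as in this sketch (function names are illustrative):

```python
def zigzag_order(n=8):
    """Return the (row, col) positions of an n x n block in zig-zag scan order."""
    order = []
    for s in range(2 * n - 1):              # s = row + col selects one anti-diagonal
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        if s % 2 == 0:
            rows = reversed(rows)           # even diagonals run from bottom-left to top-right
        order.extend((r, s - r) for r in rows)
    return order

def zigzag_index_table(n=8):
    """Table whose entry [row][col] is the scan position of that coefficient."""
    table = [[0] * n for _ in range(n)]
    for k, (r, col) in enumerate(zigzag_order(n)):
        table[r][col] = k
    return table

for row in zigzag_index_table():
    print(row)
```

Scanning in this order places the low-frequency coefficients first, so the zeros produced by quantization tend to cluster at the end of the sequence.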




Entropy coding
  Run length encoding followed by
    Huffman
    Arithmetic
  DC term treated separately
    Differential Pulse Code Modulation (DPCM)
  2-step process
    1. Convert zig-zag sequence to a symbol sequence
    2. Convert symbols to a data stream
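The first step, turning coefficients into symbols, can be sketched as follows. This is a simplified illustration rather than the exact JPEG symbol format: it pairs each non-zero AC coefficient with the number of zeros before it and ends with an end-of-block marker, whereas the standard additionally splits values into size categories and limits run lengths. The function names and sample values are hypothetical.

```python
def run_length_symbols(ac_coefficients):
    """Turn zig-zag ordered AC coefficients into (zero_run, value) pairs plus 'EOB'."""
    last_nonzero = max((i for i, v in enumerate(ac_coefficients) if v != 0), default=-1)
    symbols, run = [], 0
    for value in ac_coefficients[:last_nonzero + 1]:
        if value == 0:
            run += 1                       # count zeros preceding the next non-zero value
        else:
            symbols.append((run, value))
            run = 0
    symbols.append("EOB")                  # everything after this point is zero
    return symbols

def dpcm(dc_terms):
    """Code each block's DC term as the difference from the previous block's DC term."""
    previous, differences = 0, []
    for dc in dc_terms:
        differences.append(dc - previous)
        previous = dc
    return differences

ac = [5, 0, 0, -3, 0, 1] + [0] * 57        # 63 AC coefficients of one block
print(run_length_symbols(ac))
print(dpcm([13, 15, 14, 14]))              # DC terms of four neighbouring blocks
```

The resulting symbols would then be passed to a Huffman or arithmetic coder to produce the data stream.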




Modes
  Sequential
  Progressive
    Spectral selection
      • Send lower frequency coefficients first
    Successive approximation
      • Send lower precision first, and subsequently refine
  Lossless
  Hierarchical
    Send low resolution image first

Fundamentals of MPEG-1/2



Fundamentals of MPEG

A sequence of 2D images

Temporal correlation as well as spatial correlation

TV broadcast
  Frame-based
  Field-based



MPEG

Moving Picture Experts Group

Standard for video compression

Similarities with JPEG



MPEG

Design is a compromise between
  Bit rate
  Encoder/decoder complexity
  Random access capability



MPEG

Images
  Spatial redundancy
  Perceptual redundancy
Video
  Spatial redundancy
    Intraframe coding
  Temporal redundancy
    Interframe coding
  Perceptual redundancy



MPEG

Consider a sequence of n frames of video.

It consists of:
  I-frames
  P-frames
  B-frames
A sequence of one I-frame followed by P- and B-frames is known as a GOP (Group of Pictures)
  E.g. IBBPBBPBBPBBP



MPEG

I-frames
  Intraframe coded
  No motion compensation
P-frames
  Interframe coded
  Motion compensation
    • Based on past frames only
B-frames
  Interframe coded
  Motion compensation
    • Based on past and future frames



MPEG

Motion-compensated prediction
  Divide current frame, i, into disjoint 16×16 macroblocks
  Search a window in previous frame, i-1, for closest match
  Calculate the prediction error
  For each of the four 8×8 blocks in the macroblock, perform DCT-based coding
  Transmit motion vector + entropy coded prediction error (lossy coding)
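The search step can be sketched as an exhaustive block-matching loop. The slides do not specify the matching criterion or search strategy, so the sum of absolute differences (SAD), the full search, the function names, and the toy frames below are all assumptions. The motion vector (dx, dy) here points from the block in the current frame to its best match in the previous frame.

```python
def sad(current, reference, bx, by, dx, dy, size):
    """Sum of absolute differences between a block and a displaced reference block."""
    return sum(abs(current[by + y][bx + x] - reference[by + dy + y][bx + dx + x])
               for y in range(size) for x in range(size))

def best_motion_vector(current, reference, bx, by, size=16, search=7):
    """Try every displacement in the search window; return ((dx, dy), cost)."""
    height, width = len(reference), len(reference[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            inside = (0 <= by + dy and by + dy + size <= height
                      and 0 <= bx + dx and bx + dx + size <= width)
            if not inside:
                continue                   # candidate block falls outside the frame
            cost = sad(current, reference, bx, by, dx, dy, size)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    cost, dx, dy = best
    return (dx, dy), cost

def prediction_error(current, reference, bx, by, vector, size=16):
    """Difference between the block and its motion-compensated prediction."""
    dx, dy = vector
    return [[current[by + y][bx + x] - reference[by + dy + y][bx + dx + x]
             for x in range(size)] for y in range(size)]

# Toy 12x12 frames: the current frame is the reference shifted by a known amount.
H = W = 12
reference = [[(7 * x + 13 * y) % 31 for x in range(W)] for y in range(H)]
current = [[reference[(y + 1) % H][(x - 2) % W] for x in range(W)] for y in range(H)]

vector, cost = best_motion_vector(current, reference, bx=4, by=4, size=4, search=2)
error = prediction_error(current, reference, 4, 4, vector, size=4)
print(vector, cost)
```

In a real encoder the prediction error would then go through the DCT, quantization, and entropy coding stages; in this toy case the match is perfect and the error is zero.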



MPEG

Like JPEG, the DC term is treated separately
  DPCM
B-frame compression is high
  Need buffer and delay
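The need for a buffer follows from B-frames depending on a future frame: that reference has to be coded and received before the B-frames that use it. A sketch of the resulting reordering is given below; `coding_order` is a hypothetical helper, and the handling of B-frames at the end of a sequence is a simplification.

```python
def coding_order(display_order):
    """Reorder frames so that each B-frame follows both of its reference frames."""
    order, pending_b = [], []
    for index, frame_type in enumerate(display_order):
        if frame_type == "B":
            pending_b.append((index, frame_type))   # must wait for the next I or P frame
        else:
            order.append((index, frame_type))       # reference frame is sent first
            order.extend(pending_b)
            pending_b = []
    order.extend(pending_b)   # simplification: trailing B-frames are appended as they are
    return order

print(coding_order("IBBPBBP"))
```

The encoder therefore has to hold B-frames back until the following reference frame has been processed, and the decoder has to restore display order, which adds delay.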
