Chapter 12 Multimedia Information
Chapter 12 Multimedia Information
Lossless Data Compression
Compression of Analog Signals
Image and Video Coding
Bits, numbers, information
Bit: number with value 0 or 1
n bits: digital representation for 0, 1, ..., 2^n - 1
Byte or octet: n = 8; computer word: n = 16, 32, or 64
n bits allow enumeration of 2^n possibilities
  n-bit field in a header
  n-bit representation of a voice sample
  Message consisting of n bits
The number of bits required to represent a message is a measure of its information content
  More bits -> more content
Block vs. Stream Information
Block: information that occurs in a single block
  Text message, data file, JPEG image, MPEG file
  Size = bits/block or bytes/block
  1 kbyte = 2^10 bytes; 1 Mbyte = 2^20 bytes; 1 Gbyte = 2^30 bytes
Stream: information that is produced & transmitted continuously
  Real-time voice, streaming video
  Bit rate = bits/second
  1 kbps = 10^3 bps; 1 Mbps = 10^6 bps; 1 Gbps = 10^9 bps
Transmission Delay
  L = number of bits in message
  R = speed of digital transmission system (bps)
  L/R = time to transmit the information
  tprop = time for signal to propagate across medium
  d = distance in meters
  c = speed of light (3 x 10^8 m/s in vacuum)

  Delay = tprop + L/R = d/c + L/R seconds

Use data compression to reduce L
Use a higher-speed modem to increase R
Place the server closer to reduce d
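The delay formula can be sketched directly; the function name and the sample numbers below are illustrative assumptions, not from the slides:

```python
def total_delay(L_bits, R_bps, d_m, c=3e8):
    """Delay = tprop + L/R = d/c + L/R seconds."""
    return d_m / c + L_bits / R_bps

# Illustrative numbers: a 1-Mbyte message over a 10-Mbps link, 3000 km away
delay = total_delay(L_bits=8e6, R_bps=10e6, d_m=3e6)  # 0.01 + 0.8 = 0.81 s
```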
Compression
Information is usually not represented efficiently
Data compression algorithms represent the information using fewer bits
  Noiseless (lossless): original information recovered exactly
    e.g. zip, compress, GIF, fax
  Noisy (lossy): original information recovered approximately
    e.g. JPEG; tradeoff: # bits vs. quality
Compression ratio = # bits (original file) / # bits (compressed file)
[Figure: a color image of H x W pixels is the sum of a red component
image, a green component image, and a blue component image, each H x W.]

Total bits = 3 x H x W pixels x B bits/pixel = 3HWB bits
Example: 8 x 10 inch picture at 400 x 400 pixels per in^2
  400 x 400 x 8 x 10 = 12.8 million pixels, 8 bits/pixel/color
  12.8 megapixels x 3 bytes/pixel = 38.4 megabytes
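A quick sketch to verify the arithmetic above (the function name is an assumption for illustration):

```python
def color_image_bits(width_in, height_in, dots_per_in, bits_per_sample, components=3):
    """Total bits = components x (H x W pixels) x B bits/pixel."""
    pixels = (width_in * dots_per_in) * (height_in * dots_per_in)
    return pixels * components * bits_per_sample

bits = color_image_bits(8, 10, 400, 8)   # 12.8 Mpixels x 24 bits/pixel
mbytes = bits / 8 / 1e6                  # 38.4 megabytes, as above
```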
Color Image
Examples of Block Information
Type         Method          Format                              Original        Compressed (Ratio)
Text         Zip, compress   ASCII                               Kbytes-Mbytes   (2-6)
Fax          CCITT Group 3   A4 page, 200x100 pixels/in^2        256 kbytes      5-54 kbytes (5-50)
Color image  JPEG            8x10 in^2 photo, 400^2 pixels/in^2  38.4 Mbytes     1-8 Mbytes (5-30)
Chapter 12 Multimedia Information
Lossless Data Compression
[Figure: a source stream (e.g. "ASDF9H...") passes through lossless data
compression to a binary stream ("11010101...") and through data expansion
back to the exact original ("ASDF9H...").]
Data Compression
Information is produced by a source Usually contains redundancy
Lossless Data Compression system exploits redundancy to produce a more efficient (usually binary) representation of the information
Compressed stream is stored or transmitted depending on application
Data Expansion system recovers exact original information stream
Binary Tree Codes
Suppose an information source generates symbols from A = {a1, a2, ..., aK}
Binary tree code: K leaves, one leaf assigned to each symbol
The binary codeword for symbol aj is the sequence of bits from the root to the corresponding leaf
Encoding: use a table; decoding: trace the path from root to leaf, output the corresponding symbol; repeat

[Figure: binary tree with branches labeled 0/1 and leaves a1-a4.]

Encoding table:
  a1 -> 00
  a2 -> 1
  a3 -> 010
  a4 -> 011
Performance of Tree Code
Average number of encoded bits per source symbol: let l(aj) = length of codeword for aj

  E[l] = sum over j = 1..K of l(aj) P[aj]
To minimize above expression, assign short codeword to frequent symbols and longer codewords to less frequent symbols
Assume a 5-symbol information source {a, b, c, d, e} with symbol probabilities {1/4, 1/4, 1/4, 1/8, 1/8}

[Figure: binary tree code for the five symbols.]

Symbol  Codeword
  a       00
  b       01
  c       10
  d       110
  e       111

aedbbad... is mapped into 00 111 110 01 01 00 110 ... (17 bits)
Note: decoding is done without commas or spaces
Example
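Table-driven encoding and prefix-free decoding for this example code can be sketched as follows (helper names are mine):

```python
CODE = {"a": "00", "b": "01", "c": "10", "d": "110", "e": "111"}

def encode(symbols, code=CODE):
    """Encoding: look each symbol up in the table."""
    return "".join(code[s] for s in symbols)

def decode(bits, code=CODE):
    """Decoding: extend the current word bit by bit until it matches a
    codeword (a leaf is reached); no commas or spaces are needed."""
    inverse = {w: s for s, w in code.items()}
    out, word = [], ""
    for b in bits:
        word += b
        if word in inverse:
            out.append(inverse[word])
            word = ""
    return "".join(out)

enc = encode("aedbbad")   # 17 bits, as in the slide's example
```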
Finding Good Tree Codes
What is the best code if K = 2? Simple! There is only one tree code: assign 0 or 1 to each of the two symbols
What about K = 3? Assign the longest pair of codewords to the two least frequent symbols
  If you don't, then switching the most frequent symbol to the shortest codeword will reduce the average length
Picking the two least probable symbols is always the best thing to do
[Figure: candidate tree codes for K = 2 (leaves a1, a2) and K = 3 (a1, a2
share the longer codewords; a3 hangs directly off the root).]
Huffman Code
Algorithm for finding the optimum binary tree code for a set of symbols A = {1, 2, ..., K} (denote symbols by index)
Symbol probabilities: {p1, p2, p3, ..., pK}
Basic step: identify the two least probable symbols, say i and j
  Combine them into a new symbol (i,j) with probability pi + pj
  Remove i and j from A and replace them with (i,j)
  The new alphabet A has one fewer symbol
  If A has two symbols, stop; else repeat the basic step
Building the tree code: each time two symbols are combined, join them in the binary tree
Building the tree code by the Huffman algorithm

[Figure: probabilities a = .50, b = .20, c = .15, d = .10, e = .05;
d and e combine into .15, then .15 + .15 = .30, then .20 + .30 = .50,
then .50 + .50 = 1.00.]

The final tree code:
  a -> 0
  b -> 10
  c -> 110
  d -> 1110
  e -> 1111

E[l] = 1(.5) + 2(.20) + 3(.15) + 4(.10 + .05) = 1.95
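The basic step above maps onto a minimal heap-based sketch (names are mine; the exact 0/1 labels may differ from the tree shown, but the codeword lengths, and hence E[l], match):

```python
import heapq
from itertools import count

def huffman_code(probs):
    """Build an optimum prefix code by repeatedly merging the two
    least probable symbols (the basic step above)."""
    tiebreak = count()          # breaks probability ties; nodes never compared
    heap = [(p, next(tiebreak), sym) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, n1 = heapq.heappop(heap)   # least probable
        p2, _, n2 = heapq.heappop(heap)   # next least probable
        heapq.heappush(heap, (p1 + p2, next(tiebreak), (n1, n2)))
    code = {}
    def walk(node, prefix):               # read codewords off the tree
        if isinstance(node, tuple):
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:
            code[node] = prefix or "0"
    walk(heap[0][2], "")
    return code

probs = {"a": .50, "b": .20, "c": .15, "d": .10, "e": .05}
code = huffman_code(probs)
avg = sum(probs[s] * len(code[s]) for s in probs)   # E[l] = 1.95
```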
What is the best performance?
Can we do better? Huffman is optimum, so we cannot do better for A
If we take pairs of symbols, we have a different alphabet
  A' = {aa, ab, ac, ..., ba, bb, ..., ea, eb, ..., ee} with probabilities {(.5)(.5), (.5)(.2), ..., (.05)(.05)}
By taking pairs, triplets, and so on, we can usually improve performance
So what is the best possible performance? The entropy of the source
Entropy of an Information Source
Suppose a source produces symbols from alphabet A = {1, 2, ..., K} with probabilities {p1, p2, p3, ..., pK}, and source outputs are statistically independent of each other. Then the entropy H of the source is the best possible performance:

  H = sum over j = 1..K of pj log2(1/pj)  bits/symbol
Examples
Example 1: source with {.5, .2, .15, .10, .05}

  H = -(.5 ln .5 + .2 ln .2 + .15 ln .15 + .1 ln .1 + .05 ln .05)/ln 2 = 1.923 bits/symbol

Huffman code gave E[l] = 1.95, so it's pretty close to H

Example 2: source with K equiprobable symbols

  H = sum over j = 1..K of (1/K) log2 K = log2 K bits/symbol

Example 3: source with K = 2^m equiprobable symbols

  H = sum over j = 1..2^m of (1/2^m) log2 2^m = m bits/symbol
Fixed-length code with m bits is optimum!
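The entropy formula can be checked against these examples with a short sketch (the function name is mine):

```python
import math

def entropy(probs):
    """H = sum over j of p_j * log2(1/p_j), in bits/symbol."""
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

H = entropy([.5, .2, .15, .10, .05])   # Example 1: ~1.923 bits/symbol
H8 = entropy([1.0 / 8] * 8)            # K = 2^3 equiprobable -> 3 bits/symbol
```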
Run-Length Codes
When one symbol is much more frequent than the rest, block codes don't work well; run-length codes work better
  "Blank" in strings of alphanumeric information: ------$5----3-------------$2--------$3------
  "0" (white) and "1" (black) in fax documents
Parse the symbol stream into runs of the frequent symbol
Apply a Huffman or similar code to encode the lengths of the runs
Binary Run-Length Code 1
Use an m-bit counter to count complete runs up to length 2^m - 2.
If 2^m - 1 consecutive zeros occur, send m 1s (the all-ones codeword) to indicate run > 2^m - 2 and keep counting.

Input (run of zeros + 1)   Run length      Codeword (m = 4)
1                          0               0000
01                         1               0001
001                        2               0010
0001                       3               0011
...                        ...             ...
000...01 (2^m - 2 zeros)   2^m - 2         1110
000...0 (2^m - 1 zeros)    run > 2^m - 2   1111
Example: Code 1, m = 4

Input (137 bits): runs of 25, 57, 36, and 15 zeros, each terminated by a 1
Runs of white (w) / black (b): 15w 10w,b 15w 15w 15w 12w,b 15w 15w 6w,b 15w b
Run-length symbols: >14 10 >14 >14 >14 12 >14 >14 6 >14 0
Codewords: 1111 1010 1111 1111 1111 1100 1111 1111 0110 1111 0000 (44 bits)

Code 1 performance: m / E[R] encoded bits per source bit
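A minimal Code 1 encoder, reproducing the 137-bit to 44-bit example above (the function name is an assumption):

```python
def rle1_encode(bits, m=4):
    """Binary run-length Code 1: each codeword is m bits. A run of L
    zeros ended by a 1 (L <= 2^m - 2) is sent as L in binary; every
    2^m - 1 consecutive zeros are sent as the all-ones codeword."""
    full = 2**m - 1
    out, run = [], 0
    for b in bits:
        if b == "0":
            run += 1
            if run == full:                    # 2^m - 1 zeros: send m ones
                out.append(format(full, f"0{m}b"))
                run = 0
        else:                                  # terminating 1: send the count
            out.append(format(run, f"0{m}b"))
            run = 0
    return "".join(out)

src = "0" * 25 + "1" + "0" * 57 + "1" + "0" * 36 + "1" + "0" * 15 + "1"
enc = rle1_encode(src)   # 44 encoded bits for the 137-bit input above
```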
Binary Run-Length Code 2
When all-zero runs are frequent, encode that event with a single bit to get higher compression.
Each codeword for a terminated run is m + 1 bits: a 1 followed by the m-bit run length; a full block of 2^m zeros with no terminating 1 is encoded as a single 0.

Input (run of zeros + 1)   Run length      Codeword (m = 4)
1                          0               10000
01                         1               10001
001                        2               10010
...                        ...             ...
000...01 (2^m - 1 zeros)   2^m - 1         11111
000...0 (2^m zeros)        run > 2^m - 1   0

Example: Code 2, m = 4

Input (137 bits): runs of 25, 57, 36, and 15 zeros, each terminated by a 1
Runs of white (w) / black (b): 16w 9w,b 16w 16w 16w 9w,b 16w 16w 4w,b 15w,b
Run-length symbols: >15 9 >15 >15 >15 9 >15 >15 4 15
Codewords: 0 11001 0 0 0 11001 0 0 10100 11111 (26 bits)

Code 2 performance: E[l] / E[R] encoded bits per source bit
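The corresponding Code 2 encoder, reproducing the 26-bit result (the function name is an assumption):

```python
def rle2_encode(bits, m=4):
    """Binary run-length Code 2: a run of L zeros ended by a 1 is sent
    as '1' plus L in m bits; a full block of 2^m zeros as a single '0'."""
    block = 2**m
    out, run = [], 0
    for b in bits:
        if b == "0":
            run += 1
            if run == block:                   # all-zero block: 1 bit
                out.append("0")
                run = 0
        else:
            out.append("1" + format(run, f"0{m}b"))
            run = 0
    return "".join(out)

src = "0" * 25 + "1" + "0" * 57 + "1" + "0" * 36 + "1" + "0" * 15 + "1"
enc = rle2_encode(src)   # 26 encoded bits, vs 44 with Code 1
```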
[Figure: fax coding: (a) Huffman code applied to white runs and black
runs; (b) predictive coding: encode differences between consecutive lines.]
Fax Documents Use Run-Length Encoding
CCITT Group 3 facsimile standard
  Default: 1-D Huffman coding of run lengths
  Option: 2-D (predictive) run-length coding
Document                         CCITT G-II   G-III    G-IV
Business letter                  256K         17K      10K
Circuit diagram                  256K         15K      5.4K
Invoice                          256K         31K      14K
Dense text                       256K         54K      35K
Technical paper                  256K         32K      16K
Graph                            256K         23K      8.3K
Dense Japanese text              256K         53.5K    34.6K
Handwriting and simple graphics  256K         26K      10K
Average                          256K         31.4K    16.6K
Compression ratio                1            8.2      15.4
Maximum compression ratio                     17.1     47.4
Note: documents scanned at 200 x 100 pixels/in^2. G-IV intended for documents scanned at 400 x 100 pixels/in^2 and ISDN transmission at 64 kbps.
Adaptive Coding
Adaptive codes provide compression when symbol and pattern probabilities are unknown; essentially, the encoder learns/discovers frequent patterns
Lempel-Ziv algorithm: powerful & popular, incorporated in many utilities
Whenever a pattern is repeated in the symbol stream, it is replaced by a pointer to where it first occurred & a value to indicate the length of the pattern

Original: All tall We all are tall. All small We all are small
Encoded:  All_ta[2,3]We_[6,4]are[4,5]._[1,4]sm[6,15][31,5].
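A toy parser in the spirit of the example above (greedy matching with (offset, length) pointers; real Lempel-Ziv variants differ in the details, and all names here are mine):

```python
def lz77_encode(s, window=4096):
    """Greedy LZ-style parse: emit literal characters, or (offset, length)
    pointers back to the longest earlier match (minimum length 3)."""
    out, i = [], 0
    while i < len(s):
        best_len, best_off = 0, 0
        for j in range(max(0, i - window), i):
            k = 0
            while i + k < len(s) and j + k < i and s[j + k] == s[i + k]:
                k += 1
            if k > best_len:
                best_len, best_off = k, i - j
        if best_len >= 3:                # a pointer only pays off for long matches
            out.append((best_off, best_len))
            i += best_len
        else:
            out.append(s[i])
            i += 1
    return out

def lz77_decode(tokens):
    s = []
    for t in tokens:
        if isinstance(t, tuple):         # copy 'length' chars from 'offset' back
            off, length = t
            for _ in range(length):
                s.append(s[-off])
        else:
            s.append(t)
    return "".join(s)

text = "All tall We all are tall. All small We all are small"
assert lz77_decode(lz77_encode(text)) == text
```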
Chapter 12 Multimedia Information
Compression of Analog Signals
[Figure: the speech signal level varies with time.]
Stream Information
A real-time voice signal must be digitized & transmitted as it is produced
Analog signal level varies continuously in time
[Figure: (a) a sampler converts the analog signal x(t) into samples
x(nT); (b) an interpolation filter reconstructs x(t) from x(nT).]
Nyquist: Perfect reconstruction if sampling rate 1/T > 2Ws
Sampling Theorem
Quantization of Analog Samples
[Figure: a 3 bits/sample uniform quantizer maps input x(nT) to output
y(nT) with representation values -3.5, -2.5, -1.5, -0.5, 0.5, 1.5, 2.5,
3.5; each original sample value is replaced by the closest approximation.]

Quantizer maps input into the closest of 2^m representation values
Quantization error: "noise" = x(nT) - y(nT)
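The quantizer in the figure can be sketched as a uniform mid-rise quantizer (a minimal sketch; the names and the clipping rule are my assumptions):

```python
import math

def midrise_quantize(x, m=3, xmax=4.0):
    """Map x to the closest of 2^m representation values, spaced delta
    apart (for m = 3, xmax = 4: +/-0.5, +/-1.5, +/-2.5, +/-3.5)."""
    delta = 2 * xmax / 2**m
    k = math.floor(x / delta)
    k = max(-(2**(m - 1)), min(2**(m - 1) - 1, k))   # clip out-of-range inputs
    return (k + 0.5) * delta
```

Within range, the quantization error is at most delta/2; more bits per sample shrink delta and hence the noise.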
Bit Rate of Digitized Signal
Bandwidth Ws Hertz: how fast the signal changes Higher bandwidth → more frequent samples Minimum sampling rate = 2 x Ws
Bit Rate = 2 Ws samples/second x m bits/sample
Representation accuracy: range of approximation error Higher accuracy
→ smaller spacing between approximation values
→ more bits per sample
SNR = 6m – 7 dB
Example: Voice & Audio
Telephone voice: Ws = 4 kHz -> 8000 samples/sec, 8 bits/sample
  Rs = 8 x 8000 = 64 kbps
  Cellular phones use more powerful compression algorithms: 8-12 kbps
CD audio: Ws = 22 kHz -> 44000 samples/sec, 16 bits/sample
  Rs = 16 x 44000 = 704 kbps per audio channel
  MP3 uses more powerful compression algorithms: 50 kbps per audio channel
Differential Coding
Successive samples tend to be correlated
Use prediction to get better quality for m bits
[Figure: a smooth signal x(kT) and its successive differences d(kT)
plotted for k = 1..19; the differences are much smaller in magnitude
than the signal itself.]
Differential PCM: quantize the difference between the prediction and the actual signal
The end-to-end error is only the error introduced by the quantizer:

  y(n) - x(n) = x̂(n) + d̃(n) - x(n) = d̃(n) - d(n) = e(n)
[Figure: DPCM encoder and decoder. Encoder: d(n) = x(n) - x̂(n) is
quantized to d̃(n) and sent to the channel; a linear predictor h, driven
by y(n) = x̂(n) + d̃(n), forms the prediction x̂(n). Decoder: the same
linear predictor reconstructs y(n) from the received d̃(n).]
  SNR_DPCM = SNR_PCM + 10 log10 Gp
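A first-order DPCM loop can be sketched to confirm that the end-to-end error equals the quantizer error (all names and the test signal are illustrative assumptions):

```python
def quantize(d, step=0.1):
    """Uniform quantizer for the prediction error (an illustrative choice)."""
    return step * round(d / step)

def dpcm_encode(x, h=1.0, step=0.1):
    """Predict each sample from the previous reconstructed sample and
    quantize only the prediction error d(n) = x(n) - x̂(n)."""
    y_prev, out = 0.0, []
    for xn in x:
        pred = h * y_prev                 # x̂(n)
        dq = quantize(xn - pred, step)    # d̃(n), sent to the channel
        out.append(dq)
        y_prev = pred + dq                # y(n): decoder's reconstruction
    return out

def dpcm_decode(dq, h=1.0):
    y_prev, y = 0.0, []
    for d in dq:
        y_prev = h * y_prev + d
        y.append(y_prev)
    return y

x = [0.0, 0.23, 0.48, 0.61, 0.60, 0.42]
y = dpcm_decode(dpcm_encode(x))
# reconstruction error is bounded by half the quantizer step
```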
Voice Codec Standards
A variety of voice codecs have been standardized for different target bit rates and implementation complexities. These include:
G.711    64 kbps      PCM
G.723.1  5-6 kbps     CELP
G.726    16-40 kbps   ADPCM
G.728    16 kbps      low-delay CELP
G.729    8 kbps       CELP
[Figure: flat quantization noise spectrum Q(f) exceeding the signal
spectrum X(f) at high frequencies, over the band -W to W.]
Transform Coding
Quantization noise in PCM is "white" (flat spectrum); at high frequencies, noise power can be higher than signal power
If coding can produce noise that is "shaped" so that signal power is always higher than noise power, masking effects in the ear result in better subjective quality
Transform coding maps the original signal into a different domain prior to encoding
[Figure: shaped noise spectrum Q(f) kept below the signal spectrum X(f)
across the band -W to W.]
Subband Coding
Subband coding is a form of transform coding
The original signal is decomposed into multiple signals occupying different frequency bands
Each band is PCM or DPCM encoded separately
Each band is allocated bits so that signal power is always higher than noise power in that band
MP3 Audio Coding
MP3 is coding for digital audio in MPEG Uses subband coding
Sampling rate: 16 to 48 kHz @ 16 bits/sample Audio signal decomposed into 32 subbands Fast Fourier transform used for decomposition Bits allocated according to signal power in subbands
Adjustable compression ratio Trade off bitrate vs quality 32 kbps to 384 kbps per audio signal
Chapter 12 Multimedia Information
Image and Video Coding
Image Coding
Two-dimensional signal: variation in intensity in 2 dimensions
RGB color representation
Raw representation requires a very large number of bits
Linear prediction & transform techniques applicable
Joint Photographic Experts Group (JPEG) standard
Transform Coding
Time signal on left side is smooth, that is, it changes slowly with time
If we take its discrete cosine transform (DCT) we find that the non-negligible frequency components are clustered near zero frequency; other components are negligible.
[Figure: (a) 1-D DCT of a smooth time signal x(t): the non-negligible
components of X(f) cluster near zero frequency. (b) 2-D DCT of an 8x8
block of pixel values (n,m): the non-negligible coefficients (u,v)
cluster at low spatial frequencies.]
Take a block of samples from a smooth image If we take two-dimensional DCT, non-negligible
values will cluster near low spatial frequencies (upper left-hand corner)
Image Transform Coding
Sample Image in 8x8 blocks
In image and video coding, the picture array is divided into 8x8 pixel blocks which are coded separately.
Quantized DCT coefficients are scanned in zigzag fashion
Resulting sequence is run-length and variable-length (Huffman) coded
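The zigzag scan order can be generated by walking the anti-diagonals of the block (a sketch; the function name is mine):

```python
def zigzag_order(n=8):
    """Visit an n x n block in zigzag order: coefficients are grouped by
    anti-diagonal (constant row + col), so low spatial frequencies,
    which carry most of the energy, come first."""
    order = []
    for s in range(2 * n - 1):           # s = row + col along each diagonal
        diag = [(i, s - i) for i in range(n) if 0 <= s - i < n]
        order.extend(diag if s % 2 else diag[::-1])
    return order

scan = zigzag_order(8)   # starts at the DC coefficient (0, 0)
```

Scanning the quantized coefficients in this order groups the trailing zeros into long runs, which the run-length and Huffman stages then compress.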
8x8 block of 8-bit pixel values:

  180 150 115 100 100 100 100 100
  250 180 128 100 100 100 100 100
  190 170 120 100 100 100 100 100
  160 130 110 100 100 100 100 100
  110 100 100 100 100 100 100 100
  100 100 100 100 100 100 100 100
  100 100 100 100 100 100 100 100
  100 100 100 100 100 100 100 100

      | DCT + quantization
      v

Quantized DCT coefficients:

  111  22  15   5   1   0   0   0
   14  17  10   4   1   0   0   0
    2   2   1   0   0   0   0   0
   -4  -4  -2  -1   0   0   0   0
   -3  -3  -1   0   0   0   0   0
   -1  -1   0   0   0   0   0   0
    0   0   0   0   0   0   0   0
    0   0   0   0   0   0   0   0
DCT Coding
JPEG Image Coding Standard JPEG defines:
Several coding modes for different applications Quantization matrices for DCT coefficients Huffman VLC coding tables
Baseline DCT/VLC coding gives 5:1 to 30:1 compression
[Figure: JPEG baseline encoder: 8x8 block -> DCT -> quantization (using
quantization matrices) -> VLC coding (Huffman tables; DC coefficients
DPCM/VLI coded, AC coefficients zero-run/VLI coded). The DCT/I-DCT pair
is symmetric.]
[Figure: the same image JPEG-coded at low rate (23.5 kbytes) and high
rate (64.8 kbytes); look for jaggedness along boundaries.]
Video Signal
Sequence of picture frames: each picture digitized & compressed
Frame repetition rate: 10, 30, or 60 frames/second depending on quality
Frame resolution
  Small frames for videoconferencing
  Standard frames for conventional broadcast TV
  HDTV frames
Rate = M bits/pixel x (W x H) pixels/frame x F frames/second
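The rate formula can be checked against the uncompressed figures used later in the chapter (the function name is mine):

```python
def video_bit_rate(bits_per_pixel, width, height, fps):
    """Rate = M bits/pixel x (W x H) pixels/frame x F frames/second."""
    return bits_per_pixel * width * height * fps

# Uncompressed 720x480 broadcast TV at 30 frames/sec, 24 bits/pixel:
rate = video_bit_rate(24, 720, 480, 30)   # ~249 Mbps
```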
A scanned color picture produces 3 color component signals
  [Y]   [0.30  0.59  0.11] [R]
  [I] = [0.60 -0.28 -0.32] [G]
  [Q]   [0.21 -0.52  0.31] [B]

  Y: luminance signal (black & white)
  I, Q: chrominance signals
Color Representation
RGB (Red, Green, Blue)
  Each RGB component has the same bandwidth and dynamic range
YUV: commonly used to mean YCbCr, where Y represents the intensity and Cb and Cr represent chrominance information
  Derived from "color difference" video signals: Y, R-Y, B-Y
  Y = 0.299R + 0.587G + 0.114B
Sampling ratio of Y:Cb:Cr
  Y is typically sampled more finely than Cb & Cr
  4:4:4, 4:2:2, 4:2:0, 4:1:1
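The luma equation above, as a one-liner (the function name is mine):

```python
def luma(r, g, b):
    """Y = 0.299 R + 0.587 G + 0.114 B. The coefficients sum to 1,
    so a gray input (r = g = b) maps to Y = r."""
    return 0.299 * r + 0.587 * g + 0.114 * b
```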
[Figure: typical frame sizes. (a) QCIF videoconferencing, 176x144 pixels
at 30 frames/sec = 760,000 pixels/sec. (b) Broadcast TV, 720x480 pixels
at 30 frames/sec = 10.4 x 10^6 pixels/sec. (c) HDTV, 1920x1080 pixels at
30 frames/sec = 67 x 10^6 pixels/sec.]
Typical Video formats
CIF: Common Interchange Format
  352x288 pixels, 30 frames/second, 4:2:0 sampling
SIF: Source Input Format
  360x242 pixels, 30 frames/second, 4:2:0 sampling
  360x288 pixels, 25 frames/second, 4:2:0 sampling
CCIR-601 (ITU-601)
  720x525 pixels, 30 frames/second, 4:4:4 & 4:2:2 sampling
  720x625 pixels, 25 frames/second, 4:4:4 & 4:2:2 sampling
Video Compression Techniques
Intraframe coding: compression of single image, e.g. JPEG
Interframe coding: compression of the difference between the current image block & a reference block in another frame
  Requires motion compensation
  Prediction: reference frame is in the past
  Interpolation: reference frames are in the past & future
[Figure: hybrid encoder with intra-/inter-frame processor and frame
buffer. For each block at (x,y) in frame Fn, a motion vector points to
the best-matching block in the previous frame Fn-1; the encoder outputs
the motion vector and an error block, or an intra block.]
Find block from previous frame that best matches current block; transmit displacement vector
Encode difference between current & previous block
H.261 Encoder

[Figure: H.261 encoder. 8x8 blocks pass through DCT, quantizer (Q), and
Huffman VLC, then CRC error control and fixed-length framing onto the
p x 64 kbps channel. A feedback loop with inverse quantizer (Q^-1),
I-DCT, frame memory, loop filter, and motion estimation supplies the
intra/inter prediction and the motion vectors.]
Intended for videoconferencing applications; bit rates = p x 64 kbps (p = 2, 6, 24 common)
Video Codecs: H.263
Frame-based coding
Low bit rate coding: < 64 kbps (typical)
H.261 coding with improvements: I/P/B frames; additional image formats: 4CIF, 16CIF
Suitable for desktop videoconferencing over low-speed links
MPEG Coding Standard
Moving Picture Experts Group (MPEG): video and audio compression & multiplexing; video display controls
  Fast forward, reverse, random access
Elements of encoding:
  Intra- and inter-frame coding using DCT
  Bidirectional motion compensation
  Group-of-pictures structure
  Scalability options
MPEG only standardizes the decoder
MPEG Video Block Diagram
[Figure: MPEG video encoder/decoder block diagram.]

DCT: Discrete Cosine Transform   FS: Frame Store   MC: Motion Compensation
VB: Variable Buffer   VLC: Variable-length coding   VLD: Variable-length decoding
MPEG Motion Compensation

[Figure: 1-D examples. Linear prediction uses the previous frame Fn-1 to
predict Fn; interpolation (bidirectional MC) uses both Fn-1 and Fn+1.
Macroblocks may be intra, forward, backward, or bidirectionally
predicted; individual samples are quantized.]
Bidirectional Motion Compensation

[Figure: group-of-pictures structure with 16x16 bidirectional
macroblocks. Frames in order: Intra 0, B1, B2, Pred 3, B4, B5, Pred 6,
B7, B8, Intra 9, B10, B11, Pred 12. Arrows show intra, forward, reverse,
and bidirectional prediction.]
Group of Picture Structure
I-frames: intraframe coded, for random access; lowest compression
P-frames: predictively encoded from the most recent I- or P-frame; medium compression
B-frames: interpolated from the most recent & subsequent I- or P-frame; highest compression
MPEG-2 Scalability Modes
Scalability modes
  Data partitioning: separate headers and payloads
  SNR (signal-to-noise ratio): different levels of quality
  Temporal: different frame rates
  Spatial: different resolutions
Limited scalability capabilities: three layers only
[Figure: spatial scalability: decoding the base stream alone yields a
small image; decoding base + enhancement streams yields a large image.
SNR scalability: the base stream gives low quality; base + enhancement
gives high quality.]
MPEG Scalability
MPEG Versions
MPEG-1
  For video storage on CD-ROM & transmission over T-1 lines (1.5 Mbps)
MPEG-2
  Many options: 352x240, 720x480, 1440x1152, 1920x1080 pixels
  Many profiles (sets of coding tools & parameters)
  Main Profile: I, P & B frames; 720x480 conventional TV; very good quality @ 4-6 Mbps
MPEG-4
  <64 kbps to 4 Mbps
  Designed to enable viewing, access & manipulation of objects, not only pixels
  For digital TV, streaming video, mobile multimedia & games
MPEG Systems and Multiplex
Provides packetization and multiplexing for audio/video elementary streams
Provides timing and error control information
MPEG-1 systems:
  System streams: long variable-size packets, suitable for error-free environments
MPEG-2 systems:
  Transport streams: short fixed-size packets, suitable for error-prone environments
  Program streams: long variable-size packets, suitable for relatively error-free environments
[Figure: MPEG systems layer. Audio and video encoder outputs are
packetized into PES streams, which feed either a program stream
multiplexer or a transport stream multiplexer.]
MPEG-2 Multiplexing
Packetized Elementary Streams (PES)
  Packet length, presentation & decoding timestamps, bit rate
  For lip-synch & clock recovery
Program streams: for error-free environments
Transport streams: for error-prone environments
Digital Video Summary
Type             Method   Format                           Original    Compressed
Videoconference  H.261    176x144 or 352x288 pixels        2-36 Mbps   64-1544 kbps
                          @ 10-30 frames/sec
Full motion      MPEG2    720x480 pixels @ 30 frames/sec   249 Mbps    2-6 Mbps
HDTV             MPEG2    1920x1080 pixels @ 30 frames/sec 1.6 Gbps    19-38 Mbps