# Audio/Video Compression

TRANSCRIPT

- Lecture 3: Multimedia Networks
- Lecture 4: Audio/Video Compression
- Image & Video Compression Standards
- Speech & Audio Compression Standards
- Wavelet Transform & its Application in Compression

Introduction to Audio/Video Compression

With today's technology, storing and transmitting digital audio/video streams is practical only with compression.

Compression exploits redundancy in the signal, guided by the characteristics of human perception.

Spatial redundancy: values of neighboring pixels are strongly correlated in natural images.

Temporal redundancy: adjacent frames in a video sequence often show very little change; in audio, a strong signal in a given time segment can mask lower-level distortion in preceding and following segments.

Spectral redundancy: in multispectral images, the spectral values of the same pixel are correlated across spectral bands; in audio, a signal can completely mask a sufficiently weaker signal in its frequency vicinity.

Redundancy across scale: distinct image features are invariant under scaling.

Redundancy in stereo: stereo images and stereo audio channels are correlated.

- Spatial/spectral redundancies: transform coding
- Temporal redundancy: DPCM (differential pulse code modulation), motion estimation/motion compensation
- First compression methods: lossless
  - Huffman coding
  - Ziv-Lempel coding
  - Arithmetic coding
- Lossless methods alone are inadequate for transmission media of low bandwidth (e.g., ISDN) or for devices of low data throughput (e.g., CD-ROM)
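As an illustration of the lossless methods listed above, the following is a minimal sketch of Huffman coding. The function names (`huffman_code`, `encode`, `decode`) and the sample message are hypothetical and chosen only for this example; the slides do not prescribe an implementation.

```python
import heapq
from collections import Counter

def huffman_code(symbols):
    """Build a prefix code (symbol -> bit string) from symbol frequencies."""
    freq = Counter(symbols)
    if len(freq) == 1:  # a single distinct symbol still needs one bit
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, tie_breaker, {symbol: partial code}).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        w0, _, c0 = heapq.heappop(heap)   # two least probable subtrees
        w1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + b for s, b in c0.items()}
        merged.update({s: "1" + b for s, b in c1.items()})
        heapq.heappush(heap, (w0 + w1, counter, merged))
        counter += 1
    return heap[0][2]

def encode(symbols, code):
    return "".join(code[s] for s in symbols)

def decode(bits, code):
    inverse = {b: s for s, b in code.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:            # prefix property: first match is the symbol
            out.append(inverse[current])
            current = ""
    return out

message = list("aaaabbc")
code = huffman_code(message)
bits = encode(message, code)
restored = decode(bits, code)
```

Frequent symbols receive short code words, so the seven-symbol message needs fewer bits than a fixed two-bit code would, and decoding recovers it exactly.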

- Lossless vs. lossy compression
- Intraframe vs. interframe compression
- Symmetrical vs. asymmetrical compression
- Real-time: encoding-decoding delay <= 50 ms
- Scalable: frames coded at different resolutions or quality levels
- Recent advanced compression methods reduce bandwidth requirements enormously without reducing perceived quality

- Entropy coding: arithmetic coding, Huffman coding, run-length coding
- Source coding: DPCM, DCT, DWT, motion estimation/motion compensation
- Hybrid coding: H.261, H.263, H.263+, JPEG, MPEG1, MPEG2, MPEG4, Perceptual Audio Coder

Hybrid coding = source coding + entropy coding:

Uncompressed data -> Preprocessing -> Source coding -> Entropy coding -> Compressed data
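The chain "source coding followed by entropy coding" can be sketched with two of the simplest techniques named above: DPCM as the source coder and run-length coding as the entropy coder. This is only an illustrative, lossless toy pipeline; real hybrid coders add quantization and far more elaborate stages, and all function names here are hypothetical.

```python
def dpcm_encode(samples):
    """Source coding step: send each sample as its difference from the previous one."""
    previous, residuals = 0, []
    for s in samples:
        residuals.append(s - previous)
        previous = s
    return residuals

def dpcm_decode(residuals):
    previous, samples = 0, []
    for r in residuals:
        previous += r
        samples.append(previous)
    return samples

def rle_encode(values):
    """Entropy coding step: collapse runs into [value, run_length] pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return runs

def rle_decode(runs):
    return [v for v, n in runs for _ in range(n)]

signal = [10, 10, 10, 10, 12, 14, 16, 18, 18, 18]
residuals = dpcm_encode(signal)          # slowly varying signal -> repeated residuals
runs = rle_encode(residuals)             # repeated residuals -> few runs
restored = dpcm_decode(rle_decode(runs))
```

DPCM turns the smooth ramp into repeated differences, which the run-length stage then stores as four pairs instead of ten samples.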

Wavelet Theory

- A unified framework for the analysis of non-stationary signals
- Wavelet transform (WT): an alternative to the classical Short-Time Fourier Transform (STFT), or Gabor transform
- In contrast to the STFT, the WT performs a “constant-Q” (relative bandwidth) frequency analysis: short windows at high frequencies and long windows at low frequencies

Short-Time Fourier Transform

Fourier Transform (FT):

$$X(f) = \int_{-\infty}^{\infty} x(t)\, e^{-j 2\pi f t}\, dt = \langle x(t),\, e^{j 2\pi f t} \rangle$$

- $X(f)$: projection of the signal $x(t)$ along $e^{j 2\pi f t}$
- It shows how the signal energy is distributed over frequencies

To obtain the local energy distribution, the STFT is introduced:

$$\mathrm{STFT}(\tau, f) = \int x(t)\, g^*(t - \tau)\, e^{-j 2\pi f t}\, dt$$

- $g(t)$: a window of finite support
- It shows how the signal energy is distributed over frequencies around the local time $\tau$
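The STFT definition above can be approximated numerically by a Riemann sum over a sampled signal. The sketch below is an assumption-laden illustration: the Hann window, the sampling rate `fs`, and the two-tone test signal are chosen for the example and do not come from the slides.

```python
import cmath
import math

def hann(length):
    """Symmetric Hann window (an illustrative choice for g(t))."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * k / (length - 1)) for k in range(length)]

def stft(x, window, tau, f, fs):
    """Riemann-sum approximation of STFT(tau, f) for samples x taken at rate fs.

    The real window is centred on sample index tau, so g*(t - tau) = g(t - tau).
    """
    half = len(window) // 2
    total = 0j
    for k, g in enumerate(window):
        n = tau - half + k
        if 0 <= n < len(x):
            total += x[n] * g * cmath.exp(-2j * math.pi * f * n / fs)
    return total / fs

fs = 100.0
# 5 Hz tone during the first second, 20 Hz tone during the second one.
x = [math.cos(2 * math.pi * (5 if n < 100 else 20) * n / fs) for n in range(200)]
g = hann(41)

early_5, early_20 = abs(stft(x, g, 50, 5, fs)), abs(stft(x, g, 50, 20, fs))
late_5, late_20 = abs(stft(x, g, 150, 5, fs)), abs(stft(x, g, 150, 20, fs))
```

Around $\tau$ = 0.5 s the energy sits at 5 Hz and around $\tau$ = 1.5 s at 20 Hz, which a plain Fourier transform of the whole signal could not tell apart in time.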

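For a given $f$, $\mathrm{STFT}(\tau, f)$ is the output of a bandpass filter whose impulse response is the window function modulated to $f$.

Resolution in time and frequency is set by the window $g(t)$, with $G(f)$ its Fourier transform:

$$\Delta t^2 = \frac{\int t^2\, |g(t)|^2\, dt}{\int |g(t)|^2\, dt}, \qquad \Delta f^2 = \frac{\int f^2\, |G(f)|^2\, df}{\int |G(f)|^2\, df}$$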

Uncertainty Principle (Heisenberg):

$$\Delta t\, \Delta f \ge \frac{1}{4\pi}$$

Once the window $g(t)$ is chosen, the resolution in time and frequency is fixed.
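The bound can be checked numerically for a Gaussian window, which is known to attain it. The sketch below evaluates the two resolution integrals by Riemann sums; the grid sizes and the value of `sigma` are arbitrary choices made for this illustration.

```python
import cmath
import math

def spread(axis, density):
    """RMS width sqrt(sum u^2 p(u) / sum p(u)) of a density centred at zero."""
    numerator = sum(u * u * p for u, p in zip(axis, density))
    return math.sqrt(numerator / sum(density))

sigma = 1.0
dt, df = 0.05, 0.02
t = [k * dt for k in range(-160, 161)]           # time grid, -8 .. 8
f = [k * df for k in range(-100, 101)]           # frequency grid, -2 .. 2

g = [math.exp(-u * u / (2 * sigma ** 2)) for u in t]          # Gaussian window g(t)
# G(f) = integral of g(t) exp(-j 2 pi f t) dt, approximated by a Riemann sum.
G = [sum(gv * cmath.exp(-2j * math.pi * fv * tv) for gv, tv in zip(g, t)) * dt
     for fv in f]

delta_t = spread(t, [abs(v) ** 2 for v in g])
delta_f = spread(f, [abs(v) ** 2 for v in G])
product = delta_t * delta_f                      # compare with 1 / (4 pi)
```

Changing `sigma` trades time resolution against frequency resolution while the product stays at the lower bound.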

Continuous Wavelet Transform (CWT)

If $\Delta f / f$ can be kept constant, the resolution in frequency becomes arbitrarily good at low frequencies while the resolution in time becomes arbitrarily good at high frequencies.

The CWT follows this idea, but all impulse responses of the filter bank are defined as scaled versions of the same prototype, or basic wavelet, $h(t)$.

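Let

- $a > 0$: scale (related to frequency)
- $h(t)$: any bandpass function

$$h_a(t) = \frac{1}{\sqrt{a}}\, h\!\left(\frac{t}{a}\right)$$

$$\mathrm{CWT}(\tau, a) = \int x(t)\, h_a^*(t - \tau)\, dt = \frac{1}{\sqrt{a}} \int x(t)\, h^*\!\left(\frac{t - \tau}{a}\right) dt$$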
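A direct Riemann-sum evaluation of this definition is sketched below. The Mexican-hat function is used as the bandpass prototype $h(t)$ purely as an assumption for the example (the slides allow any bandpass function), and the test signal is a copy of the wavelet at scale 2 centred at $t = 5$.

```python
import math

def mexican_hat(t):
    """Real bandpass prototype h(t); an illustrative choice of basic wavelet."""
    return (1.0 - t * t) * math.exp(-t * t / 2.0)

def cwt(x, t, tau, a, h=mexican_hat):
    """CWT(tau, a) = (1/sqrt(a)) * integral of x(t) h*((t - tau)/a) dt.

    h is real here, so the complex conjugate has no effect.
    """
    dt = t[1] - t[0]
    return sum(xv * h((tv - tau) / a) for xv, tv in zip(x, t)) * dt / math.sqrt(a)

t = [-15.0 + 0.05 * k for k in range(801)]            # time grid, -15 .. 25
x = [mexican_hat((tv - 5.0) / 2.0) for tv in t]       # wavelet-shaped pulse, scale 2

scales = [0.5, 1.0, 2.0, 4.0, 8.0]
responses = {a: abs(cwt(x, t, 5.0, a)) for a in scales}
best_scale = max(responses, key=responses.get)
```

Because of the $1/\sqrt{a}$ normalization every $h_a$ has the same energy, so the response is largest at the scale that matches the pulse.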

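FT of $h_a(t)$:

$$
\begin{aligned}
H_a(f) &= \int h_a(t)\, e^{-j 2\pi f t}\, dt \\
&= \frac{1}{\sqrt{a}} \int h\!\left(\frac{t}{a}\right) e^{-j 2\pi f t}\, dt \\
&= \sqrt{a} \int h(u)\, e^{-j 2\pi (a f) u}\, du \\
&= \sqrt{a}\, H(a f)
\end{aligned}
$$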

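Resolution in frequency of $h_a(t)$:

$$
\begin{aligned}
\Delta f_a^2 &= \frac{\int f^2\, |H_a(f)|^2\, df}{\int |H_a(f)|^2\, df}
= \frac{\int f^2\, |H(a f)|^2\, df}{\int |H(a f)|^2\, df} \\
&= \frac{1}{a^2} \cdot \frac{\int f^2\, |H(f)|^2\, df}{\int |H(f)|^2\, df}
= \frac{c^2}{a^2}
\end{aligned}
$$

where $c$ is the frequency resolution $\Delta f$ of the basic wavelet $h(t)$, so $\Delta f_a = c / a$.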

Given a fixed frequency $f_0$, if the scale $a$ is chosen as

$$a = \frac{f_0}{f}$$

then

$$\frac{\Delta f_a}{f} = \frac{c}{a f} = \frac{c}{f_0} = \text{const}$$

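By definition of the CWT:

$$
\begin{aligned}
\mathrm{CWT}(\tau, a) &= \int x(t)\, h_a^*(t - \tau)\, dt \\
&= \frac{1}{\sqrt{a}} \int x(t)\, h^*\!\left(\frac{t - \tau}{a}\right) dt \\
&= \sqrt{a} \int x(a t)\, h^*\!\left(t - \frac{\tau}{a}\right) dt
\end{aligned}
$$

The scale $a$ is not linked to frequency modulation but to time-scaling.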

- The signal $x(at)$ is seen through a constant-length filter centered at $\tau / a$
- The larger the scale $a$, the more contracted the signal $x(at)$ becomes
- The smaller the scale $a$, the more dilated the signal $x(at)$ becomes
- Larger scales: $\mathrm{CWT}(\tau, a)$ provides a more global view of the signal $x(t)$
- Smaller scales: $\mathrm{CWT}(\tau, a)$ provides a more detailed view of the signal $x(t)$

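Define the wavelet $h_{a,\tau}$:

$$h_{a,\tau}(t) = \frac{1}{\sqrt{a}}\, h\!\left(\frac{t - \tau}{a}\right)$$

$$\mathrm{CWT}(\tau, a) = \langle x,\, h_{a,\tau} \rangle$$

- $\langle x, h_{a,\tau} \rangle$: inner product, or correlation, between $x(t)$ and $h_{a,\tau}(t)$
- $\mathrm{CWT}(\tau, a)$ is called the analysis stage (of the signal $x(t)$) at scale $a$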

$x(t)$ can be recovered from the multi-scale analysis if $h(t)$ has finite energy and $\int h(t)\, dt = 0$:

$$x(t) = c \int_{a > 0} \int \mathrm{CWT}(\tau, a)\, h_{a,\tau}(t)\, \frac{da\, d\tau}{a^2}$$

where $c$ is a constant that depends only on $h(t)$.

Energy conservation:

$$\|x\|^2 = \int_{a > 0} \int |\mathrm{CWT}(\tau, a)|^2\, \frac{d\tau\, da}{a^2}$$

Signal energy distributed at scale $a$:

$$\frac{1}{a^2} \int |\mathrm{CWT}(\tau, a)|^2\, d\tau$$

$|\mathrm{CWT}(\tau, a)|^2$: wavelet spectrogram, or scalogram; the distribution of signal energy in the time-scale plane (associated with the area measure $d\tau\, da / a^2$).

- Larger scales -> more global view -> coarser resolutions
- Smaller scales -> more detailed view -> finer resolutions
- CWT -> decomposition of the signal over scales -> signal energy distribution at various resolutions

Discrete Wavelet Transform (DWT)

Two methods were developed independently in the late 1970s and early 1980s:

- Subband coding
- Pyramid coding, or multiresolution signal analysis

Multiresolution Pyramid

Given an original sequence $x(n)$, $n \in \mathbb{Z}$, define a lower resolution signal:

$$y(n) = (g * x)(2n) = \sum_k g(k)\, x(2n - k)$$

where $g(n)$ is a halfband lowpass filter.

An approximation of $x(n)$ from $y(n)$:

$$a(n) = (g' * y')(n) = \sum_k g'(k)\, y'(n - k)$$

where $y'(2n) = y(n)$, $y'(2n+1) = 0$, and $g'(n)$ is an interpolation filter.

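If $g(n)$ and $g'(n)$ are perfect halfband filters, i.e.,

$$G(e^{j\omega}) = G'(e^{j\omega}) = \begin{cases} 1, & |\omega| < \pi/2 \\ 0, & -\pi \le \omega \le -\pi/2 \ \text{or}\ \pi/2 \le \omega \le \pi \end{cases}$$

then $a(n)$ provides a perfect halfband lowpass approximation to $x(n)$.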

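It can be proved that:

$$Y(z) = \tfrac{1}{2}\left[ G(z^{1/2})\, X(z^{1/2}) + G(-z^{1/2})\, X(-z^{1/2}) \right]$$

$$A(z) = G'(z)\, Y(z^2)$$

$$A(e^{j\omega}) = \begin{cases} X(e^{j\omega}), & |\omega| < \pi/2 \\ 0, & -\pi \le \omega \le -\pi/2 \ \text{or}\ \pi/2 \le \omega \le \pi \end{cases}$$

The last line assumes that the passband gain of the interpolation filter compensates for the factor $1/2$ introduced by downsampling.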

Let $d(n) = x(n) - a(n)$.

Then $x(n) = a(n) + d(n)$.

But there is redundancy between $a(n)$ and $d(n)$: if $x(n)$ uses sampling rate $f_s$, then $d(n)$ and $y(n)$ use sampling rates $f_s$ and $f_s/2$, respectively.
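One pyramid stage can be sketched as follows. The three-tap lowpass `g` and the linear-interpolation filter `g_interp` are simple stand-ins assumed for this example (they are far from ideal halfband filters), and zero padding is used at the signal borders.

```python
def convolve_same(x, kernel):
    """Centred convolution with zero padding; kernel must have odd length."""
    half = len(kernel) // 2
    out = []
    for n in range(len(x)):
        acc = 0.0
        for k, c in enumerate(kernel):
            m = n - (k - half)
            if 0 <= m < len(x):
                acc += c * x[m]
        out.append(acc)
    return out

def pyramid_stage(x, g=(0.25, 0.5, 0.25), g_interp=(0.5, 1.0, 0.5)):
    y = convolve_same(x, g)[::2]              # y(n) = (g * x)(2n)
    y_up = [0.0] * len(x)
    y_up[::2] = y                             # y'(2n) = y(n), y'(2n+1) = 0
    a = convolve_same(y_up, g_interp)         # a(n) = (g' * y')(n)
    d = [xv - av for xv, av in zip(x, a)]     # d(n) = x(n) - a(n)
    return y, d, a

x = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
y, d, a = pyramid_stage(x)
reconstructed = [av + dv for av, dv in zip(a, d)]
stored_samples = len(y) + len(d)              # half-rate y plus full-rate d
```

Reconstruction is exact for any filter pair because $d(n)$ stores whatever $a(n)$ misses; the price is that `stored_samples` is 1.5 times the input length.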

Pyramid decomposition: a redundant representation.

But the redundancy is upper bounded by $1 + 1/2 + 1/4 + \dots < 2$ in a one-dimensional system.

[Figure: pyramid decomposition of $x(n)$ into lower resolution signals $y(n)$, $y_1(n)$ and difference signals $d(n)$, $d_1(n)$]
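The bound can be made concrete by counting samples. The helper below is a hypothetical illustration that assumes the input length is divisible by $2^{\text{levels}}$: each level keeps one difference signal at the current rate, and the last level also keeps the coarse signal.

```python
def pyramid_sample_count(n, levels):
    """Samples stored by a 1-D pyramid: one difference signal per level plus the final coarse signal."""
    total, size = 0, n
    for _ in range(levels):
        total += size      # difference signal at the current sampling rate
        size //= 2         # the next level runs at half the rate
    return total + size    # add the coarsest lowpass signal

n = 1024
counts = {levels: pyramid_sample_count(n, levels) for levels in (1, 2, 3, 10)}
ratios = {levels: count / n for levels, count in counts.items()}
```

For $J$ levels the ratio is $2 - 2^{-J}$, so it approaches but never reaches 2.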

For perfect halfband lowpass filters $g(n)$ and $g'(n)$, it is clear that $d(n)$ contains the frequencies of $x(n)$ above $\pi/2$, and thus can also be subsampled by two without loss of information.

In a pyramid, it is possible to use very good lowpass filters and derive visually pleasing coarse versions.

In a subband scheme, critical sampling is achieved at the price of a constrained filter design and a relatively poor lowpass version as the coarse approximation. This is undesirable if the coarse version is used for viewing in a compatible subchannel.

Subband Coding

- One stage of a pyramid decomposition -> a half-rate low resolution signal + a full-rate difference signal
- The number of samples is increased by 50%
- If the filters $g(n)$ and $g'(n)$ meet certain conditions, this oversampling can be avoided
- Subband coding, first popularized in speech compression, does not produce such redundancy

- A full-band one-dimensional signal is decomposed into two subbands using an analysis filter bank
- Ideally, the analysis filter bank consists of a lowpass filter and a highpass filter with nonoverlapping frequency responses and unit gain over their respective bandwidths
- After filtering, the lowpass and highpass signals each have only half of the original bandwidth, or “frequency content”, and thus can be downsampled by two
- But ideal filters are unrealizable

- By using overlapping responses, frequency gaps in the subband signals can be prevented
- The lowpass and highpass signals will then each have a bandwidth of more than half the original bandwidth
- Aliasing is therefore introduced when the lowpass and highpass signals are downsampled by two
- The aliasing can be cancelled at the synthesis stage to produce perfect reconstruction
- Quadrature Mirror Filters (QMF) are used for analysis/synthesis filtering

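Output signals from the analysis bank after downsampling:

$$y_1(n) = (h_1 * x)(2n), \qquad y_2(n) = (h_2 * x)(2n)$$

$$Y_1(z) = \tfrac{1}{2}\left[ H_1(z^{1/2})\, X(z^{1/2}) + H_1(-z^{1/2})\, X(-z^{1/2}) \right]$$

$$Y_2(z) = \tfrac{1}{2}\left[ H_2(z^{1/2})\, X(z^{1/2}) + H_2(-z^{1/2})\, X(-z^{1/2}) \right]$$

After quantization, $y_1(n)$ and $y_2(n)$ become $\hat{y}_1(n)$ and $\hat{y}_2(n)$.

After upsampling, $\hat{y}_1(n)$ and $\hat{y}_2(n)$ become:

$$\tilde{y}_1(2n) = \hat{y}_1(n), \quad \tilde{y}_1(2n+1) = 0$$

$$\tilde{y}_2(2n) = \hat{y}_2(n), \quad \tilde{y}_2(2n+1) = 0$$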

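Output signals from the synthesis bank:

$$(\tilde{y}_1 * g_1)(n), \qquad (\tilde{y}_2 * g_2)(n)$$

Reconstructed signal:

$$\hat{x}(n) = (\tilde{y}_1 * g_1)(n) + (\tilde{y}_2 * g_2)(n)$$

$$\hat{X}(z) = \tilde{Y}_1(z)\, G_1(z) + \tilde{Y}_2(z)\, G_2(z) = \hat{Y}_1(z^2)\, G_1(z) + \hat{Y}_2(z^2)\, G_2(z)$$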

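Ignoring quantization or coding effects,

$$\hat{Y}_1(z) = Y_1(z), \qquad \hat{Y}_2(z) = Y_2(z)$$

$$
\begin{aligned}
\hat{X}(z) &= Y_1(z^2)\, G_1(z) + Y_2(z^2)\, G_2(z) \\
&= \tfrac{1}{2}\left[ H_1(z) G_1(z) + H_2(z) G_2(z) \right] X(z)
 + \tfrac{1}{2}\left[ H_1(-z) G_1(z) + H_2(-z) G_2(z) \right] X(-z)
\end{aligned}
$$

If $H_1(z)$, $G_1(z)$ are ideal lowpass filters and $H_2(z)$, $G_2(z)$ are ideal highpass filters,

$$H_1(e^{j\omega})\, G_1(e^{j\omega}) = \begin{cases} 1, & |\omega| < \pi/2 \\ 0, & \pi/2 \le |\omega| \le \pi \end{cases}$$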

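$$H_2(e^{j\omega})\, G_2(e^{j\omega}) = \begin{cases} 0, & |\omega| < \pi/2 \\ 1, & \pi/2 \le |\omega| \le \pi \end{cases}$$

Then, for all $\omega$,

$$H_1(e^{j\omega})\, G_1(e^{j\omega}) + H_2(e^{j\omega})\, G_2(e^{j\omega}) = 1$$

$$H_1(e^{j(\omega+\pi)})\, G_1(e^{j\omega}) + H_2(e^{j(\omega+\pi)})\, G_2(e^{j\omega}) = 0$$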

Implying

$$H_1(z) G_1(z) + H_2(z) G_2(z) = 1$$

$$H_1(-z) G_1(z) + H_2(-z) G_2(z) = 0$$

$$\hat{X}(z) = \tfrac{1}{2} X(z)$$

This indicates that $\tfrac{1}{2}\left[ H_1(-z) G_1(z) + H_2(-z) G_2(z) \right] X(-z)$ is the aliasing component when the filters are not ideal, which is desired to be zero.

To have perfect reconstruction in the non-ideal filtering case, the necessary and sufficient conditions are:

$$H_1(z) G_1(z) + H_2(z) G_2(z) = 2$$

$$H_1(-z) G_1(z) + H_2(-z) G_2(z) = 0$$

If $H_2(z) = H_1(-z)$, $G_1(z) = 2 H_1(z)$, $G_2(z) = -2 H_1(-z)$, the aliased term becomes zero and the reconstructed signal $\hat{X}(z)$ is given by:

$$\hat{X}(z) = \left[ H_1^2(z) - H_2^2(z) \right] X(z) = \left[ H_1^2(z) - H_1^2(-z) \right] X(z)$$
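The alias cancellation of this filter choice can be verified with polynomial arithmetic in $z^{-1}$. In the sketch below the two-tap lowpass $H_1(z) = (1 + z^{-1})/2$ is an assumed example filter, and the helper names are hypothetical.

```python
def poly_mul(a, b):
    """Multiply two polynomials in z^-1 given as coefficient lists."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, av in enumerate(a):
        for j, bv in enumerate(b):
            out[i + j] += av * bv
    return out

def poly_add(a, b):
    return [u + v for u, v in zip(a, b)]

def at_minus_z(a):
    """Coefficients of A(-z): the odd powers of z^-1 change sign."""
    return [c if k % 2 == 0 else -c for k, c in enumerate(a)]

h1 = [0.5, 0.5]                      # H1(z) = (1 + z^-1) / 2, assumed for illustration
h2 = at_minus_z(h1)                  # H2(z) = H1(-z)
g1 = [2 * c for c in h1]             # G1(z) = 2 H1(z)
g2 = [-2 * c for c in h2]            # G2(z) = -2 H1(-z)

# 1/2 [H1 G1 + H2 G2] multiplies X(z); 1/2 [H1(-z) G1 + H2(-z) G2] multiplies X(-z).
transfer = [0.5 * c for c in poly_add(poly_mul(h1, g1), poly_mul(h2, g2))]
alias = [0.5 * c for c in poly_add(poly_mul(at_minus_z(h1), g1),
                                   poly_mul(at_minus_z(h2), g2))]
```

For this filter the alias polynomial is identically zero and the transfer polynomial is the pure delay $z^{-1}$, so the output is the input delayed by one sample.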

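For perfect reconstruction, we need

$$H_1^2(z) - H_2^2(z) = 1$$

or

$$T(e^{j\omega}) = H_1^2(e^{j\omega}) - H_2^2(e^{j\omega}) = 1$$

Using a symmetric linear-phase FIR filter of length $N$ for $H_1$ results in

$$T(e^{j\omega}) = \left[ |H_1(e^{j\omega})|^2 - (-1)^{N-1}\, |H_1(e^{j(\omega+\pi)})|^2 \right] e^{-j\omega(N-1)}$$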

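For even $N$,

$$T(e^{j\omega}) = \left[ |H_1(e^{j\omega})|^2 + |H_2(e^{j\omega})|^2 \right] e^{-j\omega(N-1)}$$

QMF filters:

$$|T(e^{j\omega})| = |H_1(e^{j\omega})|^2 + |H_2(e^{j\omega})|^2 = 1 \quad \text{for QMF}$$

[Figure: magnitude responses $|H_1(e^{j\omega})|$ and $|H_2(e^{j\omega})|$ over $0 \le \omega \le \pi$, crossing at $\omega = \pi/2$]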

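If the subband filters $H_i(z)$, $G_i(z)$ satisfy the three conditions

$$H_i(z) = G_i(z^{-1}), \quad i = 1, 2$$

$$H_2(z) = z^{-1} H_1(-z^{-1})$$

$$H_1(z) G_1(z) + H_2(z) G_2(z) = 2$$

perfect reconstruction results, too. The aliased term vanishes:

$$
\begin{aligned}
H_1(-z) G_1(z) + H_2(-z) G_2(z)
&= H_1(-z) H_1(z^{-1}) + H_2(-z) H_2(z^{-1}) \\
&= H_1(-z) H_1(z^{-1}) - z^{-1} H_1(z^{-1}) \cdot z\, H_1(-z) \\
&= 0
\end{aligned}
$$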
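A two-channel filter bank meeting these three conditions can be sketched with the Haar pair, which is assumed here only because it is the shortest example: $H_1(z) = (1 + z^{-1})/\sqrt{2}$, $H_2(z) = z^{-1} H_1(-z^{-1})$ and $G_i(z) = H_i(z^{-1})$. Filters are stored as index-to-coefficient maps so that the anticausal synthesis taps can be expressed, and the signal is extended periodically.

```python
import math

def circular_filter(x, taps):
    """(taps * x)(n) = sum_k taps[k] x(n - k), with periodic extension of x."""
    size = len(x)
    return [sum(c * x[(n - k) % size] for k, c in taps.items()) for n in range(size)]

s = 1 / math.sqrt(2)
h1 = {0: s, 1: s}                        # H1(z) = (1 + z^-1) / sqrt(2)
h2 = {0: -s, 1: s}                       # H2(z) = z^-1 H1(-z^-1)
g1 = {-k: c for k, c in h1.items()}      # G1(z) = H1(z^-1)
g2 = {-k: c for k, c in h2.items()}      # G2(z) = H2(z^-1)

def analyze(x):
    """Filter by h1, h2 and keep the even-indexed outputs: y_i(n) = (h_i * x)(2n)."""
    return circular_filter(x, h1)[::2], circular_filter(x, h2)[::2]

def upsample(y):
    out = [0.0] * (2 * len(y))
    out[::2] = y                         # y~(2n) = y(n), y~(2n+1) = 0
    return out

def synthesize(y1, y2):
    low = circular_filter(upsample(y1), g1)
    high = circular_filter(upsample(y2), g2)
    return [u + v for u, v in zip(low, high)]

x = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
y1, y2 = analyze(x)
x_hat = synthesize(y1, y2)
```

The two subband signals together hold exactly as many samples as the input (critical sampling), and the synthesis bank returns the input without delay.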

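Multiresolution Wavelet Representation and Approximation

Embedded linear spaces in $L^2(\mathbb{R})$:

$$\forall j \in \mathbb{Z}: \quad V_j \subset V_{j+1} \subset L^2(\mathbb{R})$$

Let $A_j$ be the orthogonal projection on $V_j$:

$$\forall g(x) \in V_j,\ \forall f(x) \in L^2(\mathbb{R}): \quad \|g - f\| \ge \|A_j f - f\|, \qquad A_j A_j = A_j$$

Let $O_j$ be the orthogonal complement of $V_j$ in $V_{j+1}$:

$$V_{j+1} = O_j \oplus V_j$$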

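Let $D_j$ be the orthogonal projection on $O_j$:

$$\forall g(x) \in O_j,\ \forall f(x) \in L^2(\mathbb{R}): \quad \|g - f\| \ge \|D_j f - f\|, \qquad D_j D_j = D_j$$

Then an original signal $A_0 f$ can be decomposed as:

$$
\begin{aligned}
A_0 f &= A_{-1} f + D_{-1} f \\
&= A_{-2} f + D_{-2} f + D_{-1} f \\
&= \dots \\
&= A_{-J} f + D_{-J} f + \dots + D_{-1} f
\end{aligned}
$$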

- $A_{-J} f$: the orthogonal projection of $A_0 f$ on $V_{-J} \subset V_0$
- $D_{-j} f$: the orthogonal projection of $A_0 f$ on $O_{-j}$
- $D_{-j} f$ and $D_{-k} f$ ($j \ne k$): orthogonal to each other, or uncorrelated
- $D_{-j} f$ ($0 < j \le J$): orthogonal to $A_{-J} f$, or uncorrelated with $A_{-J} f$
- $A_{-J} f$: a coarse version of $A_0 f$
- $D_{-J} f, \dots, D_{-1} f$: details of $A_0 f$ arranged from coarser to finer

Let $\{\phi_{j,n}(x) : n \in \mathbb{Z}\}$ be an orthonormal basis of $V_j$.

$A_j f$ can be characterized by the coefficients of its orthonormal expansion:

$$\forall f \in L^2(\mathbb{R}): \quad A_j f = \sum_n \langle f, \phi_{j,n} \rangle\, \phi_{j,n}$$

The sequence $\{\langle f, \phi_{j,n} \rangle : n \in \mathbb{Z}\}$ is denoted by $A_j^d f$ and called a discrete approximation of $f$ in $V_j$.

Let $\{\psi_{j,n}(x) : n \in \mathbb{Z}\}$ be an orthonormal basis of $O_j$.

$D_j f$ is characterized by the coefficients:

$$D_j f = \sum_n \langle f, \psi_{j,n} \rangle\, \psi_{j,n}$$

The sequence $\{\langle f, \psi_{j,n} \rangle : n \in \mathbb{Z}\}$ is denoted by $D_j^d f$ and called a discrete approximation of $f$ in $O_j$.

- Thus, $A_0 f$ can be characterized by $A_0^d f$
- $A_0^d f$ can be further characterized by $\{A_{-J}^d f,\ D_{-J}^d f,\ \dots,\ D_{-1}^d f\}$
- This set of discrete signals is called the orthogonal “wavelet” representation
- $A_0^d f$ is organized as a coarse version plus increasingly fine details
- The orthogonal representation is a decorrelated representation
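The structure of this representation, one coarse signal followed by details ordered from coarser to finer, can be sketched with the Haar wavelet. Haar is assumed here only because it is the simplest orthonormal case; the function names and the sign convention of the detail coefficients are choices made for this example.

```python
import math

def haar_step(a):
    """Split a sequence of even length into half-length approximation and detail."""
    s = 1 / math.sqrt(2)
    half = len(a) // 2
    approx = [(a[2 * n] + a[2 * n + 1]) * s for n in range(half)]
    detail = [(a[2 * n] - a[2 * n + 1]) * s for n in range(half)]
    return approx, detail

def wavelet_representation(a0, levels):
    """Return [A_{-J}, D_{-J}, ..., D_{-1}] for J = levels."""
    approx, details = list(a0), []
    for _ in range(levels):
        approx, d = haar_step(approx)
        details.insert(0, d)              # coarser details go in front
    return [approx] + details

def energy(parts):
    return sum(v * v for part in parts for v in part)

a0 = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
representation = wavelet_representation(a0, 3)
sizes = [len(part) for part in representation]
```

The representation holds as many coefficients as the original sequence, and because the decomposition is orthogonal the total energy is unchanged.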

If we require:

$$f(x) \in V_j \iff f(2x) \in V_{j+1}$$

then $A_j f$ is band-limited such that it can be sampled at a rate of $2^j$, i.e., $2^j$ samples per unit of time or length.

[Figure: $f(x)$ and its contracted version $f(2x)$, with axis marks at 0.5 and 1]

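Translation invariance of $A_0$:

$$\forall k \in \mathbb{Z},\ \forall f \in L^2(\mathbb{R}),\ \text{let}\ g(x) = f(x - k): \quad (A_0 g)(x) = (A_0 f)(x - k)$$

Translation invariance of $A_0^d$ produced by $\{\phi_{0,n}(x) : n \in \mathbb{Z}\}$:

$$\forall k \in \mathbb{Z},\ \forall f \in L^2(\mathbb{R}),\ \text{let}\ g(x) = f(x - k): \quad \langle g, \phi_{0,n} \rangle = \langle f, \phi_{0,n-k} \rangle$$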

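Then the $\phi_{j,n}$'s can be constructed from a scaling function $\phi(x)$:

$$\phi_{0,n}(x) = \phi_{0,0}(x - n) = \phi(x - n), \qquad \phi_{j,n}(x) = \sqrt{2^j}\, \phi(2^j x - n)$$

Furthermore, let

$$h(n) = \langle \phi_{-1,0}(x),\, \phi(x - n) \rangle, \qquad \tilde{h}(n) = h(-n)$$

then

$$\langle f, \phi_{j,n} \rangle = \sum_k \tilde{h}(2n - k)\, \langle f, \phi_{j+1,k} \rangle$$

$$(A_j^d f)(n) = (\tilde{h} * A_{j+1}^d f)(2n)$$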

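$A_j^d f$ = $A_{j+1}^d f$ filtered by $\tilde{h}$ and downsampled by two.

Let

$$H(\omega) = \sum_n h(n)\, e^{-j n \omega}$$

then

$$|H(0)| = \sqrt{2}, \qquad h(n) = O(n^{-2}) \ \text{as}\ n \to \infty$$

$$|H(\omega)|^2 + |H(\omega + \pi)|^2 = 2$$

These constants hold for the orthonormal-basis coefficients $h(n) = \langle \phi_{-1,0}, \phi_{0,n} \rangle$. Under Mallat's normalization, in which $h$ is scaled by $1/\sqrt{2}$, they become $|H(0)| = 1$ and $|H(\omega)|^2 + |H(\omega + \pi)|^2 = 1$.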

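Let $\Phi(\omega)$ and $\Psi(\omega)$ be the Fourier transforms of $\phi(x)$ and $\psi(x)$, and let

$$\Psi(\omega) = \tfrac{1}{\sqrt{2}}\, e^{-j\omega/2}\, \overline{H}\!\left(\tfrac{\omega}{2} + \pi\right) \Phi\!\left(\tfrac{\omega}{2}\right)$$

(the factor $1/\sqrt{2}$ applies when $h$ holds orthonormal-basis coefficients, i.e., $H(0) = \sqrt{2}$). Then the $\psi_{j,n}$'s can be constructed from $\psi(x)$:

$$\psi_{0,n}(x) = \psi(x - n), \qquad \psi_{j,n}(x) = \sqrt{2^j}\, \psi(2^j x - n)$$

Let

$$g(n) = \langle \psi_{-1,0}(x),\, \phi(x - n) \rangle, \qquad \tilde{g}(n) = g(-n)$$

then

$$(D_j^d f)(n) = (\tilde{g} * A_{j+1}^d f)(2n)$$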

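$D_j^d f$ = $A_{j+1}^d f$ filtered by $\tilde{g}$ and downsampled by two.

From

$$\Psi(\omega) = \tfrac{1}{\sqrt{2}}\, e^{-j\omega/2}\, \overline{H}\!\left(\tfrac{\omega}{2} + \pi\right) \Phi\!\left(\tfrac{\omega}{2}\right)$$

it follows that

$$g(n) = (-1)^{1-n}\, h(1 - n)$$

$H$, $G$: Quadrature Mirror Filters.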
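The relation $g(n) = (-1)^{1-n} h(1-n)$ can be tried out on a concrete lowpass filter. The four-tap Daubechies coefficients below are an assumption made for this example (the slides do not name a specific wavelet), given in the orthonormal normalization where the coefficients sum to $\sqrt{2}$.

```python
import math

def wavelet_filter(h):
    """g(n) = (-1)^(1-n) h(1-n) for a filter h given as {index: coefficient}."""
    # With m = 1 - n the sign (-1)^(1-n) equals (-1)^m.
    return {1 - m: (-1) ** m * c for m, c in h.items()}

def correlate(p, q, shift):
    """sum_n p(n) q(n - shift) over the overlapping support."""
    return sum(c * q.get(n - shift, 0.0) for n, c in p.items())

r3 = math.sqrt(3.0)
norm = 4.0 * math.sqrt(2.0)
# Daubechies 4-tap scaling filter, assumed for illustration.
h = {0: (1 + r3) / norm, 1: (3 + r3) / norm, 2: (3 - r3) / norm, 3: (1 - r3) / norm}
g = wavelet_filter(h)

lowpass_gain = sum(h.values())                 # H(0)
highpass_dc = sum(g.values())                  # G(0)
h_energy = correlate(h, h, 0)
h_shift2 = correlate(h, h, 2)
cross_terms = [correlate(h, g, 2 * k) for k in range(-2, 3)]
```

The derived highpass filter has zero response at DC and is orthogonal to all even shifts of $h$, which is what lets the approximation and detail signals be computed and inverted independently.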

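Since $\phi_{j+1,n} \in V_{j+1} = V_j \oplus O_j$,

$$\phi_{j+1,n} = \sum_k \langle \phi_{j+1,n}, \phi_{j,k} \rangle\, \phi_{j,k} + \sum_k \langle \phi_{j+1,n}, \psi_{j,k} \rangle\, \psi_{j,k}$$

with

$$\langle \phi_{j+1,n}, \phi_{j,k} \rangle = \langle \phi_{0,n-2k}, \phi_{-1,0} \rangle = h(n - 2k), \qquad \langle \phi_{j+1,n}, \psi_{j,k} \rangle = \langle \phi_{0,n-2k}, \psi_{-1,0} \rangle = g(n - 2k)$$

Therefore

$$
\begin{aligned}
(A_{j+1}^d f)(n) = \langle f, \phi_{j+1,n} \rangle
&= \sum_k h(n - 2k)\, \langle f, \phi_{j,k} \rangle + \sum_k g(n - 2k)\, \langle f, \psi_{j,k} \rangle \\
&= \sum_k h(n - 2k)\, (A_j^d f)(k) + \sum_k g(n - 2k)\, (D_j^d f)(k) \\
&= \sum_k \tilde{h}(2k - n)\, (A_j^d f)(k) + \sum_k \tilde{g}(2k - n)\, (D_j^d f)(k)
\end{aligned}
$$

That is, $A_j^d f$ and $D_j^d f$ are upsampled by two (zeros inserted between samples), filtered by $h$ and $g$ respectively, and summed.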

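$$(A_j^d f)(n) = (\tilde{h} * A_{j+1}^d f)(2n)$$

$$(D_j^d f)(n) = (\tilde{g} * A_{j+1}^d f)(2n)$$

$$(A_{j+1}^d f)(n) = \sum_k h(n - 2k)\, (A_j^d f)(k) + \sum_k g(n - 2k)\, (D_j^d f)(k)$$

where

$$\tilde{h}(n) = h(-n), \qquad \tilde{g}(n) = g(-n)$$

$$h(n) = \langle \phi_{-1,0}(x),\, \phi(x - n) \rangle, \qquad \phi_{-1,0}(x) = \tfrac{1}{\sqrt{2}}\, \phi\!\left(\tfrac{x}{2}\right)$$

$$g(n) = \langle \psi_{-1,0}(x),\, \phi(x - n) \rangle, \qquad \psi_{-1,0}(x) = \tfrac{1}{\sqrt{2}}\, \psi\!\left(\tfrac{x}{2}\right)$$

$$g(n) = (-1)^{1-n}\, h(1 - n)$$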
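One analysis step and the matching synthesis step can be sketched as below, written in the index form $A_j(n) = \sum_k h(k - 2n)\, A_{j+1}(k)$ and $A_{j+1}(n) = \sum_k h(n - 2k)\, A_j(k) + \sum_k g(n - 2k)\, D_j(k)$. The four-tap Daubechies filter and the periodic signal extension are assumptions of this example, not choices made in the slides.

```python
import math

def d4_filters():
    """Daubechies 4-tap lowpass h and highpass g(n) = (-1)^(1-n) h(1-n), as index maps."""
    r3, norm = math.sqrt(3.0), 4.0 * math.sqrt(2.0)
    h = {0: (1 + r3) / norm, 1: (3 + r3) / norm, 2: (3 - r3) / norm, 3: (1 - r3) / norm}
    g = {1 - m: (-1) ** m * c for m, c in h.items()}
    return h, g

def analysis(fine, h, g):
    """Filter by the time-reversed h, g and downsample by two (periodic extension)."""
    size = len(fine)
    approx = [sum(c * fine[(2 * n + m) % size] for m, c in h.items())
              for n in range(size // 2)]
    detail = [sum(c * fine[(2 * n + m) % size] for m, c in g.items())
              for n in range(size // 2)]
    return approx, detail

def synthesis(approx, detail, h, g):
    """Upsample by two, filter by h and g, and sum (periodic extension)."""
    size = 2 * len(approx)
    out = [0.0] * size
    for k in range(len(approx)):
        for m, c in h.items():
            out[(2 * k + m) % size] += c * approx[k]     # h(n - 2k) A(k)
        for m, c in g.items():
            out[(2 * k + m) % size] += c * detail[k]     # g(n - 2k) D(k)
    return out

h, g = d4_filters()
fine = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
approx, detail = analysis(fine, h, g)
restored = synthesis(approx, detail, h, g)
energy_in = sum(v * v for v in fine)
energy_out = sum(v * v for v in approx) + sum(v * v for v in detail)
```

The same pair $h$, $g$ serves both directions: time-reversed for analysis and as given for synthesis, so no separate synthesis filters have to be designed.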

$$\tilde{H}(z) = H(z^{-1}), \qquad \tilde{G}(z) = G(z^{-1}), \qquad G(z) = z^{-1} H(-z^{-1})$$

Think of $h_1 = \tilde{h}$, $h_2 = \tilde{g}$. Then the analysis stage for the subband and wavelet decompositions is the same:

Higher resolution signal -> two low resolution signals, through filtering by $h_1$, $h_2$ (or $\tilde{h}$, $\tilde{g}$) and downsampling by two.

The synthesis stage for the subband and wavelet decompositions has the same structure as well; only the way the filters are specified differs.

For subband: the low resolution signals are upsampled by two, filtered by $g_1(n) = h_1(-n)$ and $g_2(n) = h_2(-n)$, and summed to reconstruct the higher resolution signal.

For wavelet: the low resolution signals are upsampled by two, filtered by $h$ and $g$ (the time-reversed versions of the analysis filters $\tilde{h}$ and $\tilde{g}$), and summed to reconstruct the higher resolution signal.

- After filtering at the analysis stage, each of the two signals has only half the resolution of the original signal, so downsampling by two is justified
- Before filtering at the synthesis stage, upsampling by two restores the original sampling rate; the zeros it inserts create spectral images, which the synthesis filters suppress

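Let $B_0 = \{\phi(x - n) : n \in \mathbb{Z}\}$ be orthonormal:

$$\int \phi(x - n)\, \phi(x - m)\, dx = \delta(n - m)$$

Let $V_0 = \left\{ \sum_n c_n\, \phi(x - n) \right\}$ with $\sum_n |c_n|^2 < \infty$.

Then $B_0$ is an orthonormal basis for $V_0$.

Let $\phi_{0,n}(x) = \phi(x - n)$ and $\phi_{j,n}(x) = \sqrt{2^j}\, \phi(2^j x - n)$.

Let $h(n) = \langle \phi_{-1,0}(x),\, \phi(x - n) \rangle$.

Assume $\sum_n |h(n)|^2 = 1$.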

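Then $B_j = \{\phi_{j,n}(x) : n \in \mathbb{Z}\}$ is orthonormal:

$$\int \phi_{j,n}(x)\, \phi_{j,m}(x)\, dx = \int 2^j\, \phi(2^j x - n)\, \phi(2^j x - m)\, dx = \int \phi(u - n)\, \phi(u - m)\, du = \delta(n - m)$$

Let $V_j = \left\{ \sum_n c_n\, \phi_{j,n}(x) \right\}$ with $\sum_n |c_n|^2 < \infty$.

Then $B_j$ is an orthonormal basis for $V_j$.

$V_j \subset V_{j+1}$:

$$
\begin{aligned}
\langle \phi_{j,n}, \phi_{j+1,m} \rangle
&= \int \sqrt{2^j}\, \phi(2^j x - n)\, \sqrt{2^{j+1}}\, \phi(2^{j+1} x - m)\, dx \\
&= \int \tfrac{1}{\sqrt{2}}\, \phi\!\left(\tfrac{u}{2} - n\right) \phi(u - m)\, du \\
&= \int \tfrac{1}{\sqrt{2}}\, \phi\!\left(\tfrac{w}{2}\right) \phi\big(w - (m - 2n)\big)\, dw \\
&= \langle \phi_{-1,0}(x),\, \phi(x - (m - 2n)) \rangle = h(m - 2n)
\end{aligned}
$$

But $\sum_m |h(m - 2n)|^2 = 1 = \|\phi_{j,n}\|^2$. Thus $\phi_{j,n}(x) \in V_{j+1}$.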

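$$A_j f = \sum_n \langle f, \phi_{j,n} \rangle\, \phi_{j,n}, \qquad A_j^d f = \{ \langle f, \phi_{j,n} \rangle : n \in \mathbb{Z} \}$$

$$
\begin{aligned}
\langle f, \phi_{j,n} \rangle
&= \Big\langle f,\ \sum_k \langle \phi_{j,n}, \phi_{j+1,k} \rangle\, \phi_{j+1,k} \Big\rangle \\
&= \sum_k \langle \phi_{j,n}, \phi_{j+1,k} \rangle\, \langle f, \phi_{j+1,k} \rangle \\
&= \sum_k h(k - 2n)\, \langle f, \phi_{j+1,k} \rangle
\end{aligned}
$$

Let $\tilde{h}(n) = h(-n)$. Then

$$(A_j^d f)(n) = (\tilde{h} * A_{j+1}^d f)(2n)$$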