
Concepts of Multimedia Processing and Transmission

IT 481, Lecture #4

Dennis McCaughey, Ph.D.

25 September, 2006

08/28/2006 IT 481, Fall 2006

Introduction to Linear Systems

The Modified Discrete Cosine Transform (MDCT) was introduced in the lecture on MP3 encoding

How does it relate to the Discrete Cosine Transform (DCT) and why are we concerned?

The DCT and MDCT are important enablers of data compression for both audio and video.

The DCT is a special case of the Discrete Fourier Transform (DFT), a key component in digital signal processing

The Fast Fourier Transform (FFT) is a computationally efficient form of the DFT


Linear System Definition

[Figure: block diagram of a linear system with impulse response h(t), input f(t) and output g(t)]


Linear System Response to a Series of Sampled Data Inputs

[Figure: input samples ΔT f(0), ΔT f(ΔT), ΔT f(2ΔT), …, ΔT f(NΔT) at times 0, ΔT, 2ΔT, …, NΔT; each sample produces a scaled, delayed copy of the impulse response, ΔT f(nΔT) h(t − nΔT)]

$$g_N(t) = \sum_{n=0}^{N-1} \Delta T\, f(n\Delta T)\, h(t - n\Delta T)$$


Linear System Input/Output

$$g_N(t) = \sum_{n=0}^{N-1} \Delta T\, f(n\Delta T)\, h(t - n\Delta T)$$

$$g(t) = \lim_{\substack{N \to \infty \\ \Delta T \to 0}} g_N(t) = \int_0^t f(\tau)\, h(t - \tau)\, d\tau$$

This integral is called the convolution of f(t) and h(t), written g(t) = f(t) * h(t)
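The limit above can be checked numerically. A minimal sketch, assuming the hypothetical signals f(t) = u(t) (a unit step) and h(t) = e^{-t}u(t), for which the convolution integral evaluates to 1 − e^{-t}; the Riemann sum g_N(t), with ΔT = t/N, approaches that value as ΔT shrinks.

```python
import math

def f(t: float) -> float:
    # Hypothetical input: unit step u(t)
    return 1.0 if t >= 0 else 0.0

def h(t: float) -> float:
    # Hypothetical impulse response: e^{-t} u(t)
    return math.exp(-t) if t >= 0 else 0.0

def g_N(t: float, N: int) -> float:
    """Riemann sum g_N(t) = sum_{n=0}^{N-1} dT f(n dT) h(t - n dT), with dT = t / N."""
    dT = t / N
    return sum(dT * f(n * dT) * h(t - n * dT) for n in range(N))

t = 2.0
exact = 1.0 - math.exp(-t)   # closed-form value of the convolution integral at t
for N in (10, 100, 1000):
    print(N, g_N(t, N), abs(g_N(t, N) - exact))   # error shrinks roughly in proportion to dT
```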


Fourier Transform - Non-periodic Signal

Let g(t) be a continuous, non-periodic function of t

The Fourier Transform of g(t) is

$$G(\omega) = \int_{-\infty}^{\infty} g(t)\, e^{-j\omega t}\, dt$$

– where ω = 2πf is the radian frequency in rad/s, and f is the frequency in Hz

The Inverse Fourier Transform is

$$g(t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} G(\omega)\, e^{j\omega t}\, d\omega$$


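Fourier Transform Example

$$g(t) = e^{-at}\, u(t), \qquad a > 0$$

$$G(\omega) = \int_{-\infty}^{\infty} e^{-at} u(t)\, e^{-j\omega t}\, dt = \int_0^{\infty} e^{-(a + j\omega)t}\, dt = \left[ \frac{-1}{a + j\omega}\, e^{-(a + j\omega)t} \right]_0^{\infty} = \frac{1}{a + j\omega} = \frac{1}{\sqrt{a^2 + \omega^2}}\, e^{-j \tan^{-1}(\omega / a)}$$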
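The closed form can be sanity-checked by integrating numerically. A minimal sketch, assuming a = 1 and truncating the integral at t = 40 s (where e^{-40} is negligible); it compares a trapezoidal estimate against 1/(a + jω) and checks the magnitude and phase expressions.

```python
import numpy as np

a = 1.0                                   # decay rate; a > 0 so the integral converges
t = np.linspace(0.0, 40.0, 400_001)       # truncation at 40 s is assumed safe since e^{-40} ~ 4e-18
dt = t[1] - t[0]

def G_numeric(w: float) -> complex:
    # Trapezoidal approximation of the integral of e^{-at} e^{-jwt} over t >= 0
    y = np.exp(-a * t) * np.exp(-1j * w * t)
    return complex(dt * (y.sum() - 0.5 * (y[0] + y[-1])))

def G_exact(w: float) -> complex:
    return 1.0 / (a + 1j * w)

for w in (0.0, 1.0, 5.0):
    G = G_exact(w)
    # numeric vs exact, |G| vs 1/sqrt(a^2 + w^2), angle(G) vs -atan(w/a)
    print(w, G_numeric(w), G, abs(G), 1 / np.hypot(a, w), np.angle(G), -np.arctan(w / a))
```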


Relationship Between the Fourier Transform and Convolution

If

$$g(t) = \int_0^t f(\tau)\, h(t - \tau)\, d\tau$$

(with f(t) and h(t) zero for t < 0), then

$$G(\omega) = F(\omega)\, H(\omega)$$

The Fourier Transform of g(t) is the product of the Fourier Transforms of f(t) and h(t)
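The same property holds for the DFT, provided both sequences are zero-padded to the full length of the linear convolution so that no circular wrap-around occurs. A minimal sketch using NumPy's FFT as a discrete stand-in for the continuous transform; the sample values are hypothetical.

```python
import numpy as np

f = np.array([1.0, 2.0, 3.0])            # hypothetical input samples
h = np.array([0.5, -1.0, 0.25, 2.0])     # hypothetical impulse response
L = len(f) + len(h) - 1                  # length of the linear convolution

g = np.convolve(f, h)                    # convolution in the time domain
G = np.fft.fft(g, L)                     # transform of the output
FH = np.fft.fft(f, L) * np.fft.fft(h, L) # product of the (zero-padded) transforms

print(np.allclose(G, FH))                # convolution <-> multiplication
```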


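A Very Important Property

$$\int_{-\infty}^{\infty} |g(t)|^2\, dt = \frac{1}{2\pi} \int_{-\infty}^{\infty} |G(\omega)|^2\, d\omega \qquad \text{(Parseval's Theorem)}$$

If $\int_{-\infty}^{\infty} |g(t)|^2\, dt < \infty$, then $|G(\omega)|^2$ is also integrable, so for practical signals $|G(\omega)| \to 0$ as $|\omega| \to \infty$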
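The discrete counterpart of Parseval's Theorem replaces 1/(2π) with 1/N: the energy of an N-point sequence equals the energy of its DFT divided by N. A minimal sketch on a hypothetical random signal:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(256)                     # hypothetical finite-energy signal
X = np.fft.fft(x)

energy_time = np.sum(np.abs(x) ** 2)
energy_freq = np.sum(np.abs(X) ** 2) / len(x)    # 1/N plays the role of 1/(2*pi)
print(energy_time, energy_freq)
```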


Convolution Sum Example

$$g(n) = \sum_{k=0}^{n} f(k)\, h(n - k)$$

with f = {f(0), f(1), f(2)}, h = {h(0), h(1), h(2)}, and f(k) = h(k) = 0 for k > 2:

$$\begin{aligned}
g(0) &= f(0)h(0) \\
g(1) &= f(0)h(1) + f(1)h(0) \\
g(2) &= f(0)h(2) + f(1)h(1) + f(2)h(0) \\
g(3) &= f(1)h(2) + f(2)h(1) \\
g(4) &= f(2)h(2)
\end{aligned}$$

$$n_g = n_f + n_h - 1$$
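The convolution sum can be evaluated directly with two nested loops. A minimal sketch with hypothetical values f = (1, 2, 3) and h = (4, 5, 6); the output has n_f + n_h − 1 = 5 samples, matching g(0) through g(4) above.

```python
def conv_sum(f: list[float], h: list[float]) -> list[float]:
    """Direct convolution sum g(n) = sum_k f(k) h(n - k), with f, h zero outside their support."""
    ng = len(f) + len(h) - 1          # n_g = n_f + n_h - 1
    g = [0.0] * ng
    for n in range(ng):
        for k in range(len(f)):
            if 0 <= n - k < len(h):   # h(n - k) is zero outside 0..n_h-1
                g[n] += f[k] * h[n - k]
    return g

f = [1.0, 2.0, 3.0]  # hypothetical f(0), f(1), f(2)
h = [4.0, 5.0, 6.0]  # hypothetical h(0), h(1), h(2)
print(conv_sum(f, h))  # [4.0, 13.0, 28.0, 27.0, 18.0]
```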


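Integer Arithmetic Example

Multiplication of two integers is a form of discrete convolution

$$\begin{array}{r}
12345 \\
\times\quad 111 \\
\hline
12345 \\
12345\phantom{0} \\
12345\phantom{00} \\
\hline
1370295
\end{array}$$

Treating x = 12345 and y = 111 as digit sequences, z = x * y is their discrete convolution; carrying between digit positions turns the convolution sums into the digits of the product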
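The digit-level view can be made concrete: convolve the two digit sequences, then propagate carries. A minimal sketch (the function name is hypothetical) for non-negative integers; for 12345 × 111 the raw convolution sums are 1, 3, 6, 9, 12, 9, 5, and carrying the 12 produces 1370295.

```python
import numpy as np

def multiply_via_convolution(x: int, y: int) -> int:
    """Multiply non-negative integers by convolving their digit sequences, then carrying."""
    xd = [int(d) for d in str(x)][::-1]   # least-significant digit first
    yd = [int(d) for d in str(y)][::-1]
    z = np.convolve(xd, yd)               # digit-wise discrete convolution
    digits, carry = [], 0
    for v in z:                           # carries turn the sums back into base-10 digits
        carry, d = divmod(int(v) + carry, 10)
        digits.append(d)
    while carry:
        carry, d = divmod(carry, 10)
        digits.append(d)
    return int("".join(map(str, reversed(digits))))

print(np.convolve([1, 2, 3, 4, 5], [1, 1, 1]))  # digit sums 1, 3, 6, 9, 12, 9, 5 (before carrying)
print(multiply_via_convolution(12345, 111))      # 1370295
```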


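Discrete Convolution in Matrix Form

$$\begin{bmatrix} g_0 \\ g_1 \\ g_2 \\ g_3 \\ g_4 \end{bmatrix} =
\begin{bmatrix}
h_0 & 0 & 0 & h_2 & h_1 \\
h_1 & h_0 & 0 & 0 & h_2 \\
h_2 & h_1 & h_0 & 0 & 0 \\
0 & h_2 & h_1 & h_0 & 0 \\
0 & 0 & h_2 & h_1 & h_0
\end{bmatrix}
\begin{bmatrix} f_0 \\ f_1 \\ f_2 \\ 0 \\ 0 \end{bmatrix},
\qquad g = C_H f$$

f is zero-padded to length 5 so that the wrap-around terms in the corners of C_H only multiply zeros

C_H is what is known as a "circulant" matrix (each row is a cyclic shift of the row above) and is an important structure in signal processing

If C_H is an n × n matrix, the convolution requires on the order of n² adds and multiplies

If n = 128, this is 2^14 operations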
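The matrix form can be built and checked against a direct convolution. A minimal sketch (the helper name is hypothetical) that builds C_H from its first column via C[i, j] = c[(i − j) mod n], using hypothetical values for f and h.

```python
import numpy as np

def circulant_from_column(c: np.ndarray) -> np.ndarray:
    """Build C with C[i, j] = c[(i - j) mod n]; C @ x is the circular convolution of c and x."""
    n = len(c)
    return np.array([[c[(i - j) % n] for j in range(n)] for i in range(n)])

f = np.array([1.0, 2.0, 3.0])   # hypothetical f0, f1, f2
h = np.array([4.0, 5.0, 6.0])   # hypothetical h0, h1, h2
n = len(f) + len(h) - 1         # pad to 5 so the wrap-around only ever multiplies zeros

C_H = circulant_from_column(np.pad(h, (0, n - len(h))))
g = C_H @ np.pad(f, (0, n - len(f)))
print(C_H)                      # first row is h0, 0, 0, h2, h1
print(g, np.convolve(f, h))     # identical results
```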


Enter the Discrete Fourier Transform

If F is the (unitary) matrix representing the Discrete Fourier Transform, then

$$F\, C_H\, F^{*} = D$$

where D is a diagonal matrix consisting of the Discrete Fourier Transform of the first column of C_H, and * denotes the conjugate transpose

Thus

$$C_H = F^{*} D F$$

So that

$$g = F^{*} D F f$$

The FFT is a fast implementation which allows the computations to be accomplished in O(n log₂ n) operations: 128 log₂(128) = 128 · 7 ≈ 2^10
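The diagonalization can be verified numerically. A minimal sketch, assuming a unitary DFT matrix F[k, n] = e^{-j2πkn/N}/√N and a random first column h: F C_H F* comes out diagonal with the DFT of h on its diagonal, and the product F* D F f can be evaluated with FFTs instead of an n × n multiply.

```python
import numpy as np

n = 8
rng = np.random.default_rng(1)
h = rng.standard_normal(n)                                   # hypothetical first column of C_H
C_H = np.array([[h[(i - j) % n] for j in range(n)] for i in range(n)])

k = np.arange(n)
F = np.exp(-2j * np.pi * np.outer(k, k) / n) / np.sqrt(n)    # unitary DFT matrix
D = F @ C_H @ F.conj().T                                     # should be diagonal

print(np.allclose(D, np.diag(np.diag(D))))                   # off-diagonal entries vanish
print(np.allclose(np.diag(D), np.fft.fft(h)))                # diagonal = DFT of the first column

f = rng.standard_normal(n)
g_fast = np.fft.ifft(np.fft.fft(h) * np.fft.fft(f)).real     # F* D F f evaluated with FFTs
print(np.allclose(g_fast, C_H @ f))

N = 128
print(N ** 2, N * int(np.log2(N)))                           # 16384 896
```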


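Discrete Fourier Transform (DFT)

A discrete-time version of the Fourier Transform that can be implemented in the digital domain

Given an N-point time-sampled sequence {x0, x1, …, xN-1}, the DFT is described by a transform pair with complexity O(N²)

$$X_k = \sum_{n=0}^{N-1} x_n\, e^{-j 2\pi k n / N}, \qquad k = 0, 1, \ldots, N-1$$

$$x_n = \frac{1}{N} \sum_{k=0}^{N-1} X_k\, e^{j 2\pi k n / N}, \qquad n = 0, 1, \ldots, N-1$$

Furthermore,

$$x_n = \mathrm{IDFT}\{X_k\} = \frac{1}{N} \left[ \mathrm{DFT}\{X_k^{*}\} \right]^{*}$$

so the same routine can compute both the forward and the inverse transform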
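The transform pair translates directly into an O(N²) matrix-vector product, and the conjugation identity lets the forward routine also compute the inverse. A minimal sketch with a hypothetical 5-point sequence, cross-checked against NumPy's FFT:

```python
import numpy as np

def dft(x: np.ndarray) -> np.ndarray:
    """Direct O(N^2) DFT: X_k = sum_n x_n exp(-j 2 pi k n / N)."""
    N = len(x)
    n = np.arange(N)
    W = np.exp(-2j * np.pi * np.outer(n, n) / N)   # N x N matrix of twiddle factors
    return W @ x

def idft_via_dft(X: np.ndarray) -> np.ndarray:
    """Inverse DFT computed with the forward routine: x_n = (1/N) [DFT(X*)]*."""
    return np.conj(dft(np.conj(X))) / len(X)

x = np.array([1.0, 2.0, 0.0, -1.0, 3.0])   # hypothetical 5-point sequence
X = dft(x)
print(np.allclose(X, np.fft.fft(x)))       # matches NumPy's FFT
print(np.allclose(idft_via_dft(X), x))     # round trip recovers x
```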


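Fast Fourier Transform (FFT)

The FFT is a computationally efficient algorithm, O(N log₂ N). Recall the DFT:

$$X_n = \sum_{k=0}^{N-1} x_k\, e^{-j 2\pi k n / N}$$

Let

$$W_N = e^{-j 2\pi / N}$$

It can be shown that

$$X_n = G_n + W_N^{\,n}\, H_n$$

where G_n and H_n are two half-sized (N/2-point) DFTs of the even-indexed and odd-indexed terms, respectively; both repeat with period N/2, so the formula covers n = 0, 1, …, N − 1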
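Applying the even/odd split recursively gives a radix-2 decimation-in-time FFT. A minimal recursive sketch, assuming the length is a power of two; production FFTs are iterative and in-place, so this only illustrates the recurrence X_n = G_n + W_N^n H_n.

```python
import numpy as np

def fft_radix2(x: np.ndarray) -> np.ndarray:
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    N = len(x)
    if N == 1:
        return np.asarray(x, dtype=complex)
    if N % 2:
        raise ValueError("length must be a power of two")
    G = fft_radix2(x[0::2])              # half-size DFT of the even-indexed samples
    H = fft_radix2(x[1::2])              # half-size DFT of the odd-indexed samples
    W = np.exp(-2j * np.pi * np.arange(N) / N)   # W_N^n for n = 0..N-1
    # G and H have period N/2, so tile them to length N before combining
    return np.tile(G, 2) + W * np.tile(H, 2)

x = np.random.default_rng(2).standard_normal(16)   # hypothetical input, N = 16
print(np.allclose(fft_radix2(x), np.fft.fft(x)))
```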


The FFT Efficient Implementation

Each half-size DFT can in turn be divided into a pair of quarter-size DFTs

The end result is a partition and reordering of the time-domain inputs using what is known as bit-reversed addressing
– Each stage of the DFT consists of N complex multiply-accumulates in a straightforward implementation
– Further simplification, from eight to six real operations, by the “butterfly”
– Further simplification when the time-domain sequence is real
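Bit-reversed addressing can be computed by reversing the binary representation of each index. A minimal sketch (the function name is hypothetical) for a power-of-two length; for N = 8 it yields the order x0, x4, x2, x6, x1, x5, x3, x7, which is exactly the order obtained by repeatedly splitting the inputs into even and odd halves.

```python
def bit_reverse_indices(N: int) -> list[int]:
    """Return rev(i) for i = 0..N-1; since bit reversal is its own inverse, x[idx] is the reordered input."""
    bits = N.bit_length() - 1            # N = 2**bits is assumed
    if 1 << bits != N:
        raise ValueError("N must be a power of two")
    return [int(format(i, f"0{bits}b")[::-1], 2) for i in range(N)]

print(bit_reverse_indices(8))   # [0, 4, 2, 6, 1, 5, 3, 7]
```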


The FFT Structure


The Discrete Cosine Transform (DCT)

DCT is a Fourier-related transform similar to the discrete Fourier transform (DFT), but using only real numbers

It is equivalent to a DFT of roughly twice the length, operating on real data with even symmetry (since the Fourier transform of a real and even function is real and even)

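$$X_c(k) = \sum_{n=0}^{N-1} x(n) \cos\left[\frac{(2n+1)k\pi}{2N}\right], \qquad k = 0, 1, 2, \ldots, N-1$$

$$x(n) = \frac{1}{N} \sum_{k=0}^{N-1} g(k)\, X_c(k) \cos\left[\frac{(2n+1)k\pi}{2N}\right], \qquad n = 0, 1, 2, \ldots, N-1$$

$$g(0) = 1, \qquad g(k) = 2, \quad k = 1, 2, \ldots, N-1$$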
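The DCT pair above can be implemented directly as two cosine-matrix products. A minimal O(N²) sketch with a hypothetical 8-sample block; it checks that the inverse (with the 1/N factor and the weights g(k)) recovers the input, and that X_c(0) is simply the sum of the samples.

```python
import numpy as np

def dct(x: np.ndarray) -> np.ndarray:
    """Forward DCT: X_c(k) = sum_n x(n) cos[(2n+1) k pi / (2N)]."""
    N = len(x)
    n = np.arange(N)
    k = n.reshape(-1, 1)
    return np.cos((2 * n + 1) * k * np.pi / (2 * N)) @ x

def idct(X: np.ndarray) -> np.ndarray:
    """Inverse DCT: x(n) = (1/N) sum_k g(k) X_c(k) cos[(2n+1) k pi / (2N)], g(0)=1, g(k)=2."""
    N = len(X)
    k = np.arange(N)
    n = k.reshape(-1, 1)
    g = np.where(k == 0, 1.0, 2.0)
    return np.cos((2 * n + 1) * k * np.pi / (2 * N)) @ (g * X) / N

x = np.array([8.0, 6.0, 4.0, 2.0, 1.0, 1.0, 2.0, 5.0])  # hypothetical 8-sample block
X = dct(x)
print(np.allclose(idct(X), x))   # the pair inverts exactly
```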


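The Modified Discrete Cosine Transform (MDCT)

The MDCT uses 50% overlapped blocks, which makes it very useful for quantization: it effectively removes the blocking artifacts between blocks that would otherwise be easily detectable

$$X_{mc}(k) = \sum_{n=0}^{N-1} x(n) \cos\left[\frac{\pi}{2N}\left(2n + 1 + \frac{N}{2}\right)(2k+1)\right], \qquad k = 0, 1, 2, \ldots, \frac{N}{2} - 1$$

$$\tilde{x}(n) = \frac{2}{N} \sum_{k=0}^{N/2-1} X_{mc}(k) \cos\left[\frac{\pi}{2N}\left(2n + 1 + \frac{N}{2}\right)(2k+1)\right], \qquad n = 0, 1, 2, \ldots, N-1$$

Each length-N block yields only N/2 coefficients; the inverse x̃(n) contains time-domain aliasing, which cancels when the outputs of adjacent 50%-overlapped blocks are added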
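Time-domain aliasing cancellation can be demonstrated with two overlapping blocks. A minimal sketch, assuming a rectangular window and the 2/N scaling on the inverse (practical coders such as MP3 also apply a window satisfying the Princen-Bradley condition); two length-8 blocks with a hop of 4 cover x0 to x11, and only the overlapped samples x4 to x7 are reconstructed exactly.

```python
import numpy as np

def mdct(block: np.ndarray) -> np.ndarray:
    """MDCT of one length-N block -> N/2 coefficients, using the cosine kernel above."""
    N = len(block)
    n = np.arange(N)
    k = np.arange(N // 2).reshape(-1, 1)
    return np.cos(np.pi / (2 * N) * (2 * n + 1 + N / 2) * (2 * k + 1)) @ block

def imdct(X: np.ndarray) -> np.ndarray:
    """IMDCT: N/2 coefficients -> N time samples that still contain aliasing."""
    N = 2 * len(X)
    k = np.arange(N // 2)
    n = np.arange(N).reshape(-1, 1)
    return (2.0 / N) * (np.cos(np.pi / (2 * N) * (2 * n + 1 + N / 2) * (2 * k + 1)) @ X)

x = np.random.default_rng(3).standard_normal(12)   # hypothetical signal x0..x11
y = np.zeros_like(x)
for start in (0, 4):                                # two 50%-overlapped length-8 blocks
    y[start:start + 8] += imdct(mdct(x[start:start + 8]))   # overlap-add

print(np.allclose(y[4:8], x[4:8]))   # aliasing cancels where the blocks overlap
print(np.allclose(y[0:4], x[0:4]))   # a lone half-block still contains aliasing
```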


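In Matrix Notation (2 Length-8 Blocks)

$$\mathbf{x}' = [x_0, x_1, x_2, \ldots, x_{11}]$$

$$\mathbf{X}_{mc} = T_{MDCT}\, \mathbf{x}, \qquad T_{MDCT} = \begin{bmatrix} C_{4\times 8} & 0_{4\times 4} \\ 0_{4\times 4} & C_{4\times 8} \end{bmatrix}$$

where C_{4×8} is the MDCT kernel for one length-8 block; the first block spans x_0 to x_7 and the second spans x_4 to x_{11} (50% overlap), so T_MDCT is 8 × 12 and

$$\frac{1}{4}\, T_{MDCT}\, T_{MDCT}' = I$$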
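The block structure and the orthogonality relation can be checked numerically. A minimal sketch, assuming the MDCT kernel cos[π/(2N)(2n + 1 + N/2)(2k + 1)] with N = 8: stacking two shifted copies of the 4 × 8 kernel gives an 8 × 12 matrix whose rows are orthogonal with squared norm 4, while T'T cannot be the identity because T has only rank 8.

```python
import numpy as np

N, hop = 8, 4
n = np.arange(N)
k = np.arange(N // 2).reshape(-1, 1)
C = np.cos(np.pi / (2 * N) * (2 * n + 1 + N / 2) * (2 * k + 1))   # 4 x 8 MDCT kernel

T = np.zeros((8, 12))
T[0:4, 0:8] = C               # block 1 covers x0..x7
T[4:8, hop:hop + 8] = C       # block 2 covers x4..x11 (shifted by the hop size)

x = np.random.default_rng(4).standard_normal(12)   # hypothetical x0..x11
X_mc = T @ x                                        # both blocks' coefficients at once
print(np.allclose(T @ T.T / 4, np.eye(8)))          # (1/4) T T' = I
print(np.linalg.matrix_rank(T))                     # 8 < 12, so T'T is not the identity
```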


Fourier Transform Summary

Physical Interpretation
– Describes the frequency content of a real-world signal
– For real-world signals, frequency content tails off as frequencies get higher

Mathematical Interpretation
– Convolution in the time domain becomes multiplication in the frequency domain
– The DFT matrix diagonalizes a circulant convolution matrix
– The DCT is a special case of the DFT


Adaptive Transform Coding (ATC)

Another frequency-domain technique, for bit rates in the range 9.6 to 20 Kbps; it involves block transformation of a windowed input segment of the speech waveform

Each segment is represented by a set of transform coefficients, which are quantized and transmitted in lieu of the signal itself

At the receiver, the quantized coefficients are inverse-transformed to reconstruct an approximation of the original waveform

The most attractive and frequently used transform is the Discrete Cosine Transform (DCT), together with the corresponding Inverse Discrete Cosine Transform (IDCT)
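The transmit/receive chain can be sketched end to end. A minimal sketch, assuming an orthonormal DCT-II (so the IDCT is the matrix transpose), a rectangular window, a hypothetical synthetic segment, and a single fixed uniform step size in place of the adaptive bit allocation used by real ATC coders; with an orthonormal transform the reconstruction error energy equals the coefficient quantization error energy.

```python
import numpy as np

def dct_matrix(N: int) -> np.ndarray:
    """Orthonormal DCT-II matrix (rows are basis vectors), so its inverse is its transpose."""
    n = np.arange(N)
    k = n.reshape(-1, 1)
    C = np.sqrt(2.0 / N) * np.cos((2 * n + 1) * k * np.pi / (2 * N))
    C[0] /= np.sqrt(2.0)
    return C

N = 64
t = np.arange(N)
segment = np.sin(2 * np.pi * 5 * t / N) + 0.3 * np.sin(2 * np.pi * 11 * t / N)  # hypothetical segment
C = dct_matrix(N)

coeffs = C @ segment            # transmitter: transform the segment
step = 0.1                      # hypothetical uniform quantizer step size
q = np.round(coeffs / step)     # integer indices that would be coded and transmitted
recon = C.T @ (q * step)        # receiver: dequantize and inverse-transform

print(np.linalg.norm(recon - segment))   # bounded by sqrt(N) * step / 2
```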


ATC Practicality

Bit allocation among the different coefficients is varied adaptively from frame to frame while keeping the total number of bits constant

The time-varying statistics control the bit allocation procedure and must be transmitted as side information (an overhead of about 2 Kbps)

Side information is also used to determine the step size of the various coefficient quantizers

In practice, the DCT and IDCT are not evaluated directly from the formulas given here but by computationally efficient algorithms such as the FFT
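One way to obtain the DCT from an FFT follows from its equivalence to a DFT of twice the length on even-symmetric data. A minimal sketch, assuming the unnormalized DCT X_c(k) defined earlier: the length-2N FFT Y(k) of [x, reversed x] satisfies Y(k) = 2 e^{jπk/(2N)} X_c(k), so removing the phase ramp and the factor 2 recovers the DCT. This is one of several FFT-based DCT algorithms, not necessarily the one a given coder uses.

```python
import numpy as np

def dct_direct(x: np.ndarray) -> np.ndarray:
    """O(N^2) evaluation of X_c(k) = sum_n x(n) cos[(2n+1) k pi / (2N)]."""
    N = len(x)
    n = np.arange(N)
    k = n.reshape(-1, 1)
    return np.cos((2 * n + 1) * k * np.pi / (2 * N)) @ x

def dct_via_fft(x: np.ndarray) -> np.ndarray:
    """Same DCT from one length-2N FFT of the even-symmetric extension [x, reversed x]."""
    N = len(x)
    Y = np.fft.fft(np.concatenate([x, x[::-1]]))
    k = np.arange(N)
    # Y(k) = 2 exp(j pi k / 2N) X_c(k): undo the phase ramp and the factor of 2
    return (0.5 * np.exp(-1j * np.pi * k / (2 * N)) * Y[:N]).real

x = np.random.default_rng(5).standard_normal(32)   # hypothetical block
print(np.allclose(dct_via_fft(x), dct_direct(x)))
```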


Source Coding - Vocoders

A class of speech coding systems that analyze the voice signal at the transmitter, derive its parameters and transmit them to the receiver, where the voice is synthesized from these parameters

All vocoders attempt to model the speech generation process as a dynamic system and quantify the physical parameters of that system

In general, vocoders are much more complex than waveform coders and achieve very high economy in transmission bit rate

They tend to be less robust, and their performance is very much speaker-dependent


Channel Vocoder

The first of many analysis-synthesis systems to be demonstrated

A frequency-domain vocoder that determines the envelope of the speech signal in a number of frequency bands, then samples and encodes each envelope and multiplexes these samples with the encoded outputs of the other filters

The sampling is done synchronously every 10 ms to 30 ms

Along with the energy information about each band, the voiced/unvoiced decision and the pitch frequency for voiced speech are also transmitted


Cepstrum Vocoder

The cepstrum vocoder separates the excitation and the vocal tract spectrum by taking the inverse Fourier transform of the log magnitude spectrum of the signal
– The low-frequency coefficients in the cepstrum correspond to the vocal tract spectral envelope
– The high-frequency excitation coefficients form a periodic pulse train at multiples of the pitch period

At the receiver, the vocal tract cepstral coefficients are Fourier transformed to produce the vocal tract impulse response

By convolving this impulse response with a synthetic excitation signal, the original speech is reconstructed
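The separation can be seen on a toy signal. A minimal sketch, assuming a hypothetical excitation of two pulses 40 samples apart (a stand-in for a pitch pulse train) and a hypothetical two-tap vocal tract response: because the log turns the convolution into a sum, the vocal tract contributes only near zero quefrency, while the excitation shows up as a peak at the pitch period.

```python
import numpy as np

def real_cepstrum(x: np.ndarray, nfft: int) -> np.ndarray:
    """Real cepstrum: inverse FFT of the log magnitude spectrum."""
    spectrum = np.fft.fft(x, nfft)
    return np.fft.ifft(np.log(np.abs(spectrum))).real

P = 40                                    # hypothetical pitch period in samples
excitation = np.zeros(P + 1)
excitation[0], excitation[P] = 1.0, 0.5   # two pulses one pitch period apart
vocal_tract = np.array([1.0, 0.3])        # hypothetical, very short vocal-tract response
x = np.convolve(excitation, vocal_tract)

c = real_cepstrum(x, 512)
pitch = 20 + int(np.argmax(c[20:256]))    # strongest peak above the low-quefrency region
print(pitch)                              # 40, the pitch period
```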


Linear Predictive Coders (LPC)

The time-domain LPC extracts the significant features of speech from its waveform

Computationally intensive, but by far the most popular among the class of low bit rate vocoders. It is possible to transmit good quality voice at 4.8 Kbps and poorer quality voice at lower rates

LPC models the vocal tract as an all-pole digital filter and uses a weighted sum of the past p samples to estimate the present sample (10 ≤ p ≤ 15), with e_n being the prediction error

$$s_n = \sum_{k=1}^{p} a_k\, s_{n-k} + e_n$$
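The predictor equation defines an analysis filter (speech to residual) and its all-pole inverse (residual to speech). A minimal sketch, assuming hypothetical coefficients with p = 2 for readability and zero initial conditions; running the residual back through the synthesis filter recovers the input exactly.

```python
import numpy as np

def lpc_residual(s: np.ndarray, a: np.ndarray) -> np.ndarray:
    """Prediction error e_n = s_n - sum_{k=1}^{p} a_k s_{n-k} (samples before n = 0 taken as 0)."""
    p = len(a)
    e = np.empty_like(s)
    for n in range(len(s)):
        prediction = sum(a[k - 1] * s[n - k] for k in range(1, p + 1) if n - k >= 0)
        e[n] = s[n] - prediction
    return e

def lpc_synthesize(e: np.ndarray, a: np.ndarray) -> np.ndarray:
    """All-pole synthesis: s_n = sum_k a_k s_{n-k} + e_n (the inverse of lpc_residual)."""
    p = len(a)
    s = np.zeros_like(e)
    for n in range(len(e)):
        s[n] = e[n] + sum(a[k - 1] * s[n - k] for k in range(1, p + 1) if n - k >= 0)
    return s

a = np.array([1.3, -0.6])                          # hypothetical p = 2 predictor
s = np.random.default_rng(6).standard_normal(50)   # hypothetical speech samples
e = lpc_residual(s, a)
print(np.allclose(lpc_synthesize(e, a), s))        # synthesis undoes analysis
```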


LPC Coefficients

The LPC coefficients a_k are found by solving the system of equations

$$\sum_{k=0}^{p} a_k\, C_{mk} = 0, \qquad m = 1, 2, \ldots, p, \qquad a_0 = -1$$

where C_{mk} are the correlation coefficients computed from the m-th and k-th lags of s_n

A matrix inversion is needed, hence the high computational load

In practice, the reflection coefficients, a related set of coefficients, are transmitted
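With the autocorrelation method, C_mk depends only on |m − k|, so the system is Toeplitz and can be solved by the Levinson-Durbin recursion in O(p²) operations instead of a general matrix inversion; the recursion also yields the reflection coefficients as a by-product. A minimal sketch, assuming a hypothetical AR(2) test signal and p = 10 (sign conventions for reflection coefficients vary between texts).

```python
import numpy as np

def autocorr(s: np.ndarray, p: int) -> np.ndarray:
    """R(i) = sum_n s_n s_{n-i} for i = 0..p (autocorrelation method)."""
    return np.array([np.dot(s[i:], s[:len(s) - i]) for i in range(p + 1)])

def lpc_direct(R: np.ndarray, p: int) -> np.ndarray:
    """Solve sum_{k=1}^{p} a_k R(|m-k|) = R(m), m = 1..p, with a general solver."""
    C = np.array([[R[abs(m - k)] for k in range(1, p + 1)] for m in range(1, p + 1)])
    return np.linalg.solve(C, R[1:p + 1])

def levinson_durbin(R: np.ndarray, p: int) -> tuple[np.ndarray, np.ndarray]:
    """Same Toeplitz system in O(p^2); also returns the reflection coefficients k_i."""
    a = np.zeros(p + 1)          # a[1..i] hold the order-i predictor
    refl = np.zeros(p)
    err = R[0]                   # prediction-error energy of the order-0 predictor
    for i in range(1, p + 1):
        k = (R[i] - np.dot(a[1:i], R[i - 1:0:-1])) / err
        refl[i - 1] = k
        prev = a.copy()
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        err *= 1.0 - k * k
    return a[1:], refl

# Hypothetical AR(2) test signal: s_n = 1.3 s_{n-1} - 0.6 s_{n-2} + white noise
rng = np.random.default_rng(7)
w = rng.standard_normal(4000)
s = np.zeros(4000)
for n in range(2, 4000):
    s[n] = 1.3 * s[n - 1] - 0.6 * s[n - 2] + w[n]

p = 10
R = autocorr(s, p)
a_direct = lpc_direct(R, p)
a_lev, refl = levinson_durbin(R, p)
print(np.allclose(a_direct, a_lev))   # both solve the same normal equations
print(np.all(np.abs(refl) < 1))       # |k_i| < 1, so the synthesis filter is stable
print(np.round(a_lev[:3], 2))         # roughly 1.3, -0.6, 0 for this signal
```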


LPC Transmitted Parameters

Reflection coefficients can each be adequately represented by 6 bits

For a p = 10 predictor, 72 bits are needed per frame
– 60 bits for the coefficients
– 5 bits for a gain parameter and 6 bits for a pitch period

If the parameters are estimated every 15 to 20 ms
– The resulting bit rate is in the range 2400 to 4800 bps

Additional savings can be achieved via a non-linear transformation of the coefficients prior to coding, to reduce sensitivity to quantization error


LPC Receiver Processing

At the receiver, the coefficients are used in a synthesis filter

Various LPC methods differ in how the synthesis filter is excited
– Multi-pulse Excited LPC: typically 8 pulses with properly chosen positions are used as the excitation
– Code-Excited LPC (CELP): the transmitter searches its codebook for a stochastic excitation to the LPC filter that gives the best perceptual match to the sound; the index into the codebook is then transmitted

CELP coders are extremely complex and can require more than 500 MIPS

However, high quality is achieved when the excitation is coded at 0.25 bits/sample, with transmission bit rates as low as 4.8 Kbps
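The CELP codebook search is an analysis-by-synthesis loop: each candidate excitation is passed through the LPC synthesis filter, scaled by its best gain, and compared with the target. A minimal sketch, assuming a hypothetical random codebook of 64 vectors, hypothetical LPC coefficients, and plain squared error in place of the perceptually weighted error real CELP coders use; the target is built from entry 17 so the search should recover it.

```python
import numpy as np

def all_pole_filter(excitation: np.ndarray, a: np.ndarray) -> np.ndarray:
    """LPC synthesis filter: y_n = e_n + sum_k a_k y_{n-k}."""
    y = np.zeros(len(excitation))
    for n in range(len(excitation)):
        y[n] = excitation[n] + sum(a[k - 1] * y[n - k] for k in range(1, len(a) + 1) if n - k >= 0)
    return y

def codebook_search(target: np.ndarray, codebook: np.ndarray, a: np.ndarray):
    """Filter each code vector, fit the least-squares gain, keep the lowest-error entry."""
    best = (None, 0.0, np.inf)
    for index, code in enumerate(codebook):
        y = all_pole_filter(code, a)
        gain = np.dot(target, y) / np.dot(y, y)   # optimal gain for this candidate
        error = np.sum((target - gain * y) ** 2)
        if error < best[2]:
            best = (index, gain, error)
    return best

rng = np.random.default_rng(8)
codebook = rng.standard_normal((64, 40))          # hypothetical stochastic codebook
a = np.array([1.3, -0.6])                         # hypothetical LPC coefficients
target = 0.7 * all_pole_filter(codebook[17], a)   # pretend the frame came from entry 17

index, gain, error = codebook_search(target, codebook, a)
print(index, round(gain, 3))   # 17 0.7
```

Only the index (6 bits for a 64-entry codebook) and a quantized gain would need to be sent for this excitation.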


Various LPC Vocoders


ITU-T Speech Coding Standards


Speech Coder Performance

Objective measure: how well does the reconstructed speech signal quantitatively approximate the original version?
– Mean Square Error (MSE) distortion
– Frequency-weighted MSE
– Segmental Signal-to-Noise Ratio (SNR)

Subjective measure: conducted by playing the sample to a number of listeners who judge the quality of the speech
– Overall quality, listening effort, intelligibility, naturalness
– Diagnostic Rhyme Test (DRT): the most popular test for intelligibility
– Diagnostic Acceptability Measure (DAM): evaluates the acceptability of a speech coding system
– Mean Opinion Score (MOS): the most popular ranking system
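Segmental SNR averages per-frame SNRs in dB, so quiet passages count as much as loud ones, unlike a single SNR over the whole signal. A minimal sketch, assuming a hypothetical frame length of 160 samples (20 ms at 8 kHz) and no silent frames (real implementations usually skip or clamp those to avoid dividing by zero); a constant error of 0.1 on a unit signal gives about 20 dB in every frame.

```python
import numpy as np

def segmental_snr(original: np.ndarray, decoded: np.ndarray, frame: int = 160) -> float:
    """Mean of per-frame SNRs in dB; a trailing partial frame is dropped."""
    snrs = []
    for start in range(0, len(original) - frame + 1, frame):
        s = original[start:start + frame]
        e = s - decoded[start:start + frame]
        snrs.append(10 * np.log10(np.sum(s ** 2) / np.sum(e ** 2)))
    return float(np.mean(snrs))

def mse(original: np.ndarray, decoded: np.ndarray) -> float:
    """Mean square error distortion."""
    return float(np.mean((original - decoded) ** 2))

x = np.ones(480)            # hypothetical original: three 160-sample frames
y = x - 0.1                 # hypothetical decoded signal with a constant error of 0.1
print(segmental_snr(x, y))  # about 20 dB
print(mse(x, y))            # about 0.01
```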


Mean Opinion Score (MOS)

Most popular ranking system


MOS for Speech Coders
