Concepts of Multimedia Processing and Transmission
IT 481, Lecture #4
Dennis McCaughey, Ph.D.
25 September, 2006
08/28/2006IT 481, Fall 20062
Introduction to Linear Systems
The Modified Discrete Cosine Transform (MDCT) was introduced in the lecture on MP3 encoding
How does it relate to the Discrete Cosine Transform (DCT) and why are we concerned?
The DCT and MDCT are important enablers in data compression of both audio and video.
The DCT is a special case of the Discrete Fourier Transform (DFT), a key component in digital signal processing
The Fast Fourier Transform (FFT) is a computationally efficient form of the DFT
Linear System Definition
[Block diagram: input f(t) → Linear System h(t) → output g(t)]
Linear System Response to a Series of Sampled Data Inputs
[Figure: sampled inputs ΔT·f(0), ΔT·f(ΔT), ΔT·f(2ΔT), ..., ΔT·f(NΔT) applied at t = 0, ΔT, 2ΔT, ..., NΔT; each input produces a scaled, delayed response ΔT·h(t − nΔT)·f(nΔT)]
$$g_N(t) = \sum_{n=0}^{N-1} \Delta T \, f(n \Delta T) \, h(t - n \Delta T)$$
Linear System Input/Output
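$$g_N(t) = \sum_{n=0}^{N-1} \Delta T \, f(n \Delta T) \, h(t - n \Delta T)$$
$$g(t) = \lim_{\substack{N \to \infty \\ \Delta T \to 0}} g_N(t) = \int_0^t f(\tau) \, h(t - \tau) \, d\tau$$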
This is denoted as the convolution of f(t) and h(t)
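As a rough numerical illustration of this limit, the sketch below evaluates the sampled-data sum g_N(t) and compares it with the closed-form convolution integral. The input f(t) = 1 (a unit step) and the impulse response h(t) = e^{-t} are assumptions chosen only because their convolution, 1 − e^{-t}, is known in closed form; the sample spacing is taken as ΔT = t/N.

```python
import math

def f(t):
    # Assumed input for illustration: unit step, f(t) = 1 for t >= 0
    return 1.0

def h(t):
    # Assumed impulse response for illustration: h(t) = exp(-t) for t >= 0
    return math.exp(-t) if t >= 0 else 0.0

def g_N(t, N):
    # Sampled-data approximation: sum over n of dT * f(n dT) * h(t - n dT)
    dT = t / N
    return sum(dT * f(n * dT) * h(t - n * dT) for n in range(N))

t = 1.0
exact = 1.0 - math.exp(-t)   # closed form of the convolution integral
coarse = g_N(t, 10)          # few samples, large dT
fine = g_N(t, 10000)         # many samples, small dT
print(exact, coarse, fine)
```

With more samples (smaller ΔT) the sum moves closer to the integral, which is the behaviour the limit describes.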
Fourier Transform - Non-periodic Signal
Let g(t) be a continuous, non-periodic function of t
The Fourier Transform of g(t) is
$$G(\omega) = \int_{-\infty}^{\infty} g(t) \, e^{-j \omega t} \, dt$$
– where ω = 2πf is the angular frequency in radians/sec, and f is the frequency in Hz
The Inverse Fourier Transform is
$$g(t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} G(\omega) \, e^{j \omega t} \, d\omega$$
Fourier Transform Example
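$$g(t) = e^{-at} u(t)$$
$$\begin{aligned}
G(\omega) &= \int_{-\infty}^{\infty} e^{-at} u(t) \, e^{-j \omega t} \, dt = \int_0^{\infty} e^{-at} e^{-j \omega t} \, dt \\
&= \int_0^{\infty} e^{-(a + j\omega) t} \, dt \\
&= \frac{1}{a + j\omega} \\
&= \frac{1}{\sqrt{a^2 + \omega^2}} \, e^{-j \tan^{-1}(\omega / a)}
\end{aligned}$$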
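The last step of this example, rewriting 1/(a + jω) in magnitude and phase form, can be checked numerically. The sketch below is only a sanity check under the assumption a > 0 (so that the integral converges and the arctangent gives the correct phase); the values a = 2 and ω = 3 are arbitrary.

```python
import cmath
import math

def G_rect(a, w):
    # Rectangular form of the transform of exp(-a t) u(t)
    return 1.0 / (a + 1j * w)

def G_polar(a, w):
    # Magnitude/phase form: (1 / sqrt(a^2 + w^2)) * exp(-j atan(w / a)), a > 0
    magnitude = 1.0 / math.sqrt(a * a + w * w)
    phase = -math.atan(w / a)
    return magnitude * cmath.exp(1j * phase)

a, w = 2.0, 3.0              # arbitrary illustrative values, a > 0
rect = G_rect(a, w)
polar = G_polar(a, w)
print(rect, polar)
```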
Relationship Between the Fourier Transform and Convolution
If
$$g(t) = \int_0^t f(\tau) \, h(t - \tau) \, d\tau$$
Then
$$G(\omega) = F(\omega) \, H(\omega)$$
The Fourier Transform of g(t) is the product of the Fourier Transforms of f(t) and h(t)
A Very Important Property
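Parseval's Theorem:
$$\int_{-\infty}^{\infty} |g(t)|^2 \, dt = \frac{1}{2\pi} \int_{-\infty}^{\infty} |G(\omega)|^2 \, d\omega$$
$$\text{If } \int_{-\infty}^{\infty} |g(t)|^2 \, dt < \infty, \text{ then } |G(\omega)|^2 \to 0 \text{ as } |\omega| \to \infty$$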
Convolution Sum Example
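$$f = \{ f(0), f(1), f(2) \}, \qquad h = \{ h(0), h(1), h(2) \}$$
$$g(n) = \sum_{k=0}^{n} f(k) \, h(n - k)$$
$$\begin{aligned}
g(0) &= f(0) h(0) \\
g(1) &= f(0) h(1) + f(1) h(0) \\
g(2) &= f(0) h(2) + f(1) h(1) + f(2) h(0) \\
g(3) &= f(1) h(2) + f(2) h(1) \\
g(4) &= f(2) h(2)
\end{aligned}$$
n_g = n_f + n_h − 1
f(k) = h(k) = 0 for k > 2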
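The convolution sum above can be sketched directly in code. The function name `convolve` and the sample values for f and h are illustrative assumptions, not part of the lecture; terms with an index outside either sequence are treated as zero, as stated above.

```python
def convolve(f, h):
    # Output length follows n_g = n_f + n_h - 1
    n_g = len(f) + len(h) - 1
    g = [0] * n_g
    for n in range(n_g):
        for k in range(len(f)):
            # h(n - k) is zero outside 0 .. len(h) - 1
            if 0 <= n - k < len(h):
                g[n] += f[k] * h[n - k]
    return g

f = [1, 2, 3]   # illustrative f(0), f(1), f(2)
h = [4, 5, 6]   # illustrative h(0), h(1), h(2)
g = convolve(f, h)
print(g)        # [4, 13, 28, 27, 18]
```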
Integer Arithmetic Example
Multiplication of 2 Integers is a form of discrete convolution
x = 12345
y = 111
z = x * y

      12345
  *     111
  ---------
      12345
     12345
    12345
  ---------
    1370295
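To make the link to convolution explicit, the sketch below convolves the decimal digit sequences of the two integers and then recombines the result with powers of ten, which takes care of the carries. The helper names are hypothetical and chosen for illustration only.

```python
def digits_lsb_first(x):
    # Decimal digits of a non-negative integer, least significant digit first
    return [int(d) for d in reversed(str(x))]

def multiply_via_convolution(x, y):
    dx, dy = digits_lsb_first(x), digits_lsb_first(y)
    # Discrete convolution of the two digit sequences (no carries yet)
    raw = [0] * (len(dx) + len(dy) - 1)
    for i, a in enumerate(dx):
        for j, b in enumerate(dy):
            raw[i + j] += a * b
    # Weighting each column by its power of ten resolves the carries
    value = sum(c * 10 ** i for i, c in enumerate(raw))
    return raw, value

raw, z = multiply_via_convolution(12345, 111)
print(raw)  # [5, 9, 12, 9, 6, 3, 1]
print(z)    # 1370295
```

The column sums in `raw` are the same sums obtained by adding the shifted partial products in the long multiplication, before carrying.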
Discrete Convolution in Matrix Form
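$$\begin{bmatrix} g_0 \\ g_1 \\ g_2 \\ g_3 \\ g_4 \end{bmatrix} =
\begin{bmatrix}
h_2 & h_1 & h_0 & 0 & 0 \\
0 & h_2 & h_1 & h_0 & 0 \\
0 & 0 & h_2 & h_1 & h_0 \\
h_0 & 0 & 0 & h_2 & h_1 \\
h_1 & h_0 & 0 & 0 & h_2
\end{bmatrix}
\begin{bmatrix} 0 \\ 0 \\ f_0 \\ f_1 \\ f_2 \end{bmatrix}$$
$$g = H_C f$$
H_C is what is known as a "circulant" matrix and is an important structure in signal processing
If H_C is an n×n matrix, it requires on the order of n^2 adds and multiplies to accomplish the convolution
If n = 128 then this is 2^14 operations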
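A minimal sketch of the circulant formulation follows. It assumes one particular arrangement: the input is zero-padded as [0, 0, f0, f1, f2] and the first row of the circulant matrix is [h2, h1, h0, 0, 0], with each later row a right cyclic shift of the one above. Other equivalent arrangements exist; the sample values are illustrative.

```python
def circulant_from_first_row(row):
    # Each row is the previous row cyclically shifted one place to the right
    rows = [list(row)]
    for _ in range(len(row) - 1):
        prev = rows[-1]
        rows.append(prev[-1:] + prev[:-1])
    return rows

def mat_vec(A, v):
    # Plain matrix-vector product: about n^2 multiplies and adds
    return [sum(a * b for a, b in zip(row, v)) for row in A]

f0, f1, f2 = 1, 2, 3          # illustrative input samples
h0, h1, h2 = 4, 5, 6          # illustrative impulse response samples

H_C = circulant_from_first_row([h2, h1, h0, 0, 0])
f_padded = [0, 0, f0, f1, f2]
g = mat_vec(H_C, f_padded)
print(g)                      # [4, 13, 28, 27, 18]
print(128 ** 2 == 2 ** 14)    # True: n^2 operations for n = 128
```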
Enter the Discrete Fourier Transform
If F is the matrix representing the Discrete Fourier Transform, then
$$D = F H_C F^*$$
where D is a diagonal matrix consisting of the discrete Fourier Transform of the first row of H_C
Thus
$$H_C = F^* D F$$
So that
$$g = F^* D F f$$
The FFT is a fast implementation which allows the computations to be accomplished in
$$O(n \log_2 n): \quad 128 \log_2(128) = 128 \cdot 7 = 896 \ll 2^{14}$$
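The practical consequence of the diagonalization is that a convolution can be computed by multiplying DFTs element by element. The sketch below illustrates this with a direct O(n^2) DFT, used here only for clarity (a real implementation would use an FFT). Zero-padding both sequences at the end to length n_f + n_h − 1 is an assumption of this sketch, as are the sample values.

```python
import cmath

def dft(x):
    # Direct DFT, O(n^2); stands in for an FFT in this illustration
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

def idft(X):
    # Direct inverse DFT with the 1/N scaling
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]

f = [1, 2, 3]                       # illustrative sequences
h = [4, 5, 6]
n_out = len(f) + len(h) - 1
f_pad = f + [0] * (n_out - len(f))  # zero-pad so the circular result is linear
h_pad = h + [0] * (n_out - len(h))

# Multiplication in the DFT domain corresponds to the diagonal matrix D
G = [a * b for a, b in zip(dft(f_pad), dft(h_pad))]
g = [round(v.real) for v in idft(G)]
print(g)                            # [4, 13, 28, 27, 18]

direct_ops = 128 ** 2               # order n^2 for the matrix product
fft_ops = 128 * 7                   # order n log2(n) for n = 128
print(direct_ops, fft_ops)          # 16384 896
```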
Discrete Fourier Transform (DFT)
A discrete-time version of the Fourier Transform that can be implemented in digital domain
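Given an N-point time-sampled sequence {x_0, x_1, …, x_{N-1}}, the DFT is described by a transform pair with complexity O(N^2)
$$X_k = \sum_{n=0}^{N-1} x_n \, e^{-j 2\pi k n / N}, \qquad x_n = \frac{1}{N} \sum_{k=0}^{N-1} X_k \, e^{j 2\pi k n / N}$$
Furthermore,
$$\mathrm{IDFT}(X_k) = \frac{1}{N} \left[ \mathrm{DFT}(X_k^*) \right]^*$$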
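The conjugation identity means a forward DFT routine is enough to compute the inverse as well. The sketch below checks this on a short sequence; the function names and sample values are illustrative assumptions, and the direct O(N^2) sums follow the transform pair with a negative exponent in the forward direction.

```python
import cmath

def dft(x):
    # Forward transform: X_k = sum_n x_n exp(-j 2 pi k n / N)
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

def idft(X):
    # Inverse transform: x_n = (1/N) sum_k X_k exp(+j 2 pi k n / N)
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]

def idft_via_dft(X):
    # IDFT(X) = (1/N) * conj(DFT(conj(X))), reusing only the forward routine
    N = len(X)
    return [v.conjugate() / N for v in dft([c.conjugate() for c in X])]

x = [1.0, 2.0, 0.0, -1.0]     # illustrative samples
X = dft(x)
x_a = idft(X)
x_b = idft_via_dft(X)
print([round(v.real, 6) for v in x_b])
```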
Fast Fourier Transform (FFT)
FFT is a computationally efficient algorithm, O(N log2 N). Recall the DFT
$$X_n = \sum_{k=0}^{N-1} x_k \, e^{-j 2\pi k n / N}$$
Let
$$W_N = e^{-j 2\pi / N}$$
It can be shown that
$$X_n = G_n + W_N^n H_n$$
where G_n and H_n are two half-sized DFTs of the even and odd terms
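The even/odd split can be applied recursively, which gives a radix-2 FFT. The following is a minimal recursive sketch, assuming the input length is a power of two; it is written for readability rather than speed and is compared against a direct DFT on illustrative data.

```python
import cmath

def fft(x):
    # Recursive radix-2 decimation in time; len(x) must be a power of two
    N = len(x)
    if N == 1:
        return list(x)
    G = fft(x[0::2])          # half-size DFT of the even-indexed terms
    H = fft(x[1::2])          # half-size DFT of the odd-indexed terms
    X = [0j] * N
    for n in range(N // 2):
        w = cmath.exp(-2j * cmath.pi * n / N)   # W_N^n
        X[n] = G[n] + w * H[n]
        # G and H repeat with period N/2, and W_N^(n + N/2) = -W_N^n
        X[n + N // 2] = G[n] - w * H[n]
    return X

def dft(x):
    # Direct O(N^2) DFT used as the reference
    N = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * k * n / N) for k in range(N))
            for n in range(N)]

x = [1.0, 2.0, 3.0, 4.0, 0.0, -1.0, -2.0, -3.0]   # illustrative samples
X_fast = fft(x)
X_ref = dft(x)
print(max(abs(a - b) for a, b in zip(X_fast, X_ref)))
```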
The FFT Efficient Implementation
Each half-size DFT can in turn be divided into a pair of quarter-size DFTs.
End result is a partition and reordering of time domain inputs using what is known as bit-reverse addressing
– Each stage of the DFT consists of N complex multiply-accumulates in a straightforward implementation
– Further simplification from eight to six real operations by the “butterfly”
– Further simplification when time-domain sequence is real
The FFT Structure
The Discrete Cosine Transform (DCT)
DCT is a Fourier-related transform similar to the discrete Fourier transform (DFT), but using only real numbers
It is equivalent to a DFT of roughly twice the length, operating on real data with even symmetry (since the Fourier transform of a real and even function is real and even)
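$$X_c(k) = g(k) \sum_{n=0}^{N-1} x(n) \cos\left[ \frac{(2n+1) k \pi}{2N} \right], \quad k = 0, 1, 2, \ldots, N-1$$
$$x(n) = \frac{1}{N} \sum_{k=0}^{N-1} X_c(k) \, g(k) \cos\left[ \frac{(2n+1) k \pi}{2N} \right], \quad n = 0, 1, 2, \ldots, N-1$$
$$g(0) = 1, \quad g(k) = \sqrt{2}, \quad k = 1, 2, \ldots, N-1$$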
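A small round-trip sketch of the DCT pair follows. It assumes the scaling g(0) = 1, g(k) = √2 for k ≥ 1 in both directions and a 1/N factor in the inverse, which is one common normalization that makes the pair exactly invertible; the sample blocks are illustrative.

```python
import math

def g(k):
    # Scaling assumed in this sketch: g(0) = 1, g(k) = sqrt(2) otherwise
    return 1.0 if k == 0 else math.sqrt(2.0)

def dct(x):
    # Forward DCT: X_c(k) = g(k) * sum_n x(n) cos((2n + 1) k pi / (2N))
    N = len(x)
    return [g(k) * sum(x[n] * math.cos((2 * n + 1) * k * math.pi / (2 * N))
                       for n in range(N))
            for k in range(N)]

def idct(X):
    # Inverse DCT: x(n) = (1/N) * sum_k X_c(k) g(k) cos((2n + 1) k pi / (2N))
    N = len(X)
    return [sum(X[k] * g(k) * math.cos((2 * n + 1) * k * math.pi / (2 * N))
                for k in range(N)) / N
            for n in range(N)]

x = [1.0, 2.0, 3.0, 4.0, 4.0, 3.0, 2.0, 1.0]   # illustrative block
X = dct(x)
x_back = idct(X)

flat = dct([5.0, 5.0, 5.0, 5.0])   # a constant block has only a k = 0 term
print([round(v, 6) for v in x_back])
print([round(v, 6) for v in flat])
```

The constant block shows why the DCT suits compression: smooth inputs concentrate their energy in a few low-index coefficients, all of them real.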
The Modified Discrete Cosine Transform (MDCT)
The MDCT is 50% overlapped making it very useful for quantization as it effectively removes the otherwise easily detectable blocking artifact between blocks
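$$X_{mc}(k) = \sum_{n=0}^{N-1} x_n \cos\left[ \frac{\pi}{2N} \left( 2n + 1 + \frac{N}{2} \right) (2k + 1) \right], \quad k = 0, 1, 2, \ldots, \frac{N}{2} - 1$$
$$x(n) = \sum_{k=0}^{N/2-1} X_{mc}(k) \cos\left[ \frac{\pi}{2N} \left( 2n + 1 + \frac{N}{2} \right) (2k + 1) \right], \quad n = 0, 1, 2, \ldots, N - 1$$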
In Matrix Notation (2 Length-8 Blocks)
$$x = [x_0, x_1, x_2, x_3, x_4, x_5, x_6, x_7, x_8, x_9, x_{10}, x_{11}]'$$
$$X_{mc} = T x$$
where
$$T_{8 \times 12} = \begin{bmatrix} \mathrm{MDCT}_{4 \times 8} & 0_{4 \times 4} \\ 0_{4 \times 4} & \mathrm{MDCT}_{4 \times 8} \end{bmatrix}$$
$$\frac{1}{4} \, T \, T' = I_{8 \times 8}$$
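The block structure for two 50%-overlapped length-8 blocks can be checked numerically. The sketch below assumes the unwindowed MDCT kernel cos[(π/2N)(2n + 1 + N/2)(2k + 1)] with N = 8, builds the 8×12 matrix T from two copies of the 4×8 MDCT matrix offset by 4 samples, and verifies that (1/4)·T·T' is the identity. It also shows that applying (1/4)·T' to the coefficients recovers the four samples covered by both blocks; the samples covered by only one block remain aliased. The test signal is arbitrary.

```python
import math

N = 8  # block length; each block yields N/2 = 4 coefficients

def mdct_matrix(N):
    # Rows k = 0 .. N/2 - 1, columns n = 0 .. N - 1 (no window applied)
    return [[math.cos(math.pi / (2 * N) * (2 * n + 1 + N / 2) * (2 * k + 1))
             for n in range(N)]
            for k in range(N // 2)]

M = mdct_matrix(N)

# T is 8 x 12: block 1 covers samples 0..7, block 2 covers samples 4..11
T = [row + [0.0] * 4 for row in M] + [[0.0] * 4 + row for row in M]

# (1/4) * T * T' should be the 8 x 8 identity
TTt = [[sum(T[i][c] * T[j][c] for c in range(12)) / 4 for j in range(8)]
       for i in range(8)]

x = [float(v) for v in (3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8)]  # arbitrary samples
X = [sum(T[i][c] * x[c] for c in range(12)) for i in range(8)]

# Overlap-add reconstruction: y = (1/4) * T' * X
y = [sum(T[i][c] * X[i] for i in range(8)) / 4 for c in range(12)]
print([round(v, 6) for v in y[4:8]])   # matches x[4:8]
```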
Fourier Transform Summary
Physical Interpretation
– Describes the frequency content of a real-world signal
– For real-world signals, frequency content tails off as frequencies get higher
Mathematical Interpretation
– Convolution in time domain becomes multiplication in frequency domain
– A matrix that diagonalizes a circulant convolution matrix
– DCT is a special case of the DFT
Adaptive Transform Coding (ATC)
Another frequency domain technique, for the bit rate range of 9.6 – 20 Kbps; it involves block transformation of windowed input segments of the speech waveform
Each segment is represented by a set of transformed coefficients which are quantized and transmitted in lieu of the signal itself
At receiver, quantized coefficients are inverse-transformed to get back to original waveform
The most attractive and frequently used transform is the Discrete Cosine Transform (DCT) and the corresponding Inverse Discrete Cosine Transform (IDCT)
ATC Practicality
Bit allocation among different coefficients is varied adaptively from frame to frame while keeping the total number of bits constant
Time-varying statistics control the bit allocation procedure and have to be transmitted as side information (an overhead of about 2 Kbps)
Side information is also used to determine the step size of various coefficient quantizers
In practice, the DCT and IDCT are not directly evaluated using the formulation here but rather by a computationally efficient algorithm such as the FFT
Source Coding - Vocoders
A class of speech coding systems that analyze the voice signal at the transmitter, derive the parameters and transmit them to the receiver, where the voice is synthesized using these parameters
All vocoders attempt to model the speech generation process by a dynamic system and quantify the physical parameters of the system
In general they are much more complex than waveform coders and achieve very high economy in transmission bit rate
They tend to be less robust and their performance is very much speaker-dependent
Channel Vocoder
The first among many analysis-synthesis systems that was demonstrated
A frequency domain vocoder that determines the envelope of the speech signal for a number of frequency bands and then samples, encodes and multiplexes these samples with the encoded outputs of the other filters
The sampling is done synchronously every 10 ms to 30 ms
Along with the energy information about each band, the voiced/unvoiced decision and the pitch frequency for voiced speech are also transmitted
Cepstrum Vocoder
The cepstrum vocoder separates the excitation and vocal tract spectrum by the Inverse Fourier transform of the log magnitude spectrum of the signal
– The low frequency coefficients in the cepstrum correspond to the vocal tract spectral envelope
– High frequency excitation coefficients form a periodic pulse train at multiples of the sampling period
At the receiver, the vocal tract cepstral coefficients are Fourier transformed to produce the vocal tract impulse response
By convolving the impulse response with a synthetic excitation signal, the original speech is reconstructed
Linear Predictive Coders (LPC)
The time-domain LPC extracts the significant features of speech from its waveform.
Computationally intensive but by far the most popular among the class of low bit rate vocoders. It’s possible to transmit good quality voice at 4.8 Kbps and poorer quality voice at lower rates
LPC models the vocal tract as an all-pole digital filter, and uses a weighted sum of the past p samples to estimate the present sample (10 ≤ p ≤ 15), with e_n being the prediction error
$$s_n = \sum_{k=1}^{p} a_k \, s_{n-k} + e_n$$
LPC Coefficients
The LPC coefficients a_k are found by solving the system of equations
$$\sum_{k=0}^{p} C_{mk} \, a_k = 0, \quad m = 1, 2, \ldots, p, \quad \text{with } a_0 = -1$$
where C_mk are the correlation coefficients computed from the m-th and k-th lags of s_n
A matrix inversion is needed, hence the high computational load
The reflection coefficients, a related set of coefficients, are transmitted in practice
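As an illustration of solving these equations, the sketch below generates a signal from a known second-order all-pole model and recovers its coefficients. With a_0 = −1 the system is equivalent to sum over k = 1..p of C_mk a_k = C_m0. Several choices here are assumptions made for the example only: the covariance-style definition of C_mk as a sum of lagged products, plain Gaussian elimination in place of any faster solver, the excitation being a single unit impulse, and the coefficient values 1.3 and −0.4.

```python
def synthesize(a, length):
    # s_n = sum_k a_k s_(n-k) + e_n, with e_n a unit impulse at n = 0 (assumed)
    s = []
    for n in range(length):
        e = 1.0 if n == 0 else 0.0
        pred = sum(a[k] * s[n - 1 - k] for k in range(len(a)) if n - 1 - k >= 0)
        s.append(pred + e)
    return s

def corr(s, m, k, p):
    # C_mk: sum of products of the m-th and k-th lags of s_n (assumed form)
    return sum(s[n - m] * s[n - k] for n in range(p, len(s)))

def solve(A, b):
    # Gaussian elimination with partial pivoting, then back substitution
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def lpc(s, p):
    # Solve sum_{k=1..p} C_mk a_k = C_m0 for m = 1..p
    A = [[corr(s, m, k, p) for k in range(1, p + 1)] for m in range(1, p + 1)]
    b = [corr(s, m, 0, p) for m in range(1, p + 1)]
    return solve(A, b)

true_a = [1.3, -0.4]            # illustrative stable predictor coefficients
s = synthesize(true_a, 50)
est_a = lpc(s, 2)
print([round(v, 6) for v in est_a])
```

The elimination step costs on the order of p^3 operations per frame, which is the computational load the slide refers to.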
LPC Transmitted Parameters
Reflection coefficients can be adequately represented by 6 bits each
For a p = 10 predictor, 72 bits per frame are needed
– 60 bits for coefficients
– 5 bits for a gain parameter and 6 bits for a pitch period
If parameters are estimated every 15 – 30 msec
– Resulting bit rate has a range of 2400 – 4800 bps
Additional saving can be achieved via a non-linear transformation of the coefficients prior to coding to reduce sensitivity to quantization error
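The bit-rate figures follow from dividing the bits per frame by the frame interval. The short sketch below works through that arithmetic for 72 bits per frame; the helper name is hypothetical, and the 20 msec case is included only for comparison.

```python
def bit_rate_bps(bits_per_frame, frame_ms):
    # Bits per frame divided by the frame interval in seconds
    return bits_per_frame * 1000.0 / frame_ms

bits_per_frame = 72
rates = {ms: bit_rate_bps(bits_per_frame, ms) for ms in (15, 20, 30)}
print(rates)   # {15: 4800.0, 20: 3600.0, 30: 2400.0}
```

A 15 msec update interval gives 4800 bps and a 30 msec interval gives 2400 bps.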
LPC Receiver Processing
At the receiver, the coefficients are used for a synthesis filter.
Various LPC methods differ based on how the synthesis filter is excited
– Multi-pulse Excited LPC: typically 8 pulses with proper positions are used as excitation
– Code-Excited LPC (CELP): transmitter searches its code book for a stochastic excitation to the LPC filter that gives the best perceptual match to the sound. The index to the code book is then transmitted
CELP coders are extremely complex and can require more than 500 MIPS
However, high quality is achieved when the excitation is coded at 0.25 bits/sample, at transmission bit rates as low as 4.8 Kbps
Various LPC Vocoders
ITU-T Speech Coding Standards
Speech Coder Performance
Objective measure: how well does the reconstructed speech signal quantitatively approximate the original version?
– Mean Square Error (MSE) distortion
– Frequency weighted MSE
– Segmented Signal to Noise Ratio (SNR)
Subjective measure: conducted by playing the sample to a number of listeners to judge the quality of the speech
– Overall quality, listening effort, intelligibility, naturalness
– Diagnostic Rhyme Test (DRT): most popular for intelligibility
– Diagnostic Acceptability Measure (DAM): evaluates acceptability of speech coding system
– Mean Opinion Score (MOS): the most popular ranking system
Mean Opinion Score (MOS)
Most popular ranking system
MOS for Speech Coders