Speech & Audio Processing
Speech & Audio Coding Examples
April 22, 2023 Veton Këpuska 2
A Simple Speech Coder
LPC Based Analysis Structure
[Block diagram: Audio Input → Pre-emphasis → Windowing Analysis → Linear Prediction Analysis (Auto-Correlation, Levinson-Durbin) → Filter Coeffs; the Analysis Filter uses the Filter Coeffs to produce the Residual; Residual and Filter Coeffs → Quantization]
Windowing Analysis Stage
N – Length of the Analysis Window
10-30 msec
Some Analysis Windows
MATLAB Useful Functions
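wintool: use "doc wintool" for more information.
window: use "doc window" for the list of supported windows.
Define your own window if needed, e.g., the sine window and the Vorbis window:
$$ w_{\text{sine}}[n] = \sin\left(\frac{\pi (n + 0.5)}{N}\right), \quad 0 \le n \le N-1 $$
$$ w_{\text{vorbis}}[n] = \sin\left(\frac{\pi}{2} \sin^{2}\left(\frac{\pi (n + 0.5)}{N}\right)\right), \quad 0 \le n \le N-1 $$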
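The two window definitions above can be tried out with a short sketch. It is written in Python (standard library only) rather than MATLAB, and the function names `sine_window`, `vorbis_window`, and `power_complementary_error` are illustrative choices, not part of any toolbox. The last helper checks the property w[n]^2 + w[n + N/2]^2 = 1, which both windows are expected to satisfy for even N.

```python
import math

def sine_window(N):
    """Sine window: w[n] = sin(pi * (n + 0.5) / N), n = 0..N-1."""
    return [math.sin(math.pi * (n + 0.5) / N) for n in range(N)]

def vorbis_window(N):
    """Vorbis window: w[n] = sin(pi/2 * sin^2(pi * (n + 0.5) / N))."""
    return [math.sin(0.5 * math.pi * math.sin(math.pi * (n + 0.5) / N) ** 2)
            for n in range(N)]

def power_complementary_error(w):
    """Largest deviation of w[n]^2 + w[n + N/2]^2 from 1 (N assumed even)."""
    half = len(w) // 2
    return max(abs(w[n] ** 2 + w[n + half] ** 2 - 1.0) for n in range(half))

if __name__ == "__main__":
    for make in (sine_window, vorbis_window):
        w = make(16)
        print(make.__name__, power_complementary_error(w))
```

Both windows are symmetric and taper to near zero at the frame edges; the Vorbis window simply has a steeper transition than the sine window.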
LPC Analysis Stage
LPC method described in: Ch5-Analysis_&_Synthesis_of_Pole-Zero_Speech_Models.ppt
Summary:
  Perform autocorrelation.
  Solve the system of equations with the Levinson-Durbin method.
MATLAB help: doc lpc, etc.
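The two steps in the summary can be sketched as follows. This is a minimal Python illustration (standard library only), not the MATLAB `lpc` implementation. It assumes the predictor convention s[n] ≈ Σ a_k s[n-k], and the names `autocorrelation` and `levinson_durbin` are hypothetical.

```python
def autocorrelation(x, p):
    """Short-time autocorrelation r[0..p] of the (already windowed) frame x."""
    n = len(x)
    return [sum(x[i] * x[i + m] for i in range(n - m)) for m in range(p + 1)]

def levinson_durbin(r, p):
    """Solve the normal equations for predictor coefficients a_1..a_p.

    Returns (a, k, E): predictor coefficients, reflection (PARCOR)
    coefficients, and the final prediction error energy.
    """
    a = [0.0] * (p + 1)          # a[0] is unused so indices match the math
    k = [0.0] * (p + 1)
    E = r[0]
    for i in range(1, p + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k[i] = acc / E
        new_a = a[:]
        new_a[i] = k[i]
        for j in range(1, i):
            new_a[j] = a[j] - k[i] * a[i - j]
        a = new_a
        E *= (1.0 - k[i] ** 2)
    return a[1:], k[1:], E

if __name__ == "__main__":
    r = autocorrelation([1.0, 0.9, 0.7, 0.4, 0.1, -0.2, -0.4], 2)
    print(levinson_durbin(r, 2))
```

Note that MATLAB's `lpc` returns the coefficients of A(z) = 1 + a(2) z^-1 + ..., which have the opposite sign from the predictor coefficients used in this sketch.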
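Example of MATLAB Code
function myLPCCodec(wavfile, N)
% myLPCCodec  LPC analysis followed by lossless resynthesis.
%   wavfile - input MS wav file
%   N       - LPC filter order
[x, fs] = audioread(wavfile);   % audioread replaces the removed wavread
x = x(:, 1);                    % use the first channel if the file is stereo
% plot(x);
% Playing original signal
soundsc(x, fs);
% Performing LPC analysis using MATLAB lpc function
% a holds the coefficients of A(z); g is the prediction error variance
[a, g] = lpc(x, N);
% Filtering with the estimated coefficients to produce the predicted samples
est_x = filter([0 -a(2:end)], 1, x);
% Error (residual) signal
e = x - est_x;
% Testing the quality of predicted samples
soundsc(est_x, fs);
% Synthesis stage with zero loss of information: all-pole filter 1/A(z)
% driven by the unquantized residual, which already carries its own gain
syn_x = filter(1, a, e);
soundsc(syn_x, fs);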
$$ H(z) = \frac{1}{A(z)} $$
[Diagram: excitation g e[n] → H(z) → ŝ[n]]
$$ \hat{s}[n] = \sum_{k=1}^{p} a_k \hat{s}[n-k] + g\, e[n] $$
Analysis of Quantization Errors
Use MATLAB functions to research the effects of quantization errors introduced by the precision of the arithmetic operations and by the representation of the filter and error signal:
  Double (float64) representation (software emulation)
  Float (float32) representation (software emulation)
  Int (int32) representation (hardware emulation)
  Short (int16) representation (hardware emulation)
Useful MATLAB functions: fix, floor, round, ceil
Example: truncation of sig to B bits:
  sig_hat = fix(sig*2^(B-1))/2^(B-1);
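The truncation example can be mirrored outside MATLAB. The sketch below is a Python version (standard library only) in which `math.trunc` plays the role of `fix`, since both round toward zero. The names `truncate_to_bits` and `max_abs_error` are illustrative only.

```python
import math

def truncate_to_bits(sig, B):
    """Truncate each sample (assumed in [-1, 1)) to B bits, like fix() in MATLAB."""
    scale = 2 ** (B - 1)
    return [math.trunc(s * scale) / scale for s in sig]

def max_abs_error(sig, sig_hat):
    """Largest absolute quantization error between the two signals."""
    return max(abs(s - q) for s, q in zip(sig, sig_hat))

if __name__ == "__main__":
    sig = [0.3, -0.3, 0.5, 0.123456]
    for B in (4, 8, 16):
        sig_hat = truncate_to_bits(sig, B)
        print(B, sig_hat, max_abs_error(sig, sig_hat))
```

With truncation, the error per sample stays below one quantization step, 2^-(B-1), so each added bit halves the worst-case error.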
Quantization of Error Signal & Filter Coefficients
Can apply ADPCM for the error signal.
Filter coefficients in the direct filter form are found to be sensitive to quantization errors:
  A small quantization error can have a large effect on filter characteristics.
  The issue is that the polynomial coefficients have a non-linear mapping to the poles of the filter (i.e., the roots of the polynomial).
Alternate representations are possible that have significantly better tolerance to quantization error.
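The sensitivity described above can be illustrated with a second-order all-pole filter. In this Python sketch (standard library only), the pole radius 0.99, the pole angle 0.05 rad, and the 8-bit step size are arbitrary assumptions chosen for illustration. The direct-form coefficients are rounded, and the resulting pole movement is compared with the coefficient error.

```python
import cmath
import math

def poles_2nd_order(a1, a2):
    """Roots of z^2 + a1*z + a2, i.e. the poles of 1 / (1 + a1 z^-1 + a2 z^-2)."""
    d = cmath.sqrt(a1 * a1 - 4.0 * a2)
    return ((-a1 + d) / 2.0, (-a1 - d) / 2.0)

def quantize(c, B):
    """Round c to the nearest multiple of the step 2^-(B-1)."""
    step = 2.0 ** -(B - 1)
    return round(c / step) * step

# Assumed example: a narrow low-frequency resonance close to the unit circle.
r, theta = 0.99, 0.05
a1, a2 = -2.0 * r * math.cos(theta), r * r
a1_q, a2_q = quantize(a1, 8), quantize(a2, 8)

coeff_error = max(abs(a1 - a1_q), abs(a2 - a2_q))
exact = poles_2nd_order(a1, a2)
moved = poles_2nd_order(a1_q, a2_q)
# For each exact pole, distance to the nearest quantized pole; keep the worst case.
pole_shift = max(min(abs(p - q) for q in moved) for p in exact)

if __name__ == "__main__":
    print("coefficient error:", coeff_error)
    print("pole shift:       ", pole_shift)
    print("largest pole radius before/after:",
          max(abs(p) for p in exact), max(abs(q) for q in moved))
```

In this example the poles move by considerably more than the coefficients were perturbed. That is the motivation for quantizing PARCOR or LSF parameters instead of direct-form coefficients.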
LPC Filter Representations
As noted previously, when the Levinson-Durbin algorithm was introduced, one alternate representation of the filter coefficients was also mentioned: PARCOR coefficients.
LPC to PARCOR:
$$ a_j^{(p)} = a_j, \quad 1 \le j \le p $$
for i = p, p-1, …, 1:
$$ k_i = a_i^{(i)} $$
$$ a_j^{(i-1)} = \frac{a_j^{(i)} + a_i^{(i)} a_{i-j}^{(i)}}{1 - k_i^{2}}, \quad 1 \le j \le i-1 $$
PARCOR Filter Representation
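PARCOR to LPC:
for i = 1, 2, …, p:
$$ a_i^{(i)} = k_i $$
$$ a_j^{(i)} = a_j^{(i-1)} - k_i a_{i-j}^{(i-1)}, \quad 1 \le j \le i-1 $$
$$ a_j = a_j^{(p)}, \quad 1 \le j \le p $$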
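The two conversions (LPC to PARCOR, and PARCOR back to LPC) can be written as a pair of short routines. This Python sketch (standard library only) assumes the predictor convention A(z) = 1 - Σ a_k z^-k and stable filters with |k_i| < 1. The function names `parcor_to_lpc` and `lpc_to_parcor` are hypothetical.

```python
def parcor_to_lpc(k):
    """Step-up recursion: reflection coefficients k_1..k_p -> a_1..a_p."""
    a = []
    for i, ki in enumerate(k, start=1):
        # a_j^(i) = a_j^(i-1) - k_i * a_{i-j}^(i-1) for j < i, and a_i^(i) = k_i
        a = [a[j] - ki * a[i - 2 - j] for j in range(i - 1)] + [ki]
    return a

def lpc_to_parcor(a):
    """Step-down recursion: predictor coefficients a_1..a_p -> k_1..k_p."""
    a = list(a)
    p = len(a)
    k = [0.0] * p
    for i in range(p, 0, -1):
        ki = a[i - 1]                      # k_i = a_i^(i)
        if abs(ki) >= 1.0:
            raise ValueError("unstable filter: |k_i| must be below 1")
        k[i - 1] = ki
        denom = 1.0 - ki * ki
        # a_j^(i-1) = (a_j^(i) + k_i * a_{i-j}^(i)) / (1 - k_i^2)
        a = [(a[j] + ki * a[i - 2 - j]) / denom for j in range(i - 1)]
    return k

if __name__ == "__main__":
    k = [0.5, -0.3, 0.2]
    a = parcor_to_lpc(k)
    print(a, lpc_to_parcor(a))
```

A round trip through both routines should return the original coefficients up to floating-point error, which is a convenient self-check when experimenting with PARCOR quantization.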
Line Spectral Frequency Representation
It turns out that PARCOR coefficients can be represented with LSF that have significantly better properties.
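Note that:
$$ H(z) = \frac{1}{A(z)} $$
The PARCOR lattice structure of the LPC synthesis filter above:
[Lattice diagram: cascaded stages with unit delays z^{-1} and reflection coefficients k_p, k_{p-1}, …, with k_0 = -1; signals A_0, …, A_{p-1}, A_p and B_0, …, B_{p-1}, B_p between Input and Output; an added stage with k_{p+1} = ∓1]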
Line Spectral Frequency Representation
From the previous slide the following holds:
$$ A_p(z) = A_{p-1}(z) - k_p B_{p-1}(z) $$
$$ B_p(z) = z^{-1}\left[B_{p-1}(z) - k_p A_{p-1}(z)\right] $$
$$ A_0(z) = 1 \quad \& \quad B_0(z) = z^{-1} $$
$$ B_p(z) = z^{-(p+1)} A_p(z^{-1}) $$
From this realization of the filter the LSP representation is derived:
LSF Representation
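$$ P_{p+1}(z) = A_p(z) - B_p(z), \quad k_{p+1} = 1 $$
$$ Q_{p+1}(z) = A_p(z) + B_p(z), \quad k_{p+1} = -1 $$
$$ A_p(z) = \frac{1}{2}\left[P_{p+1}(z) + Q_{p+1}(z)\right] $$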
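The construction of P(z) and Q(z) from A(z) can be checked numerically. The Python sketch below (standard library only) represents each polynomial by its list of coefficients in powers of z^-1 and assumes A(z) = 1 - Σ a_k z^-k. The name `lsf_polynomials` is hypothetical. Locating the line spectral frequencies themselves needs a polynomial root finder, which is left out here.

```python
def lsf_polynomials(a):
    """Build P(z) = A(z) - B(z) and Q(z) = A(z) + B(z) from predictor coeffs a_1..a_p.

    B(z) = z^-(p+1) * A(1/z), so its coefficient list is A's list reversed
    and delayed by one sample. All lists hold coefficients of z^0 .. z^-(p+1).
    """
    A = [1.0] + [-ak for ak in a]      # A(z) = 1 - sum_k a_k z^-k
    A_padded = A + [0.0]               # extend to degree p + 1
    B = [0.0] + A[::-1]                # z^-(p+1) * A(1/z)
    P = [x - y for x, y in zip(A_padded, B)]
    Q = [x + y for x, y in zip(A_padded, B)]
    return A_padded, P, Q

if __name__ == "__main__":
    A, P, Q = lsf_polynomials([0.65, -0.3])
    print("A:", A)
    print("P:", P)   # antisymmetric coefficients
    print("Q:", Q)   # symmetric coefficients
```

P(z) has antisymmetric coefficients and Q(z) symmetric ones, and averaging them recovers A(z), in line with the third equation above.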
LPC Synthesis Filter with LSF
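$$ H(z) = \frac{1}{A_p(z)} = \frac{1}{1 + \left[A_p(z) - 1\right]} = \frac{1}{1 + \frac{1}{2}\left\{\left[P_{p+1}(z) - 1\right] + \left[Q_{p+1}(z) - 1\right]\right\}} $$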
A Simple Speech Coder
LPC Based Synthesis Structure
[Block diagram: Decoding → Residual Signal and Filter Coeffs; Residual → Synthesis Filter (using the Filter Coeffs) → De-emphasis → Audio Output]
Audio Coding
Audio Coding
Most of the Audio Coding Standards use principles of Psychoacoustics.
Example of Basic Structure of MP3 encoder:
[Block diagram: Audio Input → Filterbank & Transform → Quantization → Bit-stream; Audio Input → Psychoacoustic Model → Quantization]
Basic Structure of Audio Coders
  Filterbank Processing
  Psychoacoustic Model
  Quantization
Filter Bank Analysis Synthesis
Filterbank Processing:
Splitting the full-band signal into several sub-bands:
  Uniform sub-bands (FFT)
  Critical bands (FFT followed by a non-linear transformation)
    Reflect the human auditory apparatus.
    Mel-scale and Bark-scale transformations:
$$ \text{Mel}(f) = 1127.01048 \, \ln\left(1 + \frac{f}{700}\right) $$
$$ \text{Bark}(f) = 13 \arctan(0.00076 f) + 3.5 \arctan\left(\left(\frac{f}{7500}\right)^{2}\right) $$
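The two frequency-warping formulas above translate directly into code. This Python sketch (standard library only) uses the hypothetical function names `hz_to_mel` and `hz_to_bark`, with f given in Hz.

```python
import math

def hz_to_mel(f):
    """Mel(f) = 1127.01048 * ln(1 + f / 700), f in Hz."""
    return 1127.01048 * math.log(1.0 + f / 700.0)

def hz_to_bark(f):
    """Bark(f) = 13 * arctan(0.00076 f) + 3.5 * arctan((f / 7500)^2), f in Hz."""
    return 13.0 * math.atan(0.00076 * f) + 3.5 * math.atan((f / 7500.0) ** 2)

if __name__ == "__main__":
    for f in (100, 500, 1000, 4000, 8000, 16000):
        print(f, round(hz_to_mel(f), 1), round(hz_to_bark(f), 2))
```

Both scales are roughly linear at low frequencies and compress the high-frequency range, which is why equal steps in Mel or Bark correspond to progressively wider bands in Hz.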
Mel-Scale
Bark-Scale
Analysis Structure of Filterbank
[Block diagram: Audio Input → analysis filters h1[n], …, hk[n], …, hN[n] → ↓ → MDCT (one per channel) → Quantization → Bit Stream]
hk[n] – Impulse response of the kth quadrature mirror filter
N – Number of channels, typically 32
↓ – Down-sampling
MDCT – Modified Discrete Cosine Transform
Synthesis Structure of Filterbank
[Block diagram: Bit Stream → Decoding → IMDCT (one per channel) → ↑ → synthesis filters g1[n], …, gk[n], …, gN[n] → Audio Output]
gk[n] – Impulse response of the kth inverse quadrature mirror filter
N – Number of channels, typically 32
↑ – Up-sampling
IMDCT – Inverse Modified Discrete Cosine Transform
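The MDCT/IMDCT pair named in the legends can be demonstrated in isolation, without the polyphase filters or quantization. The Python sketch below (standard library only) uses a direct O(N^2) evaluation, a sine window, 50% overlapped frames, and a 2/N scale on the inverse. These are assumptions made for illustration and do not reproduce any particular codec's implementation. The function names are hypothetical.

```python
import math

def sine_window(length):
    """Sine window over a full frame of the given length."""
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]

def mdct(frame):
    """MDCT: 2N windowed input samples -> N coefficients (direct evaluation)."""
    N = len(frame) // 2
    n0 = 0.5 + N / 2.0
    return [sum(frame[n] * math.cos(math.pi / N * (n + n0) * (k + 0.5))
                for n in range(2 * N))
            for k in range(N)]

def imdct(coeffs):
    """IMDCT: N coefficients -> 2N time-aliased samples."""
    N = len(coeffs)
    n0 = 0.5 + N / 2.0
    return [(2.0 / N) * sum(coeffs[k] * math.cos(math.pi / N * (n + n0) * (k + 0.5))
                            for k in range(N))
            for n in range(2 * N)]

def analysis_synthesis(x, N):
    """Window, transform, inverse-transform, window again, and overlap-add."""
    w = sine_window(2 * N)
    y = [0.0] * len(x)
    for start in range(0, len(x) - 2 * N + 1, N):   # hop of N = 50% overlap
        frame = [w[n] * x[start + n] for n in range(2 * N)]
        rec = imdct(mdct(frame))
        for n in range(2 * N):
            y[start + n] += w[n] * rec[n]
    return y

if __name__ == "__main__":
    N = 8
    x = [math.sin(0.3 * n) + 0.05 * n for n in range(4 * N)]
    y = analysis_synthesis(x, N)
    print(max(abs(y[n] - x[n]) for n in range(N, 3 * N)))
```

Each frame on its own contains time-domain aliasing. The aliasing cancels only where two neighbouring frames overlap, so the first and last N samples of the signal are not reconstructed in this sketch.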
Psycho-Acoustic Modeling
Psychoacoustic Model
The masking threshold is determined according to human auditory perception.
The masking threshold is used to quantize the Discrete Cosine Transform coefficients.
Analysis is done in the frequency domain, represented by the DFT and computed by the FFT.
Threshold of Hearing
Absolute threshold of audibly perceptible events in quiet conditions (no other sounds).
Any signal below the threshold can be removed without effect on the perception.
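The slides give no formula for this threshold. A commonly used closed-form approximation, attributed to Terhardt, is sketched below in Python (standard library only) as an assumption, not as the curve used in this course. The function name `threshold_in_quiet_db` is hypothetical. The result is in dB SPL, with f in Hz.

```python
import math

def threshold_in_quiet_db(f_hz):
    """Approximate absolute threshold of hearing in dB SPL (assumed Terhardt fit)."""
    f = f_hz / 1000.0                      # frequency in kHz
    return (3.64 * f ** -0.8
            - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)

def is_inaudible(level_db_spl, f_hz):
    """True if a tone at this level lies below the threshold in quiet."""
    return level_db_spl < threshold_in_quiet_db(f_hz)

if __name__ == "__main__":
    for f in (100, 1000, 3300, 10000, 16000):
        print(f, round(threshold_in_quiet_db(f), 2))
```

Under this approximation the ear is most sensitive around 3 to 4 kHz, and components at very low or very high frequencies need much higher levels before they become audible.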
Threshold of Hearing
Frequency Masking
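Schröder Spreading Function and Bark Scale Function:
$$ z(f) = 13 \arctan(0.00076 f) + 3.5 \arctan\left(\left(\frac{f}{7500}\right)^{2}\right) $$
$$ \Delta z = z(f_{\text{maskee}}) - z(f_{\text{masker}}) $$
$$ 10 \log_{10} F(\Delta z) = 15.81 + 7.5(\Delta z + 0.474) - 17.5\left[1 + (\Delta z + 0.474)^{2}\right]^{1/2} $$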
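The spreading function can be evaluated for the tone pairs used in the following demonstrations (a 1 kHz masker with maskees at 900 Hz and 5 kHz). This Python sketch (standard library only) implements only the two formulas above. It does not add the masker level or any masking offset, and the function names are hypothetical.

```python
import math

def hz_to_bark(f):
    """z(f) = 13 * arctan(0.00076 f) + 3.5 * arctan((f / 7500)^2), f in Hz."""
    return 13.0 * math.atan(0.00076 * f) + 3.5 * math.atan((f / 7500.0) ** 2)

def spreading_db(dz):
    """Schroeder spreading function 10*log10 F(dz), with dz in Bark."""
    x = dz + 0.474
    return 15.81 + 7.5 * x - 17.5 * math.sqrt(1.0 + x * x)

def spread_between(f_masker, f_maskee):
    """Spreading (dB, relative to the peak) from the masker to the maskee."""
    return spreading_db(hz_to_bark(f_maskee) - hz_to_bark(f_masker))

if __name__ == "__main__":
    for f_maskee in (900.0, 1000.0, 5000.0):
        print(f_maskee, round(spread_between(1000.0, f_maskee), 2))
```

The function peaks near 0 dB at zero Bark distance and falls off more slowly toward higher frequencies than toward lower ones, so a masker hides neighbouring components above it more effectively than those below it.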
Masking Curve
Primary Tone 1kHz
Masked Tone 900 Hz
Combined Sound 1kHz + 0.9kHz
Combined 1kHz + 0.9kHz (-10dB)
Combined 1kHz + 5kHz (-10dB)
END