Speech & Audio Processing
Speech & Audio Coding Examples
April 22, 2023 Veton Këpuska 2
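A Simple Speech Coder: LPC-Based Analysis Structure
Figure: Audio Input → Pre-emphasis → Windowing Analysis → Linear Prediction Analysis (Auto-Correlation, Levinson-Durbin) → Filter Coeffs. The Analysis Filter, set by the filter coefficients, turns the pre-emphasized signal into the Residual; the Residual and the Filter Coeffs then go to Quantization.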
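The pre-emphasis and analysis-filter blocks of the diagram can be sketched in Python as follows. This is a minimal illustration, not the author's code: the pre-emphasis coefficient alpha = 0.95 is an assumed, commonly used value (the slides give none), and the predictor convention A(z) = 1 - sum_k a_k z^-k is assumed so that the residual is e[n] = s[n] - sum_k a_k s[n-k].

```python
def pre_emphasis(x, alpha=0.95):
    """Pre-emphasis y[n] = x[n] - alpha * x[n-1], taking x[-1] = 0.

    alpha = 0.95 is an assumed default, not a value from the slides.
    """
    return [x[n] - alpha * (x[n - 1] if n > 0 else 0.0) for n in range(len(x))]


def analysis_filter(s, a):
    """LPC analysis (inverse) filter A(z): residual e[n] = s[n] - sum_k a_k s[n-k].

    `a` lists the predictor coefficients a_1..a_p; samples before n = 0 are 0.
    """
    e = []
    for n in range(len(s)):
        prediction = sum(a[k - 1] * s[n - k]
                         for k in range(1, len(a) + 1) if n - k >= 0)
        e.append(s[n] - prediction)
    return e
```

When the frame really is an all-pole signal with the same coefficients, the residual returned by `analysis_filter` is exactly the excitation that generated it, which is why the residual plus the filter coefficients are enough for the decoder.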
Windowing Analysis Stage
N: length of the analysis window, typically 10-30 ms
Some Analysis Windows
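MATLAB Useful Functions
wintool: use “doc wintool” for more information.
window: use “doc window” for the list of supported windows.
Define your own window if needed, e.g., the sine window and the Vorbis window (n = 0, 1, ..., N-1):
$$w_{\text{sine}}[n] = \sin\left(\frac{\pi}{N}\left(n + 0.5\right)\right)$$
$$w_{\text{vorbis}}[n] = \sin\left(\frac{\pi}{2}\,\sin^2\left(\frac{\pi}{N}\left(n + 0.5\right)\right)\right)$$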
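A minimal Python sketch of the two custom windows above, plus the conversion from a window duration to N samples. The 8 kHz sampling rate and 20 ms duration are assumed values chosen inside the 10-30 ms range; `frame_length` and `apply_window` are hypothetical helper names.

```python
import math


def frame_length(fs_hz, frame_ms):
    """Number of samples N in an analysis window of frame_ms milliseconds."""
    return int(round(fs_hz * frame_ms / 1000.0))


def sine_window(N):
    """Sine window: w[n] = sin(pi/N * (n + 0.5)), n = 0..N-1."""
    return [math.sin(math.pi / N * (n + 0.5)) for n in range(N)]


def vorbis_window(N):
    """Vorbis window: w[n] = sin(pi/2 * sin^2(pi/N * (n + 0.5))), n = 0..N-1."""
    return [math.sin(0.5 * math.pi * math.sin(math.pi / N * (n + 0.5)) ** 2)
            for n in range(N)]


def apply_window(frame, w):
    """Multiply a frame of samples by the analysis window, sample by sample."""
    return [x * wn for x, wn in zip(frame, w)]


N = frame_length(8000, 20)      # assumed 8 kHz sampling, 20 ms window -> N = 160
w_sine = sine_window(N)
w_vorbis = vorbis_window(N)
```

Both windows are symmetric and satisfy w[n]^2 + w[n + N/2]^2 = 1, the power-complementary property that makes them convenient for 50 %-overlap transforms such as the MDCT used later in audio coding.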
LPC Analysis Stage
The LPC method is described in: Ch5-Analysis_&_Synthesis_of_Pole-Zero_Speech_Models.pptx
Summary: perform the autocorrelation, then solve the resulting system of equations with the Levinson-Durbin method.
MATLAB help: doc lpc, etc.
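The two steps in the summary (autocorrelation, then Levinson-Durbin) can be sketched in plain Python. This assumes the convention A(z) = 1 - sum_k a_k z^-k, for which the recursion is k_i = (r[i] - sum_j a_j^(i-1) r[i-j]) / E^(i-1), a_j^(i) = a_j^(i-1) - k_i a_(i-j)^(i-1), E^(i) = (1 - k_i^2) E^(i-1). In that convention MATLAB's `lpc` should return the polynomial [1, -a_1, ..., -a_p]; treat that correspondence as an assumption to check against `doc lpc`.

```python
def autocorrelation(x, p):
    """Biased, unnormalized autocorrelation r[m] = sum_n x[n] x[n+m], m = 0..p."""
    N = len(x)
    return [sum(x[n] * x[n + m] for n in range(N - m)) for m in range(p + 1)]


def levinson_durbin(r, p):
    """Solve the autocorrelation normal equations of order p.

    Returns (a, k, E): predictor coefficients a_1..a_p, PARCOR
    (reflection) coefficients k_1..k_p, and the final prediction-error energy.
    """
    a = []              # a^{(i-1)}; a[j-1] holds a_j
    k = []
    E = r[0]
    for i in range(1, p + 1):
        acc = r[i] - sum(a[j - 1] * r[i - j] for j in range(1, i))
        ki = acc / E
        # a_j^{(i)} = a_j^{(i-1)} - k_i a_{i-j}^{(i-1)};  a_i^{(i)} = k_i
        a = [a[j - 1] - ki * a[i - j - 1] for j in range(1, i)] + [ki]
        k.append(ki)
        E *= 1.0 - ki * ki
    return a, k, E


# Usage on a short, hypothetical windowed frame (order 2).
r = autocorrelation([0.2, 0.9, 0.4, -0.3, -0.6, 0.1], 2)
a, k, E = levinson_durbin(r, 2)
```

Because the autocorrelation method yields a positive-definite Toeplitz system for any non-zero frame, every k_i it produces has magnitude below 1.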
Example of MATLAB Code
function myLPCCodec(wavfile, N)
%
% wavfile - input MS wav file
% N       - LPC filter order
%
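Figure: source-filter synthesis model. An Impulse Train (voiced) or a Noise Generator (unvoiced), selected by the Voiced/Unvoiced switch and scaled by the Gain, excites the Vocal Tract Model:
$$H(z) = \frac{1}{A(z)}, \qquad A(z) = 1 - \sum_{k=1}^{p} a_k z^{-k}, \qquad \hat{s}[n] = \sum_{k=1}^{p} a_k\,\hat{s}[n-k] + g\,e[n]$$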
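The source-filter model in the figure can be sketched in Python as below. The pitch period of 80 samples (100 Hz at an assumed 8 kHz rate), the Gaussian noise source, the gains and the order-2 coefficients are all hypothetical illustration values; the recursion itself is the equation ŝ[n] = sum_k a_k ŝ[n-k] + g e[n].

```python
import random


def excitation(n_samples, voiced, pitch_period=80, seed=0):
    """Voiced: unit impulse train with the given period; unvoiced: white Gaussian noise."""
    if voiced:
        return [1.0 if n % pitch_period == 0 else 0.0 for n in range(n_samples)]
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n_samples)]


def vocal_tract(e, a, gain):
    """All-pole model H(z) = 1/A(z): s[n] = sum_k a_k s[n-k] + gain * e[n]."""
    s = []
    for n, en in enumerate(e):
        acc = gain * en
        for k, ak in enumerate(a, start=1):
            if n - k >= 0:
                acc += ak * s[n - k]
        s.append(acc)
    return s


a = [0.65, -0.3]                        # hypothetical order-2 predictor
voiced = vocal_tract(excitation(400, True), a, gain=1.0)
unvoiced = vocal_tract(excitation(400, False), a, gain=0.1)
```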
Analysis of Quantization Errors
Use MATLAB functions to study the effects of the quantization errors introduced by the precision of the arithmetic operations and by the representation of the filter and of the error signal:
- Double (float64) representation (software emulation)
- Float (float32) representation (software emulation)
- Int (int32) representation (hardware emulation)
- Short (int16) representation (hardware emulation)
Useful MATLAB functions: fix, floor, round, ceil. Example, truncation of sig (with samples in [-1, 1)) to B bits:
sig_hat = fix(sig*2^(B-1))/2^(B-1);
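A Python analogue of the MATLAB truncation line, useful for the same experiments. `math.trunc` plays the role of MATLAB `fix`; `_round_half_away` imitates MATLAB `round` (Python's built-in `round` rounds halves to even); float32 storage is emulated by packing through `struct`. The helper names and the test sinusoid are assumptions for illustration.

```python
import math
import struct


def _round_half_away(v):
    """MATLAB-style round: halves are rounded away from zero."""
    return math.copysign(math.floor(abs(v) + 0.5), v)


QUANTIZERS = {"fix": math.trunc, "floor": math.floor,
              "round": _round_half_away, "ceil": math.ceil}


def quantize(sig, B, mode="fix"):
    """Analogue of sig_hat = fix(sig*2^(B-1))/2^(B-1) for samples in [-1, 1)."""
    scale = 2 ** (B - 1)
    q = QUANTIZERS[mode]
    return [q(x * scale) / scale for x in sig]


def to_float32(x):
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def snr_db(ref, test):
    """Signal-to-quantization-noise ratio in dB."""
    sig = sum(r * r for r in ref)
    err = sum((r - t) ** 2 for r, t in zip(ref, test))
    return float("inf") if err == 0 else 10.0 * math.log10(sig / err)


sig = [0.9 * math.sin(2 * math.pi * 0.01 * n) for n in range(1000)]
for B in (8, 12, 16):
    print(B, "bits:", round(snr_db(sig, quantize(sig, B)), 1), "dB")
```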
Quantization of Error Signal & Filter Coefficients
ADPCM can be applied to the error signal.
Filter coefficients in the direct filter form are known to be sensitive to quantization errors: a small quantization error can have a large effect on the filter characteristics. The issue is that the polynomial coefficients map non-linearly to the poles of the filter (i.e., the roots of the polynomial).
Alternate representations are possible that have significantly better tolerance to quantization error.
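A small illustration of this sensitivity, using a hypothetical second-order resonator (pole radius 0.99, angle 0.02 rad) and an assumed rounding quantizer with 8 fractional bits. Each direct-form coefficient moves by less than 0.002, yet the complex resonant pole pair turns into two real poles, one of them exactly on the unit circle.

```python
import cmath
import math


def poles_2nd_order(a1, a2):
    """Poles of H(z) = 1 / (1 - a1 z^-1 - a2 z^-2), i.e. roots of z^2 - a1 z - a2."""
    d = cmath.sqrt(a1 * a1 + 4.0 * a2)
    return (a1 + d) / 2.0, (a1 - d) / 2.0


def quantize(v, B):
    """Round v to the nearest multiple of 2^-B (B fractional bits)."""
    return round(v * 2 ** B) / 2 ** B


r, theta = 0.99, 0.02            # hypothetical resonance
a1 = 2.0 * r * math.cos(theta)   # direct-form coefficients
a2 = -r * r
qa1, qa2 = quantize(a1, 8), quantize(a2, 8)

orig = poles_2nd_order(a1, a2)
quant = poles_2nd_order(qa1, qa2)
print("original poles: ", orig)
print("quantized poles:", quant)
```

Here the quantized coefficients are 507/256 and -251/256, whose discriminant is exactly (5/256)^2, so the poles land at 1.0 and 0.98046875.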
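LPC Filter Representations
As noted previously, when the Levinson-Durbin algorithm was introduced, one alternate representation of the filter coefficients was also mentioned: the PARCOR coefficients.
LPC to PARCOR:
$$a_j^{(p)} = a_j, \qquad 1 \le j \le p$$
$$\text{for } i = p, p-1, \ldots, 1: \qquad k_i = a_i^{(i)}, \qquad a_j^{(i-1)} = \frac{a_j^{(i)} + k_i\, a_{i-j}^{(i)}}{1 - k_i^2}, \quad 1 \le j \le i-1$$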
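The LPC-to-PARCOR (step-down) recursion above, written out in Python as a sketch. It assumes the same convention A(z) = 1 - sum_j a_j z^-j and, as a side benefit, doubles as a stability test: 1/A(z) is stable exactly when every |k_i| < 1, so the function raises `ValueError` (a design choice of this sketch) as soon as that fails.

```python
def lpc_to_parcor(a):
    """Step-down recursion: predictor coefficients a_1..a_p -> PARCOR k_1..k_p.

    k_i = a_i^{(i)},  a_j^{(i-1)} = (a_j^{(i)} + k_i a_{i-j}^{(i)}) / (1 - k_i^2).
    Raises ValueError if some |k_i| >= 1 (1/A(z) not stable).
    """
    cur = list(a)                  # a^{(p)}; cur[j-1] holds a_j
    p = len(cur)
    k = [0.0] * p
    for i in range(p, 0, -1):      # i = p, p-1, ..., 1
        ki = cur[i - 1]
        if abs(ki) >= 1.0:
            raise ValueError(f"|k_{i}| >= 1: 1/A(z) is not stable")
        k[i - 1] = ki
        denom = 1.0 - ki * ki
        cur = [(cur[j - 1] + ki * cur[i - j - 1]) / denom for j in range(1, i)]
    return k


print(lpc_to_parcor([0.65, -0.3]))   # hypothetical order-2 predictor
```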
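PARCOR Filter Representation
PARCOR to LPC:
$$\text{for } i = 1, 2, \ldots, p: \qquad a_i^{(i)} = k_i, \qquad a_j^{(i)} = a_j^{(i-1)} - k_i\, a_{i-j}^{(i-1)}, \quad 1 \le j \le i-1$$
$$a_j = a_j^{(p)}, \qquad 1 \le j \le p$$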
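The PARCOR-to-LPC (step-up) recursion as a Python sketch, under the same A(z) = 1 - sum_j a_j z^-j convention. The `clip_parcor` helper is a hypothetical addition showing why PARCOR coefficients tolerate quantization well: any set with all |k_i| < 1 maps back to a stable 1/A(z), so quantized values only need to be kept inside (-1, 1).

```python
def parcor_to_lpc(k):
    """Step-up recursion: PARCOR k_1..k_p -> predictor coefficients a_1..a_p.

    For i = 1..p: a_i^{(i)} = k_i and a_j^{(i)} = a_j^{(i-1)} - k_i a_{i-j}^{(i-1)}.
    """
    a = []                                   # a^{(i-1)}; a[j-1] holds a_j
    for i, ki in enumerate(k, start=1):
        a = [a[j - 1] - ki * a[i - j - 1] for j in range(1, i)] + [ki]
    return a


def clip_parcor(k, limit=0.999):
    """Hypothetical helper: keep each reflection coefficient strictly inside (-1, 1)."""
    return [max(-limit, min(limit, ki)) for ki in k]


print(parcor_to_lpc([0.5, -0.3, 0.2]))   # hypothetical PARCOR set
```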
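Line Spectral Frequency Representation
It turns out that the PARCOR coefficients can be represented by line spectral frequencies (LSF), which have significantly better quantization properties. Note the PARCOR lattice structure of the LPC synthesis filter:
$$H(z) = \frac{1}{A(z)}$$
Figure: lattice realization from Input to Output, with forward signals A_p, A_{p-1}, ..., A_0, backward signals B_p, B_{p-1}, ..., B_0, reflection coefficients k_p, k_{p-1}, ..., and unit delays z^{-1}; the lattice is extended by one stage with k_{p+1} = ∓1.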
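Line Spectral Frequency Representation
From the previous slide, the following holds:
$$A_0(z) = 1, \qquad B_0(z) = z^{-1}$$
$$A_p(z) = A_{p-1}(z) - k_p B_{p-1}(z), \qquad B_p(z) = z^{-1}\left[B_{p-1}(z) - k_p A_{p-1}(z)\right]$$
$$B_p(z) = z^{-(p+1)} A_p(z^{-1})$$
From this realization of the filter, the LSP representation is derived.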
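LSF Representation
$$k_{p+1} = -1: \qquad P(z) = A_p(z) + B_p(z) = A_p(z) + z^{-(p+1)} A_p(z^{-1})$$
$$k_{p+1} = +1: \qquad Q(z) = A_p(z) - B_p(z) = A_p(z) - z^{-(p+1)} A_p(z^{-1})$$
$$A_p(z) = \frac{1}{2}\left[P(z) + Q(z)\right]$$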
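A plain-Python sketch of the LSF computation from the P(z) and Q(z) definitions, assuming A(z) = 1 - sum_k a_k z^-k. P is symmetric and Q antisymmetric, so after removing the linear phase e^{-jw(p+1)/2} each becomes a real cosine or sine sum; its sign changes on a grid over (0, pi) are refined by bisection. The grid search is a simple swapped-in root finder chosen for clarity, not the method from the slides, and the trivial roots at w = 0 and w = pi are skipped.

```python
import math


def lsf_polynomials(a):
    """Coefficients (powers z^0 .. z^-(p+1)) of P(z) and Q(z) for A(z) = 1 - sum a_k z^-k."""
    c = [1.0] + [-ak for ak in a] + [0.0]   # A(z) padded to p+2 terms
    b = c[::-1]                             # z^-(p+1) A(1/z)
    P = [ci + bi for ci, bi in zip(c, b)]
    Q = [ci - bi for ci, bi in zip(c, b)]
    return P, Q


def _amplitude(coeffs, w, antisymmetric):
    """Real amplitude of coeffs(e^{jw}) once the linear phase is removed."""
    m = (len(coeffs) - 1) / 2.0
    trig = math.sin if antisymmetric else math.cos
    return sum(cn * trig(w * (n - m)) for n, cn in enumerate(coeffs))


def _unit_circle_roots(coeffs, antisymmetric, grid, iters):
    """Angles in (0, pi) where the amplitude changes sign, refined by bisection."""
    def f(w):
        return _amplitude(coeffs, w, antisymmetric)

    ws = [math.pi * i / grid for i in range(1, grid)]
    roots = []
    for lo, hi in zip(ws, ws[1:]):
        flo = f(lo)
        if flo == 0.0:
            roots.append(lo)
        elif flo * f(hi) < 0.0:
            for _ in range(iters):
                mid = 0.5 * (lo + hi)
                fmid = f(mid)
                if flo * fmid <= 0.0:
                    hi = mid
                else:
                    lo, flo = mid, fmid
            roots.append(0.5 * (lo + hi))
    return roots


def lsf(a, grid=1024, iters=60):
    """Line spectral frequencies (radians, ascending) of A(z)."""
    P, Q = lsf_polynomials(a)
    return sorted(_unit_circle_roots(P, False, grid, iters) +
                  _unit_circle_roots(Q, True, grid, iters))


print(lsf([0.65, -0.3]))   # hypothetical order-2 predictor
```

For a stable A(z), the roots of P and Q lie on the unit circle and alternate, which is what makes the LSF set easy to quantize and check.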
LPC Synthesis Filter with LSF
$$H(z) = \frac{1}{A_p(z)} = \frac{1}{\frac{1}{2}\left[P(z) + Q(z)\right]}$$
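A Simple Speech Coder: LPC-Based Synthesis Structure
Figure: Decoding recovers the Residual Signal and the Filter Coeffs; the Residual drives the Synthesis Filter (set by the Filter Coeffs), whose output passes through De-emphasis to give the Audio Output.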
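A Python sketch of the decoder side: the decoded residual drives the all-pole synthesis filter 1/A(z), and de-emphasis undoes the encoder's pre-emphasis. The pre-emphasis coefficient alpha = 0.95 is an assumed value and must match whatever the encoder used; without quantization this chain exactly inverts the analysis chain.

```python
def synthesis_filter(e, a):
    """All-pole synthesis 1/A(z): s[n] = e[n] + sum_{k=1}^{p} a_k s[n-k]."""
    s = []
    for n, en in enumerate(e):
        s.append(en + sum(a[k - 1] * s[n - k]
                          for k in range(1, len(a) + 1) if n - k >= 0))
    return s


def de_emphasis(y, alpha=0.95):
    """Inverse of pre-emphasis: x[n] = y[n] + alpha * x[n-1] (assumed alpha)."""
    x = []
    for n, yn in enumerate(y):
        x.append(yn + (alpha * x[n - 1] if n > 0 else 0.0))
    return x
```

In a real codec the residual and coefficients reaching `synthesis_filter` are the quantized versions, so the output differs from the input by the quantization error.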
Audio Coding
Audio Coding
Most audio coding standards use the principles of psychoacoustics. Example: the basic structure of an MP3 encoder.
Figure: Audio Input → Filterbank & Transform → Quantization → Bit-stream, with the Psychoacoustic Model, which also receives the Audio Input, controlling the Quantization.
Basic Structure of Audio Coders: Filterbank Processing, Psychoacoustic Model, Quantization
Filter Bank: Analysis & Synthesis
Filterbank Processing: splitting the full-band signal into several sub-bands:
- Uniform sub-bands (FFT)
- Critical bands (FFT followed by a non-linear transformation), which reflect the human auditory apparatus
Mel-scale and Bark-scale transformations (f in Hz):
$$\text{Mel}(f) = 1127.01048\,\ln\left(1 + \frac{f}{700}\right)$$
$$\text{Bark}(f) = 13\arctan(0.00076\,f) + 3.5\arctan\left(\left(\frac{f}{7500}\right)^2\right)$$
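The two warping formulas as Python functions, with a hypothetical helper `mel_band_edges` that places band edges uniformly on the Mel scale (the 20 bands up to 8 kHz in the usage line are illustrative assumptions). Equal steps in Mel become progressively wider in Hz, which is the non-linear critical-band spacing described above.

```python
import math


def hz_to_mel(f):
    """Mel(f) = 1127.01048 * ln(1 + f / 700)."""
    return 1127.01048 * math.log(1.0 + f / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (math.exp(m / 1127.01048) - 1.0)


def hz_to_bark(f):
    """Bark(f) = 13 * arctan(0.00076 f) + 3.5 * arctan((f / 7500)^2)."""
    return 13.0 * math.atan(0.00076 * f) + 3.5 * math.atan((f / 7500.0) ** 2)


def mel_band_edges(n_bands, f_max):
    """Hypothetical helper: band edges (Hz) equally spaced in Mel from 0 to f_max."""
    top = hz_to_mel(f_max)
    return [mel_to_hz(top * i / n_bands) for i in range(n_bands + 1)]


edges = mel_band_edges(20, 8000.0)   # hypothetical: 20 bands up to 8 kHz
```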
Mel-Scale
Bark-Scale
Analysis Structure of Filterbank
Figure: the Audio Input feeds N parallel filters h_1[n], ..., h_k[n], ..., h_N[n]; each filter output is down-sampled (↓) and transformed by an MDCT, and the MDCT outputs are quantized into the Bit Stream.
h_k[n]: impulse response of the k-th quadrature mirror filter
N: number of channels, typically 32
↓: down-sampling
MDCT: Modified Discrete Cosine Transform
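Synthesis Structure of Filterbank
Figure: Decoding of the Bit Stream yields the MDCT coefficients of each channel; each channel is transformed by an IMDCT, up-sampled (↑) and filtered by g_1[n], ..., g_k[n], ..., g_N[n], and the channel outputs are combined into the Audio Output.
g_k[n]: impulse response of the k-th inverse quadrature mirror filter
N: number of channels, typically 32
↑: up-sampling
IMDCT: Inverse Modified Discrete Cosine Transform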
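A direct O(N^2) Python sketch of the MDCT/IMDCT pair used in the analysis and synthesis structures, with 50 % overlap (hop N) and a sine window. It is written for clarity, not speed, and it leaves out the QMF filterbank stage. The IMDCT is scaled by 2/N, a normalization chosen here so that windowed overlap-add with a window satisfying w[n]^2 + w[n+N]^2 = 1 cancels the time-domain aliasing exactly; other texts put the scale factors elsewhere.

```python
import math


def mdct(x):
    """MDCT: 2N samples -> N coefficients.

    X[k] = sum_n x[n] cos(pi/N * (n + 1/2 + N/2) * (k + 1/2)).
    """
    N = len(x) // 2
    return [sum(x[n] * math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
                for n in range(2 * N))
            for k in range(N)]


def imdct(X):
    """IMDCT: N coefficients -> 2N time-aliased samples (scaled by 2/N)."""
    N = len(X)
    return [2.0 / N * sum(X[k] * math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
                          for k in range(N))
            for n in range(2 * N)]


def mdct_round_trip(signal, N):
    """Sine-windowed MDCT analysis and IMDCT overlap-add synthesis, hop size N."""
    w = [math.sin(math.pi * (n + 0.5) / (2 * N)) for n in range(2 * N)]
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - 2 * N + 1, N):
        block = [w[n] * signal[start + n] for n in range(2 * N)]
        coeffs = mdct(block)          # a coder would quantize these N values
        y = imdct(coeffs)
        for n in range(2 * N):
            out[start + n] += w[n] * y[n]
    return out


N = 8
sig = [math.sin(0.3 * n) + 0.5 * math.cos(1.1 * n) for n in range(10 * N)]
rec = mdct_round_trip(sig, N)
```

Samples covered by two overlapping blocks (all but the first and last N) are reconstructed exactly; each block still produces only N coefficients for 2N input samples, so the transform is critically sampled.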
Psycho-Acoustic Modeling
Psychoacoustic Model
The model computes the masking threshold according to human auditory perception. The masking threshold is used to quantize the Discrete Cosine Transform coefficients.
The analysis is done in the frequency domain, represented by the DFT and computed with the FFT.
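One simplified way the masking threshold can steer quantization is sketched below. This is an assumed rule of thumb, not the procedure of any specific standard: each extra bit of a uniform quantizer lowers the noise by about 6.02 dB, so a band needs roughly ceil(SMR / 6.02) bits, where the signal-to-mask ratio is SMR = signal level minus masking threshold. The band levels and thresholds in the usage lines are hypothetical.

```python
import math


def bits_for_band(signal_db, mask_db, db_per_bit=6.02):
    """Bits so that quantization noise stays below the masking threshold.

    Simplified rule: bits = ceil(SMR / 6.02) with SMR = signal - mask (dB);
    bands already below the mask receive 0 bits.
    """
    smr = signal_db - mask_db
    return 0 if smr <= 0 else math.ceil(smr / db_per_bit)


# Hypothetical per-band levels and masking thresholds (dB).
levels = [70.0, 55.0, 40.0]
masks = [50.0, 52.0, 45.0]
allocation = [bits_for_band(s, m) for s, m in zip(levels, masks)]
```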
Threshold of Hearing
The threshold of hearing is the absolute threshold of audibly perceptible events in quiet conditions (no other sounds). Any signal below the threshold can be removed without affecting perception.
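The slides show the threshold only as a plot. A widely used analytic approximation, assumed here rather than taken from the slides, is Terhardt's formula T(f) = 3.64 (f/1000)^-0.8 - 6.5 exp(-0.6 (f/1000 - 3.3)^2) + 10^-3 (f/1000)^4 dB SPL. The sketch applies it to drop inaudible components; `drop_inaudible` and the component list are hypothetical.

```python
import math


def threshold_in_quiet_db(f_hz):
    """Terhardt's approximation of the absolute threshold of hearing (dB SPL)."""
    k = f_hz / 1000.0
    return 3.64 * k ** -0.8 - 6.5 * math.exp(-0.6 * (k - 3.3) ** 2) + 1e-3 * k ** 4


def drop_inaudible(components):
    """Keep only (freq_hz, level_db_spl) components above the threshold in quiet."""
    return [(f, lvl) for f, lvl in components if lvl > threshold_in_quiet_db(f)]


kept = drop_inaudible([(100.0, 15.0), (1000.0, 10.0), (3300.0, -3.0), (15000.0, 40.0)])
```

The threshold is lowest around 3-4 kHz (slightly below 0 dB SPL) and rises steeply at low and high frequencies, so the 100 Hz and 15 kHz components above are discarded.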
Threshold of Hearing
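Frequency Masking
Schröder spreading function:
$$10\log_{10} F(\Delta z) = 15.81 + 7.5\,(\Delta z + 0.474) - 17.5\sqrt{1 + (\Delta z + 0.474)^2}\ \text{dB}, \qquad \Delta z = z(f_{\text{maskee}}) - z(f_{\text{masker}})$$
Bark-scale function:
$$z(f) = 13\arctan(0.00076\,f) + 3.5\arctan\left(\left(\frac{f}{7500}\right)^2\right)$$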
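A Python sketch of the spreading function and the Bark mapping above. `masking_level_db` is a hypothetical simplification that adds the spreading value to the masker level; it deliberately omits the tonality-dependent offsets that complete psychoacoustic models (e.g., those in MPEG-1 audio) subtract, so the numbers illustrate the shape of the masking curve rather than a calibrated threshold.

```python
import math


def hz_to_bark(f):
    """z(f) = 13 * arctan(0.00076 f) + 3.5 * arctan((f / 7500)^2)."""
    return 13.0 * math.atan(0.00076 * f) + 3.5 * math.atan((f / 7500.0) ** 2)


def schroeder_spread_db(dz):
    """10 log10 F(dz) in dB, with dz = z(maskee) - z(masker) in Bark."""
    x = dz + 0.474
    return 15.81 + 7.5 * x - 17.5 * math.sqrt(1.0 + x * x)


def masking_level_db(f_maskee, f_masker, masker_db):
    """Simplified threshold: masker level plus spreading (no tonality offset)."""
    dz = hz_to_bark(f_maskee) - hz_to_bark(f_masker)
    return masker_db + schroeder_spread_db(dz)


# A 1 kHz masker at a hypothetical 60 dB, probed at 900 Hz.
print(round(masking_level_db(900.0, 1000.0, 60.0), 2))
```

The function peaks near 0 dB at dz = 0 and falls by about 25 dB/Bark below the masker but only about 10 dB/Bark above it, which is the asymmetric upward spread of masking.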
Masking Curve
Primary Tone: 1 kHz
Masked Tone 900 Hz
Combined Sound: 1 kHz + 0.9 kHz
Combined: 1 kHz + 0.9 kHz (-10 dB)
Combined: 1 kHz + 5 kHz (-10 dB)
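The demonstration sounds above can be regenerated with a short Python sketch. The 16 kHz sampling rate and 1 s duration are assumptions, and `tone_mix` is a hypothetical helper; levels are given relative to the first tone, so -10 dB means an amplitude factor of 10^(-10/20), about 0.316. On the Bark scale, 900 Hz lies about 0.7 Bark below the 1 kHz masker, while 5 kHz lies about 10 Bark above it.

```python
import math


def tone_mix(components, fs=16000, duration_s=1.0):
    """Sum of sinusoids; components = [(freq_hz, level_db_relative_to_first), ...]."""
    n = int(fs * duration_s)
    out = [0.0] * n
    for f, rel_db in components:
        amp = 10.0 ** (rel_db / 20.0)
        for i in range(n):
            out[i] += amp * math.sin(2.0 * math.pi * f * i / fs)
    return out


primary = tone_mix([(1000, 0.0)])
pair_close = tone_mix([(1000, 0.0), (900, -10.0)])   # maskee close to the masker
pair_far = tone_mix([(1000, 0.0), (5000, -10.0)])    # maskee far above the masker
```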
END