Speech & Audio Processing

Speech & Audio Coding Examples

April 22, 2023, Veton Këpuska

A Simple Speech Coder: LPC-Based Analysis Structure

[Figure: block diagram. Audio Input → Pre-emphasis → Windowing Analysis → Linear Prediction Analysis (Auto-Correlation, Levinson-Durbin) → Filter Coeffs. The Analysis Filter produces the Residual; the Residual and the Filter Coeffs are passed to Quantization.]
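The two stages of this structure that reduce to simple difference equations, pre-emphasis and the LPC analysis (inverse) filter, can be sketched in Python (the slides use MATLAB; Python is used here only for illustration). This is a minimal sketch, not the course codec: the pre-emphasis coefficient `alpha = 0.95` is an assumed typical value, and the analysis filter assumes the predictor convention $\hat{s}[n] = \sum_k a_k s[n-k]$, so the residual is $e[n] = s[n] - \sum_k a_k s[n-k]$.

```python
def pre_emphasis(x, alpha=0.95):
    """First-order pre-emphasis y[n] = x[n] - alpha * x[n-1], with x[-1] taken as 0.

    alpha = 0.95 is an assumed, commonly used value; the slides do not specify one.
    """
    return [x[n] - alpha * x[n - 1] if n > 0 else x[0] for n in range(len(x))]


def analysis_filter(s, a):
    """LPC analysis (inverse) filter A(z) = 1 - sum_{k=1}^{p} a_k z^-k.

    Returns the residual e[n] = s[n] - sum_{k=1}^{p} a_k s[n-k],
    treating samples before n = 0 as zero. a[k-1] holds a_k.
    """
    p = len(a)
    residual = []
    for n in range(len(s)):
        prediction = sum(a[k - 1] * s[n - k] for k in range(1, p + 1) if n >= k)
        residual.append(s[n] - prediction)
    return residual


if __name__ == "__main__":
    # s[n] = 0.9^n is exactly predicted by a_1 = 0.9, so only e[0] is (numerically) non-zero.
    s = [0.9 ** n for n in range(8)]
    print([round(e, 12) for e in analysis_filter(s, [0.9])])
    print(pre_emphasis([1.0, 1.0, 1.0], alpha=0.5))
```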


Windowing Analysis Stage
N – length of the analysis window (10–30 msec)
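As a worked example of choosing N: a window of T seconds at sampling rate $f_s$ spans $N = T f_s$ samples, so 20 msec at an assumed 8 kHz gives N = 160. The sketch below computes N and cuts a signal into frames; the 50% hop (80 samples) and the function names are illustrative assumptions.

```python
def window_length(duration_s, fs):
    """Number of samples N spanned by an analysis window of the given duration."""
    return int(round(duration_s * fs))


def frame_signal(x, n, hop):
    """Split x into frames of n samples, advancing by hop samples; a partial last frame is dropped."""
    return [x[start:start + n] for start in range(0, len(x) - n + 1, hop)]


if __name__ == "__main__":
    fs = 8000                      # assumed sampling rate (Hz)
    for ms in (10, 20, 30):        # the 10-30 msec range from the slide
        print(ms, "ms ->", window_length(ms / 1000, fs), "samples")
    frames = frame_signal(list(range(1000)), window_length(0.020, fs), hop=80)
    print(len(frames), "frames")
```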


Some Analysis Windows


MATLAB Useful Functions
- wintool: use “doc wintool” for more information.
- window: use “doc window” for the list of supported windows.

Define your own window if needed, e.g.:

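Sine window and Vorbis window, for $n = 0, 1, \ldots, N-1$:

$$w_{\text{sine}}[n] = \sin\left[\frac{\pi (n + 0.5)}{N}\right]$$

$$w_{\text{vorbis}}[n] = \sin\left[\frac{\pi}{2}\sin^2\left(\frac{\pi (n + 0.5)}{N}\right)\right]$$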
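A direct Python transcription of the two formulas follows. The extra check, added here rather than taken from the slides, is that both windows satisfy the power-complementary (Princen-Bradley) condition $w[n]^2 + w[n + N/2]^2 = 1$, which is what makes them usable with the MDCT discussed in the audio-coding part.

```python
import math


def sine_window(n_len):
    """w[n] = sin(pi (n + 0.5) / N), n = 0..N-1."""
    return [math.sin(math.pi * (n + 0.5) / n_len) for n in range(n_len)]


def vorbis_window(n_len):
    """w[n] = sin(pi/2 * sin^2(pi (n + 0.5) / N)), n = 0..N-1."""
    return [math.sin(0.5 * math.pi * math.sin(math.pi * (n + 0.5) / n_len) ** 2)
            for n in range(n_len)]


def princen_bradley_error(w):
    """Largest deviation of w[n]^2 + w[n + N/2]^2 from 1 over the first half of the window."""
    half = len(w) // 2
    return max(abs(w[n] ** 2 + w[n + half] ** 2 - 1.0) for n in range(half))


if __name__ == "__main__":
    for name, w in (("sine", sine_window(64)), ("vorbis", vorbis_window(64))):
        print(name, "max PB error:", princen_bradley_error(w))
```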


LPC Analysis Stage
LPC method described in: Ch5-Analysis_&_Synthesis_of_Pole-Zero_Speech_Models.pptx

Summary:
- Perform autocorrelation.
- Solve the system of equations with the Levinson-Durbin method.

MATLAB help: doc lpc, etc.
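The two steps of the summary can be sketched as follows. This is a minimal Python sketch rather than MATLAB's `lpc`; it assumes the predictor convention $\hat{s}[n] = \sum_{k=1}^{p} a_k s[n-k]$ (so $A(z) = 1 - \sum_k a_k z^{-k}$) and also returns the reflection (PARCOR) coefficients $k_i$ that the recursion produces along the way.

```python
def autocorrelation(x, p):
    """Autocorrelation r[0..p] of one (windowed) analysis frame, r[l] = sum_n x[n] x[n+l]."""
    n_len = len(x)
    return [sum(x[n] * x[n + lag] for n in range(n_len - lag)) for lag in range(p + 1)]


def levinson_durbin(r, p):
    """Levinson-Durbin recursion for an order-p predictor.

    Convention: s_hat[n] = sum_{k=1}^{p} a_k s[n-k], i.e. A(z) = 1 - sum_k a_k z^-k.
    Returns (a, k, err): a[j-1] = a_j, k[i-1] = k_i (PARCOR coefficients) and the
    final prediction-error energy.
    """
    a = []              # a_j^{(i-1)}, j = 1..i-1
    k = []
    err = r[0]
    for i in range(1, p + 1):
        k_i = (r[i] - sum(a[j - 1] * r[i - j] for j in range(1, i))) / err
        # a_j^{(i)} = a_j^{(i-1)} - k_i a_{i-j}^{(i-1)},  a_i^{(i)} = k_i
        a = [a[j - 1] - k_i * a[i - j - 1] for j in range(1, i)] + [k_i]
        k.append(k_i)
        err *= 1.0 - k_i ** 2
    return a, k, err


if __name__ == "__main__":
    r = [1.0, 0.8, 0.5]            # hypothetical autocorrelation values
    a, k, err = levinson_durbin(r, 2)
    print("a =", a, "k =", k, "E =", err)
```

The returned $a_k$ satisfy the normal equations $\sum_{k=1}^{p} a_k r(|i-k|) = r(i)$, $i = 1, \ldots, p$, which is what the tests below check.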


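Example of MATLAB Code

    function myLPCCodec(wavfile, N)
    %
    % wavfile - input MS wav file
    % N       - LPC filter order
    %

[Figure: LPC synthesis model. An Impulse Train (voiced) or a Noise Generator (unvoiced), selected by the Voiced/Unvoiced switch, is scaled by the Gain and drives the Vocal Tract Model.]

$$H(z) = \frac{1}{A(z)}, \qquad A(z) = 1 - \sum_{k=1}^{p} a_k z^{-k}$$

$$\hat{s}[n] = \sum_{k=1}^{p} a_k\, \hat{s}[n-k] + g\, e[n]$$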
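The model above can be simulated directly from the difference equation. In this sketch the pitch period of 80 samples (100 Hz at an assumed 8 kHz), the Gaussian noise source and the function names are illustrative assumptions.

```python
import random


def excitation(n_samples, voiced, pitch_period=80, seed=0):
    """Impulse train (voiced) or white Gaussian noise (unvoiced) excitation e[n]."""
    if voiced:
        return [1.0 if n % pitch_period == 0 else 0.0 for n in range(n_samples)]
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n_samples)]


def lpc_synthesis(e, a, gain):
    """All-pole synthesis s_hat[n] = sum_{k=1}^{p} a_k s_hat[n-k] + g e[n] (zero initial state)."""
    s_hat = []
    for n, e_n in enumerate(e):
        feedback = sum(a[k - 1] * s_hat[n - k] for k in range(1, len(a) + 1) if n >= k)
        s_hat.append(feedback + gain * e_n)
    return s_hat


if __name__ == "__main__":
    a = [1.3, -0.6]                # hypothetical stable 2nd-order vocal-tract model
    voiced = lpc_synthesis(excitation(400, voiced=True), a, gain=0.5)
    unvoiced = lpc_synthesis(excitation(400, voiced=False), a, gain=0.1)
    print(voiced[:5], unvoiced[:5])
```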


Analysis of Quantization Errors
Use MATLAB functions to research the effects of quantization errors introduced by the precision of the arithmetic operations and by the representation of the filter and the error signal:
- Double (float64) representation (software emulation)
- Float (float32) representation (software emulation)
- Int (int32) representation (hardware emulation)
- Short (int16) representation (hardware emulation)

Useful MATLAB functions: fix, floor, round, ceil. Example (truncation of sig to B bits):

    sig_hat = fix(sig*2^(B-1))/2^(B-1);
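A Python analogue of the truncation one-liner, plus a round-to-nearest variant, shows how the error grows as B shrinks. Like the MATLAB example, it does not clip values to the representable B-bit range; `math.trunc` plays the role of `fix`, and the `snr_db` helper is a hypothetical addition for measuring the effect.

```python
import math


def quantize_fix(sig, bits):
    """Truncate toward zero on a grid of step 2^-(B-1): sig_hat = fix(sig*2^(B-1))/2^(B-1)."""
    scale = 2 ** (bits - 1)
    return [math.trunc(x * scale) / scale for x in sig]


def quantize_round(sig, bits):
    """Round to the nearest level (halves away from zero, like MATLAB round) on the same grid."""
    scale = 2 ** (bits - 1)
    return [math.copysign(math.floor(abs(x) * scale + 0.5), x) / scale for x in sig]


def snr_db(sig, sig_hat):
    """Signal-to-quantization-noise ratio in dB (hypothetical helper)."""
    signal = sum(x * x for x in sig)
    noise = sum((x - y) ** 2 for x, y in zip(sig, sig_hat))
    return 10.0 * math.log10(signal / noise)


if __name__ == "__main__":
    sig = [0.9 * math.sin(0.05 * n) for n in range(1000)]
    for bits in (16, 8, 4):
        print(bits, "bits:", round(snr_db(sig, quantize_fix(sig, bits)), 1), "dB")
```

Truncation leaves an error below one step $2^{-(B-1)}$; rounding halves the worst case to $2^{-B}$.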


Quantization of Error Signal & Filter Coefficients
ADPCM can be applied to the error signal.

Filter coefficients in the direct filter form are found to be sensitive to quantization errors: a small quantization error can have a large effect on the filter characteristics. The issue is that the polynomial coefficients map non-linearly to the poles of the filter (i.e., the roots of the polynomial).

Alternate representations are possible that have significantly better tolerance to quantization error.
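A second-order section makes the sensitivity concrete. With complex poles $r e^{\pm j\theta}$, the direct-form coefficients are $a_1 = 2r\cos\theta$ and $a_2 = -r^2$, so (holding $r$ fixed) the pole angle reacts to an error in $a_1$ as follows; the numbers in the last line are an illustrative choice, not values from the slides.

```latex
A(z) = 1 - a_1 z^{-1} - a_2 z^{-2} = (1 - r e^{j\theta} z^{-1})(1 - r e^{-j\theta} z^{-1})
\;\Rightarrow\; a_1 = 2r\cos\theta, \quad a_2 = -r^2

\theta = \arccos\!\left(\frac{a_1}{2r}\right)
\;\Rightarrow\; \frac{\partial \theta}{\partial a_1} = -\frac{1}{2r\sin\theta}

r = 0.99,\ \theta = 0.05\ \text{rad}:\quad
\left|\frac{\partial \theta}{\partial a_1}\right| = \frac{1}{2(0.99)\sin(0.05)} \approx 10.1
```

To first order, an error of 0.01 in $a_1$ therefore moves the pole angle of this sharp low-frequency resonance by about 0.1 rad, whereas a resonance near $\theta = \pi/2$ would move by only about 0.005 rad.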


LPC Filter Representations
As noted previously, when the Levinson-Durbin algorithm was introduced, one alternate representation of the filter coefficients was also mentioned: the PARCOR coefficients.

LPC to PARCOR:

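$$a_j^{(p)} = a_j, \qquad 1 \le j \le p$$

for $i = p, p-1, \ldots, 1$:

$$k_i = a_i^{(i)}$$

$$a_j^{(i-1)} = \frac{a_j^{(i)} + k_i\, a_{i-j}^{(i)}}{1 - k_i^2}, \qquad 1 \le j \le i-1$$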
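The step-down recursion translates directly into code. This sketch assumes the same sign convention as the Levinson-Durbin recursion ($a_j^{(i)} = a_j^{(i-1)} - k_i a_{i-j}^{(i-1)}$); the `is_stable` helper relies on the standard property that $A(z)$ has all its roots inside the unit circle exactly when every $|k_i| < 1$.

```python
def lpc_to_parcor(a):
    """Step-down recursion: predictor coefficients a_1..a_p -> reflection coefficients k_1..k_p.

    Convention: A(z) = 1 - sum_k a_k z^-k. Raises ValueError if some |k_i| == 1,
    where the recursion divides by zero (A(z) then has a root on the unit circle).
    """
    p = len(a)
    cur = list(a)                 # a_j^{(i)}, j = 1..i, stored at index j-1
    k = [0.0] * p
    for i in range(p, 0, -1):
        k_i = cur[i - 1]          # k_i = a_i^{(i)}
        k[i - 1] = k_i
        denom = 1.0 - k_i ** 2
        if denom == 0.0:
            raise ValueError("|k_i| = 1: A(z) is not strictly minimum phase")
        cur = [(cur[j - 1] + k_i * cur[i - j - 1]) / denom for j in range(1, i)]
    return k


def is_stable(a):
    """True iff 1/A(z) is stable, i.e. every reflection coefficient has |k_i| < 1."""
    try:
        return all(abs(k_i) < 1.0 for k_i in lpc_to_parcor(a))
    except ValueError:
        return False


if __name__ == "__main__":
    print(lpc_to_parcor([0.9, -0.2]))       # A(z) = 1 - 0.9 z^-1 + 0.2 z^-2
    print(is_stable([0.9, -0.2]), is_stable([0.0, -1.21]))
```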


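PARCOR Filter Representation
PARCOR to LPC:

for $i = 1, 2, \ldots, p$:

$$a_i^{(i)} = k_i$$

$$a_j^{(i)} = a_j^{(i-1)} - k_i\, a_{i-j}^{(i-1)}, \qquad 1 \le j \le i-1$$

$$a_j = a_j^{(p)}, \qquad 1 \le j \le p$$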
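The step-up recursion is even shorter. A practical consequence, sketched in the usage lines, is that any set of quantized $k_i$ kept inside $(-1, 1)$ still yields a stable synthesis filter; the uniform quantizer with step 1/8 used there is an assumption for illustration.

```python
def parcor_to_lpc(k):
    """Step-up recursion k_1..k_p -> a_1..a_p, with A(z) = 1 - sum_k a_k z^-k."""
    a = []                                   # a_j^{(i-1)}, j = 1..i-1
    for i, k_i in enumerate(k, start=1):
        # a_j^{(i)} = a_j^{(i-1)} - k_i a_{i-j}^{(i-1)},  a_i^{(i)} = k_i
        a = [a[j - 1] - k_i * a[i - j - 1] for j in range(1, i)] + [k_i]
    return a


if __name__ == "__main__":
    k = [0.75, -0.2]
    k_q = [round(k_i * 8) / 8 for k_i in k]  # assumed uniform quantizer, step 1/8
    print("exact:", parcor_to_lpc(k))
    print("quantized k:", k_q, "->", parcor_to_lpc(k_q))
```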


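Line Spectral Frequency Representation
It turns out that the PARCOR (equivalently, LPC) coefficients can be represented with line spectral frequencies (LSFs), which have significantly better quantization properties. Note the PARCOR lattice structure of the LPC synthesis filter $H(z) = 1/A(z)$:

[Figure: lattice realization of $H(z) = 1/A(z)$ from Input to Output, with stages $k_p, k_{p-1}, \ldots$, unit delays $z^{-1}$, upper-path nodes $A_0, \ldots, A_{p-1}, A_p$ and lower-path nodes $B_0, \ldots, B_{p-1}, B_p$; an added stage $k_{p+1} = \mp 1$ is used for the LSF derivation.]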
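The lattice in the figure can be run sample by sample. This sketch assumes the stage equations $f_{i-1}[n] = f_i[n] + k_i\, b_{i-1}[n-1]$ and $b_i[n] = b_{i-1}[n-1] - k_i\, f_{i-1}[n]$ (consistent with the PARCOR-to-LPC recursion above), where $f$ and $b$ are hypothetical names for the signals on the upper and lower paths; the input enters as $f_p[n]$ and the output is $f_0[n] = b_0[n]$. It is checked against the direct form $y[n] = x[n] + \sum_k a_k y[n-k]$.

```python
def parcor_to_lpc(k):
    """Step-up recursion k_1..k_p -> a_1..a_p (A(z) = 1 - sum_k a_k z^-k)."""
    a = []
    for i, k_i in enumerate(k, start=1):
        a = [a[j - 1] - k_i * a[i - j - 1] for j in range(1, i)] + [k_i]
    return a


def lattice_synthesis(x, k):
    """All-pole lattice: f_p[n] = x[n]; for i = p..1:
         f_{i-1}[n] = f_i[n] + k_i b_{i-1}[n-1]
         b_i[n]     = b_{i-1}[n-1] - k_i f_{i-1}[n]
       output y[n] = f_0[n] = b_0[n]."""
    p = len(k)
    b_prev = [0.0] * (p + 1)          # b_i[n-1] for i = 0..p (zero initial state)
    y = []
    for x_n in x:
        f = x_n                        # f_p[n]
        b_cur = [0.0] * (p + 1)
        for i in range(p, 0, -1):
            f = f + k[i - 1] * b_prev[i - 1]           # f_{i-1}[n]
            b_cur[i] = b_prev[i - 1] - k[i - 1] * f    # b_i[n]
        b_cur[0] = f                   # b_0[n] = f_0[n]
        y.append(f)
        b_prev = b_cur
    return y


def direct_form_synthesis(x, a):
    """Direct form y[n] = x[n] + sum_{k=1}^{p} a_k y[n-k], for comparison."""
    y = []
    for n, x_n in enumerate(x):
        y.append(x_n + sum(a[j - 1] * y[n - j] for j in range(1, len(a) + 1) if n >= j))
    return y


if __name__ == "__main__":
    k = [0.5, -0.3, 0.2]
    x = [1.0] + [0.0] * 7
    print(lattice_synthesis(x, k))
    print(direct_form_synthesis(x, parcor_to_lpc(k)))
```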


Line Spectral Frequency Representation
From the previous slide, the following holds:

$$A_0(z) = B_0(z) = 1$$

$$A_p(z) = A_{p-1}(z) - k_p z^{-1} B_{p-1}(z)$$

$$B_p(z) = z^{-1} B_{p-1}(z) - k_p A_{p-1}(z)$$

From this realization of the filter, the LSP representation is derived, using

$$B_p(z) = z^{-p} A_p(z^{-1})$$
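The two recursions can be checked numerically by treating each polynomial in $z^{-1}$ as a coefficient list. The sketch below, which assumes the same sign convention, builds $A_p(z)$ and $B_p(z)$ from the reflection coefficients and confirms that $B_p(z)$ is $A_p(z)$ with its coefficients reversed, i.e. $B_p(z) = z^{-p} A_p(z^{-1})$.

```python
def lattice_polynomials(k):
    """A_i(z) = A_{i-1}(z) - k_i z^-1 B_{i-1}(z), B_i(z) = z^-1 B_{i-1}(z) - k_i A_{i-1}(z),
    starting from A_0(z) = B_0(z) = 1. Polynomials are coefficient lists indexed by the
    power of z^-1; returns (A_p, B_p), each of length p + 1."""
    a_poly, b_poly = [1.0], [1.0]
    for k_i in k:
        b_shift = [0.0] + b_poly        # z^-1 B_{i-1}(z)
        a_ext = a_poly + [0.0]          # A_{i-1}(z), padded to the same length
        a_poly, b_poly = ([u - k_i * v for u, v in zip(a_ext, b_shift)],
                          [v - k_i * u for u, v in zip(a_ext, b_shift)])
    return a_poly, b_poly


if __name__ == "__main__":
    a_p, b_p = lattice_polynomials([0.75, -0.2])
    print("A_p:", a_p)                  # close to [1, -0.9, 0.2]
    print("B_p:", b_p)                  # the same coefficients reversed
```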


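LSF Representation
Extending the lattice by one stage with $k_{p+1} = \mp 1$:

$$P(z) = A_p(z) + z^{-1} B_p(z) = A_p(z) + z^{-(p+1)} A_p(z^{-1}) \qquad (k_{p+1} = -1)$$

$$Q(z) = A_p(z) - z^{-1} B_p(z) = A_p(z) - z^{-(p+1)} A_p(z^{-1}) \qquad (k_{p+1} = +1)$$

$$A_p(z) = \frac{1}{2}\left[P(z) + Q(z)\right]$$

When $A_p(z)$ is minimum phase, the roots of $P(z)$ and $Q(z)$ lie on the unit circle and interlace; their angles $\omega_i$ are the line spectral frequencies.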
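The following sketch forms $P(z)$ and $Q(z)$ from the predictor coefficients and locates the LSFs. It is one simple approach among several (a common alternative is a Chebyshev-polynomial root search): because $P$ is symmetric and $Q$ antisymmetric, $e^{j\omega(p+1)/2}P(e^{j\omega})$ and $j e^{j\omega(p+1)/2}Q(e^{j\omega})$ are real functions of $\omega$, so their sign changes on a grid over $(0, \pi)$, refined by bisection, give the LSFs. The grid size is an assumption; roots closer together than one grid step would be missed.

```python
import math


def pq_polynomials(a):
    """P(z) = A(z) + z^-(p+1) A(z^-1) and Q(z) = A(z) - z^-(p+1) A(z^-1).

    A(z) = 1 - sum_k a_k z^-k; every polynomial is a coefficient list indexed by the power of z^-1.
    """
    c = [1.0] + [-coef for coef in a] + [0.0]     # A(z) padded to length p + 2
    rev = c[::-1]                                 # z^-(p+1) A(z^-1) = [0, c_p, ..., c_1, 1]
    p_poly = [x + y for x, y in zip(c, rev)]
    q_poly = [x - y for x, y in zip(c, rev)]
    return p_poly, q_poly


def _sign_changes(func, n_grid, iters=60):
    """Zeros of func on (0, pi): sign changes on a uniform grid, refined by bisection."""
    grid = [math.pi * (i + 0.5) / n_grid for i in range(n_grid)]
    vals = [func(w) for w in grid]
    roots = []
    for i in range(n_grid - 1):
        if vals[i] * vals[i + 1] < 0.0:
            lo, hi, f_lo = grid[i], grid[i + 1], vals[i]
            for _ in range(iters):
                mid = 0.5 * (lo + hi)
                f_mid = func(mid)
                if f_lo * f_mid <= 0.0:
                    hi = mid
                else:
                    lo, f_lo = mid, f_mid
            roots.append(0.5 * (lo + hi))
    return roots


def lsf(a, n_grid=2048):
    """Line spectral frequencies (radians, ascending) of A(z) = 1 - sum_k a_k z^-k."""
    p_poly, q_poly = pq_polynomials(a)
    center = (len(a) + 1) / 2.0

    def p_real(w):   # e^{jw(p+1)/2} P(e^{jw}): real because P is symmetric
        return sum(coef * math.cos(w * (m - center)) for m, coef in enumerate(p_poly))

    def q_real(w):   # j e^{jw(p+1)/2} Q(e^{jw}): real because Q is antisymmetric
        return sum(coef * math.sin(w * (m - center)) for m, coef in enumerate(q_poly))

    return sorted(_sign_changes(p_real, n_grid) + _sign_changes(q_real, n_grid))


if __name__ == "__main__":
    print(pq_polynomials([0.9, -0.2]))
    print(lsf([0.9, -0.2]))            # about [0.5548, 1.5208] rad
```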


LPC Synthesis Filter with LSF

$$H(z) = \frac{1}{A_p(z)} = \frac{1}{1 + \frac{1}{2}\left\{\left[P(z) - 1\right] + \left[Q(z) - 1\right]\right\}}$$
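Going back from LSFs to the filter uses the fact that each unit-circle root pair $e^{\pm j\omega_i}$ contributes a factor $1 - 2\cos\omega_i\, z^{-1} + z^{-2}$. The sketch assumes an even order $p$, so that $P(z)$ carries the trivial factor $1 + z^{-1}$ and $Q(z)$ the factor $1 - z^{-1}$, and assumes the sorted LSFs interlace with $\omega_1$ belonging to $P(z)$; it then recovers $A_p(z) = \frac{1}{2}[P(z) + Q(z)]$, whose inverse is the synthesis filter above.

```python
import math


def poly_mul(x, y):
    """Product of two polynomials given as coefficient lists in z^-1."""
    out = [0.0] * (len(x) + len(y) - 1)
    for i, xi in enumerate(x):
        for j, yj in enumerate(y):
            out[i + j] += xi * yj
    return out


def lsf_to_lpc(omegas):
    """Rebuild a_1..a_p from sorted LSFs (radians), assuming p even and omega_1 a root of P(z)."""
    p = len(omegas)
    if p % 2:
        raise ValueError("this sketch assumes an even order p")
    p_poly, q_poly = [1.0, 1.0], [1.0, -1.0]      # trivial factors (1 + z^-1) and (1 - z^-1)
    for idx, w in enumerate(omegas):
        quad = [1.0, -2.0 * math.cos(w), 1.0]      # (1 - e^{jw} z^-1)(1 - e^{-jw} z^-1)
        if idx % 2 == 0:
            p_poly = poly_mul(p_poly, quad)        # omega_1, omega_3, ... -> P(z)
        else:
            q_poly = poly_mul(q_poly, quad)        # omega_2, omega_4, ... -> Q(z)
    a_poly = [(x + y) / 2.0 for x, y in zip(p_poly, q_poly)]   # A(z) = (P(z) + Q(z)) / 2
    # Drop A's leading 1 and its (zero) z^-(p+1) coefficient; a_k = -(coefficient of z^-k).
    return [-coef for coef in a_poly[1:p + 1]]


if __name__ == "__main__":
    print(lsf_to_lpc([math.acos(0.85), math.acos(0.05)]))   # about [0.9, -0.2]
```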


A Simple Speech Coder: LPC-Based Synthesis Structure

[Figure: block diagram. Residual Signal and Filter Coeffs → Decoding → Residual and Filter Coeffs → Synthesis Filter → De-emphasis → Audio Output.]
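On the decoder side, de-emphasis undoes the encoder's pre-emphasis. This is a minimal sketch assuming first-order pre-emphasis $y[n] = x[n] - \alpha x[n-1]$ with an assumed $\alpha = 0.95$; the inverse is the one-pole recursion $x[n] = y[n] + \alpha x[n-1]$.

```python
def pre_emphasis(x, alpha=0.95):
    """Encoder side: y[n] = x[n] - alpha * x[n-1] (alpha is an assumed value)."""
    return [x[n] - alpha * x[n - 1] if n > 0 else x[0] for n in range(len(x))]


def de_emphasis(y, alpha=0.95):
    """Decoder side: x[n] = y[n] + alpha * x[n-1], an IIR filter with its pole at z = alpha."""
    out = []
    prev = 0.0
    for y_n in y:
        prev = y_n + alpha * prev
        out.append(prev)
    return out


if __name__ == "__main__":
    x = [0.3, -1.2, 0.5, 2.0, 0.0, -0.4]
    print(de_emphasis(pre_emphasis(x)))   # recovers x up to rounding
```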

Audio Coding


Audio Coding
Most audio coding standards use principles of psychoacoustics. Example: the basic structure of an MP3 encoder:

[Figure: Audio Input → Filterbank & Transform → Quantization → Bit-stream, with a Psychoacoustic Model, also fed by the Audio Input, controlling the Quantization.]


Basic Structure of Audio Coders
- Filterbank Processing
- Psychoacoustic Model
- Quantization

Filter Bank: Analysis & Synthesis


Filterbank Processing: splitting the full-band signal into several sub-bands:
- Uniform sub-bands (FFT)
- Critical bands (FFT followed by a non-linear transformation), which reflect the human auditory apparatus: Mel-scale and Bark-scale transformations, with $f$ in Hz:

$$\mathrm{Mel}(f) = 1127.01048 \ln\left(1 + \frac{f}{700}\right)$$

$$\mathrm{Bark}(f) = 13 \arctan(0.00076 f) + 3.5 \arctan\left[\left(\frac{f}{7500}\right)^2\right]$$
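The two mappings in Python, plus the inverse Mel mapping obtained by solving the Mel formula for $f$; the function names are illustrative.

```python
import math


def hz_to_mel(f):
    """Mel(f) = 1127.01048 ln(1 + f/700), f in Hz."""
    return 1127.01048 * math.log(1.0 + f / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel: f = 700 (exp(m / 1127.01048) - 1)."""
    return 700.0 * (math.exp(m / 1127.01048) - 1.0)


def hz_to_bark(f):
    """Bark(f) = 13 arctan(0.00076 f) + 3.5 arctan((f/7500)^2), f in Hz."""
    return 13.0 * math.atan(0.00076 * f) + 3.5 * math.atan((f / 7500.0) ** 2)


if __name__ == "__main__":
    for f in (100, 500, 1000, 4000, 8000):
        print(f, "Hz:", round(hz_to_mel(f), 1), "mel,", round(hz_to_bark(f), 2), "Bark")
```

With these constants, 1000 Hz maps to about 1000 mel and about 8.5 Bark.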


Mel-Scale


Bark-Scale


Analysis Structure of Filterbank

[Figure: Audio Input → analysis filters h1[n], ..., hk[n], ..., hN[n] → ↓ → MDCT (one per channel) → Quantization → Bit Stream]

hk[n] – impulse response of the kth quadrature mirror filter
N – number of channels, typically 32
↓ – down-sampling
MDCT – Modified Discrete Cosine Transform
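The MDCT/IMDCT pair in these diagrams can be sketched in a few lines. This is a direct $O(M^2)$ Python transcription under stated assumptions (sine window on both sides, 50% overlap, $2/M$ scaling in the inverse), not the MP3 hybrid filterbank itself: it shows that the time-domain aliasing produced by each frame cancels when neighbouring frames are overlap-added.

```python
import math


def sine_window(length):
    """Sine window w[n] = sin(pi (n + 0.5) / length); satisfies w[n]^2 + w[n + length/2]^2 = 1."""
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]


def mdct(frame):
    """MDCT of a 2M-sample frame: X[k] = sum_n x[n] cos(pi/M (n + 1/2 + M/2)(k + 1/2)), k < M."""
    m = len(frame) // 2
    return [sum(frame[n] * math.cos(math.pi / m * (n + 0.5 + m / 2) * (k + 0.5))
                for n in range(2 * m))
            for k in range(m)]


def imdct(coeffs):
    """Inverse MDCT with 2/M scaling (unit overall gain with a Princen-Bradley window at both ends)."""
    m = len(coeffs)
    return [2.0 / m * sum(coeffs[k] * math.cos(math.pi / m * (n + 0.5 + m / 2) * (k + 0.5))
                          for k in range(m))
            for n in range(2 * m)]


def mdct_round_trip(x, m):
    """Windowed MDCT -> IMDCT -> overlap-add with hop m (50% overlap).

    Aliasing from neighbouring frames cancels (TDAC), so samples covered by two
    frames, i.e. x[m : len(x) - m] when len(x) is a multiple of m, are reconstructed.
    """
    w = sine_window(2 * m)
    y = [0.0] * len(x)
    for start in range(0, len(x) - 2 * m + 1, m):
        coeffs = mdct([w[n] * x[start + n] for n in range(2 * m)])
        frame_out = imdct(coeffs)
        for n in range(2 * m):
            y[start + n] += w[n] * frame_out[n]
    return y


if __name__ == "__main__":
    x = [math.sin(0.3 * i) + 0.01 * i for i in range(64)]
    y = mdct_round_trip(x, 8)
    print(max(abs(x[i] - y[i]) for i in range(8, 56)))   # close to 0
```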


Synthesis Structure of Filterbank

[Figure: Bit Stream → Decoding → MDCT coefficients → IMDCT (one per channel) → ↑ → synthesis filters g1[n], ..., gk[n], ..., gN[n] → Audio Output]

gk[n] – impulse response of the kth inverse quadrature mirror filter
N – number of channels, typically 32
↑ – up-sampling
IMDCT – Inverse Modified Discrete Cosine Transform

Psycho-Acoustic Modeling


Psychoacoustic Model
Computes the masking threshold according to human auditory perception. The masking threshold is used to quantize the Discrete Cosine Transform coefficients.

Analysis is done in the frequency domain, represented by the DFT and computed with the FFT.
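A minimal sketch of that frequency-domain analysis step: a Hann-windowed frame, a radix-2 FFT, and the power spectrum in dB. The frame length, the Hann window and the small floor inside the logarithm are assumptions; a real psychoacoustic model adds level calibration and further steps.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    if n % 2:
        raise ValueError("length must be a power of two")
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddled = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddled
        out[k + n // 2] = even[k] - twiddled
    return out


def power_spectrum_db(frame):
    """Hann-windowed power spectrum in dB for bins 0..N/2 (uncalibrated)."""
    n = len(frame)
    hann = [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]
    spectrum = fft([w * s for w, s in zip(hann, frame)])
    return [10.0 * math.log10(abs(spectrum[k]) ** 2 + 1e-12) for k in range(n // 2 + 1)]


if __name__ == "__main__":
    fs, n = 8000, 256              # assumed sampling rate and frame length
    tone = [math.cos(2 * math.pi * 1000 * i / fs) for i in range(n)]
    spec = power_spectrum_db(tone)
    peak = max(range(len(spec)), key=lambda k: spec[k])
    print("peak bin", peak, "->", peak * fs / n, "Hz")
```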


Threshold of Hearing
Absolute threshold of audible events in quiet conditions (no other sounds present).

Any signal below the threshold can be removed without affecting perception.
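The slides do not give a formula for this threshold. A widely used closed-form approximation (Terhardt's) is $T_q(f) = 3.64 (f/1000)^{-0.8} - 6.5\, e^{-0.6 (f/1000 - 3.3)^2} + 10^{-3} (f/1000)^4$ dB SPL; the sketch below evaluates it and applies the removal rule stated above. It assumes signal levels are already expressed in dB SPL, and the function names are illustrative.

```python
import math


def threshold_in_quiet_db(f_hz):
    """Terhardt's approximation of the absolute threshold of hearing, in dB SPL (f in Hz)."""
    khz = f_hz / 1000.0
    return 3.64 * khz ** -0.8 - 6.5 * math.exp(-0.6 * (khz - 3.3) ** 2) + 1e-3 * khz ** 4


def is_audible(level_db_spl, f_hz):
    """A component below the threshold in quiet can be removed without affecting perception."""
    return level_db_spl > threshold_in_quiet_db(f_hz)


if __name__ == "__main__":
    for f in (50, 100, 1000, 3300, 10000):
        print(f, "Hz:", round(threshold_in_quiet_db(f), 1), "dB SPL")
```

The curve is lowest (slightly below 0 dB SPL) around 3 to 4 kHz and rises steeply at low frequencies.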


Threshold of Hearing


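Frequency Masking
Schröder spreading function:

$$10 \log_{10} F(\Delta z) = 15.81 + 7.5\,(\Delta z + 0.474) - 17.5\,\sqrt{1 + (\Delta z + 0.474)^2}\ \ \text{dB}, \qquad \Delta z = z(f_{\text{maskee}}) - z(f_{\text{masker}})$$

Bark scale function:

$$z(f) = 13 \arctan(0.00076 f) + 3.5 \arctan\left[\left(\frac{f}{7500}\right)^2\right]$$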
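The two formulas combine into a simple masking estimate: map masker and maskee to Bark, take $\Delta z$, and evaluate the spreading function. This sketch only computes the spreading-function attenuation; building a full masking threshold (masker level, offsets, summation over maskers) is left out, and the function names are illustrative.

```python
import math


def hz_to_bark(f):
    """z(f) = 13 arctan(0.00076 f) + 3.5 arctan((f/7500)^2), f in Hz."""
    return 13.0 * math.atan(0.00076 * f) + 3.5 * math.atan((f / 7500.0) ** 2)


def schroeder_spreading_db(dz):
    """10 log10 F(dz) = 15.81 + 7.5 (dz + 0.474) - 17.5 sqrt(1 + (dz + 0.474)^2), dz in Bark."""
    x = dz + 0.474
    return 15.81 + 7.5 * x - 17.5 * math.sqrt(1.0 + x * x)


def masking_attenuation_db(f_maskee, f_masker):
    """Spreading-function value at the maskee, with dz = z(f_maskee) - z(f_masker)."""
    return schroeder_spreading_db(hz_to_bark(f_maskee) - hz_to_bark(f_masker))


if __name__ == "__main__":
    for f in (900.0, 5000.0):
        print(f, "Hz vs 1 kHz masker:", round(masking_attenuation_db(f, 1000.0), 1), "dB")
```

The function is about 0 dB at $\Delta z = 0$ and falls off more slowly toward higher frequencies than toward lower ones (about -21 dB at +3 Bark versus about -51 dB at -3 Bark), so a masker spreads its masking mainly upward in frequency. A 900 Hz component sits within one Bark of a 1 kHz masker (about -3.4 dB), while 5 kHz is roughly 10 Bark away (about -90 dB).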


Masking Curve


Primary Tone: 1 kHz


Masked Tone: 900 Hz


Combined Sound: 1 kHz + 0.9 kHz


Combined: 1 kHz + 0.9 kHz (-10 dB)


Combined: 1 kHz + 5 kHz (-10 dB)


END
