
Speech & Audio Processing

Speech & Audio Coding Examples

April 22, 2023 Veton Këpuska

A Simple Speech Coder

LPC Based Analysis Structure

[Figure: block diagram of the LPC-based analysis structure. Blocks: Audio Input, Pre-emphasis, Windowing Analysis, Linear Prediction Analysis (Auto-Correlation, Levinson-Durbin), Analysis Filter, Quantization. Signals: Filter Coeffs, Residual.]


Windowing Analysis Stage

N – Length of the Analysis Window (10-30 msec)


Some Analysis Windows


MATLAB Useful Functions

wintool: use "doc wintool" for more information

window: use "doc window" for the list of supported windows

Define your own window if needed, e.g., the sine window and the Vorbis window:

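Sine window:

$$w[n] = \sin\!\left(\frac{\pi\,(n + 0.5)}{N}\right), \qquad n = 0, 1, \ldots, N-1$$

Vorbis window:

$$w[n] = \sin\!\left(\frac{\pi}{2}\,\sin^{2}\!\left(\frac{\pi\,(n + 0.5)}{N}\right)\right), \qquad n = 0, 1, \ldots, N-1$$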
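The two user-defined windows can be sketched in a few lines. The block below is a Python illustration rather than MATLAB, and the function names are hypothetical. It also checks a well-known property of both windows that matters for the MDCT stage later in the deck: for even N, w[n]^2 + w[n + N/2]^2 = 1.

```python
import math

def sine_window(N):
    """Sine window: w[n] = sin(pi * (n + 0.5) / N), n = 0..N-1."""
    return [math.sin(math.pi * (n + 0.5) / N) for n in range(N)]

def vorbis_window(N):
    """Vorbis window: w[n] = sin(pi/2 * sin^2(pi * (n + 0.5) / N))."""
    return [math.sin(0.5 * math.pi * math.sin(math.pi * (n + 0.5) / N) ** 2)
            for n in range(N)]

def power_complementary_error(w):
    """Largest deviation of w[n]^2 + w[n + N/2]^2 from 1 (N assumed even)."""
    half = len(w) // 2
    return max(abs(w[n] ** 2 + w[n + half] ** 2 - 1.0) for n in range(half))
```

Calling `power_complementary_error(vorbis_window(64))` should return a value at the level of floating-point round-off.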


LPC Analysis Stage

LPC method described in: Ch5-Analysis_&_Synthesis_of_Pole-Zero_Speech_Models.ppt

Summary:
Perform autocorrelation
Solve the system of equations with the Durbin-Levinson method

MATLAB help: doc lpc, etc.
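The two steps in the summary can be sketched as follows. This is a Python illustration with hypothetical function names, not the code behind MATLAB's lpc. It assumes the predictor convention s_hat[n] = sum_k a[k] s[n-k], so the returned a[k] have the opposite sign of the entries a(2:end) that lpc returns.

```python
def autocorrelation(x, p):
    """Short-time autocorrelation r[k] = sum_n x[n] * x[n + k], k = 0..p."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(p + 1)]

def levinson_durbin(r, p):
    """Solve the autocorrelation normal equations recursively.

    Assumed convention: s_hat[n] = sum_{k=1..p} a[k] * s[n - k].
    Returns (a, k, err); index 0 of a and k is unused and left at 0.0,
    k[1..p] are the PARCOR coefficients, err is the final prediction
    error energy. r[0] must be non-zero.
    """
    a = [0.0] * (p + 1)
    k = [0.0] * (p + 1)
    err = r[0]
    for i in range(1, p + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k[i] = acc / err
        prev = a[:]                      # coefficients of order i-1
        a[i] = k[i]
        for j in range(1, i):
            a[j] = prev[j] - k[i] * prev[i - j]
        err *= 1.0 - k[i] ** 2           # error energy shrinks at each order
    return a, k, err
```

For the autocorrelation sequence [1, 0.5, 0.25] of a first-order process, a second-order fit gives a[1] = 0.5 and a[2] = 0, with the error energy reduced to 0.75.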


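Example of MATLAB Code

    function myLPCCodec(wavfile, N)
    % myLPCCodec  LPC analysis/synthesis demo.
    %   wavfile - input MS wav file
    %   N       - LPC filter order

    [x, fs] = audioread(wavfile);   % replaces wavread, removed from newer MATLAB releases
    x = x(:, 1);                    % use the first channel only
    dur = numel(x) / fs;            % playback duration in seconds
    % plot(x);

    % Playing original signal (pause so that playbacks do not overlap)
    soundsc(x, fs);  pause(dur);

    % Performing LPC analysis using the MATLAB lpc function
    % a = [1 a(2) ... a(N+1)], g = variance of the prediction error
    [a, g] = lpc(x, N);

    % Filtering with the estimated filter coefficients to produce predicted samples
    est_x = filter([0 -a(2:end)], 1, x);

    % Error (residual) signal; equivalent to filter(a, 1, x)
    e = x - est_x;

    % Testing the quality of the predicted samples
    soundsc(est_x, fs);  pause(dur);

    % Synthesis stage with zero loss of information:
    % pass the residual through the all-pole filter 1/A(z).
    % e is not normalized, so the gain g is not applied here.
    syn_x = filter(1, a, e);
    soundsc(syn_x, fs);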

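[Figure: synthesis model. The scaled excitation g e[n] drives the filter H(z) and produces ŝ[n].]

$$H(z) = \frac{1}{A(z)}$$

$$\hat{s}[n] = \sum_{k=1}^{p} a_k\, \hat{s}[n-k] + g\, e[n]$$

Here a_k are the predictor coefficients; in terms of the vector returned by the MATLAB lpc function, a_k = -a(k+1).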
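The analysis/synthesis pair can be checked without audio hardware. The sketch below is Python with hypothetical function names. It assumes the predictor convention s_hat[n] = sum_k a[k] s[n-k] and an unnormalized residual, so the gain g is folded into e[n]. It passes a signal through A(z) and then through 1/A(z).

```python
def lpc_residual(x, a):
    """Analysis filter A(z): e[n] = x[n] - sum_k a[k] * x[n - k].

    a[1..p] are predictor coefficients; a[0] is unused.
    Samples before n = 0 are taken as zero.
    """
    p = len(a) - 1
    e = []
    for n in range(len(x)):
        pred = sum(a[k] * x[n - k] for k in range(1, p + 1) if n - k >= 0)
        e.append(x[n] - pred)
    return e

def lpc_synthesize(e, a):
    """Synthesis filter 1/A(z): s[n] = sum_k a[k] * s[n - k] + e[n]."""
    p = len(a) - 1
    s = []
    for n in range(len(e)):
        pred = sum(a[k] * s[n - k] for k in range(1, p + 1) if n - k >= 0)
        s.append(pred + e[n])
    return s
```

With an unquantized residual, `lpc_synthesize(lpc_residual(x, a), a)` reproduces x up to round-off, which is the "zero loss of information" case. Quantizing e or a before synthesis is where coding loss enters.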


Analysis of Quantization Errors

Use MATLAB functions to research the effects of quantization errors introduced by the precision of the arithmetic operations and by the representation of the filter and error signal:
Double (float64) representation (software emulation)
Float (float32) representation (software emulation)
Int (int32) representation (hardware emulation)
Short (int16) representation (hardware emulation)

Useful MATLAB functions: fix, floor, round, ceil

Example:

sig_hat = fix(sig*2^(B-1))/2^(B-1);   % truncation of sig to B bits
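The same B-bit truncation can be reproduced outside MATLAB. The sketch below is Python with hypothetical function names and assumes the signal is scaled to the range -1 to 1. It mirrors the fix-based example and adds a round-to-nearest variant so the two error bounds can be compared.

```python
import math

def quantize_fix(sig, B):
    """Truncate toward zero to B bits: fix(sig * 2^(B-1)) / 2^(B-1)."""
    scale = 2 ** (B - 1)
    return [math.trunc(s * scale) / scale for s in sig]

def quantize_round(sig, B):
    """Round to nearest level, halves away from zero (like MATLAB round)."""
    scale = 2 ** (B - 1)
    return [math.copysign(math.floor(abs(s) * scale + 0.5), s) / scale
            for s in sig]

def max_error(sig, sig_hat):
    """Largest absolute quantization error."""
    return max(abs(u - v) for u, v in zip(sig, sig_hat))
```

Truncation keeps the error below one step, 2^-(B-1), while rounding keeps it within half a step.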


Quantization of Error Signal & Filter Coefficients

ADPCM can be applied to the error signal.

Filter coefficients in the direct filter form are found to be sensitive to quantization errors: a small quantization error can have a large effect on the filter characteristics. The issue is that the polynomial coefficients have a non-linear mapping to the poles of the filter (i.e., the roots of the polynomial).

Alternate representations are possible that have significantly better tolerance to quantization error.
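A second-order case makes the sensitivity concrete. The Python sketch below uses illustrative numbers chosen for this example, not taken from the slides. It places a double pole at z = 0.95, which is a deliberately unfavorable case, and perturbs one direct-form coefficient by 0.001.

```python
import cmath

def poles_second_order(c1, c2):
    """Roots of z^2 + c1*z + c2, the poles of 1 / (1 + c1 z^-1 + c2 z^-2)."""
    disc = cmath.sqrt(c1 * c1 - 4.0 * c2)
    return (-c1 + disc) / 2.0, (-c1 - disc) / 2.0

# Double pole at z = 0.95: (z - 0.95)^2 = z^2 - 1.9 z + 0.9025
c1, c2 = -1.9, 0.9025
delta = 0.001                      # hypothetical coefficient quantization error
perturbed = poles_second_order(c1, c2 - delta)
pole_shift = max(abs(p - 0.95) for p in perturbed)
largest_radius = max(abs(p) for p in perturbed)
```

Here the poles move by about 0.032, roughly thirty times the coefficient error, and one of them ends up noticeably closer to the unit circle.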


LPC Filter Representations

As noted previously, when the Levinson-Durbin algorithm was introduced, one alternate representation of the filter coefficients was also mentioned: the PARCOR coefficients.

LPC to PARCOR:

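$$a_j^{(p)} = a_j, \qquad 1 \le j \le p$$

For $i = p, p-1, \ldots, 1$:

$$k_i = a_i^{(i)}$$

$$a_j^{(i-1)} = \frac{a_j^{(i)} + a_i^{(i)}\, a_{i-j}^{(i)}}{1 - k_i^{2}}, \qquad 1 \le j \le i-1$$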


PARCOR Filter Representation

PARCOR to LPC:

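For $i = 1, 2, \ldots, p$:

$$a_i^{(i)} = k_i$$

$$a_j^{(i)} = a_j^{(i-1)} - k_i\, a_{i-j}^{(i-1)}, \qquad 1 \le j \le i-1$$

$$a_j = a_j^{(p)}, \qquad 1 \le j \le p$$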
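Both conversions are short recursions. The sketch below is Python with hypothetical function names. It assumes the predictor convention s_hat[n] = sum_k a[k] s[n-k] and arrays indexed from 1, with index 0 unused. It implements the step-up (PARCOR to LPC) and step-down (LPC to PARCOR) directions so that a round trip can be checked.

```python
def parcor_to_lpc(k):
    """Step-up recursion: k[1..p] -> predictor coefficients a[1..p]."""
    p = len(k) - 1
    a = [0.0] * (p + 1)
    for i in range(1, p + 1):
        prev = a[:]                       # coefficients of order i-1
        a[i] = k[i]
        for j in range(1, i):
            a[j] = prev[j] - k[i] * prev[i - j]
    return a

def lpc_to_parcor(a):
    """Step-down recursion: a[1..p] -> k[1..p]. Needs |k_i| < 1 throughout."""
    p = len(a) - 1
    cur = a[:]
    k = [0.0] * (p + 1)
    for i in range(p, 0, -1):
        k[i] = cur[i]
        denom = 1.0 - k[i] ** 2
        nxt = cur[:]
        for j in range(1, i):
            nxt[j] = (cur[j] + k[i] * cur[i - j]) / denom
        nxt[i] = 0.0                      # order drops from i to i-1
        cur = nxt
    return k
```

For example, `parcor_to_lpc([0.0, 0.5, 0.2])` gives a[1] = 0.4 and a[2] = 0.2, and `lpc_to_parcor` maps that result back to the original k values.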


Line Spectral Frequency Representation

It turns out that PARCOR coefficients can be represented with Line Spectral Frequencies (LSF), which have significantly better properties, in particular better tolerance to quantization error.

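Note that:

$$H(z) = \frac{1}{A_p(z)}$$

The PARCOR lattice structure of the LPC synthesis filter above:

[Figure: lattice structure between Input and Output with stages k_p, k_{p-1}, ..., unit delays z^{-1}, and adders. Forward signals are labeled A_0, ..., A_{p-1}, A_p and backward signals B_0, ..., B_{p-1}, B_p. Boundary conditions shown: k_0 = -1 and k_{p+1} = ∓1.]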


Line Spectral Frequency Representation

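From the previous slide, the following holds:

$$A_p(z) = A_{p-1}(z) - k_p\, B_{p-1}(z)$$

$$B_p(z) = z^{-1}\left[ B_{p-1}(z) - k_p\, A_{p-1}(z) \right]$$

$$A_0(z) = 1 \quad \& \quad B_0(z) = z^{-1}$$

$$B_p(z) = z^{-(p+1)} A_p(z^{-1})$$

From this realization of the filter, the LSP representation is derived: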


LSF Representation

$$k_{p+1} = 1: \qquad P(z) = A_p(z) - B_p(z)$$

$$k_{p+1} = -1: \qquad Q(z) = A_p(z) + B_p(z)$$

$$A_p(z) = \frac{1}{2}\left[ P(z) + Q(z) \right]$$
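The split of A_p(z) into two polynomials can be illustrated numerically. The Python sketch below rests on several assumptions: the function name is hypothetical, A_p(z) = 1 - sum_j a[j] z^-j, B_p(z) = z^-(p+1) A_p(1/z), and P is taken as the difference and Q as the sum (some texts swap the two names). It returns coefficient lists in powers of z^-1. The line spectral frequencies are the angles of the unit-circle roots of P and Q; finding those roots is not shown.

```python
def lsp_polynomials(a):
    """a[1..p]: predictor coefficients (a[0] unused).

    Returns (A, P, Q) as coefficient lists in powers of z^-1, each of
    length p + 2, with P = A - B and Q = A + B.
    """
    p = len(a) - 1
    A = [1.0] + [-a[j] for j in range(1, p + 1)] + [0.0]  # padded to order p+1
    B = [0.0] + A[p::-1]                                  # z^-(p+1) * A_p(1/z)
    P = [u - v for u, v in zip(A, B)]
    Q = [u + v for u, v in zip(A, B)]
    return A, P, Q
```

With this naming P comes out antisymmetric and Q symmetric, so P has a root at z = 1 and Q has a root at z = -1, and averaging the two recovers A_p(z).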


LPC Synthesis Filter with LSF

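$$H(z) = \frac{1}{A_p(z)} = \frac{1}{1 + \left[ A_p(z) - 1 \right]} = \frac{1}{1 + \frac{1}{2}\left\{ \left[ P(z) - 1 \right] + \left[ Q(z) - 1 \right] \right\}}$$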


A Simple Speech Coder

LPC Based Synthesis Structure

[Figure: block diagram of the LPC-based synthesis structure. Blocks: Decoding, Synthesis Filter, De-emphasis, Audio Output. Signals: Residual Signal, Filter Coeffs.]

Audio Coding


Audio Coding

Most of the Audio Coding Standards use principles of Psychoacoustics.

Example of Basic Structure of MP3 encoder:

[Figure: block diagram of an MP3 encoder. Blocks: Audio Input, Filterbank & Transform, Psychoacoustic Model, Quantization, Bit-stream.]


Basic Structure of Audio Coders

Filterbank Processing
Psychoacoustic Model
Quantization
Filter Bank: Analysis, Synthesis


Filterbank Processing:

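Splitting the full-band signal into several sub-bands:
Uniform sub-bands (FFT)
Critical bands (FFT followed by a non-linear transformation), which reflect the human auditory apparatus
Mel-scale and Bark-scale transformations, with f in Hz:

$$\mathrm{Mel}(f) = 1127.01048\, \ln\!\left(1 + \frac{f}{700}\right)$$

$$\mathrm{Bark}(f) = 13 \arctan(0.00076\, f) + 3.5 \arctan\!\left(\left(\frac{f}{7500}\right)^{2}\right)$$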
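The two warping formulas translate directly into code. The sketch below is Python with hypothetical function names. The inverse Mel mapping is added for convenience and follows algebraically from the forward formula.

```python
import math

def hz_to_mel(f):
    """Mel scale: 1127.01048 * ln(1 + f / 700), f in Hz."""
    return 1127.01048 * math.log(1.0 + f / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (math.exp(m / 1127.01048) - 1.0)

def hz_to_bark(f):
    """Bark scale: 13*atan(0.00076 f) + 3.5*atan((f / 7500)^2), f in Hz."""
    return 13.0 * math.atan(0.00076 * f) + 3.5 * math.atan((f / 7500.0) ** 2)
```

By construction of the constant, 1000 Hz maps to about 1000 mel. The same frequency lies at roughly 8.5 Bark.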


Mel-Scale


Bark-Scale


Analysis Structure of Filterbank

[Figure: analysis filterbank. The Audio Input feeds N parallel branches h1[n], ..., hk[n], ..., hN[n]; each branch is down-sampled (↓) and passed to an MDCT block; the MDCT outputs go to Quantization, which produces the Bit Stream.]

hk[n] – Impulse response of the k-th quadrature mirror filter

N – Number of channels, typically 32

↓ – Down-sampling

MDCT – Modified Discrete Cosine Transform

Synthesis Structure of Filterbank

[Figure: synthesis filterbank. The Bit Stream is passed to Decoding; each of the N channels goes through an IMDCT block, is up-sampled (↑), and is filtered by g1[n], ..., gk[n], ..., gN[n] to form the Audio Output.]

gk[n] – Impulse response of the k-th inverse quadrature mirror filter

N – Number of channels, typically 32

↑ – Up-sampling

IMDCT – Inverse Modified Discrete Cosine Transform
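The MDCT/IMDCT pair relies on overlapping frames to cancel time-domain aliasing. The Python sketch below is a direct O(M^2) illustration with hypothetical function names, not the fast transform used in real codecs. It assumes a single signal rather than 32 down-sampled sub-bands, a sine window of length 2M, a hop of M samples, and an IMDCT scale factor of 2/M.

```python
import math

def mdct(frame):
    """Direct MDCT: 2M windowed samples -> M coefficients."""
    M = len(frame) // 2
    n0 = 0.5 + M / 2.0
    return [sum(frame[n] * math.cos(math.pi / M * (n + n0) * (k + 0.5))
                for n in range(2 * M))
            for k in range(M)]

def imdct(coeffs):
    """Direct IMDCT: M coefficients -> 2M time-aliased samples."""
    M = len(coeffs)
    n0 = 0.5 + M / 2.0
    return [2.0 / M * sum(coeffs[k] * math.cos(math.pi / M * (n + n0) * (k + 0.5))
                          for k in range(M))
            for n in range(2 * M)]

def mdct_roundtrip(x, M):
    """Window, MDCT, IMDCT, window again, then overlap-add with hop M."""
    w = [math.sin(math.pi * (n + 0.5) / (2 * M)) for n in range(2 * M)]
    y = [0.0] * len(x)
    for start in range(0, len(x) - 2 * M + 1, M):
        frame = [w[n] * x[start + n] for n in range(2 * M)]
        rec = imdct(mdct(frame))
        for n in range(2 * M):
            y[start + n] += w[n] * rec[n]
    return y
```

Every sample covered by two overlapping frames is reconstructed up to round-off. The first and last M samples are covered by one frame only and remain aliased. In a coder, quantization of the MDCT coefficients sits between `mdct` and `imdct`.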

Psycho-Acoustic Modeling


Psychoacoustic Model

The masking threshold is determined according to human auditory perception.

The masking threshold is used to quantize the Discrete Cosine Transform coefficients.

The analysis is done in the frequency domain, represented by the DFT and computed with the FFT.


Threshold of Hearing

Absolute threshold of audibly perceptible events in quiet conditions (no other sounds).

Any signal below the threshold can be removed without effect on the perception.
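The slides do not give a formula for the threshold curve. The Python sketch below assumes Terhardt's widely used approximation of the threshold in quiet, in dB SPL with frequency in Hz, and assumes that component levels are already calibrated in dB SPL. The function names are hypothetical.

```python
import math

def threshold_in_quiet_db(f_hz):
    """Terhardt's approximation of the absolute threshold (assumed here)."""
    khz = f_hz / 1000.0
    return (3.64 * khz ** -0.8
            - 6.5 * math.exp(-0.6 * (khz - 3.3) ** 2)
            + 1e-3 * khz ** 4)

def audible_components(components):
    """Keep (freq_hz, level_db_spl) pairs at or above the threshold."""
    return [(f, level) for f, level in components
            if level >= threshold_in_quiet_db(f)]
```

Under this approximation a 10 dB SPL tone at 1 kHz is kept, while a 10 dB SPL tone at 100 Hz falls below the threshold and can be dropped.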


Threshold of Hearing


Frequency Masking

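Schröder spreading function and Bark scale function:

$$z(f) = 13 \arctan(0.00076\, f) + 3.5 \arctan\!\left(\left(\frac{f}{7500}\right)^{2}\right)$$

$$\Delta z = z(f_{\mathrm{maskee}}) - z(f_{\mathrm{masker}})$$

$$10 \log_{10} F(\Delta z) = 15.81 + 7.5\,(\Delta z + 0.474) - 17.5 \left[ 1 + (\Delta z + 0.474)^{2} \right]^{1/2}$$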
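The spreading function can be evaluated for the tone pairs used in the demonstrations that follow. The sketch below is Python with hypothetical function names. It returns the spreading value in dB relative to the masker and does not model the absolute masking level, which also depends on the masker level and type.

```python
import math

def hz_to_bark(f):
    """Bark scale z(f), f in Hz."""
    return 13.0 * math.atan(0.00076 * f) + 3.5 * math.atan((f / 7500.0) ** 2)

def spreading_db(f_maskee, f_masker):
    """Schroeder spreading function 10*log10 F(dz), in dB."""
    dz = hz_to_bark(f_maskee) - hz_to_bark(f_masker)
    x = dz + 0.474
    return 15.81 + 7.5 * x - 17.5 * math.sqrt(1.0 + x * x)
```

The function is close to 0 dB at the masker frequency and falls off at about 25 dB per Bark below it and about 10 dB per Bark above it. For a 1 kHz masker, a 900 Hz tone lies only a few dB down the curve, while a 5 kHz tone lies roughly 90 dB down and is effectively outside the masker's influence.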


Masking Curve


Primary Tone 1kHz


Masked Tone 900 Hz


Combined Sound 1kHz + 0.9kHz


Combined 1kHz + 0.9kHz (-10dB)


Combined 1kHz + 5kHz (-10dB)

END

