Speech & Audio Processing: Speech & Audio Coding Examples

Upload: tuari

Post on 25-Feb-2016

Page 1: Speech & Audio Processing

Speech & Audio Processing

Speech & Audio Coding Examples

Page 2: Speech & Audio Processing

April 22, 2023 Veton Këpuska 2

A Simple Speech Coder: LPC-Based Analysis Structure

[Block diagram. Blocks: Audio Input, Pre-emphasis, Windowing Analysis, Linear Prediction Analysis (Auto-Correlation, Levinson-Durbin), Analysis Filter, Quantization. Signals: Filter Coeffs, Residual.]

Page 3: Speech & Audio Processing

Windowing Analysis Stage

N – Length of the Analysis Window: 10-30 msec

Page 4: Speech & Audio Processing


Some Analysis Windows

Page 5: Speech & Audio Processing

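MATLAB Useful Functions

wintool: use "doc wintool" for more information.
window: use "doc window" for the list of supported windows.
Define your own window if needed, e.g., the sine window and the Vorbis window (n = 0, ..., N-1):

$$ w[n] = \sin\!\left(\frac{\pi\,(n + 0.5)}{N}\right) \qquad \text{(sine window)} $$

$$ w[n] = \sin\!\left(\frac{\pi}{2}\,\sin^{2}\!\left(\frac{\pi\,(n + 0.5)}{N}\right)\right) \qquad \text{(Vorbis window)} $$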
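As a rough sketch of the two user-defined windows named on this slide, the Python below (used instead of MATLAB so the snippet is self-contained) evaluates both for a frame of N samples. The 8 kHz sampling rate and 20 msec frame are assumptions chosen for illustration, the function names are hypothetical, and the factor π is assumed from the usual definition of these windows.

```python
import math

def frame_length(fs_hz, duration_ms):
    """Number of samples in an analysis window of the given duration."""
    return int(round(fs_hz * duration_ms / 1000.0))

def sine_window(N):
    """Sine window: w[n] = sin(pi * (n + 0.5) / N), n = 0..N-1."""
    return [math.sin(math.pi * (n + 0.5) / N) for n in range(N)]

def vorbis_window(N):
    """Vorbis window: w[n] = sin(pi/2 * sin^2(pi * (n + 0.5) / N))."""
    return [math.sin(0.5 * math.pi * math.sin(math.pi * (n + 0.5) / N) ** 2)
            for n in range(N)]

# Assumed example values: 8 kHz sampling rate, 20 msec window (inside 10-30 msec).
N = frame_length(8000, 20)
w_sine = sine_window(N)
w_vorbis = vorbis_window(N)
```

Both windows satisfy w[n]^2 + w[n + N/2]^2 = 1, which is the property that makes them suitable for 50% overlapped transforms such as the MDCT.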

Page 6: Speech & Audio Processing

LPC Analysis Stage

LPC method described in: Ch5-Analysis_&_Synthesis_of_Pole-Zero_Speech_Models.pptx

Summary:
Perform autocorrelation.
Solve the system of equations with the Durbin-Levinson method.

MATLAB help: doc lpc, etc.
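The two summary steps can be sketched as follows. This is a minimal Python illustration, not the code from the referenced chapter: the function names are hypothetical, and it assumes the predictor convention A(z) = 1 - sum_k a_k z^-k, so that a_k multiplies the past sample s[n-k] with a positive sign.

```python
def autocorrelation(x, p):
    """Short-time autocorrelation r[0..p] of one (already windowed) frame x."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag)) for lag in range(p + 1)]

def levinson_durbin(r, p):
    """Solve the normal equations for an order-p predictor.

    Returns (a, k, err): predictor coefficients a_1..a_p, reflection (PARCOR)
    coefficients k_1..k_p, and the final prediction error energy.
    Assumes r[0] > 0 (a silent frame would divide by zero).
    """
    a = [0.0] * (p + 1)          # a[0] is unused; a[j] holds a_j of the current order
    k = [0.0] * (p + 1)
    err = r[0]
    for i in range(1, p + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k[i] = acc / err
        new_a = a[:]
        new_a[i] = k[i]
        for j in range(1, i):
            new_a[j] = a[j] - k[i] * a[i - j]
        a = new_a
        err *= (1.0 - k[i] ** 2)
    return a[1:], k[1:], err

# Autocorrelation of an ideal first-order process with coefficient 0.9 (assumed test data).
a, k, err = levinson_durbin([1.0, 0.9, 0.81], 2)
```

For this input the second-order solution collapses to a first-order predictor (a_1 close to 0.9, a_2 close to 0). MATLAB's lpc reports the coefficients of A(z) itself, so its entries after the leading 1 should have the opposite sign to the a_k used in this sketch.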

Page 7: Speech & Audio Processing

Example of MATLAB Code

function myLPCCodec(wavfile, N)
%
% wavfile - input MS wav file
% N - LPC Filter Order
%

$$ H(z) = \frac{1}{A(z)} $$

$$ \hat{s}[n] = \sum_{k=1}^{p} a_k\,\hat{s}[n-k] + g\,e[n] $$

[Block diagram. Blocks: Impulse Train, Noise Generator, Voiced/Unvoiced switch, Gain, Vocal Tract Model; output ŝ[n].]
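A minimal Python sketch of the all-pole synthesis model on this slide, with an impulse train standing in for the voiced excitation. The names synthesize and impulse_train, the sign convention (a_k added to the output), and the example values are assumptions for illustration only; this is not the body of myLPCCodec.

```python
def impulse_train(length, period):
    """Voiced excitation: a unit impulse every `period` samples."""
    return [1.0 if n % period == 0 else 0.0 for n in range(length)]

def synthesize(a, excitation, gain=1.0):
    """All-pole filter: s_hat[n] = sum_k a[k-1] * s_hat[n-k] + gain * e[n]."""
    p = len(a)
    out = []
    for n, e in enumerate(excitation):
        acc = gain * e
        for k in range(1, p + 1):
            if n - k >= 0:           # samples before the start are taken as zero
                acc += a[k - 1] * out[n - k]
        out.append(acc)
    return out

# A single impulse through a first-order model decays geometrically.
response = synthesize([0.5], [1.0, 0.0, 0.0, 0.0])
voiced = synthesize([0.5], impulse_train(8, 4), gain=2.0)
```

For unvoiced speech the same filter would be driven by a noise sequence instead of the impulse train.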

Page 8: Speech & Audio Processing

Analysis of Quantization Errors

Use MATLAB functions to research the effects of quantization errors introduced by the precision of the arithmetic operations and by the representation of the filter and error signal:
Double (float64) representation (software emulation)
Float (float32) representation (software emulation)
Int (int32) representation (hardware emulation)
Short (int16) representation (hardware emulation)

Useful MATLAB functions: fix, floor, round, ceil

Example:
sig_hat = fix(sig*2^(B-1))/2^(B-1);   % truncation of sig to B bits
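The MATLAB one-liner above can be mirrored in Python as a quick check of what truncation does to a sample. This sketch assumes the signal is scaled to the range [-1, 1) so that B bits include the sign; the function names are hypothetical. math.trunc rounds toward zero like MATLAB's fix, and math.floor rounds toward minus infinity like MATLAB's floor.

```python
import math

def truncate_to_bits(x, B):
    """Counterpart of fix(x*2^(B-1))/2^(B-1): round toward zero on a B-bit grid."""
    scale = 2 ** (B - 1)
    return math.trunc(x * scale) / scale

def floor_to_bits(x, B):
    """Counterpart of floor(x*2^(B-1))/2^(B-1): round toward minus infinity."""
    scale = 2 ** (B - 1)
    return math.floor(x * scale) / scale

# With B = 4 the grid step is 2^-(B-1) = 0.125.
pos = truncate_to_bits(0.3, 4)
neg_fix = truncate_to_bits(-0.3, 4)
neg_floor = floor_to_bits(-0.3, 4)
```

The two rounding modes agree for positive samples and differ by one step for negative ones, which is one of the effects the slide asks to be studied.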

Page 9: Speech & Audio Processing

Quantization of Error Signal & Filter Coefficients

ADPCM can be applied to the error signal.
Filter coefficients in the direct filter form are found to be sensitive to quantization errors:
A small quantization error can have a large effect on the filter characteristics.
The issue is that the polynomial coefficients have a non-linear mapping to the poles of the filter (i.e., the roots of the polynomial).
Alternate representations are possible that have significantly better tolerance to quantization error.
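A small numerical illustration of this sensitivity, using a second-order filter whose poles follow from the quadratic formula. The coefficient values are assumptions picked to make the effect visible (a double pole at 0.99), and poles_second_order is a hypothetical helper.

```python
import cmath

def poles_second_order(a1, a2):
    """Poles of H(z) = 1 / (1 - a1*z^-1 - a2*z^-2), i.e. roots of z^2 - a1*z - a2."""
    disc = cmath.sqrt(a1 * a1 + 4.0 * a2)
    return ((a1 + disc) / 2.0, (a1 - disc) / 2.0)

# Original filter: both poles at 0.99, since (z - 0.99)^2 = z^2 - 1.98 z + 0.9801.
original = poles_second_order(1.98, -0.9801)

# Perturb a2 by 1e-4, an error comparable to a coarse quantization step.
perturbed = poles_second_order(1.98, -0.98)

max_radius_original = max(abs(z) for z in original)
max_radius_perturbed = max(abs(z) for z in perturbed)
```

A coefficient error of 1e-4 moves one pole from radius 0.99 to the unit circle, a displacement about 100 times larger than the error itself, so the perturbed filter is no longer strictly stable.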

Page 10: Speech & Audio Processing

LPC Filter Representations

As noted previously when the Levinson-Durbin algorithm was introduced, one alternate representation to the filter coefficients was also mentioned: PARCOR coefficients.

LPC to PARCOR:

$$ a_j^{(p)} = a_j, \qquad 1 \le j \le p $$

for $i = p, p-1, \ldots, 1$:

$$ k_i = a_i^{(i)} $$

$$ a_j^{(i-1)} = \frac{a_j^{(i)} + k_i\,a_{i-j}^{(i)}}{1 - k_i^{2}}, \qquad 1 \le j \le i-1 $$
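The LPC to PARCOR conversion (the step-down recursion) can be sketched in Python as below. The sign convention a_j^(i-1) = (a_j^(i) + k_i a_(i-j)^(i)) / (1 - k_i^2) is an assumption matching the predictor form A(z) = 1 - sum_k a_k z^-k, and lpc_to_parcor is a hypothetical name.

```python
def lpc_to_parcor(a):
    """Step-down recursion: predictor coefficients a_1..a_p -> PARCOR k_1..k_p."""
    p = len(a)
    cur = [0.0] + list(a)            # 1-based: cur[j] holds a_j of the current order i
    k = [0.0] * (p + 1)
    for i in range(p, 0, -1):
        k[i] = cur[i]                # k_i = a_i^(i)
        denom = 1.0 - k[i] ** 2
        if denom <= 0.0:
            raise ValueError("|k_i| >= 1: the filter is not strictly stable")
        prev = [0.0] * i             # will hold a_1..a_(i-1) of order i-1
        for j in range(1, i):
            prev[j] = (cur[j] + k[i] * cur[i - j]) / denom
        cur = prev
    return k[1:]

# Assumed example: a second-order predictor with a_1 = 0.65, a_2 = -0.3.
k = lpc_to_parcor([0.65, -0.3])
```

For a stable filter every |k_i| is below 1, which gives a simple stability check after quantization that the direct-form coefficients do not offer.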

Page 11: Speech & Audio Processing

PARCOR Filter Representation

PARCOR to LPC:

for $i = 1, 2, \ldots, p$:

$$ a_i^{(i)} = k_i $$

$$ a_j^{(i)} = a_j^{(i-1)} - k_i\,a_{i-j}^{(i-1)}, \qquad 1 \le j \le i-1 $$

$$ a_j = a_j^{(p)}, \qquad 1 \le j \le p $$
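The PARCOR to LPC conversion (the step-up recursion) can be sketched the same way. The minus sign in a_j^(i) = a_j^(i-1) - k_i a_(i-j)^(i-1) is an assumption tied to the predictor form A(z) = 1 - sum_k a_k z^-k, and parcor_to_lpc is a hypothetical name.

```python
def parcor_to_lpc(k):
    """Step-up recursion: PARCOR k_1..k_p -> predictor coefficients a_1..a_p."""
    a = []                                   # order-0 predictor has no coefficients
    for i, ki in enumerate(k, start=1):
        prev = a                             # a_1..a_(i-1) of order i-1 (0-based list)
        # a_j^(i) = a_j^(i-1) - k_i * a_(i-j)^(i-1) for j = 1..i-1, then a_i^(i) = k_i
        a = [prev[j - 1] - ki * prev[i - j - 1] for j in range(1, i)] + [ki]
    return a

# Assumed example: two reflection coefficients.
a = parcor_to_lpc([0.5, -0.3])
```

Here a_1 = 0.5 - (-0.3)(0.5) = 0.65 and a_2 = k_2 = -0.3, so quantizing k_i and converting back always yields a stable filter as long as each quantized |k_i| stays below 1.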

Page 12: Speech & Audio Processing

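Line Spectral Frequency Representation

It turns out that PARCOR coefficients can be represented with LSF that have significantly better properties. Note that:

$$ H(z) = \frac{1}{A(z)} $$

The PARCOR lattice structure of the LPC synthesis filter above:

[Lattice diagram between Input and Output. Stages with coefficients k_p, k_{p-1}, ..., each with a z^{-1} delay and an adder; branch signals A_p, A_{p-1}, ..., A_0 and B_p, B_{p-1}, ..., B_0; termination -z^{-1} with k_0 = -1; added stage k_{p+1} = ∓1.]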

Page 13: Speech & Audio Processing

Line Spectral Frequency Representation

From the previous slide the following holds:

$$ A_p(z) = A_{p-1}(z) - k_p\,B_{p-1}(z) $$

$$ B_p(z) = z^{-1}\left[B_{p-1}(z) - k_p\,A_{p-1}(z)\right] $$

$$ A_0(z) = 1 \quad \& \quad B_0(z) = z^{-1} $$

$$ B_p(z) = z^{-(p+1)}\,A_p(z^{-1}) $$

From this realization of the filter the LSP representation is derived.

Page 14: Speech & Audio Processing

LSF Representation

$$ k_{p+1} = 1: \qquad P(z) = A_p(z) - B_p(z) $$

$$ k_{p+1} = -1: \qquad Q(z) = A_p(z) + B_p(z) $$

$$ A_p(z) = \tfrac{1}{2}\left[P(z) + Q(z)\right] $$
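A short Python sketch of how the two LSF polynomials are formed from A_p(z). It assumes B_p(z) = z^-(p+1) A_p(z^-1), the naming P = A_p - B_p and Q = A_p + B_p (texts differ on which polynomial is called P), and the predictor form A_p(z) = 1 - sum_k a_k z^-k; lsf_polynomials is a hypothetical helper. The line spectral frequencies themselves are the angles of the unit-circle roots of P and Q, which this sketch does not compute.

```python
def lsf_polynomials(a):
    """Return (A, P, Q) as coefficient lists in powers z^0, z^-1, ..., z^-(p+1)."""
    A = [1.0] + [-c for c in a] + [0.0]     # A_p(z), padded to degree p+1
    B = A[::-1]                             # B_p(z) = z^-(p+1) * A_p(1/z)
    P = [x - y for x, y in zip(A, B)]       # k_(p+1) = +1
    Q = [x + y for x, y in zip(A, B)]       # k_(p+1) = -1
    return A, P, Q

def evaluate(coeffs, z):
    """Evaluate sum_m coeffs[m] * z^-m."""
    return sum(c * z ** (-m) for m, c in enumerate(coeffs))

# Assumed example: a_1 = 0.65, a_2 = -0.3.
A, P, Q = lsf_polynomials([0.65, -0.3])
```

P comes out antisymmetric with a fixed root at z = 1 and Q symmetric with a fixed root at z = -1, and averaging them returns A_p(z), which is the identity used on this slide.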

Page 15: Speech & Audio Processing

LPC Synthesis Filter with LSF

$$ H(z) = \frac{1}{A_p(z)} = \frac{1}{1 + \left(A_p(z) - 1\right)} = \frac{1}{1 + \tfrac{1}{2}\left[\left(P(z) - 1\right) + \left(Q(z) - 1\right)\right]} $$

Page 16: Speech & Audio Processing

A Simple Speech Coder: LPC-Based Synthesis Structure

[Block diagram. Blocks: Decoding, Synthesis Filter, De-emphasis, Audio Output. Signals: Residual Signal, Filter Coeffs, Residual.]

Page 17: Speech & Audio Processing

Audio Coding

Page 18: Speech & Audio Processing

Audio Coding

Most of the audio coding standards use principles of psychoacoustics.
Example of the basic structure of an MP3 encoder:

[Block diagram. Blocks: Audio Input, Filterbank & Transform, Psychoacoustic Model, Quantization, Bit-stream.]

Page 19: Speech & Audio Processing

Basic Structure of Audio Coders

Filterbank Processing
Psychoacoustic Model
Quantization

Page 20: Speech & Audio Processing

Filter Bank Analysis Synthesis

Page 21: Speech & Audio Processing

Filterbank Processing

Splitting the full-band signal into several sub-bands:
Uniform sub-bands (FFT)
Critical bands (FFT followed by a non-linear transformation), which reflect the human auditory apparatus.
Mel-scale and Bark-scale transformations (f in Hz):

$$ \mathrm{Mel}(f) = 1127.01048\,\ln\!\left(1 + \frac{f}{700}\right) $$

$$ \mathrm{Bark}(f) = 13\arctan(0.00076\,f) + 3.5\arctan\!\left(\left(\frac{f}{7500}\right)^{2}\right) $$
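The two frequency-warping formulas can be evaluated directly. The Python sketch below uses hypothetical function names hz_to_mel and hz_to_bark, assumes f is given in Hz, and uses test frequencies chosen only for illustration.

```python
import math

def hz_to_mel(f_hz):
    """Mel(f) = 1127.01048 * ln(1 + f / 700)."""
    return 1127.01048 * math.log(1.0 + f_hz / 700.0)

def hz_to_bark(f_hz):
    """Bark(f) = 13 * arctan(0.00076 f) + 3.5 * arctan((f / 7500)^2)."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)

# Both scales are close to linear at low frequencies and compress high frequencies.
mel_1k = hz_to_mel(1000.0)
bark_1k = hz_to_bark(1000.0)
bark_5k = hz_to_bark(5000.0)
```

The Mel constant is chosen so that 1000 Hz maps to about 1000 mel, while on the Bark scale 1000 Hz lands near 8.5.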

Page 22: Speech & Audio Processing


Mel-Scale

Page 23: Speech & Audio Processing


Bark-Scale

Page 24: Speech & Audio Processing

Analysis Structure of Filterbank

[Block diagram. Audio Input feeds parallel channels with filters h_1[n], ..., h_k[n], ..., h_N[n], each followed by down-sampling (↓) and MDCT stages; the channel outputs go to Quantization, which produces the Bit Stream.]

h_k[n] – impulse response of the k-th quadrature mirror filter
N – number of channels, typically 32
↓ – down-sampling
MDCT – Modified Discrete Cosine Transform

Page 25: Speech & Audio Processing

Synthesis Structure of Filterbank

[Block diagram. The Bit Stream enters Decoding, which feeds parallel channels with IMDCT stages, up-sampling (↑) and filters g_1[n], ..., g_k[n], ..., g_N[n]; the channel outputs form the Audio Output.]

g_k[n] – impulse response of the k-th inverse quadrature mirror filter
N – number of channels, typically 32
↑ – up-sampling
IMDCT – Inverse Modified Discrete Cosine Transform
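The MDCT and IMDCT stages of the two filterbank structures can be sketched in isolation. This Python example leaves out the quadrature mirror filters, the down-sampling and the quantizer, and only shows that a windowed MDCT followed by a windowed IMDCT with 50% overlap-add returns the input. The sine window, the 2/M scaling in the inverse, the block size M = 8 and all function names are assumptions of this sketch.

```python
import math

def sine_window(length):
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]

def mdct(block):
    """2M input samples -> M coefficients."""
    M = len(block) // 2
    n0 = 0.5 + M / 2.0
    return [sum(block[n] * math.cos(math.pi / M * (n + n0) * (k + 0.5))
                for n in range(2 * M)) for k in range(M)]

def imdct(coeffs):
    """M coefficients -> 2M time-aliased samples (scaled for windowed overlap-add)."""
    M = len(coeffs)
    n0 = 0.5 + M / 2.0
    return [(2.0 / M) * sum(coeffs[k] * math.cos(math.pi / M * (n + n0) * (k + 0.5))
                            for k in range(M)) for n in range(2 * M)]

def analysis_synthesis(signal, M):
    """Window, MDCT, IMDCT, window again, and overlap-add with hop size M."""
    w = sine_window(2 * M)
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - 2 * M + 1, M):
        block = [signal[start + n] * w[n] for n in range(2 * M)]
        recon = imdct(mdct(block))
        for n in range(2 * M):
            out[start + n] += recon[n] * w[n]
    return out

M = 8                                                   # assumed block size (even)
signal = [math.sin(0.3 * n) + 0.5 * math.cos(1.1 * n) for n in range(4 * M)]
restored = analysis_synthesis(signal, M)
# Only samples covered by two overlapping blocks are fully reconstructed.
max_error = max(abs(restored[n] - signal[n]) for n in range(M, 3 * M))
```

Each block produces only M coefficients from 2M samples, and the time-domain aliasing this introduces cancels between neighbouring blocks. In a coder the coefficients would be quantized between mdct and imdct, so the reconstruction would no longer be exact.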

Page 26: Speech & Audio Processing

Psycho-Acoustic Modeling

Page 27: Speech & Audio Processing

Psychoacoustic Model

The masking threshold is determined according to human auditory perception.
The masking threshold is used to quantize the Discrete Cosine Transform coefficients.
Analysis is done in the frequency domain, represented by the DFT and computed by the FFT.

Page 28: Speech & Audio Processing

Threshold of Hearing

Absolute threshold of audibly perceptible events in quiet conditions (no other sounds).
Any signal below the threshold can be removed without effect on the perception.

Page 29: Speech & Audio Processing


Threshold of Hearing

Page 30: Speech & Audio Processing

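Frequency Masking

Schröder spreading function:

$$ 10\log_{10} F(\Delta z) = 15.81 + 7.5\,(\Delta z + 0.474) - 17.5\left[1 + (\Delta z + 0.474)^{2}\right]^{1/2} $$

$$ \Delta z = z(f_{\text{maskee}}) - z(f_{\text{masker}}) $$

Bark scale function:

$$ z(f) = 13\arctan(0.00076\,f) + 3.5\arctan\!\left(\left(\frac{f}{7500}\right)^{2}\right) $$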
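As a rough numerical illustration of the spreading function, the Python sketch below evaluates it for a 1 kHz masker and for maskees at 900 Hz and 5 kHz, the tone pairs used on the following slides. It assumes Δz is the maskee Bark value minus the masker Bark value, and the function names are hypothetical. It returns only the relative spreading in dB, not a complete masking threshold.

```python
import math

def hz_to_bark(f_hz):
    """z(f) = 13 * arctan(0.00076 f) + 3.5 * arctan((f / 7500)^2)."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)

def spreading_db(dz):
    """Schroeder spreading function 10*log10 F(dz), in dB."""
    x = dz + 0.474
    return 15.81 + 7.5 * x - 17.5 * math.sqrt(1.0 + x * x)

def spread_between(f_masker_hz, f_maskee_hz):
    """Spreading in dB from a masker to a maskee, both given in Hz."""
    return spreading_db(hz_to_bark(f_maskee_hz) - hz_to_bark(f_masker_hz))

at_masker = spreading_db(0.0)
near = spread_between(1000.0, 900.0)     # maskee less than one Bark below the masker
far = spread_between(1000.0, 5000.0)     # maskee about ten Bark above the masker
```

The function is close to 0 dB at the masker frequency and falls off on both sides, so the 900 Hz tone sits well inside the region influenced by the 1 kHz masker while the 5 kHz tone is far outside it.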

Page 31: Speech & Audio Processing


Masking Curve

Page 32: Speech & Audio Processing


Primary Tone 1kHz

Page 33: Speech & Audio Processing


Masked Tone 900 Hz

Page 34: Speech & Audio Processing


Combined Sound 1kHz + 0.9kHz

Page 35: Speech & Audio Processing


Combined 1kHz + 0.9kHz (-10dB)

Page 36: Speech & Audio Processing


Combined 1kHz + 5kHz (-10dB)

Page 37: Speech & Audio Processing


END