Lecture 16 (104 Slides)


TRANSCRIPT

  • Slide 1/104

    Digital Speech Processing

    Speech Coding Methods Based on Speech Models

  • Slide 2/104

    Waveform Coding versus Block Processing

    Waveform coding: sample-by-sample matching of waveforms; coding quality measured using SNR

    Source modeling (block processing): block processing of the signal => a vector of outputs for every block; blocks may be overlapped

    (figure: overlapped analysis blocks: Block 1, Block 2, Block 3)

  • Slide 3/104

    so far we have carried out waveform coding based on optimizing and maximizing SNR

    achieved bit rate reductions on the order of 4:1 (i.e., from 128 Kbps PCM to 32 Kbps ADPCM) while achieving toll quality SNR for telephone-bandwidth speech

    to lower the rate further without reducing speech quality, we need to exploit features of the speech production model, including:
      source modeling
      spectrum modeling
      use of codebook methods for coding efficiency

    we also need a new way of comparing performance of different waveform and model-based coding methods: SNR is not meaningful for model-based coders since they operate on blocks of speech and don't follow the waveform on a sample-by-sample basis

    new subjective measures need to be used that measure user-perceived quality

  • Slide 4/104

    Enhancements for ADPCM Coders
      pitch prediction
      noise shaping

    Analysis-by-Synthesis Speech Coders
      multipulse linear prediction coder (MPLPC)
      code-excited linear prediction (CELP)

    Open-Loop Speech Coders
      two-state excitation model
      LPC vocoder
      residual-excited linear predictive coder
      mixed excitation systems

    speech coding quality measures - MOS

    speech coding standards

  • Slide 5/104

    Differential Quantization

    (figure: DPCM encoder/decoder block diagram with input $x[n]$, difference signal $d[n]$, quantized difference $\hat{d}[n]$, codewords $c[n]$, prediction $\tilde{x}[n]$, and reconstructed signal $\hat{x}[n]$)

    $P(z)$: simple predictor of the vocal tract response,

    $P(z) = \sum_{k=1}^{p} \alpha_k z^{-k}$

  • Slide 6/104

    Issues with Differential Quantization

    the difference signal retains the character of the excitation signal: it switches back and forth between quasi-periodic and noise-like signals

    the prediction duration (even when using p = 20) is on the order of 2.5 msec (for an 8 kHz sampling rate); the predictor is predicting the vocal tract response, not the excitation period (for voiced sounds)

    Solution: incorporate two stages of prediction, namely a short-time predictor for the vocal tract response and a long-time predictor for the pitch period

  • Slide 7/104

    Pitch Prediction

    (figure: transmitter and receiver block diagrams with the cascaded predictors $P_1(z)$ and $P_2(z)$ operating on the input $x[n]$ to produce the residual, and the combined predictor $P_c(z)$)

    first stage pitch predictor:

    $P_1(z) = \beta\, z^{-M}$

    second stage linear predictor (vocal tract predictor):

    $P_2(z) = \sum_{k=1}^{p} \alpha_k z^{-k}$

  • Slide 8/104

    first stage pitch predictor:

    $P_1(z) = \beta\, z^{-M}$

    this predictor model assumes that the pitch period, $M$, is an integer number of samples and that $\beta$ is a gain constant allowing for variations in pitch period over time (for unvoiced or background frames, the values of $\beta$ and $M$ are irrelevant)

    a more general pitch predictor is of the form:

    $P_1(z) = \sum_{k=-1}^{1} \beta_k\, z^{-(M+k)} = \beta_{-1} z^{-(M-1)} + \beta_0 z^{-M} + \beta_1 z^{-(M+1)}$

    this more advanced form provides a way to handle a non-integer pitch period through interpolation around the nearest integer pitch period value, $M$

  • Slide 9/104

    The combined inverse system in the decoder is the cascade:

    $H_c(z) = \frac{1}{1 - P_c(z)} = \frac{1}{1 - P_1(z)} \cdot \frac{1}{1 - P_2(z)}$

    so that

    $1 - P_c(z) = [1 - P_1(z)][1 - P_2(z)]$

    $P_c(z) = P_1(z) + P_2(z)[1 - P_1(z)]$

    which is implemented as a parallel combination of the two predictors $P_1(z)$ and $P_2(z)[1 - P_1(z)]$.

    The prediction signal, $\tilde{x}[n]$, can then be expressed as

    $\tilde{x}[n] = \beta\, x[n-M] + \sum_{k=1}^{p} \alpha_k \big( x[n-k] - \beta\, x[n-k-M] \big)$
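    As an illustration of the parallel-form predictor above, here is a minimal NumPy sketch (not from the lecture) that computes the prediction signal, assuming the pitch lag M, pitch gain beta, and vocal-tract coefficients alpha_k are already known; names are illustrative.

```python
import numpy as np

def combined_prediction(x, alphas, beta, M):
    """Prediction signal of the cascaded pitch + vocal-tract predictor:
    x_tilde[n] = beta*x[n-M] + sum_k alphas[k-1]*(x[n-k] - beta*x[n-M-k])."""
    x = np.asarray(x, dtype=float)
    # long-term (pitch) prediction term: beta * x[n - M]
    x_pitch = np.zeros_like(x)
    x_pitch[M:] = beta * x[:-M]
    # short-term predictor applied to the pitch-prediction residual r[n]
    r = x - x_pitch
    x_tilde = x_pitch.copy()
    for k, a in enumerate(alphas, start=1):
        x_tilde[k:] += a * r[:-k]
    return x_tilde
```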

  • Slide 10/104

  • Slide 11/104

  • Slide 12/104

    first search for the value of $M$ that maximizes the correlation measure, giving $M_{opt}$

    solve for a more accurate pitch predictor by minimizing the variance of the expanded pitch predictor

    solve for the optimum vocal tract predictor coefficients, $\alpha_k,\ k = 1, 2, \ldots, p$
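    A minimal sketch of the first step (searching for the pitch lag M), assuming a single-tap pitch predictor $\beta z^{-M}$ and a search range typical of 8 kHz speech; the function name and range are illustrative, not from the lecture.

```python
import numpy as np

def pitch_lag_search(d, M_min=20, M_max=147):
    """Find the lag M maximizing (sum d[n]d[n-M])^2 / sum d[n-M]^2, i.e. the
    lag giving the largest reduction in prediction-error variance; also return
    the corresponding single-tap gain beta."""
    d = np.asarray(d, dtype=float)
    best_M, best_score, best_beta = M_min, -np.inf, 0.0
    for M in range(M_min, M_max + 1):
        num = np.dot(d[M:], d[:-M])        # sum d[n] d[n-M]
        den = np.dot(d[:-M], d[:-M])       # sum d[n-M]^2
        if den <= 0.0:
            continue
        score = num * num / den            # variance reduction for this lag
        if score > best_score:
            best_M, best_score, best_beta = M, score, num / den
    return best_M, best_beta
```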

  • Slide 13/104

    Pitch Prediction

    (figure: comparison of vocal tract prediction alone versus combined pitch and vocal tract prediction)

  • Slide 14/104

    Noise Shaping in DPCM Systems

  • Slide 15/104

  • Slide 16/104

    Noise Shaping

    (figures: (a) basic ADPCM encoder and decoder; (b) equivalent ADPCM encoder and decoder; (c) noise shaping ADPCM encoder and decoder)

  • Slide 17/104

    Noise Shaping

    The equivalence of parts (b) and (a) is shown by the following. From part (a) we see that:

    $\hat{x}[n] = x[n] + e[n] \;\Leftrightarrow\; \hat{X}(z) = X(z) + E(z)$

    with

    $D(z) = X(z) - P(z)\hat{X}(z) = [1 - P(z)]X(z) - P(z)E(z)$

    $\hat{D}(z) = D(z) + E(z) = [1 - P(z)]X(z) + [1 - P(z)]E(z)$

    showing the equivalence of parts (b) and (a). Further, since

    $\hat{X}(z) = H(z)\hat{D}(z) = \frac{1}{1 - P(z)}\hat{D}(z) = \frac{1}{1 - P(z)}\big([1 - P(z)]X(z) + [1 - P(z)]E(z)\big) = X(z) + E(z)$

    feeding back the quantization error through the predictor ensures that the reconstructed signal, $\hat{x}[n]$, differs from $x[n]$ only by the quantization error, $e[n]$, incurred in quantizing the difference signal, $d[n]$.

  • Slide 18/104

    Shaping the Quantization Noise

    To shape the quantization noise we simply replace the feedback filter $P(z)$ by a different system function, $F(z)$, giving the reconstructed signal as:

    $X'(z) = H(z)\hat{D}'(z) = \frac{1}{1 - P(z)}\hat{D}'(z) = X(z) + \frac{1 - F(z)}{1 - P(z)}E'(z)$

    Thus if $x[n]$ is coded by this encoder, the $z$-transform of the reconstructed signal at the receiver is:

    $X'(z) = X(z) + \tilde{E}'(z), \qquad \tilde{E}'(z) = \frac{1 - F(z)}{1 - P(z)}E'(z)$

    where $\frac{1 - F(z)}{1 - P(z)}$ is the effective noise shaping filter.

  • Slide 19/104

    Noise shaping filter options:

    1. $F(z) = 0$: if we assume the quantization noise has a flat spectrum, then the noise and speech spectrum have the same shape

    2. $F(z) = P(z)$: then the equivalent system is the standard DPCM system, where $\tilde{E}'(z) = E'(z)$ with a flat noise spectrum

    3. $F(z) = P(z/\gamma) = \sum_{k=1}^{p} \gamma^{k}\alpha_k z^{-k}$ with $\gamma < 1$: this choice "hides" the noise beneath the spectral peaks of the speech signal; the zeros of $1 - F(z)$ have the same angles in the $z$-plane as those of $1 - P(z)$, but with a radius reduced by the factor $\gamma$.

  • Slide 20/104

    Noise Shaping Filter

  • Slide 21/104

    Noise Shaping Filter

    If $\sigma_{e'}^2$ is the power of the quantization noise $e'[n]$, then the power spectrum of the shaped noise is of the form:

    $P_{\tilde{e}'}\big(e^{j2\pi F/F_S}\big) = \sigma_{e'}^2\,\frac{\big|1 - F\big(e^{j2\pi F/F_S}\big)\big|^2}{\big|1 - P\big(e^{j2\pi F/F_S}\big)\big|^2}$

    (figure: speech spectrum with the shaped noise spectrum overlaid)
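    To make the formula concrete, here is a small sketch (assuming SciPy is available, and assuming the bandwidth-expanded choice $F(z) = P(z/\gamma)$ from the options above) that evaluates the shaped-noise power spectrum on a frequency grid.

```python
import numpy as np
from scipy.signal import freqz

def shaped_noise_spectrum(alphas, sigma2_e, gamma=0.85, fs=8000, nfft=512):
    """Shaped-noise power spectrum sigma_e'^2 * |1-F|^2 / |1-P|^2, assuming
    F(z) = P(z/gamma), i.e. the k-th predictor coefficient scaled by gamma**k."""
    a = np.asarray(alphas, dtype=float)
    k = np.arange(1, len(a) + 1)
    one_minus_P = np.concatenate(([1.0], -a))                # 1 - P(z)
    one_minus_F = np.concatenate(([1.0], -a * gamma**k))     # 1 - P(z/gamma)
    f, HP = freqz(one_minus_P, worN=nfft, fs=fs)
    _, HF = freqz(one_minus_F, worN=nfft, fs=fs)
    return f, sigma2_e * np.abs(HF)**2 / np.abs(HP)**2
```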

  • Slide 22/104

    Fully Quantized Adaptive Predictive Coder

  • Slide 23/104

    Input is x[n]

    $P_2(z)$ is the short-term (vocal tract) predictor

    Signal v[n] is the short-term prediction error

    Goal of the encoder is to obtain a quantized representation of this excitation

  • Slide 24/104

    Total bit rate for the ADPCM coder:

    $I_{ADPCM} = B \cdot F_S + B_{\Delta} \cdot F_{\Delta} + B_P \cdot F_P$

    where $B$ is the number of bits for the quantization of the difference signal, $B_{\Delta}$ is the number of bits for encoding the step size at frame rate $F_{\Delta}$, and $B_P$ is the total number of bits allocated to the predictor coefficients (both long- and short-term) with frame rate $F_P$.

    Typically $F_S = 8000$ Hz, so even with $B = 1$ to $4$ bits we need between 8000 and 32,000 bps for quantization of the difference signal.

    Typically we need about 3000-4000 bps for the side information (step size and predictor coefficients).

    Overall we need between 11,000 and 36,000 bps for a fully quantized system.
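    A one-line check of the bit-rate bookkeeping above; the side-information defaults are illustrative values in the 3000-4000 bps range mentioned on the slide, not a specific standard.

```python
def adpcm_bit_rate(B, F_S=8000, B_delta=6, F_delta=100, B_P=30, F_P=100):
    """Total ADPCM bit rate I = B*F_S + B_delta*F_delta + B_P*F_P (bps)."""
    return B * F_S + B_delta * F_delta + B_P * F_P

# adpcm_bit_rate(B=1) -> 11600 bps, adpcm_bit_rate(B=4) -> 35600 bps,
# consistent with the 11,000-36,000 bps range quoted above
```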

  • Slide 25/104

    Bit Rate for LP Coding

    speech and residual sampling rate: F_s = 8 kHz
    LP analysis frame rate: F_Δ = F_P = 50-100 frames/sec
    quantizer step size: 6 bits/frame
    predictor parameters:
      M (pitch period): 7 bits/frame
      vocal tract predictor coefficients: PARCORs 16-20, 46-50 bits/frame
    prediction residual: 1-3 bits/sample
    total bit rate: BR = 72*F_P + F_s (minimum)

  • Slide 26/104

    Two-Level (B = 1 bit) Quantizer

    (figure: prediction residual, quantizer input, quantizer output, reconstructed pitch residual, original pitch residual, and reconstructed speech waveforms)

  • Slide 27/104

    Three-Level Center-Clipped Quantizer

    (figure: prediction residual, quantizer input, quantizer output, reconstructed pitch residual, original pitch residual, and reconstructed speech waveforms)

  • Slide 28/104

    Summary of Using LP in Speech Coding

    the predictor can be more sophisticated than a vocal tract response predictor: it can also utilize periodicity (for voiced speech frames)

    the quantization noise spectrum can be shaped by noise feedback so that the noise is concentrated beneath the formant peaks in the speech, thereby utilizing the perceptual masking power of the human auditory system

    we now move on to more advanced LP coding of speech using Analysis-by-Synthesis methods

  • Slide 29/104

    Analysis-by-Synthesis Speech Coders

  • Slide 30/104

    the closed-loop adaptive predictive coder was generalized (in terms of the input/excitation to the vocal tract model) to reach lower bit rates while maintaining very high quality at those rates

  • Slide 31/104

    A-b-S Speech Coding

    Replace the quantizer for generating the excitation signal with an optimization process (denoted as Error Minimization above), whereby the excitation signal, $\hat{d}[n]$, is constructed based on minimization of the mean-squared value of the synthesis error, $d[n] = x[n] - \tilde{x}[n]$; this utilizes a Perceptual Weighting filter.

  • Slide 32/104

    A-b-S Speech Coding

    Basic operation of each loop of the closed-loop A-b-S system:

    1. at the beginning of each loop (and only once per loop), the speech signal, $x[n]$, is used to generate an optimum $p$-th order LPC filter of the form:

    $H(z) = \frac{1}{1 - P(z)} = \frac{1}{1 - \sum_{k=1}^{p}\alpha_k z^{-k}}$

    2. the difference signal, $d[n] = x[n] - \tilde{x}[n]$, based on an initial estimate of the speech signal, $\tilde{x}[n]$, is perceptually weighted by a speech-adaptive filter of the form:

    $W(z) = \frac{1 - P(z)}{1 - P(z/\gamma)}$   (see next vugraph)

    3. the error minimization box and the excitation generator create a sequence of error signals that iteratively (once per loop) improve the match to the weighted error signal

    4. the resulting excitation signal, $\hat{d}[n]$, which is an improved estimate of the actual prediction error signal for each loop iteration, is used to excite the LPC filter, and the loop processing is iterated until the resulting error signal meets some criterion for stopping the closed-loop iterations.

  • Slide 33/104

  • Slide 34/104

  • Slide 35/104

    Implementation of A-b-S Speech Coding

    Problem: find an excitation for the vocal tract filter that produces high quality synthetic output, while maintaining a structured representation that makes it easy to code the excitation at low data rates

    Solution: use a set of basis functions which allow you to iteratively build up an optimal excitation, adding one scaled basis function at each iteration of the A-b-S loop

  • Slide 36/104

    Implementation of A-b-S Speech Coding

    Assume we are given a set of $Q$ basis functions of the form:

    $\{f_1[n], f_2[n], \ldots, f_Q[n]\}, \qquad 0 \le n \le L-1$

    where each basis function is 0 outside the defining interval.

    At each iteration of the A-b-S loop, we select the basis function from this set that maximally reduces the perceptually weighted mean-squared error:

    $E = \sum_{n=0}^{L-1}\Big( \big(x[n] - \hat{d}[n]*h[n]\big) * w[n] \Big)^2$

    where $h[n]$ and $w[n]$ are the vocal tract and perceptual weighting filters.

    We denote the optimal basis function at the $k$-th iteration as $f_{\gamma_k}[n]$, giving the excitation component $\hat{d}_k[n] = \beta_k f_{\gamma_k}[n]$, where $\beta_k$ is the optimal weighting coefficient for basis function $f_{\gamma_k}[n]$.

    The A-b-S iteration continues until the perceptually weighted error falls below some desired threshold, or until a maximum number of iterations, $N$, is reached, giving the final excitation signal as:

    $\hat{d}[n] = \sum_{k=1}^{N} \beta_k f_{\gamma_k}[n]$
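    A minimal NumPy sketch of this greedy build-up (not from the lecture): `basis` is a Q x L array of candidate basis functions, `h` and `w` are the vocal-tract and weighting impulse responses, and each pass picks the filtered basis function that most reduces the weighted error; names are illustrative.

```python
import numpy as np

def abs_excitation(x, basis, h, w, max_iter=8):
    """Greedy analysis-by-synthesis sketch: at each iteration pick the basis
    function whose filtered version y_q = f_q * h * w best reduces the
    remaining weighted error, with amplitude beta = <e, y_q> / ||y_q||^2."""
    L = len(x)
    hw = np.convolve(h, w)                                   # cascaded filters
    Y = np.array([np.convolve(f, hw)[:L] for f in basis])    # filtered basis set
    e = np.convolve(x, w)[:L]                                # weighted target
    d_hat = np.zeros(L)
    for _ in range(max_iter):
        num = Y @ e                                          # <e, y_q>
        den = np.sum(Y * Y, axis=1)                          # ||y_q||^2
        q = int(np.argmax(np.where(den > 0, num**2 / den, 0)))
        beta = num[q] / den[q]
        d_hat += beta * basis[q]                             # accumulate excitation
        e = e - beta * Y[q]                                  # update weighted error
    return d_hat, e
```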

  • Slide 37/104

    Closed Loop Coder / Reformulated Closed Loop Coder

    (figure: original and reformulated closed-loop coder block diagrams)

  • Slide 38/104

  • Slide 39/104

    We now begin iteration $k$ of the A-b-S loop. We optimally select one of the basis set (call this $f_{\gamma_k}[n]$) and determine the amplitude $\beta_k$, giving:

    $\hat{d}_k[n] = \beta_k f_{\gamma_k}[n], \qquad k = 1, 2, \ldots, N$

    We then form the new perceptually weighted error as:

    $e'_k[n] = e'_{k-1}[n] - \beta_k f_{\gamma_k}[n]*h'[n] = e'_{k-1}[n] - \beta_k y'_{\gamma_k}[n]$

    with the resulting mean-squared error

    $E_k = \sum_{n=0}^{L-1}\big(e'_k[n]\big)^2 = \sum_{n=0}^{L-1}\big(e'_{k-1}[n] - \beta_k y'_{\gamma_k}[n]\big)^2$
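    For completeness (this step is implied but not written out on the slide): setting $\partial E_k/\partial\beta_k = 0$ gives the optimal amplitude for the selected basis function,

    $$\beta_k = \frac{\displaystyle\sum_{n=0}^{L-1} e'_{k-1}[n]\, y'_{\gamma_k}[n]}{\displaystyle\sum_{n=0}^{L-1} \big(y'_{\gamma_k}[n]\big)^2}$$

    and substituting back shows that the best basis function at iteration $k$ is the one maximizing $\big(\sum_n e'_{k-1}[n]\, y'_{\gamma_k}[n]\big)^2 \big/ \sum_n \big(y'_{\gamma_k}[n]\big)^2$.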

  • Slide 40/104

  • Slide 41/104

    Our final results are the relations:

    $\tilde{x}'[n] = \sum_{k=1}^{N}\beta_k\, f_{\gamma_k}[n]*h'[n] = \sum_{k=1}^{N}\beta_k\, y'_{\gamma_k}[n]$

    $E_N = \sum_{n=0}^{L-1}\big(x'[n]\big)^2 - \sum_{k=1}^{N}\beta_k \sum_{n=0}^{L-1} x'[n]\, y'_{\gamma_k}[n]$

    where the re-optimized $\beta_k$'s satisfy the relation:

    $\sum_{n=0}^{L-1} x'[n]\, y'_{\gamma_j}[n] = \sum_{k=1}^{N}\beta_k \sum_{n=0}^{L-1} y'_{\gamma_k}[n]\, y'_{\gamma_j}[n], \qquad j = 1, 2, \ldots, N$

    At the receiver, the set of $\{\beta_k, \gamma_k\},\ k = 1, \ldots, N$, is used along with the basis functions $f_{\gamma_k}[n]$ to create the excitation $\hat{d}[n] = \sum_{k=1}^{N}\beta_k f_{\gamma_k}[n]$, which is passed through the synthesis filter $h[n]$.

  • Slide 42/104

    Analysis-by-Synthesis Coding

    Multipulse linear predictive coding (MPLPC): basis functions are impulses, $f_q[n] = \delta[n-q],\ q = 0, 1, \ldots, L-1$
      B. S. Atal and J. R. Remde, "A new model of LPC excitation for producing natural-sounding speech at low bit rates," Proc. IEEE Conf. Acoustics, Speech and Signal Proc., 1982.

    Code-excited linear prediction (CELP): basis functions are vectors of white Gaussian noise, $f_q[n],\ q = 1, 2, \ldots, Q$
      M. R. Schroeder and B. S. Atal, "Code-excited linear prediction (CELP)," Proc. IEEE Conf. Acoustics, Speech and Signal Proc., 1985.

    Self-excited linear predictive vocoder (SEV): basis functions are shifted versions of the previous excitation source
      R. C. Rose and T. P. Barnwell, "The self-excited vocoder," Proc. IEEE Conf. Acoustics, Speech and Signal Proc., 1986.

  • Slide 43/104

  • Slide 44/104

    Multipulse LP Coder

    Multipulse uses impulses as the basis functions; thus the basic error minimization reduces to:

    $E = \sum_{n=0}^{L-1}\Big( x[n] - \sum_{k}\beta_k\, h[n - n_k] \Big)^2$

  • Slide 45/104

    Multipulse Analysis

    1. find the best $\beta_1$ and $n_1$ for the single-pulse solution
    2. subtract out the effect of this impulse from the speech waveform and repeat the process
    3. continue until the desired minimum error is obtained, i.e., until the result is perceptually close to the original
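    A minimal sketch of the greedy pulse search described above, assuming the impulse response h[n] of the (optionally perceptually weighted) synthesis filter is available; the function name and pulse count are illustrative.

```python
import numpy as np

def multipulse_search(x, h, n_pulses=8):
    """Greedy multipulse sketch: at each stage pick the pulse location n_k that
    maximizes the normalized correlation between the remaining error and the
    shifted impulse response h[n - n_k], then remove its contribution."""
    L = len(x)
    h_pad = np.concatenate((np.asarray(h, float), np.zeros(L)))
    H = np.array([np.concatenate((np.zeros(n), h_pad))[:L] for n in range(L)])
    e = np.asarray(x, dtype=float).copy()
    energy = np.sum(H * H, axis=1)                 # ||h[n - n_k]||^2
    locations, amplitudes = [], []
    for _ in range(n_pulses):
        corr = H @ e
        n_k = int(np.argmax(np.where(energy > 0, corr**2 / energy, 0)))
        beta = corr[n_k] / energy[n_k]
        e -= beta * H[n_k]                         # subtract this pulse's effect
        locations.append(n_k)
        amplitudes.append(beta)
    return locations, amplitudes
```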

  • Slide 46/104

    Multipulse Analysis

    (figure: multipulse analysis block diagram, with the output from the previous frame feeding the current frame's analysis)

    B. S. Atal and J. R. Remde, "A new model of LPC excitation for producing natural-sounding speech at low bit rates," Proc. IEEE Conf. Acoustics, Speech and Signal Proc., 1982.

  • Slide 47/104

    Examples of Multipulse LPC

    (figure: multipulse LPC examples)

    B. S. Atal and J. R. Remde, "A new model of LPC excitation for producing natural-sounding speech at low bit rates," Proc. IEEE Conf. Acoustics, Speech and Signal Proc., 1982.

  • Slide 48/104

    800 impulses/sec x 9 bits/impulse => 7200 bps

    need 2400 bps for A(z) => total bit rate of 9600 bps

    code pulse locations differentially ($\Delta_i = N_i - N_{i-1}$) to reduce the range of the variable

    amplitudes normalized to reduce dynamic range

  • Slide 49/104

    basic idea is that primary pitch pulses are correlated and predictable over adjacent pitch periods, i.e., $s[n] \approx s[n-M]$

    break the correlation of speech into a short-term component (used to provide spectral estimates) and a long-term component (used to provide pitch pulse estimates)

    first remove short-term correlation by short-term prediction, followed by removing long-term correlation by long-term prediction

  • Slide 50/104

    prediction error filter:

    $A(z) = 1 - P(z) = 1 - \sum_{k=1}^{p}\alpha_k z^{-k}$

    the short-term residual, $u[n]$, includes primary pitch pulses that can be removed by a long-term predictor of the form

    $B(z) = 1 - b\,z^{-M}$

    giving

    $v[n] = u[n] - b\,u[n-M]$

    with fewer large pulses to code than in $u[n]$

  • Slide 51/104

    Analysis-by-Synthesis

    impulses are selected to represent the output of the long-term predictor, rather than the output of the short-term predictor

    most impulses still come in the vicinity of the primary pitch pulse

    => the result is high quality speech coding at 8-9.6 Kbps

  • Slide 52/104

    Code Excited Linear Prediction (CELP)

  • Slide 53/104

    Code Excited LP

    basic idea is to represent the residual after long-term (pitch period) and short-term (vocal tract) prediction on each frame by vectors from a codebook rather than by multiple pulses

    the residual generator in the previous design is replaced by a codebook of excitation vectors, one vector per frame at an 8 kHz sampling rate

    can use either a deterministic or a stochastic codebook

    deterministic codebooks are derived from a training set of vectors => problems with channel mismatch conditions

    the histogram of the residual from the long-term predictor is roughly a Gaussian pdf => construct the codebook from white Gaussian random numbers with unit variance

    CELP is used in the STU-3 at 4800 bps and in cellular coders at around 8000 bps

  • Slide 54/104

    Code Excited LP

    the amplitude distribution of the residual from the long-term pitch predictor output is roughly identical to a Gaussian distribution with the same mean and variance.

  • Slide 55/104

    CELP Encoder

    (figure: CELP encoder block diagram)

  • Slide 56/104

    For each of the excitation VQ codebook vectors, the following operations occur:

    the codebook vector is scaled by the LPC gain

    the error signal, e[n], is used to excite the long-term pitch predictor, yielding the estimate of the speech signal, $\tilde{x}[n]$, for the current codebook vector

    the signal, d[n], is generated as the difference between the speech signal, x[n], and the estimated speech signal, $\tilde{x}[n]$

    the difference signal is perceptually weighted, and the codebook vector yielding the minimum weighted error is selected
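    A highly simplified sketch of such a codebook search loop (not the standard's procedure): it keeps only a single stochastic-codebook stage, ignores the adaptive/pitch codebook and filter memory between frames, and uses a one-parameter weighting W(z) = A(z)/A(z/gamma); all names are illustrative.

```python
import numpy as np
from scipy.signal import lfilter

def celp_codebook_search(x, codebook, a, gamma=0.8):
    """For each codeword: synthesize with 1/A(z), weight the error with
    W(z) = A(z)/A(z/gamma), and keep the (index, gain) with minimum error."""
    A = np.concatenate(([1.0], -np.asarray(a, dtype=float)))   # A(z) = 1 - P(z)
    A_bw = A * gamma ** np.arange(len(A))                      # A(z/gamma)
    target = lfilter(A, A_bw, x)                               # weighted target
    best = (None, 0.0, np.inf)                                 # (index, gain, error)
    for i, c in enumerate(codebook):
        y = lfilter([1.0], A, c)                               # synthesis 1/A(z)
        yw = lfilter(A, A_bw, y)                               # weighted synthetic
        den = np.dot(yw, yw)
        if den <= 0:
            continue
        g = np.dot(target, yw) / den                           # optimal gain
        err = np.sum((target - g * yw) ** 2)
        if err < best[2]:
            best = (i, g, err)
    return best
```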

  • Slide 57/104

    Stochastic Code (CELP) Excitation Analysis

  • Slide 58/104

  • Slide 59/104

  • Slide 60/104

    Goal is to suppress the noise below the masking threshold at all frequencies, using a weighting filter of the form:

    $W(z) = \frac{1 - \sum_{k=1}^{p}\alpha_k\,\gamma_1^{k}\,z^{-k}}{1 - \sum_{k=1}^{p}\alpha_k\,\gamma_2^{k}\,z^{-k}}$

    where the typical ranges of the parameters are:

    $0.8 \le \gamma_1 \le 0.9, \qquad 0.2 \le \gamma_2 \le 0.4$

    this allows relatively more noise near the formant peaks, where it is masked by the speech, while keeping the noise low in the valleys without distorting the speech.
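    A small sketch that builds and applies this weighting filter from a set of LPC coefficients, using default gamma values taken from the ranges quoted above (gamma1 in the numerator, gamma2 in the denominator, as reconstructed here); SciPy's lfilter does the filtering.

```python
import numpy as np
from scipy.signal import lfilter

def perceptual_weighting(err, alphas, gamma1=0.9, gamma2=0.4):
    """Apply W(z) = (1 - sum_k a_k*gamma1^k z^-k) / (1 - sum_k a_k*gamma2^k z^-k)
    to an error signal."""
    k = np.arange(1, len(alphas) + 1)
    num = np.concatenate(([1.0], -np.asarray(alphas) * gamma1 ** k))
    den = np.concatenate(([1.0], -np.asarray(alphas) * gamma2 ** k))
    return lfilter(num, den, err)
```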

  • Slide 61/104

  • Slide 62/104

    the codebook is stored as a single array of Gaussian random numbers, where most of the samples of adjacent codewords are identical

    such overlapping codebooks typically use shifts of one or two samples, and provide large complexity reductions for storage and for searching the codebook in a given frame

  • Slide 63/104

    Overlapped Stochastic Codebook

    (figure: two codewords which are identical except for a shift of two samples)
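    A sketch of how such an overlapped codebook can be realized with a single shared Gaussian array (the sizes are illustrative): with a shift of 2 samples, 1024 codewords of length 40 need only about 2 thousand stored samples instead of roughly 41 thousand.

```python
import numpy as np

def overlapped_codebook(n_codewords=1024, length=40, shift=2, seed=0):
    """All codewords are windows into one Gaussian array, each shifted by
    `shift` samples from the previous one, so storage is a single vector of
    n_codewords*shift + length samples instead of n_codewords*length."""
    rng = np.random.default_rng(seed)
    pool = rng.standard_normal(n_codewords * shift + length)   # shared sample pool
    windows = np.lib.stride_tricks.sliding_window_view(pool, length)
    return windows[::shift][:n_codewords]
```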

  • Slide 64/104

  • Slide 65/104

    CELP Waveforms

    (figure panels: (a) original speech; (b) synthetic speech output; (c) prediction residual; (d) reconstructed LPC residual; (e) prediction residual after pitch prediction; excitation from a 10-bit random codebook)

  • Slide 66/104

  • Slide 67/104

  • Slide 68/104

    FS-1016 Encoder/Decoder

  • Slide 69/104

  • Slide 70/104

    Three sets of features are produced by the encoding system, namely:

    1. the LPC spectral parameters (coded as a set of 10 LSP parameters) for each 30 msec frame

    2. the codeword and gain of the adaptive codebook vector for each 7.5 msec sub-frame

    3. the codeword and gain of the stochastic codebook vector for each 7.5 msec sub-frame

  • Slide 71/104

  • Slide 72/104

  • Slide 73/104

    Total delay of any coder is the time taken by an input speech sample to be processed, transmitted, and decoded, plus any transmission delay, including:

    buffering delay at the encoder (length of analysis frame window): ~20-30 msec
    processing delay at the encoder (compute and encode all coder parameters): ~20-40 msec
    buffering delay at the decoder (collect all parameters for a frame of speech): ~20-40 msec
    processing delay at the decoder (time to compute a frame of output using the speech synthesis model): ~10-20 msec

    the total delay (not counting transmission delay, multiplexing of signals, forward error correction, etc.) is on the order of 70-130 msec

  • Slide 74/104

    the delay of conventional CELP coders is large due to forward adaptation for estimating the vocal tract and pitch parameters

    backward adaptive methods generally produced poor quality speech

    Chen showed how a backward adaptive coder could perform as well as a conventional forward adaptive coder at bit rates of 8 and 16 kbps

  • Slide 75/104

    Low Delay (LD) CELP Coder

    weighting filter:

    $W(z) = \frac{1 - P(z/0.9)}{1 - P(z/0.4)}$

  • Slide 76/104

    only the excitation sequence is transmitted to the receiver; the long- and short-term predictors are combined into one higher-order predictor whose coefficients are updated by performing LPC analysis on the previously quantized speech signal

    the excitation gain is updated by using the gain information embedded in the previously quantized excitation

    the excitation is coded at 2 bits/sample at an 8 kHz rate; using a codeword length of 5 samples, each excitation vector is coded using a 10-bit gain-shape codebook

    a closed-loop optimization procedure is used to populate the shape codebook, using the same weighted error criterion as is used to select the best codeword in the CELP coder

  • Slide 77/104

    16 kbps LD-CELP Characteristics

    8 kHz sampling rate, 2 bits/sample for coding the residual

    5 samples per frame are encoded by VQ using a 10-bit gain-shape codebook:
      3 bits (2 bits and sign) for gain (backward adaptive)
      7 bits for wave shape

    a recursive autocorrelation method is used to compute autocorrelation values from past synthetic speech

    LDLD--CELP Decoder CELP Decoder

  • 8/12/2019 Lecture 16 (104 Slides)

    78/104

    a pre c or an ga n va ues are er ve romcoded speech as at the encoder

    1

    1 110 11 ( )

    = M P z

    78

    3 1110 21 ( )

    p P z

  • Slide 79/104

  • Slide 80/104

    analysis-by-synthesis is used to derive an excitation signal that closely matches the speech while being efficient to code: this is the CELP (code-excited LPC) idea

  • Slide 81/104

  • Slide 82/104

  • Slide 83/104

  • Slide 84/104

    Model-Based Coding

    assume we model the vocal tract transfer function as

    $H(z) = \frac{G}{A(z)} = \frac{G}{1 - P(z)}, \qquad P(z) = \sum_{k=1}^{p}\alpha_k z^{-k}$

    LPC coder: 100 frames/sec, 13 parameters/frame (p = 10 LPC coefficients, pitch period, voicing decision, gain) => 1300 parameters/second for coding, versus 8000 samples/sec for the waveform
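    A minimal sketch of the synthesis side of this model, assuming the excitation (an impulse train for voiced frames, white noise for unvoiced) has already been generated; names are illustrative.

```python
import numpy as np
from scipy.signal import lfilter

def lpc_synthesize(excitation, alphas, gain):
    """All-pole synthesis for H(z) = G / A(z) with A(z) = 1 - sum_k a_k z^-k:
    filter the excitation through G/A(z)."""
    A = np.concatenate(([1.0], -np.asarray(alphas, dtype=float)))
    return lfilter([gain], A, excitation)
```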

  • Slide 85/104

    don't use the predictor coefficients directly (large dynamic range, high spectral sensitivity) => use poles, PARCOR coefficients, etc.

    code the LP parameters optimally using estimated pdfs for each parameter:

    1. V/UV: 1 bit => 100 bps
    2. Pitch period: 6 bits (uniform) => 600 bps
    3. Gain: 5 bits (non-uniform) => 500 bps
    4. LPC poles: 10 bits (non-uniform) per pole, 5 bits for bandwidth and 5 bits for center frequency, for each of 6 poles => 6000 bps

    Total required bit rate: 7200 bps

    (audio demo: S5-original versus S5-synthetic; some loss from original speech quality; quality is limited by the simple impulse/noise excitation model)
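    The per-frame bookkeeping behind the 7200 bps total, with the pole allocation reconstructed from the stated sub-totals:

```python
def lpc_vocoder_rate(frames_per_sec=100):
    """1 (V/UV) + 6 (pitch) + 5 (gain) + 6 poles * 10 bits = 72 bits/frame,
    i.e. 7200 bps at 100 frames/sec."""
    return (1 + 6 + 5 + 6 * 10) * frames_per_sec   # -> 7200
```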

  • Slide 86/104

    use of PARCOR coefficients ($|k_i| < 1$): roughly uniform pdf with small spectral sensitivity => 5-6 bits for coding each coefficient

    can achieve 4800 bps with almost the same quality as the 7200 bps system above

    can achieve 2400 bps with 20 msec frames => 50 frames/sec

  • Slide 87/104

  • Slide 88/104

    the key problems with speech coders based on all-pole linear prediction models:

    limitations of the all-pole production model

    idealization of the source as either a pulse train or random noise

    lack of accounting for parameter correlation when using a one-dimensional scalar quantization method => aided greatly by using VQ methods

  • Slide 89/104

    VQ-Based LPC Coder

    VQ is applied to the PARCOR coefficients.

    Case 1: same quality as the 2400 bps LPC vocoder
      10-bit codebook of PARCOR vectors
      44.4 frames/sec
      additional bits for pitch, gain and V/UV, plus 2 bits for frame synchronization
      total bit rate of 800 bps

    Case 2: same bit rate, higher quality
      22-bit codebook => 4.2 million codewords to be searched
      never achieved good quality, due to computation, storage, and graininess of quantization at cell boundaries

    bottom line: to dramatically improve quality we need an improved excitation model

  • Slide 90/104

    network: 64 Kbps PCM (8 kHz sampling rate, 8-bit log quantization)
    international: 32 Kbps ADPCM
    teleconferencing: 16 Kbps LD-CELP
    wireless: 13, 8, 6.7, 4 Kbps CELP-based coders
    secure telephony: 4.8, 2.4 Kbps LPC-based coders (MELP)
    storage for voice mail, answering machines, ...

  • Slide 91/104

  • Slide 92/104

    bit rate: 2400 to 128,000 bps
    quality: subjective (MOS), objective (SNR, intelligibility)
    delay: echo, reverberation; block coding delay, processing delay, multiplexing delay, transmission delay: ~100 msec
    telephone bandwidth speech: 200-3200 Hz, 8 kHz sampling rate
    wideband speech: 50-7000 Hz, 16 kHz sampling rate

  • Slide 93/104

    Network Speech Coding Standards

    G.711: companded PCM, 64 Kbps, toll quality
    G.726/727: ADPCM, 16-40 Kbps, toll quality
    G.722: 48, 56, 64 Kbps (wideband)
    G.728: LD-CELP, 16 Kbps, toll quality
    G.729A: CS-ACELP, 8 Kbps, toll quality
    G.723.1: MP-MLQ & ACELP, 6.3/5.3 Kbps

  • Slide 94/104

  • Slide 95/104

    Secure Telephony Speech Coding Standards

    FS-1015: LPC-10e, 2.4 Kbps
    FS-1016: CELP, 4.8 Kbps

  • Slide 96/104

  • Slide 97/104

    2 types of coders:

    waveform-approximating coders (PCM, DPCM, ADPCM): produce a reconstructed signal which converges toward the original signal with decreasing quantization error

    parametric (model-based) coders: produce a reconstructed signal which does not converge to the original signal with decreasing quantization error

    (figure: quality versus bit rate; the waveform-approximating coder converges to the quality of the original speech, while the parametric coder converges to a maximum quality below the original, due to the inaccuracy of the model in representing speech)

  • Slide 98/104

    talker and language dependency: especially for parametric coders that estimate pitch, which is highly variable across men, women and children; language dependency is related to sounds of the language (e.g., clicks) that are not well reproduced by model-based coders

    signal levels: speech is normalized to a maximum level; when actual samples are lower than this level, the coder is not operating at full efficiency, causing loss of quality

    background noise: noise sources and interfering talkers; levels of background noise vary, making optimal coding based on clean speech problematic

    multiple encodings: tandem encodings in a multi-link communication system, teleconferencing with multiple encoders

    channel errors: especially an issue for cellular communications; errors are either random or bursty (fades); redundancy methods are often used

    non-speech sounds: e.g., music on hold, DTMF tones; sounds that are poorly coded by the system

  • Slide 99/104

    Measures of Speech Coder Quality

    Intelligibility: Diagnostic Rhyme Test (DRT)

  • Slide 100/104

    Intelligibility: Diagnostic Rhyme Test (DRT)
      compare words that differ in the leading consonant
      identify the spoken word as one of a pair of choices
      high scores (~90%) are obtained for all coders above 4 Kbps

    Subjective Quality: Mean Opinion Score (MOS)
      5: excellent quality
      4: good quality
      3: fair quality
      2: poor quality
      1: bad quality
      MOS scores for high quality wideband speech are ~4.5, and for high quality telephone bandwidth speech ~4.1

  • Slide 101/104

    Evolution of Speech Coder Performance

    (figure: subjective quality (Excellent/Good/Fair/Poor/Bad) versus bit rate (kb/s) for ITU Recommendations, Cellular Standards, and Secure Telephony, comparing a 1980 profile with a 2000 profile; North American TDMA is marked)

  • Slide 102/104

    (figure: MOS quality (Good (4) / Fair (3) / Poor (2) / Bad (1)) versus bit rate (kb/s) for standard coders G.711, G.726, G.728, G.729, G.723.1, IS-127, IS-54, FS1016, MELP, and FS1015, with curves for 1980, 1990, 1995, and 2000)

  • Slide 103/104

    64 kbps Mu-Law PCM
    32 kbps CCITT G.721 ADPCM
    8 kbps CELP
    4.8 kbps CELP for STU-3

  • Slide 104/104

    Wideband Speech Coding

    Male talker:
      3.2 kHz, uncoded
      7 kHz, uncoded
      7 kHz, coded at 64 kbps (G.722)
      7 kHz, coded at 32 kbps (LD-CELP)
      7 kHz, coded at 16 kbps (BE-CELP)

    Female talker:
      3.2 kHz, uncoded
      7 kHz, uncoded
      7 kHz, coded at 64 kbps (G.722)
      7 kHz, coded at 32 kbps (LD-CELP)
      7 kHz, coded at 16 kbps (BE-CELP)