1 A Novel Approach to Speech Coding After Time Scale Modification Presented by, H. Gokhan Ilk, Ph.D

Upload: beatrice-hodges

Post on 28-Dec-2015



A Novel Approach to Speech Coding After Time Scale Modification

Presented by,

H. Gokhan Ilk, Ph.D


Speech Coding, FIT, Brno University of Technology

Something about the presenter

B.Sc., Electronics Eng. Dept., Ankara University

M.Sc., Instrument Design & Applications, UMIST (University of Manchester Institute of Science and Technology), UK


Ph.D., DCT Based Prototype Interpolation Speech Coding, University of Manchester, UK


Where is the Department?


Contact Details

Address : Ankara University,

Faculty of Engineering

Electronics Engineering Department

Beşevler 06100 Ankara, Turkey

[email protected]


Speech Production

Medical doctors are more interested in this figure.

[Figure: speech production; label: LUNGS]


What does it look like?

This figure is more interesting for a DSP course/seminar.

[Figure annotations: long-term correlation, short-term correlation]


What does it look like?

Speech can generally be classified as voiced or unvoiced.

The voiced part is a quasi-periodic (almost periodic) signal with higher energy and fewer zero crossings.

The unvoiced part is a noise-like signal.
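The two cues named above (energy and zero crossings) can be turned into a rough frame classifier. The following is only a sketch: the function names, the thresholds `energy_thresh` and `zcr_thresh`, and the synthetic test frames are illustrative assumptions, not values from the talk.

```python
import numpy as np

def frame_features(frame):
    """Return (short-time energy, zero-crossing rate) of one frame."""
    frame = np.asarray(frame, dtype=float)
    energy = float(np.mean(frame ** 2))
    signs = np.sign(frame)
    signs[signs == 0] = 1.0                      # treat exact zeros as positive
    zcr = float(np.mean(signs[1:] != signs[:-1]))
    return energy, zcr

def is_voiced(frame, energy_thresh=0.01, zcr_thresh=0.25):
    """Voiced if energy is high AND zero crossings are few (assumed thresholds)."""
    energy, zcr = frame_features(frame)
    return energy > energy_thresh and zcr < zcr_thresh

# Synthetic stand-ins for real speech frames (30 ms at an assumed 8 kHz rate).
fs = 8000
t = np.arange(240) / fs
voiced_like = 0.5 * np.sin(2 * np.pi * 120 * t)  # quasi-periodic, 120 Hz "pitch"
rng = np.random.default_rng(0)
unvoiced_like = 0.05 * rng.standard_normal(240)  # low-energy, noise-like

print(is_voiced(voiced_like), is_voiced(unvoiced_like))
```

Real speech needs thresholds tuned to the recording level, and practical coders usually combine more cues than these two.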


What does it look like in the frequency domain?

[Figure: power spectral density (PSD) of (a) voiced and (b) unvoiced speech]


Now is a good time for maths

$$e[n] = y[n] - \hat{y}[n] = y[n] - \sum_{k=0}^{N-1} w[k]\,x[n-k]$$

Anyone heard of Wiener filter theory, optimal filtering?

$$e[n] = y[n] - \hat{y}[n] = y[n] - \sum_{k=0}^{N-1} h[k]\,x[n-k]$$

(convolution sum)

Wiener filter turns out to be an FIR filter with N coefficients
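As a rough illustration of the FIR form, the N coefficients can be estimated from data by minimising the summed squared error, which leads to a small linear system. This is a sketch under assumptions: `wiener_fir`, the test filter `h_true` and the noise level are made up for the example, and sample sums stand in for the true expectations.

```python
import numpy as np

def wiener_fir(x, y, n_taps):
    """Least-squares estimate of w[0..n_taps-1] in
    e[n] = y[n] - sum_k w[k] x[n-k]."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Column k of X holds x[n-k] (zeros before the start of the record).
    padded = np.concatenate([np.zeros(n_taps - 1), x])
    X = np.stack(
        [padded[n_taps - 1 - k : n_taps - 1 - k + len(x)] for k in range(n_taps)],
        axis=1,
    )
    R = X.T @ X      # sample autocorrelation matrix (unnormalised)
    p = X.T @ y      # sample cross-correlation vector (unnormalised)
    return np.linalg.solve(R, p)

# Illustrative data: y is x passed through an assumed 3-tap filter plus noise.
rng = np.random.default_rng(0)
x = rng.standard_normal(2000)
h_true = np.array([0.5, -0.3, 0.2])
y = np.convolve(x, h_true)[: len(x)] + 0.01 * rng.standard_normal(len(x))

w = wiener_fir(x, y, n_taps=3)
print(w)   # should be close to h_true
```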


Optimal Filtering

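$$e[n] = y[n] - \hat{y}[n] = y[n] - \sum_{k=0}^{N-1} w[k]\,x[n-k]$$

The error is the difference between our signal and its optimal estimate.

Now let $y[n] = x[n]$:

$$e[n] = x[n] - \hat{x}[n] = x[n] - \sum_{k=0}^{N-1} a_k\,x[n-k]$$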


Prediction as an Optimum Filtering Problem

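$$e[n] = x[n] - \hat{x}[n] = x[n] - \sum_{k=1}^{N-1} a_k\,x[n-k]$$

$$\hat{x}[n] = \sum_{k=1}^{p} a_k\,x[n-k] = a_1 x[n-1] + a_2 x[n-2] + \dots + a_p x[n-p]$$

$$e[n] = x[n] - \hat{x}[n] = -\sum_{k=0}^{p} a_k\,x[n-k], \qquad a_0 = -1$$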


LPC Analysis Filter

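[Block diagram: $x[n]$ enters the linear prediction filter, which outputs $\hat{x}[n]$; a summing node subtracts $\hat{x}[n]$ from $x[n]$ to give $e[n]$.]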


The AR (Auto Regressive) Model

$$x[n] = a_1 x[n-1] + a_2 x[n-2] + \dots + a_p x[n-p] + e[n]$$

Considering optimum filter theory and regression analysis: since both the independent and the dependent variables belong to the same random process x, x[n] is called an autoregressive or AR process. That is, the process is regressed upon itself.

Thanks go to the people from Statistics who, a long, long time ago, named this analysis the regression analysis of time series.


Innovations representation

$x[n] \rightarrow H(z) = A(z) \rightarrow e[n] \rightarrow H^{-1}(z) = 1/A(z) \rightarrow x[n]$

From Linear System Theory

The inverse system has many advantages.

1. In communications (left and right systems are apart)

2. The system on the right does not need any input ???


Innovations representation

Innovations representation is basically an inverse system.

Why is it called innovations?

Assume that x, our discrete random signal, is speech.

It can be either voiced, which means it is quasi-periodic, or unvoiced, in which case it is noise.

If x is voiced, then LPC analysis works very well and e[n] is close to zero.

If x is unvoiced, then LPC analysis works well again because e[n] is white noise.

In either case we do not need e[n], and thus the filters themselves carry the information. That is why the representation is called INNOVATIONS.


What are these filters?

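$$x[n] = a_1 x[n-1] + a_2 x[n-2] + \dots + a_p x[n-p] + e[n]$$

$$X(z) = a_1 z^{-1} X(z) + a_2 z^{-2} X(z) + \dots + a_p z^{-p} X(z) + E(z)$$

Linear Prediction Synthesis Filter

$e[n] \rightarrow A^{-1}(z) \rightarrow x[n]$, that is, $E(z) \rightarrow A^{-1}(z) \rightarrow X(z)$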


What are these filters? Finally, the LPC analysis and synthesis filters

$$E(z) = A(z)\,X(z) = X(z)\left(1 - \sum_{i=1}^{p} a_i z^{-i}\right)$$

$$A^{-1}(z) = \frac{1}{A(z)} = \frac{X(z)}{E(z)}$$

A(z): LPC analysis filter

1/A(z): LPC synthesis filter

The filter in the AR model is therefore an IIR filter, and the AR model is said to be an “all-pole” model. (Useful information for statisticians.)
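A small sketch of the two filters, written directly from the difference equations with A(z) = 1 - sum of a_i z^-i. The function names and the example coefficients `a` are illustrative assumptions; the point is only that the synthesis filter undoes the analysis filter.

```python
import numpy as np

def lpc_analysis(x, a):
    """FIR filter A(z): e[n] = x[n] - sum_{i=1..p} a[i-1] * x[n-i]."""
    x = np.asarray(x, dtype=float)
    e = x.copy()
    for i, ai in enumerate(a, start=1):
        e[i:] -= ai * x[:-i]
    return e

def lpc_synthesis(e, a):
    """IIR (all-pole) filter 1/A(z): x[n] = e[n] + sum_{i=1..p} a[i-1] * x[n-i]."""
    e = np.asarray(e, dtype=float)
    x = np.zeros_like(e)
    for n in range(len(e)):
        acc = e[n]
        for i, ai in enumerate(a, start=1):
            if n - i >= 0:
                acc += ai * x[n - i]   # feedback from past outputs
        x[n] = acc
    return x

# Assumed second-order coefficients, chosen so that 1/A(z) is stable.
a = [1.3, -0.6]
rng = np.random.default_rng(0)
x = rng.standard_normal(200)

e = lpc_analysis(x, a)         # residual
x_rec = lpc_synthesis(e, a)    # reconstruction from the residual
print(np.max(np.abs(x - x_rec)))
```

The analysis filter only looks at past inputs (FIR), while the synthesis filter feeds back its own past outputs, which is where the poles come from.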


What is the deal with these filters???

Since 1/A(z) is a causal filter (does everybody see that???), this implies that it is minimum phase (it is causal and stable (???) with a causal, stable inverse).

Since A(z) is an FIR filter, it is always stable, and we know that it is causal. We also know that 1/A(z) is causal. BUT IS IT ALWAYS STABLE???

We will now see that the a_i (LPC coefficients) are found by solving the normal equations with a positive definite correlation matrix. Because they come from solving a system with a positive definite (Toeplitz autocorrelation) matrix, the poles always lie within the unit circle...
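Stability of 1/A(z) can be checked numerically for any given coefficient set by locating its poles. This is a sketch assuming the convention A(z) = 1 - sum of a_i z^-i used above; the function names and the two coefficient sets are made up for illustration.

```python
import numpy as np

def synthesis_poles(a):
    """Poles of 1/A(z) with A(z) = 1 - sum_i a_i z^-i, i.e. the roots of
    z^p - a_1 z^(p-1) - ... - a_p."""
    a = np.asarray(a, dtype=float)
    return np.roots(np.concatenate([[1.0], -a]))

def is_stable(a):
    """True when every pole lies strictly inside the unit circle."""
    return bool(np.all(np.abs(synthesis_poles(a)) < 1.0))

print(is_stable([1.3, -0.6]))   # complex pole pair with magnitude sqrt(0.6)
print(is_stable([2.0, -0.5]))   # one real pole at 1 + sqrt(0.5), outside the circle
```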


How do we calculate LPC coefficients ?

$$s(n) = \sum_{j=1}^{p} a_j\,s(n-j) + e(n)$$

The problem is to determine the parameters $a_j$, $j = 1, 2, \dots, p$.

If $\alpha_j$ represents the estimate of $a_j$, then the error (or residual) is given by

$$e(n) = s(n) - \sum_{j=1}^{p} \alpha_j\,s(n-j)$$


It is now possible to determine the estimates by minimising the mean squared error, i.e.

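$$\text{Error} = E\{e^2(n)\} = E\left\{\left[s(n) - \sum_{j=1}^{p} \alpha_j\,s(n-j)\right]^2\right\}$$

where $E\{\cdot\}$ is the expectation operator. Setting the partial derivatives of Error with respect to $\alpha_j$ to zero for $j = 1, 2, \dots, p$, we get

$$E\left\{\left[s(n) - \sum_{j=1}^{p} \alpha_j\,s(n-j)\right] s(n-i)\right\} = 0, \qquad i = 1, 2, \dots, p$$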

Derivatives again ?


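That is, e(n) is orthogonal to s(n-i) for i = 1, 2, ..., p. The equation can be rearranged to give

$$\sum_{j=1}^{p} \alpha_j\,\phi_n(i,j) = \phi_n(i,0), \qquad i = 1, 2, \dots, p$$

$$\phi_n(i,j) = E\{s(n-i)\,s(n-j)\}$$

• Signal assumed stationary

Solving the linear equation:

$$\phi_n(i,j) = E\{s(n-i)\,s(n-j)\}, \qquad i = 1, 2, \dots, p, \quad j = 1, 2, \dots, p$$

$$\phi_n(i,j) = \sum_{m} s_n(m-i)\,s_n(m-j)$$

Is this autocorrelation? Or is it not?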


Are we good with linear algebra?

That is, e(n) is orthogonal to s(n-i) for i = 1,2,...p

A x = b

Obtained from University of Chicago web site


Auto-Correlation Method

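$$\phi_n(i,j) = \sum_{m=0}^{N+p-1} s_n(m-i)\,s_n(m-j), \qquad 1 \le i \le p, \quad 0 \le j \le p$$

$$\phi_n(i,j) = \sum_{m=0}^{N-1-(i-j)} s_n(m)\,s_n(m+i-j), \qquad 1 \le i \le p, \quad 0 \le j \le p$$

N: length of the sample sequence

$s_n(m) = 0$ outside the interval $0 \le m \le N-1$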

Method I


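$$\phi_n(i,j) = R_n(|i-j|), \qquad i = 1, 2, \dots, p, \quad j = 0, 1, \dots, p$$

$$R_n(j) = \sum_{m=0}^{N-1-j} s_n(m)\,s_n(m+j)$$

$$\sum_{j=1}^{p} \alpha_j\,R_n(|i-j|) = R_n(i), \qquad 1 \le i \le p$$

Short time auto correlation

$$\begin{bmatrix} R_n(0) & R_n(1) & \cdots & R_n(p-1) \\ R_n(1) & R_n(0) & \cdots & R_n(p-2) \\ \vdots & \vdots & \ddots & \vdots \\ R_n(p-1) & R_n(p-2) & \cdots & R_n(0) \end{bmatrix} \begin{bmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_p \end{bmatrix} = \begin{bmatrix} R_n(1) \\ R_n(2) \\ \vdots \\ R_n(p) \end{bmatrix}$$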

Levinson-Durbin recursion
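The autocorrelation method and the recursion can be sketched in a few lines. This is a minimal illustration, not a production routine: the function names are hypothetical, no analysis window is applied, and the test signal is a synthetic AR(2) process with assumed coefficients `a_true`.

```python
import numpy as np

def autocorr(s, p):
    """Short-time autocorrelation R(j) = sum_{m=0}^{N-1-j} s(m) s(m+j), j = 0..p."""
    s = np.asarray(s, dtype=float)
    N = len(s)
    return np.array([np.dot(s[: N - j], s[j:]) for j in range(p + 1)])

def levinson_durbin(R):
    """Solve sum_j alpha_j R(|i-j|) = R(i), i = 1..p, exploiting the
    Toeplitz structure. Returns (alpha, final prediction error)."""
    p = len(R) - 1
    alpha = np.zeros(p)
    err = R[0]
    for i in range(1, p + 1):
        # Reflection coefficient for order i.
        k = (R[i] - np.dot(alpha[: i - 1], R[i - 1 : 0 : -1])) / err
        new = alpha.copy()
        new[i - 1] = k
        new[: i - 1] = alpha[: i - 1] - k * alpha[: i - 1][::-1]
        alpha = new
        err *= 1.0 - k * k
    return alpha, err

# Synthetic AR(2) test signal (assumed coefficients, white-noise excitation).
rng = np.random.default_rng(1)
a_true = np.array([1.3, -0.6])
e = rng.standard_normal(2000)
s = np.zeros(2000)
for n in range(2000):
    s[n] = e[n]
    if n >= 1:
        s[n] += a_true[0] * s[n - 1]
    if n >= 2:
        s[n] += a_true[1] * s[n - 2]

R = autocorr(s, p=2)
alpha, err = levinson_durbin(R)
print(alpha)   # should be close to a_true
```

The recursion needs on the order of p squared operations, against p cubed for a general linear solve, which is why it is the usual choice for this Toeplitz system.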


Covariance Method

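$$E_n = \sum_{m=0}^{N-1} e_n^2(m)$$

$$\phi_n(i,j) = \sum_{m=0}^{N-1} s_n(m-i)\,s_n(m-j), \qquad 1 \le i \le p, \quad 0 \le j \le p$$

$$\phi_n(i,j) = \sum_{m=-i}^{N-1-i} s_n(m)\,s_n(m+i-j), \qquad 1 \le i \le p, \quad 0 \le j \le p$$

It requires the use of the samples in the interval $-p \le m \le N-1$.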

Method II


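Covariance Method

$$\begin{bmatrix} \phi_n(1,1) & \phi_n(1,2) & \cdots & \phi_n(1,p) \\ \phi_n(2,1) & \phi_n(2,2) & \cdots & \phi_n(2,p) \\ \vdots & \vdots & \ddots & \vdots \\ \phi_n(p,1) & \phi_n(p,2) & \cdots & \phi_n(p,p) \end{bmatrix} \begin{bmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_p \end{bmatrix} = \begin{bmatrix} \phi_n(1,0) \\ \phi_n(2,0) \\ \vdots \\ \phi_n(p,0) \end{bmatrix}$$

Symmetric covariance matrix, Cholesky decomposition ($L D L^{T}$)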
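A sketch of the covariance method follows. Two things here are assumptions of the example rather than of the talk: the input array carries the p history samples (m = -p, ..., -1) in front of the N analysis samples, and NumPy's `cholesky` returns the plain L L^T factor rather than the L D L^T form named above (the two differ only in where the diagonal is stored). The function name and test signal are made up.

```python
import numpy as np

def covariance_lpc(s, p):
    """Covariance-method LPC. `s` holds p history samples followed by the
    N samples of the analysis interval, so s[p + m] plays the role of s_n(m)."""
    s = np.asarray(s, dtype=float)
    N = len(s) - p
    # Column j holds s_n(m - j) for m = 0..N-1.
    S = np.stack([s[p - j : p - j + N] for j in range(p + 1)], axis=1)
    phi = S.T @ S                  # phi[i, j] = sum_m s_n(m-i) s_n(m-j)
    Phi = phi[1:, 1:]              # left-hand matrix, i, j = 1..p
    psi = phi[1:, 0]               # right-hand side phi(i, 0)
    L = np.linalg.cholesky(Phi)    # Phi = L L^T (requires Phi positive definite)
    y = np.linalg.solve(L, psi)    # solve L y = psi
    return np.linalg.solve(L.T, y) # solve L^T alpha = y

# Noise-free test: impulse response of an assumed AR(2) model. After the first
# two samples the recursion holds exactly, so the prediction error is zero.
s = np.zeros(40)
s[0] = 1.0
s[1] = 1.3 * s[0]
for n in range(2, 40):
    s[n] = 1.3 * s[n - 1] - 0.6 * s[n - 2]

alpha = covariance_lpc(s, p=2)
print(alpha)   # recovers the assumed coefficients 1.3 and -0.6
```

`np.linalg.solve` is a general solver; a dedicated triangular solver would be the more economical choice for the two substitution steps.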


What is next???

Now that we have the LPC coefficients a_i, we can represent speech compactly.

This further requires an efficient representation of the excitation (residual, error) signal. For example, computing the optimum magnitudes of regularly spaced excitation pulses is what constitutes GSM (Global System for Mobile Communications) speech coding.


State of the Art

Efficient quantization of the LPC parameters (as LSPs or LSFs, i.e. line spectral pairs or line spectral frequencies), together with efficient representation and quantization of the excitation, results in today’s state-of-the-art voice coding.

Examples:

GSM, CELP (code excited linear prediction), MELP (mixed excitation linear prediction), etc.


Anything novel and interesting?

Linear predictive coding and efficient representation of the excitation signal have attracted so much interest that these poor subjects have been beaten to death.

Therefore one has to do A LOT in order to gain A LITTLE

Or merge two different disciplines in a clever way.

It turns out that Prof. Verhelst has already developed one of the most important tools in one of these disciplines.


What is the novelty?

Since speech signals exhibit both short- and long-term correlation, and LPC analysis removes most of the short-term correlation, we can also remove the long-term correlation, i.e. get rid of long-term redundancy.

The key is not to disturb pitch and formant frequencies. A detailed investigation of these parameters can be found in:

W. Verhelst, “Overlap-add methods for time-scaling of speech”, Speech Commun. 30 (2000) 207–221.


How does it work?

If pitch and formant frequencies are not disturbed by the WSOLA algorithm, then one can compress speech (before coding) with a compression rate of beta and expand the decoded speech at the receiver side with an expansion factor of 1/beta. If, for example, beta = 0.5, then one can have a full-duplex channel at half-duplex bandwidth.

Why? Because the same signal is represented at half the duration with minimum distortion.


Waveform Similarity Overlap and Add (WSOLA)


How does it work?

U = N/2: no rate change (WSOLA beta = 1)

U < N/2: speech slows down, expansion (WSOLA beta >= 1)

U > N/2: speech speeds up, compression (WSOLA beta <= 1)

This is for 50% overlapping frames. A good way to test the algorithm: compress with beta = 1 and expand with beta = 1.
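The role of U and N can be illustrated with a stripped-down waveform-similarity overlap-add routine. This is a simplified sketch, not the reference WSOLA implementation: the squared-difference similarity measure, the search tolerance `tol`, the periodic Hann window and the test signal are all assumptions of the example.

```python
import numpy as np

def wsola(x, N, U, tol):
    """Time-scale x using frames of N samples with 50% overlap at the output.
    U is the nominal input hop: U = N/2 keeps the rate, U > N/2 compresses,
    U < N/2 expands. `tol` is the +/- search range around each nominal position."""
    if N % 2:
        raise ValueError("N must be even for 50% overlap")
    x = np.asarray(x, dtype=float)
    Ss = N // 2                                   # fixed output (synthesis) hop
    window = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(N) / N)
    n_frames = (len(x) - tol - N - Ss) // U + 1   # assumes x is long enough
    y = np.zeros((n_frames + 1) * Ss)
    start = 0
    for k in range(n_frames):
        if k > 0:
            # Natural continuation of the previously copied segment.
            target = x[start + Ss : start + Ss + N]
            lo = max(0, k * U - tol)
            hi = k * U + tol
            costs = [np.sum((x[c : c + N] - target) ** 2) for c in range(lo, hi + 1)]
            start = lo + int(np.argmin(costs))    # most similar candidate
        y[k * Ss : k * Ss + N] += window * x[start : start + N]
    return y

# Illustrative input: a tone plus a little noise at an assumed 8 kHz rate.
rng = np.random.default_rng(0)
n = np.arange(8000)
x = np.sin(2 * np.pi * 150 * n / 8000) + 0.1 * rng.standard_normal(8000)

N, tol = 240, 40
y_same = wsola(x, N, U=N // 2, tol=tol)   # beta = 1: output should match input
y_fast = wsola(x, N, U=N, tol=tol)        # compression to roughly half duration
print(len(x), len(y_same), len(y_fast))
```

With U = N/2 the best candidate is always the natural continuation, so away from the first and last half frame the output reproduces the input, which is the test suggested above.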


Is that it ???

We have tried this approach with many different algorithms operating in the time and frequency domains. Our experiments with the new NATO standard, STANAG 4591, MELP (mixed excitation linear predictive vocoder), indeed proved that WSOLA produces high quality output and that it is computationally efficient.

Details can be found in:

H.G. Ilk, S. Tugac, “Channel and source considerations of a bit rate reduction technique for a possible wireless communications system’s performance enhancement”, IEEE Trans. Wireless Commun. vol. 4(1), January 2005, pp. 93–99

But what if we would like to make the most of our bandwidth? Then the system should be adaptive, which means WSOLA should operate at different time compression factors. This is an engineer’s dream come true. You don't operate at constant or multiple fixed bit rates; you operate at flexible bit rates. That is, YOU tell me how much bandwidth you have and I give you the best quality possible. Not the other way around!!!


We are more clever than that

Up to this point we are only using Werner’s WSOLA algorithm, which was developed for the hearing impaired. What if we want to change beta seamlessly? How do we do that? To change beta, you can change either U or N.

Restriction: the frame size (N) should not change at the transmitter during compression. It is determined by your codec, and it is standard.
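With N fixed, changing beta means choosing a new U for each frame. The sketch below rests on two assumptions that are not spelled out in the talk: that beta = (N/2)/U for 50% overlapping frames (consistent with U = N/2 giving beta = 1), and that the effective bit rate scales linearly with beta. The function names and the example rates are illustrative.

```python
def analysis_hop(N, beta):
    """Input hop U for a target time-scale factor beta, with N held fixed.
    Assumes beta = (N/2) / U for 50% overlapping frames."""
    if beta <= 0:
        raise ValueError("beta must be positive")
    return int(round(N / (2.0 * beta)))

def hops_for_bandwidth(N, codec_rate_kbps, available_kbps):
    """One U per frame: pick beta = available rate / codec rate, capped at 1
    (no expansion at the transmitter), assuming bit rate scales with beta."""
    return [
        analysis_hop(N, min(1.0, avail / codec_rate_kbps))
        for avail in available_kbps
    ]

# Assumed scenario: a 2.4 kb/s codec, with the channel shrinking frame by frame.
hops = hops_for_bandwidth(240, 2.4, [2.4, 1.2, 1.0])
print(hops)
```

Each U would then drive the segment search for its own frame, while the codec keeps seeing frames of constant size N.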


What is the extension then?

Different beta as we proceed,

Compression

As you can see from solid black lines N is constant.

As you can see from dashed blue lines U changes for each frame


Half symmetric windows in order to go back to the original time scale

Expansion

During synthesis at the receiver, N has to change for synchronous output speech


What is the originality?

This approach is particularly useful in packet-switched network applications like VoIP (Voice over IP) in dynamic networks, because the load may change abruptly and is not symmetric in each direction.

It is equally valuable in congested circuit-switched voice networks, because today’s networks either allow multiple rates (2.4, 4.8 or 8.0 kb/s) or simply drop your call. This will allow priority in phone calls or cheaper tariffs, leading to QoS in a circuit-switched network. (That is novel, is it NOT???)

Details can be found in

Hakkı Gökhan İlk and Saadettin Güler, “Adaptive time scale modification of speech for graceful degrading voice quality in congested networks for VoIP applications”, Signal Processing, Volume 86, pp 127-139, 2006


Samples

Male: “Steve wore a bright red cashmere sweater” (2.4 kb/s, 1.0 kb/s, 128 kb/s PCM)

Female: “Before Thursday’s exam review every formula” (2.4 kb/s, 1.0 kb/s, 128 kb/s PCM)


Reward!

Our algorithm has been selected as one of the two finalists in a competition run by TURKCELL (a GSM giant in Turkey). We hope to win the competition with our presentation and demo on 28 September.


I would like to thank Honza and FIT for making this exchange possible.