digital speech and audio processing e. nemer spring 2008€¦ · digital speech and audio...

22
Digital Speech and Audio Processing Spring 2008 - 2 E. Nemer 1. Autocorrelation Function Deterministic Signal Voiced speech Random Signal Unvoiced Speech 2. Average Magnitude Difference Function 3. Windowing issues. 4. Pitch Estimation

Upload: others

Post on 29-Sep-2020

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Digital Speech and Audio Processing E. Nemer Spring 2008€¦ · Digital Speech and Audio Processing E. Nemer Spring 2008 -3 Autocorrelation Function • The autocorrelation function

Digital Speech and Audio Processing Spring 2008 - 1E. Nemer

Digital Speech and Audio Processing Spring 2008 - 2E. Nemer

1. Autocorrelation Function

• Deterministic Signal

• Voiced speech

• Random Signal

• Unvoiced Speech

2. Average Magnitude Difference Function

3. Windowing issues.

4. Pitch Estimation

Page 2: Digital Speech and Audio Processing E. Nemer Spring 2008€¦ · Digital Speech and Audio Processing E. Nemer Spring 2008 -3 Autocorrelation Function • The autocorrelation function

Digital Speech and Audio Processing Spring 2008 - 3E. Nemer

Autocorrelation Function

• The autocorrelation function of a deterministic signal x(n), is defined by:

∑∞

−∞=

+=m

kmxmxk )()()(φ

∑−−=

=

+=1

0)()()(

kNn

nkmxmxkφ

• In practice, we would window the signal so that we can compute the above sum . Assume a rectangular window of length N, then the simple form of the autocorrelation is :

Digital Speech and Audio Processing Spring 2008 - 4E. Nemer

0 1 2 3 4 5 6 7 8 9 1 0-1

-0 .5

0

0 .5

1

0 1 2 3 4 5 6 7 8 9 1 0-1

-0 .5

0

0 .5

1

0 1 2 3 4 5 6 7 8 9 1 0-1

-0 .5

0

0 .5

1

Original x(m)

Shifted versions x(m-k)

• Consider the simple case of a sinusoidal signal :

⎟⎠⎞

⎜⎝⎛ +⋅= θπ m

FsfAmx 02cos)(

0fFsPperiod =

Autocorrelation Function

Page 3: Digital Speech and Audio Processing E. Nemer Spring 2008€¦ · Digital Speech and Audio Processing E. Nemer Spring 2008 -3 Autocorrelation Function • The autocorrelation function

Digital Speech and Audio Processing Spring 2008 - 5E. Nemer

- 2 5 - 2 0 - 1 5 - 1 0 - 5 0 5 1 0 1 5 2 0 2 5- 6 0

- 4 0

- 2 0

0

2 0

4 0

6 0

0 1 2 3 4 5 6 7 8 9 1 0- 1

- 0 . 5

0

0 . 5

1

0 1 2 3 4 5 6 7 8 9 1 0- 1

- 0 . 5

0

0 . 5

1

Autocorrelation Function

∑∞

−∞=

+=m

kmxmxk )()()(φ

Multiply-addGet one point

Digital Speech and Audio Processing Spring 2008 - 6E. Nemer

)()( Pkk +=φφ for periodic functions (period=P)

)()( kk −=φφ symetry

kk ∀≤ ),0()( φφφ(0) is the energy of x(n)

follows Cauchy’s inequality

Properties of autocorrelation function

Autocorrelation Function

- 2 5 - 2 0 - 1 5 - 1 0 - 5 0 5 1 0 1 5 2 0 2 5- 6 0

- 4 0

- 2 0

0

2 0

4 0

6 0

Page 4: Digital Speech and Audio Processing E. Nemer Spring 2008€¦ · Digital Speech and Audio Processing E. Nemer Spring 2008 -3 Autocorrelation Function • The autocorrelation function

Digital Speech and Audio Processing Spring 2008 - 7E. Nemer

0 50 100 150-5

0

5

10x 108

0 50 100 150 200 250 300 350-4000

-3000

-2000

-1000

0

1000

2000

3000

4000

Periodic peaks

Rn(k)

Example for Voiced Signal

Digital Speech and Audio Processing Spring 2008 - 8E. Nemer

)( )(

Density SpectralPower Function ation Autocorrel

fSR XX τ F

)( )( fSR XX τ 1−F

∫∞

∞−

−= ττ τπ deRfS fjXX

2)()(

∫∞

∞−

= dfefSR fjXX

τπτ 2)()(

[ ] )0()( 2XRtXEP == [ ] ∫

∞−

== dffStXEP X )()( 2

Power Spectral Density

Page 5: Digital Speech and Audio Processing E. Nemer Spring 2008€¦ · Digital Speech and Audio Processing E. Nemer Spring 2008 -3 Autocorrelation Function • The autocorrelation function

Digital Speech and Audio Processing Spring 2008 - 9E. Nemer

• A more reasonable definition of the auto-correlation is expressed using the pair of windowed signals :

∑∞

−∞=

−−+−=m

n mknwkmxmnwmxkR )()()()()(

window for x(m) window for x(m+k)

∑∞

−∞=

−+−−=−=m

nn mknwkmxmnwmxkRkR )()()()()()(

Easy to show

Autocorrelation Function

Digital Speech and Audio Processing Spring 2008 - 10E. Nemer

• Another way to view the windowed autocorrelation is by considering the product of the speech signal with its delayed version, passed thru a filter whose response is :

[ ]∑∞

−∞=

−⋅−=m

kn mnhkmxmxkR )()()()(

)()()( knwnwnhk +=

Delay k

)(nhk)(nx

)( knx −)(kRn

Autocorrelation Function

[ ] [ ]∑∞

−∞=

−+−⋅−=m

n mknwmnwkmxmxkR )()()()()(

Page 6: Digital Speech and Audio Processing E. Nemer Spring 2008€¦ · Digital Speech and Audio Processing E. Nemer Spring 2008 -3 Autocorrelation Function • The autocorrelation function

Digital Speech and Audio Processing Spring 2008 - 11E. Nemer

∑−

+−=

−−+−=kn

Nnmn mknwkmxmnwmxkR

1

)()()()()(

201 n = 300

50,,0 L=k

1+− Nn nm

)( mnw −

Autocorrelation Function

e.g. N = 100n = 300

Digital Speech and Audio Processing Spring 2008 - 12E. Nemer

∑−

=

−−+−=k

mn kmnwkmxmnwmxkR

300

201

)()()()()(

)0()300()1()299(.......)98()202()99()201()1(

wxwxwxwxRn +=

Autocorrelation Function

)0()300()50()250(.......)98()202()99()201()50(

wxwxwxwxRn +=

Total number of products in the sum : N – k

Page 7: Digital Speech and Audio Processing E. Nemer Spring 2008€¦ · Digital Speech and Audio Processing E. Nemer Spring 2008 -3 Autocorrelation Function • The autocorrelation function

Digital Speech and Audio Processing Spring 2008 - 13E. Nemer

∑−−=

=

+−

=1

0)()(1)(

kNm

mun kmxmx

kNkR

∑−−=

=

+=1

0)()(1)(

kNm

mb kmxmx

NkR

)(1)( kRNkkR unbau ⎟⎠⎞

⎜⎝⎛ −=

unbiased

biased

Biased for finite N, but Asymptotically unbiased for

∞→N

Autocorrelation Function

Recall the simple definition of the autocorrelation (not windowed)

Digital Speech and Audio Processing Spring 2008 - 14E. Nemer

Autocorrelation voiced speech

00

1T

F =

0T

0TFemale

Page 8: Digital Speech and Audio Processing E. Nemer Spring 2008€¦ · Digital Speech and Audio Processing E. Nemer Spring 2008 -3 Autocorrelation Function • The autocorrelation function

Digital Speech and Audio Processing Spring 2008 - 15E. Nemer

0T

0T

00 3

13T

F =

Autocorrelation voiced speech Male

Digital Speech and Audio Processing Spring 2008 - 16E. Nemer

Sixteen.wav filtered at 600Hz

sixteen.wav

Page 9: Digital Speech and Audio Processing E. Nemer Spring 2008€¦ · Digital Speech and Audio Processing E. Nemer Spring 2008 -3 Autocorrelation Function • The autocorrelation function

Digital Speech and Audio Processing Spring 2008 - 17E. Nemer

An ensemble of sample functions:

The case of a random process

Digital Speech and Audio Processing Spring 2008 - 18E. Nemer

When X(t) is a stationary random process , then :

The mean of X(t) is

fX(t)(x) : the probability density function (pdf)The autocorrelation function of X(t) is

[ ])()( tXEtX =μ

xxxf tX d )(

)(∫∞

∞−=

[ ]

)(

),(

),(

)()( )(R

12

-

- 2121)((0)21

-

- 2121)()(21

2121

12

21

ttR

dxdxxxfxx

dxdxxxfxx

tXtXE,tt

X

ttXX

tXtX

X

−=

=

=

=

∫ ∫∫ ∫∞

∞ −

Xμ=

The case of a random process

Page 10: Digital Speech and Audio Processing E. Nemer Spring 2008€¦ · Digital Speech and Audio Processing E. Nemer Spring 2008 -3 Autocorrelation Function • The autocorrelation function

Digital Speech and Audio Processing Spring 2008 - 19E. Nemer

For convenience of notation , we redefine

1. The mean-square value

2. The autocorrelation is symetrical :

3. The largest value is at lag 0

[ ] allfor , )()()( tτtXtXERX −=τ

[ ] 0 τ, )((0) 2 == tXERX

τ)()( −= RRX τ

(0))( XX RR ≤τ

The case of a random process

Digital Speech and Audio Processing Spring 2008 - 20E. Nemer

The RX(τ) provides the interdependence information of two random variables obtained from X(t) at timesτ seconds apart

The case of a random process

Page 11: Digital Speech and Audio Processing E. Nemer Spring 2008€¦ · Digital Speech and Audio Processing E. Nemer Spring 2008 -3 Autocorrelation Function • The autocorrelation function

Digital Speech and Audio Processing Spring 2008 - 21E. Nemer

),(~)42cos()( πππ −ΘΘ+= UttX

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1-1

0

1Realizations of Sinusoid of cont. random phase

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1-1

0

1

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1-1

0

1

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1-1

0

1

⎪⎩

⎪⎨⎧ ≤≤−=Θ

elsewhere,0

,21

)( πθππf θ

Θf

θπ− π

Random process : sinusoid with random phase

Digital Speech and Audio Processing Spring 2008 - 22E. Nemer

Θ)2cos()( += tπfAtX c

[ ] )2cos(2

)()()(2

tπfAtXτtXER cX =+=τ

Random process : sinusoid with random phase

Page 12: Digital Speech and Audio Processing E. Nemer Spring 2008€¦ · Digital Speech and Audio Processing E. Nemer Spring 2008 -3 Autocorrelation Function • The autocorrelation function

Digital Speech and Audio Processing Spring 2008 - 23E. Nemer

N=1000;X=randn(1,N);

m=50;Rx=Rx_est(X,m);

plot(X)plot([-m:m],Rx)

0 100 200 300 400 500 600 700 800 900 1000-10

-8

-6

-4

-2

0

2

4

6

8

10Gaussian Random Process

Random process : white noise

Digital Speech and Audio Processing Spring 2008 - 24E. Nemer

Power Spectral Density

0 100 200 300 400 500 600 700 800 900 1000-10

-8

-6

-4

-2

0

2

4

6

8

10Gaussian Random Process

for j=1:100,Rx=Rx_est(X,m);Sx=fftshift(abs(fft(Rx)));

end;

Sx_av=Sx_av+Sx;

Sx_av=Sx_av/100;

-50 -40 -30 -20 -10 0 10 20 30 40 50-0.2

0

0.2

0.4

0.6

0.8

1

1.2Autocorrelation function

-0.5 -0.4 -0.3 -0.2 -0.1 0 0.1 0.2 0.3 0.4 0.50

0.2

0.4

0.6

0.8

1

1.2

1.4

1.6

1.8

2

f

Power Spectral Desity

Page 13: Digital Speech and Audio Processing E. Nemer Spring 2008€¦ · Digital Speech and Audio Processing E. Nemer Spring 2008 -3 Autocorrelation Function • The autocorrelation function

Digital Speech and Audio Processing Spring 2008 - 25E. Nemer

Autocorrelation – unvoiced speech

Digital Speech and Audio Processing Spring 2008 - 26E. Nemer

Average Magnitude Difference Function

• Instead of taking average over magnitude, we do it over magnitude difference

• Operator T: T(x) is the concatenation of a linear filter H(z)=1-z-p

and taking absolute value |x|• It will be used in a computationally efficient solution to pitch

estimation

∑+−=

−−=Δn

Nnmn pmxmxpM

1

|)()(|)(

Page 14: Digital Speech and Audio Processing E. Nemer Spring 2008€¦ · Digital Speech and Audio Processing E. Nemer Spring 2008 -3 Autocorrelation Function • The autocorrelation function

Digital Speech and Audio Processing Spring 2008 - 27E. Nemer

AMDF vs. autocorrelation

0 50 100 1500

1

2

3

4

5

6

7x 105

0 50 100 150 200 250 300 350-4000

-3000

-2000

-1000

0

1000

2000

3000

4000

0 5 0 1 0 0 1 5 0-5

0

5

1 0x 1 0 8

Digital Speech and Audio Processing Spring 2008 - 28E. Nemer

Summary

• Short-time energy

• Short-time amplitude

• Short-time average zero-cross (ZC) rate

• Short-time amplitude difference

• Short-time auto-correlation

∑+−=

=n

Nnmn mxE

1

2 )(

∑+−=

=n

Nnmn mxM

1

|)(|

∑+−=

−−=Δn

Nnmn pmxmxpM

1

|)()(|)(

∑∞

−∞=

−−+−=m

n mknwkmxmnwmxkR )()()()()(

∑+−=

−−=n

Nnmn mxmxZ

1|)]1(sgn[)](sgn[|

Page 15: Digital Speech and Audio Processing E. Nemer Spring 2008€¦ · Digital Speech and Audio Processing E. Nemer Spring 2008 -3 Autocorrelation Function • The autocorrelation function

Digital Speech and Audio Processing Spring 2008 - 29E. Nemer

Window Effect

• Obvious advantage: localizing a part of speech for analysis.

• Less obvious disadvantage: how will the windowing affect the analysis results?– Compare the localization property of two extreme cases: N=1 vs. N=∞

» N=1: best in time but worst in frequency» N= ∞: best in frequency but worst in time

• Fundamentally speaking, windowing is the pursuit of a better tradeoff between time and frequency localization

Why do we window the signal :

Digital Speech and Audio Processing Spring 2008 - 30E. Nemer

Window Parameters

• Window length– Not too short: too few samples are not enough to resolve the uncertainty – Not too long: too many samples would introduce more uncertainty

• Window shape– Rectangular window is conceptually the simplest, but suffers from spectral

leakage problem

• It is always a tradeoff: there is no universally optimal window; which window is more effective depends on specific applications

Page 16: Digital Speech and Audio Processing E. Nemer Spring 2008€¦ · Digital Speech and Audio Processing E. Nemer Spring 2008 -3 Autocorrelation Function • The autocorrelation function

Digital Speech and Audio Processing Spring 2008 - 31E. Nemer

Rectangular Window

Digital Speech and Audio Processing Spring 2008 - 32E. Nemer

Various Windows

• Hamming/Hanning window – Also known as raised Cosine window

• Blackman window– Based on two cosine terms

• Bartlett window– Also known as triangular window

• Kaiser-Bessel window– Based on Bessel functions

Page 17: Digital Speech and Audio Processing E. Nemer Spring 2008€¦ · Digital Speech and Audio Processing E. Nemer Spring 2008 -3 Autocorrelation Function • The autocorrelation function

Digital Speech and Audio Processing Spring 2008 - 33E. Nemer

Hanning / Hamming Windows

)2cos(*5.05.0)(NnnW π

+=N-point Hann window

)2cos(*46.054.0)(NnnW π

+=N-point Hamming window

)2cos(*)1()(NnaanW π

−+=N-point raised cosine window

Digital Speech and Audio Processing Spring 2008 - 34E. Nemer

Hamming

Page 18: Digital Speech and Audio Processing E. Nemer Spring 2008€¦ · Digital Speech and Audio Processing E. Nemer Spring 2008 -3 Autocorrelation Function • The autocorrelation function

Digital Speech and Audio Processing Spring 2008 - 35E. Nemer

Blackman Window

)2cos(08.0)cos(5.042.0)(Nn

NnnW ππ ++=

Digital Speech and Audio Processing Spring 2008 - 36E. Nemer

Barlett Window

NNnnW |2/|1)( −

−=

Page 19: Digital Speech and Audio Processing E. Nemer Spring 2008€¦ · Digital Speech and Audio Processing E. Nemer Spring 2008 -3 Autocorrelation Function • The autocorrelation function

Digital Speech and Audio Processing Spring 2008 - 37E. Nemer

Kaiser-Bessel Window

zero order modified Bessel function of the first kind

B determines the tradeoff between the main lobe and sidelobesand is often specified as a half integer multiple of pi

Digital Speech and Audio Processing Spring 2008 - 38E. Nemer

Kaiser-Bessel Window

Page 20: Digital Speech and Audio Processing E. Nemer Spring 2008€¦ · Digital Speech and Audio Processing E. Nemer Spring 2008 -3 Autocorrelation Function • The autocorrelation function

Digital Speech and Audio Processing Spring 2008 - 39E. Nemer

Pitch Period Estimation

P

PP

Digital Speech and Audio Processing Spring 2008 - 40E. Nemer

• How to estimate the period for sinusoids?

– Fourier Transform (will be covered by the next lectures)

• In the time domain, we might consider

– Autocorrelation function – recall its periodic property

– Average Magnitude Difference Function (AMDF) – essentially follow the same motivation but computationally more efficient

Pitch Period Estimation

Page 21: Digital Speech and Audio Processing E. Nemer Spring 2008€¦ · Digital Speech and Audio Processing E. Nemer Spring 2008 -3 Autocorrelation Function • The autocorrelation function

Digital Speech and Audio Processing Spring 2008 - 41E. Nemer

P1 P2P3

P=Average(P1,P2,…)

Pitch Period Estimation

Digital Speech and Audio Processing Spring 2008 - 42E. Nemer

Pitch Estimation using prediction

We assume the signal is periodic (i.e. repeats itself) , and let L be the estimate of the pitch period.

We then ‘predict’ the current period, from the previous one.

And the prediction error, for the estimate L is then

)()( Lnxcnx −⋅=)

Page 22: Digital Speech and Audio Processing E. Nemer Spring 2008€¦ · Digital Speech and Audio Processing E. Nemer Spring 2008 -3 Autocorrelation Function • The autocorrelation function

Digital Speech and Audio Processing Spring 2008 - 43E. Nemer

)()()()()( Lnxcnxnxnxne −⋅−=−= )

[ ] ( )[ ]22 )()()( LnxcnxEneE −⋅−=

[ ] [ ] [ ] [ ])()(2)()()( 2222 LnxnxEcLnxEcnxEneE −⋅−−+=

[ ] [ ] [ ])()(2)(2)( 22 LnxnxELnxcEneEc

−−−=∂∂

[ ] 0)(2 =∂∂ neEc

[ ][ ] ),(

),0()(

)()(2 LLR

LRLnxELnxnxEc

x

x=−−

=

Pitch Estimation using prediction

Energy Of error

MinimizeEnergy Of error

Digital Speech and Audio Processing Spring 2008 - 44E. Nemer

• For a given range of estimates for the pitch L = L1 to L2

• Compute the autocorrelation at lags 0, L

• Compute the optimum gain c, and the prediction error:

• Choose the value of L (pitch) that has the minimum error.

),(),0(LLRLRc

x

x=

),(),0()0,0(

2

LLRLRRMinErr

x

xx −=

Pitch Estimation using prediction