Digital Speech and Audio Processing, Spring 2008, E. Nemer


1. Autocorrelation Function

• Deterministic Signal

• Voiced speech

• Random Signal

• Unvoiced Speech

2. Average Magnitude Difference Function

3. Windowing issues.

4. Pitch Estimation


Autocorrelation Function

• The autocorrelation function of a deterministic signal x(n) is defined by:

$$\phi(k) = \sum_{m=-\infty}^{\infty} x(m)\, x(m+k)$$

• In practice, we window the signal so that the sum above can be computed. Assuming a rectangular window of length N, the simple form of the autocorrelation is:

$$\phi(k) = \sum_{m=0}^{N-1-k} x(m)\, x(m+k)$$


[Figure: the original signal x(m) and shifted versions x(m-k)]

• Consider the simple case of a sinusoidal signal:

$$x(m) = A \cos\left(2\pi \frac{f_0}{F_s}\, m + \theta\right), \qquad \text{period } P = \frac{F_s}{f_0}$$



[Figure: the sinusoid, a shifted copy, and the resulting autocorrelation for lags -25 to 25. Each multiply-add of the signal with one shifted copy gives one point of the autocorrelation.]

$$\phi(k) = \sum_{m=-\infty}^{\infty} x(m)\, x(m+k)$$


Properties of autocorrelation function

• $\phi(k) = \phi(k+P)$ for periodic functions (period = P)

• $\phi(k) = \phi(-k)$ (symmetry)

• $\phi(k) \le \phi(0), \ \forall k$ (follows from Cauchy's inequality); $\phi(0)$ is the energy of x(n)
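The properties listed for the autocorrelation function can be checked numerically on a sinusoid. The following is a minimal sketch, not taken from the slides: the values Fs = 1000 Hz, f0 = 100 Hz, the phase 0.3 and the record length N = 200 are assumptions chosen for illustration. Because the sum runs over a finite record, periodicity shows up as a local peak at lag P rather than as an exact repetition.

```python
import math

def autocorr(x, k):
    """phi(k) = sum_m x(m) x(m+k), with x taken as zero outside 0..N-1."""
    n = len(x)
    return sum(x[m] * x[m + k] for m in range(max(0, -k), min(n, n - k)))

# Assumed example values (not from the slides).
Fs, f0, A, theta, N = 1000.0, 100.0, 1.0, 0.3, 200
P = round(Fs / f0)  # period in samples
x = [A * math.cos(2 * math.pi * f0 / Fs * m + theta) for m in range(N)]

# Autocorrelation for lags -25..25, as in the plots.
phi = {k: autocorr(x, k) for k in range(-25, 26)}

# The largest value among positive lags 1..15 should sit at the period P.
peak_lag = max(range(1, 16), key=lambda k: phi[k])
print("period P =", P, "peak lag =", peak_lag)
```

The dictionary `phi` can also be used to confirm symmetry (`phi[k]` against `phi[-k]`) and that no lag exceeds `phi[0]`.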


[Figure: autocorrelation of the sinusoid for lags -25 to 25]

Example for Voiced Signal

[Figure: a voiced speech frame and its autocorrelation Rn(k), which shows periodic peaks]


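Power Spectral Density

The autocorrelation function and the power spectral density form a Fourier transform pair:

$$R_X(\tau) \ \xrightarrow{\ \mathcal{F}\ }\ S_X(f), \qquad S_X(f) \ \xrightarrow{\ \mathcal{F}^{-1}\ }\ R_X(\tau)$$

$$S_X(f) = \int_{-\infty}^{\infty} R_X(\tau)\, e^{-j 2\pi f \tau}\, d\tau$$

$$R_X(\tau) = \int_{-\infty}^{\infty} S_X(f)\, e^{j 2\pi f \tau}\, df$$

$$P = E\left[X^2(t)\right] = R_X(0) = \int_{-\infty}^{\infty} S_X(f)\, df$$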


• A more reasonable definition of the autocorrelation uses the pair of windowed signals:

$$R_n(k) = \sum_{m=-\infty}^{\infty} \underbrace{x(m)\, w(n-m)}_{\text{window for } x(m)}\ \underbrace{x(m+k)\, w(n-k-m)}_{\text{window for } x(m+k)}$$

• It is easy to show that:

$$R_n(k) = R_n(-k) = \sum_{m=-\infty}^{\infty} x(m)\, x(m-k)\, w(n-m)\, w(n+k-m)$$



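• Another way to view the windowed autocorrelation is as the product of the speech signal with its delayed version, passed through a filter with impulse response $h_k(n)$:

$$R_n(k) = \sum_{m=-\infty}^{\infty} \left[x(m)\, x(m-k)\right] h_k(n-m), \qquad h_k(n) = w(n)\, w(n+k)$$

[Block diagram: x(n) and its copy delayed by k, x(n-k), are multiplied; the product is filtered by h_k(n) to give R_n(k)]

Substituting $h_k$:

$$R_n(k) = \sum_{m=-\infty}^{\infty} \left[x(m)\, x(m-k)\right] \left[w(n-m)\, w(n+k-m)\right]$$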


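With a window of length N, the sum reduces to a finite range:

$$R_n(k) = \sum_{m=n-N+1}^{n-k} x(m)\, w(n-m)\, x(m+k)\, w(n-k-m)$$

[Figure: the window w(n-m) spans m = n-N+1 to n; for n = 300 it covers m = 201 to 300, with lags k = 0, ..., 50]

e.g. N = 100, n = 300:

$$R_n(k) = \sum_{m=201}^{300-k} x(m)\, w(n-m)\, x(m+k)\, w(n-m-k)$$

$$R_n(1) = x(201)\,w(99)\,x(202)\,w(98) + \dots + x(299)\,w(1)\,x(300)\,w(0)$$

$$R_n(50) = x(201)\,w(99)\,x(251)\,w(49) + \dots + x(250)\,w(50)\,x(300)\,w(0)$$

Total number of products in the sum: N - k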
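The finite-range form of the windowed autocorrelation can be sketched directly. This is an illustrative sketch only: the test signal (a cosine with a 25-sample period) and the rectangular window are assumptions, while N = 100 and n = 300 follow the worked example. The function also returns the number of products so that the N - k count can be observed.

```python
import math

def short_time_autocorr(x, w, n, k):
    """R_n(k) = sum_{m=n-N+1}^{n-k} x(m) w(n-m) x(m+k) w(n-k-m).

    Returns the sum and the number of products that entered it.
    """
    N = len(w)
    products = [
        x[m] * w[n - m] * x[m + k] * w[n - k - m]
        for m in range(n - N + 1, n - k + 1)
    ]
    return sum(products), len(products)

N, n = 100, 300
x = [math.cos(2 * math.pi * m / 25) for m in range(400)]  # assumed test signal
w = [1.0] * N                                             # rectangular window (assumed)

r0, count0 = short_time_autocorr(x, w, n, 0)
r1, count1 = short_time_autocorr(x, w, n, 1)
r50, count50 = short_time_autocorr(x, w, n, 50)
print(count0, count1, count50)
```

With a rectangular window, `r0` is simply the energy of the samples x(201) to x(300).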


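Recall the simple definition of the autocorrelation (not windowed). Two estimates differ only in their normalization:

Unbiased:

$$R_{un}(k) = \frac{1}{N-k} \sum_{m=0}^{N-1-k} x(m)\, x(m+k)$$

Biased:

$$R_{b}(k) = \frac{1}{N} \sum_{m=0}^{N-1-k} x(m)\, x(m+k)$$

$$R_{b}(k) = \left(1 - \frac{k}{N}\right) R_{un}(k)$$

The second estimate is biased for finite N, but asymptotically unbiased as $N \to \infty$.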
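The difference between the two normalizations is easiest to see on a constant signal. In the sketch below (the signal x(m) = 1 with N = 20 is an assumed example), the 1/(N-k) estimate stays at 1 for every lag, while the 1/N estimate decays linearly as 1 - k/N.

```python
def autocorr_unbiased(x, k):
    """(1 / (N - k)) * sum_{m=0}^{N-1-k} x(m) x(m+k)."""
    N = len(x)
    return sum(x[m] * x[m + k] for m in range(N - k)) / (N - k)

def autocorr_biased(x, k):
    """(1 / N) * sum_{m=0}^{N-1-k} x(m) x(m+k)."""
    N = len(x)
    return sum(x[m] * x[m + k] for m in range(N - k)) / N

x = [1.0] * 20  # assumed example: constant signal, N = 20
unbiased = [autocorr_unbiased(x, k) for k in range(10)]
biased = [autocorr_biased(x, k) for k in range(10)]
print(unbiased[:4], biased[:4])
```

The triangular taper of the biased estimate is the price paid for a lower variance at large lags, where only a few products are available.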


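Autocorrelation voiced speech (female)

[Figure: voiced speech from a female speaker and its autocorrelation; peaks are spaced by the pitch period T0, with F0 = 1/T0]

Autocorrelation voiced speech (male)

[Figure: voiced speech from a male speaker and its autocorrelation; peaks are spaced by the pitch period T0, with an annotation at 3 T0]

[Figure: sixteen.wav, and sixteen.wav filtered at 600 Hz]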


The case of a random process

[Figure: an ensemble of sample functions]


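When X(t) is a stationary random process, then:

The mean of X(t) is

$$\mu_X(t) = E\left[X(t)\right] = \int_{-\infty}^{\infty} x\, f_{X(t)}(x)\, dx = \mu_X$$

where $f_{X(t)}(x)$ is the probability density function (pdf).

The autocorrelation function of X(t) is

$$\begin{aligned}
R_X(t_1, t_2) &= E\left[X(t_1)\, X(t_2)\right] \\
&= \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} x_1 x_2\, f_{X(t_1) X(t_2)}(x_1, x_2)\, dx_1\, dx_2 \\
&= \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} x_1 x_2\, f_{X(0) X(t_2 - t_1)}(x_1, x_2)\, dx_1\, dx_2 \\
&= R_X(t_2 - t_1)
\end{aligned}$$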



For convenience of notation, we redefine

$$R_X(\tau) = E\left[X(t)\, X(t-\tau)\right], \quad \text{for all } t$$

1. The mean-square value: $R_X(0) = E\left[X^2(t)\right]$, $\tau = 0$

2. The autocorrelation is symmetrical: $R_X(\tau) = R_X(-\tau)$

3. The largest value is at lag 0: $R_X(\tau) \le R_X(0)$



$R_X(\tau)$ measures the interdependence of two random variables obtained from X(t) at times τ seconds apart.



Random process : sinusoid with random phase

$$X(t) = \cos(2\pi \cdot 4t + \Theta), \qquad \Theta \sim U(-\pi, \pi)$$

$$f_\Theta(\theta) = \begin{cases} \dfrac{1}{2\pi}, & -\pi \le \theta \le \pi \\ 0, & \text{elsewhere} \end{cases}$$

[Figure: four realizations of the sinusoid with continuous random phase, for t from 0 to 1, and the uniform pdf of Θ on (-π, π)]

More generally, for

$$X(t) = A \cos(2\pi f_c t + \Theta)$$

the autocorrelation depends only on the lag τ:

$$R_X(\tau) = E\left[X(t+\tau)\, X(t)\right] = \frac{A^2}{2} \cos(2\pi f_c \tau)$$
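The closed form for the random-phase sinusoid, (A²/2) cos(2π f_c τ), can be checked by averaging over an ensemble of phases. The sketch below is a Monte Carlo illustration under assumed values (A = 2, f_c = 4 Hz, τ = 0.05 s, 20000 realizations, fixed seed); it evaluates the ensemble average at two different times t to show that the result does not depend on t.

```python
import math
import random

def ensemble_autocorr(A, fc, t, tau, trials, rng):
    """Average X(t + tau) * X(t) over realizations of a uniform random phase."""
    acc = 0.0
    for _ in range(trials):
        theta = rng.uniform(-math.pi, math.pi)
        x_later = A * math.cos(2 * math.pi * fc * (t + tau) + theta)
        x_now = A * math.cos(2 * math.pi * fc * t + theta)
        acc += x_later * x_now
    return acc / trials

rng = random.Random(0)            # fixed seed so the run is repeatable
A, fc, tau = 2.0, 4.0, 0.05       # assumed example values

estimate_t0 = ensemble_autocorr(A, fc, 0.0, tau, 20000, rng)
estimate_t1 = ensemble_autocorr(A, fc, 0.37, tau, 20000, rng)
theory = A * A / 2 * math.cos(2 * math.pi * fc * tau)
print(estimate_t0, estimate_t1, theory)
```

Both estimates should agree with the theoretical value to within the Monte Carlo error, which shrinks as the number of realizations grows.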


Random process : white noise

N=1000; X=randn(1,N);      % Gaussian white noise, N samples
m=50; Rx=Rx_est(X,m);      % Rx_est (not shown): autocorrelation estimate at lags -m..m
figure; plot(X)
figure; plot([-m:m],Rx)

[Figure: Gaussian random process, 1000 samples]


Power Spectral Density

Sx_av=zeros(1,2*m+1);
for j=1:100,
  X=randn(1,N);                 % assumed: a new realization on each pass
  Rx=Rx_est(X,m);
  Sx=fftshift(abs(fft(Rx)));
  Sx_av=Sx_av+Sx;               % accumulate inside the loop
end;
Sx_av=Sx_av/100;

[Figure: Gaussian random process, 1000 samples]

[Figure: autocorrelation function for lags -50 to 50]

[Figure: power spectral density versus f, for f from -0.5 to 0.5]
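The white-noise experiment can also be sketched in plain Python. Everything here is an assumption made for illustration: `rx_est` is a hypothetical stand-in for the course helper Rx_est (a biased estimate at lags -m..m), the sizes N = 500 and 50 realizations are smaller than in the slides to keep the run short, and the averaging is done on the autocorrelation before the transform rather than on the magnitude of each transform.

```python
import math
import random

def rx_est(x, m):
    """Biased autocorrelation estimate at lags -m..m (hypothetical stand-in for Rx_est)."""
    N = len(x)
    pos = [sum(x[i] * x[i + k] for i in range(N - k)) / N for k in range(m + 1)]
    return pos[:0:-1] + pos  # mirror lags m..1 in front of lags 0..m

def psd_from_autocorr(r, m, freqs):
    """S(f) = sum_k R(k) cos(2 pi f k); the sine terms cancel because R(k) = R(-k)."""
    return [
        sum(r[k + m] * math.cos(2 * math.pi * f * k) for k in range(-m, m + 1))
        for f in freqs
    ]

rng = random.Random(0)          # fixed seed so the run is repeatable
N, m, trials = 500, 50, 50      # assumed sizes

r_av = [0.0] * (2 * m + 1)
for _ in range(trials):
    x = [rng.gauss(0.0, 1.0) for _ in range(N)]   # unit-variance white Gaussian noise
    r_av = [a + b for a, b in zip(r_av, rx_est(x, m))]
r_av = [a / trials for a in r_av]

freqs = [i / 100 - 0.5 for i in range(101)]       # normalized frequency, -0.5..0.5
s_av = psd_from_autocorr(r_av, m, freqs)
print("R(0) =", r_av[m], "min/max S(f) =", min(s_av), max(s_av))
```

The averaged autocorrelation is close to an impulse at lag 0, and the resulting spectrum is roughly flat, which is what defines white noise.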


Autocorrelation – unvoiced speech


Average Magnitude Difference Function

• Instead of averaging the magnitude of the signal, we average the magnitude of the difference between the signal and its delayed version

• Operator T: T(x) is the cascade of a linear filter $H(z) = 1 - z^{-p}$ and the absolute value |x|

• It will be used in a computationally efficient solution to pitch estimation

$$\Delta M_n(p) = \sum_{m=n-N+1}^{n} \left| x(m) - x(m-p) \right|$$
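The AMDF needs only subtractions and absolute values, which is where its efficiency comes from. A minimal sketch follows, assuming a cosine with a period of 10 samples and a frame of N = 100 samples ending at n = 299 (all assumed values). For a periodic signal the AMDF dips to nearly zero when the lag p equals the period.

```python
import math

def amdf(x, n, N, p):
    """Delta M_n(p) = sum_{m=n-N+1}^{n} |x(m) - x(m-p)|."""
    return sum(abs(x[m] - x[m - p]) for m in range(n - N + 1, n + 1))

P = 10                                                    # assumed period in samples
x = [math.cos(2 * math.pi * m / P) for m in range(300)]   # assumed test signal
n, N = 299, 100

values = {p: amdf(x, n, N, p) for p in range(1, 16)}
best_p = min(values, key=values.get)   # lag with the deepest dip
print("AMDF minimum at lag", best_p)
```

Note that the pitch shows up as a minimum here, whereas in the autocorrelation it shows up as a maximum.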


AMDF vs. autocorrelation

[Figure: a voiced speech frame, its AMDF, and its autocorrelation]


Summary

• Short-time energy

$$E_n = \sum_{m=n-N+1}^{n} x^2(m)$$

• Short-time amplitude

$$M_n = \sum_{m=n-N+1}^{n} |x(m)|$$

• Short-time average zero-cross (ZC) rate

$$Z_n = \sum_{m=n-N+1}^{n} \left| \operatorname{sgn}[x(m)] - \operatorname{sgn}[x(m-1)] \right|$$

• Short-time amplitude difference

$$\Delta M_n(p) = \sum_{m=n-N+1}^{n} |x(m) - x(m-p)|$$

• Short-time auto-correlation

$$R_n(k) = \sum_{m=-\infty}^{\infty} x(m)\, w(n-m)\, x(m+k)\, w(n-k-m)$$
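The first three short-time measures in the summary share the same frame, m = n-N+1 to n, and can be sketched together. The alternating test signal and the convention sgn(0) = +1 are assumptions made for this example; the slides do not state how sgn treats zero.

```python
def sgn(v):
    """Sign function; sgn(0) = +1 is an assumed convention."""
    return 1 if v >= 0 else -1

def short_time_energy(x, n, N):
    return sum(x[m] ** 2 for m in range(n - N + 1, n + 1))

def short_time_magnitude(x, n, N):
    return sum(abs(x[m]) for m in range(n - N + 1, n + 1))

def short_time_zero_cross(x, n, N):
    # Each sign change between x(m-1) and x(m) contributes 2 to the sum.
    return sum(abs(sgn(x[m]) - sgn(x[m - 1])) for m in range(n - N + 1, n + 1))

x = [(-1.0) ** m for m in range(10)]   # assumed test signal: +1, -1, +1, ...
n, N = 9, 5                            # frame covers m = 5..9

E = short_time_energy(x, n, N)
M = short_time_magnitude(x, n, N)
Z = short_time_zero_cross(x, n, N)
print(E, M, Z)
```

For this alternating signal every sample in the frame is a sign change, so the zero-cross sum reaches its maximum of 2N.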


Window Effect

Why do we window the signal:

• Obvious advantage: localizing a part of speech for analysis.

• Less obvious disadvantage: how will the windowing affect the analysis results?
  – Compare the localization property of two extreme cases: N=1 vs. N=∞
    » N=1: best in time but worst in frequency
    » N=∞: best in frequency but worst in time

• Fundamentally speaking, windowing is the pursuit of a better tradeoff between time and frequency localization


Window Parameters

• Window length
  – Not too short: too few samples are not enough to resolve the uncertainty
  – Not too long: too many samples would introduce more uncertainty

• Window shape
  – Rectangular window is conceptually the simplest, but suffers from the spectral leakage problem

• It is always a tradeoff: there is no universally optimal window; which window is more effective depends on the specific application


Rectangular Window


Various Windows

• Hamming/Hanning window
  – Also known as raised cosine window

• Blackman window
  – Based on two cosine terms

• Bartlett window
  – Also known as triangular window

• Kaiser-Bessel window
  – Based on Bessel functions


Hanning / Hamming Windows

N-point Hann window:

$$W(n) = 0.5 + 0.5 \cos\left(\frac{2\pi n}{N}\right)$$

N-point Hamming window:

$$W(n) = 0.54 + 0.46 \cos\left(\frac{2\pi n}{N}\right)$$

N-point raised cosine window:

$$W(n) = a + (1-a) \cos\left(\frac{2\pi n}{N}\right)$$
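The Hann and Hamming windows are the raised cosine window with a = 0.5 and a = 0.54. The sketch below evaluates the raised cosine formula under one assumption that the slides do not state: the index n is centered on zero and runs from -N/2 to N/2, so the cosine peaks in the middle of the window.

```python
import math

def raised_cosine(N, a):
    """W(n) = a + (1 - a) cos(2 pi n / N) for n = -N/2 .. N/2 (centered index assumed)."""
    half = N // 2
    return [a + (1 - a) * math.cos(2 * math.pi * n / N) for n in range(-half, half + 1)]

N = 8                              # assumed small even length for display
hann = raised_cosine(N, 0.5)       # a = 0.5
hamming = raised_cosine(N, 0.54)   # a = 0.54
print([round(v, 3) for v in hann])
print([round(v, 3) for v in hamming])
```

Under this indexing the Hann window falls to zero at both ends, while the Hamming window stops at 0.08, which trades a slower sidelobe decay for a lower first sidelobe.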


Hamming


Blackman Window

$$W(n) = 0.42 + 0.5 \cos\left(\frac{\pi n}{N}\right) + 0.08 \cos\left(\frac{2\pi n}{N}\right)$$


Bartlett Window

$$W(n) = 1 - \frac{|n - N/2|}{N}$$


Kaiser-Bessel Window

The window is built from the zero-order modified Bessel function of the first kind.

The parameter B determines the tradeoff between the main lobe and the sidelobes, and is often specified as a half-integer multiple of π.


Kaiser-Bessel Window


Pitch Period Estimation

[Figure: a periodic waveform with the period P marked on successive cycles]


• How to estimate the period for sinusoids?

– Fourier Transform (will be covered by the next lectures)

• In the time domain, we might consider

– Autocorrelation function – recall its periodic property

– Average Magnitude Difference Function (AMDF) – essentially follow the same motivation but computationally more efficient



[Figure: successive periods P1, P2, P3 marked on the waveform]

P = Average(P1, P2, …)
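The periodic property of the autocorrelation gives a simple time-domain pitch estimator: search a range of lags and take the one with the largest autocorrelation. The sketch below is an illustration under assumed values, not the course's implementation: a synthetic signal made of 10 equal harmonics with a period of 80 samples, a sampling rate of 8000 Hz, a 400-sample frame, and a search range of 20 to 160 samples.

```python
import math

def autocorr(x, k):
    """phi(k) = sum_{m=0}^{N-1-k} x(m) x(m+k)."""
    return sum(x[m] * x[m + k] for m in range(len(x) - k))

def estimate_pitch_period(x, k_min, k_max):
    """Return the lag in k_min..k_max with the largest autocorrelation."""
    return max(range(k_min, k_max + 1), key=lambda k: autocorr(x, k))

Fs = 8000.0      # assumed sampling rate in Hz
P_true = 80      # assumed pitch period in samples (100 Hz)

# Assumed voiced-like test signal: 10 harmonics of equal amplitude.
x = [
    sum(math.cos(2 * math.pi * h * m / P_true) for h in range(1, 11))
    for m in range(400)
]

P_est = estimate_pitch_period(x, 20, 160)
F0_est = Fs / P_est
print("estimated period:", P_est, "samples; F0 =", F0_est, "Hz")
```

Because the finite sum has fewer products at larger lags, the peak at one period is higher than the peak at two periods, which helps avoid picking a multiple of the true period.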



Pitch Estimation using prediction

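We assume the signal is periodic (i.e. repeats itself), and let L be the estimate of the pitch period.

We then 'predict' the current period from the previous one:

$$\hat{x}(n) = c \cdot x(n-L)$$

The prediction error for the estimate L is then:

$$e(n) = x(n) - \hat{x}(n) = x(n) - c \cdot x(n-L)$$

Energy of error:

$$E\left[e^2(n)\right] = E\left[\left(x(n) - c \cdot x(n-L)\right)^2\right]$$

$$E\left[e^2(n)\right] = E\left[x^2(n)\right] + c^2 E\left[x^2(n-L)\right] - 2c\, E\left[x(n)\, x(n-L)\right]$$

Minimize energy of error:

$$\frac{\partial}{\partial c} E\left[e^2(n)\right] = 2c\, E\left[x^2(n-L)\right] - 2 E\left[x(n)\, x(n-L)\right]$$

$$\frac{\partial}{\partial c} E\left[e^2(n)\right] = 0$$

$$c = \frac{E\left[x(n)\, x(n-L)\right]}{E\left[x^2(n-L)\right]} = \frac{R_x(0, L)}{R_x(L, L)}$$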


• For a given range of estimates for the pitch, L = L1 to L2:

• Compute the autocorrelation at lags 0, L

• Compute the optimum gain c, and the prediction error:

$$c = \frac{R_x(0, L)}{R_x(L, L)}$$

$$\text{MinErr} = R_x(0, 0) - \frac{R_x(0, L)^2}{R_x(L, L)}$$

• Choose the value of L (pitch) that has the minimum error.
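The search over L can be sketched as follows. This is a minimal illustration under stated assumptions: the expectation is replaced by a sum over one analysis frame, so that R_x(i, j) is the sum of x(n-i) x(n-j) over the frame; the test signal (10 equal harmonics, period 80 samples), the frame (160 samples starting at n = 200) and the search range L = 20 to 120 are all assumed values.

```python
import math

def corr(x, i, j, start, N):
    """R_x(i, j): sum of x(n-i) x(n-j) over the frame n = start .. start+N-1."""
    return sum(x[n - i] * x[n - j] for n in range(start, start + N))

def pitch_by_prediction(x, L1, L2, start, N):
    """Return (L, c, err) for the lag in L1..L2 with the smallest prediction error."""
    r00 = corr(x, 0, 0, start, N)
    best = None
    for L in range(L1, L2 + 1):
        rLL = corr(x, L, L, start, N)
        if rLL == 0.0:
            continue                      # the gain is undefined for an all-zero past frame
        r0L = corr(x, 0, L, start, N)
        c = r0L / rLL                     # optimum gain for this L
        err = r00 - r0L * r0L / rLL       # minimum error energy for this L
        if best is None or err < best[2]:
            best = (L, c, err)
    return best

P_true = 80  # assumed pitch period in samples
x = [
    sum(math.cos(2 * math.pi * h * m / P_true) for h in range(1, 11))
    for m in range(400)
]

L_est, c_est, err_est = pitch_by_prediction(x, 20, 120, start=200, N=160)
print("L =", L_est, "c =", c_est, "error =", err_est)
```

For an exactly periodic signal the previous period predicts the current one perfectly, so the gain at the true period is 1 and the error is zero up to rounding. For a pure sinusoid a lag of half a period would also give zero error with c = -1, which is why the test signal here contains several harmonics.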

