digital speech and audio processing e. nemer spring 2008€¦ · digital speech and audio...
TRANSCRIPT
Digital Speech and Audio Processing Spring 2008 - 1E. Nemer
Digital Speech and Audio Processing Spring 2008 - 2E. Nemer
1. Autocorrelation Function
• Deterministic Signal
• Voiced speech
• Random Signal
• Unvoiced Speech
2. Average Magnitude Difference Function
3. Windowing issues.
4. Pitch Estimation
Digital Speech and Audio Processing Spring 2008 - 3E. Nemer
Autocorrelation Function
• The autocorrelation function of a deterministic signal x(n), is defined by:
∑∞
−∞=
+=m
kmxmxk )()()(φ
∑−−=
=
+=1
0)()()(
kNn
nkmxmxkφ
• In practice, we would window the signal so that we can compute the above sum . Assume a rectangular window of length N, then the simple form of the autocorrelation is :
Digital Speech and Audio Processing Spring 2008 - 4E. Nemer
0 1 2 3 4 5 6 7 8 9 1 0-1
-0 .5
0
0 .5
1
0 1 2 3 4 5 6 7 8 9 1 0-1
-0 .5
0
0 .5
1
0 1 2 3 4 5 6 7 8 9 1 0-1
-0 .5
0
0 .5
1
Original x(m)
Shifted versions x(m-k)
• Consider the simple case of a sinusoidal signal :
⎟⎠⎞
⎜⎝⎛ +⋅= θπ m
FsfAmx 02cos)(
0fFsPperiod =
Autocorrelation Function
Digital Speech and Audio Processing Spring 2008 - 5E. Nemer
- 2 5 - 2 0 - 1 5 - 1 0 - 5 0 5 1 0 1 5 2 0 2 5- 6 0
- 4 0
- 2 0
0
2 0
4 0
6 0
0 1 2 3 4 5 6 7 8 9 1 0- 1
- 0 . 5
0
0 . 5
1
0 1 2 3 4 5 6 7 8 9 1 0- 1
- 0 . 5
0
0 . 5
1
Autocorrelation Function
∑∞
−∞=
+=m
kmxmxk )()()(φ
Multiply-addGet one point
Digital Speech and Audio Processing Spring 2008 - 6E. Nemer
)()( Pkk +=φφ for periodic functions (period=P)
)()( kk −=φφ symetry
kk ∀≤ ),0()( φφφ(0) is the energy of x(n)
follows Cauchy’s inequality
Properties of autocorrelation function
Autocorrelation Function
- 2 5 - 2 0 - 1 5 - 1 0 - 5 0 5 1 0 1 5 2 0 2 5- 6 0
- 4 0
- 2 0
0
2 0
4 0
6 0
Digital Speech and Audio Processing Spring 2008 - 7E. Nemer
0 50 100 150-5
0
5
10x 108
0 50 100 150 200 250 300 350-4000
-3000
-2000
-1000
0
1000
2000
3000
4000
Periodic peaks
Rn(k)
Example for Voiced Signal
Digital Speech and Audio Processing Spring 2008 - 8E. Nemer
)( )(
Density SpectralPower Function ation Autocorrel
fSR XX τ F
)( )( fSR XX τ 1−F
∫∞
∞−
−= ττ τπ deRfS fjXX
2)()(
∫∞
∞−
= dfefSR fjXX
τπτ 2)()(
[ ] )0()( 2XRtXEP == [ ] ∫
∞
∞−
== dffStXEP X )()( 2
Power Spectral Density
Digital Speech and Audio Processing Spring 2008 - 9E. Nemer
• A more reasonable definition of the auto-correlation is expressed using the pair of windowed signals :
∑∞
−∞=
−−+−=m
n mknwkmxmnwmxkR )()()()()(
window for x(m) window for x(m+k)
∑∞
−∞=
−+−−=−=m
nn mknwkmxmnwmxkRkR )()()()()()(
Easy to show
Autocorrelation Function
Digital Speech and Audio Processing Spring 2008 - 10E. Nemer
• Another way to view the windowed autocorrelation is by considering the product of the speech signal with its delayed version, passed thru a filter whose response is :
[ ]∑∞
−∞=
−⋅−=m
kn mnhkmxmxkR )()()()(
)()()( knwnwnhk +=
Delay k
)(nhk)(nx
)( knx −)(kRn
Autocorrelation Function
[ ] [ ]∑∞
−∞=
−+−⋅−=m
n mknwmnwkmxmxkR )()()()()(
Digital Speech and Audio Processing Spring 2008 - 11E. Nemer
∑−
+−=
−−+−=kn
Nnmn mknwkmxmnwmxkR
1
)()()()()(
201 n = 300
50,,0 L=k
1+− Nn nm
)( mnw −
Autocorrelation Function
e.g. N = 100n = 300
Digital Speech and Audio Processing Spring 2008 - 12E. Nemer
∑−
=
−−+−=k
mn kmnwkmxmnwmxkR
300
201
)()()()()(
)0()300()1()299(.......)98()202()99()201()1(
wxwxwxwxRn +=
Autocorrelation Function
)0()300()50()250(.......)98()202()99()201()50(
wxwxwxwxRn +=
Total number of products in the sum : N – k
Digital Speech and Audio Processing Spring 2008 - 13E. Nemer
∑−−=
=
+−
=1
0)()(1)(
kNm
mun kmxmx
kNkR
∑−−=
=
+=1
0)()(1)(
kNm
mb kmxmx
NkR
)(1)( kRNkkR unbau ⎟⎠⎞
⎜⎝⎛ −=
unbiased
biased
Biased for finite N, but Asymptotically unbiased for
∞→N
Autocorrelation Function
Recall the simple definition of the autocorrelation (not windowed)
Digital Speech and Audio Processing Spring 2008 - 14E. Nemer
Autocorrelation voiced speech
00
1T
F =
0T
0TFemale
Digital Speech and Audio Processing Spring 2008 - 15E. Nemer
0T
0T
00 3
13T
F =
Autocorrelation voiced speech Male
Digital Speech and Audio Processing Spring 2008 - 16E. Nemer
Sixteen.wav filtered at 600Hz
sixteen.wav
Digital Speech and Audio Processing Spring 2008 - 17E. Nemer
An ensemble of sample functions:
The case of a random process
Digital Speech and Audio Processing Spring 2008 - 18E. Nemer
When X(t) is a stationary random process , then :
The mean of X(t) is
fX(t)(x) : the probability density function (pdf)The autocorrelation function of X(t) is
[ ])()( tXEtX =μ
xxxf tX d )(
)(∫∞
∞−=
[ ]
)(
),(
),(
)()( )(R
12
-
- 2121)((0)21
-
- 2121)()(21
2121
12
21
ttR
dxdxxxfxx
dxdxxxfxx
tXtXE,tt
X
ttXX
tXtX
X
−=
=
=
=
∫ ∫∫ ∫∞
∞
∞
∞ −
∞
∞
∞
∞
Xμ=
The case of a random process
Digital Speech and Audio Processing Spring 2008 - 19E. Nemer
For convenience of notation , we redefine
1. The mean-square value
2. The autocorrelation is symetrical :
3. The largest value is at lag 0
[ ] allfor , )()()( tτtXtXERX −=τ
[ ] 0 τ, )((0) 2 == tXERX
τ)()( −= RRX τ
(0))( XX RR ≤τ
The case of a random process
Digital Speech and Audio Processing Spring 2008 - 20E. Nemer
The RX(τ) provides the interdependence information of two random variables obtained from X(t) at timesτ seconds apart
The case of a random process
Digital Speech and Audio Processing Spring 2008 - 21E. Nemer
),(~)42cos()( πππ −ΘΘ+= UttX
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1-1
0
1Realizations of Sinusoid of cont. random phase
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1-1
0
1
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1-1
0
1
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1-1
0
1
⎪⎩
⎪⎨⎧ ≤≤−=Θ
elsewhere,0
,21
)( πθππf θ
Θf
θπ− π
Random process : sinusoid with random phase
Digital Speech and Audio Processing Spring 2008 - 22E. Nemer
Θ)2cos()( += tπfAtX c
[ ] )2cos(2
)()()(2
tπfAtXτtXER cX =+=τ
Random process : sinusoid with random phase
Digital Speech and Audio Processing Spring 2008 - 23E. Nemer
N=1000;X=randn(1,N);
m=50;Rx=Rx_est(X,m);
plot(X)plot([-m:m],Rx)
0 100 200 300 400 500 600 700 800 900 1000-10
-8
-6
-4
-2
0
2
4
6
8
10Gaussian Random Process
Random process : white noise
Digital Speech and Audio Processing Spring 2008 - 24E. Nemer
Power Spectral Density
0 100 200 300 400 500 600 700 800 900 1000-10
-8
-6
-4
-2
0
2
4
6
8
10Gaussian Random Process
for j=1:100,Rx=Rx_est(X,m);Sx=fftshift(abs(fft(Rx)));
end;
Sx_av=Sx_av+Sx;
Sx_av=Sx_av/100;
-50 -40 -30 -20 -10 0 10 20 30 40 50-0.2
0
0.2
0.4
0.6
0.8
1
1.2Autocorrelation function
-0.5 -0.4 -0.3 -0.2 -0.1 0 0.1 0.2 0.3 0.4 0.50
0.2
0.4
0.6
0.8
1
1.2
1.4
1.6
1.8
2
f
Power Spectral Desity
Digital Speech and Audio Processing Spring 2008 - 25E. Nemer
Autocorrelation – unvoiced speech
Digital Speech and Audio Processing Spring 2008 - 26E. Nemer
Average Magnitude Difference Function
• Instead of taking average over magnitude, we do it over magnitude difference
• Operator T: T(x) is the concatenation of a linear filter H(z)=1-z-p
and taking absolute value |x|• It will be used in a computationally efficient solution to pitch
estimation
∑+−=
−−=Δn
Nnmn pmxmxpM
1
|)()(|)(
Digital Speech and Audio Processing Spring 2008 - 27E. Nemer
AMDF vs. autocorrelation
0 50 100 1500
1
2
3
4
5
6
7x 105
0 50 100 150 200 250 300 350-4000
-3000
-2000
-1000
0
1000
2000
3000
4000
0 5 0 1 0 0 1 5 0-5
0
5
1 0x 1 0 8
Digital Speech and Audio Processing Spring 2008 - 28E. Nemer
Summary
• Short-time energy
• Short-time amplitude
• Short-time average zero-cross (ZC) rate
• Short-time amplitude difference
• Short-time auto-correlation
∑+−=
=n
Nnmn mxE
1
2 )(
∑+−=
=n
Nnmn mxM
1
|)(|
∑+−=
−−=Δn
Nnmn pmxmxpM
1
|)()(|)(
∑∞
−∞=
−−+−=m
n mknwkmxmnwmxkR )()()()()(
∑+−=
−−=n
Nnmn mxmxZ
1|)]1(sgn[)](sgn[|
Digital Speech and Audio Processing Spring 2008 - 29E. Nemer
Window Effect
• Obvious advantage: localizing a part of speech for analysis.
• Less obvious disadvantage: how will the windowing affect the analysis results?– Compare the localization property of two extreme cases: N=1 vs. N=∞
» N=1: best in time but worst in frequency» N= ∞: best in frequency but worst in time
• Fundamentally speaking, windowing is the pursuit of a better tradeoff between time and frequency localization
Why do we window the signal :
Digital Speech and Audio Processing Spring 2008 - 30E. Nemer
Window Parameters
• Window length– Not too short: too few samples are not enough to resolve the uncertainty – Not too long: too many samples would introduce more uncertainty
• Window shape– Rectangular window is conceptually the simplest, but suffers from spectral
leakage problem
• It is always a tradeoff: there is no universally optimal window; which window is more effective depends on specific applications
Digital Speech and Audio Processing Spring 2008 - 31E. Nemer
Rectangular Window
Digital Speech and Audio Processing Spring 2008 - 32E. Nemer
Various Windows
• Hamming/Hanning window – Also known as raised Cosine window
• Blackman window– Based on two cosine terms
• Bartlett window– Also known as triangular window
• Kaiser-Bessel window– Based on Bessel functions
Digital Speech and Audio Processing Spring 2008 - 33E. Nemer
Hanning / Hamming Windows
)2cos(*5.05.0)(NnnW π
+=N-point Hann window
)2cos(*46.054.0)(NnnW π
+=N-point Hamming window
)2cos(*)1()(NnaanW π
−+=N-point raised cosine window
Digital Speech and Audio Processing Spring 2008 - 34E. Nemer
Hamming
Digital Speech and Audio Processing Spring 2008 - 35E. Nemer
Blackman Window
)2cos(08.0)cos(5.042.0)(Nn
NnnW ππ ++=
Digital Speech and Audio Processing Spring 2008 - 36E. Nemer
Barlett Window
NNnnW |2/|1)( −
−=
Digital Speech and Audio Processing Spring 2008 - 37E. Nemer
Kaiser-Bessel Window
zero order modified Bessel function of the first kind
B determines the tradeoff between the main lobe and sidelobesand is often specified as a half integer multiple of pi
Digital Speech and Audio Processing Spring 2008 - 38E. Nemer
Kaiser-Bessel Window
Digital Speech and Audio Processing Spring 2008 - 39E. Nemer
Pitch Period Estimation
P
PP
Digital Speech and Audio Processing Spring 2008 - 40E. Nemer
• How to estimate the period for sinusoids?
– Fourier Transform (will be covered by the next lectures)
• In the time domain, we might consider
– Autocorrelation function – recall its periodic property
– Average Magnitude Difference Function (AMDF) – essentially follow the same motivation but computationally more efficient
Pitch Period Estimation
Digital Speech and Audio Processing Spring 2008 - 41E. Nemer
P1 P2P3
P=Average(P1,P2,…)
Pitch Period Estimation
Digital Speech and Audio Processing Spring 2008 - 42E. Nemer
Pitch Estimation using prediction
We assume the signal is periodic (i.e. repeats itself) , and let L be the estimate of the pitch period.
We then ‘predict’ the current period, from the previous one.
And the prediction error, for the estimate L is then
)()( Lnxcnx −⋅=)
Digital Speech and Audio Processing Spring 2008 - 43E. Nemer
)()()()()( Lnxcnxnxnxne −⋅−=−= )
[ ] ( )[ ]22 )()()( LnxcnxEneE −⋅−=
[ ] [ ] [ ] [ ])()(2)()()( 2222 LnxnxEcLnxEcnxEneE −⋅−−+=
[ ] [ ] [ ])()(2)(2)( 22 LnxnxELnxcEneEc
−−−=∂∂
[ ] 0)(2 =∂∂ neEc
[ ][ ] ),(
),0()(
)()(2 LLR
LRLnxELnxnxEc
x
x=−−
=
Pitch Estimation using prediction
Energy Of error
MinimizeEnergy Of error
Digital Speech and Audio Processing Spring 2008 - 44E. Nemer
• For a given range of estimates for the pitch L = L1 to L2
• Compute the autocorrelation at lags 0, L
• Compute the optimum gain c, and the prediction error:
• Choose the value of L (pitch) that has the minimum error.
),(),0(LLRLRc
x
x=
),(),0()0,0(
2
LLRLRRMinErr
x
xx −=
Pitch Estimation using prediction