![Page 1: EEL 6586: AUTOMATIC SPEECH PROCESSING Windows Lecture Mark D. Skowronski Computational Neuro-Engineering Lab University of Florida February 10, 2003](https://reader035.vdocument.in/reader035/viewer/2022062317/5a4d1b397f8b9ab05999dd36/html5/thumbnails/1.jpg)
EEL 6586: AUTOMATIC SPEECH PROCESSING
Windows Lecture
Mark D. Skowronski
Computational Neuro-Engineering Lab
University of Florida
February 10, 2003
![Page 2: EEL 6586: AUTOMATIC SPEECH PROCESSING Windows Lecture Mark D. Skowronski Computational Neuro-Engineering Lab University of Florida February 10, 2003](https://reader035.vdocument.in/reader035/viewer/2022062317/5a4d1b397f8b9ab05999dd36/html5/thumbnails/2.jpg)
No, not MS Windows®…
![Page 3: EEL 6586: AUTOMATIC SPEECH PROCESSING Windows Lecture Mark D. Skowronski Computational Neuro-Engineering Lab University of Florida February 10, 2003](https://reader035.vdocument.in/reader035/viewer/2022062317/5a4d1b397f8b9ab05999dd36/html5/thumbnails/3.jpg)
…not those either!
![Page 4: EEL 6586: AUTOMATIC SPEECH PROCESSING Windows Lecture Mark D. Skowronski Computational Neuro-Engineering Lab University of Florida February 10, 2003](https://reader035.vdocument.in/reader035/viewer/2022062317/5a4d1b397f8b9ab05999dd36/html5/thumbnails/4.jpg)
Speech windows
Speech is NONSTATIONARY
![Page 5: EEL 6586: AUTOMATIC SPEECH PROCESSING Windows Lecture Mark D. Skowronski Computational Neuro-Engineering Lab University of Florida February 10, 2003](https://reader035.vdocument.in/reader035/viewer/2022062317/5a4d1b397f8b9ab05999dd36/html5/thumbnails/5.jpg)
Speech windows
Assume speech is stationary over a ‘short’ window of time.
[Figure: utterance ‘SEVEN’]
![Page 6: EEL 6586: AUTOMATIC SPEECH PROCESSING Windows Lecture Mark D. Skowronski Computational Neuro-Engineering Lab University of Florida February 10, 2003](https://reader035.vdocument.in/reader035/viewer/2022062317/5a4d1b397f8b9ab05999dd36/html5/thumbnails/6.jpg)
What is a ‘short’ window of time?
• 10 μs: smallest difference detectable by auditory system (localization),
• 3 ms: shortest phoneme (plosive burst),
• 10 ms: glottal pulse period,
• 100 ms: average phoneme duration,
• 4 s: exhale period during speech.
‘Short’ depends on application.
![Page 7: EEL 6586: AUTOMATIC SPEECH PROCESSING Windows Lecture Mark D. Skowronski Computational Neuro-Engineering Lab University of Florida February 10, 2003](https://reader035.vdocument.in/reader035/viewer/2022062317/5a4d1b397f8b9ab05999dd36/html5/thumbnails/7.jpg)
Applications using windows
• Automatic speech recognition,
• Speech coding/decoding,
• Speaker identification,
• Text-to-speech synthesis,
• Noise reduction

Typical window (frame) length: 20-30 ms
Typical frame rate: 100 frames/sec
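As a rough illustration of these typical numbers, the sketch below converts a frame length and a frame rate into sample counts and slices a signal into overlapping frames. The 16 kHz sampling rate, the 25 ms frame length, the function name `frame_signal`, and the noise used as a stand-in for speech are assumptions made for this example, not values from the lecture.

```python
import numpy as np

def frame_signal(s, fs, frame_ms=25.0, frame_rate=100.0):
    """Slice s into overlapping frames (one per row); no tapering is applied."""
    frame_len = int(round(fs * frame_ms / 1000.0))  # samples per frame
    hop = int(round(fs / frame_rate))               # samples between frame starts
    if len(s) < frame_len:
        return np.empty((0, frame_len)), frame_len, hop
    n_frames = 1 + (len(s) - frame_len) // hop
    # Row i holds samples i*hop ... i*hop + frame_len - 1
    idx = hop * np.arange(n_frames)[:, None] + np.arange(frame_len)[None, :]
    return s[idx], frame_len, hop

fs = 16000                                          # assumed sampling rate (Hz)
s = np.random.default_rng(0).standard_normal(fs)    # 1 s of noise as a stand-in
frames, frame_len, hop = frame_signal(s, fs)
print(frames.shape, frame_len, hop)
```

At 100 frames/sec the hop is 10 ms, so consecutive 25 ms frames overlap by 15 ms.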
![Page 8: EEL 6586: AUTOMATIC SPEECH PROCESSING Windows Lecture Mark D. Skowronski Computational Neuro-Engineering Lab University of Florida February 10, 2003](https://reader035.vdocument.in/reader035/viewer/2022062317/5a4d1b397f8b9ab05999dd36/html5/thumbnails/8.jpg)
Short-time analysis
$$x(n) = w(n)\,s(n)$$
s(n): entire speech utterance
w(n): window function
x(n): frame of speech
Window function is non-zero for N samples, n=0,…,N-1
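A minimal sketch of the product x(n) = w(n)s(n) in NumPy. To pick a frame somewhere inside the utterance, the example shifts the signal by an offset `start` before multiplying; `start`, the function name `short_time_frame`, and the choice between a rectangular and a Hamming window are assumptions for illustration.

```python
import numpy as np

def short_time_frame(s, start, N, window="hamming"):
    """Return x(n) = w(n) * s(start + n) for n = 0, ..., N-1."""
    if window == "rect":
        w = np.ones(N)
    elif window == "hamming":
        w = np.hamming(N)
    else:
        raise ValueError("unknown window: " + window)
    return w * s[start:start + N]

s = np.arange(1000, dtype=float)          # stand-in for a speech utterance
x_rect = short_time_frame(s, 100, 200, "rect")
x_hamm = short_time_frame(s, 100, 200, "hamming")
print(x_rect[:3], x_hamm[:3])
```

With the rectangular window the frame is simply a copy of the selected samples; with the Hamming window the samples near the frame edges are attenuated.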
![Page 9: EEL 6586: AUTOMATIC SPEECH PROCESSING Windows Lecture Mark D. Skowronski Computational Neuro-Engineering Lab University of Florida February 10, 2003](https://reader035.vdocument.in/reader035/viewer/2022062317/5a4d1b397f8b9ab05999dd36/html5/thumbnails/9.jpg)
Short-term Fourier Transform
$$X(n,\omega) = \sum_{m} s(m)\,w(n-m)\,e^{-j\omega m}$$
s(m): entire speech utterance
w(m): window function
X(n,ω): STFT of speech at time n
STFT is a smoothed version of original spectrum.
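The sketch below evaluates the STFT sum directly at one time index n, assuming the common convention X(n, ω) = Σ_m s(m) w(n − m) e^{−jωm} with a window that is non-zero for N samples. The function name `stft_at`, the test sinusoid, and the frequency grid are assumptions for this example; a practical implementation would normally use an FFT instead of the explicit sum.

```python
import numpy as np

def stft_at(s, w, n, omegas):
    """Directly evaluate X(n, omega) = sum_m s(m) w(n - m) exp(-j omega m)."""
    N = len(w)
    # w(n - m) is non-zero only for 0 <= n - m <= N - 1
    m = np.arange(max(0, n - N + 1), min(len(s), n + 1))
    seg = s[m] * w[n - m]
    return np.array([np.sum(seg * np.exp(-1j * om * m)) for om in omegas])

N = 64
omega0 = 2 * np.pi * 8 / N                 # assumed test frequency (bin 8 of 64)
s = np.cos(omega0 * np.arange(400))        # stand-in for a speech signal
omegas = 2 * np.pi * np.arange(N) / N      # N equally spaced frequencies
X = stft_at(s, np.hamming(N), n=200, omegas=omegas)
peak_bin = int(np.argmax(np.abs(X[: N // 2])))
print(peak_bin)
```

With a rectangular window and n = N − 1, the same function reproduces the N-point DFT of the first N samples, which is a convenient sanity check.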
![Page 10: EEL 6586: AUTOMATIC SPEECH PROCESSING Windows Lecture Mark D. Skowronski Computational Neuro-Engineering Lab University of Florida February 10, 2003](https://reader035.vdocument.in/reader035/viewer/2022062317/5a4d1b397f8b9ab05999dd36/html5/thumbnails/10.jpg)
STFT example
$$x(n) = w(n)\,s(n) \;\Leftrightarrow\; X(\omega) = W(\omega) * S(\omega)$$
s(n): pure sinewave of infinite length
w(n): rectangular window:
$$w(n) = \begin{cases} 1, & n = 0,\dots,N-1 \\ 0, & \text{o.w.} \end{cases}$$
![Page 11: EEL 6586: AUTOMATIC SPEECH PROCESSING Windows Lecture Mark D. Skowronski Computational Neuro-Engineering Lab University of Florida February 10, 2003](https://reader035.vdocument.in/reader035/viewer/2022062317/5a4d1b397f8b9ab05999dd36/html5/thumbnails/11.jpg)
STFT example
[Figure: |W(ω)| * |S(ω)| = |X(ω)|, with the sinewave's spectral line at ω0 and the resulting |X(ω)| centered at ω0]
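The convolution picture can be checked numerically. The sketch below uses a complex sinusoid rather than a real sinewave, an assumption that keeps the result to a single shifted copy of the window spectrum (a real sinewave would give two copies, at ±ω0). The lengths `N` and `nfft` and the tone position `k0` are arbitrary choices for the example.

```python
import numpy as np

N, nfft = 64, 1024
k0 = 100                                  # assumed tone position on the FFT grid
omega0 = 2 * np.pi * k0 / nfft

w = np.ones(N)                            # rectangular window, n = 0..N-1
s = np.exp(1j * omega0 * np.arange(N))    # only the N windowed samples matter
x = w * s

W = np.fft.fft(w, nfft)                   # window spectrum on a dense grid
X = np.fft.fft(x, nfft)                   # spectrum of the windowed sinusoid

# X(omega) should equal W(omega - omega0): the window spectrum moved to omega0
shift_ok = np.allclose(X, np.roll(W, k0))
peak = int(np.argmax(np.abs(X)))
print(shift_ok, peak)
```

The spectral line of the sinusoid is replaced by the main lobe and side lobes of the window, which is why the choice of window shapes the measured spectrum.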
![Page 12: EEL 6586: AUTOMATIC SPEECH PROCESSING Windows Lecture Mark D. Skowronski Computational Neuro-Engineering Lab University of Florida February 10, 2003](https://reader035.vdocument.in/reader035/viewer/2022062317/5a4d1b397f8b9ab05999dd36/html5/thumbnails/12.jpg)
Window types
• Rectangular
• Hann (cosine)
• Hamming (raised cosine)
• Blackman
• Kaiser-Bessel
Tradeoff between leakage and blurring
![Page 13: EEL 6586: AUTOMATIC SPEECH PROCESSING Windows Lecture Mark D. Skowronski Computational Neuro-Engineering Lab University of Florida February 10, 2003](https://reader035.vdocument.in/reader035/viewer/2022062317/5a4d1b397f8b9ab05999dd36/html5/thumbnails/13.jpg)
Window tradeoff
• Blurring: main lobe width A
• Leakage: side lobe suppression B
[Figure: window spectrum with main lobe width A and side lobe suppression B marked]
![Page 14: EEL 6586: AUTOMATIC SPEECH PROCESSING Windows Lecture Mark D. Skowronski Computational Neuro-Engineering Lab University of Florida February 10, 2003](https://reader035.vdocument.in/reader035/viewer/2022062317/5a4d1b397f8b9ab05999dd36/html5/thumbnails/14.jpg)
Popular windows

| Window | Unit BW | Sidelobe |
|---|---|---|
| Rectangle | 1 | -13 dB |
| Hann | 2 | -31 dB |
| Hamming | 2 | -43 dB |
| Blackman | 3 | -68 dB |
| Kaiser-Bessel | 4 | -91 dB |
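Figures like those in the table can be estimated numerically. The sketch below measures, for NumPy's built-in windows, the position of the first spectral null (in DFT bins of width 2π/N) and the highest sidelobe relative to the main lobe peak. The window length N = 64, the FFT size, and the helper name `lobe_stats` are assumptions; the Kaiser-Bessel window is left out because its shape depends on a parameter that the lecture does not give.

```python
import numpy as np

def lobe_stats(w, nfft=8192):
    """Return (first-null position in bins of 2*pi/N, peak sidelobe in dB)."""
    N = len(w)
    mag = np.abs(np.fft.rfft(w, nfft))
    # The main lobe ends where the magnitude first stops decreasing.
    null = int(np.nonzero(np.diff(mag) > 0)[0][0])
    width_bins = null * N / nfft
    sidelobe_db = 20 * np.log10(mag[null:].max() / mag[0])
    return width_bins, sidelobe_db

N = 64
windows = {
    "Rectangle": np.ones(N),
    "Hann": np.hanning(N),
    "Hamming": np.hamming(N),
    "Blackman": np.blackman(N),
}
stats = {name: lobe_stats(w) for name, w in windows.items()}
for name, (width, level) in stats.items():
    print(f"{name:10s} first null at {width:4.2f} bins, sidelobe {level:6.1f} dB")
```

The measured values depend on the window length and on the exact window definition, so they need not match the table digit for digit. For instance, `np.blackman` uses the common 0.42/0.5/0.08 coefficients, whose peak sidelobe is usually quoted near -58 dB rather than -68 dB.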
![Page 15: EEL 6586: AUTOMATIC SPEECH PROCESSING Windows Lecture Mark D. Skowronski Computational Neuro-Engineering Lab University of Florida February 10, 2003](https://reader035.vdocument.in/reader035/viewer/2022062317/5a4d1b397f8b9ab05999dd36/html5/thumbnails/15.jpg)
Practical issues
• Rule of thumb:
  – Time domain, use Rectangle window
  – Freq domain, use Hamming window
• Why?
![Page 16: EEL 6586: AUTOMATIC SPEECH PROCESSING Windows Lecture Mark D. Skowronski Computational Neuro-Engineering Lab University of Florida February 10, 2003](https://reader035.vdocument.in/reader035/viewer/2022062317/5a4d1b397f8b9ab05999dd36/html5/thumbnails/16.jpg)
Time domain issues
• Correlation in the time domain is distorted by tapered windows
20 ms /eh/, male utterance, pitch measurement (normalized autocorrelation).
First side peak lower using Hamming window
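The effect can be reproduced on a synthetic frame. The sketch below is not the lecture's /eh/ recording: it assumes a 20 ms frame at 8 kHz built from three harmonics of a 100 Hz pitch, and compares the largest normalized autocorrelation value in an assumed 70-400 Hz pitch search range for a rectangular and a Hamming window.

```python
import numpy as np

def normalized_autocorr(x):
    """Autocorrelation for lags 0..N-1, scaled so that lag 0 equals 1."""
    r = np.correlate(x, x, mode="full")[len(x) - 1:]
    return r / r[0]

fs, f0 = 8000, 100.0                      # assumed sampling rate and pitch (Hz)
N = int(round(0.020 * fs))                # 20 ms frame -> 160 samples
n = np.arange(N)
# Synthetic voiced-like frame: three harmonics of f0 (stand-in for real speech)
s = sum(a * np.cos(2 * np.pi * h * f0 * n / fs)
        for h, a in [(1, 1.0), (2, 0.5), (3, 0.25)])

lags = np.arange(int(fs / 400), int(fs / 70) + 1)   # candidate pitch lags

r_rect = normalized_autocorr(s)                     # rectangular window
r_hamm = normalized_autocorr(np.hamming(N) * s)     # Hamming window
peak_rect = r_rect[lags].max()
peak_hamm = r_hamm[lags].max()
print(f"rectangular: {peak_rect:.3f}, Hamming: {peak_hamm:.3f}")
```

The taper shrinks the product of the two window segments that overlap at the pitch lag, so the side peak used for pitch detection stands out less clearly against the lag-0 peak.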
![Page 17: EEL 6586: AUTOMATIC SPEECH PROCESSING Windows Lecture Mark D. Skowronski Computational Neuro-Engineering Lab University of Florida February 10, 2003](https://reader035.vdocument.in/reader035/viewer/2022062317/5a4d1b397f8b9ab05999dd36/html5/thumbnails/17.jpg)
Frequency domain issues
fs = 12.5 kHz, /eh/, 800 samples, male speaker.
Blurring/leakage tradeoff evidence: