Classification of place of articulation in unvoiced stops with spectro-temporal surface modeling
V. Karjigi, P. Rao
Dept. of Electrical Engineering, Indian Institute of Technology Bombay, Powai, Mumbai 400076, India
Received 8 December 2011; received in revised form 12 March 2012; accepted 23 April 2012; available online 1 June 2012
Chairman: Hung-Chi Yang
Presenter: Yue-Fong Guo
Advisor: Dr. Yeou-Jiunn Chen
Date: 2013.3.20
Outline
• Introduction
• MFCC
• 2D-DCT
• Polynomial surface
• GMM
• Results
• Conclusion
Introduction
• Automatic speech recognition (ASR) system
  • The goal is to convert the lexical content of human speech into computer-readable input
  • This differs from speaker recognition, which attempts to identify or verify who is speaking rather than what is said
Introduction
• Automatic speech recognition (ASR) system
  • Acoustic features
    • Signal processing and feature extraction
    • Mel frequency cepstral coefficients (MFCC)
  • Acoustic model
    • Statistical speech model
    • Gaussian mixture model (GMM)
MFCC
• Mel frequency cepstral coefficients (MFCC)
  • MFCC takes the frequency sensitivity of human auditory perception into account, which makes it well suited to speech and speaker recognition.
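The frequency warping that MFCC relies on is usually expressed through the mel scale, which is roughly linear below about 1 kHz and logarithmic above it. The slides do not state a formula; the minimal Python sketch below assumes the widely used form mel = 2595 * log10(1 + f / 700), and the names `hz_to_mel` and `mel_to_hz` are illustrative.

```python
import math


def hz_to_mel(f_hz: float) -> float:
    """Map a frequency in Hz to mels (assumed formula: 2595 * log10(1 + f/700))."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse mapping from mels back to Hz."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


# Equal 1000-mel steps cover ever wider bands in Hz as frequency increases:
# hz_to_mel(1000.0) is about 1000 mel, while mel_to_hz(2000.0) is roughly 3.4 kHz.
```

Because equal mel steps span wider Hz ranges at high frequencies, filters placed uniformly on this scale become broader as frequency increases, mirroring the coarser frequency resolution of human hearing.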
MFCC
1. Pre-emphasis
  • The speech signal s(n) is passed through a high-pass filter
2. Frame blocking
  • The signal is divided into short, overlapping frames
3. Hamming windowing
  • Each frame is multiplied by a Hamming window so that it tapers toward zero at its first and last points, avoiding an abrupt discontinuity at the frame edges
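Steps 1 to 3 can be sketched in plain Python as follows. The pre-emphasis coefficient alpha = 0.97, the frame length and the hop size are common choices assumed here, not values given in the slides; the function names are illustrative.

```python
import math
from typing import List


def pre_emphasis(signal: List[float], alpha: float = 0.97) -> List[float]:
    """First-order high-pass filter y[n] = s[n] - alpha * s[n-1] (alpha is an assumed value)."""
    if not signal:
        return []
    return [signal[0]] + [signal[n] - alpha * signal[n - 1] for n in range(1, len(signal))]


def frame_signal(signal: List[float], frame_len: int, hop: int) -> List[List[float]]:
    """Split the signal into overlapping frames of frame_len samples, advancing by hop samples.
    A trailing partial frame is dropped."""
    frames = []
    start = 0
    while start + frame_len <= len(signal):
        frames.append(signal[start:start + frame_len])
        start += hop
    return frames


def hamming(frame_len: int) -> List[float]:
    """Hamming window w[n] = 0.54 - 0.46 * cos(2*pi*n / (N - 1))."""
    if frame_len == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (frame_len - 1)) for n in range(frame_len)]


def window_frames(frames: List[List[float]]) -> List[List[float]]:
    """Multiply every frame sample-by-sample by a Hamming window of the same length."""
    if not frames:
        return []
    w = hamming(len(frames[0]))
    return [[x * wn for x, wn in zip(frame, w)] for frame in frames]


# At 16 kHz, 25 ms frames with a 10 ms hop (assumed values) would be:
# window_frames(frame_signal(pre_emphasis(samples), frame_len=400, hop=160))
```

The window's end values are 0.08 rather than exactly zero, which is the usual Hamming trade-off between edge tapering and main-lobe width.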
MFCC
4. Fast Fourier Transform (FFT)
  • Converts each frame from the time domain to the frequency domain
5. Triangular bandpass filters
  • A bank of triangular filters spaced on the mel scale smooths the magnitude spectrum so that the harmonics are flattened, leaving the spectral envelope
6. Discrete cosine transform (DCT)
  • Applied to the log filter-bank energies to obtain the cepstral coefficients
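Steps 4 to 6, together with the logarithm normally taken before the DCT, can be sketched as below. To stay self-contained it uses a direct DFT (an FFT routine would give the same values faster) and assumes 26 mel filters, 13 cepstral coefficients, a 512-point transform and an unnormalised DCT-II; these are common choices, not settings stated in the slides.

```python
import math
from typing import List


def hz_to_mel(f: float) -> float:
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m: float) -> float:
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def power_spectrum(frame: List[float], n_fft: int) -> List[float]:
    """|X[k]|^2 for k = 0..n_fft/2 via a direct DFT; the frame is zero-padded or truncated to n_fft."""
    x = list(frame[:n_fft]) + [0.0] * max(0, n_fft - len(frame))
    spec = []
    for k in range(n_fft // 2 + 1):
        re = sum(x[n] * math.cos(2.0 * math.pi * k * n / n_fft) for n in range(n_fft))
        im = -sum(x[n] * math.sin(2.0 * math.pi * k * n / n_fft) for n in range(n_fft))
        spec.append(re * re + im * im)
    return spec


def mel_filterbank(n_filters: int, n_fft: int, sample_rate: float) -> List[List[float]]:
    """Triangular filters with edges equally spaced in mel from 0 Hz to Nyquist."""
    m_hi = hz_to_mel(sample_rate / 2.0)
    edges = [mel_to_hz(i * m_hi / (n_filters + 1)) for i in range(n_filters + 2)]
    bin_hz = [k * sample_rate / n_fft for k in range(n_fft // 2 + 1)]
    bank = []
    for j in range(1, n_filters + 1):
        left, centre, right = edges[j - 1], edges[j], edges[j + 1]
        row = []
        for f in bin_hz:
            if left < f <= centre:
                row.append((f - left) / (centre - left))    # rising edge
            elif centre < f < right:
                row.append((right - f) / (right - centre))  # falling edge
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def dct_ii(values: List[float], n_coeffs: int) -> List[float]:
    """Unnormalised DCT-II: c[q] = sum_m v[m] * cos(pi * q * (m + 0.5) / M)."""
    M = len(values)
    return [sum(v * math.cos(math.pi * q * (m + 0.5) / M) for m, v in enumerate(values))
            for q in range(n_coeffs)]


def mfcc_frame(frame: List[float], sample_rate: float, n_fft: int = 512,
               n_filters: int = 26, n_ceps: int = 13) -> List[float]:
    """MFCCs of one (already windowed) frame."""
    spec = power_spectrum(frame, n_fft)
    bank = mel_filterbank(n_filters, n_fft, sample_rate)
    energies = [sum(w * p for w, p in zip(row, spec)) for row in bank]
    log_e = [math.log(max(e, 1e-10)) for e in energies]  # floor avoids log(0)
    return dct_ii(log_e, n_ceps)
```

Calling `mfcc_frame(frame, 16000)` on one windowed frame yields 13 coefficients; the low-order ones summarise the smooth spectral envelope left after the filter bank.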
MFCC
7. Log energy
  • The energy within a frame is also an important feature and is easily computed
8. Delta cepstrum
  • In practice, speech recognizers usually append differential (delta) cepstral coefficients to capture how the cepstral parameters change over time
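Steps 7 and 8 can be sketched as follows. The delta uses the common regression formula over ±N neighbouring frames with N = 2, replicating the edge frames at the boundaries; both choices are assumptions, since the slides do not say how the differential cepstrum is computed.

```python
import math
from typing import List, Sequence


def log_energy(frame: Sequence[float], floor: float = 1e-10) -> float:
    """Log of the frame energy (sum of squared samples); the floor avoids log(0) on silence."""
    return math.log(max(sum(x * x for x in frame), floor))


def deltas(features: List[List[float]], N: int = 2) -> List[List[float]]:
    """Delta coefficients d_t = sum_{n=1..N} n * (c_{t+n} - c_{t-n}) / (2 * sum_{n=1..N} n^2).
    Frames beyond either end are replaced by the nearest edge frame."""
    T = len(features)
    if T == 0:
        return []
    denom = 2.0 * sum(n * n for n in range(1, N + 1))
    out = []
    for t in range(T):
        row = []
        for d in range(len(features[t])):
            acc = 0.0
            for n in range(1, N + 1):
                nxt = features[min(t + n, T - 1)][d]
                prv = features[max(t - n, 0)][d]
                acc += n * (nxt - prv)
            row.append(acc / denom)
        out.append(row)
    return out


# Per frame: static = cepstra + [log_energy(frame)]; full vector = static + its delta row.
```

For a feature that rises by exactly 1 per frame, the interior deltas equal 1, which is a quick sanity check of the normalisation.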
2D-DCT
• 2D-DCT modeling
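The slide gives no formula for 2D-DCT modeling. As a hedged illustration, the idea can be sketched as treating a time-frequency patch (for example, log mel-filter energies over several frames around the stop release; this choice of patch is an assumption) as an image, applying a 2D DCT-II, and keeping only a low-order block of coefficients, which describe the smooth spectro-temporal shape. The block size (4 x 3) and the function names are hypothetical.

```python
import math
from typing import List


def dct2d(patch: List[List[float]], n_freq: int, n_time: int) -> List[List[float]]:
    """Unnormalised 2D DCT-II of a (frequency x time) patch, returning only the
    low-order n_freq x n_time block of coefficients."""
    F, T = len(patch), len(patch[0])
    coeffs = []
    for p in range(n_freq):
        row = []
        for q in range(n_time):
            acc = 0.0
            for f in range(F):
                basis_f = math.cos(math.pi * p * (f + 0.5) / F)
                for t in range(T):
                    acc += patch[f][t] * basis_f * math.cos(math.pi * q * (t + 0.5) / T)
            row.append(acc)
        coeffs.append(row)
    return coeffs


def dct2d_features(patch: List[List[float]], n_freq: int = 4, n_time: int = 3) -> List[float]:
    """Flatten the retained low-order coefficients into one feature vector (hypothetical sizes)."""
    return [c for row in dct2d(patch, n_freq, n_time) for c in row]
```

Coefficient (0, 0) tracks overall level, while higher indices capture progressively finer spectral tilt and temporal change; truncation discards the fine detail.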
Polynomial surface
• Polynomial surface modeling
GMM
• Gaussian mixture model (GMM)
  • An effective tool for data modeling and pattern classification
  • The acoustic feature vectors are grouped into clusters, and each cluster is described by a Gaussian density
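A hedged sketch of how GMMs are typically used for classification: one GMM per place-of-articulation class, with a feature vector assigned to the class whose model gives it the highest log-likelihood. Diagonal covariances, the omission of EM training, the class labels ("labial", "velar") and all parameter values are assumptions for illustration; the slide does not give the paper's exact configuration.

```python
import math
from typing import Dict, List, Sequence


class DiagGMM:
    """Gaussian mixture with diagonal covariances. Parameters are assumed to come from
    a separate training step (e.g. EM), which is not shown here."""

    def __init__(self, weights: List[float], means: List[List[float]], variances: List[List[float]]):
        if abs(sum(weights) - 1.0) > 1e-6:
            raise ValueError("mixture weights must sum to 1")
        self.weights, self.means, self.variances = weights, means, variances

    def log_likelihood(self, x: Sequence[float]) -> float:
        """log p(x) = log sum_k w_k N(x; mu_k, diag(var_k)), computed with log-sum-exp."""
        comps = []
        for w, mu, var in zip(self.weights, self.means, self.variances):
            ll = math.log(w)
            for xi, m, v in zip(x, mu, var):
                ll -= 0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
            comps.append(ll)
        top = max(comps)
        return top + math.log(sum(math.exp(c - top) for c in comps))


def classify(x: Sequence[float], models: Dict[str, DiagGMM]) -> str:
    """Return the class label whose GMM gives the feature vector the highest likelihood."""
    return max(models, key=lambda label: models[label].log_likelihood(x))
```

Working in the log domain with log-sum-exp avoids underflow when feature vectors are high-dimensional and individual component densities become tiny.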
Databases
• Evaluated on two distinct datasets
  • American English continuous speech from the TIMIT database
  • A Marathi word database created specifically for this study
Results
Conclusion
• Compared with published results on the same task, the spectro-temporal feature systems tested in this work achieve higher classification accuracy than the best previous systems on the specified datasets.
The End