Classification of place of articulation in unvoiced stops with spectro-temporal surface modeling

V. Karjigi, P. Rao

Dept. of Electrical Engineering, Indian Institute of Technology Bombay, Powai, Mumbai 400076, India

Received 8 December 2011; received in revised form 12 March 2012; accepted 23 April 2012; available online 1 June 2012

Chairman: Hung-Chi Yang
Presenter: Yue-Fong Guo
Advisor: Dr. Yeou-Jiunn Chen
Date: 2013.3.20

Outline

• Introduction

• MFCC

• 2D-DCT

• Polynomial surface

• GMM

• Results

• Conclusion

Introduction

• Automatic speech recognition (ASR) system

• The goal is to convert the lexical content of human speech into computer-readable input

• Speaker recognition, by contrast, attempts to identify or verify the speaker who produced the speech rather than the lexical content it contains

Introduction

• Automatic speech recognition (ASR) system

  • Acoustic features

    • Signal processing and feature extraction

    • Mel frequency cepstral coefficients (MFCC)

  • Acoustic model

    • Statistical speech model

    • Gaussian mixture model (GMM)

MFCC

• Mel frequency cepstral coefficients (MFCC)

• MFCCs take the frequency sensitivity of human hearing into account and are therefore well suited to speech and speaker recognition

MFCC

1. Pre-emphasis

• The speech signal s(n) is passed through a high-pass filter

2. Frame blocking

3. Hamming windowing

• Each frame is multiplied by a Hamming window to keep the first and last points of the frame continuous
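Steps 1 to 3 can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the presenter's implementation: the pre-emphasis coefficient `alpha=0.95`, the frame length, the hop size, and the toy sine signal are all assumed values chosen for the example.

```python
import math

def pre_emphasis(signal, alpha=0.95):
    """High-pass filter y[n] = s[n] - alpha * s[n-1]; alpha is an assumed value."""
    return [signal[0]] + [signal[n] - alpha * signal[n - 1]
                          for n in range(1, len(signal))]

def frame_blocking(signal, frame_len, hop):
    """Split the signal into overlapping frames; leftover tail samples are dropped."""
    return [signal[start:start + frame_len]
            for start in range(0, len(signal) - frame_len + 1, hop)]

def hamming_window(length):
    """w[n] = 0.54 - 0.46 * cos(2*pi*n / (N - 1)), for N >= 2."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (length - 1))
            for n in range(length)]

def window_frames(frames):
    """Multiply every frame sample-by-sample with the Hamming window."""
    w = hamming_window(len(frames[0]))
    return [[x * wn for x, wn in zip(frame, w)] for frame in frames]

# Toy input: 100 samples of a sine wave (illustrative only).
signal = [math.sin(2 * math.pi * 5 * n / 100) for n in range(100)]
emphasized = pre_emphasis(signal)
frames = frame_blocking(emphasized, frame_len=40, hop=20)
windowed = window_frames(frames)
```

The window tapers each frame toward small values at both ends, which is what reduces the discontinuity between the first and last samples before the Fourier transform.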

MFCC

4. Fast Fourier transform (FFT)

• Converts the time-domain signal into the frequency domain

5. Triangular bandpass filters

• Smooth the magnitude spectrum so that the harmonics are flattened, which yields the envelope of the spectrum

6. Discrete cosine transform (DCT)
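Steps 4 to 6 can be sketched as follows. This is a hedged illustration rather than the method used in the paper: it uses a naive DFT in place of an FFT (same values, slower), the common mel formula mel = 2595 log10(1 + f/700), and assumed settings (`num_filters=20`, `num_coeffs=12`, a 16 kHz sample rate). All function names are hypothetical.

```python
import cmath
import math

def magnitude_spectrum(frame):
    """Naive DFT magnitude for bins 0..N/2 (an FFT returns the same values faster)."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame)))
            for k in range(n // 2 + 1)]

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(num_filters, num_bins, sample_rate):
    """Triangular filters whose centres are equally spaced on the mel scale."""
    nyquist = sample_rate / 2.0
    top = hz_to_mel(nyquist)
    edges_hz = [mel_to_hz(top * i / (num_filters + 1))
                for i in range(num_filters + 2)]
    # Edge frequencies expressed as (fractional) spectrum bin positions.
    edges = [f / nyquist * (num_bins - 1) for f in edges_hz]
    bank = []
    for i in range(1, num_filters + 1):
        left, centre, right = edges[i - 1], edges[i], edges[i + 1]
        row = []
        for b in range(num_bins):
            if left < b <= centre:
                row.append((b - left) / (centre - left))    # rising slope
            elif centre < b < right:
                row.append((right - b) / (right - centre))  # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank

def dct_ii(values, num_coeffs):
    """Unnormalised DCT-II, keeping coefficients 0..num_coeffs-1."""
    n = len(values)
    return [sum(v * math.cos(math.pi * k * (i + 0.5) / n)
                for i, v in enumerate(values))
            for k in range(num_coeffs)]

def cepstrum(frame, sample_rate, num_filters=20, num_coeffs=12):
    """Windowed frame -> log mel filterbank energies -> DCT coefficients."""
    spectrum = magnitude_spectrum(frame)
    bank = mel_filterbank(num_filters, len(spectrum), sample_rate)
    # The small floor avoids log(0) when a filter captures no energy.
    log_energies = [math.log(sum(w * s for w, s in zip(row, spectrum)) + 1e-10)
                    for row in bank]
    return dct_ii(log_energies, num_coeffs)

# Toy frame: 64 samples of a 1 kHz tone at an assumed 16 kHz sample rate.
frame = [math.sin(2 * math.pi * 1000 * n / 16000) for n in range(64)]
coeffs = cepstrum(frame, sample_rate=16000)
```

With a frame this short, some low-frequency filters cover no integer bin and fall back to the floor value; longer frames avoid that.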

MFCC

7. Log energy

• The energy within a frame is also an important feature and is easy to obtain

8. Delta cepstrum

• In practical speech recognition, delta (differential) cepstral parameters are usually appended to show how the cepstral parameters change over time
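Steps 7 and 8 can be sketched as below. The delta computation uses the widely used regression formula over a window of ±`width` frames, with edge frames repeated; the slides do not state which formula or window width was used, so both are assumptions, as are the function names.

```python
import math

def log_energy(frame, floor=1e-10):
    """Log of the sum of squared samples; the floor avoids log(0) on silence."""
    return math.log(sum(x * x for x in frame) + floor)

def delta(features, width=2):
    """Delta features by linear regression over +/- width frames.

    features is a list of per-frame vectors. Frames beyond either end are
    replaced by the first or last frame.
    """
    num_frames = len(features)
    denom = 2.0 * sum(m * m for m in range(1, width + 1))
    out = []
    for t in range(num_frames):
        row = []
        for d in range(len(features[0])):
            acc = 0.0
            for m in range(1, width + 1):
                ahead = features[min(t + m, num_frames - 1)][d]
                behind = features[max(t - m, 0)][d]
                acc += m * (ahead - behind)
            row.append(acc / denom)
        out.append(row)
    return out

# Toy cepstral track: one coefficient rising by 1.0 per frame.
track = [[0.0], [1.0], [2.0], [3.0], [4.0]]
deltas = delta(track)
```

For the linearly rising toy track, the delta of the middle frame equals the slope of the track, which is the behaviour the regression formula is designed to give.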

2D-DCT

• 2D-DCT modeling
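The slide gives no detail on the 2D-DCT model, so the following is only a sketch of the general idea under stated assumptions: the input is a small time-frequency patch (rows are frequency bands, columns are time frames), a separable 2D DCT-II is applied, and the low-order coefficients are kept as a compact description of the surface. The patch size and the number of retained coefficients are illustrative and not taken from the paper.

```python
import math

def dct_ii(values):
    """Unnormalised 1D DCT-II of a sequence."""
    n = len(values)
    return [sum(v * math.cos(math.pi * k * (i + 0.5) / n)
                for i, v in enumerate(values))
            for k in range(n)]

def dct_2d(patch):
    """Separable 2D DCT: first along time (each row), then along frequency."""
    rows = [dct_ii(row) for row in patch]
    cols = [dct_ii(list(col)) for col in zip(*rows)]
    # Transpose back so the result is indexed as coeffs[freq_index][time_index].
    return [list(r) for r in zip(*cols)]

def low_order_features(patch, keep_freq, keep_time):
    """Keep the top-left keep_freq x keep_time block of 2D-DCT coefficients."""
    coeffs = dct_2d(patch)
    return [coeffs[u][v] for u in range(keep_freq) for v in range(keep_time)]

# Toy patch: 3 frequency bands x 4 frames of constant energy.
patch = [[1.0] * 4 for _ in range(3)]
features = low_order_features(patch, keep_freq=2, keep_time=2)
```

A constant patch puts all of its energy into the first coefficient; smooth spectro-temporal surfaces likewise concentrate in the low-order block, which is why truncation gives a compact feature vector.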

Polynomial surface

• Polynomial surface modeling
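As with the 2D-DCT, the slide does not specify the polynomial model, so this sketch only illustrates one plausible reading: a bivariate polynomial in frequency and time is fitted to a time-frequency patch by least squares, and its coefficients serve as features. The polynomial order, the coordinate normalisation to [-1, 1], and all function names are assumptions. The patch needs at least two rows and two columns.

```python
def monomials(f, t, order):
    """All terms f**i * t**j with i + j <= order, in a fixed order."""
    return [(f ** i) * (t ** j)
            for i in range(order + 1) for j in range(order + 1 - i)]

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_surface(patch, order=2):
    """Least-squares polynomial fit; patch[f][t] holds the surface heights."""
    rows, cols = len(patch), len(patch[0])
    design, target = [], []
    for fi in range(rows):
        for ti in range(cols):
            f = 2.0 * fi / (rows - 1) - 1.0   # frequency axis scaled to [-1, 1]
            t = 2.0 * ti / (cols - 1) - 1.0   # time axis scaled to [-1, 1]
            design.append(monomials(f, t, order))
            target.append(patch[fi][ti])
    k = len(design[0])
    # Normal equations (A^T A) c = A^T z.
    ata = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    atb = [sum(r[i] * z for r, z in zip(design, target)) for i in range(k)]
    return solve(ata, atb)

# Toy patch: a tilted plane sampled on a 4 x 5 grid.
patch = [[1.0 + 0.5 * fi - 0.25 * ti for ti in range(5)] for fi in range(4)]
coeffs = fit_surface(patch, order=2)
```

For `order=2` the coefficient vector has six entries, ordered as 1, t, t², f, f·t, f².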

GMM

• Gaussian mixture model (GMM)

• An effective tool for data modeling and pattern classification

• The speaker's acoustic features are clustered, and each cluster is then described by a Gaussian density distribution
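The cluster-then-describe idea can be sketched as below. This is a simplification and not the training procedure from the paper: features are grouped with plain k-means, each cluster gets one diagonal-covariance Gaussian, and no EM refinement is run. A test vector is assigned to the class whose mixture gives the highest log-likelihood. The class labels, the toy data, and all function names are hypothetical.

```python
import math

def kmeans(points, k, iterations=20):
    """Plain k-means; the first k points are the initial centres (arbitrary choice)."""
    centres = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [min(range(k),
                      key=lambda c: sum((x - m) ** 2 for x, m in zip(p, centres[c])))
                  for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centres[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels

def fit_gmm(points, k, var_floor=1e-3):
    """One (weight, mean, variance) triple per non-empty cluster."""
    labels = kmeans(points, k)
    model = []
    for c in range(k):
        members = [p for p, lab in zip(points, labels) if lab == c]
        if not members:
            continue
        mean = [sum(dim) / len(members) for dim in zip(*members)]
        var = [max(sum((x - m) ** 2 for x in dim) / len(members), var_floor)
               for dim, m in zip(zip(*members), mean)]
        model.append((len(members) / len(points), mean, var))
    return model

def log_likelihood(model, x):
    """log p(x) under a diagonal-covariance mixture, via log-sum-exp."""
    terms = []
    for weight, mean, var in model:
        log_pdf = sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
                      for xi, m, v in zip(x, mean, var))
        terms.append(math.log(weight) + log_pdf)
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))

def classify(models, x):
    """Return the label of the class model with the highest log-likelihood."""
    return max(models, key=lambda label: log_likelihood(models[label], x))

# Toy 2-D training data for two hypothetical classes.
train = {
    "labial": [[0.0, 0.0], [0.2, 0.1], [-0.1, 0.3], [0.1, -0.2],
               [1.0, 1.0], [1.2, 0.9], [0.9, 1.1], [1.1, 1.2]],
    "velar": [[5.0, 5.0], [5.2, 5.1], [4.9, 5.3], [5.1, 4.8],
              [6.0, 6.0], [6.2, 5.9], [5.9, 6.1], [6.1, 6.2]],
}
models = {label: fit_gmm(points, k=2) for label, points in train.items()}
```

Calling `classify(models, [0.1, 0.2])` picks the class whose mixture explains the vector best. In a real system the k-means result would normally serve only as the starting point for EM training.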

Databases

• Evaluated on two distinct datasets

• American English continuous speech as provided in the TIMIT database

• Marathi words database specially created for the purpose

Results

Conclusion

• A comparison of performance with published results on the same task revealed that the spectro-temporal feature systems tested in this work improve upon the best previous systems’ performances in terms of classification accuracies on the specified datasets.

The End
