2015/9/131 stress detection j.-s. roger jang ( 張智星 ) mir labmir lab, csie dept., national...



Stress Detection

J.-S. Roger Jang (張智星 )

MIR Lab, CSIE Dept., National Taiwan Univ.

http://mirlab.org/jang


Intro to Stress Detection

Stress detection (SD) for English
• Given an English word and its pronunciation
• Detect the stress position of the pronunciation

Applications
• Computer-assisted pronunciation training (CAPT)

Similar to…
• Tone recognition in Mandarin Chinese
• Intonation scoring


Examples of Stress in English Words

Every multisyllabic English word has a stressed syllable

Example
• Dictionary: stressed at syllable 1
• Tomorrow: stressed at syllable 2
• International: stressed at syllable 3


Steps in Stress Detection

Preprocessing
• Use forced alignment to find vowel locations

Feature extraction
• Extract features for each vowel

Model construction
• Build a classifier for vowel-based stress detection

Post-processing
• Combine the vowel-level results into a word-based stress decision


Forced Alignment (1/2)

A process for aligning an utterance with its corresponding canonical phonetic transcription

Example: International

[Figure: forced alignment of the utterance "international" (ih_n_t_er_n_ae_sh_ah_n_ah_l), Score=90.13, word score 90. Top panel: waveform over time with the aligned phones and their scores: ih 66, n 64, t 49, er 100, n 100, ae 100, sh 100, ah 100, n 100, ah 100, l 100. Bottom panel: pitch contour (Pitch1: unbroken; Pitch2: segmented).]
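Once the alignment is available, the vowel locations needed for stress detection can be read off directly. The following is a minimal sketch, not ASRA's actual output format: the phone labels are taken from the example above, while the segment times, the `ALIGNMENT` structure, and the `VOWELS` inventory (ARPAbet-style labels) are assumptions made for illustration.

```python
# Hypothetical forced-alignment output: (phone, start_sec, end_sec).
# Phone labels follow the slide; the times are invented for illustration.
ALIGNMENT = [
    ("sil", 0.00, 0.12),
    ("ih", 0.12, 0.18),
    ("n", 0.18, 0.23),
    ("t", 0.23, 0.28),
    ("er", 0.28, 0.36),
    ("n", 0.36, 0.41),
    ("ae", 0.41, 0.55),
    ("sh", 0.55, 0.63),
    ("ah", 0.63, 0.69),
    ("n", 0.69, 0.74),
    ("ah", 0.74, 0.80),
    ("l", 0.80, 0.88),
    ("sil", 0.88, 1.00),
]

# Assumed vowel inventory (ARPAbet-style, lowercase as on the slide).
VOWELS = {"aa", "ae", "ah", "ao", "aw", "ay", "eh", "er", "ey",
          "ih", "iy", "ow", "oy", "uh", "uw"}


def vowel_segments(alignment):
    """Return (syllable_index, phone, start, end) per vowel, 1-based."""
    segments = []
    for phone, start, end in alignment:
        if phone in VOWELS:
            segments.append((len(segments) + 1, phone, start, end))
    return segments


segments = vowel_segments(ALIGNMENT)
for index, phone, start, end in segments:
    print(f"syllable {index}: {phone} [{start:.2f}, {end:.2f}] s")
```

With one vowel per syllable, "international" yields five vowel segments, and the stressed syllable 3 corresponds to the vowel "ae".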


Forced Alignment (2/2)

Applications of forced alignment
• Speech scoring (based on timbre only)
• Utterance verification

Our forced alignment engine
• ASRA (Automatic Speech Recognition & Assessment): for voice command recognition and speech assessment (scoring)


Corpora for Stress Detection

Merriam-Webster dictionary
• Website
• Some statistics
  – # pronunciations: 21950
  – Usable files: 14994
    • No. of syllables > 1
    • Available in our dictionary
    • Valid output from ASRA

In-house recordings
• Recordings from MSAR for several years
• Available upon request


Speech Corpus for Lexical Stress Detection

Merriam-Webster Online Dictionary’s lexical pronunciations

– http://www.merriam-webster.com

– All utterances are pronounced by native speakers

Number of words by stress position (rows) and number of syllables (columns)

Stress position      2      3      4      5      6      7      8
1                 5090   2421    465     36      0      1      0
2                 1691   1654   1324    147      9      0      0
3                    0    348    926    450     27      0      0
4                    0      0     34    242     72      4      2
5                    0      0      0      1     30     11      0
6                    0      0      0      0      0      7      0
7                    0      0      0      0      0      0      0
Total             6781   4423   2749    876    138     23      2

Total utterances: 14992
Total syllables: 43212
Stressed syllables: 14992
Unstressed syllables: 28220
Stressed : Unstressed = 1 : 1.9
Sample rate: 16000 Hz
Resolution: 16 bits
Channel: mono
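The summary statistics follow from the table by simple arithmetic. The sketch below recomputes them, assuming exactly one stressed syllable per word; the variable names are illustrative only.

```python
# Rows: stress position 1..7; columns: words with 2..8 syllables (from the table).
COUNTS = [
    [5090, 2421, 465, 36, 0, 1, 0],
    [1691, 1654, 1324, 147, 9, 0, 0],
    [0, 348, 926, 450, 27, 0, 0],
    [0, 0, 34, 242, 72, 4, 2],
    [0, 0, 0, 1, 30, 11, 0],
    [0, 0, 0, 0, 0, 7, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
SYLLABLE_COUNTS = [2, 3, 4, 5, 6, 7, 8]

# Column sums: number of words for each word length.
words_per_length = [
    sum(row[j] for row in COUNTS) for j in range(len(SYLLABLE_COUNTS))
]
total_words = sum(words_per_length)

# Each word of length n contributes n syllables.
total_syllables = sum(n * w for n, w in zip(SYLLABLE_COUNTS, words_per_length))

# Assumption: exactly one stressed syllable per word.
stressed = total_words
unstressed = total_syllables - stressed
ratio = unstressed / stressed

print(words_per_length, total_words, total_syllables, unstressed, round(ratio, 1))
```

The imbalance of roughly two unstressed vowels per stressed vowel is worth keeping in mind when reading per-class accuracies.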


Stress Detection based on Vowel Classification

SD is based on vowel classification due to the following observations
• Each word has a stressed syllable
• Each syllable is usually composed of a consonant and a vowel
• Vowels are always voiced (have pitch)

Therefore
• Each vowel is classified into “unstressed” or “stressed”

To determine the stressed syllable in an utterance, pick the vowel with
• Max likelihood of the class “Stressed”
• Min likelihood of the class “Unstressed”
• Max difference of the above two
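The three word-level criteria can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: the function name `detect_stress`, the criterion names, and the toy scores are hypothetical, and the scores may be likelihoods or log-likelihoods from any vowel classifier.

```python
def detect_stress(scores, criterion="difference"):
    """Pick the stressed syllable (1-based) from per-vowel class scores.

    scores: list of (stressed_score, unstressed_score), one pair per vowel,
    in syllable order.
    """
    if not scores:
        raise ValueError("at least one vowel is required")
    if criterion == "max_stressed":
        values = [s for s, _ in scores]
    elif criterion == "min_unstressed":
        # Smallest "unstressed" score wins, so negate before taking the max.
        values = [-u for _, u in scores]
    elif criterion == "difference":
        values = [s - u for s, u in scores]
    else:
        raise ValueError(f"unknown criterion: {criterion}")
    best = max(range(len(values)), key=values.__getitem__)
    return best + 1


# Toy scores for a two-syllable word, chosen so that the criteria disagree.
toy = [(0.8, 0.7), (0.6, 0.1)]
by_stressed = detect_stress(toy, "max_stressed")
by_unstressed = detect_stress(toy, "min_unstressed")
by_difference = detect_stress(toy, "difference")
print(by_stressed, by_unstressed, by_difference)
```

The toy input shows why the choice matters: the first vowel has the highest "stressed" score, but the second vowel is the least "unstressed" and has the largest margin.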


Features for vowels

Vowel-based features
• Pitch: min, mean, max, range, std, slope, etc.
• Volume: min, mean, max, range, std, slope, etc.
• Duration (normalized by speech rate)
• Legendre polynomial fitting for pitch & volume
• Spectral-emphasis versions of the above
• …
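A minimal sketch of two of these feature groups, the contour statistics and the Legendre polynomial fit, is given below. It uses only the Python standard library; the function names, the degree-3 fit, the mapping of frames onto [-1, 1], and the toy pitch contour (assumed to be in semitones) are all assumptions for illustration, not the settings used in the experiments.

```python
import statistics


def slope(values):
    """Least-squares slope of the values against their frame index."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = statistics.fmean(values)
    num = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


def contour_stats(values):
    """Min, mean, max, range, std, and slope of a pitch or volume contour."""
    return {
        "min": min(values),
        "mean": statistics.fmean(values),
        "max": max(values),
        "range": max(values) - min(values),
        "std": statistics.pstdev(values),
        "slope": slope(values),
    }


def legendre_basis(x, degree):
    """Values P_0(x)..P_degree(x), computed with Bonnet's recurrence."""
    values = [1.0, x]
    for k in range(1, degree):
        values.append(((2 * k + 1) * x * values[k] - k * values[k - 1]) / (k + 1))
    return values[: degree + 1]


def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    solution = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * solution[c] for c in range(r + 1, n))
        solution[r] = (aug[r][n] - tail) / aug[r][r]
    return solution


def legendre_fit(contour, degree=3):
    """Least-squares Legendre coefficients of a contour mapped onto [-1, 1]."""
    n = len(contour)  # requires n > degree
    xs = [-1.0 + 2.0 * i / (n - 1) for i in range(n)]
    basis = [legendre_basis(x, degree) for x in xs]
    m = degree + 1
    gram = [[sum(b[i] * b[j] for b in basis) for j in range(m)] for i in range(m)]
    moments = [sum(b[i] * y for b, y in zip(basis, contour)) for i in range(m)]
    return solve(gram, moments)


# Toy pitch contour for one vowel: a straight rise from 48 to 52.
pitch = [48.0 + 0.5 * i for i in range(9)]
stats = contour_stats(pitch)
coefficients = legendre_fit(pitch, degree=3)
print(stats)
print([round(c, 6) for c in coefficients])
```

Because the frames are mapped onto a fixed interval, the coefficients describe the contour shape (level, tilt, curvature) independently of the vowel's duration, which is one reason such fits are convenient as fixed-length features.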


Lexical Stress Detection – Experiment 1

Feature Set
• E: Root Mean Square Energy
• D: Duration
• P: Pitch
• S: Root Mean Square Spectral Emphasis Energy
• PS: Pitch Slope
• CE: Legendre Coefficient of Root Mean Square Energy Contour
• CP: Legendre Coefficient of Pitch Contour
• CS: Legendre Coefficient of Spectral Emphasis Energy Contour

10-fold Cross Validation

Classifier: SVM
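For readers unfamiliar with the protocol, 10-fold cross validation can be sketched as below. Only the data split is shown; the SVM training and scoring are omitted. The function name `k_fold_indices`, the shuffling seed, and the choice to split by word (so that all vowels of a word land in the same fold) are assumptions, not details confirmed by the slides.

```python
import random


def k_fold_indices(n_items, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross validation."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of nearly equal size.
    folds = [indices[i::k] for i in range(k)]
    for held_out in range(k):
        test = folds[held_out]
        train = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
        yield train, test


# Toy run: 25 words, 10 folds. Each word is held out exactly once.
splits = list(k_fold_indices(25, k=10))
for train, test in splits:
    print(len(train), len(test))
```

In each round a classifier would be trained on `train` and evaluated on `test`, and the ten test results would then be pooled or averaged.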


Confusion matrices (rows: actual stress position; columns: detected stress position; each row sums to 100%)

2-syllable words

          1st      2nd
1st    95.13%    4.87%
2nd    25.67%   74.33%

3-syllable words

          1st      2nd      3rd
1st    96.08%    3.10%    0.83%
2nd     8.28%   86.58%    5.14%
3rd    31.90%    5.75%   62.36%

4-syllable words

          1st      2nd      3rd      4th
1st    96.13%    2.37%    1.51%       0%
2nd     8.91%   87.76%    2.34%    0.98%
3rd    21.62%    2.46%   73.95%    0.97%
4th    38.24%    5.88%    2.94%   52.94%

5-syllable words

          1st      2nd      3rd      4th      5th
1st      100%       0%       0%       0%       0%
2nd     8.16%   88.44%    2.72%    0.68%       0%
3rd    19.33%    1.78%   76.67%    1.78%    0.44%
4th    13.64%   13.22%    2.48%   70.66%       0%
5th      100%       0%       0%       0%       0%
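Tables like these are row-normalized confusion matrices. The sketch below shows the computation; the function name is hypothetical. As a check, it uses the 34 four-syllable words stressed on syllable 4 (from the corpus table) with detected counts of 13, 2, 1, and 18. Those counts are inferred here from the reported percentages and are not stated on the slides.

```python
def confusion_percentages(actual, detected, n_positions):
    """Row-normalized confusion matrix in percent (rows: actual position)."""
    counts = [[0] * n_positions for _ in range(n_positions)]
    for a, d in zip(actual, detected):
        counts[a - 1][d - 1] += 1  # positions are 1-based
    table = []
    for row in counts:
        total = sum(row)
        table.append([100.0 * c / total if total else 0.0 for c in row])
    return table


# Inferred example: 34 words with actual stress on syllable 4.
actual = [4] * 34
detected = [1] * 13 + [2] * 2 + [3] * 1 + [4] * 18
table = confusion_percentages(actual, detected, n_positions=4)
fourth_row = [round(p, 2) for p in table[3]]
print(fourth_row)
```

Rows with few samples, such as the single five-syllable word stressed on syllable 5, give percentages that move in large steps, so they should be read with care.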


Lexical Stress Detection – Experiment 2

10-fold Cross Validation

Classifier: SVM

Syllable Number-Independent Classifier vs. Syllable Number-dependent Classifier

Feature Set

Max. Root Mean Square Energy

Mean Root Mean Square Energy

Max. Pitch

Median Pitch

Duration

Max. Spectral Emphasis Root Mean Square Energy

Mean Spectral Emphasis Root Mean Square Energy

Pseudo-Slope of Pitch Contour

Legendre Polynomial Coefficients of Pitch Contour

Legendre Polynomial Coefficients of RMS Energy Contour

Legendre Polynomial Coefficients of Spectral Emphasis RMS Energy Contour


Lexical Stress Detection – Experiment 3

Classifiers
• GMMC: Gaussian Mixture Model Classifier
• NBC: Naïve Bayes Classifier
• QC: Quadratic Classifier
• SVMC: Support Vector Machine Classifier

Feature Set

Max. Root Mean Square Energy

Mean Root Mean Square Energy

Max. Pitch

Median Pitch

Duration

Max. Spectral Emphasis Root Mean Square Energy

Mean Spectral Emphasis Root Mean Square Energy

Pseudo-Slope of Pitch Contour

Legendre Polynomial Coefficients of Pitch Contour

Legendre Polynomial Coefficients of RMS Energy Contour

Legendre Polynomial Coefficients of Spectral Emphasis RMS Energy Contour

10-fold Cross Validation
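To make the comparison concrete, here is a minimal sketch of one of the four classifier types, a Gaussian naive Bayes classifier for "stressed" vs. "unstressed" vowels. It is not the implementation used in the experiments: the function names, the two toy features (duration in seconds and max pitch), and the toy training data are assumptions for illustration.

```python
import math


def fit_gaussian_nb(features, labels):
    """Estimate per-class log prior and per-feature mean and variance."""
    model = {}
    for label in sorted(set(labels)):
        rows = [f for f, l in zip(features, labels) if l == label]
        n = len(rows)
        dims = len(rows[0])
        means = [sum(r[d] for r in rows) / n for d in range(dims)]
        variances = [
            # Floor the variance to avoid division by zero on tiny data sets.
            max(sum((r[d] - means[d]) ** 2 for r in rows) / n, 1e-6)
            for d in range(dims)
        ]
        model[label] = (math.log(n / len(labels)), means, variances)
    return model


def log_scores(model, x):
    """Log prior plus log likelihood of x under each class."""
    scores = {}
    for label, (log_prior, means, variances) in model.items():
        score = log_prior
        for value, mean, var in zip(x, means, variances):
            score += -0.5 * math.log(2 * math.pi * var)
            score -= (value - mean) ** 2 / (2 * var)
        scores[label] = score
    return scores


# Toy training vowels: [duration_sec, max_pitch]. Values are invented.
train_x = [[0.14, 51.0], [0.12, 50.0], [0.15, 52.0],
           [0.06, 46.0], [0.07, 47.0], [0.05, 45.0]]
train_y = ["stressed"] * 3 + ["unstressed"] * 3

model = fit_gaussian_nb(train_x, train_y)
scores = log_scores(model, [0.13, 50.5])
predicted = max(scores, key=scores.get)
print(predicted)
```

The per-class scores returned by `log_scores` are the kind of per-vowel values that the word-level decision needs, so the same post-processing applies regardless of which classifier produces them.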


Lexical Stress Detection – Error Analysis


Lexical Stress Detection – Error Analysis


More on Stress Detection

ASRA
• Chapter 20 of online tutorial on Audio Signal Processing
• Demo
  – Recognition
    • goDemoVc.m in ASR
    • Web
  – Assessment
    • goDemoSa.m in ASR
    • Web

Stress detection
• Application note
• Demo