
Building a sentential model for automatic prosody evaluation. Kyuchul Yoon, School of English Language & Literature, Yeungnam University. 2009.06.19, Korea University. Part A


Page 1: Building a sentential model for automatic prosody evaluation Kyuchul Yoon School of English Language & Literature Yeungnam University 2009.06.19 Korea

Building a sentential model for automatic prosody evaluation

Kyuchul Yoon
School of English Language & Literature
Yeungnam University

2009.06.19
Korea University

Part A

Page 2

English pronunciation evaluation

English pronunciation proficiency evaluation
– Ultimate goals
  • Evaluation at
    – The segmental level
    – The suprasegmental level
– Current goals
  • Evaluation at
    – The suprasegmental level

Introduction

Page 3

English pronunciation evaluation

The goal of the present study
– Prosody evaluation of a single target utterance
  • Produced by a Korean student
  • Given
    – An English target sentence
    – A sentential model for prosody evaluation

Introduction

Page 4

Manual vs. automatic

Problems of manual evaluation
– What to evaluate
– How to evaluate
– Consistency

Problems of automatic evaluation
– How to reflect human knowledge

Introduction

Page 5

Manual vs. automatic

A possible solution?
– Avoid knowledge-based abstraction
  • Compare a target utterance with native speakers’ utterances
– Use multiple utterances for comparison
  • Multiple “good” utterances from native speakers
– Adopt raw values
  • Calculate difference values between the target and the “good” utterances in terms of
    – The three prosodic aspects: F0, intensity, durations (as 3D coordinates)

Introduction

Page 6

How to build the model

Use multivariate statistical analysis
– A discriminant analysis

The components of the model (the segmental proficiency scores controlled)
– The manual prosody evaluation scores (response)
– The automatic prosody evaluation scores (factors)

The requirements of the model
– The correlation between the two levels
  • Manual scores vs. automatic scores

Introduction

Page 7

How to build the model

The manual prosody scores (an ideal case)
• The “good” utterance versions (point 5) by many native speakers of English
• The utterance versions by Korean students whose prosodic proficiencies are
  • High (point 5)
  • Intermediate (point 3)
  • Low (point 1)
• On a scale of 1 (worst) to 5 (best)

Introduction

Page 8

How to build the model

The automatic prosody scores
• Use of Praat scripts
• Comparison between a single target utterance & multiple native speakers’ utterances to yield scores for
  – The F0 difference
  – The intensity difference
  – The duration difference
  in the form of 3D coordinates (x, y, z) = (F0, Int, Dur)
• One utterance yields as many coordinates as the number of “good” native speakers
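The comparison described on this slide can be sketched in Python as follows. This is a minimal illustration, not the Praat scripts used in the study: the function name `score_array`, the dictionary keys, and the sample numbers are all hypothetical, and the contours are assumed to be already matched in length so that a plain Euclidean distance applies.

```python
import math

def score_array(target, natives):
    """Return one (F0, Int, Dur) coordinate per native model utterance."""
    return [
        (
            math.dist(target["f0"], native["f0"]),                # F0 difference
            math.dist(target["intensity"], native["intensity"]),  # intensity difference
            math.dist(target["durations"], native["durations"]),  # duration difference
        )
        for native in natives
    ]

# Hypothetical, already length-matched measurements (not data from the study).
target = {"f0": [200.0, 210.0], "intensity": [60.0, 62.0], "durations": [100.0, 200.0]}
natives = [
    {"f0": [203.0, 214.0], "intensity": [60.0, 62.0], "durations": [100.0, 200.0]},
    {"f0": [200.0, 210.0], "intensity": [66.0, 70.0], "durations": [100.0, 200.0]},
]

scores = score_array(target, natives)
print(len(scores))  # one coordinate per native speaker
```

With two native model utterances the target receives two coordinates, which mirrors the last bullet: the length of the score array equals the number of "good" native speakers.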

Introduction

Page 9

How to build the model

Evaluation by comparisons

Introduction

Page 10

A 3D sentential model for prosody evaluation

A 3D model
– 3D axes: F0, intensity, durations
  (F0, Int, Dur) coordinates = (x, y, z)
– Automatic scores as scatterplot points
– Manually evaluated scores group the points

Introduction

Page 11

A 3D sentential model for prosody evaluation

Validity of the model
– Sufficient separation of groups with different manual scores
– Colors: manual scores
– Arrowheads: automatic scores

Introduction

Page 12

Sentential prosody evaluation [7]
Before & after duration manipulation

[Figure panels: native; learner (before); learner (after)]

Methods

Page 13

Sentential prosody evaluation [7]
F0: point-to-point comparison between native and learner after normalization

[Figure panels: native; learner (after)]

Methods

Automatic score: (F0, Int, Dur) = (x, y, z)
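A point-to-point comparison of two contours can be sketched as below. The slide does not say how the normalization is done, so this sketch makes a loud assumption: both contours are simply resampled to the same number of points by linear interpolation, as a stand-in for the duration manipulation shown on the previous slide. The function names and sample values are hypothetical. The same sketch would apply to intensity contours.

```python
import math

def resample(contour, n):
    """Linearly interpolate a contour to n equally spaced points (n >= 2)."""
    last = len(contour) - 1
    out = []
    for i in range(n):
        pos = i * last / (n - 1)      # fractional index into the original contour
        lo = int(pos)
        hi = min(lo + 1, last)
        frac = pos - lo
        out.append(contour[lo] * (1 - frac) + contour[hi] * frac)
    return out

def contour_distance(native, learner, n=5):
    """Euclidean distance between two contours after time alignment (assumed method)."""
    return math.dist(resample(native, n), resample(learner, n))

# Hypothetical F0 values in Hz, sampled at different rates.
native_f0 = [100.0, 120.0, 140.0]
learner_f0 = [100.0, 110.0, 120.0, 130.0, 140.0]

print(contour_distance(native_f0, learner_f0))
```

Here the two contours trace the same rise, so their distance is zero even though they were sampled with different numbers of points; any deviation in shape would raise the F0 coordinate of the automatic score.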

Page 14

Sentential prosody evaluation [7]
Intensity: point-to-point comparison between native and learner after normalization

[Figure panels: native; learner (after)]

Methods

Automatic score: (F0, Int, Dur) = (x, y, z)

Page 15

Sentential prosody evaluation [7]
Duration: segment-to-segment comparison between native and learner

[Figure panels: native; learner (before)]

Methods

Euclidean distance metric as the evaluation measure
P = (p1, p2, p3, ..., pn) and Q = (q1, q2, q3, ..., qn) in Euclidean n-dimensional space

Automatic score: (F0, Int, Dur) = (x, y, z)
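The Euclidean distance between P and Q is the square root of the sum of the squared differences (pi − qi) over all n dimensions. A minimal Python sketch for the duration case follows, where each dimension is one segment. The function name, the millisecond unit, and the sample durations are assumptions made for illustration only.

```python
import math

def euclidean_distance(p, q):
    """Distance between two equal-length vectors P and Q."""
    if len(p) != len(q):
        # Segment-to-segment comparison needs matching segment counts.
        raise ValueError("P and Q must have the same number of segments")
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

# Hypothetical segment durations in milliseconds (not data from the study).
native_durations = [100.0, 250.0, 150.0]
learner_durations = [130.0, 290.0, 150.0]

duration_score = euclidean_distance(native_durations, learner_durations)
print(duration_score)
```

The length check reflects a practical constraint of this measure: the two utterances must contain the same segments in the same order before their durations can be compared.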

Page 16

Manual evaluation of sentential prosody

Methods

Manual scores for Set B utterances: “The dancing queen likes only the apple pies”

Page 17

Sentential prosody evaluation [7]

Methods

A sample score array for one utterance from group K5: one learner utterance vs. 10 model native utterances

Automatic prosody score for K5.U1 = {(899,142,408), (360,92,190), …(716,178,183)}
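One way to hold such a score array, and to turn it into labeled points for the scatterplot or a later statistical analysis, is sketched below. Only the three tuples legible on the slide are used; the elided ones are not reconstructed. The function name `label_points` and the row layout are hypothetical.

```python
# Only the three coordinates visible on the slide; the elided ones are left out.
k5_u1 = [(899, 142, 408), (360, 92, 190), (716, 178, 183)]

def label_points(score_arrays):
    """Flatten {(group, utterance): [(F0, Int, Dur), ...]} into labeled rows."""
    rows = []
    for (group, utterance), coords in score_arrays.items():
        for f0, intensity, duration in coords:
            rows.append({
                "group": group,          # manual score group, e.g. K5
                "utterance": utterance,  # utterance id within the group
                "F0": f0,
                "Int": intensity,
                "Dur": duration,
            })
    return rows

rows = label_points({("K5", "U1"): k5_u1})
print(len(rows))
```

Each row is one point in the (F0, Int, Dur) space, tagged with the manually assigned group so that points can later be grouped or classified by manual score.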

Page 18

A prosody evaluation model by a Korean phonetician

Results

Korean phonetician’s Model

Page 19

A prosody evaluation model by a Korean phonetician

Results

Korean phonetician’s Model

Page 20

A sample prosody evaluation with a discriminant analysis

Results

Page 21

To make this fully automatic

Discussion

For manual evaluation of the training model
– The number of Korean learners
  • The more the better
– The levels of English proficiency
  • The more diverse the better (scores 1 through 5)

For automatic evaluation of the trainees
– Need automatic segmentation (ASR)
– Need to deal with redundant/missing segments

Page 22

Building a sentential model for automatic evaluation of pronunciation proficiency

What about segmental evaluation?

Part B

Page 23

Segmental evaluation by spectral comparison

Methods

Sex/age controlled (no normalization was used)
– Adult male (native/Korean) speakers were selected

Spectral comparison
– Three equally spaced spectral slices were used for each pair of matching segments
– A Euclidean distance measure was computed from each pair of matching spectral envelopes

Four coordinates for pronunciation proficiency evaluation
– Segments, F0, intensity, durations
– (w, x, y, z) becomes one element of the score array
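The spectral comparison can be sketched as follows. The slide does not state where in a segment the three slices are taken or how the three distances are combined, so two assumptions are made here and should be read as such: slices are placed at 25%, 50%, and 75% of the segment, and the per-slice distances are summed. The function names and the toy envelopes are hypothetical.

```python
import math

def slice_times(start, end, n_slices=3):
    """Times of n equally spaced slices inside a segment (assumed placement)."""
    step = (end - start) / (n_slices + 1)
    return [start + step * k for k in range(1, n_slices + 1)]

def segment_distance(native_slices, learner_slices):
    """Sum of Euclidean distances between matching spectral envelopes (assumed pooling)."""
    return sum(math.dist(a, b) for a, b in zip(native_slices, learner_slices))

# Slice positions for a hypothetical segment spanning 0.0 s to 1.0 s.
print(slice_times(0.0, 1.0))

# Toy two-bin envelopes (e.g. dB values), three slices per segment.
native = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
learner = [[3.0, 4.0], [0.0, 0.0], [6.0, 8.0]]
w = segment_distance(native, learner)
print(w)
```

The resulting value would play the role of w, the segmental coordinate that joins the three prosodic coordinates to form (w, x, y, z).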

Page 24

Manual evaluation of overall proficiency

Methods

Manual scores for Set C utterances: “Put your toys away right now”

<Table 4> The overall scores of the 34 utterances for the Set C sentence “Put your toys away right now”. The manual evaluation was performed by a Korean phonetician. Note that the subjects were all male adults.

Page 25

A pronunciation proficiency evaluation model by a Korean phonetician

Results

Korean phonetician’s Models

(Intensity axis not shown)

Page 26

A prosody evaluation model by a Korean phonetician

Results

Korean phonetician’s Model

Page 27

A discriminant analysis

Results

<Table 5> The classification table from the discriminant analysis of one test data. The number in each cell represents the probability of the automatic pronunciation proficiency score being classified into the predicted group.

<Table 6> The confusion matrix for the classification table.

Page 28

Discriminant analyses with leave-one-out cross-validation

Results

Testing for score 4 : 6 out of 9 correct

Testing for score 2 : 12 out of 15 correct
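As a quick check of the two results above, the per-group and pooled success rates can be tallied. This is plain arithmetic on the counts reported on the slide; the variable names are illustrative.

```python
# (correct, total) per manual score, as reported on the slide.
results = {"score 4": (6, 9), "score 2": (12, 15)}

rates = {label: correct / total for label, (correct, total) in results.items()}
overall_correct = sum(correct for correct, _ in results.values())
overall_total = sum(total for _, total in results.values())
overall_rate = overall_correct / overall_total

for label, rate in rates.items():
    print(f"{label}: {rate:.1%}")
print(f"overall: {overall_correct} of {overall_total} ({overall_rate:.1%})")
```

Pooled over both groups, 18 of the 24 utterances were classified correctly, which is 75%.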

Page 29

Discriminant analyses with leave-one-out cross-validation

Results

For the N4 & K2 groups, evaluation models were built by using
– The discriminant analysis with
– Leave-one-out cross-validation

The number of models (built by discriminant analyses) was 24
– Group N4: 9 subjects
– Group K2: 15 subjects

Success rate
– Group N4: 6 out of 9 predicted correctly
– Group K2: 12 out of 15 predicted correctly
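The leave-one-out procedure can be sketched in Python as below: each subject is held out in turn, a model is built from the rest, and the held-out subject is classified, so the number of models equals the number of subjects. To keep the sketch dependency-free, a nearest-centroid classifier stands in for the discriminant analysis; it is not the method used in the study. The coordinates are made-up (w, x, y, z) values, and all function names are hypothetical.

```python
import math

def centroid(points):
    """Mean position of a list of equal-length coordinate tuples."""
    return tuple(sum(axis) / len(points) for axis in zip(*points))

def classify(point, training):
    """Assign the point to the group with the nearest centroid (stand-in classifier)."""
    centroids = {group: centroid(pts) for group, pts in training.items()}
    return min(centroids, key=lambda group: math.dist(point, centroids[group]))

def leave_one_out(data):
    """Build one model per held-out item; return (correct, total) per group."""
    tally = {}
    for group, points in data.items():
        correct = 0
        for i, held_out in enumerate(points):
            training = dict(data)                          # shallow copy of the groups
            training[group] = points[:i] + points[i + 1:]  # drop the held-out item
            if classify(held_out, training) == group:
                correct += 1
        tally[group] = (correct, len(points))
    return tally

# Hypothetical (w, x, y, z) scores, not data from the study.
data = {
    "N4": [(1.0, 1.0, 1.0, 1.0), (2.0, 1.0, 1.0, 2.0), (1.0, 2.0, 2.0, 1.0)],
    "K2": [(8.0, 9.0, 8.0, 9.0), (9.0, 8.0, 9.0, 8.0), (2.0, 2.0, 1.0, 1.0)],
}

tally = leave_one_out(data)
print(tally)
```

In this toy data one K2 point lies inside the N4 cluster and is misclassified when held out, which is the kind of error the success rates on this slide count.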

Page 30

Automatic evaluation of pronunciation proficiency

Discussion

Viability of sentential models for the evaluation of
– Segmental proficiency: spectral comparison
– Prosodic proficiency: F0/intensity/durations
in the form of multiple score array coordinates (segments, F0, intensity, durations) = (w, x, y, z)

Comparison seems to work
– A target utterance vs. multiple model native utterances

Better models can be built with
– More (controlled) utterances
– More score resolution
  • Current: score 2 (bad) – score 4 (good)
  • Future: score 1 (worst) – score 3 (fair) – score 5 (best)

Page 31

References
[1] Boersma, P., “Praat, a system for doing phonetics by computer”, Glot International 5(9/10), pp. 341-345, 2001.
[2] Mahalanobis, P. C., “On the generalized distance in statistics”, Proceedings of the National Institute of Science of India 12, pp. 49-55, 1936.
[3] Moulines, E. & F. Charpentier, “Pitch synchronous waveform processing techniques for text-to-speech synthesis using diphones”, Speech Communication 9, pp. 453-467, 1990.
[4] Ramus, F., M. Nespor & J. Mehler, “Correlates of linguistic rhythm in the speech signal”, Cognition 73, pp. 265-292, 1999.
[5] Rhee, S., S. Lee, Y. Lee & S. Kang, “Design and construction of Korean-Spoken English Corpus (K-SEC)”, Malsori 46, pp. 159-174, 2003.
[6] Yoon, K., “Imposing native speakers' prosody on non-native speakers' utterances: The technique of cloning prosody”, Journal of the Modern British & American Language & Literature 25(4), pp. 197-215, 2007.
[7] Yoon, K., “Synthesis and evaluation of prosodically exaggerated utterances”, unpublished manuscript, 2008.