
Building a sentential model for automatic prosody evaluation

Kyuchul Yoon
School of English Language & Literature
Yeungnam University

2009.06.19
Korea University

Part A

English pronunciation evaluation

English pronunciation proficiency evaluation
– Ultimate goals
  • Evaluation at
    – The segmental level
    – The suprasegmental level
– Current goals
  • Evaluation at
    – The suprasegmental level

Introduction

English pronunciation evaluation

The goal of the present study
– Prosody evaluation of a single target utterance
  • Produced by a Korean student
  • Given
    – An English target sentence
    – A sentential model for prosody evaluation

Introduction

Manual vs. automatic

Problems of manual evaluation
– What to evaluate
– How to evaluate
– Consistency

Problems of automatic evaluation
– How to reflect human knowledge

Introduction

Manual vs. automatic

A possible solution?
– Avoid knowledge-based abstraction
  • Compare a target utterance with native speakers’ utterances
– Use multiple utterances for comparison
  • Multiple “good” utterances from native speakers
– Adopt raw values
  • Calculate difference values between the target and the “good” utterances in terms of
    – The three prosodic aspects: F0, intensity, durations (3D coordinates)

Introduction

How to build the model

Use multivariate statistical analysis
– A discriminant analysis

The components of the model (the segmental proficiency scores controlled)
– The manual prosody evaluation scores (response)
– The automatic prosody evaluation scores (factors)

The requirements of the model
– The correlation between the two levels
  • Manual scores vs. automatic scores

Introduction


The manual prosody scores (an ideal case)
• The “good” utterance versions (point 5) by many native speakers of English
• The utterance versions by Korean students whose prosodic proficiencies are
  • High (point 5)
  • Intermediate (point 3)
  • Low (point 1)
• On a scale of 1 (worst) to 5 (best)

Introduction


The automatic prosody scores
• Use of Praat scripts
• Comparison between a single target utterance & multiple native speakers’ utterances to yield scores for
  – The F0 difference
  – The intensity difference
  – The duration difference
  in the form of 3D coordinates (x, y, z) = (F0, Int, Dur)
• One utterance yields as many coordinates as the number of “good” native speakers
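The way one learner utterance turns into several (F0, Int, Dur) coordinates can be sketched in Python. This is only an illustration, not the Praat scripts used in the study: the dictionary keys, the function name `score_array`, and the toy numbers are all hypothetical, and the contours are assumed to be already normalized and of equal length so that they can be compared point to point.

```python
import math

def score_array(learner, natives):
    """Return one (F0, Int, Dur) difference coordinate per native model utterance.

    Each utterance is a dict with three equal-length numeric sequences:
    "f0" and "intensity" (contours sampled at matching points) and
    "durations" (one value per matching segment). These keys are hypothetical.
    """
    return [
        (
            math.dist(learner["f0"], native["f0"]),
            math.dist(learner["intensity"], native["intensity"]),
            math.dist(learner["durations"], native["durations"]),
        )
        for native in natives
    ]

# Toy values, invented purely for illustration.
learner = {"f0": [0.0, 3.0], "intensity": [1.0, 1.0], "durations": [0.5, 0.5]}
natives = [
    {"f0": [0.0, 0.0], "intensity": [1.0, 5.0], "durations": [0.5, 0.5]},
    {"f0": [4.0, 0.0], "intensity": [1.0, 1.0], "durations": [0.2, 0.9]},
]

# Two native model utterances give two coordinates for the one learner utterance.
scores = score_array(learner, natives)
for f0_diff, int_diff, dur_diff in scores:
    print(f"({f0_diff:.2f}, {int_diff:.2f}, {dur_diff:.2f})")
```

With ten native model utterances the same call would return ten coordinates, which is the shape of the score arrays plotted in the 3D model.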

Introduction


Evaluation by comparisons

Introduction

A 3D sentential model for prosody evaluation

A 3D model
– 3D axes: F0, intensity, durations
  (F0, Int, Dur) coordinates = (x, y, z)
– Automatic scores as scatterplot points
– Manually evaluated scores group the points

Introduction

A 3D sentential model for prosody evaluation

Validity of the model
– Sufficient separation of groups with different manual scores

– colors : manual scores
– arrowheads : automatic scores

Introduction

Sentential prosody evaluation [7]
Before & after duration manipulation

[Figure panels: native; learner (before); learner (after)]

Methods

Sentential prosody evaluation [7]
F0 : point-to-point comparison between native and learner after normalization

[Figure panels: native; learner (after)]

Methods

Automatic score: (F0, Int, Dur) = (x, y, z)

Sentential prosody evaluation [7]
Intensity : point-to-point comparison between native and learner after normalization

[Figure panels: native; learner (after)]

Methods


Sentential prosody evaluation [7]
Duration : segment-to-segment comparison between native and learner

[Figure panels: native; learner (before)]

Methods

P = (p1, p2, p3, ..., pn) and Q = (q1, q2, q3, ..., qn) are points in Euclidean n-dimensional space

Euclidean distance metric as the evaluation measure
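For two points P and Q as defined above, the Euclidean distance is the standard one. It is written out here for reference; the symbol d is a label chosen for this note:

```latex
d(P, Q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}
```

For an F0 or intensity contour, n is the number of compared points; for durations, n is the number of matching segments.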


Manual evaluation of sentential prosody

Methods

Manual scores for Set B utterances
“The dancing queen likes only the apple pies”

Sentential prosody evaluation [7]

Methods

A sample score array for one utterance from group K5: one learner utterance vs. 10 model native utterances

Automatic prosody score for K5.U1 = {(899,142,408), (360,92,190), … (716,178,183)}

A prosody evaluation model by a Korean phonetician

Results

Korean phonetician’s Model

A sample prosody evaluation with a discriminant analysis

Results

To make this fully automatic

Discussion

For manual evaluation of the training model
– The number of Korean learners
  • The more the better
– The levels of English proficiency
  • The more diverse the better (scores 1 through 5)

For automatic evaluation of the trainees
– Need automatic segmentation (ASR)
– Need to deal with redundant/missing segments

Building a sentential model for automatic evaluation of pronunciation proficiency

What about segmental evaluation?

Part B

Segmental evaluation by spectral comparison

Methods

Sex/age controlled (no normalization was used)
– Adult male (native/Korean) speakers were selected

Spectral comparison
– Three equally-spaced spectral slices were used for each matching segment
– The Euclidean distance was measured between each pair of matching spectral envelopes

Four coordinates for pronunciation proficiency evaluation
– Segments, F0, intensity, durations
– (w, x, y, z) becomes one entry of the score array
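The spectral comparison can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure actually run in the study: a segment is assumed to be a list of spectral frames (each a list of envelope values per frequency bin), the three slices are assumed to sit at roughly 1/4, 1/2 and 3/4 of the segment, and the three distances are assumed to be averaged into one value. The function names `pick_slices` and `spectral_distance` and the toy data are hypothetical.

```python
import math

def pick_slices(frames, n_slices=3):
    """Return n_slices frames taken at equally spaced interior positions.

    Assumption: with three slices the positions are about 1/4, 1/2 and 3/4
    of the way through the segment.
    """
    if len(frames) < n_slices:
        raise ValueError("segment has fewer frames than requested slices")
    last = len(frames) - 1
    return [frames[round(k * last / (n_slices + 1))] for k in range(1, n_slices + 1)]

def spectral_distance(native_frames, learner_frames, n_slices=3):
    """Mean Euclidean distance between matching spectral slices.

    Averaging the per-slice distances is an assumption made for this sketch.
    """
    pairs = zip(pick_slices(native_frames, n_slices),
                pick_slices(learner_frames, n_slices))
    return sum(math.dist(a, b) for a, b in pairs) / n_slices

# Toy segments with five frames of two frequency bins each (invented values).
native_seg = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0]]
learner_seg = [[9.0, 9.0], [1.0, 1.0], [2.0, 5.0], [3.0, 3.0], [9.0, 9.0]]

# Only frames 1, 2 and 3 are compared, so the edge frames do not contribute.
dist = spectral_distance(native_seg, learner_seg)
print(dist)
```

Because the slices are picked by relative position, the two segments do not need the same number of frames, only the same number of frequency bins per frame.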

Manual evaluation of overall proficiency

Methods

Manual scores for Set C utterances
“Put your toys away right now”

<Table 4> The overall scores of the 34 utterances for Set C sentence “Put your toys away right now”. The manual evaluation was performed by a Korean phonetician. Note that the subjects were all male adults.

A pronunciation proficiency evaluation model by a Korean phonetician

Results

Korean phonetician’s Models

(Intensity axis not shown)

A prosody evaluation model by a Korean phonetician

Results

Korean phonetician’s Model

A discriminant analysis

Results

<Table 5> The classification table from the discriminant analysis of one test data. The number in each cell represents the probability of the automatic pronunciation proficiency score being classified into the predicted group.

<Table 6> The confusion matrix for the classification table.

Discriminant analyses with leave-one-out cross-validation

Results

Testing for score 4 : 6 out of 9 correct

Testing for score 2 : 12 out of 15 correct

Discriminant analyses with leave-one-out cross-validation

Results

For N4 & K2 groups, evaluation models were built by using
– The discriminant analysis with
– Leave-one-out cross-validation

The number of models (built by discriminant analyses) was 24
– Group N4 : 9 subjects
– Group K2 : 15 subjects

Success rate
– Group N4 : 6 out of 9 predicted correctly
– Group K2 : 12 out of 15 predicted correctly
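The leave-one-out procedure itself can be sketched in a few lines of Python. Note the substitution: the study used a discriminant analysis, whereas this dependency-free sketch plugs in a simpler nearest-centroid classifier as a stand-in, so it shows only how each held-out item is tested against a model built from the rest. All function names and the toy coordinates are hypothetical.

```python
import math

def centroid(points):
    """Component-wise mean of a list of equal-length coordinate tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def nearest_centroid(train, sample):
    """Stand-in classifier (NOT a discriminant analysis).

    train is a list of (label, point) pairs; the predicted label is the
    group whose centroid lies closest to the sample.
    """
    groups = {}
    for label, point in train:
        groups.setdefault(label, []).append(point)
    return min(groups, key=lambda lab: math.dist(centroid(groups[lab]), sample))

def leave_one_out(data, classify=nearest_centroid):
    """Hold out each item once, build a model from the rest, count correct predictions."""
    correct = 0
    for i, (label, point) in enumerate(data):
        train = data[:i] + data[i + 1:]  # one model per held-out item
        if classify(train, point) == label:
            correct += 1
    return correct, len(data)

# Toy (F0, Int, Dur) coordinates, invented for illustration only.
data = [
    ("N4", (1.0, 1.0, 1.0)), ("N4", (2.0, 1.0, 1.0)), ("N4", (1.0, 2.0, 1.0)),
    ("K2", (8.0, 8.0, 8.0)), ("K2", (9.0, 8.0, 8.0)), ("K2", (8.0, 9.0, 8.0)),
    ("K2", (1.5, 1.5, 1.0)),  # a K2 item that sits inside the N4 cluster
]

result = leave_one_out(data)
print(result)  # (6, 7): the K2 item inside the N4 cluster is misclassified
```

With n items the loop builds n models, which matches the 24 models reported for the 9 + 15 subjects; the reported success counts, 6 of 9 and 12 of 15, add up to 18 of 24 overall.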

Automatic evaluation of pronunciation proficiency

Discussion

Viability of sentential models for the evaluation of
– Segmental proficiency : spectral comparison
– Prosodic proficiency : F0/intensity/durations
in the form of multiple score array coordinates (segments, F0, intensity, durations) = (w, x, y, z)

Comparison seems to work
– A target utterance vs. multiple model native utterances

Better models can be built with
– More (controlled) utterances
– More score resolution
  • Current : score 2 (bad) – score 4 (good)
  • Future : score 1 (worst) – score 3 (fair) – score 5 (best)

References

[1] Boersma, Paul, “Praat, a system for doing phonetics by computer”, Glot International 5(9/10), pp.341-345, 2001.

[2] Mahalanobis, P.C., “On the generalized distance in statistics”, Proceedings of the National Institute of Science of India 12, pp.49-55, 1936.

[3] Moulines, E. & F. Charpentier, “Pitch synchronous waveform processing techniques for text-to-speech synthesis using diphones”, Speech Communication 9, pp.453-467, 1990.

[4] Ramus, F., M. Nespor & J. Mehler, “Correlates of linguistic rhythm in the speech signal”, Cognition 73, pp.265-292, 1999.

[5] Rhee, S., S. Lee, Y. Lee & S. Kang, “Design and construction of Korean-Spoken English Corpus (K-SEC)”, Malsori 46, pp.159-174, 2003.

[6] Yoon, K., “Imposing native speakers' prosody on non-native speakers' utterances: The technique of cloning prosody”, Journal of the Modern British & American Language & Literature 25(4), pp.197-215, 2007.

[7] Yoon, K., “Synthesis and evaluation of prosodically exaggerated utterances”, unpublished manuscript, 2008.
