Building a sentential model for automatic prosody evaluation
Kyuchul Yoon
School of English Language & Literature
Yeungnam University
2009.06.19
Korea University
Part A
English pronunciation evaluation
English pronunciation proficiency evaluation
– Ultimate goals
• Evaluation at
– The segmental level
– The suprasegmental level
– Current goals
• Evaluation at
– The suprasegmental level
Introduction
English pronunciation evaluation
The goal of the present study
– Prosody evaluation of a single target utterance
• Produced by a Korean student
• Given
– An English target sentence
– A sentential model for prosody evaluation
Manual vs. automatic
Problems of manual evaluation
– What to evaluate
– How to evaluate
– Consistency
Problems of automatic evaluation
– How to reflect human knowledge
Manual vs. automatic
A possible solution?
– Avoid knowledge-based abstraction
• Compare a target utterance with native speakers’ utterances
– Use multiple utterances for comparison
• Multiple “good” utterances from native speakers
– Adopt raw values
• Calculate difference values between the target and the “good” utterances in terms of
– The three prosodic aspects: F0, intensity, durations (3D coordinates)
How to build the model
Use multivariate statistical analysis
– A discriminant analysis
The components of the model (the segmental proficiency scores controlled)
– The manual prosody evaluation scores (response)
– The automatic prosody evaluation scores (factors)
The requirements of the model
– The correlation between the two levels
• Manual scores vs. automatic scores
How to build the model
The manual prosody scores (an ideal case)
• The “good” utterance versions (point 5) by many native speakers of English
• The utterance versions by Korean students whose prosodic proficiencies are
– High (point 5)
– Intermediate (point 3)
– Low (point 1)
• On a scale of 1 (worst) to 5 (best)
How to build the model
The automatic prosody scores
• Use of Praat scripts
• Comparison between a single target utterance & multiple native speakers’ utterances to yield scores for
– The F0 difference
– The intensity difference
– The duration difference
in the form of 3D coordinates (x, y, z) = (F0, Int, Dur)
• One utterance yields as many coordinates as the number of “good” native speakers
How to build the model
Evaluation by comparisons
A 3D sentential model for prosody evaluation
A 3D model
– 3D axes: F0, intensity, durations
(F0, Int, Dur) coordinates = (x, y, z)
– Automatic scores as scatterplot points
– Manually evaluated scores group the points
A 3D sentential model for prosody evaluation
Validity of the model
– Sufficient separation of groups with different manual scores
– Colors: manual scores
– Arrowheads: automatic scores
Sentential prosody evaluation [7]
Before & after duration manipulation
[Figure: native, learner (before), learner (after)]
Methods
Sentential prosody evaluation [7]
F0: point-to-point comparison between native and learner after normalization
[Figure: native, learner (after)]
Automatic score (F0, Int, Dur) = (x, y, z)
Sentential prosody evaluation [7]
Intensity: point-to-point comparison between native and learner after normalization
[Figure: native, learner (after)]
Automatic score (F0, Int, Dur) = (x, y, z)
Sentential prosody evaluation [7]
Duration: segment-to-segment comparison between native and learner
[Figure: native, learner (before)]
P = (p1, p2, p3, ..., pn) and Q = (q1, q2, q3, ..., qn) in Euclidean n-dimensional space
Euclidean distance metric as the evaluation measure
Automatic score (F0, Int, Dur) = (x, y, z)
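For reference, the standard Euclidean distance between the two vectors P and Q defined above is written out here. This is the textbook definition only; whether the scores on these slides apply any further scaling or weighting is not stated.

```latex
d(P, Q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}
```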
Manual evaluation of sentential prosody
Manual scores for Set B utterances: “The dancing queen likes only the apple pies”
Sentential prosody evaluation [7]
A sample score array for one utterance from group K5: one learner utterance vs. 10 model native utterances
Automatic prosody score for K5.U1 = {(899,142,408), (360,92,190), …, (716,178,183)}
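A score array of this shape could be produced roughly as follows. This is a minimal sketch, not the Praat scripts used in the study: it assumes each utterance has already been reduced to three equal-length numeric sequences (normalized F0 points, normalized intensity points, segment durations), and the names `score_array`, `"f0"`, `"intensity"` and `"durations"` are hypothetical. The toy numbers are made up for illustration and are not data from the study.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two equal-length sequences of numbers."""
    if len(p) != len(q):
        raise ValueError("sequences must have the same number of points")
    return math.dist(p, q)


def score_array(learner, natives):
    """Compare one learner utterance with every model native utterance.

    Each utterance is a dict with the (hypothetical) keys "f0",
    "intensity" and "durations". Returns one (F0, Int, Dur) coordinate
    per native utterance.
    """
    return [
        (
            euclidean(learner["f0"], native["f0"]),
            euclidean(learner["intensity"], native["intensity"]),
            euclidean(learner["durations"], native["durations"]),
        )
        for native in natives
    ]


# Toy illustration with made-up values (one native speaker only).
learner = {"f0": [0.0, 0.0, 0.0], "intensity": [1.0, 2.0], "durations": [1.0, 2.0]}
natives = [{"f0": [3.0, 4.0, 0.0], "intensity": [1.0, 2.0], "durations": [1.0, 3.0]}]
scores = score_array(learner, natives)
print(scores)  # → [(5.0, 0.0, 1.0)]
```

With 10 model native utterances in `natives`, the returned list would hold 10 coordinates, matching the shape of the K5.U1 example.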
A prosody evaluation model by a Korean phonetician
Results
[Figure: Korean phonetician’s model]
A sample prosody evaluation with a discriminant analysis
To make this fully automatic
Discussion
For manual evaluation of the training model
– The number of Korean learners
• The more the better
– The levels of English proficiency
• The more diverse the better (scores 1 through 5)
For automatic evaluation of the trainees
– Need automatic segmentation (ASR)
– Need to deal with redundant/missing segments
Building a sentential model for automatic evaluation of pronunciation proficiency
What about segmental evaluation?
Part B
Segmental evaluation by spectral comparison
Methods
Sex/age controlled (no normalization was used)
– Adult male (native/Korean) speakers were selected
Spectral comparison
– Three equally spaced spectral slices were used for each pair of matching segments
– The Euclidean distance was measured between each pair of matching spectral envelopes
Four coordinates for pronunciation proficiency evaluation
– Segments, F0, intensity, durations
– (w, x, y, z) becomes one element of the score array
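The spectral comparison could be sketched as below. This is an illustration under stated assumptions, not the study's procedure: spectra are assumed to be lists of frames (each frame a list of spectral-envelope values of equal length), segments are given as hypothetical (start, end) frame indices, "equally spaced" is read as the 1/4, 1/2 and 3/4 points of the segment, and the three slice distances are averaged into one segmental value. The slides do not say how the slices were placed or combined.

```python
import math


def slice_indices(start, end, n_slices=3):
    """Frame indices equally spaced inside the segment [start, end).

    Assumption: for three slices these are the 1/4, 1/2 and 3/4 points.
    """
    length = end - start
    return [start + (length * (k + 1)) // (n_slices + 1) for k in range(n_slices)]


def segment_distance(learner_spec, native_spec, learner_seg, native_seg, n_slices=3):
    """Mean Euclidean distance between matching spectral slices.

    learner_spec / native_spec: lists of frames (equal-length value lists).
    learner_seg / native_seg: (start, end) frame indices of the matching segment.
    Averaging the slice distances is an assumption made for this sketch.
    """
    learner_idx = slice_indices(learner_seg[0], learner_seg[1], n_slices)
    native_idx = slice_indices(native_seg[0], native_seg[1], n_slices)
    distances = [
        math.dist(learner_spec[a], native_spec[b])
        for a, b in zip(learner_idx, native_idx)
    ]
    return sum(distances) / len(distances)


# Toy illustration with made-up two-bin "spectra" of eight frames each.
learner_spec = [[0.0, 0.0] for _ in range(8)]
native_spec = [[3.0, 4.0] for _ in range(8)]
w = segment_distance(learner_spec, native_spec, (0, 8), (0, 8))
print(slice_indices(0, 8))  # → [2, 4, 6]
print(w)  # → 5.0
```

A value like `w`, accumulated over the segments of an utterance, would play the role of the segmental coordinate in (w, x, y, z).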
Manual evaluation of overall proficiency
Manual scores for Set C utterances: “Put your toys away right now”
<Table 4> The overall scores of the 34 utterances for the Set C sentence “Put your toys away right now”. The manual evaluation was performed by a Korean phonetician. Note that the subjects were all male adults.
A pronunciation proficiency evaluation model by a Korean phonetician
Results
[Figure: Korean phonetician’s models (intensity axis not shown)]
A prosody evaluation model by a Korean phonetician
[Figure: Korean phonetician’s model]
A discriminant analysis
<Table 5> The classification table from the discriminant analysis of one test data. The number in each cell represents the probability of the automatic pronunciation proficiency score being classified into the predicted group.
<Table 6> The confusion matrix for the classification table.
Discriminant analyses with leave-one-out cross-validation
Testing for score 4: 6 out of 9 correct
Testing for score 2: 12 out of 15 correct
Discriminant analyses with leave-one-out cross-validation
For N4 & K2 groups, evaluation models were built by using
– The discriminant analysis with
– Leave-one-out cross-validation
The number of models (built by discriminant analyses) was 24
– Group N4: 9 subjects
– Group K2: 15 subjects
Success rate
– Group N4: 6 out of 9 predicted correctly
– Group K2: 12 out of 15 predicted correctly
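The leave-one-out procedure, in which each subject is held out once and a model is built from the rest, can be illustrated as follows. Note the substitution: to keep the sketch self-contained, a simple nearest-centroid classifier stands in for the discriminant analysis used in the study, so this shows the cross-validation loop rather than the study's classifier. All function names are hypothetical and the toy points are made up, not data from the study.

```python
import math


def centroid(points):
    """Component-wise mean of a list of equal-length coordinate tuples."""
    n = len(points)
    return tuple(sum(component) / n for component in zip(*points))


def nearest_centroid(train, x):
    """Predict the label whose group centroid is closest to x.

    train: list of (point, label) pairs. This is a stand-in classifier,
    not a discriminant analysis.
    """
    groups = {}
    for point, label in train:
        groups.setdefault(label, []).append(point)
    return min(groups, key=lambda label: math.dist(centroid(groups[label]), x))


def leave_one_out(data):
    """Hold out each item once, train on the rest, count correct predictions."""
    correct = 0
    for i, (x, label) in enumerate(data):
        train = data[:i] + data[i + 1:]  # one model per held-out item
        if nearest_centroid(train, x) == label:
            correct += 1
    return correct, len(data)


# Toy illustration: made-up (F0, Int, Dur) points for two score groups.
toy = [
    ((1.0, 1.0, 1.0), 4), ((2.0, 1.0, 1.0), 4), ((1.0, 2.0, 1.0), 4),
    ((9.0, 9.0, 9.0), 2), ((8.0, 9.0, 9.0), 2), ((9.0, 8.0, 9.0), 2),
]
result = leave_one_out(toy)
print(result)  # → (6, 6)
```

With 9 + 15 held-out subjects, the same loop builds 24 models, which is consistent with the model count given on the slide.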
Automatic evaluation of pronunciation proficiency
Discussion
Viability of sentential models for the evaluation of
– Segmental proficiency: spectral comparison
– Prosodic proficiency: F0/intensity/durations
in the form of multiple score array coordinates (segments, F0, intensity, durations) = (w, x, y, z)
Comparison seems to work
– A target utterance vs. multiple model native utterances
Better models can be built with
– More (controlled) utterances
– More score resolution
• Current: score 2 (bad) – score 4 (good)
• Future: score 1 (worst) – score 3 (fair) – score 5 (best)
References
[1] Boersma, Paul, “Praat, a system for doing phonetics by computer”, Glot International 5(9/10), pp.341-345, 2001.
[2] Mahalanobis, P.C., “On the generalized distance in statistics”, Proceedings of the National Institute of Science of India 12, pp.49-55, 1936.
[3] Moulines, E. & F. Charpentier, “Pitch synchronous waveform processing techniques for text-to-speech synthesis using diphones”, Speech Communication 9, pp.453-467, 1990.
[4] Ramus, F., M. Nespor & J. Mehler, “Correlates of linguistic rhythm in the speech signal”, Cognition 73, pp.265-292, 1999.
[5] Rhee, S., S. Lee, Y. Lee & S. Kang, “Design and construction of Korean-Spoken English Corpus (K-SEC)”, Malsori 46, pp.159-174, 2003.
[6] Yoon, K., “Imposing native speakers' prosody on non-native speakers' utterances: The technique of cloning prosody”, Journal of the Modern British & American Language & Literature 25(4), pp.197-215, 2007.
[7] Yoon, K., “Synthesis and evaluation of prosodically exaggerated utterances”, unpublished manuscript, 2008.