
Automatic Grapheme-Phoneme Conversion for Spoken British English Corpora

C. AURAN, C. BOUZON & D.J. HIRST

Laboratoire Parole et Langage, CNRS UMR 6057

Université de Provence

Summary

1. The Aix-MARSEC Project

• Building Aix-MARSEC

• Availability of the database

• Methodology

2. Grapheme-Phoneme Conversion and Alignment

• The Aix-MARSEC Methodology

• Integration into PCE

3. Conclusion and Perspectives

The Aix-MARSEC Project

• Automatic grapheme-to-phoneme conversion

• Automatic phoneme level alignment

• Automatic intonation annotation using the Momel-Intsint methodology

• 8 annotation levels aligned: phonemes, syllable constituents, syllables, words, feet and rhythmic units, tone groups, Intsint coding

• Tagging and parsing alignment under way


An evolution from the SEC and MARSEC corpora

SEC (Spoken English Corpus)

• 55,000 words, 339 min. and 18 sec.

• BBC 1980s recordings

• 11 speaking styles

• 53 speakers (17 female and 36 male)

• Orthographic transcription

• Syntactic tagging and parsing

• Prosodic annotation: 14 tonetic stress marks

MARSEC (Machine Readable SEC)

Aix-MARSEC

Building Aix-MARSEC

• Alignment of words and tone groups with the signal

• Conversion of all the TSM to ASCII characters


Availability of the database

• Online version (www.lpl.univ-aix.fr/~EPGA/): annotation files (TextGrids), phoneme data tables, Perl and Praat scripts

• CD-ROM version: annotation files (TextGrids), phoneme data tables, Perl and Praat scripts, sound files (.wav format)


Methodology

Processing chain (flow diagram): orthographic transcription → G2P conversion → raw phonemic transcription → elision prediction → optimised phonemic transcription → automatic alignment → aligned phonemic transcription

Annotation levels shown in the diagram: SC annotation, syllable annotation, word annotation, TSM annotation, rhythmic annotation

Grapheme-Phoneme Conversion and Alignment


The Aix-MARSEC Methodology

G2P Conversion: General principles

• Dictionary-based method (4 dictionaries used)

• Specific processing for numbers, abbreviations, etc.

• Syntagmatic effects (linking r, definite article)

Output: raw phonemic transcription
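As an illustration of the syntagmatic effects mentioned above, the sketch below chooses the pronunciation of the definite article from the first phoneme of the following word. The SAMPA-like symbols, the vowel inventory and the function name are assumptions made for this example; they are not taken from the project's scripts.

```python
# Hypothetical SAMPA-like vowel symbols; the real inventory may differ.
VOWELS = {"I", "e", "{", "Q", "V", "U", "@", "i:", "A:", "O:", "u:", "3:",
          "eI", "aI", "OI", "@U", "aU", "I@", "e@", "U@"}

def definite_article(next_word_phonemes):
    """Return /Di/ before a vowel-initial word and /D@/ otherwise."""
    if next_word_phonemes and next_word_phonemes[0] in VOWELS:
        return ["D", "i"]
    return ["D", "@"]

print(definite_article(["{", "p", "l"]))  # "the apple"
print(definite_article(["k", "{", "t"]))  # "the cat"
```

Linking r could be handled in the same way, by looking at the first phoneme of the next word before deciding whether to insert /r/.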



G2P Conversion: The 4 dictionaries

• Primary pronunciation dictionary (‘Advanced Learner’s Dictionary’, Oxford University Press; 71,000 entries)

• Complementary dictionary (700 entries)

• “Problematic forms” dictionary (for hesitations, partial words, …; 26 entries)

• “Reduced forms” dictionary (75 entries)
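A minimal sketch of how a lookup across the four dictionaries might be organised. The dictionary contents, the lookup order and the `has_tsm` flag are assumptions for illustration only; the slides do not specify how the dictionaries are prioritised.

```python
# Toy stand-ins for the four dictionaries; every entry below is illustrative.
PROBLEMATIC = {"er": "3:"}                  # hesitations, partial words
COMPLEMENTARY = {"marsec": "mA:sek"}        # words absent from the primary dictionary
PRIMARY = {"and": "{nd", "open": "@Up@n"}   # citation forms
REDUCED = {"and": "@nd"}                    # weak forms

def lookup(word, has_tsm=True):
    """Return a SAMPA-like transcription, or None if no dictionary has the word."""
    key = word.lower()
    # Assumption: a reduced form is preferred only when the word carries no TSM.
    if not has_tsm and key in REDUCED:
        return REDUCED[key]
    # Assumption: the smaller, more specific dictionaries are consulted first.
    for dictionary in (PROBLEMATIC, COMPLEMENTARY, PRIMARY):
        if key in dictionary:
            return dictionary[key]
    return None

print(lookup("and"))                  # citation form
print(lookup("and", has_tsm=False))   # reduced form
```

Returning `None` for unknown words leaves room for the specific processing of numbers, abbreviations and similar cases.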



G2P Conversion: Specific issues

• Abbreviations

• Numbers

• Sequences of numbers and capitals (Post Codes)

• Genitives and Contractions

• 3rd person and plural forms

• Preterite and past participle forms







Elision Prediction: General principles

• Raw transcription ↔ citation forms

• Continuous speech ↔ specific phenomena (elisions, epenthesis, metathesis, etc.)



Elision prediction: Constraints

- Intonation constraints (TSM)

- Temporal constraints:

  Minimal threshold: 5 ms

  Thresholds for specific phonemes (Klatt, 1979): /t, d/ = 55 ms; /@/ = 55 ms; /T/ = 110 ms

  Lengthening "z" factor: z < 0 → elision; z ≥ 0 → no elision

- Phonotactic constraints (rules)
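The temporal constraints could be encoded roughly as follows. This is a sketch under stated assumptions: the slide gives the threshold values and the sign rule for z, but not how a phoneme's duration is estimated or how the two tests are combined, so the function names and the fallback to the 5 ms minimum are hypothetical.

```python
MIN_DURATION_MS = 5.0
# Phoneme-specific thresholds quoted on the slide (SAMPA-like symbols).
THRESHOLDS_MS = {"t": 55.0, "d": 55.0, "@": 55.0, "T": 110.0}

def z_allows_elision(z):
    """Sign rule from the slide: z < 0 means elision, z >= 0 means no elision."""
    return z < 0

def below_threshold(phoneme, duration_ms):
    """True if a duration falls under the phoneme's threshold.

    Assumption: phonemes without a specific threshold fall back to the
    5 ms minimum.
    """
    limit = THRESHOLDS_MS.get(phoneme, MIN_DURATION_MS)
    return duration_ms < limit

print(below_threshold("T", 80.0))   # under the 110 ms threshold
print(below_threshold("t", 60.0))   # above the 55 ms threshold
print(z_allows_elision(-0.3))
```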

Elision prediction: Rules

Rule  Phonemes   Contexts                                      Constraints          Examples
0     --         --                                            < 5 ms               --
1     d          and                                           TSM                  and then
2     h          he('s/ll/d) him his her                       TSM                  in her case
3     t d        {[t][d]} # {[t][d]}                           Th. - except '-ed'   I've got to
4     t d        C1 + {[t][d]} # C2 – {[h][j]}                 Th.                  mustn't lose
5     p k        nasal + {[p][k]} (#) C – {[r][l][j]}          --                   glimpse
6     l          [O:] + [l] (#) C                              --                   always
7     T          C + [T] (#) [s]                               Th.                  twelfths
8     ptk bdg    [s|z] + {[p|b][t|d][k|g]} (#) [s|z]           --                   tourists
9     @          [@] + {[l][r]} (#) + reduced vowel {[I][@]}   Th. - */rl/          camera
10    @          # [k@n] ('syll (syll [0…n])) #                TSM - Th.            confront
11    @          {[k][p]} + [@] + [n] #                        Th.                  open

Th.: duration threshold
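To show how one phonotactic rule of this kind might be applied, the sketch below implements the context of rule 8 (a stop between two sibilants, with an optional word boundary before the second). The list-of-symbols representation, the `#` boundary token and the transcription of "tourists" are assumptions for the example; the project's own implementation is in Perl and is not reproduced here.

```python
SIBILANTS = {"s", "z"}
STOPS = {"p", "b", "t", "d", "k", "g"}

def rule8_candidates(phonemes):
    """Return indices of stops flanked by [s] or [z].

    A '#' word boundary is allowed between the stop and the second sibilant.
    """
    hits = []
    for i, p in enumerate(phonemes):
        if p not in STOPS or i == 0 or phonemes[i - 1] not in SIBILANTS:
            continue
        j = i + 1
        if j < len(phonemes) and phonemes[j] == "#":  # optional word boundary
            j += 1
        if j < len(phonemes) and phonemes[j] in SIBILANTS:
            hits.append(i)
    return hits

# "tourists" in an assumed SAMPA-like transcription
print(rule8_candidates(["t", "U@", "r", "I", "s", "t", "s"]))  # [5]
```

The function only marks candidates; whether a candidate is actually elided would still depend on the intonation and temporal constraints.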

Elision prediction: Evaluation

4077 elided phonemes out of 199,770 in the corpus (≈ 2 %)

Measure     Value     Interpretation
Recall      50.51 %   Half of all elisions are correctly predicted
Precision   74.44 %   Three quarters of predicted elisions are correct
Silence     49.49 %
Noise       25.56 %
F-measure   60.18 %   Global quality of the algorithm
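The reported measures are mutually consistent, which can be checked with a few lines of arithmetic. The sketch assumes that silence and noise are the complements of recall and precision (which matches the reported values) and that the F-measure is the balanced harmonic mean.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (same unit as the inputs)."""
    return 2 * precision * recall / (precision + recall)

precision, recall = 74.44, 50.51   # percentages reported above
silence = 100 - recall              # share of real elisions that were missed
noise = 100 - precision             # share of predicted elisions that were wrong

print(round(silence, 2), round(noise, 2), round(f_measure(precision, recall), 2))
# 49.49 25.56 60.18
```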







Alignment: General principles

HMM and Viterbi based alignment by Christophe Lévy (LIA, France)

- HMM trained on the TIMIT corpus of American English

- Gaussian Mixture Model (8 components & diagonal covariance matrices estimated through the Expectation-Maximisation algorithm optimising the Maximum-Likelihood criterion)

- 12 MFCCs (filter-bank analysis) plus energy, with delta and delta-delta coefficients for all 13, giving a 39-coefficient vector per speech frame (13 × 3)


Alignment: Evaluation

Absolute mean error: 22 ms

Mean error: -6.29 ms

Kurtosis: 8.15 (narrow distribution)

Skewness: -0.94 (left bias)


Acceptance threshold   Optimised transcription
64 ms                  93.25 %
32 ms                  82.02 %
20 ms                  68.37 %
16 ms                  59.97 %
15 ms                  57.40 %
10 ms                  42.43 %
5 ms                   23.72 %
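Figures of this kind (mean error, absolute mean error, and the share of boundaries within an acceptance threshold) could be computed as in the sketch below. The boundary times are invented, and two conventions are assumptions: the error is taken as automatic minus reference time, and a boundary is accepted when its absolute error is less than or equal to the threshold.

```python
def alignment_stats(auto_ms, manual_ms, thresholds_ms=(5, 10, 20)):
    """Compare automatic and reference boundary times given in milliseconds."""
    errors = [a - m for a, m in zip(auto_ms, manual_ms)]
    n = len(errors)
    mean_error = sum(errors) / n
    abs_mean_error = sum(abs(e) for e in errors) / n
    # Percentage of boundaries whose absolute error is within each threshold.
    within = {t: 100 * sum(abs(e) <= t for e in errors) / n for t in thresholds_ms}
    return mean_error, abs_mean_error, within

# Invented boundary times, for illustration only.
auto = [100, 248, 395, 530]
manual = [104, 240, 400, 560]
print(alignment_stats(auto, manual))
# (-7.75, 11.75, {5: 50.0, 10: 75.0, 20: 75.0})
```

A negative mean error under this convention means that automatic boundaries tend to be placed earlier than the reference ones.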


Integration into PCE

Integration: Motivations

Phoneme level alignment, for phoneticians and phonologists

Double focus:

• Segmental phenomena (e.g. formant charts)

• Prosodic phenomena (e.g. tonal alignment)

Integration: 2 possible policies

• Direct integration: Exact Aix-MARSEC methodology

Requires word level manual alignment

• Alternative integration: Adaptation of the Aix-MARSEC methodology

Optional elisions predicted on the basis of phonotactic rules only + decision during the alignment phase

Conclusions and Perspectives


• An easily extensible, fully automatic methodology

• Diverse types of phonological / phonetic segmental / prosodic exploitation (formant charts, temporal, intonational and metrical studies, …)

• Full interactivity with other ProZEd modules (Momel-Intsint, …)

• Realistic integration into PCE (2 options)

Well… This time it’s for good !!

Presentation available from

www.lpl.univ-aix.fr/~EPGA/

14 ASCII prosodic annotation symbols:

_ low level
~ high level
< step-down
> step-up
/’ (high) rise-fall
‘/ high fall-rise
\ high fall
/ high rise
, low rise
‘ low fall
,\ (low rise-fall – not used)
\, low fall-rise
* stressed but unaccented
| minor intonation unit boundary
|| major intonation unit boundary

(Roach, 1994)


Reduced forms processing

Creation of a reduced forms dictionary based on O’Connor (1967) and Faure (1975)

Reduction constraint: TSM absence

Aim: improving G2P conversion

Example:

TSM: ‘/and → converted into /{nd/

No TSM: and → converted into /@nd/
