Automatic Grapheme-Phoneme Conversion for Spoken British English Corpora C. AURAN, C. BOUZON & D.J. HIRST Laboratoire Parole et Langage CNRS UMR6057 Université de Provence

Automatic Grapheme-Phoneme Conversion for Spoken British English Corpora

C. AURAN, C. BOUZON & D.J. HIRST

Laboratoire Parole et Langage, CNRS UMR6057

Université de Provence

Summary

1. The Aix-MARSEC Project
   - Building Aix-MARSEC
   - Availability of the database
   - Methodology

2. Grapheme-Phoneme Conversion and Alignment
   - The Aix-MARSEC Methodology
   - Integration into PCE

3. Conclusion and Perspectives

The Aix-MARSEC Project

The Aix-MARSEC Project

Building Aix-MARSEC

An evolution from the SEC and MARSEC corpora

SEC (Spoken English Corpus)

• 55,000 words, 339 min. and 18 sec.
• BBC 1980s recordings
• 11 speaking styles
• 53 speakers (17 female and 36 male)
• Orthographic transcription
• Syntactic tagging and parsing
• Prosodic annotation: 14 tonetic stress marks (TSM)

MARSEC (Machine Readable SEC)

• Alignment of words and tone groups with the signal
• Conversion of all the TSM to ASCII characters

Aix-MARSEC

• Automatic grapheme-to-phoneme conversion
• Automatic phoneme level alignment
• Automatic intonation annotation using the Momel-Intsint methodology
• 8 annotation levels aligned: phonemes, syllable constituents, syllables, words, feet and rhythmic units, tone groups, Intsint coding
• Tagging and parsing alignment under way

The Aix-MARSEC Project

Availability of the database

• Online version:
  - Annotation files (TextGrids)
  - Phonemes data tables
  - Perl and Praat scripts

www.lpl.univ-aix.fr/~EPGA/

• CD-Rom version:
  - Annotation files (TextGrids)
  - Phonemes data tables
  - Perl and Praat scripts
  - Sound files (.wav format)

The Aix-MARSEC Project

Methodology

Processing chain (flowchart):

Orthographic transcription
→ G2P conversion → Raw phonemic transcription
→ Elision prediction → Optimised phonemic transcription
→ Automatic alignment → Aligned phonemic transcription
→ SC annotation, Syllable annotation, Word annotation

Also shown in the flowchart: TSM annotation, Rhythmic annotation

Grapheme-Phoneme Conversion and Alignment

G2P Conversion and Alignment

The Aix-MARSEC Methodology

Orthographic transcription → G2P conversion → Raw phonemic transcription → Elision prediction → Optimised phonemic transcription → Automatic alignment → Aligned phonemic transcription → SC annotation, Syllable annotation, Word annotation

G2P Conversion and Alignment

The Aix-MARSEC Methodology

Orthographic transcription → G2P conversion → Raw phonemic transcription

G2P Conversion and Alignment

The Aix-MARSEC Methodology

G2P Conversion: General principles

• Dictionary-based method (4 dictionaries used)

• Specific processing for numbers, abbreviations, etc.

• Syntagmatic effects (linking r, definite article)

→ Raw transcription
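The two syntagmatic effects named above can be illustrated with a minimal sketch. It is not the project's Perl code: the toy lexicon, the SAMPA-style symbols, the vowel set and the function names are all assumptions made for the example. It adds a linking /r/ when a word spelled with a final "r" or "re" ends in a vowel and the next word begins with a vowel, and it switches the definite article from /D@/ to /DI/ before a vowel.

```python
# Toy SAMPA-style lexicon (hypothetical entries, not taken from the project dictionaries).
LEXICON = {
    "the": ["D", "@"],
    "far": ["f", "A:"],
    "away": ["@", "w", "eI"],
    "end": ["e", "n", "d"],
    "car": ["k", "A:"],
    "park": ["p", "A:", "k"],
}

# Assumed vowel inventory, limited to what the toy lexicon needs.
VOWELS = {"@", "A:", "eI", "e", "I", "i:", "O:"}


def starts_with_vowel(phonemes):
    """True when the first phoneme of a word is a vowel."""
    return bool(phonemes) and phonemes[0] in VOWELS


def transcribe(words):
    """Look up citation forms, then apply two context-dependent adjustments."""
    forms = [list(LEXICON[w]) for w in words]
    for i, word in enumerate(words[:-1]):
        next_is_vowel = starts_with_vowel(forms[i + 1])
        if word == "the" and next_is_vowel:
            # Definite article before a vowel.
            forms[i] = ["D", "I"]
        elif word.endswith(("r", "re")) and forms[i][-1] in VOWELS and next_is_vowel:
            # Linking r: orthographic final r surfaces before a vowel.
            forms[i].append("r")
    return forms


print(transcribe(["far", "away"]))
print(transcribe(["the", "end"]))
```

Both adjustments only look one word ahead, which is why they are applied after dictionary lookup rather than stored in the dictionary itself.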

G2P Conversion and Alignment

The Aix-MARSEC Methodology

G2P Conversion: The 4 dictionaries

• Primary pronunciation dictionary (‘Advanced Learner’s Dictionary’, Oxford University Press; 71,000 entries)

• Complementary dictionary (700 entries)

• “Problematic forms” dictionary (for hesitations, partial words,…; 26 entries)

• “Reduced forms” dictionary (75 entries)
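A dictionary-based method with several dictionaries amounts to a cascaded lookup. The sketch below is only an illustration under stated assumptions: the slides do not give the lookup order, so the priority used here (problematic forms, then complementary, then primary) is a guess, and every entry and name is hypothetical. The reduced forms dictionary is left out because its use depends on TSM absence, as the "Reduced forms processing" slide explains.

```python
def lookup(word, dictionaries):
    """Return (source_name, transcription) from the first dictionary that has the word."""
    key = word.lower()
    for name, entries in dictionaries:
        if key in entries:
            return name, entries[key]
    # Unknown word: the caller decides what to do (e.g. flag it for manual checking).
    return None, None


# Hypothetical miniature dictionaries with SAMPA-style transcriptions.
PRIMARY = {"corpus": "kO:p@s"}
COMPLEMENTARY = {"marsec": "mA:sek"}
PROBLEMATIC = {"erm": "3:m"}

# ASSUMPTION: the most specific dictionary is consulted first.
DICTIONARIES = [
    ("problematic", PROBLEMATIC),
    ("complementary", COMPLEMENTARY),
    ("primary", PRIMARY),
]

print(lookup("Corpus", DICTIONARIES))
print(lookup("erm", DICTIONARIES))
```

Keeping the dictionaries as an ordered list makes the priority explicit and easy to change.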

G2P Conversion and Alignment

The Aix-MARSEC Methodology

G2P Conversion: Specific issues

• Abbreviations
• Numbers
• Sequences of numbers and capitals (Post Codes)
• Genitives and Contractions
• 3rd person and plural forms
• Preterite and past participle forms
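Inflected forms are an issue because a dictionary of base forms does not list them, so the suffix has to be attached to the stem transcription. The sketch below applies the standard English allomorphy rules for "-s" and "-ed". Whether the project scripts proceed exactly this way is an assumption, and the phoneme sets and function names are illustrative.

```python
# SAMPA-style phoneme classes (assumed inventory for the example).
VOICELESS = {"p", "t", "k", "f", "T", "s", "S", "tS"}
SIBILANTS = {"s", "z", "S", "Z", "tS", "dZ"}


def add_s_suffix(stem):
    """Plural, genitive or 3rd person '-s': /Iz/ after sibilants, /s/ after voiceless, else /z/."""
    last = stem[-1]
    if last in SIBILANTS:
        return stem + ["I", "z"]
    if last in VOICELESS:
        return stem + ["s"]
    return stem + ["z"]


def add_ed_suffix(stem):
    """Preterite or past participle '-ed': /Id/ after /t d/, /t/ after voiceless, else /d/."""
    last = stem[-1]
    if last in {"t", "d"}:
        return stem + ["I", "d"]
    if last in VOICELESS:
        return stem + ["t"]
    return stem + ["d"]


print(add_s_suffix(["k", "{", "t"]))    # cats
print(add_ed_suffix(["w", "O:", "k"]))  # walked
```

The sibilant check in `add_s_suffix` and the /t d/ check in `add_ed_suffix` must come first, since /s/ and /t/ are also voiceless.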

G2P Conversion and Alignment

The Aix-MARSEC Methodology

Orthographic transcription → G2P conversion → Raw phonemic transcription → Elision prediction → Optimised phonemic transcription

G2P Conversion and Alignment

The Aix-MARSEC Methodology

Elision Prediction: General principles

• Raw transcription ↔ citation forms

• Continuous speech ↔ specific phenomena (elisions, epenthesis, metathesis, etc.)

G2P Conversion and Alignment

The Aix-MARSEC Methodology

Elision prediction: Constraints

- Intonation constraints (TSM)

- Temporal constraints:

  Minimal threshold: 5 ms

  Thresholds for specific phonemes (Klatt, 1979): /t – d/ = 55 ms; /@/ = 55 ms; /T/ = 110 ms

  Lengthening « z » factor: z < 0 → elision; z ≥ 0 → no elision

- Phonotactic constraints (rules)
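The temporal constraints can be read as a simple decision function. The sketch below is one possible reading, not the authors' implementation: the slide gives the threshold values and the sign test on z, but not how the two are combined, how the available duration is estimated, or whether the 5 ms minimum serves as the default for other phonemes. Those three points are assumptions here, as are all the names.

```python
MINIMAL_THRESHOLD_MS = 5

# Phoneme-specific thresholds quoted on the slide (Klatt, 1979), in SAMPA.
PHONEME_THRESHOLDS_MS = {"t": 55, "d": 55, "@": 55, "T": 110}


def duration_threshold(phoneme):
    """Threshold for a phoneme; ASSUMPTION: the 5 ms minimum is the default."""
    return PHONEME_THRESHOLDS_MS.get(phoneme, MINIMAL_THRESHOLD_MS)


def temporal_constraints_favour_elision(phoneme, available_ms, z):
    """ASSUMPTION: elision needs both a negative z factor and too little available time.

    available_ms is a hypothetical estimate of the time left for the phoneme;
    z is the lengthening factor, taken here as an input.
    """
    return z < 0 and available_ms < duration_threshold(phoneme)


print(temporal_constraints_favour_elision("t", 40, -0.5))
print(temporal_constraints_favour_elision("t", 40, 0.3))
```

In a full system this test would be combined with the intonation and phonotactic constraints listed on the same slide.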

G2P Conversion and Alignment

Elision prediction: Rules

Principles | Phonemes | Contexts                                                | Constraints          | Examples
0          |          |                                                         | <5ms                 |
1          | d        | and                                                     | TSM                  | and then
2          | h        | he('s/ll/d) him his her                                 | TSM                  | in her case
3          | t d      | {[t][d]} # {[t][d]}                                     | Th.¹ - except '-ed'  | I've got to
4          | t d      | C1 + {[t][d]} # C2 – {[h][j]}                           | Th.                  | mustn't lose
5          | p k      | nasal + {[p][k]} (#) C – {[r][l][j]}                    |                      | glimpse
6          | l        | [O:] + [l] (#) C                                        |                      | always
7          | T        | C + [T] (#) [s]                                         | Th.                  | twelfths
8          | ptk bdg  | [s| z] + {[p| b][t| d][k| g]} (#) [s| z]                |                      | tourists
9          | @        | [@] + {[l][r]} (#) + reduced vowel {[I][@]}             | Th. - */rl/          | camera
10         | @        | # [k@n] ('syll (syll [0…n])) #                          | TSM - Th.            | confront
11         | @        | {[k][p]} + [@] + [n] #                                  | Th.                  | open

¹ Th.: duration threshold

G2P Conversion and Alignment

Elision prediction: Evaluation

4,077 elided phonemes out of 199,770 in the corpus (≈ 2 %)

MEASURES

RECALL      50.51 %   Half of all elisions are correctly predicted
PRECISION   74.44 %   ¾ of predicted elisions are correct
SILENCE     49.49 %
NOISE       25.56 %
F-MEASURE   60.18 %   Global quality of the algorithm
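The five measures are linked by simple arithmetic, which the short check below reproduces from the two primary figures on the slide. It assumes the usual definitions, which are consistent with the reported values: silence is the complement of recall, noise is the complement of precision, and the F-measure is the harmonic mean of precision and recall. Function and variable names are illustrative.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (both in percent)."""
    return 2 * precision * recall / (precision + recall)


precision = 74.44  # % of predicted elisions that are real elisions
recall = 50.51     # % of real elisions that are predicted

silence = 100 - recall    # real elisions that are missed
noise = 100 - precision   # predicted elisions that are wrong
f = f_measure(precision, recall)

# Share of elided phonemes in the corpus, from the counts on the slide.
elision_rate = 100 * 4077 / 199770

print(round(silence, 2), round(noise, 2), round(f, 2), round(elision_rate, 2))
```

Because only about 2 % of phonemes are elided, precision and recall are more informative here than overall accuracy would be.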

G2P Conversion and Alignment

The Aix-MARSEC Methodology

Orthographic transcription → G2P conversion → Raw phonemic transcription → Elision prediction → Optimised phonemic transcription → Automatic alignment → Aligned phonemic transcription

G2P Conversion and Alignment

Alignment: General principles

HMM and Viterbi based alignment by Christophe Lévy (LIA, France)

- HMM trained on the TIMIT corpus of American English

- Gaussian Mixture Model (8 components & diagonal covariance matrices estimated through the Expectation-Maximisation algorithm optimising the Maximum-Likelihood criterion)

- 12 MFCC (filter bank analysis) augmented with energy, delta and delta-delta coefficients

→ 39-coefficient vector per speech frame
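The figure of 39 follows from 12 MFCC plus energy (13 static values), each with a delta and a delta-delta coefficient (13 × 3). The sketch below shows that stacking on dummy data. It is not the aligner's front end: the slide does not say how the deltas are computed, so a plain central difference is used as a stand-in, and the function names are hypothetical.

```python
def deltas(frames):
    """Central differences between neighbouring frames (border frames are repeated)."""
    padded = [frames[0]] + frames + [frames[-1]]
    return [
        [(nxt - prv) / 2 for prv, nxt in zip(padded[i], padded[i + 2])]
        for i in range(len(frames))
    ]


def stack_features(static_frames):
    """Append delta and delta-delta coefficients to each static frame."""
    d1 = deltas(static_frames)
    d2 = deltas(d1)
    return [s + a + b for s, a, b in zip(static_frames, d1, d2)]


# Five dummy frames of 13 static values (12 MFCC + energy); the values are placeholders.
static = [[float(t)] * 13 for t in range(5)]
vectors = stack_features(static)
print(len(vectors), len(vectors[0]))
```

The dynamic coefficients give each frame some context about how the spectrum is changing, which a single static frame cannot capture.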

G2P Conversion and Alignment

Alignment: Evaluation

Mean absolute error: 22 ms

Mean error: -6.29 ms

Kurtosis: 8.15 (narrow distribution)

Skewness: -0.94 (left bias)

G2P Conversion and Alignment

Alignment: Evaluation

Acceptance threshold | Optimised transcription
64 ms                | 93.25 %
32 ms                | 82.02 %
20 ms                | 68.37 %
16 ms                | 59.97 %
15 ms                | 57.40 %
10 ms                | 42.43 %
5 ms                 | 23.72 %
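An acceptance threshold table of this kind is typically obtained by counting the boundaries whose error falls within each tolerance. The sketch below shows the computation on made-up error values. The data are hypothetical and do not reproduce the figures above, and counting errors equal to the threshold as accepted is an assumption.

```python
def acceptance_rate(errors_ms, threshold_ms):
    """Percentage of boundaries whose absolute error is within the threshold."""
    within = sum(1 for e in errors_ms if abs(e) <= threshold_ms)
    return 100.0 * within / len(errors_ms)


# Hypothetical signed errors (automatic minus manual boundary), in ms.
errors = [-30.0, -12.0, -4.0, 3.0, 8.0, 18.0, 40.0, -70.0]

for threshold in (64, 20, 5):
    print(threshold, acceptance_rate(errors, threshold))
```

Using the absolute value means early and late boundaries are treated alike; the signed mean error and skewness capture the direction of the bias separately.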

Integration into PCE

Integration: Motivations

Phoneme level alignment for phoneticians and phonologists

Double focus:

• Segmental phenomena (formant charts)
• Prosodic phenomena (tonal alignment)

Integration into PCE

Integration: 2 possible policies

• Direct integration: Exact Aix-MARSEC methodology
  → Requires word level manual alignment

• Alternative integration: Adaptation of the Aix-MARSEC methodology
  → Optional elisions predicted on the basis of phonotactic rules only + decision during the alignment phase
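Under the alternative policy, the phonotactic rules only mark phonemes as optionally elidable, and the final choice is left to the aligner. One way to picture this, offered as an assumption and not as the planned PCE implementation, is to expand a transcription with optional segments into all its pronunciation variants, from which an aligner could pick the best-scoring one. The data structure and names below are hypothetical.

```python
from itertools import product


def pronunciation_variants(segments):
    """Expand (phoneme, optional) pairs into every variant, full form first."""
    # Each segment contributes either itself, or itself and the empty choice.
    choices = [((p,), ()) if optional else ((p,),) for p, optional in segments]
    return [[p for part in combo for p in part] for combo in product(*choices)]


# "tourists" with the /t/ between the two /s/ marked as optional (cf. rule 8).
tourists = [
    ("t", False), ("U@", False), ("r", False), ("I", False),
    ("s", False), ("t", True), ("s", False),
]

for variant in pronunciation_variants(tourists):
    print(" ".join(variant))
```

The number of variants doubles with each optional phoneme, so in practice the expansion would be done per word or encoded as a lattice.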

Conclusions and Perspectives

Conclusions and Perspectives

• An easily extensible, fully automatic methodology

• Diverse types of phonological / phonetic and segmental / prosodic exploitation (formant charts, temporal, intonational and metrical studies, …)

• Full interactivity with other ProZEd modules (Momel-Intsint, …)

• Realistic integration into PCE (2 options)

Well… This time it’s for good!!

Presentation available from

www.lpl.univ-aix.fr/~EPGA/

14 ASCII prosodic annotation symbols (Roach, 1994):

_    low level
~    high level
<    step-down
>    step-up
/’   (high) rise-fall
‘/   high fall-rise
\    high fall
/    high rise
,    low rise
‘    low fall
,\   (low rise-fall – not used)
\,   low fall-rise
*    stressed but unaccented
|    minor intonation unit boundary
||   major intonation unit boundary

Reduced forms processing

Creation of a reduced forms dictionary based on O’Connor (1967) and Faure (1975)

Reduction constraint: TSM absence

Aim: improving G2P conversion

Example:

TSM: ‘/and → converted into /{nd/

No TSM: and → converted into /@nd/
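The reduction constraint can be expressed as a two-way lookup keyed on the presence of a tonetic stress mark. The sketch below uses the "and" example from the slide. The dictionary layout and the function name are assumptions, and the real reduced forms dictionary has 75 entries where this one has a single entry.

```python
# Transcriptions for "and" are those given on the slide (SAMPA).
FULL_FORMS = {"and": "{nd"}
REDUCED_FORMS = {"and": "@nd"}


def transcribe_token(word, has_tsm):
    """Use the reduced form only when the word carries no tonetic stress mark."""
    if not has_tsm and word in REDUCED_FORMS:
        return REDUCED_FORMS[word]
    return FULL_FORMS[word]


print(transcribe_token("and", has_tsm=True))
print(transcribe_token("and", has_tsm=False))
```

Words absent from the reduced forms dictionary always keep their full form, whatever their prosodic marking.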