Letter to Phoneme Alignment
Reihaneh Rabbany, Shahin Jabbari

Page 1: Letter to Phoneme Alignment

LETTER TO PHONEME ALIGNMENT

Reihaneh Rabbany

Shahin Jabbari

Page 2: Letter to Phoneme Alignment

OUTLINE

Motivation
Problem and its Challenges
Relevant Works
Our Work
  Formal Model
  EM
  Dynamic Bayesian Network
Evaluation
  Letter to Phoneme Generator
  AER
Result

Page 3: Letter to Phoneme Alignment

TEXT TO SPEECH PROBLEM

Conversion of Text to Speech: TTS

Automated Telecom Services
E-mail by Phone
Banking Systems
Handicapped People

Page 4: Letter to Phoneme Alignment

PRONUNCIATION

Pronunciation of words:
  Dictionary Words: dictionary look-up
  Non-Dictionary Words: phonetic analysis (language is alive and new words are added; proper nouns)

[Diagram: Word -> Dictionary Look-up / Phonetic Analysis -> Pronunciation]

Page 5: Letter to Phoneme Alignment

OUTLINE

Motivation
Problem and its Challenges
Relevant Works
Our Work
  Formal Model
  EM
  Dynamic Bayesian Network
Evaluation
  Letter to Phoneme Generator
  AER
Result

Page 6: Letter to Phoneme Alignment

PROBLEM

Letter to Phoneme Alignment (L2P)

  Letters:  c a k e
  Phonemes: k ei k

Page 7: Letter to Phoneme Alignment

CHALLENGES

No Consistency
  City  / s /
  Cake  / k /
  Kid   / k /

No Transparency (number of letters vs. number of phonemes)
  K i d      (3)   / k i d /    (3)
  S i x      (3)   / s i k s /  (4)
  Q u e u e  (5)   / k j u: /   (3)
  A x e      (3)   / a k s /    (3)

Page 8: Letter to Phoneme Alignment

OUTLINE

Motivation
Problem and its Challenges
Relevant Works
Our Work
  Formal Model
  EM
  Dynamic Bayesian Network
Evaluation
  Letter to Phoneme Generator
  AER
Result

Page 9: Letter to Phoneme Alignment

ONE-TO-ONE EM (Daelemans et al., 1996)

Length of the word = length of the pronunciation
Produce all possible alignments

Inserting null letter/phoneme

Alignment probability:

$P(A) = \prod_i P(p_i \mid l_i)$
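A minimal Python sketch of this scheme (ours, not the original implementation), assuming a toy list of (letters, phonemes) pairs; it pads the shorter side with a null symbol, scores each alignment with the product above, and re-estimates P(p | l) from the currently best alignments (a hard-EM simplification):

```python
import math
from collections import defaultdict
from itertools import combinations

NULL = "-"  # illustrative null letter/phoneme symbol

def pad_variants(symbols, target_len):
    """All ways to insert NULL into `symbols` so it reaches `target_len`."""
    gaps = target_len - len(symbols)
    for positions in combinations(range(target_len), gaps):
        out, it = [], iter(symbols)
        for i in range(target_len):
            out.append(NULL if i in positions else next(it))
        yield out

def one_to_one_alignments(letters, phonemes):
    """Pad the shorter sequence with NULL so lengths match, in every possible way."""
    n = max(len(letters), len(phonemes))
    for ls in pad_variants(letters, n):
        for ps in pad_variants(phonemes, n):
            yield list(zip(ls, ps))

def one_to_one_em(dictionary, iterations=5):
    prob = defaultdict(lambda: 1.0)          # start with uninformative scores
    for _ in range(iterations):
        counts = defaultdict(float)
        for letters, phonemes in dictionary:
            # P(A) = prod_i P(p_i | l_i); keep only the best alignment (hard EM)
            best = max(one_to_one_alignments(letters, phonemes),
                       key=lambda a: math.prod(prob[pair] for pair in a))
            for pair in best:
                counts[pair] += 1.0
        # re-estimate P(p | l) as counts normalised per letter
        totals = defaultdict(float)
        for (l, p), c in counts.items():
            totals[l] += c
        prob = defaultdict(lambda: 1e-6,
                           {(l, p): c / totals[l] for (l, p), c in counts.items()})
    return prob

# toy usage on two dictionary entries from the slides
lexicon = [(list("cake"), ["k", "ei", "k"]), (list("kid"), ["k", "i", "d"])]
print(sorted(one_to_one_em(lexicon).items(), key=lambda kv: -kv[1])[:5])
```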

Page 10: Letter to Phoneme Alignment

DECISION TREE (Black et al., 1996)

Train a CART using an aligned dictionary
Why CART?
A single tree for each letter
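As an illustration of this kind of setup (our sketch, not Black et al.'s system), one decision tree can be trained per letter on a window of surrounding letters taken from an already aligned dictionary; the window size and integer encoding here are arbitrary choices:

```python
from sklearn.tree import DecisionTreeClassifier  # assumes scikit-learn is available

def window_features(word, i, size=3):
    """Encode the letters around position i as integers (0 = outside the word)."""
    return [ord(word[j]) if 0 <= j < len(word) else 0
            for j in range(i - size, i + size + 1)]

def train_per_letter_trees(aligned_dictionary):
    """aligned_dictionary: (word, phonemes) pairs with one phoneme (or null) per letter."""
    data = {}
    for word, phonemes in aligned_dictionary:
        for i, (letter, phoneme) in enumerate(zip(word, phonemes)):
            X, y = data.setdefault(letter, ([], []))
            X.append(window_features(word, i))
            y.append(phoneme)
    # a single tree for each letter, as on the slide
    return {letter: DecisionTreeClassifier().fit(X, y)
            for letter, (X, y) in data.items()}
```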

Page 11: Letter to Phoneme Alignment

KONDRAK

Alignments are not always one-to-one
  A x e   / a k s /
  B oo k  / b ú k /

Only null phonemes are allowed (no null letters)
Similar to one-to-one EM

Produce all possible alignments
Compute the probabilities

Page 12: Letter to Phoneme Alignment

OUTLINE

Motivation
Problem and its Challenges
Relevant Works
Our Work
  Formal Model
  EM
  Dynamic Bayesian Network
Evaluation
  Letter to Phoneme Generator
  AER
Result

Page 13: Letter to Phoneme Alignment

FORMAL MODEL

Word: a sequence of letters
  $L = l_1 l_2 \ldots l_n$

Pronunciation: a sequence of phonemes
  $P = p_1 p_2 \ldots p_m$

Alignment: a sequence of sub-alignments
  $A = a_1 a_2 \ldots a_k$, where $a_i = \langle L_i, P_i \rangle$ and $|L_i|, |P_i| \le 2$

Problem: find the most probable alignment
  $A_{best} = \arg\max_A P(A \mid L, P)$
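To make the notation concrete, here is a small illustrative encoding in Python (the names are our own, and we assume an empty phoneme tuple stands for the null phoneme):

```python
# L = l1 ... ln, P = p1 ... pm, A = a1 ... ak with a_i = <L_i, P_i>, |L_i|, |P_i| <= 2
Word = list[str]
Pronunciation = list[str]
SubAlignment = tuple[tuple[str, ...], tuple[str, ...]]   # a_i = <L_i, P_i>
Alignment = list[SubAlignment]

# Example from the slides: "axe" -> / a k s /, with "x" aligned to the two
# phonemes /k s/ and "e" aligned to the null phoneme (empty tuple).
axe: Alignment = [(("a",), ("a",)),
                  (("x",), ("k", "s")),
                  (("e",), ())]

def is_valid(alignment: Alignment, word: Word, pron: Pronunciation) -> bool:
    """An alignment is valid if it reproduces the word and pronunciation in order
    and every sub-alignment uses at most two letters and two phonemes."""
    letters = [l for ls, _ in alignment for l in ls]
    phones = [p for _, ps in alignment for p in ps]
    sizes_ok = all(len(ls) <= 2 and len(ps) <= 2 for ls, ps in alignment)
    return sizes_ok and letters == word and phones == pron

assert is_valid(axe, list("axe"), ["a", "k", "s"])
```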

Page 14: Letter to Phoneme Alignment

MANY-TO-MANY EM

1. Initialize prob(SubAlignments)
// Expectation step
2. For each word in training_set
   2.1. Produce all possible alignments
   2.2. Choose the most probable alignment
// Maximization step
3. For all subalignments
   3.1. Compute new_p(SubAlignments)

$P(a_i) = \dfrac{M[l_i, p_i]}{\sum_i M[l_i, p_i]}$, where $M[l_i, p_i]$ counts how often the sub-alignment $[l_i, p_i]$ is used in the chosen alignments
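A compact sketch of this loop (our code, not the authors'), assuming chunks of at most two letters and two phonemes as in the formal model, and normalising the sub-alignment counts by their total:

```python
import math
from collections import defaultdict

def mm_alignments(letters, phonemes, max_chunk=2):
    """Enumerate many-to-many alignments: each sub-alignment consumes up to
    `max_chunk` letters and up to `max_chunk` phonemes (an empty side acts as a null)."""
    if not letters and not phonemes:
        yield []
        return
    for n_let in range(min(max_chunk, len(letters)) + 1):
        for n_pho in range(min(max_chunk, len(phonemes)) + 1):
            if n_let == 0 and n_pho == 0:
                continue
            head = (tuple(letters[:n_let]), tuple(phonemes[:n_pho]))
            for rest in mm_alignments(letters[n_let:], phonemes[n_pho:], max_chunk):
                yield [head] + rest

def many_to_many_em(training_set, iterations=5):
    prob = defaultdict(lambda: 1.0)                       # 1. initialize prob(SubAlignments)
    for _ in range(iterations):
        counts = defaultdict(float)
        for letters, phonemes in training_set:            # 2. expectation step
            best = max(mm_alignments(letters, phonemes),  # 2.1 all alignments, 2.2 most probable
                       key=lambda a: math.prod(prob[s] for s in a))
            for sub in best:
                counts[sub] += 1.0
        total = sum(counts.values())                      # 3. maximization step
        prob = defaultdict(lambda: 1e-6,                  # 3.1 new_p = count / total count
                           {s: c / total for s, c in counts.items()})
    return prob
```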

Page 15: Letter to Phoneme Alignment

DYNAMIC BAYESIAN NETWORK

Sub-alignments are treated as hidden variables
Learn the DBN parameters by EM

Model:
  $P(A) = \prod_{i=1}^{k} P(a_i \mid L_i, P_i)$

[DBN figure: each sub-alignment variable $a_i$ is linked to its letter chunk $l_i$ and phoneme chunk $p_i$]

Parameters from counts:
  $P(a_i) = \dfrac{M[a_i]}{\sum_{a} M[a]}$, where $a_i = [l_i, p_i]$ and $M$ counts sub-alignment occurrences
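A rough illustration (ours, not the authors' DBN code) of the context-independent factorisation: the sub-alignment table is estimated from counts as in the formula above, and an alignment is scored as a product over its sub-alignments (in log space):

```python
import math
from collections import defaultdict

def estimate_unigram(aligned_words):
    """P(a) = M[a] / sum_a' M[a'], where M counts sub-alignment occurrences."""
    M = defaultdict(float)
    for alignment in aligned_words:
        for sub in alignment:
            M[sub] += 1.0
    total = sum(M.values())
    return {sub: c / total for sub, c in M.items()}

def log_prob_independent(alignment, table):
    """log P(A) = sum_{i=1..k} log P(a_i) under the context-independent model."""
    return sum(math.log(table[sub]) for sub in alignment)
```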

Page 16: Letter to Phoneme Alignment

CONTEXT DEPENDENT DBN

The context-independence assumption makes the model simpler, but it is not always correct
Example: "chat" vs. "hat" (the phoneme for "h" depends on whether a "c" precedes it)

Model:
  $P(A) = \prod_{i=1}^{k} P(a_i \mid a_{i-1}, L_i, P_i)$

[DBN figure: each sub-alignment variable $a_i$ depends on the previous sub-alignment $a_{i-1}$ as well as on its letter chunk $l_i$ and phoneme chunk $p_i$]

Parameters from counts:
  $P(a_i \mid a_{i-1}) = \dfrac{M[a_{i-1}, l_i, p_i]}{M[a_{i-1}]}$
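The matching sketch for the context-dependent case, conditioning each sub-alignment on its predecessor; the `START` boundary token is a hypothetical addition of ours, not something from the slides:

```python
import math
from collections import defaultdict

START = (("<s>",), ("<s>",))  # hypothetical word-boundary token

def estimate_bigram(aligned_words):
    """P(a_i | a_{i-1}) = M[a_{i-1}, l_i, p_i] / M[a_{i-1}] from bigram counts."""
    pair_counts = defaultdict(float)
    prev_counts = defaultdict(float)
    for alignment in aligned_words:
        prev = START
        for sub in alignment:
            pair_counts[(prev, sub)] += 1.0
            prev_counts[prev] += 1.0
            prev = sub
    return {(prev, sub): c / prev_counts[prev]
            for (prev, sub), c in pair_counts.items()}

def log_prob_dependent(alignment, table):
    """log P(A) = sum_i log P(a_i | a_{i-1}) under the context-dependent model."""
    score, prev = 0.0, START
    for sub in alignment:
        score += math.log(table[(prev, sub)])
        prev = sub
    return score
```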

Page 17: Letter to Phoneme Alignment

OUTLINE

Motivation
Problem and its Challenges
Relevant Works
Our Work
  Formal Model
  EM
  Dynamic Bayesian Network
Evaluation
  Letter to Phoneme Generator
  AER
Result

Page 18: Letter to Phoneme Alignment

EVALUATION DIFFICULTIES

The alignment is unsupervised, so there is no aligned dictionary to evaluate against directly

Solutions:
  Measure how much the alignment boosts a supervised module (Letter to Phoneme Generator)
  Compare the result against a gold alignment (AER)

Page 19: Letter to Phoneme Alignment

Letter to Phoneme Generator

Measures the percentage of correctly generated phonemes and words

How does it work?
  Finding chunks: binary classification using instance-based learning
  Phoneme prediction: a phoneme is predicted either independently for each letter or for each chunk (a toy per-chunk predictor is sketched after this list)
  Hidden Markov Model
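For intuition only, here is a much-simplified stand-in for such a generator (ours; the actual module uses instance-based chunking and an HMM, which we do not reproduce). It learns the most frequent phoneme chunk for each letter chunk from an aligned dictionary and then predicts greedily by longest match:

```python
from collections import Counter, defaultdict

def build_chunk_table(aligned_words):
    """Most frequent phoneme chunk for each letter chunk, from aligned training data."""
    votes = defaultdict(Counter)
    for alignment in aligned_words:
        for letters, phonemes in alignment:
            votes[letters][phonemes] += 1
    return {letters: phones.most_common(1)[0][0] for letters, phones in votes.items()}

def predict(word, table, max_chunk=2):
    """Greedy longest-match segmentation into known letter chunks, then lookup."""
    phonemes, i = [], 0
    while i < len(word):
        for size in range(max_chunk, 0, -1):
            chunk = tuple(word[i:i + size])
            if chunk in table:
                phonemes.extend(table[chunk])
                i += size
                break
        else:                      # unseen letter: skip it (no phoneme emitted)
            i += 1
    return phonemes
```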

Page 20: Letter to Phoneme Alignment

ALIGNMENT ERROR RATIO

Evaluate using the Alignment Error Ratio (AER)

Count the pairs common to our aligned output and the gold alignment

Calculating AER:

$\mathrm{AER} = 1 - \dfrac{|A \cap G|}{|A|}$, where $A$ is the set of aligned pairs in our output and $G$ is the set of pairs in the gold alignment
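A small helper (ours), under the assumption that both alignments are given as sets of aligned letter-phoneme position pairs and that AER is one minus the fraction of proposed pairs also present in the gold alignment:

```python
def alignment_error_ratio(proposed: set, gold: set) -> float:
    """AER = 1 - |A ∩ G| / |A|, with A the proposed pairs and G the gold pairs."""
    if not proposed:
        return 1.0
    return 1.0 - len(proposed & gold) / len(proposed)

# Example: 3 of the 4 proposed pairs also appear in the gold alignment -> AER = 0.25
print(alignment_error_ratio({(0, 0), (1, 1), (2, 2), (3, 2)}, {(0, 0), (1, 1), (2, 2)}))
```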

Page 21: Letter to Phoneme Alignment

OUTLINE

Motivation
Problem and its Challenges
Relevant Works
Our Work
  Formal Model
  EM
  Dynamic Bayesian Network
Evaluation
  Letter to Phoneme Generator
  AER
Result

Page 22: Letter to Phoneme Alignment

RESULTS

10-fold cross validation

Model                      Word Accuracy   Phoneme Accuracy
Best previous results      66.82%          92.45%
One-to-One EM              53.87%          85.66%
Many-to-Many EM            76%             94.5%
DBN, Context-Independent   79.12%          95.23%
DBN, Context-Dependent     81.54%          96.70%