
ELIS-DSSP, Sint-Pietersnieuwstraat 41, B-9000 Gent

Recognition of foreign names spoken by native speakers

Frederik Stouten & Jean-Pierre Martens

Ghent University

Electronics and Information Systems (ELIS)

Interspeech 07 - August 30th 07

Overview

• Problem statement
• Methodology
  – computing phonological scores
  – foreignizable phonemes
• Experiments
  – baseline system
  – systems with methodology implementation
• Conclusions


Problem statement

• Automatic attendant or car navigation systems
  – lexicon may contain > 100K words
  – many of foreign origin
• A native speaker of Dutch can pronounce Andrew as
  – nativized: A n d r E w
  – intermediate: E n d r u w
  – foreignized: E n d r u


Problem situation

• Standard solutions
  – foreign g2p’s (grapheme-to-phoneme converters) + mapping to native phonemes
  – include foreign phoneme acoustic models
• Our proposal
  – combine scores of standard acoustic models and a phonologically inspired back-off model
    • both models trained on native speech only
  – use foreign g2p’s without phoneme mapping
  – introduce foreignizable phonemes instead of traditional foreign-to-native phoneme mappings


Combining scores

• two-stream score per acoustic model state q
  – standard model: log pA(x | q)
  – phonological back-off model: log pB(x | q)
• control parameters
  – g1q, g2q = state-dependent stream weights (different risk of foreignized pronunciation)
  – α, β = state-independent scaling coefficients (to give both streams the same overall mean and variance)
  – equidistant samples on g1q + g2q = 1 (a common scale factor has no effect)

$$LL(x \mid q) = g_{1q} \log p_A(x \mid q) + g_{2q}\,[\alpha \log p_B(x \mid q) - \beta]$$
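The two-stream combination can be sketched in a few lines of Python. This is only an illustration of the formula on this slide: the function name, the argument names and the 11-point weight grid are assumptions, not part of the original system.

```python
def two_stream_score(log_pa, log_pb, g1, alpha, beta):
    """Combine the standard and back-off log scores for one state q.

    log_pa, log_pb: log p_A(x|q) and log p_B(x|q) for the current frame
    g1: stream weight of the standard model; g2 follows from g1 + g2 = 1
    alpha, beta: state-independent scaling of the back-off stream
    """
    g2 = 1.0 - g1
    return g1 * log_pa + g2 * (alpha * log_pb - beta)


# Equidistant candidate weights on the line g1 + g2 = 1
# (the grid resolution of 0.1 is an assumption).
candidate_g1 = [i / 10 for i in range(11)]

score = two_stream_score(log_pa=-10.0, log_pb=-4.0, g1=0.5, alpha=2.0, beta=1.0)
```

With g1 = 1 the back-off stream is switched off and the score reduces to the standard acoustic model score.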


Combining scores

• Computation of log pB(x | q)
  – phonological feature space: binary features fi (i = 1,…,25)
  – map each state to phonological space
    • select features of state on basis of forced alignment of speech with standard acoustic models
    • select fi with large enough mean of P(fi | x) / P(fi) on state
    • other strategy for foreignizable phonemes (see further)
  – compute posterior probabilities P(fi | x)
    • configuration of 4 neural networks
  – convert posterior probabilities to log-likelihood

$$\log p_B(x \mid q) = \log \frac{P_B(q \mid x)}{P_B(q)} + \log p_B(x)$$
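The conversion from posteriors to a log-likelihood is Bayes' rule for the back-off model, written in the log domain. As a worked step in the slide's own notation:

```latex
\begin{align*}
p_B(x \mid q) &= \frac{P_B(q \mid x)\, p_B(x)}{P_B(q)} \\
\log p_B(x \mid q) &= \log \frac{P_B(q \mid x)}{P_B(q)} + \log p_B(x)
\end{align*}
```

The term log p_B(x) depends only on the observation x, not on the state q.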


Combining scores

• Come to final two-stream score
  – g2q less dependent on q than log PB(q | x) / PB(q)
  – g2q log pB(x) = discardable
  – computation of log PB(q | x) / PB(q)
    • Pq: positive features that are ‘on’ for state q
    • Nq: negative features that are absent or ‘off’ for q

$$LL(x \mid q) = g_{1q} \log p_A(x \mid q) + g_{2q}\left[\alpha \log \frac{P_B(q \mid x)}{P_B(q)} - \beta\right]$$


Combining scores

• Assuming independent phonological features (PHFs) we get

$$\log \frac{P_B(q \mid x)}{P_B(q)} = \underbrace{\sum_{f_i \in P_q} \log \frac{P(f_i \mid x)}{P(f_i)}}_{(1)} + \underbrace{\sum_{f_i \in N_q} \log \frac{1 - P(f_i \mid x)}{1 - P(f_i)}}_{(2)}$$

• Start with only positive features (term (1))
  – problem: unequal number for different q
  – solution: take average, i.e. wqp × (1), with wqp = 1 / card(Pq)
  – experiment showed this is better
• Add negative features (term (2))
  – supposed to represent same probability
  – experiment shows 75 % correlation between (1) and (2)
  – keeping (1) + (2) is slightly better than discarding (2)
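A minimal Python sketch of this back-off score follows. The averaging weight 1 / card(Pq) for term (1) comes from the slide. The slide does not say how term (2) is weighted, so averaging it over card(Nq) is an assumption made here. The feature names in the usage lines are hypothetical.

```python
import math


def backoff_log_ratio(post, prior, positive, negative):
    """Approximate log P_B(q|x) / P_B(q) from phonological feature posteriors.

    post, prior: dicts mapping feature name -> P(f|x) and P(f)
    positive, negative: feature sets P_q and N_q of state q
    """
    term1 = sum(math.log(post[f] / prior[f]) for f in positive)
    term2 = sum(math.log((1.0 - post[f]) / (1.0 - prior[f])) for f in negative)
    # Term (1) is averaged as on the slide; averaging term (2) is an assumption.
    w_pos = 1.0 / len(positive) if positive else 0.0
    w_neg = 1.0 / len(negative) if negative else 0.0
    return w_pos * term1 + w_neg * term2


# Hypothetical two-feature example.
post = {"voiced": 0.8, "nasal": 0.1}
prior = {"voiced": 0.4, "nasal": 0.2}
ratio = backoff_log_ratio(post, prior, positive={"voiced"}, negative={"nasal"})
```

A positive value means the frame matches the phonological profile of state q better than chance; the result would then take the place of log PB(q | x) / PB(q) in the two-stream score.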


Introducing foreignizable phonemes

• Baseline pronunciation of foreign name
  – take foreign language g2p output
  – map foreign phonemes to best native equivalent
• Our pronunciation
  – if equivalent has different PHFs, keep info of original
  – foreignizable phoneme: /NativePhon/_/ForeignPhon/
  – e.g. /rr/ → /r/_/rr/ (Dutch /r/ originating from English /rr/)
  – 6 such phonemes for English → Dutch
  – use positive PHFs of /ForeignPhon/ (knowledge based)


Introducing foreignizable phonemes

• Pronunciation variants
  – mix of standard and new approach

name          variant        transcription
Alan Presser  baseline       E l @ n _ p r E s @ r
              alternative 1  E l @ n _ p r_rr E s @ r
              alternative 2  E l @ n _ p r E s @ r_rr
              alternative 3  E l @ n _ p r_rr E s @ r_rr
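The variants in the table can be generated by choosing, at every foreignizable position, either the native phoneme or its foreignizable counterpart. The sketch below assumes that every occurrence of a listed phoneme is foreignizable, which holds for the two /r/ sounds of this example but would in general depend on the foreign g2p output. The function name is hypothetical.

```python
from itertools import product


def pronunciation_variants(phonemes, foreignizable):
    """Expand a transcription into all mixes of native and foreignizable phonemes.

    phonemes: list of native phoneme symbols
    foreignizable: dict native symbol -> foreignizable symbol, e.g. "r" -> "r_rr"
    """
    options = [
        (p, foreignizable[p]) if p in foreignizable else (p,)
        for p in phonemes
    ]
    return [" ".join(choice) for choice in product(*options)]


baseline = "E l @ n _ p r E s @ r".split()
variants = pronunciation_variants(baseline, {"r": "r_rr"})
```

For the two /r/ positions of Alan Presser this yields four transcriptions: the baseline plus the three alternatives listed above, though not necessarily in the same order.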


Experiments

• Recognition of English names
  – database from Nuance (Cremelie, N and ten Bosch, L)
  – 2050 English name utterances
  – 21 different names
  – 26 native speakers of Dutch
• Recognizer
  – Standard acoustic models: cross-word triphones, trained on Dutch read speech
  – PHF feature detector: neural network configuration, trained on Dutch read speech
  – Vocabulary: 21 English names + 1779 Dutch names
  – Lexicon: different transcriptions for each name (see next slide)


Baseline system

• No back-off model used
• Effects of different types of transcriptions measured
• Most important findings
  1. English much better than Dutch transcriptions (alone) → model foreign pronunciations
  2. Dutch transcriptions inevitable → model native pronunciations

lexicon    WER (%)   CI95 (%)
DuAlone    30.3      28.4-32.3
DuMan      23.5      21.6-25.3
EngAlone   23.1      21.2-24.9
EngDu      18.2      16.5-19.9
EngMan     16.8      15.2-18.4
ManAlone   24.7      22.8-26.5
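The slides do not say how the CI95 column was computed. As a plausibility check only, a normal-approximation binomial interval over the 2050 test utterances gives values close to the tabulated ones. The sketch assumes one name per utterance, so that WER can be treated as a per-utterance error rate.

```python
import math


def wer_ci95(wer_percent, n_utterances):
    """95% normal-approximation interval for an error rate given in percent."""
    p = wer_percent / 100.0
    half_width = 1.96 * math.sqrt(p * (1.0 - p) / n_utterances)
    return 100.0 * (p - half_width), 100.0 * (p + half_width)


low, high = wer_ci95(18.2, 2050)  # EngDu row of the table
```

Under these assumptions the EngDu interval agrees with the table to within about 0.1 percentage point.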


Systems with back-off model

• system FOREIGN
  – consider one foreignizable phoneme at a time
  – same g1 on all its states: find optimal value under the condition that g1 = 1 for all other phonemes
  – repeat process until all foreignizable phonemes are treated
• system NATIVE
  – same g1 on all states
  – search for best g1
• system ALL
  – foreignizable phonemes: g1 from FOREIGN
  – other phonemes: same g1, taken from NATIVE
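The FOREIGN tuning procedure can be read as a one-phoneme-at-a-time grid search. The sketch below follows that reading. `evaluate_wer` is a hypothetical callback standing in for a full recognition run, and the toy error function, phoneme names and grid are invented for illustration only.

```python
def tune_foreign_weights(foreignizable, evaluate_wer, grid):
    """Tune g1 for one foreignizable phoneme at a time.

    evaluate_wer: hypothetical callback mapping {phoneme: g1} to a WER;
    phonemes missing from the dict are scored with g1 = 1 (no back-off).
    """
    tuned = {}
    for phoneme in foreignizable:
        # Only this phoneme deviates from g1 = 1 during its own search.
        tuned[phoneme] = min(grid, key=lambda g1: evaluate_wer({phoneme: g1}))
    return tuned


# Toy stand-in for a recognition run (hypothetical phoneme names and optima).
def toy_wer(weights):
    optima = {"r_rr": 0.3, "x_yy": 0.4}
    return sum((weights.get(p, 1.0) - best) ** 2 for p, best in optima.items())


grid = [i / 10 for i in range(11)]
tuned = tune_foreign_weights(["r_rr", "x_yy"], toy_wer, grid)
```

System ALL would then combine the per-phoneme values found this way with the single g1 found for system NATIVE.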


Systems with back-off model

• Main results: relative improvement of 11%
• Other results
  – g1 < 0.5 for system FOREIGN
  – g1 > 0.5 for system NATIVE

system     g1        g2        WER (%)
BASELINE   1         0         18.2
FOREIGN    opt.      opt.      16.5
NATIVE     0.7       0.3       17.3
ALL        opt+0.7   opt+0.3   16.2
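The 11% figure is the relative WER reduction of system ALL with respect to the baseline. A short check using the table values (the function name is an assumption):

```python
def relative_improvement(baseline_wer, system_wer):
    """Relative WER reduction with respect to the baseline, in percent."""
    return 100.0 * (baseline_wer - system_wer) / baseline_wer


wers = {"FOREIGN": 16.5, "NATIVE": 17.3, "ALL": 16.2}
improvements = {
    name: relative_improvement(18.2, wer) for name, wer in wers.items()
}
```

The same calculation gives roughly 9.3% for FOREIGN and 4.9% for NATIVE, so most of the gain comes from the foreignizable phonemes.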


Latest work

• Seek confirmation of results on other data
• Autonomata database (STEVIN-project)
  – 60000 names, 5000 different names, 240 speakers
  – French + English + Dutch names
  – French + English + Dutch speakers
  – French + English + Dutch g2p outputs per name
  – large relative improvement (RI) by using foreign g2p’s on French and English
  – much larger RI with our methodology than here
  – paper submitted to ASRU-2007


Conclusions as of today

• large improvements on foreign name recognition by adding foreign g2p outputs (RI of around 40%)

• substantial extra improvements by adding new methodology (RI of up to 30%)