
1

FUL: Incorporating phonological theory into ASR

Aditi Lahiri (Prof. in Oxford)
Henning Reetz (Prof. in Frankfurt)

presented by Jacques Koreman (ISK), presentation to the speech group, IET at NTNU

2

Acknowledgment and responsibilities

• Some of the slides (in Times New Roman) were made available by Henning Reetz

• The ideas are all Aditi’s and Henning’s

• Their (mis)representation is mine…

3

What is FUL, and why is it interesting?

FUL stands for featurally underspecified lexicon. This presentation addresses its main characteristics:

• Underspecified features are omitted from the underlying representation

• Non-stochastic approach, in contrast to any current techniques in ASR

• Psychological reality proven by psycholinguistic and other evidence

4

An example of underspecification

Underspecification can help to deal with assimilation, as for instance in spontaneous speech:

green bag, green grass
are often realised as
greem bag, greeng grass

while

lame dog, long day
are never realised as
lane dog, lon day

Why? Because /n/ is underspecified for place and can therefore borrow a place feature from its neighbour, while /m/ is [LABIAL].
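The asymmetry on this slide can be sketched in a few lines of Python. This is only a toy illustration, not part of the FUL system: the dictionaries `LEXICAL_PLACE` and `ONSET_PLACE` and the function `surface_place` are hypothetical names, and the [DORSAL] value for the velar nasal is an assumption added to cover the "long day" example.

```python
# Toy lexicon (hypothetical): stored place feature of a word-final nasal.
# /n/ has no stored place (underspecified); /m/ is LABIAL.
# The DORSAL value for the velar nasal is an assumption of this sketch.
LEXICAL_PLACE = {"n": None, "m": "LABIAL", "ŋ": "DORSAL"}

# Place of the following word-initial consonant (assumed values).
ONSET_PLACE = {"b": "LABIAL", "g": "DORSAL", "d": "CORONAL"}


def surface_place(final_nasal, next_onset):
    """Return the place with which the final nasal may be realised."""
    stored = LEXICAL_PLACE[final_nasal]
    if stored is None:
        # Nothing stored: the nasal can borrow the neighbour's place.
        return ONSET_PLACE[next_onset]
    # A stored place feature is kept and blocks assimilation.
    return stored


print(surface_place("n", "b"))  # green bag
print(surface_place("n", "g"))  # green grass
print(surface_place("m", "d"))  # lame dog
print(surface_place("ŋ", "d"))  # long day
```

Only the underspecified /n/ takes over the place of the following consonant; the specified nasals keep their own place, which is why "lane dog" does not occur.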

5

FUL featural specification

The specification of features is constrained by universal properties and language-specific requirements: for German, [ABRUPT] and [CORONAL] (cf. "green") are not specified in the lexicon.

• FUL uses monovalent, not binary features

• V and C share the same place features

The types of features are very much under debate: binary or monovalent, fully specified or underspecified, V and C features together or separate, feature names?

On the next slide, the latest version of the feature hierarchy in FUL is shown.

6

Latest version FUL feature hierarchy

7

Lexical entries/access in FUL

• Entries contain underspecified representations.
  As opposed to standard full, binary specification!

• Each morpheme has a unique representation.
  Diametrically opposed view of dealing with variation in the signal compared to exemplar-based modelling!

• Rough signal parameters are mapped onto phonological features (no segments, syllables or other intermediate representation).
  Unlike detailed acoustic analysis in other systems!

• Features are used to directly access the lexicon using a non-stochastic, ternary matching procedure.
  Human speech processing as opposed to pattern matching?

8

ASR on the basis of a FUL

How does ASR with FUL work?

Slides 9-18 explain the recognition steps in the FUL system.

Why does ASR with FUL work?

After that, evidence for the approach from human speech processing will be presented.

9

Overview of the FUL system

[Figure: flow diagram of the FUL system]
Acoustic signal (stream of samples)
→ Acoustic front end
→ Stream of phonological features
→ Matching process (match / no mismatch / mismatch) against the word lexicon (representation with phonological features)
→ Word candidates
→ Phonological & syntactic parsing

Levels shown in the figure: segments, prosody, morphology, syntax, semantics

10

Acoustic front end

[Figure: processing chain of the acoustic front end]
Acoustic signal (stream of samples, waveform)
→ LPC, FFT, ... (20 ms window, 1 ms step rate)
→ Stream of formants and spectral shape parameters
→ Heuristics (e.g. [high] := F1 < 450 Hz)
→ Stream of features (labial, nasal, low, ...), 1 ms step rate
→ Heuristics (e.g. length > 5 ms), synchronise features
→ Stream of corrected and synchronised features

Could maybe also be landmarks…

11

Acoustic front end: parameter extraction

[Figure: speech signal with formant tracks]

Heuristics, e.g. [high] := F1 < 450 Hz
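The heuristic quoted on this slide, [high] := F1 < 450 Hz, can be sketched as a frame-by-frame rule. The function name `features_from_formants` and the F1 values are made up for illustration; only the 450 Hz threshold comes from the slide, and the real front end presumably uses many more rules of this kind.

```python
F1_HIGH_THRESHOLD_HZ = 450.0  # threshold quoted on the slide


def features_from_formants(f1_hz):
    """Return the set of monovalent features derived for one frame."""
    features = set()
    if f1_hz < F1_HIGH_THRESHOLD_HZ:
        features.add("high")
    # Further heuristics (for [low], [labial], ...) would be added here;
    # their thresholds are not given in the slides.
    return features


# Made-up F1 track, one value per analysis frame.
f1_track = [300.0, 320.0, 700.0, 720.0]
feature_stream = [features_from_formants(f1) for f1 in f1_track]
print(feature_stream)
```

Because the features are monovalent, a frame either carries [high] or says nothing about height; there is no explicit "minus high" value.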

12

Acoustic front end: signal to features

[Figure: tracks of phonological features ([son], [low], [high], ...) derived from the signal]

13

Acoustic front end: features filtered/synchronised

Phonological features, filtered and synchronised

[Figure: binary matrix of feature values ([son], [low], [high], ...)]
01111001001
01110001001
00110001010
00110000010
01110001000
10001000000
00110010101
00110001010
01110000000
10001000000
01110000001
10001000000
00110001010
01000001000
00110010010
01110010001
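One way to read the filtering step is as a minimum-duration rule on each feature track: with a 1 ms step rate, a feature that is switched on for 5 frames or fewer is discarded. The sketch below assumes exactly that reading of the "length > 5 ms" heuristic; the function name `filter_short_runs` and the example track are hypothetical, and the actual FUL front end may filter and synchronise differently.

```python
def filter_short_runs(track, min_ms=5):
    """Keep only runs of 1s that last longer than min_ms frames.

    Assumes one frame per millisecond (1 ms step rate).
    """
    out = [0] * len(track)
    i = 0
    while i < len(track):
        if track[i] == 1:
            j = i
            # Find the end of the current run of 1s.
            while j < len(track) and track[j] == 1:
                j += 1
            if j - i > min_ms:
                out[i:j] = [1] * (j - i)
            i = j
        else:
            i += 1
    return out


# Made-up track for a single feature: a 2 ms blip and a 7 ms stretch.
raw = [0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0]
filtered = filter_short_runs(raw)
print(filtered)
```

The 2 ms blip is removed and the 7 ms stretch survives, which is the kind of correction the slide's "filtered" stream suggests.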

14

Acoustic front end: lexical access with features

Lexicon search with underspecified features

[Figure: the feature matrix from slide 13 ([son], [low], [high], ...) is used to search the lexicon; each feature set is compatible with several segments]
Segment string: p f u a s b a i s t S p i t s ´
Candidate segment sets: btdfv, vpbsz, UoO´, OE{o´e, SZtd, ...

15

The crunch of FUL: ternary matching

[Figure: features from the signal compared with features stored in the lexicon]

Features computed from the signal at one instant in time: labial nasal [m]

Features stored in the lexicon, with the outcome of matching:
/m/ labial nasal: match
/n/ nasal: no mismatch
/p/ labial consonantal: no mismatch
/s/ strident: no mismatch
...
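The three-way outcome can be sketched as follows. This is a minimal illustration, not the FUL implementation: the `CONFLICTS` table is a hypothetical and incomplete list of conflicting place features, and summarising a whole segment as "match" only when every signal feature is found in the lexicon is an assumption chosen to reproduce the labels on the slide.

```python
# Hypothetical conflict table: a feature extracted from the signal
# mismatches these features if they are stored in the lexicon.
CONFLICTS = {
    "coronal": {"labial", "dorsal"},
    "labial": {"dorsal"},
    "dorsal": {"labial"},
}


def match_feature(feature, lexical_features):
    """Ternary outcome for one signal feature against a lexical entry."""
    if feature in lexical_features:
        return "match"
    if CONFLICTS.get(feature, set()) & lexical_features:
        return "mismatch"
    return "no mismatch"


def match_segment(signal_features, lexical_features):
    """Summarise the per-feature outcomes for one segment (assumed rule)."""
    outcomes = [match_feature(f, lexical_features) for f in signal_features]
    if "mismatch" in outcomes:
        return "mismatch"
    if outcomes and all(o == "match" for o in outcomes):
        return "match"
    return "no mismatch"


signal_m = {"labial", "nasal"}   # features extracted for [m]
signal_n = {"coronal", "nasal"}  # features extracted for [n]

print(match_segment(signal_m, {"labial", "nasal"}))        # lexical /m/
print(match_segment(signal_m, {"nasal"}))                  # lexical /n/
print(match_segment(signal_m, {"labial", "consonantal"}))  # lexical /p/
print(match_segment(signal_n, {"labial", "nasal"}))        # lexical /m/
```

Since [coronal] is never stored in the lexicon, a coronal in the signal can only produce a no-mismatch or a mismatch, whereas a labial in the signal is tolerated by the placeless lexical /n/.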

16

[Figure: ternary matching at two instants in time]

Features computed from the signal at one instant in time: labial nasal [m]
lexicon /m/ (labial nasal): match
lexicon /n/ (nasal): no mismatch

Features computed from the signal at another instant in time: coronal nasal [n]
lexicon /m/ (labial nasal): mismatch
lexicon /n/ (nasal): no mismatch


17

Morphological extension of underspecification

/fa/
{„fang!“ catch! verb, imp., ....}
{„fange“ I catch verb, 1st sg., ....}
{„fangen“ we catch verb, 1st pl. + inf., ....}
{„fang an“ start! verb, imp., ....}
{„fange an“ I start verb, 1st sg., ....}
{„fangen an“ we start verb, 1st pl., ....}
{„fang auf“ catch! verb, imp., ....}
...

18


An im-probable system?

• Mismatches cause words in the lexicon to be dropped from the list.

• No-mismatches or matches do not, but lead to different scores for the word candidates by comparing the number of features derived from the signal with those specified in the lexicon:

\[
\text{score} = \frac{(\text{matching features})^2}{\text{features in lexicon} \times \text{features in signal}}
\]
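The score on this slide (squared number of matching features, divided by the product of the number of features in the lexicon and in the signal) can be worked through with a small sketch. The function name `score` and the candidate table are hypothetical; single segments are used instead of whole words for brevity, and returning 0.0 when either feature set is empty is an assumption of this sketch.

```python
def score(signal_features, lexical_features):
    """(matching features)^2 / (features in lexicon * features in signal)."""
    if not signal_features or not lexical_features:
        return 0.0  # assumed convention for empty feature sets
    matching = len(signal_features & lexical_features)
    return matching ** 2 / (len(lexical_features) * len(signal_features))


signal = {"labial", "nasal"}  # features extracted for [m]

# Hypothetical candidates that survived the mismatch check.
candidates = {
    "/m/": {"labial", "nasal"},
    "/n/": {"nasal"},
    "/p/": {"labial", "consonantal"},
    "/s/": {"strident"},
}

ranked = sorted(candidates, key=lambda c: score(signal, candidates[c]), reverse=True)
for name in ranked:
    print(name, score(signal, candidates[name]))
```

A candidate whose stored features are all found in the signal gets the highest score; candidates that merely do not mismatch stay in the list with lower scores.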

19

An im-probable system? Evidence.

FUL stands for featurally underspecified lexicon.

• Underspecified features are omitted from the underlying representation

• Non-stochastic approach, in contrast to any current techniques in ASR

• Psychological reality proven by psycholinguistic and other evidence

20

Evidence for underspecification: semantic priming in lexical decision

• Crossmodal experiment (German):

– hear prime: Honig (honey) Hammer (hammer)

– see target: Biene (bee) Nagel (nail)

• Subjects’ task: lexical decision

• Pseudo-word Ho[m]ig primes Biene, but Ha[n]er does not prime Nagel

• Conclusion: [n] underspecified for place in lexicon leads to no-mismatch for Ho[m]ig, but [m] in lexicon is labial, thus mismatch for Ha[n]er

21

Evidence for underspecification: semantic priming in EEG

• The N400 is an event-related potential (ERP) component typically elicited by unexpected linguistic stimuli.

• It is characterized as a negative deflection peaking ca. 400 ms after stimulus presentation.

• In models of speech comprehension, the N400 is often associated with the semantic integration of words in sentence context; its occurrence is interpreted as indicating that a semantic process is active in that general time frame.

22

Evidence for underspecification: semantic priming in EEG

• word target: Hor[d]e (horde), Pro[b]e (test)
  pseudo-word target: Hor[b]e (horde), Pro[d]e (test)

• Subjects’ task: speeded lexical decision

• Similar RTs for words and pseudo-words, but more errors in lexical decision for Hor[b]e (no-mismatch for Hor[d]e) than for Pro[d]e (mismatch on Pro[b]e)

Also a large negative peak for Pro[d]e but not for Hor[b]e (which behaved similarly to real words).

• Conclusion: [d] underspecified for place in lexicon, but [b] specified as [LABIAL]

23

Evidence for underspecification: vowel listening in MEG experiment

• standard (continuous): [o:]

deviant (played once): [ø:]

• Subjects’ task: just listen….

• Asymmetrical mismatch negativity (MMN) effect (perception of change): greater for [o:]-[ø:] than for [ø:]-[o:], with a higher amplitude difference ca. 180 ms from onset of the deviant and an earlier effect.

Similar effects for other pairs.

• Conclusion: Results fit with underspecification

24

Evidence for underspecification

And there is more evidence

• from CVC gating experiments in English and Bengali, where a non-nasalised oral vowel could lead to both oral and nasal responses when the CV is heard (Lahiri & Marslen-Wilson, 1991, 1992)

• from priming experiments, suggesting there are two kinds of [o:] in German, one which is specified for [labial, dorsal] (Boote-Bötchen as primes for Boot), the other specified only for [labial] (Söhne-Söhnchen as primes for Sohn)

• from language change in Miogliola (Northern Italian), where two types of [n] were shown to exist, one [coronal], the other unspecified for place (Ghini, 2001).

… and more

25

Conclusions

• FUL is an implementation of phonological theory in ASR.

• FUL is firmly grounded in psycholinguistic experiments and observations on language change.

• FUL recognition is robust against variation in speech, but it contains no mechanisms to normalize for variation that is not directly related to the linguistic content (as we possibly do when we come to understand a speaker better after first meeting them or talking to them on the phone), nor to use this information.

26

References

This presentation was mainly based on

• (a draft version of) Lahiri, A. & Reetz, H. (2002). "Underspecified recognition", in C. Gussenhoven & N. Warner (eds.) Laboratory Phonology 7. Berlin: Mouton, 637-675.

• Lahiri, A. & Reetz, H. (submitted to J. Phon.). "Distinctive features: phonological underspecification in processing".

• See also: http://ling.uni-konstanz.de/pages/proj/sfb471/publ/d-3.html
