1
FUL: Incorporating phonological theory into ASR
Aditi Lahiri (Prof. in Oxford)
Henning Reetz (Prof. in Frankfurt)
presented by Jacques Koreman (ISK)
Presentation for the speech group, IET at NTNU
2
Acknowledgment and responsibilities
• Some of the slides (in Times New Roman) were made available by Henning Reetz
• The ideas are all Aditi’s and Henning’s
• Their (mis)representation is mine…
3
What is FUL, and why is it interesting?
FUL stands for featurally underspecified lexicon. This presentation addresses its main characteristics:
• Underspecified features are omitted from the underlying representation
• Non-stochastic approach, in contrast to any current techniques in ASR
• Psychological reality proven by psycholinguistic and other evidence
4
An example of underspecification
Underspecification can help to deal with assimilation, as for instance in spontaneous speech:
green bag, green grass
are often realised as greem bag, greeng grass
while lame dog, long day
are never realised as lane dog, londay
Why? Because /n/ is underspecified for place and can therefore borrow a place feature from its neighbour, while /m/ is [LABIAL]
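The asymmetry on this slide can be illustrated with a small sketch. It is only a toy model under stated assumptions: the dictionaries, the function name `surface_nasal`, the spelling "ng" for the velar nasal and its [DORSAL] label are all hypothetical, and assimilation is treated as always applying although the slide only says "often".

```python
# Toy model of place assimilation with an underspecified /n/.
# Assumption: None marks a nasal whose place is left blank in the lexicon.
LEXICAL_PLACE = {"n": None, "m": "LABIAL", "ng": "DORSAL"}

# Assumption: place of the following consonant (only a few stops listed).
NEIGHBOUR_PLACE = {"b": "LABIAL", "g": "DORSAL", "d": "CORONAL"}

# Surface nasal for each place of articulation.
NASAL_FOR_PLACE = {"LABIAL": "m", "DORSAL": "ng", "CORONAL": "n"}


def surface_nasal(nasal, next_consonant):
    """Return one possible surface form of `nasal` before `next_consonant`."""
    place = LEXICAL_PLACE[nasal]
    if place is None:
        # Underspecified: borrow the neighbour's place, default to coronal.
        place = NEIGHBOUR_PLACE.get(next_consonant, "CORONAL")
    return NASAL_FOR_PLACE[place]


print(surface_nasal("n", "b"))   # green bag: /n/ may surface as labial
print(surface_nasal("n", "g"))   # green grass: /n/ may surface as velar
print(surface_nasal("m", "d"))   # lame dog: /m/ keeps its own place
print(surface_nasal("ng", "d"))  # long day: the velar nasal keeps its place
```

Only the nasal with no stored place changes; the specified nasals ignore their neighbour, which is the pattern the examples show.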
5
FUL featural specification
The specification of features is constrained by universal properties and language-specific requirements: for German, [ABRUPT] and [CORONAL] (cf. “green”) are not specified in the lexicon.
• FUL uses monovalent, not binary features
• V and C share the same place features
The types of features are very much under debate: binary or monovalent, fully specified or underspecified, V and C features together or separate, feature names?
On the next slide, the latest version of the feature hierarchy in FUL is shown.
6
Latest version of the FUL feature hierarchy
7
Lexical entries/access in FUL
• Entries contain underspecified representations.
  As opposed to standard full, binary specification!
• Each morpheme has a unique representation.
  Diametrically opposed view of dealing with variation in the signal compared to exemplar-based modelling!
• Rough signal parameters mapped onto phonological features (no segments, syllables or other intermediate representation).
  Unlike detailed acoustic analysis in other systems!
• Features used to directly access the lexicon using a non-stochastic, ternary matching procedure.
  Human speech processing as opposed to pattern matching?
8
ASR on the basis of a FUL
How does ASR with FUL work?
Slides 9-18 explain the recognition steps in the FUL system.
Why does ASR with FUL work?
After that, evidence for the approach from human speech processing will be presented.
9
Overview of the FUL system
Acoustic signal (stream of samples)
→ Acoustic front end
→ Stream of phonological features
→ Matching process (match / no mismatch / mismatch) against the word lexicon (representation with phonological features)
→ Word candidates
→ Phonological & syntactic parsing
Levels shown in the diagram: segments, prosody, morphology, syntax, semantics
10
Acoustic front end
Acoustic signal (stream of samples, waveform)
→ LPC, FFT, ... (20 ms window, 1 ms step rate)
→ Stream of formants and spectral shape parameters
→ Heuristics (e.g. [high] := F1 < 450 Hz)
→ Stream of features (labial, nasal, low, ...) (1 ms step rate)
→ Heuristics (e.g. length > 5 ms), synchronise features
→ Stream of corrected and synchronised features
Could maybe also be landmarks…
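The two heuristic steps on this slide can be sketched as follows. This is a minimal illustration, not the FUL front end: only the 450 Hz threshold for [high], the 5 ms minimum length and the 1 ms step rate come from the slide, while the function names, the example F1 values and the way short runs are discarded are assumptions.

```python
# Sketch of the front-end heuristics. Only the 450 Hz threshold for [high],
# the 5 ms minimum length and the 1 ms step come from the slide.

F1_HIGH_MAX_HZ = 450   # slide: [high] := F1 < 450 Hz
MIN_LENGTH_MS = 5      # slide: length > 5 ms
STEP_MS = 1            # slide: 1 ms step rate, so one frame per millisecond


def is_high(f1_hz):
    """Heuristic for one frame: [high] is present when F1 is below 450 Hz."""
    return f1_hz < F1_HIGH_MAX_HZ


def drop_short_runs(track, min_length_ms=MIN_LENGTH_MS, step_ms=STEP_MS):
    """Switch a feature off wherever it is not active for longer than min_length_ms."""
    cleaned = list(track)
    start = None
    # The trailing False is a sentinel that closes a run reaching the end.
    for i, active in enumerate(list(track) + [False]):
        if active and start is None:
            start = i
        elif not active and start is not None:
            if (i - start) * step_ms <= min_length_ms:
                cleaned[start:i] = [False] * (i - start)
            start = None
    return cleaned


# Hypothetical F1 values (Hz), one per 1 ms frame.
f1_track = [300] * 3 + [600] * 4 + [300] * 8
high_raw = [is_high(f1) for f1 in f1_track]
high_clean = drop_short_runs(high_raw)
print(high_clean)
```

In this example the initial 3 ms stretch of [high] is removed as too short, and the later 8 ms stretch survives.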
11
Acoustic front end: parameter extraction
[Figure: speech signal and formants; heuristics, e.g. [high] := F1 < 450 Hz]
12
Acoustic front end: to features
[Figure: phonological feature tracks, e.g. [son], [low], [high], ...]
13
Acoustic front end: features filtered/synchronised
Phonological features, filtered and synchronised (feature tracks [son], [low], [high], ...):
01111001001
01110001001
00110001010
00110000010
01110001000
10001000000
00110010101
00110001010
01110000000
10001000000
01110000001
10001000000
00110001010
01000001000
00110010010
01110010001
14
Acoustic front end: lexical access with features
Lexicon search with underspecified features
[Figure: the filtered and synchronised feature matrix from slide 13 (feature tracks [son], [low], [high], ...) is used to search the lexicon]
Segment string shown: p f u a s b a i s t S p i t s ´
Segment sets shown: b t d f v / v p b s z / U o O ´ / O E { o ´ e / S Z t d / • • •
15
The crunch of FUL: ternary matching
Features computed from the signal at one instance in time: [m] labial, nasal
Features stored in the lexicon, with the outcome of the comparison:
/m/ labial, nasal: match
/s/ strident: no mismatch
/p/ labial, consonantal: no mismatch
/n/ nasal: no mismatch
• • •
16
The crunch of FUL: ternary matching
Features computed from the signal at one instance in time: [m] labial, nasal
Features stored in the lexicon, with the outcome of the comparison:
/m/ labial, nasal: match
/n/ nasal: no mismatch
Features computed from the signal at another instance in time: [n] coronal, nasal
Features stored in the lexicon, with the outcome of the comparison:
/m/ labial, nasal: mismatch
/n/ nasal: no mismatch
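The three-way outcome can be sketched in a few lines. This is a simplified illustration under assumptions, not the FUL matching procedure: it treats only the place features as mutually exclusive, whereas the actual system may define further mismatch conditions, and the names `PLACE`, `match_feature` and `match_segment` are hypothetical.

```python
# Assumption: only place features conflict with each other in this sketch.
PLACE = {"labial", "coronal", "dorsal"}


def match_feature(feature, lexical):
    """Compare one feature from the signal with a lexical feature set."""
    if feature in lexical:
        return "match"
    if feature in PLACE and lexical & PLACE:
        # The lexicon specifies a different place: conflict.
        return "mismatch"
    # The lexicon says nothing about this feature (underspecified).
    return "no mismatch"


def match_segment(signal, lexical):
    """Summarise the per-feature outcomes for one stretch of signal."""
    outcomes = {match_feature(f, lexical) for f in signal}
    if "mismatch" in outcomes:
        return "mismatch"
    if outcomes == {"match"}:
        return "match"
    return "no mismatch"


m_lex = {"labial", "nasal"}   # /m/ is specified as labial
n_lex = {"nasal"}             # /n/ has no place feature in the lexicon

print(match_segment({"labial", "nasal"}, m_lex))    # [m] against /m/
print(match_segment({"labial", "nasal"}, n_lex))    # [m] against /n/
print(match_segment({"coronal", "nasal"}, m_lex))   # [n] against /m/
print(match_segment({"coronal", "nasal"}, n_lex))   # [n] against /n/
```

Because /n/ stores no place feature, nothing extracted from the signal can conflict with it, while a coronal signal conflicts with the stored [labial] of /m/.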
17
Morphological extension of underspecification
/fa/
{„fang!“ catch! verb, imp., ....}
{„fange“ I catch verb, 1st sg., ....}
{„fangen“ we catch verb, 1st pl.+ inf., ....}
{„fang an“ start! verb, imp., ....}
{„fange an“ I start verb, 1st sg., ....}
{„fangen an“ we start verb, 1st pl., ....}
{„fang auf“ catch! verb, imp., ....}
• • •
18
The crunch of FUL: ternary matching
• Mismatches cause words in the lexicon to be dropped from the list.
• No-mismatches or matches do not, but lead to different scores for the word candidates by comparing the number of features derived from the signal with those specified in the lexicon:
$$\text{score} = \frac{(\text{matching features})^2}{\text{features in lexicon} \times \text{features in signal}}$$
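The scoring rule on this slide can be tried out with a short sketch. It assumes that features are represented as sets and that "matching features" means the features found both in the signal and in the lexical entry; the function name `score`, the candidate labels and the zero score for empty sets are illustrative choices, not part of the source.

```python
def score(signal, lexical):
    """(matching features)^2 / (features in lexicon * features in signal)."""
    if not signal or not lexical:
        return 0.0  # assumption: no score when either side has no features
    matching = len(signal & lexical)
    return matching ** 2 / (len(lexical) * len(signal))


# Hypothetical candidates that survived the mismatch check for a signal [m].
signal = {"labial", "nasal"}
candidates = {"/m/": {"labial", "nasal"}, "/n/": {"nasal"}}

ranked = sorted(candidates, key=lambda c: score(signal, candidates[c]), reverse=True)
for name in ranked:
    print(name, score(signal, candidates[name]))
```

Here /m/ scores 1.0 and /n/ scores 0.5: both stay on the candidate list, but the fully matching entry ranks higher.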
An im-probable system?
19
An im-probable system? Evidence.
FUL stands for featurally underspecified lexicon.
• Underspecified features are omitted from the underlying representation
• Non-stochastic approach, in contrast to any current techniques in ASR
• Psychological reality proven by psycholinguistic and other evidence
20
Evidence for underspecification: semantic priming in lexical decision
• Crossmodal experiment (German):
– hear prime: Honig (honey) Hammer (hammer)
– see target: Biene (bee) Nagel (nail)
• Subjects’ task: lexical decision
• Pseudo-word Ho[m]ig primes Biene, but Ha[n]er does not prime Nagel
• Conclusion: [n] underspecified for place in lexicon leads to no-mismatch for Ho[m]ig, but [m] in lexicon is labial, thus mismatch for Ha[n]er
21
Evidence for underspecification: semantic priming in EEG
• The N400 is an event-related potential (ERP) component typically elicited by unexpected linguistic stimuli.
• It is characterized as a negative deflection peaking ca. 400 ms after stimulus presentation.
• In models of speech comprehension, the N400 is often associated with the semantic integration of words in sentence context; observing it is interpreted as evidence that a process working on semantics is active in that general time frame.
22
Evidence for underspecification: semantic priming in EEG
• word target: Hor[d]e (horde), Pro[b]e (test)
• pseudo-word target: Hor[b]e (horde), Pro[d]e (test)
• Subjects’ task: speeded lexical decision
• Similar RTs for words and pseudo-words, but more errors in lexical decision for Hor[b]e (no-mismatch for Hor[d]e) than for Pro[d]e (mismatch on Pro[b]e)
Also large negative peak for Pro[d]e but not for Hor[b]e (which behaved similarly to real words).
• Conclusion: [d] underspecified for place in lexicon, but [b] specified as [LABIAL]
23
Evidence for underspecification: vowel listening in MEG experiment
• standard (continuous): [o:]
deviant (played once): [ø:]
• Subjects’ task: just listen….
• Asymmetrical mismatch negativity (MMN) effect (perception of change): the effect for [o:]-[ø:] is greater than for [ø:]-[o:], with a higher amplitude difference ca. 180 ms from the onset of the deviant and an earlier effect.
Similar effects for other pairs.
• Conclusion: Results fit with underspecification
24
Evidence for underspecification
And there is more evidence
• from CVC gating experiments in English and Bengali, where a non-nasalised oral vowel could lead to both oral and nasal responses when the CV is heard (Lahiri & Marslen-Wilson, 1991,1992)
• from priming experiments, suggesting there are two kinds of [o:] in German, one which is specified for [labial,dorsal] (Boote-Bötchen as primes for Boot), the other specified only for [labial] (Söhne-Söhnchen as primes for Sohn)
• from language change in Miogliola (Northern Italian), where two types of [n] were shown to exist, one [coronal], the other unspecified for place (Ghini, 2001).
….and more
25
Conclusions
• FUL is an implementation of phonological theory in ASR.
• FUL is firmly grounded in psycholinguistic experiments and observations on language change.
• FUL recognition is robust against variation in speech, but it contains no mechanisms to normalize for variation that is not directly related to the linguistic content (as we possibly do when we come to understand a speaker better after first meeting him or talking to him on the phone), nor to use this information.
26
References
This presentation was mainly based on
• (a draft version of) Lahiri, A. & Reetz, H. (2002). “Underspecified recognition”, in C. Gussenhoven & N. Warner (eds.) Laboratory Phonology 7. Berlin: Mouton, 637-675.
• Lahiri, A. & Reetz, H. (submitted to J. Phon.). “Distinctive features: phonological underspecification in processing”.
• See also: http://ling.uni-konstanz.de/pages/proj/sfb471/publ/d-3.html