Current developments in phonetics
Louis C.W. Pols
Institute of Phonetic Sciences (IFA)
Amsterdam Center for Language and Communication (ACLC)
June 11, 2004 Current Developments in Phonetics 2
Overview
My job:
Aspects of Phonetics
Size of Phonetics Community
Ken Stevens and Phonetics
My choices of topics
Conclusions
Aspects of Phonetics
Phonetics is fantastic and interdisciplinary
it is about the speech signal and much more:
spoken language, spoken communication
phonemes and prosody
speaking and listening
mental storage and retrieval
speech acquisition and speech pathology
speech technology, speech databases
languages of the world, dialects
and more: e.g. laboratory phonology, evaluating cochlear implants, or designing Web avatars
My choices, given other intro’s
Phonetics as a basic science
Computational modeling, computational phonetics
Knowledge from annotated, and preferably freely accessible, speech corpora
Phonetics as an interdisciplinary science
Size of Phonetics Community
1000+ participants at speech conferences like: ICSLP & Eurospeech (now Interspeech under ISCA), ICASSP, LREC, ICPhS (under IPA)
numerous workshops (see ISCA and FoNETiks newsletters, and News section in SpeCom)
IPA ~1000 members, ISCA ~1350 members
phonetics community at least 10 times bigger
books; journals; LDC & ELRA
ICPhS’03 Barcelona: 50 countries (USA 158; FR 81; GE 73; UK 71; JAP 46; SP 45; SW 41; NE 31; CAN 25; RU 19; IT 17; FI 14; AU 12; BRA 12; CH 12)
Ba-Ma system: less specialization in Phonetics
Ken Stevens and Phonetics
ESCA medal at Eurospeech’95 in Madrid
on average one paper per year in JASA
special issue JPhon “Quantal Theory” (1989)
1998 masterpiece “Acoustic Phonetics”
regular keynote speaker at conferences
many international contacts (also Europe)
many good students world-wide
Banquet Eurospeech95, Madrid
[Photo: J.M. Pardo (E’95 chairman), Ken Stevens (ESCA medalist), Mrs. Pardo, Louis Pols (ESCA president)]
Textbook Phonetics
Summer course in English Phonetics (UCL):
phonemic systems (vowels and consonants)
segmental analysis (allophonic processes)
word stress
weakening and coarticulation processes
sentence stress (accent, tonal stress)
intonation and meaning
similar in most textbooks
Invariance Symp., MIT 1983
Invariance and variability in speech processes (Perkell & Klatt, 1986)
also Leitmotiv for my Amsterdam group:
perception of dynamic speechlike sounds (vW)
formant dynamics (van Son)
appropriate context (van Son)
acoustic vowel reduction (van Bergem)
efficiency of speech (van Son)
DL for short speech-like transitions
[Figure: difference limen (axis 0 to 240) versus transition duration (20 to 50 ms) for tone glides and for single (simple) and complex transitions, presented in isolation and in initial or final position; shorter versus longer transitions are compared.]
Adapted from van Wieringen & Pols (1998), “Discrimination of short and rapid speechlike transitions”, Acta Acustica 84, 520-528
Static vs. dynamic V recogn.
see Weenink (2001) “Vowel normalizations with the TIMIT acoustic phonetic speech corpus”, IFA Proc. 24, 117-123
438 males, both train & test sentences TIMIT
35,385 vowel segments, hand segmented
13 monophthongal vowel categories
1-Bark bandfilter anal. (18), intensity normal.
3 frames per segment: central and 25 ms L/R
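The 1-Bark band filtering mentioned above can be pictured with a frequency-to-Bark conversion. The following is a minimal sketch, assuming Traunmüller's (1990) approximation of the Bark scale; the slide does not state which Bark formula or band edges were used, and the names `hz_to_bark` and `bark_band_index` are illustrative only.

```python
def hz_to_bark(f_hz: float) -> float:
    """Convert frequency in Hz to Bark (Traunmueller 1990 approximation; an assumption here)."""
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53


def bark_band_index(f_hz: float, n_bands: int = 18) -> int:
    """Return the 0-based index of the 1-Bark-wide band containing f_hz.

    The result is clipped to the range 0 .. n_bands - 1, so very low and very
    high frequencies fall into the first and last band respectively.
    """
    z = hz_to_bark(f_hz)
    return max(0, min(n_bands - 1, int(z)))


if __name__ == "__main__":
    for f in (100.0, 500.0, 1000.0, 4000.0):
        print(f, round(hz_to_bark(f), 2), bark_band_index(f))
```

Under these assumptions, each vowel frame would be summarized by 18 band levels, one per Bark band, rather than by explicit formant values.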
Some results
Vowel classif. (%) with discriminant functions

Condition                # Items                       Static (1 frame)   Dynamic (3 frames)
Original                 35,385 (438 x 13 x (1…25))    59.3               66.9
speaker normalized       35,385                        62.2               69.2
V centers per speaker    5,374 (438 x 13)              78.9               90.1
speaker normalized       5,374                         87.9               94.5
Vowel perception with or without transitions?
our claims (vSon, IFA Proc. 17, 1993):
only evidence for compensatory processes (i.e. perceptual overshoot and dynamic specification) when in an appropriate context
synthetic isolated dynamic formant tracks lead to perceptual undershoot (= averaging)
silent-center studies are ambiguous
concl.: info in formant dynamics is only used when V’s are heard in an appropriate context
Perceiving (speech) dynamics
[Figure: stylized F1 and F2 tracks, frequency (Hz) versus time (ms). Stationary (reference) tokens: 6.3, 12.5, 25, 50, 100, 150 ms. Dynamic tokens (onglide, complete, offglide): 25, 50, 100, 150 ms, with excursions ΔF1 = ±225 Hz and ΔF2 = ±375 Hz.]
Vowel identification
compare V responses for dynamic stimuli with those for static stimuli
calculate net shift in V responses per onglide (CV), complete (CVC), or offglide (VC)
result: responses average over the trailing part of the formant track
see Pols & vSon, “Acoustics and perception of dynamic vowel segments”, Speech Comm.
Perceptual undershoot
[Figure: net shift (%, axis -50 to 50) versus token duration (25, 50, 100, 150 ms) for ΔF1 = ±225 Hz and ΔF2 = ±375 Hz.]
Net shift in vowel responses to tokens with curved formant tracks vs. stationary tokens. All values significant, except small open triangles.
Local context and C & V identification
120 CVC fragments taken from a read text
various segments per CVC fragment (50 ms V-kernel and beyond)
both accented and unaccented vowels
subjects identified (pre- or post-vocalic) consonant or vowel in CV-, VC-, or CVC-segments
vSon & Pols (1999), “Perisegmental speech improves consonant and vowel identification”, Speech Comm. 29, 1-22
[Figure: stimulus segments cut from a CVC fragment on a time axis of 0 to 233 ms: the 50 ms vowel Kernel, V, CV, VC, and CVC, with segment boundaries placed ±10 ms and +25 ms around the transitions; segment durations in ms are given in parentheses.]
[Figure: vowel identification and consonant identification. Errors (%, axis 0 to 40) and log2 perplexity (bits, axis 0.0 to 1.5) per stimulus type (Kernel, V, CV, VC, CVC) for all vowels (N = 2040), accented vowels (N = 1003), and unaccented vowels (N = 1037).]
Error rates of vowel identification for the individual stimulus token types. Long-short vowel errors (/α-a:, -o:/) are ignored
[Figure: errors (%, axis 0 to 70) for C and V identification in VC and CV stimuli, split by whether the other segment was identified correctly or in error; N = 1680.]
results:
• phoneme identification benefits from extra speech
• left context more beneficial than right context
• better identification in CV when also other member of pair was identified correctly (context effect)
Effect of (lack of) context
100 Dutch listeners identifying V segments
“Vowel contrast reduction”, K-vBeinum (1980)
3 conditions

Condition                  Measure   M1     M2     F1     F2     Av.
isolated V (3)             %         95.2   88.9   88.0   86.4   89.6
                           ASC       433    404    447    634    480
words (5)                  %         88.1   78.8   84.9   85.3   84.3
                           ASC       406    320    374    529    407
unstr., free conv. (10)    %         31.2   28.7   33.3   38.9   33.0
                           ASC       174    119    209    255    189

$ASC = \frac{1}{n} \sum_{i=1}^{n} \left| LF_i - \overline{LF} \right|^2$ (total variance), $LF_i = 100 \log_{10} F_i$
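The ASC measure can be illustrated with a short calculation. This is a minimal sketch, assuming each vowel is represented by an (F1, F2) pair in Hz and that the second term in the formula is the centroid of all vowels in the log-formant space; the formant values in the usage example are hypothetical and are not data from the study.

```python
import math


def acoustic_system_contrast(formants_hz):
    """Total variance of vowel points around their centroid in log-formant space.

    formants_hz: list of tuples of formant frequencies in Hz, one tuple per vowel.
    Each frequency F is first mapped to LF = 100 * log10(F).
    """
    points = [tuple(100.0 * math.log10(f) for f in vowel) for vowel in formants_hz]
    n = len(points)
    dims = len(points[0])
    centroid = [sum(p[d] for p in points) / n for d in range(dims)]
    squared_distances = (
        sum((p[d] - centroid[d]) ** 2 for d in range(dims)) for p in points
    )
    return sum(squared_distances) / n


if __name__ == "__main__":
    # Hypothetical (F1, F2) values for three point vowels, clear versus reduced.
    clear = [(300.0, 2300.0), (700.0, 1200.0), (300.0, 800.0)]
    reduced = [(400.0, 1900.0), (600.0, 1300.0), (400.0, 1000.0)]
    print(acoustic_system_contrast(clear))
    print(acoustic_system_contrast(reduced))
```

A vowel system that shrinks towards its centre gives a smaller ASC, which is the direction of the drop from isolated vowels to unstressed vowels in free conversation in the table.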
Historical biases
R. Plomp (2002) “The intelligent ear. On the nature of sound perception”
biases in research:
dominance of simple stimuli (e.g., phonemes)
preference for microscopic approach (e.g., phoneme discrimination rather than intelligibility)
emphasis on psychophysical rather than cognitive aspects of hearing
use of clean signals in lab (rather than acoustic reality of outside world with its disruptive sounds)
Computational Phonetics
R. Moore (1995) 13th ICPhS, Stockholm: unify the emerging theoretical and practical developments in speech technology with the established knowledge and practices in phonetic sciences
Sagisaka et al. (1997), “Computing prosody. Computational models for processing spontaneous speech”
Klatt (1987), vSanten (1997), Wang (1997), duration modeling
vBergem (1993), Acoustic and lexical vowel reduction
Steeneken (1992), Speech Transmission Index
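Stylized formant contour
[Figure: stylized F2 contour over normalized time (-1 to 1), marking F_onset at t = -1, F_center (c0) at t = 0, and F_offset at t = 1, with slope c1 and curvature c2.]
$F_2(t) = c_0 + c_1 t + c_2 t^2$ (second order polynomial)
$F_2(t) = \bar{F}_2(t) + \alpha_2^{p}(t) + \beta_2^{t}(t) + \gamma_2^{\alpha}(t)$ for @ in /p@tα/
$F_2(-1) = 1352$ Hz; $F_2(0) = 1435$ Hz; $F_2(1) = 1485$ Hz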
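The three F2 values quoted for the schwa fix the coefficients of the second-order polynomial exactly, since the contour is sampled at normalized times -1, 0, and 1. A minimal sketch of that arithmetic follows; the function names are illustrative and not taken from the original work.

```python
def fit_quadratic(f_onset, f_center, f_offset):
    """Solve F(t) = c0 + c1*t + c2*t**2 from F(-1), F(0), and F(1)."""
    c0 = f_center                              # F(0) = c0
    c1 = (f_offset - f_onset) / 2.0            # F(1) - F(-1) = 2*c1
    c2 = (f_offset + f_onset) / 2.0 - f_center  # F(1) + F(-1) = 2*c0 + 2*c2
    return c0, c1, c2


def contour(t, c0, c1, c2):
    """Evaluate the stylized formant contour at normalized time t."""
    return c0 + c1 * t + c2 * t * t


c0, c1, c2 = fit_quadratic(1352.0, 1435.0, 1485.0)
print(c0, c1, c2)  # 1435.0 66.5 -16.5
```

Here c1 = 66.5 Hz describes the overall rise of the track and c2 = -16.5 Hz its curvature, so the whole contour is summarized by three numbers.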
Schwa realization
The schwa is not just a centralized vowel but something that is completely assimilated with its phonemic context
Human word intelligibility vs. noise
from Ph.D thesis H. Steeneken (1992) ‘On measuring and predicting speech intelligibility’
Knowledge from Annotated Sp. Corp.
knowledge cast in rules vs. knowledge derived from intelligent searches in DB’s
vSanten (1997) greedy algorithm
Greenberg et al. (2003) Switchboard
Oostdijk et al. (2002) 1000 hrs., 10M words: Spoken Dutch Corpus (CGN)
vSon et al. (2001) 5.5 hrs. IFA corpus
Intas 915 project (Dutch, Finnish, Russian)
Freq. effects vs. vowel reduction
[Figure: two panels, read speech and spontaneous speech. Correlation coefficient R (axis from about -0.1 to 0.3) for Duration, F12Dist, CoG, and Intensity in Dutch, Finnish, and Russian.]
-log2(word frequency) vs. acoustic vowel reduction (in terms of duration, F1F2Dist, CoG, and Intensity) for Du, Fi, Ru
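The quantity on which these correlations are based can be sketched as follows. This is a minimal illustration, assuming that word frequency is expressed as a relative frequency and that a plain Pearson correlation is used; the frequencies and vowel durations below are hypothetical and are not taken from the corpora.

```python
import math


def information_content(relative_frequency):
    """Return -log2 of a relative word frequency, in bits."""
    return -math.log2(relative_frequency)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical relative word frequencies and vowel durations in ms.
    frequencies = [0.02, 0.005, 0.001, 0.0002]
    durations_ms = [55.0, 70.0, 80.0, 95.0]
    bits = [information_content(f) for f in frequencies]
    print(pearson_r(bits, durations_ms))
```

A positive R in this setup means that vowels in rarer words are less reduced, here realized with longer durations.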
Phonetics an Interdisciplinary Science
some examples:
phonetics is a contributor to many signal and data processing techniques as well as pattern recognition techniques
use of source-filter model to describe early speech development
laryngectomized speech, production and evaluation
turn switches in conversational dialogs
progress in vowel production in babies
Early speech development
Phonation type \ Articulation type    No Art      One Art                Two Art
No Phonation                                      Stage III              Stage V
Uninterrupted Phonation               Stage I     Stage III              Stage V
Interrupted Phonation                 Stage II    Stage III              Stage V
Variegated Unint. Phon.               Stage IV    Stage III + IV         Stage IV + V
Variegated Interrupted Phon.          Stage IV    Stage II + III + IV    Stage IV + V

average onset (in weeks)
Stage I               0
Stage II              6
Stage III             10
Stage IV              20
Stage V (babbling)    31
Stage VI (‘words’)    40
vBeinum, Clement, vdDikkenberg, Developmental Sc. 4, 61-70 (2001)
Tracheoesophageal speech
C. van As, Ph.D thesis (2001)
Turn switches in conversation
shift in phonetics from isolated stimuli to conversational speech
quantitative modelling of the identification of turn-relevant places (TRP’s)
integration process of temporally unfolding information at different levels in speech, from conversation acts and semantics to prosody, phonetics and visual cues
use of laryngograph to detect preparatory glottal closure that precedes most TRP’s
new project Rob van Son (start Jan. 2004)
Progression in V production of babies
especially in the first year of life
utterances difficult to identify as phon. seq.
spectro-temporal analyses difficult because of very high pitch
formant measurements biased by expectations
pitch-related bandfilter analysis (automatic)
5 normal-hearing and 5 hearing-impaired
vdStelt et al. (2003)
Spectral measurements
[Figure: scatter plots of spectral measurements in principal-component space (horizontal axis pc1, -8 to 7; vertical axis -8 to 2) for a normal-hearing child at 5 and 24 months (panels 5/5 and 5/24) and a hearing-impaired child at 5 and 24 months (panels 7/5 and 7/24); the positions of i, u, and a are indicated.]
Conclusions
importance of dynamic information
implications of (lack of) (local) context
interdisciplinary nature of phonetics
need for large, annotated, and freely accessible speech corpora
generalization via computational phonetics
phonetics and phonology (Patricia Keating)