Annotation of speech from the phonetics/phonology perspective
Bettina Braun & Jürgen Trouvain
15.02.2002
Fachrichtung 4.7, Institut für Phonetik
Annotation of speech 2
Manipulating text vs. speech [1]
Text file manipulation, "vowels-only" version: remove all consonant letters and replace each with a space, so that only the vowels are left
e ea e o e a o o o o : a e ou y i e o i i a e u y e i e a e oo .
Manipulating text vs. speech [2]
Text file manipulation, "consonants-only" version: remove all vowel letters and replace each with a space, so that only the consonants are left
Th w th r f r c st f r t m rr w: r th r cl d n th m rn ng w th f w s nn sp lls n th ft rn n.
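The two text manipulations can be reproduced with a few lines of Python. This is a minimal sketch, not the authors' procedure: it assumes, as the example strings suggest, that "y" counts as a vowel letter, and it collapses runs of spaces to a single space for display. The function name `keep_only` is our own.

```python
import re

VOWEL_LETTERS = set("aeiouyAEIOUY")  # assumption: "y" is treated as a vowel letter


def keep_only(text: str, keep_vowels: bool) -> str:
    """Replace every letter of the unwanted class with a space.
    Punctuation and existing spaces are kept as they are."""
    chars = []
    for ch in text:
        is_vowel = ch in VOWEL_LETTERS
        if ch.isalpha() and is_vowel != keep_vowels:
            chars.append(" ")  # letter of the class being removed
        else:
            chars.append(ch)
    # Collapse runs of spaces so the result matches the compact display above
    return re.sub(r" +", " ", "".join(chars)).strip()


sentence = ("The weather forecast for tomorrow: rather cloudy in the morning "
            "with a few sunny spells in the afternoon.")
print(keep_only(sentence, keep_vowels=True))   # vowels-only version
print(keep_only(sentence, keep_vowels=False))  # consonants-only version
```

Removing letters from a text is trivial because letters are discrete symbols; the speech version of the same manipulation needs segment boundaries in the signal, which is exactly what annotation supplies.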
Manipulating text vs. speech [3]
The weather forecast for tomorrow: rather cloudy in the morning with a few sunny spells in the afternoon.
Speech file manipulation:
original recording, not manipulated
"consonants-only" version: vowel segments replaced with silence
"vowels-only" version: consonant segments replaced with silence
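For the speech file, the equivalent operation works on time-aligned segment labels rather than on letters. The sketch below assumes that the waveform is already available as a list of samples and that each segment is given as a hypothetical `(start_sample, end_sample, label)` triple; reading and writing audio files is left out, and the six-sample signal is a toy example. Silencing keeps the signal length unchanged, so all durations survive.

```python
from typing import List, Set, Tuple

# One labelled segment: (start_sample, end_sample, label), end exclusive
Segment = Tuple[int, int, str]


def silence_segments(samples: List[float], segments: List[Segment],
                     remove: Set[str]) -> List[float]:
    """Copy `samples`, setting every segment whose label is in `remove`
    to zero (silence). The signal keeps its original length and timing."""
    out = list(samples)
    for start, end, label in segments:
        if label in remove:
            out[start:end] = [0.0] * (end - start)
    return out


# Toy example: six samples and three hypothetical segments
VOWELS = {"a:"}
signal = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
segments = [(0, 2, "h"), (2, 4, "a:"), (4, 6, "b")]
consonant_labels = {label for _, _, label in segments} - VOWELS

consonants_only = silence_segments(signal, segments, remove=VOWELS)
vowels_only = silence_segments(signal, segments, remove=consonant_labels)
print(consonants_only)  # [0.1, 0.2, 0.0, 0.0, 0.5, 0.6]
print(vowels_only)      # [0.0, 0.0, 0.3, 0.4, 0.0, 0.0]
```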
Coarticulation: articulating means that
articulators are in motion, not in a fixed position
articulators move continuously, not discretely
articulatory movements overlap in time
Audio examples: original, vowels-only, vowels-only without silences
Timing information of consonant durations: silence is more than nothing
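The difference between the "vowels-only" version and the version without silences can be made concrete by splicing the vowel segments together instead of silencing everything else. A minimal sketch, assuming a 16 kHz sampling rate and a hypothetical segmentation of one second of speech; the placeholder ramp signal only makes it visible which samples were kept.

```python
from typing import List, Set, Tuple

Segment = Tuple[int, int, str]  # (start_sample, end_sample, label), end exclusive
RATE = 16000  # assumed sampling rate in Hz


def splice_segments(samples: List[float], segments: List[Segment],
                    keep: Set[str]) -> List[float]:
    """Concatenate only the segments whose label is in `keep`. Unlike
    silencing, this drops the time the other segments occupied."""
    out: List[float] = []
    for start, end, label in segments:
        if label in keep:
            out.extend(samples[start:end])
    return out


# Hypothetical segmentation of one second of speech (sample indices)
segments = [(0, 3200, "h"), (3200, 6400, "a:"), (6400, 9600, "b"),
            (9600, 12800, "m"), (12800, 16000, "aI")]
signal = [float(i) for i in range(16000)]  # placeholder ramp instead of real audio
vowels_no_silence = splice_segments(signal, segments, keep={"a:", "aI"})

print(len(signal) / RATE)             # 1.0 s with silences (timing preserved)
print(len(vowels_no_silence) / RATE)  # 0.4 s without silences
```

Splicing shrinks the stretch from 1.0 s to 0.4 s here. The 0.6 s that disappears is exactly the consonant timing information that the silenced version keeps.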
Speech melody: information about fundamental frequency (F0) in the voiced vowel segments
with F0 variation
without any F0 variation (monotonous)
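A monotonous version can be derived from an F0 contour by replacing every voiced value with one constant. The sketch assumes a frame-wise contour in Hz in which 0 marks unvoiced frames (a common pitch-tracker convention, but an assumption here) and uses the mean voiced F0 as the constant. Only the contour is manipulated; resynthesising audio from the flattened contour, for example by PSOLA, is not shown. The contour values are made up.

```python
from statistics import mean
from typing import List


def monotonise(f0: List[float]) -> List[float]:
    """Replace every voiced F0 value (> 0 Hz) by the mean of all voiced
    values; unvoiced frames (0 Hz) are left unchanged."""
    voiced = [v for v in f0 if v > 0]
    if not voiced:
        return list(f0)  # nothing to flatten
    level = mean(voiced)
    return [level if v > 0 else v for v in f0]


contour = [0.0, 180.0, 220.0, 0.0, 200.0]  # hypothetical F0 per frame, in Hz
print(monotonise(contour))  # [0.0, 200.0, 200.0, 0.0, 200.0]
```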
Annotation of sound segments: discreteness in mind & in physics
"Es ist 8 Uhr morgens."
Tiers for "morgens":
graphemes: m o r g e n s
phonemes: m O r g @ n s
phones: m O6 N s
Annotation of sound segments: discrete units?
"Die Nacht haben Maiers gut geschlafen."
"… haben Maier …"
phonemic: h a: b @ n m aI @ r s
acoustic-phonetic: h a: b m aI 6 s
articulatory-phonetic (possibly): h a: b n m aI 6 s
Segmentation of sound segments: degree of discreteness
"Wer möchte noch Milch?"
clear segmentation: closure and closure release of [t] in "möch-t-e"
unclear segmentation: [I l] in "M-il-ch"
Kiel Corpus: read & spontaneous speech
Manually labelled: orthography, phonemic (canonical) form, realised form, word and sentence boundaries
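Where a corpus stores both a canonical and a realised form, the differences between them can be listed with a generic sequence alignment. The sketch below uses Python's standard `difflib.SequenceMatcher` on the phonemic and acoustic-phonetic transcriptions given for "Die Nacht haben Maiers gut geschlafen."; it is an illustration, not the procedure used to label the Kiel Corpus, and the function name `compare_forms` is our own.

```python
from difflib import SequenceMatcher
from typing import List, Tuple


def compare_forms(canonical: str, realised: str) -> List[Tuple[str, List[str], List[str]]]:
    """List segment-level differences between a canonical (phonemic) and a
    realised transcription. Symbols are separated by spaces."""
    a, b = canonical.split(), realised.split()
    diffs = []
    for tag, i1, i2, j1, j2 in SequenceMatcher(None, a, b).get_opcodes():
        if tag != "equal":  # keep only deletions, insertions and replacements
            diffs.append((tag, a[i1:i2], b[j1:j2]))
    return diffs


for diff in compare_forms("h a: b @ n m aI @ r s", "h a: b m aI 6 s"):
    print(diff)
# ('delete', ['@', 'n'], [])
# ('replace', ['@', 'r'], ['6'])
```

The alignment reports the elided "@ n" of "haben" and the vocalised "@ r" of "Maiers", but it cannot capture the graded, overlapping nature of these changes; it only reflects the symbols someone has already chosen.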
From sounds to syllables: how many syllables?
Semi-vowels: syllabic or not?
Studie: Stu - di - e vs. Stu - die
Piano: Pi - a - no vs. Pia - no
Size of the auditory window: "… mit mir diese Dienstreise zu unternehmen, …"
rei - se - zu - un - ter
zu - un - ter
zu - un
From sounds to syllables: where is the syllable boundary?
Ambisyllabic consonants & onset principles:
Mitte /m I - t @/ vs. /m I _t @/
Adler /a: t - l @ r/ vs. /a: - d l @ r/
Fenster /f E n s - t E r/ vs. /f E n - s t E r/
Resyllabification: "Wenn es Ihnen da 5 Tage lang irgendwo passen würde."
/v E n - E s/ vs. [v E _ n E s]
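One way to make an onset principle explicit is the maximal onset principle: each consonant cluster between two vowels gives the following syllable the longest onset the language allows. The sketch assumes phone lists in the SAMPA-like notation used above and two hypothetical sets of legal onsets, one containing /s t/ and one without it. It shows how that single decision moves the boundary in "Fenster". It cannot express ambisyllabic consonants (as in "Mitte") or resyllabification across word boundaries.

```python
from typing import List, Sequence, Set, Tuple


def syllabify(phones: Sequence[str], vowels: Set[str],
              onsets: Set[Tuple[str, ...]]) -> List[List[str]]:
    """Split a phone sequence into syllables with the maximal onset principle:
    each intervocalic consonant cluster gives the following syllable the
    longest onset found in `onsets`; the rest becomes the preceding coda."""
    nuclei = [i for i, p in enumerate(phones) if p in vowels]
    if len(nuclei) < 2:
        return [list(phones)]
    starts = [0]
    for prev, nxt in zip(nuclei, nuclei[1:]):
        boundary = nxt  # fallback: empty onset, whole cluster goes to the coda
        for b in range(prev + 1, nxt):  # try the longest candidate onset first
            if tuple(phones[b:nxt]) in onsets:
                boundary = b
                break
        starts.append(boundary)
    ends = starts[1:] + [len(phones)]
    return [list(phones[s:e]) for s, e in zip(starts, ends)]


VOWELS = {"E", "I", "@", "a:"}
fenster = ["f", "E", "n", "s", "t", "E", "r"]
print(syllabify(fenster, VOWELS, {("t",), ("s", "t")}))  # hypothetical onset set A
print(syllabify(fenster, VOWELS, {("t",)}))              # hypothetical onset set B
```

With /s t/ accepted as an onset the sketch yields /f E n - s t E r/; without it, /f E n s - t E r/. These are the two alternatives listed for "Fenster".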
Controlled elicitation of spontaneous speech
Monologues: narration (Erzählung), picture description (Bildbeschreibung)
Dialogues: task-oriented data collection (Map Task, appointment-making)
Degree of naturalness under controlled elicitation?
Problems for annotation: non-speech in speech
Many non-linguistic signal portions: swallowing, lip-smacking, breathing, unfilled and filled pauses, laughter, hesitational lengthening
These portions partly overlap with speech
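When non-speech events are labelled on their own tier, the stretches where they overlap speech can be found by simple interval intersection. A minimal sketch with a hypothetical word tier and hypothetical event labels such as "<laughter>"; times are in seconds.

```python
from typing import List, Tuple

Interval = Tuple[float, float, str]  # (start in s, end in s, label)


def overlaps(non_speech: List[Interval],
             words: List[Interval]) -> List[Tuple[str, str, float]]:
    """Return (event label, word, overlap in s) for every pair of a
    non-speech event and a word that overlap in time."""
    found = []
    for ns_start, ns_end, ns_label in non_speech:
        for w_start, w_end, word in words:
            overlap = min(ns_end, w_end) - max(ns_start, w_start)
            if overlap > 0:
                found.append((ns_label, word, round(overlap, 3)))
    return found


words = [(0.00, 0.40, "ja"), (0.40, 0.90, "genau")]                 # hypothetical
events = [(0.70, 1.20, "<laughter>"), (1.30, 1.60, "<breathing>")]  # hypothetical
print(overlaps(events, words))  # [('<laughter>', 'genau', 0.2)]
```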
Functions of prosody
Generally: features above the segmental level (suprasegmental)
Phonetic encoding of prosody: perceived pitch over time, duration, intensity, spectral quality
Prosodic annotation: Signal oriented
Tilt model (Taylor 2000): intonational "events" described by continuous parameters (tilt parameters):
amplitude: sum of the magnitudes of the rise and the fall
duration: sum of the rise and fall durations
tilt: shape of the event
[Figure: event shapes for tilt values 1.0, 0.5 and 0]
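The tilt parameters can be computed from the rise and fall of an event. The sketch follows the definitions in Taylor (2000) as we understand them. Amplitude and duration are the sums given above. Tilt averages the relative amplitude difference (|A_rise| - |A_fall|) / (|A_rise| + |A_fall|) and the relative duration difference (D_rise - D_fall) / (D_rise + D_fall), so that 1.0 is a pure rise, -1.0 a pure fall and 0 an equal rise and fall. The units (Hz, seconds), the example values and the function name are assumptions.

```python
from typing import Dict


def tilt_parameters(a_rise: float, a_fall: float,
                    d_rise: float, d_fall: float) -> Dict[str, float]:
    """Tilt description of one intonational event from its rise/fall
    amplitudes (e.g. Hz) and rise/fall durations (e.g. s)."""
    amp_sum = abs(a_rise) + abs(a_fall)
    dur_sum = d_rise + d_fall
    # relative differences; guard against events with no rise and no fall
    tilt_amp = (abs(a_rise) - abs(a_fall)) / amp_sum if amp_sum else 0.0
    tilt_dur = (d_rise - d_fall) / dur_sum if dur_sum else 0.0
    return {
        "amplitude": amp_sum,
        "duration": dur_sum,
        "tilt": (tilt_amp + tilt_dur) / 2,
    }


# hypothetical events: pure rise, rise dominating a fall, symmetric rise-fall
for shape in [(40, 0, 0.20, 0.0), (30, 10, 0.15, 0.05), (30, 30, 0.10, 0.10)]:
    print(shape, round(tilt_parameters(*shape)["tilt"], 3))  # 1.0, 0.5, 0.0
```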
Prosodic annotation: Autosegmental, phonological
GToBI (Grice et al.): tonal tier, break tier
Two levels of pitch height (L, H)
Simple and complex pitch accents
Association with word stress marked by *
Exact temporal alignment
Boundary tones marked by %
Strength of prosodic breaks (3, 4)
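The diacritics listed above are enough to sort tone labels into coarse classes. A minimal sketch, assuming labels are plain strings as in the GToBI label file example: "*" marks a pitch accent, "+" a complex accent and "%" a boundary tone. Treating a final "-" as a phrase accent is our assumption, since the slide does not describe it. The label "<" is left as "other", and "L-%" is classed as a boundary tone even though it also contains a "-".

```python
def tone_type(label: str) -> str:
    """Coarse class of a GToBI tone label, read off its diacritics."""
    if "%" in label:
        return "boundary tone"
    if "*" in label:
        # a "+" joins two tones into a complex (bitonal) accent
        return "complex pitch accent" if "+" in label else "simple pitch accent"
    if label.endswith("-"):
        return "phrase accent"  # assumption: "-" is not described on the slide
    return "other"


for label in ["L+H*", "H-", "H*", "<", "!H*", "L-%"]:
    print(label, "->", tone_type(label))
```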
Prosodic annotation: Example
[Figure: tonal, orthographic, break and miscellaneous tiers]
GToBI label files
Orthographic tier:
46.836392 113 also
46.958899 113 ich
47.171623 113 bin
47.555335 113 genau
48.180049 113 waagerecht
48.468170 113 rechts
48.613576 113 von
48.726670 113 der
49.246344 113 Goldmine
Tone tier:
47.469173 115 L+H*
47.555339 115 H-
47.768061 115 H*
47.851534 115 <
48.320061 115 !H*
48.812822 115 !H*
49.240958 115 L-%
Break tier:
47.555339 123 3
49.249036 123 4
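Because the tiers are separate time-stamped text files, relating them is a matter of comparing time marks. The sketch below parses lines of the form "time code label". It ignores the middle number, which we take to be a display colour code as in xwaves-style label files, and attaches each break index to the word whose time mark is nearest. The 10 ms tolerance and the function names are hypothetical.

```python
from typing import List, Tuple

Label = Tuple[float, str]  # (time in seconds, label text)


def parse_label_lines(text: str) -> List[Label]:
    """Parse lines "<time> <code> <label>"; the numeric code is ignored."""
    entries = []
    for line in text.strip().splitlines():
        time, _code, label = line.split(maxsplit=2)
        entries.append((float(time), label))
    return entries


def words_before_breaks(words: List[Label], breaks: List[Label],
                        tolerance: float = 0.01) -> List[Tuple[str, int]]:
    """Attach each break index to the word with the nearest time mark,
    provided both marks lie within `tolerance` seconds of each other."""
    result = []
    for b_time, b_label in breaks:
        w_time, word = min(words, key=lambda w: abs(w[0] - b_time))
        if abs(w_time - b_time) <= tolerance:
            result.append((word, int(b_label)))
    return result


ORTH = """46.836392 113 also
46.958899 113 ich
47.171623 113 bin
47.555335 113 genau
48.180049 113 waagerecht
48.468170 113 rechts
48.613576 113 von
48.726670 113 der
49.246344 113 Goldmine"""
BREAKS = """47.555339 123 3
49.249036 123 4"""

print(words_before_breaks(parse_label_lines(ORTH), parse_label_lines(BREAKS)))
# [('genau', 3), ('Goldmine', 4)]
```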
Prosodic annotation: Phonological, single-layer
KIM (Kohler 1995): no suprasegmental tiers, which allows efficient analysis of segment-prosody interaction
Prosodic labels are differentiated from segmental labels by special diacritics
Time marks for prosodic events are anchored to word boundaries
Example:
13 #c: 0.0007500
13 #&2 0.0007500
13 ##v: 0.0007500
13 $Q- 0.0007500
13 $E: 0.0007500
2147 $m 0.1341250
4787 #&PGn 0.2991250
4787 #&2( 0.2991250
4787 ##d 0.2991250
6243 $-h 0.3901250
6619 $'i: 0.4136250
7569 $n 0.4730000
8265 $s 0.5165000
9202 $t 0.5750625
9527 $-h 0.5953750
9995 $a: 0.6246250
10648 $k-x 0.6654375
11405 #&0 0.7127500
11405 ##v 0.7127500
12528 $Y6 0.7829375
13946 $d 0.8715625
14275 $@+ 0.8921250
14721 #&0 0.9200000
14721 ##m 0.9200000
16051 $i:6+ 1.0031250
16935 #&0 1.0583750
16935 ##g 1.0583750
18093 $-h 1.1307500
18564 $'u: 1.1601875
19314 $t 1.2070625
19981 $-h 1.2487500
20336 #&0. 1.2709375
20336 #&2) 1.2709375
20336 ##p 1.2709375
21501 $-h 1.3437500
22440 $'a 1.4024375
23700 $s 1.4811875
25408 $@- 1.5879375
25408 $n 1.5879375
28935 #, 1.8083750
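In the single-layer KIM format, prosodic and segmental labels share one tier and are told apart by their leading characters. The sketch splits a stream of entries into (sample, label, seconds) triples and classifies labels by prefix. Reading "##" as a word boundary, "#&" as a prosodic label and "$" as a segment label is our interpretation of the example. The 16 kHz sampling rate, with samples counted from 1, is inferred from the numbers and not stated on the slide.

```python
from typing import List, Tuple

SAMPLE_RATE = 16000  # inferred from the example, not stated on the slide


def parse_kim(text: str) -> List[Tuple[int, str, float]]:
    """Split a whitespace-separated stream of "<sample> <label> <seconds>" triples."""
    tokens = text.split()
    return [(int(tokens[i]), tokens[i + 1], float(tokens[i + 2]))
            for i in range(0, len(tokens), 3)]


def label_type(label: str) -> str:
    """Classify a label by its leading characters (our reading of the example)."""
    if label.startswith("##"):
        return "word"
    if label.startswith("#&"):
        return "prosodic"
    if label.startswith("$"):
        return "segment"
    return "other"


EXAMPLE = ("13 #c: 0.0007500 13 #&2 0.0007500 13 ##v: 0.0007500 "
           "13 $Q- 0.0007500 13 $E: 0.0007500 2147 $m 0.1341250 "
           "4787 #&PGn 0.2991250 4787 #&2( 0.2991250 4787 ##d 0.2991250")

entries = parse_kim(EXAMPLE)
for sample, label, seconds in entries:
    # check the inferred relation t = (sample - 1) / 16000 for every entry
    assert abs((sample - 1) / SAMPLE_RATE - seconds) < 1e-6
print([lab for _, lab, _ in entries if label_type(lab) == "prosodic"])
# ['#&2', '#&PGn', '#&2(']
```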
Data structures and retrieval
Mostly plain text files, time-aligned to the signal
"Retrieval" using scripting languages (GToBI in EMU format)
XML formats
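As an illustration of an XML representation, one time-aligned tier can be serialised and read back with Python's standard `xml.etree.ElementTree`. The element and attribute names (`tier`, `label`, `time`) are hypothetical and do not follow EMU or any other published format. The tone labels are taken from the GToBI example.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple


def tier_to_xml(name: str, labels: List[Tuple[float, str]]) -> str:
    """Serialise one time-aligned label tier as a small XML document."""
    tier = ET.Element("tier", name=name)
    for time, text in labels:
        ET.SubElement(tier, "label", time=f"{time:.6f}").text = text
    return ET.tostring(tier, encoding="unicode")


def xml_to_tier(xml_text: str) -> List[Tuple[float, str]]:
    """Read the tier back into (time, label) pairs."""
    root = ET.fromstring(xml_text)
    return [(float(el.get("time")), el.text) for el in root.iter("label")]


tones = [(47.469173, "L+H*"), (47.555339, "H-"), (47.768061, "H*")]
xml_text = tier_to_xml("tones", tones)
print(xml_text)
# simple retrieval: pitch accents only
print([t for _, t in xml_to_tier(xml_text) if t.endswith("*")])  # ['L+H*', 'H*']
```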
What for?
Basic research:
rhythmic patterns
speech rate measurements (units, domains)
temporal alignment & scaling of pitch accents
differentiated analysis of pitch range
Speech technology:
modelling accentuation in ASR
speech rate in ASR
intonation and timing for synthesis
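Speech rate measurements depend on the unit counted (e.g. syllables) and on the domain over which time is measured. The sketch below shows one common distinction and assumes that the unit count and pause durations have already been extracted from the annotation. Speaking rate divides by the whole stretch including pauses, and articulation rate divides by the time without pauses. The numbers are hypothetical.

```python
from typing import Tuple


def speech_rates(n_units: int, total_s: float, pause_s: float = 0.0) -> Tuple[float, float]:
    """Return (speaking rate, articulation rate) in units per second.
    Speaking rate uses the whole stretch, pauses included;
    articulation rate uses only the time spent articulating."""
    if total_s <= 0 or total_s - pause_s <= 0:
        raise ValueError("durations must leave positive speaking time")
    return n_units / total_s, n_units / (total_s - pause_s)


# hypothetical stretch: 10 syllables in 2.5 s, including a 0.5 s silent pause
print(speech_rates(10, 2.5, 0.5))  # (4.0, 5.0)
```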
Bibliography
Alwan, A., H. Bourlard and S. Furui (eds). 2001. Speech Communication 33. Special Issue on Speech Annotation and Corpus Tools.
Grice, M., S. Baumann and R. Benzmüller. To appear. German ToBI. In: S. Jun (ed.), Prosodic Typology.
Grice, M. et al. 2000. Representation and annotation of dialogue. In: Handbook of Multimodal and Spoken Dialogue Systems: Resources, Terminology and Product Evaluation. Kluwer, pp. 1-101.
Kohler, K.J. (ed.). 1995. Kieler Arbeitsberichte 29.
Taylor, P. 2000. Analysis and synthesis of intonation using the Tilt model. JASA 107(3), pp. 1697-1714.