Annotation of speech from the phonetics/phonology perspective Bettina Braun & Jürgen Trouvain 15.02.2002 Fachrichtung 4.7, Institut für Phonetik


Annotation of speech from the phonetics/phonology perspective

Bettina Braun & Jürgen Trouvain

15.02.2002

Fachrichtung 4.7, Institut für Phonetik


Annotation of speech 2

Manipulating text vs. speech [1]

Text file manipulation, "vowels-only" version: remove all consonant letters and replace each with a space, so that only the vowels are left.

e ea e o e a o o o o : a e ou y i e o i i a e u y e i e a e oo .


Annotation of speech 3

Manipulating text vs. speech [2]

Text file manipulation, "consonants-only" version: remove all vowel letters and replace each with a space, so that only the consonants are left.

Th w th r f r c st f r t m rr w: r th r cl d n th m rn ng w th f w s nn sp lls n th ft rn n.
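The two text manipulations can be sketched in a few lines of Python. This is only an illustration: the function name `keep_only` is made up here, and treating "y" as a vowel letter is an assumption read off the vowels-only output above.

```python
# Assumption: "y" counts as a vowel letter, as the vowels-only slide suggests.
VOWEL_LETTERS = set("aeiouyAEIOUY")

def keep_only(text, keep_vowels):
    """Replace every letter of the unwanted class with a space.

    Non-letters (spaces, punctuation) are always kept.
    """
    out = []
    for ch in text:
        if ch.isalpha() and ((ch in VOWEL_LETTERS) != keep_vowels):
            out.append(" ")
        else:
            out.append(ch)
    return "".join(out)

sentence = ("The weather forecast for tomorrow: rather cloudy in the morning "
            "with a few sunny spells in the afternoon.")
vowels_only = keep_only(sentence, keep_vowels=True)
consonants_only = keep_only(sentence, keep_vowels=False)

# Collapse runs of spaces for display, as on the slides.
print(" ".join(vowels_only.split()))
print(" ".join(consonants_only.split()))
```

Both results have the same length as the input, because each removed letter leaves a space behind.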


Annotation of speech 4

Manipulating text vs. speech [3]

The weather forecast for tomorrow: rather cloudy in the morning with a few sunny spells in the afternoon.

Speech file manipulation:

original recording, not manipulated

"consonants-only" version: vowel segments replaced with silence

"vowels-only" version: consonant segments replaced with silence
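Given a segment-level annotation, the speech manipulation amounts to zeroing the samples of the unwanted segments. A minimal sketch, assuming segments are stored as (start index, end index, label) triples with an exclusive end; the toy signal, the labels, and the function name `silence_segments` are invented for illustration.

```python
def silence_segments(samples, segments, labels_to_silence):
    """Return a copy of `samples` in which every segment whose label is in
    `labels_to_silence` is overwritten with zeros.

    `segments` holds (start_index, end_index, label) triples; end is exclusive.
    """
    out = list(samples)
    for start, end, label in segments:
        if label in labels_to_silence:
            out[start:end] = [0] * (end - start)
    return out

# Toy signal: 12 samples covering three annotated segments.
samples = [3, -2, 4, 5, 7, -6, 8, -7, 2, 1, -1, 2]
segments = [(0, 3, "m"), (3, 8, "I"), (8, 12, "l")]

consonants_only = silence_segments(samples, segments, {"I"})
vowels_only = silence_segments(samples, segments, {"m", "l"})
print(consonants_only)
print(vowels_only)
```

The overall duration is unchanged in both versions, which matters for the timing point made on the following slides.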


Annotation of speech 5

Coarticulation

Articulating means:

articulators are in motion, not in fixed positions

articulators move continuously, not discretely

articulatory movements overlap in time


Annotation of speech 6

[Figure panels: original; vowels only; vowels only, without silences]


Annotation of speech 7

Timing

Information from consonant durations: silence is more than nothing
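The difference between the vowels-only versions with and without silences can be made concrete: cutting the consonant segments out, instead of silencing them, shortens the signal and destroys the timing that the silences still carry. A sketch under invented assumptions (a 16 kHz dummy signal, generic "C"/"V" labels, a hypothetical function `cut_segments`):

```python
def cut_segments(samples, segments, labels_to_cut):
    """Concatenate only the segments whose label is NOT in `labels_to_cut`.

    `segments` holds (start_index, end_index, label) triples; end is exclusive.
    """
    out = []
    for start, end, label in segments:
        if label not in labels_to_cut:
            out.extend(samples[start:end])
    return out

SAMPLE_RATE = 16000  # Hz, assumed for this example

samples = [0.1] * 48000  # 3 s of dummy signal
segments = [(0, 8000, "C"), (8000, 24000, "V"),
            (24000, 40000, "C"), (40000, 48000, "V")]

vowels_without_silences = cut_segments(samples, segments, {"C"})
original_duration = len(samples) / SAMPLE_RATE
cut_duration = len(vowels_without_silences) / SAMPLE_RATE
print(original_duration, cut_duration)
```

Here half of the signal is consonantal, so the cut version lasts half as long as the original.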


Annotation of speech 8

Speech melody

Information about fundamental frequency (F0) in the voiced vowel segments:

with F0 variation

without any F0 variation (monotonous)
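The monotonous version corresponds to a flat F0 contour. The sketch below only flattens an F0 track (one value per frame, 0 meaning unvoiced); it does not resynthesise audio, and the slides do not say which tool produced the demo. The frame values and the function name `flatten_f0` are made up.

```python
def flatten_f0(f0_track):
    """Replace every voiced F0 value by the mean F0 of all voiced frames.

    Unvoiced frames are marked by 0 and stay 0.
    """
    voiced = [f for f in f0_track if f > 0]
    if not voiced:
        return list(f0_track)
    mean_f0 = sum(voiced) / len(voiced)
    return [mean_f0 if f > 0 else 0.0 for f in f0_track]

f0 = [0.0, 110.0, 120.0, 130.0, 0.0, 140.0, 100.0]  # Hz per frame, invented
flat = flatten_f0(f0)
print(flat)
```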


Annotation of speech 9

Annotation of sound segments: discreteness in mind & in physics

"Es ist 8 Uhr morgens."

graphemes   m   o   r   g   e   n   s
phonemes    m   O   r   g   @   n   s
phones      m   O6      N           s


Annotation of speech 10

Annotation of sound segments: discrete units?

"Die Nacht haben Maiers gut geschlafen."

"…………… haben Maier ……………………."

phonemic:               h a: b @ n m aI @ r s
acoustic-phonetic:      h a: b m aI 6 s
articulatory-phonetic:  h a: b n m aI 6 s (possibly)
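The gap between the phonemic and the realised forms can be listed automatically by aligning the two symbol sequences. The sketch uses Python's standard `difflib.SequenceMatcher`; the function name `reductions` is invented, and a generic sequence alignment is a stand-in for, not a description of, any labelling tool.

```python
from difflib import SequenceMatcher

def reductions(canonical, realised):
    """List the non-matching stretches between two space-separated
    SAMPA strings as (operation, canonical_symbols, realised_symbols)."""
    a, b = canonical.split(), realised.split()
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    return [(tag, a[i1:i2], b[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()
            if tag != "equal"]

diff = reductions("h a: b @ n m aI @ r s", "h a: b m aI 6 s")
for op in diff:
    print(op)
```

For the acoustic-phonetic form this reports the deletion of "@ n" and the replacement of "@ r" by "6".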


Annotation of speech 11

Segmentation of sound segments: degree of discreteness

"Wer möchte noch Milch?"

clear segmentation: closure and closure release of [t] in "möchte"

unclear segmentation: [I l] in "Milch"


Annotation of speech 12

Kiel Corpus

read & spontaneous speech

orthography

phonemic (canonical) form

realised form

word & sentence boundaries

manually labelled


Annotation of speech 13

From sounds to syllables: how many syllables?

semi-vowels: syllabic or not?

Studie: Stu - di - e vs. Stu - die

Piano: Pi - a - no vs. Pia - no

size of auditory window

"… mit mir diese Dienstreise zu unternehmen, …"

rei - se - zu - un - ter

zu - un - ter

zu - un


Annotation of speech 14

From sounds to syllables: where is the syllable boundary?

ambisyllabic consonants & onset principles

Mitte /m I - t @/ vs. /m I _t @/

Adler /a: t - l @ r/ vs. /a: - d l @ r/

Fenster /f E n s - t E r/ vs. /f E n - s t E r/

resyllabification

"Wenn es Ihnen da 5 Tage lang irgendwo passen würde."

/v E n - E s/ vs. [v E _ n E s]


Annotation of speech 15

Controlled elicitation of spontaneous speech

Monologues: storytelling, picture description

Dialogues: task-oriented data collection (Map Task, appointment-making)

Degree of naturalness?

Controlled elicitation


Annotation of speech 16

Controlled elicitation of spontaneous speech


Annotation of speech 17

Problems for annotation: non-speech in speech

Many non-linguistic signal portions:

swallowing

lip-smacking

breathing

unfilled and filled pauses

laughter

hesitational lengthening

These partly overlap with speech.


Annotation of speech 18

Functions of prosody

Generally: features above the segmental level (suprasegmental)


Annotation of speech 19

Phonetic encoding of prosody

perceived pitch over time

duration

intensity

spectral quality


Annotation of speech 20

Prosodic annotation: Signal oriented

Tilt model (Taylor 2000)

intonational "events"

continuous parameters (tilt parameters):

amplitude: sum of the magnitudes of rise and fall

duration: sum of rise and fall durations

tilt: shape of the event

[Figure: event shapes for tilt values 1.0, 0.5, 0]
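The three parameters can be computed from the rise and fall parts of an event. The sketch follows the definitions commonly given for the Tilt model, where tilt is the average of an amplitude ratio and a duration ratio; the function name, argument names, and units (Hz, seconds) are choices made for this example.

```python
def tilt_parameters(a_rise, a_fall, d_rise, d_fall):
    """Return (amplitude, duration, tilt) of one intonational event.

    a_rise, a_fall: F0 excursion of the rise and fall parts (sign ignored)
    d_rise, d_fall: duration of the rise and fall parts
    """
    amplitude = abs(a_rise) + abs(a_fall)
    duration = d_rise + d_fall
    # Each ratio runs from -1 (pure fall) to +1 (pure rise).
    tilt_amp = (abs(a_rise) - abs(a_fall)) / amplitude if amplitude else 0.0
    tilt_dur = (d_rise - d_fall) / duration if duration else 0.0
    tilt = 0.5 * tilt_amp + 0.5 * tilt_dur
    return amplitude, duration, tilt

pure_rise = tilt_parameters(20.0, 0.0, 0.25, 0.0)
rise_fall = tilt_parameters(20.0, -20.0, 0.125, 0.125)
mostly_rise = tilt_parameters(30.0, -10.0, 0.75, 0.25)
print(pure_rise, rise_fall, mostly_rise)
```

A pure rise yields a tilt of 1.0 and a symmetric rise-fall a tilt of 0, in line with the values listed above; a pure fall would yield -1.0.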


Annotation of speech 21

Prosodic annotation: Autosegmental, phonological

GToBI (Grice et al.)

Tonal tier, break tier

Two levels of pitch height (L, H)

Simple and complex pitch accents

Association to word stress marked by *

Exact temporal alignment

Boundary tones marked by %

Strength of prosodic breaks (3, 4)


Annotation of speech 22

Prosodic annotation: Example

[Figure tiers: tonal, orth., break, misc]


Annotation of speech 23

GToBI Labelfiles

orthographic tier:

46.836392 113 also
46.958899 113 ich
47.171623 113 bin
47.555335 113 genau
48.180049 113 waagerecht
48.468170 113 rechts
48.613576 113 von
48.726670 113 der
49.246344 113 Goldmine

tones tier:

47.469173 115 L+H*
47.555339 115 H-
47.768061 115 H*
47.851534 115 <
48.320061 115 !H*
48.812822 115 !H*
49.240958 115 L-%

breaks tier:

47.555339 123 3
49.249036 123 4
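Label files of this kind are easy to query with a short script. The sketch below parses two of the tiers and pairs each pitch accent (a label containing `*`) with a word. Two assumptions are built in: the middle field is a per-tier code that can be ignored, and each word's time stamp marks its right edge. Neither is stated on the slide, and the function names are invented.

```python
from bisect import bisect_left

def parse_tier(text):
    """Parse whitespace-separated 'time code label' triples into
    (time, label) pairs. Assumes labels contain no whitespace."""
    fields = text.split()
    return [(float(fields[i]), fields[i + 2])
            for i in range(0, len(fields), 3)]

WORDS = parse_tier("""
46.836392 113 also   46.958899 113 ich        47.171623 113 bin
47.555335 113 genau  48.180049 113 waagerecht 48.468170 113 rechts
48.613576 113 von    48.726670 113 der        49.246344 113 Goldmine
""")
TONES = parse_tier("""
47.469173 115 L+H* 47.555339 115 H-  47.768061 115 H*  47.851534 115 <
48.320061 115 !H*  48.812822 115 !H* 49.240958 115 L-%
""")

def accented_words(words, tones):
    """Pair each pitch accent with the first word ending at or after it."""
    ends = [time for time, _ in words]  # assumed: word end times, ascending
    pairs = []
    for time, tone in tones:
        if "*" in tone:
            index = bisect_left(ends, time)
            if index < len(words):
                pairs.append((words[index][1], tone))
    return pairs

accents = accented_words(WORDS, TONES)
print(accents)
```

Under these assumptions the accents fall on "genau", "waagerecht", "rechts" and "Goldmine".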


Annotation of speech 24

Prosodic annotation: Phonological, single-layer

KIM (Kohler 1995)

no suprasegmental tiers => efficient analysis of segment-prosody interaction

prosodic labels differentiated from segmental labels by special diacritics

time marks for prosodic events anchored to word boundaries

Example:


Annotation of speech 25

13 #c: 0.0007500
13 #&2 0.0007500
13 ##v: 0.0007500
13 $Q- 0.0007500
13 $E: 0.0007500
2147 $m 0.1341250
4787 #&PGn 0.2991250
4787 #&2( 0.2991250
4787 ##d 0.2991250
6243 $-h 0.3901250
6619 $'i: 0.4136250
7569 $n 0.4730000
8265 $s 0.5165000
9202 $t 0.5750625
9527 $-h 0.5953750
9995 $a: 0.6246250
10648 $k-x 0.6654375
11405 #&0 0.7127500
11405 ##v 0.7127500
12528 $Y6 0.7829375
13946 $d 0.8715625
14275 $@+ 0.8921250
14721 #&0 0.9200000
14721 ##m 0.9200000
16051 $i:6+ 1.0031250
16935 #&0 1.0583750
16935 ##g 1.0583750
18093 $-h 1.1307500
18564 $'u: 1.1601875
19314 $t 1.2070625
19981 $-h 1.2487500
20336 #&0. 1.2709375
20336 #&2) 1.2709375
20336 ##p 1.2709375
21501 $-h 1.3437500
22440 $'a 1.4024375
23700 $s 1.4811875
25408 $@- 1.5879375
25408 $n 1.5879375
28935 #, 1.8083750
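Because prosodic and segmental labels share one layer, a script can separate them by prefix alone. The sketch reads the first entries of the listing as (sample, label, time) triples. Treating `#&` as the prosodic marker and `##`/`$` as segmental markers is a reading of this excerpt rather than a statement of the label conventions, and the 16 kHz sampling rate is inferred from the numbers.

```python
def parse_labels(text):
    """Parse whitespace-separated 'sample label time' triples."""
    fields = text.split()
    return [(int(fields[i]), fields[i + 1], float(fields[i + 2]))
            for i in range(0, len(fields), 3)]

EXCERPT = """
13 #c: 0.0007500    13 #&2 0.0007500    13 ##v: 0.0007500
13 $Q- 0.0007500    13 $E: 0.0007500    2147 $m 0.1341250
4787 #&PGn 0.2991250 4787 #&2( 0.2991250 4787 ##d 0.2991250
6243 $-h 0.3901250  6619 $'i: 0.4136250
"""
entries = parse_labels(EXCERPT)

# Assumed reading of the prefixes (not confirmed by the slides).
prosodic = [label for _, label, _ in entries if label.startswith("#&")]
segmental = [label for _, label, _ in entries
             if label.startswith(("##", "$"))]

# In this excerpt the time column equals (sample - 1) / 16000.
SAMPLE_RATE = 16000  # Hz, inferred from the data
consistent = all(abs((sample - 1) / SAMPLE_RATE - time) < 1e-7
                 for sample, _, time in entries)
print(prosodic, segmental, consistent)
```

Since the prosodic labels carry the same time stamps as the word-initial segments, segment-prosody interaction can be analysed without aligning separate tiers.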



Annotation of speech 30

Data structures and retrieval

Mostly plain text files, aligned to the signal

"Retrieval" using scripting languages

(GToBI in EMU format)

XML formats
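As an illustration of the move from plain text files to XML, the break labels of the GToBI example can be wrapped in XML and read back with Python's standard library. The element and attribute names (`tier`, `label`, `name`, `time`) are hypothetical and do not correspond to EMU or to any particular annotation standard.

```python
import xml.etree.ElementTree as ET

def tier_to_xml(name, entries):
    """Build a <tier> element from (time, label) pairs.

    The schema is invented for this example.
    """
    tier = ET.Element("tier", {"name": name})
    for time, label in entries:
        ET.SubElement(tier, "label", {"time": f"{time:.6f}"}).text = label
    return tier

breaks = tier_to_xml("breaks", [(47.555339, "3"), (49.249036, "4")])
xml_text = ET.tostring(breaks, encoding="unicode")
print(xml_text)

# "Retrieval": read the labels back from the serialised form.
restored = [(float(el.get("time")), el.text)
            for el in ET.fromstring(xml_text).iter("label")]
print(restored)
```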


Annotation of speech 31

What for?

Basic research:

Rhythmic patterns

Speech rate measurements (units, domains)

Temporal alignment & scaling of pitch accents

Differentiated analysis of pitch range

Speech technology:

Modelling accentuation in ASR

Speech rate in ASR

Intonation and timing for synthesis
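Speech rate is one measure that follows directly from a time-aligned annotation. The sketch takes the syllable as the unit and contrasts two domains: the whole utterance including pauses (speech rate) and the utterance without pauses (articulation rate). The numbers and the function name are invented, and other units such as phones or words are equally possible.

```python
def rates(n_syllables, utterance_duration, pause_durations):
    """Return (speech_rate, articulation_rate) in syllables per second.

    speech rate: pauses included in the duration
    articulation rate: pauses excluded
    """
    speaking_time = utterance_duration - sum(pause_durations)
    speech_rate = n_syllables / utterance_duration
    articulation_rate = n_syllables / speaking_time
    return speech_rate, articulation_rate

# Invented example: 10 syllables in 4 s, with pauses of 1.0 s and 0.5 s.
speech_rate, articulation_rate = rates(10, 4.0, [1.0, 0.5])
print(speech_rate, articulation_rate)
```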


Annotation of speech 32

Bibliography

Alwan, A., H. Bourlard and S. Furui (eds). 2001. Speech Communication 33. Special Issue on Speech Annotation and Corpus Tools.

Grice, M., S. Baumann and R. Benzmüller. To appear. German ToBI. In: S. Jun (ed). Prosodic Typology.

Grice, M. et al. 2000. Representation and annotation of dialogue. In: Handbook of Multimodal and Spoken Dialogue Systems. Resources, Terminology and Product Evaluation. Kluwer, pp. 1-101.

Kohler, K.J. (ed). 1995. Kieler Arbeitsberichte 29.

Taylor, P. 2000. Analysis and Synthesis of Intonation Using the Tilt Model. In: JASA 107(3), pp. 1697-1714.