04/19/23 1
Representing Intonational Variation
Julia Hirschberg
CS 4706
Today
• How can we represent meaningful speech variation so that we can compare utterances, or assign it in TTS?
– Expanded vs. compressed pitch range?
– Louder vs. softer speech?
– Faster vs. slower speech?
– Differences in intonational prominence?
– Differences in intonational phrasing?
– Differences in pitch contours?
Language Learning Approaches
• A simpler approach
– / IS it INteresting /
– / d’you feel ANGry? /
– / WHAT’S the PROBlem? / (McCarthy, 1991:106)
• How much variation do we need to capture?
– How detailed?
– Continuous or categorical features?
– If categorical, what are the possible classes?
How Do We Decide?
• Auditory:
– Language teachers: what representations can learners understand
• Acoustic:
– Examine the speech signal for critical vs. accidental variation
• Experimental approaches
– Identify potential meaningful variation
– Design production or perception studies to test
– E.g. what does a contour mean?
Intonation Models
• Superpositional models (Fujisaki 1983, Möbius et al. 1993): acoustic/physiological
• Linear or Tone sequence models
– British school (Kingdon ’58, O’Connor & Arnold ’73, Cruttenden ’97): based on auditory analysis
– American School (Pierrehumbert ’80, ToBI): mainly acoustic analysis
– Dutch school (‘t Hart, Collier and Cohen 1990): perceptual data
Superpositional models
• Pitch pattern of intonation modeled with two components: phrase component and accent component.
• Phrase has basic shape, and pitch movements for individual accents are superimposed over basic shape:
[Figure: phrase component plus accent component = F0 contour of "Apples, oranges and tomatoes"]
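The superposition described above can be sketched in code. This is a minimal illustration in the style of the Fujisaki model, in which phrase and accent components are added in the log-F0 domain. The parameter values (`alpha`, `beta`, `gamma`), the base frequency, and the command timings are illustrative assumptions, not values from the lecture.

```python
import math

def phrase_response(t, alpha=2.0):
    """Impulse response of the phrase component (zero before the command)."""
    return alpha * alpha * t * math.exp(-alpha * t) if t >= 0 else 0.0

def accent_response(t, beta=20.0, gamma=0.9):
    """Step response of the accent component, capped at gamma."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)

def f0_contour(t, base_hz, phrase_cmds, accent_cmds):
    """Superpose phrase and accent components on a base frequency (log domain).

    phrase_cmds: list of (onset_s, amplitude)
    accent_cmds: list of (onset_s, offset_s, amplitude)
    """
    log_f0 = math.log(base_hz)
    for onset, amp in phrase_cmds:
        log_f0 += amp * phrase_response(t - onset)
    for onset, offset, amp in accent_cmds:
        log_f0 += amp * (accent_response(t - onset) - accent_response(t - offset))
    return math.exp(log_f0)

# Hypothetical commands: one phrase with three accents, sampled every 10 ms.
phrase_cmds = [(0.0, 0.5)]
accent_cmds = [(0.2, 0.5, 0.4), (0.8, 1.1, 0.3), (1.5, 1.9, 0.3)]
contour = [f0_contour(i * 0.01, 100.0, phrase_cmds, accent_cmds) for i in range(250)]
```

Setting `accent_cmds` to an empty list leaves only the basic phrase shape, which makes the contribution of each accent easy to see.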
Lily and Rosa thought this was divine. Prince William was gorgeous and he was looking for a bride. They dreamed of wedding bells.
• Declination: downtrend in f0 over the course of an utterance
• Successful in speech synthesis for languages like Japanese (e.g., little variation in accent type)
• Good for modeling utterance-level trends
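One simple way to quantify declination is to fit a straight line to the voiced F0 samples of an utterance and read off its slope. The sketch below assumes a linear trend, which is only an approximation; the function name and the sample values are hypothetical.

```python
def fit_declination(times, f0_values):
    """Least-squares line through F0 samples.

    Returns (slope in Hz per second, intercept in Hz).
    """
    n = len(times)
    mean_t = sum(times) / n
    mean_f = sum(f0_values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (f - mean_f) for t, f in zip(times, f0_values))
    slope = sxy / sxx
    return slope, mean_f - slope * mean_t

# Hypothetical F0 samples (Hz) at the given times (s).
times = [0.0, 0.5, 1.0, 1.5, 2.0]
f0 = [200.0, 190.0, 180.0, 170.0, 160.0]
slope, intercept = fit_declination(times, f0)
```

A negative slope indicates a downtrend over the utterance.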
Disadvantages
• Disadvantages
– Too rigid: all contours must be modeled with an accent and a phrase component
– Many Standard American English (SAE) contours cannot be captured easily
• Cannot distinguish prominence types
• Cannot capture differences in phrase endings
– No account of different accent types, or variations in phrase endings
– No notation system which allows users to share observations from large speech corpora or to compare contours
– Used primarily for synthesis
Tone Sequence Models
• Intonation generated from sequences of categorically different, phonologically distinctive tones
• Basic unit of intonational description: intonation phrase (tone unit, breath group)
– Delimited by pauses, phrase-final lengthening, pitch
• Syllables may be stressed or accented
– Accent aligned with primary stress -- telephone
– Indicated by F0, duration, intensity, voice quality
British School
[Figure: "But JOHN’s never BEEN to Jamaica", annotated with the labels prehead ("But"), stressed syllable, ‘head’, and ‘nucleus’, and divided into a prenuclear accent unit and a nuclear accent unit]
Six nuclear choices in English
[Figure: the six nuclear tones, each drawn on the word "Jamaica"]
– falling
– rising
– rising-falling
– falling-rising
– rising-falling-rising
– level
The American School
• American school-type models make a distinction between accents (what makes a particular word prominent) and boundary tones (how a phrase ends)
• Autosegmental metrical or two-tone models
• Only two tones, which may be combined
– H = high target
– L = low target
Pierrehumbert 1980
• Contours = pitch accents, phrase accents, boundary tones
– Pitch Accents*: H*, L*, L*+H, L+H*, H*+L, H+L*
– Phrase Accents*: L-, H-
– Boundary Tone: L%, H%
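Because the inventory is small and categorical, the possible tunes can be enumerated. The sketch below makes a simplifying assumption: each tune has exactly one pitch accent, one phrase accent, and one boundary tone. Real phrases may carry several pitch accents, so this is not the full grammar.

```python
from itertools import product

# Tone inventory as listed on the slide.
PITCH_ACCENTS = ["H*", "L*", "L*+H", "L+H*", "H*+L", "H+L*"]
PHRASE_ACCENTS = ["L-", "H-"]
BOUNDARY_TONES = ["L%", "H%"]

def single_accent_tunes():
    """All tunes with one pitch accent, one phrase accent, one boundary tone."""
    return [" ".join(parts)
            for parts in product(PITCH_ACCENTS, PHRASE_ACCENTS, BOUNDARY_TONES)]

tunes = single_accent_tunes()
print(len(tunes))   # 6 * 2 * 2 = 24
print(tunes[0])     # H* L- L%
```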
Price, Ostendorf et al
• Break indices: degree of juncture between words
• 0 to 8 (none to ‘a lot’)
– What I’d like is a nice roast beef sandwich.
To(nes and)B(reak)I(ndices)
• Developed by prosody researchers in four meetings over 1991-94
• Putting Pierrehumbert ’80 and Price, Ostendorf, et al together
• Goals:
– devise common labeling scheme for Standard American English that is robust and reliable
– promote collection of large, prosodically labeled, shareable corpora
• ToBI standards also proposed for Japanese, German, Italian, Spanish, British and Australian English, ...
• Minimal ToBI transcription:
– Recording of speech
– F0 contour
– ToBI tiers:
• orthographic tier: words
• break-index tier: degrees of juncture (Price et al. ‘89)
• tonal tier: pitch accents, phrase accents, boundary tones (Pierrehumbert ‘80)
• miscellaneous tier: disfluencies, non-speech sounds, etc.
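The parts of a minimal transcription listed above map naturally onto a small data structure. The following is a hypothetical container, not a standard file format; the class name, field names, file path, and time stamps are all invented for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class ToBITranscription:
    """Hypothetical container for a minimal ToBI transcription."""
    recording: str                                                         # path to the speech recording
    f0_contour: List[Tuple[float, float]] = field(default_factory=list)    # (time_s, f0_hz)
    orthographic: List[Tuple[float, str]] = field(default_factory=list)    # (word_end_s, word)
    break_index: List[Tuple[float, int]] = field(default_factory=list)     # (word_end_s, index)
    tonal: List[Tuple[float, str]] = field(default_factory=list)           # (time_s, tone label)
    miscellaneous: List[Tuple[float, str]] = field(default_factory=list)   # (time_s, note)

    def check_alignment(self) -> bool:
        """True if every word has exactly one break index after it."""
        return len(self.orthographic) == len(self.break_index)

# Invented example labels; the times are not measured from real audio.
example = ToBITranscription(
    recording="marianna.wav",
    orthographic=[(0.45, "Marianna"), (0.80, "made"), (0.95, "the"), (1.60, "marmalade")],
    break_index=[(0.45, 1), (0.80, 1), (0.95, 1), (1.60, 4)],
    tonal=[(0.30, "H*"), (1.20, "H*"), (1.60, "L-L%")],
)
```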
Sample ToBI Labeling
• Online training material, available at: http://anita.simmons.edu/~tobi/index.html
• Evaluation
– Good inter-labeler reliability for expert and naive labelers: 88% agreement on presence/absence of tonal category, 81% agreement on category label, 91% agreement on break indices to within 1 level (Silverman et al. ‘92, Pitrelli et al. ‘94)
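Agreement figures of this kind are percentages of matching labels. Below is a minimal sketch of pairwise agreement between two labelers, assuming one label per word; the function names and data are hypothetical, and the cited studies may have computed agreement differently.

```python
def percent_agreement(labels_a, labels_b, match=lambda a, b: a == b):
    """Percentage of word positions on which two labelers agree under `match`."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and equally long")
    hits = sum(1 for a, b in zip(labels_a, labels_b) if match(a, b))
    return 100.0 * hits / len(labels_a)

def presence_agreement(tones_a, tones_b):
    """Agreement on whether a word carries any tonal label (None = no label)."""
    return percent_agreement(tones_a, tones_b,
                             lambda a, b: (a is None) == (b is None))

def break_agreement_within_one(breaks_a, breaks_b):
    """Agreement on break indices to within one level."""
    return percent_agreement(breaks_a, breaks_b, lambda a, b: abs(a - b) <= 1)

# Invented labels from two labelers for a four-word utterance.
tones_a = ["H*", None, "L+H*", None]
tones_b = ["H*", None, "H*", "L*"]
breaks_a = [1, 1, 3, 4]
breaks_b = [1, 2, 4, 4]

print(presence_agreement(tones_a, tones_b))            # 75.0
print(percent_agreement(tones_a, tones_b))             # 50.0
print(break_agreement_within_one(breaks_a, breaks_b))  # 100.0
```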
Pitch Accent/Prominence in ToBI
• Which items are made intonationally prominent and how: tonal targets/levels not movement
• Accent type:
– H* simple high (declarative)
– L* simple low (yes-no question)
– L*+H scooped, late rise (uncertainty/incredulity)
– L+H* early rise to stress (contrastive focus)
– H+!H* fall onto stress (implied familiarity)
• Downstepped accents:
– !H*
– L+!H*
– L*+!H
• Degree of prominence:
– within a phrase: HiF0 (~nuclear accent)
– across phrases: ??
Prosodic Phrasing in ToBI
• ‘Levels’ of phrasing:
– intermediate phrase: one or more pitch accents plus a phrase accent, H- or L-
– intonational phrase: 1 or more intermediate phrases + boundary tone, H% or L%
• ToBI break-index tier
– 0 no word boundary
– 1 word boundary
– 2 strong juncture with no tonal markings
– 3 intermediate phrase boundary
– 4 intonational phrase boundary
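The two levels of phrasing amount to a small grammar: an intonational phrase is one or more intermediate phrases followed by a boundary tone, and each intermediate phrase is one or more pitch accents followed by a phrase accent. The sketch below checks a sequence of tonal labels against that grammar. It assumes the labels are already split into separate tokens (for example "L-" and "L%" rather than "L-L%"); the function name is hypothetical.

```python
# Accent inventory taken from the ToBI slides, including downstepped accents.
PITCH_ACCENTS = {"H*", "L*", "L*+H", "L+H*", "H+!H*", "!H*", "L+!H*", "L*+!H"}
PHRASE_ACCENTS = {"H-", "L-"}
BOUNDARY_TONES = {"H%", "L%"}

def parse_intonational_phrase(labels):
    """Group tonal labels into intermediate phrases.

    Returns (list of (pitch_accents, phrase_accent), boundary_tone).
    Raises ValueError if the sequence is ill-formed.
    """
    if not labels or labels[-1] not in BOUNDARY_TONES:
        raise ValueError("an intonational phrase must end in a boundary tone")
    phrases, current = [], []
    for label in labels[:-1]:
        if label in PITCH_ACCENTS:
            current.append(label)
        elif label in PHRASE_ACCENTS:
            if not current:
                raise ValueError("phrase accent without a preceding pitch accent")
            phrases.append((current, label))
            current = []
        else:
            raise ValueError(f"unexpected label: {label}")
    if current or not phrases:
        raise ValueError("last intermediate phrase lacks a phrase accent")
    return phrases, labels[-1]

# Two intermediate phrases (break index 3 after the first, 4 after the second).
phrases, boundary = parse_intonational_phrase(["H*", "L-", "L+H*", "H*", "L-", "L%"])
```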
[Figure: contours for the pitch accents L*+H, L*, and H*, each combined with the phrase endings H-H%, H-L%, L-H%, and L-L%]
[Figure: contours for the pitch accents H* !H*, H+!H*, and L+H*, each combined with the phrase endings H-H%, H-L%, L-H%, and L-L%]
• ToBI exercises
• NB: you will be submitting these exercises for the take-home part of the midterm, so save them!
Next Class
• Predicting prosodic assignments from text