CENTER FOR SPOKEN LANGUAGE UNDERSTANDING 1
PREDICTION AND SYNTHESIS OF PROSODIC EFFECTS ON SPECTRAL
BALANCE OF VOWELS
Jan P.H. van Santen and Xiaochuan Niu
Center for Spoken Language Understanding
OGI School of Science & Technology at OHSU
OVERVIEW
1. IMPORTANCE OF SPECTRAL BALANCE
2. MEASUREMENT OF SPECTRAL BALANCE
3. ANALYSIS METHODS
4. RESULTS
5. SYNTHESIS
6. CONCLUSIONS
1. IMPORTANCE OF SPECTRAL BALANCE
• Linguistic Control Factors
  – Stress-like factors
  – Positional factors
  – Phonemic factors
• Acoustic Correlates
  – Traditionally TTS-controlled:
    • Pitch, timing, amplitude
  – Demonstrated in natural speech, but usually not TTS-controlled:
    • Spectral tilt, balance
    • Formant dynamics
    • …
2. MEASUREMENT OF SPECTRAL BALANCE
• Data:
  – 472 greedily selected sentences
    • Genre: newspaper
    • Greedy features: linguistic control factors
  – One female speaker
  – Manual segmentation
  – Accent: independent rating by 3 judges
    • 0-3 score
2. MEASUREMENT OF SPECTRAL BALANCE
• Energy in 5 formant-range frequency bands
  – B0: 100-300 Hz [~F0]
  – B1: 300-800 Hz [~F1]
  – B2: 800-2500 Hz [~F2]
  – B3: 2500-3500 Hz [~F3]
  – B4: 3500-max Hz [~fricative noise]
• In other words, multidimensional measure
• Filter bank → Square → Average [1 ms rect.] → $20 \log_{10}(B_i)$
• Subtract estimated per-utterance means
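The measurement chain above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it replaces the slide's time-domain filter bank with a grouping of DFT bins, treats $B_i$ as the RMS amplitude in band i, and uses hypothetical function names (`band_levels_db`, `subtract_utterance_means`) that do not come from the original work.

```python
import cmath
import math

# Band edges in Hz as listed on the slide; None means "up to the Nyquist frequency".
BANDS_HZ = [(100, 300), (300, 800), (800, 2500), (2500, 3500), (3500, None)]


def band_levels_db(frame, sample_rate, bands=BANDS_HZ, floor=1e-12):
    """Return one level in dB per band for a single frame of samples.

    Assumption: DFT bin grouping stands in for the slide's filter bank.
    """
    n = len(frame)
    # One-sided DFT power spectrum, computed naively (fine for short frames).
    power = []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame))
        power.append(abs(coeff) ** 2)

    levels = []
    for low, high in bands:
        high = sample_rate / 2.0 if high is None else high
        band_power = sum(p for k, p in enumerate(power)
                         if low <= k * sample_rate / n < high)
        # Parseval: approximate mean-square amplitude contributed by this band.
        mean_square = 2.0 * band_power / (n * n)
        # 20*log10(RMS) equals 10*log10(mean square); the floor avoids log(0).
        levels.append(10.0 * math.log10(max(mean_square, floor)))
    return levels


def subtract_utterance_means(frames_db):
    """Subtract each band's mean over all frames of one utterance."""
    n_bands = len(frames_db[0])
    means = [sum(f[i] for f in frames_db) / len(frames_db)
             for i in range(n_bands)]
    return [[f[i] - means[i] for i in range(n_bands)] for f in frames_db]
```

For example, a 10 ms frame of a 500 Hz sine sampled at 8 kHz puts nearly all of its energy in B1, so `band_levels_db` returns its largest value at index 1.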
2. MEASUREMENT OF SPECTRAL BALANCE
• Details:
  – Confounding with F0
    • Measure pitch-corrected and raw
      – For certain wave shapes, pitch directly related to fixed-frame energy
      – Why do both: wave shapes may change in unknown ways
    • F0 not confined to B0 [female speech]
  – Vowel formants not quite confined to bands [e.g., F1 for /EE/ and F3 for /ER/]
2. MEASUREMENT OF SPECTRAL BALANCE
• Why not more or different bands?
  – Multiple interacting Linguistic Control Factors
    • Need measurements that minimize interactions
  – 5 bands → Different vowels “behave similarly”
    • Can model vowels as a class
• Why not simply spectral tilt?
  – 5 bands → more information than single measure
  – Supply more information for synthesis
3. ANALYSIS METHODS
• Measures likely to behave like segmental duration:
  – Multiple interacting, confounded factors:
    • Interaction: Magnitude of effects of one factor may depend on other factors
    • Confounding: Unequal frequencies of control factor combinations
  – “Directional Invariance”
    • Direction of effects of one factor independent of other factors
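As a rough illustration of Directional Invariance, the sketch below checks whether switching one factor between two levels moves a table of cell means in the same direction for every observed combination of the other factors. The function name, the dictionary layout, and the example numbers are assumptions made for illustration and are not data from the study.

```python
def is_directionally_invariant(means, factor, level_a, level_b):
    """Check that going from level_a to level_b on `factor` always moves
    the mean in the same direction, whatever the other factors are.

    means: dict mapping a tuple of factor levels to a cell mean.
    factor: position of the factor of interest inside each tuple.
    """
    signs = set()
    for combo, value in means.items():
        if combo[factor] != level_a:
            continue
        partner = combo[:factor] + (level_b,) + combo[factor + 1:]
        if partner not in means:
            # Confounded design: this combination was never observed.
            continue
        diff = means[partner] - value
        if diff != 0:
            signs.add(diff > 0)
    # Invariant if no pair of cells disagrees about the direction.
    return len(signs) <= 1


# Hypothetical cell means for (stress, position); values are made up.
example = {
    ("unstressed", "initial"): 1.5,
    ("stressed", "initial"): 2.0,
    ("unstressed", "final"): 3.0,
    ("stressed", "final"): 5.0,
}
print(is_directionally_invariant(example, 0, "unstressed", "stressed"))
```

In this made-up table the size of the stress effect differs by position (an interaction), yet its direction is the same in both positions, which is the property the analysis relies on.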
3. ANALYSIS METHODS
• Need method that
  – can handle multiple interacting, confounded factors and
  – takes advantage of Directional Invariance
• Used: Sums of Products Model:

$$B_i(c_0, \ldots, c_n) = \sum_{k \in K} \prod_{j \in I_k} S_{k,j}(c_j)$$
3. ANALYSIS METHODS
• Special cases:
  – Multiplicative model: K = {1}, I_1 = {0,…,n}

$$B_i(c_0, \ldots, c_n) = S_{1,0}(c_0) \times \cdots \times S_{1,n}(c_n)$$

  – Additive model: K = {0,…,n}, I_k = {k}

$$B_i(c_0, \ldots, c_n) = S_{0,0}(c_0) + \cdots + S_{n,n}(c_n)$$
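A sums-of-products model is a sum over terms, where each term is a product of per-factor parameters. The sketch below evaluates one and shows that the multiplicative and additive models fall out of the choice of the index sets K and I. The function name, the dictionary-based parameter layout, and all numeric values are hypothetical.

```python
from math import prod


def sums_of_products(levels, K, I, S):
    """Evaluate a sums-of-products model for one factor combination.

    levels: tuple (c_0, ..., c_n) of factor levels.
    K: iterable of term indices.
    I: dict mapping each term index k to the factor indices in that term.
    S: dict mapping (k, j) to a dict from levels of factor j to parameters.
    """
    return sum(prod(S[(k, j)][levels[j]] for j in I[k]) for k in K)


# Two hypothetical factors: stress (index 0) and position (index 1).
combo = ("stressed", "final")

# Multiplicative model: one term containing every factor.
mult = sums_of_products(
    combo,
    K=[1],
    I={1: [0, 1]},
    S={(1, 0): {"stressed": 2.0, "unstressed": 1.0},
       (1, 1): {"final": 3.0, "initial": 1.0}},
)

# Additive model: one single-factor term per factor.
add = sums_of_products(
    combo,
    K=[0, 1],
    I={0: [0], 1: [1]},
    S={(0, 0): {"stressed": 2.0, "unstressed": 0.0},
       (1, 1): {"final": 3.0, "initial": 0.0}},
)
print(mult, add)
```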
3. ANALYSIS METHODS
• Used additive model
• Note: Parameter estimates are:
  – Estimates of marginal means …
  – … in balanced design:

$$S_{j,j}(c_j) = \operatorname{Mean}_{c_m \in C_m,\ m \neq j} \; B_i(c_0, \ldots, c_j, \ldots, c_n)$$
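The link between additive-model parameters and marginal means can be illustrated with a small sketch. It computes, for each factor, the mean observed value at each level; in a balanced design generated by an additive model, differences between these marginal means equal the differences between the underlying parameters. The function name and the toy data are assumptions for illustration only.

```python
from collections import defaultdict


def marginal_means(observations, n_factors):
    """Return one dict per factor mapping each level to its marginal mean.

    observations: list of (levels_tuple, value) pairs.
    """
    sums = [defaultdict(float) for _ in range(n_factors)]
    counts = [defaultdict(int) for _ in range(n_factors)]
    for levels, value in observations:
        for j, level in enumerate(levels):
            sums[j][level] += value
            counts[j][level] += 1
    return [{level: sums[j][level] / counts[j][level] for level in sums[j]}
            for j in range(n_factors)]


# Balanced toy design from a made-up additive model:
# stress effect {stressed: 2, unstressed: 0}, position effect {final: 1, initial: -1}.
toy = [
    (("stressed", "final"), 3.0),
    (("stressed", "initial"), 1.0),
    (("unstressed", "final"), 1.0),
    (("unstressed", "initial"), -1.0),
]
stress_means, position_means = marginal_means(toy, 2)
print(stress_means, position_means)
```

When the design is unbalanced (confounded), these raw marginal means no longer isolate each factor, which is why the slides speak of estimated, corrected means.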
3. ANALYSIS METHODS
• Pitch correction:
$$B_i^{[c]} = 20 \log_{10}(B_i) - 20 \log_{10}(f_0 \, t_w)$$
• Confounding with F0: Show both
<B0, B1, B2, B3, B4> and:
<B0 + B1, B2, B3, B4>
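A pitch correction of this kind can be sketched as follows. The sketch assumes that the correction subtracts $20 \log_{10}(f_0 t_w)$ from the raw band level in dB, and that $t_w$ is a window duration in seconds; both the sign and the meaning of $t_w$ are assumptions, and the function name is hypothetical.

```python
import math


def pitch_corrected_db(band_db, f0_hz, window_s):
    """Subtract the assumed pitch term 20*log10(f0 * t_w) from a band level.

    band_db: raw band level in dB, i.e. 20*log10(B_i).
    f0_hz: fundamental frequency in Hz.
    window_s: window duration t_w in seconds (assumed meaning of t_w).
    """
    return band_db - 20.0 * math.log10(f0_hz * window_s)


# Under this assumption, doubling F0 lowers the corrected level by about 6 dB.
low_pitch = pitch_corrected_db(-10.0, 100.0, 0.005)
high_pitch = pitch_corrected_db(-10.0, 200.0, 0.005)
print(low_pitch - high_pitch)
```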
4. RESULTS: (A) POSITIONAL EFFECTS
5 Bands, not pitch-corrected
Solid: right position, dashed: left position. Y-axis: corrected mean
4. RESULTS: (A) POSITIONAL EFFECTS
5 Bands, pitch-corrected
4. RESULTS: (A) POSITIONAL EFFECTS
4 Bands, not pitch-corrected
4. RESULTS: (A) POSITIONAL EFFECTS
4 Bands, pitch-corrected
4. RESULTS: (B) STRESS/ACCENT EFFECTS
5 Bands, not pitch-corrected
Solid: stressed syllable, dashed: unstressed. Y-axis: corrected mean
4. RESULTS: (B) STRESS/ACCENT EFFECTS
5 Bands, pitch-corrected
4. RESULTS: (B) STRESS/ACCENT EFFECTS
4 Bands, not pitch-corrected
4. RESULTS: (B) STRESS/ACCENT EFFECTS
4 Bands, pitch-corrected
4. RESULTS: (C) TILT EFFECTS
$$\text{Tilt} = (-2, -1, 0, 1, 2) \cdot (B_0, B_1, B_2, B_3, B_4)^{\mathsf{T}}$$
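Read as a weighted sum of the five band levels, the tilt measure can be sketched as below. The weights (-2, -1, 0, 1, 2) and in particular their signs are an assumption about the slide's formula; the function name is hypothetical.

```python
# Assumed contrast weights for bands B0..B4; they sum to zero, so a
# uniform level change across all bands leaves the tilt unchanged.
TILT_WEIGHTS = (-2, -1, 0, 1, 2)


def spectral_tilt(bands_db):
    """Weighted sum of the five band levels (in dB)."""
    if len(bands_db) != len(TILT_WEIGHTS):
        raise ValueError("expected one level per band B0..B4")
    return sum(w * b for w, b in zip(TILT_WEIGHTS, bands_db))


print(spectral_tilt([5.0, 5.0, 5.0, 5.0, 5.0]))  # flat spectrum
print(spectral_tilt([0.0, 1.0, 2.0, 3.0, 4.0]))  # energy rising with frequency
```

With these weights a positive tilt means relatively more energy in the higher bands; flipping the signs would reverse that reading.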
5. SYNTHESIS
• Use ABS/OLA sinusoidal model:
  s[n] = sum of overlapped short-time signal frames s_k[n]
  s_k[n] = sum of quasi-harmonic sinusoidal components:

$$s_k[n] = \sum_l A_{k,l} \cos(\omega_{k,l}\, n + \phi_{k,l})$$

• Each frame of unit is represented by a set of quasi-harmonic sinusoidal parameters;
• Given the desired F0 contour, pitch shift is applied to the sinusoidal parameter component of the unit to obtain the target parameter $A_{k,l}$;
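The frame model can be sketched as a sum of cosines per frame followed by overlap-add. This is a simplified illustration, not the ABS/OLA system itself: it omits the synthesis window and the analysis-by-synthesis parameter estimation, and the function names and the `hop` parameter are assumptions.

```python
import math


def synthesize_frame(amplitudes, frequencies, phases, length):
    """One short-time frame as a sum of sinusoidal components.

    frequencies are in radians per sample; phases are in radians.
    """
    return [
        sum(a * math.cos(w * n + p)
            for a, w, p in zip(amplitudes, frequencies, phases))
        for n in range(length)
    ]


def overlap_add(frames, hop):
    """Sum frames placed `hop` samples apart (no windowing in this sketch)."""
    if not frames:
        return []
    total = hop * (len(frames) - 1) + len(frames[-1])
    signal = [0.0] * total
    for k, frame in enumerate(frames):
        for n, sample in enumerate(frame):
            signal[k * hop + n] += sample
    return signal


# Two identical frames with a fundamental and one harmonic, half overlapped.
frame = synthesize_frame([1.0, 0.5], [0.1, 0.2], [0.0, 0.0], 8)
signal = overlap_add([frame, frame], hop=4)
print(len(signal))
```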
5. SYNTHESIS
• Considering the differences of prosody factors between original and target unit, band differences:

$$\Delta_i = \hat{B}_i - B_i$$

• Transform the band difference into weights applying to the sinusoidal parameters:

$$w_i = 10^{\Delta_i / 20}$$

• $\hat{A}_{k,j} = w_i A_{k,j}$, when the j'th harmonic is located in the i'th band;
• Spectral smoothing across unit boundaries.
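The band-weighting step can be sketched as follows: each band difference in dB becomes a linear gain $10^{\Delta_i/20}$, and every harmonic amplitude is multiplied by the gain of the band containing its frequency. The sketch assumes the band edges from the measurement slides, leaves harmonics below 100 Hz unchanged, and does not implement the spectral smoothing across unit boundaries; the names are hypothetical.

```python
import bisect

# Lower edges of bands B0..B4 in Hz; B4 is open-ended at the top.
BAND_EDGES_HZ = [100, 300, 800, 2500, 3500]


def band_index(freq_hz, edges=BAND_EDGES_HZ):
    """Index of the band containing freq_hz, or -1 if below the first edge."""
    return bisect.bisect_right(edges, freq_hz) - 1


def apply_band_weights(amplitudes, freqs_hz, deltas_db, edges=BAND_EDGES_HZ):
    """Scale each harmonic amplitude by the weight of the band it falls in.

    deltas_db: one band difference (target minus original, in dB) per band.
    """
    weights = [10.0 ** (d / 20.0) for d in deltas_db]
    scaled = []
    for amp, freq in zip(amplitudes, freqs_hz):
        i = band_index(freq, edges)
        # Harmonics below the lowest band are left untouched (an assumption).
        scaled.append(amp if i < 0 else amp * weights[i])
    return scaled


# Harmonics of a 200 Hz voice; boost B1 by 20 dB and leave other bands alone.
harmonic_freqs = [200.0, 400.0, 600.0, 1000.0]
result = apply_band_weights([1.0, 0.1, 0.1, 0.5], harmonic_freqs,
                            [0.0, 20.0, 0.0, 0.0, 0.0])
print(result)
```

Because the gain changes abruptly at band edges, a practical system would likely smooth the weights across frequency as well as across unit boundaries; that refinement is outside this sketch.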
5. SYNTHESIS
5 Bands modification example [i:]
CONCLUSIONS
• Described simple methods for predicting and synthesizing spectral balance
• But: Spectral balance is only one “non-standard acoustic correlate”
• Others that remain to be addressed:
  – Spectral dynamics
  – Phase