
CENTER FOR SPOKEN LANGUAGE UNDERSTANDING 1

PREDICTION AND SYNTHESIS OF PROSODIC EFFECTS ON SPECTRAL BALANCE OF VOWELS

Jan P.H. van Santen and Xiaochuan Niu

Center for Spoken Language Understanding
OGI School of Science & Technology at OHSU


OVERVIEW

1. IMPORTANCE OF SPECTRAL BALANCE
2. MEASUREMENT OF SPECTRAL BALANCE
3. ANALYSIS METHODS
4. RESULTS
5. SYNTHESIS
6. CONCLUSIONS


1. IMPORTANCE OF SPECTRAL BALANCE

• Linguistic Control Factors
  – Stress-like factors
  – Positional factors
  – Phonemic factors
• Acoustic Correlates
  – Traditionally TTS-controlled:
    • Pitch, timing, amplitude
  – Demonstrated in natural speech, but usually not TTS-controlled:
    • Spectral tilt, balance
    • Formant dynamics
    • …


2. MEASUREMENT OF SPECTRAL BALANCE

• Data:
  – 472 greedily selected sentences
    • Genre: newspaper
    • Greedy features: linguistic control factors
  – One female speaker
  – Manual segmentation
  – Accent: independent rating by 3 judges
    • 0-3 score
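One common reading of "greedily selected" is greedy coverage: repeatedly pick the sentence that adds the most not-yet-covered combinations of linguistic control factors. The sketch below assumes that reading; the slides do not spell out the selection procedure, and the sentence IDs and feature tuples are hypothetical.

```python
def greedy_select(candidates, max_sentences):
    """Greedy coverage: pick sentences that add the most uncovered features.

    candidates: dict mapping a sentence ID to the set of feature
    combinations it contains (hypothetical representation).
    """
    remaining = dict(candidates)
    covered, chosen = set(), []
    while remaining and len(chosen) < max_sentences:
        # Sentence contributing the largest number of new feature combinations.
        best = max(remaining, key=lambda s: len(remaining[s] - covered))
        if not remaining[best] - covered:
            break  # nothing new left to cover
        chosen.append(best)
        covered |= remaining.pop(best)
    return chosen, covered


# Toy corpus: features are (stress, position) pairs, invented for illustration.
corpus = {
    "s1": {("stressed", "final"), ("stressed", "initial"), ("unstressed", "final")},
    "s2": {("stressed", "final"), ("stressed", "initial")},
    "s3": {("unstressed", "final"), ("unstressed", "initial")},
    "s4": {("unstressed", "initial")},
}
chosen, covered = greedy_select(corpus, max_sentences=472)
```

Here the loop stops as soon as no remaining sentence adds a new combination, so redundant sentences such as "s2" are never recorded.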


2. MEASUREMENT OF SPECTRAL BALANCE

• Energy in 5 formant-range frequency bands
  – B0: 100-300 Hz [~F0]
  – B1: 300-800 Hz [~F1]
  – B2: 800-2500 Hz [~F2]
  – B3: 2500-3500 Hz [~F3]
  – B4: 3500-max Hz [~fricative noise]
• In other words, multidimensional measure
• Filter bank → Square → Average [1 ms rect.] → 20 log10(Bi)
• Subtract estimated per-utterance means
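The measurement chain can be approximated in a few lines. The sketch below is not the filter bank from the slides: it swaps the filter bank and 1 ms averaging for a naive DFT over one short frame and sums the power of the bins falling in each band. The sampling rate, the frame length, and the reading of Bi as an RMS amplitude (so that 20 log10 applies) are assumptions.

```python
import cmath
import math

# Band edges in Hz from the slide; the top band runs up to the Nyquist frequency.
BANDS_HZ = [(100, 300), (300, 800), (800, 2500), (2500, 3500), (3500, float("inf"))]


def band_levels_db(frame, fs):
    """Return 20*log10 of the RMS amplitude in each band for one frame."""
    n = len(frame)
    # Naive DFT power for bins below the Nyquist bin (fine for short frames).
    power = []
    for k in range(n // 2):
        x_k = sum(x * cmath.exp(-2j * math.pi * k * m / n) for m, x in enumerate(frame))
        power.append(abs(x_k) ** 2)
    levels = []
    for lo, hi in BANDS_HZ:
        band_power = sum(p for k, p in enumerate(power) if lo <= k * fs / n < hi)
        rms = math.sqrt(2.0 * band_power) / n  # Parseval, one-sided spectrum
        levels.append(20.0 * math.log10(max(rms, 1e-12)))  # floor avoids log(0)
    return levels


def subtract_utterance_means(frames_db):
    """Remove the per-band mean taken over all frames of one utterance."""
    n_bands = len(frames_db[0])
    means = [sum(f[i] for f in frames_db) / len(frames_db) for i in range(n_bands)]
    return [[f[i] - means[i] for i in range(n_bands)] for f in frames_db]


fs = 8000  # assumed sampling rate
tone = [math.cos(2 * math.pi * 500 * m / fs) for m in range(160)]  # 500 Hz, inside B1
levels = band_levels_db(tone, fs)
normalized = subtract_utterance_means([levels, [v - 6.0 for v in levels]])
```

For the unit-amplitude 500 Hz test tone, nearly all energy lands in B1, at about -3 dB (RMS of 0.707).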


2. MEASUREMENT OF SPECTRAL BALANCE

• Details:
  – Confounding with F0
    • Measure pitch-corrected and raw
      – For certain wave shapes, pitch directly related to fixed-frame energy
      – Why do both: wave shapes may change in unknown ways
    • F0 not confined to B0 [female speech]
  – Vowel formants not quite confined to bands [e.g., F1 for /EE/ and F3 for /ER/]


2. MEASUREMENT OF SPECTRAL BALANCE

• Why not more or different bands?
  – Multiple interacting Linguistic Control Factors
    • Need measurements that minimize interactions
  – 5 bands → Different vowels “behave similarly”
    • Can model vowels as a class
• Why not simply spectral tilt?
  – 5 bands: more information than single measure
  – Supply more information for synthesis


3. ANALYSIS METHODS

• Measures likely to behave like segmental duration:
  – Multiple interacting, confounded factors:
    • Interaction: Magnitude of effects of one factor may depend on other factors
    • Confounding: Unequal frequencies of control factor combinations
  – “Directional Invariance”
    • Direction of effects of one factor independent of other factors


3. ANALYSIS METHODS

• Need method that
  – can handle multiple interacting, confounded factors and
  – takes advantage of Directional Invariance
• Used: Sums of Products Model:

$$B_i(c_0,\ldots,c_n) = \sum_{k \in K} \prod_{j \in I_k} S_{k,j}(c_j)$$

(i: band; k: product term; j: control factor)


3. ANALYSIS METHODS

• Special cases:
  – Multiplicative model: K = {1}, I_1 = {0,…,n}

$$B_i(c_0,\ldots,c_n) = S_{1,0}(c_0) \times \cdots \times S_{1,n}(c_n)$$

  – Additive model: K = {0,…,n}, I_k = {k}

$$B_i(c_0,\ldots,c_n) = S_{0,0}(c_0) + \cdots + S_{n,n}(c_n)$$
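To make the index sets concrete, here is a minimal sketch that evaluates a sums-of-products model for given K and I and then instantiates the two special cases. The factor names, the parameter values, and the dictionary layout are hypothetical, chosen only for illustration.

```python
from math import prod


def sums_of_products(K, I, S, c):
    """Evaluate the sum over k in K of the product over j in I[k] of S[(k, j)][c[j]]."""
    return sum(prod(S[(k, j)][c[j]] for j in I[k]) for k in K)


# Two hypothetical factors: c[0] = stress level (0/1), c[1] = position (0/1).
c = (1, 0)

# Additive model: K = {0, 1}, I_k = {k}, so each term holds a single factor.
S_add = {(0, 0): {0: -1.0, 1: 2.0}, (1, 1): {0: 0.5, 1: -0.5}}
additive = sums_of_products([0, 1], {0: [0], 1: [1]}, S_add, c)

# Multiplicative model: K = {1}, I_1 = {0, 1}, so one term holds all factors.
S_mul = {(1, 0): {0: 1.0, 1: 1.5}, (1, 1): {0: 2.0, 1: 0.5}}
multiplicative = sums_of_products([1], {1: [0, 1]}, S_mul, c)
```

The same function covers both cases; only the index sets change, which is the point of the general formulation.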


3. ANALYSIS METHODS

• Used additive model

• Note: Parameter estimates are:
  – Estimates of marginal means …
  – … in balanced design:

$$S_{k,k}(c_k) = \operatorname{Mean}_{c_j \in C_j,\; j \neq k}\; B_i(c_0,\ldots,c_k,\ldots,c_n)$$
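The link between additive-model parameters and marginal means can be checked on a toy balanced design. The sketch below builds every combination of two hypothetical factors once, using invented additive effects, and averages over the other factor.

```python
from itertools import product
from statistics import mean

# Hypothetical additive effects (in dB) for two factors.
stress_effect = {"unstressed": -1.0, "stressed": 2.0}
position_effect = {"initial": 0.5, "medial": 0.0, "final": -0.5}

# Balanced design: every factor combination occurs exactly once.
cells = {
    (s, p): stress_effect[s] + position_effect[p]
    for s, p in product(stress_effect, position_effect)
}


def marginal_means(cells, factor_index):
    """Average the cell values over all levels of the other factors."""
    levels = sorted({combo[factor_index] for combo in cells})
    return {
        lv: mean(v for combo, v in cells.items() if combo[factor_index] == lv)
        for lv in levels
    }


stress_hat = marginal_means(cells, 0)
position_hat = marginal_means(cells, 1)
```

Each marginal mean equals the factor's own effect plus a constant (the mean effect of the other factor), so differences between levels are recovered exactly. With unequal cell frequencies this no longer holds, which is the confounding problem.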


3. ANALYSIS METHODS

• Pitch correction:

$$B_i^{[c]} = 20\log_{10}(B_i) - 20\log_{10}(f_0\,t_w)$$

• Confounding with F0: Show both <B0, B1, B2, B3, B4> and <B0 + B1, B2, B3, B4>
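As a numeric illustration of the pitch correction, the sketch below assumes the correction term 20 log10(f0 · t_w) is subtracted from the raw band level, and that t_w is a window duration in seconds. Both are readings of the slide rather than confirmed details, and the input values are invented.

```python
import math


def pitch_corrected_level(band_amplitude, f0_hz, window_s):
    """20*log10(B_i) minus 20*log10(f0 * t_w); the minus sign is an assumption."""
    return 20.0 * math.log10(band_amplitude) - 20.0 * math.log10(f0_hz * window_s)


# Same raw band amplitude measured at two pitches, with an assumed 10 ms window.
low = pitch_corrected_level(0.5, f0_hz=100.0, window_s=0.010)
high = pitch_corrected_level(0.5, f0_hz=200.0, window_s=0.010)
```

Under these assumptions, doubling F0 at a fixed raw level lowers the corrected level by 20 log10(2), about 6 dB.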


4. RESULTS: (A) POSITIONAL EFFECTS

5 Bands, not pitch-corrected
Solid: right position, dashed: left position. Y-axis: corrected mean


4. RESULTS: (A) POSITIONAL EFFECTS

5 Bands, pitch-corrected


4. RESULTS: (A) POSITIONAL EFFECTS

4 Bands, not pitch-corrected


4. RESULTS: (A) POSITIONAL EFFECTS

4 Bands, pitch-corrected


4. RESULTS: (B) STRESS/ACCENT EFFECTS

5 Bands, not pitch-corrected
Solid: stressed syllable, dashed: unstressed. Y-axis: corrected mean


4. RESULTS: (B) STRESS/ACCENT EFFECTS

5 Bands, pitch-corrected


4. RESULTS: (B) STRESS/ACCENT EFFECTS

4 Bands, not pitch-corrected


4. RESULTS: (B) STRESS/ACCENT EFFECTS

4 Bands, pitch-corrected


4. RESULTS: (C) TILT EFFECTS

$$\mathrm{Tilt} = (-2,\,-1,\,0,\,1,\,2)\,(B_0,\,B_1,\,B_2,\,B_3,\,B_4)^{\top}$$
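A tilt value of this kind is a linear-trend contrast over the five band levels. The sketch below assumes the weights run from -2 for B0 to +2 for B4, so that a spectrum falling toward high frequencies gives a negative tilt; the sign convention is an assumption and the band levels are invented.

```python
TILT_WEIGHTS = (-2, -1, 0, 1, 2)  # sign convention assumed


def spectral_tilt(band_levels_db):
    """Linear-trend contrast over the five band levels B0..B4."""
    if len(band_levels_db) != len(TILT_WEIGHTS):
        raise ValueError("expected five band levels")
    return sum(w * b for w, b in zip(TILT_WEIGHTS, band_levels_db))


flat = spectral_tilt([0.0, 0.0, 0.0, 0.0, 0.0])
falling = spectral_tilt([0.0, -3.0, -6.0, -9.0, -12.0])
```

Because the weights sum to zero, adding the same constant to all five bands (an overall level change) leaves the tilt unchanged.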


5. SYNTHESIS

• Use ABS/OLA sinusoidal model:
  – s[n] = sum of overlapped short-time signal frames s_k[n]
  – s_k[n] = sum of quasi-harmonic sinusoidal components:

$$s_k[n] = \sum_l A_{k,l}\cos(\omega_{k,l}\,n + \phi_{k,l})$$

• Each frame of a unit is represented by a set of quasi-harmonic sinusoidal parameters;
• Given the desired F0 contour, a pitch shift is applied to the sinusoidal parameters of the unit to obtain the target parameters A_{k,l};
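The overlap-add idea can be shown with a strongly simplified sketch: each frame is a sum of cosines, and frames are combined with triangular windows at 50% overlap. This is not the full ABS/OLA algorithm (no analysis-by-synthesis, no pitch shifting); the window shape, hop size, and sampling rate are assumptions.

```python
import math


def synth_frame(amps, omegas, phases, length):
    """One short-time frame: sum of sinusoids, with n counted from the frame start."""
    return [
        sum(a * math.cos(w * n + p) for a, w, p in zip(amps, omegas, phases))
        for n in range(length)
    ]


def overlap_add(frames, hop):
    """Triangular-window overlap-add of frames that are each 2 * hop samples long."""
    length = 2 * hop
    window = [1.0 - abs(m - hop) / hop for m in range(length)]
    out = [0.0] * (hop * (len(frames) + 1))
    for k, frame in enumerate(frames):
        for m in range(length):
            out[k * hop + m] += window[m] * frame[m]
    return out


hop = 40
omega = 2 * math.pi * 200 / 8000  # 200 Hz at an assumed 8 kHz sampling rate
# Advance the phase by omega * hop per frame so neighbouring frames stay coherent.
frames = [synth_frame([1.0], [omega], [omega * k * hop], 2 * hop) for k in range(5)]
signal = overlap_add(frames, hop)
```

Away from the first and last half-frame, the overlapping triangular windows sum to one, so a stationary sinusoid is reproduced unchanged.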


5. SYNTHESIS

• Considering the differences in prosodic factors between original and target unit, compute band differences (target minus original):

$$\Delta_i = \hat{B}_i - B_i$$

• Transform the band differences into weights applied to the sinusoidal parameters:

$$w_i = 10^{\Delta_i/20}$$

• $A'_{k,j} = w_i\,A_{k,j}$, when the j'th harmonic is located in the i'th band;
• Spectral smoothing across unit boundaries.
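The band-weighting step can be sketched as follows: convert each band difference in dB to a linear gain and multiply every harmonic amplitude by the gain of the band its frequency falls in. The band edges are taken from the measurement slides; the harmonic list, the difference values, and the choice to leave harmonics below 100 Hz untouched are assumptions. Smoothing across unit boundaries is not covered.

```python
from bisect import bisect_right

BAND_LOWER_EDGES_HZ = [100, 300, 800, 2500, 3500]  # lower edges of B0..B4


def band_index(freq_hz):
    """Index of the band containing freq_hz, or None below 100 Hz."""
    i = bisect_right(BAND_LOWER_EDGES_HZ, freq_hz) - 1
    return i if i >= 0 else None


def apply_band_weights(harmonics, delta_db):
    """Scale each (frequency, amplitude) pair by w_i = 10 ** (delta_i / 20)."""
    weights = [10.0 ** (d / 20.0) for d in delta_db]
    scaled = []
    for freq_hz, amp in harmonics:
        i = band_index(freq_hz)
        scaled.append((freq_hz, amp if i is None else amp * weights[i]))
    return scaled


harmonics = [(200.0, 1.0), (400.0, 1.0), (1000.0, 2.0)]  # invented frame
modified = apply_band_weights(harmonics, delta_db=[20.0, 0.0, -20.0, 0.0, 0.0])
```

A +20 dB difference multiplies the amplitude by 10 and a -20 dB difference divides it by 10, so the three harmonics above end up at amplitudes of about 10, 1, and 0.2.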


5. SYNTHESIS

5 Bands modification example [i:]


6. CONCLUSIONS

• Described simple methods for predicting and synthesizing spectral balance

• But: Spectral balance is only one “non-standard acoustic correlate”

• Others that remain to be addressed:
  – Spectral dynamics
  – Phase