Post on 12-Jan-2016
SSML Extension for Expressive Mandarin TTS
Shuang Li, Hongwu Yang, Lianhong Cai
Tsinghua University
Outline
• Motivation
• Expression of Speech
• Proposed SSML extension
• Conclusion
Motivation (1/3)
• Sentences with the same text can be expressed with different styles, emotions and moods
• Current TTS systems lack this variability
Motivation (2/3)
• Current SSML cannot define speaking style, emotion and mood
– Good news: 生日快乐 "Happy birthday", expressed in happiness (emotion)
– Bad news: 张总去世了 "Director Zhang passed away", expressed in sadness (emotion)
– Information provider: 飞往纽约的飞机将要起飞 "The flight to New York is about to take off", expressed in a mild mood
– Dialog: 是中国队赢了吗? "Did the Chinese team win?", emphasizing "Chinese", with an interrogative mood
• Current SSML can hardly show the differences between the expressions above
Motivation (3/3)
[Diagram relating emotion, style and characteristic to expressive speech]
• Emotion: positive, neutral, negative
• Style: news, sports commentary, dialog, information providing, ……
• Characteristic
• Emotion, style and characteristic are relatively independent but cannot be separated
• Characteristic and style: relatively stable, global features
• Emotion: short-time, local feature
• Expressing pattern: no tag
• Physiological/social characteristics: voice tag
• Physiological reactions: no tag
• With different speaking styles
• Representing the speaker's attitude, purpose and emotion
• More harmonious with the circumstances
Expression of Speech
• Style: speaking style (dialog, news, information providing…)
• Mood: mood (request, acquisition, affirmation, apology…)
• Emotion: emotional activities (neutral, negative, positive)
[Diagram labels: Style, Mood, Emotion; Intonation, Emphasis, Speaking Rate, Break; Spectral Features, Duration, Energy, Pitch]
Hierarchical framework of Prosody
• Break level
– B0: no break
– B1: Syllable
– B2: Prosodic word
– B3: Prosodic phrase
– B4: Breath group
– B5: Prosodic group
• Chiu-yu Tseng et al. Fluent speech prosody: Framework and modeling. Speech Communication 46 (2005) 284-309
我永远忘不了 <B3/25ms> 一张对日抗战时的新闻照片, <B3/507ms> 轰炸后的废墟焦土上,<B3/272ms> 一个衣不蔽体、 <B3/384ms> 满身尘土灰烟的幼儿 <B3/100ms> 坐在地上 <B3/75ms> 无助的大哭着。 <B5/1110ms> 那是一再令我热泪盈眶的镜头。 <B3/507ms> 新闻摄影中的战争传真 <B3/276ms> 已不能只称是照片了。 <B5/802ms>
• From Chiu-yu Tseng, report in Beijing University, Oct 11, 2005
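The break annotations in the passage above follow a regular `<Bn/…ms>` pattern, so they can be read mechanically. The following is a minimal sketch, not part of the original slides: the function name `parse_breaks` and the regular expression are assumptions made for illustration, and the sample string is a contiguous excerpt of the annotated passage.

```python
import re

# Break levels as listed on the "Hierarchical framework of Prosody" slide.
BREAK_LEVELS = {
    0: "no break",
    1: "syllable",
    2: "prosodic word",
    3: "prosodic phrase",
    4: "breath group",
    5: "prosodic group",
}

# Matches annotations such as <B3/25ms>: break level, then pause length in ms.
BREAK_TAG = re.compile(r"<B([0-5])/(\d+)ms>")

def parse_breaks(annotated):
    """Split annotated text into (chunk, break_level, pause_ms) triples.

    Hypothetical helper: each chunk is the text preceding one break tag.
    """
    result = []
    pos = 0
    for m in BREAK_TAG.finditer(annotated):
        chunk = annotated[pos:m.start()].strip()
        result.append((chunk, int(m.group(1)), int(m.group(2))))
        pos = m.end()
    return result

sample = "坐在地上 <B3/75ms> 无助的大哭着。 <B5/1110ms> 那是一再令我热泪盈眶的镜头。 <B3/507ms>"
units = parse_breaks(sample)
for chunk, level, ms in units:
    print(chunk, BREAK_LEVELS[level], ms)
```

Each triple pairs a stretch of text with the boundary that follows it, which is the information the proposed `bg`, `pph` and `pw` elements would need to encode.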
Proposed tag (1/2)
• Utterance: prosodic group, expressing a complete meaning
– Attributes:
  style: speaking style
    Values: news, reading, information provider, dialog, etc.
  emotion: speaking emotion
    Values: happy, sad, angry, calm, despair, etc.
    +1 for positive, 0 for neutral, -1 for negative
  mood: speaking mood
    Values: given, request, acquisition, affirmation, apology, etc.
Proposed tag (2/2)
• BG: breath group
– Attributes:
  intonation
    Values: indicative, interrogative, imperative
• PPh: prosodic phrase
• PW: prosodic word
• Syl: syllable
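To make the nesting of the proposed elements concrete, here is a minimal sketch that builds an utterance with Python's standard `xml.etree.ElementTree`. It is an illustration only: the helper `build_utterance` is hypothetical, and the lowercase element names `utterance`, `bg` and `pph` are assumed from the spelling used in the conclusion slide.

```python
import xml.etree.ElementTree as ET

def build_utterance(style, emotion, mood, breath_groups):
    """Build an <utterance> element (hypothetical helper).

    breath_groups is a list of (intonation, [phrase_text, ...]) pairs;
    each pair becomes one <bg> holding one <pph> per phrase.
    """
    utt = ET.Element("utterance", {"style": style, "emotion": emotion, "mood": mood})
    for intonation, phrases in breath_groups:
        bg = ET.SubElement(utt, "bg", {"intonation": intonation})
        for text in phrases:
            pph = ET.SubElement(bg, "pph")
            pph.text = text
    return utt

# The dialog sentence from the motivation slides, asked as a question.
utt = build_utterance(
    "dialog", "neutral", "acquisition",
    [("interrogative", ["是中国队赢了吗?"])],
)
xml_text = ET.tostring(utt, encoding="unicode")
print(xml_text)
```

The sketch stops at the prosodic-phrase level; `pw` and `syl` children could be added the same way if word or syllable boundaries need to be marked.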
Some examples (1/3)
<?xml version="1.0"?>
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://www.w3.org/2001/10/synthesis
                 http://www.w3.org/TR/speech-synthesis/synthesis.xsd"
       xml:lang="zh-CN">
  <utterance style="information provider" emotion="-1" mood="apology">
    <bg intonation="indicative">
      <pph>1121 次航班 (Flight 1121)</pph>
      <pph>延误 (has been delayed)
        <pw><emphasis level="strong">1 小时 (for an hour)</emphasis></pw></pph>
      <break strength="medium" time="215ms"/>
      <pph>请旅客们到 (Please go to)</pph>
      <pw><emphasis level="moderate">G6</emphasis></pw>
      <pph>候机厅等候 (the waiting room)</pph>
    </bg>
  </utterance>
</speak>
Some examples (2/3)
<?xml version="1.0"?>
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://www.w3.org/2001/10/synthesis
                 http://www.w3.org/TR/speech-synthesis/synthesis.xsd"
       xml:lang="zh-CN">
  <utterance style="dialog" emotion="neutral" mood="acquisition">
    <bg intonation="interrogative">
      <pph><pw>
        <emphasis level="strong">张威 (Zhang Wei)</emphasis>
      </pw></pph>
      <break strength="medium" time="75ms"/>
      <pph>担心肖荫开车发晕 (is afraid of Xiao Yin being dizzy when driving)</pph>
    </bg>
  </utterance>
</speak>
Some examples (3/3)
<?xml version="1.0"?>
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://www.w3.org/2001/10/synthesis
                 http://www.w3.org/TR/speech-synthesis/synthesis.xsd"
       xml:lang="zh-CN">
  <utterance style="dialog" emotion="angry">
    <bg intonation="interrogative">
      <prosody rate="x-fast">难道不是你的错吗? (Isn't it your fault?)</prosody>
      <break strength="medium" time="520ms"/>
    </bg>
    <bg intonation="imperative">
      以后你小心一点 (Be careful next time)
    </bg>
  </utterance>
</speak>
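Since the slides list a closed set of intonation values for `bg` (indicative, interrogative, imperative), a document using the extension can be checked mechanically. The sketch below is an assumption-laden illustration, not a validator defined by the proposal: the function `check_intonation` is hypothetical, the element names follow the conclusion slide, and the value `exclamatory` is deliberately invalid to show a reported problem.

```python
import xml.etree.ElementTree as ET

# SSML default namespace, as declared on <speak> in the examples.
NS = "{http://www.w3.org/2001/10/synthesis}"

# Intonation values listed on the "Proposed tag (2/2)" slide.
INTONATIONS = {"indicative", "interrogative", "imperative"}

def check_intonation(ssml):
    """Return a list of problems with bg intonation values (hypothetical check)."""
    problems = []
    root = ET.fromstring(ssml)
    for utt in root.iter(NS + "utterance"):
        for bg in utt.iter(NS + "bg"):
            value = (bg.get("intonation") or "").strip()
            if value not in INTONATIONS:
                problems.append(f"unknown intonation: {value!r}")
    return problems

doc = (
    '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="zh-CN">'
    '<utterance style="dialog" emotion="angry">'
    '<bg intonation="interrogative">难道不是你的错吗?</bg>'
    '<bg intonation="exclamatory">以后你小心一点</bg>'
    '</utterance></speak>'
)
problems = check_intonation(doc)
print(problems)
```

The style, emotion and mood lists on the slides end in "etc.", so this sketch treats them as open and leaves them unchecked.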
Conclusion & questions
• 5 elements for the hierarchical prosodic structure
– utterance, bg, pph, pw, syl
• 3 expressive attributes for utterance
– style
– emotion
– mood
• 1 attribute for bg
– intonation