Page 1: SSML Extension for Expressive Mandarin TTS Shuang Li Hongwu Yang Lianhong Cai Tsinghua University

SSML Extension for Expressive Mandarin TTS

Shuang Li, Hongwu Yang, Lianhong Cai

Tsinghua University

Page 2

Outline

• Motivation
• Expression of Speech
• Proposed SSML extension
• Conclusion

Page 3

Motivation (1/3)

• Sentences with the same text can be expressed with different styles, emotions and moods

• Current TTS systems lack this variability

Page 4

Motivation (2/3)

• Current SSML cannot define speaking style, emotion and mood
  – Good news: 生日快乐 “Happy birthday”, expressed in happiness (emotion)
  – Bad news: 张总去世了 “Director Zhang passed away”, expressed in sadness (emotion)
  – Information provider: 飞往纽约的飞机将要起飞 “The flight to New York is about to take off”, expressed in a mild mood
  – Dialog: 是中国队赢了吗? “Did the Chinese team win?”, emphasizing “Chinese”, with interrogative mood

• Current SSML can hardly show the difference between the expressions above

Page 5

Motivation (3/3)

[Diagram: expressive speech, combining emotion, style and characteristic]

• Emotion: positive, neutral, negative
• Style: news, sports commentary, dialog, information providing, ……
• Characteristic

• Emotion, style and characteristic are relatively independent but cannot be separated
• Characteristic and style: relatively stable, global features
• Emotion: short-time, local feature

Diagram labels, in slide order:
• Expressing pattern: no tag
• Physiological/social characteristics: voice tag
• Physiological reactions: no tag

• With different speaking styles
• Representing the speaker’s attitude, purpose and emotion
• More harmonious with the circumstance

Page 6

Outline

• Motivation
• Expression of Speech
• Proposed SSML extension
• Conclusion

Page 7

Expression of Speech

• Style: speaking style (dialog, news, information providing, …)
• Mood: mood (request, acquisition, affirmation, apology, …)
• Emotion: emotional activities (neutral, negative, positive)

[Diagram labels: Style, Mood, Emotion; Intonation, Emphasis, Speaking Rate, Break; Spectral Features, Duration, Energy, Pitch]

Page 8

Hierarchical framework of prosody

• Break level
  – B0: no break
  – B1: syllable
  – B2: prosodic word
  – B3: prosodic phrase
  – B4: breath group
  – B5: prosodic group

• Chiu-yu Tseng, et al. Fluent speech prosody: Framework and modeling. Speech Communication, 46 (2005) 284-309

Page 9

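我永远忘不了 <B3/25ms> 一张对日抗战时的新闻照片, <B3/507ms> 轰炸后的废墟焦土上,<B3/272ms> 一个衣不蔽体、 <B3/384ms> 满身尘土灰烟的幼儿 <B3/100ms> 坐在地上 <B3/75ms> 无助的大哭着。 <B5/1110ms> 那是一再令我热泪盈眶的镜头。 <B3/507ms> 新闻摄影中的战争传真 <B3/276ms> 已不能只称是照片了。 <B5/802ms>

(Gloss: “I will never forget a news photograph from the War of Resistance against Japan: on the scorched ruins left by a bombing, a ragged infant covered in dust and soot sits on the ground, crying helplessly. It is an image that has moved me to tears again and again. The war reportage in news photography can no longer be called mere photographs.”)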

• From Chiu-yu Tseng, report in Beijing University, Oct 11, 2005
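The annotated passage above pairs every break with a level (B0 to B5) and a pause length in milliseconds. As a rough sketch, assuming the markers always have the form `<Bn/NNNms>`, a transcript like this could be split into chunks as follows. The regular expression and the function name `split_on_breaks` are illustrative choices, not part of the original report.

```python
import re

# Matches markers such as <B3/25ms>: break level, then pause length in ms.
# The exact marker syntax is assumed from the passage above.
BREAK_RE = re.compile(r"<B([0-5])/(\d+)ms>")

def split_on_breaks(annotated):
    """Return (text, break_level, pause_ms) for each chunk that ends in a marker."""
    chunks = []
    pos = 0
    for match in BREAK_RE.finditer(annotated):
        text = annotated[pos:match.start()].strip()
        chunks.append((text, int(match.group(1)), int(match.group(2))))
        pos = match.end()
    return chunks

sample = "坐在地上 <B3/75ms> 无助的大哭着。 <B5/1110ms>"
chunks = split_on_breaks(sample)
```

Each tuple keeps the text together with the break that closes it, so the B5 entries mark where one prosodic group ends and the next begins.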

Page 10

Outline

• Motivation
• Expression of Speech
• Proposed SSML extension
• Conclusion

Page 11

Proposed tag (1/2)

• Utterance: prosodic group, expressing a complete meaning
  – Attributes:
    style: speaking style
      Value: news, reading, information provider, dialog, etc.
    emotion: speaking emotion
      Value: happy, sad, angry, calm, despair, etc.
      +1 for positive, 0 for neutral, -1 for negative
    mood: speaking mood
      Value: given, request, acquisition, affirmation, apology, etc.
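The emotion attribute above accepts either an emotion name or a polarity value. A minimal sketch of how a processor might reduce both forms to one polarity is given here. The name-to-polarity table is an assumption made for illustration, since the slide lists both forms without saying how they correspond, and `emotion_polarity` is a hypothetical helper.

```python
# Assumed mapping from the emotion names on the slide to the polarity scale
# (+1 positive, 0 neutral, -1 negative). The slide does not define this table.
EMOTION_POLARITY = {"happy": 1, "calm": 0, "sad": -1, "angry": -1, "despair": -1}

def emotion_polarity(value):
    """Accept a polarity string ("+1", "0", "-1") or an emotion name."""
    v = value.strip().lower()
    if v in ("+1", "1", "0", "-1"):
        return int(v)
    if v in EMOTION_POLARITY:
        return EMOTION_POLARITY[v]
    raise ValueError(f"unknown emotion value: {value!r}")
```

Raising on unknown names keeps the open-ended "etc." in the value list visible: a real processor would need a rule for values outside whatever table it ships with.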

Page 12

Proposed tag (2/2)

• BG: breath group
  – Attributes:
    intonation
      Value: indicative, interrogative, imperative

• PPh: prosodic phrase

• PW: prosodic word

• Syl: syllable
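The element names bg, pph, pw and syl match break levels B4 to B1 on the break-level slide, and utterance (a prosodic group) matches B5. Assuming the elements are meant to nest strictly from larger to smaller units, which the slides suggest but do not state, a nesting check could be sketched as below. The function `nesting_errors` is hypothetical, and the sample documents omit the SSML namespace for brevity.

```python
import xml.etree.ElementTree as ET

# Prosodic elements paired with the break level of the same name.
LEVELS = {"utterance": 5, "bg": 4, "pph": 3, "pw": 2, "syl": 1}

def nesting_errors(elem, enclosing=6, path=""):
    """List prosodic elements nested inside a unit of equal or lower level."""
    errors = []
    here = f"{path}/{elem.tag}"
    level = LEVELS.get(elem.tag)
    if level is not None:
        if level >= enclosing:
            errors.append(here)
        enclosing = level
    # Other elements (emphasis, prosody, break) pass the constraint through.
    for child in elem:
        errors.extend(nesting_errors(child, enclosing, here))
    return errors

good = ET.fromstring("<utterance><bg><pph><pw>G6</pw></pph></bg></utterance>")
bad = ET.fromstring("<utterance><pw><bg>G6</bg></pw></utterance>")
```

The check only requires each prosodic element to sit inside a larger unit, so a pw placed directly under a bg is accepted.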

Page 13

Some examples (1/3)

<?xml version="1.0"?>
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://www.w3.org/2001/10/synthesis
                 http://www.w3.org/TR/speech-synthesis/synthesis.xsd"
       xml:lang="zh-CN">
  <utterance style="information provider" emotion="-1" mood="apology">
    <bg intonation="indicative">
      <pph>1121 次航班 (Flight 1121)</pph>
      <pph>延误 (has been delayed)
        <pw><emphasis level="strong">1 小时 (for an hour)</emphasis></pw></pph>
      <break strength="medium" time="215ms"/>
      <pph>请旅客们到 (Please go to)</pph>
      <pw><emphasis level="moderate">G6</emphasis></pw>
      <pph>候机厅等候 (the waiting room)</pph>
    </bg>
  </utterance>
</speak>
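To show how a consumer might read the proposed attributes, here is a small sketch using Python's standard `xml.etree.ElementTree`. It parses a trimmed, well-formed variant of the example above, with the element spelled `utterance` as in the proposed tag list and without the English glosses. The variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

SSML_NS = "http://www.w3.org/2001/10/synthesis"
NS = {"s": SSML_NS}

# Trimmed variant of the flight-delay example; extension elements are assumed
# to live in the SSML namespace, as in the slide.
doc = f"""<speak version="1.0" xmlns="{SSML_NS}" xml:lang="zh-CN">
  <utterance style="information provider" emotion="-1" mood="apology">
    <bg intonation="indicative">
      <pph>1121 次航班</pph>
      <pph>延误 <pw><emphasis level="strong">1 小时</emphasis></pw></pph>
      <break strength="medium" time="215ms"/>
    </bg>
  </utterance>
</speak>"""

root = ET.fromstring(doc)
utt = root.find("s:utterance", NS)

# The three expressive attributes carried by the utterance element.
expressive = {key: utt.get(key) for key in ("style", "emotion", "mood")}
# One intonation value per breath group.
intonations = [bg.get("intonation") for bg in utt.findall("s:bg", NS)]
# Emphasis level and text of every emphasized span.
stressed = [(e.get("level"), "".join(e.itertext()).strip())
            for e in utt.iter(f"{{{SSML_NS}}}emphasis")]
```

Because the document declares a default namespace, every lookup has to be namespace-qualified, which is why the queries use the `s:` prefix or the `{namespace}tag` form.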

Page 14

Some examples (2/3)

<?xml version="1.0"?>
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://www.w3.org/2001/10/synthesis
                 http://www.w3.org/TR/speech-synthesis/synthesis.xsd"
       xml:lang="zh-CN">
  <utterance style="dialog" emotion="neutral" mood="acquisition">
    <bg intonation="interrogative">
      <pph><pw>
        <emphasis level="strong">张威 (Zhang Wei)</emphasis>
      </pw></pph>
      <break strength="medium" time="75ms"/>
      <pph>担心肖荫开车发晕 (is afraid of Xiao Yin being dizzy when driving)</pph>
    </bg>
  </utterance>
</speak>

Page 15

Some examples (3/3)

<?xml version="1.0"?>
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://www.w3.org/2001/10/synthesis
                 http://www.w3.org/TR/speech-synthesis/synthesis.xsd"
       xml:lang="zh-CN">
  <utterance style="dialog" emotion="angry">
    <bg intonation="interrogative">
      <prosody rate="x-fast">难道不是你的错吗? (Isn't it your fault?)</prosody>
      <break strength="medium" time="520ms"/>
    </bg>
    <bg intonation="imperative">
      以后你小心一点 (Be careful next time)
    </bg>
  </utterance>
</speak>
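An utterance with several breath groups, like the one above, could also be generated rather than written by hand. The sketch below builds the same two-group structure with `xml.etree.ElementTree` and rejects intonation values outside the three listed on the proposed-tag slide. The helper `build_utterance` is hypothetical, and the sketch leaves out the namespace, the prosody element and the break for brevity.

```python
import xml.etree.ElementTree as ET

# Intonation values listed for the bg element on the proposed-tag slide.
INTONATIONS = {"indicative", "interrogative", "imperative"}

def build_utterance(style, emotion, groups):
    """Build an utterance from (intonation, text) pairs, one per breath group."""
    utt = ET.Element("utterance", {"style": style, "emotion": emotion})
    for intonation, text in groups:
        if intonation not in INTONATIONS:
            raise ValueError(f"unsupported intonation: {intonation!r}")
        bg = ET.SubElement(utt, "bg", {"intonation": intonation})
        bg.text = text
    return utt

utt = build_utterance("dialog", "angry", [
    ("interrogative", "难道不是你的错吗?"),
    ("imperative", "以后你小心一点"),
])
xml_text = ET.tostring(utt, encoding="unicode")
```

Validating the intonation at build time catches misspelled values before they reach the synthesizer.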

Page 16

Outline

• Motivation
• Expression of Speech
• Proposed SSML extension
• Conclusion

Page 17

Conclusion & questions

• 5 elements for the hierarchical prosodic structure
  – utterance, bg, pph, pw, syl

• 3 expressive attributes for utterance
  – style
  – emotion
  – mood

• 1 intonation attribute for bg
  – intonation