Humaine Workshop Paris Generating narrative speech for the Virtual Storyteller 1 Generating narrative speech for the Virtual Storyteller Koen Meijs, Mariet Theune, Dirk Heylen* and others

Post on 31-Mar-2015


Generating narrative speech for the Virtual Storyteller Koen Meijs, Mariet Theune, Dirk Heylen* and others

Overview

• Background: The Virtual Storyteller
• Analysis of human storytellers
• Conversion rules and testing
• Implementation
• Evaluation
• Conclusions and future work

The Virtual Storyteller

Automatic story generation:
• Plot creation
• Natural language generation
• Storytelling

Plot creation

Characters in the story are (semi) autonomous agents, which:

• Have their own personality, goals and emotions

• Can perform planned actions to reach their goals

• Are guided by a director agent

NLG and story presentation

• Language generation using simple sentence templates

• Story presentation by an embodied, speaking agent (using Microsoft Agents as a temporary solution)

Example story setting

NB: Visualisation is not part of the system yet!

Example story text

Diana walked to the forest. Brutus walked to the plains. Diana picked up the sword. Brutus walked to the desert. Diana walked to the desert. Brutus was afraid of Diana because Brutus saw that Diana had the sword. Brutus hit Diana. Diana was afraid of Brutus because Diana saw Brutus. Diana walked to the forest. Brutus was afraid of Diana because Brutus saw that Diana had the sword. Brutus walked to the forest. Diana stabbed the villain. And she lived happily ever after!!!

Storytellers’ speech

Human storytellers engage their audience by:
• General “storytelling” speech style
• Different voices for characters
• Expressing emotions
• Different “sound effects”

Focus of this work

• General storytelling style
• Use of prosody to express suspense in stories

Analysis of human speakers

Global storytelling style, material from:
• newsreader (Onno Duyvené de Wit)
• children’s storyteller (Sacco van der Made)
• adult storyteller (Toon Tellegen)

Analysis (using PRAAT) mainly based on children’s storyteller

Features

• Pitch
• Intensity
• Tempo (syllables per second)
• Pause duration
• Vowel length

Global storytelling style

Pitch / intensity:
• Averages are similar
• Standard deviation is much larger for storyteller

[Figure panels: newsreader; children’s storyteller]

Global storytelling style

Tempo (syllables per second): newsreader is much faster than both storytellers

Pause duration: storyteller pauses are longer (esp. between sentences)

Also: lengthening of certain adverbs/adjectives by storyteller (“A long corridor that was s o low …”)

Expressing suspense

• Sudden climax: an unexpected revelation.

E.g., opening Bluebeard’s secret chamber: “She had to get used to the darkness, and then …”

• Increasing climax: building up expectation.

Finally finding the Sleeping Beauty: “He opened the door and… there was the sleeping princess.”

Sudden climax

• “En toen…” / “And then…”
• Sudden rise in pitch and intensity on “then”
• Vowel lengthening in “then”

Increasing climax

• Two parts: (1) creating expectation, (2) revelation
• First part: increasing pitch and vowel duration
• Second part: more constant, lower pitch and intensity

Conversion rules

• Conversion from ‘neutral’ to ‘storytelling’ speech

• Rules based on analysis of human speakers

• Input: paired time-value data

• Output: new values for a given time domain

Example from storytelling style

• Pitch: increase the pitch of syllables carrying a sentence accent

• All pitch values inside the syllable’s time domain are multiplied by a certain factor (based on a sine function)

• Maximum increase between 40 and 90 Hz

→ best value to be determined experimentally

Determining constant values

• Material: speech produced by Fluency text-to-speech, manipulated using PRAAT scripts

• Five subjects compared 22 speech fragment pairs with different values for one constant

• Subjects had to indicate:
  – Which fragment sounded most natural, or
  – Which had the best expression of suspense

Results: storytelling style

Constant                              Range           Outcome
Max. pitch increase                   40 – 90 Hz      40 Hz
Intensity increase                    2 – 6 dB        2 dB
Global tempo (syllables per second)   3.0 – 3.6 sps   3.6 sps
Vowel duration increase               0 or 50%        50%
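
To illustrate how a global tempo target such as 3.6 syllables per second could be imposed on synthesized speech, the sketch below rescales a list of phoneme durations uniformly. The function name, the uniform scaling, and the way syllables are counted are assumptions made for illustration; the slides do not say how tempo was changed in the PRAAT scripts.

```python
def scale_to_tempo(durations_ms, syllable_count, target_sps=3.6):
    """Uniformly rescale phoneme durations to reach a target tempo.

    durations_ms   -- phoneme (and pause) durations in milliseconds
    syllable_count -- number of syllables in the fragment (assumed known)
    target_sps     -- desired tempo in syllables per second
    """
    total_s = sum(durations_ms) / 1000.0
    if total_s <= 0 or syllable_count <= 0:
        raise ValueError("fragment must have positive duration and syllables")
    current_sps = syllable_count / total_s
    # A slower target tempo means longer durations, so the factor is
    # current tempo divided by target tempo.
    factor = current_sps / target_sps
    return [d * factor for d in durations_ms]


if __name__ == "__main__":
    # Four syllables in one second (4 sps), slowed down to 2 sps.
    print(scale_to_tempo([250, 250, 250, 250], 4, target_sps=2.0))
```

A uniform factor is the simplest choice; a closer imitation of a human storyteller would probably lengthen pauses and selected vowels more than other segments.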

Results: sudden climax

Constant                            Range         Outcome
Intensity rise at start of climax   6 – 10 dB     6 dB
Pitch rise at start of climax       80 – 120 Hz   80 Hz
Subsequent pitch rise               0 – 200 Hz    0 Hz

“Everybody waited in silence, and then ... there was a loud bang!”
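
The intensity rise is given in decibels, so applying it to a waveform requires a linear factor. The following is a minimal sketch, assuming the rise is applied to sample amplitudes with the usual 20·log10 convention; the function names are hypothetical and do not come from the PRAAT scripts used in the experiments.

```python
def db_to_amplitude_factor(delta_db):
    """Return the linear amplitude factor for a level change in dB."""
    return 10.0 ** (delta_db / 20.0)


def apply_intensity_rise(samples, delta_db):
    """Scale every sample of a fragment by the given level change."""
    factor = db_to_amplitude_factor(delta_db)
    return [s * factor for s in samples]


if __name__ == "__main__":
    # A 6 dB rise roughly doubles the amplitude.
    print(round(db_to_amplitude_factor(6.0), 3))
```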

Results: increasing climax

Constant                  Range                                    Outcome
Pitch contour             start at 25 – 50 Hz, top at 60 – 80 Hz   start at 25 Hz, top at 60 Hz
Vowel duration increase   50 – 100%                                50%

“Step by step he jumped from stone to stone, slipped on the last stone and… fell into the water.”

[Examples on the slide: neutral; pitch contour manipulated]

Pilot test of conversion rules

• 16 speech fragments:
  – 8 ‘neutral’ (Fluency, with no manipulation)
  – 8 manipulated using PRAAT according to conversion rules, using best constant values
• Eight subjects rated storytelling quality, naturalness, and suspense on a five-point scale (subjects divided into two groups)

Pilot test results

Compared to neutral fragments:
• Storytelling quality of manipulated fragments was rated equal or better
• Naturalness of manipulated fragments was rated equal or lower
• Manipulated fragments were rated as having more suspense, even if only the ‘global storytelling style’ was used

Implementation

annotated text input
→ partial synthesis (Fluency)
→ neutral prosodic information
→ application of conversion rules
→ narrative prosodic information
→ resynthesis (Fluency)
→ narrative speech

Prosodic information = list of phonemes with pitch and duration values (not possible to adjust intensity)

Example annotated text

Annotation: extension of SSML.

<speak>

<style type=narrative/>

<s> The beard made him look <accent extend=yes> so </accent> ugly that everybody ran away when they saw him. </s>

<s> He wanted to turn around <climax type=sudden> and then </climax> there was a loud bang. </s>

<s> Bluebeard raised the big knife, <climax type=increasing> he wanted to strike and <climax_top/> there was a knock on the door. </climax> </s>

</speak>
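
As an illustration of how such annotations might be read, the sketch below collects every climax span and its type with Python's standard `xml.etree.ElementTree`. It assumes the attribute values are quoted so that the input is well-formed XML (the slide omits the quotes), and the helper name `find_climaxes` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Sample input modelled on the slide, with attribute values quoted
# (an assumption needed for a standard XML parser to accept it).
SAMPLE = """<speak>
<style type="narrative"/>
<s> He wanted to turn around <climax type="sudden"> and then </climax> there was a loud bang. </s>
<s> Bluebeard raised the big knife, <climax type="increasing"> he wanted to strike and <climax_top/> there was a knock on the door. </climax> </s>
</speak>"""


def find_climaxes(markup):
    """Return (type, text) pairs for every <climax> element."""
    root = ET.fromstring(markup)
    found = []
    for climax in root.iter("climax"):
        # itertext() also yields the text after <climax_top/>.
        text = " ".join("".join(climax.itertext()).split())
        found.append((climax.get("type"), text))
    return found


if __name__ == "__main__":
    for kind, text in find_climaxes(SAMPLE):
        print(kind, "->", text)
```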

Example prosodic information

1: h 112

2: I: 151 50 75

3: R 75

4: l 75

5: @ 47 20 71 70 61

6: k 131

7: @ 55 80 70

8: _ 11 50 65

• Phoneme
• Duration (ms)
• Pitch percentage (specifying at which point during the phoneme the pitch value should be applied)
• Pitch value
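
The line format described above can be read with a few lines of code. This is a minimal sketch, assuming that the leading `N:` is only a line index and that the values after the duration come in (percentage, pitch) pairs; the `Phoneme` class and `parse_prosody_line` are illustrative names, not part of Fluency.

```python
from dataclasses import dataclass, field


@dataclass
class Phoneme:
    symbol: str
    duration_ms: int
    # (percentage into the phoneme, pitch value) pairs
    pitch_points: list = field(default_factory=list)


def parse_prosody_line(line):
    """Parse one line such as '5: @ 47 20 71 70 61'."""
    _index, rest = line.split(":", 1)  # drop the leading line index
    parts = rest.split()
    symbol, duration = parts[0], int(parts[1])
    values = [int(v) for v in parts[2:]]
    if len(values) % 2 != 0:
        raise ValueError("pitch data must come in (percentage, value) pairs")
    pairs = list(zip(values[0::2], values[1::2]))
    return Phoneme(symbol, duration, pairs)


if __name__ == "__main__":
    for raw in ["1: h 112", "2: I: 151 50 75", "5: @ 47 20 71 70 61"]:
        print(parse_prosody_line(raw))
```

Splitting on the first colon only keeps phoneme symbols that themselves contain a colon, such as `I:`, intact.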

Conversion steps

• Parse XML
• Look up phonemes to be manipulated
• Apply function

For example, pitch for global storytelling style:

\[ y'(t) = y(t) \cdot \left(1 + \frac{\sin\left(\frac{t - t_1}{t_2 - t_1} \cdot 0.5\pi + 0.25\pi\right)}{n}\right), \quad t_1 \le t \le t_2 \]

where n = average pitch / 40

• Return adapted values

NB: intensity cannot be adapted in Fluency
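
The pitch rule for the global storytelling style can be sketched as follows. The sketch takes the multiplication factor to be 1 + sin(...)/n, which is an assumption chosen so that a value at the average pitch is raised by at most 40 Hz; the function and parameter names are illustrative and not taken from the implementation.

```python
import math


def narrative_pitch(pitch_hz, t, t1, t2, average_pitch_hz, max_increase_hz=40.0):
    """Raise one pitch value inside an accented syllable [t1, t2].

    The sine argument runs from 0.25*pi at t1 to 0.75*pi at t2, so the
    increase peaks in the middle of the syllable.
    """
    if t2 <= t1 or not (t1 <= t <= t2):
        return pitch_hz  # outside the syllable: leave unchanged
    n = average_pitch_hz / max_increase_hz
    phase = (t - t1) / (t2 - t1) * 0.5 * math.pi + 0.25 * math.pi
    return pitch_hz * (1.0 + math.sin(phase) / n)


if __name__ == "__main__":
    # A flat 100 Hz contour over a syllable from 0.0 s to 0.2 s.
    for t in (0.0, 0.05, 0.1, 0.15, 0.2):
        print(t, round(narrative_pitch(100.0, t, 0.0, 0.2, 100.0), 1))
```

With an average pitch of 100 Hz, n is 2.5, so a 100 Hz value at the syllable midpoint becomes 140 Hz, while values at the syllable edges are raised by about 28 Hz.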

Evaluation of implementation

• Set-up similar to conversion rule pilot test
• 16 fragments (8 neutral / narrative pairs)
• 20 subjects, divided into two groups
• Rating storytelling quality, naturalness, and suspense on a five-point scale

Mean scores

Fragment       1         2         3         4         5         6         7         8
Storytelling   3.0 3.9   3.1 3.5   3.1 3.3   3.0 3.6   2.5 3.2   3.1 3.6   3.1 3.5   3.0 2.8
Naturalness    2.6 3.7   3.3 3.2   2.6 2.8   2.6 3.3   2.5 2.3   2.5 3.2   3.1 3.5   3.1 2.9
Suspense       2.1 3.7   2.5 3.1   2.5 2.8   2.1 3.0   1.8 2.2   2.3 3.6   2.7 3.4   2.4 4.0

Significant differences (p ≤ 0.05) are shown in bold face. Underlining indicates near significance.

Summing up the results

• Storytelling quality of manipulated fragments: rated above average, and better than neutral fragments (but hardly significant)

• Naturalness: ratings vary; some accents were seen as misplaced (though copied from original fragment)

• Suspense of manipulated fragments rated higher than neutral fragments (some significance)

Conclusions & future work

• Successful automatic conversion from standard text-to-speech to ‘storytelling prosody’

• Further improvement and larger-scale evaluation still needed

• Automatic derivation of features from text?