
Heriot-Watt University

Research Report

Detecting and Adapting Conversational Agent Strategy to User’s Emotions in Video Games

Author:

Brice Cagnol

Supervisor:

Prof. Oliver Lemon

Second Reader:

Prof. Frank Broz

A thesis submitted in fulfilment of the requirements

for the degree of MSc. Artificial Intelligence

in the

School of Mathematical and Computer Sciences

August 2017


Declaration of Authorship

I, Brice Cagnol, declare that this thesis titled, ’Detecting and Adapting Conversational

Agent Strategy to User’s Emotions in Video Games’, and the work presented in it, are my

own. I confirm that this work submitted for assessment is my own and is expressed in

my own words. Any uses made within it of the works of other authors in any form (e.g.,

ideas, equations, figures, text, tables, programs) are properly acknowledged at any point

of their use. A list of the references employed is included.

Signed:

Date:



“Words are pale shadows of forgotten names. As names have power, words have power.

Words can light fires in the minds of men. Words can wring tears from the hardest

hearts. There are seven words that will make a person love you. There are ten words

that will break a strong man’s will. But a word is nothing but a painting of a fire. A

name is the fire itself. ”

Patrick Rothfuss, The Name of the Wind (2007).


Abstract

With the constant increase in personal computing power and the democratisation of virtual reality, video game experiences are becoming more and more immersive. In this context, it could be worthwhile to implement dialogue systems that allow free speech. This project focuses on adding emotion detection to video games so that non-player characters adapt to users’ emotions. This would personalise users’ experiences and help immerse the player in the game’s world.

A video game prototype was made with emotion detection through camera and text. It features a dialogue system, written with OpenDial, that adapts to the detected emotions. This prototype was built on a previous one, realised in partnership with Speech Graphics, that allows free speech with a non-player character. In the new application, the user has to achieve a goal by talking to this non-player character. The plot and scene were built to make users express emotions like surprise, joy, sadness or anger.

An evaluation was run with two versions of the system, one with emotion detection enabled and the other without it. This project assesses whether users can notice that emotion detection is running and how it contributes to their experience in the game.


Acknowledgements

I would like to thank my supervisor Prof. Oliver Lemon for his patience, suggestions

and help throughout the second semester and the project realization.



Contents

Declaration of Authorship ii

Abstract iv

Acknowledgements v

Contents vi

List of Figures x

List of Tables xi

Abbreviations xii

Introduction 1

1 Literature Review 3

1.1 Use of Dialogues and Speech in Video Games . . . . . . . . . . . . . . . . 3

1.1.1 Dialogue System in Video Games . . . . . . . . . . . . . . . . . . . 3

1.1.1.1 Current Dialogue System . . . . . . . . . . . . . . . . . . 3

1.1.1.2 Natural Language Generation and Open Dialogue . . . . 4

1.1.1.3 Event[0], a Game with Open Dialogues . . . . . . . . . . 5

1.1.2 Text-to-Speech in Games . . . . . . . . . . . . . . . . . . . . . . . 6

1.1.3 Voice Recognition in Games . . . . . . . . . . . . . . . . . . . . . . 7

1.2 Detecting and adapting to user’s emotions . . . . . . . . . . . . . . . . . . 7

1.2.1 Sentiment Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

1.2.1.1 Emotions . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

1.2.1.2 Detecting Emotions through Facial Expressions . . . . . 8

1.2.1.3 Detecting Emotions through Text . . . . . . . . . . . . . 9

1.2.2 Adapting to User’s Emotions . . . . . . . . . . . . . . . . . . . . . 9

1.3 Previous Prototype . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

1.3.1 Speech Graphics’ Real Time Animation System . . . . . . . . . . . 10

1.3.2 Video Game Prototype . . . . . . . . . . . . . . . . . . . . . . . . 10

1.3.3 Dialogue Manager . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

1.3.4 Chatbot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12



2 Methodology 13

2.1 Project Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

2.2 Dialogue manager improvement . . . . . . . . . . . . . . . . . . . . . . . . 15

2.3 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

3 Requirements Analysis 17

3.1 Project Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

3.2 Video Game Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

3.2.1 Controls . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

3.2.1.1 Current Implementation . . . . . . . . . . . . . . . . . . . 18

3.2.2 Graphical User Interface . . . . . . . . . . . . . . . . . . . . . . . . 18

3.2.2.1 During dialogues . . . . . . . . . . . . . . . . . . . . . . . 18

3.2.2.2 Any time . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

3.2.3 Non-Player Character . . . . . . . . . . . . . . . . . . . . . . . . . 19

3.2.3.1 Current Implementation . . . . . . . . . . . . . . . . . . . 19

3.3 Conversational Agent . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

3.3.1 Text Input . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

3.3.1.1 Current Implementation . . . . . . . . . . . . . . . . . . . 20

3.3.2 Voice Recognition . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

3.3.3 Natural Language Understanding . . . . . . . . . . . . . . . . . . . 20

3.3.4 Emotions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21

3.3.4.1 Current Implementation . . . . . . . . . . . . . . . . . . . 21

3.3.5 Chatbot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21

3.3.5.1 Current Implementation . . . . . . . . . . . . . . . . . . . 21

4 Project Plan 22

4.1 Project Stakeholders . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22

4.2 Project Tasks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22

4.3 Gantt Chart . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24

4.4 Risk Management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

4.4.1 Risk Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

4.4.2 Contingency Plan . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

4.5 Changes in the Plan . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

5 Professional, Legal, Ethical and Social Issues 27

5.1 Professional Issues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27

5.2 Legal Issues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27

5.3 Ethical Issues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28

5.4 Social Issues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28

5.5 Ethical Approval . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28

6 Implementation 29

6.1 Presentation of the application . . . . . . . . . . . . . . . . . . . . . . . . 29

6.2 Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30

6.2.1 Dialogue Class . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30

6.2.2 Unity Application . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

6.2.2.1 Menu . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32

6.2.2.2 Game . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32


6.2.3 Java Application . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34

6.2.3.1 Initialization . . . . . . . . . . . . . . . . . . . . . . . . . 34

6.2.3.2 Main Loop . . . . . . . . . . . . . . . . . . . . . . . . . . 35

6.2.4 Determining which Emotion to Keep . . . . . . . . . . . . . . . . . 36

6.2.4.1 Emotion Detection with Affectiva . . . . . . . . . . . . . 36

6.2.4.2 Sentiment Analysis with Watson . . . . . . . . . . . . . . 37

6.2.5 Dialogue Manager with OpenDial Scripts . . . . . . . . . . . . . . 37

6.2.5.1 Semantic Language Understanding . . . . . . . . . . . . . 37

6.2.5.2 Emotion Manager . . . . . . . . . . . . . . . . . . . . . . 37

6.2.5.3 Dialogue Manager . . . . . . . . . . . . . . . . . . . . . . 38

6.2.5.4 Natural Language Generation . . . . . . . . . . . . . . . 38

6.2.5.5 Time Manager . . . . . . . . . . . . . . . . . . . . . . . . 38

6.3 Decisions Taken on Design of the Implementation . . . . . . . . . . . . . . 38

6.3.1 Technologies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38

6.3.2 Game Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39

6.3.2.1 Layout of the Scene . . . . . . . . . . . . . . . . . . . . . 39

6.3.2.2 Making Users Express Emotions . . . . . . . . . . . . . . 39

6.3.3 Removing the Chatbot . . . . . . . . . . . . . . . . . . . . . . . . . 40

6.4 Difficulties Met During the Implementation . . . . . . . . . . . . . . . . . 40

6.5 Critical Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41

6.6 Sample Dialogues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41

6.6.1 Full Sample Dialogue Without Emotions . . . . . . . . . . . . . . . 42

6.6.2 Dialogue Samples With Emotions . . . . . . . . . . . . . . . . . . . 42

7 Evaluation 43

7.1 Description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43

7.1.1 Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43

7.1.2 Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43

7.1.3 Questionnaire . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44

7.1.3.1 About the User . . . . . . . . . . . . . . . . . . . . . . . 44

7.1.3.2 About the Application . . . . . . . . . . . . . . . . . . . 44

7.1.3.3 About the Video Game Industry . . . . . . . . . . . . . . 45

7.1.4 Objective Metrics . . . . . . . . . . . . . . . . . . . . . . . . . . . 45

7.2 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45

7.2.1 Ease of Completion . . . . . . . . . . . . . . . . . . . . . . . . . . . 46

7.2.2 Independence of the Two Groups . . . . . . . . . . . . . . . . . . . 46

7.2.3 Emotion Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . 47

7.2.3.1 Feeling Like the Character Reacts to User’s Emotions . . 47

7.2.3.2 Emotion Detection through Camera and Text . . . . . . 48

7.2.4 Contribution of Emotion Detection to Users’ Enjoyment . . . . . . 49

7.2.5 Suggestions of Possible Improvements . . . . . . . . . . . . . . . . 50

7.2.5.1 About the Game Itself . . . . . . . . . . . . . . . . . . . 50

7.2.5.2 About the Conversational Agent . . . . . . . . . . . . . . 50

7.2.6 Application to the Industry . . . . . . . . . . . . . . . . . . . . . . 50

8 Future Works 52

8.1 Emotion Detection and Management . . . . . . . . . . . . . . . . . . . . . 52


8.2 Open Dialogue . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52

8.3 Create a Game . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53

Conclusion 54

A Ethical Approval 55

B Consent Form 58

C Evaluation Instructions 60

D Questionnaire 62

E Risk Assessment Form 65

References 67


List of Figures

1.1 Selection of the main character’s dialogue act in The Elder Scrolls V: Skyrim [Bethesda Softworks, 2011] . . . . . . . . . . . . . . . . . . . . . . 4

1.2 Screenshot of the video game prototype (from a video capture made by Aaron Walwyn). The door on the right is locked and the character next to it is the NPC called Leia. . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

2.1 Project Architecture Diagram . . . . . . . . . . . . . . . . . . . . . . . . . 13

4.1 Gantt Chart . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24

4.2 Project Schedule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

6.1 Screenshot of the application showing the AI . . . . . . . . . . . . . . . . 29

6.2 Project Architecture Diagram . . . . . . . . . . . . . . . . . . . . . . . . . 30

6.3 Sample XML data for a dialogue object . . . . . . . . . . . . . . . . . . . 31

6.4 Main Menu of the Game . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32

6.5 Unity application’s main scripts . . . . . . . . . . . . . . . . . . . . . . . . 32

6.6 Starting the conversation from DialogueManager.cs . . . . . . . . . . . . . 33

6.7 Sending new Dialogue instance from VoiceRecognition.cs . . . . . . . . . . 33

6.8 Initializing the Dialogue System . . . . . . . . . . . . . . . . . . . . . . . . 34

6.9 Initializing UDP connection . . . . . . . . . . . . . . . . . . . . . . . . . 34

6.10 Initializing Tone Analyzer service and two variables . . . . . . . . . . . . . 35

6.11 Opening the door . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36

6.12 Calculating new confidence for Affectiva . . . . . . . . . . . . . . . . . . . 37

6.13 Scene’s view from Unity Editor . . . . . . . . . . . . . . . . . . . . . . . . 39

7.1 Age of Participants for each System . . . . . . . . . . . . . . . . . . . . . 46

7.2 How often Users Play Video Games (x: mark, y: number of people) . . . . 47

7.3 Number of people feeling like there is or is not emotion detection for each system . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48

7.4 Number of people for every different answer given to the question “I enjoyed using this app” for each system . . . . . . . . . . . . . . . . . . . 49


List of Tables

4.1 Project Stakeholders . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22

4.2 Risk Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

6.1 Full dialogue sample without emotions . . . . . . . . . . . . . . . . . . . . 42

6.2 Dialogue samples with emotions . . . . . . . . . . . . . . . . . . . . . . . . 42

7.1 Objective Metrics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45


Abbreviations

NPC Non-Player Character

GUI Graphical User Interface

TTS Text-To-Speech

ASR Automatic Speech Recognition

SLU Spoken Language Understanding

DM Dialogue Manager

NLG Natural Language Generation


Introduction

In many role-playing video games, the story is brought to life by dialogues with non-player characters. Most of the time, these dialogues are fully scripted. However, some games offer the player different choices in the dialogue that can affect its outcome, but these choices are not driven by speech interaction: they are displayed on the screen and the player simply selects one.

Meanwhile, conversational agents keep improving. Natural language understanding and generation have become particularly effective, so their industrial applications are now numerous. The best known today are personal assistants like Apple’s Siri and Microsoft’s Cortana, which provide many features.

Even though conversational agents are on the rise, they do not seem to have reached the video game industry, although they could have interesting applications there. Some games include voice recognition for giving orders, but none seem to integrate fully implemented conversational agents.

During the Conversational Agents course, we made, with Aaron Walwyn and Halvin Dufour, a video game prototype including a conversational agent. The project was supervised by Prof. Oliver Lemon and done in partnership with Speech Graphics. The prototype was linked to a dialogue manager, including language understanding and natural language generation.

The objectives of the present research project were to build on this initial prototype system in order to:

1. Create a new prototype with a dialogue situation, driven by a dialogue manager, in which the player can achieve a goal that makes sense in a video game (complete a mission, buy an item, etc.),


2. Allow it to detect the user’s emotions with sentiment analysis,

3. Make the agent react to those emotions to enrich the dialogue, and

4. Evaluate whether it contributes to a better experience for the user than the first dialogue manager.

The present report first reviews, through a literature review, the current status of dialogues in video games and current research on sentiment analysis. It then explains the methodology and the requirements of this project as planned before its realisation, with a reflection on its evaluation. Chapter 4 presents the project plan, detailing the stakeholders, the Gantt chart and risk management. The next chapter presents the professional, legal, ethical and social issues. The report then details the final implementation and the evaluation setup and results. Finally, it presents suggestions for possible future work before concluding.


Chapter 1

Literature Review

1.1 Use of Dialogues and Speech in Video Games

This section presents the current use of dialogues and speech in video games, to give an understanding of the issues they raise for the video game industry.

1.1.1 Dialogue System in Video Games

1.1.1.1 Current Dialogue System

In most games, dialogues are fully linear. The characters’ dialogue acts are written by the game’s writers and cannot be modified by the player. However, some role-playing games prompt the player to choose what the main character will say within a range of possible utterances (see Fig. 1.1). The next non-player character (NPC) utterance differs depending on the player’s choice. In most cases, one choice always leads to the same response; in the same way, the last NPC utterance often leads to the same range of possible responses for the player. Nevertheless, the main character’s characteristics, like charisma or intelligence, can unlock more choices. Moreover, the same choice can randomly lead to two different answers, with the character’s characteristics affecting the probability of each possible response. Navigation through the dialogue can thus be modelled as a Bayesian network. For example, an NPC could propose a mission to the player that would be rewarded with 100 gold pieces. If the main character’s charisma is too low, the player would have only two options: accept or decline. Otherwise, he or she would have


Figure 1.1: Selection of the main character’s dialogue act in The Elder Scrolls V: Skyrim [Bethesda Softworks, 2011]

one more option, which would be to demand 200 gold pieces instead of 100. If the main character’s charisma is high, the NPC would be more likely to increase the reward.

This kind of dialogue is not used in many games, as it can greatly affect plot continuity. For example, if the player has decided to join a specific guild in the game, this changes the possible dialogues with many characters. Thus, giving choices to the player makes the writing process of the dialogues longer and more complex, as the writers must ensure plot consistency and that the player cannot be blocked in some situations by his or her choices. Moreover, this system is meaningful only in games focusing on their narrative. We can conclude that this dialogue system cannot be used in every game and that the decision to implement it must be taken at the beginning of the game project.

1.1.1.2 Natural Language Generation and Open Dialogue

Until recently, there were few projects using natural language generation in games [Reed et al., 2011]. This paper states that there had been no successful attempts at integrating ambitious dialogue systems: the “ambitious early plans for flexible dialogue variation in Spore [EA, 2008]” did not succeed. Reed et al. did, however, create a prototype for a role-


playing game with natural language generation: SpyFeet. They planned to improve this

process to personalise the dialogues.

Beyond natural language generation, there have been attempts to integrate open dialogue systems into games, where the user can talk freely to the agent via a microphone. Fifteen years ago, dialogue systems were designed for very specific tasks [Lemon, 2002], like booking a flight. The dialogue therefore looked more like form-filling than a real conversation, and the dialogue model was a finite-state machine. O. Lemon proposed “complex spoken conversational interactions with NPCs” and made a UAV game in which the user could collaborate with the UAV to achieve his or her goals. The UAV had a small scope and would always give responses consistent with the game. This application showed that using open dialogue in video games could be relevant.

Since then, there have been projects making open dialogue in games with a wider scope. In his system, Dernoncourt [2012] implemented two classes of conversational agents in AIML. The first was “non-task-oriented”, allowing the user to chat “on any topic with a friendly relationship such as ALICE [Wallace, 2009]”. The other was “task-oriented”, with “a goal assigned to them in their design”. In this case, the user can talk freely and will always receive a consistent response, though not necessarily one within the scope of the game. According to F. Dernoncourt, the current system with its few possibilities “decreases the motivation and the reflection” of the user.

These papers offer interesting reflections on the implementation of open dialogue and natural language generation in games, each presenting different motivations and solutions. However, we can observe that research on open dialogue in games progresses slowly. Today, conversational agents seem sufficiently advanced to support more prototypes and, later, applications in the video game industry.

1.1.1.3 Event[0], a Game with Open Dialogues

In Event[0] [Event[0], 2016], the user can talk freely with Kaizen, an AI, by typing into a terminal [Semel, 2010]. The dialogue is driven by a chatbot made for the game. The AI answers the player depending on his or her utterance, but also on the current situation and the AI’s mood, which gives it a personality. As the AI is not perfect, it is mainly used to


improve the player’s immersion and help him or her progress. Also, because the character is defined as an AI, its defects can easily be understood by the player, which would have been impossible with a human character.

1.1.2 Text-to-Speech in Games

In modern games, most spoken NPC utterances are written and performed by actors. Text-to-Speech (TTS), the generation of a voice output from a text input, is not currently used for dialogues in video games. Moonbase Alpha [Virtual Heroes, Army Game Studio, NASA, 2010], a multiplayer simulation of lunar exploration, uses TTS for its internal chat. This can be useful for players, as they can follow what their team-mates write in the chat without having to read it. TTS has not been used in video game dialogues because:

game dialogues because:

• It is only necessary with utterances that were not written during the making of

the game, like chatbots’ responses. Otherwise, the dialogues could be dubbed.

• Most TTS services are cloud based so games using them would not be playable

offline.

• If there are multiple characters to design, it could be complex and expensive to

create as many different voices.

• It is not easy to get a realistic voice with believable emotions. This is especially true if the game’s publisher needs to translate the game into many languages.

In the case of Moonbase Alpha, the use of TTS is relevant: it is already an online game, the utterances cannot be dubbed since they are written by players, and the voice does not need to express emotions.

In conclusion, using TTS would not be relevant in every game, especially if dialogues are all written in advance. However, it would be necessary if dialogues are generated within the game.


1.1.3 Voice Recognition in Games

Automatic speech recognition (ASR) is the process of converting a voice input into a text output. It was not used in early video games, as most personal computers and consoles did not have compatible microphones.

Some video games can be commanded by voice, especially tactics games. In Tom Clancy’s EndWar [Ubisoft, 2008], a real-time strategy game, the player can control his or her troops with voice commands. However, these commands only allow the player to perform the same actions as with a keyboard and a mouse. One of the only games in which interactions with an NPC use voice recognition is Hey You, Pikachu! [Nintendo, 1998], released on the Nintendo 64. In this game, the player can interact with Pikachu, a wild creature that the player has to tame. The user can greet or make requests to Pikachu, which reacts differently depending on its confidence in the player and on his or her utterance.

Finally, using voice recognition as a main feature is complicated. First, it implies the use of a microphone: if the player is in a noisy room, his or her experience could be badly affected. Thus, this mechanic would have to be used in games adapted to it, like role-playing games, but it would not be meaningful with the current dialogue system; it would be more interesting for the player to be able to say any utterance instead of choosing between a few.

1.2 Detecting and adapting to user’s emotions

1.2.1 Sentiment Analysis

1.2.1.1 Emotions

Human communication is driven by emotions [Burkhardt et al., 2009]. However, most conversational agents neither express emotions nor detect the user’s.

Detecting emotions can be used to adapt the conversational agent’s strategy to the emotion of the user. This could lead to more personalised conversations, closer to conversations between humans than to talking with simple agents. However,


as explained by Holzapfel and Fuegen [2002]: “Unfortunately, there is no agreement on a unique definition of emotions. So no universally valid classification scheme can be found that can be used by both the emotion recognizer and the dialogue system in a domain independent way.” They also explain that emotions have a physical and a cognitive part, called bodily and mental emotions [Picard and Picard, 1997]. This implies that an agent could need to see its interlocutor to clearly identify his or her emotions. Emotions can be perceived as continuous values [Lang et al., 1990]. In this case, an emotion is determined by its valence and its arousal. Valence “refers to the organism’s disposition to assume either an appetitive or defensive behavioral set” and arousal “to the organism’s disposition to react with varying degrees of energy or force” (Lang), which would be the two drivers of emotions. Emotions can also be perceived as discrete [Ekman, 1992]. P. Ekman defines basic emotions, like anger, enjoyment and fear, using nine characteristics to distinguish them.
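As an illustration of the continuous view, a valence/arousal pair can be discretised into coarse labels. The sketch below is purely illustrative: the thresholds and label choices are assumptions made for this example and are not taken from Lang or Ekman.

```java
// Illustrative only: maps a (valence, arousal) pair, with valence in [-1, 1]
// and arousal in [0, 1], to a coarse emotion label. Thresholds are arbitrary.
public final class EmotionQuadrant {

    public static String label(double valence, double arousal) {
        if (arousal < 0.3) {
            // Low arousal: calm states.
            return valence >= 0 ? "neutral" : "sadness";
        }
        // High arousal: the sign of the valence separates joy from anger.
        return valence >= 0 ? "joy" : "anger";
    }

    public static void main(String[] args) {
        System.out.println(label(-0.8, 0.9)); // anger
        System.out.println(label(0.7, 0.8));  // joy
        System.out.println(label(-0.5, 0.1)); // sadness
    }
}
```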

In many cases, sentiment analysis is used for binary classification of emotions [Cambria et al., 2013]; for example, we may want to know a user’s agreement with an opinion. However, the webcams installed on many devices now make multimodal sentiment analysis possible, although Cambria et al. explain that “multimodal sentiment analysis hasn’t been fully explored”.

There are now multiple ways of predicting a user’s emotions, for instance from facial expressions or from text inputs.

1.2.1.2 Detecting Emotions through Facial Expressions

Many APIs rely on facial expressions to recognise emotions, and therefore only on their physical aspect. The Emotion API [Microsoft, 2016] recognises eight emotions: anger, contempt, disgust, fear, happiness, neutral, sadness and surprise. From a picture source, it attributes to each emotion a confidence, and these confidences sum to 1. The emotions with scores closest to 1 are the most probable ones.
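Selecting the detected emotion from such an output amounts to keeping the label with the highest confidence. A minimal Java sketch is given below; the emotion names and scores are invented examples, not actual API output.

```java
import java.util.Map;

// Illustrative sketch: given per-emotion confidences that sum to 1,
// keep the emotion with the highest score.
public final class MostLikelyEmotion {

    public static String argmax(Map<String, Double> confidences) {
        return confidences.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey)
                .orElse("neutral");
    }

    public static void main(String[] args) {
        Map<String, Double> scores = Map.of(
                "anger", 0.05, "happiness", 0.70,
                "neutral", 0.15, "surprise", 0.10);
        System.out.println(argmax(scores)); // happiness
    }
}
```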

Affectiva [Affectiva, 2009] is another API for emotion recognition through a camera. It can recognise the same emotions as Microsoft’s Emotion API. The likelihood of each emotion depends on the facial expression of the user [Affectiva, 2009]. For example, widened eyes and an open mouth increase the likelihood of anger, while a smile decreases it.


1.2.1.3 Detecting Emotions through Text

Watson’s Tone Analyzer [IBM, 2015] analyses text and evaluates the emotion, language style and social tendencies of its writer(s). It attributes a confidence to five emotions: anger, disgust, fear, joy and sadness. Socher et al. [2013] suggested that sentiment analysis requires extensive training and complex models. They created a “Sentiment Treebank”, which includes over 200,000 phrases labelled with a sentiment, to train a “Recursive Neural Tensor Network”; with this, they achieved an accuracy of 80.7% on sentiment analysis.

For the present project, detecting emotions from the user’s utterances is particularly interesting, as no camera is needed to perform this detection.

1.2.2 Adapting to User’s Emotions

In their study, Burkhardt et al. [2009] built a voice portal to test their system. They proposed a range of strategies to make the agent react to users’ negative emotions. For example, if the agent detects slight anger, it shows that it has perceived the user’s anger but also tries to explain “that it is better to continue the task”. If the agent detects strong anger, it shows empathy and proposes that the user talk to a human agent. Finally, 70% of the users who tested this portal “judged the use of emotion detection in voice portal system reasonable”. This leads to the conclusion that adapting the agent’s strategy to the user’s emotion can positively affect the conversation.

Holzapfel and Fuegen [2002] built a framework allowing conversational agents to adapt to the user’s emotion. They suggest that this could be used in robots: since robots can accidentally damage objects, they should follow a user’s orders only when their confidence is high, and could thus be wary of angry users who might create dangerous situations.

Finally, many works have focused on negative emotions, especially anger, since adapting conversational agents to these emotions allows them to comfort the user and to correct or prevent errors. In the case of a video game, both positive and negative emotions could be used, as dialogues are not only meant to achieve goals but also to immerse the player in the game’s universe.


1.3 Previous Prototype

This section presents the prototype we made with Aaron Walwyn and Halvin Dufour, in partnership with Speech Graphics [Hofer and Berger, 2010], during the Conversational Agents course between January and April 2017. The prototype includes one NPC; the user’s goal is to talk to this NPC to open a locked door. The game prototype handles ASR and TTS, while a Java program generates the NPC’s responses from the user’s utterances.

1.3.1 Speech Graphics’ Real Time Animation System

Speech Graphics is a company based in Edinburgh and specialised in facial animation for the video game industry. Its main software solution, SGX, generates facial animations with expressions and lip synchronisation from an audio input and its transcript.

Speech Graphics’ real-time system goes further, as the character’s face can be animated while the audio input is being loaded. Thus, lip synchronisation can be managed while recording from a microphone or while generating the audio output with TTS. We used this feature in our project to animate the NPC’s face in a realistic way.

1.3.2 Video Game Prototype

The scene was made with the Unity [Unity Technologies, 2005] game engine. The environment looks like the inside of a spaceship, and a locked door is guarded by an AI called Leia, projected on a screen (see Fig. 1.2), which is the only NPC in the prototype.

ASR is performed with the Windows 10 Voice SDK, which is used by the virtual personal assistant Cortana [Microsoft, 2014]. Being embedded in Windows 10, it allows efficient voice recognition in real time; however, it is not compatible with other operating systems. TTS is performed by the Text to Speech service from Watson [IBM, 2006].

1.3.3 Dialogue Manager

The generation of responses from the user’s utterances is handled by a Java program. It receives and sends dialogue acts via UDP in XML format.


Figure 1.2: Screenshot of the video game prototype (from a video capture made by Aaron Walwyn). The door on the right is locked and the character next to it is the NPC called Leia.

Dialogue management is performed by OpenDial [Language Technology Group, University of Oslo, 2014]. This open-source toolkit makes it possible to write a fully functional dialogue manager in XML based on a Bayesian network. As stated by Lison and Kennington

[2015]: “The toolkit relies on probabilistic rules to represent the internal models of the

domain in a compact and human-readable format.” This makes this toolkit easy to use,

as anyone can find the rules and edit them by changing probabilities or adding phrases

and utterances. Also, “The toolkit is in our view particularly well-suited to handle dia-

logue domains that exhibits both a complex state-action space and partially observable

environments. Due its hybrid modelling approach, the toolkit is able to capture such

dialogue domains in a relatively small set of rules and associated parameters, allowing

them to be tuned from modest amounts of training data, which is a critical require-

ment in many applications” (Lison and Kennington). As we needed to make a fully functional prototype in a few weeks, OpenDial allowed us to catch many different phrases from the user and return consistent responses with a few rules and without training data. Moreover, the probabilistic rules within OpenDial allowed the conversational agent to randomly give different responses to the same request. For example, once the user has given the right password to open the door, the NPC can say “Correct” or “Opening the door...”.


We separated the domain into three parts (a minimal sketch of this pipeline is given after the list):

• The spoken language understanding, which converts the user’s utterances into requests. For example, the utterance “Open the door!” becomes a Request(Open).

• The dialogue manager, which associates a request with a response. In our previous example, it would be DoorProtected.

• The natural language generator, which generates the NPC’s utterance. With DoorProtected, it could return: “Sorry, this door is password protected. Do you know the password?”
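The actual domain was written as OpenDial XML rules, but the three-stage logic of the door example can be mirrored in plain Java. The sketch below is illustrative only and does not reproduce the real OpenDial domain files.

```java
import java.util.List;
import java.util.Random;

// Illustrative sketch of the SLU -> DM -> NLG pipeline for the locked-door
// example. The real system expressed these mappings as OpenDial XML rules.
public final class DoorDialoguePipeline {

    private static final Random RNG = new Random();

    static String understand(String utterance) {            // SLU
        String u = utterance.toLowerCase();
        if (u.contains("open the door")) return "Request(Open)";
        return "Unknown";
    }

    static String decide(String dialogueAct) {               // DM
        return dialogueAct.equals("Request(Open)") ? "DoorProtected" : "Fallback";
    }

    static String generate(String action) {                  // NLG, with a random
        if (action.equals("DoorProtected")) {                 // choice between variants
            List<String> variants = List.of(
                "Sorry, this door is password protected. Do you know the password?",
                "I cannot open it without the password.");
            return variants.get(RNG.nextInt(variants.size()));
        }
        return "I did not understand that.";
    }

    public static void main(String[] args) {
        System.out.println(generate(decide(understand("Open the door!"))));
    }
}
```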

1.3.4 Chatbot

We implemented a chatbot so that a response could still be returned when the dialogue manager could not produce one from the user’s utterance (i.e. when the utterance does not match any phrase in the language understanding). We first used Cleverbot [Carpenter, 1997], but as we could not fully configure it, its responses were not always satisfactory; for example, it would not answer “Leia” if the user asked what the NPC’s name is. A chatbot written in AIML, like Rosie or S.U.P.E.R. [Pandorabots, 2013], could be used instead so that we could edit it.


Chapter 2

Methodology

This chapter describes the architecture planned for the implementation and evaluation of the project. The current implementation is detailed in Chapter 6; the evaluation’s description and results can be found in Chapter 7.

2.1 Project Architecture

Figure 2.1: Project Architecture Diagram


The application will be separated into two programs, like our previous one (see section 1.3): first, a video game prototype made with the Unity engine, which includes the ASR and TTS; second, a program written in Java, which includes the dialogue manager and a chatbot. The two programs communicate by sending XML data via UDP. The XML data contains the last utterance; its confidence, i.e. the probability, between 0 and 1, of having correctly matched the user’s utterance; game logic if necessary (opening a door, losing money, getting a new item, etc.); and the associated emotion, like fear, surprise, anger, contempt, etc. Though many APIs can detect between 6 and 8 emotions, these would probably be grouped into no more than 4 or 5 emotions in the implementation.
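A hypothetical example of such a message, and of sending it over UDP from Java, is sketched below. The element names, host and port are illustrative assumptions and do not reflect the exact format used by the prototype.

```java
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

// Illustrative sketch: sends one dialogue-act message as XML over UDP.
// Element names, host and port are hypothetical, not the prototype's format.
public final class DialogueActSender {

    public static void main(String[] args) throws Exception {
        String xml =
            "<dialogueAct>"
            + "<utterance>Open the door!</utterance>"
            + "<confidence>0.87</confidence>"
            + "<gameLogic>none</gameLogic>"
            + "<emotion>anger</emotion>"
            + "</dialogueAct>";

        byte[] payload = xml.getBytes(StandardCharsets.UTF_8);
        try (DatagramSocket socket = new DatagramSocket()) {
            DatagramPacket packet = new DatagramPacket(
                payload, payload.length,
                InetAddress.getByName("127.0.0.1"), 5005); // hypothetical port
            socket.send(packet);
        }
    }
}
```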

The two programs will be autonomous. Thus, even if one of them is not working, I

could still work on the second one, compile it and run it. This would allow flexibility in

development and problem solving: I would not necessarily have to pause a task to fix bugs and could always focus on one program at a time.

The video game prototype’s scripts will be separated so that modifying one of them should not imply many changes in the others. In the same way, the dialogue manager and the chatbot of the Java application will be independent. While the dialogue manager handles utterances within the game’s context and can generate game logic, the chatbot handles all off-topic utterances.

As in our previous project (see subsection 1.3.3), the dialogue manager will be written in XML and interpreted by the OpenDial toolkit. It will be separated into three files:

• The natural language understanding, which catches specific patterns. For example, if the user’s utterance begins with “who” and ends with “you”, it considers that the user is requesting more information about the AI.

• The dialogue manager itself, which decides what action to take in each dialogue situation.

• The natural language generator, which can return different responses for the same action. For the request for more information, the NPC could either say “My name is Brian” or “I’m Brian, from Edinburgh”.


If the dialogue manager returns nothing, the user’s utterance is sent to the chatbot. The chatbot does pattern matching to return a response but is not associated with specific actions. Thus, the program always sends a response to the game prototype.
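This fallback behaviour can be summarised in a few lines of Java. The interfaces below are hypothetical placeholders (the chatbot itself would be an external AIML engine); the sketch only illustrates the routing logic.

```java
// Illustrative fallback logic: ask the dialogue manager first and only fall
// back to the chatbot when no in-domain response was produced, so the game
// always receives some reply. Both interfaces are hypothetical placeholders.
interface DialogueManager { String respond(String utterance); } // may return null
interface Chatbot         { String respond(String utterance); } // never returns null

final class ResponseRouter {
    private final DialogueManager dm;
    private final Chatbot chatbot;

    ResponseRouter(DialogueManager dm, Chatbot chatbot) {
        this.dm = dm;
        this.chatbot = chatbot;
    }

    String respond(String utterance) {
        String inDomain = dm.respond(utterance);
        return inDomain != null ? inDomain : chatbot.respond(utterance);
    }
}
```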

2.2 Dialogue manager improvement

OpenDial makes it easy to write a dialogue manager architecture. It is therefore possible to improve every aspect of the dialogue manager in any order and to run tests quickly. After each test, I could improve the dialogue manager incrementally, by adding variables, phrases and actions.

If these tests are performed with different people, it becomes possible to know which words are difficult to recognise and to edit the language understanding to overcome this issue. I could also find new phrases that should be handled by the dialogue manager.

2.3 Evaluation

The main goal of this project is to evaluate the improvement that sentiment analysis could bring to dialogues in video games. For example, if the user is talking to a bartender NPC, the latter could react differently depending on the user’s emotion. When the player asks for a drink, the NPC would always answer “wait a second” if the program does not perform sentiment analysis. Otherwise, if the player seems joyful, the agent could say “here is your drink”; if the player seems angry, the NPC could answer “hey man, calm down”. It would therefore be important to perform evaluations throughout the project.
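This bartender example boils down to selecting a different surface response to the same request depending on the detected emotion; a hypothetical Java sketch:

```java
// Illustrative only: chooses the bartender's reply to a drink request
// depending on the detected emotion (or its absence).
public final class BartenderResponses {

    public static String replyToDrinkRequest(String emotion) {
        if (emotion == null) return "Wait a second.";       // no sentiment analysis
        switch (emotion) {
            case "joy":   return "Here is your drink!";
            case "anger": return "Hey man, calm down.";
            default:      return "Wait a second.";
        }
    }

    public static void main(String[] args) {
        System.out.println(replyToDrinkRequest("anger")); // Hey man, calm down.
    }
}
```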

I would first evaluate the Unity prototype. The goal is to ensure that the controls, the environment and the GUI are well designed. To do so, the application should be tested by video game players as well as by people new to video games, to ensure that the prototype is accessible to everybody.

The second evaluation concerns sentiment analysis. If only one API is implemented, the goal would be to ensure that the detected emotion is always consistent with the user’s emotion. After each utterance, the user should say which emotion he or she tried to


express, and we would compare it to the emotion detected by the program. Another person could also state which emotion he or she perceived, to compare it with the application’s output. If more than one API is implemented, the goal would be to evaluate which one gives the best results. If sentiment analysis through text is performed, for example with the Tone Analyzer from IBM’s Watson, it would be interesting to evaluate whether it could replace sentiment analysis through a camera.

The final evaluation will consist of verifying whether the adaptation of the NPC to the user’s emotions contributes to a better experience. It would be a within-subjects evaluation, as the goal is to compare the application with emotion detection and without it. A questionnaire would be filled in by every tester and every conclusion would be based on its analysis.


Chapter 3

Requirements Analysis

This chapter states the originally planned requirements that the implementation would

follow. Any change from the requirements to the current implementation is detailed in

subsections called Current Implementation.

In this chapter, the word ”could” refers to optional requirements and ”should” to manda-

tory ones.

3.1 Project Overview

The project focuses on the management of dialogues in video games. The main idea is to integrate open voice dialogue in games instead of offering the player a choice among possible sentences written on the screen. To do so, the video game prototype should include at least one NPC with which the user can interact. The prototype should allow open dialogue: the player can talk freely, and the NPC should always answer him or her in a consistent way. The application should also detect the emotions of the user, with the NPC adapting its strategy to them. Finally, the NPC’s face should be animated with the Realtime System from Speech Graphics.


3.2 Video Game Design

3.2.1 Controls

The prototype should be intuitive for video game players to use. Thus, the controls should be the same as those found in modern video games, using the keyboard and the mouse. The ’W’, ’S’, ’A’ and ’D’ keys should move the player forward, backward, left and right respectively. Moving the mouse should orient the camera, and the left click should be used to interact with the objects and NPCs in the prototype.

For any additional controls:

• If the action is common in video games, the prototype should use the most usual control to perform it. For example, the space bar should be used for a jump action.

• If the action is more specific to the prototype, the corresponding key should be easily accessible to the player, like the ’Q’ and ’E’ keys.

3.2.1.1 Current Implementation

No additional commands have been added to the program. The player can move with the ’W’, ’A’, ’S’ and ’D’ keys and interact with the character or any object with the left click.

3.2.2 Graphical User Interface

3.2.2.1 During dialogues

The GUI should display to the player some possible sentences he or she could say to make the dialogue progress. The application should allow this help to be shown or hidden by pressing a key.

Both the user’s and the NPC’s utterances should be displayed on the screen when they are spoken, so that the user can check at any time:


• whether what he or she said has been correctly recognised,

• the NPC’s response, if the user did not hear it clearly.

A button should allow the user to turn the microphone on or off, so that he or she can turn it off to interact with somebody outside the game.

3.2.2.2 Any time

If the user can interact with the NPC or the object at the centre of the screen, the GUI

should display the possible action.

If some controls are not obvious, the GUI could show their names and associated com-

mands on the screen.

The prototype could include a menu to quit or restart the game.

3.2.3 Non-Player Character

The prototype should include a goal. This goal should be achieved by interacting with

at least one NPC.

The role of every NPC should be easily understood by the user. Their environment and

words should be consistent with their role.

3.2.3.1 Current Implementation

The GUI does not display possible sentences as it would have been difficult to implement

this feature correctly. Instead, some hints were given in the evaluation instructions (see

Appendix C) so the user could always figure out what to do.

The user cannot turn off the microphone. This feature is not necessary as the application

is short (see Chapter 6) but would be necessary in longer games to allow the user to

take a break at any time.

The prototype does not include a menu.


3.3 Conversational Agent

3.3.1 Text Input

During the evaluation, it may happen that the recognizer does not capture the user’s utterances properly. This could be due to the user’s accent or difficulty speaking, to noise in the room, or to technical issues like a non-working microphone. In this case, the prototype should allow the user to type the input instead of saying it.

3.3.1.1 Current Implementation

The user cannot type written inputs but can repeat his or her utterance. Cortana’s

option ”Recognize non-native accents for this language” was enabled so specific accents

could be recognized.

3.3.2 Voice Recognition

The speech recognition should only work with spoken English, as implementing more languages would be outside the project’s scope.

If proper nouns are used to refer to characters or places that may be useful in the prototype, they should be recognised. Thus, every name should be easily pronounceable, subject to exceptions.

3.3.3 Natural Language Understanding

The dialogue manager should catch specific phrases in order to give consistent answers. There is more than one possible phrase to express the same idea, so the dialogue manager should handle as many ways of expressing each idea as possible.

Some phrases can be wrongly interpreted by the speech recognizer. For example, it could understand “the store” when the player actually said “this door”. In this case, the dialogue manager should, if possible, catch this phrase and treat it as the expected one.
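One simple way to handle such systematic misrecognitions is to normalise known confusions before pattern matching. The sketch below is illustrative; the mapping entries are hypothetical examples.

```java
import java.util.Map;

// Illustrative sketch: rewrites phrases the recognizer is known to confuse
// before they reach the language-understanding rules. The entries are examples.
public final class AsrNormaliser {

    private static final Map<String, String> KNOWN_CONFUSIONS = Map.of(
            "the store", "this door",
            "opened the door", "open the door");

    public static String normalise(String utterance) {
        String result = utterance.toLowerCase();
        for (Map.Entry<String, String> e : KNOWN_CONFUSIONS.entrySet()) {
            result = result.replace(e.getKey(), e.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(normalise("Can you open the store?")); // can you open this door?
    }
}
```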


3.3.4 Emotions

The application should at least detect whether the user’s emotion is positive, negative or neutral. It could then detect more precise emotions like anger, sadness or boredom.

The agent should adapt its responses to these emotions in a consistent way. For example, it could try to motivate the user if he or she feels bored.

If emotion detection is done through the camera, it should be implemented in the Unity prototype. If sentiment analysis is performed on the utterances, it should be implemented in the Java program.

3.3.4.1 Current Implementation

Sentiment analysis and emotion detection through the camera were implemented and can recognise six emotions: neutral, joy, sadness, anger, disgust and fear/surprise (see Chapter 6).

3.3.5 Chatbot

The chatbot should be editable, so that its behaviour can be made consistent with the video game prototype: its responses would match the personality and the environment of the character. The chatbot could be written in AIML and should be implemented in the Java code. It could be based on an existing chatbot like S.U.P.E.R.

3.3.5.1 Current Implementation

No chatbot has been included in the current application. The reasons for this decision are detailed in section 6.3.3.


Chapter 4

Project Plan

This chapter states the initially expected plan, with project stakeholders and risk man-

agement. Section 4.5 details how the project was finally conducted.

4.1 Project Stakeholders

Stakeholder                      Needs
Speech Graphics (company)        Offering new ways of managing dialogues in video games
Video Game Industry              Offering new dialogue experiences to customers
Video Game Consumers             Having new experiences with dialogues in video games
Research Community               Accessibility and utility of this work
Project Supervisor and Myself    Getting relevant results

Table 4.1: Project Stakeholders

4.2 Project Tasks

Making the video game prototype:

• Choosing an interesting dialogue context,

• Design

• Implementation:

– Create the Unity Prototype


– Build the SLU, DM and NLG

– Implement the chatbot and adapt it to the video game context

• Validate prototype

Sentiment analysis:

• Test different APIs

• Choose the most relevant ones

• Implement the chosen APIs

• Evaluate them to select the most practical and accurate one

Adapting agent’s strategy to user’s emotions:

• Design

• Implementation

• Evaluation


4.3 Gantt Chart

Figure 4.1: Gantt Chart


Figure 4.2: Project Schedule

4.4 Risk Management

4.4.1 Risk Analysis

Risk                                               Impact   Probability
Impossible to implement a sentiment analysis API   H        M
Implementation is behind schedule                  H        H
Prototype does not work                            H        M
Plan changes                                       M        H
Student illness                                    H        L

(H = High, M = Medium, L = Low)

Table 4.2: Risk Analysis


4.4.2 Contingency Plan

To overcome any risk, the current schedule allows some flexibility to complete every task, as 13 days are dedicated to solving problems.

More than one API will be studied, so there would be alternatives if one API could not be implemented.

The dialogue manager will be independent of the video game prototype. If the latter does not work, I could still implement the sentiment analysis and the agent strategy, and would find and discuss a backup plan with my supervisor in the meantime.

4.5 Changes in the Plan

There were five main steps in the development of the project:

1. Getting the Unity application running. There were a few issues in making it run on the laptop on which the project was developed.

2. Implementing Affectiva and Watson to have the technical part of the project done.

3. Writing the dialogue and choosing a method to select the emotion between the one detected with Affectiva and the one detected by Watson's Tone Analyzer.

4. Designing and running the evaluation.

5. Analysing the results and writing the report.


Chapter 5

Professional, Legal, Ethical and

Social Issues

5.1 Professional Issues

The code produced during this project will respect the British Computer Society code of conduct. It will be provided with complete documentation for compiling and editing the code.

As the project is done in partnership with Speech Graphics, any opinions and design advice coming from the company must be respected.

5.2 Legal Issues

Speech Graphics' Realtime System will not be distributed, due to a non-disclosure agreement made with Speech Graphics. Software and libraries used in this project will be used in accordance with their licenses.

Apart from Speech Graphics' system, the project's source code will be distributed under the MIT license.


5.3 Ethical Issues

The evaluation will be run with the full informed consent of the users and with no deception. All data will be anonymised so that nobody can be identified by reading it. A camera will be running during the evaluation for emotion detection, but no picture will be kept or published in any way.

5.4 Social Issues

As players could talk freely with a conversational agent and a chatbot, some responses might offend the users. Any utterance from the chatbot that could be offensive will be deleted.

5.5 Ethical Approval

The Ethical Approval form filled in on Heriot-Watt's MSc project-system website can be found in Appendix A.


Chapter 6

Implementation

6.1 Presentation of the application

The implementation results in a small-scale 3D video game application. The scene takes place in a spaceship. The user controls the main character, who wakes up alone and finds nobody but an AI.

By talking to the AI, the user has to figure out his or her goal (leaving the spaceship by opening an initially locked door) and achieve it before the end of a hidden timer. Dialogue is conducted by talking to the AI through a microphone; the AI responds to every user utterance through an audio output.

Figure 6.1: Screenshot of the application showing the AI


6.2 Architecture

Figure 6.2: Project Architecture Diagram

The system consists of two programs running independently. The first is a Unity game featuring the 3D environment, the game mechanics, speech recognition and TTS. The second is a Java program including Watson's Tone Analyzer for sentiment analysis and the OpenDial libraries and scripts to manage the dialogues.

The Unity application records users' utterances and sends them to the Java application in an XML format via UDP. The latter generates an answer with the OpenDial scripts and sends the AI's response back, also via UDP and in the same format.

6.2.1 Dialogue Class

In the two programs, a dialogue class was made with these attributes:

• Speech (string): the last utterance,

• Confidence (float): confidence for the emotion detected between 0 and 100,

• Action (string): an action to execute (e.g. open the door),


• Emotion (string): the emotion detected through camera,

• Time (int): remaining time in seconds.

In both programs, each utterance leads to a new instance of this class. An XML string is then built from the instance's attributes and sent to the other program via UDP. The Unity application uses an XmlSerializer object, and the Java application uses Unmarshaller and Marshaller objects, to create and read the XML data.

<?xml version="1.0" encoding="utf-16"?>

<dialogue xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"

xmlns:xsd="http://www.w3.org/2001/XMLSchema">

<action>action</action>

<speech>why am i alone</speech>

<confidence>12</confidence>

<emotion>neutral</emotion>

<time>636</time>

</dialogue>

Figure 6.3: Sample XML data for a dialogue object
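The Java-side counterpart of this class can stay very small. The following is a minimal sketch, assuming JAXB annotations are used so that the Marshaller and Unmarshaller mentioned above can map the object to XML like the sample in Figure 6.3; the field names mirror the attributes listed in this section, but the accessor names and annotation details are illustrative rather than the project's exact code.

import javax.xml.bind.annotation.XmlElement;
import javax.xml.bind.annotation.XmlRootElement;

// Maps to the <dialogue> element exchanged over UDP (see Figure 6.3).
@XmlRootElement(name = "dialogue")
public class Dialogue {

    private String speech;      // last utterance
    private float confidence;   // confidence for the detected emotion (0-100)
    private String action;      // action to execute, e.g. "open_door"
    private String emotion;     // emotion detected through the camera
    private int time;           // remaining time in seconds

    public Dialogue() { }       // no-argument constructor required by JAXB

    @XmlElement public String getSpeech()    { return speech; }
    @XmlElement public float getConfidence() { return confidence; }
    @XmlElement public String getAction()    { return action; }
    @XmlElement public String getEmotion()   { return emotion; }
    @XmlElement public int getTime()         { return time; }

    public void setSpeech(String s)    { this.speech = s; }
    public void setConfidence(float c) { this.confidence = c; }
    public void setAction(String a)    { this.action = a; }
    public void setEmotion(String e)   { this.emotion = e; }
    public void setTime(int t)         { this.time = t; }
}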

6.2.2 Unity Application

The game was made with Unity 5.6.1f1. C# scripts were written with Microsoft Visual Studio 2015 Community edition. The game uses scripts made by Speech Graphics for the facial animation of the AI's 3D model, the Unity SDK for Watson services, the Unity SDK for Affectiva, and free assets from the Unity Asset Store to build the spaceship corridors.

The first version of this application was made by Aaron Walwyn during the Conversational Agents course. The application was mostly edited to add emotion detection and management.

The game contains two scenes: the main menu and the game itself.


6.2.2.1 Menu

Figure 6.4: Main Menu of the Game

The main menu consists of a simple screen with a text box in which the IP address of the computer running the Java program must be typed (the ports are set up in the code). This IP address can be the local one if the two programs are running on the same computer.

The spaceship scene is then loaded by clicking on Play Spaceship Demo.

6.2.2.2 Game

Figure 6.5: Unity application’s main scripts


In the game, once the user has clicked on the AI, a new instance of Dialogue is created in the DialogueManager.cs script with no speech but an action start_conversation. This instance is sent to the Java program to start the conversation and have the AI say the first utterance. Voice recognition then starts in the VoiceRecognition script, using the DictationRecognizer from the UnityEngine.Windows.Speech library, which relies on Cortana's speech recognition.

Dialogue d = new Dialogue (null, 1f, "start_conversation");

if (LevelManager.instance.dialogue != null)

LevelManager.instance.dialogue.Send(d);

PlayerController.instance.voice.StartInput();

Figure 6.6: Starting the conversation from DialogueManager.cs

A script called NetworkController sends and receives the XML data through Send() and Listen() methods. The XmlSerializer object used to serialize and deserialize the Dialogue instance lives in the DialogueClient.cs script, in which the Dialogue class is also defined.

The speech recognition works continuously, so the user can talk at the same time as the AI. Once the program has detected the end of the user's dictation, it creates an instance of Dialogue in VoiceRecognition.cs with the speech, the current emotion and its confidence (set up in EmotionManager.cs) and the remaining time. Finally, the Dialogue instance is sent to the Java program with the Send() function of the DialogueClient class, using the NetworkController.

Dialogue d = new Dialogue(_text, emotion.currentEmotion,

emotion.confidence, 360 - Mathf.CeilToInt(Time.time));

if (LevelManager.instance.dialogue != null)

LevelManager.instance.dialogue.Send(d);

Figure 6.7: Sending new Dialogue instance from VoiceRecognition.cs

When the NetworkController receives an AI response, the Receive() function from DialogueClient is called to create a new instance of Dialogue. If there is an action in this instance, it is sent to an ActionController; the DoorInteraction class inherits from this ActionController. If the action is open_door, a boolean called locked takes the value true, so the player can then open the initially locked door by clicking on it. The Dialogue instance is then added to a queue. In the Update() function of DialogueClient, if the queue is not empty, we remove its first Dialogue and send it to the TextToSpeechControl so that an audio output can be generated with Watson's TTS service. This class also starts the facial animation of the 3D model, with lip synchronisation to the audio, using the scripts from Speech Graphics.

6.2.3 Java Application

The Java application was written in Java 8 with the IntelliJ IDEA 2016.2.5 integrated development environment made by JetBrains. It uses the Java SDK for the Watson Tone Analyzer service and the libraries needed to run OpenDial's scripts. It has two classes: the Dialogue class, with constructors to instantiate it and getters and setters to manipulate its attributes, and a Main class.

6.2.3.1 Initialization

First, an OpenDial Domain is built from our scripts, and a DialogueSystem using this Domain is created. We then choose not to show OpenDial's graphical user interface and finally start the system.

//Creating the dialogue system

Domain domain = XMLDomainReader.extractDomain("scripts/project.xml");

DialogueSystem system = new DialogueSystem(domain);

//GUI off

system.getSettings().showGUI = false;

system.startSystem();

Figure 6.8: Initializing the Dialogue System

Then, the UDP connection is set up with the IP address of the laptop on which the Unity application is running and the ports used.

// Address and port as arguments

InetAddress host = InetAddress.getByName(args[0]);

int portRec = Integer.parseInt(args[1]);

int portSend = Integer.parseInt(args[2]);

//Create socket

DatagramSocket socket = new DatagramSocket(portRec);

Figure 6.9: Initializing UDP connexion


Finally, the Tone Analyzer service is set up as well as two variables: timeMin, the

remaining time in minutes and doorOpen, a boolean which is true when the door is

open.

// Watson for sentiment analysis

ToneAnalyzer service = new ToneAnalyzer("2016-05-19", "user", "password");

// Remaining time in minutes

int timeMin = 6;

// Is the door currently open?

boolean doorOpen = false;

Figure 6.10: Initializing Tone Analyzer service and two variables

6.2.3.2 Main Loop

A buffer is created to receive the XML data from the Unity application. The packet is then received, and an instance of Dialogue is made with an Unmarshaller object.

If the Dialogue's action is set to start_conversation, its speech is also set to start_conversation, which will later be interpreted by the Dialogue Manager.

We run the Tone Analyzer service on the speech and get the main emotion with its confidence. The emotion is then chosen between the one detected by the camera and the one estimated with Watson's service (see Section 6.2.4). The chosen emotion is finally set in the Dialogue Manager.

Then, the remaining time is managed by setting the right value for the Dialogue Manager system's timeRemaining variable (a small sketch of this conversion follows the list):

• if the time in minutes is greater than 0, timeRemaining = time in minutes + " minutes", e.g. "5 minutes";

• if the time is smaller than 60 seconds but greater than 10, timeRemaining = time in seconds + " seconds";

• if the time in seconds is smaller than 10, timeRemaining = "endNear".
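As a minimal sketch of that conversion in the Java application (the helper name is illustrative; only the three cases above come from the described behaviour):

// Converts the remaining time in seconds into the value given to timeRemaining.
static String formatTimeRemaining(int secondsLeft) {
    int minutes = secondsLeft / 60;
    if (minutes > 0) {
        return minutes + " minutes";        // e.g. "5 minutes"
    } else if (secondsLeft > 10) {
        return secondsLeft + " seconds";    // e.g. "42 seconds"
    } else {
        return "endNear";                   // triggers the "too late" response
    }
}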

We then add the speech as an input to the Dialogue Manager system and get the answer. An instance of Dialogue called response is made with this answer. If the value of the system's variable openDoor is true and doorOpen is false, doorOpen becomes true and response's action is set to open_door.

// Actions

if (!doorOpen && system.getContent("openDoor").toString().equals("true"))

{

response.setAction("open_door");

doorOpen = true;

}

Figure 6.11: Opening the door

Finally, the response is converted into XML data with a Marshaller object and sent to

the Unity application before the program loops.
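Put together, one pass of this loop can be sketched as follows. This is a simplified illustration under stated assumptions: the Dialogue class is JAXB-annotated as sketched in Section 6.2.1, the socket, host and port come from the initialisation shown in Figure 6.9, the encoding handling is simplified (the actual exchange may use UTF-16 as in Figure 6.3), and the sentiment analysis, emotion selection and OpenDial interaction described above are left as a placeholder comment.

import javax.xml.bind.JAXBContext;
import javax.xml.bind.JAXBException;
import javax.xml.bind.Marshaller;
import javax.xml.bind.Unmarshaller;
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

class DialogueLoop {

    // One turn of the main loop: receive a Dialogue from Unity, build a response,
    // and send it back. Only the transport and (un)marshalling are shown here.
    static void handleTurn(DatagramSocket socket, InetAddress host, int portSend)
            throws IOException, JAXBException {
        JAXBContext context = JAXBContext.newInstance(Dialogue.class);
        Unmarshaller unmarshaller = context.createUnmarshaller();
        Marshaller marshaller = context.createMarshaller();

        // Receive the XML data sent by the Unity application.
        byte[] buffer = new byte[4096];
        DatagramPacket packet = new DatagramPacket(buffer, buffer.length);
        socket.receive(packet);
        String xml = new String(packet.getData(), 0, packet.getLength(), StandardCharsets.UTF_8);

        // Rebuild the Dialogue instance from the XML string.
        Dialogue incoming = (Dialogue) unmarshaller.unmarshal(new StringReader(xml));

        Dialogue response = new Dialogue();
        // ... set the response's speech and action here, using the Tone Analyzer,
        // the camera emotion carried by 'incoming' and the OpenDial system, plus the
        // openDoor/doorOpen logic described in this section (omitted) ...

        // Convert the response back to XML and send it to the Unity application.
        StringWriter writer = new StringWriter();
        marshaller.marshal(response, writer);
        byte[] data = writer.toString().getBytes(StandardCharsets.UTF_8);
        socket.send(new DatagramPacket(data, data.length, host, portSend));
    }
}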

6.2.4 Determining which Emotion to Keep

In each case, the emotion is chosen among:

• neutral,

• joy,

• anger,

• disgust,

• sadness,

• surprise/fear (Affectiva's surprise and fear and Watson's fear are assimilated here).

Once an emotion has been chosen from Affectiva and another from Watson's Tone Analyzer, the one kept is the one with the greatest confidence. The next subsections explain how these confidences, between 0 and 100, are determined.

6.2.4.1 Emotion Detection with Affectiva

Affectiva gives a confidence between 0 and 100 for every emotion, and these confidences sum to 100. We choose the emotion other than neutral with the greatest confidence. If its confidence is greater than 5, we keep this emotion and calculate a new confidence; otherwise, the emotion is set to neutral.


confidence = confidence * 100 / (currentAnger + currentDisgust +

currentFear + currentJoy + currentSadness + currentSurprise);

Figure 6.12: Calculating new confidence for Affectiva

6.2.4.2 Sentiment Analysis with Watson

Confidences in Watson are between 0 and 1 but do not sum to 1, and there is no score attributed to a neutral emotion. We keep the emotion with the greatest confidence if that confidence is greater than 0.1. The new confidence is then (confidence of the kept emotion × 100) / (sum of all confidences). Otherwise, the emotion is automatically the one from Affectiva.
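The selection logic described in this section can be summarised by the following sketch. The class and method names are illustrative (the actual code is spread between the Unity scripts and the Java Main class); only the thresholds of 5 and 0.1 and the normalisation come from the behaviour described above.

import java.util.Map;

// One detector's result: an emotion label and a confidence between 0 and 100.
class EmotionEstimate {
    final String emotion;
    final double confidence;
    EmotionEstimate(String emotion, double confidence) {
        this.emotion = emotion;
        this.confidence = confidence;
    }
}

class EmotionSelection {

    // Affectiva: scores per emotion in [0, 100]. Keep the strongest non-neutral
    // emotion only if its score is greater than 5, renormalising its confidence
    // over the sum of the six emotion scores; otherwise fall back to neutral.
    static EmotionEstimate fromAffectiva(Map<String, Double> scores) {
        String best = "neutral";
        double bestScore = 0, sum = 0;
        for (Map.Entry<String, Double> e : scores.entrySet()) {
            sum += e.getValue();
            if (!e.getKey().equals("neutral") && e.getValue() > bestScore) {
                best = e.getKey();
                bestScore = e.getValue();
            }
        }
        if (bestScore <= 5 || sum == 0) return new EmotionEstimate("neutral", 0);
        return new EmotionEstimate(best, bestScore * 100 / sum);
    }

    // Watson: scores per emotion in [0, 1], with no neutral score. Keep the
    // strongest emotion only if its score is greater than 0.1, normalised likewise.
    static EmotionEstimate fromWatson(Map<String, Double> scores) {
        String best = null;
        double bestScore = 0, sum = 0;
        for (Map.Entry<String, Double> e : scores.entrySet()) {
            sum += e.getValue();
            if (e.getValue() > bestScore) {
                best = e.getKey();
                bestScore = e.getValue();
            }
        }
        if (best == null || bestScore <= 0.1) return new EmotionEstimate("neutral", 0);
        return new EmotionEstimate(best, bestScore * 100 / sum);
    }

    // Final choice: the estimate with the greatest confidence wins. When Watson
    // found nothing above its threshold, its confidence is 0, so the Affectiva
    // emotion is used automatically.
    static String chooseEmotion(EmotionEstimate camera, EmotionEstimate text) {
        return (text.confidence > camera.confidence) ? text.emotion : camera.emotion;
    }
}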

6.2.5 Dialogue Manager with OpenDial Scripts

OpenDial's scripts are written in an XML format. The Dialogue Manager has several variables, including userEmotion for the emotion of the user, timeRemaining for the remaining time and openDoor to trigger the action of opening the door. Dialogue management is split into 5 steps:

6.2.5.1 Semantic Language Understanding

The semantic language understanding (SLU) step associates the user's utterance with a function. As there are many ways of expressing the same idea, the SLU can catch different patterns. For example, to express the idea of asking "What is the situation?", associated with the function AskSituation, the SLU can catch these patterns: "* What * going on*", "*explain * happen*", "*tell * situation*", etc., where * means any string.

6.2.5.2 Emotion Manager

The emotion manager combines an idea with the emotion of the user. For example, if the player's emotion is anger, then AskSituation becomes AskSituationAnger. In most cases, there are specific functions for one or two emotions and all the other emotions are merged into one function. For example, AskSituation is mapped to AskSituationAnger if the user's emotion is anger or disgust, to AskSituationSurprise for surprise, and to AskSituation otherwise.


6.2.5.3 Dialogue Manager

The dialogue manager step is unused here. It could have been used if the AI had to give different answers to the same user utterance and emotion depending on other variables.

6.2.5.4 Natural Language Generation

The natural language generation (NLG) step associates a function with a response. For example:

• AskSituationAnger is associated with “Hum, we are crashing into the sun.”,

• AskSituationSurprise is associated with “I’m not telling you if you are not being

nice.”,

• AskSituation is associated with “Believe me or not but our spaceship will crash

into the sun.”.

The NLG can also change the value of the openDoor variable when the AI says that it is opening the door.
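To make the chaining of these steps more concrete, a rule of each kind could look roughly like the following in OpenDial's XML domain format. This is only a hedged sketch: the variable names u_u (user utterance), a_u (user act) and u_m (machine utterance) follow common OpenDial conventions, the intermediate variable a_u_emo is invented for illustration, and the project's actual scripts may organise models, patterns and probabilities differently.

<!-- SLU: map an utterance pattern to a function -->
<model trigger="u_u">
  <rule>
    <case>
      <condition>
        <if var="u_u" relation="contains" value="going on"/>
      </condition>
      <effect prob="1">
        <set var="a_u" value="AskSituation"/>
      </effect>
    </case>
  </rule>
</model>

<!-- Emotion manager: specialise the function according to userEmotion -->
<model trigger="a_u,userEmotion">
  <rule>
    <case>
      <condition>
        <if var="a_u" value="AskSituation"/>
        <if var="userEmotion" value="anger"/>
      </condition>
      <effect prob="1">
        <set var="a_u_emo" value="AskSituationAnger"/>
      </effect>
    </case>
  </rule>
</model>

<!-- NLG: map the emotion-specific function to the AI's response -->
<model trigger="a_u_emo">
  <rule>
    <case>
      <condition>
        <if var="a_u_emo" value="AskSituationAnger"/>
      </condition>
      <effect prob="1">
        <set var="u_m" value="Hum, we are crashing into the sun."/>
      </effect>
    </case>
  </rule>
</model>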

6.2.5.5 Time Manager

The time manager changes the response if timeRemaining has a value other than none. If timeRemaining matches the pattern "* minutes" or "* seconds", it appends it, followed by "left.", to the response. For example, with the function AskSituation and timeRemaining equal to "3 minutes", the response becomes: "Believe me or not but our spaceship will crash into the sun. 3 minutes left." If timeRemaining is equal to "endNear", the response becomes: "It's too late... Farewell, my friend."
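The same behaviour can be expressed as plain Java for clarity; in the actual system it is a rule in the OpenDial scripts, and the helper name below is illustrative.

// Appends the remaining time to the response, or replaces it when the end is near.
static String applyTimeManager(String response, String timeRemaining) {
    if (timeRemaining.equals("endNear")) {
        return "It's too late... Farewell, my friend.";
    }
    if (timeRemaining.endsWith(" minutes") || timeRemaining.endsWith(" seconds")) {
        return response + " " + timeRemaining + " left.";
    }
    return response;   // timeRemaining is "none": response unchanged
}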

6.3 Decisions Taken on Design of the Implementation

6.3.1 Technologies

Affectiva was chosen for emotion detection because a Unity SDK exists for it, making it easy to integrate into a Unity game.


Some choices were made before the present project. Unity was chosen because it is one of the simplest game engines to use, and many APIs are easy to integrate into Unity with the right SDKs. UDP was chosen over TCP: as the two programs run on the same laptop, UDP is faster, with a very limited risk of corrupted or lost data.

6.3.2 Game Design

6.3.2.1 Layout of the Scene

Figure 6.13: Scene’s view from Unity Editor

The three most important objects in the scene were placed close to each other so that the user could easily find them and would not have to move much during the evaluation. A screen displays the AI. A sign shows that escape pods can be found behind a specific door; it was placed there to give users a hint about what is behind the locked door.

6.3.2.2 Making Users Express Emotions

Many choices were made in order to make users express different emotions during the

evaluation.


The goal of the user is not given before the evaluation. This was intended to surprise the player when he or she learns that the spaceship will crash into the Sun. Also, the locked door and the limited time with a hidden timer were expected to create a situation of urgency, so the player could easily get annoyed by the AI refusing to open the door. By making the AI ask the player whether he or she would leave it alone (see dialogue samples in Section 6.6), the user could be led to express sadness.

6.3.3 Removing the Chatbot

It was decided to remove the chatbot from the implementation. As it did not react to users' emotions, it would not have contributed to the evaluation in a sensible way. Moreover, it might have helped make the conversations go off-topic. It would be more useful in full games without a time limit.

6.4 Difficulties Met During the Implementation

The first difficulty was getting the Unity application to work: it initially did not run on the laptop used for this project. I then had to learn how it works in order to use it adequately and add the emotion detection. Some of the features initially planned (see Chapter 3 Requirements Analysis) were not implemented because of the difficulty they would have represented (such as allowing the user to type their utterances instead of talking).

The other main difficulty was writing a dialogue manager that understands many different possible sentences and gives sensible responses to them. As I am not a native speaker, it was difficult for me to think of all possible ways of expressing an idea. The implementation would have needed more tests with different people to make it more complete. The OpenDial toolkit also makes the process of writing dialogues long. This report develops ideas to make the writing process easier and faster in Chapter 8 Future Works.


6.5 Critical Evaluation

The present implementation is promising. The dialogues flow well, as the voice recognition runs in real time and the program does not take long to answer each user utterance. The lip synchronisation makes the AI feel like it is really talking. Even though it does not express any emotion, this is not an issue here, since it is supposed to be an AI, but it could become problematic for human characters.

The presence of emotion detection makes each run of the game unique. Even though the emotion detection through the camera did not give good results, the sentiment analysis on the text allows different responses to a single idea simply by changing its phrasing.

Affectiva only ever recognized joy; it never detected other emotions like sadness or anger. This may be due to the lighting, the camera used, the background behind the users, or to mistakes in the configuration of Affectiva in Unity.

The current game does not feature an end when the user succeeds or fails (a Game Over scene). The dialogue is also very slow to start when clicking on the AI, which made the users wonder whether it was working.

There are some technical issues in the program. Sometimes the voice recognition stops, so the application has to be restarted, which breaks the user's immersion. It is also sometimes possible to open the door at the wrong time, which may come from an error in OpenDial's scripts.

The dialogue manager is not complete. More possibilities should be added, as well as more patterns to catch, to prevent the AI from saying "I did not understand" too often.

6.6 Sample Dialogues

These samples were obtained by writing directly in OpenDial's GUI. Getting the same behaviour from the AI in the current implementation could be achieved by tricking Watson's sentiment analysis, e.g. by adding "It makes me sad but" before the intended sentence.


6.6.1 Full Sample Dialogue Without Emotions

AI: Hey Passenger
User (neutral): What is going on?
AI: Believe me or not but our spaceship will crash into the sun.
User (neutral): What should I do?
AI: Actually, you are in a pretty bad situation. Maybe the only solution for you is to go through this door and escape.
User (neutral): I will escape then.
AI: Are you leaving me alone here?
User (neutral): Maybe I can bring you on a flash drive.
AI: You can't save me... but thanks for trying. Just go now.

Table 6.1: Full dialogue sample without emotions

6.6.2 Dialogue Samples With Emotions

User (anger): What is going on?
AI: I'm not telling you if you are not being nice.
User (anger): I'm calm now.
AI: No you're not
User (sadness): I'm calm now.
AI: I did not think this would affect you this much. Sorry.
User (neutral): I'm calm now.
AI: So now you can repeat your question

User (neutral): I will escape then.
AI: Are you leaving me alone here?
User (neutral): I can't do anything for you.
AI: Will you remember me?
User (neutral): Yes, I will never forget you.
AI: Thank you. Now, go, my friend.

User (anger): I can't do anything for you.
AI: We'll die together then.

User (sadness): I can't do anything for you.
AI: If there is no other solution, just go. I hope you would not forget me...

User (neutral): Blade Runner is the best movie.
AI: I did not understand what you've said.
User (joy): Blade Runner is the best movie.
AI: I don't get how you can be happy in this situation.
User (disgust): Blade Runner is the best movie.
AI: You look revolted by the situation.

Table 6.2: Dialogue samples with emotions


Chapter 7

Evaluation

7.1 Description

7.1.1 Objectives

The main objectives of the evaluation were:

• To check whether users could perceive if emotion detection was enabled or not in the application, and

• To assess the contribution of emotion detection to the enjoyment of the application.

Secondly, this evaluation aims to estimate and compare the accuracy of the emotion detection by Affectiva and of the sentiment analysis by Watson. Finally, it surveys opinions about how emotion detection could impact the video game industry.

7.1.2 Setup

A second version of the implementation described in Chapter 6 was made. It uses exactly the same Unity application and OpenDial scripts as the first one; however, the emotion is always set to "neutral", simulating the deactivation of emotion detection.


A between-subjects evaluation was run with these two systems. A within-subjects design would not have been effective, as a user testing the same application twice would not express the same emotions during the second run as during the first one. For example, he or she would not be surprised when the AI says that the spaceship will crash into the Sun. Moreover, he or she would already know the objective, which is supposed to be learnt during the test.

Each participant had to fill in and sign a consent form (see Appendix B). He or she was then given instructions (see Appendix C), which gave an idea of the game's plot, explained how to play and gave some advice in case the user did not know what to do or say during the test.

The evaluation took only a few minutes, as there is a five-minute limit in the application. Participants were asked to wear headphones to prevent the system from hearing itself. At the end of the test, the user was given a questionnaire to fill in (see Appendix D), assessing the enjoyment of using the application and whether the user felt like the AI reacted to his or her emotions.

7.1.3 Questionnaire

In this questionnaire, every Likert scale goes from 1 (strongly disagree) to 6 (strongly agree), so there is no neutral answer.

7.1.3.1 About the User

The first section of the questionnaire, About You, asked for the age of the user (in intervals) and how often he or she plays video games (on a Likert scale). These questions were used to verify the independence of the two groups of users, as these two factors could bias the results.

7.1.3.2 About the Application

The second section, The Application, assesses the enjoyment of the player and whether he or she felt like the Non-Playable Character reacted to his or her emotions, on a Likert scale. These two questions are the most important, as they were used to achieve the main objectives of the evaluation.

7.1.3.3 About the Video Game Industry

The third section, Video Game Industry, surveys the opinion of testers about what emotion detection could bring to the video game industry. First, it assesses, according to the user, the usefulness of free speech and emotion detection in games. Then, an open question allows the tester to explain his or her opinion. The final question asks about other possible domains where emotion detection could be used.

7.1.4 Objective Metrics

Dialogues were logged to keep track of objective metrics used in the analysis of the

results.

Metric: Number of turns in the dialogue
Use: Estimate the length of the conversation

Metric: Number of times the AI did not understand what the user said
Use: Evaluate how difficult it is to be understood by the AI

Metric: Completion of the task
Use: Assess the ease of completing the task with the given instructions

Metric: Did the user ask for help?
Use: Assess the ease of knowing what to say or do to complete the task

Metric: Number of times an emotion other than "neutral" was perceived by the camera
Use: Evaluate the contribution of emotion detection through the camera

Metric: Number of times an emotion other than "neutral" was perceived by sentiment analysis
Use: Evaluate the contribution of emotion detection through sentiment analysis

Table 7.1: Objective Metrics

7.2 Results

Eleven people took part in this evaluation; six without emotion detection enabled and

five with it. All the testers are students or researchers at Heriot-Watt University.


7.2.1 Ease of Completion

During the evaluation, only one person did not complete the task, by running out of time. One user stated that the instructions were helpful to complete the task. 45% of the users asked the AI for help. We can conclude that the task was achievable, but it is important to help users, who cannot always figure out what to do or say.

7.2.2 Independence of the Two Groups

A first Chi-Square test was performed on the age of the testers depending on the system used: χ²(2) = 1.397, p = 0.497. The p-value is greater than 0.05, so no significant association was found between the age of the users and the system used.

Figure 7.1: Age of Participants for each System

Another Chi-Square test was performed on the answers to the question about how often users play video games and the system they were using: χ²(4) = 3.471, p = 0.482. As p > 0.05, these two parameters can be treated as independent.

We can conclude that the two groups were comparable in terms of age and of how much they play video games, so these parameters should not bias the analysis of the results.


Figure 7.2: How often Users Play Video Games (x: mark, y: number of people)

7.2.3 Emotion Detection

7.2.3.1 Feeling Like the Character Reacts to User’s Emotions

A Chi-Square test was performed on users' answers to the question "I felt like the Non-Playable Character reacted to my emotions", depending on the system they used: χ²(4) = 8.983, p = 0.062, so p > 0.05 and the two parameters seem to be independent. However, by grouping marks between 1 and 3 and marks between 4 and 6 (respectively, people who felt like there was no emotion detection and people who felt like there was), we observe only one special case: one person guessed that there was emotion detection when it was not enabled (see Fig. 7.3).

It was noticed that during this person's test, the dialogue only had 4 turns, making it difficult to notice that the emotion detection was not activated. By removing this entry, the result of the Chi-Square test changes: χ²(4) = 10, p = 0.04, so p < 0.05 and the two metrics are dependent. We can conclude that the presence of emotion detection is noticeable by the users, as long as the dialogue has enough turns to reveal it.


Figure 7.3: Number of people feeling like there is or is not emotion detection, for each system

7.2.3.2 Emotion Detection through Camera and Text

The proportion of user utterances tagged with an emotion other than "neutral" was calculated for detection through camera and through text:

P(emotions_camera) = (number of times an emotion other than "neutral" was perceived through the camera) / (number of turns in the dialogue)

P(emotions_text) = (number of times an emotion other than "neutral" was perceived through text) / (number of turns in the dialogue)

On average, P(emotions_camera) = 18.87% with a variance of 1.42%. In every case, emotions detected through the camera were tagged with "joy" or "neutral", which means that Affectiva could not detect every intended emotion. This might be due to the lighting or the background during the tests. Further tests could be run on this API, and it should be compared to other emotion detection APIs.

On average, P(emotions_text) = 58.83% with a variance of 0.45%. Watson's Tone Analyzer could detect the six intended emotions. In the current study, it impacted the dialogue manager more than the emotion detection through the camera did. However, future work should not rely only on sentiment analysis, as it cannot detect the tone of voice or the facial expressions of the user (see Chapter 8).

7.2.4 Contribution of Emotion Detection to Users’ Enjoyment

It was first verified whether the number of times the AI does not understand the user affects his or her enjoyment of the application. A Shapiro-Wilk test on this metric gave a significance value of 0.113, which is greater than 0.05. A non-parametric Kruskal-Wallis test was then run on it across three groups, depending on the mark given to question 3 about enjoyment: χ²(2) = 1.788, p = 0.409, so p > 0.05. Thus, the number of times the AI did not understand the user and his or her enjoyment are independent.

A Chi-Square test was run on the answers to the question about how much users enjoyed the application, depending on the system they used: χ²(2) = 0.11, p = 0.946, so p > 0.05. The enjoyment of the user does not depend on the system used. In this evaluation, emotion detection and adaptation of the conversational agent did not measurably improve the experience of the users. This might be caused by the short duration of the application; emotion detection could have more impact in the long run if the game is more immersive and the NPCs more endearing.

Figure 7.4: Number of people for every different answer given to the question "I enjoyed using this app", for each system


7.2.5 Suggestions of Possible Improvements

Users gave different answers to question 6: "What improvements could be done on the current application?"

7.2.5.1 About the Game Itself

Some suggestions were made about gaming aspects. It was suggested to implement the ability to jump in the game, and to set clearer goals.

Some users suggested that the player should be able to escape with the AI, for example by taking it on a flash drive or a cellphone. It was also suggested that the player should be able to interact more with the environment in the game. We can conclude that free speech should come with a wide range of possible actions to improve players' immersion in the game.

7.2.5.2 About the Conversational Agent

It was proposed to improve the natural language understanding and to widen the dialogue, making the AI answer more utterances in a consistent way. Suggestions for future work to improve this aspect are made in Chapter 8.

7.2.6 Application to the Industry

The answers to question 7, "I think that talking freely to the characters is the next step of video game industry", and question 8, "I think that emotion detection is valuable in games", have averages of 5 and 4.82 respectively. Thus, users think that emotion detection is valuable in games. It was suggested that it could strongly improve immersion in open-world games and that it could be combined with virtual reality to engage the player. It could also improve games by making them less repetitive. However, some users are more unsure, and it was suggested that it would not work in every video game genre.


Finally, users suggested that emotion detection could be used in robotics, advertising, smart homes, schools and hospitals. Thus, conversational agents that detect and react to emotions could be used in domains other than video games.


Chapter 8

Future Works

This chapter details some recommendations about what could be done in the future for

video games with open dialogue and emotion detection.

8.1 Emotion Detection and Management

Emotion detection needs to be improved to allow video game characters to react in the most sensible way. Comparative tests need to be run on multiple APIs for emotion detection through the camera in order to get the best possible results. Emotion detection through audio could be investigated and combined with emotion detection through camera and text to get the most precise emotions; e.g. if a user says something sad with a smile and a hesitant tone, the AI could know that the player has an ambivalent emotion.

8.2 Open Dialogue

Open dialogue is complex to implement, as writers want full control over the dialogues but, at the same time, do not want writing them to take too long. A program with a GUI could be needed to help write these dialogues. It could look like a spreadsheet with the most common utterances already implemented, each with a full set of different ways of expressing the same idea. Indeed, some sentences like "Who are you?", "What is your name?" or "Could you help me?" would be likely to be said to any NPC, so it would be great not to have to add every way of saying them to each dialogue script. Writers could then complete fields like the name and other characteristics of the character, so the answers to the simplest questions could be generated automatically. Finally, writers could handwrite any response for each input emotion to make unique characters, or add even more specific recognized utterances from the users, like passwords. This tool could be made with machine learning to help classify many different possible sentences and tag words to group them into categories.

If there are too many fields in the spreadsheet to be completed by hand, a chatbot specially trained to answer to different emotions could be used to complete them, so that different characters still answer differently to the same user inputs.

8.3 Create a Game

The best way of fully evaluating whether emotion detection contributes to a better experience for users would be to create a complete game with more than one character. The game should be long enough and have good gameplay and story to fully immerse the players. It could feature characters with emotion detection and others without it. The evaluation could be made by asking the users who their favourite characters are and why, to verify whether the ones with emotion detection contribute to the users' enjoyment and immersion. As the game would be long, the best way of running the evaluation might be to distribute the game and ask the users to complete an online form. Some data could also be recorded from the game, like the time the user played and the number of times he or she talked to each character.


Conclusion

This project is based on existing research. Open dialogue and sentiment analysis are frequently used in industry, with Cortana, Siri or Watson. However, voice dialogues are not currently used in the video game industry; dialogue systems in modern video games seem very close to the systems used ten years ago.

By making a system with emotion detection, we could try to figure out how it may contribute to the video game industry. The implementation implied a specific design to have users express emotions and possibly appreciate seeing the NPC react to them.

By evaluating this system, we could verify that emotion detection is noticed by users. However, it did not seem to fully contribute to their enjoyment of the game. This might be due to the short duration of the evaluation, which did not allow users to be fully immersed in the game and to get attached to the NPC.

These results lead to reflections about how to fully take advantage of the possibilities offered by emotion detection in games. By making tools that render the dialogue writing process easier and faster, it could be possible to make complete games with multiple characters and a deep story. This could lead to great new video game experiences, and might be combined with other fields, like crafting AIs with unique personalities and their own emotions, virtual reality and more.


Appendix A

Ethical Approval


MSc Project System: Ethical Approval Details

MSc Academic year: 2016/17

Student: Brice Cagnol

Title: Speech Interfaces for Role-playing games

Supervisor: Oliver Lemon

Abstract: Modern RPGs such as Skyrim and Fallout 4, Witcher, etc. have very cumbersome and annoying interfaces for inventory management. This project will instead develop a mod for such RPGs that allows the player inventory to be managed by speech. For example, "show me all my food" or "what items do I have that contain copper" or "drop all my items that are worth less than 100 gold". See e.g. https://www.youtube.com/watch?v=DRVCkUN_Mq8 for inspiration. You will evaluate the usability and entertainment value of this system versus the original GUI that ships with the game. See e.g. http://www.voiceattack.com/

Current status: Ethically approved

Purpose of Study:

State the aim of the study and the methods to be used (e.g. user interface evaluation, online questionnaire, system performance, focus groups, etc).

Does the research involve any of the following?

Human subjects

Personal data identifiable with living people (as defined by Data Protection Act)

Sensitive personal data (as defined by Data Protection Act)

Confidential data

None of the above (proceed to Risk Assessment)

Interface Only Screening

Interface only approval can be granted to projects that will be evaluating a user interface by observing individuals using the software and performing a system usability scale (SUS) questionnaire. Participants must be staff or students of Heriot-Watt University. The standard consent form must be completed by the participant prior to the study and stored by the student conducting the project. All collected data must be anonymised. No sensitive data will be collected. Only standard computing equipment, i.e. an office PC, laptop, tablet, or mobile phone, will be used.

Does the evaluation consist of evaluating a user interface according to the above criteria?



Yes (proceed to risk assessment) / No

[...]

Health and Safety Risk Assessment

I confirm that the project involves only standard IT equipment and exposes participants to no more hazards than a conventional office environment.

Yes / No

Supervisor's comments:

Ethics Coordinator comments:

Declarations

Student

I confirm that the above information is accurate and a true reflection of the intended study.

Name: Brice Cagnol    Date: 2017-05-15

Supervisor

I, as supervisor of the above student, have checked the above for accuracy and am satisfied that the information provided is a true reflection of the intended study.

Name: Oliver Lemon    Date: 2017-08-07



Appendix B

Consent Form


Detecting and Adapting Conversational Agent Strategy to User’s

Emotions in Video Games

Consent Form Researcher:

Brice Cagnol – [email protected]

Supervisor:

Professor Oliver Lemon – [email protected]

Participation Consent

This study assesses the functionality of emotion detection in a video game and how the game adapts to the user's emotions.

You will be presented with a system and instructions of how to play the game, the boundaries,

scenario and overall objective of the gameplay as well as information about what to do if you get

stuck. The test is not assessing your skills as a player nor the dialogue choices that you make; neither will be judged or form part of the evaluation. You will then be asked to complete a questionnaire to

help us assess your experience using this system.

All aspects of this test are completely optional and you may stop at any time without prejudice. A webcam will be running during the test, but no pictures will be stored. All resulting data is kept anonymous; you will be provided with a participant number that will be stored separately from your questionnaire.

By signing this document, you agree to take part in the above study and have your results published

as part of the evaluation.

Participant ID: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

Participant Name: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

Participant Signature: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

Date: _ _ _ _ _ / _ _ _ _ _ _ / _ _ _ _ _ _ _ _

Researcher Signature: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _


Appendix C

Evaluation Instructions


Evaluation Instructions

Plot
You wake up in an empty spaceship. After wandering for a few minutes, you find an AI close to the door to the ship's escape pods. Maybe you could ask it what is going on…

Objectives
Talking with the AI can give clues about the objectives. If you cannot complete them, you can stop the test whenever you want.

Commands
You can move with ZQSD (just like WASD on a QWERTY keyboard). Z is forward, S backward, Q left and D right. You can turn the camera with the mouse.

When an interaction with an object is possible, there is a small message in grey saying so under the cursor (see picture). You can then click to start the interaction. For example, you can click on the AI to talk with it when [Talk] is displayed.

When you talk to the AI, what you are saying is shown under the last AI response. There could be errors in the recognition, but the AI could still give a satisfying response, or you could simply repeat. If the AI shows that it did not understand your last utterance, it may be because the current formulation is not recognized or because the understanding of your idea has not been implemented.

Help
If you are stuck, you can ask the AI for help. You can say "Help me" or "What should I do", for example. The agent will then give you a hint about what you can do.


Appendix D

Questionnaire


Appendix E

Risk Assessment Form


MACS Risk Assessment Form (Project)

Student:

Brice Cagnol

Project Title:

Detecting and Adapting Conversational Agent Strategy to User's Emotions in Video Games

Supervisor:

Prof. Oliver Lemon

Risks:

Risk: Standard office environment (includes purely software projects)
Present (give details): Yes (laptop and headphone)
Control measures and/or protection: Nothing

Risk: Unusual peripherals, e.g. robot, VR helmet, haptic device, etc.
Present (give details): None
Control measures and/or protection: Nothing

Risk: Unusual output, e.g. laser, loud noises, flashing lights, etc.
Present (give details): None
Control measures and/or protection: Nothing

Risk: Other risks
Present (give details): None
Control measures and/or protection: Nothing


References

Affectiva (2009). Affectiva API.

Affectiva (2009). Mapping expressions to emotions. https://developer.affectiva.com/mapping-expressions-to-emotions/ [Last Accessed: 17/08/2017].

Bethesda Softworks (2011). The Elder Scrolls V: Skyrim.

Burkhardt, F., Van Ballegooy, M., Engelbrecht, K.-P., Polzehl, T., and Stegmann, J. (2009). Emotion detection in dialog systems: applications, strategies and challenges. In Affective Computing and Intelligent Interaction and Workshops, 2009. ACII 2009. 3rd International Conference on, pages 1–6. IEEE.

Cambria, E., Schuller, B., Xia, Y., and Havasi, C. (2013). New avenues in opinion mining and sentiment analysis. IEEE Intelligent Systems, 28(2):15–21.

Carpenter, R. (1997). Cleverbot. http://www.cleverbot.com/ [Last Accessed: 5/04/2017].

Dernoncourt, F. (2012). Designing an intelligent dialogue system for serious games. RJC EIAH, page 33.

EA (2008). Spore.

Ekman, P. (1992). An argument for basic emotions. Cognition & Emotion, 6(3-4):169–200.

Event[0] (2016). Ocelot Society.

Hofer, G. and Berger, M. (2010). Speech Graphics. https://www.speech-graphics.com/ [Last Accessed: 5/04/2017].

Holzapfel, H. and Fuegen, C. (2002). Integrating emotional cues into a framework for dialogue management. In Proceedings of the 4th IEEE International Conference on Multimodal Interfaces, page 141. IEEE Computer Society.

IBM (2006). Watson APIs. https://www.ibm.com/watson/ [Last Accessed: 5/04/2017].

IBM (2015). Tone Analyzer. https://tone-analyzer-demo.mybluemix.net/ [Last Accessed: 6/04/2017].

Lang, P. J., Bradley, M. M., and Cuthbert, B. N. (1990). Emotion, attention, and the startle reflex. Psychological Review, 97(3):377.

Language Technology Group, University of Oslo (2014). OpenDial. http://www.opendial-toolkit.net/ [Last Accessed: 1/04/2017].

Lemon, O. (2002). Transferable multi-modal dialogue systems for interactive entertainment. In AAAI Spring Symposium on Artificial Intelligence in Interactive Entertainment, pages 57–61.

Lison, P. and Kennington, C. (2015). Developing spoken dialogue systems with the OpenDial toolkit. SEMDIAL 2015 goDIAL, pages 194–195.

Microsoft (2014). Cortana.

Microsoft (2016). Emotion API. https://www.microsoft.com/cognitive-services/en-us/emotion-api [Last Accessed: 6/04/2017].

Nintendo (1998). Hey You, Pikachu!

Pandorabots (2013). S.U.P.E.R. chatbot.

Picard, R. W. (1997). Affective Computing, volume 252. MIT Press, Cambridge.

Reed, A. A., Samuel, B., Sullivan, A., Grant, R., Grow, A., Lazaro, J., Mahal, J., Kurniawan, S., Walker, M., and Wardrip-Fruin, N. (2011). A step towards the future of role-playing games: The SpyFeet mobile RPG project. In AIIDE.

Semel, P. (2010). Interview: Event[0] game designer Sergey Mohov on the game's talkative A.I. https://www.gamecrate.com/interview-event0-game-designer-sergey-mohov-games-talkative-ai/14545 [Last Accessed: 17/08/2017].

Socher, R., Perelygin, A., Wu, J. Y., Chuang, J., Manning, C. D., Ng, A. Y., Potts, C., et al. (2013). Recursive deep models for semantic compositionality over a sentiment treebank. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), volume 1631, page 1642. Citeseer.

Ubisoft (2008). Tom Clancy's EndWar.

Unity Technologies (2005). Unity game engine. https://unity3d.com/ [Last Accessed: 5/04/2017].

Virtual Heroes, Army Game Studio, NASA (2010). Moonbase Alpha.

Wallace, R. S. (2009). The anatomy of ALICE. In Parsing the Turing Test, pages 181–210. Springer.