



Xface Open Source Project and SMIL-Agent Scripting Language for Creating and Animating Embodied Conversational Agents

Koray Balcı*
FBK-irst
Via Sommarive, 18 I-38050 Trento, Italy
[email protected]

Elena Not
FBK-irst
Via Sommarive, 18 I-38050 Trento, Italy
[email protected]

Massimo Zancanaro
FBK-irst
Via Sommarive, 18 I-38050 Trento, Italy
[email protected]

Fabio Pianesi
FBK-irst
Via Sommarive, 18 I-38050 Trento, Italy
[email protected]

ABSTRACT
Xface is a set of open source tools for the creation of embodied conversational agents (ECAs), using MPEG4- and keyframe-based rendering driven by the SMIL-Agent scripting language. The Xface Toolkit, coupled with SMIL-Agent scripting, serves as a full 3D facial animation authoring package. The Xface project was initiated by the Cognitive and Communication Technologies (TCC) division of FBK-irst (formerly ITC-irst). The toolkit is written in ANSI C++, and is open source and platform independent.

Categories and Subject Descriptors
I.3.7 [Computer Graphics]: Three-Dimensional Graphics and Realism—virtual reality; H.5.1 [Information Interfaces and Presentation]: Multimedia Information Systems—artificial, augmented, and virtual realities

General Terms
Performance, Standardization, Design

Keywords
Embodied conversational agents, 3D talking heads, MPEG4 facial animation, scripting, open source

1. INTRODUCTION
An expressive ECA would be of help in various applications such as human-computer dialogues aimed at problem solving, tutoring, advising, adaptive multimodal presentations, or also simple, canned information presentations linked to HTML pages (possibly written manually by a human author). We believe that the lack of free and open source tools for the creation and animation of faces limits further research in these areas. Xface, our open source toolkit, together with the SMIL-Agent scripting language, aims to solve this problem.

* The author is also a PhD student in the Computer Engineering Department at Bogazici University, Istanbul, Turkey.

Copyright is held by the author/owner(s).
MM'07, September 23–28, 2007, Augsburg, Bavaria, Germany.
ACM 978-1-59593-701-8/07/0009.

2. XFACE TOOLKIT
In late 2003, with the above motivations, we initiated the project and released an early version of the toolkit in 2004 [3]. Over the years the toolkit has evolved with new features and gained acceptance in the community, giving us a dedicated user base and a testing group that report bugs and new feature requests on a regular basis.

The toolkit currently incorporates four pieces of software. The core Xface library is for developers who would like to embed 3D facial animation in their applications. The XfaceEd editor provides an easy-to-use interface to generate MPEG4-ready meshes from static 3D models. XfacePlayer is a sample application that demonstrates the toolkit in action, and XfaceClient is used as a script editor and communication controller over the network with the player.

Some key features of the Xface Toolkit are:

• Accepts MPEG4 FAP files and SMIL-Agent scripts as input.
• Supports muscle-based deformation (for MPEG4) and keyframe-interpolation-based animation using morph targets.
• Muscle deformation methods/rules can be extended easily.
• Blending of visemes (visual phonemes), emotions, and expressions.
• Head and eye movements (random and controlled).
• Can use various TTS engines (MS SAPI, Festival [7]). We have experimented with English and Italian, and users have reported successfully using it in Spanish and Dutch.
• Control over TCP/IP. XfacePlayer can be controlled from any programming language through our messaging system (see the sketch after this list).
• Saving of animations as video.
• Platform-independent code base (we distribute only the Windows version, but users have reported compiling it under Linux successfully).
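As a purely illustrative sketch of the remote-control feature (the port number, message framing, and payload below are assumptions for illustration, not the documented Xface messaging protocol), a client could connect to a running XfacePlayer over a plain POSIX TCP socket and hand it a SMIL-Agent script:

#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cstdio>
#include <string>

// Hypothetical client: port and framing are assumed, not normative.
int main()
{
    const char* host = "127.0.0.1";
    const int port = 50011;  // assumed XfacePlayer listening port

    int sock = socket(AF_INET, SOCK_STREAM, 0);
    if (sock < 0) { perror("socket"); return 1; }

    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    inet_pton(AF_INET, host, &addr.sin_addr);

    if (connect(sock, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) < 0) {
        perror("connect");
        return 1;
    }

    // Send a minimal SMIL-Agent script for the player to perform.
    std::string script =
        "<par system-language=\"english\">"
        "<speech channel=\"voice\" affect=\"Happy\" id=\"hello\">"
        "Hello!</speech></par>";
    send(sock, script.data(), script.size(), 0);

    close(sock);
    return 0;
}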

The toolkit handles facial animation in two modes: MPEG4 facial animation (FA) and keyframe interpolation. Next, we discuss these modes in more detail.

2.1 MPEG4 Facial Animation
In 1999, the Moving Picture Experts Group released MPEG4 as an ISO standard [1, 2, 9]. According to the MPEG4 FA standard, there are 84 feature points (FPs) on the head. For each FP to be animated, the corresponding vertex on the model and the indices of the vertices in the zone of influence of that FP must be set. Then, 68 facial animation parameters (FAPs) drive the animation of those feature points.

With XfaceEd's editing tools, one can set these feature points and the zones of influence and define muscle models for each zone. Then, muscle deformations under different FAP values can be previewed and the parameters fine-tuned. Once all the information is in place, a face configuration file in XML syntax is produced. This file contains various information such as the 3D models used for the face (one can have separate models for head, hair, teeth, etc.), textures, FPs, zones of influence, muscle models, weight factors, etc. This configuration file is then used by XfacePlayer to generate the animation.
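To make the idea concrete, the deformation driven by a single FAP can be pictured roughly as below. This is a minimal sketch only: the structure names, the precomputed per-vertex weights, and the FAP scaling are assumptions for illustration, not the toolkit's actual muscle-model interface.

#include <cstddef>
#include <vector>

// Illustrative only: names and weighting scheme are assumptions.
struct Vec3 { float x, y, z; };

struct FeaturePointZone {
    std::vector<int> influenced;   // indices of vertices in the zone of influence
    std::vector<float> weights;    // per-vertex weight factors in [0, 1]
    Vec3 direction;                // displacement direction associated with the FAP
};

// Displace the vertices of one zone according to a single FAP value,
// already scaled to model units via the corresponding FAPU.
void applyFap(std::vector<Vec3>& vertices,
              const FeaturePointZone& zone,
              float fapValue)
{
    for (std::size_t i = 0; i < zone.influenced.size(); ++i) {
        Vec3& v = vertices[zone.influenced[i]];
        const float w = zone.weights[i];
        v.x += zone.direction.x * fapValue * w;
        v.y += zone.direction.y * fapValue * w;
        v.z += zone.direction.z * fapValue * w;
    }
}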

2.2 Keyframe Interpolation
As an alternative, the Xface Toolkit also implements a keyframe-interpolation-based animation framework. In this mode, a set of keyframes for the different emotions and visemes has to be prepared externally. In XfaceEd, these keyframes are inserted into the configuration and saved, and the animation can be tested within the editor using SMIL-Agent scripts. Visual speech (visemes) and emotions are defined as different channels, which are blended [6, 4, 5, 10] and interpolated over time; a rough sketch of this blending is given after this paragraph. Figure 1 shows some of the emotion keyframes used for the Alice model.
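The channel blending can be pictured as combining a viseme keyframe and an emotion keyframe as weighted offsets from a neutral pose, with the weights varying over time as each channel is interpolated between its keyframes. This is a minimal sketch under assumed names; the toolkit's actual blending rules differ in detail.

#include <cstddef>
#include <vector>

// Illustrative only: a keyframe stores one position per vertex.
struct Vec3 { float x, y, z; };
using Keyframe = std::vector<Vec3>;

// Blend one viseme keyframe and one emotion keyframe as weighted
// offsets from the neutral face.
Keyframe blend(const Keyframe& neutral,
               const Keyframe& viseme, float wViseme,
               const Keyframe& emotion, float wEmotion)
{
    Keyframe out(neutral.size());
    for (std::size_t i = 0; i < neutral.size(); ++i) {
        out[i].x = neutral[i].x + wViseme  * (viseme[i].x  - neutral[i].x)
                                + wEmotion * (emotion[i].x - neutral[i].x);
        out[i].y = neutral[i].y + wViseme  * (viseme[i].y  - neutral[i].y)
                                + wEmotion * (emotion[i].y - neutral[i].y);
        out[i].z = neutral[i].z + wViseme  * (viseme[i].z  - neutral[i].z)
                                + wEmotion * (emotion[i].z - neutral[i].z);
    }
    return out;
}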

In both the MPEG4 and the keyframe-based mode, all the algorithms are implemented in the core library mentioned previously. This guarantees identical animation behavior in XfaceEd and XfacePlayer, while also letting application developers implement their own players, if needed, by linking only against the core library.

Figures 3 and 4 show various screenshots from XfacePlayer and XfaceEd.

3. SMIL-AGENT
Synthetic characters are often integrated in multimodal interfaces to convey messages to the user, provide visual feedback, and engage the user in the dialogue, also through emotional involvement. This is accomplished through a suitable synchronization of voice, lip movements, facial expressions, gestures, etc. In the end, synthetic characters should not be considered a single modality, but rather as stemming from the synergic contribution of different communication channels that, properly synchronized, generate an overall communication performance.

Figure 1: Emotion keyframes for the Alice face.

With respect to other existing scripting languages, SMIL-Agent [8] pushes further the idea of having a separate representation for the various communication modalities of a synthetic character (e.g., voice, speech animation, sign animation, facial expressions, gestures, etc.) and of their explicit interleaving in the presentation performance. Furthermore, SMIL-Agent explicitly abstracts away from all data related to dialogue management and to the integration of the agent within larger multimodal presentations, thus ensuring the portability of the language (and of the synthetic characters supporting it) to different task and application contexts.

SMIL-Agent is in XML syntax and can be interpreted natively by the Xface library to generate animations. Here is a sample SMIL-Agent script for a character informing a patient about her health status; the first part gives some diagnostic information and is produced in a sad mood, while the final part is in a happy mood, with added eye and head movements.

<par system-language="english"><speech channel="voice" affect="sorry-for"type="inform" id="angina"><mark id="*1*"/>You have been diagnosed as suffering from<mark id="*2*"/>angina pectoris, which appears to be mild.

</speech><seq channel="face" ><speech-animation affect="Sad"content-id="angina" content-end="*2*"/><speech-animation affect="Happy"content-id="angina" content-begin="*2*"/>

</seq><action channel="eyes" actiontype="turning"intensity="0.5" content-id="angina"content-begin="*1*" content-end="*2*"><parameter>LookLeft</parameter>

Page 3: Xface Open Source Project and SMIL-Agent Scripting ...xface.fbk.eu/downloads/itc-xface-mm07.pdf · Language for Creating and Animating Embodied Conversational Agents ... on OpenGL

</action><action channel="head" actiontype="pointing"content-id="angina" content-begin="*2*"><parameter>15</parameter><parameter>0</parameter><parameter>5</parameter>

</action></par>

Figure 2 presents the flow of animation generation. In the Xface Toolkit, there is a separate library for interpreting SMIL-Agent scripts and creating animation definitions. As the figure shows, this task is not trivial: it involves communication with speech synthesizers (TTS engines), and the various modalities, such as speech, emotions, and gestures, must be extracted, processed, blended, and synchronized before the final animation is produced.

Figure 2: SMIL-Agent script processing.

We are currently working on a new, user-friendly visual SMIL-Agent editing tool, which we plan to release this summer (see http://xface.itc.it/projectxq/ for an early version).

4. CONCLUSION
With the Xface project, we aim to develop a set of tools that are easy to use and extend, and that are open to researchers and software developers. All the pieces in the toolkit are operating-system independent and can be compiled with any ANSI C++ standard compliant compiler. For animation, the toolkit relies on OpenGL and is optimized enough to achieve satisfactory frame rates (a minimum of 25 frames per second) with high polygon counts (12,000+ polygons) on modest hardware.

We distribute Xface under the Mozilla Public License Version 1.1 from our Subversion server (http://xfacesvn.itc.it/svn/trunk), where it can be freely downloaded. The release archives are also available at SourceForge (http://sourceforge.net/projects/xface/).

We have used our toolkit in various projects, such as:


• Generation of relational reports: the SMIL-Agent scripts are generated automatically from the acoustic and visual scene analysis of meetings, and the animations produced by the player (in the form of AVI files) are integrated into multimodal SMIL presentations (EU FP6 IST project CHIL, http://chil.server.de/servlet/is/101/).
• Persuasion studies for education (EU FP6 NoE project HUMAINE, http://emotion-research.net/).
• A healthcare support system.

Statistics from our project page show that the latest binary setup file of the Xface Toolkit (v0.94, released December 2006) has been downloaded 480 times, and overall we have a download count of 2,238 since development started in May 2004. We also receive daily support requests and feedback, mostly from graduate students all over the world.

In the future, we plan to use the Xface Toolkit in other projects, increase its feature set, and enlarge the user base. For further information and material, visit our web pages:

Xface: http://xface.itc.it
SMIL-Agent: http://tcc.itc.it/i3p/research/smil-agent

5. REFERENCES
[1] ISO/IEC JTC1/WG11 N1901, Text for CD 14496-1 Systems. Fribourg Meeting, November 1997.
[2] ISO/IEC JTC1/WG11 N1902, Text for CD 14496-1 Visual. Fribourg Meeting, November 1997.
[3] K. Balcı. Xface: MPEG-4 based open source toolkit for 3D facial animation. In Proc. Advanced Visual Interfaces, Italy, May 2004.
[4] T. Bui, D. Heylen, and A. Nijholt. Combination of facial movements on a 3D talking head. In Proc. of Computer Graphics International 2004 (CGI 2004), Crete, Greece, June 2004. IEEE Computer Society.
[5] Y. Cao, W. C. Tien, P. Faloutsos, and F. Pighin. Expressive speech-driven facial animation. ACM Transactions on Graphics, October 2005.
[6] J. Edge and S. Maddock. Expressive visual speech using geometric muscle functions. In Proc. of the 19th Eurographics UK Chapter Annual Conference (UCL), pages 11–18, April 2001.
[7] The Centre for Speech Technology Research. The Festival Speech Synthesis System. University of Edinburgh, 2002. http://www.cstr.ed.ac.uk/projects/festival/.
[8] E. Not, K. Balcı, F. Pianesi, and M. Zancanaro. Synthetic characters as multichannel interfaces. In Proc. ICMI '05, Italy, October 2005.
[9] I. Pandzic and R. Forchheimer. MPEG-4 Facial Animation: The Standard, Implementation and Applications. Wiley, New York, 2002.
[10] H. Pyun, W. Chae, Y. Kim, H. Kang, and S. Y. Shin. An example-based approach to text-driven speech animation with emotional expressions. Technical Report 200, KAIST, July 2004.



Figure 3: XfacePlayer rendering animation with various emotions.

Figure 4: XfaceEd: setting FAPU and FPs, previewing FAPs, and testing with SMIL-Agent.