


Emotion Communication via Copying Behaviour: A Case Study with the Greta Embodied Agent

Ginevra Castellano
School of EECE & HCI Centre, University of Birmingham, UK
[email protected]

Maurizio Mancini
InfoMus Lab, DIST, University of Genova, Italy

Christopher Peters
Department of Computing and the Digital Environment, Coventry University, UK

ABSTRACT
This paper investigates emotion communication via copying behaviour in an embodied virtual agent. Copying takes place at the expressive level, where motion qualities are considered, rather than exact low-level motion matching. We present an experiment that investigates (1) the extent to which people can recognise the emotion expressed by the agent's copying behaviour, and (2) whether and how the type of gesture performed by the agent affects the perception of emotion. Results suggest that a combination of the type of movement performed and its quality is important for successfully communicating emotions.

Categories and Subject Descriptors
H.5.2 [Information Interfaces and Presentation]: User Interfaces—Evaluation/methodology; J.4 [Computer Applications]: Social and Behavioural Sciences

General Terms
Algorithms, Human Factors, Design, Theory

Keywords
Expressivity, gesture, emotion, copying behaviour, ECA

1. INTRODUCTION
Establishing an affective loop between a user and an embodied agent requires the agent to be endowed with capabilities fundamental to social intelligence, relating to the timely analysis, processing and synthesis of appropriate behavioural cues [6]. According to this conceptual view, the design of an agent's behaviour may take place at different levels, operational over multiple degrees of sophistication: from reactive feedback based on fast decoding of low-level cues [11], to the employment of planned responses utilising complex reflective models that account for the theorised mental states of interactors and the state of the interaction [14].


This paper aims to elucidate some basic aspects of the challenge of creating such systems. We present an experiment investigating participants' ratings of video recordings of expressive gestures performed by actors and of synthesised versions performed by an embodied virtual agent. Emotional gestures performed by the actors are mapped onto synthesised versions generated by the Greta embodied agent. Copying takes place at the expressive level, where motion qualities are considered, rather than exact low-level motion matching. Different types of gestures are performed by the agent, depending on the emotion expressed by the actors. The experiment investigates how a combination of (a) the type of movement performed and (b) the way in which it is performed can be used to successfully communicate emotions, and informs the design of an agent capable of establishing bidirectional affective communication with human users based on expressive copying behaviour.

The results suggest that a combination of the type of movement performed and its quality is important for successfully communicating emotions.

2. BACKGROUND
Recently there has been increasing interest in studies on affect recognition from the automatic analysis of body movement and postures. Studies on affect recognition from body movement include the work by Gunes and Piccardi [8], Bernhardt and Robinson [2], Camurri et al. [4] and Castellano et al. [7]. Examples of studies that addressed affect recognition from body posture are those of Sanghvi et al. [16] and of Kleinsmith and Bianchi-Berthouze [9].

Other studies deal with agents that can react to affective expressions of the user and provide low-level feedback. Examples include the work by Maatman and colleagues [11], who designed an agent capable of creating a sense of rapport in human speakers by providing real-time non-verbal listening feedback; the agent designed by Kopp et al. [10], endowed with the ability to imitate natural gestures performed by humans; and the work by Reidsma and colleagues [15], who designed a virtual rap dancer that invites users to join him in a dancing activity.

Previous work conducted by the authors addressed copying behaviour in an embodied agent [12, 6]. Emotional gestures performed by actors were analysed in terms of motion cues, and these cues were used to synthesise conversational gestures in the Greta agent. Results showed that, in most cases, people tended to associate the emotional content of the agent's gestures with that intended to be expressed by the human actor.


Figure 1: Overview of the system’s modules.

In this work we investigate emotion perception by participants when the Greta agent performs a copying behaviour by generating an emotion-specific gesture, i.e., a gesture that differs depending on the emotion expressed by the actor.

3. SYSTEM OVERVIEW
The system (Figure 1) used to generate the stimuli in the experiment presented in this paper consists of two integrated software platforms: EyesWeb XMI [3], for motion tracking and movement expressivity processing, and Greta [13], an Embodied Conversational Agent (ECA) with a humanoid appearance, capable of generating expressive behaviours. A module for mapping movement expressivity (i.e., motion cues characterising the human movement being analysed are mapped onto the agent's expressivity parameters) is integrated with these two platforms [5].

The embodied agent generates expressive copying behaviour: it generates gestures that reproduce the same expressivity as the gestures performed by a human. The copying is performed only at the expressive level; information about the shape (configuration of hand/arm) of the gesture performed by the human is not considered.
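To make the mapping step concrete, the following minimal Python sketch illustrates the general idea of mapping normalised motion cues onto an agent's expressivity parameters. It is an illustration only, not the mapping module of [5]: the cue names, parameter names and linear scalings are assumptions made for this sketch.

    # Illustrative sketch of an expressivity mapping step, not the actual
    # module from [5]. Cue names, parameter names and the linear scalings
    # are assumptions made for illustration.
    from dataclasses import dataclass

    @dataclass
    class ExpressivityParams:
        """Hypothetical expressivity parameters, neutral at 0, range [-1, 1]."""
        spatial_extent: float   # amplitude of the movement
        temporal_extent: float  # speed of the movement
        fluidity: float         # smoothness/continuity of the movement
        power: float            # acceleration/tension of the movement

    def clamp(x: float, lo: float = -1.0, hi: float = 1.0) -> float:
        return max(lo, min(hi, x))

    def map_cues_to_params(cues: dict[str, float]) -> ExpressivityParams:
        """Map motion cues normalised to [0, 1] onto expressivity parameters."""
        return ExpressivityParams(
            # A more contracted posture is assumed to imply a smaller
            # spatial extent, hence the inverted mapping.
            spatial_extent=clamp(1.0 - 2.0 * cues["contraction_index"]),
            temporal_extent=clamp(2.0 * cues["velocity"] - 1.0),
            fluidity=clamp(2.0 * cues["fluidity"] - 1.0),
            power=clamp(2.0 * cues["acceleration"] - 1.0),
        )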

4. EXPERIMENT

4.1 Overview
The overall purpose of the experiment was to investigate (1) the extent to which people can recognise the emotion expressed by the agent's copying behaviour, and (2) whether and how the type of gesture performed by the agent affects the perception of emotion.

4.2 Materials
A set of gestures from an extract of videos of the GEMEP (GEneva Multimodal Emotion Portrayals) corpus, a corpus of acted emotional expressions [1], was analysed with EyesWeb XMI. Six videos of the corpus were considered, with three different emotions (anger, joy, sadness) expressed by two different actors observed by a frontal camera. The gestures displayed by the actors were performed freely, without any forced choice. The expressive motion cues reported in Figure 1 were extracted from the videos and reproduced in the Greta agent using the expressivity mapping module [5]. For each video of the GEMEP corpus, different types of gestures of the Greta agent were synthesised and recorded in videos (one gesture for each video), according to the following specifications:

Figure 2: The beat gesture performed by the Greta agent. The face is hidden so that it does not influence the participants' perception of emotions.

Figure 3: The emotion-specific gestures performed by the Greta agent in correspondence with the three emotions: anger (top); joy (centre); sadness (bottom). The face is hidden so that it does not influence the participants' perception of emotions.

1. Greta performs a beat gesture by altering all the expressivity parameters from their original neutral values. A beat gesture is a conversational gesture whose shape does not appear to convey any obvious emotional expression or meaning (see Figure 2). Six videos were created in this phase;

2. Greta performs an emotion-specific gesture (i.e., a gesture that differs depending on the emotion expressed by the actor) by altering all the expressivity parameters from their neutral values. Based on empirical observations, the following gestures were chosen for evaluation: deictic gesture for anger; opening arms for joy; raising and lowering the arms in front of the body for sadness (see Figure 3). Six videos were created in this phase.
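The expressive motion cues driving these stimuli were extracted with EyesWeb XMI. As a rough illustration of the kind of cue involved, the following Python sketch computes quantity of motion, a cue commonly used in this line of work, from binary body silhouettes by frame differencing. It is a sketch of the general technique under stated assumptions, not the EyesWeb XMI processing actually used in the paper.

    # Sketch of one commonly used expressive cue: quantity of motion,
    # computed from binary silhouettes by frame differencing. This is an
    # illustration of the general technique, not the EyesWeb XMI blocks
    # used in the paper.
    import numpy as np

    def quantity_of_motion(silhouettes: list[np.ndarray]) -> float:
        """silhouettes: boolean masks of the body, one per frame.
        Returns the mean fraction of pixels that changed between
        consecutive frames, normalised by the current body area."""
        qom_values = []
        for prev, curr in zip(silhouettes, silhouettes[1:]):
            moved = np.logical_xor(prev, curr).sum()  # pixels that changed
            area = max(int(curr.sum()), 1)            # body area, avoid /0
            qom_values.append(moved / area)
        return float(np.mean(qom_values)) if qom_values else 0.0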

Table 1 summarises the gestures considered in the experiment.

Condition   Type of Movement            Performer
C1          Freely performed gestures   Actors
C2          Beat gestures               ECA
C3          Emotion-specific gestures   ECA

Table 1: Gestures considered in the experiment.


Variable name       Variable type   Levels
Type of movement    Independent     3: conditions C1, C2 and C3
Emotion rating      Dependent       1: anger or joy or sadness

Table 2: Dependent and independent variables for the one-way ANOVA.

4.3 Procedure
Twenty-two students and researchers in computer science (12 male, 10 female; average age 33 years) participated in the experiment. Each participant was asked to observe a total of eighteen videos over three conditions (C1, C2 and C3; see Table 1), with six videos shown for each condition.

A computer was used to show the participants the videos in a random order, different for each participant. Participants were told that they were participating in a study aiming to investigate the relationship between emotions and movement expressivity in an expressive virtual agent. Each participant was presented with the following instructions: "You will be shown a set of videos in which a real person or a virtual agent performs one gesture. For each video you will be required to observe the body movements of the person or agent, and to evaluate which emotion(s) is/are being expressed". Participants were asked to observe the gestures in the videos and to associate an emotion label (anger, joy or sadness) with each gesture using a slider: each emotion could be rated on a continuous scale from 1 to 100. Participants were allowed to watch each video as many times as they wanted.
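As a minimal sketch of this presentation protocol (the names and structure are illustrative assumptions, not the software actually used), the per-participant random ordering could be generated as follows:

    # Minimal sketch of the presentation protocol: each participant sees
    # all eighteen stimulus videos in an independent random order and
    # rates each one on three 1-100 sliders. Names are illustrative.
    import random

    EMOTIONS = ("anger", "joy", "sadness")

    def make_playlist(video_ids: list[str], participant_id: int) -> list[str]:
        """Return a participant-specific random ordering of the stimuli."""
        order = list(video_ids)
        # Seeding with the participant id gives a reproducible but
        # different order for every participant.
        random.Random(participant_id).shuffle(order)
        return order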

4.4 Results
To investigate the effect of the type of movement generated by the agent on the participants' ratings, a one-way analysis of variance (ANOVA) with repeated measures was performed for each of the three dependent variables (the ratings of anger, joy and sadness), with the type of movement (three levels: C1, C2 and C3) as the independent variable. Table 2 summarises the dependent and independent variables considered in the one-way ANOVA. Note that the emotion expressed by the actors is not an independent variable here, as the effect of the type of movement on the ratings of each emotion is considered when the same emotion is expressed by the actors. Means and standard deviations are reported in Table 3. Pairwise comparisons (Bonferroni corrected) were performed in order to identify specific differences among the ratings.

Type of movement   Ratings of anger   Ratings of joy   Ratings of sadness
Condition C1
  Mean             80.32              38.52            49.30
  S.D.             14.44              25.36            22.67
Condition C2
  Mean             51.64              51.11            50.73
  S.D.             21.36              23.68            16.02
Condition C3
  Mean             62.77              54.82            49.70
  S.D.             15.56              26.15            19.11

Table 3: Mean values and standard deviations of ratings of anger, joy and sadness for the different conditions of type of movement (N = 22 in each condition).

Anger: The one-way ANOVA for ratings of anger when anger is expressed by the actors showed a significant main effect of the type of movement [F(2, 42) = 18.78; p < 0.001]. Pairwise comparisons showed that ratings of anger in condition C1 are significantly higher than ratings of anger in condition C2 (MD = 28.68; p < 0.001) and in condition C3 (MD = 17.55; p < 0.001). No significant difference was found between C2 and C3, although ratings of anger are higher for C3 than for C2 (Table 3).

Joy: The one-way ANOVA for ratings of joy when joy is expressed by the actors showed a significant main effect of the type of movement [F(2, 42) = 4.34, p < 0.05]. Pairwise comparisons showed that ratings of joy in condition C2 are significantly higher than those in condition C1 (MD = 12.59, p < 0.05). There is no significant difference between C2 and C3, although Table 3 shows that ratings of joy are higher for C3.

Sadness: The one-way ANOVA for ratings of sadness when sadness is expressed by the actors did not show any significant main effect of the type of movement. For this reason, differences among the ratings of sadness under the different conditions of type of movement were not investigated further.
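For reference, the following Python sketch reproduces the shape of this analysis (a one-way repeated-measures ANOVA per emotion, followed by Bonferroni-corrected pairwise comparisons) using statsmodels and scipy. It is a minimal sketch assuming a long-format table with illustrative column names, not the authors' original analysis scripts.

    # Minimal sketch of the analysis: one-way repeated-measures ANOVA with
    # type of movement as the within-subject factor, then Bonferroni-
    # corrected pairwise comparisons. Column names are assumptions.
    from itertools import combinations

    import pandas as pd
    from scipy import stats
    from statsmodels.stats.anova import AnovaRM

    def analyse_ratings(df: pd.DataFrame) -> None:
        """df: one row per (participant, condition), with columns
        'participant', 'condition' (C1/C2/C3) and 'rating' (1-100),
        for the ratings of a single emotion."""
        # Repeated-measures ANOVA on the ratings of one emotion.
        anova = AnovaRM(df, depvar="rating", subject="participant",
                        within=["condition"]).fit()
        print(anova.anova_table)

        # Bonferroni-corrected pairwise comparisons between conditions.
        conditions = sorted(df["condition"].unique())
        n_tests = len(conditions) * (len(conditions) - 1) // 2
        for c1, c2 in combinations(conditions, 2):
            r1 = df[df["condition"] == c1].sort_values("participant")["rating"]
            r2 = df[df["condition"] == c2].sort_values("participant")["rating"]
            t, p = stats.ttest_rel(r1, r2)
            md = r1.mean() - r2.mean()  # mean difference (MD)
            print(f"{c1} vs {c2}: MD = {md:.2f}, "
                  f"corrected p = {min(p * n_tests, 1.0):.4f}")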

5. DISCUSSION
The experiment addressed the issue of evaluating whether the gestures performed by the Greta agent are associated with the same emotion expressed by the corresponding gestures performed by the actors. As a success criterion, we consider the proposed expressive motion cues effective at communicating the same emotion conveyed by the original movement if the emotion expressed by the agent (C2 and C3) is rated over 50%.

Results show that anger is given ratings exceeding 50% in both conditions C2 and C3 (51.64% for C2 and 62.77% for C3; see Table 3).

As far as the ratings of joy are concerned, there is a significant difference between conditions C1 and C2 (higher values for C2). Moreover, Table 3 shows that joy is given ratings greater than 50% for C2 (51.11%) and C3 (54.82%), while ratings of joy for C1 are 38.52%. These results suggest that the gesture chosen by the actors to express joy may have been misleading, but that the expressivity of the original movement, reproduced in the Greta agent, allowed the participants to associate the original affective content with the synthesised gesture.

Finally, in the case of sadness, there is no significant difference in the ratings between conditions C1 and C2. Furthermore, the ratings only approach 50% (see Table 3). This result suggests that the gesture and related expressivity chosen by the actors to express sadness may have been misleading for the participants. Nevertheless, it may also be possible that the expressivity reproduced on the beat gesture performed by the agent did not evoke sadness.

The experiment also investigated whether using an emotion-specific gesture, rather than the same gesture for all emotions, to reproduce the expressivity of the original movement increases the percentage of recognition of emotions by the participants. Although the post-hoc tests performed after the one-way ANOVAs for each emotion do not show any significant difference between C2 and C3, the mean values point towards better recognition for C3 than for C2 in the case of anger and joy (see Table 3). These results suggest that using emotion-specific gestures may allow people to better understand a communicated emotion, but only when a suitable gesture is chosen, which was most likely not the case for sadness in our experiment.

6. CONCLUSION
This paper presented an experiment that investigates emotion communication via copying behaviour in an ECA. The results suggest that a combination of the type of movement performed and its quality is important for successfully communicating emotions; they can inform the design of an agent capable of establishing an affective loop with the user by generating expressive copying behaviour.

Future work will further investigate the role of the type of gesture performed by the agent for the purpose of emotion communication. The choice of the appropriate type of gesture for successful emotion communication during a copying behaviour task is likely to be influenced by contextual factors such as the characteristics of the interaction scenario and the target users.

7. ACKNOWLEDGMENTS
We would like to thank Tanja Bänziger and Klaus Scherer for the videos from the GEMEP corpus.

8. REFERENCES
[1] T. Bänziger and K. Scherer. Introducing the Geneva Multimodal Emotion Portrayal (GEMEP) corpus. In K. R. Scherer, T. Bänziger, and E. B. Roesch, editors, Blueprint for Affective Computing: A Sourcebook. Oxford University Press, Oxford, England, in press.
[2] D. Bernhardt and P. Robinson. Detecting affect from non-stylised body motions. In A. Paiva, R. Prada, and R. W. Picard, editors, Affective Computing and Intelligent Interaction, Second International Conference, ACII 2007, Lisbon, Portugal, September 12-14, 2007, Proceedings, volume 4738 of LNCS, pages 59–70. Springer-Verlag, Berlin, 2007.
[3] A. Camurri, P. Coletta, G. Varni, and S. Ghisio. Developing multimodal interactive systems with EyesWeb XMI. In Proceedings of the 2007 Conference on New Interfaces for Musical Expression, pages 305–308, 2007.
[4] A. Camurri, I. Lagerlöf, and G. Volpe. Recognizing emotion from dance movement: Comparison of spectator recognition and automated techniques. International Journal of Human-Computer Studies, 59:213–225, July 2003.
[5] G. Castellano and M. Mancini. Analysis of emotional gestures for the generation of expressive copying behaviour in an embodied agent. In M. S. Dias, S. Gibet, M. Wanderley, and R. Bastos, editors, Advances in Gesture-based Human-Computer Interaction and Simulation: 7th International Gesture Workshop, GW 2007, Lisbon, Portugal, May 2007, Revised Selected Papers, volume 5085 of LNAI, pages 193–198. Springer-Verlag, Berlin, 2009.
[6] G. Castellano, M. Mancini, C. Peters, and P. McOwan. Expressive copying behavior for social agents: A perceptual analysis. IEEE Transactions on Systems, Man and Cybernetics - Part A, in press.
[7] G. Castellano, S. D. Villalba, and A. Camurri. Recognising human emotions from body movement and gesture dynamics. In A. Paiva, R. Prada, and R. W. Picard, editors, Affective Computing and Intelligent Interaction, Second International Conference, ACII 2007, Lisbon, Portugal, September 12-14, 2007, Proceedings, volume 4738 of LNCS, pages 71–82. Springer-Verlag, Berlin, 2007.
[8] H. Gunes and M. Piccardi. Automatic temporal segment detection and affect recognition from face and body display. IEEE Transactions on Systems, Man and Cybernetics - Part B, 39(1):64–84, February 2009.
[9] A. Kleinsmith, N. Bianchi-Berthouze, and A. Steed. Automatic recognition of non-acted affective postures. IEEE Transactions on Systems, Man, and Cybernetics - Part B, 99:1–12, January 2011.
[10] S. Kopp, T. Sowa, and I. Wachsmuth. Imitation games with an artificial agent: From mimicking to understanding shape-related iconic gestures. In A. Camurri and G. Volpe, editors, Gesture-based Communication in Human-Computer Interaction, volume 2915 of LNAI, pages 436–447. Springer-Verlag, Berlin, 2004.
[11] R. M. Maatman, J. Gratch, and S. Marsella. Natural behavior of a listening agent. In Proceedings of the 5th International Conference on Intelligent Virtual Agents (IVA), Kos, Greece. Springer-Verlag, 2005.
[12] M. Mancini, G. Castellano, C. Peters, and P. McOwan. Evaluating the communication of emotion via expressive gesture copying behaviour in an embodied humanoid agent. In International Conference on Affective Computing and Intelligent Interaction, Memphis, USA, 2011. Springer.
[13] C. Pelachaud. Multimodal expressive embodied conversational agents. In MULTIMEDIA '05: Proceedings of the 13th Annual ACM International Conference on Multimedia, pages 683–689, New York, NY, USA, 2005. ACM Press.
[14] D. V. Pynadath and S. Marsella. PsychSim: Modeling theory of mind with decision-theoretic agents. In IJCAI, pages 1181–1186, 2005.
[15] D. Reidsma, A. Nijholt, R. Poppe, R. Rienks, and H. Hondorp. Virtual rap dancer: Invitation to dance. In CHI '06 Extended Abstracts on Human Factors in Computing Systems, pages 263–266. ACM, 2006.
[16] J. Sanghvi, G. Castellano, I. Leite, A. Pereira, P. W. McOwan, and A. Paiva. Automatic analysis of affective postures and body motion to detect engagement with a game companion. In ACM/IEEE International Conference on Human-Robot Interaction, pages 305–312, Lausanne, Switzerland, 2011. ACM.