
Université Paris 8

Multimodal Expressive Embodied Conversational Agents

Catherine Pelachaud

Elisabetta Bevacqua

Nicolas Ech Chafai, FT

Maurizio Mancini

Magalie Ochs, FT

Christopher Peters

Radek Niewiadomski

ECAs Capabilities

Anthropomorphic autonomous figures

New form of human-machine interaction

Study of human communication, human-human interaction

ECAs ought to be endowed with dialogic and expressive capabilities

Perception: an ECA must be able to pay attention to and perceive the user and the context she is placed in.

ECAs Capabilities

Interaction:
– speaker and addressee emit signals
– speaker perceives feedback from addressee
– speaker may decide to adapt to addressee’s feedback
– consider social context

Generation: expressive synchronized visual and acoustic behaviors.
– produce expressive behaviours: words, voice, intonation, gaze, facial expression, gesture, body movements, body posture

Synchrony tool - BEAT

Cassell et al., Media Lab MIT

Decomposition of text into theme and rheme

Linked to WordNet

Computation of:
– intonation
– gaze
– gesture

Virtual Training Environments: MRE

(J. Gratch, L. Johnson, S. Marsella…, USC)

Interactive System

Real estate agent

Gesture synchronized with speech and intonation

Small talk

Dialog partner

MAX, S. Kopp, U of Bielefeld

Gesture understanding and imitation

Gilbert and George at the Bank (UPenn, 1994)

Greta

Problem to Be Solved

Human communication is endowed with three devices to express communicative intention:
– Verbs and formulas
– Intonation and paralinguistics
– Facial expression, gaze, gesture, body movement, posture…

Problem: For any communicative act, the Speaker has to decide:
– Which nonverbal behaviors to show
– How to execute them

Verbal and Nonverbal Communication

Suppose I want to advise a friend to take her umbrella because it is raining.

Which signals do I use?

Verbal signal: use of a syntactically complex sentence:

Take your umbrella because it is raining

Verbal + nonverbal signals:

Take your umbrella + point to the window to show the rain by a gesture or by gaze

Multimodal Signals

The whole body communicates by using:
– Verbal acts (words and sentences)
– Prosody, intonation (nonverbal vocal signals)
– Gesture (hand and arm movements)
– Facial action (smile, frown)
– Gaze (eyes and head movements)
– Body orientation and posture (trunk and leg movements)

All these systems of signals have to cooperate in expressing the overall meaning of the communicative act.

Multimodal Signals

Accompany flow of speech

Synchronized at the verbal level

Punctuate accented phonemic segments and pauses

Substitute for word(s)

Emphasize what is being said

Regulate the exchange of speaking turns

Synchronization

There exists an isomorphism between patterns of speech, intonation and facial actions

Different levels of synchrony:

– Phoneme level (blink)

– Word level (eyebrow)

– Phrase level (hand gesture)

Interactional synchrony: synchrony between speaker and addressee

Taxonomy of Communicative Functions (I. Poggi)

The speaker may provide three broad types of information:
– Information about the world: deictic, iconic (adjectival),…
– Information about the speaker’s mind:
    belief (certainty, adjectival)
    goal (performative, rheme/theme, turn-system, belief relation)
    emotion
    meta-cognitive
– Information about speaker’s identity (sex, culture, age…)

Multimodal Signals (Isabella Poggi)

Characterization of multimodal signals by their placement with respect to the linguistic utterance and their significance in transmitting information. E.g.:
– Raised eyebrow may signal surprise, emphasis, question mark, suggestion…
– Smile may express happiness, be a polite greeting, be a backchannel signal…

Two pieces of information are needed to characterize multimodal signals:
– Their meaning
– Their visual action

Lexicon = (meaning, signal)

Expression meaning
– deictic: this, that, here, there
– adjectival: small, difficult
– certainty: certain, uncertain…
– performative: greet, request
– topic comment: emphasis
– belief relation: contrast,…
– turn allocation: take/give turn
– affective: anger, fear, happy-for, sorry-for, envy, relief, …

Expression signal
– Deictic: gaze direction
– Certainty: Certain: palm up open hand; Uncertain: raised eyebrow
– Adjectival: small eye aperture
– Belief relation: Contrast: raised eyebrow
– Performative: Suggest: small raised eyebrow, head aside; Assert: horizontal ring
– Emotion: Sorry-for: head aside, inner eyebrow up; Joy: raising fist up
– Emphasis: raised eyebrows, head nod, beat

Representation Language

Affective Presentation Markup Language – APML

– describes the communicative functions

– works at meaning level and not the signal level

<APML>
  <turn-allocation type="take turn">
    <performative type="greet">
      Good Morning, Angela.
    </performative>
    <affective type="happy">
      It is so
      <topic-comment type="comment"> wonderful </topic-comment>
      to see you again.
    </affective>
    <certainty type="certain">
      I was
      <topic-comment type="comment"> sure </topic-comment>
      we would do so, one day!
    </certainty>
  </turn-allocation>
</APML>
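A minimal sketch of how such an APML fragment might be read, assuming Python's standard `xml.etree.ElementTree` parser. The helper name `communicative_functions` is hypothetical and this is not the parser used in the system described here; it only shows that each tag carries a meaning (tag name plus `type`) and spans a stretch of text.

```python
import xml.etree.ElementTree as ET

APML_TEXT = """
<APML>
  <turn-allocation type="take turn">
    <performative type="greet">Good Morning, Angela.</performative>
    <affective type="happy">It is so
      <topic-comment type="comment">wonderful</topic-comment>
      to see you again.
    </affective>
    <certainty type="certain">I was
      <topic-comment type="comment">sure</topic-comment>
      we would do so, one day!
    </certainty>
  </turn-allocation>
</APML>
"""


def communicative_functions(apml_text):
    """Return (tag, type, depth, spanned text) for every tag below <APML>."""
    root = ET.fromstring(apml_text)
    found = []

    def walk(element, depth):
        for child in element:
            # itertext() gathers the words the tag spans, nested tags included.
            words = " ".join("".join(child.itertext()).split())
            found.append((child.tag, child.get("type"), depth, words))
            walk(child, depth + 1)

    walk(root, 1)
    return found


for tag, kind, depth, words in communicative_functions(APML_TEXT):
    print("  " * depth + f"{tag}[{kind}]: {words}")
```

The depth recorded for each tag is the kind of information a later stage could use when it has to choose between signals attached to nested tags.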

Facial Description Language

Facial expressions defined as (meaning, signal) pairs stored in a library

Hierarchical set of classes:
– Facial basis (FB) class: basic facial movement
– An FB may be represented as a set of MPEG-4 compliant FAPs or, recursively, as a combination of other FBs using the `+' operator:

    FB = {fap3 = v_1, …, fap69 = v_k};

    FB' = c_1*FB_1 + c_2*FB_2;

where c_1 and c_2 are constants and FB_1 and FB_2 can be:
– Previously defined FBs
– FBs of the form {fap3 = v_1, …, fap69 = v_k}
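The recursive definition can be sketched as a weighted sum of FAP dictionaries. This is only an illustration: the function name `combine`, the FAP keys and all values below are hypothetical, and a FAP absent from one facial basis is assumed to contribute 0.

```python
def combine(c1, fb1, c2, fb2):
    """Return c1*fb1 + c2*fb2 for two facial bases given as {fap_name: value}.

    Assumption: a FAP that one facial basis does not mention counts as 0.
    """
    faps = sorted(set(fb1) | set(fb2))
    return {fap: c1 * fb1.get(fap, 0.0) + c2 * fb2.get(fap, 0.0) for fap in faps}


# Hypothetical facial bases; the FAP numbers and values are placeholders.
raise_eyebrow = {"fap31": 100.0, "fap32": 100.0}
raise_lid = {"fap19": -50.0, "fap20": -50.0}

# FB' = 1.0 * raise_eyebrow + 0.5 * raise_lid
mixed = combine(1.0, raise_eyebrow, 0.5, raise_lid)
print(mixed)
```

Because the result is again a dictionary of FAP values, it can be fed back into `combine`, which mirrors the recursive use of previously defined FBs.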

Facial Basis Class

Examples of facial basis class:
– Eyebrow: small_frown, left_raise, right_raise
– Eyelid: upper_lid_raise
– Mouth: left_corner_stretch, left_corner_raise

[Figure: facial bases combined with the `+' operator]

Facial Displays

Every facial display (FD) is made up of one or more FBs:
– FD = FB_1 + FB_2 + FB_3 + … + FB_n;
– surprise = raise_eyebrow + raise_lid + open_mouth;
– worried = (surprise*0.7) + sadness;

Facial Displays

Probabilistic mapping between the tags and signals:
– E.g.: happy_for = (smile*0.5, 0.3) + (smile*0.25) + (smile*2 + raised_eyebrow, 0.35) + (nothing, 0.1)

Definition of a function class for addressee association (meaning, signal)

Class communicative function:
– Certainty
– Adjectival
– Performative
– Affective
– …
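A probabilistic mapping from one meaning to several alternative signals can be sketched with weighted random choice. The table below is hypothetical: the signal names and probabilities are placeholders chosen to sum to 1, not the values of the lexicon described above.

```python
import random

# Hypothetical lexicon entry: each alternative is (signal, probability).
HAPPY_FOR = [
    ("small_smile", 0.3),
    ("smile", 0.25),
    ("big_smile+raised_eyebrow", 0.35),
    ("nothing", 0.1),
]


def pick_signal(alternatives, rng):
    """Draw one signal according to the probabilities of its alternatives."""
    signals = [signal for signal, _ in alternatives]
    weights = [probability for _, probability in alternatives]
    return rng.choices(signals, weights=weights, k=1)[0]


# A seeded generator keeps the illustration reproducible.
rng = random.Random(0)
counts = {}
for _ in range(10000):
    chosen = pick_signal(HAPPY_FOR, rng)
    counts[chosen] = counts.get(chosen, 0) + 1
print(counts)
```

Over many draws the observed frequencies approach the listed probabilities, so the same tag does not always produce the same facial display.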

Facial Temporal Course

Gestural Lexicon

Certainty:
– Certain: palm up open hand
– Uncertain: showing empty hands while lowering forearms

Belief-relation:
– List of items of same class: numbering on fingers
– Temporal relation: fist with extended hand moves back and forth behind one’s shoulder

Turn-taking:
– Hold the floor: raise hand, palm toward hearer

Performative:
– Assert: horizontal ring
– Reproach: extended index, palm to left, rotating up & down on wrist

Emphasis: beat

Gesture Specification Language

Scripting language for hand-arm gestures, based on formational parameters [Stokoe]:
– Hand shape specified using HamNoSys [Prillwitz et al.]
– Arm position: concentric squares in front of agent [McNeill]
– Wrist orientation: palm and finger base orientation

Gestures are defined by a sequence of timed key poses: gesture frames

Gestures are broken down temporally into distinct (optional) phases:
– Gesture phase: preparation, stroke, hold, retraction
– Change of formational components over time
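The idea of a gesture as a sequence of timed key poses grouped into phases can be sketched as a small data structure. All class names, field names and pose values here are hypothetical placeholders; the strings stand in for HamNoSys symbols and gesture-space sectors and are not the actual scripting syntax.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

PHASES = ("preparation", "stroke", "hold", "retraction")


@dataclass
class GestureFrame:
    time: float             # seconds from the start of the gesture
    phase: str              # one of PHASES
    hand_shape: str         # placeholder for a HamNoSys hand-shape symbol
    arm_position: str       # placeholder for a sector of the gesture space
    palm_orientation: str   # placeholder for the wrist orientation


@dataclass
class Gesture:
    name: str
    frames: List[GestureFrame]

    def is_well_formed(self) -> bool:
        """True if key poses are time-ordered and phases never go backwards."""
        times = [f.time for f in self.frames]
        order = [PHASES.index(f.phase) for f in self.frames]
        return times == sorted(times) and order == sorted(order)

    def phase_span(self, phase: str) -> Optional[Tuple[float, float]]:
        """Return (start, end) of a phase, or None if the phase is absent."""
        times = [f.time for f in self.frames if f.phase == phase]
        return (min(times), max(times)) if times else None


# Illustrative key poses only; every phase is optional, so "hold" is omitted.
certain = Gesture("certain", [
    GestureFrame(0.0, "preparation", "open_hand", "centre", "palm_up"),
    GestureFrame(0.4, "stroke", "open_hand", "centre_low", "palm_up"),
    GestureFrame(0.6, "stroke", "open_hand", "centre_low", "palm_up"),
    GestureFrame(1.0, "retraction", "open_hand", "rest", "palm_down"),
])
print(certain.is_well_formed(), certain.phase_span("stroke"))
```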

Gesture specification example: Certain

Gesture Temporal Course

rest position, preparation, stroke start – stroke end, retraction, rest position

ECA Architecture

ECA Architecture

Input to the system: APML annotated text

Output of the system: animation files and WAV file for the audio

System:
– Interprets APML tagged dialogs, i.e. all communicative functions
– Looks up in a library the mapping between the meaning (specified by the XML tag) and signals
– Decides which signals to convey on which modalities
– Synchronizes the signals with speech at different levels (word, phoneme or utterance)

Behavioral Engine

Modules

APML Parser: XML parser

TTS Festival: manages the speech synthesis and gives the list of phonemes and phoneme durations.

Expr2Signal Converter: given a communicative function and its meaning, this module returns the list of facial signals

Conflicts Resolver: resolves the conflicts that may happen when more than one facial signal should be activated on the same facial parts

Face Generator: converts the facial signals into MPEG-4 FAP values

Viseme Generator: converts each phoneme, given by Festival, into a set of FAPs

MPEG4 FAP Decoder: an MPEG-4 compliant Facial Animation Engine

TTS Festival

Drives the synchronization of facial expression

Synchronization implemented at word level
– Timing of facial expression connected to the text embedded between the markers

Use of the tree structure of Festival to compute expression durations
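Word-level timing can be sketched as follows, assuming the synthesizer has already delivered, for each word, the durations of its phonemes. The input format and the function names are hypothetical; this does not call Festival and does not reproduce its utterance tree.

```python
def word_times(words):
    """words: list of (word, [phoneme durations in seconds]).

    Returns a list of (word, start, end) obtained by accumulating durations.
    """
    clock = 0.0
    timed = []
    for word, durations in words:
        start = clock
        clock += sum(durations)
        timed.append((word, start, clock))
    return timed


def tag_interval(timed_words, first, last):
    """Start and end time of a tag that spans words first..last (inclusive)."""
    return timed_words[first][1], timed_words[last][2]


# Hypothetical phoneme durations for the words of a short greeting.
utterance = [
    ("Good", [0.25, 0.25]),
    ("Morning", [0.125, 0.25, 0.125]),
    ("Angela", [0.5, 0.25]),
]
timed = word_times(utterance)
print(tag_interval(timed, 0, 1))
```

The interval returned for a tag is the time window in which the facial expression attached to that tag would be displayed.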

Expr2Signal Converter

Instantiation of APML tags: meaning of a given communicative function

Converts markers into facial signals

Use of a library containing the lexicon of the type (meaning, facial expressions)

Gaze Model

Based on the communicative functions model of Isabella Poggi

This model predicts what the value of gaze should be in order to have a given meaning in a given conversational context.

For example:

– if the agent wants to emphasize a given word, the model will output that the agent should gaze at her conversant.

Very deterministic behavior model: to every Communicative Function associated with a meaning corresponds the same signal (with probabilistic changes)

Event-driven model: only when a Communicative Function is specified are the associated signals computed

Only when a Communicative Function is specified may the corresponding behavior vary

Several drawbacks as there is no temporal consideration:

– No consideration of past and current gaze behavior to compute the new one

– No consideration of how long the current gaze state of S (speaker) and L (listener) has lasted

Gaze Algorithm

Two steps:

1. Communicative prediction:
• Apply the communicative function model to compute the gaze behavior so as to convey a given meaning for S and L

2. Statistical prediction:
• The communicative gaze model is probabilistically modified by a statistical model defined with constraints:
– what is the communicative gaze behavior of S and L
– in which gaze behavior S and L were
– the duration of the current state of S and L

Temporal Gaze Parameters

The gaze behaviors depend on the communicative functions, general purpose of the conversation (persuasive discourse, teaching...), personality, cultural roots, social relations...

A full model would be too complex

Proposal: parameters that control the overall gaze behavior

T^{S=1,L=1}_{max}: maximum duration the mutual gaze state may remain active.

T^{S=1}_{max}: maximum duration of gaze state S=1.

T^{L=1}_{max}: maximum duration of gaze state L=1.

T^{S=0}_{max}: maximum duration of gaze state S=0.

T^{L=0}_{max}: maximum duration of gaze state L=0.
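How maximum durations could constrain a communicative prediction is sketched below. This is a deliberately simplified, deterministic stand-in for the statistical step: the names `GazeLimits` and `constrain_gaze` are hypothetical, the coding 1 = gazing at the partner and 0 = looking away is an assumption, and so is the rule that the speaker is the one who breaks an over-long mutual gaze.

```python
from dataclasses import dataclass


@dataclass
class GazeLimits:
    mutual: float  # maximum duration of the mutual gaze state (S=1, L=1)
    s1: float      # maximum duration of gaze state S=1
    l1: float      # maximum duration of gaze state L=1
    s0: float      # maximum duration of gaze state S=0
    l0: float      # maximum duration of gaze state L=0


def constrain_gaze(desired, previous, elapsed, mutual_elapsed, limits):
    """Adjust the communicative prediction with the duration limits.

    desired, previous: (S, L) gaze states, 1 = gazing at partner (assumed).
    elapsed: (seconds S has held its state, seconds L has held its state).
    mutual_elapsed: seconds the mutual gaze state has lasted so far.
    """
    max_for = {("S", 1): limits.s1, ("S", 0): limits.s0,
               ("L", 1): limits.l1, ("L", 0): limits.l0}
    result = []
    for who, want, prev, held in zip("SL", desired, previous, elapsed):
        # A state that has reached its maximum duration is forced to change.
        if want == prev and held >= max_for[(who, prev)]:
            want = 1 - prev
        result.append(want)
    s, l = result
    # Assumption: the speaker breaks a mutual gaze that has lasted too long.
    if (s, l) == (1, 1) and previous == (1, 1) and mutual_elapsed >= limits.mutual:
        s = 0
    return s, l


limits = GazeLimits(mutual=2.0, s1=4.0, l1=6.0, s0=3.0, l0=3.0)
print(constrain_gaze((1, 1), (1, 1), (2.5, 2.5), 2.5, limits))
```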

Mutual Gaze

Gaze Aversion

Gesture Planner

Adaptive instantiation:
– Preparation and retraction phase adjustments
– Transition key and rest gesture insertion
– Joint-chain follow-through: forward time shifting of children joints

Stroke of gesture on stressed word

Stroke expansion:
– During planning phase, identify rheme clauses with closely repeated emphases/pitch accents
– Indicate secondary accents by repeating the stroke of the primary gesture with decreasing amplitude
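Stroke expansion can be sketched as a pass over accent times that repeats the stroke with a smaller amplitude whenever an accent closely follows the previous one. The function name, the geometric decay and the `max_gap` threshold are assumptions made for illustration; the source only states that the amplitude decreases.

```python
def expand_stroke(accent_times, amplitude=1.0, decay=0.5, max_gap=0.75):
    """Return one (time, amplitude) stroke per pitch accent.

    An accent within max_gap seconds of the previous one repeats the stroke
    at the previous amplitude times decay; otherwise a new primary stroke
    starts at full amplitude. Both parameters are illustrative assumptions.
    """
    strokes = []
    current = amplitude
    previous = None
    for t in accent_times:
        if previous is not None and t - previous <= max_gap:
            current *= decay      # secondary accent: weaker repeated stroke
        else:
            current = amplitude   # isolated accent: primary stroke
        strokes.append((t, current))
        previous = t
    return strokes


print(expand_stroke([0.0, 0.5, 1.0, 3.0]))
```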

Gesture Planner

Determination of gesture:
– Look in dictionary

Selection of gesture:
– Gestures associated with most embedded tags have priority (except beat): adjectival, deictic

Duration of gesture:
– Coarticulation between successive gestures close in time
– Hold for gestures belonging to tags higher up in the hierarchy (e.g. performative, belief-relation)
– Otherwise go to rest position
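The selection rule can be sketched as a choice by tag depth. The function name and the tuple layout are hypothetical, and the reading that a beat is chosen only when no other gesture is available is an assumption about the phrase "except beat".

```python
def select_gesture(candidates):
    """candidates: list of (tag, depth, gesture) active on the same words.

    Returns the gesture of the most deeply embedded tag. Beats are assumed
    to be chosen only when no other gesture is available.
    """
    non_beat = [c for c in candidates if c[2] != "beat"]
    pool = non_beat or candidates
    if not pool:
        return None
    return max(pool, key=lambda c: c[1])[2]


# Hypothetical candidates; depth is the nesting level of the tag in the markup.
active = [
    ("performative", 1, "horizontal_ring"),
    ("adjectival", 2, "small_aperture"),
    ("emphasis", 3, "beat"),
]
print(select_gesture(active))
```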

Behavior Expressivity

Behavior is related to (Wallbott, 1998):
– the quality of the mental state (e.g. emotion) it refers to
– the quantity (somehow linked to the intensity factor of the mental state)

Behaviors encode:
– content information (what is communicated)
– expressive information (how it is communicated)

Behavior expressivity refers to the manner of execution of the behavior

Expressivity Dimensions

Spatial: amplitude of movement

Temporal: duration of movement

Power: dynamic property of movement

Fluidity: smoothness and continuity of movement

Repetitiveness: tendency to rhythmic repeats

Overall Activation: quantity of movement across modalities

Overall Activation

• Threshold filter on atomic behaviors during APML tag matching

• Determines the number of nonverbal signals to be executed.
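A threshold filter of this kind might look like the sketch below. The per-signal thresholds, the signal names and the function name are hypothetical; the source does not say how thresholds are assigned to atomic behaviors.

```python
def select_signals(candidates, overall_activation):
    """Keep the signals whose threshold the agent's activation reaches.

    candidates: list of (signal_name, threshold). Thresholds are placeholders.
    """
    return [name for name, threshold in candidates
            if overall_activation >= threshold]


# Hypothetical atomic behaviors matched for one APML tag.
matched = [("head_nod", -0.5), ("raised_eyebrow", 0.0), ("beat", 0.8)]
print(select_signals(matched, 0.6))
```

A higher overall activation lets more of the matched behaviors through, so the agent produces more nonverbal signals for the same text.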

Spatial Parameter

• Amplitude of movement controlled through asymmetric scaling of the reach space that is used to find IK goal positions

• Expand or condense the entire space in front of agent
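Asymmetric scaling of the reach space can be sketched as a per-axis scale applied to a goal position before inverse kinematics. The function name, the per-axis gains and the linear form `1 + spatial * gain` are assumptions for illustration only.

```python
def scale_reach(point, spatial, gains=(0.5, 0.25, 0.0), centre=(0.0, 0.0, 0.0)):
    """Scale an IK goal position around a centre, differently on each axis.

    spatial: expressivity value; positive expands, negative condenses.
    gains: hypothetical per-axis sensitivities (the asymmetry).
    """
    return tuple(c + (p - c) * (1.0 + spatial * g)
                 for p, c, g in zip(point, centre, gains))


goal = (2.0, 2.0, 2.0)
print(scale_reach(goal, 1.0))    # expanded reach
print(scale_reach(goal, -1.0))   # condensed reach
```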

Temporal Parameter

[Figure: stroke shift / velocity control of a beat gesture; Y position of wrist w.r.t. shoulder [cm] against frame number]

• Determine the speed of the arm movement of a gesture's meaning-carrying stroke phase

• Modify speed of stroke

Fluidity

• Continuity control of TCB interpolation splines and gesture-to-gesture coarticulation
• Continuity of arms’ trajectory paths
• Control the velocity profiles of an action

[Figure: X position of wrist w.r.t. shoulder [cm] against frame number]

Power

• Tension and Bias control of TCB splines
• Overshoot reduction
• Acceleration and deceleration of limbs

Hand shape control for gestures that do not need hand configuration to convey their meaning (beats).
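For reference, the tension, continuity and bias controls of a TCB (Kochanek-Bartels) spline act on the tangents at each key pose. The sketch below uses the standard textbook tangent formulas on scalar keys; how the fluidity and power parameters are mapped onto tension, continuity and bias is not specified in the source and is not modelled here.

```python
def tcb_tangents(p_prev, p, p_next, tension=0.0, continuity=0.0, bias=0.0):
    """Return (incoming, outgoing) Kochanek-Bartels tangents at key p."""
    t, c, b = tension, continuity, bias
    chord_in = p - p_prev     # chord arriving at the key
    chord_out = p_next - p    # chord leaving the key
    incoming = (0.5 * (1 - t) * (1 + b) * (1 - c) * chord_in
                + 0.5 * (1 - t) * (1 - b) * (1 + c) * chord_out)
    outgoing = (0.5 * (1 - t) * (1 + b) * (1 + c) * chord_in
                + 0.5 * (1 - t) * (1 - b) * (1 - c) * chord_out)
    return incoming, outgoing


def hermite(p0, p1, m0, m1, s):
    """Cubic Hermite interpolation between p0 and p1 for s in [0, 1]."""
    h00 = 2 * s**3 - 3 * s**2 + 1
    h10 = s**3 - 2 * s**2 + s
    h01 = -2 * s**3 + 3 * s**2
    h11 = s**3 - s**2
    return h00 * p0 + h10 * m0 + h01 * p1 + h11 * m1


# With all three controls at 0 the tangents reduce to the Catmull-Rom case.
print(tcb_tangents(0.0, 1.0, 4.0))
# Full tension removes the tangents, giving a sharper stop at the key pose.
print(tcb_tangents(0.0, 1.0, 4.0, tension=1.0))
```

A non-zero continuity makes the incoming and outgoing tangents differ, which produces a corner in the trajectory at that key pose.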

Repetitivity

• Technique of stroke expansion: Consecutive emphases are realized gesturally by repeating the stroke of the first gesture.

Multiple Modality Ex: Abrupt

Overall Activity = 0.6

Spatial = 0

Temporal = 1

Fluidity = -1

Power = 1

Repetition = -1

Multiple Modality Ex: Vigorous

Overall Activity = 1

Spatial = 1

Temporal = 1

Fluidity = 1

Power = 0

Repetition = 1
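The two example settings can be held in a small record type. The class name is hypothetical, and the validity range of [-1, 1] is an assumption inferred from the values shown on the slides, not a stated specification.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Expressivity:
    overall_activation: float
    spatial: float
    temporal: float
    fluidity: float
    power: float
    repetition: float

    def __post_init__(self):
        # Assumed range: every dimension lies in [-1, 1].
        for f in fields(self):
            value = getattr(self, f.name)
            if not -1.0 <= value <= 1.0:
                raise ValueError(f"{f.name}={value} is outside [-1, 1]")


# Values taken from the Abrupt and Vigorous examples.
ABRUPT = Expressivity(0.6, 0.0, 1.0, -1.0, 1.0, -1.0)
VIGOROUS = Expressivity(1.0, 1.0, 1.0, 1.0, 0.0, 1.0)
print(ABRUPT)
print(VIGOROUS)
```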

Evaluation of Expressive Gesture

(H1) The chosen implementation for mapping single dimensions of expressivity onto animation parameters is appropriate: a change in a single dimension can be recognized and correctly attributed by users.

(H2) Combining parameters in such a way that they reflect a given communicative intent will result in a more believable overall impression of the agent.

106 subjects from 17 to 26 years old

Perceptual Test Studies

Evaluation of the adequacy of the implementation of each parameter:
– check whether subjects could perceive and distinguish the six different expressivity parameters and indicate their direction of change.
– Result: good recognition for spatial and temporal parameters; lower recognition for fluidity and power parameters as they are inter-dependent.

Evaluation task: does setting appropriate values for the expressivity parameters create behaviors that are judged as exhibiting the corresponding expressivity?
– 3 different types of behaviors: abrupt, sluggish, vigorous
– users prefer the coherent performance for vigorous and abrupt

Interaction

Interaction: two or more parties exchange messages.

Interaction is by no means a one-way communication channel between parties.

Within an interaction, parties take turns in playing the roles of the speaker and of the addressee.

Interaction

Speaker and addressee adapt their behaviors to each other

– Speaker monitors addressee’s attention and interest in what he has to say

– Addressee selects feedback behaviors to show the speaker that he is paying attention

Speaker:
– Pointless for a speaker to engage in an act of communication if addressee does not pay or intend to pay attention
– Important for speaker to assess addressee’s engagement:
    when starting an interaction: assess the possibility of engagement in interaction (establish phase)
    when interaction is going on: check if engagement is lasting and sustaining conversation (maintain phase)

Addressee:
– attention: pay attention to the signals produced by speaker to perceive, process and memorize them
– perception: of signals
– comprehension: understand meaning attached to signals
– internal reaction: the comprehension of the meaning may create cognitive and emotional reaction
– decision: communication or not of the internal reaction
– generation: display behaviors

Backchannel

Types of backchannels (I. Poggi):
– attention
– comprehension
– belief
– interest
– agreement

positive/negative

any combination of the above: pay attention but not understand; understand but not believe, etc.

Depending on the type of speech act it responds to, a signal will be interpreted as a backchannel or not.
– backchannel: a signal of agreement / disagreement that follows the expression of opinions, evaluations, planning
– not a backchannel: a signal of comprehension / incomprehension after an explicit question « Did you understand? »

Polysemy of backchannel signals:
– a signal may provide different types of information
– a frown: negative feedback for understanding, believing and agreeing

Backchannel signals of gaze

gaze:
– show direction of attention
– inform on level of engagement or on intention to maintain engagement
– indicate degree of intimacy

but also
– monitor the gaze behavior of others to establish their intention to engage or remain engaged

shared attention situation involves mutual gaze between partners or joint gaze at the same object

64

Backchannel modelling

Reactive model
– generates an instinctive feedback without reasoning
– simple backchannel or mimicry
– spontaneous, sincere

Cognitive model
– conscious decision to provide backchannel to provoke a particular effect on the speaker or to reach a specific goal
– deliberate, possibly pretended
– it can be shifted to automatic (e.g. when listening to a bore)
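The contrast between the two models can be illustrated with a short Python sketch. Everything in it is hypothetical: the signal names, the rule table and the `goal` argument are invented for the example and are not taken from the system presented here.

```python
from typing import Optional

# Hypothetical reactive rules: perceived speaker signal -> listener signal.
REACTIVE_RULES = {
    "smile": "smile",        # mimicry
    "head_nod": "head_nod",  # mimicry
    "pause": "head_nod",     # simple backchannel
}


def reactive_backchannel(perceived_signal: str) -> Optional[str]:
    """Reactive model: a direct lookup, with no reasoning step."""
    return REACTIVE_RULES.get(perceived_signal)


def cognitive_backchannel(goal: str, agrees: bool) -> str:
    """Cognitive model: the signal is chosen to reach a goal.

    The displayed signal may differ from the listener's actual state,
    which is what makes a pretended backchannel possible.
    """
    if goal == "encourage_speaker":
        return "head_nod"  # shown even if the listener disagrees
    return "head_nod" if agrees else "head_shake"


print(reactive_backchannel("smile"))
print(cognitive_backchannel("encourage_speaker", agrees=False))
```

In this sketch the reactive path never consults the listener's goals, while the cognitive path can return a nod that does not match the listener's actual agreement.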

65

Backchannel Demo

66

A reactive backchannel

Currently, our model is reactive in nature
– Dependent on perception
    Speaker interprets addressee's behavior
    Speaker generates or alters its own behavior
– Our focus: interest and attention on a signal level (not on a cognitive level)

67

Organization of the communication
Attraction of attention

Communicative agents: the agents provide information to the user, and should guarantee that the user pays attention

Animation expressivity: principle of "staging", so that a single idea is clearly expressed at each instant of time

Animation specificity: animators' creativity, no realistic constraints for animators

What types of gesture properties could guarantee the user's attention?

France Telecom

68

Organization of the communication
Attraction of attention

Corpus: videos from traditional animation that illustrate different types of conversational interaction

The modulations of gesture expressivity over time play a role in managing communication, thus serving as a pragmatic tool

France Telecom

69

Emotion

elicited by the evaluation of events, objects, actions

integration of emotions in a dialog system (Artimis, FT)

identify under which circumstances a dialog agent should express emotions

France Telecom

70

Emotion

BDI representation based on OCC model: Appraisal variables [Ortony et al. 1988]:
– Desirability/Undesirability: achievement of, or threat to, the agent's choice
– Degree of realization: degree of certainty of the choice's achievement
– Probability of an event: probability of feasibility of an event
– Agency: the agent who is actor of the event

Emotional Mental State
Set of appraisal variables
Configuration of mental attitudes
Representation of appraisal variables by mental attitudes

France Telecom

71

Emotion

complex emotions:
– superposition of 2 emotions: evaluation of an event can happen from different angles
– mask an emotion by another one: consideration of social context

joy + deception = masking
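The two kinds of complex emotion can be sketched as a small data structure in Python. The class and function names are hypothetical, and the sketch makes a simplifying assumption that a mask hides the felt emotion completely; it is not a description of how the presented agent computes its facial expression.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ComplexEmotion:
    kind: str    # "superposition" or "masking"
    first: str   # for masking: the felt emotion
    second: str  # for masking: the emotion shown instead


def superpose(a: str, b: str) -> ComplexEmotion:
    """Two emotions arising from evaluating one event from different angles."""
    return ComplexEmotion("superposition", a, b)


def mask(felt: str, shown: str) -> ComplexEmotion:
    """One emotion is hidden behind another because of the social context."""
    return ComplexEmotion("masking", felt, shown)


def displayed(emotion: ComplexEmotion) -> Tuple[str, ...]:
    """Emotions intended to be visible (assumes a mask hides the felt one)."""
    if emotion.kind == "masking":
        return (emotion.second,)
    return (emotion.first, emotion.second)


masked = mask(felt="deception", shown="joy")
print(displayed(masked))  # prints "('joy',)"
```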

72

Video
Masking of Deception by Joy

73

Conclusion

Creation of a virtual agent able to
– communicate nonverbally
– show emotions
– use expressive gestures
– perceive and be attentive
– maintain the attention

Two studies on expressivity
– from manual annotation of video corpus
– from mimicry of movement analysis
