Collection and Analysis of Multimodal Interaction in Direction Giving Dialogues
Towards an Automatic Gesture Selection Mechanism for Metaverse Avatars
Takeo Tsukamoto, Yumi Muroya, Masashi Okamoto, Yukiko Nakano
Seikei University, Japan
Overview
- Introduction
- Research Goal and Questions
- Approach
- Data Collection Experiment
- Analysis
- Conclusion and Future Work
Introduction
- Online 3D virtual worlds based on Metaverse applications, e.g. Second Life (SL), are growing steadily in popularity
- ⇒ The communication methods are limited to: online chat with speech balloons, and manual gesture generation
Introduction (Cont.)
- Human face-to-face communication is largely dependent on non-verbal behaviors
- Ex.: direction-giving dialogues, in which many spatial gestures are used to illustrate directions and the physical relationships of buildings and landmarks
- How can we implement natural non-verbal behaviors in Metaverse applications?
Research Goal and Questions
Goal:
- Establish natural communication between avatars in the Metaverse, based on human face-to-face communication
Research Questions:
- Automation (gesture selection): how to automatically generate proper gestures?
- Comprehensibility (gesture display): how to intelligibly display gestures to the interlocutor?
Previous Work
- An automatic gesture selection mechanism for Japanese chat texts in Second Life [Tsukamoto, 2010]
- Example chat text: "You keep going straight on this road, then you will be able to find a house having a round window on your left."
Proxemics
- Proxemics is important for implementing comprehensible gestures in the Metaverse
- Previous work does not consider proxemics ⇒ there are cases where an avatar's gesture becomes unintelligible to the others
Approach
- Conduct an experiment to collect human gestures in direction-giving dialogues
- Collect the participants' verbal and non-verbal data
- Analyze the relationship between gestures and proxemics
Data Collection Experiment
- Direction Giver (DG): knows the way to any place on the campus of Seikei Univ.
- Direction Receiver (DR): knows nothing about the campus of Seikei Univ.
Experimental Procedure
- The DR asks the way to a specific building
- The DG explains how to get to the building
(Figure: the DG and the DR standing face to face)
Experimental Instructions
- Direction Receiver: instructed to completely understand the way to the goal through a conversation with the DG
- Direction Giver: instructed to confirm that the DR understood the directions correctly after the explanation was finished
Experimental Materials
- Each pair recorded a conversation for each goal place
Experimental Equipment
- Motion capture sensors: head, shoulder, right arm, and abdomen
- Headset microphone
- Video camera
(Figure: placement of the experimental equipment)
Collected Data

Transcription of Utterances (video data):

Utterer  Start Time (sec.)  End Time (sec.)  Utterance Content
DG       6.8153             7.9568           Well,
DG       8.3196             10.591           It is hard to understand because there are many buildings.
DR       10.5869            10.7703          Yes
DG       10.8244            12.1624          Uh… there is the connecting corridor
DG       12.3124            13.4045          at the front
DR       13.0544            13.3586          Yes
DG       13.5765            14.5465          and that's
Motion Capture Data
Analysis
- Investigated the DG's gesture distribution with respect to proxemics
- Analyzed 30 dialogues collected from 10 pairs
- The analysis focused on the movements of the DG's right arm during gesturing
Automatic Gesture Annotation
Extracted features (see the sketch below):
- Movement of position (x, y, z)
- Rotation (x, y, z)
- Relative position of the right arm to the shoulder (x, y, z)
- Distance between the right arm and the shoulder
Binary judgment: Gesturing / Not gesturing
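To make the feature set concrete, here is a minimal Java sketch of how such a per-frame feature vector could be computed from the right-arm and shoulder sensors. The class, method, and parameter names are hypothetical; the slides do not specify an implementation.

```java
// Hypothetical per-frame feature extraction for the right-arm sensor.
// All names are illustrative; the original computation is not shown in the slides.
public final class GestureFeatures {

    /**
     * @param armPos      right-arm sensor position (x, y, z) in this frame
     * @param armPrevPos  right-arm sensor position in the previous frame
     * @param armRot      right-arm sensor rotation (x, y, z)
     * @param shoulderPos shoulder sensor position (x, y, z)
     * @return a 10-dimensional feature vector matching the list above
     */
    public static double[] extract(double[] armPos, double[] armPrevPos,
                                   double[] armRot, double[] shoulderPos) {
        double[] f = new double[10];
        for (int i = 0; i < 3; i++) {
            f[i]     = armPos[i] - armPrevPos[i];   // movement of position
            f[3 + i] = armRot[i];                   // rotation
            f[6 + i] = armPos[i] - shoulderPos[i];  // arm relative to shoulder
        }
        // Euclidean distance between right arm and shoulder
        f[9] = Math.sqrt(f[6] * f[6] + f[7] * f[7] + f[8] * f[8]);
        return f;
    }
}
```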
- Manually annotating non-verbal behaviors is very time-consuming ⇒ the gesture occurrences were annotated automatically
- More than 77% of the gestures are right-arm gestures
- Built a decision tree that identifies right-arm gestures; Weka J48 was used for the decision tree learning
Automatic Gesture Annotation (Cont.)
- As the result of 10-fold cross-validation, the accuracy is 97.5%: accurate enough for automatic annotation (a training sketch follows below)
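Since the slides name Weka's J48 learner, the training and 10-fold cross-validation step might look as follows using Weka's Java API. This is a minimal sketch: the ARFF file name and its layout (the features above plus a binary class attribute) are our assumptions.

```java
import java.util.Random;

import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class GestureTreeTrainer {
    public static void main(String[] args) throws Exception {
        // Hypothetical ARFF file: one instance per frame, with the extracted
        // features and a binary class (Gesturing / Not gesturing).
        Instances data = new DataSource("gesture_features.arff").getDataSet();
        data.setClassIndex(data.numAttributes() - 1);

        J48 tree = new J48();  // Weka's C4.5 decision tree learner

        // 10-fold cross-validation, as reported in the slides
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(tree, data, 10, new Random(1));
        System.out.printf("Accuracy: %.1f%%%n", eval.pctCorrect());

        // Train on all data for use as an automatic annotator
        tree.buildClassifier(data);
    }
}
```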
(Figure: example of automatic annotation)
Gesture Display Space
- Defined as the overlap among the DG's front area, the DR's front area, and the DR's front field of vision (a geometric sketch follows the figure below)
(Figure: the gesture display space between the Direction Giver and the Direction Receiver, showing its center, the distance of the DG from the center, the distance of the DR from the center, the DG's and DR's body direction vectors, and the DR's front field of vision)
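The slides define this space only by the figure, so the following 2-D sketch is speculative: it assumes each front area can be modeled as the half-plane in front of the body direction vector, and the DR's front field of vision as a cone with an assumed half-angle. A floor point belongs to the gesture display space if it passes all three tests.

```java
// Speculative 2-D model of the gesture display space (floor coordinates).
// Half-plane front areas and a 60-degree half-angle field of vision are
// assumptions on our part; the slides give only a diagram.
public final class GestureDisplaySpace {
    private static final double FOV_HALF = Math.toRadians(60);  // assumed

    public static boolean contains(double[] p,
                                   double[] dgPos, double[] dgDir,
                                   double[] drPos, double[] drDir) {
        return inFront(p, dgPos, dgDir)                  // DG's front area
            && inFront(p, drPos, drDir)                  // DR's front area
            && angle(sub(p, drPos), drDir) <= FOV_HALF;  // DR's field of vision
    }

    private static boolean inFront(double[] p, double[] pos, double[] dir) {
        return dot(sub(p, pos), dir) > 0;  // strictly in front of the body plane
    }

    private static double[] sub(double[] a, double[] b) {
        return new double[] { a[0] - b[0], a[1] - b[1] };
    }
    private static double dot(double[] a, double[] b) {
        return a[0] * b[0] + a[1] * b[1];
    }
    private static double angle(double[] a, double[] b) {
        return Math.acos(dot(a, b) / (Math.hypot(a[0], a[1]) * Math.hypot(b[0], b[1])));
    }
}
```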
Categories of Proxemics

Category (dialogues)  Conditions
Normal (12/30)        450mm ≦ Distance Both-center ≦ 950mm
Close_to_DG (4/30)    Distance DG-center ≦ 450mm and 450mm ≦ Distance DR-center ≦ 950mm
Close_to_DR (8/30)    Distance DR-center ≦ 450mm and 450mm ≦ Distance DG-center ≦ 950mm
Close_to_Both (2/30)  Distance Both-center ≦ 450mm
Far_from_Both (4/30)  950mm ≦ Distance DG-center or 950mm ≦ Distance DR-center

450mm to 950mm is defined as the standard distance from the center of the gesture display space: human arm length is 60cm to 80cm, and a 15cm margin is applied on each side (60cm - 15cm = 45cm; 80cm + 15cm = 95cm). These conditions translate directly into the classifier sketched below.
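A direct transcription of the table into code; the class and method names are ours, and since the table's boundary cases overlap at exactly 450mm and 950mm, the order of the checks here is our choice.

```java
// Classifies a dialogue's proxemics from the distances (in millimetres)
// of the DG and DR to the center of the gesture display space.
public final class ProxemicsCategory {
    public static String classify(double dDG, double dDR) {
        if (dDG >= 950 || dDR >= 950) return "Far_from_Both";
        if (dDG <= 450 && dDR <= 450) return "Close_to_Both";
        if (dDG <= 450)               return "Close_to_DG";  // DR within 450-950
        if (dDR <= 450)               return "Close_to_DR";  // DG within 450-950
        return "Normal";              // both within 450-950
    }
}
```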
Analysis: Relationship between Proxemics and Gesture Distribution
- Analyzed the distribution of gestures by plotting the DG's right arm positions
(Figure: right-arm position plots per category; compared with Normal, the Close_to_DG distribution is similar, Close_to_DR is wider, and Close_to_Both is smaller)
Analysis: Relationship between Proxemics and Gesture Distribution (Cont.)
- Size of the gesture distribution range: Close_to_Both < Normal = Close_to_DG < Close_to_DR (a sketch of one possible range metric follows below)
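The slides compare the distribution ranges only visually. One simple way to quantify the range, which is our assumption rather than the authors' metric, is the RMS distance of the right-arm positions from their mean, computed per proxemics category:

```java
import java.util.List;

// Hypothetical spread metric: RMS distance of the DG's right-arm positions
// (x, y) from their mean. A larger value means a wider gesture distribution.
// Assumes a non-empty list of positions.
public final class GestureSpread {
    public static double spread(List<double[]> armPositions) {
        double mx = 0, my = 0;
        for (double[] p : armPositions) { mx += p[0]; my += p[1]; }
        int n = armPositions.size();
        mx /= n;
        my /= n;
        double sq = 0;
        for (double[] p : armPositions) {
            double dx = p[0] - mx, dy = p[1] - my;
            sq += dx * dx + dy * dy;  // squared distance from the mean
        }
        return Math.sqrt(sq / n);
    }
}
```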
Applying the Proxemics Model
- Create avatar gestures based on our proxemics model, to test whether the findings are applicable
(Figure: generated avatar gestures in the Close_to_DG and Close_to_DR conditions)
Conclusion
- Conducted an experiment to collect human gestures in direction-giving dialogues
- Investigated the relationship between proxemics and the gesture distribution
- Proposed five types of proxemics, characterized by the distance from the center of the gesture display space
- Found that the gesture distribution range differed depending on the proxemics of the participants
Future Work
- Establish a computational model for determining gesture direction
- Examine the effectiveness of the model: whether users perceive the avatar's gestures as appropriate and informative
Thank you for your attention
Related Work
- [Breitfuss, 2008] Built a system that automatically adds gestural behavior and eye gaze, based on linguistic and contextual information of the input text
- [Tepper, 2004] Proposed a method for generating novel iconic gestures. Spatial information about the locations and shapes of landmarks is used to represent the concepts of words; from a set of parameters, iconic gestures are generated without relying on a lexicon of gesture shapes
- [Bergmann, 2009] Represented individual variation of gesture shapes using a Bayesian network, and built an extensive corpus of multimodal behaviors in a direction-giving and landmark description task