Page 1: Communication Homme-Machine MultiModale Jean-Claude MARTIN Maître de Conférences LIMSI-CNRS / LINC-IUT de Montreuil martin @ limsi.fr

Multimodal Human-Computer Communication

Jean-Claude MARTIN, Associate Professor (Maître de Conférences)

LIMSI-CNRS / LINC-IUT de Montreuil, martin @ limsi.fr

www.limsi.fr/Individu/martin

DESS Nouvelles Technologies et Handicaps Sensoriels et Physiques (postgraduate degree in New Technologies and Sensory and Physical Disabilities)

Jean-Claude MARTIN - LIMSI / LINC 2

Content

• Introduction to HCI
  • History
  • ICT & HCI

• Multimodal HCI
  • Concepts
  • Experimental studies
  • Software and systems
  • Roadmap

Human-Computer Communication

Introduction, history

History

Definition

• Curricula for Human-Computer Interaction: http://www.acm.org/sigchi/cdg/cdg2.html

• Human-computer interaction is a discipline concerned with the design, evaluation and implementation of interactive computing systems for human use and with the study of major phenomena surrounding them.

CLASSICAL LOOP SCHEME

HCI HISTORY Engelbart History Corner

HCI HISTORY Engelbart's first mouse

HCI HISTORY Computer-supported meeting

Engelbart, SRI (1967). Historic photos of Engelbart:

http://www.bootstrap.org/images/photos/index.htm

HCI HISTORY Distributed computer-supported meeting

Engelbart's presentation at the 1968 Fall Joint Computer Conference, showing a screenshot of hypermedia with simultaneous on-screen video teleconferencing

ICT devices

• Mobile & WAP phones
• Multimedia mobile phones, UMTS
• PDAs, Pocket PCs
• Electronic books
• Localisation systems
• …

ICT & HCI

ICT applications

• Itinerary description
• Information services (news, weather…)
• Mediated communication
• Communication integrator

ICT USERS: professional vs. general public

ICT MOBILITY: desktop vs. mobile

ICT DIMENSIONS

• Number of users: mono-user vs. multi-user
• Devices: homogeneous vs. heterogeneous configuration

ICT DIMENSIONS

• Fast evolution: devices, services, software techniques
• Commercial dimension: consumer expectations, announcements
• Terminology: multimedia, Internet

HCI & ICT: « Where is the borderline? »

ICT:
• has some traditional HCI problems
• has some non-traditional HCI problems
• sometimes correlated

Multimodal Human-Computer Communication

Involves processing distributed across different modalities

What problems? What solutions?

Human-Computer Communication

Dagstuhl Seminar 2001

http://www.dfki.de/~wahlster/Dagstuhl_Multi_Modality/

Multimedia Parsing (Bolt 1985), from Maybury 96

"Put That There": eye tracker, speech recognizer, data glove

Terminology and Typologies

Terminology and Typologies

• Media: a physical device enabling the exchange of information between the user and the computer
• Modality: a way of using a medium
• Example: writing, gesturing and drawing are different modalities of the pen medium

Multimedia (Maybury 1999) http://www.mitre.org/resources/centers/it/maybury/mark.html

Figure: a system (input processing, output rendering, storage such as CD-ROM and disk) and its user(s), related through MEDIUM, CODE (gesture, language, graphics) and MODE (tactile, visual, auditory, olfactory, taste).

Multimedia Parsing: Heterogeneous Input

MEDIA UNITS*

Text: character, syllable, word, phrase, sentence, paragraph (plus punctuation and formatting)
Speech: phoneme, syllable, word, phrase, sentence (plus intonational features)
Pen: stroke (pen down/up), tap, text unit, graphic, menu/item selection
Mouse: click, drag, menu/item selection
3D device (e.g., spaceball, Polhemus, dataglove): x, y, z, pitch, yaw, roll, force, posture, gesture
Image: pixel, vector, region, text/graphic object
Video: frame (image), shot, transition, CC stream, sound stream

*Refined from Wittenburg, K. 1993. Multimedia and Multimodal Parsing: Tutorial Notes. 31st Annual Meeting of the ACL, Columbus, Ohio, 23 June 1993.

Frameworks for the study of HCI

Frameworks for the study of HCI

Frameworks for the study of HCI

Frameworks for the study of HCI (Nigay & Coutaz 93)

Figure: a classification space crossing FUSION (combined vs. independent) with USE OF MODALITIES (sequential vs. parallel). Combined and sequential: ALTERNATE; combined and parallel: SYNERGISTIC; independent and sequential: EXCLUSIVE; independent and parallel: CONCURRENT. Each type is illustrated by a timeline of speech recognition and mouse events: "Next note"; "Delete that" with a click on the object to delete; "Insert note" with a click on the destination; "Empty the trash" with a click that opens a document.

(Nigay & Coutaz 93)

Four types of multimodality, with examples, from the classification space proposed in (Nigay and Coutaz 93):

• alternate: sequential production of expressions; several modalities may be used per expression; only one modality is used at a given time,

• exclusive: sequential production of expressions; one modality per expression; only one modality is used at a given time,

• synergistic: sequential production of utterances; several modalities per utterance; several modalities may be used at the same time,

• concurrent: several modalities may be used at the same time, with several independent expressions.
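The classification space can be read as a lookup on its two axes, fusion (combined vs. independent) and use of modalities (sequential vs. parallel). The minimal sketch below assumes that two events count as "combined" when an annotator marks them as parts of the same expression, and as "parallel" when their time spans overlap; the `classify` helper and its inputs are hypothetical, not part of Nigay and Coutaz's work.

```python
from enum import Enum


class Fusion(Enum):
    COMBINED = "combined"
    INDEPENDENT = "independent"


class Use(Enum):
    SEQUENTIAL = "sequential"
    PARALLEL = "parallel"


# Fusion x use of modalities -> type of multimodality (Nigay & Coutaz 93)
DESIGN_SPACE = {
    (Fusion.COMBINED, Use.SEQUENTIAL): "alternate",
    (Fusion.COMBINED, Use.PARALLEL): "synergistic",
    (Fusion.INDEPENDENT, Use.SEQUENTIAL): "exclusive",
    (Fusion.INDEPENDENT, Use.PARALLEL): "concurrent",
}


def classify(span_a, span_b, same_expression):
    """Classify two events given their (start, end) time spans.

    same_expression: True if both events belong to one expression
    (hypothetical annotation), i.e. they must be fused.
    """
    overlap = max(span_a[0], span_b[0]) < min(span_a[1], span_b[1])
    use = Use.PARALLEL if overlap else Use.SEQUENTIAL
    fusion = Fusion.COMBINED if same_expression else Fusion.INDEPENDENT
    return DESIGN_SPACE[(fusion, use)]


# Speech and a mouse click that overlap in time and belong to one command.
print(classify((0.0, 0.8), (0.5, 0.6), same_expression=True))  # synergistic
```

Deciding the two axes from a single pair of events is a simplification: the original space characterizes how a system lets modalities be used, not individual event pairs.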

TYCOON Typology (Martin 93)

TYCOON Typology

When several modalities cooperate by transfer, this means that a chunk of information produced by a modality is used by another modality.

When several modalities cooperate by equivalence, this means that a chunk of information may be processed as an alternative, by either of them.

TYCOON Typology

If several modalities cooperate by redundancy, this means that the same information is processed by these modalities.

When several modalities cooperate by complementarity, it means that different chunks of information are processed by each modality but have to be merged.

TYCOON Typology

When modalities cooperate by specialization, this means that a specific kind of information is always processed by the same modality.

Finally, when several modalities cooperate by concurrency, it means that different chunks of information are processed by several modalities at the same time but must not be merged.
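Three of the six types (redundancy, complementarity, concurrency) can be decided for a single pair of chunks, whereas transfer, equivalence and specialization describe how modalities are used across commands. The sketch below assumes each chunk is annotated with the command it belongs to, a time span and a set of information units; the `Chunk` fields are hypothetical, and the "partial redundancy" label for overlapping contents is an addition (the term also appears in the transcription example of this course).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    modality: str          # e.g. "speech", "gesture"
    content: frozenset     # information units carried (hypothetical encoding)
    command: str           # id of the command the chunk contributes to
    start: float
    end: float


def cooperation(a: Chunk, b: Chunk) -> str:
    """Name the TYCOON-style cooperation between two chunks on different modalities."""
    if a.modality == b.modality:
        raise ValueError("chunks must come from different modalities")
    if a.command == b.command:
        if a.content == b.content:
            return "redundancy"          # same information on both modalities
        if a.content & b.content:
            return "partial redundancy"  # overlapping information, still merged
        return "complementarity"         # different information to be merged
    if max(a.start, b.start) < min(a.end, b.end):
        return "concurrency"             # simultaneous, independent, never merged
    return "none"


speech = Chunk("speech", frozenset({"ask:cuisine", "obj:rest1"}), "c1", 0.0, 1.5)
gesture = Chunk("gesture", frozenset({"obj:rest1"}), "c1", 1.6, 2.2)
print(cooperation(speech, gesture))  # partial redundancy
```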

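Formal notations

Absolute specialization, spécialisation_absolue(M, I, {Mi}): $I = S(M) \land \forall M_i \, (I \cap S(M_i) = \emptyset)$

Information specialization, specialisation_informations(M, I, {Mi}): $I \subset S(M) \land \forall M_i \, (I \cap S(M_i) = \emptyset)$

Modality specialization, specialisation_modalité(M, I, {Mi}): $I = S(M) \land \exists M_i \, (I \cap S(M_i) \neq \emptyset)$

(Each definition is illustrated by a set diagram showing E(M), S(M) and I for the specialized modality M, and E(Mi), S(Mi) for the other modalities Mi.)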
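These predicates can be checked directly on finite sets. The sketch assumes S(M) is the set of information types conveyed by modality M, that information specialization uses a strict inclusion I ⊂ S(M), and that modality specialization requires at least one other modality Mi to share part of I; the modality names and example data are invented.

```python
def _others(S, M):
    """All modalities except M."""
    return [m for m in S if m != M]


def absolute_specialization(S, M, I):
    """M conveys exactly I, and no other modality conveys any of I."""
    return I == S[M] and all(not (I & S[m]) for m in _others(S, M))


def information_specialization(S, M, I):
    """I is conveyed only by M, but M also conveys other information."""
    return I < S[M] and all(not (I & S[m]) for m in _others(S, M))


def modality_specialization(S, M, I):
    """M conveys exactly I, but some other modality also conveys part of I."""
    return I == S[M] and any(I & S[m] for m in _others(S, M))


# S maps each modality to the set of information types it conveys (example data).
S = {
    "speech": {"command"},
    "gesture": {"location"},
    "keyboard": {"command", "name"},
}
print(absolute_specialization(S, "gesture", {"location"}))  # True
print(information_specialization(S, "keyboard", {"name"}))  # True
print(modality_specialization(S, "speech", {"command"}))    # True
```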

« Referenceable » objects: object inner properties + descriptive properties in each modality

Precision and ambiguity of multimodal references:
• Gesture: distance between the object and the focus point of the gesture
• Speech: « this », « this building », « this museum », « the Orsay museum »

Combining salience values (Huls 95): speech 0.5, gesture 0.99, graphics 0.624, total 0.261

Figure (object types): Map Object; Street, Building; Hotel, Restaurant, Museum; Orsay Museum
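A minimal sketch of how speech and gesture salience might be combined for this map example. It assumes the figure's object types form a hierarchy (Street and Building under Map Object; Hotel, Restaurant and Museum under Building), gives a full name a speech salience of 1.0 and a type-compatible description a hypothetical 0.5, lets gesture salience fall off linearly with distance from the focus point, and multiplies the two. This is not the weighting used by Huls et al.; the object names and positions are invented.

```python
import math

# Assumed hierarchy of object types (child -> parent), read from the figure labels.
PARENT = {
    "Street": "MapObject", "Building": "MapObject",
    "Hotel": "Building", "Restaurant": "Building", "Museum": "Building",
}


def is_a(cls, ancestor):
    """Walk up the hierarchy until ancestor is found or the root is passed."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = PARENT.get(cls)
    return False


def speech_salience(obj, description):
    """description is ("name", full_name) or ("type", class_name)."""
    kind, value = description
    if kind == "name":
        return 1.0 if obj["name"] == value else 0.0
    return 0.5 if is_a(obj["cls"], value) else 0.0  # hypothetical weight


def gesture_salience(obj, focus, radius=100.0):
    """Decreases linearly with the distance to the gesture's focus point."""
    return max(0.0, 1.0 - math.dist(obj["pos"], focus) / radius)


def resolve(objects, description, focus):
    """Return the name of the object with the highest combined salience, or None."""
    best, best_score = None, 0.0
    for obj in objects:
        score = speech_salience(obj, description) * gesture_salience(obj, focus)
        if score > best_score:
            best, best_score = obj["name"], score
    return best


objects = [
    {"name": "Orsay Museum", "cls": "Museum", "pos": (10, 10)},
    {"name": "hotel1", "cls": "Hotel", "pos": (12, 11)},
    {"name": "street1", "cls": "Street", "pos": (50, 50)},
]
print(resolve(objects, ("type", "MapObject"), (12, 11)))  # "this": hotel1
print(resolve(objects, ("type", "Museum"), (12, 11)))     # "this museum": Orsay Museum
```

With the same pointing position, a more specific spoken description overrides the object nearest to the gesture.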

Experimental Studies

Information

Information -> Presentation -> Cognition

Visualization, Cognition

Development cycle

Generally 4 phases:
• Requirements analysis
• Rapid prototyping
• Usability evaluation
• Implementation

Analysis tools

Analysis tools

Analysis tools

Wizard interface / Subject interface

SRI (Kehler et al. 98) http://www.speech.sri.com/people/julia/

Subject's Side

Subject: study pen & voice in an unconstrained environment

Non-operational input interface:
• Audio input saved, but not processed
• Pen drawings transmitted to the Wizard, but not processed
• Exception: icon selection (point, circle) enabled

Fully operational output (multimedia):
• Photos, videos, audio and displayed information are available

Wizard's Side

Wizard: study pen, voice & GUI using the real system

Fully operational system:
• Can use any available modality or combination of modalities

Shared subject's screen PLUS features:
• Advanced GUI (database queries, map navigation, etc.)
• Clock in order to answer within an acceptable amount of time
• Additional panel to send audio messages to the subject

The Wizard is THE EXPERT:
• Captures user intention & decides system response

The problem: analyzing subjects' data

• Video
• Sound files
• Pen files

The goal: building the real system

How to analyze the multimodal behavior?

How to specify a multimodal system?

Find a common grid?

A survey of multimodal user studies

• Guyomard et al. 95: application: touristic map; system: simulated, then real; input: speech, tactile screen; output: speech, graphics, text
• Oviatt et al. 97: application: real estate map, edition task; system: simulated; input: speech, pen; output: speech, graphics
• Denda et al. 97: application: touristic map; system: real; input: speech, tactile screen (pointing); output: speech, graphics
• Trafton et al. 97: application: touristic map; system: simulated; input: keyboard ("natural language"), mouse, menus; output: graphics
• Fais 97: application: hotel booking, map; system: real; input: speech, keyboard, tactile screen; output: speech, graphics, text, persona

What are the monomodal features of the user's behavior?

• Categories of words and gestures
  • Guyomard et al. 95: pointing, lines, areas, contours (closed or open)
  • Oviatt et al. 97: composed circle-line-circle
  • Mignot et al. 96: several fingers (rotation)
• Is monomodal behavior changed?
  • Oviatt et al. 97: fewer spoken disfluencies with multimodality

Does the subject use complementarity?

Definition: different chunks of information belonging to the same command are transmitted on different modalities

Example: "Is this a Chinese restaurant?" + circling gesture

Observations: different patterns (Guyomard et al. 95)

• "Are there any beaches in this locality?" + <pointing>
• "What are the camping sites at" + <pointing>

Does the subject use redundancy?

Definition: the same chunk of information is transmitted on several modalities

Example: "Is Tiger Lily's a Chinese restaurant?" + circling gesture around Tiger Lily's restaurant

Does the subject use redundancy?

Observations:
• Oviatt et al. 97: seldom (2% of commands)
• Mignot et al. 96: often with continuous gestures
• Petrelli et al. 97: often with short labels

What is missing?
• a continuum between redundancy and complementarity, saliency
• the impact of graphical output on speech and gesture (2 maps in Oviatt et al. 97)

Complementarity / redundancy: temporal relation

Observations:
• Oviatt et al. 97: pen (writing, drawing) often before speech
• Mignot et al. 96: no obvious systematic temporal relation
• Catinis et al. 95: temporal coincidence often observed

Not possible to generalize.

What is missing?
• Why such differences? Media, task, users…
• a distinction between complementarity and redundancy
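Comparing such observations across studies requires a common way of labeling the temporal relation between the spoken and the pen part of a command. The sketch below assumes each part is annotated with a start and end time in seconds and uses three coarse labels of our own; like the studies above, it does not distinguish complementarity from redundancy.

```python
from collections import Counter


def temporal_relation(speech, pen):
    """speech, pen: (start, end) time spans in seconds."""
    if max(speech[0], pen[0]) < min(speech[1], pen[1]):
        return "overlap"
    return "pen before speech" if pen[1] <= speech[0] else "speech before pen"


def tally(commands):
    """commands: list of (speech_span, pen_span) pairs, one per multimodal command."""
    return Counter(temporal_relation(s, p) for s, p in commands)


corpus = [((1.0, 2.0), (0.0, 0.5)), ((0.0, 1.0), (0.5, 1.5)), ((0.0, 1.0), (2.0, 3.0))]
print(tally(corpus))
```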

Does the subject use equivalence?

Definition: the user tries several ways of achieving the same command; equivalence does not mean equality!

Example:
• speech: "scroll the map to the left"
• gesture: arrow towards the left
• or a complementary or redundant combination of both

Does the subject use equivalence?

Observations:
• modality as a function of the type of command

What is missing?
• rating the equivalence behavior of the users: does she switch between modalities? This is useful for system implementation.

Does the subject use concurrency?

Definition: independent chunks of information are transmitted on several modalities and overlap in time

Observations:
• once in (Mignot et al. 96)
• once at SRI (moving a window)

What is missing?
• Why?

Does the subject use specialization?

Definition: a specific type of information is always transmitted on the same modality

Observations (Mignot et al. 96):
• 2 subjects preferred speech only
• 2 subjects used continuous gestures only, for the moving action

What is missing? A distinction between sub-types of specialization

Example of transcription

INP. Speech: Is this a Chinese restaurant?
INP. Gesture: circles around the selected restaurant (gesture after speech)
OUT. Graph: textual description displayed
ANALYSIS: information about the selected object: partial redundancy between speech (this), graphical context and gesture (circle); gesture after speech

Multimodal Metrics

$\mathrm{equivalence}(C_j) = \frac{|\mathrm{equivalent}(C_j)|}{|C_j|}$

$\mathrm{redund./compl.}(C_j) = \frac{\sum_{r_k \in R(C_j)} \mathrm{salience}(r_k, C_j)}{|R(C_j)|}$
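Both metrics reduce to simple ratios once a corpus is annotated. This sketch assumes equivalence(C_j) is the fraction of the occurrences of command C_j flagged as equivalent by the annotator, and redund./compl.(C_j) is the mean of salience(r_k, C_j) over the references r_k in R(C_j); the "equivalent" flag is a hypothetical annotation field, and reading high values as redundancy and low values as complementarity is our interpretation.

```python
def equivalence(occurrences):
    """Share of the occurrences of a command C_j annotated as 'equivalent'.

    occurrences: list of dicts with a boolean "equivalent" flag (hypothetical
    annotation: the command was produced with another modality or combination).
    """
    if not occurrences:
        return 0.0
    return sum(o["equivalent"] for o in occurrences) / len(occurrences)


def redund_compl(saliences):
    """Mean salience(r_k, C_j) over the references r_k of a command C_j.

    Values near 1: each reference alone identifies the object (redundancy);
    lower values: references only partially identify it (complementarity).
    """
    if not saliences:
        return 0.0
    return sum(saliences) / len(saliences)


print(equivalence([{"equivalent": True}, {"equivalent": False}]))  # 0.5
print(redund_compl([1.0, 0.5]))                                    # 0.75
```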

TYCOON-DTD

Figure 1: Example of the XML annotation of a sample command observed in the SRI corpus (Cheyer et al. 1998).

Figure 2: The “referenceable objects” section of a multimodal annotation.

Figure 3: A speech segment ("Senator dinner ... ? can I eat a hamburger there?") which contains two references to the object rest1.

Figure 4: A gesture segment including a reference to the object rest1.

STEP 1: Parse the XML file (cf. article)

• Build the document tree out of the XML file.
• Build the Java representation of referenceable objects (Figure 5) and references (Figure 6).
• Build the table associating each (object, reference) pair with a salience value (Figure 7); these values are computed according to predefined salience rules such as "if the reference contains the full name of this object, set the salience of this object in this reference to 1.0"; these rules are expected to depend on the corpus at hand.
• Build the table computing the average salience values for all the references in the different modalities within the same multimodal segment (Figure 8).
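The steps above can be sketched in a few lines of Python (the tool described here built a Java representation instead). The XML layout below is invented for illustration and is not the TYCOON-DTD; only the full-name rule comes from the slide, while the gesture rule and the deictic-word rule are hypothetical, and the average is taken over all references of a segment.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation; element and attribute names are not the TYCOON-DTD.
SAMPLE = """
<annotation>
  <object id="rest1" name="Senator dinner"/>
  <object id="hotel1" name="Menlo Hotel"/>
  <segment id="s1">
    <reference modality="speech" text="Senator dinner"/>
    <reference modality="speech" text="there"/>
    <reference modality="gesture" object="rest1"/>
  </segment>
</annotation>
"""

DEICTIC_WORDS = {"this", "that", "there", "here", "it"}  # hypothetical rule


def salience(obj, ref):
    """Apply the salience rules to one (object, reference) pair."""
    text = (ref.get("text") or "").lower()
    if obj["name"].lower() in text:
        return 1.0   # rule quoted on the slide: full name -> 1.0
    if ref.get("object") == obj["id"]:
        return 1.0   # hypothetical: gesture annotated on this object
    if text in DEICTIC_WORDS:
        return 0.5   # hypothetical: bare deictic/anaphoric word
    return 0.0


def segment_averages(xml_text):
    """Average salience of each object over all references of each segment."""
    root = ET.fromstring(xml_text.strip())
    objects = [{"id": o.get("id"), "name": o.get("name")} for o in root.iter("object")]
    table = {}
    for seg in root.iter("segment"):
        refs = list(seg.iter("reference"))
        if not refs:
            continue
        table[seg.get("id")] = {
            obj["id"]: sum(salience(obj, r) for r in refs) / len(refs)
            for obj in objects
        }
    return table


print(segment_averages(SAMPLE))
```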

Software tools

Figure: IUI Architecture. User(s) interact with the system through input processing (media analysis: gesture, language, graphics) and output rendering (media design: gesture, language, graphics). Central components: media fusion, discourse modeling, user modeling, plan recognition & generation, presentation design and interaction management, relying on user, discourse, domain, media and task models. An application interface connects the system to information, applications and people.

Open Agent Architecture: Cheyer, Julia (SRI)

Features:
• distributed, parallel, multi-language, extensible, dynamic, collaborative

http://www.ai.sri.com/~oaa/

OAA provides...

• Distributed access to dynamic databases
• Parallel competition and cooperation: "Show photo of the hotel in Menlo Park"
  • Natural language -> last hotel talked about
  • Map display -> only looking at one hotel
  • Gesture agent -> will point at a hotel in 1 sec.
  • Database agent -> combine "Menlo Park" data with other competing agents
• Collaboration across shared workspaces
  • human to agents, agent to agent, human to human
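The "parallel competition" idea for resolving "the hotel" can be illustrated without OAA itself. This sketch is not the OAA API: it runs three hypothetical resolver functions concurrently with Python's `concurrent.futures` and keeps the answer with the highest confidence; the context keys and confidence values are invented.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical agents: each returns (hotel, confidence) or None.
def language_agent(ctx):
    hotel = ctx.get("last_mentioned")            # last hotel talked about
    return (hotel, 0.6) if hotel else None


def map_agent(ctx):
    visible = ctx.get("visible_hotels", [])      # only one hotel on screen?
    return (visible[0], 0.8) if len(visible) == 1 else None


def gesture_agent(ctx):
    hotel = ctx.get("pointed_at")                # hotel the user points at
    return (hotel, 0.9) if hotel else None


AGENTS = (language_agent, map_agent, gesture_agent)


def resolve_hotel(ctx, agents=AGENTS):
    """Run all agents in parallel and keep the most confident answer."""
    with ThreadPoolExecutor(max_workers=len(agents)) as pool:
        answers = list(pool.map(lambda agent: agent(ctx), agents))
    candidates = [a for a in answers if a is not None]
    if not candidates:
        return None
    return max(candidates, key=lambda a: a[1])[0]


print(resolve_hotel({"last_mentioned": "hotelA", "visible_hotels": ["hotelB"]}))  # hotelB
```

A fixed confidence per agent is the simplest competition rule; a real system would also need a timeout for agents such as the gesture agent that answer later.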

Videos

SOFTWARE TOOLS Milaas Summer School 1999

SOFTWARE TOOLS Specification of cooperation

A variable V3 is defined as the beginning of a sequence:
    start_sequence Multimodal V3

It may be activated by one event among several (the word "name" typed on the keyboard, or the speech items "what is the name of" or "what is that"):
    equivalence Multimodal V3 Keyboard name Speech what_is_the_name_of Speech what_is_that

This V3 variable is linked sequentially to a second variable V4:
    complementarity_sequence Multimodal V3 V4

V4 may only be activated by a mouse event:
    specialization Multimodal V4 Mouse *

V4 is bound to a parameter of an application module which is involved in the execution process:
    bind_application Parameter1NameOf V4

V4 is the last variable of the sequence:
    end_sequence Multimodal V4 NameOf
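A specification in this format can be loaded with a small line-oriented parser. The sketch below is one possible reading of the six statements of this example: it ignores the module token (`Multimodal`), treats the tokens after a variable as (modality, event) pairs, and takes `*` as a wildcard for any event. The dictionary layout returned by the hypothetical `parse_spec` and the `can_activate` helper are not part of the original tool.

```python
SPEC = """
start_sequence Multimodal V3
equivalence Multimodal V3 Keyboard name Speech what_is_the_name_of Speech what_is_that
complementarity_sequence Multimodal V3 V4
specialization Multimodal V4 Mouse *
bind_application Parameter1NameOf V4
end_sequence Multimodal V4 NameOf
"""


def parse_spec(text):
    spec = {"variables": {}, "sequence": [], "bindings": {}, "command": None}
    for line in text.strip().splitlines():
        kind, *args = line.split()
        if kind in ("equivalence", "specialization"):
            var, rest = args[1], args[2:]
            # remaining tokens come in (modality, event) pairs
            spec["variables"][var] = {
                "cooperation": kind,
                "triggers": list(zip(rest[0::2], rest[1::2])),
            }
        elif kind == "start_sequence":
            spec["sequence"] = [args[1]]
        elif kind == "complementarity_sequence":
            spec["sequence"] += [v for v in args[1:] if v not in spec["sequence"]]
        elif kind == "bind_application":
            spec["bindings"][args[1]] = args[0]
        elif kind == "end_sequence":
            spec["command"] = args[2]
    return spec


def can_activate(spec, var, modality, event):
    """True if (modality, event) may activate var; '*' is assumed to match any event."""
    triggers = spec["variables"].get(var, {}).get("triggers", [])
    return any(m == modality and e in ("*", event) for m, e in triggers)


spec = parse_spec(SPEC)
print(spec["sequence"], spec["command"])                   # ['V3', 'V4'] NameOf
print(can_activate(spec, "V3", "Speech", "what_is_that"))  # True
print(can_activate(spec, "V4", "Speech", "what_is_that"))  # False
```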

SOFTWARE TOOLS Multimodal Engine

SOFTWARE TOOLS Guided Propagation Networks (Béroule 85)

Used in the multimodal module; enables «multimodal recognition scores»:

A) activation proportional to the speech recognition score

B) recognition is possible when events are missing

C) linear function for temporal proximity
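The three properties A, B and C can be mimicked by a simple scoring function. This is not Béroule's guided propagation model: in this sketch the 2-second window, the floor of 0.5 applied when the gesture is missing and the multiplicative combination are all assumptions.

```python
def temporal_proximity(speech_time, gesture_time, window=2.0):
    """C) linear decrease with the time gap, reaching 0 at `window` seconds."""
    return max(0.0, 1.0 - abs(gesture_time - speech_time) / window)


def multimodal_score(speech_score, speech_time, gesture_time=None, window=2.0, floor=0.5):
    """Score a speech + gesture command (hypothetical weighting).

    A) the score is proportional to the speech recognition score;
    B) a missing gesture does not block recognition, the score is scaled by `floor`;
    C) a gesture close in time raises the proximity factor linearly towards 1.
    """
    if gesture_time is None:
        proximity = 0.0
    else:
        proximity = temporal_proximity(speech_time, gesture_time, window)
    return speech_score * max(floor, proximity)


print(multimodal_score(0.8, 10.0, 10.5))  # gesture 0.5 s after speech
print(multimodal_score(0.8, 10.0))        # gesture missing
```

Taking the maximum with `floor` means a gesture can only raise the score, so a late gesture never scores worse than a missing one.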

SOFTWARE TOOLS Processing steps

Figure (sequence of calls) between MultimodalApplication, multimodalEngine, speechModality, gestureModality and monomodalApplication: processEvent(monomodalEvent), updateSalience(monomodalEvent), executeInterpretedCommand(cmd).

SOFTWARE TOOLS General Algorithm

parse the specification file
create the cooperation network and the referenceable objects
while the user does not ask for exit
    if an event is detected on a modality
        update the salience of referenceable objects
        create an information object
        put it into the output of the information node managing this information
    for each cooperation node in the network
        toBeActivated = f(type of cooperation, output of the input nodes)
        if toBeActivated is true
            build a hypothesis object, compute its score and put it into the output
            if this node is terminal, call application.executeCommand() and solve references if needed
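The general algorithm can be sketched as one propagation pass over a network whose nodes are listed with inputs before the nodes that use them. The sketch assumes that an equivalence node fires when any input has an output, that complementarity and redundancy nodes fire when all inputs do, and that a hypothesis score is the mean of the best input scores; salience updates and reference resolution are left out, and the `Node` class and `propagate` function are hypothetical names, not the original tool's.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    cooperation: str = ""                       # "" for information (leaf) nodes
    inputs: list = field(default_factory=list)
    terminal: bool = False
    output: list = field(default_factory=list)  # scores of hypotheses


def to_be_activated(node):
    """f(type of cooperation, output of the input nodes)."""
    active = [bool(n.output) for n in node.inputs]
    if node.cooperation == "equivalence":
        return any(active)
    if node.cooperation in ("complementarity", "redundancy"):
        return all(active)
    return False


def propagate(nodes, events, execute):
    """One pass; nodes must be listed with inputs before the nodes that use them."""
    by_name = {n.name: n for n in nodes}
    for info_name, score in events:                 # events detected on modalities
        by_name[info_name].output.append(score)
    for node in nodes:
        if node.cooperation and not node.output and to_be_activated(node):
            best = [max(n.output) for n in node.inputs if n.output]
            node.output.append(sum(best) / len(best))  # hypothetical scoring
            if node.terminal:
                execute(node.name, node.output[-1])


speech, keyboard, mouse = Node("speech_name"), Node("keyboard_name"), Node("mouse")
v3 = Node("V3", "equivalence", [speech, keyboard])
v4 = Node("V4", "complementarity", [v3, mouse], terminal=True)
propagate([speech, keyboard, mouse, v3, v4], [("speech_name", 0.8), ("mouse", 1.0)],
          lambda name, score: print("execute", name, round(score, 2)))
```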

SOFTWARE TOOLS Propagating Hypothesis in the Network

---- Propagating the following hypothesis from V12
cooperation : specification = complementarity noCriteria V12 V10 V11
score : 0.73
input :
    cooperation : specification = equivalence V10 V1 V4
    score : 0.65
    semantics : null
    input :
        modality : SPEECH_RECOGNITION - weight : 0.8
        features :
        -----------
        attributeType : TIME
        semantics : null
        score : 1.0
        content : 150309
        -----------
        attributeType : RECOGNISED_WORD
        semantics : traffic
        score : 0.65
        content : is there any traffic

SOFTWARE TOOLS

Example of work from linguistics

Deixis (1)

Definition (Lyons 1977 in Huls et al. 95)

the location and identification of persons, objects, events, processes and activities being talked about, or referred to, in relation to the spatiotemporal context created and sustained by the act of utterance and the participation in it, typically, of a single speaker and at least one addressee

Deixis (2)

Types of deixis:
• personal deixis: pronouns (I, we, you)
• temporal deixis: time of speech + tense ("he lives in Paris") and temporal modifiers ("in an hour")
• spatial deixis: demonstratives produced with a gesture ("this file" + gesture)


Anaphors (1)

Definition: anaphors can be interpreted without regard to the spatio-temporal context; their interpretation depends only on the linguistic expressions that precede them in the discourse.

Example: "Print the file about dialogue systems. Delete this."

The words used in deictic expressions are also used in anaphors...
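Because the same word ("this") can be deictic or anaphoric, a resolver has to decide which reading applies. A minimal sketch, assuming a hypothetical function `resolve_this` that receives the referent of an accompanying pointing gesture (or `None`) and the list of entities already mentioned in the discourse; real systems combine more knowledge sources than this.

```python
from typing import Optional


def resolve_this(pointed: Optional[str], discourse_history: list) -> Optional[str]:
    """Resolve 'this' (hypothetical helper).

    Deictic reading: a pointing gesture accompanies the word, so the referent
    comes from the spatio-temporal context (the object pointed at).
    Anaphoric reading: no gesture, so the referent is the most recent entity
    in the preceding discourse.
    """
    if pointed is not None:
        return pointed
    return discourse_history[-1] if discourse_history else None


history = ["file about dialogue systems"]
print(resolve_this(None, history))          # anaphoric: file about dialogue systems
print(resolve_this("budget.txt", history))  # deictic: budget.txt
```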


XTRA (Allgayer et al. 89): data from the dialog memory and from gesture analysis are combined by taking the intersection of the two sets of potential referents suggested by these information sources.
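The XTRA combination rule is a plain set intersection. A minimal sketch, where the function name `xtra_referents` and the form-field identifiers are hypothetical examples:

```python
def xtra_referents(dialog_memory, gesture_candidates):
    """Referents compatible with both the dialog memory and the pointing gesture."""
    return set(dialog_memory) & set(gesture_candidates)


# Hypothetical example: fields on a form displayed on screen.
from_dialog = {"field_12", "field_14", "field_20"}  # compatible with "this amount"
from_gesture = {"field_14", "field_15"}             # under the pointing region
print(sorted(xtra_referents(from_dialog, from_gesture)))  # ['field_14']
```

Each source alone is ambiguous here (three and two candidates); the intersection leaves a single referent.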


EDWARD (1)

Huls et al. 95
• Input: typed words, mouse clicks
• Output: text display, graphics
• Application: file management
• Focus on: referent resolution
• Uses 3 knowledge sources


EDWARD (2)

• Knowledge base: semantic network (classes, instances of entities and relations), e.g. man#24, send#89
• Restrictions on the role-fillers of relations exclude certain referents.
• Some entities and relations are represented graphically (files, "contained in").
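The role-filler restrictions act as a filter on candidate referents. A minimal sketch, assuming a hypothetical semantic-network fragment (only `man#24` comes from the slide; `file#7`, `folder#3` and the restriction on `delete` are invented for illustration):

```python
# Hypothetical fragment of a semantic network: instance -> class.
INSTANCE_CLASS = {
    "man#24": "man",
    "file#7": "file",
    "folder#3": "folder",
}

# Hypothetical role-filler restrictions: (relation, role) -> allowed classes.
RESTRICTIONS = {
    ("delete", "object"): {"file", "folder"},
}


def admissible_referents(relation, role, candidates):
    """Keep only candidates whose class may fill this role of the relation."""
    allowed = RESTRICTIONS.get((relation, role))
    if allowed is None:  # no restriction recorded for this role
        return list(candidates)
    return [c for c in candidates if INSTANCE_CLASS[c] in allowed]


print(admissible_referents("delete", "object", ["man#24", "file#7", "folder#3"]))
# ['file#7', 'folder#3']
```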


EDWARD (3)

Context model based on (Alshawi 87)

Salience: the salience value of an instance is obtained by adding the current significance weights of the CF (context factor) objects in scope [successive weights]

Linguistic CFs
• major-constituent referents: referents of subject, (in)direct object, modifier [3, 2, 1, 0]
• referents of the subject phrase [2, 1, 0]
• nested-term referents (referents of NP modifiers: prepositional phrase, relative clause) [1, 0]
• relations expressed by S, PP and relative clause [3, 2, 1, 0]

Perceptual CFs
• referents visible in the current viewport [1, ..., 1, 0]
• referents selected in the model world [2, ..., 2, 0]
• referents indicated by a pointing gesture [30, 1, 0]

$$\mathrm{SV}(inst) = \sum_{i=1}^{n} \mathrm{significance\_weight}_{CF_i}(inst)$$

where CF_1, ..., CF_n are the context factors in scope.


EDWARD (4)

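Example of salience computation (SV of Koen / SV of Ria / SV of the article after each sentence)

• "Koen is the husband of Ria": Koen: major + subject = 3 + 2 = 5; Ria: nested = 1; the article: 0
• "He writes an article": Koen: (existing) + major + subject = (3-1 + 2-1) + 3 + 2 = 8; Ria: existing = 1-1 = 0; the article: major = 3
• "This article is about his wife": Koen: (existing) + nested = (3-2 + 2-2 + 3-1 + 2-1) + 1 = 5; Ria: major = 3; the article: (existing) + major + subject = (3-1) + 3 + 2 = 7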
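The worked example can be reproduced with a short sketch. Assumptions: each context factor is recorded as a hypothetical tuple (sentence index, referent, factor type), and it contributes the weight found at position (current sentence - creation sentence) of its decay sequence, or 0 once the sequence is exhausted. Relation factors and the viewport/selection factors are omitted; the pointing-gesture sequence [30, 1, 0] is included to show how strongly a gesture dominates.

```python
# Successive weights of each context factor type (from the EDWARD (3) slide).
WEIGHTS = {
    "major": [3, 2, 1, 0],
    "subject": [2, 1, 0],
    "nested": [1, 0],
    "pointing": [30, 1, 0],  # perceptual CF
}


def salience(history, referent, now):
    """Sum the current weights of all context factors in scope for referent."""
    total = 0
    for created, ref, kind in history:
        if ref != referent or created > now:
            continue
        seq = WEIGHTS[kind]
        age = now - created
        total += seq[age] if age < len(seq) else 0
    return total


history = [
    (0, "Koen", "major"), (0, "Koen", "subject"), (0, "Ria", "nested"),
    (1, "Koen", "major"), (1, "Koen", "subject"), (1, "article", "major"),
    (2, "article", "major"), (2, "article", "subject"),
    (2, "Ria", "major"), (2, "Koen", "nested"),
]

for sentence in range(3):
    print(sentence, [salience(history, r, sentence) for r in ("Koen", "Ria", "article")])
# 0 [5, 1, 0]
# 1 [8, 0, 3]
# 2 [5, 3, 7]

# Hypothetical next turn: the user only points at Ria.
history.append((3, "Ria", "pointing"))
print(salience(history, "Ria", 3))  # 32
```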


MULTIMODALITY and ICT?

• Switching between different computers and media configurations, environments and privacy issues (e.g. from car to public transportation)
• Application: reference to map objects
• Some media / modalities / combinations become unavailable or are used differently (degradation)


MULTIMODALITY and ICT?

Usefulness for adaptive mobile interfaces
• Multimodal recognition score (noisy environment, events not detected, poor precision)
• Cooperation between modalities: equivalence to cope with several environmental situations; redundancy to cope with noisy environments

What remains to be done
• Model of environments and mobile configurations
• Real testing with a mobile configuration
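The two types of cooperation can be illustrated with a minimal sketch. Assumptions: the helper names are hypothetical, the scenarios are invented, and the noisy-OR rule used to fuse redundant recognition scores is one possible choice, not a rule taken from the slides.

```python
def combine_redundant(scores):
    """Redundancy: fuse scores of modalities that convey the same content.

    Noisy-OR rule (an assumption): the fused interpretation fails only if
    every redundant recognizer fails.
    """
    miss = 1.0
    for s in scores:
        miss *= 1.0 - s
    return 1.0 - miss


def pick_equivalent(modalities, unavailable):
    """Equivalence: return the first modality still usable in this environment."""
    for m in modalities:
        if m not in unavailable:
            return m
    return None


# Noisy street: speech and pen both carry the command, neither very reliably.
print(round(combine_redundant([0.6, 0.5]), 2))  # 0.8
# Speech unavailable (e.g. privacy): fall back on an equivalent modality.
print(pick_equivalent(["speech", "pen", "keyboard"], {"speech"}))  # pen
```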


And what about multimodality in output?


Theoretical issues: views of the same information

[Figure: the same information ("Peter moves along Bahnhofstrasse", on a map showing Bahnhofstr., Sulzbachstr., Viktoriastr. and Betzenstr.) presented on a PDA, a mobile phone and a PC, with an ML server]


Multimedia Generation: Media Coordination

[Figure: WIP architecture, with Application Knowledge, Generation Parameters (Novice / Expert, German / English, Instruction Manual, Incremental Output), a Presentation Goal, Wireframe Data, the WIP Processing Modules, a Knowledge Base (commonsense knowledge of presentation techniques) and the resulting Multimodal Presentation]

Example fragments shown in the figure:
• Presentation goal: (BMB P A (GOAL P (DONE A fill-in-128)))
• Wireframe data: (primitive-object :NAME "watercontainer" :TYPE :BOX :VERTICES '((10 10 10) ...) :PQUADER :OU '(0 1 0)) ...
• Action definition: (defaction 'Fill-in-water '(actpars ((...)) (sequence (A1 Lift-lid) (A2 Remove-cover) (A3 Pour-water)) (constraints (...)) ...

[Wahlster et al, 1993][André & Rist, 1993]


Multimedia Generation: Media Allocation

Tailoring to the User's Task

Exactness (table) versus trends/comparison (bar chart) [Burger and Marshall, 1993]

NL Query: “When do trains leave for New York from Washington?”

[Figure: two presentations of the answer. A chart with a Time axis (8:00 am to 3:00 pm) and a Train axis distinguishing MetroLiner Service and Normal Service; and a table with Washington and New York columns (Bold: MetroLiner Service; Roman: Normal Service):

Washington    New York
8:00 am       10:59 am
8:20 am       11:41 am
9:00 am       11:59 am
9:40 am       12:35 pm
10:00 am      12:55 pm
11:00 am      1:55 pm
11:20 am      2:49 pm
Noon          2:55 pm
]
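The allocation principle (exactness leads to a table, trends or comparisons lead to a bar chart) can be written as a simple rule table. A minimal sketch, where the task labels, the function name `allocate_medium` and the default medium ("text") are assumptions made for illustration:

```python
# Hypothetical task categories, following the exactness vs trends/comparison contrast.
MEDIUM_FOR_TASK = {
    "exactness": "table",       # e.g. exact departure times
    "trend": "bar chart",
    "comparison": "bar chart",  # e.g. MetroLiner vs normal service
}


def allocate_medium(task: str) -> str:
    """Pick a presentation medium for the user's task (default is an assumption)."""
    return MEDIUM_FOR_TASK.get(task, "text")


print(allocate_medium("exactness"))   # table
print(allocate_medium("comparison"))  # bar chart
```

A real generator would first have to infer the task from the query (for instance, "When do trains leave...?" asks for exact times); that step is not modelled here.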


CUBRICON Architecture

[Figure: CUBRICON architecture.
• Devices: speech input device, keyboard device, mouse pointing device, color-graphics display, monochrome display, speech output device
• Intelligent multimedia interface: input coordinator, multimedia parser / interpreter, executor and communicator to target system, multimedia output planner, coordinated output generator
• Knowledge sources: lexicon, grammar, discourse model, user model, output planning strategies, KB of general knowledge, KB of domain-specific knowledge
• Target application system: mission planning system, DBMS]

Neal, J. G. and Shapiro, S. C. 1991. Intelligent Multi-Media Interface Technology. In Sullivan, J. W., and Tyler, S. W. (eds.) Intelligent User Interfaces. Frontier Series. New York: ACM Press. 11-43