Fusion Engines for Input Multimodal Interfaces: A Survey
DESCRIPTION
Fusion engines are fundamental components of multimodal interactive systems: they interpret temporal combinations of deterministic as well as non-deterministic inputs whose meaning can vary according to the context, user and task. While various surveys have already been published on multimodal interactive systems, this paper focuses on the design, specification, construction and evaluation of fusion engines. The article first introduces the adopted terminology and the major challenges that fusion engines propose to solve. A history of the work achieved in the field of fusion engines is then presented according to the main phases of the BRETAM model, followed by a classification of existing approaches. The classification dimensions include the types of applications, the fusion principles and the temporal aspects. Finally, unsolved challenges, such as software frameworks, quantitative evaluation, machine learning and adaptation, sketch future work in the field of fusion engines.
TRANSCRIPT
Special session on Multimodal Fusion
• A survey: Fusion Engines for Multimodal Input
• 5 papers
D. Lalanne (Switzerland), L. Nigay (France), P. Palanque (France), P. Robinson (UK), J. Vanderdonckt (Belgium)
Multimodal fusion
• Multimodal fusion for
  • Perception
  • Interaction
• Focus on multimodal interaction
  • 4 papers on multimodal interaction
  • 1 paper on multimodal perception (the first one)
Input Multimodal Interaction
Input Fusion Engines
• Multimodal fusion: combining and interpreting data from multiple input modalities
• Usage of input modalities:
  • Combined + Sequential = Alternate
  • Combined + Parallel = Synergistic
  • Independent + Sequential = Exclusive
  • Independent + Parallel = Concurrent
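The four usage categories above can be sketched as a small classifier: "combined" means both modalities contribute to one command, "parallel" means their temporal intervals overlap. All names below are hypothetical, chosen for illustration only.

```python
from dataclasses import dataclass

@dataclass
class ModalityUse:
    start: float  # time interval of one modality's use, in seconds
    end: float

def overlap(a: ModalityUse, b: ModalityUse) -> bool:
    # Parallel usage = the two temporal intervals overlap.
    return a.start < b.end and b.start < a.end

def classify(a: ModalityUse, b: ModalityUse, combined: bool) -> str:
    """combined = both modalities contribute to the same command."""
    parallel = overlap(a, b)
    if combined:
        return "Synergistic" if parallel else "Alternate"
    return "Concurrent" if parallel else "Exclusive"

# "Put that there": speech overlapping a pointing gesture, one command.
speech = ModalityUse(0.0, 1.2)
gesture = ModalityUse(0.8, 1.0)
print(classify(speech, gesture, combined=True))  # Synergistic
```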
Input Fusion Engines
• Why combined usage (sequential or parallel)?
  • Natural human interaction is inherently multimodal.
  • Combining input modalities increases the bandwidth of human-computer interaction.
Fusion engines
• A very dynamic domain
• ~15 years of contributions: 1993–2008
Input Fusion Engines
• Some key features:
  • Multiple and temporal combinations: types of data and time synchronization
  • Probabilistic, non-deterministic inputs
  • Robustness: error handling, adaptation to context
    • Context = (user, environment, platform)
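One common way to handle probabilistic, non-deterministic inputs is to fuse the n-best hypothesis lists of two recognizers, keeping compatible pairs and ranking them by combined confidence. A minimal sketch, with all data and the compatibility rule invented for illustration:

```python
from itertools import product

def fuse_nbest(speech_nbest, gesture_nbest, compatible):
    """Cross the hypotheses of two recognizers, keep compatible
    pairs, and rank them by the product of confidence scores."""
    joint = [
        ((s, g), ps * pg)
        for (s, ps), (g, pg) in product(speech_nbest, gesture_nbest)
        if compatible(s, g)
    ]
    return sorted(joint, key=lambda kv: kv[1], reverse=True)

speech = [("delete", 0.7), ("select", 0.3)]     # recognizer 1 n-best
gesture = [("circle", 0.6), ("cross", 0.4)]     # recognizer 2 n-best
# Assumed compatibility rule: a cross gesture cannot mean "select".
ok = lambda s, g: not (s == "select" and g == "cross")
best, score = fuse_nbest(speech, gesture, ok)[0]
print(best, round(score, 2))  # ('delete', 'circle') 0.42
```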
Classification: Fusion Engines
• 1980 – R. Bolt, "Put that there"
• 1989 – Cubricon
• 1995 – CARE
• 1997 – Quickset
• 2004 – ICARE
• 2004 – Petshop
• 2006 – FAME
Theories and Contributions over Time
[Timeline figure annotations: 1980 – R. Bolt, "Put that there"; "Zoom in here"; multiple (up to 255) input API in Windows 7; Microsoft MultiPoint SDK; "UX beats Usability"; a gap]
Classification of fusion engines (the first column marks BRETAM phases: B, R, E, T & A):

| Phase | Reference | Tool/language/program | Fusion notation | Fusion type | Level | Input devices | Ambiguity resolution | Time: quantitative | Time: qualitative | Application types |
|---|---|---|---|---|---|---|---|---|---|---|
| B | Bolt [4] | "Put that there" system | None | None | Dialog | Speech, gesture | ? | N | ? | Map manipulation |
| R | Wahlster [ref. missing] | XTRA | None | Unification | Dialog | Keyboard, mouse | Not given | N | Y | Map manipulation |
| | Neal [26] | Cubricon | Generalized Augmented Transition Network | Procedural | Dialog | Speech, mouse, keyboard | Proximity-based | N | Y | Map manipulation |
| E | Koons [19] | (no name) | Parse tree | Frame-based | Dialog | Speech, eye gaze, gesture | First solution | Y | Y | 3D world |
| | Nigay [28] | Pac-Amodeus | Melting Pot | Frame-based | Dialog + low level | Speech, keyboard, mouse | Context-based resolution | Y | N | Flight scheduling |
| | Cohen [9] | Quickset | Feature Structure | Unification | Dialog | Pen, voice | S/G & G/S & N-best | Y | N | Simulation system training |
| | Bellik [3] | MEDITOR | None | Frame-based | Dialog + low level | Speech, mouse | History buffer | Y | Y | Text editor |
| | Martin [22] | TYCOON | Set of processes – Guided Propagation Networks | Procedural | Dialog | Speech, keyboard, mouse | Probability-based resolution | Y | Y | Edition of graphical user interfaces |
| | Johnston [18] | FST | Finite-state automata | Procedural | Dialog | Speech, pen | Possible (N-best) | Y | Y | Corporate directory |
| T & A | Krahnstoever [20] | iMap | Stream stamped | Frame-based | Dialog | Speech, gesture | Not given | Y | N | Crisis management |
| | Dumas [12] | HephaisTK | XML typed (SMUIML) | Frame-based | Dialog | Speech, mouse, Phidgets | First one | Y | Y | Meeting assistants |
| | Holzapfel [17] | (no name) | Typed Feature Structure | Unification | Dialog | Speech, gesture | N-best list | Y | N | Humanoid robot |
| | Pfleger [33] | PATE | XML typed | Unification | Dialog | Speech, pen | N-best list | Y | Y | Bathroom design tool |
| | Milota [25] | (no name) | Multimodal parse tree | Unification | Dialog | Speech, mouse, keyboard, touchscreen | S/G & G/S | Y | N | Graphic design |
| | Melichar [24] | WCI | Multimodal Generic Dialog Node | Unification | Dialog | Speech, mouse, keyboard | First one | ? | ? | Multimedia DB |
| | Sun [37] | PUMPP | Matrix | Unification | Dialog | Speech, gesture | S/G | N | Y | Traffic control |
| | Bourguet [7] | Mengine | Finite-state machine | Procedural | Low level | Speech, mouse | Not given | N | Y | No example |
| | Latoschik [21] | (no name) | Temporal Augmented Transition Network | Procedural | Dialog | Speech, gesture | Fuzzy constraints | Y | Y | Virtual reality |
| | Bouchet [5][6], Mansoux [23] | ICARE (input/output) | Melting Pot | Frame-based | Dialog + low level | Speech, helmet visor, HOTAS, tactile surface, GPS localization, magnetometer, mouse, keyboard | Context-based resolution | Y | N | Aircraft cockpit, authentication, mobile augmented-reality systems (game, Post-it), augmented surgery |
| | Navarre [30] | Petshop | Petri nets | Procedural | Dialog + low level | Speech, mouse, keyboard, touchscreen | *** | Y | Y | Aircraft cockpit |
| | Flippo [14] | (no name) | Semantic tree | Hybrid | Dialog | Speech, mouse, gaze, gesture | Feedback for missing data | Y | N | Collaborative map |
| | Portillo [34] | MIMUS | Feature Value Structure (DTAC) | Hybrid | Dialog | Speech, mouse | Knowledgeable agent | Y | N | |
| | Duarte [11] | FAME | Behavioral matrix | Hybrid | Dialog | Speech, mouse, keyboard | Not given | ? | ? | Digital Talking Book |
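Several of the frame-based engines in the table (e.g. the Melting Pot of Pac-Amodeus and ICARE) fill the slots of a command frame with events from different modalities that arrive within a common time window. A minimal sketch of that idea, with hypothetical slot names and an invented window size:

```python
# Minimal frame-based fusion: events from different modalities fill the
# slots of a command frame when they arrive within a time window.
WINDOW = 2.0  # seconds; real engines make this tunable

def fuse(events, slots=("action", "object", "location")):
    frame, t0 = {}, None
    for t, slot, value in sorted(events):       # (time, slot, value)
        if t0 is not None and t - t0 > WINDOW:  # window expired: restart
            frame, t0 = {}, None
        frame[slot] = value
        t0 = t0 if t0 is not None else t
        if all(s in frame for s in slots):      # frame complete: emit
            return frame
    return None  # incomplete command

events = [
    (0.1, "action", "put"),        # speech: "put"
    (0.4, "object", "lamp"),       # gesture: points at an object
    (1.2, "location", (120, 80)),  # gesture: points at a location
]
print(fuse(events))  # {'action': 'put', 'object': 'lamp', 'location': (120, 80)}
```

If the last event arrived outside the window, the partial frame would be discarded and fusion would restart, so the sketch returns None instead of a command.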
Special Session: Multimodal Fusion
• Content
  • A survey
  • 5 papers
• Schedule
  • 10 min: introduction and survey outlook
  • 15 min per paper + 5 min questions
  • 10 min: questions on the session
D. Lalanne (Switzerland), L. Nigay (France), P. Palanque (France), P. Robinson (UK), J. Vanderdonckt (Belgium)
Special Session: Multimodal Fusion
• H. Mendonça: Agent-based fusion
• B. Dumas: An evaluation framework to benchmark fusion engines
• L. Nigay: CARE-based fusion
• J. Ladry & P. Palanque: Petri-net-based formal description and execution of fusion engines
• M. Sezgin: Fusion of speech and facial expression recognition
QUESTIONS?
Fusion engines: research agenda
• Performance evaluation
  • Testbeds, metrics
  • Identification of interpretation errors
  • Formal predictive evaluation
• Adaptation to context
  • Dynamic aspect of adaptation
  • Reconfigurations
• Engineering aspects
  • Difficult to develop (toolkits from manufacturers required)
  • Fusion engine tuning (tuning is key for interaction techniques, e.g. drag-and-drop)
Fusion Principles
• Notation: Petri-net based (ICOs)
• Type: Procedural only
• Level: Dialogue and low level
• Input devices: Speech, mice, keyboard, touch screen
• Ambiguity resolution: inside models
• Time representation (quantitative – qualitative): Both
• Application types: Safety-critical, aeronautics and space
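The "procedural" fusion type above can be illustrated, in a far simpler form than the ICO Petri nets it actually uses, by a finite-state machine that consumes timed events, in the spirit of the finite-state engines in the survey (Johnston, Bourguet). States and events below are invented for the "Zoom in here" example:

```python
# Toy procedural fusion: a finite-state machine that fuses the speech
# command "zoom in" with a subsequent touch point ("Zoom in here").
def run(events):
    state, point = "idle", None
    for kind, value in events:
        if state == "idle" and kind == "speech" and value == "zoom in":
            state = "await_point"   # speech arms the machine
        elif state == "await_point" and kind == "touch":
            state, point = "done", value  # touch completes the command
    return ("zoom", point) if state == "done" else None

print(run([("speech", "zoom in"), ("touch", (320, 240))]))  # ('zoom', (320, 240))
```

A touch alone, or speech without a touch, leaves the machine short of its accepting state, so no fused command is emitted.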