SmartKom: Modality Fusion for a Mobile Companion based on Semantic Web Technologies
Wolfgang Wahlster. Cyber Assist Consortium Second International Symposium - Information Environment for Mobile and Ubiquitous Computing Era - Tokyo, 25 March 2003.
German Research Center for Artificial Intelligence, DFKI GmbH
Stuhlsatzenhausweg 3, 66123 Saarbruecken, Germany
phone: (+49 681) 302-5252/4162, fax: (+49 681) 302-5341, e-mail: [email protected]
WWW: http://www.dfki.de/~wahlster
Wolfgang Wahlster
SmartKom: Modality Fusion for a Mobile Companion based on Semantic Web Technologies
Cyber Assist Consortium Second International Symposium - Information Environment for Mobile and Ubiquitous Computing Era -
Tokyo, 25 March 2003
© W. Wahlster
Multimodal UMTS Systems
Intelligent Interaction with Mobile Internet Services
Access to web content and web services anywhere and anytime
Access to corporate networks and virtual private networks from any device
Access to edutainment and infotainment services
Access to all messages (voice, email, multimedia, MMS) from any single device
Personalization
Localization
SmartKom: A Highly Portable Multimodal Dialogue System
MM Dialogue Back-Bone
Application Layer:
SmartKom-Mobile: Car and Pedestrian Navigation
SmartKom-Public: Cinema, Phone, Fax, Mail, Biometrics
SmartKom-Home/Office: Consumer Electronics, EPG
A Demonstration of SmartKom’s Multimodal Interface for the Federal President of Germany, Dr. Rau
SmartKom’s SDDP Interaction Metaphor
SDDP = Situated Delegation-oriented Dialogue Paradigm
Anthropomorphic Interface = Dialogue Partner
User: specifies goal, delegates task
User and Personalized Interaction Agent: cooperate on problems
Personalized Interaction Agent: asks questions, presents results
Web services: Service 1, Service 2, Service 3
See: Wahlster et al. 2001, Eurospeech
SmartKom’s Use of Semantic Web Technology
Three Layers of Annotations for Personalized Presentation
Layer      Markup  Level
Content    M3L     high
Structure  XML     medium
Layout     HTML    low
SmartKom: Intuitive Multimodal Interaction
The SmartKom Consortium:
Main Contractor: DFKI Saarbrücken; Scientific Director: W. Wahlster
Partners named on the slide: MediaInterface, European Media Lab, Univ. of Munich, Univ. of Stuttgart, Univ. of Erlangen
Locations on the map: Saarbrücken, Aachen, Dresden, Berkeley, Stuttgart, Munich, Heidelberg, Ulm
Project Budget: € 25.5 million, funded by BMBF (Dr. Reuse) and industry
Project Duration: 4 years (September 1999 – September 2003)
Outline of the Talk
1. The Markup Language Layer Model of SmartKom
2. Modality Fusion in SmartKom
3. The Role of the Semantic Web Language M3L
4. Providing Coherence in Multimodal Dialogs by Ontology-based Overlay
5. Conclusions
Personalization
Mapping Web Content Onto a Variety of Structures and Layouts
From the “one-size-fits-all” approach of static webpages to the “perfect personal fit” approach of adaptive webpages
Content: M3L
Structure: XML1, XML2, XMLn
Layout: HTML11, HTML1m, HTML21, HTML2o, HTML31, HTML3p
The Markup Language Layer Model of SmartKom
M3L: MultiModal Markup Language
OIL: Ontology Inference Layer
XMLS: eXtensible Markup Language Schema
RDFS: Resource Description Framework Schema
XML: eXtensible Markup Language
RDF: Resource Description Framework
HTML: Hypertext Markup Language
SmartKom: Merging Various User Interface Paradigms
Spoken Dialogue
Graphical User Interfaces
Gestural Interaction
Facial Expressions
Biometrics
Multimodal Interaction
Multimodal Input and Output in SmartKom
Fusion and Fission of Multiple Modalities
Input by the User: Speech + Gesture + Facial Expressions
Output by the Presentation Agent: Speech + Gesture + Facial Expressions
Symbolic and Subsymbolic Fusion of Multiple Modes
Recognizers: Speech Recognition, Gesture Recognition, Prosody Recognition, Facial Expression Recognition, Lip Reading
Subsymbolic Fusion: Neural Networks, Hidden Markov Models
Symbolic Fusion: Graph Unification, Bayesian Networks
Reference Resolution and Disambiguation
Modality-Free Semantic Representation
Personalized Interaction with WebTVs via SmartKom (DFKI with Sony, Philips, Siemens)
User: Switch on the TV.
Smartakus: Okay, the TV is on.
User: Which channels are presenting the latest news right now?
Smartakus: CNN and NTV are presenting news.
User: Please record this news channel on a videotape.
Smartakus: Okay, the VCR is now recording the selected program.
Example: Multimodal Access to Electronic Program Guides for TV
Using Facial Expression Recognition for Affective Personalization
Processing ironic or sarcastic comments
(1) Smartakus: Here you see the CNN program for tonight.
(2) User: That’s great.
(3) Smartakus: I’ll show you the program of another channel for tonight.
(2’) User: That’s great.
(3’) Smartakus: Which of these features do you want to see?
The SmartKom Demonstrator System
Camera for Gestural Input
Microphone
Multimodal Control of TV-Set
Multimodal Control of VCR/DVD Player
Camera for Facial Analysis
Unification of Scored Hypothesis Graphs for Modality Fusion in SmartKom
Word Hypothesis Graph with Acoustic Scores
Clause and Sentence Boundaries with Prosodic Scores
Scored Hypotheses about the User’s Emotional State
Gesture Hypothesis Graph with Scores of Potential Reference Objects
Modality Fusion: Mutual Disambiguation, Reduction of Uncertainty
Intention Hypotheses Graph
Intention Recognizer: Selection of Most Likely Interpretation
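The idea of mutual disambiguation between scored hypotheses can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the tuple layout, the type-compatibility test, the sample hypotheses, and the choice to combine scores by multiplication are not taken from SmartKom, which unifies full hypothesis graphs.

```python
from itertools import product

def fuse(speech_hyps, gesture_hyps):
    """Pair every speech hypothesis with every gesture hypothesis.

    speech_hyps:  (utterance, expected_object_type, score) tuples (hypothetical layout)
    gesture_hyps: (object_id, object_type, score) tuples (hypothetical layout)
    Only type-compatible pairs survive; the joint score is the product
    of both scores, which is an assumption of this sketch.
    """
    fused = []
    for (utterance, wanted, s_score), (obj, obj_type, g_score) in product(speech_hyps, gesture_hyps):
        if wanted == obj_type:
            fused.append((s_score * g_score, utterance, obj))
    # Best joint interpretation first.
    return sorted(fused, reverse=True)

# The recognizer slightly prefers "seat", but the pointing gesture
# strongly favours a film, so the second speech hypothesis wins.
speech = [("reserve this seat", "seat", 0.55), ("reserve this film", "film", 0.45)]
gesture = [("film_3", "film", 0.8), ("seat_12", "seat", 0.2)]

ranked = fuse(speech, gesture)
best_score, best_utterance, best_object = ranked[0]
```

In this toy case neither modality alone yields the selected reading: the gesture scores rerank the speech hypotheses, which is the effect the slide calls mutual disambiguation.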
SmartKom’s Computational Mechanisms for Modality Fusion and Fission
Modality Fusion and Modality Fission
Mechanisms: Ontological Inferences, Unification, Overlay Operations, Planning, Constraint Propagation
M3L: Modality-Free Semantic Representation
The Role of the Semantic Web Language M3L
M3L (Multimodal Markup Language) defines the data exchange formats used for communication between all modules of SmartKom.
M3L is partitioned into 40 XML schema definitions covering SmartKom’s discourse domains.
The XML schema event.xsd captures the semantic representation of concepts and processes in SmartKom’s multimodal dialogs.
OIL2XSD: Using XSLT Stylesheets to Convert an OIL Ontology to an XML Schema
Using Ontologies to Extract Information from the Web
Mapping of Metadata
MyOnto-Movie: :title, :description, :actors
MyOnto-Person: :name, :birthday
Film.de-Movie: :title, :description
Kinopolis.de-Movie: :name, :critics
Further slot labels on the slide: :director, :o-title, :main actor
M3L as a Meaning Representation Language for the User’s Input
I would like to send an email to Koiti.
<domainObject>
<sendTelecommunicationProcess>
<sender>....................</sender>
<receiver>..............</receiver>
<document>..........</document>
<email>...........</email>
</sendTelecommunicationProcess>
</domainObject>
Exploiting Ontological Knowledge to Understand and Answer the User’s Queries
Which movies with Schwarzenegger are shown on the Pro7 channel?
<beginTime> <time> <function> <at> 2002-05-10T10:25:46 </at> </function> </time> </beginTime>
<domainObject> <epg> <broadcastDefault> <avMedium> <actors>
<actor><name>Schwarzenegger</name></actor>
</actors> </avMedium> <channel><name>Pro7</name></channel> </broadcastDefault> </epg> </domainObject>
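To show how such an M3L fragment can be consumed by a module, here is a minimal Python sketch that reads the query constraints out of the domainObject structure from the slide using the standard library XML parser. The function name query_constraints and the returned dictionary layout are assumptions made for this illustration, not part of SmartKom.

```python
import xml.etree.ElementTree as ET

# The domainObject fragment from the slide, with the name tag closed properly.
M3L_QUERY = """
<domainObject>
  <epg>
    <broadcastDefault>
      <avMedium>
        <actors>
          <actor><name>Schwarzenegger</name></actor>
        </actors>
      </avMedium>
      <channel><name>Pro7</name></channel>
    </broadcastDefault>
  </epg>
</domainObject>
"""

def query_constraints(m3l_text):
    """Extract actor names and the channel name from an M3L EPG query.

    Hypothetical helper: the output format is chosen for this sketch only.
    """
    root = ET.fromstring(m3l_text.strip())
    actors = [name.text for name in root.findall(".//actor/name")]
    channel = root.findtext(".//channel/name")
    return {"actors": actors, "channel": channel}

constraints = query_constraints(M3L_QUERY)
```

Because the representation is modality-free, the same extraction would apply whether the constraints came from speech, gesture, or a combination of both.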
SmartKom’s Multimodal Dialogue Back-Bone
Legend: Communication Blackboards, Data Flow, Context Dependencies
Analyzers: Speech, Gestures, Facial Expressions
Modality Fusion
Discourse Modeling
Action Planning
Modality Fission
Generators: Speech, Graphics, Gestures
Dialogue Manager
External Services
SmartKom’s Three-Tiered Discourse Model
System: This [] is a list of films showing in Heidelberg.
User: Please reserve a ticket for the first one.
[Figure: three layers of linked objects for the two utterances above]
Modality Layer: LO2, LO3, LO4, LO5, LO6, GO1, VO1 (labels include heidelberg, list, reserve, ticket, first)
Discourse Layer: DO1, DO2, DO3, DO9, DO10, DO11, DO12
Domain Layer: DomainObject1, DomainObject2
DO = Discourse Object, LO = Linguistic Object, GO = Gestural Object, VO = Visual Object
cf. M. Löckelt et al. 2002, N. Pfleger 2002
SmartKom’s Domain Model based on M3L
• Used for communication in the back-bone
• Frame-based ontology; representation as Typed Feature Structures in M3L (XML)
• Application objects composed of subobjects
• Slots: feature paths meaningful for the dialogue (entities that can be talked about or referred to); e.g. movie:director:lastName in a CinemaReservation object
• Slots can recursively contain other slots
CinemaReservation
  theater: MovieTheater
  movie: Movie
  reservationNumber: PositiveInteger
Movie
  name: String
  director: Person
  cast: PersonList
  yearOfProduction: PositiveInteger
  …
MovieTheater
  address: Address
  seats: SeatStructure
  …
Person
  firstName: String
  lastName: String
  …
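The notion of a slot as a feature path, such as movie:director:lastName, can be sketched in Python as follows. The dataclasses mirror only a subset of the slots named on the slide; the resolve helper and the sample values are hypothetical and serve purely as illustration of recursive slot access.

```python
from dataclasses import dataclass

@dataclass
class Person:
    firstName: str
    lastName: str

@dataclass
class Movie:
    name: str
    director: Person

@dataclass
class CinemaReservation:
    movie: Movie
    reservationNumber: int

def resolve(obj, path):
    """Follow a colon-separated feature path through nested frames."""
    for feature in path.split(":"):
        obj = getattr(obj, feature)
    return obj

# Placeholder values, invented for this example.
reservation = CinemaReservation(
    movie=Movie(name="Example Film", director=Person("Jane", "Doe")),
    reservationNumber=42,
)
last_name = resolve(reservation, "movie:director:lastName")
```

Each step of the path lands on another frame or on an atomic value, which matches the slide's remark that slots can recursively contain other slots.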
Overlay Operations Using the Discourse Model
Augmentation and Validation
– compare with a number of previous discourse states:
  • fill in consistent information
  • compute a score
– for each hypothesis-background pair:
  – Overlay (covering, background)
[Figure labels: Covering, Background, Intention Hypothesis Lattice, Selected Augmented Hypothesis Sequence]
The Overlay Operation Versus the Unification Operation
• Nonmonotonic and noncommutative unification-like operation
• Inherit (non-conflicting) background information
• Two sources of conflicts:
  – conflicting atomic values: overwrite background (old) with covering (new)
  – type clash: assimilate background to the type of covering; recursion
[Figure: Unification compared with Overlay]
cf. J. Alexandersson, T. Becker 2001
Example for Overlay
User: "What films are on TV tonight?"
System: [presents list of films]
User: "That’s a boring program, I’d rather go to the movies."
How do we inherit “tonight”?
Domain Model: A Type Hierarchy of TFS
Entertainment [title: String, beginTime: Time]: a named entertainment at some time
Broadcast [..., channel: Channel]: a named TV program at some time on some channel
Performance [..., cinema: Cinema]: a named movie at some time at some cinema
Unification Simulation
Films on TV tonight: Broadcast [title: "", beginTime: Time tonight, channel: anychannel]
Unifying this Broadcast structure with a Performance structure: Fail – type clash
Overlay Simulation
Background ("Films on TV tonight"): Broadcast [title: "", beginTime: Time tonight, channel: anychannel]
Covering ("Go to the movies"): Performance [beginTime: Time, cinema: anycinema]
Assimilation: the background becomes Performance [title: "", beginTime: Time tonight]
Result: Performance [title: "", beginTime: Time tonight, cinema: anycinema]
"Formal" Definition Overlay
• Let
  – co be covering
  – bg be background
• Step 1:
  – Assimilate(co, bg)
• Step 2:
  – Overlay(co, assimilate(co, bg))
    • If co and bg are frames: recursion
    • If co is empty: use bg
    • If bg is empty: use co
    • If conflict: use co
[Figure: type hierarchy with top type T above bg and co]
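The two steps can be sketched in Python for a single-inheritance hierarchy. This is a simplified illustration under stated assumptions, not SmartKom's implementation: the Frame class, the PARENT and DECLARED tables, the use of None for an empty structure, and plain strings as atomic values are all choices made for this sketch. The type and feature names follow the Broadcast/Performance example from the talk.

```python
from dataclasses import dataclass, field

# Hypothetical single-inheritance hierarchy: type -> parent type.
PARENT = {"Entertainment": None, "Broadcast": "Entertainment", "Performance": "Entertainment"}
# Features introduced by each type; subtypes inherit them.
DECLARED = {
    "Entertainment": {"title", "beginTime"},
    "Broadcast": {"channel"},
    "Performance": {"cinema"},
}

@dataclass
class Frame:
    type: str
    feats: dict = field(default_factory=dict)

def ancestors(type_name):
    """Return the type itself followed by all of its supertypes."""
    chain = []
    while type_name is not None:
        chain.append(type_name)
        type_name = PARENT[type_name]
    return chain

def allowed_features(type_name):
    allowed = set()
    for t in ancestors(type_name):
        allowed |= DECLARED[t]
    return allowed

def assimilate(co, bg):
    """Step 1: generalise bg to the closest common supertype, then retype it as co's type."""
    co_chain = ancestors(co.type)
    common = next(t for t in ancestors(bg.type) if t in co_chain)
    kept = {k: v for k, v in bg.feats.items() if k in allowed_features(common)}
    return Frame(co.type, kept)

def overlay(co, bg):
    """Step 2: lay the covering over the (assimilated) background."""
    if co is None:                      # co is empty: use bg
        return bg
    if bg is None:                      # bg is empty: use co
        return co
    if isinstance(co, Frame) and isinstance(bg, Frame):
        bg = assimilate(co, bg)         # both are frames: recursion
        keys = list(bg.feats) + [k for k in co.feats if k not in bg.feats]
        return Frame(co.type, {k: overlay(co.feats.get(k), bg.feats.get(k)) for k in keys})
    return co                           # conflict: covering wins

background = Frame("Broadcast", {"beginTime": "tonight", "channel": "anychannel"})
covering = Frame("Performance", {"cinema": "anycinema"})
result = overlay(covering, background)
```

In this run the background loses its channel feature during assimilation, while beginTime survives and is inherited by the resulting Performance frame, which is how "tonight" carries over to the cinema request.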
Domain Models with Multiple Inheritance
• Assimilate(co, bg)
  – Compute the set of minimal upper bounds (MUB)
  – Specialize the MUBs
  – Unify the specialized MUBs
• Overlay remains untouched
[Figure: type hierarchy with top type T, two MUBs below it, and co and bg below the MUBs]
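The first assimilation step, computing the minimal upper bounds of two types, can be sketched in Python for a hierarchy with multiple inheritance. The hierarchy below, including the intermediate types TimedEvent and MediaItem, is hypothetical and chosen only so that two MUBs exist; the specialization and unification steps are not covered here.

```python
def up_set(type_name, parents):
    """Return the type together with all of its direct and indirect supertypes."""
    seen = {type_name}
    stack = [type_name]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def minimal_upper_bounds(a, b, parents):
    """Common supertypes of a and b that have no other common supertype below them."""
    common = up_set(a, parents) & up_set(b, parents)
    return {
        t for t in common
        if not any(other != t and t in up_set(other, parents) for other in common)
    }

# Hypothetical hierarchy: both leaf types inherit from two intermediate types.
PARENTS = {
    "Broadcast": ["TimedEvent", "MediaItem"],
    "Performance": ["TimedEvent", "MediaItem"],
    "TimedEvent": ["T"],
    "MediaItem": ["T"],
    "T": [],
}

mubs = minimal_upper_bounds("Broadcast", "Performance", PARENTS)
```

With single inheritance the result always contains exactly one type, which reduces to the simpler case shown in the "formal" definition.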
Overlay - Scoring
• Four fundamental scoring parameters:
  – Number of features from Covering (co)
  – Number of features from Background (bg)
  – Number of type clashes (tc)
  – Number of conflicting atomic values (cv)
$$\mathrm{score}(co, bg, tc, cv) = \frac{(co + bg) - (tc + cv)}{(co + bg) + (tc + cv)}$$
• Codomain: [-1, 1]
• Higher score indicates better fit (score = 1 means overlay(c, b) is equivalent to unify(c, b))
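As a small worked illustration, the scoring function can be written in Python. The function name and the handling of an all-zero input are assumptions of this sketch; the sample values (co = 2, bg = 12, tc = 1, cv = 0) are the ones that appear to underlie the 0.8666 result shown later in the talk.

```python
def overlay_score(co, bg, tc, cv):
    """Score an overlay result from its four counts.

    co: features taken from the covering
    bg: features taken from the background
    tc: type clashes
    cv: conflicting atomic values
    """
    total = co + bg + tc + cv
    if total == 0:
        # Assumption of this sketch: an empty overlay has no defined score.
        raise ValueError("score is undefined when all counts are zero")
    return ((co + bg) - (tc + cv)) / total

example = overlay_score(co=2, bg=12, tc=1, cv=0)   # (14 - 1) / (14 + 1) = 13/15
```

A conflict-free overlay gives 1.0, and an overlay consisting only of clashes and conflicts gives -1.0, matching the stated codomain.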
Example: Enrichment and Validation
U4: What’s on TV tonight?
S5: [Displays a list of films] Here you see a list of films running tonight.
U6: That seems not very interesting, show me the cinema program.
Analysis of U4: Broadcast [beginTime: time [function: between [from: 2001-10-26T18:00:00, to: 2001-10-26T23:59:00]; tonight, daytime: evening], avMedium: ..., channel: anychannel]
Discourse context: the Broadcast structure of U4 and MotionDirectedTransliterated [meansOfTransportation: car, target: Town [name: Heidelberg]]
Analysis of U6: Performance [beginTime: ..., avMedium: ...]
Overlay (U6, U4)
Result: Performance [beginTime: time [function: between [from: 2001-10-26T18:00:00, to: 2001-10-26T23:59:00]; tonight, daytime: evening], avMedium: ..., cinema: anycinema] (Score: 0.8666)
Overlay (U6, U2)
Result: Performance [beginTime: ..., avMedium: ...] (Score: -1)
Animation of Scoring Parameters
Background: Broadcast [beginTime: time [function: between [from: 2001-10-26T18:00:00, to: 2001-10-26T23:59:00]; tonight, daytime: evening], avMedium: ..., channel: anychannel]
Covering: Performance [beginTime: ..., avMedium: ...]
Number of features from Covering (co): 2
Number of features from Background (bg): 12
Number of type clashes (tc): 1
Number of conflicting atomic values (cv): 0
Result:
$$\mathrm{score}(co, bg, tc, cv) = \frac{(2 + 12) - (1 + 0)}{(2 + 12) + (1 + 0)} = 0.8666$$
The High-Level Control Flow of SmartKom
M3L Specification of a Presentation Task
<presentationTask>
  <subTask>
    <presentationGoal>
      <inform> ... </inform>
      <abstractPresentationContent>
        ...
        <result>
          <broadcast id="bc1">
            <channel> <name>EuroSport</name> </channel>
            <beginTime> <time> <at>2000-12-05T14:00:00</at> </time> </beginTime>
            <endTime> <time> <at>2000-12-05T15:00:00</at> </time> </endTime>
            <avMedium>
              <title>Sport News</title>
              <avType>sport</avType>
              ...
      </abstractPresentationContent>
      <interactionMode>leanForward</interactionMode>
      <goalID>APGOAL3000</goalID>
      <source>generatorAction</source>
      <realizationType>GraphicsAndSpeech</realizationType>
SmartKom’s Presentation Planner
The Presentation Planner generates a Presentation Plan by applying a set of Presentation Strategies to the Presentation Goal.
[Figure: presentation plan tree. Nodes: GlobalPresent; Present, AddSmartakus, DoLayout, EvaluatePersonaNode; Inform; TryToPresentTVOverview; ShowTVOverview; SetLayoutData; PersonaAction; SendScreenCommand; GenerateText; Speak. Branch labels: Generation of Layout, Smartakus Actions.]
cf. J. Müller, P. Poller, V. Tschernomas 2002
Salient Characteristics of SmartKom
• Seamless integration and mutual disambiguation of multimodal input and output on semantic and pragmatic levels
• Situated understanding of possibly imprecise, ambiguous, or incomplete multimodal input
• Context-sensitive interpretation of dialog interaction on the basis of dynamic discourse and context models
• Adaptive generation of coordinated, cohesive and coherent multimodal presentations
• Semi- or fully automatic completion of user-delegated tasks through the integration of information services
• Intuitive personification of the system through a presentation agent
Conclusions
Various types of unification, overlay, constraint processing, planning and ontological inferences are the fundamental processes involved in SmartKom’s modality fusion and fission components.
The key function of modality fusion is the reduction of the overall uncertainty and the mutual disambiguation of the various analysis results based on a three-tiered representation of multimodal discourse.
We have shown that a multimodal dialogue system must not only understand and represent the user’s input, but also its own multimodal output.
First International Conference on Perceptive & Multimodal User Interfaces (PMUI’03)
November 5-7, 2003, Delta Pinnacle Hotel, Vancouver, B.C., Canada
Conference Chair: Sharon Oviatt, Oregon Health & Science Univ., USA
Program Chairs: Wolfgang Wahlster, DFKI, Germany; Mark Maybury, MITRE, USA
PMUI’03 is sponsored by ACM, and will be co-located in Vancouver with ACM’s UIST’03. This meeting follows three successful Perceptive User Interface Workshops (with PUI’01 held in Florida) and three International Multimodal Interface Conferences initiated in Asia (with ICMI’02 held in Pittsburgh).
March 2003, ISBN 0-262-06232-1, 8 x 9, 392 pp., 98 illus., $40.00/£26.95 (cloth)
Edited by Dieter Fensel, James A. Hendler, Henry Lieberman and Wolfgang WahlsterForeword by Tim Berners-Lee
http://smartkom.dfki.de/