TRANSCRIPT
Wolfgang Wahlster
German Research Center for Artificial Intelligence
DFKI GmbH
www.dfki.de/~wahlster
Seventeenth International Joint Conference on Artificial Intelligence, IJCAI-01 Seattle
Wednesday, 8 August 2001
Robust Translation of Spontaneous Speech:
A Multi-Engine Approach
© Wolfgang Wahlster, DFKI GmbH
Mobile Speech-to-Speech Translation of Spontaneous Dialogs
As the name Verbmobil suggests, the system supports verbal
communication with foreign dialog partners in mobile situations:
1. face-to-face conversations
2. telecommunication
Mobile Speech-to-Speech Translation of Spontaneous Dialogs
Verbmobil Speech Translation Server
Conference Call: The Verbmobil Speech Translation Server connects GSM cell phone users
Robust Realtime Translation with Verbmobil
At a German airport: An American businessman calls the secretary of a German business partner.
Outline
Verbmobil's Multi-Blackboard and Multi-Engine Architecture
Exploiting Underspecification in a Multi-Stratal Semantic Representation Language
Combining Deep and Shallow Processing Strategies for Robust Dialog Translation
Evaluation and Technology Transfer
Lessons Learned and Conclusions
Telephone-based Dialog Translation
[Figure: an American dialog partner and a German dialog partner, two BinTec Bianca/Brick XS ISDN-LAN routers, and the Verbmobil server cluster (Sun Server 450, Sun ULTRA 60/80, LINUX server); translation directions German to English and English to German]
ISDN conference call (3 participants): German speaker, Verbmobil, American speaker
Speech-based set-up of the conference call
Verbmobil: The First Speech-Only Dialog Translation System
Mobile GSM Phone, Mobile DECT Phone
American Speaker: “Verbmobil” (Voice Dialing)
Connect to the Verbmobil Speech-to-Speech Translation Server: +49 631 3111911
Verbmobil: “Welcome to the Verbmobil Translation System. Please speak the telephone number of your partner.”
American Speaker: “0177555”
Foreign participant is placed into the conference call
To German participant: Verbmobil: “Verbmobil hat eine neue Verbindung aufgebaut. Bitte sprechen Sie jetzt.” (Verbmobil has set up a new connection. Please speak now.)
To American participant: Verbmobil: “Welcome to the Verbmobil server. Please start your input after the beep.”
Verbmobil is a Multilingual System
It supports bidirectional translation between:
German and English (American)
German and Japanese
German and Chinese (Mandarin)
Verbmobil Partners (Phase 2)
Universität des Saarlandes
Ruhr-Universität Bochum
Universität Hamburg
Universität Karlsruhe
Universität Bielefeld
Technische Universität München
Friedrich-Alexander-Universität Erlangen-Nürnberg
Universität Stuttgart
Rheinische Friedrich-Wilhelms-Universität Bonn
Ludwig-Maximilians-Universität München
TU Braunschweig
Eberhard-Karls-Universität Tübingen
DaimlerChrysler
DFKI (W. Wahlster)
Three Levels of Language Processing
Input: speech over the telephone
Speech Recognition: What has the caller said? (100 alternatives) Knowledge sources: acoustic language models, word lists
Speech Analysis: What has the caller meant? (10 alternatives) Knowledge sources: grammar, lexical meaning
Speech Understanding: What does the caller want? (unambiguous understanding in the dialog context) Knowledge sources: discourse context, knowledge about the domain of discourse
Reduction of uncertainty from level to level
Challenges for Language Engineering
Four dimensions, each listed in order of increasing complexity; Verbmobil addresses the last (most complex) level of each:
Input Conditions: close-speaking microphone/headset with push-to-talk; telephone with pause-based segmentation; open microphone, GSM quality
Naturalness: isolated words; read continuous speech; spontaneous speech
Adaptability: speaker dependent; speaker independent; speaker adaptive
Dialog Capabilities: monolog dictation; information-seeking dialog; multiparty negotiation
Verbmobil II: Three Domains of Discourse
Scenario 1: Appointment Scheduling. Question: When? Focus on temporal expressions. Vocabulary size: 6,000
Scenario 2: Travel Planning & Hotel Reservation. Questions: When? Where? How? Focus on temporal and spatial expressions. Vocabulary size: 10,000
Scenario 3: PC-Maintenance Hotline. Questions: What? When? Where? How? Integration of special sublanguage lexica. Vocabulary size: 30,000
Context-Sensitive Speech-to-Speech Translation
Verbmobil Server
Wann fährt der nächste Zug nach Hamburg ab?
When does the next train to Hamburg depart?
Wo befindet sich das nächste Hotel?
Where is the nearest hotel?
The Control Panel of Verbmobil
Verbmobil's Massive Data Collection Effort
The so-called Partitur (German word for musical score) orchestrates fifteen strata of annotations:
Transliteration Variant 1, Transliteration Variant 2, Lexical Orthography, Canonical Pronunciation, Manual Phonological Segmentation,
Automatic Phonological Segmentation, Word Segmentation, Prosodic Segmentation, Dialog Acts, Noises,
Superimposed Speech, Syntactic Category, Word Category, Syntactic Function, Prosodic Boundaries
3,200 dialogs (182 hours) with 1,658 speakers and 79,562 turns, distributed on 56 CDs (21.5 GB)
Extracting Statistical Properties from Large Corpora
Machine learning for the integration of statistical properties into symbolic models for speech recognition, parsing, dialog processing, translation
Transcribed speech data: Hidden Markov Models
Segmented speech with prosodic labels: neural nets, multilayered perceptrons
Annotated dialogs with dialog acts: probabilistic automata
Treebanks & predicate-argument structures: probabilistic grammars
Aligned bilingual corpora: probabilistic transfer rules
Multilinguality
[Chart: word accuracy [%] (axis from 50 to 100) of the Japanese, English and German recognizers for the system versions VM1, '97, '98, '99.1, '99.2, '99.3 and 2000]
Multilinguality
Language Identification (LID)
Speech
Independent LID Module
German Recognizer, English Recognizer, Japanese Recognizer
w1 … wn
From a Multi-Agent Architecture to a Multi-Blackboard Architecture
Verbmobil I: Multi-Agent Architecture (modules M1 to M6)
Each module must know which module produces what data
Direct communication between modules
Heavy data traffic for moving copies around
Verbmobil II: Multi-Blackboard Architecture (modules M1 to M6, blackboards BB 1 to BB 3)
All modules can register for each blackboard dynamically
No direct communication between modules
No copies of representation structures (word lattice, VIT chart)
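The contrast between the two architectures can be illustrated with a small publish/subscribe sketch. It is not Verbmobil's implementation: the class `Blackboard`, the two parser functions and the sample word lattice are hypothetical names chosen for illustration. It only shows the three properties listed for Verbmobil II: modules register dynamically, never call each other directly, and share one stored structure instead of copies.

```python
from typing import Any, Callable, List


class Blackboard:
    """A named shared store. Modules register callbacks instead of calling each other."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.entries: List[Any] = []
        self._subscribers: List[Callable[[Any], None]] = []

    def register(self, callback: Callable[[Any], None]) -> None:
        # Dynamic registration: any module may subscribe at any time.
        self._subscribers.append(callback)

    def post(self, item: Any) -> None:
        # The item is stored once; subscribers receive a reference, not a copy.
        self.entries.append(item)
        for callback in list(self._subscribers):
            callback(item)


# Hypothetical blackboards and modules, for illustration only.
word_lattice_bb = Blackboard("word lattice")
vit_chart_bb = Blackboard("VIT chart")


def chunk_parser(lattice: Any) -> None:
    vit_chart_bb.post(("chunk_parser", lattice))


def statistical_parser(lattice: Any) -> None:
    vit_chart_bb.post(("statistical_parser", lattice))


word_lattice_bb.register(chunk_parser)
word_lattice_bb.register(statistical_parser)

lattice = ["we", "are", "meeting"]
word_lattice_bb.post(lattice)
```

After the single `post`, both parsers have written to the VIT chart blackboard, and the recognizer that produced the lattice never needed to know which modules consume it.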
Multi-Blackboard/Multi-Engine Architecture
Blackboard 1: Preprocessed Speech Signal
Blackboard 2: Word Lattice
Blackboard 3: Syntactic Representation: Parsing Results
Blackboard 4: Semantic Representation: Lambda DRS
Blackboard 5: Dialog Acts
Modules 1.1, 1.2, ...; 2.1, 2.2, ...; 3.1, 3.2, ...; 4.1, 4.2, ...; 5.1, 5.2, ...; 6.1, 6.2, ...
A Multi-Blackboard Architecture for the Combination of Results from Deep and Shallow Processing Modules
Blackboard: Audio Data. Modules: Command Recognizer, Spontaneous Speech Recognizer, Channel/Speaker Adaptation, Prosodic Analysis
Blackboard: Word Hypotheses Graph with Prosodic Labels. Modules: Statistical Parser, Dialog Act Recognition, Chunk Parser, HPSG Parser
Blackboard: VITs, Underspecified Discourse Representations. Modules: Semantic Construction, Robust Dialog Semantics, Semantic Transfer, Generation
VIT (Verbmobil Interface Terms) as a Multi-Stratal Representation Language
used as a common representation scheme for information exchange between all components and processing threads
design inspired by underspecified discourse representation structures (UDRS, Reyle/Kamp 1993)
compact representation of lexical and structural ambiguities and scope underspecifications of quantifiers, negations and adverbs
variable-free sets of non-recursive terms:
[beginning(35, i37), arg3(35, i37, i38), come(27, i35), arg1(27, i35, i36), decl(37, h43), pron(26, i36), at(36, i35, i37), mofy(34, i38, aug), def(28, i37, h42, h41), udef(31, i38, h45, h44)]
streams of literals as flat multi-stratal representations that are very efficient for incremental processing
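A flat, variable-free set of literals is easy to handle incrementally, as the following sketch suggests. It encodes the conditions of the example term as Python tuples. The reading of the first argument as a label and of the remaining arguments as instance or hole constants is an assumption made for this sketch, and the helper names `Literal` and `literals_about` are hypothetical.

```python
from collections import namedtuple

# Assumed reading: predicate(label, arg, ...). Constants such as i37 link literals together.
Literal = namedtuple("Literal", ["pred", "label", "args"])

conditions = [
    Literal("beginning", 35, ("i37",)),
    Literal("arg3", 35, ("i37", "i38")),
    Literal("come", 27, ("i35",)),
    Literal("arg1", 27, ("i35", "i36")),
    Literal("decl", 37, ("h43",)),
    Literal("pron", 26, ("i36",)),
    Literal("at", 36, ("i35", "i37")),
    Literal("mofy", 34, ("i38", "aug")),
    Literal("def", 28, ("i37", "h42", "h41")),
    Literal("udef", 31, ("i38", "h45", "h44")),
]


def literals_about(symbol, literals):
    """Return every literal that mentions the given constant symbol."""
    return [lit for lit in literals if symbol in lit.args]


# Because the terms are non-recursive, a lookup is a single flat scan,
# and new literals can simply be appended as they arrive.
about_i37 = [lit.pred for lit in literals_about("i37", conditions)]
about_i36 = [lit.pred for lit in literals_about("i36", conditions)]
```

No tree has to be traversed or rebuilt when a component adds information; it only appends literals that share constants with the existing ones.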
VIT for “He is coming at the beginning of August”
vit(vitID(sid(104, a, en, 10, 80, 1, en, y, semantics),   % Segment Identifier
          [word(he, 1, [26]), word(is, 2, []), word(coming, 3, [27]), word(at, 4, [36]),
           word(the, 5, [28]), word(beginning, 6, [35]), word(of, 7, [35]), word('August', 8, [34])]),   % WHG String
    index(38, 25, i35),   % Index
    [beginning(35, i37), arg3(35, i37, i38), come(27, i35), arg1(27, i35, i36), decl(37, h43),
     pron(26, i36), at(36, i35, i37), mofy(34, i38, aug), def(28, i37, h42, h41), udef(31, i38, h45, h44)],   % Conditions
    [in_g(26, 25), in_g(37, 38), in_g(27, 25), in_g(28, 30), in_g(31, 33), in_g(34, 32), in_g(35, 29), in_g(36, 25),
     leq(25, h41), leq(25, h43), leq(29, h42), leq(29, h44), leq(30, h43), leq(32, h45), leq(33, h43)],   % Scope and Grouping Constraints
    [s_sort(i35, situation), s_sort(i37, time), s_sort(i38, time)],   % Sortal Specifications for Instance Variables
    [dialog_act(25, inform), dir(36, no), prontype(i36, third, std)],   % Discourse and Pragmatics
    [cas(i36, nom), gend(i36, masc), num(i36, sg), num(i37, sg), num(i38, sg), pcase(l135, i38, of)],   % Syntax
    [ta_aspect(i35, progr), ta_mood(i35, ind), ta_perf(i35, nonperf), ta_tense(i35, pres)],   % Tense and Aspect
    [pros_accent(35)])   % Prosody
Information between Layers is Linked Together Using Constant Symbols
[word(he, 1, [26]), word(is, 2, []), word(coming, 3, [27]), word(at, 4, [36]), word(the, 5, [28]), word(beginning, 6, [35]), word(of, 7, [35]), word('August', 8, [34])]   % WHG String
[beginning(35, i37), arg3(35, i37, i38), come(27, i35), arg1(27, i35, i36), decl(37, h43), pron(26, i36), at(36, i35, i37), mofy(34, i38, aug), def(28, i37, h42, h41), udef(31, i38, h45, h44)]   % Conditions
[s_sort(i35, situation), s_sort(i37, time), s_sort(i38, time)]   % Sorts
[cas(i36, nom), gend(i36, masc), num(i36, sg), num(i37, sg)]   % Syntax
Instances are constants interpreted as skolemized variables
The Use of Underspecified Representations
Wir telephonierten mit Freunden aus Schweden. (two readings in the source language)
Underspecified semantic representation: a compact representation of scope ambiguities in a logical language without using disjunctions
Ambiguity-preserving translation
We called friends from Sweden. (two readings in the target language)
Verbmobil is the First Dialog Translation System that Uses Prosodic Information Systematically at All Processing Stages
Multilingual Prosody Module. Input: speech signal, word hypotheses graph. Prosodic features: duration, pitch, energy, pause
Information passed on to the processing stages: boundary information, sentence mood, accented words, prosodic feature vector
Parsing: search space restriction
Dialog Understanding: dialog act segmentation and recognition
Translation: constraints for transfer
Generation: lexical choice
Speech Synthesis: speaker adaptation
Using Syntactic-Prosodic Boundaries to Speed-Up the Parsing Process
yes S1 no problem S4 Mister Mueller S4 when would you like to go to Hannover S4
without boundaries: # chart edges: 1256 runtime: 1.31 secs
with boundaries: #chart edges: 632 runtime: 0.62 secs
speed-up: 53%
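The effect of the boundary labels can be sketched as a simple pre-segmentation of the token stream before parsing. The sketch assumes that tokens of the form `S` plus a digit are boundary labels and that only `S4` triggers a split; that reading of the labels is an assumption, and the function names are hypothetical. The second helper reproduces the arithmetic behind the reported speed-up from the two runtimes on the slide.

```python
def split_at_boundaries(tokens, split_label="S4"):
    """Split a token stream into segments at every occurrence of split_label."""
    segments, current = [], []
    for token in tokens:
        if token == split_label:
            if current:
                segments.append(current)
            current = []
        elif len(token) == 2 and token[0] == "S" and token[1].isdigit():
            continue  # other boundary labels (e.g. S1) are skipped, not split on
        else:
            current.append(token)
    if current:
        segments.append(current)
    return segments


def relative_reduction(before, after):
    """Fraction saved when going from 'before' to 'after'."""
    return (before - after) / before


utterance = ("yes S1 no problem S4 Mister Mueller S4 "
             "when would you like to go to Hannover S4").split()
segments = split_at_boundaries(utterance)

runtime_saving = relative_reduction(1.31, 0.62)   # runtimes in seconds, from the slide
edge_saving = relative_reduction(1256, 632)       # chart edges, from the slide
```

Each segment can then be parsed on its own, so the parser builds fewer chart edges spanning material that belongs to different units.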
Integrating Shallow and Deep Analysis Components in a Multi-Engine Approach
Augmented Word Hypotheses Graph, with an A* algorithm guiding through it
Chunk Parser, Statistical Parser, HPSG Parser: each delivers partial VITs
Chart with a combination of partial VITs
Robust Dialog Semantics: combination and knowledge-based reconstruction of complete VITs
Complete and spanning VITs
Automatic Understanding and Correction of Speech Repairs in Spontaneous Telephone Dialogs
German: Wir treffen uns in Mannheim, äh, in Saarbrücken. (We are meeting in Mannheim, oops, in Saarbruecken.)
English: We are meeting in Saarbruecken.
The Understanding of Spontaneous Speech Repairs
I need a car next Tuesday oops Monday
Original utterance (reparandum): Tuesday
Editing phase (hesitation): oops
Repair phase (reparans): Monday
Recognition of substitutions
Transformation of the word hypothesis graph
I need a car next Monday
Verbmobil technology: understands speech repairs and extracts the intended meaning
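The substitution in the example can be sketched on a plain token list. This is a deliberately simplified stand-in: Verbmobil transforms the word hypothesis graph, whereas the sketch below works on a single word sequence, uses a small hypothetical list of hesitation markers, and only handles a one-word reparandum replaced by a one-word reparans.

```python
# Hypothetical marker list, for illustration only.
HESITATIONS = {"oops", "äh", "uh"}


def repair_substitution(tokens):
    """Replace the word before a hesitation marker by the word after it."""
    repaired = []
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if token in HESITATIONS and repaired and i + 1 < len(tokens):
            repaired[-1] = tokens[i + 1]  # the reparans overwrites the reparandum
            i += 2                        # skip the hesitation and the reparans
        else:
            repaired.append(token)
            i += 1
    return repaired


original = "I need a car next Tuesday oops Monday".split()
intended = " ".join(repair_substitution(original))
```

Multi-word repairs such as "in Mannheim, äh, in Saarbrücken" need an alignment of reparandum and reparans, which this sketch does not attempt.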
VHG: A Packed Chart Representation of Partial Semantic Representations
Chart parser using cascaded finite-state transducers
Statistical LR parser trained on a treebank
Very fast HPSG parser
Incremental chart construction and anytime processing
Rule-based combination and transformation of partial UDRS coded as VITs
Selection of a spanning analysis using a bigram model for VITs
Semantic Correction of Recognition Errors
German: Wir treffen uns Kaiserslautern. (We are meeting Kaiserslautern.)
English: We are meeting in Kaiserslautern.
Robust Dialog Semantics: Deep Processing of Shallow Structures
Goals of robust semantic processing (Pinkal, Worm, Rupp):
Combination of unrelated analysis fragments
Completion of incomplete analysis results
Skipping of irrelevant fragments
Method: transformation rules on the VIT Hypothesis Graph (VHG), consisting of conditions on VIT structures and operations on VIT structures
The rules are based on various knowledge sources: lattice of semantic types, domain ontology, sortal restrictions, semantic constraints
Results: 20% of the analyses are improved, 0.6% get worse
Robust Dialog Semantics: Combining and Completing Partial Representations
Let us meet the late afternoon to catch the train to Frankfurt
The preposition “in” is missing in all paths through the word hypothesis graph. A temporal NP is transformed into a temporal modifier using an underspecified temporal relation:
[temporal_np(V1)] → [typeraise_to_mod(V1, V2)] & V2
The modifier is applied to a proposition:
[type(V1, prop), type(V2, mod)] → [apply(V2, V1, V3)] & V3
Let us meet (in) the late afternoon to catch the train to Frankfurt
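The two rules can be read as a condition on a fragment followed by an operation that builds a new fragment. The sketch below mimics that reading with plain dictionaries. The dictionary layout, the relation name `underspecified_temporal` and the function names are assumptions made for illustration and are not the VIT rule formalism itself.

```python
def typeraise_to_mod(fragment):
    """Rule 1: a temporal NP becomes a modifier with an underspecified temporal relation."""
    if fragment.get("type") != "temporal_np":
        return None  # condition not met, rule does not fire
    return {
        "type": "mod",
        "relation": "underspecified_temporal",  # stands in for the missing preposition
        "arg": fragment["content"],
    }


def apply_modifier(mod, prop):
    """Rule 2: a modifier is applied to a proposition, yielding a new proposition."""
    if mod is None or mod.get("type") != "mod" or prop.get("type") != "prop":
        return None
    return {
        "type": "prop",
        "content": prop["content"],
        "modifiers": prop.get("modifiers", []) + [mod],
    }


# Two unrelated analysis fragments, as they might come out of the parsers.
proposition = {"type": "prop", "content": "let us meet"}
temporal_np = {"type": "temporal_np", "content": "the late afternoon"}

combined = apply_modifier(typeraise_to_mod(temporal_np), proposition)
```

The result is a single proposition that carries the temporal modifier, even though no path through the word hypothesis graph contained the preposition.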
Competing Strategies for Robust Speech Translation
The concurrent processing modules of Verbmobil combine deep semantic translation with shallow surface-oriented translation methods.
Expensive, but precise translation: principled and compositional syntactic and semantic analysis; semantic-based transfer of Verbmobil Interface Terms (VITs) as sets of underspecified DRS
Cheap, but approximate translation: case-based translation, dialog-act based translation, statistical translation
Timeout?
Results with confidence values
Selection of best result
Acceptable translation rate
Architecture of the Semantic Transfer Module
[Figure: for source language L1 and target language L2, the stages Underspecified VIT, VIT and Refined VIT. Refinement uses monolingual refinement rules and disambiguation rules. Phrasal transfer between VIT (L1) and VIT (L2) uses the phrasal dictionary. Lexical transfer between Refined VIT (L1) and Refined VIT (L2) uses the bilingual dictionary.]
Lexical Disambiguation On-Demand
Preserving lexical ambiguities:
How did you find his office? (get to or like)
Wie fanden Sie sein Büro?
Disambiguation is not necessary for the translation between German and English.
Disambiguation is necessary for the translation between German and Japanese:
dou kare no jimusho o mitsukeraremashita ka (How he POSS office OBJ get-to can PAST QUESTION)
kare no jimusho wa dou omoimasu ka (He POSS office TOPIC how think QUESTION)
Three English Translations of the German Word “Termin” Found in the Verbmobil Corpus
1. Verschieben wir den Termin. Let's reschedule the appointment.
2. Schlagen Sie einen Termin vor. Suggest a date.
3. Da habe ich einen Termin frei. I have got a free slot there.
Subsumption Relations in the Domain Model
[Figure: sort hierarchy with the nodes scheduled_event, temporal_specification (default), appointment, set_start_time, time_interval, date, slot]
Entries in the Transfer Lexicon: German → English (Simplified)
tau_lex(termin, appointment, pred_sort(subsume(scheduled_event))).
tau_lex(termin, date, pred_sort(subsume(set_start_time))).
tau_lex(termin, slot, pred_sort(subsume(time_interval))).
tau_lex(verschieben, reschedule, [tau(#S), tau(#0)], pred_args([#S, #0 & pred_sort(scheduled_event)])).
tau_lex(ausmachen, fix, [tau(#S), tau(#0)], pred_args([#S, #0 & pred_sort(set_start_time)])).
tau_lex(freihaben, have_free, [tau(#S), tau(#0)], pred_args([#S, #0 & pred_sort(time_interval)])).
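How such entries interact can be sketched as a lookup: the verb imposes a sort on its object, and the translation of “Termin” is the entry whose sort subsumes that required sort. The sketch is not the transfer formalism; the tables `TERMIN_ENTRIES`, `VERB_OBJECT_SORT` and `PARENT` as well as the function names are hypothetical, and the small hierarchy only contains the relations that can be read off the lexicon entries.

```python
# Assumed fragment of the sort hierarchy: child -> parent.
PARENT = {
    "appointment": "scheduled_event",
    "date": "set_start_time",
    "slot": "time_interval",
}

# English translations of "Termin" with the sort each one requires.
TERMIN_ENTRIES = [
    ("appointment", "scheduled_event"),
    ("date", "set_start_time"),
    ("slot", "time_interval"),
]

# Sort that each German verb imposes on its object (from the verb entries).
VERB_OBJECT_SORT = {
    "verschieben": "scheduled_event",
    "ausmachen": "set_start_time",
    "freihaben": "time_interval",
}


def subsumes(general, specific):
    """True if 'general' equals 'specific' or is one of its ancestors."""
    while specific is not None:
        if specific == general:
            return True
        specific = PARENT.get(specific)
    return False


def translate_termin(german_verb):
    """Pick the translation of 'Termin' licensed by the verb's object sort."""
    required = VERB_OBJECT_SORT[german_verb]
    for english, sort in TERMIN_ENTRIES:
        if subsumes(sort, required):
            return english
    return None


choices = {verb: translate_termin(verb) for verb in VERB_OBJECT_SORT}
```

The choice is driven by sortal information from the domain model rather than by surface word order, which is the point of the simplified entries.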
Using Context and World Knowledge for Semantic Transfer
Example: Platz → room / table / seat
1. Nehmen wir dieses Hotel, ja. Let us take this hotel.
   Ich reserviere einen Platz. I will reserve a room.
2. Machen wir das Abendessen dort. Let us have dinner there.
   Ich reserviere einen Platz. I will reserve a table.
3. Gehen wir ins Theater. Let us go to the theater.
   Ich möchte Plätze reservieren. I would like to reserve seats.
All other dialog translation systems translate word-by-word or sentence-by-sentence.
Integrating Deep and Shallow Processing: Combining Results from Concurrent Translation Threads
Segment 1: If you prefer another hotel,
Segment 2: please let me know.
Translation threads: Statistical Translation, Dialog-Act Based Translation, Semantic Transfer, Case-Based Translation
Alternative translations with confidence values
Selection Module
Segment 1: translated by Semantic Transfer
Segment 2: translated by Case-Based Translation
A Machine Learning Approach to the Selection of the Best Translation Result
SEQ := set of all translation sequences for a turn
Seq ∈ SEQ := sequence of translation segments s1, s2, ..., sn
Input: Each translation thread provides for every segment an online confidence value confidence(thread.segment)
A Context-Free Approach to the Selection of the Best Translation Result
SEQ := set of all translation sequences for a turn
Seq ∈ SEQ := sequence of translation segments s1, s2, ..., sn
Input: Each translation thread provides for every segment an online confidence value confidence(thread.segment)
Task: Compute normalized confidence values for translated Seq
\[ \mathrm{CONF}(\mathit{Seq}) = \sum_{\mathit{segment} \in \mathit{Seq}} \mathrm{Length}(\mathit{segment}) \cdot \bigl(\alpha(\mathit{thread}) + \beta(\mathit{thread}) \cdot \mathrm{confidence}(\mathit{thread.segment})\bigr) \]
Output: Best(SEQ) = {Seq ∈ SEQ | Seq is a maximal element in (SEQ, ≤_CONF)}
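The selection step can be sketched directly from the CONF definition: enumerate all translation sequences for a turn, score each with the length-weighted, normalized confidence, and keep the best one. All numbers below (segment lengths, confidence values, alpha and beta) are invented for illustration, and the data layout and function names are assumptions of this sketch.

```python
from itertools import product


def conf(sequence, alpha, beta):
    """CONF(Seq): sum over segments of length * (alpha + beta * confidence)."""
    return sum(
        length * (alpha[thread] + beta[thread] * confidence)
        for thread, length, confidence in sequence
    )


def best_sequence(candidates, alpha, beta):
    """candidates: one dict per segment, mapping thread -> (length, confidence)."""
    per_segment = [
        [(thread, length, confidence) for thread, (length, confidence) in segment.items()]
        for segment in candidates
    ]
    # Every combination of one thread per segment is one translation sequence.
    return max(product(*per_segment), key=lambda seq: conf(seq, alpha, beta))


# Hypothetical normalizing factors and confidence values.
alpha = {"semantic_transfer": 0.0, "case_based": 0.0}
beta = {"semantic_transfer": 1.0, "case_based": 1.0}
turn = [
    {"semantic_transfer": (5, 0.9), "case_based": (5, 0.4)},   # segment 1
    {"semantic_transfer": (4, 0.3), "case_based": (4, 0.8)},   # segment 2
]

winner = best_sequence(turn, alpha, beta)
winning_threads = tuple(thread for thread, _, _ in winner)
winning_score = conf(winner, alpha, beta)
```

Because CONF is a plain sum over segments, the same result could be obtained by choosing the best thread per segment independently; the exhaustive enumeration is kept here because it mirrors the definition of Best(SEQ) most literally.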
Learning the Normalizing Factors Alpha and Beta from an Annotated Corpus
Turn := segment1, segment2, ..., segmentn
For each turn in a training corpus, all segments translated by one of the four translation threads are manually annotated with a score for translation quality.
For the sequence of n segments resulting in the best overall translation score, at most 4^n linear inequalities are generated, so that the selected sequence is better than all alternative translation sequences.
From the set of inequalities for spanning analyses (≤ 4^n), the values of alpha and beta can be determined offline by solving the constraint system.
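One way to write down a single constraint of this system, assuming the CONF definition used for the selection of the best translation result: let Seq* denote the manually best-scored sequence of a turn and Seq' any alternative sequence, and let t_s be the thread that produced segment s. The symbols Seq*, Seq' and t_s are introduced here for illustration only.

```latex
\[
\sum_{s \in \mathit{Seq}^{*}} \mathrm{Length}(s)\,
  \bigl(\alpha(t_s) + \beta(t_s)\,\mathrm{confidence}(t_s.s)\bigr)
\;>\;
\sum_{s' \in \mathit{Seq}'} \mathrm{Length}(s')\,
  \bigl(\alpha(t_{s'}) + \beta(t_{s'})\,\mathrm{confidence}(t_{s'}.s')\bigr)
\qquad \text{for every } \mathit{Seq}' \neq \mathit{Seq}^{*}
\]
```

Lengths and confidence values are known from the corpus, so each such constraint is linear in the unknowns alpha(thread) and beta(thread), one pair per translation thread.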
Integrating a Deep HPSG-based Analysis with Probabilistic Dialog Act Recognition for Semantic Transfer
[Figure: components and data-flow labels]
Probabilistic Analysis of Dialog Acts (HMM)
Recognition of Dialog Plans (Plan Operators)
HPSG Analysis
Robust Dialog Semantics
Semantic Transfer
Data-flow labels: Dialog Act Type, Dialog Phase, VIT
Dialog Act
CONTROL_DIALOG
MANAGE_TASK
PROMOTE_TASK
GREETING
INTRODUCE
POLITENESS_FORMULA
THANK
DELIBERATE
BACKCHANNEL
INIT
DEFER
CLOSE
REQUEST
SUGGEST
INFORM
FEEDBACK
COMMIT
REQUEST_SUGGEST
REQUEST_CLARIFY
REQUEST_COMMENT
REQUEST_COMMIT
GREETING_BEGIN
GREETING_END
DIGRESS
EXCLUDE
CLARIFY
GIVE_REASON
DEVIATE_SCENARIO
REFER_TO_SETTING
CLARIFY_ANSWER
FEEDBACK_NEGATIVE
REJECT
EXPLAINED_REJECT
FEEDBACK_POSITIVE
ACCEPT
CONFIRM
The Dialog Act Hierarchy used for Planning, Prediction, Translation and Generation
Learning of Probabilistic Plan Operators from Annotated Corpora
(OPERATOR-s-10523-6
  goal [IN-TURN confirm-s-10523 ?S-3314 ?S-3316]
  subgoals (sequence [IN-TURN confirm-s-10521 ?S-3314 ?S-3315]
                     [IN-TURN confirm-s-10522 ?S-3315 ?S-3316])
  PROB 0.72)
(OPERATOR-s-10521-8
  goal [IN-TURN confirm-s-10521 ?S-3321 ?S-3322]
  subgoals (sequence [DOMAIN-DEPENDENT accept ?S-3321 ?S-3322])
  PROB 0.95)
[Figure: Dialog Translation by Verbmobil feeds Multilingual Generation of Summaries, which produces an HTML document in English and an HTML document in German, each transferred by Internet or fax to the dialog partners (German Dialog Partner, American Dialog Partner)]
Automatic Generation of Multilingual Summaries of Telephone Conversations
© Wolfgang Wahlster, DFKI GmbH
Dialog Summary
Participants: Mr. Jones, Mr. Mueller
Date: 22.3.2001
Time: 8:57 AM to 10:03 AM
Theme: Appointment schedule with trip and accommodation
Dialog Summary:
Scheduling: Mr. Jones and Mr. Mueller will meet at the train station on the 1st of March 2001 at 10:00 am.
Travelling: The trip from Hamburg to Hanover by train will start on the 1st of March at 10:15 am.
Summary automatically generated at 22.3.2001 12:31:24 h
© Wolfgang Wahlster, DFKI GmbH
Microplanning: Create Syntactic Building Blocks
Method: Mapping of dependency structures
Example: Time Expressions
Semantic dependency (VIT): DEF (L,I,G,H), DOWF (L1,I,mo), ORD (L2,I,11), MOFY (L3,I,may)
Syntactic dependency (TAG): [Figure: dependency tree with the nodes MONDAY1, ELEVENTH_DAY, THE, MAY and OF_P, linked by ARG and SPEC edges]
© Wolfgang Wahlster, DFKI GmbH
Speeding Up the Language Generation Process by the Compilation of the HPSG Grammar to an LTAG Generation Grammar
HPSG Analysis Grammar
Compilation:
- extended domain of locality
- no recursive feature structures
- fast generation (0.5 secs average runtime)
Lexicalized Tree Adjoining Grammar: 2,350 Trees
© Wolfgang Wahlster, DFKI GmbH
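Sentence to synthesize: I have time on Monday.
[Figure: graph from start node S to end node E over the tokens "i have time on monday"; the edges are candidate corpus units of different lengths, from whole phrases down to single tokens; edge direction runs from S to E]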
Corpus-based Speech Synthesis
© Wolfgang Wahlster, DFKI GmbH
Funding by the German Ministry for Education and Research BMBF (Dr. Reuse)
Phase I (1993-1996) $ 33 M
Phase II (1997-2000) $ 28 M
60% Industrial funding according to a shared cost model $ 17 M
Additional R&D investments of industrial partners $ 11 M
Total $ 89 M
Verbmobil: Long-Term, Large-Scale Funding and Its Impact
© Wolfgang Wahlster, DFKI GmbH
> 800 Publications (>600 refereed)
> Many Patents
> 20 Commercial Spin-off Products
> 8 Spin-off Companies
> 900 trained Researchers for German Language Industry
Philips, DaimlerChrysler and Siemens are leaders in Spoken Dialog Applications
Verbmobil: Long-Term, Large-Scale Funding and Its Impact
© Wolfgang Wahlster, DFKI GmbH
[Figure: histogram; horizontal axis ticks from 1 to 60, vertical axis ticks from 0 to 350]
Distribution of Sentence Length in Large-Scale Evaluation
Web-based Evaluation of 25,345 Translations by 65 Evaluators
© Wolfgang Wahlster, DFKI GmbH
Evaluation Results
The translation of a turn is approximately correct if it preserves the intention of the speaker and the main propositional content of her utterance.
Translation Thread | Word Accuracy 75% (3267 Turns) | Word Accuracy 80% (2723 Turns)
Case-based Translation | 44% | 46%
Statistical Translation | 79% | 81%
Dialog-Act based Translation | 45% | 46%
Semantic Transfer | 47% | 49%
Substring-based Translation | 75% | 79%
Automatic Selection | 66% / 83% * | 68% / 85% *
Manual Selection | 95% | 97%
* After Training with Instance-based Learning Algorithm
© Wolfgang Wahlster, DFKI GmbH
Topic | Successful Completions/Attempts | Successful Tasks (%) | Frequency-Based Weighting Factor
Meeting time | 25/28 | 89.3 | 0.90
Meeting place | 21/27 | 77.8 | 0.87
Means of transportation | 30/30 | 100 | 0.97
Departure place | 22/25 | 88 | 0.81
Arrival time | 22/26 | 84.6 | 0.84
Who reserves the hotel | 28/31 | 90.3 | 1
How to get to departure place | 7/9 | 77.8 | 0.29
.. ..
Total Number of Tasks | 227/255 | |
Average Percentage of Successful Task Completions | | 86.8 | 89.6
Results of End-to-End Evaluation Based on Dialog Task Completion for 31 Trials
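The per-topic figures can be recomputed from the listed completion counts. The sketch below is a hedged illustration: it assumes that the frequency-based weighting factor is the number of attempts divided by the 31 trials, which is inferred from the numbers and not stated on the slide. The function names are hypothetical. The slide's totals (227/255) cover more tasks than the seven topics listed, so the overall averages are not reproduced here.

```python
TRIALS = 31  # number of trials given in the slide title

# (topic, successful completions, attempts) as listed on the slide
TASKS = [
    ("Meeting time", 25, 28),
    ("Meeting place", 21, 27),
    ("Means of transportation", 30, 30),
    ("Departure place", 22, 25),
    ("Arrival time", 22, 26),
    ("Who reserves the hotel", 28, 31),
    ("How to get to departure place", 7, 9),
]


def success_rate(successes, attempts):
    """Percentage of successful completions, rounded to one decimal."""
    return round(100.0 * successes / attempts, 1)


def weighting_factor(attempts, trials=TRIALS):
    """Assumed definition: share of trials in which the task was attempted."""
    return round(attempts / trials, 2)


if __name__ == "__main__":
    for topic, ok, tried in TASKS:
        print(f"{topic}: {success_rate(ok, tried)}% "
              f"(weight {weighting_factor(tried)})")
```

Under this assumption the computed rates and weights agree with the seven per-topic values shown on the slide.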
© Wolfgang Wahlster, DFKI GmbH
Vocabulary Size: 10,000 for German, Equivalent English Lexicon, 2,500 for Japanese
Operational Success Criteria:
Word recognition rate (16 kHz):
German: spontaneous: 75% (cooperative: 85%)
English: spontaneous: 72% (cooperative: 82%)
Japanese: spontaneous: 75% (cooperative: 85%)
(8kHz) spontaneous: 70% (cooperative: 80%)
80% of the translations are approximately correct and the dialog task success rate should be around 90%.
The average end-to-end processing time should be four times real time (length of the input signal)
Checklist for Final Verbmobil System
© Wolfgang Wahlster, DFKI GmbH
Results of the Verbmobil Project have been used in 20 Spin-Off Products by the Industrial Partners DaimlerChrysler, Philips and Siemens
Verbmobil
Dictation Systems: 3
Spoken Dialog Systems: 4
Dialog Engines: 2
Command & Control Systems: 5
Text Classification Systems: 3
Translation Systems: 3
© Wolfgang Wahlster, DFKI GmbH
Speech control of: cell phone, radio, windows / AC, navigation system
Option for S-, C-, and E-Class of Mercedes and BMW
Speaker-independent, Garbage models for non-speech (blinker, AC, wheels)
Linguatronic : Spoken Dialogs with a Mercedes-Benz
Mike
Please call Doris Wahlster.
Open the left window in the back.
I want to hear the weather channel.
When will I reach the next gas station?
Where is the next parking lot?
© Wolfgang Wahlster, DFKI GmbH
Fielded applications
Train schedules (German Railway System, DB)
TABA (Philips): +49 241 60 40 20
OSCAR (DaimlerChrysler): +49 1805 99 66 22
Flight Schedules (Lufthansa)
ALF (Philips): +49 1803 00 00 74
Technical Challenges: phone-based dialogs, many proper names, clarification subdialogs
Spoken Dialogs about Schedules
© Wolfgang Wahlster, DFKI GmbH
Verbmobil
XtraMind Technologies: Language Technology for Customer Interaction Services, www.xtramind.com, Saarbrücken
GSDC GmbH: Multilingual Documentation, www.ic-portal.gsdc.de, Nürnberg
SCHEMA GmbH: Document Engineering, www.schema.de, Nürnberg
SYMPALOG GmbH: Spoken Dialog Systems, www.sympalog.de, Nürnberg
RETIVOX GbR: Speech Synthesis Systems, www.retivox.de, Bonn
CLT Sprachtechnologie GmbH: LT for Text Processing, www.clt-st.de, Saarbrücken
AIXPLAIN AG: Human Language Technology, www.aixplain.de, Aachen
SONICSON GmbH: Natural Language Access to Online Music, www.sonicson.com, Kaiserslautern
Successful Technology Transfer: 8 High-Tech Spin-Off Companies in the Area of Language Technology have been founded by Verbmobil Researchers
© Wolfgang Wahlster, DFKI GmbH
Verbmobil
Internships: 18
Master Students: 238
PhD Students: 164
Student Research Assistants: 483
Habilitations: 16
Total: 919
Verbmobil was the Key Resource for the Education and Training of Researchers and Engineers Needed to Build Up Language Industry in Germany
© Wolfgang Wahlster, DFKI GmbH
Verbmobil: Today's Cell Phone; Speech only
SmartKom: Third Generation UMTS Phone; Speech, Graphics and Gesture
From Spoken Dialog to Multimodal Dialog
© Wolfgang Wahlster, DFKI GmbH
Natural Language Dialog
Graphical User Interfaces
Gestural Interaction
Multimodal Interaction
Merging Various User Interface Paradigms
see Phil Cohen's invited talk on Friday
© Wolfgang Wahlster, DFKI GmbH
Main Contractor, Project Management, Testbed, Software Integration: DFKI Saarbrücken
The SmartKom Consortium:
Project Budget: $ 34 M
Project Duration: 4 years
SmartKom: Intuitive Multimodal Interaction
MediaInterface, European Media Lab
IMS Institut für Maschinelle Sprachverarbeitung, Universität Stuttgart
Ludwig-Maximilians-Universität München
© Wolfgang Wahlster, DFKI GmbH
Camera
GPS
Microphone
Loudspeaker
Stylus-Activated Sketch Pad
Wearable Compute Server
Docking Station for Car PC
Biosensor for Authentication & Emotional Feedback
GSM for Telephone, Fax, Internet Connectivity
SmartKom-Mobile: A Handheld Communication Assistant
© Wolfgang Wahlster, DFKI GmbH
SmartKom: Multimodal Dialogs with a Life-like Character
© Wolfgang Wahlster, DFKI GmbH
Verbmobil is a Very Large Dialog System
69 modules communicate via 224 blackboards
HPSG for German uses a hierarchy of 2,400 types
15,385 entries in the semantic database
22,783 transfer rules and 13,640 microplanning rules
30,000 templates for case-based translation
691,583 alignment templates
334 finite-state transducers
© Wolfgang Wahlster, DFKI GmbH
Deep processing can be used for merging, completing and repairing the results of shallow processing strategies.
Shallow methods can be used to guide the search in deep processing.
Statistical methods must be augmented by symbolic models to achieve higher accuracy and broader coverage.
Statistical methods can be used to learn operators or selection strategies for symbolic processes.
Lessons Learned from Verbmobil
© Wolfgang Wahlster, DFKI GmbH
Real-world problems in language technology like the
understanding of spoken dialogs,
speech-to-speech translation
and multimodal dialog systems
can only be cracked by the combined muscle of deep and shallow processing approaches.
Conclusions and Take-Home Messages
© Wolfgang Wahlster, DFKI GmbH
In a multi-blackboard and multi-engine architecture based on packed representations on all processing levels
- speech recognition
- parsing
- semantic processing
- translation
- generation
using charts with underspecified representations, the results of concurrent processing threads can be combined in an incremental fashion.
Conclusions and Take-Home Messages
© Wolfgang Wahlster, DFKI GmbH
All results of concurrent and competing processing modules
should come with a confidence value,
so that statistically trained selection modules can choose the most promising result at each stage,
if demanded by a following processing step.
Conclusions and Take-Home Messages
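A selection step of this kind can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the engine names, the per-engine linear rescaling values in SCALING, and the names Candidate and select are hypothetical, and the trained selection used in Verbmobil is not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    engine: str
    translation: str
    confidence: float  # engine-internal score, assumed to lie in [0, 1]


# Hypothetical per-engine rescaling (a, b): normalized = a * confidence + b.
# In a real system such values would be trained; these are made up.
SCALING = {
    "statistical": (1.0, 0.0),
    "case_based": (0.8, 0.1),
    "semantic_transfer": (0.9, 0.05),
}


def normalized(candidate):
    """Map an engine-specific confidence onto a common scale."""
    a, b = SCALING.get(candidate.engine, (1.0, 0.0))
    return a * candidate.confidence + b


def select(candidates):
    """Pick the candidate with the highest normalized confidence."""
    return max(candidates, key=normalized)


if __name__ == "__main__":
    results = [
        Candidate("case_based", "translation A", 0.85),
        Candidate("statistical", "translation B", 0.80),
        Candidate("semantic_transfer", "translation C", 0.60),
    ]
    print(select(results).engine)
```

The rescaling step matters because raw confidence values from different engines need not be comparable: in the example the case-based candidate has the highest raw score, but the statistical candidate wins after rescaling.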
© Wolfgang Wahlster, DFKI GmbH
Packed representations together with formalisms for underspecification
capture the uncertainties in each processing phase,
so that the uncertainties can be reduced by linguistic, discourse and domain constraints
as soon as they become applicable.
Conclusions and Take-Home Messages
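The idea of keeping alternatives packed and narrowing them only when a constraint becomes applicable can be illustrated with a toy Python sketch. It is not the VIT formalism: the slot names, the example values, and the functions packed_readings and apply_constraint are hypothetical.

```python
from itertools import product


def packed_readings(slots):
    """Enumerate the full readings of a packed representation (only when needed)."""
    keys = list(slots)
    return [dict(zip(keys, combo))
            for combo in product(*(slots[k] for k in keys))]


def apply_constraint(slots, slot, allowed):
    """Return a new packed representation with one slot narrowed by a constraint."""
    narrowed = dict(slots)
    narrowed[slot] = [value for value in slots[slot] if value in allowed]
    return narrowed


if __name__ == "__main__":
    # Two independent ambiguities stored as 2 + 2 alternatives, not 4 readings.
    packed = {
        "time": ["11 o'clock", "the 11th"],
        "attachment": ["verb", "noun"],
    }
    print(len(packed_readings(packed)))
    # A later domain constraint resolves only the time ambiguity.
    resolved = apply_constraint(packed, "time", {"the 11th"})
    print(len(packed_readings(resolved)))
```

The packed form stores each ambiguity once, so a constraint that arrives late removes alternatives in one slot without any earlier stage having had to commit to a single reading.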
© Wolfgang Wahlster, DFKI GmbH
Conclusions and Take-Home Messages
Underspecification allows disambiguation requirements to be delayed until later processing stages
where better-informed decisions can be made.
The massive use of underspecification makes the syntax-semantic interface and transfer rules almost deterministic,
thereby boosting processing speed.
© Wolfgang Wahlster, DFKI GmbH
Integrating top-down knowledge into low-level speech recognition processes
Exploiting more knowledge about human interpretation strategies
More robust translation of turns with very low word accuracy rates
Expensive data collection and cognitively unrealistic training data
Open Problems:
© Wolfgang Wahlster, DFKI GmbH
You can find a 10-page paper in the IJCAI-01 Proceedings, Vol. 2, see pages 1484-1493
An extended version will appear in the Winter issue of the AI Magazine
or check the URL: verbmobil.dfki.de
Further Reading
© Wolfgang Wahlster, DFKI GmbH
Wahlster, W. (2000) (ed.):
Verbmobil: Foundations of Speech-to-Speech Translation.
Berlin, New York, Tokyo: Springer.
679 pp. 224 figs., 88 tabs.
Hardcover ISBN 3-540-67783-6
The Verbmobil Book