Speech-to-Speech MT Design and Engineering
Alon Lavie and Lori Levin
MT Class
April 16 2001
Outline
• Design and Engineering of the JANUS speech-to-speech MT system
• The Travel & Medical Domain Interlingua (IF)
• Portability to new domains: ML approaches
• Evaluation and User Studies
• Open Problems, Current and Future Research
Overview
• Fundamentals of our approach
• System overview
• Engineering a multi-domain system
• Evaluations and user studies
• Alternative translation approaches
• Current and future research
JANUS Speech Translation
• Translation via an interlingua representation
• Main translation engine is rule-based
• Semantic grammars
• Modular grammar design
• System engineered for multiple domains
• Recent focus on domain portability
– using machine learning for rapid extension to a new domain
The C-STAR Travel Planning Domain
General Scenario:
• Dialogue between one traveler and one or more travel agents
• Focus on making travel arrangements for a personal leisure trip (not business)
• Free spontaneous speech
The C-STAR Travel Planning Domain
Natural breakdown into several sub-domains:
• Hotel Information and Reservation
• Transportation Information and Reservation
• Information about Sights and Events
• General Travel Information
• Cross Domain
Semantic Grammars
• Describe structure of semantic concepts instead of syntactic constituency of phrases
• Well suited for task-oriented dialogue containing many fixed expressions
• Appropriate for spoken language - often disfluent and syntactically ill-formed
• Faster to develop reasonable coverage for limited domains
Semantic Grammars
Hotel Reservation Example:
Input: we have two hotels available
Parse Tree:
[give-information+availability+hotel]
( we have [hotel-type]
  ( [quantity=] ( two )
    [hotel] ( hotels ) )
  available )
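The idea can be sketched in Python with a hypothetical toy grammar that covers only this example. The concept names come from the parse tree above; the dictionary grammar format and the `parse` and `bracketed` helpers are illustrative assumptions, not the SOUP grammar formalism. The parser is a simple first-match top-down matcher without backtracking into sub-parses.

```python
from typing import Dict, List, Optional, Tuple, Union

# Hypothetical toy semantic grammar: each concept maps to alternative
# right-hand sides; symbols that are keys of GRAMMAR are concepts, the rest are words.
GRAMMAR: Dict[str, List[List[str]]] = {
    "[give-information+availability+hotel]": [["we", "have", "[hotel-type]", "available"]],
    "[hotel-type]": [["[quantity=]", "[hotel]"]],
    "[quantity=]": [["one"], ["two"], ["three"]],
    "[hotel]": [["hotel"], ["hotels"]],
}

Tree = Union[str, Tuple[str, list]]


def parse(symbol: str, words: List[str], pos: int) -> Optional[Tuple[Tree, int]]:
    """Top-down: expand `symbol` at words[pos]; return (tree, next position) for the
    first alternative that matches, or None. No backtracking into sub-parses."""
    for rhs in GRAMMAR[symbol]:
        children: list = []
        i = pos
        for sym in rhs:
            if sym in GRAMMAR:  # concept: recurse
                result = parse(sym, words, i)
                if result is None:
                    break
                subtree, i = result
                children.append(subtree)
            elif i < len(words) and words[i] == sym:  # word: exact match
                children.append(sym)
                i += 1
            else:
                break
        else:  # every symbol of this alternative matched
            return (symbol, children), i
    return None


def bracketed(tree: Tree) -> str:
    if isinstance(tree, str):
        return tree
    label, children = tree
    return f"{label} ( {' '.join(bracketed(c) for c in children)} )"


words = "we have two hotels available".split()
result = parse("[give-information+availability+hotel]", words, 0)
if result is not None and result[1] == len(words):  # require full coverage
    print(bracketed(result[0]))
# [give-information+availability+hotel] ( we have [hotel-type] ( [quantity=] ( two ) [hotel] ( hotels ) ) available )
```

Note that the top-level rule names a domain concept rather than a syntactic category; "two hotels" is grouped as [hotel-type] because of what it means in the task, not because it is a noun phrase.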
The JANUS-III Translation System
The SOUP Parser
• Specifically designed to parse spoken language using domain-specific semantic grammars
• Robust - can skip over disfluencies in input
• Stochastic - probabilistic CFG encoded as a collection of RTNs (recursive transition networks) with arc probabilities
• Top-Down - parses from top-level concepts of the grammar down to matching of terminals
• Chart-based - dynamic matrix of parse DAGs indexed by start and end positions and head category
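Here is a rough sketch of what "a collection of RTNs with arc probabilities" can mean. Each concept is a small network whose arcs carry a label and a probability, and a path's probability is the product of its arc probabilities. The `HOTEL_TYPE` network, its probabilities and `path_probability` are hypothetical; SOUP's internal data structures are not shown in these slides.

```python
from typing import Dict, List, Tuple

# Hypothetical RTN for the [hotel-type] concept:
# state -> list of (arc label, target state, arc probability).
# The outgoing probabilities of each state sum to 1; "<end>" is the exit arc.
Network = Dict[int, List[Tuple[str, int, float]]]

HOTEL_TYPE: Network = {
    0: [("[quantity=]", 1, 0.7), ("[hotel]", 2, 0.3)],
    1: [("[hotel]", 2, 1.0)],
    2: [("<end>", -1, 1.0)],
}


def path_probability(net: Network, labels: List[str]) -> float:
    """Probability of the most probable path from state 0 that consumes
    `labels` in order and then exits via "<end>"; 0.0 if there is none."""
    best: Dict[int, float] = {0: 1.0}  # state -> best probability so far
    for label in labels:
        step: Dict[int, float] = {}
        for state, p in best.items():
            for arc_label, target, arc_p in net.get(state, []):
                if arc_label == label:
                    step[target] = max(step.get(target, 0.0), p * arc_p)
        best = step
    exits = [p * arc_p
             for state, p in best.items()
             for arc_label, _, arc_p in net.get(state, [])
             if arc_label == "<end>"]
    return max(exits, default=0.0)


print(path_probability(HOTEL_TYPE, ["[quantity=]", "[hotel]"]))  # 0.7
print(path_probability(HOTEL_TYPE, ["[hotel]"]))                 # 0.3
```

Arcs labeled with a concept name stand for recursive calls into that concept's own network. That recursion is what makes the transition networks "recursive" and lets a set of them encode a context-free grammar.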
The SOUP Parser
• Supports parsing with large multiple domain grammars
• Produces a lattice of parse analyses headed by top-level concepts
• Disambiguation heuristics rank the analyses in the parse lattice and select a single best path through the lattice
• Graphical grammar editor
SOUP Disambiguation Heuristics
• Maximize coverage (of input)
• Minimize number of parse trees (fragmentation)
• Minimize number of parse tree nodes
• Minimize the number of wild-card matches
• Maximize the probability of parse trees
• Find sequence of domain tags with maximal probability given the input words: P(T|W), where T = t1, t2, …, tn is a sequence of domain tags
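The slide lists the heuristics but not how they are combined. A minimal sketch follows, assuming they are applied in the order listed and that each criterion only breaks ties left by the previous ones. Under that assumption, ranking reduces to a lexicographic sort key. The `Analysis` fields and the example scores are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Analysis:
    """Hypothetical summary of one path through the parse lattice."""
    name: str
    words_covered: int     # input words covered by parse trees
    num_trees: int         # fragmentation
    num_nodes: int         # total parse tree nodes
    num_wildcards: int     # wild-card matches used
    tree_log_prob: float   # log probability of the parse trees
    tag_log_prob: float    # log P(T|W) of the domain-tag sequence


def ranking_key(a: Analysis):
    # Tuples compare left to right, so each criterion only breaks ties left
    # by the earlier ones. Quantities to maximize are negated.
    return (-a.words_covered, a.num_trees, a.num_nodes, a.num_wildcards,
            -a.tree_log_prob, -a.tag_log_prob)


def best_analysis(candidates: List[Analysis]) -> Analysis:
    return min(candidates, key=ranking_key)


candidates = [
    Analysis("two fragments", 6, 2, 9, 0, -4.1, -1.2),
    Analysis("one tree", 6, 1, 11, 0, -5.0, -0.9),
    Analysis("partial", 5, 1, 7, 0, -2.0, -0.5),
]
print(best_analysis(candidates).name)  # one tree
```

Under this ordering, "partial" loses despite its better probabilities because coverage is checked first. A weighted score would be an equally plausible reading of the slide.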
JANUS Generation Modules
Two alternative generation modules:
• Top-Down context-free based generator - fast, used for English and Japanese
• GenKit - unification-based generator augmented with Morphe morphology module - used for German
Modular Grammar Design
• Grammar development separated into modules corresponding to sub-domains (Hotel, Transportation, Sights, General Travel, Cross Domain)
• Shared core grammar for lower-level concepts that are common to the various sub-domains (e.g. times, prices)
• Grammars can be developed independently (using shared core grammar)
• Shared and Cross-Domain grammars significantly reduce effort in expanding to new domains
• Separate grammar modules facilitate associating parses with domain tags - useful for multi-domain integration within the parser
Translation with Multiple Domain Grammars
• Parser is loaded with all domain grammars
• Domain tag attached to grammar rules of each domain
• Previously developed grammars for other domains can also be incorporated
• Parser creates a parse lattice consisting of multiple analyses of the input into sequences of top-level domain concepts
• Parser disambiguation heuristics rank the analyses in the parse lattice and select a single best sequence of concepts
A SOUP Parse Lattice
Domain Portability: Travel to Medical
Knowledge-Based Methods
Re-usability of knowledge sources for translation and speech recognition
Corpus-Based Methods
Reduce the amount of new training data for translation and speech recognition
Background
• New domain: Medical
– Doctor-patient diagnostic conversations
– Global importance in emergencies and in machine translation for remote health care
– Synergy with Lincoln Lab
• Joint evaluation
• Joint interlingua
– Test case for portability
Portability
• Advantage: Interlingua
• Problem: Writing semantic grammars
– Domain dependent
– Requires time, effort, and expertise
• Approach:
– Grammar modularity
– Domain action learning
– Automatic/Interactive semantic grammar induction
Hybrid Stat/Rule-based Analysis
• Developing large-coverage semantic analysis grammars is time-consuming, which makes it difficult to port the analysis system to new domains
• “low-level” argument grammars are more domain-independent: contain many concepts that are used across domains: time, location, prices, etc.
• “high-level” domain-actions are domain-specific, must be redeveloped for each new domain: give-info+onset+symptom
• Tagging data sets with interlingua representations is less time-consuming, and is needed anyway for system development
Hybrid Rule/Stat Approach
• Combines grammar-based and statistical approaches to analysis:
– Develop semantic grammars for phrase-level arguments that are more portable to new domains
– Use statistical machine learning techniques for classifying into domain-actions
• Porting to a new domain requires:
– developing argument parse rules for new domain
– tagging training set with domain-actions for new domain
– training the classifiers for domain-actions on the tagged data
The Hybrid Analysis Process
• Parse an utterance for arguments
• Segment the utterance into sentences
• Extract features from the utterance and the single best parse output
• Use a learned classifier to identify the speech act
• Use a learned classifier to identify the concept sequence
• Combine into a full parse
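The steps can be sketched as a pipeline over one already-segmented sentence. The four callables are hypothetical placeholders for the argument parser and the two learned classifiers. Only the final step mirrors the full parse example: it joins the predicted speech act and concept sequence with "+" into a domain action that heads the argument parses.

```python
from typing import Callable, Dict, List

Features = Dict[str, object]


def analyze(sentence: str,
            parse_arguments: Callable[[str], List[str]],
            extract_features: Callable[[str, List[str]], Features],
            classify_speech_act: Callable[[Features], str],
            classify_concepts: Callable[[Features, str], List[str]]) -> str:
    """Run one already-segmented sentence through the hybrid analysis steps.
    All four callables are hypothetical stand-ins for real components."""
    arguments = parse_arguments(sentence)                # best argument parse
    features = extract_features(sentence, arguments)     # words + parse features
    speech_act = classify_speech_act(features)           # first learned classifier
    concepts = classify_concepts(features, speech_act)   # second classifier sees the predicted speech act
    domain_action = "+".join([speech_act] + concepts)
    return f"{domain_action} ( {' '.join(arguments)} )"  # combine into a full parse


full_parse = analyze(
    "we have a double room available for you",
    parse_arguments=lambda s: ["[arg-party:for-whom=]::ARG ( for [you] ( you ) )"],
    extract_features=lambda s, args: {"words": s.split(), "args": args},
    classify_speech_act=lambda f: "give-information",
    classify_concepts=lambda f, sa: ["availability", "room"],
)
print(full_parse)
# give-information+availability+room ( [arg-party:for-whom=]::ARG ( for [you] ( you ) ) )
```

Passing the predicted speech act into the concept classifier reflects the feature list given for the concept classifier. Segmentation is left outside this sketch.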
Argument Parsing
The SOUP parser produces a forest of parse trees that cover as much of the input as possible
The parse forest can be a mixture of trees allowed by any of the grammars
Only the best parse is used for further processing
Argument Parse Example
Input: We have a double room available for you at twenty-three thousand five hundred yen
[=availability=]::PSD ( we have [super_room-type=] ( [room-type=] ( a [room:double] ( double room ) ) ) available )
[arg-party:for-whom=]::ARG ( for [you] ( you ) )
[arg:time=]::ARG ( [point=] ( at [hour-minute=] ( [big:hour=] ( [big:23] ( twenty-three ) ) ) ) )
[arg:super_price=]::ARG ( [price=] ( [one-price:main-quantity=] ( [n-1000=] ( thousand ) [price:n-100=] ( five hundred ) ) [currency=] ( [yen] ( yen ) ) ) )
Automatic Classification of Domain Actions
• Train classifiers for speech acts and concepts
• Training data: utterances labeled with speech act, concepts, and best argument parse
• Input features:
– n most common words
– Arguments and pseudo-arguments in best parse
– Speaker
– Predicted speech act (for concept classifier)
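Below is a simplified sketch of feature extraction combined with a memory-based classifier. Training examples are stored as-is, and a new sentence takes the label of its nearest stored neighbour under an unweighted overlap distance. This is not TiMBL itself, which offers feature weighting and other distance metrics. The function names, feature encoding and toy training data are assumptions.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

Example = Tuple[Dict[str, object], str]  # (feature dict, label)


def top_words(corpus: List[str], n: int) -> List[str]:
    """The n most common words in the training utterances."""
    counts = Counter(w for utt in corpus for w in utt.lower().split())
    return [w for w, _ in counts.most_common(n)]


def make_features(utterance: str, arguments: List[str], speaker: str,
                  vocab: List[str], speech_act: Optional[str] = None) -> Dict[str, object]:
    tokens = set(utterance.lower().split())
    feats: Dict[str, object] = {f"w={w}": w in tokens for w in vocab}  # common words
    feats.update({f"arg={a}": True for a in arguments})              # best-parse arguments
    feats["speaker"] = speaker
    if speech_act is not None:  # only used by the concept classifier
        feats["speech_act"] = speech_act
    return feats


def overlap_distance(a: Dict[str, object], b: Dict[str, object]) -> int:
    """Number of features whose values differ (absent features count as False)."""
    return sum(a.get(k, False) != b.get(k, False) for k in a.keys() | b.keys())


def classify(memory: List[Example], query: Dict[str, object]) -> str:
    """1-nearest neighbour: return the label of the closest stored example."""
    return min(memory, key=lambda ex: overlap_distance(ex[0], query))[1]


train = [
    ("we have a double room available", ["[=availability=]", "[super_room-type=]"], "agent", "give-information"),
    ("do you have a double room", ["[super_room-type=]"], "client", "request-information"),
    ("hello", [], "agent", "greeting"),
]
vocab = top_words([utt for utt, *_ in train], n=5)
memory = [(make_features(u, a, s, vocab), label) for u, a, s, label in train]
query = make_features("we have a room available", ["[=availability=]"], "agent", vocab)
print(classify(memory, query))  # give-information
```

A memory-based learner does no abstraction at training time. Adding newly tagged data for a domain only means appending examples to `memory`.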
Full Parse Example
Input: We have a double room available for you at twenty-three thousand five hundred yen
give-information+availability+room (
[=availability=]::PSD ( we have [super_room-type=] ( [room-type=] ( a [room:double] ( double room ) ) ) available )
[arg-party:for-whom=]::ARG ( for [you] ( you ) )
[arg:time=]::ARG ( [point=] ( at [hour-minute=] ( [big:hour=] ( [big:23] ( twenty-three ) ) ) ) )
[arg:super_price=]::ARG ( [price=] ( [one-price:main-quantity=] ( [n-1000=] ( thousand ) [price:n-100=] ( five hundred ) ) [currency=] ( [yen] ( yen ) ) ) )
)
Classification Results Using Memory-based (TiMBL) Classifiers
[Figure: Classification accuracy (16-fold cross validation). Mean accuracy (axis 0 to 0.8) plotted against training set size (500, 1000, 2000, 3000, 4000, 5000, 6009) for Speech Act, Concept Sequence and Domain Action.]
Status and Open Research
• Preliminary analysis engine implemented, currently used for travel domain in NESPOLE!
• Areas for further research and development:
– Explore a variety of classifiers
– Explore features for domain-action classification
– Classification compositionality: how to classify the components of the domain-action separately and combine them?
– Taking advantage of additional knowledge sources: the interlingua specification, dialogue context
– Better address segmentation of utterances into domain actions (DAs)
Automatic Induction of Semantic Grammars
• Seed grammar for a new domain has very limited coverage
• Corpus of development data tagged with interlingua representations available
• Expand the seed grammar by learning new rules for covering the same domain-actions
• First step: how well can we do with no human intervention?
Outline of Semantic Grammar Induction
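[Figure: semantic grammar induction pipeline with components Seed Grammar, Parser, IF, Tree Matching, Linearization, Hypotheses Generation, Rules Induction, Rules Management, Knowledge and Learned Grammar.]
Example learned rule:
s[gi+onset+sym]
( [manner=] [sym-loc=] *+became [adj:sym-name=] )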
Human vs Machine Experiment
• Seed grammar
• Extended by a human
• Extended by automatic semantic grammar induction
Seed Grammar
[Figure: seed grammar made up of Cross Domain, Medical and Shared modules. Module sizes shown: around 100 rules and 6000 lexical items; around 200 rules; around 600 rules and growing. Example inputs: "Hello. My name is Sam." and "I have a burning sensation in my foot."]
A Parse Tree
[request-information+existence+body-state]::MED (
  WH-PHRASES::XDM ( [q:duration=]::XDM ( [dur:question]::XDM ( how long ) ) )
  HAVE-GET-FEEL::MED ( GET ( have ) )
  you
  HAVE-GET-FEEL::MED ( HAS ( had ) )
  [super_body-state-spec=]::MED (
    [body-state-spec=]::MED (
      ID-WHOSE::MED ( [identifiability=] ( [id:non-distant] ( this ) ) )
      BODY-STATE::MED ( [pain]::MED ( pain ) ) ) ) )
Manual Grammar Development
• About five additional days of development after the seed grammar was finalized
• Focusing on medical rules only
• Domain-independent rules remain untouched
Development and evaluation sets
• Development set: 133 sentences
– from one dialog
• Evaluation set: 83 sentences
– from two dialogs
– unseen speakers
– Only SDUs that could be manually tagged with a full IF according to the current specification were included.
Grading Procedure: Recall and Precision of IF Components
c:give-information+                  speech act
  existence+body-state               concepts
  (body-state-spec=(pain,            top-level argument
     identifiability=no),            sub-argument
   body-location=                    top-level argument
     (inside=head))                  sub-argument
• Recall – ignored if number of items is 0
• Precision – ignored if 0 out of 0
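Here is a sketch of the scoring rule. Each IF component of a sentence (speech act, concepts, top-level arguments, and so on) is treated as a set of items and compared against the hand-tagged reference. Averaging per-sentence scores is an assumption; the slide only states when recall and precision are ignored.

```python
from typing import List, Optional, Set, Tuple


def component_scores(reference: Set[str], hypothesis: Set[str]) -> Tuple[Optional[float], Optional[float]]:
    """(recall, precision) for one IF component of one sentence; None = ignored."""
    correct = len(reference & hypothesis)
    recall = correct / len(reference) if reference else None       # ignored if reference has 0 items
    precision = correct / len(hypothesis) if hypothesis else None  # ignored if 0 out of 0
    return recall, precision


def average(scores: List[Optional[float]]) -> Optional[float]:
    """Mean over the sentences where the score was not ignored."""
    kept = [s for s in scores if s is not None]
    return sum(kept) / len(kept) if kept else None


# Concepts of "give-information+existence+body-state" vs. a hypothesis that found only one
print(component_scores({"existence", "body-state"}, {"body-state"}))  # (0.5, 1.0)
# Top-level arguments when the analyzer produced none: precision is ignored
print(component_scores({"body-state-spec", "body-location"}, set()))  # (0.0, None)
print(average([0.5, None, 1.0]))  # 0.75
```

Ignoring undefined scores, instead of counting them as 0 or 1, keeps sentences with an empty component from distorting the averages.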
Human vs. Machine: Evaluation Results (precision and recall, %)
Component          Metric      Seed   Extended   Learned
Speech Act         Recall      43.3     48.2       49.3
                   Precision   71.0     75.0       45.8
Concept List       Recall       2.2     10.1       32.5
                   Precision   12.5     42.2       25.1
Top-Level Args     Recall       0.0      7.2       29.6
                   Precision    0.0     42.2       34.4
Top-Level Values   Recall       0.0      8.3       29.8
                   Precision    0.0     50.0       39.2
Sub-Level Args     Recall       0.0     28.3       14.1
                   Precision    0.0     48.2       12.6
Sub-Level Values   Recall       1.2     28.3       14.1
                   Precision    6.2     48.2       12.9
User Studies
• We conducted three sets of user tests
• The travel agent is played by an experienced system user
• The traveler is played by a novice who is given five minutes of instruction
• The traveler is given a general scenario, e.g., plan a trip to Heidelberg
• Communication takes place only via the speech translation (ST) system, a multi-modal interface and a muted video connection
• Collected data are used for system evaluation, error analysis and then grammar development
System Evaluation Methodology
• End-to-end evaluations conducted at the SDU (sentence) level
• Multiple bilingual graders compare the input with translated output and assign a grade of: Perfect, OK or Bad
• OK = meaning of SDU comes across
• Perfect = OK + fluent output
• Bad = translation incomplete or incorrect
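The reported OK+Perfect and Perfect figures are percentages of graded SDUs. Below is a minimal sketch, assuming all graders' judgments are simply pooled; the slides do not say how multiple graders are combined. The function and example grades are hypothetical.

```python
from collections import Counter
from typing import Dict, List

GRADES = {"Perfect", "OK", "Bad"}


def summarize(grades: List[List[str]]) -> Dict[str, float]:
    """grades[i] holds every grader's grade for SDU i.
    Assumption: all judgments are pooled before computing percentages."""
    pooled = Counter(g for sdu in grades for g in sdu)
    unknown = set(pooled) - GRADES
    if unknown:
        raise ValueError(f"unexpected grades: {sorted(unknown)}")
    total = sum(pooled.values())
    return {
        "OK+Perfect": 100.0 * (pooled["OK"] + pooled["Perfect"]) / total,
        "Perfect": 100.0 * pooled["Perfect"] / total,
    }


print(summarize([["Perfect", "OK"], ["Bad", "OK"], ["Perfect", "Perfect"], ["Bad", "Bad"]]))
# {'OK+Perfect': 62.5, 'Perfect': 37.5}
```

Because Perfect implies OK, the OK+Perfect figure is always at least as high as the Perfect figure, as in the results tables.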
August-99 Evaluation
• Data from latest user study - traveler planning a trip to Japan
• 132 utterances containing one or more SDUs, from six different users
• SR word error rate 14.7%
• 40.2% of utterances contain recognition error(s)
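Word error rate is conventionally computed as the word-level edit distance (substitutions, deletions and insertions) between recognizer output and reference transcript, divided by the number of reference words. Below is a minimal sketch that also computes the share of utterances with at least one recognition error. The function names and example pairs are illustrative.

```python
from typing import List, Tuple


def edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Minimum number of word substitutions, deletions and insertions."""
    prev = list(range(len(hyp) + 1))  # distances for the empty reference prefix
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j - 1] + (r != h),  # substitution (or match)
                         prev[j] + 1,             # deletion of r
                         cur[j - 1] + 1)          # insertion of h
        prev = cur
    return prev[-1]


def corpus_stats(pairs: List[Tuple[str, str]]) -> Tuple[float, float]:
    """(word error rate, fraction of utterances with at least one error)
    for (reference transcript, recognizer output) pairs."""
    edits = [edit_distance(ref.split(), hyp.split()) for ref, hyp in pairs]
    n_ref_words = sum(len(ref.split()) for ref, _ in pairs)
    return sum(edits) / n_ref_words, sum(e > 0 for e in edits) / len(pairs)


pairs = [("we have two hotels available", "we have to hotels available"),
         ("hello", "hello")]
print(corpus_stats(pairs))  # WER of 1/6; 1 of the 2 utterances contains an error
```

This is why a fairly low word error rate can still leave a large share of utterances with at least one error: a single substitution marks the whole utterance.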
Evaluation Results
Method              Output Language   OK+Perfect   Perfect
SOUP-Transcribed    English              74%         54%
SOUP-Recognition    English              59%         42%
SOUP-Transcribed    Japanese             77%         59%
SOUP-Recognition    Japanese             62%         45%
SOUP-Transcribed    German               70%         39%
SOUP-Recognition    German               58%         34%
Evaluation - Progress Over Time
Method OK+Perfect Perfect
Jan-99 Transcribed 69% 46%
Apr-99 Transcribed 70% 49%
Aug-99 Transcribed 74% 54%
Jan-99 Recognition 55% 36%
Apr-99 Recognition 57% 38%
Aug-99 Recognition 59% 42%
Current and Future Work
• Expanding the interlingua: covering descriptive as well as task-oriented sentences
• Developing the new portable approaches
• Development of the server-based architecture for supporting multiple applications:
– NESPOLE!: speech-MT for advanced e-commerce
– C-STAR: speech-to-speech MT over mobile phones
– LingWear: MT and language assistance on wearable devices
Students Working on the Project
• Chad Langley: Hybrid Rule/Stat Analysis, Speech MT architecture
• Ben Han: Automatic Grammar Induction
• Alicia Tribble: Interlingua and grammar development for Medical Domain
• Joy Zhang, Erik Peterson: Chinese EBMT for LingWear
The JANUS Speech-MT Team
• Project Leaders: Lori Levin, Alon Lavie, Tanja Schultz, Alex Waibel
• Grammar and Component Developers: Donna Gates, Dorcas Wallace, Kay Peterson, Alicia Tribble, Chad Langley, Ben Han, Celine Morel, Susie Burger, Vicky MacLaren, Kornel Laskowski, Erik Peterson