Speech-to-Speech MT in NESPOLE!
Design and Engineering

Alon Lavie, Lori Levin
Work with: Chad Langley, Tanja Schultz, Dorcas Wallace, Donna Gates, Kay Peterson, Kornel Laskowski
MT Class, April 2, 2003
• Speech-to-speech translation for E-Commerce applications
• Partners: CMU, Univ of Karlsruhe, ITC-irst, UJF-CLIPS, AETHRA, APT-Trentino
• Builds on successful collaboration within C-STAR
• Improved limited-domain speech translation
• Experiments with multimodality and with MEMT
• Showcase-1: Travel and Tourism in Trentino, completed in Nov-2001, demonstrated at IST and HLT
• Showcase-2: expanded travel + medical service
NESPOLE! System Overview
• Human-to-human spoken language translation for e-commerce application (e.g. travel & tourism) (Lavie et al., 2002)
• English, German, Italian, and French
• Translation via interlingua
• Translation servers for each language exchange interlingua to perform translation
  – Speech recognition (Speech → Text)
  – Analysis (Text → Interlingua)
  – Generation (Interlingua → Text)
  – Synthesis (Text → Speech)
Speech-to-speech in E-commerce
• Augment current passive web E-commerce with live interaction capabilities
• Client starts via web, can easily connect to agent for specific detailed information
• “Thin client” – very little special hardware or software on the client PC: just a browser, MS NetMeeting, and a shared whiteboard
NESPOLE! User Interfaces
NESPOLE! Translation Monitor
NESPOLE! Architecture
Distributed S2S Translation over the Internet
Language-specific HLT Servers
Our Parsing and Analysis Approach
• Goal: A portable and robust analyzer for task-oriented human-to-human speech, parsing utterances into interlingua representations
• Our earlier systems used full semantic grammars to parse complete domain actions (DAs)
  – Useful for parsing spoken language in restricted domains
  – Difficult to port to new domains
• Current focus is on improving portability to new domains (and new languages)
• Approach: Continue to use semantic grammars to parse domain-independent phrase-level arguments and train classifiers to identify DAs
Interchange Format
• Interchange Format (IF) is a shallow semantic interlingua for task-oriented domains
• Utterances represented as sequences of semantic dialog units (SDUs)
• IF representations consist of four parts
  – Speaker
  – Speech act
  – Concepts
  – Arguments
speaker : speech-act +concept* (argument*)

The speech act plus the concept sequence together form the Domain Action (DA).
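The IF structure above can be illustrated with a small parser for the textual form of an IF expression (the `parse_if` helper is hypothetical, written for illustration; it only handles the surface syntax shown on these slides):

```python
import re

def parse_if(expr):
    """Split an IF expression into speaker, speech act, concepts, and arguments.

    The domain action is the speech act plus zero or more concepts joined
    by '+'; arguments follow in parentheses.
    """
    m = re.match(r"(\w+)\s*:\s*([\w+-]+)\s*(?:\((.*)\))?$", expr)
    speaker, da, args = m.group(1), m.group(2), m.group(3) or ""
    parts = da.split("+")
    return {
        "speaker": speaker,
        "speech_act": parts[0],   # first element of the domain action
        "concepts": parts[1:],    # remaining '+'-joined concepts
        "arguments": args,        # raw argument string, not decomposed
    }
```

For example, `parse_if("c:greeting (greeting=hello)")` yields the client speaker, the `greeting` speech act, an empty concept list, and the argument `greeting=hello`.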
Hybrid Analysis Approach
Text → Argument Parser → Text + Arguments → SDU Segmenter → Text + Arguments + SDUs → DA Classifier → IF
Use a combination of grammar-based phrase-level parsing and machine learning to produce interlingua (IF) representations
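The three stages can be sketched as a simple pipeline (the function names and interfaces are hypothetical stand-ins; in the real system the parser is SOUP and the classifiers are TiMBL models):

```python
def analyze(utterance, parse_arguments, segment, classify_da):
    """Hybrid analysis: phrase parsing, SDU segmentation, DA classification."""
    # 1. Grammar-based parsing labels phrase-level arguments.
    arguments = parse_arguments(utterance)
    # 2. The segmenter splits the labeled sequence into semantic dialog units.
    sdus = segment(arguments)
    # 3. A trained classifier assigns a domain action to each SDU,
    #    yielding one IF representation per SDU.
    return [(classify_da(sdu), sdu) for sdu in sdus]
```

Each stage is pluggable, which is what makes the grammar/classifier split portable: new domains change the grammars and training data, not the pipeline.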
Hybrid Analysis Approach

Hello. I would like to take a vacation in Val di Fiemme.

c:greeting (greeting=hello)
c:give-information+disposition+trip (disposition=(who=i, desire), visit-spec=(identifiability=no, vacation), location=(place-name=val_di_fiemme))

Argument parsing:   hello i would like to take a vacation in val di fiemme
                    (phrases labeled greeting=, disposition=, visit-spec=, location=)
Segmentation:       [hello]SDU1  [i would like to take a vacation in val di fiemme]SDU2
DA classification:  SDU1 → greeting;  SDU2 → give-information+disposition+trip
Argument Parsing
• Parse utterances using phrase-level grammars
• SOUP parser (Gavaldà, 2000): stochastic, chart-based, top-down robust parser designed for real-time analysis of spoken language
• Separate grammars based on the type of phrases each grammar is intended to cover
Domain Action Classification
• Identify the DA for each SDU using trainable classifiers
• Two TiMBL (k-NN) classifiers
  – Speech act
  – Concept sequence
• Binary features indicate presence or absence of arguments and pseudo-arguments
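A minimal sketch of this setup, assuming a toy 1-NN/k-NN classifier with Hamming distance as a stand-in for TiMBL's memory-based learner (the feature inventory and helper names are illustrative, not the project's actual interfaces):

```python
def featurize(sdu_labels, feature_inventory):
    """Binary feature vector: 1 if the argument (or pseudo-argument)
    label occurs in the SDU, else 0."""
    present = set(sdu_labels)
    return [1 if f in present else 0 for f in feature_inventory]

def knn_classify(vec, training, k=3):
    """Plain k-NN: Hamming distance, majority vote among the k nearest
    training examples (each a (vector, label) pair)."""
    dist = lambda a, b: sum(x != y for x, y in zip(a, b))
    nearest = sorted(training, key=lambda ex: dist(vec, ex[0]))[:k]
    labels = [lab for _, lab in nearest]
    return max(set(labels), key=labels.count)
```

One classifier of this shape predicts the speech act, a second predicts the concept sequence; both see the same binary argument features.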
Using the IF Specification
• Use knowledge of the IF specification during DA classification
  – Ensure that only legal DAs are produced
  – Guarantee that the DA and arguments combine to form a valid IF representation
• Strategy: find the best DA that licenses the most arguments
  – Trust the parser to reliably label arguments
  – Retaining detailed argument information is important for translation
Evaluation: Classification Accuracy
• 20-fold cross-validation using the NESPOLE! travel domain database
The database:

                     English   German
SDUs                   8289      8719
Domain Actions          972      1001
Speech Acts              70        70
Concept Sequences       615       638
Vocabulary             1946      2815

Most-frequent-class baseline:

                     English   German
Speech Act            41.4%     40.7%
Concept Sequence      38.9%     40.3%
Evaluation: Classification Accuracy
Classification accuracy:

                     English   German
Speech Acts          81.25%    78.93%
Concept Sequences    69.59%    67.08%
Evaluation: End-to-End Translation
• English-to-English and English-to-Italian
• Training set: ~8000 SDUs from NESPOLE!
• Test set: 2 dialogs, only client utterances
• Uses the IF specification fallback strategy
• Three graders, bilingual English/Italian speakers
• Each SDU graded as perfect, ok, bad, or very bad
• Acceptable translation = perfect + ok
• Majority scores reported
Evaluation: End-to-End Translation
Speech recognizer hypotheses: 66.7%; WAR: 56.4%

Translation from English:

Source input            Target language   Acceptable (OK + Perfect)
Human transcription     English           68.1%
Human transcription     Italian           69.7%
SR hypothesis           English           50.4%
SR hypothesis           Italian           50.2%
Evaluation: Data Ablation Experiment
[Chart: Classification accuracy (16-fold cross-validation). X-axis: training set size in SDUs (500–6009); Y-axis: mean accuracy (0–0.8). Separate curves for Speech Act, Concept Sequence, and Domain Action.]
Domain Portability
• Experimented with porting to a medical assistance domain in NESPOLE!
• Initial medical domain system up and running, with reasonable coverage of flu-like symptoms and chest pain
• Porting the interlingua, grammars, and modules for English, German, and Italian required about 6 person-months in total
  – Interlingua development: ~180 hours
  – Interlingua annotation: ~200 hours
  – Analysis grammars and training: ~250 hours
  – Generation development: ~250 hours
New Development Tools
Questions?
Grammars
• Argument grammar
  – Identifies arguments defined in the IF
    s[arg:activity-spec=]
      (*[object-ref=any] *[modifier=good] [biking])
  – Covers "any good biking", "any biking", "good biking", "biking", plus synonyms for all three words
• Pseudo-argument grammar
  – Groups common phrases with similar meanings into classes
    s[=arrival=] (*is *usually arriving)
  – Covers "arriving", "is arriving", "usually arriving", "is usually arriving", plus synonyms
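The coverage claims above can be checked mechanically. The sketch below enumerates the strings covered by a SOUP-style right-hand side, simplified to handle only literal tokens with a `*` optionality prefix (no nonterminals or synonym classes, which the real grammars also support):

```python
from itertools import product

def expand(pattern):
    """Enumerate the token sequences covered by a simplified SOUP-style
    rule body, where a '*' prefix marks an optional token."""
    slots = []
    for tok in pattern.split():
        if tok.startswith("*"):
            slots.append(("", tok[1:]))   # optional: absent or present
        else:
            slots.append((tok,))          # required token
    # Cartesian product over slots, dropping empty (absent) tokens.
    return sorted(" ".join(t for t in combo if t) for combo in product(*slots))
```

`expand("*is *usually arriving")` yields exactly the four phrases listed for the `=arrival=` pseudo-argument.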
Grammars
• Cross-domain grammar
  – Identifies simple domain-independent DAs
    s[greeting]
      ([greeting=first_meeting] *[greet:to-whom=])
  – Covers "nice to meet you", "nice to meet you donna", "nice to meet you sir", plus synonyms
• Shared grammar
  – Contains low-level rules accessible by all other grammars
Segmentation
• Identify SDU boundaries between argument parse trees
• Insert a boundary if either parse tree comes from the cross-domain grammar
• Otherwise, use a simple statistical model
F([A1][A2]) = C([A1][A2]) / ( C([A1]) C([A2]) )

(C(·) denotes counts of the argument labels over the training data)
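A sketch of the boundary decision described above. The statistic on the original slide is partially garbled, so the association score and the threshold value here are one plausible reading, not the project's exact model:

```python
def boundary_score(a1, a2, pair_counts, unigram_counts):
    """Association score for adjacent argument labels a1, a2:
    F([A1][A2]) = C([A1][A2]) / (C([A1]) C([A2])).
    A low score (the pair rarely co-occurs within one SDU in training
    data) suggests an SDU boundary between the two phrases."""
    denom = unigram_counts.get(a1, 0) * unigram_counts.get(a2, 0)
    return pair_counts.get((a1, a2), 0) / denom if denom else 0.0

def insert_boundary(left_grammar, a1, right_grammar, a2,
                    pair_counts, unigram_counts, threshold=0.001):
    # Cross-domain phrases (e.g. greetings) always delimit an SDU.
    if "cross-domain" in (left_grammar, right_grammar):
        return True
    # Otherwise fall back to the statistical model; the threshold
    # value is an assumption, not taken from the slides.
    return boundary_score(a1, a2, pair_counts, unigram_counts) < threshold
```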
Using the IF Specification
• Check if the best speech act and concept sequence form a legal IF
• If not, test alternative combinations of speech acts and concept sequences from ranked set of possibilities
• Select the best combination that licenses the most arguments
• Drop any arguments not licensed by the best DA
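The fallback search can be sketched as follows. The `is_legal` and `licensed` callables stand in for lookups against the IF specification; they and the tie-breaking detail are assumptions for illustration:

```python
def fallback_da(speech_acts, concept_seqs, arguments, is_legal, licensed):
    """Search ranked speech acts x concept sequences for the best legal
    domain action.  Prefers the combination licensing the most arguments;
    earlier-ranked combinations win ties because the comparison is strict."""
    best, best_args = None, []
    for sa in speech_acts:                 # ranked by the SA classifier
        for cs in concept_seqs:            # ranked by the CS classifier
            da = (sa, cs)
            if not is_legal(da):
                continue                   # skip illegal DAs outright
            args = [a for a in arguments if licensed(da, a)]
            if best is None or len(args) > len(best_args):
                best, best_args = da, args
    # Arguments the chosen DA does not license are dropped.
    return best, best_args
```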
Grammar Development and Classifier Training
• Four steps:
  1. Write argument grammars
  2. Parse training data
  3. Obtain segmentation counts
  4. Train DA classifiers
• Steps 2-4 are automated to simplify testing new grammars
• Translation servers include a development mode for testing new grammars
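The automated part of the loop (steps 2–4) amounts to a small driver like the one below; the callables are hypothetical stand-ins for the project's actual tools, not its real interfaces:

```python
def retrain(utterances, parse, collect_boundary_counts, train_classifiers):
    """Re-run the automated development steps after a grammar change."""
    parses = [parse(u) for u in utterances]          # step 2: parse training data
    counts = collect_boundary_counts(parses)         # step 3: segmentation counts
    models = train_classifiers(parses)               # step 4: train DA classifiers
    return parses, counts, models
```

Automating these steps is what makes it cheap to test a new grammar: only step 1 needs a human.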
Evaluation: IF Specification Fallback
• 182 SDUs required classification
• 4% had illegal DAs
• 29% had illegal IFs
• Mean arguments per SDU: 1.47
Changed by fallback:
  Speech Act            5%
  Concept Sequence     26%
  Domain Action        29%

Arguments dropped per SDU:
  Without fallback     0.38
  With fallback        0.07
Evaluation: Data Ablation Experiment
• 16-fold cross-validation setup
• Test set size: 400 SDUs
• Training set sizes (# SDUs): 500, 1000, 2000, 3000, 4000, 5000, 6009 (all data)
• Data from the previous C-STAR system
• No use of the IF specification
Future Work
• Alternative segmentation models, feature sets, and classification methods
• Multiple argument parses
• Evaluate portability and robustness
  – Collect dialogues in a new domain
  – Create argument and full DA grammars for a small development set of dialogues
  – Assess portability by comparing grammar development times and examining grammar reusability
  – Assess robustness by comparing performance on unseen data
References

• Cattoni, R., M. Federico, and A. Lavie. 2001. Robust Analysis of Spoken Input Combining Statistical and Knowledge-Based Information Sources. In Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, Trento, Italy.
• Daelemans, W., J. Zavrel, K. van der Sloot, and A. van den Bosch. 2000. TiMBL: Tilburg Memory Based Learner, version 3.0, Reference Guide. ILK Technical Report 00-01. http://ilk.kub.nl/~ilk/papers/ilk0001.ps.gz
• Gavaldà, M. 2000. SOUP: A Parser for Real-World Spontaneous Speech. In Proceedings of IWPT-2000, Trento, Italy.
• Gotoh, Y. and S. Renals. 2000. Sentence Boundary Detection in Broadcast Speech Transcripts. In Proceedings of the ISCA Workshop on Automatic Speech Recognition: Challenges for the New Millennium, Paris.
• Lavie, A., F. Metze, F. Pianesi, et al. 2002. Enhancing the Usability and Performance of NESPOLE! – a Real-World Speech-to-Speech Translation System. In Proceedings of HLT-2002, San Diego, CA.
• Lavie, A., C. Langley, A. Waibel, et al. 2001. Architecture and Design Considerations in NESPOLE!: a Speech Translation System for E-commerce Applications. In Proceedings of HLT-2001, San Diego, CA.
• Lavie, A., D. Gates, N. Coccaro, and L. Levin. 1997. Input Segmentation of Spontaneous Speech in JANUS: a Speech-to-speech Translation System. In Dialogue Processing in Spoken Language Systems: Revised Papers from ECAI-96 Workshop, E. Maier, M. Mast, and S. Luperfoy (eds.), LNCS series, Springer Verlag.
• Lavie, A. 1996. GLR*: A Robust Grammar-Focused Parser for Spontaneously Spoken Language. PhD dissertation, Technical Report CMU-CS-96-126, Carnegie Mellon University, Pittsburgh, PA.
• Munk, M. 1999. Shallow Statistical Parsing for Machine Translation. Diploma Thesis, Karlsruhe University.
• Stevenson, M. and R. Gaizauskas. 2000. Experiments on Sentence Boundary Detection. In Proceedings of ANLP-NAACL 2000, Seattle.
• Woszczyna, M., M. Broadhead, D. Gates, et al. 1998. A Modular Approach to Spoken Language Translation for Large Domains. In Proceedings of AMTA-98, Langhorne, PA.