Speech-to-Speech MT Design and Engineering Alon Lavie and Lori Levin MT Class April 16 2001



Page 1: Speech-to-Speech MT Design and Engineering

Speech-to-Speech MT Design and Engineering

Alon Lavie and Lori Levin

MT Class

April 16 2001

Page 2: Speech-to-Speech MT Design and Engineering


Outline

• Design and Engineering of the JANUS speech-to-speech MT system

• The Travel & Medical Domain Interlingua (IF)

• Portability to new domains: ML approaches

• Evaluation and User Studies

• Open Problems, Current and Future Research

Page 3: Speech-to-Speech MT Design and Engineering


• Fundamentals of our approach

• System overview

• Engineering a multi-domain system

• Evaluations and user studies

• Alternative translation approaches

• Current and future research

Page 4: Speech-to-Speech MT Design and Engineering

JANUS Speech Translation

• Translation via an interlingua representation

• Main translation engine is rule-based

• Semantic grammars

• Modular grammar design

• System engineered for multiple domains

• Recent focus on domain portability

– using machine learning for rapid extension to a new domain

Page 5: Speech-to-Speech MT Design and Engineering

The C-STAR Travel Planning Domain

General Scenario:

• Dialogue between one traveler and one or more travel agents

• Focus on making travel arrangements for a personal leisure trip (not business)

• Free spontaneous speech

Page 6: Speech-to-Speech MT Design and Engineering

The C-STAR Travel Planning Domain

Natural breakdown into several sub-domains:

• Hotel Information and Reservation

• Transportation Information and Reservation

• Information about Sights and Events

• General Travel Information

• Cross Domain

Page 7: Speech-to-Speech MT Design and Engineering

Semantic Grammars

• Describe structure of semantic concepts instead of syntactic constituency of phrases

• Well suited for task-oriented dialogue containing many fixed expressions

• Appropriate for spoken language - often disfluent and syntactically ill-formed

• Faster to develop reasonable coverage for limited domains

Page 8: Speech-to-Speech MT Design and Engineering

Semantic Grammars

Hotel Reservation Example:

Input: we have two hotels available

Parse Tree:


(we have [hotel-type]

([quantity=] (two)

[hotel] (hotels)


Page 9: Speech-to-Speech MT Design and Engineering

The JANUS-III Translation System

Page 10: Speech-to-Speech MT Design and Engineering

The JANUS-III Translation System

Page 11: Speech-to-Speech MT Design and Engineering

The SOUP Parser

• Specifically designed to parse spoken language using domain-specific semantic grammars

• Robust - can skip over disfluencies in input

• Stochastic - probabilistic CFG encoded as a collection of RTNs with arc probabilities

• Top-Down - parses from top-level concepts of the grammar down to matching of terminals

• Chart-based - dynamic matrix of parse DAGs indexed by start and end positions and head category

Page 12: Speech-to-Speech MT Design and Engineering

The SOUP Parser

• Supports parsing with large multiple domain grammars

• Produces a lattice of parse analyses headed by top-level concepts

• Disambiguation heuristics rank the analyses in the parse lattice and select a single best path through the lattice

• Graphical grammar editor

Page 13: Speech-to-Speech MT Design and Engineering

SOUP Disambiguation Heuristics

• Maximize coverage (of input)

• Minimize number of parse trees (fragmentation)

• Minimize number of parse tree nodes

• Minimize the number of wild-card matches

• Maximize the probability of parse trees

• Find sequence of domain tags with maximal probability given the input words: P(T | W), where T = t1, t2, …, tn is a sequence of domain tags
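One way to read the heuristics above is as a lexicographic ranking key over candidate analyses. The sketch below is hypothetical: the `Analysis` fields, the priority order of the criteria, and the example numbers are assumptions made for illustration, not SOUP's actual scoring scheme.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Analysis:
    """One candidate path through the parse lattice (hypothetical fields)."""
    name: str
    words_covered: int   # input words covered by parse trees
    num_trees: int       # fragmentation
    num_nodes: int       # total parse tree nodes
    num_wildcards: int   # wild-card matches
    log_prob: float      # log probability of the parse trees


def rank_key(a: Analysis):
    # Smaller tuples sort first, so quantities to maximize are negated.
    return (-a.words_covered, a.num_trees, a.num_nodes, a.num_wildcards, -a.log_prob)


def select_best(analyses: List[Analysis]) -> Analysis:
    """Pick the analysis with the smallest ranking key."""
    return min(analyses, key=rank_key)


candidates = [
    Analysis("fragmented", 6, 3, 14, 0, -9.1),
    Analysis("single-tree", 6, 1, 11, 0, -10.4),
    Analysis("partial", 4, 1, 7, 0, -5.0),
]
best = select_best(candidates)
print(best.name)
```

With this assumed ordering, coverage dominates and fragmentation breaks ties, so the single-tree analysis wins over the fragmented one even though its probability is lower.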

Page 14: Speech-to-Speech MT Design and Engineering

JANUS Generation Modules

Two alternative generation modules:

• Top-Down context-free based generator - fast, used for English and Japanese

• GenKit - unification-based generator augmented with Morphe morphology module - used for German

Page 15: Speech-to-Speech MT Design and Engineering

Modular Grammar Design

• Grammar development separated into modules corresponding to sub-domains (Hotel, Transportation, Sights, General Travel, Cross Domain)

• Shared core grammar for lower-level concepts that are common to the various sub-domains (e.g. times, prices)

• Grammars can be developed independently (using shared core grammar)

• Shared and Cross-Domain grammars significantly reduce effort in expanding to new domains

• Separate grammar modules facilitate associating parses with domain tags - useful for multi-domain integration within the parser

Page 16: Speech-to-Speech MT Design and Engineering

Translation with Multiple Domain Grammars

• Parser is loaded with all domain grammars

• Domain tag attached to grammar rules of each domain

• Previously developed grammars for other domains can also be incorporated

• Parser creates a parse lattice consisting of multiple analyses of the input into sequences of top-level domain concepts

• Parser disambiguation heuristics rank the analyses in the parse lattice and select a single best sequence of concepts

Page 17: Speech-to-Speech MT Design and Engineering

Translation with Multiple Domain Grammars

Page 18: Speech-to-Speech MT Design and Engineering

A SOUP Parse Lattice

Page 19: Speech-to-Speech MT Design and Engineering

Domain Portability: Travel to Medical

Knowledge-Based Methods

Re-usability of knowledge sources for translation and speech recognition

Corpus-Based Methods

Reduce the amount of new training data for translation and speech recognition

Page 20: Speech-to-Speech MT Design and Engineering


• New domain: Medical

– Doctor-patient diagnostic conversations

– Global importance in emergencies and in machine translation for remote health care

– Synergy with Lincoln Lab

    • Joint evaluation

    • Joint interlingua

– Test case for portability

Page 21: Speech-to-Speech MT Design and Engineering


• Advantage: Interlingua

• Problem: Writing semantic grammars

– Domain dependent

– Requires time, effort, and expertise

• Approach:

– Grammar modularity

– Domain action learning

– Automatic/Interactive semantic grammar induction

Page 22: Speech-to-Speech MT Design and Engineering

Hybrid Stat/Rule-based Analysis

• Developing large-coverage semantic analysis grammars is time consuming, which makes it difficult to port the analysis system to new domains

• “Low-level” argument grammars are more domain-independent: they contain many concepts that are used across domains (time, location, prices, etc.)

• “High-level” domain-actions are domain-specific and must be redeveloped for each new domain (e.g. give-info+onset+symptom)

• Tagging data sets with interlingua representations is less time consuming than grammar writing, and is needed anyway for system development

Page 23: Speech-to-Speech MT Design and Engineering

Hybrid Rule/Stat Approach

• Combines grammar-based and statistical approaches to analysis:

– Develop semantic grammars for phrase-level arguments that are more portable to new domains

– Use statistical machine learning techniques for classifying into domain-actions

• Porting to a new domain requires:

– developing argument parse rules for new domain

– tagging training set with domain-actions for new domain

– training the classifiers for domain-actions on the tagged data

Page 24: Speech-to-Speech MT Design and Engineering

The Hybrid Analysis Process

• Parse an utterance for arguments

• Segment the utterance into sentences

• Extract features from the utterance and the single best parse output

• Use a learned classifier to identify the speech act

• Use a learned classifier to identify the concept sequence

• Combine into a full parse

Page 25: Speech-to-Speech MT Design and Engineering

Argument Parsing

The SOUP parser produces a forest of parse trees that cover as much of the input as possible

The parse forest can be a mixture of trees allowed by any of the grammars

Only the best parse is used for further processing

Page 26: Speech-to-Speech MT Design and Engineering

Argument Parse Example

We have a double room available for you at twenty-three thousand five hundred yen

[=availability=]::PSD ( we have [super_room-type=] ( [room-type=] ( a [room:double] ( double room ) ) ) available )

[arg-party:for-whom=]::ARG ( for [you] ( you ) )

[arg:time=]::ARG ( [point=] ( at [hour-minute=] ( [big:hour=] ( [big:23] ( twenty-three ) ) ) ) )

[arg:super_price=]::ARG ( [price=] ( [one-price:main-quantity=] ( [n-1000=] ( thousand ) [price:n-100=] ( five hundred ) ) [currency=] ( [yen] ( yen ) ) ) )
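The bracketed notation in the example is regular enough to be read back into a tree. The following is a minimal sketch that assumes whitespace-separated tokens, with a token treated as a node label exactly when it is followed by "("; the names `parse_items` and `words` are hypothetical helpers, not part of SOUP.

```python
from typing import List, Tuple, Union

Node = Tuple[str, list]    # (label, children)
Item = Union[str, Node]    # a plain word or a labelled subtree


def parse_items(tokens: List[str], pos: int = 0) -> Tuple[List[Item], int]:
    """Read items until an unmatched ')' or the end of the input."""
    items: List[Item] = []
    while pos < len(tokens) and tokens[pos] != ")":
        tok = tokens[pos]
        if pos + 1 < len(tokens) and tokens[pos + 1] == "(":
            # tok is a label; parse its children after the "(".
            children, pos = parse_items(tokens, pos + 2)
            if pos >= len(tokens):
                raise ValueError("unbalanced parentheses")
            pos += 1  # consume the closing ")"
            items.append((tok, children))
        else:
            items.append(tok)  # a terminal word
            pos += 1
    return items, pos


def words(items: List[Item]) -> List[str]:
    """Collect the terminal words of a parsed forest, left to right."""
    out: List[str] = []
    for it in items:
        if isinstance(it, tuple):
            out.extend(words(it[1]))
        else:
            out.append(it)
    return out


text = "[arg-party:for-whom=]::ARG ( for [you] ( you ) )"
tree, _ = parse_items(text.split())
print(tree)
print(words(tree))
```

Reading the terminals back out of the tree recovers the covered stretch of the input, which is a quick sanity check on a hand-edited parse.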

Page 27: Speech-to-Speech MT Design and Engineering

Automatic Classification of Domain Actions

• Train classifiers for speech acts and concepts

• Training data: utterances labeled with speech act, concepts, and best argument parse

• Input features

– n most common words

– Arguments and pseudo-arguments in best parse

– Speaker

– Predicted speech act (for concept classifier)
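The feature set listed above can be illustrated with a tiny memory-based classifier: training instances are stored as-is and a new utterance receives the label of its most similar stored neighbours. This is a sketch under stated assumptions. It does not use TiMBL's API; the feature encoding, the plain overlap similarity, the toy utterances, and the label inventory are all hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

Features = Dict[str, str]


def extract_features(words: List[str], arg_labels: List[str], speaker: str,
                     vocab: List[str]) -> Features:
    """Word-presence features for a fixed vocabulary, plus argument labels and speaker."""
    feats = {f"w:{w}": str(w in words) for w in vocab}
    for label in arg_labels:
        feats[f"arg:{label}"] = "True"
    feats["speaker"] = speaker
    return feats


def overlap(a: Features, b: Features) -> int:
    """Number of feature names on which both instances carry the same value."""
    return sum(1 for k in a if k in b and a[k] == b[k])


def classify(memory: List[Tuple[Features, str]], query: Features, k: int = 1) -> str:
    """Majority label among the k stored instances most similar to the query."""
    nearest = sorted(memory, key=lambda inst: overlap(inst[0], query), reverse=True)[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


vocab = ["have", "want", "room", "thank"]  # stands in for the n most common words
memory = [
    (extract_features(["we", "have", "a", "room"], ["room-type="], "agent", vocab),
     "give-information"),
    (extract_features(["i", "want", "a", "room"], ["room-type="], "client", vocab),
     "request-action"),
    (extract_features(["thank", "you"], [], "client", vocab), "thank"),
]
query = extract_features(["we", "have", "two", "rooms"], ["room-type="], "agent", vocab)
print(classify(memory, query))
```

For the concept classifier, the predicted speech act would simply be added as one more entry in the feature dictionary.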

Page 28: Speech-to-Speech MT Design and Engineering

Full Parse Example

We have a double room available for you at twenty-three thousand five hundred yen

give-information+availability+room (

[=availability=]::PSD ( we have [super_room-type=] ( [room-type=] ( a [room:double] ( double room ) ) ) available )

[arg-party:for-whom=]::ARG ( for [you] ( you ) )

[arg:time=]::ARG ( [point=] ( at [hour-minute=] ( [big:hour=] ( [big:23] ( twenty-three ) ) ) ) )

[arg:super_price=]::ARG ( [price=] ( [one-price:main-quantity=] ( [n-1000=] ( thousand ) [price:n-100=] ( five hundred ) ) [currency=] ( [yen] ( yen ) ) ) )

)
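The full parse differs from the argument parse only in the domain-action label that wraps the argument trees. A minimal sketch of that final combination step, assuming the classifiers have already returned a speech act and a concept sequence (the function names here are hypothetical):

```python
from typing import List


def domain_action(speech_act: str, concepts: List[str]) -> str:
    """Join the speech act and the concept sequence with '+'."""
    return "+".join([speech_act, *concepts])


def full_parse(speech_act: str, concepts: List[str], argument_parses: List[str]) -> str:
    """Wrap the argument parse trees under the domain-action label."""
    body = " ".join(argument_parses)
    return f"{domain_action(speech_act, concepts)} ( {body} )"


args = ["[arg-party:for-whom=]::ARG ( for [you] ( you ) )"]
print(full_parse("give-information", ["availability", "room"], args))
```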

Page 29: Speech-to-Speech MT Design and Engineering

Classification Results Using Memory-based (TiMBL) Classifiers

[Figure: classification accuracy (16-fold cross validation) plotted against training set size (500, 1000, 2000, 3000, 4000, 5000, 6009) for three classifiers: Speech Act, Concept Sequence, Domain Action]
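The accuracy figures on this slide come from 16-fold cross validation. The sketch below shows the general procedure only: the fold assignment (every k-th example), the stand-in majority-label learner, and the toy data are assumptions, not the setup used for the reported results.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Example = Tuple[str, str]  # (utterance, label)


def k_fold_accuracy(data: Sequence[Example], k: int,
                    train: Callable[[List[Example]], Callable[[str], str]]) -> float:
    """Average held-out accuracy over k folds; requires len(data) >= k."""
    scores = []
    for i in range(k):
        # Fold i holds out every k-th example starting at index i.
        test = [ex for j, ex in enumerate(data) if j % k == i]
        training = [ex for j, ex in enumerate(data) if j % k != i]
        predict = train(training)
        correct = sum(1 for x, y in test if predict(x) == y)
        scores.append(correct / len(test))
    return sum(scores) / k


def train_majority(training: List[Example]) -> Callable[[str], str]:
    """Stand-in learner: always predicts the most frequent training label."""
    majority = Counter(y for _, y in training).most_common(1)[0][0]
    return lambda x: majority


# Toy data: every fourth utterance is labelled "thank", the rest "give-information".
data = [(f"utt{i}", "give-information" if i % 4 else "thank") for i in range(32)]
accuracy = k_fold_accuracy(data, 16, train_majority)
print(accuracy)
```

Any learner with the same train-then-predict shape can be passed in place of `train_majority`.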

Page 30: Speech-to-Speech MT Design and Engineering

Status and Open Research

• Preliminary analysis engine implemented, currently used for travel domain in NESPOLE!

• Areas for further research and development:

– Explore a variety of classifiers

– Explore features for domain-action classification

– Classification compositionality – how to classify the components of the domain-action separately and combine them?

– Taking advantage of additional knowledge sources: the interlingua specification, dialogue context

– Better address segmentation of utterance into DAs

Page 31: Speech-to-Speech MT Design and Engineering

Automatic Induction of Semantic Grammars

• Seed grammar for a new domain has very limited coverage

• Corpus of development data tagged with interlingua representations available

• Expand the seed grammar by learning new rules for covering the same domain-actions

• First step: how well can we do with no human intervention?

Page 32: Speech-to-Speech MT Design and Engineering

Outline of Semantic Grammar Induction

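[Figure: semantic grammar induction pipeline with stages Tree Matching and Linearization, producing a Learned Grammar; the surviving rule fragment contains [manner=] and [adj:sym-name=]]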

Page 33: Speech-to-Speech MT Design and Engineering

Human vs Machine Experiment

• Seed grammar

• Extended by a human

• Extended by automatic semantic grammar induction

Page 34: Speech-to-Speech MT Design and Engineering

Seed Grammar

Cross Domain

Medical Shared

Around 100 rules and 6000 lexical items

Around 200 rules Around 600 rules and growing

Medical

Hello. My name is Sam.

I have a burning sensation in my foot.

Page 35: Speech-to-Speech MT Design and Engineering

A Parse Tree

[request-information+existence+body-state]::MED (
    WH-PHRASES::XDM ( [q:duration=]::XDM ( [dur:question]::XDM ( how long ) ) )
    HAVE-GET-FEEL::MED ( GET ( have ) )
    you
    HAVE-GET-FEEL::MED ( HAS ( had ) )
    [super_body-state-spec=]::MED (
        [body-state-spec=]::MED (
            ID-WHOSE::MED ( [identifiability=] ( [id:non-distant] ( this ) ) )
            BODY-STATE::MED ( [pain]::MED ( pain ) ) ) ) )

Page 36: Speech-to-Speech MT Design and Engineering

Manual Grammar Development

• About five additional days of development after the seed grammar was finalized

• Focusing on medical rules only

• Domain-independent rules remain untouched

Page 37: Speech-to-Speech MT Design and Engineering

Development and evaluation sets

• Development set: 133 sentences

– from one dialog

• Evaluation set: 83 sentences

– from two dialogs

– unseen speakers

– Only SDUs that could be manually tagged with a full IF according to the current specification were included.

Page 38: Speech-to-Speech MT Design and Engineering

Grading Procedure: Recall and Precision of IF Components

IF component               Component type
c:give-information+        speech act
existence+body-state       concepts
(body-state-spec=(pain,    top-level argument
identifiability=no),       sub-argument
body-location=             top-level argument
(inside=head))             sub-argument

• Recall – ignored if number of items is 0

• Precision – ignored if 0 out of 0
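The two rules above can be made concrete with a small scoring function. This sketch assumes each IF component type (for example, the top-level arguments) is compared as a set of items between the reference IF and the system output; returning `None` stands for "ignored". The set-based matching and the function name are assumptions.

```python
from typing import Optional, Set, Tuple


def recall_precision(reference: Set[str], hypothesis: Set[str]
                     ) -> Tuple[Optional[float], Optional[float]]:
    """Recall and precision for one component type; None means the score is ignored."""
    matched = len(reference & hypothesis)
    # Recall is ignored when the reference contains no items of this type.
    recall = matched / len(reference) if reference else None
    # Precision is ignored when the system produced no items (0 out of 0).
    precision = matched / len(hypothesis) if hypothesis else None
    return recall, precision


ref_args = {"body-state-spec", "body-location"}  # top-level arguments in the reference
hyp_args = {"body-state-spec"}                   # top-level arguments in the system output
print(recall_precision(ref_args, hyp_args))
print(recall_precision(set(), set()))
```

Ignored scores would then be left out when averaging over the evaluation set, rather than counted as zero.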

Page 39: Speech-to-Speech MT Design and Engineering


Human vs. Machine: Evaluation Results

[Figure: results chart with one group per IF component type: Speech Act, Concept List, Top-Level Args, Top-Level Values, Sub-Level Args, Sub-Level Values]

Page 40: Speech-to-Speech MT Design and Engineering

User Studies

• We conducted three sets of user tests

• Travel agent played by experienced system user

• Traveler is played by a novice and given five minutes of instruction

• Traveler is given a general scenario - e.g., plan a trip to

• Communication only via ST system, multi-modal interface and muted video connection

• Data collected used for system evaluation, error analysis and then grammar development

Page 41: Speech-to-Speech MT Design and Engineering

System Evaluation Methodology

• End-to-end evaluations conducted at the SDU (sentence) level

• Multiple bilingual graders compare the input with translated output and assign a grade of: Perfect, OK or Bad

• OK = meaning of SDU comes across

• Perfect = OK + fluent output

• Bad = translation incomplete or incorrect
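The two percentages reported in the evaluation tables follow directly from these grades. A minimal sketch of the tally, assuming one final grade per SDU (how the grades of multiple graders are merged is not specified here, and the sample grades are made up):

```python
from collections import Counter
from typing import Dict, List


def summarize(grades: List[str]) -> Dict[str, float]:
    """Percentage of SDUs graded acceptable (OK or Perfect) and Perfect."""
    counts = Counter(grades)
    total = len(grades)
    acceptable = counts["Perfect"] + counts["OK"]
    return {
        "OK+Perfect": 100.0 * acceptable / total,
        "Perfect": 100.0 * counts["Perfect"] / total,
    }


grades = ["Perfect", "OK", "Bad", "Perfect", "OK", "Perfect", "Bad", "Perfect"]
summary = summarize(grades)
print(summary)
```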

Page 42: Speech-to-Speech MT Design and Engineering

August-99 Evaluation

• Data from latest user study - traveler planning a trip to Japan

• 132 utterances containing one or more SDUs, from six different users

• SR word error rate 14.7%

• 40.2% of utterances contain recognition error(s)
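Word error rate is conventionally the word-level edit distance between the recognizer output and the reference transcript, divided by the number of reference words. The sketch below implements that standard definition; the example sentences are made up, and the scoring tool actually used for the figure above is not specified.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / reference length; reference must be non-empty."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] is the edit distance between the reference words seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion of a reference word
                            curr[j - 1] + 1,      # insertion of a hypothesis word
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


wer = word_error_rate("we have two hotels available", "we have to hotels")
print(wer)
```

Here one substitution ("two" to "to") and one deletion ("available") against five reference words give a rate of 0.4.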

Page 43: Speech-to-Speech MT Design and Engineering

Evaluation Results

Method              Output Language   OK+Perfect   Perfect
SOUP-Transcribed    English           74%          54%
SOUP-Recognition    English           59%          42%
SOUP-Transcribed    Japanese          77%          59%
SOUP-Recognition    Japanese          62%          45%
SOUP-Transcribed    German            70%          39%
SOUP-Recognition    German            58%          34%

Page 44: Speech-to-Speech MT Design and Engineering

Evaluation - Progress Over Time

Method OK+Perfect Perfect

Jan-99 Transcribed 69% 46%

Apr-99 Transcribed 70% 49%

Aug-99 Transcribed 74% 54%

Jan-99 Recognition 55% 36%

Apr-99 Recognition 57% 38%

Aug-99 Recognition 59% 42%

Page 45: Speech-to-Speech MT Design and Engineering

Current and Future Work

• Expanding the interlingua: covering descriptive as well as task-oriented sentences

• Developing the new portable approaches

• Development of the server-based architecture for supporting multiple applications:

– NESPOLE!: speech-MT for advanced e-commerce

– C-STAR: speech-to-speech MT over mobile phones

– LingWear: MT and language assistance on wearable devices

Page 46: Speech-to-Speech MT Design and Engineering

Students Working on the Project

• Chad Langley: Hybrid Rule/Stat Analysis, Speech MT architecture

• Ben Han: Automatic Grammar Induction

• Alicia Tribble: Interlingua and grammar development for Medical Domain

• Joy Zhang, Erik Peterson: Chinese EBMT for LingWear

Page 47: Speech-to-Speech MT Design and Engineering

The JANUS Speech-MT Team

• Project Leaders: Lori Levin, Alon Lavie, Tanja Schultz, Alex Waibel

• Grammar and Component Developers: Donna Gates, Dorcas Wallace, Kay Peterson, Alicia Tribble, Chad Langley, Ben Han, Celine Morel, Susie Burger, Vicky MacLaren, Kornel Laskowski, Erik Peterson