lessons learned mokusei: multilingual conversational interfaces future plans explore...

Lessons Learned Mokusei: Multilingual Conversational Interfaces Future Plans • Explore language-independent approaches to speech understanding and generation • Port human-language technologies for English conversational interfaces to Japanese • Use existing Jupiter domain as test case – A telephone-only conversational interface for weather information – More than 500 cities worldwide (~350 in US) – On-line information from four Web sites – Use the Galaxy client server architecture • Speech Recognition (SUMMIT: Glass et al., ICSLP ‘96) – Lexicon: >2,000 words with phonemic pronunciations – Phonological modeling: *Japanese specific phonological rules, e.g., desu ka /d e s k a/ *Japanese phonetic units mapped into English ones – Acoustic modeling: *Used English models to generate forced transcriptions utterances *Retrained acoustic models to create hybrid models – Language modeling: *Class n-gram using 60 word classes. trained on ~3,500 read & spontaneous sentences *Also exploring a class n-gram derived automatically from TINA • Speech Synthesis – NTT Fluet text-to-speech system Note: Sample sentences from Japanese speakers can be played from PC S. Seneff, J. Glass, T.J. Hazen, J. Polifroni, and V. Zue MIT Laboratory for Computer Science Y. Minami NTT Cyberspace Laboratories S peech G eneration S peech U nderstanding C om m on Semantic Fram e DATABASE Speech U nderstanding Speech G eneration Language as Interface • Language Understanding (TINA: Seneff, Comp Ling, ‘92 ) – Japanese grammar contains >900 unique non- terminals – Translation file maps Japanese words to English equivalent – Produces same semantic frame as for English inputs – Left recursive structure of Japanese requires look-ahead to resolve role of content words *Parse each content word into structure labeled “object” *Drop off “object” after next particle, which defines role and position in hierarchy • Language Generation (GENESIS, Glass et al., ICSLP ‘94) – Used English language generation tables as template – Modified ordering of constituents – Provided translation lexicon for words – Many language specific challenges, including constituent ordering, quantifier translation, and multiple meanings Language as Content •Use the same internal representation for Japanese and English •Update from Web sites and satellite feeds at frequent intervals •Parse all data into semantic frames to capture meaning •Scan frames for semantic content and prepare new relational database table entries English: Some thunderstorms may be accompanied by gusty winds and hail Japanese: clause: weather_event topic: precip_act, name: thunderstorm, num: pl quantifier: some pred: accompanied_by adverb: possibly topic: wind, num: pl, pred: gusty and: precip_act, name: hail weathe r wind hai l rain/ storm Frame indexed under weather, wind, rain, storm, and hail • Our approach to developing multilingual interfaces appears feasible • A top-down approach to parsing can be made effective for left- recursive languages • Word order divergence between English and Japanese motivated a redesign of our language generation component • Novel technique of generating a class n-gram language model using the NL component appears promising • Involvement of Japanese researcher is essential • Additional data collection from native Japanese speakers – Nearly 1000 sentences were collected in December • Improvement of individual components – Vocabulary coverage, acoustic and language models – Parse coverage – Continued development of a more sophisticated language generation component • Expansion of weather content for Japan Research Objectives

Upload: brice-bradley

Post on 31-Dec-2015

212 views

Category:

Documents

0 download

Report

Download

Embed Size (px):

TRANSCRIPT

Lessons Learned

Mokusei: Multilingual Conversational Interfaces

Future Plans

• Explore language-independent approaches to speech understanding and generation

• Port human-language technologies for English conversational interfaces to Japanese

• Use existing Jupiter domain as test case– A telephone-only conversational interface for

weather information– More than 500 cities worldwide (~350 in US)– On-line information from four Web sites– Use the Galaxy client server architecture

• Speech Recognition (SUMMIT: Glass et al., ICSLP ‘96)– Lexicon: >2,000 words with phonemic pronunciations– Phonological modeling:

* Japanese specific phonological rules, e.g.,desu ka /d e s k a/

* Japanese phonetic units mapped into English ones– Acoustic modeling:

* Used English models to generate forced transcriptions utterances

* Retrained acoustic models to create hybrid models– Language modeling:

* Class n-gram using 60 word classes. trained on ~3,500 read & spontaneous sentences

* Also exploring a class n-gram derived automatically from TINA

• Speech Synthesis

– NTT Fluet text-to-speech system

Note: Sample sentences from Japanese speakers can be played from PC

S. Seneff, J. Glass, T.J. Hazen, J. Polifroni, and V. ZueMIT Laboratory for Computer Science

Y. MinamiNTT Cyberspace Laboratories

SpeechGeneration

SpeechUnderstanding

CommonSemantic

Frame

DATABASE

SpeechUnderstanding

SpeechGeneration

Language as Interface

• Language Understanding (TINA: Seneff, Comp Ling, ‘92 )– Japanese grammar contains >900 unique non-terminals– Translation file maps Japanese words to English equivalent– Produces same semantic frame as for English inputs– Left recursive structure of Japanese requires look-ahead to

resolve role of content words* Parse each content word into structure labeled “object”* Drop off “object” after next particle, which defines role and

position in hierarchy

• Language Generation (GENESIS, Glass et al., ICSLP ‘94)– Used English language generation tables as template– Modified ordering of constituents– Provided translation lexicon for words – Many language specific challenges, including constituent

ordering, quantifier translation, and multiple meanings

Language as Content

• Use the same internal representation for Japanese and English• Update from Web sites and satellite feeds at frequent intervals• Parse all data into semantic frames to capture meaning• Scan frames for semantic content and prepare new relational

database table entries

English: Some thunderstorms may be accompanied by gusty winds and hail

Japanese:

clause: weather_eventtopic: precip_act, name: thunderstorm, num: pl

quantifier: somepred: accompanied_by

adverb: possiblytopic: wind, num: pl, pred: gusty

and: precip_act, name: hail

weather

windhail

rain/storm

Frame indexed under weather, wind, rain, storm, and hail

• Our approach to developing multilingual interfaces appears feasible