lessons learned mokusei: multilingual conversational interfaces future plans explore...
TRANSCRIPT
Lessons Learned
Mokusei: Multilingual Conversational Interfaces
Future Plans
• Explore language-independent approaches to speech understanding and generation
• Port human-language technologies for English conversational interfaces to Japanese
• Use existing Jupiter domain as test case– A telephone-only conversational interface for
weather information– More than 500 cities worldwide (~350 in US)– On-line information from four Web sites– Use the Galaxy client server architecture
• Speech Recognition (SUMMIT: Glass et al., ICSLP ‘96)– Lexicon: >2,000 words with phonemic pronunciations– Phonological modeling:
* Japanese specific phonological rules, e.g.,desu ka /d e s k a/
* Japanese phonetic units mapped into English ones– Acoustic modeling:
* Used English models to generate forced transcriptions utterances
* Retrained acoustic models to create hybrid models– Language modeling:
* Class n-gram using 60 word classes. trained on ~3,500 read & spontaneous sentences
* Also exploring a class n-gram derived automatically from TINA
• Speech Synthesis
– NTT Fluet text-to-speech system
Note: Sample sentences from Japanese speakers can be played from PC
S. Seneff, J. Glass, T.J. Hazen, J. Polifroni, and V. ZueMIT Laboratory for Computer Science
Y. MinamiNTT Cyberspace Laboratories
SpeechGeneration
SpeechUnderstanding
CommonSemantic
Frame
DATABASE
SpeechUnderstanding
SpeechGeneration
Language as Interface
• Language Understanding (TINA: Seneff, Comp Ling, ‘92 )– Japanese grammar contains >900 unique non-terminals– Translation file maps Japanese words to English equivalent– Produces same semantic frame as for English inputs– Left recursive structure of Japanese requires look-ahead to
resolve role of content words* Parse each content word into structure labeled “object”* Drop off “object” after next particle, which defines role and
position in hierarchy
• Language Generation (GENESIS, Glass et al., ICSLP ‘94)– Used English language generation tables as template– Modified ordering of constituents– Provided translation lexicon for words – Many language specific challenges, including constituent
ordering, quantifier translation, and multiple meanings
Language as Content
• Use the same internal representation for Japanese and English• Update from Web sites and satellite feeds at frequent intervals• Parse all data into semantic frames to capture meaning• Scan frames for semantic content and prepare new relational
database table entries
English: Some thunderstorms may be accompanied by gusty winds and hail
Japanese:
clause: weather_eventtopic: precip_act, name: thunderstorm, num: pl
quantifier: somepred: accompanied_by
adverb: possiblytopic: wind, num: pl, pred: gusty
and: precip_act, name: hail
weather
windhail
rain/storm
Frame indexed under weather, wind, rain, storm, and hail
• Our approach to developing multilingual interfaces appears feasible
• A top-down approach to parsing can be made effective for left-recursive languages
• Word order divergence between English and Japanese motivated a redesign of our language generation component
• Novel technique of generating a class n-gram language model using the NL component appears promising
• Involvement of Japanese researcher is essential
• Additional data collection from native Japanese speakers
– Nearly 1000 sentences were collected in December
• Improvement of individual components– Vocabulary coverage, acoustic and
language models– Parse coverage– Continued development of a more
sophisticated language generation component
• Expansion of weather content for Japan
Research Objectives