Rapid Development in new languages
• Limited training data (6 hrs) provided by NECTEC from 34 speakers, plus 8 speakers for development and test
• Romanization of Thai script in order to:
  • allow non-Thai researchers to work with the Roman representation, e.g. in grammar development
  • simplify the speech synthesis component, since the romanized output basically provides the pronunciation
• Current dictionary covers the given 6-hour database = 734 words
• Rapid bootstrapping of acoustic models using a 7-lingual GlobalPhone model set (Ch, Cr, Fr, Ge, Ja, Sp, Tu)
• Results on ASR indicate that rapid bootstrapping can be done successfully for a limited domain (see table)

Word accuracy [%] in Thai language on the evaluation set:
CI-AM           83.63
CD-AM (500)     84.44
CD-AM (1000)    82.71
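The poster does not say which scoring tool produced the word accuracy figures. As a minimal sketch, assuming the conventional definition 100 * (N - S - D - I) / N over a minimum-edit-distance alignment of reference and hypothesis words, the metric can be computed as below. The function names and the example sentences are illustrative and not taken from the system.

```python
def edit_counts(ref, hyp):
    """Return (substitutions, deletions, insertions) of a minimum-cost alignment."""
    # dp[i][j] = (total cost, S, D, I) for aligning ref[:i] with hyp[:j]
    dp = [[None] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    dp[0][0] = (0, 0, 0, 0)
    for i in range(1, len(ref) + 1):
        dp[i][0] = (i, 0, i, 0)          # only deletions
    for j in range(1, len(hyp) + 1):
        dp[0][j] = (j, 0, 0, j)          # only insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            c, s, d, n = dp[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                diag = (c, s, d, n)              # match, no cost
            else:
                diag = (c + 1, s + 1, d, n)      # substitution
            c, s, d, n = dp[i - 1][j]
            dele = (c + 1, s, d + 1, n)          # deletion
            c, s, d, n = dp[i][j - 1]
            ins = (c + 1, s, d, n + 1)           # insertion
            dp[i][j] = min(diag, dele, ins)      # tuples compare by cost first
    return dp[-1][-1][1:]


def word_accuracy(ref, hyp):
    """Word accuracy in percent: 100 * (N - S - D - I) / N."""
    s, d, i = edit_counts(ref, hyp)
    return 100.0 * (len(ref) - s - d - i) / len(ref)


# Hypothetical example: one substitution and one deletion in four reference words.
reference = "where does it hurt".split()
hypothesis = "where did it".split()
accuracy = word_accuracy(reference, hypothesis)
```

Because insertions are counted as errors, this measure can drop below zero for very poor hypotheses, which is one reason it is reported separately from the percentage of correct words.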
A Thai Speech Translation System for Medical Dialogs
Tanja Schultz, Dorcas Alexander, Alan W Black, Kay Peterson, Sinaporn Suebvisai, Alex Waibel
Speech Recognition
Tcl/Tk-based Communication Server
• Runs on Windows and Linux platforms
• Integrates several languages: Thai, English, Spa, Ch, ...
• Integrates different speech recognition approaches
  • Decoding along n-grams versus Context Free Grammars
• Integrates different translation approaches
  • IF-based translation versus statistical MT
• Integrates two natural language generation approaches from IF
  • knowledge-based generation with pseudo-unification
  • statistical generation
• Allows transmission of IF across devices for (wireless) multi-party translation (see demo: Laptop PDA)
System Architecture
[Figure: block diagram of the translation system. Labels recoverable from the figure: Source Lang Speech; Recognition/Analysis; SR+LM; Stat. Analysis (SOUP); SR+Parsing (CFG-Grammar); IF; Translation; Direct SMT; Symbolic Generation (GenKit); Statistical Generation (IF2NL); Target Language Text; Speech Synthesis (Cepstral); Target Lang Speech. Directions shown: Thai / English medical, English / Thai medical.]
First Thai voice built in the Festival Speech Synthesis System
• Limited domain targeting the Hotel Reservation domain
• 235 sentences that covered the main aspects of immediate interest
• Recorded, auto-labeled, and built a synthetic voice using FestVox tools
• Converted to a small-footprint portable version using Cepstral's Theta engine

Rapid synthesis development in new languages:
• Phoneme set shared with Speech Recognition
• Lexicon with a 522-word vocabulary constructed by hand
• Statistically trained letter-to-sound rules to bootstrap the required word coverage
• Unit selection concatenative synthesis
• Phones tagged with syllable and tone information for more fluent results
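Unit selection concatenative synthesis is commonly described as a search for the sequence of recorded units that minimizes the sum of target costs (how well a unit fits the requested phone and its context) and join costs (how smoothly adjacent units concatenate). The sketch below is a generic dynamic-programming version of that idea, not the FestVox or Theta implementation. The unit tuples, the cost functions, and the utterance IDs are invented for illustration.

```python
def select_units(candidates, target_cost, join_cost):
    """Pick one unit per position so that target + join costs are minimal.

    candidates: list with one list of candidate units per target position.
    Returns (chosen units, total cost).
    """
    # best[i][u] = (cheapest total cost of a path ending in u, previous unit)
    best = [{u: (target_cost(0, u), None) for u in candidates[0]}]
    for i in range(1, len(candidates)):
        layer = {}
        for u in candidates[i]:
            prev, cost = min(
                ((p, best[i - 1][p][0] + join_cost(p, u)) for p in candidates[i - 1]),
                key=lambda pc: pc[1],
            )
            layer[u] = (cost + target_cost(i, u), prev)
        best.append(layer)
    # Backtrace from the cheapest final unit.
    last = min(best[-1], key=lambda u: best[-1][u][0])
    total = best[-1][last][0]
    path = [last]
    for i in range(len(candidates) - 1, 0, -1):
        path.append(best[i][path[-1]][1])
    path.reverse()
    return path, total


# Hypothetical units: (phone label, recorded utterance id, position in utterance).
candidates = [
    [("s", 1, 0), ("s", 2, 3)],
    [("a", 1, 1), ("a", 3, 0)],
    [("p", 3, 1), ("p", 1, 2)],
]

def target_cost(position, unit):
    # Assumption: every candidate matches its target equally well.
    return 0.0

def join_cost(left, right):
    # Assumption: units that were adjacent in the same recording join for free.
    adjacent = left[1] == right[1] and right[2] == left[2] + 1
    return 0.0 if adjacent else 1.0

path, total = select_units(candidates, target_cost, join_cost)
```

In this toy setup the search prefers the three units that were recorded contiguously in utterance 1. A real target cost would also compare the syllable and tone tags mentioned above.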
• Interlingua-based Machine Translation component - Interchange Format (IF)
  • abstracts from variation in syntax across languages
  • allows monolingual development for analysis and generation
  • provides paraphrase back into the source language
  • can be easily extended to new languages due to its STAR structure
• Some extensions due to Thai characteristics:
  • The use of a term to indicate the gender of the person:
    Thai: zookhee kha1 - Eng: okay (ending)
    s[acknowledge] (zookhee *[speaker=])
  • An affirmation that means more than simply "yes":
    Thai: saap khrap - Eng: know (ending)
    s[affirm+knowledge] (saap *[speaker=])
  • Verb separation of terms for feasibility and other modalities
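The extensibility claim for the STAR structure can be made concrete with a little arithmetic. With an interlingua at the hub, each language needs one analysis module (into IF) and one generation module (out of IF), whereas direct translation needs a separate system for every ordered language pair. The sketch below only counts modules under that assumption. The function names are illustrative, and the language list is the one named for the communication server.

```python
def direct_systems(n_languages):
    """Direct translation: one system per ordered (source, target) pair."""
    return n_languages * (n_languages - 1)


def interlingua_modules(n_languages):
    """Interlingua hub: one analyzer and one generator per language."""
    return 2 * n_languages


languages = ["Thai", "English", "Spanish", "Chinese"]
n = len(languages)

direct_now = direct_systems(n)            # 4 * 3 = 12
hub_now = interlingua_modules(n)          # 2 * 4 = 8

# Cost of adding a fifth language to each design.
direct_added = direct_systems(n + 1) - direct_now
hub_added = interlingua_modules(n + 1) - hub_now
```

Adding a language to the hub design always costs two modules, while the direct design needs 2n new systems for the (n+1)-th language.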
Interface:
• Hypothesis
  • Thai + Roman script
• Parse tree (CFG)
• Translation
  • IF representation