

Rapid Development in new languages
• Limited training data (6 hrs) provided by NECTEC from 34 speakers, plus 8 speakers for development and test
• Romanization of Thai script in order to:
  • allow non-Thai researchers to work with the Roman representation, for example in grammar development
  • provide the pronunciation directly in the romanized output, which is easier for the speech synthesis component
• Current dictionary covers the given 6-hour database = 734 words
• Rapid bootstrapping of acoustic models using a 7-lingual GlobalPhone model set (Ch, Cr, Fr, Ge, Ja, Sp, Tu)
• Results on ASR indicate that rapid bootstrapping can be done successfully for a limited domain (see table)

Word accuracy [%] in Thai on the evaluation set:
  CI-AM           83.63
  CD-AM (500)     84.44
  CD-AM (1000)    82.71
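The poster does not spell out how word accuracy is computed. A minimal sketch, assuming the conventional definition of 100 * (1 - (substitutions + deletions + insertions) / reference words) obtained from a word-level edit distance, could look like this; the function name is illustrative and not taken from the system.

```python
def word_accuracy(reference, hypothesis):
    """Word accuracy in percent, assuming 100 * (1 - (S + D + I) / N)."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    errors = prev[-1]
    return 100.0 * (1.0 - errors / len(ref))


if __name__ == "__main__":
    # One substitution and one deletion against a four-word reference.
    print(word_accuracy("a b c d", "a x c"))
```

Under this definition the value can drop below zero when the hypothesis contains many insertions, which is why it is usually reported next to the reference word count.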

A Thai Speech Translation System for Medical Dialogs
Tanja Schultz, Dorcas Alexander, Alan W Black, Kay Peterson, Sinaporn Suebvisai, Alex Waibel

Speech Recognition

Tcl/Tk based Communication Server
• Runs on Windows and Linux platforms
• Integrates several languages: Thai, English, Spanish, Chinese, ...
• Integrates different speech recognition approaches
  • Decoding with n-gram language models versus Context Free Grammars
• Integrates different translation approaches
  • IF-based translation versus statistical MT
• Integrates two natural language generation approaches from IF
  • knowledge-based generation with pseudo-unification
  • statistical generation
• Allows transmission of IF across devices for (wireless) multi-party translation (see demo: Laptop PDA)

System Architecture
[Figure: system architecture diagram. Labels: Source Lang Speech; Recognition/Analysis: SR+LM with Stat. Analysis (SOUP), or SR+Parsing (CFG-Grammar); IF; Translation: Direct SMT; Symbolic Generation (GenKit); Statistical Generation (IF2NL); Target Language Text; Speech Synthesis (Cepstral); Target Lang Speech. Directions: Thai / English medical, English / Thai medical.]

First Thai voice built in the Festival Speech Synthesis System
• Limited domain targeting the Hotel Reservation domain
• 235 sentences that cover the main aspects of immediate interest
• Recorded, auto-labeled, and built a synthetic voice using FestVox tools
• Converted to a small-footprint portable version using Cepstral's Theta engine

Rapid synthesis development in new languages:
• Phoneme set shared with Speech Recognition
• Lexicon with a 522-word vocabulary constructed by hand
• Statistically trained letter-to-sound rules to bootstrap the required word coverage
• Unit selection concatenative synthesis
• Phones tagged with syllable and tone information for more fluent results
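The combination of a small hand-built lexicon with letter-to-sound rules can be sketched as a lookup with a fallback. Everything below is hypothetical: the lexicon entries, the phone symbols, and the greedy rule table are placeholders for illustration, not the poster's phoneme set or its statistically trained rules.

```python
# Hypothetical hand-built lexicon: romanized word -> phone sequence.
# The phone symbols are placeholders, not the system's phoneme set.
LEXICON = {
    "saap": ["s", "aa", "p"],
    "khrap": ["kh", "r", "a", "p"],
}

# Toy stand-in for trained letter-to-sound rules: two-letter groups that
# map to a single phone. A real system would learn its rules from data.
LTS_RULES = {"kh": "kh", "aa": "aa", "oo": "oo", "ee": "ee"}


def letter_to_sound(word):
    """Greedy left-to-right conversion, preferring two-letter groups."""
    phones = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        if pair in LTS_RULES:
            phones.append(LTS_RULES[pair])
            i += 2
        else:
            phones.append(word[i])  # fall back to one phone per letter
            i += 1
    return phones


def pronounce(word):
    """Use the lexicon when the word is covered, else the fallback rules."""
    if word in LEXICON:
        return LEXICON[word]
    return letter_to_sound(word)


if __name__ == "__main__":
    print(pronounce("saap"))     # covered by the lexicon
    print(pronounce("zookhee"))  # not covered, goes through the rules
```

The design point is only the fallback order: lexicon entries are trusted first, and the rules extend coverage to words the lexicon lacks.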

• Interlingua-based Machine Translation component - Interchange Format (IF)
  • abstracts from variation in syntax across languages
  • allows monolingual development for analysis and generation
  • provides paraphrase back into source language
  • can be easily extended to new languages due to STAR structure
• Some extensions due to Thai characteristics:
  • The use of a term to indicate the gender of the person:
    Thai: zookhee kha1 - Eng: okay (ending)
    s[acknowledge] (zookhee *[speaker=])
  • An affirmation that means more than simply "yes":
    Thai: saap khrap - Eng: know (ending)
    s[affirm+knowledge](saap *[speaker=])
  • Verb separation of terms for feasibility and other modalities
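The extensibility claim for the interlingua approach can be illustrated by counting components. The sketch below assumes that "STAR structure" refers to a hub-and-spoke layout in which every language connects only to the IF through one analysis module and one generation module; the function names are illustrative.

```python
def direct_translation_systems(num_languages):
    """Directed language pairs needed when translating pairwise."""
    return num_languages * (num_languages - 1)


def interlingua_modules(num_languages):
    """One analysis and one generation module per language around the IF."""
    return 2 * num_languages


if __name__ == "__main__":
    for n in (2, 4, 7):
        print(n, direct_translation_systems(n), interlingua_modules(n))
```

Under this assumption, adding a language to an existing set of n languages costs two new modules, whereas pairwise translation would need 2n new directions.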

Interface:
• Hypothesis
• Thai + Roman script
• Parse tree (CFG)
• Translation
• IF representation