  • 8/9/2019 CarlaSimoes I MS WS Speech Tech

    1/13

    Acoustic Modeling
    Introduction and Methodology
    Fellowship in collaboration with Prof. Carlos Teixeira, FCUL

    Carla Simões
    [email protected]

    I Microsoft Workshop on Speech Technology - Building bridges between industry and academia, May 2 2007, MLDC

    Overview

    Introduction
        Speech Components
        What are Acoustic Models?
        Why use them?

    Methodology
        Training Acoustic Models

    Modelling English Spoken by Portuguese speakers

    Conclusion and Future Work

    Speech Components

    [Diagram: speech components]

    Corpus (speech + transcriptions)
    Feature extraction
    Feature vector
    Acoustic model training
    Acoustic Models (Hidden Markov Models)
    Lexicon (phonetic dictionary)
    Speech Recognition Engine (SR)
    Text-to-speech Engine (TTS)
    Language Pack (contains core SR and TTS engines)
    SAPI (developers' Speech API)
    Grammar + Lexicon (for SR apps; grammar defines the permitted sequence of words)
    Speech Applications
        Telephony (Speech Server 2007, Exchange 12)
        Mobility (Voice Command)
        Desktop (Office 12, Vista)
        Home (TV, Kitchen)

    What are Acoustic Models?

    They reflect the way we pronounce a certain language

    Speech can be broken into phonetic segments, phones

    Acoustic Models are representations of speech segments

    Acoustic model training involves mapping models to acoustic examples obtained from training data

    Why use them?

    Basis of an automatic speech recognition (ASR) system

    [Diagram labels: Speech Waveform; Front End; Sequence of observed speech vectors; Sequence of symbols (S1 S2 S3)]

    The acoustic model gives the likelihood of a given feature vector as produced by a particular phoneme
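
    The likelihood statement above can be illustrated with a minimal sketch. It assumes each phoneme is described by a single diagonal-covariance Gaussian over a two-dimensional feature vector; the phoneme names, means and variances are invented for illustration and do not come from the slides. Real systems use longer feature vectors and several states per phone.

```python
import math


def diag_gaussian_log_likelihood(x, mean, var):
    """Log-likelihood of feature vector x under a diagonal-covariance Gaussian."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


# Hypothetical phoneme models (illustrative numbers only).
phoneme_models = {
    "aa": {"mean": [1.0, 0.5], "var": [0.5, 0.5]},
    "iy": {"mean": [-1.0, 2.0], "var": [0.5, 0.5]},
}

# One observed feature vector, as it might come out of the front end.
feature_vector = [0.9, 0.7]

# Score the same vector against every phoneme model.
scores = {
    phone: diag_gaussian_log_likelihood(feature_vector, model["mean"], model["var"])
    for phone, model in phoneme_models.items()
}
best_phone = max(scores, key=scores.get)
print(best_phone, scores)
```

    The vector lies close to the mean of the hypothetical "aa" model, so that model receives the higher log-likelihood.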

    Methodology

    Our Acoustic Models are based on Hidden Markov Models (HMMs)
        Markov assumption: the probability of each state depends only on the previous state

    Each HMM has 3 states
        Each state represents a short segment of speech, described mathematically by Gaussian probability distributions

    Acoustically similar information is shared across HMMs; the shared states are called senones

    Cross-word triphone system
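
    A minimal sketch of how such a model scores a sequence of observations, assuming a 3-state left-to-right HMM with one one-dimensional Gaussian per state. The transition probabilities, means and variances are hypothetical, and the forward algorithm is used here as a standard way to compute the likelihood, not as the procedure of the toolchain described in the slides.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a one-dimensional Gaussian."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def forward_likelihood(observations, start, transitions, emissions):
    """Forward algorithm: P(observations | model), summed over all state paths."""
    n = len(start)
    alpha = [start[i] * gaussian_pdf(observations[0], *emissions[i]) for i in range(n)]
    for obs in observations[1:]:
        # Markov assumption: the new alpha depends only on the previous alpha.
        alpha = [
            sum(alpha[i] * transitions[i][j] for i in range(n))
            * gaussian_pdf(obs, *emissions[j])
            for j in range(n)
        ]
    return sum(alpha)


# Hypothetical 3-state left-to-right model (illustrative numbers only).
start = [1.0, 0.0, 0.0]                             # always begin in the first state
transitions = [
    [0.6, 0.4, 0.0],                                # stay, or move one state to the right
    [0.0, 0.6, 0.4],
    [0.0, 0.0, 1.0],
]
emissions = [(0.0, 1.0), (3.0, 1.0), (6.0, 1.0)]    # (mean, variance) per state

matching = [0.1, 0.2, 2.9, 3.1, 5.8, 6.1]           # follows the state order
reversed_order = list(reversed(matching))           # violates the left-to-right order

print(forward_likelihood(matching, start, transitions, emissions))
print(forward_likelihood(reversed_order, start, transitions, emissions))
```

    The sequence that follows the left-to-right state order gets a much higher likelihood than its reversal. For long utterances the same computation would normally be done in the log domain to avoid underflow.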

    Methodology

    Training a cross-word triphone system for a new language
        Acoustic model training involves mapping acoustic models (cross-word triphone or whole-word triphones) to their equivalent labels (transcriptions)

    [Diagram: training flow]

    Inputs
        Corpus (speech + word-level transcriptions) + Lexicon
        Phoneset + Question Set

    Diagram boxes
        Word-level transcriptions are converted into monophone-level transcriptions
        Prototype monophone system is converted to an initial cross-word triphone system
        Cross-word system is then updated to produce the final cross-word triphone system
        Clustering triphones into acoustically similar groups

    Note: a phoneset file should never contain more than 50 phones
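
    The transcription steps above can be sketched as follows. The lexicon entries, the phone symbols, the "sil" boundary marker and the `left-phone+right` label notation are assumptions made for illustration (the notation is borrowed from common toolkit conventions); only the 50-phone limit comes from the slide.

```python
# Hypothetical lexicon and phone symbols, for illustration only.
LEXICON = {
    "open": ["ow", "p", "ax", "n"],
    "file": ["f", "ay", "l"],
}
SILENCE = "sil"
MAX_PHONES = 50  # upper bound on phoneset size stated on the slide


def build_phoneset(lexicon):
    """Collect the distinct phones used by the lexicon, plus silence."""
    phones = {SILENCE}
    for pronunciation in lexicon.values():
        phones.update(pronunciation)
    if len(phones) > MAX_PHONES:
        raise ValueError(f"phoneset has {len(phones)} phones, limit is {MAX_PHONES}")
    return sorted(phones)


def to_monophones(words, lexicon):
    """Expand a word-level transcription into a monophone-level one."""
    phones = [SILENCE]
    for word in words:
        phones.extend(lexicon[word])
    phones.append(SILENCE)
    return phones


def to_cross_word_triphones(phones):
    """Label each phone with its left and right neighbours, across word boundaries."""
    labels = []
    for i, phone in enumerate(phones):
        if phone == SILENCE:
            labels.append(phone)  # silence is kept context-independent in this sketch
            continue
        label = phone
        if i > 0:
            label = f"{phones[i - 1]}-{label}"
        if i + 1 < len(phones):
            label = f"{label}+{phones[i + 1]}"
        labels.append(label)
    return labels


phoneset = build_phoneset(LEXICON)
monophones = to_monophones(["open", "file"], LEXICON)
triphones = to_cross_word_triphones(monophones)
print(monophones)
print(triphones)
```

    In this example the label `ax-n+f` spans the boundary between "open" and "file", which is what makes the system cross-word rather than word-internal.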

    Modelling English Spoken by Portuguese Speakers

    Normally a speech recognizer's precision is lower for non-native users
        Non-native accents are more problematic than dialects because they show more variability

    Research on non-native accent modeling reveals large gains in performance when the acoustics and pronunciation of an accent are taken into account

    A usage scenario: voice-controlled applications where Portuguese is the dominant language but English terms are supported with the same accuracy

    Modelling English Spoken by Portuguese Speakers

    Experiments are being developed concerning this problem

    Corpus description
        4689 utterances for a universe of 227 words
        Files are sampled at 8 kHz, 16-bit linear
        11 male speakers

    Model settings
        3468 utterances for training
        1221 utterances for test
        43 minutes of speech
        1200 senones
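
    As a quick consistency check on the figures above, the short sketch below confirms that the training and test sets add up to the full corpus and derives the split proportions. The per-word and per-speaker values are plain averages computed from the slide's numbers, not figures reported in the slides.

```python
total_utterances = 4689
train_utterances = 3468
test_utterances = 1221
vocabulary_size = 227
speakers = 11

# The two subsets should partition the corpus exactly.
assert train_utterances + test_utterances == total_utterances

train_share = train_utterances / total_utterances
test_share = test_utterances / total_utterances

# Plain averages; the real distribution per word or speaker is not given.
utterances_per_word = total_utterances / vocabulary_size
utterances_per_speaker = total_utterances / speakers

print(f"train: {train_share:.1%}, test: {test_share:.1%}")
print(f"average utterances per word: {utterances_per_word:.1f}")
print(f"average utterances per speaker: {utterances_per_speaker:.0f}")
```

    The split works out to roughly 74% training and 26% test.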

    Modelling English Spoken by Portuguese Speakers

    [Diagram: two experimental setups]

    Setup 1: English spoken by Portuguese corpus -> Training -> New Model -> Testing (test corpus)

    Setup 2: ENU Model + English spoken by Portuguese corpus -> Update -> New Model -> Testing (test corpus)

    Modelling English Spoken by Portuguese Speakers

    [Diagram: training with a mapped phoneset]

    ENU Phoneset + PTG Phoneset -> English to Portuguese mapped phoneset

    English spoken by Portuguese corpus + PTG corpus -> Training -> New Model

    New Model -> Testing (test corpus)
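
    One way to picture the mapped phoneset is as a lookup table from English phone symbols to Portuguese ones, applied to every pronunciation in the lexicon. The sketch below is only an illustration of that idea: the phone symbols, the pairings and the function names are hypothetical and are not taken from the slides.

```python
# Hypothetical mapping from English (ENU) phone symbols to Portuguese (PTG) ones.
# The symbols and pairings are illustrative and do not come from the slides.
ENU_TO_PTG = {
    "dh": "d",
    "th": "t",
    "ih": "i",
    "s": "s",
}


def map_pronunciation(phones, mapping):
    """Rewrite a pronunciation phone by phone; fail loudly on unmapped phones."""
    missing = sorted({p for p in phones if p not in mapping})
    if missing:
        raise KeyError(f"no mapping for phones: {missing}")
    return [mapping[p] for p in phones]


def map_lexicon(lexicon, mapping):
    """Apply the phone mapping to every entry of a lexicon."""
    return {word: map_pronunciation(phones, mapping) for word, phones in lexicon.items()}


# Hypothetical English lexicon entry.
enu_lexicon = {"this": ["dh", "ih", "s"]}
mapped_lexicon = map_lexicon(enu_lexicon, ENU_TO_PTG)
print(mapped_lexicon)
```

    Raising an error on unmapped phones is a deliberate choice in this sketch: a silent fallback would hide gaps in the mapping that a phonetic study would need to resolve.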

    Future Work

    The improvement of Acoustic Models requires gathering hundreds of hours of speech data

    The amount of data would have to be even larger when dealing with non-native speakers, because accent variability becomes too high

    Possible solutions:
        Define new phonesets, which implies a phonetic study of the Portuguese pronunciation of English
        Train the native models with the English spoken by Portuguese corpus

    © 2007 Microsoft Corporation. All rights reserved. This presentation is for informational purposes only. Microsoft makes no warranties, express or implied, in this summary.

    Carla Simões

    [email protected]

    Thank you very much for your attention!

    www.microsoft.com/portugal/mldc
