Application of HMMs: Speech recognition (“Noisy channel” model of speech)


Page 1: Application of HMMs: Speech recognition

• “Noisy channel” model of speech
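The deck never writes the objective out, so as a reminder of the standard noisy-channel formulation (consistent with Jurafsky and Martin), the recognizer picks the word sequence W that best explains the observed acoustics O:

\hat{W} = \arg\max_{W} P(W \mid O) = \arg\max_{W} \frac{P(O \mid W)\, P(W)}{P(O)} = \arg\max_{W} \underbrace{P(O \mid W)}_{\text{acoustic model}}\; \underbrace{P(W)}_{\text{language model}}

P(O) does not depend on W, so it drops out of the argmax; the acoustic model is the HMM part of the system, and the language model supplies the prior over word sequences.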

Page 2: Speech feature extraction

• Acoustic waveform: sampled at 8 kHz, quantized to 8-12 bits
• Spectrogram (time on the x-axis, frequency on the y-axis, amplitude as intensity)
• Frame: 10 ms, i.e., 80 samples at 8 kHz
• Feature vector: ~39 dimensions
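A minimal sketch of the framing step described above, assuming non-overlapping frames for simplicity (the function names are illustrative, not from the lecture):

    import numpy as np

    def frame_signal(waveform, sample_rate=8000, frame_ms=10):
        # A 10 ms frame at 8 kHz is 80 samples, matching the slide.
        frame_len = sample_rate * frame_ms // 1000
        n_frames = len(waveform) // frame_len
        return waveform[:n_frames * frame_len].reshape(n_frames, frame_len)

    def frame_spectra(frames):
        # Magnitude spectrum of each windowed frame: the columns of a spectrogram.
        return np.abs(np.fft.rfft(frames * np.hamming(frames.shape[1]), axis=1))

Real front ends typically use overlapping windows (e.g., a 25 ms window every 10 ms) and convert each spectrum to ~13 MFCCs plus their deltas and double-deltas, which is where the ~39 dimensions on the slide come from.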


Page 4: Phonetic model

Page 5: Phonetic model

• Phones: speech sounds
• Phonemes: groups of speech sounds that have a unique meaning/function in a language (e.g., the several different ways to pronounce “t” all realize the same phoneme)

Page 6: HMM models for phones

• HMM states in most speech recognition systems correspond to subphones
  – There are around 60 phones, and as many as 60³ context-dependent triphones
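To make “states correspond to subphones” concrete, here is a minimal sketch of the usual three-state left-to-right phone HMM (beginning, middle, and end subphones); the probabilities are illustrative placeholders, not values from the slides:

    import numpy as np

    # Transition matrix for one phone: states = (begin, middle, end) subphones.
    # Self-loops absorb variable phone duration; transitions only move forward.
    A = np.array([
        [0.6, 0.4, 0.0],   # begin  -> stay in begin, or advance to middle
        [0.0, 0.6, 0.4],   # middle -> stay in middle, or advance to end
        [0.0, 0.0, 1.0],   # end    -> stay (exiting is handled by the word model)
    ])
    assert np.allclose(A.sum(axis=1), 1.0)   # each row is a distribution

Context-dependent triphones replicate this structure once per (left context, phone, right context) combination, which is why the state space can grow toward the 60³ models mentioned above.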

Page 7: HMM models for words
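This slide is a figure in the original deck; the standard construction it depicts, assumed in this sketch, chains the phone HMMs of a word's pronunciation left to right (the function and dictionary names are hypothetical):

    def word_states(pronunciation, phone_states):
        # Concatenate per-phone subphone state lists into one left-to-right
        # word model, e.g., word_states(["t", "ow"], phone_states).
        states = []
        for phone in pronunciation:
            states.extend(phone_states[phone])   # e.g., ["t_beg", "t_mid", "t_end"]
        return states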

Page 8: Putting words together

• Given a sequence of acoustic features, how do we find the corresponding word sequence?

Page 9: Decoding with the Viterbi algorithm
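The slide title stands alone in the transcript; as a concrete reference, here is a minimal log-space Viterbi decoder for a discrete-observation HMM (a generic sketch, not the lecture's own code):

    import numpy as np

    def viterbi(log_pi, log_A, log_B, obs):
        # log_pi: (S,) initial state log-probs
        # log_A:  (S, S) transition log-probs, rows = from-state
        # log_B:  (S, V) emission log-probs over a discrete vocabulary
        # obs:    sequence of observation indices
        T, S = len(obs), len(log_pi)
        delta = np.empty((T, S))            # best log-prob ending in each state
        back = np.zeros((T, S), dtype=int)  # argmax predecessors for traceback
        delta[0] = log_pi + log_B[:, obs[0]]
        for t in range(1, T):
            scores = delta[t - 1][:, None] + log_A   # scores[i, j]: i -> j
            back[t] = scores.argmax(axis=0)
            delta[t] = scores.max(axis=0) + log_B[:, obs[t]]
        # Trace the best final state back to the start.
        path = [int(delta[-1].argmax())]
        for t in range(T - 1, 0, -1):
            path.append(int(back[t][path[-1]]))
        return path[::-1]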

Pages 10-11: Limitations of Viterbi decoding

• The number of states may be too large
  – Beam search: at each time step, maintain a short list of the most probable words and only extend transitions from those words into the next time step (see the sketch below)
• Words with multiple pronunciation variants may get a smaller probability than incorrect words with fewer pronunciation paths
  – Use the forward algorithm instead of the Viterbi algorithm
• The Markov assumption is too weak to capture the constraints of real language

[Figure: word model for “tomato”, showing alternative pronunciation paths]
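A minimal sketch of the beam pruning described above; `expand` stands in for whatever generates successor hypotheses and is an assumption of this sketch, not something named in the slides:

    import heapq

    def beam_step(hypotheses, expand, beam_width=10):
        # hypotheses: list of (log_prob, word_path) pairs kept from the last step.
        successors = []
        for log_prob, path in hypotheses:
            successors.extend(expand(log_prob, path))   # all one-step extensions
        # Keep only the beam_width most probable hypotheses for the next step.
        return heapq.nlargest(beam_width, successors, key=lambda h: h[0])

For the pronunciation-variant problem, the forward algorithm replaces Viterbi's max over paths with a sum, so a word's variants pool probability instead of competing; in log space that means a log-sum-exp where the Viterbi sketch above takes a max.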

Pages 12-13: Advanced techniques

• Multiple-pass decoding
  – Let the Viterbi decoder return multiple candidate utterances, then re-rank them using a more sophisticated language model, e.g., an n-gram model (see the sketch below)
• A* decoding
  – Build a search tree whose nodes are words and whose paths are possible utterances
  – Path cost is given by the likelihood of the acoustic features given the words inferred so far
  – A heuristic function estimates the best-scoring extension until the end of the utterance
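A minimal sketch of the multiple-pass idea: take first-pass candidates with their acoustic scores and re-rank them with a bigram language model (`bigram_logprob` is a hypothetical stand-in for a trained model):

    def rescore_nbest(candidates, bigram_logprob, lm_weight=1.0):
        # candidates: list of (acoustic_logprob, word_list) from the first pass.
        def total(cand):
            acoustic, words = cand
            lm = sum(bigram_logprob(prev, word)
                     for prev, word in zip(["<s>"] + words, words + ["</s>"]))
            return acoustic + lm_weight * lm
        return sorted(candidates, key=total, reverse=True)

A* decoding instead searches the word tree best-first, ordering hypotheses by the path cost accumulated so far plus the heuristic estimate of the best possible completion of the utterance.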

Page 14: Reference

• D. Jurafsky and J. H. Martin, Speech and Language Processing, 2nd ed., Prentice Hall, 2008.