The HTK Book (for HTK Version 3.2.1) Young et al., 2002

Upload: francis-bond

Post on 29-Jan-2016




Chapter 1: The Fundamentals of HTK

HTK is a toolkit for building hidden Markov models (HMMs).

Primarily used to build automatic speech recognition (ASR) systems, but also other HMM-based systems: speaker and image recognition, automatic text summarization, etc.

HTK has tools (modules) for both training and testing HMM systems.


How to Train and Test an ASR?

Things needed: A labeled speech corpus and a dictionary (+ grammar).

Procedure:

1. Divide corpus into training, development and test sets.

2. Train acoustic models.

3. Test, retrain, test … on the development set.

4. Test on the test data.
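
Step 1 of the procedure can be sketched as follows. This is only an illustration: the function name, the 80/10/10 proportions and the utterance IDs are assumptions, not something the slides specify.

```python
import random

def split_corpus(utterance_ids, dev_fraction=0.1, test_fraction=0.1, seed=0):
    """Shuffle utterance IDs and split them into train/dev/test lists."""
    ids = sorted(utterance_ids)            # sort first so the split is reproducible
    random.Random(seed).shuffle(ids)       # fixed seed: same split on every run
    n_dev = int(len(ids) * dev_fraction)
    n_test = int(len(ids) * test_fraction)
    dev = ids[:n_dev]
    test = ids[n_dev:n_dev + n_test]
    train = ids[n_dev + n_test:]
    return train, dev, test

# Hypothetical utterance IDs, for illustration only
train, dev, test = split_corpus([f"utt{i:03d}" for i in range(100)])
print(len(train), len(dev), len(test))
```

The test set is held out until the very end; only the development set is used for the test/retrain loop in step 3.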


How to Build an ASR Using HTK?

Goal: A recognizer for voice dialing.

( SENT-START ( DIAL <$digit> | (PHONE|CALL) $name) SENT-END )
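
To see what sentences this grammar accepts, here is a small sketch that draws random sentences from it. The slide does not define $digit or $name, so the word lists below are hypothetical placeholders, and the cap on the number of digits is an arbitrary choice.

```python
import random

# Hypothetical word lists: the slide does not define $digit or $name.
DIGITS = ["ONE", "TWO", "THREE"]
NAMES = ["ALICE SMITH", "BOB JONES"]

def random_sentence(rng, max_digits=7):
    """Draw one sentence from the voice-dialing grammar."""
    if rng.random() < 0.5:
        # DIAL <$digit>: angle brackets mean one or more repetitions
        n = rng.randint(1, max_digits)
        body = ["DIAL"] + [rng.choice(DIGITS) for _ in range(n)]
    else:
        # (PHONE|CALL) $name: a choice of verb followed by one name
        body = [rng.choice(["PHONE", "CALL"])] + rng.choice(NAMES).split()
    return ["SENT-START"] + body + ["SENT-END"]

rng = random.Random(0)
for _ in range(3):
    print(" ".join(random_sentence(rng)))
```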


Creating a Dictionary

HDMan builds the dictionary and outputs a list of the phones. An HMM will be estimated for each of these phones.


Recording the Data

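HSLab noname: record and label the speech interactively.

HSGen (wdnet dict) testprompts: generate random prompt sentences from the word network and dictionary.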


Transcribing the Data

HMM training is supervised learning.


Coding the Data

HTK supports frame-based FFT, LPC and MFCC parameterizations, as well as user-defined ones.
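
"Frame-based" means the waveform is cut into short overlapping windows before any spectral analysis. A minimal sketch of that first step, assuming a 25 ms Hamming window shifted by 10 ms at 16 kHz (common values, but not taken from the slides); this is not HTK's own code.

```python
import math

def frame_signal(samples, frame_len, frame_shift):
    """Cut a waveform into overlapping frames and apply a Hamming window."""
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    frames = []
    # Only full frames are kept; trailing samples that do not fill a frame are dropped
    for start in range(0, len(samples) - frame_len + 1, frame_shift):
        chunk = samples[start:start + frame_len]
        frames.append([s * w for s, w in zip(chunk, window)])
    return frames

# One second of a synthetic 440 Hz tone at 16 kHz, for illustration only
rate = 16000
signal = [math.sin(2 * math.pi * 440 * t / rate) for t in range(rate)]
frames = frame_signal(signal, frame_len=400, frame_shift=160)  # 25 ms / 10 ms
print(len(frames), len(frames[0]))
```

Each frame would then be turned into one feature vector (for example MFCCs), giving the observation sequence the HMMs are trained on.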


Output Probability Specification

The most common choice is the continuous-density HMM (CDHMM). HTK also allows discrete output probabilities (for VQ data).
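
In a continuous-density HMM, each state scores an observation vector with a Gaussian mixture. The sketch below evaluates that log output probability for diagonal covariances; the function names and the diagonal-covariance restriction are assumptions made for the example.

```python
import math

def log_gaussian_diag(x, mean, var):
    """Log density of a Gaussian with diagonal covariance."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))

def log_output_prob(x, weights, means, variances):
    """Log output probability of x for a state with a Gaussian mixture."""
    terms = [math.log(w) + log_gaussian_diag(x, m, v)
             for w, m, v in zip(weights, means, variances)]
    top = max(terms)  # log-sum-exp keeps the sum numerically stable
    return top + math.log(sum(math.exp(t - top) for t in terms))

# Two-component mixture over 2-dimensional observations (made-up parameters)
lp = log_output_prob([0.5, -0.2],
                     weights=[0.7, 0.3],
                     means=[[0.0, 0.0], [1.0, 1.0]],
                     variances=[[1.0, 1.0], [0.5, 0.5]])
print(lp)
```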


Flat Start Training

Build a prototype HMM with reasonable initial guesses of its parameters (HCompV sets them to the global mean and variance of the training data).

Specify the topology – usually left to right and 3 states w/ no skips.

Create an MMF (master macro file). Now use HRest or HERest for training.
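
The idea of a flat start can be sketched as follows: every emitting state gets the same global mean and variance, and the transition matrix encodes a left-to-right topology with no skips. The data layout, the function names and the 0.6 self-loop probability are assumptions for illustration; this is not HCompV's actual implementation or file format.

```python
def global_mean_var(frames):
    """Per-dimension mean and variance over all feature vectors."""
    n = len(frames)
    dim = len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    var = [sum((f[d] - mean[d]) ** 2 for f in frames) / n for d in range(dim)]
    return mean, var

def left_to_right_transitions(n_emitting=3, self_loop=0.6):
    """Transition matrix with non-emitting entry and exit states, no skips."""
    n = n_emitting + 2
    a = [[0.0] * n for _ in range(n)]
    a[0][1] = 1.0                      # entry state always moves to the first emitting state
    for i in range(1, n - 1):
        a[i][i] = self_loop            # stay in the same state
        a[i][i + 1] = 1.0 - self_loop  # or advance to the next one
    return a                           # last row stays zero: the exit state has no outgoing transitions

def flat_start(frames, n_emitting=3):
    """Give every emitting state identical global statistics."""
    mean, var = global_mean_var(frames)
    states = [{"mean": list(mean), "var": list(var)} for _ in range(n_emitting)]
    return {"states": states, "transitions": left_to_right_transitions(n_emitting)}

# Toy 2-dimensional feature vectors, for illustration only
proto = flat_start([[0.0, 2.0], [2.0, 4.0]])
print(proto["states"][0])
```

Because all states start out identical, it is the subsequent re-estimation passes that make the models differ from one another.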


Realigning and Creating Triphones

Use pseudo-recognition (HVite) to force-align the training data when words have multiple pronunciations.


Evaluation
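
Recognizer output is usually scored by aligning each hypothesis with its reference transcription and counting hits (H), substitutions (S), deletions (D) and insertions (I). With N = H + S + D reference words, percent correct is 100·H/N and accuracy is 100·(H − I)/N. The sketch below is a minimal version of that computation; it assumes unit edit costs, which may not match the alignment weights a real scoring tool uses.

```python
def align_counts(ref, hyp):
    """Return (hits, substitutions, deletions, insertions) of a minimum-edit alignment."""
    rows, cols = len(ref) + 1, len(hyp) + 1
    # best[i][j] = (errors, H, S, D, I) for ref[:i] aligned with hyp[:j]
    best = [[None] * cols for _ in range(rows)]
    best[0][0] = (0, 0, 0, 0, 0)
    for i in range(rows):
        for j in range(cols):
            if i == 0 and j == 0:
                continue
            cands = []
            if i > 0 and j > 0:            # match or substitution
                e, h, s, d, ins = best[i - 1][j - 1]
                if ref[i - 1] == hyp[j - 1]:
                    cands.append((e, h + 1, s, d, ins))
                else:
                    cands.append((e + 1, h, s + 1, d, ins))
            if i > 0:                      # reference word missing from hypothesis
                e, h, s, d, ins = best[i - 1][j]
                cands.append((e + 1, h, s, d + 1, ins))
            if j > 0:                      # extra word in hypothesis
                e, h, s, d, ins = best[i][j - 1]
                cands.append((e + 1, h, s, d, ins + 1))
            best[i][j] = min(cands, key=lambda c: c[0])
    _, h, s, d, ins = best[-1][-1]
    return h, s, d, ins

def scores(ref, hyp):
    """Percent correct and accuracy for one reference/hypothesis pair."""
    h, s, d, ins = align_counts(ref, hyp)
    n = h + s + d
    return 100.0 * h / n, 100.0 * (h - ins) / n

print(scores("DIAL ONE TWO THREE".split(), "DIAL ONE THREE".split()))
```

Accuracy is the stricter of the two figures because it also penalizes inserted words.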


Other Issues

HTK supports supervised and unsupervised speaker adaptation (HVite).

Language modeling: HTK supports n-gram language models.
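
As an illustration of the n-gram idea with n = 2, the sketch below estimates bigram probabilities by relative frequency. It is unsmoothed (unseen bigrams get probability zero) and is not how HTK's own language modeling tools are implemented; the function names and the two training sentences are made up for the example.

```python
from collections import Counter

def train_bigram(sentences):
    """Count bigrams and return a function giving P(cur | prev) by relative frequency."""
    history_counts, bigram_counts = Counter(), Counter()
    for words in sentences:
        for prev, cur in zip(words, words[1:]):
            history_counts[prev] += 1      # how often prev is followed by anything
            bigram_counts[(prev, cur)] += 1

    def prob(prev, cur):
        if history_counts[prev] == 0:
            return 0.0                     # history never seen in training
        return bigram_counts[(prev, cur)] / history_counts[prev]

    return prob

# Made-up training sentences, for illustration only
prob = train_bigram([
    ["SENT-START", "DIAL", "ONE", "TWO", "SENT-END"],
    ["SENT-START", "CALL", "BOB", "SENT-END"],
])
print(prob("SENT-START", "DIAL"))
```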