readme

Upload: waqas-sultan

Post on 21-Jul-2016

This readme document is for version 1.03, in which the EM training function has been updated. Those who want a version that is easier to use are invited to download version 1.01, in which the HMMs are left-to-right without skips. Those who are interested in high-order hidden Markov models (HO-HMM) or hidden semi-Markov models (HSMM) are invited to visit https://sourceforge.net/projects/ho-hmm/.

In this version, the HMMs may have state-skipping transitions. State 1 and State N are the null start state and the null end state, respectively. The entry point for this package is "main_train_test_EM.m". In that script file, you may need to modify several parameters of the recognition system, such as MODEL_NO, dim (the dimension of the feature vector), ITERATION_END (which determines the number of training iterations), the range for EMIT_STATE_NO, and the model structure, which is defined by the initialization probabilities A0, Aij, and Af.

A0 is a row vector of transition probabilities from the null start state to the emitting states, i.e., A0(k) is used to initialize A(1,k+1). Aij is a row vector of transition probabilities from an emitting state to itself and to the following states, i.e., Aij(k) is used to initialize A(i,i+k-1) for all i. Af is a row vector used to set the transition probability from the k-th last emitting state to the null end state. For each k, if Af(k) is larger than A(N-k,N), then Af(k) replaces A(N-k,N) and the probabilities of the transition arcs leaving that state (State N-k) are renormalized. If Af(k) does not exist or is not larger than A(N-k,N), then A(N-k,N) is not affected.
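
As a rough illustration of the rules above, the following MATLAB sketch builds an initial transition matrix from A0, Aij, and Af. It is not taken from the package. The numeric values are made up, and three details are assumptions: entries that would fall beyond State N are dropped, every row is normalized to sum to 1 before Af is applied, and renormalization divides the affected row by its sum. Please check "main_train_test_EM.m" for the actual behavior.

```matlab
% Sketch only: illustrative values, not the package defaults.
EMIT_STATE_NO = 4;            % number of emitting states
N = EMIT_STATE_NO + 2;        % State 1 = null start, State N = null end
A0  = [0.8 0.2];              % start -> first and second emitting state
Aij = [0.6 0.3 0.1];          % self-loop, next state, skip one state
Af  = [0.5 0.2];              % k-th last emitting state -> null end state

A = zeros(N, N);

% A0(k) initializes A(1,k+1); only emitting states are targets here.
for k = 1:numel(A0)
    if k + 1 <= N - 1
        A(1, k + 1) = A0(k);
    end
end

% Aij(k) initializes A(i,i+k-1) for every emitting state i.
% Assumption: arcs that would point beyond State N are dropped.
for i = 2:N-1
    for k = 1:numel(Aij)
        j = i + k - 1;
        if j <= N
            A(i, j) = Aij(k);
        end
    end
end

% Assumption: normalize each row with outgoing arcs so that it sums to 1.
for i = 1:N-1
    A(i, :) = A(i, :) / sum(A(i, :));
end

% Af(k) replaces A(N-k,N) only if it is larger; the row is then renormalized.
for k = 1:numel(Af)
    i = N - k;
    if i >= 2 && Af(k) > A(i, N)
        A(i, N) = Af(k);
        A(i, :) = A(i, :) / sum(A(i, :));   % assumed renormalization rule
    end
end

disp(A);
```

With these example values, both Af entries exceed the probabilities produced by Aij, so the rows of the last two emitting states are changed and renormalized. Row N stays all zeros because the null end state has no outgoing arcs.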

Before you use the programs, you should first prepare the training and testing data. Excerpts of the TIDIGITS database can be obtained from http://cronos.rutgers.edu/~lrr/speech%20recognition%20course/databases/isolated_digits_ti_train_endpt.zip and http://cronos.rutgers.edu/~lrr/speech%20recognition%20course/databases/isolated_digits_ti_test_endpt.zip. The root directory for the training data, isolated_digits_ti_train_endpt, and the root directory for the test data, isolated_digits_ti_test_endpt, should be placed under the "wav" directory so that "main_train_test_EM.m" can be run without modification.

To prepare your own data, you can modify the MATLAB script file "main_dr_wav2mfcc_e_d_a.m" to extract the feature vector sequences from your own waveform data. You also need to create one .mat file containing a list of the training data and another .mat file containing a list of the testing data, where the first field of each record in the list is the word ID (an integer) and the second field is the path of the data file. Example MATLAB script files for creating the training and testing list files are "generate_selected_TI_isolated_digits_training_list_mat.m" and "generate_selected_TI_isolated_digits_testing_list_mat.m", respectively.
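
The list format described above could look roughly like the following MATLAB sketch. Everything specific in it is hypothetical: the cell-array container, the variable name "list", the file name, and the example paths are assumptions made for illustration. The two example scripts named above show the layout that the package actually expects.

```matlab
% Hypothetical list: one record per utterance, {word ID, data file path}.
word_ids = [1 1 2];                                   % integer word IDs
paths = {fullfile('mydata', 'one_a.mfc'), ...         % made-up example paths
         fullfile('mydata', 'one_b.mfc'), ...
         fullfile('mydata', 'two_a.mfc')};

list = cell(numel(word_ids), 2);
for n = 1:numel(word_ids)
    list{n, 1} = word_ids(n);    % first field: word ID
    list{n, 2} = paths{n};       % second field: path of the data file
end

% Save the list and load it back to confirm that the file round-trips.
list_file = fullfile(tempdir, 'my_training_list.mat');  % assumed file name
save(list_file, 'list');
loaded = load(list_file);
fprintf('%d records saved to %s\n', size(loaded.list, 1), list_file);
```

A second list for the testing data would be built in the same way and saved to a separate .mat file.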

The feature file format used in this version is compatible with the HTK format.
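
For reference, a standard HTK parameter file starts with a 12-byte header (number of frames, frame period in 100 ns units, bytes per frame, parameter kind code), followed by the frames as 32-bit floats. The MATLAB sketch below writes a tiny file in that layout and reads it back. It is not code from this package: the file name and the dummy values are made up, and it assumes big-endian byte order and uncompressed data, which is the HTK default but has not been checked against the files this package produces.

```matlab
% Dummy feature matrix: 2 frames x 3 coefficients (illustrative values).
feat = [1 2 3; 4 5 6];
[n_frames, n_dim] = size(feat);
htk_file = fullfile(tempdir, 'example.mfc');   % hypothetical file name

% Write the 12-byte HTK header followed by the frames (big-endian assumed).
fid = fopen(htk_file, 'w', 'ieee-be');
fwrite(fid, n_frames, 'int32');      % number of frames
fwrite(fid, 100000, 'int32');        % frame period: 10 ms in 100 ns units
fwrite(fid, 4 * n_dim, 'int16');     % bytes per frame (4 bytes per float)
fwrite(fid, 6, 'int16');             % parameter kind code, 6 = plain MFCC
fwrite(fid, feat', 'float32');       % transpose so frames are written in order
fclose(fid);

% Read the header, then the data, and restore the frames-by-dim layout.
fid = fopen(htk_file, 'r', 'ieee-be');
n_samples   = fread(fid, 1, 'int32');
samp_period = fread(fid, 1, 'int32');
samp_size   = fread(fid, 1, 'int16');
parm_kind   = fread(fid, 1, 'int16');
data = fread(fid, [samp_size / 4, n_samples], 'float32')';
fclose(fid);

fprintf('%d frames of dimension %d\n', n_samples, samp_size / 4);
```

Reading the number of coefficients from the header rather than hard-coding it keeps the reader independent of the value chosen for dim.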