readme

This readme describes version 1.03, in which the EM training function has been updated. If you prefer a simpler version, you can download version 1.01, in which the HMMs are strictly left-to-right without skips. If you are interested in high-order hidden Markov models (HO-HMM) or hidden semi-Markov models (HSMM), visit https://sourceforge.net/projects/ho-hmm/.

In this version, the HMMs are allowed to have state-skipping transitions. State 1 and State N are the null start state and the null end state, respectively. The entry point for this package is "main_train_test_EM.m". In that script file, you may need to modify several parameters of the recognition system, such as MODEL_NO, dim (the dimension of the feature vector), ITERATION_END (which determines the number of training iterations), the range for EMIT_STATE_NO, and the model structure, which is defined by the initialization probabilities A0, Aij, and Af.

A0 is a row vector of transition probabilities from the null start state to the emitting states, i.e., A0(k) is used to initialize A(1,k+1).

Aij is a row vector of transition probabilities from an emitting state to itself and to the following states, i.e., Aij(k) is used to initialize A(i,i+k-1) for every emitting state i.

Af is a row vector used to set the transition probability from the k-th last emitting state to the null end state. For each k, if Af(k) is larger than A(N-k,N), then Af(k) replaces A(N-k,N) and the probabilities of the transition arcs leaving State N-k are renormalized. If Af(k) does not exist or is not larger than A(N-k,N), then A(N-k,N) is not affected.
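
The initialization rules for A0, Aij, and Af can be sketched as follows. This is only an illustration of the rules described above, not the code used inside the package. The numeric values are made up, and two points are assumptions: arcs from Aij that would run past State N are dropped, and every row is renormalized once at the end.

```matlab
% Illustrative values only; these are not the package defaults.
EMIT_STATE_NO = 4;             % number of emitting states
N   = EMIT_STATE_NO + 2;       % State 1 = null start, State N = null end
A0  = [0.7 0.3];               % start -> 1st and 2nd emitting state
Aij = [0.6 0.3 0.1];           % self-loop, next state, skip one state
Af  = 0.5;                     % last emitting state -> null end state

A = zeros(N, N);

% A0(k) initializes A(1,k+1).
for k = 1:min(numel(A0), EMIT_STATE_NO)
    A(1, k+1) = A0(k);
end

% Aij(k) initializes A(i,i+k-1) for every emitting state i.
% Assumption: arcs that would point beyond State N are dropped.
for i = 2:N-1
    for k = 1:numel(Aij)
        j = i + k - 1;
        if j <= N
            A(i, j) = Aij(k);
        end
    end
end

% Af(k) replaces A(N-k,N) only when it is larger than the current value.
for k = 1:min(numel(Af), EMIT_STATE_NO)
    if Af(k) > A(N-k, N)
        A(N-k, N) = Af(k);
    end
end

% Assumption: renormalize every non-empty row so that it sums to one.
for i = 1:N-1
    s = sum(A(i, :));
    if s > 0
        A(i, :) = A(i, :) / s;
    end
end

disp(A);
```

With these values, the last emitting state (State 5) first receives A(5,6) = Aij(2), which Af(1) then replaces because it is larger. The row is rescaled afterwards, so the final A(5,6) is smaller than Af(1).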

Before you use the programs, you should first prepare the training and testing data. Excerpts of the TIDIGITS database can be obtained from http://cronos.rutgers.edu/~lrr/speech%20recognition%20course/databases/isolated_digits_ti_train_endpt.zip and http://cronos.rutgers.edu/~lrr/speech%20recognition%20course/databases/isolated_digits_ti_test_endpt.zip. The root directory for the training data, isolated_digits_ti_train_endpt, and the root directory for the test data, isolated_digits_ti_test_endpt, should be placed under the "wav" directory so that "main_train_test_EM.m" can be run without modification.

To prepare your own data, you can modify the Matlab script file "main_dr_wav2mfcc_e_d_a.m" to extract feature vector sequences from your own waveform data. You also need to create a .mat file containing a list of the training data and another .mat file containing a list of the testing data, where the first field of each record in the list is the word id (an integer) and the second field is the path of the data file. Example Matlab script files for creating the training and testing list files are "generate_selected_TI_isolated_digits_training_list_mat.m" and "generate_selected_TI_isolated_digits_testing_list_mat.m", respectively.
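
A list file of this kind could be built along the following lines. This is only a sketch: the variable name "list", the use of a two-column cell array, the file name, and the data paths are all assumptions made for illustration. Check the two example scripts named above for the container type and variable name that the package actually expects.

```matlab
% Hypothetical word ids and data paths, for illustration only.
word_ids = [1; 2; 2];
paths = {'wav/my_data/one_a.wav'; ...
         'wav/my_data/two_a.wav'; ...
         'wav/my_data/two_b.wav'};

% One record per row: first field = word id (integer), second field = path.
% Assumption: a cell array named "list"; the package may use another layout.
list = cell(numel(word_ids), 2);
for n = 1:numel(word_ids)
    list{n, 1} = word_ids(n);
    list{n, 2} = paths{n};
end

% Written to the temporary directory here so the sketch has no side effects
% in the working directory; use your own location in practice.
list_file = fullfile(tempdir, 'my_training_list.mat');
save(list_file, 'list');

% Read the file back to confirm its contents.
loaded = load(list_file);
fprintf('%d records, first: id %d, path %s\n', ...
        size(loaded.list, 1), loaded.list{1, 1}, loaded.list{1, 2});
```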

The feature file format used in this version is compatible with the HTK format.
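
For reference, an uncompressed HTK parameter file consists of a 12-byte big-endian header (number of frames, sample period in 100 ns units, bytes per frame, parameter kind) followed by 32-bit floats, one frame after another. The round trip below is a minimal sketch of that layout, not the reader or writer shipped with this package. The file name and feature values are made up, and compressed or checksummed HTK files are not handled.

```matlab
% Write a tiny HTK-style file so that the reading part has something to load.
htk_file = fullfile(tempdir, 'htk_sketch_example.mfc');   % hypothetical name
feat_out = [1 2 3; 4 5 6];     % one column per frame: 2 coefficients, 3 frames

fid = fopen(htk_file, 'w', 'ieee-be');          % HTK files are big-endian
if fid < 0
    error('Cannot open %s for writing.', htk_file);
end
fwrite(fid, size(feat_out, 2), 'int32');        % nSamples: number of frames
fwrite(fid, 100000, 'int32');                   % sampPeriod: 10 ms in 100 ns units
fwrite(fid, 4 * size(feat_out, 1), 'int16');    % sampSize: bytes per frame
fwrite(fid, 6, 'int16');                        % parmKind: 6 is the base code for MFCC
fwrite(fid, feat_out, 'float32');               % column-major, so frame by frame
fclose(fid);

% Read the header and the data back.
fid = fopen(htk_file, 'r', 'ieee-be');
if fid < 0
    error('Cannot open %s for reading.', htk_file);
end
nSamples   = fread(fid, 1, 'int32');
sampPeriod = fread(fid, 1, 'int32');
sampSize   = fread(fid, 1, 'int16');
parmKind   = fread(fid, 1, 'int16');
feat_in    = fread(fid, [sampSize / 4, nSamples], 'float32');
fclose(fid);

fprintf('%d frames of dimension %d, parmKind %d\n', ...
        nSamples, sampSize / 4, parmKind);
```

Qualifiers such as _E, _D, and _A are stored as additional bit flags in parmKind, so a real feature file will usually carry a larger value than the base code used here. Whether the package keeps frames as columns or as rows after loading is not stated in this readme, so transpose as needed.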
