JOURNAL OF COMPUTING, VOLUME 3, ISSUE 5, MAY 2011, ISSN 2151-9617
HTTPS://SITES.GOOGLE.COM/SITE/JOURNALOFCOMPUTING/
WWW.JOURNALOFCOMPUTING.ORG 13
Voice Recognition System Using Wavelet Transform and Neural Networks
Bayan Alsaaidah, Abdulsalam Alarabeyyat, Moh'd Rasoul Al-Hadidi
Abstract— Speech is the natural way that people interact with each other, and with their voice they can perform and remotely control many tasks. This study aims to make a voice recognition system more efficient by converting the original data into seven levels, where each level is one application of the wavelet transform, and then examining which of the seven levels gives the best result. The system is applied to 40 samples representing eight words, and recognizes the spoken words using neural networks over a limited dictionary. The paper begins with an introduction to the study, then presents related work, describes the experiment, and finally presents the conclusion and future work.
Index Terms — Speech Recognition, Wavelet transform, Neural Networks, Resampling.
—————————— ——————————
1 INTRODUCTION
Every day many people come into this world and make their first sound to begin their lives, not knowing that this sound can make their lives comfortable and ease their communication with people and machines.
Speech recognition is the process by which a computer identifies spoken words: when you talk to your computer, the computer correctly recognizes what you are saying.
Moreover, voice recognition [1] "is the technology by which sounds, words or phrases spoken by humans are converted into electrical signals, and these signals are transformed into coding patterns to which meaning has been assigned". Sound recognition is more general than voice recognition, but in this paper we focus on the human voice because it is the most frequent and most natural way to communicate with humans and machines.
Speech generation and recognition let humans and machines communicate using the mouth and ears rather than the hands and eyes. This is very convenient when our hands and eyes are busy with something else, such as driving a car, performing surgery, or firing weapons at the enemy [2].
These days we need to do everything quickly, to save time, and to work without our hands, especially when they are busy with something else. To achieve this, we can give an order to a system simply by speaking it, and it will be carried out. This concept is realized by voice recognition, a process by which human words are converted into electrical signals, and these signals are transformed into coding patterns to which meaning has been assigned [3].
A difficulty in using the voice as computer input arises from the differences between human speech and traditional forms of computer input. Each human has a different voice, and the same words can have different meanings when spoken in different contexts. To overcome these difficulties there are many techniques and methods for voice recognition; one of them uses artificial neural networks.
Artificial Neural Networks (ANNs) are computer systems made from collections of artificial neurons, resembling what we know of the nervous system in the human body. They accept a vector of inputs and produce a vector of outputs, and they compute their results in constant time [4]. They are trained by presenting them with input datasets and the corresponding correct outputs, and they minimize recognition errors by adjusting the weights of the
————————————————
Abdulsalam Alarabeyyat is with Al-Balqa Applied University, Salt, Jordan.
Moh'd Rasoul Al-Hadidi is with Al-Balqa Applied University, Salt, Jordan.
Bayan Alsaaidah is with Al-Balqa Applied University, Salt, Jordan.
network [4]. Neural networks are used to design many applications, and speech recognition is one of the most important of them.
The main contribution of this research is the use of a neural network tool to design a system that applies voice recognition technology, where the voice is preprocessed with the wavelet transform. Neural networks are applicable to many problems and have become popular in recent years; they are a friendly tool in the MATLAB environment when the grammar rules are not known.
Accordingly, the main objective is to build a speech recognition system that reduces some features of the voice, trains neural networks to identify the spoken words, and then finds the best recognition level after the wavelet transform. In a speech recognition system, each input typically represents one feature of the captured speech signal.
The combination of the voice feature strengths results in an output vector that shows, for example, the likelihood that these inputs represent the various phonemes under consideration [4]. The neural network technique is based on training a model to recognize certain voice patterns, so that any word applied to the model that has the same pattern will be recognized.
Pattern recognition is the basis of today's voice recognition software. The voice is converted into digital data, which is then compared to information stored in the program's database [5].
The comparison process uses algorithms based on statistical techniques for predictive modeling, such as the Hidden Markov Model (HMM), neural networks, or other approaches. The process makes educated guesses about the audio pattern of the voice to predict the words the user might have used [5].
The Discrete Wavelet Transform (DWT) is an orthogonal function which can be applied to a finite group of data. The DWT and the Discrete Fourier Transform (DFT) are similar in that the function is orthogonal, a signal passed twice through the transformation is unchanged, the input signal is assumed to be a set of discrete-time samples, and both transforms are convolutions [6].
The DWT gives information about the frequency content of the signal over time, which is a weakness of the DFT. A wavelet is a little piece of a wave: while the Fourier transform uses a sinusoidal wave that repeats itself to infinity, a wavelet exists only within a finite domain, and its value is zero elsewhere [7].
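The DWT-DFT comparison above can be illustrated with the simplest Daubechies wavelet (the Haar wavelet, db1, which the experiment below also uses). This is an illustrative sketch in plain NumPy rather than any particular wavelet library:

```python
import numpy as np

def haar_dwt_step(x):
    """One level of the Haar (db1) DWT: split a signal into a half-length
    approximation (low-pass) band and a detail (high-pass) band."""
    x = np.asarray(x, dtype=float)
    if len(x) % 2:                           # pad odd-length input
        x = np.append(x, 0.0)
    s = (x[0::2] + x[1::2]) / np.sqrt(2)     # approximation coefficients
    d = (x[0::2] - x[1::2]) / np.sqrt(2)     # detail coefficients
    return s, d

def haar_idwt_step(s, d):
    """Inverse of one Haar step: because the transform is orthogonal,
    the original samples are recovered exactly."""
    x = np.empty(2 * len(s))
    x[0::2] = (s + d) / np.sqrt(2)
    x[1::2] = (s - d) / np.sqrt(2)
    return x

signal = np.array([4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0])
s, d = haar_dwt_step(signal)
rec = haar_idwt_step(s, d)
print(np.allclose(rec, signal))   # True: forward then inverse is the identity
```

Each analysis step halves the number of approximation coefficients, which is what lets repeated transforms compress the voice data later in the paper.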
2 RELATED WORKS
Many studies have addressed voice recognition using Artificial Neural Networks (ANNs); the following paragraphs introduce some of them.
In 1988 Murdock et al. improved speech recognition and synthesis for disabled individuals using a fuzzy neural network. Their system involves three stages: dynamic word-warp matching is used to detect and align candidate words; fuzzy neural-net word recognition is applied to input spectrogram patterns; and a voice synthesizer completes the interactive loop. The system has a recognition accuracy of 95-98% [8].
In 1989 Nakamura and Shikano proposed a speaker-dependent system that updated the previous work. The algorithm was applied to Hidden Markov Models (HMMs) and neural networks and evaluated using a database of 216 phonetically balanced words and 5240 important Japanese words uttered by three speakers. The speaker-adapted HMM recognition rate for b, d, g was 79.5%, and the average recognition rate for the three choices was about 91%. Applied to neural networks, the algorithm gave almost the same performance [9].
In 1990 Hampshire and Waibel proposed single-speaker and multispeaker recognition of the voiced-stop consonants b, d, g using Time-Delay Neural Networks (TDNNs) with a number of enhancements, including a new objective function for training these networks, called the Classification Figure of Merit (CFM) [10].
In 1994 speech recognition using neural networks was used to control a robot, as mentioned in [11]: Zhou et al. activated a robot-arm controller with a VoiceCommander based on neural networks.
In 1996 Nava and Taylor proposed a Neuro-Fuzzy Classifier (NFC) with excellent classification accuracy to solve the problems of speaker-independent systems. According to their results, the NFC outperforms several existing methods [12].
In 2003 a 2-D phoneme sequence pattern recognition using a fuzzy neural network was proposed by Kwan and Dong. They used the self-organizing map and learning vector quantization to organize the phoneme
feature vectors of short and long phonemes segmented from speech samples to obtain phoneme maps. They formed the 2-D phoneme response sequences of the speech samples on the phoneme maps using the Viterbi search algorithm, and then used these 2-D phoneme response sequence curves as inputs to the fuzzy neural network for training and recognition of 0-9 digit-voice utterances [13].
Toyoda et al. proposed a system using a multilayered perceptron for environmental sound recognition, which depends strongly on the robot's task. The input data was the one-dimensional combination of the instantaneous spectrum at the power peak and the power pattern in the time domain. Two experiments were conducted using an original database and a newly created database. The recognition rate for 45 environmental sound data sets was about 92%. They found the new method fast and simple compared to HMM-based methods, and suitable for the on-board system of a robot for home use, e.g. a security-monitoring robot or a home-helper robot [14].
In 2007, Soltani and Ainon presented an experimental study on six emotions: happiness, sadness, anger, fear, neutral, and boredom. The experiment used the speech fundamental frequency, formants, energy, and voicing rate as extracted features. The features were selected manually for different experiments in order to get the best results, and were combined into feature vectors of different sizes as input to different neural network classifiers. The database used was the Berlin Database of Emotional Speech [15].
In the study of Al-Alaoui et al., a new pattern classification method was implemented using neural networks trained with the Al-Alaoui algorithm. The proposed speech recognition system was part of the Teaching and Learning Using Information Technology (TLIT) project, which would implement a set of reading lessons to help adult illiterates develop better reading capabilities. They compared two methods for automatic Arabic speech recognition of isolated words and sentences; the results showed that the Al-Alaoui algorithm outperforms the HMM in predicting both words and sentences [16].
Onishi et al. proposed their system in 2009. They constructed an individual identification system with three-layered neural networks. The voice signals were preprocessed with the Fast Fourier Transform (FFT) and then used as input data for neural networks with a backpropagation learning algorithm. The study concluded that the performance of the neural network depends on pronunciation, and that three-layered neural networks are effective for individual identification using voice patterns [17].
In 2010 Shahgoshtasbi proposed a system that improves the quality of speech recognition. The system has two parts: the first filters the input signal and packs it, then takes the average of three packets as an identification of the signal and sends it to the second part. The second part, which is based on the human auditory cortex, is an associative neural network that maps the input set to a desired output set. Experiments showed that this system is able to recognize a word even when it is noisy [18].
3 EXPERIMENT
The design of the proposed system is based on preprocessing the wave signal with the wavelet transform, and on ANNs that are trained to recognize the samples.
3.1 Recording the Voice
The first step of the voice recognition system is recording the words that will be recognized. Recording can be done in several ways: with the Sound Recorder found in the Windows accessories; with an audio recorder in any program that takes the inputs (N, Fs, CH) and records N audio samples at Fs hertz from CH input channels, producing a WAVE recording as output; or within the MATLAB environment, using a list of commands written in the command window to record the desired voice. In this system we recorded 8 words, each one 5 times, giving 40 voice samples. Table 1 summarizes the recorded words in both Arabic and English.
TABLE 1: THE WORDS IN THE RECOGNITION SYSTEM

English word | Arabic word
Open         | eftah
Close        | egleg
Right        | yameen
Left         | yasar
3.2 Analysis of the Voice Signal
The designed system works on a limited vocabulary of 8 words. Each word was recorded with the input parameters (44100 Hz, 16 bit, stereo) and read with these
parameters, with two channels for each one. The analysis begins by resampling the voice signals to reduce the sample size without affecting their content. Humans can hear sounds up to about 20 kHz, so by the sampling relation fs ≥ 2·fm the required rate is at most 2 × 20 kHz = 40 kHz. The resampling therefore converts 44100 Hz to 40000 Hz, a reduction of roughly 9%; this step shrinks the data and reduces the system's processing time. The resampling uses one channel of this system, the left one.
Resampling converts a sampled signal from one sampling frequency to another without changing the duration of the sample when the resampled audio is played directly at the new rate. Figure 1 shows a resampled voice signal of our system (the word egleg).
Figure 1: Resampled voice signal
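The resampling step can be sketched as follows. This is a simple linear-interpolation resampler in NumPy, an illustration of the rate change only (MATLAB's resample uses a polyphase anti-aliasing filter in practice):

```python
import numpy as np

def resample_linear(x, fs_in, fs_out):
    """Resample a 1-D signal by linear interpolation: evaluate the signal
    at the new sample times."""
    n_out = int(round(len(x) * fs_out / fs_in))
    t_in = np.arange(len(x)) / fs_in      # original sample times (s)
    t_out = np.arange(n_out) / fs_out     # target sample times (s)
    return np.interp(t_out, t_in, x)

# one second of a 440 Hz tone at 44100 Hz, resampled to 40000 Hz
fs_in, fs_out = 44100, 40000
x = np.sin(2 * np.pi * 440 * np.arange(fs_in) / fs_in)
y = resample_linear(x, fs_in, fs_out)
print(len(x), len(y))   # 44100 40000: same one-second duration, ~9% fewer samples
```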
3.3 Wavelet Transform
The Discrete Wavelet Transform (DWT) is an orthogonal function which can be applied to a finite group of data. The DWT works like the Discrete Fourier Transform (DFT) in that the function is orthogonal, a signal passed twice through the transformation is unchanged, the input signal is assumed to be a set of discrete-time samples, and both transforms are convolutions [6].
In the voice recognition system the data was compressed using the DWT to make it smaller and to reduce some features. The resampled data was converted to other forms by applying the wavelet transform; this process was applied to the data seven times, after which the data becomes too small, so seven levels are enough.
Table 2 shows the seven levels of the wavelet transform applied to the data; the original data (level 0) is kept as well, so that the best of them can be determined.
TABLE 2: THE TRANSFORMATION LEVELS

Level | Audio rate (Hz)
0     | 40000
1     | 20000
2     | 10000
3     | 5000
4     | 2500
5     | 1250
6     | 625
7     | 313
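The rates in Table 2 follow from each DWT level halving the effective rate, which can be checked directly (level 7's 313 Hz is 312.5 Hz rounded up):

```python
# Effective audio rate after each DWT level: each level halves the rate.
base = 40000
rates = [int(base / 2 ** level + 0.5) for level in range(8)]   # round half up
print(rates)   # [40000, 20000, 10000, 5000, 2500, 1250, 625, 313]
```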
The proposed system uses the discrete wavelet transform, one analysis step of which can be written in the standard convolution form of [6] as Equation (1):

a[n] = Σ_{k=0}^{M−1} c_k · s[2n + k]    (1)

where s is the input sequence, a is the transformed sequence, and the range of the summation is determined by the specified number of nonzero coefficients M. The number of nonzero coefficients is arbitrary and is called the order of the wavelet. The values of the coefficients c_k are determined by the constraints of orthogonality and normalization, and are not arbitrary [6].
The wavelet processing begins by loading all the samples, including the original ones. For each sample a wavelet filter is constructed using the first Daubechies wavelet (db1), and the Discrete Wavelet Transform (DWT) is applied. Daubechies is used as the mother wavelet function because it is better at removing noise from the signal. The DWT is applied to each sample seven times to obtain seven levels of transformation, so the overall number of transformations equals 40 samples × 7 levels = 280. The results of these transformations are saved in the database as input to the Artificial Neural Networks (ANNs).
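The seven-level processing described above can be sketched as follows. The recordings are stand-ins (random vectors), and the single Haar (db1) analysis step keeps only the approximation band, mirroring the paper's Daubechies choice:

```python
import numpy as np

def dwt_level(x):
    """One Haar (db1) analysis step: keep the low-pass (approximation)
    band, which halves the sample count."""
    x = np.asarray(x, dtype=float)
    if len(x) % 2:
        x = np.append(x, 0.0)                 # pad odd-length input
    return (x[0::2] + x[1::2]) / np.sqrt(2)

rng = np.random.default_rng(0)
samples = [rng.standard_normal(4000) for _ in range(40)]  # stand-ins for the 40 recordings

database = {}                         # (sample index, level) -> coefficients
for i, x in enumerate(samples):
    level = x
    for k in range(1, 8):             # seven transformation levels per sample
        level = dwt_level(level)
        database[(i, k)] = level

print(len(database))                  # 280 = 40 samples x 7 levels
```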
3.4 Neural Network Topology
The topology of the ANN used in the system is shown in Figure 2. There are 8 networks in the system, one for each level of the wavelet transform. The following subsections describe each component of this topology.
Figure 2: The topology of one ANN
8/6/2019 Voice Recognition System Using Wavelet Transform and Neural Networks
http://slidepdf.com/reader/full/voice-recognition-system-using-wavelet-transform-and-neural-networks 5/7
JOURNAL OF COMPUTING, VOLUME 3, ISSUE 5, MAY 2011, ISSN 2151-9617
HTTPS://SITES.GOOGLE.COM/SITE/JOURNALOFCOMPUTING/
WWW.JOURNALOFCOMPUTING.ORG 17
3.5 Input Layer
As shown in Figure 2, the input layer has 40 neurons that represent the audio signals entering the network. The minimum and maximum values of this input depend on the input audio.
3.6 Output Layer
The output layer has eight neurons, corresponding to the original sample and the seven levels of the wavelet transformation. For each sample to be recognized there are eight outputs that show which signal gives the best recognition; for this reason there are eight neurons in the output layer.
3.7 Hidden Layer
The network has a single hidden layer of 25 neurons. This number was chosen by experiment during the training of the neural network; there is no exact rule that gives the best number of neurons in the hidden layer, so the optimal number is determined by running many experiments and choosing the best result, which was 25 in this system. This layer is important in determining the best performance of the ANN.
3.8 Recognizing Process
The most important part of the system is the recognition of the given patterns. Many methods achieve this; the most commonly used are the Hidden Markov Model (HMM) and ANNs. The system here has a limited dictionary, so ANNs are the better choice, while the HMM is better for speech recognition with a large dictionary.
The recognition process is achieved using the neural network tools offered by the MATLAB software, where recognition is done with the function sim, which simulates the trained network on a sample so that its output can be compared with the trained samples.
3.9 The Training Stage
This system builds the neural networks using the MultiLayer Perceptron (MLP) structure. Each network has three layers with a sigmoidal function in the hidden layer, trained with the gradient-descent function traingdx, which uses momentum and an adaptive learning rate with backpropagation. The momentum is used in the backpropagation algorithm to achieve faster global convergence [19]. traingdx can train any network as long as its weight, input, and transfer functions have derivatives.
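The idea behind traingdx, momentum plus an adaptive learning rate, can be sketched on a toy one-dimensional problem; the update rule and constants here are illustrative, not MATLAB's exact implementation:

```python
def gdx_step(w, grad, velocity, lr, prev_err, err,
             momentum=0.5, lr_inc=1.05, lr_dec=0.7):
    """One update: grow the learning rate while the error keeps falling,
    shrink it when the error rises, and smooth updates with momentum."""
    lr = lr * lr_inc if err < prev_err else lr * lr_dec
    velocity = momentum * velocity - lr * grad
    return w + velocity, velocity, lr

# minimise f(w) = w^2 starting from w = 5, tracking the best error reached
w, v, lr, prev, best = 5.0, 0.0, 0.1, float("inf"), float("inf")
for _ in range(100):
    err = w ** 2
    best = min(best, err)
    w, v, lr = gdx_step(w, 2 * w, v, lr, prev, err)
    prev = err
print(best)   # the best error reached is close to zero
```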
After the network architecture has been established, the weight values that minimize the error rate must be determined. To assign these weights, the backpropagation training algorithm is used to train the network, establishing the weights and biases of the networks.
The sum-squared error goal, the maximum number of training epochs, and the momentum constant are presented to the neural networks. After the input X and the desired output Y are presented to the network, the network uses the input X to calculate the output O; this value differs from the desired (target) output Y. The difference between the desired output and the actual output is called the error.
The error is then computed using the mean squared error (MSE) and propagated backward to change the weights in order to minimize the error rate. This process is repeated over the experimental data during training until the error goal is reached. This is a summary of the learning process in the proposed ANN.
The backpropagation algorithm is used to train the ANNs. It has two passes in the training process:
1. Forward pass: initialize the weights and biases, then apply the input array X to the input layer. Next calculate the inputs and outputs of the hidden layer: for each hidden neuron, sum the input values multiplied by their weights, and compute the neuron's output by applying the suitable activation function to this sum; in this system the logistic activation function is used. Then calculate the inputs and outputs of the output layer in the same way: for each output neuron, sum the hidden-layer outputs multiplied by their corresponding weights and apply the logistic activation function. The result is the network output O, which is compared with the desired output Y. If there is a difference, the error is computed using Equation (2):
Error = Y − O (2)
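The forward pass above, for the 40-25-8 topology of Figure 2 with the logistic activation, can be sketched as follows (the weights are random stand-ins; the paper learns them by backpropagation):

```python
import numpy as np

def logistic(z):
    """The logistic (sigmoid) activation used in both layers."""
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
W1, b1 = rng.normal(0, 0.1, (25, 40)), np.zeros(25)   # input -> hidden (40 -> 25)
W2, b2 = rng.normal(0, 0.1, (8, 25)), np.zeros(8)     # hidden -> output (25 -> 8)

X = rng.normal(size=40)        # one 40-feature input vector
H = logistic(W1 @ X + b1)      # hidden-layer outputs
O = logistic(W2 @ H + b2)      # network output: 8 values in (0, 1)

Y = np.zeros(8)
Y[7] = 1.0                     # desired output, e.g. "level 7 wins"
error = Y - O                  # Equation (2): Error = Y - O
print(O.shape, error.shape)    # (8,) (8,)
```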
2. Backward pass: the weights of the output layer and of the hidden layer are adjusted. The forward and backward passes are repeated for each sample in the training set. After all samples are trained, the mean squared error (MSE) is computed using Equation (3):

MSE = (1/n) Σ_{i=1}^{n} (Y_i − O_i)²    (3)

where n represents the size of the training set. The training set was enlarged by adding noise to the samples, increasing the number of inputs from 40 to 160 and improving the precision of training.
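Equation (3) and the noise-based augmentation can be sketched as follows; the noise level and the three copies per sample are illustrative assumptions, only the 40-to-160 count comes from the text:

```python
import numpy as np

def mse(Y, O):
    """Equation (3): mean of squared differences over the set."""
    return np.mean((np.asarray(Y) - np.asarray(O)) ** 2)

rng = np.random.default_rng(2)
clean = [rng.standard_normal(100) for _ in range(40)]   # the 40 original samples
augmented = list(clean)
for x in clean:
    for _ in range(3):                                  # three noisy copies each
        augmented.append(x + rng.normal(0.0, 0.01, x.shape))

print(len(augmented))   # 160 training samples, as in the text
```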
The developed neural network is well trained, as shown in Figure 3, which presents the performance and the target of the NN for the best DWT level (level 7).
Figure 3: The performance of ANN
The training of this ANN reached the target output with a regression of approximately 99%, as shown in Figure 4.
Figure 4: The Regression of ANN
3.10 The Testing Stage
The testing stage is the final stage of the proposed system; it examines the results and determines the accuracy of the system.
To determine the performance of the voice recognition system, the accuracy is calculated using the following equation:

AR = (CR / S) × 100%

where
• AR: Accuracy Rate.
• CR: number of Correctly Recognized samples.
• S: number of Samples examined.
By this formula, the accuracy of the system is 90%.
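The accuracy formula can be written directly; the 36-of-40 example is hypothetical, chosen to reproduce the reported 90%:

```python
def accuracy_rate(correct, total):
    """AR = CR / S x 100%."""
    return 100.0 * correct / total

# e.g. 36 of 40 test samples recognized correctly gives the reported 90%
print(accuracy_rate(36, 40))   # 90.0
```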
After the results were analyzed, the recognition percentage of each level of the wavelet transformation shows that level seven gives the best recognition percentage, as shown in Table 3.

TABLE 3: RECOGNITION RATES

Level | Recognition rate
0     | 12.51%
1     | 12.49%
2     | 12.46%
3     | 12.53%
4     | 12.50%
5     | 12.44%
6     | 12.51%
7     | 12.57%
This means that the best way to reach an efficient voice recognition system is to apply the wavelet transformation to the voice sample seven times and then recognize it. Figures 5 and 6 show the difference between the original sample (of the word yasar) and the same sample after the DWT has been applied seven times; it is clear that the DWT compressed the sample.
Figure 5: The Original Sample
Figure 6: The 7th Transformation
4 CONCLUSION
In this study, the multilayer perceptron is used as the structure of the ANN, and the backpropagation training algorithm is used to train it.
The proposed system applied the DWT to the original data seven times to convert it into smaller data. Each level of this transformation has its own network with the same parameters except the input values. The recognition process is achieved with the function sim, which simulates the trained network on a sample so that its output can be compared with the trained samples.
The testing process shows that the level giving the highest recognition rate is the seventh, confirming that the DWT is an efficient approach that compresses the data and reduces its features, making recognition faster and improving the accuracy of the system. The accuracy of this system is 80%-100% depending on the sample, and the overall accuracy is 90%. The regression of the ANN of the seventh level is approximately 99%.
5 FUTURE WORKS
Several directions are recommended to enhance the voice recognition system using ANNs: improving the accuracy by training the ANNs on more data, or by taking a specific duration of the voice sample to reduce the data and eliminate all unnecessary portions; extending the system to use sentences, not only words, in the training and recognition processes; comparing the accuracy of the system applied to female voices with the same system applied to male voices; and finally, applying the voice recognition system to other types of ANNs.
REFERENCES
[1] B. Juang, L. Rabiner, "Fundamentals of speech recognition", PTR Prentice-Hall, Inc., A Simon and Schuster Company, 1993.
[2] www.physics.otago.ac.nz/internal/elec401/dsp-smith/ch01.pdf. Accessed on 10-12-2010.
[3] R. Adams, "Sourcebook of automatic identification and data collection", Van Nostrand Reinhold, New York, 1990.
[4] D. Colton, "Automatic speech recognition tutorial", 2003.
[5] Clareity, "Voice Recognition Technology: The Perfect Computer Interface for the Real Estate Industry", Clareity Consulting & Communications, Inc., 2004. Accessed on 22-1-2011.
[6] T. Edwards, "Discrete wavelet transforms: Theory and implementation", Technical report, Stanford University, 1991.
[7] www.thepolygoners.com/tutorials/dwavelet/dwttut.html. Accessed on 20-2-2011.
[8] R. Murdock, J. Husseiny, A. Liang, E. Abolrous, S. Rodriguez, "Improvement on speech recognition and synthesis for disabled individuals using fuzzy neural net retrofits", IEEE International Conference on Neural Networks, 24-27 Jul 1988.
[9] S. Nakamura, K. Shikano, "Speaker adaptation applied to HMM and neural networks", International Conference on Acoustics, Speech, and Signal Processing (ICASSP-89), vol. 1, 23-26 May 1989.
[10] J.B. Hampshire II, A.H. Waibel, "A novel objective function for improved phoneme recognition using time-delay neural networks", IEEE Transactions on Neural Networks, vol. 1, no. 2, pp. 216-228, 1990.
[11] K. Ng, Y. Zhou, R. Ng, "A voice controlled robot using neural network", Second Australian and New Zealand Conference on Intelligent Information Systems, 29 Nov-2 Dec 1994.
[12] P.A. Taylor, J.M. Nava, "Speaker independent voice recognition with a fuzzy neural network", Proceedings of the Fifth IEEE International Conference on Fuzzy Systems, vol. 3, 8-11 Sep 1996.
[13] H. Kwan, X. Dong, "Phoneme sequence pattern recognition using fuzzy neural network", Proceedings of the 2003 International Conference on Neural Networks and Signal Processing, vol. 1, 14-17 Dec 2003.
[14] S. Ding, Y. Liu, Y. Toyoda, J. Huang, "Environmental sound recognition by multilayered neural networks", The Fourth International Conference on Computer and Information Technology (CIT '04), 14-16 Sept 2004.
[15] K. Soltani, R. Ainon, "Speech emotion detection based on neural networks", 9th International Symposium on Signal Processing and Its Applications (ISSPA 2007), 12-15 Feb 2007.
[16] J. Azar, E. Yaacoub, M. Al-Alaoui, L. Al-Kanj, "Speech recognition using artificial neural networks and hidden markov models", IMCL2008 Conference, 2008.
[17] A. Hasegawa, H. Kinoshita, K. Kishida, S. Onishi, S. Tanaka, "Construction of individual identification system using voice in three-layered neural networks", International Symposium on Intelligent Signal Processing and Communication Systems (ISPACS 2009), 7-9 Jan 2009.
[18] D. Shahgoshtasbi, "A biological speech recognition system by using associative neural networks", World Automation Congress (WAC), 2010.
[19] V. Moonasar, G.K. Venayagamoorthy, K. Sandrasegaran, "Voice recognition using neural networks", Proceedings of the 1998 South African Symposium on Communications and Signal Processing (COMSIG '98), 7-8 Sep 1998.