American Journal of Computer Science and Engineering 2015; 2(5): 32-37
Published online August 30, 2015 (http://www.openscienceonline.com/journal/ajcse)
A Hybrid Model of MFCC/MSFLA for Speaker Recognition
Majida Ali Abed1, Hamid Ali Abed Alasadi2
1College of Computers Sciences & Mathematics, University of Tikrit, Tikrit, Iraq
2Computers Sciences Department, Education for Pure Science College, University of Basra, Basra, Iraq
Email address
[email protected] (M. A. Abed), [email protected] (H. A. A. Alasadi)
To cite this article
Majida Ali Abed, Hamid Ali Abed Alasadi. A Hybrid Model of MFCC/MSFLA for Speaker Recognition. American Journal of Computer Science and Engineering. Vol. 2, No. 5, 2015, pp. 32-37.
Abstract
In this paper, a speaker recognition system is optimized using a Swarm Intelligence algorithm, the Modified Shuffled
Frog Leaping Algorithm (MSFLA), together with Cepstral analysis and the Mel Frequency Cepstral Coefficients (MFCC) feature
extraction approach. The MSFLA search is applied to speaker recognition and voice matching: the recognition process is
optimized through a fitness function, and voices are matched using only the optimized features selected by the MSFLA.
Using the hybrid MFCC/MSFLA model on the same dataset, the recognition accuracy under three noise conditions (white
Gaussian noise, car noise and B-noise) is 94.02%, 96.78% and 84.33%, respectively.
Keywords
Speaker Recognition, Mel Frequency Cepstral Coefficients (MFCCs), Modified Shuffled Frog Leaping Algorithm (MSFLA)
1. Introduction
Speaker recognition systems became a topic of research
in the early 1970s [1]. Some of the first studies of speaker
recognition were published in 1971 and used feature
extraction techniques that included pitch contours [2], Linear
Prediction (LP), Cepstral analysis, linear prediction error
energy and autocorrelation coefficients. Current speaker
recognition research depends on Cepstral analysis, and the
Mel Frequency Cepstral Coefficients (MFCC) are the most
common short-time feature extraction approach [3].
Speaker recognition comprises speaker identification and
speaker verification, both based on the speaker's voice in the
form of speech. The speech signal carries information about the
spoken message, the speaker and the recording environment. For
speaker recognition, speech data from a speaker is collected
and used to develop a model that captures the
speaker-specific information. For text-independent speaker
recognition, the speech data is usually about one minute in
duration. Speaker models fall into two groups [4]:
(1). Statistical models such as the Gaussian Mixture Model,
Hidden Markov Model, Support Vector Machines
(SVM) and Vector Quantization (VQ).
(2). Neural network models such as the feed-forward
autoassociative network.
These two groups of models are now used as classification
methods in speaker recognition in combination with
evolutionary algorithms, such as genetic algorithms and genetic
programming, and Swarm Intelligence (SI) algorithms, such as
Ant Colony Optimization (ACO), Bee Colony Optimization
(BCO), Cat Swarm Optimization (CSO), the Shuffled Frog
Leaping Algorithm (SFLA) and the Cuckoo Search Algorithm
(CSA). The speaker recognition process is optimized through
the fitness function of these algorithms, and voices are
matched using only the optimized features that the SI
algorithm selects [5, 6]. In this paper we use the Modified
Shuffled Frog Leaping Algorithm (MSFLA). The paper is
organized as follows. Section 2 discusses the principle of
speaker recognition, and Section 3 the feature extraction used
in this paper. Sections 4 and 5 describe the principle of the
MSFLA and the speaker recognition system using the MSFLA,
respectively. The performance of the recognition system is
evaluated and the results are discussed in Section 6.
Section 7 concludes the paper.
2. Speaker Recognition
The speaker recognition task is often divided into two
related applications, shown in Figure (1), each of which can
be text-independent or text-dependent [7]:
• Speaker Identification.
• Speaker Verification.
Speaker identification determines the speaker from a set of
registered speakers. When the result is always the
best-matching speaker in the set, the task is called closed-set
identification; when the result can be either a speaker or a
no-match decision, it is called open-set identification.
Speaker verification determines whether the voice matches a
particular registered speaker; the result is the probability
of a match or a similarity measure [8].
Figure (1). The two essential tasks of speaker recognition.
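The difference between the two tasks can be made concrete with a minimal Python sketch. It assumes that a similarity score between the test utterance and each registered speaker is already available; the speaker labels, score values and thresholds below are hypothetical and chosen only for illustration.

```python
def identify_closed_set(scores):
    """Closed-set identification: always return the best-scoring registered speaker."""
    return max(scores, key=scores.get)

def identify_open_set(scores, threshold):
    """Open-set identification: return the best speaker, or None for a no-match result."""
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else None

def verify(scores, claimed, threshold):
    """Verification: accept or reject one claimed identity."""
    return scores[claimed] >= threshold

# Hypothetical similarity scores for one test utterance (not taken from the paper).
scores = {"spk_a": 0.62, "spk_b": 0.91, "spk_c": 0.40}

closed = identify_closed_set(scores)                    # best of the registered set
open_hit = identify_open_set(scores, threshold=0.80)    # best speaker passes the threshold
open_miss = identify_open_set(scores, threshold=0.95)   # nobody passes: no-match
accepted = verify(scores, "spk_a", threshold=0.80)      # claim of spk_a is tested alone
```

Identification compares the utterance against every registered speaker, whereas verification only tests the single claimed identity against a threshold.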
3. Feature Extraction
The Modified Shuffled Frog Leaping Algorithm (MSFLA) works
only on the best features, so the features must first be
extracted from the voices [9]. Many different speech features
have been shown to be indicative of speaker identity. These
include the following features:
• Linear Prediction Cepstral Coefficients (LPCCs).
• Maximum Autocorrelation Value (MACV).
• Mel Frequency Cepstral Coefficients (MFCCs).
In our research we use the Mel Frequency Cepstral
Coefficients (MFCCs), which are extracted from the spectrum.
The reason for using this feature is that in many
applications speaker identification is a precursor to speech
recognition, which identifies what is being said. Among the
possible features, MFCCs have proved to be the most
successful and robust features for speech recognition [10].
The features are extracted from the input voice, which takes
the form of a spectrogram showing how the frequency content
varies over time. Fourier-Bessel Cepstral Coefficients (FBCC)
based feature extraction has also been reported to improve
accuracy and efficiency compared with LPCC and MACV
features [11].
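The usual MFCC pipeline (pre-emphasis, framing and windowing, power spectrum, mel filterbank, logarithm, discrete cosine transform) can be sketched in Python with NumPy as follows. This is an illustrative sketch and not the implementation used in our simulation: the sampling rate, frame sizes, number of filters, number of coefficients and the 0.97 pre-emphasis factor are all assumed values that the paper does not specify.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):            # rising edge of the triangle
            fbank[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):           # falling edge of the triangle
            fbank[i - 1, k] = (right - k) / max(right - center, 1)
    return fbank

def mfcc(signal, sample_rate=8000, frame_len=0.025, frame_step=0.010,
         n_fft=512, n_filters=26, n_ceps=13):
    """Return one row of cepstral coefficients per frame (all defaults are assumptions)."""
    # Pre-emphasis boosts the high frequencies.
    emphasized = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    # Split into overlapping frames and apply a Hamming window.
    flen = int(round(frame_len * sample_rate))
    fstep = int(round(frame_step * sample_rate))
    n_frames = 1 + max(0, (len(emphasized) - flen) // fstep)
    idx = np.arange(flen)[None, :] + fstep * np.arange(n_frames)[:, None]
    frames = emphasized[idx] * np.hamming(flen)
    # Power spectrum, mel filterbank energies, logarithm.
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    fbank = mel_filterbank(n_filters, n_fft, sample_rate)
    log_energies = np.log(power @ fbank.T + 1e-10)
    # DCT-II of the log energies; keep the first n_ceps coefficients.
    n = np.arange(n_filters)
    basis = np.cos(np.pi * np.arange(n_ceps)[:, None] * (2 * n + 1) / (2.0 * n_filters))
    return log_energies @ basis.T

# Usage on a synthetic one-second tone (a stand-in for a recorded utterance).
t = np.arange(8000) / 8000.0
tone = np.sin(2 * np.pi * 440.0 * t)
features = mfcc(tone)
```

Each row of `features` describes one short frame of the signal, so an utterance becomes a matrix of cepstral coefficients that the later matching stage can work on.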
4. Modified Shuffled Frog Leaping Algorithm (MSFLA)
The Shuffled Frog Leaping Algorithm (SFLA) and its modified
form, the Modified Shuffled Frog Leaping Algorithm (MSFLA),
are recently developed nature-inspired methods [12-16],
characterized by strong global search capability and ease of
implementation. The MSFLA combines the advantages of the
Genetic Algorithm (GA) and Particle Swarm Optimization (PSO);
its procedure is shown in Figure (2).
Figure (2). Modified Shuffled Frog Leaping Algorithm.
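As an illustration of the frog-leaping search, the following Python sketch implements the baseline SFLA in the spirit of [12]: the population is ranked, dealt into memeplexes, and the worst frog of each memeplex leaps toward the memeplex best, then toward the global best, and is replaced by a random frog if neither leap improves it. The specific modifications of the MSFLA are not reproduced here. The function name `sfla_minimize`, all parameter values, the omission of a maximum step size and the sphere test function are assumptions made for the example.

```python
import numpy as np

def sfla_minimize(fitness, dim, bounds, n_memeplexes=5, frogs_per_memeplex=6,
                  local_steps=5, n_shuffles=30, seed=0):
    """Baseline shuffled frog leaping search (simplified; parameters are assumptions)."""
    rng = np.random.default_rng(seed)
    low, high = bounds
    n_frogs = n_memeplexes * frogs_per_memeplex
    frogs = rng.uniform(low, high, size=(n_frogs, dim))
    scores = np.array([fitness(f) for f in frogs])
    history = []
    for _shuffle in range(n_shuffles):
        # Shuffle step: rank all frogs, then deal them into memeplexes like cards.
        order = np.argsort(scores)
        frogs, scores = frogs[order], scores[order]
        history.append(scores[0])
        global_best = frogs[0].copy()
        for m in range(n_memeplexes):
            members = np.arange(m, n_frogs, n_memeplexes)
            for _step in range(local_steps):
                local = members[np.argsort(scores[members])]
                best_i, worst_i = local[0], local[-1]
                # Leap toward the memeplex best first, then toward the global best.
                for target in (frogs[best_i], global_best):
                    candidate = frogs[worst_i] + rng.random(dim) * (target - frogs[worst_i])
                    candidate = np.clip(candidate, low, high)
                    cand_score = fitness(candidate)
                    if cand_score < scores[worst_i]:
                        break
                else:
                    # Neither leap helped: replace the worst frog with a random one.
                    candidate = rng.uniform(low, high, size=dim)
                    cand_score = fitness(candidate)
                frogs[worst_i], scores[worst_i] = candidate, cand_score
    best = int(np.argmin(scores))
    return frogs[best], scores[best], history

# Usage on a simple test function (the sphere function, chosen only for illustration).
sphere = lambda x: float(np.sum(x ** 2))
best_pos, best_score, history = sfla_minimize(sphere, dim=4, bounds=(-5.0, 5.0))
```

Only the worst frog of a memeplex is ever moved, so the best fitness in the population never gets worse from one shuffle to the next. In a speaker recognition setting the fitness function would score how well a candidate set of features matches the stored ones.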
Figure (3). Process of our proposed Speaker Recognition.
5. Voices Speaker Matching
After the feature extraction stage, the extracted voice
features of the registered speakers are stored, and these
stored features must be matched against the features of the
input voice. We use the correlation between them: the stored
features closest to the extracted input features identify the
matched speaker. To avoid forcing a match at every stage of
the system, especially when the speaker is not legitimate, a
small base value is used as a threshold to reject an
illegitimate speaker. It specifies a probability ratio that
denotes the degree of match in speaker recognition. The voice
is then either accepted or rejected. Acceptance means that the
speaker is legitimate because the voice is matched; otherwise
the voice is rejected. A match between the input voice and a
database voice is declared when their correlation is high; a
value below the threshold is ignored, and the speaker is
denied access. In this paper text-dependent speaker
recognition is used, in which the enrollment and test
passwords are the same [17]. Figure (3) shows the process of
our proposed speaker recognition using the Modified Shuffled
Frog Leaping Algorithm.
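The accept or reject decision described above can be sketched in Python as follows. The sketch assumes that each speaker is represented by a single stored feature vector and that similarity is measured with the Pearson correlation coefficient; the function name `match_speaker`, the threshold of 0.8 and the feature values are hypothetical and serve only as an example.

```python
import numpy as np

def match_speaker(input_features, stored_features, threshold=0.8):
    """Return (speaker_id, correlation, accepted) for the closest stored template."""
    best_id, best_corr = None, -1.0
    for speaker_id, template in stored_features.items():
        # Pearson correlation between the input features and one stored template.
        corr = float(np.corrcoef(input_features, template)[0, 1])
        if corr > best_corr:
            best_id, best_corr = speaker_id, corr
    # The speaker is accepted only if the best correlation reaches the threshold.
    return best_id, best_corr, best_corr >= threshold

# Hypothetical stored templates and two test inputs (not data from our database).
stored = {
    "speaker_01": np.array([1.0, 2.0, 3.0, 4.0]),
    "speaker_02": np.array([4.0, 1.0, 3.0, 2.0]),
}
probe = np.array([1.1, 2.1, 2.9, 4.2])        # close to speaker_01
impostor = np.array([4.0, 3.0, 2.0, 1.0])     # close to neither template

speaker, corr, accepted = match_speaker(probe, stored)
_, impostor_corr, impostor_accepted = match_speaker(impostor, stored)
```

The probe is matched to `speaker_01` and accepted, while the impostor's best correlation stays below the threshold and access is denied.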
6. Simulation and Results
This section describes the simulation, carried out in
MATLAB, and discusses its results. The database of our system
contains different utterances from 40 different speakers, both
male and female (examples are shown in Figure (4)), and each
speaker uttered 5 different sentences.
Figure (4). Speaker Signal examples (a) Male (b) Female.
The database requires that the extracted features of each
user correspond to the different utterances. In our work the
Mel Frequency Cepstral Coefficients (MFCC), the popular
acoustic features used in speech recognition systems, are used
for the different speech data. The feature database of the
utterances is built using MFCC in order to make a robust
speech recognizer for different users and to let the MSFLA
work efficiently. The MSFLA accesses the extracted features to
search for the best match. Different types of noise (white
Gaussian noise, car noise and B-noise) are added to the
utterances, the features of the noisy signal are extracted,
and the MSFLA finds the best match for these features with
respect to the feature database and reports it. The best
recognition accuracy is obtained using MFCC features with the
MSFLA; the results for the various noise conditions on the
same dataset are shown in Figure (5). The recognition accuracy
for added white Gaussian noise, car noise and B-noise is
94.02%, 96.78% and 84.33%, respectively.
Figure (5). Simulation results for different types of noises.
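The noise experiment can be illustrated with a short Python sketch that adds white Gaussian noise to a clean signal at a chosen signal-to-noise ratio and computes a recognition accuracy as the percentage of correct decisions. The SNR used in our simulation is not stated in the paper, so the 10 dB value, the synthetic tone and the label lists below are assumptions for illustration only.

```python
import numpy as np

def add_white_noise(signal, snr_db, seed=0):
    """Add white Gaussian noise so that the result has roughly the requested SNR."""
    rng = np.random.default_rng(seed)
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise

def recognition_accuracy(predicted, actual):
    """Percentage of test utterances assigned to the correct speaker."""
    return 100.0 * np.mean(np.array(predicted) == np.array(actual))

# Usage: a synthetic tone stands in for a clean utterance (assumed 10 dB SNR).
clean = np.sin(2 * np.pi * 440.0 * np.arange(8000) / 8000.0)
noisy = add_white_noise(clean, snr_db=10.0)
measured_snr = 10.0 * np.log10(np.mean(clean ** 2) / np.mean((noisy - clean) ** 2))

# Hypothetical decisions for four test utterances, three of them correct.
accuracy = recognition_accuracy(["a", "b", "c", "d"], ["a", "b", "c", "x"])
```

Car noise and B-noise would be mixed in the same way, with a recorded noise segment scaled to the target power in place of the generated Gaussian samples.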
7. Conclusion
This paper is based on a Swarm Intelligence algorithm, the
Modified Shuffled Frog Leaping Algorithm (MSFLA). The aim of
using this algorithm in biometrics is to identify an
individual by a distinctive characteristic such as the voice.
The MSFLA search has been applied to speaker recognition and
voice matching. By applying this algorithm, the speaker
recognition process is optimized through a fitness function,
and voices are matched using only the optimized features
selected by the MSFLA. The recognition accuracy is found to be
best using the hybrid model of MFCC/MSFLA (MFCC features with
the MSFLA) under various noise conditions. This work treats
the hybrid MFCC/MSFLA model as a system reliability
optimization with a multi-criteria approach, which may provide
useful insights into patterns of interaction among
articulatory-acoustic feature dimensions in future work.
References
[1] D. Ververidis, C. Kotropoulos, “Gaussian mixture modeling by exploiting the Mahalanobis distance”, IEEE Transactions on Signal Processing, Vol. 56, No. 7, July 2008.
[2] K. Sri Rama Murty, B. Yegnanarayana, “Combining evidence from residual phase and MFCC features for speaker recognition”, IEEE Signal Processing Letters, Vol. 13, No. 1, Jan. 2006.
[3] S. R. M. Prasanna, S. G. Cheedella, B. Yegnanarayana, “Extraction of speaker-specific excitation information from linear prediction residual of speech”, Speech Communication, Vol. 48, Issue 10, October 2006.
[4] S. Chakroborty, A. Roy, S. Majumdar, G. Saha, “Capturing Complementary Information via Reversed Filter Bank and Parallel Implementation with MFCC for Improved Text-Independent Speaker Identification”, International Conference on Computing Theory and Applications, March 2007.
[5] Y. Liu, M. Russell, M. Carey, “The Role of Dynamic Features in Text-Dependent and Independent Speaker Verification”, IEEE International Conf. on Acoustics, Speech and Signal Processing (ICASSP), Vol. 1, May 2006.
[6] E. Elbeltagi, T. Hegazy, D. Grierson, “Comparison among five evolutionary based optimization algorithms”, Advanced Engineering Informatics, Vol. 19, Jan. 2005.
[7] D. A. Reynolds, “Speaker identification and verification using Gaussian mixture models”, Speech Communication, Vol. 17, Aug. 1995.
[8] W. C. Chu, “Speech Coding Algorithms”, John Wiley & Sons, Vol. 4, USA, 2003.
[9] S. P. Kishore, B. Yegnanarayana, “Speaker verification: Minimizing the channel effects using auto associative neural network models”, in Proceedings of IEEE Int. Conf. Acoust., Speech, and Signal Processing, Istanbul, 2000.
[10] M. Shajith Ikbal, Hemant Misra, B. Yegnanarayana, “Analysis of auto associative mapping neural networks”, in Int. Joint Conf. on Neural Networks, Washington, USA, 1999.
[11] B. Wildermoth, K. K. Paliwal, “Use of voicing and pitch information for speaker recognition”, in Use of Voicing and Pitch Information for Speaker Recognition, 2000.
[12] M. M. Eusuff, K. E. Lansey, “Optimization of water distribution network design using the shuffled frog leaping algorithm”, Journal of Water Resources Planning and Management, Vol. 129, No. 3, 2003.
[13] Taher Niknam, Ehsan Azad Farsani, “A hybrid self-adaptive particle swarm optimization and modified shuffled frog leaping algorithm for distribution feeder reconfiguration”, Engineering Applications of Artificial Intelligence, 2010.
[14] B. Amiri, M. Fathian, A. Maroosi, “Application of shuffled frog-leaping algorithm on clustering”, International Journal of Advanced Manufacturing Technology, Vol. 45, 2009.
[15] X. H. Luo, Y. Yang, X. Li, “Modified shuffled frog-leaping algorithm to solve traveling salesman problem”, Journal of Communications, Vol. 30, Jul. 2009.
[16] A. Khorsandi, A. Alimardani, B. Vahidi, S. H. Hosseinian, “Hybrid shuffled frog leaping algorithm and Nelder–Mead simplex search for optimal reactive power dispatch”, IET Generation, Transmission & Distribution, Vol. 5, No. 2, 2011.
[17] H. B. Kekre, Vaishali Kulkarni, Prashant Gaikar, Nishant Gupta, “Speaker Identification using Spectrograms of Varying Frame Sizes”, International Journal of Computer Applications, Vol. 50, No. 20, July 2012.