Speech Recognition, Noise Filtering and Content Search Engine: Research Document
Ultimate Speech Search
ABSTRACT
In the modern era, people seek information wherever they can find it, and they want to find it efficiently. They search for knowledge about past events as well as present ones, and finding a particular piece of information usually involves a search engine. When people want to learn from speeches or lectures given by others, however, they are forced into a lengthy search with no guarantee of finding the right result. A search engine that returned the required speeches and lectures directly would be a great help to their work.
This project aims to build a search engine that can search speeches and lectures by their content. Every search engine supports searching, but the results may be irrelevant. The user has to go through them one by one and, at the end of the day, may end up with nothing. The main goal of this project is to provide a facility to search by content.
This research covers three areas:
- converting speech into text, with some noise analysis;
- maintaining a database with clustered indexing;
- a simple content-based search facility.

The system to be built operates on limited data, namely speeches and lectures recorded in a low-noise environment. As a future enhancement, it could be extended to search music or any other sound stream by analysing its spectrum, with a user-friendly search facility.
KEY WORDS: Search Engine, Speeches, Lectures, Noise Analysis, Content, Spectrum
ACKNOWLEDGEMENTS
My sincere gratitude goes to my grandfather, who taught me the ways of life, raised me from childhood to my teenage years, and left me in May.
I would like to thank my friends, who helped me in my difficult times and praised me in my good times. I would like to thank my college teachers, who disciplined me with the cane to make me a good man and gave me the knowledge to face society.
I would like to thank my sister, who has always been a mother to me, and I would like to express my gratitude to my supervisor, Mrs. Nadeera Ahangama, who guided me throughout the project.
Finally, I would like to thank the APIIT staff, who provided us with the facilities necessary to pursue our higher education and make it a success.
Table of Contents
ABSTRACT
ACKNOWLEDGEMENTS
List of Figures
List of Equations
List of Tables
INTRODUCTION
1.1 Project Background
1.2 Problem Description
1.3 Project Overview
1.3.1 Noise analysis
1.3.2 Speech recognition
1.3.3 Speech to text conversion
1.3.4 The database
1.3.5 The search engine
1.4 Project Scope
1.5 Project Objectives
RESEARCH
2.1 Speech Recognition
2.2 Speech recognition methods
2.2.1 Hidden Markov methods in speech recognition
2.2.2 Client side speech recognition
2.2.5 Continuous speech recognition
2.2.6 Direct Speech Recognition
2.3 Speaker Characteristics
2.3.1 Speaker Dependent
2.3.2 Speaker Independent
2.3.3 Conclusion
2.4 Speech Recognition mechanisms
2.4.1 Isolated word recognition
2.4.2 Continuous speech recognition
2.4.3 Conclusion
2.5 Vocabulary Size
2.5.1 Limited Vocabulary
2.5.2 Large Vocabulary
2.5.3 Conclusion
2.6 Speech recognition APIs
2.6.1 Microsoft Speech API 5.3
2.6.2 Java Speech API
2.7 Speech Recognition Algorithms
2.8 Noise Filtering
2.8.1 Wiener filtering
2.8.2 Conclusion
2.9 Database and data structure
2.9.1 Conclusion
2.10 Search Engine
2.11 MATLAB
ANALYSIS
3.0 System requirements
3.1.1 Functional requirements
3.1.2 Non-functional requirements
3.1.3 Software Requirements
3.1.4 Hardware requirements
3.2 System Development Methodologies
3.2.1 Rational Unified Process
3.2.2 Agile Development Method
3.2.3 Scrum Development Methodology
3.3 Test Plan
3.3.1 System testing
SYSTEM DESIGN
4.1 Use Case Diagram
4.2 Use case description
4.2.1 Use case description for file upload
4.2.2 Use Case description for play an audio file
4.2.3 Use Case description for search
4.2.4 Use Case description for noise reduced output
4.2.5 Use Case description for noise filtering
4.3 Activity Diagrams
4.3.1 Activity Diagram for Speech Recognition System
4.3.2 Activity Diagram for Noise filtering
4.4 Sequence Diagrams
4.4.1 Select a file
4.4.2 Play wav file
4.4.3 Speech recognition pre stage
4.4.4 Speech Recognition post stage
4.5 Class Diagrams
4.5.1 GUI and the system
4.5.2 Speech recognition
4.6 Noise Filtering
4.7 Code to filter noise in C Language
CHAPTER 5
5.0 Implementation
CHAPTER 6
6.0 Test Plan
6.1 Background
6.2 Introduction
6.3 Assumptions
6.4 Features to be tested
6.5 Suspension and resumption criteria
6.6 Environmental needs
6.7 System testing
6.8 Unit testing
6.9 Performance Testing
6.10 Integration Testing
CHAPTER 7
CRITICAL EVALUATION AND FUTURE ENHANCEMENTS
7.1 Critical evaluation
7.2 Suggestions for future enhancements
8.0 Conclusion
REFERENCES
BIBLIOGRAPHY
APPENDIX A
APPENDIX B
Gantt chart
List of Figures
Figure 1: Overview of Steps in Speech Recognition
Figure 2: Graphical Overview of the Recognition Process
Figure 3: Components of a typical speech recognition system
Figure 4: Example of HMM for word “Yes” on an utterance
Figure 5: Overview of Microsoft Speech Recognition API
Figure 6: Java Sound API Architecture
Figure 7: JSGF Architecture
Figure 8: Noise in Speech
Figure 9: Database Indexing
Figure 10: Google Architecture
Figure 11: Phases in RUP
Figure 12: Overview of Agile
Figure 13: Scrum Overview
Figure 15: Use Case Diagram for System
Figure 16: Speech Recognition
Figure 17: Activity Diagram Noise Filtering
Figure 18: Sequence Diagram Select a file
Figure 19: Sequence Diagram Play File
Figure 20: Sequence Diagram SR Pre Stage
Figure 21: Sequence Diagram SR Post Stage
Figure 22: Class Diagrams GUI & System
Figure 23: Class Diagram SR System
Figure 24: Speech Search Class Diagram
Figure 25: SR Engine
Figure 26: Open file
Figure 27: Text output
Figure 28: Speech Search Engine
List of Equations
Equation 1: First order Markov chain
Equation 2: Stationary states transition
Equation 3: Observations independence
Equation 4: Observation sequence
Equation 5: Left-right topology constraints
Equation 6: CSR Equations
List of Tables
Table 1: Typical parameters used to characterize the capability of speech recognition systems
Table 2: Comparison of different techniques in speech recognition
Table 3: Isolated word recognition
Table 4: Use Case description file upload
Table 5: Use Case description play audio
Table 6: Use Case description search
Table 7: Use Case description noise reduction
Table 8: Use Case description noise process
Table 9: Test Case 1
Table 10: Test Case 2
Table 11: Test Case 3
Table 12: Test Case 4
Table 13: Test Case 5
Table 14: Test Case 6
Table 15: Performance testing on Windows XP
Table 16: Performance testing on Ubuntu
CHAPTER 1
INTRODUCTION
1.1 Project Background
Throughout the history of human civilization, time has played a key role. Humanity has achieved technological advances and scientific breakthroughs, and unfortunately also suffered setbacks, within certain time frames. In many cases these time frames were set by nature.
In truth, we now live in an advanced era compared to prehistoric times. We are all actors in another act of a chronicle play in our time. Because of globalization, distances on this planet are narrowing. People are forced to accomplish objectives and goals within ever shorter time limits, and most of the time they lack the time needed to make them a success.
When members of society are asked to accomplish a goal, they may turn to research, interviews or various other fact-finding techniques. Suppose they need to find certain information in lectures and speeches. Can they find the appropriate resource material in minimum time and with minimum effort?
They have to go through many search results and devote much of their valuable time to a worthless task. If lectures and speeches could be found by searching their content, we could save that valuable time and invest it in deeds for the sake of planet Earth.
1.2 Problem Description
The problem is to provide users with a search engine that searches lectures and speeches by their content, for various purposes.
To do this, we have to find sound solutions to the challenges that arise throughout this process. They are as follows.
Noise analysis: We have to analyze the nature of the speech or lecture. Speeches and lectures may be recorded under various environmental conditions, and the resulting noise directly affects the vocal part of the recording. We therefore have to reduce the noise as much as possible.
Speech recognition: Speech recognition is a vast area. Speeches are given by many people with different accents, and each individual has his or her own accent when speaking English or any other language. To recognize the words they speak, we have to carry out in-depth research in order to build a speech recognition server that overcomes this challenge.
Speech to text conversion: Speech to text conversion is one of the key areas of this project, because it is the basis for building the database that contains the text versions of speeches and lectures.
The database: All the converted versions of the speeches and lectures will be saved in the database.
The search engine: This is another challenging area of the project. The search engine will show the appropriate search results from the database. I need to find searching mechanisms and methods that give the user efficient and accurate results.
The database and the search engine are two parallel problems that need to be developed precisely and together. Without a proper structure for the database, it is tedious to implement search functionality.
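One common way to give a transcript database a structure that makes search straightforward is an inverted index. This maps each word to the transcripts, and the word positions, where it occurs.

The sketch below is a minimal in-memory illustration under that assumption. It is not taken from this project. The names `tokenize`, `build_index` and `lookup`, and the sample transcripts, are hypothetical.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def build_index(transcripts):
    """Map each word to {speech_id: [word positions]} across all transcripts."""
    index = defaultdict(lambda: defaultdict(list))
    for speech_id, text in transcripts.items():
        for position, word in enumerate(tokenize(text)):
            index[word][speech_id].append(position)
    return index

def lookup(index, word):
    """Return the sorted ids of the transcripts that contain the word."""
    # .get() avoids creating empty entries in the defaultdict for unknown words.
    return sorted(index.get(word.lower(), {}))

# Hypothetical transcripts keyed by speech id.
transcripts = {
    "lecture-01": "Speech recognition converts speech into text",
    "speech-02": "Noise filtering improves speech recognition accuracy",
}
index = build_index(transcripts)
print(lookup(index, "recognition"))  # ['lecture-01', 'speech-02']
print(lookup(index, "noise"))        # ['speech-02']
```

Storing word positions as well as speech ids would also make phrase queries possible, where the words must appear consecutively. In a real system the index would be persisted rather than rebuilt in memory.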
1.3 Project Overview
The main challenge area of this project is to build the database containing the text version of
speeches and lectures. In order to accomplish these phenomena we have to perform some
tasks.
1.3.1 Noise analysis
A noise analysis will be performed to ensure efficient speech to text conversion. This will enable us to isolate the human voice and remove the background environment from the audio file. Background noise may include tape hiss, electric fans, hum, etc.
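Electrical hum is narrow-band, so one standard way to attenuate it is a notch filter centred on the mains frequency.

The sketch below makes several assumptions:
- 50 Hz mains;
- 16 kHz mono samples held in a Python list;
- the biquad notch coefficients from Robert Bristow-Johnson's Audio EQ Cookbook.

It is an illustration only, not the noise analysis this project actually uses. Broadband noise such as tape hiss or fan noise needs a different technique. The function names and the `q` value are hypothetical choices.

```python
import math

def notch_filter(samples, sample_rate, notch_hz=50.0, q=30.0):
    """Attenuate a narrow band around notch_hz with a biquad notch filter."""
    w0 = 2.0 * math.pi * notch_hz / sample_rate
    alpha = math.sin(w0) / (2.0 * q)
    cos_w0 = math.cos(w0)
    a0 = 1.0 + alpha
    # Feed-forward (b) and feedback (a) coefficients, normalised by a0.
    b0, b1, b2 = 1.0 / a0, -2.0 * cos_w0 / a0, 1.0 / a0
    a1, a2 = -2.0 * cos_w0 / a0, (1.0 - alpha) / a0
    x1 = x2 = y1 = y2 = 0.0  # filter state: the two previous inputs and outputs
    out = []
    for x in samples:
        y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out

def rms(values):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(v * v for v in values) / len(values))

rate = 16000
# Two seconds of synthetic 50 Hz hum.
hum = [math.sin(2 * math.pi * 50 * n / rate) for n in range(2 * rate)]
cleaned = notch_filter(hum, rate)
# Measure the last half second, after the filter's start-up transient has decayed.
print(rms(hum), rms(cleaned[-rate // 2:]))
```

A higher `q` gives a narrower notch, so less of the nearby speech band is affected. The trade-off is a longer start-up transient.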
1.3.2 Speech recognition
Speech recognition comes in two flavours: speaker dependent and speaker independent. Because the speaker or lecturer varies from one recording to another, the project uses speaker-independent speech recognition.
1.3.3 Speech to text conversion
The system converts speech into text format in order to build the database. The database consists of the converted text versions of the speeches and lectures.
1.3.4 The database
The database consists of two parts: the converted (speech to text) version of each speech or lecture file, and the original source files containing the audio.
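The sketch below shows one way each database entry could pair these two parts, the transcript and the location of the source audio.

It rests on several assumptions:
- SQLite, through Python's standard `sqlite3` module;
- a hypothetical table named `speeches`;
- audio kept as a file path rather than stored in a separate database.

The project does not specify a database engine or schema here.

```python
import sqlite3

def create_store(path=":memory:"):
    """Create a table pairing each transcript with its source audio file."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS speeches (
               id INTEGER PRIMARY KEY,
               title TEXT NOT NULL,
               audio_path TEXT NOT NULL,   -- original recording, e.g. a .wav file
               transcript TEXT NOT NULL    -- speech-to-text output
           )"""
    )
    return conn

def add_speech(conn, title, audio_path, transcript):
    """Insert one speech and return its generated id."""
    cur = conn.execute(
        "INSERT INTO speeches (title, audio_path, transcript) VALUES (?, ?, ?)",
        (title, audio_path, transcript),
    )
    conn.commit()
    return cur.lastrowid

def audio_for(conn, speech_id):
    """Return the audio path stored for a speech id, or None if it is unknown."""
    row = conn.execute(
        "SELECT audio_path FROM speeches WHERE id = ?", (speech_id,)
    ).fetchone()
    return row[0] if row else None

conn = create_store()
sid = add_speech(conn, "Intro lecture", "audio/intro.wav", "welcome to the course")
print(audio_for(conn, sid))  # audio/intro.wav
```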
1.3.5 The search engine
The search engine searches the content of the speeches and lectures in the database and returns the matching results. Some form of summarization may also be needed, so that the user can search the content more easily by typing a sentence or a word.
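The search step can be illustrated with a simple relevance ranking. The sketch below scores each transcript against a typed word or sentence using TF-IDF weighting.

TF-IDF is a standard information-retrieval technique chosen here for illustration. It is an assumption, not a method stated by the project. The function names and sample data are hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def search(transcripts, query):
    """Rank transcript ids by the summed TF-IDF weight of the query words."""
    docs = {doc_id: Counter(tokenize(text)) for doc_id, text in transcripts.items()}
    n_docs = len(docs)
    scores = {}
    for doc_id, counts in docs.items():
        length = sum(counts.values())
        score = 0.0
        for word in set(tokenize(query)):
            if counts[word] == 0:  # a Counter returns 0 for missing words
                continue
            df = sum(1 for c in docs.values() if c[word] > 0)  # document frequency
            tf = counts[word] / length
            idf = math.log(n_docs / df) + 1.0  # +1 keeps words found everywhere
            score += tf * idf
        if score > 0:
            scores[doc_id] = score
    # Highest score first; transcripts matching no query word are left out.
    return sorted(scores, key=scores.get, reverse=True)

transcripts = {
    "lecture-01": "speech recognition converts speech into text",
    "lecture-02": "noise filtering for speech",
    "lecture-03": "database indexing and search",
}
print(search(transcripts, "speech recognition"))  # ['lecture-01', 'lecture-02']
```

This version rescans every transcript on each query. A larger system would compute the document frequencies once from an index.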
1.4 Project Scope
Existing search engines do not provide a facility to search for a speech by its content; this system provides that facility. The system contains data about English speeches and lectures.
These speeches and lectures were recorded in a low-noise environment, because the system performs only limited noise analysis. The system will not store music, because music would require far more noise analysis than recordings made in a low-noise environment.
The speech recognition engine to be built supports only English speeches and lectures, and so does the noise analysis.
The system will convert speeches and lectures (low noise) to text format. Once development is complete, users will be able to search for a required result from anywhere on the planet.
Speaker-independent speech recognition will be used, because the system deals with different types of speeches given by different people with different accents.
1.5 Project Objectives
1.0 Noise analysis and reduction
The system will perform noise filtering, which helps the speech recognition process. The noisy signal channel will be analyzed and split into two parts, and the amplitude of the noisy part will be set to a low value. An efficient noise filtering mechanism will be used.
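One simple reading of this objective is a frame-based noise gate. It splits the signal into speech-bearing and noise-only parts, then lowers the amplitude of the noisy part.

The sketch below assumes mono samples in a Python list. The parameters `frame_len` (160 samples, which is 10 ms at an assumed 16 kHz rate), `threshold` and `attenuation` are hypothetical values chosen for illustration. This is neither a Wiener filter nor the project's actual filter.

```python
import math

def noise_gate(samples, frame_len=160, threshold=0.02, attenuation=0.1):
    """Scale down frames whose RMS level falls below threshold.

    Frames at or above the threshold are treated as speech and kept unchanged.
    Quieter frames are treated as background noise and multiplied by
    `attenuation`, so their amplitude is set to a low value.
    """
    out = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(s * s for s in frame) / len(frame))
        gain = 1.0 if rms >= threshold else attenuation
        out.extend(s * gain for s in frame)
    return out

# One loud "speech" frame followed by one quiet "noise" frame.
gated = noise_gate([0.5] * 160 + [0.01] * 160)
print(gated[0], gated[-1])
```

A fixed threshold is the weak point of this approach. Practical systems usually estimate the noise level from pauses in the recording, and ramp the gain smoothly to avoid audible clicks.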
2.0 Continuous speech recognition system
To develop an efficient speech recognition engine that converts speeches and lectures to text format. Speeches given by various people will be translated into text format.
3.0 The Database
Database implementation: the converted versions of the speeches and lectures will be stored in the database in text format, and the corresponding speech or lecture audio will be stored in another database.
4.0 The search engine
The search engine searches the content of the speeches and lectures in the database and returns the matching results. Some form of summarization may also be needed, so that the user can search the content more easily by typing a sentence or a word.
CHAPTER 2
RESEARCH
2.1 Speech Recognition
The process of converting an acoustic signal captured by a telephone, a microphone or any other audio device into a set of words is called speech recognition. Speech recognition is used in command-based applications such as data entry and control systems, document preparation, automation of telephone relay systems, mobile devices such as mobile phones, and aids for people with hearing disabilities.
According to Professor Todd Austin (2007), speech recognition is the task of translating an acoustic waveform representing human speech into its corresponding textual representation.
Source: Austin, T. (2007). Speech Recognition. Available: http://cccp.eecs.umich.edu/research/speech.php. Last accessed 17 July 2009.
Figure 1: Overview of Steps in Speech Recognition
Applications that support speech recognition are “introduced on a weekly basis and speech technology are rapidly entering new technical domains and new markets” (Java Speech API Programmer's Guide, 1998).
According to Zue et al. (2003), speech recognition is a process that converts an acoustic signal, which can be captured by a microphone, into a set of words. Speech recognition systems can be characterized by many parameters.
Parameter | Range
Speaking mode | Isolated words to continuous speech
Speaking style | Read speech to spontaneous speech
Enrolment | Speaker dependent to speaker independent
Vocabulary | Small (<20 words) to large (>20,000 words)
Language model | Finite state to context sensitive
Perplexity | Small (<10) to large (>100)
SNR | High (>30 dB) to low (<10 dB)
Transducer | Voice-cancelling microphone to telephone
Table 1: Typical parameters used to characterize the capability of speech recognition systems
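The SNR row of Table 1 is expressed in decibels: SNR = 10 log10(Ps / Pn), where Ps and Pn are the mean powers of the speech and of the noise. The Java sketch below illustrates only that formula. It assumes the clean speech and the noise are available as separate sample arrays. That is possible for synthetic test data but not for a real recording. The class name `Snr` is hypothetical.

```java
/** Signal-to-noise ratio helpers (hypothetical class name, illustrative only). */
final class Snr {
    private Snr() {}

    /** Mean power: the average of the squared sample values. */
    static double meanPower(double[] x) {
        double sum = 0.0;
        for (double v : x) sum += v * v;
        return sum / x.length;
    }

    /** SNR in decibels: 10 * log10(P_signal / P_noise). */
    static double snrDb(double[] signal, double[] noise) {
        return 10.0 * Math.log10(meanPower(signal) / meanPower(noise));
    }

    public static void main(String[] args) {
        double[] speech = {0.5, -0.5, 0.5, -0.5};     // mean power 0.25
        double[] noise = {0.05, -0.05, 0.05, -0.05};  // mean power 0.0025
        System.out.println(snrDb(speech, noise));     // power ratio 100, i.e. about 20 dB
    }
}
```

In Table 1 terms, a recording at 30 dB has speech power 1000 times the noise power. At 10 dB, the speech power is only 10 times the noise power.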
According to Hosom et al. (2003), “The dominant technology used in Speech Recognition is called the Hidden Markov Model (HMM)”. There are four basic steps in performing speech recognition, which can be seen in the figure below.
[Source: Hosom et al., 1999]
Figure 2 : Graphical Overview of the Recognition Process
During the past few years speech recognition systems have achieved remarkable success, with recognition accuracy sometimes exceeding 98 percent. However, such accuracy was achieved in quiet environments and with sample words that were used in training. A good speech recognition system must be able to achieve good performance in many circumstances, such as a noisy environment. Noise comes in many flavors.
Air conditioners, fans, radios, coughs, tape hiss, crosstalk, channel distortion, lip smacks, breath noise, pops and sneezes are the basic sources that make an environment noisy.
The typical components of a speech recognition system are training data, an acoustic model, a language model, a training model, a lexical model, the speech signal, representation, model classification, search, and recognized words.
The figure below shows how these components are arranged in a speech recognition system.
Figure 3: Components of a typical speech recognition system.
2.2 Speech recognition methods
Only a few speech recognition methods are prevalent. They can be categorized into methods for mobile devices and methods for standalone applications.
2.2.1 Hidden Markov methods in speech recognition
Andrey Markov is the founder of the Markov process. A Markov model is a probabilistic model defined over a finite set of states.
When a state transition occurs, the process generates a symbol. A hidden Markov model has a finite-state Markov chain and a finite set of output probability distributions. The following constraints apply to hidden Markov models in speech recognition systems.
1 – First order Markov chain.
This is based on the assumption that the probability of a transition to a state depends only on the current state:
$$P(q_{t+1} = S_j \mid q_t = S_i, q_{t-1} = S_k, q_{t-2} = S_w, \ldots, q_{t-n} = S_z) = P(q_{t+1} = S_j \mid q_t = S_i)$$
Equation 1 : First order Markov chain
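Equation 1 means that the probability of a whole state sequence factorizes into an initial probability multiplied by one transition probability per step. The Java sketch below shows this for a small chain. The class name and the 0-based state indices are illustrative assumptions. A real recognizer would add log probabilities instead of multiplying, to avoid numerical underflow on long utterances.

```java
/** Probability of a state sequence under a first-order, stationary Markov chain. */
final class MarkovChain {
    private final double[] pi;   // pi[i]   = P(q_1 = S_i)
    private final double[][] a;  // a[i][j] = P(q_{t+1} = S_j | q_t = S_i), the same for every t

    MarkovChain(double[] pi, double[][] a) {
        this.pi = pi;
        this.a = a;
    }

    /** P(q_1, ..., q_T) = pi[q_1] * a[q_1][q_2] * ... * a[q_{T-1}][q_T]. */
    double sequenceProbability(int[] states) {
        double p = pi[states[0]];
        for (int t = 1; t < states.length; t++) {
            // First-order assumption: only the previous state matters for this step.
            p *= a[states[t - 1]][states[t]];
        }
        return p;
    }
}
```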
2 – Stationary state transitions.
This assumption states that the state transition probabilities are independent of time:
$$a_{ij} = P(q_{t+1} = S_j \mid q_t = S_i) \quad \text{for all } t$$
Equation 2: Stationary states Transition
3 – Observation independence.
This assumption states that each observation depends only on the underlying Markov chain and not on the previous observations. However, this assumption has since been questioned.
$$P(O_t \mid O_{t-1}, O_{t-2}, \ldots, O_{t-p}, q_t, q_{t-1}, q_{t-2}, \ldots, q_{t-p}) = P(O_t \mid q_t, q_{t-1}, q_{t-2}, \ldots, q_{t-p})$$
Equation 3: Observations independence
where p represents the considered history of the observation sequence.
$$b_j(O_t) = P(O_t \mid q_t = j)$$
Equation 4: Observation sequence
4 – Left-Right topology constraint:
$$a_{ij} = 0 \quad \text{for all } j > i + 2 \text{ and } j < i$$
$$\pi_i = P(q_1 = S_i) = \begin{cases} 1 & \text{for } i = 1 \\ 0 & \text{for } 1 < i \le N \end{cases}$$
Equation 5 : Left Right topology constraints
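Equation 5 can be read as a recipe for the transition matrix. Each state may only stay where it is, move to the next state, or skip one state, and every path starts in the first state. The sketch below builds such a matrix and checks the constraint. It uses 0-based indices, so state S_1 is index 0. The equal split of probability among the allowed transitions is only an assumed starting point. Training (for example Baum-Welch) would later re-estimate it. The class name is hypothetical.

```java
/** Builds and checks left-right HMM transition matrices (Equation 5), 0-based indices. */
final class LeftRightTopology {
    private LeftRightTopology() {}

    /** N-state matrix where state i may go to i, i + 1 or i + 2, sharing probability equally. */
    static double[][] build(int n) {
        double[][] a = new double[n][n];
        for (int i = 0; i < n; i++) {
            int last = Math.min(i + 2, n - 1);  // a_ij = 0 for j > i + 2
            int allowed = last - i + 1;         // targets i..last, so a_ij = 0 for j < i
            for (int j = i; j <= last; j++) a[i][j] = 1.0 / allowed;
        }
        return a;
    }

    /** Initial distribution: pi_1 = 1 and pi_i = 0 for 1 < i <= N. */
    static double[] initial(int n) {
        double[] pi = new double[n];
        pi[0] = 1.0;
        return pi;
    }

    /** True if a_ij = 0 whenever j < i or j > i + 2. */
    static boolean isLeftRight(double[][] a) {
        for (int i = 0; i < a.length; i++) {
            for (int j = 0; j < a[i].length; j++) {
                if ((j < i || j > i + 2) && a[i][j] != 0.0) return false;
            }
        }
        return true;
    }
}
```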
The figure below shows an example HMM for the word “Yes” in an utterance.
Figure 4 : Example of HMM for the word “Yes” in an utterance
2.2.2 Client side speech recognition
According to Hosom et al. (2003), client-side speech recognition is technology that allows a computer to identify the words that a person speaks into a microphone or telephone. The basic advantage of client-side speech recognition is a faster response time, because all the processing is handled on the client side. Another advantage is that it does not use any network connection such as GPRS. According to Hagen et al. (2003, p.66), the problems of client-side speech recognition are recognition accuracy and running time (power consumption).
2.2.3 Dynamic Time Warping based speech recognition
This method was used in past decades but has since been largely superseded. The algorithm measures the similarity between two sequences that may vary in time or speed. DTW-based speech recognition uses a number of templates to perform automatic speech recognition. The process involves normalizing the distortion between the input and each template, and the template with the lowest normalized distortion is identified as the word.
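The template matching described above can be sketched as follows. To keep the example short, each utterance is reduced to one number per frame (for example, frame energy). Real DTW recognizers compare a feature vector per frame. Dividing the accumulated distortion by n + m is one simple normalization chosen for this sketch, not necessarily the one used in the cited work. The class name is hypothetical.

```java
import java.util.Arrays;

/** Dynamic time warping over one-value-per-frame sequences (illustrative). */
final class Dtw {
    private Dtw() {}

    /** Accumulated |x_i - y_j| along the best warping path, divided by (n + m). */
    static double normalizedDistance(double[] x, double[] y) {
        int n = x.length, m = y.length;
        double[][] d = new double[n + 1][m + 1];
        for (double[] row : d) Arrays.fill(row, Double.POSITIVE_INFINITY);
        d[0][0] = 0.0;
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                double cost = Math.abs(x[i - 1] - y[j - 1]);
                // Allowed steps: match (diagonal), stretch x, or stretch y.
                d[i][j] = cost + Math.min(d[i - 1][j - 1], Math.min(d[i - 1][j], d[i][j - 1]));
            }
        }
        return d[n][m] / (n + m);
    }

    /** Index of the template with the lowest normalized distortion. */
    static int bestTemplate(double[] input, double[][] templates) {
        int best = -1;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int k = 0; k < templates.length; k++) {
            double dist = normalizedDistance(input, templates[k]);
            if (dist < bestDist) {
                bestDist = dist;
                best = k;
            }
        }
        return best;
    }
}
```

For example, the sequence (1, 2, 3) has zero distortion against the slower template (1, 2, 2, 3), because warping can repeat the middle frame.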
2.2.4 Artificial Neural Networks
The mechanism inside an ANN is to filter human speech frequencies from other frequencies, owing to the fact that non-speech sounds cover a higher frequency range than speech.
The table below shows a comparison between different speech recognition mechanisms.
Source: anon. (n.d.). School of Electrical, Computer and Telecommunications Engineering. Available: http://www.elec.uow.edu.au/staff/wysocki/dspcs/papers/004.pdf. Last accessed 23 August 2009.
Table 2 : Comparison of different techniques in speech recognition
2.2.5 Continuous speech recognition
Continuous speech recognition is used when a speaker pronounces words, sentences or phrases in a series or specific order, so that they depend on each other as if linked together. Such a system operates on speech in which words are connected to each other rather than separated by pauses.
Because there is a greater variety of effects, continuous speech is tedious to handle. Co-articulation is another serious issue in continuous speech recognition: the effect of the surrounding phonemes on a single phoneme is high. Starting and ending words are affected by the following words and also by the speed of the speech.
Fast speech is harder to track. Two algorithms are usually involved in continuous speech recognition: the Viterbi algorithm and the Baum-Welch algorithm.
2.2.6 Direct Speech Recognition
This process identifies speech word by word, with a pause following each word.
2.3 Speaker Characteristics
2.3.1 Speaker Dependent
Speaker-dependent speech recognition systems are developed for a single user only. No other user can use the system; it functions with only a single user. These systems must be trained by that user before they can function.
One advantage is that these systems support a larger vocabulary than speaker-independent systems. The disadvantage is that they are limited to a single user. This technology is used in steno masks.
2.3.2 Speaker Independent
Speaker-independent speech recognition systems are harder to implement than speaker-dependent speech recognition systems. The system needs to recognize the patterns and the different accents of many users. The advantage of this system is that it can be used by many users without training.
The most important step in building a speaker-independent SRS is to identify which parts of speech are generic and which vary from person to person. Speaker-independent speech recognition can be used by many users, despite being harder to implement.
2.3.3 Conclusion
A speaker-independent speech recognition system has been selected for the project because the system has to deal with many speeches given by many speakers.
Accents and phoneme patterns differ from speaker to speaker, and it is not possible to perform individual training for each and every speaker.
The Java Speech API only supports speaker-independent speech recognition systems, which is another reason to select speaker-independent speech recognition.
2.4 Speech Recognition mechanisms
2.4.1 Isolated word recognition
This identifies a single word at a time, with pauses between words. Isolated word recognition is the primary stage of speech recognition, and it is widely used in command-based applications.
Isolated word recognition needs less processing power and relies on primary pattern matching algorithms.
Table 3: Isolated word recognition
2.4.2 Continuous speech recognition
According to Hunt, A. (1997), continuous speech is more difficult to handle for two reasons. First, it is difficult to find the start and end points of words. Second, co-articulation means that the production of each phoneme is affected by the production of the surrounding phonemes.
According to Peinado & Segura (2006, p.9), there are three types of errors in continuous speech recognition systems:
Substitutions: the recognized sentence has different words substituted for original words.
Deletions: the recognized sentence has missing words.
Insertions: the recognized sentence has new or extra words.
Error rates in continuous speech recognition are calculated as follows (Stephen et al., 2003, p.2):
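$$\text{Correct Percentage} = \frac{H}{N} \times 100\%$$
$$\text{Accuracy} = \frac{N - D - S - I}{N} \times 100\%$$
$$\text{Accuracy} = \frac{H - I}{N} \times 100\%$$
$$\text{Word Error Rate} = \frac{S + D + I}{N} \times 100\%$$
Equation 6: CSR Equations
where H is the number of words correctly recognized, N is the total number of words in the actual speech, D is the number of deletions, S is the number of substitutions and I is the number of insertions.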
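The counts S, D and I in Equation 6 come from aligning the recognized word sequence with the reference transcript. A common way to obtain them is a minimum edit distance alignment, sketched below in Java, with every substitution, deletion and insertion costing 1. Several alignments may have the same minimum cost. In that case, the backtracking order in this sketch decides which one is reported: match or substitution first, then deletion, then insertion. The class name is hypothetical.

```java
/** Word error rate from a minimum edit distance alignment (illustrative). */
final class WordErrorRate {
    private WordErrorRate() {}

    /** Returns {S, D, I} for a minimum-cost alignment of hyp against ref. */
    static int[] errorCounts(String[] ref, String[] hyp) {
        int n = ref.length, m = hyp.length;
        int[][] cost = new int[n + 1][m + 1];
        for (int i = 0; i <= n; i++) cost[i][0] = i;  // delete every reference word
        for (int j = 0; j <= m; j++) cost[0][j] = j;  // insert every hypothesis word
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                int sub = cost[i - 1][j - 1] + (ref[i - 1].equals(hyp[j - 1]) ? 0 : 1);
                cost[i][j] = Math.min(sub, Math.min(cost[i - 1][j] + 1, cost[i][j - 1] + 1));
            }
        }
        // Walk back from the bottom-right cell, classifying each step.
        int s = 0, d = 0, ins = 0, i = n, j = m;
        while (i > 0 || j > 0) {
            if (i > 0 && j > 0
                    && cost[i][j] == cost[i - 1][j - 1] + (ref[i - 1].equals(hyp[j - 1]) ? 0 : 1)) {
                if (!ref[i - 1].equals(hyp[j - 1])) s++;
                i--;
                j--;
            } else if (i > 0 && cost[i][j] == cost[i - 1][j] + 1) {
                d++;
                i--;
            } else {
                ins++;
                j--;
            }
        }
        return new int[] {s, d, ins};
    }

    /** WER = (S + D + I) / N * 100, with N the number of reference words. */
    static double wer(String[] ref, String[] hyp) {
        int[] e = errorCounts(ref, hyp);
        return 100.0 * (e[0] + e[1] + e[2]) / ref.length;
    }
}
```

Consider the reference “the cat sat on the mat” and the hypothesis “the cat sit on mat”. The alignment gives one substitution and one deletion, so the word error rate is 2/6, about 33.3%.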
2.4.3 Conclusion
The continuous speech recognition mechanism has been chosen for the project for two reasons. The system will deal with continuous speeches in order to build a database, and the back end of the system serves as a standalone application.
2.5 Vocabulary Size
Vocabulary is the number of words known by a person. The greater the vocabulary, the greater the depth of that person's knowledge. The same rule applies to speech recognition systems.
2.5.1 Limited Vocabulary
Limited vocabulary systems recognize a limited number of words, which can vary from 100 to 10,000 words. These systems need less processing power and are more suitable for mobile devices.
2.5.2 Large Vocabulary
Large vocabulary speech recognition systems are mainly used in servers or standalone applications and require more processing power. They can identify almost every word spoken by a person. Such a vocabulary has more than 10,000 words.
2.5.3 Conclusion
A large vocabulary has been chosen for the project because the project's main processes are handled by standalone applications and the system has to deal with many speeches.
2.6 Speech recognition APIs
2.6.1 Microsoft Speech API 5.3
The Microsoft Speech API reduces the coding overhead for programmers. It is equipped with both speech recognition (speech to text) and speech synthesis (text to speech).
This API requires a .NET-based build environment and has to be purchased. The scope of the Speech Application Programming Interface (SAPI) lies within Windows environments: it allows the use of speech recognition and speech synthesis within Windows applications.
Applications that use SAPI include Microsoft Office, Microsoft Agent and Microsoft Speech Server.
In general, SAPI defines a set of interfaces and classes for developing dynamic speech recognition systems. SAPI uses two libraries, one for its front end and one for its back end. For the front end it uses the “FastFormat” library, and for the back end it uses “Pantheios”. Both are open-source C++ libraries.
Figure 5: Overview of Microsoft Speech Recognition API
2.6.2 Java Speech API
The Java Speech API provides both speech recognition and synthesis capabilities and is freely available. JSAPI supports multi-platform development and supports both open-source and proprietary third-party tools. The JSAPI package comprises javax.speech, javax.speech.recognition and javax.speech.synthesis.
Sun Microsystems built JSAPI in collaboration with:
Apple Computer, Inc.
AT&T
Dragon Systems, Inc.
IBM Corporation
Novell, Inc.
Philips Speech Processing
Texas Instruments Incorporated
It supports speaker-independent speech recognition and W3C standards.
Speech recognizer's capabilities:
Built-in grammars (device specific)
Application defined grammars
Speech synthesizer's capabilities:
Formant synthesis
Concatenative synthesis
The Java Speech API specifies a cross-platform interface to support command-and-control recognizers, dictation systems and speech synthesizers. The Java Speech API covers two technologies: speech synthesis and speech recognition. Speech synthesis is the reverse of speech recognition: it produces synthetic speech from text generated by an application, an applet, or a user.
With the synthesis capabilities, developers can build applications that generate speech from text.
There are two primary steps in producing speech from text.
Structure analysis: Processes the input text to determine where paragraphs, sentences, and
other structures start and end. For most languages, punctuation and formatting data are used
in this stage.
Text pre-processing: Analyzes the input text for special constructs of the language. In
English, special treatment is required for abbreviations, acronyms, dates, times, numbers,
currency amounts, e-mail addresses, and many other forms. Other languages need special
processing for these forms, and most languages have other specialized requirements.
Speech recognition gives the computer the ability to listen to human speech, understand and recognize it, and convert it into text.
Building a speech recognition system involves the following steps.
Grammar design: Defines the words that may be spoken by a user and the patterns in
which they may be spoken.
Signal processing: Analyzes the spectrum characteristics of the incoming audio.
Phoneme recognition: Compares the spectrum patterns to the patterns of the
phonemes of the language being recognized.
Word recognition: Compares the sequence of likely phonemes against the words and
patterns of words specified by the active grammars.
Result generation: Provides the application with information about the words the
recognizer has detected in the incoming audio.
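The signal processing step above can be illustrated with a short-time magnitude spectrum. A short frame of samples is multiplied by a Hamming window and passed through a discrete Fourier transform. The Java sketch below uses a naive O(N²) DFT for readability. It is not a description of how JSAPI engines work internally. A practical front end would use an FFT and usually derive features such as MFCCs from the spectrum. The class and method names are hypothetical.

```java
/** Windowed magnitude spectrum of one audio frame using a naive DFT (illustrative). */
final class FrameSpectrum {
    private FrameSpectrum() {}

    /** Applies a Hamming window, then returns |X[k]| for k = 0 .. N/2 (frame length N >= 2). */
    static double[] magnitude(double[] frame) {
        int n = frame.length;
        double[] w = new double[n];
        for (int i = 0; i < n; i++) {
            w[i] = frame[i] * (0.54 - 0.46 * Math.cos(2 * Math.PI * i / (n - 1)));
        }
        double[] mag = new double[n / 2 + 1];
        for (int k = 0; k <= n / 2; k++) {
            double re = 0.0, im = 0.0;
            for (int i = 0; i < n; i++) {
                double angle = -2 * Math.PI * k * i / n;
                re += w[i] * Math.cos(angle);
                im += w[i] * Math.sin(angle);
            }
            mag[k] = Math.hypot(re, im);
        }
        return mag;
    }

    /** Centre frequency in Hz of bin k for an N-sample frame at the given sample rate. */
    static double binFrequency(int k, int n, double sampleRate) {
        return k * sampleRate / n;
    }
}
```

For a 256-sample frame at 8 kHz, bin k corresponds to k × 8000 / 256 Hz, so a 1000 Hz tone peaks at bin 32.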
In addition to JSAPI, two other Java APIs are needed: the Java Sound API and the Java Media Framework. The Java Sound API can handle sound and is equipped with a rich set of classes and interfaces that deal directly with incoming sound signals. The Java Sound API is widely used in the following areas and industries.
Communication frameworks, such as conferencing and telephony
End-user content delivery systems, such as media players and music using streamed
content
Interactive application programs, such as games and Web sites that use dynamic
content
Content creation and editing
Tools, toolkits, and utilities
The Java Sound API uses a hardware-independent architecture. It is designed to allow different sorts of audio components to be installed on a system and accessed by the API.
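Before any filtering or recognition, the samples have to be read from an audio file. The sketch below uses only standard Java Sound classes (`AudioSystem`, `AudioInputStream`, `AudioFormat`). It writes and reads 16-bit mono PCM WAV data and converts it to floating-point samples. Restricting it to 16-bit, signed, little-endian mono audio is an assumption made to keep the example short. Other formats would need conversion first, for example with `AudioSystem.getAudioInputStream(AudioFormat, AudioInputStream)`. The class name is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import javax.sound.sampled.AudioFileFormat;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.UnsupportedAudioFileException;

/** Converts between 16-bit mono PCM WAV data and samples scaled to [-1, 1). */
final class WavSamples {
    private WavSamples() {}

    /** Reads a 16-bit signed little-endian mono WAV stream (the stream must support mark/reset). */
    static double[] readMono16(InputStream wav) throws IOException, UnsupportedAudioFileException {
        AudioInputStream in = AudioSystem.getAudioInputStream(wav);
        AudioFormat f = in.getFormat();
        if (!AudioFormat.Encoding.PCM_SIGNED.equals(f.getEncoding())
                || f.getSampleSizeInBits() != 16 || f.getChannels() != 1 || f.isBigEndian()) {
            throw new UnsupportedAudioFileException("expected 16-bit signed little-endian mono PCM");
        }
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) buf.write(chunk, 0, n);
        byte[] b = buf.toByteArray();
        double[] x = new double[b.length / 2];
        for (int i = 0; i < x.length; i++) {
            int lo = b[2 * i] & 0xFF;  // low byte, treated as unsigned
            int hi = b[2 * i + 1];     // high byte, sign-extended
            x[i] = ((hi << 8) | lo) / 32768.0;
        }
        return x;
    }

    /** Writes samples in [-1, 1) as an in-memory 16-bit mono WAV file. */
    static byte[] writeMono16(double[] x, float sampleRate) throws IOException {
        byte[] b = new byte[x.length * 2];
        for (int i = 0; i < x.length; i++) {
            long s = Math.max(-32768L, Math.min(32767L, Math.round(x[i] * 32768.0)));
            b[2 * i] = (byte) s;             // little-endian: low byte first
            b[2 * i + 1] = (byte) (s >> 8);
        }
        AudioFormat fmt = new AudioFormat(sampleRate, 16, 1, true, false);
        AudioInputStream pcm = new AudioInputStream(new ByteArrayInputStream(b), fmt, x.length);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        AudioSystem.write(pcm, AudioFileFormat.Type.WAVE, out);
        return out.toByteArray();
    }
}
```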
With the Java Sound API we can process both MIDI⁶ and WAV sound formats.
The Java Media Framework is a recently developed framework that can be used to build dynamic multimedia applications.
6 Musical Instrument Digital Interface
Figure 6 : Java Sound API Architecture
2.6.2.1 Java Speech Grammar Format
JSGF, the Java Speech Grammar Format, was built by Sun Microsystems. It defines the set of rules and words for speech recognition. JSGF is a platform-independent specification, and the W3C Speech Recognition Grammar Specification was later derived from it.
The Java Speech Grammar Format has been developed for use with recognizers that implement the Java Speech API. However, it may also be used by other speech recognizers and in other types of applications.
A typical grammar rule combines the text that may be spoken with references to other grammar rules. A JSGF file comes in plain-text format or in XML format.
Source: anon. (n.d.). JSGF Architecture. Available: http://www.cs.cmu.edu/. Last accessed 24 July 2009.
Figure 7 : JSGF Architecture
2.7 Speech Recognition Algorithms
The Viterbi algorithm is widely used in speech recognition. It is a dynamic programming algorithm that works directly with hidden Markov models. The Baum-Welch algorithm is another algorithm used in this process; it is based on probability and maximum likelihood estimation. The forward-backward algorithm is also used in this process and likewise works directly with hidden Markov models. There are three steps in this algorithm:
Computing forward probabilities
Computing backward probabilities
Computing smoothed values
A customized combination of the above algorithms will be used in the project.
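The Viterbi algorithm and the forward step of the forward-backward algorithm can be sketched for a discrete-observation HMM as below. This is a textbook illustration, not the customized combination planned for the project. Here the observations are small integer symbols, whereas speech HMMs normally model continuous acoustic feature vectors (for example with Gaussian mixtures). The backward pass and Baum-Welch re-estimation are omitted. Viterbi works with log probabilities to avoid underflow. The class name is hypothetical.

```java
/** Discrete-observation HMM with the forward and Viterbi algorithms (illustrative). */
final class DiscreteHmm {
    final double[] pi;   // pi[i]   = P(q_1 = S_i)
    final double[][] a;  // a[i][j] = P(q_{t+1} = S_j | q_t = S_i)
    final double[][] b;  // b[j][k] = P(O_t = k | q_t = S_j), i.e. b_j(O_t)

    DiscreteHmm(double[] pi, double[][] a, double[][] b) {
        this.pi = pi;
        this.a = a;
        this.b = b;
    }

    /** Forward algorithm: P(O | model), summed over all state paths. */
    double forward(int[] obs) {
        int n = pi.length;
        double[] alpha = new double[n];
        for (int i = 0; i < n; i++) alpha[i] = pi[i] * b[i][obs[0]];
        for (int t = 1; t < obs.length; t++) {
            double[] next = new double[n];
            for (int j = 0; j < n; j++) {
                double sum = 0.0;
                for (int i = 0; i < n; i++) sum += alpha[i] * a[i][j];
                next[j] = sum * b[j][obs[t]];
            }
            alpha = next;
        }
        double p = 0.0;
        for (double v : alpha) p += v;
        return p;
    }

    /** Viterbi algorithm: the single most likely state path, using log probabilities. */
    int[] viterbi(int[] obs) {
        int n = pi.length, len = obs.length;
        double[][] delta = new double[len][n];
        int[][] back = new int[len][n];
        for (int i = 0; i < n; i++) delta[0][i] = Math.log(pi[i]) + Math.log(b[i][obs[0]]);
        for (int t = 1; t < len; t++) {
            for (int j = 0; j < n; j++) {
                int arg = 0;
                double best = Double.NEGATIVE_INFINITY;
                for (int i = 0; i < n; i++) {
                    double score = delta[t - 1][i] + Math.log(a[i][j]);
                    if (score > best) {
                        best = score;
                        arg = i;
                    }
                }
                delta[t][j] = best + Math.log(b[j][obs[t]]);
                back[t][j] = arg;  // best predecessor of state j at time t
            }
        }
        int[] path = new int[len];
        for (int i = 1; i < n; i++) {
            if (delta[len - 1][i] > delta[len - 1][path[len - 1]]) path[len - 1] = i;
        }
        for (int t = len - 1; t > 0; t--) path[t - 1] = back[t][path[t]];
        return path;
    }
}
```

Consider the common two-state toy model with π = (0.6, 0.4), transitions ((0.7, 0.3), (0.4, 0.6)) and emissions ((0.5, 0.4, 0.1), (0.1, 0.3, 0.6)). For the observation sequence (0, 1, 2), `viterbi` returns the path (0, 0, 1) and `forward` returns P(O) of approximately 0.03628.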
2.8 Noise Filtering
Noise can emerge in a speech from tape hiss, clapping, coughing or any other environmental or machinery factor. Noise plays a major role in the performance of speech recognition.
Source: anon. (n.d.). Departement Elektrotechniek. Available: http://www.esat.kuleuven.be/psi/spraak/theses/08-09-en/clp_lp_mask.png. Last accessed 22 September 2009.
Figure 8: Noise in Speech
According to Khan, E. and Levinson, R. (1998), speech recognition has achieved quite remarkable progress in recent years.
Many speech recognition systems are capable of producing very high recognition accuracies (over 98%).
However, such recognition accuracy only applies to a quiet environment (very low noise) and to speakers whose sample words were used during training.
Spectral subtraction and Wiener filtering are the two most popular noise reduction methods because they are straightforward to implement.
2.8.1 Wiener filtering
Wiener filtering applies to a common model of a noisy signal. The observed signal z(k) is a clean signal s(k) plus additive noise n(k) that is uncorrelated with the signal, so z(k) = s(k) + n(k). If the noise is also stationary, the power spectra of the signal and the noise add:
$$P_z(\omega) = P_s(\omega) + P_n(\omega)$$
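From P_z(ω) = P_s(ω) + P_n(ω), the Wiener filter scales each frequency component by H(ω) = P_s(ω) / (P_s(ω) + P_n(ω)). The sketch below makes two assumptions that are not the project's stated design. First, a noise power estimate P_n is already available for each frequency bin, for example averaged over frames that contain no speech. Second, the speech power is estimated as max(P_z − P_n, 0), a common practical approximation. The class name is hypothetical.

```java
/** Frequency-domain Wiener gain based on P_z(w) = P_s(w) + P_n(w) (illustrative). */
final class WienerGain {
    private WienerGain() {}

    /**
     * H(w) = P_s(w) / (P_s(w) + P_n(w)), with the speech power estimated as
     * P_s(w) = max(P_z(w) - P_n(w), 0) from the noisy power P_z and a noise estimate P_n.
     */
    static double[] gains(double[] noisyPower, double[] noisePower) {
        double[] h = new double[noisyPower.length];
        for (int k = 0; k < h.length; k++) {
            double ps = Math.max(noisyPower[k] - noisePower[k], 0.0);
            double denom = ps + noisePower[k];
            h[k] = denom > 0.0 ? ps / denom : 0.0;  // silent bins are simply zeroed
        }
        return h;
    }

    /** Scales each spectral component (real and imaginary parts) by its gain. */
    static void apply(double[] re, double[] im, double[] h) {
        for (int k = 0; k < h.length; k++) {
            re[k] *= h[k];
            im[k] *= h[k];
        }
    }
}
```

Bins where the noise dominates get a gain close to 0, while bins with strong speech keep a gain close to 1. The filtered spectrum would then be transformed back to the time domain.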
2.8.2 Conclusion
The Wiener filtering method has been chosen for the project because it is a widely accepted method and easy to implement.
2.9 Database and data structure
The database contains the text versions of speeches and their locations. A sample database is maintained on the hard disk, and the locations are saved in a file. Database indexing is used to obtain search results efficiently.
Database indexing improves the speed of data retrieval. Indexing can be divided into two types: clustered and non-clustered.
Non-clustered indexing does not affect the order of the actual records. As a result, additional input and output operations are needed to fetch the actual records.
Clustered indexing reorders the data blocks according to the index key, which makes it more efficient for searching.
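The difference between the two kinds of index can be illustrated in memory. In the clustered case below, the rows themselves are kept in key order, so a range query is one binary search followed by a contiguous scan. In the non-clustered case, the rows stay in insertion order and the index stores sorted keys with row positions. Every hit therefore costs an extra lookup, which on disk corresponds to the additional input and output operations mentioned above. Integer keys, string rows and the class name are illustrative assumptions; a real database manages this with disk pages.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy in-memory contrast of clustered and non-clustered indexes on integer keys. */
final class IndexingDemo {
    private IndexingDemo() {}

    /** First position in a sorted key array whose key is >= target (binary search). */
    static int lowerBound(int[] sortedKeys, int target) {
        int lo = 0, hi = sortedKeys.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sortedKeys[mid] < target) lo = mid + 1; else hi = mid;
        }
        return lo;
    }

    /** Clustered: rows[i] is stored in key order, so a key range is one contiguous run. */
    static List<String> rangeClustered(int[] keys, String[] rows, int from, int to) {
        List<String> out = new ArrayList<String>();
        for (int i = lowerBound(keys, from); i < keys.length && keys[i] <= to; i++) {
            out.add(rows[i]);
        }
        return out;
    }

    /**
     * Non-clustered: rows stay in insertion order; the index keeps sorted keys plus the
     * position of each row, so every hit needs an extra jump into the row storage.
     */
    static List<String> rangeNonClustered(int[] indexKeys, int[] rowPositions, String[] rows,
                                          int from, int to) {
        List<String> out = new ArrayList<String>();
        for (int i = lowerBound(indexKeys, from); i < indexKeys.length && indexKeys[i] <= to; i++) {
            out.add(rows[rowPositions[i]]);
        }
        return out;
    }
}
```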
2.9.1 Conclusion
Clustered indexing has been chosen for the project because the system involves search operations over speeches.
Figure 9 : Database Indexing
2.10 Search Engine
The search engine basically acts as the terminal for searching speeches and lectures. It checks for search results in a locally deployed database that contains the text versions of speeches and lectures. A search engine operates in the order of web crawling, indexing and searching.
Source: Brin, S. and Page, L. (n.d.). The Anatomy of a Large-Scale Hypertextual Web Search Engine. Available: http://infolab.stanford.edu/~backrub/google.html. Last accessed 24 March 2009.
Figure 10 : Google Architecture
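The searching stage over the stored transcripts can be sketched as an inverted index. Each word maps to the set of speeches whose transcript contains it, and a query returns the speeches that contain all of the query words. The Java sketch below is a hypothetical in-memory design, not the project's implementation. It does no ranking, stemming or phrase matching, and the audio locations are simply stored strings. Because it keeps sorted sets of speech ids, each speech appears at most once in the results.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Minimal in-memory inverted index over speech transcripts (hypothetical design). */
final class TranscriptIndex {
    private final Map<String, Set<Integer>> postings = new HashMap<String, Set<Integer>>();
    private final List<String> locations = new ArrayList<String>();

    /** Indexes a transcript and remembers where its audio is stored; returns the speech id. */
    int add(String transcript, String audioLocation) {
        int id = locations.size();
        locations.add(audioLocation);
        for (String term : tokenize(transcript)) {
            Set<Integer> ids = postings.get(term);
            if (ids == null) {
                ids = new TreeSet<Integer>();
                postings.put(term, ids);
            }
            ids.add(id);
        }
        return id;
    }

    /** Locations of the speeches whose transcripts contain every word of the query. */
    List<String> search(String query) {
        Set<Integer> hits = null;
        for (String term : tokenize(query)) {
            Set<Integer> ids = postings.get(term);
            if (ids == null) return new ArrayList<String>();  // some query word never occurs
            if (hits == null) hits = new TreeSet<Integer>(ids);
            else hits.retainAll(ids);                          // keep speeches matching all words
        }
        List<String> out = new ArrayList<String>();
        if (hits != null) {
            for (int id : hits) out.add(locations.get(id));
        }
        return out;
    }

    /** Lower-cases the text and splits it on anything that is not an ASCII letter or digit. */
    private static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<String>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) terms.add(t);
        }
        return terms;
    }
}
```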
2.11 MATLAB
MATLAB was developed by MathWorks, a privately held multinational company that specializes in technical software.
MATLAB is a multi-platform, fourth-generation programming language. Like many other languages, MATLAB supports the following features.
Matrix manipulation
Plotting of functions and data
Algorithm implementation
Create Graphical user interfaces
Interfacing with other programming languages
Most MATLAB code is numerical in nature. Nevertheless, MATLAB lets us build systems in a more precise manner, and relatively few lines of code are required to build a system compared with other languages such as Java or C#.
Like other object-oriented languages, MATLAB supports classes, interfaces and functions, which are used in high-level MATLAB programming.
MATLAB directly supports both analogue and digital signal processing and provides a rich set of features for working with them. These features include signal transforms and spectral analysis, digital system design, digital filtering, adaptive filtering, and coding and compression algorithms.
CHAPTER 3
ANALYSIS
3.1 System requirements
3.1.1 Functional requirements
1. The application must convert the speech or lecture to text format.
2. The converted text should be visible to the user.
3. If the speech or lecture contains noise, the noise must be reduced enough to make the audio suitable for the speech recognition process.
4. Speeches with different accents need to be recognized reasonably well by the system.
5. The search results must be efficient and reliable.
3.1.2 Non-functional requirements
1. The search algorithm needs to be efficient.
2. The search should not return duplicate results.
3. Searching should not take much time.
4. Speech-to-text conversion must be efficient and accurate.
5. Noise reduction must maintain fair performance.
3.1.3 Software Requirements
Java JDK 1.6: JDK 1.6 is equipped with state-of-the-art technology and includes a wide range of functionality. The newest version of the Java Sound API is required.
NetBeans IDE 6.5: an open-source IDE equipped with PHP, JavaScript and Ajax editing features, improved support for the Java Persistence API, and tighter GlassFish v3 and MySQL integration. It also provides features for the architectural drawing of the system and powerful J2EE components that are essential for building the search engine. Any third-party component used by the system can be integrated without much effort, and the IDE supports code generation. Many proprietary plugins support this IDE.
Windows XP or equivalent operating system: Windows XP supports both open-source and commercial components, so everything essential for the project can be deployed on it. Windows XP is a robust, stable and user-friendly operating system compared to other Windows operating systems.
Apache Tomcat server version 5.5.27 (available at http://tomcat.apache.org/): a freely available server on which web programs can be run. It is robust and open source. It has many third-party components that are essential for integrating standalone, mobile and web-based applications with each other. This server comes with the NetBeans IDE.
XML Database: This is the world's most popular database and it is open source. It directly supports the Apache Tomcat server and the NetBeans IDE, and its crash rate is lower than that of other databases used with web services.
Proper sound driver software is required in order to achieve the best results.
MATLAB software is required to perform noise filtering.
3.1.4 Hardware requirements
32-bit Intel Dual Core IV processor or greater: during the development phase of the project, a massive amount of processing power is required for speech recognition, noise analysis, speech-to-text conversion and searching. A high-end machine is advisable to prevent deadlocks.
64-bit PCI sound card: a high-end sound card is required to process digital audio signals.
A minimum of 1 GB of DDR3 RAM is required, and 2 GB of virtual memory must be present in the system.
The default components of a personal computer are required.
A modem or a router is required in order to test searching between many users.
A 1 Mbps ADSL internet connection or faster is required for data gathering.
A microphone is needed for future enhancements, so that users can store their own speeches and anyone can later search for particular speeches and lectures.
A 20 GB hard disk with a rotation speed of 7200 RPM or more is required, because the system will maintain the database on the local machine.
Note: At least a dual-core processor is required because the speech recognition process needs massive processing power.
3.2 System Development Methodologies
All the methodologies compared here are extended versions of earlier, commonly used
methodologies.
3.2.1 Rational Unified Process
The Rational Unified Process (RUP) is a development methodology created by Rational Software,
which became a division of IBM in 2003. It is an iterative system development process, and it
explains in detail how specific goals are achieved.
RUP is a methodology for managing object-oriented software development. According to
Kroll and Kruchten (2003), “The RUP is a software development approach that is iterative,
architecture-centric, and use-case-driven.” RUP has the following extensible features:
Iterative Development
Requirements Management
Component-Based Architectural Vision
Visual Modeling of Systems
Quality Management
Change Control Management
The figure below gives a basic overview of its phases.
Source: (anon. (nd). Department of Computer Science. Available:
people.cs.uchicago.edu/~matei/CSPP523/lect4.ppt. Last accessed 24 March 2009.)
Figure 11 Phases in RUP
Advantages of RUP
It is a well-defined and well-structured software engineering process.
It supports changing requirements and provides the means to manage change and the
related risks.
It promotes a higher level of code reuse.
It reduces integration time and effort, as the development model is iterative.
It allows the system to be run earlier than with other processes.
Its risk management feature allows risks to be identified before development begins.
It follows the principle “plan a little, design a little, code a little”.
RUP is an idea-driven, principle-based methodology.
RUP is a worldwide commercial standard.
Disadvantages of RUP
For most projects, RUP as delivered is not a sufficient methodology.
The processes need to be customized to suit different situations.
It has poor support for usability.
The process is relatively complex and heavyweight.
3.2.2 Agile Development Method
Agile development is an iterative process. Agile uses short iterations and, as a result,
minimizes risk. The Agile methodology breaks tasks into small increments with minimal
planning and does not directly involve long-term planning. Agile strongly supports
object-oriented development.
Above all, Agile includes Extreme Programming (XP), an Agile method that is now widely used
in software development.
According to Ambler (2005), Agile is an iterative and incremental (evolutionary) approach to
software development. It is performed in a highly collaborative manner by self-organizing
teams within an effective governance framework, and it produces high-quality software in a
cost-effective and timely manner that meets the changing needs of its stakeholders.
Figure 12 : Overview of Agile
Advantages in Agile Software Development
Increased Control
Rapid Learning
Early Return on Investment
Satisfied Stakeholders
Responsiveness to Change
Disadvantages in Agile Software Development
Agile involves heavy documentation.
Agile requirements are often only barely sufficient for the project.
It is not a highly organized methodology.
Because testing is integrated throughout development, the development cost is
relatively high.
Too much user involvement may spoil the project.
3.2.3 Scrum Development Methodology
According to Mikneus, S. and Akinde, A. (2003):
Scrum is an Agile Software Development Process.
Scrum is not an acronym
The name is taken from the sport of rugby, where everyone in the pack acts together
to move the ball down the field.
The analogy in development is that the team works together to successfully develop quality
software.
According to Jeff Sutherland (2003), “Scrum assumes that the systems development process is
an unpredictable, complicated process that can only be roughly described as an overall
progression.” He also states that “Scrum is an enhancement of the commonly used
iterative/incremental object-oriented development cycle.” Scrum principles include:
Quality work: empowers everyone involved to feel good about their job.
Assume Simplicity: Scrum is a way to detect and remove anything that gets in the way of
development.
Embracing Change: a team-based approach to development for situations where requirements
change rapidly.
Incremental changes: Scrum makes these possible through sprints, in which a team is able
to deliver a product increment (an iteration deliverable) within 30 days.
Advantages in Scrum
Scrum has the ability to respond to unforeseen software development risks.
It is a specialized process for commercial application development.
It enables developers to deliver a functional application to their clients.
Disadvantages in Scrum
It is not suitable for research-based software development.
Source: [anon. (nd). anon. Available: http://www.methodsandtools.com/archive/scrum1.gif.
Last accessed 26th March 2009.]
Figure 13 : Scrum Overview
3.2.4 Conclusion
The Agile software development methodology was chosen for the development process
because it supports object-oriented development, uses short iterations and supports Extreme
Programming.
3.3 Test Plan
The system's main functionalities are noise analysis, speech recognition, database indexing
(which directly affects the search) and the search engine.
The system takes data (speeches and lectures) recorded under various conditions with reduced
noise. However, the system cannot guarantee how much the remaining noise will affect the
results. For that reason, noise analysis is performed to reduce the noise; otherwise, it would
degrade the speech recognition process.
3.3.1 System testing
Speeches and lectures with different accents [English only: US and British]: To measure the
accuracy of the speech recognition engine, it will be tested against different accents. The
recognized text is expected to differ only minimally from the actual speech, with a minimum
of errors.
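The plan does not say how the difference between the recognized text and the actual speech is measured. One widely used measure is the word error rate (WER), which is assumed here rather than taken from the project. WER is the minimum number of word substitutions, deletions and insertions needed to turn the reference transcript into the recognized text, divided by the number of reference words. The following is a minimal C sketch. The function name `word_error_rate` and the fixed size limits are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <string.h>

#define MAX_WORDS 128   /* illustrative limit on words per transcript */
#define MAX_CHARS 4096  /* illustrative limit on transcript length */

/* Split a writable string into whitespace-separated words.
   Returns the number of words, or -1 if there are more than max_words. */
static long split_words(char *text, char *words[], size_t max_words)
{
    size_t n = 0;
    for (char *tok = strtok(text, " \t\n"); tok != NULL; tok = strtok(NULL, " \t\n")) {
        if (n == max_words)
            return -1;
        words[n++] = tok;
    }
    return (long)n;
}

/* Word error rate = word-level edit distance / number of reference words.
   Returns -1.0 if the reference is empty or a transcript exceeds the limits.
   Uses static buffers, so it is not thread-safe. */
double word_error_rate(const char *reference, const char *hypothesis)
{
    static char ref_buf[MAX_CHARS], hyp_buf[MAX_CHARS];
    static char *ref[MAX_WORDS], *hyp[MAX_WORDS];
    static size_t d[MAX_WORDS + 1][MAX_WORDS + 1];

    assert(reference != NULL && hypothesis != NULL);
    if (strlen(reference) >= MAX_CHARS || strlen(hypothesis) >= MAX_CHARS)
        return -1.0;
    strcpy(ref_buf, reference);
    strcpy(hyp_buf, hypothesis);
    long n = split_words(ref_buf, ref, MAX_WORDS);
    long m = split_words(hyp_buf, hyp, MAX_WORDS);
    if (n <= 0 || m < 0)
        return -1.0;

    /* d[i][j]: edits needed to turn the first i reference words
       into the first j recognized words */
    for (long i = 0; i <= n; i++) d[i][0] = (size_t)i;
    for (long j = 0; j <= m; j++) d[0][j] = (size_t)j;
    for (long i = 1; i <= n; i++) {
        for (long j = 1; j <= m; j++) {
            size_t sub = d[i - 1][j - 1] + (strcmp(ref[i - 1], hyp[j - 1]) != 0);
            size_t del = d[i - 1][j] + 1;
            size_t ins = d[i][j - 1] + 1;
            size_t best = sub < del ? sub : del;
            d[i][j] = best < ins ? best : ins;
        }
    }
    return (double)d[n][m] / (double)n;
}
```

For example, suppose the reference "the cat sat on the mat" is recognized as "the cat sit on mat". That is one substitution and one deletion, giving a WER of 2/6. A lower WER means more accurate recognition, so accents can be compared by their WER on the same test speeches.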
Content Search: When the user searches by content by typing a word or a phrase, the
appropriate search results will be displayed, that is, the speeches or lectures containing the
specified word or phrase.
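A content-search test needs a precise definition of a match. The sketch below makes an assumption that the project does not state: a match is a case-insensitive occurrence of the query phrase that starts and ends on word boundaries. Under that rule, "speech" matches "Speech Recognition" but not "speeches". The function name `transcript_contains` is hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Case-insensitive comparison of the first n characters of a and b. */
static bool prefix_equal_ci(const char *a, const char *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (tolower((unsigned char)a[i]) != tolower((unsigned char)b[i]))
            return false;
    return true;
}

/* True if phrase occurs in transcript, ignoring case, and the match
   starts and ends on word boundaries (not inside a longer word). */
bool transcript_contains(const char *transcript, const char *phrase)
{
    assert(transcript != NULL && phrase != NULL);
    size_t tlen = strlen(transcript), plen = strlen(phrase);
    if (plen == 0 || plen > tlen)
        return false;
    for (size_t i = 0; i + plen <= tlen; i++) {
        bool starts_word = (i == 0) || !isalnum((unsigned char)transcript[i - 1]);
        bool ends_word = (i + plen == tlen) || !isalnum((unsigned char)transcript[i + plen]);
        if (starts_word && ends_word && prefix_equal_ci(transcript + i, phrase, plen))
            return true;
    }
    return false;
}
```

Whether partial-word matches should count is a design decision for the search engine. The test cases should fix that decision explicitly, so the expected results are unambiguous.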
CHAPTER 4
SYSTEM DESIGN
4.1 Use Case Diagram
The noise filtering functionality is implemented separately from the speech recognition system,
so the noise filtering system is represented as an “Actor”.
Figure 14 : Use Case Diagram for System
The figure above shows the use case diagram for the entire system. The system mainly
consists of two actors. A user can upload a speech file in WAV format to perform speech
recognition.
Noise filtering is handled by a separate system. The user uploads a noisy speech file, and
the noise filtering system produces a file with reduced noise.
4.2 Use case description
4.2.1 Use case description for file upload
Use Case: Use Case One
Description: The user uploads a file.
Actors: User
Assumptions: The user uploads a file in .wav format. The file must not contain noise.
Steps: The user runs the system, presses the Open button and selects a file.
Variations: A user may upload a file with or without noise.
Non-functional requirements: All the necessary hardware configuration requirements must be met.
Issues: None
Table 4 : Use Case description file upload
4.2.2 Use Case description for playing an audio file
Use Case: Use Case Two
Description: The user plays a .wav file.
Actors: User
Assumptions: The user can only play files in .wav format.
Steps: The user opens a file, which enables the Play button, and then presses the Play button.
Variations: No variations; only files in .wav format can be played.
Non-functional requirements: All the necessary hardware configuration requirements must be met.
Issues: None
Table 5 Use Case description play audio
4.2.3 Use Case description for search
Use Case: Use Case Three
Description: The user searches for a speech by content.
Actors: User
Assumptions: The user can search for a speech by typing a sentence.
Steps: The user runs the speech search program, types what he or she wants to search for and
presses Search.
Variations: No variations
Non-functional requirements: All the necessary hardware configuration requirements must be met.
Issues: None
Table 6 Use Case description search
4.2.4 Use Case description for noise reduced output
Use Case: Use Case Four
Description: Noise-reduced output produced by the system
Actors: Noise filtering system
Assumptions: Complete elimination of the noise is not achievable. The user uploads a noisy
file in .wav format.
Steps: The user runs the noise filtering program in MATLAB and inputs a file that contains
noise.
Variations: No variations
Non-functional requirements: All the necessary hardware configuration requirements must be met.
Issues: None
Table 7 Use Case description noise reduction
4.2.5 Use Case description for noise filtering
Use Case: Use Case Five
Description: The process of filtering noise
Actors: Noise filtering system
Assumptions: Complete elimination of the noise is not achievable. The user uploads a noisy
file in .wav format. The chosen noise filtering mechanism is the most suitable one.
Steps: The user runs the noise filtering program in MATLAB and inputs a file that contains
noise.
Variations: No variations
Non-functional requirements: All the necessary hardware configuration requirements must be met.
Issues: None
Table 8 Use Case description noise process
4.3 Activity Diagrams
4.3.1 Activity Diagram for Speech Recognition System
Figure 15 Speech Recognition
4.3.2 Activity Diagram for Noise filtering
Figure 16 Activity Diagram Noise Filtering
4.4 Sequence Diagrams
4.4.1 Select a file
Figure 17 Sequence Diagram Select a file
4.4.2 Play wav file
The system can play a file. Two main control classes are involved in this process. The
WavFileRecognition class acts as a mediator that passes messages between the functionalities
of the other classes.
Figure 18 Sequence Diagram Play File
4.4.3 Speech recognition pre-stage
In the speech recognition pre-stage, the system loads the configuration file and the input
signal. A recognizer is then allocated through the configuration manager.
Figure 19 Sequence Diagram SR Pre Stage
4.4.4 Speech recognition post-stage
In the speech recognition post-stage, the input digital signal is segmented and passed through
a fast Fourier transformation, after which dialects and phonemes are identified. The
AudioFileDataSource and Recognizer classes provide the functionality to perform these tasks.
Figure 20 Sequence Diagram SR Post Stage
4.5 Class Diagrams
4.5.1 GUI and the system
The figure below shows the class diagram of the GUI and WavFileRecognizer.
Figure 21 Class Diagrams GUI & System
4.5.2 Speech recognition
Figure 22 Class Diagram SR System
Class Diagram for Speech search
Figure 23 : Speech Search Class Diagram
4.6 Noise Filtering
Noise filtering was implemented in MATLAB. The script below is procedural: it does not use
object orientation, polymorphism or inheritance. I have also generated C code from the MATLAB code.
%ver 1.56
function noiseReduction
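%----- user data -----
steps_1 = 512;        % hop size: distance in samples between successive chunks
chunk = 2048;         % chunk (segment) length in samples
coef = 0.01*chunk/2;  % noise-suppression coefficient passed to denoise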
The three statements above define the user data used by the MATLAB script: steps_1 is the
hop size, chunk is the segment length and coef is the suppression coefficient. The term chunk
means a small segment of the input signal. The script below can be used to filter the noise
of any given input signal.
%Windowing Techniques
%w1 = .5*(1 - cos(2*pi*(0:chunk-1)'/(chunk))); %hanning
w1 = [.42 - .5*cos(2*pi*(0:chunk-1)/(chunk-1)) + .08*cos(4*pi*(0:chunk-1)/(chunk-1))]'; %Blackman
w2 = w1; % synthesis window, identical to the analysis window
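A Blackman window is used here while the signal is cut into small, overlapping segments. The
input signal is processed iteratively, one chunk at a time. Each chunk is multiplied by the
window, which tapers it smoothly towards zero at its edges and so reduces spectral leakage in
the FFT. Chunk is the term used here for such a segment; it is also commonly called a frame.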
% input wav file and extract required data
[input, FS, N] = wavread('input.wav');
L = length(input);
The input signal is read into a column vector (one column per channel for a stereo file), and
L is its length in samples; the duration in seconds is L/FS. MATLAB handles the matrix storage
internally.
% zero padding for input file
input = [zeros(chunk,1);input;zeros(chunk,1)]/ max(abs(input));
% the zeros appended at both ends of the input make the windowing cover
% the complete sound file
%----- initializations -----
output = zeros(length(input),1);
count = 0;
% block by block fft algorithm
The loop below processes the padded signal block by block. Each chunk is windowed, transformed
with the FFT and passed to the denoise function. It is then transformed back, windowed again and
added into the output, and the position advances by the hop size steps_1.
while count<(length(input) - chunk)
grain = input(count+1:count+chunk).* w1; % windowing
f = fft(grain); % fft of window data
r = abs(f); % magnitude of window data
phi = angle(f); % phase of window data
ft = denoise(f,r,coef);
This function reduces the amplitude of the weak spectral components of the chunk. It takes the
spectrum of a single chunk and its magnitude as arguments.
grain = real(ifft(ft)).*w2; % take inverse fft of window data
output(count+1:count+chunk) = output(count+1:count+chunk) + grain; % overlap-add into output
count = count + steps_1; % increment by hop size
end
output = output(chunk+1:chunk+L) / (4.75*max(abs(output))); % drop the zero padding
% the 4.75*max(abs(output)) factor keeps the output volume consistent with the input
%soundsc(output, FS);
wavwrite(output, FS, 'output.wav');
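The block-by-block loop in the MATLAB script is an overlap-add scheme. The C sketch below mirrors its control flow under some assumptions: the signal is mono and already in memory, and the per-chunk processing (windowing, FFT, denoise, inverse FFT) is passed in as a function. The names `overlap_add` and `chunk_fn` are hypothetical. The sketch pads both ends with `chunk` zeros and uses the same loop condition as the script. It then copies out the samples that follow the leading padding, so that the output stays aligned with the input. Volume normalization is left out.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Per-chunk processing step (in the MATLAB script: window, fft, denoise,
   ifft, window). Passing NULL means "copy the chunk through unchanged". */
typedef void (*chunk_fn)(const double *in, double *out, size_t chunk);

/* Zero-pads x (n samples) with `chunk` zeros at both ends, processes chunks
   starting every `hop` samples while count + chunk < padded length, adds the
   processed chunks into a padded output buffer (overlap-add), and copies the
   n samples aligned with the original signal into y.
   Returns the number of chunks processed, or 0 on allocation failure. */
size_t overlap_add(const double *x, double *y, size_t n, size_t chunk,
                   size_t hop, chunk_fn process)
{
    assert(x != NULL && y != NULL && chunk > 0 && hop > 0);
    size_t len = n + 2 * chunk;
    double *in = calloc(len, sizeof *in);    /* zero padding at both ends */
    double *out = calloc(len, sizeof *out);
    double *grain = malloc(chunk * sizeof *grain);
    size_t frames = 0;
    if (in != NULL && out != NULL && grain != NULL) {
        memcpy(in + chunk, x, n * sizeof *x);
        for (size_t count = 0; count + chunk < len; count += hop) {
            if (process != NULL)
                process(in + count, grain, chunk);
            else
                memcpy(grain, in + count, chunk * sizeof *grain);
            for (size_t i = 0; i < chunk; i++)
                out[count + i] += grain[i];    /* overlap-add */
            frames++;
        }
        memcpy(y, out + chunk, n * sizeof *y); /* skip the leading padding */
    }
    free(in);
    free(out);
    free(grain);
    return frames;
}
```

With the script's values (chunk = 2048, steps_1 = 512), successive chunks overlap by 75%. As a result, every output sample away from the edges receives contributions from four chunks.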
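As can be seen, the MATLAB script contains no classes or interfaces. The denoise subfunction it
calls is listed next. The C code in Section 4.7 was generated from the MATLAB code with the
MATLAB Compiler. It initializes the MATLAB Component Runtime and calls the compiled function,
rather than re-implementing the algorithm line by line.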
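function ft = denoise(f,r,coef)
% Scale each frequency bin by r./(r+c): strong bins pass almost unchanged,
% weak bins are attenuated. The threshold is tested element by element,
% because "if abs(f) >= 0.001" on a vector is true only if every element passes.
c = coef*ones(size(r));
small = abs(f) < 0.001;
c(small) = sqrt(coef);
ft = f.*(r./(r+c));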
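The function shown above is the denoise function. It multiplies each frequency bin f of a
chunk's spectrum by the gain r/(r + coef), where r = |f| is the magnitude of the bin; sqrt(coef)
is used in place of coef for near-zero values. Strong spectral components, for which r is much
larger than coef, pass almost unchanged. Weak components, which are assumed to be mainly noise,
are strongly attenuated. The gain is applied once to each chunk rather than recursively, and it
depends on the magnitude of each component, not directly on its frequency.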
4.7 Code to filter noise in C Language
#include <stdio.h>
#include <string.h>   /* strlen, used by the output handlers */
#include "mclmcr.h"
#ifdef __cplusplus
extern "C" {
#endif
extern const unsigned char __MCC_denoise2_public_data[];
extern const char *__MCC_denoise2_name_data;
extern const char *__MCC_denoise2_root_data;
extern const unsigned char __MCC_denoise2_session_data[];
extern const char *__MCC_denoise2_matlabpath_data[];
extern const int __MCC_denoise2_matlabpath_data_count;
extern const char *__MCC_denoise2_mcr_runtime_options[];
extern const int __MCC_denoise2_mcr_runtime_option_count;
extern const char *__MCC_denoise2_mcr_application_options[];
extern const int __MCC_denoise2_mcr_application_option_count;
#ifdef __cplusplus
}
#endif
static HMCRINSTANCE _mcr_inst = NULL;
static int mclDefaultPrintHandler(const char *s)
{
return fwrite(s, sizeof(char), strlen(s), stdout);
}
static int mclDefaultErrorHandler(const char *s)
{
int written = 0, len = 0;
len = strlen(s);
written = fwrite(s, sizeof(char), len, stderr);
if (len > 0 && s[ len-1 ] != '\n')
written += fwrite("\n", sizeof(char), 1, stderr);
return written;
}
bool denoise2InitializeWithHandlers(
mclOutputHandlerFcn error_handler,
mclOutputHandlerFcn print_handler
)
{
if (_mcr_inst != NULL)
return true;
if (!mclmcrInitialize())
return false;
if (!mclInitializeComponentInstance(&_mcr_inst,
__MCC_denoise2_public_data,
__MCC_denoise2_name_data,
__MCC_denoise2_root_data,
__MCC_denoise2_session_data,
__MCC_denoise2_matlabpath_data,
__MCC_denoise2_matlabpath_data_count,
__MCC_denoise2_mcr_runtime_options,
__MCC_denoise2_mcr_runtime_option_count,
true, NoObjectType, ExeTarget, NULL,
error_handler, print_handler))
return false;
return true;
}
bool denoise2Initialize(void)
{
return denoise2InitializeWithHandlers(mclDefaultErrorHandler,
mclDefaultPrintHandler);
}
void denoise2Terminate(void)
{
if (_mcr_inst != NULL)
mclTerminateInstance(&_mcr_inst);
}
int main(int argc, const char **argv)
{
int _retval;
if (!mclInitializeApplication(__MCC_denoise2_mcr_application_options,
__MCC_denoise2_mcr_application_option_count))
return 0;
if (!denoise2Initialize())
return -1;
_retval = mclMain(_mcr_inst, argc, argv, "denoise2", 0);
if (_retval == 0 /* no error */) mclWaitForFiguresToDie(NULL);
denoise2Terminate();
mclTerminateApplication();
return _retval; }
/*
* MATLAB Compiler: 4.0 (R14)
* Date: Sun Oct 04 09:55:11 2009
* Arguments: "-B" "macro_default" "-m" "-W" "main" "-T" "link:exe" "denoise2"
*/
#ifdef __cplusplus
extern "C" {
#endif
const unsigned char __MCC_denoise2_public_data[] = {'3', '0', '8', '1', '9',
'D', '3', '0', '0', 'D',
'0', '6', '0', '9', '2',
'A', '8', '6', '4', '8',
'8', '6', 'F', '7', '0',
'D', '0', '1', '0', '1',
'0', '1', '0', '5', '0',
'0', '0', '3', '8', '1',
'8', 'B', '0', '0', '3',
'0', '8', '1', '8', '7',
'0', '2', '8', '1', '8',
'1', '0', '0', 'C', '4',
'9', 'C', 'A', 'C', '3',
'4', 'E', 'D', '1', '3',
'A', '5', '2', '0', '6',
'5', '8', 'F', '6', 'F',
'8', 'E', '0', '1', '3',
'8', 'C', '4', '3', '1',
'5', 'B', '4', '3', '1',
'5', '2', '7', '7', 'E',
'D', '3', 'F', '7', 'D',
'A', 'E', '5', '3', '0',
'9', '9', 'D', 'B', '0',
'8', 'E', 'E', '5', '8',
'9', 'F', '8', '0', '4',
'D', '4', 'B', '9', '8',
'1', '3', '2', '6', 'A',
'5', '2', 'C', 'C', 'E',
'4', '3', '8', '2', 'E',
'9', 'F', '2', 'B', '4',
'D', '0', '8', '5', 'E',
'B', '9', '5', '0', 'C',
'7', 'A', 'B', '1', '2',
'E', 'D', 'E', '2', 'D',
'4', '1', '2', '9', '7',
'8', '2', '0', 'E', '6',
'3', '7', '7', 'A', '5',
'F', 'E', 'B', '5', '6',
'8', '9', 'D', '4', 'E',
'6', '0', '3', '2', 'F',
'6', '0', 'C', '4', '3',
'0', '7', '4', 'A', '0',
'4', 'C', '2', '6', 'A',
'B', '7', '2', 'F', '5',
'4', 'B', '5', '1', 'B',
'B', '4', '6', '0', '5',
'7', '8', '7', '8', '5',
'B', '1', '9', '9', '0',
'1', '4', '3', '1', '4',
'A', '6', '5', 'F', '0',
'9', '0', 'B', '6', '1',
'F', 'C', '2', '0', '1',
'6', '9', '4', '5', '3',
'B', '5', '8', 'F', 'C',
'8', 'B', 'A', '4', '3',
'E', '6', '7', '7', '6',
'E', 'B', '7', 'E', 'C',
'D', '3', '1', '7', '8',
'B', '5', '6', 'A', 'B',
'0', 'F', 'A', '0', '6',
'D', 'D', '6', '4', '9',
'6', '7', 'C', 'B', '1',
'4', '9', 'E', '5', '0',
'2', '0', '1', '1', '1'
, '\0'};
const char *__MCC_denoise2_name_data = "denoise2";
const char *__MCC_denoise2_root_data = "";
const unsigned char __MCC_denoise2_session_data[] = {'7', '7', 'B', 'D', '1',
'6', '2', '3', '5', '5',
'4', '5', '0', 'A', 'B',
'1', '7', '3', '9', '0',
'4', 'D', '4', '6', '7',
'2', 'E', '3', '6', 'B',
'3', '2', '4', '7', '5',
'6', '1', '0', 'F', '3',
'5', '2', '8', 'D', '5',
'3', '8', '2', '3', '4',
'4', 'A', '6', 'B', '6',
'3', '8', 'E', '4', 'E',
'A', '8', '2', 'F', '9',
'4', '1', '8', 'E', '9',
'1', 'C', '1', 'F', '8',
'F', '7', '6', '0', '2',
'D', 'B', '3', 'B', 'F',
'3', '4', '9', 'B', 'C',
'2', '8', 'C', '6', 'A',
'9', '9', '6', '4', '9',
'6', '3', 'C', '6', '8',
'4', '1', '1', '8', '5',
'5', 'E', '2', '3', '5',
'B', '9', '7', '9', '7',
'0', '9', 'B', 'A', 'F',
'7', 'E', 'D', '0', 'C',
'0', '5', 'F', 'E', '2',
'C', '6', '3', '6', '6',
'D', 'F', 'B', '6', '0',
'F', '6', 'B', 'F', 'F',
'2', '9', '4', '4', '2',
'0', '3', 'C', 'C', 'C',
'8', 'E', '3', '7', 'F',
'A', '4', '5', 'A', '9',
'A', '5', 'B', '7', '2',
'0', '0', 'B', 'E', '3',
'F', 'E', '0', 'E', 'B',
'1', 'C', '0', '7', 'D',
'3', '9', 'D', 'F', '0',
'7', '4', '2', 'B', '9',
'E', '3', 'A', '2', 'F',
'3', '3', 'E', '9', '8',
'E', '5', 'C', '9', 'B',
'B', 'D', '3', '6', 'B',
'7', 'D', 'E', '8', '3',
'2', 'B', '9', '7', '5',
'F', '3', '0', '7', '7',
'D', 'F', '8', '1', 'F',
'A', '9', 'B', '4', 'F',
'E', '3', '5', '4', 'F',
'B', '1', '8', 'E', '1',
'D', '\0'};
const char *__MCC_denoise2_matlabpath_data[] = {"denoise2/",
"toolbox/compiler/deploy/",
"$TOOLBOXMATLABDIR/general/",
"$TOOLBOXMATLABDIR/ops/",
"$TOOLBOXMATLABDIR/lang/",
"$TOOLBOXMATLABDIR/elmat/",
"$TOOLBOXMATLABDIR/elfun/",
"$TOOLBOXMATLABDIR/specfun/",
"$TOOLBOXMATLABDIR/matfun/",
"$TOOLBOXMATLABDIR/datafun/",
"$TOOLBOXMATLABDIR/polyfun/",
"$TOOLBOXMATLABDIR/funfun/",
"$TOOLBOXMATLABDIR/sparfun/",
"$TOOLBOXMATLABDIR/scribe/",
"$TOOLBOXMATLABDIR/graph2d/",
"$TOOLBOXMATLABDIR/graph3d/",
"$TOOLBOXMATLABDIR/specgraph/",
"$TOOLBOXMATLABDIR/graphics/",
"$TOOLBOXMATLABDIR/uitools/",
"$TOOLBOXMATLABDIR/strfun/",
"$TOOLBOXMATLABDIR/imagesci/",
"$TOOLBOXMATLABDIR/iofun/",
"$TOOLBOXMATLABDIR/audiovideo/",
"$TOOLBOXMATLABDIR/timefun/",
"$TOOLBOXMATLABDIR/datatypes/",
"$TOOLBOXMATLABDIR/verctrl/",
"$TOOLBOXMATLABDIR/codetools/",
"$TOOLBOXMATLABDIR/helptools/",
"$TOOLBOXMATLABDIR/winfun/",
"$TOOLBOXMATLABDIR/demos/",
"toolbox/local/",
"toolbox/compiler/"};
const int __MCC_denoise2_matlabpath_data_count = 32;
const char *__MCC_denoise2_mcr_application_options[] = { "" };
const int __MCC_denoise2_mcr_application_option_count = 0;
const char *__MCC_denoise2_mcr_runtime_options[] = { "" };
const int __MCC_denoise2_mcr_runtime_option_count = 0;
#ifdef __cplusplus
}
#endif
CHAPTER 5
5.0 Implementation
The Agile development process was chosen for the development, and the system was built in three
iterations. The basic objective of the first iteration was to build a speech recognition engine.
Various methods were tried out, and the speech recognition engine was built in this iteration.
Figure 24: SR Engine
The figure below shows the functionality of the speech recognition engine. It can open a .wav
file to play it or to recognize the speech in it.
Figure 25 Open file
Once a file has been selected for recognition, the user can press the Start button to start the
recognition process. The recognized output can be viewed in the text output section.
Figure 26: Text output
The noise filtering process was implemented in the second iteration, entirely in MATLAB.
It does not have a user interface. In the first version, the noise filtering engine was not
very efficient: many isolated noise packets remained in the spectrum. In the second version,
the system achieved much better performance.
A noisy speech file is given as input, and running the program produces a noise-filtered .wav
file.
The search engine was built in the third iteration. When the user runs the search engine, it
accesses the local database and returns the search results.
Figure 27 Speech Search Engine
CHAPTER 6
6.0 Test Plan
6.1 Background
The system built for the research project comprises three main parts. The speech recognition
section is the key part of this application. The noise filtering section is another key area
taken into account. There is also a text search, which provides the facility to search speeches
by content. Because this was a technical project, and given its nature, the testing criteria
differ from those of other projects.
6.2 Introduction
The testing criteria in this test plan are based on the input speech signals for speech
recognition and noise filtering, and on the search criteria. Due to the nature of this project,
industrial test plans cannot be used, as the project is not a commercial project. For the speech
recognition tests, speech in a digital format will be used. Speech recognition is still at the
research stage, so it is not advisable to implement a standard heavyweight test plan. Basic test
plans are sufficient to assess the testing criteria mentioned in the project.
6.3 Assumptions
Before declaring any assumptions, it is advisable to understand the nature of the project. Within the project scope we assume that the speech recognition engine works only for noiseless speech inputs, and only for English spoken with a standard accent. Noisy speech will not be used as input, because the speech engine cannot directly identify the noise factor and filter it.
The system can identify only the most commonly spoken words. It is possible to add a larger vocabulary; however, the system has not been designed for high-level language identification and processing.
Noise filtering can be done on ".wav" files only. The system cannot eliminate the noise factor completely.
A file whose noise has been filtered cannot be used for recognition, because the speech recognition system works on noiseless speech only.
6.4 Features to be tested
For the speech recognition system, a noiseless speech input in .wav format will be tested to assess its continuous speech recognition capability, which is a distinguishing feature of modern speech recognition systems.
A noisy speech file will be uploaded to the noise filtering system, which produces an output file with the noise reduced to a reasonable level. The efficiency of the noise filtering system could be measured by the time it takes to process a file; this is not addressed in the project.
For the speech searching part, the system uses a file search. The search mechanism includes efficient file searching and text matching. Once the user types a phrase, the system shows the name of the file that best matches it.
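One simple way to decide which file to show for a phrase is to count how often the phrase occurs in each recognized transcript and rank the files by that count. The sketch below assumes the transcripts are already available as plain text keyed by file name; the class name and the in-memory map are hypothetical, and the project's actual matching code is not shown in this document.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical content search over recognized transcripts. */
final class TranscriptSearch {
    /** Counts non-overlapping, case-insensitive occurrences of phrase in text. */
    static int countOccurrences(String text, String phrase) {
        String t = text.toLowerCase(Locale.ROOT);
        String p = phrase.toLowerCase(Locale.ROOT);
        if (p.isEmpty()) {
            return 0;
        }
        int count = 0;
        for (int i = t.indexOf(p); i >= 0; i = t.indexOf(p, i + p.length())) {
            count++;
        }
        return count;
    }

    /** File names ordered by occurrence count, most first; files without a match are left out. */
    static List<String> rank(Map<String, String> transcripts, String phrase) {
        List<Map.Entry<String, Integer>> hits = new ArrayList<>();
        for (Map.Entry<String, String> e : transcripts.entrySet()) {
            int n = countOccurrences(e.getValue(), phrase);
            if (n > 0) {
                hits.add(Map.entry(e.getKey(), n));
            }
        }
        hits.sort((a, b) -> Integer.compare(b.getValue(), a.getValue()));
        List<String> names = new ArrayList<>();
        for (Map.Entry<String, Integer> h : hits) {
            names.add(h.getKey());
        }
        return names;
    }
}
```

A raw count is a crude relevance score: it ignores word boundaries, so "speech" also matches inside "speeches".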
The system can play a .wav file before it is uploaded for the recognition process.
6.5 Suspension and resumption criteria
While the system testing process is running, defects may give reasons to suspend it. Suspension criteria define what those reasons are. According to Anon. (n.d.), Suspension criteria & resumption requirements, the suspension criteria are as follows:
Unavailability of external dependent systems during execution.
When a defect is introduced that cannot allow any further testing.
Critical path deadline is missed so that the client will not accept delivery even if all
testing is completed.
A specific holiday shuts down both development and testing
The resumption criteria are as follows:
When the external dependent systems become available again.
When a fix is successfully implemented and the Testing Team is notified to continue
testing.
The contract is renegotiated with the client to extend delivery.
The holiday period ends.
According to Anon. (n.d.), Suspension criteria & resumption requirements:
Suspension criteria assume that testing cannot go forward and that going backward is also not possible. A failed build would not suffice, as you could generally continue to use the previous build. Most major or critical defects would also not constitute suspension criteria, as other areas of the system could continue to be tested.
6.6 Environmental needs
A few environmental needs must be met before testing the system. They can be classified as software needs, hardware needs and legal needs. There are no legal needs, because the system has no links with legal situations.
The software needs are listed below:
Java Runtime Environment
MATLAB development software
NetBeans 6.5 or greater
Sound driver software
Windows XP operating system
The hardware needs are:
A computer (hardware requirements are specified under system requirements in another chapter)
Multimedia devices
6.7 System testing
Speeches and lectures with different accents (English only, US and British): to test the accuracy of the speech recognition engine, it will be tested against different accents. The results are expected to differ minimally from what was spoken, with few errors.
Content search: when the user searches by content by typing a word or phrase, the appropriate search results will be displayed, namely the speeches or lectures containing the specified word or phrase.
6.8 Unit testing
The initial test covered the initial user interface. At first the system loads only the basic interactions with the user. It does not load any calculation or extraction functionality until the user provides a valid input.
Test Case: Test Case One
Description:
The user runs the speech recognition system for the first time.
Expected Output:
The Open, Start and Open Speech buttons are enabled.
The Encode To wav and Noise Filter buttons remain disabled.
The area below "Open a speech file" is blank.
The text output is blank.
Actual Output:
The Open, Start and Open Speech buttons are enabled.
The Encode To wav and Noise Filter buttons remain disabled.
The area below "Open a speech file" is blank.
The text output is blank.
The actual output matched the expected output.
Table 9 Test Case 1
On the initial run, the speech recognition system does not load any algorithms. Once an input is given, the system loads the components needed for processing. This mechanism makes efficient use of system resources.
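The deferred loading described above is a form of lazy initialization. A minimal sketch, assuming the expensive object (for example a configured recognizer) can be created by a factory function; `LazyComponent` is a hypothetical name, not a class from the project.

```java
import java.util.function.Supplier;

/** Hypothetical holder that builds an expensive component only when it is first needed. */
final class LazyComponent<T> {
    private final Supplier<T> factory;
    private T instance; // stays null until get() is first called

    LazyComponent(Supplier<T> factory) {
        this.factory = factory;
    }

    /** True once the component has been created. */
    synchronized boolean isLoaded() {
        return instance != null;
    }

    /** Creates the component on the first call and returns the same instance afterwards. */
    synchronized T get() {
        if (instance == null) {
            instance = factory.get();
        }
        return instance;
    }
}
```

The methods are synchronized so that, for example, a button handler and a background thread cannot both trigger the expensive load.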
The second test begins when the user provides an input to the system. This test case exercises the input of the speech recognition system, which can be a .wav file.
Test Case: Test Case Two
Description:
The user opens a file to feed the speech recognition system.
The user provides the system with a .wav file.
The first input speech contains the digits one to nine spoken in a British accent.
The file must be noise free.
Expected Output:
The names of the identified digits are displayed in the text output area.
Actual Output:
Due to variations in dialect, the results are not always exactly as expected.
Within the range one to nine, the system identifies the digits and displays the output.
Table 10 Test Case 2
Digit identification can be extended beyond ten. As the names of the digits to be identified become longer, the system makes more identification errors.
The third test is based on the user providing a file that contains noise for identification. The system does not work for files that contain noise.
Test Case: Test Case Three
Description:
The user provides the system with a .wav file containing noise.
Expected Output:
The system throws an error or shows no results.
Actual Output:
The actual output varies with the noise level. If the noise density is in a higher range, the system fails with an error, which can be “severe null”.
Otherwise the system returns blank results, because the words are barely identifiable.
Table 11 Test Case 3
The system has no functionality to measure noise levels, and the project scope does not cover in-depth noise analysis. The noise levels mentioned above were judged subjectively by the user.
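Noise levels judged by ear are hard to reproduce. If the noisy test files were instead made by mixing a clean recording with recorded noise, the signal-to-noise ratio could be set and reported in decibels, SNR = 10 log10(Ps / Pn), where Ps and Pn are the mean powers of the speech and the noise. The following is a sketch under that assumption; the class and method names are hypothetical.

```java
/** Hypothetical helpers for putting a number on noise levels instead of judging them by ear. */
final class NoiseLevel {
    /** Mean power of a signal: the average of the squared samples. */
    static double power(double[] x) {
        double sum = 0;
        for (double v : x) {
            sum += v * v;
        }
        return sum / x.length;
    }

    /** SNR in dB of a clean signal against a separately known noise component: 10*log10(Ps/Pn). */
    static double snrDb(double[] signal, double[] noise) {
        return 10.0 * Math.log10(power(signal) / power(noise));
    }

    /** Adds noise to a clean signal, scaled so that the mixture has the requested SNR in dB. */
    static double[] mixAtSnr(double[] signal, double[] noise, double targetDb) {
        if (signal.length != noise.length) {
            throw new IllegalArgumentException("signal and noise must have the same length");
        }
        // choose gain g so that 10*log10(Ps / (g^2 * Pn)) equals targetDb
        double g = Math.sqrt(power(signal) / (power(noise) * Math.pow(10.0, targetDb / 10.0)));
        double[] mixed = new double[signal.length];
        for (int i = 0; i < signal.length; i++) {
            mixed[i] = signal[i] + g * noise[i];
        }
        return mixed;
    }
}
```

For example, `mixAtSnr(clean, noise, 10.0)` would produce a test input at 10 dB SNR, so the outcome of Test Case Three could be reported per noise level.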
The system assumes that users will not upload files containing noise, and this rule is stated clearly in the assumptions.
The fourth test checks the system's speech recognition capabilities with words.
Test Case: Test Case Four
Description:
The user provides the system with a .wav file containing basic words.
The input does not contain any noise.
Expected Output:
The system identifies all the words and shows the output precisely.
Actual Output:
The system identified words with an error rate fluctuating between 20% and 35%.
Not all the words are identified by the system.
Table 12 Test Case 4
The system does not identify all the words. Identification depends on the speaking rate and the intensity of the phonemes. Higher phoneme intensity helps speech recognition systems achieve more precise results.
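The 20% to 35% error rate in Test Case Four could be made reproducible with the word error rate (WER) commonly used to score recognizers: the minimum number of word substitutions, deletions and insertions needed to turn the recognized text into a reference transcript, divided by the number of reference words. A sketch, assuming a hand-typed reference transcript exists for each test file; the class name is hypothetical.

```java
import java.util.Locale;

/** Hypothetical scorer for recognition tests: word error rate against a reference transcript. */
final class WordErrorRate {
    private static String[] words(String s) {
        String t = s.trim().toLowerCase(Locale.ROOT);
        return t.isEmpty() ? new String[0] : t.split("\\s+");
    }

    /** WER = (substitutions + deletions + insertions) / number of reference words. */
    static double wer(String reference, String hypothesis) {
        String[] ref = words(reference);
        String[] hyp = words(hypothesis);
        if (ref.length == 0) {
            throw new IllegalArgumentException("reference must contain at least one word");
        }
        // d[i][j] = fewest edits turning the first i reference words into the first j hypothesis words
        int[][] d = new int[ref.length + 1][hyp.length + 1];
        for (int i = 0; i <= ref.length; i++) d[i][0] = i; // delete all i words
        for (int j = 0; j <= hyp.length; j++) d[0][j] = j; // insert all j words
        for (int i = 1; i <= ref.length; i++) {
            for (int j = 1; j <= hyp.length; j++) {
                int substitute = d[i - 1][j - 1] + (ref[i - 1].equals(hyp[j - 1]) ? 0 : 1);
                int delete = d[i - 1][j] + 1;
                int insert = d[i][j - 1] + 1;
                d[i][j] = Math.min(substitute, Math.min(delete, insert));
            }
        }
        return (double) d[ref.length][hyp.length] / ref.length;
    }
}
```

For example, `wer("one two three four five", "one two tree four")` is 0.4: one substitution (three/tree) and one deletion (five) over five reference words. Insertions can push WER above 1.0.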
Test case five tests the performance of noise filtering. The noise filtering system was built in
MATLAB.
Test Case: Test Case Five
Description:
The user provides the noise filtering system with a noisy file.
The input file must be in .wav format.
The user has to open the MATLAB scripts, import them into the working directory and run them.
The input file must be in the same directory.
Expected Output:
An output file named “output.wav” should be created in the working folder.
“output.wav” contains the noise-filtered version of the input file.
The amplitude of the output file should not differ from the input by an amount a human can perceive.
Actual Output:
The output file is created in the working folder.
The output file has less noise than the input file.
The output file is not noise free.
The amplitude differs by an amount the human ear can perceive.
Table 13 Test Case 5
There is still no mechanism to remove noise completely; the system works on predefined algorithms.
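The amplitude criterion in Test Case Five was judged by ear. A sketch of a numeric check, assuming the input and output are available as sample arrays on the same scale: compare their RMS levels in decibels, 20 log10(rms_out / rms_in). Because removing noise itself lowers the level, the comparison is most meaningful on stretches that contain speech. The class name is hypothetical, and any pass/fail threshold would be the tester's choice.

```java
/** Hypothetical helper comparing the loudness of a filter's input and output. */
final class LevelChange {
    /** Root-mean-square amplitude of a block of samples. */
    static double rms(double[] x) {
        double sum = 0;
        for (double v : x) {
            sum += v * v;
        }
        return Math.sqrt(sum / x.length);
    }

    /** 20*log10(rmsOut / rmsIn): 0 dB means unchanged, negative means the output is quieter. */
    static double levelChangeDb(double[] input, double[] output) {
        return 20.0 * Math.log10(rms(output) / rms(input));
    }
}
```

Halving every sample, for instance, gives a level change of about -6.02 dB.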
Test case six tests the search functionality, which acts as a speech search engine.
Test Case: Test Case Six
Description:
The user runs the search engine.
Port 8080 must be free.
Expected Output:
The user types a phrase into the search engine and presses the search button.
If there is a match in the database, the result shows true.
If there is no match, the result shows false.
Actual Output:
If a match is found, “true” is displayed in the results.
If there is no match, “false” is displayed in the results.
Table 14 Test Case 6
The system is not built as an actual search engine; it only demonstrates how a speech search engine works. As a future enhancement, an actual search engine could be built.
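Test Case Six requires port 8080 to be free before the search engine starts (8080 is GlassFish's default HTTP listener port). A sketch of a pre-flight check using only `java.net.ServerSocket`; the class name is hypothetical, and the project may not have performed such a check.

```java
import java.io.IOException;
import java.net.ServerSocket;

/** Hypothetical pre-flight check run before starting the search engine's web server. */
final class PortCheck {
    /**
     * Returns true if a server socket can be bound to the port at this moment.
     * The answer can change right after the check, so this is a diagnostic, not a reservation.
     */
    static boolean isPortFree(int port) {
        try (ServerSocket probe = new ServerSocket(port)) {
            return true; // bound successfully; the socket is closed again on exit
        } catch (IOException e) {
            return false; // usually a java.net.BindException: another process is listening
        }
    }
}
```

Calling `isPortFree(8080)` before deployment would let the user see a clear message instead of a failed server start.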
6.9 Performance Testing
The system's performance was tested on different operating systems, including virtual operating environments.
Microsoft Windows XP was taken as the baseline operating system for the measurements.
Operating System: Microsoft Windows XP
Speech recognition engine configuration time: between 0.5 and 1 second.
Efficiency of speech recognition: an input signal with high phoneme intensity, free from noise, shorter than 10 seconds and with low word density takes around 1 to 12 seconds. Input signals with many words take longer.
Efficiency of noise filtering (MATLAB): the noise filtering system generates its output in less than 200 milliseconds for .wav clips lasting 2 to 10 seconds.
Performance of the speech search engine: the startup time of the speech search averages 8 to 15 seconds.
Table 15: Performance testing on Windows XP
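The timings in Table 15 are given as ranges. A sketch of how such figures could be collected consistently on each operating system: run the code a few times untimed so just-in-time compilation does not distort the first measurement, then average `System.nanoTime()` differences over several runs. The class name and the warm-up count are assumptions, not the procedure used in the project.

```java
/** Hypothetical helper for repeatable timing measurements on each operating system. */
final class Timing {
    /** Mean wall-clock time of task in milliseconds over `runs` timed runs, after `warmups` untimed runs. */
    static double meanMillis(Runnable task, int warmups, int runs) {
        if (runs < 1) {
            throw new IllegalArgumentException("runs must be at least 1");
        }
        for (int i = 0; i < warmups; i++) {
            task.run(); // untimed: lets the JIT compile the hot code first
        }
        long totalNanos = 0;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime(); // nanoTime is intended for elapsed-time measurement
            task.run();
            totalNanos += System.nanoTime() - start;
        }
        return totalNanos / 1_000_000.0 / runs;
    }
}
```

A call such as `Timing.meanMillis(() -> recognizer.recognize(file), 1, 5)`, where `recognizer` and `file` are hypothetical, would then give one comparable figure per operating system.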
The performance of the speech search engine depends heavily on the operating system. For example, Windows operating systems use considerably more resources than UNIX-based operating systems.
The speech search system runs on the GlassFish server, which performs better on UNIX-based operating systems. In Windows environments the speech search engine experienced many deadlocks.
Operating System: Ubuntu 9.04
Speech recognition engine configuration time: between 0.2 and 0.8 seconds.
Efficiency of speech recognition: an input signal with high phoneme intensity, free from noise, shorter than 5 seconds and with low word density takes around 1 to 5 seconds. The system performs noticeably better in the Ubuntu environment.
Efficiency of noise filtering (MATLAB): noise filtering was more efficient than in the Windows environment.
Performance of the speech search engine: the search and startup times were better than on Windows XP.
Table 16: Performance Testing on Ubuntu
When the search engine is run many times in the Windows environment, it becomes more likely to crash and may not return correct results.
When the system performs speech recognition several times in succession, recognition slows down.
Java runs in a virtual machine, and the recognition process needs considerable processing power. Because of these factors, the efficiency of the system degrades with repeated use.
6.10 Integration Testing
Integration testing is a logical extension of unit testing: it identifies problems that appear when units are combined. The system under test comprises different subsystems with different functionalities.
It is not possible to combine the noise filtering system with the speech recognition system or with the search web browser.
An overall test mechanism was therefore used for integration testing, because the system comprises subsystems that are only indirectly connected to each other.
Big Bang testing
Big Bang testing is the process of taking all the unit-tested components of a system and tying them together at once. This approach is mostly suitable for small systems and may leave many errors unidentified during the testing stages. If a developer has done unit testing correctly, Big Bang testing helps to uncover more errors and saves money and time.
After performing Big Bang testing on the system, the following faults were found:
Continuous operation of the search engine cannot be guaranteed.
If the input signal is long, the system throws an out-of-memory error.
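One common way to avoid running out of memory on long inputs is to stream the audio in fixed-size blocks instead of loading the whole file. The sketch below reads an `AudioInputStream` block by block; the point where each block would be handed to the recognizer or filter is marked as a hypothetical hook, since the document does not say whether the project's recognizer accepts partial input. The class name is also hypothetical.

```java
import java.io.IOException;
import javax.sound.sampled.AudioInputStream;

/** Hypothetical block-wise reader that keeps only one block of audio in memory at a time. */
final class ChunkedReader {
    /** Reads the stream in blocks of framesPerChunk frames and returns the total number of frames read. */
    static long readInChunks(AudioInputStream in, int framesPerChunk) throws IOException {
        int frameSize = in.getFormat().getFrameSize();
        if (frameSize <= 0 || framesPerChunk <= 0) {
            throw new IllegalArgumentException("frame size and chunk size must be positive");
        }
        byte[] buffer = new byte[framesPerChunk * frameSize]; // reused for every block
        long totalBytes = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            // hypothetical hook: hand buffer[0..n) to the recognizer or filter here
            totalBytes += n;
        }
        return totalBytes / frameSize;
    }
}
```

Memory use then depends on the block size rather than on the length of the recording.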
The disadvantages of Big Bang testing are:
Integration testing cannot start until all the modules have been successfully evaluated.
It is harder to track down the causes of errors.
Incremental Testing
Incremental testing allows you to compare and contrast two functionalities that you are testing, and to add and test other modules within the testing time.
Incremental testing could not be performed on this system, because there are no parallel functionalities within the system that interact with each other.
CHAPTER 7
CRITICAL EVALUATION AND FUTURE ENHANCEMENTS
7.1 Critical evaluation
The entire project was about speech recognition using a digital signal as input, and search by content. The project unites several other research areas. At the initial stage, the research focused on speech recognition.
The barriers met in the initial stage were:
Human speech recognition
At the beginning there was no way to explain the speech recognition process, the mechanisms behind it and how it is performed.
Speech recognition engine
Studying a speech recognition engine was a crucial part of the design phase, but there was no speech recognition engine available to analyze or study.
To overcome these two barriers, the first step was to understand the functionality of speech recognition. After understanding such a system and completing a basic sketch of the flow diagram, there was a sufficient starting point for development.
When someone talks into a microphone, it is easy to record the human voice. Once recorded, the voice is no longer in analogue form; the obvious digital format was a .wav file. The system therefore performs speech recognition on files in .wav format.
In the development phase, studying speech recognition systems did not help much with further progress, because in audio formats the digital signal processing part is hidden. The system has to address the DSP part in a reasonable manner. Digital signal processing is concerned with representing signals as sequences of numbers or symbols and with processing those signals.
The course content we studied did not include a single module that taught interface programming, microcontroller programming or digital signal processing.
Building functionality to handle the digital signal processing part from scratch was a tedious job, and our knowledge was not sufficient to build it within the available time.
At the initial stage, the plan was to develop the entire system in Java, but Java did not have a proper built-in API or facilities for digital signal processing. However, there were some reliable third-party components that just managed to perform the task.
Plugging in the third-party tools was another issue, but we finally managed to find code to accomplish the task.
A few speech recognition systems had been built using Java, but they were not built for continuous speech recognition or for noise reduction.
There were many issues at first. We had to define a grammar format, and there were two options. One was JVXML (Java VoiceXML), a technology that provides speech synthesis and recognition capabilities; voice commands can be embedded in web sites using VoiceXML.
For the project I chose JSGF, the Java Speech Grammar Format. JSGF supports built-in dictionaries capable of handling digits and words, and multi-language capabilities can be plugged in.
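A JSGF grammar for the digit tests can be very small. The sketch below generates one from a word list; the grammar and rule names are hypothetical, and the output follows the JSGF V1.0 header, grammar declaration and public-rule syntax, with `*` meaning "repeat zero or more times".

```java
import java.util.List;

/** Hypothetical generator for a word-loop grammar in JSGF V1.0 syntax. */
final class JsgfGrammar {
    /** Builds a grammar whose public rule accepts any sequence of the given words. */
    static String wordLoop(String grammarName, String ruleName, List<String> words) {
        if (words.isEmpty()) {
            throw new IllegalArgumentException("at least one word is required");
        }
        return "#JSGF V1.0;\n"
                + "grammar " + grammarName + ";\n"
                + "public <" + ruleName + "> = ( " + String.join(" | ", words) + " ) *;\n";
    }
}
```

Called with the words one to nine, it produces a grammar whose public rule `<digits>` accepts any sequence of those digit names.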
When developing systems in Java, it is always advisable to use components that readily support the Java platform.
The speech recognition system can be split into two systems. The Java virtual machine allocates a maximum of 128 MB of memory for programs run from NetBeans, and we could not explicitly set the amount of memory for the virtual machine when working with NetBeans.
The digit recognition part can be implemented and run in the NetBeans development environment.
However, there is not enough memory there to recognize speech that contains words, which needs a larger heap from the virtual machine. Because of that, speech recognition for words had to be run from the command prompt with "java -mx256m -jar", which allocates 256 MB of heap memory for speech recognition (-Xmx256m is the standard form of this option).
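Because word recognition needs a larger heap than the default, a program can check at start-up what ceiling it was actually given. A sketch using `Runtime.maxMemory()`; the class name and the 90% margin are assumptions. The margin is there because `maxMemory()` can report somewhat less than the -Xmx value with some garbage collectors.

```java
/** Hypothetical start-up check of the heap ceiling set with -Xmx. */
final class HeapReport {
    private static final long MIB = 1024L * 1024L;

    /** Heap ceiling in MiB as reported by the JVM (very large if no limit is set). */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / MIB;
    }

    /** True if the ceiling is at least 90% of the requested MiB, since maxMemory() may under-report -Xmx. */
    static boolean hasRoughlyMiB(long requestedMiB) {
        return maxHeapMiB() >= requestedMiB * 9 / 10;
    }
}
```

If `hasRoughlyMiB(256)` returns false, the program could ask the user to restart it from the command prompt with a larger -Xmx value instead of failing part-way through recognition.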
Noise filtering was another unsolved issue that the system had to address. Java had no proper support for noise filtering. For a technical project like this, it is essential to develop in 4th generation languages or in languages like C or assembly.
To do noise filtering I had to find an efficient approach. Many methodologies and algorithms were available; spectral subtraction and Wiener filtering are two of them, and not all of them were efficient. Finally I had to learn about FFT algorithms, which are widely used in hardware-level signal processing applications.
Java does not have a proper DSP API, so the only option was MATLAB. MATLAB is a 4th generation language and the proper instrument for noise filtering. I performed spectral subtraction: the spectrum of the incoming signal is treated as two parts, the speech and the noise. The noise part is an unknown quantity that cannot be expressed in an exact equation, so it is estimated and subtracted from the spectrum.
After performing the filtering, I was able to produce a noise-filtered speech output.
The first program was not efficient. Its output contained many isolated noisy bits. That first noise filtering system was built using Wiener filtering, which proved inefficient for filtering noise from continuous speech. Spectral subtraction was more efficient than the other algorithms tried, and it can be applied to any digital signal.
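The project's filter was written in MATLAB and is not reproduced here. As an illustration only, the following Java sketch shows the core of magnitude spectral subtraction on a single frame: for each frequency bin k, the estimated clean magnitude is max(|Y(k)| - |N(k)|, 0), where Y is the noisy spectrum and N the spectrum of a noise-only frame, and the noisy phase is kept. It uses a naive O(n^2) DFT to stay self-contained; a practical filter would use an FFT over overlapping, windowed frames and average the noise estimate over several frames. The class name is hypothetical.

```java
/** Illustrative magnitude spectral subtraction on one frame (hypothetical class, naive DFT). */
final class SpectralSubtraction {
    /**
     * @param frame      noisy samples y[t]
     * @param noiseFrame same-length samples containing only noise (the noise estimate)
     * @return samples whose spectrum has magnitude max(|Y(k)| - |N(k)|, 0) and the phase of Y(k)
     */
    static double[] denoiseFrame(double[] frame, double[] noiseFrame) {
        int n = frame.length;
        if (noiseFrame.length != n) {
            throw new IllegalArgumentException("frames must have the same length");
        }
        double[] re = new double[n];
        double[] im = new double[n];
        for (int k = 0; k < n; k++) {
            // forward DFT of both frames at bin k
            double yr = 0, yi = 0, nr = 0, ni = 0;
            for (int t = 0; t < n; t++) {
                double angle = -2.0 * Math.PI * k * t / n;
                yr += frame[t] * Math.cos(angle);
                yi += frame[t] * Math.sin(angle);
                nr += noiseFrame[t] * Math.cos(angle);
                ni += noiseFrame[t] * Math.sin(angle);
            }
            double noisyMag = Math.hypot(yr, yi);
            // subtract the noise magnitude and clamp negative results to zero
            double cleanMag = Math.max(noisyMag - Math.hypot(nr, ni), 0.0);
            // rescaling the noisy bin keeps its phase
            double gain = noisyMag > 0.0 ? cleanMag / noisyMag : 0.0;
            re[k] = yr * gain;
            im[k] = yi * gain;
        }
        // inverse DFT; for real input the gains match at bins k and n-k,
        // so the imaginary part of the result is (up to rounding) zero and is dropped
        double[] out = new double[n];
        for (int t = 0; t < n; t++) {
            double sum = 0;
            for (int k = 0; k < n; k++) {
                double angle = 2.0 * Math.PI * k * t / n;
                sum += re[k] * Math.cos(angle) - im[k] * Math.sin(angle);
            }
            out[t] = sum / n;
        }
        return out;
    }
}
```

The clamp at zero is what leaves the isolated residual bursts often called "musical noise", which is one reason practical versions over-subtract slightly and add a small spectral floor.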
Once I had done the noise filtering, I realized that speech recognition could have been done more easily in a development environment like MATLAB.
The search functionality was the tricky part. The system has to ensure that users can search by content. Many text searching algorithms are available, for example the Knuth-Morris-Pratt algorithm, the Rabin-Karp algorithm and the Boyer-Moore fast string searching algorithm.
These algorithms can be used for efficient text search. The system's search is meant to work like a search engine. Google has its own search hierarchies, but as a research project it was not possible to build a search that meets those criteria within the time constraints.
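As an example of the algorithms named above, the following is a sketch of the textbook Knuth-Morris-Pratt search, which runs in time proportional to the text length plus the pattern length because it never moves backwards in the text. It is not the search code used in the project, and the class name is hypothetical; for transcripts of the size tested here, `String.indexOf` would likely be adequate too.

```java
/** Textbook Knuth-Morris-Pratt substring search (hypothetical class name). */
final class Kmp {
    /** Returns the index of the first occurrence of pattern in text, or -1 if there is none. */
    static int indexOf(String text, String pattern) {
        int m = pattern.length();
        if (m == 0) {
            return 0;
        }
        // failure[i] = length of the longest proper prefix of pattern[0..i] that is also its suffix
        int[] failure = new int[m];
        for (int i = 1, k = 0; i < m; i++) {
            while (k > 0 && pattern.charAt(i) != pattern.charAt(k)) {
                k = failure[k - 1];
            }
            if (pattern.charAt(i) == pattern.charAt(k)) {
                k++;
            }
            failure[i] = k;
        }
        // scan the text once; on a mismatch, fall back within the pattern instead of the text
        for (int i = 0, k = 0; i < text.length(); i++) {
            while (k > 0 && text.charAt(i) != pattern.charAt(k)) {
                k = failure[k - 1];
            }
            if (text.charAt(i) == pattern.charAt(k)) {
                k++;
            }
            if (k == m) {
                return i - m + 1;
            }
        }
        return -1;
    }
}
```

For example, `Kmp.indexOf("speech search engine", "search")` returns 7.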
After completing the project, I realized that the best languages for this kind of work are technical languages. When trying to develop the system in Java I found many technical difficulties: there was no proper documentation, technical support was minimal, and many third-party components had to be used.
Through the project I gained knowledge of digital signal processing, which cannot easily be acquired by a software engineering student at APIIT. Another interesting area was speech recognition. On some occasions it seemed very hard to accomplish certain tasks, but while doing the recognition in Java I was able to accomplish certain parts as fully as possible.
As final thoughts, developing speech recognition systems needs a vast knowledge of programming. You need to know about digital signal processing, noise filtering mechanisms, XML, and how to configure a speech recognition engine. Knowledge of integration and algorithms is also advisable.
7.2 Suggestions for future enhancements
At the beginning of this project, the sole purpose was to build a speech recognition system able to identify human speech in .wav digital format and convert it to text, maintain a rich database, and give users the ability to search speech by its content.
While doing the project I found some difficulties in the speech recognition process, because it is an entirely vast area. The system I built for my research project identifies the words most commonly used in speeches about common activities.
For future enhancements, many modifications can be made to the system.
Improve efficiency via Neural Network capabilities
By introducing neural network capabilities into the system, we could improve the time efficiency of its various algorithms. Making the system more efficient would allow it to be integrated into mobile devices.
Expand the system scope for many languages
The system built for the research project can identify only English words; it cannot identify other languages such as Spanish, French, Arabic or Russian. By introducing additional functionality, we could make this system a universal product.
Enhanced noise filtering capabilities
By enhancing the noise filtering capabilities, we could build one system rather than two, which identifies the noise factor itself, reduces it and performs speech recognition. Currently Java does not have a proper noise filtering API for digital signal processing. If the entire system were converted to C or C++, it could become a feature-rich application.
Text to Speech Synthesis
By improving the functionalities above, we could maintain a rich database. If the system could translate speeches and lectures between languages, it would be helpful for users throughout the globe.
Integrate to web
If the system were made available on the web as an online speech recognition and search portal, it would help many users. As a web component or an add-on, the system could deliver a useful service to users throughout the globe.
The search functionality included in the system is only a replica of a search engine. As a future enhancement, an actual speech search engine could be built in collaboration with various free online music and speech streaming servers.
CHAPTER 8
8.0 Conclusion
This project gave me a great experience. As a research project, I believe it is a unique one for the institute. By doing this project I realized one important aspect of life: it is easy to think and harder to do.
When I started the project at the research stage, I thought I would be able to complete it in less time. But when I moved on to the coding stage, the development environments were still not ready to support the research done around the world.
There were still no efficient algorithms or APIs to meet the requirements of the problems. With the available resources I was able to build a system that meets the requirements in a reasonable manner. A computer is a dumb device; humans have to program it. Programming is an art, and programmers vary in how they approach research. Throughout my three years of academic life at APIIT I did not have the chance to develop a technical project, so this was an entirely new experience for me.
Given the project and the available resources, I think I have come up with a fair working application that meets the requirements stated in the project. It recognizes speech from a .wav file rather than from a microphone, which is something innovative. Noise filtering is still a developing area in computing. I have done noise filtering to a reasonable extent, the best I could with the available resources; it is another vast research area that is still growing. As final thoughts, I think I have done the best I could to complete the project.
REFERENCES (In Alphabetical Order)
Research papers on Agile development
Phalnikar, R., Deshpande, V.S. & Joshi, S.D., 2009. 'Applying Agile Principles for Distributed Software Development', International Conference on Advanced Computer Control, 2009, pp. 535-539.
Smith, M., Miller, J., Huang, L. & Tran, A., 2009. 'A More Agile Approach to Embedded System Development', IEEE Software, vol. 26, no. 3, May/June 2009, pp. 50-57.
Research papers on search engines
Varadarajan, R., Hristidis, V. & Li, T., 2008. 'Beyond Single-Page Web Search Results', IEEE Transactions on Knowledge and Data Engineering, vol. 20, no. 3, pp. 411-424.
Shao, Q., Sun, P. & Chen, Y., 2009. 'WISE: A Workflow Information Search Engine', IEEE International Conference on Data Engineering, 2009, pp. 1491-1494.
Research papers on web databases
Su, W., Wang, J. & Lochovsky, F.H., 2009. 'Record Matching over Query Results from Multiple Web Databases', IEEE Transactions on Knowledge and Data Engineering, 15 Apr. 2009.
Research papers on noise analysis
Anderson, D.V. & Clements, M.A., 1999. 'Audio signal noise reduction using multi-resolution sinusoidal modeling', Proceedings of the 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 2, pp. 805-808.
Godsill, S.J. & Rayner, P.J.W., 1996. 'Robust noise reduction for speech and audio signals', Proceedings of the 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), vol. 2, pp. 625-628.
Research papers on speech recognition
Wang, Z., Topkara, U., Schultz, T. & Waibel, A., 2002. 'Towards Universal Speech Recognition', Fourth IEEE International Conference on Multimodal Interfaces (ICMI'02), 2002, p. 247.
Kurschl, W., Mitsch, S., Prokop, R. & Schonbock, J., 2007. 'Gulliver-A Framework for Building Smart Speech-Based Applications', 40th Annual Hawaii International Conference on System Sciences (HICSS'07), 2007, p. 30c.
Buza, O., Toderean, G., Nica, A. & Caruntu, A., 2006. 'Voice Signal Processing for Speech Synthesis', 2006 IEEE International Conference on Automation, Quality and Testing, Robotics (AQTR), vol. 2, pp. 360-364.
Deligne, S., Dharanipragada, S., Gopinath, R., Maison, B. & Olsen, P., 2002. 'A Robust High Accuracy Speech Recognition System', IEEE Transactions on Speech and Audio Processing, vol. 10, pp. 1-11.
Abdulla, H.W. & Kasabov, N.K., 1999. The Concepts of Hidden Markov Model in Speech Recognition. [Online] Available from: http://www.aut.ac.nz/resources/research/research_institutes/kedri/downloads/pdf/waleed-kas-9909.pdf [Accessed 2009/04/20].
Yankelovich, N., Levow, G.A. & Marx, M., 1995. 'Designing SpeechActs: Issues in Speech User Interfaces'. In: CHI, 1995, pp. 1-12.
Deng, L. & Huang, X., 2004. 'Challenges in adopting speech recognition', Communications of the ACM, vol. 47, pp. 69-76.
Tu, Z. & Loizou, P.C., 1999. 'Speech recognition over the Internet using Java', Proceedings of the 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), vol. 4, pp. 2367-2370.
Zhang, W. & He, L. & Chow, Y. & Yang, R. & Su, Y., 2000. 'The study on distributed speech recognition system', In: IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2000, vol. 3, pp.1431-1434.
Ahmed, M.M. & Ahmed, A.M.B., 2005. 'Review And Challenges In Speech Recognition', In: ICCAS, 2005, pp.1-5.
Lin, E.C. & Yu, K. & Rutenbar, R.A. & Chen, T., 2007. 'A 1000-Word Vocabulary, Speaker-Independent, Continuous Live-Mode Speech Recognizer Implemented in a Single FPGA', In: FPGA, 18-20 February 2007, California, pp.60-69.
anon. (nd). School of Electrical, Computer and Telecommunications Engineering. [Online]. Available from: http://www.elec.uow.edu.au/staff/wysocki/dspcs/papers/004.pdf [Accessed 23 August 2009].
anon. (nd). Departement Elektrotechniek. [Online]. Available from: http://www.esat.kuleuven.be/psi/spraak/theses/08-09-en/clp_lp_mask.png [Accessed 22 September 2009].
Brin, S. & Page, L. (nd). The Anatomy of a Large-Scale Hypertextual Web Search Engine. [Online]. Available from: http://infolab.stanford.edu/~backrub/google.html [Accessed 24 March 2009].
anon. (nd). [Online]. Available from: http://www.methodsandtools.com/archive/scrum1.gif [Accessed 26 March 2009].
Development
Java Media Framework API and Docs [online]. (2009). Available from World Wide Web:
Java Speech APIs & Docs [online]. (2008). Available from World Wide Web:
Other third party research papers
Byun, J.H. & Rim, H.C. & Park, S.Y., 2007. 'Automatic Spelling Correction Rule Extraction and Application for Spoken-Style Korean Text', In: Sixth International Conference on Advanced Language Processing and Web Information Technology, 2007, pp.195-199.
Mitchel, C.D., 1999. 'Improved spelling recognition using a tree-based fast lexical match', In: IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 1999, pp.597-600.
Sloboda, T., 1995. 'Dictionary learning: performance through consistency', In: IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 1995, pp.453-456.
Wendemuth, A. & Rose, G. & Dolfing, J.G.A., 1999. 'Advances in confidence measures for large vocabulary', In: IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 1999, pp.705-708.
Thiele, F. & Rueber, B. & Klakow, D., 2000. 'Long range language models for free spelling recognition', In: IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2000, pp.1715-1718.
Books
Kroll, P. & Kruchten, P., 2003. Rational Unified Process Made Easy: A Practitioner's Guide to RUP, Addison Wesley.
Abrahamsson, P. & Salo, O. & Ronkainen, J. & Warsta, J., 2002. Agile software development methods: review and analysis, VTT Publications.
BIBLIOGRAPHY
Frankel, J. & Richmond, K. & King, S. & Taylor, P., 2000. 'An Automatic Speech Recognition System Using Neural Networks and Linear Dynamic Models to Recover and Model Articulatory Traces', In: Sixth International Conference on Spoken Language Processing, 2000, pp.1-4.
Padmanabhan, M. & Picheny, M., 2002. 'Large-Vocabulary Speech Recognition Algorithms', Computer, vol. 35, pp.42-50.
Zhao, H. & Wakita, X., 1991. 'An HMM Based Speaker-Independent Continuous Speech Recognition System With Experiments on the TIMIT Database', In: ICASSP, Toronto, Canada, May 1991, pp.333-336.
Andrew, H., 1997. comp.speech Frequently Asked Questions. [Online]. Available from: http://www.speech.cs.cmu.edu/comp.speech/ [Accessed 26 May 2009].
Andrew, H., 1997. What is speech recognition. [Online]. Available from: http://www.speech.cs.cmu.edu/comp.speech/Section6/Q6.1.html [Accessed 30 May 2009].
APPENDIX A User manual
The Ultimate Speech Search system consists of four subsystems:
Speech recognition System for numbers
Speech recognition System for words
Noise Filtering System
Speech search Web Browser
To run the speech recognition system, open NetBeans and select “USS” as the project.
When the project runs, the user sees the initial GUI. The Open button is used to open a speech file in .wav format. This GUI subsystem supports the identification of digits only; it does not support the identification of words.
Pressing the Open button lets the user select a digit recording for recognition.
Once the user has selected a file and pressed Open, he or she can click the Start button. The recognized text then appears in the text output area.
The name of the opened file is shown just below the "Open speech file" label.
The word recognition system needs a large amount of memory in the Java virtual machine. Running it from NetBeans or Eclipse with the default settings crashes the virtual machine, so it is best run from the command prompt, where memory for the JAR file can be allocated manually.
Type java -mx256m -jar wordrec.jar. The -mx256m option sets the maximum heap size of the virtual machine to 256 MB (current JVMs document this option as -Xmx256m), and -jar wordrec.jar runs the word recognition program.
The output varies, and 100% correctness cannot be guaranteed.
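Because the failure inside an IDE comes from the JVM starting with too small a heap, one option is for the program to check its own heap limit at start-up and exit with a readable message instead of crashing mid-recognition. The sketch below is a hypothetical helper, not part of wordrec.jar: the class name HeapCheck and the 230 MB threshold (a margin below the 256 MB set by -mx256m) are assumptions. It relies only on Runtime.maxMemory(), which reports the maximum heap the JVM will attempt to use.

```java
public class HeapCheck {

    // Hypothetical threshold derived from the -mx256m setting in this manual.
    // It is about 90% of 256 MB because Runtime.maxMemory() can report
    // somewhat less than the -Xmx value with some garbage collectors.
    static final long MIN_HEAP_MB = 230;

    // Maximum heap the JVM will attempt to use, in megabytes.
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    // True when the JVM was started with at least requiredMb of maximum heap.
    static boolean hasEnoughHeap(long requiredMb) {
        return maxHeapMb() >= requiredMb;
    }

    public static void main(String[] args) {
        if (!hasEnoughHeap(MIN_HEAP_MB)) {
            System.err.println("Only " + maxHeapMb() + " MB of heap available.");
            System.err.println("Restart with: java -Xmx256m -jar wordrec.jar");
            System.exit(1);
        }
        System.out.println("Heap limit OK (" + maxHeapMb() + " MB); starting recognizer.");
        // The actual word recognizer would be started here (not shown).
    }
}
```

The check only inspects the configured limit; it cannot tell in advance whether the recognizer's models actually fit in that heap.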
To run the noise filtering system, you need MATLAB version 6 or later. Open both the denoise2 script file and the denoise.m script file, then enter the name of the file on which you want to perform noise filtering.
The line wavread('input.wav') specifies the input file for the system. To choose the output file name, edit the line wavwrite(output, FS, 'output.wav'), which defines the output file and its format.
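The denoise.m and denoise2 scripts are not reproduced in this manual. Purely to illustrate the same read, filter, write flow in Java (the language of the other subsystems), the sketch below reads a 16-bit signed PCM WAV file with javax.sound.sampled, applies a simple noise gate (samples quieter than a threshold are set to zero) and writes a new WAV file with the original format. The noise gate is a stand-in chosen for brevity, not the filtering method used in the MATLAB scripts, and the class name WavNoiseGate and the threshold value 500 are hypothetical.

```java
import javax.sound.sampled.AudioFileFormat;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.UnsupportedAudioFileException;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class WavNoiseGate {

    // Crude noise gate: zero every sample whose magnitude is below threshold.
    // NOT the denoise.m algorithm; it only illustrates the pipeline.
    static short[] gate(short[] samples, int threshold) {
        short[] out = new short[samples.length];
        for (int i = 0; i < samples.length; i++) {
            out[i] = Math.abs(samples[i]) < threshold ? 0 : samples[i];
        }
        return out;
    }

    // Read (like wavread), filter, and write (like wavwrite) a 16-bit signed PCM WAV file.
    static void process(File in, File out, int threshold)
            throws IOException, UnsupportedAudioFileException {
        try (AudioInputStream ais = AudioSystem.getAudioInputStream(in)) {
            AudioFormat fmt = ais.getFormat();
            if (!AudioFormat.Encoding.PCM_SIGNED.equals(fmt.getEncoding())
                    || fmt.getSampleSizeInBits() != 16) {
                throw new UnsupportedAudioFileException("expected 16-bit signed PCM");
            }
            ByteOrder order = fmt.isBigEndian() ? ByteOrder.BIG_ENDIAN : ByteOrder.LITTLE_ENDIAN;

            // Read all audio bytes into memory.
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = ais.read(buf)) != -1) {
                bos.write(buf, 0, n);
            }
            byte[] raw = bos.toByteArray();

            // Decode bytes to samples, filter, and encode back in the same byte order.
            short[] samples = new short[raw.length / 2];
            ByteBuffer.wrap(raw).order(order).asShortBuffer().get(samples);
            short[] filtered = gate(samples, threshold);
            ByteBuffer outBuf = ByteBuffer.allocate(filtered.length * 2).order(order);
            outBuf.asShortBuffer().put(filtered);
            byte[] outBytes = outBuf.array();

            // Write the result as a WAV file with the original audio format.
            try (AudioInputStream result = new AudioInputStream(
                    new ByteArrayInputStream(outBytes), fmt, outBytes.length / fmt.getFrameSize())) {
                AudioSystem.write(result, AudioFileFormat.Type.WAVE, out);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // File names mirror the MATLAB example; the threshold is a hypothetical value.
        process(new File("input.wav"), new File("output.wav"), 500);
    }
}
```

A fixed gate removes low-level hiss between words but also clips quiet speech, so it is only a baseline to compare against the spectral filtering in the MATLAB scripts.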
To run the search engine, open NetBeans and run SpeechSearchEng. The user can type the content of the required speech; if a matching speech is available, the system shows true, otherwise it shows false.
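The manual does not describe how SpeechSearchEng matches a query against the stored speeches. A minimal sketch of one possible approach follows. It assumes each speech has already been converted to a text transcript and that a query counts as found when all of its words occur in a single transcript. It uses an in-memory inverted index (word to speech ids); the class TranscriptIndex, its methods and the sample data are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

public class TranscriptIndex {

    // Inverted index: word -> ids of the speeches whose transcript contains it.
    private final Map<String, Set<String>> index = new HashMap<>();

    // Lower-case the text and split on anything that is not a letter or digit.
    private static List<String> tokenize(String text) {
        List<String> words = new ArrayList<>();
        for (String w : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!w.isEmpty()) {
                words.add(w);
            }
        }
        return words;
    }

    // Index the recognized text of one speech under the given id.
    public void add(String speechId, String transcript) {
        for (String w : tokenize(transcript)) {
            index.computeIfAbsent(w, k -> new HashSet<>()).add(speechId);
        }
    }

    // Ids of speeches containing every query word (empty set if none or empty query).
    public Set<String> find(String query) {
        List<String> words = tokenize(query);
        if (words.isEmpty()) {
            return Collections.emptySet();
        }
        Set<String> result = null;
        for (String w : words) {
            Set<String> ids = index.getOrDefault(w, Collections.emptySet());
            if (result == null) {
                result = new HashSet<>(ids);
            } else {
                result.retainAll(ids); // intersect with speeches seen so far
            }
            if (result.isEmpty()) {
                break;
            }
        }
        return result;
    }

    // The true/false answer described in the manual.
    public boolean isAvailable(String query) {
        return !find(query).isEmpty();
    }

    public static void main(String[] args) {
        TranscriptIndex idx = new TranscriptIndex();
        idx.add("lecture-01", "Hidden Markov models in speech recognition");
        idx.add("speech-02", "Noise filtering for audio signals");
        System.out.println(idx.isAvailable("speech recognition")); // true
        System.out.println(idx.isAvailable("music search"));       // false
    }
}
```

This matches words regardless of their order or position in the transcript; phrase matching would need word positions stored in the index as well.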
APPENDIX B
Gantt chart