Speech recognition, noise filtering and content search engine: research document

Ultimate Speech Search


ABSTRACT

In the modern era, people try to find information wherever they can, as efficiently as
possible. They search for knowledge about past events as well as present ones.
Searching for a particular item involves a search engine and the necessary information. When
people want to learn from speeches or lectures given by others, they often search blindly,
without knowing what results to expect. A search engine that returned the required results
would be a great help to their work.

This project aims to build a search engine that can search speeches and lectures by
their content. Every search engine supports searching, but the results may be irrelevant.
The user has to go through them one by one and may end up with no useful result at all.
The main goal of this project is to provide a content-based search facility.

This research covers converting speech into text with a degree of noise analysis,
maintaining a database with clustered indexing, and providing a simple content-based search
facility. The system to be built operates on limited data, namely speeches and lectures
recorded in low-noise environments. As a future enhancement, it could search music or any
other sound stream by analyzing the spectrum, with a user-friendly search facility.

KEY WORDS: Search Engine, Speeches, Lectures, Noise Analysis, Content, Spectrum



ACKNOWLEDGEMENTS

My sincere gratitude goes to my grandfather, who taught me the ways of life, raised me
from childhood into my teenage years, and left me one May.

I would like to thank my friends, who helped me in difficult times and praised me in good
times. I would like to thank my college teachers, who disciplined me with the cane to make
me a good man and gave me the knowledge to face society.

I would like to thank my sister, who has always been like a mother to me, and I would like to
express my gratitude to my supervisor, Mrs. Nadeera Ahangama, who guided me throughout the
project.

Finally, I would like to thank the APIIT staff, who provided us with the facilities necessary
to pursue our higher education and make it a success.



Table of Contents

ABSTRACT
ACKNOWLEDGEMENTS
List of Figures
List of Equations
List of Tables
INTRODUCTION
1.1 Project Background
1.2 Problem Description
1.3 Project Overview
1.3.1 Noise analysis
1.3.2 Speech recognition
1.3.3 Speech to text conversion
1.3.4 The database
1.3.5 The search engine
1.4 Project Scope
1.5 Project Objectives
RESEARCH
2.1 Speech Recognition
2.2 Speech recognition methods
2.2.1 Hidden Markov methods in speech recognition
2.2.2 Client side speech recognition
2.2.5 Continuous speech recognition
2.2.6 Direct Speech Recognition
2.3 Speaker Characteristics
2.3.1 Speaker Dependent
2.3.2 Speaker Independent
2.3.3 Conclusion
2.4 Speech Recognition mechanisms
2.4.1 Isolated word recognition


2.4.2 Continuous speech recognition
2.4.3 Conclusion
2.5 Vocabulary Size
2.5.1 Limited Vocabulary
2.5.2 Large Vocabulary
2.5.3 Conclusion
2.6 Speech recognition APIs
2.6.1 Microsoft Speech API 5.3
2.6.2 Java Speech API
2.7 Speech Recognition Algorithms
2.8 Noise Filtering
2.8.1 Wiener filtering
2.8.2 Conclusion
2.9 Database and data structure
2.9.1 Conclusion
2.10 Search Engine
2.11 MATLAB
ANALYSIS
3.0 System requirements
3.1.1 Functional requirements
3.1.2 Non-functional requirements
3.1.3 Software Requirements
3.1.4 Hardware requirements
3.2 System Development Methodologies
3.2.1 Rational Unified Process
3.2.2 Agile Development Method
3.2.3 Scrum Development Methodology
3.3 Test Plan
3.3.1 System testing
SYSTEM DESIGN
4.1 Use Case Diagram



4.2 Use case description
4.2.1 Use case description for file upload
4.2.2 Use Case description for play an audio file
4.2.3 Use Case description for search
4.2.4 Use Case description for noise reduced output
4.2.5 Use Case description for noise filtering
4.3 Activity Diagrams
4.3.1 Activity Diagram for Speech Recognition System
4.3.2 Activity Diagram for Noise filtering
4.4 Sequence Diagrams
4.4.1 Select a file
4.4.2 Play wav file
4.4.3 Speech recognition pre stage
4.4.4 Speech Recognition post stage
4.5 Class Diagrams
4.5.1 GUI and the system
4.5.2 Speech recognition
4.6 Noise Filtering
4.7 Code to filter noise in C Language
CHAPTER 5
5.0 Implementation
CHAPTER 6
6.0 Test Plan
6.1 Background
6.2 Introduction
6.3 Assumptions
6.4 Features to be tested
6.5 Suspension and resumption criteria
6.6 Environmental needs
6.7 System testing
6.8 Unit testing



6.9 Performance Testing
6.10 Integration Testing
CHAPTER 7
CRITICAL EVALUATION AND FUTURE ENHANCEMENTS
7.1 Critical evaluation
7.2 Suggestions for future enhancements
8.0 Conclusion
REFERENCES
BIBLIOGRAPHY
APPENDIX A
APPENDIX B
Gantt chart



List of Figures

Figure 1: Overview of Steps in Speech Recognition
Figure 2: Graphical Overview of the Recognition Process
Figure 3: Components of a typical speech recognition system
Figure 4: Example of HMM for word “Yes” on an utterance
Figure 5: Overview of Microsoft Speech Recognition API
Figure 6: Java Sound API Architecture
Figure 7: JSGF Architecture
Figure 8: Noise in Speech
Figure 9: Database Indexing
Figure 10: Google Architecture
Figure 11: Phases in RUP
Figure 12: Overview of Agile
Figure 13: Scrum Overview
Figure 15: Use Case Diagram for System
Figure 16: Speech Recognition
Figure 17: Activity Diagram Noise Filtering
Figure 18: Sequence Diagram Select a file
Figure 19: Sequence Diagram Play File
Figure 20: Sequence Diagram SR Pre Stage
Figure 21: Sequence Diagram SR Post Stage
Figure 22: Class Diagrams GUI & System
Figure 23: Class Diagram SR System
Figure 24: Speech Search Class Diagram
Figure 25: SR Engine
Figure 26: Open file
Figure 27: Text output
Figure 28: Speech Search Engine



List of Equations

Equation 1: First order Markov chain
Equation 2: Stationary states Transition
Equation 3: Observations independence
Equation 4: Observation sequence
Equation 5: Left Right topology constraints
Equation 6: CSR Equations



List of Tables

Table 1: Typical parameters used to characterize the capability of speech recognition systems
Table 2: Comparison in different techniques in speech recognition
Table 3: Isolated word recognition
Table 4: Use Case description file upload
Table 5: Use Case description play audio
Table 6: Use Case description search
Table 7: Use Case description noise reduction
Table 8: Use Case description noise process
Table 9: Test Case 1
Table 10: Test Case 2
Table 11: Test Case 3
Table 12: Test Case 4
Table 13: Test Case 5
Table 14: Test Case 6
Table 15: Performance testing Windows XP
Table 16: Performance Testing on Ubuntu



CHAPTER 1

INTRODUCTION

1.1 Project Background

Time has played a key role throughout the history of human civilization. Humans have
achieved technological advances and scientific breakthroughs, and unfortunately suffered
setbacks, within certain time limits. In many cases these limits were set by nature.

In truth, we now live in an advanced era compared with prehistoric times. We are all
actors in another part of a chronicle play in our own time. Globalization is narrowing the
distances on this planet. People are forced to accomplish objectives and goals within ever
shorter time limits, and most of the time they lack the time needed to succeed.

When a part of society is asked to accomplish a goal, it may turn to research, interviews or
various other fact-finding techniques. Imagine that people need to find certain information
in lectures and speeches. Can they find the appropriate resource materials in minimum time
and with minimum effort?

They have to go through many search results and spend much of their valuable time on a
fruitless task. If lectures and speeches could be found by searching their content, we could
save a considerable amount of valuable time and invest it in deeds for the sake of planet
Earth.



1.2 Problem Description

The problem is to provide users with a search engine that searches lectures and speeches by
their content, for various purposes.

To do this we have to find sound solutions to the challenges met throughout the process.
They are as follows.

Noise analysis: We have to analyze the nature of the speech or lecture. Speeches and
lectures may be recorded under various environmental conditions, which can directly affect
the vocal part of the recording. We therefore have to reduce the noise as much as possible.

Speech recognition: Speech recognition is a vast area. Speeches are given by many people
with different accents. Each individual has his or her own accent when speaking English or
any other language. To recognize the words spoken, we have to carry out in-depth research in
order to build a speech recognition server that overcomes this challenge.

Speech to text conversion: Speech to text conversion is one of the key areas of this project
because it is the key to building the database that contains the text versions of speeches
and lectures.

The database: All the converted versions of the speeches and lectures will be saved in the
database.

The search engine: This is another challenging area of the project. The search engine will
show the appropriate search results from the database. I need to identify search mechanisms
and methods that give the user efficient and accurate results.

The database and the search engine are two parallel problems that need to be worked out
precisely. Without a proper structure for the database, it is tedious to implement the search
functionality.



1.3 Project Overview

The main challenge of this project is to build the database containing the text versions of
speeches and lectures. To accomplish this we have to perform the following tasks.

1.3.1 Noise analysis

A noise analysis will be performed to ensure efficient speech to text conversion. This will
enable us to isolate the human voice and remove the background sounds in the audio file,
such as tape hiss, electric fans or hums.
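As a rough illustration of separating quiet background sound from the louder voice signal, the sketch below applies a simple amplitude noise gate to a list of samples. This is only one elementary approach, assumed here for illustration, and not necessarily the filtering method used by the project. The function name `noise_gate` and the `threshold` and `attenuation` parameters are hypothetical.

```python
def noise_gate(samples, threshold, attenuation=0.1):
    """Attenuate samples whose absolute amplitude is below the threshold.

    samples: sequence of floats, assumed to be normalized audio in [-1.0, 1.0]
    threshold: amplitudes below this value are treated as background noise
    attenuation: factor applied to the samples treated as noise
    """
    if threshold < 0:
        raise ValueError("threshold must be non-negative")
    gated = []
    for sample in samples:
        if abs(sample) >= threshold:
            # Loud enough: assumed to be voice, keep unchanged.
            gated.append(sample)
        else:
            # Quiet: assumed to be background noise, turn it down.
            gated.append(sample * attenuation)
    return gated
```

A gate of this kind only suppresses noise in the pauses between words. Steady noise that overlaps the voice, such as a fan or a hum, generally needs a spectral method instead.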

1.3.2 Speech recognition

Speech recognition comes in two flavors: speaker independent and speaker dependent.
Because the voice of the speaker or lecturer varies from one recording to another, the
project uses speaker-independent speech recognition.

1.3.3 Speech to text conversion

The system converts the speech into text format in order to build the database. The database
consists of the converted text versions of the speeches and lectures.



1.3.4 The database

The database consists of two parts: the converted (speech to text) speech or lecture files,
and the actual source files that contain the audio.
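The two-part layout described above could be sketched as two linked tables, one for the audio sources and one for their transcripts. This is a minimal sketch using Python's built-in `sqlite3` module; the table names (`audio_source`, `transcript`), column names and helper functions are assumptions for illustration, not the project's actual schema.

```python
import sqlite3


def create_store(conn):
    """Create the two parts of the store: audio sources and their transcripts."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS audio_source ("
        " id INTEGER PRIMARY KEY,"
        " title TEXT NOT NULL,"
        " file_path TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS transcript ("
        " audio_id INTEGER PRIMARY KEY REFERENCES audio_source(id),"
        " body TEXT NOT NULL)"
    )


def add_recording(conn, title, file_path, transcript_text):
    """Store one recording: its audio file location and its converted text."""
    cur = conn.execute(
        "INSERT INTO audio_source (title, file_path) VALUES (?, ?)",
        (title, file_path),
    )
    audio_id = cur.lastrowid
    conn.execute(
        "INSERT INTO transcript (audio_id, body) VALUES (?, ?)",
        (audio_id, transcript_text),
    )
    conn.commit()
    return audio_id


def audio_path_for(conn, audio_id):
    """Return the audio file path for a recording, or None if it is unknown."""
    row = conn.execute(
        "SELECT file_path FROM audio_source WHERE id = ?", (audio_id,)
    ).fetchone()
    return row[0] if row else None
```

Keeping the transcript separate from the audio reference means a text search only has to read the transcript table, and the audio file is looked up once a match is found.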

1.3.5 The search engine

The search engine searches the database for the content of a speech or lecture and returns
the matching results. We might need to do something like summarizing, so that the user can
search the content more easily by typing a sentence or a word.
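One common way to search transcripts by a typed word or sentence is an inverted index that maps each word to the recordings containing it. The sketch below is an illustrative assumption, not the search mechanism chosen by the project; the names `tokenize`, `build_index` and `search` are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase words."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_index(transcripts):
    """Map every word to the set of recording ids whose transcript contains it.

    transcripts: dict of recording id -> transcript text
    """
    index = defaultdict(set)
    for doc_id, text in transcripts.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the sorted ids of recordings that contain every word of the query."""
    words = tokenize(query)
    if not words:
        return []
    matches = set(index.get(words[0], set()))
    for word in words[1:]:
        matches &= index.get(word, set())
    return sorted(matches)
```

For example, with transcripts `{1: "Noise filtering improves speech recognition", 2: "Speech is converted to text"}`, the query `"speech text"` matches only recording 2, because every query word must appear in the transcript.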



1.4 Project Scope

Existing search engines do not support searching for a speech by its content. This system
provides that facility. The system contains data about English speeches and lectures.

These speeches and lectures were recorded in low-noise environments because the system
performs only a limited noise analysis. The system won't store music because music requires
far more noise analysis than a low-noise recording.

The speech recognition engine to be built supports only English speeches and lectures, and
the noise analysis likewise supports only English speeches and lectures.

The system will convert (low-noise) speeches and lectures to text format. After the
development process, users will be able to search from anywhere on the planet for the results
they require.

Speaker-independent speech recognition will be used because the system deals with different
types of speeches given by different people with different accents.



1.5 Project Objectives

1.0 Noise analysis and reduction

The system will perform noise filtering, which helps the speech recognition process. The
noisy signal channel will be analyzed and split into two parts. The amplitude of the noisy
channel is set to a low value. An efficient noise filtering mechanism will be used.

2.0 Continuous speech recognition system

To develop an efficient speech recognition engine that converts speeches and lectures to text
format. Speeches given by various people will be translated into text format.

3.0 The Database

Database implementation: the converted versions of the speeches and lectures will be stored
in the database in text format, and the corresponding speech or lecture audio will be stored
in another database.

4.0 The search engine

The search engine searches the database for the content of a speech or lecture and returns
the matching results. We might need to do something like summarizing, so that the user can
search the content more easily by typing a sentence or a word.



CHAPTER 2

RESEARCH

2.1 Speech Recognition

The process of converting a phonic signal captured by a phone, a microphone or any other
audio device into a set of words is called speech recognition. Speech recognition is used in
command-based applications such as data entry control systems, document preparation and the
automation of telephone relay systems, in mobile devices such as mobile phones, and to help
people with hearing disabilities.

According to Professor Todd Austin (2007), speech recognition is the task of translating an
acoustic waveform representing human speech into its corresponding textual representation.

(Source: Austin, T. (2007). Speech Recognition. Available:
http://cccp.eecs.umich.edu/research/speech.php. Last accessed 17 July 2009.)

Figure 1: Overview of Steps in Speech Recognition



Applications that support speech recognition are “introduced on a weekly basis and speech
technology is rapidly entering new technical domains and new markets” (Java Speech API
Programmer's Guide, 1998).

According to Zue et al. (2003), speech recognition is a process that converts an acoustic
signal, which can be captured by a microphone, into a set of words. Speech recognition
systems can be characterized by many parameters.

Parameters        Range
Speaking mode     Isolated words to continuous speech
Speaking style    Read speech to spontaneous speech
Enrolment         Speaker dependent to speaker independent
Vocabulary        Small (<20 words) to large (>20,000 words)
Language model    Finite state to context sensitive
Perplexity        Small (<10) to large (>100)
SNR               High (>30 dB) to low (<10 dB)
Transducer        Voice-cancelling microphone to telephone

Table 1: Typical parameters used to characterize the capability of speech recognition
systems
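Table 1 characterizes noise conditions by the signal-to-noise ratio (SNR), expressed in decibels as 10·log10 of the ratio of signal power to noise power. The sketch below shows how that figure could be computed from two sample lists. The function name `snr_db` is hypothetical, and the sketch assumes the clean signal and the noise are available separately, which is rarely true for real recordings.

```python
import math


def mean_power(samples):
    """Average power of a sequence of samples (mean of the squared values)."""
    if not samples:
        raise ValueError("samples must not be empty")
    return sum(s * s for s in samples) / len(samples)


def snr_db(signal, noise):
    """Signal-to-noise ratio in decibels: 10 * log10(P_signal / P_noise)."""
    noise_power = mean_power(noise)
    if noise_power == 0:
        raise ValueError("noise power is zero, SNR is undefined")
    return 10.0 * math.log10(mean_power(signal) / noise_power)
```

For instance, a signal with amplitude 1.0 and noise with amplitude 0.1 have a power ratio of 100, which corresponds to an SNR of about 20 dB.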



According to Hosom et al. (2003), “The dominant technology used in Speech Recognition is
called the Hidden Markov Model (HMM)”. There are four basic steps in performing speech
recognition. They can be seen in the figure below.

[Source: Hosom et al., 1999]

Figure 2 : Graphical Overview of the Recognition Process



Over the past few years, speech recognition systems have achieved remarkable success, with
recognition accuracy sometimes exceeding 98 percent. However, such accuracy was achieved in
quiet environments and by using sample words in training. A good speech recognition system
must be able to perform well in many circumstances, including noisy environments. Noise
comes in many flavors.

Air conditioners, fans, radios, coughs, tape hiss, cross talk, channel distortions, lip
smacks, breath noise, pops and sneezes are the basic factors that make an environment noisy.

A typical speech recognition system is composed of training data, an acoustic model, a
language model, a training model, a lexical model, the speech signal, representation, model
classification, search and the recognized words.



The figure below shows how these components are arranged in a speech recognition system.

Figure 3: Components of a typical speech recognition system.



2.2 Speech recognition methods

Only a few speech recognition methods are in common use. They are categorized into methods
for mobile devices and methods for standalone applications.

2.2.1 Hidden Markov methods in speech recognition

Andrey Markov is the founder of the theory of Markov processes. A Markov model is a probabilistic model defined over a finite set of elements, usually called its states.

When a state transition occurs, the process generates an output symbol. A hidden Markov model therefore consists of a finite-state Markov chain and a finite set of output probability distributions.

Hidden Markov constraints for speech recognition systems

1 – First order Markov chain.

This constraint assumes that the probability of a transition to a state depends only on the current state.

$$P(q_{t+1} = S_j \mid q_t = S_i, q_{t-1} = S_k, q_{t-2} = S_w, \ldots, q_{t-n} = S_z) = P(q_{t+1} = S_j \mid q_t = S_i)$$

Equation 1 : First order Markov chain

2 – Stationary state transitions.

This assumption states that the state transition probabilities do not depend on the time at which the transition takes place.

$$a_{ij} = P(q_{t+1} = S_j \mid q_t = S_i)$$

Equation 2: Stationary state transitions

3 – Observation independence.

This assumption states that the observations depend only on the underlying Markov chain and not on earlier observations. However, this assumption has since been deprecated.

$$P(O_t \mid O_{t-1}, O_{t-2}, \ldots, O_{t-p}, q_t, q_{t-1}, q_{t-2}, \ldots, q_{t-p}) = P(O_t \mid q_t, q_{t-1}, q_{t-2}, \ldots, q_{t-p})$$

Equation 3: Observation independence

where p represents the considered history of the observation sequence.

$$b_j(O_t) = P(O_t \mid q_t = j)$$

Equation 4: Observation sequence

4 – Left-right topology constraint:

$$a_{ij} = 0 \quad \text{for all } j > i + 2 \text{ and } j < i$$

$$\pi_i = P(q_1 = S_i) = \begin{cases} 1 & \text{for } i = 1 \\ 0 & \text{for } 1 < i \le N \end{cases}$$

Equation 5 : Left-right topology constraints
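The left-right topology constraint above can be illustrated with a small sketch. The function names and the stay/step/skip weights are assumptions chosen for illustration only; they are not taken from the project. The sketch builds a transition matrix in which a state may only stay, move to the next state or skip one state, and an initial distribution that always starts in the first state.

```python
def left_right_hmm(n_states, stay=0.5, step=0.3, skip=0.2):
    """Build (pi, A) for a left-right HMM with at most a two-state jump."""
    weights = (stay, step, skip)  # illustrative values, not trained probabilities
    A = [[0.0] * n_states for _ in range(n_states)]
    for i in range(n_states):
        # Only j = i, i+1, i+2 are allowed; clip at the last state.
        allowed = [j for j in (i, i + 1, i + 2) if j < n_states]
        total = sum(weights[j - i] for j in allowed)
        for j in allowed:
            A[i][j] = weights[j - i] / total  # renormalise near the end of the chain
    pi = [1.0] + [0.0] * (n_states - 1)  # the chain always starts in the first state
    return pi, A


def satisfies_left_right(A):
    """Check a_ij = 0 for all j < i and all j > i + 2."""
    n = len(A)
    return all(A[i][j] == 0.0
               for i in range(n) for j in range(n)
               if j < i or j > i + 2)


pi, A = left_right_hmm(4)
```

Each row of A still sums to one, so the matrix remains a valid set of transition probabilities while obeying the constraint.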

The figure below shows an example HMM for the word “Yes” in an utterance.

Figure 4 : Example of an HMM for the word “Yes” in an utterance

2.2.2 Client side speech recognition

According to Hosom et al. (2003), client side speech recognition is a technology that allows a computer to identify the words that a person speaks into a microphone or telephone. Its main advantage is a faster response time, because all processing is handled on the client side. Another advantage is that it does not need a network connection such as GPRS. According to Hagen et al. (2003, p.66), the problems of client side speech recognition are recognition accuracy and running time (power consumption).

2.2.3 Dynamic Time Warping based speech recognition

This method was used in past decades but has since been deprecated. The algorithm measures the similarity between two sequences which may vary in time or speed. A number of templates are used to perform automatic speech recognition: the input is compared against each template, the resulting distortion is normalized, and the template with the lowest normalized distortion is identified as the word.
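The template matching described above can be sketched as follows. This is a minimal illustration under stated assumptions: each utterance is reduced to a sequence of single numbers (a real recognizer would compare feature vectors), the distortion is normalized by the combined sequence length, and the names dtw_distance, recognise and templates are hypothetical.

```python
def dtw_distance(a, b):
    """Normalised dynamic time warping distance between two numeric sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated distortion aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])  # local distortion between two samples
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m] / (n + m)  # normalise by the combined length


def recognise(utterance, templates):
    """Return the template label with the lowest normalised distortion."""
    return min(templates, key=lambda word: dtw_distance(utterance, templates[word]))


templates = {"yes": [1, 3, 4, 3, 1], "no": [5, 5, 2, 1, 1]}
slow_yes = [1, 1, 3, 3, 4, 4, 3, 1]  # same shape as "yes", spoken more slowly
```

Because the warping path may repeat samples, the slower utterance still aligns with the "yes" template at zero distortion.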

2.2.4 Artificial Neural Networks

The mechanism inside an ANN based recognizer is to filter the human speech frequencies from the other frequencies, based on the fact that non-speech sound covers a higher frequency range than speech.

The table below shows a comparison between different speech recognition mechanisms.

Source: [anon. (nd). School of Electrical, Computer and Telecommunications Engineering. Available: http://www.elec.uow.edu.au/staff/wysocki/dspcs/papers/004.pdf. Last accessed 23rd August 2009.]

Table 2 : Comparison of different techniques in speech recognition

2.2.5 Continuous speech recognition

Continuous speech recognition is used when a speaker pronounces words, sentences or phrases in a series or specific order, so that they depend on each other as if linked together. Such a system operates on the basis that words are connected to each other and not separated by pauses.

Because there is a greater variety of effects, continuous speech is tedious to process. Coarticulation is another serious issue in continuous speech recognition: the effect of the surrounding phonemes on a single phoneme is high. The starts and ends of words are affected by the following words and also by the speed of the speech.

Fast speech is harder to track. Two algorithms are usually involved in continuous speech recognition: the Viterbi algorithm and the Baum-Welch algorithm.

2.2.6 Direct Speech Recognition

This process recognizes speech word by word, with each word followed by a pause.

2.3 Speaker Characteristics

2.3.1 Speaker Dependent

Speaker dependent speech recognition systems are developed for a single user. The system works only for that user, and no other user can use it. These systems have to be trained by the user before they can function.

One advantage is that these systems support a larger vocabulary than speaker independent systems; the disadvantage is that usage is limited to the user who trained the system. This technology is used in steno masks.

2.3.2 Speaker Independent

Speaker independent speech recognition systems are harder to implement than speaker dependent ones. The system needs to recognize the speech patterns and different accents of many users. The advantage is that it can be used by many users without training.

The most important step in building a speaker independent speech recognition system (SRS) is to identify which parts of speech are generic and which vary from person to person. Speaker independent speech recognition systems can be used by many users, even though they are harder to implement.

2.3.3 Conclusion

A speaker independent speech recognition system has been selected for the project because the system has to deal with many speeches given by many speakers.

Accent and phoneme patterns differ from speaker to speaker, and it is not possible to perform individual training for each and every speaker.

The Java Speech API only supports speaker independent speech recognition, which is another reason to select it.

2.4 Speech Recognition mechanisms

2.4.1 Isolated word recognition

This identifies a single word at a time, with pauses between words. Isolated word recognition is the most basic form of speech recognition and is widely used in command based applications.

Isolated word recognition needs less processing power and involves only basic pattern matching algorithms.

Table 3: Isolated word recognition

2.4.2 Continuous speech recognition

According to Hunt, A. (1997), continuous speech is more difficult to handle because it is difficult to find the start and end points of words, and because of coarticulation: the production of each phoneme is affected by the production of surrounding phonemes.

According to Peinado & Segura (2006, p.9), there are three types of errors in continuous speech recognition systems.

Substitutions - the recognized sentence has different words substituted for the original words.

Deletions - the recognized sentence has missing words.

Insertions - the recognized sentence has new or extra words.

The error rate calculations for continuous speech recognition given by Stephen et al. (2003, p.2) are as follows.

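$$\text{Correct Percentage} = \frac{H}{N} \times 100\%$$

$$\text{Accuracy} = \frac{N - D - S - I}{N} \times 100\%$$

$$\text{Accuracy} = \frac{H - I}{N} \times 100\%$$

$$\text{Word Error Rate} = \frac{S + D + I}{N} \times 100\%$$

Equation 6: CSR Equations

where H is the number of words correctly recognized, N is the total number of words in the actual speech, D is the number of deletions, S is the number of substitutions and I is the number of insertions.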
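The CSR measures above can be computed from a word-level alignment between the actual speech and the recognized sentence. The sketch below is an illustration only: it assumes a minimum edit distance alignment (a common choice, but not stated in the source), and the function names align_counts and csr_scores are hypothetical.

```python
def align_counts(reference, hypothesis):
    """Return (H, S, D, I) from a minimum edit distance word alignment."""
    ref, hyp = reference.split(), hypothesis.split()
    # Each cell holds (total_errors, H, S, D, I) for ref[:i] against hyp[:j].
    prev = [(j, 0, 0, 0, j) for j in range(len(hyp) + 1)]  # empty reference: j insertions
    for i in range(1, len(ref) + 1):
        curr = [(i, 0, 0, i, 0)]  # empty hypothesis: i deletions
        for j in range(1, len(hyp) + 1):
            e, h, s, d, ins = prev[j - 1]
            if ref[i - 1] == hyp[j - 1]:
                diag = (e, h + 1, s, d, ins)       # correct word
            else:
                diag = (e + 1, h, s + 1, d, ins)   # substitution
            e, h, s, d, ins = prev[j]
            delete = (e + 1, h, s, d + 1, ins)     # reference word missing
            e, h, s, d, ins = curr[j - 1]
            insert = (e + 1, h, s, d, ins + 1)     # extra recognized word
            curr.append(min(diag, delete, insert, key=lambda c: c[0]))
        prev = curr
    return prev[-1][1:]


def csr_scores(reference, hypothesis):
    """Correct percentage, accuracy and word error rate, all in percent."""
    h, s, d, i = align_counts(reference, hypothesis)
    n = len(reference.split())  # total number of words in the actual speech
    return {
        "correct": 100.0 * h / n,
        "accuracy": 100.0 * (h - i) / n,
        "wer": 100.0 * (s + d + i) / n,
    }


scores = csr_scores("the cat sat on the mat", "the cat sit on mat")
```

In this example one word is substituted and one is deleted out of six, so H = 4, S = 1, D = 1 and I = 0. Both accuracy formulas agree because H = N - D - S.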


2.4.3 Conclusion

Continuous speech recognition has been chosen for the project because the system has to deal with continuous speech in order to build the database, and the back end of the system runs as a standalone application.

2.5 Vocabulary Size

Vocabulary is the number of words known by a person. The greater the vocabulary, the more that person is able to understand. The same rule applies to speech recognition systems.

2.5.1 Limited Vocabulary

Limited vocabulary systems recognize a limited number of words, typically between 100 and 10000. These systems need less processing power and are more suitable for mobile devices.

2.5.2 Large Vocabulary

Large vocabulary speech recognition systems are mainly used on servers or in standalone applications and need more processing power. Such a system identifies almost every word spoken by a person. The vocabulary has more than 10000 words.

2.5.3 Conclusion

A large vocabulary has been chosen for the project because the project's main processes are handled by standalone applications and the system has to cope with many speeches.

2.6 Speech recognition APIs

2.6.1 Microsoft Speech API 5.3

The Microsoft Speech API reduces the coding overhead for programmers. It is equipped with both speech-to-text (recognition) and text-to-speech (synthesis).

This API requires a .NET based build environment and has to be purchased. The scope of the Speech Application Programming Interface (SAPI) lies within Windows environments: it allows the use of speech recognition and speech synthesis within Windows applications.

Applications that use SAPI include Microsoft Office, Microsoft Agent and Microsoft Speech Server.

In general SAPI defines a set of interfaces and classes for developing dynamic speech recognition systems. SAPI uses two libraries, one for its front end and one for its back end. For the front end it uses the “FastFormat” library. For the back end it uses “Pantheios”. Both are open source C++ libraries.

Figure 5: Overview of Microsoft Speech Recognition API


2.6.2 Java Speech API

The Java Speech API (JSAPI) provides both speech recognition and synthesis capabilities and is freely available. JSAPI supports multi-platform development and works with both open source and non open source third party tools. JSAPI comprises the packages javax.speech, javax.speech.recognition and javax.speech.synthesis.

Sun Microsystems built JSAPI in collaboration with:

Apple Computer, Inc.

AT&T

Dragon Systems, Inc.

IBM Corporation

Novell, Inc.

Philips Speech Processing

Texas Instruments Incorporated

It supports speaker independent speech recognition and W3C standards.

Speech recognizer's capabilities:

Built-in grammars (device specific)

Application defined grammars

Speech synthesizer's capabilities:

Formant synthesis

Concatenative synthesis

The Java Speech API specifies a cross-platform interface to support command and control recognizers, dictation systems and speech synthesizers. It covers two technologies: speech synthesis and speech recognition. Speech synthesis provides the reverse process of producing synthetic speech from text generated by an application, an applet, or a user.

With the synthesis capabilities developers can build applications that generate speech from text.

There are two primary steps in producing speech from text.

Structure analysis: Processes the input text to determine where paragraphs, sentences, and other structures start and end. For most languages, punctuation and formatting data are used in this stage.

Text pre-processing: Analyzes the input text for special constructs of the language. In English, special treatment is required for abbreviations, acronyms, dates, times, numbers, currency amounts, e-mail addresses, and many other forms. Other languages need special processing for these forms, and most languages have other specialized requirements.

Speech recognition gives the computer the ability to listen to human speech, recognize what is said and convert it into text.

Building a speech recognition system involves the following steps.

Grammar design: Defines the words that may be spoken by a user and the patterns in

which they may be spoken.

Signal processing: Analyzes the spectrum characteristics of the incoming audio.

Phoneme recognition: Compares the spectrum patterns to the patterns of the

phonemes of the language being recognized.

Word recognition: Compares the sequence of likely phonemes against the words and

patterns of words specified by the active grammars.

Result generation: Provides the application with information about the words the

recognizer has detected in the incoming audio.

In addition to JSAPI we need two other Java APIs: the Java Sound API and the Java Media Framework. The Java Sound API handles sound and is equipped with a rich set of classes and interfaces that deal directly with incoming sound signals. The Java Sound API is widely used in the following areas and industries.

Communication frameworks, such as conferencing and telephony

End-user content delivery systems, such as media players and music using streamed

content

Interactive application programs, such as games and Web sites that use dynamic

content

Content creation and editing

Tools, toolkits, and utilities

Java sound API uses a hardware independent architecture. It is designed to allow different

sorts of audio components to be installed on a system and accessed by the API.


With the Java Sound API we can process both MIDI (Musical Instrument Digital Interface) and WAV sound formats.

The Java Media Framework is a recently developed framework which can be used to build dynamic multimedia applications.

Figure 6 : Java Sound API Architecture


2.6.2.1 Java Speech Grammar Format

The Java Speech Grammar Format (JSGF) was developed by Sun Microsystems. It defines the set of rules and words for speech recognition. JSGF is a platform independent specification, and the W3C Speech Recognition Grammar Specification was derived from it.

The Java Speech Grammar Format has been developed for use with recognizers that implement the Java Speech API. However, it may also be used by other speech recognizers and in other types of applications.

A typical grammar rule is a composition of what is to be spoken, the text to be spoken and references to other grammar rules. A JSGF file comes in a plain text format or in XML format.

Source: (anon. (nd). JSGF Architecture. Available: http://www.cs.cmu.edu/. Last accessed 24th July 2009.)

Figure 7 : JSGF Architecture

2.7 Speech Recognition Algorithms

The Viterbi algorithm is widely used in speech recognition. It is based on dynamic programming and works directly with hidden Markov models. The Baum-Welch algorithm is another algorithm used in this process. It involves probability and maximum likelihood estimation. The forward-backward algorithm is also used in this process and likewise works directly with hidden Markov models. There are three steps in this algorithm.

Computing forward probabilities

Computing backward probabilities

Computing smoothed values

A combination of the above algorithms (a customized version) will be used in the project.
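The three steps of the forward-backward algorithm listed above can be sketched for a small discrete HMM. This is a textbook-style illustration, not the customized version used in the project; the two-state model values below are made up, and no scaling is applied, so it is only suitable for short observation sequences.

```python
def forward_backward(obs, pi, A, B):
    """Return smoothed state probabilities P(q_t = i | O) for a discrete HMM."""
    n, T = len(pi), len(obs)
    # Step 1: forward probabilities alpha[t][i] = P(O_1..O_t, q_t = i)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(n)]]
    for t in range(1, T):
        alpha.append([
            B[j][obs[t]] * sum(alpha[t - 1][i] * A[i][j] for i in range(n))
            for j in range(n)
        ])
    # Step 2: backward probabilities beta[t][i] = P(O_{t+1}..O_T | q_t = i)
    beta = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = [
            sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] for j in range(n))
            for i in range(n)
        ]
    # Step 3: smoothed values, normalised by the sequence likelihood P(O)
    likelihood = sum(alpha[T - 1])
    return [[alpha[t][i] * beta[t][i] / likelihood for i in range(n)]
            for t in range(T)]


pi = [1.0, 0.0]                  # hypothetical left-right model with two states
A = [[0.6, 0.4], [0.0, 1.0]]     # transition probabilities a_ij
B = [[0.9, 0.1], [0.2, 0.8]]     # B[j][k] = probability of symbol k in state j
gamma = forward_backward([0, 0, 1], pi, A, B)
```

Each row of gamma is a probability distribution over the states at one time step. For the last observation the second state is the more likely one, as expected for a model that moves from left to right.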


2.8 Noise Filtering

Noise can enter a speech recording through tape hiss, clapping, coughing or any other environmental or mechanical factor. Noise plays a major role in the performance of speech recognition.

Source: (anon. (nd). Departement Elektrotechniek. Available: http://www.esat.kuleuven.be/psi/spraak/theses/08-09-en/clp_lp_mask.png. Last accessed 22 September 2009)

Figure 8: Noise in Speech

According to Khan, E., and Levinson, R. (1998), speech recognition has achieved quite remarkable progress in the past years.

Many speech recognition systems are capable of producing very high recognition accuracies (over 98%).

But such recognition accuracy only applies to a quiet environment (very low noise) and to speakers whose sample words were used during training.

Spectral subtraction and Wiener filtering are the two most popular noise reduction methods because they are straightforward to implement.

2.8.1 Wiener filtering

Wiener filtering is a common model for filtering noise. The observed signal z(k) is a signal s(k) plus additive noise n(k) that is uncorrelated with the signal: z(k) = s(k) + n(k). If the noise is also stationary, then the power spectra of the signal and the noise add:

$$P_z(\omega) = P_s(\omega) + P_n(\omega)$$
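Because the power spectra add, the clean speech power can be estimated as the noisy power minus the noise power, which leads to the usual frequency domain Wiener gain H = P_s / (P_s + P_n) = (P_z - P_n) / P_z. The sketch below applies that gain per frequency bin. It is an illustration only: the function name wiener_gain, the optional floor parameter and the example spectra are assumptions, not part of the project code.

```python
def wiener_gain(noisy_power, noise_power, floor=0.0):
    """Per-bin Wiener gain H = (Pz - Pn) / Pz, using Pz = Ps + Pn."""
    gains = []
    for pz, pn in zip(noisy_power, noise_power):
        ps = max(pz - pn, 0.0)           # estimated clean speech power, never negative
        gain = ps / pz if pz > 0 else 0.0
        gains.append(max(gain, floor))   # optional floor limits over-suppression
    return gains


noisy_power = [10.0, 4.0, 1.0, 0.5]   # hypothetical power spectrum of z(k)
noise_power = [1.0, 1.0, 1.0, 1.0]    # hypothetical noise estimate, e.g. from a silent stretch
gains = wiener_gain(noisy_power, noise_power)
```

Bins dominated by speech keep a gain close to one, while bins at or below the noise level are suppressed completely unless a floor is set.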

2.8.2 Conclusion

Wiener filtering has been chosen for the project because it is a widely accepted method and is easy to implement.

2.9 Database and data structure

The database contains the text versions of the speeches and their locations. The sample database is maintained on the hard disk and the locations are saved in a file. Database indexing is used to make searching efficient.

Database indexing improves the speed of data retrieval. Indexing can be divided into two types: clustered and non-clustered.

Non-clustered indexing does not depend on the physical order of the actual records. This results in additional input and output operations to fetch the actual records.

Clustered indexing reorders the data blocks according to the index. This is more efficient for searching.
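The difference between the two kinds of index can be sketched with in-memory lists. This is a simplified model under stated assumptions: real databases work on disk blocks, and the class names and sample rows below are hypothetical. The point it illustrates is that a clustered lookup reaches the record in one ordered search, while a non-clustered lookup needs a second step to fetch the record from its unordered position.

```python
from bisect import bisect_left


class ClusteredTable:
    """Rows are physically kept in key order, so the index and the data are one structure."""

    def __init__(self, rows):
        self.rows = sorted(rows)                      # (key, value) pairs in key order
        self.keys = [key for key, _ in self.rows]

    def find(self, key):
        pos = bisect_left(self.keys, key)             # single ordered lookup
        if pos < len(self.keys) and self.keys[pos] == key:
            return self.rows[pos][1]
        return None


class NonClusteredTable:
    """Rows stay in insertion order; a separate sorted index points at row positions."""

    def __init__(self, rows):
        self.rows = list(rows)
        self.index = sorted((key, pos) for pos, (key, _) in enumerate(self.rows))
        self.keys = [key for key, _ in self.index]

    def find(self, key):
        pos = bisect_left(self.keys, key)             # first lookup: the index
        if pos < len(self.keys) and self.keys[pos] == key:
            return self.rows[self.index[pos][1]][1]   # second lookup: the row itself
        return None


speeches = [("s03", "lecture on noise"), ("s01", "opening speech"), ("s02", "keynote")]
```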

2.9.1 Conclusion

Clustered indexing has been chosen for the project because the system relies on search operations over speeches.

Figure 9 : Database Indexing

2.10 Search Engine

The search engine acts as the terminal for searching speeches and lectures. It looks for results in a locally deployed database that contains the text versions of speeches and lectures. In general a search engine operates in the order of web crawling, indexing and searching.
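The indexing and searching steps can be illustrated with a small inverted index over speech transcripts. This is a minimal sketch, not the project's search engine: the file names, transcripts and function names are hypothetical, and a query matches a speech only if the transcript contains every query word.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and split it into words."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_index(transcripts):
    """Map each word to the set of speech ids whose transcript contains it."""
    index = defaultdict(set)
    for speech_id, text in transcripts.items():
        for word in tokenize(text):
            index[word].add(speech_id)
    return index


def search(index, query):
    """Return the ids of speeches containing every word of the query, sorted."""
    words = tokenize(query)
    if not words:
        return []
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return sorted(hits)


transcripts = {
    "speech_01.wav": "Noise can be reduced with a Wiener filter",
    "speech_02.wav": "A search engine indexes the content of lectures",
    "speech_03.wav": "The lecture covers noise in speech recognition",
}
index = build_index(transcripts)
```

Searching for "noise" returns both speeches that mention it, while "noise recognition" narrows the result to the single transcript containing both words.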

Source: (Brin, S. and Page, L. (nd). The Anatomy of a Large-Scale Hypertextual Web Search Engine. Available: http://infolab.stanford.edu/~backrub/google.html. Last accessed 24 March 2009.)

Figure 10 : Google Architecture

2.11 MATLAB

MATLAB was developed by MathWorks, a privately held multinational company specializing in technical software.

MATLAB is a multi-platform fourth generation programming language. Like many other languages, MATLAB supports the following features.

Matrix manipulation

Plotting of functions and data

Algorithm implementation

Create Graphical user interfaces

Interfacing with other programming languages

Most MATLAB code is numerical in nature. Even so, MATLAB lets us build systems in a more precise manner, and the number of lines of code required to build a system is relatively small compared with other languages such as Java or C#.

Like other object oriented languages, MATLAB supports classes, interfaces and functions. They are used in high level MATLAB programming.

MATLAB directly supports both analogue and digital signal processing and defines a rich set of features for working with them. Signal transforms and spectral analysis, digital system design, digital filtering, adaptive filtering, and coding and compression algorithms are among the features supported by MATLAB.

CHAPTER 3

ANALYSIS

3.1 System requirements

3.1.1 Functional requirements

1. The application must convert the speech or lecture to text format.

2. The converted text should be visible to the user.

3. If the speech or lecture contains noise, the noise must be reduced to a level suitable for the speech recognition process.

4. Speeches with different accents need to be identified reasonably well by the system.

5. The search results must be efficient and reliable.

3.1.2 Non functional requirements

1. The search algorithm needs to be efficient.

2. The search should not return duplicate results.

3. Searching should not take an excessive amount of time.

4. Speech to text conversion must be efficient and accurate.

5. Noise reduction must maintain fair performance.

3.1.3 Software Requirements

Java JDK 1.6:- JDK 1.6 provides up-to-date technology and a large amount of functionality. The newest version of the Java Sound API is required.

NetBeans IDE 6.5:- An open source IDE equipped with PHP, JavaScript and Ajax editing features, improved Java Persistence API support, and tighter GlassFish v3 and MySQL integration. It also provides features for drawing the architecture of the system and powerful J2EE components that are essential for building the search engine. Third party components used by the system can be integrated without much effort, and the IDE supports code generation. Many non open source plug-ins support this IDE.

Windows XP or equivalent operating system:- Windows XP supports both open source and commercial components, so we can deploy everything that is essential for the project. Windows XP is a robust, user friendly operating system with fewer errors than other Windows operating systems.

Apache Tomcat server, version 5.5.27:- Available at http://tomcat.apache.org/, this is a freely available, robust, open source server on which web programs can run. It has many third party components that are essential for integrating standalone, mobile and web based applications with each other. This server comes with the NetBeans IDE.

XML Database:- This is the world's most popular database and it is open source. It directly supports the Apache Tomcat server and the NetBeans IDE, and its crash rate is lower than that of other databases used with web services.

Proper sound driver software is required in order to achieve the best results.

MATLAB is required to perform noise filtering.

3.1.4 Hardware requirements

32 bit Intel Dual Core IV processor or greater:- During the development phase of the project a massive amount of processing power is required for speech recognition, noise analysis, speech to text conversion and search. A high end machine is advisable in order to prevent deadlocks.

64 bit PCI sound card:- A high end sound card is required to process digital audio signals.

A minimum of 1 GB of DDR3 RAM is required, and 2 GB of virtual memory must be present in the system.

The default components of a personal computer are required.

A modem or a router is required in order to test the search between many users.

A 1 Mbps or faster ADSL internet connection is required for data gathering.

A microphone is needed for future enhancements, so that users can store their own speech and anyone can later search for particular speeches and lectures.

A 20 GB hard disk with a rotation rate of 7200 rpm or more is required because the system maintains the database on the local machine.

Note: At least a dual core processor is required because the speech recognition process needs massive processing power.

3.2 System Development Methodologies

All the methodologies compared here are extended versions of earlier, commonly used methodologies.

3.2.1 Rational Unified Process

The Rational Unified Process (RUP) is a development methodology created by Rational Software, a division of IBM since 2003. It is an iterative system development process. RUP explains in detail how specific goals are achieved.

RUP is a methodology for managing object oriented software development. According to Kroll and Kruchten (2003), “The RUP is a software development approach that is iterative, architecture-centric, and use-case-driven.” RUP has the following extensible features.

Iterative Development

Requirements Management

Component-Based Architectural Vision

Visual Modeling of Systems

Quality Management

Change Control Management


The figure below gives a basic overview of its phases.

Source: (anon. (nd). Department of Computer Science. Available: people.cs.uchicago.edu/~matei/CSPP523/lect4.ppt. Last accessed 24 March 2009.)

Figure 11 Phases in RUP

Advantages of RUP

It is a well-defined and well-structured software engineering process.

It supports changing requirements and provides means to manage the change and

related risks

It promotes higher level of code reuse.

It reduces integration time and effort, as the development model is iterative.

It allows the system to run earlier than with other processes, which is essential for the system.

The risk management feature allows risks to be identified before the development process.

It follows the principle of “plan a little, design a little, code a little”.

RUP is an idea driven, principle based methodology.

RUP methodology is a worldwide commercial standard.

Disadvantages of RUP

For most projects RUP is an insufficient methodology.

The processes need to be customized for different situations.

It has poor usability support.

The process is relatively complex and heavyweight.

3.2.2 Agile Development Method

The Agile development methodology is an iterative process. Agile has short iterations and therefore carries minimal risk. It breaks tasks into small increments with minimal planning and does not directly involve long term planning. Agile strongly supports object oriented development.

Agile also includes Extreme Programming, which is now widely used in the software development process.

According to Ambler (2005), Agile is an iterative and incremental (evolutionary) approach to software development which is performed in a highly collaborative manner by self-organizing teams within an effective governance framework that produces high quality software in a cost effective and timely manner which meets the changing needs of its stakeholders.

Figure 12 : Overview of Agile

Advantages in Agile Software Development

Increased Control

Rapid Learning

Early Return on Investment

Satisfied Stakeholders

Responsiveness to Change

Disadvantages in Agile Software Development

Agile involves heavy documentation.

Agile requirements are barely sufficient for the projects.

It is not an organized methodology.

Because testing is integrated throughout development, the development cost is relatively high.

Too much user involvement may spoil the project.

3.2.3 Scrum Development Methodology

According to Mikneus, S. and Akinde, A. (2003):

Scrum is an Agile software development process.

Scrum is not an acronym.

The name is taken from the sport of rugby, where everyone in the team pack acts together to move the ball down the field.

The analogy to development is that the team works together to successfully develop quality software.

According to Jeff Sutherland (2003), “Scrum assumes that the systems development process is an unpredictable, complicated process that can only be roughly described as an overall progression.” “Scrum is an enhancement of the commonly used iterative/incremental object-oriented development cycle”. Scrum principles include:

Quality work: empowers everyone involved to feel good about their job.

Assume simplicity: Scrum is a way to detect and cause removal of anything that gets in the way of development.

Embracing change: a team based approach to development where requirements are rapidly changing.

Incremental changes: Scrum makes these possible using sprints, where a team is able to deliver a product (iteration) deliverable within 30 days.

Advantages in Scrum

Scrum has the ability to respond to unforeseen software development risks.

It is a specialized process for commercial application development.

It gives the developers the facility to deliver a functional application to the clients.

Disadvantages in Scrum

Not suitable for research based software development.

Source: [anon. (nd). anon. Available: http://www.methodsandtools.com/archive/scrum1.gif. Last accessed 26th March 2009.]

Figure 13 : Scrum Overview

3.2.4 Conclusion

The Agile software development methodology has been chosen for the development process because it supports object oriented development, has short iterations and supports Extreme Programming.

3.3 Test Plan

The system's main functionalities are noise analysis, speech recognition, database indexing (which directly affects the search) and the search engine.

The system takes data (speeches and lectures) recorded under various conditions with low noise. However, the system cannot guarantee that the noise has no effect. For that reason we perform noise analysis and try to reduce the noise; otherwise it would affect the speech recognition process.

3.3.1 System testing

Speeches and lectures with different accents [English only, USA and British]:- To test the accuracy of the speech recognition engine, it will be tested against different accents. The results for the different accents are expected to differ only minimally and to contain minimal errors.

Content search:- When the user searches by content by typing a word or a phrase, the appropriate search results will be displayed: the speeches or lectures containing the specified words or phrase.

CHAPTER 4

SYSTEM DESIGN

4.1 Use Case Diagram

The noise filter functionality is implemented separately from the speech recognition system. The noise filtering system is represented as an “Actor”.

Figure 14 : Use Case Diagram for System


The figure above shows the use case diagram for the entire system. The system mainly consists of two actors. A user can upload a speech file in wav format to perform speech recognition.

Noise filtering is handled by a separate system. The user has to upload a noisy speech file and the noise filtering system will produce a file with reduced noise.

4.2 Use case description

4.2.1 Use case description for file upload

Use Case: Use Case One
Description: User uploads a file
Actors: User
Assumptions: User uploads a file in .wav format. The user has to upload a file without noise.
Steps: User has to run the system, press the open button and select a file.
Variations: A user may upload a file without noise or with noise.
Non functional requirements: All the necessary hardware configuration must be met.
Issues: None

Table 4 : Use Case description file upload

4.2.2 Use Case description for play an audio file

Use Case: Use Case Two
Description: User plays a .wav file
Actors: User
Assumptions: User can only play a file in wav format.
Steps: User has to open a file, and then the play button gets enabled. User has to press the play button.
Variations: No variations; only files in wav format can be played.
Non functional requirements: -
Issues: None

Table 5 Use Case description play audio

4.2.3 Use Case description for search

Use Case: Use Case Three
Description: User searches for a speech by content
Actors: User
Assumptions: User can search for a speech by typing a sentence.
Steps: User has to run the speech search program, type what he/she wants to search for and press search.
Variations: No variations
Non functional requirements: -
Issues: None

Table 6 Use Case description search

4.2.4 Use Case description for noise reduced output

Use Case: Use Case Four
Description: Noise reduction output produced by the system
Actors: Noise filtering system
Assumptions: Permanent elimination of the noise is unreachable. User uploads a noisy file in wav format.
Steps: User has to run the noise filtering program in MATLAB. User has to input a file which includes the noise.
Variations: No variations
Non functional requirements: -
Issues: None

Table 7 Use Case description noise reduction

4.2.5 Use Case description for noise filtering

Use Case: Use Case Five
Description: The process of filtering noise
Actors: Noise filtering system
Assumptions: Permanent elimination of the noise is unreachable. User uploads a noisy file in wav format. The chosen mechanism for noise filtering is the most suitable one.
Steps: User has to run the noise filtering program in MATLAB. User has to input a file which includes the noise.
Variations: No variations
Non functional requirements: -
Issues: None

Table 8 Use Case description noise process

4.3 Activity Diagrams

4.3.1 Activity Diagram for Speech Recognition System

Figure 15 Speech Recognition



4.3.2 Activity Diagram for Noise filtering

Figure 16 Activity Diagram Noise Filtering



4.4 Sequence Diagrams

4.4.1 Select a file

Figure 17 Sequence Diagram Select a file



4.4.2 Play wav file

The system can play a file. Two main control classes are involved in this process. The WavFileRecognition class acts as a mediator which passes messages between the functionalities of the other classes.

Figure 18 Sequence Diagram Play File

4.4.3 Speech recognition pre stage

In the speech recognition pre stage, the system is loaded with the configuration file and the
input signal. A recognizer is allocated through the configuration manager.

Figure 19 Sequence Diagram SR Pre Stage


Page 60

4.4.4 Speech recognition post stage

In the speech recognition post stage the input digital signal goes through fast Fourier
transformation, segmentation, and the identification of dialects and phonemes. The classes
AudioFileDataSource and Recognizer provide the functionality to perform these tasks.

Figure 20 Sequence Diagram SR Post Stage


Page 61

4.5 Class Diagrams

4.5.1 GUI and the system

The figure below shows the class diagram of the GUI and WavFileRecognizer.

Figure 21 Class Diagrams GUI & System


Page 62

4.5.2 Speech recognition

Figure 22 Class Diagram SR System


Page 63

Class Diagram for Speech search

Figure 23 : Speech Search Class Diagram


Page 64

4.6 Noise Filtering

Noise filtering was done using MATLAB. The MATLAB script does not use object orientation,
polymorphism or inheritance. C code corresponding to the MATLAB code was also generated.

%ver 1.56

function noiseReduction

%----- user data -----

steps_1 = 512;

chunk = 2048;

coef = 0.01*chunk/2;

The three assignments above define the user data used by the MATLAB script. The term
chunk means a short segment of the input signal. The script below can be used to filter the
noise from any given input signal.

%Windowing Techniques

%w1 = .5*(1 - cos(2*pi*(0:chunk-1)'/(chunk))); %hanning

w1 = [.42 - .5*cos(2*pi*(0:chunk-1)/(chunk-1)) + .08*cos(4*pi*(0:chunk-1)/(chunk-1))]';

%Blackman

w2 = w1;

The Blackman window technique is used here to chop the signal into small segments. The
input signal is repeatedly split into small overlapping chunks. Chunk is the term used here
for a segment of the signal.
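The windowing step described above can be sketched in Python as follows. This is an illustrative sketch only, not the author's code: the function names `blackman` and `split_into_chunks` are hypothetical, the window uses the same 0.42, 0.5 and 0.08 coefficients as the MATLAB line for w1, and the loop bound is a simplifying assumption.

```python
import math


def blackman(n):
    """Blackman window of length n (coefficients 0.42, 0.5, 0.08)."""
    return [
        0.42
        - 0.5 * math.cos(2 * math.pi * k / (n - 1))
        + 0.08 * math.cos(4 * math.pi * k / (n - 1))
        for k in range(n)
    ]


def split_into_chunks(samples, chunk, hop):
    """Return windowed chunks of length `chunk`, advancing by `hop` samples."""
    window = blackman(chunk)
    chunks = []
    start = 0
    while start + chunk <= len(samples):
        segment = samples[start:start + chunk]
        chunks.append([s * w for s, w in zip(segment, window)])
        start += hop
    return chunks
```

The window is zero at both ends and one at its centre, which is what lets neighbouring chunks overlap smoothly.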

% input wav file and extract required data

[input, FS, N] = wavread('input.wav');

L = length(input);

The input signal is read and arranged into a matrix. L is the length of the signal in
samples, which corresponds to its total duration. MATLAB handles the matrix representation
internally.



% zero padding for input file

input = [zeros(chunk,1);input;zeros(chunk,1)]/ max(abs(input));

% the zeros appended to the back of the input sound file ensure that the windowing

% samples the complete sound file

%----- initializations -----

output = zeros(length(input),1);

count = 0;

% block by block fft algorithm

Noise is assumed here to lie mainly at the higher frequencies. Once the value of the noise
coefficient is set, the loop below repeatedly takes segments of the signal and analyzes the
spectrum of each one.

while count<(length(input) - chunk)

grain = input(count+1:count+chunk).* w1; % windowing

f = fft(grain); % fft of window data

r = abs(f); % magnitude of window data

phi = angle(f); % phase of window data

ft = denoise(f,r,coef);

The denoise function reduces the amplitude of each chunk. It takes the spectrum of a single
chunk as its argument.

grain = real(ifft(ft)).*w2; % take inverse fft of window data

output(count+1:count+chunk) = output(count+1:count+chunk) + grain; % append data to output file

count = count + steps_1; % increment by hop size

end

output = output(1:L) / (4.75*max(abs(output))); % the 4.75*max(abs(output)) factor maintains consistency between input and output volume

%soundsc(output, FS);

wavwrite(output, FS, 'output.wav');
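The block-by-block loop of the script follows the overlap-add pattern, which might be sketched in Python as below. The FFT, denoise and inverse FFT steps are replaced by a hypothetical `process` callback (identity by default), so the sketch only shows how windowed chunks are accumulated into the output with a fixed hop size. The names `overlap_add` and `process` are assumptions made for illustration.

```python
def overlap_add(samples, window, hop, process=lambda grain: grain):
    """Window each chunk, process it, window it again and add it to the output."""
    chunk = len(window)
    output = [0.0] * len(samples)
    count = 0
    while count < len(samples) - chunk:
        # analysis window (w1 in the script)
        grain = [samples[count + i] * window[i] for i in range(chunk)]
        # stand-in for fft -> denoise -> ifft
        grain = process(grain)
        # synthesis window (w2 in the script) and accumulation
        for i in range(chunk):
            output[count + i] += grain[i] * window[i]
        count += hop
    return output
```

With the script's values (chunk = 2048, steps_1 = 512) each interior sample falls inside 2048 / 512 = 4 overlapping chunks.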

As can be seen, the script contains no classes or interfaces. The denoise helper function is
listed next, and the equivalent code in the C programming language is given in Section 4.7.



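function ft = denoise(f,r,coef)

% f: spectrum of one chunk, r: its magnitude, coef: noise coefficient

% note: for a vector, this condition holds only if every element satisfies it

if abs(f) >= 0.001

ft = f.*(r./(r+coef)); % scale each component by r/(r+coef)

else

ft = f.*(r./(r+sqrt(coef))); % same rule with sqrt(coef) in place of coef

end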

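The function shown above is the denoise function. It scales each frequency component of a
chunk by the gain r/(r + coef), where r is the magnitude of the component. Components whose
magnitude is small compared with the coefficient are attenuated strongly, while strong
components pass almost unchanged. When the test on abs(f) fails, the square root of the
coefficient is used in place of the coefficient. The main loop applies this function to
every chunk in turn.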
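The gain rule inside denoise can be illustrated for individual frequency components with the Python sketch below. It is a simplified illustration under stated assumptions: the rule is applied per component, the sqrt(coef) branch of the MATLAB function is left out, and the names `denoise_bin` and `denoise_spectrum` are hypothetical.

```python
def denoise_bin(f, coef):
    """Scale one complex frequency component f by r / (r + coef), with r = |f|."""
    r = abs(f)
    if r == 0.0:
        return 0j
    return f * (r / (r + coef))


def denoise_spectrum(spectrum, coef):
    """Apply the gain rule to every component of a chunk's spectrum."""
    return [denoise_bin(f, coef) for f in spectrum]


# coef as defined in the script's user data: 0.01 * chunk / 2 with chunk = 2048
coef = 0.01 * 2048 / 2
```

With this coefficient a component of magnitude 1000 keeps about 99% of its amplitude, while a component of magnitude 1 keeps about 9%, and the phase of each component is unchanged because the gain is a real number.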


Page 67

4.7 Code to filter noise in C Language

#include <stdio.h>

#include "mclmcr.h"

#ifdef __cplusplus

extern "C" {

#endif

extern const unsigned char __MCC_denoise2_public_data[];

extern const char *__MCC_denoise2_name_data;

extern const char *__MCC_denoise2_root_data;

extern const unsigned char __MCC_denoise2_session_data[];

extern const char *__MCC_denoise2_matlabpath_data[];

extern const int __MCC_denoise2_matlabpath_data_count;

extern const char *__MCC_denoise2_mcr_runtime_options[];

extern const int __MCC_denoise2_mcr_runtime_option_count;

extern const char *__MCC_denoise2_mcr_application_options[];

extern const int __MCC_denoise2_mcr_application_option_count;

#ifdef __cplusplus

}

#endif

static HMCRINSTANCE _mcr_inst = NULL;

static int mclDefaultPrintHandler(const char *s)

{

return fwrite(s, sizeof(char), strlen(s), stdout);

}

static int mclDefaultErrorHandler(const char *s)

{

int written = 0, len = 0;

len = strlen(s);

written = fwrite(s, sizeof(char), len, stderr);

if (len > 0 && s[ len-1 ] != '\n')

written += fwrite("\n", sizeof(char), 1, stderr);

return written;

}

bool denoise2InitializeWithHandlers(

mclOutputHandlerFcn error_handler,

mclOutputHandlerFcn print_handler

)

{

if (_mcr_inst != NULL)

return true;



if (!mclmcrInitialize())

return false;

if (!mclInitializeComponentInstance(&_mcr_inst,

__MCC_denoise2_public_data,

__MCC_denoise2_name_data,

__MCC_denoise2_root_data,

__MCC_denoise2_session_data,

__MCC_denoise2_matlabpath_data,

__MCC_denoise2_matlabpath_data_count,

__MCC_denoise2_mcr_runtime_options,

__MCC_denoise2_mcr_runtime_option_count,

true, NoObjectType, ExeTarget, NULL,

error_handler, print_handler))

return false;

return true;

}

bool denoise2Initialize(void)

{

return denoise2InitializeWithHandlers(mclDefaultErrorHandler,

mclDefaultPrintHandler);

}

void denoise2Terminate(void)

{

if (_mcr_inst != NULL)

mclTerminateInstance(&_mcr_inst);

}

int main(int argc, const char **argv)

{

int _retval;

if (!mclInitializeApplication(__MCC_denoise2_mcr_application_options,

__MCC_denoise2_mcr_application_option_count))

return 0;

if (!denoise2Initialize())

return -1;

_retval = mclMain(_mcr_inst, argc, argv, "denoise2", 0);

if (_retval == 0 /* no error */) mclWaitForFiguresToDie(NULL);

denoise2Terminate();

mclTerminateApplication();

return _retval; }

/*

* MATLAB Compiler: 4.0 (R14)

* Date: Sun Oct 04 09:55:11 2009

* Arguments: "-B" "macro_default" "-m" "-W" "main" "-T" "link:exe" "denoise2"



*/

#ifdef __cplusplus

extern "C" {

#endif

const unsigned char __MCC_denoise2_public_data[] = {'3', '0', '8', '1', '9',

'D', '3', '0', '0', 'D',

'0', '6', '0', '9', '2',

'A', '8', '6', '4', '8',

'8', '6', 'F', '7', '0',

'D', '0', '1', '0', '1',

'0', '1', '0', '5', '0',

'0', '0', '3', '8', '1',

'8', 'B', '0', '0', '3',

'0', '8', '1', '8', '7',

'0', '2', '8', '1', '8',

'1', '0', '0', 'C', '4',

'9', 'C', 'A', 'C', '3',

'4', 'E', 'D', '1', '3',

'A', '5', '2', '0', '6',

'5', '8', 'F', '6', 'F',

'8', 'E', '0', '1', '3',

'8', 'C', '4', '3', '1',

'5', 'B', '4', '3', '1',

'5', '2', '7', '7', 'E',

'D', '3', 'F', '7', 'D',

'A', 'E', '5', '3', '0',

'9', '9', 'D', 'B', '0',

'8', 'E', 'E', '5', '8',

'9', 'F', '8', '0', '4',

'D', '4', 'B', '9', '8',

'1', '3', '2', '6', 'A',

'5', '2', 'C', 'C', 'E',

'4', '3', '8', '2', 'E',

'9', 'F', '2', 'B', '4',

'D', '0', '8', '5', 'E',

'B', '9', '5', '0', 'C',

'7', 'A', 'B', '1', '2',

'E', 'D', 'E', '2', 'D',

'4', '1', '2', '9', '7',

'8', '2', '0', 'E', '6',

'3', '7', '7', 'A', '5',

'F', 'E', 'B', '5', '6',

'8', '9', 'D', '4', 'E',

'6', '0', '3', '2', 'F',

'6', '0', 'C', '4', '3',



'0', '7', '4', 'A', '0',

'4', 'C', '2', '6', 'A',

'B', '7', '2', 'F', '5',

'4', 'B', '5', '1', 'B',

'B', '4', '6', '0', '5',

'7', '8', '7', '8', '5',

'B', '1', '9', '9', '0',

'1', '4', '3', '1', '4',

'A', '6', '5', 'F', '0',

'9', '0', 'B', '6', '1',

'F', 'C', '2', '0', '1',

'6', '9', '4', '5', '3',

'B', '5', '8', 'F', 'C',

'8', 'B', 'A', '4', '3',

'E', '6', '7', '7', '6',

'E', 'B', '7', 'E', 'C',

'D', '3', '1', '7', '8',

'B', '5', '6', 'A', 'B',

'0', 'F', 'A', '0', '6',

'D', 'D', '6', '4', '9',

'6', '7', 'C', 'B', '1',

'4', '9', 'E', '5', '0',

'2', '0', '1', '1', '1'

, '\0'};

const char *__MCC_denoise2_name_data = "denoise2";

const char *__MCC_denoise2_root_data = "";

const unsigned char __MCC_denoise2_session_data[] = {'7', '7', 'B', 'D', '1',

'6', '2', '3', '5', '5',

'4', '5', '0', 'A', 'B',

'1', '7', '3', '9', '0',

'4', 'D', '4', '6', '7',

'2', 'E', '3', '6', 'B',

'3', '2', '4', '7', '5',

'6', '1', '0', 'F', '3',

'5', '2', '8', 'D', '5',

'3', '8', '2', '3', '4',

'4', 'A', '6', 'B', '6',

'3', '8', 'E', '4', 'E',

'A', '8', '2', 'F', '9',

'4', '1', '8', 'E', '9',

'1', 'C', '1', 'F', '8',

'F', '7', '6', '0', '2',

'D', 'B', '3', 'B', 'F',

'3', '4', '9', 'B', 'C',



'2', '8', 'C', '6', 'A',

'9', '9', '6', '4', '9',

'6', '3', 'C', '6', '8',

'4', '1', '1', '8', '5',

'5', 'E', '2', '3', '5',

'B', '9', '7', '9', '7',

'0', '9', 'B', 'A', 'F',

'7', 'E', 'D', '0', 'C',

'0', '5', 'F', 'E', '2',

'C', '6', '3', '6', '6',

'D', 'F', 'B', '6', '0',

'F', '6', 'B', 'F', 'F',

'2', '9', '4', '4', '2',

'0', '3', 'C', 'C', 'C',

'8', 'E', '3', '7', 'F',

'A', '4', '5', 'A', '9',

'A', '5', 'B', '7', '2',

'0', '0', 'B', 'E', '3',

'F', 'E', '0', 'E', 'B',

'1', 'C', '0', '7', 'D',

'3', '9', 'D', 'F', '0',

'7', '4', '2', 'B', '9',

'E', '3', 'A', '2', 'F',

'3', '3', 'E', '9', '8',

'E', '5', 'C', '9', 'B',

'B', 'D', '3', '6', 'B',

'7', 'D', 'E', '8', '3',

'2', 'B', '9', '7', '5',

'F', '3', '0', '7', '7',

'D', 'F', '8', '1', 'F',

'A', '9', 'B', '4', 'F',

'E', '3', '5', '4', 'F',

'B', '1', '8', 'E', '1',

'D', '\0'};

const char *__MCC_denoise2_matlabpath_data[] = {"denoise2/",

"toolbox/compiler/deploy/",

"$TOOLBOXMATLABDIR/general/",

"$TOOLBOXMATLABDIR/ops/",

"$TOOLBOXMATLABDIR/lang/",

"$TOOLBOXMATLABDIR/elmat/",

"$TOOLBOXMATLABDIR/elfun/",

"$TOOLBOXMATLABDIR/specfun/",

"$TOOLBOXMATLABDIR/matfun/",

"$TOOLBOXMATLABDIR/datafun/",

"$TOOLBOXMATLABDIR/polyfun/",



"$TOOLBOXMATLABDIR/funfun/",

"$TOOLBOXMATLABDIR/sparfun/",

"$TOOLBOXMATLABDIR/scribe/",

"$TOOLBOXMATLABDIR/graph2d/",

"$TOOLBOXMATLABDIR/graph3d/",

"$TOOLBOXMATLABDIR/specgraph/",

"$TOOLBOXMATLABDIR/graphics/",

"$TOOLBOXMATLABDIR/uitools/",

"$TOOLBOXMATLABDIR/strfun/",

"$TOOLBOXMATLABDIR/imagesci/",

"$TOOLBOXMATLABDIR/iofun/",

"$TOOLBOXMATLABDIR/audiovideo/",

"$TOOLBOXMATLABDIR/timefun/",

"$TOOLBOXMATLABDIR/datatypes/",

"$TOOLBOXMATLABDIR/verctrl/",

"$TOOLBOXMATLABDIR/codetools/",

"$TOOLBOXMATLABDIR/helptools/",

"$TOOLBOXMATLABDIR/winfun/",

"$TOOLBOXMATLABDIR/demos/",

"toolbox/local/",

"toolbox/compiler/"};

const int __MCC_denoise2_matlabpath_data_count = 32;

const char *__MCC_denoise2_mcr_application_options[] = { "" };

const int __MCC_denoise2_mcr_application_option_count = 0;

const char *__MCC_denoise2_mcr_runtime_options[] = { "" };

const int __MCC_denoise2_mcr_runtime_option_count = 0;

#ifdef __cplusplus

}

#endif


Page 73

CHAPTER 5

5.0 Implementation

The Agile development process was chosen for the development, and the system went through
three iterations. The basic objective of the first iteration was to build a speech recognition
engine. Various methods were tried out, and the speech recognition engine was completed
within this iteration.

Figure 24: SR Engine


Page 74

The figure below shows the functionalities of the speech recognition engine. It can open a
.wav file either to play it or to recognize the speech in it.

Figure 25 Open file


Page 75

Once a file has been selected for recognition, the user can press the Start button to start
the recognition process. The recognized output can be viewed in the text output section.

Figure 26: Text output


Page 76

The noise filtering process was built in the second iteration, entirely in MATLAB.

It does not have a user interface. In the first version the noise filtering engine was not
very effective: many isolated noise packets remained in the spectrum. The second version
performed considerably better.

The user inputs a noisy speech file, and running the program produces a noise-filtered
.wav file.


Page 77

The search engine was built in the third iteration. The user runs the search engine, which
accesses the local database and returns the search results.

Figure 27 Speech Search Engine


Page 78

CHAPTER 6

6.0 Test Plan

6.1 Background

The system built for the research project comprises three main parts. The speech
recognition section is the key part of the application. The noise filtering section is
another key area that was taken into account. There is also a text search, which provides
the facility to search for a speech by its content. Because this was a technical project,
and considering its nature, the testing criteria do not look the same as those of other
projects.

6.2 Introduction

The testing criteria in this test plan are based on the input speech signals for speech
recognition and noise filtering, and on the search criteria. Because of the nature of this
project, and because it is not a commercial project, industrial test plans cannot be used.
For the speech recognition testing criteria, speech in a digital format is used. Speech
recognition projects are still at the research stage, so it is not advisable to implement a
standard heavyweight test plan. Basic test plans are sufficient to assess the testing
criteria mentioned in the project.


Page 79

6.3 Assumptions

Before declaring any assumptions it is advisable to understand the nature of the project.
Within the project scope it is assumed that the speech recognition engine works only for
noiseless speech inputs, and only for speech in a clear English accent. Noisy speech is not
used as input, because the speech engine does not itself identify and filter the noise.

The system can identify only the most commonly spoken words. It is possible to add a larger
vocabulary, but the system has not been designed for high-level language identification and
processing.

Noise filtering can be done on “.wav” files only. The system cannot eliminate the noise
completely.

A noise-filtered file cannot be used for recognition, because the speech recognition system
works on noiseless speech only.

6.4 Features to be tested

For the speech recognition system, a noiseless speech input in .wav format is tested to
assess the continuous speech recognition capability. Continuous speech recognition is a
distinctive feature of modern speech recognition systems.

A noisy speech file is passed to the noise filtering system, which produces an output file
filtered to a reasonable level. The efficiency of the noise filtering system could be
measured by the time it takes to process a file, but that is not addressed in this project.

For the speech searching part the system uses a file search. The search mechanism includes
file searching and text matching. Once the user has typed a phrase, the system shows the
name of the file that contains the phrase most often.
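One way to read the search behaviour described above is as a count of phrase occurrences per transcript. The Python sketch below illustrates that reading only. The function name `best_matching_file`, the dictionary of transcripts and the file names are hypothetical, and the actual system may store and match its text differently.

```python
def best_matching_file(transcripts, phrase):
    """Return the name of the file whose text contains `phrase` most often.

    `transcripts` maps a file name to its recognized text. Returns None when
    the phrase is empty or does not occur in any transcript.
    """
    needle = phrase.lower().strip()
    if not needle:
        return None
    best_name, best_count = None, 0
    for name, text in transcripts.items():
        count = text.lower().count(needle)
        if count > best_count:
            best_name, best_count = name, count
    return best_name


# Hypothetical transcripts, for illustration only
transcripts = {
    "digits.wav": "one two three one two",
    "lecture.wav": "four five six",
}
```

Calling `best_matching_file(transcripts, "one two")` selects `digits.wav`, since the phrase occurs there twice and nowhere else.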

The system can play a .wav file before it is submitted for recognition.


Page 80

6.5 Suspension and resumption criteria

If defects are found while system testing is running, there can be reasons to suspend the
process. The suspension criteria state what those reasons are. According to Anon. (n.d.),
Suspension criteria & resumption requirements, the suspension criteria are as follows:

Unavailability of external dependent systems during execution.

When a defect is introduced that cannot allow any further testing.

Critical path deadline is missed so that the client will not accept delivery even if all

testing is completed.

A specific holiday shuts down both development and testing

The resumption criteria are as follows:

When the external dependent systems become available again.

When a fix is successfully implemented and the Testing Team is notified to continue

testing.

The contract is renegotiated with the client to extend delivery.

The holiday period ends.

According to Anon. (n.d.), Suspension criteria & resumption requirements, suspension
criteria assume that testing cannot go forward and that going backward is also not possible.
A failed build would not suffice, as you could generally continue to use the previous build.
Most major or critical defects would also not constitute suspension criteria, as other areas
of the system could continue to be tested.


Page 81

6.6 Environmental needs

A few environmental needs must be met before testing the system. They can be classified as
software needs, hardware needs and legal needs. There are no legal needs, because the system
has no legal implications.

The software needs are listed below:

Java run time environment

Matlab development software

NetBeans 6.5 or greater

Sound driver software

Windows XP operating system

The hardware needs are

A computer [hardware requirements are specified in another chapter under system requirements]

Multimedia devices


Page 82

6.7 System testing

Speeches and lectures with different accents [English only, USA and British]: To test the
accuracy of the speech recognition engine, it is tested against different accents. The
results for the different accents are expected to differ only minimally and to contain few
errors.

Content search: When the user searches by content by typing a word or a phrase, the
appropriate search result is displayed, namely the speech or lecture containing the
specified word or phrase.


Page 83

6.8 Unit testing

The first test covered the initial user interface. At start-up the system loads only the
basic user interactions. It does not load any calculation or extraction functionality before
the user provides a correct input.

Test Case: Test Case One
Description: The user runs the speech recognition system for the first time.
Expected Output:
The Open, Start and Open Speech buttons are enabled.
The Encode To wav and Noise Filter buttons remain disabled.
The area below "open a speech file" is blank.
The text output is blank.
Actual Output:
The Open, Start and Open Speech buttons are enabled.
The Encode To wav and Noise Filter buttons remain disabled.
The area below "open a speech file" is blank.
The text output is blank.
The actual output matched the expected output.

Table 9 Test Case 1

On the initial run the speech recognition system does not load any algorithms. After an
input is given, the system loads the components necessary for processing. This mechanism
conserves system resources.


Page 84

The second test begins when the user provides an input to the system. This test case
exercises the input of the speech recognition system. The input can be a .wav file.

Test Case: Test Case Two
Description:
The user opens a file to feed the speech recognition system.
The user provides the system with a .wav file.
The first input speech contains the digits one to nine in a British accent.
The file must be noise free.
Expected Output:
The names of the identified digits are displayed in the text output area.
Actual Output:
Because of variations in dialect, the results are not always the same as expected.
Within the range one to nine, the system identifies the digits and displays the output.

Table 10 Test Case 2

The identification of digits can be extended beyond ten. As the name of the digit to be
identified becomes longer, the system identifies the digits with some errors.


Page 85

The third test covers the case where the user inputs a file with noise for identification.
The system does not work for files that contain noise.

Test Case: Test Case Three
Description: The user provides the system with a .wav file that contains noise.
Expected Output: The system throws an error or shows no results.
Actual Output:
The actual output varies with the noise level. If the noise level is high, the system reports an error, which can be “severe null”.
Otherwise the system returns blank results, because the words are barely identifiable.


The system has no functionality to measure noise levels, and in-depth noise analysis is
outside the project scope. The noise levels mentioned above were judged by listening.

The system assumes that users will not upload files with noise, and this rule is clearly
stated in the assumptions.


Page 86

The fourth test checks the system's speech recognition capability with words.

Test Case: Test Case Four
Description:
The user provides the system with a .wav file containing basic words.
The input does not contain any noise.
Expected Output: The system identifies all the words and shows the output precisely.
Actual Output:
The system identified the words with an error rate that fluctuated between 20% and 35%.
Not all the words were identified by the system.


The system does not identify all the words. The identification process depends on the rate
of utterance and the intensity of the phonemes. Higher phoneme intensity helps any speech
recognition system to achieve more precise results.
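The document does not state how the 20% to 35% error rate was measured. One common measure is the word error rate, the word-level edit distance between a reference transcript and the recognized text divided by the number of reference words. The Python sketch below shows that measure as an assumption, not as the method actually used in the tests. The function name `word_error_rate` is hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev[j] holds the edit distance between the words of the reference
    # processed so far and the first j words of the hypothesis
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, 1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, 1):
            cost = 0 if ref_word == hyp_word else 1
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + cost))  # substitution or match
        prev = cur
    return prev[-1] / len(ref)
```

For example, one wrongly recognized word in a four-word utterance gives a rate of 0.25, which falls inside the range reported above.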


Page 87

Test case five tests the performance of noise filtering. The noise filtering system was built in

MATLAB.

Test Case: Test Case Five
Description:
The user provides the noise filtering system with a noisy file.
The input file must be in .wav format.
The user has to open the MATLAB scripts, import them into the working directory and run them.
The input file needs to be in the same directory.
Expected Output:
An output file is created in the working folder with the name “output.wav”.
The “output.wav” file contains the noise-filtered version of the input file.
The amplitude of the output file should not differ from the input by an amount a human can detect.
Actual Output:
The output file is created in the working folder.
The output file has less noise than the input file.
The output file is not noise free.
The amplitude differs by an amount that a human ear can detect.


There is still no mechanism that removes 100% of the noise. The system works with
predefined algorithms.


Page 88

Test case six tests the criterion for search functionality. The search functionality acts as a

speech search engine.

Test Case: Test Case Six
Description:
The user has to run the search engine.
Port 8080 must be free.
Expected Output:
The user types a phrase into the search engine and presses the search button.
If there is a match in the database, the result shows true.
If there is no match, the result shows false.
Actual Output:
If a match was found, “true” is displayed in the results.
If no match was found, “false” is displayed in the results.


The system was not built as a full speech search engine. It only demonstrates how the
speech search engine works. As a future enhancement, it is possible to build a full search
engine.


Page 89

6.9 Performance Testing

The system's performance was tested on different operating systems, including virtual
operating environments.

Microsoft Windows XP was taken as the reference operating system for the measurements.

Operating System: Microsoft Windows XP
Speech recognition engine configuring time: Between 0.5 seconds and 1 second.
Efficiency of speech recognition: An input signal with high phoneme intensity, free from noise, shorter than 10 seconds and with low word density takes around 1 second to 12 seconds. Input signals that have many words take longer.
Efficiency of noise filtering and MATLAB: The noise filtering system generates the output in less than 200 milliseconds for .wav clips with a duration of between 2 and 10 seconds.
Performance of speech search engine: The startup time of the speech search averages 8 to 15 seconds.

Table 15: Performance testing windows XP


Page 90

The performance of the speech search engine depends heavily on the operating system. For
example, Windows operating systems use considerably more resources than UNIX-based
operating systems.

The speech search system runs on the GlassFish server, which performed better on UNIX-based
operating systems. In Windows environments the speech search engine experienced many
deadlocks.

Operating System: Ubuntu 9.04
Speech recognition engine configuring time: Between 0.2 seconds and 0.8 seconds.
Efficiency of speech recognition: An input signal with high phoneme intensity, free from noise, shorter than 5 seconds and with low word density takes around 1 second to 5 seconds. The system performs noticeably better when it runs in an Ubuntu environment.
Efficiency of noise filtering and MATLAB: Noise filtering was more efficient than in the Windows environment.
Performance of speech search engine: The search and the startup time of the search engine were more efficient than on Windows XP.

Table 16 : Performance Testing on UBUNTU


Page 91

When the search engine is run many times in a Windows environment, it is more likely to
crash and to return incorrect results.

When the system is used to perform speech recognition several times, the recognition slows
down.

Java runs in a virtual machine, and the recognition process needs a lot of processing
power. Because of these factors the efficiency of the system degrades as it is used over
and over.


Page 92

6.10 Integration Testing

Integration testing is a logical extension of unit testing. It identifies the problems that
appear when the parts of a system are combined. Before integration, the system consists of
separate subsystems with different functionalities.

It is not possible to combine the noise filtering system with the speech recognition system
or the web-based search.

An overall test mechanism was used for integration testing, because the system consists of
subsystems that are only indirectly connected with each other.

Big Bang testing

Big Bang testing is the process of taking all the unit-tested modules of a system and tying
them together. This approach is mostly suitable for small systems, and it may leave many
errors unidentified during the testing stages. If the developer has done the unit testing
correctly, Big Bang testing helps to uncover more errors and saves money and time.

After Big Bang testing was performed on the system, the following faults were discovered:

The continuous functioning of the search engine cannot be guaranteed.

If the input signal is long, a system out-of-memory error occurs.

The disadvantages of Big Bang testing are:

Integration testing cannot start until all the modules have been successfully evaluated.

It is harder to track down the causes of errors.


Page 93

Incremental Testing

Incremental testing allows you to compare and contrast two functionalities that you are
testing. You can add and test other modules during the testing period.

Incremental testing could not be performed on the system, because there are no parallel
functionalities within the system that interact with each other.


Page 94

CHAPTER 7

CRITICAL EVALUATION AND FUTURE ENHANCEMENTS

7.1 Critical evaluation

The entire project was about speech recognition using a digital signal as input, and search
by content. The project combines several other research areas. At the initial stage the
research was focused on speech recognition.

The barriers met in the initial stage

Human speech recognition

At the beginning there was no clear way to explain the speech recognition process,
the mechanisms behind it and how it is performed.

Speech recognition engine

Studying a speech recognition engine was a crucial part of the design phase, but
there was no speech recognition engine available to analyze or study.

To overcome those two barriers, it was first essential to understand how speech recognition
functions. After understanding a system and completing a basic sketch of the flow diagram,
there seemed to be a sufficient starting point for the development.

When someone talks over a microphone, it is easy to record the human voice. After
recording, the voice is no longer in analogue format. The obvious digital format was a .wav
file, so the system performs speech recognition on files in .wav format.


Page 95

In the development phase, studying the speech recognition system did not help much with the
next steps, because with audio formats the digital signal processing part is hidden. The
system has to address the DSP part in a reasonable manner. Digital signal processing is
concerned with the representation of signals by a sequence of numbers or symbols, and with
the processing of these signals.

Within the course content we studied there was not a single module that taught us about
interface programming, microcontroller programming or digital signal processing.

Building functionality to handle digital signal processing from scratch was a tedious job.
The knowledge we had was not sufficient to build such functionality in the time available.

At the initial stage the plan was to develop the entire system in Java, but Java did not
have a proper built-in API or facilities for digital signal processing. However, there were
some reliable third-party components that just managed to perform the task.

Plugging in the third-party tools was another issue, but in the end code was found that
accomplished the task.

A few speech recognition systems had been built using Java, but they were not built for
continuous speech recognition or for noise reduction.

There were many issues at the start. A grammar format had to be defined, and there were two
options. One was to go with JVXML. Java VoiceXML is a technology that provides speech
synthesis and recognition capabilities, and voice commands can be embedded in web sites
using VoiceXML.

For this project I chose JSGF, the Java Speech Grammar Format. It supports built-in
dictionaries capable of handling digits and words, and multi-language capabilities can be
plugged in.

When developing systems in Java, it is always advisable to use components that readily
support the capabilities of the Java platform.


Page 96

The speech recognition system can be split into two systems. The Java virtual machine
allocates a maximum of 128 MB of memory when run from NetBeans, and the amount of virtual
machine memory could not be set explicitly when working in NetBeans.

The digit recognition part can be implemented and run in the NetBeans development
environment.

However, recognizing speech that contains words needs more memory from the virtual machine
than was available there. Because of that, speech recognition for words had to be run from
the command prompt, explicitly specifying “java -mx256m -jar”. This command allows up to
256 MB of memory for speech recognition.

Noise filtering was another unsolved issue that the system had to address. Java had no
proper support for noise filtering. For a technical project of this kind, it is essential to
develop in a 4th generation language or in languages such as C or assembly.


Page 97

To do the noise filtering I had to find an efficient method. Many methodologies and
algorithms were available. Spectral subtraction and Wiener filtering are two of them, and
some of the methods were inefficient. Finally I had to learn about FFT algorithms, which are
commonly used in hardware-level applications.

Java does not have a proper DSP API, so the only option was MATLAB. MATLAB is a
fourth-generation language and a proper instrument for noise filtering. I performed spectral
subtraction: the spectrum of the incoming signal is split into two parts, the speech and the
noise. The noisy part carries an unknown constant that cannot be fitted into an exact equation,
so an estimate of that constant part is subtracted from the spectrum. After performing the
filtering I was able to produce a noise-filtered speech output.
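The idea of spectral subtraction can be sketched in a few lines. The example below is an
illustration only, not the MATLAB code used in the project: it is written in Java, handles a
single frame, uses a naive discrete Fourier transform instead of an FFT to stay short, and
assumes that the noise magnitude has been estimated from a frame that contains no speech. All
class and method names are hypothetical.

```java
/** Single-frame magnitude spectral subtraction (illustrative sketch only). */
class SpectralSubtractionSketch {

    /** Naive O(n^2) discrete Fourier transform; returns {real parts, imaginary parts}. */
    public static double[][] dft(double[] signal) {
        int n = signal.length;
        double[] re = new double[n];
        double[] im = new double[n];
        for (int k = 0; k < n; k++) {
            for (int t = 0; t < n; t++) {
                double angle = -2.0 * Math.PI * k * t / n;
                re[k] += signal[t] * Math.cos(angle);
                im[k] += signal[t] * Math.sin(angle);
            }
        }
        return new double[][] {re, im};
    }

    /** Inverse transform; only the real part is returned because the input frame was real. */
    public static double[] inverseDft(double[] re, double[] im) {
        int n = re.length;
        double[] signal = new double[n];
        for (int t = 0; t < n; t++) {
            double sum = 0.0;
            for (int k = 0; k < n; k++) {
                double angle = 2.0 * Math.PI * k * t / n;
                sum += re[k] * Math.cos(angle) - im[k] * Math.sin(angle);
            }
            signal[t] = sum / n;
        }
        return signal;
    }

    /** Magnitude of every frequency bin of a frame, e.g. of a speech-free noise frame. */
    public static double[] magnitude(double[] frame) {
        double[][] spectrum = dft(frame);
        double[] mag = new double[frame.length];
        for (int k = 0; k < mag.length; k++) {
            mag[k] = Math.hypot(spectrum[0][k], spectrum[1][k]);
        }
        return mag;
    }

    /**
     * Subtracts the estimated noise magnitude from each bin of the noisy frame,
     * clamps negative results to zero and keeps the phase of the noisy signal.
     * Both arrays are assumed to have the same length.
     */
    public static double[] denoiseFrame(double[] noisyFrame, double[] noiseMagnitude) {
        double[][] spectrum = dft(noisyFrame);
        double[] re = spectrum[0];
        double[] im = spectrum[1];
        for (int k = 0; k < re.length; k++) {
            double mag = Math.hypot(re[k], im[k]);
            double cleaned = Math.max(mag - noiseMagnitude[k], 0.0);
            double scale = mag > 0.0 ? cleaned / mag : 0.0;
            re[k] *= scale;
            im[k] *= scale;
        }
        return inverseDft(re, im);
    }

    public static void main(String[] args) {
        // Made-up sample values: a short noise-only frame and a noisy frame.
        double[] noiseOnly = {0.05, -0.04, 0.03, -0.05, 0.04, -0.03, 0.05, -0.04};
        double[] noisy = {0.05, 0.66, 1.03, 0.66, 0.04, -0.74, -0.95, -0.75};
        double[] cleaned = denoiseFrame(noisy, magnitude(noiseOnly));
        System.out.println(java.util.Arrays.toString(cleaned));
    }
}
```

A practical implementation would process the signal in overlapping windowed frames and use an
FFT; clamping the subtracted magnitudes at zero is also one of the reasons isolated noisy
fragments can remain in the output.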

The first program was not efficient. The output it produced contained many isolated noisy
fragments. That first noise filtering system was built using Wiener filtering, which is
inefficient for filtering noise from continuous speech. Spectral subtraction is more efficient
compared with the other algorithms, and it can be applied to any digital signal.

Once I had completed the noise filtering, I realized that speech recognition could also have
been done more easily in a development environment like MATLAB.

The search functionality was the tricky part. The system has to ensure that users can search
by content. Many text searching algorithms are available, including the Knuth-Morris-Pratt
algorithm, the Rabin-Karp algorithm and the Boyer-Moore fast string searching algorithm.

These algorithms can be used for an efficient text search. The system's search, however, is
meant to behave more like a search engine. Google has its own search hierarchies, but in a
research project it is not possible to build a search of that standard within the time
constraints.
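As an illustration of the string searching algorithms named above, the following minimal sketch
implements Knuth-Morris-Pratt matching in Java. It is not the code of the project's search
component, and this text does not state which algorithm that component uses; the class and
method names, including the case-insensitive containsPhrase helper, are hypothetical.

```java
/** Knuth-Morris-Pratt substring search (illustrative sketch only). */
class KmpSearch {

    /**
     * failure[i] is the length of the longest proper prefix of pattern[0..i]
     * that is also a suffix of it.
     */
    public static int[] buildFailureTable(String pattern) {
        int[] failure = new int[pattern.length()];
        int matched = 0;
        for (int i = 1; i < pattern.length(); i++) {
            while (matched > 0 && pattern.charAt(i) != pattern.charAt(matched)) {
                matched = failure[matched - 1];
            }
            if (pattern.charAt(i) == pattern.charAt(matched)) {
                matched++;
            }
            failure[i] = matched;
        }
        return failure;
    }

    /** Returns the index of the first occurrence of pattern in text, or -1 if there is none. */
    public static int indexOf(String text, String pattern) {
        if (pattern.isEmpty()) {
            return 0;
        }
        int[] failure = buildFailureTable(pattern);
        int matched = 0;
        for (int i = 0; i < text.length(); i++) {
            // On a mismatch, fall back in the pattern instead of re-reading the text.
            while (matched > 0 && text.charAt(i) != pattern.charAt(matched)) {
                matched = failure[matched - 1];
            }
            if (text.charAt(i) == pattern.charAt(matched)) {
                matched++;
            }
            if (matched == pattern.length()) {
                return i - matched + 1;
            }
        }
        return -1;
    }

    /** Case-insensitive check of whether a recognized transcript contains the query text. */
    public static boolean containsPhrase(String transcript, String query) {
        return indexOf(transcript.toLowerCase(), query.toLowerCase()) >= 0;
    }

    public static void main(String[] args) {
        String transcript = "I have a dream today";
        System.out.println(containsPhrase(transcript, "a Dream"));
        System.out.println(containsPhrase(transcript, "nightmare"));
    }
}
```

Because the failure table lets the search continue without moving backwards in the text, the
running time grows linearly with the length of the transcript plus the length of the query.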



After completing the project I realized that the best languages for this kind of work are
technical languages. While trying to develop the system in Java I ran into many technical
difficulties: there was no proper documentation, technical support was minimal, and I had to
use many third-party components.

From the project I gained knowledge of digital signal processing, which is not easily acquired
by a software engineering student at APIIT. Another interesting area was speech recognition.
On some occasions certain tasks seemed very hard to accomplish, but while doing the recognition
in Java I was able to take certain parts as far as possible.

As a final thought, developing speech recognition systems needs a vast knowledge of
programming. You need to know about digital signal processing, noise filtering mechanisms,
XML, and how to configure a speech recognition engine. It is also advisable to know about
integration and algorithms.



7.2 Suggestions for future enhancements

At the beginning of this project the sole purpose was to build a speech recognition system
able to identify human speech in the .wav digital format and convert it to text, maintain a
rich database, and grant users the ability to search speeches by their content.

While doing the project I encountered difficulties in the speech recognition process because
it is such a vast area. The system I built for my research project identifies the words most
commonly used in everyday speech.

As future enhancements, many modifications can be made to the system.

Improve efficiency via Neural Network capabilities

By introducing neural network capabilities into the system, we can improve the time efficiency
of its various algorithms. Making the system more efficient would in turn allow it to be
integrated into mobile devices.

Expand the system scope for many languages

The system built for the research project can only identify English words. It cannot identify
other international languages such as Spanish, French, Arabic or Russian. By introducing
additional functionality to the system we can make it a universal product.



Enhanced noise filtering capabilities

By enhancing the noise filtering capabilities we can build one system rather than two, which
identifies the noise factor by itself, reduces it and then performs speech recognition.
Currently Java does not have a proper noise filtering API for digital signal processing. If we
convert the entire system to C or C++, we can come up with a feature-rich application.

Text to Speech Synthesis

By improving the above functionality of the system we can maintain a rich database. If the
system could also translate speeches and lectures between languages, it would be helpful for
users throughout the globe.

Integrate to web

If we could put the system on the web as an online speech recognition and search portal, it
would be helpful for many users. As a web component or an add-on, the system would deliver a
useful service to users throughout the globe.

The search functionality included in the system is only a replica of a search engine. As a
future enhancement we can build an actual speech search engine in collaboration with various
free online music and speech streaming servers.



CHAPTER 8

8.0 Conclusion

This project was a great experience for me. As a research project, I firmly believe it is a
unique one for the institute. By doing this project I realized one important aspect of life:
it is easy to think and much harder to do.

When I started the project at the research stage, I thought I would be able to complete it in
a short time. But when I moved on to the coding stage, I found that the development
environments were still not ready to support the research done around the world.

There were still no efficient algorithms or APIs to meet the requirements of the problems.
With the prevailing resources I was able to build a system that meets the requirements in a
reasonable manner. A computer is a dumb device; humans have to program it.

Programming is an art, and programmers vary in how they approach research. Throughout my three
years of academic life at APIIT I did not have the chance to develop a technical project, so
this was an entirely new experience for me.

Given the project and the available resources, I think I have come up with a fairly good
working application that meets the requirements I set out in the project. It recognizes speech
from a file in WAV format rather than from a microphone, which is something innovative. Noise
filtering is still a developing area of computing. I have done a reasonable amount of noise
filtering, the best I could with the available resources. Noise filtering is another vast
research area that is still growing. As a final thought, I think I have done the best I could
to complete the project.






APPENDIX A User manual

The Ultimate Speech Search system comprises four subsystems:

Speech recognition system for numbers

Speech recognition system for words

Noise filtering system

Speech search web browser

To run the speech recognition system, open NetBeans and select "USS" as the project name.



After running the project, the user sees the initial GUI. The Open button at the top can be
used to open a speech file in .wav format. This GUI subsystem supports the identification of
digits; the identification of words is not supported by this subsystem.

Pressing the Open button allows the user to select a digit file for recognition.



Once the user has selected a file and pressed Open, he or she can click the Start button. The
text output then appears in the text output area.

The name of the opened file is shown just below the "Open speech file" label.



The word recognition system needs a large amount of memory in the Java virtual machine. If we
try to run it in NetBeans or Eclipse, it will crash the virtual machine. The best way to run it
is therefore from the command prompt, running the JAR file with the memory allocated manually.

Type java -mx256m -jar wordrec.jar. The -mx256m option allocates additional memory to the
virtual machine, and the latter part runs the word recognition program.

The output varies, and 100% correctness cannot be guaranteed.



To run the noise filtering system you need MATLAB version 6 or later. Open both the denoise2
script file and the denoise.m script file. Then type the name of the file on which you want to
perform noise filtering.



wavread('input.wav') reads the input file for the system. If you want to specify the output
file name, do so as shown below.

wavwrite(output, FS, 'output.wav') is the place to define the output file and format.



To run the search engine, open NetBeans and run SpeechSearchEng.

The user can type the content of the speech he or she requires. If that speech is available the
result is shown as true; otherwise it is shown as false.



APPENDIX B

Gantt chart
