Knowledge Base Approach for Spoken Digit Recognition
Vijetha Periyavaram
Speech Recognition Systems
Provides a vehicle for communication between people and machines
Talking with machines is the product of more than 30 years of research in statistics, physics, linguistics, and computer science.
Characters in science fiction stories have conversed with robots and computers for a long time.
•We may have exchanged a few words with a computer, car, or cell phone when it was not working properly
•Now, thanks to speech recognition systems, these machines can understand us and respond
Advantages
As a result of speech recognition systems, you can:
•Ask your car for directions
•Dial your mobile phone without touching it
•Dictate text instead of typing it on the keyboard
•Give commands to a personal organizer (e.g., shut down, pop up the Start menu)
Main Concept of Speech Recognition Systems
Speech recognition systems first break spoken language down into phonemes, the smallest units of sound that distinguish one word from another
For example:
/w/ as in "we", "quite", "once"
/ch/ as in "much", "nature", "match"
/ou/ as in "no", "boat", "low"
/au/ as in "haul", "bought", "draw"
There are almost 40 phonemes
•The system converts the individual sounds into digitized sound waves, which it matches against a built-in dictionary
•It then settles on the correct choice through a series of algorithms (mathematical models) that narrow the possibilities down to those that make the most sense
Proposed Method
The method proposed here is a knowledge base approach to spoken digit recognition.
The digitized speech data is processed with the MATLAB DSP toolbox.
Problem definition
To develop a system that identifies an isolated spoken digit using knowledge built up by analyzing recordings of the digits
The analysis is based on the following features, both of which can be extracted with the MATLAB DSP toolbox:
– Energy envelope: the energy of the wave plotted over time
– Zero crossing rate: the number of times the amplitude of the sound wave changes sign within a sound sample
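The zero crossing count defined above can be sketched in a few lines. This is Python rather than the MATLAB used in the project; skipping exact-zero samples and dividing by the number of sample pairs in `zero_crossing_rate` are assumptions of this sketch, not details from the slides.

```python
def zero_crossing_count(samples):
    """Count how many times the signal changes sign.

    Exact zeros are skipped, so [1, 0, -1] counts as one crossing, not two.
    """
    count = 0
    prev_sign = 0  # sign of the last non-zero sample (0 = none seen yet)
    for x in samples:
        if x == 0:
            continue
        sign = 1 if x > 0 else -1
        if prev_sign != 0 and sign != prev_sign:
            count += 1
        prev_sign = sign
    return count


def zero_crossing_rate(samples):
    """Crossings per adjacent sample pair, so different lengths compare fairly."""
    if len(samples) < 2:
        return 0.0
    return zero_crossing_count(samples) / (len(samples) - 1)


print(zero_crossing_count([1, -1, 2, -2, 3]))  # 4
```

Fricative sounds such as /s/ in "six" tend to produce many more crossings than voiced vowels, which is why this single number helps separate digits.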
Proposed solution
•Utterances of different people are studied
•A knowledge base for the digits is created from this study
•Each digit has unique characteristics irrespective of the speaker's nationality, because the method concentrates mainly on the phonemes
•The digits are recognized by analyzing a few features of these spoken words
•The output is printed on the screen
Scope of the system
•Isolated-word vocabulary
•Unlimited speaker population, unrestricted by age or sex
•Computer-room speaking environment
•Transmission over a high-quality microphone
•No prior training
•Single-word format, with pauses between spoken inputs
Technique used
•The speech signal is sampled at a fixed frequency
•The end points of each isolated word are detected
•The data are time-normalized so that every digit has the same number of samples
•The zero crossing rate and energy envelope are determined for each segment
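Computing a feature "for each segment" can be sketched as splitting the word into equal-length pieces and measuring each one. The slides do not say how many segments are used, so `num_segments` (default 4) is a hypothetical parameter, and equal-length splitting is an assumption.

```python
def split_into_segments(samples, num_segments):
    """Split a signal into num_segments contiguous, nearly equal parts."""
    n = len(samples)
    bounds = [round(i * n / num_segments) for i in range(num_segments + 1)]
    return [samples[bounds[i]:bounds[i + 1]] for i in range(num_segments)]


def count_sign_changes(segment):
    """Zero crossing count for one segment (exact zeros are skipped)."""
    count, prev = 0, 0
    for x in segment:
        if x == 0:
            continue
        s = 1 if x > 0 else -1
        if prev and s != prev:
            count += 1
        prev = s
    return count


def segment_zcr(samples, num_segments=4):
    """One zero crossing count per segment; crossings at boundaries are ignored."""
    return [count_sign_changes(seg) for seg in split_into_segments(samples, num_segments)]


signal = [1, -1, 1, -1, 1, -1, 1, -1] + [1] * 8
print(segment_zcr(signal, 4))  # [3, 3, 0, 0]
```

Because every word is later resampled to the same length, equal segments line up with roughly the same part of each utterance.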
Data Acquisition
The user's voice is recorded with multimedia sound recording equipment
The data are digitized
The signal is sampled at the chosen rate
The recorded speech is plotted
Recorded wave for digit zero
Filtering the recorded data
Required because of environmental noise, system noise, and the microphone's inherent noise
An elliptic band-pass filter keeps only the human voice frequency range
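A minimal band-pass sketch in plain Python follows. It deliberately swaps in a second-order biquad band-pass (the "Audio EQ Cookbook" design with 0 dB peak gain) for the elliptic filter described above, since a true elliptic design, such as one from MATLAB's `ellip`, has a much sharper cutoff and needs a filter-design routine. The 400–3200 Hz band is the one given in the flow chart; the 8000 Hz sampling rate in the test is an assumption.

```python
import math


def bandpass_biquad(samples, fs, f_low, f_high):
    """Second-order band-pass filter, a simple stand-in for an elliptic design.

    The band edges are only approximate: Q is derived from the bandwidth
    with the analog relation Q = f0 / bandwidth.
    """
    f0 = math.sqrt(f_low * f_high)      # geometric centre of the band
    q = f0 / (f_high - f_low)           # analog approximation of Q
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)

    # Cookbook coefficients, normalised by a0.
    a0 = 1 + alpha
    b0, b1, b2 = alpha / a0, 0.0, -alpha / a0
    a1, a2 = -2 * math.cos(w0) / a0, (1 - alpha) / a0

    out = []
    x1 = x2 = y1 = y2 = 0.0  # previous two inputs and outputs
    for x in samples:
        y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        out.append(y)
        x2, x1 = x1, x
        y2, y1 = y1, y
    return out
```

The filter has zeros at DC and at the Nyquist frequency, so steady hum and very high-frequency hiss are removed while the centre of the voice band passes at unit gain.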
Output of filter for digit zero
Plotting energy envelope
The energy of the spoken digit is plotted and then smoothed using the moving-point average method:
– A 200-point moving average is chosen
– Each sample's amplitude is replaced with the average of 200 consecutive samples
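The energy and smoothing steps can be sketched as below. Taking energy as the squared amplitude of each sample, and averaging only the available samples at the start of the signal, are assumptions; the slides do not define either detail.

```python
def energy(samples):
    """Instantaneous energy, taken here as the square of each sample."""
    return [x * x for x in samples]


def moving_average(values, window=200):
    """Replace each value with the mean of `window` consecutive values.

    A trailing window with a running sum keeps this O(n). For the first
    window - 1 points, only the values seen so far are averaged.
    """
    out = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]   # drop the value leaving the window
        out.append(running / min(i + 1, window))
    return out


# envelope = moving_average(energy(filtered_signal), window=200)
```

A 200-point window at 8000 samples per second spans 25 ms, long enough to flatten pitch-period ripple but short enough to keep syllable-level peaks.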
Energy envelope for digit zero
Location of start and end points
The start and end points of the spoken digit's energy envelope are identified
Criterion: parts of the envelope below 10% of its maximum value are discarded
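The 10% criterion can be sketched as keeping everything between the first and last envelope sample that reaches 10% of the peak. How the original system treats quiet dips in the middle of a word is not stated; this sketch simply keeps them.

```python
def find_endpoints(envelope, fraction=0.10):
    """Return (start, end) indices of the region at or above fraction * max."""
    if not envelope:
        raise ValueError("empty envelope")
    threshold = fraction * max(envelope)
    above = [i for i, v in enumerate(envelope) if v >= threshold]
    return above[0], above[-1]


def trim(envelope, fraction=0.10):
    """Cut off the leading and trailing silence found by find_endpoints."""
    start, end = find_endpoints(envelope, fraction)
    return envelope[start:end + 1]


print(find_endpoints([0.0, 0.05, 0.2, 1.0, 0.5, 0.08, 0.0]))  # (2, 4)
```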
Actual message for digit zero
Time normalization of data
The envelope is resampled so that every spoken word contains exactly 6000 samples
The envelope is smoothed using the moving-point average method
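Time normalization to 6000 samples can be sketched with linear interpolation. The interpolation method is an assumption: MATLAB's resampling functions apply filtering and may give slightly different values.

```python
def resample_linear(values, target_len=6000):
    """Stretch or compress `values` to exactly target_len points."""
    n = len(values)
    if n == 0 or target_len < 1:
        raise ValueError("need a non-empty input and target_len >= 1")
    if n == 1 or target_len == 1:
        return [float(values[0])] * target_len
    step = (n - 1) / (target_len - 1)   # input distance between output points
    out = []
    for k in range(target_len):
        pos = k * step
        i = min(int(pos), n - 2)        # index of the left neighbour
        frac = pos - i
        out.append(values[i] * (1 - frac) + values[i + 1] * frac)
    return out


print(resample_linear([0, 1, 2, 3, 4], 9))
```

With every digit at the same length, a peak position can be compared directly across speakers who talk at different speeds.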
Resampled envelope for digit zero
Various waveforms for digit seven
FLOW CHART
Start
Wave Recording (8000 samples)
Filtering (band pass of 400–3200 Hz)
Energy Calculation
Smoothing (moving point average filter of 200 points)
Determination of Start and End Points
Calculation of Zero Crossing Rate
Resampling to 6000 Samples
Calculation of Number of Peaks and Peak Positions
Classification Algorithm
Stop
Setting up the knowledge base
•Number of peaks in the energy envelope of the spoken digit
•Energy peak levels
•Energy peak positions
•Zero crossing rate for each segment
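The peak count, levels, and positions listed above can be extracted from the smoothed envelope roughly as follows. The `min_fraction` threshold, used to ignore small ripples, is a hypothetical parameter; the slides do not say how minor bumps are rejected.

```python
def find_peaks(envelope, min_fraction=0.2):
    """Return (positions, levels) of local maxima in a smoothed envelope.

    A peak rises above its left neighbour and is followed by a lower value;
    flat tops are reported once, at their middle. Peaks below
    min_fraction * max(envelope) are ignored.
    """
    if not envelope:
        return [], []
    n = len(envelope)
    threshold = min_fraction * max(envelope)
    positions = []
    i = 1
    while i < n - 1:
        if envelope[i] > envelope[i - 1]:
            j = i
            while j + 1 < n and envelope[j + 1] == envelope[i]:
                j += 1                      # walk across a flat top
            if j + 1 < n and envelope[j + 1] < envelope[i] and envelope[i] >= threshold:
                positions.append((i + j) // 2)
            i = j + 1
        else:
            i += 1
    levels = [envelope[p] for p in positions]
    return positions, levels


env = [0, 2, 5, 2, 0, 3, 8, 3, 0, 1, 0]
print(find_peaks(env))  # ([2, 6], [5, 8])
```

The small bump at index 9 falls below 20% of the maximum and is dropped, so this envelope counts as a two-peak word.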
Classification
First sweep: count the number of peaks in the energy envelope
– Single peak
– Two peaks
– Three peaks
Second sweep: peak positions
Third sweep: zero crossing rate
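The three sweeps can be sketched as successive filters over the knowledge base. The entry format, the position tolerance `pos_tol`, and the nearest-profile rule in the third sweep are assumptions, and `TOY_KB` holds made-up placeholder entries labelled X, Y, and Z, not real digit profiles.

```python
def classify(features, knowledge_base, pos_tol=0.1):
    """Three-sweep lookup against a knowledge base.

    Each entry (and `features`) is a dict with 'num_peaks' (int),
    'positions' (peak positions as fractions of word length, 0..1) and
    'zcr' (per-segment zero crossing counts); entries also carry a 'label'.
    Returns the best label, or None if no entry survives sweeps 1 and 2.
    """
    # Sweep 1: the number of peaks must match exactly.
    candidates = [e for e in knowledge_base if e["num_peaks"] == features["num_peaks"]]
    # Sweep 2: every peak position must lie within pos_tol of the entry's.
    candidates = [
        e for e in candidates
        if all(abs(p - q) <= pos_tol for p, q in zip(e["positions"], features["positions"]))
    ]
    if not candidates:
        return None
    # Sweep 3: pick the closest zero crossing profile (sum of absolute differences).
    def zcr_distance(e):
        return sum(abs(a - b) for a, b in zip(e["zcr"], features["zcr"]))
    return min(candidates, key=zcr_distance)["label"]


# Hypothetical placeholder entries, for illustration only.
TOY_KB = [
    {"label": "X", "num_peaks": 1, "positions": [0.5], "zcr": [10, 40, 10]},
    {"label": "Y", "num_peaks": 2, "positions": [0.3, 0.7], "zcr": [30, 5, 30]},
    {"label": "Z", "num_peaks": 2, "positions": [0.3, 0.7], "zcr": [5, 5, 60]},
]

print(classify({"num_peaks": 2, "positions": [0.32, 0.68], "zcr": [28, 6, 25]}, TOY_KB))  # Y
```

Ordering the sweeps from coarse to fine means the cheap peak count discards most entries before the more detailed comparisons run.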
Results
The system was tested on 100 different human voice signals, with a success rate of 89%
The final output was displayed on the monitor as well as on the LCD screen
The response time was 7 seconds.
A Few More Applications
•Speaker identification systems
•Security systems
•Robot control
•Bank transactions
•Aircraft control systems
•Stock price quotation systems