Automatic Speech Recognition
![Page 1: Automatic Speech Recognion](https://reader030.vdocument.in/reader030/viewer/2022032421/55a778831a28ab580a8b4810/html5/thumbnails/1.jpg)
![Page 2: Automatic Speech Recognion](https://reader030.vdocument.in/reader030/viewer/2022032421/55a778831a28ab580a8b4810/html5/thumbnails/2.jpg)
Types of ASR
Approaches to ASR
What Is ASR (Automatic Speech Recognition)?
What Is Voice Recognition?
What Is Voice?
Process of Voice Recognition
Why Voices Are Different
Components of Sound
How Speech Recognition Works
![Page 3: Automatic Speech Recognion](https://reader030.vdocument.in/reader030/viewer/2022032421/55a778831a28ab580a8b4810/html5/thumbnails/3.jpg)
Applications of Speech Processing
Process of Speech Production
Classification of Speech Sounds
Approaches to Speech Recognition
![Page 4: Automatic Speech Recognion](https://reader030.vdocument.in/reader030/viewer/2022032421/55a778831a28ab580a8b4810/html5/thumbnails/4.jpg)
The voice consists of sound made by a
human being using the vocal folds for
talking, singing, laughing, crying,
screaming, etc.
![Page 5: Automatic Speech Recognion](https://reader030.vdocument.in/reader030/viewer/2022032421/55a778831a28ab580a8b4810/html5/thumbnails/5.jpg)
Voice recognition is the process of converting voice into
electrical signals.
The signals are then transformed into a CODING
PATTERN.
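The slide does not say what the "coding pattern" looks like. One common way to turn an electrical voice signal into codes is pulse-code modulation (PCM): sample the signal at fixed intervals and round each sample to an integer. The Python sketch below is a minimal illustration under stated assumptions: a pure sine tone stands in for the microphone signal, and the 8 kHz sample rate, 16-bit depth, and the name `sample_and_quantize` are illustrative choices, not something the slides specify.

```python
import math


def sample_and_quantize(freq_hz, duration_s, sample_rate=8000, bits=16):
    """Sample a pure tone and quantize each sample to a signed integer PCM code.

    The sine tone is a stand-in (assumption) for the electrical signal a
    microphone would produce from a voice.
    """
    max_code = 2 ** (bits - 1) - 1                   # 32767 for 16-bit audio
    n_samples = int(round(duration_s * sample_rate))
    codes = []
    for n in range(n_samples):
        t = n / sample_rate                          # time of the n-th sample in seconds
        value = math.sin(2 * math.pi * freq_hz * t)  # analog value in [-1, 1]
        codes.append(round(value * max_code))        # quantize to the nearest integer code
    return codes


if __name__ == "__main__":
    pcm = sample_and_quantize(440.0, 0.01)           # 10 ms of a 440 Hz tone
    print(len(pcm), pcm[:5])
```

Raising `sample_rate` captures higher frequencies and raising `bits` reduces rounding error, at the cost of more data per second.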
![Page 6: Automatic Speech Recognion](https://reader030.vdocument.in/reader030/viewer/2022032421/55a778831a28ab580a8b4810/html5/thumbnails/6.jpg)
The first ASR device was used in 1952 and recognized single digits spoken by a
user.
![Page 7: Automatic Speech Recognion](https://reader030.vdocument.in/reader030/viewer/2022032421/55a778831a28ab580a8b4810/html5/thumbnails/7.jpg)
TEMPLATE MATCHING
Template matching is the simplest technique and has the highest accuracy when used properly, but it also suffers from the most limitations.
FEATURE ANALYSIS
A more general form of voice recognition is available through feature analysis, and this technique usually leads to "speaker-independent" voice recognition.
![Page 8: Automatic Speech Recognion](https://reader030.vdocument.in/reader030/viewer/2022032421/55a778831a28ab580a8b4810/html5/thumbnails/8.jpg)
Template Matching
•It is SPEAKER DEPENDENT.
•It matches the voice against templates that are already saved.
•The system must be trained before use.
•The user must speak the same words that are available in the
templates.
•Recognition accuracy can be about 98
percent.
Feature Analysis
•It is SPEAKER INDEPENDENT.
•The input voice is processed first,
using LPC (Linear Predictive Coding).
•The system attempts to find similarities between the expected
input and the digitized input.
•Recognition accuracy for
speaker-independent systems is
somewhat lower than for
speaker-dependent systems, usually
between 90 and 95 percent.
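Feature analysis, as described here, processes the voice with LPC (Linear Predictive Coding). A minimal Python sketch, assuming the common autocorrelation method with the Levinson-Durbin recursion (the slides do not say which LPC variant is used), shows how one frame of samples is reduced to a few predictor coefficients that can serve as features. The function names are illustrative, and the windowing and pre-emphasis a real front end would apply are omitted.

```python
def autocorrelation(frame, max_lag):
    """r[lag] = sum of frame[i] * frame[i + lag] over the frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def lpc(frame, order):
    """LPC via the autocorrelation method and the Levinson-Durbin recursion.

    Returns coefficients a[1..order] such that frame[n] is approximated by
    sum(a[k] * frame[n - k] for k in 1..order).
    """
    r = autocorrelation(frame, order)
    a = [0.0] * (order + 1)          # a[0] is unused
    error = r[0]                     # prediction error energy
    if error == 0:
        return a[1:]                 # silent frame: nothing to predict
    for i in range(1, order + 1):
        # Reflection coefficient for step i.
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / error
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        error *= 1.0 - k * k
        if error <= 0:               # perfectly predictable frame: stop early
            break
    return a[1:]


if __name__ == "__main__":
    # Toy frame generated by x[n] = 0.9 * x[n-1], so an order-1 predictor
    # should recover a coefficient close to 0.9.
    frame = [0.9 ** n for n in range(200)]
    print(lpc(frame, 1))
```

In practice the order is small (often around 8 to 16 for speech), so each frame of a few hundred samples is summarized by a short coefficient vector.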
![Page 9: Automatic Speech Recognion](https://reader030.vdocument.in/reader030/viewer/2022032421/55a778831a28ab580a8b4810/html5/thumbnails/9.jpg)
Speech production: Text → Phonemes → Articulatory Motions → Speak/Say Something → Acoustic Waveform
Speech recognition: Acoustic Waveform → Spectrum Analysis → Feature Extraction → Coding → Phonemes/Words/Sentences → Semantics
Figure labels: Discrete Input, Continuous Input
![Page 10: Automatic Speech Recognion](https://reader030.vdocument.in/reader030/viewer/2022032421/55a778831a28ab580a8b4810/html5/thumbnails/10.jpg)
Vocal Tract
Consists of the laryngeal pharynx, oral
pharynx, oral cavity, nasal cavity, and
nasal pharynx.
Spectrum Analysis
MFCCs (Mel-Frequency Cepstral Coefficients) are used to extract voice
features. DTW (Dynamic Time Warping) is used to select the pattern
that matches the database (in MATLAB).
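The matching step above (DTW against stored patterns) can be sketched in Python rather than MATLAB. This is a minimal illustration, assuming the feature vectors have already been extracted: the 2-D vectors and the "yes"/"no" templates are hypothetical stand-ins for real MFCC frames, and `dtw_distance` / `best_template` are illustrative names. It needs Python 3.8+ for `math.dist`.

```python
import math


def dtw_distance(seq_a, seq_b):
    """Dynamic time warping distance between two sequences of feature vectors.

    cost[i][j] is the cheapest alignment of the first i frames of seq_a with
    the first j frames of seq_b, so a word spoken slowly or quickly can still
    line up with its template.
    """
    inf = float("inf")
    n, m = len(seq_a), len(seq_b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(seq_a[i - 1], seq_b[j - 1])   # Euclidean frame distance
            cost[i][j] = local + min(cost[i - 1][j],        # stretch seq_b
                                     cost[i][j - 1],        # stretch seq_a
                                     cost[i - 1][j - 1])    # advance both
    return cost[n][m]


def best_template(features, templates):
    """Return the name of the stored template with the smallest DTW distance."""
    return min(templates, key=lambda name: dtw_distance(features, templates[name]))


if __name__ == "__main__":
    # Hypothetical 2-D feature vectors standing in for real MFCC frames.
    templates = {
        "yes": [(1.0, 0.0), (2.0, 0.0), (3.0, 0.0)],
        "no": [(0.0, 3.0), (0.0, 2.0), (0.0, 1.0)],
    }
    utterance = [(1.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (3.0, 0.0)]
    print(best_template(utterance, templates))
```

The utterance is a slowed-down "yes" (five frames against a three-frame template), yet its DTW distance to "yes" is zero because the warping path repeats template frames instead of penalizing the length difference.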
![Page 11: Automatic Speech Recognion](https://reader030.vdocument.in/reader030/viewer/2022032421/55a778831a28ab580a8b4810/html5/thumbnails/11.jpg)
Acoustic Model
Provides the acoustic sounds of a language and can be trained to recognize the
characteristics of a particular user's speech patterns and
acoustic environment.
![Page 12: Automatic Speech Recognion](https://reader030.vdocument.in/reader030/viewer/2022032421/55a778831a28ab580a8b4810/html5/thumbnails/12.jpg)
To make pattern recognition possible, the PCM (pulse-code modulation) signal
is transformed into the frequency domain.
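Moving PCM samples into the frequency domain can be sketched with a discrete Fourier transform. This minimal Python sketch uses a naive O(N²) DFT for clarity (real systems use an FFT, for example `numpy.fft.rfft`); the 8 kHz rate, the 256-sample frame, the 1 kHz test tone, and the function names are illustrative assumptions.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Naive O(N^2) discrete Fourier transform; returns |X[k]| for k = 0..N/2."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        x_k = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append(abs(x_k))
    return spectrum


def dominant_frequency(frame, sample_rate):
    """Frequency in Hz of the strongest spectral bin, ignoring the DC bin."""
    spectrum = magnitude_spectrum(frame)
    peak_bin = max(range(1, len(spectrum)), key=spectrum.__getitem__)
    return peak_bin * sample_rate / len(frame)   # bin index -> Hz


if __name__ == "__main__":
    rate = 8000
    # 256 PCM samples of a 1 kHz tone (a hypothetical test frame).
    frame = [round(10000 * math.sin(2 * math.pi * 1000 * t / rate)) for t in range(256)]
    print(dominant_frequency(frame, rate))
```

Each bin is `sample_rate / len(frame)` Hz wide (31.25 Hz here), so the frame length trades frequency resolution against time resolution.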
![Page 13: Automatic Speech Recognion](https://reader030.vdocument.in/reader030/viewer/2022032421/55a778831a28ab580a8b4810/html5/thumbnails/13.jpg)
Speaker Dependent
Speaker Independent
Discrete Speech Recognition
Continuous Speech Recognition
Natural Language
![Page 14: Automatic Speech Recognion](https://reader030.vdocument.in/reader030/viewer/2022032421/55a778831a28ab580a8b4810/html5/thumbnails/14.jpg)
Pitch
Timbre
Harmonics
Loudness
Rhythm
Attack
Sustain
Decay
Speed
![Page 15: Automatic Speech Recognion](https://reader030.vdocument.in/reader030/viewer/2022032421/55a778831a28ab580a8b4810/html5/thumbnails/15.jpg)
![Page 16: Automatic Speech Recognion](https://reader030.vdocument.in/reader030/viewer/2022032421/55a778831a28ab580a8b4810/html5/thumbnails/16.jpg)
COMPRESSION
A region in which particles are crowded
together; compressions appear as upward curves in
the line.
RAREFACTION
A region in which particles are spread apart;
rarefactions appear as downward curves in the line.
WAVELENGTH
This is the distance from the crest of one
wave to the crest of the next.
![Page 17: Automatic Speech Recognion](https://reader030.vdocument.in/reader030/viewer/2022032421/55a778831a28ab580a8b4810/html5/thumbnails/17.jpg)
FREQUENCY
This is the number of waves that
pass a point each second, measured in hertz (Hz).
AMPLITUDE
This is the size of the wave's displacement; it reflects the amount
of energy in a sound wave.
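Frequency and wavelength are linked by the speed of sound: wavelength = speed / frequency, and the period of one wave is 1 / frequency. The short Python sketch below applies those relations; the speed of sound in air is assumed to be about 343 m/s (dry air near 20 °C), and the function names are illustrative.

```python
SPEED_OF_SOUND_AIR = 343.0  # m/s; approximate value for dry air near 20 °C (assumption)


def wavelength_m(frequency_hz, speed_m_s=SPEED_OF_SOUND_AIR):
    """Wavelength in metres: the speed of the wave divided by its frequency."""
    if frequency_hz <= 0:
        raise ValueError("frequency must be positive")
    return speed_m_s / frequency_hz


def period_s(frequency_hz):
    """Time for one complete wave to pass a point, in seconds."""
    if frequency_hz <= 0:
        raise ValueError("frequency must be positive")
    return 1.0 / frequency_hz


if __name__ == "__main__":
    for f in (100.0, 1000.0):   # a low-pitched and a higher-pitched tone
        print(f, "Hz ->", round(wavelength_m(f), 3), "m, period", period_s(f), "s")
```

Higher frequencies give shorter wavelengths: at the assumed speed, a 100 Hz tone is about 3.43 m long and a 1000 Hz tone about 0.343 m.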
![Page 18: Automatic Speech Recognion](https://reader030.vdocument.in/reader030/viewer/2022032421/55a778831a28ab580a8b4810/html5/thumbnails/18.jpg)
Figure: High-Frequency Sound Wave / Low-Frequency Sound Wave
PITCH
This is how high or low a sound seems.
A bird makes a high pitch.
A lion makes a low pitch.
![Page 19: Automatic Speech Recognion](https://reader030.vdocument.in/reader030/viewer/2022032421/55a778831a28ab580a8b4810/html5/thumbnails/19.jpg)
Voices differ because of
INTENSITY (depends on amplitude),
PITCH (frequency), and
TONE (pleasant or unpleasant).
![Page 20: Automatic Speech Recognion](https://reader030.vdocument.in/reader030/viewer/2022032421/55a778831a28ab580a8b4810/html5/thumbnails/20.jpg)
Divide the sound wave into evenly spaced blocks.
Process each block for important characteristics, such as strength across various frequency ranges, number of zero crossings, and total energy.
Using this characteristic vector, attempt to associate each block with a phone, which is the most basic unit of speech, producing a string of phones.
Find the word whose model is the most likely match to the string of phones which was produced.
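The first two steps (blocking the waveform and measuring each block) can be sketched in a few lines of Python. This is a minimal illustration under assumptions: 200-sample blocks with an 80-sample hop (25 ms and 10 ms at an assumed 8 kHz rate) are typical but illustrative values, only total energy and zero crossings are computed, and the per-band strengths and phone matching are left out for brevity.

```python
def frames(signal, frame_len, hop):
    """Split a signal into evenly spaced blocks; a trailing partial block is dropped."""
    return [signal[start:start + frame_len]
            for start in range(0, len(signal) - frame_len + 1, hop)]


def zero_crossings(frame):
    """Number of times consecutive samples change sign."""
    return sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))


def energy(frame):
    """Total energy of the block: the sum of squared samples."""
    return sum(x * x for x in frame)


def feature_vectors(signal, frame_len=200, hop=80):
    """One (energy, zero_crossings) characteristic vector per block."""
    return [(energy(f), zero_crossings(f)) for f in frames(signal, frame_len, hop)]


if __name__ == "__main__":
    signal = [1, -1] * 100 + [0] * 200   # a buzzy block followed by silence
    for vector in feature_vectors(signal):
        print(vector)
```

Because the hop is shorter than the block, neighbouring blocks overlap, so a sound that straddles a block boundary is still seen whole by at least one block.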
![Page 21: Automatic Speech Recognion](https://reader030.vdocument.in/reader030/viewer/2022032421/55a778831a28ab580a8b4810/html5/thumbnails/21.jpg)
Transform the PCM data into acoustic features
Apply the GRAMMAR
Figure out which PHONEMES are spoken
Convert the PHONEMES into WORDS
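Turning recognized phonemes into words can be sketched as a lookup in a pronunciation dictionary. In this minimal Python sketch the lexicon is hypothetical (three words with ARPAbet-style symbols), and Levenshtein edit distance stands in for the probabilistic word models a real recognizer would score: it simply picks the word whose pronunciation needs the fewest phoneme edits.

```python
def edit_distance(a, b):
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        cur = [i]
        for j, pb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete pa
                           cur[j - 1] + 1,             # insert pb
                           prev[j - 1] + (pa != pb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


# Hypothetical pronunciation dictionary (ARPAbet-style phoneme symbols).
LEXICON = {
    "yes": ["y", "eh", "s"],
    "no": ["n", "ow"],
    "stop": ["s", "t", "aa", "p"],
}


def phonemes_to_word(phonemes, lexicon=LEXICON):
    """Return the lexicon word whose pronunciation is closest to the phonemes."""
    return min(lexicon, key=lambda word: edit_distance(phonemes, lexicon[word]))


if __name__ == "__main__":
    heard = ["y", "eh", "z"]   # one phoneme misrecognized ("z" instead of "s")
    print(phonemes_to_word(heard))
```

A grammar can be applied at this stage by shrinking the lexicon to the words that are allowed in the current context.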
![Page 22: Automatic Speech Recognion](https://reader030.vdocument.in/reader030/viewer/2022032421/55a778831a28ab580a8b4810/html5/thumbnails/22.jpg)
Acoustic Phonetic Approach
Pattern Recognition Approach (HMM)
Artificial Intelligence Approach (Neural Networks)
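The pattern recognition approach typically models speech with hidden Markov models (HMMs) and decodes them with the Viterbi algorithm. The minimal Python sketch below uses a hypothetical two-state model (voiced "V" and unvoiced "U" frames, observed as "low" or "high" zero-crossing counts); every probability is made up for illustration, whereas a real recognizer would learn phoneme-level HMMs from training data.

```python
import math

# Hypothetical toy HMM: all probabilities are illustrative, not trained.
STATES = ("V", "U")                                  # voiced, unvoiced
START = {"V": 0.6, "U": 0.4}
TRANS = {"V": {"V": 0.7, "U": 0.3}, "U": {"V": 0.4, "U": 0.6}}
EMIT = {"V": {"low": 0.9, "high": 0.1}, "U": {"low": 0.2, "high": 0.8}}


def viterbi(observations, states=STATES, start_p=START, trans_p=TRANS, emit_p=EMIT):
    """Most likely hidden-state path for a discrete HMM, computed in the log domain."""
    first = observations[0]
    # best[s] = (log probability of the best path ending in state s, that path)
    best = {s: (math.log(start_p[s]) + math.log(emit_p[s][first]), [s]) for s in states}
    for obs in observations[1:]:
        new_best = {}
        for s in states:
            # Choose the predecessor that gives the highest score for reaching s.
            score, path = max(
                (best[prev][0] + math.log(trans_p[prev][s]), best[prev][1])
                for prev in states
            )
            new_best[s] = (score + math.log(emit_p[s][obs]), path + [s])
        best = new_best
    return max(best.values())[1]


if __name__ == "__main__":
    print(viterbi(["low", "low", "high", "high", "low"]))
```

Working with log probabilities turns long products of small numbers into sums, which avoids numeric underflow on real utterances with hundreds of frames.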
![Page 23: Automatic Speech Recognion](https://reader030.vdocument.in/reader030/viewer/2022032421/55a778831a28ab580a8b4810/html5/thumbnails/23.jpg)
![Page 24: Automatic Speech Recognion](https://reader030.vdocument.in/reader030/viewer/2022032421/55a778831a28ab580a8b4810/html5/thumbnails/24.jpg)
![Page 25: Automatic Speech Recognion](https://reader030.vdocument.in/reader030/viewer/2022032421/55a778831a28ab580a8b4810/html5/thumbnails/25.jpg)
![Page 26: Automatic Speech Recognion](https://reader030.vdocument.in/reader030/viewer/2022032421/55a778831a28ab580a8b4810/html5/thumbnails/26.jpg)
Speech Processing
•Analysis/Synthesis
•Coding
•Recognition: Speaker Recognition, Speech Recognition, Language Identification
Speech Recognition can be classified by:
•Speech Mode: Isolated Speech, Continuous Speech
•Speaker Mode: Speaker Dependent, Speaker Independent, Speaker Adaptive
•Vocabulary Size: Small, Medium, Large
•Speaking Style: Dictation, Spontaneous
![Page 27: Automatic Speech Recognion](https://reader030.vdocument.in/reader030/viewer/2022032421/55a778831a28ab580a8b4810/html5/thumbnails/27.jpg)
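Voiced Sound
•The vocal cords play an active role in the
production of the SOUND,
e.g. a/e/i.
•It is quasi-periodic and has a fundamental frequency (pitch).
Unvoiced Sound
•Produced when the vocal cords are inactive,
e.g. s/f.
•It is produced by air pressure building up and passing through a narrow constriction in the vocal tract.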
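The voiced/unvoiced distinction is often made frame by frame from two simple measurements: voiced frames tend to have high energy and few zero crossings, while unvoiced sounds such as s/f are noise-like, with lower energy and many zero crossings. The Python sketch below is a minimal illustration; it assumes samples normalized to [-1, 1], and the two thresholds and the extra "silence" label are illustrative assumptions that would need tuning on real recordings.

```python
import math


def short_time_energy(frame):
    """Mean squared amplitude of the frame."""
    return sum(x * x for x in frame) / len(frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def classify_frame(frame, silence_energy=1e-3, zcr_threshold=0.25):
    """Label a frame 'silence', 'voiced' or 'unvoiced' (thresholds are assumptions)."""
    if short_time_energy(frame) < silence_energy:
        return "silence"
    if zero_crossing_rate(frame) < zcr_threshold:
        return "voiced"       # strong, slowly oscillating (periodic) signal
    return "unvoiced"         # signal with many sign changes, like a fricative


if __name__ == "__main__":
    rate = 8000
    vowel_like = [0.5 * math.sin(2 * math.pi * 150 * n / rate) for n in range(200)]
    hiss_like = [0.2 if n % 2 == 0 else -0.2 for n in range(200)]  # rapid sign changes
    for name, frame in (("vowel-like", vowel_like), ("hiss-like", hiss_like)):
        print(name, classify_frame(frame))
```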
![Page 28: Automatic Speech Recognion](https://reader030.vdocument.in/reader030/viewer/2022032421/55a778831a28ab580a8b4810/html5/thumbnails/28.jpg)
Speech Coding
Speech Recognition
Speaker Verification/Identification
Speech Enhancement (removing background noise)
Speech Synthesis
![Page 29: Automatic Speech Recognion](https://reader030.vdocument.in/reader030/viewer/2022032421/55a778831a28ab580a8b4810/html5/thumbnails/29.jpg)
Grammar Design
Signal Processing
Phonemic Recognition
Word Recognition
Result Recognition