![Page 1: SPEECH RECOGNITION FOR MOBILE SYSTEMS BY: PRATIBHA CHANNAMSETTY SHRUTHI SAMBASIVAN](https://reader036.vdocument.in/reader036/viewer/2022062320/56649d965503460f94a7f797/html5/thumbnails/1.jpg)
SPEECH RECOGNITION FOR MOBILE SYSTEMS
BY: PRATIBHA CHANNAMSETTY
SHRUTHI SAMBASIVAN
![Page 2: SPEECH RECOGNITION FOR MOBILE SYSTEMS BY: PRATIBHA CHANNAMSETTY SHRUTHI SAMBASIVAN](https://reader036.vdocument.in/reader036/viewer/2022062320/56649d965503460f94a7f797/html5/thumbnails/2.jpg)
Introduction
• What is speech recognition?
Automatic speech recognition (ASR) is the process by which a computer maps an acoustic speech signal to text.
![Page 3: SPEECH RECOGNITION FOR MOBILE SYSTEMS BY: PRATIBHA CHANNAMSETTY SHRUTHI SAMBASIVAN](https://reader036.vdocument.in/reader036/viewer/2022062320/56649d965503460f94a7f797/html5/thumbnails/3.jpg)
CLASSIFICATION OF SPEECH RECOGNITION SYSTEM
• Users
- Speaker dependent system
- Speaker independent system
- Speaker adaptive system
• Vocabulary
- small vocabulary : tens of words
- medium vocabulary : hundreds of words
- large vocabulary : thousands of words
- very-large vocabulary : tens of thousands of words
![Page 4: SPEECH RECOGNITION FOR MOBILE SYSTEMS BY: PRATIBHA CHANNAMSETTY SHRUTHI SAMBASIVAN](https://reader036.vdocument.in/reader036/viewer/2022062320/56649d965503460f94a7f797/html5/thumbnails/4.jpg)
CLASSIFICATION OF SPEECH RECOGNITION SYSTEM
• Word Pattern
- isolated-word system : single words at a time
- continuous speech system : words are connected together
![Page 5: SPEECH RECOGNITION FOR MOBILE SYSTEMS BY: PRATIBHA CHANNAMSETTY SHRUTHI SAMBASIVAN](https://reader036.vdocument.in/reader036/viewer/2022062320/56649d965503460f94a7f797/html5/thumbnails/5.jpg)
HOW SPEECH RECOGNITION WORKS
![Page 6: SPEECH RECOGNITION FOR MOBILE SYSTEMS BY: PRATIBHA CHANNAMSETTY SHRUTHI SAMBASIVAN](https://reader036.vdocument.in/reader036/viewer/2022062320/56649d965503460f94a7f797/html5/thumbnails/6.jpg)
APPLICATIONS
• Healthcare
• Military
• Helicopters
• Training air traffic controllers
• Telephony and other domains
![Page 7: SPEECH RECOGNITION FOR MOBILE SYSTEMS BY: PRATIBHA CHANNAMSETTY SHRUTHI SAMBASIVAN](https://reader036.vdocument.in/reader036/viewer/2022062320/56649d965503460f94a7f797/html5/thumbnails/7.jpg)
WHY SPEECH RECOGNITION?
• Speech is the easiest and most common way for people to communicate.
• Speech is also faster than typing on a keypad and more expressive than clicking on a menu item.
• Speech interfaces remain usable for users with low literacy.
• Cellphones have proliferated widely in the market.
![Page 8: SPEECH RECOGNITION FOR MOBILE SYSTEMS BY: PRATIBHA CHANNAMSETTY SHRUTHI SAMBASIVAN](https://reader036.vdocument.in/reader036/viewer/2022062320/56649d965503460f94a7f797/html5/thumbnails/8.jpg)
CHALLENGES ON MOBILE DEVICES
• Limited available storage space
• Cheap and variable microphones
• No hardware support for floating point arithmetic
• Low processor clock frequency
• Small cache of 8-32 KB
• Highly variable and challenging acoustic environments, ranging from heavy background traffic noise to a small reverberant room with multiple speakers talking simultaneously
• High energy consumption during algorithm execution
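The lack of hardware floating point is commonly worked around with fixed-point arithmetic. The following is a minimal sketch, assuming a Q15 format (15 fractional bits); the helper names `to_q15`, `q15_mul` and `q15_dot` are illustrative and do not come from any particular ASR library.

```python
Q = 15                # number of fractional bits (Q15 format, an assumption)
ONE = 1 << Q          # the value 1.0 expressed in Q15


def to_q15(x: float) -> int:
    """Convert a float in [-1, 1) to a Q15 integer, saturating at the limits."""
    return max(-ONE, min(ONE - 1, int(round(x * ONE))))


def from_q15(q: int) -> float:
    """Convert a Q15 integer back to a float (for inspection only)."""
    return q / ONE


def q15_mul(a: int, b: int) -> int:
    """Multiply two Q15 values using integer operations only, with rounding."""
    return (a * b + (1 << (Q - 1))) >> Q


def q15_dot(xs, ws) -> int:
    """Integer-only dot product, e.g. one filter applied to a block of samples."""
    acc = 0
    for x, w in zip(xs, ws):
        acc += x * w                      # accumulate at higher precision (Q30)
    return (acc + (1 << (Q - 1))) >> Q    # round once and return to Q15
```

For example, `q15_mul(to_q15(0.5), to_q15(0.5))` equals `to_q15(0.25)`. Accumulating the dot product at higher precision and rounding once at the end keeps the quantization error smaller than rounding after every multiplication.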
![Page 9: SPEECH RECOGNITION FOR MOBILE SYSTEMS BY: PRATIBHA CHANNAMSETTY SHRUTHI SAMBASIVAN](https://reader036.vdocument.in/reader036/viewer/2022062320/56649d965503460f94a7f797/html5/thumbnails/9.jpg)
ASR MODELS
• Embedded speech recognition
• Speech recognition in the cloud
• Distributed speech recognition
• Shared speech recognition with user based adaptation (proposed model of use)
![Page 10: SPEECH RECOGNITION FOR MOBILE SYSTEMS BY: PRATIBHA CHANNAMSETTY SHRUTHI SAMBASIVAN](https://reader036.vdocument.in/reader036/viewer/2022062320/56649d965503460f94a7f797/html5/thumbnails/10.jpg)
EMBEDDED MOBILE SPEECH RECOGNITION
![Page 11: SPEECH RECOGNITION FOR MOBILE SYSTEMS BY: PRATIBHA CHANNAMSETTY SHRUTHI SAMBASIVAN](https://reader036.vdocument.in/reader036/viewer/2022062320/56649d965503460f94a7f797/html5/thumbnails/11.jpg)
EMBEDDED MOBILE SPEECH RECOGNITION
Advantages
• Does not rely on any communication with a central server
• Cost effective
• Not affected by network latency
![Page 12: SPEECH RECOGNITION FOR MOBILE SYSTEMS BY: PRATIBHA CHANNAMSETTY SHRUTHI SAMBASIVAN](https://reader036.vdocument.in/reader036/viewer/2022062320/56649d965503460f94a7f797/html5/thumbnails/12.jpg)
EMBEDDED MOBILE SPEECH RECOGNITION Disadvantages
• Cannot perform complex computations
• Limited in speed and memory
• To achieve reliable performance, every sub-system of the ASR needs to be modified to take both the speed and the memory constraints into account.
![Page 13: SPEECH RECOGNITION FOR MOBILE SYSTEMS BY: PRATIBHA CHANNAMSETTY SHRUTHI SAMBASIVAN](https://reader036.vdocument.in/reader036/viewer/2022062320/56649d965503460f94a7f797/html5/thumbnails/13.jpg)
SPEECH RECOGNITION IN THE CLOUD
![Page 14: SPEECH RECOGNITION FOR MOBILE SYSTEMS BY: PRATIBHA CHANNAMSETTY SHRUTHI SAMBASIVAN](https://reader036.vdocument.in/reader036/viewer/2022062320/56649d965503460f94a7f797/html5/thumbnails/14.jpg)
SPEECH RECOGNITION IN THE CLOUD
Advantages
• Improves speed and accuracy
• It provides an easy way to upgrade or modify the central speech recognition system.
• It can be used for speech recognition with low-end mobile devices such as cheap cellphones.
![Page 15: SPEECH RECOGNITION FOR MOBILE SYSTEMS BY: PRATIBHA CHANNAMSETTY SHRUTHI SAMBASIVAN](https://reader036.vdocument.in/reader036/viewer/2022062320/56649d965503460f94a7f797/html5/thumbnails/15.jpg)
SPEECH RECOGNITION IN THE CLOUD
Disadvantages
• Performance degradation
• Acoustic models on the central server need to account for large variations in the different channels.
• Each data transfer over the telephone network can cost money for the end user.
![Page 16: SPEECH RECOGNITION FOR MOBILE SYSTEMS BY: PRATIBHA CHANNAMSETTY SHRUTHI SAMBASIVAN](https://reader036.vdocument.in/reader036/viewer/2022062320/56649d965503460f94a7f797/html5/thumbnails/16.jpg)
DISTRIBUTED SPEECH RECOGNITION
![Page 17: SPEECH RECOGNITION FOR MOBILE SYSTEMS BY: PRATIBHA CHANNAMSETTY SHRUTHI SAMBASIVAN](https://reader036.vdocument.in/reader036/viewer/2022062320/56649d965503460f94a7f797/html5/thumbnails/17.jpg)
DISTRIBUTED SPEECH RECOGNITION
Advantages
• Does not require high-quality speech
• Improves word error rates
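Word error rate (WER) is usually computed as the word-level edit distance between the reference transcript and the recognizer output, divided by the number of reference words. A minimal sketch follows; the function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (r != h)))   # substitution or match
        prev = cur
    return prev[-1] / len(ref)
```

With the reference "call my mother now" and the hypothesis "call mother cow" there is one deletion and one substitution over four reference words, so the function returns 0.5.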
![Page 18: SPEECH RECOGNITION FOR MOBILE SYSTEMS BY: PRATIBHA CHANNAMSETTY SHRUTHI SAMBASIVAN](https://reader036.vdocument.in/reader036/viewer/2022062320/56649d965503460f94a7f797/html5/thumbnails/18.jpg)
DISTRIBUTED SPEECH RECOGNITION
Disadvantages
• The major disadvantages of this mode remain cost and the need for a continuous and reliable cellular connection.
• There is a need for standardized feature extraction processes that account for variability arising from differences in channel, multilinguality, variable accents, gender, etc.
![Page 19: SPEECH RECOGNITION FOR MOBILE SYSTEMS BY: PRATIBHA CHANNAMSETTY SHRUTHI SAMBASIVAN](https://reader036.vdocument.in/reader036/viewer/2022062320/56649d965503460f94a7f797/html5/thumbnails/19.jpg)
SHARED SPEECH RECOGNITION WITH USER BASED ADAPTATION
![Page 20: SPEECH RECOGNITION FOR MOBILE SYSTEMS BY: PRATIBHA CHANNAMSETTY SHRUTHI SAMBASIVAN](https://reader036.vdocument.in/reader036/viewer/2022062320/56649d965503460f94a7f797/html5/thumbnails/20.jpg)
SHARED SPEECH RECOGNITION WITH USER BASED ADAPTATION
Advantages
• The ability to function even without network connectivity.
• Works well for the limited set of conditions it encounters.
• This limited set of conditions can be covered successfully by existing mobile devices, if trained or adapted accordingly.
• Server capacity has to be provided only for average, not peak use.
![Page 21: SPEECH RECOGNITION FOR MOBILE SYSTEMS BY: PRATIBHA CHANNAMSETTY SHRUTHI SAMBASIVAN](https://reader036.vdocument.in/reader036/viewer/2022062320/56649d965503460f94a7f797/html5/thumbnails/21.jpg)
Speech recognition Process in detail
![Page 22: SPEECH RECOGNITION FOR MOBILE SYSTEMS BY: PRATIBHA CHANNAMSETTY SHRUTHI SAMBASIVAN](https://reader036.vdocument.in/reader036/viewer/2022062320/56649d965503460f94a7f797/html5/thumbnails/22.jpg)
Front-end Process
Involves spectral analysis that derives feature vectors to capture salient spectral characteristics of speech input.
Backend Process
Combines word-level matching and sentence-level search to perform an inverse operation to decode the message from the speech waveform.
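The front-end step can be sketched as follows: the signal is cut into short overlapping frames, each frame is windowed, and a log power spectrum is taken as the feature vector. This is a simplified illustration in pure Python; the frame length, hop size and function names are assumptions, and production front-ends typically go further (for example mel filter banks and cepstral coefficients).

```python
import cmath
import math


def frames(signal, frame_len, hop):
    """Split the signal into overlapping frames of frame_len samples."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]


def hamming(n):
    """Hamming window of length n, used to reduce spectral leakage."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def log_power_spectrum(frame):
    """Feature vector for one frame: log power in each DFT bin up to Nyquist."""
    n = len(frame)
    windowed = [s * w for s, w in zip(frame, hamming(n))]
    feats = []
    for k in range(n // 2 + 1):
        # Naive DFT for clarity; a real front-end would use an FFT.
        x = sum(windowed[t] * cmath.exp(-2j * math.pi * k * t / n)
                for t in range(n))
        feats.append(math.log(abs(x) ** 2 + 1e-10))
    return feats


def front_end(signal, frame_len=64, hop=32):
    """Map a waveform (list of samples) to a sequence of feature vectors."""
    return [log_power_spectrum(f) for f in frames(signal, frame_len, hop)]
```

Calling `front_end` on a pure tone yields one feature vector per frame, each peaking at the DFT bin that corresponds to the tone's frequency.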
![Page 23: SPEECH RECOGNITION FOR MOBILE SYSTEMS BY: PRATIBHA CHANNAMSETTY SHRUTHI SAMBASIVAN](https://reader036.vdocument.in/reader036/viewer/2022062320/56649d965503460f94a7f797/html5/thumbnails/23.jpg)
Acoustic model
• Provides a method of calculating the likelihood of any feature vector sequence Y given a word W.
• Each phone is represented by an HMM.
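For an HMM, the likelihood of an observation sequence can be computed with the forward algorithm. The sketch below assumes discrete observation symbols and a made-up two-state model (`START_P`, `TRANS_P`, `EMIT_P` are illustrative values); real acoustic models score continuous feature vectors, typically with Gaussian mixtures or neural networks.

```python
def forward_likelihood(obs, start_p, trans_p, emit_p):
    """Return P(obs | model) for a discrete HMM, summing over all state paths."""
    states = list(start_p)
    # alpha[s] = probability of the observations so far, ending in state s
    alpha = {s: start_p[s] * emit_p[s][obs[0]] for s in states}
    for o in obs[1:]:
        alpha = {s: emit_p[s][o] * sum(alpha[r] * trans_p[r][s] for r in states)
                 for s in states}
    return sum(alpha.values())


# Toy model with hypothetical numbers, for illustration only.
START_P = {"A": 0.6, "B": 0.4}
TRANS_P = {"A": {"A": 0.7, "B": 0.3}, "B": {"A": 0.4, "B": 0.6}}
EMIT_P = {"A": {"x": 0.5, "y": 0.5}, "B": {"x": 0.1, "y": 0.9}}

likelihood = forward_likelihood(["x", "y"], START_P, TRANS_P, EMIT_P)
```

The forward recursion costs time proportional to the sequence length times the square of the number of states, instead of enumerating every possible state path.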
![Page 24: SPEECH RECOGNITION FOR MOBILE SYSTEMS BY: PRATIBHA CHANNAMSETTY SHRUTHI SAMBASIVAN](https://reader036.vdocument.in/reader036/viewer/2022062320/56649d965503460f94a7f797/html5/thumbnails/24.jpg)
Language Model
• The purpose of the language model is to take advantage of linguistic constraints to compute the probability of different word sequences.
• Assuming a sequence of $K$ words, $W = \{w_1, w_2, \ldots, w_K\}$, the probability $P(W)$ can be expanded as $P(W) = P(w_1, w_2, \ldots, w_K)$.
• We generally make the simplifying assumption that any word $w_k$ depends only on the previous $N-1$ words in the sequence.
• This is known as an N-gram model.
• Grammars – Use context free grammars represented by Finite State Automata (FSA).
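As an illustration of the N-gram assumption, with N = 2 the sentence probability becomes a product of bigram terms P(w_k | w_{k-1}). The sketch below estimates these terms by relative frequency from a tiny made-up corpus; the function names, the `<s>` and `</s>` boundary markers and the corpus are assumptions, and no smoothing is applied, so unseen bigrams get probability zero.

```python
from collections import Counter


def train_bigram(sentences):
    """Count histories and bigrams, adding sentence boundary markers."""
    history_counts, bigram_counts = Counter(), Counter()
    for s in sentences:
        words = ["<s>"] + s.split() + ["</s>"]
        history_counts.update(words[:-1])
        bigram_counts.update(zip(words[:-1], words[1:]))
    return history_counts, bigram_counts


def bigram_prob(history_counts, bigram_counts, prev, word):
    """Maximum-likelihood estimate of P(word | prev), without smoothing."""
    if history_counts[prev] == 0:
        return 0.0
    return bigram_counts[(prev, word)] / history_counts[prev]


def sentence_prob(history_counts, bigram_counts, sentence):
    """P(W) under the bigram assumption: product of P(w_k | w_{k-1})."""
    words = ["<s>"] + sentence.split() + ["</s>"]
    p = 1.0
    for prev, word in zip(words[:-1], words[1:]):
        p *= bigram_prob(history_counts, bigram_counts, prev, word)
    return p


# Hypothetical training corpus of voice commands.
CORPUS = ["call home", "call mom", "play music"]
HIST, BIGR = train_bigram(CORPUS)
```

With this corpus, `sentence_prob(HIST, BIGR, "call home")` is 2/3 × 1/2 × 1 = 1/3, while "play home" scores zero because the bigram ("play", "home") never occurs; practical language models apply smoothing to avoid such zeros.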
![Page 25: SPEECH RECOGNITION FOR MOBILE SYSTEMS BY: PRATIBHA CHANNAMSETTY SHRUTHI SAMBASIVAN](https://reader036.vdocument.in/reader036/viewer/2022062320/56649d965503460f94a7f797/html5/thumbnails/25.jpg)
Overview of Statistical Speech recognition
Statistical Speech recognition model
![Page 26: SPEECH RECOGNITION FOR MOBILE SYSTEMS BY: PRATIBHA CHANNAMSETTY SHRUTHI SAMBASIVAN](https://reader036.vdocument.in/reader036/viewer/2022062320/56649d965503460f94a7f797/html5/thumbnails/26.jpg)
Statistical Speech recognition model
• A word sequence is postulated and the language model computes its probability.
• Each word is converted into sounds or phones using a pronunciation dictionary.
• Each phoneme has a corresponding statistical Hidden Markov Model (HMM).
• The HMMs of the phonemes are concatenated to form a word model, and the likelihood of the data given the word sequence is computed.
• This process is repeated for many word sequences and the best is chosen as the output.
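In the notation used for the acoustic and language models (feature vector sequence $Y$, word sequence $W$), the procedure above corresponds to the standard Bayes decision rule of statistical speech recognition:

$$
\hat{W} = \arg\max_{W} P(W \mid Y) = \arg\max_{W} \frac{P(Y \mid W)\,P(W)}{P(Y)} = \arg\max_{W} P(Y \mid W)\,P(W)
$$

Here $P(Y \mid W)$ is supplied by the acoustic model and $P(W)$ by the language model; $P(Y)$ does not depend on $W$ and can be dropped from the maximization.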
![Page 27: SPEECH RECOGNITION FOR MOBILE SYSTEMS BY: PRATIBHA CHANNAMSETTY SHRUTHI SAMBASIVAN](https://reader036.vdocument.in/reader036/viewer/2022062320/56649d965503460f94a7f797/html5/thumbnails/27.jpg)
Speech recognition on embedded platforms
• Embedded ASR can be deployed either locally or in a distributed environment, each with its own advantages and disadvantages.
• For large vocabulary continuous speech recognition (LVCSR), embedded devices are limited in terms of CPU power and amount of memory.
• Most importantly, speed is a limiting factor.
![Page 28: SPEECH RECOGNITION FOR MOBILE SYSTEMS BY: PRATIBHA CHANNAMSETTY SHRUTHI SAMBASIVAN](https://reader036.vdocument.in/reader036/viewer/2022062320/56649d965503460f94a7f797/html5/thumbnails/28.jpg)
Decoding algorithm
Asynchronous stack based decoder – memory efficient but complex.
Viterbi based decoder – most efficient.
3 types of search implementation
Combination of static graph and static search space
Static graph space with dynamic search space
Dynamic graph
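To make the Viterbi idea concrete, the sketch below finds the single most likely state path through a small HMM using log probabilities. The two-state left-to-right model and all names (`viterbi`, `STATES`, `START_P`, `TRANS_P`, `EMIT_P`) are illustrative assumptions; a real decoder searches a much larger graph built from the lexicon and language model and prunes unlikely paths.

```python
import math


def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return (best_log_prob, best_state_path) for a discrete HMM."""
    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    # score[s] = log probability of the best path so far that ends in state s
    score = {s: log(start_p[s]) + log(emit_p[s][obs[0]]) for s in states}
    backptrs = []
    for o in obs[1:]:
        new_score, ptr = {}, {}
        for s in states:
            best_prev = max(states, key=lambda r: score[r] + log(trans_p[r][s]))
            new_score[s] = (score[best_prev] + log(trans_p[best_prev][s])
                            + log(emit_p[s][o]))
            ptr[s] = best_prev
        score = new_score
        backptrs.append(ptr)

    # Trace back from the best final state to recover the path.
    last = max(states, key=lambda s: score[s])
    path = [last]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    path.reverse()
    return score[last], path


# Toy left-to-right model with hypothetical numbers.
STATES = ["s1", "s2"]
START_P = {"s1": 1.0, "s2": 0.0}
TRANS_P = {"s1": {"s1": 0.6, "s2": 0.4}, "s2": {"s1": 0.0, "s2": 1.0}}
EMIT_P = {"s1": {"a": 0.8, "b": 0.2}, "s2": {"a": 0.1, "b": 0.9}}

best_logp, best_path = viterbi(["a", "a", "b", "b"], STATES, START_P, TRANS_P, EMIT_P)
```

For the observation sequence a, a, b, b the best path is s1, s1, s2, s2. Working in the log domain avoids numerical underflow on long utterances and turns the products into additions.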
![Page 29: SPEECH RECOGNITION FOR MOBILE SYSTEMS BY: PRATIBHA CHANNAMSETTY SHRUTHI SAMBASIVAN](https://reader036.vdocument.in/reader036/viewer/2022062320/56649d965503460f94a7f797/html5/thumbnails/29.jpg)
Mobile speech frameworks
• Nuance - Dragon Mobile SDK
• OpenEars
• Sphinx
• CeedVocal SDK
• Vlingo
![Page 30: SPEECH RECOGNITION FOR MOBILE SYSTEMS BY: PRATIBHA CHANNAMSETTY SHRUTHI SAMBASIVAN](https://reader036.vdocument.in/reader036/viewer/2022062320/56649d965503460f94a7f797/html5/thumbnails/30.jpg)
Dragon Mobile SDK
The Dragon Mobile SDK provides speech recognition and text-to-speech functionality.
The Speech Kit framework provides the classes necessary to perform network-based speech recognition and text-to-speech synthesis.
It uses the SystemConfiguration and AudioToolbox frameworks.
![Page 31: SPEECH RECOGNITION FOR MOBILE SYSTEMS BY: PRATIBHA CHANNAMSETTY SHRUTHI SAMBASIVAN](https://reader036.vdocument.in/reader036/viewer/2022062320/56649d965503460f94a7f797/html5/thumbnails/31.jpg)
Speech kit architecture
![Page 32: SPEECH RECOGNITION FOR MOBILE SYSTEMS BY: PRATIBHA CHANNAMSETTY SHRUTHI SAMBASIVAN](https://reader036.vdocument.in/reader036/viewer/2022062320/56649d965503460f94a7f797/html5/thumbnails/32.jpg)
OpenEars
OpenEars is an iOS framework for iPhone voice recognition and speech synthesis (TTS).
It uses the open source CMU Pocketsphinx, CMU Flite, and CMUCLMTK libraries.
OpenEars works by doing the recognition inside the iPhone without using the network.
![Page 33: SPEECH RECOGNITION FOR MOBILE SYSTEMS BY: PRATIBHA CHANNAMSETTY SHRUTHI SAMBASIVAN](https://reader036.vdocument.in/reader036/viewer/2022062320/56649d965503460f94a7f797/html5/thumbnails/33.jpg)
Sphinx
CMU Sphinx is an open source toolkit for speech recognition developed by Carnegie Mellon University.
CMU Sphinx is a speaker-independent large vocabulary continuous speech recognizer.
Pocketsphinx: lightweight recognizer library written in C.
Sphinx4: adjustable, modifiable recognizer written in Java.
![Page 34: SPEECH RECOGNITION FOR MOBILE SYSTEMS BY: PRATIBHA CHANNAMSETTY SHRUTHI SAMBASIVAN](https://reader036.vdocument.in/reader036/viewer/2022062320/56649d965503460f94a7f797/html5/thumbnails/34.jpg)
CeedVocal SDK
CeedVocal SDK is an isolated-word speech recognition SDK for iOS.
It operates locally on the device and supports 6 languages : English, French, German, Dutch, Spanish and Italian.
![Page 35: SPEECH RECOGNITION FOR MOBILE SYSTEMS BY: PRATIBHA CHANNAMSETTY SHRUTHI SAMBASIVAN](https://reader036.vdocument.in/reader036/viewer/2022062320/56649d965503460f94a7f797/html5/thumbnails/35.jpg)
Mobile applications using speech recognition
• Google Now
• Siri
• S-Voice
• Dragon Search
• Dragon Dictation
• Trippo-Mondo
• Verbally
![Page 36: SPEECH RECOGNITION FOR MOBILE SYSTEMS BY: PRATIBHA CHANNAMSETTY SHRUTHI SAMBASIVAN](https://reader036.vdocument.in/reader036/viewer/2022062320/56649d965503460f94a7f797/html5/thumbnails/36.jpg)
References
1. Rethinking Speech Recognition on Mobile Devices, Anuj Kumar, Anuj Tewari, Seth Horrigan, Matthew Kam, Florian Metze and John Canny.
2. Towards large vocabulary ASR on embedded platforms, Miroslav Novak.
3. Speech Recognition: Statistical Methods, L R Rabiner, B-H Juang.
4. http://www.nuancemobiledeveloper.com, 9th April 2013.
5. http://cmusphinx.sourceforge.net , 9th April 2013.
6. http://www.politepix.com/openears.
7. http://www.creaceed.com/ceedvocal