Speech Recognition as a User Interface
TRANSCRIPT
1 Jared Sheehan @ Driversiti: Speech Recognition as a User Interface
2015 Apio Systems, Inc. Confidential
2 Who am I
Glass Explorer, speech recognition enthusiast and big Android nerd
Android Lead @ Driversiti - driving safety for the mobile generation
Speech recognition application for the Amazon Fire Phone
Suite of applications - AIM Android, Engadget Android, Distro Android, TechCrunch Android, AOL HD, AIM BlackBerry
Meetup evangelist: DC Android Meetup Group. Join today!
3 Overview
What is voice/speech recognition?
What awesome stuff can you do with it?
How it works
Demo!
Question and Answer
4 Hello Computer
5 Definition
Technology that allows spoken input into software systems.
You speak to your computer, tablet, phone or device and it uses what you said as input to trigger some sort of action.
6 What can you do with SR?
Replace other methods of input like clicking, swiping, typing or selecting in other ways.
It is a means to make devices and software more user-friendly and to increase productivity.
It is used extensively as a form of accessibility assistance.
7 ASR - Dictation
Automatic speech recognition (ASR), also called dictation
Translates speech input into words, sentences and punctuation.
Audio is input through a microphone and streamed to a recognizer, on the device or on a server
The result is usually returned as a string with a confidence level
Very easy integration with Android: there are two ways to do it.
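A minimal sketch of what "a string with a confidence level" means in practice. On Android the candidates and their scores come back as parallel lists (RecognizerIntent.EXTRA_RESULTS and RecognizerIntent.EXTRA_CONFIDENCE_SCORES). The code below is plain Java with no Android dependency; the class name BestHypothesis, the method pick and the 0.5 threshold are assumptions made for illustration only.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

/** Hypothetical helper: picks the most confident transcription from parallel lists. */
final class BestHypothesis {
    private BestHypothesis() {}

    /**
     * Returns the candidate with the highest confidence that is at least
     * minConfidence, or empty if the inputs are malformed or nothing qualifies.
     */
    static Optional<String> pick(List<String> texts, float[] confidences, float minConfidence) {
        if (texts == null || confidences == null || texts.size() != confidences.length) {
            return Optional.empty();
        }
        int best = -1;
        for (int i = 0; i < confidences.length; i++) {
            boolean qualifies = confidences[i] >= minConfidence;
            if (qualifies && (best < 0 || confidences[i] > confidences[best])) {
                best = i;
            }
        }
        return best < 0 ? Optional.empty() : Optional.of(texts.get(best));
    }

    public static void main(String[] args) {
        List<String> texts = Arrays.asList("call mom", "call tom", "cold mom");
        float[] scores = {0.62f, 0.91f, 0.10f};
        // Prints "call tom", the highest-scoring candidate above the threshold.
        System.out.println(pick(texts, scores, 0.5f).orElse("(ask the user to repeat)"));
    }
}
```

Returning an empty Optional when nothing clears the threshold gives the UI a natural place to ask the user to repeat instead of acting on a bad guess.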
8 How does it work?
A user speaks into a recording device of some sort
Speech recognition begins with the digital sampling of speech and then acoustic signal processing of the audio.
Several techniques, including dynamic time warping (DTW), hidden Markov models (HMMs) and neural networks (NNs), can achieve the desired results
Most systems use language-specific knowledge to tune the models.
Next is the actual recognition of phonemes, groups of phonemes and words
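Of the techniques named above, dynamic time warping is the easiest to show in a few lines. It aligns two sequences that are spoken at different speeds and returns the cost of the best alignment. This is a sketch under simplifying assumptions: the inputs are one-dimensional arrays of numbers, whereas real recognizers compare sequences of acoustic feature vectors. The class name DtwSketch and the sample values are hypothetical.

```java
import java.util.Arrays;

/** Hypothetical sketch of dynamic time warping on 1-D sequences. */
final class DtwSketch {
    private DtwSketch() {}

    /** Returns the DTW alignment cost of two non-empty sequences, using |x - y| as local distance. */
    static double distance(double[] a, double[] b) {
        int n = a.length;
        int m = b.length;
        // cost[i][j] = cheapest alignment of the first i samples of a with the first j samples of b
        double[][] cost = new double[n + 1][m + 1];
        for (double[] row : cost) {
            Arrays.fill(row, Double.POSITIVE_INFINITY);
        }
        cost[0][0] = 0.0;
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                double local = Math.abs(a[i - 1] - b[j - 1]);
                // Either both sequences advance, or one of them is stretched in time.
                double bestPrevious = Math.min(cost[i - 1][j - 1],
                        Math.min(cost[i - 1][j], cost[i][j - 1]));
                cost[i][j] = local + bestPrevious;
            }
        }
        return cost[n][m];
    }

    public static void main(String[] args) {
        double[] template = {1, 2, 3, 2, 1};
        double[] slowCopy = {1, 1, 2, 2, 3, 3, 2, 2, 1, 1}; // same shape, half speed
        double[] different = {3, 1, 3, 1, 3};
        System.out.println("slow copy: " + distance(template, slowCopy));
        System.out.println("different: " + distance(template, different));
    }
}
```

The slow copy aligns with zero cost even though it is twice as long, which is why DTW suits isolated-word matching against stored templates.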
9 Speech Recognition system architecture
10 Into the weeds
Speaker dependence
Speaker independence
Continuous Speech
How good is your system? Hint: Word Error Rate
Isolated word
Is that all it does?
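The word error rate hinted at on this slide is commonly computed as (substitutions + deletions + insertions) divided by the number of words in the reference transcript, using a word-level edit distance. A minimal sketch follows; the class name WerSketch, the lowercasing and the whitespace tokenization are assumptions for illustration, not a standard scoring tool.

```java
import java.util.Locale;

/** Hypothetical sketch of word error rate (WER) via word-level edit distance. */
final class WerSketch {
    private WerSketch() {}

    /** Returns (substitutions + deletions + insertions) / number of reference words. */
    static double wordErrorRate(String reference, String hypothesis) {
        String[] ref = tokenize(reference);
        String[] hyp = tokenize(hypothesis);
        if (ref.length == 0) {
            throw new IllegalArgumentException("reference must contain at least one word");
        }
        // d[i][j] = edits needed to turn the first i reference words into the first j hypothesis words
        int[][] d = new int[ref.length + 1][hyp.length + 1];
        for (int i = 0; i <= ref.length; i++) {
            d[i][0] = i; // delete every reference word
        }
        for (int j = 0; j <= hyp.length; j++) {
            d[0][j] = j; // insert every hypothesis word
        }
        for (int i = 1; i <= ref.length; i++) {
            for (int j = 1; j <= hyp.length; j++) {
                int substitution = d[i - 1][j - 1] + (ref[i - 1].equals(hyp[j - 1]) ? 0 : 1);
                int deletion = d[i - 1][j] + 1;
                int insertion = d[i][j - 1] + 1;
                d[i][j] = Math.min(substitution, Math.min(deletion, insertion));
            }
        }
        return (double) d[ref.length][hyp.length] / ref.length;
    }

    private static String[] tokenize(String s) {
        String trimmed = s.trim().toLowerCase(Locale.ROOT);
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    public static void main(String[] args) {
        // One deleted word ("a") and one substituted word ("friend" -> "fried"): 2 errors over 6 words.
        System.out.println(wordErrorRate("send a text to my friend", "send text to my fried"));
    }
}
```

Because insertions count as errors, WER can exceed 1.0 when the recognizer outputs many more words than were spoken.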
11 Dictation is cool, but not that cool
Next step is understanding what the user wants to do
Then act on it
Generally, the ASR results are passed into an intent recognition system along with additional contextual information
Contextual information can include where the utterance is coming from (mobile phone, computer), which app the user is in, their location, etc.
That information is used to determine the user's intent and execute the request.
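To make the role of context concrete, here is a deliberately tiny sketch. It uses hand-written keyword rules as a stand-in for a real intent recognition system, which would normally be a trained statistical model. Every name here (IntentSketch, UserIntent, RequestContext, the app labels) is hypothetical.

```java
import java.util.Locale;

/** Hypothetical rule-based stand-in for an intent recognition system. */
final class IntentSketch {
    private IntentSketch() {}

    enum UserIntent { SEND_SMS, PLAY_MUSIC, PLAY_VIDEO, WEB_SEARCH }

    /** Context that travels with the ASR result: where the utterance came from. */
    static final class RequestContext {
        final String device;
        final String foregroundApp;

        RequestContext(String device, String foregroundApp) {
            this.device = device;
            this.foregroundApp = foregroundApp;
        }
    }

    static UserIntent recognize(String transcript, RequestContext context) {
        String text = transcript.trim().toLowerCase(Locale.ROOT);
        if (text.startsWith("text ") || text.contains("send a message")) {
            return UserIntent.SEND_SMS;
        }
        if (text.startsWith("play ")) {
            // The words alone are ambiguous; the app in the foreground breaks the tie.
            return "video".equals(context.foregroundApp)
                    ? UserIntent.PLAY_VIDEO
                    : UserIntent.PLAY_MUSIC;
        }
        // Anything unrecognized falls back to a search.
        return UserIntent.WEB_SEARCH;
    }

    public static void main(String[] args) {
        RequestContext inVideoApp = new RequestContext("phone", "video");
        RequestContext inMusicApp = new RequestContext("phone", "music");
        System.out.println(recognize("play thriller", inVideoApp));
        System.out.println(recognize("play thriller", inMusicApp));
        System.out.println(recognize("how tall is Kobe Bryant", inMusicApp));
    }
}
```

The same utterance, "play thriller", maps to different intents depending on the app it came from, which is the point of passing context alongside the transcript.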
12 Intent recognition
Recognizing speech is only part of the process. How does Google Now know that I want to send an SMS message to a friend? How does Siri know when I want to know how tall Kobe Bryant is?
ASR is only the first step in true speech as a user interface. To successfully help users perform useful actions we must understand their intent. How do we do this?
Three systems: ASR, intent recognition (IR) and a dialog engine
The dialog engine takes the output from the IR system and sends responses and actionable information to the caller.
13 Android Speech APIs
14 Android Speech APIs
http://developer.android.com/reference/android/speech/package-summary.html
Relatively easy implementation
Two APIs: one with a UI and one with no UI
InputMethodService implementations (keyboards) use the no-UI version
15 Recognizer Intent
UI is supplied for you
Fire the intent and get a result
Again, very easy to use
16 SpeechRecognizer
UI is not supplied for you
Results arrive through RecognitionListener callbacks, so they can be streamed directly into an EditText
Still fairly easy to use
What else is there?
17 Google Now
On to intent recognition systems
18 Google Now on Tap
19 Apple Siri
20 Amazon Fire phone, Fire TV and Echo
21 Microsoft Cortana
22 Speech providers: Google, Nuance, IBM Watson
23 Google Voice Interaction API
24 Nuance Speech SDK
Dragon Mobile SDK: free up to 20k transactions per month
Upload custom vocabularies
Developer: uploads a new song and music vocabulary
Utterance: "Eminem" gets a higher probability than "M&M"
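One way to picture the Eminem versus M&M example is as a re-ranking step: hypotheses that match an uploaded vocabulary get their score boosted before the winner is chosen. The sketch below illustrates that idea only. It is not Nuance's API or algorithm, and the class name VocabularyBoostSketch, the scores and the boost factor are all made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical illustration of biasing recognition toward a custom vocabulary. */
final class VocabularyBoostSketch {
    private VocabularyBoostSketch() {}

    /**
     * Multiplies the score of every hypothesis found in the (lowercase) vocabulary
     * by boost, then returns the best-scoring hypothesis, or null if there are none.
     */
    static String rescore(Map<String, Double> hypotheses, Set<String> vocabulary, double boost) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Double> entry : hypotheses.entrySet()) {
            double score = entry.getValue();
            if (vocabulary.contains(entry.getKey().toLowerCase(Locale.ROOT))) {
                score *= boost;
            }
            if (score > bestScore) {
                bestScore = score;
                best = entry.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, Double> hypotheses = new LinkedHashMap<>();
        hypotheses.put("M&M", 0.45);    // made-up acoustic scores
        hypotheses.put("Eminem", 0.40);

        Set<String> musicVocabulary = new HashSet<>();
        musicVocabulary.add("eminem");

        System.out.println(rescore(hypotheses, new HashSet<String>(), 1.5)); // M&M
        System.out.println(rescore(hypotheses, musicVocabulary, 1.5));       // Eminem
    }
}
```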
25 User Interface examples - Google Glass
26 User Interface examples - Google Glass continued
27 User Interface examples - Google Glass continued
Enough talk!
Show me code!
[email protected]
www.meetup.com/DCAndroid/
Tweet: @jayroo5245
THANK YOU