TRANSCRIPT
© 2015 Nuance Communications, Inc. All rights reserved.
Assistants that Make Sense Nuance Corporate Research, Machine Learning, and AI Overview
Dr. Nils Lenke
Strong customer and brand preference
With leading global relationships, it’s rare to go a day without Nuance
Healthcare Consumer Electronics Document Imaging
Automotive Telecommunications Government
Financial Services Travel Consumer goods
14 billion customer engagements per year
4,300 patents and applications
40 text-to-speech languages and voices
800 million mobile keyboards shipped annually
30,000 mobile app developers
80 languages
309 million patient stories shared annually
130 million voice-enabled vehicles shipped globally
6,500 companies use Nuance Enterprise solutions
World-class technology portfolio
– Very large global research group, part of the
larger R&D organization (1700 people)
– Approx. 300 researchers in ASR, Machine
Learning, NL and AI worldwide
– research.nuance.com/research
– Closely cooperates with divisional R&D
(Healthcare, Mobile, Enterprise), directly
involved in select customer engagements
– Co-Organizer of “Winograd Schema Challenge”
to replace the “Turing test”
– Shareholder of DFKI, world’s largest AI research
lab
Nuance Corporate Research
The Science Behind the Buzzwords
“Deep Learning”
“Deep Neural Networks”
“Artificial Intelligence”
“Natural Language Understanding”
“Cloud”
“The problem is that the concept of "artificial
intelligence" is way too potent for its own good,
conjuring images of supercomputers that operate
spaceships, rather than particularly clever spam
filters. The next thing you know, people are
worrying about exactly how and when AI is going
to doom humanity.” The Verge, Feb 29, 2016
“Machine Learning”
Structure of Talk (and of Nuance Research)
Application Layer:
– Enterprise: Assisting Customers & Customer Agents
– Mobile: Assisting the Driver & the IoT user
– Healthcare: Assisting the Specialist
Core Technology:
– NN & Machine Learning
– ASR Research, NLU Research, Symbolic AI, TTS Research, Voice Biometrics Research
The big themes for ML:
– Which model type works best for which tasks (HMMs, NNs,
DNNs, CNNs, RNNs, CRF, SVM, Classifier, Logistic Regression,
Clustering…)
– Where to find supervised or at least “lightly” supervised training
data?
– How to get it to work?
Machine Learning
Data (this is where “big data”, the Internet and “social media” come in) → Learning (Training) → Model
Unseen object → Model → Prediction
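The Data → Learning (Training) → Model → Prediction flow can be sketched in a few lines. This is a minimal illustration, not the method used at Nuance: it assumes a nearest-centroid classifier (chosen only because it is short), and the feature vectors and labels are invented toy data.

```python
from collections import defaultdict
from math import dist


def train(examples):
    """Learning (training): turn labeled data into a model.

    `examples` is a list of (feature_vector, label) pairs. The "model" here is
    just one mean vector (centroid) per label.
    """
    grouped = defaultdict(list)
    for features, label in examples:
        grouped[label].append(features)
    return {
        label: tuple(sum(column) / len(vectors) for column in zip(*vectors))
        for label, vectors in grouped.items()
    }


def predict(model, features):
    """Prediction: label an unseen object with the closest centroid."""
    return min(model, key=lambda label: dist(model[label], features))


# Hypothetical toy data: (message length in words, number of links) -> label.
data = [
    ((120, 0), "ham"),
    ((95, 1), "ham"),
    ((20, 6), "spam"),
    ((35, 4), "spam"),
]
model = train(data)
print(predict(model, (25, 5)))  # an unseen object
```

Any of the model types listed on the slide could take the place of the centroid model; only `train` and `predict` would change.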
“Deep” Neural Nets
– Layers: input, hidden layer 1, hidden layer 2, hidden layer 3, output layer
– Learn hierarchical feature representations
– Backpropagation: learning from labeled examples (= “supervised learning”)
[Figure labels: “Mary”, “?”]
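Backpropagation on labeled examples can be sketched as follows. This is a simplified illustration under stated assumptions: one hidden layer instead of the three on the slide, sigmoid units, squared-error loss, and a toy labeled set (logical OR) that does not come from the talk.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class TinyNet:
    """One hidden layer of sigmoid units, trained by backpropagation."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-1, 1) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(self.w1, self.b1)]
        out = sigmoid(sum(w * h for w, h in zip(self.w2, hidden)) + self.b2)
        return hidden, out

    def loss(self, data):
        """Mean squared error over all labeled examples."""
        return sum((self.forward(x)[1] - y) ** 2 for x, y in data) / len(data)

    def train_epoch(self, data, lr):
        for x, y in data:
            hidden, out = self.forward(x)
            # Error signal at the output unit (constant factor 2 folded into lr).
            delta_out = (out - y) * out * (1 - out)
            # Propagate the error back to each hidden unit, using the
            # output weights as they were during the forward pass.
            delta_hidden = [delta_out * w * h * (1 - h)
                            for w, h in zip(self.w2, hidden)]
            for j, h in enumerate(hidden):
                self.w2[j] -= lr * delta_out * h
            self.b2 -= lr * delta_out
            for j, dh in enumerate(delta_hidden):
                for i, xi in enumerate(x):
                    self.w1[j][i] -= lr * dh * xi
                self.b1[j] -= lr * dh


# Labeled examples: inputs with their target output (logical OR).
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
net = TinyNet(n_in=2, n_hidden=3)
loss_before = net.loss(data)
for _ in range(2000):
    net.train_epoch(data, lr=0.5)
loss_after = net.loss(data)
print(f"loss before: {loss_before:.3f}, after: {loss_after:.3f}")
```

Deeper networks repeat the same backward step once per hidden layer; production systems would use a tensor library rather than Python lists.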
DNNs in Speech Recognition (ASR)
Hidden Markov Models (HMMs)
“Deep” Neural Nets
– Map shows position of
devices (smart
phones, TVs, cars,
wearables, …)
sending requests to
cloud servers
– > 1 bn transactions /
month
– Important source
of training data
Nuance Speech Recognition (and NLU) increasingly deployed “in the cloud”
Voice Biometrics vs. Voice Recognition (ASR)
Example utterance: “I want to transfer money”
– Voice Recognition (ASR): extract only what was said from the speech signal; eliminate speaker variation by training on lots of data
– Voice Biometrics: use the speaker’s characteristics to identify or verify speaker identity; eliminate (vocal password) or ignore (Freespeech) content variation
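The verification half of the contrast can be sketched as a similarity test between two voiceprints. This is a hypothetical illustration: the slide does not say how speaker characteristics are represented or compared, so the fixed-length vectors, the cosine similarity measure, and the threshold of 0.8 are all assumptions.

```python
from math import sqrt


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))


def verify_speaker(enrolled_voiceprint, sample_voiceprint, threshold=0.8):
    """Accept the identity claim if the two voiceprints are similar enough."""
    return cosine_similarity(enrolled_voiceprint, sample_voiceprint) >= threshold


# Invented 3-dimensional "voiceprints"; real ones would have many more dimensions.
enrolled = [0.9, 0.1, 0.4]
same_speaker = [0.8, 0.2, 0.5]
other_speaker = [0.1, 0.9, 0.2]

print(verify_speaker(enrolled, same_speaker))
print(verify_speaker(enrolled, other_speaker))
```

Verification compares a sample against one claimed identity; identification would instead search all enrolled voiceprints for the best match.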
Does NOT mean to understand the complete meaning of any utterance
Instead it works for a specific domain (or a set of domains)
And the primary task of NLU is to return the most likely user intent and associated
concepts or named entities (aka slots/mentions) given an input utterance.
intent = navigation drive to [name] joe beef [/name] in [location] montreal [/location]
• Good accuracy gains with more modern ML models over the last few years:
– Baseline HMMs
– CRF (Conditional Random Fields, “invented for labeling tasks”) +20% rel. in accuracy
– “Neuro CRF” (= combination of RNNs and CRFs) +15% rel. in accuracy
Natural Language Understanding (NLU)
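The annotated example on this slide (an intent plus bracketed named entities) maps naturally onto a small data structure. The parser below is only a sketch of that output format; the function name and the returned dictionary layout are assumptions, and it reads an existing annotation rather than performing NLU itself.

```python
import re

# Matches "[slot] value [/slot]" and captures the slot name and the value.
SLOT_PATTERN = re.compile(r"\[(\w+)\]\s*(.*?)\s*\[/\1\]")


def parse_annotation(line):
    """Split 'intent = X <annotated utterance>' into intent, slots and text."""
    match = re.match(r"intent\s*=\s*(\w+)\s+(.*)", line)
    if match is None:
        raise ValueError(f"not an annotated utterance: {line!r}")
    intent, annotated = match.groups()
    # Note: a dict keeps only the last value if a slot name repeats.
    slots = {name: value for name, value in SLOT_PATTERN.findall(annotated)}
    utterance = SLOT_PATTERN.sub(lambda m: m.group(2), annotated)
    return {"intent": intent, "slots": slots, "utterance": utterance}


example = ("intent = navigation drive to [name] joe beef [/name] "
           "in [location] montreal [/location]")
result = parse_annotation(example)
print(result)
```

A trained model (HMM, CRF or “Neuro CRF”) would produce this structure from the raw utterance; annotations in this form are what such models are trained on.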
Text-to-Speech Technology - Giving the Assistant a Voice
Input Text → Front-End Linguistic Processing → Back-End Signal Generation → Speech
– Front end: Text Preprocessor, Linguistic Preprocessor (uses Language data)
– Back end: Unit Selection or Model Synthesizer (uses Voice data from the Voice Talent)
“American Airlines and US
Airways have settled an anti-
trust suit with US regulators.
As part of the agreement,
which must still be approved
by a judge, the airlines will
give up slots at several US
airports.”
Selection of Voice Talent = TTS Voice = Assistant Persona
Or go for a celebrity / custom voice
Symbolic AI (around since the 1960s)
– Capturing “knowledge” in logical forms
– Ontologies play an important role “Pizza IS-A Italian
Dish”
– Enables reasoning and drawing conclusions
– Planning as an important technique
– Develop action plans to fulfill a user request
– Complex dialog behavior based on planning (rather
than fixed dialog strategies)
– Syntax parsing and semantic analysis beyond shallow
NLU
– Dependency trees as interface between Syntax and
Semantics
Example logical form: Intend(Sys, (Bel(Sys, x.close(User,x) cafe(x) tell(Sys, User, close(User,x))))
Example dependency tree: send (agent: I, obj: message, target: John)
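The ontology bullet (“Pizza IS-A Italian Dish”) and the reasoning bullet can be illustrated together: once IS-A facts are stored, new conclusions follow by transitivity. This is a minimal sketch. Only the Pizza fact comes from the slide; the other two edges and the function names are hypothetical.

```python
# Direct IS-A edges. Only the first one comes from the slide; the others are
# hypothetical additions so that there is something to infer.
IS_A = {
    "Pizza": {"Italian Dish"},
    "Italian Dish": {"Dish"},
    "Margherita": {"Pizza"},
}


def ancestors(concept, ontology=IS_A):
    """Return every concept reachable from `concept` via IS-A edges."""
    seen = set()
    stack = [concept]
    while stack:
        for parent in ontology.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def is_a(concept, category, ontology=IS_A):
    """True if `concept` IS-A `category`, directly or by transitivity."""
    return concept == category or category in ancestors(concept, ontology)


print(is_a("Margherita", "Italian Dish"))  # inferred, never stated directly
print(is_a("Dish", "Pizza"))
```

The same lookup is what lets an assistant offer “another Italian restaurant” when the requested pizza place is full.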
“Find a good charging station near the AMC”
NLU Output:
– Find the AMC near the driver’s location
– Find charging stations near the AMC returned by Fandango
– Combine the results from charge point with Yelp to get the highest rated stations
Big Knowledge & Semantic Routing
[Chart: scores on a 1–5 axis for DMA+SR, DMA, Siri and Google Now in the domains MovieTV and Business:Restaurant. Values as extracted: 3,87; 4,13; 1,8; 3,93; 3,7; 3,47; 2,9; 3,42]
Components: Big Knowledge Repository (BKR), BKR Ingestion, Knowledge Interface Layer / Semantic Router, OpenCyc
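The three-step plan for the charging-station request can be sketched as a chain of lookups against separate knowledge sources. Everything concrete here is hypothetical: the in-memory dictionaries merely stand in for the services named on the slide, and the coordinates, ratings, function names and search radius are invented.

```python
from math import dist

# Hypothetical stand-ins for the three sources named on the slide.
CINEMAS = {"AMC": [(2.0, 3.0), (40.0, 41.0)]}                    # cinema listings
CHARGERS = {"Station A": (2.5, 3.0), "Station B": (1.0, 3.5),
            "Station C": (39.0, 40.0)}                            # charging stations
RATINGS = {"Station A": 3.5, "Station B": 4.5, "Station C": 5.0}  # user ratings


def nearest_cinema(name, driver_location):
    """Step 1: find the named cinema closest to the driver."""
    return min(CINEMAS[name], key=lambda loc: dist(loc, driver_location))


def chargers_near(location, radius):
    """Step 2: find charging stations within `radius` of a location."""
    return [s for s, loc in CHARGERS.items() if dist(loc, location) <= radius]


def best_chargers_near_cinema(name, driver_location, radius=2.0):
    """Step 3: combine the results with ratings, best rated first."""
    cinema = nearest_cinema(name, driver_location)
    candidates = chargers_near(cinema, radius)
    return sorted(candidates, key=RATINGS.get, reverse=True)


print(best_chargers_near_cinema("AMC", driver_location=(0.0, 0.0)))
```

The routing problem proper, deciding which source can answer which sub-question, is hard-coded in this sketch; a semantic router would make that choice from the NLU output.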
Assistants that Make Sense…
“I don’t know what you mean by X, do you want me to do a web search?”
– The problem with many personal virtual assistants is that they are too
general: you don’t know what they can and cannot do for you
– Rather focus on specialized assistants that cover a well defined task
and have an intuitive scope:
– Automotive Assistant for the Driver
– Device-centric Assistants for the IoT
– Assistant for the Clinical Specialist
– Virtual Customer Service Agent
Acceptance of Conversational / Intelligent Experiences
A recent Nuance survey found:
– 89% prefer conversation with virtual assistants over search
– 73% prefer personalized conversations
– 83% want an alternative to PINs and passwords
– 90% would prefer voice biometrics over passwords or questions
Assisting the Driver Today: Dragon Drive in BMW 7 Series
The Future: ASR, NLU, and AI form the Automotive Assistant
User: “Book a table at Joe’s Pizza after my last meeting and let Tom and Brian know to meet me there”
Assistant: “Sorry, but there aren’t any tables open until 9pm. Would you like me to find you another Italian restaurant along your way home?”
Services involved: Calendar, Restaurant reservation, Address book, Messaging, Maps
Big Knowledge + Semantic Routing + Planning + Deep Syntax / Semantics + Dialog
Nuance “Mix” – Build your own IoT Assistant
– Self-service platform covering ASR and NLU
– ASR tools to allow customization of vocabulary
– NLU GUI tools for defining intents and named
entities (ontology)
– Fast creation of seed data and annotation of real data
– Push-button ML NLU model training
– Cloud based; create & test app for free; pay as you go
for real traffic
Assisting the Specialist: CA-CDI
– Computer-Assisted Clinical
Documentation Improvement
– Ensure diagnoses suggested by case
record are explicitly documented
– If not, submit a clarification request to
the physician / document author to
ensure proper subsequent coding and
thus (maximal) reimbursement
– Human CDSs (“Clinical Documentation
Specialists”) use 42 strategies
– Rule-based system in production
mimicking the strategies
– Now adding ML solutions on top:
Billing
Automated Assistant to the CDS
[Chart: F-score relative to baseline for Rules, Simple ML, DNNs on text, and DNNs on text and rule output]
Combining Rules and DNN ML
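The chart compares systems by F-score, which balances precision (how many flagged cases were right) against recall (how many of the right cases were flagged). A short sketch of the computation follows. The case IDs are invented, the numbers it prints are not results from the talk, and the set union is only a crude stand-in for the combination shown on the slide, which feeds rule output into the DNN.

```python
def f_score(predicted, gold, beta=1.0):
    """F-score of a set of predicted items against a gold-standard set."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Invented example: cases that truly need a clarification request, and the
# cases flagged by two hypothetical systems.
gold = {"case1", "case2", "case3", "case4"}
rules_only = {"case1", "case2", "case9"}
ml_only = {"case3"}
combined = rules_only | ml_only  # crude stand-in for a combined system

print(f"rules only: {f_score(rules_only, gold):.3f}")
print(f"combined:   {f_score(combined, gold):.3f}")
```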
– “Transfer 73€ to A. van Dijck”
– Uses ML-ASR for spoken input
– Uses ML-NLU to discover intent & named entities
– Building these apps takes a lot of manual work (albeit with great GUI tools):
– Define intents
– Collect and annotate data to train the NLU models per intent
– Define dialog strategies per intent
– Define the answers / answer strategies per intent
Assisting the Customer & the Customer Service Agent: Nina today
http://whatsnext.nuance.com/customer-experience/ing-intelligent-virtual-assistant-mobile-app/
Intent Named entities
Can we automate
this with ML?
Assisting the Customer & the Customer Agent
Nina & HAVA: User, Web Virtual Assistant, Hidden Agent
Channels: voice, chat, web, …
– User asks a question: “Has my check cleared”
– VA does not know the answer
– Hidden Agent supplies an answer: “yes your check has cleared”
– Apply ML to learn from Agent interventions and improve Automatic Agent
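The escalate-and-learn loop can be sketched as follows. This is a hypothetical illustration, not how Nina or HAVA work internally: a lookup table stands in for the ML model, and the agent intervention is recorded as an intent label (an assumption, since the slide only shows the agent supplying the answer). The class, method and intent names are invented.

```python
class VirtualAssistant:
    """Handles known questions; escalates unknown ones and remembers the outcome."""

    def __init__(self, ask_hidden_agent):
        self.ask_hidden_agent = ask_hidden_agent  # callable: question -> intent
        self.known_intents = {}                   # normalized question -> intent
        self.escalations = 0

    @staticmethod
    def normalize(question):
        """Lowercase, drop surrounding punctuation, collapse whitespace."""
        return " ".join(question.lower().strip(" ?!.").split())

    def classify(self, question):
        key = self.normalize(question)
        if key not in self.known_intents:
            # VA does not know: the hidden agent steps in, and the
            # intervention is stored as a new training example.
            self.known_intents[key] = self.ask_hidden_agent(question)
            self.escalations += 1
        return self.known_intents[key]


def hidden_agent(question):
    """Hypothetical human agent who labels every question the same way."""
    return "check_status"


va = VirtualAssistant(hidden_agent)
first = va.classify("Has my check cleared")    # escalated to the hidden agent
second = va.classify("has my check cleared?")  # answered from what was learned
print(first, second, va.escalations)
```

A trained classifier would generalize to differently worded questions, which the exact-match table here cannot do.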