lecture 2 architecture of virtual assistants and wwvw
TRANSCRIPT
Stanford CS224v Course: Conversational Virtual Assistants with Deep Learning
Lecture 2: Architecture of Virtual Assistants and WWvW
• What does an assistant do?
• How does Alexa build its 3rd-party platform?
• How do we build a platform that supports the full expressiveness of natural language? (Everything but NLP and speech)
• Decentralized WWvW
By Monica Lam and Giovanni Campagna
STANFORDLAM
Growth of Virtual Assistants
$3B US in 2019 -> $44B US in 2027 (CAGR 38%)¹
• 50 million in 2 years
• Internet: 50 million in 4 years
• 90M adults in the US have smart speakers
https://techcrunch.com/2018/03/07/47-3-million-u-s-adults-have-access-to-a-smart-speaker-report-says/
Intelligent Virtual Assistant Market
$3B US in 2019 -> $44B US in 2027 (CAGR 38%)¹
1. Smart Speakers: Alexa, Google Assistant
2. Chatbots / Robotic Process Automation
• BFSI (banks, financial institutions, insurance), Government, Hospitality, Travel, Retail, Automotive, Media & Entertainment, Education, Healthcare
¹ Valuates Reports
Alexa Commands
Party  Category                 Examples
1st    Personification          Chit chat, FAQ
1st    Interactive Clock Radio  Music, news, weather, clock
1st    IoTs                     TVs, security camera, lights, …
1st    Web                      Amazon services; Questions: Search, Wikipedia, …
3rd    Web                      General websites
Genie Commands
Party  Category                 Examples
1st    Personification          Chit chat, FAQ
1st    Interactive Clock Radio  Music, news, weather, clock
1st    IoTs                     TVs, security camera, lights, …
1st    Web                      Amazon services; Questions: Search, Wikipedia, …
3rd    Web                      General websites
Techniques
Category         Examples                        Technology
Personification  Chit chat, FAQ                  Chit chat, FAQ
Clock Radio      Music, news, weather, clock     Tasks
IoTs             TVs, security camera, lights    Tasks
Web              Questions: Search, Wikipedia    Alexa: Search; Genie: Search + Tasks
Web              General websites                Alexa: Task intents; Genie: Tasks
High-Level Architecture
[Diagram: Speech-to-text and Text-to-speech connect to a Dialogue Runtime. Runtime components: Chitchat, FAQ, Web Search, Task Intents (Amazon-specific), Tasks.]
Key Concepts
1. Assistant: Tasks, Chit-chat, FAQ, Open-domain questions
2. Alexa: 3rd-Party Dispatch Model
3. Genie: a Function Execution Model
4. Monitors and Function Composition
5. Abstract APIs
6. Decentralized WWvW
Alexa 3rd Party Design Goals
• Users must enable skills to avoid interference
• 3rd party commands do not degrade 1st party commands
• Multiple companies can provide same commands
• Shallow semantic parsing: intents/slots
• Fast iteration
• Simple connection to APIs (function invocation with parameters)
A second class citizen
Amazon 3rd Party Platform
Skill
• Examples: Open Capitol1; Launch Palo Alto Health
• Name of the company
• Identified by a fixed pattern (“open/launch/ask/tell X”)
Skill + intent
• Example: Ask U.S. Bank to get my checking account balance
• A choice of intents (API calls)
• Use a classifier: predict the most likely choice
Skill + intent + slots
• Example: Ask Ring Central to call 16501234567
• Slots: typed parameters to APIs
• Tag the words according to a set of (predefined) types
(NLP models to be discussed next week)
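The fixed invocation pattern can be sketched with two regular expressions. This is only an illustration of the idea, not Alexa's implementation: the function name `parseInvocation` and the exact patterns are assumptions, and a real platform would match the skill name against its registry of enabled skills.

```javascript
// Hypothetical sketch: split an utterance into a skill name and a request,
// using the fixed pattern "open/launch/ask/tell X".
function parseInvocation(utterance) {
  const text = utterance.trim();
  // "open X" / "launch X": the whole remainder is the skill name.
  const launch = /^(?:open|launch)\s+(.+)$/i.exec(text);
  if (launch) return { skill: launch[1], request: null };
  // "ask X to ..." / "tell X to ...": the skill name ends at the first " to ".
  const ask = /^(?:ask|tell)\s+(.+?)\s+to\s+(.+)$/i.exec(text);
  if (ask) return { skill: ask[1], request: ask[2] };
  return null; // not recognized as a 3rd-party invocation
}

console.log(parseInvocation('Ask Ring Central to call 16501234567'));
```

The request part would then go to the intent classifier and slot tagger. Utterances that do not follow the pattern (for example "Ask Yahoo what is the stock quote of AAPL") need a richer pattern than this sketch provides.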
Alexa 3rd Party: A Dispatch Model
[Diagram components: Speech-to-text, Intent Classification, Fulfillment, Text-to-speech]
User: Ask Yahoo what is the stock quote of AAPL?
Skill: When the market closed, Apple traded at $143.43 on the NASDAQ, up 0.34% since previous close.
Results are simply to be spoken
How to Build a Skill?
1. User Interface:
• Declare a list of intents and their associated slots
• Provide phrases for each intent (as many as possible)
• Choose type for each slot
• If slot is not predefined, provide phrases for new slot types
Example:
• Intents: SearchRestaurant(loc, cuisine), GetRestaurant(name)
• Phrases: “find me a good restaurant”, “find a restaurant in {loc}”
• Slot types: loc : Location; name : “Mc Donalds”, “Burger King”, ...; cuisine : “American”, “Italian”, ...
How to Build a Skill?
2. Fulfillment (backend)
• Implement HTTP server to be called by Alexa backend
• Extract intent and slots from HTTP request
• Map slots from natural language (“tomorrow”) to API value (“2021-09-23T00:00:00Z”)
• Call any necessary APIs
• Construct and return reply text to be spoken to the user
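The fulfillment steps above can be sketched as a plain function. This is a minimal sketch under stated assumptions: the request shape `{ intent, slots }`, the intent name `GetForecast`, and the helper `lookupForecast` are all hypothetical, and the HTTP server that would wrap the handler is left out.

```javascript
// Map a spoken date slot to an ISO 8601 timestamp (UTC midnight).
// Only "today" and "tomorrow" are handled in this sketch.
function resolveDate(spoken, now) {
  const day = new Date(Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate()));
  if (spoken === 'tomorrow') day.setUTCDate(day.getUTCDate() + 1);
  else if (spoken !== 'today') throw new Error(`unsupported date: ${spoken}`);
  return day.toISOString().replace('.000Z', 'Z');
}

// Handle one parsed request and return the text to be spoken.
// Hypothetical request shape: { intent: string, slots: { loc, date } }.
// lookupForecast stands in for "call any necessary APIs".
function handleRequest(request, now, lookupForecast) {
  if (request.intent !== 'GetForecast') return 'Sorry, I cannot help with that.';
  const date = resolveDate(request.slots.date || 'today', now);
  const status = lookupForecast(request.slots.loc, date);
  return `The forecast for ${request.slots.loc} is ${status}.`;
}
```

Note that the skill returns only text: the typed values it computed along the way (the date, the status) are not visible to the assistant.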
Key Concepts
1. Assistant: Tasks, Chit-chat, FAQ, Open-domain questions
2. Alexa: 3rd-Party Dispatch Model
3. Genie: a Function Execution Model
4. Monitors and Function Composition
5. Abstract APIs
6. Decentralized WWvW
Genie Tasks: A Function Execution Model
[Diagram components: Speech-to-text, Semantic Parsing, ThingTalk Execution, Policy, Language Generation, Text-to-speech]
User: What is the stock quote of AAPL?
Execution result: Current: $143.43; Last close: $142.94; Exchange: Nasdaq
Agent: Apple is trading at $143.43 on the NASDAQ, up 0.34% since previous close.
Execution vs. Dispatch Models
Execution Model | Dispatch Model
Used by Genie for 1st and 3rd party skills | Used by Alexa & Google Assistant on 3rd party skills
Declaration of full signature of functions: typed inputs and typed return results | Declaration of typed inputs only
Skill returns typed results in ThingTalk | Skill returns display text
Supports composition: return results can be used as inputs | Does not support composition
Response generator can adjust output according to the context | Canned output per intent
Genie Function Model
• Two kinds of grounding primitives
• Queries: Return a query result (DB or API); No side effect
• Actions: Perform an action, possibly returning a result. With potential side effect
         | Natural Language          | API Signatures
Queries  | get tweets from Stanford  | home_timeline(), … author == “Stanford”
Actions  | tweet “Stanford won!”     | post(status)
Basic Dialogue (1): Slot filling
User: What is the weather?
(There are too many results!)
Agent: Which city?
User: Palo Alto
(Invoke API with “Palo Alto”)
• City is a required parameter, often called a slot.
• Automatic “slot filling”
Basic Dialogue (2): Confirmation
User: Call Ethan Chi
Agent: I’ll call Ethan Chi, OK?
User: OK
• Actions have side effects
• Should be confirmed by default
Basic Dialogue (No Code)
• Slot filling
  • Function signature declares required parameters
  • Agent fills slots when they are missing
• Confirmation
  • Agent should confirm actions by default
  • This can be tedious
  • Explicit special cases: playing a song
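Slot filling and default confirmation can both be driven from the function signature alone, which is why no per-skill dialogue code is needed. The sketch below is an illustration only: the `signatures` table, its `isAction` flag, and `nextStep` are hypothetical names, not Genie's actual data structures.

```javascript
// Hypothetical signature format: required parameter names plus an
// isAction flag (actions have side effects, queries do not).
const signatures = {
  weather: { required: ['city'], isAction: false },
  call: { required: ['contact'], isAction: true },
};

// Decide the agent's next move for a partially filled command.
function nextStep(command) {
  const sig = signatures[command.name];
  // Slot filling: ask for the first required parameter that is missing.
  const missing = sig.required.find((p) => !(p in command.params));
  if (missing) return { type: 'ask', slot: missing };
  // Confirmation: actions are confirmed by default before executing.
  if (sig.isAction && !command.confirmed) return { type: 'confirm' };
  return { type: 'execute' };
}

console.log(nextStep({ name: 'weather', params: {} }));
```

A special case such as playing a song could be expressed as one more flag on the signature that skips the confirmation step.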
Skills
• Skills: grounding of words in queries and actions
• Manifest: definition of a skill
1. User interface: Mapping of words to queries and actions
2. Fulfillment: Implementing queries and actions
To Build a Skill
1. User interface
• Declare function signature
• Canonical natural language
  • Enough to create an agent (but it is clunky)
• Customized prompts and results
Skill Signature: APIs
class @org.thingpedia.weather {
  query forecast(in req location : Location,
                 in opt date : Date,
                 out status : Enum(sunny,raining,cloudy),
                 out temperature : Measure(C),
                 out wind_speed : Measure(mps));
}
Function type:
• query: no side effect, can be combined like DB query
• action: side effect, performed one at a time
Parameters:
• inputs (required or optional): user-supplied values
• outputs: values returned; can filter on outputs
Precise, normalized types:
• “tomorrow” → “2021-09-23T00:00:00Z”
• “clear sky” → enum sunny
• “70 F” → 21.1 C
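Normalizing values to the declared types is ordinary conversion code. The sketch below is an assumption-laden illustration: `normalizeTemperature`, `normalizeStatus`, and the phrase table are hypothetical and cover only the temperature and enum cases.

```javascript
// Convert a spoken temperature such as "70 F" to the declared unit (C),
// rounded to one decimal place.
function normalizeTemperature(text) {
  const match = /^(-?\d+(?:\.\d+)?)\s*([CF])$/i.exec(text.trim());
  if (!match) throw new Error(`cannot parse temperature: ${text}`);
  const value = Number(match[1]);
  const celsius = match[2].toUpperCase() === 'F' ? (value - 32) * 5 / 9 : value;
  return Math.round(celsius * 10) / 10;
}

// Map free-text weather descriptions onto the declared enum values.
// The phrase table is an illustrative assumption.
const STATUS = { 'clear sky': 'sunny', 'light rain': 'raining', 'overcast': 'cloudy' };
function normalizeStatus(text) {
  const status = STATUS[text.toLowerCase()];
  if (!status) throw new Error(`unknown status: ${text}`);
  return status;
}
```

Because every skill returns values in the same normalized types, filters such as "temperature above 20 C" work across skills without per-skill conversion logic.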
Skill Signature: For DB Queries
class @org.schema {
  entity restaurant;
  entity review;
  query restaurant(out id : Entity(restaurant),
                   out geo : Location,
                   out reviewCount : Number,
                   …);
  query review(out id : Entity(review), …);
}
• entity: a database table
• one query per table
• parameters
  • No input parameters
  • Only outputs (filter on outputs)
  • id: parameter of entity type
Skill: Canonical Natural Language
class @org.thingpedia.weather {
  query forecast(in req location : Location,
                 #_[canonical=[{
                   base=["location", "place"],    // what location are you interested in?
                   preposition=["for #", "at #"]  // get forecast for Stanford
                 }]],
                 …)
  #_[canonical=["forecast", "weather forecast"]];
}
• parameter canonical form: how to use the parameter in a sentence, in different parts of speech
• function canonical form: basic way to refer to a function (noun for query, verb for action)
Skill: Customized Interface
class @org.thingpedia.weather {
  query forecast(in req location : Location,
                 #_[prompt="for what location would you like the forecast?"],
                 …)
  #_[result="It will be ${status} on ${date} in ${location}"]
}
• #_[ ]: User-supplied prompt and result text
• ${ }: Placeholders
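Filling the `${ }` placeholders amounts to a string substitution over the typed result. The helper below is a hypothetical sketch (`fillTemplate` is not a Genie API) that shows the mechanism.

```javascript
// Replace each ${name} placeholder with the matching field of a result.
// Placeholders with no matching field are left untouched.
function fillTemplate(template, values) {
  return template.replace(/\$\{(\w+)\}/g, (whole, name) =>
    (name in values ? String(values[name]) : whole));
}

const reply = fillTemplate('It will be ${status} on ${date} in ${location}', {
  status: 'sunny',
  date: 'Thursday',
  location: 'Palo Alto',
});
console.log(reply);
```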
To Build a Skill
1. User Interface
2. Fulfillment
• Performing the function
• Managing accounts
Fulfillment: DB Queries
const Tp = require('thingpedia');
module.exports = class DB extends Tp.BaseDevice {
  // compile() and executeSQL() are skill-specific helpers (not shown on the slide)
  async query(query, env) {
    const sql = compile(query);
    return executeSQL(sql);
  }
};
• Tp: Thingpedia (standards repository)
• query (method): single entry point for DB
• query (argument): ThingTalk query as abstract syntax tree
• env: ThingTalk runtime (command ID, session ID)
• Compile: ThingTalk to DB query (e.g. SQL, SPARQL)
• Execute: Execute the query and return the results
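To make the compile step concrete, here is a toy compiler from a filter list to parameterized SQL. The query shape `{ table, filters, limit }` is a much-simplified stand-in invented for this sketch; the real ThingTalk abstract syntax tree is richer and is not reproduced here.

```javascript
// Hypothetical, simplified query shape:
//   { table: string, filters: [{ field, op, value }], limit?: number }
// Values are passed as parameters rather than spliced into the SQL text.
function compileToSql(query) {
  const params = [];
  const where = query.filters.map(({ field, op, value }) => {
    params.push(value);
    return `${field} ${op} ?`;
  });
  let sql = `SELECT * FROM ${query.table}`;
  if (where.length > 0) sql += ` WHERE ${where.join(' AND ')}`;
  if (query.limit !== undefined) sql += ` LIMIT ${Number(query.limit)}`;
  return { sql, params };
}

console.log(compileToSql({
  table: 'restaurant',
  filters: [{ field: 'reviewCount', op: '>=', value: 100 }],
  limit: 5,
}));
```

In a real skill the table and field names would be checked against the declared signature before being placed in the SQL string.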
Fulfillment: Standard Protocols
• Import loader brings in library loader (a fulfillment library)
• RSS: Really Simple Syndication
  • Standardized content distribution method for news, blogs
class @com.washingtonpost {
  import loader from @org.thingpedia.rss();
  …
}
This is an RSS feed; it needs only the RSS URL
Fulfillment: JavaScript (JS) Code
const Tp = require('thingpedia');
module.exports = class Weather extends Tp.BaseDevice {
  // declared inputs arrive as the first argument
  async get_forecast({ location, date }, env) {
    // make HTTP request
    // return declared outputs
    return [{
      status: …,
      temperature: …,
      wind_speed: …
    }];
  }
};
Accounts
• Many skills need accounts: Spotify, email, IoT, FB
• Common authentication scheme: OAuth
  • IETF (Internet Engineering Task Force) RFC (Request for Comments) 6749
  • Lets a third-party application obtain limited access to an HTTP service
  • User is redirected to login page on 3rd party website
  • Then redirected back to Genie with auth material
OAuth in Genie
class @com.spotify {
  import config from @org.thingpedia.oauth2(
    authorize="https://accounts.spotify.com/authorize"^^tt:url,
    get_access_token="https://accounts.spotify.com/api/token"^^tt:url,
    get_profile="https://api.spotify.com/v1/me"^^tt:url,
    profile=["id", "display_name", "product"]
  );
  …
}
• import config: how the user configures the skill
• ^^: RDF syntax for assigning a type to a string
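The redirect to the login page is the authorization request of the OAuth 2.0 authorization code flow (RFC 6749, section 4.1.1). The sketch below builds that URL with the standard `URL` class; `buildAuthorizeUrl` and the client ID, redirect URI, and state values are placeholders, and this is not the code of the `@org.thingpedia.oauth2` module.

```javascript
// Build the URL the user is redirected to in the first step of the
// OAuth 2.0 authorization code flow. clientId, redirectUri and state
// are placeholders supplied by the caller.
function buildAuthorizeUrl(authorizeEndpoint, clientId, redirectUri, state) {
  const url = new URL(authorizeEndpoint);
  url.searchParams.set('response_type', 'code');
  url.searchParams.set('client_id', clientId);
  url.searchParams.set('redirect_uri', redirectUri);
  url.searchParams.set('state', state); // echoed back, guards against CSRF
  return url.toString();
}

console.log(buildAuthorizeUrl(
  'https://accounts.spotify.com/authorize',
  'my-client-id',
  'https://example.com/callback',
  'random-state',
));
```

After the user logs in, the service redirects back with a code that is exchanged for an access token at the `get_access_token` endpoint.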
Execution
[Diagram components: User, User Interface, Execution, Account Manager, Thingpedia, Spotify]
User: Find my favorite songs by Taylor Swift
1. command
2. thingtalk: @com.spotify.get_item_from_library(), contains(artists, "spotify:artist:06HL4z0CvFAxyc27GXpf02"^^com.spotify:artist("Taylor Swift"))
3. get my “spotify”
4. get spotify interface
5. return spotify interface
6. OAuth
7. return authenticated spotify instance
8. interface with my Spotify
9. return result
10. show answer
Key Concepts
1. Assistant: Tasks, Chit-chat, FAQ, Open-domain questions
2. Alexa: 3rd-Party Dispatch Model
3. Genie: a Function Execution Model
4. Monitors and Function Composition
5. Abstract APIs
6. Decentralized WWvW
Monitors
• Get notified when some query satisfies the condition
• Amazon has some hard-coded monitors
  • e.g. Notifications from Amazon deliveries
• Genie allows monitoring of every query
Monitoring is Useful
Query: Is there a Covid appointment available?
Monitor: Tell me when there is a Covid appointment
Query: Is it going to rain tomorrow?
Monitor: Let me know when rain is forecast the next day
Query: Is there motion on the security camera?
Monitor: Tell me if motion is seen on the security camera
Query: Is the AAPL stock below $100?
Monitor: Tell me when the AAPL stock drops to $100
Implementing Monitors
• Polling: The execution engine issues queries at a specified interval
• Push notifications
  • The client gets a push when there is new data
  • Web hooks: API server makes HTTP request to client
    • Efficient, but cloud-to-cloud only
  • Streaming (Web sockets): keep an open connection from client to server
    • Works from any device, but keeping the connection open might be expensive
• Monitoring functions are useful, but expensive
  • Affordable on a personal, always-connected device (smart speaker)
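Polling turns any query into a monitor by diffing successive results. The sketch below is a minimal illustration under assumptions: `newResults` and `monitor` are hypothetical names, rows are compared by a caller-supplied key, and error handling for failed polls is omitted.

```javascript
// Return the rows of the latest poll that were not in the previous poll.
function newResults(previous, latest, keyOf) {
  const seen = new Set(previous.map(keyOf));
  return latest.filter((row) => !seen.has(keyOf(row)));
}

// Poll `query` every `intervalMs` and call `notify` for each new row.
// The first poll reports every row as new. Returns a function that
// stops the monitor.
function monitor(query, keyOf, notify, intervalMs) {
  let previous = [];
  const timer = setInterval(async () => {
    const latest = await query();
    newResults(previous, latest, keyOf).forEach(notify);
    previous = latest;
  }, intervalMs);
  return () => clearInterval(timer);
}
```

The cost noted above is visible here: every active monitor issues one query per interval whether or not anything changed.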
IFTTT: Simple Function Composition
• A crowdsourced website; supports If-this-then-that routines with APIs
• Users can create IFTTT routines
  • 250,000 unique recipes
  • 90M activated IFTTT rules
• Simple but powerful paradigm: many existing APIs
• Limitations
  • Proprietary and centralized: user must share credentials
  • GUI user interface: no natural language input, no formal language representation
• A skill in Alexa, Google Assistant (configured on the web)
ThingTalk Compound Statement
• When: when the command is to be executed (when clause)
• Get: a query (noun)
• Do: call an action (verb)
“every day at 9am play songs by Taylor Swift”
attimer(time=9:00) => @com.spotify.song(), contains(artists, "taylor swift") => @com.spotify.play(playable=id)
General form: [when =>]? get, filter => do
Composition in Genie
• Monitor-action
  • If the AAPL stock goes to $100, buy 1 share
  • If it is going to rain tomorrow, turn off the sprinkler
  • Whenever I update my FB profile, update my Twitter profile
• Query-action
  • Get all FB photos posted by Jackie, send them to grandma
• Monitor-query-action
  • If the AAPL stock goes to $100, get my checking account balance; if it is greater than $1000, buy 1 share
Long-Tail Functions
[Figure: example commands about a patient, Bob]
• people (Dr. Smith): “if Bob’s peak flow-meter drops below 180 L/min let me know”
• environment (Dr. Smith): “when the air quality index is above 50 and Bob is running, warn him”
• location: “Let my Dad know if I am at the hospital”
• devices: “when I use my inhaler, record my GPS location in logfile on Box”
Natural Language Programming
• When the air quality index is above 500, and I am running, send me an SMS.
• If I get taken to a hospital, let my dad know.
• When I use my inhaler, get my location, and save it to Dropbox.
• If my heart rate is above 130, and I am not running, remind me to take a deep breath.
WHEN [FILTERS] → GET [FILTERS] → DO
FILTERS: =, <, >, <=, >=, <>, contains, starts with, ends with
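The GET [FILTERS] → DO part of this construct can be sketched as a filter-then-act pipeline over query results. This is an illustration only: the `OPERATORS` table, the filter shape `{ field, op, value }`, and `getFilterDo` are assumptions of the sketch, not the ThingTalk runtime.

```javascript
// The filter operators listed above, as predicates on (actual, expected).
const OPERATORS = {
  '=': (a, b) => a === b,
  '<>': (a, b) => a !== b,
  '<': (a, b) => a < b,
  '>': (a, b) => a > b,
  '<=': (a, b) => a <= b,
  '>=': (a, b) => a >= b,
  'contains': (a, b) => a.includes(b), // arrays or strings
  'starts with': (a, b) => a.startsWith(b),
  'ends with': (a, b) => a.endsWith(b),
};

// GET [FILTERS] -> DO: keep rows passing every filter, then act on each.
function getFilterDo(rows, filters, action) {
  return rows
    .filter((row) => filters.every(({ field, op, value }) => OPERATORS[op](row[field], value)))
    .map(action);
}

// "If my heart rate is above 130, and I am not running, remind me ..."
const readings = [
  { heartRate: 140, running: false },
  { heartRate: 120, running: false },
  { heartRate: 150, running: true },
];
const reminders = getFilterDo(
  readings,
  [{ field: 'heartRate', op: '>', value: 130 }, { field: 'running', op: '=', value: false }],
  (row) => `take a deep breath (heart rate ${row.heartRate})`,
);
console.log(reminders);
```

The WHEN part would wrap this pipeline in a monitor, so that it runs whenever the underlying query result changes.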
NL Programming in ThingTalk
• Power
• Grounding: Many skills (primitive actions, queries/monitors)
• Composition: through a simple, high-level construct
• Events, function composition, type-based filters
• Note: no variable declarations, no assignments!
• Mastering composition is key to NLU
Key Concepts
1. Assistant: Tasks, Chit-chat, FAQ, Open-domain questions
2. Alexa: 3rd-Party Dispatch Model
3. Genie: a Function Execution Model
4. Monitors and Function Composition
5. Abstract APIs
6. Decentralized WWvW
Abstract APIs
• IoTs of a type share mostly the same queries and actions
• E.g. Lights!
• We can define the superset
• Functions a device does not support can be declared as unsupported
• Create a skill once for an abstract API
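The superset idea can be sketched with a base class whose methods all report "unsupported" until a concrete device overrides them. The class and method names below are hypothetical and are not taken from Home Assistant or Thingpedia.

```javascript
// Abstract API for lights: the superset of functions any light may offer.
// Functions a device lacks are reported as unsupported rather than omitted.
class AbstractLight {
  turnOn() { return this.unsupported('turnOn'); }
  turnOff() { return this.unsupported('turnOff'); }
  setColor(color) { return this.unsupported('setColor'); }
  unsupported(name) {
    return { ok: false, reason: `${name} is not supported by this device` };
  }
}

// A hypothetical on/off-only bulb implements just part of the superset.
class BasicBulb extends AbstractLight {
  constructor() { super(); this.on = false; }
  turnOn() { this.on = true; return { ok: true }; }
  turnOff() { this.on = false; return { ok: true }; }
}

const bulb = new BasicBulb();
console.log(bulb.turnOn(), bulb.setColor('red'));
```

The natural-language skill is written once against `AbstractLight`; each new device only supplies the overrides it can actually perform.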
Home Assistant
• Top 10 most popular open-source projects, 100,000 users
• Crowdsourced device APIs used by community
• 1000+ devices, SmartThings, Apple HomeKit hubs
• 26 abstract APIs
• Genie: Bundled voice assistant released with Home Assistant
  • Current: 15 skills to abstract APIs, 700+ devices
• Agnostic to assistant compatibility (Alexa, Google Assistant, Siri)
Local IoT gateway with a GUI interface
Skills / Abstract APIs | 1000+ devices (examples)
Lights                 | Philips, GE, TP-Link, LIFX
Door Locks             | Yale, Schlage, Samsung, Kwikset
Thermostats            | Nest, Honeywell, Emerson, LUX
Fans                   | Haiku, Hunter, Bond, Honeywell
Language Crosses Device & Type Boundaries
• Turn on the light – which light?
  • The user gives names to the light: family room, kitchen
• What is the battery level?
  • Is it the smoke alarm? Or the car?
• Pause
  • What is being played
• Context is needed at both parsing and execution time
  • Parser time: User-defined names → parameters in parsing
  • Execution time: Disambiguate across devices in the house
Other Abstract APIs?
Key Concepts
1. Assistant: Tasks, Chit-chat, FAQ, Open-domain questions
2. Alexa: 3rd-Party Dispatch Model
3. Genie: a Function Execution Model
4. Monitors and Function Composition
5. Abstract APIs
6. Decentralized WWvW
Decentralization of Voice
[Diagram: two stacks compared]
• Proprietary Stack (Closed AI): 3rd Party Skills; Assistant: Alexa, Google Assistant
• Open System: WWvW using Thingpedia; Assistant: Genie, …
Existing Standards
• Data Types: Wikidata Types
• Public structured knowledge bases
  • Wikidata: 100M data facts
  • Schema.org: used by 30% of webpages for search engines
• Abstract APIs
  • RSS
  • Home Assistant
(Amazon, Apple, Google Assistant have proprietary standards)
Thingpedia: Open Resources
[Figure: the Genie Reference Assistant stack. Layers: Pretrained Agents, Skills, Abstract APIs; each layer spans the domains Wikidata, IoTs, Music, Restaurant, News (RSS). Skill labels: 700+, Spotify, Yelp, Smartnews. Source labels: Home Assistant, Spotify, Schema.org, Yelp. Legend: Standard, Newly Proposed.]
No-Training Agent
• User-interface: Inherit an abstract user-interface
  • Spotify as an instance of the generic media-source
class @com.spotify
extends @org.thingpedia.media-source, @org.thingpedia.media-player {
  import loader …;
  import config …;
}
• Fulfillment: map abstract API to concrete API
Decentralized WWvW
• Each company owns its agent
  • E.g. by using Thingpedia
• Each company hosts agent at a “well-known URI”
  • Web-based service discovery mechanism (RFC 8615)
  • https://spotify.com/.well-known/wwvw/manifest.tt
• User navigation
  • User specifies the company by name or locates it via a search engine
  • Virtual assistant visits the well-known URI
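Deriving the manifest location from a company's domain is a one-line URL construction. RFC 8615 reserves the `/.well-known/` path prefix; the `wwvw/manifest.tt` suffix is taken from the example above, and `manifestUrl` is a hypothetical helper name.

```javascript
// Build the well-known manifest URL for a company's domain.
function manifestUrl(domain) {
  return new URL('/.well-known/wwvw/manifest.tt', `https://${domain}`).toString();
}

console.log(manifestUrl('spotify.com'));
```

An assistant would fetch this URL after resolving the company name to a domain, for example through a search engine.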
Conclusion: WWvW
[Diagram: two stacks compared]
• Proprietary Stack (Closed AI): 3rd Party Skills; Assistant: Alexa, Google Assistant
• Open Decentralized System: WWvW using Thingpedia (abstract APIs, skills, & pretrained agents); Assistant: Genie, …
• ThingTalk: expressiveness of NL (monitors and composition)
Conclusion
1. Assistant: Tasks, Chit-chat, FAQ, Open-domain questions
2. Alexa: 3rd-Party Dispatch Model
3. Genie: a Function Execution Model
4. Natural language programming: Monitors & Function Composition
5. Abstract APIs
6. Decentralized WWvW
Simple DB Queries (1 Table)
class @com.yelp {
  entity restaurant;
  query restaurant(out id : Entity(restaurant),
                   out geo : Location,
                   out reviewCount : Number,
                   …);
}
• Entity: a database table
• one query per table
• Parameters
  • No input parameters
  • Only outputs (filter on outputs)
  • id: parameter of entity type
General DB Queries (w/ joins)
class @org.schema {
  entity restaurant;
  entity review;
  query restaurant(out id : Entity(restaurant),
                   out geo : Location,
                   out reviewCount : Number,
                   …)
  #[handle_thingtalk=true];
  query review(out id : Entity(review), …)
  #[handle_thingtalk=true];
}
• #[handle_thingtalk=true]: declares that the skill will receive the whole ThingTalk query, without compiling it to JavaScript first
Simple Database Queries
const Tp = require('thingpedia');
module.exports = class Yelp extends Tp.BaseDevice {
  // one Thingpedia query = one JS method, an async iterable (coroutine)
  async *get_restaurant({ }, env, { filter, projection, sort, limit }) {
    // retrieve data according to filter, sort, limit (retrieval not shown)
    for (const restaurant of data) {
      yield [{
        id: …,
        geo: …,
        reviewCount: …
      }];
    }
  }
};
• input param list is empty: { }
• instead, receives the ThingTalk DB operators applied to the result of the query
• return declared outputs, lazily
• method will be stopped when enough results have been returned to the user
• one Thingpedia query = one JS method, async iterable (coroutine)
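The lazy behaviour described here is standard for JavaScript async generators: when the consumer breaks out of a `for await` loop, the generator is stopped and produces nothing further. The stand-alone sketch below demonstrates this with hypothetical names (`restaurants`, `take`, `counter`) and an in-memory data source instead of a real API.

```javascript
// A stand-in data source; a real skill would page through an API or DB.
async function* restaurants(rows, counter) {
  for (const row of rows) {
    counter.produced += 1; // track how many rows were actually produced
    yield row;
  }
}

// Consume at most `limit` results, then stop the generator early.
async function take(iterable, limit) {
  const results = [];
  if (limit <= 0) return results;
  for await (const item of iterable) {
    results.push(item);
    if (results.length >= limit) break; // break ends the generator via return()
  }
  return results;
}
```

With three rows available and a limit of two, only two rows are ever produced, which is what makes lazily yielding results cheaper than building the full list first.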
Other abstract APIs and Skills?
• Restaurants
• Hotels
• Banking
• Customer support
• Tools that can generate agents from: APIs and DBs