The Future of Search: How Measuring Satisfaction Will Enhance Our Personal AIs and Our Lives
Heidi Young, VP of Engineering, Ozlo
Seattle Interactive 2016


TRANSCRIPT

Page 1:

The Future of Search: How Measuring Satisfaction Will Enhance Our Personal AIs and Our Lives

Heidi Young, VP of Engineering

Ozlo

Page 2:

Who am I?

Search Junkie, Data Scientist, Engineer

Currently building Ozlo!!!

Page 3:

What is Ozlo?

Next generation assistant

Ozlo is leveraging artificial intelligence, machine learning, and natural language processing to power the next generation of search.

Ozlo is in the early stages of learning to understand a wide range of human goals and activities, and the words and ideas that connect them, so it can help users find what they actually need.

Page 4:

AI Assistant and Chatbot Landscape

Assistants (e.g. Siri)

Platforms for exposing chatbots (e.g. Alexa Skills Store, Bot Store, Skype Bot Store)

Building a chatbot or assistant

Page 5:

AI Assistant and Chatbot Landscape

https://twitter.com/ashevat/status/786690547733889024/photo/1

Page 6:

AI Assistant and Chatbot Landscape

https://twitter.com/davidjbland/status/725119174368976897

Page 7:

Why all the hype then?

We've moved to mobile, where messaging is the natural method of communication.

We're moving to connected smart devices and expect our interactions to be natural to our surroundings.

Page 8:

Why all the hype then?

There's a good chunk of information-seeking tasks that search engines don't handle well in their current form.

Say wha?

And they aren't the really hard ones that you're thinking of (i.e. research travel, buy a house).

Page 9:

Conversational UI

Page 10:

Why is conversational a better experience?

It isn't, for a lot of things.

Alexa, buy me some pants

I can’t buy pants. So I’ve added it to your shopping list.

😒

I want to order a pizza

Great! What kind of toppings would you like?

Pepperoni and sausage with extra cheese

And what kind of crust?

Thin crust

What size pizza would you like?

😒

On average: 73 taps with a conversational UI vs. 16 taps with a conventional filtering UI

Page 11:

Why is conversational a better experience?

It isn't, for a lot of things:

• Rich, robust filtering
• A highly visual experience
• A lot of variety

Page 12:

Answer? The most natural interaction for the task.

The bar should be:

What kind of response would you expect from a really knowledgeable friend?

Are there any good movies playing?

Here are some: …

Anything more kid friendly?

How about these? …

Which of these is playing around 9pm?

This is the only one playing around 9pm, near you…

Great! Can you get me a ticket?

Here’s a link to buy it on Fandango

Page 13:

Information Task Modes

Remember
• Simple facts
• Simple 1-2 sentence answers
• Clean, cut, dried

Understand
• Obtaining knowledge from a multitude of sources
• Constructing meaning from different content sources

Analyze
• Breaking material into constituent parts
• Determine relationships
• Make decisions

https://www.microsoft.com/en-us/research/wp-content/uploads/2015/08/fp286-bailey.pdf

Page 14:

Information Task Modes

https://www.microsoft.com/en-us/research/wp-content/uploads/2015/08/fp286-bailey.pdf

In typical web search tasks, users have expectations for how many queries they'll issue and how many documents they'll review.

Page 15:

Back to that hype thing…

https://www.microsoft.com/en-us/research/wp-content/uploads/2015/08/fp286-bailey.pdf

Chatbots and AIs of today are primarily focused on stuff that's pretty easy to get with an existing app or search engine.

But our expectation is that they can do these harder tasks.

Page 16:

Understand or Analyze Type of Task

What’s a good place to watch the game nearby?

A point of interest

That is rated highly, or is popular, or is known for this type of task

Implies a sports bar, or a point of interest that typically has a television with sports available

Close to your current location

Depending on where you're located, this could mean within walking distance or a 20-minute drive, depending on the density of POIs and the sparsity of available content

VERY IMPORTANT!!!

There is not ONE right answer to this question

It is a subjective question. Depending on your content sources, results can vary widely.

It requires a lot of synthesis across multiple sources, and likely means presenting multiple sources, not a definitive answer.
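
To make that interpretation step concrete, here is a minimal sketch of how a query like "a good place to watch the game nearby" might be decomposed into structured constraints. The class, field names, and thresholds are illustrative assumptions, not Ozlo's actual representation.

```python
from dataclasses import dataclass

@dataclass
class PlaceIntent:
    """Hypothetical structured reading of 'a good place to watch the game nearby'."""
    poi_categories: list      # e.g. ["sports bar", "restaurant with TVs"]
    quality_signals: list     # e.g. ["highly rated", "popular", "known for watching sports"]
    max_distance_km: float    # "nearby" expands when POIs or available content are sparse

def interpret_query(poi_density_per_km2: float) -> PlaceIntent:
    # "Nearby" is relative: walking distance in a dense area,
    # roughly a 20-minute drive where POIs or content are sparse.
    radius_km = 1.5 if poi_density_per_km2 > 50 else 15.0
    return PlaceIntent(
        poi_categories=["sports bar", "restaurant with TVs"],
        quality_signals=["highly rated", "popular", "known for watching sports"],
        max_distance_km=radius_km,
    )
```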

Page 17:

What you really want

[Slide graphic: a mixed result set (Place A: great sports bar nearby; Place B: romantic restaurant nearby; Place C: coffeeshop nearby) with the poor fits crossed out, versus a result set where every option fits (Place A: great sports bar nearby; Place D: restaurant known for sports and TVs)]

Page 18:

Some existing experiences

Alexa

Google Assistant via Allo

Page 19:

What might a good experience look like?

Present evidence as to why those are good options

Present multiple options, but not so many that it’s overwhelming

Establish that you were heard and that it understood what you actually meant (i.e. sports bars, nearby)

Offer the most likely refinements and follow-on prompts (a rough sketch of such a response follows below)
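
As a rough illustration only (these class and field names are assumptions, not Ozlo's API), a response that meets the criteria above might be structured like this:

```python
from dataclasses import dataclass

@dataclass
class Option:
    name: str
    evidence: str          # why this is a good option, e.g. "known for game-day crowds, 12 TVs"

@dataclass
class AssistantResponse:
    acknowledgment: str    # shows the user was heard and understood
    options: list          # a handful of Option objects, not an overwhelming list
    refinements: list      # likely follow-on prompts

response = AssistantResponse(
    acknowledgment="Here are sports bars near you that show the game:",
    options=[
        Option("Place A", "Highly rated sports bar, 0.4 miles away"),
        Option("Place D", "Restaurant known for sports and TVs"),
    ],
    refinements=["Only walking distance", "Which ones take reservations?"],
)
```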

Page 20:

Successful Measurement of Conversational UIs

Page 21:

To measure, we must understand

The National Communication Association publishes a rating scale to assess skills in interpersonal settings during conversation.

1 = Inadequate: awkward, disruptive, leaving a negative impression
5 = Excellent: smooth, controlled, leaving a positive impression

Attentiveness: attention to, and concern for, the conversational partner

Composure: confidence, assertiveness

Expressiveness: articulation, animation, variation

Coordination: non-disruptive negotiation of speaking turns

Page 22:

What do REAL messaging conversations look like?

New vs. continuing conversations

Identifying satisfaction of each sub-conversation

Page 23:

How we think about things at Ozlo

Negative conversations

Bottom line: How did the conversation end?

Negative indicators, implicit AND explicit

We:

1. Identify conversation boundaries
2. Assign a positive or negative assessment to each interaction
3. Mark the conversation as negative if it "ended" negatively (see the sketch after this list)
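
A minimal sketch of that three-step pipeline, assuming a simple time-gap heuristic for conversation boundaries. The helper names, the 30-minute gap, and the placeholder assessment logic are illustrative assumptions, not Ozlo's implementation.

```python
from datetime import timedelta

SESSION_GAP = timedelta(minutes=30)  # assumed heuristic: a long silence starts a new conversation

def split_conversations(messages):
    """Step 1: identify conversation boundaries by time gap between messages."""
    conversations, current = [], []
    for msg in messages:  # each msg: {"ts": datetime, "text": str, ...}
        if current and msg["ts"] - current[-1]["ts"] > SESSION_GAP:
            conversations.append(current)
            current = []
        current.append(msg)
    if current:
        conversations.append(current)
    return conversations

def assess_interaction(msg) -> bool:
    """Step 2: positive/negative assessment of a single interaction (placeholder logic)."""
    return not msg.get("explicit_negative", False)

def ended_negatively(conversation, last_n=3) -> bool:
    """Step 3: the conversation is negative if it ended on negative interactions."""
    return any(not assess_interaction(m) for m in conversation[-last_n:])
```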

Page 24:

What's a negative-ending conversation?

Conversations that contain one of the following in the last N messages of the interaction:

1. Explicit negative feedback

2. Highly latent

3. Not well understood

4. No follow on


Page 25:

What's a negative-ending conversation?

Negative ending: Explicit negative feedback
Specific signal: Thumbs down
Roughly maps to NCA ratings for: Composure (i.e. didn't understand, results could be better), Attentiveness (i.e. oddly worded response, didn't understand), Expressiveness (i.e. oddly worded response)

Negative ending: Highly latent
Specific signal: Response takes more than 1 second
Roughly maps to NCA ratings for: Coordination (i.e. controlling the flow of the conversation, "never leave me hanging")

Negative ending: Not well understood
Specific signal: "Didn't understand" responses, low confidence scores
Roughly maps to NCA ratings for: Composure, Expressiveness

Negative ending: No follow-on
Specific signal: Lack of prompts displayed, lack of engagement for non-Q&A questions
Roughly maps to NCA ratings for: Coordination, Attentiveness
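
To make these signals concrete, here is a rough sketch of how the last N messages might be scanned for the four indicators. The field names and the confidence cutoff are illustrative assumptions; only the ">1 second" latency threshold comes from the slide.

```python
LATENCY_THRESHOLD_S = 1.0    # from the slide: highly latent means more than 1 second
CONFIDENCE_THRESHOLD = 0.5   # assumed cutoff for "not well understood"

def negative_ending_signals(messages, last_n=3):
    """Return which negative-ending signals appear in the last N messages of a conversation."""
    recent = messages[-last_n:]
    return {
        "explicit_negative_feedback": any(m.get("thumbs_down") for m in recent),
        "highly_latent": any(m.get("response_latency_s", 0) > LATENCY_THRESHOLD_S for m in recent),
        "not_well_understood": any(m.get("confidence", 1.0) < CONFIDENCE_THRESHOLD for m in recent),
        "no_follow_on": not any(m.get("prompts_shown") for m in recent),
    }

def is_negative_ending(messages, last_n=3) -> bool:
    return any(negative_ending_signals(messages, last_n).values())
```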

Page 26:

Why this over DAUs?

It’s not one over the other

DAUs/MAUs are lagging indicators

We must optimize for in-the-moment interactions

Measuring negatively ending conversations allows us to react in the moment, and to aggregate and set targets
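
As an illustration of "aggregate and set targets": once each conversation is flagged, the flags roll up into a rate that can be tracked against a target. The metric name and the 10% target below are assumptions for the sketch, not Ozlo's actual numbers.

```python
def negative_ending_rate(conversation_outcomes) -> float:
    """Share of conversations flagged as negatively ending, for a given time window.

    conversation_outcomes: list of booleans, e.g. the output of the
    is_negative_ending() sketch above applied to each conversation.
    """
    if not conversation_outcomes:
        return 0.0
    return sum(conversation_outcomes) / len(conversation_outcomes)

# Hypothetical target, purely for illustration: keep negatively ending conversations under 10%.
TARGET_NEGATIVE_RATE = 0.10

def meets_target(conversation_outcomes) -> bool:
    return negative_ending_rate(conversation_outcomes) <= TARGET_NEGATIVE_RATE
```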

Page 27:

Will this result in better AI experiences?

Still early.

This is how we learn, reinforce good behavior

Once we successfully measure, we can optimize!

Page 28:

Questions?