predicting user satisfaction with intelligent assistants
Post on 11-Jan-2017
Predicting User Satisfaction with Intelligent Assistants
Julia Kiseleva, Kyle Williams, Ahmed Hassan Awadallah, Aidan C. Crook, Imed Zitouni, Tasos Anastasakos
Eindhoven University of Technology, Pennsylvania State University, Microsoft
SIGIR’16, Pisa, Italy
From Queries to Dialogues
User’s dialogue with Cortana. Task is “Finding a hotel in Chicago”:
Q1: how is the weather in Chicago
Q2: how is it this weekend
Q3: find me hotels
Q4: which one of these is the cheapest
Q5: which one of these has at least 4 stars
Q6: find me directions from the Chicago airport to number one
From Queries to Dialogues
User’s dialogue with Cortana. Task is “Finding a pharmacy”:
Q1: find me a pharmacy nearby
Q2: which of these is highly rated
Q3: show more information about number 2
Q4: how long will it take me to get there
Q5: Thanks
From Queries to Dialogues
User: “show restaurants near me”
Cortana: “Here are ten restaurants near you”
User: “show the best ones”
Cortana: “Here are ten restaurants near you that have good reviews”
User: “show directions to the second one”
Cortana: “Getting you direction to the Mayuri Indian Cuisine”
Main Research Question
How can we automatically predict user satisfaction with search dialogues on intelligent assistants using click, touch, and voice interactions?
Single Task Search Dialogue
User: “Do I need to have a jacket tomorrow?”
Cortana: “You could probably go without one. The forecast shows …”
Multi-Task Search Dialogues
User: “show restaurants near me”
Cortana: “Here are ten restaurants near you”
User: “show the best ones”
Cortana: “Here are ten restaurants near you that have good reviews”
User: “show directions to the second one”
Cortana: “Getting you direction to the Mayuri Indian Cuisine”
How to define user satisfaction with search dialogues?
No clicks?
User: “show restaurants near me”
Cortana: “Here are ten restaurants near you” (SAT?)
User: “show the best ones”
Cortana: “Here are ten restaurants near you that have good reviews” (SAT?)
User: “show directions to the second one”
Cortana: “Getting you direction to the Mayuri Indian Cuisine” (SAT?)
Overall SAT?
User Frustration
Dialog with Intelligent Assistant. Task is “Planning a weekend”:
Q1: what's the weather like in San Francisco
Q2: what's the weather like in Mountain View
Q3: can you find me a hotel close to Mountain View
Q4: can you show me the cheapest ones
Q5: show me the third one
Q6: show me the directions from SFO to this hotel
Q7: go back to first hotel (misrecognition)
Q8: show me hotels in Mountain View
Q9: show me cheap hotels in Mountain View
Q10: show me more about the third one
Annotations on the dialogue: intelligent assistant lost context; restart search; a user is satisfied
What interaction signals can we track during search dialogues?
Tracking User Interaction: Click Signals
• Number of queries in a dialogue
• Number of clicks in a dialogue
• Number of SAT clicks (> 30 sec. dwell time) in a dialogue
• Number of DSAT clicks (< 15 sec. dwell time) in a dialogue
• Time (seconds) until the first click in a dialogue
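The click signals listed above can be sketched as a small feature extractor. This is only an illustration: the `Click` record, its field names, and the `click_features` function are hypothetical and not taken from the slides; only the 30-second and 15-second dwell thresholds come from the list.

```python
from dataclasses import dataclass
from typing import List, Optional

SAT_DWELL_SEC = 30.0   # from the slide: a SAT click has > 30 s dwell time
DSAT_DWELL_SEC = 15.0  # from the slide: a DSAT click has < 15 s dwell time


@dataclass
class Click:
    """Hypothetical click record; the field names are assumptions."""
    time_sec: float   # seconds since the dialogue started
    dwell_sec: float  # time spent on the clicked result


def click_features(num_queries: int, clicks: List[Click]) -> dict:
    """Compute the five click signals for one dialogue."""
    # None when the dialogue contains no clicks at all.
    first_click: Optional[float] = min((c.time_sec for c in clicks), default=None)
    return {
        "num_queries": num_queries,
        "num_clicks": len(clicks),
        "num_sat_clicks": sum(c.dwell_sec > SAT_DWELL_SEC for c in clicks),
        "num_dsat_clicks": sum(c.dwell_sec < DSAT_DWELL_SEC for c in clicks),
        "time_to_first_click_sec": first_click,
    }
```

Clicks with a dwell time between 15 and 30 seconds count toward neither the SAT nor the DSAT total, which follows from the two thresholds as stated.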
Tracking User Interaction: Acoustic Signals
Phonetic Similarity between consecutive requests
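The slide does not say how phonetic similarity is computed. As a rough sketch under that caveat, consecutive requests can be reduced to a coarse phonetic key and compared with a sequence matcher. The consonant classes below are loosely modelled on Soundex groupings, and `phonetic_key` and `phonetic_similarity` are hypothetical names, not the authors' method.

```python
from difflib import SequenceMatcher

# Assumed consonant classes, loosely modelled on Soundex groupings.
_CLASSES = {
    **dict.fromkeys("bfpv", "1"),
    **dict.fromkeys("cgjkqsxz", "2"),
    **dict.fromkeys("dt", "3"),
    "l": "4",
    **dict.fromkeys("mn", "5"),
    "r": "6",
}


def phonetic_key(text: str) -> str:
    """Map each word to a string of consonant-class codes."""
    words = []
    for word in text.lower().split():
        codes = []
        for ch in word:
            code = _CLASSES.get(ch)  # None for vowels, h, w, y, punctuation
            # Unmapped characters are dropped; identical codes that end up
            # next to each other are collapsed into one.
            if code is not None and (not codes or codes[-1] != code):
                codes.append(code)
        words.append("".join(codes))
    return " ".join(words)


def phonetic_similarity(request_a: str, request_b: str) -> float:
    """Similarity in [0, 1] between the phonetic keys of two requests."""
    return SequenceMatcher(None, phonetic_key(request_a), phonetic_key(request_b)).ratio()
```

A high similarity between two consecutive requests may indicate that the user repeated a request after a misrecognition, as in the user frustration example.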
Tracking User Interaction
[Figure: attributed viewing time for a SERP element as it moves through the viewport. 3 seconds at 33% of the viewport height, 6 seconds at 66%, and 2 seconds at 20% give 1s + 4s + 0.4s = 5.4s.]
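The arithmetic in this figure suggests that each viewing interval contributes its duration weighted by the share of the viewport the element occupies. A minimal sketch under that reading follows; the function name and input shape are hypothetical, and 33% and 66% are read as one third and two thirds.

```python
from typing import Iterable, Tuple


def attributed_time(intervals: Iterable[Tuple[float, float]]) -> float:
    """Sum duration (seconds) times viewport fraction over all intervals."""
    return sum(duration * fraction for duration, fraction in intervals)


# Values from the figure: (seconds on screen, fraction of viewport occupied).
example = [(3.0, 1 / 3), (6.0, 2 / 3), (2.0, 0.20)]
total = attributed_time(example)
print(f"{total:.1f} s")  # 1 s + 4 s + 0.4 s
```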
Tracking User Interaction: Touch Signals
• Number of swipes
• Number of up-swipes
• Number of down-swipes
• Total distance swiped (pixels)
• Number of swipes normalized by time
• Total distance divided by number of swipes
• Total swiped distance divided by time
• Number of swipe direction changes
• SERP answer duration (seconds) which is shown on screen (even partially)
• Fraction of visible pixels belonging to SERP answer
• Attributed time (seconds) to viewing a particular element (answer) on SERP
• Attributed time (seconds) per unit height (pixels) associated with a particular element on SERP
• Attributed time (milliseconds) per unit area (square pixels) associated with a particular element on SERP
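The swipe-based touch signals can be derived from a log of swipe gestures, roughly as sketched below. The `Swipe` record, the sign convention (positive means an up-swipe), and the feature names are assumptions made for illustration; the slides do not describe the logging format.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Swipe:
    """Hypothetical swipe record."""
    dy_pixels: float  # signed vertical displacement; positive = up-swipe (assumed)


def swipe_features(swipes: List[Swipe], duration_sec: float) -> dict:
    """Compute the swipe signals for one SERP view lasting duration_sec."""
    n = len(swipes)
    distance = sum(abs(s.dy_pixels) for s in swipes)
    # Direction of each non-zero swipe, in order, to count reversals.
    directions = [s.dy_pixels > 0 for s in swipes if s.dy_pixels != 0]
    changes = sum(a != b for a, b in zip(directions, directions[1:]))
    return {
        "num_swipes": n,
        "num_up_swipes": sum(s.dy_pixels > 0 for s in swipes),
        "num_down_swipes": sum(s.dy_pixels < 0 for s in swipes),
        "total_distance_px": distance,
        "swipes_per_sec": n / duration_sec if duration_sec > 0 else 0.0,
        "distance_per_swipe_px": distance / n if n else 0.0,
        "distance_per_sec": distance / duration_sec if duration_sec > 0 else 0.0,
        "num_direction_changes": changes,
    }
```

Returning 0.0 for the ratio features when there are no swipes or no elapsed time is a design choice of this sketch, made so that every dialogue yields a complete feature vector.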
How to collect data?
User Study Participants
• 60 participants
• Age: 25.53 +/- 5.42 years
• Gender: 75% male, 25% female
• Language: 55% English, 45% other
• Education: 82% Computer Science, 8% Electrical Engineering, 2% Mathematics, 8% other
You are planning a vacation. Pick a place. Check if the weather is good enough for the period you are planning the vacation. Find a hotel that suits you. Find the driving directions to this place.
Questionnaire
• Were you able to complete the task?
o Yes/No
• How satisfied are you with your experience in this task?
o If the task has sub-tasks, participants indicate their graded satisfaction, e.g.
o a. How satisfied are you with your experience in finding a hotel?
o b. How satisfied are you with your experience in finding directions?
• How well did Cortana recognize what you said?
o 5-point Likert scale
• Did you put in a lot of effort to complete the task?
o 5-point Likert scale
8 tasks: 1 simple, 4 with 2 subtasks, 3 with 3 subtasks
~30 minutes
Search Dialog Dataset
• Total number of queries is 2,040
• Number of unique queries is 1,969
• The average query length is 7.07
• The simple task generated 130 queries
• Tasks with 2 context switches generated 685 queries
• Tasks with 3 context switches generated 1,355 queries
How can we predict user satisfaction
with search dialogues using interaction signals?
User’s dialogue about the ‘stomach ache’
Q1: what do you have medicine for the stomach ache
Q2: stomach ache medicine over the counter
(General Web SERP)
Q3: show me the nearest pharmacy
Q4: more information on the second one
(Structured SERP)
General Web and Structured SERP
Aggregating Touch Interactions
[Figure: three numbered ways (1 to 3) of aggregating the touch interaction features I(·) across the SERPs in a dialogue. The function arguments are icons in the original slides.]
Quality of Interaction Model
Method                 Accuracy (%)       Average F1 (%)
Baseline               70.62              61.38
Interaction Model 1    78.78* (+11.55)    83.59* (+35.90)
Interaction Model 2    80.21* (+13.58)    83.31* (+35.44)
Interaction Model 3    80.81* (+14.43)    79.08* (+28.83)
* Statistically significant improvement (p < 0.05)
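The slide does not define the parenthesized values. For the accuracy column they are consistent with the relative improvement over the baseline in percent, which the following sketch reproduces. This reading is an inference from the numbers, and `relative_gain` is a hypothetical helper name.

```python
def relative_gain(model: float, baseline: float) -> float:
    """Relative improvement of model over baseline, in percent."""
    return (model - baseline) / baseline * 100.0


BASELINE_ACCURACY = 70.62  # accuracy values taken from the table

for name, accuracy in [
    ("Interaction Model 1", 78.78),
    ("Interaction Model 2", 80.21),
    ("Interaction Model 3", 80.81),
]:
    print(f"{name}: {relative_gain(accuracy, BASELINE_ACCURACY):+.2f}%")
```

Applying the same formula to the rounded F1 values gives slightly different numbers from those on the slide, so the F1 gains may have been computed from unrounded scores.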
Which interaction signals have the highest impact on predicting user satisfaction with search dialogues?
Predicting User Satisfaction
• F1: Because the SERP for a query is ordered by a measure of relevance as determined by the system, additional exploration is unlikely to achieve user satisfaction. It is more likely an indication that the best-provided results (i.e. the SERP top) are insufficient to address the user intent
• F2: In the converse case of F1, when users find content that satisfies their intent, their likelihood of scrolling is reduced, and they dwell for an extended period on the top viewport
• F3: When users are involved in a complex task, they are dissatisfied when redirected to a general web SERP. Unlike F2, the absence of scrolling on this landing page is an indication of dissatisfaction
How can we define user satisfaction with search dialogues?
• User satisfaction with search dialogues is defined in a generalized form: as an aggregation of satisfaction with all of the dialogue’s tasks, not as satisfaction with each of the dialogue’s queries separately
How can we predict user satisfaction with search dialogues using interaction signals?
• We showed that features derived from voice, and especially from touch and voice interactions, add a significant gain in accuracy over the baseline
How can we predict user satisfaction with search dialogues using interaction signals?
• Our analysis showed a strong negative correlation between user satisfaction and swipe actions
Conclusion
• User satisfaction with search dialogues is defined in a generalized form: as an aggregation of satisfaction with all of the dialogue’s tasks, not as satisfaction with each of the dialogue’s queries separately
• We showed that features derived from voice, and especially from touch and voice interactions, add a significant gain in accuracy over the baseline
• Our analysis showed a strong negative correlation between user satisfaction and swipe actions
Thank you!
Questions?