Considering the subjectivity to rationalise evaluation approaches: the example of Spoken Dialogue Systems
Marianne Laurent, Philippe Bretier (Orange Labs), Ioannis Kanellos (Telecom Bretagne)
23 June 2010, QoMEX 2010, Trondheim, Norway
« I can't connect to the Internet! »
Spoken Dialogue Systems: evaluation?

Speech understanding: Automatic Speech Recognition, Spoken Language Understanding
Dialogue Manager (connected to an information system)
System output: Spoken Language Generation, Text-to-Speech

A complex task:
- Dynamic interactions: no comparison to an ideal (fidelity)
- Diversity of evaluator profiles, individualities and evaluation situations
Internal review of evaluation methods: ad hoc protocols depending on the evaluator profile…
Laurent, M., Bretier, P. and Manquillet, C. (2010). Ad-hoc evaluations along the lifecycle of industrial spoken dialogue systems: heading to harmonisation?. In LREC 2010. Malta.
Internal review of evaluation methods: ad hoc protocols... and on the evaluation context!
http://www.slideshare.net/MarianneLo/lrecmlaurentposter
Toward one-size-fits-all evaluation protocols?
« Research has exerted considerable effort and attention to devising evaluation metrics that allow for comparison of disparate systems with various tasks and domains. » (Paek, 2007)

« A critical obstacle to progress in this area is the lack of a general framework for evaluating and comparing the performance of different dialogue agents. » (Walker et al., 1997)

« We see a multitude of highly interesting - but virtually incomparable - evaluation exercises, which address different aspects of quality, and which rely on different evaluation criteria. » (Möller, 2009)
Roadmap
1 Evaluation dependent on both context and evaluator
2 The evaluator as a mediator, an anthropocentric framework
3 Software implementation and anticipated added value
[Yarbus's eye-movement figure: scan paths over the same picture under different viewing instructions]
- Free examination
- Give the age of the people
- Remember the clothes worn by the people
- Estimate the material circumstances of the family
- Surmise what the family had been doing before the arrival of the unexpected visitor
- Remember positions of people and objects in the room

Yarbus, A. L. (1967), Eye Movements and Vision, Plenum, New York.
1 Evaluation, a rationalising contribution for a decision process
Evaluation, a goal-driven argumentation discourse

« Process through which one defines, obtains and delivers useful pieces of information to decide between the possible alternatives. »
Daniel Stufflebeam, L'évaluation en éducation et la prise de décision, 1980, Ottawa, Edition NHP.
2 V-model process to define evaluations

Top-down trend - the situation is interpreted into evaluation needs and a procedure:
1. Nature of the decision to take
2. Identify the objectives
3. Define criteria
4. Deduce the indicators
5. List the data to capture
6. Experimental set-up

Bottom-up trend - value judgment: the evaluator creates a meaning:
1. Capture the data
2. Process data into indicators
3. Score on a grid of criteria
4. Meet the objectives?
5. Take the final decision

Compare: confront the results with the initial objectives.
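The pairing of the two branches can be sketched as two passes over the same structure: the top-down branch declares which indicator measures each criterion and what would satisfy the objective; the bottom-up branch computes the indicators and confronts them with those objectives. A minimal sketch in Python, where every criterion name, indicator name, and threshold is hypothetical, not from the talk:

```python
# Top-down: each criterion names the indicator that measures it and the
# threshold that would satisfy the initial objective (invented values).
evaluation_plan = {
    "efficiency": {"indicator": "mean_dialogue_duration_s", "max": 120.0},
    "effectiveness": {"indicator": "task_success_rate", "min": 0.85},
}

# Bottom-up: indicators as computed from the captured data (e.g. log files).
captured_indicators = {
    "mean_dialogue_duration_s": 95.4,
    "task_success_rate": 0.91,
}

def meets_objectives(plan, indicators):
    """The 'Compare' step: confront the results with the initial objectives."""
    verdict = {}
    for criterion, spec in plan.items():
        value = indicators[spec["indicator"]]
        verdict[criterion] = (
            spec.get("min", float("-inf")) <= value <= spec.get("max", float("inf"))
        )
    return verdict

print(meets_objectives(evaluation_plan, captured_indicators))
# With the example numbers above, both criteria are met.
```

The point of the structure is that the comparison is only meaningful because the top-down branch fixed the criteria before any data was captured.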
2 A meta-model to define evaluations

Three layers, traversed either data-driven (bottom-up) or goal-driven (top-down):
- Capture: log files, questionnaires, 3rd-party annotation, physio-metrics
- Processing: data processing techniques
- Analysis: interaction performance, interaction quality, efficiency-related aspects, utility & usefulness, critical viewpoints, etc.
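The capture / processing / analysis layering can be read as a small pipeline. A sketch in Python with invented per-dialogue log records; the field names are illustrative assumptions, not the standardised interaction parameters:

```python
# Capture layer: raw data, one record per dialogue (e.g. from log files).
log_records = [
    {"turns": 12, "task_success": True,  "asr_rejections": 1},
    {"turns": 20, "task_success": False, "asr_rejections": 4},
    {"turns": 8,  "task_success": True,  "asr_rejections": 0},
]

def process(records):
    """Processing layer: turn raw records into indicators for analysis."""
    n = len(records)
    total_turns = sum(r["turns"] for r in records)
    return {
        "mean_turns": total_turns / n,
        "task_success_rate": sum(r["task_success"] for r in records) / n,
        "asr_rejection_rate": sum(r["asr_rejections"] for r in records) / total_turns,
    }

# Analysis layer: indicators are read against aspects such as interaction
# performance or efficiency. A goal-driven traversal would have chosen which
# indicators to compute before capture; a data-driven one starts from what
# the logs happen to contain.
indicators = process(log_records)
```

The two traversal directions do not change the code, only which layer is fixed first when the evaluation is defined.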
2 A mediator within an “evaluation ecosystem”

Around the evaluator:
- Situation
- Corpus of evaluations
- Community of practice
- Normative system
- Rationalising system
- Demand system
- System of constraints
- Resources
3 Software implementation: MPOWERS (Multi Point Of vieW Evaluation Refine Studio)

Data flow:
- Data as collected in evaluation campaigns: user questionnaires, third-party annotations, log files
- Datamart
- Parameters, a descriptive view on the system (ITU-T Rec. P.Supp.24: parameters describing the interaction with SDS)
- KPIs, an analytical/statistical view on the system
- Dashboards: ad hoc selection of KPIs with potential graphics
- Personalised dashboards

Evaluator actions: define KPIs; retrieve KPIs & reports.
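The distinction between parameters (a descriptive, per-dialogue view) and KPIs (an analytical, aggregated view) can be illustrated with a short sketch; a personalised dashboard is then just an ad hoc selection over a KPI registry. All names and numbers below are invented for illustration, not MPOWERS internals or ITU-T parameter names:

```python
# Parameters: descriptive, one record per dialogue, as stored in a datamart.
parameters = [
    {"dialogue_id": 1, "duration_s": 80,  "success": True},
    {"dialogue_id": 2, "duration_s": 140, "success": False},
    {"dialogue_id": 3, "duration_s": 100, "success": True},
]

# KPIs: analytical/statistical aggregates computed over the parameters.
def kpi_task_success_rate(params):
    return sum(p["success"] for p in params) / len(params)

def kpi_mean_duration(params):
    return sum(p["duration_s"] for p in params) / len(params)

KPI_REGISTRY = {
    "task_success_rate": kpi_task_success_rate,
    "mean_duration_s": kpi_mean_duration,
}

def dashboard(selected_kpis, params):
    """A personalised dashboard: an ad hoc selection of KPIs."""
    return {name: KPI_REGISTRY[name](params) for name in selected_kpis}

print(dashboard(["task_success_rate", "mean_duration_s"], parameters))
```

Two evaluators can thus share one datamart of parameters while retrieving entirely different dashboards, which is the multi-point-of-view idea in the tool's name.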
3 Added value: impact for both the individual and the communities they belong to

- Evaluation definition & refinement
- Retrieval of evaluation results
- Contribution & involvement
- Feedback & inspiration

COOPERATE: contribute, as a knowledge-farming cooperative.
CONNECT: identify and create contact with relevant people.
COLLABORATE: feedback to refine evaluations; discuss/negotiate to converge toward common practices.

Communities of practice / Communities of interest
Merci!
@warius