Layered Evaluation of Adaptive Systems
Dr. Stephan Weibelzahl, National College of Ireland, Dublin, http://www.weibelzahl.de/
Dr. Alexandros Paramythis, Johannes Kepler University Linz, http://www.fim.uni-linz.ac.at/staff/paramythis/
Programmiersprache C++ Winter 2005 Operator overloading (2) (2)
Example System
HTML-Tutor: An Adaptive Learning System
Introduction to HTML and Publishing on the Web
Adaptation Strategy
adaptive link annotation
Adaptation Strategy
adaptive curriculum sequencing
How can I evaluate my system?
How can we evaluate adaptivity?
How can we find out what’s wrong?
How can we find ways to improve it?
Compare adaptive version to non-adaptive version?
» What could we learn from that?
» What can we not learn from that?
Layered evaluation
Basic premises of layered evaluation
» Don’t treat adaptation as a “monolithic” / singular process (at least not only as such)
» Rather, “break it down” into its constituents (“layers”), and
» Evaluate each of them separately where necessary and feasible
How can I evaluate my system?
Questions addressed in this lecture
» Evaluation layer(s): What to evaluate?
» Criteria: What are possible measures of success? (see also a previous lecture)
Covered in the next lecture
» What evaluation methods and data collection techniques are appropriate for our goals?
Layered Evaluation
[Figure: decomposition of adaptation into five stages (Paramythis & Weibelzahl, 2005)]
» collect input data (non-interactive “sensors”, interactive “front-end”)
» interpret data
» model the current state of the world
» decide upon adaptation
» apply adaptation
Supporting models
» “static” models: application model, task model, system model, ...
» “dynamic” models: user model, context model, interaction history, ...
» adaptive theory
Collection of Input Data
Adaptive system observes user behaviour and context, e.g., click stream, input, sensor data, etc.
Questions
» Does the data collection work?
» Is the user behaviour registered accurately?
Criteria
» Reliability (consistency of data)
» Accuracy
» Latency
Collection of Input Data
Examples of questions to be answered
» Eye-tracking for task detection: Does the user actually look at the part of the screen that the eye-tracker indicates?
» HTML-Tutor: Are test items reliable?
» Movie recommender: Are ratings of movies consistent per user? Would a user rate the movie in the same way again after one week?
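The movie recommender question above (would a user give the same rating again after one week?) can be checked with a test-retest comparison. The following is a minimal sketch, assuming each user rated the same movies twice on a 1 to 5 scale. The function names and the ratings are hypothetical and made up purely for illustration; they are not taken from any of the systems discussed.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equally long lists of ratings."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length (at least 2 items)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant ratings")
    return cov / sqrt(var_x * var_y)


def exact_agreement(xs, ys):
    """Share of items that received exactly the same rating both times."""
    return sum(x == y for x, y in zip(xs, ys)) / len(xs)


# Hypothetical ratings of five movies by one user, one week apart.
week_1 = [5, 3, 4, 1, 2]
week_2 = [5, 3, 3, 1, 2]

retest_r = pearson(week_1, week_2)
agreement = exact_agreement(week_1, week_2)
print(f"test-retest r = {retest_r:.2f}, exact agreement = {agreement:.0%}")
```

A low correlation or low agreement would suggest a problem at the data collection layer, before any interpretation or modelling takes place.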
Interpretation of the Collected Data
Adaptive system interprets the recorded behaviour, giving meaning to raw data
» Sometimes trivial (a click on the “next” button means the user wants to proceed to the next page)
» However, interpretation is possibly based on assumptions and might require inference
Question
» Are the users doing what the system thinks they are doing?
Possible criteria
» Validity
Interpretation of the Collected Data
Examples of questions to be answered
» HTML-Tutor: Is the content of a page actually “known” when
• Learner visited the page
• Learner answered test items correctly
» Movie recommender: Does a user actually like a movie when giving a positive rating?
Example Study with HTML-Tutor
Learners use system
Learners complete post-test
Comparison of model (“visited”, “known”) and real data
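The comparison in this study can be sketched as a check of how often each interpretation (“visited” or “known”) is backed up by the post-test. This is a minimal sketch, assuming the system exposes the sets of pages it treats as visited and as known, and that the post-test yields one pass/fail result per page. All names and data below are hypothetical.

```python
def posttest_success_rate(pages, posttest_passed):
    """Share of the given pages whose post-test item was answered correctly.

    pages: page ids the system interprets in some way (e.g. "visited").
    posttest_passed: dict mapping page id -> True if the post-test item
        for that page was answered correctly.
    Pages without a post-test item are ignored; returns None if none remain.
    """
    tested = [p for p in pages if p in posttest_passed]
    if not tested:
        return None
    return sum(posttest_passed[p] for p in tested) / len(tested)


# Hypothetical data for a single learner.
visited = {"p1", "p2", "p3", "p4"}   # pages the learner opened
known = {"p1", "p3"}                 # pages whose test items were solved
posttest = {"p1": True, "p2": False, "p3": True, "p4": False, "p5": False}

visited_rate = posttest_success_rate(visited, posttest)
known_rate = posttest_success_rate(known, posttest)
print(f"visited -> correct in post-test: {visited_rate:.0%}")
print(f"known   -> correct in post-test: {known_rate:.0%}")
```

If the rate for “visited” pages is much lower than for “known” pages, the assumption that visiting a page means knowing its content is questionable for this learner.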
Modelling of the Current State of the World
Based on observations the system infers the current state of the world, e.g., user model, context model
» Usually this is the AI component of the system (Bayesian network, rules, etc.)
Questions
» Does the model reflect the real world?
» Is the world modelled in an appropriate way?
Possible criteria
» Primarily: Validity
» Secondary: Comprehensiveness, Redundancy, Precision, Sensitivity, Scrutability
Modelling of the Current State of the World
Examples of questions to be answered
» HTML-Tutor: Are pages that are inferred to be “known” (e.g., prerequisites of more advanced concepts) actually known?
» Movie Recommender: Do users like a movie that got high ratings from somebody with similar preferences?
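The HTML-Tutor question above concerns knowledge that is inferred rather than observed. A minimal sketch of one such inference rule follows, assuming that a concept counts as known whenever a more advanced concept that requires it is known. The rule, the function name, and the concept graph are assumptions for illustration; the slides do not specify how HTML-Tutor actually infers knowledge.

```python
def infer_known(directly_known, prerequisites):
    """Return all concepts assumed known, including inferred prerequisites.

    directly_known: concepts with direct evidence (e.g. solved test items).
    prerequisites: dict mapping a concept to the concepts it requires.
    """
    known = set(directly_known)
    stack = list(directly_known)
    while stack:
        concept = stack.pop()
        for pre in prerequisites.get(concept, ()):
            if pre not in known:
                known.add(pre)      # inferred, not observed
                stack.append(pre)   # follow prerequisite chains
    return known


# Hypothetical concept graph.
prerequisites = {
    "tables": ["tags"],
    "forms": ["tags", "attributes"],
    "attributes": ["tags"],
}
directly_known = {"forms"}

all_known = infer_known(directly_known, prerequisites)
inferred_only = all_known - directly_known
print(sorted(inferred_only))
```

Evaluating this layer means testing exactly the concepts in `inferred_only`: they are the ones for which the model has no direct evidence.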
Example Study with HTML-Tutor
Learners learn concepts in class
Learners use system
Learners complete post-test
Comparison of model (“inferred”) and real data
[Figure: frequency (0 to 25) of congruent and incongruent cases per chapter]
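The per-chapter frequencies of congruent and incongruent cases could be tallied along the following lines. This is a minimal sketch, assuming a case counts as congruent when the model's inference and the post-test result agree. The function name, the record layout, and the sample records are hypothetical and do not reproduce the study's data.

```python
from collections import Counter


def congruence_by_chapter(observations):
    """Tally congruent and incongruent cases per chapter.

    observations: iterable of (chapter, inferred_known, posttest_correct).
    Returns a dict mapping chapter -> Counter with the two labels.
    """
    tallies = {}
    for chapter, inferred_known, posttest_correct in observations:
        label = "congruent" if inferred_known == posttest_correct else "incongruent"
        tallies.setdefault(chapter, Counter())[label] += 1
    return tallies


# Hypothetical records: the model inferred "known" in every case.
observations = [
    ("1.5", True, True),
    ("1.5", True, False),
    ("2.2", True, True),
    ("2.2", True, True),
]

tallies = congruence_by_chapter(observations)
for chapter, counts in sorted(tallies.items()):
    print(chapter, counts["congruent"], counts["incongruent"])
```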
Decide upon Adaptation
Adaptive System decides which adaptation theory/strategy to apply given the current user model
Questions
» Is it necessary to intervene?
» Did the system select a good and appropriate adaptation strategy?
Possible criteria
» Necessity
» Appropriateness
» Subjective acceptance
Decide upon Adaptation
Examples of questions to be answered
» HTML-Tutor: The learner model seems to indicate that the learner acquired sufficient knowledge about the current chapter.
• Shall we recommend to proceed to the next chapter?
• Shall we annotate the current chapter as “known”?
» Movie recommender: Shall we recommend a certain movie (push) or wait till the user asks for a recommendation (pull)?
Example Study
Learners use system under different conditions
» With and without annotation
» With and without sequencing
Results
» No effect on number of pages visited, overall impression or perceived successful adaptation
» Annotation increases number of pages visited per minute
How could this study be improved to better fit the layer?
Applying Adaptation Decisions
The adaptation decision can be applied in different ways (e.g., different colours, layouts, formulations)
Questions
» Is the concrete instantiation of the adaptation decision working?
» Do users understand what it means? Do they like it?
Possible criteria
» Usability
» Obtrusiveness
» Acceptance
» Timeliness
» User control
Applying Adaptation Decisions
Examples of questions to be answered
» HTML-Tutor:
• Is a red bullet a good way to indicate a “not recommended” page?
• “Continue with the next suggested page”?
» Movie Recommender:
• Shall we provide the full list of recommended movies?
• Only one movie plus “more” button?
• “Based on your ratings we believe that you might like the following movies…”?
Evaluating Adaptation as a Whole
The big picture
» Looking at the system as a whole: Does it work?
Questions
» Does the system achieve its goals?
» Does it improve interaction?
» Do users like the system?
Possible criteria
» Effectiveness
» Efficiency
» Usability
» System-specific criteria
System-specific criteria
The layered approach described, and the proposed generic criteria, are not by themselves sufficient to assess whether adaptivity “meets its goals”
A complete evaluation design must therefore include system-specific criteria that
» Are specific to what gets adapted, how, and with what goal
» Do not neglect the intrinsic characteristics of the application domain(s)
These do not need to be addressed in isolation
» But they tend to address adaptivity as a whole, so one has to be careful in how they are combined with the assessment of other criteria
Evaluating Adaptation as a Whole
Examples of questions to be answered
» HTML-Tutor: Does adaptation to prior-knowledge save time?
» Movie Recommender: Do users find movies they like and would they have found these movies otherwise?
Field Study
What’s the impact of offering an adaptive prior-knowledge test in an on-line course? (Weibelzahl & Weber, 2002)
140 users learned with the HTML-Tutor
» optional pre-test for 3 chapters
» final knowledge test at the end of the course
» criteria: duration, knowledge
» statistical analysis: MANOVA and ANOVA
Photo © BrowserBob, 2007
Results
[Figure: duration of interaction (s, 0 to 15000) per chapter (1 to 3)]
[Figure: correct responses (0 to 1) per chapter (1 to 3)]
Legend: no pre-test presented, pre-test presented
» No differences in knowledge
» Completed course much quicker
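The field study reports MANOVA and ANOVA as its statistical analysis. To make the underlying idea concrete, here is a minimal sketch of a one-way ANOVA F statistic for a single criterion (duration) and two groups. The durations are made-up numbers for illustration only, not data from the study, and the function name is hypothetical. A real analysis would use a statistics package and also report a p-value.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    groups: list of lists, one list of measurements per condition.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or any(len(g) == 0 for g in groups) or n_total <= k:
        raise ValueError("need at least two non-empty groups and n > k")
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Variation explained by group membership.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation left inside the groups.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    if ss_within == 0:
        raise ValueError("F is undefined when there is no within-group variance")
    df_between = k - 1
    df_within = n_total - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Made-up durations in minutes for one chapter (illustration only).
no_pretest = [10, 12, 14]
pretest = [4, 6, 8]

f_value, df_b, df_w = one_way_anova_f([no_pretest, pretest])
print(f"F({df_b}, {df_w}) = {f_value:.2f}")
```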
Layered Evaluation – Complete example
Example: Educational AHS with an eye-tracker
» collect input data (eye-tracker): Where is the user looking on the screen?
» interpret data: What concept is the user reading about?
» model the current state of the world: What concepts has the user learned?
» decide upon adaptation: Guide user to exercises for the learned concepts
» apply adaptation: Add icons in front of links to the exercises
Supporting models
» “static” models: model of concepts and relation to pages
» “dynamic” models: learner (overlay) model
» adaptive theory
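One way to see why the layers can be evaluated separately is to write the example as five small functions, each of which can be tested on its own. The following is a rough sketch under stated assumptions: the screen regions, the 300-pixel boundary, the "three gaze samples means learned" rule, and all function names are hypothetical and chosen only to keep the example short.

```python
# "Static" model (assumed): which concept is shown in which screen region.
REGION_TO_CONCEPT = {"top": "tags", "bottom": "tables"}


def collect_input(raw_gaze_samples):
    """Layer 1: map raw (x, y) gaze samples to screen regions."""
    return ["top" if y < 300 else "bottom" for _, y in raw_gaze_samples]


def interpret(regions):
    """Layer 2: which concept is the user reading about?"""
    return [REGION_TO_CONCEPT[region] for region in regions]


def update_model(concepts_read, learner_model, min_samples=3):
    """Layer 3: mark a concept as learned after enough gaze samples."""
    updated = set(learner_model)
    for concept in set(concepts_read):
        if concepts_read.count(concept) >= min_samples:
            updated.add(concept)
    return updated


def decide(learner_model):
    """Layer 4: guide the user to exercises for the learned concepts."""
    return sorted(learner_model)


def apply_adaptation(links, concepts_to_highlight):
    """Layer 5: add an icon ("* ") in front of links to the exercises."""
    return [
        "* " + text if concept in concepts_to_highlight else text
        for text, concept in links
    ]


# Hypothetical run through all five layers.
gaze = [(10, 100), (20, 120), (15, 150), (30, 400)]
links = [("Exercise: tags", "tags"), ("Exercise: tables", "tables")]

regions = collect_input(gaze)
concepts = interpret(regions)
model = update_model(concepts, learner_model=set())
targets = decide(model)
rendered = apply_adaptation(links, targets)
print(rendered)
```

Because each function has its own input and output, an evaluator can feed known input into a single layer and check its output, instead of only judging the final rendered page.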
Layered Evaluation Summary
Main premises
» Break adaptation process down into its constituents (“layers”)
» Evaluate each of them separately where necessary and feasible
» Meant to provide guidance rather than prescribing a certain way of evaluation
Benefits
» Offers guidance for possible studies (“separation of concerns”)
» Helps to identify problems and wrong assumptions
» Guides development process (formative evaluation)
Reading List
[Layered Evaluation] Paramythis, A. & Weibelzahl, S. (2005). A Decomposition Model for the Layered Evaluation of Interactive Adaptive Systems. In Ardissono, L., Brna, P., & Mitrovic, A. (Eds.), Proceedings of the 10th International Conference on User Modeling (UM2005), Edinburgh, Scotland, UK, July 24-29 (pp. 438-442) (Lecture Notes in Computer Science LNAI 3538, Springer Verlag). Berlin: Springer.
[Layered evaluation] Weibelzahl, S. (2001). Evaluation of adaptive systems. In M. Bauer, P. Gmytrasiewicz & J. Vassileva (Eds.), User Modeling 2001: Proceedings of the Eighth International Conference, UM2001. (pp. 292-294) (Lecture Notes in Computer Science LNAI 2109; © Springer-Verlag). Berlin: Springer.
[Layered evaluation] Paramythis, A., Totter, A., & Stephanidis, C. (2001). A modular approach to the evaluation of Adaptive User Interfaces. In S. Weibelzahl, D. Chin & G. Weber (Eds.), Proceedings of the Workshop on Empirical Evaluations of Adaptive Systems, held in the context of the 8th International Conference on User Modeling (UM'2001), 13-17 July, Sonthofen, Germany (pp.9-24). Freiburg: Pedagogical University of Freiburg.