Evolution of Evaluation in HCI
Joseph ‘Jofish’ Kaye
Microsoft Research, Cambridge
Cornell University, Ithaca, NY
jofish @ cornell.edu
HCI Seminar Series
York
20 November 2006
What is evaluation?
• Something you do at the end of a project to show it works…
• … so you can publish it.
• Part of the design-build-evaluate iterative design cycle
• A way of defining a field
• A way a discipline validates the knowledge it creates.
• A reason papers get rejected
HCI Evaluation: Validity
“Methods for establishing validity vary depending on the nature of the contribution. They may involve empirical work in the laboratory or the field, the description of rationales for design decisions and approaches, applications of analytical techniques, or ‘proof of concept’ system implementations.”
CHI 2007 Website
So…
• How did we get to where we are today?
• Why did we end up with the system(s) we use today?
• How can our current approaches to evaluation deal with novel concepts of HCI, such as experience-focused (rather than task-focused) HCI?
• And in particular…
Evaluation of the VIO
• A device for couples in long-distance relationships to communicate intimacy
• It’s about the experience; it’s not about the task
www.intimateobjects.org
Kaye, Levitt, Nevins, Golden & Schmidt. Communicating Intimacy One Bit at a Time. Ext. Abs. CHI 2005.
Kaye. I just clicked to say I love you. alt.chi, Ext. Abs. CHI 2006.
A Brief History and plan for the talk
1. Evaluation by Engineers
2. Evaluation by Computer Scientists
3. Evaluation by Experimental Psychologists & Cognitive Scientists
4. Evaluation by HCI Professionals
5. Evaluation in CSCW
6. Evaluation for Experience
A Brief History and plan for the talk
1. Evaluation by Engineers
2. Evaluation by Computer Scientists
3. Evaluation by Experimental Psychologists & Cognitive Scientists
   a. Case Study: Evaluation of Text Editors
4. Evaluation by HCI Professionals
   a. Case Study: The Damaged Merchandise Debate
5. Evaluation in CSCW
6. Evaluation for Experience
3 Questions to ask about an era
• Who are the users?
• Who are the evaluators?
• What are the limiting factors?
Evaluation by Engineers
• Users are engineers & mathematicians
• Evaluators are engineers
• The limiting factor is reliability
Evaluation by Computer Scientists
• Users are programmers
• Evaluators are programmers
• The speed of the machine is the limiting factor
Evaluation by Experimental Psychologists & Cognitive Scientists
• Users are users: the computer is a tool, not an end result
• Evaluators are cognitive scientists and experimental psychologists: they’re used to measuring things through experiment
• The limiting factor is what the human can do
Perceptual issues such as print legibility and motor issues arose in designing displays, keyboards and other input devices… [new interface developments] created opportunities for cognitive psychologists to contribute in such areas as motor learning, concept formation, semantic memory and action. In a sense, this marks the emergence of the distinct discipline of human-computer interaction. (Grudin 2006)
Evaluation by Experimental Psychologists & Cognitive Scientists
Case Study of Evaluation: Text Editors
Roberts & Moran, 1982, 1983.
Their methodology for evaluating text editors had three criteria:
• objectivity
• thoroughness
• ease-of-use
Case Study: Text Editors
objectivity: “implies that the methodology not be biased in favor of any particular editor’s conceptual structure”
thoroughness: “implies that multiple aspects of editor use be considered”
ease-of-use (of the method, not the editor itself): “the methodology should be usable by editor designers, managers of word processing centers, or other nonpsychologists who need this kind of evaluative information but who have limited time and equipment resources”
Case Study: Text Editors
Text editors are the white rats of HCI
Thomas Green, 1984, in Grudin, 1990.
Evaluation by HCI Professionals
• Usability professionals
• They believe in expertise (e.g. Nielsen 1984)
• They’ve made a decision to focus on better results, regardless of whether they were experimentally provable or not.
Case Study: The Damaged Merchandise Debate
Damaged Merchandise Setup
Early eighties: usability evaluation methods (UEMs)
- heuristics (Nielsen)
- cognitive walkthrough
- GOMS
- …
Damaged Merchandise Comparison Studies
Jeffries, Miller, Wharton and Uyeda (1991)
Karat, Campbell and Fiegel (1992)
Nielsen (1992)
Desurvire, Kondziela, and Atwood (1992)
Nielsen and Phillips (1993)
Damaged Merchandise Panel
Wayne D. Gray, Panel at CHI’95
Discount or Disservice? Discount Usability Analysis at a Bargain Price or Simply Damaged Merchandise?
Damaged Merchandise Paper
Wayne D. Gray & Marilyn Salzman
Special issue of HCI: Experimental Comparisons of Usability Evaluation Methods
Damaged Merchandise Response
Commentary on Damaged Merchandise
Karat: experiment in context
Jeffries & Miller: real-world
Lund & McClelland: practical
John: case studies
Monk: broad questions
Oviatt: field-wide science
MacKay: triangulate
Newman: simulation & modelling
Damaged Merchandise What’s going on?
Gray & Salzman, p. 19:
“There is a tradition in the human factors literature of providing advice to practitioners on issues related to, but not investigated in, an experiment. This tradition includes the clear and explicit separation of experiment-based claims from experience-based advice. Our complaint is not against experimenters who attempt to offer good advice… the advice may be understood as research findings rather than the researcher’s opinion.”
Damaged Merchandise Clash of Paradigms
Experimental Psychologists & Cognitive Scientists
(who believe in experimentation) vs.
HCI Professionals (who believe in experience and expertise, even if ‘unprovable’, and who were trying to present their work in the terms of the dominant paradigm of the field)
CSCW
Briefly…
• CSCW vs. HCI
• Not just groups instead of users, but philosophy & approach (ideology?)
• Posits that work is member-created, dynamic, and explicitly not cognitive or modelable
• Follows failure of ‘workplace studies’ to characterize work
Evaluation in CSCW
• Ramage, The Learning Way (Ph.D., Lancaster 1999)
– No single ‘right’ or ‘wrong’
– Identify why evaluate here
– Determine stakeholders
– Observe & analyze
– Learn
• Note the differences between this kind of approach and more traditional HCI user testing.
• Fundamentally different from HCI: so much so they became a new field.
Experience-Focused HCI
• A possibly emerging sub-field, drawing from traditions and disciplines outside the field
• Emphasis on the experience, not [just] the task
• But how to evaluate?
Experience-Focused HCI
Isbister et al.: open-ended affective evaluations that leverage real-time individual interpretations.
Isbister, Höök, Sharp, Laaksolahti. The Sensual Evaluation Instrument: Developing an Affective Evaluation Tool. Proc. CHI’06
Experience-Focused HCI
Gaver et al.: cultural commentators with expertise in their own fields provide multi-layered assessment.
Gaver, W. Cultural Commentators for Polyphonic Assessment. To appear in IJHCI.
Experience-Focused HCI
Kaye et al.: cultural probes to provide user-interpreted thick descriptions of use experience.
Kaye, Levitt, Nevins, Golden & Schmidt. Communicating Intimacy One Bit at a Time. Ext. Abs. CHI 2005.
Epistemology
• How does a field know what it knows?
• How does a field know that it knows it?
• Science: experiment…
• But literature? Anthropology? Sociology? Therapy? Art? Theatre? Design?
• These disciplines have ways to talk about experience that are lacking in an experimental paradigm.
Formally…
The aim of this work is to recognize the ways in which multiple epistemologies, not just the experimental paradigm of science, must inform the hybrid discipline of human-computer interaction if we wish to build systems that support users’ increasingly rich interactions with technology.
An evolving discussion
Thanks to
• Mark Blythe & Darren Reed
• Louise Barkhuus & Barry Brown, University of Glasgow
• Alex Taylor & MS Research
• Phoebe Sengers & CEmCom
• Cornell S&TS Department
• Maria Håkansson & the IT University Göteborg
• Andy Warr & The Oxford E-Research Center