Evolution of Evaluation in HCI
Joseph ‘Jofish’ Kaye
Microsoft Research, Cambridge
Cornell University, Ithaca, NY
jofish @ cornell.edu
HCI Seminar Series
York
20 November 2006
What is evaluation?
• Something you do at the end of a project to show it works…
• … so you can publish it.
• Part of the design-build-evaluate iterative design cycle
• A way of defining a field
• A way a discipline validates the knowledge it creates.
• A reason papers get rejected
HCI Evaluation: Validity
“Methods for establishing validity vary depending on the nature of the contribution. They may involve empirical work in the laboratory or the field, the description of rationales for design decisions and approaches, applications of analytical techniques, or ‘proof of concept’ system implementations.”
CHI 2007 Website
So…
• How did we get to where we are today?
• Why did we end up with the system(s) we use today?
• How can our current approaches to evaluation deal with novel concepts of HCI, such as experience-focused (rather than task-focused) HCI?
• And in particular…
Evaluation of the VIO
• A device for couples in long-distance relationships to communicate intimacy
• It’s about the experience; it’s not about the task
www.intimateobjects.org
Kaye, Levitt, Nevins, Golden & Schmidt. Communicating Intimacy One Bit at a Time. Ext. Abs. CHI 2005.
Kaye. I just clicked to say I love you. alt.chi, Ext. Abs. CHI 2006.
A Brief History and plan for the talk
1. Evaluation by Engineers
2. Evaluation by Computer Scientists
3. Evaluation by Experimental Psychologists & Cognitive Scientists
4. Evaluation by HCI Professionals
5. Evaluation in CSCW
6. Evaluation for Experience
A Brief History and plan for the talk
1. Evaluation by Engineers
2. Evaluation by Computer Scientists
3. Evaluation by Experimental Psychologists & Cognitive Scientists
   a. Case Study: Evaluation of Text Editors
4. Evaluation by HCI Professionals
   a. Case Study: The Damaged Merchandise Debate
5. Evaluation in CSCW
6. Evaluation for Experience
3 Questions to ask about an era
• Who are the users?
• Who are the evaluators?
• What are the limiting factors?
Evaluation by Engineers
• Users are engineers & mathematicians
• Evaluators are engineers
• The limiting factor is reliability
Evaluation by Computer Scientists
• Users are programmers
• Evaluators are programmers
• The speed of the machine is the limiting factor
Evaluation by Experimental Psychologists & Cognitive Scientists
• Users are users: the computer is a tool, not an end result
• Evaluators are cognitive scientists and experimental psychologists: they’re used to measuring things through experiment
• The limiting factor is what the human can do
Perceptual issues such as print legibility and motor issues arose in designing displays, keyboards and other input devices… [new interface developments] created opportunities for cognitive psychologists to contribute in such areas as motor learning, concept formation, semantic memory and action. In a sense, this marks the emergence of the distinct discipline of human-computer interaction. (Grudin 2006)
Evaluation by Experimental Psychologists & Cognitive Scientists
Case Study of Evaluation: Text Editors
Roberts & Moran, 1982, 1983.
Their methodology for evaluating text editors had three criteria:
• objectivity
• thoroughness
• ease-of-use
Case Study: Text Editors
objectivity: “implies that the methodology not be biased in favor of any particular editor’s conceptual structure”
thoroughness: “implies that multiple aspects of editor use be considered”
ease-of-use (of the method, not the editor itself): “the methodology should be usable by editor designers, managers of word processing centers, or other nonpsychologists who need this kind of evaluative information but who have limited time and equipment resources”
Case Study: Text Editors
Text editors are the white rats of HCI
Thomas Green, 1984, in Grudin, 1990.
Evaluation by HCI Professionals
• Usability professionals
• They believe in expertise (e.g. Nielsen 1984)
• They’ve made a decision to focus on better results, regardless of whether they were experimentally provable or not.
Case Study: The Damaged Merchandise Debate
Damaged Merchandise Setup
Early eighties: usability evaluation methods (UEMs)
- heuristics (Nielsen)
- cognitive walkthrough
- GOMS
- …
Damaged Merchandise Comparison Studies
Jeffries, Miller, Wharton and Uyeda (1991)
Karat, Campbell and Fiegel (1992)
Nielsen (1992)
Desurvire, Kondziela, and Atwood (1992)
Nielsen and Phillips (1993)
Damaged Merchandise Panel
Wayne D. Gray, Panel at CHI’95
Discount or Disservice? Discount Usability Analysis at a Bargain Price or Simply Damaged Merchandise?
Damaged Merchandise Paper
Wayne D. Gray & Marilyn Salzman
Special issue of HCI: Experimental Comparisons of Usability Evaluation Methods
Damaged Merchandise Response
Commentary on Damaged Merchandise
Karat: experiment in context
Jeffries & Miller: real-world
Lund & McClelland: practical
John: case studies
Monk: broad questions
Oviatt: field-wide science
MacKay: triangulate
Newman: simulation & modelling
Damaged Merchandise What’s going on?
Gray & Salzman, p. 19:
“There is a tradition in the human factors literature of providing advice to practitioners on issues related to, but not investigated in, an experiment. This tradition includes the clear and explicit separation of experiment-based claims from experience-based advice. Our complaint is not against experimenters who attempt to offer good advice… the advice may be understood as research findings rather than the researcher’s opinion.”
Damaged Merchandise Clash of Paradigms
Experimental Psychologists & Cognitive Scientists
(who believe in experimentation) vs.
HCI Professionals (who believe in experience and expertise, even if ‘unprovable’, and who were trying to present their work in the terms of the dominant paradigm of the field)
CSCW
Briefly…
• CSCW vs. HCI
• Not just groups instead of users, but philosophy & approach (ideology?)
• Posits that work is member-created, dynamic, and explicitly not cognitive or modelable
• Follows failure of ‘workplace studies’ to characterize work
Evaluation in CSCW
• Ramage, The Learning Way (Ph.D., Lancaster 1999)
– No single ‘right’ or ‘wrong’
– Identify why evaluate here
– Determine stakeholders
– Observe & analyze
– Learn
• Note the differences between this kind of approach and more traditional HCI user testing.
• Fundamentally different from HCI: so much so they became a new field.
Experience-Focused HCI
• A possibly emerging sub-field, drawing from traditions and disciplines outside the field
• Emphasis on the experience, not [just] the task
• But how to evaluate?
Experience-Focused HCI
Isbister et al.: open-ended affective evaluations that leverage real-time individual interpretations.
Isbister, Höök, Sharp, Laaksolahti. The Sensual Evaluation Instrument: Developing an Affective Evaluation Tool. Proc. CHI’06
Experience-Focused HCI
Gaver et al.: cultural commentators with expertise in their own fields provide multi-layered assessment.
Gaver, W. Cultural Commentators for Polyphonic Assessment. To appear in IJHCI.
Experience-Focused HCI
Kaye et al.: cultural probes to provide user-interpreted thick descriptions of use experience.
Kaye, Levitt, Nevins, Golden & Schmidt. Communicating Intimacy One Bit at a Time. Ext. Abs. CHI 2005.
Epistemology
• How does a field know what it knows?
• How does a field know that it knows it?
• Science: experiment…
• But literature? Anthropology? Sociology? Therapy? Art? Theatre? Design?
• These disciplines have ways to talk about experience that are lacking in an experimental paradigm.
Formally…
The aim of this work is to recognize the ways in which multiple epistemologies, not just the experimental paradigm of science, must inform the hybrid discipline of human-computer interaction if we wish to build systems that support users’ increasingly rich interactions with technology.
An evolving discussion
Thanks to
• Mark Blythe & Darren Reed
• Louise Barkhuus & Barry Brown, University of Glasgow
• Alex Taylor & MS Research
• Phoebe Sengers & CEmCom
• Cornell S&TS Department
• Maria Håkansson & the IT University Göteborg
• Andy Warr & The Oxford E-Research Center