![Page 2: Ryen White Microsoft Research ryenw@microsoft.com research.microsoft.com/~ryenw/talks/ppt/WhiteIMT542E.ppt](https://reader035.vdocument.in/reader035/viewer/2022062409/56649ec95503460f94bd7827/html5/thumbnails/2.jpg)
Overview
- Short, selfish bit about me
- User evaluation in IR
- Case study combining two approaches
  - User study
  - Log-based
- Introduction to Exploratory Search Systems
  - Focus on evaluation
- Short group activity
- Wrap-up

Me, Me, Me
- Interested in understanding and supporting people’s search behaviors, in particular on the Web
- Ph.D. in Interactive Information Retrieval from University of Glasgow, Scotland (2001 – 2004)
- Post-doc at University of Maryland Human-Computer Interaction Lab (2004 – 2006)
- Instructor for course on Human-Computer Interaction at UMD College of Library and Information Studies
- Researcher in Text Mining, Search, and Navigation group at Microsoft Research, Redmond (2006 – present)

Search Interfaces
- There are lots of different search interfaces, for lots of different situations
- Big question: How do we evaluate these interfaces?

Some Approaches
- Laboratory Experiments
- Naturalistic Studies
- Longitudinal Studies
- Formative (during) and Summative (after) evaluations
- Traditional usability studies
  - Is an interface usable? Generally not comparative.
- Case Studies
  - Often designer, not user, driven

Research Questions
- Research questions are questions that you hope that your study will answer (a formal statement of your goal)
- Hypotheses are specific predictions about relationships among variables
- Questions should be meaningful, answerable, concise, open-ended, and value-free

Research Questions: Example 1
- For a study of advanced query syntax (e.g., +, -, “”, site:), the research questions were:
  - Is there a relationship between the use of advanced syntax and other characteristics of a search?
  - Is there a relationship between the use of advanced syntax and post-query navigation behaviors?
  - Is there a relationship between the use of advanced syntax and measures of search success?

Research Questions: Example 2
- For a study of an interface gadget that points users to popular destinations (i.e., pages that many people visit):
  - Are popular destinations preferable and more effective than query refinement suggestions and unaided Web search for:
    - Searches that are well-defined (“known-item” tasks)?
    - Searches that are ill-defined (“exploratory” tasks)?
  - Should popular destinations be taken from the end of query trails or the end of session trails?
- More on this research question in the case study later!

Variables
- Independent Variable (IV): the “cause”; this is often (but not always) controlled or manipulated by the investigator
- Dependent Variable (DV): the “effect”; this is what is proposed to change as a result of different values of the independent variable
- Other variables:
  - Intervening variable: explains the link between variables
  - Moderating variable: affects the direction/strength of the IV-to-DV relationship
  - Confounding variable: not controlled for, affects the DV

Hypotheses
- Alternative Hypothesis: a statement describing the relationship between two or more variables
  - E.g., Search engine users that use advanced query syntax find more relevant Web pages
- Null Hypothesis: a statement declaring that there is no relationship among variables; you may have heard of
  - “rejecting the null hypothesis”
  - “failing to reject the null hypothesis”
  - E.g., Search engine users that use advanced query syntax find Web pages that are no more or less relevant than other users

Experimental Design
- Within and/or Between Subjects
  - Within-subjects: All subjects use all systems
  - Between-subjects: Subjects use only one system, different blocks of users use each system
- Control:
  - System with no modifications (in within-subjects)
  - Group of subjects that do not use experimental system, but instead use a baseline (in between-subjects)
- Factorial Designs
  - More than one variable (factor), e.g., system × task type
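
As a rough illustration of a factorial design, the sketch below enumerates every system × task type condition. The factor levels are borrowed from the case study later in this lecture; the function name `factorial_conditions` is a made-up helper, not part of any library.

```python
from itertools import product

# Factor levels borrowed from the case study later in this lecture.
SYSTEMS = ["Baseline", "QuerySuggestion", "QueryDestination", "SessionDestination"]
TASK_TYPES = ["known-item", "exploratory"]


def factorial_conditions(*factors):
    """Return every combination of factor levels (a full factorial design)."""
    return list(product(*factors))


conditions = factorial_conditions(SYSTEMS, TASK_TYPES)
for system, task_type in conditions:
    print(f"{system} x {task_type}")
print(len(conditions), "conditions")
```

With four systems and two task types this yields eight conditions, which is why adding factors quickly increases the number of subjects or sessions a study needs.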

Tasks
- Task or topic?
  - Task is the activity the user is asked to perform
  - Topic is the subject matter of the task
- Artificial tasks
  - Subjects given task or even queries; relevance pre-determined
- Simulated work tasks (Borlund, 2000)
  - Subjects given task; compose queries; determine relevance
- Natural tasks (Kelly & Belkin, 2004)
  - Subjects construct own tasks as part of real needs

System & Task Rotation
- Rotation & counterbalancing to counteract learning effects
- Latin Square rotation
  - n × n table filled with n different symbols so that each symbol occurs exactly once in each row and exactly once in each column
  - Example orders for three systems: 2 1 3; 1 3 2; 3 2 1
- Factorial rotation
  - All possible combinations
  - Example orders for three systems: 1 2 3; 2 1 3; 1 3 2; 3 1 2; 2 3 1; 3 2 1
  - Factorial has twice as many subjects
  - Twice as expensive to perform
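
The two rotation schemes can be sketched in a few lines. This is a minimal illustration, assuming three systems labelled 1 to 3; the cyclic construction is just one simple way to obtain a Latin square, and the helper names are hypothetical.

```python
from itertools import permutations


def cyclic_latin_square(n):
    """Build an n x n Latin square by cyclically shifting the symbols 1..n."""
    return [[(row + col) % n + 1 for col in range(n)] for row in range(n)]


def is_latin_square(square):
    """Check that every symbol occurs exactly once in each row and column."""
    n = len(square)
    symbols = set(range(1, n + 1))
    rows_ok = all(len(row) == n and set(row) == symbols for row in square)
    cols_ok = rows_ok and all(
        {square[r][c] for r in range(n)} == symbols for c in range(n)
    )
    return rows_ok and cols_ok


latin_orders = cyclic_latin_square(3)              # 3 system orders
factorial_orders = list(permutations([1, 2, 3]))   # all 6 system orders

print(latin_orders)
print(len(factorial_orders), "factorial orders")
```

For three systems the Latin square needs three presentation orders while the factorial rotation needs all six, which matches the "twice as many subjects" point on the slide.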

Data Collection
- Questionnaires
- Diaries
- Interviews
- Focus groups
- Observation
- Think-aloud
- Logging (system, proxy & server, client)

Data Analysis: Quantitative
- Descriptive Statistics
  - Describe the characteristics of a sample or the relationship among variables
  - Present summary information about the sample
  - E.g., mean, correlation coefficient
- Inferential Statistics
  - Used for hypothesis testing
  - Demonstrate cause/effect relationships
  - E.g., t-value (from t-test), F-value (from ANOVA)
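
To make the distinction concrete, here is a small sketch that computes two descriptive statistics (mean, Pearson correlation) and one inferential statistic (a paired t-value, as would suit a within-subjects comparison). The task times are invented for illustration, and the functions are hand-rolled with the standard library rather than taken from a statistics package.

```python
from math import sqrt
from statistics import mean, stdev


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    spread = sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    return cov / spread


def paired_t(xs, ys):
    """t-value for paired samples: mean difference over its standard error."""
    diffs = [x - y for x, y in zip(xs, ys)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


# Hypothetical task times (seconds) for the same six subjects on two systems.
baseline = [350, 410, 290, 380, 330, 360]
experimental = [300, 390, 250, 340, 320, 310]

print(f"mean baseline     = {mean(baseline):.1f}")
print(f"mean experimental = {mean(experimental):.1f}")
print(f"Pearson r         = {pearson_r(baseline, experimental):.3f}")
print(f"paired t (df={len(baseline) - 1})   = {paired_t(baseline, experimental):.3f}")
```

The t-value would then be compared against the critical value for the chosen significance level and degrees of freedom to decide whether to reject the null hypothesis.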

Data Analysis: Qualitative
- Coding: open-ended questions, transcribed think-aloud, …
  - Classifying or categorizing individual pieces of data
- Open Coding: codes are suggested by the investigator’s examination and questioning of the data
  - Iterative process
- Closed Coding: codes are identified before the data is collected
- Each passage can have more than one code
- Not every passage has to have a code
- Code, code, and code some more!
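
A closed-coding pass can be tallied very simply. The sketch below assumes a made-up three-code codebook and a handful of illustrative passages; it shows that a passage may carry several codes or none, and that codes outside the predefined codebook are rejected.

```python
from collections import Counter

# Hypothetical closed codebook, fixed before the data is collected.
CODEBOOK = {"familiarity", "query_support", "relevance"}

# Illustrative passages, each with the set of codes an investigator assigned.
coded_passages = [
    ("was familiar and I didn't end up using suggestions", {"familiarity"}),
    ("helps me better phrase the search term", {"query_support"}),
    ("saved typing but the suggestions were not relevant", {"query_support", "relevance"}),
    ("no further comments", set()),  # a passage need not receive any code
]


def tally_codes(passages, codebook):
    """Count how often each code was applied, rejecting codes not in the codebook."""
    counts = Counter()
    for text, codes in passages:
        unknown = codes - codebook
        if unknown:
            raise ValueError(f"codes not in codebook: {sorted(unknown)}")
        counts.update(codes)
    return counts


code_counts = tally_codes(coded_passages, CODEBOOK)
print(code_counts.most_common())
```

In open coding the codebook would instead grow as the investigator reads the data, so the check against a fixed set would not apply.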

Case Study
Leveraging popular destinations to enhance Web search interaction

White, R.W., Bilenko, M., Cucerzan, S. (2007). Studying the use of popular destinations to enhance web search interaction. In Proceedings of the 30th ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 159-166.

Motivation
- Query suggestion is a popular approach to help users better define their information needs
  - Incremental: may be inappropriate for exploratory needs
- In exploratory searches users rely a lot on browsing
  - Can we use places others go rather than what they say?

Screenshot: query suggestions for the query [hubble telescope]

Search Trails: from user logs
- Initiated with a query to a top-5 search engine
- Query trails: Query → Query
- Session trails: Query → Event:
  - Session timeout
  - Visit homepage
  - Type URL
  - Check Web-based email or logon to online service

Figure: example search trail through pages S1 to S14, with the QueryTrailEnd and SessionTrailEnd pages marked. Labels in the figure: digital cameras, digital camera canon, amazon, canon lenses, dpreview.com, pmai.org, canon.com, amazon.com, howstuffworks.com, digitalcamera-hq.com.
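
One way to picture the two trail types is as a pass over a time-ordered event list. The sketch below is only an illustration: the `(kind, value)` event format is an assumption, the session is taken to be already cut at a session-ending event, and the example events loosely reuse names from the figure rather than reproducing it.

```python
def trail_ends(events):
    """Find query-trail and session-trail end pages for one search session.

    events: time-ordered (kind, value) pairs, where kind is "query" or "page".
    The list is assumed to stop at a session-ending event (e.g., a timeout).
    Returns (query_trail_ends, session_trail_end).
    """
    query_trail_ends = []  # (query, last page visited before the next query)
    first_query = current_query = last_page = final_page = None
    for kind, value in events:
        if kind == "query":
            # A new query closes the previous query trail.
            if current_query is not None and last_page is not None:
                query_trail_ends.append((current_query, last_page))
            current_query, last_page = value, None
            if first_query is None:
                first_query = value
        elif kind == "page" and current_query is not None:
            last_page = final_page = value
    # The end of the session closes the last query trail.
    if current_query is not None and last_page is not None:
        query_trail_ends.append((current_query, last_page))
    session_trail_end = (first_query, final_page) if final_page is not None else None
    return query_trail_ends, session_trail_end


session = [
    ("query", "digital cameras"),
    ("page", "dpreview.com"),
    ("page", "canon.com"),
    ("query", "canon lenses"),
    ("page", "amazon.com"),
    ("page", "howstuffworks.com"),
]
query_ends, session_end = trail_ends(session)
print(query_ends)
print(session_end)
```

Here the session trail is attributed to the first query of the session, which is one plausible reading of the slide; other attributions are possible.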

Popular Destinations
- Pages at which other users end up frequently after submitting the same or similar queries, and then browsing away from initially clicked search results
- Popular destinations lie at the end of many users’ trails
  - May not be among the top-ranked results
  - May not contain the queried terms
  - May not even be indexed by the search engine
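
Given a corpus of trail ends, popular destinations for a query can be approximated by counting where trails for that query finish. This is a simplified sketch with toy data; it matches queries exactly, whereas the slide also mentions similar queries, and it is not the ranking method used in the study.

```python
from collections import Counter, defaultdict


def popular_destinations(trail_ends, top_k=3):
    """Rank, per query, the pages at which the most trails ended."""
    by_query = defaultdict(Counter)
    for query, destination in trail_ends:
        by_query[query][destination] += 1
    return {
        query: [page for page, _ in counts.most_common(top_k)]
        for query, counts in by_query.items()
    }


# Toy (query, trail end page) pairs; the pages and counts are made up.
toy_trail_ends = [
    ("hubble telescope", "hubblesite.org"),
    ("hubble telescope", "hubblesite.org"),
    ("hubble telescope", "hubblesite.org"),
    ("hubble telescope", "nasa.gov"),
    ("hubble telescope", "nasa.gov"),
    ("hubble telescope", "wikipedia.org"),
    ("digital cameras", "dpreview.com"),
]

destinations = popular_destinations(toy_trail_ends, top_k=2)
print(destinations["hubble telescope"])
```

Feeding the function query-trail ends or session-trail ends gives the two variants of destination suggestion compared in the study.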

Suggesting Destinations
- Can we exploit a corpus of trails to support Web search?

Research Questions
- RQ1: Are destination suggestions preferable and more effective than query refinement suggestions and unaided Web search for:
  - Searches that are well-defined (“known-item” tasks)
  - Searches that are ill-defined (“exploratory” tasks)
- RQ2: Should destination suggestions be taken from the end of the query trails or the end of the session trails?

User Study
- Conducted a user study to answer these questions
- 36 subjects drawn from subject pool within our organization
- 4 systems
- 2 task types (“known-item” and “exploratory”)
- Within-subject experimental design
- Graeco-Latin square design
- Subjects attempted 2 known-item and 2 exploratory tasks, one on each system

Systems: Unaided Web Search
- Live Search backend
- No direct support for query refinement

Screenshot: results for the query [hubble telescope]

Systems: Query Suggestion
- Suggests queries based on popular extensions for the current query typed by the user

Screenshot: query suggestions for the query [hubble telescope]

Systems: Destination Suggestion
- Query Destination (unaided + page support)
  - Suggests pages many users visit before next query
- Session Destination (unaided + page support)
  - Same as above, but before session end, not next query

Screenshot: destination suggestions for the query [hubble telescope]

Tasks
- Tasks taken and adapted from TREC Interactive Track and QA communities (e.g., Live QnA, Yahoo! Answers)
- Six of each task type, subject chose without replacement
- Two task types: known-item and exploratory
  - Known-item: Identify three tropical storms (hurricanes and typhoons) that have caused property damage and/or loss of life.
  - Exploratory task: You are considering purchasing a Voice Over Internet Protocol (VoIP) telephone. You want to learn more about VoIP technology and providers that offer the service, and select the provider and telephone that best suits you.

Methodology
- Subjects:
  - Chose two known-item and two exploratory tasks from six
  - Completed demographic and experience questionnaire
- For each of four interfaces, subjects were:
  - Given an explanation of interface functionality (2 min.)
  - Asked to attempt the task on the assigned system (10 min.)
  - Asked to complete a post-search questionnaire after each task
- After using four systems, subjects answered exit questionnaire

Findings: System Ranking
- Subjects asked to rank the systems in preference order
- Subjects preferred QuerySuggestion and QueryDestination
  - Differences not statistically significant
- Overall ranking merges performance on different types of search task to produce one ranking

| Systems | Baseline | QuerySuggestion | QueryDestination | SessionDestination |
|---|---|---|---|---|
| Ranking | 2.47 | 2.14 | 1.92 | 2.31 |

Relative ranking of systems (lower = better).

Findings: Subject Comments
- Responses to open-ended questions
- Baseline:
  - (+) familiarity of the system (e.g., “was familiar and I didn’t end up using suggestions” (S36))
  - (−) lack of support for query formulation (“Can be difficult if you don’t pick good search terms” (S20))
  - (−) difficulty locating relevant documents (e.g., “Difficult to find what I was looking for” (S13))

Findings: Subject Comments
- Query Suggestion:
  - (+) rapid support for query formulation (e.g., “was useful in saving typing and coming up with new ideas for query expansion” (S12); “helps me better phrase the search term” (S24); “made my next query easier” (S21))
  - (−) suggestion quality (e.g., “Not relevant” (S11); “Popular queries weren’t what I was looking for” (S18))
  - (−) quality of results they led to (e.g., “Results (after clicking on suggestions) were of low quality” (S35); “Ultimately unhelpful” (S1))

Findings: Subject Comments
- QueryDestination:
  - (+) support for accessing new information sources (e.g., “provided potentially helpful and new areas / domains to look at” (S27))
  - (+) bypassing the need to browse to these pages (“Useful to try to ‘cut to the chase’ and go where others may have found answers to the topic” (S3))
  - (−) lack of specificity in the suggested domains (“Should just link to site-specific query, not site itself” (S16); “Sites were not very specific” (S24); “Too general/vague” (S28))
  - (−) quality of the suggestions (“Not relevant” (S11); “Irrelevant” (S6))

Findings: Subject Comments
- SessionDestination:
  - (+) utility of the suggested domains (“suggestions make an awful lot of sense in providing search assistance, and seemed to help very nicely” (S5))
  - (−) irrelevance of the suggestions (e.g., “did not seem reliable, not much help” (S30); “irrelevant, not my style” (S21))
  - (−) need to include explanations about why the suggestions were offered (e.g., “low-quality results, not enough information presented” (S35))

Findings: Task Completion
- Subjects felt that they were more successful for known-item searches on QuerySuggestion and more successful for exploratory searches on QueryDestination

| Task type | Baseline | QSuggestion | QDestination | SDestination |
|---|---|---|---|---|
| Known-item | 2.0 | 1.3 | 1.4 | 1.4 |
| Exploratory | 2.8 | 2.3 | 1.4 | 2.6 |

Perceptions of task success (lower = better, scale = 1-5).

Findings: Task Completion Time
- QuerySuggestion and QueryDestination sped up known-item performance
- Exploratory tasks took longer

| Task category | Baseline | QSuggest | QDestination | SDestination |
|---|---|---|---|---|
| Known-item | 348.8 | 272.3 | 232.3 | 359.8 |
| Exploratory | 513.7 | 467.8 | 474.2 | 472.2 |

Bar chart: task completion time in seconds (axis 0 to 600) by system and task category.

Findings: Interaction
- Known-item tasks
  - Subjects used query suggestion most heavily
- Exploratory tasks
  - Subjects benefited most from destination suggestions
  - Subjects submitted fewer queries and clicked fewer search results on QueryDestination

| Task type | QSuggestion | QDestination | SDestination |
|---|---|---|---|
| Known-item | 35.7 | 33.5 | 23.4 |
| Exploratory | 30.0 | 35.2 | 25.3 |

Suggestion uptake (values are percentages).

Log Analysis
- These findings are all from the laboratory
- Logs from consenting users of the Windows Live Toolbar allowed us to determine the external validity of our experimental findings
  - Do the behaviors observed in the study mimic those of real users in the “wild”?
- Extracted search sessions from the logs that started with the same initial queries as our user study subjects

Log Analysis: Search Trails
- Initiated with a query to a top-5 search engine
- Query trails: Query → Query
- Session trails: Query → Event:
  - Session timeout
  - Visit homepage
  - Type URL
  - Check Web-based email or logon to online service

Figure: the same example search trail (pages S1 to S14, with QueryTrailEnd and SessionTrailEnd marked) shown on the earlier “Search Trails: from user logs” slide.

Log Analysis: Trails
- We extracted 2,038 trails from the logs that began with the same query as a user study session
  - 700 from known-item and 1,338 from exploratory tasks
- In vitro group: User study subjects
- Ex vitro group: Remote subjects
- Compared:
  - # query iterations, # unique query terms, # result clicks, and # of unique domains visited
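
The four compared features can be computed per trail along the following lines. This is a sketch under stated assumptions: the `(kind, value)` event format is hypothetical, query terms are split on whitespace, and a "domain" is taken to be the URL's host name; the study's exact definitions may differ.

```python
from urllib.parse import urlparse


def trail_features(events):
    """Compute simple interaction features for one trail.

    events: time-ordered (kind, value) pairs where kind is "query",
    "click" (a search result click), or "page" (any other page visit).
    """
    queries = [value for kind, value in events if kind == "query"]
    clicks = [value for kind, value in events if kind == "click"]
    visited = [value for kind, value in events if kind in ("click", "page")]
    return {
        "query_iterations": len(queries),
        "unique_query_terms": len({t for q in queries for t in q.lower().split()}),
        "result_clicks": len(clicks),
        "unique_domains": len({urlparse(url).netloc for url in visited}),
    }


# Made-up trail for illustration.
example_trail = [
    ("query", "digital cameras"),
    ("click", "https://www.dpreview.com/reviews"),
    ("query", "digital camera canon"),
    ("click", "https://www.canon.com/cameras"),
    ("click", "https://www.canon.com/lenses"),
    ("page", "https://www.amazon.com/canon"),
]
features = trail_features(example_trail)
print(features)
```

Averaging these per-trail values over the in vitro and ex vitro groups would give the kind of comparison the log analysis reports.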

Log Analysis: Results
- Generally same, apart from in the number of unique query terms submitted
  - Subjects may be taking terms from the textual task descriptions provided to them

| Feature | Known-item: In vitro | Known-item: Ex vitro (10 min) | Known-item: Ex vitro (All) | Exploratory: In vitro | Exploratory: Ex vitro (10 min) | Exploratory: Ex vitro (All) |
|---|---|---|---|---|---|---|
| Query iterations | 1.9 | 2.3 | 2.6 | 3.1 | 3.0 | 3.8 |
| Unique query terms | 5.2 | 2.8 | 3.2 | 7.4 | 4.4 | 4.9 |
| Result clicks | 2.6 | 1.8 | 2.5 | 3.3 | 2.8 | 3.1 |
| Unique domains | 1.3 | 1.4 | 1.7 | 2.1 | 1.8 | 2.1 |

Callout on the in vitro unique query terms (5.2 and 7.4): These numbers are high!

Log Analysis: Results
- Known-item tasks
  - 72% overlap between queries issued and terms appearing in the task description
- Exploratory tasks
  - 79% overlap between queries issued and terms appearing in the task description
- Could confound experiment if we are interested in query formulation behavior: need to address!
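
The slides do not say how the overlap was measured. As one possible reading, the sketch below computes the fraction of distinct query terms that also appear in the task description. The task text is the known-item example from the Tasks slide; the queries are invented.

```python
import re


def tokens(text):
    """Lowercase a string and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def query_task_overlap(queries, task_description):
    """Fraction of distinct query terms that also occur in the task description."""
    task_terms = set(tokens(task_description))
    query_terms = {term for query in queries for term in tokens(query)}
    if not query_terms:
        return 0.0
    return len(query_terms & task_terms) / len(query_terms)


task = (
    "Identify three tropical storms (hurricanes and typhoons) that have "
    "caused property damage and/or loss of life."
)
# Hypothetical queries a subject might issue for this task.
subject_queries = ["tropical storms property damage", "worst hurricanes loss of life"]

overlap = query_task_overlap(subject_queries, task)
print(f"{overlap:.0%} of query terms appear in the task description")
```

A high value for laboratory subjects compared with remote users would be consistent with subjects copying terms from the task description.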

Conclusions
- User study compared the popular destinations with traditional query refinement and unaided Web search
- Results revealed that:
  - RQ1a: Query suggestion preferred for known-item tasks
  - RQ1b: Destination suggestion preferred for exploratory tasks
  - RQ2: Destinations from query trails rather than session trails
- Differences in number of unique query terms suggest that textual task descriptions may introduce some degree of experimental bias

Case Study
- What did we learn?
  - Showed how a user evaluation can be conducted
  - Showed how analysis of different sources (questionnaire responses and interaction logs, both local and remote) can be combined to answer our research questions
  - Showed that the findings of a user study can be generalized in some respects to the “real” world (i.e., they have some external validity)
- Anything else?

Exploratory Search
“Exploratory search” describes:
- User’s search problem: an information-seeking problem context that is open-ended, persistent, and multi-faceted
  - commonly used in scientific discovery, learning, and decision making contexts
- User’s search strategies: information-seeking processes that are opportunistic, iterative, and multi-tactical
  - exploratory tactics are used in all manner of information seeking and reflect seeker preferences and experience as much as the goal

Marchionini’s definition:
[Figure]

Exploratory Search Systems
- Support both querying and browsing activities
  - Search engines generally just support querying
- Help users explore complex information spaces
- Help users learn about new topics: go beyond finding
- Can consider user context
  - E.g., task constraints, user emotion, changing needs

Group Activity
- Divide into two groups of 3-4 people
- Each group designs an evaluation of an exploratory search system
- Two systems:
  - mSpace: faceted spatial browser for classical music
  - PhotoMesa: photo browser with flexible filtering, grouping, and zooming tools
- You pick the evaluation criteria, comparator systems, approach, metrics, etc.

mSpace (mspace.fm)
[Screenshot]

PhotoMesa (photomesa.com)
[Screenshot]

Some questions to think about
- What are the independent/dependent variables?
- Which experimental design?
- What task types? What tasks? What topics?
- Any comparator systems?
- What subjects? How many? How will you recruit?
- Which instruments? (e.g., questionnaires)
- Which data analysis methods (qualitative/quantitative)?
- Most importantly: Which metrics?
  - How do you determine user and system performance?

Evaluating Exploratory Search
- SIGIR 2006 workshop on Evaluating Exploratory Search Systems
  - Brought together around 40 experts to discuss issues in the evaluation of exploratory search systems
  - http://research.microsoft.com/~ryenw/eess
- What metrics did they come up with?
- How do they compare to yours?

Metrics from workshop
- Engagement and enjoyment:
  - e.g., task focus, happiness with system responses, the number of actionable events (e.g., purchases, forms filled)
- Information novelty:
  - e.g., the amount of new information encountered
- Task success:
  - e.g., reach target document? encountered sufficient information en route?
- Task time: to assess efficiency
- Learning and cognition:
  - e.g., cognitive loads, attainment of learning outcomes, richness/completeness of post-exploration perspective, amount of topic space covered, number of insights

Activity Wrap-up
- [insert summary of comments from group activity]

Conclusion
- We have:
  - Described aspects of user experimentation in IR
  - Walked through a case study
  - Introduced exploratory search
  - Planned evaluation of exploratory search systems
  - Related our proposed metrics to those of others interested in evaluating exploratory search systems

Acknowledgements
- Although modified, a few of the earlier slides in this lecture were based on an excellent SIGIR 2006 tutorial given by Diane Kelly and David Harper. Thank you Diane and David!

Referenced Reading
- Borlund, P. (2000). Experimental components for the evaluation of interactive information retrieval systems. Journal of Documentation, 56(1): 71-90.
- Kelly, D. and Belkin, N.J. (2004). Display time as implicit feedback: Understanding task effects. Proceedings of the 27th ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 377-384.