TRANSCRIPT
Evaluating IR (Web) Systems
• Study of Information Seeking & IR
• Pragmatics of IR experimentation
• The dynamic Web
• Cataloging & understanding Web docs
• Web site characteristics
Study of Info Seeking & Retrieval
- Well-known authors (useful for research papers)
• Real life studies (not TREC)
- User context of questions
- Questions (structure & classification)
- Searcher (cognitive traits & decision making)
- Information Items
• Different searches with same question
• Relevant items
• “models, measures, methods, procedures and statistical analyses” (p. 175)
• Beyond common sense and anecdotes
Study 2
• Is there ever enough user research?
• A good set of elements to include in an IR system evaluation
• How do you test for real-life situations?
- Questions the users actually have
- Expertise in subject (or not)
- Intent
- User’s computers, desks & materials
• What’s a search strategy?
- Tactics, habits, previous knowledge
• How do you collect search data?
Study 3
• How do you ask questions?
- General knowledge test
- Specific search terms
• Learning Style Inventory
- NOT the best way to understand users
- Better than nothing
- Choose your questions like your users
• Let users choose their questions?
• Let users work together on searches
• Effectiveness Measures
- Recall, precision, relevance
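The effectiveness measures above reduce to simple set arithmetic once relevance judgments are in hand. A minimal sketch in Python, assuming hypothetical retrieved and relevant document IDs:

```python
# Minimal sketch of set-based effectiveness measures (hypothetical doc IDs).
retrieved = {"d1", "d2", "d3", "d4"}   # what the system returned
relevant = {"d2", "d4", "d7"}          # what the assessors judged relevant

hits = retrieved & relevant
precision = len(hits) / len(retrieved)   # fraction of returned docs that are relevant
recall = len(hits) / len(relevant)       # fraction of relevant docs that were returned

print(f"precision={precision:.2f} recall={recall:.2f}")  # precision=0.50 recall=0.67
```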
Study 4
• Measuring efficiency
- Time on tasks
- Task completion
• Correct answer
• Any answer?
- Worthwhile?
• Counting correct answers
• Statistics
- Clicks, commands, pages, results
- Not just computer time, but the overall process
- Start with the basics, then get advanced
- Regression analysis (dependencies for large studies)
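As a concrete illustration of “start with the basics, then get advanced”: a minimal sketch over a hypothetical per-session log (time on task, click count, correct answer or not). The numbers are invented, and the correlation call requires Python 3.10+:

```python
import statistics

# Hypothetical per-session measurements: (seconds on task, clicks, answered correctly?)
sessions = [
    (95, 12, True),
    (240, 31, False),
    (130, 18, True),
    (80, 9, True),
    (310, 40, False),
]

times = [t for t, _, _ in sessions]
clicks = [c for _, c, _ in sessions]
correct = [ok for _, _, ok in sessions]

# Basics first: central tendency and spread before any modelling.
print("mean time:", statistics.mean(times), "stdev:", statistics.pstdev(times))
print("mean clicks:", statistics.mean(clicks))
print("success rate:", sum(correct) / len(correct))

# Then get (a little) more advanced: do time and clicks move together?
print("time/clicks correlation:", statistics.correlation(times, clicks))  # Python 3.10+
```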
Let’s design an experiment
• User Selection
- Searcher (cognitive traits & decision making)
- User context of questions
• Environment
• Questions (structure & classification)
• Information Items
- Successful answers
- Successful/Worthwhile sessions
• Measurement
Pragmatics of IR experimentation
• The entire IR evaluation must be planned
• Controls are essential
• Working with what you can get
- Expert-defined questions & answers
- Specific systems
• Fast, cheap, informal tests
- Not always, but could be pre-tests
- Quick results for broad findings
Pragmatic Decision 1
• Testing at all?
- Purpose of test
- Pull data from previous tests
• Repeat old test
- Old test with new system
- Old test with new database
• Same test, many users
- Same system
- Same questions (data)
Pragmatic Decision 2
• What kind of test?
• Everything at once?
- System (help, no help?)
- Users (types of)
- Questions (open-ended?)
• Facts
- Answers with numbers
- Words the user knows
• General knowledge
- Found more easily
- Ambiguity goes both ways
Pragmatic Decision 3
• Understanding the Data
• What are your variables? (p. 207)
• Working with initial goals of study
• Study size determines measurement methods
- Lots of users
- Many questions
- All system features, competing system features
• What is acceptable/passable performance?
- Time, correct answers, clicks?
- Which are controlled?
Pragmatic Decision 4
• What database?
- The Web (no control)
- Smaller dataset (useful to user?)
• Very similar questions, small dataset
- Web site search vs. whole Web search
- Prior knowledge of subject
- Comprehensive survey of possible results beforehand
• Differences other than content?
Pragmatic Decision 5
• Where do queries/questions come from?
- Content itself
- User pre-interview (pre-tests)
- Other studies
• What are search terms (used or given)?
- Single terms
- Advanced searching
- Results quantity
Pragmatic Decisions 6, 7, 8
• Analyzing queries
- Scoring system
- Logging use
• What’s a winning query (treatment of units)?
- User success, expert answer
- Time, performance
- Different queries with same answer?
• Collect the data
- Logging and asking users
- Consistency (software, questionnaires, scripts)
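A minimal sketch of the “logging and asking users” point: one consistent record per query, written by the same script for every participant. The field names and file name are illustrative assumptions:

```python
import csv
import time
from dataclasses import dataclass, asdict

# Hypothetical log record for one query within a session; field names are illustrative.
@dataclass
class QueryLogEntry:
    session_id: str
    timestamp: float
    query: str
    results_returned: int
    results_clicked: int
    user_judged_success: bool

def append_log(path: str, entry: QueryLogEntry) -> None:
    """Append one entry to a CSV log so software, questionnaires and scripts stay consistent."""
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(asdict(entry).keys()))
        if f.tell() == 0:           # write the header only for a fresh file
            writer.writeheader()
        writer.writerow(asdict(entry))

append_log("search_log.csv",
           QueryLogEntry("s01", time.time(), "web ir evaluation", 10, 2, True))
```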
Pragmatic Decisions 9 & 10
• Analyzing Data
• Dependent on the dataset
• Compare to other studies
• Basic statistics first
• Presenting Results
• Work from plan
• Purpose
• Measurement
• Models
• Users
• Matching other studies
Keeping Up with the Changing Web
• Building indices is difficult enough in theory
• What about a continuously changing huge volume of information?
• Is old information good?
• What does up-to-date mean anymore?
• Is knowledge a depreciating commodity?
- Correctness + value over time
• Different information changes at different rates
- Really it’s new information
• How do you update an index with constantly changing information?
Changing Web Properties
• Known distributions for information change
• Sites and pages may have easily identifiable patterns of update
- 4% change on every observation (see the sketch below)
- Some don’t ever change (links too)
• If you check and a page hasn’t changed, what is the probability it will ever change?
• Rate of change is related to rate of attention
- Machines vs. Users
- Measures can be compared along with information
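A minimal sketch of how a crawler might estimate a per-page rate of change from repeated visits, detecting change with a content hash. The class and the revisit data are assumptions; the “4%” figure above is the kind of statistic this would produce:

```python
import hashlib

class PageHistory:
    """Track how often a page has changed across crawler visits (illustrative sketch)."""

    def __init__(self) -> None:
        self.visits = 0
        self.changes = 0
        self.last_hash = None   # hex digest of the last content seen

    def observe(self, content: bytes) -> bool:
        """Record one visit; return True if the page changed since the previous visit."""
        digest = hashlib.sha1(content).hexdigest()
        changed = self.last_hash is not None and digest != self.last_hash
        self.visits += 1
        self.changes += int(changed)
        self.last_hash = digest
        return changed

    def change_rate(self) -> float:
        """Fraction of revisits on which the page had changed (0.04 would be a '4%' page)."""
        revisits = max(self.visits - 1, 0)
        return self.changes / revisits if revisits else 0.0

h = PageHistory()
for snapshot in (b"v1", b"v1", b"v2"):
    h.observe(snapshot)
print(h.change_rate())   # 0.5 -- the page changed on one of two revisits
```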
Dynamic Maint. of Indexes w/ Landmarks
• Web crawlers do the work in gathering pages
• Incremental crawling means incremented indices
- Rebuild the whole index more frequently
- Devise a scheme for updates (and deletions)
- Use supplementary indices (e.g. date)
• New documents
• Changed documents
• 404 documents
Landmarks for Indexing
• Difference-based method
• Documents that don’t change are landmarks
- Relative addressing (see the sketch below)
- Clarke: block-based
- Glimpse: chunking
• Only update pointers to pages
• Tags and document properties are landmarked
• Broader pointers mean fewer updates
• Faster indexing – faster access?
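A minimal sketch of the relative-addressing idea: term positions are stored as offsets from landmarks, so an edit shifts a few landmark offsets instead of rewriting every posting. This illustrates the general principle only, not the specific Clarke (block-based) or Glimpse (chunking) schemes:

```python
# Sketch of landmark-based (relative) addressing for positional postings.
# landmark_id -> absolute offset of that landmark in the document
landmarks = {0: 0, 1: 1000, 2: 2000}

# term -> list of (landmark_id, offset within that landmark's block)
postings = {
    "retrieval": [(0, 120), (2, 45)],
    "index": [(1, 300)],
}

def absolute_positions(term):
    return [landmarks[lid] + off for lid, off in postings.get(term, [])]

print(absolute_positions("retrieval"))   # [120, 2045]

# Suppose 50 characters are inserted inside block 1: everything after it shifts.
# Instead of touching each posting, only the later landmarks move.
for lid in landmarks:
    if lid >= 2:
        landmarks[lid] += 50

print(absolute_positions("retrieval"))   # [120, 2095] -- postings untouched
```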
Yahoo! Cataloging the Web
• How do information professionals build an “index” of the Web?
• Cataloging applies to the Web
• Indexing with synonyms
• Browsing indexes vs. searching them
• Comprehensive index not the goal
- Quality
- Information density
• Yahoo’s own ontology – points to site for full info
• Subject trees with aliases (@) to other locations
• “More like this” comparisons as checksums
Yahoo uses tools for indexing
Investigation of Documents from the WWW
• What properties do Web documents have?
• What structure and formats do Web documents use?
- Size – 4K avg.
- Tags – ratio and popular tags
- MIME types (file extensions)
- URL properties and formats
- Links – internal and external
- Graphics
- Readability
WWW Documents Investigation
• How do you collect data like this? (see the sketch below)
- Web Crawler: URL identifier, link follower
- Index-like processing: markup parser, keyword identifier
- Domain name translation (and caching)
• How do these facts help with indexing?
• Have general characteristics changed?
• (This would be a great project to update.)
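A minimal sketch of collecting a few of the properties listed above (size, tag counts, links) for a single page, using only the standard library. The URL is a placeholder; a real crawler would add a URL queue, politeness delays, and error handling:

```python
from collections import Counter
from html.parser import HTMLParser
from urllib.request import urlopen

class PropertyParser(HTMLParser):
    """Count tags and collect outgoing links from one HTML page."""

    def __init__(self) -> None:
        super().__init__()
        self.tag_counts = Counter()
        self.links = []

    def handle_starttag(self, tag, attrs):
        self.tag_counts[tag] += 1
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

url = "https://example.com/"          # placeholder page
html = urlopen(url).read()

parser = PropertyParser()
parser.feed(html.decode("utf-8", errors="replace"))

print("size (bytes):", len(html))
print("most common tags:", parser.tag_counts.most_common(5))
print("link count:", len(parser.links))
```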
Properties of Highly-Rated Web Sites
• What about whole Web sites?
• What is a Web site?
- Sub-sites?
- Specific contextual, subject-based parts of a Web site?
- Links from other Web pages: on the site and off
- Web site navigation effects
• Will experts (like Yahoo catalogers) like a site?
Properties
• Links & formatting
• Graphics – one, but not too many
• Text formatting – 9 pt. with normal style
• Page (layout) formatting – min. colors
• Page performance (size and access)
• Site architecture (pages, nav elements)
- More links within and external
- Interactive (search boxes, menus)
• Consistency within a site is key
• How would a user or index builder make use of these?
Extra Discussion
• Little Words, Big Difference
- The difference that makes a difference
- Singular and plural noun identification can change indices and retrieval results (see the sketch below)
- Language use differences
• Decay and Failures
- Dead links
- Types of errors
- Huge number of dead links (PageRank effective)
• 28% in 1995-1999 Computer & CACM
• 41% in 2002 articles
• Better than the average Web page?
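A minimal illustration of the singular/plural point: whether the index folds “indices” into “index” changes which documents a query matches. The suffix rules below are deliberately naive assumptions, not a real stemmer:

```python
# Tiny illustration: naive plural folding changes what a query retrieves.
docs = {
    1: "building indices for the changing web",
    2: "an index of highly rated web sites",
}

def naive_singular(token):
    # Deliberately simplistic rules, only to show the effect on matching.
    if token.endswith("ices"):
        return token[:-4] + "ex"      # indices -> index
    if token.endswith("s") and len(token) > 3:
        return token[:-1]             # sites -> site
    return token

def search(query, fold_plurals):
    norm = naive_singular if fold_plurals else (lambda t: t)
    q = {norm(t) for t in query.split()}
    return [doc_id for doc_id, text in docs.items()
            if q & {norm(t) for t in text.split()}]

print(search("index", fold_plurals=False))  # [2]     only the literal match
print(search("index", fold_plurals=True))   # [1, 2]  'indices' folds to 'index'
```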
Break!
Topic Discussions Set
• Leading WIRED Topic Discussions
- About 20 minutes reviewing issues from the week’s readings
• Key ideas from the readings
• Questions you have about the readings
• Concepts from readings to expand on
- PowerPoint slides
- Handouts
- Extra readings (at least a few days before class) – send to the WIRED listserv
Web IR Evaluation
- 5-page written evaluation of a Web IR system
- Technology overview (how it works)
• Not an eval of a standard search engine
• Only main determinable diff is content
- A brief overview of the development of this type of system (why it works better)
- Intended uses for the system (who, when, why)
- (Your) examples or case studies of the system in use and its overall effectiveness
• How can (Web) IR be better?
- Better IR models
- Better User Interfaces
• More to find vs. easier to find
• Web documents sampling
• Web cataloging work
- Metadata & IR
- Who watches the catalogers?
• Scriptable applications
- Using existing IR systems in new ways
- RSS & IR
Projects and/or Papers Overview
Project Ideas
• Searchable Personal Digital Library
• Browser hacks for searching
• Mozilla keeps all the pages you surf so you can search through them later
- Mozilla hack
- Local search engines (see the sketch below)
• Keeping track of searches
• Monitoring searches
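A minimal sketch of the “local search engine over pages you have already visited” idea: a tiny in-memory inverted index over saved page text. The saved_pages/ directory is a placeholder assumption; a real project would hook into the browser’s cache or history store:

```python
import os
import re
from collections import defaultdict

def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())

index = defaultdict(set)   # term -> set of filenames containing it

# saved_pages/ is assumed to hold plain-text dumps of visited pages.
for name in os.listdir("saved_pages"):
    with open(os.path.join("saved_pages", name), encoding="utf-8", errors="replace") as f:
        for term in tokenize(f.read()):
            index[term].add(name)

def search(query):
    """AND-match: return the pages containing every query term."""
    terms = tokenize(query)
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results

print(search("information retrieval"))
```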
Paper Ideas
• New datasets for IR
• Search on the Desktop – issues, previous research and ideas
• Collaborative searching – advantages and potential, but what about privacy?
• Collaborative Filtering literature review
• Open source and IR systems history & discussion