TRANSCRIPT
Evaluating IR (Web) Systems
• Study of Information Seeking & IR
• Pragmatics of IR experimentation
• The dynamic Web
• Cataloging & understanding Web docs
• Web site characteristics
Study of Info Seeking & Retrieval
- Well-known authors (useful for research papers)
• Real life studies (not TREC)
- User context of questions
- Questions (structure & classification)
- Searcher (cognitive traits & decision making)
- Information Items
• Different searches with same question
• Relevant items
• “models, measures, methods, procedures and statistical analyses” (p. 175)
• Beyond common sense and anecdotes
Study 2
• Is there ever enough user research?
• A good set of elements to include in an IR system evaluation
• How do you test for real-life situations?
- Questions the users actually have
- Expertise in subject (or not)
- Intent
- User’s computers, desks & materials
• What’s a search strategy?
- Tactics, habits, previous knowledge
• How do you collect search data?
Study 3
• How do you ask questions?
- General knowledge test
- Specific search terms
• Learning Style Inventory
- NOT the best way to understand users
- Better than nothing
- Choose your questions like your users
• Let users choose their questions?
• Let users work together on searches
• Effectiveness Measures
- Recall, precision, relevance
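The effectiveness measures above reduce to simple set arithmetic once relevance judgments are in hand. A minimal sketch in Python, assuming hypothetical retrieved and relevant document IDs:

```python
# Minimal sketch of set-based effectiveness measures (hypothetical doc IDs).
retrieved = {"d1", "d2", "d3", "d4"}   # what the system returned
relevant = {"d2", "d4", "d7"}          # what the assessors judged relevant

hits = retrieved & relevant
precision = len(hits) / len(retrieved)   # fraction of returned docs that are relevant
recall = len(hits) / len(relevant)       # fraction of relevant docs that were returned

print(f"precision={precision:.2f} recall={recall:.2f}")  # precision=0.50 recall=0.67
```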
Study 4
• Measuring efficiency
- Time on tasks
- Task completion
• Correct answer
• Any answer?
- Worthwhile?
• Counting correct answers
• Statistics
- Clicks, commands, pages, results
- Not just computer time, but the overall process
- Start with the basics, then get advanced
- Regression analysis (dependencies for large studies)
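As a concrete illustration of “start with the basics, then get advanced”: a minimal sketch over a hypothetical per-session log (time on task, click count, correct answer or not). The numbers are invented, and the correlation call requires Python 3.10+:

```python
import statistics

# Hypothetical per-session measurements: (seconds on task, clicks, answered correctly?)
sessions = [
    (95, 12, True),
    (240, 31, False),
    (130, 18, True),
    (80, 9, True),
    (310, 40, False),
]

times = [t for t, _, _ in sessions]
clicks = [c for _, c, _ in sessions]
correct = [ok for _, _, ok in sessions]

# Basics first: central tendency and spread before any modelling.
print("mean time:", statistics.mean(times), "stdev:", statistics.pstdev(times))
print("mean clicks:", statistics.mean(clicks))
print("success rate:", sum(correct) / len(correct))

# Then get (a little) more advanced: do time and clicks move together?
print("time/clicks correlation:", statistics.correlation(times, clicks))  # Python 3.10+
```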
Let’s design an experiment
• User Selection
- Searcher (cognitive traits & decision making)
- User context of questions
• Environment
• Questions (structure & classification)
• Information Items
- Successful answers
- Successful/Worthwhile sessions
• Measurement
Pragmatics of IR experimentation
• The entire IR evaluation must be planned
• Controls are essential
• Working with what you can get
- Expert-defined questions & answers
- Specific systems
• Fast, cheap, informal tests
- Not always, but could be pre-tests
- Quick results for broad findings
Pragmatic Decision 1
• Testing at all?
- Purpose of test
- Pull data from previous tests
• Repeat old test
- Old test with new system
- Old test with new database
• Same test, many users
- Same system
- Same questions (data)
Pragmatic Decision 2
• What kind of test?
• Everything at once?
- System (help, no help?)
- Users (types of)
- Questions (open-ended?)
• Facts
- Answers with numbers
- Words the user knows
• General knowledge
- Found more easily
- Ambiguity goes both ways
Pragmatic Decision 3
• Understanding the Data
• What are your variables? (p. 207)
• Working with initial goals of study
• Study size determines measurement methods
- Lots of users
- Many questions
- All system features, competing system features
• What is acceptable/passable performance?
- Time, correct answers, clicks?
- Which are controlled?
Pragmatic Decision 4
• What database?
- The Web (no control)
- Smaller dataset (useful to user?)
• Very similar questions, small dataset
- Web site search vs. whole Web search
- Prior knowledge of subject
- Comprehensive survey of possible results beforehand
• Differences other than content?
Pragmatic Decision 5
• Where do queries/questions come from?
- Content itself
- User pre-interview (pre-tests)
- Other studies
• What are search terms (used or given)?
- Single terms
- Advanced searching
- Results quantity
Pragmatic Decisions 6, 7, 8
• Analyzing queries
- Scoring system
- Logging use
• What’s a winning query (treatment of units)?
- User success, expert answer
- Time, performance
- Different queries with same answer?
• Collect the data
- Logging and asking users
- Consistency (software, questionnaires, scripts)
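A minimal sketch of the “logging and asking users” point: one consistent record per query, written by the same script for every participant. The field names and file name are illustrative assumptions:

```python
import csv
import time
from dataclasses import dataclass, asdict

# Hypothetical log record for one query within a session; field names are illustrative.
@dataclass
class QueryLogEntry:
    session_id: str
    timestamp: float
    query: str
    results_returned: int
    results_clicked: int
    user_judged_success: bool

def append_log(path: str, entry: QueryLogEntry) -> None:
    """Append one entry to a CSV log so software, questionnaires and scripts stay consistent."""
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(asdict(entry).keys()))
        if f.tell() == 0:           # write the header only for a fresh file
            writer.writeheader()
        writer.writerow(asdict(entry))

append_log("search_log.csv",
           QueryLogEntry("s01", time.time(), "web ir evaluation", 10, 2, True))
```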
Pragmatic Decisions 9 & 10
• Analyzing Data
• Dependent on the dataset
• Compare to other studies
• Basic statistics first
• Presenting Results
• Work from plan
• Purpose
• Measurement
• Models
• Users
• Matching other studies
Keeping Up with the Changing Web
• Building indices is difficult enough in theory
• What about a continuously changing huge volume of information?
• Is old information good?
• What does up-to-date mean anymore?
• Is knowledge a depreciating commodity?
- Correctness + value over time
• Different information changes at different rates
- Really it’s new information
• How do you update an index with constantly changing information?
Changing Web Properties
• Known distributions for information change
• Sites and pages may have easily identifiable patterns of update
- 4% change on every observation (see the sketch below)
- Some don’t ever change (links too)
• If you check and a page hasn’t changed, what is the probability it will ever change?
• Rate of change is related to rate of attention
- Machines vs. Users
- Measures can be compared along with information
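A minimal sketch of how a crawler might estimate a per-page rate of change from repeated visits, detecting change with a content hash. The class and the revisit data are assumptions; the “4%” figure above is the kind of statistic this would produce:

```python
import hashlib

class PageHistory:
    """Track how often a page has changed across crawler visits (illustrative sketch)."""

    def __init__(self) -> None:
        self.visits = 0
        self.changes = 0
        self.last_hash = None   # hex digest of the last content seen

    def observe(self, content: bytes) -> bool:
        """Record one visit; return True if the page changed since the previous visit."""
        digest = hashlib.sha1(content).hexdigest()
        changed = self.last_hash is not None and digest != self.last_hash
        self.visits += 1
        self.changes += int(changed)
        self.last_hash = digest
        return changed

    def change_rate(self) -> float:
        """Fraction of revisits on which the page had changed (0.04 would be a '4%' page)."""
        revisits = max(self.visits - 1, 0)
        return self.changes / revisits if revisits else 0.0

h = PageHistory()
for snapshot in (b"v1", b"v1", b"v2"):
    h.observe(snapshot)
print(h.change_rate())   # 0.5 -- the page changed on one of two revisits
```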
Dynamic Maint. of Indexes w/ Landmarks
• Web crawlers do the work in gathering pages
• Incremental crawling means incremented indices
- Rebuild the whole index more frequently
- Devise a scheme for updates (and deletions)
- Use supplementary indices (e.g. date)
• New documents
• Changed documents
• 404 documents
Landmarks for Indexing
• Difference-based method
• Documents that don’t change are landmarks
- Relative addressing (see the sketch below)
- Clarke: block-based
- Glimpse: chunking
• Only update pointers to pages
• Tags and document properties are landmarked
• Broader pointers mean fewer updates
• Faster indexing – faster access?
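A minimal sketch of the relative-addressing idea: term positions are stored as offsets from landmarks, so an edit shifts a few landmark offsets instead of rewriting every posting. This illustrates the general principle only, not the specific Clarke (block-based) or Glimpse (chunking) schemes:

```python
# Sketch of landmark-based (relative) addressing for positional postings.
# landmark_id -> absolute offset of that landmark in the document
landmarks = {0: 0, 1: 1000, 2: 2000}

# term -> list of (landmark_id, offset within that landmark's block)
postings = {
    "retrieval": [(0, 120), (2, 45)],
    "index": [(1, 300)],
}

def absolute_positions(term):
    return [landmarks[lid] + off for lid, off in postings.get(term, [])]

print(absolute_positions("retrieval"))   # [120, 2045]

# Suppose 50 characters are inserted inside block 1: everything after it shifts.
# Instead of touching each posting, only the later landmarks move.
for lid in landmarks:
    if lid >= 2:
        landmarks[lid] += 50

print(absolute_positions("retrieval"))   # [120, 2095] -- postings untouched
```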
Yahoo! Cataloging the Web
• How do information professionals build an “index” of the Web?
• Cataloging applies to the Web
• Indexing with synonyms
• Browsing indexes vs. searching them
• Comprehensive index not the goal
- Quality
- Information density
• Yahoo’s own ontology – points to site for full info
• Subject trees with aliases (@) to other locations
• “More like this” comparisons as checksums
Yahoo uses tools for indexing
Investigation of Documents from the WWW
• What properties do Web documents have?
• What structure and formats do Web documents use?
- Size – 4K avg.
- Tags – ratio and popular tags
- MIME types (file extensions)
- URL properties and formats
- Links – internal and external
- Graphics
- Readability
WWW Documents Investigation
• How do you collect data like this? (see the sketch below)
- Web Crawler: URL identifier, link follower
- Index-like processing: markup parser, keyword identifier
- Domain name translation (and caching)
• How do these facts help with indexing?
• Have general characteristics changed?
• (This would be a great project to update.)
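A minimal sketch of collecting a few of the properties listed above (size, tag counts, links) for a single page, using only the standard library. The URL is a placeholder; a real crawler would add a URL queue, politeness delays, and error handling:

```python
from collections import Counter
from html.parser import HTMLParser
from urllib.request import urlopen

class PropertyParser(HTMLParser):
    """Count tags and collect outgoing links from one HTML page."""

    def __init__(self) -> None:
        super().__init__()
        self.tag_counts = Counter()
        self.links = []

    def handle_starttag(self, tag, attrs):
        self.tag_counts[tag] += 1
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

url = "https://example.com/"          # placeholder page
html = urlopen(url).read()

parser = PropertyParser()
parser.feed(html.decode("utf-8", errors="replace"))

print("size (bytes):", len(html))
print("most common tags:", parser.tag_counts.most_common(5))
print("link count:", len(parser.links))
```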
Properties of Highly-Rated Web Sites
• What about whole Web sites?
• What is a Web site?
- Sub-sites?
- Specific contextual, subject-based parts of a Web site?
- Links from other Web pages: on the site and off
- Web site navigation effects
• Will experts (like Yahoo catalogers) like a site?
Properties
• Links & formatting
• Graphics – one, but not too many
• Text formatting – 9 pt. with normal style
• Page (layout) formatting – min. colors
• Page performance (size and access)
• Site architecture (pages, nav elements)
- More links within and external
- Interactive (search boxes, menus)
• Consistency within a site is key
• How would a user or index builder make use of these?
Extra Discussion
• Little Words, Big Difference
- The difference that makes a difference
- Singular and plural noun identification can change indices and retrieval results (see the sketch below)
- Language use differences
• Decay and Failures
- Dead links
- Types of errors
- Huge number of dead links (PageRank effective)
• 28% in 1995-1999 Computer & CACM
• 41% in 2002 articles
• Better than the average Web page?
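A minimal illustration of the singular/plural point: whether the index folds “indices” into “index” changes which documents a query matches. The suffix rules below are deliberately naive assumptions, not a real stemmer:

```python
# Tiny illustration: naive plural folding changes what a query retrieves.
docs = {
    1: "building indices for the changing web",
    2: "an index of highly rated web sites",
}

def naive_singular(token):
    # Deliberately simplistic rules, only to show the effect on matching.
    if token.endswith("ices"):
        return token[:-4] + "ex"      # indices -> index
    if token.endswith("s") and len(token) > 3:
        return token[:-1]             # sites -> site
    return token

def search(query, fold_plurals):
    norm = naive_singular if fold_plurals else (lambda t: t)
    q = {norm(t) for t in query.split()}
    return [doc_id for doc_id, text in docs.items()
            if q & {norm(t) for t in text.split()}]

print(search("index", fold_plurals=False))  # [2]     only the literal match
print(search("index", fold_plurals=True))   # [1, 2]  'indices' folds to 'index'
```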
Break!
Topic Discussions Set
• Leading WIRED Topic Discussions
- About 20 minutes reviewing issues from the week’s readings
• Key ideas from the readings
• Questions you have about the readings
• Concepts from readings to expand on
- PowerPoint slides
- Handouts
- Extra readings (at least a few days before class) – send to the WIRED listserv
Web IR Evaluation
- 5-page written evaluation of a Web IR system
- Technology overview (how it works)
• Not an eval of a standard search engine
• Only main determinable diff is content
- A brief overview of the development of this type of system (why it works better)
- Intended uses for the system (who, when, why)
- (Your) examples or case studies of the system in use and its overall effectiveness
• How can (Web) IR be better?
- Better IR models
- Better User Interfaces
• More to find vs. easier to find
• Web documents sampling
• Web cataloging work
- Metadata & IR
- Who watches the catalogers?
• Scriptable applications
- Using existing IR systems in new ways
- RSS & IR
Projects and/or Papers Overview
Project Ideas
• Searchable Personal Digital Library
• Browser hacks for searching
• Mozilla keeps all the pages you surf so you can search through them later
- Mozilla hack
- Local search engines (see the sketch below)
• Keeping track of searches
• Monitoring searches
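A minimal sketch of the “local search engine over pages you have already visited” idea: a tiny in-memory inverted index over saved page text. The saved_pages/ directory is a placeholder assumption; a real project would hook into the browser’s cache or history store:

```python
import os
import re
from collections import defaultdict

def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())

index = defaultdict(set)   # term -> set of filenames containing it

# saved_pages/ is assumed to hold plain-text dumps of visited pages.
for name in os.listdir("saved_pages"):
    with open(os.path.join("saved_pages", name), encoding="utf-8", errors="replace") as f:
        for term in tokenize(f.read()):
            index[term].add(name)

def search(query):
    """AND-match: return the pages containing every query term."""
    terms = tokenize(query)
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results

print(search("information retrieval"))
```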
Paper Ideas
• New datasets for IR
• Search on the Desktop – issues, previous research and ideas
• Collaborative searching – advantages and potential, but what about privacy?
• Collaborative Filtering literature review
• Open source and IR systems history & discussion