Interfaces for Querying Collections
Posted on 21-Dec-2015
Information Retrieval Activities
Selecting a collection
– Lists, overviews, wizards, automatic selection
Submitting a request
– Queries & expressiveness
– Graphical interfaces
– Natural language
Examining the response
– Next class
Primary HCI Styles
Command language
Form filling
Menu selection
Direct manipulation
Natural language
Others?
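Boolean Queries
Most commercial full-text retrieval systems (until recently) supported only Boolean queries.
Many studies show that users have difficulty with Boolean expressions
– AND and OR are not used the way "and" and "or" are used in English
• “cats and dogs” (in English, usually material about either animal; as a Boolean AND, only documents mentioning both)
• “tea or coffee” (in English, usually an exclusive choice; as a Boolean OR, documents mentioning either or both)
– Syntax specifying nesting is often cryptic
The Boolean model does not include ranking
– Earlier systems used reverse chronological order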
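A minimal sketch of unranked Boolean retrieval over a toy inverted index may make the mismatch concrete. The documents and the helper names (`build_index`, `boolean_and`, `boolean_or`) are invented for illustration; the point is that AND is set intersection, OR is set union, and the result is an unordered set.

```python
from collections import defaultdict

# Invented toy collection; ids stand in for documents.
docs = {
    1: "cats make good pets",
    2: "dogs make loyal pets",
    3: "cats and dogs can share a home",
    4: "tea is served hot",
    5: "coffee is served hot",
}

def build_index(collection):
    """Map each lowercase term to the set of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in collection.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

index = build_index(docs)

def boolean_and(*terms):
    """Documents containing every term (set intersection)."""
    sets = [index.get(t, set()) for t in terms]
    return set.intersection(*sets) if sets else set()

def boolean_or(*terms):
    """Documents containing at least one term (set union)."""
    return set().union(*(index.get(t, set()) for t in terms))

# A Boolean result is an unordered set; sorting by id here is arbitrary and
# carries no notion of relevance.
print(sorted(boolean_and("cats", "dogs")))   # [3]
print(sorted(boolean_or("cats", "dogs")))    # [1, 2, 3]
print(sorted(boolean_or("tea", "coffee")))   # [4, 5]
```

A user who types “cats and dogs” hoping for material on both kinds of pet probably wants documents 1, 2, and 3, but the Boolean AND returns only document 3.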
Web-based Boolean Queries
Search engines built on Boolean or extended Boolean engines had to make their systems usable by a general Web audience
Reduce expressiveness for ease of use
– Use “all the words” (AND) and “any of the words” (OR)
– Boolean-based search engines added the + prefix to mark a term that must appear
Ranking is performed using statistical algorithms and Web-specific heuristics
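As a rough sketch, a query such as "+pets cats dogs" can be read as: every +term is required (AND), and the remaining terms are optional (OR) and only raise the score. The scoring rule below (count of matched optional terms) is a stand-in assumption, not the ranking algorithm of any particular engine, and `parse`/`search` are hypothetical helpers.

```python
# Invented toy collection.
docs = {
    1: "cats make good pets",
    2: "dogs make loyal pets",
    3: "cats and dogs share a home",
    4: "goldfish are quiet pets",
}

def parse(query):
    """Split a query into required (+term) and optional terms."""
    required, optional = [], []
    for token in query.lower().split():
        if token.startswith("+") and len(token) > 1:
            required.append(token[1:])
        else:
            optional.append(token)
    return required, optional

def search(query, collection):
    required, optional = parse(query)
    results = []
    for doc_id, text in collection.items():
        words = set(text.lower().split())
        # Every +term must be present (the AND part).
        if not all(t in words for t in required):
            continue
        # Optional terms only raise the score (the OR part).
        score = sum(t in words for t in optional)
        # With no required terms, behave like "any of the words".
        if not required and score == 0:
            continue
        results.append((score, doc_id))
    # Highest score first; ties broken by id so the output is deterministic.
    results.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in results]

print(search("+pets cats dogs", docs))  # [1, 2, 4]
print(search("cats dogs", docs))        # [3, 1, 2]
```

Unlike pure Boolean output, the result is a ranked list: document 3 comes first for "cats dogs" because it matches both words.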
Command Line Search
Command line interfaces for search
Example queries from Melvyl:
– FIND PA darwin and TW species or TW descent
– FIND TW Mt St. Helens AND DATE 1981
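The first Melvyl query mixes AND and OR without parentheses, which ties back to the point that nesting syntax is cryptic. A minimal sketch, assuming invented records and treating PA and TW simply as field keys, evaluates the two possible groupings; which precedence Melvyl actually applied is not stated here, so neither reading should be taken as its behavior.

```python
# Invented catalog records; PA and TW are used only as field keys.
records = {
    1: {"PA": {"darwin"}, "TW": {"origin", "species"}},
    2: {"PA": {"huxley"}, "TW": {"descent", "man"}},
    3: {"PA": {"darwin"}, "TW": {"descent", "man"}},
    4: {"PA": {"wallace"}, "TW": {"species", "malay"}},
}

def match(field, term):
    """Ids of records whose given field contains the term."""
    return {rid for rid, rec in records.items() if term in rec.get(field, set())}

darwin = match("PA", "darwin")     # {1, 3}
species = match("TW", "species")   # {1, 4}
descent = match("TW", "descent")   # {2, 3}

# Reading 1: AND binds tighter (same as strict left-to-right evaluation).
and_first = (darwin & species) | descent
# Reading 2: OR binds tighter.
or_first = darwin & (species | descent)

print(sorted(and_first))  # [1, 2, 3]
print(sorted(or_first))   # [1, 3]
```

Under the first reading, record 2 (a non-Darwin author) is retrieved; under the second it is not, even though the typed command is identical.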
Faceted Queries
Boolean queries often return too many or too few results
– Conjunctions (AND) shrink result sets too quickly
– Disjunctions (OR) grow result sets too quickly
Solution:
– Try out smaller queries to see whether each returns an appropriately sized set of results
– Combine the successful smaller queries into a larger query
Example:
1. (osteoporosis OR “bone loss”)
2. (drugs OR pharmaceuticals)
3. (prevention OR cure)
4. 1 AND 2 AND 3
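The four steps above can be sketched with plain sets: each facet ORs alternative phrasings of one concept, its size is checked, and the facets are then ANDed. The posting sets are invented; a real system would read them from an index.

```python
# Invented posting sets: term -> ids of documents containing it.
postings = {
    "osteoporosis": {1, 2, 3, 5, 8},
    "bone loss": {2, 4, 6},
    "drugs": {1, 2, 7},
    "pharmaceuticals": {3, 6, 9},
    "prevention": {1, 3, 7},
    "cure": {9},
}

def facet(*terms):
    """OR together alternative ways of expressing one concept."""
    return set().union(*(postings.get(t, set()) for t in terms))

f1 = facet("osteoporosis", "bone loss")    # step 1
f2 = facet("drugs", "pharmaceuticals")     # step 2
f3 = facet("prevention", "cure")           # step 3

# Check each facet's size before combining, as the strategy suggests.
for name, docs in [("1", f1), ("2", f2), ("3", f3)]:
    print(f"facet {name}: {len(docs)} documents")

combined = f1 & f2 & f3                    # step 4: 1 AND 2 AND 3
print(sorted(combined))                    # [1, 3]
```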
Post-Coordinate or Quorum Ranking
Results are first ranked based on how many facets of the query they match.
Faceted search with quorum ranking lets users express each concept in several ways (alternatives joined by OR) while ranking documents by how many of the concepts they contain.
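A minimal sketch of quorum ranking, assuming invented documents: a facet counts as matched if any of its alternative terms appears, and documents are ordered by the number of facets matched. Substring matching is a simplification so that multi-word terms like "bone loss" work; a real system would use a phrase index.

```python
# Each facet lists alternative phrasings of one concept.
facets = [
    {"osteoporosis", "bone loss"},
    {"drugs", "pharmaceuticals"},
    {"prevention", "cure"},
]

# Invented documents.
docs = {
    "a": "new drugs for the prevention of osteoporosis",
    "b": "bone loss in astronauts",
    "c": "pharmaceuticals that cure bone loss",
    "d": "gardening tips",
}

def facets_matched(text, facets):
    """Number of facets with at least one term present (substring test)."""
    text = text.lower()
    return sum(any(term in text for term in facet) for facet in facets)

def quorum_rank(docs, facets):
    scored = [(facets_matched(text, facets), doc_id) for doc_id, text in docs.items()]
    scored = [pair for pair in scored if pair[0] > 0]   # drop docs matching no facet
    scored.sort(key=lambda pair: (-pair[0], pair[1]))   # most facets first, then id
    return scored

print(quorum_rank(docs, facets))  # [(3, 'a'), (3, 'c'), (1, 'b')]
```

Document "b" would be excluded by a strict AND of the three facets, but quorum ranking still returns it, below the documents that cover every concept.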
A further extension lets users weight each facet.
– Found on the Web to help balance competing goals of a search (e.g., selecting a car or a house)
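Facet weighting can be sketched as a weighted sum over matched facets. The facet names, weights, and car listings below are hypothetical; they only illustrate how a user who cares most about price can push budget options above stylish ones.

```python
# Hypothetical facets: name -> (alternative terms, user-chosen weight).
facets = {
    "price": ({"cheap", "affordable", "budget"}, 3.0),
    "safety": ({"airbags", "crash-tested"}, 2.0),
    "style": ({"convertible", "sporty"}, 0.5),
}

# Hypothetical listings.
listings = {
    "car1": "affordable sedan with airbags",
    "car2": "sporty convertible coupe",
    "car3": "budget hatchback",
}

def weighted_score(text, facets):
    """Sum the weights of facets with at least one term in the text."""
    words = set(text.lower().split())
    return sum(weight for terms, weight in facets.values() if words & terms)

# Highest weighted score first; ties broken by name.
ranking = sorted(listings, key=lambda name: (-weighted_score(listings[name], facets), name))
print(ranking)  # ['car1', 'car3', 'car2']
```

Raising the "style" weight above 3.0 would move car2 ahead of car3 without changing the underlying facets.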
Graphical Query Specification
Graphical interfaces can be static, direct manipulation, or combine the two.
Direct manipulation
– Continuous representation of objects
– Physical actions replace complex syntax
– Rapid, incremental, reversible operations on objects
– Immediate feedback on actions
Graphical Boolean Queries
In some studies, graphical queries were faster and more accurate than command-line queries
Venn diagrams are a common graphical approach
– Limited to three elements in a conjunction
VQuery
– Lets users draw ellipses to create their own queries
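One way to see how a Venn-diagram query maps to Boolean logic: each region of an n-set diagram is a conjunction in which every set is either included or negated, and shading several regions ORs them. An n-set diagram has 2^n - 1 such regions, so three sets already give seven. The sets A, B, C below are invented document-id sets, and `region` is a hypothetical helper.

```python
from itertools import product

# Invented result sets for three query terms.
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}
C = {4, 6, 7}
universe = A | B | C

def region(inside):
    """Documents in exactly the sets flagged True (order: A, B, C)."""
    result = set(universe)
    for s, flag in zip((A, B, C), inside):
        result = result & s if flag else result - s
    return result

# Every include/exclude pattern except "outside all three sets".
regions = [flags for flags in product((True, False), repeat=3) if any(flags)]
print(len(regions))  # 7

# Shading the two regions inside both A and B reproduces A AND B.
selected = region((True, True, True)) | region((True, True, False))
print(sorted(selected))  # [3, 4]
```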
Process-Based Graphs
The query can be represented graphically as a process of selection.
Filter-flow model presents a set of filters.
– One attribute and a set of potential values per filter; multiple values are treated as a disjunction
– Branches in the flow indicate disjunctions
– Serialized filters indicate conjunctions
Users made fewer errors with filter-flow than with SQL
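The filter-flow rules can be sketched as a small evaluator: a `Filter` keeps records whose attribute takes one of its values (values ORed), stages in series are ANDed, and a `Branch` sends the same input down several paths and unions the survivors (OR). The classes, record fields, and values are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Filter:
    attribute: str
    values: frozenset  # several values = disjunction

@dataclass
class Branch:
    paths: list = field(default_factory=list)  # each path is a list of stages

def run(records, stages):
    """Send records through stages in series; each stage narrows the set (AND)."""
    for stage in stages:
        if isinstance(stage, Filter):
            records = [r for r in records if r.get(stage.attribute) in stage.values]
        else:
            # Every path sees the same input; anything surviving any path
            # continues downstream (OR).
            kept = set()
            for path in stage.paths:
                kept.update(r["id"] for r in run(records, path))
            records = [r for r in records if r["id"] in kept]
    return records

records = [
    {"id": 1, "format": "book", "lang": "en", "decade": "1980s"},
    {"id": 2, "format": "journal", "lang": "fr", "decade": "1990s"},
    {"id": 3, "format": "journal", "lang": "fr", "decade": "1980s"},
    {"id": 4, "format": "video", "lang": "en", "decade": "1990s"},
    {"id": 5, "format": "book", "lang": "de", "decade": "1990s"},
]

# format in {book, journal} AND (lang = en OR (lang = fr AND decade = 1990s))
flow = [
    Filter("format", frozenset({"book", "journal"})),
    Branch([
        [Filter("lang", frozenset({"en"}))],
        [Filter("lang", frozenset({"fr"})), Filter("decade", frozenset({"1990s"}))],
    ]),
]

print([r["id"] for r in run(records, flow)])  # [1, 2]
```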
Block-diagram Visualization
Users arrange blocks to specify query.
STARS
– Users initially type in a natural language query
– Query terms are turned into blocks
– Blocks are then arranged into a query
– Blocks in the same row represent a conjunction
– Blocks in the same column represent a disjunction
– Allows previewing the query results by simple rearrangement of blocks
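A sketch of one possible reading of a STARS-style grid: the blocks stacked in a column are ORed, and the columns laid out along a row are ANDed. How STARS evaluated a full grid is not specified above, so this column-wise interpretation is an assumption, as are the postings and helper names. `None` marks an empty cell.

```python
# Invented posting sets.
postings = {
    "osteoporosis": {1, 2, 5},
    "bone loss": {3, 4},
    "drugs": {2, 4, 7},
}

def grid_to_query(grid):
    """Return a list of OR-groups, one per non-empty column (ANDed together)."""
    n_cols = max(len(row) for row in grid)
    columns = []
    for c in range(n_cols):
        terms = [row[c] for row in grid if c < len(row) and row[c] is not None]
        if terms:
            columns.append(terms)
    return columns

def evaluate(columns, postings):
    result = None
    for terms in columns:
        hits = set().union(*(postings.get(t, set()) for t in terms))
        result = hits if result is None else result & hits
    return result or set()

grid = [["osteoporosis", "drugs"],
        ["bone loss", None]]
print(grid_to_query(grid))                      # [['osteoporosis', 'bone loss'], ['drugs']]
print(sorted(evaluate(grid_to_query(grid), postings)))  # [2, 4]

# Dragging "drugs" under "bone loss" turns the whole query into one OR-group.
rearranged = [["osteoporosis"], ["bone loss"], ["drugs"]]
print(sorted(evaluate(grid_to_query(rearranged), postings)))  # [1, 2, 3, 4, 5, 7]
```

Re-evaluating after each drag is what makes the preview cheap: the grid is the query, so no syntax has to be retyped.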
Magic Lenses
Lenses act as filters on an overview visualization.
– Disjunction is represented by independent lenses
– Conjunction is expressed by placing multiple lenses over one another
– Lenses can include additional information
• Where the term must appear
• Term frequency requirements
• Switches to use stemming
• …
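Magic lenses can be sketched as composable document predicates: each `Lens` carries a term plus the optional settings listed above (field, minimum frequency, stemming); stacked lenses are ANDed and independent lenses are ORed. `crude_stem` is a hypothetical suffix stripper standing in for a real stemmer such as Porter's, and the documents are invented.

```python
from dataclasses import dataclass

def crude_stem(word):
    """Hypothetical toy stemmer: strip one common suffix."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

@dataclass
class Lens:
    term: str
    field: str = "body"   # where the term must appear
    min_freq: int = 1     # term frequency requirement
    stem: bool = False    # compare stemmed forms

    def passes(self, doc):
        words = doc.get(self.field, "").lower().split()
        target = self.term.lower()
        if self.stem:
            words = [crude_stem(w) for w in words]
            target = crude_stem(target)
        return words.count(target) >= self.min_freq

def stacked(*lenses):
    """Lenses placed over one another: conjunction."""
    return lambda doc: all(lens.passes(doc) for lens in lenses)

def independent(*lenses):
    """Separate lenses: disjunction."""
    return lambda doc: any(lens.passes(doc) for lens in lenses)

d1 = {"title": "Mining text", "body": "text mining finds patterns in text"}
d2 = {"title": "Data", "body": "mines and mining equipment"}

title_and_freq = stacked(Lens("text", field="title"), Lens("text", min_freq=2))
print(title_and_freq(d1), title_and_freq(d2))                         # True False
print(Lens("mines").passes(d1), Lens("mines", stem=True).passes(d1))  # False True
```

With stemming switched on, the lens for "mines" matches "mining" in d1, because both reduce to "min" under this toy stemmer.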
Phrases and Proximity
Phrase and proximity constraints can substantially improve precision.
Phrase search is commonly used on the Web.
– But the phrase must match literally
– “President Lincoln” does not match “President Abraham Lincoln”
Proximity constraints allow more general queries
– Example:
• LEXIS-NEXIS “white w/3 house” means “white within three words of house”
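The difference between a literal phrase and a proximity constraint can be sketched with word positions. The rule used for `within` (the two words occur at most n positions apart, in either order) is an assumed reading of "w/n"; the exact LEXIS-NEXIS semantics may differ. The sample sentence is invented.

```python
def positions(text):
    """Map each lowercase word to the list of positions where it occurs."""
    index = {}
    for pos, word in enumerate(text.lower().split()):
        index.setdefault(word, []).append(pos)
    return index

def phrase_match(text, phrase):
    """True if the phrase's words appear consecutively, in order."""
    words = text.lower().split()
    target = phrase.lower().split()
    n = len(target)
    return any(words[i:i + n] == target for i in range(len(words) - n + 1))

def within(text, a, b, n):
    """True if words a and b occur at most n positions apart (either order)."""
    index = positions(text)
    return any(abs(i - j) <= n
               for i in index.get(a.lower(), [])
               for j in index.get(b.lower(), []))

doc = "President Abraham Lincoln spoke near the white painted house"

print(phrase_match(doc, "President Lincoln"))        # False
print(within(doc, "President", "Lincoln", 3))        # True
print(phrase_match(doc, "white house"), within(doc, "white", "house", 3))  # False True
```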
Natural Language and Free Text Queries
Many systems treat the question as a bag of words, ignoring word order and syntax
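A short sketch shows what is lost when a question is reduced to a bag of words: two questions with opposite meanings become identical word counts. The regular-expression tokenizer is a simple assumption, not any particular system's.

```python
import re
from collections import Counter

def bag_of_words(text):
    """Lowercase, keep alphanumeric tokens, and count them (order discarded)."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))

q1 = "Does the dog bite the man?"
q2 = "Does the man bite the dog?"

print(bag_of_words(q1) == bag_of_words(q2))  # True
```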
Natural language processing can be used to determine the information need more precisely.
– Extract noun (and verb) phrases
– Find noun (and verb) phrases in the same sentence
Ask.com uses sites preselected to answer particular question forms.
– Requires recognizing the type of question
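Recognizing the type of question can be sketched, very crudely, with patterns on the question's opening words. This is not Ask.com's method, which is not described here; the patterns and answer-type labels in `QUESTION_TYPES` are hypothetical.

```python
import re

# Hypothetical patterns: question opening -> expected kind of answer.
QUESTION_TYPES = [
    (re.compile(r"^\s*who\b", re.I), "person"),
    (re.compile(r"^\s*when\b", re.I), "date"),
    (re.compile(r"^\s*where\b", re.I), "location"),
    (re.compile(r"^\s*how (many|much)\b", re.I), "quantity"),
    (re.compile(r"^\s*(what|which)\b", re.I), "thing"),
]

def question_type(question):
    """Return the label of the first matching pattern, else 'unknown'."""
    for pattern, qtype in QUESTION_TYPES:
        if pattern.search(question):
            return qtype
    return "unknown"

print(question_type("Who wrote On the Origin of Species?"))  # person
print(question_type("How many moons does Mars have?"))      # quantity
print(question_type("Tell me about Darwin"))                 # unknown
```

A recognized type could then route the question to sources preselected for that kind of answer; anything unrecognized falls back to bag-of-words search.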