Modern Information Retrieval
Relevance Feedback
The IR Black Box
[Diagram: Query → Search (black box) → Ranked List]
Standard Model of IR
Assumptions:
- The goal is maximizing precision and recall simultaneously
- The information need remains static
- The value is in the resulting document set
Problems with Standard Model
Users learn during the search process:
- Scanning titles of retrieved documents
- Reading retrieved documents
- Viewing lists of related topics/thesaurus terms
- Navigating hyperlinks
Some users don’t like long, (apparently) disorganized lists of documents
IR is an Iterative Process
[Diagram: iteration among Goals, Workspace, and Repositories]
IR is a Dialog
- The exchange doesn’t end with the first answer
- Users can recognize elements of a useful answer, even when incomplete
- Questions and understanding change as the process continues
Bates’ “Berry-Picking” Model
Standard IR model:
- Assumes the information need remains the same throughout the search process
Berry-picking model:
- Interesting information is scattered like berries among bushes
- The query is continually shifting
Berry-Picking Model
[Figure: a sequence of shifting queries Q0 → Q1 → Q2 → Q3 → Q4 → Q5 along a berry-picking path]
A sketch of a searcher "moving through many actions towards a general goal of satisfactory completion of research related to an information need." (after Bates, 1989)
Berry-Picking Model (cont.)
- The query is continually shifting
- New information may yield new ideas and new directions
- The information need:
  - Is not satisfied by a single, final retrieved set
  - Is satisfied by a series of selections and bits of information found along the way
Restricted Form of the IR Problem
The system has available only pre-existing, “canned” text passages
Its response is limited to selecting from these passages and presenting them to the user
It must select, say, 10 or 20 passages out of millions or billions!
Information Retrieval
Revised Task Statement:
Build a system that retrieves documents that users are likely to find relevant to their queries
This set of assumptions underlies the field of Information Retrieval
Relevance Feedback
Take advantage of user relevance judgments in the retrieval process:
- The user issues a (short, simple) query and gets back an initial hit list
- The user marks hits as relevant or non-relevant
- The system computes a better representation of the information need based on this feedback
- Single or multiple iterations (although little is typically gained after one iteration)
Idea: you may not know what you’re looking for, but you’ll know when you see it
Outline
- Explicit feedback: users explicitly mark relevant and irrelevant documents
- Implicit feedback: the system attempts to infer user intentions from observable behavior
- Blind feedback: feedback in the absence of any user evidence, explicit or otherwise
Why relevance feedback?
- You may not know what you’re looking for, but you’ll know when you see it
- Query formulation may be difficult; simplify the problem through iteration
- Facilitate vocabulary discovery
- Boost recall: "find me more documents like this…"
Relevance Feedback Example: Image Search Engine
http://nayana.ece.ucsb.edu/imsearch/imsearch.html
Initial Results
Relevance Feedback
Revised Results
Updating Queries
Let’s assume that there is an optimal query:
- The goal of relevance feedback is to bring the user query closer to the optimal query
How does relevance feedback actually work?
- Use relevance information to update the query
- Use the query to retrieve a new set of documents
What exactly do we "feed back"?
- Boost weights of terms from relevant documents
- Add terms from relevant documents to the query
- Note that this is hidden from the user
Query Modification
Changing or expanding a query can lead to better results
Problem: how to reformulate the query?
- Thesaurus expansion: suggest terms similar to query terms
- Relevance feedback: suggest terms (and documents) similar to retrieved documents that have been judged to be relevant
Relevance Feedback
Main idea: modify the existing query based on relevance judgments
- Extract terms from relevant documents and add them to the query
- and/or re-weight the terms already in the query
Two main approaches:
- Automatic (pseudo-relevance feedback)
- Users select relevant documents
- Users/system select terms from an automatically generated list
Picture of Relevance Feedback
[Figure: vector-space sketch. The initial query is moved toward the relevant documents (marked o) and away from the non-relevant documents (marked x), yielding the revised query.]
Rocchio Algorithm
Used in practice:
New query:
- Moves toward relevant documents
- Away from irrelevant documents

qm = α·q0 + β·(1/|Dr|)·Σ_{dj ∈ Dr} dj - γ·(1/|Dnr|)·Σ_{dj ∈ Dnr} dj

qm = modified query vector; q0 = original query vector; α, β, γ = weights (hand-chosen or set empirically); Dr = set of known relevant document vectors; Dnr = set of known irrelevant document vectors
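The update above can be sketched in a few lines of Python (a minimal illustration with NumPy; the example vectors are hypothetical, and the default weights α=1.0, β=0.75, γ=0.15 are commonly cited textbook choices, not values from these slides):

```python
import numpy as np

def rocchio(q0, rel_docs, nonrel_docs, alpha=1.0, beta=0.75, gamma=0.15):
    """Rocchio update: move the query toward the centroid of the relevant
    documents and away from the centroid of the non-relevant documents."""
    qm = alpha * np.asarray(q0, dtype=float)
    if rel_docs:
        qm = qm + beta * np.mean(np.asarray(rel_docs, dtype=float), axis=0)
    if nonrel_docs:
        qm = qm - gamma * np.mean(np.asarray(nonrel_docs, dtype=float), axis=0)
    return qm

# Toy example: 3-term vocabulary, one relevant and one non-relevant document
qm = rocchio([1.0, 0.0, 0.0],
             rel_docs=[[0.0, 1.0, 0.0]],
             nonrel_docs=[[0.0, 0.0, 1.0]])
print(qm)  # approximately [1.0, 0.75, -0.15]
```

Setting `gamma=0` gives the positive-feedback-only variant discussed below.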
Notes on Feedback
- If γ is set to 0, we have a positive feedback system; this is often preferred
- The feedback formula can be applied repeatedly, asking the user for relevance information at each iteration
- Relevance feedback is generally considered to be very effective
- One drawback is that it is not fully automatic
Simple feedback example:
T = {pudding, jam, traffic, lane, treacle}
d1 = (0.8, 0.8, 0.0, 0.0, 0.4)  (Recipe for jam pudding)
d2 = (0.0, 0.0, 0.9, 0.8, 0.0)  (DoT report on traffic lanes)
d3 = (0.8, 0.0, 0.0, 0.0, 0.8)  (Recipe for treacle pudding)
d4 = (0.6, 0.9, 0.5, 0.6, 0.0)  (Radio item on traffic jam in Pudding Lane)

Display the first 2 documents that match the query q0 = (1.0, 0.6, 0.0, 0.0, 0.0)
Match scores: r = (0.91, 0.0, 0.6, 0.73)
Retrieved documents:
- d1: Recipe for jam pudding (relevant)
- d4: Radio item on traffic jam in Pudding Lane (not relevant)
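The match scores r above are cosine similarities between q0 and each document vector; a quick check in Python, using the vectors from the slide (values agree with the slide's r up to rounding):

```python
import numpy as np

docs = {
    "d1": [0.8, 0.8, 0.0, 0.0, 0.4],  # Recipe for jam pudding
    "d2": [0.0, 0.0, 0.9, 0.8, 0.0],  # DoT report on traffic lanes
    "d3": [0.8, 0.0, 0.0, 0.0, 0.8],  # Recipe for treacle pudding
    "d4": [0.6, 0.9, 0.5, 0.6, 0.0],  # Radio item on traffic jam in Pudding Lane
}
q0 = np.array([1.0, 0.6, 0.0, 0.0, 0.0])

def cosine(q, d):
    """Cosine similarity between a query vector and a document vector."""
    d = np.asarray(d)
    norm = np.linalg.norm(q) * np.linalg.norm(d)
    return float(q @ d / norm) if norm else 0.0

r = {name: round(cosine(q0, d), 2) for name, d in docs.items()}
print(r)      # {'d1': 0.91, 'd2': 0.0, 'd3': 0.61, 'd4': 0.73}
top2 = sorted(r, key=r.get, reverse=True)[:2]
print(top2)   # ['d1', 'd4'] -- the two documents displayed to the user
```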
Positive and Negative Feedback

Suppose we set α and β to 0.5, and γ to 0.2:

qm = α·q0 + β·Σ_{di ∈ HR} di / |HR| - γ·Σ_{di ∈ HNR} di / |HNR|
   = 0.5 q0 + 0.5 d1 - 0.2 d4
   = 0.5 (1.0, 0.6, 0.0, 0.0, 0.0) + 0.5 (0.8, 0.8, 0.0, 0.0, 0.4) - 0.2 (0.6, 0.9, 0.5, 0.6, 0.0)
   = (0.78, 0.52, -0.1, -0.12, 0.2)
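The update arithmetic can be checked directly (pure NumPy; note that the third and fourth components come out negative because d4 is subtracted; negative term weights are sometimes clamped to zero in practice):

```python
import numpy as np

q0 = np.array([1.0, 0.6, 0.0, 0.0, 0.0])
d1 = np.array([0.8, 0.8, 0.0, 0.0, 0.4])  # judged relevant (HR = {d1})
d4 = np.array([0.6, 0.9, 0.5, 0.6, 0.0])  # judged non-relevant (HNR = {d4})

alpha, beta, gamma = 0.5, 0.5, 0.2
qm = alpha * q0 + beta * d1 - gamma * d4
print(qm)  # approximately [0.78, 0.52, -0.1, -0.12, 0.2]
```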
Simple feedback example (continued):

T = {pudding, jam, traffic, lane, treacle}
d1 = (0.8, 0.8, 0.0, 0.0, 0.4)  (Recipe for jam pudding)
d2 = (0.0, 0.0, 0.9, 0.8, 0.0)  (DoT report on traffic lanes)
d3 = (0.8, 0.0, 0.0, 0.0, 0.8)  (Recipe for treacle pudding)
d4 = (0.6, 0.9, 0.5, 0.6, 0.0)  (Radio item on traffic jam in Pudding Lane)

Display the first 2 documents that match the modified query qm = (0.78, 0.52, -0.1, -0.12, 0.2)
Match scores: r’ = (0.96, 0.0, 0.86, 0.63)
Retrieved documents:
- d1: Recipe for jam pudding (relevant)
- d3: Recipe for treacle pudding (relevant)
Relevance Feedback: Cost
Speed and efficiency issues:
- The system needs to spend time analyzing documents
- Longer queries are usually slower
Users are often reluctant to provide explicit feedback
It’s often harder to understand why a particular document was retrieved
Koenemann and Belkin’s Work
A well-known study on relevance feedback in information retrieval
Questions asked:
- Does relevance feedback improve results?
- Is user control over relevance feedback helpful?
- How do different levels of user control affect results?

Jürgen Koenemann and Nicholas J. Belkin. (1996) A Case For Interaction: A Study of Interactive Information Retrieval Behavior and Effectiveness. Proceedings of the SIGCHI 1996 Conference on Human Factors in Computing Systems (CHI 1996).
What’s the best interface?
Opaque (black box):
- The user doesn’t get to see the relevance feedback process
Transparent:
- The user is shown relevance feedback terms, but isn’t allowed to modify the query
Penetrable:
- The user is shown relevance feedback terms and is allowed to modify the query
Which do you think worked best?
Query Interface
Penetrable Interface
Users get to select which terms they want to add
Study Details
- Subjects started with a tutorial
- 64 novice searchers (43 female, 21 male)
- Goal: keep modifying the query until it achieves high precision
Procedure
Baseline (Trial 1):
- Subjects get a tutorial on relevance feedback
Experimental condition (Trial 2):
- Shown one of four modes: no relevance feedback, opaque, transparent, penetrable
Evaluation metric used: precision at 30 documents
Relevance feedback works!
- Subjects using the relevance feedback interfaces performed 17-34% better
- Subjects in the penetrable condition performed 15% better than those in the opaque and transparent conditions
Behavior Results
- Search times were approximately equal
- Precision increased in the first few iterations
- The penetrable interface required fewer iterations to arrive at the final query
- Queries with relevance feedback are much longer
  - But fewer terms with the penetrable interface: users were more selective about which terms to add
Implicit Feedback
Users are often reluctant to provide relevance judgments:
- Some searches are precision-oriented
- They’re lazy!
Can we gather feedback without requiring the user to do anything?
Idea: gather feedback from observed user behavior
Observable Behavior
Behavior category vs. minimum scope:

            | Segment           | Object                           | Class
  Examine   | View, Listen      | Select                           |
  Retain    | Print             | Bookmark, Save, Purchase, Delete | Subscribe
  Reference | Copy/paste, Quote | Forward, Reply, Link, Cite       |
  Annotate  | Mark up           | Rate, Publish                    | Organize
So far…
Explicit feedback: take advantage of user-supplied relevance judgments
Implicit feedback: observe user behavior and draw inferences
Can we perform feedback without having a user in the loop?
Blind Relevance Feedback
Also called "pseudo-relevance feedback"
Motivation: it’s difficult to elicit relevance judgments from users. Can we automate this process?
- Idea: take the top n documents, and simply assume that they are relevant
- Perform relevance feedback as before
- If the initial hit list is reasonable, the system should pick up good query terms
Does it work?
BRF Experiment
- Retrieval engine: Indri
- Test collection: TREC, topics 301-450
- Procedure:
  - Used the topic description as the query to generate an initial hit list
  - Selected the top 20 terms from the top 20 hits using tf.idf
  - Added these terms to the original query
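The term-selection step can be sketched as follows (a hypothetical toy collection, not the TREC data; terms in the top hits are scored by a simple tf.idf and the best non-query terms are kept):

```python
import math
from collections import Counter

def expansion_terms(top_hits, collection, query_terms, k=3):
    """Score terms in the top-ranked hits by tf.idf; return the k best
    terms that are not already in the query."""
    n = len(collection)
    df = Counter()                  # document frequency over the collection
    for doc in collection:
        df.update(set(doc))
    tf = Counter()                  # term frequency over the top hits only
    for doc in top_hits:
        tf.update(doc)
    scores = {t: tf[t] * math.log(n / df[t])
              for t in tf if t not in query_terms and df[t] < n}
    return [t for t, _ in sorted(scores.items(), key=lambda x: -x[1])[:k]]

# Hypothetical tokenized collection
collection = [
    ["jaguar", "car", "engine", "speed"],
    ["jaguar", "cat", "jungle", "prey"],
    ["car", "engine", "fuel"],
    ["jungle", "rain", "forest"],
]
top_hits = collection[:2]           # pretend these ranked highest for "jaguar"
new_terms = expansion_terms(top_hits, collection, query_terms={"jaguar"})
print(["jaguar"] + new_terms)       # original query plus expansion terms
```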
Results

                 MAP               R-Precision
  No feedback    0.1591            0.2022
  With feedback  0.1806 (+13.5%)   0.2222 (+9.9%)

Blind relevance feedback doesn’t always help!
The Complete Landscape
Explicit, implicit, blind feedback: it’s all about manipulating terms
Dimensions of query expansion:
- "Local" vs. "global"
- User involvement vs. no user involvement
Local vs. Global
"Local" methods:
- Consider only documents that have been retrieved by an initial query
- Query-specific
- Computations must be performed on the fly
"Global" methods:
- Take the entire document collection into account
- Do not depend on the query
- Thesauri can be computed off-line (for faster access)
User Involvement
Query expansion can be done automatically:
- New terms added without user intervention
Or it can place a user in the loop:
- The system presents suggested terms
- Must consider interface issues
Global Methods
- Controlled vocabulary: for example, MeSH terms
- Manual thesaurus: for example, WordNet
- Automatically derived thesaurus: for example, based on co-occurrence statistics
Using Controlled Vocabulary
Thesauri
A thesaurus may contain information about lexical semantic relations:
- Synonyms: similar words (e.g., violin → fiddle)
- Hypernyms: more general words (e.g., violin → instrument)
- Hyponyms: more specific words (e.g., violin → Stradivari)
- Meronyms: parts (e.g., violin → strings)
Using Manual Thesauri
For each query term t, add synonyms and related words from the thesaurus (e.g., feline → feline cat)
- Generally improves recall
- Often hurts precision: "interest rate" → "interest rate fascinate evaluate"
- Manual thesauri are expensive to produce and maintain
Automatic Thesauri Generation
Attempt to generate a thesaurus automatically by analyzing the document collection
Two possible approaches:
- Co-occurrence statistics (co-occurring words are more likely to be similar)
- Shallow analysis of grammatical relations: entities that are grown, cooked, eaten, and digested are more likely to be food items
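The co-occurrence approach can be sketched with a toy corpus (hypothetical documents; terms that appear together in the same documents get high similarity, here via the Dice coefficient):

```python
from collections import Counter
from itertools import combinations

# Toy corpus of tokenized documents (hypothetical)
docs = [
    ["violin", "fiddle", "strings"],
    ["violin", "strings", "bow"],
    ["fiddle", "folk", "music"],
    ["piano", "music", "keys"],
]

df = Counter(t for d in docs for t in set(d))   # document frequency per term
co = Counter()                                  # co-occurrence counts per pair
for d in docs:
    for a, b in combinations(sorted(set(d)), 2):
        co[(a, b)] += 1

def dice(a, b):
    """Dice coefficient: 2 * co-occurrences / (df(a) + df(b))."""
    a, b = sorted((a, b))
    return 2 * co[(a, b)] / (df[a] + df[b])

print(dice("violin", "strings"))  # 1.0 -- always co-occur
print(dice("violin", "piano"))    # 0.0 -- never co-occur
print(dice("fiddle", "violin"))   # 0.5 -- co-occur in one of two documents each
```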
Automatic Thesauri: Example
Automatic Thesauri: Discussion
- Quality of associations is usually a problem
- Term ambiguity may introduce irrelevant, statistically correlated terms: "Apple computer" → "Apple red fruit computer"
- Problems:
  - False positives: words deemed similar that are not
  - False negatives: words deemed dissimilar that are similar
- Since terms are highly correlated anyway, expansion may not retrieve many additional documents
Key Points
- Moving beyond the black box… interaction is key!
- Different types of feedback:
  - Explicit (the user does the work)
  - Implicit (the system watches the user and guesses)
  - Blind (don’t even involve the user)
- Query expansion as a general mechanism