Query Reformulation: User Relevance Feedback
Introduction
Difficulty of formulating user queries
– Users have insufficient knowledge of the collection make-up
– Users have insufficient knowledge of the retrieval environment

Query reformulation to improve the user query: two basic methods
• Query expansion: expanding the original query with new terms
• Term reweighting: reweighting the terms in the expanded query
Introduction
Approaches for query reformulation
– User relevance feedback
• based on feedback information from the user
– Local analysis
• based on information derived from the set of documents initially retrieved (the local set)
– Global analysis
• based on global information derived from the document collection
User Relevance Feedback
User’s role in the URF cycle
– is presented with a list of the retrieved documents
– marks the relevant documents

Main idea of URF
– select important terms, or expressions, attached to the documents that the user has identified as relevant
– enhance the importance of these terms in the new query formulation
– effect: the new query moves towards the relevant documents and away from the non-relevant ones
User Relevance Feedback
Advantages of URF
– it shields the user from the details of the query reformulation process
• users only have to provide relevance judgments on documents
– it breaks the whole searching task into a sequence of small steps which are easier to grasp
– it provides a controlled process designed to emphasize relevant terms and de-emphasize non-relevant terms
URF for Vector Model
Assumptions
– the term-weight vectors of the documents identified as relevant to the query have similarities among themselves
– non-relevant documents have term-weight vectors which are dissimilar from those of the relevant documents

Basic idea
– reformulate the query so that it moves closer to the term-weight vector space of the relevant documents
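To make the basic idea concrete, here is a toy sketch (vectors and weights are illustrative, not from the slides) showing that shifting the query vector towards a relevant document's term-weight vector raises their cosine similarity:

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two term-weight vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy term-weight vectors over a three-term vocabulary.
query = np.array([1.0, 0.0, 0.0])
relevant_doc = np.array([0.6, 0.8, 0.0])

# Shift the query part of the way towards the relevant document's vector.
moved_query = query + 0.5 * relevant_doc

print(cosine(query, relevant_doc))        # similarity before the shift
print(cosine(moved_query, relevant_doc))  # strictly higher after the shift
```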
The Perfect (Vector Model) Query
Assume we know which documents are relevant and which are not.

Given:
– a collection of N documents
– Cr : the set of relevant documents

What is the optimal query?
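The slide's formula was an image that did not survive extraction; in the standard vector-model treatment, the optimal query is the centroid of the relevant documents minus the centroid of the non-relevant ones:

```latex
\vec{q}_{opt} \;=\; \frac{1}{|C_r|} \sum_{\vec{d}_j \in C_r} \vec{d}_j
\;-\; \frac{1}{N - |C_r|} \sum_{\vec{d}_j \notin C_r} \vec{d}_j
```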
Back to Reality
Actually, what we are trying to figure out is which documents are relevant and which are not.
Our ideal query and definitions:
– a collection of N documents
– Cr : the set of relevant documents
– Dr : the set of retrieved documents the user identified as relevant
– Dn : the set of retrieved documents identified as non-relevant
– α, β, γ : tuning constants

Modified query (Rocchio)
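The equation itself was an image in the original slides; the standard Rocchio modified query, using the definitions above, is:

```latex
\vec{q}_m \;=\; \alpha \vec{q}
\;+\; \frac{\beta}{|D_r|} \sum_{\vec{d}_j \in D_r} \vec{d}_j
\;-\; \frac{\gamma}{|D_n|} \sum_{\vec{d}_j \in D_n} \vec{d}_j
```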
Rocchio & Ide Variations

Standard Rocchio

Ide (Regular)

Ide (Dec_Hi)

where max_non_relevant(dj) is the highest-ranked non-relevant document
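The three formulas were images in the original slides; their standard forms (with q, Dr, Dn, α, β, γ as defined earlier) are:

```latex
\text{Standard Rocchio:}\quad
\vec{q}_m = \alpha \vec{q}
  + \frac{\beta}{|D_r|} \sum_{\vec{d}_j \in D_r} \vec{d}_j
  - \frac{\gamma}{|D_n|} \sum_{\vec{d}_j \in D_n} \vec{d}_j
\\[6pt]
\text{Ide (Regular):}\quad
\vec{q}_m = \alpha \vec{q}
  + \beta \sum_{\vec{d}_j \in D_r} \vec{d}_j
  - \gamma \sum_{\vec{d}_j \in D_n} \vec{d}_j
\\[6pt]
\text{Ide (Dec\_Hi):}\quad
\vec{q}_m = \alpha \vec{q}
  + \beta \sum_{\vec{d}_j \in D_r} \vec{d}_j
  - \gamma \cdot \mathrm{max\_non\_relevant}(\vec{d}_j)
```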
Tuning the Feedback
Recall the modified query with its tuning constants α, β, γ.

How do we set the tuning constants α, β, γ?
– Rocchio originally set α = 1
– Ide originally set α = β = γ = 1

Often, positive relevance feedback is more valuable than negative relevance feedback.
– this implies β > γ
– a purely positive feedback mechanism sets γ = 0
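A minimal Rocchio sketch in the spirit of the slides; the default constants below (α = 1, β = 0.75, γ = 0.15) are common illustrative choices reflecting β > γ, not values from the slides:

```python
import numpy as np

def rocchio(query, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Rocchio query reformulation for the vector model.

    query        : original query term-weight vector
    relevant     : rows are vectors of documents the user marked relevant (Dr)
    non_relevant : rows are vectors of retrieved non-relevant documents (Dn)
    Setting gamma = 0 gives a purely positive feedback mechanism.
    """
    q_m = alpha * np.asarray(query, dtype=float)
    if len(relevant):
        q_m = q_m + beta * np.mean(relevant, axis=0)       # pull towards Dr
    if len(non_relevant):
        q_m = q_m - gamma * np.mean(non_relevant, axis=0)  # push away from Dn
    # Negative term weights carry no meaning in the vector model; clip to zero.
    return np.maximum(q_m, 0.0)
```

For example, `rocchio([1.0, 0.0], [[0.0, 1.0]], [[1.0, 0.0]], alpha=1, beta=1, gamma=1)` moves all the weight onto the second term, yielding `[0.0, 1.0]`.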
URF for Vector Model
Includes both query expansion and term reweighting
Advantages
– simplicity
• modified term weights are computed directly from the set of retrieved documents
– good results
• the modified query vector does reflect a portion of the intended query semantics

Issue: as with all learning techniques, this assumes the information need is relatively static.
Evaluation of Relevance Feedback Strategies
A simplistic evaluation is to compare the results of the modified query to those of the original query.
– Does not work!
– The results look great, but mostly due to the higher ranking of documents already returned by the original query.
– The user has already seen these documents.
Evaluation of Relevance Feedback Strategies
More realistic evaluation
– compute precision and recall on the residual collection (those documents not returned by the original query)
– because highly-ranked documents are removed, these results can be worse than for the original query
– that is acceptable if we are comparing between relevance feedback approaches
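The residual-collection idea can be sketched as follows; the function and variable names are illustrative, not from the slides:

```python
def residual_precision_recall(ranking, relevant, seen):
    """Precision and recall on the residual collection.

    ranking  : document ids returned by the feedback query, best first
    relevant : set of all relevant document ids
    seen     : ids already returned by the original query, removed before scoring
    """
    residual = [d for d in ranking if d not in seen]
    residual_relevant = set(relevant) - set(seen)
    hits = sum(1 for d in residual if d in residual_relevant)
    precision = hits / len(residual) if residual else 0.0
    recall = hits / len(residual_relevant) if residual_relevant else 0.0
    return precision, recall
```

For example, if the feedback query ranks `["d1", "d2", "d3", "d4"]`, the relevant set is `{"d1", "d3"}`, and `d1` was already seen, then only `d3` counts: precision is 1/3 on the three residual documents and recall is 1.0.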