Modern Information Retrieval
Chapter 3 Retrieval Evaluation
The most common measures of system performance are time and space; there is an inherent tradeoff between them
Data retrieval: time and space (indexing)
Information retrieval: the precision of the answer set is also important
Evaluation considerations: queries with/without feedback, query interface design, real vs. synthetic data, real-life vs. laboratory environment, repeatability and scalability
Recall and precision
Recall: the fraction of the relevant documents that have been retrieved
Precision: the fraction of the retrieved documents that are relevant
Can we precisely compute precision? Can we precisely compute recall?
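A minimal sketch of the two definitions in Python (the function name and document ids are illustrative, not from the chapter):

```python
def precision_recall(retrieved, relevant):
    """Precision and recall for a single query.

    retrieved: collection of retrieved document ids
    relevant:  set of document ids judged relevant (assumed complete)
    """
    retrieved = set(retrieved)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# With Rq = {d3, d56, d129} and an answer set containing only d56:
print(precision_recall(["d123", "d84", "d56"], {"d3", "d56", "d129"}))
# -> (0.333..., 0.333...)
```

Note that recall can only be computed precisely when the full set of relevant documents is known, which is why test collections with relevance judgments are needed.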
Precision versus recall curve: a standard evaluation strategy
Interpolation procedure for generating the 11 standard recall levels (0%, 10%, 20%, ..., 100%)
Example: Rq = {d3, d56, d129}
Interpolated precision at the j-th standard recall level:
P(r_j) = max_{r_j <= r <= r_{j+1}} P(r), where j ∈ {0, 1, 2, ..., 10} and P(r) is a known precision
To evaluate the retrieval strategy over all test queries, the precision figures at each recall level are averaged: P(r_j) = (1/Nq) Σ_i P_i(r_j), where Nq is the number of queries and P_i(r_j) is the interpolated precision for the i-th query
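A sketch of the per-query procedure, assuming the common interpolation convention that the precision at standard level r_j is the maximum known precision at any recall greater than or equal to r_j:

```python
def eleven_point_interpolated(ranking, relevant):
    """Interpolated precision at the 11 standard recall levels
    (0.0, 0.1, ..., 1.0) for a single query."""
    if not relevant:
        return [0.0] * 11
    points, hits = [], 0  # (recall, precision) at each relevant hit
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            points.append((hits / len(relevant), hits / rank))
    # interpolation: best known precision at any recall >= r_j
    return [
        max((p for r, p in points if r >= j / 10), default=0.0)
        for j in range(11)
    ]

# A ranking in the spirit of the chapter's example, Rq = {d3, d56, d129}:
ranking = ["d123", "d84", "d56", "d6", "d8", "d9", "d511", "d129",
           "d187", "d25", "d38", "d48", "d250", "d113", "d3"]
print(eleven_point_interpolated(ranking, {"d3", "d56", "d129"}))
# -> 0.33 at levels 0.0-0.3, 0.25 at 0.4-0.6, 0.20 at 0.7-1.0
```

Averaging these 11-value vectors over all test queries yields the average precision-versus-recall curve.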
Another approach: compute the average precision at given document cutoff values (e.g., after 5, 10, or 20 documents have been retrieved)
Advantages?
Single-value summaries (one value per query)
Average precision at seen relevant documents (example in Figure 3.2)
Favors systems which retrieve relevant documents quickly; such systems can still have poor overall recall performance
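A sketch of this summary (the function name is mine); it averages the precision observed at each point where a relevant document appears in the ranking:

```python
def avg_precision_at_seen_relevant(ranking, relevant):
    """Mean of the precision values observed each time a relevant
    document shows up in the ranking."""
    hits, precisions = 0, []
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0

# Relevant documents at ranks 3, 8, and 15 give precisions
# 1/3, 2/8, 3/15, so the summary is (0.33 + 0.25 + 0.20) / 3 ≈ 0.26
```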
R-precision: the precision at the R-th position in the ranking, where R is the total number of relevant documents for the current query (examples in Figures 3.2 and 3.3)
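R-precision in the same sketch style:

```python
def r_precision(ranking, relevant):
    """Precision at rank R, where R is the total number of relevant
    documents; for Rq = {d3, d56, d129}, R = 3."""
    r = len(relevant)
    return sum(doc in relevant for doc in ranking[:r]) / r if r else 0.0
```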
Precision histogram: plot the difference between the R-precision values of two algorithms for each query, to compare the algorithms query by query
Combining recall and precision: the harmonic mean
F(j) = 2 / (1/r(j) + 1/P(j)), where r(j) and P(j) are the recall and precision at the j-th document in the ranking
F assumes a high value only when both recall and precision are high
The E measure
E(j) = 1 - (1 + b^2) / (b^2/r(j) + 1/P(j))
b = 1: E is the complement of the harmonic mean
b > 1: the user is more interested in precision
b < 1: the user is more interested in recall
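Both combined measures as plain functions (a sketch; the guards against division by zero are my addition):

```python
def f_measure(recall, precision):
    """Harmonic mean of recall and precision."""
    if recall == 0 or precision == 0:
        return 0.0
    return 2.0 / (1.0 / recall + 1.0 / precision)

def e_measure(recall, precision, b=1.0):
    """E = 1 - (1 + b^2) / (b^2/recall + 1/precision);
    with b = 1 this reduces to 1 - f_measure(recall, precision)."""
    if recall == 0 or precision == 0:
        return 1.0
    return 1.0 - (1.0 + b * b) / (b * b / recall + 1.0 / precision)
```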
User-oriented measures
Coverage ratio: the fraction of the documents known to the user to be relevant that have been retrieved
A high coverage ratio means the system finds most of the relevant documents the user expected to see
Novelty ratio: the fraction of the relevant documents retrieved that were previously unknown to the user
A high novelty ratio means the system reveals many new relevant documents to the user
Relative recall: the ratio between the number of relevant documents found and the number of relevant documents the user expected to find
relative recall = (number of relevant documents found) / (number of relevant documents the user expected to find)
When the relative recall reaches 1 (the user has found as many relevant documents as expected), the user stops searching
Recall effort: the ratio between the number of relevant documents the user expected to find and the number of documents that had to be examined in order to find them
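The user-oriented ratios side by side (argument names are illustrative, not from the chapter):

```python
def user_oriented_measures(retrieved, relevant, known_to_user, expected):
    """Coverage, novelty, and relative recall for one query.

    retrieved:     set of document ids returned by the system
    relevant:      set of document ids that are relevant
    known_to_user: relevant documents the user already knew about
    expected:      number of relevant documents the user expected to find
    """
    found = retrieved & relevant
    coverage = (len(found & known_to_user) / len(known_to_user)
                if known_to_user else 0.0)
    novelty = len(found - known_to_user) / len(found) if found else 0.0
    relative_recall = len(found) / expected if expected else 0.0
    return coverage, novelty, relative_recall
```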
Research in IR has lacked a solid formal framework and robust, consistent testbeds and benchmarks
The Text REtrieval Conference (TREC) addresses the need for benchmarks
Retrieval techniques evaluated: methods using automatic thesauri, sophisticated term weighting, natural language techniques, relevance feedback, advanced pattern matching
Document collection: over 1 million documents (newspaper articles, patents, etc.)
Topics: information requests stated in natural language; converting a topic into an actual query is done by the system itself
Relevant documents identified with the pooling method: for each topic, collect the top k documents generated by each participating system and let human assessors decide their relevance
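A sketch of the pooling step for a single topic, assuming each participating system contributes a ranked list; the pooled documents are then judged by the assessors, and documents outside the pool are treated as non-relevant:

```python
def pool(rankings, k=100):
    """Union of the top-k documents from each system's ranking."""
    pooled = set()
    for ranking in rankings:
        pooled.update(ranking[:k])
    return pooled
```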
The benchmark tasks: ad hoc task, filtering task, Chinese, cross languages, spoken document retrieval, high precision, very large collection
Evaluation measures:
Summary table statistics: number of documents retrieved, number of relevant documents retrieved, number of relevant documents not retrieved, etc.
Recall-precision averages
Document level averages: average precision at seen relevant documents
Average precision histogram