![Page 1: One Click One Revisited: Enhancing Evaluation Based on Information Units Tetsuya Sakai 1 and Makoto P. Kato 2 1 Microsoft Research Asia, P.R. China tetsuyasakai@acm.org](https://reader036.vdocument.in/reader036/viewer/2022062519/5697bfa71a28abf838c98f00/html5/thumbnails/1.jpg)
One Click One Revisited: Enhancing Evaluation Based on Information Units
Tetsuya Sakai¹ and Makoto P. Kato²
¹ Microsoft Research Asia, P.R. China
² Kyoto University, Japan
AIRS 2012
Introduction
1. NTCIR-9 1CLICK-1: $S$-measure, the official evaluation metric of the One Click Access Task, discounts the value of each information unit (iUnit) based on its position within the textual output.
2. NTCIR-10 1CLICK-2: We complement the recall-like $S$-measure with a simple, precision-like metric called $T$-measure, as well as a combination of $S$-measure and $T$-measure, called $S\sharp$.
$X$-strings
The output of 1CLICK systems
Task and Data
Participating systems were expected to return important iUnits first, and to minimize the amount of text the user has to read.
The length of the vital string is used for defining an “optimal” output and for computing $S$-measure.
Every $X$-string was evaluated by two assessors; we evaluate runs based on the Intersection data (I) and the Union data (U) of the iUnit matches.
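The Intersection and Union data can be pictured as plain set operations over the two assessors' matches. The following is a minimal sketch, assuming each assessor's judgment for one output is reduced to a set of matched iUnit identifiers; the identifiers and the function name `combine_matches` are hypothetical, and match offsets are ignored here.

```python
def combine_matches(assessor_a: set, assessor_b: set) -> tuple:
    """Return (intersection, union) of two assessors' matched iUnit IDs.

    Assumption: a match is represented only by the iUnit ID; the position
    of the match in the output is not modelled in this sketch.
    """
    return assessor_a & assessor_b, assessor_a | assessor_b


# Hypothetical judgments for one query and one system output.
matches_a = {"u1", "u2", "u4"}
matches_b = {"u2", "u3", "u4"}
intersection, union = combine_matches(matches_a, matches_b)
print(sorted(intersection))  # ['u2', 'u4']
print(sorted(union))         # ['u1', 'u2', 'u3', 'u4']
```

The Intersection data is the stricter view (both assessors agree on the match), while the Union data is the more lenient one.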
$S$-measure and $S\flat$
Let $N$ be the set of gold-standard iUnits constructed for a particular query.
For each iUnit $n \in N$, let $v(n)$ be its vital string and $w(n)$ be its weight.
The Pseudo Minimal Output (PMO) is obtained by sorting all vital strings by $w(n)$ (first key) and $|v(n)|$ (second key).
Let $\mathit{offset}^*(v(n))$ denote the offset position of $v(n)$ within the PMO.
Let $M \subseteq N$ denote the set of matched iUnits obtained by manually comparing the $X$-string with the gold-standard iUnits.
Let $\mathit{offset}(n)$ denote the offset position of $n \in M$ within the $X$-string.
$S$-measure and $S\flat$ (Cont’d)
$S$-measure is defined as:
$$S\text{-measure} = \frac{\sum_{n \in M} w(n)\,\max(0,\, L - \mathit{offset}(n))}{\sum_{n \in N} w(n)\,\max(0,\, L - \mathit{offset}^*(v(n)))}$$
Here $L$ is a parameter that represents how the user’s patience runs out. When $L$ is set to a very large value, $S$-measure reduces to weighted recall (W-recall), which is position-insensitive.
There is no theoretical guarantee that $S$-measure does not exceed one, so $S\flat$, given by $S\flat = \min(1, S\text{-measure})$, may be used instead.
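To make the position-based discounting concrete, here is a minimal sketch. It assumes that each matched iUnit contributes its weight times max(0, L − offset), normalised by the same sum computed over the PMO. The `IUnit` class, the function names and the toy data are hypothetical. Two further assumptions are labelled in the code: the PMO is sorted with heavier iUnits first and shorter vital strings first, and an offset is the position where a vital string ends.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IUnit:
    uid: str       # hypothetical identifier
    vital: str     # vital string
    weight: float  # iUnit weight


def pmo_offsets(gold):
    """Offset of each vital string within the Pseudo Minimal Output.

    Assumptions: sort by weight (descending), then vital string length
    (ascending); the offset is the end position of the vital string.
    """
    ordered = sorted(gold, key=lambda n: (-n.weight, len(n.vital)))
    offsets, pos = {}, 0
    for n in ordered:
        pos += len(n.vital)
        offsets[n.uid] = pos
    return offsets


def s_measure(gold, matched_offsets, L=500):
    """matched_offsets maps a matched iUnit ID to its offset in the output."""
    star = pmo_offsets(gold)
    weight = {n.uid: n.weight for n in gold}
    num = sum(weight[u] * max(0, L - off) for u, off in matched_offsets.items())
    den = sum(n.weight * max(0, L - star[n.uid]) for n in gold)
    return num / den if den > 0 else 0.0


def s_flat(gold, matched_offsets, L=500):
    return min(1.0, s_measure(gold, matched_offsets, L))


gold = [IUnit("u1", "a" * 10, 3), IUnit("u2", "b" * 5, 3), IUnit("u3", "c" * 20, 1)]
matched = {"u1": 100, "u3": 300}  # offsets found by an assessor (toy values)
for L in (10**9, 500, 50):
    print(L, s_measure(gold, matched, L))
```

With a very large `L` the score approaches weighted recall (4/7 for this toy data), while with `L = 50` both matches fall beyond the patience limit and contribute nothing.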
Effect of the Patience Parameter
The official 1CLICK-1 evaluation used $S$-measure with $L = 500$ (one minute).
We vary this parameter as follows and examine the outcome: $L = 1000$ (two minutes), $L = 250$ (30 seconds) and $L = 50$ (6 seconds).
Note that if $L$ is set to an extremely small value, most of the contents of the $X$-strings will be ignored.
Effect of $L$ on the System Ranking
The $x$-axis shows runs sorted by Mean $S$-measure ($L = 500$)
$X$-strings of Runs from KUIDL and MSRA1click
The LOCAL query “Menard Aoyama Resort” (name of a facility)
Results on the Patience Parameter
$L = 1000$ (two minutes) produces rankings that are very similar to those of $L = 500$ (one minute), but $L = 250$ (30 seconds) results in substantially different system rankings.
Given a test collection with a set of runs, discriminative power is measured by conducting a statistical significance test for every pair of runs.
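The pairwise testing procedure can be sketched as follows. This is a minimal illustration rather than the procedure used in the talk: the significance test here is a paired two-sided randomization (sign-flip) test chosen for brevity, and the function names, the `alpha` threshold and the number of trials are all assumptions.

```python
import itertools
import random


def paired_randomization_p(x, y, trials=2000, seed=0):
    """Two-sided sign-flip test on per-query score differences (stand-in test)."""
    rng = random.Random(seed)
    diffs = [a - b for a, b in zip(x, y)]
    observed = abs(sum(diffs))
    hits = 0
    for _ in range(trials):
        # Randomly swap the two runs' scores on each query.
        s = sum(d if rng.random() < 0.5 else -d for d in diffs)
        if abs(s) >= observed:
            hits += 1
    return hits / trials


def discriminative_power(scores, alpha=0.05, trials=2000):
    """scores maps a run name to its per-query metric values (same query order).

    Returns (fraction of significantly different run pairs, sorted p-values).
    The sorted p-values are the data behind an ASL curve.
    """
    if len(scores) < 2:
        raise ValueError("need at least two runs")
    pvals = [
        paired_randomization_p(scores[a], scores[b], trials)
        for a, b in itertools.combinations(sorted(scores), 2)
    ]
    significant = sum(p < alpha for p in pvals)
    return significant / len(pvals), sorted(pvals)


# Toy per-query scores for three hypothetical runs.
toy = {"runA": [0.9] * 10, "runB": [0.1] * 10, "runC": [0.9] * 10}
power, curve = discriminative_power(toy)
print(power, curve)
```

A metric that separates more run pairs at a given significance level is considered more discriminative.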
Effect of $L$ on Discriminative Power
The Achieved Significance Level (ASL) curves of $S$-measure with varying $L$.
The $y$-axis represents the $p$-value and the $x$-axis represents run pairs sorted by the $p$-value. Metrics that are closer to the origin are the ones that are highly discriminative.
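$T$-measure, $T\flat$ and $S\sharp$ (1/3)
We introduce a precision-like “terseness” metric for evaluating an $X$-string of size $|X|$:
$$T\text{-measure} = \frac{\sum_{n \in M} |v(n)|}{|X|}$$
where $M$ is the set of matched iUnits and $|v(n)|$ is the length of the vital string of iUnit $n$.
As $T$-measure might exceed one, we also define $T\flat$, given by $T\flat = \min(1, T\text{-measure})$, although in reality $T$-measure never exceeded one for our data and therefore $T\text{-measure} = T\flat$ holds.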
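As a rough illustration of the terseness idea, the sketch below assumes the metric is the total vital string length of the matched iUnits divided by the length of the system output. The function names and toy strings are hypothetical, and lengths are counted in characters with Python's `len`.

```python
def t_measure(matched_vital_strings, x_string):
    """Share of the output that is 'accounted for' by matched vital strings."""
    if not x_string:
        return 0.0
    return sum(len(v) for v in matched_vital_strings) / len(x_string)


def t_flat(matched_vital_strings, x_string):
    """Capped variant that never exceeds one."""
    return min(1.0, t_measure(matched_vital_strings, x_string))


# Toy example: two matched iUnits (10 + 5 characters) in a 100-character output.
output = "x" * 100
matched_vitals = ["a" * 10, "b" * 5]
print(t_measure(matched_vitals, output))  # 0.15
```

A short output that covers the same iUnits scores higher, which is the length penalty the recall-like metric lacks.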
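$T$-measure, $T\flat$ and $S\sharp$ (2/3)
Finally, following the approach of the well-known $F$-measure, we can define $S\sharp$ as:
$$S\sharp_\beta = \frac{(1 + \beta^2)\, T\flat\, S\flat}{\beta^2\, T\flat + S\flat}$$
where letting $\beta = 1$ reduces $S\sharp$ to the harmonic mean of $T\flat$ and $S\flat$. We also examined other values of $\beta$.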
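The combination step can be sketched in a few lines, assuming the usual F-measure-style form (1 + β²)·P·R / (β²·P + R) with the capped precision-like score as P and the capped recall-like score as R. The function name `s_sharp` and the toy scores are hypothetical.

```python
def s_sharp(t_flat_score, s_flat_score, beta=1.0):
    """F-measure-style combination of a precision-like and a recall-like score.

    Assumption: both inputs are already capped at one.
    """
    denom = beta ** 2 * t_flat_score + s_flat_score
    if denom == 0:
        return 0.0
    return (1 + beta ** 2) * t_flat_score * s_flat_score / denom


# Toy scores: a verbose output (low terseness) with decent recall-like score.
t, s = 0.2, 0.6
for beta in (1, 3, 10):  # illustrative values only
    print(beta, s_sharp(t, s, beta))
```

With `beta = 1` the result is the harmonic mean (0.3 for these toy scores); as `beta` grows, the score moves toward the recall-like input, so the length penalty becomes milder.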
$T$-measure, $T\flat$ and $S\sharp$ (3/3)
$S\sharp$ differs from the traditional nugget-based $F$-measure in the following two aspects:
1. It utilizes the positions of iUnits for computing the recall-like $S\flat$.
2. Instead of relying on a fixed allowance parameter, it utilizes the vital string length of each iUnit for computing the precision-like $T$-measure.
System Ranking by Different Metrics
The $x$-axis shows runs sorted by Mean $S$-measure with $L = 500$
$X$-strings of Runs from KUIDL and MSRA1click
The QA query “The three duties of a Japanese citizen.”
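Effect of $\beta$ on $S\sharp$
The $x$-axis represents $\beta$ and the $y$-axis represents Kendall’s $\tau$ with the Mean $S$-measure ranking. Note that $\beta$ means “$S\flat$ is $\beta$ times as important as $T$-measure” and that $S\sharp$ approaches $S\flat$ as $\beta \to \infty$.
We recommend $S\sharp$ for evaluating 1CLICK systems, along with the original $S$-measure.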
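Rank correlation between two system rankings can be computed as in the sketch below. It is a plain tau-a over mean scores, written with the standard library only; the run names and scores are made up, and tied pairs simply count as neither concordant nor discordant, which is an assumption of this sketch.

```python
from itertools import combinations


def kendall_tau(scores_a, scores_b):
    """Kendall's tau-a between the rankings induced by two score dictionaries.

    Both dictionaries must map the same run names to mean metric values.
    """
    runs = sorted(scores_a)
    if len(runs) < 2:
        raise ValueError("need at least two runs")
    concordant = discordant = 0
    for r1, r2 in combinations(runs, 2):
        sign = (scores_a[r1] - scores_a[r2]) * (scores_b[r1] - scores_b[r2])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(runs) * (len(runs) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical mean scores of three runs under two metrics.
metric_one = {"run1": 0.5, "run2": 0.4, "run3": 0.3}
metric_two = {"run1": 0.45, "run2": 0.2, "run3": 0.3}
print(kendall_tau(metric_one, metric_two))
```

Here one of the three run pairs is ordered differently by the two metrics, so the correlation is (2 − 1) / 3.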
Results on $T$-measure and $S\sharp$
Discriminative power of $S$-measure, $T$-measure and $S\sharp$
Conclusions
The One Click Access Task (1CLICK) is one of the tasks of NTCIR that requires systems to return a concise multi-document summary of web pages in response to a query which is assumed to have been submitted in a mobile context.
Furthermore, we introduce the next round of the 1CLICK task called MobileClick, in which participants are required to submit a two-layered summarization suitable for mobile information access.
Thanks
Y. Hou et al. (Eds.): AIRS 2012, LNCS 7675, pp. 39–51, 2012.
© Springer-Verlag Berlin Heidelberg 2012