
Overview of the Second NTCIR Workshop

National Institute of Informatics (NII), Japan

1.1 Purpose


1.2 Brief History

1.3 Focus of the NTCIR Workshop


Traditional IR Testing

Challenging Issues

1.4 Evaluation Workshops

Benefits of Evaluation Workshops

pooling
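Pooling builds the set of documents to be judged for each topic from the runs that participants submit. The sketch below is minimal and rests on assumptions: each run is a ranked list of document ids per topic, and a fixed cut-off depth is applied to every run. The function name `build_pool`, the run ids, the document ids and the depth values are all hypothetical and are not the settings used for NTCIR-2.

```python
from typing import Dict, List, Set

Runs = Dict[str, Dict[str, List[str]]]  # run id -> topic id -> ranked doc ids


def build_pool(runs: Runs, depth: int) -> Dict[str, Set[str]]:
    """Return, per topic, the union of the top-`depth` documents of every run."""
    pool: Dict[str, Set[str]] = {}
    for ranking_by_topic in runs.values():
        for topic_id, ranking in ranking_by_topic.items():
            pool.setdefault(topic_id, set()).update(ranking[:depth])
    return pool


# Hypothetical runs for topic 0005 (run ids and document ids are made up).
runs: Runs = {
    "runA": {"0005": ["doc-1", "doc-2", "doc-3"]},
    "runB": {"0005": ["doc-2", "doc-4", "doc-5"]},
}
pool = build_pool(runs, depth=2)
print(sorted(pool["0005"]))  # ['doc-1', 'doc-2', 'doc-4']
```

Only pooled documents are judged, and documents outside the pool are treated as not relevant, so the pool depth and the diversity of the contributing runs both affect how complete the judgments are.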

2.1 Tasks


Chinese Text Retrieval Tasks (CHTR):

Japanese-English IR Tasks (JEIR):

Text Summarization Challenge (TSC):

2.2 Participants

Participants of the Second NTCIR Workshop

Yokohama National Univ. (Japan)

Waseda Univ. (Japan)

Table 1. Number of Participating Groups

subtask

Table 2. Attributes of Participating Groups

Table 3. Distribution of Participating Groups

Country       CHTR        JEIR        TSC
              enrl  sub   enrl  sub   enrl  sub
Canada           1    0      1    0      0    0
China            2    2      0    0      0    0
Hong Kong        2    1      0    0      0    0
Japan            3    3     21   18     12    9
Korea            0    0      1    1      0    0
Taiwan           2    2      2    2      1    0
UK               1    0      2    1      0    0
USA              5    3      4    3      2    0
Total           16   11     31   25     15    9

enrl = groups enrolled; sub = groups that submitted results.
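The Total row of Table 3 can be checked against the per-country rows. A small sketch that transcribes the table and recomputes, per task, the groups enrolled, the groups that submitted results, and the submission rate; it reads "enrl" as enrolled and "sub" as submitted, and the names `TABLE3` and `task_totals` are hypothetical.

```python
from typing import Dict, Tuple

# Table 3 transcribed: country -> task -> (groups enrolled, groups that submitted).
TABLE3: Dict[str, Dict[str, Tuple[int, int]]] = {
    "Canada":    {"CHTR": (1, 0), "JEIR": (1, 0),   "TSC": (0, 0)},
    "China":     {"CHTR": (2, 2), "JEIR": (0, 0),   "TSC": (0, 0)},
    "Hong Kong": {"CHTR": (2, 1), "JEIR": (0, 0),   "TSC": (0, 0)},
    "Japan":     {"CHTR": (3, 3), "JEIR": (21, 18), "TSC": (12, 9)},
    "Korea":     {"CHTR": (0, 0), "JEIR": (1, 1),   "TSC": (0, 0)},
    "Taiwan":    {"CHTR": (2, 2), "JEIR": (2, 2),   "TSC": (1, 0)},
    "UK":        {"CHTR": (1, 0), "JEIR": (2, 1),   "TSC": (0, 0)},
    "USA":       {"CHTR": (5, 3), "JEIR": (4, 3),   "TSC": (2, 0)},
}


def task_totals(table: Dict[str, Dict[str, Tuple[int, int]]]) -> Dict[str, Tuple[int, int]]:
    """Sum enrolled and submitting groups per task over all countries."""
    totals: Dict[str, Tuple[int, int]] = {}
    for per_task in table.values():
        for task, (enrolled, submitted) in per_task.items():
            e, s = totals.get(task, (0, 0))
            totals[task] = (e + enrolled, s + submitted)
    return totals


totals = task_totals(TABLE3)
for task, (enrolled, submitted) in totals.items():
    print(f"{task}: {enrolled} enrolled, {submitted} submitted ({submitted / enrolled:.0%})")
# CHTR: 16 enrolled, 11 submitted (69%)
# JEIR: 31 enrolled, 25 submitted (81%)
# TSC: 15 enrolled, 9 submitted (60%)
```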

Comparison with the First NTCIR Workshop

Fig. 1 Number of Participants of Each Task

Task               NTCIR WS1   NTCIR WS2
AdHoc-JEIR/mono           18          17
CLIR-JEIR/CLIR            10          14
CHTR                       0          11
TermExtraction             9           0
TSC                        0           9

2.3 Procedures and Evaluation

1 June 2000:

10 August 2000:

30 August 2000:

8 September 2000:

18 September 2000:

20 October 2000:

27 November - 1 December 2000:

28 December 2000:

10 January 2001:

7-9 March 2001:

CHTR and JEIR:

TSC:

3. Test Collections

CIRB010;

NTCIR-1;

NTCIR-2;

NTCIR-2 Summ

<REC>

<ACCN>gakkai-0000011144</ACCN>

<TITL TYPE="kanji">(Japanese text)</TITL>

<TITE TYPE="alpha">Electronic manuscripts, electronic publishing, and electronic library </TITE>

<AUPK TYPE="kanji">(Japanese text)</AUPK>

<AUPE TYPE="alpha">Negishi, Masamitsu</AUPE>

<CONF TYPE="kanji">(Japanese text)</CONF>

<CNFE TYPE="alpha">The Special Interest Group Notes of

IPSJ</CNFE><CNFD>1991. 11. 19</CNFD>

<ABST TYPE="kanji"><ABST.P>(Japanese text)</ABST.P></ABST>

<ABSE TYPE="alpha"><ABSE.P>Current situation on electronic processing in preparation, editing, printing, and

distribution of documents is summarized and its future trend is

discussed, with focus on the concept: "Electronic publishing". Movements in the country concerning an international standard

for electronic publishing. Standard Generalized Markup Language (SGML) is assumed to be important, and the results

from an experiment at NACSIS to publish an "SGML

Experimental Journal" and to make its full-text CD-ROM version are reported. Various forms of "Electronic Library" are also

investigated. The author puts emphasis on standardization, as technological problems for those social systems based on the

cultural settings of publication of the country, are the problems of

acceptance and penetration of the technology in the society.</ABSE.P></ABSE>

<KYWD TYPE="kanji">(Japanese text)</KYWD>

<KYWE TYPE="alpha">Electronic publishing // Electronic library // Electronic manuscripts // SGML // NACSIS // Full text databases</KYWE>

<SOCN TYPE="kanji"> </SOCN>

<SOCE TYPE="alpha">Information Processing Society of Japan</SOCE>

</REC>

Fig. 2 A Sample of a Document Record
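Each field of a record sits in a named SGML element, so individual fields can be pulled out with a simple pattern match. The sketch below is minimal. It uses regular expressions rather than an XML parser, since SGML data need not be well-formed XML, and it assumes that an element never contains another element with the same name. The function name `get_field` is hypothetical, and the sample reuses the English-language fields of the record shown in Fig. 2.

```python
import re
from typing import Optional


def get_field(record: str, tag: str) -> Optional[str]:
    """Return the text of the first <tag ...>...</tag> element in record, or None.

    Assumes an element does not contain another element with the same name."""
    t = re.escape(tag)
    # "<TAG>" or "<TAG attr=...>" opens the element; "<TAG.P>" does not.
    m = re.search(rf"<{t}(?:\s[^>]*)?>(.*?)</{t}>", record, re.S)
    if m is None:
        return None
    inner = re.sub(r"<[^>]+>", "", m.group(1))  # drop nested tags such as <ABSE.P>
    return " ".join(inner.split())              # collapse line breaks and extra spaces


record = """<REC>
<ACCN>gakkai-0000011144</ACCN>
<TITE TYPE="alpha">Electronic manuscripts, electronic publishing, and electronic library </TITE>
<AUPE TYPE="alpha">Negishi, Masamitsu</AUPE>
<CNFE TYPE="alpha">The Special Interest Group Notes of
IPSJ</CNFE><CNFD>1991. 11. 19</CNFD>
</REC>"""

for tag in ("ACCN", "AUPE", "CNFE", "CNFD"):
    print(tag, "->", get_field(record, tag))
# ACCN -> gakkai-0000011144
# AUPE -> Negishi, Masamitsu
# CNFE -> The Special Interest Group Notes of IPSJ
# CNFD -> 1991. 11. 19
```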

3.1 Documents

3.2 Topics

<TOPIC q=0005><TITLE>

(Japanese text)

</TITLE><DESCRIPTION>

(Japanese text)

</DESCRIPTION>

<NARRATIVE>

(Japanese text)

</NARRATIVE><CONCEPT>

(Japanese text)

</CONCEPT><FIELD>

1. (Japanese text)

</FIELD>

</TOPIC>

Fig. 3 A Sample Topic

3.3 Relevance Judgments (Right Answers)

Multi-grade Judgments

Level 1

Level 2

Judgments by Different Users


rigid relevance: Level 1

relaxed relevance: Level 2
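Multi-grade judgments are reduced to binary relevance before standard measures such as average precision are computed. The sketch below is minimal and assumes the grade labels S (highly relevant), A (relevant), B (partially relevant) and C (not relevant), with rigid relevance counting S and A as relevant and relaxed relevance also counting B. These labels, the function names and the judgments below are assumptions for illustration.

```python
from typing import Dict, List, Set

# ASSUMPTION: grade labels S/A/B/C; rigid = S+A, relaxed = S+A+B.
RELEVANT_GRADES = {"rigid": {"S", "A"}, "relaxed": {"S", "A", "B"}}


def binarize(qrels: Dict[str, str], level: str) -> Set[str]:
    """Documents counted as relevant for one topic at the given level."""
    grades = RELEVANT_GRADES[level]
    return {doc for doc, grade in qrels.items() if grade in grades}


def average_precision(ranking: List[str], relevant: Set[str]) -> float:
    """Non-interpolated average precision over the full ranking."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank  # precision at the rank of each relevant document
    return total / len(relevant)


# Hypothetical judgments for one topic and one system's ranking.
qrels = {"d1": "S", "d2": "B", "d3": "C", "d4": "A"}
ranking = ["d2", "d1", "d3", "d4"]
for level in ("rigid", "relaxed"):
    print(level, round(average_precision(ranking, binarize(qrels, level)), 3))
# rigid 0.5
# relaxed 0.917
```

The same ranking scores differently under the two levels because d2, judged only partially relevant, counts as a hit only under relaxed relevance.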

Additional Information

Rank-Degree Sensitive Evaluation Metric on Multi-grade Relevance Judgments
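As an illustration of a measure that is sensitive both to rank and to degree of relevance, the sketch below computes discounted cumulated gain (DCG) as described by Järvelin and Kekäläinen (SIGIR 2000), normalized by the DCG of an ideal ordering. This is a well-known stand-in and not necessarily the metric proposed in this paper. The grade labels S/A/B/C and the gain values 3/2/1/0 are assumptions.

```python
import math
from typing import List, Sequence

# ASSUMPTION: grade labels and gain values are illustrative only.
GAIN = {"S": 3, "A": 2, "B": 1, "C": 0}


def dcg(gains: Sequence[float]) -> float:
    """DCG with a log2 discount: rank 1 is undiscounted and the gain at
    rank i >= 2 is divided by log2(i)."""
    total = 0.0
    for rank, gain in enumerate(gains, start=1):
        total += gain if rank == 1 else gain / math.log2(rank)
    return total


def ndcg(ranked_grades: List[str], judged_grades: List[str]) -> float:
    """DCG of a ranking divided by the DCG of the ideal ordering of all judged grades."""
    gains = [GAIN[g] for g in ranked_grades]
    ideal = sorted((GAIN[g] for g in judged_grades), reverse=True)[: len(gains)]
    best = dcg(ideal)
    return dcg(gains) / best if best > 0 else 0.0


ranking = ["S", "B", "C", "A"]           # grades of the documents at ranks 1..4
print(dcg([GAIN[g] for g in ranking]))   # 5.0
print(round(ndcg(ranking, ranking), 3))  # 0.888
```

Unlike binary average precision, moving the A document from rank 4 to rank 2 would raise the score more than moving the B document there, because the gain depends on the grade.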

3.4 Linguistic Analysis

3.5 Robustness of the System Evaluation using the Test Collections
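One common way to check whether system comparisons are robust, shown here as a sketch and not necessarily the analysis performed in this paper, is to compute Kendall's tau between the system rankings produced under two conditions, for example rigid versus relaxed relevance, or judgments by different assessors. The function name `kendall_tau` and the run scores below are hypothetical.

```python
from itertools import combinations
from typing import Dict


def kendall_tau(scores_a: Dict[str, float], scores_b: Dict[str, float]) -> float:
    """Kendall's tau-a between the system orderings implied by two score sets.
    Pairs tied under either score set count as neither concordant nor discordant."""
    if set(scores_a) != set(scores_b):
        raise ValueError("both score sets must cover the same systems")
    systems = sorted(scores_a)
    n = len(systems)
    if n < 2:
        raise ValueError("need at least two systems")
    concordant = discordant = 0
    for x, y in combinations(systems, 2):
        product = (scores_a[x] - scores_a[y]) * (scores_b[x] - scores_b[y])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


# Hypothetical mean average precision of three runs under two judgment conditions.
rigid = {"run1": 0.30, "run2": 0.25, "run3": 0.20}
relaxed = {"run1": 0.35, "run2": 0.36, "run3": 0.28}
print(round(kendall_tau(rigid, relaxed), 3))  # 0.333: run1 and run2 swap places
```

A tau of 1.0 means the two conditions order all systems identically, and -1.0 means the order is fully reversed.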

3.6 Differences between CHTR and JEIR

Table 4. "Query Types" in CHTR and JEIR

4.1 Round Tables at the Workshop Meeting

4.2 Future Directions

workshops NTCIR WS1

topic field(s) used*

Ad Hoc & CLIR (JE)

Chinese Text Retrieval

Japanese & English IR

T Very Short TI (Title) T

TD T+D

D

Short without Concept

D only

C or DC or TDC

TC

VS (Very Short)

N or DN or TN or TDN

Long without Concept

N without C

NC or DNC or TNC or

Long with Concept

N+C

*: mandatory for automatic query construction, where T=TITLE, D=DESCRIPTION (QUESTION in CHTR), N=NARRATIVE, C=CONCEPT

NTCIR WS2

LO (Long)

Short with Concept

C without N

SO (Short)
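Table 4 names each query type by the topic fields it draws on: T=TITLE, D=DESCRIPTION (QUESTION in CHTR), N=NARRATIVE, C=CONCEPT. A minimal sketch of turning such a code into a query string, assuming plain concatenation of the field texts in code order; real systems would also segment, weight or filter the terms. The function name `build_query` and the English topic below are hypothetical.

```python
from typing import Dict

# Field codes as listed for Table 4: T=TITLE, D=DESCRIPTION, N=NARRATIVE, C=CONCEPT.
FIELD_CODES = {"T": "TITLE", "D": "DESCRIPTION", "N": "NARRATIVE", "C": "CONCEPT"}


def build_query(topic: Dict[str, str], query_type: str,
                description_field: str = "DESCRIPTION") -> str:
    """Concatenate the topic fields named by a code such as "D", "TD" or "TDNC".

    For CHTR topics pass description_field="QUESTION", since D stands for the
    QUESTION field there. An unknown code letter raises KeyError."""
    fields = dict(FIELD_CODES, D=description_field)
    parts = []
    for code in query_type:
        text = topic.get(fields[code], "").strip()
        if text:  # skip fields that are missing or empty
            parts.append(text)
    return " ".join(parts)


# Hypothetical English topic (the sample topic in Fig. 3 is in Japanese).
topic = {
    "TITLE": "electronic library",
    "DESCRIPTION": "Find documents about electronic libraries.",
    "NARRATIVE": "Relevant documents describe systems or projects.",
    "CONCEPT": "electronic library, digital library",
}
print(build_query(topic, "D"))  # Find documents about electronic libraries.
print(build_query(topic, "TD"))  # electronic library Find documents about electronic libraries.
```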

ACKNOWLEDGMENTS
