TRANSCRIPT
CLEF 2007 - Budapest
Joint SemEval/CLEF tasks: Contribution of WSD to CLIR
UBC: Agirre, Lopez de Lacalle, Otegi, Rigau,
FBK: Magnini
Irion Technologies: Vossen
WSD and SemEval
Word Sense Disambiguation:
- Example: "When I went to bed at around two o'clock that night, everyone else was still out at the party."
  - party:N:1 political organization
  - party:N:2 social event
- Potential for more precise expansion (translation)

SemEval 2007:
- Framework for semantic evaluations
- Under the auspices of SIGLEX (ACL)
- 19 tasks, incl. WSD, SRL, full frames, people, …
- Over 100 attendees at the ACL workshop
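To make the example concrete, here is a minimal sketch that enumerates the candidate senses a WSD system must choose among, using NLTK (an assumption; the task itself used the WordNet 1.6 inventory, whereas NLTK ships a later WordNet version):

```python
# Minimal sketch: list the candidate noun senses of "party".
# Assumes the nltk package with the 'wordnet' corpus downloaded.
from nltk.corpus import wordnet as wn

for synset in wn.synsets('party', pos=wn.NOUN):
    # e.g. party.n.01 - "an organization to gain political power"
    print(synset.name(), '-', synset.definition())
```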
Motivation for the task
WSD perspective:
- In-vitro evaluations not fully satisfactory
- In-vivo evaluations in applications (MT, IR, …)

IR perspective: the usefulness of WSD for IR/CLIR is disputed, but …
- Real rather than artificial experiments
- Expansion rather than just WSD
- Weighted list of senses rather than best sense only
- Controlling which words to disambiguate
- WSD technology has improved
- Coarser-grained senses (90% accuracy at SemEval 2007)
Motivation for the task
- Combining WSD and IR: many possible variations
- Unfeasible for a single research team to explore them all
- A public common dataset allows the community to explore different combinations
- Tasks where we could hope for a positive impact:
  - High-recall IR scenarios
  - Short-passage IR scenarios
  - Q&A
  - CLIR
- We selected CLIR because of the previous expertise of some of the organizers
Two-stage framework
First stage (SemEval 2007 Task 01):
- Participants: submit WSD results
  - Sense inventory: WordNet 1.6 (for multilinguality)
- Organizers:
  - Expansion / translation strategy fixed
  - IR/CLIR system fixed (IR as upper bound)

Second stage (proposed CLEF 2008 track):
- Organizers: provide several WSD annotations
- Participants: submit CLIR results with/without WSD annotations
Outline
- Description of the SemEval task (1st stage)
- Evaluation of results (1st stage)
- Conclusions (1st stage)
- Next step (2nd stage)
Description of the task: Datasets

CLEF data:
- Documents in English: LA94 (Los Angeles Times 1994), GH95 (Glasgow Herald 1995)
  - 170,000 documents, 580 MB of raw text
- 300 topics, both in English and Spanish
- Existing relevance judgments

Due to time limitations of the exercise:
- 16.6% of the document collection (we will have 100% shortly)
- Subset of relevance judgments (201 topics)
Description of the task: Two subtasks for participants

English WSD of the following:
- the document collection
- the topics

We limit the task to English for the time being.
Systems return WordNet 1.6 senses.
Description of the task: Steps of the CLIR/IR system

Step 1: Participants return WSD results

Step 2: Expansion / translation
- Multilingual Central Repository (based on EuroWordNet)
- 5 languages tightly connected to the ILI concepts (WN 1.6 synsets)
- Mappings to other WN versions
- Example: car, sense 1
  - Expanded to synonyms: automobile
  - Translated to equivalents: auto, coche
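As an illustration of this expansion/translation step, here is a sketch using NLTK's WordNet together with the Open Multilingual Wordnet; this is an approximation for illustration only, not the actual Multilingual Central Repository pipeline, and it operates on WordNet 3.0 identifiers rather than the task's WN 1.6 synsets:

```python
# Sketch of sense-based expansion and translation (not the task's MCR pipeline).
# Assumes nltk with the 'wordnet' and 'omw-1.4' corpora downloaded.
from nltk.corpus import wordnet as wn

synset = wn.synset('car.n.01')                    # "car", sense 1
english_expansion = synset.lemma_names()          # synonyms, e.g. auto, automobile
spanish_translation = synset.lemma_names('spa')   # equivalents, e.g. auto, coche

print(english_expansion)
print(spanish_translation)
```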
Description of the task: Steps of the CLIR/IR system

Step 3: IR/CLIR system
- Adaptation of TwentyOne (Irion)
- Pre-processing: XML
- Indexing: detected noun phrases only
- Title and description used for queries
- Stripped down to vector-space matching
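The last point can be illustrated with a minimal vector-space matching sketch (TF-IDF weighting plus cosine similarity); this shows the general technique only and is not the TwentyOne system itself:

```python
# Minimal vector-space retrieval sketch: TF-IDF vectors, cosine ranking.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = ["the party nominated a new candidate",        # toy collection
        "everyone was still at the birthday party"]
query = ["political party candidate"]

vectorizer = TfidfVectorizer()
doc_vectors = vectorizer.fit_transform(docs)
query_vector = vectorizer.transform(query)

# Rank documents by cosine similarity to the query vector.
scores = cosine_similarity(query_vector, doc_vectors)[0]
print(sorted(enumerate(scores), key=lambda pair: -pair[1]))
```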
Description of the task: Three evaluation settings

IR with WSD of documents (English):
- WSD of English documents
- Expansion of senses in the documents

IR with WSD of topics (English):
- WSD of English topics
- Expansion of senses in the topics

(IR serves as an upper bound for CLIR.)

CLIR with WSD of documents:
- WSD of English documents
- Translation of English documents
- Retrieval using Spanish topics

CLIR with WSD of topics was not run, as it would require Spanish WSD.
Evaluation and results: Participant systems

Participants returned sense-tagged documents and topics.

Two systems participated:
- PUTOP from Princeton, unsupervised
- UNIBA from Bari, knowledge-based, using WordNet

In-house system:
- ORGANIZERS, supervised, kNN classifiers

Other baselines:
- noexp: original text, no expansion
- fullexp: expand to all senses
- wsdrand: return a sense at random
- 1st: return the first sense in WordNet
- wsd50: 50% best senses (in-house WSD system only)
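Two of these baselines are easy to make concrete. Below is a sketch of the 1st-sense and random-sense baselines with NLTK, which lists synsets most-frequent-first (the helper names are illustrative):

```python
# Sketch of the "1st" and "wsdrand" baselines. Assumes NLTK's WordNet corpus;
# wn.synsets() returns senses ordered by frequency, so index 0 is the first sense.
import random
from nltk.corpus import wordnet as wn

def first_sense(word, pos=wn.NOUN):
    synsets = wn.synsets(word, pos=pos)
    return synsets[0] if synsets else None

def random_sense(word, pos=wn.NOUN):
    synsets = wn.synsets(word, pos=pos)
    return random.choice(synsets) if synsets else None

print(first_sense('party'))   # Synset('party.n.01')
print(random_sense('party'))
```

And, in the spirit of the supervised ORGANIZERS system (whose actual features are not described here), a toy kNN sense classifier over bag-of-words contexts might look like this:

```python
# Toy supervised kNN WSD sketch: bag-of-words context features,
# k-nearest-neighbour voting over sense-labelled training examples.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.neighbors import KNeighborsClassifier

# Made-up training contexts for "party", labelled with a sense.
contexts = ["the party won the election",
            "he joined the communist party",
            "the birthday party lasted all night",
            "guests left the party at two o'clock"]
senses = ["party.n.01", "party.n.01", "party.n.02", "party.n.02"]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(contexts)

knn = KNeighborsClassifier(n_neighbors=3).fit(X, senses)
test = vectorizer.transform(["everyone was still at the party that night"])
print(knn.predict(test))  # likely party.n.02, given the shared context words
```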
Evaluation and results: S2AW and S3AW control (Senseval-2/3 all words)

- Indication of the performance of WSD
- Not necessarily correlated with IR/CLIR results
- Supervised system (ORG) fares better

                       Prec.   Recall   Cov.
Senseval-2 all words
  ORG                  0.584   0.577    93.61%
  UNIBA                0.498   0.375    75.39%
  PUTOP                0.388   0.240    61.92%
Senseval-3 all words
  ORG                  0.591   0.566    95.76%
  UNIBA                0.484   0.338    69.98%
  PUTOP                0.334   0.186    55.68%
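The three columns follow the usual Senseval scoring definitions: precision is correct over attempted, recall is correct over all target words, and coverage is attempted over all target words. A minimal sketch with made-up counts:

```python
# Senseval-style scoring sketch (the counts below are purely illustrative).
def score(correct, attempted, total):
    return {
        'precision': correct / attempted,   # quality of the answers given
        'recall':    correct / total,       # quality over all target words
        'coverage':  attempted / total,     # fraction of words answered
    }

print(score(correct=550, attempted=940, total=1000))
```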
Evaluation and results: Results (Mean Average Precision, MAP)

- IR: noexp best
- CLIR: fullexp best, with ORG a close second; still far below IR
- Expansion and IR/CLIR system too simple

Mean Average Precision:

           IRtops   IRdocs   CLIR
noexp      0.3599   0.3599   0.1446
fullexp    0.1610   0.1410   0.2676
UNIBA      0.3030   0.1521   0.1373
PUTOP      0.3036   0.1482   0.1734
wsdrand    0.2673   0.1482   0.2617
1st        0.2862   0.1172   0.2637
ORG        0.2886   0.1587   0.2664
wsd50      0.2651   0.1479   0.2640
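As a reminder of the metric, here is a minimal sketch of average precision for a single ranked list; MAP is its mean over all topics (the function names are illustrative):

```python
# Average Precision for one ranked result list; MAP averages it over topics.
def average_precision(ranked_doc_ids, relevant_ids):
    hits, precision_sum = 0, 0.0
    for rank, doc_id in enumerate(ranked_doc_ids, start=1):
        if doc_id in relevant_ids:
            hits += 1
            precision_sum += hits / rank      # precision at each relevant hit
    return precision_sum / len(relevant_ids) if relevant_ids else 0.0

def mean_average_precision(runs):
    # runs: list of (ranked_doc_ids, relevant_ids) pairs, one per topic
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)

print(average_precision(['d1', 'd3', 'd2'], {'d1', 'd2'}))  # (1/1 + 2/3) / 2 ≈ 0.833
```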
Analysis: number of words in the expansion of documents

IR: the fewer the better (but):
- MAP: noexp > ORG > UNIBA
- MW:  noexp < … < ORG

CLIR: the more the better (but):
- MAP: fullexp > ORG > ORG(50)
- MW:  fullexp > ORG(50) > ORG

WSD allows for a more informed expansion.

Millions of words (MW):

                      Eng.   Sp.
NO WSD     noexp        9      9
           fullexp     93     58
UNIBA      wsdbest     19     17
           wsd50       19     17
PUTOP      wsdbest     20     16
           wsd50       20     16
Baseline   1st         24     20
           wsdrand     24     19
ORG        wsdbest     26     21
           wsd50       36     27
Conclusions
Main goals met:
- First attempt at evaluating WSD in CLIR
- Large dataset prepared and preprocessed
- WSD allows for a more informed expansion

On the negative side:
- Participation low: SemEval overload; 10 teams had expressed interest
- No improvement over the baseline
- Expansion and IR/CLIR strategies naive
Next stage: CLEF 2008
WSD results provided:
- WSD of the whole collection
- Best WSD systems in SemEval 2007

CLEF teams will be able to try more sophisticated IR/CLIR methods.

Feasibility of a Q&A exercise; suggestions for cooperation on other tasks are welcome.

Thank you for your attention!
http://ixa2.si.ehu.es/semeval-clir