FATE: a FrameNet Annotated corpus for Textual Entailment Marco Pennacchiotti, Aljoscha Burchardt Computerlinguistik Saarland University, Germany LREC 2008 , Marrakech , 28 May 2008 SALSA II - The Saarbrücken Lexical Semantics Acquisition Project


Page 1

FATE: a FrameNet Annotated corpus for Textual Entailment

Marco Pennacchiotti, Aljoscha Burchardt

Computerlinguistik

Saarland University, Germany

LREC 2008, Marrakech, 28 May 2008

SALSA II - The Saarbrücken Lexical Semantics Acquisition Project

Page 2

Summary

• FrameNet and Textual Entailment

• FATE annotation schema

• Annotation examples and statistics

• Conclusions

28/05/2008 2 / 17FATE - Marco Pennacchiotti

Page 3

Frame Semantics

• Frame: conceptual structure modeling a prototypical situation

• Frame Elements (FE): participants of the situation

• Frame Evoking Elements (FEE): predicates evoking the situation

[Fillmore 1976, 2003]


Predicate-argument level normalizations

• FrameNet Berkeley Project (1)

– Database of frames for the core lexicon of English

– 800 frames, 10,000 lemmas, 135,000 annotated sentences

(1) http://framenet.icsi.berkeley.edu

“Evelyn spoke about her past” “Evelyn’s statement about her past”

STATEMENT(SPEAKER: Evelyn; TOPIC: her past)
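The normalization idea can be made concrete with a small data structure. The sketch below is hypothetical (the class and method names are not part of FrameNet or FATE): it stores a frame instance as a frame name, its FEE and a set of role/filler pairs, and shows that the verbal and the nominal sentence differ only in the FEE once both are annotated with the same frame.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FrameInstance:
    """One annotated frame: frame name, frame evoking element (FEE), role fillers."""
    frame: str
    fee: str
    roles: tuple  # sorted (role, filler) pairs, so instances stay hashable

    @classmethod
    def make(cls, frame, fee, **roles):
        return cls(frame, fee, tuple(sorted(roles.items())))

    def normalized(self):
        # Drop the surface FEE: what is left is the predicate-argument structure.
        return (self.frame, self.roles)


# Hand-coded annotations of the two example sentences (no automatic SRL involved).
verbal = FrameInstance.make("STATEMENT", "spoke", SPEAKER="Evelyn", TOPIC="her past")
nominal = FrameInstance.make("STATEMENT", "statement", SPEAKER="Evelyn", TOPIC="her past")

print(verbal == nominal)                            # False: the FEEs differ
print(verbal.normalized() == nominal.normalized())  # True: same frame and roles
```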

Page 4

Textual Entailment (TE)

Given two text fragments, the Text T and the Hypothesis H, T entails H if the meaning of H can be inferred from the meaning of T, as would typically be interpreted by people

[Dagan 2005]

T: “Yahoo has recently acquired Overture”   H: “Yahoo owns Overture”   T ⇒ H

• Recognizing Textual Entailment (RTE)

– Recognize whether entailment holds for a given (T, H) pair

– Models core inferences of many NLP applications (QA, IE, MT, …)

• RTE Challenges [Dagan et al., 2005; Giampiccolo et al., 2007]

– Compare systems for RTE

– Corpus: 800 training pairs, 800 test pairs, evenly split into positive (+) and negative (−) pairs


Page 5

Predicate-argument and RTE

• Predicate-level inference plays a relevant role in TE (20% of positive examples in RTE-2 [Garoufi, 2007])

T: An avalanche has struck a popular skiing resort in Austria, killing at least 11 people.

H: Humans died in an avalanche.

• Implementation gap:

– [Burchardt et al., 2007]: a FrameNet-based system performs comparably to simple lexical overlap

– [Hickl et al., 2006]: PropBank-based features are not effective

– [Rana et al., 2005]: the DIRT paraphrase repository does not help


DEATH(PROTAGONIST: 11 people / humans ; CAUSE: avalanche / avalanche )
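A naive way to exploit such annotations for RTE is to check whether every frame of H is matched by a frame of T with the same name and compatible role fillers. The sketch below is a hypothetical illustration, not the system of [Burchardt et al., 2007]: the COMPATIBLE table stands in for a lexical resource, and the check ignores negation, modality and most of what makes RTE hard.

```python
# Hypothetical filler-compatibility table: (T filler, H filler) pairs judged
# compatible, standing in for a real lexical resource.
COMPATIBLE = {("11 people", "humans")}


def fillers_match(t_filler, h_filler):
    return t_filler == h_filler or (t_filler, h_filler) in COMPATIBLE


def frame_entails(t_frames, h_frames):
    """True if every H frame is covered by some T frame with the same name
    whose roles include all of H's roles with compatible fillers."""
    def covers(t, h):
        return t["frame"] == h["frame"] and all(
            role in t["roles"] and fillers_match(t["roles"][role], filler)
            for role, filler in h["roles"].items()
        )
    return all(any(covers(t, h) for t in t_frames) for h in h_frames)


t_frames = [{"frame": "DEATH", "roles": {"PROTAGONIST": "11 people", "CAUSE": "avalanche"}}]
h_frames = [{"frame": "DEATH", "roles": {"PROTAGONIST": "humans", "CAUSE": "avalanche"}}]
h_wrong = [{"frame": "DEATH", "roles": {"PROTAGONIST": "humans", "CAUSE": "flood"}}]

print(frame_entails(t_frames, h_frames))  # True
print(frame_entails(t_frames, h_wrong))   # False
```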

Page 6

FATE corpus

• Reference corpus: RTE-2 test set, 800 pairs, 29,000 tokens

• Frame resource: FrameNet version 1.3

• Corpus format: SALSA/TIGER XML [Burchardt et al., 2006]

• Pre-processing: annotation on top of Collins parser syntactic analyses

– T and H are randomly reordered to avoid biases

• Annotation: performed by one highly experienced annotator

– Inter-annotator agreement measured on 5% of the corpus: FEE agreement 82%, frame agreement 88%, role agreement 91%

– Annotation carried out using the SALTO tool (1)

(1) http://www.coli.uni-saarland.de/projects/salsa/salto/doc


FATE: a manually frame-annotated Textual Entailment corpus, to study the role of frame semantics in RTE
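The slide does not say how the three agreement figures are computed. As a sketch under assumed definitions: FEE agreement can be measured as the overlap between the sets of words each annotator marked as FEE (here a Dice/F1-style score), while frame and role agreement can be measured as the share of commonly annotated items that received the same label. The function names and the toy data are invented for illustration.

```python
def fee_agreement(fees_a, fees_b):
    """Dice / F1 overlap between two annotators' sets of FEE token ids."""
    if not fees_a and not fees_b:
        return 1.0
    return 2 * len(fees_a & fees_b) / (len(fees_a) + len(fees_b))


def label_agreement(labels_a, labels_b):
    """Share of items labelled by both annotators that received the same label.
    labels_*: dict mapping an item id (e.g. a FEE token id) to a frame or role name."""
    shared = labels_a.keys() & labels_b.keys()
    if not shared:
        return 0.0
    return sum(labels_a[i] == labels_b[i] for i in shared) / len(shared)


# Toy annotations of one sentence by two annotators (invented data).
fees_a = {"t1", "t3", "t6", "t7"}
fees_b = {"t1", "t3", "t7"}
frames_a = {"t1": "LEADERSHIP", "t3": "DETAIN", "t7": "KIDNAPPING"}
frames_b = {"t1": "LEADERSHIP", "t3": "KIDNAPPING", "t7": "KIDNAPPING"}

print(round(fee_agreement(fees_a, fees_b), 3))        # 0.857
print(round(label_agreement(frames_a, frames_b), 3))  # 0.667
```

Computing frame agreement only over FEEs that both annotators marked keeps the two measures separate: disagreement on which words count as FEEs does not also count against frame agreement.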

Page 7

FATE annotation process: an example

[Figure: Collins parser syntactic analyses of T and H]

Full-text annotation (all words considered) [Ruppenhofer, 2007]

Page 8

FATE annotation process: an example

[Figure: frames and their FEEs annotated on top of the Collins syntactic analyses of T and H]

Page 9

FATE annotation process: an example

[Figure: frames, FEEs, FEs and FE fillers annotated on top of the Collins syntactic analyses of T and H]

Maximization principle: choose the largest possible constituent when annotating
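One way to operationalize the maximization principle, assuming constituents are available as token spans from the parser: among the constituents that contain the filler's head word but not the FEE itself, take the largest. The function name, the head/FEE indices and the constituent list below are hypothetical; a real implementation would read the spans from the Collins parse.

```python
def maximal_filler(constituents, head, fee):
    """Largest constituent containing the filler's head token but not the FEE token.
    Spans are (start, end) token offsets, end exclusive; returns None if no candidate."""
    candidates = [
        (start, end) for start, end in constituents
        if start <= head < end and not (start <= fee < end)
    ]
    return max(candidates, key=lambda span: span[1] - span[0], default=None)


tokens = ("An avalanche has struck a popular skiing resort in Austria , "
          "killing at least 11 people .").split()
# Hypothetical constituent spans around the filler "people" (token 15);
# the FEE "killing" is token 11.
constituents = [(15, 16), (14, 16), (12, 16), (11, 16), (0, 17)]

start, end = maximal_filler(constituents, head=15, fee=11)
print(" ".join(tokens[start:end]))  # at least 11 people
```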

Page 10

Annotation Schema

Relevance Principle

• Intuition: annotate as FEE only those words evoking a relevant situation (frame) in the sentence at hand

– Very intuitive in flavor, yet agreement is high: 83% on a pilot set of 15 sentences

“Authorities in Brazil hold 200 people as hostage”

Frames: LEADERSHIP, DETAIN, PEOPLE, KIDNAPPING

Frame elements: VICTIM, PLACE, PERPETRATOR

Page 11

Annotation Schema

Span Annotation

• On T of positive pairs, annotate only the fragments (spans) contributing to the inferential process

– Spans are obtained from the ARTE annotation [Garoufi, 2007]

– For negative pairs it is not straightforward to derive spans, hence we do full annotation

T: “Soon after the EZLN had returned to Chiapas, Congress approved a different version of the COCOPA Law, which did not include the autonomy clauses, claiming they were in contradiction with some constitutional rights (private property and secret voting); this was seen as a betrayal by the EZLN and other political groups.”

H: “EZLN is a political group.”
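The span restriction amounts to a simple filter over T's tokens. In the sketch below the helper name is hypothetical and so is the span: it is chosen by hand for the final clause of the example T, not taken from the ARTE data, and only that clause is tokenized.

```python
def annotatable_tokens(n_tokens, positive, spans):
    """Token indices of T that are open for frame annotation.

    Positive pairs: only tokens inside the inference-relevant spans.
    Negative pairs: all tokens (full annotation).
    Spans are (start, end) token offsets, end exclusive.
    """
    if not positive:
        return set(range(n_tokens))
    return {i for start, end in spans for i in range(start, end)}


clause = "this was seen as a betrayal by the EZLN and other political groups .".split()
arte_spans = [(7, 13)]  # hypothetical span: "the EZLN and other political groups"

kept = annotatable_tokens(len(clause), positive=True, spans=arte_spans)
print(" ".join(clause[i] for i in sorted(kept)))  # the EZLN and other political groups
print(len(annotatable_tokens(len(clause), positive=False, spans=arte_spans)))  # 14
```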


Page 12

Annotation Schema

Other guidelines

• Unknown frames: use an UNKNOWN frame for words evoking situations not present in the FrameNet database

• Anaphora

• Copula and support verbs

• Modal expressions

• Metaphors

• Existential constructions

• …


Page 13

Corpus statistics

• Annotated pairs: 800 (400 positive, 400 negative)

• Annotated frames: 4,500

– avg. 5.6 frames per pair

– 1,600 frames in positive pairs

– 2,800 frames in negative pairs

• Annotated roles: 9,500

– avg. 2.1 roles per frame

• Annotation time: 230 hours

– 90 h for positive pairs (13 min/pair)

– 140 h for negative pairs (21 min/pair)
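The averages on this slide can be re-derived from the totals. A quick check, using plain arithmetic on the slide's (rounded) figures:

```python
# Totals as reported on the slide (rounded figures).
pairs, positive_pairs, negative_pairs = 800, 400, 400
frames, roles = 4_500, 9_500
hours_positive, hours_negative = 90, 140

frames_per_pair = frames / pairs                          # 5.625
roles_per_frame = roles / frames                          # about 2.11
total_hours = hours_positive + hours_negative             # 230
min_per_positive = hours_positive * 60 / positive_pairs   # 13.5
min_per_negative = hours_negative * 60 / negative_pairs   # 21.0

print(round(frames_per_pair, 1), round(roles_per_frame, 1))  # 5.6 2.1
print(total_hours, min_per_positive, min_per_negative)       # 230 13.5 21.0
```

The time per positive pair comes out at 13.5 min, which the slide gives as 13. The per-class frame counts (1,600 + 2,800 = 4,400) are close to, but not exactly, the 4,500 total, presumably because they are rounded.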


Page 14

FrameNet and RTE (simple case)

• Syntactic normalization

– Active / Passive

EDUCATIONAL_TEACHING(STUDENT: ground soldiers / soldiers; MATERIAL: virtual reality / virtual reality)

Page 15

Implementation gap insights

(1) Resource coverage is too low

(2) Models for predicate-argument inference are weak

(3) Automatic annotation models (SRL) are not good enough to be safely used in RTE

• FrameNet coverage is good:

– 373 UNKNOWN frames (8% of total frames)

– Unknown roles: 1% of total roles

• Coverage is unlikely to be a limiting factor for using FrameNet in applications
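The coverage figure is the share of UNKNOWN labels among all annotated frames. A minimal sketch, assuming the frame labels of the corpus have been collected into a flat list (the helper name and the synthetic list are hypothetical); with the slide's totals, 373 out of about 4,500 frames gives roughly 8%:

```python
from collections import Counter


def unknown_share(frame_labels):
    """Fraction of annotated frames labelled UNKNOWN (not covered by FrameNet)."""
    if not frame_labels:
        return 0.0
    return Counter(frame_labels)["UNKNOWN"] / len(frame_labels)


# Synthetic label list reproducing the slide's counts: 373 UNKNOWN out of 4,500.
labels = ["UNKNOWN"] * 373 + ["OTHER_FRAME"] * (4_500 - 373)
print(f"{unknown_share(labels):.1%}")  # 8.3%
```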

Page 16


Why should you use FATE?

• To better study predicate-argument inference in RTE

• To experiment with frame-based RTE models on a gold-standard corpus

• To learn better SRL models by training on FATE

The corpus is freely available online
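For readers who download the corpus, frame annotations can be pulled out of SALSA/TIGER XML with Python's standard library. The sketch below parses a small hand-written fragment whose element names (s, t, frame, target, fe, fenode) follow our understanding of SALSA/TIGER XML; treat them as assumptions and check them against the actual FATE files. In real files, target and FE nodes often point to syntactic nonterminals rather than words; this sketch resolves terminal ids only and prints "?" for anything else.

```python
import xml.etree.ElementTree as ET

# Hand-written fragment, modeled on SALSA/TIGER XML (element names assumed).
SAMPLE = """
<corpus>
  <body>
    <s id="s1">
      <graph root="s1_500">
        <terminals>
          <t id="s1_1" word="Evelyn" pos="NNP"/>
          <t id="s1_2" word="spoke" pos="VBD"/>
          <t id="s1_3" word="about" pos="IN"/>
          <t id="s1_4" word="her" pos="PRP$"/>
          <t id="s1_5" word="past" pos="NN"/>
        </terminals>
      </graph>
      <sem>
        <frames>
          <frame name="Statement" id="s1_f1">
            <target><fenode idref="s1_2"/></target>
            <fe name="Speaker" id="s1_f1_e1"><fenode idref="s1_1"/></fe>
            <fe name="Topic" id="s1_f1_e2"><fenode idref="s1_4"/><fenode idref="s1_5"/></fe>
          </frame>
        </frames>
      </sem>
    </s>
  </body>
</corpus>
"""


def read_frames(xml_text):
    """Return (sentence id, frame name, FEE text, {role: filler text}) tuples."""
    root = ET.fromstring(xml_text.strip())
    results = []
    for sentence in root.iter("s"):
        words = {t.get("id"): t.get("word") for t in sentence.iter("t")}

        def span_text(node):
            # Resolve each <fenode idref=...> to its word; non-terminal ids become "?".
            if node is None:
                return ""
            return " ".join(words.get(f.get("idref"), "?") for f in node.findall("fenode"))

        for frame in sentence.iter("frame"):
            roles = {fe.get("name"): span_text(fe) for fe in frame.findall("fe")}
            results.append((sentence.get("id"), frame.get("name"),
                            span_text(frame.find("target")), roles))
    return results


for row in read_frames(SAMPLE):
    print(row)  # ('s1', 'Statement', 'spoke', {'Speaker': 'Evelyn', 'Topic': 'her past'})
```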

Page 17

Thank you! Questions?

FATE download: http://www.coli.uni-saarland.de/projects/salsa/fate


www.coli.uni-saarland.de/~pennacchiotti