1 The PASCAL Recognizing Textual Entailment Challenges - RTE-1,2,3 Ido Dagan Bar-Ilan University, Israel with …

Upload: devin-croxford

Post on 15-Dec-2015


The PASCAL Recognizing Textual Entailment Challenges - RTE-1,2,3

Ido Dagan
Bar-Ilan University, Israel
with …


Recognizing Textual Entailment

PASCAL NoE Challenge, 2004-5

Ido Dagan, Oren Glickman (Bar-Ilan University, Israel)
Bernardo Magnini (ITC-irst, Trento, Italy)


The Second PASCAL Recognising Textual Entailment Challenge

Roy Bar-Haim, Ido Dagan, Bill Dolan, Lisa Ferro, Danilo Giampiccolo, Bernardo Magnini, Idan Szpektor

Bar-Ilan, CELCT, ITC-irst, Microsoft Research, MITRE


The Third Recognising Textual Entailment Challenge

Danilo Giampiccolo (CELCT) and Bernardo Magnini (FBK-ITC)

With Ido Dagan (Bar-Ilan) and Bill Dolan (Microsoft Research)

Patrick Pantel (USC-ISI), for Resources Pool

Hoa Dang and Ellen Voorhees (NIST), for Extended Task


RTE Motivation

• Text applications require semantic inference

• A common framework for addressing applied inference as a whole is needed, but still missing
  – Global inference is typically application dependent
  – Application-independent approaches and resources exist for some semantic sub-problems

• Textual entailment may provide such a common application-independent semantic framework


Framework Desiderata

A framework for modeling a target level of language processing should provide:

1) Generic module for applications
  – A common underlying task, unified interface (cf. parsing)

2) Unified paradigm for investigating sub-phenomena


Outline

• The textual entailment task – what and why?

• Evaluation dataset & methodology

• Participating systems and approaches

• Potential for machine learning

• Framework for investigating semantics


Natural Language and Meaning

[Diagram: Meaning and Language, related by Ambiguity and Variability]


Variability of Semantic Expression

Model variability as relations between text expressions:

• Equivalence: text1 ⇔ text2 (paraphrasing)
• Entailment: text1 ⇒ text2 – the general case

Dow ends up

Dow climbs 255

The Dow Jones Industrial Average closed up 255

Stock market hits a record high

Dow gains 255 points


Typical Application Inference

Question: Who bought Overture?  >>  Expected answer form: X bought Overture

Text: Overture’s acquisition by Yahoo
  entails
Hypothesized answer: Yahoo bought Overture

• Similar for IE: X buy Y

• “Semantic” IR: t: Overture was bought …

• Summarization (multi-document) – identify redundant info

• MT evaluation (and recent ideas for MT)

• Educational applications, …


KRAQ'05 Workshop - KNOWLEDGE and REASONING for ANSWERING QUESTIONS

(IJCAI-05)

CFP:
– Reasoning aspects:
    * information fusion,
    * search criteria expansion models
    * summarization and intensional answers,
    * reasoning under uncertainty or with incomplete knowledge,
– Knowledge representation and integration:
    * levels of knowledge involved (e.g. ontologies, domain knowledge),
    * knowledge extraction models and techniques to optimize response accuracy

… but similar needs for other applications – can entailment provide a common empirical task?


Classical Entailment Definition

• Chierchia & McConnell-Ginet (2001): A text t entails a hypothesis h if h is true in every circumstance (possible world) in which t is true

• Strict entailment - doesn't account for some uncertainty allowed in applications


“Almost certain” Entailments

t: The technological triumph known as GPS … was incubated in the mind of Ivan Getting.

h: Ivan Getting invented the GPS.


Applied Textual Entailment

• Directional relation between two text fragments: Text (t) and Hypothesis (h):

t entails h (t ⇒ h) if humans reading t will infer that h is most likely true

• Operational (applied) definition:
  – Human gold standard - as in NLP applications
  – Assuming common background knowledge – which is indeed expected from applications


Evaluation Dataset


Generic Dataset by Application Use

• 7 application settings in RTE-1, 4 in RTE-2/3
  – QA
  – IE
  – “Semantic” IR
  – Comparable documents / multi-doc summarization
  – MT evaluation
  – Reading comprehension
  – Paraphrase acquisition

• Most data created from actual applications' output
• ~800 examples in development and test sets
• 50-50% YES/NO split


Some Examples

Columns: TEXT, HYPOTHESIS, TASK, ENTAILMENT

1  TEXT: Regan attended a ceremony in Washington to commemorate the landings in Normandy.
   HYPOTHESIS: Washington is located in Normandy.
   TASK: IE    ENTAILMENT: False

2  TEXT: Google files for its long awaited IPO.
   HYPOTHESIS: Google goes public.
   TASK: IR    ENTAILMENT: True

3  TEXT: …: a shootout at the Guadalajara airport in May, 1993, that killed Cardinal Juan Jesus Posadas Ocampo and six others.
   HYPOTHESIS: Cardinal Juan Jesus Posadas Ocampo died in 1993.
   TASK: QA    ENTAILMENT: True

4  TEXT: The SPD got just 21.5% of the vote in the European Parliament elections, while the conservative opposition parties polled 44.5%.
   HYPOTHESIS: The SPD is defeated by the opposition parties.
   TASK: IE    ENTAILMENT: True


Final Dataset (RTE-2)

• Average pairwise inter-judge agreement: 89.2%
  – Average Kappa 0.78 – substantial agreement
  – Better than RTE-1

• Removed 18.2% of pairs due to disagreement (3-4 judges)

• Disagreement example:
  – (t) Women are under-represented at all political levels ...
    (h) Women are poorly represented in parliament.

• Additional review removed 25.5% of pairs
  – too difficult / vague / redundant
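
The agreement figures above can be related to each other with a small sketch. It assumes the reported Kappa is Cohen's kappa between pairs of judges, which the slide does not state; `cohen_kappa` and the two label lists are hypothetical illustrations, not the challenge's annotation data.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equally long, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: fraction of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: probability both annotators pick the same label
    # if each labels independently according to their own label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Made-up judgments for ten pairs (not real RTE annotations).
judge_1 = ["YES", "YES", "NO", "NO", "YES", "NO", "YES", "NO", "YES", "NO"]
judge_2 = ["YES", "YES", "NO", "NO", "YES", "NO", "YES", "NO", "NO", "NO"]
kappa = cohen_kappa(judge_1, judge_2)
print(round(kappa, 3))
```

If chance agreement is close to 0.5, as a roughly balanced YES/NO split would suggest, a raw agreement of 89.2% corresponds to about (0.892 - 0.5) / 0.5 ≈ 0.78, which is in line with the reported value.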


Final Dataset (RTE-3)

• Each pair judged by three annotators

• Pairs on which the annotators disagreed were filtered-out.

• Average pairwise annotator agreement: 87.8% (Kappa level of 0.75)

• Filtered-out pairs:
  – 19.2% due to disagreement
  – 9.4% as controversial, too difficult, or too similar to other pairs


Progress from 1 to 3

• More realistic application data:
  – RTE-1: some partly synthetic examples
  – RTE-2&3 mostly:
    • Input from common benchmarks for the different applications
    • Output from real systems
  – Test entailment potential across applications

• Text length:
  – RTE-1&2: one-two sentences
  – RTE-3: 25% full paragraphs, requires discourse modeling/anaphora

• Improve data collection and annotation
  – Revised and expanded guidelines
  – Most pairs triply annotated, some across organizers' sites

• Provide linguistic pre-processing, RTE Resources Pool

• RTE-3 pilot task by NIST: 3-way judgments; explanations


Suggested Perspective

RE the Arthur Bernstein competition:

“… Competition, even a piano competition, is legitimate … as long as it is just an anecdotal side effect of the musical culture scene, and doesn’t threat to overtake the center stage”

Haaretz (Israeli newspaper), Culture Section, April 1st, 2005


Participating Systems


Participation

• Popular challenges, world wide:
  – RTE-1 – 17 groups
  – RTE-2 – 23 groups
  – RTE-3 – 26 groups

• 14 Europe, 12 US

• 11 newcomers (~40 groups so far)

• 79 dev-set downloads (44 planned, 26 maybe)

• 42 test-set downloads

• Joint ACL-07/PASCAL workshop (~70 participants)


Methods and Approaches

• Estimate similarity match between t and h (coverage of h by t):
  – Lexical overlap (unigram, N-gram, subsequence)
  – Lexical substitution (WordNet, statistical)
  – Lexical-syntactic variations (“paraphrases”)
  – Syntactic matching/edit-distance/transformations
  – Semantic role labeling and matching
  – Global similarity parameters (e.g. negation, modality)
  – Anaphora resolution

• Probabilistic tree-transformations
• Cross-pair similarity
• Detect mismatch (for non-entailment)
• Logical interpretation and inference
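
The simplest item in the list above, unigram lexical overlap measured as coverage of h by t, can be sketched as follows. The tokenizer and the function name `coverage` are assumptions made for illustration; participating systems used their own tokenization, weighting and thresholds.

```python
import re


def tokens(text):
    """Lowercase and split into alphanumeric tokens (a crude, assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def coverage(text, hypothesis):
    """Fraction of hypothesis tokens that also occur in the text."""
    h = tokens(hypothesis)
    if not h:
        return 0.0
    t = set(tokens(text))
    return sum(tok in t for tok in h) / len(h)


# Two pairs from the "Some Examples" slide.
score_false_pair = coverage(
    "Regan attended a ceremony in Washington to commemorate the landings in Normandy.",
    "Washington is located in Normandy.",
)
score_true_pair = coverage(
    "Google files for its long awaited IPO.",
    "Google goes public.",
)
print(score_false_pair, score_true_pair)
```

On these two pairs the non-entailing hypothesis gets higher coverage than the entailing one, which illustrates why plain overlap needs to be complemented by the other information sources listed above.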


Dominant approach: Supervised Learning

• Features model various aspects of similarity and mismatch
• Classifier determines relative weights of information sources
• Train on development set and auxiliary t-h corpora

[Diagram: t, h → Similarity features (lexical, n-gram, syntactic, semantic, global) → Feature vector → Classifier → YES / NO]
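
The pipeline on this slide (pair → feature vector → classifier → YES/NO) can be sketched in a few lines. Everything specific here is an assumption made for illustration: the two features, the tiny negation list, the made-up training pairs, and the choice of logistic regression, which stands in for whatever classifier a given system used.

```python
import math
import re

NEGATIONS = {"not", "no", "never"}  # assumed, deliberately tiny negation list


def tokens(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def features(text, hypothesis):
    """Map a (t, h) pair to [bias, unigram coverage of h by t, negation mismatch]."""
    t, h = tokens(text), tokens(hypothesis)
    t_set = set(t)
    coverage = sum(tok in t_set for tok in h) / len(h) if h else 0.0
    t_neg = any(tok in NEGATIONS for tok in t)
    h_neg = any(tok in NEGATIONS for tok in h)
    return [1.0, coverage, 1.0 if t_neg != h_neg else 0.0]


def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train(pairs, epochs=5000, lr=0.5):
    """Batch gradient descent on the mean logistic loss."""
    data = [(features(t, h), y) for t, h, y in pairs]
    w = [0.0] * len(data[0][0])
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in data:
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x))) - y
            for i, xi in enumerate(x):
                grad[i] += err * xi
        w = [wi - lr * g / len(data) for wi, g in zip(w, grad)]
    return w


def predict(w, text, hypothesis):
    score = sum(wi * xi for wi, xi in zip(w, features(text, hypothesis)))
    return "YES" if score > 0 else "NO"


# Made-up training pairs (1 = entailment, 0 = no entailment), not RTE data.
TOY_PAIRS = [
    ("Yahoo bought Overture last year", "Yahoo bought Overture", 1),
    ("The Dow gains 255 points", "The Dow gains points", 1),
    ("It rained when Mary left", "Mary left", 1),
    ("Yahoo did not buy Overture", "Yahoo did buy Overture", 0),
    ("Mary left the party early", "John rained on the parade", 0),
    ("Google files for its IPO", "Google never files", 0),
]

weights = train(TOY_PAIRS)
print(predict(weights, "Dow climbs 255", "Dow climbs"))
```

The learned weights play the role the slide describes: they set the relative importance of each information source, here a positive weight on coverage and a negative one on negation mismatch.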


Parse-based Proof Systems

[Figure: dependency parse trees for “It rained when John and Mary left”, “It rained when Mary left” and “Mary left”, shown with tree-transformation rules over a conjunction (N1 conj N2, N2) and over a “when” clause (V1 when V2, V2). Node labels include ROOT, subj, conj, verb, noun, adj, expletive and other.]

(Bar-Haim et al., RTE-3)


Resources

• WordNet, Extended WordNet, distributional similarity
  – Britain → UK
  – steal → take

• DIRT (paraphrase rules)
  – X file a lawsuit against Y → X accuse Y (world knowledge)
  – X confirm Y → X approve Y (linguistic knowledge)

• FrameNet, PropBank, VerbNet
  – For semantic role labeling

• Entailment pairs corpora
  – Automatically acquired training

• No dedicated resources for entailment yet


Accuracy Results – RTE-1

[Bar chart: accuracy per RTE-1 submission, y-axis from 0.4 to 0.6, with 0.01 and 0.05 significance levels marked. Systems shown: MITRE, Bar Ilan, UNED, Dublin, Edinburgh-Dublin, Stanford, UIUC, IRST, IRST, UNED, Edinburgh-Amsterdam, Stanford, LCC, Amsterdam.]


Results (RTE-2)

First Author (Group)        Accuracy        Average Precision
Hickl (LCC)                 75.4%           80.8%
Tatu (LCC)                  73.8%           71.3%
Zanzotto (Milan & Rome)     63.9%           64.4%
Adams (Dallas)              62.6%           62.8%
Bos (Rome & Leeds)          61.6%           66.9%
11 groups                   58.1%-60.5%
7 groups                    52.9%-55.6%

Average: 60%    Median: 59%
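
The two measures in the table can be computed as sketched below. The slide does not define average precision; this sketch assumes the usual ranked-list definition, in which pairs are sorted by the system's confidence that entailment holds and precision is averaged over the ranks of the gold-positive pairs. The function names and the six example judgments are made up.

```python
def accuracy(gold, predicted):
    """Fraction of pairs whose predicted label equals the gold label."""
    if len(gold) != len(predicted) or not gold:
        raise ValueError("need two equally long, non-empty label lists")
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


def average_precision(gold_ranked):
    """Average precision over a ranked list.

    gold_ranked holds gold labels (True = entailment), ordered from the pair
    the system is most confident is entailing to the least confident one.
    """
    positives_seen = 0
    total = 0.0
    for rank, is_positive in enumerate(gold_ranked, start=1):
        if is_positive:
            positives_seen += 1
            total += positives_seen / rank  # precision at this rank
    return total / positives_seen if positives_seen else 0.0


# Gold labels in the system's confidence order (illustrative values).
gold_ranked = [True, True, False, True, False, False]
# The system answers YES for its three most confident pairs.
predicted = [True, True, True, False, False, False]

acc = accuracy(gold_ranked, predicted)
ap = average_precision(gold_ranked)
print(round(acc, 3), round(ap, 3))
```

Accuracy only looks at the YES/NO decisions, while average precision rewards a system for ranking truly entailing pairs above non-entailing ones, which is why the two columns can order systems differently.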


Results: RTE-3

System                      Accuracy
1. Hickl - LCC              0.80
2. Tatu - LCC               0.72
3. Iftene - Uni. Iasi       0.69
4. Adams - Uni. Dallas      0.67
5. Wang - DFKI              0.66
Baseline (all YES)          0.51

Two systems above 70%

Most systems (65%) in the range 60-70%; they were just 30% at RTE-2


Current Limitations

• Simple methods perform quite well, but not best

• System reports point at:
  – Lack of knowledge (syntactic transformation rules, paraphrases, lexical relations, etc.)
  – Lack of training data

• It seems that systems that coped better with these issues performed best:
  – Hickl et al. – acquisition of large entailment corpora for training
  – Tatu et al. – large knowledge bases (linguistic and world knowledge)


Impact

• High interest in the research community
  – Papers, conference sessions and areas, PhD theses, funded projects
  – Special issue - Journal of Natural Language Engineering
  – ACL-07 tutorial

• Initial contribution to specific applications
  – QA – Harabagiu & Hickl, ACL-06; CLEF-06/07
  – RE – Romano et al., EACL-06

• RTE-4 – by NIST, with CELCT
  – Within TAC, a new semantic evaluation conference (with QA and summarization, subsuming DUC)


New Potentials for Machine Learning


Classical Approach = Interpretation

[Diagram: Stipulated Meaning Representation (by scholar) and Language (by nature), related by Variability]

Logical forms, word senses, semantic roles, named entity types, … - scattered tasks

Feasible/suitable framework for applied semantics?


Textual Entailment = Text Mapping

[Diagram: Assumed Meaning (by humans) and Language (by nature), related by Variability]


General Case – Inference

[Diagram: Meaning Representation and Language levels, with arrows labeled Interpretation, Inference and Textual Entailment]

Entailment mapping is the actual applied goal - and also a touchstone for understanding!

Interpretation becomes a possible means


Machine Learning Perspectives

• Issues with interpretation approach:
  – Hard to agree on target representations
  – Costly to annotate semantic representations for training
  – Has it been a barrier?

• Language-level entailment mapping refers to texts
  – Texts are semantic-theory neutral
  – Amenable to unsupervised/semi-supervised learning

• It would be interesting to explore (many do)
  – language-based representations of meaning, inference knowledge, and ontology,
  – for which learning and inference methods may be easier to develop.
  – Artificial intelligence through natural language?


Major Learning Directions

• Learning entailment knowledge (!!!)
  – Learning entailment relations between words/expressions
  – Integrating with manual resources and knowledge

• Inference methods
  – Principled frameworks for probabilistic inference
    • Estimate likelihood of deriving hypothesis from text
    • Fusing information levels
  – More than bags of features

• Relational learning relevant for both
• How can we increase ML researchers' involvement?


Learning Entailment Knowledge

• Entailing “topical” terms from words/texts
  – E.g. medicine, law, cars, computer security, …
  – An unsupervised version of text categorization

• Learning entailment graph for terms/expressions
  – Partial knowledge: statistical, lexical resources, Wikipedia, …
  – Estimate link likelihood in context

[Diagram: entailment graph over the terms acquire/v, own/v, acquisition/n, buy/v and purchase/n, with links labeled derived, WN-syn, Dist. sim and entails, and several undecided links marked “entails?” / “?”]
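
One way to read the entailment-graph idea is as a directed graph in which following links answers "does this term entail that one?". The sketch below assumes links compose transitively and ignores context and link likelihood, both of which the slide treats as open issues; the edge set is hypothetical and is not taken from the diagram.

```python
from collections import deque

# Hypothetical directed links "x entails y" between terms (illustrative only).
EDGES = {
    "buy/v": ["acquire/v"],
    "purchase/n": ["acquisition/n"],
    "acquisition/n": ["acquire/v"],
    "acquire/v": ["own/v"],
}


def entails(graph, source, target):
    """Breadth-first search: is target reachable from source along entailment links?"""
    seen = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


print(entails(EDGES, "purchase/n", "own/v"))  # follows three links
print(entails(EDGES, "own/v", "buy/v"))       # no path in this direction
```

Because the links are directed, the graph keeps the asymmetry of entailment: reaching own/v from buy/v does not imply the reverse.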


Meeting the knowledge challenge – by a coordinated effort?

• A vast amount of “entailment rules” needed

• Speculation: can we have a joint community effort for knowledge acquisition?
  – Uniform representations
  – Mostly automatic acquisition (millions of rules)
  – Human Genome Project analogy

• Preliminary: RTE-3 Resources Pool at ACLWiki (set up by Patrick Pantel)


Textual Entailment ≈ Human Reading Comprehension

• From a children’s English learning book (Sela and Greenberg):

Reference Text: “…The Bermuda Triangle lies in the Atlantic Ocean, off the coast of Florida. …”

Hypothesis (True/False?): The Bermuda Triangle is near the United States

???


Where are we (from RTE-1)?


Cautious Optimism

1) Textual entailment provides a unified framework for applied semantics

– Towards generic inference “engines” for applications

2) Potential for:
  – Scalable knowledge acquisition, boosted by (mostly unsupervised) learning
  – Learning-based inference methods

Thank you!


Summary: Textual Entailment as Goal

• The essence of our proposal:
  – Base applied inference on entailment “engines” and KBs
  – Formulate various semantic problems as entailment tasks

• Interpretations and “mapping” methods may compete/complement

• Open question: which inferences
  – can be represented at language level?
  – require logical or specialized representation and inference? (temporal, spatial, mathematical, …)


Collecting QA Pairs

• Motivation: a passage containing the answer slot filler should entail the corresponding answer statement.
  – E.g. for: Who invented the telephone?, and answer Bell, text should entail Bell invented the telephone

• QA systems were given TREC and CLEF questions.

• Hypothesis generated by “plugging” the system answer term into the affirmative form of the question

• Texts correspond to the candidate answer passages
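
The "plugging" step can be illustrated for the simplest question shape. This sketch only handles questions of the form "Who <predicate>?"; the function name and this restriction are assumptions made for illustration, and the slides do not say how the affirmative forms were actually produced.

```python
def who_question_to_hypothesis(question, answer):
    """Turn 'Who <predicate>?' plus an answer term into '<answer> <predicate>'."""
    q = question.strip()
    if not q.lower().startswith("who ") or not q.endswith("?"):
        raise ValueError("this sketch only handles 'Who ...?' questions")
    predicate = q[len("who "):-1].strip()
    return f"{answer} {predicate}"


# The candidate answer passage returned by the QA system would serve as the text t.
h1 = who_question_to_hypothesis("Who invented the telephone?", "Bell")
h2 = who_question_to_hypothesis("Who bought Overture?", "Yahoo")
print(h1)
print(h2)
```

A wrong system answer yields a hypothesis that the passage typically does not entail, which is one way such a procedure can supply negative pairs as well as positive ones.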


Collecting IE Pairs

• Motivation: a sentence containing a target relation instance should entail an instantiated template of the relation
  – E.g.: X is located in Y

• Pairs were generated in several ways
  – Outputs of IE systems:
    • for ACE-2004 and MUC-4 relations
  – Manually:
    • for ACE-2004 and MUC-4 relations
    • for additional relations in news domain


Collecting IR Pairs

• Motivation: relevant documents should entail a given “propositional” query.

• Hypotheses are propositional IR queries, adapted and simplified from TREC and CLEF
  – drug legalization benefits → drug legalization has benefits

• Texts selected from documents retrieved by different search engines


Collecting SUM (MDS) Pairs

• Motivation: identifying redundant statements (particularly in multi-document summaries)
• Using web document clusters and system summary
• Picking as hypotheses sentences that have high lexical overlap with the summary
• In final pairs:
  – Texts are original sentences (usually from summary)
  – Hypotheses:
    • Positive pairs: simplify h until entailed by t
    • Negative pairs: simplify h similarly
• In RTE-3: using Pyramid benchmark data