RTE-7@TAC2011: The Seventh Recognizing Textual Entailment Challenge
Luisa Bentivogli (coordinator, CELCT & FBK-irst), Danilo Giampiccolo (coordinator, CELCT), Hoa Trang Dang (NIST), Ido Dagan (Bar Ilan University), Peter Clark (Vulcan Inc.)

Upload: kiril

Post on 25-Feb-2016


Page 1

RTE-7@TAC2011
The Seventh Recognizing Textual Entailment Challenge

Luisa Bentivogli (coordinator, CELCT & FBK-irst)
Danilo Giampiccolo (coordinator, CELCT)
Hoa Trang Dang (NIST)
Ido Dagan (Bar Ilan University)
Peter Clark (Vulcan Inc.)

Page 2

Outline
• The RTE Challenge
• RTE-7 Main Task: RTE within a Corpus
  – RTE-7 Novelty Detection Subtask
  – Knowledge Resources and Tools for RTE
• RTE-7 KBP Validation Task
• Conclusion and Future Perspectives

NIST - November 14, 2011 RTE-7@TAC2011


Page 4

Textual Entailment

Textual entailment is a directional relation between two text fragments:
• the entailing text, called T(ext)
• the entailed text, called H(ypothesis)

T entails H if, typically, a human reading T would infer that H is most likely true.

Page 5

Examples

• YES
T: The Christian Science Monitor named a US journalist kidnapped in Iraq as freelancer Jill Carroll.
H: Jill Carroll was abducted in Iraq.

• NO
T: The Christian Science Monitor named a US journalist kidnapped in Iraq as freelancer Jill Carroll.
H: Jill Carroll is the daughter of Mary Beth Carroll.

Page 6

The RTE-7 Challenge

Replicates the same tasks as in RTE-6 to allow participants to address the novelties introduced for the first time in RTE-6:
– Main Task: Textual Entailment within a Corpus (piloted in RTE-5, Summarization setting)
– Novelty Detection Subtask (based on the Main Task)
– KBP Validation Task (Knowledge Base Population setting)
– Exploratory effort on resource evaluation extended to tools

Page 7

RTE-7 Participants

• Number of participants: 13
  – RTE-1: 18, RTE-2: 23, RTE-3: 26, RTE-4: 26, RTE-5: 21, RTE-6: 18
• Provenance
  – ASIA: 8
  – EUROPE: 5
• Participants per task
  – Main Task: 13 (33 runs)
  – Novelty Detection Subtask: 5 (13 runs)
  – KBP Validation Pilot Task: 2 (8 runs)


Page 9

RTE-7 Main Task Description

• Given
  – a corpus
  – a hypothesis H
  – a set of "candidate" entailing sentences for that H retrieved by Lucene from the corpus
• RTE systems are required
  – to identify all the sentences among the candidate sentences that entail a given Hypothesis
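The task interface described on this slide can be pictured with a small sketch: one H and its candidate sentences go in, and the ids of the sentences judged to entail H come out. The function name `select_entailing`, the word-overlap rule, and the threshold values are assumptions made for illustration only; they do not describe any participant's system.

```python
import re

def tokens(text):
    """Lowercase word tokens; a deliberately crude stand-in for real preprocessing."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def select_entailing(hypothesis, candidates, threshold=0.8):
    """Return the ids of candidate sentences judged to entail the hypothesis.

    `candidates` maps a sentence id to its text. A candidate is accepted when
    at least `threshold` of the hypothesis tokens also occur in it
    (hypothetical decision rule, chosen only to keep the sketch short).
    """
    h_tokens = tokens(hypothesis)
    if not h_tokens:
        return []
    selected = []
    for sent_id, text in candidates.items():
        coverage = len(h_tokens & tokens(text)) / len(h_tokens)
        if coverage >= threshold:
            selected.append(sent_id)
    return selected

# Two candidate sentences (the second one abridged); the ids are made up.
candidates = {
    "doc1_S2": "She was 85.",
    "doc2_S1": "Betty Friedan, the visionary, combative feminist who launched a "
               "social revolution with her provocative 1963 book, "
               '"The Feminine Mystique," died Saturday.',
}
hypothesis = '"The Feminine Mystique" was published in 1963.'
print(select_entailing(hypothesis, candidates, threshold=0.5))
```

A real system would replace the overlap rule with an actual entailment decision; the sketch only fixes the shape of the input and output.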

Page 10

RTE-7 Main Task Example

Topic 918: Betty Friedan

Hs SET
H380: Betty Friedan is the author of "The Feminine Mystique."
H391: "The Feminine Mystique" was published in 1963.
H401: In 1962, Judy Mott was laid off from her job with Sears.

Document 1
S1: Betty Friedan, a founder of the modern feminist movement in the United States, died here Saturday of congestive heart failure, feminist leaders announced.
S2: She was 85.
S3: Friedan achieved prominence in 1963 with the publication of her book "The Feminine Mystique," which detailed the lives of American women who were expected to find fulfillment through the achievements of their husbands and children.
S4: The book sparked a movement for a re-evaluation of women's role in American society and is credited with laying the foundation of modern feminism.
S5: She was a founder of the National Organization for Women and a leading advocate of the Equal Rights Amendment, a proposed amendment to the US constitution banning sex-based discrimination, women's rights activists said.
S6: "The movement that Friedan's energy sparked continues to grow, and is bigger today than she could ever have dreamed …

Document 2
S1: Betty Friedan, the visionary, combative feminist who launched a social revolution with her provocative 1963 book, "The Feminine Mystique," died Saturday, which was her 85th birthday.
S2: Friedan died of congestive heart failure at her home in Washington, D.C., according to Emily Bazelon, a cousin who was speaking for the family.
S3: She said Friedan had been in failing health for some time.
S4: Her best-selling book identified "the problem that has no name," the unhappiness of post-World War II American women unfulfilled by traditional notions of female domesticity.
S5: Melding sociology and humanistic psychology, the book became the cornerstone of one of the last century's most profound movements, unleashing the first full flowering of American feminism since the 1800s.
S6: It gave Friedan, an obscure suburban New York housewife and freelance writer, the mantle to... …

Document 3
S26: What is perhaps most surprising, though, is not that feminists like Hirshman believe homemaking is second-class drudgery, but that so many people still get worked up over the issue.
S27: After all, feminist thinkers have been proclaiming the need to free women from the bondage of housework for a long time.
S28: It is, as Hirshman freely acknowledges, precisely what Friedan argued in "The Feminine Mystique," first published more than 40 years ago.
S29: "The only kind of work which permits an able woman to realize her abilities fully," Friedan wrote, "is the kind that was forbidden by the feminine mystique, the lifelong commitment to an art or science, to politics or profession."
S30: Not homemaking, not motherhood.
S31: In an interview, Hirshman said that in the course of researching a book, she began to wonder when feminism switched from offering a clear blueprint for liberation to choosing from Column A and Column B. …


Page 12

RTE-7 Main Task Example

Topic 918: Betty Friedan

H380: Betty Friedan is the author of "The Feminine Mystique"

Document 1, S3: Friedan achieved prominence in 1963 with the publication of her book "The Feminine Mystique," which detailed the lives of American women ...
Document 2, S1: Betty Friedan, the visionary, combative feminist who launched a social revolution with her provocative 1963 book, "The Feminine Mystique," died …
Document 3, S28: It is, as Hirshman freely acknowledges, precisely what Friedan argued in her book "The Feminine Mystique," first published...

Page 13

RTE-7 Main Data Set (1/2)

TAC 2008 and 2009 SUM Update scenario. For each topic:

[Figure: Time axis; Cluster A, Cluster B; Initial Summary, Update Summary]

Page 14

RTE-7 Main Data Set (2/2)

Topic 918: Betty Friedan

Cluster A
Document 1 Document 2 Document 3

Hs SET
20-40 standalone sentences:
- based on the “B” summary sentences of the 3 best scoring SUM systems
- based directly on Cluster “A” sentences

Automatic summary sentence:
In The Guardian, Germaine Greer took critical measure of a fellow feminist, Betty Friedan, the author of “The Feminine Mystique” who died on Feb. 4 at 85.

H380: Betty Friedan is the author of "The Feminine Mystique."
H381: Betty Friedan died on February 4, 2006.
H382: Betty Friedan died at 85.
H397: In The Guardian, Germaine Greer took critical measure of Betty Friedan.

Page 15

RTE-7 Main Data Set (2/2)

Topic 918: Betty Friedan

H380: Betty Friedan is the author of "The Feminine Mystique"

Up to 100 “candidate” entailing sentences
- Information Retrieval filtering phase:
  - The H is the query
  - The corpus sentences are “the documents” to be retrieved for the query
  - The 100 top-ranked sentences are selected as candidates (covering 80% of all the entailing sentences in the corpus)
- LUCENE text search engine (v. 2.9.1):
  - StandardAnalyzer, Boolean “OR” query, default Lucene ranking
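The filtering phase can be sketched as follows: H is treated as an OR query, every corpus sentence is a "document", and the k best-scoring sentences become the candidates. This is a pure-Python stand-in, not Lucene: the tokenizer and the TF-IDF-style score are assumptions and do not reproduce StandardAnalyzer or Lucene's default ranking formula, and the name `retrieve_candidates` is hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase alphanumeric tokens (a simplification of a real analyzer)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def retrieve_candidates(hypothesis, sentences, k=100):
    """Rank corpus sentences against H used as an OR query; keep the top k.

    Returns the indices of the selected sentences, best match first.
    """
    docs = [Counter(tokenize(s)) for s in sentences]
    n = len(docs)
    # Document frequency: in how many sentences each term occurs.
    df = Counter(term for d in docs for term in d)
    query = set(tokenize(hypothesis))
    scored = []
    for idx, d in enumerate(docs):
        # Illustrative TF-IDF-style score (assumed, not Lucene's formula).
        score = sum(d[t] * math.log(1 + n / df[t]) for t in query if t in d)
        if score > 0:  # OR semantics: at least one query term must match
            scored.append((score, idx))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [idx for _, idx in scored[:k]]

sentences = [
    "She was 85.",
    'Friedan achieved prominence in 1963 with the publication of her book "The Feminine Mystique".',
    "Not homemaking, not motherhood.",
]
h380 = 'Betty Friedan is the author of "The Feminine Mystique."'
print(retrieve_candidates(h380, sentences, k=100))
```

Only the second sentence shares any term with the query, so it is the only candidate returned; with a real corpus the cutoff `k` is what bounds the candidate set.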

Page 16

Data Set Composition

• 3 annotations for the whole data set
• IAA (Kappa): 98.35% (Dev), 98.51% (Test)

                            DEVELOPMENT SET   TEST SET
Topics                      10                10
Hypotheses                  284               269
  Entailment: yes | no      174 | 110         186 | 83
  Summaries: yes | no       193 | 91          192 | 77
Annotations                 21,420            22,426
"entailment" judgments      1,136             1,308

Page 17

Main Task Evaluation

13 participants (33 runs)
• Evaluation measures:
  – Precision, Recall, F-measure (micro-averaged)
• IR Baselines:

              Precision   Recall   F1
Lucene_5      37.00       37.84    37.41
Lucene_10     27.07       55.20    36.33
Lucene_15     21.15       64.65    31.85
Lucene_20     17.71       71.64    28.40
Lucene_100    5.83        100      11.02
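Micro-averaging pools true positives, false positives and false negatives over all hypotheses before computing the scores. A minimal sketch, assuming that gold annotations and system output are both given as a mapping from hypothesis id to a set of sentence ids (this data layout, the ids, and the function name `micro_prf` are illustrative assumptions, not the official scorer):

```python
def micro_prf(gold, predicted):
    """Micro-averaged precision, recall and F1 over all hypotheses.

    `gold` and `predicted` map a hypothesis id to the set of sentence ids
    annotated (or returned) as entailing that hypothesis.
    """
    tp = fp = fn = 0
    for h_id in set(gold) | set(predicted):
        g = gold.get(h_id, set())
        p = predicted.get(h_id, set())
        tp += len(g & p)   # sentences correctly returned
        fp += len(p - g)   # sentences returned but not entailing
        fn += len(g - p)   # entailing sentences that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Toy data: the ids are made up for the example.
gold = {"H_a": {"d1_S3", "d2_S1", "d3_S28"}, "H_b": set()}
pred = {"H_a": {"d1_S3", "d2_S4"}, "H_b": {"d1_S2"}}
p, r, f = micro_prf(gold, pred)
print(f"P={p:.2f} R={r:.2f} F1={f:.2f}")
```

In the toy data there is 1 true positive, 2 false positives and 2 false negatives, so precision, recall and F1 all come out at one third.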

Page 18

Best Results

Team                 Precision   Recall   F-measure
IKOMA1               46.96       49.08    48.00
u_tokyo3             46.84       43.58    45.15
BUPTTeam1            45.02       44.95    44.99
CELI1                41.88       46.56    44.10
DFKI2                50.77       37.92    43.41
BIU2                 41.81       44.11    42.93
FBK_irst3            46.59       38.07    41.90
Baseline_Lucene5     30.78       39.58    34.63
te_iitb1             20.67       60.24    30.78
JU_CSE_TAC2          26.66       35.55    30.47
ICL1                 47.88       21.56    29.73
UAIC20112            30.21       25.84    27.85
SJTU_CIT3            17.92       33.33    23.31
SINAI3               47.3        8.72     14.72
Baseline_LuceneAll   4.73        100.00   9.03

Page 19

Results: F-measure statistics

F-measure           Best runs
Highest             48.00
Median              41.90
Baseline_Lucene5    37.41
Average             35.95
Lowest              14.72
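The highest, median, average and lowest figures can be recomputed from the thirteen per-team best F-measures listed on the Best Results slide (baselines excluded). The short check below is only a sanity calculation; the variable names are illustrative.

```python
from statistics import mean, median

# Best-run F-measure of each of the 13 teams, copied from the Best Results slide.
best_f = [48.00, 45.15, 44.99, 44.10, 43.41, 42.93, 41.90,
          30.78, 30.47, 29.73, 27.85, 23.31, 14.72]

summary = {
    "highest": max(best_f),
    "median": median(best_f),            # 7th of 13 sorted values
    "average": round(mean(best_f), 2),
    "lowest": min(best_f),
}
print(summary)
# → {'highest': 48.0, 'median': 41.9, 'average': 35.95, 'lowest': 14.72}
```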

Page 20

Results: F-measure statistics

F-measure (best runs)   RTE-7    RTE-6
Highest                 48.00    48.01
Median                  41.90    36.14
Baseline_Lucene5        37.41    34.63
Average                 35.95    33.70
Lowest                  14.72    11.60


Page 22

RTE-7 Novelty Detection Subtask

Goals:
• Specifically address the needs of the SUM Update Task, where it is necessary to distinguish between novel and non-novel information
• RTE engines could help summarization systems to filter out non-novel sentences from their summaries

Page 23

RTE-7 Novelty Detection Subtask

Task: Judge if the information contained in each H (from Cluster B) is novel with respect to the information contained in the set of (Cluster A) candidate entailing sentences
– If a given H:
  • has entailing sentences = information is NOT novel
  • has no entailing sentences = information is novel
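The decision rule on this slide is simple enough to write down directly. A minimal sketch, assuming the system output is a mapping from hypothesis id to the list of sentences it returned as entailing (the layout, the ids and the function name are hypothetical):

```python
def novelty_decisions(entailing_by_h):
    """Map each hypothesis id to True (novel) or False (not novel).

    An H is novel exactly when the system returned no entailing sentence for it.
    """
    return {h_id: len(sentences) == 0
            for h_id, sentences in entailing_by_h.items()}

# Made-up system output for two hypotheses.
system_output = {"H_a": ["d1_S3", "d2_S1"], "H_b": []}
decisions = novelty_decisions(system_output)
print(decisions)
# → {'H_a': False, 'H_b': True}
```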

Page 24

RTE-7 Novelty Detection Subtask

Based on the Main Task:
• Uses only the Hs taken from the automatic summaries
• Same output format/annotation
  – the novelty detection decision is derived automatically from the number of entailing sentences for each H

Differences:
• Systems are specifically tuned for novelty detection
• Specific scoring metrics designed for assessing novelty detection

Page 25

Data Set Composition

• IAA (Kappa): 98.21% (Dev), 98.06% (Test)

                          DEVELOPMENT SET   TEST SET
Topics                    10                10
Hypotheses                254               302
  Novel                   159 (63%)         195 (65%)
"entailing" judgments     576               779

Page 26

Evaluation Measures

5 participants (13 runs)
• Primary score: Novelty Detection evaluation
  – Micro-averaged Precision, Recall and F-measure computed on the binary novel/non-novel decision
  – derived automatically from the number of entailing sentences provided by the systems
• Secondary score: Justification evaluation
  – measures the quality of the justifications provided for non-novel Hs
  – Micro-averaged Precision, Recall and F-measure on the set of all the sentences extracted as entailing the Hs

Page 27

Best Results – Primary Score

Novelty Detection Evaluation

Run                 Precision   Recall   F1
IKOMA2              86.92       95.38    90.95
CELI1               88.83       85.64    87.21
JU_CSE_TAC1         80.18       93.33    86.26
BIU1                90.74       75.38    82.35
DFKI3               91.72       73.85    81.82
Baseline_all_new    64.57       100      78.47
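The Baseline_all_new row can be reproduced from the test set figures on the Data Set Composition slide (302 Hs, 195 of them novel), assuming, as its name and its 100 recall suggest, that this baseline labels every H as novel. That reading of the baseline is an assumption; the helper name `prf_percent` is illustrative.

```python
def prf_percent(tp, fp, fn):
    """Precision, recall and F1 as percentages rounded to two decimals."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return tuple(round(100 * value, 2) for value in (precision, recall, f1))

total_hs, novel_hs = 302, 195  # test set: all Hs, and Hs annotated as novel

# Labelling every H as novel: every novel H is found (no false negatives),
# and every non-novel H becomes a false positive.
baseline = prf_percent(tp=novel_hs, fp=total_hs - novel_hs, fn=0)
print(baseline)
# → (64.57, 100.0, 78.47)
```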

Page 28

Best Results – Secondary Score

Justification Evaluation

Run            Precision   Recall   F-measure
BIU3           36.34       40.31    38.22
DFKI2          38.36       33.63    35.84
IKOMA1         51.84       27.09    35.58
CELI1          37.92       33.25    35.43
JU_CSE_TAC2    21.94       33.63    26.56

Page 29

Results: F-measure statistics

F-measure   Novelty Detection   Justification (non novel Hs)
            Best runs           Best runs
Highest     90.95               38.22
Median      86.26               35.58
Average     85.72               34.32
Lowest      81.82               26.56

Page 30

Results: F-measure statistics

F-measure     Novelty Detection    Justification (non novel Hs)
(best runs)   RTE-7    RTE-6       RTE-7    RTE-6
Highest       90.95    82.91       38.22    48.26
Median        86.26    78.70       35.58    35.59
Average       85.72    72.41       34.32    32.38
Lowest        81.82    43.98       26.56    3.79


Page 32

Knowledge Resources and Tools for RTE

An exploratory effort aimed at studying the relevance of knowledge resources and tools in recognizing TE
• Ablation Tests for all knowledge resources and tools used in Main Task runs:
  – remove one module at a time from a system, and re-run the system on the test set with the other modules, except the one tested
  ! Remove only knowledge resources or tools
  ! Remove one resource or tool at a time
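The ablation procedure can be sketched as a loop that switches off one module at a time and compares the score with the full system. In this sketch `run_system` is a hypothetical callable that returns an F-measure for a given set of active modules, the toy gains are made-up numbers, and reporting the impact as F(full) minus F(ablated), so that a positive value means the module helped, is an assumed sign convention.

```python
def ablation_impacts(run_system, modules):
    """Return {module: F(full system) - F(system without that module)}."""
    full_score = run_system(set(modules))
    impacts = {}
    for module in modules:
        ablated = set(modules) - {module}  # remove exactly one resource or tool
        impacts[module] = full_score - run_system(ablated)
    return impacts

# Toy stand-in for a real RTE system: each module adds a fixed, made-up amount.
TOY_GAIN = {"WordNet": 2.0, "Wikipedia": 1.0, "AcronymList": -0.5}

def toy_system(active_modules):
    return 40.0 + sum(TOY_GAIN[m] for m in active_modules)

impacts = ablation_impacts(toy_system, list(TOY_GAIN))
print(impacts)
# → {'WordNet': 2.0, 'Wikipedia': 1.0, 'AcronymList': -0.5}
```

In a real system the modules interact, so the impacts are not additive as they are in this toy.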

Page 33

Ablation Tests

• 31 ablation tests submitted (by 10 teams)
  – 7 tests did not specifically address knowledge resources or tools
  – 3 tests had a combination of different resources/components removed
• 21 ablation tests conformant to the requirements
  – 16 tests for 7 different resources
  – 5 tests for 2 different tools

Page 34

Ablation Tests - Resources

Ablated Resource    # Ablation Tests   Impact on Systems
                                       Positive      Negative
WordNet             8                  5 (+9.81%)    3 (-0.14%)
Wikipedia           3                  2 (+8.89%)    1 (-2.64%)
VerbOcean           1                  1 (+5.93%)    -
DIRECT              1                  1 (+0.94%)    -
Paraphrase table    1                  -             1 (-1.43%)
CatVar              1                  1 (+0.84%)    -
Acronym Lists       1                  -             1 (-0.16%)


Page 36

Ablation Tests - Tools

Ablated Tool                # Ablation Tests   Impact on Systems
                                               Positive      Negative
Named Entities Recognizer   4                  2 (+7.97%)    2 (-8.29%)
Coreference Resolver        1                  1 (+0.69%)    -


Page 38

Remarks on the initiative

• WRT RTE-5 and RTE-6:
  – Resources: trends confirmed over the years
  – Tools: RTE-6 trends not confirmed
• Lesson learned
  – Ablation test results may provide an indication of the actual contribution of a component to the performance of a specific system
  – BUT the value of a resource is very much dependent on how that resource is used and how it integrates with the rest of the system
  – Need for a deeper comprehension of the usage of the resources and tools


Page 40

The RTE-7 KBP Validation Task

Motivations:
• analyze the potential utility of RTE systems in another real NLP application scenario, i.e. the Knowledge Base Population Slot Filling task
• use Textual Entailment techniques to validate the output of an NLP system (similar to the AVE experiment in QA)

Page 41: RTE-7@TAC2010 The Seventh Recognizing Textual Entailment Challenge

Given an entity in a knowledge base and an attribute (slot) for that entity:

• find in a large corpus the correct value (filler) for that attribute

• return the extracted information together with a corpus document supporting it as a correct slot filler

The KBP Slot Filling Task


Page 42: RTE-7@TAC2010 The Seventh Recognizing Textual Entailment Challenge

The RTE-7 KBP Validation Task

• Initial assumption: an extracted slot filler is correct if and only if the supporting document entails a hypothesis summarizing the slot filler
• Task: determine whether a candidate slot filler is supported in the associated document using entailment techniques


Page 43: RTE-7@TAC2010 The Seventh Recognizing Textual Entailment Challenge

Data Set Creation

Each slot filler returned by KBP systems becomes 1 RTE evaluation pair, where:
• T is the entire document supporting the slot filler
• H is a set of synonymous sentences, representing different realizations of the slot filler


Page 44: RTE-7@TAC2010 The Seventh Recognizing Textual Entailment Challenge

Data Set Creation: example

KBP SYSTEM INPUT
Target Entity: Chris Simcox
Slot: Residences
Document collection

KBP SYSTEM OUTPUT
Slot Filler: “Tucson, Ariz.”
Supporting Document: NYT_ENG_20050919.0130.LDC2007T07

RTE EVALUATION PAIR
T: NYT_ENG_20050919.0130.LDC2007T07
H:
H1: Chris Simcox lives in Tucson, Ariz.
H2: Chris Simcox has residence in Tucson, Ariz.
H3: Tucson, Ariz. is the place of residence of Chris Simcox
H4: Chris Simcox resides in Tucson, Ariz.
H5: Chris Simcox’s home is in Tucson, Ariz.
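The conversion shown in this example can be read as a simple record transformation. The sketch below is only illustrative: the names `KbpResponse`, `RtePair` and `to_rte_pair` are hypothetical and not part of the task's tooling, and the hypothesis sentences are passed in ready-made. The values are the ones shown on the slide.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class KbpResponse:
    """One slot filler returned by a KBP system (hypothetical record layout)."""
    target_entity: str
    slot: str
    slot_filler: str
    supporting_doc_id: str


@dataclass(frozen=True)
class RtePair:
    """One RTE evaluation pair: T is a whole document, H a set of sentences."""
    text_doc_id: str
    hypotheses: Tuple[str, ...]


def to_rte_pair(response: KbpResponse, realizations: Sequence[str]) -> RtePair:
    """Wrap a KBP response and its hypothesis realizations into one RTE pair."""
    return RtePair(text_doc_id=response.supporting_doc_id,
                   hypotheses=tuple(realizations))


# Values copied from the slide's example.
response = KbpResponse(
    target_entity="Chris Simcox",
    slot="Residences",
    slot_filler="Tucson, Ariz.",
    supporting_doc_id="NYT_ENG_20050919.0130.LDC2007T07",
)
pair = to_rte_pair(response, [
    "Chris Simcox lives in Tucson, Ariz.",
    "Chris Simcox has residence in Tucson, Ariz.",
    "Tucson, Ariz. is the place of residence of Chris Simcox",
    "Chris Simcox resides in Tucson, Ariz.",
    "Chris Simcox’s home is in Tucson, Ariz.",
])
print(pair.text_doc_id, len(pair.hypotheses))
```

Each KBP response maps to exactly one pair, so the number of pairs in the data set equals the number of retained slot fillers.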

Page 45: RTE-7@TAC2010 The Seventh Recognizing Textual Entailment Challenge

Hypotheses Creation

Attribute: origin
Target entity: person

Manually created templates
Templ 1: X’s origins are in Y
Templ 2: X comes from Y
Templ 3: X is from Y
Templ 4: X origins are Y
Templ 5: X has Y origins
Templ 6: X is of Y origin

Slot filler: Canadian
Target entity: Chris Simcox

Hs
H1 CHRIS SIMCOX origins are in CANADIAN
H2 CHRIS SIMCOX comes from CANADIAN
H3 CHRIS SIMCOX is from CANADIAN
H4 CHRIS SIMCOX origins are CANADIAN
H5 CHRIS SIMCOX has CANADIAN origin
H6 CHRIS SIMCOX is of CANADIAN origin
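The template instantiation on this slide amounts to placeholder substitution. The following is a minimal sketch under stated assumptions: the function name `instantiate` and the upper-casing of the inserted values (mirroring how the Hs are printed on the slide) are illustrative choices, not a description of the organizers' actual generation script. Because the templates are copied as listed, the output can differ slightly from the slide's Hs (for example, H1 on the slide has no possessive).

```python
import re

# Templates as listed on the slide; X is the target entity, Y the slot filler.
TEMPLATES = [
    "X’s origins are in Y",
    "X comes from Y",
    "X is from Y",
    "X origins are Y",
    "X has Y origins",
    "X is of Y origin",
]

# Matches a stand-alone X or Y placeholder, not letters inside other words.
PLACEHOLDER = re.compile(r"\b[XY]\b")


def instantiate(templates, entity, filler):
    """Fill every template with the entity (X) and the slot filler (Y)."""
    values = {"X": entity.upper(), "Y": filler.upper()}
    # A single regex pass avoids re-scanning text that was just substituted.
    return [PLACEHOLDER.sub(lambda m: values[m.group()], t) for t in templates]


hypotheses = instantiate(TEMPLATES, "Chris Simcox", "Canadian")
for number, hypothesis in enumerate(hypotheses, start=1):
    print(f"H{number} {hypothesis}")
```

Since the filler is inserted verbatim, sentences such as "comes from CANADIAN" come out ungrammatical.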


Page 46: RTE-7@TAC2010 The Seventh Recognizing Textual Entailment Challenge

Gold Standard Creation

RTE gold standard annotations are derived (automatically) from KBP assessments:

KBP JUDGMENTS (4-valued)   ENTAILMENT VALUES (2-valued)
Correct                    YES
Redundant                  YES
Wrong                      NO
Inexact                    (not included)
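The judgment-to-label conversion on this slide is a small lookup. The sketch below assumes judgments arrive as the plain strings shown on the slide; the names `KBP_TO_ENTAILMENT` and `entailment_label`, and the sample pair identifiers, are hypothetical.

```python
from typing import Optional

# Mapping taken from the slide; "Inexact" judgments are left out of the data set.
KBP_TO_ENTAILMENT = {
    "Correct": "YES",
    "Redundant": "YES",
    "Wrong": "NO",
}


def entailment_label(kbp_judgment: str) -> Optional[str]:
    """Return the 2-valued entailment label, or None if the pair is excluded."""
    if kbp_judgment == "Inexact":
        return None
    # Unknown judgments raise KeyError rather than being silently mislabeled.
    return KBP_TO_ENTAILMENT[kbp_judgment]


# Illustrative (made-up) assessments: pair id -> KBP judgment.
assessed = [("p1", "Correct"), ("p2", "Inexact"), ("p3", "Wrong"), ("p4", "Redundant")]
gold = {}
for pair_id, judgment in assessed:
    label = entailment_label(judgment)
    if label is not None:
        gold[pair_id] = label
print(gold)
```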


Page 47: RTE-7@TAC2010 The Seventh Recognizing Textual Entailment Challenge

Distinguishing Features

• RTE evaluation pair
  – T is an entire document
  – H is a set of synonymous sentences, possibly ungrammatical
• (Semi-)automatic generation
  – Data Set
    • from KBP outputs
  – Gold Standard
    • from KBP output assessments


Page 48: RTE-7@TAC2010 The Seventh Recognizing Textual Entailment Challenge

Data Set Composition

Removed pair types: GPE; “inexact”; “NO_RESPONSE”; duplicates; speech transcriptions; “other_family” slot; web documents

                    DEVELOPMENT SET                    TEST SET
                    Combined RTE-6 Dev and Test sets   KBP ’11 Slot Filling Task assessments
Pairs               24,808                             23,998
Positive examples   2,231                              1,508
Negative examples   22,577                             21,971

24,014


Page 49: RTE-7@TAC2010 The Seventh Recognizing Textual Entailment Challenge

Evaluation

2 TYPES OF SUBMISSIONS:
• generic systems (no adaptation)
• tailored systems (adapted for specific slots)

PARTICIPANTS: 2

SUBMITTED RUNS: 8
• 5 generic
• 3 tailored

EVALUATION MEASURES:
Micro-Averaged Precision, Recall, F-measure
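As a reminder of what micro-averaging means, the sketch below pools true positives, false positives and false negatives over all groups before computing the measures. It assumes the F-measure is the balanced F1 reported in the results; the per-slot grouping and the counts are made up for illustration, and `micro_prf` is a hypothetical helper name.

```python
def micro_prf(counts):
    """Micro-averaged precision, recall and F1 from per-group count dicts."""
    # Pool the raw counts first; averaging happens only once, at the end.
    tp = sum(c["tp"] for c in counts)
    fp = sum(c["fp"] for c in counts)
    fn = sum(c["fn"] for c in counts)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Made-up counts for two slots, only to show the pooling.
per_slot = [
    {"tp": 3, "fp": 1, "fn": 2},
    {"tp": 1, "fp": 3, "fn": 0},
]
precision, recall, f1 = micro_prf(per_slot)
print(f"P={precision:.4f} R={recall:.4f} F1={f1:.4f}")
```

With pooled counts, slots that contribute many pairs weigh more than rare ones, unlike a macro average over per-slot scores.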


Page 50: RTE-7@TAC2010 The Seventh Recognizing Textual Entailment Challenge

Pilot Task Baseline

Baseline: All Ts classified as entailing the corresponding H

This baseline:
• reflects the cumulative performance of all KBP Slot Filling Systems
• indicates the percentage of entailing pairs in the Test Set
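Because the baseline answers YES for every pair, its recall is 100% and its precision equals the share of entailing pairs. The sketch below works this out, assuming the Test Set's positive and negative example counts given on the Data Set Composition slide (1,508 and 21,971) are the relevant totals; `all_yes_baseline` is a hypothetical helper name.

```python
def all_yes_baseline(n_positive, n_negative):
    """Precision, recall and F1 of a system that labels every pair as YES."""
    precision = n_positive / (n_positive + n_negative)
    recall = 1.0  # every entailing pair is retrieved by construction
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Counts assumed from the Test Set column of the Data Set Composition slide.
p, r, f1 = all_yes_baseline(n_positive=1508, n_negative=21971)
print(f"P={100 * p:.2f} R={100 * r:.0f} F1={100 * f1:.2f}")
```

Under that assumption the values round to P 6.42, R 100 and F1 12.07, in line with the Baseline row of the results.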


Page 51: RTE-7@TAC2010 The Seventh Recognizing Textual Entailment Challenge

Results

TYPE       RUN           P       R       F1
Generic    JU_CSE_TAC2   11.79   49.14   19.02
           CELI3         10.47   29.05   15.39
           Baseline      6.42    100     12.07
Tailored   JU_CSE_TAC2   10.97   55.9    18.34


Page 52: RTE-7@TAC2010 The Seventh Recognizing Textual Entailment Challenge

Results

TYPE       RUN           P       R       F1      RTE-6
Generic    JU_CSE_TAC2   11.79   49.14   19.02   25.5
           CELI3         10.47   29.05   15.39   15.98
           Baseline      6.42    100     12.07   16.13
Tailored   JU_CSE_TAC2   10.97   55.9    18.34   33.07


- Overall system performance decreased with respect to RTE-6, especially for the “tailored” submission

- All runs are above the baseline

Page 53: RTE-7@TAC2010 The Seventh Recognizing Textual Entailment Challenge

Outline
• The RTE Challenge
• RTE-7 Main Task: RTE within a Corpus
  – RTE-7 Novelty Detection Subtask
  – Knowledge Resources and Tools for RTE
• RTE-7 KBP Validation Task
• Conclusion and Future Perspectives


Page 54: RTE-7@TAC2010 The Seventh Recognizing Textual Entailment Challenge

Conclusions

RTE-7 repeated RTE-6:
• Main Task: the results largely reflect those achieved in RTE-6 and show an improvement in overall system performance
• Novelty Subtask: the systems performed well and recorded a clear improvement with respect to RTE-6
  RTE confirmed a potential to help summarization (SUM) systems filter out non-novel information
• KBP Validation Task: proved to be the most complex of the tasks proposed


Page 55: RTE-7@TAC2010 The Seventh Recognizing Textual Entailment Challenge

Future Directions

See you all at the RTE Planning Session

Thank you!
