On the Issue of Combining Anaphoricity Determination and Antecedent Identification in Anaphora Resolution

Ryu Iida, Kentaro Inui, Yuji Matsumoto
Nara Institute of Science and Technology
{ryu-i,inui,matsu}@is.naist.jp
NLP-KE’05, October 30, 2005

Noun phrase anaphora resolution

Anaphora resolution is the process of determining whether two expressions in natural language refer to the same real-world entity

It is an important process for various NLP applications: machine translation, information extraction, question answering

A federal judge in Pittsburgh issued a temporary restraining order preventing Trans World Airlines from buying additional shares of USAir Group Inc. The order, requested in a suit filed by USAir, dealt another blow to TWA's bid to buy the company for $52 a share.

[Figure: in the example text, one expression is marked as the antecedent and another as the anaphor]

Noun phrase anaphora resolution

Anaphora resolution can be decomposed into two subprocesses:

1. Anaphoricity determination is the task of classifying whether a given noun phrase (NP) is anaphoric or non-anaphoric

2. Antecedent identification is the identification of the antecedent of a given anaphoric NP

A federal judge in Pittsburgh issued a temporary restraining order preventing Trans World Airlines from buying additional shares of USAir Group Inc. The order, requested in a suit filed by USAir, dealt another blow to TWA's bid to buy the company for $52 a share.

[Figure: in the example text, expressions are marked as antecedent, anaphor, and non-anaphor]

Previous work

Early corpus-based work on anaphora resolution does not address anaphoricity determination (Hobbs ’78, Lappin and Leass ’94)

It assumes that the anaphora resolution system knows a priori all the anaphoric noun phrases

This problem has received attention from an increasing number of researchers (Bean and Riloff ’99, Ng and Cardie ’02, Uryupina ’03, Ng ’04)

Determining anaphoricity is not a trivial problem

Overall performance of anaphora resolution crucially depends on the accuracy of anaphoricity determination

Previous work (Cont’d)

Previous efforts to tackle the anaphoricity determination problem have yielded two findings:

1. One useful cue for determining the anaphoricity of a given NP can be obtained by searching for an antecedent (Soon et al. 01, Ng and Cardie 02a)

2. Anaphoricity determination can be effectively carried out by a binary classifier that learns instances of non-anaphoric NPs (Ng and Cardie 02b, Ng 04)

None of the previous models effectively combines the strengths of these findings

Aim

Improving anaphora resolution performance:

Using better anaphoricity determination

Combining sources of evidence from previous models

Proposal

Introducing a 2-step process for combining antecedent information and non-anaphoric information

We call this model the selection-and-classification model

1. Select the most likely candidate antecedent (CA) of a target NP (TNP) using the tournament model (Iida et al. ’03)

2. Classify the TNP paired with the CA: the TNP is classified as anaphoric if the CA is identified as its antecedent; otherwise the TNP is judged non-anaphoric
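The two steps can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the pairwise preference function `right_wins`, the scoring function `ana_score`, and the toy `token_overlap` scorer are hypothetical stand-ins for the learned classifiers, and the threshold name `theta_ana` and the direction of the comparison are assumptions.

```python
from typing import Callable, List, Optional

def token_overlap(candidate: str, target: str) -> float:
    """Toy stand-in for a learned score: Jaccard overlap of lower-cased tokens."""
    a, b = set(candidate.lower().split()), set(target.lower().split())
    return len(a & b) / len(a | b) if (a | b) else 0.0

def tournament(candidates: List[str], target: str,
               right_wins: Callable[[str, str, str], bool]) -> Optional[str]:
    """Step 1: pairwise matches between candidates; the last winner is returned."""
    if not candidates:
        return None
    winner = candidates[0]
    for challenger in candidates[1:]:
        # Hypothetical pairwise model: True if the challenger beats the winner.
        if right_wins(winner, challenger, target):
            winner = challenger
    return winner

def resolve(candidates: List[str], target: str,
            right_wins: Callable[[str, str, str], bool],
            ana_score: Callable[[str, str], float],
            theta_ana: float) -> Optional[str]:
    """Step 2: classify the (CA, TNP) pair; None means 'judged non-anaphoric'."""
    best = tournament(candidates, target, right_wins)
    if best is None:
        return None
    # Assumed decision rule: anaphoric when the score reaches the threshold.
    return best if ana_score(best, target) >= theta_ana else None

def toy_right_wins(left: str, right: str, target: str) -> bool:
    return token_overlap(right, target) > token_overlap(left, target)

preceding = ["federal judge", "order", "USAir Group Inc", "suit"]
print(resolve(preceding, "USAir", toy_right_wins, token_overlap, theta_ana=0.3))
print(resolve(preceding[:3], "suit", toy_right_wins, token_overlap, theta_ana=0.3))
```

With these toy scorers the first call selects "USAir Group Inc" as the antecedent of "USAir", while the second call judges "suit" non-anaphoric because its best candidate scores below the threshold.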

2-step process for anaphora resolution

A federal judge in Pittsburgh issued a temporary restraining order preventing Trans World Airlines from buying additional shares of USAir Group Inc. The order, requested in a suit filed by USAir, …

[Figure: the candidate anaphor (USAir) and its candidate antecedents (suit, USAir Group Inc, order, federal judge, …) are passed to the tournament model]

2-step process for anaphora resolution

[Figure: for the candidate anaphor USAir and its candidate antecedents (suit, USAir Group Inc, order, federal judge, …), the tournament model outputs USAir Group Inc]

2-step process for anaphora resolution

[Figure: the tournament model selects USAir Group Inc as the candidate antecedent of the candidate anaphor USAir; the pair is then passed to the anaphoricity determination model, which outputs a score]

If score < θ_ana: USAir is non-anaphoric
If score ≥ θ_ana: USAir is anaphoric and USAir Group Inc is the antecedent of USAir

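Training phase

[Figure: construction of training instances; NPi: candidate antecedent]

Anaphoric: for an anaphoric NP (ANP) with the set of candidate antecedents NP1 … NP5, the antecedent (NP4) is paired with ANP to form an anaphoric instance (NP4, ANP)

Non-anaphoric: for a non-anaphoric NP (NANP) with the set of candidate antecedents NP1 … NP5, the tournament model selects a candidate antecedent (NP3), which is paired with NANP to form a non-anaphoric instance (NP3, NANP)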
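The instance construction shown in the training figure could be sketched as below. This is an assumption-laden illustration: the `Mention` record, the `select_candidate` callback (standing in for the tournament model), and the boolean label encoding are all hypothetical names introduced here, not taken from the slides.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

@dataclass
class Mention:
    """Hypothetical record for one target NP in the training corpus."""
    name: str
    candidates: List[str]        # preceding NPs (candidate antecedents)
    antecedent: Optional[str]    # annotated antecedent, or None if non-anaphoric

def make_training_instances(
    mentions: List[Mention],
    select_candidate: Callable[[List[str], str], Optional[str]],
) -> List[Tuple[str, str, bool]]:
    """Build (candidate antecedent, target NP, is_anaphoric) training triples."""
    instances = []
    for m in mentions:
        if m.antecedent is not None:
            # Anaphoric NP: pair it with its annotated antecedent.
            instances.append((m.antecedent, m.name, True))
        else:
            # Non-anaphoric NP: pair it with the candidate the selector prefers.
            chosen = select_candidate(m.candidates, m.name)
            if chosen is not None:
                instances.append((chosen, m.name, False))
    return instances

nps = ["NP1", "NP2", "NP3", "NP4", "NP5"]
corpus = [Mention("ANP", nps, "NP4"), Mention("NANP", nps, None)]

# Stub selector that mimics the figure, where the tournament model picks NP3.
def pick_np3(candidates: List[str], target: str) -> Optional[str]:
    return "NP3" if "NP3" in candidates else None

instances = make_training_instances(corpus, pick_np3)
print(instances)
```

In a real setup the stub selector would be replaced by a trained tournament model, so the non-anaphoric instances reflect the candidates that the first step actually tends to select.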

Comparison with previous approaches

1. Search-based approach (SM) (Soon et al. ’01, Ng and Cardie ’02)

Recasts anaphora resolution as binary classification problems
Comparable to the state-of-the-art rule-based system
Disadvantage: does not use non-anaphoric instances in training

2. Classification-and-search approach (CSM) (Ng and Cardie ’02, Ng ’04)

Introduces anaphoricity determination as a classification task
The performance of the CSM is better than that of the SM if the threshold parameters are appropriately tuned
Disadvantage: does not use contextual information (i.e. whether an appropriate antecedent appears in the context)

Experiments

Noun phrase anaphora resolution in Japanese

Japanese newspaper article corpus tagged with NP-anaphoric relations
90 texts, 1,104 sentences
Noun phrases: 876 anaphors and 6,292 non-anaphors

Recall = (# of correctly detected anaphoric relations) / (# of anaphoric NPs)

Precision = (# of correctly detected anaphoric relations) / (# of NPs classified as anaphoric)
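As a small worked illustration of the recall and precision definitions, the sketch below counts correctly detected anaphoric relations from two mappings. The dictionary representation (NP identifier to antecedent identifier) and the exact-match criterion for a "correct" relation are assumptions made for this example; the slides do not specify how a match is decided.

```python
from typing import Dict, Tuple

def precision_recall(gold: Dict[str, str],
                     predicted: Dict[str, str]) -> Tuple[float, float]:
    """gold / predicted map each NP judged anaphoric to its antecedent.

    NPs absent from a mapping are treated as non-anaphoric in that mapping.
    """
    # A relation counts as correct when the predicted antecedent matches gold.
    correct = sum(1 for np_id, ante in predicted.items()
                  if gold.get(np_id) == ante)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall

# Three annotated anaphors; the system marks two NPs as anaphoric, one correctly.
gold = {"np7": "np2", "np9": "np4", "np12": "np10"}
predicted = {"np7": "np2", "np9": "np3"}

precision, recall = precision_recall(gold, predicted)
print(f"precision={precision:.3f} recall={recall:.3f}")
```

Here one of the two predicted relations is correct, so precision is 1/2, and one of the three annotated anaphors is recovered, so recall is 1/3.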

Experimental setting

Conduct 10-fold cross-validation with support vector machines

Comparison among three models
1. Search-based model (Ng and Cardie ’02)
2. Classification-and-Search model (Ng and Cardie ’04)
3. Selection-and-Classification model (proposed model), using the tournament model (Iida et al. ’03)
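For readers unfamiliar with the protocol, a 10-fold split can be sketched as below. The slides do not say whether folds were formed over texts, sentences, or NPs; splitting the 90 texts is an assumption made here, as are the function name `k_fold_splits` and the fixed random seed. Training the support vector machines on each fold is omitted.

```python
import random
from typing import Iterator, List, Tuple

def k_fold_splits(n_items: int, k: int = 10,
                  seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices round-robin into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for held_out in range(k):
        test = folds[held_out]
        train = [i for j, fold in enumerate(folds) if j != held_out for i in fold]
        yield train, test

# Assumed unit of splitting: the 90 texts of the corpus.
splits = list(k_fold_splits(90, k=10))
for train, test in splits:
    # A model would be trained on `train` and evaluated on `test` here.
    assert len(train) == 81 and len(test) == 9
print(len(splits), "folds")
```

Splitting by text rather than by NP keeps all mentions from one article in the same fold, which avoids evaluating on NPs whose context was seen during training.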

Results of noun phrase anaphora resolution

[Figure: results plot comparing the proposed model, the search-based model, and the classification-and-search model]

Search-based model (SM) vs. classification-and-search model (CSM): the performance of the CSM is significantly better than that of the SM

Results of noun phrase anaphora resolution

[Figure: results plot comparing the proposed model, the search-based model, and the classification-and-search model]

Classification-and-search model (CSM) vs. proposed model: the proposed model outperforms the CSM in the higher-recall portion

Conclusion

Our selection-and-classification approach to anaphora resolution improves on the performance of previous learning-based models by combining their advantages

1. Our model uses non-anaphoric instances together with anaphoric instances to induce an anaphoricity classifier

2. Our model determines the anaphoricity of a given NP by taking antecedent information into account

Future work

The majority of errors are caused by the difficulty of judging semantic compatibility
e.g. the system outputs that “ani (elder brother)” is anaphoric with “kanojo (she)”

The lexical resource we employed in the experiments did not contain gender information

Developing a lexical resource that covers a broad range of semantic compatibility relations