anaphora resolution

Anaphora Resolution Sobha Lalitha Devi AU-KBC Research Centre MIT Campus of Anna University Chennai-44 [email protected]


Page 1: Anaphora Resolution

Anaphora Resolution

Sobha Lalitha Devi, AU-KBC Research Centre, MIT Campus of Anna University, Chennai-44

Page 2: Anaphora Resolution

Contents

Introduction to Anaphora and Anaphora Resolution

Types of Anaphora

Process of Anaphora Resolution

Tools

Applications

References

Page 3: Anaphora Resolution

Introduction

What is Anaphora
Antecedent
Anaphora Resolution

1. Sabeer Bhatia arrived at Los Angeles International Airport at 6 p.m. on September 23, 1998. His flight from Bangalore had taken 22hrs and he was starving.

[RD, NOV 2000]

Page 4: Anaphora Resolution

Etymology of Anaphora

Ana-: back, upstream

Phora: act of carrying

Anaphora: act of carrying back

Page 5: Anaphora Resolution

What is Anaphora

Anaphora, in discourse, is a device for making an abbreviated reference (containing fewer bits of disambiguating information, rather than being lexically or phonetically shorter) to some entity (or entities) in the expectation that the receiver of the discourse will be able to disabbreviate the reference and, thereby, determine the identity of the entity.

(Hirst 1981)

Page 6: Anaphora Resolution

Cataphora

When the “anaphor” precedes its antecedent

Because she was going to the departmental store, Mary was asked to pick up the vegetables.

Page 7: Anaphora Resolution

Relevance from the Linguistics point of view

Binding Theory is one of the major results of the principles and parameters approach developed in Chomsky (1981) and is one of the mainstays of generative linguistics.

The Binding Theory deals with the relations between nominal expressions and possible antecedents.

It attempts to provide a structural account of the complementarity of distribution between pronouns, reflexives and R-expressions.

Page 8: Anaphora Resolution

Dichotomy Between Linguistics and NLP

The Binding Theory (in its various formulations) deals only with intra-sentential anaphora,

a very small subset of the anaphoric phenomena that practical NLP systems are interested in resolving.

A much larger share of these phenomena involves resolving pronouns inter-sententially.

This problem is dealt with by Discourse Representation Theory and, more specifically, by Centering Theory (Grosz et al., 1995).

Page 9: Anaphora Resolution

Types of Anaphors

The Prime Minister is yet to arrive and he is expected at the central hall at any time. [The Times of India, Feb 2001]

This book is about Anaphora Resolution. The book is designed to help beginners in the field and its author hopes that it will be useful.

John screamed, as did Mary.

Page 10: Anaphora Resolution

Pronominal anaphora: Vajpayee hits back forcefully when he told the opposition today, “sometimes we fall prey to the media and sometimes you do.” [Indian Express 2001]

Possessive: Priyanka eats only chicken sandwiches before going to take any exam; nothing else goes down her gullet that day. [Indian Express, 13 March 2001]

Page 11: Anaphora Resolution

Reflexive Pronoun

Finally, Danian heaved himself up and lay on a waiting stretcher.

Demonstrative Pronoun: John had lots of packing to do before he shifted his house. This was something he never liked…

Relative Pronoun: Stumper Sameer Dige, who made his Test debut, failed to show fast reflexes when it mattered.

Page 12: Anaphora Resolution

Pleonastic It

Cognitive verbs:
a. It is believed that…
b. It appears that…
Modal adjectives:
c. It is dangerous…
d. It is important…
Temporal:
e. It is five o’clock.
f. It is winter.
Weather verbs:
g. It is raining.
h. It is snowing.

Page 13: Anaphora Resolution

Distance

i. How far is it to Chennai?
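A system has to filter out pleonastic uses of “it”, like those above, before it searches for an antecedent. The sketch below is a rough pattern matcher that works on plain sentence strings. The word lists and regular expressions are hypothetical and are built only from the examples on these two slides. They will therefore miss some cases and misfire on others (for example, a referential “it is important”).

```python
import re

# Hypothetical, deliberately small word lists built from the examples above.
COGNITIVE = r"(?:believed|thought|known|assumed|appears|seems)"
MODAL_ADJ = r"(?:dangerous|important|necessary|possible|likely)"

PLEONASTIC_PATTERNS = [
    re.compile(rf"\bit\s+(?:is\s+)?{COGNITIVE}\s+that\b", re.I),           # cognitive verbs
    re.compile(rf"\bit\s+is\s+{MODAL_ADJ}\b", re.I),                       # modal adjectives
    re.compile(r"\bit\s+is\s+(?:\w+\s+o['’]clock|winter|summer)\b", re.I),  # temporal
    re.compile(r"\bit\s+is\s+(?:raining|snowing)\b", re.I),                # weather verbs
    re.compile(r"\bhow\s+far\s+(?:is\s+it|it\s+is)\b", re.I),              # distance
]


def looks_pleonastic(sentence: str) -> bool:
    """Heuristic: True if the sentence matches one of the pleonastic-'it' patterns."""
    return any(p.search(sentence) for p in PLEONASTIC_PATTERNS)


for s in ["It is believed that the match was fixed.", "It is snowing.",
          "How far is it to Chennai?", "The car broke down because it was old."]:
    print(s, looks_pleonastic(s))
```

Published systems use richer, syntax-aware patterns. A string-level check like this one is only a first approximation.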

Non-anaphoric uses of pronouns: He that plants thorns must never expect to gather roses. He who dares wins.

Deictic: He seems remarkably bright for a child of his age.

Page 14: Anaphora Resolution

Noun Phrase Anaphora

Definite descriptions and Proper names

Roy Keane has warned Manchester United he may snub their pay deal. United’s skipper is even hinting that unless the future Old Trafford package meets his demands, he could quit the club in June 2000. Irishman Keane, 27, still has 17 months to run on his current 23,000 pound a week contract and wants to commit himself to United for life. Alex Ferguson’s No 1 player confirmed: “If it’s not the contract I want, I won’t sign.”

Page 15: Anaphora Resolution

Coreference

Computational linguists from many different countries attended the tutorial. The participants found it hard to cope with the speed of the presentation; nevertheless, they managed to take extensive notes.

Page 16: Anaphora Resolution

What is Anaphora Resolution

Anaphora resolution is the process of finding the antecedent of an anaphor.

Anaphor: the expression that points back to a previously mentioned item.

Antecedent: the entity to which the anaphor refers.

Page 17: Anaphora Resolution

Different Approaches In Anaphora Resolution

Rule Based

Statistical

Page 18: Anaphora Resolution

Lappin and Leass (1994) Anaphora Resolution Algorithm

The Lappin and Leass (1994) anaphora resolution algorithm uses salience weights to determine the antecedent of a pronoun.

It requires a fully parsed sentence structure as input and uses the parse hierarchy to identify the subject, object, etc.

It applies syntactic criteria to rule out noun phrases that cannot possibly corefer with the pronoun.

The antecedent is then chosen from the remaining candidates according to a ranking based on salience weights.
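The two stages described above (syntactic filtering, then salience ranking) can be sketched as follows. This is a minimal outline, not the published implementation. The `Candidate` record, its `salience` field and the `can_corefer` predicate are hypothetical stand-ins for what a full parser and the salience factors would supply.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class Candidate:
    """A candidate NP with a precomputed salience score (hypothetical record)."""
    text: str
    salience: float


def resolve(
    pronoun: str,
    candidates: Sequence[Candidate],
    can_corefer: Callable[[str, Candidate], bool],
) -> Optional[Candidate]:
    """Filter out NPs that cannot corefer with the pronoun, then rank the rest."""
    # Stage 1: the syntactic filter (here an arbitrary caller-supplied predicate).
    survivors = [c for c in candidates if can_corefer(pronoun, c)]
    if not survivors:
        return None
    # Stage 2: highest salience wins; max() keeps the earliest listed on ties.
    return max(survivors, key=lambda c: c.salience)


# Toy usage: a filter that rules out "the woman" as an antecedent of "he".
cands = [Candidate("the woman", 310.0), Candidate("John", 280.0)]
best = resolve("he", cands, lambda p, c: c.text != "the woman")
print(best.text)  # John
```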

Page 19: Anaphora Resolution

Syntactic Filter on Coreference

A pronoun P is non-coreferential with a (non-reflexive, non-reciprocal) noun phrase N if any of the following conditions hold:

1. P and N have incompatible agreement features.
2. P is in the argument domain of N.
3. P is in the adjunct domain of N.
4. P is an argument of a head H, N is not a pronoun, and N is contained in H.
5. P is in the NP domain of N.
6. P is a determiner of a noun Q, and N is contained in Q.

Page 20: Anaphora Resolution

Examples

Condition 1: The woman said that he is funny.

Condition 2: She likes her. John seems to want to see him.

Condition 3: She sat near her.

Condition 4: He believes that the man is amusing. This is the man he said John wrote about.

Condition 5: John’s portrait of him is interesting.
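Condition 1 (agreement) is the one filter that needs no parse tree. The sketch below assumes each expression carries person, number and gender features. The `Features` record and the convention that `None` means "unspecified" are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Features:
    """Agreement features; None means the value is unspecified (an assumption)."""
    person: Optional[int] = None      # 1, 2 or 3
    number: Optional[str] = None      # "sg" or "pl"
    gender: Optional[str] = None      # e.g. "masc", "fem", "neut"


def agrees(p: Features, n: Features) -> bool:
    """True unless some feature is specified on both sides and the values differ."""
    for a, b in ((p.person, n.person), (p.number, n.number), (p.gender, n.gender)):
        if a is not None and b is not None and a != b:
            return False
    return True


HE = Features(person=3, number="sg", gender="masc")
THE_WOMAN = Features(person=3, number="sg", gender="fem")
JOHN = Features(person=3, number="sg", gender="masc")

# Condition 1: "The woman said that he is funny." -> "he" cannot corefer with "the woman".
print(agrees(HE, THE_WOMAN))  # False
print(agrees(HE, JOHN))       # True
```

Conditions 2 to 6 refer to argument, adjunct and NP domains. They therefore need the full parse that the algorithm takes as input.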

Page 21: Anaphora Resolution

Salience Factors and Weights

Salience factor types with initial weights:

Factor type | Initial weight
Sentence recency | 100
Subject emphasis | 80
Existential emphasis | 70
Accusative emphasis | 50
Indirect object and oblique complement emphasis | 40
Head noun emphasis | 80
Non-adverbial emphasis | 50
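A candidate's initial salience is the sum of the weights of the factors it satisfies. The sketch below uses only the weights in the table, and the factor keys are our own names. It leaves out the parts of the original algorithm that are not on this slide. In Lappin and Leass (1994), for example, salience values are halved for each new sentence and are pooled over classes of coreferring NPs.

```python
from typing import Set

# Initial weights from the table above (Lappin and Leass, 1994); key names are ours.
WEIGHTS = {
    "sentence_recency": 100,
    "subject": 80,
    "existential": 70,
    "accusative": 50,
    "indirect_object_or_oblique": 40,
    "head_noun": 80,
    "non_adverbial": 50,
}


def initial_salience(factors: Set[str]) -> int:
    """Sum the initial weights of the factors that hold for a candidate NP."""
    unknown = factors - set(WEIGHTS)
    if unknown:
        raise ValueError(f"unknown salience factors: {sorted(unknown)}")
    return sum(WEIGHTS[f] for f in factors)


# A subject head noun in the current sentence, not inside an adverbial:
print(initial_salience({"sentence_recency", "subject", "head_noun", "non_adverbial"}))  # 310
```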

Page 22: Anaphora Resolution

Kennedy and Boguraev (1996)

The linguistic analysis for anaphora resolution consists of the output of a part-of-speech tagger, augmented with syntactic function annotations for each input token, produced using LINGSOFT.

Page 23: Anaphora Resolution

A set of patterns is used to identify:

NP chunks, together with the position of each NP in the text;

Nominal sequences in two subordinate syntactic environments:
a. in an adverbial adjunct;
b. in an NP (i.e. containment in a prepositional or clausal complement of a noun, or containment in a relative clause);

Expletive “it”.

Page 24: Anaphora Resolution

Anaphora Resolution

Uses the Lappin and Leass algorithm, with the following salience factors:
SENT-S: 100 iff in the current sentence
CNTX-S: 50 iff in the current context
SUBJ-S: 80 iff GFUN = subject
EXST-S: 70 iff in an existential construction
POSS-S: 65 iff GFUN = possessive
ACC-S: 50 iff GFUN = direct object
DAT-S: 40 iff GFUN = indirect object
OBLQ-S: 30 iff the complement of a preposition
HEAD-S: 80 iff EMBED = NIL
ARG-S: 50 iff ADJUNCT = NIL
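These factors are stated as conditions on the tagger's annotations, so they map directly onto predicates over a per-NP record. The `NP` record below is hypothetical. Its `gfun`, `embed` and `adjunct` fields mirror the GFUN, EMBED and ADJUNCT annotations named above, with `None` standing for NIL. The boolean fields are likewise assumptions about how the other conditions would be made available.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NP:
    """Hypothetical candidate record; None stands for NIL."""
    in_current_sentence: bool
    in_current_context: bool
    in_existential: bool
    gfun: Optional[str]       # e.g. "subject", "possessive", "direct object"
    prep_complement: bool     # complement of a preposition
    embed: Optional[str]      # None = not embedded in another NP
    adjunct: Optional[str]    # None = not inside an adverbial adjunct


def kb_salience(np: NP) -> int:
    """Sum the salience factors listed on the slide that hold for this NP."""
    rules = [
        (100, np.in_current_sentence),      # SENT-S
        (50, np.in_current_context),        # CNTX-S
        (80, np.gfun == "subject"),         # SUBJ-S
        (70, np.in_existential),            # EXST-S
        (65, np.gfun == "possessive"),      # POSS-S
        (50, np.gfun == "direct object"),   # ACC-S
        (40, np.gfun == "indirect object"), # DAT-S
        (30, np.prep_complement),           # OBLQ-S
        (80, np.embed is None),             # HEAD-S
        (50, np.adjunct is None),           # ARG-S
    ]
    return sum(weight for weight, holds in rules if holds)


subject = NP(True, True, False, "subject", False, None, None)
print(kb_salience(subject))  # 360
```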

Page 25: Anaphora Resolution

Mitkov 1997

No Parsing of the Input Sentence

Boosting indicators

First Noun Phrases: A score of +1 is assigned to the first NP in a sentence.

Indicating Verbs: A score of +1 is assigned to those NPs immediately following a verb that is a member of a predefined set (including verbs such as discuss, present, illustrate, identify, summarise, examine, describe, define, show, check, develop, review, …).
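The first two boosting indicators need only NP chunks and a verb list, which is what lets the approach work without parsing. The sketch below makes simplifying assumptions: each sentence arrives as a list of hypothetical `(kind, text)` chunks, where `kind` is "NP" or "VERB", and verbs are already lemmatised. `INDICATING_VERBS` holds only the verbs named on the slide, and the original set is larger.

```python
from typing import List, Tuple

# Partial set: only the verbs named on the slide.
INDICATING_VERBS = {
    "discuss", "present", "illustrate", "identify", "summarise", "examine",
    "describe", "define", "show", "check", "develop", "review",
}

Chunk = Tuple[str, str]  # (kind, text), where kind is "NP" or "VERB"


def boosting_scores(sentence: List[Chunk]) -> List[Tuple[str, int]]:
    """Return (NP text, score) for every NP, applying two boosting indicators."""
    scores = []
    seen_np = False
    for i, (kind, text) in enumerate(sentence):
        if kind != "NP":
            continue
        score = 0
        if not seen_np:
            score += 1  # First noun phrase: +1
            seen_np = True
        if i > 0 and sentence[i - 1][0] == "VERB" and sentence[i - 1][1] in INDICATING_VERBS:
            score += 1  # Immediately follows an indicating verb: +1
        scores.append((text, score))
    return scores


chunks = [("NP", "this chapter"), ("VERB", "describe"), ("NP", "the printer")]
print(boosting_scores(chunks))  # [('this chapter', 1), ('the printer', 1)]
```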

Page 26: Anaphora Resolution

MARS (contd.)

Lexical Reiteration: A score of +2 is assigned to those NPs repeated twice or more in the paragraph in which the pronoun appears; a score of +1 is assigned to those NPs repeated once in that paragraph.

Section Heading Preference: A score of +1 is assigned to those NPs that also occur in the heading of the section in which the pronoun appears.
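Both indicators on this slide are counting and lookup operations, sketched below under stated assumptions. The function names are hypothetical. "Repeated n times" is read as n + 1 occurrences in the paragraph. NPs are matched by case-insensitive exact string, and the heading test is a plain substring check. Both matching rules are simplifications, since lexical reiteration in MARS is not limited to identical strings.

```python
from collections import Counter
from typing import List


def reiteration_score(np: str, paragraph_nps: List[str]) -> int:
    """+2 if the NP is repeated twice or more in the paragraph, +1 if repeated once."""
    counts = Counter(p.lower() for p in paragraph_nps)
    repetitions = counts[np.lower()] - 1  # occurrences beyond the first
    if repetitions >= 2:
        return 2
    if repetitions == 1:
        return 1
    return 0


def heading_score(np: str, section_heading: str) -> int:
    """+1 if the NP also occurs in the section heading (substring test, a simplification)."""
    return 1 if np.lower() in section_heading.lower() else 0


nps = ["the printer", "the cable", "The printer", "the printer"]
print(reiteration_score("the printer", nps), reiteration_score("the cable", nps))  # 2 0
print(heading_score("the printer", "Installing the printer"))  # 1
```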

Page 27: Anaphora Resolution

Boosting indicators (contd.)

Collocation Match: A score of +2 is assigned to those NPs that have an identical collocation pattern to the pronoun.

Immediate Reference: A score of +2 is assigned to those NPs appearing in constructions of the form

“… (You) V1 NP … con (you) V2 it (con (you) V3 it)”, where con ∈ {and, or, before, after, …}.

Sequential Instructions: A score of +2 is applied to NPs in the NP1 position of constructions of the form “To V1 NP1, V2 NP2. (Sentence). To V3 it, V4 NP4”; in such constructions NP1 is the likely antecedent of the anaphor “it”.

Term Preference: A score of +1 is applied to those NPs identified as representing terms in the genre of the text.

Page 28: Anaphora Resolution

Impeding indicators

Indefiniteness: Indefinite NPs are assigned a score of -1.

Prepositional Noun Phrases: NPs appearing in prepositional phrases are assigned a score of -1.
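Once every boosting and impeding indicator has been applied, each candidate's scores are added up and the highest total is proposed as the antecedent. The sketch below uses a hypothetical `Scored` record whose indicator names are our own. For ties, it simply prefers the most recent candidate. MARS-style systems apply further tie-breaking rules that are not shown on these slides.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Scored:
    """A candidate NP with its indicator scores; position = order in the text."""
    text: str
    position: int
    indicators: Dict[str, int] = field(default_factory=dict)

    @property
    def total(self) -> int:
        return sum(self.indicators.values())


def choose_antecedent(candidates: List[Scored]) -> Optional[Scored]:
    """Highest aggregate score wins; ties go to the most recent (rightmost) candidate."""
    if not candidates:
        return None
    return max(candidates, key=lambda c: (c.total, c.position))


cands = [
    Scored("the printer", 0, {"first_np": 1}),
    Scored("a cable", 3, {"indefinite": -1, "prepositional": -1}),
    Scored("the driver", 5, {"term": 1}),
]
print(choose_antecedent(cands).text)  # the driver
```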

Page 29: Anaphora Resolution

“Vasisth” a Rule Based Anaphora Resolution System

1. mo:han(i) avanRe(i) kuttiye kantu.
mohan he-poss child-acc see-pst
(Mohan saw his child.)

2. mo:han(i) avanRe(i) kuttiye kantu ennu kRisnan paRannu.
mohan he-poss child-acc see-pst compl krishnan say-pst
(Krishnan said that Mohan saw his child.)

3. *mo:han(i) avane(i) aticcu.
mohan he-acc beat-pst
(Mohan beat him.)

4. mo:han avane(i) aticcu ennu kRisnan(i) paRannu.
mohan he-acc beat-pst compl krishnan say-pst
(Krishnan said that Mohan beat him.)

Page 30: Anaphora Resolution

The Algorithm for Intra-sentential Anaphora

A pronoun P is coreferential with an NP iff the following conditions hold:

a. P and NP have compatible person, number and gender (PNG) features.
b. P does not precede NP.
c. If P is possessive, then NP is the subject of the clause that contains P.
d. If P is non-possessive, then NP is the subject of the immediate clause that does not contain P.

Page 31: Anaphora Resolution

Vasisth is a multilingual anaphora resolution system:

Rule based

With minimal parsing

Exploits the morphology of Indian languages

Page 32: Anaphora Resolution

“VASISTH” Using Salience Measure for Indian Languages

No In-depth Parsing

Exploit the Rich Morphology of the Language

The analysis relies on the salience weight of each candidate NP: from a list of probable candidates, the NP with the highest salience weight is chosen as the antecedent of the anaphor.

Page 33: Anaphora Resolution

The salience weight assignment

a) NPs in the current sentence get a score of 50, reduced by 10 for each preceding sentence, up to the fifth sentence. The system considers five sentences when identifying the antecedent.

b) NPs in the current clause get a score of 75 if the pronoun in that clause is possessive, and 0 if it is non-possessive.

c) NPs in the immediate clause get a score of 70 for a possessive pronoun and 75 for a non-possessive pronoun.

d) NPs in a non-immediate clause get a score of 30 for a possessive pronoun and 65 for a non-possessive pronoun.

Page 34: Anaphora Resolution

e) The analysis showed that the subject is the most probable antecedent for the pronoun. The subject of a sentence can take nominative or dative case marking.

A nominative NP, a dative NP, or a possessive NP with a nominative/dative head can be the subject of a sentence.

Page 35: Anaphora Resolution

f) The direct object of a sentence can be identified by its case marking: all case markings other than those of the subject are treated as marking an object. The direct object is the next most probable antecedent and hence gets a score of 40.

g) The third NP in a clause, if it is not identified as the subject or the object, is treated as the indirect object and gets a low score of 30.
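Rules a) to d) are simple lookups, sketched below. The function and key names are ours. The sketch assumes each candidate's clause label ("current", "immediate" or "non-immediate") has already been determined relative to the clause that contains the pronoun.

```python
def sentence_score(sentences_back: int) -> int:
    """Rule a): 50 for the pronoun's own sentence, 10 less per preceding sentence."""
    if not 0 <= sentences_back <= 4:
        # Five sentences in all: the current one and the four before it.
        raise ValueError("candidate lies outside the five-sentence window")
    return 50 - 10 * sentences_back


# Rules b)-d): keyed by (is the pronoun possessive?, clause of the candidate NP).
CLAUSE_SCORES = {
    (True, "current"): 75,
    (True, "immediate"): 70,
    (True, "non-immediate"): 30,
    (False, "current"): 0,
    (False, "immediate"): 75,
    (False, "non-immediate"): 65,
}


def clause_score(possessive_pronoun: bool, clause: str) -> int:
    """Look up the clause-position score for a candidate NP."""
    return CLAUSE_SCORES[(possessive_pronoun, clause)]


print([sentence_score(k) for k in range(5)])  # [50, 40, 30, 20, 10]
```

The asymmetry between rows mirrors conditions c) and d) of the intra-sentential algorithm. A possessive pronoun favours the subject of its own clause, whereas a non-possessive pronoun favours the subject of the immediate clause.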

Page 36: Anaphora Resolution

Salience factor weights for Indian Languages

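Salience factor | Weight
Current sentence | 50, reduced by 10 for each preceding sentence up to the 5th sentence
Possessive pronoun: current clause | 75
Possessive pronoun: immediate clause | 70
Possessive pronoun: non-immediate clause | 30
Non-possessive pronoun: current clause | 0
Non-possessive pronoun: immediate clause | 75
Non-possessive pronoun: non-immediate clause | 65
Possessive and non-possessive: N.Nom | 80
Possessive and non-possessive: N.Poss | 50
Possessive and non-possessive: N.Dat | 50
Possessive and non-possessive: N.Acc, Loc, Instr… | 40
Possessive and non-possessive: N.others (3rd NP) | 30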

Page 37: Anaphora Resolution

How it works

The salience weight of an NP is assigned as follows:

Identify the pronoun.

Consider the four sentences above the sentence containing the pronoun.

Consider all the NPs preceding the pronoun (this is the general rule).

Page 38: Anaphora Resolution

Here we also take some NPs that follow the pronoun, since Tamil, like all Indian languages, has relatively free word order.

Assign salience weights.

The NP that gets the maximum salience weight and agrees with the anaphor in person, number and gender (PNG) is taken as the antecedent of the anaphor.
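Putting the weights table and these steps together gives a small scorer, sketched below. The `Cand` record, its short case labels and the function names are hypothetical. The sketch assumes that chunking, case analysis and clause labelling have already been done, and that the caller has already chosen which following NPs to include. Any case not listed in the table falls back to the "others" score of 30. The usage example replays sentences 2 and 4 of the Vasisth slide. Note that kRisnan follows the pronoun in sentence 4, so it is exactly the kind of following NP the rule above admits.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

PNG = Tuple[int, str, str]  # (person, number, gender)

CLAUSE = {
    (True, "current"): 75, (True, "immediate"): 70, (True, "non-immediate"): 30,
    (False, "current"): 0, (False, "immediate"): 75, (False, "non-immediate"): 65,
}
CASE = {"nom": 80, "poss": 50, "dat": 50, "acc": 40, "loc": 40, "instr": 40}


@dataclass
class Cand:
    """Hypothetical candidate NP, already chunked and morphologically analysed."""
    text: str
    png: PNG
    sentences_back: int   # 0 = the sentence containing the pronoun
    clause: str           # "current", "immediate" or "non-immediate"
    case: str


def weight(c: Cand, possessive_pronoun: bool) -> int:
    """Sentence score + clause score + case score (others default to 30)."""
    sentence = 50 - 10 * c.sentences_back
    return sentence + CLAUSE[(possessive_pronoun, c.clause)] + CASE.get(c.case, 30)


def vasisth_style_resolve(pronoun_png: PNG, possessive: bool, cands: List[Cand]) -> Optional[Cand]:
    """Highest-weighted candidate in the five-sentence window that agrees in PNG."""
    window = [c for c in cands if 0 <= c.sentences_back <= 4 and c.png == pronoun_png]
    if not window:
        return None
    return max(window, key=lambda c: weight(c, possessive))


M = (3, "sg", "masc")
mohan = Cand("mo:han", M, 0, "current", "nom")
krishnan = Cand("kRisnan", M, 0, "immediate", "nom")
print(vasisth_style_resolve(M, True, [mohan, krishnan]).text)   # mo:han  (avanRe, sentence 2)
print(vasisth_style_resolve(M, False, [mohan, krishnan]).text)  # kRisnan (avane, sentence 4)
```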

Page 39: Anaphora Resolution

Tools

GATE

Java-RAP (pronouns)

GUITAR (Poesio & Kabadjov, 2004; Kabadjov, 2007)

BART (Versley et al., 2008)

Page 40: Anaphora Resolution

Where is it required?

Machine translation

Information extraction

Summarization

And in almost all NLU applications

Page 41: Anaphora Resolution

References

Massimo Poesio, slides: “Anaphora Resolution for Practical Task”

Ruslan Mitkov, “MARS: A Knowledge-Poor Anaphora Resolution System”

Page 42: Anaphora Resolution

Thank You