Anaphora Resolution
Sobha Lalitha Devi
AU-KBC Research Centre
MIT Campus of Anna University, Chennai-44
[email protected]
Contents
Introduction to Anaphora and Anaphora Resolution
Types of Anaphora
Process of Anaphora Resolution
Tools
Applications
References
Introduction
What is Anaphora
Antecedent
Anaphora Resolution
1. Sabeer Bhatia arrived at Los Angeles International Airport at 6 p.m. on September 23, 1998. His flight from Bangalore had taken 22hrs and he was starving.
[RD, NOV 2000]
Etymology of Anaphora
ANA- Back, Upstream, Back upstream
Phora- Act of Carrying
Anaphora - Act of Carrying Back
What is Anaphora
Anaphora, in discourse, is a device for making an abbreviated reference (containing fewer bits of disambiguating information, rather than being lexically or phonetically shorter) to some entity (or entities) in the expectation that the receiver of the discourse will be able to disabbreviate the reference and, thereby, determine the identity of the entity.
(Hirst 1981)
Cataphora
When the “anaphor” precedes the antecedent
Because she was going to the departmental store, Mary was asked to pick up the vegetables.
Relevance from the Linguistics point of view
Binding Theory is one of the major results of the principles and parameters approach developed in Chomsky (1981) and is one of the mainstays of generative linguistics.
The Binding Theory deals with the relations between nominal expressions and possible antecedents.
It attempts to provide a structural account of the complementarity of distribution between pronouns, reflexives and R-expressions.
Dichotomy Between Linguistics and NLP
The Binding Theory (and its various formulations) deals only with intra-sentential anaphora, a very small subset of the anaphoric phenomena that practical NLP systems are interested in resolving.
A much larger set of anaphoric phenomena involves the resolution of pronouns inter-sententially.
This problem is dealt with by Discourse Representation Theory and more specifically by Centering Theory (Grosz et al., 1995).
Types of Anaphors
The Prime Minister is yet to arrive and he is expected at the central hall at any time. [The Times of India, Feb 2001]
This book is about Anaphora Resolution. The book is designed to help beginners in the field and its author hopes that it will be useful.
John screamed, as did Mary.
Pronominal anaphora
Vajpayee hits back forcefully when he told the opposition today “sometimes we fall prey to the media and sometimes you do.” [Indian Express 2001]
Possessive
Priyanka eats only chicken sandwiches before going to take any exam; nothing else goes down her gullet that day. [Indian Express, 13 March 2001]
Reflexive Pronoun
Finally, Danian heaved himself up and lay on a waiting stretcher.
Demonstrative Pronoun
John had lots of packing to do before he shifted his house. This was something he never liked…
Relative Pronoun
Stumper Sameer Dige, who made his test debut, failed to show fast reflexes when it mattered.
Pleonastic It
Cognitive verbs
a. It is believed that…
b. It appears that…
Modal adjectives
c. It is dangerous…
d. It is important…
Temporal
e. It is five o’clock
f. It is winter
Weather verbs
g. It is raining
h. It is snowing
Distance
i. How far is it to Chennai?
Non-anaphoric uses of pronouns
He that plants thorns must never expect to gather roses.
He who dares wins.
Deictic
He seems remarkably bright for a child of his age.
Noun Phrase Anaphora
Definite descriptions and Proper names
Roy Keane has warned Manchester United he may snub their pay deal. United’s skipper is even hinting that unless the future Old Trafford package meets his demands, he could quit the club in June 2000. Irishman Keane, 27, still has 17 months to run on his current 23,000 pound a week contract and wants to commit himself to United for life. Alex Ferguson’s No 1 player confirmed: “If it’s not the contract I want, I won’t sign”.
Coreference
Computational Linguists from many different countries attended the tutorial. The participants found it hard to cope with the speed of the presentation; nevertheless they managed to take extensive notes.
What is Anaphora Resolution
Anaphora resolution is the process of finding the antecedent of an anaphor.
Anaphor: the reference that points back to a previous item.
Antecedent: the entity to which the anaphor refers.
Different Approaches In Anaphora Resolution
Rule Based
Statistical Based
Lappin and Leass (1994) Anaphora Resolution Algorithm
The Lappin and Leass (1994) anaphora resolution algorithm uses salience weights to determine the antecedent of a pronominal.
It requires a fully parsed sentence structure as input and uses the syntactic hierarchy to identify the subject, object, etc.
The algorithm uses syntactic criteria to rule out noun phrases that cannot possibly corefer with the pronoun.
The antecedent is then chosen according to a ranking based on salience weights.
The Salience Factors and Weights
A pronoun P is non-coreferential with a (non-reflexive or non-reciprocal) noun phrase N if any of the following conditions hold:
1. P and N have incompatible agreement features.
2. P is in the argument domain of N.
3. P is in the adjunct domain of N.
4. P is an argument of a head H, N is not a pronoun, and N is contained in H.
5. P is in the NP domain of N.
6. P is a determiner of a noun Q, and N is contained in Q.
Examples
Condition 1: The woman said that he is funny.
Condition 2: She likes her. John seems to want to see him.
Condition 3: She sat near her.
Condition 4: He believes that the man is amusing. This is the man he said John wrote about.
Condition 5: John’s portrait of him is interesting.
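Condition 1 is the easiest of these filters to mechanise. The following is a minimal sketch, assuming every mention already carries hand-assigned number and gender features. The `Mention` class, the feature values and the function names are illustrative only and are not taken from Lappin and Leass. The remaining conditions need a syntactic parse and are not covered here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Mention:
    text: str
    number: Optional[str] = None  # "sg", "pl", or None when unknown
    gender: Optional[str] = None  # "m", "f", "n", or None when unknown


def agrees(pronoun: Mention, np: Mention) -> bool:
    """Return False only when a feature known on both sides clashes."""
    for feature in ("number", "gender"):
        a, b = getattr(pronoun, feature), getattr(np, feature)
        if a is not None and b is not None and a != b:
            return False
    return True


def filter_candidates(pronoun: Mention, candidates: list) -> list:
    """Drop every NP ruled out by incompatible agreement features."""
    return [np for np in candidates if agrees(pronoun, np)]


he = Mention("he", "sg", "m")
candidates = [
    Mention("the woman", "sg", "f"),     # gender clash with "he"
    Mention("John", "sg", "m"),
    Mention("the participants", "pl"),   # number clash with "he"
]
print([np.text for np in filter_candidates(he, candidates)])  # ['John']
```

Unknown features are treated as compatible, so the filter only removes candidates it can positively rule out.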
Salience Factors and Weights
Salience factor types with initial weights
Factor type                                        Initial weight
Sentence recency                                   100
Subject emphasis                                   80
Existential emphasis                               70
Accusative emphasis                                50
Indirect object and oblique complement emphasis    40
Head noun emphasis                                 80
Non-adverbial emphasis                             50
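One way to read these weights is that every factor applying to a candidate contributes its initial weight, and the candidate's salience is the sum. The sketch below assumes the factors for each candidate have already been labelled by hand; the dictionary keys, the example candidates and the function names are illustrative, and the other steps of the full algorithm are left out.

```python
# Initial weights from the salience factor list above.
WEIGHTS = {
    "sentence_recency": 100,
    "subject": 80,
    "existential": 70,
    "accusative": 50,
    "indirect_object_or_oblique": 40,
    "head_noun": 80,
    "non_adverbial": 50,
}


def salience(factors):
    """Sum the initial weights of all factors that apply to a candidate."""
    return sum(WEIGHTS[f] for f in factors)


def rank(candidates):
    """Return candidate names ordered from most to least salient."""
    return sorted(candidates, key=lambda name: salience(candidates[name]), reverse=True)


# Hand-labelled candidates for "The Prime Minister met the reporters."
candidates = {
    "the reporters": {"sentence_recency", "accusative", "head_noun", "non_adverbial"},
    "the Prime Minister": {"sentence_recency", "subject", "head_noun", "non_adverbial"},
}
print(rank(candidates))  # ['the Prime Minister', 'the reporters']
```

The subject scores 310 and the direct object 280, so a following "he" would prefer the subject once both candidates pass the syntactic filter.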
Kennedy 1996
The linguistic analysis for anaphora resolution includes
The output of a part-of-speech tagger, augmented with syntactic function annotations for each input token
Using LINGSOFT
A set of patterns is used for identifying
NP chunking, with the position of the NP in the text
Nominal sequencing in two subordinate syntactic environments:
a. in an adverbial adjunct
b. in an NP (i.e. containment in a prepositional or clausal complement of a noun, or containment in a relative clause)
Expletive “it”
Anaphora Resolution
Uses the Lappin and Leass algorithm
SENT-S: 100 iff in the current sentence
CNTX-S: 50 iff in the current context
SUBJ-S: 80 iff GFUN = subject
EXST-S: 70 iff in an existential construction
POSS-S: 65 iff GFUN = possessive
ACC-S: 50 iff GFUN = direct object
DAT-S: 40 iff GFUN = indirect object
OBLQ-S: 30 iff the complement of a preposition
HEAD-S: 80 iff EMBED = NIL
ARG-S: 50 iff ADJUNCT = NIL
Mitkov 1997
No Parsing of the Input Sentence
Boosting indicators
First Noun Phrases: A score of +1 is assigned to the first NP in a sentence.
Indicating Verbs: A score of +1 is assigned to those NPs immediately following a verb which is a member of a predefined set (including verbs such as discuss, present, illustrate, identify, summarise, examine, describe, define, show, check, develop, review, …).
MARS Cont….
Lexical Reiteration: A score of +2 is assigned to those NPs repeated twice or more in the paragraph in which the pronoun appears, a score of +1 is assigned to those NPs repeated once in that paragraph.
Section Heading Preference: A score of +1 is assigned to those NPs that also occur in the heading of the section in which the pronoun appears.
Boosting indicators contd..
Collocation Match: A score of +2 is assigned to those NPs that have an identical collocation pattern to the pronoun.
Immediate Reference: A score of +2 is assigned to those NPs appearing in constructions of the form “… (You) V1 NP … con (you) V2 it (con (you) V3 it)”, where con ∈ {and/or/before/after…}.
Sequential Instructions: A score of +2 is applied to NPs in the NP1 position of constructions of the form “To V1 NP1 V2 NP2. (Sentence). To V3 it, V4 NP4”. Here the noun phrase NP1 is the likely antecedent of the anaphor it.
Term Preference: A score of +1 is applied to those NPs identified as representing terms in the genre of the text.
Impeding indicators
Indefiniteness: Indefinite NPs are assigned a score of -1.
Prepositional Noun Phrases: NPs appearing in prepositional phrases are assigned a score of -1.
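The indicators lend themselves to a simple additive score per candidate. The following is a minimal sketch covering only five of them (first NP, lexical reiteration, section heading, indefiniteness, prepositional NP). It assumes the boolean and count features have been filled in beforehand; the `Candidate` class, its field names and the example values are hypothetical. "Repeated once" is read here as one repetition beyond the first mention, which is an interpretation rather than something the slides spell out.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    text: str
    first_np_in_sentence: bool = False
    repetitions_in_paragraph: int = 0  # repeats beyond the first mention (assumed reading)
    in_section_heading: bool = False
    indefinite: bool = False
    in_prepositional_phrase: bool = False


def indicator_score(c: Candidate) -> int:
    """Add boosting indicators and subtract impeding ones."""
    score = 0
    if c.first_np_in_sentence:
        score += 1
    if c.repetitions_in_paragraph >= 2:
        score += 2
    elif c.repetitions_in_paragraph == 1:
        score += 1
    if c.in_section_heading:
        score += 1
    if c.indefinite:
        score -= 1
    if c.in_prepositional_phrase:
        score -= 1
    return score


def best_candidate(candidates):
    """Pick the highest-scoring candidate; ties go to the earliest one listed."""
    return max(candidates, key=indicator_score)


printer = Candidate("the printer", first_np_in_sentence=True,
                    repetitions_in_paragraph=2, in_section_heading=True)
tray = Candidate("a paper tray", indefinite=True, in_prepositional_phrase=True)
print(indicator_score(printer), indicator_score(tray))  # 4 -2
print(best_candidate([tray, printer]).text)             # the printer
```

The tie-breaking rule in `best_candidate` is a convenience of this sketch only.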
“Vasisth” a Rule Based Anaphora Resolution System
1. mo:han(i) avanRe(i) kuttiye kantu.
   mohan he-poss child-acc see-pst
   (Mohan saw his child.)
2. mo:han(i) avanRe(i) kuttiye kantu ennu kRisnan paRannu.
   mohan he-poss child-acc see-pst compl krishnan say-pst
   (Krishnan said that Mohan saw his child.)
3. *mo:han(i) avane(i) aticcu.
   mohan he-acc beat-pst
   (Mohan beat him.)
4. mo:han avane(i) aticcu ennu kRisnan(i) paRannu.
   mohan he-acc beat-pst compl krishnan say-pst
   (Krishnan said that Mohan beat him.)
The Algorithm for Intra-sentential Anaphora
A pronoun P is coreferential with an NP iff the following conditions hold:
a. P and NP have compatible P, N, G (person, number, gender) features.
b. P does not precede NP.
c. If P is possessive, then NP is the subject of the clause which contains P.
d. If P is non-possessive, then NP is the subject of the immediate clause which does not contain P.
Vasisth is a multilingual Anaphora Resolution system
Rule based
With minimum parsing
Exploits the morphology of Indian languages
“VASISTH” Using Salience Measure for Indian Languages
No In-depth Parsing
Exploit the Rich Morphology of the Language
The analysis depends on the salience weight of the candidate (NP) for the antecedent-hood of an anaphor from a list of probable candidates.
The salience weight assignment
a) The current sentence gets a score of 50 and it reduces by 10 for each preceding sentence till it reaches the fifth sentence. The system considers five sentences for identifying the antecedent.
b) The current clause gets a score of 75 if the pronoun present in the clause is a possessive pronoun and if it is a non-possessive pronoun it gets zero score.
c) The immediate clause gets the score 70 in the case of Possessive pronoun and gets a score of 75 for non-possessive pronouns.
d) For non-immediate clause, the possessive pronoun gets a score of 30 and non-possessive pronoun gets a score of 65.
e) The analysis showed that the subject is the most probable antecedent for the pronoun. The subject of a sentence can take nominative or dative case marking: a nominative NP, a dative NP, or a possessive NP with a nominative/dative head can be the subject of a sentence.
f) The direct object of a sentence is identified by its case marking; all case markings other than those of the subject are considered for the object. The direct object is the next most probable NP for antecedent-hood and hence gets a score of 40.
g) The third NP in a clause, which is not identified as the subject or object, is considered the indirect object and gets a low score of 30.
Salience factor weights for Indian Languages
Salience Factors                     Weights
Current sentence                     50, reduced by 10 for each preceding sentence up to the 5th sentence
Possessive
  Current clause                     75
  Immediate clause                   70
  Non-immediate clause               30
Non-Possessive
  Current clause                     0
  Immediate clause                   75
  Non-immediate clause               65
Possessive and Non-Possessive
  N.Nom                              80
  N.Poss                             50
  N.Dat                              50
  N.Acc, Loc, Instr…                 40
  N.others (3rd NP)                  30
How it works
The salience weight to an NP is assigned in the following way
Identify the pronoun.
Consider four sentences above the sentence containing the pronoun.
Consider all the NPs preceding the pronoun (this is the general rule).
Here we also take some NPs which follow the pronoun, since Tamil and all Indian languages have relatively free word order.
Assign Salience Weights.
The NP which gets the maximum salience weight and agrees in PNG (person, number, gender) with the anaphor is considered the antecedent of the anaphor.
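The procedure can be sketched in a few lines using the weights listed above. This is an illustration under stated assumptions, not the Vasisth implementation: the `NP` class, the case labels, the clause labels and the function names are hypothetical, candidates are annotated by hand, and the clause weight is added only when a clause label is supplied, which is one possible reading of the weight list.

```python
from dataclasses import dataclass
from typing import Optional

# Clause weights, keyed by whether the pronoun is possessive.
CLAUSE_WEIGHTS = {
    True: {"current": 75, "immediate": 70, "non_immediate": 30},
    False: {"current": 0, "immediate": 75, "non_immediate": 65},
}
CASE_WEIGHTS = {"nom": 80, "poss": 50, "dat": 50, "acc": 40, "loc": 40, "instr": 40}
OTHER_NP_WEIGHT = 30  # third NP, neither subject nor object
WINDOW = 4            # sentences considered above the pronoun's sentence


@dataclass
class NP:
    text: str
    png: tuple                    # (person, number, gender)
    case: str
    sentences_back: int = 0       # 0 = same sentence as the pronoun
    clause: Optional[str] = None  # "current", "immediate" or "non_immediate"


def salience(cand: NP, possessive: bool) -> int:
    score = 50 - 10 * cand.sentences_back
    if cand.clause is not None:
        score += CLAUSE_WEIGHTS[possessive][cand.clause]
    return score + CASE_WEIGHTS.get(cand.case, OTHER_NP_WEIGHT)


def resolve(pronoun_png: tuple, possessive: bool, nps: list) -> Optional[NP]:
    """Return the PNG-compatible NP with the highest salience, or None."""
    pool = [c for c in nps
            if c.png == pronoun_png and 0 <= c.sentences_back <= WINDOW]
    return max(pool, key=lambda c: salience(c, possessive), default=None)


THIRD_SG_M = ("3", "sg", "m")
nps = [
    NP("mo:han", THIRD_SG_M, "nom", clause="current"),
    NP("kRisnan", THIRD_SG_M, "nom", clause="immediate"),
]
print(resolve(THIRD_SG_M, False, nps).text)  # kRisnan (non-possessive avane)
print(resolve(THIRD_SG_M, True, nps).text)   # mo:han (possessive avanRe)
```

With these hand annotations the sketch reproduces the indexing of the Malayalam examples shown earlier: the non-possessive pronoun picks the subject of the immediate clause, and the possessive pronoun picks the subject of its own clause.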
Tools
GATE
Java-RAP (pronouns)
GUITAR (Poesio & Kabadjov, 2004; Kabadjov, 2007)
BART (Versley et al., 2008)
Where is it required?
Machine Translation
Information Extraction
Summarization
And in almost all NLU applications
References
Massimo Poesio Slides: “Anaphora resolution for Practical task”
Ruslan Mitkov: “MARS a Knowledge Poor anaphora resolution system”
Thank You