
Discourse Structure and Co-Reference: An Empirical Study

Dan Cristea
Department of Computer Science, University "A.I. Cuza", Iași, Romania
[email protected]

Nancy Ide
Department of Computer Science, Vassar College, Poughkeepsie, NY, USA
[email protected]

Daniel Marcu
Information Sciences Institute and Department of Computer Science, University of Southern California, Los Angeles, CA, USA
[email protected]

Valentin Tablan
Department of Computer Science, University "A.I. Cuza", Iași, Romania
[email protected]

Abstract

We compare the potential of two classes of linear and hierarchical models of discourse to determine co-reference links and resolve anaphors. The comparison uses a corpus of thirty texts, which were manually annotated for co-reference and discourse structure.

1 Introduction

Most current anaphora resolution systems implement a pipeline architecture with three modules (Lappin and Leass, 1994; Mitkov, 1997; Kameyama, 1997).

1. A COLLECT module determines, for each anaphor (pronoun, definite noun, proper name, etc.), a list of potential antecedents (LPA) that have the potential to resolve it.

2. A FILTER module eliminates from the LPA the referees that are incompatible with the anaphor.

3. A PREFERENCE module determines the most likely antecedent on the basis of an ordering policy.

In most cases, the COLLECT module determines an LPA by enumerating all antecedents in a window of text that precedes the anaphor under scrutiny (Hobbs, 1978; Lappin and Leass, 1994; Mitkov, 1997; Kameyama, 1997; Ge et al., 1998). This window can be as small as two or three sentences or as large as the entire preceding text. The FILTER module usually imposes semantic constraints by requiring that the anaphor and potential antecedents have the same number and gender, that selectional restrictions are obeyed, etc. The PREFERENCE module imposes preferences on potential antecedents on the basis of their grammatical roles, parallelism, frequency, proximity, etc. In some cases, anaphora resolution systems implement these modules explicitly (Hobbs, 1978; Lappin and Leass, 1994; Mitkov, 1997; Kameyama, 1997). In other cases, these modules are integrated by means of statistical (Ge et al., 1998) or uncertainty reasoning techniques (Mitkov, 1997).

The fact that current anaphora resolution systems rely exclusively on the linear nature of texts in order to determine the LPA of an anaphor seems odd, given that several studies have claimed that there is a strong relation between discourse structure and reference (Sidner, 1981; Grosz and Sidner, 1986; Grosz et al., 1995; Fox, 1987; Vonk et al., 1992; Azzam et al., 1998; Hitzeman and Poesio, 1998). These studies claim, on the one hand, that the use of referents in naturally occurring texts imposes constraints on the interpretation of discourse; and, on the other, that the structure of discourse constrains the LPAs to which anaphors can be resolved. The oddness of the situation can be explained by the fact that both groups seem prima facie to be right. Empirical studies that employ linear techniques for determining the LPAs of anaphors report recall and precision anaphora resolution results in the range of 80% (Lappin and Leass, 1994; Ge et al., 1998). Empirical studies that investigated the relation between discourse structure and reference also claim that by exploiting the structure of discourse one has the potential of determining correct co-referential links for more than 80% of the referential expressions (Fox, 1987; Cristea et al., 1998), although to date, no discourse-based anaphora resolution system has been implemented. Since no direct comparison of these two classes of approaches has been made, it is difficult to determine which group is right, and what method is best.

In this paper, we attempt to fill this gap by empirically comparing the potential of linear and hierarchical models of discourse to correctly establish co-referential links in texts, and hence, their potential to correctly resolve anaphors. Since it is likely that both linear- and discourse-based anaphora resolution systems can implement similar FILTER and PREFERENCE strategies, we focus here only on the strategies that can be used to COLLECT lists of potential antecedents. Specifically, we focus on determining whether discourse theories can help an anaphora resolution system determine LPAs that are "better" than the LPAs that can be computed from a linear interpretation of texts. Section 2 outlines the theoretical assumptions of our empirical investigation. Section 3 describes our experiment. We conclude with a discussion of the results.

2 Background

2.1 Assumptions

Our approach is based on the following assumptions:

1. For each anaphor in a text, an anaphora resolution system must produce an LPA that contains a referent to which the anaphor can be resolved. The size of this LPA varies from system to system, depending on the theory a system implements.

2. The smaller the LPA (while retaining a correct antecedent), the less likely it is that errors in the FILTER and PREFERENCE modules will affect the ability of a system to select the appropriate referent.

3. Theory A is better than theory B for the task of reference resolution if theory A produces LPAs that contain more antecedents to which anaphors can be correctly resolved than theory B, and if the LPAs produced by theory A are smaller than those produced by theory B. For example, if for a given anaphor, theory A produces an LPA that contains a referee to which the anaphor can be resolved, while theory B produces an LPA that does not contain such a referee, theory A is better than theory B. Moreover, if for a given anaphor, theory A produces an LPA with two referees and theory B produces an LPA with seven referees (each LPA containing a referee to which the anaphor can be resolved), theory A is considered better than theory B because it has a higher probability of solving that anaphor correctly.

We consider two classes of models for determining the LPAs of anaphors in a text:

Linear-k models. This is a class of linear models in which the LPAs include all the references found in the discourse unit under scrutiny and the k discourse units that immediately precede it. Linear-0 models an approach that assumes that all anaphors can be resolved intra-unit; Linear-1 models an approach that corresponds roughly to centering (Grosz et al., 1995). Linear-k is consistent with the assumptions that underlie most current anaphora resolution systems, which look back k units in order to resolve an anaphor.

Discourse-VT-k models. In this class of models, LPAs include all the referential expressions found in the discourse unit under scrutiny and the k discourse units that hierarchically precede it. The units that hierarchically precede a given unit are determined according to Veins Theory (VT) (Cristea et al., 1998), which is described briefly below.
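The two COLLECT strategies can be sketched as follows. This is an illustrative sketch, not the authors' implementation; the `referents` table and the `dra` helper, which returns a unit's DRA in reverse order (the unit itself first), are hypothetical stand-ins.

```python
# Illustrative sketch (not the authors' implementation) of the two COLLECT
# strategies. `referents` maps a unit number to the referential expressions
# it contains; `dra` returns a unit's DRA in reverse order (the unit itself
# first). Both names are hypothetical stand-ins.

def linear_lpa(referents, unit, k):
    """Linear-k: referents of `unit` and of the k units that
    immediately precede it."""
    first = max(1, unit - k)
    return [r for u in range(first, unit + 1) for r in referents.get(u, [])]

def discourse_vt_lpa(referents, unit, k, dra):
    """Discourse-VT-k: referents of `unit` and of the k units that
    hierarchically precede it, as given by its DRA."""
    units = [unit] + [u for u in dra(unit) if u != unit][:k]
    return [r for u in units for r in referents.get(u, [])]
```

For the example in Figure 1, `discourse_vt_lpa` applied to unit 9 with k = 2 would draw referents from units 9, 8, and 1, whereas `linear_lpa` would draw them from units 9, 8, and 7.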

2.2 Veins Theory

VT extends and formalizes the relation between discourse structure and reference proposed by Fox (1987). It identifies "veins", i.e., chains of elementary discourse units, over discourse structure trees that are built according to the requirements put forth in Rhetorical Structure Theory (RST) (Mann and Thompson, 1988).

One of the conjectures of VT is that the vein expression of an elementary discourse unit provides a coherent "abstract" of the discourse fragment that contains that unit. Because such a fragment is internally coherent, all anaphors and referential expressions (REs) in a unit must be resolved to referees that occur in the text subsumed by the units in its vein. This conjecture is consistent with Fox's view (1987) that the units that contain referees to which anaphors can be resolved are determined by the nuclearity of the discourse units that precede the anaphors and by the overall structure of the discourse. According to VT, REs of both satellites and nuclei can access referees of immediately preceding nucleus nodes. REs of nuclei can only access referees of preceding nuclei nodes and of directly subordinated satellite nodes. And the interposition of a nucleus after a satellite blocks the accessibility of the satellite for all nodes that are lower in the corresponding discourse structure (see (Cristea et al., 1998) for a full definition).

Hence, the fundamental intuition underlying VT is that the RST-specific distinction between nuclei and satellites constrains the range of referents to which anaphors can be resolved; in other words, the nucleus-satellite distinction induces for each anaphor (and each referential expression) a Domain of Referential Accessibility (DRA). For each anaphor a in a discourse unit u, VT hypothesizes that a can be resolved by examining referential expressions that were used in a subset of the discourse units that precede u; this subset is called the DRA of u. For any elementary unit u in a text, the corresponding DRA is computed automatically from the rhetorical representation of that text in two steps:

1. Heads for each node are computed bottom-up over the rhetorical representation tree. Heads of elementary discourse units are the units themselves. Heads of internal nodes, i.e., discourse spans, are computed by taking the union of the heads of the immediate child nodes that are nuclei. For example, for the text in Figure 1, whose rhetorical structure is shown in Figure 2, the head of span [5,7] is unit 5 because the head of the immediate nucleus, the elementary unit 5, is 5. However, the head of span [6,7] is the list ⟨6, 7⟩ because both immediate children are nuclei of a multinuclear relation.

2. Using the results of step 1, vein expressions are computed top-down for each node in the tree. The vein of the root is its head. Veins of child nodes are computed recursively according to the rules described by Cristea et al. (1998). The DRA of a unit u is given by the units in the vein that precede u.

For example, for the text and RST tree in Figures 1 and 2, the vein expression of unit 3, which contains units 1 and 3, suggests that anaphors from unit 3 should be resolved only to referential expressions in units 1 and 3. Because unit 2 is a satellite to unit 1, it is considered to be "blocked" to referential links from unit 3. In contrast, the DRA of unit 9, consisting of units 1, 8, and 9, reflects the intuition that anaphors from unit 9 can be resolved only to referential expressions from unit 1, which is the most important unit in span [1,7], and to unit 8, a satellite that immediately precedes unit 9. Figure 2 shows the heads and veins of all internal nodes in the rhetorical representation.

Figure 1: An example of text and its elementary units. The referential expressions surrounded by boxes and ellipses correspond to two distinct co-referential equivalence classes. Referential expressions surrounded by boxes refer to Mr. Casey; those surrounded by ellipses refer to Genetic Therapy, Inc.

2.3 Comparing models

The premise underlying our experiment is that there are potentially significant differences in the size of the search space required to resolve referential expressions when using Linear models vs. Discourse-VT models. For example, for the text and RST tree in Figures 1 and 2, the Discourse-VT model narrows the search space required to resolve the anaphor the smaller company in unit 9. According to VT, we look for potential antecedents for the smaller company in the DRA of unit 9, which lists units 1, 8, and 9. The antecedent Genetic Therapy, Inc. appears in unit 1; therefore, using VT we search back 2 units (units 8 and 1) to find a correct antecedent. In contrast, to resolve the same reference using a linear model, four units (units 8, 7, 6, and 5) must be examined before Genetic Therapy is found. Assuming that referential links are established as the text is processed, Genetic Therapy would be linked back to the pronoun its in unit 2, which would in turn be linked to the first occurrence of the antecedent, Genetic Therapy, Inc., in unit 1, the antecedent determined directly by using VT.

Figure 2: The RST analysis of the text in Figure 1. The tree is represented using the conventions proposed by Mann and Thompson (1988).

In general, when hierarchical adjacency is considered, an anaphor may be resolved to a referent that is not the closest in a linear interpretation of a text. Similarly, a referential expression can be linked to a referee that is not the closest in a linear interpretation of a text. However, this does not create problems because we are focusing here only on co-referential relations of identity (see section 3). Since these relations induce equivalence classes over the set of referential expressions in a text, it is sufficient that an anaphor or referential expression is resolved to any of the members of the relevant equivalence class. For example, according to VT, the referential expression Mr. Casey in unit 5 in Figure 1 can be linked directly only to the referee Mr. Casey in unit 1, because the DRA of unit 5 is {1, 5}. By considering the co-referential links of the REs in the other units, the full equivalence class can be determined. This is consistent with the distinction between "direct" and "indirect" references discussed by Cristea et al. (1998).

3 The Experiment

3.1 Materials

We used thirty newspaper texts whose lengths varied widely; the mean μ is 408 words and the standard deviation σ is 376. The texts were annotated manually for co-reference relations of identity (Hirschman and Chinchor, 1997). The co-reference relations define equivalence classes on the set of all marked referents in a text. The texts were also manually annotated with discourse structures built in the style of Mann and Thompson (1988). Each analysis yielded an average of 52 elementary discourse units. Details of the discourse annotation process are given in (Marcu et al., 1999).

3.2 Comparing potential to establish co-referential links

3.2.1 Method

The annotations for co-reference relations and rhetorical structure trees for the thirty texts were fused, yielding representations that reflect not only the discourse structure, but also the co-reference equivalence classes specific to each text. Based on this information, we evaluated the potential of each of the two classes of models discussed in section 2 (Linear-k and Discourse-VT-k) to correctly establish co-referential links as follows: for each model, each k, and each marked referential expression a, we determined whether or not the corresponding LPA (defined over k elementary units) contained a referee from the same equivalence class. For example, for the Linear-2 model and the referential expression the smaller company in unit 9, we estimated whether a co-referential link could be established between the smaller company and another referential expression in units 7, 8, or 9. For the Discourse-VT-2 model and the same referential expression, we estimated whether a co-referential link could be established between the smaller company and another referential expression in units 1, 8, or 9, which correspond to the DRA of unit 9.

To enable a fair comparison of the two models, when k is larger than the size of the DRA of a given unit, we extend that DRA using the closest units that precede the unit under scrutiny and are not already in the DRA. Hence, for the Linear-3 model and the referential expression the smaller company in unit 9, we estimate whether a co-referential link can be established between the smaller company and another referential expression in units 6, 7, 8, or 9. For the Discourse-VT-3 model and the same referential expression, we estimate whether a co-referential link can be established between the smaller company and another referential expression in units 1, 8, 9, or 7, which correspond to the DRA of unit 9 (units 1, 8, and 9) and to unit 7, the closest unit preceding unit 9 that is not in its DRA.

For the Discourse-VT-k models, we assume that the Extended DRA (EDRA) of size k of a unit u, EDRA_k(u), is given by the first l ≤ k units of a sequence that lists, in reverse order, the units of the DRA of u, plus the k − l units that precede u but are not in its DRA. For example, for the text in Figure 1, the following relations hold: EDRA_0(9) = 9; EDRA_1(9) = 9, 8; EDRA_2(9) = 9, 8, 1; EDRA_3(9) = 9, 8, 1, 7; EDRA_4(9) = 9, 8, 1, 7, 6. For Linear-k models, EDRA_k(u) is given by u and the k units that immediately precede u.
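The EDRA construction can be sketched as follows, under the assumption of a hypothetical `dra` helper that lists a unit's DRA in reverse order (the unit itself first), e.g. dra(9) = [9, 8, 1] for the tree in Figure 2.

```python
# Sketch of EDRA_k(u). The hypothetical helper `dra` lists the DRA of a
# unit in reverse order, the unit itself first.

def edra(unit, k, dra):
    """The unit itself plus up to k more units: first the reversed DRA,
    then the closest preceding units not already in the DRA."""
    seq = list(dra(unit))[:k + 1]
    u = unit
    while len(seq) < k + 1 and u > 1:
        u -= 1
        if u not in dra(unit):
            seq.append(u)
    return seq
```

With `dra(9) = [9, 8, 1]`, this reproduces the relations above: `edra(9, 3, dra)` gives `[9, 8, 1, 7]` and `edra(9, 4, dra)` gives `[9, 8, 1, 7, 6]`.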

The potential p(M, a, EDRA_k) of a model M to determine correct co-referential links with respect to a referential expression a in unit u, given a corresponding EDRA of size k, EDRA_k(u), is assigned the value 1 if the EDRA contains a co-referent from the same equivalence class as a. Otherwise, p(M, a, EDRA_k) is assigned the value 0. The potential p(M, C, k) of a model M to determine correct co-referential links for all referential expressions in a corpus of texts C, using EDRAs of size k, is computed as the sum of the potentials p(M, a, EDRA_k) of all referential expressions a in C. This potential is normalized to a value between 0 and 1 by dividing p(M, C, k) by the number of referential expressions in the corpus that have an antecedent.
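The normalized potential can be sketched as follows. The helpers are hypothetical: `edra_of(u, k)` returns the model's EDRA as a list of units, `refs_in(u)` the referential expressions of unit u, and `equiv(r)` the co-reference equivalence class of r; for simplicity, `anaphors` is assumed to contain only expressions that have an antecedent, so its length doubles as the normalization count.

```python
# Sketch of the normalized potential p(M, C, k); all helper names are
# hypothetical stand-ins for the annotated corpus data.

def potential(anaphors, k, edra_of, refs_in, equiv):
    """Fraction of (expression, unit) pairs whose EDRA_k contains a
    co-referent from the same equivalence class."""
    hits = 0
    for a, unit in anaphors:
        lpa = [r for u in edra_of(unit, k) for r in refs_in(u) if r != a]
        if any(equiv(r) == equiv(a) for r in lpa):
            hits += 1
    return hits / len(anaphors)
```

Computed per text and per k, these values are what the paired comparisons in section 3.2.3 operate on.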

By examining the potential of each model to correctly determine co-referential expressions for each k, it is possible to determine the degree to which an implementation of a given approach can contribute to the overall efficiency of anaphora resolution systems. That is, if a given model has the potential to correctly determine a significant percentage of co-referential expressions with small DRAs, an anaphora resolution system implementing that model will have to consider fewer options overall. Hence, the probability of error is reduced.

3.2.2 Results

The graph in Figure 3 shows the potentials of the Linear-k and Discourse-VT-k models to correctly determine co-referential links for each k from 1 to 20. The graph in Figure 4 represents the same potentials but focuses only on ks in the interval [2,9]. As these two graphs show, the potentials increase monotonically with k, the VT-k models always doing better than the Linear-k models. Eventually, for large ks, the potential performance of the two models converges to 100%.

The graphs in Figures 3 and 4 also suggest resolution strategies for implemented systems. For example, the graphs suggest that by choosing to work with EDRAs of size 7, a discourse-based system has the potential of resolving more than 90% of the co-referential links in a text correctly. To achieve the same potential, a linear-based system needs to look back 8 units. If a system does not look back at all and attempts to resolve co-referential links only within the unit under scrutiny (k = 0), it has the potential to correctly resolve about 40% of the co-referential links.

To provide a clearer idea of how the two models differ, Figure 5 shows, for each k, the value of the Discourse-VT-k potentials divided by the value of the Linear-k potentials. For k = 0, the potentials of both models are equal because both use only the unit in focus in order to determine co-referential links. For k = 1, the Discourse-VT-1 model is about 7% better than the Linear-1 model. As the value of k increases, the value Discourse-VT-k/Linear-k converges to 1.

Figure 3: The potential of Linear-k and Discourse-VT-k models to determine correct co-referential links (1 ≤ k ≤ 20).

Figure 4: The potential of Linear-k and Discourse-VT-k models to determine correct co-referential links (2 ≤ k ≤ 9).

In Figures 6 and 7, we display the number of exceptions, i.e., co-referential links that the Discourse-VT-k and Linear-k models cannot determine correctly. As one can see, over the whole corpus, for small ks, the Discourse-VT-k models have the potential to determine correctly about 100 more co-referential links than the Linear-k models. As k increases, the performance of the two models converges.

Figure 5: A direct comparison of Discourse-VT-k and Linear-k potentials to correctly determine co-referential links (1 ≤ k ≤ 20).

Figure 6: The number of co-referential links that cannot be correctly determined by Discourse-VT-k and Linear-k models (1 ≤ k ≤ 9).

3.2.3 Statistical Significance

In order to assess the statistical significance of the difference between the potentials of the two models to establish correct co-referential links, we carried out a Paired-Samples T Test for each k. In general, a Paired-Samples T Test checks whether the mean of casewise differences between two variables differs from 0. For each text in the corpus and each k, we determined the potentials of both the VT-k and Linear-k models to establish correct co-referential links in that text. For ks smaller than 4, the difference in potentials was statistically significant (paired tests, df = 29, P < 0.05). For values of k larger than or equal to 4, the difference was no longer significant. These results are consistent with the graphs shown in Figures 3 to 7, which all show that the potentials of the Discourse-VT-k and Linear-k models converge to the same value as the value of k increases.

Figure 7: The number of co-referential links that cannot be correctly determined by Discourse-VT-k and Linear-k models (10 ≤ k ≤ 20).

3.3 Comparing the effort required to establish co-referential links

3.3.1 Method

The method described in section 3.2.1 estimates the potential of Linear-k and Discourse-VT-k models to determine correct co-referential links by treating EDRAs as sets. However, from a computational perspective (and presumably, from a psycholinguistic perspective as well) it also makes sense to compare the effort required by the two classes of models to establish correct co-referential links. We estimate this effort using a very simple metric that assumes that the closer an antecedent is to a corresponding referential expression in the EDRA, the better. Hence, in estimating the effort to establish a co-referential link, we treat EDRAs as ordered lists. For example, using the Linear-9 model, to determine the correct antecedent of the referential expression the smaller company in unit 9 of Figure 1, it is necessary to search back through 4 units (to unit 5, which contains the referent Genetic Therapy). Had unit 5 been Mr. Casey succeeds M. James Barrett, 50, we would have had to go back 8 units (to unit 1) in order to correctly resolve the RE the smaller company. In contrast, in the Discourse-VT-9 model, we go back only 2 units, because unit 1 is two units away from unit 9 (EDRA_9(9) = 9, 8, 1, 7, 6, 5, 4, 3, 2).

We consider that the effort e(M, a, EDRA_k) of a model M to determine correct co-referential links with respect to one referential expression a in unit u, given a corresponding EDRA of size k, EDRA_k(u), is given by the number of units between u and the first unit in EDRA_k(u) that contains a co-referential expression of a.

The effort e(M, C, k) of a model M to determine correct co-referential links for all referential expressions in a corpus of texts C using EDRAs of size k was computed as the sum of the efforts e(M, a, EDRA_k) of all referential expressions a in C.

Figure 8: The effort required by Linear-k and Discourse-VT-k models to determine correct co-referential links (1 ≤ k ≤ 20).
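Treating EDRAs as ordered lists, the effort metric can be sketched as follows (the helpers `edra_of`, `refs_in`, and `equiv` are hypothetical stand-ins; an expression with no co-referent in its EDRA is charged k, as described in section 3.3.2):

```python
# Sketch of the effort metric over ordered EDRAs; helper names are
# hypothetical stand-ins for the annotated corpus data.

def effort_one(a, unit, k, edra_of, refs_in, equiv):
    """Number of units between `unit` and the first EDRA unit that
    contains a co-referential expression of `a`; k if none exists."""
    for dist, u in enumerate(edra_of(unit, k)):
        if any(r != a and equiv(r) == equiv(a) for r in refs_in(u)):
            return dist
    return k

def effort(anaphors, k, edra_of, refs_in, equiv):
    """Total effort over a corpus: the sum of per-expression efforts."""
    return sum(effort_one(a, u, k, edra_of, refs_in, equiv)
               for a, u in anaphors)
```

For the smaller company in unit 9, with EDRA_9(9) = 9, 8, 1, ..., `effort_one` returns 2, matching the hand computation in section 3.3.1.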

3.3.2 Results

Figure 8 shows the Discourse-VT-k and Linear-k efforts computed over all referential expressions in the corpus and all ks. It is possible, for a given referent a and a given k, that no co-referential link exists in the units of the corresponding EDRA_k. In this case, we consider that the effort is equal to k. As a consequence, for small ks the effort required to establish co-referential links is similar for both theories, because both can establish only a limited number of links. However, as k increases, the effort computed over the entire corpus diverges dramatically: using the Discourse-VT model, the search space for co-referential links is reduced by about 800 units for a corpus containing roughly 1200 referential expressions.

3.3.3 Statistical significance

A Paired-Samples T Test was performed for each k. For each text in the corpus and each k, we determined the effort of both the VT-k and Linear-k models to establish correct co-referential links in that text. For all ks the difference in effort was statistically significant (paired tests, df = 29, P < 0.05). These results are intuitive: because EDRAs are treated as ordered lists and not as sets, the effect of the discourse structure on establishing correct co-referential links is not diminished as k increases.

4 Conclusion

We analyzed empirically the potentials of discourse and linear models of text to determine co-referential links. Our analysis suggests that by exploiting the hierarchical structure of texts, one can increase the potential of natural language systems to correctly determine co-referential links, which is a requirement for correctly resolving anaphors. If one treats all discourse units in the preceding discourse equally, the increase is statistically significant only when a discourse-based co-reference system looks back at most four discourse units in order to establish co-referential links. However, if one assumes that proximity plays an important role in establishing co-referential links and that referential expressions are more likely to be linked to referees that were used recently in discourse, the increase is statistically significant no matter how many units a discourse-based co-reference system looks back in order to establish co-referential links.

Acknowledgements. We are grateful to Lynette Hirschman and Nancy Chinchor for making available their corpus of co-reference annotations. We are also grateful to Graeme Hirst for comments and feedback on a previous draft of this paper.

References

Saliha Azzam, Kevin Humphreys, and Robert Gaizauskas. 1998. Evaluating a focus-based approach to anaphora resolution. In Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics and of the 17th International Conference on Computational Linguistics (COLING/ACL'98), pages 74–78, Montreal, Canada, August 10–14.

Dan Cristea, Nancy Ide, and Laurent Romary. 1998. Veins theory: A model of global discourse cohesion and coherence. In Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics and of the 17th International Conference on Computational Linguistics (COLING/ACL'98), pages 281–285, Montreal, Canada, August.

Barbara Fox. 1987. Discourse Structure and Anaphora. Cambridge Studies in Linguistics; 48. Cambridge University Press.

Niyu Ge, John Hale, and Eugene Charniak. 1998. A statistical approach to anaphora resolution. In Proceedings of the Sixth Workshop on Very Large Corpora, pages 161–170, Montreal, Canada, August 15–16.

Barbara J. Grosz and Candace L. Sidner. 1986. Attention, intentions, and the structure of discourse. Computational Linguistics, 12(3):175–204, July–September.

Barbara J. Grosz, Aravind K. Joshi, and Scott Weinstein. 1995. Centering: A framework for modeling the local coherence of discourse. Computational Linguistics, 21(2):203–226, June.

Lynette Hirschman and Nancy Chinchor. 1997. MUC-7 Coreference Task Definition, July 13.

Janet Hitzeman and Massimo Poesio. 1998. Long distance pronominalization and global focus. In Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics and of the 17th International Conference on Computational Linguistics (COLING/ACL'98), pages 550–556, Montreal, Canada, August.

Jerry R. Hobbs. 1978. Resolving pronoun references. Lingua, 44:311–338.

Megumi Kameyama. 1997. Recognizing referential links: An information extraction perspective. In Proceedings of the ACL/EACL'97 Workshop on Operational Factors in Practical, Robust Anaphora Resolution, pages 46–53.

Shalom Lappin and Herbert J. Leass. 1994. An algorithm for pronominal anaphora resolution. Computational Linguistics, 20(4):535–561.

William C. Mann and Sandra A. Thompson. 1988. Rhetorical structure theory: Toward a functional theory of text organization. Text, 8(3):243–281.

Daniel Marcu, Estibaliz Amorrortu, and Magdalena Romera. 1999. Experiments in constructing a corpus of discourse trees. In Proceedings of the ACL'99 Workshop on Standards and Tools for Discourse Tagging, University of Maryland, June 22.

Ruslan Mitkov. 1997. Factors in anaphora resolution: They are not the only things that matter. A case study based on two different approaches. In Proceedings of the ACL/EACL'97 Workshop on Operational Factors in Practical, Robust Anaphora Resolution, pages 14–21.

Candace L. Sidner. 1981. Focusing for interpretation of pronouns. Computational Linguistics, 7(4):217–231, October–December.

Wietske Vonk, Lettica G.M.M. Hustinx, and Wim H.G. Simons. 1992. The use of referential expressions in structuring discourse. Language and Cognitive Processes, 7(3,4):301–333.