
Proceedings of the sixth

ESSLLI Student Session

August 2001, Helsinki

Kristina Striegnitz (Ed.)


Preface

The student session of the European Summer School in Logic, Language and Information provides a forum where students at any level can present their own work to an international audience of researchers and other students. It aims at creating an environment where students can, on the one hand, discuss their ideas with experts in the area and get helpful and inspiring feedback on their work, and, on the other hand, meet and exchange ideas with other students. Submissions are invited in any of the six ESSLLI subject areas: logic, language, computation, logic & language, logic & computation, and language & computation.

It is now the sixth time that a student session is organized as part of the ESSLLI. As in the previous years, the number of submissions reflects the interest that exists in the ESSLLI student session: 63 submissions from all over the world were received this year. The program committee accepted 18 papers for a 30-minute presentation and 11 for a poster presentation, 28 of which appear in this volume.

I am very grateful to the program committee: to the co-chairs, who did an excellent job in finding reviewers and coordinating the reviewing in their section, and to the area experts who assisted us with their expertise and experience.

Of course, the student session wouldn't have been possible without the numerous reviewers. I want to thank them for taking the time to read and comment on the submitted papers, which not only helped us in making the final decision, but is also important as feedback for the student authors.

Finally, I would like to thank everybody who supported me with help and advice, especially Geert-Jan Kruijff, Ivana Kruijff-Korbayova, and Ahti Pietarinen of the local organizing committee.

I am looking forward to the student session in Helsinki, which promises many interesting presentations and stimulating discussions. I hope that it will be a success for all participants.

Kristina Striegnitz
Saarbrücken, June 2001
ESSLLI 2001 Student Session Chair


Program Committee

Logic

Ani Nenkova, Columbia University, USA
Yde Venema, University of Amsterdam, Netherlands

Language

Malvina Nissim, University of Pavia, Italy
Ruth Kempson, King's College London, UK

Computation

Jan Schwinghammer, Saarland University, Germany
Gilles Dowek, INRIA Rocquencourt, France

Logic & Language

Raffaella Bernardi, University of Utrecht, Netherlands
Patrick Blackburn, INRIA Lorraine, France

Logic & Computation

Carsten Lutz, University of Aachen, Germany
Ilkka Niemelä, Helsinki University of Technology, Finland

Language & Computation

Susanne Salmon-Alt, INRIA Lorraine, France
Shuly Wintner, University of Haifa, Israel

Reviewers

Marco Aiello, Jan Alexanderson, Thorsten Altenkirch, Elisabeth André, Hiroshi Aoyama, Carlos Areces, Mira Ariel, Alessandro Artale, Paolo Baldan, Jörg Bauer, Michael Beeson, Lev Beklemishev, Agnes Bende-Farkas, Brandon Bennett, Claire Beyssade, Jean-Yves Beziau, Patrick Blackburn, Ulrich Bodenhofer, Eerke Boiten, Johan Bos, Sebastian Brandt, Bryson Brown, Armelle Brun, Silvia Bruti, Mary Buchholtz, Isabella Burger, Miriam Butt, Jean Carletta, Stephen Clark, Francis Corblin, John Constable, Gilles Dowek, Antonin Dvorak, Regine Eckhardt, Jan van Eijck, Katrin Erk, Tim Fernando, Olivier Ferret, Melvin Fitting, Christophe Fouquere, Nissim Francez, Enrico Franconi, Anette Frank, Bertrand Gaiffe, Kim Gerdes, Bart Geurts, Chiara Ghidini, Valentin Goranko, Bernd Grobauer, John Hannan, Roland Hausser, Dominik Heckmann, Klaus von Heusinger, Dirk Heylen, Colin Hirsch, Nancy Ide, Rosalie Iemhoff, Amar Isli, Gerhard Jäger, Tomi Janhunen, Maarten Janssen, Sylvain Kahane, Ruth Kempson, Steve King, Bartek Klin, Alexander Koller, Nobo Komagata, Geert-Jan Kruijff, Ivana Kruijff-Korbayova, Daniela Kurz, Celine Kuttler, François Lamarche, Frédéric Landragin, Maria Lapata, Alain Lecomte, Patrice Lopez, Carsten Lutz, Katja Markert, Maarten Marx, Tomoko Matsui, Erica Melis, Daniel Moldt, Paola Monachesi, Christof Monz, Richard Moot, Glyn Morrill, Stefan Müller, Daniele Nardi, Ilkka Niemelä, Oystein Nilsen, Malvina Nissim, Rick Nouwen, Richard Oehrle, Martin Otto, Valeria de Paiva, Simon Parsons, Lawrence C. Paulson, Nikolay Pelov, Gerald Penn, Massimo Poesio, François Pottier, Stephanie Pouchot, Owen Rambow, Anne Reboul, Steve Reeves, Philip Resnik, Olivier Ridoux, Laurent Romary, Andreas Rossberg, Louisa Sadler, Christer Samuelsson, Anoop Sarkar, Ulrike Sattler, Marina Sbisa, Peter Schotch, Svetlana Sheremetyeva, Patrick St.-Dizier, Maarten Stol, Jiri Sustal, Mariet Theune, Yannick Toussaint, David Traum, Dimiter Vakarelov, Achille C. Varzi, Yde Venema, Andrei Voronkov, Peter Wiemer-Hastings, Frank Wolter, Paul Wong, Michael Zock


Contents

David Ahn

Computing adverbial quantifier domains

Henrik Bergqvist

Grounding in the Swedish Map-task

Manuel Bodirsky, Tobias Gärtner, Timo von Oertzen,

and Jan Schwinghammer

Computing the Density of Regular Languages

Vladimir Brezhnev

On the Logic of Proofs

Balder ten Cate

Information exchange as reduction

Martine De Cock

Fuzzy Hedges: a Next Generation

Bridget Copley

One is Enough: The Case Against Aspectual Proliferation

Chris Cornelis and Glad Deschrijver

The Compositional Rule of Inference in an Intuitionistic Fuzzy Logic Setting

Ralph Debusmann

Movement as well-formedness conditions

Marta Garcia Matos

On Interpolation and Model Theoretic Characterization of Logics

Elsi Kaiser

A Dynamic Approach to Referent Tracking and Pronoun Resolution


Elena Karagjosova

Towards a comprehensive meaning of German doch

Simon Keizer

Dialogue Act Modelling Using Bayesian Networks

Roman V. Konchakov

On the Semantics of Concurrent Programming Languages: An Automata-Theoretic Approach

Stasinos Konstantopoulos

Learning Phonotactics Using ILP

Christian Korthals

Self Embedded Relative Clauses in a Corpus of German Newspaper Texts

Ewen Maclean

Automating Proof in Non-standard Analysis

Kundan Misra

On LPetri nets

Tara Nicholson

A weakening of chromatic number

Rick Nouwen

A plural resolution logic

Jerome Piat

Relational Concept Analysis for Structural Disambiguation

György Rákosi

A Model-Based Semantics of the Mood Morphemes in Hungarian

Saeed Salehi

Unprovability of Herbrand Consistency in Weak Arithmetics

Chung-chieh Shan

Monads for natural language semantics


Isidora Stojanovic

Incomplete Definite Descriptions, Demonstrative Completion and Redundancy

Jan Westerhoff

Belief Fragments and Ontological Categories

Naoki Yoshinaga and Yusuke Miyao

Grammar conversion from LTAG to HPSG

Evgeni Zolin

Infinitary Expressibility of Necessity in Terms of Contingency


Computing adverbial quantifier domains

David Ahn

University of Rochester, Computer Science Department

[email protected]

Abstract. This paper describes a method for computing the domain of quantification of an adverbially quantified sentence. This method relies on the accommodation of presuppositions in the scope of a quantificational adverb and on the resolution of the domain in context. This paper also describes a computational system for processing such sentences based on this method.

1 Introduction

This paper is concerned with the computation of the logical form of ad-verbially quanti�ed sentences, which are those sentences modi�ed by anexplicit quanti�cational adverb (qadverb), such as always, usually, some-times, or never. The primary challenge in computing the logical form ofan adverbially quanti�ed sentence (which we will call qadverb sentences)is determining the domain of quanti�cation (there are additional challengesassociated with interpreting such a logical form, including the strange modalnature and quanti�cational force of generics, but we are leaving those asidefor this paper). We propose that qadverbs quantify over situations whichare restricted both by presuppositions of the scope of the qadverb and bycontext. Furthermore, we propose that such domain restrictions can be com-puted by a method based on the presupposition resolution algorithm of vander Sandt (1992), and we demonstrate this with a DRT-based grammar andparser based on the system of Blackburn and Bos (1999).

2 Quantificational adverbs

Syntactically, a qadverb can modify either a verb phrase or a sentence:

(1) John usually takes out the trash.

(2) Usually, John takes out the trash.

This work was supported by NSF research grant IIS-0082928.

Proceedings of the Sixth ESSLLI Student Session, Kristina Striegnitz (editor). Chapter 1, copyright © 2001, David Ahn.


Semantically, however, we take them to be sentential operators, because they give rise to scope ambiguities with quantified noun phrases (NPs):

(3) Someone usually takes out the trash.

In fact, we take the logical form of a qadverb sentence to be similar to that of a sentence containing a quantified NP. The qadverb, like a quantificational determiner, corresponds to an operator which takes three arguments. The first argument is a discourse marker, which serves as the variable of quantification. The second argument (the restrictor) indicates the domain of quantification; the third argument (the nuclear scope) indicates the predication which is asserted of the members of the domain. In the DRT-based logical form language, the second and third arguments are each a DRS. The variable of quantification is introduced in the restrictor DRS, which is accessible to the nuclear scope DRS, much like the arguments of a conditional operator. For a quantificational determiner, the restrictor is provided by the remainder of the NP, and the nuclear scope, by the remainder of the sentence. For a qadverb, the situation is not so straightforward.
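The three-part structure just described can be sketched as a small data type; the class and field names below are illustrative only and are not taken from the paper's implementation.

```python
from dataclasses import dataclass, field

@dataclass
class DRS:
    """A discourse representation structure: discourse markers plus conditions."""
    markers: list = field(default_factory=list)
    conditions: list = field(default_factory=list)

@dataclass
class QCond:
    """The three-argument operator introduced by a qadverb (or a
    quantificational determiner): a variable of quantification, a
    restrictor DRS that introduces it, and a nuclear scope DRS."""
    quantifier: str  # e.g. "usually"
    variable: str    # the discourse marker quantified over
    restrictor: DRS  # introduces the variable; accessible to the scope
    scope: DRS       # the predication asserted of the domain members

# "John usually takes out the trash": for a qadverb, the restrictor
# starts out empty apart from the variable of quantification.
usually = QCond(
    quantifier="usually",
    variable="S",
    restrictor=DRS(markers=["S"]),
    scope=DRS(conditions=[("take-out-trash", "john", "S")]),
)
```

An initially empty restrictor is exactly what makes qadverbs harder than determiners: the resolution process described below must fill it in.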

Unlike a quantificational determiner, there is no syntactically determined relationship between a qadverb and constituents which might provide its restrictor. The earliest work on qadverb sentences (starting with Lewis, 1975) focuses on examples in which an if- or when-clause provides the restrictor:

(4) If a man owns a donkey, he always beats it.

Without such a clause (and sometimes, even with one), any part of a qadverb sentence may map to the restrictor argument. Milsark (1974) provides an oft-cited example:

(5) Typhoons arise in this part of the Pacific.

This sentence has two readings. One is an implausible reading in which the set of typhoons serves as the restrictor argument; on such a reading, it is taken to be a general property of typhoons that they arise in a particular part of the Pacific. In the other, more natural, reading, the restrictor argument is the set of situations involving the indicated part of the Pacific; on such a reading, the occasional arising of a typhoon is taken to be a property of situations in a particular part of the Pacific. De Swart (1991) and Rooth (1985) give examples of sentences in which even an explicit when-clause does not necessarily map to the restrictor:

(6) When John was young, he often took walks in the gardens.

(7) John usually shaves when he showers.

In (6), the when-clause only serves to set the situation in time. In (7), focal stress determines whether the when-clause maps to the restrictor.


Focal stress and explicit if- and when-clauses are just two of the many factors that have been implicated in the determination of the restrictor argument of a qadverb. Diesing (1992) proposes with her Mapping Hypothesis that it is syntactic structure which is primarily responsible for determining what overt material (in particular, which NPs) in a qadverb sentence is associated with the qadverb restrictor. Her theory requires the syntactic machinery of GB and fails to take into account much of the semantic complexity of the data. Opposed to this syntactic view are those who claim that it is the topic-focus or topic-comment structure of a sentence that determines its division into restrictor and matrix. According to this view, the topic of a qadverb sentence is its restrictor; its focus, the matrix (proponents of this approach include Chierchia, 1992 and Jäger, 1997; see Partee, 1995 for a list of others). This approach is closely related to Rooth's observations on the association of qadverb restrictors with focal stress (exemplified by (7), above). Cohen (1996) proposes that qadverbs quantify over appropriate sets of alternatives which may be generated in a variety of ways, including by focal stress or by presuppositions.

Several other authors have noticed that presuppositions of the scope of a qadverb are incorporated into its restrictor (Schubert and Pelletier 1987; Berman 1991):

(8) John usually beats Marvin at ping pong.

(9) John usually regrets missing a lecture by Chomsky.

Schubert and Pelletier note that in a qadverb sentence with a presuppositional verb, such as beat, the qadverb quantifies only over situations which satisfy the presupposition associated with the verb (in the case of (8), situations in which John plays Marvin at ping pong). Berman provides examples for other kinds of presuppositions, such as factives (regrets in (9)) and aspectual verbs. The range of presuppositional material considered by Schubert and Pelletier and by Berman is somewhat narrow, and both consider presupposition incorporation to be only a part of the process of determining a qadverb restrictor. When we consider the full range of presuppositions which may be accommodated by a qadverb restrictor, however, we find that most of the overt material which has been claimed to be incorporated into the restrictor consists of presuppositions of the scope.

3 Presuppositions

A wide range of linguistic phenomena gives rise to presuppositions. Keenan (1971) includes as examples of presupposition triggers: definite descriptions, factive predicates, cleft constructions, selectional restrictions, temporal subordinate clauses, certain aspectual verbs, iteratives, and presuppositional adverbs; to this list, van der Sandt (1988) adds focal stress, lexical presupposition, and quantifiers. Additionally, Milsark (1977) and Fodor and Sag (1982) distinguish between two readings of indefinites, one of which corresponds to existential quantification and the other of which is essentially presuppositional. If we simply allow that sentential presuppositions are what determine a qadverb restrictor, we have an account which subsumes the above-mentioned accounts. Explicit if- and when-clauses fall together with other presupposed subordinate clauses. The NPs with which Diesing is concerned fall into one of the following classes: definite descriptions, which are the paradigm case of presuppositionality; quantified NPs, which are taken to presuppose their domains; and indefinites, which on certain interpretations are presuppositional. The most illustrative cases of topic-focus articulation (cleft constructions and focal stress) are also subsumed as presuppositions.

In order to make our presuppositional theory work, we need a separate account of presuppositions. The most widely accepted account of presuppositional phenomena is that of van der Sandt (1992), which is well-suited to our purposes. Van der Sandt proposes that presuppositions are simply informationally rich anaphors. Like an ordinary anaphor, a presupposition requires an antecedent, and if one is found in the appropriate context, then the presupposition is satisfied. Unlike an ordinary anaphor, though, the inability to find an antecedent does not necessarily lead to failure. Since a presupposition is informationally rich, i.e. it has its own descriptive content, an antecedent may be constructed for it, under the appropriate conditions.

Van der Sandt casts his account in a version of DRT, which allows him to handle projection in configurational terms, using the DRT notion of accessibility. On his analysis, presupposing elements introduce their elementary presuppositions in a local DRS just as anaphoric elements introduce anaphors in the local DRS. Each elementary presupposition must then be resolved, either by binding or accommodation. The resolution process proceeds from the local DRS for an elementary presupposition along its projection line looking for an appropriate marker which can be unified with the elementary presupposition. If one is found, then an anaphoric link is created between the presupposition and its antecedent, and the presupposition has been satisfied by binding. If one is not found, the presupposition must be accommodated at some accessible level in which constraints on binding, consistency, informativeness, and efficiency can be satisfied. Accommodation simply consists of copying the presupposition to the embedding DRS.
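The bind-then-accommodate procedure can be sketched roughly as follows; the data layout and helper function are our own simplifications for illustration, not van der Sandt's algorithm itself.

```python
def unifiable(marker, presup):
    # Toy test: a real system would unify descriptive content,
    # not just compare a sort label.
    return marker["sort"] == presup["sort"]

def resolve(presup, projection_line):
    """Resolve an elementary presupposition along its projection line.

    `projection_line` lists the accessible DRSs from the local DRS
    outward to the global DRS. Binding is preferred at every level;
    failing that, the presupposition is accommodated as globally as
    possible (consistency and informativeness checks are elided)."""
    # 1. Binding: look for an accessible marker that unifies.
    for drs in projection_line:
        for marker in drs["markers"]:
            if unifiable(marker, presup):
                return ("bound", marker["name"])
    # 2. Accommodation: copy the presupposition's descriptive
    #    content into the most global accessible DRS.
    target = projection_line[-1]
    target["markers"].append({"name": presup["name"], "sort": presup["sort"]})
    target["conditions"].extend(presup["conditions"])
    return ("accommodated", presup["name"])

# "The cat ...": binds if a cat marker is accessible, else accommodates.
global_drs = {"markers": [{"name": "C", "sort": "cat"}], "conditions": []}
local_drs = {"markers": [], "conditions": []}
the_cat = {"name": "x", "sort": "cat", "conditions": [("cat", "x")]}
result = resolve(the_cat, [local_drs, global_drs])  # returns ("bound", "C")
```

The preference ordering (bind anywhere before accommodating, accommodate globally where admissible) is the part that matters for the qadverb analysis below.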

4 Qadverb analysis

Following Berman (1987), Heim (1990), and von Fintel (1995), we take qadverbs to be quantifiers over situations (rather than unselective quantifiers or quantifiers over time as others have proposed), which allows us to restrict both temporally and informationally the entities being quantified over and also allows us to make the connection between qadverb restrictors and presuppositions clearer. Following Poesio (1994), we take presuppositions to be situation descriptions, and the resolution of a presupposition to consist of binding or accommodating a situation parameter associated with the description. Since there is no descriptive content initially associated with a qadverb restrictor, the restrictor in our initial logical form consists solely of a situational discourse marker with no conditions. The presupposition resolution process resolves presuppositions of the scope with the restrictor by binding the situation parameters associated with these presuppositions to the situational discourse marker introduced in the restrictor. This binding of situation parameters accounts for the presupposition accommodation observed by Schubert and Pelletier and Berman and also accounts for most of the other overt material which has been claimed to restrict the domain of quantification of qadverbs.

As we mentioned above, the domain of quantification of a quantificational determiner is presupposed. A qadverb also presupposes its domain of quantification. In both cases, what is presupposed is the existence of a set corresponding to the quantification domain. Thus, for a qadverb, the restrictor is taken to be a situation description which provides the membership conditions of the presupposed set of situations. This set of situations is associated with a situation parameter of its own. Note that this situation parameter is a separate entity from the situational discourse marker introduced in the restrictor. This discourse marker indicates the situations over which the qadverb quantifies and, as described above, binds presuppositions in the nuclear scope. The situation parameter associated with the domain presupposition provides the resource situation in which those entities are situated. It is the resolution of this parameter in the context of the rest of the discourse which results in the anaphoricity of qadverb domains that is observed by von Fintel (1995).

For example, consider the sentence (10). (We will illustrate one plausible resolution, though there are others.) The initial representation for this sentence (11) introduces a global discourse situation DS, which supports the dinner-stealing proposition. (The support relation * between a situational discourse marker and a DRS is transparent to accessibility.) The quantificational condition deriving from the qadverb often takes three arguments: S, which is the variable of quantification; the restrictor DRS, which contains a situation parameter RS' associated with the resource domain presupposition; and the nuclear scope DRS, which contains situation parameters associated with the presuppositions corresponding to the cat and my dinner. Situational parameters are distinguished by a prime and are associated to a presupposition via the α relation.

(10) The cat often steals my dinner.


(11) (DRS boxes are linearized here as [markers | conditions])

[DS |
 DS * [ |
   often(S,
     [S | S * [ | ],
          RS' α [SS | set(SS),
                      [T | member(T,SS)] ⇒ T * [ | ]]],
     [ | S * [ | steal(C,D),
               RC' α [C | cat(C)],
               RD' α [D | my-dinner(D)]]])]]

In the representation (12), the presupposition for the cat has been accom-modated globally by binding RC' to DS and adding the conditions of RC'to the situation supported by DS. The presupposition corresponding to mydinner has been accommodated in the restrictor, by binding RD' to S andadding the conditions of RD' to the situation supported by S. These condi-tions are also added to resource domain presupposition.

(12)

[DS |
 DS * [C | cat(C),
   often(S,
     [S | S * [D | my-dinner(D)],
          RS' α [SS | set(SS),
                      [T | member(T,SS)] ⇒ T * [D | my-dinner(D)]]],
     [ | S * [ | steal(C,D)]])]]


The resolution process is completed by binding RS' to DS and adding its conditions, as well, resulting in representation (13).

(13)

[DS |
 DS * [C, SS | cat(C), set(SS),
   [T | member(T,SS)] ⇒ T * [D | my-dinner(D)],
   often(S,
     [S | S * [D | my-dinner(D)]],
     [ | S * [ | steal(C,D)]])]]

The discourse may be continued with sentence (14), which, after resolving the presuppositions associated with she and the delicious food, results in the representation (15).

(14) She always enjoys the delicious food.

(15)

[DS |
 DS * [C, SS | cat(C), female(C), set(SS),
   [T | member(T,SS)] ⇒ T * [D | my-dinner(D)],
   often(S,
     [S | S * [D | my-dinner(D)]],
     [ | S * [ | steal(C,D)]]),
   always(U,
     [U | U * [F | delicious-food(F)],
          RU' α [UU | set(UU),
                      [V | member(V,UU)] ⇒ V * [F | delicious-food(F)]]],
     [ | U * [ | enjoy(C,F)]])]]


The final representation (16) is completed by binding RU' to DS and equating UU to SS. In collapsing the membership conditions for UU and SS, F and D are also equated.

(16)

[DS |
 DS * [C, SS | cat(C), female(C), set(SS),
   [T | member(T,SS)] ⇒ T * [D | my-dinner(D), delicious-food(D)],
   often(S,
     [S | S * [D | my-dinner(D)]],
     [ | S * [ | steal(C,D)]]),
   always(U,
     [U | U * [F | delicious-food(F)]],
     [ | U * [ | enjoy(C,F)]])]]

In the above example, the resource domains associated with the qadverbs were resolved by accommodation or by binding. In binding the resource domain for always to the accommodated set for often, we equated the two sets. In general, however, the relation between an existing set and a resource domain which is bound to it is not equality. Consider the following example:

(17) John ate every piece of fruit in the bowl. Most (of the) oranges were tasty.

In this case, the set of fruit should bind the set of oranges, even though the two sets are not necessarily equal. Instead, the set of oranges should be taken to be a subset of the set of fruit. We see the same situation with qadverb domains:

(18) My friends always vacation in strange places. My best friend John usually goes to Antarctica.

The set of situations accommodated for the first sentence consists of situations in which the speaker's friends go on vacation. The resource domain for the second sentence is the set of situations in which John goes somewhere. Clearly, this set should be bound to a subset of the first set: those situations in which John goes on vacation. Thus, we expand the notion of binding to include binding a presupposed set to a subset of an existing set.
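A toy rendering of this extended notion of binding, with set descriptions modeled as frozensets of property labels (so that an orange in the bowl carries the fruit label by stipulation); none of this is the paper's actual machinery.

```python
def bind_set(presup_desc, existing_sets):
    """Bind a presupposed set to an existing set, or to a subset of one.

    Descriptions are frozensets of property labels. If the presupposed
    description matches an existing one, bind outright; if it strictly
    extends one (a stronger description), bind to a subset of that set;
    otherwise the set must be accommodated."""
    for name, desc in existing_sets.items():
        if presup_desc == desc:
            return ("equal", name)
        if desc < presup_desc:      # strictly stronger description
            return ("subset-of", name)
    return ("accommodate", None)

# "John ate every piece of fruit in the bowl. Most oranges were tasty."
existing = {"FRUIT": frozenset({"fruit", "in-bowl"})}
oranges = frozenset({"fruit", "in-bowl", "orange"})
outcome = bind_set(oranges, existing)  # returns ("subset-of", "FRUIT")
```

The same comparison applies to qadverb domains in (18): "John goes on vacation" strictly extends "a friend goes on vacation", so its set of situations binds to a subset.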

As the example (10)-(16) demonstrates, presuppositions arising in the scope of a qadverb are not necessarily resolved with the restrictor. Whether or not a presupposition must be bound to the situational discourse marker introduced in the restrictor depends on a number of factors. Most important is whether or not the sentence is coherent without such binding. Since we claim that a qadverb essentially quantifies over situations (which default to time intervals of contextually determined granularity in the absence of any bound presuppositions), any presuppositions that are resolved at a discourse level superordinate to the quantification must be constant for each of the situations over which the qadverb quantifies. For many presupposed elements, this is not possible. Consider, for example:

(19) The fog usually lifts before noon here.

There cannot be a single, unique instance of fog which repeatedly lifts; instead, the definite description must be bound to the qadverb restrictor, resulting in separate instances of fog for each lifting event. We are still developing a way to formalize this notion of persistence so that it can be used to evaluate possible resolutions.

One shortcoming of our analysis of qadverbs is that once the resource domain is resolved, the connection between the set of situations constituting the resource domain and the set of situations over which the qadverb quantifies is lost. One way to maintain this connection is to include an additional condition in the restrictor which indicates that the situation corresponding to the variable of quantification is a member of the resource domain. For example (11), the condition would be member(S,SS). Once this is done, the additional conditions in the restrictor would seem to be redundant, except that they are necessary to maintain accessibility for the nuclear scope.

Geurts and van der Sandt (1999) present an account of quantifier domain restriction that is in spirit very similar to ours. They, however, introduce a new kind of entity, a propositional discourse marker, which allows them to handle accessibility non-structurally. On their analysis, a quantifier is not a relation between descriptions of sets but between propositional discourse markers, which are sets of embedding functions specified as separate conditions. Thus, there is a direct connection between the presupposed resource domain and the restrictor argument. They build in conservativity by stipulating that the propositional discourse marker associated with the nuclear scope is an extension of the marker associated with the restrictor and, as a result, they get accessibility of the restrictor from the nuclear scope for free.

5 Computational implementation

We are presently at work on a small computational system to process qadverb sentences. It is based on the DRT parser in the textbook by Blackburn and Bos (1999), which is focused on the problems of quantifier scope ambiguity and presupposition resolution. Thus far, we have added grammar rules and lexical entries to allow for plural nouns and quantificational adverbs and modified existing rules and entries to associate situational discourse markers with DRSs. We do use the existing notation for presuppositions, which differs somewhat from ours, and have not modified it to associate situation parameters with presuppositions. Thus, instead of introducing a presupposition inside the DRS in which its extent begins, a presupposition is introduced via an alpha expression which takes scope over the DRS in which its extent begins.

We have had to modify the presupposition resolution algorithm itself. The original algorithm resolves each presupposition before resolving any presuppositions triggered within its scope; in order for the resource domain of a qadverb to have any descriptive content to be resolved, however, the presuppositions of the nuclear scope (which falls within the scope of the resource domain presupposition) must be resolved first. Also, a resource domain presupposition (both for qadverbs and for quantified NPs, for which we have also added such a presupposition) may be bound either to an existing set or to a subset of an existing set. The semantic macro that produces the semantic portion of the lexical entry for a qadverb is as follows:

advSem(Sym, qadv,
    lambda(P, lambda(X,
        alfa(SS, qresource,
             drs([SS], [set(SS),
                        drs([S], [member(S, SS)]) > drs([], [])]),
             drs([], [qcond(Sym, S,
                            drs([S], [S * drs([], [])]),
                            drs([], [S * P@X]))])))))

Since a qadverb is syntactically a VP operator, its first argument is a predicate, and its second, an individual. The resulting expression is a resource domain presupposition (SS is the set of situations) whose scope is the quantificational condition (Sym is the translation of the qadverb). Presuppositions arising from P@X (i.e. P applied to X) may be accommodated in the restrictor. Material from the restrictor is copied into the membership conditions of the resource domain, and the resource domain is then resolved.

This system is still a work in progress. We have not yet built a mechanism to check consistency when binding sets, and we still have to introduce situation parameters into presuppositional expressions to mirror our analysis more closely. We are considering trying to use propositional discourse markers, à la Geurts and van der Sandt (1999), since in addition to handling the connection between the restrictor and the resource domain more neatly, such an approach would obviate the need for a different order of resolution just for qadverbs. Changes would be necessary to handle accessibility through linked propositional discourse markers, as well as through structural embedding. Nonetheless, the current system computes qadverb domains in accordance with the theoretical analysis outlined above, using only presuppositional information.
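The reordered resolution described in this section can be summarized in a toy driver; the function decomposition and names are ours for illustration, not the actual modified Blackburn and Bos code.

```python
def resolve_qadverb(scope_presups, restrictor_conds, resource_domain):
    """Resolve a qadverb sentence in the modified order.

    1. Presuppositions of the nuclear scope are handled first; those
       accommodated in the restrictor give it descriptive content.
    2. The restrictor material is copied into the membership
       conditions of the resource-domain presupposition.
    3. Only then is the resource domain itself resolved (here just
       marked resolved; binding to a set or a subset is elided)."""
    restrictor = list(restrictor_conds)
    for p in scope_presups:
        if p["accommodate_in_restrictor"]:
            restrictor.append(p["condition"])         # step 1
    resource_domain["membership"] = list(restrictor)  # step 2
    resource_domain["resolved"] = True                # step 3
    return restrictor, resource_domain

# "The cat often steals my dinner": my-dinner(D) lands in the
# restrictor, while cat(C) is accommodated globally instead.
restr, dom = resolve_qadverb(
    [{"condition": "my-dinner(D)", "accommodate_in_restrictor": True},
     {"condition": "cat(C)", "accommodate_in_restrictor": False}],
    [],
    {"membership": [], "resolved": False},
)
```

The point of the ordering is visible in step 2: if the resource domain were resolved first, its membership conditions would still be empty.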



References

Berman, S. (1987). Situation-based semantics for adverbs of quantification. In University of Massachusetts Occasional Papers in Linguistics, Volume 12, pp. 45-68. Graduate Linguistic Student Association, University of Massachusetts, Amherst. Also published in the Proceedings of the Sixth Annual West Coast Conference on Formal Linguistics, 17-31, Stanford Linguistics Association, Stanford University, 1987.

Berman, S. (1991). On the semantics and logical form of WH-clauses. Ph.D. thesis, University of Massachusetts, Amherst. Also published in Outstanding Dissertations in Linguistics, Garland, 1994.

Blackburn, P. and J. Bos (1999). Representation and inference for natural language: A first course in computational semantics. http://www.coli.uni-sb.de/~bos/comsem/.

Chierchia, G. (1992). Anaphora and dynamic binding. Linguistics and Philosophy 15, 111-183.

Cohen, A. (1996). Think Generic! The Meaning and Use of Generic Sentences. Ph.D. thesis, Carnegie Mellon University. Also published in Dissertations in Linguistics, CSLI, 1999.

de Swart, H. (1991). Adverbs of Quantification: A Generalized Quantifier Approach. Ph.D. thesis, Rijksuniversiteit Groningen. Also published in Outstanding Dissertations in Linguistics, Garland, 1993.

Diesing, M. (1992). Indefinites. MIT Press.

Fodor, J. D. and I. Sag (1982). Referential and quantificational indefinites. Linguistics and Philosophy 5, 355-398.

Geurts, B. and R. van der Sandt (1999). Domain restriction. In P. Bosch and R. A. van der Sandt (Eds.), Focus: Linguistic, Cognitive, and Computational Perspectives, pp. 268-292. Cambridge University Press.

Heim, I. (1990). E-type pronouns and donkey anaphora. Linguistics and Philosophy 13, 137-177.

Jäger, G. (1997). The Stage/Individual Contrast Revisited. In B. Agbayani and S.-W. Tang (Eds.), Proceedings of WCCFL 15, pp. 225-239.

Keenan, E. (1971). Two kinds of presupposition in natural language. In C. Fillmore and D. T. Langendoen (Eds.), Studies in Linguistic Semantics, pp. 44-52. New York: Holt, Rinehart and Winston, Inc.

Lewis, D. (1975). Adverbs of quantification. In E. L. Keenan (Ed.), Formal Semantics of Natural Language: Papers from a colloquium sponsored by the King's College Research Centre, Cambridge, pp. 3-15. Cambridge University Press.


Milsark, G. (1974). Existential Sentences in English. Ph.D. thesis, MIT.

Milsark, G. (1977). Toward an explanation of certain peculiarities of the existential construction in English. Linguistic Analysis 3, 1-29.

Partee, B. H. (1995). Quantificational structures and compositionality. In E. Bach, E. Jelinek, A. Kratzer, and B. H. Partee (Eds.), Quantification in Natural Languages, Volume 2, pp. 541-601. Dordrecht: Kluwer Academic Publishers.

Poesio, M. (1994). Discourse Interpretation and the Scope of Operators. Ph.D. thesis, University of Rochester.

Rooth, M. (1985). Association with Focus. Ph.D. thesis, University of Massachusetts, Amherst.

Schubert, L. K. and F. J. Pelletier (1987). Problems in the Representation of the Logical Form of Generics, Plurals, and Mass Nouns. In E. LePore (Ed.), New Directions in Semantics, Chapter 12, pp. 385-451. London: Academic Press.

van der Sandt, R. (1992). Presupposition projection as anaphora resolution. Journal of Semantics 9, 333-377.

van der Sandt, R. A. (1988). Context and Presupposition. London: Croom Helm.

von Fintel, K. (1995). A minimal theory of adverbial quantification. In B. Partee and H. Kamp (Eds.), Context Dependence in the Analysis of Linguistic Meaning, IMS Stuttgart Working Papers, Prague and Bad Teinach, pp. 153-193.


Grounding in the Swedish Map-task

Henrik Bergqvist

Stockholm University

[email protected]

Abstract. This paper investigates how two conversational moves with grounding properties, acknowledge and reply-yes (Carletta et al. 1997), are distributed in Swedish map-task dialogues, and whether their respective utterance forms can be categorized in order to obtain statistical data which might be useful in an automatic identification system.

1 Introduction

Grounding is the process whereby information between participants in discourse is added to the common ground. This common ground, or common knowledge, is what the participants both believe to be true in some sense in relation to what is being said in the conversation. This process also postulates that there are distinct units in discourse whose function is to show understanding or acceptance of an utterance. This phenomenon has been reasonably well investigated by, among others, Clark & Schaefer (1989) and David Traum (1994), and is also the subject of this study. In order to clearly identify and categorize something, it is often easier to define and restrict the domain of the investigation, which here is limited to Swedish map-task dialogues along the lines of Anderson et al. (1991). In the map task, two participants each have a map in front of them. One of the participants is the giver and the other is the follower. The giver has a route drawn on his map and tries to get the follower to draw the same route, but the maps are not exactly alike, which sometimes adds a bit of confusion to the dialogue in trying to determine what exactly the discrepancy is.

The tag set used to categorize utterances in the dialogues was developed by Carletta et al. (1997) and has previously been used in the HCRC corpus (www.hcrc.ed.ac.uk). Within this tag set there are two separate conversational moves which both have grounding properties, namely acknowledge and reply-yes. These two moves are the main object of this study. They have several traits in common, but it could be argued that one of them, acknowledge, is a grounding move in the sense that it only shows understanding

Proceedings of the Sixth ESSLLI Student Session, Kristina Striegnitz (editor). Chapter 2, Copyright © 2001, Henrik Bergqvist


and/or acceptance, and that reply-yes does this while also contributing semantic information to the conversation. Their lexical forms are often almost identical, but their roles in the discourse are somewhat different. In accordance with Clark & Schaefer's (1989) model for grounding in discourse, some utterances can convey semantic content while also functioning as a grounding element. This holds for the reply-yes move, and it is therefore interesting to investigate it from a grounding viewpoint, in the same way as acknowledge.

The hypothesis is that it is possible to apply the theory of grounding to the domain of the map task and that a clear picture concerning the distribution of grounding moves and their surface forms can be obtained. This could give useful clues to an automatic identification system on a statistical basis. Separate utterance forms are expected to be identified consistently for both moves. Can these forms in turn be assembled into a few, easily identifiable categories, and is it possible to make predictions as to the probability of a grounding move after an information-bearing move, e.g. instruct, clarify, explain, etc. (Carletta et al. 1997)?

2 Background and Method

The background for this investigation consists of two things: the article by Carletta et al. (1997), which explains and evaluates the tagging model referred to above, and the work of Clark & Schaefer (1989) and David Traum (1994). Clark & Schaefer introduce a definition of a segment in discourse, which is called the turn. This is equivalent to Carletta et al.'s game level and constitutes a sub-unit in the dialogue. The term turn is used throughout the study; it generally encompasses two moves in a pair, for example instruct-acknowledge. The work by Traum is directed towards forming a computational theory of grounding, explaining how discourse units (equivalent to moves) occur in task-oriented dialogue, with empirical information from TRAINS dialogues. It is a development of the previous article by Clark & Schaefer, since it provides empirically based arguments concerning grounding which the article by Clark & Schaefer lacks.

Briefly, utterances according to Carletta et al.'s scheme can be divided into three main categories: initiative, responsive, and preparative. Initiative utterances containing a command are coded instruct. If they simply consist of a statement, they are coded explain. An initiative utterance can also be a query of some kind: align, check, query-yes/no, or query-wh. Possible responses are acknowledgement, if only evidence was given that the utterance was successful, and clarify, if the reply contributes information as well. The response category also contains reply-yes/no, and reply-wh if the initiative move was a query or check. Preparative utterances are not included in the survey, as they do not pair up with other utterances. The following is an example of a turn made up of instruct-acknowledge.


Initiator: [...] du siktar du går mitt emellan palmerna du håller dig närmre den undre gruppen av palmer än den övre. [instruct] [you aim for you go between the palm trees you keep closer to the lower group of palm trees than the upper one]

Responder: ja [...] [acknowledgement] [yes] (my translation)

The previously mentioned difference between acknowledge and reply-yes is also related to whether a certain move functions on a conversational or a domain level. The split between conversational and domain goals in the tag set is relevant to this investigation since the tagging method contains moves, and turns, which are by definition conversational and domain oriented. An example of a conversational turn would be query-reply, whereas instruct-acknowledgement would be a domain-oriented turn. This is not a major point in the study, but it is a difference that has relevance for how the moves are defined, and it is also something which separates the tag set in this study from the one used by Traum, which only concerns the domain level.

The Swedish map-task corpus, "the Swedish Corpus of Spontaneous Speech" (SCSS), was used for this investigation. It was assembled by Helgason at the University of Stockholm and was part of an investigation on preaspiration in natural speech. As such, its transcription was not done with the map task in mind. Since this was a preliminary investigation, only four out of six available dialogues were used, and several more await transcription. The recordings were made face to face, which makes it possible that nods and eye contact alone make up some of the grounding acts. The four dialogues contain approximately 546 utterances as they were segmented in the original material, but this does not coincide with how they were later tagged in accordance with Carletta et al.'s model. My segmentation of an utterance often resulted in a division of one utterance into two or three moves. This was done with the content of the utterance in mind, and was not related to the separation of utterances in a temporal sense. The material was presented to me both as audio files with a division between right and left channel, equivalent to giver-follower, and as text files with transcription information and a separation between utterances. When something was unclear to me concerning how the utterances were distributed or how they were to be segmented, I consulted the audio files, but otherwise relied on the transcribed material. This had to do with the time frame of the study, which was limited since it was a preliminary investigation. The examples from the dialogues will be presented in a slightly modified version where initiator and responder are marked to signify the participants of the specific turn.

First, an attempt is made to establish a few easily identifiable utterance forms that are common to the grounding moves, acknowledge and


reply-yes, to see how often they occur after other moves, i.e. instruct, explain, clarify, check, query-y/n, acknowledge, and reply-yes. At an early stage in the investigation, the reply-wh move was also investigated, since it also has a grounding function, but the instances of the move in the dialogues were too few, and the utterance forms for reply-wh were in no way homogeneous. Since this was not what I was looking for, I decided to exclude the results from the investigation. Second, the question as to whether moves which require grounding are actually followed by such moves is addressed, to see if high percentages can be obtained and, when a grounding move is not uttered, what takes its place. Finally, an attempt is made at constructing a model, based on Traum's finite-state machine (Traum 1994), to display how moves as defined by Carletta et al. can occur in the dialogues, as a help in trying to determine which utterance may follow another as far as grounding is concerned in the Swedish map task. I have to add that the empirical evidence for such a model has only been superficially investigated within this study and will need much further investigation if the results are actually going to be used in a functioning computational identification system.

3 Results and Discussion

The hypothesis was that it would be possible to find utterance forms for acknowledge and reply-yes which would be easy to identify and categorize. No expectation existed beforehand as to whether only certain moves, e.g. information-bearing ones, would be followed by grounding moves in the form of acknowledge. The categorization had to depend on the results. Seven distinct categories of utterance forms were identified which could be found in all dialogues on multiple occasions. They were: 1) ja - 'yes', 2) mmm - 'uhuh', 3) okej - 'okay', 4) jaha - 'oh', 5) aha - 'oh', 6) ja, just det - 'right', and 7) miscellaneous utterance - for example: på det viset - 'is that so'. The last category not only contained utterance forms which were composed of more than one or two words, but also consisted of short utterances which were too few in number to be placed in a separate category. These forms were then counted and assembled into different categories depending on their occurrence after the moves instruct, explain, clarify, check, query-y/n, acknowledge, and reply-yes. Since instruct, explain, clarify, and to some extent also acknowledge and reply-yes, take acknowledge as a grounding move, they must be separated from check and query-y/n, which both take reply-yes as their proper reply. This division is based on Carletta et al.'s scheme (Carletta et al. 1997). The results can be seen below in Tables 2.1 and 2.2. Acknowledge can, as seen in Table 2.1, occur in most contexts of the dialogues and is also the most common move of all, with reply-yes in second place.
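The assignment of an utterance to one of these seven form categories can be sketched in a few lines of Python. This is my own illustrative reconstruction, not code from the study; the lookup table, the normalization step, and the function name are assumptions, while the category labels are the ones listed above.

```python
# Hypothetical sketch of the categorization step described above: map a
# transcribed utterance to one of the seven utterance-form categories,
# falling back to the miscellaneous category for anything else.
SIMPLE_FORMS = {
    "ja": "ja - 'yes'",
    "mmm": "mmm - 'uhuh'",
    "okej": "okej - 'okay'",
    "jaha": "jaha - 'oh'",
    "aha": "aha - 'oh'",
    "ja, just det": "ja, just det - 'right'",
}

def categorize(utterance: str) -> str:
    """Assign an utterance to one of the seven form categories."""
    normalized = utterance.strip().lower().rstrip(".!")
    return SIMPLE_FORMS.get(normalized, "miscellaneous utterance")
```

For instance, categorize("på det viset") falls into the miscellaneous category, as in the study.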

The results show that there is a tendency for the simple reply forms to


Moves preceding           instruct   explain   clarify   reply-yes,
utterance forms                                          acknowledge

ja - 'yes'                    44        17        13        13
okej - 'okay'                 16         8         7         9
mmm - 'uhuh'                  24         3         7        10
ja, just det - 'right'         2         3        12         1
aha - 'oh'                     1         4         3         -
jaha - 'oh'                    5         2         8         3
miscellaneous utterance       20        12         6         4

Total moves:                 112        49        56        40

Total sum: 257

Table 2.1: Preceding moves with utterance forms for acknowledgement

occur more frequently than do the miscellaneous forms. This is a tendency that I hoped for and which the hypothesis predicted. Whether this has to do with the settings for the recording of the dialogues, which were in a controlled environment, or whether it is a property inherent to the phenomenon of grounding, I cannot say, but I personally think that it has to do with how "on-line" the conversation is at a certain moment in the discourse. Sometimes (most of the time) the participants feel that they understand each other, and do not feel the need to be so explicit as to paraphrase the previous utterance to convey that understanding has been achieved. One thing that Clark & Schaefer (1989) discuss is the different weight of the utterance forms concerning grounding. They propose that paraphrasing or repetition of parts of a previous utterance has the most weight, and that replies in the form of yes, right, uhuh, etcetera, come just after in terms of conversational weight. Since the simple reply forms are more common than complex or miscellaneous ones, this might indicate what Clark & Schaefer predict, namely that: "[acknowledgements] ...are backgrounded attempts by partners to create contributions from extended turns, and they almost always succeed" (Clark & Schaefer 1989:283). These results are shown in Tables 2.3 and 2.4.

The distribution of grounding moves after information-bearing moves such as instruct, explain, clarify, check, and query-y/n also needs to be investigated in order for any predictions concerning grounding moves in the dialogue to be useful. The results are displayed in Table 2.5.

What these results imply is that grounding, besides being done with grounding moves such as acknowledge and reply-yes, can be achieved with


Moves preceding                    query-yes/no   check
utterance forms

ja[o], (jepp 2) - 'yes'                 12         28
okej - 'okay'                            -          1
mmm - 'uhuh'                             4          4
ja, just det [precis] - 'right'          3          6
aha - 'oh'                               -          -
jaha - 'oh'                              -          -
miscellaneous utterance                  8         14

Total moves:                            27         53

Total sum: 80

Table 2.2: Preceding moves with utterance forms for reply-yes

Utterance forms:            instruct   explain   clarify   reply-yes,
ja - 'yes', mmm - 'uhuh',                                  acknowledge
okej - 'okay' and
ja, just det - 'right'

Percentage of total            77%       63%       70%       83%

Table 2.3: Simple utterance forms for acknowledge

almost any move, as long as it is relevant to the prior utterance. Instruct can, for example, take explain as a possible grounding move, and there are examples of this in the dialogues.

Initiator: [...] och sen så tar vi en snabb väg precis rakt österut och du går norr om dom här bergen. [instruct] [and then we take a fast route exactly straight east and you go north of these mountains]

Responder: där har jag krokodiler och så en uppsättning såna här löjliga fåglar som säkert heter lunnefåglar eller nåt annat. [explain] [there I have crocodiles and a set of these ridiculous birds which probably are called puffins or something else] (my translation)

Instruct can also be followed by a query-y/n or a check move if something is not clear to the follower. This means that the turn is not grounded and


Utterance forms:            query-yes/no   check
ja - 'yes', mmm - 'uhuh',
okej - 'okay' and
ja, just det - 'right'

Percentage of total              70%        74%

Table 2.4: Simple utterance forms for reply-yes

Moves preceding            instruct   explain   clarify   query-yes/no   check
grounding moves

Instances when followed        98        38        39          28         48
by a grounding move

Instances when not             29        11        12          13         22
followed by a
grounding move

Sum total:                    127        49        51          41         70

Percentage of moves           77%       78%       76%         68%        69%
followed by
grounding moves

Table 2.5: Distribution and percentage of moves followed by grounding moves
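The percentages in Table 2.5 follow directly from the raw counts. As a sanity check, they can be recomputed in a few lines of Python; this is a sketch of mine, with the counts copied from the table and the function name an assumption.

```python
# Recompute the "followed by a grounding move" percentages of Table 2.5
# from the raw counts reported there.
followed = {"instruct": 98, "explain": 38, "clarify": 39,
            "query-yes/no": 28, "check": 48}
not_followed = {"instruct": 29, "explain": 11, "clarify": 12,
                "query-yes/no": 13, "check": 22}

def grounding_rate(move: str) -> int:
    """Rounded percentage of occurrences of `move` followed by a grounding move."""
    total = followed[move] + not_followed[move]
    return round(100 * followed[move] / total)
```

For instance, grounding_rate("instruct") reproduces the 77% reported in the table.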

that it will continue in order for there to be agreement, or it will be grounded at a later stage. There are, however, cases where a reply-no is uttered, which indicates that there is no agreement and that something is wrong, without specifying exactly what it is. It seems that a measure of grounding, and the only possibility of knowing when an utterance has been grounded, depends on the following segments of discourse. If the information conveyed has been accepted, then the participants move on by discussing other things. What is considered to have entered the common ground is by no means absolute, in the sense that it does not have to do with how the follower has drawn his/her route. Only the dialogue in itself can show whether something is considered by the participants to have entered the common ground. Alternative moves occur after all the information-bearing moves when a grounding move has not been given, and they can perhaps best be explained and displayed by constructing a finite-state model. The model below shows the way a turn starting with instruct can evolve on its way to being grounded. The difference between the model in this study and the one presented by Traum (1994), which the model


Figure 2.1: Finite-state model for instruct

directly builds on, is that the latter encompasses all levels of conversation, while mine concerns only the turn, and not the transaction level. These dialogue levels are defined in Carletta et al.'s article. The transaction level would be equivalent to the sub-dialogue level in Traum's terms. This means that an entire dialogue or sub-dialogue cannot be read through this model, and that it cannot be empirically supported by the results of this study. Nevertheless, the model of this study can still illustrate how a turn can be constructed and function as far as grounding is concerned. The model should be read as follows: S is the starting state, which leads to state 1. This is the initiative statement given by the initiator. From there, acknowledgement can be given by the responder so that the turn is grounded and thereby finished, in state F. Alternative moves, e.g. explain (state 2) or query-yes/no (state 3), can also be given, and they must in turn be acknowledged or given a reply-yes in order to go back to state 1. State D is also a final state, which means the turn is left ungrounded. This can be accomplished by the responder either uttering an opposing instruct or simply by replying no. Similar models can be constructed for the other moves as well.

4 Conclusion

What is evident from the investigation is that grounding, as expected, frequently occurs in the Swedish map-task dialogues and that it can be categorized into a few utterance forms which in turn are fairly easy to identify. It is very likely, around a 74% chance, that an instruct, explain, or clarify move is followed by an acknowledge move, and that one of the four simple utterance forms for acknowledge is used, namely ja - 'yes', mmm - 'uhuh', okej -


'okay', or ja, just det - 'right'. These simple forms are much more frequent than the miscellaneous ones, with a 73% probability in favor of the four forms shown. The same scenario is valid for reply-yes, with a slightly lower probability rate when it comes to predicting grounding moves after information-bearing ones, i.e. after query-yes/no (68%) and check (69%). What this means is that grounding in the form of acknowledgement or reply-yes can be identified in task-oriented dialogues and that they have fairly homogeneous utterance forms that seem to be particular to those moves. What it does not say anything about is whether the preceding move is information-bearing or not, since even an acknowledgement can be acknowledged. Lastly, the context for grounding moves in the map task, and the way grounding can be achieved using alternative moves other than grounding ones, can be organized into a finite-state model that encompasses the turn level of the discourse (fig. 2.1). Whether this way of grounding actually functions in a way similar to the results obtained in this study is not addressed here and needs an investigation of its own.

Bibliography

Anderson, Anne H., Miles Bader, Ellen Gurman Bard, Elisabeth Boyle, Gwyneth Doherty, Simon Garrod, Stephen Isard, Jacqueline Kowtko, Jan McAllister, Jim Miller, Catherine Sotillo, Henry Thompson, and Regina Weinert (1991) The HCRC Map Task Corpus. Language and Speech, 34 (4), pp. 351-366.

Carletta, Jean, Stephen Isard, Gwyneth Doherty-Sneddon, Amy Isard, Jaqueline C. Kowtko, and Anne H. Anderson (1997) The Reliability of a Dialogue Structure Coding Scheme. Computational Linguistics, vol. 23, no. 1, pp. 13-31.

Clark, Herbert H., and Edward F. Schaefer (1989) Contributing to Discourse. Cognitive Science 13, pp. 259-294, Stanford University.

Helgason, Pétur. The Stockholm Corpus of Spontaneous Speech, University of Stockholm.

Stirling, Lesley, Ilana Mushin, Jane Fletcher, and Rodger Wales (2000) The Nature of Common Ground Units: an Empirical Analysis Using Map Task Dialogues. In Götalog 2000 - Workshop on the Semantics and Pragmatics of Dialogue, Göteborgs Universitet, 15-17 June 2000.

Traum, David (1994) A Computational Theory of Grounding in Natural Language Conversation. Doctoral Dissertation. http://www.cs.umd.edu/users/traum/


Computing the Density of Regular Languages

Manuel Bodirsky

Department Algorithms and Complexity I, Humboldt University Berlin

[email protected]

Tobias G�artner

Department of Computer Science, Saarland University

[email protected]

Timo von Oertzen

Department of Computer Science, Saarland University

[email protected]

Jan Schwinghammer

Department of Computer Science, Saarland University

[email protected]

Abstract. We present an algorithm that computes the limit density of a regular language and prove its correctness. If the language is specified by a finite deterministic automaton A, the algorithm takes time O(dn³), where n is the number of states of A, and d its periodicity.

1 Introduction

It is a well-known fact that for a given first-order formula φ without function symbols, the limit density of structures satisfying φ is either zero or one (Fagin 1976; Gelbskij and Talanov 1969). More precisely, if μ_m(φ) denotes the fraction of structures of size m that satisfy the formula φ, we have that lim_{m→∞} μ_m(φ) always exists and is either zero or one. Such 0-1 laws have been established for various extensions of first-order logic (Blass et al. 1985; Dawar 1995; Kolaitis and Vardi 1992; Bars 1998).

Proceedings of the Sixth ESSLLI Student Session, Kristina Striegnitz (editor). Chapter 3, Copyright © 2001, Manuel Bodirsky, Tobias Gärtner, Timo von Oertzen, and Jan Schwinghammer


However, even in cases where the 0-1 laws fail, the class of almost surely true sentences is coherently determined, and it is natural to ask for the algorithmic complexity of deciding this property for a given formula. It turns out that the transition from 'absolute truth' to 'truth with probability one' can reduce the complexity qualitatively. For instance, almost-sure validity for first-order logic in the finite is PSPACE-complete (Grandjean 1983), whereas validity in the finite is undecidable, by Trachtenbrot's theorem.

In this paper, we will give an algorithm to determine the limit density lim_{m→∞} |L_m|/|Σ|^m of a regular language L, where L_m = L ∩ Σ^m. The density of regular languages has already been studied in (Berstel 1972), and the methodology using formal power series is standard by now (Salomaa and Soittola 1978). It is known that there are finitely many rational accumulation points. But the algorithmic answer to the above questions is not, or only implicitly, given. In particular, the exact complexity of computing the limit density remains unclear.
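For intuition, the quantity |L_m|/|Σ|^m can be checked by brute force for small m. The sketch below is mine, not part of the paper's algorithm; the example language (words over {a, b} with an even number of a's) is an assumption chosen only because its density is exactly 1/2 at every length m ≥ 1.

```python
# Brute-force density |L_m| / |Sigma|^m of a language at word length m,
# given a membership predicate `accepts`.
from itertools import product

def density(accepts, sigma, m):
    """Fraction of length-m words over `sigma` accepted by `accepts`."""
    hits = sum(1 for w in product(sigma, repeat=m) if accepts(w))
    return hits / len(sigma) ** m

def even_as(word):
    """Membership test: the word contains an even number of a's."""
    return word.count("a") % 2 == 0
```

This enumerates |Σ|^m words, so it is only feasible for small m; the point of the paper is precisely to compute the limit without such enumeration.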

We show that checking the existence of the limit, and computing it in the case of existence, can be done in time O(dn³), where n is the number of states of the automaton and d its periodicity. If the underlying graph G of the automaton is strongly connected, the periodicity is defined as the greatest common divisor of the lengths of all cycles in G. In the general case, d is the least common multiple of the periodicities of all strongly connected components of G. The periodicity d may grow exponentially with the size of the automaton, although it should be very small in practice. Nonetheless, our algorithm can be formulated to run in nondeterministic polynomial time.

Figure 3.1: Simple automaton (with limit density 1/4) and the corresponding Markov process.

The task can be seen as a special instance of a convergence problem for Markov chains. We say a matrix A = (a_ij)_{i,j} with non-negative rational entries is a DFA-matrix iff the sum of the elements in each column is one. A corresponds to some complete deterministic finite automaton A = (Σ, {1, ..., n}, δ, F) if a_ij is the number of edges from j to i, divided by |Σ| (see Figure 3.1). When viewed as a Markov process, a stationary distribution would make it easy to compute the percentage of words accepted in the limit. If the initial distribution is given by the vector s (e.g. corresponding to the initial state of the automaton), we say that A converges to p ∈ [0, 1] if lim_{m→∞} Σ_{i∈F} (A^m s)_i exists and is equal to p.
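Under this definition, the DFA-matrix can be built mechanically from a transition table. The sketch below uses my own encoding (states numbered from 0, transitions as a dict); only the defining property, that entry a_ij is the number of edges from j to i divided by |Σ| so that each column sums to one, comes from the text.

```python
# Build the DFA-matrix of a complete DFA: a[i][j] counts edges from
# state j to state i, divided by the alphabet size.
from fractions import Fraction

def dfa_matrix(n_states, alphabet, delta):
    """delta maps (state, symbol) -> state; states are 0..n_states-1."""
    a = [[Fraction(0)] * n_states for _ in range(n_states)]
    for (j, _symbol), i in delta.items():
        a[i][j] += Fraction(1, len(alphabet))
    return a

# Example (hypothetical): two states over {a, b}; every a-edge goes to
# state 1, every b-edge goes to state 0.
delta = {(0, "a"): 1, (0, "b"): 0, (1, "a"): 1, (1, "b"): 0}
A = dfa_matrix(2, {"a", "b"}, delta)
```

Exact rationals (Fraction) keep the column-sum property free of floating-point error.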

However, Figure 3.2 shows that the limit density might exist although the Markov chain is not ergodic. In particular, we do not obtain a stationary distribution for any of the leaf components when running the automaton on (1, 0, 0, 0, 0, 0). A leaf component is a strongly connected component where no edge leaves the component.

An implementation of the algorithm is available at http://fsinfo.cs.uni-sb.de/~timohome/regDensity/.

Figure 3.2: Periodic automaton accepting a language of density 1/2

2 Computing the density of a regular language

Let L be the regular language accepted by a DFA A. Let A be its DFA-matrix and s its starting vector. We describe the algorithm to compute the accumulation points of (|L_m|/|Σ|^m)_m or, in case of existence, the limit. The first step of the algorithm is a reduction to the case when the periodicity of the automaton is 1.

Density(𝒜)
1  A ← DFA-matrix(𝒜)
2  s ← StartingVector(𝒜)
3  d ← ComputePeriodicity(𝒜)
4  for i = 0, …, d − 1
5     do l_i ← 1-Density(A^d, A^i s)
6  return {l_0, …, l_{d−1}}

Figure 3.3: The main procedure

In order to get rid of a periodicity d > 1, consider the matrix A^d. It is easily seen that A^d is again a DFA-matrix, and it can be interpreted as the matrix of the automaton 𝒜^d which is defined as follows: the alphabet of 𝒜^d is Σ^d, the states are the same as those of 𝒜, and there is an edge i → j in 𝒜^d with label w = w_1 ⋯ w_d iff there is a path in 𝒜 from i to j with edge labels w_1, …, w_d.

Obviously, the automaton 𝒜^d has periodicity 1. Instead of computing the densities of 𝒜 with starting vector s we compute the densities of 𝒜^d with starting vectors s, As, …, A^{d−1} s. Because of Proposition 2.1, the sets of accumulation points are the same. The intuition behind this is that running 𝒜 for k steps of length 1 or for k/d steps of length d yields the same result; if k is not divisible by d, the starting point of the run has to be postponed.


Computing the Density of Regular Languages

ComputePeriodicity(𝒜)
1  for each strongly connected component C
2     do fix v ∈ C;  L ← {v};  p_C ← 0
3        for j = 1, …, |C|
4           do L ← ⋃_{l∈L} Out(l) ∩ C
5              if v ∈ L then p_C ← gcd(p_C, j)
6  return lcm{p_C | C strongly connected component}

Figure 3.4: Computing the periodicity in O(n³).
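The procedure of Figure 3.4 translates almost line by line into Python. In the sketch below the strongly connected components are assumed to be precomputed (computing them is standard, e.g. with Tarjan's algorithm), and the guard skipping one-node components without a cycle is our own addition, not part of the pseudocode:

```python
from math import gcd

def periodicity(out, sccs):
    """Periodicity of an automaton graph, following Figure 3.4.
    `out` maps a node to the set of its successors; `sccs` is a list of
    strongly connected components (as sets of nodes), assumed precomputed."""
    result = 1
    for C in sccs:
        v = next(iter(C))                   # fix some v in C
        L, p = {v}, 0
        for j in range(1, len(C) + 1):
            L = set().union(*(out[l] & C for l in L))
            if v in L:
                p = gcd(p, j)               # v lies on a cycle of length j
        if p:                               # guard: skip loop-free one-node SCCs
            result = result * p // gcd(result, p)   # lcm over all components
    return result

# A 2-cycle has period 2; a self-loop has period 1; their lcm is 2.
out = {1: {2}, 2: {1}, 3: {3}}
assert periodicity(out, [{1, 2}, {3}]) == 2
```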

Proposition 2.1. Let A be a DFA-matrix, let s be a starting vector and d ∈ ℕ. Let S be the set of accumulation points of (A^m s)_{m∈ℕ}. Let S_i, 0 ≤ i ≤ d − 1, be the set of accumulation points of ((A^d)^m (A^i s))_{m∈ℕ}. Then S = S_0 ∪ … ∪ S_{d−1}.

In terms of automata this reads: given the matrix A of a DFA and a starting vector s, the accumulation points of the densities L_m / |Σ|^m can be obtained by calculating the accumulation points of the densities of A^d with starting vectors s, As, …, A^{d−1} s.

Proof. If v is an accumulation point, then there is a subsequence such that A^{m_j} s → v. Write m_j = k_j d + r_j, 0 ≤ r_j < d. Then at least one of the numbers 0, …, d − 1, say k, appears infinitely often as an r_j. This gives a subsequence of ((A^d)^m (A^k s))_m.

Conversely, the identity (A^d)^{m_j} (A^k s) = A^{d m_j + k} s shows that each accumulation point of ((A^d)^m (A^k s))_m is also an accumulation point of (A^m s)_m.

Figure 3.5: A visualisation of the squared transition matrix of the last example


3 Computing the density of an aperiodic DFA-matrix

Let 𝒜 be an automaton that is aperiodic. Unfortunately, 𝒜 is not necessarily strongly connected, so we cannot apply the basic limit theorem (see Section 4) directly. Instead, we use the following modified version:

Let J ⊆ {1, …, n} be the set of indices of all vertices that are in non-leaf components. Let I_k ⊆ {1, …, n} be the set of vertices in the k-th leaf component, k = 1, …, K, where K is the number of different leaf components. Let b_k be the unique convergence vector for each I_k that exists because of the basic limit theorem (filled with zeroes on all components that are not in I_k). Then there exist λ_k such that

   lim_{m→∞} A^m s = Σ_{k=1}^{K} λ_k b_k.   (3.1)

For a formal proof of this fact see the correctness section. While the b_k are independent of the starting vector, the λ_k must be calculated with respect to s. Considering Fig. 3.2, we can easily observe that the sum of the distributional densities in vertices 1 and 2 approaches zero. Still it is not easy to determine which fraction is led into the leaf component {3, 4} and how much into {5, 6}. To overcome this problem, we apply two reduction rules on non-leaf components that change the automaton but preserve the limit distribution:

loop reduction: If a vertex j ∈ J has a loop to itself, the loop is omitted and the other edge weights are scaled such that their sum equals 1 (see Fig. 3.7).

edge reduction: If a vertex j ∈ J has an edge to i ∈ J with edge weight a_ij, and all edges from i end in the p vertices k_1, …, k_p, then the edge from j to i is replaced by p edges from j to k_l, l = 1, …, p, with edge weight a_ij · a_{k_l i}. If an edge from j to k_l already existed, then a_ij · a_{k_l i} is added to its edge weight.

It is easily seen that the resulting matrix of a reduction is again a DFA-matrix. The proof that these rules preserve the limit can be found in the correctness section.
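Both rules can be sketched as in-place column operations under the paper's convention that a_ij is the weight of the edge from j to i and every column sums to one. The function names and the example matrix below are our own:

```python
from fractions import Fraction

def reduce_self_loop(A, j):
    """Loop reduction: drop the self-loop at j and rescale column j to sum 1."""
    a = A[j][j]
    assert a != 1, "only applicable to vertices in non-leaf components"
    A[j][j] = Fraction(0)
    for i in range(len(A)):
        A[i][j] /= (1 - a)

def reduce_edge(A, j, i):
    """Edge reduction: replace the edge j -> i by edges j -> k for every
    successor k of i, with added weight a_ij * a_ki. Assumes the self-loop
    at i (if any) has already been removed."""
    a_ij = A[i][j]
    A[i][j] = Fraction(0)
    for k in range(len(A)):
        if A[k][i] != 0:
            A[k][j] += a_ij * A[k][i]

# Column-stochastic example: vertex 0 has a self-loop of weight 1/2 and an
# edge to vertex 1 (weight 1/2); vertex 1 moves to vertex 2 with weight 1.
A = [[Fraction(1, 2), Fraction(0), Fraction(0)],
     [Fraction(1, 2), Fraction(0), Fraction(0)],
     [Fraction(0),    Fraction(1), Fraction(1)]]
reduce_self_loop(A, 0)     # now all of column 0 goes to vertex 1
assert A[1][0] == 1
reduce_edge(A, 0, 1)       # redirect 0 -> 1 through 1's successor 2
assert A[2][0] == 1 and A[1][0] == 0
# Both rules preserve the DFA-matrix property (columns sum to one):
assert all(sum(A[i][j] for i in range(3)) == 1 for j in range(3))
```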

If we eliminate the loop at a vertex j ∈ J and afterwards apply the second reduction rule to all edges to j, no edge to j remains. As all edges leading to j that are inserted by the reduction rules lead to target vertices of former edges, further applications of the reduction rules create no new edges to j. If we do this for an arbitrary fixed sequence of vertices, it is guaranteed that no further reductions are applicable, and the result is an


1-Density(A, s)
1  for each i that is not in a leaf component
2     do reduceSelfLoop(i)
3        reduceEdgesTo(i)
4  for k = 1, …, K
5     do λ_k ← Σ_{i∈I_k} ⟨s, A_i⟩
6        b_k ← eigenvector to eigenvalue 1 of I_k
7  return Σ_{k=1}^{K} λ_k b_k

Figure 3.6: Computing the density of a strongly connected component.

Figure 3.7: Loop reduction and edge reduction.

automaton 𝒜′ that has no edges within J. So within one step in 𝒜′, the probability of all states in J will be zero. The λ_k can then be obtained by adding up the probabilities of all vertices in I_k. Now that we have calculated all λ_k with respect to s, it remains to calculate the b_k, i.e. the unique limit distributions of the strongly connected components of 𝒜. For this purpose, we restrict A and b_k to the entries of I_k and ignore the rest; A′ ∈ ℚ₊^{|I_k| × |I_k|} and b′_k ∈ ℚ₊^{|I_k|} will denote these objects. The basic limit theorem yields that lim_{m→∞} A′^m s = b′_k for all starting vectors. b′_k is the eigenvector of A′ corresponding to eigenvalue 1, scaled to a column sum of 1. So we can easily compute b′_k by solving the homogeneous equation system (A′ − I)x = 0 with the Gaussian elimination algorithm. We do this for all I_k and obtain all b_k, and using the λ_k we compute the limit of A^m s. The sum of the entries of the accepting states equals the probability that a word will be accepted by 𝒜, given the initial probability distribution s.

4 Correctness of the algorithm

We have to prove the statements of Section 3, namely equation (3.1) and the correctness of the loop and edge reductions. We first recall a standard result for Markov chains, the basic limit theorem. See for instance


(Chang 2000) for a good introduction to Markov processes and a proof ofthe theorem.

Theorem 4.1 (Basic Limit Theorem). Let A be a DFA-matrix whose corresponding graph 𝒜 is strongly connected and aperiodic. Then lim_{m→∞} A^m s exists for all starting vectors s, and it is independent of s.

As mentioned above, we cannot apply the theorem directly, since the graph of the automaton is in general not strongly connected. The idea is to concentrate on the leaf components and to apply Theorem 4.1 there. The limit probability of nodes in non-leaf components is zero, since for almost all words the automaton will eventually be in a node of a leaf component. This is the subject of the following theorem:

Theorem 4.2. Let A be a DFA-matrix. Let I = {i | i is in a leaf component} be the set of all nodes which are in leaf components of 𝒜, and let J = {j | j is not in a leaf component} be the set of remaining nodes. Consider, for any starting vector s, (A^m s)_j, the j-th entry of the vector A^m s. Then

   lim_{m→∞} (A^m s)_j = 0.

Proof. Let s_m = Σ_{j∈J} (A^m s)_j. Then (s_m)_{m∈ℕ} is a sequence which is bounded below by 0 and which decreases monotonically. In fact, s_m equals the probability that at step m the automaton is in a node of a non-leaf component. Since there is no path from leaf component nodes to non-leaf component nodes, this probability cannot increase: Σ_{j∈J} (A^{m+1} s)_j ≤ Σ_{j∈J} (A^m s)_j.

Therefore (s_m)_m converges to a limit a. We shall show that a = 0. Let γ = min{a_kj | j ∈ J, a_kj ≠ 0} be the minimal non-zero edge weight among all edges starting in a node in J. Let ε > 0 and δ = ε γ^{|J|} / |J|. Since (s_m)_{m∈ℕ} converges, there is an m_0 ∈ ℕ such that |s_m − a| ≤ δ for all m ≥ m_0. Let k be any vertex in J that has an edge to a vertex in I. Then s_{m+1} ≤ s_m − (A^m s)_k · γ. Therefore (A^m s)_k ≤ δ/γ for m ≥ m_0 (because otherwise s_{m+1} would be smaller than a, which is impossible).

Let k′ be an arbitrary node in J. By definition, there is a path from k′ to a node in I. Let r ≤ |J| be the length of this path, and let k be the last node in J on this path. Then, since each edge weight on this path is at least γ, we have (A^{m+r} s)_k ≥ (A^m s)_{k′} · γ^{r−1}. Therefore, since r ≤ |J|,

   (A^m s)_{k′} ≤ (A^{m+r} s)_k / γ^{r−1} ≤ δ / γ^r ≤ δ / γ^{|J|}

   ⟹ Σ_{j∈J} (A^m s)_j ≤ |J| δ / γ^{|J|} = ε.

We can now prove the statement (3.1).


Theorem 4.3. Let A be a DFA-matrix such that the leaf components of the corresponding graph 𝒜 are aperiodic. For all leaf components of 𝒜, let I_k := {i | i is in the k-th leaf component}. Let J = {j | j is not in a leaf component} denote the remaining subgraph. Let s be any fixed starting vector. For all leaf components I_k, let b^{(k)} be the vector that has in its entries with indices from I_k the unique limit for I_k, which exists due to the basic limit theorem, and which is 0 on its other components. Then lim_{m→∞} A^m s exists and is a linear combination of the b^{(k)}, i.e.

   lim_{m→∞} A^m s = Σ_k λ_k b^{(k)}.

Proof. The idea of the proof is the following: since the leaf components are aperiodic and strongly connected, the basic limit theorem states that, given a starting vector, they converge to a limit which is independent of this starting vector. The problem here is that the components I_k do not 'run' on a fixed starting vector s. Instead, they get their 'input' from the non-leaf components. Moreover, the initial distributions on the I_k change with the non-leaf components, and these initial distributions are not given by starting vectors, since their sum need not be 1 (remember that a starting vector is characterized by the fact that the sum of its entries equals 1). But we already know that the distributions in the non-leaf component nodes converge to zero, which implies that the initial distributions of the I_k components stabilize. More precisely, consider the following.

Let λ_k^{(m)} = Σ_{i∈I_k} (A^m s)_i. We show that (λ_k^{(m)})_{m∈ℕ} converges, and denote the limit by λ_k. Let ε > 0 and let K be the number of leaf components. Because of the convergence of the distributions of the non-leaf components, there exists an m_1 such that Σ_{j∈J} (A^m s)_j < ε/(2K) for all m > m_1. Since I_k is a leaf component, the distributional fraction of I_k remains in I_k. Therefore,

   λ_k^{(m+m′)} ≤ λ_k^{(m)} + Σ_{j∈J} (A^m s)_j

   ⟹ |λ_k^{(m+m′)} − λ_k^{(m)}| ≤ Σ_{j∈J} (A^m s)_j ≤ ε   for all m ≥ m_1.

This means that (λ_k^{(m)})_{m∈ℕ} is a Cauchy sequence.

Let ε > 0, and let m_1 be such that

   |λ_k^{(m_1)} − λ_k| ≤ ε / (2K |b^{(k)}|).   (3.2)

Let P_k be the projection on the indices of I_k. By the basic limit theorem, given a (normalized) starting vector, I_k converges to b^{(k)}, which is independent of the starting vector. The vector P_k A^{m_1} s is not necessarily normalized, but the sum of its entries equals λ_k^{(m_1)}. Therefore the distribution of I_k with


starting vector A^{m_1} s converges to λ_k^{(m_1)} b^{(k)}. Thus there is an m_2 such that for m ≥ m_2

   |P_k A^m (A^{m_1} s) − λ_k^{(m_1)} b^{(k)}| < ε / (2K).   (3.3)

For m ≥ m_1 + m_2, this implies

   |A^m s − Σ_k λ_k b^{(k)}| = |Σ_k (P_k A^{m−m_1} A^{m_1} s) − Σ_k λ_k b^{(k)}|
      ≤ Σ_k |P_k A^{m−m_1} A^{m_1} s − λ_k^{(m_1)} b^{(k)}| + Σ_k |λ_k^{(m_1)} b^{(k)} − λ_k b^{(k)}|
      ≤ ε/2 + ε/2 = ε,   by (3.2) and (3.3).

This completes the proof.

5 Correctness of the reduction rules

The intuition behind both loop and edge reductions is the same: by 'redirecting' the probability distribution via (new) edges of appropriately scaled weights, the sum of the weights along all paths from some i to some j remains the same. Whereas before, paths of length two or more were considered, now any such path consists of fewer edges. However, we are interested in the limit distribution only, and so this does not make any difference.

Lemma 5.1 (Minor perturbation). Let A, B be two DFA-matrices which are equal except for the i-th column. We assume that both (A^m s) and (B^m s) are convergent for all vectors s. Moreover, for both A and B we assume (lim_{m→∞} A^m s)_j = (lim_{m→∞} B^m s)_j = 0 for all j ∈ J, the set of all nodes in non-leaf components. Then lim_{m→∞} B^m s = lim_{k→∞} lim_{m→∞} A^m B^k s, for all s.

Proof. Since both limits are assumed to exist, all that remains to be shown is equality. Let ε > 0, let s be any vector and let v be such that lim_m B^m s = v. Hence we can choose k such that ‖v − B^{k′} s‖_1 < ε/2 for all k′ > k. By assumption, v_j = 0 for all j ∈ J, so we obtain Σ_{j∈J} (B^{k′} s)_j < ε/2.

Let u := B^{k′} s and let u′ be equal to u except for the entries with indices in J, where u′ is zero. Since these components of u add up to a number lower than or equal to ε/2, we know that ‖u′ − u‖_1 ≤ ε/2. As u′ is zero on all components of J, it is obvious that A^m u′ = B^m u′ for arbitrary m. Now both A and B preserve the ‖·‖_1-norm, in particular ‖u′ − u‖_1 = ‖A(u′ − u)‖_1. Using the triangle inequality we obtain


   ‖A^m u − B^m u‖_1 = ‖(A^m − B^m) u‖_1
      = ‖(A^m − B^m)(u′ + (u − u′))‖_1
      ≤ ‖(A^m − B^m) u′‖_1 + ‖(A^m − B^m)(u − u′)‖_1
      ≤ ‖A^m u′ − B^m u′‖_1 + ‖A^m (u − u′)‖_1 + ‖B^m (u − u′)‖_1
      ≤ 0 + ε/2 + ε/2 = ε.

Hence lim_{m→∞} ‖A^m B^{k′} s − B^m B^{k′} s‖_1 < ε for any k′ > k, which yields

   lim_{m→∞} B^m s = lim_{k→∞} lim_{m→∞} A^m B^k s.

For any DFA-matrix A we denote by J_i the set of nodes from which i is reachable in the graph corresponding to A, i.e. J_i := {j | there exists a path from j to i}. Correctness of the elimination of self-loops follows from

Lemma 5.2. Let A be a DFA-matrix for 𝒜 such that the underlying graph is aperiodic in all of its leaf components. Let i be a node in a non-leaf component, thus in particular a_ii < 1. Let B be the DFA-matrix obtained from A by replacing a_ii by 0 and normalizing the column to 1, i.e. multiplying each entry in column i by 1/(1 − a_ii). Moreover, let us assume that (lim_{m→∞} A^m s)_j = (lim_{m→∞} B^m s)_j = 0 for all vectors s and j ∈ J_i. Then B converges for all vectors s to the same limit as A does, i.e. lim_{m→∞} A^m s = lim_{m→∞} B^m s.

Proof. First observe that B is again the DFA-matrix of some automaton whose underlying graph is aperiodic in all of its leaf components, since the reduction rules are only applied to nodes in non-leaf components. By Theorem 4.3 this implies that both (A^m) and (B^m) converge on all vectors s.

Let s_i denote the i-th component of a vector s, and let s^i denote the projection of s onto its i-th component. Then, taking b = A s^i, we have Bs = A(s − s^i) + (1/(1 − a_ii))(b − b^i) by the definition of B, and b = (b − b^i) + b^i = (b − b^i) + a_ii s^i. Intuitively, b gives the distribution after one step as obtained by starting in i, and this can be split into the self-loop part and the rest. We obtain

   lim_{m→∞} A^m s^i = lim_{m→∞} A^{m−1} b
      = lim_{m→∞} A^{m−1} ((b − b^i) + a_ii s^i)
      = lim_{m→∞} A^{m−1} (b − b^i) + a_ii lim_{m→∞} A^m s^i


since, by assumption, A^m is convergent. Thus lim_{m→∞} A^m s^i = (1/(1 − a_ii)) lim_{m→∞} A^m (b − b^i). Now

   lim_{m→∞} A^m s = lim_{m→∞} A^m (s − s^i + s^i)
      = lim_{m→∞} A^{m+1} (s − s^i) + lim_{m→∞} A^m s^i
      = lim_{m→∞} A^{m+1} (s − s^i) + (1/(1 − a_ii)) lim_m A^m (b − b^i).

Again, since A is assumed to be convergent, this gives

   lim_{m→∞} A^m s = lim_{m→∞} (A^{m+1} (s − s^i) + (1/(1 − a_ii)) A^m (b − b^i))
      = lim_{m→∞} A^m (A(s − s^i) + (1/(1 − a_ii))(b − b^i))
      = lim_{m→∞} A^m B s.

By induction we obtain lim_m A^m s = lim_m A^m B^k s for all k. Hence

   lim_{m→∞} A^m s = lim_{k→∞} lim_{m→∞} A^m B^k s,

and using Lemma 5.1 we conclude lim_m A^m s = lim_m B^m s.

Likewise, the correctness of our rule for the reduction of edges in non-leaf components follows immediately from

Lemma 5.3. Let A be a DFA-matrix as above, and let i ≠ k be two nodes in non-leaf components of the underlying graph. Let B denote the matrix obtained from A by replacing a_ki by zero and a_li by a_li + a_ki · a_lk for all l ≠ k. Again, let us assume that (lim_{m→∞} A^m s)_j = (lim_{m→∞} B^m s)_j = 0 for all j ∈ J_i. Then B converges for all vectors s to the same limit as A does, i.e. lim_{m→∞} A^m s = lim_{m→∞} B^m s.

Proof. As above, we observe that both A and B converge for all vectors s, so it remains to be shown that the limits are equal. If A = (a_ik), then let b be the vector with zero entries except for the k-th component, where it is a_ki · s_i. Let b′ = Ab; then Bs = As − b + b′, by the definition of B. Thus,

   lim_{m→∞} A^m s = lim_{m→∞} A^m A s
      = lim_{m→∞} A^m (Bs − b′ + b)
      = lim_{m→∞} A^m B s − lim_{m→∞} A^m (Ab) + lim_{m→∞} A^m b
      = lim_{m→∞} A^m B s.

As in the previous proof, lim_m A^m s = lim_m A^m B^k s for all k, by induction on k. Hence we can apply Lemma 5.1 and obtain lim_m A^m s = lim_m B^m s.


6 Conclusion and Outlook

In this paper we gave an upper bound for deciding whether a deterministic finite automaton defines a dense regular language. The algorithm we presented can be formulated such that the problem is seen to be in coNP: the nondeterministic version of the algorithm accepts if all accumulation points equal one. Trading time for space, it might be possible to show that the problem is in PSPACE, even if we are given a nondeterministic automaton.

Several other questions remain open: Is there a polynomial time algorithm for the problem, i.e. can we get rid of checking d values? The next step towards computing the limit density of more expressive formalisms might be to look at context-free languages. Related to this, we could ask whether finite automata accepting regular languages over finite trees have computable limit densities.

References

Dawar, A. and E. Grädel (1995). Generalized Quantifiers and 0-1 Laws. In Proc. 10th IEEE Symp. on Logic in Computer Science, IEEE Computer Society Press, pp. 54-64.

Le Bars, J.-M. (1998). Fragments of existential second-order logic without 0-1 laws. In Logic in Computer Science, pp. 525-536.

Berstel, J. (1972). Sur la densité de langages formels. ICALP, 345-358.

Blass, A., Y. Gurevich, and D. Kozen (1985). A zero-one law for logic with a fixed point operator. Journal of Information and Control 67, 70-90.

Chang, J. (2000). Stochastic Processes. Lecture notes, http://pantheon.yale.edu/~jtc5/251/.

Fagin, R. (1976). Probabilities on Finite Models. Journal of Symbolic Logic 41, 50-58.

Grandjean, E. (1983). Complexity of the First-Order Theory of Almost All Finite Structures. Information and Control 57, 180-204.

Kolaitis, P. G. and M. Y. Vardi (1992). 0-1 laws and decision problems for fragments of second-order logic. Journal of Information and Computation 98, 258-294.

Salomaa, A. and M. Soittola (1978). Automata-Theoretic Aspects of Formal Power Series. Springer-Verlag.

Gelbskij, Y. V., D. I. Kogan, M. L., and V. Talanov (1969). Range and degree of realizability of formulas in the restricted predicate calculus. Kibernetika 2, 17-28.


On the Logic of Proofs

Vladimir Brezhnev

Department of Mathematical Logic, Faculty of Mechanics and Mathematics,

Moscow State University, 119899 Moscow, Russia

[email protected]

Abstract. The Logic of Proofs (LP) was introduced by S. N. Artemov. It is a system in the propositional language with an additional sort of objects, proof terms, and extra atomic propositions [t]F with the intended reading "t is a proof of F". LP is an explicit counterpart of modal logic S4, since S4 is the exact term-forgetting projection of LP. In the present paper we build a sequent system corresponding to the fragment of LP sufficient to realize S4. Using its format we build explicit counterparts of the modal logics K, K4, D, D4 and T. We also define versions of these systems for which the intended interpretation of terms is proof tactics: computable procedures able to find proofs of some formulas, given a formula as input.

1 Introduction

In 1933 Gödel informally specified modal logic S4 as a logic for provability and left open the question of its exact intended semantics. The straightforward reading of □F as "F is provable in a certain formal system" contradicts Gödel's incompleteness theorem. S. N. Artemov gave the solution to this problem in (Artemov 1995; Artemov 2001). In (Artemov 1995) he introduced the Logic of Proofs (LP). It is a system in the propositional language with an additional sort of objects, proof terms, and extra atomic propositions [t]F with the intended reading "t is a proof of F". Gabbay's Labelled Deductive Systems (cf. (Gabbay 1994)) serve as a natural framework for LP. LP is sound and complete with respect to the arithmetical proof interpretations. LP realizes all theorems of S4, and thus provides the formal provability semantics for it.

S4 has all axioms and rules of propositional logic in the modal language, along with the necessitation rule F ⊢ □F and the modal axiom schemes:

AK.  □(F → G) → (□F → □G)   "distributivity"
AT.  □F → F                 "reflexivity"
AK4. □F → □□F               "transitivity"

The language of LP contains: the usual Boolean connectives and sentence variables; proof variables p_0, p_1, …; axiom constants a_0, a_1, …; functional symbols !, + and ·; brackets [ ] ( ). Terms and formulas: proof

Proceedings of the Sixth ESSLLI Student Session, Kristina Striegnitz (editor). Chapter 4, Copyright © 2001, Vladimir Brezhnev


variables and axiom constants are terms; if s and t are terms, then !t, (s + t) and (s · t) are terms; sentence variables and Boolean constants are formulas; Boolean connectives behave in the usual way; and if t is a term and F is a formula, then [t]F is a formula. We write st instead of (s · t) and skip parentheses when convenient. The system LP_AS has the following axiom schemes:

A0. The tautologies in the language of LP
A1. [t]F → F                          "verification"
A2. [t](F → G) → ([s]F → [t·s]G)      "application"
A3. [t]F → [!t][t]F                   "proof checker"
A4. [s]F → [s+t]F,  [t]F → [s+t]F     "choice"

AS. A finite set of formulas of the form [c]A, where c is an axiom constant and A is an axiom A0-A4.   "axiom specification"

The only rule is modus ponens. LP is the generic name for the systems LP_AS with the various axiom specifications.

The intended semantics of the operations ! and · is quite natural: ! builds the verification of a proof, and · corresponds to modus ponens. The traditional interpretation of + is the concatenation of two multi-conclusion proofs. If we assume that a usual Hilbert style derivation proves all of its formulas, we get an example of a multi-conclusion proof system.

The axiom specification corresponds to a particular case of the necessitation rule. The general case has its counterpart too: if LP ⊢ A, then for some term t of axiom constants LP ⊢ [t]A (cf. (Artemov 1995)).

The results of substituting □ for all occurrences of [·] in the axioms of LP are theorems of S4. The same is true for all theorems of LP.
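The term-forgetting projection is easy to make concrete. The sketch below uses a hypothetical tuple encoding of LP formulas (our own, not from the paper) and maps every [t]F to □F:

```python
def forget(f):
    """Term-forgetting projection: replace every proof-term prefix [t]F of an
    LP formula by the S4 modality 'box'. Formulas use a hypothetical tuple
    encoding: ('var', name), ('->', F, G), ('proof', t, F) for [t]F."""
    kind = f[0]
    if kind == 'var':
        return f
    if kind == '->':
        return ('->', forget(f[1]), forget(f[2]))
    if kind == 'proof':                  # [t]F  becomes  box F
        return ('box', forget(f[2]))
    raise ValueError('unknown formula: %r' % (f,))

# Axiom A1, [t]F -> F, projects to the S4 axiom AT, box F -> F:
a1 = ('->', ('proof', 't', ('var', 'F')), ('var', 'F'))
assert forget(a1) == ('->', ('box', ('var', 'F')), ('var', 'F'))

# Axiom A3, [t]F -> [!t][t]F, projects to AK4, box F -> box box F:
a3 = ('->', ('proof', 't', ('var', 'F')),
            ('proof', ('!', 't'), ('proof', 't', ('var', 'F'))))
assert forget(a3) == ('->', ('box', ('var', 'F')),
                            ('box', ('box', ('var', 'F'))))
```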

By an LP-realization of a modal formula F we mean an assignment of proof terms to all occurrences of the modality □ in F. An LP-realization is normal if all negative occurrences¹ of modality are realized by proof variables. Let F^r be the image of F under a realization r. If S4 ⊢ F, then LP_AS ⊢ F^r for some normal realization r and axiom specification AS (cf. (Artemov 1995)). In this sense LP realizes S4.

The epistemic meaning of modality suggests that a modal formula □A → □B implicitly specifies a function f(x) such that if x is a justification of A, then f(x) is a justification of B. LP makes this consideration formal. Indeed, negative occurrences of modality in the normal realization of a modal formula are labeled by proof variables, and positive ones by LP-terms, functions of proof variables.

The logic LP is a version of S4 presented in a richer operational language, with no information being lost, since S4 is the exact term-forgetting

¹Positive and negative occurrences of modality in a formula and a sequent: the outermost occurrence of modality in the formula □F is positive; any occurrence of modality from F in G → F, □F, and Γ ⊢ Δ, F, Λ has the same polarity as the corresponding occurrence of modality in F; any occurrence of □ from F in F → G and Γ, F, Λ ⊢ Δ has a polarity opposite to that of the corresponding occurrence of □ in F.


projection of LP. Naturally there is an interest in finding explicit counterparts for modal logics other than S4, and a common procedure for building an explicit counterpart. In this paper we try to approach this goal.

We define the sequential logic of proofs, system LSP(S4). This system may be regarded as a minimal explicit counterpart of modal logic S4. Artemov's realization procedure takes a cut-free derivation in the sequent version of S4 as its input and labels occurrences of modality in this derivation. Thus, derivations of LP sufficient for the realization of S4 are realizations of cut-free sequent derivations of S4. In our system we make this correspondence explicit. Derivations of LSP(S4) are exactly derivations of a cut-free sequent formulation of S4 with additional labels for occurrences of modality.

Using the format of LSP(S4) we easily find explicit counterparts for some modal logics for which cut-free sequent formulations are known. These logics are K, K4, D, D4 and T. All of them are subsystems of S4.

In Section 4 we define the systems LSP_T. Terms of these systems can be interpreted as proof tactics. Proof tactics, or simply tactics, are totally computable procedures able to find proofs of some formulas, given a formula as input. In Section 5 we build an arithmetical interpretation of the system LSP_T(S4).

2 Systems LSP

In these systems we derive pairs of the form "term : sequent", where the term encodes the proof of the sequent, like the terms of typed lambda calculus (cf. (Girard et al. 1989; Sorensen and Urzyczyn 1998)) or the justifications of typed logic (cf. (Constable 1998)). We call a sequent derivable if there is a term τ for which the pair τ : (sequent) is derivable.

By capital Greek letters we denote comma-separated lists of formulas. By i, j, k, m, n we denote integers.

The language of LSP contains: symbols Ax_i, i ≥ −1, Wl, Wr, Cl, Cr, Tl_j, Tr_j, j ≥ 0, M, T, D; labels l_1, l_2, …; the semicolon ";"; proof variables p_1, p_2, …; functional symbols ! and + (we omit · in the language); propositional variables S_1, S_2, …; Boolean connectives →, ⊥; brackets ( ), [ ].

A sequent is an expression of the form Γ ⊢ Δ, where Γ and Δ are lists of words in the language of LSP. We do not define terms and formulas explicitly; instead we state that if τ : (A_1, …, A_n ⊢ A_{n+1}, …, A_{n+m}) is derivable, then τ is a well-formed term and the A_i are well-formed formulas.

Rules. First, we define the system corresponding to classical propositional calculus. We call this system LSP. Axioms:

   Ax_{−1} : (⊥ ⊢)     Ax_0 : (⊥ ⊢ ⊥)     Ax_i : (S_i ⊢ S_i),  i ≥ 1

Weakening:

   σ : (Γ ⊢ Δ)    τ : (A, Π ⊢ Λ)          σ : (Γ ⊢ Δ)    τ : (Π ⊢ Λ, A)
   -----------------------------          -----------------------------
       Wl(σ, τ) : (A, Γ ⊢ Δ)                  Wr(σ, τ) : (Γ ⊢ Δ, A)


The weakening rules look slightly unusual: they have two premises each. The additional premises appear in order to guarantee that A is a well-formed formula.

Contraction:

   σ : (A, A, Γ ⊢ Δ)            σ : (Γ ⊢ Δ, A, A)
   ------------------           ------------------
   Cl(σ) : (A, Γ ⊢ Δ)           Cr(σ) : (Γ ⊢ Δ, A)

Transposition:

   σ : (Γ, A, B, Π ⊢ Δ)             σ : (Γ ⊢ Δ, A, B, Λ)
   --------------------------       --------------------------
   Tl_i(σ) : (Γ, B, A, Π ⊢ Δ)       Tr_j(σ) : (Γ ⊢ Δ, B, A, Λ),

where i is the number of commas before the indicated occurrence of A in the premise of the left rule, and j is the number of commas after the indicated occurrence of B in the premise of the right rule.

Implication introduction rules:

   σ : (Γ ⊢ Δ, A)    τ : (B, Π ⊢ Λ)          σ : (A, Γ ⊢ Δ, B)
   ---------------------------------         ----------------------
   Il(σ, τ) : (A → B, Γ, Π ⊢ Δ, Λ)           Ir(σ) : (Γ ⊢ Δ, A → B)

The system LSP is defined. Now we construct its extensions corresponding to modal logics. Rule (⊢K□):

   σ : (A_1, …, A_n ⊢ B)
   ----------------------------------------------------
   M(σ, p_1 … p_n, l) : ([p_1]A_1, …, [p_n]A_n ⊢ [t]B),

if l is a unique label, t = t_1 + ⋯ + t_k, every t_i has the form τ_i p_{i1} … p_{in_i}, the τ_i are well-formed terms, the p_{ij} are proof variables, and for some k, t_k = τ p_1 … p_n. We call expressions of this form normal proof polynomials, or simply proof polynomials. LSP(K) = LSP + (⊢K□).

In the case of this rule we do not store all the information required to recover the proof of the sequent in the term explicitly. Instead we use unique labels. We do this to allow additional summands in the proof polynomial to contain the term itself.

By the forgetful projection D⁰ of an LSP-derivation D we mean the derivation obtained by omitting the left-part terms and substituting □ for all occurrences of the labeled modality [·] in sequents. By the forgetful projection L⁰ of a system L we mean the system whose derivations are the forgetful projections of the derivations of L. The forgetful projection of LSP(K) corresponds to modal logic K (propositional calculus plus axiom AK, plus the necessitation rule). We call this system GK. There are axioms S_i ⊢ S_i and ⊥ ⊢ ⊥ in this system, but not A ⊢ A for an arbitrary formula A, as usual. One can see that all the traditional axioms are derivable from the atomic ones without using cut. LSP⁰ is a sequential system for classical propositional logic.

Rule (⊢K4□):

   σ : (A_1, …, A_n, [q_1]C_1, …, [q_m]C_m ⊢ B)
   ----------------------------------------------------------------------------
   M(σ, p_1 … p_n, l) : ([p_1]A_1, …, [p_n]A_n, [q_1]C_1, …, [q_m]C_m ⊢ [t]B),


t = t1 + ⋯ + tk, every ti has the form αi pi1 … pini !qi1 … !qimi, the αi are well-formed terms, the pij and qij are proof variables, and for some k, tk = αp1 … pn!q1 … !qm; l is a unique label. LSP(K4) = LSP + (⊢K4□). GK4 = LSP(K4)⁰. GK4 corresponds to the modal logic K4 = K plus axiom AS4.

Rule (□⊢D):

α : (A1, …, An ⊢ )
--------------------------------
D(α, p1 … pn) : ([p1]A1, …, [pn]An ⊢ )

LSP(D) = LSP(K) + (□⊢D). GD = LSP(D)⁰. GD corresponds to D = K + AD, where AD = □⊥ → ⊥.

Rule (□⊢D4):

α : (A1, …, An, [q1]C1, …, [qm]Cm ⊢ )
--------------------------------
D(α, p1 … pn) : ([p1]A1, …, [pn]An, [q1]C1, …, [qm]Cm ⊢ )

LSP(D4) = LSP(K4) + (□⊢D4). GD4 = LSP(D4)⁰. GD4 corresponds to D4 = K4 + AD.

Rule (□⊢T):

α : (A, Γ ⊢ Δ)
--------------------------------
T(α, p) : ([p]A, Γ ⊢ Δ)

LSP(T) = LSP(K) + (□⊢T). GT = LSP(T)⁰. GT corresponds to T = K + AT. LSP(S4) = LSP(K4) + (□⊢T). GS4 = LSP(S4)⁰. GS4 corresponds to S4 = K4 + AT.

3 Modal logics, LSP and LP

3.1 Realization of Modal Logics

Theorem 3.1. For every derivation D of GL there is a derivation Dʳ of LSP(L) such that (Dʳ)⁰ = D, where L is one of the logics K, K4, D, D4, T and S4. We call Dʳ a realization of D.

Proof. We build the realization by induction on the modal derivation. During this process we label sequents by terms encoding their proofs, and label occurrences of modality by proof variables or proof polynomials. The realization of most rules is straightforward.

In the rules introducing modality some new proof variables appear. We use fresh proof variables each time. In the rules (⊢□) we use proof polynomials containing only one essential summand. The only step that remains unclear is contraction. The problem is that the corresponding boxes in the two occurrences of the affected formula may be labeled by different terms.

Occurrences of modality in a derivation are related if they are the corresponding occurrences of the related formulas in the premises and conclusions of a rule; we extend this relation by transitivity. All occurrences of


On the Logic of Proofs

modality in a derivation are split into disjoint families of related ones. In a cut-free derivation the related occurrences of modality have the same polarity, so we define the polarity of a family as the polarity of its members.

The rules (⊢□) introduce positive and negative occurrences of modality; the rules (□⊢) introduce negative occurrences. There are no other sources of modality.

Contraction. From A, A, Γ ⊢ Δ infer A, Γ ⊢ Δ. We already have a derivation of α : (B, C, Γʳ ⊢ Δʳ), where B⁰ = C⁰ = A. Let us rebuild the derivation so that B = C. Negative occurrences of modality are labeled by proof variables, so we substitute the proof variables of B for the corresponding proof variables of C in the derivation. Positive occurrences of modality are introduced by one of the rules (⊢□), for the same formula A on the right-hand side, and are labeled by proof polynomials. Let the proof polynomials r and s label corresponding occurrences of modality in B and C. Let t be the minimal proof polynomial containing all summands of r and s. Substitute t for all occurrences of r and s in the derivation. After these substitutions all rules remain correct, and both B and C become the same formula, so we can apply contraction. ∎

Remark 3.2. In LSP-derivations positive families of modality are labeled by proof polynomials. Sometimes it is possible to throw out some non-essential summands of these polynomials while keeping the derivation correct. We call LSPR(L) the subsystem of LSP(L) whose derivations do not contain any non-essential summands in their proof polynomials. Our realization procedure leads to LSPR-derivations.

The realization procedure applied to the forgetful projection of an LSPR-derivation D leads to a derivation very similar to D. Namely, the only possible differences are different proof variables, different numbers of labels, and a different order of summands in proof polynomials. In this sense our systems may be regarded as minimal explicit counterparts of the corresponding sequent modal systems.

Remark 3.3. The systems LSPR have another interesting feature. For every proof polynomial t = t1 + ⋯ + tn in an LSPR-derivation there is a formula A such that for some Γi the sequents Γi ⊢ [ti]A are derivable. On the semantic level this means that for every i, under certain circumstances (namely, Γi), ti proves A. This gives one a chance to interpret plus as the operation that chooses a well-formed proof from two candidates. This operation isn't restricted to multi-conclusion proofs.

3.2 Embedding into LP

Theorem 3.4. If A1, …, An ⊢ B1, …, Bm (respectively A1, …, An ⊢ , or ⊢ B1, …, Bm) is derivable in LSP(L), then the formula ⋀Ci → ⋁Dj (respectively ⋀Ci → ⊥, or ⊤ → ⋁Dj) is derivable in LP, where the Ci and Dj are obtained from the Ai and Bj by


Proof. Induction on the LSP-derivation. The formulas corresponding to the axioms are propositional tautologies.

Weakening, transposition and the rules introducing implication have one of the forms

Γ ⊢ Δ
--------------------------------
Σ ⊢ Π

and

Γ1 ⊢ Δ1    Γ2 ⊢ Δ2
--------------------------------
Σ ⊢ Π,

and (⋀Γ → ⋁Δ) → (⋀Σ → ⋁Π) or (⋀Γ1 → ⋁Δ1) → ((⋀Γ2 → ⋁Δ2) → (⋀Σ → ⋁Π)) is a tautology. Take derivations of the formulas corresponding to the premises, add the suitable tautology and derive the formula corresponding to the conclusion using modus ponens.

Rule (□⊢T). Take the derivation of the formula corresponding to the premise, add the axiom [p]A → A and derive the formula corresponding to the conclusion using a suitable tautology.

Rule (⊢K4□). The formula ⋀Ai ∧ ⋀[qj]Bj → A is derivable; then for some term τ of axiom constants, [τ](⋀Ai ∧ ⋀[qj]Bj → A) is derivable (cf. (Artemov 1995)); using axioms A2 derive ⋀[pi]Ai ∧ ⋀[!qj][qj]Bj → [t]A, t = τp1 … pn!q1 … !qm; using axioms A3 derive ⋀[pi]Ai ∧ ⋀[qj]Bj → [t]A; finally, using axioms A4 derive ⋀[pi]Ai ∧ ⋀[qj]Bj → [t1 + ⋯ + t + ⋯ + tk]A, the formula corresponding to the conclusion. The LSP-terms in the additional summands ti are substituted by some unique axiom constants (place holders).

The rules (⊢K□), (□⊢D) and (□⊢D4) are processed along the same lines as rule (⊢K4□), using the axioms [t]⊥ → ⊥ in the last two cases.

Contraction. Take a derivation of the formula A ∧ B ∧ ⋀Ci → ⋁Dj corresponding to the premise, substitute some place holders in the derivation by terms of axiom constants to make A and B the same formula, add a suitable tautology and derive the formula corresponding to the conclusion. ∎

4 Systems LSPT

The intended interpretation of the terms of these systems is proof tactics: computable functions mapping formulas or sequents to proofs. Both parts of the pair term : sequent are necessary and sufficient to build the proof.

The language of LSPT proof polynomials contains: the symbols At, Ifl, Ifr, Mfl, Mfr, Ax, Wl, Wr, Cl, Cr, Tlj, Trj, M, T, Dj (j ≥ 0); proof variables p1, p2, …; the functional symbols ! and +; brackets ( ); and the semicolon ";". In this and the next section, any word in the language of LSPT proof polynomials is called a proof polynomial. The parts of a proof polynomial separated by + we call its summands.

The language of LSPT contains: the language of LSPT proof polynomials; propositional variables S1, S2, …; the Boolean connectives ⊥ and →; square brackets [ ]; and the sequent formation symbols ⊢, the comma "," and Λ.

Rules. Formula construction:

At : (⊥ ⊢ Λ)    At : (Si ⊢ Λ)    At : (Λ ⊢ ⊥)    At : (Λ ⊢ Si),    i ≥ 1


α : (Λ ⊢ A)    β : (B ⊢ Λ)
--------------------------------
Ifl(α, β) : (A → B ⊢ Λ)

α : (A ⊢ Λ)    β : (Λ ⊢ B)
--------------------------------
Ifr(α, β) : (Λ ⊢ A → B)

α : (A ⊢ Λ)
--------------------------------
Mfl(α) : ([p]A ⊢ Λ)

α : (Λ ⊢ A)
--------------------------------
Mfr(α) : (Λ ⊢ [t]A),

if p is a proof variable and t is a proof polynomial.

Axioms:

Ax : (⊥ ⊢ )    Ax : (Si ⊢ Si),    i ≥ 1

Weakening:

α : (Γ ⊢ Δ)    β : (A ⊢ Λ)
--------------------------------
Wl(α, β) : (A, Γ ⊢ Δ)

α : (Γ ⊢ Δ)    β : (Λ ⊢ A)
--------------------------------
Wr(α, β) : (Γ ⊢ Δ, A)

Contraction:

α : (A, A, Γ ⊢ Δ)
--------------------------------
Cl(α) : (A, Γ ⊢ Δ)

α : (Γ ⊢ Δ, A, A)
--------------------------------
Cr(α) : (Γ ⊢ Δ, A)

Transposition:

α : (Γ, A, B, Σ ⊢ Δ)
--------------------------------
Tli(α) : (Γ, B, A, Σ ⊢ Δ)

α : (Γ ⊢ Δ, A, B, Σ)
--------------------------------
Trj(α) : (Γ ⊢ Δ, B, A, Σ),

where i is the number of commas before the indicated occurrence of A in the premise of the left rule and j is the number of commas after the indicated occurrence of B in the premise of the right rule.

Implication introduction rules:

α : (Γ ⊢ Δ, A)    β : (B, Σ ⊢ Π)
--------------------------------
Il(α, β) : (A → B, Γ, Σ ⊢ Δ, Π)

α : (A, Γ ⊢ Δ, B)
--------------------------------
Ir(α) : (Γ ⊢ Δ, A → B)

The system LSPT is defined. Systems LSPT(K), LSPT(K4), LSPT(D4), LSPT(T) and LSPT(S4). Rule (⊢K□):

α : (A1, …, An ⊢ B)
--------------------------------
M : ([p1]A1, …, [pn]An ⊢ [t]B),

if t is a proof polynomial and one of its summands is αp1 … pn.

Rule (⊢K4□):

α : (A1, …, An, [q1]C1, …, [qm]Cm ⊢ B)
--------------------------------
M : ([p1]A1, …, [pn]An, [q1]C1, …, [qm]Cm ⊢ [t]B),

if t is a proof polynomial and one of its summands is αp1 … pn!q1 … !qm.

Rule (□⊢D):

α : (A1, …, An ⊢ )
--------------------------------
D0(α) : ([p1]A1, …, [pn]An ⊢ )


Rule (□⊢D4):

α : (A1, …, An, [q1]C1, …, [qm]Cm ⊢ )
--------------------------------
Dm(α) : ([p1]A1, …, [pn]An, [q1]C1, …, [qm]Cm ⊢ )

Rule (□⊢T):

α : (A, Γ ⊢ Δ)
--------------------------------
T(α) : ([p]A, Γ ⊢ Δ)

Remark 4.1. It is possible to construct a system of LSPT-tactics corresponding to the LSPT-terms in the following sense: for a word t in the language of LSPT and a sequent s, t : s is derivable if and only if the tactic corresponding to t gives a consistent result on input s. The tactics corresponding to the symbols At and Ax just check the format of their input and, if it matches, return a derivation containing the corresponding axiom. The other tactics try to recover the premises of the corresponding rule from their input, call the tactics corresponding to their parameters, and construct the derivation of their input using the corresponding rule of inference. The tactic corresponding to M has no parameters. It uses the information stored in the proof polynomial.
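As a loose illustration of this tactic reading (our own sketch, not part of the paper's formal system), tactics can be modeled as partial functions from an input sequent to a derivation, with ε as a failure value and + trying its summands in order. The sequent encoding and the helper names below are hypothetical:

```python
# Illustrative sketch: a tactic maps an input sequent to a derivation (a list
# of sequents), or to the failure value EPSILON; '+' tries alternatives in
# order, in the spirit of the summands of t1 + ... + tk.
EPSILON = None

def ax(goal):
    """Ax-style tactic: succeed only on inputs of a recognised axiom shape."""
    lhs, rhs = goal
    if lhs == rhs or lhs == 'FALSUM':
        return [goal]            # one-line derivation: the axiom itself
    return EPSILON

def plus(t1, t2):
    """Tactic t1 + t2: return t1's result unless it fails, otherwise t2's."""
    def tactic(goal):
        out = t1(goal)
        return out if out is not EPSILON else t2(goal)
    return tactic

def weaken_left(t, extra):
    """Wl-style tactic (simplified to one premise): derive (extra, lhs |- rhs)
    from a derivation of (lhs |- rhs) recovered by the parameter tactic t."""
    def tactic(goal):
        lhs, rhs = goal
        if not (isinstance(lhs, tuple) and lhs[0] == extra):
            return EPSILON
        body = t((lhs[1], rhs))
        return EPSILON if body is EPSILON else body + [goal]
    return tactic

t = plus(ax, weaken_left(ax, 'S2'))
assert t(('S1', 'S1')) == [('S1', 'S1')]                    # first summand succeeds
assert t((('S2', 'S1'), 'S1'))[-1] == (('S2', 'S1'), 'S1')  # second summand used
```

Only the control flow is the point here: + realises the choice between candidate proofs discussed in Remark 3.3.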

5 Arithmetical Interpretation of LSPT(S4)

We denote by Proof(x, y) the usual Gödel proof predicate for Peano Arithmetic (PA). Proof(x, y) is a Σ1-formula. PA ⊢ Proof(x, y) iff y is the code of the last formula in the proof with code x. We denote the empty word by ε. We assume that PA has all propositional tautologies among its axioms.

Definition 5.1. An arithmetical interpretation * of LSPT(S4) has the following parameters: an evaluation of the propositional letters by sentences of arithmetic, and an evaluation of the proof variables by natural numbers.

Proof polynomials are interpreted by tactics. ⊥* = ⊥, (A → B)* = (A* → B*), (A ⊢ Λ)* = (A* → ⊤), (Λ ⊢ A)* = (⊥ → A*), (Γ ⊢ )* = (⋀Γ* → ⊥), ( ⊢ Δ)* = (⊤ → ⋁Δ*), (Γ ⊢ Δ)* = (⋀Γ* → ⋁Δ*) (here conjunctions are grouped to the right and disjunctions are grouped to the left), ([p]A)* = Proof(p*, A*) if p is a proof variable, and ([t]A)* = Proof(n, A*) if t is a proof polynomial and n is the natural number coding the result of tactic t* on input A*.

Interpretation of proof polynomials. Here we use I for the input of a tactic.

Tactics Ax, At, Ifl(α, β), Ifr(α, β), Mfl(α) and Mfr(α): if I has the form (A → A), (⊥ → A) or (A → ⊤), then return the proof containing only the one formula I (an axiom). Otherwise return ε.

Tactic Cl(α): if the input has the form (A ∧ B → C), then call α* on input (A ∧ A ∧ B → C); if it returns ε then return ε, otherwise add to the output of α* the tautology (A ∧ A ∧ B → C) → (A ∧ B → C) and finally derive (A ∧ B → C) using modus ponens. Otherwise return ε.


Interpretation of the tactics of the forms Cr(α), Tli(α), Tri(α), Wl(α, β), Wr(α, β), Il(α, β), Ir(α) and T(α) goes along the same lines. While building the interpretation of T(α) we use the fact that PA ⊢ Proof(n, A) → A. Indeed, consider two possibilities. If Proof(n, A) is true, then n is the proof of A, thus PA ⊢ A, and PA ⊢ Proof(n, A) → A. Otherwise Proof(n, A) is false; it is a Σ1-formula, thus PA ⊢ ¬Proof(n, A), and again PA ⊢ Proof(n, A) → A.

Tactics of the form αp, where p is a proof variable, α does not contain +, and the last symbol of α is not !: if p* is the code of a correct proof, then take its last formula A and call α* on input A → I. If its result is not ε, then add to it the proof with the code p*, and finally derive I using modus ponens. Otherwise return ε.

Tactics of the form α!p, where p is a proof variable and α does not contain +: if p* is the code of a correct proof, then take its last formula A and call α* on input Proof(p*, A) → I; if its result is not ε, then search for the proof !p* of the formula Proof(p*, A) (it can be found because this Σ1-formula is true, and thus derivable), add it to the result of α*, and derive I using modus ponens. Otherwise return ε.

Tactics of the form α + β: if the result of α*(I) is not ε, then return it; otherwise return β*(I).

Tactics of the form M: if the input I has a form other than ⋀Proof(pi, Ai) → Proof(n, A), return ε. If for all i, pi* proves Ai, and n does not prove A, return ε. Otherwise I is a true Σ1-formula; it is derivable, and a proof of it can be found effectively.

Tactics corresponding to proof polynomials of other forms: return ε.

Theorem 5.2. If α : (Γ ⊢ Δ) is derivable in LSPT(S4) and * is an arithmetical interpretation of LSPT(S4), then PA ⊢ (Γ ⊢ Δ)* and α* can find a proof of it.

Proof. The proof is a straightforward induction on the LSPT-derivation. Most comments can be found in the definition of the interpretation of proof polynomials. The only step that remains unclear is rule (⊢K4□):

α : (A1, …, An, [q1]C1, …, [qm]Cm ⊢ B)
--------------------------------
M : ([p1]A1, …, [pn]An, [q1]C1, …, [qm]Cm ⊢ [t]B),

if t is a proof polynomial containing the summand t0 = αp1 … pn!q1 … !qm. If one of the pi* does not prove Ai*, or one of the qj* does not prove Cj*, then the interpretation of the conclusion is a true formula, and M* is able to find a proof of it. So let pi* prove Ai* and qj* prove Cj*. Then α* is able to find a proof of ⋀Ai* ∧ ⋀Proof(qj*, Cj*) → B*, so (αp1 … pn)* can find a proof of ⋀Proof(qj*, Cj*) → B*, and t0* can find a proof of B*. Since t0 is one of the summands of t, t* can find a proof of B*; thus Proof(t*(B*), B*) is true, and M* can find a proof of the interpretation of the conclusion of the rule. ∎


References

Artemov, S. (1995). Operational modal logic. Technical Report MSI 95-29, Cornell University. http://www.math.cornell.edu/~artemov/MSI95-29.ps.

Artemov, S. (2001). Explicit provability and constructive semantics. Bulletin of Symbolic Logic 6(1). http://www.math.cornell.edu/~artemov/BSL.ps.

Chagrov, A. and M. Zakharyaschev (1997). Modal Logic. Oxford Science Publications.

Constable, R. (1998). Types in Logic, Mathematics and Programming. In S. Buss (Ed.), Handbook of Proof Theory, Chapter 10. Elsevier Science.

Gabbay, D. M. (1994). Labelled Deductive Systems. Oxford University Press.

Girard, J.-Y., Y. Lafont, and P. Taylor (1989). Proofs and Types. Cambridge University Press.

Sørensen, M. H. and P. Urzyczyn (1998). Lectures on the Curry-Howard Isomorphism. Technical report, University of Copenhagen. ftp://ftp.diku.dk/diku/semantics/papers/D-368.ps.gz.


Information exchange as reduction

Balder ten Cate

ILLC - University of Amsterdam

[email protected]

Abstract. A formal framework is introduced for analysing pragmatic aspects of cooperative information exchange, based on update semantics as well as the theory of abstract reduction systems. The framework provides us with the means to analyse communicative strategies of multiple participants in information exchange dialogues. It is shown how the system can be applied to the analysis of relevance.

1 Introduction

The general aim of this paper is to provide a formal framework for analysing pragmatic aspects of cooperative information exchange. By information exchange we mean communication involving multiple participants and consisting of the exchange of information, in contrast to, e.g., argumentative discourse. Cooperative means that (a) the participants fully trust each other, and (b) they have no interest in lying, misleading or hiding information. The pragmatic aspects of these types of dialogues concern the goals and strategies of the participants. Clearly, we restrict ourselves to a very specific type of discourse. This will allow us to develop an interesting theory of such dialogues. At the end of the paper, we will address the question to what extent the present theory can be generalised to other types of discourse.

A more specific aim of this paper is to apply the developed framework to the issue of relevance. We will see how this can help us to analyse different notions of relevance that have been proposed in the literature.

Our starting point is Stalnaker's theory of assertions. Stalnaker (1978) observed that in making assertions, people take some of their private information and make it common ground. In terms of possible world semantics, they eliminate possibilities from the common ground.

At any point in a dialogue, people make choices concerning which information to exchange (which possibilities to eliminate). Typically, there

⁰I'd like to thank Paul Dekker, Darrin Hindsill, Jan Willem Klop, Rosja Mastop, Marie Nilsenová, Robert van Rooy, Yde Venema and the three anonymous reviewers for their useful comments and discussions.

Proceedings of the Sixth ESSLLI Student Session
Kristina Striegnitz (editor)
Chapter 5, Copyright © 2001, Balder ten Cate


are many alternatives to choose from. Notwithstanding this apparent divergence, there is a clear direction in which the conversation proceeds. During the conversation, the information states of the participants are adjusted to each other, converging to a (hypothetical) situation in which there is no information left to communicate. In this situation, all participants have the same information (i.e., the information that used to be distributed between them). Obviously, this situation is never reached in practice.¹

Figure 5.1: The process of information exchange

Assuming for the moment that there are only two participants involved, these considerations lead to a view on the process of cooperative information exchange (CIE) that is depicted in Figure 5.1. In this picture, the top three ovals correspond to the information states of the agents and to the common ground (thought of as sets of possibilities). At the "end state" (at the bottom of the picture) all three circles coincide. This corresponds to the hypothetical state that we discussed. The arrows in the picture correspond to assertions. A dialogue, then, is essentially a walk through this graph.

In CIE dialogues, the participants usually have a specific communicative goal, for instance to resolve a decision problem (van Rooy 2000). In general, each participant can have his or her own goal, but for the moment we will assume that the participants have a common goal. This common goal is represented by the outlined area in the bottom half of the picture (the goal is to reach one of the states in this area).

Relevance can be analysed as a strategy for achieving a goal. Intuitively, irrelevant utterances are utterances that will not help the participants to achieve their goal.

In the rest of this paper, we will formalise the picture that we just sketched. In doing so, we will make use of the theory of Abstract Reduction Systems. ARSs form a field of research in theoretical computer science that is concerned with the reduction of terms, or more abstract objects, to a normal form. Combinatory logic and the lambda calculus are the prime examples, but many other reduction systems have been devised, for instance concerning braids and knots. For a general introduction, the reader is referred to Klop (1992) or Baader and Nipkow (1998).

2 Information states and utterances

We start out by defining Stalnakerian `contexts', which we will refer to as states. A state specifies the information that each agent has, as well as the

¹Notice that this is an idealised model of information exchange. In practice, many complications arise (misunderstandings, mistakes, lying, etc.) that we will not address.


information that is common ground. Such states can be modeled in various ways, but for present purposes, the following definition will suffice. Let P be a propositional alphabet and A a finite set of agents. Let V be the set of all valuations over P. Then we define states as follows (where c stands for the common ground):

Definition 1 (States). A state is a function σ : A ∪ {c} → ℘(V) such that ∀a ∈ A : σ(a) ⊆ σ(c). Σ is the set of all states.

In other words, we have a set of agents and the common ground, and each of these is assigned a set of valuations representing its information. Notice that it is required that the common ground contains less information than each of the participants, which is only natural.

We will sometimes write σ ∩ τ for λa.(σ(a) ∩ τ(a)). Likewise, we will write σ ⊆ τ instead of ∀a : σ(a) ⊆ τ(a).²

Two central notions from update semantics are support and update. Typically, these notions are defined in terms of individual information states. Here, we will use the same notions, but define them in terms of multi-agent states.

Definition 2 (Support). σ ⊨a φ if ∀v ∈ σ(a) : v ⊨ φ.

This definition says that agent a supports φ ("knows that φ") in state σ whenever φ is true in all the situations a considers possible.

Definition 3 (Update). σ + φ = λa.{v ∈ σ(a) | v ⊨ φ}

According to this definition, when the agents collectively update with a sentence φ, all possibilities in which φ is not the case are eliminated (from the private information sets as well as from the common ground).
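To make Definitions 2 and 3 concrete, here is a small executable model. This is our own illustration; the encoding of valuations and formulas is hypothetical, not the paper's:

```python
from itertools import combinations

# A valuation is modeled as the frozenset of atoms it makes true; a state maps
# each agent (and 'c' for the common ground) to a set of valuations.
P = ['p', 'q']
V = [frozenset(s) for r in range(len(P) + 1) for s in combinations(P, r)]

# Formulas are modeled as predicates on valuations (a hypothetical encoding).
p = lambda v: 'p' in v

def supports(state, a, phi):
    """Definition 2: agent a supports phi iff phi holds throughout state[a]."""
    return all(phi(v) for v in state[a])

def update(state, phi):
    """Definition 3: eliminate everywhere the possibilities where phi fails."""
    return {a: {v for v in poss if phi(v)} for a, poss in state.items()}

# Agent 1 privately knows p; agent 2 and the common ground are uninformed.
state = {1: {v for v in V if 'p' in v}, 2: set(V), 'c': set(V)}
assert supports(state, 1, p) and not supports(state, 'c', p)
after = update(state, p)
assert supports(after, 'c', p)   # after the collective update, p is common ground
```

Note that the update hits the private information sets and the common ground alike, exactly as the definition prescribes.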

In terms of support and update, we can now define the valid "moves of the game", which correspond to assertions. Recall that we focus on cooperative information exchange, meaning that the participants (a) trust each other, and (b) have no intent to deceive each other. These two assumptions are reflected in the next definition, using the notions of update and support respectively.

²This notation is justified by the fact that, type-theoretically, the given definition is equivalent to one in which a state is a relation between agents and possibilities.


Definition 4 (Utterances).

i. σ −a:φ→ τ if σ ⊨a φ and σ ⊭c φ and τ = σ + φ

ii. σ −a→ τ if σ −a:φ→ τ for some φ

iii. σ → τ if σ −a→ τ for some a ∈ A

σ −a:φ→ τ means that in state σ, agent a can utter the sentence φ, resulting in the new state τ. According to this definition, an agent can only utter a proposition if he or she knows its content to be true, and if the information is not already common ground. The effect of the utterance is then a collective update with the proposition uttered.

Notice that the set of all states Σ is closed under →, i.e., for all σ ∈ Σ and φ, (σ + φ) ∈ Σ. Also, if σ → τ, then τ ⊆ σ, since the effect of an update is always the elimination of possibilities. We will denote the reflexive closure of −a→ by −a→⁼, and the reflexive, transitive closure of −a→ by −a→→.
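Definition 4 and the observation that τ ⊆ σ can be checked directly in a toy model (again our own illustrative encoding, not the paper's):

```python
# Valuations over a single atom 'p'; a state maps agents and 'c' (the common
# ground) to sets of valuations, each valuation a frozenset of true atoms.
V = [frozenset(), frozenset({'p'})]
p = lambda v: 'p' in v

def supports(state, a, phi):
    return all(phi(v) for v in state[a])

def update(state, phi):
    return {a: {v for v in poss if phi(v)} for a, poss in state.items()}

def utter(state, a, phi):
    """Definition 4(i): a may assert phi iff a supports it and it is not yet
    common ground; the result is the collective update (None = no such move)."""
    if supports(state, a, phi) and not supports(state, 'c', phi):
        return update(state, phi)
    return None

sigma = {1: {frozenset({'p'})}, 2: set(V), 'c': set(V)}
tau = utter(sigma, 1, p)
assert tau is not None
assert all(tau[a] <= sigma[a] for a in sigma)   # tau ⊆ sigma: only elimination
assert utter(tau, 1, p) is None                 # p is common ground: no re-assertion
```

The last line shows the "no redundant assertions" side condition σ ⊭c φ at work.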

The next proposition states some properties of the →'s. The diagrams should be read as follows: whenever the situation pictured by the solid arrows occurs, we can extend this configuration with the dotted arrow. For instance, the transitivity diagram says that whenever σ −a:φ→ τ and τ −a:ψ→ υ, we have σ −a:φ∧ψ→ υ.

Proposition 1. The following diagrams hold (the diagrams themselves are not reproduced here): (transitivity), (subcommutativity), (swap), (PP −a→, −b→), (CR for −a→) and (CR for →).

If we conceive of → as a reduction relation, then a normal form is a state that cannot be reduced any further, i.e., a state in which there is no information left to exchange. The last diagram of the above proposition tells us that → is confluent (has the Church-Rosser property). This implies that every state reduces to at most one normal form. Now we can ask ourselves which states have such a normal form, and what it looks like.

If the propositional alphabet P is finite, then the answer to this question is quite simple. In that case, → is strongly normalising, which means that the normal form is always reached in a finite number of steps (there are no infinite reduction sequences). Furthermore, nf(σ) = λa.⋂b∈A σ(b), i.e., the normal form of σ is a state in which each agent has the same information,


namely the intersection of the old information states of the agents.³ It follows that σ is in normal form precisely if ∀a ∈ A : σ(a) = σ(c).
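For a finite alphabet the normal form can be computed directly. The sketch below (our own, with a hypothetical encoding of valuations) pools the distributed information:

```python
from itertools import combinations

# For finite P the reduction is strongly normalising, and the normal form gives
# every agent (and the common ground) the intersection of the private states.
P = {'p', 'q'}
V = {frozenset(s) for r in range(len(P) + 1) for s in combinations(sorted(P), r)}

A = [1, 2]
sigma = {1: {v for v in V if 'p' in v},   # agent 1 knows p
         2: {v for v in V if 'q' in v},   # agent 2 knows q
         'c': set(V)}

def nf(state):
    """nf(sigma) = λa. ⋂_{b∈A} sigma(b): the pooled (distributed) information."""
    pooled = set.intersection(*(set(state[b]) for b in A))
    return {a: set(pooled) for a in state}

normal = nf(sigma)
assert all(normal[a] == normal['c'] for a in A)          # nothing left to exchange
assert normal['c'] == {v for v in V if {'p', 'q'} <= v}  # p ∧ q is common ground
```

The constant-function character of nf (the same set for every agent and for c) mirrors the lambda abstraction discussed in footnote 3.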

In the general case, things are a bit more complicated. In general, we do not have strong normalisation (or even weak normalisation). This means that if the agents keep on exchanging information, this process might not come to an end. Furthermore, even if a normal form is reached, it is not guaranteed that in the final situation the agents all have the same information. It might happen that one agent has more information than another, but cannot communicate this information (for instance if that would require an infinitely long formula). In order to give precise characterisations of the normal forms of our system, we need to introduce the auxiliary notion of saturation. This is defined in terms of ultrafilters.

Definition 5 (Saturation). σ̄ = λa.{v ∈ V | ∃u ∈ Uf(V) : σ(a) ∈ u & ∀p ∈ P : {w | w(p) = v(p)} ∈ u}

Technically speaking, σ̄ is the smallest saturated superset of σ. Furthermore, σ̄ is equivalent to σ, in the sense that it satisfies the same formulae (σ ⊨a φ iff σ̄ ⊨a φ).⁴ Having the notion of saturation at our disposal, we can generalise our results to the case where P is infinite.

Proposition 2.

i. If σ has a normal form, then nf(σ) = λa.[σ(a) ∩ ⋂b∈A σ̄(b)].

ii. σ is in normal form iff ∀a ∈ A : σ̄(a) = σ̄(c).

In the absence of normal forms, cofinal reduction sequences often play a similar role. A reduction sequence σ1 → σ2 → … is called cofinal if ∀τ : σ1 →→ τ ⟹ ∃i : τ →→ σi. The following proposition tells us that if P is countably infinite, then for every σ (even if σ doesn't have a normal form) there are cofinal reduction sequences starting with σ, and they converge to λa.[σ(a) ∩ ⋂b∈A σ̄(b)].

Tb2A �(b)].

Proposition 3.

i. If σ1 → σ2 → … is a cofinal reduction sequence, then ⋂i σi = λa.[σ1(a) ∩ ⋂b∈A σ̄1(b)].

ii. If P is countable, then cofinal reduction sequences exist for every σ.

This finishes our discussion of the properties of our ARS. In the next section it will become clear why it makes sense to conceive of → as a reduction relation.

³Notice the lambda abstraction over a, which does not occur in the rest of the formula. This means that nf(σ) is a constant function, assigning the same information to each agent (and to the common ground).

⁴Cf. Blackburn et al. (to appear) for an introduction to ultrafilters and saturatedness. For present purposes, one can think of σ̄ as a state that is almost like σ but not exactly.


3 Goals and strategies

At this point we have formalised most aspects of Figure 5.1. However, what we have left out of consideration so far is the goal of the information exchange. We will now reintroduce the notion of a communicative goal.

Definition 6 (Goal). A goal G is a set of states that is closed under reduction, i.e., σ ∈ G & σ → τ ⟹ τ ∈ G.

Remember that we explicitly assume that the participants have no interest in hiding information from each other. This assumption is reflected by the fact that the set of satisfactory states is closed under further reduction. An example of a goal is {σ | σ is a normal form}, a rather trivial example that illustrates that our notion of a goal is really a generalisation of reaching the normal form.⁵ A more interesting range of examples is generated by the following definition.

Definition 7 (The goal to resolve an issue). Gφ? = {σ | σ ⊨c φ or σ ⊨c ¬φ}.

Gφ? is the goal to resolve the issue whether φ (in other words, it is the goal to reach a state in which it is common ground whether φ).
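In a toy executable model (ours, not the paper's), membership in Gφ? is a simple check on the common ground:

```python
# A state maps agents and 'c' to sets of valuations (frozensets of true atoms);
# formulas are predicates on valuations (hypothetical encoding).
p = lambda v: 'p' in v

def supports(state, a, phi):
    return all(phi(v) for v in state[a])

def resolves(state, phi):
    """Membership in G_phi?: it is common ground that phi, or that not-phi."""
    return supports(state, 'c', phi) or supports(state, 'c', lambda v: not phi(v))

V = [frozenset(), frozenset({'p'})]
assert not resolves({'c': set(V)}, p)          # the issue is still open
assert resolves({'c': {frozenset({'p'})}}, p)  # resolved positively
assert resolves({'c': {frozenset()}}, p)       # resolved negatively
```

Since updates only eliminate possibilities from σ(c), a resolved issue stays resolved, so Gφ? is indeed closed under reduction, as Definition 6 requires.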

An important question is: what is a good strategy for reaching a particular goal? Given a specific goal, four requirements can be formulated with respect to a strategy.

1. If the goal is feasible (i.e., if it is possible to reach the goal), the goal should be reached if the participants follow the strategy.

2. The goal should be reached as quickly as possible.

3. If the goal cannot be reached, then at some point the strategy should tell you to give up and stop wasting your time.

4. The strategy should be easy to apply (i.e., complexity-wise).

In this paper, we focus on the first of these requirements.

The quest for efficient strategies is not particular to CIE dialogues, but is central to the theory of reduction systems (which is why using ARSs to analyse CIE is useful). It was already established in the previous section that our reduction relations are confluent. The importance of confluence for information exchange can be explained as follows.

Suppose a goal G can be reached from a state σ, which is to say, there is a state τ such that σ →→ τ and τ ∈ G. Then confluence says that no matter what the participants do, the opportunity to reach the goal is never

⁵The goal of reaching a weak head normal form in the lambda calculus also falls under this general notion of a goal, as the set of weak head normal forms is closed under β-reduction.


lost. Formally, for any state σ′, if σ →→ σ′, then there is a state τ′ ∈ G such that σ′ →→ τ′.

This does not mean that it is easy to reach the goal. In order to reach the goal, it is still important to have a good strategy. Finding good strategies in the presence of confluence is what ARS theory is concerned with.

Let us start out by defining what strategies are. We are concerned with several agents, and each of them can follow their own strategy. Therefore, we define strategy profiles as tuples of reduction strategies, one for each agent (where the individual reduction strategies are defined in the usual way). Also, we must require that the individual strategy of an agent does not make reference to the private information of another agent. This leads to the following definition.

Definition 8 (Strategy profiles). A strategy profile F is a tuple (F_a)_{a∈A} such that for each agent a ∈ A,

• F_a is a sequential reduction strategy for →_a

• ∀σ, τ : σ(a) = τ(a) & σ(c) = τ(c) ⇒ F_a(σ) = F_a(τ)

Given a state and a strategy profile, we can ask ourselves what will happen if the agents start communicating in accordance with their strategies. In addition to the strategies of the individual players, we then also need to specify a particular system of turn taking: which agent's turn is it to say something at a given moment? At this point, it seems most sensible to make only the minimal requirement of fairness: every agent should have more or less an equal opportunity to say something. We define an F-dialogue to be a reduction sequence that is both in accordance with F and fair.

Definition 9 (Dialogue). An F-dialogue is a reduction sequence σ_1 → σ_2 → … such that

• ∀i ∃a ∈ A : σ_{i+1} = F_a(σ_i)

• ∀a ∈ A ∀i ∃j ≥ i : either σ_{j+1} = F_a(σ_j) or σ_j is in →_a-normal form.

The first condition demands that the reduction sequence be in accordance with the strategy profile (i.e., every step is in accordance with the strategy of some agent). The second condition expresses fairness: every agent should say something every once in a while, unless he or she has nothing left to say (more precisely: for any participant and at any moment i, there will be a moment j ≥ i at which the participant either makes an utterance or has no information left to exchange).

The next definition tells us when a strategy profile is successful with respect to a goal (a G-state being a state which is in G).


Information exchange as reduction

Definition 10 (Success). F is successful with respect to G if for all F-dialogues σ_1 → σ_2 → … it holds that if σ_1 can be reduced to a G-state, then a G-state occurs in the sequence.

This definition formalises the first requirement on strategies that we discussed: whenever a goal can be reached, it will be reached if the participants use the strategy profile.
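Definitions 8-10 can be sketched concretely. Everything below is an illustrative assumption rather than the paper's system: states assign private facts to agents "a" and "b" plus a common ground "c", agent a's strategy F_a publishes a's lexicographically smallest private fact, and turn taking is round-robin (which satisfies the fairness condition):

```python
def F(agent, state):
    """The agent's sequential reduction strategy for ->_agent: publish
    the lexicographically smallest private fact. A state in which the
    agent has no private facts is in ->_agent-normal form."""
    if not state[agent]:
        return state
    fact = min(state[agent])
    nxt = {k: set(v) for k, v in state.items()}
    nxt[agent].remove(fact)
    nxt["c"].add(fact)
    return nxt

def dialogue(state, agents, rounds=10):
    """An F-dialogue under round-robin turn taking, which is fair:
    every agent gets a turn in every round."""
    trace = [state]
    for _ in range(rounds):
        for agent in agents:
            nxt = F(agent, trace[-1])
            if nxt != trace[-1]:
                trace.append(nxt)
    return trace

start = {"a": {"p", "q"}, "b": {"r"}, "c": set()}
goal = lambda s: "p" in s["c"]      # the goal G: "p" becomes common ground

trace = dialogue(start, ["a", "b"])
assert any(goal(s) for s in trace)  # this profile is successful w.r.t. G
assert trace[-1]["c"] == {"p", "q", "r"}
```

Because the goal is reachable from the start state and the dialogue is fair, a G-state indeed occurs in the sequence, exactly as Definition 10 requires.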

4 Relevance

In the previous section, we have developed the necessary vocabulary to talk about goals, strategies, and the success of strategies with respect to goals. In the present section, the framework is applied to the analysis of relevance.

Given that information exchange dialogues take place with a specific goal, it seems most natural to relate the relevance of utterances to this goal: an utterance is relevant if it helps to achieve the conversational goal. Van Rooy (2000) argues that the goal of information exchange dialogues is to solve a particular decision problem. Given that decision problems are essentially issues of the form "what should I do?", this means that utterances are relevant to the extent that they address (or help to resolve) this issue.

However, what does it mean for an utterance to address, or help to resolve, an issue? Van Rooy (2000) uses decision theory to explicate this. However, the decision theoretic view is limited by the fact that only single actions and their immediate consequences are evaluated. The effects of utterances with respect to the goal in the long run are not taken into consideration.

The present framework offers a more general view, allowing us to evaluate notions of relevance in terms of successful strategies for achieving a goal. To illustrate this, we will take two formal notions of answerhood and show what happens if we formalise the above described view on relevance using each notion. In particular, when considering a potential notion of relevance we will look at the implications of being relevant (in the proposed sense) with respect to reaching the goal of the conversation.

Before we proceed, we give a general result concerning the existence of successful strategies. Let us define cofinal strategy profiles as follows.

Definition 11 (Cofinality of strategy profiles). F is cofinal if all F-dialogues are cofinal.

Cofinal strategy profiles satisfy the following properties.

Proposition 4.

i. Cofinal strategies are successful with respect to every goal.

ii. If P is finite, then every strategy profile is cofinal.


iii. If P is countably infinite, then there are cofinal strategy profiles.

From the above result we can conclude that it is not unreasonable to state our requirements on notions of relevance in terms of successfulness.

Let us consider again what it means for an utterance to address the decision problem. Groenendijk and Stokhof (1984) have distinguished and formalised four ways in which an assertion can address an issue: being a complete answer, being a partial answer, giving a complete answer and giving a partial answer. The last of these four notions of answerhood is the most liberal one: an assertion gives a partial answer to a question if it eliminates at least one possible answer, which is to say, if it eliminates at least one block of the partition generated by the question.

For convenience, let us assume that the decision problem corresponds to a yes/no question, and let φ? denote the issue whether φ is the case. In the previous section, we already defined G_φ? to be the goal of resolving φ?. Now a proposition ψ gives a partial answer to the issue φ? just in case updating with ψ results in a state in which φ? is resolved.6 In other words, ψ gives a partial answer to φ? in a state σ precisely if (σ + ψ) ∈ G_φ?.

Let us define ψ to be relevant1 with respect to φ? if ψ gives a partial answer to φ?. Unfortunately, it turns out that, in order to resolve φ?, it can be necessary for the participants to make assertions that are not relevant1 with respect to φ?. Consequently, any successful strategy profile will sometimes prescribe utterances that are not relevant1 with respect to the decision problem.

There are weaker notions of answerhood than that of giving a partial answer. Ten Cate (2000) has developed a notion of relatedness of issues, based on composition of equivalence relations (essentially, two issues are related if their corresponding partitions are not orthogonal). Using this, let us define ψ to be relevant2 with respect to φ? if ψ? is related to φ?.7

Relevance2 does not have the problem described above: there are successful strategies with respect to any goal that prescribe only relevant2 utterances. Although relevance2 behaves better than relevance1 in this respect, it has another problem: given any sentence ψ and any issue φ?, we can split ψ up into two formulae, (φ → ψ) and (¬φ → ψ), which are both relevant2 with respect to φ?. Consequently, in terms of our ARS, the reduction graph generated by relevant2 reductions is cofinal in the entire reduction graph.

A natural language example that illustrates this problem is the following. Suppose the issue is whether Amsterdam is the capital of the Netherlands and consider the proposition "it rains". Then the latter proposition can be split into two parts, "if Amsterdam is the capital of the Netherlands then

6 In the case of a yes/no question, eliminating at least one alternative means resolving the question completely.

7 As a matter of fact, a proposition ψ is relevant2 with respect to an issue φ? precisely if either ψ or ¬ψ is relevant1 with respect to φ?.


it rains" and "if Amsterdam is not the capital of the Netherlands then it rains", such that both propositions are relevant2 with respect to the issue (i.e., whether Amsterdam is the capital of the Netherlands).
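This example can be checked mechanically. The sketch below assumes a toy propositional model (worlds as (capital, rain) truth-value pairs, states as sets of worlds) and implements relevance2 via the characterization in footnote 7; "it rains" comes out as not even relevant2, while both halves of the split are relevant2:

```python
from itertools import product

worlds = set(product([True, False], repeat=2))   # (capital, rain) pairs
capital = lambda w: w[0]
rain    = lambda w: w[1]

def update(state, prop):
    """Stalnaker-style update: keep the worlds where prop holds."""
    return {w for w in state if prop(w)}

def resolved(state, issue):
    """A yes/no issue is resolved iff all remaining worlds agree on it."""
    return len({issue(w) for w in state}) <= 1

def relevant1(prop, state, issue):
    """prop gives a partial answer: updating resolves the yes/no issue."""
    return resolved(update(state, prop), issue)

def relevant2(prop, state, issue):
    """Footnote 7 characterization: prop or its negation is relevant_1."""
    neg = lambda w: not prop(w)
    return relevant1(prop, state, issue) or relevant1(neg, state, issue)

# "It rains" neither resolves the issue, nor does its negation:
assert not relevant1(rain, worlds, capital)
assert not relevant2(rain, worlds, capital)

# But both "split" halves are relevant_2, because each negation
# (capital and not rain, resp. not capital and not rain) resolves the issue:
if_cap_rain     = lambda w: (not capital(w)) or rain(w)   # capital -> rain
if_not_cap_rain = lambda w: capital(w) or rain(w)         # not capital -> rain
assert relevant2(if_cap_rain, worlds, capital)
assert relevant2(if_not_cap_rain, worlds, capital)
```

This makes the complaint in the text concrete: relevance2 lets an intuitively irrelevant proposition in through the back door, one conditional half at a time.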

Consequently, relevance2 does not significantly decrease the complexity of the reduction graph. Ideally, we therefore need a notion of relevance that is weaker than relevance1, but stronger than relevance2. We do not have such a notion at present. Nevertheless, the discussion illustrates how the present framework can contribute to the analysis of notions of relevance.

5 Conclusion and discussion

It has been shown that ARS theory can be used for analysing pragmatic aspects of cooperative information exchange. In particular, it was shown how such an approach to pragmatics can contribute to the analysis of relevance. There are a number of directions for further research.

First of all, a number of assumptions were made with respect to the type of communication. The present approach depends crucially on the confluence property (most rewriting literature assumes that the reduction relations are either confluent or have some infinitary variant of the confluence property). When we look at other types of communication than information exchange (e.g., argumentation), there seems to be no confluence.

However, when we stay with information exchange, it is still possible to generalise over the assumptions that were made. In particular, we have looked only at the case in which the participants share a common goal, but we could also assign each participant their own goal. In this way, it is possible to model less cooperative types of communication.

In this paper, we have focussed on only one aspect of information exchange strategies, namely that they must be successful with respect to the goal of the conversation. However, we mentioned three other requirements on strategies. These requirements should be examined in more detail, and formalised where possible. Note that the requirements on strategies we discussed are not specific to the case of information exchange but are relevant for many rewriting applications.

Another direction of further research would be to enrich the models of information. We took a simple update semantics based on propositional logic, where the worlds are propositional valuations, but various alternatives present themselves. In ten Cate (2000) a start is made with a similar system on the basis of Groenendijk (1999)'s QL, which is a predicate logical language with questions. Also, attention has recently been directed to the dynamic epistemic semantics of public announcements (Gerbrandy 1999). It seems natural to look at these epistemic models of information exchange from the perspective of ARSs.8

8 Although the resulting ARS is in general not confluent, van Benthem (2000) has shown


Finally, it is interesting to see that the link between information exchange and rewriting becomes more apparent when we shift from the set-theoretic perspective of dynamic semantics to a more representational view. In the latter case, information states are no longer sets of possible worlds, but rather formulae, files or discourse representation structures. If we were to proceed in this way, then our ARS would look more like a term rewriting system.

References

Baader, F. and T. Nipkow (1998). Term Rewriting and All That. Cambridge, UK: Cambridge University Press.

Blackburn, P., M. de Rijke, and Y. Venema (To appear). Modal Logic. Cambridge, UK: Cambridge University Press.

Gerbrandy, J. (1999). Bisimulations on Planet Kripke. Ph.D. thesis, ILLC, University of Amsterdam.

Groenendijk, J. (1999). The logic of interrogation: classical version. In T. Matthews and D. Strolovitch (Eds.), Proceedings of the Ninth Conference on Semantics and Linguistic Theory (SALT-9), Santa Cruz. CLC Publications.

Groenendijk, J. and M. Stokhof (1984). Studies on the Semantics of Questions and the Pragmatics of Answers. Ph.D. thesis, University of Amsterdam.

Klop, J. W. (1992). Term rewriting systems. In S. Abramsky, D. M. Gabbay, and T. S. E. Maibaum (Eds.), Handbook of Logic in Computer Science, Volume 2, Chapter 1, pp. 1-117. Oxford: Oxford University Press.

Stalnaker, R. (1978). Assertion. In P. Cole (Ed.), Syntax and Semantics, Volume 9: Pragmatics, pp. 315-332. New York: Academic Press.

ten Cate, B. (2000). Dynamic and epistemic semantics of questions: the logic of consultation. Master's thesis, Vrije Universiteit, Amsterdam. Available from http://home.student.uva.nl/balder.tencate/.

van Benthem, J. (2000). Philosophy 298: Information update from a logical perspective (lecture notes). Available from http://turing.wins.uva.nl/~johan/298-2000.html.

van Rooy, R. (2000). Decision problems in pragmatics. In D. Traum and M. Poesio (Eds.), Proceedings of GÖTALOG 2000.

that if we restrict ourselves to finite Kripke structures, we do have confluence, as well as strong normalisation.


Fuzzy Hedges: a Next Generation

Martine De Cock

Department of Applied Mathematics and Computer Science

Ghent University, Krijgslaan 281 (S9), B-9000 Ghent, Belgium

[email protected]

Abstract. Fuzzy set theory provides a framework in which natural language expressions can be modelled mathematically. In fuzzy systems a linguistic term (e.g. fast) is represented by a fuzzy set, while a linguistic modifier (e.g. very) is modelled by an operator (also known as a hedge) transforming one fuzzy set into another. In the paper we first give a short overview of the traditional fuzzy set theoretical approaches to this problem. We point out that these "hedges of the first generation" are in general technical tools without meaning of their own, which explains their most important shortcomings. To overcome this, we present two new approaches in which the representation of linguistic modifiers is endowed with an inherent semantics ("hedges of the second generation"): the framework of fuzzy modifiers based on fuzzy relations (recently developed by us) and the so-called horizon approach (an extension of the research initiated by Novák).

1 Introduction

For millennia, people have expressed real-life information by means of natural language; it allows them to reason about everyday issues within a certain (tolerated) degree of imprecision. Hence it is not surprising that the introduction of fuzzy set theory (FST) in the sixties (Zadeh 1965) as a framework for the mathematical representation of linguistic concepts has given rise to an important evolution in the field of computer science. The importance of computing with words has increased tremendously over the last years (Zadeh and Kacprzyk 1999a; Zadeh and Kacprzyk 1999b) and it will grow even more in the years to come (Zadeh 2000). Nowadays the fuzzy set theoretical representations of linguistic terms are already applied successfully in an impressive number of systems, ranging from control and reasoning systems over preference and decision making systems to database systems. We refer to (Babuška 1998; Ruan and Kerre 2000; Yazici and George 1999) for some examples.

The key idea is to model linguistic terms such as "cheap" or "more or less cheap" by an X → [0,1] mapping, for X a suitable universe (e.g. the universe of prices). When developing an application, the design of such an X → [0,1]

Proceedings of the Sixth ESSLLI Student Session. Kristina Striegnitz (editor). Chapter 6, Copyright © 2001, Martine De Cock


mapping for each term involved is therefore a fundamental task. Quite early in the history of FST, researchers (Lakoff 1973; Zadeh 1972) started studying the possibility of partially automating this non-trivial task in the following way: given an X → [0,1] mapping for a term (e.g. "cheap"), how can we automatically generate a suitable X → [0,1] mapping for the modified term (e.g. "more or less cheap")? All their efforts resulted in an extensive collection of technical operators transforming one X → [0,1] mapping into another (see (Kerre and De Cock 1999) for an overview).

Although already useful, these so-called hedges have important shortcomings, which are in our opinion due to the fact that they are designed simply to perform a technical transformation, but have no further meaning of their own. In our paper we will present two new approaches with a clear inherent semantics, and we will show that they overcome the shortcomings of the hedges of the first generation. However, we will start with a short discussion of the representation of (modified) linguistic terms in general, focussing on the linguistic modifiers "very" and "more or less", and a brief overview of the hedges of the first generation and their shortcomings.

2 Representing linguistic terms

In fuzzy systems, each linguistic term is represented by a fuzzy set on a suitable universe X. A fuzzy set A on X is an X → [0,1] mapping, also called the membership function of A. For all x in X, A(x) is called the membership degree of x in A. The class of all fuzzy sets on X is denoted F(X). Furthermore, for A and B in F(X) we say that A ⊆ B if and only if A(x) ≤ B(x), for all x in X. When representing a linguistic term by a fuzzy set, the most important question is of course: which membership function should we choose?

2.1 Two interpretations

In the inclusive interpretation (Vanden Eynde 1996) it is assumed that semantic entailment (Lakoff 1973) holds: for A in F(X) and x in X

x is very A ⇒ x is A ⇒ x is more or less A

Such assumptions are often made in the literature (see e.g. (Kerre 1993; Novák and Perfilieva 1999; Zadeh 1972)). Representing linguistic terms by fuzzy sets, they correspond to:

very A ⊆ A ⊆ more or less A    (6.1)

The underlying semantics is that every object that is very A is also A and that every object that is A is also more or less A. In this interpretation the membership degree of x in A corresponds to the degree to which x satisfies the term modelled by A: indeed the degree to which an object x is more or less A is always greater than or equal to the degree to which the same object x is A. The same holds for A w.r.t. very A. Figure 6.1a depicts possible membership functions for the linguistic terms old, more or less old, and very old in the universe of ages expressed in years. Note that such membership functions depend on context and observer and may vary accordingly with the application and the user. However, since this is not the central issue of our paper, we will not go into further detail about it and assume that the given membership functions are acceptable to a particular observer in a given context.

[Figure 6.1: Modifiers based on fuzzy relations: a) inclusive, b) non-inclusive. Both panels plot membership degree (0 to 1) against age (30 to 100) for more or less old, old, and very old.]

Psycholinguistic research (Hersh and Caramazza 1976; Vanden Eynde 1996) showed that there is also a non-inclusive interpretation. In this interpretation, which is often used in fuzzy control applications (e.g. (Babuška 1998)), a term modified by "more or less" or "very" denotes neither a subset nor a superset of the original term. The terms denote different (possibly overlapping) categories. In this case the membership degree of x in A corresponds rather to the degree to which x is representative of the term modelled by A. Figure 6.1b depicts possible membership functions for old, more or less old, and very old in the non-inclusive interpretation.
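On a finite universe these notions take only a few lines to compute. A minimal sketch, with made-up membership degrees for "old" and "very old" in the inclusive interpretation:

```python
# Fuzzy sets on a finite universe of ages, as X -> [0,1] mappings
# (dicts; the membership degrees are illustrative assumptions):
old      = {30: 0.0, 50: 0.2, 70: 0.8, 90: 1.0}
very_old = {30: 0.0, 50: 0.0, 70: 0.4, 90: 1.0}

def included(A, B):
    """Fuzzy set inclusion: A is a subset of B iff A(x) <= B(x) for all x."""
    return all(A[x] <= B[x] for x in A)

# In the inclusive interpretation, semantic entailment (6.1) holds:
assert included(very_old, old)
assert not included(old, very_old)
```

In the non-inclusive interpretation neither inclusion would be required to hold; the two dicts would simply describe different overlapping categories.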

2.2 Fuzzy modifiers

As already indicated in the introduction, the design of a suitable membership function for every term involved in an application is a fundamental task. Furthermore, storing all the membership functions in memory can be quite memory consuming. For both these reasons it is desirable to generate the


membership function for a modified linguistic term automatically from the membership function of the original term. In practice this is done by representing every linguistic modifier by a fuzzy modifier, i.e. an F(X) → F(X) mapping. In FST linguistic/fuzzy modifiers are commonly called linguistic/fuzzy hedges. Most of the fuzzy modifiers proposed in the literature (we call them "hedges of the first generation") are (r,t)-decomposable. For r a [0,1] → [0,1] mapping and t an X → X mapping, a fuzzy modifier m on X is called (r,t)-decomposable if and only if for all A in F(X) and for all x in X: m(A)(x) = r(A(t(x))). t and r are called the pre- and the postmodifier of m respectively. Often either r is the identity mapping I_[0,1] on [0,1] (i.e. r(x) = x, for all x in [0,1]) or t is the identity mapping I_X on X (i.e. t(x) = x, for all x in X). In these cases we talk about pure premodification and pure postmodification respectively.

Pure postmodification. By far the most popular postmodifiers are the powering mappings ·^λ, for λ in [0,+∞[. The associated (·^λ, I_X)-decomposable fuzzy modifiers, also called powering modifiers, were originally introduced by Zadeh (Zadeh 1972). When m is an (r, I_X)-decomposable fuzzy modifier, then either (∀x ∈ X)(A(x) = 1 ⇒ m(A)(x) ≠ 1) or (∀x ∈ X)(A(x) = 1 ⇒ m(A)(x) = 1). Suppose we use m(tall) to model very tall in the universe of heights of men; then in the first case no height can be very tall to degree 1, while in the second all heights that are tall to degree 1 are also very tall to degree 1. According to our intuition, however, a height of 2.00 m is clearly very tall to degree 1, while a height of 1.80 m is tall to degree 1 but very tall only to a lower degree. This example shows that a representation of "very" by an (r, I_X)-decomposable fuzzy modifier leads to counter-intuitive results. Similar remarks can be made for "more or less". Furthermore, it is not possible to use an (r, I_X)-decomposable fuzzy modifier for the non-inclusive interpretation.

Pure premodification. Premodifiers act on the objects of the universe X. They are mainly studied for X a subset of R, the set of real numbers. The most popular premodifiers are the mappings T_λ defined by T_λ(x) = x + λ, for all x in R, λ in R. The resulting (I_[0,1], T_λ)-decomposable modifiers S_λ, also called shifting modifiers, were informally suggested by Lakoff (Lakoff 1973) and more formally developed later on (Bouchon-Meunier 1993; Hellendoorn 1990; Kerre 1993). Since they shift the membership function λ units to the left or the right, they are suitable to model "more or less" and "very" in the non-inclusive interpretation. In the inclusive interpretation shifting modifiers can be used to represent more or less A and very A provided that A is increasing or decreasing, and that X is a subset of R of course. If A is partially increasing and partially decreasing, however (e.g. about 8 p.m.), a shifted version can never be a superset of A and can therefore not be used to


model more or less A. In this case an artificial and quite complicated solution to the problem can be found by dividing the membership function into its increasing and decreasing parts and applying a different shift to each part.
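Both kinds of first-generation hedges, and the fixed-point shortcoming of pure postmodification, can be illustrated as follows (the increasing membership function for "old" is a made-up example):

```python
def old(x):
    """Illustrative increasing membership function: degree 1 from age 80 up."""
    return min(max((x - 50) / 30, 0.0), 1.0)

# Powering modifiers (pure postmodification): very = .^2, more or less = .^0.5
very_old = lambda x: old(x) ** 2
more_or_less_old = lambda x: old(x) ** 0.5

# Shortcoming: 0 and 1 are fixed points of powering, so any age that is
# old to degree 1 is automatically also very old to degree 1.
assert old(90) == 1.0 and very_old(90) == 1.0

# Shifting modifiers (pure premodification): premodifier T_lam(x) = x + lam
def shifted(member, lam):
    return lambda x: member(x + lam)

very_old_shift = shifted(old, -5.0)   # "very old" now needs ~5 extra years
assert very_old_shift(80) < old(80) == 1.0
```

The shifted version does distinguish age 80 (old to degree 1, very old to a lower degree), which powering cannot do; but as the text notes, shifting only behaves well for monotone membership functions on numerical universes.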

A suggestion to use both a non-trivial pre- and postmodifier at once for the inclusive interpretation was made by Novák (Novák 1992). Although it is an improvement on the solutions discussed in the previous paragraphs, it can only be applied to a special kind of membership functions and it also involves a process of dividing the membership function into increasing and decreasing parts. For a detailed overview of the hedges of the first generation and their shortcomings, we refer to (De Cock 1999; Kerre and De Cock 1999).

3 The next generation

The fuzzy modifiers of the first generation are merely artificial operators that transform the membership function of a term A into a membership function that is somewhat acceptable (to the designer) for very A or more or less A, but they do not have an inherent semantics. Hence it is no surprise that they are afflicted with the above mentioned shortcomings. We feel that fuzzy modifiers should be endowed with a clear inherent semantics (see also (De Cock and Kerre 2000)).

3.1 Fuzzy modifiers based on fuzzy relations

Inclusive interpretation. We suggest doing this by taking the context, i.e. mutual relationships in the universe, into account. More specifically, to compute the degree to which an object y is more or less A or very A, we will take a look at the objects that resemble y. Indeed, according to our intuition a person can be called more or less old if he resembles someone who is old. Furthermore, he is very old if everybody whom he resembles is old.

Resemblance can be modelled by means of a fuzzy relation R on X, i.e. a fuzzy set on X × X. It is clear that such a resemblance relation should be reflexive and symmetric. For a more detailed study of what it should look like, we refer to (De Cock and Kerre 2001). In Figure 6.1a we used the fuzzy relation R defined by R(x,y) = min(1, max(2.5 - 0.5|x - y|, 0)), with R(x,y) being the degree to which x and y are approximately equal, for all x and y in X. The so-called R-foreset of y is the fuzzy set on X denoted by Ry and defined by (Ry)(x) = R(x,y). If R models resemblance, then Ry is the fuzzy set of objects resembling y, in other words the context of y that we wish to consider.
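The relation used for Figure 6.1a can be coded directly: ages at most 3 years apart resemble to degree 1, ages more than 5 years apart not at all, with a linear decrease in between.

```python
def R(x, y):
    """Resemblance of ages: R(x,y) = min(1, max(2.5 - 0.5*|x - y|, 0))."""
    return min(1.0, max(2.5 - 0.5 * abs(x - y), 0.0))

def foreset(R, y, universe):
    """The R-foreset Ry: the fuzzy set of objects resembling y."""
    return {x: R(x, y) for x in universe}

assert R(60, 60) == 1.0                 # reflexive
assert R(60, 64) == R(64, 60) == 0.5    # symmetric
assert R(60, 66) == 0.0                 # beyond the 5-year horizon
fs = foreset(R, 60, range(55, 66))
assert fs[57] == 1.0                    # within 3 years: full resemblance
```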

Furthermore, we want to express that y is more or less A if y resembles an object that is A, in other words if the intersection of A and Ry is not


Term              Fuzzy set   Conditions
very A            R~(A)       R is a resemblance relation
more or less A    R|(A)       R is a resemblance relation

Table 6.1: Modifiers based on fuzzy relations - inclusive interpretation

empty. Likewise, y is very A if all the objects that resemble y are A, i.e. if Ry is included in A. Therefore we suggest modelling more or less A and very A by the fuzzy sets R|(A) and R~(A) respectively, with for all y in X:

R|(A)(y) = sup_{x∈X} max(A(x) + (Ry)(x) - 1, 0)

R~(A)(y) = inf_{x∈X} min(1 - (Ry)(x) + A(x), 1)

The meaning of these formulas becomes very clear when A and Ry are crisp sets. A crisp set B in a universe X only has membership degrees 0 and 1; B(x) = 1 is denoted "x ∈ B" while B(x) = 0 corresponds to "x ∉ B". Indeed, if A and Ry are crisp, it can be verified that

y ∈ R|(A) iff (∃x ∈ X)(x ∈ A ∧ x ∈ Ry)
y ∈ R~(A) iff (∀x ∈ X)(x ∈ Ry ⇒ x ∈ A)

Furthermore, if A is a crisp singleton, i.e. A(z) = 1 for some z in X and A(x) = 0 for all other x in X, then, regardless of whether Ry is crisp or not:

R|(A)(y) = R(z, y)

In the chosen representation this corresponds to "y is more or less {z} to the degree to which z and y resemble each other", which accords with our intuition. To show that the presented framework respects the semantic entailment needed in the inclusive interpretation (cf. Formula (6.1)), we will prove the following proposition, which holds for an arbitrary reflexive fuzzy relation. A fuzzy relation R is called reflexive if and only if (∀x ∈ X)(R(x,x) = 1).

Proposition 3.1. If R is a reflexive fuzzy relation on X then for all A in F(X):

R~(A) ⊆ A ⊆ R|(A)

Proof. For all y in X:

R~(A)(y) ≤ min(1 - R(y,y) + A(y), 1)
         = min(A(y), 1)
         = A(y)
         = max(A(y), 0)
         = max(A(y) + R(y,y) - 1, 0)
         ≤ R|(A)(y)

According to the definition of fuzzy set inclusion given at the beginning of Section 2, this proves the proposition.

              beautiful   average   ugly
snowwhite       1.00        0.00     0.00
witch           0.00        0.30     0.70
wolf            0.00        0.00     1.00
dwarf           0.10        0.70     0.20
prince          0.80        0.20     0.00
red-hood        0.50        0.50     0.00

Table 6.2: Fuzzy sets in the universe of fairy-tale characters

R           snowwhite   witch   wolf   dwarf   prince   red-hood
snowwhite     1.00       0.00    0.00   0.00    1.00     0.50
witch         0.00       1.00    1.00   0.50    0.00     0.00
wolf          0.00       1.00    1.00   0.00    0.00     0.00
dwarf         0.00       0.50    0.00   1.00    0.00     0.88
prince        1.00       0.00    0.00   0.00    1.00     1.00
red-hood      0.50       0.00    0.00   0.88    1.00     1.00

Table 6.3: Resemblance relation on the universe of fairy-tale characters

As stated above, it is natural to assume reflexivity for a fuzzy relation modelling resemblance; indeed every object resembles itself to the highest degree. Therefore our framework guarantees semantic entailment. Furthermore, it imposes no restrictions on the membership functions and it can be applied to all kinds of universes (numerical as well as non-numerical), as we will illustrate in the following example:

Example 3.1. Table 6.2 shows the membership degrees of the fuzzy sets beautiful, average and ugly in the universe X of 6 fairy-tale characters, while Table 6.3 defines a resemblance relation R on X. For more details on the construction of R we refer to (De Cock and Kerre 2001). The modified fuzzy sets obtained using the representational scheme of Table 6.1 are given in Table 6.4. Comparing our technique to the traditional fuzzy modifiers discussed in Section 2, we notice that there is no clear shifting operation on the non-numerical universe of fairy-tale characters, hence no straightforward way to use shifting hedges. The application of powering hedges is technically possible, but can never yield the results obtained in Table 6.4. In fact the prince, who is beautiful only to degree 0.80, is considered more or less beautiful to degree 1 in our representation. Likewise snowwhite, who is average to degree 0, is promoted a bit to being more or less average to degree 0.20. This is due to her resemblance to other fairy-tale characters who are already average to some degree greater than 0. Since modifiers with pure postmodification, such as powering modifiers, cannot make the difference between objects belonging to degree 1 to the original fuzzy set and belonging to degree 1 to the modified fuzzy set (and analogously for degree 0), they cannot yield such results. Similar remarks can be made for "very".

            more or less   more or less   more or less   very        very       very
            beautiful      average        ugly           beautiful   average    ugly
snowwhite     1.00           0.20           0.00           0.80        0.00       0.00
witch         0.00           0.30           1.00           0.00        0.00       0.70
wolf          0.00           0.30           1.00           0.00        0.00       0.70
dwarf         0.38           0.70           0.20           0.10        0.62       0.12
prince        1.00           0.50           0.00           0.50        0.00       0.00
red-hood      0.80           0.58           0.08           0.22        0.20       0.00

Table 6.4: Modified fuzzy sets
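The sup/inf formulas from Section 3.1 can be applied directly to the data of Tables 6.2 and 6.3; the following sketch recomputes some entries of Table 6.4:

```python
X = ["snowwhite", "witch", "wolf", "dwarf", "prince", "red-hood"]
beautiful = dict(zip(X, [1.00, 0.00, 0.00, 0.10, 0.80, 0.50]))
R = {  # resemblance relation of Table 6.3, accessed as R[x][y]
    "snowwhite": dict(zip(X, [1.00, 0.00, 0.00, 0.00, 1.00, 0.50])),
    "witch":     dict(zip(X, [0.00, 1.00, 1.00, 0.50, 0.00, 0.00])),
    "wolf":      dict(zip(X, [0.00, 1.00, 1.00, 0.00, 0.00, 0.00])),
    "dwarf":     dict(zip(X, [0.00, 0.50, 0.00, 1.00, 0.00, 0.88])),
    "prince":    dict(zip(X, [1.00, 0.00, 0.00, 0.00, 1.00, 1.00])),
    "red-hood":  dict(zip(X, [0.50, 0.00, 0.00, 0.88, 1.00, 1.00])),
}

def more_or_less(A, y):
    """R|(A)(y) = sup_x max(A(x) + (Ry)(x) - 1, 0)."""
    return max(max(A[x] + R[x][y] - 1, 0.0) for x in X)

def very(A, y):
    """R~(A)(y) = inf_x min(1 - (Ry)(x) + A(x), 1)."""
    return min(min(1 - R[x][y] + A[x], 1.0) for x in X)

# A few entries of Table 6.4:
assert more_or_less(beautiful, "prince") == 1.00
assert round(very(beautiful, "prince"), 2) == 0.50
assert round(more_or_less(beautiful, "dwarf"), 2) == 0.38
```

The prince, beautiful to degree 0.80, is pulled up to more or less beautiful to degree 1 by his full resemblance to snowwhite, exactly the behaviour that powering hedges cannot produce.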

Non-inclusive interpretation. For the non-inclusive interpretation, suitable fuzzy relations R can easily be chosen such that R|(A) imitates the behaviour of shifting as discussed in Section 2.2. Hence for such suitable relations R1 and R2 we can use R1|(A) to model more or less A and R2|(A) to model very A. Furthermore, for quite similar fuzzy relations not only a shift but also a change of shape can be obtained, making the representational scheme even more flexible. Note that this allows the use of the same powerful framework for both the inclusive and the non-inclusive interpretation. However, since no further inherent semantics has yet been assigned to this technique in the non-inclusive interpretation, we omit the details here. An example is given in Figure 6.1b.

3.2 The horizon approach

Another approach to give inherent semantics to modifiers is the horizon idea developed by Novák (Novák 1996; Novák and Perfilieva 1999) for the inclusive interpretation. We extend it to the non-inclusive interpretation. For both interpretations we also integrate a mathematical notion of distance, namely a pseudo-metric, so the technique can be used not only in numerical universes but in pseudo-metric spaces in general. We recall that an M² → [0, +∞[ mapping d is called a pseudo-metric on M if and only if d(x, x) = 0, d(x, y) = d(y, x) and d(x, y) + d(y, z) ≥ d(x, z), for all x, y, and z in M. (M, d) is then called a pseudo-metric space.

Let (X, d) be a pseudo-metric space and let ω be an element of X, called the observer. In real life situations the objects that are close to ω (closer


Martine De Cock

than some distance λ) are clearly visible to him (visible to degree 1). The visibility of an object x of X to ω drops with the distance d(ω, x) between ω and that object. Somewhere in the far distance, behind the horizon (say, at distance ρ), the objects aren't visible at all anymore (visible to degree 0). λ and ρ characterize the vision of the observer; therefore we call them the quality of vision parameters. It is natural to assume that λ ∈ ℝ, ρ ∈ ℝ and λ < ρ. The visibility of objects in X to an observer ω in X with quality of vision parameters λ and ρ can now be expressed by an X → [0, 1] mapping H(ω, (λ, ρ)) with the following characteristics:

(H.1) d(ω, x) ≤ λ ⇒ H(ω, (λ, ρ))(x) = 1
(H.2) ρ ≤ d(ω, x) ⇒ H(ω, (λ, ρ))(x) = 0
(H.3) d(ω, x) ≤ d(ω, y) ⇒ H(ω, (λ, ρ))(x) ≥ H(ω, (λ, ρ))(y)

Furthermore for two couples of quality of vision parameters (λ₁, ρ₁) and (λ₂, ρ₂) such that λ₁ ≤ λ₂ and ρ₁ ≤ ρ₂, it should hold that

(H.4) H(ω, (λ₁, ρ₁))(x) ≤ H(ω, (λ₂, ρ₂))(x)

which expresses that the visibility is better if the quality of vision parameters are higher. An example of such a mapping is f = H(ω, (λ, ρ)) defined by, for all x in X,

f(x) = 1                                if d(ω, x) ≤ λ
f(x) = 1 − (d(ω, x) − λ) / (ρ − λ)      if λ < d(ω, x) < ρ
f(x) = 0                                otherwise
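The piecewise-linear mapping f above can be implemented directly; a minimal sketch (the observer position and parameter values are illustrative, and the symbol names follow the reconstruction used here):

```python
def visibility(distance, lam, rho):
    """Horizon mapping f: visibility of an object at the given pseudo-metric
    distance from the observer, with quality of vision parameters lam < rho.
    Satisfies (H.1)-(H.3): 1 up to lam, linear decay, 0 from rho onwards."""
    if distance <= lam:
        return 1.0
    return max(0.0, 1.0 - (distance - lam) / (rho - lam))

# A pseudo-metric on a numerical universe (here: ages), d(x, y) = |x - y|:
d = lambda x, y: abs(x - y)

omega = 100.0  # the observer's position in the universe
print(visibility(d(omega, 95.0), lam=10.0, rho=40.0))  # 1.0 (within lam)
print(visibility(d(omega, 70.0), lam=10.0, rho=40.0))  # ~0.33 (between lam and rho)
print(visibility(d(omega, 50.0), lam=10.0, rho=40.0))  # 0.0 (beyond the horizon)
```

Because the function only consumes a distance, it works unchanged in any pseudo-metric space, which is the point of the generalization.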

Inclusive interpretation  To model a term A in X (e.g. the term "small" in the universe of heights), we place the observer with quality of vision parameters (λ, ρ) in the element ω of X that satisfies the term A the best (the smallest height, i.e. 0). The visibility of an object x of X to ω corresponds to the degree to which x satisfies A. Then we make the observer take off his glasses, which changes the quality of his vision for the worse: it is now characterized by two new quality of vision parameters (λ₁, ρ₁) such that λ₁ ≤ λ, ρ₁ ≤ ρ. The objects he can still see without his glasses must be very close to him, so they must be very A. Finally we give the observer back his glasses and a telescope as well, thereby improving the quality of his vision. Now he can see all objects that are more or less A. This is summarized in Table 6.5, in which ω is the element of X that satisfies A the best. Note that the characteristics imposed on H and the conditions on the quality of vision parameters guarantee that Formula (6.1) is respected. The membership functions in Figure 6.2a are constructed according to this scheme, using the mapping f defined above with d(x, y) = |x − y| for all x and y in X (the universe of ages).


Term             Corresponding fuzzy set    Conditions

A                H(ω, (λ, ρ))
very A           H(ω, (λ₁, ρ₁))             λ₁ ≤ λ, ρ₁ ≤ ρ
more or less A   H(ω, (λ₂, ρ₂))             λ ≤ λ₂, ρ ≤ ρ₂

Table 6.5: Horizon approach - inclusive interpretation
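A minimal sketch of the scheme in Table 6.5 on the universe of ages: the observer stays fixed and only the quality of vision parameters change (the numeric values are illustrative, not those behind Figure 6.2):

```python
def visibility(distance, lam, rho):
    # horizon mapping f: 1 within lam, linear decay, 0 from rho onwards
    if distance <= lam:
        return 1.0
    return max(0.0, 1.0 - (distance - lam) / (rho - lam))

d = lambda x, y: abs(x - y)
omega = 100.0  # element of the universe of ages that satisfies "old" best

def old(x):              return visibility(d(omega, x), 40.0, 70.0)
def very_old(x):         return visibility(d(omega, x), 20.0, 50.0)  # lam1 <= lam, rho1 <= rho
def more_or_less_old(x): return visibility(d(omega, x), 50.0, 80.0)  # lam <= lam2, rho <= rho2

# (H.4) then guarantees the pointwise inclusion very A <= A <= more or less A:
for age in range(30, 101):
    assert very_old(age) <= old(age) <= more_or_less_old(age)
```

The loop at the end checks, over the plotted age range, exactly the inclusion that the conditions of Table 6.5 are said to guarantee.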

[Figure: two panels plotting membership degree (0 to 1) against age (30 to 100) for the terms "more or less old", "old", and "very old".]

Figure 6.2: Horizon approach: a) inclusive. b) non-inclusive.

Non-inclusive interpretation  For the inclusive interpretation we have kept the observer fixed on some object of X, and we have changed his quality of vision parameters. In the non-inclusive interpretation we will do exactly the opposite: we keep the quality of vision fixed, but we make the observer walk through the universe. More in particular, to model a term, we will place the observer on the object of X that is most representative for that term. In Table 6.6, ω₁, ω₂, and ω₃ are the elements of X most representative for A, very A, and more or less A respectively. See Figure 6.2b for an example of membership functions generated according to this scheme

Term             Corresponding fuzzy set

A                H(ω₁, (λ, ρ))
very A           H(ω₂, (λ, ρ))
more or less A   H(ω₃, (λ, ρ))

Table 6.6: Horizon approach - non-inclusive interpretation


using the mapping f and d as defined above.
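The scheme of Table 6.6 can be sketched analogously: the quality of vision parameters stay fixed and the observer moves to each term's prototype (the prototype ages below are illustrative choices):

```python
def visibility(distance, lam, rho):
    # horizon mapping f: quality of vision (lam, rho) is held fixed here
    if distance <= lam:
        return 1.0
    return max(0.0, 1.0 - (distance - lam) / (rho - lam))

d = lambda x, y: abs(x - y)
LAM, RHO = 5.0, 20.0  # fixed quality of vision parameters

# Observers placed on the most representative ages (illustrative values):
omega1, omega2, omega3 = 70.0, 85.0, 60.0  # for A, very A, more or less A

def old(x):              return visibility(d(omega1, x), LAM, RHO)
def very_old(x):         return visibility(d(omega2, x), LAM, RHO)
def more_or_less_old(x): return visibility(d(omega3, x), LAM, RHO)

# Each modified term is the same shape shifted to its own prototype:
print(old(70.0), very_old(85.0), more_or_less_old(60.0))  # 1.0 1.0 1.0
```

This makes the contrast with the inclusive scheme concrete: here the three membership functions are shifted copies of one another, so none need be included in another.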

Conclusion and future work

After indicating the shortcomings of traditional fuzzy modifiers, we have presented two new approaches. Unlike the hedges of the first generation, these new kinds of fuzzy modifiers are not merely technical operators, but are endowed with clear inherent semantics. For the fuzzy modifiers based on fuzzy relations this is achieved by taking the context into account (more specifically: mutual relationships between objects). In the horizon approach (an extension of the work initiated by Novák) the modified terms are characterized by a change of quality of vision parameters or a change of the observer, giving them a semantics smoothly integrated in the surrounding horizon idea. Since this new generation of hedges has a strong semantic grounding, they lead to membership functions which are closer to human understanding. Furthermore they can be used on all kinds of membership functions with hardly any restriction. Our further work will focus on the application of these new kinds of fuzzy modifiers in fuzzy systems.

Acknowledgements  The author would like to thank the Fund for Scientific Research-Flanders (FWO) for funding the research reported on in this paper, as well as the anonymous referees for their helpful comments.

References

Babuška, R. (1998). Fuzzy Modeling for Control. Boston/Dordrecht/London: Kluwer Academic Publishers.

Bouchon-Meunier, B. (1993). La Logique Floue. Que sais-je? Paris.

De Cock, M. (1999). Representing the Adverb Very in Fuzzy Set Theory. In A. Todirascu (Ed.), Proceedings of the ESSLLI 1999 Student Session, Utrecht, the Netherlands.

De Cock, M. and E. E. Kerre (2000). A New Class of Fuzzy Modifiers. In Proc. ISMVL 2000, pp. 121-126. IEEE Computer Society.

De Cock, M. and E. E. Kerre (2001). On (un)suitable Fuzzy Relations to Model Approximate Equality. Fuzzy Sets and Systems. Accepted for publication.

Hellendoorn, H. (1990). Reasoning with fuzzy logic. Ph.D. thesis, T.U. Delft, Delft.

Hersh, H. M. and A. Caramazza (1976). A Fuzzy Set Approach to Modifiers and Vagueness in Natural Language. Journal of Experimental Psychology: General 105(3), 254-276.

Kerre, E. E. (1993). Introduction to the Basic Principles of Fuzzy Set Theory and Some of its Applications. Gent: Communication and Cognition.

Kerre, E. E. and M. De Cock (1999). Linguistic Modifiers: An Overview. In G. Chen, M. Ying, and K.-Y. Cai (Eds.), Fuzzy Logic and Soft Computing, pp. 69-85. Kluwer Academic Publishers.

Lakoff, G. (1973). Hedges: a Study in Meaning Criteria and the Logic of Fuzzy Concepts. Journal of Philosophical Logic 2, 458-508.

Novák, V. (1992). The alternative mathematical model of linguistic semantics and pragmatics, Volume 8 of International Series on Systems Science and Engineering. Plenum Press, New York and London.

Novák, V. (1996). A horizon shifting model of linguistic hedges for approximate reasoning. In Proceedings of the Fifth IEEE International Conference on Fuzzy Systems, New Orleans, pp. 423-427.

Novák, V. and I. Perfilieva (1999). Evaluating Linguistic Expressions and Functional Fuzzy Theories in Fuzzy Logic. In L. A. Zadeh and J. Kacprzyk (Eds.), Computing with Words in Information/Intelligent Systems 1 (Foundations), pp. 383-406. Physica-Verlag.

Ruan, D. and E. E. Kerre (Eds.) (2000). Fuzzy If-Then Rules in Computational Intelligence: Theory and Applications. Kluwer Academic Publishers, Boston/Dordrecht/London.

Vanden Eynde, C. (1996). A very difficult problem: modelling modification using VERY; A semantic-pragmatic approach. (In Dutch.) FKFO-project, private communication.

Yazici, A. and R. George (Eds.) (1999). Fuzzy Database Modeling, Volume 26 of Studies in Fuzziness and Soft Computing. Physica-Verlag.

Zadeh, L. A. (1965). Fuzzy Sets. Information and Control 8, 338-353.

Zadeh, L. A. (1972). A Fuzzy-Set-Theoretic Interpretation of Linguistic Hedges. Journal of Cybernetics 2(3), 4-34.

Zadeh, L. A. (2000). Toward an Enlargement of the Role of Natural Languages in Information Processing, Decision and Control. In Proceedings 6th Int. Conf. on Soft Computing (IIZUKA2000), CD-ROM, Fukuoka, Japan, pp. 7-10.

Zadeh, L. A. and J. Kacprzyk (Eds.) (1999a). Computing with Words in Information/Intelligent Systems 1 (Foundations), Volume 33 of Studies in Fuzziness and Soft Computing. Physica-Verlag.

Zadeh, L. A. and J. Kacprzyk (Eds.) (1999b). Computing with Words in Information/Intelligent Systems 2 (Applications), Volume 34 of Studies in Fuzziness and Soft Computing. Physica-Verlag.


One is Enough:

The Case Against Aspectual Proliferation

Bridget Copley

Massachusetts Institute of Technology

[email protected]

Abstract. This paper defends the "naive view" that English simple past tense sentences have only one temporal-aspectual element. The view that there are two temporal-aspectual elements in such sentences has been proposed by proponents of various improvements on the Reichenbachian system of tense, but their arguments are shown to be inconclusive, and other arguments are presented in favor of the naive view. If correct, this result casts serious doubt on the use of the Reichenbachian temporal points as primitives in theories of temporal-aspectual semantics.

1 Introduction

Consider (1).

(1) John danced.

The naive view of the logical form of (1) is that there is one temporal element (operator, or predicate, or whatever you believe tenses are¹), namely Past, as in (2a). The proposal has been made, however, that (1) actually involves two temporal-aspectual elements, identified as Past and Asp below in (2b):

(2) a. Naive view:       [TP John [TP Past [VP dance]]]

    b. Less naive view:  [TP John [TP Past [AspP Asp [VP dance]]]]

¹I will use the word "operator" for convenience even though I later assume tenses of the type ⟨i,t⟩; neither the terminology nor the later assumption is crucial to the argument.

Proceedings of the Sixth ESSLLI Student Session, Kristina Striegnitz (editor). Chapter 7, Copyright © 2001, Bridget Copley


This proposal has been made in the service of two somewhat different, though related, ideas.

The first begins ultimately with Reichenbach (1947), with additional input from Hornstein (1990) and Klein (1997) along the way. This idea takes as primitive three temporal points (S, R, and E). Each temporal-aspectual construction is distinguished by different instantiations of two relations: between S and R on the one hand and R and E on the other. Two relations means two temporal-aspectual elements in the morphosemantics (or ought to, even if Reichenbach and Hornstein don't put it that way; Klein does).

The second idea leading to (2b), independently of the Reichenbachian line of reasoning, is the idea that (1) has a perfective aspectual operator. Proponents of this view include Klein (for whom this perfective operator is the second operator alluded to above).

Despite both of these reasons to adopt (2b) as the correct view of (1), however, I will argue that the naive view in (2a) is the correct one. First, I will argue that the arguments given in support of the first idea, i.e., the Reichenbachian line of reasoning, are not convincing and make wrong predictions. Then I will make the case for (2a) using two new arguments against the first idea. Finally, I will address the second idea in the light of these results and show that (2a) makes sense given the semantic facts.

This result, if correct, is significant for any theory which seeks to eliminate the use of the Reichenbachian temporal points S, R, and E as primitives in temporal-aspectual semantics. (Cf., e.g., Stowell (1996) for one theory which explicitly does this.) If it can be shown that the presence of R is not needed and indeed makes incorrect predictions, then the designation of these points as primitives is called seriously into doubt.

2 The Reichenbachian line: two operators

Though Hornstein does not espouse a view in which each relation has its own morpheme, Klein does, as mentioned above. However, since Hornstein's arguments for three points are more syntactically oriented than Klein's, and can be carried over to Klein's structural view, in this section I will for the most part be arguing against Hornstein's reasons for having three temporal points implicit in (1), with a view to arguing for (2a). First, however, I will provide some background on the original motivation for the Reichenbachian system, and Hornstein's and Klein's improvements to it.

2.1 The Reichenbachian system

Reichenbach's system is a formal characterization of tense constructions. Each tense construction consists of relations among three times. Two of these times, which are relevant for any tense construction, are speech time (S), the time at which the sentence is uttered, and event time (E), the time


at which the event or state mentioned in the sentence is to have happened, be happening, or be going to happen. For simple sentences such as those in (3), seemingly this is all that is needed. (I will be arguing, in effect, that for simple sentences it really is all that is needed.) In the Reichenbach notation, "E S" means that E is before S, and "E,S" means that E and S are located at the same time.

(3) a. John danced. E S

b. John is dancing. E,S

c. John will dance. S E

This is fine, but the perfect apparently needs more than just S and E. Consider a past perfect construction as in (4):

(4) At 6:00, John had finished his paper.

How should the perfect be represented? E is before S, but that is also true for the simple past, as in (3a). We would like to be able to distinguish the two. Furthermore, assuming that temporal variables are only allowed to pick out times that have these "temporal roles", what is 6:00 doing there? The sentence in (4) has a reading in which E is not located at 6:00, but before 6:00, so 6:00 is neither E nor S. This is why Reichenbach introduces a third time: the "reference time" R. The past perfect he then characterizes as E R S.

Casting all temporal constructions (even the simple tenses) as combinations of E, R, and S allows Reichenbach to make a list of 24 possible temporal combinations, which is nicely restrictive. Hornstein points out, however, that we can get an even more restrictive set of possibilities, since there are numerous combinations that English apparently does not distinguish between. For example, the future perfect seems to correspond to E S R, E,S R, and S E R; the sentence in (5) is silent about whether the finishing event has taken place before the speech time, is taking place as we speak, or will take place between now and noon.

(5) John will have finished his paper by noon.

What is important here is that the relation between E and S is unspecified. Thus Hornstein concludes that the only two relations that matter are the relation between E and R and the relation between R and S. This also reduces the number of possible temporal constructions to 16.

Hornstein still has to make the case that all tense combinations have all three times and both relations, a case which is outlined below. But note that one would only want to make this argument if these times, or more accurately the "roles" they instantiate (speech time, reference time/time


of topic, etc.), and the relations between them are really primitives in the system. If there is a structural or pragmatic justification for the roles, then the number of roles and relations in the sentence should simply follow from the structural or pragmatic considerations. Likewise, if we find the latter to be the case, that undermines the idea that the temporal roles are primitives. The term "roles" I mean to be reminiscent of theta roles, but a closer analogy would be to grammatical relations such as subject and object. Some theories of syntax (e.g. Perlmutter and Postal (1983); see also other papers in that volume) have taken these as primitives, but there has been much fruitful work that derives subjecthood or objecthood from structural facts. My approach to the Reichenbachian temporal roles is in the spirit of the latter.

Klein, in fact, adds a structural component to the theory, to the effect that morphemes are responsible for the relations. (A more elaborated structural theory is also given by Thompson (1994), to be discussed below.) In the case of the present perfect, for example, the present tense is responsible for the R-S relation, and the perfect aspect (or tense) is responsible for the R-E relation. Not only that, but any other aspect, such as imperfective or perfective, can also affect the R-E relation. To achieve this, Klein treats the Reichenbachian points in terms of intervals, not instants. Thus imperfectivity and perfectivity can be stated in terms of E containing or overlapping R (or in Klein's terms, Time of Situation containing or overlapping Time of Topic).
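Treating the points as intervals can be sketched as follows; the containment and overlap tests are our reading of this passage, not Klein's exact formalization:

```python
# Times of Situation and Topic modeled as closed intervals (start, end).

def contains(outer, inner):
    """outer temporally includes inner."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]

def overlaps(a, b):
    """a and b share at least some time."""
    return a[0] < b[1] and b[0] < a[1]

tsit = (2.0, 8.0)  # Time of Situation (E): when the eventuality holds
tt = (4.0, 5.0)    # Time of Topic (R): the span the claim is confined to

print(contains(tsit, tt))  # True: an imperfective-like configuration (E contains R)
print(overlaps(tsit, tt))  # True: TSit and TT share time
```

The point is simply that containment and overlap are only statable once E and R are intervals; between instants, the only available relations are precedence and identity.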

Once you agree with Hornstein's story that E, R, and S are in all tense constructions, and Klein's story that the relations between E and R on the one hand and R and S on the other are mediated structurally by morphemes in the right positions, (2b) follows as a null hypothesis (with the Reichenbach-Hornstein-Klein representation being something like E,R S). So to understand more about the motivation for (2b), let's proceed to three arguments of Hornstein's, and one of Klein's, that the simple past should have all three of the Reichenbachian temporal roles.

2.2 Conceptual argument for two operators

The conceptual argument for (2b) as a null hypothesis is as follows. The perfect requires all three time primitives (and hence, on a structural view like Klein's, two operators to mediate between them). We would like all constructions to use all three primitives, to avoid introducing other constraints which explain why not all constructions require all three time primitives. An advantage of such a system is that the number of possible tense-aspect constructions is limited in nature, consisting of all possible combinations of relations between S-R and R-E, making tense-aspect learnable for the child.

However, the fact that (1) has a single piece of temporal-aspectual morphology and the past perfect (John had danced) has (at least) two is certainly


enough to motivate a theory in which (1) involves a single operator. As for the learnability issue, we cannot insist that the child needs all tense-aspect constructions to have two operators, or three Reichenbachian temporal roles, just because the perfect does; to continue the parallel with grammatical relations, that would be something like insisting that all verbs are underlyingly ditransitive just because some are. Instead, presumably, Universal Grammar includes one-operator tense-aspect constructions (with two times implicated, with pragmatics determining which gets the role of S and which gets the roles of E and R). The child would know that sometimes just one tense-aspect operator is needed, just as she would have to know that some verbs are transitive. She would be led towards such a hypothesis by the fact that such constructions only have one temporal-aspectual morpheme.

Hornstein's concern that this state of affairs would lead to the iteration without end of temporal-aspectual operators (if one, then why not three or four?) should be tempered by the fact that there do exist constructions in various languages with three or four (see Copley (forthcoming) for a few of these). And in fact, there is some evidence that sometimes even none is enough, in the case of some present tenses (Enç (1996), Ogihara (1989)) and languages like Haitian Creole (Dechaine 1991).

2.3 Two temporal adverbials imply two operators

Hornstein further argues that the presence of both R and E in the simple past can be detected by the possibility of two temporal adverbials in examples such as (6).

(6) Yesterday, John left a week ago. (Hornstein's judgment)

I agree with the validity of Hornstein's test, but would like to argue that he gets some of the facts wrong. I find (6) marginal, certainly worse than the perfect version (Yesterday, John had left a week ago), and also worse than, say, a progressive past futurate, as in (7a):

(7) a. Yesterday, John was leaving next week.

b. *? Yesterday, John left next week.

Since progressives clearly have two temporal-aspectual operators (i.e., tense and progressive aspect), they would be expected to have a full complement of Reichenbachian temporal points, according to Klein. Thus it's not surprising that they should be able to have two temporal adverbials. On the other hand, the simple past as in (7b) does not have this ability, which also suggests that there are not two temporal-aspectual operators.

2.4 Temporal connectives need R

Hornstein further argues that a theory which aims to explain the behavior of temporal connectives such as when, before, and after requires that the simple


past have an R point. Our modus operandi here to argue against his claim is to consider in turn the connectives that can occur between the simple past and a construction such as the perfect, which on anyone's theory has to have two operators, and see if any explanatory power is lost by treating the simple past as having a single tense operator.

Suppose then that tenses and sentences are predicates of times, type ⟨i,t⟩. (For concreteness, though it could be handled other ways, I assume a semantic framework as in Heim and Kratzer (1998), and also assume that tenses combine with tenseless sentences by predicate modification.) For space reasons let us consider here only the connective after, which might have the denotation below, where ∃c stands for a contextually restricted existential quantifier. Both p and q stand for sentences, type ⟨i,t⟩.

(8) [[after]](p)(q) = 1 iff ∃c i: p(i) and ∃c j: q(j) and j is after i

For Hornstein, temporal connectives relate R points to each other. In this system, however, after relates the time points that are arguments of the component sentences p and q. A further assumption I will make, following Stowell (1996) and Iatridou (2001), is that the perfect has a stative feature in it, spelled out as have. Semi-formally, the perfect can be partially broken down as in (9):

(9) [[John had left at 5]](i) = 1 iff at i John was in the state of having left at 5

Using these assumptions, there are no problems accounting for the meaning of (10a,b). I include the second temporal argument at 5 in the case of the perfect to show that the argument i really can be (the same time as) the R point, as desired.

(10) a. John left after Harry arrived.
        ∃c i: [[Harry arrived]](i) and ∃c j: [[John left]](j) and j is after i

     b. John left after Harry had arrived at 5.
        ∃c i: [[Harry had arrived at 5]](i) and ∃c j: [[John left]](j) and j is after i
        = ∃c i: at i Harry was in the state of having arrived at 5 and ∃c j: [[John left]](j) and j is after i
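The denotation in (8) can be read operationally as a toy executable model, with tenseless sentences as predicates of (here, number-valued) times and the contextual restriction as an explicitly supplied set of salient times; all names and values below are illustrative:

```python
# Toy model of (8): sentences are predicates of times (type <i,t>), here
# Python predicates over numbers; the contextually restricted existential
# quantifies over an explicitly supplied set of salient times.

def after(p, q, context):
    """[[after]](p)(q) = 1 iff some contextually salient i satisfies p and
    some contextually salient j satisfies q, with j later than i."""
    return any(p(i) and q(j) and j > i
               for i in context for j in context)

# Made-up times: 'Harry arrived' holds at 1, 'John left' holds at 3.
harry_arrived = lambda t: t == 1
john_left = lambda t: t == 3

print(after(harry_arrived, john_left, {1, 2, 3}))  # True
print(after(john_left, harry_arrived, {1, 2, 3}))  # False
```

Note that the connective only ever touches the time arguments of p and q themselves; nowhere does it need a separate R point, which is the point being argued.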

Hornstein's theory of temporal connectives rules out (11a), although I find it to be fairly good. Even better is (11b).

(11) a. John had left after Harry arrived. (Hornstein's judgment)

b. John had insulted Maryᵢ after sheᵢ left.


Why is (11b) better? The temporal connective after she left attaches relatively low, which is a possibility with the perfect, according to Thompson (1994); in (11b) that position is favored because of the coreference. The structure would be in part as in (12):

(12) John had [insulted Mary after she left]

According to Thompson, there is a higher attachment point for the adverbial (we saw this with the two temporal arguments for the perfect above, as well). Perhaps it is this higher attachment that yields Hornstein's judgment on (11), stemming from a conflict between the stativity of the perfect and a revised meaning of after, reflecting the idea that statives are generally degraded where they appear with after-clauses (Dowty 1979; Smith 1991). Supposing this argument holds for other temporal connectives as well, temporal connectives provide no rationale for including three temporal points (and two operators) in the semantics of the simple past.

2.5 Topic Time is real

Here we return to Klein, for whom "Time of Topic" (TT) is the analogue to R. TT is "the time span to which the speaker's claim on this occasion is confined." (Of course there are utterances which do not make any claim, but the idea of how to extend this definition will be clear enough that we may justifiably abstract away from those cases.) Klein's example to illustrate this uses a discourse between a judge and a witness, given in (13).

(13) a. What did you notice when you looked into the room?

b. There was a book on the table. It was in Russian.

Klein's point here is that the statements in (13b) can be true without saying anything about whether the situations still hold at the time of utterance, or indeed at any time other than the one provided by context. Perhaps the book has since been removed from the table; the "Time of Situation" (TSit, analogous to E), corresponding to the interval that the book was on the table, in any case was not limited to the interval over which the witness looked into the room. This point is even more clear with the second statement in (13b). Presumably the book remains in Russian. Yet the past tense is nonetheless felicitous, and indeed is much more so than the present tense in this discourse (at least in English).

Still, it is not at all clear that these facts need, or even should, be addressed via a temporal primitive R, as has been pointed out by Kamp and Reyle (1993); in addition, ter Meulen (1995) attributes such usages as (13b) in part to the idea that statives piggyback on a temporal node in the discourse without creating one of their own. So again, it may not be the semantics that must account for the fact in (13b), but rather the pragmatics.


There are a number of ways to put contextual information into the tense, either by treating it as a referential entity (see for instance Partee (1984)), or by putting a contextual restriction on a temporal existential quantifier, as we did above. Thus Klein's argument for the reality of Topic Time cannot be taken as an argument for R as a semantic primitive.

3 Arguments for only one operator

So far we have seen that there is no reason to posit a third temporal point R in the English simple past. On a structural view such as Klein's, in which operators associated with tense and aspectual atoms correlate with the Reichenbach/Hornstein relations, this would mean that there is no reason to posit an aspectual operator between T and the VP. But there are two further reasons why it would be preferable not to posit an operator there².

3.1 Condition C and adverbials

As alluded to above, Thompson (1994), investigating the perfect, finds that different attachment sites for temporal adverbials in perfect constructions come with different possibilities for modification of R or E. For example, (14) has two readings, but (15) has only one:

(14) Mary had seen John at 3.

a. Reading 1: The seeing time was at 3. (adv. modifies E)

b. Reading 2: The seeing time was before 3. (adv. modifies R)

(15) Mary had seen himᵢ at the time Johnᵢ presented his paper.

a. *Reading 1: The seeing time was at the time of the presentation. (adv. modifies E)

b. Reading 2: The seeing time was before the time of the presentation. (adv. modifies R)

(15) is bad because the lower position for the adverbial is low enough to be c-commanded by the object (assumed to have undergone object shift), violating Condition C of the binding theory, which states that a referring expression such as John must not be c-commanded by anything that refers to the same entity (in this case, John). The R-modifying reading remains good, however, because it is high enough not to be c-commanded by the object.

²These are not, however, arguments against an extra perfective operator within the VP as proposed by, e.g., Travis (1991), or aspect associated with the direct object as in Tenny (1994).


If there really is both an E and an R in the simple past, we would expect them both to be available for adverbial modification. Given Klein's assumption that temporal-aspectual heads correspond to temporal relations, this means that we should be able to get adverbials at both heights in the simple past. And even if R and E in the simple past happen to be cotemporaneous, so that we cannot distinguish two different readings as we did for the perfect in (14), we should still be able to distinguish two different readings with the Condition C test as in (15). However, only the lower position, modifying E, seems to be available, since there is a Condition C violation:

(16) *Mary saw himᵢ at the time of Johnᵢ's presentation.

We may conclude that the fact that the adverbial can only be at the E-modifying height means that there is no R-modifying position available, and therefore no morpheme in the field between T and VP responsible for relating R to E.

3.2 Tense must be near the verb for morphological reasons

The analysis in (2b) is also a problem in view of the morphological closeness of past tense and the verb in English (Alec Marantz, p.c.). The existence of ablauted forms (sing-sang) suggests that the past tense is local enough to the verb (in some sense of "local enough"), so that the relevant morphological processes can occur conditioned on the identity of the verb root. Whatever the precise statement of locality that is needed here, the affixation of a null morpheme between tense and the verb would be expected to keep the tense feature far enough from the verb that the correct form of the verb could not be inserted.

3.3 "Perfectives" without two operators?

I have argued so far against the first idea that motivated the structure in (2b): the idea that an intervening operator is needed to relate R to E in the simple past. But what about the second idea, Klein's claim that there is a contentful aspectual operator there, in addition to the tense operator, namely the perfective operator?

Actually, Smith (1991) finds the semantics of the English "perfective" to be extremely bland, unlike perfectives in other languages. It essentially passes up the aspectual semantics given by the Aktionsart ("situation type") of the verb:

For English [the perfective] has a consistent yet variable meaning: the perfective presents in its entirety the temporal schema associated with each situation type, as in (Smith 1991).


One is Enough: The Case Against Aspectual Proliferation

If that is the case, the semantics of English "perfectives" are no obstacle to choosing (2a) as the structure of (1).

However, Kratzer (1998) brings up some differences between the English simple past and the German simple past that are supposed to show that the English simple past is not an (anaphoric) past tense but rather can also be the reflex of a structure with present tense and a Klein-style aspectual operator as well. Her argument is made in the wake of the main point of the paper, that tense can be anaphoric in nature. Her concern is that there are some cases in which the English past tense seems not to behave anaphorically. For example, the English past tense, but not the German past tense, is possible in situations where no past time is contextually salient, as in (17).

(17) a. Who built this church? Borromini built this church.

b. * Wer baute diese Kirche? Borromini baute diese Kirche.
     who built this church Borromini built this church
   'Who built this church? Borromini built this church.'

The English past tense, but not the German past, can also be used in embedded contexts in which, similarly, no past time is contextually salient (so in (18), for instance, the relevant context would be one in which the letters were not received at some previous salient time before the utterance time):

(18) a. We will answer every letter we got.

b. # Wir werden jeden Brief beantworten, den wir bekamen.
     we will every letter answer that we received
   'We will answer every letter that we received.'

Kratzer accounts for this aberrant behavior of the English past tense by positing an ambiguity between the anaphoric past and a present tense plus anterior aspect combination. She calls the anterior aspectual element responsible for the non-anaphoric pastness the "perfect" (acknowledging that this is certainly not the same perfect as the English perfect construction which uses have).

But does this proposal commit us to the structure in (2b), for this aspectual version of the simple past tense? Not if we are to believe Enç (1996) and Ogihara (1989) (whose arguments I unfortunately lack space to discuss here) that present tense is no tense at all, but rather the absence of tense. In that case the one temporal-aspectual operator I have been arguing for could be Kratzer's "perfect" aspect. More research is called for here, but Kratzer's


idea does not a priori pose a problem for maintaining the structure in (2a) as the logical form of (1).

4 Conclusion

I hope to have shown the arguments made by Hornstein and Klein for three times in the English simple past to be unconvincing. On a structural view this argues for (2a) over (2b) as the null hypothesis, since only one temporal-aspectual operator is needed; I supported this hypothesis with evidence that there is only one operator there. Finally, there are some indications that (2b) looks strange semantically as well. All of these are evidence that the Reichenbachian temporal roles should not be taken as primitives of a theory of temporal-aspectual semantics.

References

Copley, B. (forthcoming). Ph. D. thesis, MIT.

Dechaine, R. (1991). Bare sentences. In SALT I, Ithaca. Cornell University.

Dowty, D. (1979). Word meaning and Montague Grammar. Dordrecht: Reidel.

Enç, M. (1996). Tense and Modality. In S. Lappin (Ed.), The Handbook of Contemporary Semantic Theory, pp. 345–358. Oxford, England: Blackwell.

Heim, I. and A. Kratzer (1998). Semantics in Generative Grammar. Blackwell.

Hornstein, N. (1990). As Time Goes By. MIT Press.

Iatridou, S. (2001). Temporal Existentials? MIT unpublished ms.

Kamp, H. and U. Reyle (1993). From Discourse to Logic. Dordrecht: Kluwer.

Klein, W. (1997). Time in Language. New York: Routledge.

Kratzer, A. (1998). More Structural Analogies Between Pronouns and Tenses. In D. Strolovitch and A. Lawson (Eds.), Semantics and Linguistic Theory VIII, Ithaca, NY, pp. 92–128. Cornell Linguistics Circle.

Ogihara, T. (1989). Temporal Reference in English and Japanese. Ph.D. thesis, Ann Arbor, MI.

Partee, B. (1984, August). Nominal and Temporal Anaphora. Linguistics and Philosophy 7(3), 243–286.


Perlmutter, D. M. and P. M. Postal (1983). The Relational Succession Law. In Studies in Relational Grammar, Chapter 2, pp. 30–80. Chicago: University of Chicago Press.

Reichenbach, H. (1947). Elements of Symbolic Logic. New York: Macmillan.

Smith, C. (1991). The parameter of aspect. Dordrecht: Kluwer.

Stowell, T. (1996). The Phrase Structure of Tense. In J. Rooryck and L. Zaring (Eds.), Phrase Structure and the Lexicon, Volume 33 of Studies in Natural Language and Linguistic Theory. Kluwer.

Tenny, C. (1994). Aspectual Roles and the Syntax-Semantics Interface. Dordrecht: Kluwer.

ter Meulen, A. (1995). Representing Time in Natural Language. Cambridge, MA: MIT Press.

Thompson, E. (1994). The Structure of Tense and the Syntax of Temporal Adverbs. In R. Aranovich, W. Byrne, S. Preuss, and M. Senturia (Eds.), Proceedings of the Thirteenth West Coast Conference on Linguistics, Palo Alto, CA, pp. 499–514. CSLI.

Travis, L. (1991). Derived Objects, Inner Aspect, and the Structure of VP. McGill University ms.


The Compositional Rule of Inference in an Intuitionistic Fuzzy Logic Setting

Chris Cornelis

Department of Mathematics and Computer Science, Ghent University

[email protected]

Glad Deschrijver

Department of Mathematics and Computer Science, Ghent University

[email protected]

Abstract. The incorporation of imprecise, linguistic information into logical deduction processes, as opposed to the practice of traditional two-valued propositional logic and set theory, continues to be a predominant feature of fuzzy expert systems. Throughout the literature, we can find all sorts of intelligent inference schemes acting under imprecision; common to most approaches is their reliance on if-then rules of the kind "IF X is A THEN Y is B", where A and B are fuzzy sets in given universes U and V. Intuitively, fuzzy sets (FSs) can be used to model elastic constraints on the values a variable may assume. While the theory of FS-based approximate reasoning is surely a well-established and commonly applied one, there is still a demand for further expanding the expressiveness of the formalism. One such improvement can be obtained by using Atanassov's (Atanassov 1983) intuitionistic fuzzy sets (IFSs), of which FSs are specific instances, and which highlight the fundamental importance of negation: the degree to which a proposition is false, or equivalently to which an object does not belong to a set, is given an independent status here. In this paper we will contribute to the further development of this relatively young theory, by generalizing the well-known Compositional Rule of Inference (CRI) to IFSs. We also deal with the related problem of checking the validity of the inference, as motivated in (Atanassov and Gargov 1998).

1 Introduction and preliminaries

Inference is defined as a procedure for deducing new facts out of existing ones on the basis of formal deduction rules. Classical paradigms like two-valued propositional and predicate logic exhibit some important drawbacks (lack of expressivity in describing incomplete and/or imprecise knowledge, high computational complexity) that make them unsuitable for application in automated deduction systems (e.g. for medical diagnosis). To alleviate these difficulties, Zadeh in 1973 introduced a formalism called approximate reasoning to cope with problems which are too complex for exact solution

Proceedings of the Sixth ESSLLI Student Session, Kristina Striegnitz (editor). Chapter 8, Copyright © 2001, Chris Cornelis and Glad Deschrijver


but which do not require a high degree of precision (Zadeh 1975). His work is centered around the notions of a fuzzy set and a fuzzy restriction.

In his seminal 1965 paper (Zadeh 1965), Zadeh generalized ordinary sets to fuzzy sets (FSs, for short), allowing an element u ∈ U to belong to any degree of membership in [0,1] (denoted A(u)) to a fuzzy set A in U. It is clear that the extension equivalently gives rise to a continuum of truth values between 0 and 1 for a logical proposition P.

Definition 1.1. (Fuzzy set) A fuzzy set A in a given universe U is a mapping from U into the unit interval [0,1]. The class of fuzzy sets in U is denoted F(U).

To define the intersection and union of fuzzy sets (equivalently, conjunction and disjunction of fuzzy propositions), so-called t-norms and t-conorms are used: a t-norm is any symmetric, associative, increasing [0,1] × [0,1] → [0,1] mapping T satisfying T(1, x) = x for every x ∈ [0,1], whereas for a t-conorm S the last property is replaced by S(0, x) = x for every x ∈ [0,1]. t-norms give rise to fuzzy intersections, in the sense that (A ∩_T B)(u) = T(A(u), B(u)) for every u ∈ U and T a t-norm. An analogous result holds of course for t-conorms and unions.
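As an illustrative sketch (ours, not from the paper), the minimum and product t-norms and the pointwise fuzzy intersection over a finite universe can be written as:

```python
# Two standard t-norms; both are symmetric, associative, increasing,
# and satisfy T(1, x) = x.
def t_min(x, y):
    return min(x, y)

def t_prod(x, y):
    return x * y

def fuzzy_intersection(A, B, T):
    """Pointwise intersection (A ∩_T B)(u) = T(A(u), B(u)).

    A and B are dicts mapping each element u of a finite universe
    to its membership degree in [0, 1]."""
    return {u: T(A[u], B[u]) for u in A}

A = {"u1": 0.8, "u2": 0.3}
B = {"u1": 0.5, "u2": 0.9}
print(fuzzy_intersection(A, B, t_min))  # {'u1': 0.5, 'u2': 0.3}
print(fuzzy_intersection(A, B, t_prod))
```

Any other t-norm (e.g. the Łukasiewicz t-norm max(0, x + y − 1)) can be plugged in for T in the same way.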

Now consider the statement: "Paul is very old". Modelling "very old" as a fuzzy set on a suitable range of ages, this statement constitutes a so-called fuzzy restriction on the possible values of Paul's age rather than an assertion about the membership of Paul in a class of individuals (Zadeh 1975). From a logical perspective, it is interesting to see how people are able to combine such information efficiently in a Modus Ponens-like fashion to allow for inferences of the following kind:

IF bath water is "too hot" THEN I'm apt to get burnt
bath water is "really rather hot"

I'm quite apt to get burnt

The technique used above is in fact less restrictive than the actual MP from propositional logic since it doesn't require the observed fact ("really rather hot") and the antecedent of the rule ("too hot") to coincide to yield a meaningful conclusion. The need emerges for a flexible, qualitative scale of measuring to what extent the antecedent is fulfilled, on the basis of which we could obtain an approximate idea (stated under the form of another fuzzy restriction) of the value of the consequent variable.

With his introduction of a calculus of fuzzy restrictions (Zadeh 1975), Zadeh paved the way towards a reasoning scheme called Generalized Modus Ponens (GMP) to systematize deductions like the example we presented. Since his pioneering work, many researchers have sought for efficient realizations¹ of this approximate inference scheme. In section 2 we will formally

¹By "realization", we mean any computational procedure unambiguously defining the output in terms of the inputs.


define the GMP and survey its most common realization, the Compositional Rule of Inference (CRI). Section 3 introduces the notion of Intuitionistic Fuzzy Sets (IFSs) and their connectives. In section 4, we proceed to extend the CRI to the IFS setting. In section 5, we address a common validation procedure based on the notion of Intuitionistic Fuzzy Tautology (IFT) and discuss how it affects our reasoning processes. Finally, section 6 offers some options for future research.

2 FS-based Compositional Rule of Inference

We start by recalling from (Cornelis et al. 2000) the definition of the main concept that we are concerned with:

Definition 2.1. (Generalized Modus Ponens, GMP) Let X and Y be variables assuming values in U, resp. V. Consider then a fuzzy rule "IF X is A, THEN Y is B" and a fuzzy fact (or observation) "X is A′" (A, A′ ∈ F(U), B ∈ F(V)). The GMP allows deduction of a fuzzy fact "Y is B′", with B′ ∈ F(V).

Expressing this under the form of an inference scheme, we get:

IF X is A, THEN Y is B
X is A′

Y is B′

Definition 2.1 does not state what the fuzzy restriction B′ should be when A, A′ and B are given. A lot of approaches have been proposed for this purpose ((Cornelis 2000), among others, gives a survey), the most common one relying on the so-called Compositional Rule of Inference, a convenient mechanism for calculating with fuzzy restrictions introduced by Zadeh in (Zadeh 1975).

Definition 2.2. (Compositional Rule of Inference, CRI) (Cornelis et al. 2000) Let X and Y be defined as in definition 2.1. Consider also fuzzy facts "X is A′" and "X and Y are R", where A′ ∈ F(U), R ∈ F(U × V) (R is a fuzzy relation between U and V). The CRI allows us to infer the fuzzy fact "Y is R ∘_T A′", in which the direct image of A′ under R, denoted² R ∘_T A′, is defined as, for v ∈ V:

(R ∘_T A′)(v) = sup_{u ∈ U} T(A′(u), R(u, v))

Expressing this under the form of an inference scheme, we get:

²Some people prefer to speak of the "composition of R with A′", hence the appearance of the composition symbol.


X is A′

X and Y are R

Y is R ∘_T A′
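On a finite universe the sup becomes a max, and the CRI can be sketched as follows (an illustrative sketch of ours, not code from the paper; the rule relation R would in practice be built from a fuzzy implicator as described below):

```python
def cri(A_prime, R, T, U, V):
    """Compositional Rule of Inference on finite universes:
    B'(v) = (R ∘_T A')(v) = max over u in U of T(A'(u), R[(u, v)])."""
    return {v: max(T(A_prime[u], R[(u, v)]) for u in U) for v in V}

# Toy example: two-point universes, T = minimum.
U, V = ["u1", "u2"], ["v1", "v2"]
A_prime = {"u1": 1.0, "u2": 0.4}
R = {("u1", "v1"): 0.7, ("u1", "v2"): 0.2,
     ("u2", "v1"): 1.0, ("u2", "v2"): 0.6}
B_prime = cri(A_prime, R, min, U, V)
print(B_prime)  # {'v1': 0.7, 'v2': 0.4}
```

With T = min this is the classical max–min composition of a fuzzy set with a fuzzy relation.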

For definition 2.2 to be a realization of the GMP, R must be a relational representation of a fuzzy implicator, an extension of the classical implication operator:

Definition 2.3. (Fuzzy implicator) (Ruan and Kerre 1993) A fuzzy implicator is any [0,1]² → [0,1] mapping I for which the restriction to {0,1}² coincides with classical implication: I(0,0) = 1, I(1,0) = 0, I(0,1) = 1, I(1,1) = 1. Moreover, I should satisfy the following monotonicity criteria:

(∀y ∈ [0,1])(∀(x, x′) ∈ [0,1]²)(x ≤ x′ ⇒ I(x, y) ≥ I(x′, y))  (8.1)

(∀x ∈ [0,1])(∀(y, y′) ∈ [0,1]²)(y ≤ y′ ⇒ I(x, y) ≤ I(x, y′))  (8.2)

Given I and A and B, the fuzzy sets used in definition 2.1, R is defined as, for (u, v) ∈ U × V: R(u, v) = I(A(u), B(v)). The two most important classes of fuzzy implicators are called S- and R-implicators, and are defined as follows³:

Definition 2.4. (S-implicator) (Bouchon-Meunier et al. 1999) Let S be a t-conorm. The S-implicator generated by S is the mapping I_S defined as:

I_S: [0,1]² → [0,1], (x, y) ↦ S(1 − x, y), ∀(x, y) ∈ [0,1]²

Definition 2.5. (R-implicator) (Bouchon-Meunier et al. 1999) Let T be a t-norm. The R-implicator generated by T is the mapping I_T defined as:

I_T: [0,1]² → [0,1], (x, y) ↦ sup{γ ∈ [0,1] | T(x, γ) ≤ y}, ∀(x, y) ∈ [0,1]²

3 Intuitionistic Fuzzy Sets

IFSs, first introduced by Atanassov (Atanassov 1983) in 1983, generalize Zadeh's fuzzy sets. While FSs merely give the degree of membership of an element in a set, IFSs also involve a degree of non-membership.

Definition 3.1. An intuitionistic fuzzy set in a universe U is any object A of the form A = {(u, μ_A(u), ν_A(u)) | u ∈ U}, where the membership function μ_A and the non-membership function ν_A are U → [0,1] mappings satisfying (∀u ∈ U)(μ_A(u) + ν_A(u) ≤ 1). The class of all IFSs in U is denoted IF(U).

³It is easily verified that they are indeed fuzzy implicators (Cornelis 2000).


Clearly any FS A ∈ F(U) has an IFS representation where for any u ∈ U the degree of non-membership equals one minus the degree of membership. There also exist straightforward extensions of the FS union and intersection to IFSs. Let T be a t-norm and S a t-conorm. Then the generalized intersection A ∩_{T,S} B of two IFSs A and B in U can be defined as A ∩_{T,S} B = {(u, T(μ_A(u), μ_B(u)), S(ν_A(u), ν_B(u))) | u ∈ U}. The resulting object is again an IFS provided T ≤ S*, where S* denotes the dual t-norm of S, defined as S*(x, y) = 1 − S(1 − x, 1 − y) for all x and y in [0,1]. Indeed, from T ≤ S* and using the increasing property of the t-conorm S, we obtain T(μ_A(u), μ_B(u)) ≤ 1 − S(1 − μ_A(u), 1 − μ_B(u)) ≤ 1 − S(ν_A(u), ν_B(u)). Putting ν_A(u) = 1 − μ_A(u) and ν_B(u) = 1 − μ_B(u), it is clear that the condition T ≤ S* is also necessary. A similar result can be obtained for the IFS union ∪_{S,T}, under the condition S ≥ T*, which is equivalent to T ≤ S*.
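A small sketch (our illustration, taking T = min and S = max, which satisfy T ≤ S*) of the generalized intersection on IFSs represented as dicts of (μ, ν) pairs:

```python
def ifs_intersection(A, B, T=min, S=max):
    """Generalized IFS intersection: memberships combined by the t-norm T,
    non-memberships by the t-conorm S. Requires T <= S* for the result
    to be an IFS again (satisfied by T = min, S = max)."""
    return {u: (T(A[u][0], B[u][0]), S(A[u][1], B[u][1])) for u in A}

# Each value is a pair (membership mu, non-membership nu) with mu + nu <= 1.
A = {"u1": (0.6, 0.2), "u2": (0.3, 0.5)}
B = {"u1": (0.5, 0.4), "u2": (0.7, 0.1)}
C = ifs_intersection(A, B)
print(C)  # {'u1': (0.5, 0.4), 'u2': (0.3, 0.5)}
assert all(mu + nu <= 1 for mu, nu in C.values())  # result is again an IFS
```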

4 IFS-based Compositional Rule of Inference

As discussed in the previous section, IFSs offer a more general framework than FSs do, thus allowing representation of relations between variables that could previously not be described (see e.g. (Atanassov 1999), (De et al. 2001) for some real-world examples). Also in (Atanassov 1999), a first attempt is made to endow fuzzy expert systems with concepts from IFS theory, indicating the real interest excited by these structures. It would therefore be nice to find some suitable IFS adaptation of the CRI, by far the most common means of deduction in the FS setting.

Before we can generalize the GMP and the CRI, we have to introduce some preliminary concepts.

Definition 4.1. (Lattice (L*, ≤_{L*})) Define a lattice (L*, ≤_{L*}) such that:

L* = {(x₁, x₂) ∈ [0,1]² | x₁ + x₂ ≤ 1}

(x₁, x₂) ≤_{L*} (y₁, y₂) ⇔ x₁ ≤ y₁ ∧ x₂ ≥ y₂

The shaded area in the figure is the set of elements x = (x₁, x₂) belonging to L*.

[Figure: the triangular region of the unit square lying below the line x₁ + x₂ = 1, with a point x = (x₁, x₂) and its coordinates marked on the axes.]


The lattice (L*, ≤_{L*}) is a complete lattice: for each A ⊆ L*,

sup A = (sup{x₁ ∈ [0,1] | (∃x₂ ∈ [0,1])((x₁, x₂) ∈ A)}, inf{x₂ ∈ [0,1] | (∃x₁ ∈ [0,1])((x₁, x₂) ∈ A)}),

inf A = (inf{x₁ ∈ [0,1] | (∃x₂ ∈ [0,1])((x₁, x₂) ∈ A)}, sup{x₂ ∈ [0,1] | (∃x₁ ∈ [0,1])((x₁, x₂) ∈ A)}).

Equivalently, this lattice can also be defined as an algebraic structure (L*, ∧, ∨) where the meet operator ∧ and the join operator ∨ are defined as follows, for (x₁, x₂), (y₁, y₂) ∈ L*:

(x₁, x₂) ∧ (y₁, y₂) = (min(x₁, y₁), max(x₂, y₂))

(x₁, x₂) ∨ (y₁, y₂) = (max(x₁, y₁), min(x₂, y₂))

For our purposes, we will also consider a generalized meet operator ∧_{T,S} and join operator ∨_{S,T} on (L*, ≤_{L*}):

(x₁, x₂) ∧_{T,S} (y₁, y₂) = (T(x₁, y₁), S(x₂, y₂))

(x₁, x₂) ∨_{S,T} (y₁, y₂) = (S(x₁, y₁), T(x₂, y₂))

for given t-norm T and t-conorm S satisfying T ≤ S*. Again, it can be shown that the condition T ≤ S* is necessary and sufficient for these operators to be well-defined.

Lastly, we define an order-reversing mapping N by N(x₁, x₂) = (x₂, x₁), ∀(x₁, x₂) ∈ L*.

We now propose the following extensions for intuitionistic fuzzy implicators, as well as the special instances of S- and R-implicators.

Definition 4.2. (Intuitionistic Fuzzy Implicator) An intuitionistic fuzzy implicator is any (L*)² → L* mapping I satisfying the border conditions

I(0_{L*}, 0_{L*}) = 1_{L*},  I(1_{L*}, 0_{L*}) = 0_{L*},  I(0_{L*}, 1_{L*}) = 1_{L*},  I(1_{L*}, 1_{L*}) = 1_{L*},

where 0_{L*} = (0, 1) and 1_{L*} = (1, 0) are the identities of (L*, ≤_{L*}). Moreover we require I to be decreasing in its first, and increasing in its second component, i.e.

(∀y ∈ L*)(∀(x, x′) ∈ (L*)²)(x ≤_{L*} x′ ⇒ I(x, y) ≥_{L*} I(x′, y))  (8.3)

(∀x ∈ L*)(∀(y, y′) ∈ (L*)²)(y ≤_{L*} y′ ⇒ I(x, y) ≤_{L*} I(x, y′))  (8.4)


Definition 4.3. (S-implicator) Let T be a t-norm, S a t-conorm satisfying T ≤ S*, and N an involutive⁴ order-reversing operator on L*. The S-implicator generated by T, S and N is the mapping I_{S,T,N} defined as, for (x, y) ∈ (L*)²:

I_{S,T,N}(x, y) = N(x) ∨_{S,T} y

Definition 4.4. (R-implicator) Let T be a t-norm, S a t-conorm satisfying T ≤ S*. The R-implicator generated by T and S is the mapping I_{T,S} defined as, for (x, y) ∈ (L*)²:

I_{T,S}(x, y) = sup{γ ∈ L* | x ∧_{T,S} γ ≤_{L*} y}

Theorem 4.1. The mappings I_{S,T,N} and I_{T,S} are intuitionistic fuzzy implicators.

Proof. It is easy to verify that the defined operators satisfy the border conditions. We now prove that they satisfy the hybrid monotonicity properties.

(i) Since each t-norm T and each t-conorm S are increasing in both components, it can easily be seen that, for all (a₁, a₂), (b₁, b₂), (c₁, c₂), (d₁, d₂) ∈ L*, (a₁, a₂) ≤_{L*} (b₁, b₂) and (c₁, c₂) ≤_{L*} (d₁, d₂) implies (S(a₁, c₁), T(a₂, c₂)) ≤_{L*} (S(b₁, d₁), T(b₂, d₂)), i.e. (a₁, a₂) ∨_{S,T} (c₁, c₂) ≤_{L*} (b₁, b₂) ∨_{S,T} (d₁, d₂). Hence ∨_{S,T} is increasing in both components. Similarly ∧_{T,S} is increasing in both components.

Since N is order-reversing, we obtain successively, for x, x′, y ∈ L*,

x ≤_{L*} x′
N(x) ≥_{L*} N(x′)
N(x) ∨_{S,T} y ≥_{L*} N(x′) ∨_{S,T} y
I_{S,T,N}(x, y) ≥_{L*} I_{S,T,N}(x′, y)

It is equally obvious that I_{S,T,N} is increasing in its second component.

(ii) Now we prove that an arbitrary R-implicator I_{T,S} is hybrid monotonous. Let x = (x₁, x₂), x′ = (x₁′, x₂′), y = (y₁, y₂) ∈ L* such that x ≤_{L*} x′. Then

{(γ₁, γ₂) ∈ L* | (T(x₁, γ₁), S(x₂, γ₂)) ≤_{L*} (y₁, y₂)} ⊇ {(γ₁, γ₂) ∈ L* | (T(x₁′, γ₁), S(x₂′, γ₂)) ≤_{L*} (y₁, y₂)}

since T(x₁, γ₁) ≤ T(x₁′, γ₁) and S(x₂, γ₂) ≥ S(x₂′, γ₂). Hence

sup{(γ₁, γ₂) ∈ L* | (T(x₁, γ₁), S(x₂, γ₂)) ≤_{L*} (y₁, y₂)} ≥_{L*} sup{(γ₁, γ₂) ∈ L* | (T(x₁′, γ₁), S(x₂′, γ₂)) ≤_{L*} (y₁, y₂)}

Analogously, monotonicity in the second component is obtained.

⁴An X → X mapping f is involutive iff, for all x ∈ X, f(f(x)) = x.


A generalization of the CRI will require us to define the direct image of an IFS under an intuitionistic fuzzy relation (IFR).

Definition 4.5. Let A ∈ IF(U), R ∈ IF(U × V), T a t-norm and S a t-conorm satisfying T ≤ S*. The direct image R ∘_{T,S} A is defined as:

R ∘_{T,S} A = {(v, sup_{u ∈ U} T(μ_A(u), μ_R(u, v)), inf_{u ∈ U} S(ν_A(u), ν_R(u, v))) | v ∈ V}  (8.5)

Now we have all the necessary tools to generalize the GMP and the CRI. Definition 2.1 can be maintained if we replace the word "fuzzy" by "intuitionistic fuzzy", and F(U) and F(V) by IF(U) and IF(V) respectively. We call this pattern Intuitionistic Generalized Modus Ponens (IGMP).

Just as in the fuzzy case, a realization of the IGMP can be obtained by defining the output B′ as the direct image of the input A′ under an intuitionistic fuzzy relation R representing the intuitionistic fuzzy rule. This gives rise to the following generalization of the CRI.

Definition 4.6. (IFS-based Compositional Rule of Inference, ICRI) Let X and Y be variables assuming values in U, resp. V. Consider intuitionistic fuzzy facts "X is A′" and "X and Y are R", where A′ ∈ IF(U), R ∈ IF(U × V) (R is an intuitionistic fuzzy relation between U and V). The ICRI allows us to infer the fuzzy fact "Y is B′ = R ∘_{T,S} A′", where (T, S) is a given pair of a t-norm and a t-conorm satisfying T ≤ S*. Expressing this under the form of an inference scheme, we get:

X is A′
(X, Y) is R
Y is B′ = R ∘_{T,S} A′

We use an IF implicator I to define R. Given IFSs A and B and I, we calculate, for (u, v) ∈ U × V,

(μ_R(u, v), ν_R(u, v)) = I((μ_A(u), ν_A(u)), (μ_B(v), ν_B(v))),

thus defining R. Using this definition, it is clear that the ICRI is an extension of the fuzzy-based CRI.

5 Validity of the Modus Ponens

As pointed out in (Atanassov and Gargov 1998), validity of the modus ponens (MP) is essential if one is interested in passing from hypothesis to conclusions without loss in the degree of truth. Before we introduce the definition of validity, we first give the following definition.


Definition 5.1. (Intuitionistic fuzzy tautology, IFT) (Atanassov and Gargov 1998) Let a = (a₁, a₂) ∈ L*; then a is said to be an "intuitionistic fuzzy tautology" if and only if a₁ ≥ a₂.

Definition 5.2. (Validity of the modus ponens) (Atanassov and Gargov 1998) We say that the MP is valid for an IF implicator I iff, for a = (a₁, a₂), b = (b₁, b₂) ∈ L*, we have⁵

a₁ ≥ a₂ ∧ pr₁(I(a, b)) ≥ pr₂(I(a, b)) ⇒ b₁ ≥ b₂

This amounts to: "if a is an IFT and I(a, b) is an IFT, then b is an IFT".

Unfortunately, the implicators defined in the previous section do not satisfy the validity of the modus ponens, as shown in the following theorem.

Theorem 5.1.

• If for the mapping N there exists an x = (x₁, x₂) ∈ L* such that x₁ ≥ x₂ and pr₁(N(x)) ≠ 0, then the modus ponens is not valid for the S-implicator I_{S,T,N}.

• The modus ponens is not valid for any R-implicator.

Proof.

• Let x = (x₁, x₂) ∈ L* with x₁ ≥ x₂ be such that N(x) = (x₁′, x₂′) with x₁′ ≠ 0, and let y = (0, x₁′). Then

I_{S,T,N}(x, y) = N(x) ∨_{S,T} y = (S(x₁′, 0), T(x₂′, x₁′)) = (x₁′, T(x₂′, x₁′)),

with x₁′ ≥ T(x₂′, x₁′). This shows that the modus ponens is not valid for I_{S,T,N}.

• Let x = (x₁, x₂) ∈ L* and y = (y₁, y₂) ∈ L* such that y₂ > y₁ ≥ x₁ ≥ x₂ = 0. Then T(x₁, γ₁) ≤ x₁ ≤ y₁, ∀γ₁ ∈ [0,1] and S(x₂, γ₂) = γ₂, hence

sup{(γ₁, γ₂) ∈ L* | (T(x₁, γ₁), S(x₂, γ₂)) ≤_{L*} y} = (1 − y₂, y₂).

If y₂ ≤ 1/2, then 1 − y₂ ≥ y₂. For the MP to be valid, we need:

(∀(a₁, a₂) ∈ L*)(∀(b₁, b₂) ∈ L*)(a₁ ≥ a₂ ∧ pr₁(I((a₁, a₂), (b₁, b₂))) ≥ pr₂(I((a₁, a₂), (b₁, b₂))) ⇒ b₁ ≥ b₂)

The chosen x and y satisfy x₁ ≥ x₂ and pr₁(I(x, y)) ≥ pr₂(I(x, y)), but have y₁ < y₂. Hence the MP is not valid.

⁵The projection mappings pr₁, pr₂ are defined, for any (x₁, x₂) ∈ L*, as: pr₁(x₁, x₂) = x₁, pr₂(x₁, x₂) = x₂.


In (Atanassov and Gargov 1998) Atanassov defines a number of alleged implicators for which the modus ponens is valid. Unfortunately, none of his proposed mappings is an IF implicator in the sense of definition 4.2 (either the border conditions, or the hybrid monotonicity, or both, are violated).

One may wrongly conclude from the above discussion that the implicators defined in section 4 are of minor interest. Indeed, in the fuzzy case validity is defined as, for a, b ∈ [0,1] and I a fuzzy implicator, "a is a fuzzy tautology (FT) and I(a, b) is a FT implies b is a FT", where x ∈ [0,1] is said to be a FT if and only if x ≥ 1/2 (cf. (Atanassov and Gargov 1998)). The modus ponens is not valid for fuzzy S-implicators either.⁶ However, most of the commonly used fuzzy implicators belong to this class.

Quite apart from these considerations, we still find it useful to look for intuitionistic fuzzy implicators for which the modus ponens does hold. This will be the subject of a future paper.

6 Conclusion and Future Work

The interest in Intuitionistic Fuzzy Sets from the perspective of logical deduction will continue to grow as more people become familiar with their straightforward semantics and flexible operations. As it turns out, we could find very useful results not previously established, notably the extension of a very wide range of fuzzy implicators to IFSs and their application in the intuitionistic CRI. The consistency of the resulting approach still needs to be looked into systematically. The procedure for checking validity outlined in section 5, along with several other criteria that the inference should satisfy, provides us with various yardsticks for evaluating the performance of our inference scheme. For FS-based GMP, extensive studies have been carried out for this purpose (Baldwin and Pilsworth 1980), (Fukami et al. 1981).

Acknowledgements

Chris Cornelis would like to thank the Fund for Scientific Research Flanders (FWO) for funding the research elaborated on in this paper.

⁶Specifically, if there exists an x ∈ [0,1] such that x ≥ 1/2 and N(x) = 1/2, then S(N(x), y) ≥ N(x) = 1/2 for any y ∈ [0,1].


References

Atanassov, K. T. (1983). Intuitionistic fuzzy sets (in Bulgarian). VII ITKR's Session, Sofia (deposited in Central Sci.-Technical Library of Bulg. Acad. of Sci., 1697/84).

Atanassov, K. T. (1999). Intuitionistic fuzzy sets. Heidelberg/New York: Physica-Verlag.

Atanassov, K. T. and G. Gargov (1998). Elements of intuitionistic fuzzy logic. Part I. Fuzzy Sets and Systems 95, 39–52.

Baldwin, J. F. and B. W. Pilsworth (1980). Axiomatic approach to implication for approximate reasoning using fuzzy logic. Fuzzy Sets and Systems 3, 193–219.

Bouchon-Meunier, B., D. Dubois, L. Godo, and H. Prade (1999). Fuzzy sets and possibility theory in approximate and plausible reasoning. In J. Bezdek, D. Dubois, and H. Prade (Eds.), Fuzzy sets in approximate reasoning and information systems, pp. 15–190. Kluwer Academic Publishers.

Cornelis, C. (2000). Approximate reasoning in fuzzy set theory. Master's thesis (in Dutch), Ghent University.

Cornelis, C., M. De Cock, and E. E. Kerre (2000). The Generalized Modus Ponens in a Fuzzy Set Theoretical Framework. In D. Ruan and E. E. Kerre (Eds.), Fuzzy IF-THEN Rules in Computational Intelligence, Theory and Applications, pp. 37–59. Kluwer Academic Publishers.

De, S. K., R. Biswas, and A. R. Roy (2001). An application of intuitionistic fuzzy sets in medical diagnosis. Fuzzy Sets and Systems 117, 209–213.

Fukami, S., M. Mizumoto, and T. Tanaka (1981). Some considerations on fuzzy conditional inference. Fuzzy Sets and Systems 4, 243–273.

Ruan, D. and E. E. Kerre (1993). Fuzzy implication operators and generalized fuzzy method of cases. Fuzzy Sets and Systems 54, 23–37.

Zadeh, L. A. (1965). Fuzzy sets. Information and Control 8, 338–353.

Zadeh, L. A. (1975). Calculus of fuzzy restrictions. In L. A. Zadeh, K.-S. Fu, K. Tanaka, and M. Shimura (Eds.), Fuzzy sets and their applications to cognitive and decision processes, pp. 1–40. New York: Academic Press.


Movement as well-formedness conditions

Ralph Debusmann

University of Saarland

[email protected]

Abstract. We introduce a new formulation of dependency grammar recently suggested in (Duchier and Debusmann 2001) (henceforth DD). DD shares with GB (Chomsky 1986) a notion of movement. In GB, movement is carried out by tree transformations. In DD, it is the effect of well-formedness conditions on dependency trees and does not involve transformations. We illustrate both kinds of movement by showing how both theories analyze German verb-second clauses. Then, we point out the similarities between GB and DD, and raise the question whether GB's transformational notion of movement could be replaced by DD's constraint-based account of movement.

1 Introduction

In this article, we introduce a new formulation of dependency grammar recently suggested by (Duchier and Debusmann 2001) (henceforth DD). One of the key assets of DD is that its axiomatization can be (and has been) turned into efficient constraint-based parsers. DD analyses consist of two tree structures: a syntactic dependency tree (ID tree) and a topological dependency tree (LP tree). The ID tree is a dependency tree in the spirit of (Tesnière 1959) whose edges are labeled by grammatical roles. ID trees are unordered (and hence in a sense non-projective), as opposed to LP trees, which are ordered and projective. The LP tree is inspired by topological fields theory. Its shape is essentially a flattening of the ID tree, allowing us to handle discontinuous constructions in free word order languages such as German.

The approach taken in DD is very similar to other recent dependency-based approaches tackling discontinuity, such as (Bröker 1998), (Kahane et al. 1998) and (Gerdes and Kahane 2001). It is also reminiscent of HPSG-based theories by (Reape 1994), (Kathol 2000) and (Müller 1999). DD's strong resemblance to Government-Binding (GB) theory (Chomsky 1986) is probably less obvious. We will show that, above all, DD shares with GB a notion of movement. In GB, movement is carried out by tree transformations, which have properties undesirable for parsing. In DD, movement is the effect of well-formedness conditions on finite labeled trees and does not involve transformations.

Proceedings of the Sixth ESSLLI Student Session. Kristina Striegnitz (editor). Chapter 9, Copyright © 2001, Ralph Debusmann.

The outline of this article is as follows. We start out by presenting the essentials of topological fields theory in section 2. In section 3, we introduce the dependency-based theory proposed in (Duchier and Debusmann 2001), and show how this theory handles German verb-second clauses. In section 4, we explain an analysis of German verb-second clauses in GB, as put forward by (Grewendorf 1988). Then, in section 5, we compare GB and DD, and hint at the possibility of reformulating parts of GB, movement in particular, in a constraint-based way. The conclusion in section 6 rounds off the article.

2 Topological fields theory

Both GB and DD use ideas from topological fields theory. Topological fields theory has a long tradition in German linguistics reaching back to the works of (Herling 1821) and (Erdmann 1886). As an example, consider the sentence below:

Einen Mann hat Maria geliebt
A man(acc) has Maria loved

"A man, Maria has loved." (9.1)

Topological fields theory divides (9.1) into four parts, which are called fields: Vorfeld, left parenthesis, Mittelfeld and right parenthesis:1

Vorfeld       left parenthesis   Mittelfeld   right parenthesis
Einen Mann    hat                Maria        geliebt

where the finite verb hat in the left parenthesis and its verbal complement geliebt (right parenthesis) form a bracket around the non-verbal material in the Mittelfeld. The Vorfeld, left of the left parenthesis, can be occupied by at most one topicalized constituent, whereas the Mittelfeld can host any number of non-verbs. The order of the material in the Mittelfeld is almost arbitrary.

3 Verb-second clauses: A DD analysis

3.1 ID and LP trees

DD distinguishes two tree structures: the unordered ID tree and the partially ordered and projective LP tree. ID and LP trees share the same set of nodes, which correspond one-to-one with words, but have different edges. Below, we give an ID tree analysis of (9.1). Since ID trees are unordered, we can pick an arbitrary linear arrangement for display purposes. In the picture below, we stick to the word order given in sentence (9.1).

1 Actually, topological fields theory postulates one more field, called the Nachfeld, which hosts material to the right of the right parenthesis.

ID tree:

[ID tree diagram for (9.1): hat is the root, with edges subj to Maria and vpast to geliebt; geliebt has edge obj to Mann; Mann has edge det to einen.]

ID edges are labelled by grammatical functions like subj (for a nominative subject), obj (for an accusative object), vpast (for a past participle complement) and det (for a determiner). The mother of a node in the ID tree is called its head and its daughters syntactic dependents. Here is the corresponding topological dependency (LP tree) analysis:

LP tree:

[LP tree diagram for (9.1): hat (internal field c) is the root, with edges vf to Mann (n), mf to Maria (n) and vc to geliebt (v); Mann has edge df to einen (d).]

The mother of a node in the LP tree is called its host and its daughters topological dependents.

3.2 Ordering words in the LP tree

DD employs a set F of fields to determine the licensed linearizations. F = Fext ⊎ Fint, where Fext = {df, vf, mf, vc} is the set of external fields or LP edge labels, and Fint = {d, n, c, v} is the set of internal fields or LP node labels.2

F is totally ordered, which induces a partial order on LP trees:

1. The topological dependents of each node are ordered by their edge labels or external fields.

2. Each node is positioned with respect to its topological dependents by its node label or internal field.

2 Note that for expository reasons, F only includes the fields needed to account for our example. For a full account of German, more fields are required.


The order is partial and not total because if two words land3 in the same external field, they remain unordered with respect to each other.

The set F of fields is essentially motivated by topological fields theory. For example, vf models the Vorfeld, c the left parenthesis, mf the Mittelfeld and vc the right parenthesis (or verb cluster). df and n are used to determine word order within noun phrases: e.g. df stands for determiner field and n for noun field. The total order on F is given below:

d ≺ df ≺ n ≺ vf ≺ c ≺ mf ≺ vc ≺ v   (9.2)

The global total order on F can be decomposed into local total orders. For instance, the local order df ≺ n requires determiners to precede their corresponding nouns. Conversely, the sequence vf ≺ c ≺ mf ≺ vc requires the Vorfeld (vf) to precede the left parenthesis (c) to precede the Mittelfeld (mf) to precede the verb cluster or right parenthesis (vc).

In our example, the desired linearization is attained as follows:

1. Mann lands in the vf, Maria in the mf and geliebt in the vc of hat. Since vf ≺ mf ≺ vc in (9.2), Mann must precede Maria and Maria must precede geliebt.

2. The internal field of hat is c. Because vf ≺ c ≺ mf in (9.2), hat must be placed between Mann in the vf and Maria in the mf.
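The two steps above can be sketched in code. The following Python fragment is an illustration of how the total order (9.2) determines word order, not part of DD's actual parser; all names are ours. Ties between words in the same field are left unordered (here: in input order), mirroring the partiality of the induced order:

```python
# Encode the total order (9.2) on fields as a rank table.
FIELD_ORDER = ["d", "df", "n", "vf", "c", "mf", "vc", "v"]
RANK = {f: i for i, f in enumerate(FIELD_ORDER)}

def linearize(host, internal_field, dependents):
    """Order a host among its topological dependents.

    dependents: list of (word, external_field) pairs; the host itself is
    positioned according to its internal field (step 2 in the text).
    """
    items = dependents + [(host, internal_field)]
    # sort() is stable, so words in the same field keep their input order.
    items.sort(key=lambda wf: RANK[wf[1]])
    return [word for word, _ in items]

# The example: Mann in the vf, Maria in the mf, geliebt in the vc of hat,
# whose internal field is c.
print(linearize("hat", "c", [("Mann", "vf"), ("Maria", "mf"), ("geliebt", "vc")]))
# -> ['Mann', 'hat', 'Maria', 'geliebt']
```

The same call with `("einen", "df")` under host `Mann` (internal field n) yields `['einen', 'Mann']`, reproducing the df ≺ n constraint.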

3.3 Example lexicon

DD states well-formedness conditions for LP trees in a lexicalized fashion: a lexical entry stipulates which external fields are offered for topological dependents to land in and which are accepted. A node w′ can land in an external field f of host w iff w offers f and w′ accepts f. A lexical entry also assigns a set of possible internal fields to each word. Here are the lexical entries for our example:4

         offers            accepts    internal
einen    {}                {df}       {d}
Mann     {df}              {vf, mf}   {n}
Maria    {}                {vf, mf}   {n}
hat      {vf?, mf*, vc?}   {}         {c}
geliebt  {}                {vc}       {v}

The set of fields offered by a node is given in wildcard notation: e.g. vf? indicates that there can be at most one topological dependent in the vf (as stated by topological fields theory), and mf* that any number of daughters can land in the mf.

3 A node is said to land in external field f iff its incoming edge is labeled with f.

4 We display only the LP tree part of each lexical entry. Full lexical entries also comprise ID tree information such as subcategorization and agreement.
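The landing condition and the wildcard cardinalities can be sketched as follows. This is a hypothetical encoding for illustration only; DD's actual parser is constraint-based and represents the lexicon differently:

```python
# Offers map each field to a cardinality wildcard: "?" = at most one
# dependent in that field, "*" = any number.
LEXICON = {
    "einen":   {"offers": {},                               "accepts": {"df"}},
    "Mann":    {"offers": {"df": "?"},                      "accepts": {"vf", "mf"}},
    "Maria":   {"offers": {},                               "accepts": {"vf", "mf"}},
    "hat":     {"offers": {"vf": "?", "mf": "*", "vc": "?"},"accepts": set()},
    "geliebt": {"offers": {},                               "accepts": {"vc"}},
}

def may_land(dependent, field, host, occupied=0):
    """w' may land in field f of w iff w offers f and w' accepts f.

    occupied: how many dependents already sit in that field of the host,
    used to enforce the "?" wildcard.
    """
    offers = LEXICON[host]["offers"]
    if field not in offers or field not in LEXICON[dependent]["accepts"]:
        return False
    return offers[field] == "*" or occupied == 0
```

For instance, `may_land("Mann", "vf", "hat")` holds, while `may_land("Mann", "vf", "geliebt")` fails because geliebt offers no field, and a second candidate for the vf of hat is rejected by the "?" wildcard.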


3.4 Climbing

The well-formedness conditions for LP trees are further constrained by a grammatical principle:5

Principle 1. A node must land on a transitive head.

which states that the host w of a node w′ in the LP tree must be above w′ in the ID tree. If a node lands above its syntactic head, it is said to have climbed. Below, we illustrate how Mann climbs into the vf of hat (as indicated by the dashed arrow):

[ID tree: hat has edge vpast to geliebt, and geliebt has edge obj to Mann.  ⇒  LP tree: hat has edges vf to Mann and vc to geliebt.]

Mann is forced to climb by the well-formedness conditions stated in the lexicon: it cannot land on geliebt because geliebt offers no field.
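Principle 1 amounts to a reachability check in the ID tree. Here is a minimal sketch with the ID tree of the example hard-coded as a child-to-head map; the encoding is ours, not DD's implementation:

```python
# Syntactic head of each node in the ID tree of (9.1); hat is the root.
ID_HEAD = {"Mann": "geliebt", "geliebt": "hat", "Maria": "hat", "einen": "Mann"}

def is_transitive_head(host, node):
    """True iff host dominates node in the ID tree (possibly indirectly)."""
    while node in ID_HEAD:
        node = ID_HEAD[node]
        if node == host:
            return True
    return False

# Mann's LP host is hat, not its syntactic head geliebt: Mann has climbed,
# yet Principle 1 is satisfied because hat dominates Mann in the ID tree.
assert is_transitive_head("hat", "Mann")
```

The check would reject an LP tree in which, say, hat tried to land on Mann, since Mann does not dominate hat in the ID tree.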

3.5 Extending coverage

Note that the example lexicon has been designed to be as simple as possible, and is therefore not representative of DD's coverage. In fact, we have developed a German grammar for our prototype parser which also covers e.g. verb-last sentences, partial verb phrase fronting and relative clauses. In order to also handle verb-last sentences, we introduce separate lexical entries for finite verbs that have internal field v and do not offer a vf. The introduction of additional lexical entries does not really hamper the efficiency of our approach, as the parser uses a novel constraint-based treatment of lexical ambiguity as introduced in e.g. (Duchier 1999).

In order to accommodate partial verb phrase fronting in the example lexicon, we would need an additional lexical entry for geliebt. If it lands in the Vorfeld, i.e. if it accepts {vf}, then it should offer {mf*}. An LP tree analysis of the partial verb phrase fronting sentence Einen Mann geliebt hat Maria would then look like the one depicted below:

5 We only mention the first of the three principles given in (Duchier and Debusmann 2001).


LP tree:

[LP tree diagram: hat (c) is the root, with edges vf to geliebt (v) and mf to Maria (n); geliebt has edge mf to Mann (n); Mann has edge df to einen (d).]

4 Verb-second clauses: a GB analysis

4.1 German sentence structure

There are two approaches to analyzing verb-second sentences in GB. We choose the approach taken in (Grewendorf 1988), as originally suggested in (Chomsky 1986). It assumes the following sentence structure for German:6

[Tree: CP dominates XP and C′; C′ dominates C and S; S dominates NP and VP.]

The theory of topological fields can be mapped to the proposed GB structure as follows: the Vorfeld corresponds to the [CP,XP]-position and the left parenthesis to [C′,C]. The right parenthesis and the Mittelfeld have no equivalents in the above tree configuration.

4.2 d- and s-structure

Two of the four levels of analysis that GB posits are of interest for our purposes here: d- and s-structure. Here is a d-structure analysis of (9.1):

6 For simplicity, we do not depict the IP- and I′-nodes from the original tree shown in (Grewendorf 1988), p. 49.


d-structure: [CP XP [C′ C [S [NP Maria] [VP [V hat] [VP [NP einen Mann] [V geliebt]]]]]]

GB uses an operation called move-α to mediate between d- and s-structure. Move-α moves nodes into landing sites, which are positions available for movement to take place to. The number of landing sites is restricted by constraints such as the θ-criterion and the Case Filter. In the example, [CP,XP] and [C′,C] function as landing sites. The XP-position [CP,XP] is a landing site only for maximal projections, whereas [C′,C] is a head position and therefore only available to heads. In the s-structure shown below, move-α takes place into both landing sites: the NP einen Mann moves to [CP,XP] (Vorfeld), and the finite verb hat to [C′,C] (left parenthesis):

s-structure: [CP [NP einen Mann_i] [C′ [C hat_j] [S [NP Maria] [VP [V t_j] [VP [NP t_i] [V geliebt]]]]]]

5 GB and DD: a comparison

5.1 Dependency

An obvious difference between GB and DD is that GB is a phrase structure-based theory and DD dependency-based. But this is no crucial difference: since GB is based on X-bar theory (Jackendoff 1977), it also incorporates the notion of a head: X-bar theory requires that every phrase has a head which is a single word. (Covington 1990) argues that if a phrase structure analysis (1) picks out one node as the head of each phrase and (2) has no labels or features on non-terminal nodes (unless of course copied unchanged from terminal nodes), it can be regarded as being equivalent to a dependency analysis that specifies word order.

(Covington 1992) even goes one step further by attempting to simplify GB theory by recasting it into a dependency formalism. He shows how to convert GB's X-bar-based phrase structure trees into equivalent dependency trees and then redefines government in terms of dependency. As an example, consider the GB phrase structure tree below:

[X-bar tree: [N″ [D″ [D′ [D some]]] [N′ [Adj″ [Adj′ [Adj new]]] [N′ [N pictures] [P″ [P′ [P of] [N″ [N′ [N us]]]]]]]]]

The head of the phrase is pictures, which has a specifier (some), an adjunct (new) and a complement (of). The complement of of is us. Here is an equivalent dependency tree:

[Dependency tree: pictures has edges sp to some, adj to new and co to of; of has edge co to us.]

where sp stands for specifier, adj for adjunct and co for complement. With respect to a dependency tree, GB's notion of government is now much easier to define:

Definition 1. A governs B iff B is an immediate dependent of A.

By using dependency trees rather than phrase structure trees, not only do principles such as government become easier to define. From a computational point of view, dependency grammar also has the advantage of postulating fewer nodes, i.e. exactly one node per word. Fewer nodes lead to improved performance because the trees to be processed are smaller.
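Definition 1 is simple enough to state as a one-line check. The sketch below uses our own illustrative encoding (not Covington's), representing the dependency tree above as a child-to-head map:

```python
# Child -> head map for the dependency tree of "some new pictures of us".
HEAD = {"some": "pictures", "new": "pictures", "of": "pictures", "us": "of"}

def governs(a, b):
    """A governs B iff B is an immediate dependent of A (Definition 1)."""
    return HEAD.get(b) == a

assert governs("pictures", "some")
assert governs("of", "us")
assert not governs("pictures", "us")  # us depends on of, not directly on pictures
```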


5.2 Valency

GB and DD, and in fact most linguistic theories to date, share a notion of valency. Both GB and DD state valency requirements in the lexicon: GB uses subcategorization frames to specify the required θ-roles, and a DD lexicon includes ID tree and LP tree valency. ID tree valency is encoded by stating which syntactic roles a word offers and is very similar to GB's subcategorization frames. For example, a finite transitive verb offers subj and obj. By contrast, LP tree valency specifies which fields a word offers.

5.3 Constituency

While GB includes the notion of dependency as a derived notion only, constituency is incorporated as a first-class citizen. Constituents or phrases in GB are contiguous substrings of the analyzed sentence. DD includes constituency as a derived notion in both the ID and LP tree. In the ID tree, the set of nodes equal to or below a head can be viewed as a constituent, but one which is not required to be contiguous (since the ID tree is unordered and non-projective). In the LP tree, the set of nodes equal to or below a host forms a contiguous substring constituent (because LP trees are ordered and projective).

5.4 Movement

Both theories use a notion of movement to relate a deep or syntactic structure (d-structure in GB, ID tree in DD) to a surface or topological structure (s-structure, LP tree). But while in GB, move-α is a primitive operation modelled by tree transformations, climbing is a derived notion in DD: it describes the effect of well-formedness conditions. These well-formedness conditions are axiomatized in a constraint-based fashion, as outlined in (Duchier 1999) and (Duchier 2000), and can be easily turned into a parser.

GB's tree transformational account of movement severely restricts the computational usability of the theory. As (Covington 1990) argues, because transformations are tree-to-tree mappings, a parser can only undo a transformation if it has already parsed the tree structure that represents the output of the transformation. That is, the only way to parse a transformed sentence is to undo the transformation, but the only way to undo the transformation is to parse the sentence first.

GB restricts the applicability of move-α by providing a fixed set of kinds of movement, including wh- and NP-movement. Movement is further constrained by general principles such as the Case Filter and the θ-criterion. For instance, only θ̄-positions7 may function as landing sites for movement in GB. XP-positions are landing sites for maximal projections only (e.g. [CP,XP]) and head positions for heads.

7 A θ̄-position is a position which is not assigned a θ-role.

DD constrains movement in a lexicalized way. Only a small number of grammatical principles are postulated and the remaining work is done in the lexicon: a lexical entry stipulates which fields are offered and which are accepted. Making the connection to GB again, the notion of offered fields is very similar to GB's landing sites.

6 Conclusion

The new dependency grammar-based framework described in DD employs concepts which are strikingly similar to concepts in GB theory. Above all, both GB and DD use a notion of movement to mediate between levels of syntax and linear precedence. But while GB models movement as tree transformations, movement in DD is the consequence of well-formedness conditions.

We demonstrated with analyses of verb-second clauses that, on a descriptive level, the notions of movement in GB and DD are very similar. This suggests that GB's approach to movement could be reformulated in a way similar to DD's constraint-based approach. A non-transformational account of movement based on well-formedness conditions would make GB much more attractive from a computational point of view, and could make use of techniques developed for DD, including an efficient treatment of ambiguity using finite-set constraints.

References

Bröker, N. (1998). Separating Surface Order and Syntactic Relations in a Dependency Grammar. In COLING-ACL '98: Proc. of the 17th Intl. Conf. on Computational Linguistics and 36th Annual Meeting of the ACL, Montreal/CAN.

Chomsky, N. (1986). Barriers. Linguistic Inquiry Monograph 13. Cambridge/MA: MIT Press.

Covington, M. A. (1990). A Dependency Parser for Variable-Word-Order Languages. Research Report AI-1990-01, Artificial Intelligence Programs, University of Georgia, Athens/GA.

Covington, M. A. (1992). GB Theory as Dependency Grammar. Research Report AI-1992-03, Artificial Intelligence Program, University of Georgia, Athens/GA.

Duchier, D. (1999). Axiomatizing Dependency Parsing Using Set Constraints. In Sixth Meeting on the Mathematics of Language, Orlando/FL.

Duchier, D. (2000). Configuration of Labeled Trees under Lexicalized Constraints and Principles. To appear.

Duchier, D. and R. Debusmann (2001). Topological Dependency Trees: A Constraint-based Account of Linear Precedence. In 39th Annual Meeting of the Association for Computational Linguistics (ACL 2001), Toulouse/FRA. To appear.

Erdmann, O. (1886). Grundzüge der deutschen Syntax nach ihrer geschichtlichen Entwicklung dargestellt. Erste Abteilung. Stuttgart/FRG.

Gerdes, K. and S. Kahane (2001). Word Order in German: A Formal Dependency Grammar Using a Topological Hierarchy. In 39th Annual Meeting of the Association for Computational Linguistics (ACL 2001), Toulouse/FRA. To appear.

Grewendorf, G. (1988). Aspekte der deutschen Syntax. Eine Rektions-Bindungs-Analyse. Studien zur deutschen Grammatik 33. Tübingen/FRG: Gunter Narr.

Herling, S. (1821). Über die Topik der deutschen Sprache.

Jackendoff, R. (1977). X̄ Syntax: A Study of Phrase Structure. Number 2 in Linguistic Inquiry Monographs. Cambridge/MA: MIT Press.

Kahane, S., A. Nasr, and O. Rambow (1998). Pseudo-projectivity: a polynomially parsable non-projective dependency grammar. In 36th Annual Meeting of the Association for Computational Linguistics (ACL 1998), Montreal/CAN.

Kathol, A. (2000). Linear Syntax. Oxford University Press.

Müller, S. (1999). Deutsche Syntax deklarativ. Head-Driven Phrase Structure Grammar für das Deutsche. Linguistische Arbeiten 394. Tübingen/FRG: Max Niemeyer Verlag.

Reape, M. (1994). Domain Union and Word Order Variation in German. In J. Nerbonne, K. Netter, and C. Pollard (Eds.), German in Head-Driven Phrase Structure Grammar, pp. 151-197. Stanford/CA: CSLI.

Tesnière, L. (1959). Éléments de Syntaxe Structurale. Paris/FRA: Klincksiek.


On Interpolation and Model Theoretic Characterization of Logics

Marta Garcia Matos

University of Helsinki

[email protected]

Abstract. We present a generalized form of Lindström's Theorem that characterizes the fragment of first order sentences in which a particular predicate occurs only positively. It has as corollaries Lyndon's Interpolation and Preservation theorems. The aim of the general project is to analyze the proof of Lindström's Theorem and try to extend it, first by relating it to interpolation theorems, and later by extending the study to non-classical logics, such as systems of modal logics.

1 Introduction.

Two famous properties of first order logic are the Compactness Theorem: 'A set of sentences Σ has a model if and only if every finite subset of Σ has a model', and the Downward Löwenheim-Skolem Theorem: 'If a countable set of sentences has a model of an infinite cardinality, then it has a countable model'. Per Lindström (5) characterized first order logic as the maximal logic which satisfies these model theoretic properties. The study of extensions of first order logic is well established (1), but practically no other logic than first order logic has a model theoretic characterization. The few results in this area can be found in (3) and (10).

A close examination of the ingredients of Lindström's Theorem reveals that it is actually an application of some basic properties of back-and-forth systems. When the proof is broken into its parts, a proof of the Craig Interpolation Theorem: 'If φ ⊨ ψ, there is θ such that φ ⊨ θ and θ ⊨ ψ, and every non-logical symbol which occurs in θ also occurs in both φ and ψ' emerges. This has been known in the folklore of the subject (3), but has not been systematically exploited. Two recent works (2), (7) study the relations between back-and-forth systems, interpolation, and characterization theorems.

In this work we show that a generalized Lindström's Theorem can be proved for the fragment of first order logic consisting of formulas in which a particular predicate occurs positively (Theorem 4). This characterization immediately implies the Lyndon Interpolation and Preservation Theorems (Corollaries 10 and 11). In (2) there is a similar result: a generalized theorem (in their case an interpolation theorem) with respect to a back-and-forth system. The strategy for proving interpolation theorems therefore begins by finding a back-and-forth system for the logic. A preservation theorem related to the nature of the back-and-forth system is immediately obtained as a corollary. However, the existence of back-and-forth systems does not guarantee success in finding interpolation theorems, as happens in the case of L∞κ for κ > ω.

Proceedings of the Sixth ESSLLI Student Session. Kristina Striegnitz (editor). Chapter 10, Copyright © 2001, Marta Garcia Matos.

In our project we analyze the proof of Lindström's Theorem and try to extend it, first by relating it to interpolation theorems, and later by extending the study to non-classical logics. In this paper we report the results of the first stage only. We plan to go further, pursuing the connection between model theoretic characterizations and definability results.

When one looks at the spectrum of extensions of first order logic (1), it is noticeable that precisely the extensions that have a model theoretic characterization are the ones that also satisfy the Interpolation Theorem. This is so with first order logic, the infinitary language Lω1ω, and the admissible fragments of Lω1ω. On the other hand, a lot of effort has been put into obtaining model theoretic characterizations of extensions of first order logic by generalized quantifiers (9). These attempts have failed, and remarkably, all known such extensions also fail to satisfy interpolation theorems of any kind.

Let S be a finite and relational vocabulary. An S-structure A is a pair (A, I), where A is a nonempty set and I is a map that assigns to every n-ary relation symbol R in S an n-ary relation on A. A logic is a pair (L, ⊨L) where L is a mapping defined on vocabularies S such that L[S] is a class (the class of L-sentences of vocabulary S), and ⊨L (the L-satisfaction relation) is a relation between L-sentences and structures. To define a logic we must also give its closure properties. Usually these properties are:

(i) If S0 ⊆ S1, then L(S0) ⊆ L(S1).

(ii) If A ⊨L φ, then φ ∈ L[SA], where SA is the vocabulary of A.

(iii) Isomorphism property. If A ⊨L φ and A ≅ B, then B ⊨L φ.

(iv) Reduct Property. If φ ∈ L(S) and S ⊆ SA, then A ⊨L φ iff A↾S ⊨L φ.

(v) Renaming Property. Let ρ : S0 → S1 be a renaming. Then, for each φ ∈ L(S0) there is a sentence φρ from L[S1] such that for all S0-structures A, A ⊨L φ iff Aρ ⊨L φρ.

First order logic is a logic in the sense of this definition, if we define L^S to be the set of first order sentences of vocabulary S. Lindström (5) defines an abstract logic as a logic with the above properties plus closure under negation and conjunction. We'll see that, for the purpose of this paper, we have to give different closure properties to define our logic.

We call a formula of vocabulary S an S-formula. We will use L^S to denote the set of first order S-formulas. If P is a predicate symbol in S, L^{S,P} denotes the set of sentences in L^S in which P occurs only positively. L^{S,P}_r is the set of formulas in L^S with variables among x1, ..., xr in which P occurs only positively.

2 A Lindström Theorem

The next definition introduces a relation that is an isomorphism with respect to every symbol in the language except for a particular predicate P. This is related to the particular back-and-forth system (Definition 2) that we need for proving Lindström's Theorem for the P-positive fragment of first order logic.

Definition 1. Let A, B be S-structures, and ā ∈ A an r-tuple. A bijection π : A → B is a P-isomorphism if

A ⊨ P(ā) ⇒ B ⊨ P(πā)

and

A ⊨ Ri(ā) ⟺ B ⊨ Ri(πā) for Ri ∈ S \ {P}.
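For finite structures, Definition 1 can be checked directly. The following Python sketch uses our own illustrative encoding (structures as dictionaries from relation names to sets of tuples; it is not from the paper): P must be preserved in the forward direction only, every other relation in both directions.

```python
def is_p_isomorphism(pi, A, B, P):
    """Check Definition 1 for finite structures A, B and bijection pi (a dict)."""
    inv = {v: k for k, v in pi.items()}
    for R, tuples in A.items():
        # Forward preservation is required for every relation, including P.
        for tup in tuples:
            if tuple(pi[x] for x in tup) not in B[R]:
                return False
        # Backward preservation is required only for relations other than P.
        if R != P:
            for tup in B[R]:
                if tuple(inv[y] for y in tup) not in tuples:
                    return False
    return True
```

For example, the identity map on {1, 2} is a P-isomorphism from a structure where P holds of 1 only to one where P holds of 1 and 2 (P may grow), but not in the reverse direction.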

Definition 2. A P-back-and-forth sequence for (A, B) is a sequence Pn ⊆ Pn−1 ⊆ … ⊆ Pn−k that satisfies the following conditions:

(i) Each Pi is a set of partial P-isomorphisms.

(ii) ∅ ∈ Pn.

(iii) If p ∈ Pm and a ∈ A, then there is b ∈ B such that p ∪ {⟨a, b⟩} ∈ Pm−1.

(iv) If p ∈ Pm and b ∈ B, then there is a ∈ A such that p ∪ {⟨a, b⟩} ∈ Pm−1.

If A = {an : n ∈ ω} and B = {bn : n ∈ ω}, and there is a P-back-and-forth sequence of length ω for them, then they are P-isomorphic. For suppose we have defined, for n < k, c2n = an and d2n+1 = bn, and a sequence ⟨cn⟩ Pn ⟨dn⟩. If k = 2r, set ck = ar. By (iii), there is a least index s such that ⟨c0, …, ck⟩ Pk+1 ⟨d0, …, dk−1, bs⟩. Set dk = bs. Similarly if k is odd. By (i), f = {⟨cn, dn⟩ : n ∈ ω} is a P-isomorphism of A onto B.

We adapt the following technical notion from (5). It is introduced to avoid redundancy in the use of quantifiers or boolean connectives. This prevents formulas from being arbitrarily long, and the number of them becoming infinite.


Definition 3. An (r, r)-condition is any atomic or negated atomic formula in L^{S,P}_r. A complete (r, i)-condition is any disjunction of conjunctions of (r, i)-conditions. An (r, i−1)-condition is ∃xi φ or ∀xi φ, where φ is a complete (r, i)-condition.

We will see in Lemmas 7 and 8 that any φ ∈ L^{S,P}_r of quantifier rank ≤ n is equivalent to some (n + r, r)-condition. For each back-and-forth system we have a different definition of the (r, i)-conditions. Lemma 9 (which is the central part of the proof of Theorem 4) shows the dependency between back-and-forth systems and formulas. In particular, in the proof of (ii) → (i) of this theorem it is essential that the set of formulas is finite.

The following theorem is a generalization of Lindström's Theorem with respect to back-and-forth systems. In this particular case, the logic to be characterized being the fragment of first order logic in which a given predicate occurs only positively, the closure property needed is that of a P-isomorphism, and the back-and-forth system needed is that of Definition 2. This generalization enables us to have Lindström-like theorems characterizing logics other than first order logic. This theorem can also be stated in a Lindström-like form: L^{S,P} is the strongest logic with the compactness and Löwenheim-Skolem properties that is closed under negation of all predicates in S except P. We obtain this result by making L* in Theorem 4 closed under negation of all predicates in S except P.

Theorem 4 (Lindström's Theorem (Generalized Form)). Let S be a finite relational vocabulary and P a predicate symbol in S. Let L* be a logic with the Compactness and Downward Löwenheim-Skolem properties, satisfying the above properties (i)–(v), except (ii), which is replaced by closure under P-isomorphisms. Let L* also be closed under ∨, ∧, ∃, ∀ (but not necessarily under negation).

Let φ, ψ be S-sentences in L* such that Mod(φ) ∩ Mod(ψ) = ∅. Then there is θ ∈ L^{S,P} such that Mod(ψ) ∩ Mod(θ) = ∅ and Mod(φ) ⊆ Mod(θ).

Definition 5. A formula is said to be in negation normal form (nnf) if it is built up from atomic formulas and their negations using ∨, ∧, ∃, ∀.

Every first order formula is equivalent to an nnf formula. We assume throughout this paper that first order formulas are in nnf.

Lemma 6. Every P-positive formula is preserved by P-isomorphisms.

Proof. By induction on the formation of P-positive formulas. The case for P-positive atomic or negated atomic formulas comes directly from Definition 1, and the case for connectives is easy.

Now suppose φ = ∃x ψ(x) and A ⊨ φ. Then A ⊨ ∃x ψ(x) ⟺ there is a ∈ A such that A ⊨ ψ(a) ⟹ (by the induction hypothesis) B ⊨ ψ(πa) ⟺ (since π is a surjection) there is b ∈ B such that B ⊨ ψ(b) ⟺ B ⊨ ∃x ψ(x). The case for ∀ is similar. ∎

Lemma 7. Let Φ = {φ_1, …, φ_n} be a finite set of S-formulas, and ⟨Φ⟩ the least set that contains Φ and is closed under ∨, ¬. Then any formula in ⟨Φ⟩ is logically equivalent to some disjunction of conjunctions of the formulas in {φ_1, …, φ_n, ¬φ_1, …, ¬φ_n}. In addition, there are only finitely many pairwise logically nonequivalent formulas in ⟨Φ⟩.

Proof. Suppose each φ_i has free variables among x_1, …, x_r. Given a model A and a tuple ā = (a_1, …, a_r) ∈ A, take the conjunction χ_{(A,ā)} of formulas from {φ_1, …, φ_n, ¬φ_1, …, ¬φ_n} such that A ⊨ χ_{(A,ā)}(a_1, …, a_r). It is clear that the set of satisfiable conjunctions is finite, with cardinality at most 2^n. For any ψ ∈ ⟨Φ⟩, take the disjunction

θ = ⋁{χ_{(A,ā)} : A ⊨ ψ(a_1, …, a_r)}.

Suppose B ⊨ θ(b_1, …, b_r). Then there is χ_{(A,ā)} with (a_1, …, a_r) ∈ A such that B ⊨ χ_{(A,ā)}(b_1, …, b_r) and A ⊨ ψ(a_1, …, a_r). So we have A such that A ⊨ φ_i(a_1, …, a_r) ⟺ B ⊨ φ_i(b_1, …, b_r). Given that ψ ∈ ⟨Φ⟩ and A ⊨ ψ(a_1, …, a_r), we conclude that B ⊨ ψ(b_1, …, b_r). Now suppose B ⊨ ψ(b_1, …, b_r). Then χ_{(B,b̄)} belongs to the disjunction θ. Since B ⊨ χ_{(B,b̄)}(b_1, …, b_r), it follows that B ⊨ θ. ∎
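To make the lemma concrete, here is a small worked example of our own (it does not appear in the original paper), for n = 2:

```latex
% For \Phi = \{\varphi_1, \varphi_2\} there are at most 2^2 = 4 satisfiable
% conjunctions, and a formula in \langle\Phi\rangle such as
% \varphi_1 \lor \varphi_2 is equivalent to a disjunction of three of them:
\[
\varphi_1 \lor \varphi_2 \;\equiv\;
  (\varphi_1 \land \varphi_2) \lor
  (\varphi_1 \land \neg\varphi_2) \lor
  (\neg\varphi_1 \land \varphi_2).
\]
```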

The representatives of each of these equivalence classes can be taken to be the (r, i)-conditions.

Lemma 8. For any n ∈ ℕ there are, up to logical equivalence, only finitely many P-positive formulas of quantifier rank ≤ n.

Proof. By induction on n.

n = 0: Let ψ ∈ L_r^{S,P} be an atomic or negated atomic formula. Since S is finite and relational, the set of (r, r)-conditions is finite. The case for quantifier free formulas follows from Lemma 7 if we take Φ to be the set of (r, r)-conditions.

Induction step: Suppose we have proved that there are formulas {φ_1, …, φ_h} ⊆ L_r^{S,P} of quantifier rank ≤ n, and formulas {σ_1, …, σ_k} ⊆ L_{r+1}^{S,P} of quantifier rank ≤ n, such that every formula in L_r^{S,P} (respectively in L_{r+1}^{S,P}) of quantifier rank ≤ n is logically equivalent to some φ_i (respectively σ_j).

Given a formula ψ ∈ L_r^{S,P} of quantifier rank ≤ n+1, it can be proved by induction on ψ that it is contained in

⟨Φ_1⟩ = ⟨{φ ∈ L_r^{S,P} : qr(φ) ≤ n} ∪ {∃xφ ∈ L_r^{S,P} : qr(φ) ≤ n}⟩.

We prove: (*) Any such ψ is logically equivalent to a formula in

⟨Φ_2⟩ = ⟨{φ_1, …, φ_h} ∪ {∃x_{r+1}σ_1, …, ∃x_{r+1}σ_k}⟩.

But then, by Lemma 7, ⟨Φ_2⟩ contains only finitely many formulas which are pairwise logically non-equivalent.


Proof of (*): By the induction hypothesis, every ψ ∈ L_r^{S,P} of quantifier rank ≤ n is logically equivalent to some φ_i. Now, if ∃xφ ∈ L_r^{S,P} and qr(φ) ≤ n, then A ⊨ ∃xφ(a_1, …, a_r, x) iff there is a_{r+1} ∈ A such that A ⊨ φ(a_1, …, a_r, a_{r+1}). Since φ ∈ L_{r+1}^{S,P} and has quantifier rank ≤ n, it is logically equivalent to some σ_j; thus ∃xφ is equivalent to ∃x_{r+1}σ_j. Finally, it is easy to verify that if every formula in Φ_1 is logically equivalent to a formula in Φ_2, then every formula in ⟨Φ_1⟩ is logically equivalent to a formula in ⟨Φ_2⟩. ∎

Lemma 9. The following two facts are equivalent:

(i) For every ψ ∈ L_{r+1}^{S,P} of quantifier rank ≤ n, A ⊨ ψ ⟹ B ⊨ ψ.

(ii) There is a P-back-and-forth sequence P_n, …, P_0 for (A, B).

Proof. We prove by induction on n the equivalence of

(i′) For all ψ ∈ L_{r+1}^{S,P} of quantifier rank ≤ n and all (a_1, …, a_r) ∈ A, (b_1, …, b_r) ∈ B: A ⊨ ψ(a_1, …, a_r) ⟹ B ⊨ ψ(b_1, …, b_r).

(ii′) There is a P-back-and-forth sequence P_n, …, P_0 for (A, B) such that {⟨a_1, b_1⟩, …, ⟨a_r, b_r⟩} ∈ P_n.

(i′) ⟹ (ii′): By the induction hypothesis we have P_n ⊆ P_{n−1} ⊆ … ⊆ P_m. Suppose p ∈ P_m, dom(p) = {a_1, …, a_r}, and a ∈ A. Take χ_{(dom(p),a)} to be the (r+m, r+1)-condition such that A ⊨ χ_{(dom(p),a)}. The formula ∃x χ_{(dom(p),x)} has quantifier rank m. By the induction hypothesis, B ⊨ ∃x χ_{(rng(p),x)}. Thus there is b ∈ B such that B ⊨ χ_{(rng(p),b)}. Set p(a) = b. Thus we have p ∪ {⟨a, b⟩} ∈ P_{m−1}. The back condition is proved in a similar way; the induction hypothesis in this case uses B ⊭ φ ⟹ A ⊭ φ.

(ii′) ⟹ (i′): By induction on the complexity of formulas ψ. Suppose ψ ∈ L_r^{S,P}, qr(ψ) ≤ m, p ∈ P_m, and a_1, …, a_r ∈ dom(p). The atomic case comes from Definition 1. The connective cases are clear. Suppose ψ = ∃xφ and A ⊨ ψ(a_1, …, a_r). Then there is an element a ∈ A such that A ⊨ φ(a_1, …, a_r, a). Using the forth condition of P_m, take the element a such that there is q ∈ P_{m−1} with q ⊇ p, a ∈ dom(q), and A ⊨ φ(a_1, …, a_r, a). By the induction hypothesis, B ⊨ φ(p(a_1), …, p(a_r), q(a)). Setting b = q(a), we have B ⊨ φ(p(a_1), …, p(a_r), b), and therefore B ⊨ ψ(p(a_1), …, p(a_r)). ∎

Proof of Theorem 4. For each m ∈ ω, construct θ_m ∈ L^{S,P} such that Mod(φ) ⊆ Mod(θ_m) as

θ_m = ⋁_{A ∈ Mod(φ)} ⋀{χ : χ an (m, 0)-condition and A ⊨ χ}.


That θ_m is in L^{S,P} follows from the fact that there are only finitely many (m, 0)-conditions (Lemma 8). Clearly, if A ⊨ φ, then A ⊨ θ_m for all m. So Mod(φ) ⊆ Mod(θ_m) for all m.

If there is an m ∈ ω such that Mod(θ_m) ∩ Mod(ψ) = ∅, take θ = θ_m. Suppose to the contrary that there is no such m. In this case, for each m ∈ ω there is B_m ∈ Mod(ψ) such that B_m ⊨ ⋀{(m, 0)-conditions χ : A_m ⊨ χ} for some A_m ∈ Mod(φ). So we have A_m and B_m such that A_m ⊨ χ ⟹ B_m ⊨ χ for each m. By compactness, we get models A ∈ Mod(φ) and B ∈ Mod(ψ), such that for all (m, 0)-conditions χ, A ⊨ χ ⟹ B ⊨ χ. This gives us, by Lemma 9, a P-back-and-forth sequence of length ω for (A, B). By the Löwenheim-Skolem theorem, we can take (A, B) countable. By going back and forth we get a bijection π : A → B which is a P-isomorphism. Since Mod(φ) is closed under P-isomorphisms, A ⊨ φ implies B ⊨ φ, but then B ∈ Mod(φ) ∩ Mod(ψ), a contradiction. ∎

Let R be any relational vocabulary and P a predicate symbol in R. Let ψ, φ, θ ∈ L^R. Theorem 4 has as corollaries the following known results:

Corollary 10 (Lyndon's Interpolation Theorem). Let ψ, φ be such that ψ ⊨ φ. Then there is a sentence θ such that:

(i) ψ ⊨ θ and θ ⊨ φ.

(ii) θ contains only those predicate symbols that occur in both ψ and φ.

(iii) If ψ is P-positive, then so is θ, and if φ is P-negative, then so is θ.

(iv) The same as in (iii), changing 'positive' to 'negative'.

Proof. Take L* in Theorem 4 to be the logic of classes K such that for some φ ∈ L^{R,P}, A ∈ K iff there is B such that B ⊨ φ and B↾R = A. Let R_1 be the vocabulary of ψ, and R_2 that of φ, and suppose P occurs only positively in ψ. Let K_1, K_2 ∈ L* be such that:

A ∈ K_1 iff there is B such that B ⊨ ψ and B↾R_1 = A.
A ∈ K_2 iff there is B such that B ⊨ ¬φ and B↾R_2 = A.

Then K_1 ∩ K_2 = ∅. By Theorem 4, there is a θ ∈ L^{S,P} such that K_1 ⊆ Mod(θ) and K_2 ∩ Mod(θ) = ∅. Then ψ ⊨ θ and θ ⊨ φ. ∎

Corollary 11 (Lyndon's Preservation Theorem). For any formula φ, the following conditions are equivalent:

(i) Mod(φ) can be characterized by a set of positive formulas.

(ii) Mod(φ) is closed under homomorphisms.


Proof. The trivial direction is (i) ⟹ (ii). Assume (i). Let φ be any P-positive S-formula, A, B be R-structures, x̄ ∈ A, π : A → B a P-homomorphism, and A ⊨ φ. To show (ii) it suffices to prove that B ⊨ φ. This follows from Lemma 6, by induction on the formation of formulas.

(ii) ⟹ (i): Make L* in Theorem 4 closed under negation of all predicates in R except P. Take ψ = ¬φ. Then θ ≡ φ. ∎

We believe that such generalized forms of Lindström's Theorem constitute a good general methodology for proving preservation and interpolation theorems.

3 Further Directions

For non-classical logics, such as systems of modal logic, various Interpolation Theorems are known (4), (6). Our purpose is to extend this work in the direction of modal logics and study model theoretic characterizations of these logics. There is a preliminary result in this area proved by de Rijke (8), but our goal is to prove a more conclusive result. The interesting thing about Lindström's theorem is that the Löwenheim-Skolem and Compactness theorems establish a limit on the expressive power of first order logic, in such a way that if we want this expressive power to be wider, precisely those properties are the first to be given up. The work of de Rijke did not yield a real model theoretic characterization of propositional modal logic, in the sense that it proposes a property of basic modal logic which is lost in any extension of it, keeping therefore the characterization aspect of Lindström's theorem, but not its semantic implications.

Acknowledgements. I want to thank the anonymous referees for helpful suggestions and advice, and Professor Jouko Väänänen for constant support and help in making me see the meaning of all this.

References

[1] J. Barwise and S. Feferman, Model-Theoretic Logics, Springer-Verlag, New York, 1985.

[2] J. Barwise and J. van Benthem, Interpolation, Preservation, and Pebble Games, Journal of Symbolic Logic, 64:2, 881–903, 1999.

[3] J. Flum, Characterizing Logics, in J. Barwise and S. Feferman, Model-Theoretic Logics, Springer-Verlag, New York, 1985.

[4] D. Gabbay, Craig's Interpolation Theorem for Modal Logics, in Conference in Mathematical Logic, London 1970, 111–127, Lecture Notes in Math., Vol. 255, Springer, Berlin, 1972.

[5] P. Lindström, On Extensions of Elementary Logic, Theoria, 35, 1–11, 1969.

[6] L. L. Maksimova, Interpolation Theorems in Modal Logics: Sufficient Conditions (Russian), Algebra i Logika 19:2, 194–213, 250–251, 1980. English translation: Algebra and Logic 19:2, 106–120, 1981.

[7] M. Otto, An Interpolation Theorem, Bulletin of Symbolic Logic, 6:2, 447–462, 2000.

[8] M. de Rijke, A Lindström Theorem for Modal Logic, in A. Ponse, M. de Rijke, and Y. Venema (eds.), Modal Logic and Process Algebra, Lecture Notes 53, CSLI Publications, Stanford, 217–230, 1995.

[9] S. Shelah and J. Väänänen, A Note on Extensions of Infinitary Logic, www.math.helsinki.fi/logic/jvaaftp.html.

[10] J. Väänänen, Set Theoretic Definability of Logics, in J. Barwise and S. Feferman, Model-Theoretic Logics, Springer-Verlag, New York, 1985.


A Dynamic Approach to Referent Tracking and Pronoun Resolution

Elsi Kaiser

University of Pennsylvania

[email protected]

Abstract. This paper presents a referent-tracking and pronoun resolution system for Finnish, a free-word-order, articleless language that poses a challenge for algorithms designed for languages like English that have definite/indefinite articles. To track referents and interpret pronouns in Finnish, this algorithm uses the pragmatically-motivated word order tendencies of Finnish to create an ordered register of pegs (where each peg is associated with an entity in the discourse), ranked according to salience. Pronouns are interpreted as referring to the topmost peg in the register. The algorithm aims to extend and adapt notions from Dynamic Semantics ((Groenendijk et al. 1996)) and Centering Theory ((Grosz et al. 1995)) to a typologically different language.

1 Introduction

In this paper,1 I present an entity-tracking and pronoun interpretation system which extends and elaborates some of the methods of Centering Theory (e.g. (Grosz et al. 1995)) and Dynamic Semantics (e.g. (Groenendijk et al. 1996)). In order to track referents and interpret pronouns in Finnish, a free word order language without articles, this algorithm uses the pragmatically-motivated word order tendencies of Finnish to create an ordered register of pegs (where each peg is associated with an entity in the discourse), ranked according to salience. In this algorithm, pronouns are interpreted as referring to the topmost (most salient) peg in the register.

The structure of this paper is as follows. First, Section 2 reviews some previous work, and in Section 3 I present the basics of the algorithm. I discuss why languages like Finnish pose interesting problems for entity-tracking systems in Section 4, and Section 5 addresses the details of the current version of the algorithm. Section 6 presents some additional issues, and Section 7 contains the conclusion, as well as directions for future work. The algorithm presented here is still work in progress, and is best viewed as a first step rather than the final answer.

1 Thanks are due to Robin Clark for many comments and ideas. Thanks also to Ellen Prince, Misha Becker, Tonia Bleam, Na-Rae Han, Eleni Miltsakaki, Kimiko Nakanishi and audiences at Penn and Amsterdam for their feedback on earlier versions of this paper. I would also like to thank three anonymous ESSLLI reviewers for very useful comments.

Proceedings of the Sixth ESSLLI Student Session, Kristina Striegnitz (editor). Chapter 11, Copyright © 2001, Elsi Kaiser

2 A look at some previous work

The algorithm presented in this paper makes crucial use of the notion of 'pegs' ((Groenendijk et al. 1996)), which can be thought of as "addresses in memory" and are used to keep track of entities in a discourse. In this model, as in dynamic semantics, when a new (previously unmentioned) entity enters the discourse model for the first time, it is mapped onto a new peg. Pronouns refer to entities that are already linked to pegs.2 More specifically, in this algorithm, a pronoun is interpreted as an instruction to find the entity linked to the highest-ranked featurally-compatible peg (the method of ranking the pegs is discussed below in Sections 3 and 5).

The notion that a pronoun 'points to' the topmost peg of an ordered register of pegs is based on Centering Theory ((Grosz et al. 1995)).3 According to Centering Theory, which is a model of the local-level component of attentional state in discourse (Grosz et al. (1995:4-5)), the entities ('centers') evoked by a sentence are ordered in terms of their discourse salience. The most salient entity is the one that is at the center of attention at that particular point in the discourse. The ranking can depend on a number of criteria, including syntactic, morphological and prosodic factors. It is usually assumed to be subject > object > others for English (Walker et al. (1998:7)). Each utterance thus has a single highest-ranked center. In addition, most utterances have a 'backward-looking center', i.e. a center that refers to an entity mentioned in the preceding utterance. According to Centering Theory, if any of the centers expressed in a sentence is a pronoun, then the backward-looking center must be a pronoun ('Pronoun Rule').

3 Introduction to the algorithm

The algorithm, in its current incarnation, is designed to track the entities mentioned in a discourse and to resolve the pronouns that refer to them. It has three main components: (1) a supply of pegs; (2) a discourse register (an array) to act as a storehouse for the pegs that are 'in use' in the discourse; and (3) a current register (an array) for pegs in the current sentence (cf. (Hintikka and Sandu 1997)). The pegs are dumped from the current register to the discourse register at the end of every utterance, and their order relative to one another is retained. The current version of the algorithm is not intended to deal with (discourse) deixis. Moreover, at the current stage, only sentences with neutral intonation are considered.

2 As one of the reviewers noted, this is somewhat of a simplification, since there are some exceptions. For example, after talking about 'a couple', one can then use the pronoun 'she' to refer to the woman in the couple.

3 See (Walker et al. 1998) for a more in-depth look at Centering.
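The two-register architecture just described can be sketched as a small data structure. The following is our own illustrative encoding, not the author's implementation; names such as Tracker, add_peg and end_of_utterance are invented here:

```python
from dataclasses import dataclass

@dataclass
class Peg:
    entity: str                        # the discourse entity this peg is linked to
    features: frozenset = frozenset()  # e.g. gender/number features for matching

class Tracker:
    """Two registers of pegs; the rightmost list element is the topmost
    (most salient) peg, following the paper's trace notation."""

    def __init__(self):
        self.discourse = []  # discourse register: pegs 'in use' in the discourse
        self.current = []    # current register: pegs for the sentence being processed

    def add_peg(self, entity, features=frozenset()):
        # A brand-new discourse entity gets a fresh peg on top of the current register.
        peg = Peg(entity, features)
        self.current.append(peg)
        return peg

    def end_of_utterance(self):
        # Dump all pegs from the current register into the discourse register,
        # retaining their relative order, as the paper specifies.
        self.discourse.extend(self.current)
        self.current = []
```

For example, processing ex. (3) below would call add_peg("man") for the subject and then end_of_utterance(), leaving the 'man' peg on top of the discourse register.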

3.1 Ranked pegs

A central aspect of this algorithm is the idea that the pegs are ranked in the registers, with the topmost one being more accessible4 than the ones lower down. When a new discourse entity is introduced, it is associated with a peg, and that peg is added to the current register. The algorithm treats subjects and objects asymmetrically to capture the fact that subjects tend to be more salient than objects (as noted for English by (Grosz et al. 1995), (Hudson-D'Zmura and Tanenhaus 1998), inter alia; for crosslinguistic work, see (Walker et al. 1998)). To reflect this, the peg for the subject of a sentence is located above (i.e. is more accessible than) the peg for the object. This ranking could easily be extended to include other grammatical functions beyond subject and (direct) object. However, for reasons of brevity, this paper only discusses subjects and objects.

It is worth noting that this algorithm ranks entities in terms of their grammatical functions, not their linear position in the sentence. As we will see in Section 5, it seems that, in Finnish, subjects are ranked higher (i.e. are more accessible) than objects regardless of whether the word order is SVO or OVS. Using grammatical relations instead of word order to rank referents is supported by preliminary psycholinguistic results for Finnish (Kaiser in preparation), as well as crosslinguistic research in other languages with flexible word order (see e.g. (Hoffman 1998) and (Turan 1998) on Turkish, and (Prasad and Strube 2000) on Hindi). However, there seems to be some crosslinguistic variation in what factors determine the ranking (see e.g. (Rambow 1993), (Strube and Hahn 1996), (Strube 1998) on German).

3.2 Resolving pronouns

According to this algorithm, a pronoun is interpreted as an instruction to point to the topmost peg in the discourse register (actually, to the entity that the topmost peg is connected to) and to bring it into the current register. If the entity linked to the topmost peg is featurally incompatible with the pronoun (e.g. the pronoun is he but the entity is feminine), then the algorithm checks the next highest peg in the discourse register.
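This resolution step can be sketched as follows. The sketch is our own illustration, not the author's code: pegs are modeled as (entity, feature-set) pairs, and featural compatibility is approximated by a subset test on the pronoun's features.

```python
def resolve_pronoun(discourse_register, current_register, pronoun_features):
    """Find the highest-ranked featurally-compatible peg in the discourse
    register, move it into the current register, and return its entity.
    The rightmost list element is the topmost (most salient) peg."""
    for i in range(len(discourse_register) - 1, -1, -1):  # scan from the top down
        entity, features = discourse_register[i]
        if pronoun_features <= features:                  # featural compatibility
            peg = discourse_register.pop(i)
            current_register.append(peg)                  # bring the peg into c.r.
            return entity
    return None  # no compatible peg found: accommodation would be needed

# Ex. (3): after 'Mies käveli sisään.' the peg for 'man' sits in the d.r.
d_r = [("man", frozenset({"masc", "sg", "human"}))]
c_r = []
resolve_pronoun(d_r, c_r, frozenset({"sg", "human"}))  # hän is gender-neutral
```

After the call, the 'man' peg has moved into the current register, mirroring the register traces shown with ex. (3) below.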

This algorithm is more 'global' than Centering Theory, which focuses on the local relationship between two adjacent utterances within a discourse segment. Within Centering Theory, it is not clear how one should deal with a pronominal 'backward-looking center' that does not have an antecedent in the immediately preceding utterance (e.g. Peter visited his mother yesterday. She had recently moved to New York. He enjoyed his trip, especially since the weather was so nice.). In contrast, in the current algorithm, the search for the suitable peg is not limited to the pegs for entities in the immediately preceding utterance. Along similar lines, (Walker 1998) argues in favor of a Cache Model, according to which centers can remain accessible even two to three sentences after they occur.

4 The term 'accessible' is used here to mean 'salient,' where the most salient entity is the one at the center of attention.

3.3 When to add pegs, when to look for existing pegs

The algorithm needs to ensure that the number of pegs in use is correct. It needs to know whether a given NP, say cat, refers to a cat that was already mentioned and has a peg in the discourse register, or whether it is introducing a new discourse entity. In English, the articles a and the provide useful clues for this task. In contrast, Finnish has neither a definite nor an indefinite article, and thus the distinction between a noun phrase that is 'pegless' and one that already has a peg cannot be made simply on the basis of morphology. Thus, it is not clear how a system, e.g. Dynamic Semantics, that aims to track the referents mentioned in a discourse would deal with Finnish. The algorithm discussed here deals with this complication by using information from word order patterns. The next section addresses the relationship between word order and information status in Finnish.

4 Basics of Finnish word order

As mentioned above, the articleless nature of Finnish poses problems for referent-tracking systems that use the definite/indefinite article distinction to distinguish new entities from previously mentioned ones.5 However, the lack of articles does not mean that Finnish fails to mark the distinction between old and new information. Finnish word order is flexible, and can be used to make many of the distinctions that other languages accomplish by using articles (see e.g. (Chesterman 1991)).6

Although Finnish is canonically SVO, all six permutations of these elements are grammatical in the appropriate contexts (Vilkuna (1995:245)). Different word orders convey different kinds of pragmatic information. For example, in orders with more than one preverbal argument (SOV, OSV), as well as verb-initial orders (VOS, VSO), the initial constituent is interpreted as contrastive7 (see e.g. (Vallduví and Vilkuna 1998), (Vilkuna 1995)). When trying to ascertain the discourse status of an entity, knowing whether it is contrastive is not necessarily informative because, as (Vallduví and Vilkuna 1998) note, the notions of 'rheme' (roughly speaking, 'new' information) and 'kontrast' are distinct and should not be conflated.8 On the other hand, SVO and OVS orders tell us about the discourse status of the subject and object, i.e. whether they are new to the discourse or have already been mentioned. Thus, this paper focuses on SVO and OVS orders.

5 (Kruijff-Korbayová 1997) discusses the application of File Change Semantics to Czech, another articleless language, and shows how word order marks the novelty/familiarity distinction. However, she does not propose a reference resolution algorithm. Thanks to one of the reviewers for bringing this paper to my attention.

6 A kind of optional 'definite article' is evolving in colloquial Finnish from the demonstrative pronoun se 'it' ((Laury 1997)). However, this phenomenon is not a part of standard Finnish.

Let us now take a closer look at the pragmatics of SVO and OVS orders. In Finnish, a preverbal subject NP is usually interpreted as being 'old' information, i.e., as referring to an entity already mentioned in the discourse (see (Prince 1992) for a discussion of 'discourse-old') (ex. (1)). If an SVO sentence occurs at the very beginning of a discourse, however, the preverbal subject can be 'new' information. Moreover, postverbal NP subjects are 'new' information (ex. (2)). When an NP object is preverbal, it is interpreted as old (ex. (2)). However, NP objects in their canonical postverbal position can be new or old (ex. (1)). These patterns are summarized in Chart 1 (see (Chesterman 1991) for further discussion).

(1) Mies luki kirjan. (SVO order)
    man-NOM read book-ACC
    'A/the man read a/the book.'

(2) Kirjan luki mies. (OVS order)
    book-ACC read man-NOM
    'The book, a man read.'

Chart (1)

                subject old    subject new
  object old    SVO            OVS
  object new    SVO            SVO

In the next section we will see how the entity-tracking algorithm can take these ordering patterns into account, in order to avoid misinterpreting previously-mentioned entities as new, or new entities as already mentioned.

7 Putting aside the finer details of the various ways of defining the term 'contrastive'.

8 See (Kaiser 2000a) for further discussion concerning the functions of OSV order in Finnish and the distinction between discourse status and contrast.
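Read as a decision rule, Chart 1 might be sketched like this. This is our own illustration of the chart, not the paper's notation; the function name and the 'old-or-new' label for ambiguous postverbal objects are assumptions:

```python
def discourse_status(role, position, discourse_initial=False):
    """Approximate Chart 1: given the grammatical role ('subject' or 'object')
    and surface position ('preverbal' or 'postverbal') of an NP in an SVO/OVS
    clause, guess whether it is discourse-old or discourse-new."""
    if role == "subject":
        if position == "preverbal":
            # Preverbal subjects are old, except at the very start of a discourse.
            return "new" if discourse_initial else "old"
        return "new"          # postverbal subjects (OVS) are new information
    if role == "object":
        if position == "preverbal":
            return "old"      # preverbal objects (OVS) are old information
        return "old-or-new"   # canonical postverbal objects are ambiguous
    raise ValueError(role)

discourse_status("subject", "postverbal")  # OVS subject, as in ex. (2)
discourse_status("object", "preverbal")    # OVS object, as in ex. (2)
```

The 'old-or-new' case is exactly where the search procedure of Section 5.1 is needed.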


5 Algorithm for Finnish

5.1 Full NP subjects and objects

First, we will consider what happens when the algorithm encounters a subject9. As illustrated in the simple example below, when the algorithm comes across a preverbal NP subject ('man' in ex. (3)) at the very beginning of a discourse, it adds a peg to the top of the current register. (Curly brackets { } are used to denote registers, 'd.r.' means discourse register, and 'c.r.' stands for current register. The rightmost elements in the registers are the most salient, i.e. 'topmost.') When the sentence ends, the peg is dumped from the current register into the discourse register. The next sentence begins with the (gender-neutral) pronoun hän 's/he', which the algorithm interprets as an instruction to look for the highest-ranked featurally-compatible peg. After this peg is located, it is moved to the current register, and the referent linked to this peg is interpreted as the subject of the predicate 'smiled'. At the end of the second sentence, all pegs in the current register are dumped into the discourse register.

(3) Mies käveli sisään. Hän hymyili. (discourse-initial)
    man-NOM walked in s/he-NOM smiled
    'A man walked in. He smiled.'

{ ... }d.r.  {man}c.r.    [peg from first sentence]
{man}d.r.  { ... }c.r.    [peg dumped into d.r. at end of sentence]
{ ... }d.r.  {man}c.r.    [pronoun hän 's/he' encountered, points to top peg, which is moved to c.r.]
{man}d.r.  { ... }c.r.    [peg back to d.r. after second sentence is over]

Recall now that a preverbal NP subject 'inside' a discourse usually refers to an already-mentioned entity. Thus, in such a case, the algorithm searches for the peg that already exists for this entity, and brings it to the top of the current register. If the algorithm cannot find a suitable peg, an accommodation process is presumably triggered; perhaps, as a last resort, the algorithm 'repairs' the situation by adding a new peg. An utterance requiring accommodation is presumably felt to be infelicitous.

When dealing with a postverbal NP subject (new information), the algorithm adds a new peg to the top of the current register. This is illustrated in ex. (4). This example can also be used to illustrate the algorithm's treatment of preverbal objects. When the algorithm comes across a preverbal NP object (old information), it searches for an existing peg in the discourse register, and brings it to the second-highest slot in the current register. At the end of the first sentence, the pegs for the subject and object are thus ranked such that the subject peg is above the object peg. As mentioned earlier, the subject peg is more accessible than the object peg, regardless of the word order (SVO or OVS). In other words, the algorithm ranks the pegs by grammatical function, not word order or information status. This arrangement predicts that a pronoun in the subsequent sentence will tend to refer to the subject 'man', and not the object 'woman.' This is supported by preliminary psycholinguistic findings for Finnish (see (Kaiser in preparation)).10

9 The examples discussed in this paper involve only bare nouns or pronouns. The issues involved with more complex noun phrases such as joku nainen 'some woman' and toinen nainen 'another woman' merit further research, but cannot be addressed here for space reasons.

(4) ...Naisen näki mies. Hän oli iloinen.
    woman-ACC saw man-NOM s/he-NOM was happy
    '...The woman, a man saw. He was happy.'

{ ... }d.r.  {woman}c.r.        [peg for sentence-initial object moved from d.r. to c.r.]
{ ... }d.r.  {woman, man}c.r.   [peg for postverbal subject added on top of object peg in c.r.]
{woman, man}d.r.  { ... }c.r.   [pegs dumped into d.r. at end of first sentence]
{woman ... }d.r.  {man}c.r.     [pronoun hän 's/he' encountered, peg that pronoun points to is moved to c.r.]
{woman, man}d.r.  { ... }c.r.   [pegs back to d.r. after second sentence is over]

A postverbal NP object is less straightforward than a preverbal one. In the case of a postverbal object, it is unclear whether it already has a peg or whether it is new to the discourse. Thus a search procedure is used to check if a matching peg is present in the discourse register. The algorithm checks the seven highest pegs11 to see if one of them is linked to a referent that matches the postverbal NP object. If it finds one, this peg is moved to the second-highest position in the current register. If no peg is found, a new peg is added to the top of the current register, below the subject peg. The details of the matching process are left as questions for future research.
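The search procedure for a postverbal NP object can be sketched as follows. This is our own illustration: since the paper leaves the matching process open, the sketch uses simple string equality on the head noun as a placeholder match, represents pegs as bare strings, and assumes the subject peg is already the single topmost element of the current register.

```python
WINDOW = 7  # motivated by short-term/working memory capacity (Miller 1956)

def postverbal_object(noun, discourse_register, current_register):
    """Check the seven highest pegs of the discourse register for a match.
    If one is found, move it to the second-highest slot of the current
    register (below the subject peg); otherwise add a new peg there.
    The rightmost list element is the topmost (most salient) peg."""
    top = discourse_register[-WINDOW:]            # the seven highest pegs
    for i in range(len(top) - 1, -1, -1):         # scan from the top down
        if top[i] == noun:                        # placeholder matching test
            peg = discourse_register.pop(len(discourse_register) - len(top) + i)
            current_register.insert(len(current_register) - 1, peg)
            return peg
    # No peg found: add a new peg below the subject peg.
    current_register.insert(len(current_register) - 1, noun)
    return noun

# Suppose 'woman' was mentioned earlier and now recurs as a postverbal object
# in an SVO clause whose subject peg ('man') is already in the current register:
d_r = ["cat", "woman"]
c_r = ["man"]
postverbal_object("woman", d_r, c_r)
```

After the call, the existing 'woman' peg sits directly below the 'man' subject peg in the current register, as described above.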

5.2 Pronominal subjects and objects

The asymmetry in the ranking of subjects over objects also holds for pronom-inal arguments. When the algorithm encounters a pronominal subject, itlooks for the topmost compatible peg in the discourse register and bringsit to the top of the current register. However, when the algorithm comesacross a pronominal object, it brings the topmost compatible peg from the

10As one of the reviewers points out, this does not hold for parallel constructions (e.g.Lisa saw Anne and Mary saw her too.). It is worth noting that these constructions are'special' in that they are often marked by a special kind of intonation and/or use oftoo/also.

11 The number seven was chosen because human short-term/working memory can contain approximately seven items. See e.g. (Miller 1956).


A Dynamic Approach to Referent Tracking and Pronoun Resolution

discourse register to the current register, but does not raise this peg above the peg for the subject. Example (5) illustrates this. According to informant judgments, the gender-neutral subject pronoun hän 's/he' (in the third sentence) shows a preference for the preceding subject even if an object pronoun is present (in the second sentence). If this object pronoun raised its peg to the very top, we would expect a subsequent pronoun to refer to it.

(5) Pekka keittää kahvia. Liisa katselee häntä samalla kun vesi kiehuu. Hän tykkää Pekasta/???Liisasta.
    Pekka cooks coffee-PART. Liisa looks-at s/he-PART same-time as water-NOM boils. S/he likes Pekka-ELA/Liisa-ELA.
    'Pekka is making coffee. Liisa looks at him while the water is boiling. She likes Pekka/???He likes Liisa.'

These examples also illustrate the advantage of having a separate discourse register. The distinction between the current register and the discourse register explains why the object pronoun häntä 'him/her' in the second sentence cannot refer to the subject of its sentence, Liisa. Pronouns are interpreted by the algorithm as instructions to point to the topmost compatible peg in the discourse register, and in ex. (5), the peg for Liisa is still in the current register and thus it is not a possible antecedent for the pronoun (see Section 6 for a discussion of reflexives). Another advantage of having both a discourse register and a current register is the ease with which sentences with two pronouns (e.g. Peter saw John. He kicked him) can be interpreted (see the end of Section 5.3).

5.3 Demonstrative anaphors

Let us now tackle a sentence where both the subject and object are pronouns, such as Peter saw John. He kicked him. In Finnish, such sentences are somewhat marked, due to the existence of another alternative, namely the demonstrative tämä 'this,' which can be used as an anaphor (e.g. (Hakulinen and Karlsson 1988), (Sulkala and Karjalainen 1992)). It has often been noted that this anaphoric demonstrative tends to refer to non-subject arguments (e.g. (Saarimaa 1949)). Corpus studies support this observation ((Halmari 1994), (Kaiser 2000b)). In a two-pronoun sentence, the anaphoric demonstrative is usually used instead of a second pronoun (ex. (6)).

(6) Pekka huomasi Matin pihalla. Hän tervehti tätä.
    Pekka-NOM noticed Matti-ACC yard-ADE. S/he-NOM greeted this-PART.
    'Pekka noticed Matti in the yard. He (=Pekka) greeted this (=Matti).'


Elsi Kaiser

Given the tendency of tämä to refer to objects, one might hypothesize that, when the algorithm encounters an anaphoric demonstrative, it points to the second-highest peg in the discourse register. However, (6) shows that this is insufficient, because tämä can be used even when only one peg is left. At the start of the second sentence in (6), the algorithm encounters hän 's/he' and pulls the peg for Pekka into the current register. Then the anaphoric demonstrative tämä 'this' is encountered, but at this point, only the peg for Matti remains in the discourse register, and it is referred to with tämä. One might conclude that the anaphoric demonstrative is preferably interpreted as an instruction to point to the second-highest peg but, if no other alternative is available, it can also be interpreted as pointing to the top peg. This approach, however, runs into trouble with ex. (7).

(7) Liisa nukkuu kotona. Hän / ???Tämä on sairas.
    Liisa sleeps at-home. S/he / this is sick.
    'Liisa is sleeping at home. She is sick.'

Even though the peg for Liisa is the only available peg in the discourse register by the time the algorithm gets to the second sentence, tämä 'this' cannot be used to refer to Liisa. Instead of trying to define the anaphoric demonstrative tämä by the ranking of the peg it points to, maybe we should treat it as having a preference to refer to objects, i.e. as being associated with referents that have a certain grammatical relation/semantic role. While this solution is not very elegant, it seems, so far, to offer the best account of the empirical data. This area would clearly benefit from future work.

On a more positive note, the algorithm works smoothly for 'double-pronoun' sentences of the English type. Consider a sentence such as Peter saw John. He hit him, which most people tend to interpret as meaning 'Peter hit John.'12 When the algorithm encounters the subject pronoun 'he' in the second sentence, it pulls the peg for Peter into the current register. When it reaches the object pronoun 'him,' it points to the top peg in the discourse register which, at this point, is the peg for John. Thus, due to the presence of two registers, the algorithm resolves the pronouns successfully, without the need for any additional stipulations concerning the second pronoun.
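This double-pronoun walkthrough can be simulated with a minimal two-register model. Again a sketch rather than the author's code: compatibility is reduced to a set-membership test, and registers are lists with the topmost peg first.

```python
def resolve_pronoun(discourse_reg, current_reg, compatible, is_subject):
    """Resolve a pronoun to the topmost compatible peg of the discourse
    register.  A subject pronoun raises its peg to the top of the current
    register; an object pronoun's peg is placed below the subject peg."""
    for peg in list(discourse_reg):
        if peg in compatible:
            discourse_reg.remove(peg)
            if is_subject:
                current_reg.insert(0, peg)
            else:
                current_reg.append(peg)   # stays below the subject peg
            return peg
    return None

# 'Peter saw John. He hit him.'  After the first sentence, both pegs sit
# in the discourse register, subject above object.
d_reg, c_reg = ["Peter", "John"], []
he = resolve_pronoun(d_reg, c_reg, {"Peter", "John"}, is_subject=True)
him = resolve_pronoun(d_reg, c_reg, {"Peter", "John"}, is_subject=False)
print(he, him)   # Peter John
```

Because 'he' has already moved Peter's peg out of the discourse register, the topmost peg left for 'him' is John's, with no extra stipulation needed.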

6 Some other issues

6.1 Pronouns vs. reflexives

There are a number of possible approaches to binding-theoretic phenomena. One option is to posit that pronouns refer to entities linked to pegs in the discourse register, and reflexives to entities whose pegs are in the

12 Assuming neutral intonation.


current register ('divided registers'). Another option would be to use coreference restrictions to prevent certain NPs from referring to the same peg (see (Hintikka and Sandu 1997)). In order to account for some of the details of binding domains (picture NPs etc.), we could add further partitions (embedded registers) or impose coindexation restrictions derived from syntactic binding theories. It is not clear whether a proliferation of registers is the best approach, given that this algorithm is not intended to replace Binding Theory. It seems reasonable to conclude that coindexation restrictions should play a role in the algorithm.

6.2 Main vs. Subordinate clauses

An important question that we have sidestepped so far is: what is the size of the utterance? I.e., when are pegs dumped from the current utterance register to the discourse register? This question is especially relevant when it comes to sentences that contain subordinate clauses (see e.g. (Kameyama 1998) and (Miltsakaki 1999)). The distribution of anaphoric demonstratives in Finnish texts ((Kaiser 2000b)) suggests that at least some kinds of embedded clauses should be treated as subparts of the main clause.

In addition to referring to objects (Section 5), the anaphoric demonstrative tämä can also be used to refer to an embedded subject, especially when a 'competitor' antecedent is present in the main clause (ex. (8)).

(8) Matti sanoi, että Liisa on sairas. Tämä oli saanut flunssan hiihtomatkalla.
    Matti_i said that Liisa_j is sick. This_j was gotten flu-ACC ski-trip-ADE.
    'Matti said that Liisa is sick. She had caught the flu during a skiing trip.'

If a subordinate clause behaves just like a main clause, the regular pronoun hän should be used to refer to the subjects of all kinds of sentences equally, regardless of whether they are matrix or subordinate. The preference to use tämä for referents in embedded clauses (ex. (8)) suggests that embedded clauses are best treated as subparts of the main clause. We can encode this in the algorithm by introducing a partition in the current register, such that subordinate sentences have their own separate current register alongside the current register of the main sentence. When the algorithm has processed the main clause, it dumps the current register of the main clause into the discourse register, and then after processing the subordinate clause, it dumps its current register into the discourse register below that of the main clause, as if this second current register were an object in some sense.
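One way to picture this partition (a sketch under the assumptions just stated, not an implementation the paper provides): each clause keeps its own current register, and at the end of the sentence the main-clause register is dumped into the discourse register above the subordinate-clause register.

```python
def dump_partitioned_sentence(discourse_reg, main_cur, sub_cur):
    """End-of-sentence update for a sentence with an embedded clause.

    Main-clause pegs enter the discourse register first, so they end up
    ranked above the pegs contributed by the subordinate clause."""
    discourse_reg[:0] = main_cur + sub_cur    # main-clause pegs on top
    main_cur.clear()
    sub_cur.clear()

# (8): 'Matti said that Liisa is sick.'
d_reg = []
dump_partitioned_sentence(d_reg, ["Matti"], ["Liisa"])
print(d_reg)   # ['Matti', 'Liisa']
```

With Liisa's peg ranked below Matti's, the embedded subject patterns with objects, which is consistent with the use of tämä in (8).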


7 Conclusion

In this paper, I have presented a preliminary referent-tracking and pronoun resolution system for Finnish, an articleless, free-word-order language that poses a challenge for entity-tracking systems designed for languages like English or German. The algorithm described in this paper uses the word order patterns of Finnish to distinguish 'old' and 'new' referents. In addition, by mapping the entities in a discourse onto pegs that are ranked with respect to their salience, the algorithm also aims to function as a pronoun resolution system.

Many issues remain open for future research, including the interpretation of plural pronouns, the role of intonation and the treatment of more complex noun phrases. Moreover, so far, the algorithm has only been tested on very small amounts of text. In the future, it should be tested on larger corpora and also compared to/combined with existing algorithms, in order to determine how best to tackle referent tracking and resolution in a language of this type.

References

Chesterman, A. (1991). On definiteness. Cambridge University Press.

Groenendijk, J., M. Stokhof, and F. Veltman (1996). Coreference and Modality. In S. Lappin (Ed.), The Handbook of Contemporary Semantic Theory, pp. 179–213. Blackwell.

Grosz, B., A. Joshi, and S. Weinstein (1995). Centering: A Framework for Modelling the Local Coherence of Discourse. Technical report, Institute for Research in Cognitive Science, University of Pennsylvania.

Hakulinen, A. and F. Karlsson (1988). Nykysuomen lauseoppia. Suomalaisen Kirjallisuuden Seura.

Halmari, H. (1994). On accessibility and coreference. Nordic Journal of Linguistics 17, 35–59.

Hintikka, J. and G. Sandu (1997). Game-Theoretical Semantics. In J. van Benthem and A. ter Meulen (Eds.), Handbook of Logic and Language, pp. 361–410. Elsevier Science.

Hoffman, B. (1998). Word Order, Information Structure and Centering in Turkish. In M. A. Walker, A. Joshi, and E. Prince (Eds.), Centering Theory in Discourse, pp. 251–272. Clarendon Press.

Hudson-D'Zmura, S. and M. Tanenhaus (1998). Assigning Antecedents to Ambiguous Pronouns: The Role of the Center of Attention as the Default Assignment. In M. A. Walker, A. Joshi, and E. Prince (Eds.), Centering Theory in Discourse, pp. 199–226. Clarendon Press.


Kaiser, E. (2000a). The discourse functions and syntax of OSV word order in Finnish. In Proceedings of the 36th Annual Meeting of the Chicago Linguistic Society.

Kaiser, E. (2000b). Pronouns and demonstratives in Finnish: Indicators of Referent Salience. In Proceedings of the Third International Conference on Discourse Anaphora and Anaphor Resolution (DAARC).

Kaiser, E. (2001). Word order and pronoun interpretation in Finnish. In preparation, University of Pennsylvania.

Kameyama, M. (1998). Intrasentential Centering: A Case Study. In M. A. Walker, A. Joshi, and E. Prince (Eds.), Centering Theory in Discourse, pp. 98–112. Clarendon Press.

Kruijff-Korbayová, I. (1997). Czech Noun Phrases in File Change Semantics. In A. Drewery, G.-J. Kruijff, and R. Zuber (Eds.), Proceedings of the Student Session at the 10th ESSLLI, pp. 107–118.

Laury, R. (1997). Demonstratives in Interaction: The Emergence of a Definite Article in Finnish. John Benjamins.

Miller, G. (1956). The Magical Number Seven, Plus or Minus Two: Some Limits on our Capacity for Processing Information. Psychological Review 63, 81–97.

Miltsakaki, E. (1999). Locating Topics in Text Processing. In Proceedings of Computational Linguistics in the Netherlands (CLIN'99).

Prasad, R. and M. Strube (2000). Discourse Salience and Pronoun Resolution in Hindi. UPenn Working Papers in Linguistics 6, 189–208.

Prince, E. (1992). The ZPG letter: subjects, definiteness, and information-status. In S. Thompson and W. Mann (Eds.), Discourse description: diverse analyses of a fund raising text, pp. 295–325. John Benjamins.

Rambow, O. (1993). Pragmatic Aspects of Scrambling and Topicalization in German. Paper presented at the Workshop on Naturally-Occurring Discourse, IRCS, University of Pennsylvania.

Saarimaa, E. (1949). Kielemme käytäntö. Pronominivirheistä. Virittäjä 49, 250–257.

Strube, M. (1998). Never Look Back: An Alternative to Centering. In Proceedings of the 17th International Conference on Computational Linguistics and the 36th Annual Meeting of the ACL.

Strube, M. and U. Hahn (1996). Functional Centering. In Proceedings of the 34th Annual Meeting of the Association for Computational Linguistics.

Sulkala, H. and M. Karjalainen (1992). Finnish. Routledge.


Turan, U. D. (1998). Ranking Forward-Looking Centers in Turkish: Universal and Language-Specific Properties. In M. A. Walker, A. Joshi, and E. Prince (Eds.), Centering Theory in Discourse, pp. 139–160. Clarendon Press.

Vallduví, E. and M. Vilkuna (1998). On rheme and kontrast. In P. Culicover and L. McNally (Eds.), The Limits of Syntax, Syntax and Semantics 29, pp. 79–108. Academic Press.

Vilkuna, M. (1995). Discourse Configurationality in Finnish. In K. É. Kiss (Ed.), Discourse Configurational Languages, pp. 244–268. Oxford University Press.

Walker, M. (1998). Centering: Anaphora Resolution and Discourse Structure. In M. A. Walker, A. Joshi, and E. Prince (Eds.), Centering Theory in Discourse, pp. 401–435. Clarendon Press.

Walker, M., A. Joshi, and E. Prince (1998). Centering Theory in Discourse. Clarendon Press.


Towards a comprehensive meaning of German doch

Elena Karagjosova

Computerlinguistik, Universität des Saarlandes, Saarbrücken, Germany

[email protected]

Abstract. In this paper, an overall abstract meaning is assumed to cover all syntactic instantiations of German doch (i.e., as a discourse connector, modal particle, sentence adverb and response particle). It is claimed that the overall meaning is a contrastive coherence relation. Three kinds of contrast relations are considered and it is argued that they can hold between relata that have different status in discourse: in its use as a discourse connector, doch can express all three different kinds of contrast relations that hold between two propositions, whereas as a sentence adverb, modal particle and response particle, it marks the contrast relation of denied expectation between the proposition expressed by the sentence containing doch and some implication arising from it.

1 Introduction

In the examples below, the word doch fulfills different syntactic functions corresponding to the different categories of speech it belongs to: discourse connective (1)-(2), sentential adverb (3), modal particle (4)-(5), and response particle (6).

(1) Maria ist verreist, doch Peter ist da geblieben.
    'Maria has left, but Peter has stayed.'

(2) Peter hatte gesagt, daß er da bleibt, doch er ist weg.
    'Peter had said that he would stay, but he has left.'

(3) Peter ist also doch verreist.
    'Peter has left after all.'

(4) (a) A: Peter kommt auch mit.
        'Peter is also coming.'
    (b) B: Er ist doch verreist.
        'But he has left.'

(5) Peter ist verreist. Dabei hat er doch gesagt, er bleibt.
    'Peter has left. Although he said that he will stay.'

Proceedings of the Sixth ESSLLI Student Session. Kristina Striegnitz (editor). Chapter 12, Copyright © 2001, Elena Karagjosova.


(6) (a) A: Ist Peter nicht verreist?
        'Hasn't Peter left?'
    (b) B: Doch.
        'He has.'

It has been argued that the different functions doch fulfills are historically and semantically connected (Hentschel 1986), but very few attempts have been made to define an overall core meaning underlying them (Abraham 1991; Graefen 2000; König et al. 1990).

Abraham (1991) defines the meaning of the conjunction doch as an adversative implicational relation between propositions where different implications may be involved in the interpretations of the propositions. He defines the meaning of the adversative conjunction as doch(y) =def ∃x(x → ¬y) or doch(y) =def ∃x∃y∃z(x → z) & (y → ¬z),1 where x, y and z are propositions and → is some semantic entailment denoting probability or normality. In the case where just one proposition is involved, e.g., Es regnet doch (where doch is not accentuated; a modal particle in our account), a proposition x (and z for the second formula) has to be found such that the same two-place relation holds as for the coordinating conjunction. For doch as a sentence adverb2 he suggests the scheme

doch(x) =def {A : ∃x(x → ¬y)}, {B : ¬x}, A : x

a variant of which is that doch may fail to imply the first part. This scheme he exemplifies by the dialogue (A: One cannot go out today. It is raining.) (B: It does n o t3 rain!) A: Es regnet d o c h! (It d o e s rain, too!).

We find this approach erroneous in several respects. First, it need not be another speaker who negates an implication from the previous utterance of the doch-speaker; it can be an old belief or expectation of the same speaker, e.g., in (3). Besides, the first step in the scheme for the response particle does not seem to have anything to do with the semantics of doch. E.g., we can imagine the first utterance in the example dialogue to be A: We can go out today. The sun is shining. The dialogue can still take a similar course: B: The sun is not shining, A: Doch. Here, no implication of the type x → ¬y is needed for or invoked by the use of the adverb doch. We also believe that the formalization provided for the modal particle doch is inadequate, since the modal particle does not denote a relation between propositions but between propositional attitudes (Karagjosova 2000).

Graefen (2000) claims that the overall (functional) meaning of doch is to cancel a negation and to process a contradiction. She derives this meaning from the function of doch as a response particle, which she considers to be primary. In her approach, all other uses follow the schema typical for the

1 The two formulas correspond to the notions of denied expectation and concession, respectively, which will be addressed further below.

2 Abraham uses the term discourse modal particle under contrastive accent.

3 The wide spacing indicates a word under contrastive accent.


response particle. This scheme consists of three steps: (i) the speaker S introduces a new (knowledge) element into the discourse; (ii) questioning: one of the participants points out an objection to it; (iii) S rejects the objection. The following example dialogue, and more closely the part B–A–B, is supposed to exemplify these steps: A: Die 'BMW' ist gut. B: Die 'HONDA' ist besser. A: Nein. B: Doch. (A: The 'BMW' is good. B: The 'HONDA' is better. A: No, it isn't. B: Yes, it is.)

In our view, it is not adequate to take the use of doch as a response particle to be basic and to project its meaning onto the other doch-uses. First of all, it does not seem possible to extend this scheme to cover all uses. Parts (ii) and (iii) of it can be seen to apply to the modal particle doch only after a slight modification, e.g., if we said that the speaker rejects (part (iii)) an implication of the hearer which is not in accord with the common ground already established between the two conversants (part (ii)). The adverbial use of doch covers part (iii), where the speaker rejects an earlier opposite assumption (this would be part (ii)), which however is only implicit and not contextually present. Furthermore, the scheme does not apply at all to the conjunction doch: although the speaker points to an objection, he does not reject it; on the contrary, he affirms it. Consider (1). Here, with doch, it is not the proposition that Peter has stayed that is rejected, but the implicit expectation of the opposite: that he would also leave.

Secondly, it can be argued from a typological point of view that it would be more plausible to start from the meaning of the diachronically original use, i.e., the coordinating adversative conjunction doch, which is synonymous with German aber and English but. And thirdly, Graefen's claim that, using doch, the speaker blocks an expectation of the hearer does not apply to all uses, since in the case of the conjunction doch it is an expectation of the speaker himself that is "blocked".

So we need a wider abstraction that covers all uses of doch adequately. This is what is proposed in König et al. (1990), where the basic meaning of the word doch is specified in terms of its use as a conjunction, from which the other uses are derived. The meaning of the different senses of doch is defined as fulfilling the function of (i) objecting to an assumption/statement of the conversational partner (response particle) or (ii) pointing at (a) a contrast between two circumstances (conjunction), (b) a denied expectation (sentence adverb), (c) a conflict between the knowledge that the hearer should have and his actual assumptions/behavior (modal particle).

The assumptions in this account are closely related to the ones underlying the present account, but they still need some elaboration and specification, e.g., how do the notions of contrast, denied expectation and conflict relate to each other, what unifies the different uses, and what entities are the "circumstances" that are related by doch?

The present approach, while based on assumptions about the comprehensive meaning of the form doch similar to those underlying the accounts


described above, attempts to overcome the drawbacks of the previous approaches. Our claim is that what is common to all uses of doch is that a contrastive coherence relation is being expressed. In our approach, the notion of contrast is based on a taxonomy of discourse relations used in Lagerwerf (1998). We briefly introduce three types of contrastive relations – denial of expectation, semantic opposition and concession – and propose an analysis of examples with different doch-uses in terms of the type of contrastive relation they express. It turns out that in most of its uses, doch expresses the relation of denied expectation. The relation is furthermore shown to hold between discourse entities of different kinds depending on the particular use: for the conjunction doch, it is a relation between two propositions4 (1)-(2); for the adverb, a relation between a proposition (propositional content) and an implicit (not overtly stated) propositional attitude (3). In the case of the modal particle, it is a relation between two propositional attitudes, one of which can be implicit (4)-(5), and for the response particle a relation between a proposition and a propositional attitude of the hearer (6).

2 Contrast as a coherence relation

In theories of discourse, coherence relations are used to account for the fact that a meaningful text is more than just a concatenation of sentences. Discourse relations are usually expressed by connectives, but can also hold implicitly between the propositions expressed by the respective clauses. Examples of theory-independent coherence relations are contrast (usually indicated by but, although) and causality (because).

According to a classification proposed in Sanders et al. (1992), coherence relations can be (i) causal or additive, (ii) negative or positive, and (iii) interpreted either semantically or pragmatically.5 Contrastive relations are negative: in all of them, a negation of a clause is involved. They may also be additive or causal, and either semantic or pragmatic. Contrastive connectives can be characterized in terms of these features, e.g., although is causal, but can have both additive and causal interpretations.

A semantic (or content) interpretation applies to contrast relations that are based on world knowledge, as in Connors didn't use Kevlar sails, although he expected little wind,6 where the contrastive relation is based on the world-knowledge implication Normally, if one expects little wind, one uses Kevlar sails. A pragmatic interpretation can either be an epistemic interpretation (a conclusion of the speaker is involved) or a speech act interpretation. E.g., in Theo was not

4 As a conjunction, doch occurs only in declarative sentences.

5 There is a fourth feature, "representing its segments in basic order or not", which will not be considered here.

6 All examples in this section are taken from (Lagerwerf 1998).


exhausted, although he was gasping for breath, a speculation of the speaker is involved, so the interpretation is epistemic: Normally, if someone is gasping for breath, I conclude7 that he is exhausted. The interpretation of Mary loves you very much, although you already know that is based on the consideration of the speaker If x already knows p, I need not say p, and therefore it is a speech act interpretation.

Lagerwerf (1998) distinguishes three basic kinds of contrastive relations – denial of expectation, semantic opposition, and concession – which will be briefly introduced below. All of them are typically expressed by the discourse connective but, which corresponds to the German connectives doch and aber.

2.1 Denial of expectation

Denial of expectation is defined by Lagerwerf as a causal relation between two propositions. It can be both semantic and pragmatic and is typically expressed by the discourse connector although; but can also have a denial of expectation interpretation. This relation is classified as causal in the framework of the above taxonomy since a causal relation between two propositions, related to the connected clauses, should be available as an expectation to be denied. Consider Although Greta Garbo was called the yardstick of beauty, she never married and Greta Garbo was called the yardstick of beauty but she never married. Both lead to the expectation if a woman is beautiful, she will marry. This expectation is denied by the follow-up clause she never married.

Moreover, Lagerwerf argues that the expectation that is denied is a presupposition triggered by connectives like although and but. The presupposition has the form of a defeasible implication (denoted by >): although p, q presupposes p′ > ¬q′, where p, q are clauses and p′, q′ the corresponding propositions. E.g., for (2): Normally, if Peter tells me that he stays, I conclude that he will not leave.8 The expectation is derived by negating one of the clauses. In cases where no sensible expectation can be derived, another interpretation has to be found.

2.2 Semantic opposition and concession

The relation of semantic opposition holds between two clauses with parallel structure where the contrast is induced by two incompatible predicates:

(7) Greta was single, but Prince was married.

According to Spooren (1989), semantic opposition is about two entities in the domain of discussion. This also explains why Greta was single, but

7 Since in the real world there may also be other reasons for gasping for breath.

8 This is also an example of an epistemic interpretation of a contrastive relation.


she was married is unacceptable: the two predicates in the domain, being single and being married, cannot enter into a contrast relation since there is only one entity in the domain available for application, Greta: this leads to a contradiction. However, denial of expectation and concession can also be about two entities: Although Greta was a beauty, Prince married another woman (Ivana Kruijff-Korbayová, p.c.). The crucial difference seems rather to be that in semantic opposition, a causal connection between the two propositions related is not available, which makes it impossible to arrive at a (plausible or immediately evident) defeasible rule. As Lagerwerf points out, semantic opposition is additive and not causal. It may sometimes be implicational (e.g., if someone has left, he is not here for (1)). However, this is not a causal connection between propositions but an antonymous relation between predicates. This is also why it is not immediately plausible for the two clauses in (7) to be connected by although (?Although Greta was single, Prince was married) without some additional information that helps to derive an expectation, e.g., that there is an agreement between Greta and Prince that they would never marry.

Concession is always defined via a contextually determined 'tertium comparationis': a proposition for which both a positive and a negative argument are provided by the two clauses connected by the concession relation:

(8) A: Shall we take this room?

B: It has a beautiful view, but it is very expensive.

In (8), B's answer presents arguments both for and against the propositional content of A's question. Experiments have shown that the but-clause is given more importance by speakers (Spooren 1989), so that B's answer in (8) is negative after all.

Whereas there are cases where only a concessive interpretation is possible (e.g., The weather is nice, but I am tired), it is not difficult to provide concessive interpretations also for cases of denial of expectation or semantic opposition if a 'tertium comparationis' can be found; e.g., (1) can be interpreted as concession if we imagine it as an answer to the question Have your flatmates left?.

Concession is negative, pragmatic and additive. No causal relation between the two clauses is inferred, but a conjunction on the basis of causal inferences (namely, from clause to tertium comparationis) (Lagerwerf 1998, 40).

3 Analyzing the examples

In this section, we provide an analysis of the different senses of doch based on the taxonomy of coherence relations and attempt to integrate the basic principles of meaning unification mentioned in Section 1.


3.1 doch as a discourse connector

In this function, doch corresponds to English but in that it can express all three kinds of contrast already mentioned in section 2. (1) has a semantic opposition interpretation since two contrasting predicates are applied to two different arguments. However, as pointed out in the previous section, a concession interpretation for this example can also be found provided that a tertium comparationis is contextually present (see (8)). Finally, the conjunction doch can also have a denial of expectation interpretation as in (2). Here, the speaker has expected that Peter has not left since he told him so: if Peter tells me that he will not leave, then he will not leave, which has the causal implication Peter has not left because he told me he would not do so.

3.2 doch as a sentence adverb

Example (3) can be interpreted only if one imagines a second, implicit proposition standing in a contrast relation to the asserted one. The sentence can thus be paraphrased as I had reasons9 to believe that Peter would not leave, but he has left. Only a denial of expectation interpretation is immediately plausible (a semantic opposition interpretation would require two parallel structures, a concession interpretation a tertium comparationis). The appropriate paraphrase reconstructs an implicit denial of the expectation of the speaker that Peter has not left:10 Although I had reasons to believe that Peter would not leave, he has left. The underlying causal relation (the expectation that is being denied by doch) can then be said to be Peter has not left, because I have reasons to believe that he has not left, and from this, the presupposition can be derived: Normally, if I have reasons to believe that Peter has not left, I conclude11 that he has not left.

Alternatively, doch could be said to trigger the simpler presupposition Peter has not left (Zeevat 1999). But a problem we see with this interpretation is that this presupposition, when added to the context created by the utterance, would make it inconsistent with what is asserted (that Peter has left). Even if we take it to be a pragmatic or speaker presupposition that the speaker believes that Peter has not left, it would be inconsistent with an inference based on Grice's quality maxim, according to which the speaker believes what he says: that Peter has left. On our account, the

9The paraphrase is formulated this way since the proposition that is opposed to the asserted one can be either believed by the speaker, or by the hearer, or it can be some information provided by Peter himself or another person, or any other circumstance that speaks against the truth of the asserted proposition.

10In this and the following examples, the connector although is used for the denial of expectation paraphrases instead of the English doch-counterpart but, since although typically expresses denial of expectation and as such is partially synonymous with but.

11This is the epistemic interpretation of the contrast relation.


Towards a comprehensive meaning of German doch

expected consequence is rather overridden in the beliefs of the speaker by the unexpected fact the speaker witnesses.

3.3 doch as a modal particle

In (4), doch serves to express the objection of the speaker to an (according to the speaker) incorrect implication made by the other conversant which violates the common knowledge already established between both interlocutors12. The sentence can be paraphrased as We both know that Peter has left, but you assert that he will come along. The contrast expressed here cannot be interpreted as semantic opposition: although there is an opposition (in the broader sense) between two predicates (coming along and being absent), they apply to the same entity (Peter), which leads to a contradiction (*Peter is coming, but he has left). Nor can a concessive interpretation be established (due to the lack of a tertium comparationis). Thus, as a modal particle, doch can be interpreted only as a case of denial of expectation: the expectation of the speaker that his conversational partner will behave in accord with the common knowledge between them is not met. Thus, the expectation denied here is not associated merely with the fact that Peter has left as in (3), but with the common knowledge between the interlocutors and their conversational behaviour. This can be paraphrased as: Although we both know that Peter has left, you assert that he will come along. The underlying expectation that is being denied by doch can then be said to be Normally, if we both know that Peter has left, you do not assert that he will come along.

(5) is very similar to (2). However, the contrast relation holds not only between the fact that Peter has left and what he previously said, but also between the old belief which the speaker shared with the hearer (that Peter will stay), and the new belief of the speaker (that he has left): Although we both believed that Peter would stay, he has left, which is based on the causal relation Since we both believe that he will stay, he will not leave, from which the presupposition can be derived: Normally, if we both believe that he will stay, I conclude that he has not left.

3.4 doch as a response particle

The response particle doch serves to negate a negative question or a negative statement. In these cases, it can be paraphrased as on the contrary:

(9) A: Peter ist nicht verreist.
       (Peter has not left.)

    B: Doch, (er ist verreist).
       (On the contrary, he has.)

12doch as a modal particle triggers additionally the attitude of the speaker that the proposition in its scope is common knowledge between the conversants, see Karagjosova (2000).


But there are also cases where it can serve to confirm a positive statement13, where this paraphrase does not apply:

(10) (a) A: Das war sehr freundlich von ihm.
            (This was very friendly of him.)

     (b) B: Doch, das muss man schon sagen.
            (Yes, admittedly.)

Thus, another paraphrase is needed. Consider (6). As a response to A's utterance, doch is intuitively synonymous with the clause this is not true. Thus, (6b) means in the context of (6a) That Peter has not left is not true. As an answer to (10a), however, doch is synonymous with this is true. This gives the impression that the response particle means two diametrically opposed things. On the other hand, it seems that, since doch after a negative statement is equal to not p is not true, one would expect doch after a positive statement to mean p is not true. But this is not the case, as shown in (12) below, where doch cannot be used to negate a positive question.

The key to that problem lies in our view in explicating doch's meaning in relational terms, as a marker of contrast, rather than in treating it as a simple negation or confirmation. E.g., in (6), its meaning can be paraphrased as You believe that Peter has not left, but he has. Here, of the three kinds of contrast, only the denial of expectation interpretation seems plausible (no parallel structure, no tertium comparationis). The response particle rather serves to deny the expectation of conversant A that the proposition in (6a) is true. The denial of expectation relation can be explicated by the paraphrase Although you believe that Peter has not left, it is not true that Peter has not left. The underlying defeasible implication for (6) will then be Normally, if you believe something, it is true14.

Example (10) can also be explained15 in conformity with the interpretation provided for (6) using the familiar denial of expectation technique: it can be paraphrased as Although you do not (seem to) believe that I believe that this was friendly of him, I do16. The underlying defeasible implication for the denied expectation will then be Normally, if you do not believe that I believe what you convey, you conclude that I do not believe it17.

In the above examples, doch serves to deny not only a proposition (as in (9)), but also inferences from utterances (as in (6) and (10)). This indicates

13This example is taken from (Helbig 1988).

14This implication is problematic as an axiom of epistemic logic, but as a defeasible rule it seems to be passable.

15Otherwise we would be forced to permit a word to mean two so diametrically opposed things as a negation and a confirmation.

16This paraphrase is based on the implicature that can be derived from assertions provided the maxim of quantity is observed: that the speaker believes the content of his utterance to be new to his addressee.

17This (a bit baroque) paraphrase is also not tenable in epistemic logic, but can be accepted as a defeasible rule.


that it is not a simple negation synonymous with on the contrary, but a denial of an expectation to the opposite of what the respective utterance expresses.

As already mentioned, doch cannot be used to negate a positive question:

(11) A: Willst Du keinen Zucker in den Tee?
        (Don't you want sugar in your tea?)

     B: Doch.
        (Yes, I do.)

(12) A: Willst Du Zucker in den Tee?
        (Do you want sugar in your tea?)

     B: Doch (≠ Nein).
        (Doch (≠ No).)

Here, it could be argued that, against the background of what has been stated so far, it is not clear why in (12) doch cannot serve to refute the expectation of A (that B wants sugar), whereas in (11) it does serve to refute A's expectation. On second thought, however, doch does refute both the expectation in (11) and the one in (12). They are simply not the same expectation: Although you seem to believe that I do not want sugar in my tea, I do is the one in (11), and Although you do not seem to know that18 I want sugar in my tea, I do the one in (12). From a logical point of view, there should not be any difference in representing both questions19. Pragmatically, however, there seems to be a difference in terms of the implicatures that the hearer of the two questions can draw from them: if A didn't know that B wanted sugar, he would rather utter a question like (12), but if he had the suspicion that B does not want sugar in his tea, he would ask (11).

4 Conclusion

In this paper, an attempt was made to offer an explanation about what unifies the different syntactic instantiations of the word doch. It was claimed that it expresses a contrastive coherence relation between various discourse entities. Except for its use as a conjunction, where it can express three different kinds of contrast between two propositions (denial of expectation, semantic opposition and concession), doch in its other uses marks only the relation of denied expectation, which holds between a proposition and some implication arising from it20.

18If someone asks a positive yes/no question, he does not know whether the propositional content of the question is true, but from the point of view of the doch-speaker it is true that he wants sugar in his tea (in "Although you do not know that p, q", p is presupposed), and the doch-paraphrase reflects the speaker's point of view. Alternative paraphrase and rule: Although you do not know whether p, it is true; Normally, if you do not know whether something is true, you do not conclude that it is true.

19Both negative and positive questions imply that the speaker knows neither p nor ¬p.

20I would like to thank Ivana Kruijff-Korbayová and four anonymous reviewers for their comments on earlier drafts of this paper.


References

Abraham, W. (1991). Discourse particles in German: How does their illocutive force come about? In W. Abraham (Ed.), Discourse Particles in German, pp. 203–252. Amsterdam/Philadelphia: John Benjamins.

Graefen, G. (2000). Ein Beitrag zur Partikelanalyse - Beispiel: doch. Linguistik online 6 (2).

Grice, H. (1975). Logic and Conversation. In P. Cole and J. L. Morgan (Eds.), Syntax and Semantics, Volume III. Orlando: Academic Press.

Helbig, G. (1988). Lexikon deutscher Partikeln. Leipzig: Verlag Enzyklopädie.

Hentschel, E. (1986). Funktion und Geschichte deutscher Partikeln: ja, doch, halt und eben. Reihe Germanistische Linguistik 63. Tübingen: Niemeyer.

Karagjosova, E. (2000). A unified approach to the meaning and function of modal particles in dialogue. In C. Pilière (Ed.), Proceedings of the ESSLLI 2000 Student Session, University of Birmingham.

König, E., D. Stark, and S. Requardt (1990). Adverbien und Partikeln. Ein deutsch-englisches Wörterbuch. Heidelberg: Julius Groos.

Lagerwerf, L. (1998). Causal Connectives have Presuppositions. Ph.D. thesis, Tilburg.

Sanders, T. J. M., W. Spooren, and L. Noordman (1992). Toward a taxonomy of coherence relations. Discourse Processes 15 (1), 1–35.

Spooren, W. P. M. S. (1989). Some Aspects of the Form and Interpretation of Global Contrastive Coherence Relations. Ph.D. thesis, University of Nijmegen.

Zeevat, H. (1999). Explaining presupposition triggers. Manuscript.


Dialogue Act Modelling Using Bayesian Networks

Simon Keizer

Parlevink Language Engineering Group

University of Twente

The Netherlands

[email protected]

Abstract. A probabilistic approach to interpretation of natural language utterances in terms of dialogue acts is proposed. It is illustrated how, using Bayesian networks, partial information obtained from an NLP component can be combined with knowledge the agent has about the state of the dialogue and about the user, in order to find the most probable dialogue act made.

1 Introduction

In this paper, we are dealing with a conversational agent (the 'server'), which participates in a dialogue with another agent (the 'client'). This conversational agent perceives utterances of the client and tries to react to these utterances in an appropriate way. In Figure 13.1, a possible architecture for the conversational agent is given.

The dialogue manager coordinates the various steps involved in the interpretation of an incoming user utterance and the planning of what action to take next. First of all, an incoming utterance is submitted to the components of speech recognition (in case of spoken dialogue) and syntactic/semantic analysis. Next, the pragmatic aspect of identifying what communicative action was performed by the user in uttering the sentence is dealt with. This part is what we will be concentrating on in this paper. Finally, the resulting interpretation will have to lead to a decision on what actions to take, in particular, what communicative action is to be addressed to the user.

In identifying the communicative action performed in an utterance, the server has to deal with uncertainty. This uncertainty arises because of the incompleteness of the information that is provided by the speech recognition and syntactic/semantic analysis components. In general, such components cannot provide all linguistic information that is contained in an utterance,

Proceedings of the Sixth ESSLLI Student Session, Kristina Striegnitz (editor). Chapter 13, Copyright © 2001, Simon Keizer.


[Figure omitted: block diagram. Client and server exchange utterances and actions; within the server, speech recognition and syntactic/semantic analysis feed linguistic features to dialogue act classification, which passes dialogue act(s) to the dialogue manager and planning/decision, followed by natural language generation and speech synthesis.]

Figure 13.1: An architecture for a conversational agent

especially because we are dealing with human speakers that may produce utterances that are partly unrecognised through bad pronunciation, sentences that are ungrammatical, or that contain unknown words. This is especially the case in mixed-initiative dialogues, where a client may tend to use more complex utterances.

In order to deal with the uncertainty, the server will have to make educated guesses in the interpretation of the client's utterances. Therefore, we will take a probabilistic approach to dialogue modelling, in the form of Bayesian networks.

In Section 2, we will describe our approach to identifying communicative actions in terms of dialogue acts. In Section 3, we will introduce the notion of Bayesian networks and how they can be used in dialogue act classification. In Section 4, we discuss some related work, and finally, in Section 5, some conclusions are drawn and an indication of further research is given.

2 Dialogue Act Modelling

The notion of dialogue acts originated in the work of Austin and Searle (Austin 1962; Searle 1969). They observed that utterances are not merely sentences which can be either true or false, but should be seen as (communicative) actions. Searle introduced the theory of speech acts, which he defined as the whole act of uttering a sentence. He gave a categorisation of types of speech acts, including e.g. request, assertion, and advice.

When we speak of dialogue acts, however, we emphasise the importance of looking beyond the boundaries of an utterance itself when analysing that utterance (see also (Traum 1999)). Besides linguistic information of the utterance in isolation (prosodic features, syntactic/semantic features, surface patterns, keywords, etc.), the meaning of dialogue utterances may also be determined by information concerning two other aspects:

1. the state of the dialogue: e.g. the current topic or the communicative act(s) performed in the previous utterance(s);

2. the state of the hearer's model of the speaker (e.g. the state of the model that our server has of the client, the so-called user model).

Based on these three aspects, a categorisation of different dialogue act types can be made. The dialogue act hierarchy we use is based on an existing dialogue act hierarchy underlying an annotation scheme called DAMSL (Allen and Core 1997). This scheme has been developed as a standard for annotating task-oriented dialogues. We will not go into the details of this hierarchy here, but just mention some of the dialogue act types that are relevant in the remainder of this paper:

- q ref: the speaker requests the hearer for information in the form of references satisfying some specification also given by the speaker (e.g. a list of theatre performances scheduled on a specified date: "what operas are on next week?").

- q if: the speaker asks the hearer if something is the case or not ("do you want to make reservations?").

- req: the speaker requests the hearer for a non-communicative action ("two tickets please").

- neg ans: often in response to a q ref previously performed by the hearer, the speaker indicates that no references satisfy the given specification ("there are no operas scheduled for next week").

Because of the need to be able to reason under uncertainty, as explained in Section 1, we propose the use of probability theory in modelling the various relationships involved in interpreting dialogue utterances. Our model will consist of three interrelated components:

1. the Belief State of the server: this state is determined by beliefs concerning:

(a) the course of the dialogue, and

(b) the beliefs, desires and intentions of the client.

2. the Dialogue Act(s) performed in the client's utterance;


3. the relevant Linguistic Features that the client's utterance may contain.

RV     meaning                                               component
PSN    the previous dialogue act of S was a neg ans.         Belief State
PCQ    the previous dialogue act of C was a q ref.           Belief State
CQ     C performed a q ref in the current utterance.         Dialogue Acts
CR     C performed a req in the current utterance.           Dialogue Acts
CONT   the current utterance shows a continuation pattern,   Linguistic Features
       e.g. in Dutch, if it starts with the word "en".
QM     the current utterance contains a question mark.       Linguistic Features

Table 13.1: The RVs of the example network of Figure 13.2.

Using this model, the server can calculate what most probably must have been the dialogue act performed in an utterance of the client, given new information w.r.t. linguistic features of that utterance, obtained from the speech recognition and syntactic/semantic components. We will now describe how this can be done, using probabilistic inference in a Bayesian network.

3 Bayesian Dialogue Act Classification

The model introduced in Section 2 is described by a set of discrete Random Variables (RVs), i.e. variables that describe events with different possible outcomes. In Table 13.1, some two-valued, Boolean RVs in this set are given with their meaning and which of the components, as indicated in Section 2, they are associated with. It should be noted that it is not a requirement that all RVs are Boolean; we could also have an RV that represents the previous dialogue act of the server S, its values ranging over all possible dialogue act types.

To illustrate the influences that exist between the RVs, consider the following dialogue passage, where we are interested in the dialogue act performed in utterance (3):

(1) C: Wat gebeurt er komend weekend 19 maart in de schouwburg? (What is happening in the theatre next weekend, March 19?)

(2) S: Op deze datum is er geen uitvoering. (On this date no performances have been scheduled.)

(3) C: En op 18 maart? (What about March 18?)


Our judgement that a q ref was performed in (3) may be determined by the observation that C previously performed a q ref in (1), and that S's previous dialogue act in (2) was a neg ans. In this context, C has decided to continue his previous dialogue act, but with a different specification accompanying that q ref. Note that all this cannot be concluded by just analysing the linguistic features of utterance (3) only.

A probabilistic model is completely specified by a joint probability distribution (jpd), which assigns a number between 0 and 1 to every instantiation of the RVs. However, the number of probabilities to be assessed in order to specify the jpd increases exponentially with the number of variables. This problem can be overcome by identifying conditional independencies between RVs, reducing the number of probabilities needed for specifying the jpd. For example, we may indicate that if we know the values of CQ and CR, then learning the value of PCQ gives us no information on QM: PCQ and QM are conditionally independent, given CQ and CR.
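As a concrete illustration of the saving (our sketch, not part of the paper), one can count the free parameters for the six Boolean RVs of Table 13.1, using the parent sets from the factorisation in (13.1): a full jpd needs 2^6 - 1 = 63 numbers, while the factorised network needs only 22.

```python
# Free-parameter count: a full jpd over n Boolean RVs needs 2^n - 1
# numbers; a Bayesian network needs, per node, one number for each
# configuration of its parents (P(X=true | parents); P(X=false | ...)
# then follows by complement). Parent sets as in the factorisation (13.1).
parents = {
    "PSN": [], "PCQ": [],
    "CQ": ["PSN", "PCQ"], "CR": ["PSN", "PCQ"],
    "CONT": ["PSN", "PCQ", "CQ"], "QM": ["CQ", "CR"],
}

full = 2 ** len(parents) - 1                           # 2^6 - 1 = 63
factored = sum(2 ** len(p) for p in parents.values())  # 1+1+4+4+8+4 = 22
print(full, factored)  # 63 22
```

The gap widens quickly: every extra RV doubles the size of the full jpd, while the network grows only with the local parent sets.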

These conditional independencies can be specified by means of a Bayesian network. A Bayesian network (Pearl 1988) is a DAG (Directed Acyclic Graph) in which the nodes represent RVs and the arcs reflect the informational dependencies between these variables. In Figure 13.2, a Bayesian network is depicted containing the RVs given above. It reflects a number of conditional independencies, including the one indicated above.

[Figure omitted: DAG over the nodes PSN, PCQ, CQ, CR, CONT and QM, with arcs corresponding to the factorisation in (13.1).]

Figure 13.2: Simple Bayesian Network for utterance interpretation.

Associated with each RV is a conditional probability distribution (cpd) given its parents in the network. This means that for CQ a cpd is specified, given its parents PSN and PCQ. In the cpd's we should have numbers which reflect the qualitative relationships between the RVs involved. In Table 13.2, some of the chosen distributions are given. From the distribution of CQ, one can see that if PSN=true and PCQ=true, then CQ is more probably true (0.7) than false (0.3). For this example, the numbers have been assessed by using 'expert' knowledge; for more accurate and realistic models, statistical analysis on data using an annotated dialogue corpus is needed for the assessment (see for example (Heckerman 1995)).

P(PSN):  true 0.75, false 0.25
P(PCQ):  true 0.8, false 0.2

P(CQ | PSN, PCQ):
                  PSN = true            PSN = false
                  PCQ=true  PCQ=false   PCQ=true  PCQ=false
    CQ = true       0.7       0.2         0.2       0.5
    CQ = false      0.3       0.8         0.8       0.5

Table 13.2: Some of the probability distributions specifying the Bayesian network of Figure 13.2.

The RVs concerning the Belief State, PSN and PCQ, have no parents, so their cpd's are not conditional, but 'prior' distributions (also given in Table 13.2). Although we have specified all prior distributions in the network as fixed, the distributions of RVs like these actually depend on the Belief State at the time-step in which the previous utterance was processed, and are therefore subject to updating. However, in order to keep our story within limits, we will not go into this dynamic extension of our model, but have based the prior distributions intuitively on the course of the dialogue passage given before. Here, the previous act by S, performed in utterance (2), was probably a neg ans (probability 0.75) and the previous act of C, performed in utterance (1), a q ref (probability 0.8).

This network can now be used for probabilistic inference: we can determine the probability that e.g. a q ref was performed, given the partial information that e.g. C's utterance contained a question mark, as in (3) of the dialogue passage given before. This process of determining the posterior probability distribution is called belief updating. In this process, the formula for the joint probability distribution (jpd) over all RVs in the network plays a central role. By making use of the conditional independencies implicitly given by the network structure, the jpd is given by the product of the specified cpd's:

P(PSN, PCQ, CQ, CR, CONT, QM) =
    P(PSN) · P(PCQ) · P(CQ | PSN, PCQ) · P(CR | PSN, PCQ)
    · P(CONT | PSN, PCQ, CQ) · P(QM | CQ, CR)                    (13.1)

Suppose the server gets to know that there was a question mark in the utterance. He will be interested in the updated probability that the dialogue act performed is a q ref, given this new information. This probability can be calculated as follows:

P(CQ = true | QM = true) = P(CQ = true, QM = true) / P(QM = true)    (13.2)

Both numerator and denominator can be obtained from the jpd (13.1) by summing over all possible configurations of the other RVs in the network. Let S = s and T = t denote instantiations of the RVs in {PSN, PCQ, CR, CONT} and {PSN, PCQ, CR, CONT, CQ} respectively. Then we get our posterior probability from (13.3) and (13.4).

P(CQ = true, QM = true) = Σ_s P(CQ = true, QM = true, S = s)    (13.3)

P(QM = true) = Σ_t P(QM = true, T = t)    (13.4)

We will now show some results of probabilistic inference in the network. For three different cases of particular information (which will be called the 'evidence' E), we have calculated the posterior probability distribution of CQ and CR, given that information. This has been done with two different choices for the prior distributions of PSN and PCQ. From the results in Table 13.3, one can observe that with the original priors (upper half of the table), the probability that q ref was performed changes as the available information varies. Especially the case where S gets to know that the utterance shows a continuation pattern (starting with the word "en") clearly reflects the correctness of classifying the utterance as a q ref (with probability 0.804) and not req (with probability 0.313).

In the case of 'uniform' distributions for the priors (lower half of the table), i.e. the probabilities of both values true and false are 0.5 for both PSN and PCQ, one can observe that there is much more indifference between CQ and CR than before. This illustrates how the role of beliefs concerning the course of the dialogue (as part of the Belief State) in dialogue act classification can be taken into account.

priors                  E                       P(CQ|E)   P(CR|E)
P(PSN) = 0.75 and       (none)                  0.515     0.298
P(PCQ) = 0.8            (QM=true)               0.562     0.380
                        (QM=true, CONT=true)    0.804     0.313
P(PSN) = 0.5 and        (none)                  0.400     0.413
P(PCQ) = 0.5            (QM=true)               0.521     0.537
                        (QM=true, CONT=true)    0.624     0.485

Table 13.3: Results of Probabilistic Inferences.

4 Related Work

Other work on dialogue modelling which is based on the notion of dialogue acts includes e.g. Dynamic Interpretation Theory (DIT) and the Information State model (Poesio and Traum 1998). In DIT (Bunt 1995), dialogue acts are defined as functional units used by the speaker to change the context. They consist of a semantic content and a communicative function, so a dialogue act changes the context in a way that is given by the communicative function, using the semantic content as a parameter. According to the Information State model, both of the dialogue participants keep track of the Conversational Information State (CIS), in which grounded conversational acts are recorded and also ungrounded contributions. A CIS is characterised by a feature structure, containing embedded feature structures for both dialogue participants.

Concerning the classification of communicative actions, various research has been done. In plan-based approaches (Perrault and Allen 1980), the communicative acts (speech acts) are predicted on the basis of recognition of plans that the speaker has. Therefore, the interpretation of utterances is extended from identifying direct speech acts from linguistic features to indirect speech acts, taking into account the course of the dialogue in terms of a speaker's plans. This rule-based approach may lead to difficulties when dealing with uncertainty.

In other approaches, statistical methods are used to model dialogue; see for example (Nagata and Morimoto 1994; Mast et al. 1996; Stolcke et al. 2000). In (Stolcke et al. 2000), a probabilistic model obtained from statistical analyses of a dialogue corpus is presented. Dialogues are modelled in a Hidden Markov Model, with states corresponding to dialogue acts and observations corresponding to utterances (in terms of word sequences, acoustic evidence and prosodic features). The transition probabilities are obtained from n-gram analysis of dialogue acts and the observation probabilities are given by local utterance-based likelihoods.

Also the use of Bayesian networks for interpreting utterances in a dia-logue has been proposed before. In (Pulman 1996), Stephen Pulman pro-poses a framework for classifying communicative actions, very similar to ourapproach, in which a framework of conversational games and moves is used,in stead of dialogue acts. A more interesting di�erence with our approach


Simon Keizer

however, is in the structure of the Bayesian network used. Whereas we have chosen arcs from nodes representing dialogue acts to nodes representing linguistic features (like the arc from CQ to QM), Pulman has chosen arcs in the opposite direction. One could say that in our approach a model of the speaker's behaviour is given, which is used to derive the most probable dialogue act he performed. Pulman's network, however, models the hearer, using various sources of information as 'inputs' to derive the most probable conversational move made by the speaker. This difference is an interesting topic for further research.

Other work on the use of Bayesian networks in dialogue systems includes research in which the emphasis is on user modelling within a specific task domain (Akiba and Tanaka 1994), rather than on dealing with partial linguistic information in understanding communicative behaviour.

5 Conclusion

In this paper we have presented a probabilistic approach to interpreting natural language utterances in a dialogue. We have described how Bayesian networks can be used to interpret partial information from natural language analysis in terms of dialogue acts. Not only is the linguistic information from utterances taken into account, but also knowledge about the course of the dialogue and about the mental state of the speaker, the dialogue participant that performed the utterance. Using Bayesian networks, these various sources of information can be integrated into one probabilistic model.
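The integration described here can be sketched as follows (a toy two-act network with invented probabilities; the paper's actual nodes and tables are not reproduced). Because the arcs run from the act to the features, classification is Bayesian inversion over the observed evidence:

```python
# Hypothetical numbers: a context-dependent prior over acts plus two
# feature likelihoods conditioned on the act (arcs act -> feature).
PRIOR = {"check_question": 0.3, "inform": 0.7}    # from dialogue context
P_QMARK = {"check_question": 0.8, "inform": 0.1}  # P(question mark | act)
P_DECL = {"check_question": 0.7, "inform": 0.9}   # P(declarative form | act)

def posterior(qmark, declarative):
    """P(act | observed features), by enumeration over the two-act toy net."""
    scores = {}
    for act, prior in PRIOR.items():
        like_q = P_QMARK[act] if qmark else 1 - P_QMARK[act]
        like_d = P_DECL[act] if declarative else 1 - P_DECL[act]
        scores[act] = prior * like_q * like_d
    z = sum(scores.values())
    return {act: s / z for act, s in scores.items()}

post = posterior(qmark=True, declarative=True)
print(max(post, key=post.get))  # check_question
```

A declarative utterance ending in a question mark comes out as a check question under these numbers, even though the prior favours an inform act.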

Current research questions are related to the construction of a Bayesian network for dialogue act classification. This means that we have to find a set of random variables representing all relevant aspects of identifying a dialogue act type, identify a sufficient number of conditional independencies among these variables (i.e. find the network structure), and assess the conditional probability distributions associated with the network. One way of doing this is to use an annotated dialogue corpus to train a Bayesian network from data. Ongoing work includes the annotation of a corpus of dialogues, which was obtained from Wizard of Oz experiments. The corpus contains mixed-initiative dialogues between a human user typing utterances and a system that also produces textual utterances. The dialogues concern the theatre domain, in which the user can get information about performances and, if required, make ticket reservations. The annotation currently covers dialogue acts (from a dialogue act hierarchy based on the DAMSL scheme, as mentioned in Section 2) and superficial linguistic features like sentence type and punctuation.
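One common way to assess conditional probability tables from such an annotated corpus is relative-frequency counting with smoothing. The sketch below assumes a hypothetical annotation format of (act, sentence type) pairs and uses add-one smoothing, which is only one of several possible choices:

```python
from collections import Counter

# Stand-in annotations: (dialogue_act, sentence_type) pairs.
corpus = [
    ("check_question", "interrogative"),
    ("check_question", "declarative"),
    ("inform", "declarative"),
    ("inform", "declarative"),
]

FEATURE_VALUES = ["interrogative", "declarative"]

def estimate_cpt(pairs):
    """P(sentence type | act) by relative frequency with add-one smoothing."""
    joint = Counter(pairs)
    acts = {act for act, _ in pairs}
    cpt = {}
    for act in acts:
        total = sum(joint[(act, f)] for f in FEATURE_VALUES)
        cpt[act] = {f: (joint[(act, f)] + 1) / (total + len(FEATURE_VALUES))
                    for f in FEATURE_VALUES}
    return cpt

cpt = estimate_cpt(corpus)
print(cpt["inform"]["declarative"])  # 0.75
```

Smoothing keeps unseen (act, feature) combinations from receiving zero probability, which matters when the annotated corpus is small.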


Dialogue Act Modelling Using Bayesian Networks

References

Akiba, T. and H. Tanaka (1994). A Bayesian approach for user modeling in dialogue systems. Technical Report TR94-0018, Tokyo Institute of Technology.

Allen, J. and M. Core (1997, October). Draft of DAMSL: Dialog Act Markup in Several Layers. Dagstuhl Workshop.

Austin, J. L. (1962). How to Do Things with Words. Harvard University Press.

Bunt, H. C. (1995). Dynamic Interpretation and Dialogue Theory. In M. Taylor, D. G. Bouwhuis, and F. Neel (Eds.), The Structure of Multimodal Dialogue, Volume 2. Amsterdam: John Benjamins.

Heckerman, D. (1995). A Tutorial on Learning With Bayesian Networks. Technical Report MSR-TR-95-06, Microsoft Research.

Mast, M., H. Niemann, E. Nöth, E. Guenter, and E. Schukat-Talamazzini (1996). Automatic Classification of Dialog Acts with Semantic Classification Trees and Polygrams. In S. Wermter, E. Riloff, and G. Scheler (Eds.), Connectionist, Statistical, and Symbolic Approaches to Learning for Natural Language Processing, Volume 1040 of Lecture Notes in Artificial Intelligence, pp. 217-229. Springer-Verlag.

Nagata, M. and T. Morimoto (1994). First steps towards statistical modeling of dialogue to predict the speech act type of the next utterance. Speech Communication 15, 193-203.

Pearl, J. (1988). Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann.

Perrault, C. R. and J. Allen (1980). A Plan-based Analysis of Indirect Speech Acts. American Journal of Computational Linguistics 6(3-4), 167-182.

Poesio, M. and D. Traum (1998). Towards an Axiomatization of Dialogue Acts. In J. Hulstijn and A. Nijholt (Eds.), TwenDial'98: Formal Semantics and Pragmatics of Dialogue, Number 13 in TWLT.

Pulman, S. G. (1996). Conversational Games, Belief Revision and Bayesian Networks. In J. Landsbergen, J. Odijk, K. van Deemter, and G. Veldhuijzen van Zanten (Eds.), Computational Linguistics in the Netherlands. SRI Technical Report CRC-071.

Searle, J. R. (1969). Speech Acts: An Essay in the Philosophy of Language. Cambridge University Press.

Stolcke, A. et al. (2000). Dialogue Act Modelling for Automatic Tagging and Recognition of Conversational Speech. Computational Linguistics 26(3), 339-374.


Traum, D. R. (1999). Speech Acts for Dialogue Agents. In M. Wooldridge and A. Rao (Eds.), Foundations of Rational Agency, pp. 169-201. Kluwer.


On the Semantics of Concurrent Programming Languages: An Automata-Theoretic Approach

Roman V. Konchakov

Faculty of Computational Mathematics and Cybernetics, Moscow State University

[email protected]

Abstract. In this paper we present an approach to developing formal semantics of programming languages with concurrency and real time. The main idea is to combine Denotational Semantics and the concept of Timed Automata. We adopt general timed automata and define some operations on them. These operations make it possible to define a compositional semantics of the MM language. The approach applies to programming languages based on message passing (like MM) as well as to those based on shared variables. It also provides promising opportunities for applying well-known verification techniques.

Introduction

The problem of developing reliable software becomes ever more pressing as software systems capture wider application areas while the systems themselves grow more complicated. In particular, the problem is very important for distributed and embedded systems that control traffic, communications, industrial processes, etc. Distributed systems are systems comprised of several communicating processes executed in parallel. Due to concurrency, these systems are as a rule highly nondeterministic. Modeling, testing and other similar techniques for checking program correctness are not well suited for distributed systems, since these techniques can handle only a small part of all possible executions.

Formal methods provide an alternative approach to the problem. The key idea of formal methods is as follows. One builds a formal model of the program under consideration and then proves that the model meets the requirements (the program specification). The benefits of formal methods are well known: the checking procedure can be done automatically, and it always gives a mathematically correct answer to the question. Usually formal methods are applied to models of computation that are abstractions of real

Proceedings of the Sixth ESSLLI Student Session, Kristina Striegnitz (editor). Chapter 14, Copyright © 2001, Roman V. Konchakov


computer programs. Therefore one has to translate an analyzed program into a formal model in order to benefit from formal verification techniques. When turning from a real computer program to its abstract formal model, one should also guarantee the correctness of the translation. This is possible only when the programming language is provided with a formal semantics that is suitable for the application of formal methods.

In this paper we present an approach to developing formal semantics of programming languages with real time and concurrency. The key idea of the approach is to handle concurrency by means of timed automata. As an example of a programming language we choose the MM (Model with Messages) language of the DYANA environment (Bakhmurov et al. 1999).

The MM language is intended for the design and development of real-time distributed systems. We will show that both key features of MM programs, real time and true concurrency, are captured by timed automata. A run of an MM program operating in real time may be thought of as a sequence of observable events marked with timestamps. We demonstrate below that timed observation sequences of this kind are generated by timed automata. Thus the real-time behaviour of an MM program can be described by a timed automaton.

On the other hand, the concept of timed automata admits the introduction of some operations (parallel composition, sequential composition, choice and iteration) in a natural manner. These operations make it possible to present a sequential process in terms of the timed automata associated with the individual statements the process is composed of. Furthermore, the truly concurrent behaviour of several communicating processes can be presented as a parallel composition of the timed automata corresponding to the processes and communication channels.

Thus the notion of timed automata is suitable for defining a compositional semantics of programming languages with real time and concurrency. In Section 1 we introduce the concept of timed automata. The timed automata we use are slightly different from those presented in (Alur and Henzinger 1991; Alur and Dill 1994): in our automata time progresses in a non-strictly monotone sense, i.e. the automata admit several subsequent state transitions within the same time instant. In Section 2 we adapt the general concept of timed automata to the purpose of designing formal specifications of programming language semantics, and define some useful operations on timed automata. The syntax and an informal description of the MM language are briefly presented in Section 3. In the last section we demonstrate how the introduced timed automata can be applied to developing the formal semantics of the MM language.


1 Timed automata

Intuitively, a timed automaton operates in real time by taking transitions from one location to another. Executing a transition takes no time. If no transitions are taken, time progresses by an arbitrary real number. An automaton can stay in a location as long as its timing constraints are satisfied. The observable behaviour of a timed automaton is indicated by its propositional variables. Formally, a timed automaton A is a tuple

(P, Q, Q0, C, π, μ, E)

such that P is a set of basic propositions, Q is a finite set of locations, Q0 ⊆ Q is a subset of initial locations, C is a finite set of clocks, π and μ are labeling functions, and E ⊆ Q × Q × 2^C is a set of transitions. Clocks are real-valued variables whose values either ascend uniformly as time elapses or are reset to 0 on some transitions. The labeling function π assigns to each location in Q a timing constraint over the clocks. Each timing constraint is a boolean combination of atomic timing constraints of the form x ◦ c, where c ∈ R+, x is a clock, and ◦ is a binary relation, e.g. <, = or ≤. The labeling function μ assigns to each location q in Q a boolean formula over the set P of propositions. The formula μ(q) is called a propositional constraint. Each triple (q, q', λ) ∈ E identifies a transition from q to q' with a subset λ ⊆ C of clocks to be reset on the transition.

At any time instant during a run, the state of the timed automaton A is completely characterized by a triple (q, ν, γ), where q ∈ Q is the location in which the control resides, ν ⊆ P is an observation, and γ : C → R+ is a clock interpretation. By an observation we mean the subset of propositions that are evaluated as true in a location. A clock interpretation determines the values of the clocks. At any state (q, ν, γ), the observation ν and the clock interpretation γ should satisfy the propositional constraint μ(q) and the timing constraint π(q). This is denoted by ν ⊨ μ(q) and γ ⊨ π(q), respectively.
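These definitions can be made concrete in a small sketch (our own encoding, in which the constraint labelings are arbitrary predicates rather than the restricted constraint language above):

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet

@dataclass(frozen=True)
class Location:
    """A location with its propositional and timing constraint labelings."""
    name: str
    mu: Callable[[FrozenSet[str]], bool]    # propositional constraint over observations
    pi: Callable[[Dict[str, float]], bool]  # timing constraint over clock values

def valid_state(q: Location, nu: FrozenSet[str], gamma: Dict[str, float]) -> bool:
    """A state (q, nu, gamma) is admissible iff nu satisfies mu(q) and gamma satisfies pi(q)."""
    return q.mu(nu) and q.pi(gamma)

# A location requiring the proposition 'busy' and the clock constraint x <= 5.
loc = Location("q0",
               mu=lambda nu: "busy" in nu,
               pi=lambda gamma: gamma["x"] <= 5.0)

print(valid_state(loc, frozenset({"busy"}), {"x": 3.0}))  # True
print(valid_state(loc, frozenset(), {"x": 3.0}))          # False
```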

As time goes on, locations and observations may remain unchanged within some intervals. An interval is a convex subset of the non-negative real numbers R+. The left end-point of an interval I is denoted by l(I), and the right end-point by r(I). An interval sequence is a finite or infinite sequence of intervals I0, I1, ... such that for every t ∈ R+ there is an interval Ik with t ∈ Ik, and l(Ik+1) = r(Ik) for all consecutive intervals Ik and Ik+1.

A run of the timed automaton A is a finite or infinite sequence

    r :  →γ0 (q0, ν0, I0)  →γ1,λ1 (q1, ν1, I1)  →γ2,λ2 (q2, ν2, I2)  →γ3,λ3  ...

where qk ∈ Q are locations, νk ⊆ P are observations, Ik are intervals, λk ⊆ C are clock sets, and γk : C → R+ are clock interpretations, if the following conditions hold for all k ≥ 0 and x ∈ C:

• initialization: q0 ∈ Q0 and γ0(x) = 0;


• consecution: I0, I1, I2, ... is an interval sequence, (qk, qk+1, λk+1) ∈ E, and

    γk+1(x) = 0                            if x ∈ λk+1,
    γk+1(x) = γk(x) + r(Ik) − l(Ik)        otherwise;

• invariance: νk ⊨ μ(qk) and γk + t − l(Ik) ⊨ π(qk) hold for all t ∈ Ik.

Each run r of the timed automaton A produces a timed observation sequence

    ρ(r) : (ν0, I0) → (ν1, I1) → (ν2, I2) → ...

The set of timed observation sequences generated by a timed automaton is called the language of the automaton, or its timed language. There is an obvious definition of a parallel composition of timed languages (see (Alur and Dill 1994)). A parallel composition of timed automata can also be introduced, such that the language of the parallel composition of automata is, in fact, the parallel composition of their languages.

2 Timed automata of fixed signature

The general concept of timed automata has to be adapted to suit our purpose of defining the formal semantics of sequential processes. Namely, we associate a timed automaton with each statement of a process and introduce some operations on timed automata to capture the composition of process statements.

The timed automata (P, Qi, Q0i, C, πi, μi, Ei) corresponding to the statements of the same process have many common components. They share the set of clocks C and the set of propositions P. Their sets of locations Qi also have a similar structure Qi = D × Li, where D is the set of process data states and Li is the set of control flow positions (labels) occurring in the i-th statement of the process.

These common components form a signature Σ, which is a triple (P, D, C), where P is a set of propositions, D is a set of data states, and C is a set of clocks. A timed automaton AΣ of the signature Σ is a tuple

    (V, L0, L1, E),

where V = {(ℓ, μ, π) | ℓ ∈ L} is a set of location classes, L0 ⊆ L is a set of initial labels, L1 ⊆ L is a set of final labels, and E ⊆ D × L × D × L × 2^C is a set of transitions. A location class (ℓ, μ, π) consists of the label ℓ ∈ L and functions μ(d) and π(d) that specify the propositional and timing constraints of the locations of the form (d, ℓ) ∈ D × L for every data state d ∈ D. The pair (L0, L1) of sets of initial and final labels is called the context of the timed automaton AΣ.

It is obvious that any timed automaton AΣ gives rise to a general timed automaton A in a natural way.


Basic operations. Now we are at the point of introducing some basic operations on timed automata of fixed signature. These operations have the usual automata-theoretic reading.

Union. Consider timed automata A'Σ and A''Σ with disjoint sets of labels L' and L''. Then the automaton AΣ = (V, L0, L1, E) such that V = V' ∪ V'', L0 = L'0 ∪ L''0, L1 = L'1 ∪ L''1, and E = E' ∪ E'' is called the union of the automata and is denoted by A'Σ ∪ A''Σ.

Factorization. Let AΣ be a timed automaton and R ⊆ L × L an equivalence relation on its set of labels L. Denote by [ℓ]R the equivalence class of a label ℓ w.r.t. R, and by L/R the set of equivalence classes of L induced by R. Then the automaton A'Σ = (V', L'0, L'1, E') such that

    V' = { (ℓ', ∧{μℓ | ℓ ∈ ℓ'}, ∧{πℓ | ℓ ∈ ℓ'}) | ℓ' ∈ L/R },
    L'0 = L0/R,  L'1 = L1/R,
    E' = { (d1, [ℓ1]R, d2, [ℓ2]R, λ) | (d1, ℓ1, d2, ℓ2, λ) ∈ E }

is called the factorization of AΣ induced by R and is denoted by AΣ/R.

Replacing a context. Let AΣ = (V, L0, L1, E) be a timed automaton and (L'0, L'1) a context such that L'0, L'1 ⊆ L. Then denote by [AΣ]^(L'0, L'1) the timed automaton (V, L'0, L'1, E).

Compositional operations. Consider timed automata A'Σ and A''Σ. For any pair of label sets L' and L'', denote by R(L', L'') the total equivalence on L' ∪ L''. Then sequential composition (·), nondeterministic choice (+), and infinite iteration (∞) are defined by the following equations:

    A'Σ · A''Σ   =  [ A'Σ ∪ A''Σ ]^(L'0, L''1)  /  R(L'1, L''0)

    A'Σ + A''Σ  =  ( A'Σ ∪ A''Σ )  /  ( R(L'0, L''0) ∪ R(L'1, L''1) )

    (A'Σ)^∞     =  [ A'Σ ]^(L'0 ∪ L'1, ∅)  /  R(L'0, L'1)

These operations give us the possibility to define a compositional semantics of MM statements, so that composing statements corresponds to composing operations on timed automata.
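The way composition falls out of union, context replacement and factorization can be sketched as follows (a drastic simplification that keeps only labels and edges, drops data states, clocks and constraints, and lets `factor` collapse a single equivalence class):

```python
def union(a, b):
    """Component-wise union of two automata (labels, initial, final, edges)."""
    return tuple(x | y for x, y in zip(a, b))

def factor(a, merged):
    """Quotient by the equivalence collapsing the set `merged` to one label."""
    labels, init, fin, edges = a
    rep = min(merged)  # canonical representative of the merged class

    def cls(l):
        return rep if l in merged else l

    return ({cls(l) for l in labels}, {cls(l) for l in init},
            {cls(l) for l in fin}, {(cls(u), cls(v)) for u, v in edges})

def seq(a, b):
    """a . b: keep a's initial and b's final labels, glue a's finals to b's initials."""
    _, init_a, fin_a, _ = a
    _, init_b, fin_b, _ = b
    labels, _, _, edges = union(a, b)
    return factor((labels, init_a, fin_b, edges), fin_a | init_b)

a1 = ({"s", "t"}, {"s"}, {"t"}, {("s", "t")})
a2 = ({"u", "v"}, {"u"}, {"v"}, {("u", "v")})
labels, init, fin, edges = seq(a1, a2)
print(init, fin)  # {'s'} {'v'}
```

Gluing a's final labels to b's initial labels while keeping a's initials and b's finals is exactly the shape of the sequential-composition equation above.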

3 Overview of DYANA and the MM language

DYANA (DYnamic ANAlyser) is an integrated software environment for the design and analysis of distributed system behaviour without hardware prototyping. DYANA deals with models of distributed systems that consist of three main parts: hardware components, software components and communications.


Processors, timers, memory units and other devices are considered hardware components, or executors. Hardware components are specified by their architecture and clock speed. The architecture of an executor provides the time estimation subsystem of DYANA with the necessary information on the duration of the executor's basic operations. This enables DYANA to estimate the execution time of entire computation blocks of processes.

Software components, called processes, run on executors. A process may perform some computation, suspend its own execution or interact with other processes. The MM language provides message passing for interprocess communication. A process sends messages into output ports and receives messages from input ports. There are also intermediaries between output and input ports, called channels, which are responsible for storing and transmitting messages. Each channel connects several output ports with one input port, so a process can receive messages coming from several processes through the same input port.

The grammar below describes the syntax of MM processes and statements. Square brackets [ ] enclose optional elements, and [ ]* denotes repeated elements.

P ::= process name() <input in0 [, inj]*; output out0 [, outi]*;> S

S ::= T v; | v = E; | { S* } | complex S | if (E) S1 [else S2] |
      while (E) S1 | stop; | delay(E); | send(v, out); | receive(v, in); |
      select { [case inj: Sj]* [timeout E: S0] }

Table 1. Syntax of MM process declaration.

In Table 1 and hereafter, S is a statement, in is an input port name, out is an output port name, T is a type (int, bool, msg, etc.), v is the name of a variable and E is an expression. The informal meaning of the MM statements is as follows:

• T v; declares a new variable of name v and type T;

• v = E; evaluates the expression E and assigns the value to v;

• { S* } simply groups statements and presents the statement sequence S* as a single statement;


• complex S indicates a computation block that is processed by the time estimation subsystem;

• if (E) S1 [else S2] and while (E) S1 have the conventional meaning: they are the conditional and loop statements, respectively;

• stop; immediately terminates the execution of the process;

• delay(E); suspends the execution of the process for the specified amount of time; time is measured in processor ticks;

• send(v, out); sends the message contained in the variable v to the output port out;

• receive(v, in); receives a message from the input port in and assigns the message to the variable v; if the channel contains no messages, the process is suspended until a message arrives;

• the select { ... } statement makes it possible to receive a message from one of the specified input ports inj; an optional section timeout E: S0 sets an upper bound on the time that the process waits while all the channels are empty.
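By way of illustration, here is a small MM process in this syntax (our own example, not taken from the DYANA distribution; the port names and the boolean literal true are assumptions) that echoes every message received on its input port to its output port:

```
process echo() <input req; output ack;>
{
    msg m;
    while (true) {
        receive(m, req);
        send(m, ack);
    }
}
```

The receive statement blocks until a message arrives on req, so the process spends its time suspended rather than busy-waiting.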

4 Semantics of the MM language

In order to describe the semantics of the MM language, we define a mapping of MM models to timed automata in such a way that the timed automaton of a distributed system model produces the timed observation sequences of all possible executions of the model. These timed automata are constructed as parallel compositions of the timed automata of channels and the timed automata of processes, i.e.

    ( ∥p P(p, B(p)) )  ∥  ( ∥p,in C(p, in, K(p, in)) ),

where p is a process name, B(p) is the executor of the process p, P(p, B(p)) is the timed automaton of the process p running on the executor B(p), in is an input port name, K(p, in) is the set of names of the output ports that are connected to the input port in of the process p, and C(p, in, K(p, in)) is the timed automaton of the channel. All the functions mentioned above are constructed from the distributed system model. For the sake of brevity, many details of secondary importance are omitted. We focus mostly on the principles of deriving timed automata from statements and processes.


Observable variables. In order to describe the observable behaviour of processes and communication channels, we introduce the following propositional variables:

• send(p:out, m) and full(p:out) for each output port out of a process p. send(p:out, m) holds whenever the process p is in the state of sending a message m into the output port out. full(p:out) holds whenever the channel is capable of taking a message from the output port out.

• receive(p:in, m) and empty(p:in) for each input port in of a process p. receive(p:in, m) holds whenever the process p is in the state of receiving a message m from the input port in. empty(p:in) holds whenever the channel of the input port in contains no messages.

We use predicate-like notation for the names of propositional variables. For example, send(p:out, m) denotes an individual propositional variable for a message m ∈ M, process p and output port out.

Within the timed automata framework, the interaction of automata is modeled by means of sharing observable propositional variables. Namely, consider a parallel composition of two timed automata that share some observable variables, i.e. there are variables of the same name that belong to both automata. Then, along each run of the parallel composition, any shared variable has the same value in both automata.

Each propositional variable associated with an input port in or an output port out is shared by the timed automaton of the process p and the timed automaton of the channel connected to the port.

Semantics of process declarations. Consider a process declaration. As mentioned above, all the timed automata associated with the statements of the process have the same signature. This signature Σ = (P, D, {τ}) is constructed according to the header of the process declaration. The set P of propositional variables consists of all observable variables describing the behaviour of the input and output ports. The data space D is the set of all possible valuations of the variables declared in the process. The set of clocks consists of a single clock τ, which is used to measure the duration of the actions performed by the process.

Semantics of statements. Table 2 defines the semantics of some compound MM statements.

For each statement S, the semantic function S[[S]](σ, T) maps a signature σ ∈ Σ and a time unit T ∈ T to a timed automaton of the signature σ that behaves like the statement S. We see that the semantic definitions can be derived quite straightforwardly using the introduced operations on timed automata.


S : Statement → Σ × T → TA

S[[S1 S2]] = λσ ∈ Σ. λT ∈ T. ( S[[S1]](σ, T) · S[[S2]](σ, T) )

S[[if(E) S1 else S2]] = λσ ∈ Σ. λT ∈ T. ( C[[E]](σ, true) · S[[S1]](σ, T) ) + ( C[[E]](σ, false) · S[[S2]](σ, T) )

S[[while(E) S1]] = λσ ∈ Σ. λT ∈ T. ( C[[E]](σ, true) · S[[S1]](σ, T) )^∞ + C[[E]](σ, false)

Table 2. Semantics of compound MM statements.

The semantic function C[[E]](σ, v) gives a timed automaton that evaluates the expression E at the initial location ℓ and then immediately passes to the final location ℓc if the value is equal to the parameter v. In Table 3 and hereafter we assume that ℓ, ℓc, etc. are newly introduced labels. The propositional constraint N of the location ℓ or ℓc specifies that the process does not send or receive any message at the location.

The semantic function E[[E]] specifies the denotation of the expression E. It maps a valuation of variables to an expression value (domain EV). The definition of E[[E]] is quite standard and is omitted here for brevity.

E : Expression → D → EV
C : Expression → Σ × EV → TA

C[[E]] = λσ ∈ Σ. λv ∈ EV.
    ( { (ℓ, N, τ = 0), (ℓc, N, τ = 0) }, {ℓ}, {ℓc},
      { (d, ℓ, d, ℓc, ∅) | d ∈ D : E[[E]](d) = v } )

Table 3. Semantics of expressions.

Finally, Table 4 demonstrates how the formal semantics of some typical MM statements is defined.

Table 4 uses the following abbreviations. The propositional constraint E(in) specifies that the input port in is empty, and the propositional constraint R(m, in) specifies that the process is receiving the message m from the input port in. By d[v ↦ e] we denote the mapping that coincides with the mapping d everywhere except at v, where it gives e.

As an example, consider the semantic equation of the receive statement. A timed automaton realizing the behaviour of the receive statement acts as follows. It starts at the initial location ℓ and passes to the location ℓw if the channel of the input port in is empty. Otherwise it passes to the location ℓa, where


the process receives an incoming message. At the location ℓw the automaton can stay arbitrarily long (the timing constraint is true), and it leaves the location only when a message arrives in the channel. From the location ℓa the automaton immediately gets to the final location ℓ̂.

The semantics of the other MM statements is defined in a similar way.

S[[v=E;]] = λσ ∈ Σ. λT ∈ T.
    ( { (ℓ, N, τ = 0), (ℓ̂, N, τ = 0) }, {ℓ}, {ℓ̂},
      { (d, ℓ, d[v ↦ E[[E]](d)], ℓ̂, ∅) | d ∈ D } )

S[[delay(E);]] = λσ ∈ Σ. λT ∈ T.
    ( { (ℓ, N, τ = 0), (ℓw, N, τ ≤ T · E[[E]]), (ℓa, N, τ = T · E[[E]]), (ℓ̂, N, τ = 0) },
      {ℓ}, {ℓ̂},
      { (d, ℓ, d, ℓw, {τ}), (d, ℓw, d, ℓa, ∅), (d, ℓa, d, ℓ̂, ∅) | d ∈ D } )

S[[receive(v, in);]] = λσ ∈ Σ. λT ∈ T.
    ( { (ℓ, N, τ = 0), (ℓw, N ∧ E(in), true), (ℓa, R(E[[v]], in), τ = 0), (ℓ̂, N, τ = 0) },
      {ℓ}, {ℓ̂},
      { (d, ℓ, d, ℓw, ∅), (d, ℓw, d, ℓ, ∅), (d, ℓ, d[v ↦ m], ℓa, ∅), (d, ℓa, d, ℓ̂, ∅) | m ∈ M, d ∈ D } )

Table 4. Semantics of MM statements.

Conclusions

In this paper we have shown how to formally define the semantics of a programming language with real time and concurrency by combining Denotational Semantics (Mosses 1990) and the concept of Timed Automata (Alur and Dill 1994; Alur and Henzinger 1991). As an example we considered the MM language, which is based on message passing, but the approach can also be adapted to programming languages that provide shared variables for synchronization. Our technique captures the main characteristic features of the real-time parallel execution of communicating processes and provides promising opportunities for applying well-known formal verification techniques (Alur et al. 1993; Henzinger et al. 1994).

References

Alur, R., C. Courcoubetis, and D. L. Dill (1993, May). Model-Checking in Dense Real-time. Information and Computation 104(1), 2-34.


Alur, R. and D. L. Dill (1994, April). A theory of timed automata. Theoretical Computer Science 126(2), 183-235.

Alur, R. and T. A. Henzinger (1991). Logics and Models of Real Time: A Survey. Lecture Notes in Computer Science 600, 74-106.

Bakhmurov, A. G., A. P. Kapitonova, and R. L. Smeliansky (1999). DYANA: an Environment for Embedded System Design and Analysis. In W. R. Cleaveland (Ed.), Proceedings of TACAS '99, Volume 1579 of LNCS, pp. 390-404. Springer.

Henzinger, T. A., X. Nicollin, J. Sifakis, and S. Yovine (1994, June). Symbolic Model Checking for Real-Time Systems. Information and Computation 111(2), 193-244.

Mosses, P. D. (1990). Denotational Semantics. In J. van Leeuwen (Ed.), Handbook of Theoretical Computer Science, Volume B: Formal Models and Semantics, Chapter 11, pp. 575-631. Amsterdam: North-Holland.


Learning Phonotactics Using ILP

Stasinos Konstantopoulos

Alfa-Informatica, Rijksuniversiteit Groningen

[email protected]

Abstract. This paper describes experiments on learning Dutch phonotactic rules using Inductive Logic Programming, a machine learning approach based on the notion of inverting resolution. Different ways of approaching the problem are experimented with and compared against each other, as well as against related work on the task. Further research is outlined.

1 Introduction

The phonotactics of a given language is the set of rules that identifies what sequences of phonemes constitute a possible word in that language. The problem can be broken down into the syllable structure (i.e. what sequences of phonemes constitute a possible syllable) and the processes that take place at the syllable boundaries (e.g. assimilation).

Previous work on the syllable structure of Dutch includes hand-crafted models, like the ones described in (van der Hulst 1984) and (Booij 1995), but also machine-learning approaches: abduction in (Tjong Kim Sang and Nerbonne 2000) and neural networks in (Stoianov and Nerbonne 1999) and (Stoianov 2001, ch. 4).

This paper describes experiments on the task of constructing from examples a model of Dutch monosyllabic words. The reason for restricting the domain is to avoid the added complexity of handling syllable-boundary phonological processes. Furthermore, by not using polysyllables, no prior commitment is made to any one particular syllabification (and thus syllable structure) theory.

2 Inductive Logic Programming and Aleph

Inductive Logic Programming (ILP) is a machine learning discipline. It is logic programming in the sense that the target concept to be learned is a logic program, i.e. a set of Horn clauses. It is inductive because the core operator used is that of induction.

Proceedings of the Sixth ESSLLI Student Session, Kristina Striegnitz (editor). Chapter 15, Copyright © 2001, Stasinos Konstantopoulos


Induction can be seen as the inverse of deduction. For example, from the clauses 'All humans die' and 'Socrates is human' the fact that 'Socrates will die' can be deduced. Inversely, induction uses background knowledge (e.g. 'Socrates is human') and a set of observations (training data, e.g. 'Socrates died') to search for a hypothesis that, in conjunction with the background knowledge, can deduce the data. In more formal terms, given a logic program B modelling the background knowledge and a set of ground terms D representing the training data, ILP constructs a logic program H such that B ∧ H ⊨ D.
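Purely as an illustration of the B ∧ H ⊨ D setting (and not of the paper's actual implementation), the Socrates example can be checked with one step of forward chaining standing in for resolution; the tuple representation and the `forward_chain` helper are hypothetical:

```python
# Hedged sketch: background B and data D are sets of ground unary facts
# (predicate, argument); the hypothesis H is a single Horn clause
# head(X) :- body(X), represented as a (head_pred, body_pred) pair.

def forward_chain(background, rule):
    """Apply one Horn clause to the background facts and return all facts."""
    head_pred, body_pred = rule
    derived = set(background)
    for pred, arg in background:
        if pred == body_pred:
            derived.add((head_pred, arg))
    return derived

# B: 'Socrates is human';  H: mortal(X) :- human(X);  D: 'Socrates died'.
B = {("human", "socrates")}
H = ("mortal", "human")
D = {("mortal", "socrates")}

entails = D <= forward_chain(B, H)   # does B together with H deduce the data?
print(entails)  # True
```

The inductive direction searched by an ILP system is the reverse: given B and D, find an H that makes this check succeed.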

If the deductive operator used is resolution (as defined by Robinson 1965), then the inductive operator necessary to solve the equation above is the inverse resolution operator, as defined by Muggleton and De Raedt (1994).

Aleph is an ILP system implementing the Progol algorithm (Muggleton 1995). This algorithm allows for single-predicate learning only, without background theory revision or predicate invention. It incrementally constructs the clauses of a single-predicate hypothesis that describes the data, by iterating through the following basic algorithm:

• Saturation: pick a positive example from the training data and construct the most specific, non-ground clause that entails it. This is done by repeated application of the inverse resolution operator on the example, until all its ground terms have been replaced by variables which are appropriately bound by the body of the clause, so that the original ground positive, and only that, is covered. This minimally generalised clause is called the bottom clause.

• Reduction: search between the maximally general, empty-bodied clause and the maximally specific bottom clause for a 'good' clause. The space between the empty clause and the bottom clause is partially ordered by θ-subsumption, and the search proceeds along the lattice defined by this partial ordering. The 'goodness' of each clause encountered along the search path is evaluated by an evaluation function.

• Cover removal: add the new clause to the theory and remove all examples covered by it from the dataset.

• Repeat until all positive examples are covered.
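The covering loop above can be sketched as follows; `build_clause` is a hypothetical stand-in for the saturation and reduction steps, which Aleph of course implements very differently:

```python
# Minimal sketch of the Progol-style covering loop (not Aleph's actual code):
# a "clause" is modelled as a predicate over examples.

def induce(positives, build_clause):
    theory = []
    remaining = list(positives)
    while remaining:
        seed = remaining[0]
        clause = build_clause(seed, remaining)   # saturation + reduction
        theory.append(clause)
        # cover removal: drop every example the new clause covers
        remaining = [e for e in remaining if not clause(e)]
    return theory

# Toy run: each learned "clause" covers all examples with the seed's parity.
examples = [1, 2, 3, 4]
theory = induce(examples, lambda seed, _: (lambda e, p=seed % 2: e % 2 == p))
print(len(theory))  # 2
```

The loop terminates as soon as every positive example is covered by some clause of the accumulated theory.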

The evaluation function quantifies the usefulness of each clause constructed during the search and should be chosen so that it balances between overfitting (i.e. covering the data too tightly and making no generalisations that will yield coverage over unseen data) and overgeneralising (i.e. covering the data too loosely and accepting too many negatives). It can be simple coverage (the number of positive minus the number of negative examples covered by the clause), or the Bayes probability that the clause is correct given the data (for learning from positive data only), or any function of the numbers of examples covered and the length of the clause (for implementing bias towards shorter clauses).

Syntactic bias is applied during reduction either to prune paths that are already known not to yield a winning clause, or to enforce restrictions on the constructed theory, for example conformance to a theoretical framework.

3 Setting up the Experiments

As a starting point, a rough template matching all syllables is assumed. This template is C3VC5, where Cn represents any consonant cluster of length up to n and V any vowel or diphthong. The problem can now be reformulated as a single-predicate learning task where the target theory is one of acceptable prefixes to a given vowel and partial consonant cluster. The rules for prevocalic and postvocalic affixing are induced in two separate learning sessions.

The training data is derived from 5095 monosyllabic words taken from the Dutch section of the CELEX Lexical Database, with an additional 597 reserved for evaluation.

The positive examples are constructed by breaking the phonetic transcriptions down into three parts: a prevocalic and a postvocalic consonant cluster (each consisting of zero or more consonants) and a vowel or diphthong. The consonant clusters are treated as 'affixes' to the vowel, so that syllables are constructed by repeatedly affixing consonants, if the context (the vowel and the pre- or post-vocalic material that has already been affixed) allows it. So, for example, from the word /ma:kt/ the following positives would be generated:

prefix(m,[],[a,:]).
prefix(^,[m],[a,:]).
suffix(k,[],[:,a]).
suffix(t,[k],[:,a]).
suffix(^,[t,k],[:,a]).

Note that the context list in suffix rules is reversed, so that the two processes are exactly symmetrical and can use the same background predicates.

The caret, ^, is used to mark the beginning and end of a word. The reason that the affix termination needs to be explicitly licensed is so that the experiment's setup does not assume that all partial sub-affixes of a valid affix are necessarily valid as well.

In Dutch, for example, a monosyllable with a short vowel has to be closed, which means that the null suffix is not valid. The end-of-word mark allows this to be expressible as a theory that does not contain the following clause: suffix(^,[],[V]).
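A rough sketch of the example generation just described, assuming the transcription has already been split into onset, nucleus and coda (the function name and tuple representation are illustrative, not the paper's actual code):

```python
# Hedged sketch: emit (predicate, affix, context, vowel) positives for one
# monosyllable, affixing outwards from the nucleus and ending with '^'.

def positives(onset, nucleus, coda):
    examples = []
    context = []
    for c in onset[::-1] + ["^"]:           # prevocalic: inner-most consonant first
        examples.append(("prefix", c, list(context), nucleus))
        context = [c] + context
    context = []
    for c in coda + ["^"]:
        # the context list of suffix examples is kept reversed, so that
        # prefixing and suffixing are exactly symmetrical
        examples.append(("suffix", c, list(context), nucleus[::-1]))
        context = [c] + context
    return examples

# /ma:kt/ = onset [m], nucleus [a,:], coda [k,t]
for ex in positives(["m"], ["a", ":"], ["k", "t"]):
    print(ex)
```

Running this on /ma:kt/ reproduces the five positives listed above, including the two end-of-word examples.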

The positives are, then, all the prefixes and suffixes that must be allowed in context, so that all the monosyllables in the training data can be constructed: 11067 and 10969 instances of 1428 and 1653 unique examples, respectively.

The negative data consists of randomly generated words that match the C3VC5 template and do not appear as positives. The random generator is also designed so that approximately equal numbers of examples are generated at each affix length.

The negative data is also split into evaluation and training data, and the negative examples are derived from the training negative data by the following deductive algorithm:

1. For each example, find the maximal substring that is provable by the positive prefix/3 and suffix/3 clauses in the training data. So, for example, for /mtratk/ it would be trat and for /mlat/, lat^.

2. Choose the clause that should be a negative example, so that this word is not accepted by the target theory. Pick the inner-most one on each side, i.e. the one immediately applicable to the maximal substring computed above. For /mlat/ that would be suffix(m,[l],[a]). /mtratk/, however, could be negative because either prefix(m,[t,r],[a]) or suffix(k,[t],[a]) is unacceptable. In such cases, pick one at random. This is bound to introduce false negatives, but no alternative that does not presuppose at least part of the solution could be devised.

3. Iterate until enough negative examples have been generated to disprove all the words in the negative training data.

Since the problem is, in effect, that of identifying the sets of consonants that may be prefixed or suffixed to a partially constructed monosyllable, the clauses of the target predicate must have a means of referring to various subsets of C and V in a meaningful and intuitive way. This is achieved by defining a (possibly hierarchical) linguistically motivated partitioning of C and V. Each partition can then be referred to as a feature-value pair, for example Lab+ to denote the set of the labials or Voic+ for the set of voiced consonants. Intersections of these basic sets can then be easily referred to by feature-value vectors; the intersection, for example, of the labials and the voiced consonants (i.e. the voiced labials) is the feature-value vector [Voic+,Lab+].
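The set-theoretic reading of feature-value vectors can be sketched as follows; the three-phone inventory and its feature assignments are an illustrative fragment, not the paper's full Dutch system:

```python
# Hedged sketch of feature-value partitioning: each feature-value pair names
# a subset of the inventory, and a feature-value vector denotes the
# intersection of those subsets.

FEATURES = {
    "m": {"lab": "+", "voic": "+"},
    "p": {"lab": "+", "voic": "-"},
    "d": {"lab": "-", "voic": "+"},
}

def partition(feature, value):
    """The subset of the inventory with the given value for one feature."""
    return {ph for ph, fv in FEATURES.items() if fv[feature] == value}

# The vector [Voic+,Lab+] is the intersection of the two basic sets:
voiced_labials = partition("lab", "+") & partition("voic", "+")
print(voiced_labials)  # {'m'}
```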

The background knowledge plays, as seen in section 2, a decisive role in the quality of the constructed theory, by implementing the theoretical framework to which the search for a solution will be confined. In more concrete terms, the background predicates are the building blocks that will be used for the construction of the hypothesis' clauses, and they must define all the relations necessary to discover an interesting hypothesis.

For the purposes of this task, they have been defined as relations between individual phones and feature values, e.g. labial(m,+) or voiced(m,+).


Feature-value vectors can then be expressed as conjunctions like, for example, labial(C,+) ∧ voiced(C,+) to mean the voiced labials.

Apart from the linguistic feature predicates, the background knowledge also contained the head/2 and rest/2 list access predicates. This approach was chosen over direct list access with the nth/3 predicate as a bias towards rules with more local context dependencies.

The experiments described in sections 3.1, 3.2 and 3.3 below were conducted with background knowledge that encodes increasingly more information about Dutch phonology as well as Dutch phonotactics: for the experiment in 3.1 the learner has access to the way the various symbols are arranged in the IPA table, whereas for the experiment in 3.2 a classification that is sensitive to Dutch phonological processes was chosen. And, finally, in section 3.3 the sonority level feature is implemented, which has been proposed with the explicit purpose of solving the problem of Dutch syllable structure.

The quantitative evaluation shown for the three experiments was done using the 597 words and the part of the randomly generated negative data that have been reserved for this purpose.

3.1 The IPA segment space

In this experiment the background knowledge reflects the way that the IPA table is organised: the phonetic inventory of Dutch consists of two disjoint spaces, one of consonants and one of vowels, with three and four orthogonal dimensions of differentiation respectively.

The consonant space varies in place and manner of articulation, and voicing. The manner of articulation can be plosive, nasal, lateral approximant, trill, fricative or approximant. The place can be bilabial, alveolar, velar, labiodental, postalveolar or palatal. Voicing can be present or not.

Similarly for vowels, where there are four dimensions: place (front, centre, back), manner of articulation (open, mid-open, mid-closed, closed), length and roundedness.

The end-of-word mark has no phonological features whatsoever and it does not belong to any of the partitions of either C or V.

This schema was implemented as one background predicate per dimension, relating each phone to its value along that dimension, for example:

manner( plosive, p ). place( bilabial, p ).

manner( nasal, m ).

etc.

The evaluation function used was the Laplace function (P+1)/(P+N+2), where P and N are the coverage of positive and negative examples, respectively.
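For concreteness, the Laplace function can be written out as follows (a trivial sketch of the formula just given, not code taken from Aleph):

```python
# Laplace evaluation of a candidate clause: (P+1)/(P+N+2), where P and N are
# the numbers of positive and negative examples the clause covers.

def laplace(p, n):
    return (p + 1) / (p + n + 2)

# A clause covering many positives and no negatives scores close to 1,
# while an over-general clause is pulled towards 1/2:
print(round(laplace(98, 0), 2))   # 0.99
print(round(laplace(50, 50), 2))  # 0.5
```

The +1/+2 smoothing keeps clauses with tiny coverage from getting a perfect score, which is one way the function discourages overfitting.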

Since the randomly generated negatives must also contain false negatives, it cannot be expected that even a good theory will fit them perfectly. In order to avoid overfitting, the learning algorithm was set to require an accuracy of only 85% over the training data.

The resulting hypothesis consisted of 199 prefix and 147 suffix clauses and achieved a recall rate of 99.3% with 89.4% precision. All the false negatives were rejected because they could not get their onset licensed, typically because it only appears in a handful of loan words. The /dʒ/ onset necessary to accept 'jeep' and 'junk', for example, was not permitted, and so these two words were rejected.

The most generic rules found were:

prevoc(A,B,C) :- A = '^'.
prevoc(A,[],C).
postvoc(A,B,C) :- A = '^'.
postvoc(A,[],C).

meaning that (a) the inner-most consonant can be anything, and (b) all sub-prefixes (-suffixes) of a valid prefix (suffix) are also valid.

Other interesting rules include pairs like these two:

prevoc(A,B,C) :-
    head(B,D), manner(trill,D), head(C,E), length(short,E),
    manner(closed,E), manner(plosive,A).

prevoc(A,B,C) :-
    head(B,D), manner(trill,D), head(C,E), length(short,E),
    manner(open_mid,E), manner(plosive,A).

and

prevoc(A,B,C) :-
    head(B,D), manner(approx,D), head(C,E), length(short,E),
    place(front,E), voiced(minus,A).

prevoc(A,B,C) :-
    head(B,D), manner(approx,D), head(C,E), length(short,E),
    place(front,E), manner(plosive,A), place(alveolar,A).

that could have been collapsed if a richer feature system included features like 'closed or mid-open vowel' and 'devoiced consonant or plosive alveolar', respectively. These particular disjunctions might be unintuitive or even impossible to motivate independently, but they do suggest that a redundant feature set might allow for more interesting theories than the minimal, orthogonal one used for this experiment. This is particularly true for a system like Aleph, which performs no predicate invention or background theory revision.


Root [cons,son]
 ├─ Laryngeal: [asp], [voice]
 ├─ [cont]
 ├─ [nasal]
 ├─ [lateral]
 └─ Place
     ├─ Labial: [round]
     ├─ Coronal: [ant]
     └─ Dorsal: [back], [high], [mid]

Figure 15.1: The feature geometry of Dutch

3.2 Feature Classes

For this experiment a richer (but more language-specific) background knowledge was made available to the inductive algorithm, by implementing the feature hierarchy suggested by Booij (1995, ch. 2) and replicated in figure 15.1.

The most generic features are the major class features (Consonant and Sonorant), which are placed on the root node and divide the segment space into vowels [Cons-,Son+], obstruents [Cons+,Son-] and sonorant consonants [Cons+,Son+]. Since all vowels are sonorous, [Cons-,Son-] is an invalid combination.

The features specifying the continuants, the nasals and the lateral /l/ are positioned directly under the root node, with the rest of the features bundled together under two feature classes, those of the laryngeal and the place features. These classes are chosen so that they collect together features that behave as a unit in phonological processes of Dutch. The class of laryngeal features basically makes the voiced-voiceless distinction, while the Aspiration feature only separates /h/ from the rest. The place class bundles together three subclasses of place-of-articulation features, one for each articulator. Furthermore, some derived or redundant features such as Glide, Approximant and Liquid are defined. The vowels do not include the schwa, which is set apart and only specified as Schwa+.

Using the Laplace evaluation function and this background, the constructed theory consisted of 13 prefix and 93 suffix rules, accepting 94.2% of the test positives and under 7.4% of the test negatives.

Among the rejected positives are loan words ('jeep' and 'junk' once again), but also all the words starting with perfectly Dutch /s/-obstruent-liquid clusters. The prefix rule with the widest coverage is:

prefix(A,B,C) :-
    head(C,D), sonorant(D,plus), rest(B,[]).

or, in other words, 'prefix anything before a single consonant before a nucleus other than the schwa'.

The suffix rules were less strict, with only 3 rejected positives, 'branche', 'dumps' and 'krimpst' (the first two of which are loan words), which failed to suffix /S/, /s/ and /s/ respectively. Some achieve wide coverage (although never quite as wide as that of the prefix rules), but some make reference to individual phonemes and are of more restricted application. For example:

suffix(A,B,C) :-
    rest(C,D), head(D,E), rest(B,[]), A=t.

or, 'suffix a /t/ after exactly one consonant, if the nucleus is a long vowel or a diphthong'.

Of some interest are also the end-of-word marking rules (see section 3 above on the ^ mark), given that open, short monosyllables are very rare in Dutch (there are four in CELEX: 'schwa', 'ba', 'h?', and 'joh'). This would suggest that the best way to treat them is as exceptions, and have the general rule disallow open, short monosyllables. What was learned instead was a whole set of 29 rules for suffixing ^, the most general of which is:

postvoc(A,B,C) :-
    head(B,t), larynx(t,E), rest(B,F),
    head(F,G), larynx(G,E), A = '^'.

or 'suffix an end-of-word mark after at least two consonants, if the outer-most one is a /t/ and has the same values for all the features in the Laryngeal feature class as the consonant immediately preceding it'.

A final note regarding this experiment concerns its computational complexity. Overlapping and redundant features might offer the opportunity for more interesting hypotheses, but they also make the search space bigger. The reason is that overlapping features diminish the effectiveness of the inverse resolution operator at keeping uninteresting predicates out of the bottom clause: the more background predicates can be used to prove the positive example on which the bottom clause is seeded, the longer the latter will get.

3.3 Sonority Scale

This experiment implements and tests the syllabic structure model suggested in (van der Hulst 1984, ch. 3). The Dutch syllable is there analysed as having three prevocalic and five postvocalic positions (some of which may be empty), and constraints are placed on the set of consonants that can occupy each.


phoneme    obstruents   m   n     l    r     glides   vowels
sonority   1            2   2.25  2.5  2.75  3        4

Table 15.1: The Sonority Scale

The most prominent constraint is the one stipulating a high-to-low sonority progression from the nucleus outwards. Each phoneme is assigned a sonority value (table 15.1) based not only on language-independent features such as its being a Sonorant or an Obstruent, but also on the syllable structure of Dutch itself. Especially the fine-tuning done with respect to the sonority values of the nasals and the liquids is explicitly justified by its filtering out impossible consonant clusters that would otherwise be predicted by the simpler model. It must, therefore, be noted that the background knowledge for this experiment is not only language-specific, but is also directly aimed at solving the very problem that is being investigated.

In addition to the high-to-low sonority level progression from the nucleus outwards, there are both filters and explicit licensing rules. The former are restrictions referring to sonority (e.g. 'the sonority of the three left-most positions must be smaller than 4') or to other phonological features (e.g. the 'no voiced obstruents in coda' filter on p. 92) and are applicable in conjunction with the sonority rule. The latter are typically rules of restricted scope that take precedence over the sonority-related constraints mentioned so far. The left-most position, for example, may be /s/ or empty, regardless of the contents of the rest of the onset.
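The core sonority-progression constraint can be sketched as follows; the phoneme inventory fragment, the strictness of the comparison, and the omission of the /s/ exception and the other filters are all simplifying assumptions:

```python
# Hedged sketch of the sonority progression check, using the values of
# table 15.1 for a small fragment of the inventory only.

SONORITY = {"p": 1, "t": 1, "k": 1, "s": 1, "m": 2, "n": 2.25,
            "l": 2.5, "r": 2.75, "j": 3, "w": 3, "a": 4, "o": 4}

def sonority_ok(onset, coda):
    """Sonority must rise through the onset and fall through the coda."""
    on = [SONORITY[p] for p in onset]
    co = [SONORITY[p] for p in coda]
    rises = all(x < y for x, y in zip(on, on[1:]))
    falls = all(x > y for x, y in zip(co, co[1:]))
    return rises and falls

print(sonority_ok(["k", "l"], ["r", "k"]))  # True: rises to and falls from the nucleus
print(sonority_ok(["l", "k"], []))          # False: sonority falls in the onset
```

A full implementation would layer the filters and the explicit licensing rules (such as the left-most /s/) on top of this basic check.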

Implementing the basic sonority progression rule as well as the most widely applicable filters and rules¹ yielded impressive compression rates matched with results lying between those of the two previous experiments: 93.1% recall, 83.2% precision.

4 Conclusions and Further Work

The quantitative results from the machine learning experiments presented above are collected in table 15.2, together with those of Tjong Kim Sang and Nerbonne (2000)² and the results from the sonority scale experiment. Those last ones in particular are listed for comparison's sake and as the logical end-point of the progression towards more language- and task-specific prior assumptions.

¹Some were left out because they were too lengthy when translated from their fixed-position framework to the affix licensing one used here, and were very specifically fine-tuning the theory to individual onsets or codas.

²From experiments on phonetic data in the 'Experiments without linguistic Constraints' section.

              (Tjong 2000)   IPA        Feat. Classes   Sonority
Recall        99.1%          99.3%      94.2%           93.1%
Precision     74.8%          79.8%      92.6%           83.2%
Num. Clauses  577 + 577      145 + 36   13 + 93         3 + 8

Table 15.2: Results

The first two columns are directly comparable, because they both refer only to phonetic primitives with no linguistically sophisticated background knowledge. Furthermore, the fact that the C3VC5 template assumed in this work is not taken for granted in (Tjong Kim Sang and Nerbonne 2000) is taken into account in terms of compactness as well as performance, since (a) in (Tjong Kim Sang and Nerbonne 2000) there are 41 extra rules besides the 1154 prefix and suffix rules that describe the "basic word" on which the latter operate, and (b) the precision in (Tjong Kim Sang and Nerbonne 2000) is measured on random strings, whereas in this work only strings matching the C3VC5 template are used.

As can be seen, then, the ILP-constructed rules compare favourably (in both performance and hypothesis compactness) with those constructed by the deductive approach employed in (Tjong Kim Sang and Nerbonne 2000).

What can also be seen by comparing the two ILP results with each other is that the drop in recall between the second and third columns is compensated by higher precision and compression, suggesting a direct correspondence between the quality of the prior knowledge encoded in the background theory and that of the constructed hypothesis.

One interesting follow-up to these experiments would be to attempt to expand their domain to that of syllables of multisyllabic words and, eventually, full word-forms. In the interest of keeping the problems of syllabic structure and syllable-boundary phonology apart, a way must be devised to derive from the positive data (i.e. a corpus of Dutch word-forms) examples for a distinct machine learning session for each task.

Furthermore, it would be interesting to carry out the same (or rather the equivalent) experiments on other parts of the CELEX corpus (e.g. English or German) and see to what extent the results-to-background relation follows the same patterns.

On the purely computational side of the problem, and as has already been mentioned in section 3.2, including overlapping and redundant features in the background might be interesting, but it also implies a very fast growth of the search space. It would, therefore, be useful to employ the search-path pruning facilities of Aleph to avoid directions along the search path that would yield inconsistent or unlikely feature combinations.

References

Booij, G. (1995). The Phonology of Dutch. The Phonology of the World's Languages. Oxford: Clarendon Press.

Muggleton, S. and L. De Raedt (1994). Inductive Logic Programming: Theory and Methods. Journal of Logic Programming 19 (20), 629–679. Updated version of technical report CW 178, May 1993, Department of Computing Science, K.U. Leuven.

Muggleton, S. H. (1995). Inverse Entailment and Progol. New Generation Computing 13, 245–286.

Robinson, J. A. (1965). A Machine-Oriented Logic based on the Resolution Principle. Journal of the ACM 12 (1), 23–41.

Stoianov, I. (2001). Connectionist Lexical Processing. Ph.D. thesis, Rijksuniversiteit Groningen.

Stoianov, I. and J. Nerbonne (1999). Exploring Phonotactics with Simple Recurrent Networks. In van Eynde, Schuurman, and Schelkens (Eds.), Computational Linguistics in the Netherlands.

Tjong Kim Sang, E. and J. Nerbonne (2000). Learning the Logic of Simple Phonotactics. In J. Cussens and S. Dzeroski (Eds.), Learning Language in Logic, Volume 1925 of Lecture Notes in Artificial Intelligence. Springer Verlag.

van der Hulst, H. (1984). Syllable Structure and Stress in Dutch. Ph.D. thesis, Rijksuniversiteit Leiden.


Self Embedded Relative Clauses in a Corpus of German Newspaper Texts

Christian Korthals

University of the Saarland

[email protected]

Abstract. The distribution of center self-embeddings and extrapositions in German is assumed to minimize memory load during parsing. Self-embedded relative clauses were semi-automatically analysed in a treebank of German newspaper texts. Clause length and especially extraposition distance are found to be the most important parameters distinguishing center embeddings from extrapositions. The internal structure of the superordinated and subordinated clause is not important.¹

1 Introduction

The opposition between center self-embedding constructions and extrapositions is an interesting syntactic phenomenon from two points of view: from the perspective of parsing and automata theory, center embeddings force the parser to possess at least context-free power. From a psycholinguistic perspective, center embeddings and extrapositions are assumed to differ in their memory load, because the processing of one phrase may have to be delayed until intervening material has been processed.

This paper examines German relative clause self-embeddings as an example of these kinds of constructions.² In section 2, we define and classify relative clauses and demonstrate that the choice of center embedding vs. extraposition arises in German syntax due to verb-final position. Hypotheses on the acceptability of different constructions are formulated primarily on the basis of Hawkins' performance theory, which is introduced in section 3 together with other preliminaries. The hypotheses were tested against data from a treebank of German newspaper texts. This corpus is described in section 4 together with a method of deriving description parameters of each relative clause in the corpus fully automatically. In section 5, we present the results of the evaluation of three hypotheses: a) Longer relative clauses should tend to be extraposed, while shorter ones should tend to be embedded. This turns out to be only weakly supported (Section 5.1). b) The potential extraposition distance (the distance between relative pronoun and antecedent) should decide whether a clause is extraposed or not (Section 5.2). This factor will turn out to be the most relevant one. c) Superordinated and subordinated RCs should differ in their internal structure. This turns out to be false (Section 5.3).

¹This work was funded within the NEGRA project, which is part of the Sonderforschungsbereich 378, funded by the Deutsche Forschungsgemeinschaft at the University of the Saarland. I wish to thank Thorsten Brants, Reinhard Köhler, Lars Konieczny, Valia Kordoni, Daniela Kurz, Oliver Plähn, Christoph Scheepers and Hans Uszkoreit for their comments and support.

²Note that the term "self-embedding" is used to refer to all cases where a RC (or, defining more weakly, also an S) contains another RC. We use the terms "center embedding", "right embedding" and "extraposition" to subclassify self-embeddings.

Proceedings of the Sixth ESSLLI Student Session, Kristina Striegnitz (editor). Chapter 16, Copyright © 2001, Christian Korthals

2 Types of Relative Clauses

A relative clause is a sub-clause that contains at least one relative pronoun. The TIGER annotation scheme (Brants et al. 2000) and the Stuttgart-Tübingen tagset (Schiller et al. 1999) classify relative pronouns into four classes based on two criteria: substituting vs. attributive, and proper relative pronoun vs. wh-pronoun. Especially when wh-pronouns are involved, there is gradience between relative clauses and other clause types (e.g. place holder phrase, clausal complement).

In German, main declarative clauses in the present or past tense have verb-second position. If a post-verbal NP contains a relative clause, it may turn out to be situated at the end of the sentence, directly following its antecedent NP. Such a case of right embedding (RE) yields no difficulty at all, as in example (1).

(1) Er kannte den Termin, der für die Konferenz festgesetzt war.
    (He knew the deadline, which for the conference set was.)

"He knew the deadline set for the conference."

In contrast, perfect tense declarative clauses require an auxiliary verb and a non-finite verb in final position. Subordinate clauses (and among them relative clauses) also have the (finite) verb in final position. This means that whenever a relative clause is embedded in another relative clause, at the very least the verb of the superordinated relative clause must follow the subordinated relative clause, if the antecedent NP is to be realized continuously, i.e. without intervening material. These cases will be called center embeddings (CE). (2) is a complex (artificial) example, where a relative clause is center embedded in another relative clause, which is again center embedded in the main clause. If center embedding is to be avoided, the subordinated relative clause must be extraposed over the verb (and maybe some of its arguments). This case is illustrated in example (3) and will be called an extraposition (EX).



Christian Korthals

However, it is possible to extrapose even more material than the relative clause alone, as illustrated in example (4). This again results in a continuous realization, and is thus considered a type of RE, too.

(2) [S Er hat den Termin_NN, [RC der_PREL für die Konferenz_NN, [RC die_PREL er besuchen_VINF wollte_VFIN,] festgesetzt_VPART war_VFIN,] gekannt_VFIN.]
    he has the deadline which for the conference which he attend wanted set was known
    "He knew the deadline that was set for the conference he wanted to attend."

(3) [S Er hat den Termin, [RC der für die Konferenz festgesetzt war, [RC die er besuchen wollte,]] gekannt.]

(4) [S Er hat den Termin, [RC der festgesetzt war für die Konferenz, [RC die er besuchen wollte,]] gekannt.]

Throughout the paper, we will investigate relative clauses of embedding depth 2, that is, the innermost RCs in sentences of the structure [S α [RC β [RC ]] δ]. Note that the dominance relation between the clauses will in general be indirect: normally, at least an NP, the antecedent NP of the RC, will occur in α or β; there may also be more material, e.g. coordinated structures or VPs. If there is no antecedent NP, we speak of a free relative clause (Eisenberg 1986).

3 Theoretical Assumptions

Standard context free phrase structure grammars would indeed generate center self-embedded structures like (2) and render them grammatical. It was noted early, though, e.g. by Chomsky (1959), that humans have difficulty understanding center embedded structures. The existence of alternative syntactic constructions like right extrapositions can therefore be viewed as emerging from a universal principle of preventing center embeddings or restricting them to a certain depth (Köhler 1999).

Many alternative explanations have been provided to explain humans' parsing difficulties with center embedding. Lewis (1996) discusses some of them. In his model, center embedded relative clauses are predicted to be hard due to memory interference effects between stacked preverbal NPs that cannot be integrated into a coherent parse tree before the verb is encountered, and therefore have to be stored in short term memory. He predicts center embeddings to be incomprehensible only in those cases where more than two NPs before the occurrence of the verb receive the same grammatical function.

In contrast, Hawkins (1994) does not consider the internal structure of embedded material at all, but reduces all the factors that may contribute to difficulties in understanding to one simple variable, namely phrase length in terms of the number of word forms. The quality of a preterminal node is simply the number of terminal nodes from the beginning of the phrase to the first terminal of the last immediate constituent, divided by the number



Self Embedded Relative Clauses in a Corpus of German Newspaper Texts

of immediate constituent nodes of the phrase. The overall quality of the sentence is the sum of the quality of its nodes. For relative clauses, Hawkins's principle of Early Immediate Constituents (EIC) predicts a) that shorter center embedded relative clauses should be more acceptable than longer ones, and b) that extrapositions should be more acceptable if their extraposition distance is small. Thus, a) and b) are competing principles of locality that predict a high memory load when the processing of a phrase has to be delayed until intervening material has been processed.
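The node quality just described can be sketched in a few lines. This is our reading of the description above, not Hawkins's own formulation or code; the function name and the simplification of working only with constituent lengths are ours.

```python
def node_quality(ic_lengths):
    """Quality of a phrase node as described in the text: the number of
    terminals from the start of the phrase up to the first terminal of
    the last immediate constituent (IC), divided by the number of ICs.
    ic_lengths holds the length in word forms of each IC, left to right.
    Lower values mean the phrase structure is recognised from fewer words."""
    words_to_last_ic = sum(ic_lengths[:-1]) + 1
    return words_to_last_ic / len(ic_lengths)

# A phrase whose long constituent comes last is cheap to recognise:
node_quality([1, 8])   # short IC first -> (1 + 1) / 2 = 1.0
node_quality([8, 1])   # long IC first  -> (8 + 1) / 2 = 4.5
```

On this measure, delaying the long constituent to the end of the phrase (as extraposition does) lowers the cost of its matrix node, which is exactly the intuition behind predictions a) and b).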

4 Data and Methods

In order to test hypotheses on complex syntactic constructions, syntactically annotated corpora (treebanks) are needed. The NEGRA Millennium corpus (Negra) is a treebank comprising 20,571 sentences semi-manually annotated at the University of the Saarland. Word forms are POS-tagged according to the Stuttgart-Tübingen tagset (Schiller et al. 1999). The syntactic structure is annotated according to an annotation scheme that recognizes formal node tags (phrase labels like sentence, noun phrase) and functional edge labels (like subject, post-modification, relative clause, clausal complement). The annotation scheme allows for crossing phrase-structure edges to encode discontinuous constituents. The Negra corpus is a sub-corpus of the 2.4 million sentence Frankfurter Rundschau newspaper corpus (Rundschau).

Since center embedded relative clauses are quite a rare phenomenon, the data from Negra was not sufficient. Therefore, we heuristically searched a statistically POS-tagged version of the Frankfurter Rundschau corpus for possible center embeddings and annotated the findings according to the TIGER annotation scheme. The heuristics searched for at least two occurrences of commas followed by an optional preposition and a potential relative pronoun in the sentence. Since word forms that are potential relative pronouns are highly ambiguous between determiner and relative pronoun in German, manual filtering was necessary. At the present point in time, we cannot yet estimate recall and precision accurately, but it is clear that further work is necessary to increase the recall.
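A search of the kind described above can be approximated with a regular expression. The sketch below is ours: the study's actual heuristics are not published in this detail, and the preposition and pronoun lists here are illustrative, not exhaustive.

```python
import re

# Comma, optional preposition, then a word form that may be a relative
# pronoun (d-pronoun or welch-form).  Both lists are illustrative only.
PREP = r"(?:für|mit|von|an|auf|in|über|unter|bei|nach|zu)"
RELPRON = r"(?:der|die|das|dem|den|denen|deren|dessen|welche[rsmn]?)"
CANDIDATE = re.compile(r",\s+(?:%s\s+)?%s\b" % (PREP, RELPRON))

def looks_like_double_rc(sentence):
    """True if the sentence contains at least two comma + (preposition) +
    potential-relative-pronoun sequences, mirroring the heuristic search."""
    return len(CANDIDATE.findall(sentence)) >= 2

looks_like_double_rc(
    "Er hat den Termin, der für die Konferenz, die er besuchen wollte, "
    "festgesetzt war, gekannt.")   # True
```

As the text notes, such matches still require manual filtering, since German d-pronouns are systematically ambiguous with determiners.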

The resulting data was automatically analysed with C, Perl and Awk tools operating on the data structures representing the syntactic trees. For each sentence, a characterization of all the relative clauses it contained was generated fully automatically. The parameters were then filtered and statistically evaluated. Among the parameters were the embedding depth within the top level S (0, 1, or 2), the relative pronoun together with a detailed description of its tag and its function within the clause, the kind of antecedent phrase (NP, PP or free relative clause), the kind of embedding (RE, CE, EX), the length of superordinated and subordinated clause, and the (potential) extraposition distance (cf. section 5.2).




5 Results

Of the 20,571 sentences in Negra, 2352 (11.4%) contained at least one relative clause, and there were 2389 relative clauses in total. 61 sentences (0.3%) contained a relative clause embedding, i.e. two or more relative clauses of which at least two were embedded into each other. One of these sentences contained a double RC self-embedding, and one sentence contained a conjunction of two embedded RCs within another RC. These two instances will be ignored in the rest of the paper. Of the remaining 59 RCs of embedding depth 2, 39 (66%) were extraposed from their matrix RC, 13 (22%) were center embedded, and 7 (12%) were "right embedded", i.e. involved constructions where more material than the relative clause was extraposed. Table 16.1 shows the type of embedding of the superordinated RC in the matrix clause (super) and of the subordinated RC in the superordinated RC (sub), and provides the counts for each class in detail.

                 super
    sub      RE    EX    CE    total
    RE        4     2     1        7
    EX       25    11     3       39
    CE        7     3     3       13
    total                         59

Table 16.1: Types of all RC self-embeddings in Negra

The heuristic search in the unannotated Frankfurter Rundschau corpus yielded 62 center embedded RCs. 14 of these were CE in CE embeddings, 28 CE in RE embeddings, and 17 CE in EX embeddings. There were 3 more complex constructions involving CE relative clauses.

A corpus of all CE relative clauses was compiled by merging the CE relative clauses in Negra with those from the Frankfurter Rundschau, which yielded 68 CE relative clauses in total. Since we do not yet have more data from the Rundschau corpus, we ignored the type of embedding of the self-embedded RC within the top level S for the compilation of the corpus of center embeddings.

In analogy, a corpus of non-center embedded RCs (nonCE) was computed. It included the 39 extrapositions and the 7 right embeddings from Negra. Figure 16.1 shows the structure of the data.

5.1 Clause Length

Based on the principle of Early Immediate Constituents (EIC, Hawkins (1994)), we expected to find a preference for the extraposition of longer phrases, while shorter phrases are expected to appear center embedded more often, because they would not delay the processing of their matrix clause for very long.




[Figure 16.1 (Venn diagram): the Frankfurter Rundschau corpus contains the 62 CEs found there; the Negra treebank contains the 46 nonCE instances (39 EX + 7 RE) and the 13 CEs in Negra; 7 CEs occur in both FR and Negra; the merged CE corpus comprises 68 instances.]

Figure 16.1: Relations between corpora used

As a comparison, we computed the overall distribution of relative clause length in the Negra corpus, i.e. the length of all 2389 relative clauses in the Negra corpus, be they embedded in other clauses or not. Within Negra, we found an average relative clause length of 10.2. We compared these results to all 68 center embedded relative clauses in the corpus of center embeddings (CE sub-corpus in figure 16.1) as well as to all 46 non-center embedded relative clauses from Negra (nonCE sub-corpus). Note that extrapositions and "right embeddings" were treated as one class on the basis of the arguments in section 2.

Table 16.2 shows the frequencies of relative clauses as a function of length in the corpus of center embeddings (CE) and in the corpus of extrapositions and right embeddings (nonCE), together with the overall distribution of relative clause length in Negra.

    length    2     3     4    5–6   7–10  11–14  15–24  25–47   Total    ø
    Negra    0.0   2.2   5.0  19.3   36.7   20.4   13.3    2.9    2389  10.2
    nonCE    0     2.1   2.1  23.4   38.3   17.0   17.0    0        46   9.4
    CE       0     7.4  10.3  35.3   25.0   20.6    1.5    0        68   7.5

Table 16.2: Distribution of RC length for all RCs in Negra, and of center embedded RCs and non-center embedded RCs of depth 2 (in %)

There is no significant difference between the distribution of the non-center embedded RCs we investigated and the overall distribution in Negra (χ²(df=7) = 5.09). That is, extraposed or right embedded RCs are quite representative in their length of RCs in total. However, there is a significant difference between the distribution of center embedded RCs and the distribution in Negra (χ²(df=10) = 34.0, p < 0.001). There is also a significant difference




between the distributions of CE and nonCE (χ²(df=6) = 68.9, p < 0.001). We conclude that a subordinated RC in a relative clause extraposition is not significantly shorter than the average of relative clauses in total, namely around 10 word forms. In contrast, a subordinated RC center embedded within another relative clause is significantly shorter, namely about 8 word forms.
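A comparison of this kind can be reproduced in outline. The sketch below is ours: the per-bin counts are reconstructed approximately from the percentages in Table 16.2, the binning evidently differs from the one behind the published test (which has df = 10), and only the Pearson statistic is computed, so the value it yields only approximates the reported 34.0.

```python
# Counts per length bin (2, 3, 4, 5-6, 7-10, 11-14, 15-24, 25-47),
# reconstructed approximately from the percentages in Table 16.2.
ce_observed = [0, 5, 7, 24, 17, 14, 1, 0]            # 68 center embeddings
negra_pct   = [0.0, 2.2, 5.0, 19.3, 36.7, 20.4, 13.3, 2.9]

n = sum(ce_observed)
expected = [p / 100 * n for p in negra_pct]

# Pearson chi-square statistic, skipping empty expected cells
chi2 = sum((o - e) ** 2 / e
           for o, e in zip(ce_observed, expected) if e > 0)
```

The statistic is dominated by the surplus of short CEs (bins 3, 4, 5–6) and the near-absence of long ones (bin 15–24), which is exactly the pattern the prose describes.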

5.2 Extraposition Distance

One of the parameters that were fully automatically generated from the corpus data is the (potential) extraposition distance of a relative clause. In the case of an extraposition, this is the number of word forms between the end of the annotated antecedent NP and the relative clause (ex). In example (3) above, e.g., the RC is extraposed over the two word forms festgesetzt war (ex = 2). In the case of a center embedding, it is the number of remaining word forms of the superordinated clause after the end of the subordinated clause (rd). Note that potentially every center embedding can be transformed into an extraposition and vice versa, and that always rd = ex, as in example (2).
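The measure can be illustrated on example (3). This is our sketch, not the study's code; the study derives the positions from treebank annotation rather than token indices, and whether punctuation counts as a word form is not stated, so commas are excluded here.

```python
def extraposition_distance(tokens, antecedent_end, rc_start):
    """Number of word forms (commas excluded) between the end of the
    antecedent NP and the relative pronoun.  The same count measured
    after a center embedded RC gives rd."""
    return sum(1 for t in tokens[antecedent_end + 1:rc_start] if t != ",")

# Example (3): "die er besuchen wollte" is extraposed over
# "festgesetzt war" from its antecedent "die Konferenz".
tokens = ("Er hat den Termin , der für die Konferenz "
          "festgesetzt war , die er besuchen wollte , gekannt .").split()
antecedent_end = tokens.index("Konferenz")       # last token of the NP
rc_start = tokens.index("die", antecedent_end)   # extraposed relative pronoun
ex = extraposition_distance(tokens, antecedent_end, rc_start)  # 2
```

Running the same count on the center embedded variant in example (2), after the end of the inner RC, gives rd = 2 as well, matching the rd = ex observation.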

A hypothesis following from Hawkins (1994) is that extrapositions over a small number of word forms are more acceptable than those over a far distance. There is striking evidence for this hypothesis. In fact, extrapositions over a distance of more than three word forms do not occur in subordinated embedded relative clauses at all. In contrast, there is massive data for center embeddings whose potential extraposition distance is between 4 and 17. Table 16.3 shows the distributions in detail. Note that RE embeddings were ignored in this analysis.

    distance    1    2    3    4    5    6    7    8    9  10–13  14–17  total
    CE (rd)    11   13    8   12    6    6    2    2    2     4      2     68
    EX (ex)    24   11    4    -    -    -    -    -    -     -      -     39

Table 16.3: Distribution of (potential) extraposition distance of center embedded and extraposed RCs of embedding depth 2

As can be seen from the table, relative clause embeddings with a potential extraposition distance between 1 and 3 do occur center embedded as well as extraposed. If center embedding is assumed to be the preferred option for relative clauses with a potential extraposition distance greater than 3, but center embeddings and extrapositions are both possible for RCs with small potential extraposition distance, one would expect length differences not only between EX and CE but also between "close" CEs and "far" CEs.

The analysis confirmed this hypothesis. While extrapositions are 9.4 word forms long on average (cf. table 16.2 above), "far" center embeddings (whose potential extraposition distance rd is greater than 3) are 8.6




word forms long, and "close" center embeddings (rd ≤ 3) are 5.7 word forms long (the χ² values reached p < 0.001 for EX vs. close CE, p < 0.02 for EX vs. far CE, and p = 0.004 for far CE vs. close CE).

The data may predict that not all center embeddings lead to problems, but that especially those center embeddings are acceptable that would lead to large extraposition distances if the relative clause were extraposed. A corpus example of an acceptable double center embedding is (5). Sentences with a small (potential) extraposition distance, on the other hand, occur center embedded in, as well as extraposed from, their matrix RC. Examples for both cases are provided in (6) and (7).

(5) [S Der Regierungschef_NN, [RC der_PREL sich wegen Pensions- und Ausgleichszahlungen_NN, [RC die_PREL sein Gehalt jahrelang monatlich um einige tausend Mark aufstockten_VFIN,] seit zwei Wochen unangenehmen Fragen stellen_VINF muß_VFIN,] forderte_VFIN den Jubel und die unkritische Solidarität seiner Genossen im Oskar-Land ein.]
    the head-of-government who himself due-to pensions and compensations which his salary for-years monthly by some thousand marks increased for two weeks embarrassing questions answer must called-for the cheering and the uncritical solidarity of-his comrades in Oskar-land (verb particle)
    "The head of government, who has been exposed for two weeks to answering embarrassing questions about pensions and compensation money that have been increasing his monthly salary by some thousand marks for years, called for his comrades' applause and uncritical solidarity in the 'Oskar' territory." (far CE under far CE)

(6) [S Im selben Zuge soll der Rabatt_NN von 20 Pfennig pro Kubikmeter, [RC der_PREL bisher denjenigen Großverbrauchern_NN, [RC die_PREL mehr als 10.001 Kubikmeter im Jahr verbrauchen_VFIN], zugestanden_VPART wurde_VFIN], entfallen_VFIN]. (short CE under short CE)

(7) [S Dieser Fonds wird_VFIN beispielsweise mit dem Flaschenpfand_NN gefüllt_VPART, [RC das_PREL Annemarie Roth für die Flaschen_NN erlöst, [RC die_PREL sie bei den regelmäßigen Spaziergängen im Grüneburgpark findet]]]. (short EX under short EX)

5.3 Structural Parallelism or Anti-Parallelism

If more complex categories than S or NP are assumed for syntactic descriptions, the concept of self-embedding becomes unclear: there might be differences between an object RC embedding a subject RC, for example. Also, Lewis (1996) would predict differences in acceptability depending on the noun phrases involved. As a matter of fact, Lewis (1996) predicts an influence of all noun phrases occurring within the matrix clause and all embedded clauses. For reasons of simplicity, we concentrated on two factors only: the antecedent NPs and the phrase containing the relative pronoun. We considered these variables as possible criteria of structure similarity of the clauses involved and hypothesized there might be a trend towards




anti-parallelism. There is still not enough data for more variables to be investigated.

Within the Negra corpus, there is a ratio of roughly 70% antecedent NPs and 30% antecedent PPs of all relative clauses in total. Unexpectedly, embedded relative clauses have 57% antecedent NPs and 43% antecedent PPs, be they center embedded or not (χ²(df=1) = 9.34, p = 0.002). There is no significant difference between CE and nonCE.

The hypothesis that the distribution of the antecedent NP of the subordinated RC might depend on the antecedent NP of the superordinated RC could be falsified: there are four possible classes (RC with NP or PP antecedent embedded within RC with NP or PP antecedent), whose distribution does not differ significantly between CE and nonCE.

Our second criterion of internal clause structure was the function of the relative pronoun within the RC. Again, the distribution of RC pronoun functions within the entire Negra corpus was automatically computed as a comparison. Table 16.4 shows descriptions of the most frequent relative pronoun functions in the RC. Note that subject RCs are by far the most frequent class of relative clauses in German, and that object RCs are quite rare.

    64%  subject                 3%  modifying wh-pronoun
    11%  within adverbial PP     2%  attributive pronoun in NP
    10%  accusative object      10%  other

Table 16.4: Function of relative pronoun within all relative clauses in Negra

Significance tests were possible for the main classes subject relative, object relative, and others only. There is only a marginal difference in their distributions between nonCE and Negra (χ²(df=2) = 5.1, p = 0.07). In contrast, there are significant differences between the distributions of CE and of Negra (χ²(df=2) = 21.9, p < 0.0001), and between nonCE and CE (χ²(df=2) = 43, p < 0.0001). Center embedded RCs were significantly more often subject RCs (74%) than nonCEs (45%) or RCs in general (58%).

While center embedded RCs may have a tendency to be subject RCs in general, it could be shown that whether the superordinated relative clause is a subject RC, an object RC or any other type of RC does not have any influence on the type of subordinated relative clause. The matrix clauses of subject RCs are also subject RCs in 63% of the cases and others in the remaining 37%. This is true of center embedded RCs as well as of non-center embedded RCs (χ²(df=1) = 2.9 × 10⁻⁴).

In summary, the hypothesis that there might be a dependency between the internal structure of the superordinated RC and the subordinated RC could be falsified. For both variables considered, antecedent and relative pronoun function, such a dependency was not significant. There was neither




a tendency towards anti-parallelism between embedding RC and embedded RC, nor towards parallelism.

There were two unpredicted effects: an affinity of embedded (center embedded and extraposed) RCs to have PP antecedents, and a tendency of center embedded relative clauses to be subject RCs. Recall, however, that while all non-center embeddings were taken from Negra, center embeddings were heuristically extracted from the Frankfurter Rundschau corpus to enhance the data. Since the recall was low, it may well be argued that the difference is due to properties of the extraction heuristics. Further research on increasing the recall from the Rundschau corpus may help to clarify this open issue.

6 Discussion

The data is compatible with models that explain the distribution of center embeddings and extrapositions as following from restrictions on short term memory. Two parameters that are predicted to be relevant by Hawkins's EIC principle, embedded clause length and extraposition distance, were confirmed. However, extraposition distance seems to be the more prominent factor, whereas clause length is of secondary importance. The data may also be interpretable within related models, such as Gibson's SPLT theory (Gibson 1998), which assumes higher "integration cost" with increasing extraposition distance and/or clause length.

Predictions following from a phrase structure architecture assuming that the internal clause structure of the participating RCs matters could be falsified. Assumptions that parallelism or anti-parallelism between the RCs might simplify processing could not be confirmed with the present data. In order to test predictions of a more elaborated interference model like Lewis (1996), more data is needed.

The question may be raised whether the material investigated, newspaper texts, reflects production or perception preferences. Konieczny (2000), for instance, found significant acceptability asymmetries between perception and production data.

7 Conclusion and Future Work

The paper demonstrated how a syntactically annotated corpus, together with appropriate querying and processing tools, can be used to semi-automatically prove or disprove hypotheses derived from a model of performance. This technique allows for an easy repetition of the study on different data or with additional parameters (e.g., restrictive vs. non-restrictive RC).

The data can be interpreted as following from a tradeoff between different principles of minimizing memory effort during parsing (or generation, respectively). Such principles are the principle of Early Immediate Constituents and constraints on embedding depth and phrase length (see e.g. Köhler (1999) for a wider framework). The main results are the quantification of the tendency to extrapose over short distances only, and to center embed especially shorter phrases whose potential extraposition distance is short. We did not find any evidence for the hypothesis that structural differences between superordinated and subordinated RC might have an impact.

The study confirms and supplements a prior study by Uszkoreit et al. (1998) with more data (20,000 instead of 12,000 sentences) and, especially, a shift towards self-embeddings. For the future, an automatic calculation of EIC scores is planned. A cross-check with functional equivalents of relative clauses (Köhler 1999) and psycholinguistic acceptability tests are two possible ways to supplement the study.

References

Brants, T. et al. (2000). TIGER Annotationsschema. Saarbrücken: Universität des Saarlandes, www.coli.uni-sb.de/cl/projects/tiger.

Chomsky, N. (1959). On Certain Formal Properties of Grammars. Information and Control (2), 137–167.

Eisenberg, P. (1986). Grundriß der deutschen Grammatik. Stuttgart: Metzler.

Gibson, E. (1998). Linguistic Complexity: Locality of Syntactic Dependencies. Cognition (68), 1–76.

Hawkins, J. A. (1994). A Performance Theory of Order and Constituency.Cambridge: Cambridge UP.

Köhler, R. (1999). Syntactic Structures: Properties and Interrelations. Journal of Quantitative Linguistics 6 (1), 46–57.

Konieczny, L. (2000). Locality and Parsing Complexity. Journal of Psycholinguistic Research 29 (6), 627–645.

Lewis, R. L. (1996). Interference in Short-term Memory: The Magical Number Two (or Three) in Sentence Processing. Journal of Psycholinguistic Research 25, 93–115.

Schiller, A., S. Teufel, C. Stöckert, and C. Thielen (1999). Guidelines für das Tagging: Kleines und großes Tagset. Stuttgart: Institut für maschinelle Sprachverarbeitung.

Uszkoreit, H., T. Brants, et al. (1998). Studien zur performanzorientierten Linguistik. Kognitionswissenschaft 7 (3), 129–133.






Automating Proof in Non-standard Analysis

Ewen Maclean

Division of Informatics, University of Edinburgh

[email protected]

Abstract. Non-standard analysis provides a framework to carry out formal calculus proofs in a much more intuitive manner than in the ε–δ formulation of Weierstraß. This paper introduces the notions of proof-planning and rippling, and gives an example showing how they can be applied to non-standard analysis to produce readable proofs. We present the related work in the area and give an outline of the proposed work.

1 Introduction

The work proposed in this project is a combination of proof-planning (Bundy 1988) and an area of mathematics known as non-standard analysis (Robinson 1966). Proof-planning is a technique which helps to automate proof by working at an abstracted level. Non-standard analysis allows intuitive calculus proofs to be performed with the assurance of correctness thanks to a solid logical construction of the number system that underlies it, namely the hyperreals. The aim of this work is to develop proof-plans which capture the common patterns of reasoning that take place when proving theorems in non-standard analysis, and thereby to automate their proof. The work presented here has been completed in the λClam higher order proof-planner (Richardson et al. 2000).

We start by giving a brief overview of some concepts in both non-standard analysis and proof-planning; we then present an example proof and discuss some of the related work; finally, we highlight possible future directions for our development.

2 Non-standard analysis

Non-standard analysis is a tool which provides an intuitive yet rigorous alternative to Weierstraß's tricky ε–δ proofs. Abraham Robinson (Robinson

Proceedings of the Sixth ESSLLI Student Session, Kristina Striegnitz (editor). Chapter 17, Copyright © 2001, Ewen Maclean




1966) brought together a number of ideas from mathematical logic to come up with a theory of analysis in the hyperreal domain. By formally introducing an "infinitely close" relation, definitions and proofs become simpler and more intuitive.

As an example of how a definition in non-standard analysis can be simpler and more intuitive than its standard counterpart, consider the standard and the non-standard definitions of a limit.

Definition 1. The limit l of a function f at a point a is defined as:

    lim_{x→a} f(x) = l  ≡  (∀ε > 0)(∃δ > 0)(∀x). 0 < |x − a| < δ → |f(x) − l| < ε

Proofs which make use of this definition are usually very difficult to automate, on account of the alternating quantifiers, which mean that instantiations for δ have to be guessed early on in the proof. Theorem 1 gives us an alternative characterisation of a limit making use of the infinitely close relation ≈.

Theorem 1. The limit l of a function f at a point a can equivalently be defined in non-standard analysis by:

    lim_{x→a} f(x) = l  ⟺  ∀x. x ≈ *a ∧ x ≠ *a → f*(x) ≈ *l

As this theorem expresses the notion of limit over the hyperreals, denoted *R, a few points are worth mentioning: the function f* corresponds to the original function f extended to accept hyperreal arguments, whereas *a corresponds to the embedding of the real value a in the hyperreals. These notions are introduced for correctness and will not be examined further here. Only an intuitive understanding of their behaviour is needed for the rest of this paper, although the interested reader may find out more by examining (Robinson 1966).

This theorem gives us a simple non-standard characterisation of a limit which formalises the intuition that underlies the ε–δ formulation of a limit. For a proof of theorem 1 the reader is urged to consult (Fleuriot and Paulson 2000), for example. We give a demonstration of the intuitive nature of proofs involving limits in section 4.
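As a small illustration of how Theorem 1 is typically used (our own example, not taken from the paper), consider verifying lim_{x→2} x² = 4:

```latex
% Our illustration (not from the paper): lim_{x -> 2} x^2 = 4 via Theorem 1.
% Take any hyperreal x with x \approx {}^{*}2 and x \neq {}^{*}2, i.e.
% x = {}^{*}2 + \epsilon for some nonzero infinitesimal \epsilon.  Then
\[
  f^{*}(x) = ({}^{*}2 + \epsilon)^{2}
           = {}^{*}4 + 4\epsilon + \epsilon^{2}
           \approx {}^{*}4 ,
\]
% since 4\epsilon + \epsilon^{2} is again infinitesimal.  Hence
% f^{*}(x) \approx {}^{*}4 for every x \approx {}^{*}2 with x \neq {}^{*}2,
% and Theorem 1 yields the limit.  No alternating quantifiers over
% \epsilon and \delta need to be instantiated at any point.
```

The contrast with Definition 1, where a witness for δ must be guessed before the algebra can even begin, is exactly what makes the non-standard route attractive for automation.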

3 Proof-planning

Proof planning (Bundy 1988) is a technique for devising an overall plan for a proof, which can then be used to guide the proof search itself. A proof-plan consists of methods and critics. The methods embody common patterns within proofs, such as the use of induction, and the associated tactics carry out these methods explicitly in the object level prover. For each conjecture a precise proof-plan is built, but a general form of a proof-plan can be




developed for certain classes of conjecture. As described by (Ireland 1992), a method in proof planning has a number of "slots" assigned to it which correspond to a name by which it is recognised, an input which is matched against the current goal, and a set of preconditions which determine whether the method is applicable to the current goal. The method also has a set of effects which define what happens to the goal after it has been applied, and a tactic which controls the object level prover, in which the plan can be executed. Finally, it has an output which is the result of applying the effects to the current goal.

3.1 Critics

The use of inductive theorem provers such as the Boyer-Moore Theorem Prover (Boyer and Moore 1979) shows that failed proofs can provide useful information about failure, and hence can help to suggest ways to render proofs successful. One piece of useful information that could come out of a failed proof attempt is a suggestion of how to pre-process the conjecture in some way to help the proof succeed. In the context of proof-planning, a critic is a component which analyses this failure. Ireland and others (Ireland 1992) have done considerable work in this area, which the interested reader is urged to consult.

3.2 Rippling

Rippling is a heuristic used in proof-planning for guiding the proof search. It was initially motivated by Aubin's observation (Aubin 1976) on how terms introduced by induction are affected by rewriting. Bundy formalised this idea into a theory of annotated rewriting (Bundy et al. 1993), and a formal calculus has been developed from which one can prove termination (Basin and Walsh 1996). Crucially, it has been shown that rippling can be extended to non-inductive settings by Yoshida (Yoshida et al. 1994), Walsh (Walsh et al. 1992) and Hutter (Hutter 1997). This is of importance to the current work.

Rippling annotates the conclusion to indicate the difference between the hypotheses and the conclusion. The idea is that the extra structure which exists in the conclusion can be annotated, and one can use this annotation to reason about how the proof is proceeding. The basic properties of an annotation are directed wave fronts, wave holes and sinks. Wave fronts, wave holes and sinks are the meta annotation which encapsulates terms. Wave fronts must include wave holes. Sinks correspond to universally quantified variables in the hypotheses. We shall now briefly explain the important properties and definitions of rippling. For a more formal definition of rippling see (Basin and Walsh 1996).

Well annotated terms All wave fronts must contain at least one wave hole, and the term within the wave hole is well annotated. All the terms within the wave front, but outside the wave hole, are unannotated. Unannotated terms are well annotated.

Skeletons The skeleton of an expression is the set of all expressions that are built up from the unannotated term and the possible wave holes within each wave front.

Erasure The erasure of a well annotated term is the term without any annotation.

For example, consider the expression f(x + y). Writing wave fronts as ⌈...⌉ with a direction arrow and wave holes between underscores, examples of possible annotations which render it a well annotated term are:

f(x + y),   f(⌈ _x_ + y ⌉↑),   ⌈ f(_x + y_) ⌉↑,   ⌈ f(_x_ + _y_) ⌉↑,   f(⌈ _x_ + y ⌉↓),   ⌈ f(_x + y_) ⌉↓

The erasure of all of these terms is simply the original expression f(x + y).

As an example, the skeleton of the term ⌈ f(_x_) ⌉↑ + y is x + y, because this is the expression built up from the term in the wave hole and the unannotated part. Similarly, the skeleton of ⌈ f(_x_ + _y_) ⌉↑ is {x, y}, because x and y are both wave holes of the same wave front, so the set of possibilities is constructed. The reason for these definitions will become clear when considering the properties of rippling.
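The notions of skeleton and erasure are purely structural, so they can be illustrated with a small executable sketch. The following Python fragment is only an illustration of the definitions above; the encoding of terms, wave fronts and wave holes as nested tuples is our own and is not the λClam implementation:

```python
from itertools import product

# Toy encoding: a term is a string (atom) or a tuple (fn, arg, ...);
# a wave front is ("wf", direction, body) and a wave hole inside it
# is ("hole", term).

def holes(t):
    """Immediate wave holes of a front's body (not inside nested fronts)."""
    if isinstance(t, str) or t[0] == "wf":
        return []
    if t[0] == "hole":
        return [t[1]]
    return [h for arg in t[1:] for h in holes(arg)]

def skeletons(t):
    """All skeletons: each wave front contributes one of its wave holes."""
    if isinstance(t, str):
        return {t}
    if t[0] == "wf":
        return set().union(*(skeletons(h) for h in holes(t[2])))
    if t[0] == "hole":
        return skeletons(t[1])
    parts = [skeletons(arg) for arg in t[1:]]
    return {(t[0],) + combo for combo in product(*parts)}

def erase(t):
    """Strip all annotation, recovering the unannotated term."""
    if isinstance(t, str):
        return t
    if t[0] in ("wf", "hole"):
        return erase(t[-1])
    return (t[0],) + tuple(erase(arg) for arg in t[1:])

# ⌈f(_x_)⌉↑ + y : skeleton {x + y}, erasure f(x) + y
t1 = ("+", ("wf", "up", ("f", ("hole", "x"))), "y")
# ⌈f(_x_ + _y_)⌉↑ : skeleton {x, y}, erasure f(x + y)
t2 = ("wf", "up", ("f", ("+", ("hole", "x"), ("hole", "y"))))
```

Running skeletons and erase on t1 and t2 reproduces the two worked examples above.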

Standard rewrite rules can be annotated to fit in with the rippling scheme. Annotated rewrite rules, called wave rules, have to be both skeleton preserving and measure decreasing. We will not describe here in detail the arguments for the termination of rippling. Skeleton preservation means that for a rewrite rule L ⇒ R, the skeleton of L must be a superset of the skeleton of R. For example (writing wave fronts as ⌈...⌉ with a direction arrow and wave holes between underscores),

⌈ f(_x_) ⌉↑ + y  ⇒  ⌈ f(_x + y_) ⌉↑

⌈ f(_x + y_) ⌉↓  ⇒  ⌈ f(_x_) ⌉↓ + y

are wave rules because the skeleton, x + y, is preserved under the rewrite. It is possible to see that when rewriting an expression the rewrites move the wave fronts around the expression, encapsulating more or less of the term structure according to their direction. The arrows on the wave fronts can be thought of as describing whether the wave front surrounds more or less of the term structure under rewriting. Wave rules that maintain the direction of the arrow are called longitudinal, and those that change the direction of the arrow are called transverse. Transverse rippling is only allowed if the direction of the wave rules moves from "out" to "in" (up to down). When initially annotating a term the wave fronts are all directed "out". This condition helps one to think about termination; intuitively, one can think of the measure monotonically decreasing because a wave front can move out as far as it can go and then move back in if it is desirable, but it cannot move back out once it has started to "ripple in". Wave rules match expressions if the annotation matches with a subterm within the expression.

The properties of rippling are

Well-formedness If a well annotated term is rewritten by a wave-rule thenthe result is a well annotated term.

Skeleton preserving The result of applying a wave rule to a term has a skeleton which is a subset of that of the original term.

Correctness When a term is rewritten by a wave rule the erasure of therewrite corresponds to a rewrite in the unannotated theory.

Termination Rippling terminates.

This has been a brief overview of the calculus of rippling. For a fuller description see (Basin and Walsh 1996), and for a description of the existing work on colouring terms for equational rewriting, which we do not use, see (Hutter 1997).

4 A brief case study

This section presents a plan which has been generated automatically by λClam. Other proofs which have been performed include LIM+, the chain rule for differentiation and the product rule.

A useful example of a limit theorem to consider in the context of non-standard analysis and proof-planning is that of LIM·. This theorem states that the product of the limits of two real-valued functions is equal to the limit of the product of the functions. This conjecture is difficult to automate in standard analysis (Melis et al. 2000), but we show here that the structure of its non-standard counterpart is very simple. This example proof demonstrates the advantage of using non-standard analysis over standard analysis, and shows how existing proof-planning machinery can be used to model the reasoning patterns in non-standard proofs. The standard version of this proof is challenging on paper, and is extremely hard to automate. The ε-δ formulation involves quantifiers which are arranged in such a way that a guess has to be made about the instantiation of quantified variables before enough information is available. The conjecture in standard analysis is stated as

lim_{x→a} f(x) = lf ∧ lim_{x→a} g(x) = lg ⊢ lim_{x→a} f(x) · g(x) = lf · lg


The hypotheses to the conjecture become:

c, lf, lg : R     (17.1)

f, g : R → R     (17.2)

∀x. x ≈ *c ∧ x ≠ *c → f*(x) ≈ *lf     (17.3)

∀x. x ≈ *c ∧ x ≠ *c → g*(x) ≈ *lg     (17.4)

using the non-standard characterisation of a limit. Hypotheses (17.1) and (17.2) describe the typing information of the constants.

The hypotheses (17.3) and (17.4) can now be embedded in the conclusion. This yields the annotated conclusion

x ≈ *c ∧ x ≠ *c → ⌈ _f*(x)_red · _g*(x)_blue ⌉↑ ≈ ⌈ _*lf_red · _*lg_blue ⌉↑

(wave fronts are written ⌈...⌉ with a direction arrow, and wave holes between underscores, labelled here with a colour). The terms belonging to different hypotheses inside the wave fronts are given different colours. The shading around the variable x in the conclusion refers to positions which correspond to universally quantified variables in the hypotheses. In rippling terminology these are referred to as sinks. The wave rules that are available to the system are the following:

[finite(Y) ∧ finite(B)]   ⌈ _X_red · _A_blue ⌉↑ ≈ ⌈ _Y_red · _B_blue ⌉↑  ⇒  ⌈ _X ≈ Y_red ∧ _A ≈ B_blue ⌉↑     (17.5)

A → ⌈ _B_red ∧ _C_blue ⌉↑  ⇒  ⌈ _A → B_red ∧ _A → C_blue ⌉↑     (17.6)

Here when we write A ⇒ B we mean that A rewrites to B, and hence it is true that B → A. The side conditions involving the predicate finite refer to all hyperreal numbers which are not infinite, for example the real numbers.

Wave rule (17.5) represents the continuity of ·, which is itself a complicated theorem to prove in standard analysis, and its use renders this proof very simple indeed. In non-standard analysis it is a challenging problem involving mainly equational reasoning, but it is much simpler than its standard counterpart, as we discuss in section 4.1.

By inspecting the annotation on the conclusion and that on the rewrite rules, the general idea behind rippling can be seen. The wave fronts should increase until the terms in the wave holes can match with the hypotheses. After application of each of these rules, the conclusion becomes

⌈ _x ≈ *c ∧ x ≠ *c → f*(x) ≈ *lf_red ∧ _x ≈ *c ∧ x ≠ *c → g*(x) ≈ *lg_blue ⌉↑

at which point the holes are instances of the hypotheses as required. Now the conclusion contains instances of the hypotheses, and the plan of the proof is complete.

Part of the output from the plan given by λClam showing wave rules (17.5) and (17.6) is given below. The formatting has been changed slightly by hand so as not to take up too much space.


Attempting...
cond_wave_method _841981 cont_times
cond_wave_method outward cont_times
succeeded

X:hyperreal, c:real, f:real → real, g:real → real, lf:real, lg:real
∀ Y:hyperreal. ((Y ≈ emb c) ∧ ¬(Y = emb c)) → ext f Y ≈ emb lf
∀ Y:hyperreal. ((Y ≈ emb c) ∧ ¬(Y = emb c)) → ext g Y ≈ emb lg
⊢ ((X ≈ emb c) ∧ ¬(X = emb c)) → (ext f X ≈ emb lf) ∧ (ext g X ≈ emb lg)

1 :: 1 :: 1 :: 1 :: 1 :: 1 :: 1 :: nil   %% current plan node

Attempting...
patch_meth (wave_method outward _818076) wave_critic_strat
patch_meth (wave_method outward prop_wr1) wave_critic_strat
succeeded

X:hyperreal, c:real, f:real → real, g:real → real, lf:real, lg:real
∀ Y:hyperreal. ((Y ≈ emb c) ∧ ¬(Y = emb c)) → ext f Y ≈ emb lf
∀ Y:hyperreal. ((Y ≈ emb c) ∧ ¬(Y = emb c)) → ext g Y ≈ emb lg
⊢ (((X ≈ emb c) ∧ ¬(X = emb c)) → (ext f X ≈ emb lf)) ∧
  (((X ≈ emb c) ∧ ¬(X = emb c)) → (ext g X ≈ emb lg))

1 :: 1 :: 1 :: 1 :: 1 :: 1 :: 1 :: 1 :: nil

Here emb denotes the embedding of a real number in the hyperreals, and ext the non-standard extension of a real function.

The non-standard proof follows a simple pattern of reasoning which proof-planning is able to model. In this example, we have introduced wave rule (17.5) to the system so that the simplicity of the overall structure is apparent. The annotation gives the proof extra information which can be used to speculate lemmas and proof transformations. For example, consider what would happen if the wave rule (17.5) had not been added to the system. The rewriting mechanism would then fail, prompting a critic to respond to this failure and suggest a new wave rule according to the hypotheses. See section 4.1 for a discussion of this wave rule. This would enable the proof to recover automatically and go through even in the absence of the rewrite rule. The advantage of the annotation in this case is that it can be made explicit which terms belong to which hypotheses, and hence which should be brought closer together.

4.1 The continuity of ·

The following conjecture represents the lemma which has to be speculated, and whose proof must be planned automatically, in order to justify the claim that the construction of a plan for the proof of LIM· has been automated. This work is currently in progress.

⊢ ∀a, b, x, y. M ∧ x ≈ y ∧ a ≈ b → x · a ≈ y · b


In this case the meta-variable M has to be instantiated to the conditions of the rule. After the usual ∀- and implication-introductions, we proceed as follows:

⊢ x · a ≈ y · a ∧ y · a ≈ y · b

from transitivity. We then speculate the following lemma:

M1 ∧ x ≈ y ⊢ x · a ≈ y · a

where M1 is a new meta-variable which must be instantiated to the conditions. Now we use an arithmetic rule to yield the conclusion:

⊢ x · a - y · a ≈ *0

and use the distributive law to write this as:

⊢ (x - y) · a ≈ *0

and finally rewrite this using the following axioms:

X ≈ *0 ∧ finite(Y) → X · Y ≈ *0

*0 + x = x

x + (-x) = *0

to yield:

⊢ finite(a) ∧ x ≈ y

which completes the proof, and M1 is instantiated to the condition finite(a). Using this lemma we can proceed with the original proof. The conclusion was at the state:

x · a ≈ y · a ∧ y · a ≈ y · b

which can now be rewritten as:

finite(a) ∧ x ≈ y ∧ finite(y) ∧ a ≈ b

which completes the proof, with the instantiation of the meta-variable M as finite(a) ∧ finite(y). Including the commutativity of · and the symmetry of ≈ in the original equation, we can also derive new conditions, meaning that the most general condition for M is (finite(x) ∨ finite(y)) ∧ (finite(a) ∨ finite(b)). The use of transitivity in this example is analogous to the use of the triangle inequality in the standard proofs.
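The role of the finite side conditions can be sanity-checked computationally. The Python sketch below models hyperreals very crudely as finite Laurent series in a formal infinitesimal eps; this toy representation is entirely our own and is only for illustration. Here x ≈ y when x - y contains only positive powers of eps, and an element is finite when it has no negative powers. Multiplying two infinitely close elements by a finite a preserves ≈, while multiplying by an infinite element need not:

```python
# Toy hyperreals: {power: coeff} maps, read as sums of coeff * eps^power.
# Negative powers are infinite parts; positive powers are infinitesimal.

def mul(x, y):
    """Multiply two toy hyperreals (convolution of coefficients)."""
    out = {}
    for p, c in x.items():
        for q, d in y.items():
            out[p + q] = out.get(p + q, 0) + c * d
    return {p: c for p, c in out.items() if c}

def infclose(x, y):
    """x ≈ y: the difference has only (strictly) positive powers of eps."""
    diff = {p: x.get(p, 0) - y.get(p, 0) for p in set(x) | set(y)}
    return all(p > 0 or c == 0 for p, c in diff.items())

def finite(x):
    """No infinite (negative-power) part."""
    return all(p >= 0 or c == 0 for p, c in x.items())

x = {0: 1, 1: 1}   # 1 + eps
y = {0: 1}         # 1, so x ≈ y
a = {0: 2}         # finite: 2
b = {-1: 1}        # infinite: 1/eps
```

With these definitions, x · a ≈ y · a holds because a is finite, but x · b and y · b differ by exactly 1, illustrating why the condition finite(a) must be speculated.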


5 Related work

A reasonable amount of work has already been completed in both standard and non-standard analysis. (Bledsoe et al. 1972) is the first piece of work on automating parts of standard analysis; it presents a resolution-based theorem prover which automates certain standard analysis proofs, together with heuristics, such as the limit heuristic, which are specific to analysis proofs. (Bledsoe and Ballantyne 1977) presents the same theorem prover modified to automate proofs in non-standard analysis. The goal of this work was to demonstrate how non-standard analysis allows mathematicians to prove theorems in analysis more easily than by using the ε-δ formulation. The paper presents the methods employed in the prover, and explains how it handles the various types that occur in non-standard analysis. Their work uses a generous axiomatisation, and the prover is able to prove simple results about standard parts of numbers, as well as more significant results such as LIM+, the Bolzano-Weierstrass theorem, and the result that a continuous function on a compact set is uniformly continuous. Their proofs, however, are not as readable as those generated by our proof-planning approach, as they are driven by resolution and the output is linear. Through proof-planning the tree structure of method applications in the proof plan can be seen, and hence the structure of the proof itself is clearer.

In the ΩMEGA proof-planner and mathematical assistant (Benzmüller et al. 1997), much work has been done in developing proof-plans for limit theorems using what is referred to as "multiple strategies". The ΩMEGA proof-planner builds on Bledsoe's work by introducing a "complex estimate" and a constraint solver which carefully manages the introduction of meta-variables in order to calculate the difficult instantiations of the variables in standard proofs. The system is able to prove a substantial number of theorems at present, and a proof presentation system has been included to make the proofs readable. The work is documented in (Melis et al. 2000) and (Melis and Siekmann 1999), for example.

The Mathpert system developed by Beeson is a mathematical assistant which is also capable of automatically proving properties about functions in standard analysis such as continuity (Beeson 1998). Often the proofs involve evaluating limits, and Beeson found that this evaluation could be simplified using some axioms from non-standard analysis (Beeson 1995).

The ACL2 theorem prover originally did not include a theory for reasoning about conjectures over the reals, but Gamboa (Gamboa 1999) used Nelson's Internal Set Theory (Nelson 1977) to extend the prover to automatically prove theorems about real-valued functions.

The most significant work in this area is that of Fleuriot (Fleuriot and Paulson 2000) in the interactive theorem prover Isabelle/HOL. This work is the first to mechanise the construction of the hyperreals using the ultrapower construction used by Robinson to develop non-standard analysis. A substantial amount of real analysis has also been proved in the hyperreal domain using this construction. It also highlights that there are proofs in non-standard analysis which, though simpler than their standard counterparts, cannot be done automatically. We believe that our proposed approach should enable these to be planned automatically.

Other work in progress that is related to the current work is that of Heneveld (Heneveld et al. 2000). This work models the cognitive processes of mathematicians solving integration problems. The λClam proof-planner is also being used to do this, and many of the rules used, such as integration by parts, can potentially be proved using the work proposed here.

6 Proposed work

The work we propose in this project involves generating plans for proofs in non-standard analysis automatically, and thence executing them in an object-level theorem prover. The methodology is one of genericity and automation. We have compiled a set of minimal axioms for non-standard analysis from which to derive all of the rules we require, and we hope to build a generic proof-plan whose initial state will be constant, but whose eventual shape will be transformed by the addition of intelligent critics. In addition to this, we have introduced a method for unfolding axioms by introducing such notions as field homomorphism. For example, the reals and the hyperreals each form a field, so it is sensible to express field axioms in such a way that the axioms specific to each type can be generated in a generic manner. This is similar to the work done in Isabelle on Axiomatic Type Classes (Wenzel 2000).

Part of the work still to be completed involves the collection of a set of conjectures with which to test the system. The initial area of interest lies within limit and continuity conjectures, which have already been attempted in standard analysis, for example in the ΩMEGA system. Some of these proofs have already been performed as a trial in an earlier version of λClam. Also of interest are fundamental proofs about calculus such as the chain rule, the product rule, and the fundamental theorem of calculus. The former two have also been attempted in the earlier version of λClam with some success.

7 Conclusion

The main aim of this project is to gain an understanding of the structure of proofs in non-standard analysis. Proof-planning allows this to be done in a declarative way by formally specifying plans. We hope that this will show that non-standard analysis is a better way of going about proving theorems in analysis in general, and also, perhaps, that it would be a more sensible alternative for teaching purposes than the infamous ε-δ proofs of standard analysis.


References

Aubin, R. (1976). Mechanizing Structural Induction. Ph.D. thesis, University of Edinburgh.

Basin, D. and T. Walsh (1996). A calculus for and termination of rippling. Journal of Automated Reasoning 16(1-2), 147-180.

Beeson, M. (1995). Using nonstandard analysis to verify the correctness of computations. International Journal of Foundations of Computer Science 6(3), 299-338.

Beeson, M. (1998). Automatic generation of epsilon-delta proofs of continuity. Artificial Intelligence and Symbolic Computation, 67-83. Lecture Notes in Artificial Intelligence No. 1476.

Benzmüller, C., L. Cheikhrouhou, D. Fehrer, A. Fiedler, X. Huang, M. Kerber, K. Kohlhase, A. Meier, E. Melis, W. Schaarschmidt, J. Siekmann, and V. Sorge (1997). ΩMEGA: Towards a Mathematical Assistant. In W. McCune (Ed.), 14th International Conference on Automated Deduction, pp. 252-255. Springer-Verlag.

Bledsoe, W. W. and A. M. Ballantyne (1977). Automatic Proofs of Theorems in Analysis Using Nonstandard Techniques. Journal of the Association for Computing Machinery 24(3), 353-374.

Bledsoe, W. W., R. S. Boyer, and W. H. Henneman (1972). Computer Proofs of Limit Theorems. Artificial Intelligence 3, 27-60.

Boyer, R. S. and J. S. Moore (1979). A Computational Logic. Academic Press. ACM monograph series.

Bundy, A. (1988). The Use of Explicit Plans to Guide Inductive Proofs. In R. Lusk and R. Overbeek (Eds.), 9th International Conference on Automated Deduction, pp. 111-120. Springer-Verlag. Longer version available from Edinburgh as DAI Research Paper No. 349.

Bundy, A., A. Stevens, F. van Harmelen, A. Ireland, and A. Smaill (1993). Rippling: A Heuristic for Guiding Inductive Proofs. Artificial Intelligence 62, 185-253. Also available from Edinburgh as DAI Research Paper No. 567.

Fleuriot, J. D. and L. C. Paulson (2000). Mechanizing Nonstandard Real Analysis. LMS Journal of Computation and Mathematics 3, 140-190.

Gamboa, R. (1999). Mechanically Verifying Real-Valued Algorithms in ACL2. Ph.D. thesis, The University of Texas at Austin.

Heneveld, A., E. Maclean, A. Bundy, J. Fleuriot, and A. Smaill (2000). Towards a Formalisation of College Calculus. In Proceedings of the 2000 Calculemus Symposium: Systems for Integrated Computation and Deduction.


Hutter, D. (1997). Colouring Terms to Control Equational Reasoning. Journal of Automated Reasoning 18, 399-442.

Ireland, A. (1992). The Use of Planning Critics in Mechanizing Inductive Proofs. In A. Voronkov (Ed.), International Conference on Logic Programming and Automated Reasoning (LPAR 92), St. Petersburg, Lecture Notes in Artificial Intelligence No. 624, pp. 178-189. Springer-Verlag. Also available from Edinburgh as DAI Research Paper 592.

Melis, E. and J. Siekmann (1999). Knowledge-Based Proof Planning. Artificial Intelligence 115(1), 65-105.

Melis, E., J. Zimmer, and T. Müller (2000). Extensions of Constraint Solving for Proof Planning. In European Conference on Artificial Intelligence.

Nelson, E. (1977). Internal set theory: A new approach to nonstandard analysis. Bulletin of the American Mathematical Society 83. Available from http://www.math.princeton.edu/~nelson/books.html.

Richardson, J., L. Dennis, J. Gow, and M. Jackson (2000). User/Programmer Manual for the λClam proof planner.

Robinson, A. (1966). Non-standard Analysis. North-Holland Publishing Company, Amsterdam. Studies in Logic and the Foundations of Mathematics.

Walsh, T., A. Nunes, and A. Bundy (1992). The Use of Proof Plans to Sum Series. In D. Kapur (Ed.), 11th International Conference on Automated Deduction, pp. 325-339. Springer-Verlag. Lecture Notes in Computer Science No. 607. Also available from Edinburgh as DAI Research Paper 563.

Wenzel, M. (2000). Using Axiomatic Type Classes in Isabelle. Available from http://isabelle.in.tum.de/doc/.

Yoshida, T., A. Bundy, I. Green, T. Walsh, and D. Basin (1994). Coloured rippling: An extension of a theorem proving heuristic. In A. G. Cohn (Ed.), Proceedings of ECAI-94, pp. 85-89. John Wiley.


On LPetri nets

Kundan Misra

Department of Computer Science, University of Warwick, Coventry CV4 7AL, UK

[email protected]

Abstract. LPetri nets are coloured Petri nets where tokens are formulae from linear logic. LPetri nets are derived from the linear logic Petri nets of Farwer (1996). LPetri nets are convenient for concisely modelling complex systems. A theory of simulations and an intuitionistic linear logic specification language were developed for Petri nets in (Brown et al. 1991). We will extend these to LPetri nets.

1 Introduction

Petri nets are a system modelling schema that has been much used in many branches of engineering since the 1960s. A Petri net consists of conditions and events which are connected by directed, weighted arcs. Diagrammatically, conditions are represented by circles and events by rectangles. Conditions contain tokens, and the "pre-condition" of an event is the number of tokens needed in each condition for the event to be enabled. The pre-condition of an event is indicated by the weights of the incoming arcs. An event can occur when its pre-condition is satisfied, causing its post-condition:

[Figure: a Petri net before and after an event occurs. An arc of weight 2 leads from condition1 to the event, and an arc of weight 3 from the event to condition2; when the event occurs, tokens are removed from condition1 and added to condition2 according to these weights.]
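The firing rule just described is easy to state operationally. The following Python sketch is our own illustration (markings are maps from condition names to token counts; the names are taken from the example net above):

```python
# pre/post give, for each event, the tokens required from / produced
# into each condition (the arc weights of the net).
pre  = {"event": {"condition1": 2}}
post = {"event": {"condition2": 3}}

def enabled(marking, event):
    """The pre-condition is satisfied: enough tokens in every input condition."""
    return all(marking.get(c, 0) >= n for c, n in pre[event].items())

def fire(marking, event):
    """Occurrence of an event: consume the pre-condition, cause the post-condition."""
    assert enabled(marking, event)
    m = dict(marking)
    for c, n in pre[event].items():
        m[c] = m[c] - n
    for c, n in post[event].items():
        m[c] = m.get(c, 0) + n
    return m

m0 = {"condition1": 2, "condition2": 0}
m1 = fire(m0, "event")   # m1 == {"condition1": 0, "condition2": 3}
```

After firing, the event is no longer enabled, since condition1 has been emptied.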

Coloured Petri nets (CPNs) allow tokens to be of different "colours". Each condition has a set of allowable colours called a colour set. Each condition contains a multiset of coloured tokens. (A multiset or "bag" is a set in which repetition of elements is allowed.) Any event in a CPN may have a "guard function" associated with it. This allows the designer to specify complex criteria on the incoming arcs.

In (Farwer 1996), a type of CPN was devised in which the colour set of each condition is the set of linear logic (LL) formulae, so each LL formula is treated as a unique colour.

Proceedings of the Sixth ESSLLI Student Session, Kristina Striegnitz (editor). Chapter 18, Copyright © 2001, Kundan Misra


The important characteristic of LL, which makes it well-suited to computer science, is that premises are treated as resources. This contrasts with classical logic, where if we know that a premise holds true, then it is assumed to always be true. In a sense, classical logic provides an infinite "supply" of a true premise. For example, suppose A ⊢ B and A ⊢ C. Then in LL, if we are given A we can have either B or C, but not both.

LL is called a "substructural logic" because it lacks some of the rules of classical logic. In particular, LL does not have the rules of contraction and weakening. Omission of the contraction rule means that if two "copies" of the same premise are required, then one copy is not sufficient. For example, in LL if A, A ⊢ B, then given only A, we cannot deduce B. On the other hand, in LL if we are given that A, B ⊢ D, then we cannot say that A, B, C ⊢ D. Rather, if we have A, B ⊢ D, then we can only say A, B, C ⊢ D, C. The paper (Girard 1987) is the detailed seminal work on LL. Also see (Troelstra 1993) for a more concise overview of linear logic.
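The resource reading of premises can be illustrated with a small executable sketch. The Python fragment below is our own toy model, treating the available premises as a multiset; it shows a single copy of A being spent on A ⊢ B so that A ⊢ C can no longer fire, and that without contraction A, A ⊢ B really needs two copies of A:

```python
from collections import Counter

def apply_rule(resources, premises, conclusion):
    """Use a linear rule once: consume its premises, produce its conclusion.
    Returns None if some required copy of a premise is missing."""
    need, have = Counter(premises), Counter(resources)
    if any(have[p] < n for p, n in need.items()):
        return None
    return have - need + Counter([conclusion])

# One A can be spent on A |- B or on A |- C, but not on both:
r1 = apply_rule(["A"], ["A"], "B")       # the A is consumed, leaving B
r2 = apply_rule(r1, ["A"], "C")          # fails: no A left
# Without contraction, A, A |- B needs two genuine copies of A:
r3 = apply_rule(["A"], ["A", "A"], "B")  # fails with only one A
```

The same one-shot consumption of resources is exactly what makes LL formulae natural as Petri net tokens.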

Farwer calls the CPNs in which tokens are LL formulae linear logic Petri nets (LLPNs). We will consider a restricted form of LLPNs called LPetri nets, the difference being that LPetri nets do not have guard functions. The aim is to understand LPetri nets first, and advance to LLPNs in a later paper. Omitting the guard function gives a construct similar to standard Petri nets, making a categorical treatment more tractable.

Neither LPetri nets nor LLPNs fit the standard description of CPNs from (Jensen 1997), because they allow the tokens (LL formulae) to interact in the conditions under the proof rules of LL. So the marking of a net following the occurrence of an event is generally not the same as the tokens which determine whether the pre-condition of an event is satisfied. Suppose the LL formulae J ⊸ K and J enter a condition, b. Then J and J ⊸ K are "consumed" to produce K.

In (Brown et al. 1991), an intuitionistic linear logic (ILL) proof theory was developed for standard Petri nets, together with a categorical semantics. Just as intuitionistic logic is classical logic without the law of the excluded middle, ILL is LL without it. (Also see (Troelstra 1993) for an overview of ILL.) We will show that this proof theory can be extended from Petri nets to LPetri nets.

The diagram below shows how earlier work relates to this paper:

[Diagram: Petri nets are given a categorical definition by Marti-Oliet and Meseguer, and an ILL proof theory by Brown, Gurr and de Paiva; coloured Petri nets are extended to LLPNs by Farwer; this paper relates LPetri nets to LLPNs and coloured Petri nets, and extends the categorical definition and the ILL proof theory to LPetri nets.]


To motivate this work, we must show that LPetri nets can model real-world systems. We will do this with examples. To justify the categorical approach, (Brown et al. 1991) argue that: structures derived from the category of Petri nets yield constructors on Petri nets that are required for a modular approach to Petri nets; if the category of Petri nets has an appropriate structure, we get a logic for reasoning about Petri nets; expressing models of concurrency as categories allows us to explore relationships between models by studying functors between the categories; and the generality often makes it easy to modify the structures being studied. Also, (Sassone 2000) says that the appeal of a categorical approach to Petri nets is that it "tends to devise neat algebraic structures that capture the essential nature of the class of systems considered".

For clarity of numbering, no distinction has been made between propositions, lemmas and theorems. All are called "proposition". Also, proofs have been omitted. Proofs will be included in a full journal version of this paper.

2 LPetri nets

An LPetri net is a coloured Petri net in which the colour sets are the formulae of a fragment of LL (with connectives including ⊗, +, &, ⊸, ? and !) on finite alphabets. The set of all such formulae is denoted L. We base the definition of the category of LPetri nets, LPetri, on the definition of Petri given by (Marti-Oliet and Meseguer 1991).

Definition 2.1. (a) An LPetri net LN is a 4-tuple ⟨E, (B)^L, pre, post⟩, where (B)^L is the set of functions from L to the free commutative monoid on B. The set of conditions is B and the set of events is E. We define pre and post as functions pre, post : E → (B)^L.

(b) Let LN′ = ⟨E′, (B′)^L, pre′, post′⟩. Then ⟨f, g⟩ : LN → LN′ is a morphism of LPetri nets if

(i) f is a function f : E → E′ and

(ii) g : B → B′ is a monoid homomorphism such that the squares

    E --pre--> (B)^L          E --post--> (B)^L
    |f         |g             |f          |g
    v          v              v           v
    E′ --pre′--> (B′)^L       E′ --post′--> (B′)^L

commute, i.e. pre′ ∘ f = g ∘ pre and post′ ∘ f = g ∘ post.

The category LPetri has objects all LPetri nets and arrows all LPetri net morphisms.
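For finite nets, the commuting-square condition of Definition 2.1 can be checked mechanically. The sketch below is an illustration in Python using our own simplified encoding (pre and post map each event to a multiset over conditions, eliding the L-indexed monoid structure); it tests pre′ ∘ f = g ∘ pre and post′ ∘ f = g ∘ post for a one-event net:

```python
from collections import Counter

def lift(g0):
    """Lift a condition map g0 : B -> B' to a monoid homomorphism on multisets."""
    def g(m):
        out = Counter()
        for b, n in m.items():
            out[g0[b]] += n
        return out
    return g

def is_morphism(f, g0, net, netp):
    """Check that both squares commute for every event of the source net."""
    g = lift(g0)
    return all(netp["pre"][f[e]] == g(net["pre"][e]) and
               netp["post"][f[e]] == g(net["post"][e])
               for e in net["pre"])

# A one-event net and its image: e consumes from b1, produces into b2.
net  = {"pre": {"e": Counter({"b1": 2})},  "post": {"e": Counter({"b2": 1})}}
netp = {"pre": {"e'": Counter({"b1'": 2})}, "post": {"e'": Counter({"b2'": 1})}}
f, g0 = {"e": "e'"}, {"b1": "b1'", "b2": "b2'"}
```

Here is_morphism(f, g0, net, netp) holds, while changing an arc weight in netp breaks one of the squares.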


3 LPetri net simulations

We must first introduce an alternative definition of LPetri net. Unfortunately, we must redefine pre and post.

Definition 3.1. An LPetri net is a 4-tuple ⟨E, B, pre, post⟩ where E and B are sets, and pre and post are functions or "multirelations" E × B → ℕ^L.

We call pre(e) the pre-condition set of e and post(e) the post-condition set of e. We can extend the pre- and post-conditions to multisets of events in the obvious way, to enable multisets of events to occur simultaneously.

If M is a marking of LN, then the pair ⟨LN, M⟩ is the LPetri net with marking M. We say that the marking ⟨LN, M⟩ enables A, denoted M #LN A, if for each b ∈ B and each l ∈ L, the definition of "enables" for a standard Petri net is satisfied. Formally, for each b ∈ B and each l ∈ L, we require Σ_{e∈A} pre_l⟨e, b⟩ ≤ M(b). Here, we are using ≤ in the sense of multisets.

If LN evolves under A from the marking M to M′, we write M ;A M′. This happens if for each l ∈ L, M′ = (M - pre_l(A)) + post_l(A). We then say that the events of A occur concurrently.
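These two conditions, enabling and evolution, can be phrased executably. In the Python sketch below (our own encoding, not from the paper: a marking maps each condition to a Counter of formula strings, and pre[e][b] plays the role of pre⟨e, b⟩ collected over all l ∈ L), a multiset of events A is enabled when its summed pre-conditions fit inside the marking, and evolution subtracts them and adds the post-conditions:

```python
from collections import Counter

def contains(big, small):
    """Multiset inclusion: small <= big pointwise."""
    return all(big[t] >= n for t, n in small.items())

def enables(marking, pre, A):
    """M #LN A: summed pre-conditions of the event multiset A fit inside M."""
    need = {}
    for e in A:
        for b, toks in pre[e].items():
            need[b] = need.get(b, Counter()) + toks
    return all(contains(marking.get(b, Counter()), toks)
               for b, toks in need.items())

def evolve(marking, pre, post, A):
    """M ;A M': subtract all pre-conditions, add all post-conditions."""
    m = {b: Counter(c) for b, c in marking.items()}
    for e in A:
        for b, toks in pre[e].items():
            m[b] = m[b] - toks
        for b, toks in post[e].items():
            m[b] = m.get(b, Counter()) + toks
    return m

# Toy event t moving the tokens J and J -o K from p1 to a K in p2:
pre  = {"t": {"p1": Counter({"J": 1, "J -o K": 1})}}
post = {"t": {"p2": Counter({"K": 1})}}
m0 = {"p1": Counter({"J": 1, "J -o K": 1}), "p2": Counter()}
```

After evolving m0 under ["t"], p1 is empty and p2 holds K, and t is no longer enabled.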

3.1 The category GLNet

Definition 3.2. The category GLNet has objects all LPetri nets and arrows ⟨E, B, pre, post⟩ → ⟨E′, B′, pre′, post′⟩ all pairs of functions ⟨f, F⟩, where f : E → E′ and F : B′ → B, such that the following squares weakly commute in the direction of the partial order signs; composition is function composition of each component in the pair.

    E × B′ --1×F--> E × B            E × B′ --1×F--> E × B
    |f×1     ≥      |pre             |f×1     ≤      |post
    v               v                v               v
    E′ × B′ --pre′--> ℕ^L            E′ × B′ --post′--> ℕ^L

That is, pre ∘ (1×F) ≥ pre′ ∘ (f×1) and post ∘ (1×F) ≤ post′ ∘ (f×1), as maps E × B′ → ℕ^L.

We define a partial ordering on ℕ^L by: if f, g ∈ ℕ^L, then f ≤ g if and only if f(l) ≤ g(l) for all l ∈ L.

Definition 3.3. For an LPetri net morphism ⟨f, F⟩ : LN → LN′, the pair of markings ⟨M, M′⟩ is F-ok if MF ≤ M′.

Suppose all pairs ⟨Mi, M′i⟩ are F-ok for 0 ≤ i ≤ n + 1. If an evolution of LN steps through the Mi from M0 to Mn+1, then the image of the evolution under f is its simulation in LN′. A simulation of an evolution of an LPetri net is essentially a copy of the evolution in another LPetri net.

206

Page 214: Kristina Striegnitz › esslli › courses › readers › studentsession.pdfNob o Komagata, Geert-Jan Kruij , Iv ana Kruij -Korba y o v a, ii Daniela Kurz, Celine Kuttler, F ran cois

Kundan Misra

The following proposition is adapted from Proposition 6.4 of (Brown et al. 1991).

Proposition 3.1. Let ⟨LN, M⟩ and ⟨LN′, M′⟩ be marked LPetri nets, and let ⟨f, F⟩ : ⟨LN, M⟩ → ⟨LN′, M′⟩ be a morphism in GLNet such that ⟨M, M′⟩ is F-ok. Then for all e ∈ E, M #LN e implies that M′ #LN′ f(e).

This means that hLN;Mi can simulate any one-step evolution of hLN 0;M 0i.The following variation of Theorem 6.6 of (Brown et al. 1991) tells us whenhLN;Mi can simulate any evolution of hLN 0;M 0i.

Proposition 3.2. Let ⟨LN, M⟩ and ⟨LN′, M′⟩ be marked nets, and let ⟨f, F⟩ : LN → LN′ be an arrow in GLNet. If ⟨M_0, M′_0⟩ is F-ok and M_0 ⟶_e M_1, then M′_0 ⟶_{f(e)} M′_1 and ⟨M_1, M′_1⟩ is F-ok.

3.2 Example: Apple cake-baking

We now have the machinery to apply the simulation ideas from (Brown et al. 1991) to the apple cake-baking example of (Farwer 1996).

We have modified the LLPN which models an apple cake-baking process in (Farwer 1996) to an LPetri net. The process includes the making of dough, the mixing of apple pieces and baking at the correct temperature. LN′ models this in detail, while LN models the process with less granularity, though LN only does things that LN′ also does. In (Brown et al. 1991), simulations of Petri nets are given, and here we will give a simulation of LPetri nets LN → LN′.

Let LN′ = ⟨E′, B′, pre′, post′⟩, where E′ = {t1, t2, t3, t4} and B′ = {p1, p2, p3, p4, p5}.

Symbols: I_j - some ingredient, A - apple, D - dough, H - heat in degrees Celsius, Q - apple quarter, R - ready to bake, C - cake.

[Diagram: the net LN′. Event t1 consumes from p1 and event t2 consumes from p2, both depositing into p3; t3 takes p3 to p4; t4 consumes from p4, returning a token to p4 and depositing into p5.]

M′_0 is defined by M′_0(p1) = {⊗_{j∈J} I_j}, M′_0(p2) = {A³}, M′_0(p4) = {200H} and M′_0(p3) = M′_0(p5) = ∅. (Note: we abbreviate n copies of A in A ⊗ ⋯ ⊗ A to Aⁿ.) The events are defined by pre′(t1) = {⊗_{j∈J} I_j}, post′(t1) = {D}, pre′(t2) = {A}, post′(t2) = {Q⁴}, pre′(t3) = {D ⊗ Q¹²}, post′(t3) = {R}, pre′(t4) = {R ⊗ 200H} and post′(t4) = {{200H}_{p4}, {C}_{p5}}.

Suppose we let M′_0(p1) = {⊗_{j∈J} I_j, (I_{i0} ⊗ I_{i1}) ⊸ P}. The symbol P may stand for poison. Then the term (I_{i0} ⊗ I_{i1}) ⊸ P models a contaminant which makes it impossible for dough to be produced. This could also be modelled


in a CPN with guard functions, though not in so intuitively obvious a way. Let LN = ⟨E, B, pre, post⟩, where E = {u1, u2} and B = {q1, q2, q3}.

[Diagram: the net LN. Event u1 takes q1 to q2; u2 consumes from q2, returning a token to q2 and depositing into q3.]

M_0 is defined by M_0(q1) = {D ⊗ Q¹²}, M_0(q2) = {200H} and M_0(q3) = ∅. The events are defined by pre(u1) = {D ⊗ Q¹²}, post(u1) = {R}, pre(u2) = {R ⊗ 200H}, post(u2) = {{200H}_{q2}, {C}_{q3}}.

Define f : E → E′ by f : u1 ↦ t3, f : u2 ↦ t4, and define F : B′ → B by F : {p1, p2, p3} ↦ q1, F : p4 ↦ q2, F : p5 ↦ q3.

Then ⟨f, F⟩ : LN → LN′ is a morphism in GLNet. By Proposition 3.2, for any F-ok pair of markings ⟨M, M′⟩, the net LN′ can simulate any behaviour of LN with marking M. Consider the pair of markings ⟨M, M′⟩ where M = ({D ⊗ Q¹²}, ∅, ∅) and M′ = ({⊗_{j∈J} I_j, D ⊗ Q¹²}, {A³, D ⊗ Q¹²}, {D ⊗ Q¹²}, ∅, ∅). LN can evolve from M by M ⟶_{u1} (∅, {R}, ∅) ⟶_{u2} (∅, ∅, {C}).

The simulation of this evolution in LN′ is:

M′ ⟶_{t3} ({⊗_{j∈J} I_j, D ⊗ Q¹²}, {A³, D ⊗ Q¹²}, ∅, {R}, ∅) ⟶_{t4} ({⊗_{j∈J} I_j, D ⊗ Q¹²}, {A³, D ⊗ Q¹²}, ∅, ∅, {C}).

4 Example: Message handler

We now consider an extension of the message handler and its more concrete version which was given in (Brown et al. 1991) and (Brown and Gurr 1992). Here, we add features beyond what (Brown et al. 1991) and (Brown and Gurr 1992) gave, so as to illustrate the practical utility of LPetri nets. This example illustrates the type of simulation that is possible in GLNet as a result of Proposition 3.2. The approach is similar to that in the apple cake-baking example just given.

In the simple message handler N, when the sender is ready, characters are sent by M to ME, which then ensures that the characters are delivered. The end of a message is indicated by ENDMESS, which disables ME. The message handler N′ does everything that N does, but it also has a timeout feature. When a character is not sent within a stipulated time, the character is resent. Also, if a character is lost in N′, then it is resent.

The simulation morphism N → N′ simply maps B ↦ B′ for each condition B of N and E ↦ E′ for each event E of N.

First consider the "unintelligent" message handler, called N.

Symbols: SR - sender ready, M - message, ME - message sent into ether, DM - deliver message, φ = E ⊸ (N ⊸ (D ⊸ (M ⊸ (E ⊸ (S ⊸ ((S ⊸ σ⊥) ⅋ (S⊥ ⊸ φ)) ⅋ (S⊥ ⊸ φ)) ⅋ (E⊥ ⊸ φ)) ⅋ (M⊥ ⊸ φ)) ⅋ (D⊥ ⊸ φ)) ⅋ (N⊥ ⊸ φ)), where ⅋ ("par") is the LL non-deterministic disjunction.


[Diagram: the message handler N. The condition SR enables the event send, which places φ and the flag σ in ME; characters x pass from M through the event char to ME; when x ∧ σ is present, the event deliver moves x to DM and returns σ to ME.]

When the sender is ready, the send event occurs. This sends φ and σ to ME. It is implicit that the receipt of characters (each x is a character) via char is timed with SR. (At least, this synchrony is not accounted for by the model.) When a character arrives at ME and the flag σ is there, then deliver occurs, which sends x to DM and returns the flag σ to ME. This continues until the characters E N D M E S S are received in sequence.

The formula φ is "eroded" by each character in ENDMESS until the flag σ is removed when the last S is received. When the flag σ is removed, the sending of characters is no longer enabled and thus ceases. Note that if one of the letters in "ENDMESS" is read but the next character received is not the subsequent letter of the word "ENDMESS", then the formula φ is refreshed. When send occurs again, placing the flag σ in ME, characters in the message can again be passed.
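Operationally, the erosion of φ behaves like a small string-matching automaton: it advances on each successive letter of ENDMESS, refreshes on a mismatch, and deletes the flag σ when the final S arrives. The following Python sketch is a hypothetical finite-state analogue of that behaviour (it uses a full reset on a mismatch, as in the informal description above; the formula itself may behave more subtly on overlapping prefixes):

```python
END = "ENDMESS"

def step(state, ch):
    """Process one character; state counts matched letters of ENDMESS.
    Returns (new_state, sigma_still_present)."""
    if ch == END[state]:
        state += 1
        if state == len(END):
            return 0, False   # last S received: sigma removed, sending disabled
        return state, True
    return 0, True            # mismatch: phi is refreshed

def send_message(chars):
    """Feed characters until sigma disappears; return (consumed, sigma_alive)."""
    state, alive, n = 0, True, 0
    for ch in chars:
        n += 1
        state, alive = step(state, ch)
        if not alive:
            break
    return n, alive

n, alive = send_message("HIENDMESS")   # sigma dies after the 9th character
```

A stream that never completes ENDMESS, e.g. "ENDMESX", leaves σ in place and sending stays enabled.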

Now consider the more intelligent message handler, called N′.

Symbols: SR′ - sender ready, M′ - message, ME′ - message sent into ether, DM′ - deliver message, ST - start timer, TO - timeout, MC - copy message, L - lose message; φ is the same linear logic expression as above, and σ is a flag used to enable message-sending.

[Diagram: the message handler N′. It extends N (the fragment SR′, send′, M′, char′, ME′, deliver′, DM′ behaves as before) with a timer condition ST, a timeout condition TO, a message-copy condition MC, a loss condition L, and the additional events time, retry, resend, clear and lose, wired as described below.]

The features of N′ that are also in N work in the same way as in N. There are several additional features. When a character is sent by the event char′, the timer is started by a flag being sent to ST, and a copy of the character


x is sent to MC. When the character is successfully sent, through the event deliver′, the flag is deleted from ST by σ⊥, and the flag σ is sent to the clear event so that the character x is removed from MC.

If the timer "times out", then t occurs in ST, so that the time event is enabled by t ∧ σ in ST. The event time then sends σ to TO. We represent the non-sending of the message in the allocated time by the loss of the σ flag in the ME′ condition. So the retry event sends the flag σ to ME′ again. The retry event then restarts the timer by sending the flag σ to ST.

If the character x is lost, then the lose event is enabled by the coming into being of the l flag in ME′. This passes the flag to condition L, which enables the resend event. The resend event "passes" x ∧ σ to ME′ in another attempt to send the character. Note that the timer is not restarted.

5 A Proof System for LPetri Nets

Much of this section follows from (Brown et al. 1991), with variations for our extension from Petri nets to LPetri nets.

5.1 A category of multirelations

Definition 5.1. Let C be a concrete category with finite products, and let N be an object of C equipped with a partial order ≤. The category MNC has as objects the triples ⟨E, B, α⟩, where E and B are objects of C and α : E × B → N is an arrow in C; arrows ⟨E, B, α⟩ → ⟨E′, B′, α′⟩ are the pairs ⟨f, F⟩ of morphisms f : E → E′, F : B′ → B in C such that the following diagram commutes:

    E × B′ --(1×F)--> E × B   --(α)-->  N
    E × B′ --(f×1)--> E′ × B′ --(α′)--> N      with α ∘ (1×F) ≥ α′ ∘ (f×1),

where ≤ is the partial order induced pointwise on C(A, N) by the partial order on N, and composition is composition in C in each component.

The category MNC⁻ is defined in the same way as MNC, except that the partial order sign in the definition of arrows is reversed.

Proposition 5.1. The categories MNC and MNC⁻ are isomorphic.

Evidently, an LPetri net can be regarded as an object of MNLSet × MNLSet⁻, where MNLSet is interchangeable with MNLSet⁻.


5.2 When MNC is a model of linear logic

Proposition 5.2. (From (Brown et al. 1991)) If C is cartesian closed and has finite coproducts, and ⟨N, ≤⟩ is a partial order with a symmetric monoidal closed structure, then MNC has finite products, finite coproducts and a symmetric monoidal closed structure, and is therefore a sound model of linear logic. These are sufficient (not necessary) conditions.

The linear connectives are interpreted as follows: ∧ by product, ⊕ by coproduct, ⊗ by the symmetric monoidal product, ⊸ by the internal hom, ⅋ by a second symmetric monoidal product, and linear negation by the functor (− ⊸ ⊥), where ⊥ is the unit of this second product. See (Brown 1991) for a proof.

Now, Set is cartesian closed. Also, ⟨ℕ^L, ≤⟩ is a closed ordered monoid, since truncated subtraction (where (f ∸ g)(l) = 0 if g(l) > f(l), for f, g ∈ ℕ^L) is right adjoint to addition. Therefore, MNLSet has all finite products and coproducts, and is symmetric monoidal closed. So MNLSet is a sound interpretation of ILL, by Proposition 5.2.
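The residuation behind this closedness can be checked pointwise on ℕ. Writing a ∸ b = max(a − b, 0) for truncated subtraction ("monus"), the adjunction takes the form f + g ≥ h iff f ≥ h ∸ g; a brute-force check of this equivalence on an initial segment of ℕ (a sketch, not part of the paper's formal development):

```python
from itertools import product

def monus(a, b):
    """Truncated subtraction on the natural numbers."""
    return max(a - b, 0)

# Check the residuation law  f + g >= h  <=>  f >= h -. g  pointwise.
adjoint = all(
    (f + g >= h) == (f >= monus(h, g))
    for f, g, h in product(range(6), repeat=3)
)
```

Since the law holds at every point l ∈ L, it holds for ℕ^L under the pointwise order.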

5.3 ILL is a proof system for LPetri nets

The following four results are variations of results 7.1, 7.2, 7.3 and 9.1 from (Brown et al. 1991). They give the result that ILL is a proof system for LPetri nets.

Proposition 5.3. The category GLNet is the pullback in Cat of the forgetful functor U : MNLSet → Set × Set^op along itself. That is, GLNet is the kernel pair of U in Cat.

Proposition 5.4. The functor U strictly preserves the product, coproductand symmetric monoidal closed structure of MNLSet.

Proposition 5.5. The category GLNet has the products, coproducts and symmetric monoidal closed structure induced by those in MNLSet.

Proposition 5.6. Let [[N]] be an interpretation of the LPetri net N in GLNet. If Γ ⊢ N, then there is an arrow [[Γ]] → [[N]].

Proposition 5.6 allows us to perform calculations with LPetri nets, such as combining them by taking products and coproducts, just as though the LPetri nets were symbols in ILL formulae. The result is that we can model complex systems, using LPetri nets, by building up a large LPetri net from simpler "component" LPetri nets. This is what (Brown et al. 1991) are referring to in their title "A linear specification language for Petri nets", and here this specification language has been extended to LPetri nets.


On LPetri nets

6 Conclusion

We have considered a "pared down" version of LLPNs, and given some examples of how they might be useful. The aim has been to illustrate the convenience of using linear logic formulae as tokens, which highlights the usefulness of LLPNs as well as LPetri nets. We have also shown that the theory of simulations for Petri nets (from (Brown et al. 1991)) extends to LPetri nets. We did this by showing that certain properties of GNet also hold for GLNet. Finally, we have shown that the linear specification language for Petri nets, given in (Brown et al. 1991), is also a specification language for LPetri nets.

7 Further work

We would like to extend these results on LPetri nets to LLPNs as much as possible. Second, we would like to explore the connection between Petri nets and other models of concurrency, such as that of (Winskel and Nielsen 1995). Third, we would like to consider other possibilities for CPN colour sets, in particular other substructural logics. Fourth, more study should be done on how to make this theory (more) useful to practitioners who work with Petri nets.

Finally, the author is presently doing work on semantics of data refinement for concurrency. If we consider Petri nets as a model for concurrency, and LPetri nets as a more powerful model for concurrency, then we would like to make our study of the semantics of data refinement encompass the linear specification language of Petri nets and LPetri nets.

References

Brown, C. (1991). Linear Logic and Petri Nets: Categories, Algebra and Proof. Ph.D. thesis, AI Laboratory, University of Edinburgh.

Brown, C. and D. Gurr (1992). Refinement and simulation of nets - a categorical characterization. Lecture Notes in Computer Science 616.

Brown, C., D. Gurr, and V. de Paiva (1991). A linear specification language for Petri nets. Technical Report 363, Computer Science Department, Aarhus University.

Farwer, B. (1996). Towards linear logic Petri nets. Technical report, Faculty of Informatics, University of Hamburg.

Girard, J.-Y. (1987). Linear logic. Theoretical Computer Science 50, 1-102.

Jensen, K. (1997). Coloured Petri Nets, Vol. 1. New York, USA: Springer.


Marti-Oliet, N. and J. Meseguer (1991). From Petri nets to linear logic. Mathematical Structures in Computer Science 1, 69-101.

Sassone, V. (2000). On the algebraic structure of Petri nets. Bulletin of the EATCS 72, 133-148.

Troelstra, A. S. (1993). Substructural Logics, Chapter: Tutorial on linear logic. Clarendon Press.

Winskel, G. and M. Nielsen (1995). Semantic Modelling, Volume 4 of Handbook of Logic in Computer Science, Chapter: Models for concurrency. Oxford Science Publications.


A weakening of chromatic number

Tara Nicholson

Laboratory for Logic and Experimental Philosophy

Simon Fraser University

[email protected]

Abstract. Hadwiger's conjecture (1943), the claim that every graph with chromatic number > n contains a K_{n+1} minor subgraph (for n ∈ ℤ⁺), remains an open problem in graph theory. It has been proven for all n ≤ 5, and for n = 4 it is equivalent to the Four Color Theorem. In this article, the conjecture is generalized to hypergraphs, and the model theory for a class of weakly aggregative modal logics is used to produce a construction with potential applications to a solution of this generalized formulation. More specifically, theorems of the Kn modal systems are used to derive a weakening and logically dual generalization of the "n-uncolourability" of a hypergraph. A contracting operation for hypergraphs is then defined and shown to preserve the weakened n-uncolourability property, relative to a specified input.

1 Introduction

In graph theory, scheduling problems constitute a common application of "chromatic number". For example, given a set Σ of committees, what is the least amount of time required in order to schedule meetings in such a way that no conflicts arise? For a solution, it suffices to (a) form the system G consisting of the pair subsets of Σ whose memberships intersect, and (b) determine the cardinal j of the least partition π of Σ which biparts each element of G. The integer j, known as the chromatic number of G, is then the least number of time periods necessary for consistent scheduling. But there are often occasions in which the consideration of such varying "human" factors as truancy, holidays, etc. makes for greater efficiency. E.g., there might be a smaller partition of Σ on which only elements of G whose common members are ill are a subset of some cell of the partition. In this way, putatively optimal solutions may be improved by determining which elements of the set system formed are (ir)relevant to a given application.

In logical, and particularly non-dialetheic, paraconsistent contexts, similar applications of "chromatic number" arise. Let Σ be a set of sentences of some language, and let H be the system of all minimally inconsistent subsets E of Σ. Then the chromatic number of H is the cardinal k of the least

Proceedings of the Sixth ESSLLI Student Session, Kristina Striegnitz (editor). Chapter 19, Copyright © 2001, Tara Nicholson.


partition of Σ into consistent cells. Preservationist logicians refer to this integer as the (in)coherence level of Σ, l(Σ) (Jennings and Schotch 1989). As Jennings and Schotch have shown (1989), to avoid trivializing the closure of Σ under classical inference, it suffices to restrict inferences to those which follow classically from every element E′ of a system H′ of subsets of Σ, where the chromatic number of H′ > k. In this way, if a sentence α is inferred from Σ, then α follows classically from at least one cell in every partition of Σ into classically consistent subsets; i.e., l(Σ ∪ {α}) = l(Σ).

Unfortunately, however, the practical utility of this inferential strategy is questionable, because the underlying problem, that of forming a set system H′ with chromatic number > k, is probably not even in NP. Otherwise, we would have co-NP = NP, by the NP-completeness of COLOURABILITY. Still, in some contexts it may be possible to form H′ efficiently by weakening the restriction on its chromatic number. Here, rather than determining which elements of the set system formed are (ir)relevant to the application at hand, we are, obversely, considering which partitions of the underlying set Σ are (ir)relevant to the preservation of the desired property, e.g., partitions of Σ which contain (in)consistent cells.

In what follows, a combinatoric formalization of this latter kind of weakening of "chromatic number" is derived from the model theory for a class of weakly aggregative modal logics, viz., the Kn modal logics, which represent the Jennings-Schotch paraconsistent inference relation. To begin, the relationship between (n+1)-ary aggregation and n-uncolourability is explored for the purposes of generalizing a logically dual characterization of chromatic number. This result is then exploited in a construction with potential applications to a solution of a generalized Hadwiger's conjecture for hypergraphs.¹

2 A modal representation of weak uncolourability

The Kn modal system is an (n+1)-ary aggregative theory which adjoins

- [RM] from ⊢ α_i → α_j, infer ⊢ □α_i → □α_j, and

- [RN] from ⊢ α_i, infer ⊢ □α_i

with

- [Kn] ⊢ □p_1 ∧ □p_2 ∧ ... ∧ □p_{n+1} → □ 2/(n+1)(p_1, p_2, ..., p_{n+1}),

where 2/(n+1){p_i} = ⋁[p_i ∧ p_j] (1 ≤ i ≠ j ≤ n+1). Thus, K_1 is just the standard Kripke system K.

Definition 2.1. Let U be a non-empty set, let n ∈ ℤ⁺, and let R : U → ℘(Uⁿ). Then ⟨U, R⟩ is an (n+1)-ary relational frame F.

¹The author wishes to thank the Natural Sciences and Engineering Research Council of Canada for supporting this research.


Definition 2.2. Let Φ be a denumerable language with the operators ∧, ∨, ¬, □, and ◇, where At = {p_i} is the denumerable set of atoms of Φ. Let F = ⟨U, R⟩ be an (n+1)-ary relational frame. Let V : Φ → ℘(U), where V is defined in the usual way for truth-functional operators, ∀α ∈ Φ, V(□α) = {x ∈ U | ∀a ∈ R(x), ∃y ∈ a : y ∈ V(α)}, and ◇ =def ¬□¬. Then M = ⟨F, V⟩ is an (n+1)-ary relational model on the frame F.

Completeness for Kn with respect to the class of all (n+1)-ary relational models was first proven by Apostoli and Brown (1995), and later simplified by Nicholson, Jennings, and Sarenac (2000) in a proof which exploits a logically dual characterization of "n-uncolourability" for hypergraphs.

Definition 2.3. A hypergraph H = {E_i} is any finite non-empty set of finite non-empty sets E_i, where ∪H is the vertex set of H, and {E_i} is the set of its edges.

If H is a hypergraph and ∀E ∈ H, |E| = 2, then H is a simple graph without isolated vertices, i.e., vertices which fail to appear in edges of H. These are the only kinds of graphs we consider. In general, "G" denotes a graph, and "H" denotes a hypergraph. Moreover, none of the hypergraphs considered have isolated vertices; whence graphs and hypergraphs alike are identified by their edge sets.

Definition 2.4. Let Σ be a set. Then the set of all k-decompositions δ of Σ is Δ_k(Σ) = {δ = {d_i} | ∪_{i=1}^k d_i = Σ}.

Definition 2.5. Let Σ be a set. Then the set of all k-partitions π of Σ is Π_k(Σ) = {π = {c_i} | ∪_{i=1}^k c_i = Σ, and ∀i, j (1 ≤ i ≠ j ≤ k), c_i ∩ c_j = ∅}.

Definition 2.6. Let H be a hypergraph. Then the chromatic number of H is χ(H) = min{j ∈ ℤ⁺ | ∃π ∈ Π_j(∪H) : ∀c_i ∈ π, ∀E ∈ H, E ⊄ c_i}.

Definition 2.7. Let H be a hypergraph, and let n ∈ ℤ⁺. Then H is n-uncolourable if χ(H) > n. H is (properly) n-colourable, else.

If H is a hypergraph with a singleton edge, then since ∀n ∈ ℤ⁺ no n-partition of ∪H biparts every H-edge, set χ(H) = ∞. In general, if E ∈ H is not biparted by a partition π, then we say that E is monochromatic on π. Because we are interested in (a) properties of hypergraphs with determinate chromatic number, and (b) non-trivial aggregation among formulae of a language, hypergraphs with singleton edges are typically excluded from consideration.
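Definitions 2.6 and 2.7 can be tested directly by exhaustive search on small hypergraphs; a hypothetical Python sketch (with χ(H) = ∞ for hypergraphs with a singleton edge, as stipulated above):

```python
from itertools import product

def chromatic_number(H):
    """Least j such that some j-colouring of the vertices leaves
    no edge of H monochromatic (brute force, for small H only)."""
    V = sorted(set().union(*H))
    if any(len(E) == 1 for E in H):
        return float("inf")          # a singleton edge is always monochromatic
    for j in range(1, len(V) + 1):
        for colouring in product(range(j), repeat=len(V)):
            colour = dict(zip(V, colouring))
            if all(len({colour[v] for v in E}) > 1 for E in H):
                return j
    return float("inf")

H = [{1, 2}, {2, 3}, {1, 3}]      # a triangle
chi = chromatic_number(H)         # 3: the triangle is 2-uncolourable
```

A path such as {1,2},{2,3} already comes out 2-chromatic, i.e. 1-uncolourable but 2-colourable.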

Definition 2.8. Let H ⊆ ℘(Φ). Then the formulation of H is f(H) = ⋁[⋀[E] : E ∈ H]. The dual formulation of H is f^△(H) = ⋀[⋁[E] : E ∈ H].

Definition 2.9. Let H be a hypergraph with a set S ⊆ ∪H. Then S is a transversal for H if ∀E ∈ H, S ∩ E ≠ ∅.


Definition 2.10. ∀H, the transversal hypergraph for H, TrH, is the set of all inclusion-minimal transversals for H.

Lemma 2.11. ∀H, TrTrH ⊆ H. (Berge 1989) (Nicholson et al. 2000)
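Both TrH and Lemma 2.11 are easy to check by brute force on small hypergraphs; the following hypothetical Python helper enumerates all vertex subsets and keeps the inclusion-minimal transversals (exponential, for illustration only):

```python
from itertools import chain, combinations

def transversal_hypergraph(H):
    """All inclusion-minimal transversals of H, by exhaustive search."""
    V = sorted(set().union(*H))
    candidates = [set(S) for S in chain.from_iterable(
                      combinations(V, r) for r in range(1, len(V) + 1))
                  if all(set(S) & E for E in H)]
    return [T for T in candidates if not any(U < T for U in candidates)]

H = [{1, 2}, {2, 3}]
TrH = transversal_hypergraph(H)        # the minimal transversals {2} and {1,3}
TrTrH = transversal_hypergraph(TrH)    # comes back to {1,2} and {2,3}
```

On this H, TrTrH comes out equal to H itself, consistent with Lemma 2.11.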

Theorem 2.12. ∀H, ⊨ f^△(H) ↔ f(TrH), and ⊨ f^△(TrH) ↔ f(H). (Nicholson et al. 2000)

The preceding result suggests that ∀H, H and TrH are duals in the logical sense in which conjunctive and disjunctive normal forms are duals. To illustrate, let α ∈ Φ, with δ the DNF of α. Let H be the underlying hypergraph whose edges are the disjuncts of δ, and whose vertices are the conjunct literals. Then δ = f(H) ≡ f^△(TrH) = the CNF of α.
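The DNF/CNF duality can be confirmed by truth table on a small instance; here H and TrH are written out by hand and f, f^△ are evaluated under every valuation (a hypothetical Python check, not part of the paper's formal development):

```python
from itertools import product

H   = [{"p", "q"}, {"q", "r"}]   # edges = disjuncts of (p & q) | (q & r)
TrH = [{"q"}, {"p", "r"}]        # its transversal hypergraph, by hand

atoms = sorted({"p", "q", "r"})

def f(hyp, val):        # formulation: disjunction of conjunctions
    return any(all(val[a] for a in E) for E in hyp)

def f_dual(hyp, val):   # dual formulation: conjunction of disjunctions
    return all(any(val[a] for a in E) for E in hyp)

# f(H) and f_dual(TrH) agree under every valuation: the DNF and CNF coincide.
agree = all(
    f(H, dict(zip(atoms, bits))) == f_dual(TrH, dict(zip(atoms, bits)))
    for bits in product([False, True], repeat=len(atoms))
)
```

Here f(H) = (p ∧ q) ∨ (q ∧ r) and f^△(TrH) = q ∧ (p ∨ r), which are classically equivalent.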

Definition 2.13. Let H ⊆ ℘(Φ). Then [Jn] = ⋀□[∪H] → □f(H), and [J^△n] = ◇f^△(H) → ⋁◇[∪H].

Theorem 2.14. Let H ⊆ ℘(At), and let C be the class of all (n+1)-ary relational models. Then ∀n ∈ ℤ⁺, χ(H) > n ⟺ C ⊨ [Jn].

Lemma 2.15. Let H ⊆ ℘(Φ). Then, where C is the class of all (n+1)-ary relational models, C ⊨ [Jn] ⟺ C ⊨ [J^△n].

Corollary 2.16. Let H ⊆ ℘(At), and let C be the class of all (n+1)-ary relational models. Then χ(H) > n ⟺ C ⊨ ◇f(TrH) → ⋁◇[∪H].

Expressed combinatorically (modulo hypergraphs with singleton edges), this corollary provides the following logically dual characterization of "n-uncolourability" for hypergraphs:

Theorem 2.17. ∀H, ∀n ∈ ℤ⁺, H is n-uncolourable ⟺ ∀B ∈ (TrH choose n), ∩B ≠ ∅, and |TrH| > n. (Nicholson et al. 2000)
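Theorem 2.17 can be verified on the triangle graph, whose transversal hypergraph is again the set of its three edges (computed by hand); a hypothetical Python check:

```python
from itertools import combinations

# For the triangle H = {12, 23, 13}, TrH = {12, 23, 13}.
TrH = [{1, 2}, {2, 3}, {1, 3}]

# Every 2-subset of TrH has a common vertex and |TrH| > 2,
# so the theorem predicts 2-uncolourability (indeed chi = 3).
two_wise = all(set.intersection(*B) for B in combinations(TrH, 2))
uncolourable_2 = two_wise and len(TrH) > 2

# The 3-subset of TrH has empty intersection, so H is not 3-uncolourable.
three_wise = all(set.intersection(*B) for B in combinations(TrH, 3))
uncolourable_3 = three_wise and len(TrH) > 3
```

This matches the triangle being properly 3-colourable but not 2-colourable.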

To generalize this characterization, consider that ∀B ∈ (TrH choose n), ∩B ≠ ∅ ⟺ ∀B ∈ (TrH choose n), ∃x ∈ ∪H : {x} is a transversal for B ⟺ every n-tuple of TrH-edges has a singleton transversal. Thus, if the size of minimum transversals for n-tuples of TrH-edges is allowed to vary, then H is n-colourable. In that event, however, by weakening the [J^△n] principle it is possible to define a subclass of n-partitions of ∪H which do not properly colour H. I.e., let H ⊆ ℘(At), let δ = {d_i} be a decomposition of ∪H, and let C be the class of (n+1)-ary relational models. Then the following rule preserves validity with respect to C:

from ◇f^△(H) → ◇⋁[d_1] ∨ ◇⋁[d_2] ∨ ... ∨ ◇⋁[d_{|δ|}], infer □⋀[d_1] ∧ □⋀[d_2] ∧ ... ∧ □⋀[d_{|δ|}] → □f(H).

I.e., where S = {π ∈ Π_n(∪H) | ∀d ∈ δ, ∃c ∈ π : d ⊆ c}, we have that ∀π ∈ S, some H-edge is monochromatic on π.


Definition 2.18. Let S be a finite set, with k ∈ ℤ⁺ and δ ∈ Δ_k(S). Then the set of δ-n-partitions of S is Π^δ_n(S) = {π ∈ Π_n(S) | ∀d_i ∈ δ, ∃c ∈ π : d_i ⊆ c}.

In words, Π^δ_n(S) is just the set of n-partitions of S on which each d_i ∈ δ is monochromatic.

Definition 2.19. Let S be a finite set, with k ∈ ℤ⁺ and δ ∈ Δ_k(S). Then the set of δ-n-decompositions of S is Δ^δ_n(S) = {δ′ ∈ Δ_n(S) | ∀d_i ∈ δ, ∃d′ ∈ δ′ : d_i ⊆ d′}.

Definition 2.20. Let H be a hypergraph, let k, n ∈ ℤ⁺, and let δ ∈ Δ_k(∪H). Then H is uncolourable on Π^δ_n if ∀π ∈ Π^δ_n(∪H), ∃E ∈ H : E is monochromatic on π. H is (properly) colourable on Π^δ_n, else.

In general, if H is a hypergraph and ∃k, n ∈ ℤ⁺ with δ ∈ Δ_k(∪H) and H uncolourable on Π^δ_n, then H is said to be weakly n-uncolourable.

Theorem 2.21. Let H be a hypergraph, with k, n ∈ ℤ⁺ and δ ∈ Δ_k(∪H). Then H is uncolourable on Π^δ_n ⟺ ∀B ∈ (TrH choose n), ∃d_i ∈ δ : d_i is a transversal for B.

Proof. [⇒] Assume the antecedent. Suppose the consequent is false. Let B be an n-tuple of transversals for H with no d_i transversing B. Then ∪H − [B] = {∪H − E : E ∈ B} induces an n-partition π ∈ Π^δ_n which biparts every H-edge, absurd.

[⇐] Assume the antecedent. Suppose that the consequent is false. In particular, let π ∈ Π^δ_n properly colour H. Then ∪H − [π] = {∪H − c : c ∈ π} is a j-tuple (j ≤ n) of transversals for H for which no d_i ∈ δ is a transversal, absurd.

Notice that the generalization of n-uncolourability provided by the preceding theorem suggests a parameter according to which the extent to which a hypergraph is (not) n-uncolourable may be measured. In the next section, a formulation of Hadwiger's conjecture is given which highlights this parameter, and its potential utility, with respect to the solution of the conjecture.

3 Hadwiger's conjecture for hypergraphs

Hadwiger's conjecture is the claim that ∀n ∈ ℤ⁺, ∀G, if χ(G) > n, then the complete graph on n+1 vertices, K_{n+1}, is obtainable from G in a finite amount of time by consecutively either deleting vertices or edges, or contracting edges. For n = 4, this conjecture is equivalent to the four-colour theorem. Moreover, Robertson, Seymour, and Thomas (1993) have shown that the four-colour theorem implies Hadwiger's conjecture for n = 5; for n = 3 it was proven by Dirac (1952); for n = 2, by Hadwiger (1943); and for n = 1 it is utterly trivial. So far, however, for arbitrary n ∈ ℤ⁺ there is no known proof of the conjecture. In this section, the conjecture is


generalized to hypergraphs, and equivalently formulated in the language of weak uncolourability.

Definition 3.1. Let H and H′ be hypergraphs. Then H′ is a contraction of H if, for some E = {x, y_1, y_2, ..., y_k} ∈ H and some y_i ∈ E, H′ is the result of substituting x for all occurrences of y_i in H-edges, and deleting any singleton edges which thereby arise.
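Definition 3.1 is straightforward to implement; the following hypothetical Python helper contracts y to x (the definition assumes x and y share an edge E of H) and discards the singleton edges that arise:

```python
def contract(H, x, y):
    """Substitute x for y in every edge of H; drop singleton edges."""
    renamed = [frozenset(x if v == y else v for v in E) for E in H]
    return {E for E in renamed if len(E) > 1}

H = [{1, 2}, {2, 3}, {1, 3}]   # a triangle
H2 = contract(H, 1, 2)         # {1,2} collapses; {2,3} and {1,3} merge
```

Contracting one edge of the triangle leaves the single edge {1, 3}.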

If H is a graph, then a contraction of H corresponds to the usual notion of an edge-contraction for a graph. Similarly, the deletion of a vertex x in a hypergraph H results in the hypergraph H′ = {E ∈ H | E ⊆ ∪H − {x}}, and the deletion of an H-edge E gives H − {E}.

Definition 3.2. Let H and H′ be hypergraphs. Then H′ is a minor of H if H′ is obtainable from H by a finite sequence of contractions, or deletions of vertices or edges.

(Generalized) Hadwiger's conjecture: ∀H, ∀n ∈ ℤ⁺, χ(H) > n ⟹ H contains a K_{n+1} minor.

Definition 3.3. Let H be a hypergraph. Then H is connected if ∀x, y ∈ ∪H, there is a sequence E_1, E_2, ..., E_k of H-edges with:

- x ∈ E_1,

- y ∈ E_k, and

- E_j ∩ E_{j+1} ≠ ∅ (1 ≤ j < k).

Definition 3.4. Let H be a hypergraph with S ⊆ ∪H. Then the restriction of H to S is H[S] = {E ∈ H | E ⊆ S}.

Proposition 3.5. Let H and H′ be connected hypergraphs, with ∪H′ = [j]. Then H contains an H′ minor iff ∃π ∈ Π_j(∪H) with non-empty cells such that ∀c ∈ π, H[c] is connected, and ∀E = {j_i} ∈ H′, ∃E′ ∈ H with the vertices in E′ having indices which agree with those in E.

Definition 3.6. Let H and H′ be connected hypergraphs, where H′ is a minor of H and ∪H′ = [j]. Let π ∈ Π_j(∪H) be such that ∀c ∈ π, c ≠ ∅ and H[c] is connected, and ∀E = {j_i} ∈ H′, ∃E′ ∈ H with the indices of E′ agreeing with those of E. Then {H[c_i]} (1 ≤ i ≤ j) is a set of exterior subgraphs for H′, and {E ∈ H | ∀c ∈ π, E ∉ H[c]} is an H′-interior of H, H/H′ (see Figure 1).

[Figure 1: a K4-interior G/K4 of a graph G.]


Theorem 3.7. Let H be a connected hypergraph. Then H contains a K_{n+1} minor iff there is an (n+1)-vertex minor G of H with exterior subgraphs H_1, H_2, ..., H_{n+1} satisfying: ∀B ∈ (Tr(H/G) choose n), ∃H_i : ∪H_i is a transversal for B.

Proof. [⇒] Assume that H contains a K_{n+1} minor G with exterior subgraphs H_1, H_2, ..., H_{n+1}. Then, where δ = {∪H_1, ∪H_2, ..., ∪H_{n+1}}, H/G is uncolourable on Π^δ_n. ⟹ ⊨ □⋀[∪H_1] ∧ □⋀[∪H_2] ∧ ... ∧ □⋀[∪H_{n+1}] → □f(H/G). ⟹ ⊨ ◇f^△(H/G) → ⋁[◇⋁[∪H_i]] (1 ≤ i ≤ n+1). Whence, ∀B ∈ (Tr(H/G) choose n), ∃H_i : ∪H_i is a transversal for B.

[⇐] Assume the antecedent. Then ⊨ ◇f^△(H/G) → ⋁[◇⋁[∪H_i]] (1 ≤ i ≤ n+1). So ⊨ □⋀[∪H_1] ∧ □⋀[∪H_2] ∧ ... ∧ □⋀[∪H_{n+1}] → □f(H/G). ⟹ H contains a K_{n+1} minor subgraph.

Corollary 3.8. Hadwiger's conjecture is true ⟺ ∀H, ∀n ∈ ℤ⁺, χ(H) > n ⟹ ∃G, H′ ⊆ H : G is a minor of H′ with exterior subgraphs H′_1, H′_2, ..., H′_{n+1} such that ∀B ∈ (Tr(H′/G) choose n), ∃H′_i : ∪H′_i is a transversal for B.

4 Weakly n-uncolourable minor subgraphs

Since "minor-hood" is transitive for hypergraphs, to prove Hadwiger's conjecture it suffices to show that every hypergraph admits of a chromatic number-preserving contraction. In Nicholson (2001) it is shown how to define such an operation for every hypergraph H with a vertex x satisfying: ∃S ⊆ N_H(x) (= {E ∈ H | x ∈ E}) : |S| ≥ |N_H(x)| − (χ(H) − 3), and ∀E ∈ H, E ⊄ ∪S − {x}. As a corollary, it follows that for every graph G, there is a chromatic number-preserving contraction of G if ∃x ∈ ∪G such that either x does not appear in any K_3 subgraph of G, or |N_G(x)| = χ(G) − 1. The proof exploits the downward invariance of the dual characterization of "n-uncolourability" upon the deletion of a specified vertex from a given TrG-edge. Here, a similar strategy is invoked, but this time using the dual characterization of weak uncolourability.

Definition 4.1. Let H be a hypergraph. Then the covering hypergraph for H, Tr_0H, is the set of all transversals for H.

Definition 4.2. Let H be a hypergraph, with E ∈ Tr_0H and x ∈ E. Let n ∈ ℤ⁺. Then x is IN^E_n if ∃B ∈ (Tr_0H choose n) : E ∈ B and ∩B = {x}; x is OUT^E_n, else (see Figure 2).


[Figure 2: x is IN^E_n.]

Definition 4.3. Let H be a hypergraph, with E ∈ TrH and x ∈ E. Then the contractor set for x on E is C^x_E = (∪{E′ ∈ H | (E − {x}) ∩ E′ = ∅}) − {x} (see Figure 3).

Definition 4.4. Let H be a hypergraph, and let E ∈ TrH, with x ∈ E. Then the (block) contraction of the contractor set for x on E, C^x_E ↦ x, is the minor hypergraph obtained from H by identifying each vertex in the contractor set for x on E with x (and deleting any singleton edges which arise).
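Under the reading of Definitions 4.3 and 4.4 given above, both operations are a few lines of code. Again this is only an illustrative sketch (hypothetical names; edges are represented as Python sets):

```python
def contractor_set(H, E, x):
    """C^x_E: the union of the edges of H disjoint from E - {x}, minus x."""
    core = set(E) - {x}
    disjoint = [set(Ep) for Ep in H if not (core & set(Ep))]
    return (set().union(*disjoint) - {x}) if disjoint else set()

def block_contraction(H, E, x):
    """C^x_E |-> x: identify every vertex of the contractor set with x,
    dropping the singleton edges that arise."""
    C = contractor_set(H, E, x)
    out = []
    for Ep in H:
        merged = frozenset(x if v in C else v for v in Ep)
        if len(merged) > 1 and merged not in out:
            out.append(merged)
    return out
```

On the path hypergraph {1,2}, {2,3}, {3,4} with base E = {2,3} and pivot x = 2, only the edge {1,2} avoids E − {x} = {3}, so C^2_E = {1}; contracting identifies 1 with 2 and drops the resulting singleton edge.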

Definition 4.5. Let H be a hypergraph, and let B ∈ (TrH choose n+1) satisfy:

- ∩B = ∅,
- E ∈ B, and
- x ∈ E.

Then B is a context for C^x_E ↦ x, E is a base, and x is the pivot (see Figure 3).

Definition 4.6. Let H be a hypergraph with E ∈ TrH and x ∈ E such that for some n ∈ ℤ⁺, x is OUT^E_n. Then a hypergraph H′ is an n-contraction of H if H′ ≅ C^x_E ↦ x and there is a context B for C^x_E ↦ x.

[Figure 3: the contractor set C^x_E for x on E; x is the pivot, E ∈ TrH a base, and C^x_E lies inside ∪H − E.]

Definition 4.7. Let H be a hypergraph with an n-contraction H′. Let B = {Eᵢ} be a context for the contraction, Eⱼ a base, and x the pivot. Then σ: B → ℘(∪H′), where:


Tara Nicholson

- ∀i ≠ j, σ(Eᵢ) = (Eᵢ − C^x_Eⱼ) ∪ {x}, and
- σ(Eⱼ) = Eⱼ − {x}.

Proposition 4.8. Let H be a hypergraph with an n-contraction H′. Let B be a context for H′, Eⱼ a base, and x the pivot. Then ∀E ∈ B, σ(E) is a transversal for H′, and in particular, σ(Eⱼ) ∈ Tr(H′).

Lemma 4.9. Let H be a hypergraph with an n-contraction H′. Let B = {Eᵢ} be a context for H′, Eⱼ a base, and x the pivot. Let C = {cᵢ} = ∪H − [B], where ∀i (1 ≤ i ≤ n+1), cᵢ = ∪H − Eᵢ. Lastly, let C′ = {c′ᵢ} = ∪H′ − [σ[B]], where ∀i (1 ≤ i ≤ n+1), c′ᵢ = ∪H′ − σ(Eᵢ).

Then H is uncolourable on Π^C_n ⇒ H′ is uncolourable on Π^C′_n.

Proof. Assume that H′ is colourable on Π^C′_n. Let π ∈ Π^C′_n(∪H′) bipart every H′-edge. Then since σ(Eⱼ) ∈ TrH′, c′ⱼ ∈ π. ⇒ Replacing x with C^x_Eⱼ in c′ⱼ ∈ π produces an n-partition π′ of ∪H − {x} which biparts every H-edge. ⇒ ∪H − [π′] ∈ (Tr′H choose n). But by construction, ∩(∪H − [π′]) = {x}, which is absurd since x is OUT^Eⱼ_n. ⇒ H′ is uncolourable on Π^C′_n (see Figure 4).

[Figure 4: the contraction C^x_Eⱼ ↦ x preserves weak uncolourability; the cells c₁, …, cₙ of π ∈ Π^C′_n(∪H′) determine Eᵢ = ∪H′ − cᵢ, forming B ∈ (Tr′H′ choose n).]

Definition 4.10. Let H be a hypergraph with an n-contraction H′. Let B be a context for H′, Eⱼ a base, and x the pivot. Then a transformation t[B] of the context B is the result of replacing each E ∈ σ[B] with any least subset t(E) of E which is a transversal for H′.

Proposition 4.11. Let H be a hypergraph with an n-contraction H′. Let B = {Eᵢ} be a context for H′, Eⱼ a base, and x the pivot. Further, let C = {cᵢ} = ∪H′ − [σ[B]], where ∀i (1 ≤ i ≤ n+1), cᵢ = ∪H′ − σ(Eᵢ), and let C′ = {c′ᵢ} = ∪H′ − [t[B]], for any transformation t of B, where ∀i (1 ≤ i ≤ n+1), c′ᵢ = ∪H′ − t(Eᵢ).

Then H′ is uncolourable on Π^C_n ⇒ H′ is uncolourable on Π^C′_n.


Corollary 4.12. Let H be a hypergraph with an n-contraction H′. Let B = {Eᵢ} be a context for H′, Eⱼ a base, and x the pivot. Let C = {cᵢ} = ∪H − [B], where ∀i (1 ≤ i ≤ n+1), cᵢ = ∪H − Eᵢ. Lastly, let C′ = {c′ᵢ} = ∪H′ − [t[B]], for any transformation t of B, where ∀i (1 ≤ i ≤ n+1), c′ᵢ = ∪H′ − t(Eᵢ).

Then H is n-uncolourable on Π^C_n ⇒ H′ is uncolourable on Π^C′_n.

The preceding result may (more suggestively, perhaps) be represented as a validity-preserving rule for (n+1)-ary relational models:

  □⋀[c₁] ∧ □⋀[c₂] ∧ … ∧ □⋀[cₙ₊₁] → □f(H)
  -----------------------------------------
  □⋀[c′₁] ∧ □⋀[c′₂] ∧ … ∧ □⋀[c′ₙ₊₁] → □f(H′)

Proposition 4.13. Let H be a hypergraph with an n-contraction H′. Let B be a context for H′, Eⱼ a base, and x the pivot. Then for any transformation t of B, ∩t[B] = ∅.

Definition 4.14. Let H₀, H₁, H₂, …, Hₖ and B₀, B₁, B₂, …, Bₖ₋₁ be finite sequences of hypergraphs Hᵢ and Bᵢ₋₁, respectively. Let E₀, E₁, E₂, …, Eₖ₋₁ be a finite sequence of sets Eᵢ₋₁, and let x₀, x₁, x₂, …, xₖ₋₁ be a finite sequence of vertices xᵢ₋₁. Then H₀, H₁, H₂, …, Hₖ is an n-sequence (for n ∈ ℤ⁺) if ⟨Hᵢ⟩, ⟨Bᵢ₋₁⟩, ⟨Eᵢ₋₁⟩, and ⟨xᵢ₋₁⟩ satisfy:

1. ∀i (0 ≤ i ≤ k−1),
   - Bᵢ ∈ (TrHᵢ choose n+1),
   - Eᵢ ∈ Bᵢ,
   - xᵢ ∈ Eᵢ,

2. ∀i (1 ≤ i ≤ k), Hᵢ is an n-contraction of Hᵢ₋₁ which uses Bᵢ₋₁ as a context, Eᵢ₋₁ as a base, and xᵢ₋₁ as the pivot, and

3. ∀i (1 ≤ i ≤ k−1), there is a transformation t: Bᵢ = t[Bᵢ₋₁].

Corollary 4.15. If ∀H, ∀n ∈ ℤ⁺, χ(H) > n ⇒ there is an n-sequence H₀, H₁, H₂, …, Hₖ with H = H₀ and |∪Hₖ| = n+1, then Hadwiger's conjecture is true.

References

Apostoli, P. and B. Brown (1995). A solution to the completeness problem for weakly aggregative modal logic. Journal of Symbolic Logic 60, 832–842.

Berge, C. (1989). Hypergraphs: Combinatorics of Finite Sets. North-Holland.

Dirac, G. (1952). A property of 4-chromatic graphs and some remarks on critical graphs. J. London Math. Soc. 27, 85–92.

Hadwiger, H. (1943). Über eine Klassifikation der Streckenkomplexe. Vierteljahrsschr. Naturforsch. Ges. Zürich 88, 133–142.

Jennings, R. and P. Schotch (1989). On detonating. In Paraconsistent Logic, pp. 306–327. Philosophia.

Nicholson, T. (2001). Chromatic number preserving contractions of hypergraphs. Unpublished; presented at the Society for Exact Philosophy Conference, Université de Montréal, May 2001.

Nicholson, T., R. Jennings, and D. Sarenac (2000). Revisiting completeness for the Kn modal logics: a new proof. Logic Journal of the IGPL 8(1). On-line.

Robertson, N., P. Seymour, and R. Thomas (1993). Hadwiger's conjecture for K₆-free graphs. Combinatorica 13, 279–361.


A plural resolution logic

Rick Nouwen

UiL-OTS Universiteit Utrecht

[email protected]

Abstract. This paper presents an integration of centering theory into incremental dynamic logic. It furthermore discusses how such a combined system could deal with plural pronoun resolution.

1 Introduction

It is common in dynamic semantics to assume that anaphora are related to their antecedents by indexing. This way, dynamic semantics can analyze and describe the distribution of such relations. At the same time, it ignores one of the key aspects of language use: reference resolution. On the pragmatic side of the study of language we see the exact opposite situation. Theories like centering theory (Grosz et al. 1995) describe the way anaphoric elements choose their antecedents, but they say little to nothing about structural limitations of anaphoric relations. A combination of these two families of theories is thus a natural step in the study of the pragmasemantics of natural language.

Recently, a number of attempts have been made at combining dynamic semantics with centering theory ((Roberts 1998), (Beaver 1999), but see also (Hardt 1999)), as well as attempts to improve the formalism involved in dynamic semantics in order for it to better deal with discourse anaphora ((van Eijck 2000a), (van Eijck 2001)). All such studies have thus far concentrated on reference resolution of singular pronouns. This is a dangerous simplification, since dealing with plurals proves to be a far more complicated challenge. Plural pronoun reference resolution is a relatively poorly studied mechanism. Since plural anaphora can be linked to multiple antecedents, the basic insights from centering and other pragmatic theories of pronominal reference focusing on singulars lose their value.

In this paper I take a formalist standpoint towards pronoun resolution, by integrating a formalization of centering theory in a dynamic logic. The base system will be incremental dynamics as presented in (van Eijck 2001).

Proceedings of the Sixth ESSLLI Student Session, Kristina Striegnitz (editor). Chapter 20, Copyright © 2001, Rick Nouwen.


In Section 2 the formal details of the logic are presented. In the next section I consider its relation to centering and propose some translations of centering notions. In Section 4 I discuss the resolution of plurals, propose some changes to the system, and give some examples. This is followed by conclusions and some thoughts on future developments.

2 Presentation of the logic

2.1 Incremental Dynamics

Incremental dynamics (ID, (van Eijck 2001)) is a system which purports to do away with the destructive assignment problem of dynamic predicate logic (DPL, (Groenendijk and Stokhof 1991)), as well as provide a logic fit for a reformulation of discourse representation theory in Montagovian (i.e. bottom-up) style. In DPL, the merging of two contexts sharing a certain variable will destroy the value of one of its occurrences. ID is variable-free. Quantifier actions simply bind the next available index.

For the purpose of this paper we focus on one particularly interesting aspect introduced in (van Eijck 2000a). Normally, dynamic meaning is seen as a relation between sets of individuals. In ID these sets are replaced by sequences reflecting a salience hierarchy: the most prominent individual in the discourse is the head of the current sequence. The salience of an individual is changed whenever it is involved in a predication. Predicates have the capability of permuting contexts in such a way that their arguments are assigned prominence according to the obliqueness hierarchy. Here is an example.

(20.1) i) A farmer who owns a donkey likes a woman.

ii) ∃x, farmer(x); ∃y, donkey(y); own(x, y); ∃z, woman(z); like(x, z)

Normally we would give the semantics of (20.1i) as in (20.1ii), producing a context where all assignments are such that x, y and z refer to a farmer, a donkey and a woman respectively. But notice that the salience hierarchy of (i) in no way reflects the order of introduction of variables in (ii). Things change if we assume predicates to be permuters. The own predicate will first push the farmer to the head of the sequence and the donkey to the second position. Processing likes later on will again push the farmer to the head, but now the woman to the second position. The output is thus a sequence where the referent for a farmer is most salient and the referent for the donkey is the least salient one, which is in accordance with the usual relation between salience and syntactic role.

Incremental dynamics is thus an ideal starting point for our integration of reference resolution and dynamic semantics. We will now proceed with the formal details of the logic and gradually extend the system to fit our purpose better.


2.2 The logic

In the current variant on the theme of ID, an assignment will be a triple ⟨l, i, d⟩, where l is a natural number representing syntactic level, i is a natural number representing the index or name, and d is an individual. Of each introduced individual we thus keep track of where it was introduced and by which label.

A context is a sequence of assignments. If c is a context and α an assignment, then we write c : α for the sequence resulting from pushing α to the top of c. We write c[p] for the p-th assignment in c; e.g. c[0] returns the head of context c. If i is an index, we write c(i) for the entity paired with i in some assignment in c; c(i) = ↑ whenever i is not part of any assignment in c.

Since our collections of assignments are represented as sequences, we now have the opportunity to permute them without losing track of our level-index-entity labelling. We define permutation of contexts as follows (given a trivial definition of subtraction): (p)c := (c − c[p]) : c[p]. Given an assignment a = ⟨l, i, d⟩, we write a° for d. Similarly, c° is {a° | a ∈ c}.

The model and interpretation in the model are standardly de�ned:

Definition 2.1. (ontology/model) Let ⟨X, ⊑⟩ be an upper semilattice; then if Y ⊆ X, we define σY as the unique z such that ∀x ∈ Y : x ⊑ z and ∀x ∈ X : (∀y ∈ Y : y ⊑ x → z ⊑ x). We write x ⊕ y for σ{x, y}. Given a set X, we write X^⊕ for the closure of X under ⊕. Let E be a set of entities. Let Pred be a set of predicate symbols, with Ar an arity function for Pred. A model is a structure ⟨I, E⟩, where I is a function taking each P ∈ Pred to a set of tuples of length Ar(P) containing individuals from the E^⊕ domain.

(interpretation) M ⊨ Pa₁ … aₙ if and only if ⟨a₁ … aₙ⟩ ∈ I_M(P), and M ⊭ Pa₁ … aₙ if and only if ⟨a₁ … aₙ⟩ ∉ I_M(P).

A state is a set of contexts of equal length. We often write Sᵢ, where i specifies the cardinality of each of the contexts in S. Updates are performed on tuples ⟨n, S⟩, where S is a state and n is a counter specifying the current level.

Definition 2.2. (semantics) For k ≥ 1:

⟨n, Sᵢ⟩[[∃]] := ⟨n, S′ᵢ₊₁⟩ where S′ = {c : ⟨n, i+1, d⟩ | d ∈ E^⊕ & c ∈ S}

⟨n, Sᵢ⟩[[∃̄]] := ⟨n, S′ᵢ₊₁⟩ where S′ = {c : ⟨n, i+1, d⟩ | d ∈ (c°)^⊕ & c ∈ S}

⟨n, Sᵢ⟩[[P i₁ … iₖ]] := ⟨n, S′ᵢ⟩ where S′ = {(i₁) … (iₖ)c | c ∈ S & M ⊨ P c(i₁) … c(iₖ)}

⟨n, Sᵢ⟩[[i = j₁ ⊕ … ⊕ jₖ]] := ⟨n, S′ᵢ⟩ where S′ = {c | c ∈ S & c(i) = c(j₁) ⊕ … ⊕ c(jₖ)}

⟨n, Sᵢ⟩[[φ; ψ]] := (⟨n, Sᵢ⟩[[φ]])[[ψ]]


There are two quantifier actions. The ordinary existential quantifier ∃ expands the state such that each context now has a new index pointing to an individual in the domain. The action ∃̄ differs from ∃ in that it assigns only contextually available individuals to the newly introduced index. In this way the two quantifiers constitute the indefiniteness/definiteness distinction.
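As a toy rendering of this contrast (my own sketch, not (van Eijck 2001)'s implementation; it uses a two-atom domain and ignores the closure of E under ⊕), the two actions differ only in where the new value may come from:

```python
# A state is a set of contexts; a context is a head-first tuple of
# assignments (level, index, value). DOMAIN is a stand-in for E.

DOMAIN = frozenset({"a", "b"})

def exists(n, state, i):
    """The indefinite action: push a new assignment for index i+1 onto every
    context, with the value ranging over the whole domain."""
    return {((n, i + 1, d),) + c for c in state for d in DOMAIN}

def exists_def(n, state, i):
    """The definite action: the new value must already occur in the context."""
    return {((n, i + 1, d),) + c for c in state for d in {a[2] for a in c}}
```

Starting from a one-context state that only mentions "a", the indefinite action yields two contexts while the definite one yields a single context reusing "a".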

Predicates are tests modulo permutation. If the condition of the predication succeeds, then a context is permuted in such a way that the arguments of the predicate form the head of the context in the order mirroring their syntactic roles.

The distribution of indices to predicates is handled by the compositional semantics. I refer to (van Eijck 2000b) for a detailed exposition of the compositionality of incremental dynamics. In this paper I will take the compositional analysis for granted. For clarity I will write ∃i for the quantifier action introducing index i into the context, but the reader should bear in mind that the reality of the formalism does not differentiate between quantifier actions with different indices. Another clarification method I will use is the graphic depiction of contexts. Here is an example.

(20.2) John saw Harry.
∃̄21; john(21); ∃̄22; harry(22); saw 21 22

Following the introduction of John and Harry we derive the context c₀, below (given a unique contextual John and Harry). Rows are assignments. The leftmost column represents the syntactic level of the referent represented by each row. The second column is the index or name of the referent. The final column specifies the value. Prominence is represented by the order of the rows: the top row has prominence zero, the second one, etc. So in c₀ the individual "h" is assigned to index 22 on level 0 and is the most prominent individual in the discourse. Following the predication "saw 21 22" the prominence hierarchy is permuted into c₀′, making John the more prominent of the two.

c₀ =
  0 22 h
  0 21 j

[[saw 21 22]] ⟹

c₀′ =
  0 21 j
  0 22 h
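The permutation machinery in this example is small enough to simulate. The following sketch uses hypothetical names and head-first lists of (level, index, value) triples; it illustrates (p)c and predicates as permuters, and is not the paper's own code:

```python
# A context is a head-first list of assignments (level, index, value).

def promote(c, p):
    """(p)c: move the p-th assignment to the head of the context."""
    return [c[p]] + c[:p] + c[p + 1:]

def position(c, i):
    """Position of index i in context c."""
    return next(p for p, (_lvl, idx, _val) in enumerate(c) if idx == i)

def predicate(c, relation, *indices):
    """A predicate is a test modulo permutation: if the argument values stand
    in the relation, promote the arguments so that they head the context in
    obliqueness order (subject first)."""
    values = tuple(c[position(c, i)][2] for i in indices)
    if values not in relation:
        return None
    for i in reversed(indices):
        c = promote(c, position(c, i))
    return c

# John saw Harry: Harry (index 22) was introduced last, so he heads c0;
# processing "saw 21 22" permutes John (index 21) back to the head.
c0 = [(0, 22, "h"), (0, 21, "j")]
c1 = predicate(c0, {("j", "h")}, 21, 22)
```

Promoting the arguments in reverse order leaves the subject on top and the object second, reproducing the step from c₀ to c₀′ above.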

2.3 Extending the system with syntactic levels

Since resolution mechanisms make use not only of a prominence hierarchy but also of a notion of utterance, discourse unit, or any other segmentation of discourse, we need to express syntactic units. For this we have the first position in assignment tuples, but also the index keeping track of the current level during updating. We change levels¹ with "[" and "]". Here is the

¹The use of levels is closely related to Visser and Vermeulen's work on dynamic bracketing, cf. (Visser and Vermeulen 1996).


semantics.²

⟨n, Sᵢ⟩[[ [ ]] := ⟨n+1, Sᵢ⟩
⟨n+1, Sᵢ⟩[[ ] ]] := ⟨n, Sᵢ⟩

Next, I will consider centering theory as our candidate for integration with the formal system.

3 Centering Theory

Centering Theory (Grosz et al. 1995) is a theory of coherence in local discourse. In this paper we will embrace the basic ideas behind centering, but nothing prevents us from altering some of its assumptions. In fact, the point is that we can make formal reference to the notions introduced by centering. We adopt here a specific version of centering known as the BFP-algorithm (Brennan et al. 1987).

Central to centering is the assumption that each utterance produces a set of forward-looking centers, notated as Cf. Furthermore, if an utterance is not discourse initial, then it is also assigned a backward-looking center or Cb. The elements in Cf are partially ordered, reflecting their relative prominence in the utterance. Usually this order is thought to mirror the obliqueness hierarchy. The highest ordered element in Cf is most likely to be the future backward-looking center. Centering theory calls this element Cp, or preferred center.

Resolution occurs with the aid of transition types. Each type is defined in terms of Cb and Cf; i.e., a transition between utterances k and l is called a shift if Cbᵏ ≠ Cbˡ. Such a transition is called retain if Cbᵏ = Cbˡ but Cbˡ ≠ Cpˡ. A continue transition occurs when the current backward-looking center equals the previous one and is also the current preferred center. The transition types are ordered. This is what drives resolution: pronoun reference should be resolved so as to produce a transition which is least costly. The transition order is continue < retain < smooth-shift < rough-shift, where a shift is smooth if Cbˡ = Cpˡ and rough otherwise. The complete centering strategy towards resolution can now be captured in an algorithmic fashion. First of all, all the possible resolutions are produced. Next, the following possibilities are filtered out: (i) those which do not obey syntactic binding constraints, (ii) those which do not have the current backward-looking center mapped to the element in the current Cf which is highest in the Cf of the preceding utterance, and (iii) those where all pronouns, if there are any,

²The semantics of the closing bracket as given here is no doubt problematic, since it allows multiple utterances to be represented on the same level. In this paper, however, the "]"-action plays no role. Still, to give a hint of an alternative definition, we might propose that the closing bracket throws away all the referents introduced at the level to be closed. This means we want the output state to be S′ⱼ = {c′ | c′ = c − {0, …, n} & j = |c′|}. Similar definitions could open the door to formalisms not only representing discourse units, but even discourse structure (cf. (Roberts 1998), (Polanyi 1988)).


fail to refer to the current Cb. We will refer to this final filter as Rule 1. Following these constraints, we are left with a set of possible resolutions, each of which corresponds to a certain transition type. The least costly possibility is considered the optimal resolution.

I give a brief example. Consider: "1. John has been having a lot of trouble arranging his vacation. 2. He cannot find anyone to take over his responsibilities. 3. He called up Mike yesterday to work out a plan. 4. Mike has annoyed him a lot recently. 5. He called him at 5 am on Friday last week." (cf. (Grosz et al. 1995), (20)).

Say that we have already processed (and resolved) everything up to the one-but-final sentence. Here are the relevant center assignments.

     1.        2.        3.              4.
Cb   ∅         John      John            John
Cf   ⟨John⟩    ⟨John⟩    ⟨John, Mike⟩    ⟨Mike, John⟩

For the resolution of the pronoun in (4.) we had a choice between resolving him to John or to Mike. Rule 1 as well as binding constraint B tell us to refer to John. Since the preferred center is no longer the backward-looking center, we may call (4.) a retain transition. Now we consider the final sentence. We have two possible resolutions (after application of binding constraints). The current Cb is Mike. We have a choice between Cf = ⟨Mike, John⟩ or Cf = ⟨John, Mike⟩. The first one produces a smooth shift, the latter a rough one. This means we will have to resolve he to Mike and him to John.

A different way of looking at centering is the view that it is simply a formalism on tuples. Given the logic we introduced in Section 2, we can view states as sets of forward-looking center lists. Transition types are thus a notion directly related to the extended incremental dynamics system.

We write cˡ for the assignments in c which share level l; i.e., cˡ gives us the forward-looking center list of discourse unit l. This also means that cˡ[0] is the preferred center for that level.

Continuing this line, we can define the backward-looking center of a discourse segment l as follows:³

cˡ* := 0 if cˡ⁻¹ = ∅, and otherwise
cˡ* := μp.∃q(cˡ[p]° = cˡ⁻¹[q]° & ∀p′, q′(cˡ[p′]° = cˡ⁻¹[q′]° → q ≤ q′))

cˡb := cˡ[cˡ*]

We can now define the two transition building constraints as predicates over

³The original definition of backward-looking center has it that no Cb is assigned to the discourse-initial sentence. I follow Beaver (Beaver 2000) in selecting the subject in that case.


contexts and levels.⁴

Cohere(c, l) :⟺ cˡb = cˡ⁻¹b
Align(c, l) :⟺ cˡb = cˡ[0]

The transition types are:

continue(c, l) :⟺ Cohere(c, l) ∧ Align(c, l)
retain(c, l) :⟺ Cohere(c, l) ∧ ¬Align(c, l)
smooth-shift(c, l) :⟺ ¬Cohere(c, l) ∧ Align(c, l)
rough-shift(c, l) :⟺ ¬Cohere(c, l) ∧ ¬Align(c, l)
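Since Cohere and Align are just two booleans, the classification (and the cost order that drives resolution) can be sketched directly; names here are illustrative:

```python
def classify(cb_prev, cb_cur, cp_cur):
    """BFP transition type from the two booleans: Cohere (the Cb is
    unchanged) and Align (the Cb is the preferred center)."""
    cohere = cb_cur == cb_prev
    align = cb_cur == cp_cur
    if cohere and align:
        return "continue"
    if cohere:
        return "retain"
    return "smooth-shift" if align else "rough-shift"

# the cost order that drives resolution
COST = {"continue": 0, "retain": 1, "smooth-shift": 2, "rough-shift": 3}
```

On the example of Section 3, sentence 4 keeps Cb = John while Cp becomes Mike (a retain), and in sentence 5 the ⟨Mike, John⟩ option is the cheaper smooth shift.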

Up to now we have simply shown that our extended incremental dynamics is capable of fully incorporating centering theory. Now it is time that we start thinking about plurality.

4 Resolving plurals

One of the strategies used by language to introduce plural referents is coordination. Conjunctive NPs unite two (singular or plural) individuals into a new plural one. In the current system, this is modeled well by making use of the summation operator ⊕ and of the =-predicate. In fact, this is similar to the way coordinated NPs are handled in e.g. DRT (see (Kamp and Reyle 1993):434). For instance, the NP a man and a woman is represented as ∃1; man 1; ∃2; woman 2; ∃3; 3 = 1 ⊕ 2. Plural pronouns can subsequently link to such plural referents. But an interesting feature of these constructions is that the subparts of the newly formed plural do not inherit salience from the position of the plural NP. Consider for example the contrast in (20.3).

(20.3) i) Tom talked with Dick. He spoke very softly.

ii) Tom and Mary talked with Dick. He spoke very softly.

In (20.3ii) he is readily resolved to refer to Dick even though Tom is in subject position. Another example is the contrast in (20.4).

(20.4) i) Tom, Susan and Mary went to a party the other day. ?They wore beautiful dresses.

ii) Tom took Susan and Mary to a party the other day. They wore beautiful dresses.

Our system so far does a good job accounting for this fact. Given that it is predication which drives the salience game, the plural Tom, Susan and Mary is permuted to the front of the context. The singular parts of this

4The names of these predicates were inspired by David Beaver's optimality theoreticreformulation of the BFP-algorithm in (Beaver 2000).

233

Page 241: Kristina Striegnitz › esslli › courses › readers › studentsession.pdfNob o Komagata, Geert-Jan Kruij , Iv ana Kruij -Korba y o v a, ii Daniela Kurz, Celine Kuttler, F ran cois

A plural resolution logic

referent play no role in any predication, so they are even less salient than the referents contributed by a party and the other day.

We thus easily predict inferior salience for parts of conjunctive NPs. Here is a detailed example. The figure in (20.7) shows a possible context update of (20.5)/(20.6), in a model where f is a farmer, m a man and w a woman, and with m and w hitting f.⁵

(20.5) A man and a woman hit a farmer.

(20.6) ∃1; man 1; ∃2; woman 2; ∃3; 3 = 1 ⊕ 2; ∃4; farmer 4; hit* 3 4

(20.7)

  0 4 f
  0 3 m⊕w
  0 2 w
  0 1 m

[[hit* 3 4]] ⟹

  0 3 m⊕w
  0 4 f
  0 2 w
  0 1 m

It is clear that the farmer is now more salient than the man. Subsequent anaphoric reference will thus prefer picking up the farmer-referent over the sub-individual of the plural subject.

I propose translating singular pronouns as ∃̄i; sg i and plural ones as ∃̄i; pl i. Here, sg and pl are predicates such that in our model I(sg) = E and I(pl) = E^⊕ − E. Continuing (20.5) with e.g. He was angry asks for a subsequent update of the final context in (20.7) with [; ∃̄5; sg 5. The resulting state is:

  0 3 m⊕w
  0 4 f
  0 2 w
  0 1 m

[[ [; ∃̄5; sg 5 ]] ⟹

  1 5 f        1 5 w        1 5 m
  0 3 m⊕w      0 3 m⊕w      0 3 m⊕w
  0 4 f        0 4 f        0 4 f
  0 2 w        0 2 w        0 2 w
  0 1 m        0 1 m        0 1 m

There are a few complications here, though. First of all, centering's Rule 1 rules each of the resulting contexts out: because of agreement, none of them has the pronoun referring to the backward-looking center m⊕w. Furthermore, were we to disregard Rule 1, then all resulting contexts are simply (smooth) shifts. This is not against our intuition: we are changing the topic from the man and the woman to the farmer. However, we still want some way of expressing the fact that the farmer is the most salient singular referent and the fact that the farmer is accessible with pronominal reference. The key to the solution of these problems seems to lie in agreement. I propose here that the tension between transition types of resolutions is dependent on agreement. This means that as far as plural reference is concerned m⊕w

⁵I write P* for the pluralized version of predicate P. This also means that in this paper the distributive/collective distinction will be ignored.


is the backward-looking center in (20.7). For singular reference, though, it is the farmer who is the Cb.

Another problematic issue for the centering component is that of split antecedents.

(20.8) John took Mary to the cinema. They enjoyed the movie.

How will we be able to relate the plural construction in the second sentence to the two singulars in the first? It is easy to see that the setup thus far predicts the Cb of the second sentence to be absent (a violation of Rule 1). The example in (20.8), however, is fine. The reason for this is that although the plural individual consisting of John and Mary was not explicitly introduced in the first sentence, this group still inherits some of the salience of its parts. This is also illustrated by (20.9).

(20.9) John took Susan to see the Beatles. They…

Although the NP the Beatles is the only plural in the first sentence, the construed plural John and Susan is at least as salient as the fab four (we are, for instance, not forced to repeat the names of John and Susan when we want to comment on the fact that they had a good time). This problem can be solved when we again assume agreement to be active during resolution. That is, processing the first sentence also creates the possibility to assign prominence to implicit plurals. In (20.9) I therefore take John to be the singular Cb, while John and Susan form the plural Cb.

Implementing the ideas presented in the previous sections will happen through two operations on contexts. One, the singular closure of a context, is the prominence hierarchy of all the singular referents. The other, the plural closure, restricts the context to plural referents, but also includes construed plurals and calculates their salience.

First we define the singular closure of a context:

sg(c) := λa. a ∈ c & a° ∈ E

Viewing sequences of assignments as four-place tuples, the plural closure creates a partially ordered Cf by allowing the position slot to receive the same index more than once. Construed plurals inherit the combined salience of their parts. Since the name of these referents is irrelevant, I assign them the dummy label ?. I write c⁺ for the sum of all the positions taken in c. For a linearly ordered c of, say, length 2 this means that c⁺ = 1, given that the first position is called zero. I write maxl(c) for the maximum level in c.

pl(c) := {⟨c′⁺, maxl(c′), ?, σ(c′°)⟩ | c′ ⊆ c & σ(c′°) ∈ E^⊕ − E}

Thus, pl(c) is the set of plural constructions possibly attainable from c, where prominence is calculated by summing the prominence of the parts.
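Both closures are directly computable in a toy encoding (mine, not Nouwen's notation: an assignment is (level, name, value), a value is a set of atoms, a position is a list index, and the dummy label ? becomes None):

```python
from itertools import combinations

# An assignment is (level, name, value); a value is a frozenset of atoms,
# singular iff it has exactly one member. Position = index in the list.

def sg(c):
    """Singular closure: the prominence order restricted to atomic referents."""
    return [a for a in c if len(a[2]) == 1]

def pl(c):
    """Plural closure: every construable plural, labelled with the dummy name
    None and with prominence the sum of the positions of its parts
    (smaller sum = more salient)."""
    out = []
    for size in range(1, len(c) + 1):
        for positions in combinations(range(len(c)), size):
            value = frozenset().union(*(c[p][2] for p in positions))
            if len(value) > 1:
                out.append((sum(positions), max(c[p][0] for p in positions), None, value))
    return sorted(out, key=lambda a: a[0])
```

On a context ordering John before Susan before the Beatles, the construed plural John⊕Susan gets position sum 0 + 1 = 1 and so outranks the Beatles (position sum 2), as example (20.9) requires.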


Given these de�nitions we can de�ne the backward-looking centers.

cˡb,sg := (sg(c))ˡ[(sg(c))ˡ*]

cˡb,pl := (pl(c))ˡ[(pl(c))ˡ*]

Correspondingly we can weaken the two transition building constraints.

Cohere(c, l) :⟺ ∃v : cˡb,v = cˡ⁻¹b,v

Align(c, l) :⟺ ∃v : cˡb,v = cˡ[0]

Finally, I define a pronoun resolution operator. Let Sᵢ be a state and let n be a level defined for S. We define:

⌊Sᵢ⌋ₙ := λc. ¬∃c′ ∈ S : c′ ≺ₙ c

Here the ordering is as follows. The relation ≺ₙ is the minimal relation such that:

Cohere(c, n) < Cohere(c′, n) → c ≺ₙ c′, and

Cohere(c, n) = Cohere(c′, n) → (Align(c, n) < Align(c′, n) → c ≺ₙ c′)

This means that ⌊S⌋ₙ filters out all contexts but the ones representing the least costly transition at level n.
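The operator then reduces to a minimum over a two-component cost, Cohere before Align. A sketch under the assumption that both predicates have already been evaluated for each candidate context (illustrative names):

```python
def cost(cohere, align):
    """Transition cost as a pair: Cohere dominates, Align breaks ties
    (continue < retain < smooth-shift < rough-shift)."""
    return (not cohere, not align)

def resolve(candidates):
    """A sketch of the floor operator: keep only the candidate contexts whose
    transition is least costly. Each candidate is (context, cohere, align)."""
    best = min(cost(h, a) for _, h, a in candidates)
    return [c for c, h, a in candidates if cost(h, a) == best]
```

For instance, a smooth-shift candidate survives next to a rough-shift one, and a retain candidate beats both, mirroring the transition order of Section 3.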

5 Final remarks

First of all, let me remark that there are a few reasons why this paper can only be considered a preliminary study. There is so little empirical data concerning plural resolution⁶ that the generalizations I have made here are very likely to be too general. Also, it is clear that the centering framework is far from optimal; the system proposed in this paper inherits most of the problems inherent in centering.

There is also room for improvement by changing some design choices. For instance, the permutable contexts have the disadvantage that salience can only be expressed relative to other referents. This means that if we have a sentence with a conjunctive subject as the only NP, the subparts of this NP end up relatively high in the prominence hierarchy, since there are no other referents to intervene. If we compare a sentence A and B Pred with A Pred B, then in both cases B is second in the Cf. We might remedy this by letting predicates assign roles, i.e. they will assign position 0 to their subject, 1 to their object, etcetera. In a system like that, existential introduction should assign some notion of non-salience to a new index, and we should use "[" and "]" for a richer notion of constituency in the logic.

⁶But see (Koh 2001) for some recent interesting psycholinguistic results.


Rick Nouwen

Here is what was achieved: enriching contexts with labels expressing syntactic levels opens the possibility of equating contexts with sequences of forward-looking center lists. Applying the formalized notion of backward-looking center to a singular or plural closure of such contexts enables us to deal with some complicating issues in plural pronoun resolution.

References

Beaver, D. (1999). The logic of anaphora resolution. In P. Dekker (Ed.), Proceedings of the 12th Amsterdam Colloquium, pp. 61–66. ILLC, Universiteit van Amsterdam.

Beaver, D. (2000, July). The optimization of discourse. Unpublished manuscript.

Brennan, S., M. W. Friedman, and C. Pollard (1987). A centering approach to pronouns. In Proceedings of the 25th Annual Meeting of the ACL, Stanford, pp. 155–162.

Groenendijk, J. and M. Stokhof (1991). Dynamic Predicate Logic. Linguistics and Philosophy 14, 39–100.

Grosz, B., A. Joshi, and S. Weinstein (1995). Centering: a framework for modelling the local coherence of discourse. Computational Linguistics 21(2), 203–226.

Hardt, D. (1999, April). Dynamic interpretation of verb phrase ellipsis. Linguistics and Philosophy 22(2), 187–221.

Kamp, H. and U. Reyle (1993). From Discourse to Logic. Dordrecht: D. Reidel.

Koh, S. (2001, February). Resolution of the antecedent of a plural pronoun. Ph.D. thesis, Graduate School of the University of Massachusetts Amherst.

Polanyi, L. (1988). A formal model of the structure of discourse. Journal of Pragmatics 12, 601–638.

Roberts, C. (1998). The place of centering in a general theory of anaphora resolution. In M. Walker, A. Joshi, and E. Prince (Eds.), Centering Theory in Discourse, pp. 359–400. Oxford: Clarendon Press.

van Eijck, J. (2000a). Context semantics for NL. Unpublished manuscript.

van Eijck, J. (2000b). On the proper treatment of Context. In Proceedings of CLIN99, Utrecht.

van Eijck, J. (2001, Summer). Incremental Dynamics. Journal of Logic, Language and Information 10(3), 319–351.

Visser, A. and C. Vermeulen (1996). Dynamic bracketing and discourse representation. Notre Dame Journal of Formal Logic 37, 321–365.


A plural resolution logic


Relational Concept Analysis for Structural Disambiguation

Jerome Piat

RACE Laboratory, University of Tokyo, Komaba 5-4-1, Meguro-ku, Tokyo 153-8904, Japan

[email protected]

Abstract. Many methods currently used to disambiguate some grammatical structures are statistical or probabilistic ones, and the knowledge learned is usually at the level of abstraction of the example itself, thus difficult to reuse for newly encountered examples. This paper proposes a framework to solve some grammatical ambiguities using Relational Concept Analysis. The framework is used here to disambiguate a Japanese structure referred to as the Genitive Case Japanese No particle (Ga/No conversion), by the use of two main methods that participate in correctly solving the syntactic ambiguity: a computational logical analysis, establishing logical semantic paths between concepts, and an empirical preference rule system, using conceptual rules learned from examples to help tackle cases difficult to solve by the first method. Practical results show that the proposed framework is computationally manageable, and can provide some adaptive conceptual generalization as well as some knowledge enrichment toward the understanding of the meaning of language structures.

1 Introduction

The Japanese grammatical structure A no Adj B, with A and B two nouns and Adj a regular adjective, is grammatically ambiguous when no oral or written marker is present. Indeed, Adj can refer to A or B, or both. This grammatical ambiguity will later be referenced as GCN ((Shinichi 1971), (Miyagawa 1992)). A system could rely on some statistical analysis, but the knowledge learned may be directly dependent on the number of occurrences encountered at the level of examples, and thus sensitive to irregularities and gaps in the coverage of representative examples, as described in (et al. 98). The present work aims at proposing a concept-based framework to solve such ambiguity, as well as some similar ones, by using a network of concepts and the construction of semantic paths between them, in order to find dynamically the correct grammatical attachment. The approach introduced here is class based in order to avoid as much as possible problems induced by the sparseness of data in training sets. Compared to some previously known class-based approaches ((et al. 98), (Kurohashi and Sakai

Proceedings of the Sixth ESSLLI Student Session, Kristina Striegnitz (editor). Chapter 21, Copyright © 2001, Jerome Piat.


1999)), the present one generalizes from examples, then constructs conceptual paths at learning time, or searches for some specific types of semantic paths at decision time to find appropriate conceptual bindings. This approach requires a rich database of concepts that can link them together by different semantic relations. Wordnet ((Miller et al 1993), (Miller 1993)) has been used for the present study.

The first section gives a brief linguistic analysis of the Genitive Case Japanese No ambiguity; a description of the system developed to tackle similar ambiguities is then given in two steps; then results obtained from the disambiguation of GCN are presented and discussed.

1.1 Linguistic analysis

Semantically non-ambiguous cases

1. Hanako no ookii me (Hanako's big eyes).
2. me no ookii Hanako (Big-eyed Hanako).

The syntactic structure is exactly the same in both cases. The adjective ookii (big, large) refers to the noun me (eye) in both sentences. Though inherently semantically ambiguous, it is not ambiguous for a human interpretation.

3. sono hito no kireina me (the beautiful eyes of this person).

The Japanese language uses two types of adjectival words: one type, directly referred to as adjectives, that end with the hiragana suffix -i, and another type followed by the hiragana suffix -na, referred to here as nominal adjectives. The nominal adjective in example 3, for instance, attaches to the noun on the right of the structure. This means that the structure is syntactically not ambiguous in such a case. A more detailed study can be found in (Kanzaki and Isahara 1997).

Semantically ambiguous cases

4. ? badominton-kooto no hiroi sentaa.

This sentence is syntactically and semantically ambiguous. In case the context is omitted, two alternative interpretations could reasonably be drawn by a Japanese native speaker:

• a. the center that contains a large badminton court.

• b. the large center that contains a badminton court.

As a primary conclusion, it appears that solving the GCN structure requires both a syntactic and a semantic approach. At the semantic level, a conceptual approach is needed to better analyse the semantic relations the two nouns of the structure have between each other. A system that is supposed to find the correct attachment is then expected to find whether


the sentence is ambiguous for humans or not, and if not, determine theappropriate attachment.

1.2 The Relational Concept Approach

The Principle of Minimality

The approach to solve the GCN ambiguity is partly based on a maxim of quantity described by Grice as one part of a basic principle of communication in pragmatics: the principle of minimality, indicating that something not adding information is generally omitted. This principle does not strictly belong to the Relational Concept Approach, but participates in solving the GCN from the logical inclusion point of view and is one of several principles that can drive a semantic search within that framework.

The Conceptual Knowledge database

Though the present work deals with a Japanese structure, the English version of the Wordnet database has been used for its richness of information and accessibility, also providing a valuable library of functions to process examples.

Formalization

Let Ci be a concept, defined here as a non-intentional internal representation of information that can serve as the meaning of a linguistic entity, represented as a synset in the Wordnet database ((Miller et al 1993)). The relation φu(Ci) = Cj is true if and only if there exists a semantic arc from Ci to Cj following a type of relation u. Some Wordnet relation types are symmetrical but haven't been used here; only the two asymmetrical relation types, hyponymy and meronymy, are used. For instance, if u is the hyponymy relation, φu(Ci) = Cj is true if and only if Cj is the superordinate or hypernym of Ci, which can also be written Ci "is a" Cj.

The semantic arc from Ci to Cj following the relation u is noted su(Ci, Cj). The semantic oriented homogeneous path from Ci to Cj following a unique relation type u is then such that:

pu(Ci, Cj) = ⋃_{k=i..j} su(Ck, Ck+1), with i, j, k ∈ ℕ.

A path is said to be homogeneous if and only if it is composed of semantic arcs all having the same type of relation; it is called heterogeneous otherwise. A heterogeneous path from Ci to Cj is composed of homogeneous paths from Ci to Cj possibly following different types of relations, and is noted

P^a_n(Ci, Cj) = ⋃_{k=0..n} pk,

with n defining the number of different relation types from Ci to Cj, a a unique identifier, and pk a homogeneous path between two concepts.


Unilateral heterogeneous logical inclusion paths

As described just before, a first part of the study based on Grice's minimality principle is such that if one of the two nouns of the GCN structure is conceptually an intrinsic part of the second noun, then the adjective is likely to determine the noun that contains the other; otherwise, the included noun would be merely useless. As a simple example, in me no ookii hito (big-eyed person), me (eye) being an intrinsic part of a person, adding it in the structure would not add useful information if the adjective ookii (big) were qualifying hito (person). Therefore, the adjective implicitly attaches to the noun that is a constituting part, or property, of the other in that case.

The goal of this first part is to find some paths of logical inclusion based on the general definition of a path given in section 1.2, using a logical decision criterion Al(Ci, Cj) for two concepts Ci and Cj, with Card({P^m_k(Ci, Cj), m ∈ ℕ}), also noted in short Card({Pk,i,j}), the total number of heterogeneous logical inclusion path occurrences linking Ci to Cj; a reliability criterion

r(i, j) = [Card²({Pk,i,j}) − Card²({Pk,j,i})] / [Card({Pk,i,j}) + Card({Pk,j,i})];

and t0 ∈ ℝ+ a threshold parameter.

Al(Ci, Cj) =
  undecidable       if Card({Pk(Ci, Cj)}) + Card({Pk(Cj, Ci)}) = 0
  ambiguous         if |r(i, j)| < t0
  left attachment   if r(i, j) ≥ t0
  right attachment  if r(i, j) ≤ −t0
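As a concrete reading of the criterion, here is a small sketch (not the paper's code) that maps the directed path counts to a decision. Note that r(i, j) algebraically reduces to the difference of the two counts; the threshold t0 = 1.0 is an illustrative assumption, since the paper does not report the value actually used.

```python
def reliability(n_ij, n_ji):
    """r(i, j) = (Card²(i→j) − Card²(j→i)) / (Card(i→j) + Card(j→i));
    requires n_ij + n_ji > 0 (the undecidable case is handled by the caller)."""
    return (n_ij ** 2 - n_ji ** 2) / (n_ij + n_ji)

def attachment(n_ij, n_ji, t0=1.0):
    """The decision criterion A_l over directed inclusion-path counts."""
    if n_ij + n_ji == 0:
        return "undecidable"
    r = reliability(n_ij, n_ji)
    if abs(r) < t0:
        return "ambiguous"
    return "left attachment" if r >= t0 else "right attachment"
```

For example, three inclusion paths from Ci to Cj and none in the opposite direction give r = 3 and a left attachment; balanced counts give r = 0 and an ambiguous verdict.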

Practically, three types of logical inclusion paths have been instantiated:

Unary logical paths, where the full logical inclusion path is P^k_u(Ci, Cj) = pu(Ci, Cj), with u a hypernymy or meronymy semantic relation. These paths are direct semantic paths for which one element is directly found to be a conceptual part, member or instance of the other. This is the only type where P^k_u(Ci, Cj) is homogeneous, that is, composed of only one type of relation.

Binary logical paths, where the path P^k_u(Ci, Cj) = pu(Ci, Cm) ∪ pv(Cm, Cj), with the pair (u, v) being for instance (is part, is a), (is a, has member), or (is a, has part). In this case, the full path is heterogeneous, being composed of two semantically distinct homogeneous paths. m represents here the index of the concept Cm such that {Cm} = pu(Ci, Cm) ∩ pv(Cm, Cj).

Ternary logical paths, where P^k_u(Ci, Cj) = pu(Ci, Cm) ∪ pv(Cm, Cn) ∪ pw(Cn, Cj), with (u, v, w) a combination of three distinct hyponymic or meronymic homogeneous sub-paths, and m and n such that {Cm} = pu(Ci, Cm) ∩ pv(Cm, Cn) and {Cn} = pv(Cm, Cn) ∩ pw(Cn, Cj).
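The bridge-concept search behind binary paths can be sketched as follows. The miniature graph, its concepts, and the relation names are invented for illustration; they stand in for Wordnet's meronymy and hyponymy arcs, not for the actual database.

```python
# Miniature concept graph: relation name -> {concept: directly related concepts}.
# "is_part" stands in for Wordnet meronymy, "is_a" for hyponymy.
GRAPH = {
    "is_part": {"eye": {"face"}, "face": {"person"}},
    "is_a": {"person": {"organism"}, "student": {"person"}},
}

def reachable(rel, start):
    """All concepts reachable from start along a homogeneous path p_rel."""
    seen, stack = set(), [start]
    while stack:
        for nxt in GRAPH.get(rel, {}).get(stack.pop(), set()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def binary_bridges(ci, cj, u, v):
    """Bridge concepts C_m with a u-path from ci to C_m and a v-path from
    C_m to cj -- the C_m of a binary logical inclusion path."""
    return {cm for cm in reachable(u, ci) if cj in reachable(v, cm)}
```

Ternary paths would extend this with a second bridge concept; as the paper notes, the search depth would be bounded in practice to limit the combinatorial explosion.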

Searching for the existence and redundancy of these paths has proved to provide some valid information to disambiguate the GCN structure. Results


are given in section 2.2. Figure 21.1 illustrates one binary and one ternary path of logical inclusion. Within each one, there is at least one common concept Cm bridging two different but logically complementary semantic sub-paths pu(Ci, Cm) and pv(Cm, Cj), so that at least one binary or ternary logical path of inclusion can be created to connect the first concept Ci with the second, Cj.

Figure 21.1: Binary and ternary logical inclusion paths.

Whether the logical inclusion paths found are unary, binary, or ternary, the reliability parameter r(i, j) is calculated afterward by including them all to determine the appropriate attachment. Some more complex paths could be searched, but the current implementation has been limited to ternary paths. It should be noted that not all binary or ternary combinations (u, v, w) are logically valid. For instance, if a concept Ci is part of a concept Cm which has a part that is Cj, it cannot obviously be deduced that Ci is an intrinsic part of Cj. Thus only a restricted number of combinations have been used to search valid paths. Moreover, the search can be constrained in depth to limit the combinatorial explosion of possibilities, especially in the case of ternary paths like (is part, has part, is a), where a concept Cj is a Cm that has a part which is a Ci. It should be noted however that average computational time was short, generally a few seconds on a PC Pentium III 750 MHz with 128 MB of RAM.

Learning Empirical semantic paths

The creation and learning of empirical preference rules based on examples was needed in order to:

• Extend Wordnet semantic relations for nouns and relate concepts like sword with warrior or rent with apartment, by creating and adapting general rules at some low, medium or high levels of abstraction, and create new semantic relations like used by, for instance.


• Solve indecision cases for logically independent concepts.

Figure 21.2: Learning preference rules at the right level of abstraction.

Formally, this second step intends to construct empirical paths from two logically disconnected concepts Ci and Cj in order to recreate, from examples, preference rules that will help to solve similar types of combinations by a generalization process. An empirical path is then composed of:

Two homogeneous hyponymic sub-paths ph(Ci, Cm) and ph(Cj, Cn) (h being the hyponymic relation) that are independent in the sense that they do not share common concepts.

A semantic arc s_{u(Cr,Cs)}(Cm, Cn) artificially added between one concept Cr of the sub-path ph(Ci, Cm) and a concept Cs of the other sub-path ph(Cj, Cn). Here u designates one new type of relation between Cr and Cs that is none of those described in the Wordnet database ((Miller et al 1993), (Miller 1993)).

An empirical path k linking two concepts Ci and Cj via a hyponymic sub-path ph(Ci, Cm), an artificial semantic arc s_{u(Cr,Cs)}(Cm, Cn), and another hyponymic sub-path ph(Cj, Cn) is written:

E^k_{u(Cm,Cn)}(Ci, Cj) = ph(Ci, Cm) ∪ s_{u(Cr,Cs)}(Cm, Cn) ∪ ph(Cj, Cn).
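To make the construction concrete, here is a toy sketch of adding the artificial arc between two hypernym chains. The chains and the relation name applies_to are invented for illustration (they are not Wordnet synsets or relations), and the choice of abstraction level is left to the caller.

```python
def empirical_path(chain_i, chain_j, r, s, relation="applies_to"):
    """Build an empirical path from two specific-to-abstract hypernym
    chains: the sub-path of chain_i up to its r-th concept, an artificial
    arc labelled `relation`, and the sub-path of chain_j up to its s-th
    concept."""
    arc = (chain_i[r], relation, chain_j[s])
    return chain_i[: r + 1], arc, chain_j[: s + 1]
```

Linking the arc high in both chains yields an abstract rule (e.g. a cost-like concept applying to artifacts in general); linking it low yields a specific, less reusable one, which is the abstraction trade-off the learning step adjusts.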

The cardinality of s_{u(Cr,Cs)}, noted Card(s_{u(Cr,Cs)}), represents the number of learned examples (Ci, Cj) such that there exists a path E^k_{u(Cm,Cn)}(Ci, Cj), with m, n, i, and j distinct indexes ∈ ℕ. A reliability criterion for a preference rule is

r(s) = [Card²(s_{u(Cr,Cs)}) − Card²(s_{u(Cs,Cr)})] / [Card(s_{u(Cr,Cs)}) + Card(s_{u(Cs,Cr)})], with Card(s_{u(Cr,Cs)}) +


Card(s_{u(Cs,Cr)}) ≠ 0. Let t1 be a positive threshold parameter; the preference rule decision criterion Ae is obtained by calculating the average value of the n0 first most reliable rules for two concepts Ci and Cj as follows:

Ae(Ci, Cj) =
  ambiguous         if (1/(n0+1)) · |Σ_{k=0..n0} r_{s_k}(i, j)| < t1
  left attachment   if (1/(n0+1)) · Σ_{k=0..n0} r_{s_k}(i, j) ≥ t1
  right attachment  otherwise
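A sketch of this averaging step, assuming the rule reliabilities are already computed and ranking "most reliable" by absolute value; the paper does not spell out the ranking criterion or report the n0 and t1 actually used, so those are assumptions here.

```python
def decide(rule_reliabilities, n0, t1):
    """A_e: average the n0+1 most reliable rules linking two concepts
    (ranked by absolute reliability) and compare the mean against t1."""
    top = sorted(rule_reliabilities, key=abs, reverse=True)[: n0 + 1]
    mean = sum(top) / (n0 + 1)
    if abs(mean) < t1:
        return "ambiguous"
    return "left attachment" if mean >= t1 else "right attachment"
```

With two strongly positive rules the mean clears the threshold and the left attachment is chosen; rules that cancel each other out leave the example ambiguous, matching the first case of A_e.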

2 Results

2.1 Results from examples of the Japanese literature

Nine hundred examples of the GCN structure have been extracted from the Japanese literature. For each example, three native speakers have been asked to add two types of information:

• The semantic attachment of the adjective to the left or right noun.

• The existence of an intrinsic logical inclusion between the concepts referenced by the two nouns. Practically, this logical inclusion uses only the meronymy type of relation of the Wordnet database. In the GCN structure A no Adj B, A is said to be logically included in B if and only if A is thought of as being an intrinsic part of B by native speakers for that only experience; this is noted in the table below by the relation A ⊏ B, and A ⊐ B otherwise.

Tests have been run independently and were blind in the sense that testers were not told the purpose of the tests. No notion of ambiguous attachment has been mentioned to the speakers, thus no information has been collected on that issue here. Each percentage value given here is the average of percentage values from the three testers, and represents the relative proportion of examples of one type of logical inclusion among all examples for the same semantic attachment (left or right).

attachment \ inclusion   A ⊏ B     A ⊐ B     ∅
right attachment         52.07%    2.25%     45.68%
left attachment          0.0%      39.28%    60.72%

Table 21.1: Semantic attachment results from human interpretation.

Two conclusions can be drawn from such results, though the data set is relatively small and not representative of all possible combinations of the Japanese language for the GCN structure. The first one is that the total number of examples involving the search for some potential logical inclusion paths is relatively high. Secondly, in case native speakers think


that there is a logical inclusion between the two concepts, that is, the results in the first two columns, there is a strong correlation between the direction of the attachment and the direction of the logical inclusion. This hinted at a conceptual relational approach to tackle a part of the problem. However, it should be noted that the existence of such a correlation does not prove that logical inclusion paths are the underlying cognitive dynamic structures used to partially solve the ambiguity, though reconstructing them is used afterward as a method to solve the GCN structure. Some deeper cognitive study is thus needed at this point. The following results tend to confirm that correlation.

2.2 The search for logical inclusion paths

Out of these nine hundred examples, one hundred have been extracted such that side effects of translations from Japanese to English were minimized. All these examples have been considered as non-ambiguous by Japanese native speakers, thus having the adjective attaching either to the left noun or to the right noun only. Moreover, very abstract Wordnet synsets like entity or something have been ignored on purpose in the search of logical paths, in order to avoid too abstract and meaningless matching. The following table shows, for the columns of left and right attachment, the percentage of correct semantic attachments found by the system on the sole search for inclusion paths.

Left attachment Right Attachment

72.7% 75.5%

Table 21.2: Disambiguation results from the search of logical inclusion paths.

Results confirm that by finding some logical inclusion paths between concepts, following the decision criteria given in section 1.2, some promising results are obtained when there is originally some logical inclusion between the two nouns (or concepts in Wordnet). This can be generalized to arbitrary semantic paths depending on the grammar structure analyzed.

2.3 Learning semantic preference rules

As discussed before, searching for logical semantic paths cannot entirely solve the ambiguity, as a significantly high number of real examples do not involve any logical inclusion relation, as confirmed by Wordnet. Consequently, a system has been implemented to create and learn preference rules at the appropriate levels of abstraction, to link unrelated hyponymic paths and create empirical semantic paths, as described in 1.2, for appropriate semantic bindings. The two most significant results are:


• Semantic rules, weighted by some adaptive reliability value, reveal step-by-step some useful abstract information to disambiguate the GCN. This could be for instance that a rent applies to artifacts, and therefore that the adjective attaches to the left in an example like yachin no takai heya (a room with an expensive rent). This type of information could hardly be extracted by a non-conceptual approach; it participates directly in the generalization process, and could be stored in an efficient way in memory.

• Ambiguous examples for native speakers are also found ambiguous by the system; conversely, examples that are not considered ambiguous by humans have been generalized by the system using a rather small number of examples, as counter-examples are rare by definition.

As one example, here are listed, in order of reliability, some of the most reliable rules calculated after 20 examples have been learned, with closely related words like payment, rent, cost as the right noun of the GCN, and as the left noun some names of everyday-life goods like television, table or book:

cost        (by rule #523)  applies to physical object
outgo       (by rule #608)  applies to physical object
expenditure (by rule #693)  applies to physical object
cost        (by rule #521)  applies to artifact
outgo       (by rule #606)  applies to artifact
expenditure (by rule #691)  applies to artifact

Here, the rules found are rather abstract because the search depth has not been limited to low values. This means that if there are some contradictory examples afterward, these rules are very likely to have their reliability decreased because of their high level of abstraction, and the most reliable rules will have a lower degree of abstraction, more appropriate to what has been learned. At decision time, inputting a completely new example like nedan no sugokutakai fune (a very expensive boat), the system can correctly answer that the adjective attaches to the left noun nedan (price), even though words may be encountered for the first time.

2.4 Benefits and current limits

The combinatorial explosion of possibilities is not so high. The search for relations between concepts is limited by the fact that abstract concepts are much rarer than less abstract ones, and thus the processing is generally very quick at learning time and at decision time. In addition, the knowledge induced by each concept in relation to others can be kept local, so that the creation of potential paths is done using a limited set of possible connections, and changes are made only locally.


Moreover, when learning preference rules, each example influences low-, middle- and high-level rules, considerably improving the attachment decision for unlearned semantically similar examples. One example can modify several hundreds of abstract preference rules, and, rules being generated at all levels of abstraction, the training phase does not need a high number of examples. This approach is also valuable for some deep understanding of the meaning of linguistic structures, as understanding the appropriate semantic attachment gives the possibility to draw new inferences from acquired and related knowledge.

Concerning limits and possible improvements of the approach, results depend highly on the richness of information found in Wordnet. It has been noticed that the information is not homogeneously distributed, this being valid for common concepts as well. Some noise and side effects have inevitably been induced by the necessary translation, though examples have been chosen so that these problems are as few as possible. The fact that a good proportion of words are polysemous induces changes at all levels of abstraction when learning preference rules, and thus may modify inconsistently many relations at all levels. Weighting the reliability of preference rules by criteria like their degree of abstraction is being investigated.

Moreover, exact matching is sometimes not possible, but some coordinate concepts may participate significantly in creating analogous semantic bindings. Therefore, introducing some semantic distance between concepts, as in (Resnik 1992), may sensibly improve the efficiency of the system, at the expense of its speed. In addition, current preference rules are rather simple, and constructed to create appropriate local conceptual bindings. Learning more complex common-sense based rules is currently being studied.

3 Conclusion

A Relational Concept Approach can significantly participate in solving structural ambiguities, and single out the possible existence of some cognitive processes underlying the understanding of some linguistic structures. Moreover, such a method can add implicit knowledge to the understanding of the structure by the analysis itself, providing a preference rule system based on learnable connection rules, and semantic paths that can generalize appropriate semantic attachments and draw further inferences toward a generalized understanding of the structure. Such a framework can also be applied to grammatical ambiguities similar to the GCN structure, like the Japanese grammatical structure Adj Noun no Noun or the English structure Adj Noun Noun. It can be supplemented by some collocational statistical or probabilistic methods, and be used to semantically enrich the construction of the meaning of linguistic expressions.


References

et al., H. W. (98). Structural Disambiguation Based on Reliable Estimation of Strength of Association. COLING-ACL.

Kanzaki, K. and H. Isahara (1997). Lexical Semantics for Adnominal Constituents in Japanese. NLPRS.

Kurohashi, S. and Y. Sakai (1999). Semantic Analysis of Japanese Noun Phrases: A New Approach to Dictionary-Based Understanding. ACL.

Miller, G. (1993). Nouns in Wordnet: A Lexical Inheritance System, Revised Edition.

Miller et al, G. (1993). Introduction to Wordnet: An On-Line Lexical Database. Revised Edition.

Miyagawa, S. (1992). Case, Agreement, and GA/No Conversion. Third Annual Southern California Japanese/Korean Linguistics Conference, San Diego, in Japanese Korean Linguistics, Volume 3, Soonja Choi, SLA Stanford Editor.

Resnik, P. (1992). Wordnet and Distributional Analysis: A Class-Based Approach to Lexical Discovery. AAAI Workshop on Statistically-Based Natural Language Processing Techniques.

Shinichi, H. (1971). Ga-no conversion and idiolectal variations in Japanese. Gengo Kenkyuu 60.



A Model-Based Semantics of the Mood Morphemes in Hungarian

György Rákosi

University of Debrecen, Hungary

[email protected]

Abstract.

This paper outlines a novel model-based approach to the analysis of the basic semantic properties of Hungarian mood morphemes.1 I assume that these morphemes are assigned a basic semantic feature already in the lexicon, and that they contribute this basic semantic impact to the interpretation of the sentence in which they occur. Though the present proposal does not aim to achieve anything more than providing an analysis where these features are established, it is believed at the same time to have the potential to support the claim made by Farkas (1992) that mood distribution is at least partially semantically motivated.

Traditionally, three different moods are distinguished in Hungarian: the indicative, the subjunctive and the conditional. First, I will briefly introduce some of their most typical uses, both in main and in subordinate clauses (section 1). Then I will propose an analysis that assumes that the indicative is in basic contrast with the non-indicative, this latter being an abstract superordinate category for the subjunctive and the conditional. The indicative is associated with the compatibility of the relevant proposition with a given contextually identified model of interpretation M, whereas the non-indicative is characterised by incompatibility with M (section 2). Finally, I will apply Quer's (1998) account of mood variation in Romance languages to an analysis of instances of mood variation in Hungarian, and partially modify that account for the purposes of the present proposal (section 3).

1 The distribution of mood morphemes in Hungarian clauses

I do not make an attempt here to catalogue all the different contexts in which mood morphemes occur: their most frequent uses are summarised here to facilitate the subsequent theoretical discussion in section 2.

1 This paper has been preceded by a number of illuminating discussions on the topic, and here I wish to express my gratitude to the four people that helped me the most with their valuable suggestions and criticism: Tibor Laczkó, Tamás Mihálydeák, Péter Pelyvás and Enikő Tóth. I am also obliged to the anonymous ESSLLI reviewers for their useful comments. Of course, any errors remaining are entirely mine. Special thanks to György Balassa for the LaTeX conversion.

Proceedings of the Sixth ESSLLI Student Session, Kristina Striegnitz (editor). Chapter 22, Copyright © 2001, György Rákosi.


1.1 The indicative

The unmarked category of the indicative is denoted by a zero morph in Hungarian. It is typically used to express full epistemic commitment in main clauses (1) and in the complement clauses of factive predicates (2-3):

(1)  Péter otthon  van.
     Peter at home is(IND)
     'Peter is at home.'

(2)  Örülök,        hogy Péter otthon  van.
     glad.be.I(IND) that Peter at home is(IND)
     'I am glad that Peter is at home.'

(3)  Érdekes,    hogy Péter otthon  van.
     interesting that Peter at home is(IND)
     'It is interesting that Peter is at home.'

The indicative can also be used, however, to express various degrees of epistemic commitment:

(4)  Talán   otthon  van.
     perhaps at home is(IND)
     'Perhaps he is at home.'

(5)  Biztos, hogy otthon  van.
     certain that at home is(IND)
     'He must be at home.'

Moreover, the indicative is also used in the complement clauses of weak intensional predicates such as propositional attitude verbs (6), declaratives (7) and fiction verbs (8):

(6)  Péter azt      hiszi,         hogy Edit  otthon  van.
     Peter that.acc believes(IND) that Edith at home is(IND)
     'Peter believes that Edith is at home.'

(7)  Péter azt      mondja,    hogy Edit  otthon  van.
     Peter that.acc says(IND)  that Edith at home is(IND)
     'Peter says that Edith is at home.'

(8)  Péter azt      képzeli,      hogy Edit  otthon  van.
     Peter that.acc fancies(IND)  that Edith at home is(IND)
     'Peter fancies that Edith is at home.'

Notice that the interpretation of the embedded clause may be attached to the referent of the matrix subject instead of the speaker, as in the last three sentences.


1.2 The subjunctive

The subjunctive affix is -j.2 It is the mood primarily associated with imperative sentences:

(9)  Ülj          le!
     sit.you(SUB) down
     'Sit down.'

We also find the subjunctive in the complement clauses of directive (10), desiderative (11) and non-epistemic modal (12) predicates:

(10)  Megparancsolom, hogy ülj          le!
      order.I(IND)    that sit.you(SUB) down
      'I order you to sit down.'

(11)  Arra     vágyom,     hogy leülj.
      that.for long.I(IND) that down.sit.you(SUB)
      'I am longing for you to sit down.'

(12)  Nem szabad,    hogy leülj.
      not allowed.to that down.sit.you(SUB)
      'You are not allowed to sit down.'

Nevertheless, the subjunctive is not simply the mood that we could associate with imperative speech acts. It is obligatory, for example, in purpose adjunct clauses (13) and it appears in too and enough (14) contexts:

(13)  Azért    jöttem,     hogy beszéljek   veled.
      that.for came.I(IND) that talk.I(SUB) you.with
      'I have come to talk to you.'

(14)  Túl fiatal vagy         ahhoz,  hogy ezt      megértsd.
      too young  are.you(IND) that.to that this.acc understand.you(SUB)
      'You are too young to understand this.'

2A syntactically motivated distinction is often made between an imperative and a subjunctive mood in Hungarian, as the former reverses the order of the preverb and the verb both in main and in subordinate clauses, while the latter does not (cf., for example, Kenesei 1994). This is a purely syntactic distinction, though, and for this reason I am going to talk about the single morphological category of the subjunctive in what follows.


1.3 The conditional

The conditional affix is -n. It is the obligatory mood in both the main and the subordinate clauses of counterfactuals:

(15)  Ha ő   itt  lenne,      akkor boldog lennék.
      if she here be.she(CON) then  happy  be.I(CON)
      'If she were here, then I would be happy.'

It also has a volitional/boulomaic use, expressing the wishes of the speaker (or, as in (16), of the individual referent of the clausal subject):

(16)  Péter szívesen inna          egy sört.
      Peter keenly   drink.he(CON) a   beer.acc
      'Peter would keenly drink a beer.'

The conditional can be used to express a remote future possibility, i.e. when we want to indicate that there is only a slight possibility that something will happen in the future:

(17)  Ne  engedjétek   be, ha erre    jönne.
      not let.you(SUB) in  if here.to come.he(CON)
      'Do not let him in if he happens to come this way.'

And finally, the conditional is obligatory in two other subordinate clause types: complement clauses governed by the matrix pronominal anélkül (18) and clauses with the complementizer mintha (19):

(18)  Elment       anélkül,     hogy köszönt         volna.
      left.he(IND) that.without that said.goodbye.he CON
      'He left without saying goodbye.'

(19)  Úgy tett,       mintha semmi   sem történt     volna.
      so  did.he(IND) as.if  nothing not happened.it CON
      'He behaved as if nothing had happened.'

2 The semantics of the mood morphemes

The following discussion is based on the assumption that in the above examples the respective mood morphemes contribute a single meaning to the interpretation of the sentence in which they occur. This semantic feature is already assigned to them in the lexicon and it encodes the relation that exists between the propositional core of the relevant clause and a given contextually identified model of interpretation M. Two types of such relations are going to be established: compatibility and incompatibility. The present proposal is intended to be compatible with dynamic frameworks, but it itself


is not dynamic, as the sole purpose is the analysis of the meaning of mood morphemes and not that of the sentences in which they occur.

This proposal rests upon the following understanding of the notion of model. A model M is taken to be a single set containing propositions that the individual anchor of the model can assign a definite truth-value to. Every model is anchored to a given individual x and a given time t: in the default case the speaker and the time of utterance.3 This notion of model corresponds in many ways to what is currently known in dynamic semantics as an information state, as it is intended to represent the epistemic status of its individual anchor. The reason why I still talk about models and not information states is threefold. First, the present proposal itself is not dynamic, as is emphasised above. Second, propositions in modal semantics are interpreted with respect to models that assign them to (sets of) possible worlds, one of which is regarded as having a distinguished status in linguistic analyses: the actual world. As this paper is not concerned with an analysis of sentential meaning, the introduction of the whole inventory of a possible world semantics is not necessary. Models here are taken to represent only the relevant actual world, as it is known to a given individual. This terminology facilitates a transition to semantic analyses that aim at providing fully-fledged interpretations for sentences. Third, in some non-classical logical approaches models are indeed defined through language, without explicitly assuming a separate ontological domain of entities which expressions in the language can be assigned to (cf. Carnap's work). This notion of models is close to the present one.4

M is then a representation of the epistemic status of its individual anchor, which need not be the speaker, as in (8), repeated here as (20):

(20)  Péter azt      képzeli,     hogy Edit otthon  van.
      Peter that.acc fancies(IND) that Edit at home is(IND)
      'Peter fancies that Edit is at home.'

Here the interpretation of the matrix clause is in a model anchored to the speaker as the default case, but the subordinate clause is to be interpreted in a model anchored to Péter, the main clause subject. Model shifts can be introduced by expressions such as according to Peter or by intensional predicates such as képzeli in (20). Nevertheless, even a short characterisation of model-shifting would exceed the limitations of this paper.5 I investigate only one particular type of shifting models of interpretation in section 3: one which is the result of mood variation.

M contains propositions that the individual anchor either knows to be true or knows to be not true. As the proposal centres on the notions of

3In what follows the temporal parameter will be ignored and I will concentrate upon the default value: the present.

4My thanks are due to Tamás Mihálydeák for pointing this out to me.
5Cf. Farkas 1992 for an introductory discussion of this issue.


compatibility and incompatibility (to be explained below), the issue of entailments needs to be mentioned.6 From the present perspective, the problem arises when we want to represent inferences that speakers draw from the information that is available to them (the propositions in M). This is a very complex issue that I cannot tackle here properly, but what I assume is the following. Inferences are only represented in M if the individual anchor is "forced" to draw them in a given situation. Potential inferences (ones that are not relevant to be drawn in a given situation) are not to be represented in M. The reason why I do not assume theoretic closure is that M would cease to be a representation of any degree of psychological plausibility if we assumed that speakers were able to draw all the logically possible conclusions from the premises that are available to them. This, of course, renders a proper formal treatment very difficult, but this is not an idiosyncratic problem of the present proposal.

My basic claim is that the mood morphemes establish a relation between M and the propositional core of the clause in which they occur. We can obtain this propositional core if we abstract away all attitudinal operators from the clause. Kiefer (1987, 1988) claims that every sentence has the following schematic semantic structure:

(21) (Att, p')

where Att stands for attitudinal operators that determine sentence modality and p' represents the propositional content of the clause. If we consider for example the imperative sentence (9), repeated here as (22),

(22)  Ülj          le.
      sit.you(SUB) down
      'Sit down.'

then the relevant propositional content is 'you are sitting down' and the attitudinal operator is Imp(erative), which approximately means "I want the future action described by the propositional content to become true" (Kiefer 1987: 79). The attitudinal operator of declarative sentences is Decl, with the approximate sense "to consider something to be true".

These preliminaries having been set, we can now turn to the discussion of the mood morphemes themselves. I claim that the indicative denotes the compatibility of the propositional core of the clause in which the indicative morpheme is present with the relevant model M. Compatibility is defined in (23):

(23) A proposition p is compatible with a given model M iff ¬p ∉ M.

6As the reviewers of the abstract indicated, a proposition p can be incompatible with M even if ¬p is not in M, but there is another proposition q in M which implies ¬p.


In the case of the factive predicates in (1-3), p is in M, as the speaker knows the relevant propositions to be true. In (6-8) the matrix predicate introduces another model in which the propositional content of the complement clause is to be interpreted, and within that model the embedded proposition is again taken to be true. It is considered an advantage of the present approach that it can also account for clauses which do not express full epistemic commitment, as in (4-5): here neither p nor ¬p is in M, and this is indicative-inducing according to (23). That is why we also find the indicative in the complement clause of a matrix modal predicate of epistemic possibility, as in (24):

(24)  Lehet,   hogy otthon  van.
      may(IND) that at home is(IND)
      'He may be at home.'

Here the speaker does not know whether the person talked about is at home or not, and this indeterminacy leads to neither proposition being in M. In this case again, the relevant propositional core 'he is at home' is compatible with M. The above-mentioned Decl attitudinal operator is present only in the matrix clause, but not in the embedded clause: what is asserted is the possibility that the embedded proposition is true.

As for the other two moods, let us consider the subjunctive first. The subjunctive morpheme is argued to denote the incompatibility of the relevant propositional core with M, where incompatibility is defined as:

(25) A proposition p is incompatible with a given model M iff ¬p ∈ M.

Considering our earlier example sentences (9-14), the intuitive insight about the function of the subjunctive is that it can be regarded as a kind of indicator that the relevant propositional core is incompatible with the relevant model M, in the sense that p is known to be not true at the time of utterance. This complies with the description that Kiefer provides for the Imp operator. Let us consider the following sentence:

(26)  Látogass       meg  holnap.
      visit.you(SUB) perf tomorrow
      'Visit me tomorrow.'

I assume that at the time of utterance M contains a proposition of the form '(what I know now is that) you are not coming tomorrow', otherwise the use of an imperative sentence is not felicitous. It has to be emphasised again that the present proposal is restricted to a semantic analysis of the mood morphemes themselves and not of the sentences in which they occur. If we


want to achieve the latter goal, then the present apparatus is insufficient.7

What is claimed of the subjunctive morpheme is nothing more than that its contribution to the interpretation of the clause is that its propositional core is incompatible with M, as defined above. It will be shown in section 3 that this approach helps to explain why the subjunctive occurs in contexts where futurity is not involved in any sense (cf. sentences 37 and 40 in section 3).

The conditional morpheme, as opposed to the subjunctive, is argued to have a lexically encoded function of introducing an obligatory model shift. In (15-19), the conditional morpheme introduces an alternative model MC where the relevant propositional core is envisaged to be true; thus MC contains p. However, the reference model here, too, is M, just as has been claimed for the subjunctive, and it contains ¬p. In other words, the conditional morpheme also has the basic function of denoting the incompatibility of p with M, but, as opposed to the subjunctive, it at the same time introduces an alternative model of interpretation.

The counterfactual reading we have in (15) and in (18-19) follows from this approach in an obvious way. I assume that the volitional use operates in the same way as has been presented for the subjunctive: in the case of (16), Peter would like a situation to come about which is not true at the moment. We use (17), repeated here as (27), if what we know now is that he is not coming:

(27)  Ne  engedjétek   be, ha erre    jönne.
      not let.you(SUB) in  if here.to come.he(CON)
      'Do not let him in if he happens to come this way.'

What happens here is that the speaker still maintains a slight possibility that a situation might happen, which he otherwise considers to be not very likely.

I have presented an analysis where both the subjunctive and the conditional morphemes are to be associated with incompatibility, as it is understood here. The conditional morpheme has the additional function that it always denotes an obligatory shift in models of interpretation: a shift into a model MC which is incompatible with the reference model M. Thus we might as well regard the conditional as a special form of the subjunctive; in fact, I claim that they both realize the same underlying non-indicative category of mood.8 In this sense, the Hungarian mood system is based upon a basic dichotomy: the indicative can be contrasted with the abstract superordinate category of the non-indicative. Figure 22.1 gives a summary of the system that has been proposed here.

7Cf., for example, the analysis of deontic modals in Enç 1997, where predicates of deontic modality are said to shift the evaluation time into the future. Then, as usual, the sentence is to be interpreted in a set of future possible worlds; but the introduction of the whole inventory of a possible world semantics is exactly what the present proposal intends to avoid.

8In section 3 I will present data from Hungarian which are believed to support this claim.

    mood
    ├─ indicative [compatibility]: zero morph
    └─ non-indicative [incompatibility]
       ├─ subjunctive: -j
       └─ conditional [+ model shift]: -n

Figure 22.1: The Hungarian mood system.

A final note of warning should be made here before closing this section. I agree with Farkas (1992) that mood distribution is not completely arbitrary and that there is ample evidence to suggest that it is in fact to some extent semantically motivated. This does not mean, though, that such motivation can always be legitimately established. In other words, it is not claimed here that there is an exact mapping from semantic to grammatical representations or vice versa.

3 Mood variation in Hungarian

3.1 Some instances of mood variation in Hungarian

In this brief section I apply the approach outlined above to mood variation in Hungarian. Some typical contexts where mood variation is grammatical are listed below. I provide only one English translation in each case, as there is no systematic way of rendering in English the differences brought about by the change of mood in these environments.

The grammaticality of mood variation is most robust in polarity contexts. The presence of a conditional or a question operator (cf. 28-29) can be a licensing factor for an indicative-conditional variation to varying degrees:

(28)  Ez   az   a   híres  palota?
      this that the famous palace (IND, zero copula)

(29)  Ez   lenne      az   a   híres  palota?
      this be.it(CON) that the famous palace
      'Is this that famous palace?'9

9This example has been suggested to me by Ádám Nádasdy. I owe (32-33) to Tibor Laczkó.


(29) is typically uttered in a situation where the speaker finds the palace much different from what he had expected: he in fact finds it unworthy of its fame.

The negative operator is, however, a much stronger licenser of mood variation. We find the indicative-conditional variation in the complement clause of negated factive verbs (30-31), negated propositional attitude verbs (hisz 'believe', gondol 'think') and predicates with a negative meaning component (32-33), and in clauses of reason when the matrix clause is negated (34-35):

(30)  Péter nem emlékszik        arra,    hogy János itt  volt.
      Peter not remember.he(IND) that.for that John  here was(IND)

(31)  Péter nem emlékszik        arra,    hogy János itt  lett volna.
      Peter not remember.he(IND) that.for that John  here was  CON
      'Peter does not remember that John was here.'

(32)  János tagadta,       hogy ő  ölte           meg  az  elnököt.
      John  denied.he(IND) that he killed.he(IND) perf the president.acc

(33)  János tagadta,       hogy ő  ölte      volna meg  az  elnököt.
      John  denied.he(IND) that he killed.he CON   perf the president.acc
      'John denied that he killed the president.'

(34)  Péter nem azért    maradt         otthon, mert    fáradt volt.
      Peter not that.for stayed.he(IND) at home because tired  was.he(IND)

(35)  Péter nem azért    maradt         otthon, mert    fáradt lett   volna.
      Peter not that.for stayed.he(IND) at home because tired  was.he CON
      'Peter did not stay at home because he was tired.'

Where the indicative is used (30, 32, 34), the speaker takes the proposition expressed by the complement clause to be true. In (31, 33) the speaker's


epistemic commitment to the truth of the embedded proposition is left unspecified; in (35) the speaker takes the embedded proposition to be false. Negated epistemic predicates expressing likelihood or probability show an indicative-subjunctive-conditional alternation10 (36-38), and an indicative-subjunctive alternation appears in the complement clauses of negated matrix predicates of epistemic possibility11 (39-40):

(36)  Nem valószínű, hogy ez   a   történet igaz.
      not likely     that this the story    true (IND, zero copula)

(37)  Nem valószínű, hogy ez   a   történet igaz legyen.
      not likely     that this the story    true be(SUB)

(38)  Nem valószínű, hogy ez   a   történet igaz lenne.
      not likely     that this the story    true be(CON)
      'It is not likely that this story is true.'

(39)  Az   nem lehet,      hogy ilyen hülye  vagy.
      that not may.it(IND) that so    stupid are.you(IND)

(40)  Az   nem lehet,      hogy ilyen hülye  legyél.
      that not may.it(IND) that so    stupid be.you(SUB)
      'You cannot be that stupid.'

(36) and (39) mean something like 'Though it does not seem to be very likely, I still have to believe it, as my sources are completely reliable'. No such assumption is present in (37-38) and (40): the embedded proposition is not believed to be true by the speaker.12

In all the previous examples, the indicative was in variation with the subjunctive or the conditional, or both. But there is also a subjunctive-conditional variation in purpose clauses13:

10This threefold alternation also appears in the context of free choice pronouns and in concessive clauses. I leave these contexts out for lack of space, but the present proposal is intended to provide an account of these instances of mood variation, too.

11There is a restricted occurrence of the subjunctive in the complement clauses of these predicates, too. The question operator, for example, licenses this mood: Lehetséges, hogy tényleg ilyen hülye lennél? 'Is it really possible that you are that stupid?'

12Native speakers have differing intuitions about the interpretation of this set of sentences. Some of them find (36) (and, to a lesser degree, (39) too) acceptable even if the speaker does not believe that the embedded proposition is true. Even these native speakers believe, though, that the speaker is more likely to assume the truth of the embedded proposition if he uses the indicative than if he uses one of the non-indicative moods.

13The conditional is usually argued in descriptive grammars to be less acceptable in these contexts.


(41)  Megkérte,     hogy írjon         egy levelet.
      asked.he(IND) that write.he(SUB) a   letter.acc

(42)  Megkérte,     hogy írna          egy levelet.
      asked.he(IND) that write.he(CON) a   letter.acc
      'He asked him to write a letter.'

There is no semantic difference between the two complement clauses here: both express the same purposive meaning.

3.2 Explaining mood variation

Quer (1998) claims that the grammatical function of mood variation in Romance languages is to mark shifts in models of interpretation. What the indicative denotes in contexts of mood variation is that the interpretation of the relevant proposition is in a model anchored to the speaker and not to the referent of the main clause subject (if he is different from the speaker). The subjunctive has the function of denoting that the interpretation is not in the speaker's but in another, contextually available model (for example in that of the referent of the main clause subject). The speaker's epistemic commitment towards the truth of the relevant proposition is then left unspecified. I argue that this proposal makes the right predictions for the Hungarian data, too, with some modifications to be explained below.

Following the claim made by Quer, I assume that the indicative morpheme in all the above-depicted mood variation contexts has the function of denoting the compatibility of the relevant proposition with the speaker's model. Thus when the speaker opts for (32), he can only go on saying '. . . but I am sure that he did'. Accepting the denial put forward by John is not a possibility in this case. These indicative clauses express the speaker's full epistemic commitment towards the truth of the relevant proposition, which is brought about by the presence of the Decl attitudinal operator in these environments. This is an important difference between (39) and (24): Decl is assumed to be present in the complement clause only in the former, but not in the latter case.

I claim that in mood variation contexts both the subjunctive and the conditional morphemes have the basic function of denoting that the relevant proposition is incompatible with the relevant model. Thus they are assumed to operate in the same way whether they are in mood variation environments or not, and it is only the indicative that has a special function in the former case. When the individual anchor of the model is not the speaker, then incompatibility arises with respect to that model, and the speaker's commitment towards the truth of the relevant proposition is left unspecified. In (31) for example, the matrix predicate introduces a model anchored to the matrix subject where the embedded proposition is to be


interpreted. Hence it is possible that the speaker thinks that John was there, just as well as that he believes (in another situation) that he was not. When no other model of interpretation is available in the context (as in 29, 35, 37, 38 and 40), the two morphemes simply denote the incompatibility of p with the speaker's own model.

Where they are both grammatical in these contexts, the subjunctive and the conditional are almost completely interchangeable, as both appear to contribute the same incompatibility feature to the interpretation of the clause. This is especially true for (41) and (42). I believe that these data support the fundamental claim made in this paper: that the subjunctive and the conditional are both subtypes of the same abstract non-indicative mood, which is in basic contrast with the indicative.

References

Enç, M. (1997). Tense and modality. In S. Lappin (Ed.), The Handbook of Contemporary Semantic Theory, pp. 345-358. Oxford: Blackwell Publishers Ltd.

Farkas, D. (1992). Mood choice in complement clauses. In I. Kenesei and C. Pléh (Eds.), Approaches to Hungarian, Volume 4, pp. 207-225. Szeged: JATE.

Kenesei, I. (1994). Subordinate clauses. In F. Kiefer and K. É. Kiss (Eds.), The Syntactic Structure of Hungarian, Volume 27 of Syntax and Semantics, pp. 275-354. San Diego: Academic Press.

Kiefer, F. (1987). On defining modality. Folia Linguistica XXX(1), 67-94.

Kiefer, F. (1988). Mondatfajta - mondatmodalitás - mondathasználat [Sentence type - sentence modality - sentence use]. Tertium non datur 5, 5-15.

Quer, J. F. (1998). Mood at the Interface. Ph.D. thesis, The Hague: Holland Academic Graphics.


Unprovability of Herbrand Consistency in Weak Arithmetics

Saeed Salehi

Institute of Mathematics, Polish Academy of Sciences, P.O. Box 137, 00-950 Warsaw, and
Turku Center for Computer Sciences, Lemminkaisenkatu 14 A, 4th floor, FIN-20520 Turku

[email protected]

Abstract. By introducing an appropriate definition of Herbrand Consistency in weak arithmetics, we show Gödel's Second Incompleteness Theorem for Herbrand consistency of theories containing IΔ₀.

1 Introduction

Consider a formula φ in the prenex normal form

∀x₁∃y₁ ⋯ ∀xₘ∃yₘ φ(x₁, y₁, …, xₘ, yₘ)

with the Skolem functions f₁^φ, …, fₘ^φ; its Skolemized form is by definition

∀x₁ ⋯ ∀xₘ φ(x₁, f₁^φ(x₁), …, xₘ, fₘ^φ(x₁, …, xₘ)).

For a sequence of terms σ = ⟨t₁, …, tₘ⟩, the Skolem instance Sk(φ, σ) is

φ(t₁, f₁^φ(t₁), …, tₘ, fₘ^φ(t₁, …, tₘ)).
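As an illustration, the shape of a Skolem instance can be generated symbolically. This toy Python helper (the name skolem_instance is mine, and it builds only a display string, not a real formula object) interleaves the given terms with the Skolem terms exactly as in the definition of Sk(φ, σ):

```python
# Build the string form of Sk(phi, sigma) for a prenex formula with m
# forall/exists pairs: phi(t1, f1^phi(t1), t2, f2^phi(t1, t2), ...).

def skolem_instance(phi_name: str, sigma: list) -> str:
    args = []
    for i, t in enumerate(sigma, start=1):
        args.append(t)
        # The i-th Skolem function takes the first i terms as arguments.
        args.append(f"f{i}^{phi_name}({', '.join(sigma[:i])})")
    return f"{phi_name}({', '.join(args)})"

print(skolem_instance("phi", ["t1", "t2"]))
# phi(t1, f1^phi(t1), t2, f2^phi(t1, t2))
```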

Herbrand's Theorem states that a theory T is consistent if and only if every finite set of its Skolem instances is propositionally satisfiable (see (5)). Let Λ be a set of Skolem terms of T (i.e. terms constructed from the Skolem function symbols of T); the available Skolem instances of φ in Λ are the Sk(φ, σ) for all sequences of terms σ = ⟨t₁, …, tₘ⟩ ⊆ Λ such that {f₁^φ(t₁), …, fₘ^φ(t₁, …, tₘ)} is a subset of Λ too.

Any function p whose domain is a set of atomic formulae and whose range is {0, 1} is called an evaluation if it preserves equality (for all a, b and atomic formulae φ, p[a = b] = 1 implies p[φ(a)] = p[φ(b)]) and satisfies the equality axioms (p[a = a] = 1 for all a). For a set of terms Λ, an evaluation on Λ is an evaluation whose domain is the set of all atomic formulae with

Proceedings of the Sixth ESSLLI Student Session, Kristina Striegnitz (editor), Chapter 23, Copyright © 2001, Saeed Salehi


constants from Λ (i.e. the variables are substituted by terms from Λ). An evaluation p satisfies an atomic formula φ if p[φ] = 1. This definition can be extended to all open (quantifier-less) formulae in a unique way. An evaluation p on Λ is a T-evaluation for a theory T if it satisfies all the available Skolem instances of T in Λ. When Λ is the set of all Skolem terms of T, any T-evaluation on Λ determines a Herbrand model of T (see (5)).

Toward formalizing the definition of Herbrand Consistency, we read the above Herbrand's Theorem as:

A theory T is consistent if and only if for every finite set Λ of Skolem terms of T there is a T-evaluation on Λ.

So the Herbrand Consistency of a theory T can be defined as: "for every set of Skolem terms of T, there is a T-evaluation on it."

Herbrand's Theorem is provable in IΔ₀ + SupExp, and it is known that Herbrand consistency is not equivalent to the standard, say Hilbert's, consistency in IΔ₀ + Exp (see (3), (7)). Unprovability or provability of Herbrand Consistency for weak arithmetics (i.e. proper fragments of IΔ₀ + Exp) had been an open problem (see (6), (7)). Herbrand Consistency of IΔ₀ + Exp is unprovable in itself ((3), (7)).

Adamowicz ((1)) has shown the unprovability of the Herbrand Consistency of IΔ₀ + Ω₂ in itself (and, in another unpublished paper, of IΔ₀ + Ω₁).

In this paper we modify the de�nition of Herbrand Consistency such thatits negation gives a real Herbrand proof of contradiction even when Exp isnot available, and show unprovability of formalized Herbrand Consistencyof I�0 (by the new de�nition) in itself. So it turns out that I�0 does notprove its own Herbrand Consistency, since the new Herbrand Consistencypredicate is implied by the old one.

2 Formalization of Herbrand Consistency in IΔ₀

We take the language of arithmetic L = {0, S, +, ·, ≤}, in which the operations "S" (successor), "+" (addition) and "·" (multiplication) are regarded as predicates. For example, "x + y = z" is a 3-ary predicate, and the traditional statements should be re-read in this language by using the predicates S, +, ·; as an example, ∀x, y, z (x + (y + z) = (x + y) + z) can be read as ∀x, y, z, u, v, w ("y + z = v" ∧ "x + v = w" ∧ "x + y = u" → "u + z = w").

So we may need some extra universal quantifiers (and variables) to represent the arithmetical formulae in this language, but for simplicity, and when there is no confusion, we will use the old notation.

All atomic formulae in our language are of the form x₁ = x₂, x₂ = S(x₁), x₁ + x₂ = x₃, x₁·x₂ = x₃ and x₁ ≤ x₂, where x₁, x₂, x₃ are variables or the constant 0.


Denote the cardinality of a set A by |A|; by terms we mean terms constructed from the Skolem function symbols of a theory T under consideration.

For a set of terms Δ, there are 2|Δ|³ + 3|Δ|² different atomic formulae with constants from Δ. So there are 2^(2|Δ|³+3|Δ|²) different evaluations on Δ. This shows that the above definition has a deficiency in weak arithmetics from the viewpoint of incompleteness: unprovability of the consistency of T in T is equivalent to having a model of T which contains a proof of contradiction from T. By the above definition, a Herbrand proof of contradiction consists of a set of terms, say Δ, such that there is no T-evaluation on it. If Exp is not available in T, it may happen that all the (few) available evaluations in the model are T-evaluations. This does not give a real Herbrand proof in the model, because not all the evaluations are accessible in the model (their number 2^(2|Δ|³+3|Δ|²) might be too large to exist). It would be more reasonable if we could find a model with a sufficiently small set of terms in it, such that none of the evaluations on this set (which can be counted in the model) is a T-evaluation. An upper bound for the codes of the evaluations on a set of terms is given below.
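The count 2|Δ|³ + 3|Δ|² comes from the two ternary predicate forms (x₁ + x₂ = x₃ and x₁·x₂ = x₃) and the three binary ones (x₁ = x₂, x₁ ≤ x₂, x₂ = S(x₁)). A quick brute-force check of this closed form (the Python representation below is mine, not the paper's coding):

```python
def atomic_formulas(n):
    """All atomic formulas over n terms, with terms represented as 0..n-1."""
    ts = range(n)
    # x1+x2=x3 and x1*x2=x3 are ternary: 2 * n^3 instances.
    ternary = [(op, a, b, c) for op in ("add", "mul")
               for a in ts for b in ts for c in ts]
    # x1=x2, x1<=x2, x2=S(x1) are binary: 3 * n^2 instances.
    binary = [(op, a, b) for op in ("eq", "leq", "succ")
              for a in ts for b in ts]
    return ternary + binary

def atomic_count(n):
    """The paper's closed form: 2|Δ|³ + 3|Δ|²."""
    return 2 * n ** 3 + 3 * n ** 2
```

An evaluation assigns 0 or 1 to each of these formulae, which is where the 2^(2|Δ|³+3|Δ|²) total comes from.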

We use the Hájek–Pudlák coding of sets/sequences and terms ((3)); the main properties of this coding are:

- code(⟨x₁,…,x_l⟩), code({x₁,…,x_l}) ≤ (9(1 + max{x₁,…,x_l})²)^l (i.e. the set {x₁,…,x_l} or the sequence ⟨x₁,…,x_l⟩ can have a code which is less than or equal to [9(1 + max{x₁,…,x_l})²]^l);
- code(A ∪ B), code(A⌢B) ≤ 64·code(A)·code(B).

Code the ordered pair ⟨a, b⟩ by (a + b)² + b + 1.

Fix the function symbols f^{i,j}_k, where f^{i,j}_k is supposed to be the i-th k-ary Skolem function for the j-th axiom of a theory T (so if the j-th axiom is ∃x∀y∃u∃v A(x, y, u, v), then its Skolemization is ∀y A(f^{1,j}_0, y, f^{1,j}_1(y), f^{2,j}_1(y))).

Code f^{i,j}_k by ⟨1,⟨i,⟨j,k⟩⟩⟩, the symbol ")" by ⟨2,0⟩, "(" by ⟨2,1⟩, and the constant 0 by ⟨2,2⟩.

And fix the function symbols f^i_l, where f^i_l is supposed to be the i-th l-ary function; these symbols are reserved to be the Skolem functions of a formula α in the definition of HCon_T(α). Code f^i_l by ⟨0,⟨i,l⟩⟩.

Terms are well-bracketed sequences constructed from {(, )} ∪ {f^{i,j}_k}_{i,j,k} ∪ {f^i_l}_{i,l} (see (3)).
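The ordered-pair clause is concrete enough to test directly; a two-line sketch (mine) confirms the map is injective, which is all the coding needs:

```python
def pair(a, b):
    """The ordered-pair code of section 2: <a, b> = (a + b)² + b + 1."""
    return (a + b) ** 2 + b + 1

# pair is injective, so distinct pairs always receive distinct codes;
# e.g. pair(1, 2) == 12 while pair(2, 1) == 11.
```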

Example. Let i ≥ 1, and define c₀ = 0, c_{k+1} = f^{1,1}_1(c_k) for k ≤ i. There is a natural number A such that code({c₀,…,c_i}) ≤ A^{i²}.

Since code(c_{k+1}) ≤ 64⁴·code(f^{1,1}_1)·code("(")·code(c_k)·code(")") ≤ 64⁴·⟨1,⟨1,⟨1,1⟩⟩⟩·⟨2,0⟩·⟨2,1⟩·code(c_k), let m = 64⁴·⟨1,⟨1,⟨1,1⟩⟩⟩·⟨2,0⟩·⟨2,1⟩; then we have code(c_k) ≤ m^k·code(c₀).

Hence code({c₀,…,c_i}) ≤ (9(1 + code(c_i))²)^i ≤ 9^i·(2²·(m^i·code(c₀))²)^i ≤ 36^i·⟨2,2⟩^{2i}·m^{2i²} ≤ (36·⟨2,2⟩·m²)^{i²}, so we can take A = 36·⟨2,2⟩·m².

Let Δ be a set of terms with code y; we compute an upper bound for the codes of the


evaluations on Δ: each evaluation is of the form
{⟨y₁ = y₂, p[y₁ = y₂]⟩ | y₁, y₂ ∈ Δ} ∪ {⟨y₁ ≤ y₂, p[y₁ ≤ y₂]⟩ | y₁, y₂ ∈ Δ} ∪ {⟨y₂ = S(y₁), p[y₂ = S(y₁)]⟩ | y₁, y₂ ∈ Δ} ∪ {⟨y₁·y₂ = y₃, p[y₁·y₂ = y₃]⟩ | y₁, y₂, y₃ ∈ Δ} ∪ {⟨y₁ + y₂ = y₃, p[y₁ + y₂ = y₃]⟩ | y₁, y₂, y₃ ∈ Δ},
in which p[φ] ∈ {0,1} for any atomic formula φ with constants from Δ. Code "=" by ⟨3,0⟩, "≤" by ⟨3,1⟩, "S" by ⟨3,2⟩, "+" by ⟨3,3⟩, and "·" by ⟨3,4⟩. We code formulae in Polish notation; for example,
code(x₁ + x₂ = x₃) = code(+(x₁.x₂.x₃)) = code(⟨⟨3,3⟩, ⟨2,0⟩, code(x₁), code(x₂), code(x₃), ⟨2,1⟩⟩).

There is a natural number a such that for any k ∈ {0,1}:
code(⟨y₁ = y₂, k⟩) ≤ 2 + (1 + a·y₁y₂)²,
code(⟨y₁ ≤ y₂, k⟩) ≤ 2 + (1 + a·y₁y₂)²,
code(⟨y₂ = S(y₁), k⟩) ≤ 2 + (1 + a·y₁y₂)²,
code(⟨y₁ + y₂ = y₃, k⟩) ≤ 2 + (1 + a·y₁y₂y₃)², and
code(⟨y₁·y₂ = y₃, k⟩) ≤ 2 + (1 + a·y₁y₂y₃)².
So code(⟨φ, k⟩) ≤ 2 + (1 + ay³)² for all k ∈ {0,1} and atomic φ with constants from Δ, where code(Δ) = y. Hence code(p) ≤ (9·(3 + (1 + ay³)²)²)^(2|y|³+3|y|²) ≤ (81(1 + ay³)⁴)^(2|y|³+3|y|²) (we identify |Δ| with |y|) for every evaluation p on Δ.

Call a set of terms Δ with code(Δ) = y admissible if F(y) = (81(1 + ay³)⁴)^(2|y|³+3|y|²) exists. We modify the definition of Herbrand Consistency of a theory T as: "for every admissible set of Skolem terms of T, there is a T-evaluation on it." This is formalized below.

By "terms" we mean terms constructed from the Skolem function symbols {f^{i,j}_k}_{i,j,k} ∪ {f^i_l}_{i,l} introduced above; the bounded formula Terms(y) means "y is a set of terms constructed from those symbols".

There are bounded formulae eva(x) and eval(x, y) which represent "x is an evaluation" and "y is a set of terms and x is an evaluation on y".

For an atomic formula α, p[α] = 1 is a bounded formula; for more complex α, the statement p[α] = 1 can be written by a Π₁-formula. Let the bounded formula Sat(p, α, s) be:
"eva(p), and s is a sequence of pairs ⟨aᵢ, bᵢ⟩ such that:
1) each aᵢ is (the code of) a formula and each bᵢ is 0 or 1,
2) for k = length(s), a_k = α and b_k = 1,
3) each aᵢ is of one of the forms
3.1) aᵢ = aⱼ ∧ a_k for some j, k < i and bᵢ = bⱼ·b_k,
or 3.2) aᵢ = aⱼ ∨ a_k for some j, k < i and bᵢ = bⱼ + b_k − bⱼ·b_k,
or 3.3) aᵢ = aⱼ → a_k for some j, k < i and bᵢ = 1 + bⱼ·b_k − bⱼ,
or 3.4) aᵢ = ¬aⱼ for some j < i and bᵢ = 1 − bⱼ,
or 3.5) aᵢ is atomic and bᵢ = p[aᵢ]."

Let S(α) be the number of subformulae of the formula α. For the above sequence s we have code(s) ≤ (9(1 + code(⟨α,1⟩))²)^S(α) ≤ (9(1 + 2 + (α + 1)²)²)^S(α) ≤ (81(1 + α)⁴)^S(α).
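The arithmetic recurrences in clause 3 are just the classical truth tables written as polynomials over {0,1}; a small sketch (function names mine) checks them against Python's booleans:

```python
# Clause-3 recurrences of Sat(p, alpha, s), as 0/1 arithmetic.
def b_and(bj, bk):   # a_i = a_j ∧ a_k
    return bj * bk

def b_or(bj, bk):    # a_i = a_j ∨ a_k
    return bj + bk - bj * bk

def b_imp(bj, bk):   # a_i = a_j → a_k
    return 1 + bj * bk - bj

def b_not(bj):       # a_i = ¬a_j
    return 1 - bj
```

For instance, b_imp(1, 0) == 0 and b_imp(0, 0) == 1, matching material implication.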


So we can write p[α] = 1 as: ∀z (z ≥ (81(1 + α)⁴)^S(α) → ∃s ≤ z Sat(p, α, s)).

Let |α| be the number of existential quantifiers in the prenex normal form of α (we can assume it has the form α = ∀x₁∃y₁ ⋯ ∀x_m∃y_m ᾱ(x₁, y₁, …, x_m, y_m), so |α| = m in this case). For a formula α, fix its Skolem functions as f^α_1, …, f^α_μ, where μ = |α|. Write τ = ⟨t₁, …, t_μ⟩ ⊑ Δ for a set of terms Δ such that {f^α_1(t₁), …, f^α_μ(t₁,…,t_μ)} is a subset of Δ too.

We have code(Sk(α, τ)) ≤ code(ᾱ ⌢ ⟨f^α_1(t₁), …, f^α_μ(t₁,…,t_μ)⟩). On the other hand, code(⟨f^α_1(t₁), …, f^α_μ(t₁,…,t_μ)⟩) ≤ 18^μ·code(f^α_μ(t₁,…,t_μ))^{2μ}, and also code(f^α_μ(t₁,…,t_μ)) ≤ 64^{3+μ}·code(f^α_μ)·code("(")·code(")")·code(t₁)⋯code(t_μ). So with code(τ) = y we have
code(Sk(α, τ)) ≤ 64^{3μ}·(18·55²)^μ·64^{2μ(3+μ)}·code(f^α_μ)^{2μ}·y^{2μ²+μ}.
Let G(α, y) = [81(1 + 64^{3|α|}·(18·55²)^{|α|}·64^{2|α|(3+|α|)}·α·code(f^α_{|α|})^{2|α|}·y^{2|α|²+|α|})⁴]^S(α).

Noting that "u = Sk(α, τ)" is a bounded formula, we can write "p is an α-evaluation on y" as:
Terms(y) ∧ eval(p, y) ∧ ∀z [z ≥ G(α, y) → ∀u ≤ z ∀τ ≤ y {τ = ⟨t₁, …, t_{|α|}⟩ ⊑ y ∧ {f^α_1(t₁), …, f^α_{|α|}(t₁,…,t_{|α|})} ⊆ y ∧ "u = Sk(α, τ)" → ∃s ≤ z Sat(p, u, s)}].
Denote its bounded counterpart by SatAvail(p, α, y, z), that is:
SatAvail(p, α, y, z) = Terms(y) ∧ eval(p, y) ∧ ∀u ≤ z ∀τ ≤ y {τ = ⟨t₁, …, t_{|α|}⟩ ⊑ y ∧ {f^α_1(t₁), …, f^α_{|α|}(t₁,…,t_{|α|})} ⊆ y ∧ "u = Sk(α, τ)" → ∃s ≤ z Sat(p, u, s)}.

For a finite theory T = {T₁, …, T_n}, define the predicate HCon_T(x) as:
∀z (∀y ≤ z [Terms(y) ∧ z ≥ F(y) ∧ ⋀_{1≤j≤n} z ≥ G(T_j, y) ∧ z ≥ G(x, y) → ∃p ≤ z ∃s ≤ z {eval(p, y) ∧ ⋀_{1≤j≤n} SatAvail(p, T_j, y, s) ∧ SatAvail(p, x, y, s)}]).

We note that the bounds G(T_j, y), and for a standard x the bound G(x, y), are polynomial in y, so for large enough y (also for nonstandard y) they are less than the bound F(y).

The cut log² is defined (informally) by: x ∈ log² ⟺ 2^(2^x) exists. A formal definition is given in the next section. The predicate HCon*_T(x) is obtained from HCon_T(x) by restricting the (only unbounded) universal quantifier to log²:
∀z ∈ log² (∀y ≤ z [Terms(y) ∧ z ≥ F(y) ∧ ⋀_{1≤j≤n} z ≥ G(T_j, y) ∧ z ≥ G(x, y) → ∃p ≤ z ∃s ≤ z {eval(p, y) ∧ ⋀_{1≤j≤n} SatAvail(p, T_j, y, s) ∧ SatAvail(p, x, y, s)}]).

Proposition 2.1. The formulae HCon_T(·) and HCon*_T(·) above binumerate "Herbrand Consistency of T with α" in ℕ:
ℕ ⊨ HCon_T(α) iff ℕ ⊨ HCon*_T(α) iff "{α} ∪ T is Herbrand consistent."

Herbrand Consistency of T, HCon(T), is HCon_T("0 = 0"). For a moment assume we have proved the following proposition:


Proposition 2.2. There is a finite set of IΔ₀-derivable sentences, say B, such that for every bounded formula θ(x) with x as the only free variable, and for any finite theory Γ (in the language of arithmetic) whose axioms contain the set B,

IΔ₀ ⊢ HCon(Γ) ∧ ∃x ∈ log² θ(x) → HCon*_Γ("∃x ∈ log² θ(x)").

Now we can prove our main theorem:

Theorem 2.3. Take B as in the previous proposition, and let D be the union of B and a finite fragment of IΔ₀ containing PA⁻ such that the last proposition is provable in D. Then for any finite consistent theory Γ (in the language of arithmetic) whose axioms contain the set D, we have Γ ⊬ HCon(Γ).

Proof. Let τ be the fixed point of HCon*_Γ(¬τ), i.e. PA⁻ ⊢ HCon*_Γ(¬τ) ≡ τ (it is available in PA⁻; see (4)).

The theory Γ + ¬τ is consistent, since otherwise, by Proposition 2.1, we would have ℕ ⊨ ¬HCon*_Γ(¬τ), and so, by the fact that PA⁻ is Σ₁-complete ((4)), we would get PA⁻ ⊢ ¬HCon*_Γ(¬τ); hence Γ ⊢ ¬τ, and then Γ would be inconsistent.

Write ¬τ ≡ ∃x ∈ log² θ(x) for a bounded θ. Then Γ + ¬τ + HCon(Γ) ⊢ HCon(Γ) ∧ ∃x ∈ log² θ(x), so by Proposition 2.2 we get Γ + ¬τ + HCon(Γ) ⊢ HCon*_Γ("∃x ∈ log² θ(x)"), and then Γ + ¬τ + HCon(Γ) ⊢ HCon*_Γ(¬τ), i.e. Γ + ¬τ + HCon(Γ) ⊢ τ. So Γ ⊢ HCon(Γ) → τ, and this shows that Γ ⊬ HCon(Γ). □

3 A Herbrand Σ₁-Completeness Theorem in IΔ₀

This section is devoted to the proof of Proposition 2.2.

Gödel's original second incompleteness theorem states the unprovability of the (formalized) consistency of T in T, for strong enough theories T. Being strong enough means being able to code sets, sequences, terms and some other logical (syntactical) concepts, like provability, and to prove their properties. Among those properties are:
1. T ⊢ Pr_T(φ) ∧ Pr_T(φ → ψ) → Pr_T(ψ)
2. T ⊢ Pr_T(φ) → Pr_T(Pr_T(φ))
Usually property 2 is proved by use of the formalized Σ₁-completeness theorem:
T ⊢ φ → Pr_T(φ) for any Σ₁-formula φ.
So how can one show Gödel's second incompleteness theorem for weak arithmetics, which are not strong enough to prove those properties? One may have two options here:


1) try to find a model of T which does not satisfy Con(T);
2) try to show some weak form of Σ₁-completeness in T which can prove T ⊬ Con(T) (by an argument similar to the proof of our main theorem).

The first method is applied in (2) to show Q ⊬ Con(Q) for Robinson's arithmetic Q. There is no hope of using this method for more complex theories like IΔ₀ (and its super-fragments), since they have no recursive nonstandard models (see (3)).

So the difficulty arises when one seeks a kind of formalized Σ₁-completeness theorem which can be proved in the (weak) theory and at the same time is powerful enough to show the unprovability of the theory's consistency in itself.

A weak form of the Σ₁-completeness theorem can look like:
T ⊢ Con(T) ∧ ∃x θ(x) → Con_T(∃x θ(x)) for Δ₀-formulae θ(x) (cf. (1)).
Our Proposition 2.2 is such a weak form of Σ₁-completeness, in which the witness x for θ(x) is small (restricted to log²) and the second consistency predicate is rather weak (HCon*_T instead of HCon_T).

Take A to be the axiom system:
A1: ∀x ∃y "y = S(x)"
A2: ∀x, y, z ("y = S(x)" ∧ "z = S(x)" → y = z)
A3: ∀x (x ≤ x)
A4: ∀x, y, z (x ≤ y ∧ y ≤ z → x ≤ z)
A5: ∀x (x ≤ 0 → x = 0)
A6: ∀x, y, z ("y = S(z)" ∧ x ≤ y → x ≤ z ∨ x = y)
A7: ∀x, y ("y = S(x)" → x ≤ y)
A8: ∀x "x + 0 = x"
A9: ∀x, y, z, u, v ("z = S(y)" ∧ "x + y = u" ∧ "v = S(u)" → "x + z = v")
A10: ∀x "x·0 = 0"
A11: ∀x, y, z, u, v ("z = S(y)" ∧ "x·y = u" ∧ "u + x = v" → "x·z = v")
A12: ∀x, y ("y = S(x)" → ¬ y ≤ x)

Fix the terms c₀ = 0, c_{j+1} = f^{1,1}_1(c_j). The term c_i is represented as the i-th numeral in every A-evaluation p:

p[c₀ = 0] = 1 and p[c_{j+1} = S(c_j)] = 1.

Lemma 3.1. (IΔ₀) Suppose that for an i we have {c₀, …, c_i} ⊆ Δ for a set of terms Δ, and p is an A-evaluation on Δ. Then:
1) If p[a ≤ c_i] = 1 for an a ∈ Δ, then there is a j ≤ i such that p[a = c_j] = 1.
2) If ψ is an open formula and ψ(x₁, …, x_m) holds for x₁, …, x_m ≤ i, then p[ψ(c_{x₁}, …, c_{x_m})] = 1.

Proof. 1) By induction on j one can prove that if p[a ≤ c_j] = 1 then p[a = c_k] = 1 for a k ≤ j: for j = 0 use A5, and for j + 1 use A6.

2) The assertion can be proved for the atomic and negated atomic formulae. For x₁ ≤ x₂ use induction on x₂: for x₂ = 0 by A3, and for x₂ + 1 by A3, A4 and A7. Similarly, for x₁ + x₂ = x₃ and x₁·x₂ = x₃ use induction on x₂ and A8, A9, A10 and A11. For ¬ x₁ = x₂: if ¬ x₁ = x₂, then either x₁ + 1 ≤ x₂ or x₂ + 1 ≤ x₁; e.g. for x₁ + 1 ≤ x₂ we have p[c_{x₁+1} ≤ c_{x₂}] = 1, now use A12. For ¬ S(x₁) = x₂ use A2, and the cases ¬ x₁ + x₂ = x₃ and ¬ x₁·x₂ = x₃ can be derived from the previous cases. For ¬ x₁ ≤ x₂: if ¬ x₁ ≤ x₂, then x₂ + 1 ≤ x₁, so p[c_{x₂+1} ≤ c_{x₁}] = 1; now use A4 and A12.

The induction cases for ∧, ∨, → are straightforward. (Note that we have assumed the formula ψ is in negation normal form: negation appears only in front of atomic formulae.) □

Recall Gödel's beta function: β(a, b, i) = r iff a = (q + 1)[(i + 1)b + 1] + r ∧ r ≤ (i + 1)b for some q (cf. (4)). Define the ordered pairs by ⟨a, b⟩ = a + ½(a + b + 1)(a + b).

Let γ(z, i) = ∀x ≤ z ∀y ≤ z ∀j < i {⟨x, y⟩ = z → x ≤ (i + 1)y + 1 ∧ β(x, y, 0) = 2 ∧ β(x, y, j + 1) = (β(x, y, j))²}.

The formula γ(z, i) states that z is a (β-)code of a sequence whose length is at least i + 1, whose first term is 2, and in which every term is the square of its preceding term. So such a sequence looks like: ⟨2, 2², 2^(2²), …, 2^(2^i), …⟩.

We can define the cut log² as: x ∈ log² ⟺ ∃z γ(z, x). Denote the open part of γ by γ̄, so γ(z, x) = ∀u γ̄(z, x, u), in which u = (u₁, …, u_k) for a natural k. To get the B asserted in the proposition, we add the following axioms to A:

A13: γ(33, 0)
A14: ∀x ∀i ∃y (γ(x, i) → γ(y, i + 1))

The axiom A14 is in fact the IΔ₀-derivable statement i ∈ log² → i + 1 ∈ log². To be more precise, we (can) write the axiom A14 in prenex normal form:

A14: ∀x ∀i ∃y ∀u ∀v (γ̄(x, i, u) → γ̄(y, i + 1, v)).

Fix the terms z₀ = c₃₃, z_{j+1} = f^{1,14}_2(z_j, c_j). The term z_i is represented as a (β-)code of the sequence ⟨2, 2², …, 2^(2^i)⟩ in any B-evaluation (note that 33 = ⟨5, 2⟩ is a β-code for ⟨2⟩).
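Both coding gadgets here are easy to check numerically; the sketch below (names mine) computes the pair ⟨a, b⟩ = a + ½(a + b + 1)(a + b) and the squaring sequence 2, 2², 2^(2²), …:

```python
def godel_pair(a, b):
    """Ordered pair of section 3: <a, b> = a + (a + b + 1)(a + b) / 2."""
    return a + (a + b + 1) * (a + b) // 2

def squaring_seq(i):
    """The sequence described by gamma(z, i): starts at 2,
    and every term is the square of its predecessor."""
    seq = [2]
    for _ in range(i):
        seq.append(seq[-1] ** 2)
    return seq  # term j equals 2**(2**j)

# godel_pair(5, 2) == 33, the constant appearing in axiom A13.
# squaring_seq(3) == [2, 4, 16, 256]
```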

Lemma 3.2. (IΔ₀) Suppose that for an i ≥ 33 we have {c₀, …, c_i, z₀, …, z_i} ⊆ Δ. Then for any B-evaluation p on Δ, p satisfies all the available Skolem instances of γ(z_i, c_i).

Proof. By induction on j ≤ i one can show that any such p satisfies all the available Skolem instances of γ(z_j, c_j). □

Now we are close to the proof of the proposition. Let Γ be a theory whose axioms contain the set B, and take a model M ⊨ IΔ₀ such that M ⊨ HCon(Γ) and M ⊨ i ∈ log² ∧ θ(i) for an i ∈ M. Take a set of terms Δ with code(Δ) = y such that F(y) exists and is in log²(M) (we can assume i and y are nonstandard); then we find an admissible set of terms Δ′, so by the assumption HCon(Γ) there is a Γ-evaluation on Δ′ which induces


a (Γ ∪ {∃x ∈ log² θ(x)})-evaluation on Δ. This shows M ⊨ HCon*_Γ(∃x ∈ log² θ(x)).

Write θ(x) = ∀x₁ ≤ η₁ ∃y₁ ≤ ν₁ ⋯ ∀x_m ≤ η_m ∃y_m ≤ ν_m θ̄(x, x₁, y₁, …, x_m, y_m). There are (partial) functions g₁, …, g_m on M (we may assume g_j : [0, i]^j → M) such that for all a₁, …, a_m ∈ M:

M ⊨ a₁ ≤ η′₁ → [g₁(a₁) ≤ ν′₁ ∧ ⋯ [a_m ≤ η′_m → [g_m(a₁, …, a_m) ≤ ν′_m ∧ θ̄(i, a₁, g₁(a₁), …, g_m(a₁, …, a_m))]] …],

in which (η′_j, ν′_j; j ≤ m) is the image of (η_j, ν_j; j ≤ m) under the substitution {x ↦ i, x_j ↦ a_j, y_j ↦ g_j(a₁, …, a_j); j ≤ m}.

Consider the formula
∃x ∈ log² θ(x) ≡ ∃x ∃z ∀x₁ ≤ η₁ ∃y₁ ≤ ν₁ ⋯ ∀x_m ≤ η_m ∃y_m ≤ ν_m ∀u {γ̄(z, x, u) ∧ θ̄(x, x₁, y₁, …, x_m, y_m)}. Its Skolemized form is:
∀x₁ ⋯ ∀x_m ∀u {γ̄(f²₀, f¹₀, u) ∧ [x₁ ≤ η″₁ → [f¹₁(x₁) ≤ ν″₁ ∧ ⋯ [x_m ≤ η″_m → [f¹_m(x₁, …, x_m) ≤ ν″_m ∧ θ̄(f¹₀, x₁, f¹₁(x₁), …, x_m, f¹_m(x₁, …, x_m))]] ⋯ ]]},
in which (η″_j, ν″_j; j ≤ m) is the image of (η_j, ν_j; j ≤ m) under the substitution {x ↦ f¹₀, y_j ↦ f¹_j(x₁, …, x_j); j ≤ m}.

Define the operation ℜ on terms by:
- f¹₀ ↦ c_i
- f²₀ ↦ z_i
- f¹₁(c_j) ↦ c_{g₁(j)}
- ⋮
- f¹_m(c_{j₁}, …, c_{j_m}) ↦ c_{g_m(j₁, …, j_m)}

That is, the term f¹₀ is mapped (under ℜ) to c_i, the term f²₀ is mapped to z_i, and for any 1 ≤ t ≤ m the term f¹_t(c_{j₁}, …, c_{j_t}) is mapped to c_{g_t(j₁, …, j_t)}.

By an argument similar to the example in the previous section, it can be shown that there is a natural K such that code(c_j), code(z_j) ≤ K^j for any j ≥ 1, and code({c₀, …, c_i, z₀, …, z_i}) ≤ K^{i²} for any i ≥ 1.

For any term t, code(ℜ(t)) ≤ code(t ⌢ (z_i)^{|t|} ⌢ (c_i)^{|t|}) ≤ 64³·t·36^{3|t|}·code(z_i)^{|t|}·code(c_i)^{|t|} ≤ 64³·t·36^{3t}·K^{2it}, so max{code(ℜ(t)) | t ∈ Δ} ≤ 64³·y·36^{3y}·K^{2iy}; hence code(ℜ(Δ)) ≤ 36^{|y|}·[64³·y·36^{3y}·K^{2iy}]^{|y|}.

Let Δ′ = ℜ(Δ) ∪ {c₀, …, c_i, z₀, …, z_i}. So code(Δ′) = y′ ≤ 64·K^{2i²}·36^{|y|}·64^{3|y|}·y^{|y|}·36^{3y|y|}·K^{2iy|y|}.

We show that F(y′) exists. Note that y ∈ log², because y < F(y) ∈ log². Assuming that i, y are nonstandard, we can write: F(y′) ≤ (y′)^{4|y′|⁴} = (y′)^{4(2i+|y|)⁴} ≤ (y′)^{4y⁵}·(y′)^{4(2i)⁵}, and this is less than (2^(2^i))^{14} if y ≤ i, and is less than (2^(2^y))^{14} if i ≤ y. So Δ′ is admissible.

Hence, by the assumption HCon(Γ), there is a Γ-evaluation q on Δ′. Define the evaluation p on Δ by
p[φ(a₁, …, a_l)] = q[φ(ℜ(a₁), …, ℜ(a_l))] for any atomic φ.
It can be shown that the above equality holds for open formulae φ as well. We show that p satisfies all the available Skolem instances of {∃x ∈ log² θ(x)} ∪ Γ in Δ:


1) p is a Γ-evaluation, since q is one and the operation ℜ has nothing to do with the Skolem functions of Γ (here λ denotes the matrix of an axiom of Γ whose Skolem instances are available in Δ):
p[λ(t₁, f^{1,j}_1(t₁), …, t_k, f^{1,j}_k(t₁, …, t_k))] = q[λ(ℜ(t₁), ℜ(f^{1,j}_1(t₁)), …, ℜ(t_k), ℜ(f^{1,j}_k(t₁, …, t_k)))] = q[λ(ℜ(t₁), f^{1,j}_1(ℜ(t₁)), …, ℜ(t_k), f^{1,j}_k(ℜ(t₁), …, ℜ(t_k)))] = 1.

2) p satisfies all the available Skolem instances of ∃x ∈ log² θ(x) in Δ:
2.1) p[γ̄(f²₀, f¹₀, t₁, …, t₂₄)] = q[γ̄(ℜ(f²₀), ℜ(f¹₀), ℜ(t₁), …, ℜ(t₂₄))] = q[γ̄(z_i, c_i, ℜ(t₁), …, ℜ(t₂₄))] = 1, since by Lemma 3.2 q satisfies all the available Skolem instances of γ(z_i, c_i), so the latter equality holds.
2.2) By Lemma 3.1, for any term t and any k ≤ i, if p[t ≤ c_k] = 1 then p[t = c_j] = 1 for some j ≤ k. So for evaluating θ(x) it is enough to consider Skolem instances like θ̄(f¹₀, c_{j₁}, f¹₁(c_{j₁}), …, c_{j_m}, f¹_m(c_{j₁}, …, c_{j_m})):
p[θ̄(f¹₀, c_{j₁}, f¹₁(c_{j₁}), …, c_{j_m}, f¹_m(c_{j₁}, …, c_{j_m}))] = q[θ̄(ℜ(f¹₀), ℜ(c_{j₁}), ℜ(f¹₁(c_{j₁})), …, ℜ(c_{j_m}), ℜ(f¹_m(c_{j₁}, …, c_{j_m})))] = q[θ̄(c_i, c_{j₁}, c_{g₁(j₁)}, …, c_{j_m}, c_{g_m(j₁, …, j_m)})] = 1;
the latter equality holds by M ⊨ θ̄(i, j₁, g₁(j₁), …, j_m, g_m(j₁, …, j_m)) and Lemma 3.1.

This completes the proof of the proposition. □

Acknowledgement. I would like to thank Professor Zofia Adamowicz for reading the draft of this paper and for her fruitful criticisms.

References

[1] Adamowicz Z. & Zbierski P., "On Herbrand Type Consistency in Weak Theories", to appear in Archive for Mathematical Logic.

[2] Bezboruah A. & Shepherdson J.C., "Gödel's Second Incompleteness Theorem for Q", The Journal of Symbolic Logic 41 (1976), pp. 503–512.

[3] Hájek P. & Pudlák P., Metamathematics of First-Order Arithmetic, Springer-Verlag, 1991.

[4] Kaye R., Models of Peano Arithmetic, Oxford Logic Guides 15, Oxford Science Publications, The Clarendon Press, Oxford University Press, New York, 1991.

[5] Nerode A. & Shore R.A., Logic for Applications, Springer-Verlag, 1993.

[6] Paris J. & Wilkie A., "Δ₀-sets and induction", Proceedings of the Jadwisin Logic Conference, Open Days in Model Theory and Set Theory, Poland, Leeds University Press, 1981, pp. 237–248.

[7] Pudlák P., "Cuts, Consistency Statements and Interpretations", The Journal of Symbolic Logic 50 (1985), pp. 423–442.


Monads for natural language semantics

Chung-chieh Shan

Harvard University, 33 Oxford St, Cambridge, MA 02138, USA

[email protected]

Abstract. Accounts of semantic phenomena often involve extending types of meanings and revising composition rules at the same time. The concept of monads allows many such accounts (for intensionality, variable binding, quantification and focus) to be stated uniformly and compositionally.

1 Introduction

The Montague grammar tradition formulates formal semantics for natural languages in terms of the λ-calculus. Each utterance is considered a tree in which each leaf node is a lexical item whose meaning is a (usually typed) value in the λ-calculus. The leaf node meanings then determine meanings for subtrees, through recursive application of one or more composition rules. A composition rule specifies the meaning of a tree in terms of the meanings of its subtrees. One simple composition rule is function application:

⟦x y⟧ = ⟦x⟧(⟦y⟧) : β, where ⟦x⟧ : α → β and ⟦y⟧ : α. (24.1)

Here α and β are type variables, and we denote function types by →.

To handle phenomena such as intensionality, variable binding, quantification and focus, we often introduce new types in which to embed existing aspects of meaning and accommodate additional ones. Having introduced new types, we then need to revise our composition rules to reimplement existing functionality. In this way, we often augment semantic theories by simultaneously extending the types of meanings and stipulating new composition rules. When we augment a grammar, its original lexical meanings and composition rules become invalid and require global renovation (typically described as "generalizing to the worst case" (Partee 1996)). Each time we consider a new aspect of meaning, all lexical meanings and composition rules have to be revised.

Over the past decade, the category-theoretic concept of monads has gained popularity in computer science as a tool to structure denotational semantics (Moggi 1990; Moggi 1991) and functional programs (Wadler 1992a;

Proceedings of the Sixth ESSLLI Student Session, Kristina Striegnitz (editor), Chapter 24. Copyright © 2001, Chung-chieh Shan.


Wadler 1992b). When used to structure computer programs, monads allow the substance of a computation to be defined separately from the plumbing that supports its execution, increasing modularity. Many accounts of phenomena in natural language semantics can also be phrased in terms of monads, thus clarifying the account and simplifying the presentation.

In this paper, I will present the concept of monads and show how they can be applied to natural language semantics. To illustrate the approach, I will use four monads to state analyses of well-known phenomena uniformly and compositionally. By "uniformly" I mean that, even though the analyses make use of a variety of monads, they all invoke monad primitives in the same way. By "compositionally" I mean that the analyses define composition rules in the spirit of Montague grammar. After presenting the monadic analyses, I will discuss combining monads to account for interactions between semantic phenomena.

2 Monadic analyses

Intuitively, a monad is a transformation on types equipped with a composition method for transformed values. Formally, a monad is a triple (M, η, ⋆), where M is a type constructor (a map from each type α to a corresponding type M α), and η and ⋆ are functions (pronounced "unit" and "bind"):

η : α → M α;  ⋆ : M α → (α → M β) → M β. (24.2)

These two functions are polymorphic in the sense that they must be defined for all types α and β. Roughly speaking, η specifies how ordinary values can be injected into the monad, and ⋆ specifies how computations within the monad compose with each other.¹ Some concrete examples follow.

2.1 The powerset monad; interrogatives

As a first example, consider sets. Corresponding to each type α we have a type α → t, the type of subsets of α. We define²

M α = α → t  ∀α; (24.4a)
η(a) = {a} : M α  ∀a : α; (24.4b)
m ⋆ k = ⋃_{a∈m} k(a) : M β  ∀m : α → t, k : α → β → t. (24.4c)

¹By definition, η and ⋆ must satisfy left identity, right identity, and associativity:

η(a) ⋆ k = k(a)  ∀a : α, k : α → M β; (24.3a)
m ⋆ η = m  ∀m : M α; (24.3b)
(m ⋆ k) ⋆ l = m ⋆ (λv. k(v) ⋆ l)  ∀m : M α, k : α → M β, l : β → M γ. (24.3c)

²In this section and the next, we treat types as sets in order to define the powerset and pointed powerset monads. These two monads do not exist in every model of the λ-calculus.


The powerset monad is a crude model of non-determinism. For example, the set of individuals m₁ defined by

m₁ = {John, Mary} : M e

can be thought of as a non-deterministic individual: it is ambiguous between John and Mary. Similarly, the function k₁ defined by

k₁ : e → M(e → t)
k₁(a) = {λx. like(a, x), λx. hate(a, x)} : M(e → t)

maps each individual to a non-deterministic property. To apply the function k₁ to the individual m₁, we compute

m₁ ⋆ k₁ = ⋃_{a∈{John, Mary}} {λx. like(a, x), λx. hate(a, x)}
        = {λx. like(John, x), λx. hate(John, x), λx. like(Mary, x), λx. hate(Mary, x)} : M(e → t).

We see that the non-determinism in both m₁ and k₁ is carried through to produce a 4-way-ambiguous result.
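This unit/bind behavior is easy to run. The sketch below models M α as a Python set and represents properties as tagged tuples; the names `unit` and `bind` (standing in for η and ⋆) and the encoding are mine:

```python
def unit(a):
    """η: inject an ordinary value into the powerset monad."""
    return {a}

def bind(m, k):
    """⋆: apply k to every alternative in m and union the results."""
    out = set()
    for a in m:
        out |= k(a)
    return out

m1 = {"John", "Mary"}                       # non-deterministic individual
k1 = lambda a: {("like", a), ("hate", a)}   # non-deterministic property

# bind(m1, k1) is the 4-way-ambiguous set
# {("like","John"), ("hate","John"), ("like","Mary"), ("hate","Mary")}
```

Applying η first, as in (24.5) below, gives bind(unit("John"), k1) == k1("John").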

Most words in natural language are not ambiguous in the way m₁ and k₁ are. To upgrade an ordinary (deterministic) value of any type α to the corresponding non-deterministic type M α, we can apply η to the ordinary value, say John:

η(John) = {John} : M e;
{John} ⋆ k₁ = {λx. like(John, x), λx. hate(John, x)} : M(e → t). (24.5)

Similarly, to convert an ordinary function to a non-deterministic function, we can apply η to the output of the ordinary function, say k₂ below:

k₂ = λa. λx. like(a, x) : e → e → t;
η ∘ k₂ = λa. {λx. like(a, x)} : e → M(e → t);
m₁ ⋆ (η ∘ k₂) = {λx. like(John, x), λx. like(Mary, x)} : M(e → t). (24.6)

In both (24.5) and (24.6), an ordinary value is made to work with a non-deterministic value by upgrading it to the non-deterministic type.

Consider now the function application rule (24.1). We can regard it as a two-argument function, denoted A and defined by

A : (α → β) → α → β;
A(f)(x) = f(x) : β  ∀f : α → β, x : α. (24.7)

We can lift ordinary function application A to non-deterministic function application A_M, defined by

A_M : M(α → β) → M α → M β;
A_M(f)(x) = f ⋆ (λa. x ⋆ [λb. η(a(b))]) : M β  ∀f : M(α → β), x : M α. (24.8)


Substituting (24.4) into (24.8), we get

A_M(f)(x) = {a(b) | a ∈ f, b ∈ x} ⊆ β  ∀f ⊆ α → β, x ⊆ α. (24.9)

Just as the definition of A in (24.7) gives rise to the original composition rule (24.1), that is,

⟦x y⟧ = A(⟦x⟧)(⟦y⟧), (24.10)

the definition of A_M in (24.8) gives rise to the revised composition rule

⟦x y⟧ = A_M(⟦x⟧)(⟦y⟧). (24.11)

For the powerset monad, this revised rule is the set-tolerant composition rule in the alternative semantics analysis of interrogatives first proposed by Hamblin (1973). In Hamblin's analysis, the meaning of each interrogative constituent is a set of alternatives available as answers to the question; this corresponds to the definition of M in (24.4). By contrast, the meaning of each non-interrogative constituent is a singleton set; this corresponds to the definition of η in (24.4).
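In code, the lifted application A_M is one line on top of unit and bind; this sketch (Python, names mine) reproduces the set-comprehension form (24.9):

```python
def unit(a):
    return {a}

def bind(m, k):
    out = set()
    for a in m:
        out |= k(a)
    return out

def app_m(f, x):
    """A_M from (24.8): f ⋆ (λa. x ⋆ (λb. η(a(b))))."""
    return bind(f, lambda a: bind(x, lambda b: unit(a(b))))

inc = lambda v: v + 1
dbl = lambda v: v * 2
# app_m({inc, dbl}, {10, 20}) == {11, 21, 20, 40},
# i.e. the comprehension {a(b) for a in f for b in x} of (24.9).
```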

To support question-taking verbs (such as know and ask), we (and Hamblin) need a secondary composition rule in which A is lifted with respect to the function f but not the argument x:

⟦x y⟧ = A′_M(⟦x⟧)(⟦y⟧), where
A′_M : M(M α → β) → M α → M β;
A′_M(f)(x) = f ⋆ (λa. η(a(x))) : M β  ∀f : M(M α → β), x : M α. (24.12)

Substituting (24.4) into (24.12), we get

A′_M(f)(x) = {a(x) | a ∈ f} ⊆ β  ∀f ⊆ (α → t) → β, x ⊆ α. (24.13)

Note that, for any given pair of types of ⟦x⟧ and ⟦y⟧, at most one of A_M (24.8) and A′_M (24.12) can apply. Thus the primary composition rule (24.11) and the secondary composition rule (24.12) never conflict.
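The secondary rule differs only in feeding the whole alternative set to each function in f. A minimal sketch (names mine; the "question-taking" meaning is faked as a function over alternative sets):

```python
def unit(a):
    return {a}

def bind(m, k):
    out = set()
    for a in m:
        out |= k(a)
    return out

def app_m_prime(f, x):
    """A'_M from (24.12): each a in f consumes the whole set x, not its members."""
    return bind(f, lambda a: unit(a(x)))

# A toy question-taking meaning: it operates on the full set of answers.
count_answers = lambda alts: len(alts)
# app_m_prime({count_answers}, {"yes", "no"}) == {2}
```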

2.2 The pointed powerset monad; focus

A variation on the powerset monad (24.4) is the pointed powerset monad ; itis implicitly involved in Rooth's (1996) account of focus in natural language.A pointed set is a nonempty set with a distinguished member. In otherwords, a pointed set x is a pair x = (x0; x1), such that x0 is a member ofthe set x1. De�ne the pointed powerset monad by

M � =�

(x0; x1) j x0 2 x1 � �

8�; (24.14a)

�(a) =�a; fag

�: M � 8a : �; (24.14b)

m ? k =�[k(m0)]0;

Sa2m1

[k(a)]1�

: M � 8m : M �, k : �! M �: (24.14c)

278

Page 286: Kristina Striegnitz › esslli › courses › readers › studentsession.pdfNob o Komagata, Geert-Jan Kruij , Iv ana Kruij -Korba y o v a, ii Daniela Kurz, Celine Kuttler, F ran cois

Chung-chieh Shan

This de�nition captures the intuition that we want to keep track of both a setof non-deterministic alternatives and a particular alternative in the set. Aswith the powerset monad, the de�nition of m?k carries the non-determinismin both m and k through to the result.

Substituting our new monad de�nition (24.14) into the previously liftedapplication formula (24.8) gives

A_M(f₀, f₁)(x₀, x₁) = ( f₀(x₀), { a(b) | a ∈ f₁, b ∈ x₁ } ) : Mβ
∀(f₀, f₁) : M(α → β), (x₀, x₁) : Mα.   (24.15)

This formula, in conjunction with the primary composition rule (24.11), is equivalent to Rooth's recursive definition of focus semantic values.

Crucially, even though the pointed powerset monad extends our meaning types to accommodate focus information, neither our definition of A_M in (24.8) nor our composition rule (24.11) needs to change from before. Moreover, the majority of our lexical meanings have nothing to do with focus and thus need not change either. For example, in the hypothetical lexicon entry ⟦John⟧ = η(John), the upgrade from meaning type e to meaning type Me occurs automatically due to the redefinition of η.

2.3 The reader monad; intensionality and variable binding

Another monad often seen in computer science is the reader monad, also known as the environment monad. This monad encodes dependence of values on some given input. To define the reader monad, fix a type δ (say the type s of possible worlds, or the type g of variable assignments), then let

M α = δ → α   ∀α,   (24.16a)
η(a) = λw. a : Mα   ∀a : α,   (24.16b)
m ⋆ k = λw. k(m(w))(w) : Mβ   ∀m : Mα, k : α → Mβ.   (24.16c)

Note how the definition of m ⋆ k threads the input w through both m and k to produce the result. To see this threading process in action, let us once again substitute our monad definition (24.16) into the definition of A_M in (24.8):

A_M(f)(x) = λw. f(w)(x(w)) : Mβ   ∀f : M(α → β), x : Mα.   (24.17)

For δ = s, we can think of M as the intensionality monad, noting that (24.17) is exactly the usual extensional composition rule. While words such as student and know have meanings that depend on the possible world w, words such as is and and do not. We can upgrade the latter by applying η.
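As an illustration (mine, with a hypothetical two-world model), the intensionality reading of the reader monad can be sketched in Python:

```python
def unit(a):
    """eta (24.16b): a constant function of the world w."""
    return lambda w: a

def bind(m, k):
    """(24.16c): thread the world w through both m and k."""
    return lambda w: k(m(w))(w)

def app(f, x):
    """A_M; for this monad it reduces to \\w. f(w)(x(w)), as in (24.17)."""
    return bind(f, lambda g: bind(x, lambda y: unit(g(y))))

# Toy intensional lexicon: "student" varies by world, "is" does not.
extensions = {"w1": {"ann"}, "w2": {"bob"}}
student = lambda w: (lambda y: y in extensions[w])   # world-dependent
is_     = unit(lambda p: p)                          # upgraded via eta
ann     = unit("ann")

sentence = app(app(is_, student), ann)   # "Ann is a student"
# sentence("w1") is True, sentence("w2") is False: the same composition
# rule evaluates "student" at whichever world is supplied.
```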

For δ = g, we can think of M as the variable binding monad, noting that (24.17) is the usual assignment-preserving composition rule. Except for pronominals, most word meanings do not refer to the variable assignment. Thus we can upgrade the majority of word meanings by applying η.



Monads for natural language semantics

If we substitute the same monad definition (24.16) into the secondary composition rule (24.12), the result is

A′_M(f)(x) = λw. f(w)(x) : Mβ   ∀f : M(Mα → β), x : Mα.   (24.18)

For δ = s, this is the intensional composition rule; it handles sentence-taking verbs such as know and believe (of type s → (s → t) → e → t) by allowing them to take arguments of type s → t rather than type t. The monad laws, by the way, guarantee that A′_M(f)(x) = A_M(f)(η(x)) for all f and x; the function η (in this case a map from s → t to s → s → t) is simply the intension (up) operator, usually written ^.

For δ = g, the same formula (24.18) is often involved in accounts of quantification that assume quantifier raising at LF, such as that in Heim and Kratzer (1998). It handles raised quantifiers (of type g → (g → t) → t) by allowing them to take arguments of type g → t rather than type t. The function η (in this case a map from g → t to g → g → t) is simply the variable abstraction operator.
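For the intensional case, the contrast between (24.17) and (24.18) can be sketched as follows (my illustration; the toy `believe` entry, which checks propositions at a fixed belief world "w2", is an assumption made purely for the demonstration):

```python
unit = lambda a: (lambda w: a)            # eta of the reader monad (24.16b)

def app(f, x):
    """Primary (extensional) composition (24.17): evaluate x at w."""
    return lambda w: f(w)(x(w))

def app2(f, x):
    """Secondary (intensional) composition (24.18): pass x unevaluated."""
    return lambda w: f(w)(x)

rains = lambda w: (w == "w1")             # a proposition of type M t = s -> t
# believe : s -> (s -> t) -> e -> t; in this toy model everyone believes
# exactly what holds at the hypothetical belief world "w2".
believe = lambda w: (lambda p: (lambda y: p("w2")))
ann = unit("ann")

sentence = app(app2(believe, rains), ann)   # "Ann believes it rains"
# At w1 it rains, yet the belief report is False, because app2 hands
# believe the whole intension of its sentential argument.
```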

2.4 The continuation monad; quantification

Barker (2000) proposed an analysis of quantification in terms of continuations. The basic idea is to continuize a grammar by replacing each meaning type α with its corresponding continuized type (α → t) → t throughout. As a special case, the meaning type of NPs is changed from e to (e → t) → t, matching the original treatment of English quantification by Montague (1974).

In general, for any fixed type ω (say t), we can define a continuation monad with answer type ω:

M α = (α → ω) → ω   ∀α,   (24.19a)
η(a) = λc. c(a) : Mα   ∀a : α,   (24.19b)
m ⋆ k = λc. m(λa. k(a)(c)) : Mβ   ∀m : Mα, k : α → Mβ.   (24.19c)

The value c manipulated in these definitions is known as the continuation. Intuitively, "the continuation represents an entire (default) future for the computation" (Kelsey, Clinger, and Rees 1998). Each value of type Mα must turn a continuation (of type α → ω) into an answer (of type ω). The most obvious way to do so, encoded in the definition of η above, is to feed the continuation a value of type α:

⟦John⟧ = η(John) = λc. c(John) : Me,   (24.20a)
⟦smokes⟧ = η(smoke) = λc. c(smoke) : M(e → t).   (24.20b)

To compute the meaning of John smokes, we first substitute our monad definition (24.19) into the primary composition operation A_M (24.8):

A_M(f)(x) = λc. f(λg. x(λy. c(g(y)))) : Mβ   ∀f : M(α → β), x : Mα.   (24.21)




Letting f = η(smoke) and x = η(John) then gives

⟦John smokes⟧ = λc. η(smoke)(λg. η(John)(λy. c(g(y))))
= λc. η(John)(λy. c(smoke(y)))
= λc. c(smoke(John)) : Mt.

In the second step above, note how the term λy. c(smoke(y)) represents the future for the computation of ⟦John⟧, namely to check whether he smokes, then pass the result to the context c containing the clause. If John smokes is the main clause, then the context c is simply the identity function id_ω. We define an evaluation operator ε : Mω → ω by ε(m) = m(id_ω). Fixing ω = t, we then have ε(⟦John smokes⟧) = smoke(John), as desired.

Continuing to fix ω = t, we can specify a meaning for everyone:

⟦everyone⟧ = λc. ∀x. c(x) : Me.   (24.22)

This formula is not of the form λc. c(…). In other words, the meaning of everyone non-trivially manipulates the continuation, and so cannot be obtained by applying η to an ordinary value. Using the continuized composition rule (24.21), we now compute a denotation for everyone smokes:

⟦everyone smokes⟧ = λc. η(smoke)(λg. ⟦everyone⟧(λy. c(g(y))))
= λc. ⟦everyone⟧(λy. c(smoke(y)))
= λc. ∀x. c(smoke(x)) : Mt,

giving ε(⟦everyone smokes⟧) = ∀x. smoke(x) : t, as desired.
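The computation above can be replayed in Python (my sketch; the finite domain and the extension of smokes are hypothetical stand-ins):

```python
def unit(a):
    """eta (24.19b): feed the value to the continuation c."""
    return lambda c: c(a)

def bind(m, k):
    """(24.19c): run m, handing each value to k with the outer continuation."""
    return lambda c: m(lambda a: k(a)(c))

def app(f, x):
    """Continuized application; reduces to (24.21)."""
    return bind(f, lambda g: bind(x, lambda y: unit(g(y))))

def ev(m):
    """The evaluation operator: apply m to the identity continuation."""
    return m(lambda a: a)

domain   = ["alice", "bob"]
smoke    = lambda y: (y == "alice")              # toy extension: only alice smokes
smokes   = unit(smoke)
john     = unit("alice")
everyone = lambda c: all(c(x) for x in domain)   # (24.22): manipulates c

main_clause = ev(app(smokes, john))       # smoke(alice): True
quantified  = ev(app(smokes, everyone))   # forall x. smoke(x): False here
```

The quantifier takes scope in situ: `everyone` simply applies the captured continuation to every member of the domain.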

The main theoretical advantage of this analysis is that it is a compositional, in-situ analysis that does not invoke quantifier raising. Moreover, note that a grammar continuized is still a grammar: the continuized composition rule (24.21) is perfectly interpretable using the standard machinery of Montague grammar. In particular, we do not invoke any type ambiguity or flexibility as proposed by Partee and Rooth (1983) and Hendriks (1993); the interpretation mechanism performs no type-shifting at "run-time".

This desirable property also holds of the other monadic analyses I have presented. For instance, in a grammar with intensionality, meanings that use intensionality (for example ⟦student⟧) are identical in type to meanings that do not (for example ⟦is⟧). The interpretation mechanism does not dynamically shift the type of is to match that of student.

It is worth relating the present analysis to the computer science literature. Danvy and Filinski (1990) studied composable continuations, which they manipulated using two operators, "shift" and "reset". We can define shift and reset for the continuation monad by (Wadler 1994)

shift = λh. λc. ε( h(λa. λc′. c′(c(a))) ) : ((α → Mω) → Mω) → Mα,   (24.23)
reset = λm. λc. c(ε(m)) : Mω → Mω.   (24.24)

Assuming that the ∀ operator is of type (e → t) → t, the meaning of everyone specified in (24.22) is simply shift(λc. (η ∘ ∀)(ε ∘ c)). To encode scope islands, Barker implicitly used reset. Filinski (1999) proved that, in a certain sense, composable continuations can simulate monads.
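Under the same Python modelling of the continuation monad, shift and reset can be sketched directly from (24.23) and (24.24) (my illustration; the finite domain and `forall` are hypothetical stand-ins for the ∀ operator):

```python
def unit(a):                  # eta of the continuation monad (24.19b)
    return lambda c: c(a)

def ev(m):                    # the evaluation operator: m applied to id
    return m(lambda a: a)

def shift(h):
    """(24.23): h receives the captured continuation, reified as a -> M omega."""
    return lambda c: ev(h(lambda a: (lambda c2: c2(c(a)))))

def reset(m):
    """(24.24): evaluate m in an empty context, delimiting captured continuations."""
    return lambda c: c(ev(m))

domain = ["alice", "bob"]
forall = lambda p: all(p(x) for x in domain)   # stand-in for the forall operator

# everyone = shift(\c. (eta . forall)(ev . c)), as in the text:
everyone = shift(lambda c: unit(forall(lambda x: ev(c(x)))))
# Unfolding shift shows that everyone denotes \c. forall x. c(x), i.e. (24.22).
```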




3 Combining monads

Having placed various semantic phenomena in a monadic framework, we now ask a natural question: can we somehow combine monads in a modular fashion to characterize interaction between semantic phenomena, for example between intensionality and quantification?

Unfortunately, there exists no general construction for composing two arbitrary monads, say (M₁, η₁, ⋆₁) and (M₂, η₂, ⋆₂), into a new monad of the form (M₃ = M₁ ∘ M₂, η₃, ⋆₃) (King and Wadler 1993; Jones and Duponcheel 1993). One might still hope to specialize and combine monads with additional structure, to generalize and combine monads as instances of a broader concept, or even to find that obstacles in combining monads are reflected in semantic constraints in natural language.

Researchers in denotational semantics of programming languages have made several proposals towards combining monadic functionality, none of which is completely satisfactory (Moggi 1990; Steele 1994; Liang et al. 1995; Espinosa 1995; Filinski 1999). In this section, I will relate one prominent approach to natural language semantics.

3.1 Monad morphisms

One approach to combining monads, taken by Moggi, Liang et al., and Filinski, is to compose monad morphisms instead of monads themselves. A monad morphism (also known as a monad transformer or a monad layering) is a map from monads to monads; it takes an arbitrary monad and transforms it into a new monad, presumably defined in terms of the old monad and supporting a new layer of functionality. For instance, given any monad (M₁, η₁, ⋆₁) and fixing a type δ, the reader monad morphism constructs a new monad (M₂, η₂, ⋆₂), defined by

M₂α = δ → M₁α   ∀α,   (24.25a)
η₂(a) = λw. η₁(a) : M₂α   ∀a : α,   (24.25b)
m ⋆₂ k = λw. ( m(w) ⋆₁ λa. k(a)(w) ) : M₂β   ∀m : M₂α, k : α → M₂β.   (24.25c)

If we let the old monad (M₁, η₁, ⋆₁) be the identity monad, defined by M₁α = α, η₁(a) = a, and m ⋆₁ k = k(m), then the new monad (24.25) is just the reader monad (24.16). If we let the old monad be some other monad (even the reader monad itself), the new monad adds reader functionality.

By definition, each monad morphism must specify how to embed computations inside the old monad into the new monad. More precisely, each monad morphism must provide a function (pronounced "lift")

ℓ : M₁α → M₂α,   (24.26)




polymorphic in α.³ For the reader monad morphism, ℓ is defined by

ℓ(m) = λw. m : M₂α   ∀m : M₁α.   (24.28)
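The reader monad morphism, together with its lift (24.28), can be sketched generically in Python (my illustration; monads are represented as (unit, bind) pairs of functions):

```python
def reader_morphism(unit1, bind1):
    """(24.25): wrap any monad (unit1, bind1) so that computations
    additionally read an input w; (24.28) embeds old computations."""
    unit2 = lambda a: (lambda w: unit1(a))
    bind2 = lambda m, k: (lambda w: bind1(m(w), lambda a: k(a)(w)))
    lift  = lambda m: (lambda w: m)          # ell (24.28): ignore w
    return unit2, bind2, lift

# Applied to the identity monad, the result is just the reader monad (24.16):
id_unit = lambda a: a
id_bind = lambda m, k: k(m)
unit2, bind2, lift = reader_morphism(id_unit, id_bind)

add_w  = lambda a: (lambda w: a + w)         # a computation that reads w
result = bind2(unit2(3), add_w)(10)          # threads w = 10 through

# Applied to the powerset monad, it adds reader functionality to sets:
set_unit = lambda a: frozenset([a])
set_bind = lambda m, k: frozenset(b for a in m for b in k(a))
unit2s, bind2s, lifts = reader_morphism(set_unit, set_bind)
alts = bind2s(lifts(frozenset([1, 2])),
              lambda a: (lambda w: set_unit(a + w)))(10)
```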

The continuation monad also generalizes to a monad morphism. Fixing an answer type ω, the continuation monad morphism takes any monad (M₁, η₁, ⋆₁) to the monad (M₂, η₂, ⋆₂) defined by

M₂α = (α → M₁ω) → M₁ω   ∀α,   (24.29a)
η₂(a) = λc. c(a) : M₂α   ∀a : α,   (24.29b)
m ⋆₂ k = λc. m(λa. k(a)(c)) : M₂β   ∀m : M₂α, k : α → M₂β.   (24.29c)

The lifting function ℓ for the continuation monad morphism is defined by

ℓ(m) = λc. (m ⋆₁ c) : M₂α   ∀m : M₁α.   (24.30)

Monad morphisms can be freely composed with each other, though the order of composition is significant. Applying the continuation monad morphism to the reader monad is equivalent to applying to the identity monad the composition of the continuation monad morphism and the reader monad morphism, and yields a monad with type constructor Mα = (α → δ → ω) → δ → ω. Applying the reader monad morphism to the continuation monad is equivalent to applying to the identity monad the composition of the reader monad morphism and the continuation monad morphism, and yields a different monad, with type constructor Mα = δ → (α → ω) → ω.
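The continuation monad morphism admits the same generic Python sketch (my illustration); applying it to the reader monad yields combined computations whose answers depend on the input:

```python
def cont_morphism(unit1, bind1):
    """(24.29): continuations now return M1-answers; (24.30) lifts by
    sequencing the old computation in front of the continuation."""
    unit2 = lambda a: (lambda c: c(a))
    bind2 = lambda m, k: (lambda c: m(lambda a: k(a)(c)))
    lift  = lambda m: (lambda c: bind1(m, c))      # ell (24.30)
    return unit2, bind2, lift

# Let M1 be the reader monad (24.16):
r_unit = lambda a: (lambda w: a)
r_bind = lambda m, k: (lambda w: k(m(w))(w))
unit2, bind2, lift = cont_morphism(r_unit, r_bind)

double_w = lambda w: 2 * w                 # an M1 computation reading w
m = lift(double_w)                         # embedded into the combined monad
answer = m(lambda a: r_unit(a + 1))(5)     # (2 * 5) + 1
```

Here a lifted reader computation is run against a continuation that itself returns a reader answer, illustrating the type constructor (α → M₁ω) → M₁ω.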

3.2 Translating monads to monad morphisms

The monad morphisms (24.25) and (24.29) may appear mysterious, but we can in fact obtain them from their monad counterparts (24.16) and (24.19) via a mechanical translation. The translation takes a monad (M₀, η₀, ⋆₀) whose η₀ and ⋆₀ operations are λ-terms, and produces a morphism mapping any old monad (M₁, η₁, ⋆₁) to a new monad (M₂, η₂, ⋆₂). The translation is defined recursively on the structure of λ-types and λ-terms, as follows.⁴

Every type τ is either a function type or a base type. A function type has the form σ₁ → σ₂, where σ₁ and σ₂ are types. A base type β is a type fixed in M₀ (δ and ω in our cases), a polymorphic type variable (α and β

³ By definition, ℓ must satisfy naturality:

ℓ(η₁(a)) = η₂(a)   ∀a : α,   (24.27a)
ℓ(m ⋆₁ k) = ℓ(m) ⋆₂ (ℓ ∘ k)   ∀m : M₁α, k : α → M₁β.   (24.27b)

⁴ Among other things, the translation requires M₀ to be defined as a λ-type, and η₀ and ⋆₀ to be defined as λ-terms. Thus the translation cannot apply to the powerset and pointed powerset monads (see footnote 2). Nevertheless, any monad morphism (including ones produced by the translation) can be applied to any monad (including these two monads).




as appearing in η and ⋆ (24.2) and ℓ (24.26)), or the terminal type 1 (also known as the unit type or the void type). For every type τ, we recursively define its computation translation ⌈τ⌉ and its value translation ⌊τ⌋:

⌈β⌉ = M₁β;   ⌊β⌋ = β;   ⌈τ₁ → τ₂⌉ = ⌊τ₁ → τ₂⌋ = ⌊τ₁⌋ → ⌈τ₂⌉,   (24.31)

where β is any base type. Each term e : τ is an application term, an abstraction term, a variable term, or the terminal term. An application term has the form (e₁ : σ₁ → σ₂)(e₂ : σ₁) : σ₂, where e₁ and e₂ are terms. An abstraction term has the form (λx : σ₁. e : σ₂) : σ₁ → σ₂, where e is a term. A variable term has the form x : τ, where x is the name of a variable of type τ. The terminal term is ∗ : 1 and represents the unique value of the terminal type 1. For every term e : τ, we recursively define its term translation ⌈e⌉ : ⌈τ⌉:

⌈(e₁ : β → σ₁ → ⋯ → σₙ → β′)(e₂ : β)⌉
= λy₁ : ⌊σ₁⌋. ⋯ λyₙ : ⌊σₙ⌋. ( ⌈e₂⌉ ⋆₁ λy₀ : β. ⌈e₁⌉(y₀) ⋯ (yₙ) ),   (24.32a)
⌈(e₁ : (σ₁ → σ₂) → σ₃)(e₂ : σ₁ → σ₂)⌉ = ⌈e₁⌉(⌈e₂⌉),   (24.32b)
⌈λx : τ. e⌉ = λx : ⌊τ⌋. ⌈e⌉,   (24.32c)
⌈x : β⌉ = η₁(x),   (24.32d)
⌈x : σ₁ → σ₂⌉ = x,   (24.32e)
⌈∗⌉ = η₁(∗),   (24.32f)

where β and β′ are any base types, and y₀, …, yₙ are fresh variable names. Finally, to construct the new monad (M₂, η₂, ⋆₂), we specify

M₂α = ⌈M₀α⌉;   η₂ = ⌈η₀⌉;   ⋆₂ = ⌈⋆₀⌉;
ℓ(m) = ⌈λf : 1 → α. η₀(f(∗))⌉(λ∗ : 1. m)   ∀m : M₁α.   (24.33)
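The type part of this translation is easy to mechanize; here is a Python sketch (my illustration) with base types encoded as strings and function types as nested ("->", t1, t2) tuples:

```python
def comp(t):
    """The computation translation of (24.31), written with upper corners."""
    if isinstance(t, str):                 # base type
        return ("M1", t)
    _, t1, t2 = t                          # function type t1 -> t2
    return ("->", val(t1), comp(t2))

def val(t):
    """The value translation of (24.31), written with lower corners."""
    if isinstance(t, str):
        return t
    return comp(t)          # on function types the two translations coincide

# For the reader monad, M0 a = delta -> a, so M2 a is the translation of
# delta -> a, which reproduces (24.25a): M2 a = delta -> M1 a.
m2_a = comp(("->", "delta", "a"))
```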

To illustrate this translation, let us expand out M₂ and ⋆₂ in the special case where (M₀, η₀, ⋆₀) is the reader monad (24.16). From the type translation rules (24.31) and the specification of M₂ in (24.33), we have

M₂α = ⌈δ → α⌉ = ⌊δ⌋ → ⌈α⌉ = δ → M₁α,

matching (24.25a) as desired. From the term translation rules (24.32) and the specification of ⋆₂ in (24.33), we have

⋆₂ = ⌈λm : δ → α. λk : α → δ → β. λw : δ. k(m(w))(w)⌉   by (24.16c)
= λm : δ → M₁α. λk : α → δ → M₁β. λw : δ. ⌈k(m(w))(w)⌉   by (24.32c),

in which

⌈k(m(w))(w)⌉ = η₁(w) ⋆₁ λy₀ : δ. ⌈k(m(w))⌉(y₀)   by (24.32a), (24.32d)
= ⌈k(m(w))⌉(w)   by (24.3a)
= ( ⌈w⌉ ⋆₁ λy₀ : δ. ⌈m⌉(y₀) ) ⋆₁ λy₀ : α. ⌈k⌉(y₀)(w)   by (24.32a)
= ( η₁(w) ⋆₁ λy₀ : δ. m(y₀) ) ⋆₁ λy₀ : α. k(y₀)(w)   by (24.32d), (24.32e)
= m(w) ⋆₁ λy₀ : α. k(y₀)(w)   by (24.3a),




matching (24.25c) as desired.

The intuition behind our translation is to treat the λ-calculus with which (M₀, η₀, ⋆₀) is defined as a programming language whose terms may have computational side effects. Our translation specifies a semantics for this programming language in terms of (M₁, η₁, ⋆₁) that is call-by-value and that allows side effects only at base types. That the semantics is call-by-value rather than call-by-name is reflected in the type translation rules (24.31), where we define ⌊τ₁ → τ₂⌋ to be ⌊τ₁⌋ → ⌈τ₂⌉ rather than ⌈τ₁⌉ → ⌈τ₂⌉. That side effects occur only at base types is also reflected in the rules, where we define ⌈τ₁ → τ₂⌉ to be ⌊τ₁ → τ₂⌋ rather than M₁⌊τ₁ → τ₂⌋. Overall, our translation is a hybrid between the call-by-name Algol translation (Benton et al. 2000, §3.1.2) and the standard call-by-value translation (Wadler 1992a, §8; Benton et al., §3.1.3).

3.3 A call-by-name translation of monads

Curiously, the semantic types generated by monad morphisms sometimes seem not to be powerful enough. As noted at the end of §3.1, the reader and continuation monad morphisms together give rise to two different monads, depending on the order in which we compose the monad morphisms. Fixing δ = s for the reader monad morphism and ω = t for the continuation monad morphism, the two combined monads have the type constructors

M_cr α = (α → s → t) → s → t;   M_rc α = s → (α → t) → t.   (24.34)

Consider now sentences such as

John wanted to date every professor (at the party). (24.35)

This sentence has a reading where every professor at the party is a person John wanted to date, but John may not be aware that they are professors. On this reading, note that the world where the property of professorship is evaluated is distinct from the world where the property of dating is evaluated. Therefore, assuming that to date every professor is a constituent, its semantic type should mention s in contravariant position at least twice. Unfortunately, the type constructors M_cr and M_rc (24.34) each have only one such occurrence, so neither M_cr t nor M_rc t can be the correct type.

Intuitively, the semantic type of to date every professor ought to be

((s → t) → s → t) → s → t   (24.36)

or an even larger type. The type (24.36) is precisely equal to M_cr(s → t), but simply assigning M_cr(s → t) as the semantic type of to date every professor would go against our preference for the reader monad morphism to be the only component of the semantic system that knows about the type s. Instead, what we would like is to equip the transformation on types taking each α to ((s → α) → s → t) → s → t with a composition method for transformed values.




One tentative idea for synthesizing such a composition method is to replace the call-by-value translation described in §3.2 with a call-by-name translation, such as the Algol translation mentioned earlier (Benton et al., §3.1.2).⁵ For every type τ, this translation recursively defines a type ⟨τ⟩:

⟨β⟩ = M₁β;   ⟨τ₁ → τ₂⟩ = ⟨τ₁⟩ → ⟨τ₂⟩,   (24.37)

where β is any base type. For every term e : τ, this translation recursively defines a term ⟨e⟩ : ⟨τ⟩:

⟨e₁(e₂)⟩ = ⟨e₁⟩(⟨e₂⟩);   ⟨x⟩ = x;
⟨λx : τ. e⟩ = λx : ⟨τ⟩. ⟨e⟩;   ⟨∗⟩ = η₁(∗).   (24.38)

Applying this translation to a monad (M₀, η₀, ⋆₀) gives the types

⟨η₀⟩ : M₁α → ⟨M₀α⟩,   (24.39a)
⟨⋆₀⟩ : ⟨M₀α⟩ → (M₁α → ⟨M₀β⟩) → ⟨M₀β⟩.   (24.39b)

If we let (M₀, η₀, ⋆₀) be the continuation monad (24.19) and (M₁, η₁, ⋆₁) be the reader monad (24.16), then the type t transformed is

⟨M₀t⟩ = (M₁t → M₁t) → M₁t = ((s → t) → s → t) → s → t,

as desired. However, unless (M₁, η₁, ⋆₁) is the identity monad, the types in (24.39) do not match the definition of monads in (24.2). In other words, though our call-by-name translation does give the type transform we want, as well as a composition method in some sense, its output is not a monad morphism.

4 Conclusion

In this paper, I used monads to characterize the similarity between several semantic accounts: of interrogatives, focus, intensionality, variable binding, and quantification.⁶ In each case, the same monadic composition rules and mostly the same lexicon were specialized to a different monad. The monad primitives η and ⋆ recur in semantics with striking frequency.

It remains to be seen whether monads would provide the appropriate conceptual encapsulation for a semantic theory with broader coverage. In particular, for both natural and programming language semantics, combining monads (or perhaps monad-like objects) remains an open issue that promises additional insight.

Acknowledgments. Thanks to Stuart Shieber, Dylan Thurston, Chris Barker, and the anonymous referees for helpful discussions and comments.

⁵ Another possible translation is the standard ("Haskell") call-by-name one (Wadler 1992a, §8; Benton et al., §3.1.1). It produces strictly larger types than the Algol call-by-name translation, for instance s → (s → (s → t) → s → t) → s → t.

⁶ Other phenomena that may fall under the monadic umbrella include presuppositions (the error monad) and dynamic semantics (the state monad).




References

Barker, C. (2000). Continuations and the Nature of Quantification. Manuscript, University of California, San Diego, 4 November 2000, http://semanticsarchive.net/Archive/902ad5f7/.

Benton, N., J. Hughes, and E. Moggi (2000). Monads and Effects. Lecture notes, International Summer School on Applied Semantics, 5 September 2000, http://www.disi.unige.it/person/MoggiE/APPSEM00/.

Danvy, O. and A. Filinski (1990). Abstracting Control. In Proceedings of the 1990 ACM Conference on Lisp and Functional Programming, Nice, France, pp. 151-160. New York: ACM Press.

Espinosa, D. A. (1995). Semantic Lego. Ph.D. thesis, Graduate School of Arts and Sciences, Columbia University.

Filinski, A. (1999). Representing Layered Monads. In POPL '99: Conference Record of the Annual ACM Symposium on Principles of Programming Languages, San Antonio, TX, pp. 175-188. New York: ACM Press.

Hamblin, C. L. (1973). Questions in Montague English. Foundations of Language 10, 41-53.

Heim, I. and A. Kratzer (1998). Semantics in Generative Grammar. Oxford: Blackwell.

Hendriks, H. (1993). Studied Flexibility: Categories and Types in Syntax and Semantics. Ph.D. thesis, Institute for Logic, Language and Computation, Universiteit van Amsterdam.

Jones, M. P. and L. Duponcheel (1993). Composing Monads. Technical Report YALEU/DCS/RR-1004, Department of Computer Science, Yale University, New Haven.

Kelsey, R., W. Clinger, and J. Rees (Eds.) (1998). Revised⁵ Report on the Algorithmic Language Scheme. Higher-Order and Symbolic Computation 11(1), 7-105. Also in ACM SIGPLAN Notices 33(9), 26-76.

King, D. J. and P. Wadler (1993). Combining Monads. In J. Launchbury and P. M. Sansom (Eds.), Functional Programming, Glasgow 1992: Proceedings of the 1992 Glasgow Workshop on Functional Programming, Ayr, Scotland. Berlin: Springer-Verlag.

Lappin, S. (Ed.) (1996). The Handbook of Contemporary Semantic Theory. Oxford: Blackwell.

Liang, S., P. Hudak, and M. Jones (1995). Monad Transformers and Modular Interpreters. In POPL '95: Conference Record of the Annual ACM Symposium on Principles of Programming Languages, San Francisco, CA, pp. 333-343. New York: ACM Press.




Moggi, E. (1990). An Abstract View of Programming Languages. Technical Report ECS-LFCS-90-113, Laboratory for Foundations of Computer Science, Department of Computer Science, University of Edinburgh, Edinburgh.

Moggi, E. (1991). Notions of Computation and Monads. Information and Computation 93(1), 55-92.

Montague, R. (1974). The Proper Treatment of Quantification in Ordinary English. In R. Thomason (Ed.), Formal Philosophy: Selected Papers of Richard Montague, pp. 247-270. New Haven: Yale University Press.

Partee, B. (1996). The Development of Formal Semantics. See Lappin (1996), pp. 11-38.

Partee, B. and M. Rooth (1983). Generalized Conjunction and Type Ambiguity. In R. Bäuerle, C. Schwarze, and A. von Stechow (Eds.), Meaning, Use and Interpretation of Language, pp. 361-383. Berlin: De Gruyter.

Rooth, M. (1996). Focus. See Lappin (1996), pp. 271-297.

Steele, Jr., G. L. (1994). Building Interpreters by Composing Monads. In POPL '94: Conference Record of the Annual ACM Symposium on Principles of Programming Languages, Portland, OR, pp. 472-492. New York: ACM Press.

Wadler, P. (1992a). Comprehending Monads. Mathematical Structures in Computer Science 2(4), 461-493.

Wadler, P. (1992b). The Essence of Functional Programming. In POPL '92: Conference Record of the Annual ACM Symposium on Principles of Programming Languages, Albuquerque, NM, pp. 1-14. New York: ACM Press.

Wadler, P. (1994). Monads and Composable Continuations. Lisp and Symbolic Computation 7(1), 39-56.



Incomplete Definite Descriptions,

Demonstrative Completion and

Redundancy

Isidora Stojanovic

Ecole Polytechnique, CREA, 1 rue Descartes, F-75005 Paris, France

[email protected]

Abstract.

"Incomplete" definite descriptions (i.e., descriptions that violate the uniqueness constraint) have been offered various accounts in semantics. Among them is the so-called ellipsis account, which analyzes "the F" as elliptical for "the F which is that F". I begin by arguing that the objections raised against this account have not been conclusive, and go on to supply a new argument against it, which consists in showing such demonstrative completions to be semantically redundant.

1 The Problem

A friend, whose dog I was playing with, said to me once:

(1) The dog likes you.

or perhaps:

(2) That dog likes you,

I don't quite remember. Which sentence exactly he used does not seem to matter much. As against such layman's intuitions, most semantic theories see a huge difference between the way in which definite descriptions, like "the dog", and complex demonstratives, like "that dog", contribute to the truth conditions. It is widely agreed that "that dog" contributes the dog itself, Fido, to the truth conditions of (2), so that (2) is true iff Fido likes me (at the time of the utterance).¹ But when it comes

¹ This view of demonstratives is common to the theories that embody David Kaplan's insights from Demonstratives, in Almog, J., Perry, J., and Wettstein, H. (Eds.), Themes from Kaplan, Oxford University Press, 1989. Such is, for instance, the theory of Larson, R. and Segal, G. in Knowledge of Meaning, MIT Press, 1995.

Proceedings of the Sixth ESSLLI Student Session. Kristina Striegnitz (editor). Chapter 25. Copyright © 2001, Isidora Stojanovic.




to definite descriptions, semanticists are at pains to agree on how "the dog" contributes to the truth conditions of (1). If, following Bertrand Russell,² "the such-and-such is so-and-so" is true iff exactly one thing is such-and-such and it is also so-and-so, then (1) is bound to be false, it seems, since there are plenty of dogs in the universe.

"Incomplete" definite descriptions, which violate Russell's uniqueness condition, seem to pose a problem only to the accounts that do not view definite descriptions as referential expressions.³ Why not say, then, that definite descriptions can at least be referentially used, so that (1) and (2) end up receiving the same truth conditions?⁴ Yet another option is to say that (1) literally says something false, but conveys something true, namely, that Fido likes me.⁵ Some might be tempted to say that the domain of discourse relevant to interpreting (1) contains only one dog, Fido, so that the uniqueness condition becomes fulfilled after all. The portion of the universe over which the quantifier ranges would then be specified by the context.⁶ At last, some might suggest that not the whole sentence is phonetically realized in (1). (1) would then simply serve as a shorthand for something more complex, like:

(3) The dog that you are playing with right now likes you.

(1) might as well be a shorthand for (2). Then all one needs is to recover a completion that will prevent the failure of the uniqueness condition.⁷

In the last half of the century, many accounts of incomplete definite descriptions have seen light, but none has received unanimous support, and, more importantly, none has been refuted either, pace claims to the contrary. In this paper, I focus on the last approach, the so-called ellipsis approach. I begin by arguing that the arguments offered against it did not prove conclusive. I provide two replies to the argument from indeterminacy, which

² See On Denoting, Mind 14, 1905. Russell's proposal has been incorporated in the framework of generalized quantifiers, as endorsed e.g. by Stephen Neale in Descriptions, MIT Press, Cambridge, 1990, or by Larson and Segal (op. cit.).

³ "Referential" accounts of definite descriptions: cf. Strawson's criticism of Russell in On Referring, Mind 59, 1950. A more recent account can be found e.g. in Heim, I., Articles and Definiteness, in Stechow, A. and Wunderlich, D. (Eds.), Semantics: An International Handbook of Contemporary Research, Berlin: De Gruyter, 1991. The idea is already present in G. Wilson's On Definite and Indefinite Descriptions, The Philosophical Review 87, 1978.

⁴ See Donnellan, K., Reference and Definite Descriptions, The Philosophical Review 75, 1966; reprinted in Garfield, J. and Kiteley, M. (Eds.), Meaning and Truth, Paragon House, NY, 1991.

⁵ See S. Kripke, Speaker's Reference and Semantic Reference, Midwest Studies in Philosophy 2, 1977, and K. Bach, Conversational Impliciture, Mind and Language 9, 1994.

⁶ The suggestion goes back to Barwise, J. and Cooper, R., Generalized Quantifiers, Linguistics and Philosophy 2, 1981. Stanley, J. and Szabó, Z. contend that this is the only sound view of incomplete descriptions. See On Quantifier Domain Restriction, Mind and Language 15, 2000.

7Stephen Neale, On Being Explicit, Mind and Language 15, 2000.




objects to the ellipsis approach that it fails to recover a determinate completion. One reply is that the recovery devices may themselves depend upon the context in a way that allows them to yield a determinate completion. The other reply is that one can associate with every incomplete description a canonical completion, namely, a demonstrative completion that recovers "that" over "the". I then argue that the argument from knowledge, which points out that one may be ignorant of demonstratives while competent at using incomplete definite descriptions, does not go through, for the simple fact that, in general, one does not have to know the phonetic realization of some particular expression in order to make an ellipsis on that expression.

Even though it has replies to those two arguments, the ellipsis account of incomplete descriptions seems to be deeply mistaken. The reason, I argue, is that demonstrative completions over definite descriptions, likely to be forced upon the account by the argument from indeterminacy, are actually redundant. The cases of incomplete definite descriptions not meant to refer to particular objects (that is, of descriptions used attributively and anaphorically) may help us get to the redundancy result. It is clear that such incomplete descriptions may be completed into complex demonstratives salva congruitate. The question is how demonstratives used attributively or anaphorically are to be accounted for. If such demonstratives are seen as definite descriptions in disguise, the ellipsis account will be circular and lack a solution to the problem that it purports to solve. If, on the other hand, they are still seen as genuine referential expressions, the account will lack motivation for not considering definite descriptions as referential expressions at the outset.⁸

2 The Argument from Indeterminacy

In its typical instances, bare ellipsis consists in leaving out some expression possible to recover from the discourse, as in:

(4) Bill wants pie for dessert and Al pudding.

(5) Bill has one child and Al four.

Thus the speaker of (4) fails to pronounce the whole sentence "Al wants pudding for dessert", while the speaker of (5) omits "has" and "children", recoverable from the previously used "child", given the plural constraint due to the numeral "four".9

8 It is noteworthy that from the standpoint of etymology, the ties between demonstratives and the definite article are straightforward. The idea that "the" is a reduced form of "that" may be found e.g. in Jespersen, O., The Philosophy of Grammar, Allen & Unwin, London, 1924.

9 (4) is an example from K. Bach, who wrote: "Utterances are elliptical, strictly speaking, only if the suppressed material is recoverable (..) by grammatical means alone." (ibid., p. 131)


Incomplete Definite Descriptions, Demonstrative Completion and Redundancy

Some cases of incomplete descriptions fit this picture fine. Consider:

(6) I saw a neighbor of mine kissing a woman in the staircase. The woman was his wife.

It could be that the description "the woman" is a shorthand for e.g. "the woman that I saw being kissed by a neighbor of mine in the staircase", which is recoverable from the discourse alone. However, not many incomplete descriptions are amenable to this sort of analysis. Take this well-known example:10 having come upon Smith's cruelly mutilated body, the inspector, without suspecting anyone in particular, says:

(7) The murderer must be insane.

There is nothing in the discourse itself to complete the description. To keep viewing "the murderer" as a shorthand for another description respectful of the uniqueness constraint, the recovery of the elided material had better depend not only upon the linguistic context, but upon the context in its most general sense. Let us call narrow the approach that constrains the recovery of the elided material down to linguistic inputs only, and broad the other one. Both philosophers and linguists seem to agree that no narrow approach can be worked out (as (7) already suggests), and that no broad approach can be worked out either, since there is no algorithm that tells us how to recover the elided material. H. Wettstein wrote: "When one says, e.g., 'The table is covered with books', the table the speaker has in mind can be more fully described in any number of ways (..) Since these more complete descriptions are not synonymous, it follows that each time we replace (..) 'the table' with a different one of these 'Russellian' descriptions, it would seem that we obtain an expression for a different proposition."11

Some have thought, erroneously, that Wettstein's worry would not arise if "non-descriptive" completions were possible: completions with referential expressions which, albeit different, would pick out the same referent.12 But as pointed out by Marga Reimer, "even if completions of incomplete descriptions are stipulated to be non-descriptive, the problem of adjudicating between non-equivalent, co-denoting descriptions remains."13

So let me put the argument from indeterminacy as follows: the ellipsis approach to incomplete descriptions fails because it does not offer any algorithm that may tell us how to recover a determinate completion over an incomplete description.

10 See Donnellan, ibid., p. 147.

11 Demonstrative Reference and Definite Descriptions, Philosophical Studies 40, 1981, p. 246.

12 That was S. Neale's view in Descriptions, cf. p. 99 et passim.

13 Incomplete Descriptions, Erkenntnis 37, 1992, p. 353. F. Recanati similarly noted: "Even if we accept that the expressed content is singular, still it is totally indeterminate which particular sentence expressing that content the uttered sentence is elliptical for." (Domains of Discourse, Linguistics and Philosophy 19, 1996, p. 449 fn.)


To rule out this argument, it will not do just to have non-descriptive completions, whose values depend on the context of utterance. The "algorithm" that yields those completions should also be able to depend on the context. Now, Wettstein has argued convincingly enough that there is no obvious algorithm leading from an incomplete description to its completion, and that even the speaker may be unable to point to some determinate completion. But this does not yet show that there are utterances of incomplete descriptions which lack a determinate completion. The point may be made clear by looking at the behavior of demonstratives. It has been agreed that the linguistic meaning of a demonstrative such as "that" does not correspond to any expression à la "what I am pointing to" or "what I have in mind". There is no algorithm, in other words, that leads from a demonstrative to its referent. But this does not mean that there are utterances of "that" without a determinate referent. The meaning of "that" might exploit all the cues that the context makes available, take those cues as inputs, and give us the referent of "that" as an output. This is perfectly compatible with the fact that we have not managed to come up with such an algorithm. Similarly, one may suggest that the meaning of "the" exploits any contextual cue available so as to single out a completion.14
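The cue-driven picture just sketched can be given a deliberately crude sketch. The salience ranking below is my own invented stand-in for "contextual cues"; nothing here pretends to be an algorithm the paper endorses. The point is only that cue-driven resolution is compatible with there being no humanly statable recipe:

```python
# Toy resolver (my own illustration): context is modeled as a salience
# ranking over candidate entities; "that F" returns the most salient
# candidate satisfying the predicate F.

def resolve_demonstrative(salience_ranked, predicate):
    """Return the most salient candidate satisfying the predicate, or
    None when the context supplies no determinate referent."""
    for entity, properties in salience_ranked:
        if predicate in properties:
            return entity
    return None

context = [("Fido", {"dog"}), ("Rex", {"dog"}), ("Tom", {"cat"})]
resolve_demonstrative(context, "dog")  # → "Fido", the most salient dog
```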

There is a reply, then, to the argument from indeterminacy. True, it 'passes the buck' to the semantics of "the", but this simply shows that there is no special link between indeterminacy and ellipsis. Besides, it takes little to realize that other accounts of incomplete descriptions are not any better off in this regard. Indeterminacy is a very general phenomenon, and Stanley and Szabo were wrong to think that indeterminacy obliged the ellipsis approach to place "intolerable burdens on any possible solution" to the semantics of "the". They wrote: "context has to provide a specific predicate (..) And it is exceedingly hard to see what feature of context could [select the predicate F among other candidates]."15 At the same time, they did not see the same "intolerable burden" placed on the contextual restriction of the domain of discourse. But take (7). What feature of context could restrict the domain down to a set that contains no other murderers than whoever happens to have murdered Smith? Why should one person belong to it rather than another? The only way to restrict the domain appropriately would use, it seems, some clause like "the murderer of the person whose mutilated body we are looking at." And it makes no difference whether this clause is supposed to restrict the domain of discourse, or to be recovered over some other expression.

14 It is somewhat ironical that Wettstein, who supports this view for demonstratives (see How to Bridge the Gap Between Meaning and Reference, Synthese 58, 1984), should thus provide a possible reply to his own argument against the ellipsis account of incomplete descriptions.

15 Ibid., p. 238. The issue of indeterminacy is also known as the issue of underdetermination. For discussion, see Recanati, F., Direct Reference, Blackwell, Oxford, 1993, p. 235 et passim.

3 The Argument from Knowledge

There is another possible reply to the argument from indeterminacy, which moreover makes a good case for the narrow ellipsis approach to incomplete descriptions. The reply consists in showing that for every incomplete description, it is possible to come up with a determinate, canonical completion. Thus every definite description "the F" that violates the uniqueness constraint while empirically yielding true utterances may be seen as elliptical for "the F which is that F", or simply for "that F". The user of an incomplete description may then be seen as having failed to pronounce /at/ after /th/. It looks as if he had used a definite article, whereas he has used a demonstrative.
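At a purely string level, the canonical completion can be pictured as restoring the unpronounced /at/ after /th/. The following is a toy illustration of my own, not an analysis:

```python
def canonical_completion(description: str) -> str:
    """Recover "that F" over an incomplete "the F": a purely string-level
    sketch of the canonical demonstrative completion."""
    article, _, rest = description.partition(" ")
    if article.lower() != "the":
        raise ValueError("expected a definite description starting with 'the'")
    # Restore /at/ after /th/, preserving capitalization.
    return ("That" if article == "The" else "that") + " " + rest

canonical_completion("the murderer")  # → "that murderer"
canonical_completion("The table")     # → "That table"
```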

First of all, it should be shown that canonical completion works. That is obvious for those cases in which there is something contextually salient that the description singles out, in the way in which "the dog" singles out Fido in (1). The cases that are worrisome are rather of the same ilk as (6), in which the description is anaphoric, or as (7), in which it is attributive. But, as it turns out, complex demonstratives also allow for anaphoric as well as for attributive uses.16 One can utter: "that such-and-such is so-and-so", just as one can utter: "the such-and-such is so-and-so", in order to attribute the property of being so-and-so to whatever is such-and-such, even though there is nothing of which the speaker wishes to say that it is such-and-such. Thus, instead of (7), the inspector might have said as well:

(8) That murderer must be insane.

If we turn to definite descriptions used anaphorically, we also see that they can always be replaced, in a more or less felicitous manner, by complex demonstratives. Thus, instead of (6), I might have said as well:

(9) I saw a neighbor of mine kissing a woman in the staircase. That woman was his wife.

Let us assume, for the sake of the argument, that substituting "that" for "the" generally works. The output of the substitution may be less felicitous than the input, but there shall be no difference in the truth conditions. Will there be anything wrong, then, with the narrow ellipsis approach? Stanley and Szabo thought they had a knockdown argument against it, which I shall call the argument from knowledge. They wrote: "Suppose that Max is not a fully competent speaker of English. (..) The use of demonstrative pronouns is not discussed until unit 7 and Max is not there yet. (..) Since Max does not know the word 'that', he cannot identify the sentence uttered by the speaker of (1) which contains that word as an unarticulated constituent."17 As Max, competent in the use of the definite article, has no difficulty understanding (1), Stanley and Szabo were led to conclude that (1) cannot be elliptical for (2).

16 In Are Complex 'That' Phrases Devices of Direct Reference?, Noûs 33, 1999, Jeffrey King uses precisely this sort of case to motivate an account of complex demonstratives in terms of restricted quantifiers.

If this argument were to be taken seriously, it would be a knockdown argument against ellipsis in general. Thus, suppose that Max does not know what the plural of "child" is, but knows that it is irregular. As he does not want to reveal his ignorance, he says "Bill has one child and Al four", as in (5). Shall we say then that his utterance cannot be elliptical for "Bill has one child and Al has four children"? No. Ignorance of the phonetic realization of some grammatical form of an expression is not enough to prevent us from making utterances elliptical on that expression, still less from understanding them.18

4 The Argument from Redundancy

Let me take stock. The narrow ellipsis approach, which over every incomplete description recovers a canonical, demonstrative completion, meets the argument from indeterminacy in that it does provide a determinate completion, whereas the argument from knowledge has simply proved flawed. Now, the question was how to ascribe correct truth conditions to an utterance that involves a definite description that does not single out anything. If the ellipsis approach is to provide a solution, it must be able to ascribe correct truth conditions to utterances that involve complex demonstratives in lieu of incomplete definite descriptions. The view on which "that" is a referential expression whose truth-conditional contribution consists of its referent has no problem with those cases in which there is some contextually salient referent, as in Fido's case. Rather, the question is how that view deals with other cases, in which demonstratives are used attributively or anaphorically. If demonstratives lose their referential character on such uses, and turn out to be quantifiers in disguise, then the ellipsis approach clearly lacks a clue to the problem of incomplete descriptions. Conversely, if demonstratives remain referential even on such uses, then it must be possible to provide definite descriptions themselves with an account that sees them as referential expressions. In either case, the recovery of the demonstrative over the definite article must prove unattractive, since the material recovered turns out to be redundant. Albeit possible, the ellipsis approach is far from attractive, for either it lacks a solution to the problem that it purports to solve, or it lacks motivations for not analyzing definite descriptions as referential expressions, while being able to do so.19

17 Ibid., p. 238; "unarticulated" reads as "elided". I have taken the liberty of making necessary changes so that the quote fits the examples that I have been using here.

18 Had Max had no concept of demonstratives, perhaps he would have given us reasons to be suspicious toward the ellipsis approach. But it is far from clear that anyone without any idea of how demonstratives work could ever grasp what (1) says. And it is not much clearer that the mastery of "that" requires anything more than does the mastery of "the".

In the remainder of the paper, let me try to see whether there is a referential account of demonstratives that covers their anaphoric and attributive uses. (Keep in mind, though, that the answer does not affect the argument itself.)

The cases of anaphora that we have seen are not, so to say, difficult enough, since there is something singled out by the antecedent. Consider this harder case instead:

(10) Whenever you see a neighbor of yours kissing a woman in the staircase, the woman (/ that woman) (/ she) may easily happen to be his wife.

The quantifier "whenever", which, for our purposes, may be taken to range over situations, makes it clear that there is not a single woman relevant to the truth of (10). How can "that woman" then occur as a referential expression in (10)? It seems that if "that woman" were a genuine demonstrative, there would have to be some particular woman whom it would stand for.

In relation to the same issue as it arises with pronouns, Gareth Evans argued that, notwithstanding appearances, the pronoun "she" may receive a referential analysis even in (10) and the sentences of the same ilk. Evans wrote: "If we adopt a Fregean account of satisfaction, we have only to give an account of the pronoun-antecedent construction as it occurs in singular sentences; no further explanation need be given of pronouns with quantifier antecedents. (..) A natural explanation of the role of pronouns with singular antecedents is in terms of co-reference: the pronoun refers to whatever the antecedent refers to."20

Without tackling the details of Evans' account, let us adapt the idea to our needs. We can roughly say that (10) is true iff C is true for every pair of referential terms substituted for (x) and (y) respectively, where C is the conditional: 'when you see (x) kissing (y) in the staircase and (x) is a neighbor of yours and (y) is a woman, (y) may easily happen to be (x)'s wife.' The general idea is that every situation in which you can truly say: "I see him, a neighbor of mine, kissing her in the staircase", is also a situation in which you can truly say: "she may well happen to be his wife." In other words, there are no more quantifiers that properly range over individuals, but only quantifiers that range over situations.21

19 The cases that pose problems for referential accounts of definite descriptions (cases which had originally motivated Russell's view) are precisely the cases of descriptions attributively used and the cases of anaphora. If there is a referential account of such cases when they involve demonstratives, that same account should work with descriptions too. The choice, then, to reserve the referential account for demonstratives and to subject definite descriptions to a non-referential one can only be arbitrary and unmotivated.

20 Pronouns, in Collected Papers, Oxford University Press, 1985, p. 227. Frege's account of satisfaction is like the substitutional account, except that, instead of assuming that the language has all the expressions needed for substitution instances, it simply assumes that it can always be enriched with any such expression. Cf. Pronouns, Quantifiers and Relative Clauses, in ibid.
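The adapted truth conditions just stated can be made concrete with a small sketch. The encoding and the situation data below are my own inventions: quantification runs over situations, each situation supplies referents for (x) and (y), and the conditional C is checked in each:

```python
# Toy model (invented data) of the substitutional/situational truth
# conditions for (10): no quantification over individuals, only over
# situations, each of which supplies referents for (x) and (y).

situations = [
    {"x": "Mario", "y": "Anna",  "neighbor": True, "woman": True,
     "kissing_in_staircase": True, "may_be_wife": True},
    {"x": "Luca",  "y": "Elena", "neighbor": True, "woman": True,
     "kissing_in_staircase": True, "may_be_wife": True},
]

def C(s):
    """The conditional: if you see (x) kissing (y) in the staircase, (x)
    being a neighbor and (y) a woman, then (y) may be (x)'s wife."""
    antecedent = s["kissing_in_staircase"] and s["neighbor"] and s["woman"]
    return (not antecedent) or s["may_be_wife"]

# (10) is true iff C holds in every situation.
sentence_10_true = all(C(s) for s in situations)
```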

There is one step left from a referential account of demonstratives anaphorically used to a referential account of demonstratives attributively used, hence of definite descriptions. Recall the situation. Watching Smith's body cruelly mutilated, the inspector says:

(11) That murderer must be insane.

How can "that" occur as a genuine demonstrative in (11)? There is nobody salient in the context that the inspector was talking of, nobody he meant to speak of otherwise than as of whoever happened to be the murderer. And if "that murderer" is not a demonstrative, but a definite description in disguise, then it is plainly circular to suggest that "the murderer" used under the same circumstances should be elliptical for "that murderer".

Once again, however, it seems possible to continue treating "that murderer" as a demonstrative. All we need to do is make our quantifiers range not over individuals, but over something else: situations, possible substitution instances, events, and so on. On Russellian approaches, (11) is true iff there is one and only one individual who is a murderer, and who, moreover, must be insane. Swaying the range of our quantifiers from individuals to situations, we would say that (11) is true iff there is one and only one situation in which one would speak truly if one said: "That murderer must be insane", using "that" in reference to someone demonstrated in that situation. Such an account would clearly preserve the referential character of "that murderer". So the idea is that the demonstrative always acts as a referential expression, i.e. as an expression contributing to the truth conditions by its referent. Yet, in order to fix the referent, one must have settled on some particular situation. Most often, the situation settled on is the current one, that is, the situation in which the utterance is taking place. But that need not be the case. One may also settle on other situations, or even not settle on any particular situation at all. Thus one may simply state the existence of a situation, so that the situation relevant to fixing the reference of "that" would merely be a hypothetical situation (which is what probably happens in (11)).

21 Alternatively, all quantifiers range over (possible) substitution instances. Presumably, this option is more faithful to Evans' original proposal. There are other frameworks favorable to the idea. E.g. one can make quantifiers range over events, or over (possible) contexts. It goes without saying that within the boundaries of the present paper I can at best hint at a referential account of unbound anaphora.

To be sure, the proposal that I have just canvassed does not amount to providing a solution to attributive uses of expressions normally seen as referential, like complex demonstratives, or even pronouns.22 But what it does show is that the possibility of using an expression attributively is compatible with the possibility of giving that expression a referential account.23 The referential account that can be given for "that", including its anaphoric and attributive uses, straightforwardly extends into a referential account of the definite article itself. In consequence, an account of definite descriptions in terms of quantifiers that posits a demonstrative completion when it comes to incomplete descriptions is simply unmotivated (or dismotivated, as might be a more proper way of putting it), given that the initial motivations for a non-referential account of definite descriptions vanish in the presence of a referential account of demonstratives that behave in the same way as definite descriptions do.

22 Indeed, "that murderer" may be successfully replaced by "he" in (11). Another well-known example is that of a person who, spotting gigantic footprints, says: "He must be a giant."

23 In this way, the referential account of "the", built out from the referential account of "that" attributively used, will not blur the epistemological distinction between two manners of individuating things (namely, by direct reference vs. by description), brought to light by Russell in Knowledge by Acquaintance and Knowledge by Description, Proceedings of the Aristotelian Society 11, 1910.


Belief Fragments and Ontological Categories

Jan Westerhoff

Trinity College, Cambridge (UK)

[email protected]

Abstract.

There have been attempts to get some logic out of belief dynamics, i.e. attempts to define the constants of propositional logic in terms of functions from sets of beliefs to sets of beliefs.1 It would be interesting to see whether something similar could be done for ontological categories, i.e. ontological constants. The theory presented here will be a (modest) expansion of belief dynamics: it will not only incorporate beliefs, but also parts of beliefs, so-called belief fragments. On the basis of this we will give a belief-dynamical account of the ontological categories of states of affairs, individuals, properties of arbitrary adicities and properties of arbitrary orders.

1 The Background: Belief Fragments

The fundamental idea of this paper is that two beliefs can have something in common, namely, a part of a belief, which we will also call a belief fragment. Consider the following two beliefs:

1. The Duke of Edinburgh is over fifty.

2. The American President is over fifty.

What do these two beliefs have in common? They have something in common which isn't itself a belief, but rather a part of a belief, a belief fragment: this is `is over fifty`. It is important to note that `is over fifty` isn't the same as 'is over fifty'; we are not talking about syntax here. To see this consider a third belief:

3. The Chancellor of Cambridge University is male.

The syntactic elements which the sentences 1 and 3 have in common ('the', 'of', 'is') are not the belief fragments which the two beliefs 1 and 3 have in common. What the beliefs have in common is that they are both about a certain person, namely about Prince Philip. The belief fragment `Prince Philip` which they have in common isn't identical to any syntactical element which the English formulations of the two beliefs given above have in common.

1 See (3), chapter 6.

Proceedings of the Sixth ESSLLI Student Session, Kristina Striegnitz (editor). Chapter 26, Copyright © 2001, Jan Westerhoff.

Note that we are not assuming that beliefs have any particular structure. They will have some structure, but for our purposes it is irrelevant what this is. The sentences given above which express beliefs 1-3 happen to share some syntactic elements. This is due to the way we have chosen to express them. But of course we could have expressed them in all sorts of ways (by formulating them in different languages, by using pictures or diagrams), including ways in which they did not have any syntactic features in common. This, however, would not change the fact that the beliefs expressed share a common belief fragment.

We are not specifying what kind of entity a fragment is, whether it is linguistic, psychological, material (a brain state, say), functional or whatever. All we claim here is that if beliefs are supposed to be entities of kind X, belief fragments should be considered to be parts of entities of kind X.2

There are beliefs which have more than one thing in common. Take for example the following two beliefs:

4. The apple and the banana are on the table.

5. The apple and the banana are both to the left of the vase.

4 and 5 have two belief fragments in common, namely `the apple` and `the banana`. One might be tempted to treat them just as a single fragment `the apple and the banana`. But such a procedure cannot be generalized, as can be seen from the following example:

6. Peter washes his white shirt.

7. Charles washes his white car.

Here `washes` and `white` are fragments common to both 6 and 7, but we would not want to say that there is one fragment which somehow contains these two as parts which 6 and 7 have in common. For the sake of simplicity we are therefore going to disregard such cases in our inquiry and confine ourselves to beliefs which have at most a single fragment in common.

For our account we need two primitive notions. The first is the meet operation ∩B. This takes two beliefs and returns the largest fragment which they have in common; else it returns ∅. Meeting can be iterated: ∩B can also return a ('smaller') fragment which two fragments have in common. And finally the meet operation can take a belief and a belief fragment and return the smaller fragment which the belief and the belief fragment have in common. We will write Bi, Bj, … for beliefs and B♯i, B♯j, … for belief fragments.

The second primitive notion is the two-place expansion operation ⊕, which allows us to expand fragments and thus to build complexes out of fragments. The expansion operation takes two belief fragments B♯i, B♯j and returns either

1. a belief Bn, or

2. another belief fragment B♯m, or

3. the set {B♯i, B♯j} of the two belief fragments.

Let us consider an example for each possibility in turn. For the first case, let B♯i be `Piero` and B♯j `married Lucrezia`; then B♯i ⊕ B♯j = Bn = `Piero married Lucrezia`.

For the second possibility, let B♯i be `is married to` and B♯j `Lorenzo`; then B♯i ⊕ B♯j = B♯m = `Lorenzo is married to` (or `is married to Lorenzo`; see below).

For the final option let B♯i again be `is married to` but B♯j `is the brother of`. Clearly in this case there is no belief or belief fragment which can be constructed out of B♯i and B♯j, so ⊕ will return the set of the two as a default value.3

2 Cf. (4), 7-9.

We suppose that expansion is commutative and associative, i.e.

(COMM) B♯i ⊕ B♯j = B♯j ⊕ B♯i

(ASS) (B♯i ⊕ B♯j) ⊕ B♯k = B♯i ⊕ (B♯j ⊕ B♯k)
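The two primitives can be given a minimal executable sketch. The encoding below is my own, not the paper's formalism: open argument places are marked "_", a belief is a fully saturated fragment, and an arbitrary leftmost-fill convention stands in for a selection among the possible results. It reproduces the three possible outcomes of expansion and the non-idempotence of double expansion:

```python
from dataclasses import dataclass

# Toy encoding (mine, not the paper's): "_" marks an open argument place;
# a belief is a fragment with no open places left.

@dataclass(frozen=True)
class Fragment:
    text: str

    @property
    def open_places(self) -> int:
        return self.text.count("_")

    @property
    def is_belief(self) -> bool:
        return self.open_places == 0

def expand(f, g):
    """Two-place expansion: returns a belief, a smaller fragment, or,
    when nothing can be built, the set of the two (the default value)."""
    # Let whichever argument has open places act as the predicate; trying
    # both orderings makes the operation commutative in these cases.
    for pred, arg in ((f, g), (g, f)):
        if pred.open_places > 0 and arg.open_places == 0:
            # Arbitrarily fill the leftmost open place: one choice among
            # the several possible results.
            return Fragment(pred.text.replace("_", arg.text, 1))
    return {f, g}

piero = Fragment("Piero")
married_lucrezia = Fragment("_ married Lucrezia")
is_married_to = Fragment("_ is married to _")
lorenzo = Fragment("Lorenzo")
brother_of = Fragment("_ is the brother of _")

expand(piero, married_lucrezia)       # a belief: "Piero married Lucrezia"
once = expand(is_married_to, lorenzo)  # a monadic fragment
expand(is_married_to, brother_of)     # the default set of the two
expand(once, lorenzo)                 # "Lorenzo is married to Lorenzo"
```

The last line shows that expanding twice by the same fragment differs from expanding once, as the text notes; the leftmost-fill choice plays the role of selecting one member of the set of possible results.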

As the reader will already have noted from the examples above, ⊕ behaves quite differently from the expansion operator in standard belief dynamics.

Firstly, (B♯i ⊕ B♯j) ⊕ B♯j will not be the same as (B♯i ⊕ B♯j), i.e. expanding a fragment twice by another fragment will not give the same result as doing it only once (as is true for standard expansion, where (A + φ) + φ = A + φ). For example, expanding the above fragment `is married to` twice by the fragment `Lorenzo` gives `Lorenzo is married to Lorenzo`.

Secondly, in the standard case the expansion of a set of beliefs will always deliver a unique object of the same kind, namely a (now expanded) set of beliefs. But with our expansion operator it will not always be the case that an expansion of some fragment will yield some unique new belief fragment (or belief). Suppose some fragment corresponds to the dyadic property 'adores'. If we want to expand this by the belief fragment `Dante`, it is clear that we get another belief fragment, namely a monadic property. But it is not clear what this property is; it may be either `Dante adores` or `adores Dante`. This point generalizes: for every fragment corresponding to an n-adic property (where n ≥ 2) which is expanded by a belief fragment there are n different possible results.

3 We could plausibly extend ⊕ by allowing it to take a belief fragment and a belief and return a belief fragment. For example, it could take the belief `Piero is married to Lucrezia` and expand it by the fragment `entails that`. This would then return the belief fragment `entails that Piero is married to Lucrezia` (or `Piero is married to Lucrezia entails that`).

In order to give a systematic account of our notion of expansion, there are now two possibilities. We could either say that expanding a fragment corresponding to an n-adic property yields a set Σ containing the different possibilities, so that in the above case the result of expanding `adores` by `Dante` would be {`adores Dante`, `Dante adores`}. Alternatively we could define the expansion in terms of a function σ which selects exactly one member of Σ.

The problem with the first construction is that it brings in an unwelcome asymmetry between e.g. fragments corresponding to 'naturally' monadic properties and those corresponding to monadic properties which were produced from n-adic properties by 'filling in' n-1 places. Obviously they will behave differently under expansion: the expansion of the naturally monadic predicates by the appropriate fragments will yield a belief, while the expansion of the formerly n-adic properties will yield a set of beliefs. The expansion of `is dead` by `Beatrice` will just give the belief `Beatrice is dead`, while expanding the result of expanding `adores` by `Dante` by `Beatrice` gives the set {`Dante adores Beatrice`, `Beatrice adores Dante`}.

We will therefore use the second construction. The expansion operator ⊕ is then defined as a function which selects exactly one member from the set Σ of the various possibilities.

2 The Program: Which Fragment is Which?

As we saw above, ∩B will supply us with various kinds of belief fragments. What we want to try to do is to group the fragments according to the ontological categories they correspond to. There is a fairly natural way in which beliefs correspond to states of affairs, so that in a similar way fragments of beliefs should correspond to fragments of states of affairs (i.e. properties and individuals). Is there a way of telling which groups of fragments correspond to which categories?

We will look at an attempt at grouping fragments by their behaviour under expansion. Clearly not every fragment can be used to expand every other fragment: if B♯i is `is yellow` and B♯j is `is bigger than`, expanding the one by the other will not be a belief, i.e. there is no belief B such that B♯i ⊕ B♯j = B.


Jan Westerhoff

2.1 Adicity

One way in which this can be exploited is in defining different adicities of properties. Suppose we have two properties $B^\sharp_i$ and $B^\sharp_j$ such that $B^\sharp_i \star B^\sharp_j = B^\sharp_k$ (i.e. expanding the one by the other gives another belief fragment) while $B^\sharp_k \star B^\sharp_i = B$ (i.e. expanding the result by the first fragment gives a belief). In this case the belief fragment $B^\sharp_j$ must correspond to a dyadic property. (Note that this doesn't entail that $B^\sharp_i$ must correspond to an individual: $B^\sharp_j$ could correspond to a property of order $n$ and $B^\sharp_i$ to one of order $n-1$.) For example, $B^\sharp_i$ could be Peter and $B^\sharp_j$ is married to, so that expanding $B^\sharp_j$ by $B^\sharp_i$ gives the belief fragment Peter is married to (or is married to Peter) (thus a fragment corresponding to a monadic property), while another expansion by $B^\sharp_i$ gives the belief (corresponding to a state of affairs) Peter is married to Peter. In another possible interpretation $B^\sharp_i$ could be bigger than and $B^\sharp_j$ is the inverse of, so that $B^\sharp_i$ would not correspond to an individual.

We can generalize this by introducing the notion of n-copy-saturability. We say that some belief fragment $B^\sharp_i$ is n-copy-saturable (ncs) by some belief fragment $B^\sharp_j$ iff there is some belief $B_k$ such that $B^\sharp_j \star (\ldots \star (B^\sharp_j \star (B^\sharp_j \star B^\sharp_i))) = B_k$, where $\star$ occurs $n$ times on the left-hand side of the equation.

Thus the notion of n-copy-saturability, which says that $n$ expansions by some other belief fragment make the expanded belief fragment into a belief, allows us to group belief fragments into classes. The class of dyadic properties will correspond to the class of fragments which are 2cs by some $B^\sharp_m$, the class of triadic properties to the class of fragments which are 3cs by some $B^\sharp_k$, and in general the class of $i$-adic properties will correspond to those fragments which are $i$cs by some $B^\sharp_l$.
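Under the same toy encoding (fragments as tuples with `_` for open places; all names are ours, not the paper's), n-copy-saturability becomes a simple check: expand n times with copies of the same fragment and test whether the result is a belief, i.e. has no open places left:

```python
def expand(fragment, filler):
    """Fill the first open place '_' of fragment with the words of
    filler (a toy stand-in for the selection function sigma)."""
    out = list(fragment)
    i = out.index("_")
    out[i:i + 1] = list(filler)
    return tuple(out)

def is_belief(fragment):
    return "_" not in fragment

def is_ncs(fragment, other, n):
    """fragment is n-copy-saturable by other iff exactly n successive
    expansions by (copies of) other turn it into a belief."""
    for _ in range(n):
        if is_belief(fragment):
            return False        # saturated before n expansions
        fragment = expand(fragment, other)
    return is_belief(fragment)

adores = ("_", "adores", "_")   # should come out as dyadic
is_dead = ("_", "is", "dead")   # should come out as monadic
dante = ("Dante",)
assert is_ncs(adores, dante, 2) and not is_ncs(adores, dante, 1)
assert is_ncs(is_dead, dante, 1)
```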

While the notion of copy-saturability does the trick for all adicities greater than two, it fails for the monadic case. In this case the asymmetry between the expanded and the expander will break down, since $B^\sharp_i$ will be 1cs by $B^\sharp_j$ and also $B^\sharp_j$ will be 1cs by $B^\sharp_i$. So the class of fragments which are 1cs by some $B^\sharp_i$ will not correspond to the class of monadic properties, since it will e.g. also contain individuals.

The above procedure also fails to give us a way of distinguishing between different orders of properties of one adicity. If some $B^\sharp_i$ is 2cs by some $B^\sharp_j$ we can be sure that $B^\sharp_i$ is dyadic, but we cannot know what the order of either $B^\sharp_i$ or of $B^\sharp_j$ is, since it could be the case that $B^\sharp_i$ corresponds to a first-order dyadic property (which we will in the future abbreviate as $P^1_2$) and $B^\sharp_j$ to an individual (abbreviated as $i$), or that $B^\sharp_i = P^2_2$ and $B^\sharp_j = P^1_n$, or $B^\sharp_i = P^3_2$ and $B^\sharp_j = P^2_m$, and so on.

So we need some way of distinguishing first-order properties from individuals and also a method for distinguishing orders of properties. Once we have a notion of an individual, the definition of orders of properties is easy.

Before proceeding to this discussion, however, let us discuss a couple of properties of belief fragments which can be characterized in this framework. The above introduction of the selection function $\sigma$ gives us quite a natural way of characterizing symmetry.

We can say that a belief fragment $B^\sharp_i$ corresponding to a dyadic property corresponds to a symmetric property iff for any two distinct fragments $B^\sharp_j$, $B^\sharp_k$ such that repeatedly expanding $B^\sharp_i$ by them gives some belief $B$, what $B$ is does not in any way depend on $\sigma$.

This corresponds to the idea that what we mean by saying that some property (such as `married') is symmetric is that `Romeo married Juliet', `Juliet married Romeo', `Juliet and Romeo are married', `Romeo and Juliet are married', `A marriage took place between Romeo and Juliet', ... are all the same belief. So iff some fragment corresponds to a symmetric property it makes no difference which of the different possibilities (i.e. members of $\Sigma$) $\sigma$ picks, since these possibilities really just all amount to the same.

We will say that a fragment $B^\sharp_i$ corresponding to a dyadic property corresponds to a reflexive property iff for some fragments $B^\sharp_j$, $B^\sharp_k$ and some set of beliefs $\mathbf{B}$:

$((B^\sharp_i \star B^\sharp_j) \star B^\sharp_k) \cup \mathbf{B} = ((((B^\sharp_i \star B^\sharp_j) \star B^\sharp_k) \star ((B^\sharp_i \star B^\sharp_j) \star B^\sharp_j)) \star ((B^\sharp_i \star B^\sharp_k) \star B^\sharp_k)) \cup \mathbf{B}$.

This characterization depends on the fact that $\mathbf{B}$ is logically closed, so that given we expand it by some belief $aPb$, when $P$ is reflexive we get the very same result if we afterwards also expand by $aPa$ and $bPb$.

Transitivity is a bit more complicated to characterize. Here we have to make our characterization relative to the structure of $\sigma$. Suppose $\sigma$ selects the expansion which fills in the first place (i.e. iff the dyadic $P$ is expanded by some $a$, $\sigma$ selects $aP$ rather than $Pa$ from the set $\Sigma$ of possibilities).

We can then say that a fragment $B^\sharp_i$ corresponding to a dyadic property corresponds to a transitive property iff for some fragments $B^\sharp_j$, $B^\sharp_k$ and $B^\sharp_l$ and some set of beliefs $\mathbf{B}$:

$(((B^\sharp_i \star B^\sharp_j) \star B^\sharp_k) \star ((B^\sharp_i \star B^\sharp_k) \star B^\sharp_l)) \cup \mathbf{B} = ((((B^\sharp_i \star B^\sharp_j) \star B^\sharp_k) \star ((B^\sharp_i \star B^\sharp_k) \star B^\sharp_l)) \star ((B^\sharp_i \star B^\sharp_j) \star B^\sharp_l)) \cup \mathbf{B}$.

Again this characterization depends on the fact that $\mathbf{B}$ is logically closed, so that given we expand it by some beliefs $aPb$ and $bPc$, when $P$ is transitive we get the very same result if we afterwards also expand by $aPc$.

2.2 Order

To give an account of the different orders of properties different belief fragments correspond to, we start from the class of fragments which are 1cs by some $B^\sharp_i$, i.e. the class $\mathbf{G}$ containing fragments corresponding to $i, P^1_1, P^2_1, \ldots$. We then split $\mathbf{G}$ into partitions which fulfill the following properties:


- For no two members of any $G_i$ from $\mathbf{G}$ will the expansion of the one by the other yield a belief.

- If for some $B^\sharp_i$ in some $G_j$ from $\mathbf{G}$ and some $B^\sharp_k$ in some distinct $G_l$ from $\mathbf{G}$ it is the case that $B^\sharp_i \star B^\sharp_k$ yields a belief, this is also true for all other members of $G_j$ and $G_l$. (Call $G_j$ and $G_l$ corresponding classes in this case.) For every $G_i$ from $\mathbf{G}$ there is exactly one corresponding class.

Such a partitioning ensures that every order of properties is contained wholly and purely in some partition of $\mathbf{G}$. We just don't know which partition contains which.

We will now proceed by considering the following construction: Take some $B^\sharp_i \in \mathbf{G}$ and select some distinct $B^\sharp_j \in \mathbf{G}$ such that $B^\sharp_i \star B^\sharp_j$ yields a belief. Now take $B^\sharp_j$ and select another $B^\sharp_k$ (if possible from a partition different from the one of which $B^\sharp_i$ is a member) such that $B^\sharp_j \star B^\sharp_k$ yields a belief. Then consider $B^\sharp_k$ and so on. We call such a sequence an expansion sequence, which can be written in the following way:

$B^\sharp_i \xrightarrow{\star} B^\sharp_j = B_m$
$B^\sharp_j \xrightarrow{\star} B^\sharp_k = B_n$
$B^\sharp_k \xrightarrow{\star} B^\sharp_l = B_o$
$\vdots$

(Here the arrow indicates that the fragment on the left of the arrow is expanded by the fragment on the right in the respective step, and yields the belief to the right of the equation sign.)

If we take a concrete example and let $B^\sharp_i$ be $P^2_1$, the expansion sequence looks like this:

$P^2_1 \xrightarrow{\star} P^1_1 = B_m$
$P^1_1 \xrightarrow{\star} i = B_n$
$i \xrightarrow{\star} P^1_1 = B_o$
$\vdots$


Belief Fragments and Ontological Categories

Remember that what we want to do is find the partition containing those and only those fragments corresponding to individuals. Since the expansion sequences do not end, we cannot in any way identify $i$ as the expansion in the last step of such a sequence. But if we look at the above example we see that the expansion which corresponds to $i$ is the only expansion both immediately preceded and immediately followed by expansions which come from the same partition. In the third step in this sequence it wasn't possible to choose the fragment which expands from a new partition (as we demanded above, where $B^\sharp_i$ and $B^\sharp_k$ were supposed to come from different partitions). In fact we had to select the fragment which expands in step three (namely $P^1_1$) from the very same partition from which we selected the fragment which expanded in step one.

We can thus define the partition of belief fragments corresponding to individuals in the following way:

- it is the class of all those fragments $B^\sharp_n$ which are used in expansion sequences such that $B^\sharp_{n-1}$ (the expansion immediately preceding it) and $B^\sharp_{n+1}$ (the expansion immediately following it) are elements of the same $G_k \in \mathbf{G}$, if there is such a $B^\sharp_n$;

- or, if there is none in some expansion sequence, it is the first member of that sequence (in this case the belief to be expanded corresponds to a first-order property).

Thus given that we know which partition contains the belief fragments corresponding to individuals, it is easy to give a recursive way of telling which partitions contain fragments corresponding to properties of which order. Suppose $B^\sharp_n$ corresponds to an individual. Then some $G_k \in \mathbf{G}$ contains all and only the first-order properties iff for every $B^\sharp_m \in G_k$ there is some belief $B_1$ such that $B^\sharp_n \star B^\sharp_m = B_1$. Some $G_l \in \mathbf{G}$ contains all and only the second-order properties iff for every $B^\sharp_o \in G_l$ there is some belief $B_2$ such that $B^\sharp_m \star B^\sharp_o = B_2$, and so on. It is then obvious how to define fragments corresponding to higher-order properties of adicities > 1.

3 Philosophical Reflections

It should be noted that although the above characterization relies on differences in the possibilities of putting together fragments to form beliefs, this does not mean that we have to assume that some class of fragments is somehow `complete' while the other is incomplete and cannot stand on its own. This line is taken by Strawson, who argues that

A subject-expression is one which, in a sense, presents a fact in its own right and is to that extent complete. A predicate expression is one which in no sense presents a fact in its own right


and is to that extent incomplete. [...] The predicate-expression, on the new criterion, is one that can be completed only by explicit coupling with another. [...] We find an additional depth in Frege's metaphor of saturated and the unsaturated constituents. (5), 187-188

On our account none of the fragments is able to represent a state of affairs (not even in `a certain sense'); this is something which only beliefs can do. And although we consider belief fragments to be incomplete (to the extent to which they are incomplete beliefs), this does not mean that they cannot represent as they are; as we have seen above, we can make sense of some of them representing individuals while others represent properties of different kinds.

The ontological categories we considered above (states of affairs, individuals, properties of arbitrary orders and adicities) are certainly not all the ontological categories there are. Just think of categories such as abstract object, event, mathematical structure, trope or material object. All of these are categories which are as legitimate objects of ontological discussion as the ones we considered. Nevertheless, the categories we considered seem to be distinguished by their universality as well as by their coherence.

They are universal because relative to every other ontological category there seem to be states of affairs, individuals and properties. The set containing the number seven (an abstract object) is an individual; amongst others it has the property of having just one member, and it is a state of affairs that it has this property. The Seven Years' War (an event) is an individual; it has the property of having lasted for seven years, and it is a state of affairs that it has lasted for seven years, and so on.

These ontological categories have a particular coherence because they can all be constructed from a rather confined basis. It is possible to account for all of them just in terms of states of affairs and individuals. This is done by employing a procedure due to Ajdukiewicz (1) (which was later developed in (2)) for giving a theory of grammatical categories. We use a primitive functor (intuitively interpreted as `takes ... and returns ...') and write the complex ontological categories as fractions. Monadic first-order properties are defined as $\frac{i}{s}$ (because they take an individual and return a state of affairs), while dyadic first-order properties are $\frac{i,i}{s}$, and so forth. Monadic second-order properties are taken to be of the form $\frac{i/s}{s}$; and in a similar way all the ontological categories of the different adicities and orders can be constructed. This constructional coherence does not seem to be possessed by other sets of ontological categories. The ensemble of states of affairs, individuals and properties of different adicities and orders which can be defined in a belief-dynamical framework therefore seems to occupy a


special place in the system of ontological categories.*
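The fractional construction can be made mechanical. In the sketch below (our illustration: categories are flattened to strings of the form `arguments/s`, and we assume for simplicity that all argument places of a property share the same order), the category of any order and adicity is built from the two primitives i and s:

```python
def category(order, adicity=1):
    """Ontological category of an adicity-ary property of the given
    order: order 0 is an individual 'i'; a property of order k takes
    adicity-many arguments of order k-1 and returns a state of
    affairs 's'.  The fraction (arguments over result) is flattened
    here to the string 'arguments/s'."""
    if order == 0:
        return "i"
    args = ",".join([category(order - 1)] * adicity)
    return "(" + args + ")/s"

assert category(1) == "(i)/s"        # monadic first-order property
assert category(1, 2) == "(i,i)/s"   # dyadic first-order property
assert category(2) == "((i)/s)/s"    # monadic second-order property
```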

References

[1] Kazimierz Ajdukiewicz. The Scientific World-Perspective and other Essays, 1931-1963. Reidel, PWN Polish Publishers, 1978.

[2] Yehoshua Bar-Hillel. A quasi-arithmetical notation for syntactic description. Language, 29:47-58, 1953.

[3] Peter Gärdenfors. Knowledge in Flux. MIT Press, Cambridge MA, 1988.

[4] Sven Ove Hansson. A Textbook of Belief Dynamics. Theory Change and Database Updating. Kluwer, Dordrecht, Boston, London, 1999.

[5] P. F. Strawson. Individuals. An Essay in Descriptive Metaphysics. Methuen & Co., London, 1964.

*Thanks are due to Matthias Hild for providing the initial motivation for this paper.


Grammar conversion from LTAG to HPSG

Naoki Yoshinaga

Department of Information Science, Graduate School of Science, University of Tokyo

[email protected]

Yusuke Miyao

Department of Information Science, Graduate School of Science, University of Tokyo

[email protected]

Abstract. We propose a grammar conversion algorithm from an arbitrary Feature-Based Lexicalized Tree Adjoining Grammar (FB-LTAG) grammar into a strongly equivalent Head-Driven Phrase Structure Grammar (HPSG)-style grammar. Our algorithm converts LTAG elementary trees into HPSG feature structures by encoding a tree structure in a list. A set of pre-determined rules manipulates the list to emulate substitution and adjunction. By using our algorithm, we can obtain HPSG-style grammars from existing LTAG grammars. We apply this algorithm to the XTAG English grammar and report some findings.

1 Introduction

Recent intelligent NLP applications (Kay et al. 1994) take advantage of computationally and linguistically motivated grammar formalisms, such as Feature-Based Lexicalized Tree Adjoining Grammar (FB-LTAG¹) (Vijay-Shanker 1987; Vijay-Shanker and Joshi 1988) and Head-Driven Phrase Structure Grammar (HPSG) (Pollard and Sag 1994). Although both formalisms have the same motivation to model natural languages, the correspondence between the formalisms has not been well discussed. We believe that collaboration between the communities will be beneficial for their further improvement.

This research aims at providing a basis for the collaboration, and the proposal is a grammar conversion from LTAG into HPSG. Our conversion guarantees strong equivalence, that is, parsing results (derivation trees) of an LTAG grammar can be derived from parsing results of the obtained HPSG-style grammar. Strongly equivalent grammars based on the different

¹If not confusing, we use the term LTAG to refer to FB-LTAG in this paper.

Proceedings of the Sixth ESSLLI Student Session, Kristina Striegnitz (editor). Chapter 27, Copyright © 2001, Naoki Yoshinaga and Yusuke Miyao.


formalisms are valuable for both communities in practical, computational, and theoretical aspects (Becker and Lopez 2000):

Resource sharing HPSG-based applications can make use of LTAG resources (lexicons and grammars) such as the large-scale English (Doran et al. 2000) and French grammars (Abeillé and Candito 2000). Our algorithm can reduce the considerable workload of developing huge resources from scratch.

Parsing efficiency We can apply HPSG parsing techniques to LTAG parsing by applying them to the obtained HPSG-style grammar. Recent development of HPSG parsing techniques allows the use of HPSG-based processing in practical application contexts (Flickinger et al. 2000). Experiments using state-of-the-art parsers show the efficiency of an HPSG parser compared to an LTAG parser.

Linguistic correspondence We can explore the linguistic correspondence between the formalisms. Since the obtained grammar has the computational features of HPSG and the linguistic features of LTAG, their difference will become apparent by comparing the obtained grammar with existing HPSG grammars.

We implemented the conversion algorithm and successfully converted the latest version of the XTAG English grammar (The XTAG Research Group 2001), which is a large-scale FB-LTAG grammar. The strong equivalence was empirically attested by the fact that parsing with the original and the obtained grammars generated equivalent parse results. Strongly equivalent grammars enable a fair comparison of LTAG and HPSG parsers, and another work reports that an efficient HPSG parser achieved a parsing speed significantly higher than that of an existing LTAG parser (Yoshinaga et al. 2001). In this paper, we investigate the types of linguistic phenomena covered by the XTAG English grammar and the correspondence to their analysis in the HPSG formalism.

Tateisi et al. also translated LTAG into HPSG (Tateisi et al. 1998). However, their method depended on the translator's intuitive analysis of the original grammar, and thus the translation was manual and grammar dependent. The manual translation demanded considerable effort from the translator and obscured the strong equivalence between the original and obtained grammars. Other works converted HPSG into LTAG (Kasper et al. 1995; Becker and Lopez 2000). Given the greater generative power of HPSG, the conversion required some restrictions on HPSG to suppress its generative capacity. As a result, the conversion loses the strong equivalence of the grammars. These existing works cannot gain the above advantages, which are attainable only when strong equivalence is preserved.

310

Page 318: Kristina Striegnitz › esslli › courses › readers › studentsession.pdfNob o Komagata, Geert-Jan Kruij , Iv ana Kruij -Korba y o v a, ii Daniela Kurz, Celine Kuttler, F ran cois

Naoki Yoshinaga and Yusuke Miyao

Figure 27.1: Tree Adjoining Grammar: basic structures and their composing operations

2 Grammar formalisms

2.1 Feature-Based Lexicalized Tree Adjoining Grammar (FB-LTAG)

LTAG (Schabes et al. 1988), the input of our algorithm, is a grammar formalism that provides syntactic analyses for a sentence by composing elementary trees with two operations called substitution and adjunction. Elementary trees are classified into two types, initial trees (α1 and α2 in Figure 27.1) and auxiliary trees (β1 in Figure 27.1). An elementary tree has at least one leaf node labeled with a terminal symbol, called an anchor (marked with ⋄). In an auxiliary tree, one leaf node is labeled with the same symbol as the root node and is specially marked as a foot node (marked with *). In an elementary tree, leaf nodes with the exception of anchors and a foot node are called substitution nodes (marked with #).

The left-hand side of Figure 27.1 illustrates the two operations. Substitution replaces a leaf node (substitution node) with an initial tree, and adjunction grafts an auxiliary tree whose root node and foot node are labeled x onto a node with the same symbol x. Results of analysis are described not only by derived trees (i.e., parse trees) but also by derivation trees (Figure 27.1). A derivation tree represents the history of combinations of trees and is a structural description in LTAG.
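The two operations can be pictured with a minimal tree datatype (a sketch with invented names, not the XTAG machinery): substitution fills a substitution node with an initial tree, while adjunction splices an auxiliary tree in at an interior node and re-attaches the excised subtree at the foot node.

```python
class Node:
    """Minimal LTAG tree node (illustrative; names are ours)."""
    def __init__(self, label, children=None, subst=False, foot=False):
        self.label, self.subst, self.foot = label, subst, foot
        self.children = children or []

def find(node, pred):
    """Depth-first search for the first node satisfying pred."""
    if pred(node):
        return node
    for c in node.children:
        hit = find(c, pred)
        if hit is not None:
            return hit
    return None

def substitute(node, initial):
    """Substitution: replace a substitution node by an initial tree
    carrying the same root label."""
    assert node.subst and node.label == initial.label
    node.children, node.subst = initial.children, False

def adjoin(node, aux):
    """Adjunction: graft an auxiliary tree onto an interior node with
    the same label; the node's old children move under the foot."""
    assert node.label == aux.label
    foot = find(aux, lambda n: n.foot)
    foot.children, foot.foot = node.children, False
    node.children = aux.children

def yield_of(node):
    """Terminal yield, skipping unfilled substitution/foot nodes."""
    if not node.children:
        return [] if (node.subst or node.foot) else [node.label]
    return [w for c in node.children for w in yield_of(c)]

# alpha1: S -> NP# VP(V(run));  beta1: VP -> V(can) VP*;  alpha2: NP -> N(We)
vp = Node("VP", [Node("V", [Node("run")])])
alpha1 = Node("S", [Node("NP", subst=True), vp])
beta1 = Node("VP", [Node("V", [Node("can")]), Node("VP", foot=True)])
alpha2 = Node("NP", [Node("N", [Node("We")])])

adjoin(vp, beta1)                          # "can" adjoins at VP
substitute(alpha1.children[0], alpha2)     # "We" substitutes at NP
assert yield_of(alpha1) == ["We", "can", "run"]
```

This reproduces the "We can run" derivation of Figure 27.1; the derivation tree would simply record that α2 substituted and β1 adjoined into α1.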

FB-LTAG (Vijay-Shanker 1987; Vijay-Shanker and Joshi 1988) is an extension of the LTAG formalism. In FB-LTAG, each node in an elementary tree has a feature structure containing grammatical constraints on the node.

2.2 Head-Driven Phrase Structure Grammar (HPSG)

We define HPSG, the output of the algorithm, following the computational specifications in (Pollard and Sag 1994). It consists of lexical entries and


Figure 27.2: Parsing with an HPSG grammar

ID grammar rules, each of which is described with typed feature structures (Carpenter 1992). A lexical entry for each word expresses characteristics of the word, such as its subcategorization frame and grammatical category. An ID grammar rule represents a grammatical relation between a mother and its daughters, and is independent of lexical characteristics.

Figure 27.2 illustrates an example of bottom-up parsing with an HPSG grammar. As we can see in this example, in HPSG a parse tree is generated by incrementally applying ID grammar rules to lexical entries, constructing each branching one by one, while in LTAG it is generated by composing elementary trees with the two operations. Thus, the key points of the conversion are: 1) how to encode the tree structure of an elementary tree as an HPSG feature structure, and 2) how to emulate substitution and adjunction. We should note that there is no one-to-one correspondence between elementary trees and HPSG lexical entries, because one elementary tree can have multiple anchors to represent compound expressions.

3 Algorithm

This section describes the algorithm for grammar conversion in detail. As described in Section 2, tree structures of LTAG elementary trees should be encoded in HPSG lexical entries, and the tree composing operations should be emulated by ID grammar rules. Thus, we propose a conversion algorithm which consists of: 1) conversion of elementary trees into HPSG lexical entries, and 2) emulation of substitution and adjunction by pre-determined ID grammar rules.

In the following, we first state the notion of canonical, and show canonical elementary trees, which have a one-to-one correspondence to an HPSG lexical entry². Canonical elementary trees are elementary trees which satisfy the conditions below:

²In this paper, we discuss a conversion of elementary trees which consist of binary branchings. A conversion of trees with unary branchings is straightforward, and trees with n-ary (n ≥ 3) branchings can be converted into trees with binary branchings.


Figure 27.3: A canonical elementary tree and exceptions (a: exception for Condition 1; b: exception for Condition 2)

Condition 1 A tree must have only one anchor.

Condition 2 All branchings in a tree must contain trunk nodes.

Trunk nodes are nodes on a trunk, which is a path from an anchor to the root node (the thick line in Figure 27.3) (Kasper et al. 1995), with the exception of the anchors. Condition 1 guarantees that a canonical tree has only one trunk, and Condition 2 guarantees that each branching consists of a trunk node, a leaf node, and their mother (also a trunk node) (the left-hand side of Figure 27.3). The right-hand side of Figure 27.3 shows non-canonical trees. We call a subtree of depth 1 or more with no anchor a non-anchored subtree. Non-canonical elementary trees are first converted to canonical ones, to which the algorithm for canonical trees is then applied.
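The two conditions can be checked directly on a toy tree encoding (a sketch; the `(label, kind, children)` representation and the function names are ours, not the paper's):

```python
def anchors(t):
    """All anchor nodes in a tree (label, kind, children);
    kind is 'anchor', 'subst', 'foot' or 'int'."""
    return ([t] if t[1] == "anchor" else
            [a for c in t[2] for a in anchors(c)])

def path_to_anchor(t):
    """Root-to-anchor path (the trunk plus the anchor), or None."""
    if t[1] == "anchor":
        return [t]
    for c in t[2]:
        p = path_to_anchor(c)
        if p is not None:
            return [t] + p
    return None

def is_canonical(tree):
    """Condition 1: exactly one anchor.  Condition 2: every
    branching contains a node of the (unique) trunk."""
    if len(anchors(tree)) != 1:
        return False
    trunk = {id(n) for n in path_to_anchor(tree)}

    def ok(t):
        if len(t[2]) >= 2 and not any(id(c) in trunk for c in t[2]):
            return False
        return all(ok(c) for c in t[2])
    return ok(tree)

# "think": S -> NP# (VP -> V(think) S*)  -- canonical
think = ("S", "int",
         [("NP", "subst", []),
          ("VP", "int", [("V", "int", [("think", "anchor", [])]),
                         ("S", "foot", [])])])
# "look ... for": two anchors, hence non-canonical
look_for = ("S", "int",
            [("NP", "subst", []),
             ("VP", "int", [("V", "int", [("look", "anchor", [])]),
                            ("PP", "int", [("P", "int", [("for", "anchor", [])]),
                                           ("NP", "subst", [])])])])
assert is_canonical(think) and not is_canonical(look_for)
```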

3.1 Conversion of canonical elementary trees

This section describes the conversion algorithm from an LTAG canonical elementary tree into an HPSG lexical entry. Canonical elementary trees can be directly converted to HPSG lexical entries, as shown in the left-hand side of Figure 27.4.

The procedure convert tree into lexical entry in Figure 27.4 depicts an algorithm for converting an elementary tree $T$ into an HPSG lexical entry $L$. In the algorithm, $arg$ is a list of branchings $b_i$ described with a quadruplet $\langle n_{i-1}, l_i, d_i, t_i \rangle$ along the trunk. The parameter $n_{i-1}$ represents the mother node of the trunk node $n_i$. The parameters $l_i$, $d_i$ and $t_i$ represent a leaf node at depth $i$: they represent the non-terminal symbol, the direction (on which side of the trunk node $n_i$ the leaf node is), and the type (whether a foot node or a substitution node), respectively. We call this list the arguments of the word. Finally, the converted lexical entry $L$ is described


procedure convert tree into lexical entry(T)
begin
    arg := []
    for i := 1 to depth(T)-1
        n_{i-1} := trunk(T, i-1)
        l_i := leaf(T, i)
        d_i := direct(T, i)
        t_i := type(T, i)
        b_i := <n_{i-1}, l_i, d_i, t_i>
        arg := [b_i] . arg
    end for
    L := (n_{depth(T)-1}, arg)
    return L
end

depth:  returns the depth of the anchor.
trunk:  returns the symbol of the trunk node.
leaf:   returns the symbol of the leaf node at depth i.
direct: returns which side of the trunk node the leaf node is on at depth i.
type:   returns the type of the leaf node at depth i.

(In the resulting feature structure: Sym is the symbol of a trunk node, Leaf the symbol of a leaf node, Dir the direction of the leaf node against the trunk, and Foot? the type of the leaf node.)

Figure 27.4: A conversion algorithm from a canonical elementary tree T into an HPSG lexical entry L

with the arguments $arg$ and the mother of the anchor, namely $n_{depth(T)-1}$, where $depth(T)$ is the depth of the tree $T$.

In the left-hand side of Figure 27.4, the value of the Sym feature is the symbol $n_{depth(T)-1}$, and the value of the Arg feature contains the arguments in $arg$, as a list of feature structures with four features Sym, Leaf, Dir and Foot? corresponding to $n_{i-1}$, $l_i$, $d_i$ and $t_i$, respectively.
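The figure's procedure can be transcribed almost line by line. The sketch below (using the same toy `(label, kind, children)` encoding as before; the names are ours, not the authors' implementation) walks down the trunk and collects one quadruplet per branching:

```python
def contains_anchor(t):
    return t[1] == "anchor" or any(contains_anchor(c) for c in t[2])

def convert_tree_into_lexical_entry(tree):
    """Record, for every branching on the trunk, a quadruplet
    (mother symbol, leaf symbol, direction, is-foot), prepending as
    in the paper's `arg := [b_i] . arg` so the deepest branching
    comes first; return the list together with the symbol of the
    anchor's mother.  Assumes the anchor sits below the root."""
    arg, node = [], tree
    while node[1] != "anchor":
        ti = next(j for j, c in enumerate(node[2]) if contains_anchor(c))
        level = [(node[0], c[0], "left" if j < ti else "right",
                  c[1] == "foot")
                 for j, c in enumerate(node[2]) if j != ti]
        arg = level + arg
        mother, node = node, node[2][ti]
    return mother[0], arg

# "think": S -> NP# (VP -> V(think) S*)
think = ("S", "int",
         [("NP", "subst", []),
          ("VP", "int", [("V", "int", [("think", "anchor", [])]),
                         ("S", "foot", [])])])
sym, arg = convert_tree_into_lexical_entry(think)
assert sym == "V"
assert arg == [("VP", "S", "right", True), ("S", "NP", "left", False)]
```

The result mirrors the lexical entry for "think" in Figure 27.4: Sym is V (the anchor's mother), and Arg expects first an S foot to the right (at the VP branching), then an NP substitution node to the left (at the S branching).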

Figure 27.5: Grammar rules: the (left) substitution rule and the (left) adjunction rule

3.2 Definition of grammar rules

In this section, we give the definition of the grammar rules that emulate substitution and adjunction (Figure 27.5)³. We should note that these grammar

³In the figure, we give rules whose right daughter is a trunk node. Of course, there are symmetric rules whose left daughter is a trunk node.


Figure 27.6: LTAG derivation and HPSG rule applications for a phrase "what you think he loves"

rules are independent of the original grammar because they do not specify any characteristics given in the original grammar.

Substitution rule: The Sym feature of the substitution node must have the value of the Leaf feature [3] of the trunk node. The Arg feature of the substitution node must be a null list, because the substitution node must be unified only with the node corresponding to the root node of the initial tree. The substitution rule percolates the tail elements [2] of the Arg feature of the trunk node to the mother in order to continue constructing the tree.

Adjunction rule: The Sym feature of a foot node must have the same value as the Leaf feature [3]. The value of the Arg feature of the mother node is a concatenation of both Arg features [2] and [4] of its daughters, because we first construct the tree corresponding to the adjoining tree and next continue constructing the tree corresponding to the adjoined tree. The value "+" or "-" of the Foot? feature explicitly determines whether the next rule application is the adjunction rule or the substitution rule.
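A rough emulation of the bookkeeping these two rules perform, using plain dictionaries in place of typed feature structures (the encoding and all names are our own sketch, not the authors' implementation):

```python
def substitution(trunk, subst):
    """Substitution rule: the substitution daughter's Sym must match
    the expected Leaf, and its Arg must be empty; the tail of the
    trunk daughter's Arg percolates to the mother."""
    expected, *rest = trunk["arg"]
    assert not expected["foot"]
    assert subst["sym"] == expected["leaf"] and subst["arg"] == []
    return {"sym": expected["sym"], "arg": rest}

def adjunction(trunk, adjoined):
    """Adjunction rule: the expected Leaf (a foot) must match the
    adjoined node's Sym; the mother's Arg concatenates the adjoining
    tree's remaining Arg with the adjoined node's Arg, so that the
    adjoined tree's construction resumes afterwards."""
    expected, *rest = trunk["arg"]
    assert expected["foot"] and adjoined["sym"] == expected["leaf"]
    return {"sym": expected["sym"], "arg": rest + adjoined["arg"]}

# "we can run" (cf. Figure 27.2), entries written as (Sym, Arg) pairs
can = {"sym": "V",
       "arg": [{"sym": "VP", "leaf": "VP", "dir": "right", "foot": True}]}
run_vp = {"sym": "VP",   # "run" projected up to its VP node
          "arg": [{"sym": "S", "leaf": "NP", "dir": "left", "foot": False}]}
we = {"sym": "NP", "arg": []}

vp = adjunction(can, run_vp)     # "can" adjoins into the "run" tree
s = substitution(vp, we)         # "we" substitutes at the NP node
assert s == {"sym": "S", "arg": []}
```

After the adjunction, the mother's Arg is exactly the "run" tree's remaining Arg, which illustrates the point made below: once the adjoining tree is finished, construction of the adjoined tree continues as if nothing had happened.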

Figure 27.6 shows an example of rule applications. The thick line indicates the adjoined tree (α1) and the dashed line indicates the adjoining tree (β1). The adjunction rule is applied to construct the branching marked with ⋆, where "think" takes as an argument a node whose Sym feature's value is


S

NP VP

V PP

P NP

for

S

NP VP

V

P NP

for

look

look

cut off

PP look_forPP look_for PP look_forPP look_for

identifier

procedure divide_tree_into_subtrees(MT)
begin
  A := select(MT)
  ⟨ST, Ts⟩ := divtree(MT, A)                ... (1)
  STs := ∅
  foreach T ∈ Ts
    SSTs := divide_tree_into_subtrees(T)
    STs := SSTs ∪ STs
  end foreach
  STs := STs ∪ {ST}
  return STs
end

procedure divtree(MT, A)
begin
  Ts := ∅
  for i := 1 to depth(MT, A) − 1
    if nonleaf(arg(trunk(i)))
      ⟨MT′, T⟩ := cut(MT, arg(trunk(i)))    ... (2)
      address(trunk(i), Address)
      mark(Address, MT′, T)                 ... (3)
      Ts := Ts ∪ {T}
      MT := MT′
    end if
  end for
  ST := MT
  return ⟨ST, Ts⟩
end

select:  returns one of the anchors.
depth:   returns the depth of the anchor.
trunk:   returns the trunk node at depth i.
arg:     returns the sister of the trunk node.
cut:     cuts off the tree at the sister of the trunk node and
         returns a subtree whose root node is the sister node.
nonleaf: returns true if the node is not a leaf node.
address: returns the address in the elementary tree.
mark:    marks the address on each cut-off node.

Figure 27.7: An algorithm for dividing a multi-anchored elementary tree MT into single-anchored trees ST

S. By applying the adjunction rule, the Arg feature of the mother node (B) becomes a concatenation list of both Arg features of β1 (8) and α1 (5). Note that when the construction of β1 is completed, the Arg feature of the trunk node (C) will be in its former state (A). We can continue constructing α1 as if nothing had happened.

3.3 Division of multi-anchored trees

Multi-anchored elementary trees, which violate Condition 1, are divided into multiple canonical elementary trees. We call the cutting nodes in the divided trees cut-off nodes. Note that a cut-off node is marked by an identifier to preserve the co-occurrence relation among the multiple anchors (Figure 27.7).

The procedure divide_tree_into_subtrees in Figure 27.7 depicts an algorithm for dividing a multi-anchored elementary tree MT into a set of single-anchored subtrees ST. One anchor A is selected, and a single-anchored tree ST which can be constructed from the anchor A is picked up ((1) in Figure 27.7). We check the path from the root node to the anchor A, cut off


Naoki Yoshinaga and Yusuke Miyao

[Figure: tree diagrams for the expansion example. A non-anchored subtree of the it-cleft elementary tree (S over NP "it" and VP, with V "is", an empty element, PP, and S) is expanded by substituting all candidate initial trees (e.g., those anchored by "on" and "next to") for the PP leaf; the substituted nodes are marked as breaking points.]

procedure expand_tree_into_anchored_tree(NT)
begin
  BRs := na_br(NT)
  MTs := {NT}
  foreach BR ∈ BRs
    S := select(BR)                     ... (1)
    ITs := initial(S)                   ... (2)
    MMTs := ∅
    foreach MT ∈ MTs
      TMTs := substitute(MT, S, ITs)    ... (3)
      mark(S)
      MMTs := TMTs ∪ MMTs
    end foreach
    MTs := MMTs
  end foreach
  return MTs
end

na_br:      returns the deepest branchings whose daughters
            do not contain an anchor.
initial:    returns all initial trees whose root node is the
            same as the leaf node.
select:     returns one of the leaf nodes.
substitute: performs substitution at the leaf node and
            returns the set of resulting multi-anchored trees.
mark:       marks the substituted node.

Figure 27.8: An algorithm for converting a non-anchored subtree NT into anchored trees MT

the sister node[4] arg(trunk(i)) if it is not a leaf node, and store the address of an elementary tree into the cut-off node as an identifier ((2) and (3) in Figure 27.7).
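A minimal Python rendering of the division procedure may help; the nested-dict tree encoding and helper names are our own toy version, not the authors' implementation, the address bookkeeping is collapsed into a single identifier, and the selected anchor is simply the leftmost one:

```python
# Toy re-implementation of the division step (our own tree encoding:
# dicts with "label", optional "children", optional "anchor" flag).

def anchor_paths(node, path=()):
    """Yield every root-to-anchor path as a tuple of node objects."""
    path = path + (node,)
    if node.get("anchor"):
        yield path
    for child in node.get("children", []):
        yield from anchor_paths(child, path)

def divtree(tree, ident):
    """Cut off every non-leaf sister of the path to the selected
    (here: leftmost) anchor; mark each cut-off node with ident."""
    on_path = {id(n) for n in next(anchor_paths(tree))}
    cut = []

    def walk(node):
        if not node.get("children"):
            return dict(node)
        children = []
        for child in node["children"]:
            if id(child) in on_path:
                children.append(walk(child))
            elif child.get("children"):          # non-leaf sister: cut it off
                cut.append(child)
                children.append({"label": child["label"], "cut_off": ident})
            else:                                # leaf (substitution node)
                children.append(dict(child))
        return {"label": node["label"], "children": children}

    return walk(tree), cut

def divide_tree_into_subtrees(tree, ident):
    """Recursively divide a multi-anchored tree into single-anchored ones."""
    single, cut = divtree(tree, ident)
    result = [single]
    for subtree in cut:
        result.extend(divide_tree_into_subtrees(subtree, ident))
    return result
```

Applied to a tree for "look for" with identifier look_for, this yields two single-anchored trees whose cut-off node and severed root are linked by the identifier.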

3.4 Substitution to non-anchored subtrees

Non-canonical elementary trees violating Condition 2 have non-anchored subtrees. A non-anchored subtree is converted into multi-anchored trees by substituting the deepest node with initial trees (Figure 27.8). Substituted nodes are marked as breaking points to remember that the nodes originate from the substitution nodes. In the resulting trees, all subtrees are anchored, so that we can apply the above conversion algorithms.

The procedure expand_tree_into_anchored_tree in Figure 27.8 depicts an algorithm for combining a non-anchored subtree NT with initial trees into anchored trees MT. For each branching consisting of substitution nodes or foot nodes, one substitution node S is selected ((1) in Figure 27.8). The function substitute performs substitution at the node S with all initial trees that can be substituted at S ((3) in Figure 27.8).
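A single expansion step might look as follows in Python; the tree encoding and helper names are hypothetical, and only one leaf and one candidate set are handled (the full procedure iterates this over all qualifying branchings):

```python
# One expansion step, sketched with hypothetical helpers: substitute every
# matching candidate initial tree for the first leaf labelled `label`,
# marking the substituted node as a breaking point.

def replace_first_leaf(node, label, replacement, done=None):
    """Return a copy of node with the first matching leaf replaced."""
    done = done if done is not None else [False]
    children = node.get("children")
    if not children:
        if not done[0] and node["label"] == label:
            done[0] = True
            return replacement
        return dict(node)
    return {"label": node["label"],
            "children": [replace_first_leaf(c, label, replacement, done)
                         for c in children]}

def substitute_leaf(tree, label, initial_trees):
    """One expanded tree per candidate initial tree whose root matches."""
    expanded = []
    for it in initial_trees:
        if it["label"] == label:
            marked = dict(it, breaking_point=True)   # remember the origin
            expanded.append(replace_first_leaf(tree, label, marked))
    return expanded
```

The breaking-point flag is what later lets the combined tree be read back as two trees in the LTAG derivation.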

[4] We should mention that the path from the foot node to the root node (the spine) in an auxiliary tree must not be cut, because the spine represents the chain of head signs between the root node and the foot node, which are unified with the same internal node in the other elementary trees.


We should consider the termination of this algorithm. Since the function initial returns all initial trees whose root node is the same as the leaf node ((2) in Figure 27.8), the initial trees might include a tree violating Condition 2, to which this algorithm should also be applied. This seems to cause an infinite recursion of applications of this algorithm, but it can be avoided by removing non-anchored subtrees from such an initial tree before applying this algorithm.

3.5 Strong equivalence

Our algorithm guarantees the strong equivalence of the original and obtained grammars. In the obtained grammar, the above grammar rules are applied only to feature structures corresponding to nodes which can substitute/be adjoined in the LTAG canonical elementary trees, because encoded branchings in Arg specify the nodes to be subcategorized next. The strong equivalence holds also for the conversion of non-canonical elementary trees. For trees violating Condition 1, we can distinguish the cut-off nodes from the substitution nodes owing to identifiers, which recover the co-occurrence relation in the original elementary trees between the divided trees. For trees violating Condition 2, we can identify substitution nodes in a combined tree because they are marked as breaking points, and we can consider the combined tree as two trees in the LTAG derivation.

By following a history of rule applications and mapping each of them to substitution or adjunction, we can recover an LTAG derivation tree from an HPSG parse tree. First, we find the syntactic head of a parse tree by following a trunk node when the substitution rule was applied, or a foot node when the adjunction rule was applied. We then follow the path from the syntactic head to the root node. If we find an application of the substitution rule, we can readily map it to LTAG substitution. If we find an application of the adjunction rule, the node takes adjunction. We remember the length of the Arg feature of the node to identify the adjoining tree. If the length of the Arg feature of a trunk node is longer than the one we remembered, we are in the adjoining tree. When the length becomes equal to the one we remembered, the construction of the adjoining tree reaches its root node and the construction of the adjoined tree restarts. Thus we can identify the adjoining tree. Let us consider the case in Figure 27.6. First we find that "love" is a syntactic head of the parse tree. We then follow the path from the anchor to the root node of α1. Since we first find an application of the substitution rule, we can map it to the substitution of α3 to α1. Then, the next rule is the adjunction rule (marked with ?), and we find that the node A takes adjunction. We thus remember the length of the Arg feature of the node A, and follow the trunk until the length of the Arg feature is equal to that of the node A. At the node C, the length of the Arg feature is equal to the node A. This implies that the construction of the adjoining


tree β1 is completed at the node C.
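The length bookkeeping in this walk, from a node taking adjunction (such as A in Figure 27.6) up to the node where the lengths coincide again (such as C), can be sketched as a small helper; the list-of-lengths encoding of the trunk is our own simplification:

```python
# Sketch: walking up the trunk from an adjunction site, nodes belong to
# the adjoining tree while the Arg list is longer than the remembered
# length (encoding: arg_lengths[i] = |Arg| at the i-th trunk node).

def adjoining_span(arg_lengths, start):
    """Return the index of the trunk node at which the adjoining tree is
    completed and the adjoined tree's construction restarts."""
    remembered = arg_lengths[start]   # |Arg| at the node taking adjunction
    i = start + 1
    while i < len(arg_lengths) and arg_lengths[i] > remembered:
        i += 1                        # still inside the adjoining tree
    return i
```

Everything strictly between the two indices is attributed to the adjoining tree when reading the derivation back off the parse tree.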

3.6 Extension to FB-LTAG

The above algorithm gives the conversion of LTAG, and it can easily be extended to handle an FB-LTAG grammar by merely storing the feature structure of each node in the Sym feature and Leaf feature together with the non-terminal symbol. ID grammar rules execute the feature structure unification done in LTAG substitution and adjunction.

3.7 Remaining issue

The above algorithm gives a formal link between HPSG and LTAG, but the linguistic aspects are not well discussed. In this section, we discuss the linguistic aspects according to the notion of the syntactic head, which is a central notion in HPSG.

Distinction between predicative/modifying auxiliary trees  We should mention that there are two kinds of auxiliary trees in LTAG (Kroch 1989; Schabes and Shieber 1994). One is a predicative auxiliary tree, and the other is a modifier auxiliary tree. The former introduces a predicate that subcategorizes for a phrase of the category of its foot node and assigns a thematic role to the phrase, while the latter introduces a modifying, complement or dislocated phrase. This distinction between the two kinds of auxiliary trees is given roughly by determining which daughter is the head, a foot node or a trunk node. Tateisi et al. (1998) distinguished these trees by manually analyzing feature percolation in auxiliary trees and by assigning HPSG rule schemata separately to them. In this paper, we consider all trunk nodes to be heads, but this can be manually or semi-automatically determined by giving some linguistic cues or by analyzing feature percolation.

Head selection  It is not entirely clear how we should select the anchor A in Figure 27.7. Since most multi-anchored trees represent compound expressions or idioms, such as "in front of" and "kick the bucket", this problem can be recast as the problem of which word of a phrase should be the syntactic/semantic head. In the current implementation, we simply select the leftmost anchor, though we could select one by using the idea of projection or some other linguistic approach. There is room for argument on how we should select the leaf node S to be substituted in Figure 27.8. Since elementary trees with non-anchored subtrees represent constructions requiring a specification beyond immediate dominance, such as it-clefts and equative be, this problem can be rephrased as the problem


Table 27.1: The classification of elementary trees in the XTAG English grammar (LTAG) and converted lexical entries corresponding to them (HPSG): A: canonical elementary trees, B: elementary trees violating only Condition 1, C: elementary trees violating only Condition 2, D: elementary trees violating both conditions

Grammar A B C D Total

LTAG 326 764 54 50 1,194

HPSG 326 1,992 1,083 2,474 5,875

of which leaf node takes the most important syntactic role in the constructions and should be expanded for the HPSG analysis. We can solve this problem by using the same idea as mentioned above, but for now we simply select the leftmost leaf node.

4 Experiments

The algorithm was applied to the latest version of the XTAG English grammar (The XTAG Research Group 2001)[5], which is a large-scale FB-LTAG grammar for English. We successfully converted all the elementary trees[6] in the XTAG English grammar to HPSG lexical entries. Table 27.1 shows the classification of elementary trees of the XTAG English grammar according to the conditions we introduced, and also shows the number of corresponding HPSG lexical entries.

We acquired exactly the same number of derivation trees by using the original and the obtained grammars in a parsing experiment with 457 sentences from the ATIS corpus (Marcus et al. 1994)[7] (the average length is 6.32 words). This result empirically attests the strong equivalence of our algorithm. The experimental result also shows that an efficient HPSG parser (Torisawa et al. 2000) achieves a drastic speed-up over the LTAG parser (Sarkar 2000). The speed-up is due not only to the careful implementation but also to the difference of the parsing schemes in HPSG and LTAG parsing (Yoshinaga et al. 2001).

[5] We used the grammar attached to the latest distribution of an LTAG parser, which we used for the parsing experiment. The parser is available at: ftp://ftp.cis.upenn.edu/pub/xtag/lem/lem-0.13.0.i686.tgz

[6] Elementary trees should in fact be denoted as elementary tree templates. That is, elementary trees are abstracted from lexicalized trees, and one elementary tree template is defined for one syntactic construction, which is assigned to a number of words.

[7] We eliminated 59 sentences because of a time-out of the parsers, and 61 sentences because the LTAG parser does not produce correct derivation trees because of bugs in its preprocessor.


Table 27.2: The classification of the elementary trees violating the conditions in Table 27.1: multi-anchored ones (corresponding to B) (left), and ones with non-anchored subtrees (corresponding to C ∪ D) (right)

Construction           # of trees
Compound expressions   414
Verb with PP           194
Idioms                 140
Others                 16
Total                  764

Construction           # of trees
Verb with PP           85
It-cleft               12
Others                 7
Total                  104

The left-hand side of Table 27.2 shows how multi-anchored elementary trees are used in the XTAG English grammar. The table shows they are mainly used for compound expressions or idioms. Although such expressions seem to be difficult to explain in the HPSG formalism, the obtained grammar can handle them with a proper size of lexical entries by dividing them into multiple lexical entries. Another case for multi-anchored trees is verb with PP. The obtained grammar expresses this construction by cut-off nodes to require specified subtrees. On the other hand, in linguistic specifications of HPSG, such a constraint is expressed by the FORM feature, which requires the type of a phrase. This analysis seems to be consistent with the obtained grammar, i.e., the LTAG analysis.

The right-hand side of Table 27.2 shows the usage of elementary trees with non-anchored subtrees. These elementary trees express constructions requiring specifications beyond immediate dominance. As we can see in Table 27.1, these trees are expanded to a quite large number of lexical entries. This result suggests that these constructions might be difficult to handle in the HPSG formalism. The major case of such constructions is verb with PP, which causes a Wh-extraction of NP from PP. Since this construction is explained by using the SLASH feature in HPSG, the difference between the formalisms appears here. Another case is it-cleft, which is not obvious to handle in HPSG, and it might be worth discussing.

5 Conclusion

We proposed a grammar conversion from an FB-LTAG grammar into an HPSG-style grammar. The grammar conversion guarantees strong equivalence, and hence we can obtain parsing results of an LTAG grammar from those of the obtained HPSG-style grammar. Thus, our grammar conversion enables us to share LTAG resources in the HPSG community, to apply HPSG parsing techniques to LTAG grammars, and to clarify the differences of linguistic analysis between the two grammar formalisms. We implemented this algorithm, and successfully converted the latest version of the XTAG English grammar. Although we did not give a formal proof of strong equivalence, it should be possible owing to the formulation of our conversion algorithm.

Acknowledgment  The authors wish to thank Mr. Anoop Sarkar for his help in using his parser in our experiment. The authors are also indebted to two anonymous reviewers for their valuable comments on this paper.

References

Abeillé, A. and M.-H. Candito (2000). Evolution of the XTAG system. In A. Abeillé and O. Rambow (Eds.), Tree Adjoining Grammars: Formal, Computational and Linguistic Aspects. CSLI Publications.

Becker, T. and P. Lopez (2000). Adapting HPSG-to-TAG compilation to wide-coverage grammars. In Proceedings of the 5th International Workshop on Tree Adjoining Grammars and Related Frameworks (TAG+5), pp. 47–54.

Carpenter, B. (1992). The Logic of Typed Feature Structures. Cambridge University Press.

Doran, C., B. A. Hockey, A. Sarkar, B. Srinivas, and F. Xia (2000). FTAG: A Lexicalized Tree Adjoining Grammar for French. In A. Abeillé and O. Rambow (Eds.), Tree Adjoining Grammars: Formal, Computational and Linguistic Aspects. CSLI Publications.

Flickinger, D., S. Oepen, and J. Tsujii (Eds.) (2000). Natural Language Engineering – Special Issue on Efficient Processing with HPSG: Methods, Systems, Evaluation. Cambridge University Press.

Kasper, R., B. Kiefer, K. Netter, and K. Vijay-Shanker (1995). Compilation of HPSG to TAG. In Proceedings of the 33rd Meeting of the Association for Computational Linguistics (ACL '95), pp. 92–99.

Kay, M., J. Gawron, and P. Norvig (1994). Verbmobil: A Translation System for Face-to-Face Dialog. CSLI Publications.

Kroch, A. (1989). Asymmetries in long-distance extraction in a Tree-Adjoining Grammar. In M. Baltin and A. Kroch (Eds.), Alternative Conceptions of Phrase Structure. University of Chicago Press.

Marcus, M., B. Santorini, and M. A. Marcinkiewicz (1994). Building a large annotated corpus of English: the Penn Treebank. Computational Linguistics 19(2), 313–330.

Pollard, C. and I. A. Sag (1994). Head-Driven Phrase Structure Grammar. University of Chicago Press and CSLI Publications.

Sarkar, A. (2000). Practical experiments in parsing using Tree Adjoining Grammars. In Proceedings of the 5th International Workshop on Tree Adjoining Grammars and Related Frameworks (TAG+5), pp. 193–198.

Schabes, Y., A. Abeillé, and A. K. Joshi (1988). Parsing strategies with 'lexicalized' grammars: Application to Tree Adjoining Grammars. In Proceedings of the 12th International Conference on Computational Linguistics (COLING '88), pp. 578–583.

Schabes, Y. and S. M. Shieber (1994). An alternative conception of tree-adjoining derivation. Computational Linguistics 20(1), 91–124.

Tateisi, Y., K. Torisawa, Y. Miyao, and J. Tsujii (1998). Translating the XTAG English grammar to HPSG. In Proceedings of the 4th International Workshop on Tree Adjoining Grammars and Related Frameworks (TAG+4), pp. 172–175.

The XTAG Research Group (2001). A Lexicalized Tree Adjoining Grammar for English. http://www.cis.upenn.edu/~xtag/.

Torisawa, K., K. Nishida, Y. Miyao, and J. Tsujii (2000). An HPSG parser with CFG filtering. Natural Language Engineering – Special Issue on Efficient Processing with HPSG: Methods, Systems, Evaluation 6(1), 63–80.

Vijay-Shanker, K. (1987). A Study of Tree Adjoining Grammars. Ph.D. thesis, Department of Computer & Information Science, University of Pennsylvania.

Vijay-Shanker, K. and A. K. Joshi (1988). Feature structures based Tree Adjoining Grammars. In Proceedings of the 12th International Conference on Computational Linguistics (COLING '88), pp. 714–719.

Yoshinaga, N., Y. Miyao, K. Torisawa, and J. Tsujii (2001). Efficient LTAG parsing using HPSG parsers. In Proceedings of the Pacific Association for Computational Linguistics (PACLING 2001). To appear.


Infinitary Expressibility of Necessity in

Terms of Contingency

Evgeni Zolin

Department of Mathematical Logic

Faculty of Mathematics and Mechanics

Moscow State University, 119899 Moscow, Russia

[email protected]

Abstract. This paper consists of two parts. In the first part we present an axiomatization of the "epistemic" modal logic KD45 in the language with the non-contingency operator as the sole modal primitive symbol. The second part, having a certain philosophical flavour, is devoted to the old question about the possibility of defining the necessity operator in terms of the contingency operator. Here we give a new, positive answer to this question by constructing an infinitary operator defined in terms of contingency which behaves like some necessity.

1 Introduction

The non-contingency operator B is defined in terms of the necessity operator □ by putting BA := □A ∨ □¬A. A natural question arises here: is necessity expressible in terms of non-contingency? The answer depends upon the understanding of the notion of expressibility.

In general, a notion α is said to be definable (or expressible) in terms of a notion β if there exists an expression A = A(β) containing β such that A is equal (or equivalent) to α. In our case this means that □ would be definable in terms of B if there exists a formula φ(p) such that all occurrences of □ in φ are in contexts of the form B and the equivalence □p ↔ φ(p) is valid. From this point of view, □ is not definable in terms of B (cf. (3; 4)).

However, this understanding is rather confined, in the author's opinion, and it is more appropriate to say that A behaves like □, or that A is subject to the same laws as □. This approach proves successful in giving a new, positive answer to the above question. In this paper we construct an operator ⊞ (by giving its infinitary definition in terms of B) which behaves like some necessity. To be more exact, we show that, for any normal modal logic L (describing the behaviour of □), the corresponding logic describing the behaviour of ⊞ is normal and, for some L, it even contains L (up to

Proceedings of the Sixth ESSLLI Student Session. Kristina Striegnitz (editor). Chapter 28. Copyright © 2001, Evgeni Zolin.


replacement of □ by ⊞). The operator ⊞ plays an important role in the proof of the completeness theorem for the non-contingency logic of KD45.

2 Preliminaries

The propositional modal language consists of a denumerable set of variables Var = {p0, p1, ...}, symbols for falsehood ⊥ and implication →, and a unary modal operator □. Other connectives are taken as standard abbreviations. The set of formulas of this language, Fm□, is defined as usual; in particular, if A is a formula then so is □A. This language will be referred to as a □-language and its formulas as □-formulas. A B-language and the set FmB of B-formulas are defined similarly, just by replacing the symbol □ by B. Fix a translation tr : FmB → Fm□ which respects boolean connectives and sets tr(BA) := □tr(A) ∨ □¬tr(A).

A (Kripke) frame is a structure ⟨W, ↑⟩, where W is a nonempty set of "worlds" and ↑ is a binary "accessibility" relation on W. By ↓ we denote the converse relation of ↑. Quantification over the worlds accessible from a given world w ∈ W will be written as ∀x↓w and ∃x↓w. A model M = ⟨F, ⊨⟩ consists of a frame F and a valuation ⊨ ⊆ (W × Var). The notion "A is true in M at w" (written M, w ⊨ A, with M usually omitted) is defined for both □- and B-formulas in the standard way; the modal clauses are as follows:

  w ⊨ □A  ⇔  ∀x↓w  x ⊨ A;
  w ⊨ BA  ⇔  (∀x↓w  x ⊨ A)  or  (∀x↓w  x ⊭ A).

Obviously, w ⊨ A ⇔ w ⊨ tr(A), for any B-formula A. A formula A is valid in a frame F (F ⊨ A, in symbols) if A is true at every world in every model based on F. If Σ is a set of formulas, then a Σ-frame is a frame validating Σ. A logic L is called complete w.r.t. a class of frames 𝓕 if, for any formula A, L ⊢ A ⇔ 𝓕 ⊨ A.
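The truth clauses above are easy to mechanize. The following Python sketch (the tuple encoding of formulas and all names are ours, purely illustrative) evaluates □- and B-formulas on a finite model and lets one spot-check w ⊨ A ⇔ w ⊨ tr(A):

```python
# A small model checker over a finite Kripke model (our illustration).
# Formulas are tuples: ("var", p), ("not", A), ("or", A, B),
# ("box", A), ("B", A) -- "B" is the non-contingency operator.

def tr(f):
    """tr respects the booleans and sends B A to (box tr(A)) or (box not tr(A))."""
    if f[0] == "var":
        return f
    if f[0] == "B":
        a = tr(f[1])
        return ("or", ("box", a), ("box", ("not", a)))
    return (f[0],) + tuple(tr(x) for x in f[1:])

def holds(R, V, w, f):
    """R: successor dict (worlds accessible from w), V: world -> true variables."""
    op = f[0]
    if op == "var":
        return f[1] in V[w]
    if op == "not":
        return not holds(R, V, w, f[1])
    if op == "or":
        return holds(R, V, w, f[1]) or holds(R, V, w, f[2])
    if op == "box":
        return all(holds(R, V, x, f[1]) for x in R[w])
    if op == "B":                    # all accessible worlds agree on f[1]
        return len({holds(R, V, x, f[1]) for x in R[w]}) <= 1
    raise ValueError(op)
```

Note that at a "blind" world with no successors both BA and its translation hold vacuously, matching the clauses.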

The minimal normal modal logic K has the following axioms and the inference rules of modus ponens, substitution, and necessitation:

  (A□⊤)  All classical tautologies in the □-language
  (A□K)  □(p → q) → (□p → □q)   (distributivity)

  (MP)  from A and A → B, infer B
  (Sub) from A, infer A[B/p]
  (Nec) from A, infer □A

The systems KΣ, Σ ⊆ {D, 4, 5}, are obtained by adding to K the axioms (A□S), S ∈ Σ, listed below (the class of frames characterized by an axiom (A□S) is first-order definable, by a formula also shown below).

  (A□D)  □p → ◇p     ∀w ∃x  w↑x           (seriality)
  (A□4)  □p → □□p    ∀w ∀x↓w ∀y↓x  w↑y    (transitivity)
  (A□5)  ◇p → □◇p    ∀w ∀x↓w ∀y↓w  x↑y    (euclideanness)
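For finite frames given as successor dicts, the three first-order conditions in the table can be checked directly; these helpers are our own illustration, not part of the paper:

```python
# The first-order frame conditions above, as executable checks on a
# finite frame R (a dict mapping each world to its set of successors).

def serial(R):        # forall w exists x: w -> x
    return all(R[w] for w in R)

def transitive(R):    # w -> x and x -> y imply w -> y
    return all(y in R[w] for w in R for x in R[w] for y in R[x])

def euclidean(R):     # w -> x and w -> y imply x -> y
    return all(y in R[x] for w in R for x in R[w] for y in R[w])
```

Such checks make it easy to build small Σ-frames for counterexample hunting.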


The logic KD45 is known to capture the principles of reasoning involving epistemic judgments: the postulates of this logic are valid under the (informal) interpretation of a sentence of the form □A as "A is known (to some idealized person)". In this context, the non-contingency assertion BA means "the truth value of A is known".

We shall consider only normal modal □-logics, i.e., sets of □-formulas containing the axioms of K and closed under the rules of K. Given a logic L, a non-contingency logic of L (a B-logic of L, for short), denoted by LB, is the set of all B-formulas whose translations are theorems of L:

  LB = {A ∈ FmB | tr(A) ∈ L} = tr⁻¹(L).

Montgomery and Routley (5; 6; 7) axiomatized the non-contingency logics of T, S4, and S5. It is worth noting that if L contains T, or more specifically the reflexivity scheme □A → A, necessity is definable in terms of non-contingency (B-definable, for short) by □A = A & BA. In the logic Ver, the same effect is observed: it proves, for any A, the formula □A ↔ ⊤, which can be viewed as a B-definition of □. Cresswell (2) provides an example of a logic H such that H ⊉ T, H ≠ Ver, but □ is B-definable in H.

A systematic study of non-contingency logic (in particular, of the cases when □ is B-undefinable) was initiated by Humberstone. In his paper (3), a (rather complicated) system axiomatizing the non-contingency logic of K was presented. Kuhn (4) succeeded in simplifying this system and proposed a finite axiomatization of the non-contingency logic of K4.

3 Axiomatizations of non-contingency logics

Now we formulate our systems for the B-logics of the logics L described above. For notational simplicity, we denote the systems by LB; Theorem 3.1 below justifies the notation. The logic KB has the rules (MP) and (Sub) as well as the following axioms and the "noncontingentization" rule (cf. (3)):

  (AB⊤)  All classical tautologies in the B-language
  (ABK)  B(p ↔ q) → (Bp ↔ Bq)        (equivalence)
  (AB¬)  Bp ↔ B¬p                     (mirror axiom)
  (AB∨)  Bp → B(q → p) ∨ B(p → r)    (dichotomy)

  (NCR)  from A, infer BA

To obtain the system KΣB, Σ ⊆ {D, 4, 5}, we add to KB the corresponding axioms (note that no axiom corresponds to seriality):

  (AB4)  Bp → B(q → Bp)      (weak transitivity)
  (AB5)  ¬Bp → B(q → ¬Bp)    (weak euclideanness)

The classes of frames characterized by these two axioms strictly contain the classes of transitive (resp. euclidean) frames as well as the class of functional frames (where each world "sees" at most one world); hence their names.


The systems for KB and K4B proposed by Kuhn (4) differ from ours: instead of the axiom (ABK), his systems contain the rule of equivalent replacement (from A ↔ B, infer BA ↔ BB) as well as an additional axiom Bp & Bq → B(p & q). Our systems are similar to the standard axiomatizations of normal logics.

The main result of this section is formulated in Theorem 3.1, stating that the systems KΣB axiomatize the B-logics of KΣ, Σ ⊆ {4, 5} (logics containing the seriality axiom will be considered at the end of the section).

Theorem 3.1 (Completeness). For any Σ ⊆ {4, 5} and a B-formula A, the following statements are equivalent:

  (1) KΣB ⊢ A;
  (2) KΣ ⊢ tr(A);
  (3) A is valid in all KΣ-frames.

Proof. We follow the scheme (1) ⇒ (2) ⇔ (3) ⇒ (1). The equivalence (2) ⇔ (3) is the well-known (cf. (1)) completeness of KΣ w.r.t. KΣ-frames. In the rest of the proof we refer to B-formulas as just formulas.

(1) ⇒ (2). The axioms of KB are valid in any frame, so the translations thereof are provable in K. For the axiom (AB4), the proof is in (4). We give a sketch of a derivation of (the translation of) the axiom (AB5) in K5:

  K5 ⊢ ¬Bp → ◇p & ◇¬p;  by (A□5), ◇p → □◇p and ◇¬p → □◇¬p,
  hence □◇p & □◇¬p, and thus □¬Bp;  finally,
  □¬Bp → □(q → ¬Bp) → B(q → ¬Bp).

(3) ⇒ (1). We construct the canonical model ML = ⟨WL, ↑, ⊨⟩ for the logic L = KΣB. Its worlds are maximal L-consistent sets of formulas. A valuation is defined in the usual way: w ⊨ p ⇔ p ∈ w, for any world w and variable p. Before defining the relation ↑, we introduce some notation.

For a formula A, denote ⊞A := {B(B → A) | B ∈ FmB}. In the subsequent proof, the symbol ⊞ plays a role similar to that of □ in the standard canonical model argument for □-logics. The difference is in their "types": the operator □ maps a formula to a formula, whereas ⊞ maps a formula to a set of formulas. Note that semantically ⊞ is not equivalent to □, i.e., the truth at a world w of the formula □A is not equivalent to the truth at w of all formulas in the set ⊞A. The next section is devoted to the investigation of the interconnection between the operators □ and ⊞.

Now denote ♯w := {A ∈ FmB | ⊞A ⊆ w}. Finally, put w↑x iff ♯w ⊆ x.

Lemma 3.2. For any world w ∈ WL, the following properties are satisfied:
  1° (Dichotomy) If BA ∈ w, then either A ∈ ♯w or ¬A ∈ ♯w.
  2° The set ♯w is closed under (even empty) conjunction (hence ♯w ≠ ∅).
  3° The set ♯w is closed under derivability in L: if A ∈ ♯w and L ⊢ A → B, then B ∈ ♯w.
  4° The dichotomy property is reversible: if A ∈ ♯w, then BA ∈ w.


▹ 1°. Suppose A, ¬A ∉ ♯w. Then, by the definition of ♯w, for some formulas B, C we have ¬B(B → A) ∈ w and ¬B(C → ¬A) ∈ w. However, using the dichotomy axiom, we derive KB ⊢ BA → B(B → A) ∨ B(C → ¬A), hence w is even KB-inconsistent, which contradicts our assumptions.

2°. By definition, the empty conjunction is ⊤. Since KB ⊢ B(B → ⊤) for any formula B, we have ⊞⊤ ⊆ KB ⊆ L ⊆ w and so ⊤ ∈ ♯w.

Now let A, B ∈ ♯w; we prove that (A & B) ∈ ♯w, i.e., B[C → (A & B)] ∈ w for any formula C. From ⊞A ⊆ w and ⊞B ⊆ w it follows that B(C → A) ∈ w and B(C → B) ∈ w. Then we derive:

  KB ⊢ B(C → A) & B(C → B) → B[(C → A) & (C → B)] → B[C → (A & B)].

Since w is closed under conjunction and derivability in KB (and even in L), we conclude that B[C → (A & B)] ∈ w.

3°. To prove that B ∈ ♯w, we take an arbitrary formula C and show that B(C → B) ∈ w. Since ⊞A ⊆ w, we have B[¬(C → B) → A] ∈ w. The assumption L ⊢ A → B truth-functionally implies L ⊢ [¬(C → B) → A] ↔ [C → B], and this finally yields B(C → B) ∈ w.

4°. ⊞A ⊆ w implies B(⊤ → A) ∈ w, which is equivalent to BA ∈ w. ◃

Lemma 3.3 (⊨ = ∈). For any formula A and a world w, w ⊨ A ⇔ A ∈ w.

▹ By induction on A. Consider the only interesting case A = BB.

(⇐)  BB ∈ w
  ⇒ B ∈ ♯w or ¬B ∈ ♯w                      (by dichotomy 1°)
  ⇒ (∀x↓w  B ∈ x) or (∀x↓w  ¬B ∈ x)        (by the definition of ↑)
  ⇒ (∀x↓w  B ∈ x) or (∀x↓w  B ∉ x)         (by the consistency of x)
  ⇒ (∀x↓w  x ⊨ B) or (∀x↓w  x ⊭ B)         (by the induction hypothesis)
  ⇒ w ⊨ BB.

(⇒) Suppose BB ∉ w. Then the sets X = ♯w ∪ {B} and Y = ♯w ∪ {¬B} are L-consistent. For, if Y is not, then L ⊢ (A₁ & … & Aₙ) → B for some formulas A₁, …, Aₙ ∈ ♯w and n > 0. By 2°, (A₁ & … & Aₙ) ∈ ♯w; then B ∈ ♯w by 3° and BB ∈ w by 4°, which is not the case. The argument for X is similar, except for an additional use of the mirror axiom.

Therefore, X and Y are contained in some worlds x and y. Since ♯w ⊆ x and ♯w ⊆ y, we have w↑x and w↑y; by the induction hypothesis, B ∈ x and B ∉ y imply x ⊨ B and y ⊭ B, thus w ⊭ BB. ◃

By this lemma, the canonical model falsifies all the non-theorems of L. To conclude the proof, it remains to check that the canonical frame is a KΣ-frame. The case Σ = ∅ is trivial.

Suppose 4 ∈ Σ; we prove that ↑ is transitive. Let w↑x↑y; we show that w↑y, i.e., ♯w ⊆ y. Take any A ∈ ♯w; then ∇(B → A) ∈ w for every B. By the axiom (A∇4), K4∇ ⊢ ∇(B → A) → ∇[C → ∇(B → A)] for any C. Since w is closed under K4∇-derivability, ∇[C → ∇(B → A)] ∈ w.


Infinitary Expressibility of Necessity in Terms of Contingency

Hence ∇(B → A) ∈ ♯w ⊆ x; since B was arbitrary, A ∈ ♯x ⊆ y, as desired.

Suppose 5 ∈ Σ; we prove that ↑ is euclidean. Let w↑x and w↑y; we show that x↑y, i.e., ♯x ⊆ y. Take any A ∉ y; then A ∉ ♯w, since ♯w ⊆ y, hence ¬∇(B → A) ∈ w for some B. Since w is closed under K5∇-derivability, we apply (A∇5) to obtain ∇[C → ¬∇(B → A)] ∈ w for all C, therefore ¬∇(B → A) ∈ ♯w. By w↑x we conclude ¬∇(B → A) ∈ x; by consistency, ∇(B → A) ∉ x, so A ∉ ♯x, hence the claim. ∎

Now we show, following [3], that adding the axiom (A□D) to some □-logics does not change their ∇-logics. Let F = ⟨W, ↑⟩ be a frame. We denote the set of worlds accessible from w ∈ W by w↑ := {x ∈ W | w↑x}. Turning "blind" worlds into worlds "seeing" only themselves yields a frame F̂ := ⟨W, ⇑⟩, where ⇑ := ↑ ∪ {⟨w, w⟩ | w↑ = ∅}. For a class of frames 𝔽, put 𝔽̂ := {F̂ | F ∈ 𝔽}. In [3] it is noted that F and F̂ validate the same ∇-formulas.

Theorem 3.4. Suppose a □-logic L is complete w.r.t. a class of frames 𝔽, and let LD be the smallest □-logic containing L and (A□D). If 𝔽̂ ⊆ 𝔽 then LD∇ = L∇.

Proof. The inclusion '⊇' is trivial. Now take any A ∈ LD∇; clearly, A ∈ L∇ ⟺ tr(A) ∈ L ⟺ 𝔽 ⊨ tr(A) ⟺ 𝔽 ⊨ A, so it remains to show that F ⊨ A for any frame F ∈ 𝔽. Since 𝔽̂ ⊆ 𝔽, we have F̂ ∈ 𝔽 and so F̂ ⊨ L; besides, F̂ is serial, hence F̂ ⊨ (A□D). Thus F̂ ⊨ LD, whence F̂ ⊨ LD∇ and, in particular, F̂ ⊨ A. By the above, this is equivalent to F ⊨ A. ∎

As a consequence, KDΣ∇ = KΣ∇ for any Σ ⊆ {4, 5}, since the transitivity and euclideanness properties are preserved as we pass from F to F̂. For the case Σ = ∅ the result was obtained in [3].
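The observation that F and F̂ validate the same ∇-formulas can be spot-checked mechanically. The following sketch is ours, not the paper's (the helper names `holds` and `hat` are invented): it compares the truth of a few sample ∇-formulas, world by world, over every two-world frame and valuation. A blind world makes every ∇-subformula vacuously true, and the reflexive loop added in F̂ keeps it true, so nothing changes.

```python
from itertools import product

def holds(W, R, V, w, f):
    """Kripke truth at world w; formulas are nested tuples:
    ('var', p) | ('not', A) | ('imp', A, B) | ('nabla', A)."""
    op = f[0]
    if op == 'var':
        return w in V[f[1]]
    if op == 'not':
        return not holds(W, R, V, w, f[1])
    if op == 'imp':
        return (not holds(W, R, V, w, f[1])) or holds(W, R, V, w, f[2])
    # 'nabla' = non-contingency: all successors agree on f[1]
    assert op == 'nabla'
    vals = [holds(W, R, V, x, f[1]) for x in W if (w, x) in R]
    return all(vals) or not any(vals)

def hat(W, R):
    """Give every 'blind' world (one with no successors) a reflexive loop."""
    blind = {w for w in W if not any((w, x) in R for x in W)}
    return R | {(w, w) for w in blind}

p = ('var', 'p')
samples = [('nabla', p),
           ('nabla', ('nabla', p)),
           ('nabla', ('imp', p, ('nabla', p))),
           ('imp', ('nabla', p), ('nabla', ('nabla', p)))]

W = (0, 1)
pairs = [(a, b) for a in W for b in W]
agree = True
for bits in product([0, 1], repeat=4):        # every relation on two worlds
    R = {pr for pr, b in zip(pairs, bits) if b}
    for val in product([0, 1], repeat=2):     # every valuation of p
        V = {'p': {w for w, b in zip(W, val) if b}}
        for f in samples:
            for w in W:
                if holds(W, R, V, w, f) != holds(W, hat(W, R), V, w, f):
                    agree = False
print(agree)  # -> True
```

In fact the check confirms a pointwise statement (truth at each world is unchanged), which is stronger than the mere equality of the sets of validated formulas.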

4 Infinitary operator

Roughly speaking, the following "infinitary operator" occurs in the proof of Theorem 3.1 (we replace a set of formulas by a conjunction thereof):

■A := ⋀_{B ∈ Fm∇} ∇(B → A).

From this equality one can read off a natural Kripke semantics for the operator ■. The question arises immediately: which modal principles are valid for this operator? Surprisingly enough, the operator ■ obeys the laws of some normal modal logic (which, of course, depends on the normal logic describing the behaviour of the initial necessity operator □).

To put it more precisely, consider the infinitary ∇-language containing the set of variables Var as above, negation ¬, infinitary conjunction ⋀, and a unary modal operator ∇. The set of formulas, Fm∇∞, is defined by induction: every variable pᵢ is a formula; if A is a formula then so are ¬A and


∇A; if Γ is a finite or countable set of formulas then ⋀Γ is a formula. Other connectives can be introduced as usual, e.g., (A → B) ≡ ¬⋀{A, ¬B}; therefore we can assume that Fm∇ ⊆ Fm∇∞. Kripke semantics for this language is defined in the obvious way.

Further, we introduce a ■-language, obtained from the □-language by replacing the symbol □ by ■. Finally, we define a translation Tr : Fm□ → Fm∇∞ which respects the boolean connectives and has the following inductive clause:

Tr(□A) := ⋀_{B ∈ Fm∇} ∇(B → Tr(A)).

This translation induces a semantics for the ■-language: F ⊨ A :⟺ F ⊨ Tr(A), for any ■-formula A. One can even define semantics for "mixed" formulas containing □, ∇, and ■. Note that the implication □A → ■A is valid in any frame, whereas the converse one is not.

Now, given a □-logic L, we define the ■-logic of L as the set of all ■-formulas valid in every L-frame:

L■ := {A ∈ Fm■ | for any frame F, F ⊨ L ⟹ F ⊨ A}.

It is easily seen, for example, that Ver■ = Ver (from now on, we understand such equalities, as well as inclusions, up to the replacement of □ by ■).

Theorem 4.1. If L is a normal □-logic then L■ is a normal ■-logic.

Proof. Since L■ is clearly closed under the rules of K, we only need to verify that ■(p → q) → (■p → ■q) is valid. Assume the contrary, i.e., that there exist a model M and a world w in it such that

w ⊨ ■(p → q), w ⊨ ■p, w ⊭ ■q.

The latter implies that w ⊭ ∇(A → q) for some ∇-formula A, and so

∃x (w↑x): x ⊨ A → q;  ∃y (w↑y): y ⊨ A, ¬q.

By our assumptions, w ⊨ ∇(p → q) and w ⊨ ∇p. Hence w ⊨ ∇[(p → q) ∧ p], or equivalently, w ⊨ ∇(p ∧ q). But the case w ⊨ □(p ∧ q) is impossible, for y ⊭ p ∧ q; therefore we have w ⊨ □(¬p ∨ ¬q). Now consider two cases:

1) x ⊨ q. Then x ⊭ p, since x ⊨ ¬p ∨ ¬q by the above. Using w ⊨ ∇p, we conclude y ⊭ p. This yields a contradiction: on the one hand, w ⊨ ∇(q → p), since w ⊨ ■p; on the other hand, x ⊭ q → p and y ⊨ q → p.

2) x ⊭ q. Then x ⊭ A, for x ⊨ A → q. Since w ⊨ ∇p, there are two subcases:

2a) x ⊨ p and y ⊨ p. Then from w ⊨ ∇[A → (p → q)] it follows that:
• either w ⊨ □[A → (p → q)], which is not the case, for y ⊨ A, p, ¬q;
• or w ⊨ □¬[A → (p → q)], so w ⊨ □A, in contradiction with x ⊭ A.


2b) x ⊭ p and y ⊭ p. Then from w ⊨ ∇[A → p] it follows that:
• either w ⊨ □[A → p], which is not the case, for y ⊨ A, ¬p;
• or w ⊨ □¬[A → p], hence w ⊨ □A, in contradiction with x ⊭ A. ∎
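Theorem 4.1 quantifies over all formulas B, so ■ is not directly computable; but for the literal-based version of ■ adopted further below, the normality axiom becomes finitely checkable, and by the paper's remark the theorem survives that restriction. A sketch of an exhaustive check over all two-world models (the helper `holds` and the tuple encoding of formulas are our own invention):

```python
from itertools import product

def holds(W, R, V, w, f):
    """('var',p) | ('not',A) | ('imp',A,B) | ('nabla',A) | ('ibox',A)."""
    op = f[0]
    if op == 'var':
        return w in V[f[1]]
    if op == 'not':
        return not holds(W, R, V, w, f[1])
    if op == 'imp':
        return (not holds(W, R, V, w, f[1])) or holds(W, R, V, w, f[2])
    if op == 'ibox':
        # literal-based infinitary box: conjunction of nabla(l -> A) over all literals
        lits = [('var', q) for q in V] + [('not', ('var', q)) for q in V]
        return all(holds(W, R, V, w, ('nabla', ('imp', l, f[1]))) for l in lits)
    # 'nabla' = non-contingency: all successors agree on f[1]
    assert op == 'nabla'
    vals = [holds(W, R, V, x, f[1]) for x in W if (w, x) in R]
    return all(vals) or not any(vals)

p, q = ('var', 'p'), ('var', 'q')
k_axiom = ('imp', ('ibox', ('imp', p, q)),
                  ('imp', ('ibox', p), ('ibox', q)))

W = (0, 1)
pairs = [(a, b) for a in W for b in W]
ok = True
for bits in product([0, 1], repeat=4):            # every relation on two worlds
    R = {pr for pr, b in zip(pairs, bits) if b}
    for vp in product([0, 1], repeat=2):          # every valuation of p and q
        for vq in product([0, 1], repeat=2):
            V = {'p': {w for w, b in zip(W, vp) if b},
                 'q': {w for w, b in zip(W, vq) if b}}
            ok = ok and all(holds(W, R, V, w, k_axiom) for w in W)
print(ok)  # -> True
```

Note that the check silently also confirms the step ∇A ∧ ∇B → ∇(A ∧ B) used in the proof: agreement of the successors on each conjunct forces agreement on the conjunction.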

This result implies that the infinitary operator ■, defined in terms of non-contingency, behaves like some necessity, possibly different from the initial one.

Theorem 4.2. For any Σ ⊆ {4, 5}, we have KΣ■ ⊇ KΣ.

Proof. For Σ = ∅ the statement follows from Theorem 4.1.

4 ∈ Σ. We shall prove that ■p → ■■p is valid on any transitive frame. Assume that, for a world w of some transitive model, we have w ⊨ ■p and w ⊭ ■■p. This means that w ⊭ ∇(A → ■p) for some ∇-formula A, i.e.,

∃x (w↑x): x ⊨ A → ■p;  ∃y (w↑y): y ⊨ A, ¬■p.

The latter, in turn, implies the existence of a ∇-formula B such that

∃s (y↑s): s ⊨ B → p;  ∃t (y↑t): t ⊭ B → p.

By transitivity, w↑s and w↑t, so w ⊭ ∇(B → p), in contradiction with w ⊨ ■p.

5 ∈ Σ. We show that ¬■p → ■¬■p is true at any world w of any euclidean model. Let w ⊨ ¬■p, i.e., w ⊨ ¬∇(A → p) for some ∇-formula A. By the axiom (A∇5), we conclude that w ⊨ ∇[B → ¬∇(A → p)] for any ∇-formula B, i.e., w ⊨ ■¬∇(A → p). Since the implication ¬∇(A → p) → ¬■p is valid in any frame, the formula ■¬∇(A → p) → ■¬■p is valid too (we can use the monotonicity principle "from φ → ψ infer ■φ → ■ψ", for ■ is a normal modal operator). Thus w ⊨ ■¬■p. ∎

This theorem cannot be generalized to all logics. A counterexample is KB■ ⊉ KB, where KB = K + (A□B) and (A□B) is the symmetry axiom p → □◇p. One can easily construct a finite symmetric frame falsifying the formula p → ■¬■¬p.

It is worth noting that all the previous reasoning remains valid if, in the definition of ■, the infinitary conjunction is taken only over the set of literals Lit := {p, ¬p | p ∈ Var}. So, in what follows, we assume that ■ is defined as

■A := ⋀_{ℓ ∈ Lit} ∇(ℓ → A).

(In fact, this new operator ■ is not semantically equivalent to the previous one, as can easily be shown; however, the results obtained above remain true under the new definition of ■ as well.) Recall that, starting from □, we have defined the operator ∇ and then the operator ■. What if we iterate the procedure? Schematically, the next iteration looks like this:

∇′A := ■A ∨ ■¬A,  ■′A := ⋀_{ℓ ∈ Lit} ∇′(ℓ → A).

Fortunately, this iteration of the construction is redundant.
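With ■ defined via literals, its semantics is directly computable on finite models. The sketch below (again with invented helper names, not the paper's) checks three facts mechanically: □p → ■p holds throughout all two-world models; the converse fails at a world with a single successor; and a complete symmetric frame on three worlds falsifies p → ■¬■¬p, witnessing the counterexample for KB mentioned above.

```python
from itertools import product

def holds(W, R, V, w, f):
    """('var',p) | ('not',A) | ('imp',A,B) | ('box',A) | ('nabla',A) | ('ibox',A)."""
    op = f[0]
    if op == 'var':
        return w in V[f[1]]
    if op == 'not':
        return not holds(W, R, V, w, f[1])
    if op == 'imp':
        return (not holds(W, R, V, w, f[1])) or holds(W, R, V, w, f[2])
    if op == 'ibox':
        # literal-based infinitary box: conjunction of nabla(l -> A) over all literals
        lits = [('var', q) for q in V] + [('not', ('var', q)) for q in V]
        return all(holds(W, R, V, w, ('nabla', ('imp', l, f[1]))) for l in lits)
    vals = [holds(W, R, V, x, f[1]) for x in W if (w, x) in R]
    if op == 'box':
        return all(vals)                  # necessity: true at every successor
    assert op == 'nabla'
    return all(vals) or not any(vals)     # non-contingency: successors agree

p = ('var', 'p')

# 1) box p -> ibox p holds in every two-world model.
W2 = (0, 1)
pairs = [(a, b) for a in W2 for b in W2]
direction = all(
    holds(W2, R, {'p': P}, w, ('imp', ('box', p), ('ibox', p)))
    for bits in product([0, 1], repeat=4)
    for R in [{pr for pr, b in zip(pairs, bits) if b}]
    for vs in product([0, 1], repeat=2)
    for P in [{w for w, b in zip(W2, vs) if b}]
    for w in W2)

# 2) The converse fails: a single successor falsifying p gives ibox p without box p.
R1, V1 = {(0, 1)}, {'p': set()}
converse = holds(W2, R1, V1, 0, ('ibox', p)) and not holds(W2, R1, V1, 0, ('box', p))

# 3) A finite symmetric frame falsifying p -> ibox(not ibox(not p)):
#    the complete irreflexive graph on three worlds, p true at worlds 0 and 1.
W3 = (0, 1, 2)
R3 = {(a, b) for a in W3 for b in W3 if a != b}
V3 = {'p': {0, 1}}
kb_fails = not holds(W3, R3, V3, 0,
                     ('imp', p, ('ibox', ('not', ('ibox', ('not', p))))))

print(direction, converse, kb_fails)  # -> True True True
```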


Theorem 4.3. The operators ■ and ■′ are semantically equivalent, i.e., the formula ■p ↔ ■′p is valid in any frame. Moreover, we have ⊨ ∇p ↔ ∇′p.

Proof. Validity of the implication ∇p → ∇′p follows from ⊨ □p → ■p. Now, using ⊨ ■p → ∇p, we obtain the converse implication: ⊨ ∇′p → (■p ∨ ■¬p) → (∇p ∨ ∇¬p) → ∇p. ∎
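For the literal-based operators, the second equivalence of Theorem 4.3, ∇p ↔ ∇′p with ∇′p written out as ■p ∨ ■¬p, can likewise be checked exhaustively on small models. A sketch with our own (hypothetical) helper names:

```python
from itertools import product

def holds(W, R, V, w, f):
    """('var',p) | ('not',A) | ('imp',A,B) | ('or',A,B) | ('nabla',A) | ('ibox',A)."""
    op = f[0]
    if op == 'var':
        return w in V[f[1]]
    if op == 'not':
        return not holds(W, R, V, w, f[1])
    if op == 'imp':
        return (not holds(W, R, V, w, f[1])) or holds(W, R, V, w, f[2])
    if op == 'or':
        return holds(W, R, V, w, f[1]) or holds(W, R, V, w, f[2])
    if op == 'ibox':
        # literal-based infinitary box: conjunction of nabla(l -> A) over all literals
        lits = [('var', q) for q in V] + [('not', ('var', q)) for q in V]
        return all(holds(W, R, V, w, ('nabla', ('imp', l, f[1]))) for l in lits)
    # 'nabla' = non-contingency: all successors agree on f[1]
    assert op == 'nabla'
    vals = [holds(W, R, V, x, f[1]) for x in W if (w, x) in R]
    return all(vals) or not any(vals)

p = ('var', 'p')
nabla_prime_p = ('or', ('ibox', p), ('ibox', ('not', p)))   # the iterated operator

W = (0, 1)
pairs = [(a, b) for a in W for b in W]
equivalent = all(
    holds(W, R, {'p': P}, w, ('nabla', p)) == holds(W, R, {'p': P}, w, nabla_prime_p)
    for bits in product([0, 1], repeat=4)      # every relation on two worlds
    for R in [{pr for pr, b in zip(pairs, bits) if b}]
    for vs in product([0, 1], repeat=2)        # every valuation of p
    for P in [{w for w, b in zip(W, vs) if b}]
    for w in W)
print(equivalent)  # -> True
```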

Let us observe the following distinctive feature of the operator ■. We have established that ■ possesses the following two properties: (a) ■ is a normal modal operator; (b) the operator ∇ is ■-definable by the equality ∇A = ■A ∨ ■¬A (since ∇ and ∇′ are equivalent). It turns out that ■ is the weakest modality possessing (a) and (b) simultaneously (by a modality we mean here any unary operator supplied with a Kripke semantics; of course, this is not a formal definition; for example, any modal formula of one variable suits our purposes). Indeed, assume that ▲ is a modality satisfying (a) and (b). To prove that ⊨ ▲p → ■p, take any literal ℓ and put A := (ℓ → p). From (b) it follows that ⊨ (▲A ∨ ▲¬A) → ∇A, hence ⊨ ▲A → ∇A. By normality of ▲, we have ⊨ ▲p → ▲(ℓ → p). Therefore ⊨ ▲p → ∇(ℓ → p) for any ℓ, and hence the claim: ⊨ ▲p → ■p.

Theorem 4.3 immediately implies that the infinitary □- and ■-logics of K are distinct. Formally, denote by L∞ the set of all infinitary □-formulas (defined similarly to infinitary ∇-formulas) that are valid in every L-frame:

L∞ := {A ∈ Fm□∞ | for any frame F, F ⊨ L ⟹ F ⊨ A}.

One can define L■∞ in the same manner. Now observe that K∞ ≠ K■∞, for the logic K■∞ contains the formula ■p ↔ ■′p, or explicitly,

■p ↔ ⋀_{ℓ ∈ Lit} [■(ℓ → p) ∨ ■¬(ℓ → p)],

whereas the logic K∞ does not contain the corresponding infinitary □-formula (since □ and ■ are not equivalent).

5 Conclusion

The aim of this paper was to introduce a new modal operator ■, defined in terms of the non-contingency operator. As we have observed, ■ is a necessity operator (Theorem 4.1) which is similar to the original necessity □ in some respects (Theorem 4.2) and different from it in some others (KB■ ⊉ KB, K∞ ≠ K■∞). The new necessity has several distinctive features (the idempotency of the construction □ ↦ ■, and the fact that ■ is the weakest necessity such that ∇ is ■-definable in a natural manner).

Our main conjecture is that K■ = K. If this is the case, then the construction of ■ may be regarded as a solution of the problem concerning the definability of necessity in terms of contingency. If not, then the logic K■ is


a new modal logic of particular interest, like K, K4, etc. Another interesting issue is the axiomatization of the infinitary ■-logics L■∞ over various modal logics L. This is a rather natural question, for the very definition of ■ is infinitary. Our candidate for K■∞ is K∞[■/□] + {■p ↔ ■′p}. These questions seem to be of both technical and philosophical interest, and the answers may shed new light on the interconnection between necessity and contingency.

References

[1] A. Chagrov, M. Zakharyaschev, Modal Logic, Oxford Science Publications, 1997.

[2] M. J. Cresswell, Necessity and contingency, Studia Logica, 47 (1988), pp. 145–149.

[3] I. L. Humberstone, The logic of non-contingency, Notre Dame Journal of Formal Logic, 36 (1995), no. 2, pp. 214–229.

[4] S. T. Kuhn, Minimal non-contingency logic, Notre Dame Journal of Formal Logic, 36 (1995), no. 2, pp. 230–234.

[5] H. Montgomery, R. Routley, Contingency and non-contingency bases for normal modal logics, Logique et Analyse, 9 (1966), pp. 318–328.

[6] H. Montgomery, R. Routley, Non-contingency axioms for S4 and S5, Logique et Analyse, 11 (1968), pp. 422–424.

[7] H. Montgomery, R. Routley, Modalities in a sequence of normal non-contingency modal systems, Logique et Analyse, 12 (1969), pp. 225–227.
