Computational Semantics, Universitetet i Oslo (folk.uio.no/jtl/Sli360/Komp/CompSem.pdf)



Computational Semantics

An Introduction through Prolog

Jan Tore Lønning

Institutt for Lingvistiske Fag

Universitetet i Oslo

1997


Preface

These notes were originally written for use in the first course in Computational Linguistics which is given as a part of the program in Language, Logic and Information (SLI) at the University of Oslo. I have written them in English because I feel they fill a hole in the current literature and might hence also be of interest to a broader audience.

This is a draft version. Comments are welcome.

The main goal is to fill the semantic part of a first course in computational linguistics where Gazdar and Mellish: Natural Language Processing in Prolog, ch. 1-7, covers the syntactic part. The book by Gazdar and Mellish (G&M) contains a chapter on semantics too, and one chapter on question answering and inference. The approach here differs from the one by G&M in one crucial respect: it exploits the possibilities of Prolog to a larger degree. The content of their book falls into two different categories. On the one hand, there are implementation-independent sections, which are also used in the editions of the book using Lisp or Pop-11, and, on the other hand, there are the Prolog-specific implementations. Although this mainly works fine, in the semantics it has led to an unnecessarily complex format for semantic representations compared to what is needed in Prolog. In particular, the meaning representations are done in a sort of attribute-value format (DAGs) and can thereby only be introduced for grammars in such a format. But in a Prolog setting, semantic representations can just as well be done directly in Prolog terms, at least for the simpler cases, and be introduced as extra arguments in a DCG grammar. Thereby one also avoids the extra step in G&M's approach, where one first constructs a semantic representation as a DAG during parsing and then translates it into a term before processing.
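To make the contrast concrete, here is a minimal sketch of the style just described: the semantic representation is an ordinary Prolog term threaded through the grammar as an extra DCG argument. The tiny grammar and lexicon are our own illustration, not taken from G&M.

```prolog
% Semantics as an extra DCG argument: the meaning of the sentence is
% built directly as a Prolog term during parsing.
s(Sem)          --> np(X, VPSem, Sem), vp(X, VPSem).
np(X, Sem, Sem) --> pn(X).          % a proper name just supplies X
vp(X, Sem)      --> iv(X, Sem).

pn(john)        --> [john].         % toy lexicon
iv(X, sleep(X)) --> [sleeps].

% ?- phrase(s(Sem), [john, sleeps]).
% Sem = sleep(john)
```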

The approach to semantics taken here is not original, but common in the Logic Grammar and DCG community. It is given a particularly clear treatment in Pereira and Shieber, Prolog and Natural Language Analysis. The current notes were intended as a more elementary introduction to the same approach, where we fill in some more details and try to spell out the theoretical linguistic foundations for the implementations somewhat more. That is also the case if we compare these notes to Gal et al. (1991) or Abramson and Dahl (1989). Besides, we cover some topics not discussed in any of these.

The students for whom this is written are supposed to have completed a course in first-order logic and one in symbolic programming including Prolog. They follow a course in linguistics, including an introduction to semantics, at the same time as they follow this course. We will hence suppose that the readers know some semantics. The main question here will therefore be the more practical one: how should the known theory of simple semantics be implemented? But it is also timely to ask the very fundamental question of what the proper relationship between semantics and computational linguistics is. I will therefore start with a review of the basis of formal semantics, and of the relationship between formal semantics and computational linguistics, in section 1.


Added in 1997

These notes were originally written in 1992, with the exception of the last chapter, which was written in 1993. The preface above was written in 1993.

Since then the manuscript has gone through two changes. Chapter 1, which was intended as an introductory chapter, had grown too large and complicated in the 1993 edition to serve its purpose. It discussed several questions of principle in too much detail. Hopefully, the current version is more readable and will serve the purpose better. The part on evaluation has been expanded. It now fills two sections, sections 4 and 5, and some interesting questions relating to partiality are considered.

The first four chapters serve as a kind of introduction to the basics of computational semantics in Prolog. These chapters have been the core of the curriculum for the semantic part of SLI 6 since 1992. They also contain two sections, 2.2 and 3.4, which try to relate the practical approach to a more theoretical one. These sections have not been part of the curriculum. In recent years section 5 has also been part of the curriculum.

Section 6 on quantifier scope is also somewhat more advanced. Like sections 2.2 and 3.4, it presupposes some more semantic maturity of the reader, corresponding roughly to what one acquires in the second course in semantics (the course SLI 8 in Oslo, where the basic text book has been Chierchia, G. and S. McConnell-Ginet, Meaning and Grammar. An Introduction to Semantics). Hence it has not been part of the curriculum of SLI 6.

As there has not been room for more semantics in the SLI 6 course, these notes have never been completed. Several expansions suggested themselves from the beginning:

• expanding section 6 with an implementation of Cooper storage and a discussion of efficient implementations

• implementations of logical inference mechanisms

• implementations of what is called semantic validity in section 1

Since these notes were originally written I have come across two other manuscripts written from a similar perspective as the current notes. So far, neither of them is properly published. They are

• Prolog and Natural Language Semantics, by Robin Cooper, Ian Lewin and Alan W. Black, 1993 (http://www.ling.gu.se/~cooper/papers.html)

• Representation and Inference for Natural Language, by Patrick Blackburn and Johan Bos, 1997 (http://coli.uni-sb.de/~bos/comsem)

Both of them include issues not covered in the current notes and are well worth considering.

The current notes still have the status of a draft, and all sorts of comments are welcome.


1. Formal Semantics and Computational Linguistics

There are obvious reasons why semantics should be included in a course on computational linguistics. Semantics belongs to linguistics as much as syntax or phonology; hence, when linguistics is made computational, semantics should be a significant part of that project. Furthermore, many of the most useful applications of computational linguistics one can think of concern meaning, e.g., machine translation, database front-ends, information retrieval, or ticket reservation systems.

On the other hand, there is something strange about the term computational semantics, in particular for readers used to thinking in terms of logic. The logical key concepts, a valid inference and a proof, are considered to belong to different realms: it is semantics which makes an inference valid, while it is the syntactic operation of proof which may be computed.

The bulk of this booklet will be very practical. We will write programs for translating English sentences into a logical language and for handling the resulting logical formulas. But before entering that project, we will pause and reflect a little on the theoretical issues raised when linguistic semantics is incorporated into computational linguistics. What are we computing, and why?

We will start by considering what a semantic description of a natural language should account for, and how formal semantics tries to answer these questions. We will then proceed to consider how formal semantics might be made computational, and what the relationship between computational linguistic applications and the underlying theory might be.

1.1 Linguistic meaning

ASPECTS OF MEANING

What is meaning? What is the meaning of a particular utterance? There is no simple answer to this question, at least no simple answer that everybody will agree upon. Consider Mary and Tom in a room where all the windows are closed, she asking him,

1. Can you please open a window?

From a theoretical point of view, there are two sides to the utterance, what can be called the expression and the content. The expression is the utterance itself, the particular sound waves. The expression is what we study in phonetics, phonology, morphology and syntax. This particular utterance is classified as an instance of a sentence, containing certain words, etc.

But language would not have been of much interest if this were all there were to it. By this particular utterance, Mary makes a request; she signals to Tom that she wants him to do something. Furthermore, this something is linked to the outer world, here e.g. to the windows of the room. It is obvious that there is more to language than the expression, or form, as we can, in principle at least, know the phonetics, phonology, morphology and syntax of a language to such a degree that we may be able to produce and recognize all possible utterances of the language without understanding anything. Meaning, or content, is what is lacking from such a linguistic description. Still, it is hard to say exactly what the meaning consists of. One reason is, of course, that no one has seen, or heard, meanings as such, isolated from expressions. While we may experience expressions without grasping their meaning, we cannot experience meanings without expressions. Thus it will be part of a theory of linguistic meaning to determine what meaning or content is.

A starting point might be a pre-theoretic list of phenomena and aspects of language use that we think belong to meaning and which a semantic theory should account for.

External significance

The first observation is that an utterance might be about, or bring information about, states of affairs in the world different from the utterance itself. For example, if Tom answers,

2. The window is open,

this tells something about the window and about the state of affairs in the world. Mary's request, even though it does not describe a state of affairs, relates to the windows as well. The first goal for a semantic theory is to explain the relationship between language and the world.

Communicative significance

Tom's answer is a description of how the world is. He conveys a piece of information. Mary's utterance was on the other hand a request. She wanted Tom to do something for her. In other contexts, the same sentence could have been uttered as a question meaning roughly the same as Are you able to open the window? or Is it possible to open the window? The second question a semantic theory should consider is how language is used for communication.

Cognitive significance

When the speaker makes an utterance, she must in some way have access to the content expressed by the utterance before it is expressed. For example, Tom must know that one window is open to make his statement. On the other hand, the listener must be able to decode the message, to understand the content of the utterance, for the exchange to be successful. One might argue that a theory of linguistic meaning should explain how humans can encode and decode messages in linguistic expressions, how they can store the content of utterances, and how linguistic meaning interacts with other types of cognitive content, e.g., how it relates to perception and actions.


THEORIES OF MEANING

We have so far listed three different aspects of linguistic meaning. Different approaches to semantics have emphasized the three aspects differently. In these notes our theoretical backbone will be the logical approach to semantics. This is most often called formal semantics. Other names used are Montague semantics, truth-conditional semantics, model-theoretic semantics and possible world semantics. This comprises a whole family of different theories, and not all of them find all the terms fitting for their approach.

The common basic assumption underlying these approaches is that language is a conventional, rule-based, social system and can be studied as such. One can talk of the rules of the language, e.g., the syntactic rules. These rules are not defined by a grammar book, but they are supported by a language community and they are what children acquire by growing up in that community. Of course, it might be difficult to specify exactly what the rules of the language are, both because they may vary between subgroups of the community and because they may change over time. But this does not disturb the basic approach. The language as such is not necessarily the same as what the individual language user masters. She might learn the language with smaller or greater success.

As part of the language system there must be established conventions regarding what the expressions mean. These conventions must be shared by the speakers of a language for communication to succeed.

It is a consequence of this view that the task of linguistic semantics is to describe the rules or conventions that govern the meaning of a particular language. As such, it is the first two aspects, the external and the communicative significance, which are in focus for the formal semantic approach. The study of the cognitive aspects, how the individual language users acquire the semantic rules and process them, is not considered a part of semantics proper, but left to cognitive psychology.

Other approaches to semantics have placed their emphasis differently. Both behaviorist approaches to meaning and the so-called speech act theories have concentrated on the communicative aspects, while so-called cognitive semantics has considered the cognitive aspects to be the most important ones, as the name suggests. We will not consider these alternative approaches further.

As a methodological ideal for the formal semanticists, one can take a study from a different domain. Karl von Frisch discovered in the sixties some remarkable facts about the behavior of bees. When returning to the beehive after finding pollen, they perform a certain wiggling dance. The angle of the dance corresponds in a certain way to the direction to the source, while the number of wiggles is proportional to the distance to the source. Other bees attend the dance and are then able to fly directly to the source.

We recognize several of the same elements as in linguistic meaning. The reference to the world: the food source. The communication: the other bees getting the message. There is also a cognitive aspect to the bee dance: in some way or other the first bee must be able to store the information about the food source, the direction and the distance, and encode this into the dance, and the attending bees must be able to decode the message. But, as human theoreticians, we have no access to how the bees perform and experience this part. Our description of the significance of the bee dance, what it means in terms of food, can be done without any reference to how this is processed by the bees. Formal semanticists take a similar approach to the study of the meaning of linguistic expressions.

On the other hand, a theoretician may characterize the knowledge the bee carries in many different ways: as a map of the area, as a mental picture of the corresponding dance, in Cartesian coordinates or in polar coordinates (which, by the way, correspond coordinate-wise to the two critical aspects of the dance). However, the theoretician cannot say how the bee actually stores the information by observing the behavior of the bee alone. Similarly, a theoretician might want to say something about the information a particular language user has, without trying to explain how this information is actually processed. Thus, what we have called cognitive aspects might be split in two: on the one hand the information a language user carries, on the other hand how this is processed. The first part might be studied even with the methods of formal semantics.

1.2 The basics of formal semantics

EVALUATION AND VALIDITY

We will assume the reader to be familiar with the basics of first-order logic and formal semantics. Hence we will not give an elementary introduction, but instead reflect a little upon what formal semantics aims at, and what it achieves, before we turn to how formal semantics may serve as a background for computational applications.

The basic property of meaning, according to formal semantics, is the external aspect. The starting point has been declarative sentences, like (2), and one has focused upon the remarkable fact that one (linguistic) event can be about, or carry information about, a quite different event.

2. The window is open.

An utterance of sentence (2) tells us something about another event or situation containing a particular window. It is not obvious in what terms one should describe how the utterance carries information. As we have come to know, formal semantics has chosen to take truth as the basic concept. The utterance of (2) is true if the window in question is open and false if it is not open. Moreover, the utterance of (2) might be true on one occasion and false on another. One way to approximate the information carried by (2) is by classifying the situations/events/worlds that make it true and those that do not. This is called the truth conditions of the sentence.

If one knows the truth conditions of the sentences of the language, there are two perspectives one might take. Firstly, given (the truth conditions of) a particular sentence and a (description of the) world, one may determine whether the sentence is true or not, that is, evaluate the sentence.


Secondly, given the truth conditions of different sentences, but no information about what the actual world looks like, we can compare the truth conditions of the sentences in question. For example, two sentences will be equivalent if and only if it is exactly the same class of worlds which makes them true, i.e., if they have the same truth conditions. A sentence is called valid if it is always true, and an inference from a set of sentences, Γ, to a sentence, φ, is valid if all the possible (descriptions of the) worlds which make all the premises true also make the conclusion true. While evaluation tries to say something about the relationship between language and the world, the concept of valid inference lets us say something about how the meanings of different sentences relate.
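The first perspective, evaluation, can be pictured directly in Prolog: the world is a database of facts, the truth conditions of complex formulas are recursive rules, and evaluating a sentence is running a query. The following toy evaluator is our own sketch, not from the implementations to come; note that negation as failure presupposes a completely described world.

```prolog
% A toy world, given as a database of facts.
fact(open(window1)).
fact(man(tom)).

% true_in_world(F): formula F is true in the world described above.
true_in_world(and(P, Q)) :- true_in_world(P), true_in_world(Q).
true_in_world(or(P, _))  :- true_in_world(P).
true_in_world(or(_, Q))  :- true_in_world(Q).
true_in_world(not(P))    :- \+ true_in_world(P).  % negation as failure
true_in_world(P)         :- fact(P).              % atomic formulas

% ?- true_in_world(and(man(tom), not(open(window2)))).   % succeeds
```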

This exposition has of course been very sketchy. We have been sloppy with respect to what kind of objects utterances are about, whether they should be called worlds, situations, events or states of affairs. It is without importance for the points we want to make, and we want to stay as theory-neutral as possible. We have also skipped the distinction between sentence, utterance and statement. We will in the sequel, as in simple formal semantics, talk as if we ascribe truth conditions to sentences. But it is clear that several aspects of the meaning of an utterance are determined by the utterance situation. A fully developed theory should be more careful in distinguishing between how truth depends on the utterance situation and how it depends on the described situation.

Another question we will not consider is whether meaning is the same as truth conditions or whether there might be more to meaning than truth conditions. In particular, one might ask whether two sentences that have the same truth conditions may differ in meaning. Although we will not go beyond truth conditions here, nothing we will say will indicate that this is all there is to meaning. Besides, we know that something more than truth conditions is necessary when so-called intensional contexts are taken into consideration, in particular complement phrases of verbs like believe, as in

3. The Greeks thought Phosphorus was only visible in the morning.

We will not consider such phenomena here, but stick to the simple extensional contexts and what can be studied in terms of truth conditions.

WORDS AND SENTENCES

In the picture considered so far, there are two figures involved: on the one hand sentences and their meanings, and on the other hand the world with its states of affairs. This corresponds pretty well with how logic is studied. There one considers on the one hand sentences and on the other hand structures, or models. But when studying natural languages, it pays off to apply a more fine-grained analysis. On our view, there are two factors that jointly determine the meaning of sentences.


I. Each word in the language has its own meaning.

II. There are rules that determine how sentences, and phrases more generally, get meaning from the meanings of the words in the sentence and the way the words are combined.

This corresponds to our generative view of syntax, or of linguistic form more generally. Even if we speak a language fluently, we may over and over again meet new sentences never encountered before. Still we recognize them as sentences of the language, and, even more remarkably, we understand them. There is no upper limit to the number of sentences that can be expressed within a natural language.

When it comes to syntax, or form, our strategy as scientists is the following. Since human beings with limited, finite resources can master an infinite language, there must be a finite set of rules, or principles, that describes the language. This set of principles may be divided in two. On the one hand there is the finite lexicon, the words with their morphological and syntactic properties. It is rather arbitrary which of all possible combinations of letters (or sounds) actually constitute words of the language, and which do not. On the other hand, there is the set of syntactic rules or principles which determine which words may combine into phrases and sentences.

Also, when it comes to meaning there must be a finite set of principles, since a speaker may utter new sentences and be understood. It is natural to distinguish between the meaning of words and the meaning of complex phrases, corresponding to the distinction between lexicon and syntactic rules. Words like girl and teacher have distinct meanings. But as Saussure observed, it is only an accident that girl means what it does and not, say, what teacher actually means. On the other hand, there are systematic rules that step into action and ascribe meaning to sentences like

4. Every girl is a child.

5. Every teacher is an adult.

To get a proper evaluation of a sentence as discussed above, we would need a description of (I) the meaning of the involved words together with (II) principles for how the meanings (truth conditions) of sentences are determined by their parts, together with

III. The facts in the world.
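The division of labor between (I), (II) and (III) can be pictured in Prolog where, as in a first-order model, word denotations (I) and world facts (III) are conflated into one database, while a clause for the determiner every encodes a compositional rule (II). The encoding is our own toy illustration:

```prolog
% (I) + (III): the denotations of 'girl' and 'child' in a small world.
girl(ann).   girl(eva).
child(ann).  child(eva).  child(ola).

% (II): a rule for 'every': every(P, Q) holds iff no individual
% has property P without also having property Q.
every(P, Q) :- \+ (call(P, X), \+ call(Q, X)).

% ?- every(girl, child).   % succeeds: sentence (4) is true here
% ?- every(child, girl).   % fails: ola is a child but not a girl
```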

When we turn to valid inferences, there are several options for how these three factors should be grouped together. Starting with logic and the way it is applied to semantics, observe that what is called a structure, or a model, in fact represents two things: both the facts of the world and the meanings of the individual words. Consider a sentence like


6. A man is typing.

This sentence is true in this room now because I am both a man and I am typing. Now there are two different ways to make the sentence false. I could change the facts of the world by ceasing to type (or by some more drastic means). Or I could change the meaning of some of the words, say man to mean what book now means. To be more precise, I cannot change the meaning of any English words, as this is determined by a whole language community, but one could think of a language like English except for the meaning of man, and in this language the sentence would have been false in the current situation.

A first-order structure pays no attention to this distinction. A structure makes the sentence true if there is some object X which both belongs to the denotation of man and is typing; another structure makes the sentence false because there is no such X; but it does not consider why there is a difference between the two structures. This indicates that a valid inference in logic does not take the meaning of the involved words into consideration. An inference is valid irrespective of what the content words actually are; it is valid under all possible ways of ascribing meaning to these words. To be precise, the inference from a set of sentences Γ to a sentence φ is logically valid if all possible ways of ascribing meaning to the content words and all possible states of affairs that make all the sentences in Γ true, also make φ true.

By not considering what the words actually mean, we miss something. For example, if sentence (7) is true, we will also expect (8) to be true, i.e., (7) intuitively entails (8).

7. The book is on the table.

8. The book is not under the table.

But this inference is not logically valid, as the denotations of the words under and on are not fixed, but can be any binary relations. We will call this inference semantically valid, and propose the following definition. Let s be a particular way of ascribing meaning to all the content words; e.g., s ascribes the meaning we normally assume to all words in English. The inference from a set of sentences Γ to a sentence φ is semantically valid (relative to s) if all possible states of affairs that together with s make all the sentences in Γ true, also make φ true. It follows that an inference which is logically valid will also be semantically valid, but not necessarily the other way around, as illustrated by the example.

The meaning of the individual words is a part of the meaning of the language. Hence it would be desirable if a description of the semantics of a language contained a description of the meaning of the words. This could then be used together with compositional semantic rules to compute the meanings of the sentences and the semantically valid inferences. But there are at least two problems in giving a description of the meaning of the individual words. Firstly, to give such a description we will again need to use words (in the same or a different language), for example if we say that "A swan is a large bird with a long neck and white feathers which likes to swim ..." etc. But this does not take us outside of the realm of language. We will not be able to identify a swan unless we already know the meaning of other words like white or bird. Hence such definitions ultimately tell us nothing about the relationship between words and their denotations, only about how the denotations of different words relate to each other. A dictionary is full of such definitions. For a user, it is of no help unless she already knows the meaning of some of the words of the language.

The other problem in giving a description of the involved words is that there is no obvious way to draw a firm border line between which facts are facts about the meanings of the words and which facts are facts about the state of affairs in the world. For example, we listed in the definition of a swan that it is white. What if we meet a bird looking exactly like a swan except that it is black? Should it be classified as a swan or not? If we take white to be a part of the meaning of the word swan, we have to conclude that this is not a swan. But there might be other arguments, say from biology, which count towards classifying the black bird as a swan. In that case we will say that it was an empirical generalization to say that all swans are white; an empirical generalization which turned out wrong.

Though we are not able to overcome the first problem, to give a description that bridges the gap between the language and the world, it does not follow that we are unable to say anything about which inferences are semantically valid. Even without an actual representation of the function s which ascribes meanings to words, we may try to classify the semantically valid inferences directly. One possibility is to classify the first-order structures (i.e., structures representing both states of affairs and the meanings of words) into two classes: those that are possible given the actual meaning of the words and those that are impossible. Call these two classes of structures P and I, respectively. All structures where a book is both on and under the same table will be classified as impossible; they will belong to I. And any structure in P which makes sentence (7) true will also make sentence (8) true. Thus one could identify semantic validity with P-validity.

Alternatively, one could consider the word definitions of a dictionary to be a theory S. Then the sentence φ would be a semantic consequence of Γ if and only if it is a logical consequence of S ∪ Γ. Let P be the class of models for the theory S. Then this way of defining semantic validity will correspond to P-validity. The advantage of classifying it in terms of a theory is that we might study whether a sentence φ is a logical consequence of S ∪ Γ by considering whether this is provable in first-order logic.
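In Prolog this way of checking semantic validity can be sketched directly: encode the meaning postulates of S as clauses and let Prolog's proof procedure do the rest. Since Prolog lacks classical negation, the negative sentence (8) is represented here with a separate predicate not_under/2; all names are our own illustration.

```prolog
% S: a meaning postulate relating 'on' and 'under':
% whatever is on something is not under it.
not_under(X, Y) :- on(X, Y).

% Γ: the premise, sentence (7).
on(the_book, the_table).

% The conclusion, sentence (8), is now provable from S ∪ Γ:
% ?- not_under(the_book, the_table).   % succeeds
```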

This indicates that a logical approach to semantics can be adapted to an indirect study of word meanings in terms of semantically valid inferences. But the second problem we saw in giving an explicit description of the meaning of words remains. There is no simple way to determine which truths are truths about word meanings and which truths are empirical truths about facts in the world. It is thereby no simpler to determine exactly which models should be classified as possible and which as impossible. Thus when applied in practice, the safest course is not to rule out too many structures as impossible. It is a more serious problem if we end up classifying intuitively non-valid inferences as valid than the other way around.


FORMAL SEMANTICS AND ASPECTS OF MEANING

We considered several aspects of meaning in section 1.1: the external aspect, i.e., how language refers to the world, the communicative aspect, and the cognitive aspect. We have so far considered how formal semantics tries to describe the external aspect. What about the other two?

Formal semantics has had less to say about communication. Some formal semanticists have tried to describe the difference between declarative sentences and questions in terms of their denotations; the two have different kinds of denotations. Others have distinguished between denotation and what is often called force. A sentence, or rather an utterance, carries, in addition to its denotation, a certain force, and this varies between the different kinds of sentences. Thus a declarative sentence not only corresponds to a certain state of affairs; it says that this state of affairs actually is the case, i.e., that a certain statement is true. A yes/no-question also corresponds to a certain state of affairs, but it asks whether it is the case, whether a certain statement is true. A command, finally, is an order to bring it about that something is the case, that a certain statement becomes true. We will not consider this further from a theoretical point of view, but this approach will underlie our actual implementations.

As already said, formal semantics does not aim at describing how individual agents process meaning. But it might be used to classify the information carried by a certain agent. One possibility is to classify this information in terms of a set of sentences, Γ, either within a natural language or some logical calculus. By doing this, we take no stand on the actual nature of knowledge, whether it is linguistic or not. Rather, the only thing we claim is that the agent's knowledge is compatible with the world being in certain ways and incompatible with the world being in other ways. According to the agent's knowledge, the world has to make Γ true. When the listener hears a new sentence φ, he might update his knowledge with φ to Γ ∪ {φ}.

What is typical for this way of conceptualizing things is that a (finite) set of sentences may correspond to many different worlds or states of affairs. A world, or what we in logic call a model, is a complete set of facts. All issues are determined in one of two directions, either positively or negatively. A particular sentence, φ, will either be true or false in the world. But just as a cognitive agent does not have to know whether φ is true or not, neither φ nor ¬φ has to be a member of Γ, the agent's knowledge.

1.3 Making computational sense of formal semantics

Against this background we can turn to the main theme of these notes: how to combine a formal semantic approach to the meaning of natural languages with computational linguistics. The task can be approached from two different angles. Either: how can a formal semantic theory be computed? Or: how can the practice we see in computational linguistics with respect to semantics be given a sound theoretical foundation from formal semantics? We shall keep an eye on both perspectives, but we will organize the discussion according to the theoretical dimension and the concepts considered so far: evaluation, communicative force, logically and semantically valid inferences, and information processing.

EVALUATION

What would a computational implementation of semantic evaluation look like? We can think of two ways of implementing it. We may build a machine, or robot, that actually evaluates sentences towards the real world, or we may build a representation of the world within a computer and design procedures that evaluate sentences towards this representation. The first option is exemplified if we communicate in natural language with a robot on Mars about what it sees. The other option is illustrated if we communicate with a data base containing information about what the robot found on Mars.

A real-world evaluator would have to contain more than just semantics. To utter something like

9. It is cold here,

it would at least need the following: a sensory device able to measure temperatures, i.e., some sort of thermometer, and some procedures associated with the word cold which enable the computer to decide, on the basis of what it measures, whether the situation should be classified as cold or not. Similar considerations apply to sentences involving what the robot sees, etc. The study of perception and the building of robots able to carry out such tasks are outside semantics and will not be considered here. But observe how the explicit procedures necessary to evaluate words like cold, given such sensory devices, yield a rather direct implementation of the meanings of words. This indicates that even though it is in general not possible to give an explicit description of the relationship between a word and what it denotes, one could aim at giving an operational description of the meaning of words in terms of perception. But that will not be our task here.

The other option is to evaluate sentences towards a data base. In particular, the computer may contain a model of the world, some sort of representation of (some of) the facts in the world. For example, we may store the fact that John is a boy as

10. is_a(John, boy).

Then we can write a program which, on the basis of a set of such facts, determines whether sentences are true or not. This is the way semantics is most often introduced in computational linguistics, and it will be our basic approach in section 4.
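As a first sketch of this approach (following Prolog's convention of lower-case constants, so john rather than John, and using a hypothetical wrapper predicate true_in_model/1 of our own):

```prolog
% A tiny model of the world, stored as Prolog facts.
is_a(john, boy).
is_a(mary, girl).

% A sentence whose semantic representation is a ground fact is true
% in the model exactly if the fact is derivable from the data base.
true_in_model(Fact) :- call(Fact).
```

A query like ?- true_in_model(is_a(john, boy)). then succeeds, while ?- true_in_model(is_a(john, girl)). fails.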

Such a data base may correspond closely to a first-order structure. Like the first-order structure, the data base represents both the actual facts in the world and the meaning of the content words. In the example, the entry represents the joint effect of the state of affairs and the meaning of words necessary to make John is a boy true. We will have evaluation rules that correspond to the compositional semantic rules of the language. But we will not in any way implement the meaning of the content words.

A small digression. We have here made a firm distinction between evaluation towards the real world and evaluation towards representations of that world. But some cases may be harder to classify. If I ask the computer whether I am booked on the flight to Bergen Thursday at 8 p.m., and the computer answers yes, is it then the fact that I am booked that is represented in the computer, or is this representation itself the sole fact?

FORCE

We have once again started with declarative sentences, but for some purposes questions or commands may be more appropriate. Consider again the robot on Mars. We will use questions to ask it what it sees, and it will use declarative sentences when answering us back, describing what it sees. The answers should be true sentences, and they should be appropriate answers to the questions. (We will not consider here theories for when answers are appropriate.) Furthermore, we may use commands to order the robot around, tell it to pick up things, etc.

Back on Earth, we will also vary the mood in communicating with a data base. We will ask questions, and it will answer us back in declarative sentences on the basis of what it knows. We can also think of telling the data base facts it does not know by means of declarative sentences, with the data base updating itself accordingly.

Robots moving around in space lifting objects and a data base containing facts about a static world are two extremes. One could also think of applications in between. For example, one could communicate with a ticket reservation system using natural language. Then the machine would have to contain a dynamic data base where the facts (who has reserved places, which seats are taken) change over time. This system could also communicate with attendants in charge of mailing out tickets to the customers, thereby serving as the system's arms.

VALID INFERENCES

Logical validity

Formal semantics introduced two perspectives: the relationship between language and the world (= evaluation), and the relationship between sentences in the language (= valid inferences). When a human reads a text, she will be able to answer a lot of questions with sentences that are not directly contained in the text. To take an elementary example, if the text contains the sentences Socrates is a first-grader and Every first-grader is six years old, and the reader is asked How old is Socrates?, she knows the correct answer, He is six years old. If we could get the computer to perform similarly, it would have an immense impact. Think of all the texts that are currently available in electronic form. What if the computer could answer, for any question, whether it was entailed by these texts?

How could this be implemented? What we are looking for are procedures which will tell us whether inferences are valid or not. But this is exactly the task of logic. Applied to natural language, we can either translate the natural language sentences into some calculus that is well known, say first-order logic, and then implement some sort of theorem prover for this logic, or we may try to develop particular logical procedures more suited to the specific phenomena we find in natural languages. In any case, we can consider the text to be a set of sentences Γ and a yes/no-question as a question of whether a certain φ is entailed by Γ. (This is of course a gross simplification, as a text is not the same as a set of sentences: there is a certain coherence between the sentences, it matters in which order they come, etc. But this is beyond the points we want to make.)
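For the Horn-clause fragment that Prolog itself can handle, the Socrates example above can be encoded directly; the predicate names below are our own illustrative choices:

```prolog
% The text as a set of sentences, encoded as Prolog clauses.
% "Socrates is a first-grader."
first_grader(socrates).
% "Every first-grader is six years old."
age(X, 6) :- first_grader(X).
```

The question How old is Socrates? then becomes the query ?- age(socrates, Age)., which Prolog answers with Age = 6. Prolog's resolution procedure is itself a (limited) theorem prover of the kind discussed here.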

So we consider the question whether Γ entails φ. Sometimes we will be able to answer the question successfully with a yes or a no, as in the simple example with Socrates, the first-grader. But in other cases it will be much harder to come up with the correct answer. It is commonly assumed that natural languages are at least as strong an apparatus as first-order logic. As is well known from logic, there is no general decision procedure for first-order logic, i.e., no procedure which, given a set of sentences Γ, answers correctly yes or no when asked whether a sentence φ follows logically from Γ. There are procedures which answer correctly whenever they answer and answer yes whenever they should, but which do not always give an answer when supposed to answer no. As a result, we must assume that it is in general not possible to write a program which always correctly answers whether an inference from natural language is logically valid or not.

This might be a disappointment, but it should not come as a surprise. Think of mathematics. If we had a computer as described, it would suffice to write a set of mathematical axioms as sentences in English, formulate any possible theorem as an English sentence, and ask the program whether it holds or not. The computer would then answer the question correctly. No computer can do this. Even though the theorem is a logical consequence of the axioms, it is not that easy for a human or a computer to see that it is.

Whether an inference is valid or not has to do with the meaning of the involved expressions. What the words mean and how the meaning of complex expressions is formed from the meaning of their parts are parts of the language as a conventional system supported by a language community. Hence, we believe the individual language users are able to acquire these principles, and we should in principle be able to implement the same principles on a computer. But it does not follow that the language users are able to decide whether arbitrary inferences are valid, and we cannot argue from human linguistic abilities to the erroneous conclusion that the concept of valid inference is computable. What we can hope for are computer implementations that approximate the full relation of valid inference and are able to decide as many cases as possible.

Another point is that even questions that can in principle be answered algorithmically may in practice be intractable. We know that even though propositional logic is decidable, it is easy to formulate problems where all known algorithms are exponential in the number of variables and hence intractable.


Towards semantic validity

Logical validity is not only impossible to compute in general; it is not even sufficient. Say the text contains the sentence John is a horse and a reader is asked Is John a boy?. She will be able to answer, No, he isn't a boy, because she knows that nothing is both a horse and a boy. This is an example of a semantically valid inference which is not logically valid. If the text had also contained the sentence No horse is a boy, the inference would have been logically valid as well.

We encountered problems in describing in language the actual relationship between words and their denotations. But we also saw that such an explicit description is not necessary for describing the semantically valid inferences. One could think of several ways of implementing this. One possibility we encountered was to add sentences that characterize the relationships between the meanings of words, such as No horse is a boy, as additional premises, or so-called meaning postulates. Then semantic validity is the same as logical validity within this theory.
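In a Prolog-style setting, one way to picture a meaning postulate is as an extra clause licensing an explicitly negative answer. This is only a sketch; the predicate names are hypothetical, and a full treatment of negation is beyond a Horn-clause fragment:

```prolog
% Text: "John is a horse."
horse(john).

% Meaning postulate "No horse is a boy", used as an additional premise:
not_a_boy(X) :- horse(X).
```

With this, the query ?- not_a_boy(john). succeeds, mirroring the reader's answer No, he isn't a boy, even though nothing follows from the text by pure logic alone.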

One could also implement semantic validity in other ways, e.g., by means of so-called semantic nets. We will not consider semantic validity any further in this booklet, but the reader should be aware that for real-world applications the concept of semantic validity may be more valuable than pure logical validity. There are several large projects that try to incorporate a significant part of lexical semantics into computational linguistics. The best known is WordNet, which is one giant semantic net of (50 000?) English words. Currently there are projects for developing similar nets for at least German and French.

But any such approach to extending logical validity in the direction of semantic validity can be expected to run into the same problems with respect to the lack of a decision procedure as logical validity itself. Moreover, as we discussed earlier, there will be no principled way to distinguish between purely semantic truths and truths about the world. And if the computer is expected to answer as well as a human after reading a text, the computer will, in fact, need knowledge as good as the human's also with respect to what is the case in the world. For example, think of how much real-world knowledge a human makes use of in discussing a scene from a restaurant involving waiters and customers.

We have so far distinguished rather sharply between evaluation and inference. But even in an evaluational setting the system will profit from being able to make simple inferences, both purely logical ones and semantic ones. If a data base contains canaries, ducks, cats and dogs, some semantic knowledge goes into the processing of questions containing the word bird.

INFORMATION PROCESSING

We have here presented two perspectives. Either the computer contains a data base which we conceive of as a model of the real world; then evaluation towards the data base is a model of answering whether the sentence is true in the modeled world or not. Or the computer contains a text, and we ask whether a question can be answered from the text.


Let us change perspective to the information-processing agent who knows something, but not necessarily everything, about the world, and is told something new and asked questions. How could this be modeled and implemented? There are two options: to think in terms of evaluation or in terms of valid inferences.

To take the first option: earlier we thought of a data base as a model of the world. But we could also think of the data base not as a model of the world as such, but as a model of what a certain agent knows about the world. The difference between the two perspectives may show up as follows. The world is complete; any statement is either true or false in the world. Hence, when a data base models the world, we assume it to be complete and to answer all questions with a yes or no. An agent's knowledge does of course not have to be complete; there might be many questions the agent does not know how to answer. Hence, in this case the data base will only be a partial model of the world which leaves many questions undecided. We will see in the text how this change of perspective changes our implementations.
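The contrast between a complete model and an agent's partial knowledge can be sketched as follows; known_true/1, known_false/1 and answer/2 are our own hypothetical predicates, not part of any standard library:

```prolog
% Partial knowledge of an agent: some facts known true, some known false,
% and everything else left undecided.
known_true(is_a(john, boy)).
known_false(is_a(john, girl)).

% Three possible answers instead of two: yes, no, or unknown.
answer(Q, yes) :- known_true(Q), !.
answer(Q, no)  :- known_false(Q), !.
answer(_, unknown).
```

Against a data base modelling the complete world, the third clause would never be needed.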

On the other option, the knowledge is characterized by a set of sentences, say Γ, as discussed earlier. Then a certain sentence φ may be such that neither φ nor ¬φ is a consequence of Γ. With this approach, too, we see the partiality of knowledge. We will return to the different ways of conceiving knowledge and information later in the text.

But now, let's get started!


2. Semantic representations in a DCG: first round

2.1 Basic sentences

From the discussion in the first chapter, our task can be split into two:

1. Write a program that parses natural language sentences and ascribes to them logical formulae which represent their truth-conditional content.

2. Apply this content.

The two tasks will be handled independently, and we start with the first one. Given our background in natural language processing in Prolog, this task will be amplified to the more specific

1'. Extend a DCG grammar with extra arguments for semantics which, for a sentence, return a Prolog representation of a logical formula.

For simple sentences we want the following types of logical representations.

John runs.            Run(john)
Mary loves John.      Love(mary, john)

Hence we want the following Prolog queries to succeed:

?- s(Sem,[john,runs],[]).
Sem = 'RUN'('JOHN') ;
no

?- s(Sem, [mary,loves,john],[]).
Sem = 'LOVE'('MARY','JOHN') ;
no

(As a convention, capitals in single quotes are used for semantic representations, to distinguish them both from the syntactic representations and from the Prolog code.) The next question is then what the semantic representations of the simpler phrases should be in order to obtain this representation for the sentences as a whole.

PROPOSAL 1

The simplest and most straightforward proposal, which would also correspond best to how we construct a compositional semantics in a non-computational setting, would have been the following.

s(V(N)) --> np(N), vp(V).
np('JOHN') --> [john].
vp('RUN') --> [runs].


There are two reasons why this does not work. The first is that Prolog does not allow variables for functions or predicates, only for terms. Hence the first line, even though we understand it perfectly well, is not Prolog code. The second is that it is not obvious how to generalize this to the transitive verb case.

PROPOSAL 2

There are many possible ways around these problems. We shall here choose one and leave a discussion of alternatives to the problem section. Our next approach is based on two pillars:

1. An anticipation of what the semantics of the full sentence will look like.
2. An exploitation of Prolog's unification mechanism.

Instead of ascribing the verb run the semantics 'RUN', we shall ascribe it the semantics 'RUN'(X) and then unify X with 'JOHN' during parsing. We can do this by giving the VP an extra argument for the unknown X, e.g. vp(X,'RUN'(X)). When the VP is combined with the NP, this X is unified with the semantics of the NP, here 'JOHN', by the first grammar rule below. The grammar fragment for the two sentences will thus look like:

s(Sem) --> np(N), vp(N, Sem).
vp(Subj, Sem) --> v(Obj, Subj, Sem), np(Obj).
np('JOHN') --> [john].
np('MARY') --> [mary].
vp(X,'RUN'(X)) --> [runs].
v(X,Y,'LOVE'(Y, X)) --> [loves].

This sort of computation will be the main idea behind our approach to computing semantic representations in DCGs, both in this section and when we come to more complex constructions. We will, however, make one small change to our program before we are content. This change can be perceived as a syntactic adjustment; it does not change the basic underlying approach.

PROPOSAL 3

In the last program fragment above, the different phrases have different numbers of arguments for keeping track of the semantics. The VP has an extra argument for the not yet instantiated subject's semantics, while the transitive verb has two extra arguments. This is clumsy, and it is theoretically unclear what the status of the different semantic arguments of the V or VP is. To avoid the first problem, we could have put all the semantic arguments of a phrase in a list. But as lists are used for many other purposes, and also to obtain a notation which is intended to be more theoretically revealing, it has been customary to use a particular binary operator to collect the semantic arguments of a phrase.

We have chosen to use the operator ^. (This is already built into the Prolog we are using, for other purposes.)


?- current_op(X,Y,^).
X = 200,
Y = xfy ;
no

If this is not included in your Prolog interpreter, you can try to declare it. It might be that you have to change the precedence level from 200. Some Prolog interpreters may have problems with the symbol "^". You can then choose any other symbol you like. The important thing is to have an operator which is right associative, i.e., one that parses

X^Y^'LOVE'(Y, X)

as

X^(Y^'LOVE'(Y, X))

and which is not used for other purposes that may cause ambiguity or confusion. With this operator, our program will look like:

s(Sem) --> np(N), vp(N^Sem).
vp(Sem) --> v(N^Sem), np(N).
np('JOHN') --> [john].
np('MARY') --> [mary].
vp(X^'RUN'(X)) --> [runs].
v(X^Y^'LOVE'(Y, X)) --> [loves].

Observe how the VP rule becomes simpler than in proposal 2, and how easily transitive verbs can be handled in general. This program fragment will be the basis for the rest of our work. We will not change it seriously; the rest of our task will be to extend it to cover a larger fragment of English.

The reader is advised at this point to take a break and work her way by hand through a sentence like Mary loves John to observe how the fragment handles it.
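For the reader who wants to check her hand computation afterwards, the main unification steps can be summarized roughly as follows:

```prolog
% ?- s(Sem, [mary,loves,john], []).
%
% S rule:   np matches mary, so vp is called with argument 'MARY'^Sem.
% VP rule:  v matches loves, unifying X^Y^'LOVE'(Y, X) with N^('MARY'^Sem):
%           N = X,  Y = 'MARY',  Sem = 'LOVE'('MARY', X).
% np matches john:  N = X = 'JOHN'.
%
% Sem = 'LOVE'('MARY', 'JOHN')
```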

2.2 Theoretical foundations

REPRESENTATIONS AS TERMS IN A LANGUAGE WITH λ

So far we have only focused on the semantics of whole sentences. The semantic representations associated with the simpler phrases were chosen according to the contributions they make to the representation of the semantics of the full sentence; they were not ascribed autonomous denotations. But it is possible to give a theoretical analysis where expressions like X^'RUN'(X) and X^Y^'LOVE'(Y, X) are considered as terms in the logical semantic language that represent the possible denotations of the phrases. For this purpose, we have to consider a logical language with λ-terms. Some readers may already be acquainted with the typed higher-order logic used, e.g., in Montague grammar, where such λ-terms play an important role. They should not have too big problems seeing the connections between the representations we have used so far and the typed higher-order logic.


For readers unfamiliar with such languages, we will settle for less. We will show how a normal first-order language can be extended with a λ, and how this corresponds to the expressions we have considered so far.

We extend the set of special symbols used in first-order languages (the connectives →, ∨, & and the quantifiers ∀, ∃) with the symbol λ. Then we extend the stock of formation rules, i.e., rules like "if P is a predicate and t is a term then P(t) is a formula", with the following rule.

(λ1) Whenever φ is a formula and x is a variable, λx[φ] is a predicate (= unary relation).

From this rule, together with the standard rules, it then follows that λx[φ](t) is a formula whenever t is a term. The semantic rule corresponding to the formation rule is given by

(λ2) ||λx[φ]|| is the function which to each individual a in the domain yields true if ||φ[a/x]|| is true, and false otherwise, where a is a name for a (i.e. ||a|| = a) and φ[a/x] is the result of exchanging all free occurrences of x in φ with a.

At first, this interpretation rule might look a bit heavy. But what it says is basically that, e.g., formula (a) below is true if and only if formula (b) is true

a. λx(Run(x) & Love(x, mary))(john)
b. Run(john) & Love(john, mary)

This also illustrates one reason for introducing λ into our first-order language: by the use of λ one can construct complex predicates, like the one in (a), which represents runs and loves Mary.

So far we have introduced λ for one purpose: to build predicates (= unary relations) from formulae. One can also meet λ in other contexts. For example, it can be used for constructing a (unary) function λx(t) from a term t, e.g., λx(child_of(x, mary)) if child_of is a binary function. Both these uses of λ are instances of the more general rule known, e.g., from programming in Lisp:

(λ3) ||λx[φ]|| is the function which to each individual a in its domain yields ||φ[a/x]||, where a is a name for a (i.e. ||a|| = a) and φ[a/x] is the result of exchanging all occurrences of x in φ with a.

It should be straightforward to see that (λ2) is an instance of (λ3). On the other hand, it might be harder to see that (λ2) is a proper semantic rule if λx[φ] is a predicate. Should not ||λx[φ]|| then be a set of individuals and not a function? Normally a predicate, like P, is taken to denote a set, and P(t) is true provided the denotation of t is a member of this set. Here we have been a bit sloppy and have not distinguished properly between a set and its characteristic function. To each set A there is a function fA, called the characteristic function of the set, which to an argument a yields the value true if a ∈ A and false otherwise. There is a simple one-to-one correspondence between sets and their characteristic functions, and often we are not very precise about whether we talk about the one or the other. To be precise, we would have (at least) two options: either to say that all predicates denote characteristic functions rather than sets and that ||P(t)|| = ||P||(||t||) whenever P is a predicate and t is a term, or to let λxφ denote a set rather than a function, in which case the connection to (λ3) would have been less direct.

For our purposes we only need the particular instance of (λ3) expressed in (λ1)-(λ2), together with one more extension of the first-order language:

(λ4) Whenever P is a predicate and x is a variable, λxP is a function which to each individual in the domain yields a predicate.

This formulation is a bit sloppy. A proper definition would presuppose a revision of the way the syntax of the first-order language was formulated, and is beyond the scope of this short exposition. The semantics corresponding to this rule is given by (λ3). Thus, e.g., λx(λy(Love(y,x))) is such a function according to (λ4), and its semantics is such that ||λx(λy(Love(y,x)))(john)|| = ||λy(Love(y,john))||.

So far we have extended the first-order language with more syntactic constructions, and we have ascribed denotations to these new constructs. Of course, semantic notions like satisfiability, truth and validity are defined as before relative to the new languages. What about logical questions concerning inference rules, proofs and completeness? First, as we know that the two expressions λx[φ](t) and φ[t/x] always have the same meaning or denotation, we can introduce a syntactic inference rule corresponding to this observation.

(λ5) The expressions λx(φ)(t) and φ[t/x] can be exchanged for each other.

This rule is normally called λ-conversion or, in some contexts, β-conversion. (To be precise, when λx(φ)(t) is exchanged with φ[t/x] it is called a β-reduction; when φ[t/x] is exchanged with λx(φ)(t) it is called an inverse β-reduction. A β-conversion is either a reduction or an inverse reduction, and it is a special case of the more general λ-conversion.) This will be the backbone of the relation between our Prolog program and the λ-calculus we shall soon consider. For the reader interested in logic, we also remark, without proof, that the rules (λ1) and (λ4) with the semantics from (λ3) constitute a modest logical extension of first-order logic. The new logic is still compact, and it can be given a complete axiomatization; in fact, all that is needed is to add (λ5), in a slightly more precise form, to a set of complete axioms for first-order logic.

We should admit, however, that this description of the λ-calculus has been superficial and sloppy on several points. In the rule (λ5) we must put some restrictions on the term t: we cannot allow t to contain variables that get bound when t is substituted for x. (a) and (b) below are not equivalent, while (a) and (c) are:


a) ∃y( λx(∃yR(x,y))(father_of(y)) )
b) ∃yR(father_of(y), y)
c) ∃y( ∃zR(father_of(y), z) )

After this digression on first-order languages extended with certain λ-constructions, we can return to the main theme: what all this has to do with our Prolog program. The Prolog term X^'RUN'(X) can be considered a representation of the predicate λx(run(x)). In a compositional semantics, the representation of the sentence John runs is then λx(run(x))(john). This is semantically equivalent to run(john) and can be transformed syntactically into the simplified expression by λ-conversion. What our Prolog program does can then be conceived as follows:

1. It applies λx(run(x)), represented as X^'RUN'(X), to john.
2. Instead of giving the result λx(run(x))(john), it yields run(john), represented as 'RUN'('JOHN'). The reason we do not get the more complex representation is that the β-reduction is built into our program and executed simultaneously with the composition.

A similar analysis can be given for the transitive verb: X^Y^'LOVE'(Y, X) represents λx(λy(love(y, x))), and functional composition with β-reduction is used both in the VP rule and in the S rule.
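The β-reduction that is "built in" here can also be made explicit as a tiny Prolog predicate; beta_reduce/3 is our own illustrative name, not anything predefined:

```prolog
% beta_reduce(+Abstraction, +Argument, -Result)
% Applying X^Body to Arg: unifying X with Arg instantiates Body,
% which is exactly the built-in β-reduction described above.
beta_reduce(Arg^Body, Arg, Body).

% ?- beta_reduce(X^'RUN'(X), 'JOHN', Sem).
% Sem = 'RUN'('JOHN')
```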

SEMANTICS AS A HEAD-FEATURE

Quite another perspective one could take on what our small program does is the following. If we return to proposal 2, what really is the semantics of the verb in

vp(X,'RUN'(X)) --> [runs].

is the second argument, 'RUN'(X). The first argument, X, should not be conceived of as part of the semantics. Thus what the S rule does is to unify the semantics of the whole sentence with the semantics of the VP. As the VP is said to be the head of the sentence in the linguistic sense, this is the natural thing to do: a phrase shares several features with its head, and semantics is one of them. This approach is more or less prominent in theories like Lexical-Functional Grammar (LFG) (cf. the ↑=↓ equations), Generalized Phrase Structure Grammar (GPSG) and Head-Driven Phrase Structure Grammar (HPSG) (cf. the Head Feature Convention). It is more naturally implemented in a graph unification framework, like PATR, than in a term unification framework like DCG. For example, in the graph-based approach one does not need to refer to the variable argument X in the S rule. In the PATR notation from G&M, a fragment with semantics could include rules like the following ones:

S → NP VP
⟨S sem⟩ = ⟨VP sem⟩
⟨S sem arg0⟩ = ⟨NP sem⟩

Page 25: Computational Semantics - Universitetet i oslofolk.uio.no/jtl/Sli360/Komp/CompSem.pdfcomputational linguistics. Semantics belongs to linguistics as much as syntax or phonology, hence

25

Word runs
⟨cat⟩ = VP
⟨sem rel⟩ = run

The first equation in the sentence rule corresponds to LFG's ↑=↓. (This is not the way it is done in G&M's chapter 8, however.)

How the semantic representation shall best be implemented, and how one should conceive of what one is doing with different implementations, is currently under active investigation. We will not go into more detail here, but just point out the two different perspectives that are both consistent with our current practice.

2.3 Propositional connectives

We have so far considered simple predicate-argument structures. Our main extension will come when we turn to quantified noun phrases. But let us first consider some simpler extensions, starting with the sentential connectives: conjunction, disjunction and negation. A simple way to introduce conjunction and disjunction would be as follows:

:- op(600, yfx, [&, or]).
s(S1 & S2) --> s(S1), [and], s(S2).
s(S1 or S2) --> s(S1), [or], s(S2).

It would have been more correct linguistically to introduce only one rule here, together with lexical entries for the connectives, say:

:- op(600, yfx, [&, or]).
s(S1 C S2) --> s(S1), conj(C), s(S2).
conj(&) --> [and].
conj(or) --> [or].

But Prolog does not allow this. Instead of choosing a more complex solution, we go for the shortcut and introduce the connectives syncategorematically.

There is another problem with the semantic rules we proposed, however. Even though they are descriptively sound, as everyone who has tried to write a DCG in Prolog knows, such rules are dangerous and will cause serious problems for the program. The reason is that the rules are left-recursive, and Prolog has problems with such rules due to the combination of a leftmost-first choice function in the linear input resolution procedure and a depth-first search strategy. A standard way out is to introduce an additional meta symbol:


s(S1 & S2) --> s1(S1), [and], s(S2).
s(S1 or S2) --> s1(S1), [or], s(S2).
s(Sem) --> s1(Sem).
s1(Sem) --> np(N), vp(N^Sem).
vp(Sem) --> v(N^Sem), np(N).
np('JOHN') --> [john].
np('MARY') --> [mary].
vp(X^'RUN'(X)) --> [runs].
v(X^Y^'LOVE'(Y, X)) --> [loves].

This trick has one drawback: it undergenerates. The following sentence is semantically ambiguous according to where we "put the parentheses", while the program will only yield one parse:

s(S, [john,runs,and,mary,loves,john,or,john,loves,mary], []).

S = 'RUN'('JOHN') & ('LOVE'('MARY','JOHN') or 'LOVE'('JOHN','MARY')) ;
no

(Observe that the original left-recursive program would ascribe two different parses with corresponding non-equivalent semantic representations.) We will leave it for the exercises to write a program that yields all possible readings of conjoined and disjoined sentences.
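The ambiguity that the trick loses can be made explicit by enumerating all bracketings of a flat string of sentences and connectives. The following Python sketch (a helper of our own, not part of the Prolog program) finds exactly the two readings for the example above:

```python
def readings(tokens):
    """All bracketings of a flat token list like ['p', 'and', 'q', 'or', 'r'].

    Atoms sit at even indices, connectives at odd indices.
    """
    if len(tokens) == 1:
        return [tokens[0]]
    result = []
    for i in range(1, len(tokens), 2):       # split at each connective
        for left in readings(tokens[:i]):
            for right in readings(tokens[i + 1:]):
                result.append((tokens[i], left, right))
    return result

two = readings(['p', 'and', 'q', 'or', 'r'])
# [('and', 'p', ('or', 'q', 'r')), ('or', ('and', 'p', 'q'), 'r')]
```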

Let us here instead move on to negation. While conjunction and disjunction behave rather similarly in logic and in natural languages, there are larger differences when it comes to negation. While in logic we place the negation in front of the sentence, it turns up in the middle of the natural language sentence. This is not too big a problem as long as the sentences are as simple as the ones we have considered so far. We can even choose whether we want to regard negation as a verb modifier or a VP modifier. As the simplest thing is to keep it as a VP modifier, which gives the possibility of handling transitive and intransitive verbs uniformly, we will choose that option:

:- op(600, yfx, [&, or]).
% :- op(500, fx, -).   % if not already defined
s(S1 & S2) --> s1(S1), [and], s(S2).
s(S1 or S2) --> s1(S1), [or], s(S2).
s(Sem) --> s1(Sem).
s1(Sem) --> np(N), vp(_, N^Sem).
vp(_, Sem) --> v(N^Sem), np(N).
vp(neg, N^(- Sem)) --> [does, not], vp(pos, N^Sem).
vp(_, X^'RUN'(X)) --> [runs].


We have used "-" for negation; this will normally be a defined operator (for mathematical purposes). The interesting clause is the penultimate one. Observe how the negation goes "inside the abstracted NP". We have included an extra argument in the vp: a "flag" against repeated negation. It will block constructions like does not does not run. There is one shortcoming in the present program: it does not consider inflection. Thus it will either accept both of runs and does not runs or none of them, and similarly for run and does not run. We leave it for the exercises to fix this.

2.4 Exercises

EXERCISE 2.1

The approach we chose in proposals 2 and 3 was only one possible way out of the problems for the first proposal. Another possibility would have been to use Prolog's facilities for transforming lists into terms. For example, the following would be a fragment with intransitive verbs:

s(Sem) --> np(N), vp(VSem), {Sem =.. [VSem, N]}.
np('JOHN') --> [john].
vp('RUN') --> [runs].

Extend this with transitive verbs. We suppose that the lexical entries are of the form:

v('LOVE') --> [loves].

(There is no obvious way to do this.) The clumsiness of the more complex rules, together with the extension of the DCG rules with {}-clauses and the use of the "impure" =.., were reasons not to choose this format.

EXERCISE 2.2

Another route out of the problems for proposal 1 would have been the following. When we represent first order logical formulas in Prolog, we are not forced to represent relations by relations; we can as well represent relations by atoms if we introduce a series of new relations stand_in_1, stand_in_2, stand_in_3, etc. We could then have represented

Run(john)        by stand_in_1(run, john)
Love(john, mary) by stand_in_2(love, john, mary)

In Prolog, we could have used the same name, stand_in, for the two relations, as Prolog herself distinguishes relations by arity, and reads the arity of a relation off the number of arguments with which it occurs. The following simple rule would work for intransitive verbs, but has no immediate extension to transitive verbs.


s(stand_in(VSem, N)) --> np(N), vp(VSem).
np('JOHN') --> [john].
vp('RUN') --> [runs].

On the way towards transitive verbs, two modifications will be made. The first is that we will introduce an operator for the relation stand_in, thereby getting a syntax that looks more similar to the original first order formula. The fragment with intransitive verbs would then look like:

:- op(150, yfx, ::).
s(VSem::N) --> np(N), vp(VSem).
np('JOHN') --> [john].
vp('RUN') --> [runs].

To extend this to transitive verbs, we could use the same observation as in Montague grammar. Instead of regarding love as a relation taking two arguments to yield a truth-value, we can consider it a function which first takes one argument, the object NP, to yield a predicate, i.e., something that needs yet another argument to produce a truth-value. Thus John loves Mary is represented by Love(mary)(john), which has immediate constituents Love(mary) and john, rather than Love(john, mary).

Use this idea to extend the fragment with transitive verbs and with negation.

This way of representing logical formulas, using atoms also for predicates and special operators to connect them, can be found in the literature. It might perfectly well be combined with our approach in proposal 3, thus making use of representations like X^run::X. How will Prolog parse this: as (X^run)::X or as X^(run::X)? Try to answer the question and check your answer by using the (DEC-10) built-in predicate "display".

EXERCISE 2.3

Extend the last program from section 2.1 with rules for ditransitive verbs to be able to handle sentences like John gave Mary Lassie.

EXERCISE 2.4

Modify the last program from section 2.3 such that the interplay between negation and inflection comes out right.

EXERCISE 2.5

Extend the program from section 2.3 with rules for conjunction and disjunction of VPs.


EXERCISE 2.6

Extend the program with rules for sentences of the forms "if S then S", "S if S" and "S only if S". Assume the interpretations from (propositional) logic. Consider two variants of the rules: first one where the formulae use a new operator corresponding to →, and then a variant where all the representations only use the basic connectives &, or, -.

EXERCISE 2.7 (HARD)

Modify the program with conjunction and disjunction such that it at the same time generates all possible semantic representations and avoids the problems of left-recursion. Use a DCG grammar with no {}-clauses.

(Hints: (i) Take as an example how "correct" parse trees were returned after left-recursion was removed in G&M. (ii) Do not get confused by the fact that the right-hand side of the rule "S → S and S" contains two S-es. It is only the first S that causes a problem. The second S can be handled as if it were any category.)


3. Quantified Noun Phrases

3.1 Quantifiers

We will now turn to sentences with quantified noun phrases, like Every man loves a woman. In logic, quantifiers are usually represented as follows:

Every girl runs.            ∀x(Girl(x) → Run(x))
Some girl runs.             ∃x(Girl(x) & Run(x))
Every girl loves some boy.  ∀x(Girl(x) → (∃y(Boy(y) & Love(x,y))))

As this constructs a formula from three constituents, a quantifier, ∀, a variable, x, and a formula, Girl(x) → Run(x), the most similar representation we could get in Prolog would have been

every(X, girl(X) ==> run(X))
some(X, girl(X) & run(X))

We shall choose a slightly different option: to use so-called generalized quantifiers with two arguments, which will result in Prolog relations with three arguments, one extra argument for the variable. The representations will be:

'EVERY'(X, 'GIRL'(X), 'RUN'(X))
'SOME'(X, 'GIRL'(X), 'RUN'(X))
'EVERY'(X, 'GIRL'(X), 'SOME'(Y, 'BOY'(Y), 'LOVE'(X,Y)))

The point of this move is to get a representation which is more transparent because it has a syntax more similar to natural languages. In particular, the existential and universal quantifiers will get the same type of treatment. In addition, this notation can be used for non-logical quantifiers like most, which are inherently binary, i.e., cannot be represented with their intuitive meaning by a unary quantifier and propositional connectives (cf. Barwise and Cooper 1981). We will postpone the semantics of such quantifiers to a later section, but the translation procedures we define will work as well for them as for the logical ones.
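For readers who want the truth conditions already now: a binary quantifier relates two sets, the restriction and the body. Here is a hedged sketch in Python over a finite domain (the function names and the toy model are ours, for illustration only):

```python
# Truth conditions for binary (generalized) quantifiers over a finite domain,
# in the spirit of Barwise and Cooper (1981). Illustrative sketch only.
def every(domain, restr, body):
    return all(body(x) for x in domain if restr(x))

def some(domain, restr, body):
    return any(body(x) for x in domain if restr(x))

def most(domain, restr, body):
    # 'most' is inherently binary: it compares the body set
    # against the restriction set, not against the whole domain
    r = [x for x in domain if restr(x)]
    return sum(1 for x in r if body(x)) > len(r) / 2

domain = {'ann', 'mary', 'john', 'peter'}
girl = {'ann', 'mary'}.__contains__
run = {'ann', 'mary', 'john'}.__contains__

every_girl_runs = every(domain, girl, run)   # EVERY(X, GIRL(X), RUN(X))
```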

With these representations, our first shot at the S-rule and the rule for quantified NPs will be:

s(Q(Var, Restr, Body)) --> np(Q(Var, Restr, _)), vp(Var^Body).
np(Q(Var, Restr, _)) --> det(Q), n(Var^Restr).
n(X^'GIRL'(X)) --> [girl].
det('EVERY') --> [every].

But, as before, this does not work, due to the lack of variables for relations in Prolog. Also as before, there are several ways out.


The first possibility is to follow a similar track as in proposal 2 in section 2. This could be done by the following rules:

s(Sem) --> np(Var, Body, Sem), vp(Var^Body).
np(Var, Body, Sem) --> det(Var, Restr, Body, Sem), n(Var^Restr).
n(X^'GIRL'(X)) --> [girl].
det(X, Restr, Body, 'EVERY'(X, Restr, Body)) --> [every].

We will apply this approach in the following. How will the rule for introducing transitive verbs, VP → V NP, be with this approach? The problem is to combine the semantic representation of a verb with that of a quantified NP in the proper way. We will deduce it with a form of backward reasoning. What we know is what we want the semantic representation of a sentence like every girl loves some boy to be, namely 'EVERY'(X, 'GIRL'(X), 'SOME'(Y, 'BOY'(Y), 'LOVE'(X,Y))). If this is put into the rule for S → NP VP, we see that the representation corresponding to the VP loves some boy must be X^'SOME'(Y, 'BOY'(Y), 'LOVE'(X,Y)). But to get this from the already chosen representations for the verb loves and the NP some boy, we see that the following rule will do the job:

vp(Sub^Sem) --> v(Obj^Sub^Body), np(Obj,Body,Sem).

This rule may seem anything but obvious. The reader is therefore strongly urged to work her way through the details of a parse of the example sentence.
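As an aid for that hand-parse, the unifications can be simulated in Python (a sketch with invented helper names, not the Prolog program itself). Each NP is treated as a function from a "body with a hole" to a complete formula, and the VP rule lets the object NP build its quantifier around the verb:

```python
def v_loves(obj):                 # X^Y^'LOVE'(Y, X), curried: object first
    return lambda sub: ('LOVE', sub, obj)

def np_some_boy(body_of):         # np(Obj, Body, Sem) as a function:
    y = 'Y'                       # given the body built from the bound
    return ('SOME', y, ('BOY', y), body_of(y))   # variable, wrap SOME around it

def np_every_girl(body_of):
    x = 'X'
    return ('EVERY', x, ('GIRL', x), body_of(x))

def vp_loves_some_boy(sub):       # vp(Sub^Sem) --> v(Obj^Sub^Body), np(Obj, Body, Sem)
    return np_some_boy(lambda obj: v_loves(obj)(sub))

s = np_every_girl(vp_loves_some_boy)
# ('EVERY', 'X', ('GIRL', 'X'), ('SOME', 'Y', ('BOY', 'Y'), ('LOVE', 'X', 'Y')))
```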

3.2 Adjectives and restrictive relatives

In a fragment with quantified common noun phrases, it is natural to introduce common noun modifiers: adjectives and restrictive relatives. To begin with adjectives, we will restrict attention to the adjectives that are called predicative or intersective.

An adjective α is predicative if for all common noun phrases β and for all names γ, γ is an α β has the same meaning as γ is α and γ is a β.

Red and square are examples of predicative adjectives. Something is a red ball if it is red and it is a ball; something is a square piece of cheese if it is square and a piece of cheese. Small is an adjective which is not predicative. Jumbo might be a small elephant without being a small animal. We cannot then analyse being a small elephant as being small and being an elephant, as that would turn Jumbo into a small animal, given that Jumbo is an animal.
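In model theoretic terms, a predicative adjective simply denotes a set, and modification is intersection. A quick Python sketch of the contrast, with a toy model of our own:

```python
# Predicative adjectives: modification is set intersection.
balls = {'b1', 'b2'}
red = {'b1', 't1'}               # t1 is red but not a ball
red_balls = red & balls          # {'b1'}: red AND a ball

# Treating non-predicative 'small' the same way goes wrong:
animals = {'jumbo', 'rex'}
small = {'jumbo'}                # pretending 'small' denoted a fixed set
# jumbo would then be in (small & animals), wrongly a "small animal"
wrongly_small_animal = 'jumbo' in (small & animals)
```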

At this stage it should be obvious how we want to represent predicative adjectives in a restrictive position:


n(X^(R1 & R2)) --> adj(X^R1), n(X^R2).
adj(X^'RED'(X)) --> [red].

With this representation for predicative adjectives, it should not be too difficult to figure out how predicative adjectives in predicate position can be handled, e.g., The ball is red.

Next, we consider restrictive relatives. As they are also noun modifiers, we expect them to be treated similarly to adjectives. A first shot, with a very primitive analysis of the internal structure of the relatives, could be:

n(X^(R1 & R2)) --> n(X^R1), r(X^R2).
r(Sem) --> [that], vp(Sem).

This would nearly work for simple constructions like every man that runs. But it would not cover relatives like every man who Mary loves. A more refined analysis of the inner structure of the relatives is necessary. A possibility is to use the slash notation used in G&M for relatives and try to extend that analysis with semantics.

Another problem is that the rule introducing the restrictive relative is left-recursive. One might ask whether that is necessary. At least for Norwegian it is often claimed that nouns only combine with one restrictive relative clause. If that is the case, the left recursion can be avoided by introducing an additional category for nouns, e.g.

np --> det, n1.
n1 --> n.
n1 --> n, r.

or, more simply, by contracting the rules into:

np --> det, n.
np --> det, n, r.

Such a solution will give us temporary relief from the pain of left-recursion, but the pleasure may be brief. Restrictive relatives may be conjoined, as in Every man who loves Lisa (and) who hates Peter, thereby reintroducing a problem of left-recursion. So whether we allow a noun to combine with more than one relative or not, we get a case of left-recursion. G&M (p. 158) contains a proposal for writing a grammar without left recursion for common nouns modified by several PPs, and they show how that grammar can be extended with extra arguments that build trees corresponding to the syntax of the first rule above. This approach can also be used for returning the proper semantics (cf. exercise 2.7 above). We leave this task as an exercise.
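The accumulator idea behind the G&M trick can be sketched in Python: consume the modifiers in a loop that is free of left recursion, while still building the left-nested conjunction the first rule would have produced. The function names here are illustrative only:

```python
def noun_with_modifiers(noun_sem, modifier_sems):
    # builds (((n & r1) & r2) & ...) without left recursion:
    # each new modifier is attached to the semantics built so far
    sem = noun_sem
    for r in modifier_sems:
        sem = ('&', sem, r)
    return sem

sem = noun_with_modifiers(('MAN', 'X'),
                          [('LOVE', 'X', 'LISA'), ('HATE', 'X', 'PETER')])
# ('&', ('&', ('MAN', 'X'), ('LOVE', 'X', 'LISA')), ('HATE', 'X', 'PETER'))
```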


3.3 Proper nouns

In section 2 we discussed proper nouns; here we have discussed quantified NPs. Unfortunately, we have ended up with two sets of rules, one for each type of NP. Is there any possibility of getting a uniform treatment of the two? Consider the two rules for S → NP VP:

s(Sem) --> np(Var, Body, Sem), vp(Var^Body).
s(Sem) --> np(N), vp(N^Sem).

Some moments' reflection should convince the reader that the last rule can be subsumed under the first one if the following type of entry for proper nouns is assumed:

np('JOHN', Body, Body) --> [john].

For the moment we will not try to give this rule any other explanation than that it works. We will look at a possible theoretical analysis in the next subsection. We have so far only considered the rule where an NP combines with a VP to form a sentence. What about a VP which consists of a transitive verb and a proper noun? The reader should convince herself that the new rule for introducing names as NPs works as well in this case by hand-parsing an example sentence like John loves Mary.

3.4 More on theoretical foundations

In section 2, we did not stop with proposal 2, but went one step further in the semantic representations to proposal 3. Then we showed in subsection 2.2 how these new representations could be given a logical interpretation. Is something similar possible for the extended fragment with quantifiers?

We will in this section assume that the reader is familiar with somewhat more Montague grammar and λ-calculus than in subsection 2.2. We will not give the necessary background here. Readers who are not familiar with Montague grammar can safely proceed to section 4, which will not presuppose this subsection.

For our purposes, it is sufficient to consider an extensional version of the λ-calculus with two basic types e and t. Complex types are built by the familiar rule: whenever α and β are types, then so is (α, β). From the formation rules, we remember the following:

• what are called terms in first order logic, in particular constants and variables, are here terms of type e,

• formulas are also called terms of type t,

• there are variables of all types. The type of a variable is often indicated by a subscript, e.g. X(e, t). Here we will follow the conventions of using small letters towards the end of the alphabet for variables of type e and capitals towards the end of the alphabet for variables of type (e, t),


• when A is a term of type (α, β) and B is a term of type α, A(B) is a term of type β. In particular it is a formula if β equals t,

• when A is a variable of type α and φ is a term of type β, λA[φ] is a term of type (α, β),

• when A is a variable (of any type) and φ is a formula (= term of type t), ∀A(φ) and ∃A(φ) are formulae.

In the discussion in the sequel we will assume that quantifiers are unary (cf. the representation in exercise 3.4). We could have included binary quantifiers, say every2 corresponding to the Prolog representation, by a rule like: "whenever A is a variable and φ and ψ are formulae, then ∀A(φ, ψ) is a formula."

In Montague's original paper, The Proper Treatment of Quantification in Ordinary English (PTQ), quantifiers are introduced syncategorematically. But the following type of representations, which have become customary in later approaches, is compatible with Montague's approach:

every  λX(λY(∀z(X(z) → Y(z))))
girl   λu(girl(u))
run    λv(run(v))

The only semantic rule one would need would be functional application, e.g. if VP' is the semantics of the VP and NP' is the semantics of the NP, NP'(VP') is the semantics of the sentence. This would result in the following representations:

every girl       λX(λY(∀z(X(z) → Y(z))))(λu(girl(u)))
every girl runs  λX(λY(∀z(X(z) → Y(z))))(λu(girl(u)))(λv(run(v)))

If we λ-reduce this expression, we will successively get the following formulae:

λY(∀z(λu(girl(u))(z) → Y(z)))(λv(run(v)))
∀z(λu(girl(u))(z) → λv(run(v))(z))
∀z(girl(z) → run(z))
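The reduction steps can also be checked semantically: in any finite model, the unreduced application and the fully reduced formula get the same truth value. A Python sketch with a toy model of our own:

```python
domain = {'a', 'b', 'c'}
girl = lambda z: z in {'a', 'b'}
run = lambda z: z in {'a', 'b', 'c'}

# the determiner: lambda X (lambda Y (forall z (X(z) -> Y(z))))
every = lambda X: lambda Y: all((not X(z)) or Y(z) for z in domain)

# unreduced: apply the determiner to the noun, then to the VP
unreduced = every(lambda u: girl(u))(lambda v: run(v))
# fully reduced: forall z (girl(z) -> run(z))
reduced = all((not girl(z)) or run(z) for z in domain)
```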

This last representation is, of course, the type of representation our program computes. Thus, our piece of Prolog code can be conceived as a program which calculates semantic interpretation by functional application and simultaneously executes λ-reductions. In a way, this means that

det(X, Restr, Body, 'EVERY'(X, Restr, Body)) --> [every].

represents the content of

every  λX(λY(∀z(X(z) → Y(z))))


in our program. In subsection 2.1 the relationship between the underlying logic and the implementation became more transparent by the move from proposal 2 to proposal 3. Could there be a similar way to write our program for quantifiers which would make this relationship to logic more transparent?

The most immediate proposal would be the following:

det(R^B^'EVERY'(X, R(X), B(X))) --> [every].
s(Sem) --> np(Var^Sem), vp(Var).
np(Sem) --> det(Var^Sem), n(Var).

Ignoring for the moment that this is not Prolog code (variables for functions), it would have produced the following result for the sentence every girl runs:

'EVERY'(X, (Y^'GIRL'(Y))(X), (Z^'RUN'(Z))(X))

With the conventions established so far, this is a representation of the formula

"z(lu(girl(u))(z) ® lv(run(v))(z))

which is a correct rendering of the sentence, but it is not the most reduced one. What can we do to get the most reduced reading? By inspecting the last fragment and example, we see that we want to replace (Y^'GIRL'(Y))(X) with 'GIRL'(X). This can be anticipated by changing our last fragment into

det((X^R)^(X^B)^'EVERY'(X, R, B)) --> [every].
s(Sem) --> np(Var^Sem), vp(Var).
np(Sem) --> det(Var^Sem), n(Var).

This type of representation can be found in the literature, e.g. on page 100 of the book by Pereira and Shieber (1987). It should be observed that, computationally, this program is indistinguishable from the one in subsection 3.1. The overall effect is the same; we have only chosen slightly different data structures. What about representationally? Does this new program make more sense? If it shall make any sense, it must be by claiming that (X^R)^(X^B)^'EVERY'(X,R,B) represents exactly the same as R^B^'EVERY'(X,R(X),B(X)). The difference between the two is that in the former we have anticipated, or as it is called in computer science, partially evaluated, certain λ-reductions. This is the approach taken to these representations by Pereira and Shieber.

What about the proper nouns: is it possible to give a similar theoretical analysis with respect to them? Remember that a proper noun in Montague grammar corresponds to a term in the logical representation language, but that the NP with the name as sole constituent corresponds to a more complex term, as exemplified:

Mary       λX(X(mary))
runs       λv(run(v))
Mary runs  λX(X(mary))(λv(run(v)))


We can λ-reduce the last expression and see:

λX(X(mary))(λv(run(v)))
λv(run(v))(mary)
run(mary)

Let us try to represent this in Prolog similarly to the quantified NPs. First, λX(X(mary)) should be represented by X^(X(mary)), which yields the following entry, which combines with the same format of the sentence rule as above:

np(X^(X(mary))) --> [mary].
s(Sem) --> np(Var^Sem), vp(Var).

The result of this parser on Mary runs would have been (X^'RUN'(X))('MARY'). A similar partial evaluation as for the determiner turns the NP entry into:

np((mary^Y)^Y) --> [mary].

which together with the S-rule and the regular entry for runs will give 'RUN'('MARY') as the semantic value for Mary runs. What is computed is the same as in section 3.3. Theoretically, it can be interpreted as if we are operating within the typed λ-calculus and partially evaluating certain λ-reductions.
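The type-raised entry can likewise be sanity-checked in Python (illustrative names of our own): λX(X(mary)) is just a function that hands its predicate argument the individual mary, so the raised and the plain application coincide.

```python
mary_np = lambda P: P('MARY')     # the raised proper noun, lambda X (X(mary))
run = lambda x: ('RUN', x)        # lambda v (run(v)), formula as a tuple term

raised = mary_np(run)             # the Montague-style application
plain = run('MARY')               # the direct, unraised application
```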

This way of looking at semantic representations as λ-terms where λ-reductions are partially evaluated is under current investigation by the author. Among the questions considered are a closer study of what exactly it means that a Prolog term represents a term in the λ-calculus, under which conditions partial evaluation is legal, and how more complex semantic rules, corresponding to the rules of flexible categorial grammar, can be used in a Prolog type of setting.

3.5 Exercises

EXERCISE 3.1

Work out the details of the fragment/program described in the text in section 3.3, and write a lexicon with at least ten words in each category. The lexicon should contain intransitive and transitive verbs, proper nouns, common nouns, determiners (fewer than ten) and predicative adjectives. Consider both the possibility of handling the two classes of NPs differently and the possibility of giving them a uniform treatment (section 3.3).

EXERCISE 3.2

Give rules for the introduction of ditransitive verbs with quantified NPs (gives every girl a doll) and extend the lexicon accordingly.


EXERCISE 3.3

Extend your fragment/program from exercises 3.1 and 3.2 with rules for negation.

EXERCISE 3.4

Modify your program to use unary logical quantifiers:

every(X, girl(X) ==> run(X))
some(X, girl(X) & run(X))

EXERCISE 3.5

Write a program that translates formulae in the "binary" quantifier language into formulae in the language with "unary" quantifiers, e.g. translating (a) as (b).

a) 'EVERY'(X, 'GIRL'(X), 'SOME'(Y, 'BOY'(Y), 'LOVE'(X,Y)))
b) 'EVERY'(X, 'GIRL'(X) ==> 'SOME'(Y, 'BOY'(Y) & 'LOVE'(X,Y)))

EXERCISE 3.6

Another possibility for the problems we encountered at the beginning of section 3, concerning the lack of variables for predicates and functions in Prolog, would have been to use a similar strategy as in exercise 2.2. Thus we could introduce a new relation with four arguments, q, and write

q(every, X, girl(X), run(X))
q(some, X, girl(X), run(X))

for the atomic formulas. Then the program could be written:

s(q(Q, Var, Restr, Body)) --> np(q(Q, Var, Restr, _)), vp(Var^Body).
np(q(Q, Var, Restr, _)) --> det(Q), n(Var^Restr).
n(X^'GIRL'(X)) --> [girl].
det('EVERY') --> [every].

Modify the rest of your fragment to correspond to this treatment of quantified noun phrases. Observe that if this option is chosen, it is not as easy as before to get a uniform treatment of quantified NPs and proper nouns.

EXERCISE 3.7

Write DCG rules with semantics for restrictive relatives. The rules should cover the following types of constructions:


who runs
who loves Mary
whom Mary loves
who handed Mary a parcel
whom Mary handed a parcel
which Mary handed John

(If you find the inflection of the wh-word, including the animate/inanimate distinction, too intricate, then use that in all the cases.)

EXERCISE 3.8

Extend the alternative rules for introducing restrictive relatives into an NP presented in the text with semantic arguments.

EXERCISE 3.9

Write DCG rules with semantic arguments for copula constructions with NPs and APs (and forget about PPs), e.g. is red, is a happy girl.

EXERCISE 3.10

We have so far avoided PP constructions. One reason is that their semantics is not always transparent; another is that they introduce several syntactic problems. We shall here try to incorporate PPs that are adjoined to NPs. We assume the proper semantic representation of a sentence like Every man in Oslo runs to be 'EVERY'(X, 'MAN'(X) & 'IN'(X,'OSLO'), 'RUN'(X)). This indicates that the PP has a similar semantic function as a restrictive relative. The easiest seems to be to assume that the PP is adjoined to an N rather than an NP.

a) Provide semantic arguments for the rules

n --> n, pp.
pp --> p, np.

which together with the other rules will generate the indicated representation for Every man in Oslo runs. You may assume that the NP in the last clause is a proper noun, and you do not have to worry about left recursion.

b) Modify the rules to handle the case when the NP in the PP is a quantified one.

c) Modify the rules to avoid left-recursion. The N should be able to combine with any number of PPs.

d) Syntactically, there might be reasons for claiming that the PP is adjoined to the full NP. Try instead to extend the rules

np --> np, pp.
pp --> p, np.

with semantic arguments in such a way that the sentences get the same semantics. Solve this task first without worrying about left-recursion, i.e. extend the rules as such with semantic arguments. Then try to get rid of left-recursion.


4. Evaluation

4.1 The model

We will now consider the second task, the use of the semantic representations computed so far. The use we have in mind is the simulated evaluation described in section 1.3, where the computer contains a (description of a) model of the world. As our basis is formal semantics, this should be a model in the model theoretic sense, consisting of a set of individuals, D, and an interpretation function, I, which to each individual constant a yields a member I(a) of D; to each unary relation or predicate, P, in the language, I yields the extension of that predicate, I(P), which is a subset of D; etc. How shall this be represented in Prolog? Here we have considerable freedom. It is not necessary to mention the I-function explicitly or to write the extension of a predicate as a set. The important thing is that we as programmers and users understand what the clauses in the database represent. We shall therefore follow a simple option. To represent

I(run) = {I(john), I(mary), I(ann)},

we write

fact('RUN'('JOHN')).
fact('RUN'('MARY')).
fact('RUN'('ANN')).

Similarly we write

fact('SEE'('JOHN', 'MARY')).

to represent

⟨I(john), I(mary)⟩ ∈ I(see),

and so on. This will work fine as long as we have a convention for what it means. We will not mention the function I explicitly; it will be incorporated into the relation fact. Nor will we bother to transform the relation run into a term which can be an argument for Prolog, though we will consider this option in an exercise. As discussed in section 1.3, what is represented can as well be considered to be the simple sentence John runs, together with the fact that we regard the sentence as true.

Assume the model to be represented as facts of the form above, which have been consulted and have become a part of Prolog's own database. The next task is to write a predicate true which takes one argument, a logical formula representing a sentence, and answers yes or no depending on whether the model (the fact base) supports the truth of the actual formula or not. The definition of true for atomic sentences will be very simple:

true(F) :- fact(F).
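The same convention can be mirrored in Python, where the "model" is simply a set of atomic facts and atomic truth is membership (a sketch; the fact tuples are our own encoding):

```python
# The fact base: each atomic fact is a tuple (relation, arg1, ...).
facts = {('RUN', 'JOHN'), ('RUN', 'MARY'), ('RUN', 'ANN'),
         ('SEE', 'JOHN', 'MARY')}

def true_atomic(formula):
    # the analogue of  true(F) :- fact(F).  -- membership in the fact base
    return formula in facts
```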

Page 40: Computational Semantics - Universitetet i oslofolk.uio.no/jtl/Sli360/Komp/CompSem.pdfcomputational linguistics. Semantics belongs to linguistics as much as syntax or phonology, hence


4.2 Connectives

For evaluating conjunction and disjunction, we can use Prolog's own conjunction and disjunction:

true(A & B) :- true(A), true(B).
true(A or B) :- true(A) ; true(B).
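As a quick sanity check, with the fact base above a session might run as follows. This is a hypothetical transcript: it assumes & and or have been declared as infix operators (e.g. by :- op(600, xfy, &) and :- op(700, xfy, or)), and the exact answers printed depend on the Prolog system.

```prolog
?- true('RUN'('JOHN') & 'SEE'('JOHN', 'MARY')).
yes
?- true('RUN'('JOHN') or 'RUN'('KIM')).
yes
?- true('RUN'('KIM') & 'RUN'('JOHN')).
no
```

The last query fails because the fact base contains no clause fact('RUN'('KIM')).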

Negation is much harder, at least in principle, because Prolog does not contain anything corresponding directly to logical negation. As we remember, Prolog cannot express all logical formulae directly; it is restricted to so-called Horn clauses. Prolog contains a built-in meta-predicate not, which is not a correct implementation of logical negation. It corresponds to logical negation, however, if we make some extra assumptions. It turns out that if we make the same extra assumptions for the representation of the model for the natural language we study, we can use Prolog's not for the not of English. We will explain this in a little more detail.

To understand what is going on, we have to take one step back and ask what the Prolog machine actually does. Given a program G and a query a, Prolog tries to check whether a (read as a formula of first order logic) is provable from G (read as a set of formulae of first order logic). Prolog answers yes if a is provable from G and no if it is not provable. If Prolog is given a query not b to the same program, Prolog checks whether b is provable from G or not. If b is provable, Prolog answers no; if b is not provable, Prolog answers yes. But the fact that b is not provable and the fact that ¬b is provable are in general not the same. If G is consistent, which we will assume it is, we cannot prove both b and ¬b from G. But it is fully possible that neither b nor ¬b is provable. In this case, Prolog will answer yes to the query not b. However, it would have been more correct to answer no. The provability of ¬b entails the non-provability of b, but not the other way around.

Let us illustrate this with a small example. Suppose that G = {run(mary), laugh(john)}. Then neither run(john) nor ¬run(john) is a logical consequence of G. If the Prolog program G is given the query :- run(john), it will answer no; hence if it is given the query :- not run(john), it will answer yes.

Which extra assumption will make the non-provability of b entail the provability of ¬b? The answer is often called the closed world assumption. It says, roughly, that if an atomic formula is not provable from the program G, then we can assume the formula to be false. In other words, if we ask a query containing not against a program G, we are not asking whether the query is a logical consequence of the program G, but whether it is a logical consequence of a new theory S, where S is G extended with the negation of each atomic sentence which is not a theorem of G. Under this interpretation, it is correct to read not as logical negation.

Going back to our example, assuming the actual vocabulary to contain exactly two names, john and mary, and two unary predicates, run and laugh: when we ask the query :- not run(john), we interpret this as asking whether the formula ¬run(john) is provable from the set S = {run(mary), laugh(john), ¬laugh(mary), ¬run(john)}. Then it is logically correct to answer yes.


What we have to do to get Prolog's not to do the job also in our natural language program is to make exactly the same type of assumption. That is, we must assume that all atomic facts which are not explicitly listed in our representation of the model are false. For example, if the model is {fact(run(mary)), fact(laugh(john))}, we must assume that in this model I(mary) ∉ I(laugh). This should not be too much of a problem, as long as we remember that this is the way the program works. Also in formal semantics, we describe the interpretation of the relation symbols by describing their positive extension only. We do not list the tuples that do not satisfy the relation. There is a difference, however, between the way we normally represent models in formal semantics and the type of representations we build in Prolog. In the Prolog representation we have to be explicit; we have to name all individuals in the domain and be explicit about their properties, while in formal semantics we have the possibility to specify the denotations more indirectly, say as the set of even numbers. In particular, this entails that our representation of semantic models in Prolog will only work for finite domains.

If we are conscious of what we are doing, the clause for negation becomes as trivial as

true(- P) :- not true(P).
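To see the closed world assumption at work, consider again a model containing exactly fact('RUN'('MARY')) and fact('LAUGH'('JOHN')). A hypothetical session (Prolog's standard prefix minus already lets - P parse as a term; the printed answers depend on the system):

```prolog
?- true(- 'RUN'('JOHN')).
yes        % run(john) is not provable, so its negation is assumed to hold
?- true(- 'RUN'('MARY')).
no         % run(mary) is an explicit fact
```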

4.3 Quantifiers

Let us then face the real question: how to handle quantified NPs. We assume the following types of representations

'EVERY'(X, 'GIRL'(X), 'RUN'(X))
'SOME'(X, 'GIRL'(X), 'RUN'(X))

We start with the existential quantifier. Our first proposal, which we will have to refine in the next section, is as simple as:

true('SOME'(X, P, Q)) :- true(P), true(Q).

To see how this works, we consider the simple call

:-true('SOME'(X, 'GIRL'(X), 'RUN'(X))).

This call then continues as

:- true('GIRL'(X)), true('RUN'(X)).

If the first of these succeeds, it means that for some name or other, say anna, the following is a fact:

fact('GIRL'('ANNA')).


After the engine has discovered this, the variable X will be bound to anna, so the next thing to check is whether

:- true('RUN'('ANNA')).

This will again succeed if and only if the base contains the fact

fact('RUN'('ANNA')).

This little piece of programming exploits Prolog's use of variables to do most of the job. One word of caution: we have represented variables in the logical language by variables in Prolog. But while variables in the logical language may be bound or free, all variables in the Prolog representations are free. This may cause problems. Look at the pair:

'EVERY'(Y, 'MAN'(Y) & 'SOME'(X, 'HORSE'(X), 'OWN'(Y,X)),
        'SOME'(X, 'WOMAN'(X), 'LOVE'(Y,X)))

∀y (Man(y) & ∃x(Horse(x) & Own(y,x)) →
    ∃x (Woman(x) & Love(y,x)))

The Prolog expression may be taken to represent the logical formula, but it is not possible to use the Prolog representation together with our interpretation routine, since the variable X is used twice for a bound variable. We must hence be careful when we write our programs to see that such representations are not generated by accident. If we are careful, it works fine to use Prolog's variables for representing bound variables.

The evaluation of universal quantification is a bit harder than that of the existential one. There is no obvious way to implement it directly. One possibility is to use the laws of logic that let us define every in terms of some by the use of negation. The following formulas are always equivalent:

∀y (φ → ψ)

¬∃y ¬(φ → ψ)

¬∃y (φ & ¬ψ)

We can then write the clause for the universal quantifier as

true('EVERY'(X,P,Q)) :- true(- 'SOME'(X, P, - Q)).
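For example, if the model contains exactly fact('GIRL'('ANNA')), fact('GIRL'('LEE')), fact('RUN'('ANNA')) and fact('RUN'('LEE')), the clause (here written with the quoted quantifier names used elsewhere for the representations) should behave as follows in a hypothetical session:

```prolog
?- true('EVERY'(X, 'GIRL'(X), 'RUN'(X))).
yes        % no girl can be found who is not also listed as running
```

The inner call true('SOME'(X, P, - Q)) tries each girl in turn and fails for all of them, so the outer negation succeeds.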

Universal quantification may be the point where the shortcomings caused by the closed world assumption and the restriction to finite domains become most obvious. To take one example, a sentence like Every swan is white will come out true if and only if all the swans which we have given a name and introduced as swans in our model are white. But it is of course not realistic that we can have introduced all the swans in the world this way.


4.4 Quantification and negation

As we have seen, neither negation nor quantification can be directly represented in Prolog. But we have proposed ways to implement them and illustrated with some simple examples that the implementations are correct. But what happens when the examples get more involved and quantification and negation interact? Consider the following model

fact('GIRL'('LEE')).
fact('BOY'('KIM')).

and try to evaluate the following two formulas

a) true('SOME'(_552,'GIRL'(_552),- 'BOY'(_552))).
b) true('SOME'(_552,- 'BOY'(_552),'GIRL'(_552))).

What happens? The first query (a) is evaluated to true, as it should be, and returns the witness LEE. So far, so good. This means that if the fragment were extended to contain a sentence like Some girl is not a boy, the evaluation component would work fine. But what about query (b)? To our surprise, or at least to our disappointment, Prolog will answer no.

Of course we would like the same answer to the two queries, just as the following two formulas are equivalent:

∃x(Girl(x) & ¬Boy(x))
∃x(¬Boy(x) & Girl(x))

Why do we not get the correct answer to query (b)? The reason is that to prove (b), Prolog will try to prove:

not true('BOY'(_552))

But this will not succeed, since there is a boy. Rather than looking for someone who is not a boy, Prolog will look for a boy, and when Prolog succeeds in finding a boy (KIM), the query true('BOY'(_552)) comes out true, and the query not true('BOY'(_552)) fails. In other words, our trick for implementing the existential quantifier, which works for ∃x(Girl(x)), confuses ∃x(¬Boy(x)) and ¬∃x(Boy(x)).

As a side remark, one should observe that this has nothing to do with our choice of binary quantifiers in the representation language. Exactly the same problem arises for the unary quantifier with the interpretation schema

true('SOME'(X, P)) :- true(P).

applied to the formulas


true('SOME'(_552,'GIRL'(_552) & - 'BOY'(_552))).
true('SOME'(_552,- 'BOY'(_552) & 'GIRL'(_552))).

What can be done to circumvent the problem? One possibility is to make sure that formulas like (b) are never generated. They do not occur in the fragment considered this far, since any quantified NP will contain a head noun which is not negated and which precedes any restrictive relatives, cf. some child who is not a boy… Moreover, if any adjectives precede the head noun, they will not be negated either.

But there is something discomforting about having logical formulae which are wrongly evaluated, even if they are not used for representation purposes. A more robust strategy would be to try to get all logical formulas evaluated correctly. Then we will not have to worry about the form of the formulas used when we later extend the fragment. To get the correct interpretation of examples like (b), we will have to adjust the interpretation rule for the existential quantifier. There are two clues to the improved implementation. First, we will adopt the restriction we already accepted to get a correct implementation of negation: all facts must be explicitly represented in the model. Second, we will consider why the implementation of the existential quantifier worked correctly when the first argument was not negated, and try to extend this property to the general case. We will hence assume that each model contains an explicit representation of its individuals, say of the form

is('LEE').
is('KIM').
is('SANDY').

The interpretation scheme for the existential quantifier may then make use of this as follows:

true('SOME'(X,P,Q)) :- is(X), true(P), true(Q).

This also yields the correct answer for (b). Since the query is('LEE') succeeds, the program will proceed to try to prove not true('BOY'('LEE')), and in contrast to not true('BOY'(_552)) this will come out true.
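Collecting the clauses developed in this chapter, the whole evaluator is only a handful of lines. The following is a sketch, not the author's literal program: the operator declarations for & and or are assumptions (Prolog's standard prefix minus already covers -), and the model is assumed to be given by fact/1 and is/1 clauses.

```prolog
:- op(600, xfy, &).            % assumed operator declarations
:- op(700, xfy, or).

true(A & B)          :- true(A), true(B).
true(A or B)         :- true(A) ; true(B).
true(- P)            :- not(true(P)).
true('SOME'(X,P,Q))  :- is(X), true(P), true(Q).
true('EVERY'(X,P,Q)) :- true(- 'SOME'(X, P, - Q)).
true(F)              :- fact(F).
```

With the girl/boy model and the is/1 clauses above, both queries (a) and (b) should now succeed: the call is(X) supplies a witness before the negated conjunct is tried.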


4.5 Exercises

EXERCISE 4.1

We will modify the representation of the model structure to look more like the one we use when we talk about models and evaluation in a purely logical setting. For this purpose we shall introduce the binary relation ext (for extension). The meaning is that ext(X,Y) holds if Y is the extension of the expression X, or in other words, if I(X) = Y. The following exemplifies a name and a binary relation, respectively.

ext('JOHN', j).
ext('LOVE', [[j,a], [a,h], [h,l], [l,j]]).

Change the evaluation rules accordingly. For relation symbols, you will need the "impure" Prolog predicate =.. , which holds between terms and lists, as in

love(john, ann) =.. [love, john, ann]

EXERCISE 4.2

One could think of another strategy for checking

'EVERY'(X,P,Q)

namely, to calculate the two sets {X | P} and {X | P & Q} and compare their cardinalities by using Prolog's built-in, "impure" setof. Write a piece of program which does this.

EXERCISE 4.3

We chose to represent the domain by asserting the fact is(a) for each name a. An alternative would be to put all the individuals into a set (list) and assert that this is the domain, say

domain(['KIM', 'SANDY', 'LEE']).

a) Adjust the interpretation rules to use domain instead of is.

b) Assume that the is-representation is used. Write a procedure that finds the domain X such that domain(X) holds.

c) Assume that the domain-representation is used. Write a procedure that asserts all the facts of the form is(a) for each name a. To be functional in a larger context, such a procedure should first retract all facts of the form is(a) before it starts asserting.

A reason why we preferred the more verbose use of is over domain is that it makes it easier to update the database when more individuals are included.


EXERCISE 4.4 (PROJECT)

We have so far only considered the semantics of declarative sentences. In this section we have seen how they can be evaluated in a model. But when we evaluate sentences against a model, it is more natural to consider them to be questions.

a) Write a DCG for a fragment of English yes/no-questions corresponding to the fragment of declarative sentences considered this far.

b) Extend the DCG with semantic arguments.

c) Build the DCG into a program such that each question is evaluated against the model represented by the atomic fact sentences.

EXERCISE 4.5 (PROJECT CONTD.)

Try to extend your grammar with Wh-questions like

Who loves Mary?
What did John give Ann?

and your answering program to give sensible answers.

EXERCISE 4.6 (ADVANCED PROLOG PROGRAMMING)

So far we have assumed that the programmer always updates the is-predicate when new facts involving new individuals are added to the database. It might be more robust, and save the programmer some work, to have a routine which calculates the domain from all facts of the form fact(…). Write a procedure which does this. It might involve some impure Prolog predicates like =.. and setof.

One might experience a problem in the use of setof here. We will illustrate with a simpler example. Assume we represent a is the father of b by father(a,b), e.g.

father(frank, nancy).
father(henry, jane).

Assume we want to calculate the set of fathers. The most straightforward idea is to write

fathers(X):- setof(Y, father(Y, Z), X).

But if we try this out, the result is as follows:


?- fathers(X).
X = [henry] ;
X = [frank] ;
no

What goes wrong here is the relative scope between the existential quantifier binding Z and the set formation. Prolog ascribes maximally wide scope to free variables. Thus in evaluating fathers(X), Prolog will ask whether there is a Z such that X = {Y | father(Y,Z)}. But what we want Prolog to ask is whether X = {Y | ∃Z father(Y,Z)}. There is a mechanism in Prolog for expressing this; try

fathers(X):- setof(Y, Z^father(Y, Z), X).

(So this is what the built-in operator ^ is used for!) Observe the similarity between this problem and the problem we experienced when the existential quantifier in the English fragment interacted with negation.


5. World, knowledge and inferences

5.1 Partial models of the world

We have so far assumed the set of Prolog facts of the form fact(…) (the model for the first order language) to be a model of the world. The program that evaluates English sentences against this database is a simulation of checking whether English sentences are true in the world. For example, if the database contains facts about rocks and minerals returned from the moon, the program should produce correct facts about the moon. An application of the program may save the user a journey to the moon to look for herself.

But there is another way to conceive of the database: not as a representation of the world as such, but as the representation of an agent's knowledge of the world. The database will then contain some, but not necessarily all, facts about the world, and it will be acceptable for the computer to answer I do not know when given a query.

To be able to distinguish between the computer knowing something to be false and not knowing it to be true, we must represent both positive and negative facts in the database, as exemplified by

true('GIRL'('LEE')).
false('BOY'('LEE')).
true('SEE'('KIM', 'LEE')).

We want the database to be consistent, i.e., it should not contain both true(a) and false(a) for any a, but we do not demand that it be complete; e.g., neither true('GIRL'('KIM')) nor false('GIRL'('KIM')) has to be part of the database.

To do the evaluation, we will need a predicate with an argument for the answer, with three possible answers: true, false, undefined. The basis for the predicate will then be

value(P, true) :- true(P), !.
value(P, false) :- false(P), !.
value(P, undefined).

Observe the use of cut (!) in the first two clauses to get undefined as the answer when neither of the first two clauses applies. When we add more clauses to the value procedure, it might be necessary to add some checking of the syntactic form of P to avoid, say, that the value undefined is returned as a possible value for any P; alternatively, cut must be included in all other definitions of value.
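With the three facts above, hypothetical queries illustrate the three-way behaviour (the exact form of the printed answers depends on the Prolog system):

```prolog
?- value('GIRL'('LEE'), V).
V = true
?- value('BOY'('LEE'), V).
V = false
?- value('GIRL'('KIM'), V).
V = undefined
```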

The intuition behind the propositional part is the following: undefined corresponds to lack of information. It may later be resolved to true or false. If a part of a formula is resolved, then the whole formula should be resolved correspondingly. That a formula has the truth value undefined means that it might later get the value false or it might get the value true. If the formula can only get one of these values, then the formula should itself be ascribed this value. Thus,


e.g., undefined ∨ true should be true and not undefined. This suggests the following truth tables

p  q  ¬q  p & q  p ∨ q
t  t  f   t      t
t  u  u   u      t
t  f  t   f      t
u  t  f   u      t
u  u  u   u      u
u  f  t   f      u
f  t  f   f      t
f  u  u   f      u
f  f  t   f      f

This can then be written as Prolog clauses. For conjunction and disjunction, one might write all nine lines as different clauses, or one may try to contract some of the cases by using cut. We leave this as an exercise. In contrast to before, there will be no problems with negation: we will not have to rely on Prolog's negation as failure, since we can write explicit clauses for the three cases.
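For instance, the three clauses for negation might look as follows. This is a sketch: as just noted, the catch-all clause value(P, undefined) from above must then come after these clauses, or be guarded by a check on the form of P, since otherwise it would answer undefined for a negated formula before these clauses are tried.

```prolog
value(- P, true)      :- value(P, false), !.
value(- P, false)     :- value(P, true), !.
value(- P, undefined) :- value(P, undefined).
```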

We are then left with quantification, which turns out to be quite interesting. There should not be any doubt as to what makes an existentially quantified sentence true: we have to find a witness which makes the sentence true.

value('SOME'(X,P,Q), true) :- value(P & Q, true), !.

Observe that because of the sound treatment of negation, the problems considered in section 4.4 do not arise here.

But when should the sentence be false and when should it be undefined? There are at least two possible answers to this question. The first possibility is to say that the sentence is undefined if it is undefined for at least one individual in the domain, and hence that it is false if it is false for all individuals in the domain. To implement this, we are back at similar problems as in section 4.4. Since we have not listed explicitly the atomic sentences which are undefined in the database, but said that sentences are undefined if they are neither true nor false, we run into similar problems when it comes to finding witnesses. A possible solution is, as in section 4.4, to assume that all the individuals are listed in the database, say by the predicate is. Then we can proceed by

value('SOME'(X,P,Q), undefined) :- is(X), value(P & Q, undefined), !.

We then take false simply to be the result when both true and undefined fail.


What exactly does such an approach to the quantifiers implement? As we have mentioned, Prolog is based on the

Closed world assumption:
What I do not know to be true is false.

The direct use of Prolog in representing a model and in evaluating negation as in section 4.4 is based on the same assumption. Implicit in this assumption is also what in the AI literature has been called

Domain closure:
The individuals I have not heard of do not exist.

This became explicit in the interpretation of quantifiers in section 4.4.

The approach to negation and the interpretation rule for the existential quantifier proposed in this section can be considered the result of giving up the closed world assumption but keeping the domain closure assumption. The evaluation rules admit that there might be more to know about the individuals than what is represented in the database, but they do not admit that there are other individuals than the ones listed in the database.

5.2 More partiality

A more radical attitude would be to give up domain closure as well and admit that one has not seen all the individuals in the world. Then the rules for the evaluation of the existential quantifier will have to be adjusted. We leave this as an exercise. The difference between the two approaches can be illustrated by an example. Assume the database to consist of the following facts

true('SWAN'(a)).
true('SWAN'(b)).
false('SWAN'(c)).
true('WHITE'(a)).
true('WHITE'(b)).
is(a).
is(b).
is(c).

On the approach assuming domain closure, with the rules given above, the sentence Some swan is not white will evaluate to false. But if we give up domain closure, there is a possibility that there are more swans in the world whose color we do not know. Hence the sentence will evaluate to undefined.


5.3 Inference and evaluation

In chapter 1 we singled out two possible applications based on formal semantics: evaluation and inference. The latter would be to answer whether a logical formula φ can be deduced from a set of sentences S, i.e., whether the inference from S to φ is valid. The application of this would be a system which contained a text, T, and was asked a yes/no-question q. To answer the question, the system could translate the text T into a set of formulas S and the proposition questioned by q into φ, and ask whether φ followed from S. If the sentences in T and q were within the fragments considered, we could use the translation procedure introduced so far. To determine whether φ followed from S, we could implement one of the proof procedures for first order logic in Prolog. But remember that we cannot expect a procedure which halts with a definite answer in all cases.

The use of partial models to represent knowledge in the last section suggests a different procedure, however: to build a (partial) model from the text T (S) as the text is read, and then evaluate φ in this model. What is the relationship between these two approaches? Why do we consider a proof procedure if we could go the simpler way of evaluation in a partial model? The correspondence is this. To each partial model M, there corresponds a set of sentences S, as follows. If true(p) is in the model then p is in S, and if false(p) is in the model then ¬p is in S. If true(p) or false(p) is in M, p will be atomic; hence S will consist of a set of literals, i.e., atomic and negated atomic sentences. To evaluate the formula φ in M corresponds to answering whether φ follows from S. The system will answer that φ is true in M if and only if φ follows from S; it will answer that φ is false in M if and only if ¬φ follows from S; and it will answer that φ is undefined in M if and only if neither φ nor ¬φ follows from S. Conversely, any set of literals S which is consistent, i.e., which does not contain both a formula q and its negation ¬q, corresponds to a partial model in this way.

An arbitrary set of sentences S will, however, not always correspond to a partial model. In particular, if the set S contains a formula like p ∨ q, there is no way of building one partial model reflecting this formula. Hence the idea of using a partial model is constrained compared to a general theorem prover. It only works if the premises are of a particular form. In the cases where it works, it yields an efficient proof procedure.

There is a striking parallel here to the Prolog programming language itself. Prolog contains a similar restriction with respect to disjunction (cf. Kompendium i SLI 6). What Prolog does might similarly be considered as building a partial model and answering queries against this model, though the partial model cannot be built incrementally. A formula like p → q does not correspond to a partial model, but the discourse p → q, p corresponds to the partial model {true(p), true(q)}. This shows that even when some premises are not literals (here p → q), one might apply the simple procedure for deciding validity. But since we cannot construct the partial model incrementally, we must keep the formulas after we have read them.

It is not only for disjunction that a general set of formulas S may contain more than a partial model M can represent. Another issue is quantification. How should a sentence


like all swans are white be represented in the partial model? One possibility is to add the fact white(a) whenever swan(a) is already in the database. But what then when more facts are added? In a syntactically based approach, we can store a representation of the formula ∀x(swan(x) → white(x)), and this formula will then refer to the actual swans introduced at the moment when the query is made.

Prolog yields the possibility of representing a fragment of first order logic (see Kompendium i SLI 6). A possibility for answering whether φ follows from S is therefore to restrict attention to the case where S is a set of Horn clauses and φ can be expressed as a Prolog goal. The strategy then is to parse the sentences in T and translate them into logical formulas. Then check whether the formulas are equivalent to Horn clauses. If not, report failure. If they are, transform them into Prolog program sentences and assert them to the Prolog database. Similarly, check whether the query may be translated into a Prolog goal. If so, ask it. This strategy may also be extended from yes/no-questions to some who, what and which questions. We leave the actual implementations to exercises.

5.4 Exercises

EXERCISE 5.1

a) Fill in the details of the approach sketched in section 5.1. Write first the clauses for the connectives and try them out.

b) Extend with clauses for the quantifiers. One has to be careful to include cuts or checks for syntactic form so that no clauses can apply to instances other than the intended ones.

EXERCISE 5.2

Combine the implementation of the connectives from exercise (5.1a) with the approach to quantifiers sketched in section 5.2. This implementation will be at least as simple as the one in exercise (5.1b).

EXERCISE 5.3 (PROJECT)

Continuation of exercise 4.4.

a) Let yes/no-questions be evaluated against a partial database such that the program answers yes, no or I don't know.

We will include declarative sentences alongside questions. But the declarative sentences shall not be evaluated against the database. They shall be read as instructions for updating the database, i.e., we tell the computer something it does not know.

b) Write a procedure that checks that the database is consistent, i.e., does not contain both true(p) and false(p) for any p.


c) Extend the program to handle atomic declarative sentences and negated declarative sentences with no quantifiers as follows. If it is inconsistent to add the fact expressed by the sentence, return a message saying so. If the fact is already in the database, do nothing. If the fact represents new and consistent information, update the database accordingly. Remember to update the is-relation when new individuals are first met.

The last three extensions are not as simple as the others, and some may not even have any solution. Discuss alternative solutions, if there are any, for the following three constructions. Give arguments for why there cannot be any solution, or propose an implementation.

d) Disjunction.

e) Existential quantification.

f) Universal quantification.

EXERCISE 5.4 (PROJECT)

Implement the approach sketched towards the end of section 5.3. When a declarative sentence is read, check whether it is equivalent to a conjunction of Horn clauses.

i) If it is not, return a message.
ii) If it is equivalent to something of the form ¬(a1 & a2 & … & an), check whether it is consistent with what the program already knows.
iii) If it is equivalent to a definite Horn clause, i.e., a clause containing exactly one positive literal, assert it.

Why do we distinguish between (ii) and (iii)?

When a yes/no-question is read, check whether it is equivalent to a Prolog goal, and ask the goal against the Prolog program (cf. ii).


6. Scope

6.1 Quantifiers

In our treatment of quantifiers so far we have only generated one reading of sentences containing quantifiers; thus the sentence in (a) was ascribed the Prolog representation in (c), corresponding to the logical formula in (b).

a) Some girl saw every boy.
b) ∃x(Girl(x) & ∀y(Boy(y) → See(x,y)))
c) 'SOME'(X,'GIRL'(X),'EVERY'(Y,'BOY'(Y),'SEE'(X,Y)))

It is normally assumed that (a) also has another reading, where the boys are not necessarily seen by the same girl:

d) ∀y(Boy(y) → ∃x(Girl(x) & See(x,y)))
e) 'EVERY'(Y,'BOY'(Y),'SOME'(X,'GIRL'(X),'SEE'(X,Y)))

We will now consider how our program can be modified to account for this. The proposal will be based on Montague's ideas of disambiguated syntax and so-called "quantifying-in" rules. We will consider some of the similarities and differences between this approach and other approaches in section 6.4. Montague's idea was to introduce a layer of disambiguated syntax. At this layer, the string in (a) was ascribed several different syntactic analyses, where each syntactic analysis corresponded to a unique semantic reading (though different syntactic analyses could, of course, correspond to the same reading). In particular, the readings (b/c) and (d/e) were derived from different syntactic structures. The way to get sufficiently many different structures was to introduce a countable set of new NPs in the lexicon, {he_i | i ∈ N}, together with the rule(s), where α, β and γ vary over strings:

Quantifying-in (simplified form). When α is an NP and β is an S, then γ is an S, where γ results from substituting α for he_i in β.

To be correct, there was one such rule for each i. At first sight, this rule looks quite different from the other grammar rules, but in fact the differences are not that big if we remember that the interpretation of e.g. S → NP VP is nothing but

When α is an NP and β is a VP, then γ is an S, where γ is the concatenation of α and β.

This interpretation is obvious when we think of the implementation of the context-free rule in Prolog. Our task will be to give a similar Prolog implementation of the quantifying-in rule, but we will first consider the semantic rule corresponding to the syntactic rule and an example of how the quantifying-in rule may be used.
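Compare this with the standard DCG expansion: stripped of the extra syntax and semantics arguments used elsewhere in these notes, the rule S → NP VP is compiled into an ordinary Prolog clause expressing exactly this concatenation of difference lists:

```prolog
% An S spans the difference list S0-S if an NP spans an
% initial segment S0-S1 and a VP spans the remainder S1-S.
s(S0, S) :- np(S0, S1), vp(S1, S).
```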


One possible syntactic analysis of (a) corresponding to the (d/e) reading is shown in the following derivation tree.

some girl loves every boy, S
    every boy, NP
        every, Det
        boy, N
    some girl loves he_3, S
        some girl, NP
        loves he_3, VP
            loves, V
            he_3, NP

Fig. 1

The topmost rule used is the quantifying-in rule (no. 3). All the other rules are standard context-free concatenation rules.

To give a logically correct semantic counterpart to the quantifying-in rule, we must assume that the logical language contains "λ" and is able to ascribe a representation to an NP, e.g.

every boy    λY(∀z(Boy(z) → Y(z)))

The pronoun he_i is translated to variable number i, x_i. Thus the sentence some girl loves he_3 is translated into ∃x(Girl(x) & Love(x, x_3)). The translation rule can then be expressed

If α is an NP which translates to α' and β is an S which translates to β', and γ is an S composed from α and β by the quantifying-in rule no. i, then the translation of γ, γ', is α'(λx_i[β']).

The full tree is hence translated into (f), which reduces to (g).

f) λY(∀z(Boy(z) → Y(z)))(λx_3(∃x(Girl(x) & Love(x, x_3))))
g) ∀y(Boy(y) → ∃x(Girl(x) & Love(x,y)))

Actually, we do not have to worry about λ's to write a Prolog implementation of the quantifying-in rule. We can use exactly the same technique as we did for quantified NPs before, which will yield the reduced logical expressions. Thus to combine (h) and (i) into (j), all we need to do is to unify Y with X3 and Body with the whole term in (i).

h) (Y,Body,'EVERY'(Y,'BOY'(Y),Body))
i) 'SOME'(X,'GIRL'(X),'SEE'(X,X3))
j) 'EVERY'(Y,'BOY'(Y),'SOME'(X,'GIRL'(X),'SEE'(X,Y)))
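The combination really is a single unification step. A sketch of what this looks like at the Prolog prompt, naming the term in (h) and unifying it with a triple built from the term in (i):

```prolog
?- NPSem = (Y, Body, 'EVERY'(Y,'BOY'(Y),Body)),
   NPSem = (X3, 'SOME'(X,'GIRL'(X),'SEE'(X,X3)), Sem).
```

The first goal names the term in (h); the second unifies Y with X3 and Body with the term in (i), so that Sem is bound to the term in (j).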


We are now ready to implement the quantifying-in rule. As the effect of the rule is different from concatenation, we cannot implement it directly in a DCG-parser. We have to write a separate procedure. The goal is a procedure which checks whether a string γ, say some girl loves every boy, is the possible output of a quantifying-in rule. We know that if it is, then it must contain an NP α, and the result of substituting he_i for α in γ must be a sentence, which is the other constituent to the rule. The idea is that we search the string γ (non-deterministically) for an NP, here every boy, then substitute a variable he_i (for some i) which does not already occur in γ for the NP in γ, and check whether this modified string is an S. The difference list arguments must be made explicit as the last two arguments.

s(s(Ind,NP,S),Sem,String,Rest) :-
    exchange(Ind,NP,(Ind,Body,Sem),String,NewS),
    s(S,Body,NewS,Rest).

exchange(Ind,NPSyn,NPSem,String,[he(Ind)|Rest]) :-
    np(NPSyn,NPSem,String,Rest).
    % Finds an NP and returns the rest of the
    % string with a fresh he(Ind) in front.

exchange(Ind,NPSyn,NPSem,[F|R],[F|NewR]) :-
    exchange(Ind,NPSyn,NPSem,R,NewR).
    % Reads until an NP is found. May also skip
    % an NP.

word(np,he(X),X).
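Assuming the DCG rules for s, np, vp etc. from the earlier fragments are loaded together with these clauses, the new rule is invoked like any other difference-list predicate:

```prolog
% On backtracking this enumerates both the ordinary DCG
% analysis and the quantified-in analyses of the string.
?- s(Syn, Sem, [some, girl, loved, every, boy], []).
```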

The implementation is intended to build the following types of syntactic and semantic structures, corresponding to the reading discussed so far

s (_226 np (det(every) n (boy)) s (np (det(a) n (girl)) vp (v (loved) np (he (_226)))))

EVERY(_226,BOY(_226), SOME(_293,GIRL(_293),LOVE(_293,_226)))

The syntactic structure is intended to reflect the derivation history of the string, similarly to the derivation tree we considered earlier (fig. 1). The index, here "_226", is included in the syntactic tree to indicate which instance of the quantifying-in rule is used. The syntactic pronoun he_226 is represented as he(_226), while


the semantic variable x_226 is simply represented by the Prolog variable _226. Observe how easily the use of Prolog variables cares for the coindexing of he_226 and x_226, and thereby for the correspondence between syntax and semantics.

Some care has to be taken, though. First, observe that as stated the quantifying-in rule generates more than we asked for. A sentence like A girl loved every boy is ascribed infinitely many different syntactic trees, as the quantifying-in rule can be applied for each he_i, i ∈ N. This is no problem for a theoretical analysis, but it becomes a problem when one wants to implement a procedure which finds all possible analyses of a string. Of course, we do not need all these different analyses. The sentence has only finitely many different readings. There is no need to distinguish between the tree above and the one we get by exchanging _226 with _3. And this is exactly what our Prolog implementation yields: it does not construct all the possible derivation trees, but constructs one from each class of essentially different trees.

There is another point where the theoretical analysis overgenerates. There is no restriction on repetition of the quantifying-in rule. Thus the following is a possible (part of a) tree:

some girl loves every boy, S
    every boy, NP
    some girl loves he_7, S
        he_7, NP
        some girl loves he_5, S
            he_5, NP
            some girl loves he_3, S

Fig. 2

And a sentence is ascribed infinitely many different trees. There is no need for these structures. By barring indexed pronouns from being quantified-in, we avoid these structures, and each string is ascribed only finitely many different trees. We will have to modify our program accordingly. The way we will do it in the sequel is by only allowing quantified NPs to be quantified-in. Thereby we also bar proper nouns from being quantified-in. As proper nouns do not have scope, we do not lose any readings in this way, but we get somewhat fewer syntax trees than in some other proposals.

There is one possible source of mistakes in our implementation of the quantifying-in rule. The rule searches through the remaining string for an NP and substitutes he(I) for it. But thanks to the use of difference lists, this remaining string does not have to correspond to the sentence being analysed. Thus what we check after the substitution is that NewS-Rest is a sentence, not that NewS is a sentence. If the substitution of he(I) for the NP was in Rest, we get a wrong result. The implementation of the rule is correct if there are no embedded sentences, i.e., if each sentence is a full string, or, in other words, if the rule is always called with


Rest=[]. We will consider modifications to the rule for the general case in the subsection on relative clauses below.

The version of the quantifying-in rule given above is not the full rule. One goal when it comes to quantifiers and scope is to explain how quantifiers can bind pronouns, e.g.

k) Every boy loves every girl who loves him.
l) ∀y(Boy(y) → ∀x((Girl(x) & Love(x,y)) → Love(y,x)))

All we need is a small change of the quantifying-in rule:

Quantifying-in (full form). When α is an NP and β is an S, then β' is an S, where β' results from substituting α for the first occurrence of he_i in β and he (him/she/her etc.) for the other occurrences of he_i in β.

There have to be some constraints on gender and case, of course, but we ignore them here. The semantic rule does not have to be changed. The only change we have to make in the implementation is that the routine for exchanging the NP will read through the rest of the string after it has located the quantified NP, and each time it sees an occurrence of he it has the option of exchanging it with he(I), with the same index as for the full NP. With these modifications a program may look like:

s(s(NP,VP), Sem) -->
    np(NP, (Var,Body,Sem)),
    vp(VP, Var^Body).

vp(vp(V,NP), Sub^Sem) -->
    v(V, Obj^Sub^Body),
    np(NP, (Obj,Body,Sem)).

qnp(np(D,N), (Var,Body,Sem)) -->
    det(D, (Var,Restr,Body,Sem)),
    n(N, Var^Restr).

np(Syn, Sem) --> qnp(Syn, Sem).

s(s(Ind,NP,S), Sem, String, Rest) :-
    exchange(Ind, NP, (Ind, Body, Sem), String, NewString),
    s(S, Body, NewString, Rest).

exchange(Ind, NPSyn, NPSem, String, [he(Ind)|Out]) :-
    qnp(NPSyn, NPSem, String, Rest),
    option_change(Ind, Rest, Out).

exchange(Ind, NPSyn, NPSem, [First|Rest], [First|NewRest]) :-
    exchange(Ind, NPSyn, NPSem, Rest, NewRest).

option_change(Ind, [he|Rest], [he(Ind)|NewRest]) :-
    option_change(Ind, Rest, NewRest).

option_change(Ind, [X|Rest], [X|NewRest]) :-
    option_change(Ind, Rest, NewRest).

option_change(_,[],[]).


np(np(X), (Sem,Res,Res)) --> [X], {word(np,Sem,X)}.
v(v(X),Sem) --> [X], {word(v,Sem,X)}.
vp(vp(X),Sem) --> [X], {word(vp,Sem,X)}.
n(n(X),Sem) --> [X], {word(n,Sem,X)}.
det(det(X),Sem) --> [X], {word(det,Sem,X)}.

word(np, 'KIM', kim).
word(np, 'LEE', lee).
word(np, 'HIM', he).
word(np, X, he(X)).
word(v, Y^X^'SEE'(X,Y), saw).
word(v, Y^X^'LOVE'(X,Y), loved).
word(vp, X^'RUN'(X), ran).
word(n, X^'GIRL'(X), girl).
word(n, X^'BOY'(X), boy).
word(det, (X,Restr,Body,'SOME'(X,Restr,Body)), a).
word(det, (X,Restr,Body,'EVERY'(X,Restr,Body)), every).

With this program we get the following analyses for the sentences Every girl saw Lee, Every girl saw he, and Every girl saw a boy, respectively.

> every girl saw lee.

s (np (det(every) n (girl)) vp (v (saw) np (lee)))

EVERY(_369,GIRL(_369),SEE(_369,LEE))

s (_366 np (det(every) n (girl)) s (np (he (_366)) vp (v (saw) np (lee))))

EVERY(_366,GIRL(_366),SEE(_366,LEE))

> every girl saw he.

s (np (det(every) n (girl)) vp (v (saw) np (he)))

EVERY(_184,GIRL(_184),SEE(_184,HIM))

s (_181 np (det(every) n (girl)) s (np (he (_181)) vp (v (saw) np (he (_181)))))

EVERY(_181,GIRL(_181),SEE(_181,_181))


s (_181 np (det(every) n (girl)) s (np (he (_181)) vp (v (saw) np (he))))

EVERY(_181,GIRL(_181),SEE(_181,HIM))

> every girl saw a boy.

s (np (det(every) n (girl)) vp (v (saw) np (det(a) n (boy))))

EVERY(_579,GIRL(_579), SOME(_620,BOY(_620),SEE(_579,_620)))

s (_576 np (det(every) n (girl)) s (np (he (_576)) vp (v (saw) np (det(a) n (boy)))))

EVERY(_576,GIRL(_576), SOME(_655,BOY(_655),SEE(_576,_655)))

s (_576 np (det(every) n (girl)) s (_634 np (det(a) n (boy)) s (np (he (_576)) vp (v (saw) np (he (_634))))))

EVERY(_576,GIRL(_576), SOME(_634,BOY(_634),SEE(_576,_634)))

s (_576 np (det(a) n (boy)) s (np (det(every) n (girl)) vp (v (saw) np (he (_576)))))

SOME(_576,BOY(_576), EVERY(_643,GIRL(_643),SEE(_643,_576)))


s (_576 np (det(a) n (boy)) s (_640 np (det(every) n (girl)) s (np (he (_640)) vp (v (saw) np (he (_576))))))

SOME(_576,BOY(_576), EVERY(_640,GIRL(_640),SEE(_640,_576)))

We get two different syntactic trees for the first sentence. We would have got five if proper nouns were allowed to be quantified-in too, but even two trees is one too many: the two trees are ascribed the same logical formula. The second sentence has two different readings, the one where the pronoun is bound, which actually has to be written every girl saw herself, and the one where the pronoun is not bound but deictic, here indicated by the semantic value 'HIM'. But it is ascribed three different trees. The third sentence has two different readings, but is ascribed five different trees.

One way to reduce the number of syntactic analyses is to make the quantifying-in rule compulsory for quantified NPs. It is easy to see that this will still yield all the possible logical formulae, but a lot fewer analyses, e.g. one, two and two different syntactic trees for the three example sentences. In our program this can be achieved by simply removing the rule

np(Syn, Sem) --> qnp(Syn, Sem).

6.2 Negation

In section 2.3 we introduced negation as VP-negation. Combined with the handling of quantified NPs from section 3, we would get the (b) reading of (a) but not the (c) reading.

a) Lee did not see a girl.
b) ¬∃x(Girl(x) & See(Lee, x))
c) ∃x(Girl(x) & ¬See(Lee, x))

If the treatment of negation from section 2.3 is incorporated in the fragment from the last section, and we do not make the quantifying-in of quantified NPs compulsory, we will, however, get both readings. But we will only get the (e)-reading of (d). There is no way to give the subject NP narrower scope than the negation, as in (f):


d) A girl did not see Lee.
e) ∃x(Girl(x) & ¬See(x, Lee))
f) ¬∃x(Girl(x) & See(x, Lee))

To get the (f)-reading, negation has to be treated semantically as a sentence operator. The easiest way to obtain this is to introduce the negation in the S → NP VP rule, as proposed by Montague in PTQ. The rules that have to be changed compared to section 6.1 are

s(s(NP,VP), Sem) -->
    np(NP, (Var,Body,Sem)),
    vp(fin,VP,Var^Body).

s(s(NP,VP), - Sem) -->
    np(NP, (Var,Body,Sem)),
    [did], [not],
    vp(inf,VP,Var^Body).

vp(Tense,vp(V,NP),Sub^Sem) -->
    v(Tense,V, Obj^Sub^Body),
    np(NP, (Obj,Body,Sem)).

v(Tense,v(X),Sem) --> [X], {word(v,Tense,Sem,X)}.
vp(Tense,vp(X),Sem) --> [X], {word(vp,Tense,Sem,X)}.

word(v,fin,Y^X^'SEE'(X,Y),saw).
word(v,inf,Y^X^'SEE'(X,Y),see).
word(v,fin,Y^X^'LOVE'(X,Y),loved).
word(v,inf,Y^X^'LOVE'(X,Y),love).
word(vp,fin,X^'RUN'(X), ran).
word(vp,inf,X^'RUN'(X), run).
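Assuming the rest of the fragment from section 6.1, including the np --> qnp rule, is loaded together with these changed rules, both scopings of the negation with respect to a girl can be asked for in a single query:

```prolog
% Backtracking should yield both the (b)- and the
% (c)-reading of sentence (a).
?- s(Syn, Sem, [lee, did, not, see, a, girl], []).
```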

The quantifying-in rule is left unchanged. Furthermore, in this fragment the np --> qnp rule must be included. Otherwise the negation cannot get wider scope than the quantifiers. Exactly as the full program in section 6.1 generated five syntactic analyses of Every girl saw a boy, this program will generate five analyses of Every girl did not see a boy. But this time the syntactic analyses will correspond to five different readings:

-EVERY(_291,GIRL(_291), SOME(_339,BOY(_339),SEE(_291,_339)))

EVERY(_288,GIRL(_288), -SOME(_383,BOY(_383),SEE(_288,_383)))

EVERY(_288,GIRL(_288), SOME(_355,BOY(_355),-SEE(_288,_355)))

SOME(_288,BOY(_288), -EVERY(_368,GIRL(_368),SEE(_368,_288)))

SOME(_288,BOY(_288), EVERY(_365,GIRL(_365),-SEE(_365,_288)))


But wait a minute. Three elements can be ordered in six different ways. Hence there must be (at least) one reading we did not get.

-SOME(_288,BOY(_288), EVERY(_365,GIRL(_365),SEE(_365,_288)))

The way the program works, if we want some boy to get wider scope than every girl, some boy must be quantified-in. But a quantifier which is quantified-in will also get wider scope than the negation.
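That three scope-taking elements admit six orderings is just 3! = 6; in Prologs that provide permutation/2 in a list library (e.g. SWI-Prolog) the count can be checked directly:

```prolog
?- findall(P, permutation([neg, every_girl, some_boy], P), Ps),
   length(Ps, N).
% N is bound to 6.
```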

Montague's way of overcoming this shortcoming was to allow quantifying-in not only into sentences but also into phrases of category VP or CN. We will here consider another option. Instead of incorporating the semantics of the negation into the S → NP VP rule, this rule will notice an occurrence of did not but not interpret it. It will store the fact that the sentence is negated. Later on, the negation must be semantically resolved before the sentence is accepted. But the negation may get wider scope than any quantifier. The following exemplifies how this can be done.

s(Syn,Sem) -->
    s(pos,Syn,Sem).

s(Pol,s(NP,VP), Sem) -->
    np(NP, (Var,Body,Sem)),
    vp(Pol,fin,VP,Var^Body).

vp(pos,Tense,vp(V,NP),Sub^Sem) -->
    v(Tense,V, Obj^Sub^Body),
    np(NP, (Obj,Body,Sem)).

vp(neg,fin, VP,Sem) -->
    [did], [not],
    vp(pos,inf,VP,Sem).

s(Pol,s(Ind,NP,S), Sem, String, Rest) :-
    exchange(Ind, NP, (Ind, Body, Sem), String, NewString),
    s(Pol,S, Body, NewString, Rest).

s(pos,s(neg,S),- Sem) -->
    s(neg,S,Sem).

vp(pos,Tense,vp(X),Sem) --> [X], {word(vp,Tense,Sem,X)}.

The rest of the rules are left unaltered. This corresponds to handling the negation as a scoping element similar to the quantifiers. Both quantifiers and negation can be ascribed scope over the whole sentence, and all possible orderings of these elements are allowed. If the np --> qnp rule is left out, this program will ascribe exactly two syntactic analyses to Every girl saw a boy and six syntactic analyses to Every girl did not see a boy, corresponding to the six possible readings.


6.3 Restrictive relatives

Considering scope, it is only natural to take relative clauses into consideration. By including embedded sentences the sentences get longer, the number of NPs grows and the scope possibilities multiply. It is straightforward to include the treatment of relatives from section 3.2 in our program for quantified sentences. The question is whether this yields all the possible readings. If we include relatives as a new category, R, and a rule like N → N R, the answer is no. This does not suffice. Consider the sentence Every girl that did not see a boy ran. The program will ascribe to it the two readings (a) and (b), but not the one in (c).

a) EVERY(_1324,GIRL(_1324)& -SOME(_1424,BOY(_1424),SEE(_1324,_1424)),RUN(_1324))

b) SOME(_1324,BOY(_1324), EVERY(_1457,GIRL(_1457)&-SEE(_1457,_1324),RUN(_1457)))

c) EVERY(_1324, GIRL(_1324)& SOME(_1419,BOY(_1419),-SEE(_1324,_1419)),RUN(_1324))

We are not able to vary the relative scope between the NP a boy and the negation in the relative clause without using the quantifying-in rule and thereby giving a boy wider scope than every girl.

Before considering possible remedies which will yield the (c)-reading, let us consider what the actual program for quantifiers and relative clauses currently looks like. In particular, we should explicitly introduce rules for negation in relatives, which we have not done before, to get the (a)- and (b)-readings. We are not interested in the inner structure of relatives; hence the relatives will consist solely of a VP. To get more compact rules we introduce a new argument in the s-relation, the first argument. This argument will be filled with main for a main clause and rel:Var for a relative clause, where Var is a variable filling the gap position in the semantic representation.

Old rules extended with this new argument will look like

s(Syn,Sem) -->
    s(main,pos,Syn,Sem).

s(main,Pol,s(NP,VP), Sem) -->
    np(NP, (Var,Body,Sem)),
    vp(Pol,fin,VP,Var^Body).

s(main,Pol,s(Ind,NP,S), Sem, String, Rest) :-
    exchange(Ind, NP, (Ind, Body, Sem), String, NewString),
    s(main,Pol,S, Body, NewString, Rest).

These rules will have exactly the same function as earlier. New rules to take care of the relatives will be:


qnp(np(D,N,R), (Var,Body,Sem)) -->
    det(D, (Var,Ns & Rs,Body,Sem)),
    n(N, Var^Ns),
    [that],
    s(rel:Var,pos,R, Rs).

s(rel:X,Neg,rel(that,VP), Sem) -->
    vp(Neg,fin,VP,X^Sem).

s(Type,pos,s(neg,S),- Sem) -->
    s(Type,neg,S,Sem).

The last rule is not actually new; it is the same negation rule as in the last section. But with the variable first argument this rule may be used both for negation in main clauses and in relative clauses. With the rest of the program as in the last section, we have a program which will find the (a)- and (b)-readings but not the (c)-reading.

How can this be extended to yield the (c)-reading? In the form in which we have written the program, the answer proposes itself: allow quantifying-in to be done not only into full main clauses but also into relative clauses. All we have to do is to change the quantifying-in rule as we did with the negation rule, allowing the first argument to be of the form rel:Var as well as main.

s(Type,Neg,s(Ind,NP,S), Sem, String, Rest) :-
    exchange(Ind, NP, (Ind, Body, Sem), String, NewString),
    s(Type,Neg,S, Body, NewString, Rest).

We are now in a similar position as at the end of section 6.2. We get all possible readings if each quantified NP is quantified-in, either at the minimal clause containing it or at some larger clause containing it. As in section 6.2, we will get all possible readings without the np --> qnp rule.

Are then all problems solved? Unfortunately not. Even though this solution is theoretically sound, our implementation of it is not. When the quantifying-in rule is applied to relative clauses, it may be called with Rest as a variable. When s is afterwards called on NewString after the exchange, this s may return a Rest different from []. This Rest will be a segment of NewString but not necessarily of String, since exchange may have made changes also in the part of the string which is not consumed by the last s of the rule. To take an example, the string every girl that ran saw a boy is not only ascribed the two sound forms (d) and (e) but also the strange-looking (f), containing a free variable:

d) EVERY(_670,GIRL(_670)&RUN(_670), SOME(_753,BOY(_753),SEE(_670,_753)))

e) SOME(_670,BOY(_670), EVERY(_749,GIRL(_749)&RUN(_749),SEE(_749,_670)))

f) EVERY(_670,GIRL(_670)& SOME(_722,BOY(_722),RUN(_670)), SEE(_670,_722))


The easiest way out of this problem is the following: chop the string in two by append, let exchange operate only on the first part of the string, and demand that the s consume this whole first part, while the second part is returned unaltered.

s(Type,Neg,s(Ind,NP,S), Sem, String, Rest) :-
    append(First,Rest,String),
    exchange(Ind, NP, (Ind, Body, Sem), First, NewString),
    s(Type,Neg,S, Body, NewString, []).

This will do the job. It will yield exactly the readings (a), (b), (c), and (d), (e), respectively. The implementation is not particularly efficient, though. This rule will in turn try all partitions of the remaining string. When we studied parsing, we considered the possibility of using append as the basis of a Prolog parser and saw that the parser became much more efficient if it was instead based on difference lists. In this particular rule we do not exploit the advantages of the difference lists; we return to the append-approach. We have not found a simple way to implement the quantifying-in rule in the spirit of difference lists. Of course, there are ways to implement the rule more efficiently if one is willing to write a longer and more complicated program. We leave it to the readers to explore this possibility.
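The cost is easy to see at the prompt: called with only its third argument instantiated, append/3 enumerates every way of splitting the list on backtracking.

```prolog
?- append(First, Rest, [every, girl, that, ran]).
% First = [],       Rest = [every, girl, that, ran] ;
% First = [every],  Rest = [girl, that, ran] ;
% ...               (five splits in all for a
%                    four-word string)
```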

6.4 What is scope?

APPROACH 1: MONTAGUE

We have so far implemented scope, but what exactly is it? Is scope a part of syntax, of semantics, or something in between? We will now consider some proposed answers to this question. Our implementation was based on Montague's proposal in PTQ, to which e.g. the treatment of scope in Barwise and Cooper (1981) is quite similar. To account for the two different readings of

a) Some girl loves every boy.

Montague claimed that the sentence was syntactically ambiguous and ascribed it two (well, actually more than two, as we have seen) different analyses in his disambiguated syntax. (Montague himself talked about disambiguated languages; Dowty et al. (1981) think it would have been more appropriate to talk about unambiguous grammar, while we have chosen disambiguated syntax.) The objects in the disambiguated syntax can be taken to be derivation histories, which can be displayed in derivation trees as we have seen. The point is that each disambiguated syntactic structure has a unique expression (string) that realizes it and a unique meaning. Thus there is a function which maps the syntactic structure onto the string that realizes it and another function that maps it onto its meaning. If arrow directions indicate a functional relationship, this can be indicated as follows:


        Disambiguated syntactic structure
              /                  \
             v                    v
          String               Meaning

Fig. 1

Just as two different structures can be mapped onto the same string, they can also be mapped onto the same meaning. In particular, a string like

a) Some girl loves some boy.

has only one meaning even though it has several different disambiguated structures. The general picture can be illustrated thus:

[Fig. 2: the disambiguated structures d1-d6 are mapped onto the expressions e1, e2 on one side and onto the meanings m1, m2, m3 on the other; several structures may share one expression, and several may share one meaning.]

By introducing disambiguated syntactic representations, Montague was able to explain the relationship between expressions and meanings: an expression can be associated with a meaning just in case they can be derived from the same disambiguated structure.

Montague grammar became very popular among semanticists during the seventies, but not among syntacticians. The disambiguated syntax of a Montague grammar, based on categorial grammar extended with its quantifying-in rules and very awkward rules for tense and case, was not a good tool for genuine syntactic purposes: for describing the well-formed sentences of a language beyond a small fragment and for describing syntactic regularities and variations. In particular, it did not distinguish between the scope ambiguity in (a) and the genuine syntactic ambiguity in a string like

b) He saw her duck.


It should be observed, however, that it is perfectly possible to extend Montague's proposal with a more normal syntactic level where the structures are not necessarily semantically disambiguated. At this level (a) would only get one structure while (b) would get two. A sentence like

c) Every girl did not see her duck,

would be ascribed two different syntactic analyses according to how see her duck is interpreted, while each of these would be associated with several different disambiguated structures according to the relative scope of every girl and the negation, and according to whether her is bound by every girl or not. Each disambiguated structure would be associated with a unique (possibly ambiguous) syntactic structure and each syntactic structure with a unique string.

The objects of the (possibly ambiguous) syntactic structures could be taken to be phrase structure trees. A context-free phrase structure grammar can be read as generating strings or as generating trees. A Montague grammar could likewise be interpreted as one generating phrase structure trees rather than strings. For the context-free rewriting rules, the tree interpretation would be as in a context-free grammar. For the quantifying-in rule, what we have to do is to let α, β and γ vary over phrase structure trees rather than strings. Thus the interpretation of the rule (in simplified form) is that a tree α with top node NP is substituted into a tree β with top node S for another sub-tree with top node NP and he_i as the only daughter (cf. Barwise and Cooper 1981). The result is a grammar which generates phrase structure trees. The strings generated can as usual be taken to be the yield of the trees.

Before drawing the full picture, let us for symmetry reasons add a word on the meaning side, too. Semantics is the relationship between form and meaning. In a truth-conditional approach it is convenient to express meanings by logical formulae. But these formulae are not the meanings of the sentences, nor are they, as Montague pointed out, necessary for expressing the meanings of the sentences. For example, the string in (d) is ascribed (at least) two different disambiguated forms with two different logical formulae.

d) Some girl loves some boy.
e) ∃x(Girl(x) & ∃y(Boy(y) & Love(x,y)))
f) ∃y(Boy(y) & ∃x(Girl(x) & Love(x,y)))

The two formulae have the same meaning, however. There is no difference between the two readings when it comes to meaning; nothing can be read out of the difference between the two formulae. In contrast to the level of disambiguated syntactic structures, and maybe that of phrase structure trees, nothing can be read out of the form of the logical formulae.

With arrows indicating functionality, our two earlier pictures can be revised as


        Disambiguated syntactic structure
              /                  \
             v                    v
    Phrase structure tree    Logical formula
             |                    |
             v                    v
          String               Meaning

Fig. 3

[Fig. 4: the disambiguated structures d1-d6 are mapped onto the phrase-structure trees s1, s2 and onto the logical formulae l1, l2, l3; the trees are in turn mapped onto the expression e1, and the formulae onto the meanings m1, m2.]

In general we can think of a language as a relation, L, between these five layers, a relation which holds between an expression, a phrase-structure tree, a derivation history, a logical formula and a meaning just in case the grammar generates the derivation history and the other four components are associated with this derivation history. We can also consider simpler relations, such as the one which holds between a phrase-structure tree and a logical formula just in case there is a derivation history with which they can both be associated.

Scope is located at the level of disambiguated syntax in Montague grammar. What, then, is disambiguated syntax? As it is possible to derive a unique syntactic tree and a unique semantic interpretation from each disambiguated structure, the most correct answer is probably that this level is both syntax and semantics.

The disambiguated syntax is the basic level in Montague grammar. A grammar specifies a set of disambiguated syntactic structures. From these one can derive the more normal syntactic trees uniquely. This proposal can be split into two parts: firstly, that there is a level of disambiguated syntactic structures and a one-to-many relationship between the normal syntactic structures and the disambiguated ones; secondly, that the grammar shall describe the disambiguated structures and that the normal trees shall be derived from them by functional application. It is possible to retain the first point and oppose the second one.


APPROACH 2: GB AND LF

The basic picture in GB-theory is

        DS
         |
         v
        SS
        / \
       v   v
      PF   LF

Fig. 5

The arrow direction does not indicate a functional relationship here. It indicates a sort of derivational priority. The (a)-string above is ascribed one structure at SS-level; we will not try to describe this structure in detail. From SS to LF the rule Move α may be applied to NPs, leaving a trace behind. Schematically this can be represented (from Chierchia and McConnell-Ginet 1990):

[S X NP Y] ⇒ [S NPi [S X ei Y]]

This form of movement is called quantifier raising. Thus if (i) is a simplified SS-structure, (i)–(v) are possible LF-forms (ignoring tense):

i)   [S [NP some girl] [VP loves [NP every boy]]]
ii)  [S [NP some girl]3 [S e3 [VP loves [NP every boy]]]]
iii) [S [NP every boy]5 [S [NP some girl] [VP loves e5]]]
iv)  [S [NP some girl]3 [S [NP every boy]5 [S e3 [VP loves e5]]]]
v)   [S [NP every boy]5 [S [NP some girl]3 [S e3 [VP loves e5]]]]

If quantifier raising is made compulsory (cf. Chierchia and McConnell-Ginet, 1990) only the last two of the five structures are derived.

The similarities between the LF representations and the disambiguated syntactic structures in Montague grammar are obvious. This also indicates the most appropriate level at which to compare the two approaches: SS plays a similar role to the phrase-structure trees in Montague grammar, while LF plays a similar role to the disambiguated syntactic structures. (This is worth mentioning since there has been some confusion comparing LF to the logical formulae in Montague grammar. This confusion is discussed by Dowty et al. 1981.) The similarities between the two approaches are that there are two different syntactic layers, a one-to-many relationship between them, and that the disambiguated representations are input to the semantics. There is also a direct similarity between the quantifying-in rule and the quantifier raising rule. On both proposals, one possibility for a quantified NP is to be base generated, or, as it is also called, interpreted in situ, as the two NPs in (i). Or the NP can be ascribed wide scope, on the first proposal by quantifying-in and on the second by quantifier raising. And this wide scope handling may be made compulsory – at least in an extensional setting where negation and tense are given a similar wide scope treatment to the quantifiers.


We are here ignoring the finer details of difference between the two proposals, such as how the well-formedness of the SS-structures is determined and how LF is used for explaining constraints on coreference. Still, there is one major difference between the two. In Montague grammar one generates disambiguated structures and derives the phrase-structure trees by a function, a deterministic procedure. In GB, on the other hand, one starts out with the S-structures and derives the LFs non-deterministically.

But when it comes to implementation, these distinctions may become less sharp, at least for a small fragment like the one we considered. The distinctions are not necessarily more essential than the difference between generation and parsing or between a top-down and a bottom-up strategy. The implementation of the quantifying-in rule was not totally determined by the rule. We could think of an alternative implementation as follows. Suppose that α is an NP with derivation tree α', that β is an S with derivation tree β' and that γ is an S with derivation tree γ' constructed from α' and β' by quantifying-in rule no. i. If α' is substituted for the first occurrence of an NP he_i, and he for the other occurrences of he_i, in the history γ', then the result is an alternative history γ'' for the same string γ. Moreover, by substituting the NP he_i for α', and possibly for some occurrences of he, in γ'', γ'' is turned into β'. On this background one might propose the following alternative algorithm for parsing γ into γ':

1) find a derivation history γ''
2) substitute he_i for α', and possibly for some occurrences of he, in γ'' to get β'
3) finally add α' to β' to get γ'.

Context-free rules together with this rule would be sufficient for getting the same parses as before. The similarity between this strategy and the quantifier raising strategy should be obvious, though there may remain some smaller discrepancies as to whether quantifying-in/quantifier raising should be done on relatives and other embedded clauses during parsing or afterwards.

The algorithm described in the last paragraph can be straightforwardly implemented in Prolog. In the following program, only derivation histories are returned, no logical formulae. We have introduced a new category s1 for sentences which have not undergone quantifying-in, while s is a full sentence, as illustrated by the first two clauses. The simpler clauses are as before and not included. The old quantifying-in rule is left out.

s(Syn) --> s(main,pos,Syn).

s1(main,Pol,s(NP,VP)) --> np(NP), vp(Pol,fin,VP).

s(Type,Pol,NewTree) --> s1(Type,Pol,OldTree), {qi(OldTree,NewTree)}.


qi(Old,Old).                          % No qi. End of recursion.
qi(Old,New) :-
    find(Old,RestTree,[],[Ind,NP]),   % One qi done.
    qi(s(Ind,NP,RestTree),New).

find(he(I),he(I),X,X) :- !.
    % Special care here. Otherwise the program
    % returns the same answer several times.

find(Tree,Rest,In,Out) :-
    Tree =.. [Cat|[Daught|Ers]],
    find_list([Daught|Ers],NewDs,In,Out),
    Rest =.. [Cat|NewDs].

find(Tree,he(Ind),[],[Ind,NP]) :-
    Tree =.. [qnp|Daughters],
    NP =.. [np|Daughters].
    % One NP is found. The category name is
    % changed to prevent repetition.

find(np(he),np(he(Ind)),[Ind,NP],[Ind,NP]).
    % An NP is found to the left and
    % exchanged with he(Ind).

find(X,X,Y,Y) :- atom(X).
    % Takes care of the base of the induction.

find_list([F|R],[F|NR],I,O) :-
    var(F), !,
    find_list(R,NR,I,O).

find_list([F|R],[NF|NR],I,O) :-
    find(F,NF,I,M),
    find_list(R,NR,M,O).
    % Standard list treatment.

find_list([],[],X,X).
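Assuming the clauses above are loaded, the qi/2 routine can be tried directly on a hand-built derivation tree, independently of the DCG. The toy tree for "a boy ran" below is our own; the two answers are enumerated on backtracking (shown schematically, not as verbatim top-level output):

```prolog
% With qi/2 and find/4 above loaded, quantifying-in can be tested
% directly on a hand-built derivation tree for "a boy ran":
?- qi(s(qnp(det(a),n(boy)),vp(ran)), T).
T = s(qnp(det(a),n(boy)),vp(ran)) ;              % no quantifying-in
T = s(_I,np(det(a),n(boy)),s(he(_I),vp(ran))).   % "a boy" quantified in
```

The second answer shows the effect of the qnp-to-np renaming and of the inserted trace he(_I).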

But this implementation faces a problem of overgeneration. Consider the following example:

> every girl that saw a boy ran.

1. s(qnp(det(every),n(girl),rel(that(_317),s(np(he(_317)),vp(v(saw),qnp(det(a),n(boy)))))),vp(ran))

2. s(_409,np(det(a),n(boy)),s(qnp(det(every),n(girl),rel(that(_317),s(np(he(_317)),vp(v(saw),he(_409))))),vp(ran)))


3. s(_692,np(det(every),n(girl),rel(that(_317),s(np(he(_317)),vp(v(saw),he(_409))))),s(_409,np(det(a),n(boy)),s(he(_692),vp(ran))))

4. s(_409,np(det(every),n(girl),rel(that(_317),s(np(he(_317)),vp(v(saw),qnp(det(a),n(boy)))))),s(he(_409),vp(ran)))

5. s(_471,np(det(a),n(boy)),s(_409,np(det(every),n(girl),rel(that(_317),s(np(he(_317)),vp(v(saw),he(_471))))),s(he(_409),vp(ran))))

Observe that the program is a bit simplified. It does not allow a quantifier to be quantified-in within the scope of a relative clause. Even so the program overgenerates. The problem is the reading in (3), which does not make sense. It is easy to see what has happened. First the NP ((a boy), _409) was extracted from (1) to (2), and then the remaining NP ((every girl that saw _409), _692) is ascribed even wider scope from (2) to (3). The step from (2) to (3) does not reflect a legal quantifying-in, hence the algorithm described above is not correct. Why was it incorrect? Where did we go wrong? Everything we said in the motivation was correct. If γ' is derived by quantifying-in, then γ'' has the described form and we can reconstruct γ' according to the algorithm. But there are derivation histories, say δ, which are not of the form γ'' for any γ'. The algorithm will find all derivation histories, including δ, and then wrongly assume that δ is of the form γ'' and try to construct the corresponding γ'. The following small change in the routine will yield a correct program which returns only correct readings.


qi(Old,Old).                          % No qi. End of recursion.
qi(Old,s(Ind,NP,New)) :-
    find(Old,RestTree,[],[Ind,NP]),   % One qi done.
    qi(RestTree,New).
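With this corrected routine in place of the old qi/2 (and the rest of the grammar of this section loaded), the example sentence now receives only the legitimate histories. The sketch below indicates the expected behaviour, not verbatim output:

```prolog
% Expected behaviour of the corrected program on the example above:
?- s(T, [every,girl,that,saw,a,boy,ran], []).
% enumerates the histories (1), (2), (4) and (5) on backtracking;
% the illegal history (3) is no longer produced.
```

The crucial difference is that the corrected clause recurses on RestTree alone, so a quantifier extracted later is always given scope inside one extracted earlier.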

These pieces of programming illustrate both the similarities and the differences between the quantifying-in and the quantifier raising approaches. The algorithm we first specified, and the implementation of it which did not quite work out, is a correct implementation of the simple rule for quantifier raising. Hence some constraints have to be put on quantifier raising to give a correct result. The algorithm was also nearly a way to implement quantifying-in; our last proposal showed a small modification which was sufficient to get it to work. The two approaches will also differ if we allow a quantifier to take scope over a relative. Within the quantifying-in approach, the natural way to implement this would be to allow the quantifying-in routine to work on all clauses during parsing. Within the quantifier raising approach, the most natural solution would be to let the output of the parse be a phrase-structure tree, let the qi-routine transform this tree into a disambiguated tree after parsing, and then allow a quantifier to be moved not only to the front of the whole tree but also to the front of any subtree of category s. We leave it to an exercise to try out these alternatives.

APPROACH 3 – COOPER STORES

The dissatisfaction with Montague grammar as a syntactic tool led Robin Cooper (1983) to another approach. The goal of a grammar is on the one hand to describe the grammatical strings and on the other to ascribe meanings to these strings. Cooper found that the best tool for the first task was not Montague's disambiguated structures but rather a form of more normal, possibly semantically ambiguous, syntactic rules. The second task consisted in constructing logical formulae associated with the syntactic structures. The proposal was to associate with each syntactic structure a representation in an extended logical formalism, a representation similar to a logical formula, but where the quantifier scopes were not decided – the quantifiers were "stored". Later on, logical formulae were recovered from these representations non-deterministically by ascribing scope to the quantifiers. It would take us too far astray to explain the details, but the overall picture may be indicated thus:

Fig. 6: one expression (e1) is associated with phrase-structure trees (s1, s2); each tree with an underspecified logical formula (u1, u2); each underspecified formula with one or more logical formulae (l1, l2, l3); and each formula with a meaning (m1, m2).
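Though the details of Cooper's mechanism are beyond these notes, the retrieval step can be indicated with a small self-contained sketch. The representation store/2 and the names below are our own simplifications, not Cooper's notation: an underspecified representation pairs a matrix formula with a list of stored quantifiers, and scopings are recovered by retrieving the quantifiers in every possible order.

```prolog
:- use_module(library(lists)).   % select/3

% An underspecified representation: store(Quantifiers, Matrix), where
% each stored quantifier is q(Det, Var, Restrictor).  retrieve/2
% non-deterministically picks a quantifier to take widest scope and
% recurses on the rest, enumerating all scopings on backtracking.

retrieve(store([], Matrix), Matrix).
retrieve(store(Store, Matrix), Formula) :-
    select(q(Det, Var, Restr), Store, Rest),
    retrieve(store(Rest, Matrix), Scope),
    wrap(Det, Var, Restr, Scope, Formula).

wrap(every, X, R, S, forall(X, imp(R, S))).
wrap(a,     X, R, S, exists(X, and(R, S))).
```

For "a girl loves every boy", the store [q(a,y,girl(y)), q(every,x,boy(x))] with matrix love(y,x) yields two formulae on backtracking, one for each scope order.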


The use of underspecified semantic representations can be found in several later proposals, too. In Lexical-Functional Grammar (LFG) the f-structures were meant to be, among other things, input to the semantics. F-structures indicate predicate-argument structure but not scopes. Halvorsen's (1983) proposal for combining LFG with a Montague type of semantic interpretation translated f-structures into underspecified logical formulae before the quantifier scopes were resolved to yield full formulae. Fenstad et al.'s (1987) integration of LFG and Situation Semantics also passed through a level of representation which was not disambiguated for scope, the situation schemata, before the formulae were disambiguated. A similar idea can be found in the Core Language Engine (Alshawi 1992).

There is currently a lot of discussion with respect to the best level for resolving scope. We will not try to answer this question, but we will point to some types of interesting evidence.

CONSIDERATIONS 1 – RESTRICTIONS ON SCOPE

During our implementation of scope, we tried to make sure that we were able to generate all possible scopings. But we might have generated too many. For example, we let an NP in a relative clause possibly get scope over the full sentence. This seems appropriate for (a), which may have the reading (b), but (c) cannot be read as (d).

a) Every girl who saw a boy ran.
b) ∃x(Boy(x) & ∀y((Girl(y) & See(y,x)) → Run(y)))
c) A girl who saw every boy ran.
d) ∀x(Boy(x) → ∃y((Girl(y) & See(y,x)) & Run(y)))

Relative clauses are often claimed to be scope islands: a scoping element occurring in the relative clause cannot get wider scope than the clause. Thereby (d) is avoided, but this does not explain why (b) is possible. A grammatical theory of scope should make it possible to account for these differences.
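In the program of the previous section, relative-clause islands could be imposed by refusing to let find/4 descend into a rel subtree. The clause below is our own suggested addition (to be placed before the general descent clause), not part of the original program:

```prolog
% Make relative clauses scope islands: a rel subtree is returned
% unchanged, so no qnp inside it can be extracted.  This blocks the
% illegitimate reading (d), but, as just noted, it also wrongly
% blocks the legitimate wide-scope reading (b).
find(rel(That,S), rel(That,S), Store, Store) :- !.
```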

A particular scope phenomenon which has been much (perhaps too much) studied is pronouns: under which conditions can they be bound or not? In particular, the pronoun her cannot be bound by every girl in (e), while herself has to be bound in (f).

e) Every girl saw her.
f) Every girl saw herself.

We will not enter that discussion here, but the following is relevant from a computational point of view. Strand (1992) implemented the solution to these phenomena proposed within LFG (see Sells 1985), based on functional uncertainty (Dalrymple et al. 1990). He found it much easier not to try to resolve the possible pronoun bindings during parsing, but instead to use a two-step procedure where the output of the first step was not specified with respect to whether pronouns were bound or not, and what their binders were, while the second step determined this in accordance with an implementation of inside-out functional uncertainty.

CONSIDERATIONS 2 – LIKELIHOOD OF SCOPE

With our implementation, even quite simple sentences become ambiguous in many ways. Still, when we as language users hear an utterance, we normally have no problems in understanding it. We do not even notice the possible ambiguities. One might consider that the proper task of an implementation is not to find all possible scopings but the most likely one in the context.

This is not an easy task, however. Several factors seem to work together: the order of the quantifiers (a left-to-right interpretation is preferred, ceteris paribus); the determiners involved (e.g. each has a tendency to take wide scope, all has a tendency to take narrow scope); the context (e.g. some prepositions have a tendency to ascribe wide scope to their arguments); and real-world knowledge (some readings describe situations we know could not happen).
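As a crude illustration of how such factors might be combined, here is a toy scorer of our own invention; the determiner biases and weights below are made up for illustration only. A candidate scoping is a list of determiners, widest scope first, and is rewarded for matching the surface order and for giving widest scope to a wide-scope-prone determiner.

```prolog
:- use_module(library(lists)).   % last/2

% Invented wide-scope biases for some determiners (illustrative only).
bias(each,  2).
bias(every, 0).
bias(a,     0).
bias(all,  -2).

% score(+Scoping, +SurfaceOrder, -Score): a left-to-right bonus
% plus the bias of the determiner taking widest scope.
score(Scoping, Surface, Score) :-
    ( Scoping == Surface -> Order = 1 ; Order = 0 ),
    Scoping = [Widest|_],
    bias(Widest, B),
    Score is Order + B.

% best(+Candidates, +SurfaceOrder, -Best): the highest-scoring scoping.
best(Candidates, Surface, Best) :-
    findall(S-C, (member(C, Candidates), score(C, Surface, S)), Pairs),
    keysort(Pairs, Sorted),
    last(Sorted, _-Best).
```

On this toy scheme, best([[every,a],[a,every]], [every,a], B) picks the left-to-right scoping [every,a], while each overrides surface order: best([[a,each],[each,a]], [a,each], B) picks [each,a].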

But there is of course the possibility of using the simple left-to-right order as an approximation, which is exactly what we did in section 3.

CONSIDERATIONS 3 – EFFICIENT IMPLEMENTATION

Our implementation parsed the sentence once for each possible scoping. Needless to say, this is a time-consuming approach when the sentence is long with many quantified NPs.

An implementation where the sentence is parsed only once (for each syntactic structure), where the quantifiers are stored, and where the disambiguated formulae are calculated later on may be much more efficient. Hobbs and Shieber (1987) and Vestre (1991) consider algorithms for efficiently calculating the possible scopings from an underspecified logical formula with stored quantifiers.


6.5 Exercises

EXERCISE 6.1

When we allow quantifying-in within the scope of a relative clause (section 6.3), the resulting derivation trees which are the output of the program become a bit strange. Modify the program to use a more natural format for the relatives.

EXERCISE 6.2

Our implementation calculates disambiguated syntactic structures and logical formulae in parallel during the parse. An approach more in the spirit of Montague's original proposal would have been to calculate only the derivation trees during the parse and then have a separate procedure which calculates the logical formulae from the derivation trees. Modify the program accordingly.

EXERCISE 6.3

Modify the program such that it returns one more argument, a "normal" phrase-structure tree.
Hint: If the program shall return both a phrase-structure tree and a disambiguated structure, some ingenuity is needed, as the quantifying-in rule will disturb the string and thereby the simple trees. One possible trick is to leave behind the tree of the removed NP together with the index he(Ind) in the string.

EXERCISE 6.4

Fill in the details of the program we constructed in the paragraph on Quantifier Raising in the following two different ways.
a) The output of the parse shall be a derivation history. A quantifier may take scope over a relative clause. The interaction between negation and quantifiers in relative clauses shall come out correctly, as in section 6.3. We will, though, allow quantifiers which are not quantified-in, and hence several different histories for the same reading.
b) The output of the parse shall be a phrase-structure tree. The phrase-structure tree shall then be transformed into a disambiguated tree/derivation history/LF. You should cover the same phenomena as in (a).

EXERCISE 6.5

From exercise 6.2 and exercise 6.4 the following program can be constructed: routine 1 parses the string and returns a phrase-structure tree, routine 2 turns the phrase-structure tree into a disambiguated syntactic structure, and routine 3 translates the disambiguated syntactic structure into a logical formula. Combine routine 2 and routine 3 into one routine which constructs an associated logical formula directly from a phrase-structure tree.


References

Abramson, H. and V. Dahl, 1989, Logic Grammars, Springer, Berlin, Germany.

Alshawi, H. (ed.), 1992, The Core Language Engine, MIT Press, Cambridge, MA.

Barwise, J. and R. Cooper, 1981, "Generalized Quantifiers and Natural Language", Linguistics and Philosophy, 4, 159–219.

Chierchia, G. and S. McConnell-Ginet, 1990, Meaning and Grammar. An Introduction to Semantics, MIT Press, Cambridge, MA.

Cooper, R., 1983, Quantification and Syntactic Theory, D. Reidel, Dordrecht, Holland.

Dalrymple, M., J. Maxwell and A. Zaenen, 1990, "Modeling Anaphoric Superiority", in Proceedings of Coling 90, Helsinki, Finland.

Dowty, D.R., R.E. Wall and S. Peters, 1981, Introduction to Montague Semantics, D. Reidel, Dordrecht, Holland.

Fenstad, J.E., P.-K. Halvorsen, T. Langholm and J. van Benthem, 1987, Situations, Language and Logic, D. Reidel, Dordrecht, Holland.

Gal, A., G. Lapalme, P. Saint-Dizier and H. Somers, 1991, Prolog for Natural Language Analysis, Wiley, New York, NY.

Gazdar, G. and C. Mellish, 1989, Natural Language Processing in PROLOG, Addison-Wesley, Wokingham, England.

Halvorsen, P.-K., 1983, "Semantics for Lexical-Functional Grammar", Linguistic Inquiry, 14, 567–615.

Hobbs, J.R. and S.M. Shieber, 1987, "An Algorithm for Generating Quantifier Scopings", Computational Linguistics, 13, 47–63.

Lakoff, G., 1987, Women, Fire and Dangerous Things, Chicago University Press, Chicago, IL.

Montague, R., 1973, "The Proper Treatment of Quantification in Ordinary English", in Hintikka, K.J.J., J.M.E. Moravcsik and P. Suppes (eds.), Approaches to Natural Language, Reidel, Dordrecht, Holland.

Pereira, F.C.N. and S.M. Shieber, 1987, Prolog and Natural-Language Analysis, CSLI Lecture Notes, Stanford, CA.

Sells, P., 1985, Lectures on Contemporary Syntactic Theories, CSLI Lecture Notes, Stanford, CA.

Strand, K., 1992, Indeksering av nomenfraser i et tekstforstående system, hovedfagsoppgave, Institutt for lingvistikk og filosofi, Universitetet i Oslo (in Norwegian).

Vestre, E., 1991, "An Algorithm for Generating Non-redundant Quantifier Scopings", in Proceedings of the Fifth Conference of the European Chapter of the Association for Computational Linguistics, Berlin, Germany.