Unification Grammars
Shuly Wintner
Department of Computer Science, University of Haifa
Haifa, Israel
L-Universita ta’ Malta, October 2008
Introduction Overview
Introduction
Grammatical formalisms: a formal, mathematical and computational model for (the structure of) natural languages
Unification grammars: a general formalism underlying various linguistic theories
Lexical Functional Grammar (LFG)
Head-driven Phrase Structure Grammar (HPSG)
some variants of CCG, TAG, ...
This course: linguistic motivation; mathematical infrastructure; linguistic applications
This is an introductory course...
The Book (Francez and Wintner, Forthcoming)
© Shuly Wintner (University of Haifa), Unification Grammars. Copyrighted material.
Plan
Syntax: the structure of natural languages
Linguistic formalisms
Constituency
Some syntactic phenomena
Context-free grammars
Basic definitions: grammars, forms, derivations, languages...
Derivation trees
Structural ambiguity
Generative capacity
CFGs and natural languages
Plan
Feature structures
Motivation
Feature graphs
Subsumption
Feature structures
Attribute-value matrices
Unification
Feature structure unification
Feature graph unification
Generalization
Plan
Extending feature structures
Multi-rooted feature graphs
Multi-rooted feature structures
Multi-AVMs
Unification grammars
Unification in context
Forms and grammar rules
Derivations
Languages
Derivation trees
Plan
Linguistic applications
Agreement
Case control
Subcategorization
Long-distance dependencies
Control
Coordination
Plan
Computational aspects
The expressivity of unification grammars
Extensions and open problems
Restricted versions of unification grammars
Typed unification grammars
Development of large-scale grammars
Grammar engineering
Syntax Linguistic formalisms
Linguistic formalisms
Syntax is the field of linguistics that studies the structure of natural languages.
Why should there be any mathematics involved with linguistic theories?
A linguistic formalism is a (formal) language, with which claims about (natural, but also formal) languages can be made.
Syntax Structure
Syntax
The underlying assumption is that languages have structure: not all sequences of words over the given alphabet are valid; and when a sequence of words is valid (grammatical), a natural structure can be induced on it.
It is useful to think of this structure as a tree (although we shall see other structures later).
Given a sentence in some language, not all possible trees define the structure that native speakers of the language intuitively recognize.
Natural languages have structure
Even though I klaw through the valley of the shadow of death, I will raef no evil
Even though I walk through the valley of the shadow of death, I will fear no evil
Even though I ordinary through the valley of the shadow of death, I will slowly no evil
Even though I slowly gaze through the valley of the shadow of death, I will unsurprisingly do no evil
Even though I walk through the valley of the shadow of death, I will fear no evil
Natural languages have structure
Natural languages are infinite:
The water put out the fire
The water put out the fire, that burned the stick
The water put out the fire, that burned the stick, that hit the dog
The water put out the fire, that burned the stick, that hit the dog,
that chased the cat
But it is possible to characterize an infinite set with finite expressions.
Natural languages have structure
Intuitively, words combine to form phrases:
(Jacob (served (seven years) (for Rachel))),
and (they seemed to him but a few days
(because of ((the love) (he had for her)))).
but not:
(Jacob served) seven (years for) Rachel,
and they (seemed to) him but
(a few days because) of the love he had for her.
Phrases which correspond to our native speaker intuitions are called constituents.
Syntax Constituency
Determining constituents
The criteria for defining constituents are sometimes fuzzy.
The main criterion is equivalent distribution: if two word sequences are mutually interchangeable in every context, preserving grammaticality, then both are constituents and both have the same grammatical category.
Determining constituents
Certain grammatical operations apply only to constituents:
Topicalization:
For Rachel, Jacob served seven years
Cleft:
It was for Rachel that Jacob served seven years
Interjection:
Jacob served seven years, the Bible tells us, for Rachel
Determining constituents
Certain grammatical operations apply only to constituents:
Question formation:
How long did Jacob serve for Rachel?
Coordination:
Jacob served seven years for Rachel,
and they seemed to him but a few days
Anaphors refer to constituents:
... and for Leah, too
Types of constituents
Inducing structure on a grammatical string is done recursively, starting with the words.
To this end, words are classified into categories according to their distribution.
In many languages, words are classified into substantial and functional categories.
substantial: table, dogs, walked, purple, quickly
functional: the, in, or
Another classification is according to whether the category is open or closed.
Types of constituents
Word categories (parts of speech):
N Noun table, dogs, justice, oak
V Verb run, climb, love, ignore
ADJ Adjective green, fast, mild, imaginary
ADV Adverb quickly, well, alone
P Preposition in, to, of, after, in spite of
D Determiner a, the, all, some
Pron Pronoun I, you, she, theirs, our
PropN Proper Noun Jacob, IBM, Haifa
Constituents
Phrases are projections of word categories:
Noun phrases are headed by nouns:
table → round table → the round table
→ the round table in the corner
→ the round table in the corner that we sat at yesterday
Verb phrases are headed by verbs:
climbed → climbed a tree → climbed a tree yesterday
→ recklessly climbed a tree yesterday
Adjectival phrases are headed by adjectives:
high → rather high / higher than me / high as a tree
Constituents
Phrases consist of a head and additional complements and adjuncts. The phrase is a projection of its head.
Complements are required by the head, and are mandatory. Adjuncts are optional, and can be iterated.
Example: John drinks a cup of milk every morning
Syntax Syntactic phenomena
Syntactic phenomena
Agreement
Subcategorization
Case assignment
Unbounded dependencies
Subject/object control
Coordination
A gradual description of language fragments
E0 is a small fragment of English consisting of very simple sentences, constructed with only intransitive and transitive (but no ditransitive) verbs, common nouns, proper names, pronouns and determiners.
Typical sentences are:
A sheep drinks
Rachel herds the sheep
Jacob loves her
A gradual description of language fragments
Similar strings are not E0- (and, hence, English-) sentences:
∗Rachel feed the sheep
∗Rachel feeds herds the sheep
∗The shepherds feeds the sheep
∗Rachel feeds
∗Jacob loves she
∗Jacob loves Rachel she
∗Them herd the sheep
A gradual description of language fragments
There are constraints on the combination of phrases in E0:
The subject and the predicate must agree on number and person: if the subject is a third person singular, so must the verb be.
Objects complement only – and all – the transitive verbs.
When a pronoun is used, it is in the nominative case if it is in the subject position, and in the accusative case if it is an object.
Subcategorization
E1 is a fragment of English, based on E0, in which verbs are classified into subclasses according to the complements they “require”:
Laban gave Jacob his daughter
Jacob promised Laban to marry Leah
Laban persuaded Jacob to promise him to marry Leah
Similar strings that violate this constraint are:
∗Rachel feeds Jacob the sheep
∗Jacob saw to marry Leah
Control
With the addition of infinitival complements to E1, the fragment E2 can capture constraints of argument control in English:
Jacob promised Laban to work seven years
Laban persuaded Jacob to work seven years
Long distance dependencies
Another extension of E1 is E3, typical sentences of which are:
The shepherd wondered whom Jacob loved ⌣.
The shepherd wondered whom Laban thought Jacob loved ⌣.
The shepherd wondered whom Laban thought Rachel claimed
Jacob loved ⌣.
An attempt to replace the gap with an explicit noun phrase results in ungrammaticality:
∗The shepherd wondered who Jacob loved Rachel.
Long distance dependencies
The gap need not be in the object position:
Jacob wondered who ⌣ loved Leah
Jacob wondered who Laban believed ⌣ loved Leah
Again, an explicit noun phrase filling the gap results in ungrammaticality:
∗Jacob wondered who the shepherd loved Leah
Long distance dependencies
More than one gap may be present in a sentence (and, hence, more than one filler):
This is the well which Jacob is likely to ⌣ draw water from ⌣
It was Leah that Jacob worked for ⌣ without loving ⌣
In some languages (e.g., Norwegian) there is no (principled) bound on the number of gaps that can occur in a single clause.
Long distance dependencies
There are other fragments of English in which long distance dependencies are manifested in other forms.
Topicalization:
Rachel, Jacob loved ⌣
Rachel, every shepherd knew Jacob loved ⌣
Another example is interrogative sentences:
Who did Jacob love ⌣?
Who did Laban believe Jacob loved ⌣?
Coordination
Coordination is accounted for in the language fragment E4:
No man lift up his [hand] or [foot] in all the land of Egypt
Jacob saw [Rachel] and [the sheep of Laban]
Jacob [went on his journey] and [came to the land of the people of the east]
Jacob [went near], and [rolled the stone from the well’s mouth], and [watered the flock of Laban his mother’s brother].
every [speckled] and [spotted] sheep
Leah was [tender eyed] but [not beautiful]
[Leah had four sons], but [Rachel was barren]
She said to Jacob, “[Give me children], or [I shall die]!”
The goals of syntactic analysis
Given a natural language sentence, syntactic analysis provides a structural description of the sentence.
To do so, one must have a model of the structure of the language.
Syntax is concerned with a formulation of the structure of natural languages. An example of a syntactic formalism is context-free grammars.
In CFGs, the structure of sentences is modeled by derivation trees.
Context-free grammars Basic definitions
Context-free grammars
Definition (Context-free grammars)
A context-free grammar (CFG) is a four-tuple 〈Σ, V, S, P〉, where:
Σ is a finite, non-empty set of terminals, the alphabet;
V is a finite, non-empty set of grammar variables (categories, or non-terminal symbols), such that Σ ∩ V = ∅;
S ∈ V is the start symbol;
P is a finite set of production rules, each of the form A → α, where A ∈ V and α ∈ (V ∪ Σ)∗.
For a rule A → α, A is the rule’s head and α is its body.
Context-free grammars
Example (CFG example)
Σ = {the, cat, in, hat}
V = {D, N, P, NP, PP}
The start symbol is NP
The rules:
D → the      NP → D N
N → cat      PP → P NP
N → hat      NP → NP PP
P → in
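As a minimal sketch (not part of the slides; the names SIGMA, V, START and RULES are my own), this grammar can be written down directly as Python data:

```python
# The example CFG as plain Python data: terminals, non-terminals,
# a start symbol, and rules given as (head, body) pairs.
SIGMA = {"the", "cat", "in", "hat"}   # Σ, the alphabet
V = {"D", "N", "P", "NP", "PP"}       # non-terminal symbols
START = "NP"                          # S, the start symbol
RULES = [
    ("D", ["the"]), ("N", ["cat"]), ("N", ["hat"]), ("P", ["in"]),
    ("NP", ["D", "N"]), ("NP", ["NP", "PP"]), ("PP", ["P", "NP"]),
]

# The definition requires Σ ∩ V = ∅ and S ∈ V:
assert SIGMA.isdisjoint(V) and START in V
```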
Context-free grammars: language
Each non-terminal symbol in a grammar denotes a language.
A rule such as N → cat implies that the language denoted by the non-terminal N includes the alphabet symbol cat.
The symbol cat here is a single, atomic alphabet symbol, and not a string of symbols: the alphabet of this example consists of natural language words, not of natural language letters.
For a more complex rule such as NP → D N, the language denoted by NP contains the concatenation of the language denoted by D with that denoted by N: L(NP) ⊇ L(D) · L(N).
Matters become more complicated when we consider recursive rules such as NP → NP PP.
Context-free grammars: derivation
Given a grammar G = 〈Σ, V, S, P〉, we define the set of forms to be (V ∪ Σ)∗: the set of all sequences of terminal and non-terminal symbols.
Derivation is a relation that holds between two forms, each a sequence of grammar symbols.
Definition (Derivation)
A form α derives a form β, denoted by α ⇒ β, if and only if α = γl A γr and β = γl γc γr and A → γc is a rule in P.
A is called the selected symbol. The rule A → γc is said to be applicable to α.
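This definition can be sketched in Python (the helper name derive_step is mine): one derivation step replaces one selected non-terminal occurrence with the body of a matching rule, keeping the contexts γl and γr intact.

```python
def derive_step(form, rules):
    """Yield every form β with α ⇒ β: for each occurrence of a
    non-terminal A in `form` and each rule A → γc, replace that
    occurrence by γc, preserving the surrounding context."""
    for i, symbol in enumerate(form):
        for head, body in rules:
            if head == symbol:
                yield form[:i] + body + form[i + 1:]

RULES = [("NP", ["D", "N"]), ("D", ["the"]), ("N", ["cat"]), ("N", ["hat"])]
print(list(derive_step(["NP"], RULES)))        # [['D', 'N']]
print(list(derive_step(["the", "N"], RULES)))  # [['the', 'cat'], ['the', 'hat']]
```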
Derivation
Example (Forms)
The set of non-terminals of G is V = {D, N, P, NP, PP} and the set of terminals is Σ = {the, cat, in, hat}.
The set of forms therefore contains all the (infinitely many) sequences of elements from V and Σ, such as 〈〉, 〈NP〉, 〈D cat P D hat〉, 〈D N〉, 〈the cat in the hat〉, etc.
Derivation
Example (Derivation)
Let us start with a simple form, 〈NP〉. Observe that it can be written as γl NP γr, where both γl and γr are empty. Observe also that NP is the head of some grammar rule: the rule NP → D N. Therefore, the form is a good candidate for derivation: if we replace the selected symbol NP with the body of the rule, while preserving its environment, we get γl D N γr = D N. Therefore, 〈NP〉 ⇒ 〈D N〉.
Derivation
Example (Derivation)
We now apply the same process to 〈D N〉. This time the selected symbol is D (we could have selected N, of course). The left context is again empty, while the right context is γr = N. As there exists a grammar rule whose head is D, namely D → the, we can replace the rule’s head by its body, preserving the context, and obtain the form 〈the N〉. Hence 〈D N〉 ⇒ 〈the N〉.
Derivation
Example (Derivation)
Given the form 〈the N〉, there is exactly one non-terminal that we can select, namely N. However, there are two rules that are headed by N: N → cat and N → hat. We can select either of these rules to show that both 〈the N〉 ⇒ 〈the cat〉 and 〈the N〉 ⇒ 〈the hat〉.
Since the form 〈the cat〉 consists of terminal symbols only, no non-terminal can be selected and hence it derives no form.
Extended derivation
α k⇒G β if α derives β in k steps: α ⇒G α1 ⇒G α2 ⇒G . . . ⇒G αk and αk = β.
The reflexive-transitive closure of ‘⇒G’ is ‘∗⇒G’: α ∗⇒G β if α k⇒G β for some k ≥ 0.
A G-derivation is a sequence of forms α1, . . . , αn, such that for every i, 1 ≤ i < n, αi ⇒G αi+1.
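The closure ‘∗⇒’ can be explored mechanically. The sketch below (my own helper; the max_steps bound is there only to keep the search finite for illustration) checks α ∗⇒ β by breadth-first search over single derivation steps:

```python
from collections import deque

RULES = [("D", ("the",)), ("N", ("cat",)), ("N", ("hat",)), ("P", ("in",)),
         ("NP", ("D", "N")), ("NP", ("NP", "PP")), ("PP", ("P", "NP"))]

def derives_star(alpha, beta, rules, max_steps=6):
    """True if alpha ⇒* beta in at most max_steps derivation steps."""
    seen = {tuple(alpha)}
    queue = deque([(tuple(alpha), 0)])
    while queue:
        form, k = queue.popleft()
        if list(form) == list(beta):
            return True                       # k = 0 covers reflexivity
        if k == max_steps:
            continue
        for i, sym in enumerate(form):        # try every single step α ⇒ β
            for head, body in rules:
                if head == sym:
                    new = form[:i] + tuple(body) + form[i + 1:]
                    if new not in seen:
                        seen.add(new)
                        queue.append((new, k + 1))
    return False

print(derives_star(["NP"], ["the", "cat"], RULES))  # True (3 steps)
print(derives_star(["D"], ["cat"], RULES))          # False
```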
Extended derivation: example
Example (Derivation)
(1) 〈NP〉 ⇒ 〈D N〉
(2) 〈D N〉 ⇒ 〈the N〉
(3) 〈the N〉 ⇒ 〈the cat〉
Extended derivation: example
Example (Derivation)
Therefore, we trivially have:
(4) 〈NP〉 ∗⇒ 〈D N〉
(5) 〈D N〉 ∗⇒ 〈the N〉
(6) 〈the N〉 ∗⇒ 〈the cat〉
From (2) and (6) we get
(7) 〈D N〉 ∗⇒ 〈the cat〉
and from (1) and (7) we get
(8) 〈NP〉 ∗⇒ 〈the cat〉
Languages
Definition (Sentential forms)
A form α is a sentential form of a grammar G iff S ∗⇒G α, i.e., it can be derived in G from the start symbol.
Definition (Language)
The (formal) language generated by a grammar G with respect to a category name (non-terminal) A is LA(G) = {w | A ∗⇒ w}. The language generated by the grammar is L(G) = LS(G).
Definition (Context-free languages)
A language that can be generated by some CFG is a context-free language, and the class of context-free languages is the set of languages every member of which can be generated by some CFG. If no CFG can generate a language L, L is said to be trans-context-free.
Language of a grammar
Example (Language)
For the example grammar (with NP the start symbol):
D → the      NP → D N
N → cat      PP → P NP
N → hat      NP → NP PP
P → in
it is fairly easy to see that L(D) = {the}.
Similarly, L(P) = {in} and L(N) = {cat, hat}.
Language of a grammar
Example (Language)
It is more difficult to define the languages denoted by the non-terminals NP and PP, although it should be straightforward that the latter is obtained by concatenating {in} with the former.
Proposition: L(NP) is the denotation of the regular expression
the · (cat + hat) · (in · the · (cat + hat))∗
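The proposition can be spot-checked with Python’s re module; treating words as space-separated tokens, ‘·’ becomes a space, ‘+’ an alternation, and ‘∗’ the Kleene star (this encoding is mine, not the slides’):

```python
import re

# the · (cat + hat) · (in · the · (cat + hat))∗, over space-separated words
NP_RE = re.compile(r"the (?:cat|hat)(?: in the (?:cat|hat))*")

assert NP_RE.fullmatch("the cat")
assert NP_RE.fullmatch("the cat in the hat")
assert NP_RE.fullmatch("the hat in the cat in the hat")
assert not NP_RE.fullmatch("cat in the hat")   # missing determiner
assert not NP_RE.fullmatch("the cat in hat")   # PP must contain a full NP
print("all strings checked")
```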
Language: a formal example Ge
Example (Language)
S → Va S Vb
S → ε
Va → a
Vb → b
L(Ge) = {a^n b^n | n ≥ 0}.
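A quick sketch (my own) simulating a Ge-derivation of a^n b^n: apply S → Va S Vb n times, erase S with S → ε, then rewrite every Va and Vb to its terminal:

```python
def derive_anbn(n):
    """Simulate a Ge-derivation of the word a^n b^n."""
    form = ["S"]
    for _ in range(n):                 # apply S → Va S Vb, n times
        i = form.index("S")
        form[i:i + 1] = ["Va", "S", "Vb"]
    form.remove("S")                   # apply S → ε
    # apply Va → a and Vb → b to every remaining symbol
    return "".join("a" if s == "Va" else "b" for s in form)

print(derive_anbn(3))  # aaabbb
print(derive_anbn(0))  # (the empty word)
```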
Recursion
The language L(Ge) is infinite: it includes an infinite number of words; yet Ge is a finite grammar.
To be able to produce infinitely many words with a finite number of rules, a grammar must be recursive: there must be at least one rule whose body contains a symbol, from which the head of the rule can be derived.
Put formally, a grammar 〈Σ, V, S, P〉 is recursive if there exists a chain of rules, p1, . . . , pn ∈ P, such that for every 1 ≤ i < n, the head of pi+1 occurs in the body of pi, and the head of p1 occurs in the body of pn.
In Ge, the recursion is simple: the chain consists of a single rule, since S → Va S Vb is in itself recursive.
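This condition is easy to test mechanically. The sketch below (function names are mine) builds the relation “B occurs in the body of a rule headed by A” and looks for a non-terminal that can reach itself:

```python
from collections import defaultdict

def is_recursive(rules):
    """True iff some chain of rules leads from a non-terminal back to itself."""
    heads = {head for head, _ in rules}
    edges = defaultdict(set)   # A -> {B : B occurs in a body of a rule headed A}
    for head, body in rules:
        edges[head] |= {s for s in body if s in heads}
    def reaches(start, target, seen=frozenset()):
        return any(nxt == target or
                   (nxt not in seen and reaches(nxt, target, seen | {nxt}))
                   for nxt in edges[start])
    return any(reaches(h, h) for h in heads)

GE = [("S", ["Va", "S", "Vb"]), ("S", []), ("Va", ["a"]), ("Vb", ["b"])]
FINITE = [("NP", ["D", "N"]), ("D", ["the"]), ("N", ["cat"])]
print(is_recursive(GE))      # True: S → Va S Vb is in itself recursive
print(is_recursive(FINITE))  # False: this grammar generates a finite language
```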
Context-free grammars Derivation trees
Derivation tree
Sometimes derivations provide more information than is actually needed. In particular, sometimes two derivations of the same string differ not in the rules that were applied but only in the order in which they were applied.
Starting with the form 〈NP〉 it is possible to derive the string the cat in two ways:
(1) 〈NP〉 ⇒ 〈D N〉 ⇒ 〈D cat〉 ⇒ 〈the cat〉
(2) 〈NP〉 ⇒ 〈D N〉 ⇒ 〈the N〉 ⇒ 〈the cat〉
Since both derivations use the same rules to derive the same string, it is sometimes useful to collapse such “equivalent” derivations into one. To this end the notion of derivation trees is introduced.
Derivation tree
A derivation tree (sometimes called parse tree, or simply tree) is a visual aid in depicting derivations, and a means for imposing structure on a grammatical string.
Trees consist of vertices and branches; a designated vertex, the root of the tree, is depicted on the top. Then, branches are simply connections between two vertices.
Intuitively, trees are depicted “upside down”, since their root is at the top and their leaves are at the bottom.
Derivation tree
Example (Derivation tree)
An example of a derivation tree for the string the cat in the hat, shown in bracketed form:
[NP [NP [D the] [N cat]] [PP [P in] [NP [D the] [N hat]]]]
Derivation tree
Formally, a tree consists of a finite set of vertices and a finite set of branches (or arcs), each of which is an ordered pair of vertices.
In addition, a tree has a designated vertex, the root, which has two properties: it is not the target of any arc, and every other vertex is accessible from it (by following one or more branches).
When talking about trees we sometimes use family notation: if a vertex v has a branch leaving it which leads to some vertex u, then we say that v is the mother of u and u is the daughter, or child, of v. If u has two daughters, we refer to them as sisters.
Derivation trees
Derivation trees are defined with respect to some grammar G, and must obey the following conditions:
1 every vertex has a label, which is either a terminal symbol, a non-terminal symbol or ε;
2 the label of the root is the start symbol;
3 if a vertex v has an outgoing branch, its label must be a non-terminal symbol, the head of some grammar rule; and the elements in the body of the same rule must be the labels of the children of v, in the same order;
4 if a vertex is labeled ε, it is the only child of its mother.
Derivation trees
A leaf is a vertex with no outgoing branches.
A tree induces a natural “left-to-right” order on its leaves; when read from left to right, the sequence of leaves is called the frontier, or yield, of the tree.
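Representing a tree as a (label, children) pair (a representation I choose here for illustration), the yield is just the left-to-right sequence of leaf labels:

```python
def tree_yield(tree):
    """Frontier (yield) of a derivation tree given as (label, children);
    a leaf is (label, [])."""
    label, children = tree
    if not children:
        return [label]
    return [leaf for child in children for leaf in tree_yield(child)]

# The derivation tree for "the cat in the hat" shown above:
T = ("NP", [("NP", [("D", [("the", [])]), ("N", [("cat", [])])]),
            ("PP", [("P", [("in", [])]),
                    ("NP", [("D", [("the", [])]), ("N", [("hat", [])])])])])
print(tree_yield(T))  # ['the', 'cat', 'in', 'the', 'hat']
```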
Correspondence between trees and derivations
Derivation trees correspond very closely to derivations.
For a form α, a non-terminal symbol A derives α if and only if α is the yield of some parse tree whose root is A.
Sometimes there exist different derivations of the same string that correspond to a single tree. In fact, the tree representation collapses exactly those derivations that differ from each other only in the order in which rules are applied.
Correspondence between trees and derivations
[NP [NP [D the] [N cat]] [PP [P in] [NP [D the] [N hat]]]]
Each non-leaf vertex in the tree corresponds to some grammar rule (since it must be labeled by the head of some rule, and its children must be labeled by the body of the same rule).
Correspondence between trees and derivations
This tree represents the following derivations (among others):
(1) NP ⇒ NP PP ⇒ D N PP ⇒ D N P NP ⇒ D N P D N ⇒ the N P D N ⇒ the cat P D N ⇒ the cat in D N ⇒ the cat in the N ⇒ the cat in the hat
(2) NP ⇒ NP PP ⇒ D N PP ⇒ the N PP ⇒ the cat PP ⇒ the cat P NP ⇒ the cat in NP ⇒ the cat in D N ⇒ the cat in the N ⇒ the cat in the hat
(3) NP ⇒ NP PP ⇒ NP P NP ⇒ NP P D N ⇒ NP P D hat ⇒ NP P the hat ⇒ NP in the hat ⇒ D N in the hat ⇒ D cat in the hat ⇒ the cat in the hat
Correspondence between trees and derivations
While exactly the same rules are applied in each derivation (the rules are uniquely determined by the tree), they are applied in different orders.
In particular, derivation (2) is a leftmost derivation: in every step the leftmost non-terminal symbol of the form is expanded.
Similarly, derivation (3) is rightmost.
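Reading the leftmost derivation off a tree can be sketched as follows (my own helper, again using nested (label, children) pairs): at each step the leftmost vertex that still has children is expanded into its daughters.

```python
def leftmost_derivation(tree):
    """Return the forms of the leftmost derivation encoded by `tree`
    (nested (label, children) pairs; leaves have children == [])."""
    forms, form = [], [tree]
    while True:
        forms.append([node[0] for node in form])   # record labels only
        for i, node in enumerate(form):            # leftmost expandable vertex
            if node[1]:
                form = form[:i] + node[1] + form[i + 1:]
                break
        else:
            return forms                           # all leaves: derivation done

T = ("NP", [("D", [("the", [])]), ("N", [("cat", [])])])
print(leftmost_derivation(T))
# [['NP'], ['D', 'N'], ['the', 'N'], ['the', 'cat']]
```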
Context-free grammars Ambiguity
Ambiguity
Sometimes, however, different derivations (of the same string!) correspond to different trees.
This can happen only when the derivations differ in the rules which they apply.
When more than one tree exists for some string, we say that the string is ambiguous.
Ambiguity is a major problem when grammars are used for certain formal languages, in particular programming languages. But for natural languages, ambiguity is unavoidable as it corresponds to properties of the natural language itself.
Ambiguity: example
Consider again the example grammar and the following string:
the cat in the hat in the hat
Intuitively, there can be (at least) two readings for this string: one in which a certain cat wears a hat-in-a-hat, and one in which a certain cat-in-a-hat is inside a hat:
((the cat in the hat) in the hat)
(the cat in (the hat in the hat))
This distinction in intuitive meaning is reflected in the grammar, and hence two different derivation trees, corresponding to the two readings, are available for this string:
Ambiguity: example
[NP [NP [NP [D the] [N cat]] [PP [P in] [NP [D the] [N hat]]]] [PP [P in] [NP [D the] [N hat]]]]
Ambiguity: example
[NP [NP [D the] [N cat]] [PP [P in] [NP [NP [D the] [N hat]] [PP [P in] [NP [D the] [N hat]]]]]]
Ambiguity: example
Using linguistic terminology, in the first tree the second occurrence of the prepositional phrase in the hat modifies the noun phrase the cat in the hat, whereas in the second tree it only modifies the (first occurrence of) the noun phrase the hat.
This situation is known as syntactic or structural ambiguity.
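The ambiguity can be verified mechanically. This sketch (my own, a memoized CKY-style counter; it assumes rule bodies are either a single terminal or two non-terminals, which holds for the example grammar) counts derivation trees per span:

```python
from functools import lru_cache

RULES = [("D", ("the",)), ("N", ("cat",)), ("N", ("hat",)), ("P", ("in",)),
         ("NP", ("D", "N")), ("NP", ("NP", "PP")), ("PP", ("P", "NP"))]
WORDS = tuple("the cat in the hat in the hat".split())

@lru_cache(maxsize=None)
def count_trees(sym, i, j):
    """Number of derivation trees rooted in `sym` with yield WORDS[i:j]."""
    total = 0
    for head, body in RULES:
        if head != sym:
            continue
        if len(body) == 1 and j - i == 1 and body[0] == WORDS[i]:
            total += 1                    # terminal rule covering one word
        elif len(body) == 2:
            left, right = body
            for k in range(i + 1, j):     # split the span at position k
                total += count_trees(left, i, k) * count_trees(right, k, j)
    return total

print(count_trees("NP", 0, len(WORDS)))  # 2
```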
Context-free grammars Generative capacity
Grammar equivalence
It is common in formal language theory to relate different grammars that generate the same language by an equivalence relation:
Two grammars G1 and G2 (over the same alphabet Σ) are equivalent (denoted G1 ≡ G2) iff L(G1) = L(G2).
We refer to this relation as weak equivalence, as it only relates the generated languages. Equivalent grammars may attribute totally different syntactic structures to members of their (common) languages.
Grammar equivalence
Example (Equivalent grammars, different trees)
Following are two different tree structures that are attributed to the string aabb by the grammars Ge and Gf, respectively.
For Ge: [S [Va a] [S [Va a] [S ε] [Vb b]] [Vb b]]
(The corresponding Gf tree, also with yield a a ε b b, is not recoverable from the transcript.)
Grammar equivalence
Example (Structural ambiguity)
A grammar, Garith, for simple arithmetic expressions:
S → a | b | c | S + S | S ∗ S
Two different trees can be associated by Garith with the string a + b ∗ c:
[S [S a] + [S [S b] ∗ [S c]]]
[S [S [S a] + [S b]] ∗ [S c]]
Grammar equivalence
The weak equivalence relation is stated in terms of the generated language.
Consequently, equivalent grammars do not have to be described in the same formalism for them to be equivalent.
We will later see how grammars, specified in different formalisms, can be compared.
Context-free grammars CFGs and natural languages
Normal form
It is convenient to divide grammar rules into two classes: one that contains only phrasal rules of the form A → α, where α ∈ V∗, and another that contains only terminal rules of the form B → σ, where σ ∈ Σ.
It turns out that every CFG is equivalent to some CFG of this form.
Normal form
A grammar G is in phrasal/terminal normal form iff for every production A → α of G, either α ∈ V∗ or α ∈ Σ.
Productions of the form A → σ are called terminal rules, and A is said to be a pre-terminal category, the lexical entry of σ.
Productions of the form A → α, where α ∈ V∗, are called phrasal rules.
Furthermore, every category is either pre-terminal or phrasal, but not both.
For a phrasal rule with α = A1 · · · An, if w = w1 · · · wn, w ∈ LA(G) and wi ∈ LAi(G) for i = 1, . . . , n, we say that w is a phrase of category A, and each wi is a sub-phrase (of w) of category Ai.
A sub-phrase wi of w is also called a constituent of w.
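The normal-form condition itself can be sketched as a small predicate over (head, body) rules (names are mine):

```python
def in_normal_form(rules, V, Sigma):
    """True iff every rule body is either a string of non-terminals
    (a phrasal rule) or a single terminal (a terminal rule)."""
    return all(
        all(s in V for s in body) or (len(body) == 1 and body[0] in Sigma)
        for _, body in rules
    )

V = {"D", "N", "P", "NP", "PP"}
SIGMA = {"the", "cat", "in", "hat"}
RULES = [("D", ["the"]), ("N", ["cat"]), ("N", ["hat"]), ("P", ["in"]),
         ("NP", ["D", "N"]), ("NP", ["NP", "PP"]), ("PP", ["P", "NP"])]
print(in_normal_form(RULES, V, SIGMA))                           # True
print(in_normal_form(RULES + [("NP", ["D", "cat"])], V, SIGMA))  # False
```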
Context-free grammars for natural languages
Context-free grammars can be used for a variety of syntactic constructions, including some non-trivial phenomena such as unbounded dependencies, extraction, extraposition, etc.
However, some (formal) languages are not context-free, and therefore there are certain sets of strings that cannot be generated by context-free grammars.
The interesting question, of course, involves natural languages: are there natural languages that are not context-free? Are context-free grammars sufficient for generating every natural language?
A context-free grammar, G0, for E0
Example (A context-free grammar, G0, for E0)
S → NP VP
VP → V
VP → V NP
NP → D N
NP → Pron
NP → PropN
D → the, a, two, every, . . .
N → sheep, lamb, lambs, shepherd, water . . .
V → sleep, sleeps, love, loves, feed, feeds, herd, herds, . . .
Pron → I, me, you, he, him, she, her, it, we, us, they, them
PropN → Rachel, Jacob, . . .
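G0 can be encoded directly as a Python dictionary mapping each category to its alternatives; a tiny leftmost expansion that always picks the first alternative then yields one sentence of E0. This is an illustrative sketch (the names and the helper are ours, not from the book):

```python
# G0 as a dictionary: category -> list of right-hand sides.
G0 = {
    'S':  [['NP', 'VP']],
    'VP': [['V'], ['V', 'NP']],
    'NP': [['D', 'N'], ['Pron'], ['PropN']],
    'D':  [['the'], ['a'], ['two'], ['every']],
    'N':  [['sheep'], ['lamb'], ['lambs'], ['shepherd']],
    'V':  [['sleep'], ['sleeps'], ['love'], ['loves'], ['feed'], ['feeds']],
    'Pron':  [['I'], ['me'], ['you'], ['he'], ['him'], ['she'], ['her']],
    'PropN': [['Rachel'], ['Jacob']],
}

def expand(symbol):
    """Expand a category by always taking its first production."""
    if symbol not in G0:           # terminal word
        return [symbol]
    first_rhs = G0[symbol][0]
    return [word for part in first_rhs for word in expand(part)]

print(' '.join(expand('S')))  # -> the sheep sleep
```

Picking other alternatives generates other sentences of E0, including, as discussed next, some that should be excluded.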
There are two major problems with this grammar.
1 It ignores the valence of verbs: there is no distinction among subcategories of verbs, so an intransitive verb such as sleep might occur with a noun-phrase complement, while a transitive verb such as love might occur without one. In such a case we say that the grammar overgenerates: it generates strings that are not in the intended language.
2 There is no treatment of subject–verb agreement, so a singular subject such as the cat might be followed by a plural form of a verb such as smile. This is another case of overgeneration.
Both problems are easy to solve.
Problems of G0
Over-generation (agreement constraints are not imposed):
∗Rachel feed the sheep
∗The shepherds feeds the sheep
∗Rachel feeds
∗Jacob loves she
∗Them herd the sheep
Over-generation (subcategorization constraints are not imposed):
the lambs sleep
Jacob loves Rachel
∗the lambs sleep the sheep
∗Jacob loves
Example (Over-generation)
[S [NP [D the] [N lambs]] [VP [V sleeps] [NP [Pron they]]]]
(A derivation tree for ∗the lambs sleeps they, which G0 wrongly admits.)
Verb valence
To account for valence, we can replace the non-terminal symbol V by a set of symbols: Vtrans, Vintrans, Vditrans, etc.
We must also change the grammar rules accordingly:
Example
VP → Vintrans Vintrans → sleep, sleeps
VP → Vtrans NP Vtrans → love, loves
VP → Vditrans NP NP Vditrans → give, gives
Agreement
To account for agreement, we can again extend the set of non-terminal symbols so that categories that must agree reflect, in the non-terminal assigned to them, the features on which they agree.
In the very simple case of English, it is sufficient to multiply the set of “nominal” and “verbal” categories, so that we get Dsg, Dpl, Nsg, Npl, NPsg, NPpl, Vsg, Vpl, VPsg, VPpl, etc. We must also change the set of rules accordingly:
Example
Nsg → lamb Npl → lambs
Nsg → sheep Npl → sheep
Vsg → sleeps Vpl → sleep
Vsg → smiles Vpl → smile
Vsg → loves Vpl → love
Vsg → saw Vpl → saw
Dsg → a Dpl → two
Example
S → NPsg VPsg
S → NPpl VPpl
NPsg → Dsg Nsg
NPpl → Dpl Npl
VPsg → Vsg
VPpl → Vpl
VPsg → VPsg NP
VPpl → VPpl NP
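The agreement encoding multiplies categories by number; the doubling can be derived mechanically, which makes the blowup explicit. A sketch (names are ours; only three base rules are shown, and S is left unsplit, as in the example above):

```python
# Base (number-free) rules: category -> right-hand side.
base = [('S', ('NP', 'VP')),
        ('NP', ('D', 'N')),
        ('VP', ('V',))]

rules = []
for num in ('sg', 'pl'):
    for lhs, rhs in base:
        # S itself is not split in the example; the nominal/verbal
        # categories carry the number suffix.
        new_lhs = lhs if lhs == 'S' else lhs + num
        rules.append((new_lhs, tuple(c + num for c in rhs)))

for lhs, rhs in rules:
    print(lhs, '->', ' '.join(rhs))
# Each of the 3 base rules yields an sg and a pl variant: 6 rules.
```

With more agreement features (person, gender, case), the category set and rule set grow multiplicatively; this is precisely the redundancy that feature structures are introduced to avoid.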
Methodological properties of the CFG formalism
1 Concatenation is the only string combination operation
2 Phrase structure is the only syntactic relationship
3 The terminal symbols have no properties
4 Non-terminal symbols (grammar variables) are atomic
5 Most of the information encoded in a grammar lies in the production rules
6 Any attempt to extend the grammar with semantics requires extra means.
Alternative methodological properties
1 Concatenation is not necessarily the only way by which phrases may be combined to yield other phrases.
2 Even if concatenation is the sole string operation, other syntactic relationships have been put forward.
3 Modern computational formalisms for expressing grammars adhere to an approach called lexicalism.
4 Some formalisms do not retain any context-free backbone. However, if one is present, its categories are not atomic.
5 The expressive power added to these formalisms also allows a certain way of representing semantic information.
Feature structures Introduction
Feature structures
Motivated by the violations of the context-free grammar G0, we would like to extend the CFG formalism with additional mechanisms that will facilitate the expression of the information that is missing in G0, in a uniform and compact way.
The core idea is to incorporate into the grammar properties of symbols, in terms of which the violations of G0 were stated.
Properties are represented by means of feature structures.
Overview
An overview of feature structures, motivating their use as a representation of linguistic information
Four different views of these entities:
feature graphs
feature structures
abstract feature structures
attribute-value matrices (AVMs)
Feature structures in a broader context.
Feature structures Motivation
Motivation
Words in natural languages have properties
We want to model these properties in the lexicon
We would like to associate with words not just atomic symbols, as in CFGs, but rather structural information that reflects their properties.
A simple lexicon
Example (A simple lexicon)
lamb: [num: sg, pers: third]
lambs: [num: pl, pers: third]
I: [num: sg, pers: first]
sheep: [num: [ ], pers: third]
dreams: [num: sg, pers: third]
dreams: [num: sg, pers: third]
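One plausible machine encoding of such AVMs is nested Python dictionaries; this is a sketch of ours, not the book's notation, with an empty dict playing the role of the unconstrained value [ ]:

```python
# The example lexicon as nested dictionaries: word -> AVM.
lexicon = {
    'lamb':  {'num': 'sg', 'pers': 'third'},
    'lambs': {'num': 'pl', 'pers': 'third'},
    'I':     {'num': 'sg', 'pers': 'first'},
    'sheep': {'num': {},   'pers': 'third'},  # num left unconstrained
}

print(lexicon['sheep']['num'])  # -> {}
```

Looking up a word returns its feature structure, and looking up a feature in that structure returns its value, atomic or complex.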
Feature structures
Feature structures map features to values, which are themselves feature structures.
A special case of feature structures are atoms, which represent structureless values.
For example, to deal with number (and impose its agreement), we use a feature num, and a set of atomic feature structures {sg, pl} as its values, representing singularity and plurality, respectively.
When a value is not atomic, it is complex.
A complex value is, recursively, a feature structure consisting of features and values.
A complex feature structure
Example (A complex feature structure)
loves: [vtype: transitive, agr: [num: sg, pers: third]]
Grouping features
Deciding how to group features is up to the grammar designer, and is intended to capture syntactic generalizations.
If number and person ‘go together’ in formulating restrictions, it is more appropriate to group them as in this example.
Moreover, such a grouping might be beneficial when feature structures are being modified.
Processes of derivation and parsing (the application of grammar rules) are able to manipulate feature structures to reflect application of such constraints.
When the properties of some feature structure are changed, it is possible to change the value of only one feature, namely agr, rather than specify two separate changes for each subfeature.
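The benefit of grouping can be seen in the dictionary encoding sketched earlier (again our names, not the book's): a single update replaces the whole agreement bundle at once.

```python
# The complex entry for "loves", with num and pers grouped under agr.
loves = {'vtype': 'transitive',
         'agr': {'num': 'sg', 'pers': 'third'}}

# One change to the grouped feature, instead of two separate changes
# to num and pers individually:
plural_agr = {'num': 'pl', 'pers': 'third'}
loves['agr'] = plural_agr

print(loves['agr']['num'])  # -> pl
```

Had num and pers been kept at the top level, every process touching agreement would have to update both features separately.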
In the example lexicon, the lexical ambiguity of sheep is represented by an empty feature structure as the value of the num feature.
This is interpreted as the value of this feature being unconstrained.
However, it would have been useful to be able to state that the only possible values for this feature are, say, sg and pl.
There are at least two different ways to specify such information:
by listing a set of values for the feature;
or by restricting its value to a certain “type” of permissible values.
We do not explore the former solution here.
The latter solution is employed by typed feature structure formalisms.
Adding features to phrases
Words are not the only linguistic entities that have properties; words are combined into phrases, and those also have properties which can be modeled by feature:value pairs.
For example, the noun phrase a sheep has the value sg for the num feature, while two sheep has the value pl for num.
Consequently, grammar non-terminals, too, must be decorated with features, representing the endowment of phrases of this category with that feature.
Feature structures Feature graphs
Feature graphs
The informal discussion of feature structures above depicted them using a representation, called attribute-value matrices (AVMs), which is common in the linguistic literature.
We begin the discussion of feature structures by defining the concept of feature graphs, using well-known concepts of graph theory.
A graph view of feature structures facilitates computational processing, both because so many properties of graphs are well understood and because graphs lend themselves to efficient processing.
We will return to AVMs and discuss their correspondence with feature graphs later on.
Definitions
Feature graphs are defined over a signature consisting of non-empty, finite, disjoint sets Feats of features and Atoms of atoms.
Features are used to encode properties of (linguistic) objects, such as number, gender, etc.
Atoms are used for the (atomic) values of such features, as in plural, feminine, etc.
We use a convention of depicting features in small capitals and atoms in italics.
Signature
Definition (Signature)
A signature is a structure S = 〈Atoms, Feats〉, where Atoms is a finite set of atoms and Feats is a finite set of features.
We assume some fixed signature throughout this presentation.
Meta-variables f , g (with or without subscripts or superscripts) range over features, and a, b, etc. over atoms.
We usually assume that both Feats and Atoms are non-empty (and sometimes even assume that they include more than one element each).
Feature graphs
Definition (Feature graphs)
A feature graph A = 〈QA, qA, δA, θA〉 is a finite, directed, connected, labeled graph consisting of a finite, nonempty set of nodes QA (such that QA ∩ Feats = QA ∩ Atoms = ∅), a root qA ∈ QA, a partial function δA : QA × Feats → QA specifying the arcs, such that every node q ∈ QA is accessible from qA, and a partial function marking some of the sinks: θA : QS → Atoms, where QS = {q ∈ QA | δA(q, f )↑ for every f }.
Given a signature of features Feats and atoms Atoms, let G(Feats, Atoms) be the set of all feature graphs over the signature.
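As a sketch of how the definition can be encoded directly (the class and its names are ours, not part of the course material), the following stores Q, the root, and the partial functions δ and θ as dictionaries, and recovers the sink set QS:

```python
class FeatureGraph:
    """A feature graph <Q, q, delta, theta> over some signature."""

    def __init__(self, nodes, root, delta, theta):
        self.nodes = set(nodes)    # Q
        self.root = root           # the root q
        self.delta = dict(delta)   # partial: (node, feature) -> node
        self.theta = dict(theta)   # partial: sink node -> atom

    def sinks(self):
        """QS: the nodes with no outgoing arcs."""
        with_out = {q for (q, f) in self.delta}
        return self.nodes - with_out

# The graph from the example below: agr -> [num: pl, pers: third].
A = FeatureGraph(
    nodes={'q0', 'q1', 'q2', 'q3'},
    root='q0',
    delta={('q0', 'agr'): 'q1',
           ('q1', 'num'): 'q2',
           ('q1', 'pers'): 'q3'},
    theta={'q2': 'pl', 'q3': 'third'})

print(sorted(A.sinks()))  # -> ['q2', 'q3']
```

Note that θ is defined only on sinks, exactly as the definition requires; non-sink nodes carry no marking.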
Example (Feature graphs)
The graph displayed below is 〈Q, q, δ, θ〉, where Q = {q0, q1, q2, q3}, q = q0, δ(q0, agr) = q1, δ(q1, num) = q2, δ(q1, pers) = q3, QS = {q2, q3}, θ(q2) = pl, θ(q3) = third.
[Graph omitted: q0 —agr→ q1, q1 —num→ q2 (marked pl), q1 —pers→ q3 (marked third).]
The arcs of a feature graph are thus labeled by features.
The root is a designated node from which all other nodes are accessible (through δ); note that nothing prevents the root from having incoming arcs.
Sink nodes (nodes with no outgoing edges) can be marked by an atom, but can also be unmarked.
We use meta-variables A, B (with or without subscripts) to refer to feature graphs.
We use Q, q, δ, θ to refer to constituents of feature graphs.
When displaying feature graphs, the root is depicted as a grey-colored node, usually at the top or the left side of the graph.
The identities of the nodes are arbitrary, and we use generic names such as q0, q1, etc. to refer to them.
Example (Feature graphs)
In the following graph, the leaves q2 and q3 bear no marking; in other words, the marking function θ is undefined for the two sinks in its domain.
[Graph omitted: q0 —agr→ q1, q1 —num→ q2, q1 —pers→ q3; q2 and q3 are unmarked.]
The graph displayed above is 〈Q, q, δ, θ〉, where Q = {q0, q1, q2, q3}, q = q0, δ(q0, agr) = q1, δ(q1, num) = q2, δ(q1, pers) = q3, QS = {q2, q3}, and θ is undefined for its entire domain.
A feature graph is empty if it consists of a single unmarked node with no arcs.
A feature graph is atomic if it consists of a single marked node with no arcs.
Example (Empty and atomic feature graphs)
A, an empty feature graph: the single unmarked node q0
B, an atomic feature graph: the single node q0, marked pl
Paths
The concept of paths is natural when graphs are concerned.
A path (over Feats) is a finite sequence of features, and the set Paths = Feats∗ is the collection of all paths.
Meta-variables π, α (with or without subscripts) range over paths.
ǫ is the empty path, denoted also by ‘〈〉’.
The length of a path π is denoted |π|.
For example, if Feats = {a, b} then Paths includes ǫ, 〈a〉, 〈b〉, 〈a, b, a〉, 〈b, b, b, b, a, b〉, etc.
While a path is a purely syntactic notion (every sequence of features constitutes a path), interesting paths are those that can be interpreted as actual paths in some graph, leading from the root to some node.
The definition of δ is therefore extended to paths: given a feature graph A = 〈QA, qA, δA, θA〉, define δ̂A : QA × Paths → QA as follows:
δ̂A(q, ǫ) = q
δ̂A(q, f π) = δ̂A(δA(q, f ), π) (defined only if δA(q, f )↓)
Since for every node q ∈ QA and every feature f ∈ Feats, δA(q, f ) = δ̂A(q, 〈f 〉), we identify δ̂ with δ in what follows and use only the latter. When the index (A) is clear from the context, it is omitted. When δA(q, π) = q′ we say that π leads (in A) from q to q′.
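The path extension of δ can be sketched directly, with δ as a dictionary and a helper (our name) that follows a path feature by feature, returning nothing when the path is undefined:

```python
# The example graph's delta: (node, feature) -> node.
delta = {('q0', 'agr'): 'q1',
         ('q1', 'num'): 'q2',
         ('q1', 'pers'): 'q3'}

def walk(q, path):
    """delta extended to paths: follow each feature in turn from q.

    Returns the node the path leads to, or None if the path is
    undefined at some point (the partiality of delta).
    """
    for f in path:
        if (q, f) not in delta:
            return None
        q = delta[(q, f)]
    return q

print(walk('q0', ('agr', 'num')))          # -> q2
print(walk('q0', ('agr', 'pers', 'num')))  # -> None (undefined)
```

The empty path returns its starting node, matching the base clause δ(q, ǫ) = q.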
Definition (Paths)
The paths of a feature graph A are Π(A) = {π ∈ Paths | δA(qA, π)↓}.
Example (Paths)
Consider the following feature graph, A:
[Graph omitted: q0 —agr→ q1, q1 —num→ q2 (marked pl), q1 —pers→ q3 (marked third).]
Its paths are Π(A) = {ǫ, 〈agr〉, 〈agr num〉, 〈agr pers〉}
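For an acyclic graph, Π(A) is finite and can be enumerated by traversal; a sketch over the example graph (helper name is ours):

```python
# The example graph's delta: (node, feature) -> node.
delta = {('q0', 'agr'): 'q1',
         ('q1', 'num'): 'q2',
         ('q1', 'pers'): 'q3'}

def all_paths(root):
    """Pi(A): every path defined from the root (graph assumed acyclic)."""
    out, frontier = [()], [((), root)]
    while frontier:
        path, q = frontier.pop()
        for (p, f), q2 in delta.items():
            if p == q:
                out.append(path + (f,))
                frontier.append((path + (f,), q2))
    return sorted(out)

print(all_paths('q0'))
# -> [(), ('agr',), ('agr', 'num'), ('agr', 'pers')]
```

For a cyclic graph (introduced later), Π(A) is infinite, so such an enumeration would not terminate without a bound on path length.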
Path values
Of particular interest are paths which lead from the root of a feature graph to some node in the graph.
For such paths we define the notion of a value, which is the sub-graph whose root is the node at the end of the path.
It would have been possible to define as value the node itself, rather than the sub-graph it induces; the choice is a matter of taste, as moving from one view of values to the other is trivial.
Definition (Path value)
For a feature graph A = 〈QA, qA, δA, θA〉 and a path π ∈ Π(A), the value valA(π) of π in A is a feature graph B = 〈QB, qB, δB, θB〉, over the same signature as A, where:
qB = δA(qA, π)
QB = {q′ ∈ QA | for some π′, δA(qB, π′) = q′} (QB is the set of nodes reachable from qB)
for every feature f and for every q′ ∈ QB, δB(q′, f ) = δA(q′, f ) (δB is the restriction of δA to QB)
for every q′ ∈ QB, θB(q′) = θA(q′) (θB is the restriction of θA to QB)
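A direct reading of this definition can be sketched in code: follow the path, collect the reachable nodes, and restrict δ and θ to them (helper names are ours):

```python
# The example graph: agr -> [num: pl, pers: third].
delta = {('q0', 'agr'): 'q1', ('q1', 'num'): 'q2', ('q1', 'pers'): 'q3'}
theta = {'q2': 'pl', 'q3': 'third'}

def reachable(q):
    """All nodes reachable from q via delta (including q itself)."""
    seen, stack = {q}, [q]
    while stack:
        p = stack.pop()
        for (p2, f), q2 in delta.items():
            if p2 == p and q2 not in seen:
                seen.add(q2)
                stack.append(q2)
    return seen

def val(root, path):
    """val_A(path): the sub-graph rooted where the path leads."""
    q = root
    for f in path:
        q = delta[(q, f)]        # KeyError if the path is undefined
    qs = reachable(q)
    sub_delta = {(p, f): q2 for (p, f), q2 in delta.items() if p in qs}
    sub_theta = {p: a for p, a in theta.items() if p in qs}
    return qs, q, sub_delta, sub_theta

Q, r, d, t = val('q0', ('agr',))
print(r, sorted(Q))  # -> q1 ['q1', 'q2', 'q3']
```

The value of 〈agr num〉 is the atomic sub-graph at q2; an undefined path such as 〈agr pers num〉 raises an error, mirroring the partiality of val.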
Example (Path values)
The value of the path 〈agr〉 in A is valA(〈agr〉), the sub-graph rooted at q1:
[Graph omitted: q1 —num→ q2 (marked pl), q1 —pers→ q3 (marked third).]
The value of the path 〈agr num〉 in A is valA(〈agr num〉), the atomic graph consisting of q2 alone, marked pl.
Note that, for example, the value of 〈agr pers num〉 in A is undefined.
Reentrancy
The definition of path values raises the question of when two paths have equal values.
We distinguish between paths which lead to one and the same node, and those whose values are isomorphic but not identical.
The former case is called reentrancy.
Definition (Reentrancy)
Let A = 〈Q, q, δ, θ〉 be a feature graph. Two paths π1, π2 ∈ Π(A) are reentrant in A, denoted π1 !A π2, iff δ(q, π1) = δ(q, π2), implying valA(π1) = valA(π2). A feature graph A is reentrant iff there exist two distinct paths π1, π2 ∈ Π(A) such that π1 !A π2.
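Reentrancy is node identity, not mere value equality; this is easy to check by following both paths and comparing the nodes they reach. A sketch over the reentrant example graph below (helper names are ours):

```python
# A graph where agr and subj agr share one node:
# q0 -agr-> q1, q0 -subj-> q4, q4 -agr-> q1,
# q1 -num-> q2 (pl), q1 -pers-> q3 (third).
delta = {('q0', 'agr'): 'q1', ('q0', 'subj'): 'q4',
         ('q4', 'agr'): 'q1',
         ('q1', 'num'): 'q2', ('q1', 'pers'): 'q3'}

def walk(q, path):
    """Follow a path from q; None if undefined somewhere."""
    for f in path:
        q = delta.get((q, f))
        if q is None:
            return None
    return q

def reentrant(p1, p2, root='q0'):
    """True iff both paths are defined and lead to the same node."""
    n1, n2 = walk(root, p1), walk(root, p2)
    return n1 is not None and n1 == n2

print(reentrant(('agr',), ('subj', 'agr')))  # -> True
```

By contrast, 〈agr num〉 and 〈agr pers〉 reach distinct nodes, so they are not reentrant even though both are defined.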
Example (A reentrant feature graph)
This feature graph, A, is reentrant because δA(q0, 〈agr〉) = δA(q0, 〈subj agr〉):
[Graph omitted: q0 —agr→ q1, q0 —subj→ q4, q4 —agr→ q1, q1 —num→ q2 (marked pl), q1 —pers→ q3 (marked third).]
The (single) value of the (different) paths 〈agr〉 and 〈subj agr〉 in A is:
[Graph omitted: q1 —num→ q2 (marked pl), q1 —pers→ q3 (marked third).]
The notion of reentrancy touches on the issue of the distinction between type- and token-identity.
Two feature graphs are token-identical if their components (i.e., their sets of nodes, roots, transition functions and atom marking functions) are identical.
They are type-identical if they are isomorphic, not necessarily requiring their nodes to be identical.
We will discuss feature graph isomorphism later.
Cycles
Early feature structure based formalisms used to employ only acyclic feature graphs.
However, modern ones usually allow (or even require) feature structures to be possibly cyclic.
While the linguistic motivation for cyclic feature structures is limited, there is good practical motivation for allowing them: when implementing a system for manipulating feature graphs, it is usually easier to support cycles than to guarantee that all the graphs in a system are acyclic.
The reason is that unification, which is the major operation defined on feature graphs, can yield a cyclic graph even when its operands are acyclic.
Definition (Cycles)
A feature graph A = 〈QA, qA, δA, θA〉 is cyclic if two paths π1, π2 ∈ Π(A), where π1 is a proper subsequence of π2, are reentrant: π1 !A π2. A is acyclic otherwise.
Note that cyclicity is a special case of reentrancy (every cyclic feature graph is reentrant, but not vice versa).
A corollary of the definition is that when a feature graph is cyclic, it has at least one node q such that δ(q, α) = q for some non-empty path α.
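The corollary suggests a direct test: a graph is cyclic iff some node is δ-reachable from itself by a non-empty path. A sketch over the cyclic example graph below (helper names are ours; depth-first search with a visited set):

```python
# The cyclic example graph C: q0 -f-> q1, q1 -h-> q1, q1 -g-> q2 (a).
delta = {('q0', 'f'): 'q1', ('q1', 'h'): 'q1', ('q1', 'g'): 'q2'}

def cyclic(nodes):
    """True iff some node reaches itself via a non-empty path."""
    def reaches(src, tgt, seen):
        for (p, f), q in delta.items():
            if p == src:
                if q == tgt:
                    return True
                if q not in seen:
                    seen.add(q)
                    if reaches(q, tgt, seen):
                        return True
        return False
    return any(reaches(q, q, set()) for q in nodes)

print(cyclic({'q0', 'q1', 'q2'}))  # -> True (q1 -h-> q1)
```

Dropping the arc ('q1', 'h') from delta would make the same test return False, since no node would reach itself.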
Example (A cyclic feature graph)
Following is a cyclic feature graph, C:
[Graph omitted: q0 —f→ q1, q1 —h→ q1 (a cycle), q1 —g→ q2 (marked a).]
The value of the path 〈f〉 in C, as well as the values of the (infinitely many) paths 〈f hn〉, for n ≥ 0, is the same feature graph:
[Graph omitted: q1 —h→ q1, q1 —g→ q2 (marked a).]
Feature structures Feature graph subsumption
Feature graph isomorphism
Since feature graphs are just a special case of directed, labeled graphs, we can adapt the well-defined notion of graph isomorphism to feature graphs.
Informally, two graphs are isomorphic when they have the same structure; the identities of their nodes may differ without affecting the structure.
In our case, we require also that the labels of sink nodes be identical in order for two graphs to be considered isomorphic.
Definition (Feature graph isomorphism)
Two feature graphs A = 〈QA, qA, δA, θA〉 and B = 〈QB, qB, δB, θB〉 are isomorphic, denoted A ∼ B, iff there exists a one-to-one and onto mapping i : QA → QB, called an isomorphism, such that:
i(qA) = qB;
for all q1, q2 ∈ QA and f ∈ Feats, δA(q1, f ) = q2 iff δB(i(q1), f ) = i(q2); and
for all q ∈ QA, θA(q) = θB(i(q)) (either both are undefined, or both are defined and equal).
Feature graph subsumption
Definition (Subsumption)
Let A1 = 〈Q1, q1, δ1, θ1〉 and A2 = 〈Q2, q2, δ2, θ2〉 be two feature graphs. A1 subsumes A2 (denoted by A1 ⊑ A2) iff there exists a total function h : Q1 → Q2, called a subsumption morphism, such that:
h(q1) = q2
for every q ∈ Q1 and for every f such that δ1(q, f )↓, h(δ1(q, f )) = δ2(h(q), f )
for every q ∈ Q1, if θ1(q)↓ then θ1(q) = θ2(h(q)).
If A1 ⊑ A2 then A1 is said to subsume, or be more general than, A2; A2 is subsumed by, or is more specific than, A1.
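For tiny graphs, the existence of a subsumption morphism can be checked by brute force, trying every total function h : Q1 → Q2 against the three clauses of the definition. A sketch (function names and the two test graphs are ours):

```python
from itertools import product

def subsumes(G1, G2):
    """True iff a subsumption morphism h : Q1 -> Q2 exists.

    A graph is a tuple (nodes, root, delta, theta); delta maps
    (node, feature) -> node, theta maps sinks to atoms.
    Exponential search: feasible only for very small graphs.
    """
    Q1, r1, d1, t1 = G1
    Q2, r2, d2, t2 = G2
    Q1, Q2 = sorted(Q1), sorted(Q2)
    for images in product(Q2, repeat=len(Q1)):
        h = dict(zip(Q1, images))
        if h[r1] != r2:                                  # clause 1
            continue
        if any(h[q2] != d2.get((h[q], f))                # clause 2
               for (q, f), q2 in d1.items()):
            continue
        if any(t2.get(h[q]) != a for q, a in t1.items()):  # clause 3
            continue
        return True
    return False

A = ({'q0', 'q1'}, 'q0', {('q0', 'num'): 'q1'}, {})            # [num: [ ]]
B = ({'p0', 'p1'}, 'p0', {('p0', 'num'): 'p1'}, {'p1': 'sg'})  # [num: sg]
print(subsumes(A, B), subsumes(B, A))  # -> True False
```

The asymmetry is the expected one: the unmarked sink of A may be mapped onto the marked sink of B, but not vice versa, since a marked sink must map to a node bearing the same atom.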
Subsumption
The morphism h associates with every node in Q1 a node in Q2; if an arc labeled f connects q with q′, then such an arc connects h(q) with h(q′).
In other words, δ and h commute: following an f-arc and then h leads to the same node as following h and then the f-arc.
[Diagram omitted: δ-arcs depicted using solid lines, h-mappings using dashed lines.]
In addition, if a node q ∈ Q1 is marked by an atom, then its image h(q) must be marked by the same atom (recall that only sinks can be thus marked).
Note that if a sink in Q1 is not marked, there is no constraint on its image (in particular, it can be a non-sink).
Subsumption morphism
Example (Subsumption morphism)
[Diagram omitted: the nodes q and q′ of A1, connected by an f-arc, are mapped by h (dashed arrows) to the nodes h(q) and h(q′) of A2, connected by a matching f-arc.]
Example (Subsumption)
A ⊑ B, where:
A: [Graph omitted: qA0 —agr→ qA1, qA1 —num→ qA2 (unmarked), qA1 —pers→ qA3 (marked third).]
B: [Graph omitted: qB0 —agr→ qB1, qB0 —subj→ qB4, qB4 —agr→ qB1, qB1 —num→ qB2 (marked pl), qB1 —pers→ qB3 (marked third).]
Indeed, B can (and does) have nodes that do not correspond to nodes in A: such is qB4 in the example.
In addition, while the sink qA2 is not marked by an atom (that is, it is a variable), its image in B, qB2, is marked as pl.
Notice that no subsumption morphism can be defined from QB to QA, since there is no node into which qB4 can be mapped.
In particular, it cannot be mapped to the root of A, since this would necessitate an arc from qA0 to itself (as the root of A would be the image of both qB4 and qB0).
Trying to take h−1 as an inverse subsumption morphism will fail, both because of qB4 and because it would map qB2 to qA2, violating the last clause of the subsumption relation (a marked sink must be mapped to a sink with the same mark).
We conclude that B ⋢ A.
Given a feature structure, what modifications can be made to it in order for it to become more specific? Three different kinds of modifications are possible:
1 Adding arcs;
2 Adding reentrancies;
3 Marking unmarked sinks by some atom.
Example (Subsumption as an order on information)
[ ] ⊑ [num: pl] (adding arcs)
[num: [ ]] ⊑ [num: pl] (adding atomic marks)
[num: sg] ⊑ [num: sg, pers: third] (adding arcs)
[num1: sg, num2: sg] ⊑ [num1: [1] sg, num2: [1]] (adding reentrancies)
Lemma
If A ⊑ B then Π(A) ⊆ Π(B).
Lemma
If A ⊑ B then for each π ∈ Π(A), if θA(δA(qA, π))↓ then θB(δB(qB, π))↓ and θA(δA(qA, π)) = θB(δB(qB, π)).
Lemma
If A ⊑ B and π1, π2 are reentrant in A (that is, π1 !A π2), then π1, π2 are reentrant in B (that is, π1 !B π2).
Corollary
If A ⊑ B, then:
Π(A) ⊆ Π(B)
for each π ∈ Π(A), if θA(δA(qA, π))↓ then θB(δB(qB, π))↓ and θA(δA(qA, π)) = θB(δB(qB, π))
for each π1, π2 ∈ Π(A), if π1 !A π2 then π1 !B π2 (and, therefore, if A is reentrant/cyclic then so is B).
Theorem
If A is an atomic feature graph and A ⊑ B, then A ∼ B.
Theorem
Subsumption has a least element: there exists a feature graph A such that for every feature graph B, A ⊑ B.
Proof.
Consider the (empty) feature graph A = 〈{q0}, q0, δ, θ〉, where δ and θ are undefined for their entire domains. For every feature graph B, A ⊑ B, by mapping (through h) the root q0 to the root of B, qB. The remaining clauses of the definition of subsumption hold vacuously.
Theorem
Subsumption is reflexive: for every feature graph A, A ⊑ A.
Proof.
Take h to be the identity function that maps every node in A to itself.
Theorem
Subsumption is transitive: if A ⊑ B and B ⊑ C then A ⊑ C.
Theorem
Subsumption is not antisymmetric: if A ⊑ B and B ⊑ A then not necessarily A = B.
Proof.
Consider the feature graphs A = 〈{qA}, qA, δ, θ〉 and B = 〈{qB}, qB, δ, θ〉, where δ and θ are undefined for their entire domains, and where qA ≠ qB. Trivially, both A ⊑ B and B ⊑ A, but A ≠ B.
Thus, feature graph subsumption forms a partial pre-order on feature graphs.
It is a pre-order since it is not antisymmetric; it is partial as there are feature graphs that are incomparable with respect to subsumption.
Example (Feature graph subsumption is a partial relation)
Feature graphs can be incomparable due to inconsistency (contradicting information) or to complementary information:
sg ⋢ pl and pl ⋢ sg
[num: sg] ⋢ [num: pl] and [num: pl] ⋢ [num: sg]
[num: [ ]] ⋢ [pers: [ ]] and [pers: [ ]] ⋢ [num: [ ]]
There is a clear connection between feature graph isomorphism and feature graph subsumption:
Theorem
A ∼ B iff A ⊑ B and B ⊑ A.
Feature structures Feature structures
Feature structures
Feature graphs are a useful notation but they are too discriminating.
Usually, the identities of the nodes in a graph are less important than the structure of the graph (including the labels on its nodes and arcs).
It is therefore beneficial to collapse feature graphs which only differ in the identities of their nodes into an equivalence class.
The definition of feature structures as equivalence classes of isomorphic feature graphs facilitates a view which emphasizes the structure and ignores the irrelevant information encoded in the nodes.
Definition (Feature structures)
Given a signature of features Feats and atoms Atoms, let FS = G|∼ be the collection of equivalence classes in G(Feats, Atoms) with respect to feature graph isomorphism. A feature structure is any member of FS.
We use meta-variables fs to range over feature structures.
Theorem
Let fs be a feature structure and let A ∈ fs, B ∈ fs be two feature graphs in fs. Then:

Π(A) = Π(B);

for each π ∈ Π(A), θA(δA(qA, π))↓ iff θB(δB(qB, π))↓, and θA(δA(qA, π)) = θB(δB(qB, π));

for each π1, π2 ∈ Π(A), π1 and π2 are reentrant in A iff they are reentrant in B (and, therefore, A is reentrant/cyclic iff B is reentrant/cyclic).
Definition
Let fs be a feature structure. Then the paths of fs are defined as Π(fs) = Π(A) for some A ∈ fs.
From now on, we will usually refer to feature structures through some feature graph representative, taking care that all definitions are representative-independent.
As an example, we can lift the definition of reentrancy from feature graphs to feature structures in the natural way:
Definition (Feature structure reentrancy)
Two paths π1, π2 are reentrant in a feature structure fs if they are reentrant in some A ∈ fs. fs is reentrant if two distinct paths π1 ≠ π2 are reentrant in it.
The definition is independent of the representative A.
Feature structure cyclicity is defined in a similar way.
As another example, we lift the definition of subsumption from feature graphs to feature structures:
Definition (Feature structure subsumption)
If fs1 and fs2 are feature structures, fs1 subsumes fs2, denoted fs1 ⊑ fs2, iff for some A ∈ fs1 and some B ∈ fs2, A ⊑ B.

Since feature structure subsumption is defined in terms of a representative, we must show that the definition is representative-independent.
Lemma
The definition of feature structure subsumption is independent of the representative: if A ∼ A′ and B ∼ B′ then A ⊑ B iff A′ ⊑ B′.

Proof.

Assume that A ∼ A′ through an isomorphism iA : QA → QA′ and B ∼ B′ through an isomorphism iB : QB → QB′. If A ⊑ B, there exists a subsumption morphism h : QA → QB. Then h′ = iB ◦ h ◦ iA⁻¹ is a subsumption morphism mapping QA′ to QB′ (the proof is left as an exercise), and hence fs(A′) ⊑ fs(B′). The other direction (if fs(A′) ⊑ fs(B′) then fs(A) ⊑ fs(B)) is completely symmetric.
Corollary
If fsA and fsB are feature structures, fsA ⊑ fsB iff for every A ∈ fsA and every B ∈ fsB, A ⊑ B.
Like feature graph subsumption, feature structure subsumption is reflexive and transitive; these properties can be easily established from their counterparts in the feature graph case.

However, unlike feature graph subsumption, feature structure subsumption is antisymmetric:

Theorem

If fs1 ⊑ fs2 and fs2 ⊑ fs1, then fs1 = fs2.

Therefore, subsumption is a partial order on feature structures.

In the sequel we will sometimes use the ‘⊑’ symbol to denote both feature graph and feature structure subsumption, when the type of the arguments of the relation is clear.
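Since subsumption is defined via morphisms, it can be checked mechanically for finite graphs. Below is a minimal brute-force sketch (the tuple encoding of graphs is my own assumption, not the book's notation): a graph is (root, delta, theta, nodes), with delta mapping (node, feature) pairs to nodes and theta mapping nodes to atoms.

```python
# Brute-force subsumption check for finite feature graphs -- a sketch,
# not the book's algorithm.
from itertools import product

def subsumes(A, B):
    """True iff a subsumption morphism h : nodes(A) -> nodes(B) exists."""
    qA, deltaA, thetaA, nodesA = A
    qB, deltaB, thetaB, nodesB = B
    nA = sorted(nodesA)
    for image in product(sorted(nodesB), repeat=len(nA)):
        h = dict(zip(nA, image))
        if h[qA] != qB:
            continue                        # h must map root to root
        respects_delta = all(
            deltaB.get((h[p], f)) == h[t]   # h must commute with delta
            for (p, f), t in deltaA.items()
        )
        respects_theta = all(
            thetaB.get(h[p]) == a           # h must preserve atoms
            for p, a in thetaA.items()
        )
        if respects_delta and respects_theta:
            return True
    return False

# A: num -> sg; B: num -> sg and pers -> third.
A = (0, {(0, 'num'): 1}, {1: 'sg'}, {0, 1})
B = (0, {(0, 'num'): 1, (0, 'pers'): 2}, {1: 'sg', 2: 'third'}, {0, 1, 2})
```

With these two graphs, subsumes(A, B) holds while subsumes(B, A) does not, mirroring the reading of ⊑ as "carries less information than".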
Feature graphs and feature structures
Example (Feature graphs and feature structures)
A feature graph A1 determines the feature structure fs1 = [A1]∼ to which it belongs; two isomorphic feature graphs A2 ∼ A′2 determine one and the same feature structure fs2 = [A2]∼. Subsumption between representatives, A1 ⊑ A2, corresponds to subsumption between the feature structures, fs1 ⊑ fs2.
Feature structures Attribute-value matrices
AVMs
We now return to attribute-value matrices (AVMs).
This is the view that we will adopt for depicting feature structures (and grammars based on them), both because they are easy to present on paper and because of their centrality in the existing literature.
Like feature graphs, AVMs are defined over a signature of features and atoms, which we fix below.
In addition, AVMs make use of variables, also called tags below. Meta-variables X, Y, Z, etc. range over variables.
Variables are used to encode sharing of values, as will become clear presently.
Where AVMs are concerned, we follow the convention of the linguistic literature by which variables are natural numbers, depicted in boxes, e.g., 3 .
Definition (AVMs)
Given a signature S, the set Avms(S) of AVMs over S is the least set satisfying the following two clauses:

1. M = Xa ∈ Avms(S) for any a ∈ Atoms and X ∈ Tags; M is said to be atomic and X is the tag of M, denoted tag(M) = X.

2. M = X[f1 : M1, . . . , fn : Mn] ∈ Avms(S) for n ≥ 0, X ∈ Tags, f1, . . . , fn ∈ Feats and M1, . . . , Mn ∈ Avms(S), where fi ≠ fj if i ≠ j. M is said to be complex, and X is the tag of M, denoted tag(M) = X. If n = 0, M = X [ ] is an empty AVM.

Note that two AVMs which differ only in their tag are distinct: if X ≠ Y, then X [· · ·] ≠ Y [· · ·]. In particular, there is no unique empty AVM. Note also that the same variable can be used more than once in an AVM.
Example (AVMs)
Consider a signature consisting of Atoms = {a} and Feats = {f, g}. Then M1 = 4 a is an AVM by the first clause of the definition, M2 = 2 [ ] is an empty AVM by the second clause, M3 = 3 [f : 4 a] is an AVM by the second clause (using M1 as the value of f, so that fval(M3, f) = M1), and

M4 = 2 [g : 3 [f : 4 a], f : 2 [ ]]

is an AVM by the second clause, as is

M5 = 4 [g : 3 [f : 4 a], f : 2 [ ]]
Meta-variables M, with or without subscripts, range over Avms; the parameter S is omitted when it is clear from the context.
The domain of an AVM M, denoted dom(M), is undefined when M is atomic, and {f1, . . . , fn} when M is complex (hence, dom(M) is empty for an empty AVM).
The value of some feature f ∈ Feats in M, denoted fval(M, f), is defined if f = fi ∈ dom(M), in which case it is Mi, and undefined otherwise.
Sub-AVMs
Definition (Sub-AVMs)
Given an AVM M, its sub-AVMs are SubAVM(M), defined as:

1. SubAVM(Xa) = {Xa}

2. SubAVM(X[f1 : M1, . . . , fn : Mn]) = {X[f1 : M1, . . . , fn : Mn]} ∪ ⋃1≤i≤n SubAVM(Mi)
Definition (Tags)
Given an AVM M, its tags Tags(M) are defined as:
1. Tags(Xa) = {X}

2. Tags(X[f1 : M1, . . . , fn : Mn]) = {X} ∪ ⋃1≤i≤n Tags(Mi)
Definition (Tagset)
The tagset of an AVM M and a tag X ∈ Tags(M) is the set of sub-AVMs of M (including M itself) which are tagged by X: TagSet(M, X) = {M′ ∈ SubAVM(M) | tag(M′) = X}.
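These recursive definitions translate directly into code. In the sketch below I assume a tuple encoding of AVMs, (tag, value), where value is an atom string for atomic AVMs and a dict from features to sub-AVMs for complex ones (so {} encodes the empty AVM); the encoding is mine, not the book's.

```python
def sub_avms(M):
    """Yield M and all of its sub-AVMs, following the recursive definition."""
    yield M
    _, val = M
    if isinstance(val, dict):          # complex AVM: recurse into the values
        for sub in val.values():
            yield from sub_avms(sub)

def tags(M):
    """Tags(M): the set of variables occurring in M."""
    return {tag for tag, _ in sub_avms(M)}

def tagset(M, X):
    """TagSet(M, X): the sub-AVMs of M (including M itself) tagged by X."""
    return [m for m in sub_avms(M) if m[0] == X]

# M4 = 2 [g : 3 [f : 4 a], f : 2 [ ]] from the running example:
M4 = (2, {'g': (3, {'f': (4, 'a')}), 'f': (2, {})})
```

Here tags(M4) is {2, 3, 4}, and tagset(M4, 2) contains both M4 itself and the empty AVM tagged 2.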
Example (AVMs)
Let:

M4 = 2 [g : 3 [f : 4 a], f : 2 [ ]]

fval(M4, f) = 2 [ ]. Observe that Tags(M4) = { 2 , 3 , 4 }. Also, TagSet(M4, 4 ) is { 4 a}, TagSet(M4, 3 ) is { 3 [f : 4 a]} and TagSet(M4, 2 ) is {M4, 2 [ ]}. Trivially, tag(M4) = 2 .
Example (AVMs)
Let:

M5 = 4 [g : 3 [f : 4 a], f : 2 [ ]]

Similarly, fval(M5, f) = 2 [ ], whereas fval(M5, g) = 3 [f : 4 a]. Observe that Tags(M5) = { 2 , 3 , 4 }. Also TagSet(M5, 2 ) = { 2 [ ]}, TagSet(M5, 3 ) = { 3 [f : 4 a]} and TagSet(M5, 4 ) = {M5, 4 a}. Trivially, tag(M5) = 4 .
Example (AVMs)
As another example, consider the AVM

M6 = 1 [f : 1 [f : 1 [f : 1 [ ]]]]

Here, Tags(M6) = { 1 }, and TagSet(M6, 1 ) is:

{M6, 1 [f : 1 [f : 1 [ ]]], 1 [f : 1 [ ]], 1 [ ]}

Of course, tag(M6) = 1 .
Well-formed AVMs
Consider some AVM

M = 1 [f1 : 2 M1, f2 : 2 M2]

where M1 ≠ M2.

Both M1 and M2 are sub-AVMs of M, and both have the same tag, although they are different.

In other words, the recursive definition of AVMs allows two different, contradicting AVMs to be in the TagSet of the same variable.

To eliminate such cases, we define well-formed AVMs as follows:

Definition (Well-formed AVMs)

An AVM M is well-formed iff for every variable X ∈ Tags(M), TagSet(M, X) includes at most one non-empty AVM.
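Well-formedness is then a simple check over the sub-AVMs. The sketch below reuses a tuple encoding of AVMs that I assume for illustration ((tag, atom_string) or (tag, {feature: sub_avm}), with {} for the empty AVM); it treats two occurrences of a variable as conflicting only when they tag distinct non-empty AVMs.

```python
def sub_avms(M):
    """Yield M and all of its sub-AVMs."""
    yield M
    _, val = M
    if isinstance(val, dict):
        for sub in val.values():
            yield from sub_avms(sub)

def is_well_formed(M):
    """Each variable may tag at most one distinct non-empty (sub-)AVM."""
    seen = {}                      # tag -> the non-empty AVM it labels
    for m in sub_avms(M):
        tag, val = m
        if val == {}:              # empty AVMs never conflict
            continue
        if tag in seen and seen[tag] != m:
            return False
        seen[tag] = m
    return True

good = (2, {'g': (3, {'f': (4, 'a')}), 'f': (2, {})})
bad = (1, {'f1': (2, 'a'), 'f2': (2, 'b')})  # tag 2 on two different AVMs
```

On these inputs, is_well_formed accepts `good` (the running example M4) and rejects `bad`, which tags two contradicting atomic AVMs with the same variable.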
Variable associations
Henceforth, we only consider well-formed AVMs.
This allows us to provide a concise interpretation of shared values in AVMs: we wish to make explicit the special role that multiple occurrences of the same variable in a single AVM play.
To this end, we would like to say that the association of a variable X ∈ Tags(M) in an AVM M, written assoc(M, X), is the AVM which is tagged by X; if, in a given AVM M, a variable X occurs exactly once, then assoc(M, X) is a single, unique value.
If, however, X occurs more than once in M, special care is required. Recall that for well-formed AVMs, at most one of these multiple occurrences is associated with a non-empty AVM.
Definition (Variable association)
For a variable X ∈ Tags(M), the association of X in M, denoted assoc(M, X), is the single non-empty AVM in TagSet(M, X); if only X [ ] is a member of TagSet(M, X), then assoc(M, X) = X [ ].

Note that assoc assigns exactly one sub-AVM of M to each variable occurring in M, independently of the number of occurrences of the variable in M or the size of TagSet(M, X).
Example (Variable association)
Consider the well-formed AVM

M = 2 [g : 3 [f : 4 a], f : 2 [ ]]

Observe that assoc(M, 2 ) = M, assoc(M, 3 ) = 3 [f : 4 a] and assoc(M, 4 ) = 4 a. The two occurrences of the variable 2 have one and the same association. For M′ = 4 [f : 4 [ ]], assoc(M′, 4 ) = M′.
AVM paths
Definition (AVM paths)
Let M be an AVM. Let Arcs(M) be defined as: Arcs(M) = {⟨X, f, Y⟩ | X, Y ∈ Tags(M), f ∈ dom(assoc(M, X)) and tag(fval(assoc(M, X), f)) = Y}. Let Arcs* be the extension of Arcs to paths, defined (recursively) by:

for all X ∈ Tags(M), ⟨X, ε, X⟩ ∈ Arcs*(M);

if ⟨X, f, Y⟩ ∈ Arcs(M) then ⟨X, f, Y⟩ ∈ Arcs*(M);

if ⟨X, f, Y⟩ ∈ Arcs(M) and ⟨Y, π, Z⟩ ∈ Arcs*(M) then ⟨X, f · π, Z⟩ ∈ Arcs*(M).

The paths of M, denoted Π(M), are the set {π | X = tag(M) and for some variable Y ∈ Tags(M), ⟨X, π, Y⟩ ∈ Arcs*(M)}.
Paths
Example (Paths)
Consider again the AVM

M = 2 [g : 3 [f : 4 a], f : 2 [ ]]

Observe that Arcs(M) = {⟨ 2 , g, 3 ⟩, ⟨ 2 , f, 2 ⟩, ⟨ 3 , f, 4 ⟩}. Therefore, Arcs*(M) includes, in addition to the elements of Arcs(M), also ⟨ 2 , ε, 2 ⟩, ⟨ 2 , ⟨g f⟩, 4 ⟩ and, due to the multiple occurrence of 2 , the infinitely many triples ⟨ 2 , f^i · ⟨g, f⟩, 4 ⟩ for any i ≥ 0.
Path values
Definition (Path values)
The value of a path π in an AVM M, denoted pval(M, π), is assoc(M, Y), where Y is such that ⟨tag(M), π, Y⟩ ∈ Arcs*(M). This is well defined since Arcs* is functional. Similarly, pval is partial since Arcs* is partial.
Example (Path values)
In the AVM

M = 2 [g : 3 [f : 4 a], f : 2 [ ]]

pval(M, ε) = M; pval(M, ⟨g⟩) = 3 [f : 4 a]; and pval(M, ⟨f⟩) = pval(M, ⟨f f⟩) = pval(M, ⟨f f f⟩) = M. pval(M, ⟨g g⟩) is undefined.
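Path values can be computed by resolving, at each step, the current tag to its association before following the next feature; this also handles cyclic AVMs such as the one above. The sketch below again assumes a tuple encoding of AVMs ((tag, atom) or (tag, {feature: sub_avm})) that is mine, not the book's.

```python
def sub_avms(M):
    """Yield M and all of its sub-AVMs."""
    yield M
    _, val = M
    if isinstance(val, dict):
        for sub in val.values():
            yield from sub_avms(sub)

def assoc(M, X):
    """The single non-empty AVM tagged X in M, else the empty X [ ]."""
    members = [m for m in sub_avms(M) if m[0] == X]
    for m in members:
        if m[1] != {}:
            return m
    return members[0]

def pval(M, path):
    """pval(M, path), or None where the path value is undefined."""
    cur = M
    for f in path:
        _, val = assoc(M, cur[0])  # resolve shared tags before stepping
        if not isinstance(val, dict) or f not in val:
            return None            # pval is a partial function
        cur = val[f]
    return assoc(M, cur[0])

# M = 2 [g : 3 [f : 4 a], f : 2 [ ]] from the example:
M = (2, {'g': (3, {'f': (4, 'a')}), 'f': (2, {})})
```

On this M, pval(M, ()) and pval(M, ('f', 'f', 'f')) both return M, reproducing the cyclic behaviour just illustrated, while pval(M, ('g', 'g')) is undefined.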
Reentrancy
Definition (Reentrancy)

Two paths π1 and π2 are reentrant in an AVM M if pval(M, π1) = pval(M, π2). An AVM M is reentrant if there exist two distinct paths π1, π2 which are reentrant in it.

In the AVM M of the previous example, ε and ⟨f⟩ are reentrant because pval(M, ε) = pval(M, ⟨f⟩) = M.

Definition (Cyclic AVMs)

An AVM M is cyclic if two paths π1, π2 ∈ Π(M), where π1 is a proper subsequence of π2, are reentrant.

M of the previous example is therefore cyclic, e.g., by the paths ε and ⟨f⟩.
Example (A reentrant AVM)
The following AVM is reentrant but not cyclic:

0 [agr : 1 [num : 2 pl, pers : 3 third], subj : 4 [agr : 1 ]]
Conventions
We introduce three conventions regarding the depiction of well-formed AVMs, motivated by the fact that variables are used primarily to indicate value sharing.
If a variable occurs more than once, then its value is explicated only once; where this value is explicated (i.e., next to which occurrence of the variable) is immaterial.
Variables which occur only once can be omitted.
The empty AVM is sometimes omitted when it is associated with a variable.
The first convention is crucial in the case of cyclic AVMs: there is no finite representation of cyclic AVMs unless this convention is adopted.
Example (Shorthand notation for AVMs)
Consider the following AVM:

6 [f : 3 [ ], g : 4 [h : 3 a], h : 2 [ ]]

Notice that it is well-formed, since the only variable occurring more than once ( 3 ) is associated with a non-empty value (a) only once.

We can therefore leave only one occurrence of the value explicit.

The tag 2 is associated with the empty feature structure, which can be omitted.

Finally, the tags 4 and 6 occur only once, so they can be omitted.

This is the conventional form of the AVM.
AVM subsumption
Definition (AVM subsumption)
Let M1, M2 be AVMs over the same signature. M1 subsumes M2, denoted M1 ⪯ M2, if there exists a total function h : Tags(M1) → Tags(M2) such that:

1. h(tag(M1)) = tag(M2);

2. for every ⟨X, f, Y⟩ ∈ Arcs(M1), ⟨h(X), f, h(Y)⟩ ∈ Arcs(M2);

3. for every X ∈ Tags(M1), if assoc(M1, X) is atomic then assoc(M2, h(X)) is atomic, with the same atom.
Lemma
If M1 ⪯ M2 through h and ⟨X, π, Y⟩ ∈ Arcs*(M1), then ⟨h(X), π, h(Y)⟩ ∈ Arcs*(M2).
Corollary
If M1 ⪯ M2 then Π(M1) ⊆ Π(M2), and if π1 and π2 are reentrant in M1, then they are reentrant in M2.
AVM isomorphism
When two AVMs are identical up to the variables which occur in them, one AVM can be obtained from the other by a systematic renaming of the variables;
we say that the two AVMs are isomorphic.
Example (Isomorphic AVMs)
Let

M1 = 2 [g : 3 [f : 4 a], f : 2 [ ]]

M2 = 22 [g : 23 [f : 24 a], f : 22 [ ]]

Then M2 can be obtained from M1 by systematically substituting 22 for 2 , 23 for 3 and 24 for 4 .
Renaming
Of course, one must be careful when renaming variables, especially when the same variable may occur in both AVMs.
For example, if M = 2 [f : 1 a], then renaming 1 to 2 will result in M = 2 [f : 2 a], which is not even well-formed.
Theorem
If M1 and M2 are isomorphic AVMs, then both M1 ⪯ M2 and M2 ⪯ M1.
AVM equivalence
Another case of AVM equivalence is induced by the convention by which, if a variable occurs more than once in an AVM, then its value is explicated only once.
A consequence of this convention is that two AVMs which differ only with respect to where the (single) value of some multiply occurring variable is explicated subsume each other, as they induce the same set of Arcs.
Example (AVM equivalence)
M1 and M2 differ only in the instance of 1 whose value is explicated:

M1 = 0 [agr : 1 [num : 2 pl, pers : 3 third], subj : 4 [agr : 1 ]]

M2 = 0 [agr : 1 , subj : 4 [agr : 1 [num : 2 pl, pers : 3 third]]]

Then M1 ⪯ M2 and M2 ⪯ M1.
Theorem
Let M and M′ be two AVMs such that X ∈ Tags(M) ∩ Tags(M′), and assume that X occurs twice in M and in M′ (that is, |TagSet(M, X)| > 1 and |TagSet(M′, X)| > 1). If M and M′ are identical up to the choice of which instance of X in them is explicated, then M ⪯ M′ and M′ ⪯ M.
Definition (Renaming)
Let M1 and M2 be two AVMs. M2 is a renaming of M1, denoted M1 ≃ M2, iff M1 ⪯ M2 and M2 ⪯ M1.
Example (AVM renamings)
The following two AVMs are renamings of each other:

M1 = 0 [agr : 1 [num : 2 pl, pers : 3 third], subj : 4 [agr : 1 ]]

M2 = 10 [agr : 11 , subj : 14 [agr : 11 [num : 12 pl, pers : 13 third]]]
Feature structures The correspondence between feature graphs and AVMs
The correspondence between feature graphs and AVMs
AVMs are the entities that the linguistic literature employs to depict feature structures;
feature graphs are well-understood mathematical entities to which various results of graph theory can be applied.
We define the relationship between these two views.
From AVMs to feature graphs
We formalize the correspondence between AVMs and feature graphs by presenting a mapping, φ, which embodies the relation between an AVM and its feature graph image.
Informally, a given AVM M is mapped to a concrete graph whose nodes are the variables occurring in the AVM, Tags(M).
The root of the graph is the variable tagging the entire AVM, and the arcs are determined using the function fval.
Atomic AVMs are mapped to single nodes, labeled by the atom, with no outgoing arcs.
Empty AVMs are mapped to a graph having just one node, bearing no label and having no outgoing arcs.
Complex AVMs are mapped to graphs whose nodes, including the root, may have outgoing arcs, where the arcs' labels correspond to features.
Definition (AVM to graph mapping)
Let M be a well-formed AVM. The feature graph image of M is φ(M) = ⟨Q, q, δ, θ⟩, where:

Q = Tags(M);

q = tag(M);

for all X ∈ Tags(M) and f ∈ Feats, δ(X, f) = Y iff ⟨X, f, Y⟩ ∈ Arcs(M); and

for all X ∈ Tags(M) and a ∈ Atoms, θ(X) = a iff assoc(M, X) is the atomic AVM Xa, and θ(X) is undefined otherwise.

Note that if M1 and M2 are two AVMs which differ only in the order of the “rows” of feature–value pairs, they will be mapped by φ to exactly the same feature graph.
Example (AVM to graph mapping)
Let

M = 3 [f : 1 [f1 : 7 a], g : 2 [g1 : 9 a, g2 : 1 [ ]]]

M is well-formed. The associations of the variables of M are:

Variable  Association
1         1 [f1 : 7 a]
2         2 [g1 : 9 a, g2 : 1 [ ]]
3         M itself
7         7 a
9         9 a
Example (AVM to graph mapping)
The feature graph image of M is φ(M) = ⟨Q, q, δ, θ⟩ where Q = { 3 , 1 , 7 , 2 , 9 }, q = 3 , θ( 7 ) = θ( 9 ) = a (and θ is undefined elsewhere), and δ is given by: δ( 3 , f) = 1 , δ( 3 , g) = 2 , δ( 1 , f1) = 7 , δ( 2 , g1) = 9 , δ( 2 , g2) = 1 , and δ is undefined elsewhere.

[Depiction of φ(M): the root 3 has an f-arc to 1 and a g-arc to 2 ; 1 has an f1-arc to the a-labeled node 7 ; 2 has a g1-arc to the a-labeled node 9 and a g2-arc back to 1 .]
Example (AVM to graph mapping)
A reentrant AVM and its feature graph image:

M = 0 [agr : 1 [num : 2 pl, pers : 3 third], subj : 4 [agr : 1 ]]

[Depiction of φ(M): the root 0 has an agr-arc to 1 and a subj-arc to 4 ; 4 has an agr-arc to the same node 1 ; 1 has a num-arc to the pl-labeled node 2 and a pers-arc to the third-labeled node 3 .]
Example (AVM to graph mapping in the face of cycles)
Let M be the (cyclic) AVM

M = 3 [f : 3 [ ]]

where Tags(M) = { 3 }. Observe that M is well-formed, as the only variable that occurs more than once in M, namely 3 , has only one non-empty AVM associated with it: M itself. The graph φ(M) will therefore be ⟨Q, q, δ, θ⟩, where Q = { 3 }, q = 3 , θ( 3 ) is undefined, δ( 3 , f) = 3 , and δ is undefined elsewhere. This graph consists of the single node 3 with an f-arc from the node to itself.
Lemma
If M is an AVM and A = φ(M) is its feature graph image, then for all X, Y ∈ Tags(M) and π ∈ Paths, ⟨X, π, Y⟩ ∈ Arcs*(M) iff δA(X, π) = Y.
Corollary
If M is an AVM and A = φ(M) is its feature graph image, then Π(M) = Π(A), and for all π1, π2 ∈ Paths, π1 and π2 are reentrant in M iff they are reentrant in φ(M).
Theorem
For all AVMs M1, M2, M1 ⪯ M2 iff φ(M1) ⊑ φ(M2).
Corollary
For all AVMs M1, M2, M1 ≃ M2 iff φ(M1) ∼ φ(M2).

This concludes the first direction of the correspondence between AVMs and feature graphs.
From feature graphs to AVMs
For the reverse direction, we define a mapping, η, from feature graphs to AVMs.
As above, there should be a correspondence between nodes in the graph and variables in the AVM.
But note that while the nodes of a feature graph are part of the definition of the graph, AVMs are defined over a universal set of variables.
We must therefore pre-define a set of variables, called V below, for each AVM M, to serve as Tags(M).
In addition, AVMs exhibit a degree of freedom which is not present in feature graphs;
this is due to the fact that multiple occurrences of the same variable can be explicated along with any of the instances of the variable.
To overcome this difficulty, we first introduce the notion of an arborescence.
Arborescence
Definition
Given a feature graph A = ⟨Q, q, δ, θ⟩, a tree τ = ⟨Q, E⟩, where E ⊆ δ, is an arborescence of A if τ is a minimum spanning directed tree of A, rooted in q.
Informally, an arborescence of a given feature graph is a tree consisting of the nodes of the graph and the minimum number of arcs required for defining some shortest possible path from the root to each of the nodes in the graph.
Since feature graphs are connected and each node is accessible from the root, such a tree always exists, but it is not necessarily unique.
A simple algorithm for producing an arborescence scans the graph, from the root, in some order, and marks each node with the length of the shortest path from the root to that node, additionally marking the incoming arcs of the node that are parts of minimum-length paths.
Then, for each node with in-degree greater than 1, only a single marked arc is retained.
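The marking algorithm just described is essentially a breadth-first search that keeps, for each node, the first arc that reaches it. A minimal sketch (the δ-dict graph encoding is my assumption, not the book's):

```python
from collections import deque

def arborescence(root, delta):
    """Return E, a subset of delta forming a spanning directed tree
    rooted at root, reaching each node along a shortest path."""
    E = {}
    seen = {root}
    frontier = deque([root])
    while frontier:
        p = frontier.popleft()
        for (src, f), tgt in sorted(delta.items()):
            if src == p and tgt not in seen:   # first arc into tgt wins
                seen.add(tgt)
                E[(src, f)] = tgt
                frontier.append(tgt)
    return E

# The cyclic graph q0 -f-> q1 -g-> q0: its unique arborescence keeps f.
delta = {('q0', 'f'): 'q1', ('q1', 'g'): 'q0'}
```

Here arborescence('q0', delta) returns {('q0', 'f'): 'q1'}, dropping the back arc labeled g. When several shortest incoming arcs exist, the tie-break (here, sorted order) picks one of them, which is why arborescences need not be unique.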
Example (Arborescence)
Let A be the graph with nodes q1, q2, q3, q7, q9, root q3 and θ(q7) = a, whose arcs are δ(q3, f) = q1, δ(q3, g) = q2, δ(q1, f1) = q7, δ(q2, g1) = q9 and δ(q2, g2) = q1.

Then the following trees are arborescences of A: the tree retaining the arcs labeled f, f1, g and g1 (reaching q1 through f), and the tree retaining the arcs labeled f1, g, g1 and g2 (reaching q1 through g · g2).
Definition (Feature graph to AVM mapping)
Let A = ⟨Q, q, δ, θ⟩ be a feature graph and let τ = ⟨Q, E⟩ be an arborescence of A. Let V ⊆ Tags be a set of |Q| variables and I : Q → V be a one-to-one mapping. For each node q ∈ Q, define MτI(q) as:

if δ(q, f)↑ for all f ∈ Feats and θ(q)↑, then MτI(q) = I(q) [ ];

if δ(q, f)↑ for all f ∈ Feats and θ(q) = a, then MτI(q) = I(q) a;

if δ(q, fi) = qi for 1 ≤ i ≤ n, where n is the out-degree of q, then MτI(q) = I(q) [f1 : α1, . . . , fn : αn], where αi = MτI(qi) if ⟨q, fi, qi⟩ ∈ E, and αi = I(qi) otherwise.

The AVM expression of A with respect to an arborescence τ is ητI(A) = MτI(q).
Feature graph to AVM mapping
Example (Feature graph to AVM mapping)
Let A be the graph with nodes q1, q2, q3, q7, q9, root q3 and θ(q7) = a, whose arcs are δ(q3, f) = q1, δ(q3, g) = q2, δ(q1, f1) = q7, δ(q2, g1) = q9 and δ(q2, g2) = q1.

Let τ = ⟨Q, E⟩ be the arborescence of A that retains the arcs labeled f, f1, g and g1 (dropping g2).
Example (Feature graph to AVM mapping)
Since Q = {q1, q2, q3, q7, q9} we select a set of five variables from Tags; say, V = { 1 , 2 , 3 , 7 , 9 }. We define a one-to-one mapping I from Q to V; here, the function which maps qi to i .
To compute the AVM expression of A (with respect to τ and I) we start with the sinks of the graph: nodes with no outgoing edges. There are two such nodes in A, namely q7 and q9. By the definition, MτI(q9) = I(q9) [ ] = 9 [ ], and MτI(q7) = I(q7) a = 7 a. Then,

MτI(q1) = I(q1) [f1 : MτI(q7)] = 1 [f1 : 7 a].
Example (Feature graph to AVM mapping)
More interestingly,

MτI(q2) = I(q2) [g1 : MτI(q9), g2 : I(q1)] = 2 [g1 : 9 [ ], g2 : 1 ].

Note how the value of 1 is not explicated, as the arc ⟨q2, g2, q1⟩ is not included in τ. Finally,

M = MτI(q3) = I(q3) [f : MτI(q1), g : MτI(q2)] = 3 [f : 1 [f1 : 7 a], g : 2 [g1 : 9 [ ], g2 : 1 ]]

Observe that the result is a well-formed AVM, and that the reentrancy in A is reflected in M.
Example (Feature graph to AVM mapping)
Had we chosen the other arborescence of A, the resulting AVM would have been:

3 [f : 1 , g : 2 [g1 : 9 [ ], g2 : 1 [f1 : 7 a]]]
Example (Feature graph to AVM mapping in the face of cycles)
Let A be the graph with two nodes q0 and q1, root q0, δ(q0, f) = q1 and δ(q1, g) = q0; its unique arborescence τ retains only the f arc. Define V = { 0 , 1 } and let I map qi to i . Then MτI(q1) = 1 [g : 0 ], and hence

M = MτI(q0) = 0 [f : MτI(q1)] = 0 [f : 1 [g : 0 ]].
Recall that the function η, mapping a feature graph to an AVM, is dependent on τ, the arborescence chosen for the graph.
When a given feature graph A has several different arborescences, it has several different AVM expressions.
However, these expressions are not arbitrarily different; in fact, they are all renamings of each other.
Lemma
Let A = ⟨Q, q, δ, θ⟩ be a feature graph and let τ1 = ⟨Q1, E1⟩, τ2 = ⟨Q2, E2⟩ be two arborescences of A. Let V1, V2 ⊆ Tags be two sets of |Q| variables and let I1 : Q → V1, I2 : Q → V2 be two one-to-one mappings. Then ητ1I1(A) and ητ2I2(A) are renamings of each other.
Lemma
If A = ⟨Q, q, δ, θ⟩ is a feature graph and M = ητI(A) is any one of its AVM expressions, then for all q1, q2 ∈ Q and f ∈ Feats, δA(q1, f) = q2 iff ⟨I(q1), f, I(q2)⟩ ∈ Arcs(M).
Corollary
If A is a feature graph and M = ητ,I(A) is any one of its AVM expressions, then:

Π(A) = Π(M);

for every path π, pval(M, π) is an atomic AVM with the atom a iff valA(π) is the graph 〈{q}, q, δ, θ〉 for some node q, where δ is undefined and θ(q) = a; and

for every π1, π2, π1 and π2 are reentrant in A iff they are reentrant in M.
Corollary
If A is a feature graph and M1 = ητ1,I1(A), M2 = ητ2,I2(A) are two of its AVM expressions, then M1 ≃ M2.
Theorem
For all feature graphs A1 = 〈Q1, q̄1, δ1, θ1〉 and A2 = 〈Q2, q̄2, δ2, θ2〉, A1 ⊑ A2 iff for all arborescences τ1, τ2 of A1 and A2, respectively, and mappings I1, I2, ητ1,I1(A1) ⪯ ητ2,I2(A2).
Corollary
For all feature graphs A1,A2, A1 ∼ A2 iff η(A1) ≃ η(A2).
AVMs, feature graphs, feature structures and AFSs
Example (AVMs, feature graphs, feature structures and AFSs)
AVM        | Feature Graph | Feature Structure | AFS
M1         | A1            | fs1 = [A1]∼       | F1 = Abs(A1)
M2 ≃ M′2   | A2 ∼ A′2      | fs2 = [A2]∼       | F2 = Abs(A2)

[Diagram: η (with an arborescence τ or τ′) and φ map between AVMs and feature graphs; [·]∼ maps feature graphs to feature structures; Abs and Conc map between feature structures and AFSs. Subsumption corresponds across the four views (⪯ for AVMs and AFSs, ⊑ for feature graphs and feature structures).]
Unification Motivation
Unification
We presented different views of feature structures, with correspondences among them.

For each of the views, a subsumption relation was defined in a natural way.

We now define the operation of unification for these views.

The subsumption relation compares the information content of feature structures.

Unification combines the information that is contained in two (compatible) feature structures.

We use the term ‘unification’ to refer to both the operation and its result. Whenever two feature structures are related, they are assumed to be over the same signature.
The mathematical interpretation of “combining” two members of a partially ordered set is to take the least upper bound of the two operands with respect to the partial order; in our case, subsumption.

Indeed, feature structure unification is exactly that.

However, since subsumption is antisymmetric for feature structures and AFSs but not for feature graphs and AVMs, a unique least upper bound cannot be guaranteed for all four views.
Unification Feature structure unification
Feature structure unification
Definition (Feature structure unification)
Two feature structures fs1 and fs2 are consistent if they have an upper bound (with respect to subsumption), and inconsistent otherwise. If fs1 and fs2 are consistent, their unification, denoted fs1 ⊔ fs2, is their least upper bound with respect to subsumption.

If two feature structures have an upper bound, they have a (unique) least upper bound.
Unification Feature graph unification
Feature graph unification
While the definition of unification as least upper bound is useful mathematically, it does not tell us how to compute the unification of two given feature structures.

To this end, we provide a constructive definition in terms of feature graphs, which induces an algorithm for computing unification.

For reasons that will become clear presently, we require that the two feature graphs be node-disjoint.
Definition
Let A = 〈QA, qA, δA, θA〉 and B = 〈QB, qB, δB, θB〉, with QA ∩ QB = ∅, be two feature graphs. Let ‘u≈’ be the least equivalence relation on QA ∪ QB such that:

qA u≈ qB;

for every q1, q2 ∈ QA ∪ QB and f ∈ Feats, if q1 u≈ q2, (δA ∪ δB)(q1, f)↓ and (δA ∪ δB)(q2, f)↓, then (δA ∪ δB)(q1, f) u≈ (δA ∪ δB)(q2, f).
The ‘u≈’ relation partitions the nodes of QA ∪ QB into equivalence classes such that both roots are in the same class, and if some feature is defined for two nodes in one class, then the two nodes this feature leads to are also in one (possibly different) class.

Clearly, the number of equivalence classes (called the index of ‘u≈’) is finite.

The requirement that QA and QB be disjoint is essential here: we would want two nodes to be in the same equivalence class with respect to ‘u≈’ only if they comply with the above definition; if we allowed a non-empty intersection of nodes, ‘u≈’ could have been a different relation.
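The closure can be computed directly from the definition by a naive fixpoint over node pairs. The sketch below is our own illustration, not the book's code: each graph's δ is encoded as a dictionary from (node, feature) pairs to nodes, and the two node sets are assumed disjoint.

```python
def u_approx_classes(delta_a, root_a, delta_b, root_b):
    """Compute the equivalence classes of the least relation 'u≈'."""
    delta = {**delta_a, **delta_b}      # (δA ∪ δB); well defined since QA ∩ QB = ∅
    nodes = {q for (q, _) in delta} | set(delta.values()) | {root_a, root_b}
    feats = {f for (_, f) in delta}
    parent = {q: q for q in nodes}      # union-find forest over QA ∪ QB

    def find(q):
        while parent[q] != q:
            q = parent[q]
        return q

    def union(p, q):
        parent[find(p)] = find(q)

    union(root_a, root_b)               # first condition: qA u≈ qB
    changed = True
    while changed:                      # close under the second condition
        changed = False
        for q1 in nodes:
            for q2 in nodes:
                if find(q1) != find(q2):
                    continue
                for f in feats:
                    t1, t2 = delta.get((q1, f)), delta.get((q2, f))
                    if t1 is not None and t2 is not None and find(t1) != find(t2):
                        union(t1, t2)
                        changed = True
    classes = {}
    for q in nodes:
        classes.setdefault(find(q), set()).add(q)
    return {frozenset(c) for c in classes.values()}
```

For instance, with A consisting of a0 −f→ a1 and B of b0 −f→ b1, the roots are merged and, consequently, so are their f-targets.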
The u≈ relation

[Figure: two node-disjoint feature graphs. A: qA0 −f→ qA1, qA0 −g→ qA1, qA1 −num→ qA2 with θ(qA2) = sg. B: qB0 −f→ qB1, qB1 −pers→ qB2 with θ(qB2) = 3rd.]
The u≈ relation

[Figure: a second example. A has nodes qA0, qA1, qA2 with arcs labeled f, g and h; B has nodes qB0, qB1 with arcs labeled f and g.]
Type-respecting relation
Definition
A binary relation ‘≈’ over the nodes QA ∪ QB of two feature graphs is said to be type respecting iff for every node q ∈ QA ∪ QB, if (θA ∪ θB)(q)↓ and (θA ∪ θB)(q) = a, then for every node q′ such that q ≈ q′, q′ is a sink and either (θA ∪ θB)(q′)↑ or (θA ∪ θB)(q′) = a.
When is ‘u≈’ not type respecting?

The above condition constrains a node q ∈ QA ∪ QB only if (θA ∪ θB)(q)↓; that is, only if q is a marked sink in either A or B.

The type-respecting condition requires that all nodes that are equivalent to q be sinks, either unmarked or marked by the same atom.

Since this is the only requirement, the relation is not type respecting iff it maps two nodes, one of which is a marked sink and the other of which is either a non-sink or a sink with a different label, to the same equivalence class.

A non-type-respecting ‘u≈’ is the only source of unification failure.
Type respecting u≈ relation

[Figure: the graphs A (arcs f, g, num; atom sg) and B (arcs f, pers; atom 3rd) of the earlier example; here ‘u≈’ is type respecting, since no equivalence class contains a marked sink together with a non-sink or a differently marked sink.]
Lemma
If A and B have a common upper bound C, such that A ⊑ C through the morphism hA and B ⊑ C through the morphism hB, and if qA ∈ QA and qB ∈ QB are such that qA u≈ qB, then hA(qA) = hB(qB).
Definition (Feature graph unification)
Let A and B be two feature graphs such that QA and QB are disjoint. The unification of A and B, denoted A ⊔ B, is defined only if ‘u≈’ is type respecting, in which case it is the feature graph 〈Q, q̄, δ, θ〉, where:

Q = { [q]u≈ | q ∈ QA ∪ QB }

q̄ = [qA]u≈ (= [qB]u≈)

δ([q]u≈, f) = [q′′]u≈ if there exists q′ ∈ [q]u≈ s.t. (δA ∪ δB)(q′, f) = q′′; undefined if (δA ∪ δB)(q′, f)↑ for all q′ ∈ [q]u≈

θ([q]u≈) = (θA ∪ θB)(q′) if there exists q′ ∈ [q]u≈ s.t. (θA ∪ θB)(q′)↓; undefined if (θA ∪ θB)(q′)↑ for all q′ ∈ [q]u≈

If ‘u≈’ is not type respecting, A and B are inconsistent.
[Figure: unifying the graphs A (qA0 with f and g arcs to qA1, and num from qA1 to sg) and B (qB0 with an f arc to qB1, and pers from qB1 to 3rd) yields a graph whose root has f and g arcs into a single merged node carrying both num (to sg) and pers (to 3rd).]
Unification
To see that the result of unification is indeed a feature graph, observe that:

〈Q, q̄, δ, θ〉 is connected because both A and B are connected;

it is finite since both A and B are (and hence the number of equivalence classes is finite);

and θ labels only sinks, since ‘u≈’ is type respecting.
Example (Unification combines information)
[Figure: q0 −num→ q1 (sg) unified with q3 −pers→ q5 (3rd) yields q6 with q6 −num→ q7 (sg) and q6 −pers→ q8 (3rd): the information of both arguments is combined.]
Example (Unification is absorbing)
[Figure: q0 −num→ q1 (sg) unified with q3 −num→ q4 (sg), q3 −pers→ q5 (3rd) yields q6 −num→ q7 (sg), q6 −pers→ q8 (3rd): the result coincides with the more specific argument.]
Unification with reentrancies
[Figure: unifying a graph whose subj arc leads to a node with num : sg and whose obj arc leads to a node with pers : 3rd, with a graph in which subj and obj are reentrant, yields a graph in which subj and obj share a single node carrying both num : sg and pers : 3rd.]
Theorem
If A and B are inconsistent, they do not have a common upper bound. Otherwise, C = A ⊔ B is a minimal upper bound of A and B with respect to (feature graph) subsumption.
The previous theorem connects feature graph unification with feature structure unification.

In order to compute fs = fs1 ⊔ fs2, simply compute A = A1 ⊔ A2, where A1 ∈ fs1 and A2 ∈ fs2, and take fs = [A]∼.
Theorem
For all feature graphs A1,A2, if A = A1 ⊔ A2 then [A]∼ = [A1]∼⊔[A2]∼.
Unification as a computational process
Unification, as defined above, turns out to be very efficient to implement.
Several algorithms for feature structure unification have beenproposed.
We present a simple algorithm, based directly on the definition, forunifying two feature graphs.
The algorithm uses two operations, known as union and find, tomanipulate equivalence classes.
Feature graphs are implemented using the following data structure: each node q is a record (or structure) with the fields:

label, specifying θ(q) (if defined); and

feats, specifying δ, which is a list of pairs consisting of features and pointers to nodes.

A node q is a sink if and only if q.feats is empty, and only such nodes are labeled. If θ(q)↑, the field label is nil.

The functions is_labeled and is_sink receive a node record and return true if and only if the node is labeled or has no outgoing edges, respectively.
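In Python, such a record might be sketched as a dataclass. The field names follow the slides; `class_` (introduced below for union-find) is our substitute for the reserved word `class`, and a dictionary replaces the list of pairs:

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class Node:
    label: Optional[str] = None                             # θ(q); None encodes nil, i.e. θ(q)↑
    feats: Dict[str, "Node"] = field(default_factory=dict)  # δ: feature -> target node

def is_labeled(q: Node) -> bool:
    return q.label is not None

def is_sink(q: Node) -> bool:
    return len(q.feats) == 0
```

A labeled node is then necessarily created with empty feats, matching the invariant that only sinks are labeled.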
To implement the union-find algorithm, an additional field, class, isadded to nodes.
It is used to point to the equivalence class of the node.
Upon initialization of the algorithm, for every node q, q.class points back to q, indicating that each node is a separate equivalence class.
Example (Internal representation of a feature graph)
[Figure: a graph with nodes qA0, qA1, qA2, arcs labeled f and g leaving qA0 and h leaving qA1, and θ(qA2) = sg. Each node is a record: the first with label nil and feats 〈f, g〉; the second with label nil and feats 〈h〉; the third with label sg and empty feats. Each record's class field points back to the record itself.]
The find operation receives a node and returns a unique, canonical representative of its equivalence class.
Also, union receives two (representatives of) classes and merges themby setting the equivalence class of all members of the second class tothat of the first.
Unification algorithm
Example (Unification algorithm)
input: two feature graphs A and B
output: if fs(A) and fs(B) are unifiable, then a representative of fs(A) ⊔ fs(B); else fail.
S ← {〈qA, qB〉}
while S ≠ ∅
    select a pair 〈q1, q2〉 ∈ S; S ← S \ {〈q1, q2〉}
    q1 ← find(q1); q2 ← find(q2)
    if q1 ≠ q2 then
        if (is_labeled(q1) and not is_sink(q2)), or
           (is_labeled(q2) and not is_sink(q1)), or
           (is_labeled(q1) and is_labeled(q2) and q1.label ≠ q2.label) then fail
        else
            union(q1, q2)
            if (is_sink(q1) and is_sink(q2) and is_labeled(q2)) then q1.label ← q2.label
            for each 〈f, q〉 ∈ q2.feats
                if there is some 〈f, p〉 ∈ q1.feats then
                    S ← S ∪ {〈p, q〉}
                else
                    add 〈f, q〉 to q1.feats
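The pseudocode translates almost line by line into Python. The sketch below is our own rendering, not the book's code: it carries its own node record and a minimal union-find (without path compression), and, like the original, it is destructive.

```python
from typing import Dict, Optional

class Node:
    """A feature graph node: label implements θ, feats implements δ."""
    def __init__(self, label: Optional[str] = None,
                 feats: Optional[Dict[str, "Node"]] = None):
        self.label = label
        self.feats = feats if feats is not None else {}
        self.class_ = self              # initially, each node is its own class

def find(q: "Node") -> "Node":
    while q.class_ is not q:            # follow class pointers to the representative
        q = q.class_
    return q

def union(q1: "Node", q2: "Node") -> None:
    find(q2).class_ = find(q1)          # merge q2's class into q1's; q1 stays representative

def is_sink(q: "Node") -> bool:
    return not q.feats

def is_labeled(q: "Node") -> bool:
    return q.label is not None

class UnificationFailure(Exception):
    pass

def unify(root_a: "Node", root_b: "Node") -> "Node":
    """Destructively unify two node-disjoint feature graphs; return the new root."""
    agenda = [(root_a, root_b)]                    # the set S of the pseudocode
    while agenda:
        q1, q2 = agenda.pop()
        q1, q2 = find(q1), find(q2)
        if q1 is q2:
            continue
        if ((is_labeled(q1) and not is_sink(q2)) or
                (is_labeled(q2) and not is_sink(q1)) or
                (is_labeled(q1) and is_labeled(q2) and q1.label != q2.label)):
            raise UnificationFailure()             # 'u≈' would not be type respecting
        union(q1, q2)
        if is_sink(q1) and is_sink(q2) and is_labeled(q2):
            q1.label = q2.label
        for f, q in list(q2.feats.items()):        # merge q2's arcs into q1
            if f in q1.feats:
                agenda.append((q1.feats[f], q))    # feature defined in both: schedule targets
            else:
                q1.feats[f] = q
    return find(root_a)
```

For example, unifying a graph for [num : sg] with one for [pers : 3rd] yields a root carrying both features, while unifying [num : sg] with [num : pl] raises UnificationFailure at the differently labeled sinks.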
Upon termination of the algorithm, the original inputs are modified.
The result is obtained by considering the equivalence class of the root qA as a root, and computing the graph that is accessible from it.

Such algorithms, which modify their inputs, are called destructive.
Unification algorithm: correctness
Lemma
The unification algorithm terminates.
Lemma
The unification algorithm computes A ⊔ B.
Unification algorithm: complexity
Every iteration of the loop merges two (different) equivalence classes into one.

If the inputs are feature graphs consisting of fewer than n nodes, the number of equivalence classes in the result is bounded by 2n.

Thus, union can be executed at most 2n times.

There are two calls to find in each iteration, so the number of find operations is bounded by 4n.

With the appropriate data structures for implementing equivalence classes, it can be proven that O(n) operations of union and find can be done in O(c(n) × n) time, where c(n) is the inverse Ackermann function, which can be considered a constant for realistic values of n.

Therefore, the unification algorithm is quasi-linear.
The algorithm is destructive: the input feature graphs are modified.

This might pose a problem: the inputs might be necessary for further uses; even worse, when the unification fails, the inputs might be lost.

To overcome the problem, the inputs to unification must be copied before they are unified, and copying of graphs is an expensive operation.

As an alternative solution, there exist non-destructive unification algorithms whose theoretical complexity (and actual run-time) is not worse than that of the algorithm we have presented.
Unification Generalization
Generalization
Unification is an information-combining operator: when two feature structures are compatible, their unification can informally be seen as a union of the information both structures encode.

Sometimes, however, a dual operation is useful, analogous to the intersection of the information encoded in feature structures.

This operation, which is much less frequently used in computational linguistics, is referred to as anti-unification, or generalization.
Defined over pairs of feature structures, generalization (denoted ⊓) is the operation that returns the most specific (or least general) feature structure that is still more general than both arguments.

In terms of the subsumption ordering, generalization is the greatest lower bound (glb) of two feature structures.
Unlike unification, generalization can never fail.
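For AVMs without reentrancies, the glb computation can be sketched over plain nested dictionaries (atoms as strings). This simplification, and all names in it, are ours; the full definition must additionally track shared values:

```python
def generalize(m1, m2):
    """Greatest lower bound of two simple AVMs (nested dicts; atoms as strings).

    Reentrancies are ignored in this sketch, so shared values may be lost.
    """
    if isinstance(m1, str) and isinstance(m2, str):
        return m1 if m1 == m2 else {}   # distinct atoms generalize to [ ]
    if isinstance(m1, dict) and isinstance(m2, dict):
        # keep only the features on which both AVMs carry information, recursively
        return {f: generalize(m1[f], m2[f]) for f in m1 if f in m2}
    return {}                            # atom vs. complex value: no common information
```

On the examples below, [num : sg] ⊓ [pers : third] yields [ ], and [num : sg] ⊓ [num : pl] yields [num : [ ]].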
Definition (Generalization)
The generalization (or anti-unification) of two feature structures fs1 and fs2, denoted fs1 ⊓ fs2, is the greatest lower bound of fs1 and fs2.
Example (Generalization)
Generalization reduces information:

  [ num : sg ] ⊓ [ pers : third ] = [ ]

Different atoms are inconsistent:

  [ num : sg ] ⊓ [ num : pl ] = [ num : [ ] ]

Generalization is restricting:

  [ num : sg ] ⊓ [ num : sg, pers : third ] = [ num : sg ]
Example (Generalization)
Empty feature structures are zero elements:

  [ ] ⊓ [ agr : [ num : sg ] ] = [ ]

Reentrancies can be lost:

  [ f : 1 [ num : sg ], g : 1 ] ⊓ [ f : [ num : sg ], g : [ num : sg ] ]
    = [ f : [ num : sg ], g : [ num : sg ] ]
Unification grammars Introduction
Unification grammars
Feature structures are the building blocks with which unification grammars are built, as they serve as the counterpart of the terminal and non-terminal symbols in CFGs.

In order to define grammars and derivations, one needs some extension of feature structures to sequences thereof.

Multi-rooted feature structures are aimed at capturing complex, ordered information and are used for representing rules and sentential forms of unification grammars. We discuss:

Multi-rooted feature graphs, a natural extension of feature graphs

Multi-rooted feature structures, which are equivalence classes of isomorphic multi-rooted feature graphs

Multi-AVMs, which are an extension of AVMs, and how they correspond to multi-rooted graphs
Unification in context
Forms and grammar rules
Derivation
Languages
Derivation trees
Unification grammars Motivation
Motivation
A naïve attempt to augment context-free rules with feature structures could have been to add to each rule a sequence of feature structures, with an element for each element in the CF skeleton.

However, rules cannot be thought of simply as sequences of feature structures.

The reason is possible reentrancies among elements of such sequences, or, in other words, among different categories in a single rule.
Example (Rule)
As a motivating example, consider a rule intended to account for agreement on number between the subject and the verb of English sentences:

  [ cat : s ] → [ cat : np, num : 4 ] [ cat : vp, num : 4 ]
The difficulty in extending feature structures to sequences thereof is the possible sharing of information among different elements of the intended sequence.

This sharing takes different forms across the various views:

In the case of multi-AVMs, the scope of variables (i.e., tags) is extended from a single AVM to a sequence

In multi-rooted feature graphs, this is expressed by the possibility of two paths, leaving two different roots, leading to the same node

In the case of abstract multi-rooted structures, the reentrancy relation must account for possible reentrancies across elements
Sequences
Two methods for representing rules (and sentential forms based upon them):

One approach is to use (single) feature structures for representing “sequences” of feature structures. Dedicated features are used to encode substructures of a feature structure, and the order among them:

Example

  [ 1 : [ cat : s ]
    2 : [ cat : np, num : 6 ]
    3 : [ cat : vp, num : 6 ] ]
In this example, the special features 1, 2 and 3 are used to encode the left-hand side, the first element and the last element of the right-hand side of the rule, respectively.

The main advantage of this approach is that the existing apparatus of feature structures suffices for representing rules as well.

However, there are several drawbacks to this solution:

The signature must be augmented to include additional atoms for representing categories, and special features to encode positions

Dedicated features (e.g., 1, 2 and 3) are required to have a special meaning

The set Feats must be considered an ordered set in order for it to be mapped to the ordered substructure of such feature structures

The number of such dedicated features that are needed is unbounded, in contradiction to our assumption that the set Feats is finite

When feature structures are typed, this results in an unbounded number of types, too
A different solution to this problem can be based on the observationthat feature structures can be used to represent lists.
A list can be simply represented as a feature structure having twofeatures, named, say, first and rest:
Example (Feature structure encoding of a list)

  [ first : 1
    rest : [ first : 2
             rest : [ first : 3
                      rest : elist ] ] ]
A representation of the motivating example with lists could be:

Example (List representation of a rule)

  [ first : [ cat : s ]
    rest : [ first : [ cat : np, num : 6 ]
             rest : [ first : [ cat : vp, num : 6 ]
                      rest : elist ] ] ]
Similar problems arise with the list representation: the features first and rest acquire a special, irregular meaning.

There is no “direct access” to the elements of a rule.
Unification grammars Multi-rooted feature graphs
Multi-rooted feature graphs
We extend feature graphs to multi-rooted feature graphs (MRGs).
Multi-rooted feature graphs are defined over the same signature (Feats and Atoms), which is assumed to be fixed.
Definition (Multi-rooted feature graphs)
A multi-rooted feature graph (MRG) is a pair 〈R, G〉 where G = 〈Q, δ, θ〉 is a finite, directed, labeled graph consisting of a non-empty, finite set Q of nodes (disjoint from Feats and Atoms), a partial function δ : Q × Feats → Q specifying the arcs, and a labeling function θ marking some of the sinks, and where R is an ordered list of distinguished nodes in Q called roots. G is not necessarily connected, but the union of all the nodes reachable from all the roots in R is required to yield exactly Q. The length of an MRG is the number of its roots, |R|. λ denotes the empty MRG, where Q = ∅.
Example (Multi-rooted feature graphs)
The following is an MRG, in which the shaded nodes (ordered from left to right) constitute the list of roots, R:

[Figure: three roots q1, q2, q3 with cat arcs to the sinks q4 (s), q5 (np) and q6 (vp), respectively; q2 and q3 both have an agr arc leading to a shared node q7.]
A multi-rooted feature graph is a directed, not necessarily connected, labeled graph with a designated sequence of nodes called roots.

It is a natural extension of feature graphs, the only difference being that the single root of a feature graph is extended here to a list, in order to model the required structured information.
Meta-variables ~A range over MRGs, and Q, δ, θ and R – over theirconstituents
We do not distinguish between an MRG of length 1 and a featuregraph
Natural relations can be defined between MRGs and feature graphs
First, note that if ~A = 〈R, G〉 is an MRG and qi is a root in R, then qi naturally induces a feature graph ~A|i = 〈Qi, qi, δi, θi〉, where:

Qi is the set of nodes reachable from qi

δi = δ|Qi (the restriction of δ to Qi)

θi = θ|Qi (the restriction of θ to Qi)
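Computing ~A|i is a plain reachability traversal from the i-th root. The encoding below (a (node, feature) → node dictionary for δ and a node → atom dictionary for θ) is our own illustration:

```python
def restrict_to_root(roots, delta, theta, i):
    """Return the feature graph induced by the i-th root (1-based) of an MRG.

    delta: dict mapping (node, feature) -> node; theta: dict mapping node -> atom.
    Returns (Qi, root, delta_i, theta_i), the components of A|i.
    """
    root = roots[i - 1]
    reachable = {root}
    frontier = [root]
    while frontier:                           # traverse δ-arcs from the root
        q = frontier.pop()
        for (p, f), target in delta.items():
            if p == q and target not in reachable:
                reachable.add(target)
                frontier.append(target)
    delta_i = {(p, f): t for (p, f), t in delta.items() if p in reachable}
    theta_i = {q: a for q, a in theta.items() if q in reachable}
    return reachable, root, delta_i, theta_i
```

Note that arcs leaving nodes unreachable from the chosen root (including other roots) are dropped, which is exactly the restriction δ|Qi.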
One can view an MRG ~A = 〈R, G〉 as an ordered sequence 〈A1, . . . , An〉 of (not necessarily disjoint) feature graphs, where Ai = ~A|i for 1 ≤ i ≤ n.

Note that such an ordered list of feature structures is not a sequence in the mathematical sense: removing a node accessible from one root can result in this node being removed from the graph accessible from some other root.
Subgraphs
Although MRGs are not element-disjoint sequences, it is possible to define substructures of them.

The roots of an MRG form a sequence of nodes.

Taking just a subsequence of the roots, and considering only the subgraph they induce (that is, the nodes that are accessible from these roots), a notion of substructure is naturally obtained.
Definition (Induced subgraphs)
The subgraph of a non-empty MRG ~A = 〈R, G〉 of length n, induced by j, k and denoted ~Aj...k, is defined only if 1 ≤ j ≤ k ≤ n, in which case it is the MRG 〈R′, G′〉 where R′ = 〈qj, . . . , qk〉, G′ = 〈Q′, δ′, θ′〉 and:

Q′ = { q | δ(q̄, π) = q for some q̄ ∈ R′ and some path π }

δ′(q, f) = δ(q, f) for every q ∈ Q′

θ′(q) = θ(q) for every q ∈ Q′

When the sequence is of length 1 we write ~Ai for ~Ai...i. As we identify a feature graph with an MRG of length 1, ~Ai = ~A|i.
MRGs
Since MRGs are a natural extension of feature graphs, many of the concepts defined for the latter can be extended to the former:

The transition function δ is extended from single features to paths

The set of paths of an MRG

The function val, associating a value with each path in a feature graph, is extended to MRGs

Reentrancy and cyclicity

Isomorphism and subsumption
MRG paths
Definition (MRG paths)
The paths of a multi-rooted feature graph ~A are

  Π(~A) = { 〈i, π〉 | π ∈ Paths and δ(qi, π)↓ }
MRG path values
Definition (Path value)
The value of a path 〈i, π〉 in an MRG ~A, denoted val~A(〈i, π〉), is defined if and only if δ~A(qi, π)↓, in which case it is the feature graph val~A|i(π).

Note that the value of a path in an MRG is a (single-rooted) feature graph, not an MRG. In particular, val~A(〈i, π〉) may include nodes which are roots in ~A but are not the root of the resulting feature graph. Clearly, an MRG may have two paths 〈i1, π1〉 and 〈i2, π2〉 where π1 = π2 even though i1 ≠ i2.
Example (Path value)
~A, where R = 〈q0, q1, q2〉, and val~A(〈2, 〈f〉〉):

[Figure: an MRG with roots q0, q1, q2, each with an f arc (to q3, q4 and q5, respectively), and g and h arcs leading to the sinks q6 (a) and q7 (b). Alongside it, the feature graph val~A(〈2, 〈f〉〉), rooted at q4, with g and h arcs to q6 (a) and q7 (b).]
MRG reentrancy
Two MRG paths are reentrant, denoted 〈i, π1〉 ↭~A 〈j, π2〉, if they share the same value: δ~A(qi, π1) = δ~A(qj, π2).

A multi-rooted feature graph is reentrant if it has two distinct paths (possibly leaving different roots) that are reentrant.

An MRG ~A is cyclic if two paths 〈i, π1〉, 〈i, π2〉 ∈ Π(~A), where π1 is a proper subsequence of π2, are reentrant: 〈i, π1〉 ↭~A 〈i, π2〉.

Here, the two paths must have the same index i, although they may “pass through” elements of ~A other than the i-th one.
A cyclic MRG
Example (A cyclic MRG)
The following MRG ~A = 〈R, G〉, where R = 〈q0, q1, q2〉, is cyclic:

[Figure: roots q0, q1, q2 with f arcs into q3, q4, q5; g and h arcs among q3–q7 form a cycle that a path from one of the roots re-enters.]
Multi-rooted feature graph isomorphism
Definition (Multi-rooted feature graph isomorphism)
Two MRGs ~A1 = 〈R1, G1〉 and ~A2 = 〈R2, G2〉 are isomorphic, denoted ~A1 ~∼ ~A2, iff they are of the same length, n, and there exists a one-to-one mapping i : Q1 → Q2, called an isomorphism, such that:
i(q1j ) = q2j for all 1 ≤ j ≤ n;
for all q1, q2 ∈ Q1 and f ∈ Feats, δ1(q1, f ) = q2 iffδ2(i(q1), f ) = i(q2); and
for all q ∈ Q1, θ1(q) = θ2(i(q)) (either both are undefined, or bothare defined and equal).
Subsumption of multi-rooted feature graphs
Definition (Subsumption of multi-rooted feature graphs)
An MRG ~A = 〈R, G〉 subsumes an MRG ~A′ = 〈R′, G′〉, denoted ~A ~⊑ ~A′, if |R| = |R′| and there exists a total function h : Q → Q′ such that:
for every root qi ∈ R, h(qi ) = q′i
for every q ∈ Q and f ∈ Feats, if δ(q, f )↓ thenh(δ(q, f )) = δ′(h(q), f )
for every q ∈ Q, if θ(q)↓ then θ(q) = θ′(h(q))
The only difference from feature graph subsumption is that h is required to map each of the roots in R to its corresponding root in R′. Notice that in order for two MRGs to be related by subsumption, they must be of the same length.
Example (MRG subsumption)
Feature graph subsumption can have three different effects: if A ⊑ B ,then B can have additional arcs, additional reentrancies or more markedatoms. The same holds for MRGs, with the observation that additionalreentrancies can now occur among paths that originate at different roots:
(figure: two MRGs of length 2 with arcs labeled f and g; the left one ~⊑-subsumes the right, which adds a reentrancy between paths rooted at different roots, but not vice versa)
Unification grammars Multi-rooted feature graphs
Subsumption of multi-rooted feature graphs
Example (MRG subsumption)
Let ~A and ~A′ be the following two MRGs. Then ~A~⊑~A′ but not ~A′~⊑~A.
~A (in AVM notation):  [cat: np, agr: 1[num: sg, pers: 3rd]]   [cat: vp, agr: 1]   [cat: np, agr: [num: sg, pers: 3rd]]

~A′:  [cat: np, agr: 1[num: sg, pers: 3rd]]   [cat: vp, agr: 1]   [cat: np, agr: 1]

In ~A′ the agr values of all three elements are shared, whereas in ~A the third element has its own agr value.
Unification grammars Multi-rooted feature graphs
Multi-rooted feature structures
Since MRG isomorphism is an equivalence relation, the notion ofmulti-rooted feature structures is well defined:
Definition (Multi-rooted feature structures)
Given a signature of features Feats and atoms Atoms, let ~G(Feats, Atoms) be the set of all multi-rooted feature graphs over the signature. Let ~G/∼ be the collection of equivalence classes in ~G(Feats, Atoms) with respect to MRG isomorphism. A multi-rooted feature structure (MRS) is a member of ~G/∼. We use meta-variables mrs to range over MRSs.
Unification grammars Multi-AVMs
Multi-AVMs
Definition
Given a signature S, a multi-AVM (MAVM) of length n ≥ 0 is asequence 〈M1, . . . ,Mn〉 such that for each i , 1 ≤ i ≤ n, Mi is an AVMover the signature.
Unification grammars Multi-AVMs
Multi-AVMs
Meta-variables ~M range over multi-AVMs
The sub-AVMs of ~M are SubAVM(~M) = ⋃_{1≤i≤n} SubAVM(Mi)

Similarly to what we did for AVMs, we define the set of tags occurring in a multi-AVM ~M as Tags(~M)

Note that if ~M = 〈M1, . . . , Mn〉 then Tags(~M) = ⋃_{1≤i≤n} Tags(Mi) (where the union is not necessarily disjoint)

Also, the set of sub-AVMs of ~M (including ~M itself) which are tagged by the same variable X is TagSet(~M, X)

Here, too, TagSet(~M, X) = ⋃_{1≤i≤n} TagSet(Mi, X)
We usually do not distinguish between a multi-AVM of length 1 andan AVM
When depicting MAVMs graphically, we sometimes suppress theangular brackets which enclose the sequence of AVMs.
Unification grammars Multi-AVMs
Multi-AVMs
Well-formedness and variable association are extended from AVMs toMAVMs in the natural way:
Definition (Well-formed MAVMs)
A multi-AVM ~M is well-formed iff for every variable X ∈ Tags( ~M),TagSet(~M,X ) includes at most one non-empty AVM.
Definition (Variable association)
The association of a variable X in ~M, denoted assoc( ~M,X ), is the singlenon-empty AVM in TagSet(~M,X ); if all the members of TagSet(~M,X ) areempty, then assoc(~M,X ) = X [ ].
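As a toy illustration of these two definitions, one can model AVMs as nested Python dicts (the empty dict playing the role of X [ ]) and a variable's TagSet as a list of the sub-AVMs it tags. The names well_formed and assoc, and the whole encoding, are ours:

```python
# Sketch only (our encoding, not the book's): tagsets maps each variable
# to the list of sub-AVMs it tags; {} stands for the empty AVM X [ ].
def well_formed(tagsets):
    """Well-formed: at most one non-empty AVM per variable."""
    return all(sum(1 for avm in s if avm) <= 1 for s in tagsets.values())

def assoc(tagsets, x):
    """The single non-empty AVM tagged x, or the empty AVM if all
    members of the tag set are empty."""
    nonempty = [avm for avm in tagsets[x] if avm]
    return nonempty[0] if nonempty else {}

# Variable 1 tags one empty and one non-empty sub-AVM: still well formed;
# variable 2 tags only empty sub-AVMs, so its association is empty.
tagsets = {1: [{}, {"f": "a"}], 2: [{}, {}]}
```

A variable tagging two distinct non-empty AVMs, e.g. {1: [{"f": "a"}, {"g": "b"}]}, violates well-formedness, since its association would be ambiguous.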
Unification grammars Multi-AVMs
Multi-AVMs
Example (Multi-AVMs)
Consider the following multi-AVM ~M , whose length is 3:
~M = 〈 2[f: 9[h: 1[ ]]],   1[f: 8[g: 7 a, h: 2[ ]]],   6[f: 5[h: 2[ ]]] 〉

Tags(~M) = {1, 2, 5, 6, 7, 8, 9}. ~M is well-formed:

TagSet(~M, 1) = { 1[ ],  1[f: 8[g: 7 a, h: 2[ ]]] }
TagSet(~M, 2) = { 2[ ],  2[f: 9[h: 1[ ]]] }

Therefore,

assoc(~M, 1) = 1[f: 8[g: 7 a, h: 2[ ]]],   assoc(~M, 2) = 2[f: 9[h: 1[ ]]]
Unification grammars Multi-AVMs
Multi-AVMs
The same variable can tag different sub-AVMs of different elements inthe sequence
In other words, the scope of variables is extended from single AVMsto multi-AVMs
This leads to an interpretation of variables (in multi-AVMs) whichhampers the view of multi-AVMs as sequences of AVMs
Recall that we interpret multiple occurrences of the same variable within a single AVM as denoting value sharing; hence the definition of well-formed AVMs, and the convention that when a variable occurs more than once in an AVM, its association can be stipulated next to any of its occurrences
As in the other views, when multi-AVMs are concerned, thisconvention implies that removing an element from a multi-AVM canaffect other elements, in contradiction to the usual concept ofsequences
Unification grammars Multi-AVMs
MAVM arcs
The sets Arcs and Arcs* are naturally extended from AVMs toMAVMs
Crucially, an arc can connect two tags which occur in differentmembers of the MAVM.
Example (Multi-AVM arcs)
In the MAVM ~M of the example, the set of arcs includes:
{〈2, f, 9〉, 〈1, f, 8〉, 〈8, h, 2〉} ⊂ Arcs(~M)

Hence, in particular, 〈1, 〈f h f〉, 9〉 ∈ Arcs*(~M)
Unification grammars Multi-AVMs
MAVM paths
When defining the paths of a multi-AVM, some caution is required
For an AVM M, Π(M) is defined as {π | X = tag(M) and for somevariable Y ∈ Tags(M), 〈X , π,Y 〉 ∈ Arcs*(M)}
In case of MAVMs, there are several elements from which X can bechosen
Hence, we define the set of MAVM paths relative to an additionalparameter, the index of the element in the MAVM from which thepath leaves.
Unification grammars Multi-AVMs
MAVM paths
Definition (Multi-AVM paths)
If ~M = 〈M1, . . . ,Mn〉 is an MAVM of length n, then its paths are the setΠ( ~M) = {〈i , π〉 | 1 ≤ i ≤ n, X = tag(Mi) and for some variableY ∈ Tags( ~M), 〈X , π,Y 〉 ∈ Arcs*( ~M)}. If n = 0, Π(~M) = ∅.
Example (Multi-AVM paths)
In the MAVM ~M of the example, the set of paths includes 〈2, 〈f g〉〉 but not 〈1, 〈f g〉〉. More interestingly, {〈i, (f h)^k〉 | k ≥ 0 and 1 ≤ i ≤ 3} ⊂ Π(~M).
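Since Π(~M) is infinite for cyclic structures, as the (f h)^k paths above show, a concrete enumeration must be bounded. The following sketch, over our own graph encoding of the MRG image (strings for tags, a dict for δ), collects the paths up to a length bound:

```python
# Sketch (our encoding): enumerate the paths <i, pi> of the MRG image of
# a multi-AVM up to a length bound, since cyclic structures have
# infinitely many paths.
def paths(roots, delta, max_len=4):
    result = set()
    for i, root in enumerate(roots, start=1):   # paths carry the element index
        frontier = [(root, ())]
        while frontier:
            node, pi = frontier.pop()
            result.add((i, pi))
            if len(pi) < max_len:
                for (p, f), q in delta.items():
                    if p == node:
                        frontier.append((q, pi + (f,)))
    return result

# The running example M: roots <2, 1, 6> and arcs as in the text.
roots = ["2", "1", "6"]
delta = {("2", "f"): "9", ("9", "h"): "1", ("1", "f"): "8",
         ("8", "g"): "7", ("8", "h"): "2", ("6", "f"): "5", ("5", "h"): "2"}
P = paths(roots, delta)
```

As in the example, 〈2, 〈f g〉〉 is a path but 〈1, 〈f g〉〉 is not, and 〈i, 〈f h〉〉 is a path for every element index i.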
Unification grammars Multi-AVMs
MAVM path values
Definition (Path values)
The value of a path 〈i, π〉 in a multi-AVM ~M, denoted pval(~M, 〈i, π〉), is assoc(~M, Y), where Y is such that 〈tag(Mi), π, Y〉 ∈ Arcs*(~M).

Of course, one path can have several values when it leaves different elements of a multi-AVM: in general, pval(~M, 〈i, π〉) ≠ pval(~M, 〈j, π〉) if i ≠ j.
Unification grammars Multi-AVMs
MAVM path values
Example (Path values)
Consider again ~M of the example. Examples of path values include pval(~M, 〈2, 〈f g〉〉) = 7 a and pval(~M, 〈1, 〈f h f g〉〉) = 7 a. Observe that in order to fully stipulate the value of some paths, one must combine sub-AVMs of more than one element of the multi-AVM. For example,

pval(~M, 〈1, 〈f h〉〉) = 1[f: 8[g: 7 a, h: 2[f: 9[h: 1[ ]]]]]
Unification grammars Multi-AVMs
MAVM reentrancy
A multi-AVM is reentrant if it has two distinct paths which share thesame value; these two paths may well be “rooted” in two differentelements of the MAVM
An MAVM ~M is cyclic if two paths 〈i, π1〉, 〈i, π2〉 ∈ Π(~M), where π1 is a proper prefix of π2, are reentrant: 〈i, π1〉 !~M 〈i, π2〉

Here, the two paths must have the same index i, although they may “pass through” elements of ~M other than the i-th one.
Unification grammars Multi-AVMs
MAVM reentrancy
Example (Multi-AVM reentrancy)
Consider again the MAVM ~M of the example. It is reentrant, since pval(~M, 〈1, 〈f h〉〉) = pval(~M, 〈2, ε〉). Furthermore, it is cyclic since pval(~M, 〈1, 〈f h f h〉〉) = pval(~M, 〈1, ε〉).
Unification grammars Multi-AVMs
MAVM subsumption
Definition (Multi-AVM subsumption)
Let ~M, ~M′ be two MAVMs of the same length n and over the same signature. ~M subsumes ~M′, denoted ~M ~⊑ ~M′, if the following conditions hold:

1. for all i, 1 ≤ i ≤ n, Mi ⊑ M′i;
2. if 〈i, π1〉 !~M 〈j, π2〉 then 〈i, π1〉 !~M′ 〈j, π2〉.
Unification grammars Multi-AVMs
MAVM subsumption
Example (MAVM subsumption)
Let ~M and ~M′ be the following two MAVMs (of length 3):

~M:   1[cat: np, agr: 4]   2[cat: vp, agr: 4[num: sg, pers: 3rd]]   3[cat: np, agr: 6[num: sg, pers: 3rd]]

~M′:  1[cat: np, agr: 4]   2[cat: vp, agr: 4[num: sg, pers: 3rd]]   3[cat: np, agr: 4]

Then ~M ~⊑ ~M′ but not ~M′ ~⊑ ~M.
Unification grammars Multi-AVMs
MAVM subsumption
The second clause of the definition may seem redundant: if for all i, 1 ≤ i ≤ n, Mi ⊑ M′i, then in particular all the reentrancies of Mi are also reentrancies in M′i; why then is the second clause necessary?

The answer lies in the possibility of reentrancies across elements in multi-AVMs

Such reentrancies are a “global” property of multi-AVMs, which is not reflected in any of the elements in isolation
Unification grammars Multi-AVMs
MAVM Renaming
Definition (Renaming)
Let ~M1 and ~M2 be two MAVMs. ~M2 is a renaming of ~M1, denoted ~M1 ~≃ ~M2, iff ~M1 ~⊑ ~M2 and ~M2 ~⊑ ~M1.
Unification grammars Multi-AVMs
Multi-AVM to MRG mapping
Definition (Multi-AVM to MRG mapping)
Let ~M = 〈M1, . . . , Mn〉 be a well-formed multi-AVM of length n. The MRG image of ~M is ϕ(~M) = 〈R, G〉, with R = 〈q1, . . . , qn〉 and G = 〈Q, δ, θ〉, where:

Q = Tags(~M)

qi = tag(Mi) for 1 ≤ i ≤ n

for all X ∈ Tags(~M) and f ∈ Feats, δ(X, f) = Y if 〈X, f, Y〉 ∈ Arcs(~M); and

for all X ∈ Tags(~M) and a ∈ Atoms, θ(X) = a if assoc(~M, X) is the atomic AVM X a, and is undefined otherwise.
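The definition translates almost line by line into Python. The encoding below (tags as integers, arcs as triples, atomic associations as a dict) is our own, for illustration only:

```python
# Direct transcription of the definition (our encoding): given the tags
# of a well-formed multi-AVM, the tag of each element, its arcs and its
# atomic associations, build the MRG image phi(M).
def phi(tags, element_tags, arcs, atoms):
    Q = set(tags)                              # nodes are the variables
    roots = list(element_tags)                 # q_i = tag(M_i)
    delta = {(x, f): y for (x, f, y) in arcs}  # delta(X, f) = Y iff <X, f, Y> in Arcs
    theta = dict(atoms)                        # theta(X) = a iff assoc is atomic X a
    return roots, (Q, delta, theta)

# The running example: <2[f: 9[h: 1[]]], 1[f: 8[g: 7 a, h: 2[]]], 6[f: 5[h: 2[]]]>
roots, (Q, delta, theta) = phi(
    tags={1, 2, 5, 6, 7, 8, 9},
    element_tags=[2, 1, 6],
    arcs={(2, "f", 9), (9, "h", 1), (1, "f", 8), (8, "g", 7),
          (8, "h", 2), (6, "f", 5), (5, "h", 2)},
    atoms={7: "a"})
```

The mapping is essentially a change of view: the variables of the multi-AVM become the nodes, and no new structure is invented.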
Unification grammars Multi-AVMs
Multi-AVM to MRG mapping
Example (Multi-AVM to multi-rooted feature graph mapping)
Consider the following multi-AVM ~M:
2[f: 9[h: 1[ ]]]   1[f: 8[g: 7 a, h: 2[ ]]]   6[f: 5[h: 2[ ]]]
Observe that it is well-formed, as the variables that occur more than once( 1 and 2 ) have only one non-empty occurrence each. The set of variablesof ~M is Tags( ~M) = { 1 , 2 , 5 , 6 , 7 , 8 , 9}, which will also be the set ofnodes Q in ϕ(~M). The sequence of roots R is the sequence of variablestagging the AVM elements of ~M, namely 〈 2 , 1 , 6 〉.
Unification grammars Multi-AVMs
Multi-AVM to MRG mapping
Example (Multi-AVM to multi-rooted feature graph mapping)
The obtained graph is:
(in words: the roots are 〈2, 1, 6〉; the arcs are f from 2 to 9, from 1 to 8 and from 6 to 5; h from 9 to 1, from 8 to 2 and from 5 to 2; g from 8 to 7; and θ(7) = a)
Unification grammars Multi-AVMs
Multi-AVM to MRG mapping
Proposition
Let ~M, ~M1, ~M2 be well-formed multi-AVMs. Then:

Π(~M) = Π(ϕ(~M));

〈i, π1〉 !~M 〈j, π2〉 iff 〈i, π1〉 !ϕ(~M) 〈j, π2〉;

~M1 ~⊑ ~M2 iff ϕ(~M1) ~⊑ ϕ(~M2).
Unification grammars Unification revisited
Unification revisited
We defined the unification operation for feature structures
We now extend the definition to multi-rooted structures; we definetwo variants of the operation:
one which unifies two same-length structures and produces their leastupper bound with respect to subsumptionunification in context, which combines the information in two featurestructures, each of which may be an element in a larger structure
Unification grammars Unification revisited
Two AMRS unification operations
Example (Two AMRS unification operations)
Same-length AMRS unification: two AMRSs of the same length are unified element by element, yielding a single AMRS.

Unification in context: a single element of one AMRS is unified with a single element of another, each within its own context.
Unification grammars Unification revisited
MRS unification
Example (MRS unification)
Let

σ = [cat: d, num: 4]   [cat: n, num: 4, case: nom]   [cat: v, num: 4]

ρ = [cat: d, num: pl]   [cat: n, num: pl, case: [ ]]   [cat: v, num: pl]

Then

σ ⊔ ρ = [cat: d, num: 4 pl]   [cat: n, num: 4, case: nom]   [cat: v, num: 4]
Unification grammars Unification revisited
Unification in context
Example (Unification in context)
Let

σ = [f: 1 a, g: 2[ ]]  [h: 2],    ρ = [f: 3[ ], g: 4 b]  [h: 3]

Unifying the first element in σ with the first element in ρ in the contexts of σ and ρ, we obtain (σ, 1) ⊔ (ρ, 1) = (σ′, ρ′):

σ′ = [f: 1 a, g: 2 b]  [h: 2],    ρ′ = [f: 3 a, g: 4 b]  [h: 3]

Note that both operands of the unification are modified.
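The way unification in context modifies both operands can be sketched with the standard destructive graph-unification technique, where each merged node forwards to a representative. The Node class and the whole encoding are ours, not the book's formal construction:

```python
# Sketch of destructive graph unification (our encoding): each node may
# carry an atom and outgoing arcs; merged nodes forward to a
# representative via `rep`.
class Node:
    def __init__(self, atom=None):
        self.atom, self.arcs, self.rep = atom, {}, None
    def find(self):                       # follow forwarding pointers
        n = self
        while n.rep is not None:
            n = n.rep
        return n

def unify(a, b):
    a, b = a.find(), b.find()
    if a is b:
        return a
    if a.atom is not None and b.atom is not None and a.atom != b.atom:
        raise ValueError("atom clash")    # unification fails
    b.rep = a                             # merge b into a
    if a.atom is None:
        a.atom = b.atom
    for f, t in b.arcs.items():
        if f in a.arcs:
            unify(a.arcs[f], t)           # recursively merge shared features
        else:
            a.arcs[f] = t
    return a

# sigma = [f: 1 a, g: 2[ ]] [h: 2]    rho = [f: 3[ ], g: 4 b] [h: 3]
n1, n2 = Node("a"), Node()                # tags 1 and 2 of sigma
s1 = Node(); s1.arcs = {"f": n1, "g": n2}
s2 = Node(); s2.arcs = {"h": n2}
m1, m2 = Node(), Node("b")                # tags 3 and 4 of rho
r1 = Node(); r1.arcs = {"f": m1, "g": m2}
r2 = Node(); r2.arcs = {"h": m1}

unify(s1, r1)                             # unify the first elements in context
```

After the call, σ's second element [h: 2] sees the atom b and ρ's second element [h: 3] sees the atom a, mirroring the example: the shared tags propagate the new information to both contexts.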
Unification grammars Unification revisited
Unification in context
Theorem
If 〈σ′, ρ′〉 = (σ, i) ⊔ (ρ, j) then σ′i = ρ′j = σi ⊔ ρj .
Unification grammars Unification revisited
Unification in context
Theorem
Let σ, ρ be two AMRSs and i, j be indexes such that i ≤ len(σ) and j ≤ len(ρ). Then 〈σ′, ρ′〉 = (σ, i) ⊔ (ρ, j) iff

σ′ = min~⊑ {σ′′ | σ ~⊑ σ′′ and ρj ⊑ σ′′i}; and

ρ′ = min~⊑ {ρ′′ | ρ ~⊑ ρ′′ and σi ⊑ ρ′′j}.
Unification grammars Rules and grammars
Rules and grammars
Like context free grammars, unification grammars are defined over analphabet
As the grammars that are of most interest to us are grammars of natural languages, and since sentences in natural languages are strings of words rather than strings of arbitrary symbols, we add to the signature an alphabet: a fixed set Words of words (in addition to the fixed sets Feats and Atoms)
Meta-variables wi ,wj etc. are used to refer to elements of Words, wto refer to strings over Words.
Unification grammars Rules and grammars
Rules and grammars
We also adopt here the distinction between phrasal and terminal rules
The former cannot have elements of Words in their bodies; thelatter have only a single word as their body
We refer to the collection of terminal rules as the lexicon: itassociates with terminals, members of Words, (abstract) featurestructures that are their categories
For every word wi ∈ Words the lexicon specifies a finite set ofabstract feature structures L(wi )
If L(wi ) is a singleton then wi is unambiguous, and if it is empty thenwi is not a member of the language defined by the lexicon.
Unification grammars Rules and grammars
Lexicon
Definition (Lexicon)
Given a signature of features Feats and atoms Atoms, and a set Words of terminal symbols, a lexicon is a finite-range function L : Words → 2^AFS(Feats,Atoms).
Unification grammars Rules and grammars
Lexicon
Example (Lexicon)
Following is a lexicon L over a signature consisting ofFeats = {cat,num,case}, Atoms = {d, n, v, sg, pl}, andWords = {two, sheep, sleep}:
L(two) = { [cat: d, num: pl] }

L(sheep) = { [cat: n, num: [ ], case: [ ]] }

L(sleep) = { [cat: v, num: pl] }
Unification grammars Rules and grammars
Lexicon
Example (Lexicon)
As an alternative to the previous lexical entry of sheep above, the grammar writer may prefer the following lexical entry:

L(sheep) = { [cat: n, num: sg, case: [ ]],  [cat: n, num: pl, case: [ ]] }
Unification grammars Rules and grammars
Lexicon
Example (Lexicon, rule-format)
To depict the lexicon specification above, we usually use the followingnotation:
sheep → [cat: n, num: sg, case: [ ]]

sheep → [cat: n, num: pl, case: [ ]]
Unification grammars Rules and grammars
Lexicon
When a string of words w is given, it is possible to construct anAMRS σw for the lexical entries of the words in w , such that no twoelements of σw share paths
Such an AMRS is simply the concatenation of the lexical entries ofthe words in w
In general, there may be several such AMRSs, as each word in w canhave multiple elements in its category
The set of such AMRSs is the pre-terminals of w
Unification grammars Rules and grammars
Pre-terminals
Definition (Pre-terminals)
Let w = w1 . . . wn ∈ Words+. PTw(j, k) is defined iff 1 ≤ j, k ≤ n, in which case it is the set of AMRSs {〈Aj · Aj+1 · · · Ak〉 | Ai ∈ L(wi) for j ≤ i ≤ k}. If j > k (i.e., the spanned substring is ε), then PTw(j, k) = {λ}. The subscript w is omitted when it is clear from the context.
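Computing PTw(j, k) is simply a Cartesian product of the words' lexical entries. In the sketch below, lexical categories are abbreviated to plain strings for readability (real entries are feature structures); the lexicon is the ambiguous-sheep lexicon of the examples:

```python
from itertools import product

# Direct rendering of the definition (our simplified encoding): each word
# maps to a list of category labels standing in for feature structures.
def PT(lexicon, w, j, k):
    if j > k:                                 # empty span: PT(j, k) = {lambda}
        return [()]
    return list(product(*(lexicon[wi] for wi in w[j - 1:k])))

lexicon = {"two": ["d.pl"], "sheep": ["n.sg", "n.pl"], "sleep": ["v.pl"]}
w = ["two", "sheep", "sleep"]
pts = PT(lexicon, w, 1, 3)
```

Since sheep is two-ways ambiguous, PT(1, 3) has exactly two members, one per reading.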
Unification grammars Rules and grammars
Pre-terminals
Example (Pre-terminals)
Consider the string of words w = two sheep sleep and the lexicon of the previous example. There is exactly one element in PTw(1, 3); this is the AMRS

[cat: d, num: pl]   [cat: n, num: [ ], case: [ ]]   [cat: v, num: pl]

Notice that there is no sharing of variables among different feature structures in this AMRS. As AMRSs are depicted using multi-AVMs here, the variables in the above multi-AVM are chosen such that unintended reentrancies are avoided.
Unification grammars Rules and grammars
Pre-terminals
Example (Pre-terminals)
Now assume that the word sheep is represented as an ambiguous word: its category contains two feature structures, namely

L(sheep) = { [cat: n, num: sg, case: [ ]],  [cat: n, num: pl, case: [ ]] }

Then PTw(1, 3) has two members:

[cat: d, num: pl]   [cat: n, num: sg, case: [ ]]   [cat: v, num: pl]

[cat: d, num: pl]   [cat: n, num: pl, case: [ ]]   [cat: v, num: pl]
Unification grammars Rules and grammars
Rules
Definition (Rules)
A (phrasal) rule is an AMRS of length n > 0 with a distinguished firstelement. If σ is a rule then σ1 is its head and σ2..n is its body. We adopta convention of depicting rules with an arrow (→) separating the headfrom the body.
Since a rule is simply an AMRS, there can be reentrancies among itselements: both between the head and (some element of) the bodyand among elements in its body.
Notice that the definition supports ε-rules, i.e., rules with empty bodies
Unification grammars Rules and grammars
Rules
Example (Rules as AMRSs)
As every AMRS can be interpreted as a rule, so can the following:
[cat: s] → [cat: np, agr: 4]   [cat: v, agr: 4]
Unification grammars Rules and grammars
Rules
Example (Rules as AMRSs)
Rules can also propagate information between the mother and any of thedaughters using reentrancies between paths originating in the head of therule and paths originating from one of the body elements, as below.
[cat: s, subj: 1] → 1[cat: np, agr: 2]   [cat: v, agr: 2]
Unification grammars Rules and grammars
Rules
The rules of the example employ feature structures that include thefeature cat, encoding the major part-of-speech category of phrases
While this is useful and natural, it is by no means obligatory
Unification rules can encode such information in other ways (e.g., viaa different feature, or as a collection of features); or they may notencode it at all
In the general case, a unification rule is not required to have acontext-free skeleton, a feature whose values constitute a context-freebackbone that drives the derivation
Some unification-based grammar theories do indeed maintain acontext-free skeleton (LFG is a notable example), while others (likeHPSG) do not
Unification grammars Rules and grammars
Rules
We introduce a shorthand notation in the presentation of grammars:
When two rules have the same head, we list the head only once andseparate the bodies of the different rules with ‘|’ (following theconvention of context-free grammars)
Note, however, that the scope of variables is still limited to a singlerule, so that multiple occurrences of the same variable within thebodies of two different rules are unrelated
Additionally, we may use the same variable (e.g., 4 ) in several rules
It should be clear by now that these multiple uses are unrelated toeach other, as the scope of variables is limited to a single rule
Unification grammars Rules and grammars
Unification grammars
Definition (Unification grammars)
A unification grammar (UG) G = (L,R,As) over a signature Atoms ofatoms and Feats of features consists of a lexicon L, a finite set of rulesR and a start symbol As that is an abstract feature structure.
Unification grammars Rules and grammars
Unification grammars
Example (Gu, a unification grammar)
[cat: s] → [cat: np, num: 4, case: nom]   [cat: v, num: 4]

[cat: np, num: 4, case: 2] → [cat: d, num: 4]   [cat: n, num: 4, case: 2]

[cat: np, num: 4, case: 2] → [cat: pron, num: 4, case: 2]
Unification grammars Rules and grammars
Unification grammars
Example (Gu, a unification grammar)
sleep → [cat: v, num: pl]

sleeps → [cat: v, num: sg]

lamb → [cat: n, num: sg, case: [ ]]

lambs → [cat: n, num: pl, case: [ ]]

she → [cat: pron, num: sg, case: nom]

her → [cat: pron, num: sg, case: acc]

a → [cat: d, num: sg]

two → [cat: d, num: pl]
Unification grammars Derivations
Derivations
The language generated by UGs is defined in a parallel way to thedefinition of languages generated by context-free grammars:
first, we define derivations, analogously to the context-free derivations
The reflexive transitive closure of the derivation relation is the basisfor the definition of languages
For the following discussion fix a particular grammar G = (L,R,As)
Unification grammars Derivations
Derivations
Derivation is a relation that holds between two forms, σ1 and σ2,each of which is an AMRS
To define it formally, two concepts have to be taken care of:
An element of σ1 has to be matched against the head of somegrammar rule, ρ
The body of ρ must replace the selected element in σ1, thus producingσ2
Matching involves unification, and unification must be computed incontext: that is, when the selected element of σ1 is unified with thehead of ρ, other elements in σ1 or in ρ may be affected due toreentrancy
This possibility must be taken care of when replacing the selectedelement with the body of ρ
Unification grammars Derivations
Derivations
Definition (Derivation)
An AMRS σ1 of length k derives an AMRS σ2 (denoted σ1 ⇒ σ2) iff for some j ≤ k and some rule ρ ∈ R of length n,

(σ1, j) ⊔ (ρ, 1) = (σ′1, ρ′), and

σ2 is the replacement of the j-th element of σ′1 with the body of ρ′ (details suppressed)

The reflexive transitive closure of ‘⇒’ is ‘∗⇒’. We write σ ⇒^l ρ when σ derives ρ in l steps.
Unification grammars Derivations
Derivation step
Example (Derivation step)
Suppose that
σ1 = [cat: np, num: 1, case: nom]   [cat: v, num: 1]

is a (sentential) form and that

ρ = [cat: np, num: 2, case: 3] → [cat: d, num: 2]   [cat: n, num: 2, case: 3]

is a rule. Assume further that the selected element j in σ1 is the first one. Applying the rule ρ to the form σ1, it is possible to construct a derivation σ1 ⇒ σ2.
Unification grammars Derivations
Derivation step
Example (Derivation step)
First, compute (σ1, 1) ⊔ (ρ, 1) = (σ′1, ρ′):

σ′1 = [cat: np, num: 1, case: nom]   [cat: v, num: 1]

ρ′ = [cat: np, num: 2, case: 3 nom] → [cat: d, num: 2]   [cat: n, num: 2, case: 3]
Unification grammars Derivations
Derivation step
Example (Derivation step)
Now, the first element of σ′1 is replaced by the body of ρ′. This operation results in a new AMRS, σ2, of length 3: the first two elements are the body of ρ′, and the last element is the remainder of σ′1, after its first element has been eliminated; that is, the last element of σ′1. A simple replacement would have resulted in the following AMRS:

[cat: d, num: 2]   [cat: n, num: 2, case: 3 nom]   [cat: v, num: 1]
Obviously, this is not the expected result!
Unification grammars Derivations
Derivation step
Example (Derivation step)
Since the path 〈1, num〉 in σ1 is reentrant with 〈2, num〉 (indicated by the tag 1), and since the path 〈1, num〉 in the rule ρ is reentrant with the paths 〈2, num〉 and 〈3, num〉 (the tag 2), one would expect that the sharing between the num values of the noun phrase and the verb phrase in σ1 would manifest itself as a sharing between this feature's values of the determiner, the noun and the verb phrase in σ2.

This is what the last clause in the definition of derivation guarantees. The result is:

σ2 = [cat: d, num: 4]   [cat: n, num: 4, case: 5 nom]   [cat: v, num: 4]
Unification grammars Derivations
Derivation
Example (Derivation)
Consider the grammar Gu. A derivation with Gu can start with a form of length 1, consisting of

σ1 = [cat: s]

The single element of this AMRS unifies with the head of the first rule in the grammar, trivially. Substitution is again trivial, and the next form in the derivation is the body of the first rule:

σ2 = [cat: np, num: 1, case: nom]   [cat: v, num: 1]
Unification grammars Derivations
Derivation
Example (Derivation)
Since the rule ρ of that example is indeed in Gu, a form derivable from σ2 is:

σ3 = [cat: d, num: 4]   [cat: n, num: 4, case: nom]   [cat: v, num: 4]

Thus, we obtain σ1 ⇒ σ2 ⇒ σ3, and hence σ1 ∗⇒ σ3.
Unification grammars Derivations
Derivation
Example (Derivation)
Consider the form σ3 and one of the AMRSs in PTw(1, 3):

σ3 = [cat: d, num: 4]   [cat: n, num: 4, case: nom]   [cat: v, num: 4]

σ = [cat: d, num: pl]   [cat: n, num: pl, case: [ ]]   [cat: v, num: pl]

The former contains information that is accumulated during derivations; the latter reflects information from the lexical entries of the words in w.

σ3 ⊔ σ = [cat: d, num: 4 pl]   [cat: n, num: 4, case: nom]   [cat: v, num: 4]
Unification grammars Derivations
Language
Definition (Language)
The language of a unification grammar G is L(G) = {w ∈ Words∗ | w = w1 · · · wn and there exist an AMRS σ such that As ∗⇒ σ and an AMRS ρ ∈ PTw(1, n) such that σ ⊔ ρ is defined}.
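The condition "σ ⊔ ρ is defined" can be illustrated with a toy unifier for flat AVMs only, where values are atoms or variables (strings starting with "?"). This drastic simplification of AMRS unification is ours; the point is that one substitution is shared across all elements, so tags shared across elements stay shared:

```python
# Toy definedness check (our simplification): flat AVMs are dicts whose
# values are atoms or variables; one substitution covers the whole AMRS.
def unify_flat(a, b, subst):
    def deref(v):
        while isinstance(v, str) and v.startswith("?") and v in subst:
            v = subst[v]
        return v
    for f in set(a) & set(b):                 # shared features must agree
        x, y = deref(a[f]), deref(b[f])
        if x == y:
            continue
        if isinstance(x, str) and x.startswith("?"):
            subst[x] = y
        elif isinstance(y, str) and y.startswith("?"):
            subst[y] = x
        else:
            return None                       # atom clash
    return subst

def unifiable(seq1, seq2):
    """Elementwise unification under a single shared substitution."""
    subst = {}
    return all(unify_flat(a, b, subst) is not None
               for a, b in zip(seq1, seq2))

# sigma3 as derived from Gu, with the shared num tag written "?4"
sigma3 = [{"cat": "d", "num": "?4"},
          {"cat": "n", "num": "?4", "case": "nom"},
          {"cat": "v", "num": "?4"}]
# pre-terminals of "two lambs sleep" ...
pt = [{"cat": "d", "num": "pl"},
      {"cat": "n", "num": "pl", "case": "?c"},
      {"cat": "v", "num": "pl"}]
# ... and of the ill-formed "two lambs sleeps" (number clash via ?4)
bad = [{"cat": "d", "num": "pl"},
       {"cat": "n", "num": "pl", "case": "?c"},
       {"cat": "v", "num": "sg"}]
```

Because "?4" is bound to pl at the determiner, the verb of the bad string cannot carry sg: the clash is detected through the shared tag, exactly the agreement effect the grammar is designed to enforce.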
Unification grammars Derivations
Language
Example (Language)
Consider the grammar Gu and the string two lambs sleep. The form σ3 is derivable from the start symbol of the grammar. This form is unifiable with one of the members of PTw(1, 3). Hence the string two lambs sleep is a member of L(Gu).
Unification grammars Derivation trees
Derivation trees
In order to depict derivations graphically we extend the notion ofderivation trees, defined for context-free grammars, to unificationgrammars
Informally, we would like a tree to be a structure whose elements arefeature structures
However, care must be taken when the scope of reentrancies in a treeis concerned: in order for information to be shared among all nodes ina tree, this scope is extended to the entire tree
Unification grammars Derivation trees
Derivation trees
Rather than define a new mathematical entity, corresponding to atree whose nodes are feature structures with the scope of reentranciesextended to the entire structure, we reuse in the following definitionthe concept of multi-rooted structures (more precisely, AMRSs)
In order to impose a tree structure on AMRSs we simply pair themwith a tree whose nodes are integers, such that each node in the treeserves as an index into the AMRS
In this way, all the existing definitions which refer to AMRSs can benaturally used when reasoning about trees
Unification grammars Derivation trees
Derivation trees
Definition (Unification trees)
Given a signature S = 〈Atoms,Feats〉, a unification tree is an orderedtree whose nodes are AVMs over S, where the scope of reentrancies isextended to the entire tree. A subtree is a particular node of the tree,along with all its descendants (and the edges connecting them). Formally,a unification tree is a pair 〈σ, τ 〉, where σ is an AMRS over S, say oflength l for some l ∈ N, and τ is a tree over the nodes {1, 2, . . . , l}.
Unification grammars Derivation trees
Derivation trees
Example (Unification tree)
Following is a unification tree, depicted as a tree of AVMs:
[cat: s]
  ├─ [cat: np, num: 4, case: 2 nom]
  │    ├─ [cat: d, num: 4]
  │    └─ [cat: n, num: 4, case: 2]
  └─ [cat: v, num: 4]
Unification grammars Derivation trees
Derivation trees
Example (Unification tree)
Formally, this tree is a pair 〈σ, τ〉, where τ is a tree over {1, 2, 3, 4, 5} and σ is an AMRS of length 5:

τ: node 1 has children 2 and 5; node 2 has children 3 and 4 (the frontier is 3, 4, 5)

σ = [cat: s]   [cat: np, num: 4, case: 2 nom]   [cat: d, num: 4]   [cat: n, num: 4, case: 2]   [cat: v, num: 4]
Unification grammars Derivation trees
Unification derivation trees
Definition (Unification derivation trees)
A unification derivation tree induced by a unification grammar G = (L, R, As) is a unification tree defined recursively as follows:
〈As , τ〉 is a unification derivation tree, where τ is the tree consistingof the single node {1};
if 〈σ, τ 〉 is a unification derivation tree and 〈σ′, τ ′〉 extends 〈σ, τ 〉,then 〈σ′, τ ′〉 is also a unification derivation tree.
Unification grammars Derivation trees
Unification derivation trees
Example (Unification derivation trees)
A unification derivation tree with the grammar Gu can be built incrementally as follows. The start symbol of the grammar is [cat: s]; therefore, an initial derivation tree would be 〈σ1, {1}〉, the start symbol itself.

Then, by using the first grammar rule, the following tree, 〈σ2, τ2〉, can be obtained:

[cat: s]
  ├─ [cat: np, num: 4, case: nom]
  └─ [cat: v, num: 4]
c©Shuly Wintner (University of Haifa) Unification Grammars c©Copyrighted material 335 / 420
Unification grammars Derivation trees
Unification derivation trees
Example (Unification derivation trees)
Next, by applying the second grammar rule to the leftmost node on the frontier of 〈σ2, τ2〉, the following tree, 〈σ3, τ3〉, is obtained:

[cat: s, num: ④]
├── [cat: np, num: ④, case: ② nom]
│   ├── [cat: d, num: ④]
│   └── [cat: n, num: ④, case: ②]
└── [cat: v, num: ④]
Unification grammars Derivation trees
Complete derivation trees
As in the context-free case, the frontier of unification derivation trees does not have to correspond to any lexical item

Of course, in order for trees to represent complete derivations, we are particularly interested in such trees whose frontier is unifiable with a sequence of pre-terminals
Unification grammars Derivation trees
Complete derivation trees
Definition (Complete derivation trees)
A unification derivation tree 〈σ, τ〉 is complete if the frontier of τ is j1, . . . , jn and there exists a word w ∈ Words∗ of length n and an AMRS ρ ∈ PTw(1, n) such that ρ ⊔ 〈σj1 , . . . , σjn〉 is defined.

Note that there may be more than one qualifying AMRS in PTw(1, n); the definition only requires one. Of course, different AMRSs in PTw(1, n) will correspond to different interpretations of the input string (resulting from ambiguous lexical entries of the words)
Unification grammars Derivation trees
Complete derivation trees
Example (Complete derivation trees)
Consider the grammar Gu and the string w = two lambs sleep. The tree of the previous example is complete. Its frontier is unifiable with the following AMRS:

〈 [cat: d, num: pl], [cat: n, num: pl, case: ②], [cat: v, num: pl] 〉 ∈ PTw(1, 3)
Unification grammars Derivation trees
Lexicalized derivation trees
It is sometimes useful to depict a tree whose leaves already reflect the additional information obtained by actually unifying the frontier of a complete derivation tree with PTw

We call such trees lexicalized

It is easy to see that for every lexicalized tree 〈σ, τ〉 there exists a complete derivation tree 〈σ′, τ′〉 such that τ′ = τ and σ′ ⊑ σ
Unification grammars Derivation trees
Lexicalized derivation trees
Definition (Lexicalized derivation trees)
Let 〈σ, τ〉 be a complete derivation tree induced by a unification grammar G = (R, As) and let w, ρ be as in the definition of complete trees. A lexicalized derivation tree induced by G on w is the unification tree 〈σ′, τ〉, where σ′ is obtained from σ by unifying the frontier of σ with ρ.
Unification grammars Derivation trees
Lexicalized derivation trees
Example (Lexicalized derivation tree)
A tree induced by the grammar Gu on the string two lambs sleep:

[cat: s]
├── [cat: np, num: ④, case: ② nom]
│   ├── [cat: d, num: ④ pl]            two
│   └── [cat: n, num: ④, case: ② nom]  lambs
└── [cat: v, num: ④]                    sleep
Linguistic applications Introduction
Linguistic applications
We now put the theory to use, by accounting for several of the linguistic phenomena that motivated UGs

Unification grammars facilitate the expression of linguistic generalizations

This is mediated through two main mechanisms:

The notion of grammatical category is expressed via feature structures, thereby allowing for complex categories as first-class citizens of the grammatical theory

Reentrancy provides a concise machinery for expressing “movement”, or more generally, relations that hold at a deeper level than a phrase-structure tree
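As a rough illustration of reentrancy (our own sketch, not the book's formal definitions), a tag can be modeled as one shared, mutable cell reachable from several features; constraining it in one place constrains it everywhere:

```python
class Tag:
    """A reentrancy tag: one shared cell, initially unconstrained ([ ])."""
    def __init__(self):
        self.value = None   # None plays the role of the empty AVM [ ]

def deref(x):
    """Follow bound tags to the value they currently stand for."""
    while isinstance(x, Tag) and x.value is not None:
        x = x.value
    return x

# [cat: np, num: [4]]  [cat: vp, num: [4]] -- one tag, two occurrences
n4 = Tag()
np = {'cat': 'np', 'num': n4}
vp = {'cat': 'vp', 'num': n4}

# binding the tag once (say, after unifying the VP with a plural verb)
# constrains every occurrence: the NP's num is now pl as well
n4.value = 'pl'
print(deref(np['num']))   # pl
```

The point of the shared cell is exactly the “deeper than the tree” relation: no path in the phrase structure connects the two occurrences, yet they denote one value.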
Linguistic applications Introduction
Phenomena
Agreement
Case control
Subcategorization
Long-distance dependencies
Control
Coordination
Linguistic applications A basic grammar
A basic grammar
Example (A context-free grammar G0:)
S → NP VP
VP → V | V NP
NP → D N | Pron | PropN
D → the, a, two, every, . . .
N → sheep, lamb, lambs, shepherd, water . . .
V → sleep, sleeps, love, loves, feed, feeds, herd, herds, . . .
Pron → I, me, you, he, him, she, her, it, we, us, they, them
PropN → Rachel, Jacob, . . .
Linguistic applications A basic grammar
Every CFG is a UG
Observe that any context-free grammar is a special case of a unification grammar

The non-terminal symbols of the CFG can be modeled by atoms

A more general view of G0 as a unification grammar can encode the fact that the non-terminal symbols represent grammatical categories

This can be done using a single feature, e.g., cat, whose values are the non-terminals of G0
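As a sketch of this lifting (the dict encoding and the `lift` helper are ours, not the book's), every non-terminal X becomes the one-feature AVM [cat: x]:

```python
# the rules of G0, written as (lhs, rhs) pairs over non-terminal symbols
g0 = [('S', ['NP', 'VP']),
      ('VP', ['V']),
      ('VP', ['V', 'NP']),
      ('NP', ['D', 'N'])]

def lift(lhs, rhs):
    """Encode one CFG rule as a unification rule over {cat: ...} AVMs."""
    wrap = lambda symbol: {'cat': symbol.lower()}
    return wrap(lhs), [wrap(symbol) for symbol in rhs]

lifted = [lift(lhs, rhs) for lhs, rhs in g0]
print(lifted[0])   # ({'cat': 's'}, [{'cat': 'np'}, {'cat': 'vp'}])
```

Since the AVMs carry only the atomic cat value, unification here degenerates to symbol equality, which is exactly why the CFG is a special case.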
Linguistic applications A basic grammar
G ′0, a basic unification grammar
Example (G ′0, a basic unification grammar)
Following is a unification grammar, G′0, over a signature 〈Feats, Atoms〉 where Feats = {cat} and Atoms = {s, np, vp, v, d, n, pron, propn}:

1    [cat: s] → [cat: np] [cat: vp]
2    [cat: vp] → [cat: v]
3    [cat: vp] → [cat: v] [cat: np]
4    [cat: np] → [cat: d] [cat: n]
5, 6 [cat: np] → [cat: pron] | [cat: propn]
Linguistic applications A basic grammar
G ′0, a basic unification grammar
Example (G ′0, a basic unification grammar)
sleep → [cat: v]        give → [cat: v]        love → [cat: v]
tell → [cat: v]         feed → [cat: v]        feeds → [cat: v]
lamb → [cat: n]         lambs → [cat: n]
she → [cat: pron]       her → [cat: pron]
they → [cat: pron]      them → [cat: pron]
Rachel → [cat: propn]   Jacob → [cat: propn]
a → [cat: d]            two → [cat: d]
Linguistic applications A basic grammar
Derivation trees induced by G ′0
Example (Derivation trees induced by G ′0)
The grammar G′0 induces the following tree on the string the sheep love her:

[cat: s]
├── [cat: np]
│   ├── [cat: d]        the
│   └── [cat: n]        sheep
└── [cat: vp]
    ├── [cat: v]        love
    └── [cat: np]
        └── [cat: pron] her
Linguistic applications A basic grammar
Derivation trees induced by G ′0
Example (Derivation trees induced by G ′0)
Not surprisingly, an isomorphic derivation tree is induced by the grammar on the ungrammatical string ∗the lambs sleeps they:

[cat: s]
├── [cat: np]
│   ├── [cat: d]        the
│   └── [cat: n]        lambs
└── [cat: vp]
    ├── [cat: v]        sleeps
    └── [cat: np]
        └── [cat: pron] they
Linguistic applications Imposing agreement
Gagr, accounting for agreement on number
Example (Gagr, accounting for agreement on number)
1    [cat: s] → [cat: np, num: ④] [cat: vp, num: ④]
2    [cat: vp, num: ④] → [cat: v, num: ④]
3    [cat: vp, num: ④] → [cat: v, num: ④] [cat: np]
4    [cat: np, num: ④] → [cat: d, num: ④] [cat: n, num: ④]
5, 6 [cat: np, num: ④] → [cat: pron, num: ④] | [cat: propn, num: ④]
Linguistic applications Imposing agreement
Gagr, accounting for agreement on number
Example (Gagr, accounting for agreement on number)
sleep → [cat: v, num: pl]    give → [cat: v, num: pl]    love → [cat: v, num: pl]
tell → [cat: v, num: pl]     feed → [cat: v, num: pl]    feeds → [cat: v, num: sg]
Linguistic applications Imposing agreement
Gagr, accounting for agreement on number
Example (Gagr, accounting for agreement on number)
lamb → [cat: n, num: sg]         lambs → [cat: n, num: pl]
she → [cat: pron, num: sg]       her → [cat: pron, num: sg]
they → [cat: pron, num: pl]      them → [cat: pron, num: pl]
Rachel → [cat: propn, num: sg]   Jacob → [cat: propn, num: sg]
a → [cat: d, num: sg]            two → [cat: d, num: pl]
Linguistic applications Imposing agreement
Gagr generates a CF language
Example (A context-free grammar G1)
S → Ssg | Spl
Ssg → NPsg VPsg Spl → NPpl VPpl
NPsg → Dsg Nsg NPpl → Dpl Npl
NPsg → Pronsg | PropNsg NPpl → Pronpl | PropNpl
VPsg → Vsg VPpl → Vpl
VPsg → Vsg NPsg | Vsg NPpl VPpl → Vpl NPsg | Vpl NPpl
Dsg → a Dpl → two
Nsg → lamb | sheep | · · · Npl → lambs | sheep | · · ·
Pronsg → she | her | · · · PropNsg → Rachel | Jacob | · · ·
Vsg → sleeps | · · · Vpl → sleep | · · ·
Linguistic applications Imposing agreement
Gagr generates a CF language
While Gagr is a unification grammar, the language it generates is context-free

But the equivalent CFG is inferior to the unification grammar:

The linguistic description is distorted: information regarding number, which is determined by the words themselves, is encoded in G1 by the way they are derived (in other words, G1 accounts for lexical knowledge by means of phrase-structure rules)

Several linguistic generalizations are lost: the context-free grammar induces two different trees on a lamb sleeps and two lambs sleep
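To make the contrast concrete, here is a toy, non-reentrant unification over flat AVMs modeled as Python dicts (all names are ours, not the book's): with it, the single NP rule of Gagr accepts a lamb and two lambs but rejects ∗a lambs, with no duplicated rules.

```python
def unify(f, g):
    """Toy unification of flat AVMs: merge dicts, fail (None) on a clash."""
    out = dict(f)
    for feat, val in g.items():
        if feat in out and out[feat] != val:
            return None
        out[feat] = val
    return out

def np_rule(d, n):
    """Rule 4 of Gagr: NP -> D N, sharing num among D, N and the mother."""
    shared = unify({'num': d['num']}, {'num': n['num']})
    if shared is None:
        return None                      # agreement violation
    return unify({'cat': 'np'}, shared)

a     = {'cat': 'd', 'num': 'sg'}
two   = {'cat': 'd', 'num': 'pl'}
lamb  = {'cat': 'n', 'num': 'sg'}
lambs = {'cat': 'n', 'num': 'pl'}

print(np_rule(a, lamb))    # {'cat': 'np', 'num': 'sg'}
print(np_rule(a, lambs))   # None -- *a lambs is blocked by unification
```

The failure case is exactly where G1 needs disjoint sg/pl rule families.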
Linguistic applications Imposing agreement
UG and linguistic generalizations
One natural notion of ‘linguistic generalization’ emerges: the ability to formulate a linguistic restriction by means of a single rule, instead of by a collection of “similar” rules

In this sense, Gagr captures the agreement generalization, while G1 does not

Multiplying out all the possible values of a particular feature, and converting a unification grammar to an equivalent context-free grammar in this way, is not always possible
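The cost of multiplying out can be seen mechanically (a hypothetical sketch; the two-feature inventory is illustrative, not the book's): the number of specialized CFG rules grows as the product of the value-set sizes.

```python
from itertools import product

# value sets for two toy features
features = {'num': ['sg', 'pl'], 'case': ['nom', 'acc']}

# the single unification rule NP -> D N, multiplied out into CFG rules:
# one specialized rule per combination of feature values
cfg_rules = [(f'NP_{n}_{c}', [f'D_{n}', f'N_{n}_{c}'])
             for n, c in product(features['num'], features['case'])]

print(len(cfg_rules))   # 4 rules instead of 1; growth is multiplicative
```

With unboundedly many values (e.g. subcategorization lists below), this product is infinite, which is one reason the conversion is not always possible.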
Linguistic applications Imposing case control
Imposing case control
Add a feature, case, to the signature; it is added to the feature structures associated with nominal categories: nouns, pronouns, proper names and noun phrases

The lexical entries of pronouns must specify their case, which is overt and explicit: we use the value nom for nominative case, whereas acc stands for accusative

As for proper names and nouns, their lexical entries are simply underspecified with respect to case

Use the values of the case feature in the grammar to impose constraints of case control
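A hedged sketch of the intended behaviour with a toy dict-based unification (function and lexicon names are ours): pronouns fix their case, nouns leave it underspecified, and the S rule imposes nom on the subject.

```python
def unify(f, g):
    """Toy unification of flat AVMs: merge dicts, fail (None) on a clash."""
    out = dict(f)
    for feat, val in g.items():
        if feat in out and out[feat] != val:
            return None
        out[feat] = val
    return out

her  = {'cat': 'pron', 'num': 'sg', 'case': 'acc'}
she  = {'cat': 'pron', 'num': 'sg', 'case': 'nom'}
lamb = {'cat': 'n',    'num': 'sg'}          # case underspecified

subject = {'case': 'nom'}   # constraint imposed on the subject NP by rule 1

print(unify(she,  subject))  # succeeds: she is a legitimate subject
print(unify(her,  subject))  # None: *her sleeps is ruled out
print(unify(lamb, subject))  # succeeds: underspecified case is instantiated
```

Underspecification does the work here: lamb accepts either case value, so no extra lexical entries are needed.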
Linguistic applications Imposing case control
Gcase, accounting for case control
Example (Gcase, accounting for case control)
1    [cat: s] → [cat: np, num: ④, case: nom] [cat: vp, num: ④]
2    [cat: vp, num: ④] → [cat: v, num: ④]
3    [cat: vp, num: ④] → [cat: v, num: ④] [cat: np, num: ③, case: acc]
4    [cat: np, num: ④, case: ②] → [cat: d, num: ④] [cat: n, num: ④, case: ②]
5, 6 [cat: np, num: ④, case: ②] → [cat: pron, num: ④, case: ②] | [cat: propn, num: ④, case: ②]
Linguistic applications Imposing case control
Gcase, accounting for case control
Example (Gcase, accounting for case control)
sleep → [cat: v, num: pl]    sleeps → [cat: v, num: sg]
feed → [cat: v, num: pl]     feeds → [cat: v, num: sg]
Linguistic applications Imposing case control
Gcase, accounting for case control
Example (Gcase, accounting for case control)
lamb → [cat: n, num: sg, case: [ ]]      lambs → [cat: n, num: pl, case: [ ]]
she → [cat: pron, num: sg, case: nom]    her → [cat: pron, num: sg, case: acc]
they → [cat: pron, num: pl, case: nom]   them → [cat: pron, num: pl, case: acc]
Rachel → [cat: propn, num: sg]           Jacob → [cat: propn, num: sg]
a → [cat: d, num: sg]                    two → [cat: d, num: pl]
Linguistic applications Imposing case control
Derivation tree with case control
Example (Derivation tree with case control)
[cat: s]
├── [cat: np, num: ④, case: ③ nom]
│   ├── [cat: d, num: ④ pl]                  the
│   └── [cat: n, num: ④, case: ③]            shepherds
└── [cat: vp, num: ④]
    ├── [cat: v, num: ④]                     feed
    └── [cat: np, num: ②, case: ⑤ acc]
        └── [cat: pron, num: ② pl, case: ⑤]  them
Linguistic applications Imposing case control
Derivation tree with case control
Example (Derivation tree with case control)
This tree represents a derivation which starts with the initial symbol, [cat: s], and ends with the multi-AVM σ′, where

σ′ =  the: [num: ④]   shepherds: [num: ④, case: nom]   feed: [num: ④]   them: [num: ②, case: acc]

This multi-AVM is unifiable with (but not identical to!) the sequence of lexical entries of the words in the sentence, which is:

σ =  the: [num: [ ]]   shepherds: [num: pl, case: [ ]]   feed: [num: pl]   them: [num: pl, case: acc]

Hence the sentence is in the language generated by the grammar.
Linguistic applications Imposing subcategorization constraints
Imposing subcategorization constraints
A naïve solution to the subcategorization problem:

intransitive verbs (with no object): sleep, walk, run, laugh, . . .

transitive verbs (with a nominal object): feed, love, eat, . . .

Lexical entries of verbs are extended such that their subcategorization is specified

The rules that involve verbs and verb phrases are extended
Linguistic applications Imposing subcategorization constraints
Gsubcat, a naïve account of verb subcategorization

Example (Gsubcat, a naïve account of verb subcategorization)

1    [cat: s] → [cat: np, num: ④, case: nom] [cat: vp, num: ④]
2    [cat: vp, num: ④] → [cat: v, num: ④, subcat: intrans]
3    [cat: vp, num: ④] → [cat: v, num: ④, subcat: trans] [cat: np, num: ④, case: acc]
4    [cat: np, num: ④, case: ②] → [cat: d, num: ④] [cat: n, num: ④, case: ②]
5, 6 [cat: np, num: ④, case: ②] → [cat: pron, num: ④, case: ②] | [cat: propn, num: ④, case: ②]
Linguistic applications Imposing subcategorization constraints
Gsubcat, a naïve account of verb subcategorization

Example (Gsubcat, a naïve account of verb subcategorization)

sleep → [cat: v, num: pl, subcat: intrans]    sleeps → [cat: v, num: sg, subcat: intrans]
feed → [cat: v, num: pl, subcat: trans]       feeds → [cat: v, num: sg, subcat: trans]
Linguistic applications Imposing subcategorization constraints
Gsubcat, a naïve account of verb subcategorization

Example (Gsubcat, a naïve account of verb subcategorization)

lamb → [cat: n, num: sg, case: [ ]]      lambs → [cat: n, num: pl, case: [ ]]
she → [cat: pron, num: sg, case: nom]    her → [cat: pron, num: sg, case: acc]
they → [cat: pron, num: pl, case: nom]   them → [cat: pron, num: pl, case: acc]
Rachel → [cat: propn, num: sg]           Jacob → [cat: propn, num: sg]
a → [cat: d, num: sg]                    two → [cat: d, num: pl]
Linguistic applications Subcategorization lists
Subcategorization lists
The previous account of subcategorization is naïve

Different verbs subcategorize for different kinds of complements: noun phrases, infinitival verb phrases, sentences etc.

Some verbs require more than one complement

The idea is to store in the lexical entry of each verb not an atomic feature indicating its subcategory, but rather a list of categories, indicating the appropriate complements of the verb
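Such lists can be pictured with the features first and rest, ending in the atom elist; the helper below (our illustrative sketch, not the book's notation) builds that encoding:

```python
def subcat(*cats):
    """Build a first/rest-encoded subcategorization list of categories."""
    lst = 'elist'                      # the empty list atom
    for cat in reversed(cats):
        lst = {'first': {'cat': cat}, 'rest': lst}
    return lst

sleep = {'cat': 'v', 'subcat': subcat()}             # intransitive
love  = {'cat': 'v', 'subcat': subcat('np')}         # transitive
give  = {'cat': 'v', 'subcat': subcat('np', 'np')}   # ditransitive
tell  = {'cat': 'v', 'subcat': subcat('np', 's')}    # NP + sentential object

print(give['subcat'])
# {'first': {'cat': 'np'}, 'rest': {'first': {'cat': 'np'}, 'rest': 'elist'}}
```

Because the lists are just feature structures, no change to the formalism is needed to accommodate them.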
Linguistic applications Subcategorization lists
Lexical entries of some verbs using subcategorization lists
Example (Lexical entries of some verbs using subcategorization lists)
sleep → [cat: v, subcat: elist, num: pl]

love → [cat: v, subcat: 〈[cat: np]〉, num: pl]

give → [cat: v, subcat: 〈[cat: np], [cat: np]〉, num: pl]

tell → [cat: v, subcat: 〈[cat: np], [cat: s]〉, num: pl]
Linguistic applications Subcategorization lists
Subcategorization lists
The grammar rules must be modified to reflect the additional wealth of information in the lexical entries

Due to this wealth there can be a dramatic reduction in the number of grammar rules necessary for handling verbs
Linguistic applications Subcategorization lists
VP rules using subcategorization lists
Example (VP rules using subcategorization lists)
[cat: s] → [cat: np] [cat: v, subcat: elist]

[cat: v, subcat: ②] → [cat: v, subcat: [first: [cat: ④], rest: ②]] [cat: ④]
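A toy rendering of the second rule (our sketch over plain dicts, not the book's machinery): each application matches and consumes the first element of subcat, so a single rule covers every valency.

```python
def consume_complement(v, complement_cat):
    """One application of the VP rule: match the next subcat item."""
    sc = v['subcat']
    if sc == 'elist':
        return None                       # nothing left to combine with
    if sc['first']['cat'] != complement_cat:
        return None                       # wrong category of complement
    return {'cat': 'v', 'subcat': sc['rest']}

# gave, with a two-element first/rest subcat list
gave = {'cat': 'v',
        'subcat': {'first': {'cat': 'np'},
                   'rest': {'first': {'cat': 'np'}, 'rest': 'elist'}}}

v1 = consume_complement(gave, 'np')   # gave the sheep
v2 = consume_complement(v1, 'np')     # gave the sheep water
print(v2)   # {'cat': 'v', 'subcat': 'elist'} -- ready for the S rule
```

The saturated result (subcat: elist) is exactly what the S rule above demands of its verbal daughter.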
Linguistic applications Subcategorization lists
A derivation tree
Example (A derivation tree)
[cat: s]
├── [cat: np]                                       Rachel
└── [cat: v, subcat: 〈〉]
    ├── [cat: v, subcat: 〈[cat: ②]〉]
    │   ├── [cat: v, subcat: 〈[cat: ①], [cat: ②]〉]  gave
    │   └── [cat: ① np]                              the sheep
    └── [cat: ② np]                                  water
Linguistic applications Subcategorization lists
A derivation tree
Example (A derivation tree)
(In this tree, c abbreviates cat and sc abbreviates subcat.)

[c: s]
├── [c: np]                                Jacob
└── [c: v, sc: 〈〉]
    ├── [c: v, sc: 〈[c: ②]〉]
    │   ├── [c: v, sc: 〈[c: ①], [c: ②]〉]  told
    │   └── [c: ① np]                      Laban
    └── [c: ② s]
        ├── [c: np]                        he
        └── [c: v, sc: 〈〉]
            ├── [c: v, sc: 〈[c: ③]〉]      loved
            └── [c: ③ np]                  Rachel
Linguistic applications Subcategorization lists
Subcategorization imposes case constraints
In the above grammar, categories on subcategorization lists are represented as atomic symbols

This is a simplification; the method outlined here can be used with more complex encodings of categories

For example, the lexical entry of the German verb geben (‘to give’) can state that the first complement must be in the dative case, whereas the second must be accusative
Linguistic applications Subcategorization lists
Subcategorization imposes case constraints
Example (Subcategorization imposes case constraints)
Ich gebe dem Hund den Knochen
I give the(dat) dog the(acc) bone
‘I give the dog the bone’

∗Ich gebe den Hund den Knochen
I give the(acc) dog the(acc) bone

∗Ich gebe dem Hund dem Knochen
I give the(dat) dog the(dat) bone
Linguistic applications Subcategorization lists
Subcategorization imposes case constraints
Example (Subcategorization imposes case constraints)
The lexical entry of gebe, then, could be:

L(gebe) = [cat: v, subcat: 〈[cat: np, case: dat], [cat: np, case: acc]〉, num: sg]
Linguistic applications Subcategorization lists
Subcategorization imposes case constraints
In order to account for subcategorization of complex information (rather than of atomic category symbols), the VP rule which manipulates subcategorization lists has to be slightly modified

The revised rule reflects the fact that the subcategorized information is not the value of the cat feature, but rather the entire verb complement:

[cat: v, subcat: ②] → [cat: v, subcat: [first: ③, rest: ②]] ③[ ]
Linguistic applications Subcategorization lists
G3, a complete E2-grammar
Example (G3, a complete E2-grammar)
[cat: s] → [cat: np, num: ④, case: nom] [cat: v, num: ④, subcat: elist]

[cat: v, num: ④, subcat: ②] → [cat: v, num: ④, subcat: [first: ③, rest: ②]] ③[ ]

[cat: np, num: ④, case: ②] → [cat: d, num: ④] [cat: n, num: ④, case: ②]

[cat: np, num: ④, case: ②] → [cat: pron, num: ④, case: ②] | [cat: propn, num: ④, case: ②]
Linguistic applications Subcategorization lists
G3, a complete E2-grammar
Example (G3, a complete E2-grammar)
sleep → [cat: v, subcat: elist, num: pl]

give → [cat: v, subcat: 〈[cat: np, case: acc], [cat: np]〉, num: pl]

love → [cat: v, subcat: 〈[cat: np, case: acc]〉, num: pl]

tell → [cat: v, subcat: 〈[cat: np, case: acc], [cat: s]〉, num: pl]
Linguistic applications Subcategorization lists
G3, a complete E2-grammar
Example (G3, a complete E2-grammar)
lamb → [cat: n, num: sg, case: ②]        lambs → [cat: n, num: pl, case: ②]
she → [cat: pron, num: sg, case: nom]    her → [cat: pron, num: sg, case: acc]
Rachel → [cat: propn, num: sg]           Jacob → [cat: propn, num: sg]
a → [cat: d, num: sg]                    two → [cat: d, num: pl]
Linguistic applications Long distance dependencies
Long distance dependencies
Encoding grammatical categories as feature structures is very useful in the treatment of unbounded dependencies

Such phenomena involve a “missing” constituent that is realized outside the clause from which it is missing, as in:

The shepherd wondered whom Jacob loved ⌣.
Linguistic applications Long distance dependencies
Long distance dependencies
Phrases such as whom Jacob loved ⌣ or who ⌣ loved Rachel are sentences, with a constituent which is “moved” from its default position and realized as a wh-pronoun in front of the phrase

We represent such phrases by using the category s

But to distinguish them from declarative sentences we add a feature, que, to the category

The value of que is ‘+’ in sentences with an interrogative pronoun realizing a transposed constituent
Linguistic applications Long distance dependencies
Long distance dependencies
We also add a lexical entry for the pronoun whom:

whom → [cat: pron, case: acc, que: +]

Finally, we update the rule that derives pronouns such that it propagates the value of que from the lexicon to higher projections of the pronoun:

[cat: np, num: ①, case: ③, que: ⑤] → [cat: pron, num: ①, case: ③, que: ⑤]
Linguistic applications Long distance dependencies
Long-distance dependencies
We extend G3 with two additional rules, based on the first two rules of G3:

(3) [cat: s, slash: ④] → [cat: np, num: ①, case: nom] [cat: v, num: ①, subcat: elist, slash: ④]

(4) [cat: v, num: ①, subcat: ②, slash: ④] → [cat: v, num: ①, subcat: [first: ④, rest: ②]]
Linguistic applications Long distance dependencies
A derivation tree for Jacob loved ⌣
Example (A derivation tree for Jacob loved ⌣)

[cat: s, slash: ④]
├── [cat: np, num: ①, case: ②]
│   └── [cat: propn, num: ① sg, case: ② nom]   Jacob
└── [cat: v, num: ①, slash: ④, subcat: ⑧]
    └── [cat: v, num: ①, subcat: [first: ④ [cat: np, case: acc], rest: ⑧ elist]]   loved
Linguistic applications Long distance dependencies
Long-distance dependencies
A rule for creating “complete” sentences by combining the missing category with a “slashed” sentence

The rule does not commit as to the category of the dislocated element; it simply combines any category with a sentence in which this very same category is missing, provided that this category is marked as ‘que +’

The value of que is propagated to the mother to indicate that the sentence is interrogative rather than declarative:

(5) [cat: s, que: ⑤] → ④[que: ⑤ +] [cat: s, slash: ④]
Linguistic applications Long distance dependencies
A derivation tree for whom Jacob loved ⌣
Example (A derivation tree for whom Jacob loved ⌣)

[cat: s, que: ⑤]
├── ④[cat: np, case: ③, que: ⑤]
│   └── [cat: pron, case: ③ acc, que: ⑤ +]        whom
└── [cat: s, slash: ④]
    ├── [cat: np, num: ①, case: ②]
    │   └── [cat: propn, num: ① sg, case: ② nom]  Jacob
    └── [cat: v, num: ①, slash: ④, subcat: elist]
        └── [cat: v, num: ①, subcat: 〈④〉]         loved
Linguistic applications Long distance dependencies
Long-distance dependencies
In order to derive the full sentence Rachel wondered whom Jacob loved ⌣ we need a lexical entry for the verb wondered

It is a verb, so its category is v, and as it subcategorizes for an interrogative sentence, its subcategory is a list of a single member, a sentence whose que feature is ‘+’:

wondered → [cat: v, num: [ ], subcat: 〈[cat: s, que: +]〉]
Linguistic applications Long distance dependencies
A derivation tree for Rachel wondered whom Jacob loved ⌣
Example (A derivation tree for Rachel wondered whom Jacob loved ⌣)
[cat: s]
├── [cat: np, num: ③, case: ④ nom]
│   └── [cat: propn, num: ③ sg, case: ④]   Rachel
└── [cat: v, num: ③, subcat: elist]
    ├── [cat: v, num: ③, subcat: 〈①〉]      wondered
    └── ①[cat: s, que: +]                   whom Jacob loved ⌣
Linguistic applications Long distance dependencies
Long-distance dependencies
In the previous example the filler of the gap is realized immediately to the left of the clause in which the gap occurs

This need not always be the case: unbounded dependencies can hold across several clause boundaries

Typical examples are:

The shepherd wondered whom Jacob loved ⌣.
The shepherd wondered whom Laban thought Jacob loved ⌣.
The shepherd wondered whom Laban thought Leah claimed Jacob loved ⌣.
Linguistic applications Long distance dependencies
Long-distance dependencies
Also, the dislocated constituent does not have to be an object:

The shepherd wondered who ⌣ loved Rachel.
The shepherd wondered who Laban thought ⌣ loved Rachel.
The shepherd wondered who Laban thought Leah claimed ⌣ loved Rachel.
Linguistic applications Long distance dependencies
Long-distance dependencies
The solution we proposed for the simple case of unbounded dependencies can be easily extended to the more complex examples

The solution amounts to three components:

A slash introduction rule
Slash propagation rules
A gap filler rule
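The three components can be mimicked in a toy gap-threading pass (all functions below are our illustrative inventions over plain dicts, not the grammar's actual rules):

```python
def introduce(verb):
    """Slash introduction: the object is missing; record its category."""
    missing = verb['subcat']['first']
    return {'cat': 's', 'slash': missing, 'que': '-'}

def propagate(clause):
    """Slash propagation: an embedding clause inherits the slash unchanged."""
    return {'cat': 's', 'slash': clause['slash'], 'que': '-'}

def fill(filler, clause):
    """Gap filler: a que+ phrase discharges a matching slash."""
    if filler['cat'] != clause['slash']['cat'] or filler['que'] != '+':
        return None
    return {'cat': 's', 'que': '+'}

loved = {'cat': 'v', 'subcat': {'first': {'cat': 'np'}, 'rest': 'elist'}}
whom  = {'cat': 'np', 'que': '+'}

s = introduce(loved)    # Jacob loved __
s = propagate(s)        # Laban thought Jacob loved __
print(fill(whom, s))    # {'cat': 's', 'que': '+'}
```

Because propagation copies the slash unchanged, any number of intervening clause boundaries is handled by repeating the same step.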
Linguistic applications Long distance dependencies
Long-distance dependencies
In order to account for filler–gap relations that hold across several clauses, all that needs to be done is to add more slash propagation rules

For example, in

The shepherd wondered whom Laban thought Jacob loved ⌣.

the slash is introduced by the verb phrase loved ⌣, and is propagated to the sentence Jacob loved ⌣ by rule (3)

This sentence is the object of the verb thought; therefore, we need a rule that propagates the value of slash from a sentential object to the verb phrase of which it is an object
Linguistic applications Long distance dependencies
Long-distance dependencies
Example (Long-distance dependencies)
(6) [cat: v, num: ①, subcat: ⑫, slash: ④] → [cat: v, num: ①, subcat: [first: ⑧, rest: ⑫]] ⑧[slash: ④]
Linguistic applications Long distance dependencies
Long-distance dependencies
Example (Long-distance dependencies)
Then, the slash is propagated from the verb phrase thought Jacob loved ⌣ to the sentence Laban thought Jacob loved ⌣:

(7) [cat: s, slash: ④] → [cat: np, num: ⑤, case: nom] [cat: v, num: ⑤, subcat: elist, slash: ④]
Linguistic applications Long distance dependencies
Long-distance dependencies
Example (A derivation tree for whom Laban thought Jacob loved ⌣; sc abbreviates subcat)

[cat: s, que: ⑥]
├── ④[cat: np, case: ③, que: ⑥]
│   └── [cat: pron, case: ③ acc, que: ⑥ +]               whom
└── [cat: s, slash: ④]
    ├── [cat: np, num: ⑤, case: ⑨]
    │   └── [cat: propn, num: ⑤ sg, case: ⑨ nom]          Laban
    └── [cat: v, num: ⑤, slash: ④, sc: ⑫ elist]
        ├── [cat: v, num: ⑤, sc: [first: ⑧, rest: ⑫]]     thought
        └── ⑧[cat: s, slash: ④]
            ├── [cat: np, num: ①, case: ②]
            │   └── [cat: propn, num: ① sg, case: ② nom]  Jacob
            └── [cat: v, num: ①, slash: ④, sc: elist]
                └── [cat: v, num: ①, sc: 〈④〉]             loved
Linguistic applications Long distance dependencies
Long-distance dependencies
Example (Long-distance dependencies)
Finally, to account for gaps in the subject position, all that is needed is anadditional slash introduction rule:
(8)
cat : s
slash :
cat : npnum : 1
case : nom
→
cat : vnum : 1
subcat : elist
c©Shuly Wintner (University of Haifa) Unification Grammars c©Copyrighted material 396 / 420
Linguistic applications Long distance dependencies
Long-distance dependencies
Example (A derivation tree for who ⌣ loved Rachel)

[cat: s, que: ⑥]
├── ④[cat: np, case: ③ nom, que: ⑥]
│   └── [cat: pron, case: ③ nom, que: ⑥ +]       who
└── [cat: s, num: ①, slash: ④]
    └── [cat: v, num: ①, subcat: elist]
        ├── [cat: v, num: ① sg, subcat: 〈⑧〉]     loved
        └── ⑧[cat: np, case: ②]
            └── [cat: propn, num: sg, case: ② acc]   Rachel
Linguistic applications Subject and object control
Subject and object control
Differences between the ‘understood’ subjects of the infinitive verb phrase to work seven years in the following sentences:

Jacob promised Laban to work seven years
Laban persuaded Jacob to work seven years

The differences between the two example sentences stem from differences in the matrix verbs:

promise is a subject control verb; persuade is object control
Linguistic applications Subject and object control
G4: explicit subj values
Example (G4: explicit subj values)
[cat : s] → 1 [cat : np, case : nom, num : 7] [cat : v, num : 7, subcat : elist, subj : 1]

[cat : v, num : 7, subcat : 4, subj : 1] → [cat : v, num : 7, subcat : [first : 2, rest : 4], subj : 1] 2 [ ]

[cat : np, num : 7, case : 6] → [cat : d, num : 7] [cat : n, num : 7, case : 6]

[cat : np, num : 7, case : 6] → [cat : pron, num : 7, case : 6] | [cat : propn, num : 7, case : 6]
sleep → [cat : v, subcat : elist, subj : [cat : np, case : nom, num : pl]]

love → [cat : v, subcat : ⟨ [cat : np, case : acc] ⟩, subj : [cat : np, case : nom, num : pl]]

give → [cat : v, subcat : ⟨ [cat : np, case : acc], [cat : np] ⟩, subj : [cat : np, case : nom, num : pl]]
lamb → [cat : n, num : sg, case : [ ]]

lambs → [cat : n, num : pl, case : [ ]]

she → [cat : pron, num : sg, case : nom]

her → [cat : pron, num : sg, case : acc]

Rachel → [cat : propn, num : sg]

Jacob → [cat : propn, num : sg]

a → [cat : d, num : sg]

two → [cat : d, num : pl]
Infinitival verb phrases
The next step is to account for infinitival verb phrases

This can easily be done by adding a new feature, vform, to verbal projections

The values of this feature represent the form of the verb: fin for finite verbs and inf for infinitival ones
to work → [cat : v, vform : inf, subcat : elist, subj : [cat : np]]
The lexical entry of promise
Example (The lexical entry of promise)
promised → [cat : v, vform : fin, subcat : ⟨ [cat : np, case : acc], [cat : v, vform : inf, subj : 1] ⟩, subj : 1 [cat : np, case : nom], num : [ ]]
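The reentrancy tag 1 — token identity between the matrix subj and the infinitival complement's subj — is exactly what makes promise subject control (and, by relocating the tag, persuade object control). A minimal Python sketch, modelling sharing as two features holding the very same dict object; all names and structures are illustrative:

```python
# Sketch: reentrancy as object identity. In promise's entry, the matrix
# subject and the infinitival complement's subject are the SAME structure
# (tag 1); in persuade's entry, the infinitive's subject is instead
# identical to the matrix object.

def entry_promise():
    subj = {"cat": "np", "case": "nom"}                # matrix subject, tag 1
    inf = {"cat": "v", "vform": "inf", "subj": subj}   # shares that subject
    return {"cat": "v", "vform": "fin",
            "subcat": [{"cat": "np", "case": "acc"}, inf],
            "subj": subj}

def entry_persuade():
    obj = {"cat": "np", "case": "acc"}                 # matrix object, tag 1
    inf = {"cat": "v", "vform": "inf", "subj": obj}    # shares that object
    return {"cat": "v", "vform": "fin",
            "subcat": [obj, inf],
            "subj": {"cat": "np", "case": "nom"}}

promised = entry_promise()
persuaded = entry_persuade()
# Subject control: the infinitive's subj is token-identical to the matrix subj.
# Object control: the infinitive's subj is token-identical to the matrix object.
```

The `is` identity of Python objects plays the role of the boxed tag: updating the shared structure in one place is visible in the other, just as unification with a reentrant AVM would be.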
A derivation tree for Jacob promised Laban to work
Example (A derivation tree for Jacob promised Laban to work)

(Tree flattened: node AVMs top-down, yield below.)

[cat : s]
[cat : v, vform : fin, subj : 1, subcat : elist]
[cat : v, vform : fin, subj : 1, subcat : ⟨ 3 ⟩]
1 [cat : np, case : 6 nom]
[cat : v, vform : fin, subj : 1, subcat : ⟨ 2, 3 ⟩]
2 [cat : np, case : 7 acc]
[cat : propn, case : 6]
[cat : propn, case : 7]
3 [cat : v, vform : inf, subj : 1]

Jacob promised Laban to work
The lexical entry of persuade
Example (The lexical entry of persuade)
persuaded → [cat : v, vform : fin, subcat : ⟨ 1 [cat : np, case : acc], [cat : v, vform : inf, subj : 1] ⟩, subj : [cat : np, case : nom], num : [ ]]
Linguistic applications Constituent coordination
Constituent coordination
N: no man lift up his [hand] or [foot] in all the land of Egypt
NP: Jacob saw [Rachel] and [the sheep of Laban]
VP: Jacob [went on his journey] and
[came to the land of the people of the east]
VP: Jacob [went near], and
[rolled the stone from the well’s mouth], and
[watered the flock of Laban his mother’s brother].
ADJ: every [speckled] and [spotted] sheep
ADJP: Leah was [tender eyed] but [not beautiful]
S: [Leah had four sons], but [Rachel was barren]
S: she said to Jacob, “[Give me children], or [I shall die]!”
Coordination in CFG
Example (Coordination in CFG)
S → S Conj S
NP → NP Conj NP
VP → VP Conj VP
...
Conj → and, or, but, . . .
Coordination in UG
Example (Coordination in UG)

[cat : 1] → [cat : 1] [cat : conj] [cat : 1]
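A rough sketch of what this single schematic rule buys over the CFG version: the shared tag 1 can be modelled as a required identity between the daughters' cat values, so one function licenses NP, VP and S coordination alike. The function name and dict encoding below are illustrative assumptions, not part of the formalism.

```python
# Sketch: the single UG rule [cat: 1] -> [cat: 1] [cat: conj] [cat: 1]
# replaces one CFG rule per category. The shared tag 1 is modelled by
# requiring the two conjuncts' cat values to be identical.

def coordinate(left, conj, right):
    """Return the mother of a coordinate structure, or None if ill-formed."""
    if conj.get("cat") != "conj":
        return None
    if left.get("cat") is None or left.get("cat") != right.get("cat"):
        return None        # the shared tag 1: both conjuncts' cat must unify
    return {"cat": left["cat"]}

# The same rule licenses coordination of any category:
coordinate({"cat": "np"}, {"cat": "conj"}, {"cat": "np"})   # -> {'cat': 'np'}
coordinate({"cat": "s"},  {"cat": "conj"}, {"cat": "s"})    # -> {'cat': 's'}
coordinate({"cat": "np"}, {"cat": "conj"}, {"cat": "v"})    # -> None
```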
Example (Coordination)
(Tree flattened: node AVMs top-down, yield below.)

[cat : 1 v]
[cat : 1, num : [ ], sc : elist]
[cat : 1, num : [ ], sc : elist]
[cat : v, num : [ ], sc : ⟨ 2 ⟩]
2 [cat : np, num : sg]
[cat : conj]
[cat : v, num : [ ], sc : ⟨ 3 ⟩]
3 [cat : np, num : [ ]]

rolled the stone and watered the sheep
Tough issues in coordination
Coordination of conjunctions
Properties of the conjoined phrases
Coordination of unlikes
Non-constituent coordination
Coordination
Example (Ruling out coordination in UG)

[cat : 1, conj : −] → [cat : 1, conj : +] [cat : conj] [cat : 1, conj : +]
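A sketch of the effect of the conj feature: a coordinate mother comes out conj : −, while each conjunct must be conj : +, so an already-coordinated phrase can never serve as a conjunct again. The Python encoding below (strings "+" and "-", illustrative function and variable names) is an assumption for exposition only.

```python
# Sketch: a boolean conj feature blocks coordination of coordinations.
# Simple phrases carry conj: '+'; the rule's mother carries conj: '-',
# and each daughter conjunct is required to be conj: '+'.

def coordinate(left, conj, right):
    """Coordinate two conjuncts; the result itself refuses to be a conjunct."""
    if conj.get("cat") != "conj":
        return None
    if left.get("conj") != "+" or right.get("conj") != "+":
        return None          # a coordinate structure (conj: '-') cannot recur
    if left.get("cat") != right.get("cat"):
        return None
    return {"cat": left["cat"], "conj": "-"}

np1 = {"cat": "np", "conj": "+"}                  # simple phrases are conj: +
np2 = {"cat": "np", "conj": "+"}
coord = coordinate(np1, {"cat": "conj"}, np2)     # -> {'cat': 'np', 'conj': '-'}
nested = coordinate(coord, {"cat": "conj"}, np1)  # -> None: coord is conj: '-'
```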
Example (Properties of the conjoined phrases)

(Tree flattened: node AVMs top-down, yield below.)

[cat : 1 np, num : ??, pers : ??, gen : ??]
[cat : 1, num : 4, pers : 2, gen : 8]
[cat : 1, num : 6, pers : 3, gen : 7]
[cat : pron, num : 4, pers : 2 second, gen : 8]
[cat : conj]
[cat : d, num : 6]
[cat : n, num : 6 sg, pers : 3 third, gen : 7]

you and a lamb
Example (Coordination of unlikes)
Joseph became wealthy
Joseph became a minister
Joseph became [wealthy and a minister]
Joseph grew wealthy
∗Joseph grew a minister
∗Joseph grew [wealthy and a minister]
Example (Coordination of unlikes)
[cat : 1 ⊓ 2] → [cat : 1] [cat : conj] [cat : 2]

where ‘⊓’ is the generalization operator
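Generalization can be sketched concretely. Below, feature structures are plain nested Python dicts (ignoring reentrancy, which the full definition must also track); the result keeps only the information common to both arguments, making it the dual of unification. All names and example structures are illustrative.

```python
# Sketch of generalization over plain nested dicts: the most specific
# structure subsuming both arguments.

DROP = object()   # sentinel: the two values share no information

def generalize(fs1, fs2):
    """Most specific feature structure subsuming both arguments."""
    if not isinstance(fs1, dict) or not isinstance(fs2, dict):
        return fs1 if fs1 == fs2 else DROP   # unequal atoms contribute nothing
    result = {}
    for feat in fs1.keys() & fs2.keys():     # features present in BOTH
        sub = generalize(fs1[feat], fs2[feat])
        if sub is not DROP:
            result[feat] = sub
    return result

adj = {"cat": {"v": "+", "n": "+"}}   # wealthy: adjectival
nom = {"cat": {"v": "-", "n": "+"}}   # a minister: nominal

mother = generalize(adj, nom)
# -> {'cat': {'n': '+'}}: the conflicting v values drop out, and only the
# shared n: '+' survives, mirroring the [cat : [n : +]] mother.
```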
Example (Coordination of unlikes)

(Tree flattened: node AVMs top-down, yield below.)

[cat : [v : +]]
[subcat : [n : +], cat : [v : +]]
[cat : [n : +]]
[cat : [v : +, n : +]]
[cat : conj]
[cat : [v : −, n : +]]

became wealthy and a minister
Example (Coordination of unlikes)

(Tree flattened: node AVMs top-down, yield below; c abbreviates cat and sc abbreviates subcat.)

[c : [v : +]]
[c : [v : +], sc : [n : +]]
[c : [n : +]]
[c : [v : +], sc : [v : +, n : +]]
[c : c]
[c : [v : +], sc : [n : +]]
[c : [v : +, n : +]]
[c : c]
[c : [v : −, n : +]]

grew and remained wealthy and a minister
Example (Non-constituent coordination)
Rachel gave the sheep [grass] and [water]
Rachel gave [the sheep grass] and [the lambs water]
Rachel [kissed] and Jacob [hugged] Binyamin
Linguistic applications Unification grammars facilitate linguistic generalizations
Unification grammars facilitate linguistic generalizations
Compared with context-free grammars, unification grammars provide much better means for expressing linguistic generalizations

Verb subcategorization
Coordination

Unification grammars also provide much more informative structures than CFGs

Agreement
Subject/object control

Unification grammars provide a very powerful tool for expressing what other linguistic theories would call “movement”

Gap–filler constructions
Unbounded dependencies
Summary Extensions and open problems
Extensions and open problems
Restricted versions of unification grammars

Off-line parsability
Context-free and mildly context-sensitive unification grammars
Polynomially-parsable unification grammars

Typed unification grammars

Type hierarchies
Appropriateness specification
Type inference

Development of large-scale grammars

Grammar engineering
Modularity, information encapsulation, separate compilation, ...
Thank you