Context-free grammars for natural languages

Context-free grammars can be used for a variety of syntactic constructions, including some non-trivial phenomena such as unbounded dependencies, extraction, extraposition etc.

However, some (formal) languages are not context-free, and therefore there are certain sets of strings that cannot be generated by context-free grammars.

The interesting question, of course, involves natural languages: are there natural languages that are not context-free? Are context-free grammars sufficient for generating every natural language?

© Shuly Wintner (University of Haifa), Unification Grammars. Copyrighted material.


A context-free grammar, G0, for E0

Example

S → NP VP

VP → V

VP → V NP

NP → D N

NP → Pron

NP → PropN

D → the, a, two, every, . . .

N → sheep, lamb, lambs, shepherd, water . . .

V → sleep, sleeps, love, loves, feed, feeds, herd, herds, . . .

Pron → I, me, you, he, him, she, her, it, we, us, they, them

PropN → Rachel, Jacob, . . .



Context-free grammars for natural languages

There are two major problems with this grammar.

1 It ignores the valence of verbs: there is no distinction among subcategories of verbs, and an intransitive verb such as sleep might occur with a noun phrase complement, while a transitive verb such as love might occur without one. In such a case we say that the grammar overgenerates: it generates strings that are not in the intended language.

2 There is no treatment of subject–verb agreement, so that a singular subject such as the cat might be followed by a plural form of verb such as smile. This is another case of overgeneration.

Both problems are easy to solve.



Problems of G0

Over-generation (agreement constraints are not imposed):

∗Rachel feed the sheep

∗The shepherds feeds the sheep

∗Rachel feeds

∗Jacob loves she

∗Them herd the sheep



Problems of G0

Over-generation (subcategorization constraints are not imposed):

the lambs sleep

Jacob loves Rachel

∗the lambs sleep the sheep

∗Jacob loves



Problems of G0

Example (Over-generation)

[S [NP [D the] [N lambs]] [VP [V sleeps] [NP [Pron they]]]]
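The over-generation shown in this example can be checked mechanically. The following is a minimal Python sketch, not part of the original material: it assumes a small illustrative subset of the lexicon of G0 and a naive top-down recognizer, and the names RULES, LEXICON, spans and accepts are hypothetical.

```python
# Phrase-structure rules of G0 (the lexicon below is only a small subset).
RULES = {
    "S": [["NP", "VP"]],
    "VP": [["V"], ["V", "NP"]],
    "NP": [["D", "N"], ["Pron"], ["PropN"]],
}
LEXICON = {
    "D": {"the", "a", "two", "every"},
    "N": {"sheep", "lamb", "lambs", "shepherd"},
    "V": {"sleep", "sleeps", "love", "loves", "feed", "feeds"},
    "Pron": {"she", "they", "them"},
    "PropN": {"Rachel", "Jacob"},
}


def spans(symbol, words, start):
    """Yield every end position e such that words[start:e] derives from symbol."""
    if symbol in LEXICON:
        if start < len(words) and words[start] in LEXICON[symbol]:
            yield start + 1
        return
    for rhs in RULES[symbol]:
        ends = {start}
        for child in rhs:
            # Extend every partial analysis by one more daughter.
            ends = {e2 for e in ends for e2 in spans(child, words, e)}
        yield from ends


def accepts(sentence):
    """True if the whole sentence can be derived from S."""
    words = sentence.split()
    return len(words) in set(spans("S", words, 0))


print(accepts("the lambs sleeps they"))
print(accepts("Jacob loves"))
```

Under these assumptions the recognizer accepts both strings, which is exactly the problem: nothing in the atomic symbols V, NP and Pron records number, case or valence.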



Verb valence

To account for valence, we can replace the non-terminal symbol V by a set of symbols: Vtrans, Vintrans, Vditrans etc.

We must also change the grammar rules accordingly:

Example

VP → Vintrans          Vintrans → sleep, sleeps

VP → Vtrans NP         Vtrans → love, loves

VP → Vditrans NP NP    Vditrans → give, gives



Agreement

To account for agreement, we can again extend the set of non-terminal symbols such that categories that must agree reflect, in the non-terminal that is assigned to them, the features on which they agree.

In the very simple case of English, it is sufficient to multiply the set of “nominal” and “verbal” categories, so that we get Dsg, Dpl, Nsg, Npl, NPsg, NPpl, Vsg, Vpl, VPsg, VPpl etc. We must also change the set of rules accordingly:



Agreement

Example

Nsg → lamb Npl → lambs

Nsg → sheep Npl → sheep

Vsg → sleeps Vpl → sleep

Vsg → smiles Vpl → smile

Vsg → loves Vpl → love

Vsg → saw Vpl → saw

Dsg → a Dpl → two



Agreement

Example

S → NPsg VPsg          S → NPpl VPpl

NPsg → Dsg Nsg         NPpl → Dpl Npl

VPsg → Vsg             VPpl → Vpl

VPsg → Vsg NP          VPpl → Vpl NP



Methodological properties of the CFG formalism

1 Concatenation is the only string combination operation

2 Phrase structure is the only syntactic relationship

3 The terminal symbols have no properties

4 Non-terminal symbols (grammar variables) are atomic

5 Most of the information encoded in a grammar lies in the production rules

6 Any attempt to extend the grammar with a semantics requires extra means.



Alternative methodological properties

1 Concatenation is not necessarily the only way by which phrases may be combined to yield other phrases.

2 Even if concatenation is the sole string operation, other syntactic relationships are being put forward.

3 Modern computational formalisms for expressing grammars adhere to an approach called lexicalism.

4 Some formalisms do not retain any context-free backbone. However, if one is present, its categories are not atomic.

5 The expressive power added to the formalisms allows also a certain way for representing semantic information.


Feature structures Introduction

Feature structures

Motivated by the violations of the context-free grammar G0, we would like to extend the CFG formalism with additional mechanisms that will facilitate the expression of information that is missing in G0 in a uniform and compact way.

The core idea is to incorporate into the grammar properties of symbols, in terms of which the violations of G0 were stated.

Properties are represented by means of feature structures.



Overview

An overview of feature structures, motivating their use as a representation of linguistic information

Four different views of these entities:

feature graphs

feature structures

abstract feature structures (AFSs)

attribute-value matrices (AVMs)

Feature structures in a broader context.


Feature structures Motivation

Motivation

Words in natural languages have properties

We want to model these properties in the lexicon

We would like to associate with words not just atomic symbols, as in CFGs, but rather structural information that reflects their properties.



A simple lexicon

Example (A simple lexicon)

lamb: [ num : sg, pers : third ]

lambs: [ num : pl, pers : third ]

I: [ num : sg, pers : first ]

sheep: [ num : [ ], pers : third ]

dreams:

[


]

dreams:

[


]



Feature structures

Feature structures map features into values, which are themselves feature structures

A special case of feature structures are atoms, which represent structureless values.

For example, to deal with number (and impose its agreement), we use a feature num, and a set of atomic feature structures {sg, pl} as its values, representing singularity and plurality, respectively.

When a value is not atomic, it is complex.

A complex value is, recursively, a feature structure consisting of features and values.
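The recursive character of feature structures can be illustrated with nested dictionaries. This is only a rough Python sketch under stated assumptions: atoms are modelled as strings, complex values as dicts, and the feature values chosen for the entry named grouped are hypothetical. Plain nested dicts cannot express value sharing between features, so this is not a full model of feature structures.

```python
# Atoms are plain strings; complex values are dicts from features to values.
lamb = {"num": "sg", "pers": "third"}
sheep = {"num": {}, "pers": "third"}  # empty value: num is left unconstrained

# A complex value: agr groups the (hypothetical) num and pers values together.
grouped = {"vtype": "transitive", "agr": {"num": "sg", "pers": "third"}}


def is_atomic(value):
    """Atoms carry no internal structure."""
    return isinstance(value, str)


def depth(value):
    """Nesting depth: 0 for atoms and for empty structures."""
    if is_atomic(value) or not value:
        return 0
    return 1 + max(depth(v) for v in value.values())


print(depth(lamb), depth(grouped))
```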



A complex feature structure

Example (A complex feature structure)

loves:

vtype : transitive

agr :

[


]



Grouping features

Deciding how to group features is up to the grammar designer, and is intended to capture syntactic generalizations.

If number and person ‘go together’ in formulating restrictions, it is more appropriate to group them as in this example.

Moreover, such a grouping might be beneficial when feature structures are being modified.

Processes of derivation and parsing (the application of grammar rules) are able to manipulate feature structures to reflect application of such constraints.

When the properties of some feature structure are changed, it is possible to change the value of only one feature, namely agr, rather than specify two separate changes for each subfeature.



Grouping features

In the example lexicon, the lexical ambiguity of sheep is represented by an empty feature structure as the value of the num feature.

This is interpreted as the value of this feature being unconstrained.

However, it would have been useful to be able to state that the only possible values for this feature are, say, sg and pl.

There are at least two different ways to specify such information:

by listing a set of values for the feature;

or by restricting its value to a certain “type” of permissible values.

We do not explore the former solution here.

The latter solution is employed by typed feature structure formalisms.



Adding features to phrases

Words are not the only linguistic entities that have properties; words are combined into phrases, and those also have properties which can be modeled by feature:value pairs.

For example, the noun phrase a sheep has the value sg for the num feature, while two sheep has the value pl for num.

Consequently, grammar non-terminals, too, must be decorated with features, representing the endowment of phrases of this category with that feature.


Feature structures Feature graphs

Feature graphs

The informal discussion of feature structures above depicted them using a representation, called attribute-value matrices (AVMs), which is common in the linguistic literature.

We begin the discussion of feature structures by defining the concept of feature graphs, using well-known concepts of graph theory.

A graph view of feature structures facilitates computational processing because so many properties of graphs are well understood and because graphs lend themselves to efficient processing.

We will return to AVMs and discuss their correspondence with feature graphs later on.



Definitions

Feature graphs are defined over a signature consisting of non-empty, finite, disjoint sets Feats of features and Atoms of atoms.

Features are used to encode properties of (linguistic) objects, such as number, gender etc.

Atoms are used for the (atomic) values of such features, as in plural, feminine etc.

We use a convention of depicting features in small capitals and atoms in italics.



Signature

Definition (Signature)

A signature is a structure S = 〈Atoms, Feats〉, where Atoms is a finite set of atoms and Feats is a finite set of features.

We assume some fixed signature throughout this presentation.

Meta-variables f, g (with or without subscripts or superscripts) range over features, and a, b, etc. over atoms.

We usually assume that both Feats and Atoms are non-empty (and sometimes even assume that they include more than one element each).



Feature graphs

Definition (Feature graphs)

A feature graph A = 〈QA, qA, δA, θA〉 is a finite, directed, connected, labeled graph consisting of:

a finite, nonempty set of nodes QA (such that QA ∩ Feats = QA ∩ Atoms = ∅);

a root qA ∈ QA;

a partial function δA : QA × Feats → QA specifying the arcs, such that every node q ∈ QA is accessible from qA;

a partial function θA : QS → Atoms, marking some of the sinks, where QS = {q ∈ QA | δA(q, f)↑ for every f}.

Given a signature of features Feats and atoms Atoms, let G(Feats, Atoms) be the set of all feature graphs over the signature.



Feature graphs

Example (Feature graphs)

The graph displayed below is 〈Q, q, δ, θ〉, where Q = {q0, q1, q2, q3}, q = q0, δ(q0, agr) = q1, δ(q1, num) = q2, δ(q1, pers) = q3, QS = {q2, q3}, θ(q2) = pl, θ(q3) = third.

[Figure: q0 -agr-> q1; q1 -num-> q2 (marked pl); q1 -pers-> q3 (marked third)]
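The components of a feature graph map directly onto a small data structure. The sketch below is one possible Python encoding, not the author's implementation: the class name FeatureGraph and its methods are assumptions made for illustration, δ is stored as a dict keyed by (node, feature) pairs, and θ as a dict from sinks to atoms.

```python
from dataclasses import dataclass, field


@dataclass
class FeatureGraph:
    nodes: set
    root: str
    delta: dict                                  # (node, feature) -> node
    theta: dict = field(default_factory=dict)    # sink -> atom

    def sinks(self):
        """Nodes with no outgoing arcs (the set QS of the definition)."""
        return {q for q in self.nodes
                if not any(src == q for (src, _f) in self.delta)}

    def reachable(self):
        """Nodes accessible from the root through delta."""
        seen, stack = {self.root}, [self.root]
        while stack:
            q = stack.pop()
            for (src, _f), dst in self.delta.items():
                if src == q and dst not in seen:
                    seen.add(dst)
                    stack.append(dst)
        return seen

    def is_valid(self):
        """Root is a node, every node is accessible, only sinks are marked."""
        return (self.root in self.nodes
                and self.reachable() == self.nodes
                and set(self.theta) <= self.sinks())


# The example graph with the agr, num and pers arcs.
g = FeatureGraph(
    nodes={"q0", "q1", "q2", "q3"},
    root="q0",
    delta={("q0", "agr"): "q1", ("q1", "num"): "q2", ("q1", "pers"): "q3"},
    theta={"q2": "pl", "q3": "third"},
)
print(g.is_valid(), sorted(g.sinks()))
```

Storing δ as a dictionary makes the requirement that δ be a partial function hold by construction, since a key can have at most one value.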



Feature graphs

The arcs of a feature graph are thus labeled by features.

The root is a designated node from which all other nodes are accessible (through δ); note that nothing prevents the root from having incoming arcs.

Sink nodes (nodes with no outgoing edges) can be marked by an atom, but can also be unmarked.



Feature graphs

We use meta-variables A, B (with or without subscripts) to refer to feature graphs.

We use Q, q, δ, θ, to refer to constituents of feature graphs.

When displaying feature graphs, the root is depicted as a grey-colored node, usually at the top or the left side of the graph.

The identities of the nodes are arbitrary, and we use generic names such as q0, q1 etc. to refer to them.



Feature graphs

Example (Feature graphs)

In the following graph, the leaves q2 and q3 bear no marking; in other words, the marking function θ is undefined for the two sinks in its domain.

[Figure: q0 -agr-> q1; q1 -num-> q2 (unmarked); q1 -pers-> q3 (unmarked)]

The graph displayed above is 〈Q, q, δ, θ〉, where Q = {q0, q1, q2, q3}, q = q0, δ(q0, agr) = q1, δ(q1, num) = q2, δ(q1, pers) = q3, QS = {q2, q3}, and θ is undefined for its entire domain.



Feature graphs

A feature graph is empty if it consists of a single unmarked node with no arcs.

A feature graph is atomic if it consists of a single marked node with no arcs.



Empty and atomic feature graphs

Example (Empty and atomic feature graphs)

A, an empty feature graph: a single unmarked node q0

B, an atomic feature graph: a single node q0, marked pl



Paths

The concept of paths is natural when graphs are concerned.

A path (over Feats) is a finite sequence of features, and the set Paths = Feats∗ is the collection of all paths.

Meta-variables π, α (with or without subscripts) range over paths.

ε is the empty path, denoted also by ‘〈〉’.

The length of a path π is denoted |π|.

For example, if Feats = {a, b} then Paths includes ε, 〈a〉, 〈b〉, 〈a,b,a〉, 〈b,b,b,b,a,b〉, etc.

While a path is a purely syntactic notion (every sequence of features constitutes a path), interesting paths are those that can be interpreted as actual paths in some graph, leading from the root to some node.



Paths

The definition of δ is therefore extended to paths: given a feature graph A = 〈QA, qA, δA, θA〉, define $\hat{\delta}_A : Q_A \times \mathrm{Paths} \to Q_A$ as follows:

$\hat{\delta}_A(q, \epsilon) = q$

$\hat{\delta}_A(q, f\pi) = \hat{\delta}_A(\delta_A(q, f), \pi)$ (defined only if $\delta_A(q, f)\downarrow$)

Since for every node q ∈ QA and every feature f ∈ Feats, $\hat{\delta}_A(q, \langle f \rangle) = \delta_A(q, f)$, we identify $\hat{\delta}$ with δ in the future and use only the latter. When the index (A) is clear from the context, it is omitted. When δA(q, π) = q′ we say that π leads (in A) from q to q′.
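The extension of δ from single features to paths amounts to following the arcs one feature at a time. Here is a small Python sketch under the assumption that δ is stored as a dict keyed by (node, feature) pairs and that node names are strings; the function name delta_hat and the use of None for "undefined" are choices made for this illustration only.

```python
def delta_hat(delta, q, path):
    """Follow `path` (a sequence of features) from node q.

    Returns the node reached, or None when the path is undefined at q.
    """
    for f in path:
        q = delta.get((q, f))
        if q is None:          # delta(q, f) is undefined, so the path is too
            return None
    return q


# Arcs of the example graph: q0 -agr-> q1, q1 -num-> q2, q1 -pers-> q3.
delta = {("q0", "agr"): "q1", ("q1", "num"): "q2", ("q1", "pers"): "q3"}

print(delta_hat(delta, "q0", ()))                 # the empty path stays at q0
print(delta_hat(delta, "q0", ("agr", "num")))
print(delta_hat(delta, "q0", ("agr", "pers", "num")))
```

The set of paths of a graph is then exactly the set of sequences for which this function, started at the root, does not return None.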



Paths

Definition (Paths)

The paths of a feature graph A are Π(A) = {π ∈ Paths | δA(qA, π)↓}.



Paths

Example (Paths)

Consider the following feature graph, A:

[Figure: q0 -agr-> q1; q1 -num-> q2 (marked pl); q1 -pers-> q3 (marked third)]

Its paths are

Π(A) = {ε, 〈agr〉, 〈agr num〉, 〈agr pers〉}



Path values

Of particular interest are paths which lead from the root of a feature graph to some node in the graph.

For such paths we define the notion of a value, which is the sub-graph whose root is the node at the end of the path.

It would have been possible to define as value the node itself, rather than the sub-graph it induces; the choice is a matter of taste, as moving from one view of values to another is trivial.



Path values

Definition (Path value)

For a feature graph A = 〈QA, qA, δA, θA〉 and a path π ∈ Π(A), the value valA(π) of π in A is a feature graph B = 〈QB, qB, δB, θB〉, over the same signature as A, where:

qB = δA(qA, π)

QB = {q′ ∈ QA | for some π′, δA(qB, π′) = q′} (QB is the set of nodes reachable from qB)

for every feature f and for every q′ ∈ QB, δB(q′, f) = δA(q′, f) (δB is the restriction of δA to QB)

for every q′ ∈ QB, θB(q′) = θA(q′) (θB is the restriction of θA to QB)
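The four clauses of this definition translate into a short computation: follow the path, collect the reachable nodes, and restrict δ and θ to them. The Python sketch below assumes a feature graph is given as a tuple (nodes, root, delta, theta) with dict-based δ and θ; this encoding and the function name val are illustrative assumptions, not part of the original material.

```python
def val(graph, path):
    """Return the value of `path` in `graph`, or None if the path is undefined."""
    _nodes, root, delta, theta = graph
    q = root
    for f in path:                      # q_B is the node at the end of the path
        if (q, f) not in delta:
            return None
        q = delta[(q, f)]
    reachable, stack = {q}, [q]         # Q_B: the nodes reachable from q_B
    while stack:
        current = stack.pop()
        for (src, _f), dst in delta.items():
            if src == current and dst not in reachable:
                reachable.add(dst)
                stack.append(dst)
    sub_delta = {k: v for k, v in delta.items() if k[0] in reachable}
    sub_theta = {n: a for n, a in theta.items() if n in reachable}
    return (reachable, q, sub_delta, sub_theta)


A = (
    {"q0", "q1", "q2", "q3"},
    "q0",
    {("q0", "agr"): "q1", ("q1", "num"): "q2", ("q1", "pers"): "q3"},
    {"q2": "pl", "q3": "third"},
)
print(val(A, ("agr", "num")))
```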




Path values

Example (Path values)

The value of the path 〈agr〉 in A is:

valA(〈agr〉) = [Figure: q1 -num-> q2 (marked pl); q1 -pers-> q3 (marked third)]

and the value of the path 〈agr num〉 in A is:

valA(〈agr num〉) = [Figure: the single node q2, marked pl]

Note that, for example, the value of 〈agr pers num〉 in A is undefined.



Reentrancy

The definition of path values raises the question of when two paths have equal values.

We distinguish between paths which lead to one and the same node, and those whose values are isomorphic but not identical.

The former case is called reentrancy.



Reentrancy

Definition (Reentrancy)

Let A = 〈Q, q, δ, θ〉 be a feature graph. Two paths π1, π2 ∈ Π(A) are reentrant in A, denoted $\pi_1 \overset{A}{\leftrightsquigarrow} \pi_2$, iff δ(q, π1) = δ(q, π2), implying valA(π1) = valA(π2). A feature graph A is reentrant iff there exist two distinct paths π1, π2 ∈ Π(A) such that $\pi_1 \overset{A}{\leftrightsquigarrow} \pi_2$.
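Both notions in this definition can be tested on a finite representation. The Python sketch below assumes δ is a dict keyed by (node, feature) pairs and that every node is accessible from the root, as the definition of feature graphs requires. Under that assumption a graph has two distinct paths to one node exactly when some node has more than one incoming arc or the root has an incoming arc; this in-degree test is a reformulation used here for convenience, not the wording of the source, and the function names are hypothetical.

```python
from collections import Counter


def follow(delta, root, path):
    """Node reached from the root along `path`, or None if undefined."""
    q = root
    for f in path:
        q = delta.get((q, f))
        if q is None:
            return None
    return q


def paths_reentrant(delta, root, p1, p2):
    """True if both paths are defined and lead to the same node."""
    n1, n2 = follow(delta, root, p1), follow(delta, root, p2)
    return n1 is not None and n1 == n2


def is_reentrant(delta, root):
    """True if two distinct paths from the root lead to the same node."""
    indegree = Counter(delta.values())   # one entry per arc, keyed by target
    return indegree[root] > 0 or any(n > 1 for n in indegree.values())


# A graph in which agr and subj.agr share their value.
shared = {
    ("q0", "agr"): "q1", ("q0", "subj"): "q4", ("q4", "agr"): "q1",
    ("q1", "num"): "q2", ("q1", "pers"): "q3",
}
tree = {("q0", "agr"): "q1", ("q1", "num"): "q2", ("q1", "pers"): "q3"}

print(paths_reentrant(shared, "q0", ("agr",), ("subj", "agr")))
print(is_reentrant(shared, "q0"), is_reentrant(tree, "q0"))
```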



Reentrancy

Example (A reentrant feature graph)

This feature graph, A, is reentrant because δA(q0, 〈agr〉) = δA(q0, 〈subj agr〉):

[Figure: q0 -agr-> q1; q0 -subj-> q4; q4 -agr-> q1; q1 -num-> q2 (marked pl); q1 -pers-> q3 (marked third)]

The (single) value of the (different) paths 〈agr〉 and 〈subj agr〉 in A is:

[Figure: q1 -num-> q2 (marked pl); q1 -pers-> q3 (marked third)]



Reentrancy

The notion of reentrancy touches on the issue of the distinction between type- and token-identity.

Two feature graphs are token-identical if their components (i.e., their sets of nodes, roots, transition functions and atom marking functions) are identical.

They are type-identical if they are isomorphic, not necessarily requiring their nodes to be identical.

We will discuss feature graph isomorphism later.



Cycles

Early feature structure based formalisms used to employ only acyclic feature graphs.

However, modern ones usually allow (or even require) feature structures to be possibly cyclic.

While the linguistic motivation for cyclic feature structures is limited, there is good practical motivation for allowing them: when implementing a system for manipulating feature graphs, it is usually easier to support cycles than to guarantee that all the graphs in a system are acyclic.

The reason is that unification, which is the major operation defined on feature graphs, can yield a cyclic graph even when its operands are acyclic.



Cycles

Definition (Cycles)

A feature graph A = 〈QA, qA, δA, θA〉 is cyclic if two paths π1, π2 ∈ Π(A), where π1 is a proper prefix of π2, are reentrant: $\pi_1 \overset{A}{\leftrightsquigarrow} \pi_2$. A is acyclic otherwise.

Note that cyclicity is a special case of reentrancy (every cyclic feature graph is reentrant, but not vice versa).

A corollary of the definition is that when a feature graph is cyclic, it has at least one node q such that δ(q, α) = q for some non-empty path α.
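The corollary suggests a direct test: a feature graph is cyclic when some node reachable from the root can reach itself along a non-empty path. The Python sketch below implements this with a standard depth-first search that looks for an arc back into the current search stack. The dict encoding of δ, the function name is_cyclic and both example graphs are assumptions made for illustration.

```python
def is_cyclic(delta, root):
    """True if some node reachable from the root lies on a cycle."""
    successors = {}
    for (src, _f), dst in delta.items():
        successors.setdefault(src, []).append(dst)

    on_stack, done = set(), set()

    def visit(q):
        on_stack.add(q)
        for nxt in successors.get(q, []):
            if nxt in on_stack:                 # an arc back into the current path
                return True
            if nxt not in done and visit(nxt):
                return True
        on_stack.discard(q)
        done.add(q)
        return False

    return visit(root)


# A hypothetical graph with a self-loop on q1, and a reentrant but acyclic one.
looping = {("q0", "f"): "q1", ("q1", "h"): "q1", ("q1", "g"): "q2"}
shared = {("q0", "agr"): "q1", ("q0", "subj"): "q4", ("q4", "agr"): "q1"}

print(is_cyclic(looping, "q0"), is_cyclic(shared, "q0"))
```

The second graph shows why the converse of the note fails: it is reentrant, since two paths reach q1, yet no node can reach itself.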



Cycles

Example (A cyclic feature graph)

Following is a cyclic feature graph, C:

[Figure: q0 -f-> q1; q1 -h-> q1; q1 -g-> q2 (marked a)]

The value of the path 〈f〉 in C, as well as the values of the (infinitely many) paths 〈f hⁿ〉, for n ≥ 0, is the same feature graph:

[Figure: q1 -h-> q1; q1 -g-> q2 (marked a)]


Feature structures Feature graph subsumption

Feature graph isomorphism

Since feature graphs are just a special case of directed, labeled graphs, we can adapt the well-defined notion of graph isomorphism to feature graphs.

Informally, two graphs are isomorphic when they have the same structure; the identities of their nodes may differ without affecting the structure.

In our case, we require also that the labels of sink nodes be identical in order for two graphs to be considered isomorphic.



Feature graph isomorphism

Definition (Feature graph isomorphism)

Two feature graphs A = 〈QA, qA, δA, θA〉 and B = 〈QB, qB, δB, θB〉 are isomorphic, denoted A ∼ B, iff there exists a one-to-one and onto mapping i : QA → QB, called an isomorphism, such that:

i(qA) = qB;

for all q1, q2 ∈ QA and f ∈ Feats, δA(q1, f) = q2 iff δB(i(q1), f) = i(q2); and

for all q ∈ QA, θA(q) = θB(i(q)) (either both are undefined, or both are defined and equal).



Feature graph subsumption

Definition (Subsumption)

Let A1 = 〈Q1, q1, δ1, θ1〉 and A2 = 〈Q2, q2, δ2, θ2〉 be two feature graphs. A1 subsumes A2 (denoted by A1 ⊑ A2) iff there exists a total function h : Q1 → Q2, called a subsumption morphism, such that

h(q1) = q2

for every q ∈ Q1 and for every f such that δ1(q, f)↓, h(δ1(q, f)) = δ2(h(q), f)

for every q ∈ Q1, if θ1(q)↓ then θ1(q) = θ2(h(q)).

If A1 ⊑ A2 then A1 is said to subsume, or be more general than, A2; A2 is subsumed by, or is more specific than, A1.
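Because every node of A1 is accessible from its root, the three clauses leave no freedom in choosing h: the root must go to the root, and each arc forces the image of its target. A candidate morphism can therefore be built by walking A1 and A2 in parallel. The Python sketch below does this under the assumption that a graph is given as a tuple (root, delta, theta) with dict-based δ and θ; the function name subsumes and the node names are hypothetical.

```python
def subsumes(a, b):
    """Return the subsumption morphism from a to b as a dict, or None."""
    (root_a, delta_a, theta_a), (root_b, delta_b, theta_b) = a, b
    h = {root_a: root_b}                     # clause 1: root maps to root
    agenda = [root_a]
    while agenda:
        q = agenda.pop()
        # Clause 3: a marked node must map to a node with the same mark.
        if q in theta_a and theta_a[q] != theta_b.get(h[q]):
            return None
        for (src, f), dst in delta_a.items():
            if src != q:
                continue
            image = delta_b.get((h[q], f))   # clause 2: h commutes with delta
            if image is None:
                return None
            if dst in h:
                if h[dst] != image:          # h would not be a function
                    return None
            else:
                h[dst] = image
                agenda.append(dst)
    return h


A = ("a0",
     {("a0", "agr"): "a1", ("a1", "num"): "a2", ("a1", "pers"): "a3"},
     {"a3": "third"})
B = ("b0",
     {("b0", "agr"): "b1", ("b0", "subj"): "b4", ("b4", "agr"): "b1",
      ("b1", "num"): "b2", ("b1", "pers"): "b3"},
     {"b2": "pl", "b3": "third"})

print(subsumes(A, B))
print(subsumes(B, A))
```

In this hypothetical pair, A subsumes B, while the reverse test fails because the root of A has no subj arc.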



Subsumption

The morphism h associates with every node in Q1 a node in Q2; if an arc labeled f connects q with q′, then such an arc connects h(q) with h(q′).

In other words, δ and h commute, as depicted in the following diagram, where δ-arcs are depicted using solid lines, whereas h-mappings are depicted using dashed lines:

[Diagram: h(δ1(q, f)) = δ2(h(q), f)]



Subsumption

In addition, if a node q ∈ Q1 is marked by an atom, then its image h(q) must be marked by the same atom (recall that only sinks can be thus marked).

Note that if a sink in Q1 is not marked, there is no constraint on its image (in particular, it can be a non-sink).



Subsumption morphism

Example (Subsumption morphism)

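[Figure: graphs A1 and A2; the mapping h takes the root of A1 to the root of A2, and takes an arc q -f-> q′ of A1 to the arc h(q) -f-> h(q′) of A2]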




Example (Subsumption)

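[Figure: two feature graphs.

A: qA0 -agr-> qA1; qA1 -num-> qA2 (unmarked); qA1 -pers-> qA3 (marked third).

B: qB0 -agr-> qB1; qB0 -subj-> qB4; qB4 -agr-> qB1; qB1 -num-> qB2 (marked pl); qB1 -pers-> qB3 (marked third).]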



Indeed, B can, and does, have nodes that do not correspond to nodes in A: such is qB4 in the example.

In addition, while the sink qA2 is not marked by an atom (that is, it is a variable), its image in B, qB2, is marked as pl.

Notice that no subsumption morphism can be defined from QB to QA, since there is no node into which qB4 can be mapped.

In particular, it cannot be mapped to the root of A since this would necessitate an arc from qA0 to itself (as the root of A would be the image of both qB4 and qB0).

Trying to take h−1 as an inverse subsumption morphism will fail both because of qB4 and because it would map qB2 to qA2, violating the last clause of the subsumption relation (a marked sink must be mapped to a sink with the same mark).

We conclude that B ⋢ A.



Subsumption

Given a feature structure, what modifications can be made to it in order for it to become more specific? Three different kinds of modifications are possible:

1 Adding arcs;

2 Adding reentrancies;

3 Marking unmarked sinks by some atom.



Subsumption

Example (Subsumption as an order on information)

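[Figure: four pairs of feature graphs related by ⊑, illustrating in turn: adding arcs (a num arc), adding atomic marks (the value of num marked pl), adding arcs (a per arc with value third next to num : sg), and adding reentrancies (num1 and num2, both with value sg, made to lead to a single node)]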



Subsumption

Lemma

If A ⊑ B then Π(A) ⊆ Π(B).



Subsumption

Lemma

If A ⊑ B then for each π ∈ Π(A), if θA(δA(qA, π))↓ then θB(δB(qB, π))↓ and θA(δA(qA, π)) = θB(δB(qB, π)).



Subsumption

Lemma

If A ⊑ B and π1, π2 are reentrant in A (that is, $\pi_1 \overset{A}{\leftrightsquigarrow} \pi_2$) then π1, π2 are reentrant in B (that is, $\pi_1 \overset{B}{\leftrightsquigarrow} \pi_2$).



Subsumption

Corollary

If A ⊑ B, then:

Π(A) ⊆ Π(B)

for each π ∈ Π(A), if θA(δA(qA, π))↓ then θB(δB(qB, π))↓ and θA(δA(qA, π)) = θB(δB(qB, π))

for each π1, π2 ∈ Π(A), if $\pi_1 \overset{A}{\leftrightsquigarrow} \pi_2$ then $\pi_1 \overset{B}{\leftrightsquigarrow} \pi_2$ (and, therefore, if A is reentrant/cyclic then so is B).



Subsumption

Theorem

If A is an atomic feature graph and A ⊑ B, then A ∼ B.



Subsumption

Theorem

Subsumption has a least element: there exists a feature graph A such that for every feature graph B, A ⊑ B.

Proof.

Consider the (empty) feature graph A = 〈{q0}, q0, δ, θ〉, where δ and θ are undefined for their entire domains. For every feature graph B, A ⊑ B by mapping (through h) the root q0 to the root of B, qB. The first clause of the definition of subsumption holds by the choice of h, and the other two clauses hold vacuously.



Subsumption

Theorem

Subsumption is reflexive: for every feature graph A, A ⊑ A.

Proof.

Take h to be the identity function that maps every node in A to itself.



Subsumption

Theorem

Subsumption is transitive: if A ⊑ B and B ⊑ C then A ⊑ C.



Subsumption

Theorem

Subsumption is not antisymmetric: if A ⊑ B and B ⊑ A then not necessarily A = B.

Proof.

Consider the feature graphs A = 〈{qA}, qA, δ, θ〉 and B = 〈{qB}, qB, δ, θ〉, where δ and θ are undefined for their entire domains, and where qA ≠ qB. Trivially, both A ⊑ B and B ⊑ A, but A ≠ B.



Subsumption

Thus, feature graph subsumption forms a partial pre-order on feature graphs.

It is a pre-order since it is not antisymmetric; it is partial as there are feature graphs that are incomparable with respect to subsumption.



Subsumption

Example (Feature graph subsumption is a partial relation)

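Feature graphs can be incomparable due to inconsistency (contradicting information) or to complementary information.

[Figure: three pairs of feature graphs, each pair incomparable (⋢ and ⋣): the atomic graphs sg and pl; a num arc leading to sg and a num arc leading to pl; a graph with a single num arc and a graph with a single pers arc]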



Subsumption

There is a clear connection between feature graph isomorphism and feature graph subsumption:

Theorem

A ∼ B iff A ⊑ B and B ⊑ A.


Feature structures Attribute-value matrices

AVMs

We now return to attribute-value matrices (AVMs).

This is the view that we will adopt for depicting feature structures (and grammars based on them), both because they are easy to present on paper and because of their centrality in existing literature.

Like feature graphs, AVMs are defined over a signature of features and atoms, which we fix below.

In addition, AVMs make use of variables, also called tags below. Meta-variables X, Y, Z, etc. range over variables.

Variables are used to encode sharing of values, as will be clear presently.

When AVMs are concerned, we follow the convention of the linguistic literature by which variables are natural numbers, depicted in boxes, e.g., 3.



AVMs

Definition (AVMs)

Given a signature S, the set Avms(S) of AVMs over S is the least set satisfying the following two clauses:

1 M = X a ∈ Avms(S) for any a ∈ Atoms and X ∈ Tags; M is said to be atomic and X is the tag of M, denoted tag(M) = X.

2 M = X[f1 : M1, . . . , fn : Mn] ∈ Avms(S) for n ≥ 0, X ∈ Tags, f1, . . . , fn ∈ Feats and M1, . . . , Mn ∈ Avms(S), where fi ≠ fj if i ≠ j. M is said to be complex, and X is the tag of M, denoted tag(M) = X. If n = 0, M = X[ ] is an empty AVM.

Note that two AVMs which differ only in their tag are distinct: if X ≠ Y, then X[· · ·] ≠ Y[· · ·]. In particular, there is no unique empty AVM. Note also that the same variable can be used more than once in an AVM.



AVMs

Example (AVMs)

Consider a signature consisting of Atoms = {a} and Feats = {f, g}. Then M1 = 4 a is an AVM by the first clause of the definition, M2 = 2 [ ] is an empty AVM by the second clause, M3 = 3 [f : 4 a] is an AVM by the second clause (using M1 as the value of f, so that fval(M3, f) = M1), and

M4 = 2 [g : 3 [f : 4 a], f : 2 [ ]]

is an AVM by the second clause, as is

M5 = 4 [g : 3 [f : 4 a], f : 2 [ ]]



AVMs

Meta-variables M, with or without subscripts, range over Avms; the parameter S is omitted when it is clear from the context.

The domain of an AVM M, denoted dom(M), is undefined when M is atomic, and {f1, . . . , fn} when M is complex (hence, dom(M) is empty for an empty AVM).

The value of some feature f ∈ Feats in M, denoted fval(M, f), is defined if f = fi ∈ dom(M), in which case it is Mi, and undefined otherwise.



Sub-AVMs

Definition (Sub-AVMs)

Given an AVM M, its sub-AVMs are SubAVM(M), defined as:

1 SubAVM(Xa) = {Xa}

2 $\mathrm{SubAVM}(X[f_1 : M_1, \ldots, f_n : M_n]) = \{X[f_1 : M_1, \ldots, f_n : M_n]\} \cup \bigcup_{1 \le i \le n} \mathrm{SubAVM}(M_i)$



AVMs

Definition (Tags)

Given an AVM M, its tags Tags(M) are defined as:

1 Tags(Xa) = {X}

2 $\mathrm{Tags}(X[f_1 : M_1, \ldots, f_n : M_n]) = \{X\} \cup \bigcup_{1 \le i \le n} \mathrm{Tags}(M_i)$

Definition (Tagset)

The tagset of an AVM M and a tag X ∈ Tags(M) is the set of sub-AVMs of M (including M itself) which are tagged by X: TagSet(M, X) = {M′ ∈ SubAVM(M) | tag(M′) = X}.



AVMs

Example (AVMs)

Let:

M4 = 2 [g : 3 [f : 4 a], f : 2 [ ]]

fval(M4, f) = 2 [ ]. Observe that Tags(M4) = {2, 3, 4}. Also, TagSet(M4, 4) is {4 a}, TagSet(M4, 3) is {3 [f : 4 a]} and TagSet(M4, 2) is {M4, 2 [ ]}. Trivially, tag(M4) = 2.



AVMs

Example (AVMs)

Let:

M5 = 4 [g : 3 [f : 4 a], f : 2 [ ]]

Similarly, fval(M5, f) = 2 [ ], whereas fval(M5, g) = 3 [f : 4 a]. Observe that Tags(M5) = {2, 3, 4}. Also TagSet(M5, 2) = {2 [ ]}, TagSet(M5, 3) = {3 [f : 4 a]} and TagSet(M5, 4) = {M5, 4 a}. Trivially, tag(M5) = 4.



AVMs

Example (AVMs)

As another example, consider the AVM

M6 = 1 [f : 1 [f : 1 [f : 1 [ ]]]]

Here, Tags(M6) = {1}, and TagSet(M6, 1) is:

{M6, 1 [f : 1 [f : 1 [ ]]], 1 [f : 1 [ ]], 1 [ ]}

Of course, tag(M6) = 1.



Well-formed AVMs

Consider some AVM M = 1 [f1 : 2 M1, f2 : 2 M2] where M1 ≠ M2.

Both M1 and M2 are sub-AVMs of M, and both have the same tag, although they are different.

In other words, the recursive definition of AVMs allows two different, contradicting AVMs to be in the TagSet of the same variable.

To eliminate such cases, we define well-formed AVMs as follows:

Definition (Well-formed AVMs)

An AVM M is well-formed iff for every variable X ∈ Tags(M), TagSet(M, X) includes at most one non-empty AVM.
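The definitions of sub-AVMs, tagsets and well-formedness can be tried out on small examples. The Python sketch below is an illustration under stated assumptions: an AVM is encoded as a pair (tag, body), where the body is a string for an atomic AVM and a dict from features to AVMs for a complex one, so that (tag, {}) is an empty AVM. The function names are hypothetical.

```python
def sub_avms(m):
    """All sub-AVMs of m, including m itself (with repetitions)."""
    _tag, body = m
    result = [m]
    if isinstance(body, dict):
        for value in body.values():
            result.extend(sub_avms(value))
    return result


def tag_set(m, tag):
    """The distinct sub-AVMs of m that carry the given tag."""
    found = []
    for s in sub_avms(m):
        if s[0] == tag and s not in found:
            found.append(s)
    return found


def is_well_formed(m):
    """Every tag is associated with at most one non-empty AVM."""
    tags = {s[0] for s in sub_avms(m)}
    return all(
        sum(1 for s in tag_set(m, t) if s[1] != {}) <= 1
        for t in tags
    )


# Tag 2 occurs twice, but only once with a non-empty value.
M4 = (2, {"g": (3, {"f": (4, "a")}), "f": (2, {})})
# Tag 4 is attached both to the whole AVM and to the atom a.
M5 = (4, {"g": (3, {"f": (4, "a")}), "f": (2, {})})

print(is_well_formed(M4), is_well_formed(M5))
```

Under this encoding the first AVM passes the test, whereas the second does not, because the tagset of 4 contains two different non-empty AVMs.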



Reentrancy

Example (A reentrant AVM)

The following AVM is reentrant but not cyclic:

0 [agr : 1 [num : 2 pl, pers : 3 third], subj : 4 [agr : 1]]



Conventions

We introduce three conventions regarding the depiction of well-formed AVMs, motivated by the fact that variables are used primarily to indicate value sharing.

If a variable occurs more than once then its value is explicated only once; where this value is explicated (i.e., next to which occurrence of the variable) is immaterial.

Variables which occur only once can be omitted.

The empty AVM is sometimes omitted when it is associated with a variable.

The first convention is crucial in the case of cyclic AVMs: there is no finite representation of cyclic AVMs unless this convention is adopted.



Conventions

Example (Shorthand notation for AVMs)

Consider the following AVM:

6 [f : 3 [ ], g : 4 [h : 3 a], h : 2 [ ]]

Notice that it is well-formed, since the only variable occurring more than once (3) is associated with a non-empty value (a) only once.

We can therefore leave only one occurrence of the value explicit.

The tag 2 is associated with the empty feature structure, which can be omitted.

Finally, the tags 4 and 6 occur only once, so they can be omitted.

This is the conventional form of the AVM.



AVM equivalence

Example (AVM equivalence)

M1 and M2 differ only in the instance of 0 whose value is explicated:

M1 = 0

agr : 1

[


]

subj : 4[

agr : 1]

M2 = 0

agr : 1

subj : 4

[

agr : 1

[


]]

Then M1 � M2 and M2 � M1.



AVM equivalence

Example (AVM renamings)

The following two AVMs are renamings of each other:

M1 = 0

agr : 1

[


]

subj : 4[

agr : 1]

M2 = 10

agr : 11

subj : 14

[

agr : 11

[


]]


Feature structures The correspondence between feature graphs and AVMs

The correspondence between feature graphs and AVMs

AVMs are the entities that the linguistic literature employs to depict feature structures; feature graphs are well-understood mathematical entities to which various results of graph theory can be applied.

We define the relationship between these two views.



From AVMs to feature graphs

Example (AVM to graph mapping)

A reentrant AVM and its feature graph image:

M = 0

agr : 1

[


]

subj : 4[

agr : 1]

φ(M) = [Figure: 0 -agr-> 1; 0 -subj-> 4; 4 -agr-> 1; 1 -num-> 2 (marked pl); 1 -pers-> 3 (marked third)]


Feature structures Feature structures in a broader context

Feature structures in a broader context

Feature structures are utilized by many grammatical formalisms to encode different kinds of linguistic information: they serve in representing phonological, morphological, syntactic and semantic knowledge.

But the use of feature structures is not limited to computational linguistics; indeed, they are present in other areas of computer science as well.




A somewhat degenerate form of feature structures is utilized by many programming languages: records (as in Pascal, known as structures in C).

There are some major differences between records and feature structures.

The notion of sharing that is central to feature structures is less significant for records.

The values of record fields are not necessarily other records; different data types can be freely used, hence transfer of values is mediated through explicit assignments, not unifications.

Unification-based formalisms usually do not allow such a diversity of operations to apply to feature structures as programming languages allow to records.

In particular, arithmetic operations are usually not applicable to feature structures’ values, while they are very natural to numeric records’ fields.




Logic programming languages such as Prolog manipulate first-order terms (FOTs), which might be viewed as a special case of feature structures.

There are some important differences between feature structures and FOTs.

FOTs are essentially trees, with possibly shared leaves, whereas feature structures allow reentrancies to occur at every level of the structure.

Feature structures can be cyclic, in contrast to (ordinary) FOTs.

FOTs use positional encoding of argument structures, with no features.

Two FOTs are unifiable only if they have the same functor and the same arity, while two feature structures might be unifiable even if they have different numbers of features.


Unification Motivation

Unification

The subsumption relation compares the information content of feature structures.

Unification combines the information that is contained in two (compatible) feature structures.

We use the term ‘unification’ to refer to both the operation and its result. Whenever two feature structures are related, they are assumed to be over the same signature.



Unification

The mathematical interpretation of “combining” two members of a partially ordered set is to take the least upper bound of the two operands with respect to the partial order; in our case, subsumption.

Indeed, feature structure unification is exactly that.

However, since subsumption is antisymmetric for feature structures and AFSs but not for feature graphs and AVMs, a unique least upper bound cannot be guaranteed for all four views.


Unification Feature structure unification

Feature structure unification

Definition (Feature structure unification)

Two feature structures fs1 and fs2 are consistent if they have an upper bound (with respect to subsumption), and inconsistent otherwise. If fs1 and fs2 are consistent, their unification, denoted fs1 ⊔ fs2, is their least upper bound with respect to subsumption.

If two feature structures have an upper bound, they have a (unique) least upper bound.


Unification Feature graph unification

Feature graph unification

While the definition of unification as least upper bound is useful mathematically, it does not tell us how to compute the unification of two given feature structures.

To this end, we provide a constructive definition in terms of feature graphs, which induces an algorithm for computing unification.

For reasons that will be clear presently, we require that the two feature graphs be node-disjoint.




Definition

Let A = 〈QA, qA, δA, θA〉 and B = 〈QB, qB, δB, θB〉 be two feature graphs with QA ∩ QB = ∅. Let ‘u≈’ be the least equivalence relation on QA ∪ QB such that:

qA u≈ qB; and

for every q1, q2 ∈ QA ∪ QB and f ∈ Feats, if q1 u≈ q2, (δA ∪ δB)(q1, f)↓ and (δA ∪ δB)(q2, f)↓, then (δA ∪ δB)(q1, f) u≈ (δA ∪ δB)(q2, f).




The ‘u≈’ relation partitions the nodes of QA ∪ QB into equivalence classes such that both roots are in the same class, and if some feature is defined for two nodes in one class, then the two nodes this feature leads to are also in one (possibly different) class.

Clearly, the number of equivalence classes (called the index of ‘u≈’) is finite.

The requirement that QA and QB be disjoint is essential here: we would want two nodes to be in the same equivalence class with respect to ‘u≈’ only if they comply with the above definition; if we allowed a non-empty intersection of nodes, ‘u≈’ could have been a different relation.



The ‘u≈’ relation

[Figure: two feature graphs, A (nodes qA0, qA1, qA2; atom sg) and B (nodes qB0, qB1, qB2; atom 3rd), with arcs labeled f, g, num and pers.]



The ‘u≈’ relation

[Figure: two feature graphs, A (nodes qA0, qA1, qA2) and B (nodes qB0, qB1), with arcs labeled f, g and h.]



Type-respecting relation

Definition

A binary relation ‘≈’ over the nodes of two feature structures QA ∪ QB is said to be type respecting iff for every node q ∈ QA ∪ QB, if (θA ∪ θB)(q)↓ and (θA ∪ θB)(q) = a, then for every node q′ such that q ≈ q′, q′ is a sink and either (θA ∪ θB)(q′)↑ or (θA ∪ θB)(q′) = a.



Type-respecting relation

When is ‘u≈’ not type respecting?

The above condition can hold for a node q ∈ QA ∪ QB only if (θA ∪ θB)(q)↓; that is, q must be a sink in either A or B.

The type respecting condition requires that all nodes that are equivalent to q be sinks, either unmarked or marked by the same atom.

Since this is the only requirement, the relation is not type respecting if it maps two nodes, one of which is a marked sink and the other of which is either a non-sink or a sink with a different label, to the same equivalence class.

A non-type respecting ‘u≈’ is the only source for unification failure.



Type respecting ‘u≈’ relation

[Figure: two feature graphs, A (nodes qA0, qA1, qA2; atom sg) and B (nodes qB0, qB1, qB2; atom 3rd), with arcs labeled f, g, num and pers.]




Lemma

If A and B have a common upper bound C, such that A ⊑ C through the morphism hA and B ⊑ C through the morphism hB, and if qA ∈ QA and qB ∈ QB are such that qA u≈ qB, then hA(qA) = hB(qB).




Definition (Feature graph unification)

Let A and B be two feature graphs such that QA and QB are disjoint. The unification of A and B, denoted A ⊔ B, is defined only if ‘u≈’ is type respecting, in which case it is the feature graph 〈Q, q, δ, θ〉, where:

Q = {[q]u≈ | q ∈ (QA ∪ QB)}

q = [qA]u≈ (= [qB]u≈), where qA and qB are the roots of A and B

δ([q]u≈, f) = [q′′]u≈ if there exists q′ ∈ [q]u≈ s.t. (δA ∪ δB)(q′, f) = q′′; it is undefined if (δA ∪ δB)(q′, f)↑ for all q′ ∈ [q]u≈

θ([q]u≈) = (θA ∪ θB)(q′) if there exists q′ ∈ [q]u≈ s.t. (θA ∪ θB)(q′)↓; it is undefined if (θA ∪ θB)(q′)↑ for all q′ ∈ [q]u≈

If ‘u≈’ is not type respecting, A and B are inconsistent.
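The constructive definition can be turned into a short program. The following is only a sketch under stated assumptions: the class name FeatureGraph, the dictionary encoding of δ and θ, and the function unify are illustrative choices, not part of the original material. The least equivalence relation is computed with a union-find structure: the two roots are merged first, and whenever two merged nodes both define a feature, their targets are merged as well.

```python
from dataclasses import dataclass, field


@dataclass
class FeatureGraph:
    # Assumed encoding: nodes are strings, delta and theta are dictionaries.
    root: str
    delta: dict = field(default_factory=dict)  # (node, feature) -> node
    theta: dict = field(default_factory=dict)  # marked sink -> atom

    def nodes(self):
        found = {self.root} | set(self.theta)
        for (src, _feat), dst in self.delta.items():
            found.update((src, dst))
        return found


def unify(a, b):
    """Return a feature graph for A ⊔ B, or None if A and B are inconsistent."""
    if a.nodes() & b.nodes():
        raise ValueError("the node sets of the two graphs must be disjoint")
    theta = {**a.theta, **b.theta}
    parent = {q: q for q in a.nodes() | b.nodes()}

    def find(q):  # representative of q's equivalence class
        while parent[q] != q:
            parent[q] = parent[parent[q]]
            q = parent[q]
        return q

    # arcs[r] holds the outgoing arcs of the class represented by r.
    arcs = {}
    for (src, feat), dst in {**a.delta, **b.delta}.items():
        arcs.setdefault(src, {})[feat] = dst

    # Close the relation: the roots are equivalent, and equivalent nodes
    # that both define a feature have equivalent targets.
    pending = [(a.root, b.root)]
    while pending:
        x, y = (find(q) for q in pending.pop())
        if x == y:
            continue
        parent[y] = x
        merged = arcs.setdefault(x, {})
        for feat, target in arcs.pop(y, {}).items():
            if feat in merged:
                pending.append((merged[feat], target))
            else:
                merged[feat] = target

    # Type-respecting check: a class that contains a marked sink may contain
    # only sinks, and all of its marks must agree.
    atoms = {}
    for q, atom in theta.items():
        r = find(q)
        if arcs.get(r) or atoms.setdefault(r, atom) != atom:
            return None

    delta = {(r, f): find(t) for r, out in arcs.items() for f, t in out.items()}
    return FeatureGraph(find(a.root), delta, atoms)


# Hypothetical example graphs (not taken from the slides).
a = FeatureGraph("a0", {("a0", "agr"): "a1", ("a1", "num"): "a2"}, {"a2": "sg"})
b = FeatureGraph("b0", {("b0", "agr"): "b1", ("b1", "pers"): "b2"}, {"b2": "3rd"})
c = unify(a, b)
sg = FeatureGraph("x0", {("x0", "num"): "x1"}, {"x1": "sg"})
pl = FeatureGraph("y0", {("y0", "num"): "y1"}, {"y1": "pl"})
clash = unify(sg, pl)
```

Here each equivalence class is named by one of its members rather than by the class itself, which is harmless because feature graphs are only of interest up to isomorphism. In the example, c combines the num and pers information under a single agr node, while clash is None because the atoms sg and pl end up in the same class.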




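[Figure: the feature graphs A (nodes qA0, qA1, qA2; atom sg) and B (nodes qB0, qB1, qB2; atom 3rd), together with the result of their unification, with arcs labeled f, g, num and pers.]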



Unification

To see that the result of unification is indeed a feature graph, observe that

〈Q, q, δ, θ〉 is connected because both A and B are connected;

it is finite since both A and B are (and hence the number of equivalence classes is finite);

and θ labels only sinks, since ‘u≈’ is type respecting.



Unification

Example (Unification combines information)

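[Figure: the graph with root q0 and an arc num to q1 (marked sg), unified with the graph with root q3 and an arc pers to q5 (marked 3rd), yields the graph with root q6, an arc num to q7 (marked sg) and an arc pers to q8 (marked 3rd).]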



Unification

Example (Unification is absorbing)

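[Figure: the graph with root q0 and an arc num to q1 (marked sg), unified with the graph with root q3, an arc num to q4 (marked sg) and an arc pers to q5 (marked 3rd), yields the graph with root q6, an arc num to q7 (marked sg) and an arc pers to q8 (marked 3rd).]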



Unification with reentrancies

[Figure: unification of two feature graphs with arcs labeled subj, obj, num and pers and sinks marked sg and 3rd, in which reentrancies are involved.]



Unification

Theorem

If A and B are inconsistent, they do not have a common upper bound. Otherwise, C = A ⊔ B is a minimal upper bound of A and B with respect to (feature graph) subsumption.



Unification

The previous theorem connects feature graph unification with feature structure unification.

In order to compute fs = fs1 ⊔ fs2, simply compute A = A1 ⊔ A2, where A1 ∈ fs1 and A2 ∈ fs2, and take fs = [A]∼.

Theorem

For all feature graphs A1,A2, if A = A1 ⊔ A2 then [A]∼ = [A1]∼⊔[A2]∼.


Unification Generalization

Generalization

Unification is an information-combining operator: when two feature structures are compatible, their unification can be informally seen as a union of the information both structures encode.

Sometimes, however, a dual operation is useful, analogous to the intersection of the information encoded in feature structures.

This operation, which is much less frequently used in computational linguistics, is referred to as anti-unification, or generalization.



Generalization

Defined over pairs of feature structures, generalization (denoted ⊓) is the operation that returns the most specific (or least general) feature structure that is still more general than both arguments.

In terms of the subsumption ordering, generalization is the greatest lower bound (glb) of two feature structures.

Unlike unification, generalization can never fail.



Generalization

Definition (Generalization)

The generalization (or anti-unification) of two feature structures fs1 and fs2, denoted fs1 ⊓ fs2, is the greatest lower bound of fs1 and fs2.



Generalization

Example (Generalization)

Generalization reduces information:

[num : sg] ⊓ [pers : third] = [ ]

Different atoms are inconsistent:

[num : sg] ⊓ [num : pl] = [num : [ ]]

Generalization is restricting:

[

num : sg]

⊓

[


]

=[

num : sg]



Generalization

Example (Generalization)

Empty feature structures are zero elements:

[ ] ⊓ [agr : [num : sg]] = [ ]

Reentrancies can be lost:

[f : 1 [num : sg], g : 1] ⊓ [f : [num : sg], g : [num : sg]] = [f : [num : sg], g : [num : sg]]
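For feature structures without reentrancies, generalization can be computed by a simple recursion. The sketch below assumes an encoding that is not part of the original material: an atom is a Python string, a complex feature structure is a dictionary from features to values, and the empty dictionary stands for [ ]. Because reentrancies cannot be expressed in this encoding, the sketch does not cover the last example above.

```python
def generalize(fs1, fs2):
    """Greatest lower bound of two reentrancy-free feature structures.

    Assumed encoding: an atom is a str, a complex value is a dict from
    features to values, and {} is the empty feature structure.
    """
    if isinstance(fs1, dict) and isinstance(fs2, dict):
        # Keep only the features that both arguments define.
        return {f: generalize(fs1[f], fs2[f]) for f in fs1.keys() & fs2.keys()}
    if fs1 == fs2:
        return fs1  # identical atoms are preserved
    return {}  # different atoms, or an atom against a complex value


reduces = generalize({"num": "sg"}, {"pers": "third"})
atoms_differ = generalize({"num": "sg"}, {"num": "pl"})
zero = generalize({}, {"agr": {"num": "sg"}})
```

Unlike unification, the function has no failure case: in the worst case it returns the empty feature structure.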


Unification grammars Introduction

Unification grammars

Feature structures are the building blocks with which unification grammars are built, as they serve as the counterpart of the terminal and non-terminal symbols in CFGs.

In order to define grammars and derivations, one needs some extension of feature structures to sequences thereof.

Multi-rooted feature structures are aimed at capturing complex, ordered information and are used for representing rules and sentential forms of unification grammars.

Multi-rooted feature graphs, a natural extension of feature graphs

Multi-rooted feature structures, which are equivalence classes of isomorphic multi-rooted feature graphs

Multi-AVMs, which are an extension of AVMs, and show how they correspond to multi-rooted graphs.




Unification in context

Forms and grammar rules

Derivation

Languages

Derivation trees


Unification grammars Multi-rooted feature graphs

Multi-rooted feature graphs

We extend feature graphs to multi-rooted feature graphs (MRGs).

Multi-rooted feature graphs are defined over the same signature (Feats and Atoms), which is assumed to be fixed

Definition (Multi-rooted feature graphs)

A multi-rooted feature graph (MRG) is a pair 〈R, G〉 where G = 〈Q, δ, θ〉 is a finite, directed, labeled graph consisting of a finite set Q of nodes (disjoint from Feats and Atoms), a partial function δ : Q × Feats → Q specifying the arcs and a labeling function θ marking some of the sinks, and where R is an ordered list of distinguished nodes in Q called roots. G is not necessarily connected, but the union of all the nodes reachable from all the roots in R is required to yield exactly Q. The length of an MRG is the number of its roots, |R|. λ denotes the empty MRG, where Q = ∅; in every non-empty MRG, Q is non-empty.




Example (Multi-rooted feature graphs)

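The following is an MRG, in which the shaded nodes (ordered from left to right) constitute the list of roots, R

[Figure: an MRG with roots q1, q2 and q3; three cat arcs lead to the sinks q4 (marked s), q5 (marked np) and q6 (marked vp), and two agr arcs lead to q7.]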




A multi-rooted feature graph is a directed, not necessarily connected, labeled graph with a designated sequence of nodes called roots

It is a natural extension of feature graphs, the only difference being that the single root of a feature graph is extended here to a list in order to model the required structured information

Meta-variables ~A range over MRGs, and Q, δ, θ and R range over their constituents

We do not distinguish between an MRG of length 1 and a feature graph




Natural relations can be defined between MRGs and feature graphs

First, note that if ~A = 〈R, G〉 is an MRG and qi is a root in R then qi naturally induces a feature graph ~A|i = 〈Qi, qi, δi, θi〉, where:

Qi is the set of nodes reachable from qi

δi = δ|Qi (the restriction of δ to Qi)

θi = θ|Qi (the restriction of θ to Qi).




One can view an MRG ~A = 〈R, G〉 as an ordered sequence 〈A1, . . . , An〉 of (not necessarily disjoint) feature graphs, where Ai = ~A|i for 1 ≤ i ≤ n

Note that such an ordered list of feature structures is not a sequence in the mathematical sense:

removing a node accessible from one root can result in this node being removed from the graph accessible from some other root



Subgraphs

Although MRGs are not element-disjoint sequences, it is possible to define substructures of them

The roots of an MRG form a sequence of nodes

Taking just a subsequence of the roots, and considering only the subgraph they induce (that is, the nodes that are accessible from these roots), a notion of substructure is naturally obtained



Subgraphs

Definition (Induced subgraphs)

The subgraph of a non-empty MRG ~A = 〈R, G〉 of length n, induced by j, k and denoted ~Aj...k, is defined only if 1 ≤ j ≤ k ≤ n, in which case it is the MRG 〈R′, G′〉 where R′ = 〈qj, . . . , qk〉, G′ = 〈Q′, δ′, θ′〉 and

Q′ = {q | δ(q′, π) = q for some q′ ∈ R′ and some π}

δ′(q, f) = δ(q, f) for every q ∈ Q′

θ′(q) = θ(q) for every q ∈ Q′

When the sequence is of length 1 we write ~Ai for ~Ai...i. As we identify a feature graph with an MRG of length 1, ~Ai = ~A|i.
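Computing an induced subgraph amounts to a reachability search from the selected roots. The following sketch makes several assumptions that are not in the original material: an MRG is passed as a list of roots plus two dictionaries for δ and θ, the function name induced_subgraph is hypothetical, and the example graph is an assumed MRG in which two roots share their agr value.

```python
def induced_subgraph(roots, delta, theta, j, k):
    """Return (roots', delta', theta') for roots j..k (1-based), or None if undefined."""
    if not (1 <= j <= k <= len(roots)):
        return None
    new_roots = roots[j - 1:k]
    reachable, stack = set(new_roots), list(new_roots)
    while stack:  # collect every node accessible from the selected roots
        q = stack.pop()
        for (src, _feat), dst in delta.items():
            if src == q and dst not in reachable:
                reachable.add(dst)
                stack.append(dst)
    new_delta = {(q, f): t for (q, f), t in delta.items() if q in reachable}
    new_theta = {q: a for q, a in theta.items() if q in reachable}
    return new_roots, new_delta, new_theta


# Hypothetical MRG: three roots, where q2 and q3 share their agr value q7.
roots = ["q1", "q2", "q3"]
delta = {("q1", "cat"): "q4", ("q2", "cat"): "q5", ("q3", "cat"): "q6",
         ("q2", "agr"): "q7", ("q3", "agr"): "q7"}
theta = {"q4": "s", "q5": "np", "q6": "vp"}

first = induced_subgraph(roots, delta, theta, 1, 1)
last_two = induced_subgraph(roots, delta, theta, 2, 3)
```

With j = k the result is the feature graph induced by a single root. In last_two the shared node q7 is kept once, so the reentrancy between the two remaining roots survives.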



MRGs

Since MRGs are a natural extension of feature graphs, many of the concepts defined for the latter can be extended to the former

The transition function δ is extended from single features to paths

The set of paths of an MRG

The function val, associating a value with each path in a feature graph, is extended to MRGs

Reentrancy and cyclicity

Isomorphism and subsumption



MRG paths

Definition (MRG paths)

The paths of a multi-rooted feature graph ~A are

Π(~A) = {〈i, π〉 | π ∈ Paths and δ(qi, π)↓}



MRG path values

Definition (Path value)

The value of a path 〈i, π〉 in an MRG ~A, denoted by val~A(〈i, π〉), is defined if and only if δ~A(qi, π)↓, in which case it is the feature graph val~A|i(π).

Note that the value of a path in an MRG is a (single-rooted) feature graph, not an MRG. In particular, val~A(〈i, π〉) may include nodes which are roots in ~A but are not the root of the resulting feature graph. Clearly, an MRG may have two paths 〈i1, π1〉 and 〈i2, π2〉 where π1 = π2 even though i1 ≠ i2.



MRG path values

Example (Path value)

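[Figure: on the left, an MRG ~A with R = 〈q0, q1, q2〉, further nodes q3, q4, q5 and sinks q6 (marked a) and q7 (marked b), with arcs labeled f, g and h. On the right, val~A(〈2, 〈f〉〉): the feature graph rooted at q4, containing q6 (marked a) and q7 (marked b), with arcs labeled g and h.]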



MRG reentrancy

Two MRG paths are reentrant, denoted 〈i, π1〉 ↭_{~A} 〈j, π2〉, if they share the same value: δ~A(qi, π1) = δ~A(qj, π2)

A multi-rooted feature graph is reentrant if it has two distinct paths (possibly leaving different roots) that are reentrant

An MRG ~A is cyclic if two paths 〈i, π1〉, 〈i, π2〉 ∈ Π(~A), where π1 is a proper subsequence of π2, are reentrant: 〈i, π1〉 ↭_{~A} 〈i, π2〉

Here, the two paths must have the same index i, although they may “pass through” elements of ~A other than the i-th one



A cyclic MRG

Example (A cyclic MRG)

The following MRG ~A = 〈R ,G 〉, where R = 〈q0, q1, q2〉, is cyclic:

[Figure: an MRG with roots q0, q1, q2 and further nodes q3 to q7, with arcs labeled f, g and h, some of which form a cycle.]



Multi-rooted feature graph isomorphism

Definition (Multi-rooted feature graph isomorphism)

Two MRGs ~A1 = 〈R1, G1〉 and ~A2 = 〈R2, G2〉 are isomorphic, denoted ~A1 ~∼ ~A2, iff they are of the same length, n, and there exists a one-to-one mapping i : Q1 → Q2, called an isomorphism, such that:

i(q1j) = q2j for all 1 ≤ j ≤ n;

for all q1, q2 ∈ Q1 and f ∈ Feats, δ1(q1, f) = q2 iff δ2(i(q1), f) = i(q2); and

for all q ∈ Q1, θ1(q) = θ2(i(q)) (either both are undefined, or both are defined and equal).



Subsumption of multi-rooted feature graphs

Definition (Subsumption of multi-rooted feature graphs)

An MRG ~A = 〈R, G〉 subsumes an MRG ~A′ = 〈R′, G′〉, denoted ~A ~⊑ ~A′, if |R| = |R′| and there exists a total function h : Q → Q′ such that:

for every root qi ∈ R, h(qi) = q′i

for every q ∈ Q and f ∈ Feats, if δ(q, f)↓ then h(δ(q, f)) = δ′(h(q), f)

for every q ∈ Q, if θ(q)↓ then θ(q) = θ′(h(q))

The only difference from feature graph subsumption is that h is required to map each of the roots in R to its corresponding root in R′. Notice that in order for two MRGs to be related by subsumption they must be of the same length.
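Since every node of an MRG is reachable from some root, the function h, if it exists, is fully determined by the requirement on the roots. This suggests a simple test, sketched below under assumptions that are not part of the original material: an MRG is encoded as a triple (roots, delta, theta) of a list and two dictionaries, and the function name subsumes is hypothetical. The two example MRGs are assumed graphs in which three roots carry cat and agr features.

```python
def subsumes(mrg_a, mrg_b):
    """Return True iff mrg_a subsumes mrg_b; each MRG is (roots, delta, theta)."""
    roots_a, delta_a, theta_a = mrg_a
    roots_b, delta_b, theta_b = mrg_b
    if len(roots_a) != len(roots_b):
        return False
    h = {}
    stack = list(zip(roots_a, roots_b))  # h must map the i-th root to the i-th root
    while stack:
        q, image = stack.pop()
        if q in h:
            if h[q] != image:  # h would have to send q to two different nodes
                return False
            continue
        h[q] = image
        if q in theta_a and theta_b.get(image) != theta_a[q]:
            return False  # atoms must be preserved
        for (src, feat), dst in delta_a.items():
            if src == q:
                if (image, feat) not in delta_b:
                    return False  # arcs must be preserved
                stack.append((dst, delta_b[(image, feat)]))
    return True


# Hypothetical MRGs: in `shared` all three roots share one agr value,
# in `separate` the third root has an agr value of its own.
common = {("r1", "cat"): "c1", ("r2", "cat"): "c2", ("r3", "cat"): "c3",
          ("r1", "agr"): "g4", ("r2", "agr"): "g4",
          ("g4", "num"): "n4", ("g4", "pers"): "p4"}
atoms = {"c1": "np", "c2": "vp", "c3": "np", "n4": "sg", "p4": "3rd"}
roots = ["r1", "r2", "r3"]
separate = (roots,
            {**common, ("r3", "agr"): "g6", ("g6", "num"): "n6", ("g6", "pers"): "p6"},
            {**atoms, "n6": "sg", "p6": "3rd"})
shared = (roots, {**common, ("r3", "agr"): "g4"}, atoms)

forward = subsumes(separate, shared)
backward = subsumes(shared, separate)
```

Here forward is True and backward is False: the reentrancy across roots in shared cannot be mapped onto the two distinct agr nodes of separate.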




Example (MRG subsumption)

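Feature graph subsumption can have three different effects: if A ⊑ B, then B can have additional arcs, additional reentrancies or more marked atoms. The same holds for MRGs, with the observation that additional reentrancies can now occur among paths that originate at different roots:

[Figure: two MRGs with arcs labeled f and g. The first subsumes the second, but not vice versa, since the second has an additional reentrancy between paths that originate at different roots.]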




Example (MRG subsumption)

Let ~A and ~A′ be the following two MRGs. Then ~A~⊑~A′ but not ~A′~⊑~A.

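[Figure: ~A has three roots with cat values np, vp and np, and two separate agr values, each with num : sg and pers : 3rd. ~A′ has the same three roots, but all of them share a single agr value with num : sg and pers : 3rd.]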


Unification grammars Multi-AVMs

Multi-AVMs

Definition

Given a signature S, a multi-AVM (MAVM) of length n ≥ 0 is a sequence 〈M1, . . . , Mn〉 such that for each i, 1 ≤ i ≤ n, Mi is an AVM over the signature.



Multi-AVMs

Meta-variables ~M range over multi-AVMs

The sub-AVMs of ~M are SubAVM(~M) = ⋃_{1≤i≤n} SubAVM(Mi)

Similarly to what we did for AVMs, we define the set of tags occurring in a multi-AVM ~M as Tags(~M)

Note that if ~M = 〈M1, . . . , Mn〉 then Tags(~M) = ⋃_{1≤i≤n} Tags(Mi) (where the union is not necessarily disjoint)

Also, the set of sub-AVMs of ~M (including ~M itself) which are tagged by the same variable X is TagSet(~M, X)

Here, too, TagSet(~M, X) = ⋃_{1≤i≤n} TagSet(Mi, X)

We usually do not distinguish between a multi-AVM of length 1 and an AVM

When depicting MAVMs graphically, we sometimes suppress the angular brackets which enclose the sequence of AVMs.



Multi-AVMs

Well-formedness and variable association are extended from AVMs to MAVMs in the natural way:

Definition (Well-formed MAVMs)

A multi-AVM ~M is well-formed iff for every variable X ∈ Tags(~M), TagSet(~M, X) includes at most one non-empty AVM.

Definition (Variable association)

The association of a variable X in ~M, denoted assoc(~M, X), is the single non-empty AVM in TagSet(~M, X); if all the members of TagSet(~M, X) are empty, then assoc(~M, X) = X [ ].



Multi-AVMs

Example (Multi-AVMs)

Consider the following multi-AVM ~M, whose length is 3:

〈 2 [f : 9 [h : 1 [ ]]] , 1 [f : 8 [g : 7 a, h : 2 [ ]]] , 6 [f : 5 [h : 2 [ ]]] 〉

Tags(~M) = {1, 2, 5, 6, 7, 8, 9}. ~M is well-formed:

TagSet(~M, 1) = { 1 [ ] , 1 [f : 8 [g : 7 a, h : 2 [ ]]] }

TagSet(~M, 2) = { 2 [ ] , 2 [f : 9 [h : 1 [ ]]] }

Therefore,

assoc(~M, 1) = 1 [f : 8 [g : 7 a, h : 2 [ ]]], assoc(~M, 2) = 2 [f : 9 [h : 1 [ ]]]



Multi-AVMs

The same variable can tag different sub-AVMs of different elements in the sequence

In other words, the scope of variables is extended from single AVMs to multi-AVMs

This leads to an interpretation of variables (in multi-AVMs) which hampers the view of multi-AVMs as sequences of AVMs

Recall that we interpret multiple occurrences of the same variable within a single AVM as denoting value sharing; hence the definition of well-formed AVMs, and the convention that when a variable occurs more than once in an AVM, its association can be stipulated next to any of its occurrences

As in the other views, when multi-AVMs are concerned, this convention implies that removing an element from a multi-AVM can affect other elements, in contradiction to the usual concept of sequences



MAVM subsumption

Definition (Multi-AVM subsumption)

Let ~M, ~M′ be two MAVMs of the same length n and over the same signature. ~M subsumes ~M′, denoted ~M ~⪯ ~M′, if the following conditions hold:

1 for all i, 1 ≤ i ≤ n, Mi ⪯ M′i;

2 if 〈i, π1〉 ↭_{~M} 〈j, π2〉 then 〈i, π1〉 ↭_{~M′} 〈j, π2〉.



MAVM subsumption

Example (MAVM subsumption)

Let ~M and ~M′ be the following two MAVMs (of length 3):

~M : 1 [cat : np, agr : 4]   2 [cat : vp, agr : 4 [num : sg, pers : 3rd]]   3 [cat : np, agr : 6 [num : sg, pers : 3rd]]

~M′ : 1 [cat : np, agr : 4]   2 [cat : vp, agr : 4 [num : sg, pers : 3rd]]   3 [cat : np, agr : 4]

Then ~M ~⪯ ~M′ but not ~M′ ~⪯ ~M.



MAVM subsumption

The second clause of the definition may seem redundant: if for all i, 1 ≤ i ≤ n, Mi ⪯ M′i, then in particular all the reentrancies of Mi are also reentrancies in M′i; why then is the second clause necessary?

The answer lies in the possibility of reentrancies across elements in multi-AVMs

Such reentrancies are a “global” property of multi-AVMs, which is not reflected in any of the elements in isolation



MAVM Renaming

Definition (Renaming)

Let ~M1 and ~M2 be two MAVMs. ~M2 is a renaming of ~M1, denoted ~M1 ~≃ ~M2, iff ~M1 ~⪯ ~M2 and ~M2 ~⪯ ~M1.



Multi-AVM to MRG mapping

Definition (Multi-AVM to MRG mapping)

Let ~M = 〈M1, . . . , Mn〉 be a well-formed multi-AVM of length n. The MRG image of ~M is ϕ(~M) = 〈R, G〉, with R = 〈q1, . . . , qn〉 and G = 〈Q, δ, θ〉, where:

Q = Tags(~M)

qi = tag(Mi) for 1 ≤ i ≤ n

for all X ∈ Tags(~M) and f ∈ Feats, δ(X, f) = Y if 〈X, f, Y〉 ∈ Arcs(~M), and

for all X ∈ Tags(~M) and a ∈ Atoms, θ(X) = a if assoc(~M, X) is the atomic AVM X(a), and is undefined otherwise.
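The mapping can be sketched as a traversal of the multi-AVM that records one node per tag. The encoding below is an assumption made for illustration only: an AVM is a pair (tag, value), where value is None for an empty AVM, a string for an atom, or a dictionary from features to AVMs; the function name mavm_to_mrg is hypothetical, and the input is assumed to be well-formed.

```python
def mavm_to_mrg(mavm):
    """Map a well-formed multi-AVM (a list of AVMs) to (roots, nodes, delta, theta)."""
    roots = [tag for tag, _value in mavm]  # the tags of the top-level AVMs
    nodes, delta, theta = set(), {}, {}

    def visit(avm):
        tag, value = avm
        nodes.add(tag)
        if isinstance(value, str):  # atomic AVM: mark the node
            theta[tag] = value
        elif isinstance(value, dict):  # complex AVM: one arc per feature
            for feat, sub in value.items():
                delta[(tag, feat)] = sub[0]
                visit(sub)

    for avm in mavm:
        visit(avm)
    return roots, nodes, delta, theta


# A multi-AVM of length 3 in the assumed encoding.
m = [
    (2, {"f": (9, {"h": (1, None)})}),
    (1, {"f": (8, {"g": (7, "a"), "h": (2, None)})}),
    (6, {"f": (5, {"h": (2, None)})}),
]
roots, nodes, delta, theta = mavm_to_mrg(m)
```

Because every occurrence of a tag contributes to the same node, value sharing in the multi-AVM turns into reentrancy in the graph without any extra bookkeeping.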




Example (Multi-AVM to multi-rooted feature graph mapping)

Consider the following multi-AVM ~M:

2 [f : 9 [h : 1 [ ]]]   1 [f : 8 [g : 7 a, h : 2 [ ]]]   6 [f : 5 [h : 2 [ ]]]

Observe that it is well-formed, as the variables that occur more than once (1 and 2) have only one non-empty occurrence each. The set of variables of ~M is Tags(~M) = {1, 2, 5, 6, 7, 8, 9}, which will also be the set of nodes Q in ϕ(~M). The sequence of roots R is the sequence of variables tagging the AVM elements of ~M, namely 〈2, 1, 6〉.




Example (Multi-AVM to multi-rooted feature graph mapping)

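The obtained graph is:

[Figure: the MRG with roots 2, 1, 6 and further nodes 9, 8, 5 and 7 (marked a). Its arcs are δ(2, f) = 9, δ(9, h) = 1, δ(1, f) = 8, δ(8, g) = 7, δ(8, h) = 2, δ(6, f) = 5 and δ(5, h) = 2.]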




Proposition

Let ~M, ~M1, ~M2 be multi-AVMs. Then:

Π(~M) = Π(ϕ(~M))

〈i, π1〉 ↭_{~M} 〈j, π2〉 iff 〈i, π1〉 ↭_{ϕ(~M)} 〈j, π2〉

~M1 ~⪯ ~M2 iff ϕ(~M1) ~⊑ ϕ(~M2)


Unification grammars Unification revisited

Unification revisited

We defined the unification operation for feature structures

We now extend the definition to multi-rooted structures; we define two variants of the operation:

one which unifies two same-length structures and produces their least upper bound with respect to subsumption

unification in context, which combines the information in two feature structures, each of which may be an element in a larger structure



Two AMRS unification operations

Example (Two AMRS unification operations)

[Figure: schematic of the two operations on sequences of feature structures; one panel is labeled “Same-length AMRS unification”.]




MRS unification

Example (MRS unification)

Let

σ = [cat : d, num : 4]   [cat : n, num : 4, case : nom]   [cat : v, num : 4]

ρ = [cat : d, num : pl]   [cat : n, num : pl, case : [ ]]   [cat : v, num : pl]

Then

σ ⊔ ρ = [cat : d, num : 4 pl]   [cat : n, num : 4, case : nom]   [cat : v, num : 4]




Example (Unification in context)

Let

σ = [f : 1 a, g : 2 [ ]]   [h : 2] ,   ρ = [f : 3 [ ], g : 4 b]   [h : 3]

Unifying the first element in σ with the first element in ρ in the contexts of σ and ρ, we obtain (σ, 1) ⊔ (ρ, 1) = (σ′, ρ′):

σ′ = [f : 1 a, g : 2 b]   [h : 2] ,   ρ′ = [f : 3 a, g : 4 b]   [h : 3]

Note that both operands of the unification are modified.




Theorem

If 〈σ′, ρ′〉 = (σ, i) ⊔ (ρ, j) then σ′i = ρ′j = σi ⊔ ρj .




Theorem

Let σ, ρ be two AMRSs and i, j be indexes such that i ≤ len(σ) and j ≤ len(ρ). Then 〈σ′, ρ′〉 = (σ, i) ⊔ (ρ, j) iff

σ′ = min~⊑ {σ′′ | σ ~⊑ σ′′ and ρj ⪯ σ′′i} and

ρ′ = min~⊑ {ρ′′ | ρ ~⊑ ρ′′ and σi ⪯ ρ′′j}.


Unification grammars Rules and grammars

Rules and grammars

Like context-free grammars, unification grammars are defined over an alphabet

As the grammars that are of most interest to us are of natural languages, and since sentences in natural languages are not just strings of symbols, but rather strings of words, we add to the signature an alphabet, a fixed set Words of words (in addition to the fixed sets Feats and Atoms)

Meta-variables wi, wj etc. are used to refer to elements of Words, and w to refer to strings over Words.



Rules and grammars

We also adopt here the distinction between phrasal and terminal rules

The former cannot have elements of Words in their bodies; the latter have only a single word as their body

We refer to the collection of terminal rules as the lexicon: it associates with terminals, members of Words, (abstract) feature structures that are their categories

For every word wi ∈ Words the lexicon specifies a finite set of abstract feature structures L(wi)

If L(wi) is a singleton then wi is unambiguous, and if it is empty then wi is not a member of the language defined by the lexicon.



Lexicon

Definition (Lexicon)

Given a signature of features Feats and atoms Atoms, and a set Words of terminal symbols, a lexicon is a finite-range function L : Words → 2^AFS(Feats,Atoms).



Lexicon

Example (Lexicon)

Following is a lexicon L over a signature consisting of Feats = {cat, num, case}, Atoms = {d, n, v, sg, pl}, and Words = {two, sheep, sleep}:

L(two) = {[cat : d, num : pl]}

L(sheep) = {[cat : n, num : [ ], case : [ ]]}

L(sleep) = {[cat : v, num : pl]}



Lexicon

Example (Lexicon)

As an alternative to the previous lexical entry of sheep above, the grammar writer may prefer the following lexical entry:

L(sheep) =

cat : nnum : sgcase : [ ]

,




Lexicon

Example (Lexicon, rule-format)

To depict the lexicon specification above, we usually use the following notation:

sheep →


sheep →




Lexicon

When a string of words w is given, it is possible to construct an AMRS σw for the lexical entries of the words in w, such that no two elements of σw share paths

Such an AMRS is simply the concatenation of the lexical entries of the words in w

In general, there may be several such AMRSs, as each word in w can have multiple elements in its category

The set of such AMRSs is the pre-terminals of w



Pre-terminals

Definition (Pre-terminals)

Let w = w1 . . . wn ∈ Words+. PTw(j, k) is defined iff 1 ≤ j, k ≤ n, in which case it is the set of AMRSs {〈Aj · Aj+1 · · · Ak〉 | Ai ∈ L(wi) for j ≤ i ≤ k}. If j > k (i.e., w = ε), then PTw(j, k) = {λ}. The subscript w is omitted when it is clear from the context.
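The set of pre-terminals is a Cartesian product of the lexical categories of the words involved, which the following sketch makes explicit. Everything about the encoding is an assumption for illustration: the lexicon is a Python dictionary from words to lists of entries, each entry is a plain dictionary, the function name pre_terminals is hypothetical, and the second entry for sheep (num : pl) is an assumed value rather than one taken from the slides.

```python
from itertools import product


def pre_terminals(lexicon, words, j, k):
    """Return PT_w(j, k) as a list of tuples of lexical entries (1-based, inclusive)."""
    if j > k:
        return [()]  # the set containing only the empty AMRS
    if not (1 <= j and k <= len(words)):
        return None  # undefined
    # One list of candidate categories per word; pick one from each.
    choices = [lexicon.get(word, []) for word in words[j - 1:k]]
    return list(product(*choices))


# Hypothetical lexicon in which sheep is ambiguous between two entries.
lexicon = {
    "two": [{"cat": "d", "num": "pl"}],
    "sheep": [{"cat": "n", "num": "sg", "case": {}},
              {"cat": "n", "num": "pl", "case": {}}],
    "sleep": [{"cat": "v", "num": "pl"}],
}
w = ["two", "sheep", "sleep"]
pts = pre_terminals(lexicon, w, 1, 3)
```

A word whose category is empty makes the product empty, which matches the remark that such a word is not in the language defined by the lexicon. The sketch does not model the requirement that the elements of each AMRS share no paths; the entries are simply listed side by side.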



Pre-terminals

Example (Pre-terminals)

Consider the string of words w = two sheep sleep and the lexicon of the previous example. There is exactly one element in PTw(1, 3); this is the AMRS

[cat : d, num : pl]   [cat : n, num : [ ], case : [ ]]   [cat : v, num : pl]

Notice that there is no sharing of variables among different feature structures in this AMRS. As AMRSs are depicted using multi-AVMs here, the variables in the above multi-AVM are chosen such that unintended reentrancies are avoided.



Pre-terminals

Example (Pre-terminals)

Now assume that the word sheep is represented as an ambiguous word: its category contains two feature structures, namely

L(sheep) =


,


Then PTw (1, 3) has two members:

[

cat : dnum : pl

]


[

cat : dnum : pl

]

,

[

cat : dnum : pl

]


[

cat : dnum : pl

]



Rules

Definition (Rules)

A (phrasal) rule is an AMRS of length n > 0 with a distinguished first element. If σ is a rule then σ1 is its head and σ2..n is its body. We adopt a convention of depicting rules with an arrow (→) separating the head from the body.

Since a rule is simply an AMRS, there can be reentrancies among its elements: both between the head and (some element of) the body and among elements in its body.

Notice that the definition supports ε-rules, i.e., rules with null bodies



Rules

Example (Rules as AMRSs)

As every AMRS can be interpreted as a rule, so can the following:

[cat : s] → [cat : np, agr : 4]   [cat : v, agr : 4]



Rules

Example (Rules as AMRSs)

Rules can also propagate information between the mother and any of the daughters using reentrancies between paths originating in the head of the rule and paths originating from one of the body elements, as below.

[cat : s, subj : 1] → 1 [cat : np, agr : 2]   [cat : v, agr : 2]



Rules

The rules of the example employ feature structures that include the feature cat, encoding the major part-of-speech category of phrases

While this is useful and natural, it is by no means obligatory

Unification rules can encode such information in other ways (e.g., via a different feature, or as a collection of features); or they may not encode it at all

In the general case, a unification rule is not required to have a context-free skeleton, a feature whose values constitute a context-free backbone that drives the derivation

Some unification-based grammar theories do indeed maintain a context-free skeleton (LFG is a notable example), while others (like HPSG) do not



Rules

We introduce a shorthand notation in the presentation of grammars:

When two rules have the same head, we list the head only once and separate the bodies of the different rules with ‘|’ (following the convention of context-free grammars)

Note, however, that the scope of variables is still limited to a single rule, so that multiple occurrences of the same variable within the bodies of two different rules are unrelated

Additionally, we may use the same variable (e.g., 4) in several rules

It should be clear by now that these multiple uses are unrelated to each other, as the scope of variables is limited to a single rule




Definition (Unification grammars)

A unification grammar (UG) G = (L, R, As) over a signature Atoms of atoms and Feats of features consists of a lexicon L, a finite set of rules R and a start symbol As that is an abstract feature structure.




Example (Gu, a unification grammar)

[cat : s] → [cat : np, num : 4, case : nom]   [cat : v, num : 4]

[cat : np, num : 4, case : 2] → [cat : d, num : 4]   [cat : n, num : 4, case : 2]

[cat : np, num : 4, case : 2] → [cat : pron, num : 4, case : 2]




Example (Gu, a unification grammar)

sleep → [cat : v, num : pl]

sleeps → [cat : v, num : sg]

lamb →


lambs →


she → [cat : pron, num : sg, case : nom]

her → [cat : pron, num : sg, case : acc]

a → [cat : d, num : sg]

two → [cat : d, num : pl]


Unification grammars Derivations

Derivations

The language generated by UGs is defined in a parallel way to the definition of languages generated by context-free grammars:

first, we define derivations, analogously to the context-free derivations

The reflexive transitive closure of the derivation relation is the basis for the definition of languages

For the following discussion fix a particular grammar G = (L, R, As)



Derivations

Derivation is a relation that holds between two forms, σ1 and σ2, each of which is an AMRS

To define it formally, two concepts have to be taken care of:

An element of σ1 has to be matched against the head of some grammar rule, ρ

The body of ρ must replace the selected element in σ1, thus producing σ2

Matching involves unification, and unification must be computed in context: that is, when the selected element of σ1 is unified with the head of ρ, other elements in σ1 or in ρ may be affected due to reentrancy

This possibility must be taken care of when replacing the selected element with the body of ρ



Derivations

Definition (Derivation)

An AMRS σ1 of length k derives an AMRS σ2 (denoted σ1 ⇒ σ2) iff for some j ≤ k and some rule ρ ∈ R of length n,

(σ1, j) ⊔ (ρ, 1) = (σ′1, ρ′), and

σ2 is the replacement of the j-th element of σ′1 with the body of ρ′ (details suppressed)

The reflexive transitive closure of ‘⇒’ is ‘∗⇒’. We write σ l⇒ ρ when σ derives ρ in l steps.



Derivation step

Example (Derivation step)

Suppose that

σ1 = [cat : np, num : 1, case : nom]   [cat : v, num : 1]

is a (sentential) form and that

ρ = [cat : np, num : 2, case : 3] → [cat : d, num : 2]   [cat : n, num : 2, case : 3]

is a rule. Assume further that the selected element j in σ1 is the first one. Applying the rule ρ to the form σ1, it is possible to construct a derivation σ1 ⇒ σ2.



Derivation step


First, compute (σ1, 1) ⊔ (ρ, 1) = (σ′1, ρ′):

σ′1 = [cat : np, num : 1, case : nom]   [cat : v, num : 1]

ρ′ = [cat : np, num : 2, case : 3 nom]   [cat : d, num : 2]   [cat : n, num : 2, case : 3]



Derivation step


Now, the first element of σ′1 is replaced by the body of ρ′. This operation results in a new AMRS, σ2, of length 3: the first two elements are the body of ρ′, and the last element is the remainder of σ′1, after its first element has been eliminated; that is, the last element of σ′1. A simple replacement would have resulted in the following AMRS:

[cat : d, num : 2]   [cat : n, num : 2, case : 3 nom]   [cat : v, num : 1]

Obviously, this is not the expected result!



Derivation step


Since the path (1, num) in σ1 is reentrant with (2, num) (indicated by the tag 1), and since the path (1, num) in the rule ρ is reentrant with the paths (2, num) and (3, num) (the tag 2), one would expect that the sharing between the num values of the noun phrase and the verb phrase in σ1 would manifest itself as a sharing between this feature's values of the determiner, the noun and the verb phrase in σ2.

This is what the last clause in the definition of derivation guarantees. The result is:

σ2 = [cat : d, num : 4]   [cat : n, num : 4, case : 5 nom]   [cat : v, num : 4]



Derivation

Example (Derivation)

Consider the grammar Gu. A derivation with Gu can start with a form of length 1, consisting of

σ1 = [cat : s]

The single element of this AMRS unifies with the head of the first rule in the grammar, trivially. Substitution is again trivial, and the next form in the derivation is the body of the first rule:

σ2 = [cat : np, num : 1, case : nom]   [cat : v, num : 1]



Derivation


Since the rule ρ of that example is indeed in Gu, a derivable form from σ2 is:

σ3 = [cat : d, num : 4]   [cat : n, num : 4, case : nom]   [cat : v, num : 4]

Thus, we obtain σ1 ⇒ σ2 ⇒ σ3, and hence σ1 ∗⇒ σ3.



Derivation


Consider the form σ3 and one of the AMRSs in PTw (1, 3):

σ3 = [cat : d, num : 4]   [cat : n, num : 4, case : nom]   [cat : v, num : 4]

σ =

[

cat : dnum : pl

]


[

cat : dnum : pl

]

The former contains information that is accumulated during derivations; the latter reflects information from the lexical entries of the words in w.

σ3 ⊔ σ = [cat : d, num : 4 pl]   [cat : n, num : 4, case : nom]   [cat : v, num : 4]



Language

Definition (Language)

The language of a unification grammar G is L(G) = {w ∈ Words∗ | w = w1 · · · wn and there exist an AMRS σ such that As ⇒∗ σ and an AMRS ρ ∈ PTw(1, n) such that σ ⊔ ρ is defined}.



Language

Example (Language)

Consider the grammar Gu and the string the sheep sleep. The form σ3 is derivable from the start symbol of the grammar. This form is unifiable with one of the members of PTw(1, 3). Hence the string the sheep sleep is a member of L(Gu).


Unification grammars Derivation trees

Derivation trees

In order to depict derivations graphically we extend the notion of derivation trees, defined for context-free grammars, to unification grammars

Informally, we would like a tree to be a structure whose elements are feature structures

However, care must be taken when the scope of reentrancies in a tree is concerned: in order for information to be shared among all nodes in a tree, this scope is extended to the entire tree



Derivation trees

Rather than define a new mathematical entity, corresponding to a tree whose nodes are feature structures with the scope of reentrancies extended to the entire structure, we reuse in the following definition the concept of multi-rooted structures (more precisely, AMRSs)

In order to impose a tree structure on AMRSs we simply pair them with a tree whose nodes are integers, such that each node in the tree serves as an index into the AMRS

In this way, all the existing definitions which refer to AMRSs can be naturally used when reasoning about trees



Derivation trees

Definition (Unification trees)

Given a signature S = 〈Atoms, Feats〉, a unification tree is an ordered tree whose nodes are AVMs over S, where the scope of reentrancies is extended to the entire tree. A subtree is a particular node of the tree, along with all its descendants (and the edges connecting them). Formally, a unification tree is a pair 〈σ, τ〉, where σ is an AMRS over S, say of length l for some l ∈ N, and τ is a tree over the nodes {1, 2, . . . , l}.



Derivation trees

Example (Unification tree)

Following is a unification tree, depicted as a tree of AVMs:

[cat : s]
  [cat : np, num : 4, case : 2 nom]
    [cat : d, num : 4]
    [cat : n, num : 4, case : 2 nom]
  [cat : v, num : 4]



Derivation trees

Example (Unification tree)

Formally, this tree is a pair 〈σ, τ〉, where τ is a tree over {1, 2, 3, 4, 5} and σ is an AMRS of length 5:

τ: node 1 has the children 2 and 5; node 2 has the children 3 and 4

σ = [cat : s]  [cat : np, num : 4, case : 2 nom]  [cat : d, num : 4]  [cat : n, num : 4, case : 2 nom]  [cat : v, num : 4]
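As a rough illustration of this pairing, the following Python sketch stores the AMRS as a list and the tree as a child map over 1-based indices, and then reads off the frontier. The dictionary encoding, the tag strings such as "#4" and the helper names frontier and frontier_avms are assumptions made for this sketch only; the tree shape (node 1 dominating 2 and 5, node 2 dominating 3 and 4) is the one suggested by the depicted tree.

```python
# AMRS sigma: a list of AVMs; position i (1-based) is node i of the tree.
# Tags are written as strings like "#4"; this encoding is illustrative only.
sigma = [
    {"cat": "s"},
    {"cat": "np", "num": "#4", "case": "#2"},
    {"cat": "d", "num": "#4"},
    {"cat": "n", "num": "#4", "case": "#2"},
    {"cat": "v", "num": "#4"},
]

# Tree tau over the nodes {1, ..., 5}: each node maps to its ordered children.
tau = {1: [2, 5], 2: [3, 4], 3: [], 4: [], 5: []}


def frontier(tree, root=1):
    """Return the leaves of `tree` below `root`, from left to right."""
    children = tree[root]
    if not children:
        return [root]
    leaves = []
    for child in children:
        leaves.extend(frontier(tree, child))
    return leaves


def frontier_avms(amrs, tree):
    """Look up the AVM of each frontier node (indices are 1-based)."""
    return [amrs[i - 1] for i in frontier(tree)]
```

Because the tree only holds integers, any operation defined on AMRSs can be applied to sigma directly, which is the design choice the definition makes.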



Unification derivation trees

Definition (Unification derivation trees)

A unification derivation tree induced by a unification grammar G = (R, As) is a unification tree defined recursively as follows:

〈As, τ〉 is a unification derivation tree, where τ is the tree consisting of the single node {1};

if 〈σ, τ〉 is a unification derivation tree and 〈σ′, τ′〉 extends 〈σ, τ〉, then 〈σ′, τ′〉 is also a unification derivation tree.




Example (Unification derivation trees)

A unification derivation tree with the grammar Gu can be built incrementally as follows. The start symbol of the grammar is [cat : s]; therefore, an initial derivation tree would be 〈σ1, {1}〉, the start symbol itself. Then, by using the first grammar rule, the following tree, 〈σ2, τ2〉, can be obtained:

[

cat : snum : 4

]

cat : npnum : 4

case : nom

[

cat : vnum : 4

]




Example (Unification derivation trees)

Next, by applying the second grammar rule to the leftmost node on the frontier of 〈σ2, τ2〉, the following tree, 〈σ3, τ3〉, is obtained:

[

cat : snum : 4

]

cat : npnum : 4 sgcase : nom

[

cat : dnum : 4

]

cat : nnum : 4

case : 2

[

cat : vnum : 4

]



Complete derivation trees

As in the context-free case, the frontier of unification derivation trees does not have to correspond to any lexical item

Of course, in order for trees to represent complete derivations, we are particularly interested in such trees whose frontier is unifiable with a sequence of pre-terminals




Definition (Complete derivation trees)

A unification derivation tree 〈σ, τ〉 is complete if the frontier of τ is j1, . . . , jn and there exist a word w ∈ Words∗ of length n and an AMRS ρ ∈ PTw(1, n) such that ρ ⊔ 〈σj1, . . . , σjn〉 is defined.

Note that there may be more than one qualifying AMRS in PTw(1, n); the definition only requires one. Of course, different AMRSs in PTw(1, n) will correspond to different interpretations of the input string (resulting from ambiguous lexical entries of the words)




Example (Complete derivation trees)

Consider the grammar Gu and the string w = two lambs sleep. The tree of the previous example is complete. Its frontier is unifiable with the following AMRS:

[cat : d, num : pl]  [cat : n, num : pl, case : 2]  [cat : v, num : pl]  ∈ PTw(1, 3)



Lexicalized derivation trees

It is sometimes useful to depict a tree whose leaves already reflect the additional information obtained by actually unifying the frontier of a complete derivation tree with PTw

We call such trees lexicalized

It is easy to see that for every lexicalized tree 〈σ, τ 〉 there exists a

complete derivation tree 〈σ′, τ ′〉 such that τ ′ = τ and σ′ ~⊑ σ




Definition (Lexicalized derivation trees)

Let 〈σ, τ〉 be a complete derivation tree induced by a unification grammar G = (R, As) and let w, ρ be as in the definition of complete trees. A lexicalized derivation tree induced by G on w is the unification tree 〈σ′, τ〉, where σ′ is obtained from σ by unifying the frontier of σ with ρ.




Example (Lexicalized derivation tree)

A tree induced by the grammar Gu on the string two lambs sleep:

[

cat : s]

cat : npnum : 4

case : 2nom

[

cat : dnum : 4pl

]

cat : nnum : 4

case : 2nom

[

cat : vnum : 4

]

two sheep sleep


Linguistic applications Introduction

Linguistic applications

We now put the theory to use, by accounting for several of the linguistic phenomena that motivated UGs

Unification grammars facilitate the expression of linguistic generalizations

This is mediated through two main mechanisms:

The notion of grammatical category is expressed via feature structures, thereby allowing for complex categories as first-class citizens of the grammatical theory

Reentrancy provides a concise machinery for expressing “movement”, or more generally, relations that hold in a deeper level than a phrase-structure tree



Phenomena

Agreement

Case control

Subcategorization

Long-distance dependencies

Control

Coordination


Linguistic applications A basic grammar

A basic grammar

Example (A context-free grammar G0:)

S → NP VP

VP → V | V NP

NP → D N | Pron | PropN

D → the, a, two, every, . . .

N → sheep, lamb, lambs, shepherd, water . . .

V → sleep, sleeps, love, loves, feed, feeds, herd, herds, . . .

Pron → I, me, you, he, him, she, her, it, we, us, they, them

PropN → Rachel, Jacob, . . .



Every CFG is a UG

Observe that any context-free grammar is a special case of a unification grammar

The non-terminal symbols of the CFG can be modeled by atoms

A more general view of G0 as a unification grammar can encode the fact that the non-terminal symbols represent grammatical categories

This can be done using a single feature, e.g., cat, whose values are the non-terminals of G0



G ′0, a basic unification grammar

Example (G ′0, a basic unification grammar)

Following is a unification grammar, G′0, over a signature 〈Atoms, Feats〉 where Feats = {cat} and Atoms = {s, np, vp, v, d, n, pron, propn}:

(1) [cat : s] → [cat : np] [cat : vp]
(2) [cat : vp] → [cat : v]
(3) [cat : vp] → [cat : v] [cat : np]
(4) [cat : np] → [cat : d] [cat : n]
(5, 6) [cat : np] → [cat : pron] | [cat : propn]



G ′0, a basic unification grammar

Example (G ′0, a basic unification grammar)

sleep → [cat : v]
give → [cat : v]
love → [cat : v]
tell → [cat : v]
feed → [cat : v]
feeds → [cat : v]
lamb → [cat : n]
lambs → [cat : n]
she → [cat : pron]
her → [cat : pron]
they → [cat : pron]
them → [cat : pron]
Rachel → [cat : propn]
Jacob → [cat : propn]
a → [cat : d]
two → [cat : d]



Derivation trees induced by G ′0

Example (Derivation trees induced by G ′0)

The grammar G′0 induces the following tree on the string the sheep love her:

[cat : s]
  [cat : np]
    [cat : d]   the
    [cat : n]   sheep
  [cat : vp]
    [cat : v]   love
    [cat : np]
      [cat : pron]   her



Derivation trees induced by G ′0

Example (Derivation trees induced by G ′0)

Not surprisingly, an isomorphic derivation tree is induced by the grammar on the ungrammatical string ∗the lambs sleeps they:

[cat : s]
  [cat : np]
    [cat : d]   the
    [cat : n]   lambs
  [cat : vp]
    [cat : v]   sleeps
    [cat : np]
      [cat : pron]   they


Linguistic applications Imposing agreement

Gagr, accounting for agreement on number

Example (Gagr, accounting for agreement on number)

(1) [cat : s] → [cat : np, num : 4] [cat : vp, num : 4]
(2) [cat : vp, num : 4] → [cat : v, num : 4]
(3) [cat : vp, num : 4] → [cat : v, num : 4] [cat : np]
(4) [cat : np, num : 4] → [cat : d, num : 4] [cat : n, num : 4]
(5, 6) [cat : np, num : 4] → [cat : pron, num : 4] | [cat : propn, num : 4]





sleep → [cat : v, num : pl]
give → [cat : v, num : pl]
love → [cat : v, num : pl]
tell → [cat : v, num : pl]
feed → [cat : v, num : pl]
feeds → [cat : v, num : sg]

lamb → [cat : n, num : sg]
lambs → [cat : n, num : pl]
she → [cat : pron, num : sg]
her → [cat : pron, num : sg]
they → [cat : pron, num : pl]
them → [cat : pron, num : pl]
Rachel → [cat : propn, num : sg]
Jacob → [cat : propn, num : sg]
a → [cat : d, num : sg]
two → [cat : d, num : pl]
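The effect of the shared num tag can be sketched in a few lines of Python. This is a minimal illustration, not an implementation of unification grammars: it only handles three-word strings of the shape determiner, noun, verb (rules 1, 2 and 4), it models the shared tag by unifying the three num values at once, and the names LEXICON, unify_atoms and accepts_d_n_v are hypothetical.

```python
# A fragment of the lexicon of Gagr: each word maps to a flat AVM.
LEXICON = {
    "sleep": {"cat": "v", "num": "pl"},
    "feed": {"cat": "v", "num": "pl"},
    "feeds": {"cat": "v", "num": "sg"},
    "lamb": {"cat": "n", "num": "sg"},
    "lambs": {"cat": "n", "num": "pl"},
    "a": {"cat": "d", "num": "sg"},
    "two": {"cat": "d", "num": "pl"},
}


def unify_atoms(values):
    """Unify atomic values: succeed only if all specified values coincide.

    None stands for an unspecified value. The result is the shared atom,
    None if nothing is specified, or the string "FAIL" on a clash.
    """
    known = {v for v in values if v is not None}
    if len(known) > 1:
        return "FAIL"
    return known.pop() if known else None


def accepts_d_n_v(words):
    """Check a determiner-noun-verb string against rules 1, 2 and 4."""
    if len(words) != 3:
        return False
    entries = [LEXICON.get(w) for w in words]
    if any(e is None for e in entries):
        return False
    if [e["cat"] for e in entries] != ["d", "n", "v"]:
        return False
    # One tag is shared by d, n and v in the derived form.
    return unify_atoms([e.get("num") for e in entries]) != "FAIL"
```

For instance, accepts_d_n_v("two lambs sleep".split()) succeeds, while the same call on "a lambs sleep" fails because sg and pl clash.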



Gagr generates a CF language

While Gagr is a unification grammar, the language it generates is context-free

But the equivalent CFG is inferior to the unification grammar:

The linguistic description is distorted: information regarding number, which is determined by the words themselves, is encoded in G1 by the way they are derived (in other words, G1 accounts for lexical knowledge by means of phrase-structure rules)

Several linguistic generalizations are lost: the context-free grammar induces two different trees on a lamb sleeps and two lambs sleep



UG and linguistic generalizations

One natural notion of ‘linguistic generalization’ emerges: the ability to formulate a linguistic restriction by means of a single rule, instead of by a collection of “similar” rules

In this sense, Gagr captures the agreement generalization, while G1 does not

Multiplying out all the possible values of a particular feature, and converting a unification grammar to an equivalent context-free grammar in this way, is not always possible
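When a feature ranges over finitely many atoms, as num does, the multiplying-out step can be sketched as follows. The representation of a symbol as a (category, tag) pair, the naming scheme NP_sg and the function name multiply_out are assumptions of this sketch; it does not apply to features whose values are unboundedly many structures, such as lists.

```python
from itertools import product


def multiply_out(lhs, rhs, tag_values):
    """Expand one rule with tagged features into plain CFG rules.

    Each symbol is a pair (category, tag), where tag is None or a tag
    name; every assignment of atoms to the tags yields one CFG rule.
    """
    symbols = [lhs, *rhs]
    tags = sorted({tag for _, tag in symbols if tag is not None})
    rules = []
    for combo in product(*(tag_values[t] for t in tags)):
        binding = dict(zip(tags, combo))

        def name(symbol):
            cat, tag = symbol
            return cat if tag is None else f"{cat}_{binding[tag]}"

        rules.append((name(lhs), [name(sym) for sym in rhs]))
    return rules


# Rule 1 of Gagr: the np and the vp share one num tag.
rule_1 = multiply_out(("S", None), [("NP", "4"), ("VP", "4")],
                      {"4": ["sg", "pl"]})
```

The single unification rule becomes one context-free rule per value of the tag, which is exactly the collection of “similar” rules that the generalization avoids.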


Linguistic applications Imposing case control

Imposing case control

Add a feature, case, to the signature and to the feature structures associated with nominal categories: nouns, pronouns, proper names and noun phrases

The lexical entries of pronouns must specify their case, which is overt and explicit: we use the value nom for nominative case, whereas acc stands for accusative

As for proper names and nouns, their lexical entries are simply underspecified with respect to case

Use the values of the case feature in the grammar to impose constraints of case control



Gcase, accounting for case control

Example (Gcase, accounting for case control)

(1) [cat : s] → [cat : np, num : 4, case : nom] [cat : vp, num : 4]
(2) [cat : vp, num : 4] → [cat : v, num : 4]
(3) [cat : vp, num : 4] → [cat : v, num : 4] [cat : np, num : 3, case : acc]
(4) [cat : np, num : 4, case : 2] → [cat : d, num : 4] [cat : n, num : 4, case : 2]
(5, 6) [cat : np, num : 4, case : 2] → [cat : pron, num : 4, case : 2] | [cat : propn, num : 4, case : 2]





sleep → [cat : v, num : pl]
sleeps → [cat : v, num : sg]
feed → [cat : v, num : pl]
feeds → [cat : v, num : sg]

lamb → [cat : n, num : sg, case : [ ]]
lambs → [cat : n, num : pl, case : [ ]]
she → [cat : pron, num : sg, case : nom]
her → [cat : pron, num : sg, case : acc]
they → [cat : pron, num : pl, case : nom]
them → [cat : pron, num : pl, case : acc]
Rachel → [cat : propn, num : sg]
Jacob → [cat : propn, num : sg]
a → [cat : d, num : sg]
two → [cat : d, num : pl]



Derivation tree with case control

Example (Derivation tree with case control)

[cat : s]
  [cat : np, num : 4, case : 3 nom]
    [cat : d, num : 4 pl]   the
    [cat : n, num : 4, case : 3]   shepherds
  [cat : vp, num : 4]
    [cat : v, num : 4]   feed
    [cat : np, num : 2, case : 5 acc]
      [cat : pron, num : 2 pl, case : 5]   them



Derivation tree with case control

Example (Derivation tree with case control)

This tree represents a derivation which starts with the initial symbol, [cat : s], and ends with the multi-AVM σ′, where

σ′ = the [num : 4]  shepherds [num : 4, case : nom]  feed [num : 4]  them [num : 2, case : acc]

This multi-AVM is unifiable with (but not identical to!) the sequence of lexical entries of the words in the sentence, which is:

σ = the [num : [ ]]  shepherds [num : pl, case : [ ]]  feed [num : pl]  them [num : pl, case : acc]

Hence the sentence is in the language generated by the grammar.
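The unifiability check between the derived form and the lexical entries can be sketched in Python for this flat case. The sketch assumes that values are atoms only, writes tags as strings beginning with "#", uses None for the empty feature structure [ ], and ignores features that occur only in the lexical entry; the function name unifiable is hypothetical.

```python
def unifiable(form, lexical):
    """Check whether a derived form unifies with a sequence of lexical AVMs.

    Values in `form` are atoms, or tags written "#n" that must be bound to
    a single atom throughout the form. Values in `lexical` are atoms, or
    None for the empty feature structure [ ].
    """
    if len(form) != len(lexical):
        return False
    bindings = {}
    for derived, entry in zip(form, lexical):
        for feat, value in derived.items():
            lex_value = entry.get(feat)
            if lex_value is None:
                continue                      # [ ] unifies with anything
            if value.startswith("#"):
                if bindings.setdefault(value, lex_value) != lex_value:
                    return False              # tag already bound to another atom
            elif value != lex_value:
                return False                  # clash between two atoms
    return True


# The form derived for "the shepherds feed them" and its lexical entries.
derived_form = [
    {"num": "#4"},
    {"num": "#4", "case": "nom"},
    {"num": "#4"},
    {"num": "#2", "case": "acc"},
]
lexical_entries = [
    {"num": None},
    {"num": "pl", "case": None},
    {"num": "pl"},
    {"num": "pl", "case": "acc"},
]
```

Replacing the entry of feed by that of feeds, or the entry of them by that of they, makes the check fail, which mirrors the agreement and case violations the grammar is meant to exclude.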


Linguistic applications Imposing subcategorization constraints

Imposing subcategorization constraints

A naïve solution to the subcategorization problem

intransitive verbs (with no object): sleep, walk, run, laugh, . . .

transitive verbs (with a nominal object): feed, love, eat, . . .

Lexical entries of verbs are extended such that their subcategorization is specified

The rules that involve verbs and verb phrases are extended



Gsubcat, a naïve account of verb subcategorization

Example (Gsubcat, a naïve account of verb subcategorization)

(1) [cat : s] → [cat : np, num : 4, case : nom] [cat : vp, num : 4]
(2) [cat : vp, num : 4] → [cat : v, num : 4, subcat : intrans]
(3) [cat : vp, num : 4] → [cat : v, num : 4, subcat : trans] [cat : np, num : 3, case : acc]
(4) [cat : np, num : 4, case : 2] → [cat : d, num : 4] [cat : n, num : 4, case : 2]
(5, 6) [cat : np, num : 4, case : 2] → [cat : pron, num : 4, case : 2] | [cat : propn, num : 4, case : 2]





sleep → [cat : v, num : pl, subcat : intrans]
sleeps → [cat : v, num : sg, subcat : intrans]
feed → [cat : v, num : pl, subcat : trans]
feeds → [cat : v, num : sg, subcat : trans]

lamb → [cat : n, num : sg, case : [ ]]
lambs → [cat : n, num : pl, case : [ ]]
she → [cat : pron, num : sg, case : nom]
her → [cat : pron, num : sg, case : acc]
they → [cat : pron, num : pl, case : nom]
them → [cat : pron, num : pl, case : acc]
Rachel → [cat : propn, num : sg]
Jacob → [cat : propn, num : sg]
a → [cat : d, num : sg]
two → [cat : d, num : pl]


Linguistic applications Subcategorization lists

Subcategorization lists

The previous account of subcategorization is naïve

Different verbs subcategorize for different kinds of complements: noun phrases, infinitival verb phrases, sentences etc.

Some verbs require more than one complement

The idea is to store in the lexical entry of each verb not an atomic feature indicating its subcategory, but rather a list of categories, indicating the appropriate complements of the verb



Lexical entries of some verbs using subcategorization lists

Example (Lexical entries of some verbs using subcategorization lists)

sleep → [cat : v, subcat : elist, num : pl]

love → [cat : v, subcat : 〈[cat : np]〉, num : pl]

give → [cat : v, subcat : 〈[cat : np], [cat : np]〉, num : pl]

tell → [cat : v, subcat : 〈[cat : np], [cat : s]〉, num : pl]



Subcategorization lists

The grammar rules must be modified to reflect the additional wealth of information in the lexical entries

Due to this wealth there can be a dramatic reduction in the number of grammar rules necessary for handling verbs



VP rules using subcategorization lists

Example (VP rules using subcategorization lists)

[cat : s] → [cat : np] [cat : v, subcat : elist]

[cat : v, subcat : 2] → [cat : v, subcat : [first : [cat : 4], rest : 2]] [cat : 4]
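The way the second rule consumes a subcategorization list can be sketched in Python. This is a simplification under stated assumptions: categories are plain strings, the list is a Python list rather than a first/rest structure, and the names SUBCAT, saturate and is_sentence are hypothetical.

```python
# Subcategorization lists of some verbs, as in the lexical entries above.
SUBCAT = {
    "sleep": [],
    "love": ["np"],
    "give": ["np", "np"],
    "tell": ["np", "s"],
}


def saturate(subcat, complements):
    """Consume complements one at a time, as the second VP rule does.

    Each rule application checks the first element of the list against
    the next complement and passes the rest of the list to the mother.
    Returns the remaining list, or None if a complement does not match.
    """
    remaining = list(subcat)
    for cat in complements:
        if not remaining or remaining[0] != cat:
            return None
        remaining = remaining[1:]
    return remaining


def is_sentence(subject_cat, verb_subcat, complements):
    """First rule: s -> np v, where v must have an empty subcat list."""
    return subject_cat == "np" and saturate(verb_subcat, complements) == []
```

A single recursive rule thus replaces one VP rule per subcategory: the verb give combines with two noun phrases in two steps, and only a verbal projection whose list has been emptied can combine with a subject.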



A derivation tree

Example (A derivation tree)

[cat : s]
  [cat : np]   Rachel
  [cat : v, subcat : 〈〉]
    [cat : v, subcat : 〈[cat : 2]〉]
      [cat : v, subcat : 〈[cat : 1], [cat : 2]〉]   gave
      [cat : 1 np]   the sheep
    [cat : 2 np]   water



A derivation tree

Example (A derivation tree)

[c : s]
  [c : np]   Jacob
  [c : v, sc : 〈〉]
    [c : v, sc : 〈[c : 2]〉]
      [c : v, sc : 〈[c : 1], [c : 2]〉]   told
      [c : 1 np]   Laban
    [c : 2 s]
      [c : np]   he
      [c : v, sc : 〈〉]
        [c : v, sc : 〈[c : 3]〉]   loved
        [c : 3 np]   Rachel



Subcategorization imposes case constraints

In the above grammar, categories on subcategorization lists are represented as atomic symbols

This is a simplification; the method outlined here can be used with more complex encodings of categories

For example, the lexical entry of the German verb geben (to give) can state that the first complement must be in the dative case, whereas the second must be accusative




Example (Subcategorization imposes case constraints)

Ich gebe dem Hund den Knochen
I give the(dat) dog the(acc) bone
I give the dog the bone

∗Ich gebe den Hund den Knochen
I give the(acc) dog the(acc) bone

∗Ich gebe dem Hund dem Knochen
I give the(dat) dog the(dat) bone




Example (Subcategorization imposes case constraints)

The lexical entry of gebe, then, could be:

L(gebe) = [cat : v, subcat : 〈[cat : np, case : dat], [cat : np, case : acc]〉, num : sg]




In order to account for subcategorization of complex information (rather than of atomic category symbols), the VP rule which manipulates subcategorization lists has to be slightly modified

The revised rule reflects the fact that the subcategorized information is not the value of the cat feature, but rather the entire verb complement:

[cat : v, subcat : 2] → [cat : v, subcat : [first : 3, rest : 2]] 3 [ ]



G3, a complete E2-grammar

Example (G3, a complete E2-grammar)

[cat : s] → [cat : np, num : 4, case : nom] [cat : v, num : 4, subcat : elist]

[cat : v, num : 4, subcat : 2] → [cat : v, num : 4, subcat : [first : 3, rest : 2]] 3 [ ]

[cat : np, num : 4, case : 2] → [cat : d, num : 4] [cat : n, num : 4, case : 2]

[cat : np, num : 4, case : 2] → [cat : pron, num : 4, case : 2] | [cat : propn, num : 4, case : 2]





sleep → [cat : v, subcat : elist, num : pl]
give → [cat : v, subcat : 〈[cat : np, case : acc], [cat : np]〉, num : pl]
love → [cat : v, subcat : 〈[cat : np, case : acc]〉, num : pl]
tell → [cat : v, subcat : 〈[cat : np, case : acc], [cat : s]〉, num : pl]

lamb → [cat : n, num : sg, case : 2]
lambs → [cat : n, num : pl, case : 2]
she → [cat : pron, num : sg, case : nom]
her → [cat : pron, num : sg, case : acc]
Rachel → [cat : propn, num : sg]
Jacob → [cat : propn, num : sg]
a → [cat : d, num : sg]
two → [cat : d, num : pl]


Linguistic applications Long distance dependencies

Long distance dependencies

Encoding grammatical categories as feature structures is very useful in the treatment of unbounded dependencies

Such phenomena involve a “missing” constituent that is realized outside the clause from which it is missing, as in:

The shepherd wondered whom Jacob loved ⌣.




Phrases such as whom Jacob loved ⌣ or who ⌣ loved Rachel are sentences, with a constituent which is “moved” from its default position and realized as a wh-pronoun in front of the phrase

We represent such phrases by using the category s

But to distinguish them from declarative sentences we add a feature, que, to the category

The value of que is ‘+’ in sentences with an interrogative pronoun realizing a transposed constituent




We also add a lexical entry for the pronoun whom:

whom → [cat : pron, case : acc, que : +]

Finally, we update the rule that derives pronouns such that it propagates the value of que from the lexicon to higher projections of the pronoun:

[cat : np, num : 1, case : 3, que : 5] → [cat : pron, num : 1, case : 3, que : 5]




We extend G3 with two additional rules, based on the first two rules of G3:

(3) [cat : s, slash : 4] → [cat : np, num : 1, case : nom] [cat : v, num : 1, subcat : elist, slash : 4]

(4) [cat : v, num : 1, subcat : 2, slash : 4] → [cat : v, num : 1, subcat : [first : 4, rest : 2]]



A derivation tree for Jacob loved ⌣

Example (A derivation tree for Jacob loved ⌣)

[cat : s, slash : 4]
  [cat : np, num : 1, case : 2]
    [cat : propn, num : 1 sg, case : 2 nom]   Jacob
  [cat : v, num : 1, slash : 4, subcat : 8]
    [cat : v, num : 1, subcat : [first : 4 [cat : np, case : acc], rest : 8 elist]]   loved ⌣




A rule for creating “complete” sentences by combining the missing category with a “slashed” sentence

The rule does not commit as to the category of the dislocated element; it simply combines any category with a sentence in which this very same category is missing, provided that this category is marked as ‘que +’

The value of que is propagated to the mother to indicate that the sentence is interrogative rather than declarative:

(5) [cat : s, que : 5] → 4 [que : 5 +] [cat : s, slash : 4]



A derivation tree for whom Jacob loved ⌣

Example (A derivation tree for whom Jacob loved ⌣)

[cat : s, que : 5]
  4 [cat : np, case : 3, que : 5]
    [cat : pron, case : 3 acc, que : 5 +]   whom
  [cat : s, slash : 4]
    [cat : np, num : 1, case : 2]
      [cat : propn, num : 1 sg, case : 2 nom]   Jacob
    [cat : v, num : 1, slash : 4, subcat : elist]
      [cat : v, num : 1, subcat : 〈4〉]   loved ⌣




In order to derive the full sentence Rachel wondered whom Jacob loved ⌣ we need a lexical entry for the verb wondered

It is a verb, so its category is v, and as it subcategorizes for an interrogative sentence, its subcategory is a list of a single member, a sentence whose que feature is ‘+’:

wondered → [cat : v, num : [ ], subcat : 〈[cat : s, que : +]〉]



A derivation tree for Rachel wondered whom Jacob loved ⌣

Example (A derivation tree for Rachel wondered whom Jacob loved ⌣)

[cat : s]
  [cat : np, num : 3, case : 4 nom]
    [cat : propn, num : 3 sg, case : 4]   Rachel
  [cat : v, num : 3, subcat : elist]
    [cat : v, num : 3, subcat : 〈1〉]   wondered
    1 [cat : s, que : +]   whom Jacob loved ⌣




In the previous example the filler of the gap is realized immediately to the left of the clause in which the gap occurs

This need not always be the case: unbounded dependencies can hold across several clause boundaries

Typical examples are:

The shepherd wondered whom Jacob loved ⌣.

The shepherd wondered whom Laban thought Jacob loved ⌣.

The shepherd wondered whom Laban thought Leah claimed Jacob loved ⌣.




Also, the dislocated constituent does not have to be an object:

The shepherd wondered who ⌣ loved Rachel.

The shepherd wondered who Laban thought ⌣ loved Rachel.

The shepherd wondered who Laban thought Leah claimed ⌣ loved Rachel.




The solution we proposed for the simple case of unbounded dependencies can be easily extended to the more complex examples

The solution amounts to three components:

A slash introduction rule

Slash propagation rules

A gap filler rule




In order to account for filler-gap relations that hold across several clauses, all that needs to be done is to add more slash propagation rules

For example, in

The shepherd wondered whom Laban thought Jacob loved ⌣.

the slash is introduced by the verb phrase loved ⌣, and is propagated to the sentence Jacob loved ⌣ by rule (3)

This sentence is the object of the verb thought; therefore, we need a rule that propagates the value of slash from a sentential object to the verb phrase of which it is an object




Example (Long-distance dependencies)

(6) [cat : v, num : 1, subcat : 12, slash : 4] → [cat : v, num : 1, subcat : [first : 8, rest : 12]] 8 [slash : 4]





Then, the slash is propagated from the verb phrase thought Jacob loved ⌣ to the sentence Laban thought Jacob loved ⌣:

(7) [cat : s, slash : 4] → [cat : np, num : 5, case : nom] [cat : v, num : 5, subcat : elist, slash : 4]




Example (A derivation tree for whom Laban thought Jacob loved ⌣)

[cat : s, que : 6]
  4 [cat : np, case : 3, que : 6]
    [cat : pron, case : 3 acc, que : 6 +]   whom
  [cat : s, slash : 4]
    [cat : np, num : 5, case : 9]
      [cat : propn, num : 5 sg, case : 9 nom]   Laban
    [cat : v, num : 5, slash : 4, sc : 12 elist]
      [cat : v, num : 5, sc : [first : 8, rest : 12]]   thought
      8 [cat : s, slash : 4]
        [cat : np, num : 1, case : 2]
          [cat : propn, num : 1 sg, case : 2 nom]   Jacob
        [cat : v, num : 1, slash : 4, sc : elist]
          [cat : v, num : 1, sc : 〈4〉]   loved ⌣





Finally, to account for gaps in the subject position, all that is needed is an additional slash introduction rule:

(8) [cat : s, slash : [cat : np, num : 1, case : nom]] → [cat : v, num : 1, subcat : elist]




Example (A derivation tree for who⌣

loved Rachel)»

cat : s

que : 6

–

2

4

cat : s

num : 1slash : 4

3

5

2

4

cat : v


3

5

4

2

4

cat : np

case : 3 nom

que : 6

3

5 8»

cat : np

case : 2

–

2

4

cat : pron

case : 3 nom

que : 6

3

5

2

4

cat : v

num : 1 sg

subcat : 〈 8 〉

3

5

2

4

cat : propn

num : 6 sg

case : 2 acc

3

5

who ⌣ loved Rachel


Linguistic applications Subject and object control

Subject and object control

Differences between the ‘understood’ subjects of the infinitive verb phrase to work seven years in the following sentences:

Jacob promised Laban to work seven years

Laban persuaded Jacob to work seven years

The differences between the two example sentences stem from differences in the matrix verbs:

promise is a subject control verb;

persuade is an object control verb



G4: explicit subj values

Example (G4: explicit subj values)

ˆ

cat : s˜

→ 1

2

4

cat : np

case : nom

num : 7

3

5

2

6

6

4

cat : v


subj : 1

3

7

7

5

[cat : v, num : 7, subcat : 4, subj : 1] → [cat : v, num : 7, subcat : [first : 2, rest : 4], subj : 1] 2 [ ]

[cat : np, num : 7, case : 6] → [cat : d, num : 7] [cat : n, num : 7, case : 6]

[cat : np, num : 7, case : 6] → [cat : pron, num : 7, case : 6] | [cat : propn, num : 7, case : 6]





sleep → [cat : v, subcat : elist, subj : [cat : np, case : nom], num : pl]
love → [cat : v, subcat : 〈[cat : np, case : acc]〉, subj : [cat : np, case : nom], num : pl]
give → [cat : v, subcat : 〈[cat : np, case : acc], [cat : np]〉, subj : [cat : np, case : nom], num : pl]

lamb → [cat : n, num : sg, case : 6]
lambs → [cat : n, num : pl, case : 6]
she → [cat : pron, num : sg, case : nom]
her → [cat : pron, num : sg, case : acc]
Rachel → [cat : propn, num : sg]
Jacob → [cat : propn, num : sg]
a → [cat : d, num : sg]
two → [cat : d, num : pl]



Infinitival verb phrases

The next step is to account for infinitival verb phrases

This can be easily done by adding a new feature, vform, to verbal projections

The values of this feature can represent the form of the verb: fin for finite verbs and inf for infinitival ones

to work → [cat : v, vform : inf, subcat : elist, subj : [cat : np]]



The lexical entry of promise

Example (The lexical entry of promise)

promised → [cat : v, vform : fin, subcat : 〈[cat : np, case : acc], [cat : v, vform : inf, subj : 1]〉, subj : 1 [cat : np, case : nom], num : [ ]]



A derivation tree for Jacob promised Laban to work

Example (A derivation tree for Jacob promised Laban to work)

[cat : s]
  1 [cat : np, case : 6 nom]
    [cat : propn, case : 6]   Jacob
  [cat : v, vform : fin, subj : 1, subcat : elist]
    [cat : v, vform : fin, subj : 1, subcat : 〈3〉]
      [cat : v, vform : fin, subj : 1, subcat : 〈2, 3〉]   promised
      2 [cat : np, case : 7 acc]
        [cat : propn, case : 7]   Laban
    3 [cat : v, vform : inf, subj : 1]   to work



The lexical entry of persuade

Example (The lexical entry of persuade)

persuaded → [cat : v, vform : fin, subcat : 〈1 [cat : np, case : acc], [cat : v, vform : inf, subj : 1]〉, subj : [cat : np, case : nom], num : [ ]]
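The only difference between the two entries is which noun phrase the tag 1 is attached to. A small Python sketch can make this concrete by modelling reentrancy as object identity: the same dictionary object serves as the controller and as the subj of the infinitival complement. The function name control_entry and the dictionary encoding are assumptions of the sketch, and the entries are simplified (the num feature is omitted).

```python
def control_entry(kind):
    """Build a simplified lexical entry for a control verb.

    `kind` is "subject" (promise-type) or "object" (persuade-type).
    Reentrancy is modelled by object identity: the controller NP and the
    subj of the infinitival complement are one and the same dict.
    """
    if kind not in ("subject", "object"):
        raise ValueError(f"unknown control type: {kind}")
    subj = {"cat": "np", "case": "nom"}
    obj = {"cat": "np", "case": "acc"}
    controller = subj if kind == "subject" else obj
    inf_vp = {"cat": "v", "vform": "inf", "subj": controller}
    return {"cat": "v", "vform": "fin", "subj": subj, "subcat": [obj, inf_vp]}


promised = control_entry("subject")
persuaded = control_entry("object")

# Information added to the matrix subject of promised is immediately
# visible as the understood subject of the infinitive.
promised["subj"]["num"] = "sg"
```

With promised, the understood subject of to work is the matrix subject; with persuaded, it is the first element of the subcategorization list, that is, the object.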


Linguistic applications Constituent coordination

Constituent coordination

N: no man lift up his [hand] or [foot] in all the land of Egypt

NP: Jacob saw [Rachel] and [the sheep of Laban]

VP: Jacob [went on his journey] and

[came to the land of the people of the east]

VP: Jacob [went near], and

[rolled the stone from the well’s mouth], and

[watered the flock of Laban his mother’s brother].

ADJ: every [speckled] and [spotted] sheep

ADJP: Leah was [tender eyed] but [not beautiful]

S: [Leah had four sons], but [Rachel was barren]

S: she said to Jacob, “[Give me children], or [I shall die]!”



Coordination in CFG

Example (Coordination in CFG)

S → S Conj S

NP → NP Conj NP

VP → VP Conj VP

...

Conj → and, or, but, . . .



Coordination in UG

Example (Coordination in UG)

[cat : 1] → [cat : 1] [cat : conj] [cat : 1]



Coordination in UG

Example (Coordination)

[cat : 1 v]
  [cat : 1, num : [ ], sc : elist]
    [cat : v, num : [ ], sc : 〈2〉]   rolled
    2 [cat : np, num : sg]   the stone
  [cat : conj]   and
  [cat : 1, num : [ ], sc : elist]
    [cat : v, num : [ ], sc : 〈3〉]   watered
    3 [cat : np, num : [ ]]   the sheep



Tough issues in coordination

Coordination of conjunctions

Properties of the conjoined phrases

Coordination of unlikes

Non-constituent coordination



Coordination

Example (Ruling out coordination in UG)

[cat : 1, conj : −] → [cat : 1, conj : +] [cat : conj] [cat : 1, conj : +]



Coordination

Example (Properties of the conjoined phrases)

[cat : 1 np, num : ??, pers : ??, gen : ??]
  [cat : 1, num : 4, pers : 2, gen : 8]
    [cat : pron, num : 4, pers : 2 second, gen : 8]   you
  [cat : conj]   and
  [cat : 1, num : 6, pers : 3, gen : 7]
    [cat : d, num : 6]   a
    [cat : n, num : 6 sg, pers : 3 third, gen : 7]   lamb



Coordination

Example (Coordination of unlikes)

Joseph became wealthy
Joseph became a minister
Joseph became [wealthy and a minister]
Joseph grew wealthy
∗Joseph grew a minister
∗Joseph grew [wealthy and a minister]



Coordination


[cat : 1 ⊓ 2] → [cat : 1] [cat : conj] [cat : 2]

where ‘⊓’ is the generalization operator
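Generalization keeps only the information that the two conjuncts have in common. The following Python sketch illustrates the idea for reentrancy-free feature structures encoded as nested dictionaries with string atoms; the encoding and the function name generalize are assumptions. One simplification should be noted: a feature whose two values share no information is dropped altogether, whereas a stricter definition would keep the feature with an empty value.

```python
def generalize(a, b):
    """Generalization of two reentrancy-free feature structures.

    Atoms are strings and complex structures are dicts; {} is the empty
    feature structure. The result keeps only the information common to
    both arguments. Features whose values share no information are
    dropped (a simplification made for this sketch).
    """
    if isinstance(a, dict) and isinstance(b, dict):
        result = {}
        for feat in sorted(a.keys() & b.keys()):
            common = generalize(a[feat], b[feat])
            if common != {}:
                result[feat] = common
        return result
    # Identical atoms generalize to themselves; anything else clashes.
    return a if a == b else {}


adjective = {"cat": {"v": "+", "n": "+"}}      # e.g. wealthy
noun_phrase = {"cat": {"v": "-", "n": "+"}}    # e.g. a minister
conjoined = generalize(adjective, noun_phrase)
```

Under this encoding the conjoined phrase carries only n : +, so it can satisfy a verb that requires a complement marked n : + but not one that requires v : + as well.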



Coordination


[cat : [v : +]]
  [cat : [v : +], subcat : [n : +]]   became
  [cat : [n : +]]
    [cat : [v : +, n : +]]   wealthy
    [cat : conj]   and
    [cat : [v : −, n : +]]   a minister



Coordination

Example (Coordination of unlikes)

[c : [v : +]]
  [c : [v : +], sc : [n : +]]
    [c : [v : +], sc : [v : +, n : +]]   grew
    [c : c]   and
    [c : [v : +], sc : [n : +]]   remained
  [c : [n : +]]
    [c : [v : +, n : +]]   wealthy
    [c : c]   and
    [c : [v : −, n : +]]   a minister



Coordination

Example (Non-constituent coordination)

Rachel gave the sheep [grass] and [water]
Rachel gave [the sheep grass] and [the lambs water]
Rachel [kissed] and Jacob [hugged] Binyamin



Unification grammars facilitate linguistic generalizations

Compared with context-free grammars, unification grammars provide much better means for expressing linguistic generalizations

Verb subcategorization

Coordination

Unification grammars also provide much more informative structures than CFGs

Agreement

Subject/object control

Unification grammars provide a very powerful tool for expressing what other linguistic theories would call “movement”

Gap–filler constructions

Unbounded dependencies



Expressiveness of unification grammars

We hinted above that unification grammars are more expressive than CFGs

Unification grammars are strictly more powerful than CFGs, even with respect to weak generative capacity

We show two unification grammars for formal languages that are known to be trans-context-free



Unification grammars are more expressive than CFGs

Gabc generates the language L = {a^n b^n c^n | n > 0}

The signature of the grammar consists of the features cat and t and the atoms s, ap, bp, cp, at, bt, ct and end

The terminal symbols are, of course, a, b and c

The start symbol is the left-hand side of the first rule



Unification grammars are more expressive than CFGs

Feature structures in this example have two features: cat, which stands for category, and t, which “counts” the length of sequences of a-s, b-s and c-s

The “category” is ap for strings of a-s, bp for b-s and cp for c-s

The categories at, bt and ct are pre-terminal categories of the words a, b and c, respectively

“Counting” is done in unary base: a string of length n is derived by an AFS (that is, an AMRS of length 1) whose depth is n

For example, the string bbb is derived by the following feature structure:

[cat : bp, t : [t : [t : end]]]
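The unary counting scheme can be sketched in Python. The nested-dictionary encoding and the helper names `counter`, `phrase` and `length` are assumptions made for illustration only; the slides define just the feature structures themselves.

```python
def counter(n):
    """Value of the t feature for a sequence of length n (n >= 1).

    Length 1 is the atom "end"; every further symbol wraps the value
    in one more layer of the feature t.
    """
    if n < 1:
        raise ValueError("sequences must have length at least 1")
    value = "end"
    for _ in range(n - 1):
        value = {"t": value}
    return value


def phrase(cat, n):
    """Feature structure deriving a sequence of n symbols of category cat."""
    return {"cat": cat, "t": counter(n)}


def length(fs):
    """Recover n by following the t path until the atom "end" is reached."""
    n = 1
    value = fs["t"]
    while value != "end":
        value = value["t"]
        n += 1
    return n


bbb = phrase("bp", 3)
```

Here `bbb` is `{"cat": "bp", "t": {"t": {"t": "end"}}}`, mirroring the structure shown for the string bbb, and `length(bbb)` recovers 3.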



A unification grammar Gabc for {a^n b^n c^n | n > 0}

Example (A unification grammar Gabc for {a^n b^n c^n | n > 0})

ρ1 : [cat : s] → [cat : ap, t : 3] [cat : bp, t : 3] [cat : cp, t : 3]

ρ2 : [cat : ap, t : [t : 4]] → [cat : at] [cat : ap, t : 4]

ρ3 : [cat : ap, t : end] → [cat : at]

ρ4 : [cat : bp, t : [t : 4]] → [cat : bt] [cat : bp, t : 4]


Example (A unification grammar Gabc for {a^n b^n c^n | n > 0})

ρ5 : [cat : bp, t : end] → [cat : bt]

ρ6 : [cat : cp, t : [t : 4]] → [cat : ct] [cat : cp, t : 4]

ρ7 : [cat : cp, t : end] → [cat : ct]

[cat : at] → a

[cat : bt] → b

[cat : ct] → c
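The way the rules ρ1 to ρ7 enforce equal counts can be sketched as a small Python recognizer. This is not a general unification parser: the function names `parse_run` and `generates` are hypothetical, feature structures are encoded as nested dictionaries, and equality of fully specified values stands in for unification of the shared t value in ρ1.

```python
def parse_run(word, pos, terminal):
    """Mimic the base and recursive rules for one category (e.g. ρ3 and ρ2 for ap).

    Consumes the maximal run of `terminal` starting at `pos` and returns
    (t_value, next_position), or None if the run is empty.
    """
    end = pos
    while end < len(word) and word[end] == terminal:
        end += 1
    if end == pos:
        return None
    t = "end"                         # base rule: a single symbol has t : end
    for _ in range(end - pos - 1):    # recursive rule: one more t layer per symbol
        t = {"t": t}
    return t, end


def generates(word):
    """True iff `word` is derivable under this sketch of Gabc."""
    pos = 0
    counters = []
    for terminal in "abc":
        parsed = parse_run(word, pos, terminal)
        if parsed is None:
            return False
        t, pos = parsed
        counters.append(t)
    # ρ1 shares one t value among ap, bp and cp, so all three must coincide.
    return pos == len(word) and counters[0] == counters[1] == counters[2]
```

Taking the maximal run is safe here because, by ρ1, the a-phrase must be followed by a b-phrase and the b-phrase by a c-phrase, so a run can never end early in a successful derivation.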




Example (Derivation tree of a^2 b^2 c^2)

[cat : s]

    [cat : ap, t : 3 [t : 4 end]]

        [cat : at]    a

        [cat : ap, t : 4 end]

            [cat : at]    a

    [cat : bp, t : 3 [t : 4 end]]

        [cat : bt]    b

        [cat : bp, t : 4 end]

            [cat : bt]    b

    [cat : cp, t : 3 [t : 4 end]]

        [cat : ct]    c

        [cat : cp, t : 4 end]

            [cat : ct]    c



A unification grammar Gabc for the language {a^n b^n c^n | n > 0}

Corollary

The grammar Gabc generates the language L = {a^n b^n c^n | n > 0}.



A unification grammar Gww for {ww | w ∈ {a, b}^+}

Example (A unification grammar Gww for {ww | w ∈ {a, b}^+})

[cat : s] → [first : 4, rest : 2] [first : 4, rest : 2]

[first : ap, rest : [first : 4, rest : 2]] → [cat : at] [first : 4, rest : 2]

[first : bp, rest : [first : 4, rest : 2]] → [cat : bt] [first : 4, rest : 2]

[first : ap, rest : elist] → [cat : at]

[first : bp, rest : elist] → [cat : bt]

[cat : at] → a

[cat : bt] → b
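The idea behind Gww, that each half of the string is derived by the same first/rest list, can be sketched in Python. The names `to_list` and `generates_ww` are hypothetical, lists are encoded as nested dictionaries ending in the atom "elist", and equality of the two fully specified lists stands in for the structure sharing imposed by the first rule.

```python
CATS = {"a": "ap", "b": "bp"}  # list elements contributed by the words a and b


def to_list(segment):
    """Build the first/rest list that derives `segment`, or None if
    the segment is empty or contains a symbol other than a and b."""
    if not segment or any(ch not in CATS for ch in segment):
        return None
    fs = "elist"
    for ch in reversed(segment):
        fs = {"first": CATS[ch], "rest": fs}
    return fs


def generates_ww(word):
    """True iff `word` splits into two parts derived by the same list."""
    for i in range(1, len(word)):
        left, right = to_list(word[:i]), to_list(word[i:])
        if left is not None and left == right:
            return True
    return False
```

For example, `to_list("ab")` yields `{"first": "ap", "rest": {"first": "bp", "rest": "elist"}}`, and the string aabaab is accepted because both halves yield the list for aab.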



A unification grammar Gww for {ww | w ∈ {a, b}^+}

Example (A derivation tree for the string aabaab)

[cat : S]

〈a, a, b〉〈a, a, b〉

〈a, b〉〈a, b〉

〈b〉〈b〉

a a b a a b



Unification grammars and Turing machines

How expressive are unification grammars?

They are equivalent in their weak generative power to unrestricted rewriting systems

Unification grammars are equivalent to Turing machines in their generative capacity

The languages generated by unification grammars are exactly the set of recursively enumerable languages

The universal recognition problem with unification grammars is undecidable: given an arbitrary unification grammar G and a string w, no computational procedure can be designed to determine whether w ∈ L(G)



Turing machines

Definition (Turing machines)

A (deterministic) Turing machine (Q,Σ, ♭, δ, s, h) is a tuple such that:

Q is a finite set of states

Σ is an alphabet, not containing the symbols L, R and elist

♭ ∈ Σ is the blank symbol

s ∈ Q is the initial state

h ∈ Q is the final state

δ : (Q \ {h}) × Σ → Q × (Σ ∪ {L, R}) is a total function specifying transitions.



Turing machines

A configuration of a Turing machine consists of the state, the contents of the tape and the position of the head on the tape

A configuration is depicted as a quadruple (q, w_l, σ, w_r) where q ∈ Q, w_l, w_r ∈ Σ∗ and σ ∈ Σ

The contents of the tape is ♭^ω · w_l · σ · w_r · ♭^ω, and the head is positioned on the σ symbol.



Turing machines

A given configuration yields a next configuration, determined by the transition function δ, the current state and the character on the tape that the head points to.

The next configuration of a configuration (q, w_l, σ, w_r) is defined iff q ≠ h, in which case it is:

(p, w_l, σ′, w_r) if δ(q, σ) = (p, σ′) for σ′ ∈ Σ

(p, w_l σ, first(w_r), but-first(w_r)) if δ(q, σ) = (p, R)

(p, but-last(w_l), last(w_l), σ w_r) if δ(q, σ) = (p, L)



Turing machines

where:

first(σ1 · · · σn) = σ1 if n > 0, and ♭ if n = 0

but-first(σ1 · · · σn) = σ2 · · · σn if n > 1, and ε if n ≤ 1

last(σ1 · · · σn) = σn if n > 0, and ♭ if n = 0

but-last(σ1 · · · σn) = σ1 · · · σn−1 if n > 1, and ε if n ≤ 1
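The definition of the next configuration can be transcribed into Python almost literally. In this sketch the tape parts are strings of single-character symbols, the character `_` stands in for the blank symbol ♭, and δ is a dictionary from (state, symbol) pairs to (state, action) pairs; all of these representation choices, and the function names, are assumptions made for illustration.

```python
BLANK = "_"  # stands in for the blank symbol of the definition


def first(w):
    return w[0] if w else BLANK


def but_first(w):
    return w[1:]   # empty string when len(w) <= 1


def last(w):
    return w[-1] if w else BLANK


def but_last(w):
    return w[:-1]  # empty string when len(w) <= 1


def next_configuration(delta, final, config):
    """Return the next configuration, or None if the state is final."""
    q, wl, sigma, wr = config
    if q == final:
        return None
    p, action = delta[(q, sigma)]
    if action == "R":   # move the head one cell to the right
        return (p, wl + sigma, first(wr), but_first(wr))
    if action == "L":   # move the head one cell to the left
        return (p, but_last(wl), last(wl), sigma + wr)
    return (p, wl, action, wr)  # otherwise: write the symbol `action`


# A tiny hypothetical machine: write 1, move right, halt.
delta = {("s", BLANK): ("q", "1"), ("q", "1"): ("h", "R")}
c0 = ("s", "", BLANK, "")
c1 = next_configuration(delta, "h", c0)
c2 = next_configuration(delta, "h", c1)
```

As in the definition, the alphabet must not contain the symbols L and R, since the sketch uses them to distinguish head movements from write actions.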



Turing machines

A next configuration is only defined for configurations in which the state is not the final state, h

Since δ is a total function, every configuration whose state is not h has a unique next configuration

A configuration c1 yields the configuration c2, denoted c1 ⊢ c2, iff c2 is the next configuration of c1



Unification grammars and Turing machines: program

define a unification grammar GM for every Turing machine M such that the grammar generates the word halt if and only if the machine terminates on the empty input string:

L(GM) = {halt} if M terminates on the empty input, and ∅ if M does not terminate on the empty input

if there were a decision procedure to determine whether w ∈ L(G) for an arbitrary unification grammar G, then in particular such a procedure could determine membership in the language of GM, which simulates the Turing machine M.

the procedure for deciding whether w ∈ L(G), when applied to the problem halt ∈ L(GM), determines whether M terminates on the empty input, which is known to be undecidable.




Feature structures will have three features: curr, representing the character under the head; right, representing the tape contents to the right of the head (as a list); and left, representing the tape contents to the left of the head, in reversed order

All the rules in the grammar are unit rules, and the only terminal symbol is halt. Therefore, the language generated by the grammar is necessarily either the singleton {halt} or the empty set



Unification grammars and Turing machines: signature

Let M = (Q, Σ, ♭, δ, s, h) be a Turing machine. Define a unification grammar GM as follows:

Feats = {cat, left, right, curr, first, rest}

Atoms = Σ ∪ {start, elist}

The start symbol is [cat : start]

The only terminal symbol is halt



Unification grammars and Turing machines: rules

Two rules are defined for every Turing machine:

[cat : start] → [cat : s, curr : ♭, right : elist, left : elist]

h → halt




For every q, σ such that δ(q, σ) = (p, σ′) and σ′ ∈ Σ, the following rule is defined:

[cat : q, curr : σ, right : 4, left : 2] → [cat : p, curr : σ′, right : 4, left : 2]




For every q, σ such that δ(q, σ) = (p, R) we define two rules:

[cat : q, curr : σ, right : elist, left : 4] → [cat : p, curr : ♭, right : elist, left : [first : σ, rest : 4]]

[cat : q, curr : σ, right : [first : 4, rest : 2], left : 5] → [cat : p, curr : 4, right : 2, left : [first : σ, rest : 5]]




For every q, σ such that δ(q, σ) = (p, L) we define two rules:

[cat : q, curr : σ, right : 4, left : elist] → [cat : p, curr : ♭, right : [first : σ, rest : 4], left : elist]

[cat : q, curr : σ, right : 4, left : [first : 2, rest : 5]] → [cat : p, curr : 2, right : [first : σ, rest : 4], left : 5]
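The rule schemata of GM can be sketched as a single rewriting step on feature structures. In the Python sketch below, feature structures are dictionaries, lists use first/rest with the atom "elist", `_` stands in for ♭, and δ is a dictionary; the names `step` and `derives_halt_within` are hypothetical. Because termination is undecidable in general, the driver takes an explicit step bound, which is an assumption of this sketch and not part of the construction.

```python
BLANK = "_"  # stands in for the blank symbol


def step(delta, fs):
    """Apply the unit rule of GM whose left-hand side matches `fs`."""
    q, sigma = fs["cat"], fs["curr"]
    right, left = fs["right"], fs["left"]
    p, action = delta[(q, sigma)]
    if action == "R":
        new_left = {"first": sigma, "rest": left}   # left is kept reversed
        if right == "elist":
            return {"cat": p, "curr": BLANK, "right": "elist", "left": new_left}
        return {"cat": p, "curr": right["first"],
                "right": right["rest"], "left": new_left}
    if action == "L":
        new_right = {"first": sigma, "rest": right}
        if left == "elist":
            return {"cat": p, "curr": BLANK, "right": new_right, "left": "elist"}
        return {"cat": p, "curr": left["first"],
                "right": new_right, "left": left["rest"]}
    # Write rule: only cat and curr change; right and left are passed on.
    return {"cat": p, "curr": action, "right": right, "left": left}


def derives_halt_within(delta, start, final, max_steps):
    """True if the final state is reached within `max_steps` rule applications."""
    fs = {"cat": start, "curr": BLANK, "right": "elist", "left": "elist"}
    for _ in range(max_steps):
        if fs["cat"] == final:
            return True
        fs = step(delta, fs)
    return fs["cat"] == final
```

A result of False only means that the final state was not reached within the bound; it does not show that the machine fails to terminate.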



Unification grammars and Turing machines: results

Lemma

Let c1, c2 be configurations of a Turing machine M, and A1, A2 be AFSs encoding these configurations, viewed as AMRSs of length 1. Then c1 ⊢ c2 iff A1 ⇒ A2 in GM.

Theorem

A Turing machine M halts on the empty input iff halt ∈ L(GM).

Corollary

The universal recognition problem for unification grammars is undecidable.




Unification grammars are indeed a model of computation: every recursively enumerable set can be computed as the language generated by some unification grammar

Consider again the simulation of a Turing machine M by a unification grammar GM

Feature structures manipulated by the grammar encode the contents of the Turing machine tape

By the end of the derivation, the pre-terminal of the terminal halt is a feature structure which encodes, in its right and left features, the contents of the tape when the Turing machine halts




w ∈ L(M) iff there exists a terminating computation of M where w is the contents of the tape

It is therefore possible to define, for each Turing machine M, a grammar G′M such that L(G′M) = L(M), in the following way

First, G′M is constructed in a similar way to GM, simulating the operation of M until it terminates (or indefinitely, in case it does not terminate)

Then, additional rules distinguish G′M from GM

Such rules should first copy the contents of the left feature to the beginning of the right list

Then, additional rules should pop the contents of the right list, one by one, and generate a pre-terminal for each of the list’s elements




Example (Unification grammars and Turing machines)

h → [cat : shift]

[cat : shift, right : 4, left : elist] → [cat : print, right : 4]

[cat : shift, right : 2, left : [first : σ, rest : 4]] → [cat : shift, right : [first : σ, rest : 2], left : 4]

[cat : print, right : elist] → ε

[cat : print, right : [first : σ, rest : 4]] → σ [cat : print, right : 4]
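The effect of the shift and print rules can be sketched in Python. Lists are encoded as nested dictionaries with first/rest and the atom "elist"; the function names `shift` and `print_list` are hypothetical, and the sketch covers only the left and right features that appear in the rules above.

```python
def shift(right, left):
    """Mimic the shift rules: move the elements of `left`, one at a time,
    onto the front of `right`. Since `left` stores the tape in reversed
    order, the original left-to-right order is restored."""
    while left != "elist":
        right = {"first": left["first"], "rest": right}
        left = left["rest"]
    return right


def print_list(right):
    """Mimic the print rules: pop the elements of `right` one by one and
    emit a symbol for each, stopping at the empty list."""
    out = []
    while right != "elist":
        out.append(right["first"])
        right = right["rest"]
    return out


# Tape "ab" to the left of the head (stored reversed) and "c" to the right.
left = {"first": "b", "rest": {"first": "a", "rest": "elist"}}
right = {"first": "c", "rest": "elist"}
symbols = print_list(shift(right, left))
```

With these inputs `symbols` is `["a", "b", "c"]`: the reversed left list is unwound onto the right list, which is then read off from left to right.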



Extensions and open problems

Restricted versions of unification grammars

Off-line parsability

Context-free and mildly context-sensitive unification grammars

Polynomially-parsable unification grammars

Typed unification grammars

Type hierarchies

Appropriateness specification

Type inference

Development of large-scale grammars

Grammar engineering

Modularity, information encapsulation, separate compilation, ...

