Logical and Computational Structures for Linguistic Modeling
MPRI 2-27-1
Benoît Crabbé
2020-2021
Models of syntax
This class provides an introduction to several formal systems for modelling the syntax of natural languages.
The class highlights some of the key modelling problems in natural language syntax, and each time we illustrate with a formal system that copes well with the problem at hand.
Modelling issues in the syntax of natural languages:
Ambiguity and lexicalism: weaknesses and reformulations of pcfg
Move: expressing movement without moving, non context-free patterns and tree adjoining grammars (tag)
Robustness and word order freedom: dependency syntax
Syntax-semantics interface and combinatory categorial grammar (ccg)
PCFG and ambiguity
Plan
1 PCFG and ambiguity
2 Tree adjoining grammars
3 Dependency syntax
4 Combinatory Categorial Grammar
Probabilistic Context Free grammar
Probabilistic context free grammar (pcfg) is a tree model rather than a sequence model, yet it falls into the class of generative language models.
While an hmm assumes a hidden sequence y, a pcfg assumes that the hidden structure y is instead organised as a tree.
A pcfg is a cfg that defines a language L such that ∑_{x∈L} P(x) = 1. That is, a pcfg is a language model.
Definition
A pcfg is a 5-tuple G = 〈Σ,T ,S ,R,P〉 where:
Σ is a set of non terminal symbols
T is a set of terminal symbols (words)
S ∈ Σ is an axiom
R is a set of rules of the form A→ β
P is a probabilistic weighting function such that P(A → β) ∈ [0, 1] and ∑_β P(A → β) = 1 for every A ∈ Σ
cfg derivation as a tree
[S [NP [D The] [N cat]] [VP [V sleeps] [PP [P on] [NP [D the] [N mat]]]]]
The tree instantiates the following occurrences of cfg rules:
S → NP VP
NP → D N
VP → V PP
PP → P NP
NP → D N
D → The
N → cat
V → sleeps
P → on
D → the
N → mat
Probability of a tree and the best tree problem
The probability of a derivation tree t is defined as:
P(t) = ∏_{A→β ∈ t} P(A → β)
Note that P(x, y) = P(t) in our notation inherited from hmms, because the tree also generates the observed symbols x as its leaves
Let T(x) be the set of trees such that yield(t) = x for every t ∈ T(x). In what follows we will mostly consider the problem of predicting the maximum probability tree:
t̂ = argmax_{t∈T(x)} P(t)
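The best tree can be computed in cubic time by a Viterbi variant of the CKY algorithm. A minimal sketch, assuming a toy grammar in Chomsky normal form with hypothetical rule probabilities (not a grammar from the course):

```python
import math
from collections import defaultdict

# Toy pcfg in Chomsky normal form; the rules and probabilities are
# hypothetical, chosen so that "the cat sleeps on the mat" parses.
binary = {("S", "NP", "VP"): 1.0, ("NP", "D", "N"): 1.0,
          ("VP", "V", "PP"): 0.6, ("VP", "V", "NP"): 0.4,
          ("PP", "P", "NP"): 1.0}
lexical = {("D", "the"): 1.0, ("N", "cat"): 0.5, ("N", "mat"): 0.5,
           ("V", "sleeps"): 1.0, ("P", "on"): 1.0}

def cky_viterbi(words, goal="S"):
    """Return (log-probability, tree) of the best parse, or None."""
    n = len(words)
    chart = defaultdict(dict)            # chart[i, j][A] = (logp, backpointer)
    for i, w in enumerate(words):
        for (A, word), p in lexical.items():
            if word == w:
                chart[i, i + 1][A] = (math.log(p), w)
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):    # split point
                for (A, B, C), p in binary.items():
                    if B in chart[i, k] and C in chart[k, j]:
                        lp = math.log(p) + chart[i, k][B][0] + chart[k, j][C][0]
                        if A not in chart[i, j] or lp > chart[i, j][A][0]:
                            chart[i, j][A] = (lp, (B, C, k))
    if goal not in chart[0, n]:
        return None

    def build(A, i, j):                  # follow the backpointers
        bp = chart[i, j][A][1]
        if isinstance(bp, str):
            return (A, bp)
        B, C, k = bp
        return (A, build(B, i, k), build(C, k, j))

    return chart[0, n][goal][0], build(goal, 0, n)

logp, tree = cky_viterbi("the cat sleeps on the mat".split())
```

With this toy grammar the best tree has probability 1 × 1 × 0.6 × 1 × 1 × 0.5 × 0.5 = 0.15.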
Example of ambiguity and lexical effects
(A) [S [NP Pierre] [VN mange] [NP [D une] [N salade] [PP [P avec] [NP [D des] [N tomates]]]]]
(B) [S [NP Pierre] [VN mange] [NP [D une] [N salade]] [PP [P avec] [NP [D des] [N tomates]]]]
The lexical problem
The preferred interpretation is (A). The rules coloured in red are those that differ between the two trees. With a standard pcfg the choice of interpretation depends on the probability of generic structural rules, independently of the lexical elements. Observe that for the structurally identical sentence Pierre mange une salade avec des couverts, whose preferred structure is (B), the disambiguation choice of the pcfg is identical.
A lexicalized solution: parse trees
A possible solution to the observed problem amounts to lexicalizing the parse trees (Collins, 2003) with lexical heads, such as:
(A) [S[mange] [NP[Pierre] Pierre] [VN[mange] mange] [NP[salade] [D[une] une] [N[salade] salade] [PP[tomates] [P[avec] avec] [NP[tomates] [D[des] des] [N[tomates] tomates]]]]]
(B) [S[mange] [NP[Pierre] Pierre] [VN[mange] mange] [NP[salade] [D[une] une] [N[salade] salade]] [PP[tomates] [P[avec] avec] [NP[tomates] [D[des] des] [N[tomates] tomates]]]]
A lexicalized solution: the grammar
The lexicalized representation implies grammars with rules annotated with lexical words, for the example:
S[mange] → NP[Pierre] VN[mange] NP[salade]
NP[salade] → D[une] N[salade] PP[tomates]
The rules of such a grammar are made of lexicalized categories X[w] that are couples of a traditional non terminal and a word symbol:
X[wh] → Y1[w1] … Yh[wh] … Yn[wn]
X[wh] → wh
Every lexicalized rule is restricted to have at least one occurrence of the lexical symbol of the lhs as part of a symbol in the rhs
Combinatorial explosion in the size of the grammar
A pcfg with lexicalized rules of this form has |Σ| × |T| non terminal symbols
Let n be the arity of the grammar rules and let |T| = 300000, |Σ| = 40; the number of rules r quickly becomes astronomical:
r = |Σ| (|Σ| × |T|)^n
5.76 × 10^15 ≈ 40 × (40 × 300000)^2
Pcfg parsers can process such grammars by generating rules dynamically (lexical elements fill their slots while parsing)
Probability estimation is difficult whatever the estimation method: the problem amounts to estimating probabilities for events that belong to general purpose world knowledge
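The figure on the slide can be checked directly:

```python
# Back-of-the-envelope count of lexicalized rules, with the slide's figures.
sigma = 40          # |Σ|, non terminal categories
T = 300_000         # |T|, vocabulary size
n = 2               # arity of (binary) rules
r = sigma * (sigma * T) ** n
print(f"r = {r:.2e}")
```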
The unlexicalized solution
Another line of solutions comes from unlexicalized representations (Klein and Manning, 2003; Petrov et al., 2006)
Here the working hypothesis rejects the idea of acquiring world knowledge from data
Disambiguation is performed by identifying recurrent patterns in the data that generally help to disambiguate properly
Example: the late attachment preference
This pattern, studied by Frazier (1987), states that in case of a choice, we generally prefer to attach to the latest open constituent in the sentence; here yesterday may close either the embedded clause or the main clause:
(Tom said (that Bill had taken the cleaning out yesterday?) yesterday?)
The capture of unlexicalized patterns by grammar category refinements
Category refinements for an example of the form: … le frère du père de la voisine
(A) [NP NP [PP P [NP NP [PP P NP]]]]
(B) [NP [NP NP [PP P NP]] [PP P NP]]
Key observation
Both trees have exactly the same probability under a pcfg because they are built from an identical set of rules.
The capture of unlexicalized patterns by categoryrefinements
The parent annotation amounts to annotating each node in a tree with its parent category
[S [NP [D The] [N cat]] [VP [V ate] [NP [D a] [N mouse]]]]
⇒ (parent annotation)
[S/∅ [NP/S [D The] [N cat]] [VP/S [V ate] [NP/VP [D a] [N mouse]]]]
The immediate effect is to contextualize the categories (NP/S is a subject and NP/VP is an object). This makes sense if we look at stats from the Penn treebank:

Type             NP → NP PP   NP → DT NN   NP → PRP   Other
NP/S (subject)        9%           9%         21%       61%
NP/VP (object)       22%           7%          3%       69%
NP/? (all)           11%           9%          6%       74%
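Parent annotation is a simple tree transform. A sketch over tuple-encoded trees (the helper name and encoding are mine):

```python
def parent_annotate(tree, parent="∅"):
    """Annotate each non terminal with its parent category (X becomes X/parent).

    Trees are (label, child, ...) tuples; a (label, word) pair is a
    pre-terminal and is left unannotated, as on the slide.
    """
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return tree                              # pre-terminal: D, N, V ...
    return (f"{label}/{parent}",
            *(parent_annotate(c, label) for c in children))

t = ("S", ("NP", ("D", "The"), ("N", "cat")),
          ("VP", ("V", "ate"), ("NP", ("D", "a"), ("N", "mouse"))))
annotated = parent_annotate(t)
```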
The capture of unlexicalized patterns by grammar category refinements: late closure by parent annotation
The parent annotation also catches the late closure preference:
(A) [NP/∅ NP/NP [PP/NP P [NP/PP NP/NP [PP/NP P NP/PP]]]]
(B) [NP/∅ [NP/NP NP/NP [PP/NP P NP/PP]] [PP/NP P NP/PP]]
where an NP within a PP gets category NP/PP and an NP within an NP gets category NP/NP. Since the rule sets are no longer identical, the preference is captured if we count it in the data.
PCFG evaluation
Pcfg parsers are evaluated by comparing the predictions of the parser with hand-annotated trees.
A parse triplet is a tuple 〈X, i, j〉 where X is a category, i is the index of the leftmost word in its yield and j the index of the rightmost word.
The set predicted is the set of all triplets found in the predicted parse tree, the set reference is the set of all triplets found in the reference tree, and predicted correct = predicted ∩ reference.
The precision is defined as:
P = predicted correct / predicted
The recall as:
R = predicted correct / reference
and the F-score:
F = 2PR / (P + R)
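These measures can be computed mechanically from tuple-encoded trees. A sketch (helper names mine), excluding pre-terminal spans as is usual in PARSEVAL-style scoring:

```python
def spans(tree, i=0):
    """Collect the ⟨X, i, j⟩ triplets of every constituent in a tuple tree."""
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return i + 1, set()                  # a pre-terminal covers one word
    j, triples = i, set()
    for child in children:
        j, sub = spans(child, j)
        triples |= sub
    triples.add((label, i, j))
    return j, triples

def parseval(predicted_tree, reference_tree):
    """Precision, recall and F-score over constituent triplets."""
    _, predicted = spans(predicted_tree)
    _, reference = spans(reference_tree)
    correct = len(predicted & reference)     # predicted ∩ reference
    p, r = correct / len(predicted), correct / len(reference)
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

# reference attaches the PP inside the VP; the prediction does not
ref = ("S", ("NP", ("D", "the"), ("N", "cat")),
            ("VP", ("V", "sleeps"),
                   ("PP", ("P", "on"), ("NP", ("D", "the"), ("N", "mat")))))
pred = ("S", ("NP", ("D", "the"), ("N", "cat")),
             ("VP", ("V", "sleeps")),
             ("PP", ("P", "on"), ("NP", ("D", "the"), ("N", "mat"))))
p, r, f = parseval(pred, ref)
```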
Some results
Name                     F-score
pcfg – base              ∼ 70
Unlex. annotations       85.8
pcfg-la                  90.1
Lexicalized (Collins)    87.6
Lexicalized (Charniak)   89.7
State of the art (2019)  95.6
Tree adjoining grammars
Movement in natural language
Let's consider again the syntax of arithmetic expressions:
(3 + (4 × 2))
We can write such expressions in prefix, infix or postfix notation; whatever the notation, the functor stands next to its arguments, yielding an easy evaluation process.
In natural language, cases following patterns of the form:
(2 (3 + (4 × )))
are quite possible. Here it is meant that the 2 has been 'moved away' from its canonical place (as argument of its functor) to some other position. This 'move' is fairly natural in many languages.
Examples of movements in natural languages
(interrogatives)
(Quel loup)1 ((Marie) a-t-elle vu (t1) (hier)) ?
(Quel loup)1 (tu crois (que ((Marie) a vu (t1) (hier)))) ?
(relatives)
Le loup ((que)1 ((Marie) a vu (t1) (hier))) a disparu
Le loup ((que)1 (tu crois (que ((Marie) a vu (t1) (hier))))) a disparu
(clefts)
C'est (le loup)1 (que ((Marie) a vu (t1) (hier))) ?
C'est (le loup)1 (que (tu crois (que ((Marie) a vu (t1) (hier))))) ?
A priori unbounded
Le loup que Pierre pense que son frère a dit (…) que le voisin croit que Marie aurait vu
Generative and transformational grammar
The formalisation of a generative grammar with movement (ortransformations) is the original insight of Chomsky (1956).
The architecture of a transformational grammar is the following:
Generate a base structure where all the predicates and arguments are in the same domain of locality (informally, the functor can fetch its arguments within the same grammatical rule)
Apply tree transformations (including tree structure addition, removal, substitution) to generate the surface (observed) form
Undecidability
Since transformations can be applied multiple times in sequence, and given their properties, there is no guarantee that the transformational process halts: the system is undecidable (Peters and Ritchie, 1973).
Tree adjoining grammar
Tree adjoining grammar (Joshi and Schabes, 1997) builds on the insight of lexicalism and on the idea of expressing movement without explicit tree transformations, by means of an operation called adjunction.
Tag is decidable and one can design parsing algorithms running in polynomial time, O(n^6)
A tree grammar
tag is a tree grammar, whose units are elementary trees that can be substituted into each other:
Descriptive syntax
Formal syntax cares more about the byproduct (trees) than about the derivation
We are eager to view the grammar and syntactic parsing as a tree-building device where trees are substituted together.
Substituting elementary trees for the matching leaves (D ← [D Le], N ← [N chat], V ← [V dort], P ← [P sur], D ← [D le], N ← [N paillasson]) yields the derived tree:
[S [NP [D Le] [N chat]] [VP [V dort] [PP [P sur] [NP [D le] [N paillasson]]]]]
Extended domain of locality
Since elementary trees are first class citizens, Tag allows us to use trees of depth > 1:
Under this descriptive view, we can plug together not only subtrees of depth 1, but subtrees of arbitrary depth ≥ 1 (e.g. an elementary tree such as [S NP↓ [VP [V dort] PP↓]] for Le chat dort sur le paillasson).
Such trees are said to have an extended domain of locality. A grammar whose units are elementary trees combined with an operation called tree substitution is called a tree substitution grammar (tsg)
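Substitution itself is a small tree rewrite. A sketch over tuple-encoded trees, where a bare string leaf such as "NP↓" marks a substitution node (the encoding and the sample elementary tree are mine):

```python
SUB = "↓"

def substitute(tree, arg):
    """Rewrite the leftmost substitution node whose label matches arg's root."""
    label, *children = tree
    out, done = [], False
    for c in children:
        if not done and c == arg[0] + SUB:   # a leaf such as "NP↓"
            c, done = arg, True
        elif not done and isinstance(c, tuple):
            c, done = substitute(c, arg)
        out.append(c)
    return (label, *out), done

alpha = ("S", "NP↓", ("VP", ("V", "dort"), "PP↓"))
derived, ok = substitute(alpha, ("NP", ("D", "Le"), ("N", "chat")))
```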
Tree rewriting and derivations
The substitution operation rewrites a leaf node of an elementary tree labelled with a non terminal symbol by an elementary tree whose root is labelled with the same non terminal symbol.
The derivation of a tsg is not a sequence of rewrite operations but a tree:
Relevance for syntactic description
Provided that the grammar is lexicalised, this allows for a direct encoding of lexical dependencies in the grammar.
Derived tree: [S [NP [D Le] [N chat]] [VP [V dort] [PP [P sur] [NP [D le] [N paillasson]]]]]
Derivation tree: dort dominates chat and sur; chat dominates le; sur dominates paillasson; paillasson dominates le
(here we name elementary trees by their lexical item)
Lexicalisation
Definition (lexicalisation)
A finitely ambiguous grammar G is lexicalized if every rule in G contains a lexical element
Definition (weak lexicalisation of a formalism)
A formalism F weakly lexicalizes a formalism F′ if for every grammar G′ ∈ F′ there is a lexicalized grammar G ∈ F such that L(G′) = L(G)
cfg in Greibach normal form (X → a α) weakly lexicalizes cfg
Definition (strong lexicalisation of a formalism)
A formalism F strongly lexicalizes a formalism F′ if for every grammar G′ ∈ F′ there is a lexicalized grammar G ∈ F that generates the same tree set as G′
tsg does not strongly lexicalize cfg
The proof relies on the idea that each elementary tree of a tsg has a finite height, while some cfgs can generate trees in which all paths grow without bound
Theorem
Consider the grammar G :
S → S S
S → a
This grammar generates sequences of a with trees whose path lengths from the root to the leaves grow unboundedly. This contradicts the definition of tsg, in which elementary trees have finite height and the grammar has a finite number of rules.
Tree adjoining grammar
Definition
A tree adjoining grammar (tag) is a tuple 〈Σ, S , L, I ,A〉 where:
Σ is a set of non terminal symbols
S ∈ Σ is an axiom
L is a set of terminal symbols (L ∩ Σ = ∅)
I is a set of initial trees. A leaf node n whose label ℓ(n) ∈ Σ is called a substitution node; a leaf node n whose label ℓ(n) ∈ L is called an anchor node; any non-leaf node is labelled by a non terminal
A is a set of auxiliary trees. Every auxiliary tree has exactly one leaf node n whose label ℓ(n) is equal to the root label; n is called the foot node (noted ⋆).
The set of trees I ∪ A is the set of elementary trees.
Two composition operations are defined: substitution and adjunction
Adjunction
Adjunction
Let β be an auxiliary tree and α an elementary tree. β is adjoined at a node x of α by splicing α in two parts: the top part x_top is replaced by the root of β and the bottom part x_bot is replaced by the foot node of β.
Adjunction is subject to the constraint that the foot node, the root node of β and x all have the same label.
It is not possible to adjoin at a substitution node
Adjoining inserts a tree β into a tree α. For instance, adjoining β = [V [Vaux a] V⋆] at the V node of α = [S NP↓ [V vu] NP↓] yields:
[S NP↓ [V [Vaux a] [V vu]] NP↓]
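The splice can be sketched over tuple-encoded trees, where a leaf such as "V⋆" marks the foot node (encoding and helper names mine; this sketch ignores adjoining constraints):

```python
FOOT = "⋆"

def plug_foot(node, root, target):
    """Replace the foot leaf 'root⋆' by the target subtree."""
    label, *children = node
    return (label, *[target if c == root + FOOT
                     else plug_foot(c, root, target) if isinstance(c, tuple)
                     else c
                     for c in children])

def adjoin(tree, beta):
    """Splice auxiliary tree beta at the first node labelled like beta's root."""
    if tree[0] == beta[0]:
        return plug_foot(beta, beta[0], tree), True
    label, *children = tree
    out, done = [], False
    for c in children:
        if not done and isinstance(c, tuple):
            c, done = adjoin(c, beta)
        out.append(c)
    return (label, *out), done

beta = ("V", ("Vaux", "a"), "V⋆")
alpha = ("S", "NP↓", ("V", "vu"), "NP↓")
derived, ok = adjoin(alpha, beta)
```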
Adjoining constraints
Every node n of an elementary tree can get an adjunction constraint:
Obligatory adjunction (OA): an adjunction must be performed on this node
Selective adjunction (SA): only a given subset of the auxiliary trees of the grammar may adjoin on this node
Null adjunction (NA): no adjunction may be performed on this node
Derivation in tree adjoining grammars
A derivation in a tag starts with an initial tree whose root is labelled by the grammar axiom (ℓ(r) = S). The derivation process rewrites substitution nodes until none remains in the derived tree. Adjunctions may optionally occur.
Example derivation
Elementary trees:
α[Pierre] = [N Pierre]   α[Marie] = [N Marie]
β[a] = [V [Vaux a] V⋆]   α[vu] = [S N↓ [V vu] N↓]
Derivation tree: α[vu] with children 1:α[Pierre], 2:β[a], 3:α[Marie]
Derived tree: [S [N Pierre] [V [Vaux a] [V vu]] [N Marie]]
Adjunction and unbounded move
Lexicalized tag naturally captures examples that would otherwise call for transformations:
Solution for TAG: adjoining the main clause into the subordinate clause. The auxiliary tree β = [S [N Pierre] [V pense] [Ssub [C que] S⋆]] adjoins at the S node of α = [Srel [ProRel que] [S [N Marie] [V a vu]]].
The predicate vu is encoded locally with all its dependents; the recursive structure is factored out of the domain of locality.
Remark: the "design" principle (argument/modifier) is not respected
(Le loup) que Pierre pense que Marie a vu
Factoring recursion out of the domain of locality
As can be observed, the functor (or predicate) vu has access to all its arguments que, Pierre, Marie within the same grammatical rule (elementary tree), while the recursive component is factored out. Observe also that the operation can be repeated, causing the relative pronoun to be unboundedly far from its predicate.
Context sensitivity
Adjunction is also an operation that contributes to making tag a class of languages that is a superset of the context free languages (and a subset of the context sensitive languages). This class of languages is called the mildly context sensitive languages.
Grammar generating the pattern a^n b^n c^n d^n:
α = [S ε]   β = [S_NA a [S b S⋆_NA c] d]
Grammar generating the 2-copy language {ww | w ∈ {a, b}⋆}:
α = [S ε]   β_a = [S_NA [S a S⋆_NA] a]   β_b = [S_NA [S b S⋆_NA] b]
Context sensitivity and natural languages
The pattern of the 2-copy language is found naturally in Dutch (andSwiss German):
… omdat ik Cecilia Henk de nijlpaarden zag helpen voeren
… because I Cecilia Henk the hippopotamus saw help feed
'… because I saw Cecilia help Henk feed the hippopotamus'
This kind of example (cross-serial dependencies) suggests that natural languages are mildly context sensitive.
Mildly context sensitive systems
tag is by no means the only mildly context sensitive formalism: ccg has the same generative capacity
tag however suffers from expressivity limits, and there exist slightly more general systems such as lcfrs (Kallmeyer, 2010), mcfg (Seki et al., 1991) and multi-component tag that make it easier to model the syntax of languages with more word order freedom, such as German or Korean
Dependency syntax
Dependency grammar
Dependency grammars (dg) are a grammatical description system that relates the words of a sentence with dependency edges:
the/D cat/N sleeps/V on/P the/D mat/N
And sometimes with typed dependencies:
det(cat, the), subj(sleeps, cat), iobj(sleeps, on), pobj(on, mat), det(mat, the)
Example of dependency dag
le/D chat/N souhaite/V dormir/V sur/P le/D paillasson/N
Dependency Dag
The decoration is variable, but the core of the dependency representation is a dependency dag whose nodes are ordered according to the linear order of the sentence
The tradition of Dependency Grammar
Dependency grammar comes from a descriptive tradition
It is well suited for the multilingual case:
Why do we care?
Direct encoding of predicate-argument structure (versus phrase structure grammar or tag)
Suited to free word order languages, where the dependency structure is less related to the way words are grouped together (and to unbounded dependencies)
Yet dependency grammar is not a grammar (!)
There are no rules of grammar
The grammar does not generate a language (in the formal sense)
Dependency grammar makes no predictions about language; rather, it provides descriptions
Given a sentence, the grammar provides a description; this is similar in spirit to discriminative models in machine learning.
Constraints on dependency structures
Let G = 〈V, E〉 be a dependency dag. We note:
i → j iff i ≠ j and (i, j) ∈ E
i ↔ j iff i ≠ j and ((i, j) ∈ E or (j, i) ∈ E)
i →* j iff i = j or ∃x. i → x and x →* j
i ↔* j iff i = j or ∃x. i ↔ x and x ↔* j
It is common to apply some of the following constraints on dependency structures:
G is connected: if i, j ∈ V then i ↔* j
G is acyclic: if i → j then not j →* i
Single head (treeness) constraint: if i → j then not x → j for any x ≠ i
Projectivity: if i → j then i →* x for every x such that i ≺ x ≺ j or j ≺ x ≺ i
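With the single-head constraint built in (each word stores one head index, -1 for the root), the remaining constraints can be checked directly; a sketch, with the arcs of the cat sleeps on the mat as an example:

```python
def is_tree(heads):
    """Connectedness + acyclicity: every word reaches the root, no cycle."""
    for i in range(len(heads)):
        seen, j = set(), i
        while j != -1:
            if j in seen:
                return False             # cycle: j →* j
            seen.add(j)
            j = heads[j]
    return True

def is_projective(heads):
    """i → j implies i →* x for every x strictly between i and j."""
    def dominates(i, x):                 # i →* x, walking up from x
        while x != -1:
            if x == i:
                return True
            x = heads[x]
        return False
    for j, i in enumerate(heads):
        if i == -1:
            continue
        if not all(dominates(i, x) for x in range(min(i, j) + 1, max(i, j))):
            return False
    return True

# the/0 cat/1 sleeps/2 on/3 the/4 mat/5
heads = [1, 2, -1, 2, 5, 3]
```

Crossing arcs, e.g. heads = [-1, 3, 0, 0] (arcs 0→2 and 3→1 cross), fail the projectivity check while still forming a tree.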
The projectivity constraint
The projectivity constraint (if i → j then i →* x for every x between i and j) says that a node generates a yield of contiguous words in the sentence:
[Figure: projective vs non-projective configurations; in the non-projective case an arc i → j covers a word i′ that i does not dominate]
Non-projectivity is related to movement and is likely to occur more often in free word order languages:
Remarks on constraints
Usually not all constraints are imposed: connectedness and acyclicity are used most of the time, while projectivity is sometimes relaxed (in theory, most of the time). The set of constraints used has an impact on parsing algorithms.
The projectivity constraint is typically used in practical applications (under this constraint it is fairly easy to design a parser); in theory, however, non-projective structures arise as soon as we handle unbounded dependencies, recall:
Le garçon que Pierre pense que Marie aime
Are natural languages really non-projective?
One may think that free word order languages are highly non-projective.
Let the gap degree of a node be the number of gaps in its yield, and the gap degree of a tree be the maximum gap degree of its nodes.
With UD corpora (https://universaldependencies.org) one can measure that the gap degree of most languages is generally close to 0:
For languages such as English or French, almost 99% of trees have gap degree 0.
For languages said to have free word order, 97% of Latin trees have gap degree 0; the same holds for Dutch and Hungarian. Ancient Greek has 92% of trees with gap degree 0.
Dependency Length Minimization
This observation is a consequence of a more general trend: dependencies tend to be short and, in general, the structure of the sentence tends to minimize the length of dependencies (Gildea and Temperley, 2010)
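Gap degree can be computed from a head-array encoding of the tree (heads[i] is word i's head, -1 for the root); a sketch:

```python
def gap_degree(heads):
    """Maximum number of discontinuities in the yield of any node."""
    n = len(heads)

    def dominates(i, x):                 # i →* x, walking up from x
        while x != -1:
            if x == i:
                return True
            x = heads[x]
        return False

    def gaps(i):
        yield_ = [x for x in range(n) if dominates(i, x)]
        return sum(1 for a, b in zip(yield_, yield_[1:]) if b - a > 1)

    return max(gaps(i) for i in range(n))
```

A projective tree has gap degree 0; a single pair of crossing arcs, as in heads = [-1, 3, 0, 0], already yields gap degree 1.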
The connection with context free grammar
Although dependency grammar is generally best understood as a descriptive system, it was observed early on that a projective dependency grammar can be expressed by a context free grammar (Gaifman, 1965). The rules are of the form:
X → α w β
(where α and β are sequences of non terminals and w is a terminal)
Example:
V → N V vu N
N → Pierre
N → D chat
D → le
V → a
Each rule encodes a lexical head together with the categories of its dependents
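The extraction of such rules from a projective dependency tree is direct; a sketch (function name mine) on Pierre a vu le chat:

```python
def gaifman_rules(heads, cats, words):
    """One rule X → α w β per word: α / β hold the categories of the
    word's left / right dependents, in sentence order."""
    rules = set()
    for i, cat in enumerate(cats):
        left = tuple(cats[j] for j in range(i) if heads[j] == i)
        right = tuple(cats[j] for j in range(i + 1, len(cats)) if heads[j] == i)
        rules.add((cat, left, words[i], right))
    return rules

rules = gaifman_rules([2, 2, -1, 4, 2],
                      ["N", "V", "V", "D", "N"],
                      ["Pierre", "a", "vu", "le", "chat"])
```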
Dependency syntax
Context free grammar and robustness
A dg does not typically constrain for grammaticality: the set of grammatical rules is some set of permutations of grammatical symbols. At first sight, dg thus seems a poor theory of grammar…
Another comparison with cfg highlights the issue. The Penn treebank is a collection of 40000 English constituent parse trees, from which we can extract counts of cfg rule occurrences; the pattern is again Zipfian:
[Figure: Zipf and Heaps curves for the cfg rules extracted from the treebank]
This suggests that adding a long tail of low frequency rules is required to achieve the robustness of a grammar
Combinatory Categorial Grammar
Syntax and semantics
Syntax is in general not an end in itself, except maybe for dependencyrepresentations
Categorial grammar is a framework that lends itself to studying topics at the syntax-semantics interface.
In this part of the class we illustrate with combinatory categorial grammar (ccg)
Categorial grammar: categories
Categorial grammar (Ajdukiewicz, 1935; Bar-Hillel, 1953) is made of a lexicon that maps words to categories and two inference rules
Let P be the set of primitive categories; the set C of categories is defined inductively as:
if p ∈ P then p ∈ C
if p, q ∈ C then (p/q) ∈ C
if p, q ∈ C then (p\q) ∈ C
There is a primitive category called the axiom, S ∈ P
The lexicon is a binary relation, relating word strings and categories.
Example lexical entries:
Jean := N, Marie := N, petit := N/N, aime := (S\N)/N, est := N, est := (S\N)/(N/N)
Benoıt Crabbe Logical and Computational Structures for Linguistic ModelingMPRI 2-27-12020-2021 47 / 60
Combinatory Categorial Grammar
Inference rules
Basic categorial grammar has two inference rules, forward andbackward functional application :
X/Y  Y  ⇒  X   (>)
Y  X\Y  ⇒  X   (<)
The forward application rule (>) is understood as: X/Y is a functor taking Y as its right argument and returning X as result
The backward application rule (<) is understood as: X\Y is a functor taking Y as its left argument and returning X as result
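Since both rules only combine adjacent categories, recognition is again a CKY-style procedure. A sketch with categories encoded as nested ("/" or "\", result, argument) tuples (encoding mine):

```python
def combine(left, right):
    """Try forward (>) then backward (<) application on adjacent categories."""
    if isinstance(left, tuple) and left[0] == "/" and left[2] == right:
        return left[1]                   # X/Y  Y  ⇒  X
    if isinstance(right, tuple) and right[0] == "\\" and right[2] == left:
        return right[1]                  # Y  X\Y  ⇒  X
    return None

def recognize(cats, goal="S"):
    """True iff the category sequence reduces to the goal category."""
    n = len(cats)
    chart = {(i, i + 1): {c} for i, c in enumerate(cats)}
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            cell = set()
            for k in range(i + 1, j):
                for l in chart[i, k]:
                    for r in chart[k, j]:
                        res = combine(l, r)
                        if res is not None:
                            cell.add(res)
            chart[i, j] = cell
    return goal in chart[0, n]

eats = ("/", ("\\", "S", "NP"), "NP")    # (S\NP)/NP
parses = recognize(["NP", eats, "NP"])   # John eats apples
```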
A categorial grammar for arithmetic expressions
+ := (int\int)/int
- := (int\int)/int
0 := int
1 := int
2 := int
3 := int
3 + 2 − 1
+ combines with 2 by (>): (int\int)/int  int  ⇒  int\int
3 combines with (+ 2) by (<): int  int\int  ⇒  int
− combines with 1 by (>): (int\int)/int  int  ⇒  int\int
(3 + 2) combines with (− 1) by (<): int  int\int  ⇒  int
Categorial grammar and phrase structure grammar
John := NP
eats := (S\NP)/NP
apples := NP
John:NP   eats:(S\NP)/NP   apples:NP
eats apples ⇒ (>) S\NP
John (S\NP) ⇒ (<) S
The corresponding phrase structure tree is [S [NP John] [VP [V eats] [NP apples]]]
Weak equivalence
Both cfg and cg generate context free languages. Note also that for every AB categorial grammar there is an equivalent cfg in Chomsky Normal Form. In the other direction, however, strong equivalence does not hold in general
Categorial grammar and semantics
Categories can be understood as an encoding of the type of asemantic representation.
We can indeed make the semantic representation explicit by augmenting the categories:
John := NP : John
eats := (S\NP)/NP : λxy.eat(y, x)
apples := NP : apple
And add to the inference rules the capacity to operate on semantic representations:
X/Y : f   Y : a   ⇒   X : f a   (>)
Y : a   X\Y : f   ⇒   X : f a   (<)
where f is a functor and a an argument
Example of semantic computation
John:NP:John   eats:(S\NP)/NP:λxy.eat(y, x)   apples:NP:apple
eats apples ⇒ (>) S\NP : λy.eat(y, apple)
John (S\NP) ⇒ (<) S : eat(John, apple)
Remark
With semantics, categorial grammar can be seen as a notational layer encoding a subset of the simply typed lambda calculus with a "directional" application rule
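The correspondence with the lambda calculus can be made concrete by pairing each category with a meaning; a sketch where meanings are curried Python lambdas and terms are nested tuples (encoding and helper names mine):

```python
# Lexical entries pair a category with a meaning; eats gets the
# curried meaning λxy.eat(y, x), as on the slide.
john   = ("NP", "John")
apples = ("NP", "apple")
eats   = (("/", ("\\", "S", "NP"), "NP"),        # (S\NP)/NP
          lambda x: lambda y: ("eat", y, x))

def forward(left, right):
    """X/Y : f   Y : a   ⇒   X : f a"""
    (lcat, f), (rcat, a) = left, right
    assert lcat[0] == "/" and lcat[2] == rcat
    return lcat[1], f(a)

def backward(left, right):
    """Y : a   X\\Y : f   ⇒   X : f a"""
    (lcat, a), (rcat, f) = left, right
    assert rcat[0] == "\\" and rcat[2] == lcat
    return rcat[1], f(a)

vp = forward(eats, apples)    # S\NP : λy.eat(y, apple)
s = backward(john, vp)        # S : eat(John, apple)
```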
Combinatory categorial grammar
As such, categorial grammar is not expressive enough to accommodate natural language modelling.
Combinatory categorial grammar (ccg) adds further inference rules to handle various phenomena of natural language (Steedman and Baldridge, 2011)
ccg not only provides more expressivity, it also increases the generative capacity: ccg is mildly context sensitive
ccg's additional inference rules make use only of Schönfinkel's and Curry's combinators
Combinatory Categorial Grammar
The functional composition rule
The functional composition rule is an example of such a combinatory inference rule. It builds on Curry's B combinator: B ≡ λfgz.f(gz)
The rule is given as follows:
    X/Y : f    Y/Z : g
    ------------------ >B
    X/Z : λz.f(gz)

    Y\Z : g    X\Y : f
    ------------------ <B
    X\Z : λz.f(gz)
Here is an example in natural language: John might eat apples

    John   := NP : John
    might  := (S\NP)/VP : λxy.might(y, x)
    eat    := VP/NP : λx.eat(x)
    apples := NP : apples

    might eat               ⇒ (S\NP)/NP : λzy.might(y, eat(z))   (>B)
    [might eat] apples      ⇒ S\NP : λy.might(y, eat(apples))    (>)
    John [might eat apples] ⇒ S : might(John, eat(apples))       (<)
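Curry's B combinator has a direct one-line rendering in code, which lets us replay the might/eat composition above. The encoding below is illustrative, not from the slides:

```python
# B ≡ λfgz.f(gz)
B = lambda f: lambda g: lambda z: f(g(z))

# Slide semantics: might := λxy.might(y, x), eat := λx.eat(x)
might = lambda x: lambda y: ("might", y, x)
eat = lambda x: ("eat", x)

# >B yields the semantics of (S\NP)/NP : λzy.might(y, eat(z))
might_eat = B(might)(eat)
assert might_eat("apples")("John") == ("might", "John", ("eat", "apples"))
```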
Combinatory Categorial Grammar
Coordination
Coordination is one of the most complex phenomena to analyze in natural language syntax.
For coordinators, the obvious category to start with is:
and := (X\X )/X (with X a metavariable over categories)
This allows for coordination of the same type, for instance:
    the := NP/N   black := N/N   cat := N   and := (NP\NP)/NP   mouse := N   run := S\NP

    black cat                          ⇒ N       (>)
    the [black cat]                    ⇒ NP      (>)
    the mouse                          ⇒ NP      (>)
    and [the mouse]                    ⇒ NP\NP   (>)
    [the black cat] [and the mouse]    ⇒ NP      (<)
    [the black cat and the mouse] run  ⇒ S       (<)
Combinatory Categorial Grammar
Type raising
Type raising is an additional inference rule that turns an argument into a function over functions over this argument:

    X : a
    ---------------- >T
    T/(T\X) : λf.fa
Here is a case of coordination with a shared object where type raising applies: John likes and Mary dislikes garlic

    John, Mary, garlic := NP   likes, dislikes := (S\NP)/NP   and := ((S/NP)\(S/NP))/(S/NP)

    John                                  ⇒ S/(S\NP)        (>T)
    John likes                            ⇒ S/NP            (>B)
    Mary                                  ⇒ S/(S\NP)        (>T)
    Mary dislikes                         ⇒ S/NP            (>B)
    and [Mary dislikes]                   ⇒ (S/NP)\(S/NP)   (>)
    [John likes] [and Mary dislikes]      ⇒ S/NP            (<)
    [John likes and Mary dislikes] garlic ⇒ S               (>)
As can be seen, type raising allows the NP subjects to be processed first on both sides of the coordination. The object is consumed last, once the coordination has been performed.
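Type raising is itself a combinator, T ≡ λa.λf.fa; together with B it reproduces the semantics of one conjunct of the shared-object coordination. Again an illustrative Python encoding, not the course's notation:

```python
# T ≡ λa.λf.fa  and  B ≡ λfgz.f(gz)
T = lambda a: lambda f: f(a)
B = lambda f: lambda g: lambda z: f(g(z))

likes = lambda x: lambda y: ("likes", y, x)   # semantics of (S\NP)/NP
john_raised = T("John")                       # S/(S\NP) : λf.f(John)
john_likes = B(john_raised)(likes)            # S/NP : λz.likes(John, z)
assert john_likes("garlic") == ("likes", "John", "garlic")
```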
Combinatory Categorial Grammar
Relatives and movement
The additional inference rules (composition and type raising) allow ccg to capture unbounded dependencies without adding a "movement" rule. Let's see an example with the object relative.
As seen earlier with adjectives, the noun modifier has type N\N, and an object relative is a sentence missing its object: S/NP. Thus the object relative pronoun is defined as:

    that := (N\N)/(S/NP)
Example (The mouse that Felix eats):

    the := NP/N   mouse := N   that := (N\N)/(S/NP)   Felix := NP   eats := (S\NP)/NP

    Felix                        ⇒ S/(S\NP)   (>T)
    Felix eats                   ⇒ S/NP       (>B)
    that [Felix eats]            ⇒ N\N        (>)
    mouse [that Felix eats]      ⇒ N          (<)
    the [mouse that Felix eats]  ⇒ NP         (>)
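At the category level, composition and type raising can be sketched in the same tuple encoding used earlier for application (`fwd_compose` and `type_raise` are hypothetical names of my own), reproducing the Felix eats ⇒ S/NP step of the relative clause:

```python
# Atoms are strings; functors are tuples (slash, result, argument).
NP, S, N = "NP", "S", "N"

def fwd_compose(left, right):
    """Forward composition (>B):  X/Y  Y/Z  =>  X/Z, or None if it does not apply."""
    if (isinstance(left, tuple) and left[0] == "/"
            and isinstance(right, tuple) and right[0] == "/"
            and left[2] == right[1]):
        return ("/", left[1], right[2])
    return None

def type_raise(x, t):
    """Type raising (>T):  X  =>  T/(T\\X)."""
    return ("/", t, ("\\", t, x))

eats = ("/", ("\\", S, NP), NP)       # eats := (S\NP)/NP
felix = type_raise(NP, S)             # Felix  =>  S/(S\NP)
assert fwd_compose(felix, eats) == ("/", S, NP)   # Felix eats  =>  S/NP
```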
Combinatory Categorial Grammar
Subordinate clauses
To understand the following unbounded case, let's see how we analyze subordinate clauses such as You think that Felix eats the mouse. In this case that is no longer a pronoun but rather an optional complementizer of type S/S:
    You := NP   think := (S\NP)/S   that := S/S   Felix := NP   eats := (S\NP)/NP   the := NP/N   mouse := N

    the mouse                             ⇒ NP     (>)
    eats [the mouse]                      ⇒ S\NP   (>)
    Felix [eats the mouse]                ⇒ S      (<)
    that [Felix eats the mouse]           ⇒ S      (>)
    think [that Felix eats the mouse]     ⇒ S\NP   (>)
    You [think that Felix eats the mouse] ⇒ S      (<)
Combinatory Categorial Grammar
The unbounded case
Here is an example for the unbounded movement case: The mouse that you think that Felix eats, where we register the complementizer that := (S/NP)/(S/NP) and think := ((S/NP)\NP)/(S/NP) to license bridging:
    the := NP/N   mouse := N   that := (N\N)/(S/NP)   you := NP
    think := ((S/NP)\NP)/(S/NP)   that := (S/NP)/(S/NP)   Felix := NP   eats := (S\NP)/NP

    Felix                                      ⇒ S/(S\NP)    (>T)
    Felix eats                                 ⇒ S/NP        (>B)
    that [Felix eats]                          ⇒ S/NP        (>)
    think [that Felix eats]                    ⇒ (S/NP)\NP   (>)
    you [think that Felix eats]                ⇒ S/NP        (<)
    that [you think that Felix eats]           ⇒ N\N         (>)
    mouse [that you think that Felix eats]     ⇒ N           (<)
    the [mouse that you think that Felix eats] ⇒ NP          (>)
Combinatory Categorial Grammar
Bibliography
Ajdukiewicz, K. (1935). Die syntaktische Konnexität. Studia Philosophica 1.
Bar-Hillel, Y. (1953). A quasi-arithmetical notation for syntactic description. Language 29(1), 47–58.
Chomsky, N. (1956). Three models for the description of language. IRE Trans. Information Theory 2(3), 113–124.
Collins, M. (2003, December). Head-driven statistical models for natural language parsing. Comput. Linguist. 29(4), 589–637.
Gaifman, H. (1965). Dependency systems and phrase-structure systems. Information and Control 8(3), 304–337.
Gildea, D. and D. Temperley (2010). Do grammars minimize dependency length? Cognitive Science 34(2), 286–310.
Joshi, A. K. and Y. Schabes (1997). Tree-adjoining grammars. In Handbook of Formal Languages, Volume 3: Beyond Words, pp. 69–123.
Kallmeyer, L. (2010). Parsing Beyond Context-Free Grammars. Cognitive Technologies. Springer.
Klein, D. and C. D. Manning (2003). Accurate unlexicalized parsing. In Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics, Sapporo, Japan, pp. 423–430.
Peters, S. and R. Ritchie (1973). On the generative power of transformational grammars. Inf. Sci. 6, 49–83.
Petrov, S., L. Barrett, R. Thibaux, and D. Klein (2006). Learning accurate, compact, and interpretable tree annotation. In Proceedings of COLING-ACL 2006, Sydney, Australia.
Seki, H., T. Matsumura, M. Fujii, and T. Kasami (1991). On multiple context-free grammars. Theor. Comput. Sci. 88(2), 191–229.
Steedman, M. and J. Baldridge (2011). Combinatory categorial grammar.