Logical and Computational Structures for Linguistic Modeling
MPRI 2-27-1
Benoît Crabbé
2020-2021
Models of syntax
This class provides an introduction to several formal systems for modelling the syntax of natural languages.
The class highlights some of the key modelling problems in natural language syntax, and each time we illustrate with a formal system that copes well with the problem at hand.
Modelling issues in the syntax of natural languages:
Ambiguity and lexicalism: weaknesses and reformulations of pcfg
Move: expressing movement without moving, non context-free patterns and tree adjoining grammars (tag)
Robustness and word order freedom: dependency syntax
Syntax-semantics interface and combinatory categorial grammar (ccg)
PCFG and ambiguity
Plan
1 PCFG and ambiguity
2 Tree adjoining grammars
3 Dependency syntax
4 Combinatory Categorial Grammar
Probabilistic Context Free grammar
Probabilistic context free grammar (pcfg) is a tree model rather than a sequence model, yet it falls into the class of generative language models.
While an hmm assumes a hidden sequence y, a pcfg assumes that the hidden structure y is instead organised as a tree.
A pcfg is a cfg that defines a language L such that ∑_{x∈L} P(x) = 1. That is, a pcfg is a language model.
Definition
A pcfg is a 5-tuple G = 〈Σ,T ,S ,R,P〉 where:
Σ is a set of non terminal symbols
T is a set of terminal symbols (words)
S ∈ Σ is an axiom
R is a set of rules of the form A→ β
P is a probabilistic weighting function such that P(A → β) ∈ [0, 1] and ∑_β P(A → β) = 1 for every A ∈ Σ
cfg derivation as a tree
[S [NP [D The] [N cat]] [VP [V sleeps] [PP [P on] [NP [D the] [N mat]]]]]
The tree instantiates the following occurrences of cfg rules:
S → NP VP
NP → D N
VP → V PP
PP → P NP
NP → D N
D → The
N → cat
V → sleeps
P → on
D → the
N → mat
Probability of a tree and the best tree problem
The probability of a derivation tree t is defined as:
P(t) = ∏_{A→β ∈ t} P(A → β)
Note that P(x, y) = P(t) in our notation inherited from hmms, because the tree also generates the observed symbols x as its leaves
Let T(x) be the set of trees such that yield(t) = x for every t ∈ T(x). In what follows we will mostly consider the problem of predicting the maximum probability tree:
t̂ = argmax_{t∈T(x)} P(t)
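The best tree can be computed in cubic time by a Viterbi variant of the CKY algorithm. A minimal sketch, assuming a toy grammar in Chomsky normal form with hypothetical rule probabilities (not a grammar from the course):

```python
import math
from collections import defaultdict

# Toy pcfg in Chomsky normal form; the rules and probabilities are
# hypothetical, chosen so that "the cat sleeps on the mat" parses.
binary = {("S", "NP", "VP"): 1.0, ("NP", "D", "N"): 1.0,
          ("VP", "V", "PP"): 0.6, ("VP", "V", "NP"): 0.4,
          ("PP", "P", "NP"): 1.0}
lexical = {("D", "the"): 1.0, ("N", "cat"): 0.5, ("N", "mat"): 0.5,
           ("V", "sleeps"): 1.0, ("P", "on"): 1.0}

def cky_viterbi(words, goal="S"):
    """Return (log-probability, tree) of the best parse, or None."""
    n = len(words)
    chart = defaultdict(dict)            # chart[i, j][A] = (logp, backpointer)
    for i, w in enumerate(words):
        for (A, word), p in lexical.items():
            if word == w:
                chart[i, i + 1][A] = (math.log(p), w)
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):    # split point
                for (A, B, C), p in binary.items():
                    if B in chart[i, k] and C in chart[k, j]:
                        lp = math.log(p) + chart[i, k][B][0] + chart[k, j][C][0]
                        if A not in chart[i, j] or lp > chart[i, j][A][0]:
                            chart[i, j][A] = (lp, (B, C, k))
    if goal not in chart[0, n]:
        return None

    def build(A, i, j):                  # follow the backpointers
        bp = chart[i, j][A][1]
        if isinstance(bp, str):
            return (A, bp)
        B, C, k = bp
        return (A, build(B, i, k), build(C, k, j))

    return chart[0, n][goal][0], build(goal, 0, n)

logp, tree = cky_viterbi("the cat sleeps on the mat".split())
```

With this toy grammar the best tree has probability 1 × 1 × 0.6 × 1 × 1 × 0.5 × 0.5 = 0.15.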
Example of ambiguity and lexical effects
(A) [S [NP Pierre] [VN mange] [NP [D une] [N salade] [PP [P avec] [NP [D des] [N tomates]]]]]
(B) [S [NP Pierre] [VN mange] [NP [D une] [N salade]] [PP [P avec] [NP [D des] [N tomates]]]]
The lexical problem
The preferred interpretation is (A). The rules coloured in red are those that differ between the two trees. With a standard pcfg the choice of interpretation depends on the probability of generic structural rules, independently of the lexical elements. Observe that for the structurally identical sentence Pierre mange une salade avec des couverts, whose preferred structure is (B), the disambiguation choice of the pcfg is identical.
A lexicalized solution: parse trees
A possible solution to the observed problem amounts to lexicalizing the parse trees (Collins, 2003) with lexical heads, such as:
(A) [S[mange] [NP[Pierre] Pierre] [VN[mange] mange] [NP[salade] [D[une] une] [N[salade] salade] [PP[tomates] [P[avec] avec] [NP[tomates] [D[des] des] [N[tomates] tomates]]]]]
(B) [S[mange] [NP[Pierre] Pierre] [VN[mange] mange] [NP[salade] [D[une] une] [N[salade] salade]] [PP[tomates] [P[avec] avec] [NP[tomates] [D[des] des] [N[tomates] tomates]]]]
A lexicalized solution: the grammar
The lexicalized representation implies grammars with rules annotated with lexical words, for the example:
S[mange] → NP[Pierre] VN[mange] NP[salade]
NP[salade] → D[une] N[salade] PP[tomates]
The rules of such a grammar are made of lexicalized categories X[w] that are couples of a traditional non terminal and a word symbol:
X[wh] → Y1[w1] … Yh[wh] … Yn[wn]
X[wh] → wh
Every lexicalized rule is restricted to have at least one occurrence of the lexical symbol of the lhs as part of a symbol in the rhs
Combinatorial explosion in the size of the grammar
A pcfg with lexicalized rules of this form has |Σ| × |T| non terminal symbols
Let n be the arity of the grammar rules and let |T| = 300000, |Σ| = 40; the number of rules r quickly becomes astronomical:
r = |Σ| (|Σ| × |T|)^n
5.76 × 10^15 ≈ 40 × (40 × 300000)^2
Pcfg parsers can process such grammars by generating rules dynamically (lexical elements fill their slots while parsing)
Probability estimation is difficult whatever the estimation method: the problem amounts to estimating probabilities for events that belong to general purpose world knowledge
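The figure on the slide can be checked directly:

```python
# Back-of-the-envelope count of lexicalized rules, with the slide's figures.
sigma = 40          # |Σ|, non terminal categories
T = 300_000         # |T|, vocabulary size
n = 2               # arity of (binary) rules
r = sigma * (sigma * T) ** n
print(f"r = {r:.2e}")
```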
The unlexicalized solution
Another line of solutions comes from unlexicalized representations (Klein and Manning, 2003; Petrov et al., 2006)
Here the working hypothesis rejects the idea of acquiring world knowledge from data
Disambiguation is performed by identifying recurrent patterns in the data that generally help to disambiguate properly
Example: the late attachment preference
This pattern, studied by Frazier (1987), states that in case of a choice, we generally prefer to attach to the latest open constituent in the sentence; here yesterday may close either the embedded clause or the main clause:
(Tom said (that Bill had taken the cleaning out yesterday?) yesterday?)
The capture of unlexicalized patterns by grammar category refinements
Category refinements for an example of the form: … le frère du père de la voisine
(A) [NP NP [PP P [NP NP [PP P NP]]]]
(B) [NP [NP NP [PP P NP]] [PP P NP]]
Key observation
Both trees have exactly the same probability under a pcfg because they are built from an identical set of rules.
The capture of unlexicalized patterns by categoryrefinements
The parent annotation amounts to annotating each node in a tree with its parent category
[S [NP [D The] [N cat]] [VP [V ate] [NP [D a] [N mouse]]]]
⇒ (parent annotation)
[S/∅ [NP/S [D The] [N cat]] [VP/S [V ate] [NP/VP [D a] [N mouse]]]]
The immediate effect is to contextualize the categories (NP/S is a subject and NP/VP is an object). This makes sense if we look at stats from the Penn treebank:

Type             NP → NP PP   NP → DT NN   NP → PRP   Other
NP/S (subject)        9%           9%         21%       61%
NP/VP (object)       22%           7%          3%       69%
NP/? (all)           11%           9%          6%       74%
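Parent annotation is a simple tree transform. A sketch over tuple-encoded trees (the helper name and encoding are mine):

```python
def parent_annotate(tree, parent="∅"):
    """Annotate each non terminal with its parent category (X becomes X/parent).

    Trees are (label, child, ...) tuples; a (label, word) pair is a
    pre-terminal and is left unannotated, as on the slide.
    """
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return tree                              # pre-terminal: D, N, V ...
    return (f"{label}/{parent}",
            *(parent_annotate(c, label) for c in children))

t = ("S", ("NP", ("D", "The"), ("N", "cat")),
          ("VP", ("V", "ate"), ("NP", ("D", "a"), ("N", "mouse"))))
annotated = parent_annotate(t)
```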
The capture of unlexicalized patterns by grammar category refinements: late closure by parent annotation
The parent annotation also catches the late closure preference:
(A) [NP/∅ NP/NP [PP/NP P [NP/PP NP/NP [PP/NP P NP/PP]]]]
(B) [NP/∅ [NP/NP NP/NP [PP/NP P NP/PP]] [PP/NP P NP/PP]]
where an NP within a PP gets category NP/PP and an NP within an NP gets category NP/NP. Since the rule sets are no longer identical, the preference is captured if we count it in the data.
PCFG evaluation
Pcfg parsers are evaluated by comparing the predictions of the parser with hand-annotated trees.
A parse triplet is a tuple 〈X, i, j〉 where X is a category, i is the index of the leftmost word in its yield and j the index of the rightmost word.
The set predicted is the set of all triplets found in the predicted parse tree, the set reference is the set of all triplets found in the reference tree, and predicted correct = predicted ∩ reference.
The precision is defined as:
P = predicted correct / predicted
The recall as:
R = predicted correct / reference
and the F-score:
F = 2PR / (P + R)
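These measures can be computed mechanically from tuple-encoded trees. A sketch (helper names mine), excluding pre-terminal spans as is usual in PARSEVAL-style scoring:

```python
def spans(tree, i=0):
    """Collect the ⟨X, i, j⟩ triplets of every constituent in a tuple tree."""
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return i + 1, set()                  # a pre-terminal covers one word
    j, triples = i, set()
    for child in children:
        j, sub = spans(child, j)
        triples |= sub
    triples.add((label, i, j))
    return j, triples

def parseval(predicted_tree, reference_tree):
    """Precision, recall and F-score over constituent triplets."""
    _, predicted = spans(predicted_tree)
    _, reference = spans(reference_tree)
    correct = len(predicted & reference)     # predicted ∩ reference
    p, r = correct / len(predicted), correct / len(reference)
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

# reference attaches the PP inside the VP; the prediction does not
ref = ("S", ("NP", ("D", "the"), ("N", "cat")),
            ("VP", ("V", "sleeps"),
                   ("PP", ("P", "on"), ("NP", ("D", "the"), ("N", "mat")))))
pred = ("S", ("NP", ("D", "the"), ("N", "cat")),
             ("VP", ("V", "sleeps")),
             ("PP", ("P", "on"), ("NP", ("D", "the"), ("N", "mat"))))
p, r, f = parseval(pred, ref)
```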
Some results
Name                     F-score
pcfg – base              ∼ 70
Unlex. annotations       85.8
pcfg-la                  90.1
Lexicalized (Collins)    87.6
Lexicalized (Charniak)   89.7
State of the art (2019)  95.6
Tree adjoining grammars
Movement in natural language
Let's consider again the syntax of arithmetic expressions:
(3 + (4 × 2))
We can write such expressions in prefix, infix or postfix notation; whatever the notation, the functor stands next to its arguments, yielding an easy evaluation process.
In natural language, cases following patterns of the form:
(2 (3 + (4 × )))
are quite possible. Here it is meant that the 2 has been 'moved away' from its canonical place (as argument of its functor) to some other position. This 'move' is fairly natural in many languages.
Examples of movements in natural languages
(interrogatives)
(Quel loup)1 ((Marie) a-t-elle vu (t1) (hier)) ?
(Quel loup)1 (tu crois (que ((Marie) a vu (t1) (hier)))) ?
(relatives)
Le loup ((que)1 ((Marie) a vu (t1) (hier))) a disparu
Le loup ((que)1 (tu crois (que ((Marie) a vu (t1) (hier))))) a disparu
(clefts)
C'est (le loup)1 (que ((Marie) a vu (t1) (hier))) ?
C'est (le loup)1 (que (tu crois (que ((Marie) a vu (t1) (hier))))) ?
A priori unbounded
Le loup que Pierre pense que son frère a dit (…) que le voisin croit que Marie aurait vu
Generative and transformational grammar
The formalisation of a generative grammar with movement (ortransformations) is the original insight of Chomsky (1956).
The architecture of a transformational grammar is the following:
Generate a base structure where all the predicates and arguments are in the same domain of locality (informally, the functor can fetch its arguments within the same grammatical rule)
Apply tree transformations (including tree structure addition, removal, substitution) to generate the surface (observed) form
Undecidability
Since transformations can be applied multiple times in sequence, and given their properties, there is no guarantee that the transformational process halts: the system is undecidable (Peters and Ritchie, 1973).
Tree adjoining grammar
Tree adjoining grammar (Joshi and Schabes, 1997) builds on the insight of lexicalism and on the idea of expressing movement without explicit tree transformations, by means of an operation called adjunction.
Tag is decidable and one can design parsing algorithms running in polynomial time, O(n^6)
A tree grammar
tag is a tree grammar, whose units are elementary trees that can be substituted into each other:
Descriptive syntax
Formal syntax cares more about the byproduct (trees) than about the derivation
We are eager to view the grammar and syntactic parsing as a tree-building device where trees are substituted together.
Substituting elementary trees for the matching leaves (D ← [D Le], N ← [N chat], V ← [V dort], P ← [P sur], D ← [D le], N ← [N paillasson]) yields the derived tree:
[S [NP [D Le] [N chat]] [VP [V dort] [PP [P sur] [NP [D le] [N paillasson]]]]]
Extended domain of locality
Since elementary trees are first class citizens, Tag allows us to use trees of depth > 1:
Under this descriptive view, we can plug together not only subtrees of depth 1, but subtrees of arbitrary depth ≥ 1 (e.g. an elementary tree such as [S NP↓ [VP [V dort] PP↓]] for Le chat dort sur le paillasson).
Such trees are said to have an extended domain of locality. A grammar whose units are elementary trees combined with an operation called tree substitution is called a tree substitution grammar (tsg)
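Substitution itself is a small tree rewrite. A sketch over tuple-encoded trees, where a bare string leaf such as "NP↓" marks a substitution node (the encoding and the sample elementary tree are mine):

```python
SUB = "↓"

def substitute(tree, arg):
    """Rewrite the leftmost substitution node whose label matches arg's root."""
    label, *children = tree
    out, done = [], False
    for c in children:
        if not done and c == arg[0] + SUB:   # a leaf such as "NP↓"
            c, done = arg, True
        elif not done and isinstance(c, tuple):
            c, done = substitute(c, arg)
        out.append(c)
    return (label, *out), done

alpha = ("S", "NP↓", ("VP", ("V", "dort"), "PP↓"))
derived, ok = substitute(alpha, ("NP", ("D", "Le"), ("N", "chat")))
```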
Tree rewriting and derivations
The substitution operation rewrites a leaf node of an elementary tree labelled with a non terminal symbol by an elementary tree whose root is labelled with the same non terminal symbol.
The derivation of a tsg is not a sequence of rewrite operations but a tree:
Relevance for syntactic description
Provided that the grammar is lexicalised, this allows for a direct encoding of lexical dependencies in the grammar.
Derived tree: [S [NP [D Le] [N chat]] [VP [V dort] [PP [P sur] [NP [D le] [N paillasson]]]]]
Derivation tree: dort dominates chat and sur; chat dominates le; sur dominates paillasson; paillasson dominates le
(here we name elementary trees by their lexical item)
Lexicalisation
Definition (lexicalisation)
A finitely ambiguous grammar G is lexicalized if every rule in G contains a lexical element
Definition (weak lexicalisation of a formalism)
A formalism F weakly lexicalizes a formalism F′ if for every grammar G′ ∈ F′ there is a lexicalized grammar G ∈ F such that L(G′) = L(G)
cfg in Greibach normal form (X → a α) weakly lexicalizes cfg
Definition (strong lexicalisation of a formalism)
A formalism F strongly lexicalizes a formalism F′ if for every grammar G′ ∈ F′ there is a lexicalized grammar G ∈ F that generates the same tree set as G′
tsg does not strongly lexicalize cfg
The proof relies on the idea that each elementary tree of a tsg has a finite height, while some cfgs can generate trees in which all paths grow without bound
Theorem
Consider the grammar G :
S → S S
S → a
This grammar generates sequences of a with trees whose path lengths from the root to the leaves grow unboundedly. This contradicts the definition of tsg, in which elementary trees have finite height and the grammar has a finite number of rules.
Tree adjoining grammar
Definition
A tree adjoining grammar (tag) is a tuple 〈Σ, S , L, I ,A〉 where:
Σ is a set of non terminal symbols
S ∈ Σ is an axiom
L is a set of terminal symbols (L ∩ Σ = ∅)
I is a set of initial trees. A leaf node n whose label ℓ(n) ∈ Σ is called a substitution node; a leaf node n whose label ℓ(n) ∈ L is called an anchor node; any non-leaf node is labelled by a non terminal
A is a set of auxiliary trees. Every auxiliary tree has exactly one leaf node n whose label ℓ(n) is equal to the root label; n is called the foot node (noted ⋆).
The set of trees I ∪ A is the set of elementary trees.
Two composition operations are defined: substitution and adjunction
Adjunction
Adjunction
Let β be an auxiliary tree and α an elementary tree. β is adjoined at a node x of α by splicing α in two parts: the top part x_top is replaced by the root of β and the bottom part x_bot is replaced by the foot node of β.
Adjunction is subject to the constraint that the foot node, the root node of β and x all have the same label.
It is not possible to adjoin at a substitution node
Adjoining inserts a tree β into a tree α. For instance, adjoining β = [V [Vaux a] V⋆] at the V node of α = [S NP↓ [V vu] NP↓] yields:
[S NP↓ [V [Vaux a] [V vu]] NP↓]
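The splice can be sketched over tuple-encoded trees, where a leaf such as "V⋆" marks the foot node (encoding and helper names mine; this sketch ignores adjoining constraints):

```python
FOOT = "⋆"

def plug_foot(node, root, target):
    """Replace the foot leaf 'root⋆' by the target subtree."""
    label, *children = node
    return (label, *[target if c == root + FOOT
                     else plug_foot(c, root, target) if isinstance(c, tuple)
                     else c
                     for c in children])

def adjoin(tree, beta):
    """Splice auxiliary tree beta at the first node labelled like beta's root."""
    if tree[0] == beta[0]:
        return plug_foot(beta, beta[0], tree), True
    label, *children = tree
    out, done = [], False
    for c in children:
        if not done and isinstance(c, tuple):
            c, done = adjoin(c, beta)
        out.append(c)
    return (label, *out), done

beta = ("V", ("Vaux", "a"), "V⋆")
alpha = ("S", "NP↓", ("V", "vu"), "NP↓")
derived, ok = adjoin(alpha, beta)
```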
Adjoining constraints
Every node n of an elementary tree can get an adjunction constraint:
Obligatory adjunction (OA): an adjunction must be performed on this node
Selective adjunction (SA): only a given subset of the auxiliary trees of the grammar may adjoin on this node
Null adjunction (NA): no adjunction may be performed on this node
Derivation in tree adjoining grammars
A derivation in a tag starts with an initial tree whose root is labelled by the grammar axiom (ℓ(r) = S). The derivation process rewrites substitution nodes until none remains in the derived tree. Adjunctions may optionally occur.
Example derivation
Elementary trees:
α[Pierre] = [N Pierre]   α[Marie] = [N Marie]
β[a] = [V [Vaux a] V⋆]   α[vu] = [S N↓ [V vu] N↓]
Derivation tree: α[vu] with children 1:α[Pierre], 2:β[a], 3:α[Marie]
Derived tree: [S [N Pierre] [V [Vaux a] [V vu]] [N Marie]]
Adjunction and unbounded move
Lexicalized tag naturally captures examples that would otherwise call for transformations:
Solution for TAG: adjoining the main clause into the subordinate clause. The auxiliary tree β = [S [N Pierre] [V pense] [Ssub [C que] S⋆]] adjoins at the S node of α = [Srel [ProRel que] [S [N Marie] [V a vu]]].
The predicate vu is encoded locally with all its dependents; the recursive structure is factored out of the domain of locality.
Remark: the "design" principle (argument/modifier) is not respected
(Le loup) que Pierre pense que Marie a vu
Factoring recursion out of the domain of locality
As can be observed, the functor (or predicate) vu has access to all its arguments que, Pierre, Marie within the same grammatical rule (elementary tree), while the recursive component is factored out. Observe also that the operation can be repeated, causing the relative pronoun to be unboundedly far from its predicate.
Context sensitivity
Adjunction is also an operation that contributes to making tag a class of languages that is a superset of the context free languages (and a subset of the context sensitive languages). This class of languages is called the mildly context sensitive languages.
Grammar generating the pattern a^n b^n c^n d^n:
α = [S ε]   β = [S_NA a [S b S⋆_NA c] d]
Grammar generating the 2-copy language {ww | w ∈ {a, b}⋆}:
α = [S ε]   β_a = [S_NA [S a S⋆_NA] a]   β_b = [S_NA [S b S⋆_NA] b]
Context sensitivity and natural languages
The pattern of the 2-copy language is found naturally in Dutch (andSwiss German):
… omdat ik Cecilia Henk de nijlpaarden zag helpen voeren
… because I Cecilia Henk the hippopotamus saw help feed
'… because I saw Cecilia help Henk feed the hippopotamus'
This kind of example (cross-serial dependencies) suggests that natural languages are mildly context sensitive.
Mildly context sensitive systems
tag is by no means the only mildly context sensitive formalism: ccg has the same generative capacity
tag however suffers from expressivity limits, and there exist slightly more general systems such as lcfrs (Kallmeyer, 2010), mcfg (Seki et al., 1991) and multi-component tag that make it easier to model the syntax of languages with more word order freedom, such as German or Korean
Dependency syntax
Dependency grammar
Dependency grammars (dg) are a grammatical description system that relates the words of a sentence with dependency edges:
the/D cat/N sleeps/V on/P the/D mat/N
And sometimes with typed dependencies:
det(cat, the), subj(sleeps, cat), iobj(sleeps, on), pobj(on, mat), det(mat, the)
Example of dependency dag
le/D chat/N souhaite/V dormir/V sur/P le/D paillasson/N
Dependency Dag
The decoration is variable, but the core of the dependency representation is a dependency dag whose nodes are ordered according to the linear order of the sentence
The tradition of Dependency Grammar
Dependency grammar comes from a descriptive tradition
It is well suited for the multilingual case:
Why do we care?
Direct encoding of predicate-argument structure (versus phrase structure grammar or tag)
Suited to free word order languages, where the dependency structure is less related to the way words are grouped together (and to unbounded dependencies)
Yet dependency grammar is not a grammar (!)
There are no rules of grammar
The grammar does not generate a language (in the formal sense)
Dependency grammar makes no predictions about language; rather, it provides descriptions
Given a sentence, the grammar provides a description; this is similar in spirit to discriminative models in machine learning.
Constraints on dependency structures
Let G = 〈V, E〉 be a dependency dag. We note:
i → j iff i ≠ j and (i, j) ∈ E
i ↔ j iff i ≠ j and ((i, j) ∈ E or (j, i) ∈ E)
i →* j iff i = j or ∃x. i → x and x →* j
i ↔* j iff i = j or ∃x. i ↔ x and x ↔* j
It is common to apply some of the following constraints on dependency structures:
G is connected: if i, j ∈ V then i ↔* j
G is acyclic: if i → j then not j →* i
Single head (treeness) constraint: if i → j then not x → j for any x ≠ i
Projectivity: if i → j then i →* x for every x such that i ≺ x ≺ j or j ≺ x ≺ i
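With the single-head constraint built in (each word stores one head index, -1 for the root), the remaining constraints can be checked directly; a sketch, with the arcs of the cat sleeps on the mat as an example:

```python
def is_tree(heads):
    """Connectedness + acyclicity: every word reaches the root, no cycle."""
    for i in range(len(heads)):
        seen, j = set(), i
        while j != -1:
            if j in seen:
                return False             # cycle: j →* j
            seen.add(j)
            j = heads[j]
    return True

def is_projective(heads):
    """i → j implies i →* x for every x strictly between i and j."""
    def dominates(i, x):                 # i →* x, walking up from x
        while x != -1:
            if x == i:
                return True
            x = heads[x]
        return False
    for j, i in enumerate(heads):
        if i == -1:
            continue
        if not all(dominates(i, x) for x in range(min(i, j) + 1, max(i, j))):
            return False
    return True

# the/0 cat/1 sleeps/2 on/3 the/4 mat/5
heads = [1, 2, -1, 2, 5, 3]
```

Crossing arcs, e.g. heads = [-1, 3, 0, 0] (arcs 0→2 and 3→1 cross), fail the projectivity check while still forming a tree.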
The projectivity constraint
The projectivity constraint (if i → j then i →* x for every x between i and j) says that a node generates a yield of contiguous words in the sentence:
[Figure: projective vs non-projective configurations; in the non-projective case an arc i → j covers a word i′ that i does not dominate]
Non-projectivity is related to movement and is likely to occur more often in free word order languages:
Remarks on constraints
Usually not all constraints are imposed: connectedness and acyclicity are used most of the time, while projectivity is sometimes relaxed (in theory, most of the time). The set of constraints used has an impact on parsing algorithms.
The projectivity constraint is typically used in practical applications (under this constraint it is fairly easy to design a parser); in theory, however, non-projective structures arise as soon as we handle unbounded dependencies, recall:
Le garçon que Pierre pense que Marie aime
Are natural languages really non-projective?
One may think that free word order languages are highly non-projective.
Let the gap degree of a node be the number of gaps in its yield, and the gap degree of a tree be the maximum gap degree of its nodes.
With UD corpora (https://universaldependencies.org) one can measure that the gap degree of most languages is generally close to 0:
For languages such as English or French, almost 99% of trees have gap degree 0.
For languages said to have free word order, 97% of Latin trees have gap degree 0; the same holds for Dutch and Hungarian. Ancient Greek has 92% of trees with gap degree 0.
Dependency Length Minimization
This observation is a consequence of a more general trend: dependencies tend to be short and, in general, the structure of the sentence tends to minimize the length of dependencies (Gildea and Temperley, 2010)
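Gap degree can be computed from a head-array encoding of the tree (heads[i] is word i's head, -1 for the root); a sketch:

```python
def gap_degree(heads):
    """Maximum number of discontinuities in the yield of any node."""
    n = len(heads)

    def dominates(i, x):                 # i →* x, walking up from x
        while x != -1:
            if x == i:
                return True
            x = heads[x]
        return False

    def gaps(i):
        yield_ = [x for x in range(n) if dominates(i, x)]
        return sum(1 for a, b in zip(yield_, yield_[1:]) if b - a > 1)

    return max(gaps(i) for i in range(n))
```

A projective tree has gap degree 0; a single pair of crossing arcs, as in heads = [-1, 3, 0, 0], already yields gap degree 1.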
The connection with context free grammar
Although dependency grammar is generally best understood as a descriptive system, it was observed early on that a projective dependency grammar can be expressed by a context free grammar (Gaifman, 1965). The rules are of the form:
X → α w β
(where α and β are sequences of non terminals and w is a terminal)
Example:
V → N V vu N
N → Pierre
N → D chat
D → le
V → a
Each rule encodes a lexical head together with the categories of its dependents
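The extraction of such rules from a projective dependency tree is direct; a sketch (function name mine) on Pierre a vu le chat:

```python
def gaifman_rules(heads, cats, words):
    """One rule X → α w β per word: α / β hold the categories of the
    word's left / right dependents, in sentence order."""
    rules = set()
    for i, cat in enumerate(cats):
        left = tuple(cats[j] for j in range(i) if heads[j] == i)
        right = tuple(cats[j] for j in range(i + 1, len(cats)) if heads[j] == i)
        rules.add((cat, left, words[i], right))
    return rules

rules = gaifman_rules([2, 2, -1, 4, 2],
                      ["N", "V", "V", "D", "N"],
                      ["Pierre", "a", "vu", "le", "chat"])
```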
Dependency syntax
Context free grammar and robustness
A dg does not typically constrain for grammaticality: the set of grammatical rules is some set of permutations of grammatical symbols. At first sight, dg thus seems a poor theory of grammar…
Another comparison with cfg highlights the issue. The Penn treebank is a collection of 40000 English constituent parse trees, from which we can extract counts of cfg rule occurrences; the pattern is again Zipfian:
[Figure: Zipf and Heaps curves for the cfg rules extracted from the treebank]
This suggests that adding a long tail of low frequency rules is required to achieve the robustness of a grammar
Combinatory Categorial Grammar
Syntax and semantics
Syntax is in general not an end in itself, except maybe for dependencyrepresentations
Categorial grammar is a framework that lends itself to studying topics at the syntax-semantics interface.
In this part of the class we illustrate with combinatory categorial grammar (ccg)
Categorial grammar: categories
Categorial grammar (Ajdukiewicz, 1935; Bar-Hillel, 1953) is made of a lexicon that maps words to categories and two inference rules
Let P be the set of primitive categories; the set C of categories is defined inductively as:
if p ∈ P then p ∈ C
if p, q ∈ C then (p/q) ∈ C
if p, q ∈ C then (p\q) ∈ C
There is a primitive category called the axiom, S ∈ P
The lexicon is a binary relation, relating word strings and categories.
Example lexical entries:
Jean := N, Marie := N, petit := N/N, aime := (S\N)/N, est := N, est := (S\N)/(N/N)
Benoıt Crabbe Logical and Computational Structures for Linguistic ModelingMPRI 2-27-12020-2021 47 / 60
Combinatory Categorial Grammar
Inference rules
Basic categorial grammar has two inference rules, forward andbackward functional application :
X/Y  Y  ⇒  X   (>)
Y  X\Y  ⇒  X   (<)
The forward application rule (>) is understood as: X/Y is a functor taking Y as its right argument and returning X as result
The backward application rule (<) is understood as: X\Y is a functor taking Y as its left argument and returning X as result
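Since both rules only combine adjacent categories, recognition is again a CKY-style procedure. A sketch with categories encoded as nested ("/" or "\", result, argument) tuples (encoding mine):

```python
def combine(left, right):
    """Try forward (>) then backward (<) application on adjacent categories."""
    if isinstance(left, tuple) and left[0] == "/" and left[2] == right:
        return left[1]                   # X/Y  Y  ⇒  X
    if isinstance(right, tuple) and right[0] == "\\" and right[2] == left:
        return right[1]                  # Y  X\Y  ⇒  X
    return None

def recognize(cats, goal="S"):
    """True iff the category sequence reduces to the goal category."""
    n = len(cats)
    chart = {(i, i + 1): {c} for i, c in enumerate(cats)}
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            cell = set()
            for k in range(i + 1, j):
                for l in chart[i, k]:
                    for r in chart[k, j]:
                        res = combine(l, r)
                        if res is not None:
                            cell.add(res)
            chart[i, j] = cell
    return goal in chart[0, n]

eats = ("/", ("\\", "S", "NP"), "NP")    # (S\NP)/NP
parses = recognize(["NP", eats, "NP"])   # John eats apples
```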
A categorial grammar for arithmetic expressions
+ := (int\int)/int
- := (int\int)/int
0 := int
1 := int
2 := int
3 := int
3 + 2 − 1
+ combines with 2 by (>): (int\int)/int  int  ⇒  int\int
3 combines with (+ 2) by (<): int  int\int  ⇒  int
− combines with 1 by (>): (int\int)/int  int  ⇒  int\int
(3 + 2) combines with (− 1) by (<): int  int\int  ⇒  int
Categorial grammar and phrase structure grammar
John := NP
eats := (S\NP)/NP
apples := NP
John:NP   eats:(S\NP)/NP   apples:NP
eats apples ⇒ (>) S\NP
John (S\NP) ⇒ (<) S
The corresponding phrase structure tree is [S [NP John] [VP [V eats] [NP apples]]]
Weak equivalence
Both cfg and cg generate context free languages. Note also that for every AB categorial grammar there is an equivalent cfg in Chomsky Normal Form. In the other direction, however, strong equivalence does not hold in general
Categorial grammar and semantics
Categories can be understood as an encoding of the type of asemantic representation.
We can indeed make the semantic representation explicit by augmenting the categories:
John := NP : John
eats := (S\NP)/NP : λxy.eat(y, x)
apples := NP : apple
And add to the inference rules the capacity to operate on semantic representations:
X/Y : f   Y : a   ⇒   X : f a   (>)
Y : a   X\Y : f   ⇒   X : f a   (<)
where f is a functor and a an argument
Example of semantic computation
John:NP:John   eats:(S\NP)/NP:λxy.eat(y, x)   apples:NP:apple
eats apples ⇒ (>) S\NP : λy.eat(y, apple)
John (S\NP) ⇒ (<) S : eat(John, apple)
Remark
With semantics, categorial grammar can be seen as a notational layer encoding a subset of the simply typed lambda calculus with a "directional" application rule
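The correspondence with the lambda calculus can be made concrete by pairing each category with a meaning; a sketch where meanings are curried Python lambdas and terms are nested tuples (encoding and helper names mine):

```python
# Lexical entries pair a category with a meaning; eats gets the
# curried meaning λxy.eat(y, x), as on the slide.
john   = ("NP", "John")
apples = ("NP", "apple")
eats   = (("/", ("\\", "S", "NP"), "NP"),        # (S\NP)/NP
          lambda x: lambda y: ("eat", y, x))

def forward(left, right):
    """X/Y : f   Y : a   ⇒   X : f a"""
    (lcat, f), (rcat, a) = left, right
    assert lcat[0] == "/" and lcat[2] == rcat
    return lcat[1], f(a)

def backward(left, right):
    """Y : a   X\\Y : f   ⇒   X : f a"""
    (lcat, a), (rcat, f) = left, right
    assert rcat[0] == "\\" and rcat[2] == lcat
    return rcat[1], f(a)

vp = forward(eats, apples)    # S\NP : λy.eat(y, apple)
s = backward(john, vp)        # S : eat(John, apple)
```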
Combinatory categorial grammar
As such, categorial grammar is not expressive enough to accommodate natural language modelling.
Combinatory categorial grammar (ccg) adds further inference rules to handle various phenomena of natural language (Steedman and Baldridge, 2011)
ccg not only provides more expressivity, it also increases the generative capacity: ccg is mildly context sensitive
ccg's additional inference rules make use only of Schönfinkel's and Curry's combinators
Combinatory Categorial Grammar
The functional composition rule
The functional composition rule is an example of such a combinatory inference rule. It builds on Curry's B combinator: B ≡ λfgz.f(gz)
The rule is given as follows:
    X/Y : f    Y/Z : g
    ------------------ >B
    X/Z : λz.f(gz)

    Y\Z : g    X\Y : f
    ------------------ <B
    X\Z : λz.f(gz)
Here is an example in natural language: John might eat apples

    John   := NP : John
    might  := (S\NP)/VP : λxy.might(y, x)
    eat    := VP/NP : λx.eat(x)
    apples := NP : apples

    might eat               ⇒ (S\NP)/NP : λzy.might(y, eat(z))   (>B)
    [might eat] apples      ⇒ S\NP : λy.might(y, eat(apples))    (>)
    John [might eat apples] ⇒ S : might(John, eat(apples))       (<)
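Curry's B combinator has a direct one-line rendering in code, which lets us replay the might/eat composition above. The encoding below is illustrative, not from the slides:

```python
# B ≡ λfgz.f(gz)
B = lambda f: lambda g: lambda z: f(g(z))

# Slide semantics: might := λxy.might(y, x), eat := λx.eat(x)
might = lambda x: lambda y: ("might", y, x)
eat = lambda x: ("eat", x)

# >B yields the semantics of (S\NP)/NP : λzy.might(y, eat(z))
might_eat = B(might)(eat)
assert might_eat("apples")("John") == ("might", "John", ("eat", "apples"))
```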
Combinatory Categorial Grammar
Coordination
Coordination is one of the most complex phenomena to analyze in natural language syntax.
For coordinators, the obvious category to start with is:
and := (X\X )/X (with X a metavariable over categories)
This allows for coordination of the same type, for instance:
    the := NP/N   black := N/N   cat := N   and := (NP\NP)/NP   mouse := N   run := S\NP

    black cat                          ⇒ N       (>)
    the [black cat]                    ⇒ NP      (>)
    the mouse                          ⇒ NP      (>)
    and [the mouse]                    ⇒ NP\NP   (>)
    [the black cat] [and the mouse]    ⇒ NP      (<)
    [the black cat and the mouse] run  ⇒ S       (<)
Combinatory Categorial Grammar
Type raising
Type raising is an additional inference rule that turns an argument into a function over functions over this argument:

    X : a
    ---------------- >T
    T/(T\X) : λf.fa
Here is a case of coordination with a shared object where type raising applies: John likes and Mary dislikes garlic

    John, Mary, garlic := NP   likes, dislikes := (S\NP)/NP   and := ((S/NP)\(S/NP))/(S/NP)

    John                                  ⇒ S/(S\NP)        (>T)
    John likes                            ⇒ S/NP            (>B)
    Mary                                  ⇒ S/(S\NP)        (>T)
    Mary dislikes                         ⇒ S/NP            (>B)
    and [Mary dislikes]                   ⇒ (S/NP)\(S/NP)   (>)
    [John likes] [and Mary dislikes]      ⇒ S/NP            (<)
    [John likes and Mary dislikes] garlic ⇒ S               (>)
As can be seen, type raising allows the NP subjects to be processed first on both sides of the coordination. The object is consumed last, once the coordination has been performed.
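Type raising is itself a combinator, T ≡ λa.λf.fa; together with B it reproduces the semantics of one conjunct of the shared-object coordination. Again an illustrative Python encoding, not the course's notation:

```python
# T ≡ λa.λf.fa  and  B ≡ λfgz.f(gz)
T = lambda a: lambda f: f(a)
B = lambda f: lambda g: lambda z: f(g(z))

likes = lambda x: lambda y: ("likes", y, x)   # semantics of (S\NP)/NP
john_raised = T("John")                       # S/(S\NP) : λf.f(John)
john_likes = B(john_raised)(likes)            # S/NP : λz.likes(John, z)
assert john_likes("garlic") == ("likes", "John", "garlic")
```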
Combinatory Categorial Grammar
Relatives and movement
The additional inference rules (composition and type raising) allow ccg to capture unbounded dependencies without adding a "movement" rule. Let's see an example with the object relative.
As seen earlier with adjectives, the noun modifier has type N\N, and an object relative is a sentence missing its object: S/NP. Thus the object relative pronoun is defined as:

    that := (N\N)/(S/NP)
Example (The mouse that Felix eats):

    the := NP/N   mouse := N   that := (N\N)/(S/NP)   Felix := NP   eats := (S\NP)/NP

    Felix                        ⇒ S/(S\NP)   (>T)
    Felix eats                   ⇒ S/NP       (>B)
    that [Felix eats]            ⇒ N\N        (>)
    mouse [that Felix eats]      ⇒ N          (<)
    the [mouse that Felix eats]  ⇒ NP         (>)
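At the category level, composition and type raising can be sketched in the same tuple encoding used earlier for application (`fwd_compose` and `type_raise` are hypothetical names of my own), reproducing the Felix eats ⇒ S/NP step of the relative clause:

```python
# Atoms are strings; functors are tuples (slash, result, argument).
NP, S, N = "NP", "S", "N"

def fwd_compose(left, right):
    """Forward composition (>B):  X/Y  Y/Z  =>  X/Z, or None if it does not apply."""
    if (isinstance(left, tuple) and left[0] == "/"
            and isinstance(right, tuple) and right[0] == "/"
            and left[2] == right[1]):
        return ("/", left[1], right[2])
    return None

def type_raise(x, t):
    """Type raising (>T):  X  =>  T/(T\\X)."""
    return ("/", t, ("\\", t, x))

eats = ("/", ("\\", S, NP), NP)       # eats := (S\NP)/NP
felix = type_raise(NP, S)             # Felix  =>  S/(S\NP)
assert fwd_compose(felix, eats) == ("/", S, NP)   # Felix eats  =>  S/NP
```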
Combinatory Categorial Grammar
Subordinate clauses
To understand the following unbounded case, let's see how we analyze subordinate clauses such as You think that Felix eats the mouse. In this case that is no longer a pronoun but rather an optional complementizer of type S/S:
    You := NP   think := (S\NP)/S   that := S/S   Felix := NP   eats := (S\NP)/NP   the := NP/N   mouse := N

    the mouse                             ⇒ NP     (>)
    eats [the mouse]                      ⇒ S\NP   (>)
    Felix [eats the mouse]                ⇒ S      (<)
    that [Felix eats the mouse]           ⇒ S      (>)
    think [that Felix eats the mouse]     ⇒ S\NP   (>)
    You [think that Felix eats the mouse] ⇒ S      (<)
Combinatory Categorial Grammar
The unbounded case
Here is an example for the unbounded movement case: The mouse that you think that Felix eats, where we register the complementizer that := (S/NP)/(S/NP) and think := ((S/NP)\NP)/(S/NP) to license bridging:
    the := NP/N   mouse := N   that := (N\N)/(S/NP)   you := NP
    think := ((S/NP)\NP)/(S/NP)   that := (S/NP)/(S/NP)   Felix := NP   eats := (S\NP)/NP

    Felix                                      ⇒ S/(S\NP)    (>T)
    Felix eats                                 ⇒ S/NP        (>B)
    that [Felix eats]                          ⇒ S/NP        (>)
    think [that Felix eats]                    ⇒ (S/NP)\NP   (>)
    you [think that Felix eats]                ⇒ S/NP        (<)
    that [you think that Felix eats]           ⇒ N\N         (>)
    mouse [that you think that Felix eats]     ⇒ N           (<)
    the [mouse that you think that Felix eats] ⇒ NP          (>)
Combinatory Categorial Grammar
Bibliography
Ajdukiewicz, K. (1935). Die syntaktische Konnexität. Studia Philosophica 1.
Bar-Hillel, Y. (1953). A quasi-arithmetical notation for syntactic description. Language 29(1), 47–58.
Chomsky, N. (1956). Three models for the description of language. IRE Trans. Information Theory 2(3), 113–124.
Collins, M. (2003, December). Head-driven statistical models for natural language parsing. Comput. Linguist. 29(4), 589–637.
Gaifman, H. (1965). Dependency systems and phrase-structure systems. Information and Control 8(3), 304–337.
Gildea, D. and D. Temperley (2010). Do grammars minimize dependency length? Cognitive Science 34(2), 286–310.
Joshi, A. K. and Y. Schabes (1997). Tree-adjoining grammars. In Handbook of Formal Languages, Volume 3: Beyond Words, pp. 69–123.
Kallmeyer, L. (2010). Parsing Beyond Context-Free Grammars. Cognitive Technologies. Springer.
Klein, D. and C. D. Manning (2003). Accurate unlexicalized parsing. In Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics, Sapporo, Japan, pp. 423–430.
Peters, S. and R. Ritchie (1973). On the generative power of transformational grammars. Inf. Sci. 6, 49–83.
Petrov, S., L. Barrett, R. Thibaux, and D. Klein (2006). Learning accurate, compact, and interpretable tree annotation. In Proceedings of COLING-ACL 2006, Sydney, Australia.
Seki, H., T. Matsumura, M. Fujii, and T. Kasami (1991). On multiple context-free grammars. Theor. Comput. Sci. 88(2), 191–229.
Steedman, M. and J. Baldridge (2011). Combinatory categorial grammar.