
Logical and Computational Structures for Linguistic Modeling

MPRI 2-27-1

Benoît Crabbé

2020-2021


Models of syntax

This class provides an introduction to several formal systems for modelling the syntax of natural languages.

The class highlights some of the key modelling problems in natural language syntax; for each problem we illustrate a formal system that copes well with it.

Modelling issues in the syntax of natural languages:

Ambiguity and lexicalism: weaknesses and reformulations of PCFG

Movement: expressing movement without moving, non-context-free patterns and tree adjoining grammars (TAG)

Robustness and word order freedom: dependency syntax

Syntax-semantics interface and combinatory categorial grammar (CCG)


PCFG and ambiguity

Plan

1 PCFG and ambiguity

2 Tree adjoining grammars

3 Dependency syntax

4 Combinatory Categorial Grammar


Probabilistic Context Free grammar

A probabilistic context-free grammar (PCFG) is a tree model rather than a sequence model, yet it falls into the class of generative language models.

While an HMM assumes a hidden sequence y, a PCFG assumes that the hidden structure y is organised as a tree.

A PCFG is a CFG that defines a language L such that $\sum_{x \in L} P(x) = 1$. That is, a PCFG is a language model.


Definition

A pcfg is a 5-tuple G = 〈Σ,T ,S ,R,P〉 where:

Σ is a set of non terminal symbols

T is a set of terminal symbols (words)

S ∈ Σ is an axiom

R is a set of rules of the form A→ β

P is a probabilistic weighting function such that $P(A \to \beta) \in [0, 1]$ and $\sum_{\beta} P(A \to \beta) = 1$ for every A ∈ Σ
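The definition can be made concrete with a small sketch. The grammar below is a toy example and its probabilities are made-up values chosen for illustration; the dictionary encoding and the helper name `is_normalized` are assumptions, not part of the course material. The function checks the two conditions on P stated above.

```python
from collections import defaultdict
from math import isclose

# Toy PCFG: each key (A, beta) is a rule A -> beta, each value is P(A -> beta).
# The probabilities are illustrative only.
RULES = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("D", "N")): 1.0,
    ("VP", ("V", "PP")): 0.6,
    ("VP", ("V",)): 0.4,
    ("PP", ("P", "NP")): 1.0,
    ("D", ("the",)): 1.0,
    ("N", ("cat",)): 0.5,
    ("N", ("mat",)): 0.5,
    ("V", ("sleeps",)): 1.0,
    ("P", ("on",)): 1.0,
}


def is_normalized(rules):
    """Check that every P(A -> beta) lies in [0, 1] and that, for every
    left-hand side A, the probabilities of its rules sum to 1."""
    totals = defaultdict(float)
    for (lhs, _rhs), p in rules.items():
        if not 0.0 <= p <= 1.0:
            return False
        totals[lhs] += p
    return all(isclose(total, 1.0) for total in totals.values())


print(is_normalized(RULES))
```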


cfg derivation as a tree

(S (NP (D The) (N cat)) (VP (V sleeps) (PP (P on) (NP (D the) (N mat)))))

The tree instantiates the following occurrences of CFG rules:

S → NP VP

NP → D N

VP → V PP

PP → P NP

NP → D N

D → The

N → cat

V → sleeps

P → on

D → the

N → mat


Probability of a tree and the best tree problem

The probability of a derivation tree t is defined as:

$P(t) = \prod_{A \to \beta \in t} P(A \to \beta)$

Note that P(x, y) = P(t) in our notation inherited from HMMs, because the tree also generates the observed symbols x as its leaves.

Let T(x) be the set of trees such that yield(t) = x for every t ∈ T(x). In what follows we will mostly consider the problem of predicting the maximum-probability tree:

$\hat{t} = \operatorname{argmax}_{t \in T(x)} P(t)$
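As a minimal sketch of these two formulas, the code below multiplies the probabilities of the rules that occur in a tree and picks the best tree among an explicit list of candidates. The nested-tuple tree encoding, the rule probabilities and the second candidate analysis are assumptions made for illustration; a real parser would search T(x) with a dynamic program rather than enumerate it.

```python
from math import prod

# Hypothetical rule probabilities (illustrative values only).
RULE_PROB = {
    ("S", ("NP", "VP")): 0.8,
    ("S", ("NP", "VP", "PP")): 0.2,
    ("VP", ("V", "PP")): 0.6,
    ("VP", ("V",)): 0.4,
    ("NP", ("D", "N")): 1.0,
    ("PP", ("P", "NP")): 1.0,
    ("D", ("The",)): 0.5,
    ("D", ("the",)): 0.5,
    ("N", ("cat",)): 0.5,
    ("N", ("mat",)): 0.5,
    ("V", ("sleeps",)): 1.0,
    ("P", ("on",)): 1.0,
}


def rules_of(tree):
    """Yield one (lhs, rhs) pair per internal node.
    A tree is (label, child, ...); a leaf (word) is a plain string."""
    label, *children = tree
    yield (label, tuple(c if isinstance(c, str) else c[0] for c in children))
    for child in children:
        if not isinstance(child, str):
            yield from rules_of(child)


def tree_prob(tree, rule_prob):
    """P(t): product of the probabilities of all rule occurrences in t."""
    return prod(rule_prob.get(rule, 0.0) for rule in rules_of(tree))


def best_tree(candidates, rule_prob):
    """argmax of P(t) over an explicit list of candidate trees."""
    return max(candidates, key=lambda t: tree_prob(t, rule_prob))


SUBJ = ("NP", ("D", "The"), ("N", "cat"))
PP = ("PP", ("P", "on"), ("NP", ("D", "the"), ("N", "mat")))
T1 = ("S", SUBJ, ("VP", ("V", "sleeps"), PP))      # PP inside the VP
T2 = ("S", SUBJ, ("VP", ("V", "sleeps")), PP)      # PP attached to S

print(tree_prob(T1, RULE_PROB), tree_prob(T2, RULE_PROB))
print(best_tree([T1, T2], RULE_PROB) is T1)
```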


Example of ambiguity and lexical effects

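Pierre mange une salade avec des tomates ("Pierre eats a salad with tomatoes")

(A) (S (NP Pierre) (VN mange) (NP (D une) (N salade) (PP (P avec) (NP (D des) (N tomates)))))

(B) (S (NP Pierre) (VN mange) (NP (D une) (N salade)) (PP (P avec) (NP (D des) (N tomates))))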

The lexical problem

The preferred interpretation is (A). The two trees differ only in the rules that attach the PP (colored in red on the slide). With a standard PCFG, the choice of interpretation depends only on the probabilities of generic structural rules, independently of the lexical items. Observe that for a sentence with identical structure whose preferred analysis is (B), Pierre mange une salade avec des couverts ("Pierre eats a salad with cutlery"), the disambiguation choice is therefore identical.


A lexicalized solution: parse trees

A possible solution to the observed problem is to lexicalize the parse trees (Collins, 2003) with lexical heads, such as:

(A) (S[mange] (NP[Pierre] Pierre) (VN[mange] mange) (NP[salade] (D[une] une) (N[salade] salade) (PP[tomates] (P[avec] avec) (NP[tomates] (D[des] des) (N[tomates] tomates)))))

(B) (S[mange] (NP[Pierre] Pierre) (VN[mange] mange) (NP[salade] (D[une] une) (N[salade] salade)) (PP[tomates] (P[avec] avec) (NP[tomates] (D[des] des) (N[tomates] tomates))))


A lexicalized solution: the grammar

The lexicalized representation implies that we have grammars whose rules are annotated with lexical words. For the example:

S[mange] → NP[Pierre] VN[mange] NP[salade]

NP[salade] → D[une] N[salade] PP[tomates]

The rules of such a grammar are made of lexicalized categories X[w], which are pairs of a traditional non-terminal and a word symbol:

$X[w_h] \to Y_1[w_1] \ldots Y_h[w_h] \ldots Y_n[w_n]$

$X[w_h] \to w_h$

Every lexicalized rule is restricted so that the lexical symbol $w_h$ of the left-hand side occurs at least once as part of a symbol in the right-hand side.


Combinatorial explosion in the size of the grammar

A PCFG with lexicalized rules of this form has |Σ| × |T| non-terminal symbols.

Let n be the arity of the grammar rules and let |T| = 300000, |Σ| = 40. The number of rules r quickly becomes astronomical:

$r = |\Sigma| \, (|\Sigma| \times |T|)^n$

For n = 2: $40 \times (40 \times 300000)^2 = 5.76 \times 10^{15}$

PCFG parsers can process such grammars by generating rules dynamically (lexical elements fill their slots while parsing).

Probability estimation is difficult whatever the estimation method. The problem amounts to estimating probabilities for events that belong to general-purpose world knowledge.
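The size estimate above can be checked with a few lines of arithmetic. The function name `lexicalized_rule_count` is a hypothetical helper; it simply evaluates the rule-count formula of this slide for a given arity.

```python
def lexicalized_rule_count(n_nonterminals, vocab_size, arity):
    """Rule count from the slide: |Sigma| * (|Sigma| * |T|) ** n."""
    return n_nonterminals * (n_nonterminals * vocab_size) ** arity


# |Sigma| = 40, |T| = 300000, binary rules (n = 2)
print(lexicalized_rule_count(40, 300_000, 2))  # 5760000000000000
```

With ternary rules the same call already returns a number around 7 × 10^22, which illustrates why such rules are generated on demand rather than stored.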


The unlexicalized solution

Another line of solutions comes from unlexicalized representations (Klein and Manning, 2003; Petrov et al., 2006).

Here the working hypothesis rejects the idea of acquiring world knowledge from data.

Disambiguation is performed by identifying recurrent patterns in the data that generally help to disambiguate properly.

Example: the late attachment preference

This pattern, studied by Frazier (1987), states that when there is a choice, we generally prefer to attach to the most recently opened constituent in the sentence:

(Tom said (that Bill had taken the cleaning out )? yesterday )? )


The capture of unlexicalized patterns by grammar category refinements

Category refinements for an example of the form: ... le frère du père de la voisine ("... the brother of the father of the neighbour")

(A) (NP NP (PP P (NP NP (PP P NP))))

(B) (NP (NP NP (PP P NP)) (PP P NP))

Key observation

Both trees have exactly the same probability under a PCFG because they are built from the same rules, each used the same number of times (NP → NP PP twice and PP → P NP twice).


The capture of unlexicalized patterns by category refinements

Parent annotation amounts to annotating each node in a tree with its parent category:

(S (NP (D The) (N cat)) (VP (V ate) (NP (D a) (N mouse))))

⇒ (parent annotation)

(S/∅ (NP/S (D The) (N cat)) (VP/S (V ate) (NP/VP (D a) (N mouse))))

The immediate effect is to contextualize the categories (NP/S is a subject and NP/VP is an object). This makes sense if we look at statistics on NP expansions from the Penn Treebank:

NP type          NP PP   DT NN   PRP   Other
NP/S (subject)    9%      9%     21%    61%
NP/VP (object)   22%      7%      3%    69%
NP/? (all)       11%      9%      6%    74%
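The parent annotation transform can be sketched as a short recursive function. The nested-tuple tree encoding and the function name `parent_annotate` are assumptions for illustration; as in the example tree of this slide, part-of-speech nodes (nodes that dominate only a word) are left unannotated.

```python
def parent_annotate(tree, parent="∅"):
    """Relabel every phrasal node X as X/parent.
    A tree is (label, child, ...); a word is a plain string."""
    label, *children = tree
    if all(isinstance(c, str) for c in children):
        return tree  # part-of-speech node: kept as is
    new_children = [
        c if isinstance(c, str) else parent_annotate(c, label)
        for c in children
    ]
    return (f"{label}/{parent}", *new_children)


TREE = ("S",
        ("NP", ("D", "The"), ("N", "cat")),
        ("VP", ("V", "ate"), ("NP", ("D", "a"), ("N", "mouse"))))

print(parent_annotate(TREE))
```

Reading rules off the annotated trees of a treebank then yields separate probability estimates for NP/S and NP/VP expansions.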


The capture of unlexicalized patterns by grammar category refinements: late closure by parent annotation

Parent annotation also captures the late closure preference:

(A) (NP/∅ NP/NP (PP/NP P (NP/PP NP/NP (PP/NP P NP/PP))))

(B) (NP/∅ (NP/NP NP/NP (PP/NP P NP/PP)) (PP/NP P NP/PP))

where an NP within a PP gets category NP/PP and an NP within an NP gets category NP/NP. Since the rule sets are no longer identical (NP/PP → NP/NP PP/NP occurs only in (A), NP/NP → NP/NP PP/NP only in (B)), the preference can be captured by estimating rule probabilities from data.


PCFG evaluation

PCFG parsers are evaluated by comparing the predictions of the parser with hand-annotated trees.

A parse triplet is a tuple 〈X, i, j〉 where X is a category, i is the index of the leftmost word in its yield and j the index of the rightmost word.

The set predicted is the set of all triplets found in the predicted parse tree, the set reference is the set of all triplets found in the reference tree, and correct = predicted ∩ reference.

The precision is defined as:

$P = \frac{|\text{correct}|}{|\text{predicted}|}$

The recall as:

$R = \frac{|\text{correct}|}{|\text{reference}|}$

and the F-score:

$F = \frac{2PR}{P + R}$
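The three measures can be sketched as follows, using the two analyses of Pierre mange une salade avec des tomates as predicted and reference trees. The tuple encoding of trees and the function names are assumptions; word indices start at 0, and every node, including part-of-speech nodes, contributes a triplet, which is one possible reading of "all triplets" (standard evaluation tools usually leave part-of-speech nodes out).

```python
def triplets(tree, start=0, acc=None):
    """Collect the (category, i, j) triplets of a tree and return them
    together with the position that follows the tree's yield."""
    if acc is None:
        acc = set()
    label, *children = tree
    pos = start
    for child in children:
        if isinstance(child, str):
            pos += 1                      # a word consumes one position
        else:
            _, pos = triplets(child, pos, acc)
    acc.add((label, start, pos - 1))
    return acc, pos


def evaluate(predicted_tree, reference_tree):
    """Return (precision, recall, F-score) over parse triplets."""
    predicted, _ = triplets(predicted_tree)
    reference, _ = triplets(reference_tree)
    correct = predicted & reference
    p = len(correct) / len(predicted)
    r = len(correct) / len(reference)
    f = 2 * p * r / (p + r) if p + r > 0 else 0.0
    return p, r, f


PP = ("PP", ("P", "avec"), ("NP", ("D", "des"), ("N", "tomates")))
TREE_A = ("S", ("NP", "Pierre"), ("VN", "mange"),
          ("NP", ("D", "une"), ("N", "salade"), PP))
TREE_B = ("S", ("NP", "Pierre"), ("VN", "mange"),
          ("NP", ("D", "une"), ("N", "salade")), PP)

# Tree (B) as prediction, tree (A) as reference
p, r, f = evaluate(TREE_B, TREE_A)
print(p, r, f)
```

The two trees differ in a single triplet (the span of the object NP), so 10 of the 11 triplets on each side match.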


Some results

Name                        F-score
PCFG (baseline)             ∼ 70
Unlexicalized annotations   85.8
PCFG-LA                     90.1
Lexicalized (Collins)       87.6
Lexicalized (Charniak)      89.7
State of the art (2019)     95.6


Tree adjoining grammars


Movement in natural language

Let's consider again the syntax of an arithmetic expression:

(3 + (4 x 2))

We can write such expressions in prefix, infix or postfix notation. Whatever the notation, the functor stands next to its arguments, which makes evaluation easy.

In natural language, patterns of the form:

(2 (3 + (4 x )))

are quite possible. Here the 2 has been 'moved away' from its canonical place (as an argument of its functor) to some other position. This 'move' is fairly natural in many languages.


Examples of movements in natural languages

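(interrogatives)
(Quel loup)1 ((Marie) a-t-elle vu (t1) (hier)) ? ("Which wolf did Marie see yesterday?")
(Quel loup)1 (tu crois (que ((Marie) a vu (t1) (hier)))) ? ("Which wolf do you think Marie saw yesterday?")

(relatives)
Le loup ((que)1 ((Marie) a vu (t1) (hier))) a disparu ("The wolf that Marie saw yesterday has disappeared")
Le loup ((que)1 (tu crois (que ((Marie) a vu (t1) (hier))))) a disparu ("The wolf that you think Marie saw yesterday has disappeared")

(clefts)
C'est (le loup)1 (que ((Marie) a vu (t1) (hier))) ? ("Is it the wolf that Marie saw yesterday?")
C'est (le loup)1 (que (tu crois (que ((Marie) a vu (t1) (hier))))) ? ("Is it the wolf that you think Marie saw yesterday?")

A priori unbounded

Le loup que Pierre pense que son frère a dit (...) que le voisin croit que Marie aurait vu ("The wolf that Pierre thinks that his brother said (...) that the neighbour believes that Marie would have seen")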


Generative and transformational grammar

The formalisation of a generative grammar with movement (or transformations) is the original insight of Chomsky (1956).

The architecture of a transformational grammar is the following:

Generate a base structure where all the predicates and arguments are in the same domain of locality (informally, the functor can fetch its arguments within the same grammatical rule)

Apply tree transformations (including addition, removal and substitution of tree structure) to generate the surface (observed) form

Undecidability

Since transformations can be applied multiple times in sequence, and given their properties, there is no guarantee that the transformational process halts: the system is undecidable (Peters and Ritchie, 1973).


Tree adjoining grammar

Tree adjoining grammar (Joshi and Schabes, 1997) builds on the insight of lexicalism and on the idea of expressing movement not through explicit tree transformations but by means of an operation called adjunction.

TAG is decidable, and one can design parsing algorithms that run in polynomial time, $O(n^6)$.


A tree grammar

TAG is a tree grammar whose units are elementary trees that can be substituted into each other.

Descriptive syntax

Formal syntax cares more about the byproduct (trees) than about the derivation.

We are eager to view the grammar and syntactic parsing as a tree-building device where trees are substituted into one another.

[Figure: elementary trees of depth 1 (S → NP VP, NP → D N, VP → V PP, PP → P NP, D → Le, N → chat, V → dort, P → sur, D → le, N → paillasson) are substituted into each other to build the derived tree:]

(S (NP (D Le) (N chat)) (VP (V dort) (PP (P sur) (NP (D le) (N paillasson)))))


Extended domain of locality

Since elementary trees are first-class citizens, TAG allows us to use trees of depth > 1.

Extending the domain of locality

Under this descriptive view, we can plug together not only subtrees of depth 1, but subtrees of arbitrary depth ≥ 1.

[Figure: elementary trees for "Le chat dort sur le paillasson", where X↓ marks a substitution node:]

(S NP↓ (VP (V dort) PP↓))
(NP D↓ (N chat))
(D Le)
(PP (P sur) NP↓)
(NP D↓ (N paillasson))
(D le)

Such trees are said to have an extended domain of locality. A grammar whose units are elementary trees, combined with an operation called tree substitution, is called a tree substitution grammar (TSG).


Tree rewriting and derivations

The substitution operation rewrites a leaf node of an elementary tree labelled with a non-terminal symbol by an elementary tree whose root is labelled with the same non-terminal symbol.

The derivation of a TSG is not a sequence of rewrite operations but a tree.

Relevance for syntactic description

Allows for a direct encoding of lexical dependencies in the grammar, provided that the grammar is lexicalised.

[Figure: the elementary trees anchored by dort, chat, Le, sur, paillasson and le, together with the derivation tree:]

(dort (chat le) (sur (paillasson le)))

(here we name elementary trees by their lexical item)
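The substitution operation can be sketched on the elementary trees of "Le chat dort sur le paillasson". The encoding is an assumption made for illustration: a tree is a tuple (label, child, ...), a word is a string, and a substitution node is a tuple that carries only its label, such as ("NP",). The function replaces the leftmost matching substitution node.

```python
def substitute(tree, sub):
    """Replace the leftmost substitution node of `tree` whose label equals
    the root label of `sub` by `sub` itself."""
    def walk(node, done):
        if done or isinstance(node, str):
            return node, done
        label, *children = node
        if not children:                          # substitution node
            return (sub, True) if label == sub[0] else (node, False)
        new_children = []
        for child in children:
            child, done = walk(child, done)
            new_children.append(child)
        return (label, *new_children), done

    new_tree, done = walk(tree, False)
    if not done:
        raise ValueError(f"no substitution node labelled {sub[0]}")
    return new_tree


def words(tree):
    """Yield of a tree: its words, from left to right."""
    if isinstance(tree, str):
        return [tree]
    return [w for child in tree[1:] for w in words(child)]


DORT = ("S", ("NP",), ("VP", ("V", "dort"), ("PP",)))
CHAT = ("NP", ("D",), ("N", "chat"))
SUR = ("PP", ("P", "sur"), ("NP",))
PAILLASSON = ("NP", ("D",), ("N", "paillasson"))

# Follows the derivation tree (dort (chat le) (sur (paillasson le)))
subject = substitute(CHAT, ("D", "Le"))
pp = substitute(SUR, substitute(PAILLASSON, ("D", "le")))
derived = substitute(substitute(DORT, subject), pp)
print(" ".join(words(derived)))
```

Each call to `substitute` corresponds to one edge of the derivation tree, which is why the derivation is naturally recorded as a tree rather than as a sequence.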


Lexicalisation

Definition (lexicalisation)

A finitely ambiguous grammar G is lexicalized if every rule in G contains a lexical element.

Definition (weak lexicalisation of a formalism)

A formalism F weakly lexicalizes a formalism F′ if for every grammar G′ ∈ F′ there is a lexicalized grammar G ∈ F such that L(G′) = L(G).

CFG in Greibach normal form (X → a α) weakly lexicalizes CFG.

Definition (strong lexicalisation of a formalism)

A formalism F strongly lexicalizes a formalism F′ if for every grammar G′ ∈ F′ there is a lexicalized grammar G ∈ F that generates the same tree set as G′.


TSG does not strongly lexicalize CFG

The proof relies on the idea that each elementary tree of a TSG has a finite height, while some CFGs can generate trees where all paths can grow without bound.

Theorem

Consider the grammar G:

S → S S

S → a

This grammar generates sequences of a with trees in which every path from the root to a leaf can grow without bound (for instance balanced binary trees). In a lexicalized TSG, the initial tree that starts a derivation contains a lexical anchor, and substitution never lengthens the path from the root to that anchor. Since the grammar has a finite number of elementary trees, each of finite height, every derived tree has a root-to-leaf path of bounded length, which contradicts the tree set of G.


Tree adjoining grammar

Definition

A tree adjoining grammar (TAG) is a tuple 〈Σ, S, L, I, A〉 where:

Σ is a set of non-terminal symbols

S ∈ Σ is the axiom

L is a set of terminal symbols (L ∩ Σ = ∅)

I is a set of initial trees. A leaf node n whose label ℓ(n) ∈ Σ is called a substitution node. A leaf node n whose label ℓ(n) ∈ L is called an anchor node. Any non-leaf node is labelled by a non-terminal.

A is a set of auxiliary trees. Every auxiliary tree has exactly one leaf node n whose label ℓ(n) is equal to the root label; n is called the foot node (notation ⋆).

The set of trees I ∪ A is the set of elementary trees.

Two composition operations are defined: substitution and adjunction.


Adjunction

Let β be an auxiliary tree and α an elementary tree. β is adjoined on node x of α by splitting α in two parts at x. The top part of x, x_top, is replaced by the root of β and the bottom part, x_bot, is replaced by the foot node of β.

Adjunction is subject to the constraint that the foot node of β, the root node of β and x have the same label.

It is not possible to adjoin on a substitution node.

Adjoining inserts a tree β into a tree α:

β = (V (Vaux a) V⋆)

α = (S NP↓ (V vu) NP↓)

⇒ (S NP↓ (V (Vaux a) (V vu)) NP↓)
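The adjunction of β = (V (Vaux a) V⋆) on the V node of the tree anchored by vu can be sketched as follows. The encoding is an assumption made for illustration: trees are tuples (label, child, ...), a substitution node is a tuple with a label only, the foot node is a childless tuple whose label ends with "*", and the adjunction site is given as a path of child indices. The sketch assumes β is a well-formed auxiliary tree and does not model adjunction constraints.

```python
FOOT_MARK = "*"


def plug_foot(aux, subtree):
    """Copy the auxiliary tree, replacing its foot node by `subtree`."""
    if isinstance(aux, str):
        return aux
    label, *children = aux
    if not children and label.endswith(FOOT_MARK):
        return subtree
    return (label, *(plug_foot(child, subtree) for child in children))


def adjoin(tree, path, aux):
    """Adjoin `aux` on the node of `tree` reached by following the child
    indices in `path` (0 is the leftmost child)."""
    label, *children = tree
    if not path:
        if not children:
            raise ValueError("cannot adjoin on a substitution node")
        if aux[0] != label:
            raise ValueError("root of the auxiliary tree must match the node")
        # the subtree below the node is re-attached at the foot of aux
        return plug_foot(aux, tree)
    children[path[0]] = adjoin(children[path[0]], path[1:], aux)
    return (label, *children)


ALPHA = ("S", ("NP",), ("V", "vu"), ("NP",))       # NP nodes await substitution
BETA = ("V", ("Vaux", "a"), ("V*",))               # auxiliary tree, foot V*

print(adjoin(ALPHA, [1], BETA))
```

Because the result of an adjunction still contains a V node, the operation can be applied again at that node, which is the source of recursion in TAG.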


Adjoining constraints

Every node n of an elementary tree can carry an adjunction constraint:

Obligatory adjunction (OA): it is mandatory to perform an adjunction on this node

Selective adjunction (SA): only a given subset of the trees of the grammar can adjoin on this node

Null adjunction (NA): no adjunction can be performed on this node


Derivation in tree adjoining grammars

A derivation in a TAG starts with an initial tree whose root is labelled by the grammar axiom (ℓ(r) = S). The derivation process replaces the substitution nodes until all substitution nodes have been substituted. Adjunction may optionally occur.

Example derivation

Elementary trees:

α[Pierre] = (N Pierre)
α[Marie] = (N Marie)
β[a] = (V (Vaux a) V⋆)
α[vu] = (S N↓ (V vu) N↓)

Derivation tree:

(α[vu] (1 : α[Pierre]) (2 : β[a]) (3 : α[Marie]))

Derived tree:

(S (N Pierre) (V (Vaux a) (V vu)) (N Marie))


Adjunction and unbounded move

Lexicalized TAG can naturally capture examples that would otherwise call for transformations:

(Le loup) que Pierre pense que Marie a vu ("(The wolf) that Pierre thinks that Marie saw")

Solution for TAG

Adjoining of the main clause into the subordinate clause:

[Figure: the auxiliary tree (S (N Pierre) (V pense) (Ssub (C que) S⋆)) adjoins on the S node of the relative clause tree (Srel (ProRel que) (S (N Marie) (V a vu))); in the derivation tree, vu dominates a, Marie, que and pense.]

Factoring recursion out of the domain of locality

As can be observed, the functor (or predicate) vu has access to all its arguments (que, Marie) within the same grammatical rule (elementary tree), while the recursive component (Pierre pense que) is factored out. Observe also that the operation can be repeated, hence allowing the relative pronoun to be unboundedly far from its predicate.

Remark: the "design" principle (argument/modifier) is not respected.


Context sensitivity

Adjunction is also the operation that makes the languages generated by TAG a superset of the context-free languages (and a subset of the context-sensitive languages). This class of languages is called mildly context sensitive.

Grammar generating the pattern $a^n b^n c^n d^n$ (NA marks a null adjunction constraint):

initial tree: (S ε)
auxiliary tree: (S_NA a (S b S⋆_NA c) d)

Grammar generating the 2-copy language $\{ww \mid w \in \{a, b\}^*\}$:

initial tree: (S ε)
auxiliary trees: (S_NA a (S S⋆_NA a)) and (S_NA b (S S⋆_NA b))


Context sensitivity and natural languages

The pattern of the 2-copy language is found naturally in Dutch (and Swiss German):

... omdat ik Cecilia Henk de nijlpaarden zag helpen voeren
... because I Cecilia Henk the hippopotamuses saw help feed

"... because I saw Cecilia help Henk feed the hippopotamuses"

This kind of example (cross-serial dependencies) suggests that natural languages are mildly context sensitive.


Mildly context sensitive systems

TAG is by no means the only mildly context-sensitive formalism. CCG has the same generative capacity.

TAG however suffers from expressivity limits, and there exist slightly more general systems such as LCFRS (Kallmeyer, 2010), MCFG (Seki et al., 1991) and multi-component TAG that make it easier to model the syntax of languages with more word order freedom, such as German or Korean.


Dependency syntax


Dependency grammar

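Dependency grammars (DG) are a grammatical description system that relates the words of a sentence with dependency edges:

D   N   V      P  D   N
the cat sleeps on the mat

And sometimes with typed dependencies:

D   N   V      P  D   N
the cat sleeps on the mat

[Figure: labelled arcs, written label(head → dependent): det(cat → the), subj(sleeps → cat), iobj(sleeps → on), pobj(on → mat), det(mat → the)]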


Example of dependency dag

D  N    V        V      P   D  N
le chat souhaite dormir sur le paillasson ("the cat wishes to sleep on the doormat")

Dependency DAG

The decoration varies, but the core of the dependency representation is a dependency DAG whose nodes are ordered according to the linear order of the sentence.


The tradition of Dependency Grammar

Dependency grammar comes from a descriptive tradition.

It is well suited for the multilingual case.

Why do we care?

Direct encoding of predicate-argument structure (versus phrase structure grammar or TAG)

Free word order languages, where dependency structure is less related to the way words are grouped together (and unbounded dependencies)

Yet dependency grammar is not a grammar (!)

There are no rules of grammar

The grammar does not generate a language (in the formal sense)

Dependency grammar makes no predictions about language; rather, it provides descriptions

The grammar provides a description only once a sentence is given; this is similar in spirit to discriminative models in machine learning.


Constraints on dependency structures

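Let $G = \langle V, E \rangle$ be a dependency dag. We write:

$i \to j$ iff $i \neq j$ and $(i, j) \in E$

$i \leftrightarrow j$ iff $i \neq j$ and ($(i, j) \in E$ or $(j, i) \in E$)

$i \overset{*}{\to} j$ iff $i = j$ or $\exists x.\ i \to x$ and $x \overset{*}{\to} j$

$i \overset{*}{\leftrightarrow} j$ iff $i = j$ or $\exists x.\ i \leftrightarrow x$ and $x \overset{*}{\leftrightarrow} j$

It is common to apply some of the following constraints on dependency structures:

G is connected: if $i, j \in V$ then $i \overset{*}{\leftrightarrow} j$

G is acyclic: if $i \to j$ then not $j \overset{*}{\to} i$

Single head (treeness) constraint: if $i \to j$ then not $x \to j$ ($\forall x \in V$, $x \neq i$)

Projectivity: if $i \to j$ then $i \overset{*}{\to} x$ for all $x$ such that $i \prec x \prec j$ or $j \prec x \prec i$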
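The four constraints above can be checked mechanically. The following is a minimal sketch, assuming that words are numbered 0..n-1 in sentence order and that arcs are (head, dependent) pairs; the function names and the arcs chosen for the example sentence are illustrative assumptions, since the arcs of the original figure are not recoverable here.

```python
from collections import defaultdict

def reachable(start, edges):
    """Nodes x such that start ->* x (reflexive, transitive)."""
    children = defaultdict(set)
    for head, dep in edges:
        children[head].add(dep)
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for child in children[node] - seen:
            seen.add(child)
            stack.append(child)
    return seen

def is_connected(n, edges):
    """Weak connectivity: every node is linked to node 0 ignoring arc direction."""
    if n == 0:
        return True
    neighbours = defaultdict(set)
    for i, j in edges:
        neighbours[i].add(j)
        neighbours[j].add(i)
    seen, stack = {0}, [0]
    while stack:
        node = stack.pop()
        for other in neighbours[node] - seen:
            seen.add(other)
            stack.append(other)
    return len(seen) == n

def is_acyclic(edges):
    """If i -> j then j does not reach i."""
    return all(i not in reachable(j, edges) for i, j in edges)

def has_single_heads(edges):
    """No word has more than one head."""
    heads = defaultdict(set)
    for i, j in edges:
        heads[j].add(i)
    return all(len(h) <= 1 for h in heads.values())

def is_projective(edges):
    """If i -> j then i dominates every word strictly between i and j."""
    for i, j in edges:
        dominated = reachable(i, edges)
        if any(x not in dominated for x in range(min(i, j) + 1, max(i, j))):
            return False
    return True

# Assumed analysis of: le(0) chat(1) souhaite(2) dormir(3) sur(4) le(5) paillasson(6)
TREE = [(2, 1), (1, 0), (2, 3), (3, 4), (4, 6), (6, 5)]
# Adding "chat" as subject of "dormir" turns the tree into a dag (assumption).
DAG = TREE + [(3, 1)]
```

Under these assumed arcs, `TREE` satisfies all four constraints, while `DAG` stays connected and acyclic but violates the single head constraint, and also projectivity as defined above, because "dormir" does not dominate "souhaite".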


The projectivity constraint

The projectivity constraint (if $i \to j$ then $i \overset{*}{\to} x$ for all $x$ strictly between $i$ and $j$ in the sentence) says that a node generates a yield of contiguous words in the sentence.

[Figure: non-projective versus projective configurations of an arc i → j with an intervening word i′]

Non-projectivity is related to movement and is likely to occur more often in free word order languages (in the figure, red nodes indicate a gap).

Remarks on constraints

Usually not all constraints are used. Connectedness and acyclicity are used most of the time; projectivity is sometimes relaxed (in theory, most of the time). However, the set of constraints used has an impact on parsing algorithms.

The projectivity constraint is typically used for practical applications (under this constraint it is fairly easy to design a parser). In theory, however, we get non-projective structures as soon as we handle unbounded dependencies. Recall:

Le garçon que Pierre pense que Marie aime (the boy that Pierre thinks that Marie loves)


Are natural languages really non projective ?

One may think that free word order languages are highly non-projective.

Let the gap degree of a node be the number of gaps below that node, and let the gap degree of a tree be the maximum gap degree of its nodes.

With UD corpora (https://universaldependencies.org) one can measure that the gap degree of most languages is generally close to 0:

For languages such as English or French, almost 99% of trees have gap degree 0.

For languages said to have free word order, we observe that 97% of Latin trees have gap degree 0. The same holds for Dutch or Hungarian. Ancient Greek has 92% of trees with gap degree 0.
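The gap degree can be computed directly from a head array. This is a minimal sketch, assuming that "the number of gaps below a node" means the number of discontinuities in the node's yield (the set of word positions it dominates), and that the input is a tree given as `heads[d]`, the head of word `d`, with `None` for the root. The function names and the two toy trees are illustrative.

```python
from collections import defaultdict

def yields(heads):
    """Map each node to the set of positions it dominates (itself included)."""
    children = defaultdict(list)
    for dep, head in enumerate(heads):
        if head is not None:
            children[head].append(dep)

    def collect(node):
        span = {node}
        for child in children[node]:
            span |= collect(child)
        return span

    return {node: collect(node) for node in range(len(heads))}

def node_gap_degree(span):
    """Number of interruptions in a set of positions."""
    positions = sorted(span)
    return sum(1 for a, b in zip(positions, positions[1:]) if b - a > 1)

def tree_gap_degree(heads):
    """Maximum gap degree over all nodes of the tree."""
    return max(node_gap_degree(span) for span in yields(heads).values())

# Projective toy tree: word 1 is the root and heads words 0 and 2.
projective = [1, None, 1]
# Non-projective toy tree: word 2 heads word 0, so its yield {0, 2} skips word 1.
non_projective = [2, None, 1, 1]
```

Here `tree_gap_degree(projective)` is 0 and `tree_gap_degree(non_projective)` is 1. Applying the same function to every tree of a treebank gives the kind of proportion quoted above.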

Dependency Length Minimization

This observation is a consequence of a more general trend: dependencies tend to be short, and in general the structure of the sentence tends to minimize the length of dependencies (Gildea and Temperley, 2010)
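As a small illustration, dependency length can be measured as the distance in words between a dependent and its head. The sketch below assumes this simple definition and the same head-array encoding as before (`None` marks the root); the function names are illustrative.

```python
def dependency_lengths(heads):
    """Distance |dependent - head| for every word that has a head."""
    return [abs(dep - head) for dep, head in enumerate(heads) if head is not None]

def total_dependency_length(heads):
    """Sum of all dependency lengths in the sentence."""
    return sum(dependency_lengths(heads))

# Two toy trees: the second contains a crossing, longer dependency.
short_tree = [1, None, 1]
long_tree = [2, None, 1, 1]
```

With these toy inputs the totals are 2 and 5. Comparing such totals across alternative word orders of the same tree is one way to make the minimization trend concrete.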


The connection with context free grammar

Although Dependency Grammar is generally best understood as a descriptive system, it was observed early on that a projective dependency grammar can be expressed by a context free grammar (Gaifman, 1965). The rules are of the form:

X → α w β

(where α and β are sequences of non terminals and w is a terminal)

Example:

V → N V vu N

N → Pierre

N → D chat

D → le

V → a

Each rule encodes a lexical head with the categories of its dependants
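To make the correspondence concrete, here is a minimal sketch that parses with the example rules above and reads dependency arcs off the derivation: each time a rule X → α w β is used, the word w becomes the head of the head words of α and β. The naive top-down parser and all function names are assumptions made for illustration, not part of Gaifman's construction.

```python
# Rules X -> alpha w beta, stored as (X, alpha, w, beta).
RULES = [
    ("V", ("N", "V"), "vu", ("N",)),
    ("N", (), "Pierre", ()),
    ("N", ("D",), "chat", ()),
    ("D", (), "le", ()),
    ("V", (), "a", ()),
]

def parse(cat, words, start):
    """Yield (end, head, arcs) for each derivation of words[start:end] from cat."""
    for lhs, alpha, w, beta in RULES:
        if lhs != cat:
            continue
        for mid, left_heads, left_arcs in parse_seq(alpha, words, start):
            if mid < len(words) and words[mid] == w:
                for end, right_heads, right_arcs in parse_seq(beta, words, mid + 1):
                    # The lexical head at position mid governs its dependants.
                    new_arcs = [(mid, d) for d in left_heads + right_heads]
                    yield end, mid, left_arcs + right_arcs + new_arcs

def parse_seq(cats, words, start):
    """Yield (end, heads, arcs) for each derivation of a sequence of categories."""
    if not cats:
        yield start, [], []
        return
    for mid, head, arcs in parse(cats[0], words, start):
        for end, heads, more in parse_seq(cats[1:], words, mid):
            yield end, [head] + heads, arcs + more

def dependencies(words, root_cat="V"):
    """Return (root position, sorted (head, dependent) arcs), or None if no parse."""
    for end, head, arcs in parse(root_cat, words, 0):
        if end == len(words):
            return head, sorted(arcs)
    return None

result = dependencies("Pierre a vu le chat".split())
```

For "Pierre a vu le chat", `result` has root position 2 ("vu") and arcs from "vu" to "Pierre", "a" and "chat", plus an arc from "chat" to "le". The backtracking parser is only adequate for tiny grammars without left recursion.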


Context free grammar and robustness

A dg does not typically constrain for grammaticality: the set of grammatical rules is some set of permutations of grammatical symbols. At first sight, dg seems to be a poor theory of grammar...

Another comparison with cfg highlights the issue. The Penn Treebank is a collection of 40000 English constituent parse trees from which we can extract counts of cfg rule occurrences; the pattern is again Zipfian:

[Figure: Zipf and Heaps curves for cfg rules]

This suggests that a long tail of low-frequency rules must be added for a grammar to become robust
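The rule counts mentioned above can be obtained as follows. This is a minimal sketch, assuming trees are given in bracketed notation; the two toy trees stand in for a real treebank and are purely illustrative, as are the function names.

```python
from collections import Counter
import re

def read_tree(text):
    """Parse a bracketed tree such as '(S (NP John) (VP (V eats)))'."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        pos += 1                      # skip "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])   # a word
                pos += 1
        pos += 1                      # skip ")"
        return (label, children)

    return node()

def rules(tree):
    """Yield every cfg rule (lhs, rhs) used in the tree."""
    label, children = tree
    rhs = tuple(c[0] if isinstance(c, tuple) else c for c in children)
    yield (label, rhs)
    for child in children:
        if isinstance(child, tuple):
            yield from rules(child)

TOY_TREEBANK = [
    "(S (NP John) (VP (V eats) (NP apples)))",
    "(S (NP Mary) (VP (V sleeps)))",
]

counts = Counter(r for text in TOY_TREEBANK for r in rules(read_tree(text)))
# Rules sorted from most to least frequent, i.e. by rank.
ranked = counts.most_common()
```

Plotting the frequencies in `ranked` against their rank on log-log axes is how a Zipf curve is usually drawn. On a real treebank most rules would appear in the low-frequency tail.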


Combinatory Categorial Grammar


Syntax and semantics

Syntax is in general not an end in itself, except maybe for dependency representations

Categorial grammar is a framework that lends itself to the study of topics in the syntax semantics interface.

In this part of the class we illustrate it with combinatory categorial grammar (ccg)


Categorial grammar: categories

Categorial grammar (Ajdukiewicz, 1935; Bar-Hillel, 1953) consists of a lexicon that maps words to categories and two inference rules.

Let P be the set of primitive categories; the set C of categories is then defined inductively as:

if p ∈ P then p ∈ C

if p, q ∈ C then (p/q) ∈ C

if p, q ∈ C then (p\q) ∈ C

There is a primitive category called the axiom, S ∈ P.

The lexicon is a binary relation between word strings and categories.

Example lexical entries:

Jean := N, Marie := N, petit := N/N, aime := (S\N)/N, est := N, est := (S\N)/(N/N)
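The inductive definition of categories maps naturally onto a small recursive data type. The sketch below is one possible encoding, assuming the result-first reading used by the application rules (X/Y and X\Y both return X, looking for Y on the right or on the left). The class names `Prim` and `Slash` and the dictionary-of-sets lexicon are illustrative choices.

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Prim:
    """A primitive category p in P."""
    name: str

    def __str__(self):
        return self.name

@dataclass(frozen=True)
class Slash:
    """A complex category (result/arg) or (result\\arg)."""
    result: "Cat"
    direction: str          # "/" or "\\"
    arg: "Cat"

    def __str__(self):
        return f"({self.result}{self.direction}{self.arg})"

Cat = Union[Prim, Slash]

N, S = Prim("N"), Prim("S")
ADJ = Slash(N, "/", N)                       # N/N
TV = Slash(Slash(S, "\\", N), "/", N)        # (S\N)/N

# The lexicon is a relation: one word may have several categories.
LEXICON = {
    "Jean": {N},
    "Marie": {N},
    "petit": {ADJ},
    "aime": {TV},
    "est": {N, Slash(Slash(S, "\\", N), "/", ADJ)},
}
```

Frozen dataclasses are hashable and compared structurally, so categories can be stored in sets and tested for equality when inference rules are applied.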


Inference rules

Basic categorial grammar has two inference rules, forward and backward functional application:

X/Y   Y
--------- >
X

Y   X\Y
--------- <
X

The forward application rule (>) is understood as: X/Y is a functor taking Y as right argument and returning X as result

The backward application rule (<) is understood as: X\Y is a functor taking Y as left argument and returning X as result
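The two application rules can be written as partial functions on pairs of adjacent categories. This is a minimal sketch, assuming a lightweight encoding in which a primitive category is a string and a complex category is a triple `(result, slash, argument)`; the function names are illustrative.

```python
def forward(left, right):
    """X/Y  Y  =>  X   (returns None when the rule does not apply)."""
    if isinstance(left, tuple) and left[1] == "/" and left[2] == right:
        return left[0]
    return None

def backward(left, right):
    """Y  X\\Y  =>  X   (returns None when the rule does not apply)."""
    if isinstance(right, tuple) and right[1] == "\\" and right[2] == left:
        return right[0]
    return None

# (S\NP)/NP: looks for an NP on the right, then an NP on the left.
TRANSITIVE = (("S", "\\", "NP"), "/", "NP")

verb_phrase = forward(TRANSITIVE, "NP")      # S\NP
sentence = backward("NP", verb_phrase)       # S
```

Returning `None` rather than raising an error makes it easy to try both rules on every adjacent pair during parsing.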


A categorial grammar for arithmetic expressions

+ := (int\int)/int

- := (int\int)/int

0 := int

1 := int

2 := int

3 := int

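Derivation of 3+2-1:

3     +               2     -               1
int   (int\int)/int   int   (int\int)/int   int
      --------------------- >
      int\int
--------------------------- <
int
                            --------------------- >
                            int\int
------------------------------------------------- <
int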
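A derivation like the one above can be found automatically with a chart parser. The following is a minimal CKY-style recognizer sketch for this arithmetic grammar, assuming categories are encoded as strings (primitive) or `(result, slash, argument)` triples; the names `combine` and `categories` are illustrative.

```python
INT = "int"
OP = ((INT, "\\", INT), "/", INT)            # (int\int)/int

LEXICON = {"+": {OP}, "-": {OP}, "0": {INT}, "1": {INT}, "2": {INT}, "3": {INT}}

def combine(left, right):
    """All categories obtained from two adjacent categories by application."""
    out = set()
    if isinstance(left, tuple) and left[1] == "/" and left[2] == right:
        out.add(left[0])                     # forward application (>)
    if isinstance(right, tuple) and right[1] == "\\" and right[2] == left:
        out.add(right[0])                    # backward application (<)
    return out

def categories(tokens):
    """Set of categories derivable for the whole token sequence."""
    n = len(tokens)
    if n == 0:
        return set()
    chart = {(i, i + 1): set(LEXICON.get(tok, ())) for i, tok in enumerate(tokens)}
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            k = i + width
            cell = set()
            for j in range(i + 1, k):        # every split point
                for left in chart[(i, j)]:
                    for right in chart[(j, k)]:
                        cell |= combine(left, right)
            chart[(i, k)] = cell
    return chart[(0, n)]

result = categories(list("3+2-1"))
```

The recognizer finds the category int for 3+2-1. Note that the chart contains both bracketings, (3+2)-1 and 3+(2-1), since the grammar itself does not enforce associativity.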


Categorial grammar and phrase structure grammar

John := NP

eats := (S\NP)/NP

apples := NP

John   eats         apples
NP     (S\NP)/NP    NP
       -------------------- >
       S\NP
--------------------------- <
S

Corresponding phrase structure tree:

[S [NP John] [VP [V eats] [NP apples]]]

Weak equivalence

Both cfg and cg generate context free languages. Note also that for every AB categorial grammar, there is an equivalent cfg in Chomsky Normal Form. Note however that in the other direction strong equivalence does not hold in general


Categorial grammar and semantics

Categories can be understood as an encoding of the type of a semantic representation.

We can indeed make the semantic representation explicit by augmenting the categories:

John := NP : John

eats := (S\NP)/NP : λxy.eat(y, x)

apples := NP : apple

And add to the inference rules the capacity to operate on semantic representations:

X/Y : f   Y : a
----------------- >
X : f a

Y : a   X\Y : f
----------------- <
X : f a

where f is a functor and a an argument


Example of semantic computation

John        eats                          apples
NP : John   (S\NP)/NP : λxy.eat(y, x)     NP : apple
            ------------------------------------------ >
            S\NP : λy.eat(y, apple)
------------------------------------------------------ <
S : eat(John, apple)

Remark

With semantics, categorial grammar can be seen as a notational layer encoding a subset of simply typed lambda calculus with a “directional application rule”
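The semantic side of the derivation can be replayed with ordinary curried functions. This is a minimal sketch, assuming logical terms are represented as nested Python tuples; the encoding and the function names are illustrative and only the semantic half of each rule is modelled.

```python
# Lexical semantics. λxy.eat(y, x) is written as a curried function.
john = "John"
apple = "apple"
eats = lambda x: lambda y: ("eat", y, x)

def forward(f, a):
    """X/Y : f   Y : a   =>   X : f a"""
    return f(a)

def backward(a, f):
    """Y : a   X\\Y : f   =>   X : f a"""
    return f(a)

vp = forward(eats, apple)        # λy.eat(y, apple)
s = backward(john, vp)           # eat(John, apple)
```

Both rules reduce to function application on the semantic side. The direction of the slash only matters for the syntactic categories.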


Combinatory categorial grammar

As such, categorial grammar is not expressive enough to accommodate natural language modelling.

Combinatory categorial grammar (ccg) adds some further inference rules to handle various phenomena in natural language (Steedman and Baldridge, 2011)

ccg not only provides more expressivity, it also increases the generative capacity: ccg is mildly context sensitive

The additional inference rules of ccg make use only of Schönfinkel's and Curry's combinators


The functional composition rule

The functional composition rule is an example of such a combinatory inference rule. It builds on Curry's B combinator: B ≡ λfgz.f(g z)

The rules are given as follows:

X/Y : f   Y/Z : g
-------------------- >B
X/Z : λz.f(g z)

Y\Z : g   X\Y : f
-------------------- <B
X\Z : λz.f(g z)

Here is an example in natural language: John might eat apples

John        might                          eat                 apples
NP : John   (S\NP)/VP : λxy.might(y, x)    VP/NP : λx.eat(x)   NP : apples
            -------------------------------------------------- >B
            (S\NP)/NP : λzy.might(y, eat(z))
            ------------------------------------------------------------------ >
            S\NP : λy.might(y, eat(apples))
------------------------------------------------------------------------------ <
S : might(John, eat(apples))
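The semantics of this derivation can be checked with a direct encoding of the B combinator. The sketch below assumes logical terms are nested tuples and lexical meanings are curried functions; the names are illustrative.

```python
def compose(f, g):
    """Curry's B combinator: B f g = λz. f (g z)."""
    return lambda z: f(g(z))

# Lexical semantics from the example.
might = lambda x: lambda y: ("might", y, x)    # λxy.might(y, x)
eat = lambda x: ("eat", x)                     # λx.eat(x)

might_eat = compose(might, eat)                # λzy.might(y, eat(z))
vp = might_eat("apples")                       # λy.might(y, eat(apples))
s = vp("John")                                 # might(John, eat(apples))
```

Composition lets "might" and "eat" combine before the object is available, which is what the >B step in the derivation does.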


Coordination

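Coordination is one of the most complex phenomena to analyze in natural language syntax.

For coordinators, the obvious category to start with is:

and := (X\X)/X (with X a metavariable over categories)

This allows for coordination of the same type, for instance:

the    black   cat   and          the    mouse   run
NP/N   N/N     N     (NP\NP)/NP   NP/N   N       S\NP
       ----------- >              ------------ >
       N                          NP
------------------ >
NP
                     ------------------------- >
                     NP\NP
---------------------------------------------- <
NP
---------------------------------------------------------- <
S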


Type raising

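Type raising is an additional inference rule that turns an argument into a function over the functions that take this argument:

X : a
------------------- >T
T/(T\X) : λf.f a

Here is a case of coordination with a shared object where type raising applies:

John         likes       and                      Mary         dislikes    garlic
NP           (S\NP)/NP   ((S/NP)\(S/NP))/(S/NP)   NP           (S\NP)/NP   NP
---------- >T                                     ---------- >T
S/(S\NP)                                          S/(S\NP)
---------------------- >B                         ---------------------- >B
S/NP                                              S/NP
                         ------------------------------------------------ >
                         (S/NP)\(S/NP)
------------------------------------------------------------------------- <
S/NP
---------------------------------------------------------------------------------- >
S

As can be seen, type raising allows the NP subjects to be processed first on both sides of the coordination. The object is consumed last, once coordination is performed.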
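The semantic effect of type raising followed by composition can be sketched in a few lines. The meanings of "likes" and "dislikes" follow the pattern used for "eats" earlier. The meaning given to "and" is an assumption made for this illustration (the slides do not specify it), as are all names.

```python
def type_raise(a):
    """>T on the semantic side: a  =>  λf. f a"""
    return lambda f: f(a)

def compose(f, g):
    """>B on the semantic side: λz. f (g z)"""
    return lambda z: f(g(z))

likes = lambda x: lambda y: ("like", y, x)        # λxy.like(y, x)
dislikes = lambda x: lambda y: ("dislike", y, x)  # λxy.dislike(y, x)

# Assumed semantics for "and" over S/NP: take the right conjunct q,
# then the left conjunct p, and share the object z between them.
and_ = lambda q: lambda p: lambda z: ("and", p(z), q(z))

john_likes = compose(type_raise("John"), likes)          # S/NP : λz.like(John, z)
mary_dislikes = compose(type_raise("Mary"), dislikes)    # S/NP : λz.dislike(Mary, z)

coordinated = and_(mary_dislikes)(john_likes)            # S/NP
sentence = coordinated("garlic")                         # S
```

Under these assumptions `sentence` is the term and(like(John, garlic), dislike(Mary, garlic)): the shared object is supplied once and distributed to both conjuncts.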


Relatives and movement

The additional inference rules (composition and type raising) allow ccg to capture unbounded dependencies without adding a “movement” rule. Let's see an example with the object relative.

As with the adjectives seen earlier, a relative clause is a noun modifier; since it follows the noun, it has type N\N. An object relative is a sentence without an object: S/NP. Thus the object relative pronoun is defined as:

that := (N\N)/(S/NP)

Example (The mouse that Felix eats):

the    mouse   that           Felix        eats
NP/N   N       (N\N)/(S/NP)   NP           (S\NP)/NP
                              ---------- >T
                              S/(S\NP)
                              ----------------------- >B
                              S/NP
               -------------------------------------- >
               N\N
       ---------------------------------------------- <
       N
----------------------------------------------------- >
NP
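The derivation above can be replayed at the level of categories. This is a minimal sketch, assuming primitive categories are strings and complex ones are `(result, slash, argument)` triples; the four functions are illustrative implementations of >, <, >B and >T, and each returns `None` when it does not apply.

```python
def fapp(left, right):
    """>  :  X/Y  Y  =>  X"""
    if isinstance(left, tuple) and left[1] == "/" and left[2] == right:
        return left[0]
    return None

def bapp(left, right):
    """<  :  Y  X\\Y  =>  X"""
    if isinstance(right, tuple) and right[1] == "\\" and right[2] == left:
        return right[0]
    return None

def fcomp(left, right):
    """>B :  X/Y  Y/Z  =>  X/Z"""
    if (isinstance(left, tuple) and isinstance(right, tuple)
            and left[1] == "/" and right[1] == "/" and left[2] == right[0]):
        return (left[0], "/", right[2])
    return None

def traise(x, t):
    """>T :  X  =>  T/(T\\X)"""
    return (t, "/", (t, "\\", x))

THE = ("NP", "/", "N")
MOUSE = "N"
THAT = (("N", "\\", "N"), "/", ("S", "/", "NP"))
FELIX = "NP"
EATS = (("S", "\\", "NP"), "/", "NP")

felix_eats = fcomp(traise(FELIX, "S"), EATS)   # S/NP
relative = fapp(THAT, felix_eats)              # N\N
noun = bapp(MOUSE, relative)                   # N
noun_phrase = fapp(THE, noun)                  # NP
```

Each intermediate value matches one line of the derivation, so the object gap is passed up as the /NP argument of S/NP without any movement operation.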


Subordinate clauses

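To understand the following unbounded case, let's see how we analyze subordinate clauses such as You think that Felix eats the mouse. In this case, that is no longer a pronoun but rather an optional complementizer of type S/S:

You   think      that   Felix   eats        the    mouse
NP    (S\NP)/S   S/S    NP      (S\NP)/NP   NP/N   N
                                            ------------ >
                                            NP
                                ------------------------ >
                                S\NP
                        -------------------------------- <
                        S
                 --------------------------------------- >
                 S
      -------------------------------------------------- >
      S\NP
-------------------------------------------------------- <
S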


The unbounded case

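Here is an example for the unbounded movement case: The mouse that you think that Felix eats, where we register the complementizer that := (S/NP)/(S/NP) and think := ((S/NP)\NP)/(S/NP) to license bridging:

the    mouse   that           you   think                that            Felix        eats
NP/N   N       (N\N)/(S/NP)   NP    ((S/NP)\NP)/(S/NP)   (S/NP)/(S/NP)   NP           (S\NP)/NP
                                                                         ---------- >T
                                                                         S/(S\NP)
                                                                         ----------------------- >B
                                                                         S/NP
                                                         --------------------------------------- >
                                                         S/NP
                                    ------------------------------------------------------------ >
                                    (S/NP)\NP
                              ------------------------------------------------------------------ <
                              S/NP
               --------------------------------------------------------------------------------- >
               N\N
       ----------------------------------------------------------------------------------------- <
       N
------------------------------------------------------------------------------------------------ >
NP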


Bibliography

Ajdukiewicz, K. (1935). Die syntaktische Konnexität. Studia Philosophica 1.

Bar-Hillel, Y. (1953). A quasi-arithmetical notation for syntactic description. Language 29(1), 47–58.

Chomsky, N. (1956). Three models for the description of language. IRE Trans. Information Theory 2(3), 113–124.

Collins, M. (2003, December). Head-driven statistical models for natural language parsing. Comput. Linguist. 29(4), 589–637.

Gaifman, H. (1965). Dependency systems and phrase-structure systems. Information and Control 8(3), 304–337.

Gildea, D. and D. Temperley (2010). Do grammars minimize dependency length? Cognitive Science 34(2), 286–310.

Joshi, A. K. and Y. Schabes (1997). Tree-adjoining grammars. In Handbook of Formal Languages, Volume 3: Beyond Words, pp. 69–123.

Kallmeyer, L. (2010). Parsing Beyond Context-Free Grammars. Cognitive Technologies. Springer.

Klein, D. and C. D. Manning (2003). Accurate unlexicalized parsing. In Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics, 7-12 July 2003, Sapporo Convention Center, Sapporo, Japan, pp. 423–430.

Peters, S. and R. Ritchie (1973). On the generative power of transformational grammars. Inf. Sci. 6, 49–83.

Petrov, S., L. Barrett, R. Thibaux, and D. Klein (2006). Learning accurate, compact, and interpretable tree annotation. In ACL 2006, 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference, Sydney, Australia, 17-21 July 2006.

Seki, H., T. Matsumura, M. Fujii, and T. Kasami (1991). On multiple context-free grammars. Theor. Comput. Sci. 88(2), 191–229.

Steedman, M. and J. Baldridge (2011). Combinatory categorial grammar.
