![Page 1: SYNTAX - GitHub Pages · Heads, arguments, & adjuncts • Syntax is about the relationships among words and phrases in a sentence • Each constituent has its own internal structure,](https://reader036.vdocument.in/reader036/viewer/2022071606/6143a0686b2ee0265c0229db/html5/thumbnails/1.jpg)
SYNTAX
Matt Post IntroHLT class 10 September 2020
![Page 2: SYNTAX - GitHub Pages · Heads, arguments, & adjuncts • Syntax is about the relationships among words and phrases in a sentence • Each constituent has its own internal structure,](https://reader036.vdocument.in/reader036/viewer/2022071606/6143a0686b2ee0265c0229db/html5/thumbnails/2.jpg)
and stupor his the Fred with pain from ease couldn’t would a set he cigarette out the that for in wife Jones was during caring a often drugs house but screaming the crying at for didn’t fear worn sleep ablaze day the from that she night
![Page 3: SYNTAX - GitHub Pages · Heads, arguments, & adjuncts • Syntax is about the relationships among words and phrases in a sentence • Each constituent has its own internal structure,](https://reader036.vdocument.in/reader036/viewer/2022071606/6143a0686b2ee0265c0229db/html5/thumbnails/3.jpg)
Fred Jones was worn out from caring for his often screaming and crying wife during the day but he couldn’t sleep at night for fear that she in a stupor from the drugs that didn’t ease the pain would set the house ablaze with a cigarette
![Page 4: SYNTAX - GitHub Pages · Heads, arguments, & adjuncts • Syntax is about the relationships among words and phrases in a sentence • Each constituent has its own internal structure,](https://reader036.vdocument.in/reader036/viewer/2022071606/6143a0686b2ee0265c0229db/html5/thumbnails/4.jpg)
• 46 words, 46! permutations of those words, the vast majority of them ungrammatical and meaningless
• How is it that we can
– process and understand this sentence?
– discriminate it from the sea of ungrammatical permutations it floats in?
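To get a feel for the size of 46!, the count can be computed directly. This is only a quick arithmetic sketch; the variable names are illustrative, and it counts orderings of 46 word positions without collapsing repeated words such as "the".

```python
import math

# Number of ways to order 46 word tokens (treating every token as distinct).
n_words = 46
orderings = math.factorial(n_words)

# Print in scientific notation; the value is on the order of 10**57.
print(f"{orderings:.3e}")
```

Only a vanishing fraction of these orderings are grammatical, which is the point of the slide.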
![Page 5: SYNTAX - GitHub Pages · Heads, arguments, & adjuncts • Syntax is about the relationships among words and phrases in a sentence • Each constituent has its own internal structure,](https://reader036.vdocument.in/reader036/viewer/2022071606/6143a0686b2ee0265c0229db/html5/thumbnails/5.jpg)
Today we will cover
• what is syntax?
• where do grammars come from?
• how can a computer find a sentence’s structure?
(Fields: Linguistics, Computer Science)
![Page 6: SYNTAX - GitHub Pages · Heads, arguments, & adjuncts • Syntax is about the relationships among words and phrases in a sentence • Each constituent has its own internal structure,](https://reader036.vdocument.in/reader036/viewer/2022071606/6143a0686b2ee0265c0229db/html5/thumbnails/6.jpg)
Goals for today
• After today, you should be able to
– Give a working definition of syntax and describe how linguists think about it
– Describe two well-known grammar formalisms and projects supporting them
– Discuss issues related to universal language features
– Describe the formal language hierarchy
– Describe algorithms for parsing the two grammar formalisms
![Page 7: SYNTAX - GitHub Pages · Heads, arguments, & adjuncts • Syntax is about the relationships among words and phrases in a sentence • Each constituent has its own internal structure,](https://reader036.vdocument.in/reader036/viewer/2022071606/6143a0686b2ee0265c0229db/html5/thumbnails/7.jpg)
Outline
• what is syntax?
• where do grammars come from?
• how can a computer find a sentence’s structure?
![Page 13: SYNTAX - GitHub Pages · Heads, arguments, & adjuncts • Syntax is about the relationships among words and phrases in a sentence • Each constituent has its own internal structure,](https://reader036.vdocument.in/reader036/viewer/2022071606/6143a0686b2ee0265c0229db/html5/thumbnails/13.jpg)
Linguistic fields of study
• Phonetics: sounds
• Phonology: sound systems
• Morphology: internal word structure
• Syntax: external word structure (sentences)
• Semantics: sentence meaning
• Pragmatics: contextualized meaning and communicative goals
![Page 17: SYNTAX - GitHub Pages · Heads, arguments, & adjuncts • Syntax is about the relationships among words and phrases in a sentence • Each constituent has its own internal structure,](https://reader036.vdocument.in/reader036/viewer/2022071606/6143a0686b2ee0265c0229db/html5/thumbnails/17.jpg)
Aside
• Much of our focus is on written language, but language is first and foremost spoken
• Why does this matter?
• Which of these is easier for a computer to work with?
– (written) Dipanjan asked a question
– (spoken) Dipanjan, uh, he, uh, um, was wondering, uh, he had a question
![Page 18: SYNTAX - GitHub Pages · Heads, arguments, & adjuncts • Syntax is about the relationships among words and phrases in a sentence • Each constituent has its own internal structure,](https://reader036.vdocument.in/reader036/viewer/2022071606/6143a0686b2ee0265c0229db/html5/thumbnails/18.jpg)
Today’s focus
Emily M. Bender, University of Washington
Linguistic Fundamentals for Natural Language Processing: 100 Essentials from Morphology and Syntax
Synthesis Lectures on Human Language Technologies (Series Editor: Graeme Hirst, University of Toronto), Morgan & Claypool Publishers, www.morganclaypool.com
ISBN: 978-1-62705-011-1, Series ISSN: 1947-4040

Many NLP tasks have at their core a subtask of extracting the dependencies—who did what to whom—from natural language sentences. This task can be understood as the inverse of the problem solved in different ways by diverse human languages, namely, how to indicate the relationship between different parts of a sentence. Understanding how languages solve the problem can be extremely useful in both feature design and error analysis in the application of machine learning to NLP. Likewise, understanding cross-linguistic variation can be important for the design of MT systems and other multilingual applications. The purpose of this book is to present in a succinct and accessible fashion information about the morphological and syntactic structure of human languages that can be useful in creating more linguistically sophisticated, more language-independent, and thus more successful NLP systems.
![Page 19: SYNTAX - GitHub Pages · Heads, arguments, & adjuncts • Syntax is about the relationships among words and phrases in a sentence • Each constituent has its own internal structure,](https://reader036.vdocument.in/reader036/viewer/2022071606/6143a0686b2ee0265c0229db/html5/thumbnails/19.jpg)
What is syntax?
• A set of constraints on the possible sentences in the language
– *A set of constraint on the possible sentence.
– *Dipanjan had [a] question.
– *You are on class.
• At a coarse level, we can divide all possible sequences of words into two groups: valid and invalid (or grammatical and ungrammatical)
![Page 20: SYNTAX - GitHub Pages · Heads, arguments, & adjuncts • Syntax is about the relationships among words and phrases in a sentence • Each constituent has its own internal structure,](https://reader036.vdocument.in/reader036/viewer/2022071606/6143a0686b2ee0265c0229db/html5/thumbnails/20.jpg)
Human judgments
• How do we know what’s in and out? We simply ask humans
• But how do humans know?
– Bad idea: big lists
– Better idea: grammars
![Page 21: SYNTAX - GitHub Pages · Heads, arguments, & adjuncts • Syntax is about the relationships among words and phrases in a sentence • Each constituent has its own internal structure,](https://reader036.vdocument.in/reader036/viewer/2022071606/6143a0686b2ee0265c0229db/html5/thumbnails/21.jpg)
A hierarchical view
• A grammar is a finite set of rules licensing a (possibly infinite) number of strings
• e.g., some rules
– [sentence] → [subject] [predicate]
– [subject] → [noun phrase]
– [noun phrase] → [determiner]? [adjective]* [noun]
– [predicate] → [verb phrase] [adjunct]
• Rules are phrasal or terminal
– Phrasal rules form constituents in a tree
– Terminal rules are parts of speech and produce words
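As a rough illustration of how one finite rule can license unboundedly many strings, the noun phrase rule above ([determiner]? [adjective]* [noun]) can be written as a regular expression over part-of-speech labels. This is a sketch only: the label names DET, ADJ, and NOUN and the helper `is_noun_phrase` are assumptions made for the example, not part of the slides.

```python
import re

# [noun phrase] -> [determiner]? [adjective]* [noun], over space-separated labels.
NOUN_PHRASE = re.compile(r"^(DET )?(ADJ )*NOUN$")

def is_noun_phrase(labels):
    """Return True if the label sequence is licensed by the noun phrase rule."""
    return NOUN_PHRASE.match(" ".join(labels)) is not None

print(is_noun_phrase(["DET", "ADJ", "NOUN"]))         # e.g. "the red ball"
print(is_noun_phrase(["NOUN"]))                       # e.g. "balls"
print(is_noun_phrase(["DET", "ADJ", "ADJ", "NOUN"]))  # any number of adjectives
print(is_noun_phrase(["ADJ", "DET", "NOUN"]))         # wrong order, not licensed
```

Because the adjective slot can repeat any number of times, this single rule already covers infinitely many label sequences.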
![Page 22: SYNTAX - GitHub Pages · Heads, arguments, & adjuncts • Syntax is about the relationships among words and phrases in a sentence • Each constituent has its own internal structure,](https://reader036.vdocument.in/reader036/viewer/2022071606/6143a0686b2ee0265c0229db/html5/thumbnails/22.jpg)
Example
![Page 27: SYNTAX - GitHub Pages · Heads, arguments, & adjuncts • Syntax is about the relationships among words and phrases in a sentence • Each constituent has its own internal structure,](https://reader036.vdocument.in/reader036/viewer/2022071606/6143a0686b2ee0265c0229db/html5/thumbnails/27.jpg)
POS Examples
• No general agreement about the exact set of parts of speech
• Penn Treebank tagset examples
– nouns: NN, NNS, NNP, NNPS
– adverbs: RB, RBR, RBS, RP
– verbs: VB, VBD, VBG, VBN, VBP, VBZ
– (Here, different tags are used to capture the small bit of morphology present in English)
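The tags listed above can be spelled out in a small lookup table. The glosses follow the standard Penn Treebank tag descriptions; the names `PTB_TAGS` and `coarse_category` are illustrative helpers introduced here, not something defined in the slides.

```python
# Glosses for the Penn Treebank tags mentioned on the slide.
PTB_TAGS = {
    "NN": "noun, singular or mass",
    "NNS": "noun, plural",
    "NNP": "proper noun, singular",
    "NNPS": "proper noun, plural",
    "RB": "adverb",
    "RBR": "adverb, comparative",
    "RBS": "adverb, superlative",
    "RP": "particle",
    "VB": "verb, base form",
    "VBD": "verb, past tense",
    "VBG": "verb, gerund or present participle",
    "VBN": "verb, past participle",
    "VBP": "verb, non-3rd person singular present",
    "VBZ": "verb, 3rd person singular present",
}

def coarse_category(tag):
    """Collapse a fine-grained tag into the coarse groups used on the slide."""
    if tag.startswith("NN"):
        return "noun"
    if tag.startswith("VB"):
        return "verb"
    # The slide groups RP with the adverbs; in the tagset itself RP marks particles.
    if tag.startswith("RB") or tag == "RP":
        return "adverb"
    return "other"

for tag in ("NNS", "VBZ", "RBR"):
    print(tag, "->", coarse_category(tag), "|", PTB_TAGS[tag])
```

The suffixes are where the morphology shows up: for example, VBD versus VBZ distinguishes past tense from third person singular present.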
![Page 30: SYNTAX - GitHub Pages · Heads, arguments, & adjuncts • Syntax is about the relationships among words and phrases in a sentence • Each constituent has its own internal structure,](https://reader036.vdocument.in/reader036/viewer/2022071606/6143a0686b2ee0265c0229db/html5/thumbnails/30.jpg)
Parts of Speech (POS)
• Three definitions of noun
– Grammar school (“metaphysical”): a person, place, thing, or idea
– Distributional: the set of words that have the same distribution as other nouns
∎ {I,you,he} saw the {bird,cat,dog}.
– Functional: the set of words that serve as arguments to verbs
(Diagram labels: verb, noun, adverb, adjective)
![Page 31: SYNTAX - GitHub Pages · Heads, arguments, & adjuncts • Syntax is about the relationships among words and phrases in a sentence • Each constituent has its own internal structure,](https://reader036.vdocument.in/reader036/viewer/2022071606/6143a0686b2ee0265c0229db/html5/thumbnails/31.jpg)
Phrases and Constituents
• Longer sequences of words can perform the same function as individual parts of speech:
– I saw [[a]DT [kid]N]NP
– I saw [a kid playing basketball]NP
– I saw [a kid playing basketball alone on the court]NP
• This gives rise to the idea of a phrasal constituent, which functions as a unit in relation to the rest of the sentence
![Page 32: SYNTAX - GitHub Pages · Heads, arguments, & adjuncts • Syntax is about the relationships among words and phrases in a sentence • Each constituent has its own internal structure,](https://reader036.vdocument.in/reader036/viewer/2022071606/6143a0686b2ee0265c0229db/html5/thumbnails/32.jpg)
Constituent tests
• How do you know if a phrase functions as a constituent?
• A few tests
– Coordination
∎ Kim [read a book], [gave it to Sandy], and [left].
– Substitution with a word
∎ Kim read [a very interesting book about grammar].
∎ Kim read [it].
– See Bender #51
![Page 35: SYNTAX - GitHub Pages · Heads, arguments, & adjuncts • Syntax is about the relationships among words and phrases in a sentence • Each constituent has its own internal structure,](https://reader036.vdocument.in/reader036/viewer/2022071606/6143a0686b2ee0265c0229db/html5/thumbnails/35.jpg)
Heads, arguments, & adjuncts
• Syntax is about the relationships among words and phrases in a sentence
• Each constituent has its own internal structure, as well as relationships with words and constituents outside it
• Hierarchical structure among constituents
– Top down, each constituent has a head
– Heads have (phrasal) dependents
– Dependents can be required (arguments) or optional (adjuncts)
– A head word often controls the structure of its modifiers
![Page 36: SYNTAX - GitHub Pages · Heads, arguments, & adjuncts • Syntax is about the relationships among words and phrases in a sentence • Each constituent has its own internal structure,](https://reader036.vdocument.in/reader036/viewer/2022071606/6143a0686b2ee0265c0229db/html5/thumbnails/36.jpg)
Heads
• Head: “the sub-constituent which determines the internal structure and external distribution of the constituent as a whole” (Bender #52)
• Examples
– sentence: (usually) the main verb
– noun phrase: (usually) the main noun
– verb phrase: (usually) the active verb
![Page 37: SYNTAX - GitHub Pages · Heads, arguments, & adjuncts • Syntax is about the relationships among words and phrases in a sentence • Each constituent has its own internal structure,](https://reader036.vdocument.in/reader036/viewer/2022071606/6143a0686b2ee0265c0229db/html5/thumbnails/37.jpg)
Dependents: Arguments & adjuncts
• Dependents of a head:
– Arguments: selected/licensed by the head and complete the meaning
– Adjuncts: not selected and refine the meaning
![Page 38: SYNTAX - GitHub Pages · Heads, arguments, & adjuncts • Syntax is about the relationships among words and phrases in a sentence • Each constituent has its own internal structure,](https://reader036.vdocument.in/reader036/viewer/2022071606/6143a0686b2ee0265c0229db/html5/thumbnails/38.jpg)
Constituent structure
• The head often constrains the internal structure of a constituent
• Examples
– verb
∎ [Kim]ARGUMENT is [ready]ADJUNCT.
– adjective
∎ Kim is [readyADJ [to make a pizza]V].
∎ * Kim is [tiredADJ [to make a pizza]V].
– noun
∎ [The [red]ADJ ball]
∎ * [The [red]ADJ ball [the stick]N]
∎ [The [red]ADJ ball [on top of the stick]PP]
![Page 39: SYNTAX - GitHub Pages · Heads, arguments, & adjuncts • Syntax is about the relationships among words and phrases in a sentence • Each constituent has its own internal structure,](https://reader036.vdocument.in/reader036/viewer/2022071606/6143a0686b2ee0265c0229db/html5/thumbnails/39.jpg)
More examples
– Kim planned [to give Sandy books].
– * Kim planned [to give Sandy].
– Kim planned [to give books].
– * Kim planned [to see Sandy books].
– Kim [would [give Sandy books]].
– Pat [helped [Kim give Sandy books]].
– * [[Give Sandy books] [surprised Kim]].
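One way to picture how a head licenses its arguments is a toy lexicon that lists, for each verb, the argument patterns it accepts. Everything here is an assumption made for illustration: the role labels "recipient" and "theme", the `FRAMES` table, and the `licensed` helper are hypothetical, and real grammars encode this in far richer ways.

```python
# Hypothetical toy lexicon: each verb lists the argument frames it licenses.
FRAMES = {
    "give": {("recipient", "theme"), ("theme",)},
    "see": {("theme",)},
}

def licensed(verb, roles):
    """Return True if the verb accepts exactly this sequence of argument roles."""
    return tuple(roles) in FRAMES.get(verb, set())

print(licensed("give", ["recipient", "theme"]))  # give Sandy books
print(licensed("give", ["recipient"]))           # * give Sandy
print(licensed("give", ["theme"]))               # give books
print(licensed("see", ["recipient", "theme"]))   # * see Sandy books
```

Under these assumed frames, the starred examples above are exactly the ones the lexicon rejects.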
![Page 40: SYNTAX - GitHub Pages · Heads, arguments, & adjuncts • Syntax is about the relationships among words and phrases in a sentence • Each constituent has its own internal structure,](https://reader036.vdocument.in/reader036/viewer/2022071606/6143a0686b2ee0265c0229db/html5/thumbnails/40.jpg)
Summary: what is syntax?
• A finite set of rules licensing an infinite number of strings
• The rules specify how words and phrases relate to one another in a hierarchical manner
• No one knows what the actual rules are, but there is consensus that the rules must exist!
![Page 41: SYNTAX - GitHub Pages · Heads, arguments, & adjuncts • Syntax is about the relationships among words and phrases in a sentence • Each constituent has its own internal structure,](https://reader036.vdocument.in/reader036/viewer/2022071606/6143a0686b2ee0265c0229db/html5/thumbnails/41.jpg)
Outline
• what is syntax?
• where do grammars come from?
• how can a computer find a sentence’s structure?
![Page 42: SYNTAX - GitHub Pages · Heads, arguments, & adjuncts • Syntax is about the relationships among words and phrases in a sentence • Each constituent has its own internal structure,](https://reader036.vdocument.in/reader036/viewer/2022071606/6143a0686b2ee0265c0229db/html5/thumbnails/42.jpg)
Treebanks
• Collections of natural text that are annotated according to a particular syntactic theory
– Usually created by linguistic experts
– Ideally as large as possible
– Theories are usually coarsely divided into constituent/phrase or dependency structure
![Page 43: SYNTAX - GitHub Pages · Heads, arguments, & adjuncts • Syntax is about the relationships among words and phrases in a sentence • Each constituent has its own internal structure,](https://reader036.vdocument.in/reader036/viewer/2022071606/6143a0686b2ee0265c0229db/html5/thumbnails/43.jpg)
Formalisms
• Phrase-structure and dependency grammars
– Phrase-structure: encodes the phrasal components of language
– Dependency grammars encode the relationships between words
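The contrast between the two formalisms can be sketched with plain data structures for one short sentence, "Kim read a book". The labels, the nested-tuple encoding, and the helper names `words` and `root` are assumptions chosen for this example; actual treebanks use their own label sets and file formats.

```python
# Phrase structure: nested constituents, each written as (label, children...).
PHRASE_STRUCTURE = (
    "S",
    ("NP", ("N", "Kim")),
    ("VP", ("V", "read"), ("NP", ("DT", "a"), ("N", "book"))),
)

# Dependency structure: (head word, dependent word) pairs, no phrasal nodes.
DEPENDENCIES = [("read", "Kim"), ("read", "book"), ("book", "a")]

def words(tree):
    """Collect the words at the leaves of a phrase-structure tree, left to right."""
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return [children[0]]
    return [w for child in children for w in words(child)]

def root(arcs):
    """Return the one word that is a head but never a dependent."""
    heads = {head for head, _ in arcs}
    dependents = {dep for _, dep in arcs}
    (only,) = heads - dependents
    return only

print(words(PHRASE_STRUCTURE))
print(root(DEPENDENCIES))
```

The phrase-structure encoding makes constituents such as the object noun phrase explicit, while the dependency encoding records only which word depends on which.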
![Page 44: SYNTAX - GitHub Pages · Heads, arguments, & adjuncts • Syntax is about the relationships among words and phrases in a sentence • Each constituent has its own internal structure,](https://reader036.vdocument.in/reader036/viewer/2022071606/6143a0686b2ee0265c0229db/html5/thumbnails/44.jpg)
Penn Treebank (1993)
https://catalog.ldc.upenn.edu/LDC99T42
![Page 47: SYNTAX - GitHub Pages · Heads, arguments, & adjuncts • Syntax is about the relationships among words and phrases in a sentence • Each constituent has its own internal structure,](https://reader036.vdocument.in/reader036/viewer/2022071606/6143a0686b2ee0265c0229db/html5/thumbnails/47.jpg)
The Penn Treebank
• Syntactic annotation of a million words of the 1989 Wall Street Journal, plus other corpora (released in 1993)
– (Trivia: People often discuss “The Penn Treebank” when they mean the WSJ portion of it)
• Contains 74 total tags: 36 parts of speech, 7 punctuation tags, and 31 phrasal constituent tags, plus some relation markings
• Was the foundation for an entire field of research and applications for over twenty years
![Page 49: SYNTAX - GitHub Pages · Heads, arguments, & adjuncts • Syntax is about the relationships among words and phrases in a sentence • Each constituent has its own internal structure,](https://reader036.vdocument.in/reader036/viewer/2022071606/6143a0686b2ee0265c0229db/html5/thumbnails/49.jpg)
( (S
    (NP-SBJ
      (NP (NNP Pierre) (NNP Vinken) )
      (, ,)
      (ADJP
        (NP (CD 61) (NNS years) )
        (JJ old) )
      (, ,) )
    (VP (MD will)
      (VP (VB join)
        (NP (DT the) (NN board) )
        (PP-CLR (IN as)
          (NP (DT a) (JJ nonexecutive) (NN director) ))
        (NP-TMP (NNP Nov.) (CD 29) )))
    (. .) ))

Pierre Vinken, 61 years old, will join the board as a nonexecutive director Nov. 29.
(Photo: https://commons.wikimedia.org/wiki/File:PierreVinken.jpg)
× 49,208 sentences
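The bracketed annotation above is easy to read into a nested structure. The following is a minimal sketch of a reader written for this example; `parse_tree` and `leaves` are hypothetical helper names, and the code assumes well-formed input in which every word sits under a (TAG word) pair.

```python
import re

VINKEN = """( (S (NP-SBJ (NP (NNP Pierre) (NNP Vinken) ) (, ,)
  (ADJP (NP (CD 61) (NNS years) ) (JJ old) ) (, ,) )
  (VP (MD will) (VP (VB join) (NP (DT the) (NN board) )
  (PP-CLR (IN as) (NP (DT a) (JJ nonexecutive) (NN director) ))
  (NP-TMP (NNP Nov.) (CD 29) ))) (. .) ))"""

def parse_tree(text):
    """Turn a bracketed tree into nested lists such as ['NP', ['DT', 'the'], ...]."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def read():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token != "(":
            return token          # a label or a word
        node = []
        while tokens[pos] != ")":
            node.append(read())
        pos += 1                  # consume the closing ")"
        return node

    return read()

def leaves(node):
    """Return the words of the tree in order; a [tag, word] pair yields its word."""
    if all(isinstance(child, str) for child in node):
        return [node[-1]]
    return [w for child in node if isinstance(child, list) for w in leaves(child)]

tree = parse_tree(VINKEN)
print(" ".join(leaves(tree)))
```

Reading the leaves back out recovers the tokenized sentence, with punctuation as separate tokens.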
![Page 55: SYNTAX - GitHub Pages · Heads, arguments, & adjuncts • Syntax is about the relationships among words and phrases in a sentence • Each constituent has its own internal structure,](https://reader036.vdocument.in/reader036/viewer/2022071606/6143a0686b2ee0265c0229db/html5/thumbnails/55.jpg)
Context Free Grammar• Nonterminals are rewritten
based on the lefthand side alone• Algorithm:
– Start with TOP– For each leaf nonterminal:
∎ Sample a rule from the set of rules for that nonterminal
∎ Replace it with
31
Turing machine
context-sensitive grammar
context free grammar
finite state machine
Chomsky formal language hierarchy
![Page 56: SYNTAX - GitHub Pages · Heads, arguments, & adjuncts • Syntax is about the relationships among words and phrases in a sentence • Each constituent has its own internal structure,](https://reader036.vdocument.in/reader036/viewer/2022071606/6143a0686b2ee0265c0229db/html5/thumbnails/56.jpg)
Context Free Grammar• Nonterminals are rewritten
based on the lefthand side alone• Algorithm:
– Start with TOP– For each leaf nonterminal:
∎ Sample a rule from the set of rules for that nonterminal
∎ Replace it with∎ Recurse
31
Turing machine
context-sensitive grammar
context free grammar
finite state machine
Chomsky formal language hierarchy
![Page 57: SYNTAX - GitHub Pages · Heads, arguments, & adjuncts • Syntax is about the relationships among words and phrases in a sentence • Each constituent has its own internal structure,](https://reader036.vdocument.in/reader036/viewer/2022071606/6143a0686b2ee0265c0229db/html5/thumbnails/57.jpg)
Context Free Grammar
• Nonterminals are rewritten based on the lefthand side alone
• Algorithm:
– Start with TOP
– For each leaf nonterminal:
∎ Sample a rule from the set of rules for that nonterminal
∎ Replace it with the right-hand side of that rule
∎ Recurse
• Terminates when there are no more nonterminals
[Figure: Chomsky formal language hierarchy, nested from outermost to innermost: Turing machine, context-sensitive grammar, context free grammar, finite state machine]
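The sampling procedure can be sketched in a few lines. This is an illustration under stated assumptions: the toy `GRAMMAR` below is invented for the example (it is not the Treebank grammar), rules are sampled uniformly, and `generate` is a hypothetical helper name.

```python
import random

# Toy grammar (an assumption for illustration): nonterminal -> list of right-hand sides.
GRAMMAR = {
    "TOP": [["S"]],
    "S": [["NP", "VP"], ["VP"]],
    "NP": [["DT", "NN"], ["DT", "JJ", "NN"]],
    "VP": [["VB", "NP"], ["VB", "NP", "PP"]],
    "PP": [["IN", "NP"]],
    "DT": [["the"]],
    "JJ": [["market-jarring"]],
    "NN": [["board"], ["bond"]],
    "VB": [["halt"], ["join"]],
    "IN": [["at"]],
}


def generate(symbol, rng):
    """Expand `symbol` top-down; return the list of words it yields."""
    if symbol not in GRAMMAR:           # a terminal: nothing left to rewrite
        return [symbol]
    rhs = rng.choice(GRAMMAR[symbol])   # sample a rule for this nonterminal
    words = []
    for child in rhs:                   # replace it with the right-hand side, then recurse
        words.extend(generate(child, rng))
    return words


rng = random.Random(0)
for _ in range(3):
    print(" ".join(generate("TOP", rng)))
```

The choice of rule depends only on the nonterminal being rewritten, never on its neighbors, which is what makes the grammar context free. This toy grammar has no recursive rules, so sampling always terminates; a grammar with recursion may need a depth limit.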
![Page 66: SYNTAX - GitHub Pages · Heads, arguments, & adjuncts • Syntax is about the relationships among words and phrases in a sentence • Each constituent has its own internal structure,](https://reader036.vdocument.in/reader036/viewer/2022071606/6143a0686b2ee0265c0229db/html5/thumbnails/66.jpg)
| Current string | Rule applied |
|---|---|
| TOP | TOP → S |
| S | S → VP |
| VP | VP → (VB→halt) NP PP |
| halt NP PP | NP → (DT→The) (JJ→market-jarring) (CD→25) |
| halt The market-jarring 25 PP | PP → (IN→at) NP |
| halt The market-jarring 25 at NP | NP → (DT→the) (NN→bond) |
| halt The market-jarring 25 at the bond | |

(TOP (S (VP (VB halt) (NP (DT The) (JJ market-jarring) (CD 25)) (PP (IN at) (NP (DT the) (NN bond))))))
![Page 67: SYNTAX - GitHub Pages · Heads, arguments, & adjuncts • Syntax is about the relationships among words and phrases in a sentence • Each constituent has its own internal structure,](https://reader036.vdocument.in/reader036/viewer/2022071606/6143a0686b2ee0265c0229db/html5/thumbnails/67.jpg)
A problem with the Penn Treebank
• One language, English
– Represents a very narrow typology (e.g., little morphology)
– Consider the tags we looked at before
∎ nouns: NN, NNS, NNP, NNPS
∎ adverbs: RB, RBR, RBS; particles: RP
∎ verbs: VB, VBD, VBG, VBN, VBP, VBZ
– How well will these generalize to other languages?
![Page 68: SYNTAX - GitHub Pages · Heads, arguments, & adjuncts • Syntax is about the relationships among words and phrases in a sentence • Each constituent has its own internal structure,](https://reader036.vdocument.in/reader036/viewer/2022071606/6143a0686b2ee0265c0229db/html5/thumbnails/68.jpg)
Dependency Treebanks (2012)
• Dependency trees annotated across languages in a consistent manner
https://universaldependencies.org
![Page 69: SYNTAX - GitHub Pages · Heads, arguments, & adjuncts • Syntax is about the relationships among words and phrases in a sentence • Each constituent has its own internal structure,](https://reader036.vdocument.in/reader036/viewer/2022071606/6143a0686b2ee0265c0229db/html5/thumbnails/69.jpg)
Example
• Instead of encoding phrase structure, it encodes dependencies between words
• Often more directly encodes information we care about (i.e., who did what to whom)
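To make "who did what to whom" concrete, a dependency analysis can be stored as a list of (head, relation, dependent) triples. The sketch below hand-writes an analysis of "Dipanjan taught Johnmark", a sentence used later in these slides. The relation labels follow Universal Dependencies naming (`root`, `nsubj`, `obj`), but the analysis itself and the helper name `who_did_what_to_whom` are assumptions for illustration.

```python
# (head, relation, dependent) triples; a hand-written analysis, not parser output.
ARCS = [
    ("ROOT", "root", "taught"),
    ("taught", "nsubj", "Dipanjan"),
    ("taught", "obj", "Johnmark"),
]


def who_did_what_to_whom(arcs):
    """Return (subject, verb, object) read directly off the dependency arcs."""
    verb = next(dep for head, rel, dep in arcs if rel == "root")
    roles = {rel: dep for head, rel, dep in arcs if head == verb}
    return roles.get("nsubj"), verb, roles.get("obj")


print(who_did_what_to_whom(ARCS))
```

No phrase labels are needed: the subject and object are one lookup away from the verb.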
![Page 70: SYNTAX - GitHub Pages · Heads, arguments, & adjuncts • Syntax is about the relationships among words and phrases in a sentence • Each constituent has its own internal structure,](https://reader036.vdocument.in/reader036/viewer/2022071606/6143a0686b2ee0265c0229db/html5/thumbnails/70.jpg)
Guiding principles
• Works for individual languages
• Suitable across languages
• Easy to use when annotating
• Easy to parse quickly
• Understandable to laypeople
• Usable by downstream tasks
https://universaldependencies.org/introduction.html
![Page 71: SYNTAX - GitHub Pages · Heads, arguments, & adjuncts • Syntax is about the relationships among words and phrases in a sentence • Each constituent has its own internal structure,](https://reader036.vdocument.in/reader036/viewer/2022071606/6143a0686b2ee0265c0229db/html5/thumbnails/71.jpg)
Universal Dependencies
• Parts of speech
– open class
∎ ADJ, ADV, INTJ, NOUN, PROPN, VERB
– closed class
∎ ADP, AUX, CCONJ, DET, NUM, PART, PRON, SCONJ
– other
∎ PUNCT, SYM, X
![Page 72: SYNTAX - GitHub Pages · Heads, arguments, & adjuncts • Syntax is about the relationships among words and phrases in a sentence • Each constituent has its own internal structure,](https://reader036.vdocument.in/reader036/viewer/2022071606/6143a0686b2ee0265c0229db/html5/thumbnails/72.jpg)
Where do grammars come from?
https://www.shutterstock.com/image-vector/stork-carrying-baby-boy-133823486
![Page 73: SYNTAX - GitHub Pages · Heads, arguments, & adjuncts • Syntax is about the relationships among words and phrases in a sentence • Each constituent has its own internal structure,](https://reader036.vdocument.in/reader036/viewer/2022071606/6143a0686b2ee0265c0229db/html5/thumbnails/73.jpg)
Where do grammars come from?
• Treebanks!
• Given a treebank, and a formalism, we can learn statistics by counting over the annotated instances
![Page 74: SYNTAX - GitHub Pages · Heads, arguments, & adjuncts • Syntax is about the relationships among words and phrases in a sentence • Each constituent has its own internal structure,](https://reader036.vdocument.in/reader036/viewer/2022071606/6143a0686b2ee0265c0229db/html5/thumbnails/74.jpg)
Probabilities
• For example, a context-free grammar
– S → NP , NP VP . [0.002]
– NP → NNP NNP [0.037]
– , → , [0.999]
– NP → * [X]
– VP → VB NP [0.057]
– NP → PRP$ NN [0.008]
– . → . [0.987]
• Probabilities given as $P(X) = \dfrac{P(X)}{\sum_{X' \in N} P(X')}$
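One common reading of "counting over the annotated instances" is relative-frequency estimation: the probability of a rule is its count divided by the total count of rules with the same lefthand side. The sketch below assumes that reading. The two-tree `TREEBANK` is invented for illustration, and `rules` and `estimate` are hypothetical helper names.

```python
from collections import Counter

# A tiny hand-made "treebank": each tree is (label, child, child, ...); leaves are strings.
TREEBANK = [
    ("S",
     ("NP", ("NNP", "Pierre"), ("NNP", "Vinken")),
     ("VP", ("VB", "join"), ("NP", ("DT", "the"), ("NN", "board")))),
    ("S",
     ("NP", ("DT", "the"), ("NN", "board")),
     ("VP", ("VB", "halt"))),
]


def rules(tree):
    """Yield (lhs, rhs) for every internal node of the tree."""
    label, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    yield (label, rhs)
    for c in children:
        if not isinstance(c, str):
            yield from rules(c)


def estimate(treebank):
    """Relative-frequency estimate: count(lhs -> rhs) / count(lhs)."""
    rule_counts = Counter(r for t in treebank for r in rules(t))
    lhs_counts = Counter()
    for (lhs, _), n in rule_counts.items():
        lhs_counts[lhs] += n
    return {rule: n / lhs_counts[rule[0]] for rule, n in rule_counts.items()}


probs = estimate(TREEBANK)
for (lhs, rhs), p in sorted(probs.items()):
    print(f"{lhs} -> {' '.join(rhs)} [{p:.3f}]")
```

By construction, the probabilities of all rules sharing a lefthand side sum to one.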
![Page 75: SYNTAX - GitHub Pages · Heads, arguments, & adjuncts • Syntax is about the relationships among words and phrases in a sentence • Each constituent has its own internal structure,](https://reader036.vdocument.in/reader036/viewer/2022071606/6143a0686b2ee0265c0229db/html5/thumbnails/75.jpg)
Summary
where do grammars come from?
Grammars are learned from Treebanks
Treebanks are annotated according to a particular theory or formalism
![Page 76: SYNTAX - GitHub Pages · Heads, arguments, & adjuncts • Syntax is about the relationships among words and phrases in a sentence • Each constituent has its own internal structure,](https://reader036.vdocument.in/reader036/viewer/2022071606/6143a0686b2ee0265c0229db/html5/thumbnails/76.jpg)
Outline
what is syntax?
where do grammars come from?
how can a computer find a sentence’s structure?
![Page 79: SYNTAX - GitHub Pages · Heads, arguments, & adjuncts • Syntax is about the relationships among words and phrases in a sentence • Each constituent has its own internal structure,](https://reader036.vdocument.in/reader036/viewer/2022071606/6143a0686b2ee0265c0229db/html5/thumbnails/79.jpg)
Formal Language Theory
• Consider the claims underlying our grammar-based view of language
1. Sentences are either in or out of a language
2. Sentences have an invisible hidden structure
• We can generalize this discussion to make a connection between natural and other kinds of languages
• Consider, for example, computer programs
– They either compile or don’t compile
– Their structure determines their interpretation
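The "in or out of a language" claim can be checked directly for Python, whose standard library exposes its parser through the `ast` module. A minimal sketch follows; `in_python_language` is a hypothetical helper name, and it tests syntax only, not whether the program runs correctly.

```python
import ast


def in_python_language(source):
    """Return True if `source` is a syntactically valid Python program."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False


print(in_python_language("x = (5 + 7) * 11"))  # well-formed
print(in_python_language("x = (5 + 7 * 11"))   # unbalanced parenthesis
```

When parsing succeeds, `ast.parse` also returns the hidden structure as a tree, which is the second claim on the slide.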
![Page 81: SYNTAX - GitHub Pages · Heads, arguments, & adjuncts • Syntax is about the relationships among words and phrases in a sentence • Each constituent has its own internal structure,](https://reader036.vdocument.in/reader036/viewer/2022071606/6143a0686b2ee0265c0229db/html5/thumbnails/81.jpg)
Formal Language Theory
• Generalization: define a language to be a set of strings under some alphabet, Σ
– e.g., the set of valid English sentences (where the “alphabet” is English words), or the set of valid Python programs
• Formal Language Theory provides a common framework for studying properties of these languages, e.g.,
– Is this file a valid C++ program? A valid Czech sentence?
– What is the structure?
– How hard / time-consuming is it to answer these questions?
![Page 82: SYNTAX - GitHub Pages · Heads, arguments, & adjuncts • Syntax is about the relationships among words and phrases in a sentence • Each constituent has its own internal structure,](https://reader036.vdocument.in/reader036/viewer/2022071606/6143a0686b2ee0265c0229db/html5/thumbnails/82.jpg)
The Chomsky Hierarchy
• Definitions: given
– an alphabet (Σ),
– terminal symbols, e.g., a ∈ Σ
– nonterminal symbols, e.g., {S, N, A, B}
– α, β, γ, strings of terminals and/or nonterminals

| Type | Rules | Name | Recognized by |
|---|---|---|---|
| 3 | A → aB | Regular | Regular expressions |
| 2 | A → α | Context-free | Pushdown automata |
| 1 | αAβ → αγβ | Context-sensitive | Linear-bounded Turing machine |
| 0 | αAβ → γ | Recursively enumerable | Turing Machines |
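The gap between Type 3 and Type 2 can be illustrated with two small recognizers. The languages chosen here are standard textbook examples and are not from the slides: a*b* is regular, so a regular expression suffices, while { aⁿbⁿ } is context-free but not regular, so the sketch uses an explicit stack in the spirit of a pushdown automaton. Both function names are hypothetical.

```python
import re


def in_a_star_b_star(s):
    """Membership in a*b*, a regular language, via a regular expression."""
    return re.fullmatch(r"a*b*", s) is not None


def in_anbn(s):
    """Membership in { a^n b^n : n >= 0 }, using a stack to match each b to an a."""
    stack = []
    i = 0
    while i < len(s) and s[i] == "a":            # push one symbol per leading 'a'
        stack.append("a")
        i += 1
    while i < len(s) and s[i] == "b" and stack:  # pop one symbol per 'b'
        stack.pop()
        i += 1
    return i == len(s) and not stack             # all input consumed, stack empty


for s in ["aabb", "aab", "abb", "abab"]:
    print(s, in_a_star_b_star(s), in_anbn(s))
```

The string "aab" is accepted by the first recognizer and rejected by the second: checking that the counts match requires memory that grows with the input.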
![Page 83: SYNTAX - GitHub Pages · Heads, arguments, & adjuncts • Syntax is about the relationships among words and phrases in a sentence • Each constituent has its own internal structure,](https://reader036.vdocument.in/reader036/viewer/2022071606/6143a0686b2ee0265c0229db/html5/thumbnails/83.jpg)
Problems
• What is the value?
– (5 + 7) * 11
• Who did what to whom?
– Him the Almighty hurled
– Dipanjan taught Johnmark
If we have a grammar, we can answer these with parsing
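For the arithmetic example, the value falls out of the parse tree. The sketch below borrows Python's own expression grammar through the standard `ast` module rather than defining one. `evaluate` is a hypothetical helper that handles only numbers and the four basic operators.

```python
import ast
import operator

# Map parse-tree operator nodes to the functions that compute them.
OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def evaluate(node):
    """Compute the value of an arithmetic parse tree, bottom-up."""
    if isinstance(node, ast.Expression):
        return evaluate(node.body)
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in OPS:
        return OPS[type(node.op)](evaluate(node.left), evaluate(node.right))
    raise ValueError("unsupported expression")


print(evaluate(ast.parse("(5 + 7) * 11", mode="eval")))  # 132
print(evaluate(ast.parse("5 + 7 * 11", mode="eval")))    # 82
```

The two inputs contain the same numbers and operators in the same order; only the structure differs, and so does the value.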
![Page 84: SYNTAX - GitHub Pages · Heads, arguments, & adjuncts • Syntax is about the relationships among words and phrases in a sentence • Each constituent has its own internal structure,](https://reader036.vdocument.in/reader036/viewer/2022071606/6143a0686b2ee0265c0229db/html5/thumbnails/84.jpg)
Parsing
• If the grammar has certain properties (Type 2 or 3), we can efficiently answer two questions with a parser
– Is the sentence in the language defined by the grammar?
– What is the structure above that sentence?
![Page 85: SYNTAX - GitHub Pages · Heads, arguments, & adjuncts • Syntax is about the relationships among words and phrases in a sentence • Each constituent has its own internal structure,](https://reader036.vdocument.in/reader036/viewer/2022071606/6143a0686b2ee0265c0229db/html5/thumbnails/85.jpg)
Algorithms
• The CKY algorithm for parsing with constituency grammars
• Transition-based parsing with dependency grammars
![Page 86: SYNTAX - GitHub Pages · Heads, arguments, & adjuncts • Syntax is about the relationships among words and phrases in a sentence • Each constituent has its own internal structure,](https://reader036.vdocument.in/reader036/viewer/2022071606/6143a0686b2ee0265c0229db/html5/thumbnails/86.jpg)
Chart parsing for constituency grammars
• Maintains a chart of nonterminals spanning words, e.g.,
– NP over words 1..4 and 2..5
– VP over words 4..6 and 4..8
– etc.
![Page 87: SYNTAX - GitHub Pages · Heads, arguments, & adjuncts • Syntax is about the relationships among words and phrases in a sentence • Each constituent has its own internal structure,](https://reader036.vdocument.in/reader036/viewer/2022071606/6143a0686b2ee0265c0229db/html5/thumbnails/87.jpg)
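Chart parsing for constituency grammars
Sentence with word positions: 0 Time 1 flies 2 like 3 an 4 arrow 5
Chart entries (iXj = nonterminal X spanning positions i to j):
– width 1: 0NN1; 1NN2, 1VB2; 2VB3, 2IN3; 3DT4; 4NN5; 0NP1
– width 2: 0NP2; 3NP5
– width 3: 2PP5, 2VP5
– width 4: 1VP5
– width 5: 0S5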
![Page 88: SYNTAX - GitHub Pages · Heads, arguments, & adjuncts • Syntax is about the relationships among words and phrases in a sentence • Each constituent has its own internal structure,](https://reader036.vdocument.in/reader036/viewer/2022071606/6143a0686b2ee0265c0229db/html5/thumbnails/88.jpg)
CKY algorithm
• How do we produce this chart? Cocke-Younger-Kasami (CYK/CKY)
• Basic idea is to apply rules in a bottom-up fashion, applying all rules, and (recursively) building larger constituents from smaller ones
• Input: sentence of length N
    for width in 2..N
        for begin i in 0..{N - width}
            j = i + width
            for split k in {i + 1}..{j - 1}
                for all rules A → B C
                    create iAj if iBk and kCj
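The loop structure above translates almost line for line into Python. The sketch below is a recognizer only (no backpointers, no probabilities), assuming positions run from 0 to N as in the chart. The lexicon and rules are read off the "Time flies like an arrow" example; the dictionary layout and the name `cky` are assumptions. The unary rule NP → NN is applied to every NN when the width-1 cells are filled, so this chart holds a few more entries than the one drawn on the slide.

```python
from collections import defaultdict

LEXICON = {
    "Time": {"NN"}, "flies": {"NN", "VB"}, "like": {"VB", "IN"},
    "an": {"DT"}, "arrow": {"NN"},
}
UNARY = {"NN": {"NP"}}  # NP -> NN
BINARY = {              # (B, C) -> set of A such that A -> B C
    ("NN", "NN"): {"NP"},
    ("DT", "NN"): {"NP"},
    ("IN", "NP"): {"PP"},
    ("VB", "NP"): {"VP"},
    ("VB", "PP"): {"VP"},
    ("NP", "VP"): {"S"},
}


def cky(words):
    """Fill chart[i, j] with every nonterminal that can span words[i:j]."""
    n = len(words)
    chart = defaultdict(set)
    for i, word in enumerate(words):          # width-1 cells: tags plus unary rules
        tags = set(LEXICON[word])
        for tag in list(tags):
            tags |= UNARY.get(tag, set())
        chart[i, i + 1] = tags
    for width in range(2, n + 1):             # build wider spans from narrower ones
        for i in range(0, n - width + 1):
            j = i + width
            for k in range(i + 1, j):         # every split point inside the span
                for b in chart[i, k]:
                    for c in chart[k, j]:
                        chart[i, j] |= BINARY.get((b, c), set())
    return chart


words = "Time flies like an arrow".split()
chart = cky(words)
print("S" in chart[0, len(words)])
```

The three nested loops over width, start, and split point give the cubic dependence on sentence length that CKY is known for.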
![Page 96: SYNTAX - GitHub Pages · Heads, arguments, & adjuncts • Syntax is about the relationships among words and phrases in a sentence • Each constituent has its own internal structure,](https://reader036.vdocument.in/reader036/viewer/2022071606/6143a0686b2ee0265c0229db/html5/thumbnails/96.jpg)
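CKY algorithm
Sentence with word positions: 0 Time 1 flies 2 like 3 an 4 arrow 5
Chart cells, filled bottom-up by span width:
– width 1: NN (Time); NN, VB (flies); VB, IN (like); DT (an); NN (arrow); NP→NN (Time)
– width 2: NP→NN NN (Time flies); NP→DT NN (an arrow)
– width 3: PP→2IN3 3NP5; VP→2VB3 3NP5
– width 4: VP→VB PP (flies like an arrow)
– width 5: S → 0NP1 1VP5; S → 0NP2 2VP5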
![Page 97: SYNTAX - GitHub Pages · Heads, arguments, & adjuncts • Syntax is about the relationships among words and phrases in a sentence • Each constituent has its own internal structure,](https://reader036.vdocument.in/reader036/viewer/2022071606/6143a0686b2ee0265c0229db/html5/thumbnails/97.jpg)
CKY algorithm
• Termination: is there a chart entry at 0SN?
– ✓ string is in the language
– Obtain the structure by following backpointers
– Not covered: adding probabilities to rules to resolve ambiguities
![Page 99: SYNTAX - GitHub Pages · Heads, arguments, & adjuncts • Syntax is about the relationships among words and phrases in a sentence • Each constituent has its own internal structure,](https://reader036.vdocument.in/reader036/viewer/2022071606/6143a0686b2ee0265c0229db/html5/thumbnails/99.jpg)
Dependency parsing
• The situation is different in many ways
– We’re no longer building labeled constituents
– Instead, we’re searching for word dependencies
• This is accomplished by a stack-based transition parser
– Repeatedly (a) shift a word onto the stack or (b) create a LEFT or RIGHT dependency from the top two words
![Page 113: SYNTAX - GitHub Pages · Heads, arguments, & adjuncts • Syntax is about the relationships among words and phrases in a sentence • Each constituent has its own internal structure,](https://reader036.vdocument.in/reader036/viewer/2022071606/6143a0686b2ee0265c0229db/html5/thumbnails/113.jpg)
| step | stack | words | action | relation |
|---|---|---|---|---|
| 0 | [] | [human,langs,are,hard,to,parse] | SHIFT | |
| 1 | [human] | [langs,are,hard,to,parse] | SHIFT | |
| 2 | [human,langs] | [are,hard,to,parse] | LEFTARC | human←langs |
| 3 | [langs] | [are,hard,to,parse] | SHIFT | |
| 4 | [langs,are] | [hard,to,parse] | LEFTARC | langs←are |
| 5 | [are] | [hard,to,parse] | SHIFT | |
| 6 | [are,hard] | [to,parse] | SHIFT | |
| 7 | [are,hard,to] | [parse] | SHIFT | |
| 8 | [are,hard,to,parse] | [] | LEFTARC | to←parse |
| 9 | [are,hard,parse] | [] | RIGHTARC | hard→parse |
| 10 | [are,hard] | [] | RIGHTARC | are→hard |
| 11 | [are] | [] | RIGHTARC | ROOT→are |
| 12 | [] | [] | DONE | |
ROOT human languages are hard to parse
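The trace above can be replayed mechanically. The following is a minimal sketch, not the lecture's own code: the function name `replay` and the treatment of ROOT as an implicit head for the last word left on the stack are assumptions chosen to match the table, where ROOT never appears in the stack column.

```python
def replay(words, actions):
    """Replay SHIFT / LEFTARC / RIGHTARC actions; return (head, dependent) arcs."""
    stack, buffer, arcs = [], list(words), []
    for action in actions:
        if action == "SHIFT":
            # Move the next input word onto the stack.
            stack.append(buffer.pop(0))
        elif action == "LEFTARC":
            # The top of the stack heads the item beneath it; the dependent is removed.
            head = stack[-1]
            dependent = stack.pop(-2)
            arcs.append((head, dependent))
        elif action == "RIGHTARC":
            # The item beneath the top heads the top; the dependent is removed.
            # Assumption: with a single item left, the implicit ROOT is the head.
            dependent = stack.pop()
            head = stack[-1] if stack else "ROOT"
            arcs.append((head, dependent))
        else:
            raise ValueError(f"unknown action: {action}")
    if stack or buffer:
        raise ValueError("actions did not consume the whole sentence")
    return arcs


WORDS = ["human", "langs", "are", "hard", "to", "parse"]
# Actions for steps 0 through 11 of the table; step 12 (DONE) is the empty state.
ACTIONS = ["SHIFT", "SHIFT", "LEFTARC", "SHIFT", "LEFTARC", "SHIFT",
           "SHIFT", "SHIFT", "LEFTARC", "RIGHTARC", "RIGHTARC", "RIGHTARC"]

for head, dependent in replay(WORDS, ACTIONS):
    print(f"{head} -> {dependent}")
```

Each printed line corresponds to one entry in the relation column, written head first. A real parser would choose the action at every step with a trained classifier instead of reading it from a list.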
![Page 114: SYNTAX - GitHub Pages · Heads, arguments, & adjuncts • Syntax is about the relationships among words and phrases in a sentence • Each constituent has its own internal structure,](https://reader036.vdocument.in/reader036/viewer/2022071606/6143a0686b2ee0265c0229db/html5/thumbnails/114.jpg)
Unanswered questions
• How do we score rules (for constituency parsing) and actions and relations (for dependency parsing)?
– Probabilities can be read from Treebanks
– Actions can be informed by feature selection
• How do we know the right path to take?
– We can try multiple paths using beam search
– We get lots of savings via dynamic programming
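As one illustration of reading probabilities from a treebank, the sketch below estimates rule probabilities by relative frequency. The two-tree `TREEBANK` and the helper names `rules` and `estimate` are hypothetical, and relative-frequency counting is only the simplest estimator, not necessarily the one used in practice.

```python
from collections import Counter

# Hypothetical toy treebank: each tree is (label, child, ...); leaves are strings.
TREEBANK = [
    ("S", ("NP", "dogs"), ("VP", ("V", "bark"))),
    ("S", ("NP", "cats"), ("VP", ("V", "chase"), ("NP", "dogs"))),
]


def rules(tree):
    """Yield (lhs, rhs) for every internal node of a tree."""
    label, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    yield label, rhs
    for child in children:
        if not isinstance(child, str):
            yield from rules(child)


def estimate(treebank):
    """Relative frequency: P(lhs -> rhs) = count(lhs -> rhs) / count(lhs)."""
    rule_counts = Counter(r for tree in treebank for r in rules(tree))
    lhs_counts = Counter()
    for (lhs, _), n in rule_counts.items():
        lhs_counts[lhs] += n
    return {rule: n / lhs_counts[rule[0]] for rule, n in rule_counts.items()}


probs = estimate(TREEBANK)
for (lhs, rhs), p in sorted(probs.items()):
    print(f"{lhs} -> {' '.join(rhs)}\t{p:.2f}")
```

In this toy treebank NP rewrites as "dogs" twice and as "cats" once, so the estimate for NP -> dogs is 2/3. The probabilities for each left-hand side sum to one by construction.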
![Page 115: SYNTAX - GitHub Pages · Heads, arguments, & adjuncts • Syntax is about the relationships among words and phrases in a sentence • Each constituent has its own internal structure,](https://reader036.vdocument.in/reader036/viewer/2022071606/6143a0686b2ee0265c0229db/html5/thumbnails/115.jpg)
Summary
how can a computer find a sentence’s structure?
For context-free grammars, the (weighted) CKY algorithm can be used to find the most probable (maximum a posteriori) tree given a certain grammar
For dependency grammars, the most popular approach is some variant of transition-based parsing
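To make the CKY statement concrete, here is a minimal sketch of weighted (Viterbi) CKY. The grammar in `BINARY` and `LEXICAL` is a hypothetical toy in Chomsky normal form with made-up probabilities, and the function name `cky` is an assumption; the sketch keeps the single best analysis per label and span, which is what finding the most probable tree requires.

```python
import math
from collections import defaultdict

# Hypothetical toy PCFG in Chomsky normal form (probabilities are made up).
# BINARY maps (left child, right child) to a list of (parent, probability).
BINARY = {("NP", "VP"): [("S", 1.0)], ("V", "NP"): [("VP", 1.0)]}
# LEXICAL maps a word to a list of (label, probability).
LEXICAL = {"dogs": [("NP", 0.5)], "cats": [("NP", 0.5)], "chase": [("V", 1.0)]}


def cky(words):
    """Return (log-probability, tree) of the best S over all words, or None."""
    n = len(words)
    # chart[i, j] maps a label to (best log-prob, tree) for words[i:j].
    chart = defaultdict(dict)
    for i, word in enumerate(words):
        for label, p in LEXICAL.get(word, []):
            chart[i, i + 1][label] = (math.log(p), (label, word))
    for width in range(2, n + 1):            # span length, shortest first
        for i in range(n - width + 1):       # span start
            j = i + width                    # span end
            for k in range(i + 1, j):        # split point
                for left, (lp, ltree) in chart[i, k].items():
                    for right, (rp, rtree) in chart[k, j].items():
                        for parent, p in BINARY.get((left, right), []):
                            score = math.log(p) + lp + rp
                            best = chart[i, j].get(parent)
                            # Keep only the highest-scoring analysis per label.
                            if best is None or score > best[0]:
                                chart[i, j][parent] = (score, (parent, ltree, rtree))
    return chart[0, n].get("S")


score, tree = cky("dogs chase cats".split())
print(tree, math.exp(score))
```

Because every span is filled from shorter spans already in the chart, each subproblem is solved once; that reuse is the dynamic-programming saving. The toy grammar is unambiguous, so the max never has to choose between competing analyses here.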
![Page 116: SYNTAX - GitHub Pages · Heads, arguments, & adjuncts • Syntax is about the relationships among words and phrases in a sentence • Each constituent has its own internal structure,](https://reader036.vdocument.in/reader036/viewer/2022071606/6143a0686b2ee0265c0229db/html5/thumbnails/116.jpg)
Resources
• Demos:
– AllenNLP: https://demo.allennlp.org
– Berkeley Neural Parser: https://parser.kitaev.io
– spaCy dependency parser: https://explosion.ai/demos/displacy
![Page 120: SYNTAX - GitHub Pages · Heads, arguments, & adjuncts • Syntax is about the relationships among words and phrases in a sentence • Each constituent has its own internal structure,](https://reader036.vdocument.in/reader036/viewer/2022071606/6143a0686b2ee0265c0229db/html5/thumbnails/120.jpg)
Outline
• what is syntax?
– the study of the internal structure of sentences (in natural and synthetic languages)
• where do grammars come from?
– they are created by linguists, usually under particular grammatical theories
• how can a computer find a sentence’s structure?
– train a grammar from a treebank and then apply that grammar to new sentences using parsing algorithms