Syntax and Context-Free Grammars
CMSC 723: Computational Linguistics I, Session #6
Jimmy Lin
The iSchool, University of Maryland
Wednesday, October 7, 2009
Today’s Agenda
Words… structure… meaning…
Formal grammars
  Context-free grammar
  Grammars for English
  Treebanks
  Dependency grammars
Next week: parsing algorithms
Grammar and Syntax
By grammar, or syntax, we mean implicit knowledge of a native speaker
  Acquired by around three years old, without explicit instruction
  It’s already inside our heads, we’re just trying to formally capture it
Not the kind of stuff you were later taught in school:
  Don’t split infinitives
  Don’t end sentences with prepositions
Syntax
Why should you care?
Syntactic analysis is a key component in many applications
  Grammar checkers
  Conversational agents
  Question answering
  Information extraction
  Machine translation
  …
Constituency
Basic idea: groups of words act as a single unit
Constituents form coherent classes that behave similarly
  With respect to their internal structure: e.g., at the core of a noun phrase is a noun
  With respect to other constituents: e.g., noun phrases generally occur before verbs
Constituency: Example
The following are all noun phrases in English...
Why?
  They can all precede verbs
  They can all be preposed
  …
Grammars and Constituency
For a particular language:
  What is the “right” set of constituents?
  What rules govern how they combine?
Answer: not obvious and difficult
  That’s why there are so many different theories of grammar and competing analyses of the same data!
Approach here:
  Very generic
  Focus primarily on the “machinery”
  Doesn’t correspond to any modern linguistic theory of grammar
Context-Free Grammars
Context-free grammars (CFGs)
  Aka phrase structure grammars
  Aka Backus-Naur form (BNF)
Consist of
  Rules
  Terminals
  Non-terminals
Context-Free Grammars
Terminals
  We’ll take these to be words (for now)
Non-terminals
  The constituents in a language (e.g., noun phrase)
Rules
  Consist of a single non-terminal on the left and any number of terminals and non-terminals on the right
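The three ingredients can be written down directly as data. A minimal sketch, assuming a toy rule set and vocabulary invented for illustration (not the grammar used in the lecture):

```python
# Toy CFG: every rule and word below is an illustrative assumption.
# Each rule has one non-terminal on the left and a sequence of
# terminals and/or non-terminals on the right.
RULES = [
    ("S", ("NP", "VP")),
    ("NP", ("Det", "Noun")),
    ("VP", ("Verb", "NP")),
    ("Det", ("the",)),
    ("Noun", ("flight",)),
    ("Noun", ("pilot",)),
    ("Verb", ("boards",)),
]

# Non-terminals are exactly the symbols that appear on some left-hand side.
NON_TERMINALS = {lhs for lhs, _ in RULES}

# Terminals are the right-hand-side symbols that never appear on the left.
TERMINALS = {sym for _, rhs in RULES for sym in rhs if sym not in NON_TERMINALS}

print(sorted(NON_TERMINALS))
print(sorted(TERMINALS))
```

Deriving the two symbol sets from the rules keeps them consistent with the rule list; storing them separately would work just as well.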
Some NP Rules
Here are some rules for our noun phrases
Rules 1 & 2 describe two kinds of NPs:
  One that consists of a determiner followed by a nominal
  Another that consists of proper names
Rule 3 illustrates two things:
  An explicit disjunction
  A recursive definition
L0 Grammar
CFG: Formal definition
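For reference, the standard textbook formulation treats a CFG as a 4-tuple. The notation below follows common convention and may differ from the symbols used on the original slide:

```latex
\[
G = (N, \Sigma, R, S)
\]
\[
\begin{aligned}
N &: \text{a finite set of non-terminal symbols} \\
\Sigma &: \text{a finite set of terminal symbols, disjoint from } N \\
R &: \text{a finite set of rules } A \rightarrow \beta,\ A \in N,\ \beta \in (\Sigma \cup N)^{*} \\
S &: \text{a designated start symbol, } S \in N
\end{aligned}
\]
```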
Three-fold View of CFGs
Generator
Acceptor
Parser
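The generator view is the easiest of the three to sketch: start from S and keep expanding non-terminals until only words remain. The toy grammar and the function name `generate` are assumptions made for illustration:

```python
import random

# Toy grammar (assumed): non-terminal -> list of possible right-hand sides.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "Noun"]],
    "VP": [["Verb"], ["Verb", "NP"]],
    "Det": [["the"], ["a"]],
    "Noun": [["flight"], ["pilot"]],
    "Verb": [["left"], ["saw"]],
}

def generate(symbol, rng):
    """Expand `symbol` top-down, picking a random rule at each non-terminal."""
    if symbol not in GRAMMAR:          # terminal: emit the word itself
        return [symbol]
    rhs = rng.choice(GRAMMAR[symbol])  # choose one expansion
    words = []
    for child in rhs:
        words.extend(generate(child, rng))
    return words

rng = random.Random(0)
for _ in range(3):
    print(" ".join(generate("S", rng)))
```

An acceptor answers yes or no for a given string, and a parser additionally returns the structure. Both need a search over rule applications rather than a random walk.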
Derivations and Parsing
A derivation is a sequence of rule applications that
  Covers all tokens in the input string
  Covers only the tokens in the input string
Parsing: given a string and a grammar, recover the derivation
  Derivation can be represented as a parse tree
  Multiple derivations?
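A derivation can be replayed step by step by rewriting one non-terminal at a time. A small sketch, assuming a hand-written rule sequence for the made-up sentence "the flight left":

```python
# Rule sequence (assumed for illustration) deriving "the flight left".
DERIVATION = [
    ("S", ["NP", "VP"]),
    ("NP", ["Det", "Noun"]),
    ("Det", ["the"]),
    ("Noun", ["flight"]),
    ("VP", ["Verb"]),
    ("Verb", ["left"]),
]

def apply_rule(form, lhs, rhs):
    """Rewrite the leftmost occurrence of `lhs` in the sentential form."""
    i = form.index(lhs)
    return form[:i] + rhs + form[i + 1:]

form = ["S"]
steps = [form]
for lhs, rhs in DERIVATION:
    form = apply_rule(form, lhs, rhs)
    steps.append(form)

for step in steps:
    print(" ".join(step))
```

The final line contains exactly the input tokens, which is the "covers all and only the tokens" condition. Parsing runs the other way: given only the last line and the grammar, find a rule sequence like `DERIVATION`.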
Parse Tree: Example
Note: equivalence between parse trees and bracket notation
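The equivalence is easy to see in code: a tree stored as nested tuples can be printed as brackets, and the words can be read back off its leaves. The example sentence and its analysis are assumptions for illustration:

```python
# A parse tree as nested tuples: (label, child, child, ...); leaves are plain strings.
TREE = ("S",
        ("NP", ("Det", "the"), ("Noun", "flight")),
        ("VP", ("Verb", "left")))

def to_brackets(node):
    """Render a tree in bracket notation."""
    if isinstance(node, str):
        return node
    label, *children = node
    return "[" + label + " " + " ".join(to_brackets(c) for c in children) + "]"

def leaves(node):
    """Read the words back off the tree, left to right."""
    if isinstance(node, str):
        return [node]
    return [w for child in node[1:] for w in leaves(child)]

print(to_brackets(TREE))   # [S [NP [Det the] [Noun flight]] [VP [Verb left]]]
print(" ".join(leaves(TREE)))
```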
Natural vs. Programming Languages
Wait, don’t we do this for programming languages?
What’s similar?
What’s different?
An English Grammar Fragment
Sentences
Noun phrases
  Issue: agreement
Verb phrases
  Issue: subcategorization
Sentence Types
Declaratives: A plane left.
  S → NP VP
Imperatives: Leave!
  S → VP
Yes-No Questions: Did the plane leave?
  S → Aux NP VP
WH Questions: When did the plane leave?
  S → WH-NP Aux NP VP
Noun Phrases
Let’s consider these rules in detail:
NPs are a bit more complex than that!
  Consider: “All the morning flights from Denver to Tampa leaving before 10”
A Complex Noun Phrase
“stuff that comes after”
“stuff that comes before”
“head” = central, most critical part of the NP
Determiners
Noun phrases can start with determiners...
Determiners can be
  Simple lexical items: the, this, a, an, etc. (e.g., “a car”)
  Or simple possessives (e.g., “John’s car”)
  Or complex recursive versions thereof (e.g., “John’s sister’s husband’s son’s car”)
Premodifiers
Come before the head
Examples:
  Cardinals, ordinals, etc. (e.g., “three cars”)
  Adjectives (e.g., “large car”)
Ordering constraints
  “three large cars” vs. “?large three cars”
Postmodifiers
Naturally, come after the head
Three kinds
  Prepositional phrases (e.g., “from Seattle”)
  Non-finite clauses (e.g., “arriving before noon”)
  Relative clauses (e.g., “that serve breakfast”)
Similar recursive rules to handle these
  Nominal → Nominal PP
  Nominal → Nominal GerundVP
  Nominal → Nominal RelClause
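Because the Nominal appears on both sides of each rule, every postmodifier wraps the structure built so far in one more Nominal. A rough sketch of that left-nesting, assuming a base case Nominal → Noun and treating each postmodifier as an unanalysed string:

```python
def attach_postmodifiers(head_noun, postmodifiers):
    """Apply Nominal -> Nominal X once per postmodifier, left to right."""
    tree = ("Nominal", ("Noun", head_noun))   # assumed base case: Nominal -> Noun
    for label, words in postmodifiers:
        tree = ("Nominal", tree, (label, words))
    return tree

def depth(node):
    """Count how many Nominal nodes are stacked on the left spine."""
    n = 0
    while isinstance(node, tuple) and node[0] == "Nominal":
        n += 1
        node = node[1]
    return n

tree = attach_postmodifiers(
    "flights",
    [("PP", "from Denver"), ("PP", "to Tampa"), ("GerundVP", "leaving before 10")],
)
print(depth(tree))   # one Nominal for the noun plus one per postmodifier
```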
A Complex Noun Phrase Revisited
Agreement
Agreement: constraints that hold among various constituents
Example: number agreement in English
  This flight        *This flights
  Those flights      *Those flight
  One flight         *One flights
  Two flights        *Two flight
Problem
Our NP rules don’t capture agreement constraints
  Accept grammatical examples (this flight)
  Also accept ungrammatical examples (*these flight)
Such rules overgenerate
  We’ll come back to this later
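Overgeneration shows up as soon as the rule is turned into an acceptor. In this sketch the lexicon and the single rule NP → Det Noun are assumptions; since the categories carry no number information, all four combinations are accepted:

```python
# Toy lexicon (assumed): every determiner is just "Det", every noun just "Noun",
# so the rule NP -> Det Noun cannot see number.
LEXICON = {
    "this": "Det", "these": "Det",
    "flight": "Noun", "flights": "Noun",
}

def accepts_np(words):
    """Accept any two-word string whose categories are Det followed by Noun."""
    tags = [LEXICON.get(w) for w in words]
    return tags == ["Det", "Noun"]

for phrase in ["this flight", "these flights", "this flights", "these flight"]:
    print(phrase, accepts_np(phrase.split()))
```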
Verb Phrases
English verb phrases consist of
  Head verb
  Zero or more following constituents (called arguments)
Sample rules:
Subcategorization
Not all verbs are allowed to participate in all VP rules
  We can subcategorize verbs according to argument patterns (sometimes called “frames”)
  Modern grammars may have 100s of such classes
This is a finer-grained articulation of traditional notions of transitivity
Subcategorization
Sneeze: John sneezed
Find: Please find [a flight to NY]NP
Give: Give [me]NP [a cheaper fare]NP
Help: Can you help [me]NP [with a flight]PP
Prefer: I prefer [to leave earlier]TO-VP
Told: I was told [United has a flight]S
…
Subcategorization
Subcategorization at work:
  *John sneezed the book
  *I prefer United has a flight
  *Give with a flight
But some verbs can participate in multiple frames:
  I ate
  I ate the apple
How do we formally encode these constraints?
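One simple way to picture such constraints is a lexicon that lists, for each verb, the argument frames it allows. The frames below are read off the examples in these slides; the dictionary layout and the name `licensed` are assumptions, and this is only a sketch of the idea, not a grammar formalism:

```python
# Verb -> list of allowed frames; each frame is a tuple of argument categories.
FRAMES = {
    "sneeze": [()],
    "find": [("NP",)],
    "give": [("NP", "NP")],
    "help": [("NP", "PP")],
    "prefer": [("TO-VP",)],
    "tell": [("S",)],
    "eat": [(), ("NP",)],     # a verb that participates in two frames
}

def licensed(verb, args):
    """True if the verb allows exactly this sequence of argument categories."""
    return tuple(args) in FRAMES.get(verb, [])

print(licensed("sneeze", []))        # John sneezed
print(licensed("sneeze", ["NP"]))    # *John sneezed the book
print(licensed("eat", ["NP"]))       # I ate the apple
```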
Why?
As presented, the various rules for VPs overgenerate:
  John sneezed [the book]NP
  Allowed by the second rule…
Possible CFG Solution
Encode agreement in non-terminals:
  SgS → SgNP SgVP
  PlS → PlNP PlVP
  SgNP → SgDet SgNom
  PlNP → PlDet PlNom
  PlVP → PlV NP
  SgVP → SgV NP
Can use the same trick for verb subcategorization
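A small sketch of the trick for noun phrases: once number is baked into the category names, mismatched pairs simply have no rule. The lexicon entries are assumptions; the two NP rules are the ones listed on this slide:

```python
# Toy lexicon (assumed) using number-specific categories.
LEXICON = {
    "this": "SgDet", "these": "PlDet",
    "flight": "SgNom", "flights": "PlNom",
}

# SgNP -> SgDet SgNom and PlNP -> PlDet PlNom, indexed by right-hand side.
NP_RULES = {
    ("SgDet", "SgNom"): "SgNP",
    ("PlDet", "PlNom"): "PlNP",
}

def np_category(words):
    """Return the NP category for a Det + Nom string, or None if no rule applies."""
    tags = tuple(LEXICON.get(w) for w in words)
    return NP_RULES.get(tags)

for phrase in ["this flight", "these flights", "this flights", "these flight"]:
    print(phrase, np_category(phrase.split()))
```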
Possible CFG Solution
Critique?
  It works…
  But it’s ugly…
  And it doesn’t scale (explosion of rules)
Alternatives?
  Multi-pass solutions
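The explosion is simple multiplication: each rule is copied once per combination of feature values. A back-of-the-envelope sketch, assuming a hypothetical feature inventory of number and person:

```python
from itertools import product

# Hypothetical feature inventory; the values are assumptions for illustration.
FEATURES = {
    "number": ["Sg", "Pl"],
    "person": ["1", "2", "3"],
}

def specialise(lhs, rhs, features):
    """Copy one rule once per combination of feature values,
    prefixing every symbol with that combination."""
    rules = []
    for combo in product(*features.values()):
        prefix = "".join(combo)
        rules.append((prefix + lhs, tuple(prefix + sym for sym in rhs)))
    return rules

rules = specialise("S", ("NP", "VP"), FEATURES)
print(len(rules))   # 2 * 3 = 6 copies of a single rule
print(rules[0])
```

Every additional feature multiplies the count again, and the same happens to every rule that has to agree, which is why the approach does not scale.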
Three-fold View of CFGs
Generator
Acceptor
Parser
The Point
CFGs have just about the right amount of machinery to account for basic syntactic structure in English
  Lots of issues though...
Good enough for many applications!
  But there are many alternatives out there…
Treebanks
Treebanks are corpora in which each sentence has been paired with a parse tree
  Hopefully the right one!
These are generally created:
  By first parsing the collection with an automatic parser
  And then having human annotators correct each parse as necessary
But…
  Detailed annotation guidelines are needed
  Explicit instructions for dealing with particular constructions
Penn Treebank
The Penn Treebank is a widely used treebank
  1 million words from the Wall Street Journal
Treebanks implicitly define a grammar for the language
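"Implicitly define a grammar" can be made concrete: every internal node of every tree contributes one rule, so reading the rules off the trees yields a grammar with counts. A minimal sketch, assuming Penn-style bracketing and two made-up trees standing in for a treebank; the function names are hypothetical:

```python
from collections import Counter

def parse_brackets(text):
    """Parse a Penn-style bracketed string into nested lists: [label, child, ...]."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        assert tokens[pos] == "("
        pos += 1
        node = [tokens[pos]]              # constituent label
        pos += 1
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                node.append(read())       # nested constituent
            else:
                node.append(tokens[pos])  # word
                pos += 1
        pos += 1                          # consume ")"
        return node

    return read()

def count_rules(node, counts):
    """Count one rule per internal node: label -> labels (or words) of its children."""
    rhs = tuple(c[0] if isinstance(c, list) else c for c in node[1:])
    counts[(node[0], rhs)] += 1
    for child in node[1:]:
        if isinstance(child, list):
            count_rules(child, counts)
    return counts

# Two made-up trees standing in for a treebank.
TREEBANK = [
    "(S (NP (DT the) (NN flight)) (VP (VBD left)))",
    "(S (NP (DT a) (NN plane)) (VP (VBD left)))",
]

counts = Counter()
for line in TREEBANK:
    count_rules(parse_brackets(line), counts)

for (lhs, rhs), n in counts.most_common():
    print(lhs, "->", " ".join(rhs), n)
```

The counts are also the raw material for estimating rule probabilities when training a statistical parser.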
Penn Treebank: Example
Treebank Grammars
Such grammars tend to be very flat
  Recursion avoided to ease annotators’ burden
Penn Treebank has 4500 different rules for VPs, including…
  VP → VBD PP
  VP → VBD PP PP
  VP → VBD PP PP PP
  VP → VBD PP PP PP PP
Why treebanks?
Treebanks are critical to training statistical parsers
Also valuable to linguists when investigating phenomena
Dependency Grammars
CFGs focus on constituents
  Non-terminals don’t actually appear in the sentence
  So what if you got rid of them?
In dependency grammar, a parse is a graph where:
  Nodes represent words
  Edges represent dependency relations between words (typed or untyped, directed or undirected)
Dependency Relations
Example Dependency Parse
They hid the letter on the shelf
Compare with constituent parse… What’s the relation?
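A dependency parse of this sentence can be stored as a plain list of typed, directed edges between words. The relation labels below, and the choice to attach "on" to "hid" rather than to "letter", are assumptions made for illustration and may not match the analysis shown on the slide:

```python
WORDS = ["They", "hid", "the", "letter", "on", "the", "shelf"]

# (head index, dependent index, relation); indices point into WORDS.
EDGES = [
    (1, 0, "subj"),   # hid -> They
    (1, 3, "obj"),    # hid -> letter
    (3, 2, "det"),    # letter -> the
    (1, 4, "mod"),    # hid -> on (assumed attachment)
    (4, 6, "pobj"),   # on -> shelf
    (6, 5, "det"),    # shelf -> the
]

def heads(n_words, edges):
    """Map each dependent to its head; a word with no head is a root."""
    head_of = {dep: head for head, dep, _ in edges}
    roots = [i for i in range(n_words) if i not in head_of]
    return head_of, roots

head_of, roots = heads(len(WORDS), EDGES)
print("root:", [WORDS[i] for i in roots])
for head, dep, rel in EDGES:
    print(f"{WORDS[head]} -{rel}-> {WORDS[dep]}")
```

There are no non-terminal nodes at all: a sentence of n words gets n - 1 edges and one root. One way to relate this to a constituent parse is to take each word together with everything it dominates as a candidate constituent.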
Summary
CFGs can be used to capture various facts about the structure of language
  Agreement and subcategorization cause problems…
  And there are alternative formalisms
Treebanks as an important resource for NLP
Next week: parsing