TRANSCRIPT
Dependency Grammars and Parsing
CMSC 473/673
UMBC
Outline
Review: PCFGs and CKY
Dependency Grammars
Dependency Parsing (Shift-reduce)
From syntax to semantics
Probabilistic Context Free Grammar
Set of weighted (probabilistic) rewrite rules, comprised of terminals and non-terminals
Terminals: the words in the language (the lexicon), e.g., Baltimore
Non-terminals: symbols that can trigger rewrite rules, e.g., S, NP, Noun
(Sometimes) Pre-terminals: symbols that can only trigger lexical rewrites, e.g., Noun
1.0 S → NP VP
.4 NP → Det Noun
.3 NP → Noun
.2 NP → Det AdjP
.1 NP → NP PP
1.0 PP → P NP
.34 AdjP → Adj Noun
.26 VP → V NP
.0003 Noun → Baltimore
…
Q: What are the distributions? What must sum to 1?
A: p(X → Y Z | X): for each non-terminal X, the probabilities of all rules rewriting X must sum to 1
Probabilistic Context Free Grammar
p( S → NP VP, NP → Noun, Noun → Baltimore, VP → Verb NP, Verb → is, NP → a great city ) =
p(S → NP VP) * p(NP → Noun) * p(Noun → Baltimore) * p(VP → Verb NP) * p(Verb → is) * p(NP → a great city)
The product of the probabilities of the individual rules used in the derivation.
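As a quick sketch, the product above can be computed directly from a rule-probability table. The probabilities for `Verb → is` and the `a great city` subtree are made-up placeholders (the slides elide them); the others come from the example grammar.

```python
# Rule probabilities for the example derivation of
# "Baltimore is a great city".
rule_probs = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("Noun",)): 0.3,
    ("Noun", ("Baltimore",)): 0.0003,
    ("VP", ("Verb", "NP")): 0.26,
    ("Verb", ("is",)): 0.01,                # assumed lexical probability
    ("NP", ("a", "great", "city")): 0.001,  # assumed; stands in for the NP subtree
}

def derivation_prob(rules):
    """Product of the probabilities of the rules used in a derivation."""
    p = 1.0
    for rule in rules:
        p *= rule_probs[rule]
    return p

print(derivation_prob(list(rule_probs)))  # all six rules, used once each
```

In practice this product is accumulated in log space to avoid underflow on long derivations.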
“Papa ate the caviar with a spoon”

Entire grammar (assume uniform weights):
S → NP VP
NP → Det N
NP → NP PP
VP → V NP
VP → VP PP
PP → P NP
NP → Papa
N → caviar
N → spoon
V → spoon
V → ate
P → with
Det → the
Det → a

Word positions: 0 1 2 3 4 5 6 7
Example from Jason Eisner
First: Let’s find all NPs
(NP, 0, 1): Papa
(NP, 2, 4): the caviar
(NP, 5, 7): a spoon
(NP, 2, 7): the caviar with a spoon
Second: Let’s find all VPs
(VP, 1, 7): ate the caviar with a spoon
(VP, 1, 4): ate the caviar
Third: Let’s find all Ss
(S, 0, 7): Papa ate the caviar with a spoon
(S, 0, 4): Papa ate the caviar
[Chart figure: an N×N table with span starts 0–6 as rows and span ends 1–7 as columns; cells (NP, 0, 1), (VP, 1, 7), and (S, 0, 7) are filled.]
CKY Recognizer
Input: a string of N words; a grammar in CNF
Output: True (with parse) / False
Data structure: N*N table T
Rows indicate span start (0 to N-1)
Columns indicate span end (1 to N)
T[i][j] lists constituents spanning i → j
For Viterbi in HMMs: build the table left-to-right
For CKY on trees: (1) build smallest-to-largest and (2) left-to-right
CKY Recognizer
T = Cell[N][N+1]
for (j = 1; j ≤ N; ++j) {
    T[j-1][j].add(X for non-terminal X in G if X → word_j)
}
for (width = 2; width ≤ N; ++width) {
    for (start = 0; start ≤ N - width; ++start) {
        end = start + width
        for (mid = start + 1; mid < end; ++mid) {
            for (non-terminal Y : T[start][mid]) {
                for (non-terminal Z : T[mid][end]) {
                    T[start][end].add(X for rule X → Y Z : G)
                }
            }
        }
    }
}
[Diagram: a binary rule X → Y Z combining the two sub-spans]
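The recognizer above can be sketched as runnable Python. The encoding of the grammar as two dicts (lexical rules and binary rules) is a choice of this sketch, not from the slides.

```python
from collections import defaultdict

def cky_recognize(words, lexical, binary, start="S"):
    """Return True iff `words` is derivable from `start`.
    lexical: dict word -> set of non-terminals X with rule X -> word
    binary:  dict (Y, Z) -> set of non-terminals X with rule X -> Y Z
    """
    n = len(words)
    T = defaultdict(set)  # T[(i, j)] = non-terminals spanning words[i:j]
    for j, w in enumerate(words, start=1):
        T[(j - 1, j)] |= lexical.get(w, set())
    for width in range(2, n + 1):          # smallest spans to largest
        for i in range(0, n - width + 1):  # left to right
            j = i + width
            for mid in range(i + 1, j):
                for Y in T[(i, mid)]:
                    for Z in T[(mid, j)]:
                        T[(i, j)] |= binary.get((Y, Z), set())
    return start in T[(0, n)]

# Eisner's grammar from the slides (unweighted, recognition only)
lexical = {"Papa": {"NP"}, "caviar": {"N"}, "spoon": {"N", "V"},
           "ate": {"V"}, "with": {"P"}, "the": {"Det"}, "a": {"Det"}}
binary = {("NP", "VP"): {"S"}, ("Det", "N"): {"NP"}, ("NP", "PP"): {"NP"},
          ("V", "NP"): {"VP"}, ("VP", "PP"): {"VP"}, ("P", "NP"): {"PP"}}
print(cky_recognize("Papa ate the caviar with a spoon".split(), lexical, binary))
```

Extending each cell to store back-pointers and max rule probabilities turns this recognizer into the weighted (Viterbi) parser.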
CKY is Versatile: PCFG Tasks

Task | PCFG algorithm name | HMM analog
Find any parse | CKY recognizer | none
Find the most likely parse (for an observed sequence) | CKY weighted (Viterbi) | Viterbi
Calculate the (log) likelihood of an observed sequence w1, …, wN | Inside algorithm | Forward algorithm
Learn the grammar parameters | Inside-outside algorithm (EM) | Forward-backward/Baum-Welch (EM)
Outline
Review: PCFGs and CKY
Dependency Grammars
Dependency Parsing (Shift-reduce)
From syntax to semantics
Structure vs. Word Relations
Constituency trees/analyses: based on structure
Dependency analyses: based on word relations
Remember: (P)CFGs Help Clearly Show Ambiguity
I ate the meal with friends
[Two parse trees: one attaches the PP “with friends” under the VP, the other under the NP “the meal”]
CFGs to Dependencies
I ate the meal with friends
[The same two parse trees, with heads marked to derive dependency arcs]
CFGs to Labeled Dependencies
I ate the meal with friends
[The same trees with labeled dependency arcs: nsubj, dobj, nmod]
Labeled Dependencies
Word-to-word labeled relations
governor (head) → dependent
Example: ate -nsubj-> Chris
de Marneffe et al., 2014
http://universaldependencies.org/
(Labeled) Dependency Parse
Directed graphs
Vertices: linguistic blobs in a sentence
Edges: (labeled) arcs
Often directed trees
1. A single root node with no incoming arcs
2. Each vertex except root has exactly one incoming arc
3. Unique path from the root node to each vertex
Projective Dependency Trees
No crossing arcs
SLP3: Figs 14.2, 14.3
✔ Projective
✖ Not projective
Non-projective parses capture:
• certain long-range dependencies
• free word order
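The no-crossing-arcs condition can be sketched as a simple check. The head-array encoding (head[i] is the index of word i's head, 0 for the root) and the O(n²) pairwise test are choices of this sketch.

```python
def is_projective(head):
    """head: 1-indexed list; head[i] is the head of word i (head[0] unused).
    A tree is projective iff no two arcs cross."""
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(head[1:], start=1)]
    for (a, b) in arcs:
        for (c, e) in arcs:
            if a < c < b < e:  # arcs (a, b) and (c, e) cross
                return False
    return True

# "Papa ate the caviar": Papa<-ate, ate<-root, the<-caviar, caviar<-ate
print(is_projective([0, 2, 0, 4, 2]))  # True
```

Transition-based parsers like the arc-standard system below can only produce projective trees; non-projective parsing needs graph-based methods or extended transition sets.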
Are CFGs for Naught?
Nope! Simple algorithm from Xia and Palmer (2011)
1. Mark the head child of each node in a phrase structure, using “appropriate” head rules.
2. In the dependency structure, make the head of each non-head child depend on the head of the head-child.
Papa ate the caviar with a spoon
NP   V   D   N    P   D  N
[Parse-tree diagram (NP, NP, PP, VP, VP, S) with head children marked; the resulting dependency arcs: ate → Papa, ate → caviar, caviar → the, spoon → a, and the “with a spoon” phrase attached to ate]
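The two-step conversion can be sketched as follows. The `HEAD_RULES` table (which child of each phrase is the head) is a simplified stand-in for real head rules, adequate only for this toy grammar.

```python
# Simplified head rules: the head child of each phrase label (assumed).
HEAD_RULES = {"S": "VP", "VP": "V", "NP": "N", "PP": "P"}

def lexical_head(tree):
    """tree = (label, children); children is a word (pre-terminal) or a list."""
    label, children = tree
    if isinstance(children, str):  # pre-terminal: its word is the head
        return children
    head_label = HEAD_RULES.get(label)
    head_child = next((c for c in children if c[0] == head_label), children[0])
    return lexical_head(head_child)

def to_dependencies(tree, deps=None):
    """Step 2: make the head of each non-head child depend on the head
    of the head child."""
    if deps is None:
        deps = []
    label, children = tree
    if isinstance(children, str):
        return deps
    h = lexical_head(tree)
    for child in children:
        ch = lexical_head(child)
        if ch != h:
            deps.append((h, ch))  # (governor, dependent)
        to_dependencies(child, deps)
    return deps

t = ("S", [("NP", "Papa"),
           ("VP", [("V", "ate"),
                   ("NP", [("Det", "the"), ("N", "caviar")])])])
print(to_dependencies(t))  # [('ate', 'Papa'), ('ate', 'caviar'), ('caviar', 'the')]
```

Real conversions (e.g., for the Penn Treebank) use much longer, category-specific head-rule lists, but the percolation logic is the same.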
Dependency Post-Processing (Keep Tree Structure)
Dependency Post-Processing (Get Possible Graph Structure)
Amaranthus, collectively known as amaranth, is a cosmopolitan genus of annual or short-lived perennial plants.
Outline
Review: PCFGs and CKY
Dependency Grammars
Dependency Parsing (Shift-reduce)
From syntax to semantics
(Some) Dependency Parsing Algorithms
Dynamic Programming
Eisner Algorithm (Eisner 1996)
Transition-based
Shift-reduce, arc standard
Graph-based
Maximum spanning tree
(Some) Dependency Parsing Algorithms
Dynamic Programming
Eisner Algorithm (Eisner 1996)
Transition-based
Shift-reduce, arc standard
Graph-based
Maximum spanning tree
Shift-Reduce Parsing
Recall from CMSC 331
Bottom-up
Tools: input words, some special root symbol ($), and a stack to hold configurations
Shift: move tokens onto the stack; match the top two elements of the stack against the grammar (RHS)
Reduce: if a match occurred, place the LHS symbol on the stack
Shift-Reduce Dependency Parsing
Tools: input words, some special root symbol ($), and a stack to hold configurations
Shift: move tokens onto the stack; decide if the top two elements of the stack form a valid (good) grammatical dependency
Reduce: if there’s a valid relation, place the head on the stack
Decide how? Search problem!
What is valid? Learn it!
What are the possible actions?
Shift-Reduce Actions (“Arc standard”)

Possibility: Assign the current word as the head of some previously seen word
Action: LEFTARC: assert a head-dependent relation between the word at the top of the stack and the word directly beneath it; remove the lower word from the stack

Possibility: Assign some previously seen word as the head of the current word
Action: RIGHTARC: assert a head-dependent relation between the second word on the stack and the word at the top; remove the word at the top of the stack

Possibility: Wait processing the current word; add it for later
Action: SHIFT: remove the word from the front of the input buffer and push it onto the stack
Arc Standard Parsing
state ← {[root], [words], []}
while state ≠ {[root], [], [(deps)]} {
    t ← ORACLE(state)
    state ← APPLY(t, state)
}
return state
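The loop above can be sketched in Python. The state is a (stack, buffer, deps) triple; the scripted oracle in the usage example simply replays a fixed action sequence, standing in for ORACLE or a trained predictor.

```python
def apply_transition(t, state):
    """APPLY: execute one arc-standard transition on (stack, buffer, deps)."""
    stack, buffer, deps = state
    if t == "SHIFT":
        stack = stack + [buffer[0]]
        buffer = buffer[1:]
    elif t == "LEFTARC":   # top of stack heads the word beneath it
        deps = deps + [(stack[-1], stack[-2])]
        stack = stack[:-2] + [stack[-1]]
    elif t == "RIGHTARC":  # second word on stack heads the top word
        deps = deps + [(stack[-2], stack[-1])]
        stack = stack[:-1]
    return (stack, buffer, deps)

def parse(words, oracle):
    state = (["$"], list(words), [])
    # final state: stack holds only the root and the buffer is empty
    while not (len(state[0]) == 1 and not state[1]):
        t = oracle(state)
        state = apply_transition(t, state)
    return state[2]

script = iter(["SHIFT", "SHIFT", "LEFTARC", "SHIFT", "SHIFT",
               "LEFTARC", "RIGHTARC", "RIGHTARC"])
print(parse("Papa ate the caviar".split(), lambda s: next(script)))
```

Each word is shifted once and removed by exactly one arc action, which is where the linear running time discussed below comes from.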
Arc Standard Parsing (with a learned predictor)
state ← {[root], [words], []}
while state ≠ {[root], [], [(deps)]} {
    t ← PREDICT(state)
    state ← APPLY(t, state)
}
return state
What is valid? Learn it!
Learning An Oracle (Predictor)
Training data: dependency treebank
Input: configuration
Output: {LEFTARC, RIGHTARC, SHIFT}
t ← ORACLE(state)
• Choose LEFTARC if it produces a correct head-dependent relation given the reference parse and the current configuration
• Choose RIGHTARC if it produces a correct head-dependent relation given the reference parse and all of the dependents of the word at the top of the stack have already been assigned
• Otherwise, choose SHIFT
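The three rules can be sketched as a training-time oracle. The encoding of the reference parse as a dict (gold[d] is the head of word d, with word indices on the stack and 0 for the root) is an assumption of this sketch.

```python
def oracle(stack, buffer, gold, assigned):
    """Choose the next transition given the reference heads.
    gold: dict dependent-index -> gold head-index (assumed encoding)
    assigned: set of (head, dependent) arcs already built."""
    if len(stack) >= 2:
        s1, s2 = stack[-1], stack[-2]  # top and second-from-top
        if gold.get(s2) == s1:         # top heads the word beneath it
            return "LEFTARC"
        if gold.get(s1) == s2 and all(
                gold.get(d) != s1 or (s1, d) in assigned for d in gold):
            return "RIGHTARC"          # only once s1's dependents are all attached
    return "SHIFT"

# "Papa ate the caviar", words 1..4: gold heads Papa<-ate, ate<-root,
# the<-caviar, caviar<-ate
gold = {1: 2, 2: 0, 3: 4, 4: 2}
print(oracle([0, 1, 2], [3, 4], gold, set()))  # LEFTARC
```

The RIGHTARC guard matters: attaching a word to its head too early would remove it from the stack before its own dependents could be attached.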
Training the Predictor
Predict action t given configuration s: t = φ(s)
Extract features of the configuration
Examples: word forms, lemmas, POS, morphological features
How? Perceptron, Maxent, Support Vector Machines, Multilayer Perceptrons, Neural Networks
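A minimal sketch of the feature extraction inside φ, using word forms and POS tags from the top of the stack and front of the buffer. The feature names and the `pos` lookup dict are illustrative, not from the slides; lemmas and morphological features would be added the same way.

```python
def extract_features(stack, buffer, pos):
    """Features of a configuration for predicting the next transition."""
    feats = {}
    if stack:
        feats["s1.word"] = stack[-1]
        feats["s1.pos"] = pos.get(stack[-1], "ROOT")
    if len(stack) >= 2:
        feats["s2.word"] = stack[-2]
        feats["s2.pos"] = pos.get(stack[-2], "ROOT")
    if buffer:
        feats["b1.word"] = buffer[0]
        feats["b1.pos"] = pos.get(buffer[0], "NONE")
    return feats

print(extract_features(["$", "ate"], ["the", "caviar"],
                       {"ate": "V", "the": "Det", "caviar": "N"}))
```

A linear classifier (perceptron, maxent, SVM) scores each action from such sparse features; neural parsers instead embed the same stack/buffer positions as dense vectors.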
Papa ate the caviar
If time permits, work through on board
Parse steps for “Papa ate the caviar”
Deps | Stack | Word Buffer | Action
--- | $ | Papa ate the caviar | SHIFT
--- | Papa $ | ate the caviar | SHIFT
--- | ate Papa $ | the caviar | LEFTARC
ate->Papa | ate $ | the caviar | SHIFT
ate->Papa | the ate $ | caviar | SHIFT
ate->Papa | caviar the ate $ | --- | LEFTARC
ate->Papa, caviar->the | caviar ate $ | --- | RIGHTARC
ate->Papa, caviar->the, ate->caviar | ate $ | --- | RIGHTARC
ate->Papa, caviar->the, ate->caviar, $->ate | $ | --- | (done)
Arc Standard Parsing
state ← {[root], [words], []}
while state ≠ {[root], [], [(deps)]} {
    t ← ORACLE(state)
    state ← APPLY(t, state)
}
return state
Q: What is the time complexity?
A: Linear
Q: What’s potentially problematic?
A: This is a greedy algorithm
Becoming Less Greedy
Beam search
Breadth-first search strategy (CMSC 471/671)
At each stage, keep K options open
Evaluation
Exact Match (per-sentence accuracy)
Unlabeled Attachment Score (UAS)
Labeled Attachment Score (LS, LAS)
Recall/Precision/F1 for particular relation types
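The attachment scores can be sketched directly: per word, UAS checks only the predicted head, LAS checks head and label together. The (head, label) pair encoding is this sketch's assumption.

```python
def attachment_scores(gold, pred):
    """gold/pred: per-word lists of (head, label). Returns (UAS, LAS)."""
    n = len(gold)
    uas = sum(g[0] == p[0] for g, p in zip(gold, pred)) / n  # heads only
    las = sum(g == p for g, p in zip(gold, pred)) / n        # heads + labels
    return uas, las

gold = [(2, "nsubj"), (0, "root"), (4, "det"), (2, "dobj")]
pred = [(2, "nsubj"), (0, "root"), (4, "det"), (2, "nmod")]
print(attachment_scores(gold, pred))  # (1.0, 0.75)
```

Exact match is stricter still: a sentence counts only if every word's arc is correct, so it is always at or below LAS.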
Outline
Review: PCFGs and CKY
Dependency Grammars
Dependency Parsing (Shift-reduce)
From syntax to semantics
Parsing as a Core NLP Problem
[Pipeline diagram: sentences 1–4 → Parser (using a Grammar) → parses, compared against gold (correct) reference trees for an evaluation score, or passed to other NLP tasks (entity coref., MT, Q&A, …); the per-sentence operations are independent]
From Dependencies to Shallow Semantics
From Syntax to Shallow Semantics
Angeli et al. (2015)
“Open Information Extraction”
From Syntax to Shallow Semantics
http://corenlp.run/ (constituency & dependency)
https://github.com/hltcoe/predpatt
http://openie.allenai.org/
http://www.cs.rochester.edu/research/knext/browse/ (constituency trees)
http://rtw.ml.cmu.edu/rtw/
a sampling of efforts