TRANSCRIPT
Dependency Grammars and Parsing
CMSC 473/673
UMBC
Outline
Review: PCFGs and CKY
Dependency Grammars
Dependency Parsing (Shift-reduce)
From syntax to semantics
Probabilistic Context Free Grammar
Set of weighted (probabilistic) rewrite rules, comprised of terminals and non-terminals
Terminals: the words in the language (the lexicon), e.g., Baltimore
Non-terminals: symbols that can trigger rewrite rules, e.g., S, NP, Noun
(Sometimes) Pre-terminals: symbols that can only trigger lexical rewrites, e.g., Noun
1.0 S → NP VP
.4 NP → Det Noun
.3 NP → Noun
.2 NP → Det AdjP
.1 NP → NP PP
1.0 PP → P NP
.34 AdjP → Adj Noun
.26 VP → V NP
.0003 Noun → Baltimore
…
Q: What are the distributions? What must sum to 1?
A: p(X → Y Z | X): for each non-terminal X, the probabilities of all rules rewriting X must sum to 1
Probabilistic Context Free Grammar
p( S → NP VP, NP → Noun, Noun → Baltimore, VP → Verb NP, Verb → is, NP → a great city ) =
p(S → NP VP) * p(NP → Noun) * p(Noun → Baltimore) * p(VP → Verb NP) * p(Verb → is) * p(NP → a great city)
The product of the probabilities of the individual rules used in the derivation.
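As a quick sketch, the product above can be computed directly from a rule-probability table. The probabilities for `Verb → is` and the `a great city` subtree are made-up placeholders (the slides elide them); the others come from the example grammar.

```python
# Rule probabilities for the example derivation of
# "Baltimore is a great city".
rule_probs = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("Noun",)): 0.3,
    ("Noun", ("Baltimore",)): 0.0003,
    ("VP", ("Verb", "NP")): 0.26,
    ("Verb", ("is",)): 0.01,                # assumed lexical probability
    ("NP", ("a", "great", "city")): 0.001,  # assumed; stands in for the NP subtree
}

def derivation_prob(rules):
    """Product of the probabilities of the rules used in a derivation."""
    p = 1.0
    for rule in rules:
        p *= rule_probs[rule]
    return p

print(derivation_prob(list(rule_probs)))  # all six rules, used once each
```

In practice this product is accumulated in log space to avoid underflow on long derivations.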
“Papa ate the caviar with a spoon”

Entire grammar (assume uniform weights):
S → NP VP
NP → Det N
NP → NP PP
VP → V NP
VP → VP PP
PP → P NP
NP → Papa
N → caviar
N → spoon
V → spoon
V → ate
P → with
Det → the
Det → a

Word positions: 0 1 2 3 4 5 6 7
Example from Jason Eisner
First: Let’s find all NPs
(NP, 0, 1): Papa
(NP, 2, 4): the caviar
(NP, 5, 7): a spoon
(NP, 2, 7): the caviar with a spoon
Second: Let’s find all VPs
(VP, 1, 7): ate the caviar with a spoon
(VP, 1, 4): ate the caviar
Third: Let’s find all Ss
(S, 0, 7): Papa ate the caviar with a spoon
(S, 0, 4): Papa ate the caviar
[Chart figure: an N×N table with span starts 0–6 as rows and span ends 1–7 as columns; cells (NP, 0, 1), (VP, 1, 7), and (S, 0, 7) are filled.]
CKY Recognizer
Input: a string of N words; a grammar in CNF
Output: True (with parse) / False
Data structure: N*N table T
Rows indicate span start (0 to N-1)
Columns indicate span end (1 to N)
T[i][j] lists constituents spanning i → j
For Viterbi in HMMs: build the table left-to-right
For CKY on trees: (1) build smallest-to-largest and (2) left-to-right
CKY Recognizer
T = Cell[N][N+1]
for (j = 1; j ≤ N; ++j) {
    T[j-1][j].add(X for non-terminal X in G if X → word_j)
}
for (width = 2; width ≤ N; ++width) {
    for (start = 0; start ≤ N - width; ++start) {
        end = start + width
        for (mid = start + 1; mid < end; ++mid) {
            for (non-terminal Y : T[start][mid]) {
                for (non-terminal Z : T[mid][end]) {
                    T[start][end].add(X for rule X → Y Z : G)
                }
            }
        }
    }
}
[Diagram: a binary rule X → Y Z combining the two sub-spans]
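The recognizer above can be sketched as runnable Python. The encoding of the grammar as two dicts (lexical rules and binary rules) is a choice of this sketch, not from the slides.

```python
from collections import defaultdict

def cky_recognize(words, lexical, binary, start="S"):
    """Return True iff `words` is derivable from `start`.
    lexical: dict word -> set of non-terminals X with rule X -> word
    binary:  dict (Y, Z) -> set of non-terminals X with rule X -> Y Z
    """
    n = len(words)
    T = defaultdict(set)  # T[(i, j)] = non-terminals spanning words[i:j]
    for j, w in enumerate(words, start=1):
        T[(j - 1, j)] |= lexical.get(w, set())
    for width in range(2, n + 1):          # smallest spans to largest
        for i in range(0, n - width + 1):  # left to right
            j = i + width
            for mid in range(i + 1, j):
                for Y in T[(i, mid)]:
                    for Z in T[(mid, j)]:
                        T[(i, j)] |= binary.get((Y, Z), set())
    return start in T[(0, n)]

# Eisner's grammar from the slides (unweighted, recognition only)
lexical = {"Papa": {"NP"}, "caviar": {"N"}, "spoon": {"N", "V"},
           "ate": {"V"}, "with": {"P"}, "the": {"Det"}, "a": {"Det"}}
binary = {("NP", "VP"): {"S"}, ("Det", "N"): {"NP"}, ("NP", "PP"): {"NP"},
          ("V", "NP"): {"VP"}, ("VP", "PP"): {"VP"}, ("P", "NP"): {"PP"}}
print(cky_recognize("Papa ate the caviar with a spoon".split(), lexical, binary))
```

Extending each cell to store back-pointers and max rule probabilities turns this recognizer into the weighted (Viterbi) parser.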
CKY is Versatile: PCFG Tasks

Task | PCFG algorithm name | HMM analog
Find any parse | CKY recognizer | none
Find the most likely parse (for an observed sequence) | CKY weighted (Viterbi) | Viterbi
Calculate the (log) likelihood of an observed sequence w1, …, wN | Inside algorithm | Forward algorithm
Learn the grammar parameters | Inside-outside algorithm (EM) | Forward-backward/Baum-Welch (EM)
Outline
Review: PCFGs and CKY
Dependency Grammars
Dependency Parsing (Shift-reduce)
From syntax to semantics
Structure vs. Word Relations
Constituency trees/analyses: based on structure
Dependency analyses: based on word relations
Remember: (P)CFGs Help Clearly Show Ambiguity
I ate the meal with friends
[Two parse trees: one attaches the PP “with friends” under the VP, the other under the NP “the meal”]
CFGs to Dependencies
I ate the meal with friends
[The same two parse trees, with heads marked to derive dependency arcs]
CFGs to Labeled Dependencies
I ate the meal with friends
[The same trees with labeled dependency arcs: nsubj, dobj, nmod]
Labeled Dependencies
Word-to-word labeled relations
governor (head) → dependent
Example: ate -nsubj-> Chris
de Marneffe et al., 2014
http://universaldependencies.org/
(Labeled) Dependency Parse
Directed graphs
Vertices: linguistic blobs in a sentence
Edges: (labeled) arcs
Often directed trees
1. A single root node with no incoming arcs
2. Each vertex except root has exactly one incoming arc
3. Unique path from the root node to each vertex
Projective Dependency Trees
No crossing arcs
SLP3: Figs 14.2, 14.3
✔ Projective
✖ Not projective
Non-projective parses capture:
• certain long-range dependencies
• free word order
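The no-crossing-arcs condition can be sketched as a simple check. The head-array encoding (head[i] is the index of word i's head, 0 for the root) and the O(n²) pairwise test are choices of this sketch.

```python
def is_projective(head):
    """head: 1-indexed list; head[i] is the head of word i (head[0] unused).
    A tree is projective iff no two arcs cross."""
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(head[1:], start=1)]
    for (a, b) in arcs:
        for (c, e) in arcs:
            if a < c < b < e:  # arcs (a, b) and (c, e) cross
                return False
    return True

# "Papa ate the caviar": Papa<-ate, ate<-root, the<-caviar, caviar<-ate
print(is_projective([0, 2, 0, 4, 2]))  # True
```

Transition-based parsers like the arc-standard system below can only produce projective trees; non-projective parsing needs graph-based methods or extended transition sets.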
Are CFGs for Naught?
Nope! Simple algorithm from Xia and Palmer (2011)
1. Mark the head child of each node in a phrase structure, using “appropriate” head rules.
2. In the dependency structure, make the head of each non-head child depend on the head of the head-child.
Papa ate the caviar with a spoon
NP   V   D   N    P   D  N
[Parse-tree diagram (NP, NP, PP, VP, VP, S) with head children marked; the resulting dependency arcs: ate → Papa, ate → caviar, caviar → the, spoon → a, and the “with a spoon” phrase attached to ate]
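The two-step conversion can be sketched as follows. The `HEAD_RULES` table (which child of each phrase is the head) is a simplified stand-in for real head rules, adequate only for this toy grammar.

```python
# Simplified head rules: the head child of each phrase label (assumed).
HEAD_RULES = {"S": "VP", "VP": "V", "NP": "N", "PP": "P"}

def lexical_head(tree):
    """tree = (label, children); children is a word (pre-terminal) or a list."""
    label, children = tree
    if isinstance(children, str):  # pre-terminal: its word is the head
        return children
    head_label = HEAD_RULES.get(label)
    head_child = next((c for c in children if c[0] == head_label), children[0])
    return lexical_head(head_child)

def to_dependencies(tree, deps=None):
    """Step 2: make the head of each non-head child depend on the head
    of the head child."""
    if deps is None:
        deps = []
    label, children = tree
    if isinstance(children, str):
        return deps
    h = lexical_head(tree)
    for child in children:
        ch = lexical_head(child)
        if ch != h:
            deps.append((h, ch))  # (governor, dependent)
        to_dependencies(child, deps)
    return deps

t = ("S", [("NP", "Papa"),
           ("VP", [("V", "ate"),
                   ("NP", [("Det", "the"), ("N", "caviar")])])])
print(to_dependencies(t))  # [('ate', 'Papa'), ('ate', 'caviar'), ('caviar', 'the')]
```

Real conversions (e.g., for the Penn Treebank) use much longer, category-specific head-rule lists, but the percolation logic is the same.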
Dependency Post-Processing (Keep Tree Structure)
Dependency Post-Processing (Get Possible Graph Structure)
Amaranthus, collectively known as amaranth, is a cosmopolitan genus of annual or short-lived perennial plants.
Outline
Review: PCFGs and CKY
Dependency Grammars
Dependency Parsing (Shift-reduce)
From syntax to semantics
(Some) Dependency Parsing Algorithms
Dynamic Programming
Eisner Algorithm (Eisner 1996)
Transition-based
Shift-reduce, arc standard
Graph-based
Maximum spanning tree
(Some) Dependency Parsing Algorithms
Dynamic Programming
Eisner Algorithm (Eisner 1996)
Transition-based
Shift-reduce, arc standard
Graph-based
Maximum spanning tree
Shift-Reduce Parsing
Recall from CMSC 331
Bottom-up
Tools: input words, some special root symbol ($), and a stack to hold configurations
Shift: move tokens onto the stack; match the top two elements of the stack against the grammar (RHS)
Reduce: if a match occurred, place the LHS symbol on the stack
Shift-Reduce Dependency Parsing
Tools: input words, some special root symbol ($), and a stack to hold configurations
Shift: move tokens onto the stack; decide if the top two elements of the stack form a valid (good) grammatical dependency
Reduce: if there’s a valid relation, place the head on the stack
Decide how? Search problem!
What is valid? Learn it!
What are the possible actions?
Shift-Reduce Actions (“Arc standard”)

Possibility: Assign the current word as the head of some previously seen word
Action: LEFTARC: assert a head-dependent relation between the word at the top of the stack and the word directly beneath it; remove the lower word from the stack

Possibility: Assign some previously seen word as the head of the current word
Action: RIGHTARC: assert a head-dependent relation between the second word on the stack and the word at the top; remove the word at the top of the stack

Possibility: Wait processing the current word; add it for later
Action: SHIFT: remove the word from the front of the input buffer and push it onto the stack
Arc Standard Parsing
state ← {[root], [words], []}
while state ≠ {[root], [], [(deps)]} {
    t ← ORACLE(state)
    state ← APPLY(t, state)
}
return state
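The loop above can be sketched in Python. The state is a (stack, buffer, deps) triple; the scripted oracle in the usage example simply replays a fixed action sequence, standing in for ORACLE or a trained predictor.

```python
def apply_transition(t, state):
    """APPLY: execute one arc-standard transition on (stack, buffer, deps)."""
    stack, buffer, deps = state
    if t == "SHIFT":
        stack = stack + [buffer[0]]
        buffer = buffer[1:]
    elif t == "LEFTARC":   # top of stack heads the word beneath it
        deps = deps + [(stack[-1], stack[-2])]
        stack = stack[:-2] + [stack[-1]]
    elif t == "RIGHTARC":  # second word on stack heads the top word
        deps = deps + [(stack[-2], stack[-1])]
        stack = stack[:-1]
    return (stack, buffer, deps)

def parse(words, oracle):
    state = (["$"], list(words), [])
    # final state: stack holds only the root and the buffer is empty
    while not (len(state[0]) == 1 and not state[1]):
        t = oracle(state)
        state = apply_transition(t, state)
    return state[2]

script = iter(["SHIFT", "SHIFT", "LEFTARC", "SHIFT", "SHIFT",
               "LEFTARC", "RIGHTARC", "RIGHTARC"])
print(parse("Papa ate the caviar".split(), lambda s: next(script)))
```

Each word is shifted once and removed by exactly one arc action, which is where the linear running time discussed below comes from.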
Arc Standard Parsing (with a learned predictor)
state ← {[root], [words], []}
while state ≠ {[root], [], [(deps)]} {
    t ← PREDICT(state)
    state ← APPLY(t, state)
}
return state
What is valid? Learn it!
Learning An Oracle (Predictor)
Training data: dependency treebank
Input: configuration
Output: {LEFTARC, RIGHTARC, SHIFT}
t ← ORACLE(state)
• Choose LEFTARC if it produces a correct head-dependent relation given the reference parse and the current configuration
• Choose RIGHTARC if it produces a correct head-dependent relation given the reference parse and all of the dependents of the word at the top of the stack have already been assigned
• Otherwise, choose SHIFT
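The three rules can be sketched as a training-time oracle. The encoding of the reference parse as a dict (gold[d] is the head of word d, with word indices on the stack and 0 for the root) is an assumption of this sketch.

```python
def oracle(stack, buffer, gold, assigned):
    """Choose the next transition given the reference heads.
    gold: dict dependent-index -> gold head-index (assumed encoding)
    assigned: set of (head, dependent) arcs already built."""
    if len(stack) >= 2:
        s1, s2 = stack[-1], stack[-2]  # top and second-from-top
        if gold.get(s2) == s1:         # top heads the word beneath it
            return "LEFTARC"
        if gold.get(s1) == s2 and all(
                gold.get(d) != s1 or (s1, d) in assigned for d in gold):
            return "RIGHTARC"          # only once s1's dependents are all attached
    return "SHIFT"

# "Papa ate the caviar", words 1..4: gold heads Papa<-ate, ate<-root,
# the<-caviar, caviar<-ate
gold = {1: 2, 2: 0, 3: 4, 4: 2}
print(oracle([0, 1, 2], [3, 4], gold, set()))  # LEFTARC
```

The RIGHTARC guard matters: attaching a word to its head too early would remove it from the stack before its own dependents could be attached.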
Training the Predictor
Predict action t given configuration s: t = φ(s)
Extract features of the configuration
Examples: word forms, lemmas, POS, morphological features
How? Perceptron, Maxent, Support Vector Machines, Multilayer Perceptrons, Neural Networks
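A minimal sketch of the feature extraction inside φ, using word forms and POS tags from the top of the stack and front of the buffer. The feature names and the `pos` lookup dict are illustrative, not from the slides; lemmas and morphological features would be added the same way.

```python
def extract_features(stack, buffer, pos):
    """Features of a configuration for predicting the next transition."""
    feats = {}
    if stack:
        feats["s1.word"] = stack[-1]
        feats["s1.pos"] = pos.get(stack[-1], "ROOT")
    if len(stack) >= 2:
        feats["s2.word"] = stack[-2]
        feats["s2.pos"] = pos.get(stack[-2], "ROOT")
    if buffer:
        feats["b1.word"] = buffer[0]
        feats["b1.pos"] = pos.get(buffer[0], "NONE")
    return feats

print(extract_features(["$", "ate"], ["the", "caviar"],
                       {"ate": "V", "the": "Det", "caviar": "N"}))
```

A linear classifier (perceptron, maxent, SVM) scores each action from such sparse features; neural parsers instead embed the same stack/buffer positions as dense vectors.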
Papa ate the caviar
If time permits, work through on board
Parse steps for “Papa ate the caviar”
Deps | Stack | Word Buffer | Action
--- | $ | Papa ate the caviar | SHIFT
--- | Papa $ | ate the caviar | SHIFT
--- | ate Papa $ | the caviar | LEFTARC
ate->Papa | ate $ | the caviar | SHIFT
ate->Papa | the ate $ | caviar | SHIFT
ate->Papa | caviar the ate $ | --- | LEFTARC
ate->Papa, caviar->the | caviar ate $ | --- | RIGHTARC
ate->Papa, caviar->the, ate->caviar | ate $ | --- | RIGHTARC
ate->Papa, caviar->the, ate->caviar, $->ate | $ | --- | (done)
Arc Standard Parsing
state ← {[root], [words], []}
while state ≠ {[root], [], [(deps)]} {
    t ← ORACLE(state)
    state ← APPLY(t, state)
}
return state
Q: What is the time complexity?
A: Linear
Q: What’s potentially problematic?
A: This is a greedy algorithm
Becoming Less Greedy
Beam search
Breadth-first search strategy (CMSC 471/671)
At each stage, keep K options open
Evaluation
Exact Match (per-sentence accuracy)
Unlabeled Attachment Score (UAS)
Labeled Attachment Score (LS, LAS)
Recall/Precision/F1 for particular relation types
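The attachment scores can be sketched directly: per word, UAS checks only the predicted head, LAS checks head and label together. The (head, label) pair encoding is this sketch's assumption.

```python
def attachment_scores(gold, pred):
    """gold/pred: per-word lists of (head, label). Returns (UAS, LAS)."""
    n = len(gold)
    uas = sum(g[0] == p[0] for g, p in zip(gold, pred)) / n  # heads only
    las = sum(g == p for g, p in zip(gold, pred)) / n        # heads + labels
    return uas, las

gold = [(2, "nsubj"), (0, "root"), (4, "det"), (2, "dobj")]
pred = [(2, "nsubj"), (0, "root"), (4, "det"), (2, "nmod")]
print(attachment_scores(gold, pred))  # (1.0, 0.75)
```

Exact match is stricter still: a sentence counts only if every word's arc is correct, so it is always at or below LAS.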
Outline
Review: PCFGs and CKY
Dependency Grammars
Dependency Parsing (Shift-reduce)
From syntax to semantics
Parsing as a Core NLP Problem
[Pipeline diagram: sentences 1–4 → Parser (using a Grammar) → parses, compared against gold (correct) reference trees for an evaluation score, or passed to other NLP tasks (entity coref., MT, Q&A, …); the per-sentence operations are independent]
From Dependencies to Shallow Semantics
From Syntax to Shallow Semantics
Angeli et al. (2015)
“Open Information Extraction”
From Syntax to Shallow Semantics
http://corenlp.run/ (constituency & dependency)
https://github.com/hltcoe/predpatt
http://openie.allenai.org/
http://www.cs.rochester.edu/research/knext/browse/ (constituency trees)
http://rtw.ml.cmu.edu/rtw/
a sampling of efforts