
Page 1: CS11-711 Advanced NLP PCFG Parsing

PCFG Parsing
CS11-711 Advanced NLP

Hao Zhu

Site: https://phontron.com/class/anlp2021/

Page 2: CS11-711 Advanced NLP PCFG Parsing

Two Types of Linguistic Structure

• Dependency: focus on relations between words

• Phrase structure: focus on the structure of the sentence

I saw a girl with a telescope

[Figure: the sentence with POS tags PRP VBD DT NN IN DT NN, shown both as a phrase-structure tree (S, VP, NP, and PP constituents) and as a dependency tree rooted at ROOT.]

Page 3: CS11-711 Advanced NLP PCFG Parsing

Grammar Induction (Unsupervised Parsing)

Learning a set of (probabilistic) grammar rules


Page 4: CS11-711 Advanced NLP PCFG Parsing

Typical grammar induction methods

Unsupervised constituency and dependency parsing


Page 5: CS11-711 Advanced NLP PCFG Parsing

Explosion of ambiguity
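A quick sanity check on the size of the search space: the number of binary-branching trees over an n-word sentence grows as the Catalan number C(n−1), and that is before choosing any nonterminal labels. A short Python illustration (this helper is ours, not from the slides):

    from math import comb

    def num_binary_trees(n_words: int) -> int:
        """Catalan number C(n-1): the number of binary bracketings of n words."""
        n = n_words - 1
        return comb(2 * n, n) // (n + 1)

    for n in (2, 5, 10, 20):
        print(n, num_binary_trees(n))
    # 2 -> 1, 5 -> 14, 10 -> 4862, 20 -> 1767263190

So enumerating all parses is hopeless; we need weights and dynamic programming.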

Page 6: CS11-711 Advanced NLP PCFG Parsing

Probabilistic parsing

• First try parsing without any weird rules, throwing them in only if needed.

• Better: every rule has a weight. 

• A tree’s weight is the total weight of all its rules.

• Pick the overall lightest parse of the sentence.

• Best: train the weights!

Page 7: CS11-711 Advanced NLP PCFG Parsing

Mystery: humans learn to parse without ever being taught to parse

Page 8: CS11-711 Advanced NLP PCFG Parsing

Grammar Induction

Page 9: CS11-711 Advanced NLP PCFG Parsing

CFG definition

𝒢 = (S, 𝒩, 𝒫, Σ, ℛ)

• S: start symbol

• 𝒩: set of nonterminals (constituent labels)

• 𝒫: set of preterminals (part-of-speech tags)

• Σ: set of terminals (words)

• ℛ: set of rules

WLOG, we only consider binary branching rules (Chomsky Normal Form).

Page 10: CS11-711 Advanced NLP PCFG Parsing

Probabilistic CFG

• For every rule, assign a probability to it.

• For every nonterminal X, the probabilities of the rules with X as the left-hand side sum to 1:

  ∑_Y π(X → Y) = 1

• How do we get the most probable tree given the probabilities of the rules?

• Dynamic programming

Page 11: CS11-711 Advanced NLP PCFG Parsing

He watches a model train

[Figure: CKY chart over the sentence. The bottom row holds the preterminal items per word (He: Pron 2; watches: Verb 5, Noun 6; a: Det 1; model: Noun 6, Verb 7; train: Verb 5, Noun 6). Higher rows keep the best item per span, e.g. S' 11 for "He watches", NP 8 for "a model", NP 16 for "model train", VP 15 for "watches a model", NP 19 for "a model train", ∞ for "He watches a", S' 19 for "He watches a model", VP 26 for "watches a model train", and S' 30 for the whole sentence.]

Remember to store back-pointers!

Rule            −log prob
S' → Pron Verb  4
S' → Pron VP    2
S' → NP VP      2
NP → NP Verb    5
NP → Det Noun   2
NP → Det NP     2
NP → Noun Noun  4
VP → Noun NP    5
VP → Verb NP    2
VP → VP NP      2
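As a check, recall from page 6 that a tree's weight is the total weight of its rules. For example, the parse [S' [Pron He] [VP [Verb watches] [NP [Det a] [NP [Noun model] [Noun train]]]]] costs

S' → Pron VP (2) + Pron "He" (2) + VP → Verb NP (2) + Verb "watches" (5) + NP → Det NP (2) + Det "a" (1) + NP → Noun Noun (4) + Noun "model" (6) + Noun "train" (6) = 30,

matching the S' 30 in the top cell of the chart.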

Page 12: CS11-711 Advanced NLP PCFG Parsing


Build the whole tree from the root of the same chart by following the stored back-pointers.


Page 13: CS11-711 Advanced NLP PCFG Parsing

CYK algorithm

• The probability of building a constituent with a given non-terminal over a given span is often called the inside probability.

• Complexity? 𝒪(L³ |𝒩| (|𝒩| + |𝒫|)²) for a sentence of length L (see the table on page 25).

βA(x, y) = min_{k, B, C} ( −log π(A → B C) + βB(x, k) + βC(k + 1, y) )
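A minimal sketch of this recursion in Python (our naming, not the lecture's code), using the toy grammar and chart from pages 11-12; weights are −log probs, so smaller is better:

    from math import inf

    # Binary rules (A -> B C) with their -log prob weights, from page 11.
    RULES = {
        ("S'", "Pron", "Verb"): 4, ("S'", "Pron", "VP"): 2, ("S'", "NP", "VP"): 2,
        ("NP", "NP", "Verb"): 5, ("NP", "Det", "Noun"): 2, ("NP", "Det", "NP"): 2,
        ("NP", "Noun", "Noun"): 4,
        ("VP", "Noun", "NP"): 5, ("VP", "Verb", "NP"): 2, ("VP", "VP", "NP"): 2,
    }

    # Per-word preterminal weights, read off the bottom row of the chart.
    LEXICON = {
        "He": {"Pron": 2}, "watches": {"Verb": 5, "Noun": 6}, "a": {"Det": 1},
        "model": {"Noun": 6, "Verb": 7}, "train": {"Verb": 5, "Noun": 6},
    }

    def cyk(words, root="S'"):
        n = len(words)
        # beta[i][j][A] = best (min) -log prob of A spanning words[i..j]
        beta = [[{} for _ in range(n)] for _ in range(n)]
        back = [[{} for _ in range(n)] for _ in range(n)]
        for i, w in enumerate(words):          # width-1 spans: POS tags
            beta[i][i] = dict(LEXICON[w])
        for width in range(2, n + 1):          # then increasingly wide spans
            for i in range(n - width + 1):
                j = i + width - 1
                for k in range(i, j):          # split point: [i..k] + [k+1..j]
                    for (A, B, C), w in RULES.items():
                        if B in beta[i][k] and C in beta[k + 1][j]:
                            score = w + beta[i][k][B] + beta[k + 1][j][C]
                            if score < beta[i][j].get(A, inf):
                                beta[i][j][A] = score      # best so far
                                back[i][j][A] = (k, B, C)  # store the back-pointer
        return beta[0][n - 1].get(root, inf), back

    def build_tree(words, back, i, j, A):
        """Follow back-pointers down from the root to read out the best tree."""
        if i == j:
            return (A, words[i])
        k, B, C = back[i][j][A]
        return (A, build_tree(words, back, i, k, B),
                   build_tree(words, back, k + 1, j, C))

    words = "He watches a model train".split()
    score, back = cyk(words)
    print(score)                               # 30, matching the S' 30 cell
    print(build_tree(words, back, 0, len(words) - 1, "S'"))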

Page 14: CS11-711 Advanced NLP PCFG Parsing

CYK algorithm

• We can use the same CKY algorithm to calculate the marginal probability of a sentence, by summing over splits instead of minimizing over them:

  βA(x, y) = −log ∑_{k, B, C} exp( log π(A → B C) − βB(x, k) − βC(k + 1, y) )

• What else can it do?

• Recognizer (does the grammar generate the sentence at all?):

  βA(x, y) = ⋁_{k, B, C} [ (A → B C) ∈ ℛ ∧ βB(x, k) ∧ βC(k + 1, y) ]

• A general form? See the semiring table on the next slide.

Page 15: CS11-711 Advanced NLP PCFG Parsing

CYK algorithm

• Semiring parsing:

               weights    ⊕          ⊗    ⓪    ①
  total prob   [0, 1]     +          ×    0    1
  max prob     [0, 1]     max        ×    0    1
  min −log p   [0, ∞]     min        +    ∞    0
  log prob     [−∞, 0]    logsumexp  +    −∞   0
  recognizer   {T, F}     or         and  F    T
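The table suggests a single implementation parameterized by the semiring. A sketch along those lines (our illustration; it reuses the RULES, LEXICON, and words definitions from the CYK sketch on page 13, and the lift argument maps a rule's −log prob weight into the semiring):

    from math import inf

    def semiring_cyk(words, plus, times, zero, lift, root="S'"):
        n = len(words)
        beta = [[{} for _ in range(n)] for _ in range(n)]
        for i, w in enumerate(words):
            beta[i][i] = {t: lift(wt) for t, wt in LEXICON[w].items()}
        for width in range(2, n + 1):
            for i in range(n - width + 1):
                j = i + width - 1
                for k in range(i, j):
                    for (A, B, C), wt in RULES.items():
                        if B in beta[i][k] and C in beta[k + 1][j]:
                            item = times(times(lift(wt), beta[i][k][B]),
                                         beta[k + 1][j][C])
                            # aggregate over k, B, C with the semiring's plus
                            beta[i][j][A] = plus(beta[i][j].get(A, zero), item)
        return beta[0][n - 1].get(root, zero)

    # min -log p semiring: the Viterbi score again (30 on the example)
    print(semiring_cyk(words, plus=min, times=lambda a, b: a + b,
                       zero=inf, lift=lambda w: w))
    # boolean semiring: a recognizer (True iff some parse exists)
    print(semiring_cyk(words, plus=lambda a, b: a or b,
                       times=lambda a, b: a and b,
                       zero=False, lift=lambda w: True))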

Page 16: CS11-711 Advanced NLP PCFG Parsing

Optimizing PCFGs

• Traditional methods: inside-outside algorithm

• Good news: you can directly optimize the log probability computed by CKY using autograd, with the same effect and time complexity.

• Optional reading: Inside-Outside and Forward-Backward Algorithms Are Just Backprop

• Similar to language models, we minimize the negative log probability of the sentence:

  ℒ = −log ∑_{Tx} p(Tx)

  where Tx ranges over all parse trees of the sentence x.
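A sketch of that idea (our illustration, assuming PyTorch; not the lecture's code): run the logsumexp-semiring CKY on rule log probabilities stored as a parameter tensor and let autograd supply the outside pass. The gradient of −log p(sentence) with respect to a rule's log probability is minus its expected count, which is exactly what inside-outside computes:

    import torch

    # Same toy grammar as before, redefined with the weight in the tuple;
    # page 11 lists -log probs, so initial log probs are their negations.
    RULES = [
        ("S'", "Pron", "Verb", 4.0), ("S'", "Pron", "VP", 2.0), ("S'", "NP", "VP", 2.0),
        ("NP", "NP", "Verb", 5.0), ("NP", "Det", "Noun", 2.0), ("NP", "Det", "NP", 2.0),
        ("NP", "Noun", "Noun", 4.0),
        ("VP", "Noun", "NP", 5.0), ("VP", "Verb", "NP", 2.0), ("VP", "VP", "NP", 2.0),
    ]
    logpi = torch.nn.Parameter(torch.tensor([-w for (*_, w) in RULES]))

    LEXICON = {"He": {"Pron": 2.0}, "watches": {"Verb": 5.0, "Noun": 6.0},
               "a": {"Det": 1.0}, "model": {"Noun": 6.0, "Verb": 7.0},
               "train": {"Verb": 5.0, "Noun": 6.0}}

    def log_inside(words):
        n = len(words)
        beta = [[{} for _ in range(n)] for _ in range(n)]  # log inside scores
        for i, w in enumerate(words):
            beta[i][i] = {t: torch.tensor(-nl) for t, nl in LEXICON[w].items()}
        for width in range(2, n + 1):
            for i in range(n - width + 1):
                j = i + width - 1
                cands = {}
                for k in range(i, j):
                    for r, (A, B, C, _) in enumerate(RULES):
                        if B in beta[i][k] and C in beta[k + 1][j]:
                            cands.setdefault(A, []).append(
                                logpi[r] + beta[i][k][B] + beta[k + 1][j][C])
                beta[i][j] = {A: torch.logsumexp(torch.stack(v), dim=0)
                              for A, v in cands.items()}
        return beta[0][n - 1]["S'"]

    loss = -log_inside("He watches a model train".split())
    loss.backward()        # the "outside" pass, for free
    print(loss.item())     # <= 30: log-marginal over all trees, not just the best
    print(logpi.grad)      # -(expected rule counts) under the current parameters
    # A real trainer would renormalize per left-hand side (e.g. a log_softmax
    # over rules sharing an LHS) and take gradient steps over many sentences.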

Page 17: CS11-711 Advanced NLP PCFG Parsing

[Bar chart (y-axis 0–36): Random PCFG vs. Right Branching.]

Great News: It works (better than random)

Page 18: CS11-711 Advanced NLP PCFG Parsing

[Bar chart (y-axis 0–40): Random PCFG vs. Right Branching.]

Bad News: It is even worse than right branching. Why?

Page 19: CS11-711 Advanced NLP PCFG Parsing

Neural PCFGs

• Neural parameterization for PCFGs

• parameter sharing through distributed representations

• same training method

Page 20: CS11-711 Advanced NLP PCFG Parsing

Neural L-PCFGs

• You can further improve Neural PCFGs by adding head annotations

Page 21: CS11-711 Advanced NLP PCFG Parsing

Neural L-PCFGs

• (Non-neural) L-PCFGs did not work well because they have even MORE parameters than PCFGs; the neural parameterization's parameter sharing addresses this.

Page 22: CS11-711 Advanced NLP PCFG Parsing

Limitation of lexical dependencies

Independent head word and argument word

The spy saw the cop with the telescope.

[Figure: two parses of the sentence, one attaching the PP "with the telescope" to the VP headed by "saw", the other attaching it to the NP "the cop".]

Page 23: CS11-711 Advanced NLP PCFG Parsing

Using a latent compound variable

Conditional independence

The spy saw the cop with the telescope.
The spy saw the cop with the revolver.

[Figure: the compound variable lets the attachment depend on the words: "with the telescope" attaches to the VP of "saw", while "with the revolver" attaches to the NP "the cop".]

Page 24: CS11-711 Advanced NLP PCFG Parsing

Results

[Bar chart (y-axis 0–60): DAS, UAS, and F1 for Compound PCFG, DMV, Neural L-PCFGs, and a right-branching baseline.]

Page 25: CS11-711 Advanced NLP PCFG Parsing

Limitations

• Efficient bilexical dependencies (the table assumes enough parallel workers):

• Neural Bi-Lexicalized PCFG Induction.

                           Time complexity   Space complexity
Unilexical dependencies    𝒪(L)              𝒪(L³ |𝒩| (|𝒩| + |𝒫|)²)
Bilexical dependencies     𝒪(L)              𝒪(L⁴ |𝒩| (|𝒩| + |𝒫|)²)

Page 26: CS11-711 Advanced NLP PCFG Parsing

Key to the mystery: visual prior?

Page 27: CS11-711 Advanced NLP PCFG Parsing

Visual Prior Grammar Induction

• Visually grounded neural syntax acquisition

Page 28: CS11-711 Advanced NLP PCFG Parsing

Visual Prior Grammar Induction

• Visually grounded neural syntax acquisition

• Similar results even if the dimension of the embeddings is shrunk to 1 or 2.

• embeddings mainly capture POS tags

• concreteness?

Page 29: CS11-711 Advanced NLP PCFG Parsing

Visual Prior Grammar Induction

• Recommended readings

• Visually Grounded Compound PCFGs.

• Dependency Induction Through the Lens of Visual Perception

Page 30: CS11-711 Advanced NLP PCFG Parsing

References

• Yoon Kim's Stanford NLP Seminar slides: https://nlp.stanford.edu/seminar/details/yoonkim.pdf

• Jason Eisner's NLP course at JHU: https://www.cs.jhu.edu/~jason/465/