
Probabilistic Parsing: Enhancements

Ling 571Deep Processing Techniques for NLP

January 26, 2011

Parser Issues
PCFGs make many (unwarranted) independence assumptions:
Structural dependency
NP -> Pronoun: much more likely in subject position
Lexical dependency
Verb subcategorization
Coordination ambiguity

Improving PCFGs: Structural Dependencies
How can we capture the subject/object asymmetry?
E.g., NPsubj -> Pron vs. NPobj -> Pron
Parent annotation: annotate each node with its parent in the parse tree
E.g., NP^S vs. NP^VP
Also annotate pre-terminals:
RB^ADVP vs. RB^VP; IN^SBAR vs. IN^PP
Can also split rules on other conditions

Parent Annotation

Parent Annotation: Pre-terminals
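To make the annotation concrete, here is a minimal Python sketch of parent annotation over toy trees written as nested lists (label first, then children); the tree encoding and the function name are assumptions for illustration only, not the course's code.

# Minimal parent-annotation sketch (illustrative only).
# Trees are nested lists: [label, child1, child2, ...]; leaves are plain strings.

def parent_annotate(tree, parent=None):
    """Return a copy of tree with each non-terminal label rewritten as LABEL^PARENT."""
    if isinstance(tree, str):          # a word: leave it alone
        return tree
    label, children = tree[0], tree[1:]
    new_label = f"{label}^{parent}" if parent else label
    return [new_label] + [parent_annotate(c, label) for c in children]

# Subject NP becomes NP^S, object NP becomes NP^VP; pre-terminals are annotated too.
tree = ["S", ["NP", ["PRP", "they"]],
             ["VP", ["VBD", "dumped"], ["NP", ["NNS", "sacks"]]]]
print(parent_annotate(tree))
# ['S', ['NP^S', ['PRP^NP', 'they']], ['VP^S', ['VBD^VP', 'dumped'], ['NP^VP', ['NNS^NP', 'sacks']]]]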


Parent Annotation
Advantages:
Captures structural dependency in grammars
Disadvantages:
Increases number of rules in grammar
Decreases amount of training per rule

Strategies to search for optimal # of rules

Improving PCFGs: Lexical Dependencies
Lexicalized rules:
Best-known parsers: Collins, Charniak parsers
Each non-terminal annotated with its lexical head
E.g., verb with verb phrase, noun with noun phrase
Each rule must identify an RHS element as head
Heads propagate up the tree
Conceptually like adding one rule per head value:
VP(dumped) -> VBD(dumped) NP(sacks) PP(into)
VP(dumped) -> VBD(dumped) NP(cats) PP(into)

Lexicalized PCFGs
Also add a head tag to non-terminals
Head tag: part-of-speech tag of the head word
VP(dumped) -> VBD(dumped) NP(sacks) PP(into) becomes
VP(dumped,VBD) -> VBD(dumped,VBD) NP(sacks,NNS) PP(into,IN)
Two types of rules:
Lexical rules: pre-terminal -> word; deterministic, probability 1
Internal rules: all other expansions; must estimate probabilities
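To make head propagation concrete, here is a minimal Python sketch that percolates heads up a toy tree and produces labels like VP(dumped,VBD); the HEAD_CHILD table is a toy assumption, not the actual Collins or Charniak head-finding rules.

# Toy head-propagation sketch (illustrative; real parsers use detailed head rules).
# Trees are nested lists: [label, child1, ...]; leaves are strings.

# Assumed toy table: which child category supplies the head for each parent category.
HEAD_CHILD = {"S": "VP", "VP": "VBD", "NP": "NNS", "PP": "IN"}

def lexicalize(tree):
    """Return (annotated_tree, head_word, head_tag) for a toy tree."""
    label, children = tree[0], tree[1:]
    if len(children) == 1 and isinstance(children[0], str):
        word = children[0]                      # pre-terminal: its own word is the head
        return [f"{label}({word},{label})", word], word, label
    new_children, heads = [], {}
    for child in children:
        annotated, hw, ht = lexicalize(child)
        new_children.append(annotated)
        heads[child[0]] = (hw, ht)
    # Pick the head child from the table, falling back to the first child.
    head_word, head_tag = heads.get(HEAD_CHILD.get(label), next(iter(heads.values())))
    return [f"{label}({head_word},{head_tag})"] + new_children, head_word, head_tag

tree = ["VP", ["VBD", "dumped"], ["NP", ["NNS", "sacks"]],
              ["PP", ["IN", "into"], ["NP", ["NNS", "bins"]]]]
annotated, _, _ = lexicalize(tree)
print(annotated[0])   # VP(dumped,VBD)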


PLCFGs
Issue: too many rules
No way to find a corpus with enough examples of each rule
(Partial) solution: assume independence
Condition the rule on the category of the LHS and its head
Condition the head on the category of the LHS and the parent's head
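Written out in the standard lexicalized-PCFG notation (a textbook-style restatement of the slide's conditioning scheme, not additional material from the lecture), the tree probability factors into one rule term and one head term per node:

P(T, S) \approx \prod_{n \in T} P\big(r(n) \mid n, h(n)\big) \cdot P\big(h(n) \mid n, h(m(n))\big)

where r(n) is the rule expanding node n, h(n) is the head word (and tag) of n, and m(n) is the parent of n.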

Disambiguation Example

p(VP -> VBD NP PP | VP, dumped) = C(VP(dumped) -> VBD NP PP) / C(VP(dumped)) = 6/9 = .67

p(VP -> VBD NP | VP, dumped) = C(VP(dumped) -> VBD NP) / C(VP(dumped)) = 0/9 = 0

p(into | PP, dumped) = C(X(dumped) -> ... PP(into) ...) / C(X(dumped) -> ... PP ...) = 2/9 = .22

p(into | PP, sacks) = C(X(sacks) -> ... PP(into) ...) / C(X(sacks) -> ... PP ...) = 0/0 (undefined)

Improving PCFGs: Tradeoffs
Tension: increasing accuracy means increasing specificity
E.g., lexicalization, parent annotation, Markovization, etc.
But this increases grammar size, increases processing time, and increases training data requirements
How can we balance these?


CNF Factorization & Markovization
CNF factorization: converts n-ary branching to binary branching
Can maintain information about the original structure:
Neighborhood history and parent
Issue: potentially explosive
If we keep all context: 72 non-terminals -> 10K non-terminals!!!
How much context should we keep? What Markov order?

Different Markov Orders
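As an illustration of the tradeoff, here is a minimal Python sketch of binarization with horizontal Markovization: an n-ary rule is converted to binary rules whose intermediate labels remember at most h previously generated siblings; the label scheme (A|<...>) is an assumption for illustration, loosely following common practice.

# Horizontal-Markovized binarization sketch (illustrative).
# Turns an n-ary rule A -> B1 B2 ... Bn into binary rules, keeping
# at most `h` previously generated siblings in the intermediate labels.

def binarize(lhs, rhs, h=1):
    """Yield binary rules equivalent to lhs -> rhs, with Markov order h."""
    if len(rhs) <= 2:
        yield (lhs, tuple(rhs))
        return
    history = rhs[:1][-h:] if h > 0 else []
    prev = f"{lhs}|<{','.join(history)}>"
    yield (lhs, (rhs[0], prev))
    for i in range(1, len(rhs) - 2):
        history = (history + [rhs[i]])[-h:] if h > 0 else []
        nxt = f"{lhs}|<{','.join(history)}>"
        yield (prev, (rhs[i], nxt))
        prev = nxt
    yield (prev, (rhs[-2], rhs[-1]))

for rule in binarize("VP", ["VBD", "NP", "PP", "PP"], h=1):
    print(rule)
# ('VP', ('VBD', 'VP|<VBD>'))
# ('VP|<VBD>', ('NP', 'VP|<NP>'))
# ('VP|<NP>', ('PP', 'PP'))

With h=0 all intermediate labels collapse together (fewest non-terminals, least context); keeping the full history recreates distinct labels for every sibling sequence, which is where the non-terminal blow-up comes from.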

Markovization & Costs (Mohri & Roark 2006)

Efficiency
PCKY is O(n^3) in sentence length, times a grammar-size factor
Grammar can be huge
Grammar can be extremely ambiguous:
100s of analyses not unusual, especially for long sentences
However, we only care about the best parses
Others can be pretty bad
Can we use this to improve efficiency?

Beam Thresholding
Inspired by the beam search algorithm
Assume low-probability partial parses are unlikely to yield a high-probability parse overall
Keep only the top k most probable partial parses
Retain only k choices per cell
For large grammars, k could be 50 or 100; for small grammars, 5 or 10
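A minimal Python sketch of per-cell beam pruning as it might be applied inside probabilistic CKY; the cell representation (a dict from non-terminal to log probability) and the default k are assumptions for illustration.

import heapq

# Per-cell beam pruning sketch (illustrative).
# A chart cell is assumed to map non-terminal -> best log probability found so far.

def prune_cell(cell, k=10):
    """Keep only the k most probable entries in a chart cell."""
    if len(cell) <= k:
        return cell
    best = heapq.nlargest(k, cell.items(), key=lambda kv: kv[1])
    return dict(best)

cell = {"NP": -2.3, "S": -9.1, "VP": -4.0, "PP": -12.5, "X": -30.0}
print(prune_cell(cell, k=3))   # {'NP': -2.3, 'VP': -4.0, 'S': -9.1}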


Heuristic Filtering
Intuition: some rules/partial parses are unlikely to end up in the best parse; don't store those in the table.
Exclusions:
Low frequency: exclude singleton productions
Low probability: exclude constituents x s.t. p(x) < 10^-200
Low relative probability: exclude x if there exists y s.t. p(y) > 100 * p(x)
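The three exclusions can be phrased as a single predicate checked before an edge is stored; a minimal Python sketch, with the thresholds taken from the slide and the inputs (rule_count, cell_best) assumed for illustration.

# Heuristic filtering sketch (illustrative). `rule_count` and `cell_best` are
# assumed inputs: how often the rule was seen in training, and the best
# probability already stored in the target chart cell (or None if empty).

def keep_constituent(p, rule_count, cell_best):
    if rule_count <= 1:            # low frequency: singleton production
        return False
    if p < 1e-200:                 # low absolute probability
        return False
    if cell_best is not None and cell_best > 100 * p:   # low relative probability
        return False
    return True

print(keep_constituent(p=1e-5, rule_count=7, cell_best=2e-5))   # True
print(keep_constituent(p=1e-5, rule_count=7, cell_best=5e-3))   # False: 100x worse than best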

Notes on HW#3
Outline:

Induce grammar from (small) treebank

Implement Probabilistic CKY

Evaluate parser

Improve parser
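For the grammar-induction step, here is a minimal Python sketch of reading off rule counts from trees and converting them to maximum-likelihood probabilities; the nested-list tree encoding is the same toy assumption used above, not the assignment's actual treebank format.

from collections import Counter, defaultdict

# PCFG induction sketch (illustrative).
# Count every rule LHS -> RHS observed in the trees, then normalize per LHS.

def count_rules(tree, counts):
    label, children = tree[0], tree[1:]
    if len(children) == 1 and isinstance(children[0], str):
        counts[(label, (children[0],))] += 1       # lexical rule: POS -> word
        return
    counts[(label, tuple(c[0] for c in children))] += 1
    for child in children:
        count_rules(child, counts)

def induce_pcfg(trees):
    counts = Counter()
    for t in trees:
        count_rules(t, counts)
    lhs_totals = defaultdict(int)
    for (lhs, _), c in counts.items():
        lhs_totals[lhs] += c
    # Maximum-likelihood estimate: count(LHS -> RHS) / count(LHS).
    return {rule: c / lhs_totals[rule[0]] for rule, c in counts.items()}

trees = [["S", ["NP", ["PRP", "they"]], ["VP", ["VBD", "left"]]]]
for rule, p in induce_pcfg(trees).items():
    print(rule, p)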

Treebank Format
Adapted from the Penn Treebank format
Rules simplified:
Removed traces and other null elements
Removed complex tags
Reformatted POS tags as non-terminals

Reading the Parses
POS unary collapse:
(NP_NNP Ontario) was (NP (NNP Ontario))
Binarization:
VP -> VP' PP; VP' -> VB PP was VP -> VB PP PP
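A minimal Python sketch of undoing the POS unary collapse when reading parses back; the underscore convention is taken from the slide, everything else (tree encoding, function name) is an assumption for illustration.

# Un-collapsing sketch (illustrative): (NP_NNP Ontario) -> (NP (NNP Ontario)).

def uncollapse(tree):
    """Expand labels of the form A_B over a single word back into (A (B word))."""
    if isinstance(tree, str):
        return tree
    label, children = tree[0], tree[1:]
    if "_" in label and len(children) == 1 and isinstance(children[0], str):
        parent, pos = label.split("_", 1)
        return [parent, [pos, children[0]]]
    return [label] + [uncollapse(c) for c in children]

print(uncollapse(["NP_NNP", "Ontario"]))   # ['NP', ['NNP', 'Ontario']]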

Start Early!
