
Log-Linear Models for History-Based Parsing

Michael Collins, Columbia University

Log-Linear Taggers: Summary

- The input sentence is w[1:n] = w_1 ... w_n

- Each tag sequence t[1:n] has a conditional probability

  p(t[1:n] | w[1:n]) = ∏_{j=1..n} p(t_j | w_1 ... w_n, t_1 ... t_{j-1})      (chain rule)
                     = ∏_{j=1..n} p(t_j | w_1 ... w_n, t_{j-2}, t_{j-1})     (independence assumptions)

- Estimate p(t_j | w_1 ... w_n, t_{j-2}, t_{j-1}) using log-linear models

- Use the Viterbi algorithm to compute

  argmax_{t[1:n]} log p(t[1:n] | w[1:n])
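As a concrete illustration of this summary, here is a minimal Viterbi sketch for a trigram log-linear tagger. It assumes a user-supplied function local_log_prob(words, j, t2, t1, t) returning log p(t_j = t | w_1 ... w_n, t_{j-2}, t_{j-1}) and a padding tag "*" for the positions before the sentence; both are assumptions of this sketch, not part of the lecture.

```python
def viterbi_tag(words, tagset, local_log_prob):
    """Return the highest-probability tag sequence under a trigram log-linear tagger.

    local_log_prob(words, j, t2, t1, t) is assumed to return
    log p(t_j = t | w_1 ... w_n, t_{j-2} = t2, t_{j-1} = t1); positions are 1-based
    and "*" pads the two positions before the start of the sentence.
    """
    n = len(words)
    pi = {("*", "*"): 0.0}     # best log-prob of any prefix ending in this tag bigram
    backpointers = []          # one dict per position j = 1 .. n
    for j in range(1, n + 1):
        new_pi, bp = {}, {}
        for (t2, t1), prev_score in pi.items():
            for t in tagset:
                score = prev_score + local_log_prob(words, j, t2, t1, t)
                if (t1, t) not in new_pi or score > new_pi[(t1, t)]:
                    new_pi[(t1, t)] = score
                    bp[(t1, t)] = t2
        pi = new_pi
        backpointers.append(bp)
    # pick the best final tag bigram, then follow back-pointers right to left
    t_prev, t_last = max(pi, key=pi.get)
    tags = [t_prev, t_last]
    for j in range(n, 2, -1):
        tags.insert(0, backpointers[j - 1][(tags[0], tags[1])])
    return tags[-n:]           # drop the "*" padding for one-word sentences
```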

A General Approach: (Conditional) History-Based Models

- We've shown how to define p(t[1:n] | w[1:n]) where t[1:n] is a tag sequence

- How do we define p(T | S) if T is a parse tree (or another structure)? (We use the notation S = w[1:n])

A General Approach: (Conditional) History-Based Models

- Step 1: represent a tree as a sequence of decisions d_1 ... d_m

  T = 〈d_1, d_2, ..., d_m〉

  m is not necessarily the length of the sentence

- Step 2: the probability of a tree is

  p(T | S) = ∏_{i=1..m} p(d_i | d_1 ... d_{i-1}, S)

- Step 3: Use a log-linear model to estimate p(d_i | d_1 ... d_{i-1}, S)

- Step 4: Search?? (answer we'll get to later: beam or heuristic search)
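Steps 2 and 3 together say that scoring a candidate tree is just summing local log-probabilities over its decision sequence. A minimal sketch, assuming a hypothetical local_log_prob(history, d, sentence) that returns log p(d_i | d_1 ... d_{i-1}, S) from the Step 3 model:

```python
def tree_log_prob(decisions, sentence, local_log_prob):
    """log p(T | S) for a tree written as its decision sequence d_1 ... d_m (Step 2).

    local_log_prob(history, d, sentence) is assumed to return
    log p(d_i | d_1 ... d_{i-1}, S) from the log-linear model of Step 3.
    """
    total = 0.0
    for i, d in enumerate(decisions):
        history = decisions[:i]          # d_1 ... d_{i-1}
        total += local_log_prob(history, d, sentence)
    return total
```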

An Example Tree

(The sentence is "the lawyer questioned the witness about the revolver"; each node is annotated with its head word.)

[S(questioned)
  [NP(lawyer) [DT the] [NN lawyer]]
  [VP(questioned)
    [Vt questioned]
    [NP(witness) [DT the] [NN witness]]
    [PP(about)
      [IN about]
      [NP(revolver) [DT the] [NN revolver]]]]]
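For concreteness, one way to write this tree down as a plain data structure (my own representation, reused in the sketches below): internal nodes are (label, [children]) pairs, and leaves are (tag, word) pairs.

```python
# The example tree as nested tuples: internal nodes are (label, children),
# leaves are (tag, word).  Head-word annotations are kept in the labels.
example_tree = (
    "S(questioned)",
    [("NP(lawyer)", [("DT", "the"), ("NN", "lawyer")]),
     ("VP(questioned)",
      [("Vt", "questioned"),
       ("NP(witness)", [("DT", "the"), ("NN", "witness")]),
       ("PP(about)",
        [("IN", "about"),
         ("NP(revolver)", [("DT", "the"), ("NN", "revolver")])])])],
)
```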

Ratnaparkhi’s Parser: Three Layers of Structure

1. Part-of-speech tags

2. Chunks

3. Remaining structure

Layer 1: Part-of-Speech Tags

the/DT  lawyer/NN  questioned/Vt  the/DT  witness/NN  about/IN  the/DT  revolver/NN

- Step 1: represent a tree as a sequence of decisions d_1 ... d_m

  T = 〈d_1, d_2, ..., d_m〉

- First n decisions are tagging decisions:

  〈d_1 ... d_n〉 = 〈 DT, NN, Vt, DT, NN, IN, DT, NN 〉

Layer 2: Chunks

[NP the/DT lawyer/NN]  questioned/Vt  [NP the/DT witness/NN]  about/IN  [NP the/DT revolver/NN]

Chunks are defined as any phrase where all children are part-of-speech tags

(Other common chunks are ADJP, QP)
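Using the nested-tuple representation introduced above, this definition is easy to state as a predicate; a small sketch under that assumed representation:

```python
def is_chunk(node):
    """True if every child of this constituent is a part-of-speech leaf.

    Assumes the representation above: internal nodes are (label, children)
    with a list of children, leaves are (tag, word) with a string word.
    """
    label, children = node
    return isinstance(children, list) and all(
        isinstance(child[1], str) for child in children
    )
```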

Layer 2: Chunks

the/DT → Start(NP)
lawyer/NN → Join(NP)
questioned/Vt → Other
the/DT → Start(NP)
witness/NN → Join(NP)
about/IN → Other
the/DT → Start(NP)
revolver/NN → Join(NP)

- Step 1: represent a tree as a sequence of decisions d_1 ... d_m

  T = 〈d_1, d_2, ..., d_m〉

- First n decisions are tagging decisions; the next n decisions are chunk tagging decisions:

  〈d_1 ... d_{2n}〉 = 〈 DT, NN, Vt, DT, NN, IN, DT, NN,
                        Start(NP), Join(NP), Other, Start(NP), Join(NP), Other, Start(NP), Join(NP) 〉
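A sketch of how the chunking decisions can be replayed to produce the chunked sequence shown above (my own reconstruction for illustration, not Ratnaparkhi's code; it assumes the (tag, word) leaf representation used earlier):

```python
def apply_chunk_decisions(tagged_words, chunk_decisions):
    """Replay the Layer-2 decisions over the tagged sentence.

    tagged_words: list of (tag, word) leaves from Layer 1.
    chunk_decisions: one of "Start(X)", "Join(X)", "Other" per word.
    Returns a flat forest of chunks (label, children) and un-chunked leaves.
    """
    forest = []
    for (tag, word), decision in zip(tagged_words, chunk_decisions):
        if decision.startswith("Start("):
            label = decision[len("Start("):-1]
            forest.append((label, [(tag, word)]))      # open a new chunk
        elif decision.startswith("Join("):
            forest[-1][1].append((tag, word))          # extend the open chunk
        else:                                          # "Other": leaf stays on its own
            forest.append((tag, word))
    return forest
```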

Layer 3: Remaining Structure

Alternate Between Two Classes of Actions:

- Join(X) or Start(X), where X is a label (NP, S, VP etc.)
- Check=YES or Check=NO

Meaning of these actions:

- Start(X) starts a new constituent with label X
  (always acts on the leftmost constituent with no start or join label above it)

- Join(X) continues a constituent with label X
  (always acts on the leftmost constituent with no start or join label above it)

- Check=NO does nothing

- Check=YES takes the previous Join or Start action, and converts it into a completed constituent
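A minimal sketch of replaying these actions over the chunked forest (my own reconstruction of the bookkeeping described above, not Ratnaparkhi's implementation; constituents use the (label, children) representation from earlier):

```python
def apply_layer3(constituents, decisions):
    """Replay Start(X)/Join(X) and Check decisions to build the remaining structure.

    constituents: the flat forest left after chunking, left to right.
    decisions: strings such as "Start(S)", "Join(VP)", "Check=YES", "Check=NO".
    """
    queue = list(constituents)   # constituents not yet given a start/join label
    annotated = []               # (kind, label, subtree) triples built so far
    pending = None               # constituent just completed by Check=YES
    for d in decisions:
        if d.startswith(("Start(", "Join(")):
            kind, label = d[:-1].split("(")
            # act on the constituent completed by the last Check=YES if there is one,
            # otherwise on the leftmost constituent with no start/join label above it
            node = pending if pending is not None else queue.pop(0)
            pending = None
            annotated.append((kind, label, node))
        elif d == "Check=YES":
            # complete the constituent opened by the most recent Start(X)
            i = max(k for k, (kind, _, _) in enumerate(annotated) if kind == "Start")
            label = annotated[i][1]
            pending = (label, [node for _, _, node in annotated[i:]])
            annotated = annotated[:i]
        # "Check=NO": leave everything as it is
    return pending
```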

[Sequence of figures omitted: each shows the partial structure after one more Layer-3 decision.
Starting from the chunked sentence

  [NP the lawyer]  questioned/Vt  [NP the witness]  about/IN  [NP the revolver]

the decisions applied, in order, are:

  Start(S), Check=NO, Start(VP), Check=NO, Join(VP), Check=NO, Start(PP), Check=NO,
  Join(PP), Check=YES (completing the PP), Join(VP), Check=YES (completing the VP),
  Join(S), Check=YES (completing the S, and the parse).]

The Final Sequence of decisions

〈d_1 ... d_m〉 = 〈 DT, NN, Vt, DT, NN, IN, DT, NN,
                  Start(NP), Join(NP), Other, Start(NP), Join(NP), Other, Start(NP), Join(NP),
                  Start(S), Check=NO, Start(VP), Check=NO, Join(VP), Check=NO, Start(PP), Check=NO,
                  Join(PP), Check=YES, Join(VP), Check=YES, Join(S), Check=YES 〉
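Splitting this sequence into its three layers and feeding the last two to the apply_chunk_decisions and apply_layer3 sketches above reproduces the example tree (head-word annotations aside); a hypothetical end-to-end check:

```python
tags = ["DT", "NN", "Vt", "DT", "NN", "IN", "DT", "NN"]
words = ["the", "lawyer", "questioned", "the", "witness", "about", "the", "revolver"]
chunk_decisions = ["Start(NP)", "Join(NP)", "Other", "Start(NP)", "Join(NP)",
                   "Other", "Start(NP)", "Join(NP)"]
layer3_decisions = ["Start(S)", "Check=NO", "Start(VP)", "Check=NO",
                    "Join(VP)", "Check=NO", "Start(PP)", "Check=NO",
                    "Join(PP)", "Check=YES", "Join(VP)", "Check=YES",
                    "Join(S)", "Check=YES"]

forest = apply_chunk_decisions(list(zip(tags, words)), chunk_decisions)
tree = apply_layer3(forest, layer3_decisions)   # the example S tree, without head words
```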

A General Approach: (Conditional) History-Based Models

- Step 1: represent a tree as a sequence of decisions d_1 ... d_m

  T = 〈d_1, d_2, ..., d_m〉

  m is not necessarily the length of the sentence

- Step 2: the probability of a tree is

  p(T | S) = ∏_{i=1..m} p(d_i | d_1 ... d_{i-1}, S)

- Step 3: Use a log-linear model to estimate p(d_i | d_1 ... d_{i-1}, S)

- Step 4: Search?? (answer we'll get to later: beam or heuristic search)

Applying a Log-Linear Model

- Step 3: Use a log-linear model to estimate p(d_i | d_1 ... d_{i-1}, S)

- A reminder:

  p(d_i | d_1 ... d_{i-1}, S) = exp( f(〈d_1 ... d_{i-1}, S〉, d_i) · v ) / ∑_{d∈A} exp( f(〈d_1 ... d_{i-1}, S〉, d) · v )

where:

- 〈d_1 ... d_{i-1}, S〉 is the history
- d_i is the outcome
- f maps a history/outcome pair to a feature vector
- v is a parameter vector
- A is the set of possible actions
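A minimal sketch of that distribution, with illustrative conventions of my own: f returns a dict of feature values, v is a dict of weights, and actions plays the role of the set A.

```python
import math

def action_probs(history, actions, f, v):
    """p(d | history) under the log-linear model: a softmax of f(history, d) · v.

    f(history, d) is assumed to return a dict {feature_name: value};
    v is a dict of weights; actions is the set A of possible next decisions.
    """
    def score(d):
        return sum(v.get(name, 0.0) * value for name, value in f(history, d).items())

    scores = {d: score(d) for d in actions}
    m = max(scores.values())                         # subtract the max for numerical stability
    exps = {d: math.exp(s - m) for d, s in scores.items()}
    z = sum(exps.values())
    return {d: e / z for d, e in exps.items()}
```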

Applying a Log-Linear Model

- Step 3: Use a log-linear model to estimate

  p(d_i | d_1 ... d_{i-1}, S) = exp( f(〈d_1 ... d_{i-1}, S〉, d_i) · v ) / ∑_{d∈A} exp( f(〈d_1 ... d_{i-1}, S〉, d) · v )

- The big question: how do we define f?

- Ratnaparkhi's method defines f differently depending on whether the next decision is:

  - A tagging decision (same features as before for POS tagging!)
  - A chunking decision
  - A start/join decision after chunking
  - A check=no/check=yes decision

Layer 3: Join or Start

- Looks at head word, constituent (or POS) label, and start/join annotation of the n'th tree relative to the decision, where n = −2, −1

- Looks at head word, constituent (or POS) label of the n'th tree relative to the decision, where n = 0, 1, 2

- Looks at bigram features of the above for (−1, 0) and (0, 1)

- Looks at trigram features of the above for (−2, −1, 0), (−1, 0, 1) and (0, 1, 2)

- The above features with all combinations of head words excluded

- Various punctuation features
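One way these templates could look in code. This is an illustrative rendering only: the context encoding, the string formats, and the omission of the head-word-excluded and punctuation templates are my simplifications, not Ratnaparkhi's exact feature set.

```python
def start_join_features(context, decision):
    """Feature templates over the trees around the decision point.

    context[n] is assumed to describe the n'th tree relative to the decision as a
    dict with keys "head", "label" and, for n < 0, "annotation" (its start/join label).
    Returns a dict of indicator features for the proposed decision.
    """
    def unit(n):
        c = context.get(n, {})
        head = c.get("head", "*NONE*")
        label = c.get("label", "*NONE*")
        if n < 0:
            return f"{label}/{c.get('annotation', '*NONE*')}/{head}"
        return f"{label}/{head}"

    feats = {}
    for n in (-2, -1, 0, 1, 2):                       # single-tree templates
        feats[f"uni_{n}={unit(n)}_d={decision}"] = 1.0
    for pair in ((-1, 0), (0, 1)):                    # bigram templates
        feats["bi_" + str(pair) + "=" + "|".join(unit(n) for n in pair) + f"_d={decision}"] = 1.0
    for tri in ((-2, -1, 0), (-1, 0, 1), (0, 1, 2)):  # trigram templates
        feats["tri_" + str(tri) + "=" + "|".join(unit(n) for n in tri) + f"_d={decision}"] = 1.0
    return feats
```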

Layer 3: Check=NO or Check=YES

- A variety of questions concerning the proposed constituent

The Search Problem

- In POS tagging, we could use the Viterbi algorithm because

  p(t_j | w_1 ... w_n, j, t_1 ... t_{j-1}) = p(t_j | w_1 ... w_n, j, t_{j-2}, t_{j-1})

- Now: decision d_i could depend on arbitrary decisions in the "past" ⇒ no chance for dynamic programming

- Instead, Ratnaparkhi uses a beam search method
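A minimal beam-search sketch in that spirit (my own illustration; Ratnaparkhi's actual procedure also uses a probability threshold alongside a fixed beam width). It is generic over the parser state: expand(state) is assumed to yield (next_state, log-probability of the decision) pairs, and is_complete(state) says whether the derivation has consumed the whole sentence.

```python
def beam_search(initial_state, expand, is_complete, beam_size=20):
    """Keep only the beam_size most probable partial derivations at each step.

    expand(state) yields (next_state, log_prob_of_next_decision) pairs;
    is_complete(state) is True once the derivation covers the whole sentence.
    Returns the highest-probability complete derivation found.
    """
    beam = [(initial_state, 0.0)]                 # (state, log-probability so far)
    while not all(is_complete(state) for state, _ in beam):
        candidates = []
        for state, logp in beam:
            if is_complete(state):
                candidates.append((state, logp))  # finished derivations compete as-is
            else:
                for next_state, dlogp in expand(state):
                    candidates.append((next_state, logp + dlogp))
        candidates.sort(key=lambda item: item[1], reverse=True)
        beam = candidates[:beam_size]
    return max(beam, key=lambda item: item[1])[0]
```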
