
Feature Forest Models for Syntactic Parsing

Yusuke Miyao

University of Tokyo

Probabilistic models for NLP

• Widely used for disambiguation of linguistic structures

• Ex.) POS tagging

[Figure: a tag lattice over "A pretty girl is crying", with candidate tags NN, DT, VBZ, JJ, and VBG for each word, and a primitive probability such as P(NN | a/NN, pretty)]

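To make the primitive probability concrete, here is a minimal sketch (the one-sentence corpus is my own toy stand-in, not data from the talk) of estimating a conditional tag probability of this form by relative frequency:

```python
from collections import defaultdict

# Toy tagged corpus standing in for real training data.
corpus = [[("a", "DT"), ("pretty", "JJ"), ("girl", "NN"),
           ("is", "VBZ"), ("crying", "VBG")]]

context_counts = defaultdict(int)  # counts of (prev_word, prev_tag, word)
joint_counts = defaultdict(int)    # counts of (prev_word, prev_tag, word, tag)

for sentence in corpus:
    for (pw, pt), (w, t) in zip(sentence, sentence[1:]):
        context_counts[(pw, pt, w)] += 1
        joint_counts[(pw, pt, w, t)] += 1

def p_tag(tag, prev_word, prev_tag, word):
    """Relative-frequency estimate of P(tag | prev_word/prev_tag, word),
    the kind of primitive probability written P(NN | a/NN, pretty) above."""
    denom = context_counts[(prev_word, prev_tag, word)]
    return joint_counts[(prev_word, prev_tag, word, tag)] / denom if denom else 0.0

print(p_tag("JJ", "a", "DT", "pretty"))  # 1.0 on this toy corpus
```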

Implicit assumption

• Processing state = Primitive probability
  – Efficient algorithm for searching
  – Avoids exponential explosion of ambiguities

[Figure: the same tag lattice over "A pretty girl is crying"]

POS tag = processing state = primitive probability
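This identification of state and event is what licenses dynamic programming. A minimal Viterbi sketch under that assumption (the scoring function is a toy of my own, not the talk's model):

```python
import math

def viterbi(words, tags, log_p):
    """Find the best tag sequence when each POS tag is both the search
    state and the primitive probabilistic event: dynamic programming
    visits len(words) * len(tags) states instead of scoring
    len(tags) ** len(words) sequences. log_p(prev_tag, tag, word)
    returns a log probability."""
    best = {t: log_p(None, t, words[0]) for t in tags}
    back = []  # back[i][t] = best previous tag for tag t at word i+1
    for w in words[1:]:
        ptr = {t: max(best, key=lambda p, t=t: best[p] + log_p(p, t, w))
               for t in tags}
        best = {t: best[ptr[t]] + log_p(ptr[t], t, w) for t in tags}
        back.append(ptr)
    seq = [max(best, key=best.get)]
    for ptr in reversed(back):
        seq.append(ptr[seq[-1]])
    return list(reversed(seq))

# Toy scores (my own stand-in for a trained model), preferring one tag per word.
GOLD = {"a": "DT", "pretty": "JJ", "girl": "NN", "is": "VBZ", "crying": "VBG"}
def toy_log_p(prev_tag, tag, word):
    return 0.0 if GOLD.get(word) == tag else math.log(0.1)

print(viterbi("a pretty girl is crying".split(),
              ["NN", "DT", "VBZ", "JJ", "VBG"], toy_log_p))
# ['DT', 'JJ', 'NN', 'VBZ', 'VBG']
```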


Is the assumption right?

• Ex.) Shallow parsing, NE recognition
  – B (Begin), I (Internal), O (Other) tags are introduced to represent multi-word tags

[Figure: a lattice over "A pretty girl is crying" with candidate chunk tags NP-B, NP-I, VP-B, VP-I, and O for each word]
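As a small illustration (the helper and the example spans are mine, not from the slides), this is how multi-word chunks map onto per-word B/I/O tags:

```python
def chunks_to_bio(words, chunks):
    """Encode multi-word chunks as per-word tags: X-B for the first word
    of an X chunk, X-I for the rest, O for words outside any chunk.
    `chunks` is a list of (label, start, end) spans with end exclusive."""
    tags = ["O"] * len(words)
    for label, start, end in chunks:
        tags[start] = label + "-B"
        for i in range(start + 1, end):
            tags[i] = label + "-I"
    return tags

words = "a pretty girl is crying".split()
print(chunks_to_bio(words, [("NP", 0, 3), ("VP", 3, 5)]))
# ['NP-B', 'NP-I', 'NP-I', 'VP-B', 'VP-I']
```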


Is the assumption right?

• Ex.) Syntactic parsing
  – Non-local dependencies are not represented

[Figure: a parse tree for "What do you want to give?" with S and VP nodes; the rule probability P(VP | VP → to give) cannot capture the non-local dependency between "what" and "give"]
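Here the primitive probability is a context-free rule probability. A minimal sketch of its relative-frequency estimate, assuming a toy list of treebank rule instances of my own:

```python
from collections import Counter

# Hypothetical rule instances; in practice these are read off treebank trees.
rules = [("VP", ("TO", "VP")), ("VP", ("VB", "NP")), ("VP", ("TO", "VP"))]

rule_counts = Counter(rules)
lhs_counts = Counter(lhs for lhs, _ in rules)

def rule_prob(lhs, rhs):
    """Relative-frequency PCFG rule probability P(lhs -> rhs | lhs).
    No probability of this local form can see the filler of the gap in
    "What do you want to give?", which is the slide's point."""
    return rule_counts[(lhs, rhs)] / lhs_counts[lhs]

print(rule_prob("VP", ("TO", "VP")))  # 2/3 on this toy treebank
```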


Problem of existing models

• Processing state ≠ Primitive probability

• How to model the probability of ambiguous structures with more flexibility?

Possible solution

• A complete structure is a primitive event
  – Ex.) Shallow parsing

• Probability of the sequence of multi-word tags

[Figure: all possible chunk sequences for "A pretty girl is crying", such as NP VP and NP VP NP]



Possible solution

• A complete structure is a primitive event
  – Ex.) Syntactic parsing

• Probability of argument structures

[Figure: the argument structure of "What do you want to give?": the words what, do, you, want, to, and give linked by ARG1, ARG2, and MODIFY dependencies]

Problem

• Complete structures have exponentially many ambiguities

[Figure: exponentially many chunk sequences for "A pretty girl is crying"]
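To see the problem concretely, here is a naive sketch (feature templates are my own illustration) of a log-linear model that takes the complete sequence as its primitive event; its normalizer enumerates every sequence:

```python
import math
from itertools import product

def naive_sequence_probs(words, tags, feats, weights):
    """Log-linear model with the complete tag sequence as the primitive
    event. The normalizer Z sums over |tags| ** len(words) sequences:
    5 tags and 20 words already give roughly 10**14 terms."""
    scores = {seq: math.exp(sum(weights.get(f, 0.0) for f in feats(words, seq)))
              for seq in product(tags, repeat=len(words))}
    z = sum(scores.values())
    return {seq: s / z for seq, s in scores.items()}

def feats(words, seq):
    """Illustrative features: word/tag pairs and tag bigrams."""
    return ([f"{w}/{t}" for w, t in zip(words, seq)] +
            [f"{a}->{b}" for a, b in zip(seq, seq[1:])])

probs = naive_sequence_probs("a pretty girl".split(),
                             ["NP-B", "NP-I", "O"], feats, {"a/NP-B": 1.0})
print(max(probs, key=probs.get))  # the highest-scoring complete sequence
```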


Proposal

• Feature forest model [Miyao and Tsujii, 2002]

[Figure: a feature forest with conjunctive nodes c1–c10 and disjunctive nodes d1–d7; features such as f(c6) = {f1, f2, …} are attached to conjunctive node c6]

• Exponentially many trees are packed

• Features are assigned to each conjunctive node
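A minimal sketch of the key computation, assuming a direct encoding of the two node types (class names are mine): because features attach to conjunctive nodes, the sum over all packed trees factors over the forest:

```python
import math

class Conj:
    """Conjunctive node: local features plus disjunctive children."""
    def __init__(self, feats, children=()):
        self.feats, self.children = feats, children

class Disj:
    """Disjunctive node: a choice among alternative conjunctive nodes."""
    def __init__(self, children):
        self.children = children

def inside(node, weights, memo=None):
    """Sum of exp(weights . features) over every tree packed under `node`.
    Memoizing shared nodes keeps this linear in the forest size, even
    though the forest encodes exponentially many trees."""
    memo = {} if memo is None else memo
    if id(node) in memo:
        return memo[id(node)]
    if isinstance(node, Disj):
        value = sum(inside(c, weights, memo) for c in node.children)
    else:
        value = math.exp(sum(weights.get(f, 0.0) for f in node.feats))
        for d in node.children:
            value *= inside(d, weights, memo)
    memo[id(node)] = value
    return value

# Tiny forest: one disjunctive root choosing between two conjunctive nodes.
root = Disj([Conj(["f1"]), Conj(["f2"])])
print(inside(root, {"f1": 1.0, "f2": 0.0}))  # e**1 + e**0
```

The model's normalizer is inside(root, weights); the feature expectations needed during estimation come from a companion outside pass, in the manner of the inside-outside algorithm.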


Feature forest model

• Feature forest models can be efficiently estimated without exponential explosion [Miyao and Tsujii, 2002]

• When unpacking the forest, the model is equivalent to maximum entropy models [Berger et al., 1996]
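Unpacked, the model takes the standard maximum entropy form (standard notation; T(s) is the set of structures the forest encodes for sentence s, and features decompose over the conjunctive nodes c of a structure t):

```latex
p(t \mid s) = \frac{1}{Z(s)} \exp\Big( \sum_i \lambda_i f_i(t) \Big),
\qquad
Z(s) = \sum_{t' \in T(s)} \exp\Big( \sum_i \lambda_i f_i(t') \Big),
\qquad
f_i(t) = \sum_{c \in t} f_i(c)
```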


Application to parsing

• Applying a feature forest model to disambiguation of argument structures

• How to represent exponential ambiguities of argument structures with a feature forest?
  – Argument structures are not trees but DAGs (including reentrant structures)

[Figure: two candidate argument structures sharing want(ARG1 = I, ARG2 = argue): in one, argue takes only ARG1 = I; in the other, argue also takes ARG2 = fact]
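A minimal sketch (the dict encoding is mine) of what reentrancy means here: one node fills argument slots of two different predicates, so the structure is a DAG rather than a tree:

```python
# Argument structure for "I wanted to argue" as nested dicts: the node
# for "I" fills ARG1 of both "want" and "argue", a reentrant (shared) node.
i = {"pred": "I", "args": {}}
argue = {"pred": "argue", "args": {"ARG1": i}}
want = {"pred": "want", "args": {"ARG1": i, "ARG2": argue}}

# The two ARG1 slots point at the very same object, not at copies.
assert want["args"]["ARG1"] is argue["args"]["ARG1"]
```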


Packing argument structures

• An example including reentrant structures

• Inactive parts: Argument structures whose arguments are all instantiated

• Inactive parts are packed into conjunctive nodes

She neglected the fact that I wanted to argue.

[Figure: the structures are packed step by step; conjunctive nodes are created for argue1(ARG1 = I), want(ARG1 = I, ARG2 = argue1), argue2(ARG1 = I, ARG2 = fact), want(ARG1 = I, ARG2 = argue2), and fact(ARG1 = want) as their arguments become instantiated]
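A minimal sketch (again my own encoding) of the packing criterion itself: a sub-structure is inactive, and can become a conjunctive node, once all of its argument slots are filled:

```python
def is_inactive(args):
    """An argument structure is inactive, and can be packed into a
    conjunctive node, once all of its argument slots are instantiated."""
    return all(filler is not None for filler in args.values())

# Two competing analyses of "argue" in the example sentence.
argue1 = {"ARG1": "I"}                   # intransitive reading
argue2 = {"ARG1": "I", "ARG2": "fact"}   # transitive reading
want_open = {"ARG1": "I", "ARG2": None}  # ARG2 not yet instantiated

print(is_inactive(argue1), is_inactive(argue2), is_inactive(want_open))
# True True False
```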

Feature forest representation of argument structures

She neglected the fact that I wanted to argue.

[Figure: the feature forest for the example sentence, with conjunctive nodes such as argue1(ARG1 = I), want(ARG1 = I, ARG2 = argue1), want(ARG1 = I, ARG2 = argue2), fact(ARG1 = want), and neglect(ARG1 = she, ARG2 = fact)]

Conjunctive nodes correspond to argument structures whose arguments are all instantiated.

Experiments

• Grammar: a treebank grammar of HPSG [Miyao and Tsujii, 2003]

– Extracted from the Penn Treebank [Marcus et al., 1994] Section 02-21

• Training: Section 02-21 of the Penn Treebank
• Test: sentences from Section 22 covered by the grammar
• Measure: Accuracy of dependencies in argument structures

Experiments

• Features: combinations of
  – Surface strings/POS
  – Labels of dependencies (ARG1, ARG2, …)
  – Labels of lexical entries (head noun, transitive, …)
  – Distance

• Estimation algorithm: Limited-memory BFGS algorithm [Nocedal, 1980] with MAP estimation [Chen & Rosenfeld, 1999]
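The MAP objective being maximized is, in standard notation (a Gaussian prior with variance σ², following Chen & Rosenfeld):

```latex
\mathcal{L}(\lambda)
  = \sum_{s} \log p_\lambda(t_s \mid s) - \sum_i \frac{\lambda_i^2}{2\sigma^2},
\qquad
\frac{\partial \mathcal{L}}{\partial \lambda_i}
  = \tilde{E}[f_i] - E_{p_\lambda}[f_i] - \frac{\lambda_i}{\sigma^2}
```

L-BFGS needs only this gradient, and the model expectation E_{p_λ}[f_i] is exactly what the inside-outside computation over the feature forest supplies.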

Preliminary results

• Estimation time: 143 min.

• Accuracy (precision/recall):

                 exact          partial
Baseline         48.1 / 47.4    57.1 / 56.2
Unigram          77.3 / 77.4    81.1 / 81.3
Feature forest   85.5 / 85.3    88.4 / 88.2

Conclusion

• Feature forest models allow the probabilistic modeling of complete structures without exponential explosion

• The application to syntactic parsing resulted in high accuracy

Ongoing work

• Refinement of the grammar and tuning of estimation parameters

• Development of efficient algorithms for best-first/beam search
