Page 1: Towards Parsing Unrestricted Text into PropBank Predicate-Argument Structures

Towards Parsing Unrestricted Text into PropBank Predicate-Argument Structures

ACL4 Project

NCLT Seminar Presentation, 7th June 2006

Conor Cafferkey

Page 2: Towards Parsing Unrestricted Text into PropBank Predicate-Argument Structures

Project Overview

Open research problem:

● Integrating syntactic parsing and semantic role labeling (SRL)

Approach:

● Retraining a history-based generative lexicalized parser (Bikel, 2002)

● Semantically-enriched training corpus (Penn Treebank + PropBank-derived semantic role annotations)

Page 3: Towards Parsing Unrestricted Text into PropBank Predicate-Argument Structures

Treebank Syntactic Bracketing Style

Page 4: Towards Parsing Unrestricted Text into PropBank Predicate-Argument Structures

Treebank Syntactic Bracketing Style

Page 5: Towards Parsing Unrestricted Text into PropBank Predicate-Argument Structures

Semantic Roles

● Relationship that a syntactic constituent has with a predicate

● Predicate-argument relations

● PropBank (Palmer et al., 2005)

Page 6: Towards Parsing Unrestricted Text into PropBank Predicate-Argument Structures

PropBank Predicate-Argument Relations

Frameset: hate.01

ARG0: experiencer

ARG1: target

Page 7: Towards Parsing Unrestricted Text into PropBank Predicate-Argument Structures

PropBank Argument Types

● ARG0 - ARG5: arguments associated with a verb predicate, defined in the PropBank Frames scheme.

● ARGM-XXX: adjunct-like arguments of various sorts, where XXX is the type of the adjunct. Types include locative (LOC), temporal (TMP), manner (MNR), etc.

● ARGA: causative agents.

● rel: the verb of the proposition.

Page 8: Towards Parsing Unrestricted Text into PropBank Predicate-Argument Structures

Current Approaches

● Semantic role labeling (SRL) task:

  – Identify, given a verb:

    ● which nodes of the syntactic tree are arguments of that verb, and

    ● what semantic role each such argument plays with regard to the verb.

Page 9: Towards Parsing Unrestricted Text into PropBank Predicate-Argument Structures

Current Approaches

● “Pipelined” approach

● Parsing → Pruning → ML techniques → post-processing

● CoNLL-2005 (Carreras and Màrquez, 2005)

  – SVM, Random Fields, Random Forests, …

  – Various lexical parameters

Page 10: Towards Parsing Unrestricted Text into PropBank Predicate-Argument Structures

An Integrated Approach to Semantic Parsing

● Integrate syntactic and semantic parsing

● Retrain parser using semantically-enriched corpus (Treebank + PropBank-derived semantic roles)

● Parser itself performs semantic role labeling (SRL)

Page 11: Towards Parsing Unrestricted Text into PropBank Predicate-Argument Structures

Project Components

● “Off-the-shelf”:

  – Parser (Bikel, 2002) emulating Collins’ (1999) model 2

  – Penn Treebank Release 2 (Marcus et al., 1993)

  – PropBank 1.0 (Palmer et al., 2005)

● Written for project (mainly in Python):

  – Scripts to annotate Treebank with PropBank data

  – Script to generate new head-finding rules for Bikel’s parser

  – SRL evaluation scripts

  – Utility scripts (pre-processing, etc.)

Page 12: Towards Parsing Unrestricted Text into PropBank Predicate-Argument Structures

Appending Semantic Roles to Treebank Syntactic Category Labels

wsj/15/wsj_1568.mrg 16 2 gold hate.01 vn--a 0:1-ARG0 2:0-rel 3:1-ARG1
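
The PropBank annotation line above can be read field by field: Treebank file, sentence number, terminal number of the predicate, annotator, roleset, inflection string, and then one `terminal:height-label` pointer per argument. The following is a minimal sketch of such a reader, not the project's actual script. The names `Arg` and `parse_prop_line` are made up for illustration, and the sketch assumes simple pointers only (it does not handle chained or split pointers).

```python
from typing import NamedTuple


class Arg(NamedTuple):
    terminal: int  # number of the terminal the pointer starts from
    height: int    # how many levels to climb from that terminal
    label: str     # e.g. "ARG0", "rel", "ARGM-TMP"


def parse_prop_line(line):
    """Split one PropBank annotation line into its fields (simple pointers only)."""
    fields = line.split()
    path, sentence, predicate, annotator, roleset, inflection = fields[:6]
    args = []
    for item in fields[6:]:
        # Split on the first hyphen only, so "ARGM-TMP" stays intact.
        pointer, label = item.split("-", 1)
        terminal, height = pointer.split(":")
        args.append(Arg(int(terminal), int(height), label))
    return {
        "file": path,
        "sentence": int(sentence),
        "predicate": int(predicate),
        "annotator": annotator,
        "roleset": roleset,
        "inflection": inflection,
        "args": args,
    }


prop = parse_prop_line(
    "wsj/15/wsj_1568.mrg 16 2 gold hate.01 vn--a 0:1-ARG0 2:0-rel 3:1-ARG1"
)
for arg in prop["args"]:
    print(prop["roleset"], arg.label, f"{arg.terminal}:{arg.height}")
```

Each `Arg` identifies a tree node by climbing `height` levels from the given terminal; that node's category label is the one that would receive the semantic role suffix.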

Page 13: Towards Parsing Unrestricted Text into PropBank Predicate-Argument Structures

Syntactic Bracketing Evaluation

• Parseval measures (Black et al., 1992)

\[
\text{precision} = \frac{\text{number of correct constituents in test file}}{\text{number of constituents in test file}}
\]

\[
\text{recall} = \frac{\text{number of correct constituents in test file}}{\text{number of constituents in gold file}}
\]

Page 14: Towards Parsing Unrestricted Text into PropBank Predicate-Argument Structures

Syntactic Bracketing Evaluation

● Harmonic mean of precision and recall:

\[
\text{f-score} = \frac{2 \times \text{precision} \times \text{recall}}{\text{precision} + \text{recall}}
\]
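
The precision, recall and f-score measures can be computed from labeled brackets roughly as follows. This is a simplified sketch, not a reimplementation of evalb: it assumes each constituent is given as a hypothetical `(label, start, end)` tuple and applies none of evalb's normalization (punctuation handling, label equivalences, and so on). The function name `parseval` and the toy brackets are invented for illustration.

```python
from collections import Counter


def parseval(gold, test):
    """Return (precision, recall, f-score) for two lists of labeled brackets."""
    gold_counts, test_counts = Counter(gold), Counter(test)
    # Multiset intersection: a bracket counts as correct at most as often
    # as it occurs in the gold file.
    correct = sum((gold_counts & test_counts).values())
    precision = correct / len(test) if test else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# Toy brackets, invented for illustration: (label, start, end).
gold = [("S", 0, 4), ("NP", 0, 1), ("VP", 2, 4), ("NP", 3, 4)]
test = [("S", 0, 4), ("NP", 0, 1), ("VP", 1, 4), ("NP", 3, 4), ("ADVP", 1, 2)]

precision, recall, f_score = parseval(gold, test)
print(f"precision={precision:.2f} recall={recall:.2f} f-score={f_score:.2f}")
```

Because the labels are compared as whole strings, the same function would also score semantically augmented labels if they are passed in unchanged.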

Page 15: Towards Parsing Unrestricted Text into PropBank Predicate-Argument Structures

Baseline Syntactic Bracketing Performance

Parsing Section 00, trained with sections 02-21 of Penn Treebank (1918 sentences)

Parse Time: 114:41

Page 16: Towards Parsing Unrestricted Text into PropBank Predicate-Argument Structures

Semantically-Augmented Treebanks

● N: augment node labels with ARGNs only

● N-C: augment node labels with conflated ARGNs only

● M: augment node labels with ARGMs only

● M-C: augment node labels with conflated ARGMs only

● NMR: augment node labels with ARGNs, ARGMs and rels
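
The five augmentation schemes can be sketched as a single labeling function. Several details here are assumptions rather than facts from the slides: that the role is joined to the category with a hyphen (e.g. `NP-ARG0`), that "ARGN" means ARG0 to ARG5, and that "conflated" means replacing the specific role with a single generic `ARGN` or `ARGM` tag. The function name `augment_label` is likewise hypothetical.

```python
import re

SCHEMES = ("N", "N-C", "M", "M-C", "NMR")
NUMBERED = re.compile(r"ARG[0-5]$")  # assumption: ARGN covers ARG0..ARG5


def augment_label(category, role, scheme):
    """Return the node label for `category` under one augmentation scheme."""
    if scheme not in SCHEMES:
        raise ValueError(f"unknown scheme: {scheme}")
    is_argn = bool(NUMBERED.match(role))
    is_argm = role.startswith("ARGM")
    if scheme in ("N", "N-C") and is_argn:
        # Assumed meaning of "conflated": one generic tag for all ARGNs.
        suffix = "ARGN" if scheme == "N-C" else role
    elif scheme in ("M", "M-C") and is_argm:
        suffix = "ARGM" if scheme == "M-C" else role
    elif scheme == "NMR" and (is_argn or is_argm or role == "rel"):
        suffix = role
    else:
        return category  # this scheme leaves the node unaugmented
    return f"{category}-{suffix}"


for scheme in SCHEMES:
    print(scheme, augment_label("NP", "ARG0", scheme),
          augment_label("PP", "ARGM-TMP", scheme))
```

Under these assumptions the conflated schemes add only one new suffix to the label set, whereas NMR adds one per distinct role.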

Page 17: Towards Parsing Unrestricted Text into PropBank Predicate-Argument Structures

Syntactic Bracketing Evaluation

Parsing Section 00, trained with sections 02-21 of Penn Treebank (1918 sentences)

Page 18: Towards Parsing Unrestricted Text into PropBank Predicate-Argument Structures

Semantic Evaluation

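\[
\text{precision} = \frac{\text{number of correct role labels in test file}}{\text{number of role labels in test file}}
\]

\[
\text{recall} = \frac{\text{number of correct role labels in test file}}{\text{number of role labels in gold file}}
\]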

Page 19: Towards Parsing Unrestricted Text into PropBank Predicate-Argument Structures

Semantic Evaluation

● Evaluating by terminal number and height

● Evaluating by terminal span

● How strictly to evaluate?
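
One way to see the difference between the two evaluation keys is to map every `terminal:height` pointer in a tree to the terminal span it covers. The sketch below does this for a toy tree that is invented for illustration and is not the Treebank sentence referred to elsewhere in these slides. Trees are assumed to be nested `(label, children)` tuples, with preterminals written as `(tag, word)`; `pointer_spans` is a hypothetical helper name.

```python
def pointer_spans(tree):
    """Map each (terminal, height) pointer to its (start, end) terminal span."""
    spans = []  # spans[i] = (start, end) of the i-th node visited; end is exclusive
    paths = []  # paths[t] = node indices from terminal t up to the root

    def walk(node, start, ancestors):
        label, children = node
        index = len(spans)
        spans.append(None)              # placeholder until the subtree is done
        here = [index] + ancestors
        if isinstance(children, str):   # preterminal: (tag, word)
            paths.append(here)
            end = start + 1
        else:
            end = start
            for child in children:
                end = walk(child, end, here)
        spans[index] = (start, end)
        return end

    walk(tree, 0, [])
    return {(t, h): spans[i]
            for t, path in enumerate(paths)
            for h, i in enumerate(path)}


# Toy tree, invented for illustration.
toy = ("S", [
    ("NP", [("PRP", "They")]),
    ("ADVP", [("RB", "really")]),
    ("VP", [("VBP", "hate"), ("NP", [("NNS", "delays")])]),
])

table = pointer_spans(toy)
print(table[(0, 1)], table[(2, 0)], table[(3, 1)])
```

In this toy tree the pointers `0:0` and `0:1` cover the same span, so an evaluation keyed on terminal number and height distinguishes nodes in a unary chain that a span-based evaluation treats as one. That is one reading of the strictness question raised above.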

Page 20: Towards Parsing Unrestricted Text into PropBank Predicate-Argument Structures

Semantic Role Labeling Evaluation

Parsing Section 00, trained with sections 02-21 of Penn Treebank (1918 sentences)

Page 21: Towards Parsing Unrestricted Text into PropBank Predicate-Argument Structures

Semantic Role Labeling Evaluation

Parsing Section 00, trained with sections 02-21 of Penn Treebank (1918 sentences)

Page 22: Towards Parsing Unrestricted Text into PropBank Predicate-Argument Structures

Syntactic Nodes that Play Multiple Semantic Roles

Page 23: Towards Parsing Unrestricted Text into PropBank Predicate-Argument Structures

Adding More Information

● Co-index the semantic role labels with governing predicate (verb)

● i.e. include the appropriate roleset name in each semantic label augmentation
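
As a minimal sketch of what such a co-indexed label might look like: the exact label format used in the project is not recoverable from this transcript, so the hyphen-joined form below, the function name `coindex_label` and the category labels in the example are all assumptions.

```python
def coindex_label(category, role, roleset):
    """Append a semantic role and its governing roleset to a category label."""
    return f"{category}-{role}-{roleset}"


# Roles and roleset taken from the hate.01 example; the syntactic
# categories (NP, VBP) are assumed for illustration.
labels = [
    coindex_label("NP", "ARG0", "hate.01"),
    coindex_label("VBP", "rel", "hate.01"),
    coindex_label("NP", "ARG1", "hate.01"),
]
print(labels)
```

Each argument label now names its predicate, at the cost of one distinct label per category, role and roleset combination.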

Page 24: Towards Parsing Unrestricted Text into PropBank Predicate-Argument Structures

Co-indexing the Semantic Augmentations

Page 25: Towards Parsing Unrestricted Text into PropBank Predicate-Argument Structures

Adding More Information

● Data sparseness

● Time efficiency

● Need to make some sort of generalizations

● “Syntacto-semantic” verb classes

● VerbNet (Kipper et al., 2000)
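
The effect of generalizing from rolesets to verb classes on the size of the label set can be illustrated with a small count. Everything in this sketch is hypothetical: the annotation triples, the class name `class-A` and the roleset-to-class mapping are invented and are not taken from VerbNet or from the project's data.

```python
def coindexed_labels(annotations, verb_class=None):
    """Build the set of augmented labels, keyed by roleset or by verb class."""
    labels = set()
    for category, role, roleset in annotations:
        # Fall back to the roleset itself when no class is known for it.
        key = verb_class.get(roleset, roleset) if verb_class else roleset
        labels.add(f"{category}-{role}-{key}")
    return labels


# Invented (category, role, roleset) triples for three verbs.
annotations = [
    ("NP", "ARG0", "hate.01"), ("NP", "ARG1", "hate.01"),
    ("NP", "ARG0", "love.01"), ("NP", "ARG1", "love.01"),
    ("NP", "ARG0", "admire.01"), ("NP", "ARG1", "admire.01"),
]
# Invented mapping: all three rolesets share one toy class.
toy_classes = {"hate.01": "class-A", "love.01": "class-A", "admire.01": "class-A"}

by_roleset = coindexed_labels(annotations)
by_class = coindexed_labels(annotations, toy_classes)
print(len(by_roleset), "labels by roleset;", len(by_class), "labels by class")
```

With fewer distinct labels, each one is observed more often in training, which is the motivation for the generalization given the data sparseness noted above.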

Page 26: Towards Parsing Unrestricted Text into PropBank Predicate-Argument Structures

Co-indexing with VerbNet classes

Page 27: Towards Parsing Unrestricted Text into PropBank Predicate-Argument Structures

Future Ideas

● Integrate the (non-co-indexed) output from the re-trained parser into a pipelined SRL system

● Syntactic parsing informed by semantic roles?

  – Recoding the parser to take better advantage of the semantic roles

  – Reranking n-best parser outputs based on semantic roles

Page 28: Towards Parsing Unrestricted Text into PropBank Predicate-Argument Structures

Summary

● Retrained a history-based generative lexicalized parser with semantically-enriched corpus

  – Corpus annotation

  – Generating head-finding rules

● Evaluated parser’s performance

  – Syntactic parsing (evalb)

  – Semantic parsing (SRL)

Page 29: Towards Parsing Unrestricted Text into PropBank Predicate-Argument Structures

References

● Bikel, Daniel M. 2002. Design of a Multi-lingual, Parallel-processing Statistical Parsing Engine. In Proceedings of HLT2002, San Diego, California.

● Black, Ezra, Frederick Jelinek, John D. Lafferty, David M. Magerman, Robert L. Mercer and Salim Roukos. 1992. Towards History-based Grammars: Using Richer Models for Probabilistic Parsing. In Proceedings of the DARPA Speech and Natural Language Workshop, Harriman, New York, pages 134-139. Morgan Kaufmann.

● Carreras, Xavier and Lluís Màrquez. 2005. Introduction to the CoNLL-2005 Shared Task: Semantic Role Labeling. In Proceedings of CoNLL-2005, pages 152-164.

● Collins, Michael John. 1999. Head-driven Statistical Models for Natural Language Parsing. Ph.D. thesis, University of Pennsylvania, Philadelphia.

Page 30: Towards Parsing Unrestricted Text into PropBank Predicate-Argument Structures

References

● Kipper, Karin, Hoa Trang Dang and Martha Palmer. 2000. Class-Based Construction of a Verb Lexicon. In Proceedings of the Seventeenth National Conference on Artificial Intelligence, Austin, Texas.

● Marcus, Mitchell P., Beatrice Santorini and Mary Ann Marcinkiewicz. 1993. Building a large annotated corpus of English: the Penn Treebank. Computational Linguistics, 19(2):313-330.

● Palmer, Martha, Daniel Gildea and Paul Kingsbury. 2005. The Proposition Bank: An Annotated Corpus of Semantic Roles. Computational Linguistics, 31(1):71-106.

● Yi, Szu-ting and Martha Palmer. 2005. The integration of syntactic parsing and semantic role labeling. In Proceedings of CoNLL-2005, pages 237-240.

Page 31: Towards Parsing Unrestricted Text into PropBank Predicate-Argument Structures

http://student.dcu.ie/~cafferc2/