

CS546: Machine Learning and Natural Language

Preparation for the Term Project:
- Dependency Parsing
- Dependency Representation for Semantic Role Labeling

Slides for Dependency Parsing are based on Joakim Nivre and Sandra Kübler's slides from the ACL 2006 tutorial


Outline

– Dependency Parsing:
  • Formalism
  • Dependency parsing algorithms
– Semantic Role Labeling:
  • Dependency formalism
  • Basic approach for the first part of the term project
– Pipeline for the first assignment


• Formalization by Lucien Tesnière [Tesnière, 1959]
• The idea was known long before (e.g., Pāṇini, India, >2000 years ago)
• Studied extensively in the Prague School approach to syntax
• (In the US, research focused more on constituent formalisms)


(or Constituent Structure)


• There are advantages to dependency structures:
  – for free (or semi-free) word order languages
  – easier to convert to predicate-argument structure
  – ...
• But there are drawbacks too...
• You can try to convert one representation into another
  – but, in general, these formalisms are not equivalent

Constituent vs Dependency


• Most approaches have focused on constituent tree-based features
• But this is now changing:
  – Machine translation (e.g., Menezes & Quirk, 07)
  – Summarization and sentence compression (e.g., Filippova & Strube, 08)
  – Opinion mining (e.g., Lerman et al., 08)
  – Information extraction, question answering (e.g., Bouma et al., 06)

Dependency structures for NLP tasks

All these conditions will be violated for the semantic dependency graphs we will consider later.

You can think of it as (related to) planarity.
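The projectivity condition alluded to here (no crossing arcs when drawn above the sentence) is easy to test directly. Below is a minimal sketch, assuming a tree encoded as a `heads` list where `heads[i]` is the head of token `i` (tokens 1..n, with 0 as the artificial root and `heads[0]` unused); the encoding is my choice, not from the slides:

```python
def is_projective(heads):
    """Check that no two dependency arcs cross.

    heads[i] is the head index of token i (0 = artificial root);
    tokens are numbered 1..n and heads[0] is unused. A tree is
    projective iff no two arcs cross, i.e. no arc has exactly one
    endpoint strictly inside another arc's span.
    """
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads[1:], start=1)]
    for (l1, r1) in arcs:
        for (l2, r2) in arcs:
            # Arcs cross if arc 2 starts strictly inside arc 1
            # but ends strictly outside it.
            if l1 < l2 < r1 < r2:
                return False
    return True
```

For "John saw Mary" with heads `[0, 2, 0, 2]` this returns `True`; introducing a crossing pair of arcs such as (1,3) and (2,4) makes it return `False`.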


• Global inference algorithms:
  – graph-based approaches
  – transition-based approaches
• We will not consider:
  – rule-based systems
  – constraint satisfaction

Algorithms


Idea:
• Convert dependency structures to constituent structures
  – easy for projective dependency structures
• Apply constituent parsing algorithms to them
  – e.g., CKY; if some of you attend the parsing class by Julia Hockenmaier, it was/will be covered there

Converting to Constituent Formalism
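To illustrate the "easy for projective structures" direction, here is one hedged sketch of such a conversion: each head projects a bracketed phrase over its (contiguous) yield. This is only one of several possible encodings, chosen for illustration, not the specific one the slides have in mind:

```python
def dep_to_brackets(heads, words):
    """Convert a projective dependency tree into a nested,
    constituent-like bracketing: each head word projects a phrase
    spanning its descendants. heads[i] is the head of token i
    (0 = root, heads[0] unused); words is indexed the same way."""
    children = {i: [] for i in range(len(heads))}
    for d, h in enumerate(heads[1:], start=1):
        children[h].append(d)

    def build(i):
        kids = sorted(children[i])
        if not kids:                      # leaf: just the word
            return words[i]
        parts = ([build(k) for k in kids if k < i]
                 + [words[i]]
                 + [build(k) for k in kids if k > i])
        return "(" + " ".join(parts) + ")"

    root = children[0][0]                 # the root's single child
    return build(root)
```

For "John saw red cars" with heads `[0, 2, 0, 4, 2]` this yields `(John saw (red cars))`: "cars" projects a phrase over its modifier "red", and "saw" projects one over the whole sentence.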


Converting to Constituent Formalism

• Different independence assumptions lead to different statistical models
  – both accuracy and parsing time (dynamic programming) vary

• Features f(i, j) can include dependence on any words in the sentence, i.e., f(i, j, sent)
• But the score still decomposes over edges in the graph
• Strong independence assumption
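The edge-factored scoring just described can be sketched as follows. The concrete feature templates (`head=...|dep=...`, `dist=...`) are hypothetical stand-ins; the point is that features may look at the whole sentence, yet each feature attaches to a single edge, so the tree score is a sum over edges:

```python
def edge_score(w, feats):
    """Dot product of a sparse weight dict and a sparse feature dict."""
    return sum(w.get(f, 0.0) * v for f, v in feats.items())

def edge_features(h, d, sent):
    """Hypothetical edge features f(h, d, sent): free to inspect any
    words in sent, but tied to the single edge h -> d."""
    return {
        f"head={sent[h]}|dep={sent[d]}": 1.0,
        f"dist={d - h}": 1.0,
    }

def tree_score(w, heads, sent):
    """Edge-factored tree score: the sum of its edge scores.  This is
    the strong independence assumption that makes MST / Eisner-style
    decoding tractable.  sent[0] is the artificial root symbol."""
    return sum(edge_score(w, edge_features(h, d, sent))
               for d, h in enumerate(heads[1:], start=1))
```

Swapping in richer feature templates changes the model but not the decomposition, which is what the decoding algorithms rely on.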

Online Learning (Structured Perceptron)

• Joint feature representation:
  – we will talk about it more later
• Algorithm:

Here we run MST or Eisner’s algorithm

Features over edges only
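A minimal sketch of one structured-perceptron step under these assumptions: `decode` stands in for the MST/Eisner decoder mentioned above, and `feat_fn` for the edge feature function; both names (and the sparse-dict weight representation) are mine, not from the slides:

```python
def perceptron_update(w, sent, gold_heads, decode, feat_fn):
    """One structured-perceptron step over edge-factored features.

    w          : dict feature -> weight, updated in place
    gold_heads : gold head of each token 1..n (index 0 unused)
    decode     : function (w, sent) -> predicted heads
                 (stand-in for MST or Eisner's algorithm)
    feat_fn    : edge feature function f(h, d, sent) -> sparse dict
    """
    pred_heads = decode(w, sent)
    if pred_heads == gold_heads:          # correct tree: no update
        return w
    # Add the gold tree's features, subtract the prediction's.
    for d in range(1, len(sent)):
        for feats, sign in ((feat_fn(gold_heads[d], d, sent), +1.0),
                            (feat_fn(pred_heads[d], d, sent), -1.0)):
            for f, v in feats.items():
                w[f] = w.get(f, 0.0) + sign * v
    return w
```

Because features decompose over edges, the update only touches features of edges where gold and prediction differ in aggregate, which keeps each step cheap.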

• Here, when we say parsing algorithm (= derivation order), we often mean a mapping:
  – given a tree, map it to the sequence of actions which creates this tree
• A tree T is equivalent to this sequence of actions: d1, ..., dn
• Therefore, P(T) = P(d1, ..., dn)
• P(T) = P(d1, ..., dn) = P(d1) P(d2 | d1) ... P(dn | dn-1, ..., d1)
• Ambiguous: sometimes “parsing algorithm” refers to the decoding algorithm used to find the most likely sequence

Parsing Algorithms

You can use classifiers here and search for the most likely sequence (recall Maryam’s talk)

• Most algorithms are restricted to projective structures, but not all

It can handle only projective structures
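The claim that a (projective) tree is equivalent to its action sequence d1, ..., dn can be made concrete with a small transition executor. This sketch uses the arc-standard transition system (one common choice; the slides do not commit to a specific one):

```python
def run_transitions(n, actions):
    """Apply arc-standard transitions to a sentence of n tokens
    (numbered 1..n; 0 is the artificial root). Returns the heads
    list, so replaying d1, ..., dn recovers the tree exactly.
      'SH' -- shift the front of the queue onto the stack
      'LA' -- left-arc:  attach second-from-top under top of stack
      'RA' -- right-arc: attach top of stack under second-from-top
    """
    stack, queue = [0], list(range(1, n + 1))
    heads = [0] * (n + 1)
    for a in actions:
        if a == "SH":
            stack.append(queue.pop(0))
        elif a == "LA":
            dep = stack.pop(-2)           # second-from-top depends...
            heads[dep] = stack[-1]        # ...on the top of the stack
        elif a == "RA":
            dep = stack.pop()             # top depends on...
            heads[dep] = stack[-1]        # ...the new top
    return heads
```

For "John saw Mary" (heads 2, 0, 2), the sequence SH, SH, LA, SH, RA, RA rebuilds the tree. Since every arc is created between adjacent stack positions, only projective trees are reachable, matching the note above.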

• Your training examples are {(dj; d1, ..., dj-1)} -- collections of parsing contexts
• You want to predict the correct actions: P(dj | dj-1, ..., d1)
• How to define a feature representation of (dj-1, ..., d1)?
• Instead of (dj-1, ..., d1), you can think in terms of:
  – the partial tree corresponding to them
  – the current contents of the queue (Q) and stack (S)
  – the most important features are the top of S and the front of Q (only between them can you potentially create links)
• (Inference: you can do it greedily or with beam search)

How to learn in this case?
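A sketch of such a configuration-based feature representation, with the history replaced by the current stack and queue. The particular templates (word and tag of the stack top and queue front, plus their tag pair) are illustrative assumptions, not the slides' exact feature set:

```python
def state_features(stack, queue, words, tags):
    """Feature representation of a parser configuration: instead of
    the full action history (d1, ..., dj-1), condition on the current
    stack S and queue Q. The most informative positions are the top
    of S (s0) and the front of Q (q0) -- the only pair between which
    a new arc can be created."""
    s0 = stack[-1] if stack else None
    q0 = queue[0] if queue else None
    feats = {}
    if s0 is not None:
        feats[f"s0.w={words[s0]}"] = 1.0   # word at stack top
        feats[f"s0.t={tags[s0]}"] = 1.0    # its POS tag
    if q0 is not None:
        feats[f"q0.w={words[q0]}"] = 1.0   # word at queue front
        feats[f"q0.t={tags[q0]}"] = 1.0
    if s0 is not None and q0 is not None:
        # Tag-pair feature over the candidate arc endpoints.
        feats[f"s0.t+q0.t={tags[s0]}+{tags[q0]}"] = 1.0
    return feats
```

These sparse dicts can feed any standard classifier, scored greedily or inside a beam as the slide notes.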

CoNLL-2006 Shared Task, average over 12 languages (Labeled Attachment Score):

McDonald et al. (MST): 80.27
Nivre et al. (Transitions): 80.19

• Results are essentially the same
• A lot of research in both directions
  – e.g., Latent Variable Models for Transition-Based Parsing (Titov and Henderson, 07) -- best single-model system in CoNLL-2007 (third overall)

Results: Transition-based vs Graph-Based
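The metric quoted here, Labeled Attachment Score, is simply the fraction of tokens whose predicted head and dependency label both match the gold standard. A minimal sketch:

```python
def labeled_attachment_score(gold, pred):
    """Labeled Attachment Score (LAS), as a percentage: a token
    counts as correct only if both its head and its dependency
    label match the gold standard.
    gold, pred: parallel lists of (head, label) pairs, one per token."""
    assert len(gold) == len(pred)
    correct = sum(1 for g, p in zip(gold, pred) if g == p)
    return 100.0 * correct / len(gold)
```

Dropping the label comparison (matching heads only) gives the unlabeled variant, UAS.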

• Graph-based algorithms (McDonald)
• Post-processing of projective algorithms (Hall and Novak, 05)
• Transition-based algorithms which handle non-projectivity (Attardi, 06; Titov et al., 08; Nivre et al., 08)
• Pseudo-projective parsing: removing non-projective (crossing) links and encoding them in labels (Nivre and Nilsson, 05)

Non-Projective Parsing



First Phase of Term Project

– The goal is to construct joint syntax-SRL (Semantic Role Labeling) dependency structures
– Similar to the CoNLL-2008, 09 shared tasks
– The 2nd phase will focus on SRL
– For now we need to create the entire pipeline:
  • Tagger: SVM tagger
  • Pseudo-projective transformations: tool by Nilsson & Nivre
  • Dependency parser: MaltParser by Nivre et al.
  • Implement a basic classifier for SRL (see next slide)
– Due after Spring Break
– I’ll send the description by email


First Phase of Term Project


Syntactic structure

Semantic structure

• Properties of the semantic (SRL) structure:
  – multiple heads (parents)
  – need to annotate predicates with senses (predicates are potential parents in the graph) -- not indicated in the figure

It is not the most standard formalism for SRL


SRL Pipeline

– 1st stage: For every word, decide whether it is a predicate (binary classification)
– 2nd stage: For all words which are predicates, predict their sense
– 3rd stage: For every pair of words, decide:
  • word A is an argument of word B
  • word B is an argument of word A
  • there is no SRL relation between them
  (constraint: only predicates can be parents)
– 4th stage: Label all the relations
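The four stages above can be sketched as one function. The classifier arguments (`is_pred`, `sense_of`, `is_arg`, `label_of`) are hypothetical stand-ins for whatever trained models you plug in; only the control flow reflects the pipeline:

```python
def srl_pipeline(words, is_pred, sense_of, is_arg, label_of):
    """Sketch of the 4-stage SRL pipeline:
      is_pred(i)     -> bool   stage 1: predicate identification
      sense_of(i)    -> str    stage 2: predicate sense
      is_arg(p, a)   -> bool   stage 3: is word a an argument of
                               predicate p?  (only predicates can
                               be parents, so p ranges over preds)
      label_of(p, a) -> str    stage 4: role label for the relation
    Returns (senses, arcs), arcs as (predicate, argument, role).
    """
    preds = [i for i in range(len(words)) if is_pred(i)]       # stage 1
    senses = {p: sense_of(p) for p in preds}                   # stage 2
    arcs = [(p, a, label_of(p, a))                             # stages 3-4
            for p in preds
            for a in range(len(words))
            if a != p and is_arg(p, a)]
    return senses, arcs
```

Note how the "only predicates can be parents" constraint is enforced structurally: stage 3 only ever asks about pairs whose first element passed stage 1.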


SRL Pipeline

– Use any features:
  • hint: dependency parse features are going to be very useful
  • see the CoNLL-2008 shared task papers for which features were useful
– Use any learning algorithm:
  • you can use a package (e.g., SNoW)
  • or implement one (e.g., averaged perceptron is easy)
– Do not use any SRL tools


Next lectures

– I will be away for 2 weeks
– Next week (Mar 9 – Mar 15):
  • Wednesday: Alex Klementiev on weak supervision
  • Friday: Kevin Small on active learning
  + student presentation by Ryan on Friday
– 2nd week (Mar 16 – Mar 22):
  • work on the project
– 1st phase will be due around April 1 (exact dates later)
