Transition-based Semantic Role Labeling Using Predicate Argument Clustering
Jinho D. Choi & Martha Palmer, University of Colorado at Boulder
Workshop on Relational Models of Semantics, June 23, 2011


DESCRIPTION

This paper suggests two ways of improving semantic role labeling (SRL). First, we introduce a novel transition-based SRL algorithm that takes a quite different approach to the task. Our algorithm is inspired by shift-reduce parsing and brings the advantages of the transition-based approach to SRL. Second, we present a self-learning clustering technique that effectively improves labeling accuracy in the test domain. For better generalization of the statistical models, we cluster verb predicates by comparing their predicate argument structures and apply the clustering information to the final labeling decisions. All approaches are evaluated on the CoNLL'09 English data. The new algorithm shows results comparable to another state-of-the-art system, and the clustering technique improves labeling accuracy for both in-domain and out-of-domain tasks.

TRANSCRIPT

Page 1: Transition-based Semantic Role Labeling Using Predicate Argument Clustering

Transition-based Semantic Role Labeling Using Predicate Argument Clustering

Jinho D. Choi & Martha Palmer, University of Colorado at Boulder

June 23rd, 2011

Workshop on Relational Models of Semantics


Page 2: Transition-based Semantic Role Labeling Using Predicate Argument Clustering

Dependency-based SRL

• Semantic role labeling

- Task of identifying arguments of each predicate and labeling them with semantic roles in relation to the predicate.

• Dependency-based semantic role labeling

- Advantages over constituent-based semantic role labeling.

• Dependency parsing is faster (2.29 milliseconds / sentence).

• Dependency structure is more similar to predicate argument structure.

- Labels headwords instead of phrases.

• Can still recover the original semantic chunks most of the time (Choi and Palmer, LAW 2010).


Page 3: Transition-based Semantic Role Labeling Using Predicate Argument Clustering

Dependency-based SRL

• Constituent-based vs. dependency-based SRL


[Figure: constituent tree (S, NP, VP, PP) for "He opened the door with his foot at ten", with the constituents labeled Agent ("He"), Theme ("the door"), Instrument ("with his foot"), and Temporal ("at ten").]


Page 4: Transition-based Semantic Role Labeling Using Predicate Argument Clustering

Dependency-based SRL

• Constituent-based vs. dependency-based SRL


[Figure: dependency-based SRL for "He opened the door with his foot at ten" — the headwords He, door, with, and at are the arguments ARG0, ARG1, ARG2, and TMP of "opened", via the syntactic dependencies SBJ, OBJ, ADV, and TMP.]


Page 5: Transition-based Semantic Role Labeling Using Predicate Argument Clustering

Motivations

• Do argument identification and classification need to be in separate steps?

- They may require two different feature sets.

- Training them in a pipeline takes less time than as a joint-inference task.

- We have seen the advantages of treating them as a joint-inference task in dependency parsing; why not in SRL?


Page 6: Transition-based Semantic Role Labeling Using Predicate Argument Clustering

Transition-based SRL

• Dependency parsing vs. dependency-based SRL

- Both try to find relations between word pairs.

- Dep-based SRL is a special kind of dep. parsing.

• It restricts the search only to top-down relations between predicate (head) and argument (dependent) pairs.

• It allows multiple predicates for each argument.

• Transition-based SRL algorithm

- Top-down, bidirectional search.

- Easier to develop a joint-inference system between dependency parsing and semantic role labeling.


→ More suitable for SRL


Page 7: Transition-based Semantic Role Labeling Using Predicate Argument Clustering

Transition-based SRL

• Parsing states

- (λ1, λ2, p, λ3, λ4, A)

- p - index of the current predicate candidate.

- λ1 - indices of lefthand-side argument candidates.

- λ4 - indices of righthand-side argument candidates.

- λ2,3 - indices of processed tokens.

- A - labeled arcs with semantic roles

• Initialization: ([ ], [ ], 1, [ ], [2, ..., n], ∅)

• Termination: (λ1, λ2, ∄, [ ], [ ], A)
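As a reading aid, here is a minimal Python sketch of the state tuple above; the class layout, field names, and the use of None for the "no predicate" (∄) case are illustrative assumptions, not the authors' implementation.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple

@dataclass
class State:
    """Parsing state (lambda_1, lambda_2, p, lambda_3, lambda_4, A); token indices are 1-based."""
    lambda1: List[int]                    # left-hand-side argument candidates
    lambda2: List[int]                    # processed tokens on the left
    p: Optional[int]                      # current predicate candidate (None once exhausted)
    lambda3: List[int]                    # processed tokens on the right
    lambda4: List[int]                    # right-hand-side argument candidates
    arcs: Set[Tuple[int, int, str]] = field(default_factory=set)   # (predicate, argument, role)

def initial_state(n: int) -> State:
    """Initialization: ([ ], [ ], 1, [ ], [2, ..., n], {})."""
    return State([], [], 1, [], list(range(2, n + 1)), set())

def is_terminal(s: State) -> bool:
    """Termination: no predicate candidate left and both candidate lists are empty."""
    return s.p is None and not s.lambda3 and not s.lambda4
```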


Page 8: Transition-based Semantic Role Labeling Using Predicate Argument Clustering

Transition-based SRL

• Transitions

- No-Pred - finds the next predicate candidate.

- No-Arc← - rejects the lefthand-side argument candidate.

- No-Arc→ - rejects the righthand-side argument candidate.

- Left-Arc← - accepts the lefthand-side argument candidate.

- Right-Arc→ - accepts the righthand-side argument candidate.
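The sketch below continues the State class above with one function per transition. The bookkeeping is a simplified reading of the one-line descriptions on this slide (e.g., what exactly No-Pred does to λ2/λ3, and which candidate is "the" left or right candidate), and the Shift transition that appears in the walkthrough on the next slide is not modeled; treat it as an illustration, not the authors' algorithm.

```python
def no_pred(s: State, n: int) -> State:
    """No-Pred: give up on w_p and move to the next predicate candidate; tokens to its
    left/right become the new left/right argument candidates (a simplifying assumption)."""
    q = s.p + 1
    if q > n:                                             # no candidates left: terminal form
        return State(s.lambda1, s.lambda2, None, [], [], s.arcs)
    return State(list(range(1, q)), [], q, [], list(range(q + 1, n + 1)), s.arcs)

def no_arc_left(s: State) -> State:
    """No-Arc(left): reject the nearest left-hand-side candidate."""
    return State(s.lambda1[:-1], s.lambda1[-1:] + s.lambda2, s.p, s.lambda3, s.lambda4, s.arcs)

def no_arc_right(s: State) -> State:
    """No-Arc(right): reject the nearest right-hand-side candidate."""
    return State(s.lambda1, s.lambda2, s.p, s.lambda3 + s.lambda4[:1], s.lambda4[1:], s.arcs)

def left_arc(s: State, role: str) -> State:
    """Left-Arc: accept the nearest left candidate as an argument of w_p with the given role."""
    arg = s.lambda1[-1]
    return State(s.lambda1[:-1], [arg] + s.lambda2, s.p, s.lambda3, s.lambda4,
                 s.arcs | {(s.p, arg, role)})

def right_arc(s: State, role: str) -> State:
    """Right-Arc: accept the nearest right candidate as an argument of w_p with the given role."""
    arg = s.lambda4[0]
    return State(s.lambda1, s.lambda2, s.p, s.lambda3 + [arg], s.lambda4[1:],
                 s.arcs | {(s.p, arg, role)})
```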


Page 9: Transition-based Semantic Role Labeling Using Predicate Argument Clustering

Example derivation for "John wants to buy a car":

• Left-Arc : John ← wants

• No-Pred

• Right-Arc : wants → to

• No-Arc x 3

• Shift

• No-Arc x 2

• No-Pred

• Left-Arc : John ← buy

• No-Arc

• Right-Arc : buy → car

[Figure: the parsing states (λ1, λ2, p, λ3, λ4, A) produced for "John1 wants2 to3 buy4 a5 car6"; "wants" takes John as A0 and to(buy) as A1, and "buy" takes John as A0 and car as A1.]


Page 10: Transition-based Semantic Role Labeling Using Predicate Argument Clustering

Features

• Baseline features

- N-gram and binary features (similar to those in Johansson and Nugues, EMNLP 2008).

- Structural features (illustrated in the figure and sketch below).


[Figure: dependency tree fragment headed by "wants" (SBJ: John/PRP, OPRD: to/TO, IM: buy/VB), with three example feature values — subcategorization of "wants": SBJ ← VV → OPRD; path from "John" to "buy": PRP ↑ LCA ↓ TO ↓ VB (POS tags) and SBJ ↑ LCA ↓ OPRD ↓ IM (dependency labels); depth from "John" to "buy": 1 ↑ LCA ↓ 2.]
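A small reconstruction of how the path and depth features above could be computed from a head array; the separator characters mirror the slide (↑/↓ around the lowest common ancestor, LCA), but the function names, signatures, and the example parse in the comments are assumptions rather than the authors' feature extractor.

```python
from typing import Dict, List

def ancestors(heads: Dict[int, int], i: int) -> List[int]:
    """Chain from token i up to the root (a root's head is 0)."""
    chain = [i]
    while heads[chain[-1]] != 0:
        chain.append(heads[chain[-1]])
    return chain

def path_feature(heads: Dict[int, int], tags: Dict[int, str], arg: int, pred: int) -> str:
    """Tag path from the argument up to the LCA and down to the predicate;
    `tags` may hold POS tags or dependency labels."""
    up, down = ancestors(heads, arg), ancestors(heads, pred)
    lca = next(a for a in up if a in down)
    up_tags = [tags[i] for i in up[:up.index(lca)]]                  # arg ... just below the LCA
    down_tags = [tags[i] for i in reversed(down[:down.index(lca)])]  # just below the LCA ... pred
    return '↑'.join(up_tags + ['LCA']) + ''.join('↓' + t for t in down_tags)

def depth_feature(heads: Dict[int, int], arg: int, pred: int) -> str:
    """Distances to the LCA on each side."""
    up, down = ancestors(heads, arg), ancestors(heads, pred)
    lca = next(a for a in up if a in down)
    return f"{up.index(lca)}↑LCA↓{down.index(lca)}"

# Assumed parse of "John wants to buy a car": heads = {1: 2, 2: 0, 3: 2, 4: 3, 5: 6, 6: 4},
# POS tags = {1: 'PRP', 2: 'VBZ', 3: 'TO', 4: 'VB', 5: 'DT', 6: 'NN'}.
# path_feature(heads, pos, 1, 4) -> 'PRP↑LCA↓TO↓VB'; depth_feature(heads, 1, 4) -> '1↑LCA↓2'.
```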


Page 11: Transition-based Semantic Role Labeling Using Predicate Argument Clustering

Features

• Dynamic features

- Derived from previously identified arguments.

- Previously identified argument label of w_arg.

- Label of the very last predicted numbered argument of w_pred.

- These features can narrow down the scope of expected arguments of w_pred (see the sketch after the example below).


[Example: "John1 wants2 to3 buy4 a5 car6", where both "wants" and "buy" take John as A0 and a following A1 argument, as in the earlier derivation.]
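One possible way to maintain the two dynamic features while decoding is sketched below; the class and its bookkeeping are illustrative, not taken from the paper.

```python
from collections import defaultdict
from typing import Dict, Set

NUMBERED = {'A0', 'A1', 'A2', 'A3', 'A4', 'A5'}

class DynamicFeatures:
    """Tracks previously predicted arcs so the two dynamic features can be extracted."""
    def __init__(self) -> None:
        self.labels_of_arg: Dict[int, Set[str]] = defaultdict(set)  # w_arg -> labels it already received
        self.last_numbered: Dict[int, str] = {}                     # w_pred -> last predicted numbered argument

    def update(self, pred: int, arg: int, role: str) -> None:
        self.labels_of_arg[arg].add(role)
        if role in NUMBERED:
            self.last_numbered[pred] = role

    def extract(self, pred: int, arg: int) -> Dict[str, object]:
        return {
            'prev_labels_of_arg': sorted(self.labels_of_arg[arg]),      # e.g. ['A0'] for John
            'last_numbered_arg_of_pred': self.last_numbered.get(pred),  # e.g. 'A0' before predicting A1
        }
```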


Page 12: Transition-based Semantic Role Labeling Using Predicate Argument Clustering

Experiments

• Corpora

- CoNLL’09 English data.

- In-domain task: the Wall Street Journal.

- Out-of-domain task: the Brown corpus.

• Input to our semantic role labeler

- Automatically generated dependency trees.

- Used our open-source dependency parser, ClearParser.

• Machine learning algorithm

- LIBLINEAR L2-regularized, L1-loss SVM (see the sketch below).
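For reference, this is LIBLINEAR's L2-regularized L1-loss (hinge) SVM; a rough scikit-learn stand-in is sketched below. The hyperparameter value is a placeholder, not the authors' setting, and the authors used LIBLINEAR directly rather than this wrapper.

```python
from sklearn.svm import LinearSVC

# L2 regularization with L1 (hinge) loss, the same solver family LIBLINEAR exposes.
clf = LinearSVC(penalty='l2', loss='hinge', C=0.1)
# clf.fit(X_train, y_train)   # X_train: sparse binary feature matrix, y_train: transition/role labels
```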


Page 13: Transition-based Semantic Role Labeling Using Predicate Argument Clustering

Experiments

• Results

- AI - Argument Identification.

- AC - Argument Classification.


System     Task     In-domain (P / R / F1)     Out-of-domain (P / R / F1)
Baseline   AI       92.57 / 88.44 / 90.46      90.96 / 81.57 / 86.01
Baseline   AI+AC    87.20 / 83.31 / 85.21      77.11 / 69.14 / 72.91
+Dynamic   AI       92.38 / 88.76 / 90.54      90.90 / 82.25 / 86.36
+Dynamic   AI+AC    87.33 / 83.91 / 85.59      77.41 / 70.05 / 73.55
JN'08      AI+AC    88.46 / 83.55 / 85.93      77.67 / 69.63 / 73.43

(JN'08 = Johansson and Nugues, 2008)


Page 14: Transition-based Semantic Role Labeling Using Predicate Argument Clustering

Summary

• Introduced a transition-based SRL algorithm showing near state-of-the-art results.

- No need to design separate systems for argument identification and classification.

- Makes it easier to develop a joint-inference system between dependency parsing and semantic role labeling.

• Future work

- Several techniques designed to improve transition-based parsing can be applied (e.g., dynamic programming, k-best ranking).

- We can apply more features, such as clustering information, to improve labeling accuracy.


Page 15: Transition-based Semantic Role Labeling Using Predicate Argument Clustering

Predicate Argument Clustering

• Verb clusters can give more generalization to the statistical models.

- Clustering verbs using bag-of-words or syntactic structure (previous work).

- Clustering verbs using predicate argument structure.

• Self-learning clustering

- Cluster verbs in the test data using automatically generated predicate argument structures.

- Cluster verbs in the training data using the verb clusters found in the test data as seeds.

- Re-run our semantic role labeler on the test data using the clustering information.
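Read as pseudocode, the loop above might look like the sketch below. Every callable name and signature here is illustrative, and the retraining step is an assumption about how the clustering information enters the final labeling pass.

```python
from typing import Callable, Iterable

def self_learning_srl(label: Callable,            # runs the SRL model over a dataset
                      retrain: Callable,          # retrains the SRL model on a dataset
                      cluster_test: Callable,     # k-best hierarchical clustering (test verbs)
                      cluster_train: Callable,    # seeded k-means clustering (training verbs)
                      add_cluster_features: Callable,
                      train_data: Iterable, test_data: Iterable):
    # 1. Label the test data with the transition-based SRL model.
    predictions = label(test_data)
    # 2. Cluster test-domain verbs by their automatically generated predicate argument structures.
    test_clusters = cluster_test(predictions)
    # 3. Cluster training-domain verbs, using the test clusters as seeds.
    train_clusters = cluster_train(train_data, seeds=test_clusters)
    # 4. Add cluster features, retrain, and re-run the labeler on the test data.
    retrain(add_cluster_features(train_data, train_clusters))
    return label(add_cluster_features(test_data, test_clusters))
```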


Page 16: Transition-based Semantic Role Labeling Using Predicate Argument Clustering

Predicate Argument Clustering


Verb    A0   A1   john:A0   to:A1   car:A1   ...
want    1    1    1         1       0
buy     1    1    1         0       1

Table 2 shows parsing states generated by our algorithm. Our experiments show that this algorithm gives results comparable to another state-of-the-art system.

3 Predicate argument clustering

Some studies showed that verb clustering information could improve performance in semantic role labeling (Gildea and Jurafsky, 2002; Pradhan et al., 2008). This is because semantic role labelers usually perform worse on verbs not seen during training, for which the clustering information can provide useful features. Most previous studies used either bag-of-words or syntactic structure to cluster verbs; however, this may or may not capture the nature of predicate argument structure, which is more semantically oriented. Thus, it is preferable to cluster verbs by their predicate argument structures to get optimized features for semantic role labeling.

In this section, we present a self-learning clustering technique that effectively improves labeling accuracy in the test domain. First, we perform semantic role labeling on the test data using the algorithm in Section 2. Next, we cluster verbs in the test data using predicate argument structures generated by our semantic role labeler (Section 3.2). Then, we cluster verbs in the training data using the verb clusters we found in the test data (Section 3.3). Finally, we re-run our semantic role labeler on the test data using the clustering information. Our experiments show that this technique improves labeling accuracy for both in-domain and out-of-domain tasks.

3.1 Projecting predicate argument structure into vector space

Before clustering, we need to project the predicate argument structure of each verb into vector space. Two kinds of features are used to represent these vectors: semantic role labels, and joined tags of semantic role labels and their corresponding word lemmas. Figure 2 shows vector representations of the predicate argument structures of the verbs want and buy in Figure 1.

Figure 2: Projecting the predicate argument structure of each verb into vector space (see the table above).

Initially, all existing and non-existing features are assigned a value of 1 and 0, respectively. However, assigning equal values to all existing features is not necessarily fair, because some features have higher confidence, or are more important, than others; e.g., ARG0 and ARG1 are generally predicted with higher confidence than modifiers, nouns give more important information than some other grammatical categories, etc. Instead, we assign each existing feature a value computed by the following equations:

s(l_j \mid v_i) = \frac{1}{1 + \exp(-\mathrm{score}(l_j \mid v_i))}

s(m_j, l_j) = \begin{cases} 1 & (w_j \neq \text{noun}) \\ \exp\!\left(\frac{\mathrm{count}(m_j, l_j)}{\sum_{\forall k} \mathrm{count}(m_k, l_k)}\right) & \text{otherwise} \end{cases}

v_i is the current verb, l_j is the j-th label of v_i, and m_j is l_j's corresponding lemma. score(l_j | v_i) is a score of l_j being a correct argument label of v_i; this is always 1 for training data and is provided by our statistical models for test data. Thus, s(l_j | v_i) is an approximated probability of l_j being a correct argument label of v_i, estimated by the logistic function. s(m_j, l_j) is equal to 1 if w_j is not a noun. If w_j is a noun, it gets a value ≥ 1 given the maximum likelihood of m_j co-occurring with l_j.²
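A direct transcription of the two equations into Python is sketched below; the data layout (one (label, lemma, is_noun, score) tuple per argument) and the scope of the count normalization are assumptions.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

def s_label(score: float) -> float:
    """s(l_j | v_i) = 1 / (1 + exp(-score(l_j | v_i))); score is 1 for gold training
    arguments and comes from the statistical models for predicted test arguments."""
    return 1.0 / (1.0 + math.exp(-score))

def s_lemma(lemma: str, label: str, is_noun: bool, counts: Counter) -> float:
    """s(m_j, l_j): 1 unless the headword is a noun; otherwise exp of the relative
    co-occurrence frequency of (lemma, label), so the value is >= 1."""
    if not is_noun:
        return 1.0
    total = sum(counts.values()) or 1
    return math.exp(counts[(lemma, label)] / total)

def project_verb(arguments: Iterable[Tuple[str, str, bool, float]],
                 counts: Counter) -> Dict[str, float]:
    """Builds the sparse vector of Figure 2: one dimension per label and one per lemma:label pair."""
    vec: Dict[str, float] = defaultdict(float)
    for label, lemma, is_noun, score in arguments:
        vec[label] = s_label(score)
        vec[f"{lemma}:{label}"] = s_lemma(lemma, label, is_noun, counts)
    return dict(vec)
```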

With the vector representation, we can apply any kind of clustering algorithm (Hofmann and Puzicha, 1998; Kamvar et al., 2002). For our experiments, we use k-best hierarchical clustering for test data, and k-means clustering for training data.

3.2 Clustering verbs in test data

Given automatically generated predicate argument structures in the test data, we apply k-best hierarchical clustering, a relaxation of classical hierarchical agglomerative clustering (HAC; Ward, 1963), to find verb clusters. Unlike HAC, which merges a pair of clusters at each iteration, k-best hierarchical clustering merges the k best pairs at each iteration.

² Assigning different weights for nouns resulted in more meaningful clusters in our experiments. We will explore additional grammatical-category-specific weighting schemes in future work.


• Vector representation

- Semantic role labels, and semantic role labels + word lemmas.

- s(l_j | v_i): the score of l_j being a label of v_i.

- s(m_j, l_j): based on the max. likelihood of m_j co-occurring with l_j.


Page 17: Transition-based Semantic Role Labeling Using Predicate Argument Clustering

Predicate Argument Clustering

• Clustering verbs in the test data

- K-best hierarchical agglomerative clustering.

• Merges k-best pairs at each iteration.

• Uses a threshold to dynamically determine the top k clusters.

- We set another threshold for early break-out.

• Clustering verbs in the training data

- K-means clustering.

• Starts with centroids estimated from the clusters found in the test data.

• Uses a threshold to filter out verbs not close enough to any cluster.
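A rough sketch of both clustering steps is given below, using cosine similarity over the verb vectors from Section 3.1. The threshold values, the exact reading of the "k best" pairs, and the early break-out rule are assumptions; the authors' implementation may differ.

```python
import numpy as np

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

def k_best_hac(vectors: dict, merge_thr: float = 0.7, stop_thr: float = 0.4) -> list:
    """Test data: merge every disjoint cluster pair whose centroid similarity exceeds
    merge_thr (the dynamically chosen 'k best' pairs) at each iteration; break out
    early once the best remaining similarity falls below stop_thr."""
    clusters = [{v} for v in vectors]
    centroids = [vectors[v].astype(float) for v in vectors]
    while len(clusters) > 1:
        sims = sorted(((cosine(centroids[i], centroids[j]), i, j)
                       for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
                      reverse=True)
        if sims[0][0] < stop_thr:
            break
        used, merges = set(), []
        for sim, i, j in sims:
            if sim < merge_thr:
                break
            if i in used or j in used:
                continue
            used.update((i, j))
            merges.append((i, j))
        if not merges:                                    # best pair sits between the two thresholds
            merges, used = [sims[0][1:]], set(sims[0][1:])
        new_clusters = [clusters[i] | clusters[j] for i, j in merges]
        new_centroids = [np.mean([vectors[v] for v in c], axis=0) for c in new_clusters]
        for i in range(len(clusters)):
            if i not in used:
                new_clusters.append(clusters[i])
                new_centroids.append(centroids[i])
        clusters, centroids = new_clusters, new_centroids
    return clusters

def seeded_k_means(train_vectors: dict, test_clusters: list, test_vectors: dict,
                   sim_thr: float = 0.3, iterations: int = 10) -> list:
    """Training data: k-means whose centroids start from the test clusters; training
    verbs not close enough to any centroid are filtered out."""
    centroids = [np.mean([test_vectors[v] for v in c], axis=0) for c in test_clusters]
    assignment = {}
    for _ in range(iterations):
        assignment = {}
        for verb, vec in train_vectors.items():
            sims = [cosine(vec, c) for c in centroids]
            best = int(np.argmax(sims))
            if sims[best] >= sim_thr:                     # threshold filter
                assignment[verb] = best
        for k in range(len(centroids)):                   # recompute centroids from assigned members
            members = [train_vectors[v] for v, c in assignment.items() if c == k]
            if members:
                centroids[k] = np.mean(members, axis=0)
    return [{v for v, c in assignment.items() if c == k} for k in range(len(centroids))]
```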


Page 18: Transition-based Semantic Role Labeling Using Predicate Argument Clustering

Experiments

• Results


System     Task     In-domain (P / R / F1)     Out-of-domain (P / R / F1)
Baseline   AI       92.57 / 88.44 / 90.46      90.96 / 81.57 / 86.01
Baseline   AI+AC    87.20 / 83.31 / 85.21      77.11 / 69.14 / 72.91
+Dynamic   AI       92.38 / 88.76 / 90.54      90.90 / 82.25 / 86.36
+Dynamic   AI+AC    87.33 / 83.91 / 85.59      77.41 / 70.05 / 73.55
+Cluster   AI       92.62 / 88.90 / 90.72      90.87 / 82.43 / 86.44
+Cluster   AI+AC    87.43 / 83.92 / 85.64      77.47 / 70.28 / 73.70
JN'08      AI+AC    88.46 / 83.55 / 85.93      77.67 / 69.63 / 73.43

(JN'08 = Johansson and Nugues, 2008)


Page 19: Transition-based Semantic Role Labeling Using Predicate Argument Clustering

Conclusion

• Introduced a self-learning clustering technique with potential for improving labeling accuracy in a new domain.

- Needs to be tried with large-scale data to see a clear impact of the clustering.

- Can also be improved by using different features or clustering algorithms.

• ClearParser open-source project

- http://code.google.com/p/clearparser/


Page 20: Transition-based Semantic Role Labeling Using Predicate Argument Clustering

Acknowledgements

• We gratefully acknowledge the support of the National Science Foundation Grants CISE-IIS-RI-0910992, Richer Representations for Machine Translation; a subcontract from the Mayo Clinic and Harvard Children's Hospital based on a grant from the ONC, 90TR0002/01, Strategic Health Advanced Research Project Area 4: Natural Language Processing; and a grant from the Defense Advanced Research Projects Agency (DARPA/IPTO) under the GALE program, DARPA/CMO Contract No. HR0011-06-C-0022, subcontract from BBN, Inc. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.
