semantic frame identification with distributed word representations

47
Semantic Frame Identification with Distributed Word Representations Karl Moritz Hermann 1 Jason Weston Dipanjan Das Kuzman Ganchev Google Inc., New York Department of Computer Science, University of Oxford Baltimore, 25.06.2014 1 The majority of this research is result of an internship at Google. 1/1

Upload: others

Post on 11-Sep-2021

5 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Semantic Frame Identification with Distributed Word Representations

Semantic Frame Identification with DistributedWord Representations

Karl Moritz Hermann‡1

Jason Weston†Dipanjan Das†

Kuzman Ganchev†

†Google Inc., New York‡ Department of Computer Science, University of Oxford

Baltimore, 25.06.2014

1The majority of this research is result of an internship at Google.1 / 1

Page 2: Semantic Frame Identification with Distributed Word Representations

We investigate features and classifiersfor frame-semantic parsing

2 / 1

Page 3: Semantic Frame Identification with Distributed Word Representations

We investigate features and classifiersfor frame-semantic parsing

2 / 1

Page 4: Semantic Frame Identification with Distributed Word Representations

We investigate features and classifiersfor frame-semantic parsing

2 / 1

Page 5: Semantic Frame Identification with Distributed Word Representations

We investigate features and classifiersfor frame-semantic parsing

2 / 1

Page 6: Semantic Frame Identification with Distributed Word Representations

We investigate features and classifiersfor frame-semantic parsing

2 / 1

Page 7: Semantic Frame Identification with Distributed Word Representations

We investigate features and classifiersfor frame-semantic parsing

2 / 1

Page 8: Semantic Frame Identification with Distributed Word Representations

Frame-Semantic Parsing Intro

Frame-Semantic Parsing

The task of extracting semantic predicate-argument structuresfrom text.

John sold Mary a car .

COMMERCE BUYsell.V

Seller GoodsBuyer

iobjnsubjdobj

det

3 / 1

Page 9: Semantic Frame Identification with Distributed Word Representations

Frame-Semantic Parsing Intro

Frame-Semantic Parsing

The task of extracting semantic predicate-argument structuresfrom text.

John sold Mary a car .

COMMERCE BUYsell.V

Seller GoodsBuyer

iobjnsubjdobj

det

3 / 1

Page 10: Semantic Frame Identification with Distributed Word Representations

Frame-Semantic Parsing Intro

Frame-Semantic Parsing

The task of extracting semantic predicate-argument structuresfrom text.

John sold Mary a car .

COMMERCE BUYsell.V

Seller GoodsBuyer

iobjnsubjdobj

det

3 / 1

Page 11: Semantic Frame Identification with Distributed Word Representations

Frame-Semantic Parsing Intro

Frame-Semantic Parsing

The task of extracting semantic predicate-argument structuresfrom text.

John sold Mary a car .

COMMERCE BUYsell.V

Seller GoodsBuyer

iobjnsubjdobj

det

3 / 1

Page 12: Semantic Frame Identification with Distributed Word Representations

Frame parsing as a two-stage task

Frame-Semantic Parsing is a combination of two tasks:

• Frame Identification

• Argument Identification

This paper focuses on Frame Identification

However, we also present results on the full pipeline task.

4 / 1

Page 13: Semantic Frame Identification with Distributed Word Representations

Representing frame instances

Default Approach

• Frame instances representedby candidate and context

• Context: sparse featuresbased on parse tree

Distributed Approach

• Replaces binary featureswith word embeddings

• Context: same features, butrepresented distributionally

5 / 1

Page 14: Semantic Frame Identification with Distributed Word Representations

Instances to Vectors

Step 1: Parsing

• Dependency parse of input sentence

• Paths are considered relative to candidate word

6 / 1

Page 15: Semantic Frame Identification with Distributed Word Representations

Instances to Vectors

Step 1: Parsing

• Dependency parse of input sentence

• Paths are considered relative to candidate word

6 / 1

Page 16: Semantic Frame Identification with Distributed Word Representations

Instances to Vectors

Step 2: Context word selection strategy

• Direct dependents based on parse

• Argument dependency paths learned from gold data

6 / 1

Page 17: Semantic Frame Identification with Distributed Word Representations

Instances to Vectors

Step 3: Embeddings

• Replace words with embeddings

6 / 1

Page 18: Semantic Frame Identification with Distributed Word Representations

Instances to Vectors

Step 4: Single vector creation

• Merge embeddings into a unified vector representation

• Effectively concatenation; zeros for empty slots

6 / 1

Page 19: Semantic Frame Identification with Distributed Word Representations

Joint-space Model (Wsabie) — Learning

Joint-space Model

• Instances represented in Rd based on pre-trained embeddings.

7 / 1

Page 20: Semantic Frame Identification with Distributed Word Representations

Joint-space Model (Wsabie) — Learning

Joint-space Model

• Labels represented as discrete values given lexicon (size F ).

7 / 1

Page 21: Semantic Frame Identification with Distributed Word Representations

Joint-space Model (Wsabie) — Learning

Joint-space Model

• Learn a linear mapping M : Rd → Rm.

7 / 1

Page 22: Semantic Frame Identification with Distributed Word Representations

Joint-space Model (Wsabie) — Learning

Joint-space Model

• Learn a matrix Y ∈ RF×m to represent labels.

7 / 1

Page 23: Semantic Frame Identification with Distributed Word Representations

Joint-space Model (Wsabie) — Learning

Joint-space Model

• Objective function:∑x

∑y

L(ranky (x)

)max(0, γ + s(x , y)− s(x , y)).

7 / 1

Page 24: Semantic Frame Identification with Distributed Word Representations

Joint-space Model (Wsabie) — Classification

Joint-Space Rm

• Project candidate intojoint-space

• Only consider labelprojections

• Restrict labels using lexicon

• Classify candidate usingsuitable distance metric

8 / 1

Page 25: Semantic Frame Identification with Distributed Word Representations

Joint-space Model (Wsabie) — Classification

Joint-Space Rm

• Project candidate intojoint-space

• Only consider labelprojections

• Restrict labels using lexicon

• Classify candidate usingsuitable distance metric

8 / 1

Page 26: Semantic Frame Identification with Distributed Word Representations

Joint-space Model (Wsabie) — Classification

Joint-Space Rm

• Project candidate intojoint-space

• Only consider labelprojections

• Restrict labels using lexicon

• Classify candidate usingsuitable distance metric

8 / 1

Page 27: Semantic Frame Identification with Distributed Word Representations

Joint-space Model (Wsabie) — Classification

Joint-Space Rm

• Project candidate intojoint-space

• Only consider labelprojections

• Restrict labels using lexicon

• Classify candidate usingsuitable distance metric

8 / 1

Page 28: Semantic Frame Identification with Distributed Word Representations

Experiments

Learning Setup

• Neural language model (∼ Bengio et al., 2003) trained on over100 billion tokens to learn 128-dimensional word embeddings

• FrameNet 1.5 and Ontonotes 4.0 (PropBank, WSJ-only) usedfor training the actual Wsabie models

• Hyperparameters optimised on development data

9 / 1

Page 29: Semantic Frame Identification with Distributed Word Representations

Baselines: Where is the power coming from?

Models Evaluated

• Log-Linear Words

• Log-Linear Embeddings

• Wsabie

10 / 1

Page 30: Semantic Frame Identification with Distributed Word Representations

Baselines: Where is the power coming from?

Models Evaluated• Log-Linear Words

• Log-Linear Embeddings

• Wsabie

10 / 1

Page 31: Semantic Frame Identification with Distributed Word Representations

Baselines: Where is the power coming from?

Models Evaluated• Log-Linear Words

• Log-Linear Embeddings

• Wsabie

10 / 1

Page 32: Semantic Frame Identification with Distributed Word Representations

Baselines: Where is the power coming from?

Models Evaluated• Log-Linear Words

• Log-Linear Embeddings

• Wsabie

10 / 1

Page 33: Semantic Frame Identification with Distributed Word Representations

Evaluation

Evaluation Settings

• We evaluate on FrameNet (here) and PropBank (see paper)

• FrameNet setup follows Das et al. (2014), with a restrictedlexicon during training (Semafor)

• Multiple evaluationsAll evaluates all framesRare predicates with frequency ≤ 11 in the training dataUnseen predicates not observed in the training data

11 / 1

Page 34: Semantic Frame Identification with Distributed Word Representations

Frame Identification Results (FrameNet - All Predicates)

FrameNet (Semafor)

83

84

85

86

82.97

83.6

83.94

84.53

86.49F

1-S

core

Das supervised Das bestLL-Embeddings LL-WordsWsabie

Figure : Frame Identification results on FrameNet dataset. We restrictthe training data to the Semafor lexicon for comparability with Das et al.,2014.

12 / 1

Page 35: Semantic Frame Identification with Distributed Word Representations

Frame Identification Results (FrameNet - Rare)

FrameNet (Semafor)

82

84

80.97

82.31

81.03

81.65

85.22F

1-S

core

Das supervised Das bestLL-Embeddings LL-WordsWsabie

Figure : Frame Identification results on FrameNet dataset. We restrictthe training data to the Semafor lexicon for comparability with Das et al.,2014.

13 / 1

Page 36: Semantic Frame Identification with Distributed Word Representations

Frame Identification Results (FrameNet - Unseen)

FrameNet (Semafor)

30

40

23.08

42.67

27.9727.27

46.15F

1-S

core

Das supervised Das bestLL-Embeddings LL-WordsWsabie

Figure : Frame Identification results on FrameNet dataset. We restrictthe training data to the Semafor lexicon for comparability with Das et al.,2014.

14 / 1

Page 37: Semantic Frame Identification with Distributed Word Representations

Full pipeline

Full frame-semantic analysis

In addition to the frame identification experiments, we also use ourmodels as part of a full frame-semantic parsing setup together witha standard argument identification method.

Argument Identification System

• Standard set of discrete features

• Local log-linear model

• Global inference via hard constraints and ILP

15 / 1

Page 38: Semantic Frame Identification with Distributed Word Representations

Full pipeline results (FrameNet)

FrameNet (Semafor)

64

66

68

64.05

64.54

67.06

68.69F

1-S

core

Das supervised Das bestLog-Words Wsabie

Figure : Full frame-structure prediction results for FrameNet dataset(Semafor).

16 / 1

Page 39: Semantic Frame Identification with Distributed Word Representations

Conclusion

Conclusion• Novel approach to frame identification

• Model outperforms prior state of the art on frameidentification

• In a pipeline setting with a standard argument identificationsystem, the model also sets a new state of the art on varioussemantic parsing tasks

• General approach, could easily be extended for alternativeframe-semantic parsing frameworks

17 / 1

Page 40: Semantic Frame Identification with Distributed Word Representations

The End

Thank you!Questions?

18 / 1

Page 41: Semantic Frame Identification with Distributed Word Representations

Frame Identification Results

Model Semafor Lexicon

All Ambiguous Rare Unseen

Das et al., 2014 supervised 82.97 69.27 80.97 23.08Das et al., 2014 best 83.60 69.19 82.31 42.67

Log-Linear Words 84.53 70.55 81.65 27.27Log-Linear Embed. 83.94 70.27 81.03 27.97Wsabie Embedding 86.49 73.39 85.22 46.15

Table : Frame identification results on the FrameNet test data

19 / 1

Page 42: Semantic Frame Identification with Distributed Word Representations

Frame Identification Results

Model Full Lexicon

All Ambiguous Rare

Log-Linear Words 87.33 70.55 87.19Log-Linear Embed. 86.94 70.26 86.56Wsabie Embedding 88.41 73.10 88.93

Table : Frame identification results on the FrameNet test data

20 / 1

Page 43: Semantic Frame Identification with Distributed Word Representations

Full Structure Prediction Results

Model Semafor Lexicon

Precision Recall F1

Das et al. supervised 67.81 60.68 64.05Das et al. best 68.33 61.14 64.54

Log-Linear Words 71.21 63.37 67.06Wsabie Embedding 73.00 64.87 68.69

Table : Full structure prediction results for FrameNet test data. Wecompare to the prior state of the art (Das et al., 2014).

21 / 1

Page 44: Semantic Frame Identification with Distributed Word Representations

Full Structure Prediction Results

Model Full Lexicon

Precision Recall F1

Log-Linear Words 73.31 65.20 69.01Wsabie Embedding 74.29 66.02 69.91

Table : Full structure prediction results for FrameNet test data. Wecompare to the prior state of the art (Das et al., 2014).

22 / 1

Page 45: Semantic Frame Identification with Distributed Word Representations

Argument-Only Results (CoNLL 2005 task)

PropBank

72

74

76

78

80

73.62

76.29

78.69

77.2377.14

F1-

Sco

re

Collins Charniak Combined Log-W Wsabie

Figure : Argument only evaluation (argument identification metrics)using the CoNLL 2005 shared task evaluation script (Carreras andMarquez, 2005). Results from Punyakanok et al. (2008) are taken fromTable 11 of that paper.

23 / 1

Page 46: Semantic Frame Identification with Distributed Word Representations

Frame Identification Results (PropBank)

PropBank

94

94.2

94.4

94.6

94.8

94.04

94.74

94.56

F1-

Sco

re

Log-E Log-W Wsabie

Figure : Frame Identification results on PropBank datasets.

24 / 1

Page 47: Semantic Frame Identification with Distributed Word Representations

Full pipeline results (PropBank)

PropBank

79.62

79.64

79.65

79.61

F1-

Sco

re

Log-W Wsabie

Figure : Full frame-structure prediction results for PropBank dataset.

25 / 1