a cross-lingual annotation projection approach for relation detection

A CROSS-LINGUAL ANNOTATION PROJECTION APPROACH FOR RELATION DETECTION. The 23rd International Conference on Computational Linguistics (COLING 2010), August 24th, 2010, Beijing. Seokhwan Kim (POSTECH), Minwoo Jeong (Saarland University), Jonghoon Lee (POSTECH), Gary Geunbae Lee (POSTECH)


Page 1: A Cross-Lingual Annotation Projection Approach for Relation Detection

A CROSS-LINGUAL ANNOTATION PROJECTION APPROACH FOR RELATION DETECTION

The 23rd International Conference on Computational Linguistics (COLING 2010)

August 24th, 2010, Beijing

Seokhwan Kim (POSTECH)

Minwoo Jeong (Saarland University)

Jonghoon Lee (POSTECH)

Gary Geunbae Lee (POSTECH)

Page 2: A Cross-Lingual Annotation Projection Approach for Relation Detection

Contents

• Introduction

• Methods

Cross-lingual Annotation Projection for Relation Detection

Noise Reduction Strategies

• Evaluation

• Conclusion

2


Page 4: A Cross-Lingual Annotation Projection Approach for Relation Detection

What’s Relation Detection?

• Relation Extraction

To identify semantic relations between a pair of entities

ACE RDC

• Relation Detection (RD)

• Relation Categorization (RC)

4

Example: Jan Mullins, owner of Computer Recycler Incorporated said that …

Relation between "Jan Mullins" and "Computer Recycler Incorporated": Owner-Of

Page 5: A Cross-Lingual Annotation Projection Approach for Relation Detection

What’s the Problem?

• Many supervised machine learning approaches have been successfully applied to the RDC task (Kambhatla, 2004; Zhou et al., 2005; Zelenko et al., 2003; Culotta and Sorensen, 2004; Bunescu and Mooney, 2005; Zhang et al., 2006)

• Datasets for relation detection

Labeled corpora for supervised learning

Available for only a few languages

• English, Chinese, Arabic

No resources for other languages

• Korean

5

Page 6: A Cross-Lingual Annotation Projection Approach for Relation Detection

Contents

• Introduction

• Methods

Cross-lingual Annotation Projection for Relation Detection

Noise Reduction Strategies

• Evaluation

• Conclusion

6

Page 7: A Cross-Lingual Annotation Projection Approach for Relation Detection

Cross-lingual Annotation Projection

• Goal

To learn a relation detector without significant annotation effort

• Method

To leverage parallel corpora to project relation annotations from the source language LS onto the target language LT
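
The projection step can be sketched as follows. This is a minimal illustration, not the authors' implementation: the span representation, the alignment dictionary, and the function names `project_span` and `project_relation` are all assumptions made for the example.

```python
from typing import Dict, Optional, Set, Tuple

Span = Tuple[int, int]            # inclusive token indices (start, end)
Alignment = Dict[int, Set[int]]   # source token index -> aligned target token indices


def project_span(span: Span, alignment: Alignment) -> Optional[Span]:
    """Map a source-side token span to the target side through word alignments."""
    targets: Set[int] = set()
    for i in range(span[0], span[1] + 1):
        targets |= alignment.get(i, set())
    if not targets:
        return None  # no aligned target words, so nothing can be projected
    # Assumption: the projected span covers the leftmost to rightmost aligned word.
    return (min(targets), max(targets))


def project_relation(arg1: Span, arg2: Span, label: bool, alignment: Alignment):
    """Project one relation instance (two entity spans plus a label) from LS to LT."""
    t1 = project_span(arg1, alignment)
    t2 = project_span(arg2, alignment)
    if t1 is None or t2 is None:
        return None  # an argument has no counterpart in the target sentence
    return (t1, t2, label)
```

For example, with `alignment = {0: {2}, 1: {3}, 4: {0}}`, the source relation over spans `(0, 1)` and `(4, 4)` is projected to the target spans `(2, 3)` and `(0, 0)` with its label unchanged.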

7

Page 8: A Cross-Lingual Annotation Projection Approach for Relation Detection

Cross-lingual Annotation Projection

• Previous Work

Part-of-speech tagging (Yarowsky and Ngai, 2001)

Named-entity tagging (Yarowsky et al., 2001)

Verb classification (Merlo et al., 2002)

Dependency parsing (Hwa et al., 2005)

Mention detection (Zitouni and Florian, 2008)

Semantic role labeling (Pado and Lapata, 2009)

• To the best of our knowledge, no previous work has reported annotation projection for the RDC task

8

Page 9: A Cross-Lingual Annotation Projection Approach for Relation Detection

Overall Architecture

[Figure: pipeline diagram with two stages, starting from a parallel corpus]

• Annotation: Sentences in LS → Preprocessing (POS Tagging, Parsing) → NER → Relation Detection → Annotated Sentences in LS

• Projection: Sentences in LT → Preprocessing (POS Tagging, Parsing) → Word Alignment → Projection → Annotated Sentences in LT

9

Page 10: A Cross-Lingual Annotation Projection Approach for Relation Detection

How to Reduce Noise?

• Error Accumulation

Numerous errors can be generated and accumulated through the annotation projection procedure

• Preprocessing for LS and LT

• NER for LS

• Relation Detection for LS

• Word Alignment between LS and LT

• Noise Reduction

A key factor to improve the performance of annotation projection

10


Page 13: A Cross-Lingual Annotation Projection Approach for Relation Detection

How to Reduce Noise?

• Noise Reduction Strategies (1)

Alignment Filtering

• Based on Heuristics

A projection for an entity mention should be based on alignments between contiguous word sequences

Both an entity mention in LS and its projection in LT should include at least one base noun phrase

The projected instance in LT should satisfy the clausal agreement with the original instance in LS

[Figure: example alignments labelled accepted or rejected for each heuristic]

13
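
The first heuristic (contiguity) could be checked roughly as below. This is a sketch under one possible reading of the slide, namely that the target words aligned to a mention must form an unbroken run. The function name and data layout are hypothetical. The base noun phrase and clausal agreement heuristics need parser output and are not covered here.

```python
from typing import Dict, Set, Tuple

Span = Tuple[int, int]            # inclusive token indices (start, end)
Alignment = Dict[int, Set[int]]   # source token index -> aligned target token indices


def has_contiguous_projection(span: Span, alignment: Alignment) -> bool:
    """Accept a mention only if its aligned target words form one unbroken run."""
    targets: Set[int] = set()
    for i in range(span[0], span[1] + 1):
        targets |= alignment.get(i, set())
    if not targets:
        return False  # an unaligned mention cannot be projected at all
    ordered = sorted(targets)
    # A contiguous run has no gaps between its smallest and largest index.
    return ordered == list(range(ordered[0], ordered[-1] + 1))
```

A mention aligned to target positions 2 and 3 would be accepted, while one aligned to positions 2 and 5 would be rejected because positions 3 and 4 are missing from the run.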

Page 14: A Cross-Lingual Annotation Projection Approach for Relation Detection

How to Reduce Noise?

• Noise Reduction Strategies (2)

Alignment Correction

• Based on a bilingual dictionary for entity mentions

Each entry of the dictionary is a pair of an entity mention in LS and its translation or transliteration in LT

14

FOR each entity ES in LS
    RETRIEVE counterpart ET from DICT(E-T)
    SEEK ET from the sentence ST in LT
    IF matched THEN
        MAKE new alignment ES-ET
    ENDIF
ENDFOR

[Figure: source tokens A B C D E F G and target tokens α β γ δ ε δ ε; the dictionary entry BCD - βγ yields a corrected alignment]
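
The pseudocode above might look like this in Python. It is a simplified sketch: the exact-match lookup, the tuple-keyed dictionary, and the names `find_phrase` and `correct_alignment` are assumptions for illustration. It overwrites only the links of the matched source mention and leaves all other links untouched.

```python
from typing import Dict, List, Optional, Set, Tuple

Span = Tuple[int, int]            # inclusive token indices (start, end)
Alignment = Dict[int, Set[int]]   # source token index -> aligned target token indices
Phrase = Tuple[str, ...]


def find_phrase(tokens: List[str], phrase: Phrase) -> Optional[int]:
    """Return the start index of the first exact occurrence of phrase, or None."""
    n = len(phrase)
    if n == 0:
        return None
    for start in range(len(tokens) - n + 1):
        if tuple(tokens[start:start + n]) == phrase:
            return start
    return None


def correct_alignment(
    source_entities: List[Tuple[Span, Phrase]],
    target_tokens: List[str],
    dictionary: Dict[Phrase, Phrase],
    alignment: Alignment,
) -> Alignment:
    """Replace the alignment of each entity whose dictionary counterpart is found in ST."""
    corrected = {i: set(js) for i, js in alignment.items()}  # copy; input is not mutated
    for (start, end), mention in source_entities:
        translation = dictionary.get(mention)      # RETRIEVE counterpart ET
        if translation is None:
            continue
        position = find_phrase(target_tokens, translation)  # SEEK ET in ST
        if position is None:
            continue
        matched = set(range(position, position + len(translation)))
        for i in range(start, end + 1):            # MAKE new alignment ES-ET
            corrected[i] = set(matched)
    return corrected
```

With the slide's example, a dictionary entry mapping `("B", "C", "D")` to `("β", "γ")` re-links source tokens 1 to 3 to target tokens 1 and 2, whatever the automatic aligner had produced for them.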

Page 15: A Cross-Lingual Annotation Projection Approach for Relation Detection

How to Reduce Noise?

• Noise Reduction Strategies (3)

Assessment-based Instance Selection

• Based on the reliability of the projected instances in LT

Evaluated by the confidence score of monolingual relation detection for the original counterpart instance in LS

Only instances with scores larger than the threshold value θ are accepted

15

conf = 0.9

accepted

conf = 0.6

rejected

θ = 0.7
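
The selection rule amounts to a threshold filter. A minimal sketch follows. The `ProjectedInstance` class and its field names are hypothetical, and the confidence values are the ones shown on the slide.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ProjectedInstance:
    instance_id: str
    source_confidence: float  # confidence of the LS relation detector for the original instance


def select_instances(instances: List[ProjectedInstance], theta: float) -> List[ProjectedInstance]:
    """Keep only projected instances whose source-side confidence exceeds theta."""
    return [x for x in instances if x.source_confidence > theta]


candidates = [
    ProjectedInstance("a", 0.9),
    ProjectedInstance("b", 0.6),
]
accepted = select_instances(candidates, theta=0.7)
```

With θ = 0.7, the instance with confidence 0.9 is kept and the one with confidence 0.6 is dropped, as in the slide's example.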

Page 16: A Cross-Lingual Annotation Projection Approach for Relation Detection

Contents

• Introduction

• Methods

Cross-lingual Annotation Projection for Relation Detection

Noise Reduction Strategies

• Evaluation

• Conclusion

16

Page 17: A Cross-Lingual Annotation Projection Approach for Relation Detection

Experimental Setup

• Dataset

English-Korean parallel corpus

• 454,315 English-Korean sentence pairs

• Aligned by GIZA++

Korean RDC corpus

• Annotated following the LDC guidelines for the ACE RDC corpus

• 100 news documents in Korean

835 sentences

3,331 entity mentions

8,354 relation instances

17

Page 18: A Cross-Lingual Annotation Projection Approach for Relation Detection

Experimental Setup

• Preprocessors

English

• Stanford Parser (Klein and Manning, 2003)

• Stanford Named Entity Recognizer (Finkel et al., 2005)

Korean

• Korean POS Tagger (Lee et al., 2002)

• MST Parser (McDonald et al., 2006)

18

Page 19: A Cross-Lingual Annotation Projection Approach for Relation Detection

Experimental Setup

• Relation Detection for English Sentences

Tree kernel-based SVM classifier

• Training Dataset

ACE 2003 corpus

• 674 documents

• 9,683 relation instances

• Model

Shortest path enclosed subtrees kernel (Zhang et al., 2006)

• Implementation

SVM-Light (Joachims, 1998)

Tree Kernel Tools (Moschitti, 2006)

19

Page 20: A Cross-Lingual Annotation Projection Approach for Relation Detection

Experimental Setup

• Relation Detection for Korean Sentences

Tree kernel-based SVM classifier

• Training Dataset

Half of the Korean RDC corpus (baseline)

Projected instances

• Model

Shortest path dependency kernel (Bunescu and Mooney, 2005)

• Implementation

SVM-Light (Joachims, 1998)

Tree Kernel Tools (Moschitti, 2006)
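
The shortest path dependency kernel operates on the dependency path that connects the two entity mentions. The sketch below shows only how such a path could be extracted from a head-index array with breadth-first search. It does not implement the kernel itself, and the input format and function name are assumptions.

```python
from collections import deque
from typing import Dict, List, Optional, Set


def shortest_dependency_path(heads: List[int], start: int, goal: int) -> Optional[List[int]]:
    """Return token indices on the shortest path between two tokens in a dependency tree.

    heads[i] is the index of the head of token i, or -1 for the root.
    Edges are treated as undirected for path finding.
    """
    neighbors: Dict[int, Set[int]] = {i: set() for i in range(len(heads))}
    for child, head in enumerate(heads):
        if head >= 0:
            neighbors[child].add(head)
            neighbors[head].add(child)

    previous: Dict[int, Optional[int]] = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path: List[int] = []
            current: Optional[int] = node
            while current is not None:   # walk back to the start token
                path.append(current)
                current = previous[current]
            return path[::-1]
        for nxt in sorted(neighbors[node]):
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    return None  # tokens are not connected (e.g. a malformed parse)
```

For a four-token sentence with `heads = [1, -1, 1, 2]`, the path between tokens 0 and 3 runs through the root: `[0, 1, 2, 3]`.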

20

Page 21: A Cross-Lingual Annotation Projection Approach for Relation Detection

Experimental Setup

• Experimental Sets

Combinations of noise reduction strategies

• (S1: Heuristic, S2: Dictionary, S3: Assessment)

1. Baseline

Trained with only half of the Korean RDC corpus

2. Baseline + Projections (no noise reduction)

3. Baseline + Projections (S1)

4. Baseline + Projections (S1 + S2)

5. Baseline + Projections (S3)

6. Baseline + Projections (S1 + S3)

7. Baseline + Projections (S1 + S2 + S3)

21

Page 22: A Cross-Lingual Annotation Projection Approach for Relation Detection

Experimental Setup

• Evaluation

On the second half of the Korean RDC corpus

• The first half is used to train the baseline

On gold-standard entity mentions with gold-standard coreference chains

Evaluated by Precision/Recall/F-measure
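
For reference, the three metrics can be computed as below. The slides do not state which F-measure variant is used, so the balanced F1 here is an assumption. It does reproduce the reported baseline F of 30.5 from P = 60.5 and R = 20.4.

```python
from typing import Tuple


def f1_from_pr(precision: float, recall: float) -> float:
    """Balanced F-measure: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Compute (precision, recall, F1) from true positive, false positive, false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall, f1_from_pr(precision, recall)


# Consistency check against the baseline row of the results table (values in percent).
baseline_f = round(f1_from_pr(60.5, 20.4), 1)
```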

22

Page 23: A Cross-Lingual Annotation Projection Approach for Relation Detection

Experimental Results

Model                                            No assessment     With assessment
                                                 P     R     F     P     R     F
Baseline                                         60.5  20.4  30.5  -     -     -
Baseline + projection                            22.5  6.5   10.0  29.1  13.2  18.2
Baseline + projection (heuristics)               51.4  15.5  23.8  56.1  22.9  32.5
Baseline + projection (heuristics + dictionary)  55.3  19.4  28.7  59.8  26.7  36.9

23

Page 24: A Cross-Lingual Annotation Projection Approach for Relation Detection

Non-filtered Projections Were Poor


24

Page 25: A Cross-Lingual Annotation Projection Approach for Relation Detection

Heuristics Were Helpful


25

Page 26: A Cross-Lingual Annotation Projection Approach for Relation Detection

Much Worse Than Baseline


26

Page 27: A Cross-Lingual Annotation Projection Approach for Relation Detection

Dictionary Was Also Helpful


27

Page 28: A Cross-Lingual Annotation Projection Approach for Relation Detection

Still Worse Than Baseline


28

Page 29: A Cross-Lingual Annotation Projection Approach for Relation Detection

Assessment Boosted Performance


29

Page 30: A Cross-Lingual Annotation Projection Approach for Relation Detection

Combined Strategies Achieved Better Performance Than Baseline


30

Page 31: A Cross-Lingual Annotation Projection Approach for Relation Detection

Contents

• Introduction

• Methods

Cross-lingual Annotation Projection for Relation Detection

Noise Reduction Strategies

• Evaluation

• Conclusion

31

Page 32: A Cross-Lingual Annotation Projection Approach for Relation Detection

Conclusion

• Summary

A cross-lingual annotation projection for relation detection

Three strategies for noise reduction

Projected instances from an English-Korean parallel corpus helped to improve the performance of the task

• when combined with the noise reduction strategies

• Future work

A cross-lingual annotation projection for relation categorization

More elaborate strategies for noise reduction to improve the projection performance for relation extraction

32

Page 33: A Cross-Lingual Annotation Projection Approach for Relation Detection

Q&A