Lightweight Multilingual Entity Extraction and Linking Slide

Page 1: Lightweight Multilingual Entity Extraction and Linking Slide

Speaker: Shih-Han Lo
Advisor: Professor Jia-Ling Koh

Author: Aasish Pappu, Roi Blanco, Yashar Mehdad, Amanda Stent, Kapil Thadani

Date: 2017/09/19
Source: WSDM ’17

Lightweight Multilingual Entity Extraction and Linking

Page 2: Lightweight Multilingual Entity Extraction and Linking Slide

Outline

Introduction

Method

Experiment

Conclusion

Page 3: Lightweight Multilingual Entity Extraction and Linking Slide

Introduction

Key tasks for text analytics systems:

Named Entity Recognition (NER)

Named Entity Linking (NEL)

Some systems perform NER and NEL jointly.

Page 4: Lightweight Multilingual Entity Extraction and Linking Slide

Introduction

Most approaches involve (some of) the following steps:

Mention detection

Mention normalization

Candidate entity retrieval for each mention

Entity disambiguation for mentions with multiple candidate entities

Mention clustering for mentions that do not link to any entity

Motivation
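The five steps above can be strung together as in the toy Python sketch below. It is illustrative only: the capitalization-based detector, the `ALIASES` table and its prior scores, and the rule that clusters unlinked (NIL) mentions by their normalized form are all hypothetical stand-ins for the components described on the following slides.

```python
from collections import defaultdict

# Hypothetical alias table: normalized surface form -> [(entity id, prior score)].
ALIASES = {
    "paris": [("Paris", 0.9), ("Paris_Hilton", 0.1)],
    "france": [("France", 1.0)],
}


def detect_mentions(text):
    """Step 1 (toy): each run of capitalized tokens is one mention."""
    mentions, current = [], []
    for tok in text.split():
        word = tok.strip(".,;:!?")
        if word[:1].isupper():
            current.append(word)
        if current and (not word[:1].isupper() or word != tok):
            # A lowercase token or attached punctuation closes the mention.
            mentions.append(" ".join(current))
            current = []
    if current:
        mentions.append(" ".join(current))
    return mentions


def link(text):
    """Run the five steps; returns (linked mentions, NIL clusters)."""
    linked, nil_clusters = {}, defaultdict(list)
    for mention in detect_mentions(text):        # 1. mention detection
        norm = mention.lower()                   # 2. mention normalization
        candidates = ALIASES.get(norm, [])       # 3. candidate entity retrieval
        if candidates:                           # 4. disambiguation (toy: highest prior)
            linked[mention] = max(candidates, key=lambda c: c[1])[0]
        else:                                    # 5. cluster mentions with no entity
            nil_clusters[norm].append(mention)
    return linked, dict(nil_clusters)
```

For example, `link("Alice flew from Paris to France and met ALICE there.")` returns `({'Paris': 'Paris', 'France': 'France'}, {'alice': ['Alice', 'ALICE']})`: the two known places are linked, and both spellings of the unknown name land in one NIL cluster.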

Page 5: Lightweight Multilingual Entity Extraction and Linking Slide

Outline

Introduction

Method

Experiment

Conclusion

Page 6: Lightweight Multilingual Entity Extraction and Linking Slide

Mention Detection

Mention detection typically consists of running an NER system over the input text.

We use simple CRFs with only a few lexical, syntactic, and semantic features.
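The slide does not enumerate the features, so the following is only a sketch of what per-token lexical, syntactic, and semantic features for a CRF tagger might look like. It assumes POS tags come from an external tagger and uses a hypothetical `GAZETTEER` set as the semantic signal. The output (one feature dict per token) is the shape that dict-based CRF wrappers such as sklearn-crfsuite consume; training itself is omitted.

```python
import re

# Hypothetical gazetteer standing in for a semantic feature source.
GAZETTEER = {"paris", "france", "madrid"}


def word_shape(word):
    """Collapse character classes, e.g. 'Paris' -> 'Xx', '2017' -> 'd'."""
    shape = re.sub(r"[A-Z]", "X", word)
    shape = re.sub(r"[a-z]", "x", shape)
    shape = re.sub(r"[0-9]", "d", shape)
    return re.sub(r"(.)\1+", r"\1", shape)


def token_features(tokens, pos_tags, i):
    word = tokens[i]
    feats = {
        "bias": 1.0,
        "word.lower": word.lower(),                 # lexical
        "word.suffix3": word[-3:].lower(),          # lexical
        "word.shape": word_shape(word),             # lexical
        "word.istitle": word.istitle(),             # lexical
        "pos": pos_tags[i],                         # syntactic
        "in_gazetteer": word.lower() in GAZETTEER,  # semantic
    }
    if i > 0:
        feats["prev.lower"] = tokens[i - 1].lower()
        feats["prev.pos"] = pos_tags[i - 1]
    else:
        feats["BOS"] = True  # beginning of sentence
    if i < len(tokens) - 1:
        feats["next.lower"] = tokens[i + 1].lower()
        feats["next.pos"] = pos_tags[i + 1]
    else:
        feats["EOS"] = True  # end of sentence
    return feats


def sentence_features(tokens, pos_tags):
    return [token_features(tokens, pos_tags, i) for i in range(len(tokens))]
```

Keeping the feature set this small is in line with the slide's point: a linear-chain CRF over cheap features stays fast to train and to run.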

Page 7: Lightweight Multilingual Entity Extraction and Linking Slide

System Description

Page 8: Lightweight Multilingual Entity Extraction and Linking Slide

Candidate Entity Retrieval

Entity Embeddings

We aim to simultaneously learn D-dimensional representations of entities (Ent) and words (W) in a common vector space.

The embedding model is trained with continuous skip-grams, using 300 dimensions and a window size of 10.
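Skip-gram training learns vectors by predicting the tokens around each position. The sketch below only generates the (center, context) training pairs for a window of 10. It assumes, for illustration, that entities appear in the training text as special tokens (the hypothetical `ENT:` prefix), so that words and entities share one vocabulary and therefore one vector space. Real word2vec implementations also subsample frequent tokens and randomly shrink the window per position, which this sketch omits.

```python
def skipgram_pairs(tokens, window=10):
    """Return (center, context) pairs for every token within `window` positions."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


# Hypothetical mixed sequence: entity tokens (prefixed "ENT:") sit alongside
# ordinary words, so both receive vectors in the same space.
sentence = ["ENT:Lionel_Messi", "scored", "for", "ENT:FC_Barcelona", "again"]
pairs = skipgram_pairs(sentence, window=10)
```

With gensim 4.x, the slide's settings would roughly correspond to `Word2Vec(sentences, vector_size=300, window=10, sg=1)`, where `sg=1` selects skip-gram.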

Page 9: Lightweight Multilingual Entity Extraction and Linking Slide

Candidate Entity Retrieval

Entity Embeddings

Page 10: Lightweight Multilingual Entity Extraction and Linking Slide

Candidate Entity Retrieval

Fast Entity Linking

Fast Entity Linker (FEL) is an unsupervised approach.

FEL models contextual dependencies by calculating the cosine distance between the embeddings of two entities. Candidates are drawn from the substrings of the input string.
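A toy sketch of the two ideas on this slide: candidates come from substrings (token n-grams) of the input that match a known alias, and entity-entity cosine similarity (one minus cosine distance) rewards candidate sets that fit together. The alias table, the 3-dimensional embeddings, and the exhaustive search over joint assignments are hypothetical simplifications. The slide does not give FEL's actual scoring function, and exhaustive search is only feasible for tiny inputs.

```python
import math
from itertools import product

# Hypothetical alias table: surface form -> candidate entity ids.
ALIASES = {
    "jordan": ["Michael_Jordan", "Jordan_(country)"],
    "chicago bulls": ["Chicago_Bulls"],
    "bulls": ["Chicago_Bulls", "Bull"],
}
# Hypothetical 3-d entity embeddings (learned ones would be e.g. 300-d).
EMB = {
    "Michael_Jordan": [0.9, 0.1, 0.0],
    "Jordan_(country)": [0.0, 0.2, 0.9],
    "Chicago_Bulls": [0.8, 0.3, 0.1],
    "Bull": [0.1, 0.9, 0.2],
}


def substrings(tokens, max_len=3):
    """All contiguous token n-grams up to max_len, as candidate surface forms."""
    for i in range(len(tokens)):
        for j in range(i + 1, min(len(tokens), i + max_len) + 1):
            yield " ".join(tokens[i:j])


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def link(text):
    tokens = text.lower().split()
    mentions = [s for s in substrings(tokens) if s in ALIASES]
    best, best_score = (), -math.inf
    # Pick the joint assignment whose entities are most similar to each
    # other (equivalently, least total cosine distance).
    for assignment in product(*(ALIASES[m] for m in mentions)):
        score = sum(
            cosine(EMB[a], EMB[b])
            for x, a in enumerate(assignment)
            for b in assignment[x + 1:]
        )
        if score > best_score:
            best, best_score = assignment, score
    return dict(zip(mentions, best))
```

On "Jordan played for the Chicago Bulls", the basketball reading of "jordan" wins because its embedding is close to that of Chicago_Bulls.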

Minimal perfect hash function

Elias-Fano integer coding
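Elias-Fano coding stores a non-decreasing integer sequence in about 2 + log2(u/n) bits per element (n values, universe u) by splitting each value into low bits, stored verbatim, and high bits, stored in unary. The sketch below shows encoding and random access. Which of FEL's integer sequences are compressed this way is not stated on the slide, and the linear-scan select is a simplification; practical implementations add a select index for fast access. The minimal perfect hash function is not sketched here.

```python
from typing import List


class EliasFano:
    """Elias-Fano encoding of a non-decreasing sequence of integers in [0, universe)."""

    def __init__(self, values: List[int], universe: int):
        n = len(values)
        if n == 0:
            raise ValueError("need at least one value")
        if any(b < a for a, b in zip(values, values[1:])):
            raise ValueError("values must be non-decreasing")
        if values[0] < 0 or values[-1] >= universe:
            raise ValueError("values must lie in [0, universe)")
        self.n = n
        # Low bits per value: floor(log2(universe / n)), never negative.
        ratio = universe // n
        self.l = ratio.bit_length() - 1 if ratio > 0 else 0
        mask = (1 << self.l) - 1
        self.lows = [v & mask for v in values]
        # High parts in unary: value i sets bit (v >> l) + i.
        self.highs = [0] * (n + (universe >> self.l) + 1)
        for i, v in enumerate(values):
            self.highs[(v >> self.l) + i] = 1

    def __getitem__(self, i: int) -> int:
        """Return the i-th value by locating the (i+1)-th set high bit."""
        if not 0 <= i < self.n:
            raise IndexError(i)
        seen = -1
        for pos, bit in enumerate(self.highs):
            if bit:
                seen += 1
                if seen == i:
                    return ((pos - i) << self.l) | self.lows[i]
        raise AssertionError("unreachable for a well-formed encoding")

    def size_in_bits(self) -> int:
        return self.n * self.l + len(self.highs)
```

For the eight values in the test (universe 64), l = 3 and the encoding takes 41 bits, versus 48 bits for eight plain 6-bit integers.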

Page 11: Lightweight Multilingual Entity Extraction and Linking Slide

Entity Disambiguation

Entity disambiguation is the task of determining which candidate entity a mention refers to.

The task is difficult because the same mention may refer to different entities depending on its local context.

Page 12: Lightweight Multilingual Entity Extraction and Linking Slide

Entity Disambiguation

Forward-Backward Algorithm (FwBw)
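The slide names the algorithm but not how it is applied. One plausible reading, assumed here, treats the mentions of a document as a chain: each mention's candidate entities are the hidden states, a candidate's prior score plays the role of the emission, and a pairwise entity similarity between neighboring mentions plays the role of the transition. The forward-backward pass then gives each candidate a posterior conditioned on all mentions. The priors and similarities are hypothetical, and the λ = 0.5 from the Setup slide is not modeled because its role is not stated.

```python
from typing import Dict, List, Tuple


def forward_backward(
    candidates: List[List[str]],
    prior: Dict[str, float],
    sim: Dict[Tuple[str, str], float],
) -> List[Dict[str, float]]:
    """Posterior over each mention's candidates given all mentions in the chain."""
    if not candidates:
        return []
    emit = [[prior[c] for c in cands] for cands in candidates]

    def trans(a: str, b: str) -> float:
        # Hypothetical symmetric entity-entity similarity; 0 if unknown.
        return sim.get((a, b), sim.get((b, a), 0.0))

    # Forward pass, rescaled at each step to avoid underflow.
    alpha = [emit[0][:]]
    for t in range(1, len(candidates)):
        row = [
            emit[t][j] * sum(a * trans(p, c) for a, p in zip(alpha[t - 1], candidates[t - 1]))
            for j, c in enumerate(candidates[t])
        ]
        z = sum(row) or 1.0
        alpha.append([x / z for x in row])
    # Backward pass, also rescaled per step.
    beta = [[1.0] * len(cands) for cands in candidates]
    for t in range(len(candidates) - 2, -1, -1):
        beta[t] = [
            sum(trans(c, nxt) * e * b for nxt, e, b in zip(candidates[t + 1], emit[t + 1], beta[t + 1]))
            for c in candidates[t]
        ]
        z = sum(beta[t]) or 1.0
        beta[t] = [x / z for x in beta[t]]
    # The posterior at each position is proportional to alpha * beta.
    posteriors = []
    for cands, a_row, b_row in zip(candidates, alpha, beta):
        scores = [a * b for a, b in zip(a_row, b_row)]
        z = sum(scores) or 1.0
        posteriors.append({c: s / z for c, s in zip(cands, scores)})
    return posteriors
```

In the test, "Bull" has the higher prior for the second mention, but the strong similarity between Michael_Jordan and Chicago_Bulls makes Chicago_Bulls the posterior favorite.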

Page 13: Lightweight Multilingual Entity Extraction and Linking Slide

Entity Disambiguation

Exemplar (Clustering)
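The slide gives no detail beyond the name, but the Setup slide's parameters (max_iterations = 300, λ = 0.5) match the usual settings of affinity propagation, an exemplar-based clustering method. Assuming that is the method meant and that λ is its damping factor, a minimal pure-Python version looks like this. How the paper builds the similarity matrix over candidate entities is not shown, so the test uses 1-D points with negative squared distance as similarity.

```python
from typing import List


def affinity_propagation(
    S: List[List[float]], max_iterations: int = 300, damping: float = 0.5
) -> List[int]:
    """Return each point's exemplar index. S[k][k] is the preference of point k."""
    n = len(S)
    if n < 2:
        return list(range(n))
    R = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    A = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)
    for _ in range(max_iterations):
        # r(i, k) = s(i, k) - max_{k' != k} [a(i, k') + s(i, k')], damped.
        for i in range(n):
            scores = [A[i][k] + S[i][k] for k in range(n)]
            for k in range(n):
                best_other = max(scores[j] for j in range(n) if j != k)
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - best_other)
        # a(k, k) = sum_{i' != k} max(0, r(i', k));
        # a(i, k) = min(0, r(k, k) + sum_{i' not in {i, k}} max(0, r(i', k))), damped.
        for k in range(n):
            pos = [max(0.0, R[i][k]) for i in range(n)]
            total = sum(pos) - pos[k]
            for i in range(n):
                new = total if i == k else min(0.0, R[k][k] + total - pos[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    exemplars = [k for k in range(n) if R[k][k] + A[k][k] > 0]
    if not exemplars:  # degenerate case: fall back to the single best candidate
        exemplars = [max(range(n), key=lambda k: R[k][k] + A[k][k])]
    # Each non-exemplar joins its most similar exemplar.
    return [i if i in exemplars else max(exemplars, key=lambda k: S[i][k]) for i in range(n)]
```

Setting the diagonal (the preference) to the median similarity is a common default; a higher preference yields more clusters.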

Page 14: Lightweight Multilingual Entity Extraction and Linking Slide

Entity Disambiguation

Label Propagation (LabelProp)

Modified adsorption (MAD)

We inject seed labels L on a few nodes.

For the nodes in V’, we assign a label distribution.

In addition, MAD takes three hyperparameters as input: μ1, μ2, and μ3.

We pick the highest-ranked label for each node in V as the final candidate.
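Full MAD also derives per-node injection, continuation, and abandonment probabilities from the graph structure. The sketch below fixes those to constants (seeded nodes inject their label, every node continues to its neighbors, and every node is pulled toward a dummy "no label" entry with weight μ3), which reduces each update to a weighted average of the node's seed label, its neighbors' current labels, and the dummy label. It uses the μ values from the Setup slide; the graph, the label names, and the `__NONE__` dummy label are hypothetical.

```python
from typing import Dict, List, Tuple

DUMMY = "__NONE__"  # hypothetical "no label" entry targeted by the mu3 term


def mad_style_propagation(
    n: int,
    edges: List[Tuple[int, int, float]],
    seeds: Dict[int, str],
    labels: List[str],
    mu1: float = 1.0,
    mu2: float = 1e-2,
    mu3: float = 1e-2,
    iterations: int = 100,
) -> List[str]:
    all_labels = labels + [DUMMY]
    k = len(all_labels)
    W = [[0.0] * n for _ in range(n)]
    for u, v, w in edges:  # undirected edges
        W[u][v] += w
        W[v][u] += w
    prior = [[0.0] * k for _ in range(n)]
    for v, lab in seeds.items():
        prior[v][all_labels.index(lab)] = 1.0
    r = [0.0] * (k - 1) + [1.0]  # regularization target: the dummy label
    Y = [row[:] for row in prior]
    for _ in range(iterations):
        new_Y = []
        for v in range(n):
            s = 1.0 if v in seeds else 0.0
            num = [mu1 * s * prior[v][j] + mu3 * r[j] for j in range(k)]
            den = mu1 * s + mu3
            for u in range(n):
                w = W[v][u] + W[u][v]
                if u != v and w > 0:
                    den += mu2 * w
                    for j in range(k):
                        num[j] += mu2 * w * Y[u][j]
            new_Y.append([x / den for x in num])
        Y = new_Y
    # Highest-ranked real label per node (the dummy label is skipped).
    return [labels[max(range(k - 1), key=lambda j: Y[v][j])] for v in range(n)]
```

On a four-node chain seeded with A at one end and B at the other, each interior node takes the label of the seed it is closer to.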

Page 15: Lightweight Multilingual Entity Extraction and Linking Slide

Outline

Introduction

Method

Experiment

Conclusion

Page 16: Lightweight Multilingual Entity Extraction and Linking Slide

Experiment

Datasets:

Cross-lingual: TAC KBP 2013

Monolingual: AIDA-CoNLL 2003

Page 17: Lightweight Multilingual Entity Extraction and Linking Slide

Experiment

Setup

N-best: N = 10

FwBw: λ = 0.5

Exemplar: max_iterations = 300, λ = 0.5

LabelProp: μ1 = 1, μ2 = 1e-2, μ3 = 1e-2
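The Setup values can be collected in one place, as in the sketch below. The key names are hypothetical, and reading "N-best: N = 10" as "keep the 10 highest-scoring candidate entities per mention before disambiguation" is an assumption. `heapq.nlargest` performs that truncation without sorting the full candidate list.

```python
import heapq
from typing import Dict, List, Tuple

# Hyperparameters as listed on the Setup slide; key names are hypothetical.
SETUP = {
    "n_best": 10,
    "fwbw": {"lambda": 0.5},
    "exemplar": {"max_iterations": 300, "lambda": 0.5},
    "labelprop": {"mu1": 1.0, "mu2": 1e-2, "mu3": 1e-2},
}


def n_best(scores: Dict[str, float], n: int = SETUP["n_best"]) -> List[Tuple[str, float]]:
    """Keep the n highest-scoring candidate entities for one mention, best first."""
    return heapq.nlargest(n, scores.items(), key=lambda kv: kv[1])
```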

Page 18: Lightweight Multilingual Entity Extraction and Linking Slide

Experiment

TAC KBP Evaluation Results

Page 19: Lightweight Multilingual Entity Extraction and Linking Slide

Experiment

Analysis

Page 20: Lightweight Multilingual Entity Extraction and Linking Slide

Experiment

Analysis

Page 21: Lightweight Multilingual Entity Extraction and Linking Slide

Experiment

AIDA Evaluation

Page 22: Lightweight Multilingual Entity Extraction and Linking Slide

Experiment

Runtime Performance

Page 23: Lightweight Multilingual Entity Extraction and Linking Slide

Outline

Introduction

Method

Experiment

Conclusion

Page 24: Lightweight Multilingual Entity Extraction and Linking Slide

Conclusion

Our NER implementation is outperformed only by NER systems that use much more complex feature engineering and/or modeling methods.

In future work, we plan to improve the performance of our system for other languages by expanding the pool of entities for which we have information.

Candidate retrieval in Spanish is relatively poor compared with English and Chinese.