
Datasets and GATE Evaluation Framework for Benchmarking Wikipedia Based NER Systems

Milan Dojchinovski 1,2, Tomáš Kliegr 1

1 Faculty of Informatics and Statistics, University of Economics, Prague

2 Faculty of Information Technology, Czech Technical University in Prague

"NLP & DBpedia" ISWC 2013 workshop, October 22nd, 2013, Sydney, Australia

Milan Dojchinovski - [email protected] - @m1ci - http://dojchinovski.mk

Except where otherwise noted, the content of this presentation is licensed under Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported.

Czech Technical University in Prague

University of Economics Prague

Outline


‣ Introduction
‣ Prerequisites and challenges
‣ GATE framework for benchmarking NER
‣ Conclusion and future directions

What is a Named Entity Recognition task?


The Charles Bridge is a famous historic bridge that crosses the Vltava river in Prague, Czech Republic. Its construction started in 1357 under the auspices of King Charles IV, and finished in the beginning of the 15th century. The bridge replaced the old Judith Bridge, built in 1158–1172, which had been badly damaged by a flood in 1342.

Entity           | Entity URI                                                  | Type
Charles Bridge   | http://dbpedia.org/resource/Charles_Bridge                  | http://dbpedia.org/ontology/Bridge
Vltava           | http://dbpedia.org/resource/Vltava                          | http://dbpedia.org/ontology/River
Prague           | http://dbpedia.org/resource/Prague                          | http://dbpedia.org/ontology/City
Czech Republic   | http://dbpedia.org/resource/Czech_Republic                  | http://dbpedia.org/ontology/Country
King Charles IV  | http://dbpedia.org/resource/Charles_IV,_Holy_Roman_Emperor  | http://dbpedia.org/ontology/Person
Judith Bridge    | http://dbpedia.org/resource/Judith_Bridge                   | http://dbpedia.org/ontology/Bridge

‣ Main sub-tasks
  - spotting of entities: tagging a text fragment as an entity
  - disambiguation of entities: unique identification of entities using URIs
  - classification of entities: assignment of a type to an entity
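To make the three sub-tasks concrete: the outcome for a single mention can be thought of as a span (spotting) plus a resource URI (disambiguation) plus a type URI (classification). A minimal illustrative sketch in Java; the class and field names are not part of any of the tools discussed here:

```java
/** Illustrative only: one recognized entity, covering all three NER sub-tasks. */
public record RecognizedEntity(
        long startOffset,     // spotting: character offset where the mention starts
        long endOffset,       // spotting: character offset where the mention ends
        String surfaceForm,   // spotted text fragment, e.g. "Charles Bridge"
        String entityUri,     // disambiguation: unique DBpedia resource URI
        String typeUri) {     // classification: DBpedia ontology type URI

    public static void main(String[] args) {
        // "Charles Bridge" in "The Charles Bridge is ..." spans offsets 4..18.
        RecognizedEntity e = new RecognizedEntity(4, 18, "Charles Bridge",
                "http://dbpedia.org/resource/Charles_Bridge",
                "http://dbpedia.org/ontology/Bridge");
        System.out.println(e);
    }
}
```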


Existing NER tools


‣ DBpedia Spotlight, NERD, THD (Entityclassifier.eu), AlchemyAPI, OpenCalais, Evri, Lupedia, Wikimeta, Yahoo!, Zemanta, and others.

‣ Differences
  - types come from different taxonomies
  - types with different granularity (Person, SoccerPlayer, Manager)
  - types are plain text literals

‣ Similarities
  - disambiguation with DBpedia or Wikipedia resources (DBpedia Spotlight, Entityclassifier.eu, AlchemyAPI, Wikimeta)


Outline


‣ Introduction
‣ Challenges and prerequisites
‣ GATE framework for benchmarking NER
‣ Conclusion and future directions


Challenges and Prerequisites


‣ Entity spotting
  - spotted entity text fragments might not overlap exactly, yet still be correct
  - entity start and end offsets might differ

‣ Entity disambiguation
  - using unique DBpedia/Wikipedia resource URIs, or URIs from YAGO or Freebase

The Charles Bridge is a famous historic bridge ...   - ground-truth annotation
The Charles Bridge is a famous historic bridge ...   - annotation from a NER system

"Charles Bridge" - http://dbpedia.org/resource/Charles_Bridge
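One way to make the "not exactly overlapping, but still correct" case operational is the strict/lenient distinction used by GATE's Annotation Diff: a strict match requires identical offsets, a lenient match also accepts partially overlapping spans. A minimal sketch (illustrative names; offsets treated as half-open intervals):

```java
/** Illustrative sketch of strict vs. lenient span matching between a
 *  ground-truth (key) annotation and a NER system (response) annotation. */
public final class SpanMatch {

    public enum Kind { STRICT, LENIENT, NONE }

    /** Strict: identical offsets. Lenient: any overlap between the two spans. */
    public static Kind match(long keyStart, long keyEnd,
                             long responseStart, long responseEnd) {
        if (keyStart == responseStart && keyEnd == responseEnd) {
            return Kind.STRICT;
        }
        if (responseStart < keyEnd && keyStart < responseEnd) {
            return Kind.LENIENT;   // partially overlapping, counted as correct in lenient mode
        }
        return Kind.NONE;
    }

    public static void main(String[] args) {
        // Ground truth spots "Charles Bridge" (4..18); the system spots "The Charles Bridge" (0..18).
        System.out.println(match(4, 18, 0, 18));   // prints LENIENT
    }
}
```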


Challenges and Prerequisites


‣ Entity classification
  - different NER systems might return different types for the same entity
  - the types may differ in granularity, yet still be correct

  - Person and SoccerManager are not the same type, but both are correct

Outline


‣ Introduction
‣ Prerequisites and challenges
‣ GATE framework for benchmarking NER
‣ Conclusion and future directions


Architecture overview


‣ Unified evaluation framework
  - any NER tool can be easily integrated and evaluated


Realization


‣ GATE text engineering framework
  - open source, with strong community support
  - easy to extend (plugins and new processing resources)
  - several existing NER clients for GATE: THD, OpenCalais, ANNIE, etc.
  - evaluation tools that can be reused:
    - Corpus Quality Assurance
    - Annotation Diff

‣ Developed tools
  - plugins for importing the News and Tweets datasets
  - a plugin for type alignment
  - a reference implementation of a NER client as a GATE plugin (sketched below)
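To illustrate the "NER client as a GATE plugin" idea, here is a minimal sketch of such a processing resource. The GATE calls (AbstractLanguageAnalyser, AnnotationSet.add, Factory.newFeatureMap) follow the standard GATE plugin pattern; NerService, NerResult, the "NER" annotation set, and the feature names are illustrative assumptions, not the framework's actual reference implementation. CREOLE metadata and init parameters are omitted for brevity.

```java
import gate.AnnotationSet;
import gate.Factory;
import gate.FeatureMap;
import gate.creole.AbstractLanguageAnalyser;
import gate.creole.ExecutionException;
import gate.util.InvalidOffsetException;
import java.util.List;

/** Sketch of a NER client implemented as a GATE processing resource. */
public class RemoteNerClientPR extends AbstractLanguageAnalyser {

    /** Hypothetical value object for one entity returned by the remote service. */
    public record NerResult(long start, long end, String entityUri, String typeUri) {}

    /** Hypothetical thin wrapper around the remote NER service's API. */
    public interface NerService {
        List<NerResult> annotate(String documentText);
    }

    private NerService service;   // would normally be configured via an init parameter

    public void setService(NerService service) { this.service = service; }

    @Override
    public void execute() throws ExecutionException {
        String text = getDocument().getContent().toString();

        // Write the system's output into a dedicated annotation set, so it can later be
        // compared against the ground-truth set with Annotation Diff / Corpus QA.
        AnnotationSet out = getDocument().getAnnotations("NER");

        for (NerResult r : service.annotate(text)) {
            FeatureMap features = Factory.newFeatureMap();
            features.put("entityURI", r.entityUri());  // disambiguation result
            features.put("typeURI", r.typeUri());      // classification result
            try {
                out.add(r.start(), r.end(), "Entity", features);  // spotting result
            } catch (InvalidOffsetException e) {
                throw new ExecutionException("NER service returned invalid offsets: " + e);
            }
        }
    }
}
```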


Evaluation Workflow


• Steps to evaluate a NER system:

  1. Import the ground-truth dataset
     - use the provided plugins for the News and Tweets datasets

  2. Run the NER system on the ground-truth corpus
     - use a GATE client plugin for the NER system
     - if none exists, it has to be implemented (see the plugin sketch in the previous section)

  3. Align entity classes with the ground-truth classes
     - use the provided OntologyAwareDiffPR plugin

  4. Evaluate the performance of the NER tool
     - use the Corpus Quality Assurance tool
     - evaluate NE spotting, disambiguation, and classification (see the scoring sketch below)
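For reference, the strict and lenient precision/recall/F1.0 figures that Corpus Quality Assurance reports (and that appear in the result tables later in this deck) boil down to the computation below. This is a sketch under the assumption that the match counts have already been collected; the method and variable names are illustrative, and the counts in main are hypothetical, not taken from the datasets in this talk.

```java
/** Illustrative computation of strict and lenient NER evaluation scores. */
public final class NerScores {

    /** keyCount = ground-truth annotations, responseCount = NER annotations,
     *  strictMatches = identical spans, partialMatches = overlapping-only spans. */
    public static String report(int keyCount, int responseCount,
                                int strictMatches, int partialMatches) {
        double pStrict = (double) strictMatches / responseCount;
        double rStrict = (double) strictMatches / keyCount;
        int lenientMatches = strictMatches + partialMatches;  // partial matches count as correct
        double pLenient = (double) lenientMatches / responseCount;
        double rLenient = (double) lenientMatches / keyCount;
        return String.format("P %.2f/%.2f  R %.2f/%.2f  F1 %.2f/%.2f",
                pStrict, pLenient, rStrict, rLenient,
                f1(pStrict, rStrict), f1(pLenient, rLenient));
    }

    private static double f1(double p, double r) {
        return (p + r) == 0 ? 0 : 2 * p * r / (p + r);
    }

    public static void main(String[] args) {
        // Hypothetical counts: 200 ground-truth entities, 180 system responses,
        // 90 strict and 30 additional partial matches.
        System.out.println(report(200, 180, 90, 30));
    }
}
```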


News and Tweets datasets


• Tweets dataset
  - derived from the Making Sense of Microposts (MSM) 2013 workshop challenge dataset
  - 1044 tweets, 1523 entities
  - licensed under CC BY-NC-SA 3.0

• News dataset
  - derived from the datasets presented at the WEKEX 2011 workshop
  - 10 standard-length news articles, 588 entities
  - licensed under CC BY-SA 3.0

Fig. 1. Example of an entity annotation in GATE

Type Alignment


• Implemented as a GATE plugin: OntologyAwareDiffPR

• Prerequisites
  - the DBpedia Ontology
  - a typeURI feature on both the ground-truth and the NER annotations

Before alignment:
  ground-truth annotation: typeURI: http://dbpedia.org/ontology/Person
  NER annotation:          typeURI: http://dbpedia.org/ontology/SoccerManager

After alignment:
  ground-truth annotation: typeURI: http://dbpedia.org/ontology/Person
                           aligned: http://dbpedia.org/ontology/SoccerManager
  NER annotation:          typeURI: http://dbpedia.org/ontology/SoccerManager
                           aligned: http://dbpedia.org/ontology/SoccerManager
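A rough sketch of what such an alignment does: if one annotation's typeURI subsumes the other's in the DBpedia Ontology class hierarchy, both annotations receive the same aligned value (the more specific of the two types), so a later diff on the aligned feature treats them as equal. The hierarchy fragment below is hard-coded and flattened purely for illustration; OntologyAwareDiffPR itself works against the full ontology, so this only approximates its behaviour.

```java
import java.util.Map;
import java.util.Optional;

/** Simplified sketch of the type-alignment idea behind OntologyAwareDiffPR. */
public final class TypeAlignment {

    private static final String NS = "http://dbpedia.org/ontology/";

    /** child class -> parent class: a tiny, flattened fragment of the DBpedia Ontology. */
    private static final Map<String, String> PARENT =
            Map.of(NS + "SoccerManager", NS + "Person");

    /** Returns the aligned type if one type subsumes the other, empty otherwise. */
    public static Optional<String> align(String groundTruthType, String nerType) {
        if (isAncestor(groundTruthType, nerType)) return Optional.of(nerType);         // NER type is more specific
        if (isAncestor(nerType, groundTruthType)) return Optional.of(groundTruthType); // ground truth is more specific
        if (groundTruthType.equals(nerType)) return Optional.of(nerType);
        return Optional.empty();
    }

    private static boolean isAncestor(String ancestor, String type) {
        for (String t = PARENT.get(type); t != null; t = PARENT.get(t)) {
            if (t.equals(ancestor)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        // Person (ground truth) vs. SoccerManager (NER): aligned to SoccerManager.
        System.out.println(align(NS + "Person", NS + "SoccerManager"));
    }
}
```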


Outline


‣ Introduction
‣ Prerequisites and challenges
‣ GATE framework for benchmarking NER
‣ Conclusion and future directions


Conclusion and Future Directions


• GATE Evaluation Framework
  - two ground-truth datasets
  - two plugins for importing the News and Tweets datasets
  - a plugin performing basic type alignment
  - a reference implementation of Entityclassifier.eu NER as a GATE client
  - plugins published under GPL v3.0

• Future work
  - integration of additional NER/NIF systems
  - development of more advanced type alignment
  - improvement of the existing datasets and creation of new ones
  - additional ground-truth datasets
  - additional NER evaluation statistics
  - using the NERD ontology to integrate the tag sets of common wikifiers



Thank you! Questions, comments, ideas?

Milan Dojchinovski - @m1ci - [email protected] - http://dojchinovski.mk

Except where otherwise noted, the content of this presentation is licensed under Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported.

Feedback

‣ Framework resources:
  - general info: http://entityclassifier.eu/datasets/evaluation/
  - datasets: http://entityclassifier.eu/datasets/evaluation/benchmark-datasets/
  - tools: http://entityclassifier.eu/datasets/evaluation/tools/

Dataset annotation process


Dataset | Wikipedia URL | Coarse-grained type | Fine-grained type | Most frequent sense
News    | 0.61          | 0.65                | 0.70              | 0.77
Tweets  | 0.79          | n/a                 | 0.64              | 0.86

‣ Inter-annotator agreement
  - two annotators
  - an additional one or two annotators for spurious cases

• Fields
  - URL to English Wikipedia
  - Fine-grained type
  - Coarse-grained type
  - MFS flag
  - Common entity flag
  - Full name
  - Partial flag
  - Incorrect capitalization flag (Tweets only)
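Assuming each ground-truth entity carries the fields above, one annotation could be modelled roughly as follows; the field names are hypothetical and do not prescribe the datasets' actual serialization:

```java
/** Illustrative model of one ground-truth entity annotation with the fields listed above
 *  (hypothetical names; not the datasets' actual format). */
public record GroundTruthEntity(
        String wikipediaUrl,                // URL to English Wikipedia
        String fineGrainedType,
        String coarseGrainedType,
        boolean mostFrequentSense,          // MFS flag
        boolean commonEntity,               // Common entity flag
        String fullName,
        boolean partial,                    // Partial flag
        boolean incorrectCapitalization) {} // Tweets dataset only
```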


News and Tweets datasets


• Tweets dataset
  - derived from the Making Sense of Microposts (MSM) 2013 workshop challenge dataset
  - 1044 tweets, 1523 entities
  - licensed under CC BY-NC-SA 3.0

• News dataset
  - derived from the datasets presented at the WEKEX 2011 workshop
  - 10 standard-length news articles, 588 entities
  - licensed under CC BY-SA 3.0

Dataset | Documents | Entities (all) | With CoNLL type | With ontology type | With Wikipedia URL
News    | 10        | 588            | 580             | 367                | 440
Tweets  | 1044      | 1523           | 1523            | 1379               | 1354

Fig. 1. Size metrics for the Tweets and News datasets


Entityclassifier.eu preliminary results


Task                  | Precision (strict/lenient) | Recall (strict/lenient) | F1.0 score (strict/lenient)
Entity spotting       | 0.45/0.56                  | 0.67/0.84               | 0.54/0.67
Entity disambiguation | 0.24/0.26                  | 0.36/0.39               | 0.29/0.31
Entity classification | 0.12/0.13                  | 0.17/0.19               | 0.14/0.15

Fig. 1. Results for the Entityclassifier.eu NER on the Tweets dataset

Task                  | Precision (strict/lenient) | Recall (strict/lenient) | F1.0 score (strict/lenient)
Entity spotting       | 0.69/0.78                  | 0.33/0.38               | 0.45/0.51
Entity disambiguation | 0.37/0.41                  | 0.18/0.20               | 0.24/0.27
Entity classification | 0.69/0.78                  | 0.33/0.38               | 0.45/0.51

Fig. 2. Results for the Entityclassifier.eu NER on the News dataset

• http://entityclassifier.eu
