Entity Mention Detection using a Combination of Redundancy-Driven Classifiers

Silvana Marianela Bernaola Biggio, Manuela Speranza, Roberto Zanoli ({bernaola, manspera, zanoli}@fbk.eu)
Fondazione Bruno Kessler – Irst, Trento, Italy
The present work is supported by the LiveMemories Project. May 2010

Page 1: Entity Mention Detection using a Combination of Redundancy-Driven Classifiers

Entity Mention Detection using a Combination of Redundancy-Driven Classifiers

Silvana Marianela Bernaola Biggio, Manuela Speranza, Roberto Zanoli
{bernaola, manspera, zanoli}@fbk.eu

Fondazione Bruno Kessler – Irst, Trento, Italy

The present work is supported by the LiveMemories Project. May 2010

Page 2: Entity Mention Detection using a Combination of Redundancy-Driven Classifiers

Outline

• Entity Mention Detection: an extension of the NER task.
• The system to be presented:
  - Mention levels: NAM, NOM, PRO
  - Entity types: GPE, LOC, ORG, PER
  - Draws on 2 systems (ACE 2008, EVALITA 2009)
  - 2 new features to recognize mentions
  - Applied in LiveMemories and to the Italian Wikipedia
  - Available as a web service, to be integrated into TextPro

Page 3: Entity Mention Detection using a Combination of Redundancy-Driven Classifiers

Mentions: Named Entities

Venezuelan President Hugo Chavez on Saturday called for Internet regulations. He demanded that authorities crack down on a news Web site he accused of spreading false information. "The Internet cannot be something open where anything is said and done." President said, according to reports by Reuters.

Hugo Rafael Chávez Frías (28 July 1954) is the President of Venezuela.

4 mentions of type NAM (proper name): 2 PER, 1 ORG, 1 GPE

Page 4: Entity Mention Detection using a Combination of Redundancy-Driven Classifiers

Mentions: Nominals

Venezuelan President Hugo Chavez on Saturday called for Internet regulations. He demanded that authorities crack down on a news Web site he accused of spreading false information. "The Internet cannot be something open where anything is said and done." President said, according to reports by Reuters.

Hugo Rafael Chávez Frías (28 July 1954) is the President of Venezuela.

3 nominal mentions (NOM): 3 PER

Page 5: Entity Mention Detection using a Combination of Redundancy-Driven Classifiers

Mentions: Pronominals

Venezuelan President Hugo Chavez on Saturday called for Internet regulations. He demanded that authorities crack down on a news Web site he accused of spreading false information. "The Internet cannot be something open where anything is said and done." President said, according to reports by Reuters.

Hugo Rafael Chávez Frías (28 July 1954) is the President of Venezuela.

2 pronoun mentions (PRO): 2 PER

Page 6: Entity Mention Detection using a Combination of Redundancy-Driven Classifiers

Nested Mentions

Venezuelan President Hugo Chavez on Saturday called for Internet regulations. He demanded that authorities crack down on a news Web site he accused of spreading false information. "The Internet cannot be something open where anything is said and done." President said, according to reports by Reuters.

One-level mentions: Hugo Chavez, Venezuelan
Two-level mention: Venezuelan President
Three-level mention: Venezuelan President Hugo Chavez
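The nesting levels on this slide can be reproduced with a small sketch. It assumes, purely for illustration, that mentions are stored as token spans and that a mention's level is one more than the deepest mention it contains; the slides do not say how the system represents nesting, and the function name `nesting_levels` is hypothetical.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int]  # (start, end) token offsets, end exclusive (assumed representation)


def nesting_levels(spans: List[Span]) -> Dict[Span, int]:
    """Level of a span = 1 + the highest level among the spans it contains."""
    levels: Dict[Span, int] = {}
    # Process shorter spans first so inner mentions are resolved before outer ones.
    for span in sorted(spans, key=lambda s: s[1] - s[0]):
        inner = [
            levels[other]
            for other in levels
            if other != span and span[0] <= other[0] and other[1] <= span[1]
        ]
        levels[span] = 1 + max(inner, default=0)
    return levels


tokens = ["Venezuelan", "President", "Hugo", "Chavez"]
# "Venezuelan", "Hugo Chavez", "Venezuelan President", and the full phrase.
spans = [(0, 1), (2, 4), (0, 2), (0, 4)]
levels = nesting_levels(spans)
for (start, end), level in sorted(levels.items(), key=lambda kv: kv[1]):
    print(level, " ".join(tokens[start:end]))
```

With these spans, "Venezuelan" and "Hugo Chavez" come out at level 1, "Venezuelan President" at level 2, and the full phrase at level 3, matching the slide.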

Page 7: Entity Mention Detection using a Combination of Redundancy-Driven Classifiers

Entities

Venezuelan President Hugo Chavez on Saturday called for Internet regulations. He demanded that authorities crack down on a news Web site he accused of spreading false information. "The Internet cannot be something open where anything is said and done." President said, according to reports by Reuters.

Hugo Rafael Chávez Frías (28 July 1954) is the President of Venezuela.

6 different mentions refer to 1 entity of type PER

Page 8: Entity Mention Detection using a Combination of Redundancy-Driven Classifiers

The idea …

Exploiting a large corpus to improve the detection of mentions:

- Patterns
- Data redundancy

“… Italia …”   “… Rossi …”   “… Benetton …”

Page 9: Entity Mention Detection using a Combination of Redundancy-Driven Classifiers

Pattern Extraction

[After annotating the large corpus]

1. Candidates: the words in a window of five positions on either side of the mention (word[n] = MENTION):

word[n-5] word[n-4] word[n-3] word[n-2] word[n-1] word[n] word[n+1] word[n+2] word[n+3] word[n+4] word[n+5]

2. TF-IDF (Term Frequency, Inverse Document Frequency):
• Pattern Frequency: the more frequently a pattern occurs with mentions that belong to a specific category, the more important it is for that category.
• Inverse Category Frequency: the more categories a pattern occurs with, the smaller its contribution to characterizing the semantics of any category it co-occurs with.
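The two weighting criteria can be illustrated with a minimal sketch. The slides do not give the exact formula, so the score below (pattern frequency times log of the number of categories over the number of categories the pattern occurs with) is an assumption made by analogy with standard TF-IDF. Treating each single context word as a pattern, the function names, and the toy co-occurrence data are likewise illustrative assumptions, not the authors' implementation.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def context_window(tokens: List[str], start: int, end: int, size: int = 5) -> List[str]:
    """Words up to `size` positions left and right of the mention tokens[start:end]."""
    return tokens[max(0, start - size):start] + tokens[end:end + size]


def pf_icf(pairs: Iterable[Tuple[str, str]]) -> Dict[Tuple[str, str], float]:
    """Score each (pattern, category) pair as frequency * log(N / cf).

    N is the number of categories and cf the number of categories the pattern
    co-occurs with (assumed weighting, by analogy with TF-IDF).
    """
    counts = Counter(pairs)
    categories = {category for _, category in counts}
    seen_with = defaultdict(set)
    for pattern, category in counts:
        seen_with[pattern].add(category)
    return {
        (pattern, category): freq * math.log(len(categories) / len(seen_with[pattern]))
        for (pattern, category), freq in counts.items()
    }


# Candidate context words around "Torino" (token 7) in the talk's example sentence.
sentence = "La giunta Coni sostiene la candidatura di Torino per le Olimpiadi giovanili 2010".split()
window = context_window(sentence, 7, 8)

# Invented toy co-occurrences: (context word, category of the nearby mention).
toy_pairs = [("sindaco", "PER")] * 3 + [("di", "PER")] * 2 + [("di", "GPE")] * 2
scores = pf_icf(toy_pairs)
```

In the toy data, "di" occurs with every category, so its score is zero for both, while "sindaco" occurs only with PER and keeps a positive score. This is the behaviour the Inverse Category Frequency bullet describes.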

Page 10: Entity Mention Detection using a Combination of Redundancy-Driven Classifiers

Data Redundancy

1. “... La giunta Coni sostiene la candidatura di Torino per le Olimpiadi giovanili 2010 ...” (“... The CONI board supports Turin's candidacy for the 2010 Youth Olympics ...”). Is “Torino” a GPE or an ORG (soccer team)?

2. Prob(“Torino”/type=“GPE”)?
• Use a classifier to recognize all mentions in a large corpus in order to obtain the probability distribution for all mentions across all possible types.

[Chart: distribution of the mention “Torino” over the types PER, ORG, GPE, LOC]

Mention = “Torino”:
B-GPE_NAM: 11823
B-ORG_NAM: 2950
B-LOC_NAM: 33
B-PER_NAM: 5
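The counts for “Torino” can be turned into a probability distribution with a few lines of arithmetic. This is only a sketch of the normalization step, assuming plain relative frequencies; the slides do not say whether the system applies smoothing or further processing, and `type_distribution` is a hypothetical helper name.

```python
from typing import Dict


def type_distribution(tag_counts: Dict[str, int]) -> Dict[str, float]:
    """Relative frequency of each tag among all occurrences of one mention string."""
    total = sum(tag_counts.values())
    if total == 0:
        return {tag: 0.0 for tag in tag_counts}
    return {tag: count / total for tag, count in tag_counts.items()}


# Counts taken from the slide for the mention "Torino".
torino_counts = {"B-GPE_NAM": 11823, "B-ORG_NAM": 2950, "B-LOC_NAM": 33, "B-PER_NAM": 5}
torino_probs = type_distribution(torino_counts)
for tag, prob in sorted(torino_probs.items(), key=lambda kv: -kv[1]):
    print(f"{tag}: {prob:.3f}")
```

Under this assumption, roughly 80% of the probability mass falls on GPE and about 20% on ORG, so the redundancy feature would favour the GPE reading in the ambiguous sentence.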

Page 11: Entity Mention Detection using a Combination of Redundancy-Driven Classifiers

System Architecture

[Figure: system architecture diagram. The captions of its components follow.]

• Identifies the syntactic head of a mention and its mention level. For the extent of a mention, we use the Malt Parser for Italian (Lavelli et al. 2009).

• Recognizes the type of a mention.

Page 12: Entity Mention Detection using a Combination of Redundancy-Driven Classifiers

System Architecture

[Figure: system architecture diagram, step 1]

Page 13: Entity Mention Detection using a Combination of Redundancy-Driven Classifiers

System Architecture

[Figure: system architecture diagram, step 2]

Page 14: Entity Mention Detection using a Combination of Redundancy-Driven Classifiers

System Architecture

[Figure: system architecture diagram, step 3]

Page 15: Entity Mention Detection using a Combination of Redundancy-Driven Classifiers

System Architecture

[Figure: system architecture diagram, step 4]

Page 16: Entity Mention Detection using a Combination of Redundancy-Driven Classifiers

System Architecture

[Figure: system architecture diagram, step 5]

Page 17: Entity Mention Detection using a Combination of Redundancy-Driven Classifiers

System Architecture

[Figure: system architecture diagram, step 6]

Page 18: Entity Mention Detection using a Combination of Redundancy-Driven Classifiers

Evaluation and Feature Analysis

1. EVALITA 2009 EMD Task: value = 65.7%
2. Feature Analysis (FB1 per class):

Class     All features   Without redundancy   Without patterns
General   79.58%         74.09%               79.28%
NAM_GPE   83.65%         78.37%               82.83%
NAM_LOC   73.02%         77.52%               73.02%
NAM_ORG   73.92%         66.81%               72.94%
NAM_PER   91.63%         88.86%               92.03%
NOM_GPE   75.86%         55.38%               75.18%
NOM_LOC   62.37%         55.10%               59.18%
NOM_ORG   71.46%         64.03%               70.41%
NOM_PER   86.32%         78.29%               86.08%
PRO_GPE   30.77%         14.29%               24.00%
PRO_ORG   29.17%         27.59%               30.56%
PRO_PER   69.58%         68.43%               69.97%
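The contribution of each feature can be read off the table by subtracting each ablated FB1 from the full-system FB1. The sketch below does only that arithmetic on the figures above; the dictionary layout and the name `ablation_gains` are illustrative choices, not part of the original system.

```python
from typing import Dict, Tuple

# FB1 (%) per class: (all features, without redundancy, without patterns), copied from the table.
fb1: Dict[str, Tuple[float, float, float]] = {
    "General": (79.58, 74.09, 79.28),
    "NAM_GPE": (83.65, 78.37, 82.83),
    "NAM_LOC": (73.02, 77.52, 73.02),
    "NAM_ORG": (73.92, 66.81, 72.94),
    "NAM_PER": (91.63, 88.86, 92.03),
    "NOM_GPE": (75.86, 55.38, 75.18),
    "NOM_LOC": (62.37, 55.10, 59.18),
    "NOM_ORG": (71.46, 64.03, 70.41),
    "NOM_PER": (86.32, 78.29, 86.08),
    "PRO_GPE": (30.77, 14.29, 24.00),
    "PRO_ORG": (29.17, 27.59, 30.56),
    "PRO_PER": (69.58, 68.43, 69.97),
}


def ablation_gains(rows: Dict[str, Tuple[float, float, float]]) -> Dict[str, Tuple[float, float]]:
    """FB1 points gained by the redundancy feature and by the pattern feature, per class."""
    return {
        cls: (round(full - no_redundancy, 2), round(full - no_patterns, 2))
        for cls, (full, no_redundancy, no_patterns) in rows.items()
    }


gains = ablation_gains(fb1)
for cls, (redundancy_gain, pattern_gain) in gains.items():
    print(f"{cls:8s} redundancy {redundancy_gain:+6.2f}   patterns {pattern_gain:+6.2f}")
```

A positive value means the feature helps. The redundancy gain is 5.49 points overall and 20.48 points for NOM_GPE, while NAM_LOC is the one class where removing redundancy scores higher. The pattern gain is small and mixed, at 0.30 points overall.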

Page 19: Entity Mention Detection using a Combination of Redundancy-Driven Classifiers

Applications …

1. LiveMemories Project: identifying mentions in 2 Italian corpora:

A. Articles from the local newspaper “L’Adige”

B. Blogs posted by students living in the university residence of “San Bartolomeo”

Page 20: Entity Mention Detection using a Combination of Redundancy-Driven Classifiers

Applications …

2. Semantic Wikipedia for Italian (SWiiT), http://textpro.fbk.eu/resources/SWiiT.html, annotated at 5 levels:
A. Basic NLP processing
B. Entity Mentions
C. Entity Subtypes (work in progress)
D. Entity Co-reference (work in progress)
E. Dependency parsing (work in progress)

Page 21: Entity Mention Detection using a Combination of Redundancy-Driven Classifiers

System available as …

1. A web service: http://textpro.fbk.eu/typhoon.html

• Using Axis (open source, XML-based web service framework)

• Allows the user to submit a document and have it annotated with entity mentions using the IOB format

2. Part of TextPro: http://textpro.fbk.eu (work in progress)
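As an illustration of IOB-style mention annotation, the sketch below converts token spans into tags. The `B-<TYPE>_<LEVEL>` tag shape follows the tags shown on the Data Redundancy slide (for example B-GPE_NAM); the matching `I-` continuation tags and `O` for tokens outside any mention are the usual IOB convention and are assumed here. The exact output layout of the web service is not documented in these slides, and `to_iob` is a hypothetical helper, not part of its API.

```python
from typing import List, Tuple

# (start, end, entity type, mention level), end exclusive (assumed representation)
Mention = Tuple[int, int, str, str]


def to_iob(tokens: List[str], mentions: List[Mention]) -> List[str]:
    """Tag each token B-/I-<TYPE>_<LEVEL> inside a mention and O elsewhere."""
    tags = ["O"] * len(tokens)
    for start, end, entity_type, level in mentions:
        if any(tag != "O" for tag in tags[start:end]):
            # A single IOB column holds one label per token, so nested spans do not fit.
            raise ValueError("flat IOB cannot encode overlapping mentions")
        tags[start] = f"B-{entity_type}_{level}"
        for i in range(start + 1, end):
            tags[i] = f"I-{entity_type}_{level}"
    return tags


tokens = ["Hugo", "Chavez", "spoke", "to", "Reuters", "."]
mentions = [(0, 2, "PER", "NAM"), (4, 5, "ORG", "NAM")]
tags = to_iob(tokens, mentions)
for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```

A single tag column can only hold one label per token, which is why the sketch rejects overlapping spans. How the actual service encodes nested mentions is not stated in the slides.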

Page 22: Entity Mention Detection using a Combination of Redundancy-Driven Classifiers

Conclusions and future work

1. Pronominal mentions are difficult to recognize; coreference is needed.

2. Data redundancy improves the general FB1 by around 5 percentage points, and by around 20 points for nominal mentions that refer to geopolitical entities.

3. The results for patterns were not as expected, probably because the patterns selected for each class were not appropriate. As future work, we would like to find out how to select the right patterns for each class.

Page 23: Entity Mention Detection using a Combination of Redundancy-Driven Classifiers

References

• Bartalesi Lenzi, V., Sprugnoli, R. (2009). EVALITA 2009: Description and Results of the Local Entity Detection and Recognition (LEDR) task. In Proceedings of Evalita 2009, workshop held at AI*IA, 12 December 2009, Reggio Emilia, Italy.

• Bernaola Biggio, S.M., Zanoli, R., Giuliano, C., Uryupina, O., Versley, Y., Poesio, M. (2009). Local Entity Detection and Recognition Task. In Proceedings of Evalita 2009, workshop held at AI*IA, 12 December 2009, Reggio Emilia, Italy.

• Bernaola Biggio, S.M., Speranza, M., Zanoli, R. (2010). Entity Mention Detection Using a Combination of Redundancy-Driven Classifiers. In Proceedings of LREC 2010, 7th Conference on Language Resources and Evaluation, Valletta, Malta.

• Lavelli, A., Hall, J., Nilsson, J., Nivre, J. (2009). MaltParser at the EVALITA 2009 Dependency Parsing Task. In Proceedings of Evalita 2009, workshop held at AI*IA, 12 December 2009, Reggio Emilia, Italy.

• Magnini, B., Cappelli, A., Pianta, E., Speranza, M., Bartalesi Lenzi, V., Sprugnoli, R., Romano, L., Girardi, C., Negri, M. (2006). Annotazione di contenuti concettuali in un corpus italiano: I-CAB. In Proceedings of SILFI 2006. Florence, Italy.

• Speranza, M. (2009). The Named Entity Recognition Task at EVALITA 2009. In Proceedings of Evalita 2009, workshop held at AI*IA, 12 December 2009, Reggio Emilia, Italy.