multilingual event extraction and semi-automatic acquisition of related resources

Post on 09-Jul-2015

DESCRIPTION

How to create a multilingual event extraction system

TRANSCRIPT

Multilingual Event Extraction and Semi-automatic Acquisition of Related Resources

Hristo Tanev
Joint Research Centre, Ispra, Italy

NEXUS: News Event eXtraction Using language Structures

Event Extraction

Event extraction was introduced as a language processing task at MUC-2 in 1989

An event is something that happens; an event description is a template that describes an event

The goal of automatic event extraction is automatic filling of an event description template from a text or a set of texts

An event description usually includes:
• Event type
• Time and place of the event
• Participating entities, which have specific roles that depend on the event type, e.g. perpetrator, victim, instrument, etc.
• Cause
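To make the notion of a template concrete, here is a minimal Python sketch of an event description with the slots listed above. The class name, field names and the `unfilled_slots` helper are illustrative assumptions, not part of NEXUS.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class EventDescription:
    """Hypothetical event template; all names here are illustrative."""
    event_type: Optional[str] = None
    date: Optional[str] = None
    place: Optional[str] = None
    # Maps a role (e.g. "perpetrator", "victim") to the entities filling it.
    participants: Dict[str, List[str]] = field(default_factory=dict)
    cause: Optional[str] = None

    def add_participant(self, role: str, entity: str) -> None:
        """Attach an entity to an event-specific role."""
        self.participants.setdefault(role, []).append(entity)

    def unfilled_slots(self) -> List[str]:
        """Return the names of slots that extraction has not filled yet."""
        slots = {
            "event_type": self.event_type,
            "date": self.date,
            "place": self.place,
            "participants": self.participants,
            "cause": self.cause,
        }
        return [name for name, value in slots.items() if not value]


# Filling a template step by step, as an extraction system would.
event = EventDescription(event_type="terrorist attack", place="Baghdad, Iraq")
event.add_participant("instrument", "car bomb")
```

Extraction then amounts to filling as many of these slots as the text supports; slots that stay empty correspond to "not reported".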

Event Extraction in the Context of EMM

The purpose of the automatic event extraction from online news is to facilitate the crisis-management efforts of the European Commission and other related political institutions

NEXUS detects security-related events and disasters

NEXUS monitors online news in near real time in English, French, Spanish, Italian, Russian, Portuguese, and Arabic (after automatic translation into English)

Medical NEXUS detects news about disease outbreaks in English (soon to be deployed in French)

EMM Event Extraction from Online News

News cluster:

Car bomb kills 50 in Iraq
HindustanTimes, Wednesday, June 18, 2008 5:07:00 AM CEST
A car bomb blast in northern Baghdad left more than 50 people dead and 80 wounded on Tuesday, a police source said…

Biggest blast in months leaves at least 50 dead in Iraq
reliefWeb, Wednesday, June 18, 2008 5:05:00 AM CEST
A car bomb blast in northern Baghdad, the largest in months, left more than 50 people dead and 80 wounded on Tuesday, a police source said...

EMM Event Extraction from Online News

Event Description

• Date: 18 June 2008
• Place: Baghdad, Iraq
• Event type: terrorist attack
• Number killed: 50
• Number wounded: 80
• Number kidnapped: 0
• Perpetrators: not reported
• Weapons: car bomb

NEXUS

EMM Event Extraction Architecture

[Architecture diagram. Components: News; Entity Match, Geo-Tagging, Clustering; Text Processing (NER, Parsing, Pattern Matching); Information Aggregation; Visualization; Events]

Partial Parsing

Example of a multilingual rule that recognizes NPs such as "a French volunteer and an Italian military"

coordination_rule :> ( person_group & [NAME:#name1, AMOUNT:"1" #amount1]
    (token & [SURFACE: ","]? person_group & [NAME:#name2, AMOUNT:"1" #amount2])?
    (token & [SURFACE: ","]? person_group & [NAME:#name3, AMOUNT:"1" #amount3])?
    conjunction
    person_group & [NAME:#name4, AMOUNT:"1" #amount4]):c

c: person_group & [NAME:#final, AMOUNT:#amount, NUMBER:"p"]
    & #final := ConcForSum(#name1,#name2,#name3,#name4)
    & #amount := ConcForSum(#amount1,#amount2,#amount3,#amount4).
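The effect of this rule can be approximated in Python as follows. This is only a sketch under stated assumptions: the `PersonGroup` class, the conjunction list and the way names are joined are hypothetical, and since the slide does not say what `ConcForSum` does, the code simply joins the names and adds up the amounts.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class PersonGroup:
    name: str
    amount: int
    number: str = "s"  # "s" = singular, "p" = plural


Token = Union[PersonGroup, str]

# Assumed sample of conjunctions for a few languages (illustrative only).
CONJUNCTIONS = {"and", "et", "e", "y"}


def merge_coordination(items: List[Token]) -> Optional[PersonGroup]:
    """Merge 'PG (,? PG){0,2} CONJ PG', each PG of amount 1, into one plural group.

    Unlike a grammar rule applied to running text, this sketch requires the
    pattern to cover the whole input list.
    """
    def is_single(x: Token) -> bool:
        return isinstance(x, PersonGroup) and x.amount == 1

    if not items or not is_single(items[0]):
        return None
    groups = [items[0]]
    pos = 1

    # Up to two optional middle groups, each optionally preceded by a comma.
    for _ in range(2):
        nxt = pos + 1 if pos < len(items) and items[pos] == "," else pos
        if nxt < len(items) and is_single(items[nxt]):
            groups.append(items[nxt])
            pos = nxt + 1
        else:
            break

    # Mandatory conjunction followed by the final group.
    if (pos + 1 < len(items)
            and isinstance(items[pos], str)
            and items[pos].lower() in CONJUNCTIONS
            and is_single(items[pos + 1])):
        groups.append(items[pos + 1])
        pos += 2
    else:
        return None
    if pos != len(items):
        return None

    # Assumed stand-in for ConcForSum: join the names, add up the amounts.
    return PersonGroup(
        name=" + ".join(g.name for g in groups),
        amount=sum(g.amount for g in groups),
        number="p",
    )


merged = merge_coordination(
    [PersonGroup("French volunteer", 1), "and", PersonGroup("Italian military", 1)]
)
```

Because the rule only refers to `person_group` structures, commas and a conjunction class, the same logic carries over to other languages once the conjunction list is extended.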

Annotating Participating Entities

One of the most important tasks is to label person groups and other phrases with event-specific semantic roles, e.g. Perpetrator, Dead victim, Displaced people, Weapons used, etc.

Linear patterns work well for English

We also use linear patterns for Russian

More elaborate event extraction grammars are used for Arabic, Italian, French, Spanish and Portuguese
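A linear pattern can be thought of as a fixed surface string next to a slot for a person group. The following Python sketch illustrates the idea with regular expressions; the three patterns, the tiny person-group expression and the role names are made-up examples, not patterns taken from NEXUS.

```python
import re
from typing import Dict, List

# Very small, assumed approximation of a person-group phrase.
PERSON_GROUP = (
    r"(?P<group>(?:\d+|one|two|three|four|five|several|many)\s+"
    r"(?:\w+\s+)?(?:people|persons|civilians|soldiers|men|women))"
)

# Hypothetical linear patterns: (surface pattern around the slot, role).
LINEAR_PATTERNS = [
    (re.compile(PERSON_GROUP + r"\s+(?:were|was)\s+killed", re.I), "dead"),
    (re.compile(PERSON_GROUP + r"\s+(?:were|was)\s+(?:injured|wounded)", re.I), "injured"),
    (re.compile(r"kidnapped\s+" + PERSON_GROUP, re.I), "kidnapped"),
]


def annotate_roles(sentence: str) -> Dict[str, List[str]]:
    """Label person groups with the role of every linear pattern that matches."""
    roles: Dict[str, List[str]] = {}
    for pattern, role in LINEAR_PATTERNS:
        for match in pattern.finditer(sentence):
            roles.setdefault(role, []).append(match.group("group"))
    return roles


roles = annotate_roles("Five people were killed and 80 civilians were wounded, police said.")
```

Patterns of this kind depend on fairly rigid word order, which is one plausible reason why they suit English better than languages with richer morphology or freer word order.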

Event-specific Grammars

Rule: <person-group> [introduce-passive] Verb[baseform: rimanere]? Adv? Verb[sem: injured-obj, passive-voice] <person-group> : injured

Cinque persone sono state ferite (Five people were injured)
Cinque persone sono state gravemente ferite (Five people were seriously injured)
Cinque persone sono rimaste ferite (Five people were left injured)

For details see [Zavarella et al., Event Extraction for Italian Using a Cascade of Finite State Grammars, FSMNLP 2008]
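As a rough illustration, the rule can be emulated with a regular expression over surface forms. This is a simplified sketch: the real grammar works on morphological features such as base form and voice, whereas the word lists below are hand-picked assumptions, as is the reading of [introduce-passive] as a form of "essere" optionally followed by "stato".

```python
import re
from typing import Optional

# Assumed, hand-picked word lists standing in for morphological features.
NUMBERS = r"(?:\d+|due|tre|quattro|cinque|sei)"
PERSON_GROUP = rf"(?P<group>{NUMBERS}\s+(?:persone|uomini|donne|civili))"
PASSIVE_AUX = r"(?:sono\s+stat[ie]|sono|è\s+stat[oa])"  # [introduce-passive]
RIMANERE = r"rimast[oaie]"                              # Verb[baseform: rimanere]
ADVERB = r"(?:gravemente|lievemente)"                   # Adv
INJURED = r"ferit[oaie]"                                # Verb[sem: injured-obj]

INJURED_RULE = re.compile(
    rf"{PERSON_GROUP}\s+{PASSIVE_AUX}(?:\s+{RIMANERE})?(?:\s+{ADVERB})?\s+{INJURED}",
    re.IGNORECASE,
)


def extract_injured(sentence: str) -> Optional[str]:
    """Return the person group labelled 'injured', or None if the rule fails."""
    match = INJURED_RULE.search(sentence)
    return match.group("group") if match else None


examples = [
    "Cinque persone sono state ferite",
    "Cinque persone sono state gravemente ferite",
    "Cinque persone sono rimaste ferite",
]
results = [extract_injured(s) for s in examples]
```

A single rule with optional elements covers all three example sentences, which is the advantage of a grammar over enumerating each surface variant as a separate linear pattern.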

Multilingual Lexical Acquisition


Automatic learning of language-specific lexical resources

Statistical approaches, weakly supervised, make use of large quantities of unannotated news

Learning of patterns, keywords and keyphrases, which can be manually validated, rather than statistical models like SVM

• Pattern learning
• Learning domain-specific lexica
• Learning semantic classes

Linear Pattern Learning

For English we use the linear patterns as the algorithm learns them

We learned more than 3000 linear patterns for English

For Italian and other languages, linear patterns are a starting point for grammar development

Learning Semantic Classes

Sometimes, it is necessary to learn specific semantic classes, e.g. vehicles, disasters, weapons, facilities

We built a statistical system for automatic acquisition of semantic classes

The system is language-independent; only a list of language-specific stop words is used

Ontopopulis

INPUT:

feelings: hatred, love, fear, sadness

contrasting classes: taste, (style, outlook), character, thoughts

Extracting New Terms

Newly learnt terms are ranked and then given to the user for evaluation

Top 20 terms from the category feelings:

grief, sorrow, sadness, condolences, fear, disappointment, regret, sympathy, shock, hatred, gratitude, frustration, anger, deep sorrow, profound, dismay, condolence, satisfaction, profound grief, deep grief
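The slides do not describe how Ontopopulis scores candidate terms. Purely as an illustration of how seed terms, contrasting classes and a stop-word list could interact, the toy Python sketch below ranks candidates by how much their sentence contexts overlap with those of the seeds, minus the overlap with the contrasting classes. The scoring scheme, the function names and the sample sentences are all assumptions.

```python
from collections import Counter
from typing import Iterable, List

# Assumed toy stop-word list; the real system uses a language-specific one.
STOP_WORDS = {"a", "he", "she", "they"}


def context_features(term: str, sentences: Iterable[str]) -> Counter:
    """Count non-stop words that share a sentence with `term`."""
    features: Counter = Counter()
    for sentence in sentences:
        words = sentence.lower().split()
        if term in words:
            features.update(w for w in words if w != term and w not in STOP_WORDS)
    return features


def rank_candidates(seeds: Iterable[str], contrast: Iterable[str],
                    candidates: List[str], sentences: List[str]) -> List[str]:
    """Order candidates by context overlap with seeds minus overlap with contrast terms."""
    seed_feats: Counter = Counter()
    for term in seeds:
        seed_feats.update(context_features(term, sentences))
    contrast_feats: Counter = Counter()
    for term in contrast:
        contrast_feats.update(context_features(term, sentences))

    def score(term: str) -> int:
        feats = context_features(term, sentences)
        positive = sum(min(count, seed_feats[w]) for w, count in feats.items())
        negative = sum(min(count, contrast_feats[w]) for w, count in feats.items())
        return positive - negative

    return sorted(candidates, key=score, reverse=True)


corpus = [
    "he expressed deep sadness",
    "they expressed fear",
    "she expressed deep grief",
    "a refined taste",
    "a refined elegance",
]
ranking = rank_candidates({"sadness", "fear"}, {"taste"}, ["elegance", "grief"], corpus)
```

In this toy setup the contrasting class pushes "elegance" down because it shares the context "refined" with "taste", while "grief" rises because it shares "expressed" and "deep" with the seeds.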

Using Learnt Semantic Classes for Event Extraction

We use Ontopopulis to learn terms, which we next put into our domain-specific dictionaries

Some rules which require a domain-specific dictionary:
• Rules for parsing person reference noun phrases, such as "two engineers"
• Rules which detect weapons used: killed with a [WEAPON] (killed with a gun)
• Detection of vehicles used: [PEOPLE] in a [VEHICLE] were stopped (three men in a boat were stopped)
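The role of the learnt dictionaries in such rules can be sketched in Python as follows. The dictionary contents, the rule expressions and the slot names are illustrative assumptions; the point is only that the [WEAPON] and [VEHICLE] slots are filled from term lists that can grow without changing the rules.

```python
import re
from typing import Dict, Iterable

# Assumed dictionary entries, e.g. terms validated after semantic class learning.
WEAPONS = {"gun", "knife", "grenade", "car bomb"}
VEHICLES = {"boat", "car", "truck"}


def alternation(terms: Iterable[str]) -> str:
    """Build a regex alternation from dictionary terms, longest first."""
    return "|".join(re.escape(t) for t in sorted(terms, key=len, reverse=True))


# "killed with a [WEAPON]"
WEAPON_RULE = re.compile(
    rf"killed with an? (?P<weapon>{alternation(WEAPONS)})\b", re.IGNORECASE
)
# "[PEOPLE] in a [VEHICLE] were stopped"; the people slot is heavily simplified.
VEHICLE_RULE = re.compile(
    rf"(?P<people>\w+ (?:men|women|people)) in an? "
    rf"(?P<vehicle>{alternation(VEHICLES)}) were stopped",
    re.IGNORECASE,
)


def extract_slots(text: str) -> Dict[str, str]:
    """Apply both dictionary-backed rules and collect the filled slots."""
    slots: Dict[str, str] = {}
    match = WEAPON_RULE.search(text)
    if match:
        slots["weapon"] = match.group("weapon")
    match = VEHICLE_RULE.search(text)
    if match:
        slots["people"] = match.group("people")
        slots["vehicle"] = match.group("vehicle")
    return slots


weapon_slots = extract_slots("The victim was killed with a gun")
vehicle_slots = extract_slots("Three men in a boat were stopped")
```

Adding a newly learnt term such as another weapon name to the set is enough to extend coverage, which is why semi-automatic dictionary acquisition pays off.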

NEXUS Evaluation for English

Detection Task              Accuracy
Dead counting               70%
Injured counting            57%
Event classification        80%
Geo-tagging (country)       90%
Geo-tagging (place name)    61%

NEXUS Multilingual Evaluation

F1 measure    Dead    Wounded    Kidnapped    Arrested
Italian       0.87    0.62       -            0.67
Portuguese    0.69    0.51       0.67         0.47

Evaluation of Ontopopulis

Accuracy (%) top 20    Person    Weapon    Politician    Vehicle    Watercraft    Edged weapon    Crime    Building
Portuguese             90        60        75            85         70            20              85       75
Spanish                95        60        -             -          -             -               -        -
