
HeidelTime at TempEval-3: Tuning English and Developing Spanish Resources

Jannik Strötgen, Julian Zell, Michael Gertz
Database Systems Research Group, Heidelberg University, Im Neuenheimer Feld 348, 69120 Heidelberg, Germany

Motivation

Temporal Tagging
• extraction & normalization of temporal expressions

Main Challenge
• normalizing relative and underspecified expressions

Different Domains [1]
• pose different challenges
• require different strategies

Existing Approaches
• focus on English
• focus on news documents

News (1998-04-18)
... for the United States, he said today. ... On May 22, 1995, Farkas was made a brigadier general, and the following year ... However, cited by police in December for driving under the influence of alcohol ...

Narrative (2009-12-19)
1979: Soviet invasion ... land in Kabul on December 25 ... they were complying with the 1978 Treaty of Friendship ... entered Afghanistan from the north on December 27. In the morning, the 103rd ...

HeidelTime: a multilingual, cross-domain temporal tagger

Key Features [2]
• rule-based system
• required: sentence, token, and POS information

Source Code (language-independent)
• resource interpreter
• domain-dependent normalization strategies
  → reference time
  → relation to reference time
• extraction: regular expressions & NLP features
• normalization: knowledge resources & linguistic clues

Resources (language-dependent)
• pattern files
    month=(...|April|May|...)
• normalization files
    normMonth(April)=04
• rule files
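The two questions behind the domain-dependent normalization strategies (which reference time to use, and how the expression relates to it) can be illustrated with a small Python sketch. It is not HeidelTime's implementation. The function names and the `relation` argument are hypothetical, and the sketch assumes that the date shown with the news example is the document creation time and that narrative text takes its reference from a previously mentioned expression.

```python
from datetime import date

MONTHS = {name: number for number, name in enumerate(
    ["january", "february", "march", "april", "may", "june", "july",
     "august", "september", "october", "november", "december"], start=1)}


def resolve_month(month_name, reference, relation):
    """Resolve a bare month name to YYYY-MM relative to a reference date.

    relation is a hypothetical label ("past", "future", or anything else for
    "same year") standing in for the linguistic clues a tagger would use.
    """
    month = MONTHS[month_name.lower()]
    year = reference.year
    if relation == "past" and month > reference.month:
        year -= 1  # that month has not occurred yet this year, so go back
    elif relation == "future" and month < reference.month:
        year += 1  # that month has already passed this year, so go forward
    return f"{year:04d}-{month:02d}"


def resolve_offset_year(reference_year, offset):
    """Resolve expressions such as 'the following year' (offset = +1)."""
    return f"{reference_year + offset:04d}"


def resolve_day_in_year(month_name, day, reference_year):
    """Resolve 'December 25' using the year of a previously mentioned expression."""
    return f"{reference_year:04d}-{MONTHS[month_name.lower()]:02d}-{day:02d}"


# News: reference time is (by assumption) the document creation time.
dct = date(1998, 4, 18)
today = dct.isoformat()
december = resolve_month("December", dct, "past")

# 'the following year' refers back to the previously mentioned May 22, 1995.
following_year = resolve_offset_year(1995, 1)

# Narrative: reference time is the previously mentioned year 1979.
invasion_day = resolve_day_in_year("December", 25, 1979)
```

Under these assumptions, "today" resolves to the document creation time, "December" to the most recent December before it, and "December 25" in the narrative to a date in 1979. Choosing the wrong reference time or relation is what produces an incorrect value.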

TempEval-3

Temporal Tagging (Task A)
• English and Spanish news documents
• annotation according to TimeML

Evaluation
• strict and relaxed extraction
• type and value normalization
• ranking attribute: value F1 (relaxed)

From HeidelTime 1.2 to 1.3
• improved weekday normalization
• annotations closer to TimeML

English Adaptations
• X_REF values
• improved negative rules for ambiguous expressions (e.g., may, fall, march)

Developing Spanish Resources

Four Steps to Add a New Language:

(1) Preprocessing:
• sentence, token, PoS information
• HeidelTime uses TreeTagger
• Spanish TreeTagger module available

(2) Translation of Pattern Files:

English:
    // reMonthLong
    January
    February
    March
    ...

Spanish:
    // reMonthLong
    [Ee]nero
    [Ff]ebrero
    [Mm]arzo
    ...

(3) Translation of Normalization Files:

English:
    // "normMonth"
    "January","01"
    "Jan\.?","01"
    "0?1","01"
    "February","02"
    ...

Spanish:
    // "normMonth"
    "[Ee]nero","01"
    "[Ee]ne\.?","01"
    "0?1","01"
    "[Ff]ebrero","02"
    ...

(4) Iterative Rule Development
• starting with (simple) English rules
• checking training corpus for errors (FP, FN, partial matches, incorrect values)
• adapting patterns, normalizations, and rules to improve results on training data

English:
    // example: "January 20th, 2012" (2012-01-20)
    Name="date_r1"
    Extract="%reMonthLong %reDayNum, %reYear4Digit"
    Value="group(3)-%normMonth(group(1))-%normDay(group(2))"

Spanish:
    // example: "el 20 de enero de 2012" (2012-01-20)
    Name="date_r1"
    Extract="[Ee]l %reDayNum de %reMonthLong de %reYear4Digit"
    Value="group(3)-%normMonth(group(2))-%normDay(group(1))"
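How pattern files, normalization files, and rules interact can be sketched in a few lines of Python. This is an illustration only, not HeidelTime's Java resource interpreter. The names `RESOURCES`, `compile_extraction`, `normalize`, `build_value`, and `tag` are hypothetical, and the contents of `reDayNum`, `reYear4Digit`, and `normDay` are assumptions, since the poster shows only the month resources.

```python
import re

# Day normalization shared by both languages (assumed content; the poster
# does not show normDay). The optional ordinal suffix covers "20th".
NORM_DAY = [(rf"0?{d}(?:st|nd|rd|th)?", f"{d:02d}") for d in range(1, 32)]

RESOURCES = {
    "en": {
        "patterns": {
            "reMonthLong": "January|February|March",
            "reDayNum": r"(?:[12]\d|3[01]|0?[1-9])(?:st|nd|rd|th)?",  # assumed
            "reYear4Digit": r"\d{4}",  # assumed
        },
        "norm": {
            "normMonth": [("January", "01"), (r"Jan\.?", "01"), ("0?1", "01"),
                          ("February", "02"), ("March", "03")],
            "normDay": NORM_DAY,
        },
        "rule": {
            "extract": "%reMonthLong %reDayNum, %reYear4Digit",
            "value": "group(3)-%normMonth(group(1))-%normDay(group(2))",
        },
    },
    "es": {
        "patterns": {
            "reMonthLong": "[Ee]nero|[Ff]ebrero|[Mm]arzo",
            "reDayNum": r"(?:[12]\d|3[01]|0?[1-9])",  # assumed
            "reYear4Digit": r"\d{4}",  # assumed
        },
        "norm": {
            "normMonth": [("[Ee]nero", "01"), (r"[Ee]ne\.?", "01"), ("0?1", "01"),
                          ("[Ff]ebrero", "02"), ("[Mm]arzo", "03")],
            "normDay": NORM_DAY,
        },
        "rule": {
            "extract": "[Ee]l %reDayNum de %reMonthLong de %reYear4Digit",
            "value": "group(3)-%normMonth(group(2))-%normDay(group(1))",
        },
    },
}


def compile_extraction(template, patterns):
    """Turn each %name placeholder into a capturing group for that pattern."""
    return re.compile(
        re.sub(r"%(\w+)", lambda m: "(" + patterns[m.group(1)] + ")", template))


def normalize(table, text):
    """Return the value of the first table entry whose regex matches text fully."""
    for pattern, value in table:
        if re.fullmatch(pattern, text):
            return value
    raise KeyError(text)


def build_value(template, match, norm_tables):
    """Resolve %normX(group(n)) calls first, then any remaining bare group(n)."""
    value = re.sub(
        r"%(\w+)\(group\((\d+)\)\)",
        lambda m: normalize(norm_tables[m.group(1)], match.group(int(m.group(2)))),
        template)
    return re.sub(r"group\((\d+)\)", lambda m: match.group(int(m.group(1))), value)


def tag(text, lang):
    """Return (expression, normalized value) pairs found by the language's rule."""
    res = RESOURCES[lang]
    regex = compile_extraction(res["rule"]["extract"], res["patterns"])
    return [(m.group(0), build_value(res["rule"]["value"], m, res["norm"]))
            for m in regex.finditer(text)]


english = tag("It happened on January 20th, 2012.", "en")
spanish = tag("Firmado el 20 de enero de 2012.", "es")
```

The interpreter functions are identical for both languages. Only the entries under `RESOURCES` differ, which mirrors the separation of language-independent source code from language-dependent resources.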

TempEval-3 Evaluation Results

English
                  strict F1   relaxed F1   value F1
HeidelTime 1.3    81.34       90.30        77.61
HeidelTime 1.2    78.07       86.99        72.12
NavyTime          79.57       90.32        70.97
(NavyTime: next best system by value F1)

Spanish
                  strict F1   relaxed F1   value F1
HeidelTime 1.3    85.33       90.13        85.33
TipSemB           82.57       87.40        71.85
jrc               49.53       65.20        50.78

Summary
• English: 8 teams, 21 submissions
• Spanish: 3 teams, 3 submissions
• HeidelTime best system for English & Spanish for extraction + normalization
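The difference between the strict, relaxed, and value F1 columns can be made concrete with a simplified Python sketch. It is not the official TempEval-3 scorer. It assumes that strict means identical spans, that relaxed means overlapping spans, and that value F1 counts a relaxed match only when the normalized value also agrees. The function names and toy annotations are hypothetical.

```python
def f1_score(gold, system, is_hit):
    """F1 over two annotation lists, given a predicate deciding what counts as a hit."""
    system_hits = sum(any(is_hit(s, g) for g in gold) for s in system)
    gold_hits = sum(any(is_hit(s, g) for s in system) for g in gold)
    precision = system_hits / len(system) if system else 0.0
    recall = gold_hits / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def strict(s, g):
    """Spans must be identical."""
    return s[:2] == g[:2]


def relaxed(s, g):
    """Spans only need to overlap."""
    return s[0] < g[1] and g[0] < s[1]


def value(s, g):
    """Relaxed span match plus identical normalized value."""
    return relaxed(s, g) and s[2] == g[2]


# Toy annotations as (start offset, end offset, normalized value).
gold = [(0, 10, "2012-01-20"), (20, 28, "1997-12")]
system = [(0, 10, "2012-01-20"),   # exact match, correct value
          (22, 28, "1998-12"),     # partial match, wrong value
          (40, 45, "1996")]        # false positive

scores = {name: f1_score(gold, system, hit)
          for name, hit in [("strict", strict), ("relaxed", relaxed), ("value", value)]}
```

In this toy case the partial match counts toward relaxed F1 but not toward strict or value F1. This is why a system can score high on relaxed extraction and noticeably lower on value normalization.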

Error Analysis & Conclusions

False Negatives
• expressions that cannot be normalized with high probability (e.g., "some time")

False Negatives / False Positives
• trade-off due to X_REF expressions; annotated inconsistently (currently)

Incorrect Value Normalization
• due to partial matches
• incorrect relation to reference time

Spanish Resources
• benefit from high quality English resources as starting point
• contain many patterns and rules not in Spanish training data

Availability

HeidelTime's Current Version
• as UIMA component
• as standalone version (Java)
• online demo
• at Google Code

Languages
• English, German, Dutch, Spanish, Italian, Arabic, Vietnamese

Ongoing Work
• further languages to come


Contact Information:
Jannik Strötgen
stroetgen@uni-hd.de
http://dbs.ifi.uni-heidelberg.de/

References
[1] J. Strötgen and M. Gertz: Temporal Tagging on Different Domains: Challenges, Strategies, and Gold Standards. LREC, 2012.
[2] J. Strötgen and M. Gertz: Multilingual and Cross-domain Temporal Tagging. Language Resources and Evaluation, 47(2), 269–298, 2013.

This work was presented at SemEval 2013, the 7th International Workshop on Semantic Evaluation, June 14-15, 2013, Atlanta, Georgia, USA.
