TIMEN: An Open Temporal Expression Normalisation Resource
DESCRIPTION
We present TIMEN, a resource for building and sharing knowledge and rules for the TimeML temporal expression normalization subtask, that is, the generation of a TIMEX3 annotation from a linguistic temporal expression. It provides a strong basis, built from the current best approaches, that is independent of the other temporal expression processing subtasks. Therefore, it can be easily integrated as a module in temporal information processing systems. Since it is open, it can be used, improved and extended by the community, in contrast to closed tools, which must be replicated from scratch as the field advances. Furthermore, TIMEN eases the development of normalization knowledge and rules for low-resourced languages, since the normalization process is partially shared between languages.
TRANSCRIPT
TIMEN: An Open Temporal Expression Normalisation Resource
H. Llorens, L. Derczynski, R. Gaizauskas, E. Saquete
Outline
● Introduction: Timex normalisation
● Related work
● Problem: reinventing the wheel over and over
● Proposal: TIMEN
● Evaluation
● Conclusions
● Further Work
Timex Normalisation
A temporal information extraction subtask.
Timex: linguistic expression of a time point or interval.
Normalisation: semantic interpretation of timexes.
Temporal expression (TIMEX): linguistic, variable, relative --> Timex normalisation: ISO 8601, invariable interpretation
(relative values computed for a document created on 2012-05-24)
June 2012, next month, 06/2012 --> 2012-06
this morning 7 a.m. --> 2012-05-24T07:00
3 days and 3 hours --> P3DT3H
weekly --> XXXX-XX-WXX
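The duration row follows ISO 8601, where date components (Y, M, D) come before the "T" designator and time components (H, M, S) come after it, so "3 days and 3 hours" is written P3DT3H. A minimal sketch of that formatting step, assuming the expression has already been parsed into unit counts; the function name `iso8601_duration` is hypothetical and not part of TIMEN:

```python
from typing import Dict

# ISO 8601 durations: date components precede the "T" designator and time
# components follow it, so "M" means months or minutes depending on position.
DATE_UNITS = [("years", "Y"), ("months", "M"), ("days", "D")]
TIME_UNITS = [("hours", "H"), ("minutes", "M"), ("seconds", "S")]


def iso8601_duration(parts: Dict[str, int]) -> str:
    """Format unit counts such as {"days": 3, "hours": 3} as an ISO 8601 duration."""
    date_part = "".join(f"{parts[unit]}{code}" for unit, code in DATE_UNITS if parts.get(unit))
    time_part = "".join(f"{parts[unit]}{code}" for unit, code in TIME_UNITS if parts.get(unit))
    if not date_part and not time_part:
        return "PT0S"  # an ISO 8601 duration needs at least one component
    return "P" + date_part + ("T" + time_part if time_part else "")


print(iso8601_duration({"days": 3, "hours": 3}))  # P3DT3H
print(iso8601_duration({"hours": 5}))             # PT5H
```

Keeping date and time components in separate ordered lists is what lets the same letter "M" be emitted unambiguously for months and for minutes.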
Timex Normalisation (II)
Useful for a variety of NLP applications: IR, QA, Summarization, etc.
Example: I went [event] to the cinema yesterday [timex, value: 2012-05-23].
Q: When did he go to the cinema? A: 2012-05-23
The main advantage of normalisation is that timexes are expressed in standard time representations (e.g., the Gregorian calendar).
Related Work
There are many approaches to timex normalisation:
● Pre-TempEval-2
○ TempEx (2000), GUTime (2005), Chronos (2004), TERSEO (2005), TimexTag (2005), TEA (2006), DANTE (2007)...
● TempEval-2 (2010)
○ HeidelTime, TRIPS/TRIOS, TIPSem/TIPSemB...
Similarities and differences
● Approaches have slightly different architectures and perform slightly differently in evaluations.
● But all the approaches are rule-based, and in general they use the same normalization strategies.
● They also require the same parameters to perform the task:
○ DCT: document creation time, for deictic expressions (2 days ago: 2012-05-22)
○ Reference time: the time being talked about, for anaphoric expressions (2 days before: 2012-05-20)
○ Tense: resolution direction (October): past (2011-10), present/future (2012-10)
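The three parameters above can be illustrated with a small sketch that reproduces the slide's examples. It assumes a DCT of 2012-05-24, a reference time of 2012-05-22 for the anaphoric case (the value implied by "2 days before: 2012-05-20"), and a hypothetical direction rule for bare month names; none of the function names come from TIMEN:

```python
from datetime import date, timedelta

MONTHS = ["january", "february", "march", "april", "may", "june", "july",
          "august", "september", "october", "november", "december"]


def days_ago(n: int, dct: date) -> date:
    """Deictic: 'n days ago' is anchored on the document creation time (DCT)."""
    return dct - timedelta(days=n)


def days_before(n: int, reference: date) -> date:
    """Anaphoric: 'n days before' is anchored on the reference time being discussed."""
    return reference - timedelta(days=n)


def bare_month(name: str, dct: date, tense: str) -> str:
    """Resolve a month name without a year, using tense as the direction.

    Assumption: past tense picks the latest such month not after the DCT month;
    present/future picks the earliest such month not before it.
    """
    month = MONTHS.index(name.lower()) + 1
    if tense == "past":
        year = dct.year if month <= dct.month else dct.year - 1
    else:
        year = dct.year if month >= dct.month else dct.year + 1
    return f"{year:04d}-{month:02d}"


dct = date(2012, 5, 24)
print(days_ago(2, dct))                      # 2012-05-22
print(days_before(2, date(2012, 5, 22)))     # 2012-05-20
print(bare_month("October", dct, "past"))    # 2011-10
print(bare_month("October", dct, "future"))  # 2012-10
```

The point of the sketch is that the arithmetic is identical in the first two cases; only the anchor (DCT or reference time) changes, which is why every system needs both as inputs.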
The problem: reinventing the wheel over and over
● Implementing high-performance approaches is costly, and it is done from scratch every time.
● All the approaches are similar: rule-based, with similar normalization rules and strategies.
● None is meant to be reused and refined by others.
Proposal: TIMEN
Characteristics:
● Open philosophy: meant to be reused and refined (even across languages)
● Not only meant for computer scientists:
○ the algorithms (source code) and the normalisation rules (a database of user-friendly rules with a documented syntax) are kept separate.
● Independent from other timex processing tasks
● Multi-platform and easy integration
TIMEN Library Architecture
Example 1:
timex: three days ago
DCT: 2012-05-24
normtext: 3_day_ago
pattern: Num_TUnit_ago
Only 1 rule matches.
Normalised value: 2012-05-21
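The timex, normtext, pattern and rule steps in this example can be sketched as a small pipeline. This is a hypothetical illustration only: the lexicons, function names and rule table below are assumptions and do not reproduce TIMEN's actual rule syntax, but they keep the rule table separate from the matching code, in the spirit of TIMEN's design:

```python
from datetime import date, timedelta
from typing import Callable, Dict, List

# Hypothetical lexicons; TIMEN's real rule base and its syntax differ.
NUMBER_WORDS = {"one": "1", "two": "2", "three": "3", "four": "4", "five": "5"}
UNIT_WORDS = {"day": "day", "days": "day", "week": "week", "weeks": "week"}
UNIT_IN_DAYS = {"day": 1, "week": 7}


def to_normtext(timex: str) -> str:
    """Lower-case, map number words to digits and plural units to singular."""
    tokens = []
    for token in timex.lower().split():
        token = NUMBER_WORDS.get(token, token)
        token = UNIT_WORDS.get(token, token)
        tokens.append(token)
    return "_".join(tokens)


def to_pattern(normtext: str) -> str:
    """Abstract normalised tokens into pattern symbols such as Num and TUnit."""
    symbols = []
    for token in normtext.split("_"):
        if token.isdigit():
            symbols.append("Num")
        elif token in UNIT_IN_DAYS:
            symbols.append("TUnit")
        else:
            symbols.append(token)
    return "_".join(symbols)


def num_tunit_ago(normtext: str, dct: date) -> str:
    """Rule for Num_TUnit_ago: count back from the DCT."""
    number, unit, _ = normtext.split("_")
    return (dct - timedelta(days=int(number) * UNIT_IN_DAYS[unit])).isoformat()


# The rule table is kept apart from the matching code below.
RULES: Dict[str, List[Callable[[str, date], str]]] = {"Num_TUnit_ago": [num_tunit_ago]}


def normalise(timex: str, dct: date) -> str:
    normtext = to_normtext(timex)
    candidates = RULES.get(to_pattern(normtext), [])
    if len(candidates) != 1:
        raise ValueError(f"{len(candidates)} rules match {timex!r}; disambiguation needed")
    return candidates[0](normtext, dct)


print(normalise("three days ago", date(2012, 5, 24)))  # 2012-05-21
```

Adding a new expression type in this sketch means adding an entry to `RULES` and, if needed, to the lexicons, without touching `normalise`.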
Example 2:
timex: October 20
2 rules match.
Disambiguation: 20 is probably a day rather than a year, because it is < 32.
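A minimal sketch of this disambiguation, assuming two hypothetical rules for a Month_Num pattern (number as day, number as year) and the slide's "< 32" heuristic to choose between them. For the day reading the year is simply taken from the DCT, which ignores tense and is an assumption of this sketch:

```python
MONTHS = {name: number for number, name in enumerate(
    ["january", "february", "march", "april", "may", "june", "july",
     "august", "september", "october", "november", "december"], start=1)}


def as_day(month: int, n: int, dct_year: int) -> str:
    """Reading 1 of Month_Num: n is a day of the month (year taken from the DCT)."""
    return f"{dct_year:04d}-{month:02d}-{n:02d}"


def as_year(month: int, n: int, dct_year: int) -> str:
    """Reading 2 of Month_Num: n is a year."""
    return f"{n:04d}-{month:02d}"


def normalise_month_num(timex: str, dct_year: int) -> str:
    month_name, number = timex.lower().split()
    month, n = MONTHS[month_name], int(number)
    # Both rules match the same pattern, so a heuristic has to choose:
    # a number below 32 is taken as a day, anything else as a year.
    rule = as_day if n < 32 else as_year
    return rule(month, n, dct_year)


print(normalise_month_num("October 20", 2012))    # 2012-10-20
print(normalise_month_num("October 1995", 2012))  # 1995-10
```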
Rule base sample (English)
TIMEN integration
TIMEN community
● Open-source software: http://code.google.com/p/timen/
● Crowd extension of the rule set (interactive web interface to upload and check new rules): http://timen.org
* New rules are only accepted if they improve performance on the current dataset or on new examples (human reviewed). E.g.: New Year's Eve
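The acceptance criterion can be sketched as an automatic check that runs before human review. This is a hypothetical reading of "improve the performance": it assumes a rule base can be wrapped as a function from a timex string to a value, that a candidate must not lower accuracy on the current dataset, and that it must improve accuracy either there or on the newly submitted examples. All names and the toy data are illustrative:

```python
from typing import Callable, List, Tuple

Normaliser = Callable[[str], str]
Example = Tuple[str, str]  # (timex, gold value)


def accuracy(normalise: Normaliser, examples: List[Example]) -> float:
    if not examples:
        return 0.0
    return sum(normalise(timex) == gold for timex, gold in examples) / len(examples)


def accept_rule(old: Normaliser, new: Normaliser,
                dataset: List[Example], new_examples: List[Example]) -> bool:
    """Hypothetical acceptance check; human review would follow it."""
    old_acc, new_acc = accuracy(old, dataset), accuracy(new, dataset)
    if new_acc < old_acc:
        return False  # a candidate rule must not break existing cases
    return new_acc > old_acc or accuracy(new, new_examples) > accuracy(old, new_examples)


# Toy rule bases: the candidate adds New Year's Eve for a 2012 document.
KNOWN = {"yesterday": "2012-05-23"}


def old_rules(timex: str) -> str:
    return KNOWN.get(timex.lower(), "")


def new_rules(timex: str) -> str:
    return "2012-12-31" if timex.lower() == "new year's eve" else old_rules(timex)


print(accept_rule(old_rules, new_rules, [("yesterday", "2012-05-23")],
                  [("New Year's Eve", "2012-12-31")]))  # True
```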
Evaluation
Experiments:
● Normalization accuracy of TIMEN
● Performance gain in state-of-the-art approaches by integrating TIMEN
Datasets:
● TempEval-2 test set (already known to the approaches; mainly common dates and durations)
● TimenEval dataset (new, unknown to the approaches, balanced among different timex types)
Normalisation accuracy
Gold timexes   TIMEN normalisation   Evaluation
yesterday      2012-05-23            correct
2012           2012                  correct
October        2012-10               incorrect
daily          xxxx-xx-xx            correct
morning        2011                  incorrect
...
e.g. TOTAL: 100 timexes to normalise; 90 correct normalizations
RESULT: 90/100 --> 90% ACCURACY
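The accuracy computation above amounts to counting correct normalisations over all gold timexes. A minimal sketch, assuming a normalisation counts as correct when the system value string exactly matches the gold value; the function name and the toy pairs are hypothetical:

```python
from typing import List, Tuple


def normalisation_accuracy(pairs: List[Tuple[str, str]]) -> float:
    """pairs holds (system value, gold value) for every gold timex."""
    if not pairs:
        return 0.0
    correct = sum(1 for system_value, gold_value in pairs if system_value == gold_value)
    return correct / len(pairs)


# Toy data mirroring the slide's totals: 90 of 100 gold timexes correct.
pairs = [("2012-05-23", "2012-05-23")] * 90 + [("2012-10", "2011-10")] * 10
print(normalisation_accuracy(pairs))  # 0.9
```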
Normalisation accuracy
● TIMEN shows high performance even in this first version (only 76 rules).
● Accuracy on TimenEval is lower: this corpus is more heterogeneous (times/sets), and normalization is more difficult.
TEST SET     NORMALISATION ACC.
TempEval-2   0.90
TimenEval    0.68
Performance gain
Approach X recognized timexes --> built-in normalisation of Approach X --> original normalisation
Approach X recognized timexes --> TIMEN --> new normalisation
Performance gain = New accuracy - Original accuracy
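Written as a formula, with a worked instance using the TERNIP accuracies reported for the TempEval-2 test set:

```latex
\text{Performance gain} = \mathrm{Acc}_{\text{TIMEN}} - \mathrm{Acc}_{\text{built-in}}
\qquad \text{e.g. TERNIP on TempEval-2: } 0.92 - 0.76 = 0.16
```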
Performance gain (TempEval-2): "known data"
● Replacing the systems' built-in normalization approaches with TIMEN generally improves their performance on the TE2 test set.
● The tested (current) versions of the systems may have been developed or updated with this data in mind. What happens with data that is new to them?
System      Built-in norm.  TIMEN norm.  Err. reduction
TIPSemB     0.83            0.89         35%
HeidelTime  0.94            0.94         0%
TERNIP      0.76            0.92         66%
Performance gain (TimenEval): "new data"
● On new data, the performance of the built-in approaches generally decreases.
● TIMEN improves normalization performance for all the systems.
System      Built-in norm.  TIMEN norm.  Err. reduction
TIPSemB     0.57            0.67         23%
HeidelTime  0.72            0.74         7%
TERNIP      0.70            0.72         7%
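The error-reduction column appears to be the relative error reduction, (TIMEN acc. - built-in acc.) / (1 - built-in acc.): with the reported accuracies this formula gives 35%, 0%, 66.7%, 23% and 7% for the first five rows, and about 6.7% for TERNIP on TimenEval. A short sketch that recomputes gain and error reduction from both tables; the formula is inferred from the reported figures, not stated on the slides:

```python
# Accuracies as reported in the two tables: (built-in norm., TIMEN norm.).
RESULTS = {
    ("TIPSemB", "TempEval-2"): (0.83, 0.89),
    ("HeidelTime", "TempEval-2"): (0.94, 0.94),
    ("TERNIP", "TempEval-2"): (0.76, 0.92),
    ("TIPSemB", "TimenEval"): (0.57, 0.67),
    ("HeidelTime", "TimenEval"): (0.72, 0.74),
    ("TERNIP", "TimenEval"): (0.70, 0.72),
}


def error_reduction(built_in: float, timen: float) -> float:
    """Fraction of the built-in normaliser's errors that TIMEN removes."""
    if built_in >= 1.0:
        raise ValueError("built-in accuracy is already perfect")
    return (timen - built_in) / (1.0 - built_in)


for (system, test_set), (built_in, timen) in RESULTS.items():
    gain = timen - built_in
    print(f"{system:<11}{test_set:<11}gain={gain:+.2f}  "
          f"err. reduction={error_reduction(built_in, timen):.1%}")
```

Error reduction is more informative than raw gain near the top of the scale: HeidelTime's +0.02 on TimenEval removes about 7% of its errors, while TERNIP's +0.16 on TempEval-2 removes about two thirds.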
Conclusions
● We presented an open tool for timex normalisation: TIMEN.
● ADVANTAGES:
○ High performance (matches or improves on the built-in normalisation of recent approaches).
○ Easily integrated into any timex recognition approach.
○ Can be improved by the community (open philosophy), which avoids re-development from scratch.
○ Available: http://timen.org and Google Code
Further Work
● Community-based extension and refinement of TIMEN (rule base).
● Extensive evaluation of TIMEN in various languages (Spanish, Chinese, Italian and Danish).
TIMEN: An Open TIMEX Normalisation Resource
THANK YOU!
QUESTIONS?
http://timen.org
H.Llorens, L.Derczynski, R.Gaizauskas, E. Saquete