Presentation and demo held at the Computational Linguistic Application Workshop @ IMCSIT, Mragowo, Poland
TETI: a TimeML Compliant TimEx Tagger for Italian
Tommaso Caselli, Felice dell'Orletta and Irina Prodanof
Istituto di Linguistica Computazionale “A. Zampolli” - ILC-CNR Pisa
IMCSIT 2009 – CL-A09, Mragowo, October 13
Outline:
Motivations
Extracting Temporal Expressions and the TIMEX3 tag
TETI:
− System architecture
− Demo
Evaluation
Conclusions & Future Work
Motivations
Recovering temporal relations in text/discourse is essential to improve the performance of many NLP systems (Open-Domain Q.A., Text Mining, Summarization, Reasoning)
Most temporal information in text/discourse is only IMPLICITLY stated
Need to develop procedures to maximize the role of the various sources of information
Temporal expressions represent a source of explicit temporal knowledge which can:
− Locate an eventuality in time, and thus be used to infer temporal relations between eventualities
− Measure the duration of an eventuality
Extracting Temporal Expressions
The extraction of timexes can be divided into 4 subtasks:
− Recognizing and bracketing the timex
− Feature extraction (type of time unit, referential status, presence of modifiers)
− Computing the interval of reference on the time line
− Resolving the timex, i.e. normalizing the value to a standard output format
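The last subtask, normalization, can be illustrated with a minimal Python sketch. It assumes a calendar date written day/month/year and an ISO 8601 target value of the kind used by TIMEX3. The function name is hypothetical and is not part of TETI.

```python
from datetime import datetime


def normalize_calendar_date(text: str) -> str:
    """Return the ISO 8601 value for a dd/mm/yyyy calendar date.

    Assumption: the day comes before the month, as in Italian usage.
    """
    parsed = datetime.strptime(text, "%d/%m/%Y")
    return parsed.date().isoformat()


print(normalize_calendar_date("01/12/1980"))  # 1980-12-01
```

Relative timexes (yesterday, last spring) would additionally need a reference date before they can be resolved, which this sketch does not attempt.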
Temporal Expressions in TimeML: The TIMEX3 tag
The TIMEX3 tag extends and improves on previous tags for this task, namely TIMEX and TIDES TIMEX2
The TIMEX3 tag is used to mark any time word, i.e. both absolute and relative timexes, such as times of day (midnight...), dates of different granularity (yesterday, last spring...), calendar dates (01/12/1980...), durations (three hours, two years...), sets of times (yearly, every day...)
The annotation process is based on:
− the constituent structure (NP, AdjP, AdvP, Time/Date Pattern)
− the granularity of the time units
− the relations between the timexes
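As a rough illustration of what the bracketing produces, the sketch below wraps the example expressions from this slide in TIMEX3 elements. The `tid` and `type` attributes and the type values DATE, TIME, DURATION and SET come from TimeML. The helper name and the tid numbering are assumptions made for the example, and the `value` attribute is left out because it depends on normalization.

```python
import xml.etree.ElementTree as ET


def make_timex3(tid: str, timex_type: str, text: str) -> str:
    """Serialize one bracketed timex as a TIMEX3 element (hypothetical helper)."""
    elem = ET.Element("TIMEX3", {"tid": tid, "type": timex_type})
    elem.text = text
    return ET.tostring(elem, encoding="unicode")


# Example expressions taken from the slide, paired with TimeML types.
examples = [
    ("TIME", "midnight"),
    ("DATE", "yesterday"),
    ("DATE", "01/12/1980"),
    ("DURATION", "three hours"),
    ("SET", "every day"),
]

for number, (timex_type, text) in enumerate(examples, start=1):
    print(make_timex3(f"t{number}", timex_type, text))
```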
TETI: Temporal Expression Tagger for Italian
Rule-based system
Main components: TIMEX DETECTOR & TIMEX TAGGER
Two external resources: a TimEx Trigger Dictionary and a Modifier Dictionary
Input: chunked text
TETI: Temporal Expression Tagger for Italian (2)
Chunker output approximates the extent of the TIMEX3 tag
The extent of timexes corresponds to regular patterns of chunk combinations
TETI: Temporal Expression Tagger for Italian (3)
Analysis of the chunked text
Lookup in the TimEx Trigger Dictionary
Extraction of the necessary features for the bracketing
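The lookup step might be pictured as follows. This is only a sketch: the dictionary entries, the field names and the granularity labels are invented for illustration, and the actual format of the TimEx Trigger Dictionary is not shown in the slides.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TriggerEntry:
    granularity: str   # hypothetical label for the time unit
    default_type: str  # TimeML type assumed by default for this trigger


# Hypothetical entries keyed by lemma; not taken from the real resource.
TRIGGER_DICTIONARY = {
    "giorno": TriggerEntry("day", "DATE"),      # "day"
    "ieri": TriggerEntry("day", "DATE"),        # "yesterday"
    "mezzanotte": TriggerEntry("time_of_day", "TIME"),  # "midnight"
}


def lookup_trigger(head_lemma: str) -> Optional[TriggerEntry]:
    """Return the features of a chunk head if it is a timex trigger."""
    return TRIGGER_DICTIONARY.get(head_lemma.lower())


print(lookup_trigger("ieri"))
print(lookup_trigger("casa"))  # not a trigger, so None
```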
Core element of the tagger
A general condition + set of local conditions
If the conditions are true, the tagger activates the related rules and brackets the timex with TIMEX3
TETI: Temporal Expression Tagger for Italian (4)
COND (and
  (or (POTGOV_CHUNK equals N_C)
      (POTGOV_CHUNK equals ADV_C)
      (POTGOV_CHUNK equals ADJ_C))
  (not (POTGOV_CHUNK has PREMODIF))
  (not (POTGOV_lemma CHUNK-1 equals modiftrigger))
  (or (not (POTGOV_lemma CHUNK+1 equals lextrigger))
      (not (POTGOV_lemma CHUNK+1 equals modiftrigger))))
then CREATE TIMEX3_tag (and (BEGIN_AT B_CHUNK) (END_AT E_CHUNK))
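The rule above can be read as a predicate over a chunk and its two neighbours. The following Python sketch transcribes the conditions literally. The `Chunk` class, its field names, the `FV_C` chunk label and the sample lemmas are assumptions for the example, and it is assumed that the detector has already selected the chunk at position `i` as a candidate.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass
class Chunk:
    kind: str            # chunk type, e.g. "N_C", "ADV_C", "ADJ_C"
    head_lemma: str      # lemma of the potential governor (POTGOV)
    begin: int           # index of the first token (B_CHUNK)
    end: int             # index of the last token (E_CHUNK)
    has_premodifier: bool = False


def simple_rule(chunks: List[Chunk], i: int,
                lex_triggers: Set[str],
                modif_triggers: Set[str]) -> Optional[Tuple[int, int]]:
    """Return the TIMEX3 extent for chunk i, or None if the rule does not fire."""
    chunk = chunks[i]
    prev_lemma = chunks[i - 1].head_lemma if i > 0 else None
    next_lemma = chunks[i + 1].head_lemma if i + 1 < len(chunks) else None
    fires = (
        chunk.kind in {"N_C", "ADV_C", "ADJ_C"}
        and not chunk.has_premodifier
        and prev_lemma not in modif_triggers
        # Transcribed as written on the slide: an "or" of two negations.
        and (next_lemma not in lex_triggers or next_lemma not in modif_triggers)
    )
    return (chunk.begin, chunk.end) if fires else None


# "è arrivato ieri" ("arrived yesterday"), with hypothetical chunking.
chunks = [Chunk("FV_C", "arrivare", 0, 1), Chunk("ADV_C", "ieri", 2, 2)]
print(simple_rule(chunks, 1, {"ieri"}, {"circa"}))  # (2, 2)
```

Note that the last condition, taken literally, only fails when the following lemma is both a lexical trigger and a modifier trigger.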
More complex timexes require a further lookup in the TimEx Trigger Dictionary to extract further features (semantic relations) for the correct bracketing
TETI: Temporal Expression Tagger for Italian (5)
Evaluation
TAG                 TOT   CORR.   MISSING   INCORR.   P       R       F
TIMEX3              367   321     35        66        82.95   90.17   86.41
TIMEX3: modifiers   90    55      12        23        82.09   70.51   75.86
42 newspaper articles manually annotated, 367 timexes
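The P, R and F columns can be recomputed from the counts. The sketch below assumes the usual definitions, which the slides do not state explicitly: precision = correct / (correct + incorrect), recall = correct / (correct + missing), and F as their harmonic mean.

```python
from typing import Tuple


def precision_recall_f(correct: int, missing: int,
                       incorrect: int) -> Tuple[float, float, float]:
    """Return (P, R, F) as percentages rounded to two decimals."""
    precision = correct / (correct + incorrect)
    recall = correct / (correct + missing)
    f_measure = 2 * precision * recall / (precision + recall)
    return (round(100 * precision, 2),
            round(100 * recall, 2),
            round(100 * f_measure, 2))


print(precision_recall_f(321, 35, 66))  # (82.95, 90.17, 86.41)
```

With the counts in the TIMEX3 row this reproduces the reported 82.95 / 90.17 / 86.41. For the modifier row (55, 12, 23) the same formulas give 70.51 for precision and 82.09 for recall, the reverse of the order shown in the table; F is 75.86 either way.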
Conclusion & Future Work
• Reduction of the number of false positives
• Implementation of the normalization phase → rule based
• Rewriting of the rules to be compliant with the KAF format (KYOTO Project)
• Release of the tool via web service
Acknowledgments
Thanks to Roberto Bartolini for his help in the development of the demo
Thank You!
Complex Rule 1
COND (and
  (not (POTGOV_lemma CHUNK-1 equals modiftrigger))
  ((POTGOV_lemma CHUNK+1 equals lextrigger) then (GET GRAN GET DEFAULT TYPE))
  (COND ((PREMODIF_POTGOV_CHUNK equals modiftrigger) then
         (GET INFO_NORMALIZATION
          GET TIMEML_MOD_ATTRIBUTE
          GET TIMEML_BEGINPOINT_ATTRIBUTE
          GET TIMEML_ENDPOINT_ATTRIBUTE
          GET TR_RESPECT_TO ANCHOR))
   T)
  (or (POTGOV_CHUNK+1 equals N_C)
      (POTGOV_CHUNK+1 equals ADV_C)
      (POTGOV_lemma CHUNK+1 equals DATE PATTERN))
  (not (POTGOV_CHUNK+1 has PREMODIF))
  (POTGOV_CHUNK equals N_C)
Complex Rule 1b
(COND
 1 ((and (equals (SEM_RELATION POTGOV_CHUNK) (has_as_part (LEXTRIG_CIBLE POTGOV_CHUNK+1))
         (equals (DEFAULT_TYPE POTGOV_CHUNK) DATE))
         (or (equals (DEFAULT_TYPE POTGOV_CHUNK+1) DATE))
         (equals (DEFAULT_TYPE POTGOV_CHUNK+1) TIME)))
    then CREATE TIMEX3 (and (BEGIN_AT B_POTGOV_CHUNK) (END_AT E_POTGOV_CHUNK+1)))
 2 ((and (CREATE TIMEX3 (and (BEGIN_AT B_POTGOV_CHUNK) (END_AT E_POTGOV_CHUNK))
                        (and (BEGIN_AT B_POTGOV_CHUNK+1) (END_AT E_POTGOV_CHUNK+1)))))
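One possible reading of Complex Rule 1b, sketched in Python: if the first trigger stands in a has-as-part relation to the second, the first has default type DATE and the second DATE or TIME, a single TIMEX3 spans both chunks; otherwise each chunk gets its own TIMEX3. This interpretation, the class and field names, and the sample relation are assumptions, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass
class TriggerChunk:
    head_lemma: str
    default_type: str  # e.g. "DATE" or "TIME"
    begin: int
    end: int


def complex_rule_1b(first: TriggerChunk, second: TriggerChunk,
                    has_as_part: Dict[str, Set[str]]) -> List[Tuple[int, int]]:
    """Return one merged extent (case 1) or two separate extents (case 2)."""
    related = second.head_lemma in has_as_part.get(first.head_lemma, set())
    if (related and first.default_type == "DATE"
            and second.default_type in {"DATE", "TIME"}):
        return [(first.begin, second.end)]
    return [(first.begin, first.end), (second.begin, second.end)]


# Hypothetical relation: "lunedì" (Monday) has "mattina" (morning) as a part.
relations = {"lunedì": {"mattina"}}
monday = TriggerChunk("lunedì", "DATE", 0, 0)
morning = TriggerChunk("mattina", "TIME", 1, 1)
print(complex_rule_1b(monday, morning, relations))  # [(0, 1)]
```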