TETI: a TimeML Compliant TimEx Tagger for Italian

Tommaso Caselli, Felice dell'Orletta and Irina Prodanof

Istituto di Linguistica Computazionale “A. Zampolli” - ILC-CNR Pisa {[email protected]}

IMCSIT 2009 – CL-A09, Mragowo, October 13

DESCRIPTION

Presentation and demo held at the Computational Linguistic Application Workshop @ IMCSIT, Mragowo, Poland

TRANSCRIPT

Page 2: TETI: a TimeML Compliant TimEx Tagger for Italian

Outline:

Motivations

Extracting Temporal Expressions and the TIMEX3 tag

TETI:

− System architecture

− Demo

Evaluation

Conclusions & Future Work

Page 3: TETI: a TimeML Compliant TimEx Tagger for Italian

Motivations

Recovering temporal relations in text/discourse is essential to improve the performance of many NLP systems (open-domain question answering, text mining, summarization, reasoning)

Most temporal information in text/discourse is only IMPLICITLY stated

Need to develop procedures to maximize the role of the various sources of information

Temporal expressions represent a source of explicit temporal knowledge which can:

− Locate an eventuality in time, and thus be used to infer temporal relations between eventualities

− Measure the duration of an eventuality

Page 4: TETI: a TimeML Compliant TimEx Tagger for Italian

Extracting Temporal Expressions

The extraction of timexes can be divided into 4 subtasks:

− Recognizing and bracketing the timex

− Feature extraction (type of time unit, referential status, presence of modifiers)

− Computing the interval of reference on the time line

− Resolving the timex, i.e. normalizing the value to a standard output format

Page 6: TETI: a TimeML Compliant TimEx Tagger for Italian

Temporal Expressions in TimeML: The TIMEX3 tag

The TIMEX3 tag extends and improves on previous tags for this task, namely TIMEX and TIDES TIMEX2

The TIMEX3 tag is used to mark any temporal expression, both absolute and relative: times of day (midnight, ...), dates of different granularity (yesterday, last spring, ...), calendar dates (01/12/1980, ...), durations (three hours, two years, ...) and sets of times (yearly, every day, ...)

The annotation process is based on:

− the constituent structure (NP, AdjP, AdvP, Time/Date Pattern)

− the granularity of the time units

− the relations between the timexes
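
As a rough illustration of the tag described on this slide, the sketch below wraps a few of the example expressions in TIMEX3 elements using Python's standard `xml.etree.ElementTree`. The `tid` and `type` attributes and the four type values (DATE, TIME, DURATION, SET) are part of TimeML; the helper name `make_timex3` and the type assigned to each example are assumptions made here for illustration, not output produced by TETI.

```python
import xml.etree.ElementTree as ET

# The four TIMEX3 types defined by TimeML.
TIMEX3_TYPES = {"DATE", "TIME", "DURATION", "SET"}


def make_timex3(tid: int, text: str, timex_type: str) -> str:
    """Wrap `text` in a TIMEX3 element (hypothetical helper, not TETI code)."""
    if timex_type not in TIMEX3_TYPES:
        raise ValueError(f"unknown TIMEX3 type: {timex_type}")
    elem = ET.Element("TIMEX3", {"tid": f"t{tid}", "type": timex_type})
    elem.text = text
    return ET.tostring(elem, encoding="unicode")


# Example expressions from the slide; the type assignments are assumptions.
examples = [
    ("midnight", "TIME"),
    ("yesterday", "DATE"),
    ("three hours", "DURATION"),
    ("every day", "SET"),
]

for i, (text, timex_type) in enumerate(examples, start=1):
    print(make_timex3(i, text, timex_type))
```

The sketch only brackets and types the expression; it does not fill in a normalized `value`, which matches the scope of the bracketing task discussed here.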

Page 7: TETI: a TimeML Compliant TimEx Tagger for Italian

TETI: Temporal Expression Tagger for Italian

Rule-based system

Main components: TIMEX DETECTOR & TIMEX TAGGER

Two external resources: a TimEx Trigger Dictionary and a Modifier Dictionary

Input: chunked text

Page 10: TETI: a TimeML Compliant TimEx Tagger for Italian

TETI: Temporal Expression Tagger for Italian (2)

Chunker output approximates the TIMEX3 tag extent

The extent of a timex corresponds to regular patterns of chunk combinations

Page 11: TETI: a TimeML Compliant TimEx Tagger for Italian

TETI: Temporal Expression Tagger for Italian (3)

Analysis of the chunked text

Lookup in the TimEx Trigger Dictionary

Extraction of the features needed for the bracketing
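
The detection step above can be pictured as a dictionary lookup on the head lemma of each chunk. The following is a minimal sketch under stated assumptions: the entry fields (`granularity`, `default_type`) are modeled on the GET GRAN and GET DEFAULT TYPE actions that appear in the complex rules, while the Italian entries, the `(chunk_type, head_lemma)` chunk representation and all function names are hypothetical and do not reproduce TETI's actual dictionary or code.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class TriggerEntry:
    granularity: str   # time unit of the trigger, e.g. "day"
    default_type: str  # TIMEX3 type assumed when nothing overrides it


# Hypothetical entries keyed by Italian lemma (not TETI's real dictionary).
TRIGGER_DICTIONARY = {
    "ieri": TriggerEntry("day", "DATE"),    # "yesterday"
    "giorno": TriggerEntry("day", "DATE"),  # "day"
    "mese": TriggerEntry("month", "DATE"),  # "month"
    "ora": TriggerEntry("hour", "TIME"),    # "hour"
}


def lookup_trigger(lemma: str) -> Optional[TriggerEntry]:
    """Return the dictionary entry for a chunk head lemma, or None."""
    return TRIGGER_DICTIONARY.get(lemma.lower())


def detect_candidates(
    chunks: List[Tuple[str, str]]
) -> List[Tuple[int, TriggerEntry]]:
    """Keep the chunks whose head lemma is a known trigger.

    Each chunk is a (chunk_type, head_lemma) pair.
    """
    candidates = []
    for index, (_chunk_type, lemma) in enumerate(chunks):
        entry = lookup_trigger(lemma)
        if entry is not None:
            candidates.append((index, entry))
    return candidates


chunks = [("ADV_C", "ieri"), ("N_C", "riunione")]
print(detect_candidates(chunks))
```

Only the first chunk is returned as a candidate, since "riunione" ("meeting") is not in the illustrative dictionary.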

Page 13: TETI: a TimeML Compliant TimEx Tagger for Italian

TETI: Temporal Expression Tagger for Italian (4)

Core element of the tagger

A general condition + a set of local conditions

If the conditions are true, the tagger activates the related rules and brackets the timex with a TIMEX3 tag

Page 14: TETI: a TimeML Compliant TimEx Tagger for Italian

TETI: Temporal Expression Tagger for Italian (4)

COND (and
  (or (POTGOV_CHUNK equals N_C)
      (POTGOV_CHUNK equals ADV_C)
      (POTGOV_CHUNK equals ADJ_C))
  (not (POTGOV_CHUNK has PREMODIF))
  (not (POTGOV_lemma CHUNK-1 equals modiftrigger))
  (or (not (POTGOV_lemma CHUNK+1 equals lextrigger))
      (not (POTGOV_lemma CHUNK+1 equals modiftrigger))))
then CREATE TIMEX3_tag (and (BEGIN_AT B_CHUNK) (END_AT E_CHUNK))
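
To make the rule on this slide easier to follow, here is a minimal Python transcription of its conditions. It is a sketch, not TETI's implementation: the `Chunk` fields, the trigger sets and the function name `simple_rule` are all assumptions, and the sketch presumes the detector has already selected the chunk as a candidate. The last clause is transcribed exactly as written on the slide, as a disjunction of two negations.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Chunk:
    chunk_type: str   # e.g. "N_C", "ADV_C", "ADJ_C"
    head_lemma: str   # lemma of the potential governor (POTGOV)
    begin: int        # offset of the first token in the chunk
    end: int          # offset of the last token in the chunk
    has_premodifier: bool = False


# Hypothetical dictionary contents, for illustration only.
LEX_TRIGGERS = {"ieri", "giorno", "mese", "anno"}
MODIF_TRIGGERS = {"scorso", "prossimo", "circa"}


def simple_rule(chunks: List[Chunk], i: int) -> Optional[Tuple[int, int]]:
    """Return the (begin, end) extent of a TIMEX3 for chunk i, or None."""
    chunk = chunks[i]
    prev_lemma = chunks[i - 1].head_lemma if i > 0 else None
    next_lemma = chunks[i + 1].head_lemma if i + 1 < len(chunks) else None

    conditions_hold = (
        chunk.chunk_type in {"N_C", "ADV_C", "ADJ_C"}
        and not chunk.has_premodifier
        and prev_lemma not in MODIF_TRIGGERS
        # As written on the slide: (or (not ...) (not ...)). With the
        # disjoint trigger sets used here this clause always holds.
        and (next_lemma not in LEX_TRIGGERS or next_lemma not in MODIF_TRIGGERS)
    )
    # BEGIN_AT B_CHUNK, END_AT E_CHUNK: the timex spans exactly this chunk.
    return (chunk.begin, chunk.end) if conditions_hold else None


sentence = [Chunk("ADV_C", "ieri", 0, 0), Chunk("N_C", "riunione", 1, 2)]
print(simple_rule(sentence, 0))
```

When the rule fires, the resulting extent is the chunk itself, which is why the slide describes this as the case where the chunker output already approximates the TIMEX3 extent.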

Page 17: TETI: a TimeML Compliant TimEx Tagger for Italian

TETI: Temporal Expression Tagger for Italian (5)

More complex timexes require a further lookup in the TimEx Trigger Dictionary to extract additional features (semantic relations) for the correct bracketing

Page 19: TETI: a TimeML Compliant TimEx Tagger for Italian

Evaluation

Test set: 42 newspaper articles, manually annotated, containing 367 timexes

TAG                 TOT   CORR.   MISSING   INCORR.   P       R       F
TIMEX3              367   321     35        66        82.95   90.17   86.41
TIMEX3: modifiers   90    55      12        23        82.09   70.51   75.86
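
The slide does not state how P, R and F were computed. The sketch below assumes the usual definitions, P = correct / (correct + incorrect), R = correct / (correct + missing) and F as their harmonic mean; the function name `prf` is hypothetical. Under that assumption the TIMEX3 row is reproduced exactly. For the modifiers row, the reported P and R are reproduced only if 23 is read as the missing count and 12 as the incorrect count, that is, with the two columns swapped relative to the header; the source gives no way to tell which reading was intended.

```python
from typing import Tuple


def prf(correct: int, missing: int, incorrect: int) -> Tuple[float, float, float]:
    """Precision, recall and F-measure as percentages, rounded to 2 decimals."""
    precision = correct / (correct + incorrect)
    recall = correct / (correct + missing)
    f_measure = 2 * precision * recall / (precision + recall)
    return tuple(round(100 * x, 2) for x in (precision, recall, f_measure))


# TIMEX3 row, columns read as printed: MISSING = 35, INCORR. = 66.
print(prf(correct=321, missing=35, incorrect=66))

# Modifiers row, columns read in swapped order: missing = 23, incorrect = 12.
print(prf(correct=55, missing=23, incorrect=12))
```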

Page 20: TETI: a TimeML Compliant TimEx Tagger for Italian

Conclusion & Future Work

• Reduction of the number of false positives

• Implementation of the normalization phase → rule-based

• Rewriting of the rules to be compliant with the KAF format (KYOTO Project)

• Release of the tool via web service

Page 21: TETI: a TimeML Compliant TimEx Tagger for Italian

Acknowledgments

Thanks to Roberto Bartolini for his help in the development of the demo

Page 22: TETI: a TimeML Compliant TimEx Tagger for Italian

Thank You!

Page 23: TETI: a TimeML Compliant TimEx Tagger for Italian

Complex Rule 1

COND (and
  (not (POTGOV_lemma CHUNK-1 equals modiftrigger))
  ((POTGOV_lemma CHUNK+1 equals lextrigger) then (GET GRAN GET DEFAULT TYPE))
  (COND ((PREMODIF_POTGOV_CHUNK equals modiftrigger) then (GET INFO_NORMALIZATION GET TIMEML_MOD_ATTRIBUTE GET TIMEML_BEGINPOINT_ATTRIBUTE GET TIMEML_ENDPOINT_ATTRIBUTE GET TR_RESPECT_TO ANCHOR)) T)
  (or (POTGOV_CHUNK+1 equals N_C)
      (POTGOV_CHUNK+1 equals ADV_C)
      (POTGOV_lemma CHUNK+1 equals DATE PATTERN))
  (not (POTGOV_CHUNK+1 has PREMODIF))
  (POTGOV_CHUNK equals N_C)

Page 24: TETI: a TimeML Compliant TimEx Tagger for Italian

Complex Rule 1b

(COND
1 ((and (equals (SEM_RELATION POTGOV_CHUNK) (has_as_part (LEXTRIG_CIBLE POTGOV_CHUNK+1))
        (equals (DEFAULT_TYPE POTGOV_CHUNK) DATE))
        (or (equals (DEFAULT_TYPE POTGOV_CHUNK+1) DATE)) (equals (DEFAULT_TYPE POTGOV_CHUNK+1) TIME)))
   then CREATE TIMEX3 (and (BEGIN_AT B_POTGOV_CHUNK) (END_AT E_POTGOV_CHUNK+1)))
2 ((and (CREATE TIMEX3 (and (BEGIN_AT B_POTGOV_CHUNK) (END_AT E_POTGOV_CHUNK)) (and (BEGIN_AT B_POTGOV_CHUNK+1) (END_AT E_POTGOV_CHUNK+1)))))
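
One possible reading of Complex Rule 1b is a merge-or-split decision over two adjacent candidate chunks: branch 1 creates a single TIMEX3 spanning both chunks, branch 2 creates one TIMEX3 per chunk. The sketch below follows that reading only. The parenthesization on the slide is damaged, so the exact grouping of the conditions is an assumption, as are the `Candidate` fields, the way the has_as_part relation is encoded and the example lemmas.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class Candidate:
    lemma: str
    begin: int
    end: int
    default_type: str                           # e.g. "DATE" or "TIME"
    has_as_part: FrozenSet[str] = frozenset()   # hypothetical encoding of the relation


def bracket_pair(first: Candidate, second: Candidate) -> List[Tuple[int, int]]:
    """Return the TIMEX3 extents for two adjacent candidate chunks."""
    merge = (
        second.lemma in first.has_as_part        # SEM_RELATION is has_as_part
        and first.default_type == "DATE"
        and second.default_type in {"DATE", "TIME"}
    )
    if merge:
        # Branch 1: one TIMEX3 from the start of the first chunk
        # to the end of the second.
        return [(first.begin, second.end)]
    # Branch 2: two separate TIMEX3 tags, one per chunk.
    return [(first.begin, first.end), (second.begin, second.end)]


# "ieri mattina" (yesterday morning); the entries are illustrative only.
ieri = Candidate("ieri", 0, 0, "DATE", frozenset({"mattina"}))
mattina = Candidate("mattina", 1, 1, "TIME")
print(bracket_pair(ieri, mattina))
```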