

Annotating Temporal Phenomena in Literary Text

Jannik Strötgen, Thomas Bögel, Michael Gertz

Database Systems Research Group, Heidelberg University, Im Neuenheimer Feld 348, 69120 Heidelberg, Germany

Context of the Work: Temporal Tagging

General context
• Natural language processing
• Information extraction
• Temporal annotation

Temporal information
• frequent in many texts
• can be normalized
• challenging to extract

Two tasks
• extraction & normalization of temporal expressions

Challenge: normalizing relative & underspecified expressions

Most existing approaches
• focus on English
• focus on news documents

Challenges for Temporal Tagging on Different Domains [1]

News 1998-04-18
... for the United States, he said today. ... On May 22, 1995, Farkas was made a brigadier general, and the following year ... However, cited by police in December for driving under the influence of alcohol ...
• reference time often DCT
• relation to reference time
• many underspecified and relative expressions

Narrative 2009-12-19
1979: Soviet invasion ... land in Kabul on December 25 ... they were complying with the 1978 Treaty of Friendship ... entered Afghanistan from the north on December 27. In the morning, the 103rd ...
• reference time often in text
• relation to reference time
• long documents, rich discourse structure

SMS 2010-01-10T05:19
Whats it u wanted 2 say last nite?
SMS 2010-09-23T09:50
Yo! Rem to come for lab tmr :-) ...
SMS 2011-02-16T12:42
... andy is availableat 10 am in his office
• relation to reference time
• non-standard language (errors, word creations, ...)
• missing context information

Scientific 2009-12-19
... Subjects consumed one tablet per day containing ... Subjects were assessed at baseline, three and six months ... Clinical pathology analysis was performed at baseline and six months ...
• often no real reference time
• local semantics (document time frame)
• many durations and sets
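The news example shows why the reference time and the relation to it matter: "December" in a document created on 1998-04-18 names no year by itself. The following Python sketch illustrates one way to resolve such an underspecified month against a reference date. The function name `resolve_month` and the relation labels `"before"` and `"after"` are assumptions made for this illustration; how the relation is determined (for example from linguistic clues) is outside the sketch, and this is not the procedure of any particular tagger.

```python
from datetime import date


def resolve_month(month, reference, relation):
    """Resolve a bare month number to 'YYYY-MM' relative to a reference date.

    relation is a hypothetical label: 'before' means the expression refers to
    the most recent such month up to the reference date, 'after' to the next
    one from the reference date on. A month equal to the reference month
    stays in the reference year.
    """
    year = reference.year
    if relation == "before" and month > reference.month:
        year -= 1  # e.g. December seen from April lies in the previous year
    elif relation == "after" and month < reference.month:
        year += 1  # e.g. March seen from April lies in the next year
    return f"{year}-{month:02d}"


# Document creation time of the news example, used as reference time.
dct = date(1998, 4, 18)
december_before = resolve_month(12, dct, "before")
december_after = resolve_month(12, dct, "after")
```

With the news document's creation time as reference, "December" read as lying before the reference time resolves to the previous year, while the same word read as lying after it stays in 1998. In narrative text the reference date would instead have to come from the text itself.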

HeidelTime: a Multilingual, Cross-domain Temporal Tagger [2]

Key Features
• rule-based system
• required: sentence, token, and POS information
• extraction: regular expressions & NLP features
• normalization: knowledge resources & linguistic clues
• TempEval-2 & TempEval-3 winner

Resources and Source Code

Language-independent
• resource interpreter
• domain-dependent normalization strategies
  → reference time
  → relation to reference time

Language-dependent
• pattern files: month=(...|April|May|...)
• normalization files: normMonth(April)=04
• rule files
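The split into pattern files, normalization files, and rule files can be illustrated with a small Python sketch. It is not HeidelTime's actual resource format or code: the names `MONTH_PATTERN`, `NORM_MONTH`, `DATE_RULE`, and `tag_dates` are hypothetical stand-ins, and the single rule shown only covers fully specified dates such as "May 22, 1995".

```python
import re

# Hypothetical stand-in for a pattern file entry: month=(...|April|May|...)
MONTH_PATTERN = (
    "January|February|March|April|May|June|July|"
    "August|September|October|November|December"
)

# Hypothetical stand-in for a normalization file: normMonth(April)=04
NORM_MONTH = {
    name: f"{number:02d}"
    for number, name in enumerate(MONTH_PATTERN.split("|"), start=1)
}

# Hypothetical stand-in for one rule: "<month> <day>, <year>"
DATE_RULE = re.compile(rf"\b({MONTH_PATTERN}) (\d{{1,2}}), (\d{{4}})\b")


def tag_dates(text):
    """Return (matched text, normalized value) pairs for fully specified dates."""
    results = []
    for match in DATE_RULE.finditer(text):
        month, day, year = match.group(1), int(match.group(2)), match.group(3)
        results.append((match.group(0), f"{year}-{NORM_MONTH[month]}-{day:02d}"))
    return results


tagged = tag_dates("On May 22, 1995, Farkas was made a brigadier general")
```

Keeping the month names and their normalized values in data rather than in code is what would let a new language be added by swapping the tables and rules while the interpreting code stays unchanged.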

Availability
• as UIMA component
• standalone version (Java)
• online demo
• @ Google code

Languages
• English, German, Dutch, Spanish, Italian, French, Arabic, Vietnamese
• more to come

The heureCLÉA project [3]

Cooperation
• BMBF-funded eHumanities project
• narratologists (Hamburg)
• computer scientists (Heidelberg)
• temporal phenomena in literary text

Goals
• collaborative annotation framework that automatically suggests annotations
• reduce manual annotation errors
• provide valuable hints for complex temporal phenomena

Use case
• (semi-)automated annotation of temporal phenomena in literary texts

Temporal Phenomena in Literary Text

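Literary: “Der Tod” (local time frame)

Der 10. September
Nun ist der Herbst da, und der Sommer wird nicht zurückkehren, ... Das Meer ist grau und still ... Als ich das heute morgen sah, habe ich vom Sommer Abschied genommen und den Herbst begrüßt, meinen vierzigsten Herbst, der nun ...
Der 12. September
Ich bin mit der ...

(English: September 10. Now autumn is here, and summer will not return, ... The sea is grey and still ... When I saw this this morning, I took leave of summer and greeted autumn, my fortieth autumn, which now ... September 12. I ... with the ...)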

Normalization to local time (year x)
• (x)-SU
• (x)-FA
• (x)-09-10
• (x)-09-12
• (x)-09-10TMO
• (x)-09-10TXX:XX

Temporal expressions
• less frequent in literary text (usually)
• can be extracted
• normalization wrt local time frame

Tense information
• can be extracted (past, present, future)

→ both help to detect more complex temporal phenomena:
• as features for ML methods
• as hints for manual annotations

Temporal narratological aspects
• relations in local time frame
• relations between discourse and history
• plot organizing sequences

Contact Information:
Jannik Strötgen
[email protected]
http://dbs.ifi.uni-heidelberg.de/

References
[1] J. Strötgen and M. Gertz: Temporal Tagging on Different Domains: Challenges, Strategies, and Gold Standards. LREC, 2012.
[2] J. Strötgen and M. Gertz: Multilingual and Cross-domain Temporal Tagging. Language Resources and Evaluation, 47(2), 269–298, 2013.
[3] The heureCLÉA Project: http://www.heureclea.de/.

This work was presented at Herrenhauser Conference 2013, Humanities in the Digital Age, December 5–7, 2013, Hannover, Germany.