
[email protected], [email protected]

Tokyo, 11/12/2014

presentation: NTCIR-11 Temporalia

Using machine learning to predict temporal orientation of search engine queries in the Temporalia challenge

Michele Filannino, Goran Nenadic



temporal intent of queries (TIQ)

Given a user query and its submission time, can a system predict its temporal intent?

• input: queries & submission date
• output: temporal intent: PAST, RECENCY, FUTURE or ATEMPORAL
• easy for people
• hard for machines


• TIQ: recency (Source: https://www.google.co.uk/search?q=google+stock+price)
• TIQ: future (Source: https://www.google.co.uk/search?q=weather+forecast+manchester)
• TIQ: past (Source: https://www.google.co.uk/search?q=who+was+eliminated+on+dancing+with+the+stars)
• TIQ: atemporal (Source: https://www.google.co.uk/search?q=who+was+eliminated+on+dancing+with+the+stars)


the data

• training set: 80 instances + 20 instances (released as preliminary test set)
• test set: 300 instances


proposed approach

• data-driven rather than rule-based
• low-sparsity attributes
• external resources:
  TempoWordNet [1], a temporal lexical KB
  ManTIME, a temporal expression extraction system
  NLTK

[1] G. H. Dias, M. Hasanuzzaman, S. Ferrari, and Y. Mathet. TempoWordNet for sentence time tagging. In Proceedings of the 23rd International Conference on World Wide Web Companion, pages 833-838, Republic and Canton of Geneva, Switzerland, 2014.


ManTIME [1] usage

• an ML-based temporal expression extraction system

madden 2014 release date
drudge report 2013 september

[1] M. Filannino, G. Brown, and G. Nenadic. ManTIME: Temporal expression identification and normalization in the TempEval-3 challenge. In Proceedings of SemEval 2013, pages 53-57, Atlanta, USA, June 2013. ACL.


trigger classes

PAST (21 triggers): ancient, days, death, did, history, last, months, ...
RECENCY (44 triggers): actual, cost, costs, current, daily, day, direction, ...
FUTURE (27 triggers): agenda, calendar, chance, coming, dates, forecast, forthcoming, ...
ATEMPORAL (2 triggers): chords, lyrics

• feature selection: RELIEF algorithm
• BOW representation
• 4 dictionaries (1 per class)
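The trigger dictionaries can be turned into query attributes with a simple lookup. The sketch below is only an illustration: it uses the few triggers visible on the slide (the full dictionaries are larger), and the function names and the "none" default are assumptions, not the authors' implementation.

```python
from collections import Counter

# Only the triggers visible on the slide; the real dictionaries hold more entries.
TRIGGERS = {
    "past": {"ancient", "days", "death", "did", "history", "last", "months"},
    "recency": {"actual", "cost", "costs", "current", "daily", "day", "direction"},
    "future": {"agenda", "calendar", "chance", "coming", "dates", "forecast", "forthcoming"},
    "atemporal": {"chords", "lyrics"},
}


def trigger_classes(query):
    """Return the trigger class of each matching token, in query order."""
    hits = []
    for token in query.lower().split():
        for label, words in TRIGGERS.items():
            if token in words:
                hits.append(label)
    return hits


def most_frequent_trigger_class(query, default="none"):
    """Most common trigger class in the query (hypothetical 'none' if no trigger)."""
    hits = trigger_classes(query)
    if not hits:
        return default
    return Counter(hits).most_common(1)[0][0]


def trigger_footprint(query):
    """Trigger classes joined in order of appearance."""
    return "-".join(trigger_classes(query))


print(most_frequent_trigger_class("weather forecast manchester"))
print(trigger_footprint("ancient history lyrics"))
```

Keeping one small dictionary per class keeps the resulting attributes low in sparsity, which is the stated design goal of the approach.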

attributes

# | Attribute description | Sparsity | Example: input (query/time) → attribute value
1 | Is it a Wikipedia page title? | 2 | New York Times → YES
2 | Does it contain a temporal expression? | 2 | june 2013 movies → YES
3 | Submission's term | 3 | Feb 28, 2013 GMT+0 → B
4 | Submission's trimester | 4 | Aug 26, 2013 GMT+0 → M2
5 | Timing | 4 | Movies 2012, Feb 28, 2013 → past
6 | Most frequent trigger class | 5 | peso dollar exchange rate → present
7 | Wh type | 5 | how did hitler die → how
8 | Most frequent TempoWordNet class | 5 | current stock prices → present
9 | Most frequent POS tag tense | 7 | what is stop kony 2012 → VBZ
10 | Most frequent coarse-grained POS tag | 8 | kony 2012 fake → N
11 | Trigger classes footprint | 11 | what was I thinking lyrics → past-atemporal
12 | Temporal Δ between submission and query | 16 | fathers day 2010, Feb 28, 2013 → 36.0
13 | Tenses footprint | 18 | when does fall start → VBZ-VB
14 | Ordered TempoWordNet classes | 18 | the last song → past-future-present-
15 | Most frequent fine-grained POS tag | 21 | kony 2012 fake → NN
16 | Coarse-grained POS tag ordered footprint | 119 | when is labour day → N-W-V
17 | Fine-grained POS tag ordered footprint | 202 | when is labour day → NN-WRB-VBZ
18 | Coarse-grained POS tag footprint | 204 | when is labour day → W-V-N-N
19 | Fine-grained POS tag footprint | 265 | when is labour day → WRB-VBZ-NN-NN

Sparsity is measured on the full data set: training + test
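Two of the attributes in the table above (Wh type and Timing) are easy to sketch. The slides do not say how they were computed, so this is one plausible reading: the year is found with a plain regular expression instead of ManTIME, and the function names and the "none" value are assumptions.

```python
import re
from datetime import date

WH_WORDS = ("what", "when", "where", "which", "who", "whom", "whose", "why", "how")


def wh_type(query):
    """First wh-word in the query, or the hypothetical value 'none'."""
    for token in query.lower().split():
        if token in WH_WORDS:
            return token
    return "none"


def timing(query, submitted):
    """Compare the first four-digit year in the query with the submission year."""
    match = re.search(r"\b(1[0-9]{3}|20[0-9]{2})\b", query)
    if match is None:
        return "none"
    year = int(match.group(1))
    if year < submitted.year:
        return "past"
    if year > submitted.year:
        return "future"
    return "present"


print(wh_type("how did hitler die"))
print(timing("Movies 2012", date(2013, 2, 28)))
```

Both functions return a value from a very small set, which is what keeps the sparsity of these attributes low.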


run 1: minimal

• classifier: SVM with polynomial kernel*

# | Attribute description | Sparsity
2 | Does it contain a temporal expression? | 2
5 | Timing | 4
6 | Most frequent trigger class | 5
9 | Most frequent POS tag tense | 7
11 | Trigger classes footprint | 11

* default parameters (C and gamma)


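run 2: intermediate

• classifier: SVM with polynomial kernel*

# | Attribute description | Sparsity
1 | Is it a Wikipedia page title? | 2
2 | Does it contain a temporal expression? | 2
5 | Timing | 4
7 | Wh type | 5
9 | Most frequent POS tag tense | 7
10 | Most frequent coarse-grained POS tag | 8
11 | Trigger classes footprint | 11
12 | Temporal Δ between submission and query | 16
13 | Tenses footprint | 18
15 | Most frequent fine-grained POS tag | 21

* default parameters (C and gamma)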


run 3: full

• classifier: Random Forests (1000 random trees)

# | Attribute description | Sparsity
1 | Is it a Wikipedia page title? | 2
2 | Does it contain a temporal expression? | 2
3 | Submission's term | 3
4 | Submission's trimester | 4
5 | Timing | 4
6 | Most frequent trigger class | 5
7 | Wh type | 5
8 | Most frequent TempoWordNet class | 5
9 | Most frequent POS tag tense | 7
10 | Most frequent coarse-grained POS tag | 8
11 | Trigger classes footprint | 11
12 | Temporal Δ between submission and query | 16
13 | Tenses footprint | 18
14 | Ordered TempoWordNet classes | 18
15 | Most frequent fine-grained POS tag | 21
16 | Coarse-grained POS tag ordered footprint | 119
17 | Fine-grained POS tag ordered footprint | 202
18 | Coarse-grained POS tag footprint | 204
19 | Fine-grained POS tag footprint | 265


results (submitted runs)

[bar chart: accuracy (0-100) of the Full, Intermediate and Minimal runs, with the 1st ranked system marked]


results: 5 x 10-fold cross-validation

[bar chart: accuracy (0-100) of the Full, Intermediate and Minimal runs, with the 1st ranked system marked]
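For readers unfamiliar with the 5 x 10 protocol, the following is a minimal sketch of repeated k-fold splitting using only the standard library. The function name, the seed and the use of 100 items (the size of the training set reported earlier) are illustrative assumptions; the slides do not show the actual evaluation code.

```python
import random


def repeated_kfold(n_items, n_folds=10, n_repeats=5, seed=0):
    """Yield (repeat, fold, train_indices, test_indices) for repeated k-fold CV."""
    rng = random.Random(seed)
    for repeat in range(n_repeats):
        # A fresh shuffle per repeat gives a different partition each time.
        indices = list(range(n_items))
        rng.shuffle(indices)
        for fold in range(n_folds):
            test = indices[fold::n_folds]
            held_out = set(test)
            train = [i for i in indices if i not in held_out]
            yield repeat, fold, train, test


splits = list(repeated_kfold(100))
print(len(splits))  # 5 repeats x 10 folds
```

Each repeat partitions the data into ten disjoint test folds, and the reported accuracy would then be the mean over all fifty train/test splits.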


a posteriori fix

[bar chart: accuracy (0-100) of the Full, Intermediate, Minimal and Minimal fixed runs; Minimal fixed (best combination of attributes): 72.33%]


how to reach the peak


confusion matrix (minimal run)

Rows: actual class; columns: classified as.

          | Recency | Past | Future | Atemporal
Recency   | 43 | 0 | 21 | 11
Past      | 3 | 60 | 6 | 6
Future    | 38 | 0 | 35 | 2
Atemporal | 6 | 5 | 3 | 61
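The confusion matrix can be summarised numerically. The sketch below copies the counts from the slide and derives accuracy plus per-class precision and recall; it assumes, as the "Classified as" header suggests, that rows are the actual classes and columns the predicted ones.

```python
LABELS = ["Recency", "Past", "Future", "Atemporal"]
# Rows: actual class; columns: predicted class (values copied from the slide).
MATRIX = [
    [43, 0, 21, 11],
    [3, 60, 6, 6],
    [38, 0, 35, 2],
    [6, 5, 3, 61],
]

total = sum(sum(row) for row in MATRIX)
correct = sum(MATRIX[i][i] for i in range(len(LABELS)))
accuracy = correct / total

recall = {}
precision = {}
for i, label in enumerate(LABELS):
    actual = sum(MATRIX[i])                     # row sum
    predicted = sum(row[i] for row in MATRIX)   # column sum
    recall[label] = MATRIX[i][i] / actual
    precision[label] = MATRIX[i][i] / predicted

for label in LABELS:
    print(f"{label:10s} precision={precision[label]:.3f} recall={recall[label]:.3f}")
print(f"accuracy={accuracy:.4f} ({correct}/{total})")
```

The off-diagonal cells show where the errors concentrate: Recency and Future are confused with each other in 59 of the 300 queries (21 + 38), while Past and Atemporal are rarely mistaken.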




difficult queries

• iPhone 5 release date
  it can be FUTURE or PAST according to the submission time
  keywords don't help here
• 2061: Odyssey Three
  keywords can lie!
• Ventura Stern 2016
  keywords could possibly lie
• season 2 dexter
  use of external sources of knowledge


temporal footprint

a continuous period on the time-line that temporally defines the existence of a particular concept.

Source: Filannino, M., Nenadic G. Mining temporal footprints from Wikipedia. Proceedings of the First AHA!-Workshop on Information Discovery in Text (COLING 2014), Dublin, Ireland, August 2014. ACL.


online material

Source: http://www.cs.man.ac.uk/~filannim/projects/temporalia/

Thank you

Contact: [email protected]

QUESTIONS?

[diagram: Natural Language Processing, Linguistics, Parallel computing, Semi-structured data, Statistics, Machine Learning, Text Mining]


the task

• source: written texts
• goal: a (machine-understandable) temporal representation of the texts
• easy for people
• hard for machines

Temporal aspects of events provide a natural mechanism for organising information


linguistic key concepts

• temporal expressions: phrases denoting a temporal entity such as an interval or a time point
  01/05/2014, March 15, the next week, Saturday, at that time, yesterday, 5 o'clock, 3 days, every 4 hours
• events: phrases denoting eventuality and states
  inflected verbs and nouns: spoken, deliver, will be published
• links: temporal relation between two phrases
  BEFORE, AFTER, INCLUDES, ENDS, DURING, BEGINS

Source: ISO-TimeML (ISO/TC37/SC 4 N412), rev. 12, 2007


example

• Yesterday, Deutsche Bank released a note saying that China's current economic policies would result in an enormous surge in coal consumption over the next decade.

Source: CNN news article published on 28th February 2010.

example: temporal expressions

• Yesterday(T), Deutsche Bank released a note saying that China's current economic policies would result in an enormous surge in coal consumption over the next decade(T).

Yesterday: value 2010-02-27, type DATE
the next decade: value P10Y, type DURATION
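The DATE value in this example follows from the utterance time (28 February 2010). A minimal sketch of that normalisation step for a few deictic expressions is given below; the lookup table and function name are illustrative assumptions and not part of ManTIME or ISO-TimeML.

```python
from datetime import date, timedelta

# Day offsets for a handful of deictic expressions (illustrative, not exhaustive).
DEICTIC_OFFSETS = {"yesterday": -1, "today": 0, "tomorrow": 1}


def normalise_deictic(expression, utterance):
    """Return (type, ISO value) for a known deictic date expression, else None."""
    offset = DEICTIC_OFFSETS.get(expression.strip().lower())
    if offset is None:
        return None
    return "DATE", (utterance + timedelta(days=offset)).isoformat()


print(normalise_deictic("Yesterday", date(2010, 2, 28)))
```

Expressions such as "the next decade" need a different rule (a DURATION such as P10Y), which is why practical normalisers combine many rules of this kind.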


example: events

• Yesterday(T), Deutsche Bank released(E) a note saying(E) that China's current economic policies would result(E) in an enormous surge(E) in coal consumption over the next decade(T).

released: class OCCURRENCE
saying: class REPORTING
result: class OCCURRENCE
surge: class OCCURRENCE

example: links

[diagram: arcs over the annotated sentence, labelled: is included, is included, after, is included]

example: ISO-TimeML output

nyt_20100228_china_pollution

Yesterday, Deutsche Bank released a note saying that China's current economic policies would result in an enormous surge in coal consumption over the next decade.

visual representation

[timeline with the labels: 27 Feb. 2010, now, 2020; events: released, saying, surge]

Utterance time: 28th February 2010.


TempEval-3 results

(legend on the slide: rule-based vs. machine learning-based systems)

Research group | Identification Prec. | Identification Rec. | Identification F1 | Normalisation accuracy | Overall score
The University of Heidelberg | 0.93 | 0.88 | 0.9 | 0.86 | 0.776
US Naval Academy | 0.89 | 0.91 | 0.9 | 0.79 | 0.71
The University of Manchester | 0.95 | 0.85 | 0.9 | 0.77 | 0.69
Stanford University | 0.89 | 0.91 | 0.9 | 0.75 | 0.674
AT&T Lab Research | 0.98 | 0.75 | 0.85 | 0.77 | 0.656
University of Colorado Boulder | 0.94 | 0.87 | 0.9 | 0.72 | 0.647
Jadavpur University | 0.93 | 0.8 | 0.86 | 0.74 | 0.638
Katholieke Universiteit Leuven | 0.93 | 0.76 | 0.84 | 0.75 | 0.63
Joint Research Centre European Commission | 0.9 | 0.8 | 0.85 | 0.68 | 0.582
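The columns of the table above are related. F1 is the harmonic mean of precision and recall, and the overall score is close to F1 multiplied by the normalisation accuracy. The second relation is not stated on the slide; it is an observation that fits the reported numbers to within about 0.01, and the sketch below checks it for the top three rows.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, normalisation accuracy, reported overall score) from the table.
ROWS = {
    "The University of Heidelberg": (0.93, 0.88, 0.86, 0.776),
    "US Naval Academy": (0.89, 0.91, 0.79, 0.710),
    "The University of Manchester": (0.95, 0.85, 0.77, 0.690),
}

estimates = {}
for group, (p, r, acc, reported) in ROWS.items():
    # Assumption: overall score ~ identification F1 x normalisation accuracy.
    estimates[group] = f1(p, r) * acc
    print(f"{group}: F1={f1(p, r):.3f} F1*accuracy={estimates[group]:.3f} reported={reported:.3f}")
```

Small differences are expected because the precision and recall values on the slide are already rounded to two decimals.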


model selection

93 features, 4 models:

• M1: morpho-lexical only
• M2: morpho-lexical + syntactic
• M3: morpho-lexical + gazetteers
• M4: morpho-lexical + gazetteers + WordNet

* 5x10-fold cross validation

Source: Filannino, M., and Nenadic G. ManTIME: Temporal expression extraction with systematic feature type selection and a posteriori label adjustment. Journal of Information Processing and Management: Special Issue on Time and Information Retrieval (2014), Elsevier. (under review)


Better software, better research

temporal footprint

A temporal footprint is a continuous period on the time-line that temporally defines the existence of a particular concept.

Source: Filannino, M., Nenadic G. Mining temporal footprints from Wikipedia. Proceedings of the First AHA!-Workshop on Information Discovery in Text (COLING 2014), Dublin, Ireland, August 2014. ACL.


evaluation

• subjects: people
• lived from 1000 AD to 2014
  text from Wikipedia web pages
  year of birth and death from DBpedia
• 228,824 people collected
• simple definition of temporal footprint
  birth and death dates


results

• Galileo Galilei (1564-1642), prediction: 1556-1654. Error: 0.204
• Computer (1940-today), prediction: 1882-1982

Source: http://www.cs.man.ac.uk/~filannim/projects/temporal_footprints/

application?

Source: http://start.csail.mit.edu/answer.php?query=


i2b2 shared Task 12

ADMISSION DATE: 2011-02-06; DISCHARGE DATE: 2011-02-08; HISTORY OF PRESENT ILLNESS: Mr. Pohl is a 53-year-old male with history of alcohol use and hypertension. Blood alcohol level was 383. Agitated in emergency room requiring 4 leather restraints, received 5 mg of Haldol, 2 mg of Ativan. He became hypotensive in the emergency room with a systolic blood pressure in the 80's and had decreased respiratory rate. He received a normal saline bolus of 2 litres of good blood pressure response. The patient was then admitted to the medical Intensive Care Unit for observation and then transferred to our service on medicine when the blood pressures remained stable overnight...

[timeline figure: days 06/02/2011, 07/02/2011, 08/02/2011; rows: General, Tests, Treatments, Problems; entries: admission, discharge, BAL 383, Haldol 4mg, Ativan 2mg, hypotensive, SBP ~80, decreased respiratory rate, Saline bolus 2l, transfer, stable, SBP stable, hands tremor improved, blood pressure medications]

Source: Kovačević, A., Dehghan, A., Filannino, M., Keane, J. A., and Nenadic, G. Combining rules and machine learning for extraction of temporal expressions and events from clinical narratives. Journal of the American Medical Informatics Association (2013).


clinical data

• disease progression modelling
• analysis of the effectiveness of treatments
• extraction of patients' clinical pathway


1st year backup


identification techniques

[timeline figure, 2000-2013: TimeML (standard); ACE-2004 dev & eval (TERN2004 corpus); TimeBank (corpus); Hand grammar approach (rule-based); TempEval Task #15 (in SemEval07); TempEval-2 Task #13 (in SemEval10); TempEval-3 Task #1 (in SemEval13); Markov logic network (machine learning); SVM (machine learning); Maximum Entropy Class. (machine learning); Conditional Random Fields (machine learning)]

scientific interest

[chart: Google Scholar results for "temporal expressions AND clinical" by year, 2000-2011; y-axis from 0 to 70]

Source: Google Scholar (27/02/2012)

conferences & journals

• SemEval: Evaluation Exercises on Semantic Evaluation
  TempEval: Temporal Evaluation Task
• TIME: Time International Symposium Series
• JAMIA: Journal of the American Medical Informatics Association
• COLING: International Conference on Computational Linguistics
• IJHI: International Journal of Health Informatics


ISO-TimeML

• DATE: [YYYY-MM-DD]
• TIME: [date]T[hh:mm:ss]
• DURATION: P[[n][Y/M/D/w/h/m/s]]
• SET: R[n][set]


temporal forms

• time or date references
  11pm, February 14th
• time references that anchor on another time
  one hour after midnight
• durations
  two days, five years
• recurring times
  twice in the hour
• context-dependent times
  today, last year
• vague references
  the near future
• times indicated by an event
  the day after Silvio Berlusconi resigned

Source: J. Poveda, M. Surdeanu, and J. Turmo, An analysis of Bootstrapping for the Recognition of Temporal Expressions, 2009

temporal binding

• fully-qualified: no reference to any other temporal entity
  March 15, 2001
• deictic: reference to the time of utterance
  today, yesterday, three weeks ago, last Thursday
• anaphoric: reference to a temporal expression previously evoked in the text
  March 15, the next week, Saturday, at that time

Source: D. Ahn, S. Fissaha Adafre, and M. de Rijke, Towards Task-Based Temporal Extraction and Recognition, 2005


NorMA architecture

• design, implement and evaluate a novel:
  identification architecture
  normalisation architecture
• investigate the difference between general and clinical domain
• investigate the use of the proposed frameworks to the general domain
• suggest a more temporally-aware error measure for normalisation phase

clinical NorMA architecture

[architecture diagram]

clinical NorMA pipeline

[pipeline diagram]


example of clinical rule

import re

# Fragment from inside a normalisation rule; add_date() and reference_date are
# defined elsewhere. Quotes and pattern whitespace were lost on the slide and are reconstructed.
pattern = re.findall(
    r'^(?:the |her |his |their )?([0-9][0-9]?)(?:st|nd|rd|th) '
    r'(?:post-|post|day)? ?(?:pod|operative|op|hospital|hsp|day|hd)(?:ly)? ?'
    r'(?:day|night|afternoon)?$',
    raw_expression)
if pattern:
    value = add_date(reference_date, int(pattern[0]))
    # (expression, temporal expression type, ISO-8601 representation (value), rule name)
    return expression, 'DATE', value, 'postoperative_literals3'


Rules activation distribution

[chart]



example: raw text

Admission Date : 02/01/2002
Discharge Date : 02/08/2002
HISTORY OF PRESENT ILLNESS :
Saujule Study is a 77-year-old woman with a history of obesity and hypertension who presents with increased shortness of breath x 5 days. Her shortness of breath has been progressive over the last 2-3 years. On admission , she was diuresed with Lasix and was negative 1-2 liters per day for several days.

Source: i2b2 2012 clinical corpus

example: identification

Unisys must pay about $100 million in interest every quarter, on top of $27 million in dividends on preferred stock.

[the clinical text above, repeated on the slide]

example: normalisation

02/01/2002
02/08/2002
5 days
2-3 years
several days


2nd year backup



ml-driven identification phase

• Conditional Random Fields
  Features: harvested from the literature
  Tagging scheme: BIO (beginning, inside, outside)
  Factor graph:

factor graph

... was | discovered | in | 1977 | , | Feynman | immediately ...
    w-3 | w-2 | w-1 | w0 | w+1 | w+2 | w+3

Source: Richard P. Feynman's page
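To make the BIO scheme and the w-3 .. w+3 context window concrete, here is a small sketch on the Feynman sentence fragment. It assumes that "1977" is the temporal expression and the centre token w0 (the highlighting is not preserved on the slide); the function names and padding symbol are hypothetical, and no CRF is trained here.

```python
def bio_tags(tokens, spans):
    """Label tokens with B/I/O; spans are (start, end) token indices, end exclusive."""
    tags = ["O"] * len(tokens)
    for start, end in spans:
        tags[start] = "B"
        for i in range(start + 1, end):
            tags[i] = "I"
    return tags


def window_features(tokens, index, size=3, pad="<PAD>"):
    """Map relative positions w-3 .. w+3 to the tokens around position `index`."""
    features = {}
    for offset in range(-size, size + 1):
        position = index + offset
        inside = 0 <= position < len(tokens)
        name = "w0" if offset == 0 else f"w{offset:+d}"
        features[name] = tokens[position] if inside else pad
    return features


tokens = ["was", "discovered", "in", "1977", ",", "Feynman", "immediately"]
print(bio_tags(tokens, [(3, 4)]))      # assumed span: the token "1977"
print(window_features(tokens, 3))
```

In a CRF, each per-token feature would be computed for every position in the window, so the model sees the neighbourhood of a token as well as the token itself.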



unique values per feature

[bar chart: number of unique values (0 to 12000) per feature. Features shown: _word, _word_preprocessed, lex_lemma, lex_porter_stem, lex_treetagger_lemma, lex_prefix, lex_lancaster_stem, lex_suffix, lex_token_with_no_letters, lex_extended_pattern, lex_vocal_pattern, lex_pattern, lex_treetagger_pos, lex_token_with_no_letters_and_numbers, lex_tense, gaz_countries, gaz_iso_countries, gaz_nationalities, gaz_uscities, lex_polarity, TIMEX3(class), gaz_female_names, gaz_festivities, gaz_male_names, gaz_stopword, lex_first_upper, lex_has_digit, lex_has_symbols, lex_is_all_caps_and_dots, lex_is_all_digits_and_dots, lex_is_alnum, lex_is_alpha, lex_is_decimal, lex_is_digit, lex_is_lower, lex_is_numeric, lex_is_title, lex_is_upper, lex_last_s, lex_unusual, temp_cardinal, temp_compound, temp_digit, temp_festivity, temp_future_ref, temp_fuzzy_quantifier, temp_literal_number, temp_modifier, temp_month, temp_number, temp_ordinal, temp_past_ref, temp_period, temp_pod, temp_present_ref, temp_season, temp_signal, temp_temporal_adjectives, temp_temporal_adverbs, temp_temporal_co-reference, temp_temporal_conjunctives, temp_temporal_prepositions, temp_time, temp_weekday, temp_year, lex_chunk, lex_is_space, lex_pnp, phon_first_phoneme, phon_form, phon_last_phoneme, phon_length]



Post-processing analysis

Temporal

• ManTIME
• wikipedia pages
• using dates only
• gaussian fit
  to be improved

Galileo Galilei (1564-1642)
Dante (1265-1321)


ManTIME architecture

feature type selection

• 93 features
  morpho-lexical, syntactic, gazetteers and WordNet
• 4 models
  M1: morpho-lexical only
  M2: morpho-lexical + syntactic
  M3: morpho-lexical + gazetteers
  M4: morpho-lexical + gazetteers + WordNet
• model selection

model selection result

That means Unisys must pay about $100 million in interest every quarter, on top of $27 million in dividends on preferred stock.

• M4: morpho-lexical + gazetteers + WordNet

Silver + Gold, 5x10-fold cross validation

TempEval-3

• temporal information extraction challenge
• organised every 3 years in SemEval (ACL)

Corpus | # documents | # words | annotation source | purpose
AQUAINT | 73 | 33,973 | experts | training
TimeBank | 183 | 61,418 | experts | training
TempEval-3 silver | 2,452 | 666,309 | systems | training
TempEval-3 eval | 20 | 6,375 | experts | testing

Source: TempEval-3 challenge; Corpora released in October 2012 (except the eval).



identification post-processing

• Probabilistic correction module
• BIO fixer
• Threshold-based label switcher

[pipeline diagram labels: CRFs, PCM, BIO fixer, TbLS, BIO fixer]

Silver + Gold; 4x10-fold cross validation
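The slides do not describe what the BIO fixer does. One plausible reading, sketched below as an assumption, is that it repairs label sequences that violate the BIO scheme, for example an "I" (inside) tag that does not follow a "B" or another "I".

```python
def fix_bio(tags):
    """Turn any 'I' that does not follow 'B' or 'I' into 'B' (assumed behaviour)."""
    fixed = []
    previous = "O"
    for tag in tags:
        if tag == "I" and previous == "O":
            tag = "B"
        fixed.append(tag)
        previous = tag
    return fixed


print(fix_bio(["O", "I", "I", "O", "B", "I"]))
```

A repair step of this kind is cheap to run after each module that may rewrite labels, which would fit the diagram showing the fixer more than once.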



TempEval-3: results (Task A)

Training data (post-processing) | Strict Prec | Strict Rec | Strict F1 | Lenient Prec | Lenient Rec | Lenient F1 | Norm. accuracy: Type | Norm. accuracy: Value | Overall score
Human&Silver (no) | 0.79 | 0.64 | 0.7 | 0.97 | 0.79 | 0.87 | 0.89 | 0.77 | 0.672
Human&Silver (yes) | 0.8 | 0.66 | 0.72 | 0.97 | 0.8 | 0.88 | 0.87 | 0.76 | 0.667
Human (no) | 0.76 | 0.64 | 0.7 | 0.95 | 0.8 | 0.87 | 0.87 | 0.77 | 0.675
Human (yes) | 0.79 | 0.7 | 0.74 | 0.95 | 0.85 | 0.9 | 0.86 | 0.77 | 0.69
Silver (no) | 0.78 | 0.63 | 0.7 | 0.97 | 0.8 | 0.87 | 0.89 | 0.77 | 0.672
Silver (yes) | 0.82 | 0.66 | 0.73 | 0.98 | 0.79 | 0.88 | 0.91 | 0.78 | 0.683

• investigate semi-supervised techniques
• approach the normalisation phase in a novel way
• investigate the differences between general and clinical domain
• investigate the use of the proposed framework to other domains
• suggest a more temporally-aware error measure in the normalisation

Source: M. Filannino, G. Brown, and G. Nenadic. ManTIME: Temporal expression identification and normalization in the TempEval-3 challenge. Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013), pages 53-57, Atlanta, Georgia, USA, June 2013. ACL.

TempEval-3: ranking (Task A)

System (best run only) | Strict Prec | Strict Rec | Strict F1 | Lenient Prec | Lenient Rec | Lenient F1 | Norm. accuracy: Type | Norm. accuracy: Value | Overall score
HeidelTime | 0.84 | 0.79 | 0.81 | 0.93 | 0.88 | 0.9 | 0.91 | 0.86 | 0.776
NavyTime | 0.79 | 0.8 | 0.8 | 0.89 | 0.91 | 0.9 | 0.89 | 0.79 | 0.71
ManTIME | 0.79 | 0.7 | 0.74 | 0.95 | 0.85 | 0.9 | 0.86 | 0.77 | 0.69
SUTime | 0.79 | 0.8 | 0.8 | 0.89 | 0.91 | 0.9 | 0.89 | 0.75 | 0.674
ATT | 0.91 | 0.7 | 0.79 | 0.98 | 0.75 | 0.85 | 0.91 | 0.77 | 0.656
ClearTK | 0.86 | 0.8 | 0.83 | 0.94 | 0.87 | 0.9 | 0.93 | 0.72 | 0.647
JU-CSE | 0.82 | 0.7 | 0.75 | 0.93 | 0.8 | 0.86 | 0.87 | 0.74 | 0.638
KUL | 0.77 | 0.63 | 0.69 | 0.93 | 0.76 | 0.84 | 0.89 | 0.75 | 0.63
FSS-TimEx | 0.52 | 0.46 | 0.49 | 0.9 | 0.8 | 0.85 | 0.81 | 0.68 | 0.582

Source: Naushad UzZaman, Hector Llorens, Leon Derczynski, James Allen, Marc Verhagen, and James Pustejovsky. Semeval-2013 task 1: Tempeval-3: Evaluating time expressions, events, and temporal relations. Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013), pages 1-9, Atlanta, Georgia, USA, June 2013. ACL.





3rd year backup



why is it challenging?

1. Matt exercised during his lunch break.
2. He stretched, lifted weights, and ran.
3. He showered, got dressed and returned to work.

Source: Temporal Information Extraction and Shallow Temporal Reasoning, D. Roth et al. 2012

linguistic knowledge

1. Matt exercised(E) during his lunch break(E).
2. He stretched(E), lifted(E) weights, and ran(E).
3. He showered(E), got dressed(E) and returned(E) to work.

[sequence of diagrams arranging the events lunch break, exercised, stretch, lift, run, shower, dress and return on a timeline, under the headings linguistic knowledge, common sense knowledge and domain knowledge]


Temporal footprint

results

• Robin Williams (1951-2014), prediction: 1953-2006. E: 0.159

other types of temporal footprint?

• Christopher Columbus will die in 2057?!
  Prediction: 1366-2057 (1451-1506), E: 0.92

AHA!

physical existence vs. social coverage

• Anne Frank's footprint is shifted into the future


Temporalia

data

• training set: 100 queries
• benchmark test set: 300 queries

Query | Submission date | Class
Movies 2012 | Feb 28, 2013 | past
Upcoming Movies in 2013 | Jan 1, 2013 | future
2013 MLB Playoff Schedule | Jan 1, 2013 | future
current price of gold | Feb 28, 2013 | present
Amazon Deal of the Day | Feb 28, 2013 | present
Number of Neck Muscles | Feb 28, 2013 | atemporal



attributes

Submitted runs in which each attribute is used (✓):

ID | Attribute | Minimal | Intermediate | Full
1 | Is it a Wikipedia page title? | | ✓ | ✓
2 | Does the query contain a temporal expression? | ✓ | ✓ | ✓
3 | Submission's term | | | ✓
4 | Submission's trimester | | | ✓
5 | Timing | ✓ | ✓ | ✓
6 | Most frequent trigger class | ✓ | | ✓
7 | Wh type | | ✓ | ✓
8 | Most frequent TempoWordNet class | | | ✓
9 | Most frequent POS tag tense | ✓ | ✓ | ✓
10 | Most frequent coarse-grained POS tag | | ✓ | ✓
11 | Trigger classes footprint | ✓ | ✓ | ✓
12 | Temporal Δ between submission and query | | ✓ | ✓
13 | Tenses footprint | | ✓ | ✓
14 | Ordered TempoWordNet classes | | | ✓
15 | Most frequent fine-grained POS tag | | ✓ | ✓
16 | Coarse-grained POS tag ordered footprint | | | ✓
17 | Fine-grained POS tag ordered footprint | | | ✓
18 | Coarse-grained POS tag footprint | | | ✓
19 | Fine-grained POS tag footprint | | | ✓



error measure

[diagram: the union and the overlap of the gold and prediction intervals]
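The slide only shows the labels union, overlap, gold and prediction. A formula consistent with them is error = 1 - overlap / union. This is an assumption, but it reproduces the three error values reported on the results slides (0.204, 0.159 and 0.92), as the sketch below shows. The function name is hypothetical.

```python
def footprint_error(gold, prediction):
    """Error between two (start_year, end_year) intervals: 1 - overlap / union."""
    gold_start, gold_end = gold
    pred_start, pred_end = prediction
    overlap = max(0, min(gold_end, pred_end) - max(gold_start, pred_start))
    union = (gold_end - gold_start) + (pred_end - pred_start) - overlap
    if union == 0:
        # Degenerate zero-length intervals: identical means no error.
        return 0.0 if gold == prediction else 1.0
    return 1 - overlap / union


print(round(footprint_error((1564, 1642), (1556, 1654)), 3))  # Galileo Galilei
print(round(footprint_error((1951, 2014), (1953, 2006)), 3))  # Robin Williams
print(round(footprint_error((1451, 1506), (1366, 2057)), 3))  # Christopher Columbus
```

Under this reading the error is 0 for a perfect prediction and 1 when the predicted and gold intervals do not overlap at all.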
