temporal information extraction in the general and clinical domain

[email protected] Manchester, 30/10/2014 School of Computer Science Research Symposium 2014 Temporal information extraction in the general and clinical domain Michele Filannino Supervisor: Goran Nenadic Co-Supervisor: Gavin Brown

Upload: michele-filannino

Post on 22-Jun-2015


DESCRIPTION

30th October 2014 School of Computer Science Research Symposium 2014 The University of Manchester

TRANSCRIPT

Page 1: Temporal information extraction in the general and clinical domain

[email protected]

Manchester, 30/10/2014

School of Computer Science Research Symposium 2014

Temporal information extraction in the general and clinical domain

Michele Filannino

Supervisor: Goran Nenadic Co-Supervisor: Gavin Brown

Page 2: Temporal information extraction in the general and clinical domain

/ 2930/10/2014, Manchester

presentation: Research Symposium 2014

Natural Language Processing

Linguistics

Parallel computing

Semi-structured data

Statistics

Machine Learning

Text Mining

Page 3: Temporal information extraction in the general and clinical domain

temporal information extraction

■ source: written texts

■ goal: a (machine-understandable) temporal representation of the texts

■ easy for people

■ hard for machines

Temporal aspects of events provide a natural mechanism for organising information.

Page 4: Temporal information extraction in the general and clinical domain

Source: ISO-TimeML (ISO/TC37/SC 4 N412), rev. 12, 2007

linguistic key concepts

■ temporal expressions: phrases denoting a temporal entity such as an interval or a time point

● 01/05/2014, March 15, the next week, Saturday, at that time, yesterday, 5 o’clock, 3 days, every 4 hours

■ events: phrases denoting eventualities and states

● inflected verbs and nouns: spoken, deliver, will be published

■ links: temporal relations between two phrases

● BEFORE, AFTER, INCLUDES, ENDS, DURING, BEGINS

Page 5: Temporal information extraction in the general and clinical domain

Source: CNN news article published on 28th February 2010.

example

■ Yesterday, Deutsche Bank released a note saying that China's current economic policies would result in an enormous surge in coal consumption over the next decade.

Page 6: Temporal information extraction in the general and clinical domain

Source: CNN news article published on 28th February 2010.

example: temporal expressions

■ Yesterday(T), Deutsche Bank released a note saying that China's current economic policies would result in an enormous surge in coal consumption over the next decade(T).

● Yesterday: value: “2010-02-27” type: DATE

● the next decade: value: “P10Y” type: DURATION
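The two values above are expressed relative to the document creation time (28th February 2010). A minimal sketch of how such values could be derived, assuming a tiny hand-written lookup: the function name `normalise` and the two supported phrases are hypothetical and are not taken from the system described in the talk.

```python
from datetime import date, timedelta


def normalise(expression, creation_date):
    """Return a (type, value) pair for a few hard-coded expressions (illustrative only)."""
    text = expression.lower()
    if text == "yesterday":
        # A DATE anchored one day before the document creation time.
        return "DATE", (creation_date - timedelta(days=1)).isoformat()
    if text == "the next decade":
        # A DURATION of ten years in ISO 8601 period notation.
        return "DURATION", "P10Y"
    raise ValueError(f"unsupported expression: {expression!r}")


dct = date(2010, 2, 28)  # document creation time from the example
print(normalise("Yesterday", dct))
print(normalise("the next decade", dct))
```

A real normaliser has to cover far more expression types; the point here is only that a DATE value depends on the creation time while a DURATION value does not.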

Page 7: Temporal information extraction in the general and clinical domain

Source: CNN news article published on 28th February 2010.

example: events

■ Yesterday(T), Deutsche Bank released(E) a note saying(E) that China's current economic policies would result(E) in an enormous surge(E) in coal consumption over the next decade(T).

● released: class: OCCURRENCE

● saying: class: REPORTING

● result: class: OCCURRENCE

● surge: class: OCCURRENCE

Page 8: Temporal information extraction in the general and clinical domain

Source: CNN news article published on 28th February 2010.

example: links

■ Yesterday(T), Deutsche Bank released(E) a note saying(E) that China's current economic policies would result(E) in an enormous surge(E) in coal consumption over the next decade(T).

[Diagram: links drawn between the annotated phrases, labelled: is included, is included, after, is included]

Page 9: Temporal information extraction in the general and clinical domain

example: ISO-TimeML output

<TimeML … xsi:noNamespaceSchemaLocation="http://timeml.org/timeMLdocs/TimeML_1.2.1.xsd">

<DOCID>nyt_20100228_china_pollution</DOCID>

<DCT><TIMEX3 functionInDocument="CREATION_TIME" tid="t0" type="DATE" value="2010-02-28">2010-02-28</TIMEX3></DCT>

<TITLE>As Pollution Worsens in China, Solutions Succumb to Infighting</TITLE>

<TEXT>

<TIMEX3 tid="t1" type="DATE" value="2010-02-27">Yesterday</TIMEX3>, Deutsche Bank <EVENT class="OCCURRENCE" eid="e1">released</EVENT> a note <EVENT class="REPORTING" eid="e2">saying</EVENT> that China's <TIMEX3 tid="t2" type="DATE" value="PRESENT_REF">current</TIMEX3> economic policies would <EVENT class="OCCURRENCE" eid="e3">result</EVENT> in an enormous <EVENT class="OCCURRENCE" eid="e4">surge</EVENT> in coal <EVENT class="OCCURRENCE" eid="e5">consumption</EVENT> over <TIMEX3 tid="t3" type="DURATION" value="P10Y">the next decade</TIMEX3>.

</TEXT>

<TLINK eventInstanceID="ei1" lid="l52" relType="IS_INCLUDED" relatedToTime="t1"/>

<TLINK eventInstanceID="ei4" lid="l53" relType="IS_INCLUDED" relatedToTime="t2"/>

<TLINK eventInstanceID="ei2" lid="l54" relType="IS_INCLUDED" relatedToTime="t1"/>

<TLINK eventInstanceID="ei4" lid="l59" relType="AFTER" relatedToEventInstance="ei1"/>

</TimeML>
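As a rough illustration of what "machine-understandable" means for this output, the annotations can be read back with a standard XML parser. The sketch below uses Python's `xml.etree.ElementTree` on a trimmed-down copy of the example; the `SAMPLE` string omits the root attributes and most of the sentence, and the tuple layouts are arbitrary choices made for this illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed-down copy of the annotated sentence (root attributes omitted).
SAMPLE = """<TimeML>
<TEXT><TIMEX3 tid="t1" type="DATE" value="2010-02-27">Yesterday</TIMEX3>, Deutsche Bank
<EVENT class="OCCURRENCE" eid="e1">released</EVENT> a note
<EVENT class="REPORTING" eid="e2">saying</EVENT> ...</TEXT>
<TLINK eventInstanceID="ei1" lid="l52" relType="IS_INCLUDED" relatedToTime="t1"/>
</TimeML>"""

root = ET.fromstring(SAMPLE)

# Temporal expressions: identifier, type, normalised value, surface text.
timexes = [(t.get("tid"), t.get("type"), t.get("value"), t.text)
           for t in root.iter("TIMEX3")]

# Events: identifier, class, surface text.
events = [(e.get("eid"), e.get("class"), e.text) for e in root.iter("EVENT")]

# Temporal links: source event instance, relation type, target time.
links = [(l.get("eventInstanceID"), l.get("relType"), l.get("relatedToTime"))
         for l in root.iter("TLINK")]

print(timexes)
print(events)
print(links)
```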

Page 10: Temporal information extraction in the general and clinical domain

Utterance time: 28th February 2010.

visual representation

[Timeline figure. Time labels: 27 Feb. 2010, now, 2020. Events placed on it: released, saying, surge]

Page 11: Temporal information extraction in the general and clinical domain

proposed approach

■ Data-driven rather than rule-based

■ no pre-existing tools available

■ Conditional Random Fields (linear chain)

● sequences of labels, for sequences of samples

Page 12: Temporal information extraction in the general and clinical domain


words

Page 13: Temporal information extraction in the general and clinical domain

words, sentences
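A linear-chain CRF assigns one label to each word of a sentence. A minimal sketch of how annotated spans could be turned into such a label sequence, assuming the common B/I/O tagging scheme (suggested by the "BIO fixer" component named later in the talk); the function name `bio_encode` and the span format are hypothetical.

```python
def bio_encode(tokens, spans):
    """Label tokens with B/I/O given (start, end) token spans, end exclusive."""
    labels = ["O"] * len(tokens)
    for start, end in spans:
        labels[start] = "B"  # first token of a temporal expression
        for i in range(start + 1, end):
            labels[i] = "I"  # remaining tokens inside the expression
    return labels


tokens = ["over", "the", "next", "decade", "."]
labels = bio_encode(tokens, [(1, 4)])  # "the next decade" covers tokens 1 to 3
for token, label in zip(tokens, labels):
    print(f"{token}\t{label}")
```

The resulting token and label pairs are the kind of sequence data a CRF is trained on; the feature set used in the talk is not reproduced here.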

Page 14: Temporal information extraction in the general and clinical domain

post-processing pipeline

5x10-fold cross validation

■ Probabilistic correction

■ BIO fixer

■ Threshold-based label switcher

[Pipeline diagram. Component labels: CRFs, PCM, BIO fixer, TbLS, BIO fixer]
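The slides do not spell out what the BIO fixer does. One plausible reading, sketched below purely as an assumption, is that it repairs label sequences that are invalid under the B/I/O scheme, for example an "I" that does not follow a "B" or another "I". The function name `fix_bio` is hypothetical.

```python
def fix_bio(labels):
    """Rewrite any 'I' that does not continue a span as 'B' (assumed behaviour)."""
    fixed = []
    previous = "O"  # treat the start of the sequence as outside any span
    for label in labels:
        if label == "I" and previous == "O":
            label = "B"  # an inside label cannot open a span
        fixed.append(label)
        previous = label
    return fixed


print(fix_bio(["O", "I", "I", "O", "B", "I"]))
```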

Page 15: Temporal information extraction in the general and clinical domain

TempEval-3 results

Legend: rule-based, machine learning-based

Research group | Identification Prec. | Identification Rec. | Identification F1 | Normalisation accuracy | Overall score

The University of Heidelberg | 0.93 | 0.88 | 0.9 | 0.86 | 0.776

US Naval Academy | 0.89 | 0.91 | 0.9 | 0.79 | 0.71

The University of Manchester | 0.95 | 0.85 | 0.9 | 0.77 | 0.69

Stanford University | 0.89 | 0.91 | 0.9 | 0.75 | 0.674

AT&T Lab Research | 0.98 | 0.75 | 0.85 | 0.77 | 0.656

University of Colorado Boulder | 0.94 | 0.87 | 0.9 | 0.72 | 0.647

Jadavpur University | 0.93 | 0.8 | 0.86 | 0.74 | 0.638

Katholieke Universiteit Leuven | 0.93 | 0.76 | 0.84 | 0.75 | 0.63

Joint Research Centre European Commission | 0.9 | 0.8 | 0.85 | 0.68 | 0.582
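The F1 column is the harmonic mean of precision and recall. The overall score looks consistent with F1 multiplied by the normalisation accuracy, up to rounding of the published figures; that relationship is an assumption inferred from the numbers, not something stated on the slide. A short sketch for the Manchester row:

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# The University of Manchester row: precision 0.95, recall 0.85, accuracy 0.77.
f1 = f1_score(0.95, 0.85)
overall = f1 * 0.77  # assumption: overall score = F1 x normalisation accuracy
print(round(f1, 2), round(overall, 2))
```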

Page 16: Temporal information extraction in the general and clinical domain

model selection

Source: Filannino, M., and Nenadic, G. ManTIME: Temporal expression extraction with systematic feature type selection and a posteriori label adjustment. Journal of Information processing and Management: Special Issue on Time and Information Retrieval, (2014), Elsevier. (under review)

*5x10-fold cross validation

93 features, 4 models:

■ M1: morpho-lexical only

■ M2: morpho-lexical + syntactic

■ M3: morpho-lexical + gazetteers

■ M4: morpho-lexical + gazetteers + WordNet
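The evaluation protocol can be sketched as follows, assuming that "5x10-fold" means 10-fold cross validation repeated 5 times with a fresh shuffle each time. The function name `repeated_kfold`, its parameters and the fixed seed are hypothetical choices for this illustration.

```python
import random


def repeated_kfold(n_samples, n_folds=10, n_repeats=5, seed=0):
    """Yield (train_indices, test_indices) for each fold of each repetition."""
    rng = random.Random(seed)
    for _ in range(n_repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)  # new random partition for every repetition
        for fold in range(n_folds):
            test = indices[fold::n_folds]  # every n_folds-th item forms one fold
            held_out = set(test)
            train = [i for i in indices if i not in held_out]
            yield train, test


splits = list(repeated_kfold(n_samples=100))
print(len(splits))  # number of train/test splits: n_repeats * n_folds
```

Each of the four models would be trained and scored on every split, and the scores averaged, so that the comparison does not hinge on a single partition of the data.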

Page 17: Temporal information extraction in the general and clinical domain

Source: Kovačević, A., Dehghan, A., Filannino, M., Keane, J. A., and Nenadic, G. Combining rules and machine learning for extraction of temporal expressions and events from clinical narratives. Journal of the American Medical Informatics Association (2013).

i2b2 shared Task ‘12

ADMISSION DATE: 2011-02-06; DISCHARGE DATE: 2011-02-08; HISTORY OF PRESENT ILLNESS: Mr. Pohl is a 53-year-old male with history of alcohol use and hypertension. Blood alcohol level was 383. Agitated in emergency room requiring 4 leather restraints, received 5 mg of Haldol, 2 mg of Ativan. He became hypotensive in the emergency room with a systolic blood pressure in the 80's and had decreased respiratory rate. He received a normal saline bolus of 2 litres of good blood pressure response. The patient was then admitted to the medical Intensive Care Unit for observation and then transferred to our service on medicine when the blood pressures remained stable overnight...

[Timeline figure over 06/02/2011, 07/02/2011 and 08/02/2011, with rows General, Tests, Treatments and Problems. Entries: admission, discharge; BAL 383; Haldol 4mg, Ativan 2mg; hypotensive; SBP ~80, decreased respiratory rate; Saline bolus 2l; transfer; stable; SBP stable; hands tremor improved; blood pressure medications]

Page 18: Temporal information extraction in the general and clinical domain

clinical data

■ disease progression modelling

■ analysis of the effectiveness of treatments

■ extraction of patient’s clinical pathway

Page 19: Temporal information extraction in the general and clinical domain


Better software, better research…

Page 20: Temporal information extraction in the general and clinical domain

Source: Filannino, M., Nenadic, G. Mining temporal footprints from Wikipedia. Proceedings of the First AHA!-Workshop on Information Discovery in Text. (COLING 2014) (Dublin, Ireland, August 2014), ACL.

temporal footprint

A temporal footprint is a continuous period on the time-line that temporally defines the existence of a particular concept.

Page 21: Temporal information extraction in the general and clinical domain


evaluation

■ subjects: people

■ lived from 1000 AD to 2014

● text from Wikipedia web pages

● year of birth and death from DBpedia

■ 228,824 people collected

■ simple definition of temporal footprint

● birth and death dates


Page 22: Temporal information extraction in the general and clinical domain


Error: 0.204

results

■ Galileo Galilei (1564-1642), prediction: 1556-1654
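The slides do not state how the error is computed. One definition that reproduces the 0.204 reported for this example is the sum of the two boundary offsets divided by the length of the span covering both intervals: (8 + 12) / (1654 - 1556). The sketch below implements that reading as an assumption; the function name `footprint_error` is hypothetical.

```python
def footprint_error(gold, predicted):
    """Sum of boundary offsets divided by the span covering both intervals."""
    (gold_start, gold_end), (pred_start, pred_end) = gold, predicted
    offsets = abs(gold_start - pred_start) + abs(gold_end - pred_end)
    union_span = max(gold_end, pred_end) - min(gold_start, pred_start)
    # Two identical single-year intervals have no span and no error.
    return offsets / union_span if union_span else 0.0


error = footprint_error((1564, 1642), (1556, 1654))
print(round(error, 3))
```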


Page 23: Temporal information extraction in the general and clinical domain


Source: http://www.cs.man.ac.uk/~filannim/projects/temporal_footprints/

results

■ Computer (1940-today), prediction: 1882-1982


Page 24: Temporal information extraction in the general and clinical domain


Source: http://start.csail.mit.edu/answer.php?query=

application?


Page 25: Temporal information extraction in the general and clinical domain

temporal intent of queries

Source: Filannino, M., Nenadic, G. Using machine learning to predict temporal orientation of search engines’ queries in the Temporalia challenge. In Proceedings of the Sixth International Workshop on Evaluating Information Access (EVIA 2014) (Tokyo, Japan, December 2014).

Can we predict the temporal intent of search engine users’ queries?

■ input: queries & submission date

■ output: past, present, future or atemporal
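To make the input and output of the task concrete, here is a deliberately naive rule-based baseline. It is not the machine learning approach described in the cited paper: it only compares a year mentioned in the query with the submission year, and the function name `temporal_intent` and the example queries are hypothetical.

```python
import re
from datetime import date


def temporal_intent(query, submitted):
    """Toy rule: compare the first four-digit year in the query to the submission year."""
    match = re.search(r"\b([12]\d{3})\b", query)
    if match is None:
        return "atemporal"  # no explicit year, so the rule cannot decide
    year = int(match.group(1))
    if year < submitted.year:
        return "past"
    if year > submitted.year:
        return "future"
    return "present"


submitted = date(2013, 5, 1)  # hypothetical submission date
print(temporal_intent("world cup 2010 results", submitted))
print(temporal_intent("olympics 2016 schedule", submitted))
print(temporal_intent("how to tie a tie", submitted))
```

A query such as "weather forecast manchester" carries a future intent without mentioning any year, which is exactly the kind of case a rule like this misses and a learned model is meant to handle.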

Page 26: Temporal information extraction in the general and clinical domain


Source: https://www.google.co.uk/search?q=google+stock+price

queries’ temporal intent


Page 27: Temporal information extraction in the general and clinical domain


Source: https://www.google.co.uk/search?q=weather+forecast+manchester

queries’ temporal intent


Page 28: Temporal information extraction in the general and clinical domain

results

[Bar chart of accuracy (axis 0 to 100) for the configurations Full, Intermediate, Minimal and Minimal fixed; bar values 72.33%, 66.33%, 61.33% and 55.00%; annotation: 1st ranked system]

Page 29: Temporal information extraction in the general and clinical domain

just the tip of the iceberg…

■ “I’ve played Tennis for 10 years” vs. “I’ve played Tennis for 4 hours.”

■ How can they be ported to different domains?

■ How can they be adapted to different languages?

■ Is ISO-TimeML enough to cover different domains/languages?

RESEARCH

Page 30: Temporal information extraction in the general and clinical domain

Thank you.

Page 31: Temporal information extraction in the general and clinical domain

publications

■ Non peer-reviewed:

● Filannino, M. Temporal expression normalisation in natural language texts. CoRR abs/1206.2010 (2012).

■ Peer-reviewed:

● Kovačević, A., Dehghan, A., Filannino, M., Keane, J. A., and Nenadic, G. Combining rules and machine learning for extraction of temporal expressions and events from clinical narratives. Journal of the American Medical Informatics Association (2013).

● Filannino, M., Brown, G., and Nenadic, G. ManTIME: Temporal expression identification and normalization in the TempEval-3 challenge. Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013) (Atlanta, Georgia, USA, June 2013), ACL.

● Filannino, M., Nenadic, G. Mining temporal footprints from Wikipedia. Proceedings of the First AHA!-Workshop on Information Discovery in Text. (COLING 2014) (Dublin, Ireland, August 2014), ACL.

● Filannino, M., Nenadic, G. Using machine learning to predict temporal orientation of search engines’ queries in the Temporalia challenge. In Proceedings of the Sixth International Workshop on Evaluating Information Access (EVIA 2014) (Tokyo, Japan, December 2014).

■ Under review:

● Filannino, M., and Nenadic, G. ManTIME: Temporal expression extraction with systematic feature type selection and a posteriori label adjustment. Journal of Information processing and Management: Special Issue on Time and Information Retrieval, (2014), Elsevier.

Page 32: Temporal information extraction in the general and clinical domain

Contact:

[email protected]

QUESTIONS?