temporal information extraction in the general and clinical domain
30th October 2014, School of Computer Science Research Symposium 2014, The University of Manchester
Manchester, 30/10/2014
School of Computer Science Research Symposium 2014
Temporal information extraction in the general and clinical domain
Michele Filannino
Supervisor: Goran Nenadic
Co-supervisor: Gavin Brown
30/10/2014, Manchester
presentation: Research Symposium 2014
Figure: diagram of the fields Natural Language Processing, Linguistics, Parallel computing, Semi-structured data, Statistics, Machine Learning and Text Mining
temporal information extraction
■ source: written texts
■ goal: a (machine-understandable)
temporal representation of the texts
■ easy for people
■ hard for machines
Temporal aspects of events provide a natural
mechanism for organising information
Source: ISO-TimeML (ISO/TC 37/SC 4 N412), rev. 12, 2007
linguistic key concepts
■ temporal expressions: phrases denoting a temporal
entity such as an interval or a time point
● 01/05/2014, March 15, the next week, Saturday, at that time,
yesterday, 5 o’clock, 3 days, every 4 hours
■ events: phrases denoting eventualities (occurrences and states)
● inflected verbs and nouns: spoken, deliver, will be published
■ links: temporal relations between two phrases (events or temporal expressions)
● BEFORE, AFTER, INCLUDES, ENDS, DURING, BEGINS
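The three kinds of annotation can be pictured as simple records. A minimal sketch, assuming plain Python dataclasses; the class and field names (`Timex3`, `Event`, `TLink`, `cls`, `rel_type`) are illustrative and not part of any ISO-TimeML library, while the attribute values come from the CNN example used in the following slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Timex3:
    tid: str    # e.g. "t1"
    text: str   # surface string, e.g. "Yesterday"
    type: str   # DATE, TIME, DURATION or SET
    value: str  # normalised value, e.g. "2010-02-27"


@dataclass(frozen=True)
class Event:
    eid: str
    text: str
    cls: str    # ISO-TimeML "class"; renamed because class is a Python keyword


@dataclass(frozen=True)
class TLink:
    lid: str
    source: str    # id of the event being related
    target: str    # id of a temporal expression or another event
    rel_type: str  # BEFORE, AFTER, INCLUDES, IS_INCLUDED, ...


yesterday = Timex3("t1", "Yesterday", "DATE", "2010-02-27")
released = Event("e1", "released", "OCCURRENCE")
link = TLink("l52", released.eid, yesterday.tid, "IS_INCLUDED")
print(f"{released.text} {link.rel_type} {yesterday.text} ({yesterday.value})")
# released IS_INCLUDED Yesterday (2010-02-27)
```

For brevity the link points at the event id `e1`; the actual ISO-TimeML output links event instances (`ei1`) instead.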
Source: CNN news article published on 28th February 2010.
example
■ Yesterday, Deutsche Bank released a note saying
that China's current economic policies would result in an
enormous surge in coal consumption over the next
decade.
example: temporal expressions
■ Yesterday(T), Deutsche Bank released a note saying
that China's current economic policies would result in
an enormous surge in coal consumption over the next
decade(T).
Yesterday: value "2010-02-27", type: DATE
the next decade: value "P10Y", type: DURATION
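Both values depend on how the expression relates to the document creation time (DCT), 28th February 2010: "Yesterday" is resolved by subtracting one day from the DCT, whereas "the next decade" is stored as an ISO 8601 duration. A minimal rule-based sketch, assuming a hard-coded lookup that covers only these examples; the `normalise` function is hypothetical and is not the normaliser used in this work.

```python
from datetime import date, timedelta


def normalise(expression: str, dct: date) -> tuple[str, str]:
    """Return (TIMEX3 type, value) for a handful of hard-coded expressions.

    Hypothetical rule table: real normalisers cover far more patterns."""
    text = expression.lower()
    if text == "yesterday":
        # Relative date: anchored on the document creation time.
        return "DATE", (dct - timedelta(days=1)).isoformat()
    if text == "today":
        return "DATE", dct.isoformat()
    if text in ("the next decade", "a decade"):
        # Durations are ISO 8601 periods and do not depend on the DCT.
        return "DURATION", "P10Y"
    raise ValueError(f"no rule for {expression!r}")


dct = date(2010, 2, 28)
print(normalise("Yesterday", dct))        # ('DATE', '2010-02-27')
print(normalise("the next decade", dct))  # ('DURATION', 'P10Y')
```

Using `timedelta` rather than string arithmetic means month and year boundaries are handled correctly (a DCT of 1 March 2010 gives 2010-02-28).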
example: events
■ Yesterday(T), Deutsche Bank released(E) a note saying(E)
that China's current economic policies would result(E) in
an enormous surge(E) in coal consumption over the next
decade(T).
released: class OCCURRENCE
saying: class REPORTING
result: class OCCURRENCE
surge: class OCCURRENCE
example: links
■ Yesterday(T), Deutsche Bank released(E) a note saying(E)
that China's current economic policies would result(E) in
an enormous surge(E) in coal consumption over the next
decade(T).
released is included in Yesterday
saying is included in Yesterday
surge is after released
surge is included in current
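Every link type has an inverse, so the same relation can be stated from either side: "released is included in Yesterday" says the same as "Yesterday includes released". A small sketch of that inversion, assuming the TimeML 1.2.1 TLINK relation names; the `INVERSE` table and `invert` helper are illustrative.

```python
# Inverse of each TimeML 1.2.1 TLINK relation type; symmetric relations map
# to themselves.
INVERSE = {
    "BEFORE": "AFTER", "AFTER": "BEFORE",
    "IBEFORE": "IAFTER", "IAFTER": "IBEFORE",
    "INCLUDES": "IS_INCLUDED", "IS_INCLUDED": "INCLUDES",
    "BEGINS": "BEGUN_BY", "BEGUN_BY": "BEGINS",
    "ENDS": "ENDED_BY", "ENDED_BY": "ENDS",
    "DURING": "DURING_INV", "DURING_INV": "DURING",
    "SIMULTANEOUS": "SIMULTANEOUS", "IDENTITY": "IDENTITY",
}


def invert(link):
    """Rewrite (a, REL, b) as the equivalent (b, INVERSE[REL], a)."""
    source, rel, target = link
    return target, INVERSE[rel], source


print(invert(("released", "IS_INCLUDED", "Yesterday")))
# ('Yesterday', 'INCLUDES', 'released')
```

Normalising links to one direction this way is a common step before comparing system output with a gold standard.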
example: ISO-TimeML output
<TimeML … xsi:noNamespaceSchemaLocation="http://timeml.org/timeMLdocs/TimeML_1.2.1.xsd">
<DOCID>nyt_20100228_china_pollution</DOCID>
<DCT><TIMEX3 functionInDocument="CREATION_TIME" tid="t0" type="DATE" value="2010-02-28">2010-02-28</TIMEX3></DCT>
<TITLE>As Pollution Worsens in China, Solutions Succumb to Infighting</TITLE>
<TEXT>
<TIMEX3 tid="t1" type="DATE" value="2010-02-27">Yesterday</TIMEX3>, Deutsche Bank <EVENT
class="OCCURRENCE" eid="e1">released</EVENT> a note <EVENT class="REPORTING" eid="e2">saying</EVENT>
that China's <TIMEX3 tid="t2" type="DATE" value="PRESENT_REF">current</TIMEX3> economic policies
would <EVENT class="OCCURRENCE" eid="e3">result</EVENT> in an enormous <EVENT class="OCCURRENCE"
eid="e4">surge</EVENT> in coal <EVENT class="OCCURRENCE" eid="e5">consumption</EVENT> over <TIMEX3
tid="t3" type="DURATION" value="P10Y">the next decade</TIMEX3>.
</TEXT>
<TLINK eventInstanceID="ei1" lid="l52" relType="IS_INCLUDED" relatedToTime="t1"/>
<TLINK eventInstanceID="ei4" lid="l53" relType="IS_INCLUDED" relatedToTime="t2"/>
<TLINK eventInstanceID="ei2" lid="l54" relType="IS_INCLUDED" relatedToTime="t1"/>
<TLINK eventInstanceID="ei4" lid="l59" relType="AFTER" relatedToEventInstance="ei1"/>
</TimeML>
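Because the output is plain XML, it can be consumed with the standard library. A minimal sketch using `xml.etree.ElementTree` on a trimmed, well-formed copy of the example above (schema attributes, DCT and title omitted so the snippet runs on its own). It assumes that event instance `eiN` belongs to event `eN`; in full TimeML 1.2.1 documents that mapping is stated by `MAKEINSTANCE` elements, which the excerpt does not show.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the example so that the snippet is self-contained.
TIMEML = """<TimeML>
<TEXT><TIMEX3 tid="t1" type="DATE" value="2010-02-27">Yesterday</TIMEX3>, Deutsche Bank
<EVENT class="OCCURRENCE" eid="e1">released</EVENT> a note
<EVENT class="REPORTING" eid="e2">saying</EVENT>.</TEXT>
<TLINK eventInstanceID="ei1" lid="l52" relType="IS_INCLUDED" relatedToTime="t1"/>
<TLINK eventInstanceID="ei2" lid="l54" relType="IS_INCLUDED" relatedToTime="t1"/>
</TimeML>"""

root = ET.fromstring(TIMEML)
times = {t.get("tid"): (t.text, t.get("value")) for t in root.iter("TIMEX3")}
events = {e.get("eid"): e.text for e in root.iter("EVENT")}

resolved = []
for link in root.iter("TLINK"):
    # Assumption: instance "eiN" refers to event "eN" (no MAKEINSTANCE here).
    eid = "e" + link.get("eventInstanceID")[2:]
    text, value = times[link.get("relatedToTime")]
    resolved.append((events[eid], link.get("relType"), text, value))

for row in resolved:
    print(*row)
# released IS_INCLUDED Yesterday 2010-02-27
# saying IS_INCLUDED Yesterday 2010-02-27
```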
Utterance time: 28th February 2010.
visual representation
Figure: timeline with the labels now, 27 Feb. 2010 (released, saying), 2020 and surge
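Once events are anchored to normalised dates, drawing the timeline reduces to sorting them and comparing each anchor with the utterance time. A minimal sketch; the `anchors` mapping is written by hand, and placing surge at 28 Feb. 2020 (the DCT plus P10Y) is only an illustrative assumption.

```python
from datetime import date

# Hand-written anchors (assumption): released and saying are included in
# "Yesterday"; surge is placed at the end of "the next decade".
anchors = {
    "released": date(2010, 2, 27),
    "saying": date(2010, 2, 27),
    "surge": date(2020, 2, 28),
}
now = date(2010, 2, 28)  # utterance time, i.e. the document creation time

# Chronological order; sorted() is stable, so ties keep insertion order.
timeline = sorted(anchors.items(), key=lambda kv: kv[1])
for name, when in timeline:
    side = "past" if when < now else "future"
    print(f"{when.isoformat()}  {name:<9} ({side})")
```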
proposed approach
■ Data-driven rather than rule-based
■ no pre-existing tools available
■ Conditional Random Fields
(Linear chain)
● sequences of labels, for
sequences of samples
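A linear-chain CRF labels a whole sentence jointly: the best label sequence is found by Viterbi decoding over per-word (emission) and label-to-label (transition) scores. A minimal sketch of that decoding step, assuming hand-set, hypothetical scores in place of a trained model and a BIO label set `O`, `B-TIMEX`, `I-TIMEX` for temporal expressions.

```python
LABELS = ["O", "B-TIMEX", "I-TIMEX"]


def viterbi(emit, trans):
    """Return the label sequence maximising
    sum_i emit[i][y_i] + sum_{i>0} trans[(y_{i-1}, y_i)]."""
    best = [dict(emit[0])]  # best[i][y]: top score of a path ending in y at i
    back = [{}]             # back[i][y]: predecessor label on that path
    for i in range(1, len(emit)):
        best.append({})
        back.append({})
        for cur in LABELS:
            prev = max(LABELS, key=lambda p: best[i - 1][p] + trans[(p, cur)])
            best[i][cur] = best[i - 1][prev] + trans[(prev, cur)] + emit[i][cur]
            back[i][cur] = prev
    # Follow the back-pointers from the best final label.
    path = [max(LABELS, key=lambda y: best[-1][y])]
    for i in range(len(emit) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]


# Invented scores for "over the next decade"; a trained CRF would compute
# them from weighted features of each word and its neighbours.
emit = [
    {"O": 2.0, "B-TIMEX": 0.0, "I-TIMEX": 0.0},  # over
    {"O": 0.5, "B-TIMEX": 1.0, "I-TIMEX": 0.2},  # the
    {"O": 0.5, "B-TIMEX": 0.2, "I-TIMEX": 1.0},  # next
    {"O": 0.0, "B-TIMEX": 0.5, "I-TIMEX": 1.5},  # decade
]
trans = {(p, c): 0.0 for p in LABELS for c in LABELS}
trans[("O", "I-TIMEX")] = -10.0  # strongly discourage I- directly after O

print(viterbi(emit, trans))  # ['O', 'B-TIMEX', 'I-TIMEX', 'I-TIMEX']
```

The transition scores are what distinguish this from classifying each word independently: here they steer the decoder away from an `I-TIMEX` that follows `O`.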
(samples: words; sequences: sentences)
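With words as samples and sentences as sequences, gold annotations must be turned into one label per word before training. A minimal sketch, assuming the common BIO scheme and whitespace tokenisation; the `bio_encode` helper and its `(start, end)` token-span input (end exclusive) are hypothetical.

```python
def bio_encode(tokens, spans, tag="TIMEX"):
    """Label tokens O / B-tag / I-tag given (start, end) token spans, end exclusive."""
    labels = ["O"] * len(tokens)
    for start, end in spans:
        labels[start] = f"B-{tag}"      # first word of the expression
        for i in range(start + 1, end):
            labels[i] = f"I-{tag}"      # remaining words of the expression
    return labels


tokens = "over the next decade .".split()
print(bio_encode(tokens, [(1, 4)]))
# ['O', 'B-TIMEX', 'I-TIMEX', 'I-TIMEX', 'O']
```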
5x10-fold cross validation
post-processing pipeline
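■ Probabilistic correction
■ BIO fixer
■ Threshold-based label switcher
Figure: pipeline configurations combining CRFs with PCM (probabilistic correction), TbLS (threshold-based label switcher) and the BIO fixer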
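A BIO fixer makes sure that every inside label continues a span that has already begun. The slide does not state the exact repair rule; the sketch below assumes the common convention of promoting an orphan `I-X` label to `B-X` (another option is to drop it to `O`). The `fix_bio` name is illustrative.

```python
def fix_bio(labels):
    """Promote any I-X that does not follow B-X or I-X to B-X."""
    fixed = []
    for label in labels:
        if label.startswith("I-"):
            kind = label[2:]
            prev = fixed[-1] if fixed else "O"
            if prev not in (f"B-{kind}", f"I-{kind}"):
                label = f"B-{kind}"  # orphan inside label starts a new span
        fixed.append(label)
    return fixed


print(fix_bio(["O", "I-TIMEX", "I-TIMEX", "O", "I-TIMEX"]))
# ['O', 'B-TIMEX', 'I-TIMEX', 'O', 'B-TIMEX']
```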
TempEval-3 results
Legend: rule-based / machine learning-based systems
Research group | Identification Prec. | Identification Rec. | Identification F1 | Normalisation accuracy | Overall score
The University of Heidelberg | 0.93 | 0.88 | 0.9 | 0.86 | 0.776
US Naval Academy | 0.89 | 0.91 | 0.9 | 0.79 | 0.71
The University of Manchester | 0.95 | 0.85 | 0.9 | 0.77 | 0.69
Stanford University | 0.89 | 0.91 | 0.9 | 0.75 | 0.674
AT&T Lab Research | 0.98 | 0.75 | 0.85 | 0.77 | 0.656
University of Colorado Boulder | 0.94 | 0.87 | 0.9 | 0.72 | 0.647
Jadavpur University | 0.93 | 0.8 | 0.86 | 0.74 | 0.638
Katholieke Universiteit Leuven | 0.93 | 0.76 | 0.84 | 0.75 | 0.63
Joint Research Centre European Commission | 0.9 | 0.8 | 0.85 | 0.68 | 0.582
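The identification F1 in the TempEval-3 results is the harmonic mean of precision and recall. The overall score also appears to equal, up to rounding of the published figures, identification F1 multiplied by normalisation accuracy; that relationship is inferred from the numbers, not a definition given on the slide. A quick check on three rows:

```python
# (precision, recall, normalisation accuracy, reported overall score),
# copied from the results table.
rows = {
    "The University of Heidelberg": (0.93, 0.88, 0.86, 0.776),
    "The University of Manchester": (0.95, 0.85, 0.77, 0.69),
    "AT&T Lab Research": (0.98, 0.75, 0.77, 0.656),
}


def f1(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)


for name, (p, r, acc, overall) in rows.items():
    estimate = f1(p, r) * acc
    print(f"{name}: F1={f1(p, r):.3f}  F1*accuracy={estimate:.3f}  reported={overall}")
```

The estimates differ from the reported values by less than 0.002, which is consistent with the table's two-decimal rounding of precision, recall and accuracy.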
model selection
Source: Filannino, M., and Nenadic G. ManTIME: Temporal expression extraction with systematic feature type selection and a posteriori label adjustment. Information Processing and Management: Special Issue on Time and Information Retrieval, (2014), Elsevier. (under review)
*5x10-fold cross validation
93 features, 4 models:
■ M1: morpho-lexical only
■ M2: morpho-lexical + syntactic
■ M3: morpho-lexical + gazetteers
■ M4: morpho-lexical + gazetteers + WordNet
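Model selection relies on 5x10-fold cross validation: ten-fold cross validation repeated five times with a fresh shuffle each time, giving 50 train/test splits. A standard-library sketch; the slide does not say which units are split (documents or sentences), so it works on plain indices, and the `repeated_kfold` name and the seed are assumptions.

```python
import random


def repeated_kfold(n_samples, k=10, repeats=5, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) for `repeats` x `k`-fold CV.

    Each repeat reshuffles the indices and deals them into k near-equal folds;
    every index lands in exactly one test fold per repeat."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    for r in range(repeats):
        rng.shuffle(indices)
        folds = [indices[f::k] for f in range(k)]
        for f, test in enumerate(folds):
            test_set = set(test)
            train = [i for i in indices if i not in test_set]
            yield r, f, train, test


splits = list(repeated_kfold(n_samples=25))
print(len(splits))  # 50
```

Averaging a model's score over all 50 splits reduces the variance that comes from a single random partition, which matters when comparing feature sets such as M1 to M4.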
Source: Kovačević, A., Dehghan, A., Filannino, M., Keane, J. A., and Nenadic, G. Combining rules and machine learning for extraction of temporal expressions and events from clinical narratives. Journal of the American Medical Informatics Association (2013).
i2b2 shared task 2012
ADMISSION DATE: 2011-02-06; DISCHARGE DATE: 2011-02-08; HISTORY OF PRESENT ILLNESS: Mr. Pohl is a 53 - year-old male with history of alcohol use and hypertension. Blood alcohol level was 383. Agitated in emergency room requiring 4 leather restraints, received 5 mg of Haldol, 2 mg of Ativan. He became hypotensive in the emergency room with a systolic blood pressure in the 80 's and had decreased respiratory rate. He received a normal saline bolus of 2 litres of good blood pressure response. The patient was then admitted to the medical Intensive Care Unit for observation and then transferred to our service on medicine when the blood pressures remained stable overnight...
Figure: patient timeline over 06/02/2011, 07/02/2011 and 08/02/2011, with rows General, Tests, Treatments and Problems; entries include admission, transfer, discharge, BAL 383, Haldol, Ativan 2mg, hypotensive, SBP ~80, decreased respiratory rate, saline bolus 2l, stable, SBP stable, hands tremor improved and blood pressure medications
clinical data
■ disease progression
modelling
■ analysis of the effectiveness
of treatments
■ extraction of patients’ clinical
pathways
Better software, better research…
Source: Filannino, M., Nenadic G. Mining temporal footprints from Wikipedia. Proceedings of the First AHA!-Workshop on Information Discovery in Text. (COLING 2014) (Dublin, Ireland, August 2014), ACL.
temporal footprint
A temporal footprint is a continuous period
on the time-line that temporally defines
the existence of a particular concept.
evaluation
■ subjects: people
■ lived from 1000 AD to 2014
● text from Wikipedia web pages
● year of birth and death from DBpedia
■ 228,824 people collected
■ simple definition of temporal footprint
● birth and death dates
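The slides do not describe how footprints are predicted from text. Purely to illustrate the task, here is a naive baseline that is not the method of the cited paper: collect the year mentions in a biography and return a central percentile range. The regular expression (years 1000 to 2019), the 10th/90th percentile choice, the `naive_footprint` name and the sample sentence are all assumptions.

```python
import re
from statistics import quantiles

YEAR = re.compile(r"\b(1\d{3}|20[01]\d)\b")  # four-digit years 1000-2019


def naive_footprint(text, low=10, high=90):
    """Return the (low, high) percentile years among year mentions in `text`,
    or None when fewer than two years are found."""
    years = sorted(int(y) for y in YEAR.findall(text))
    if len(years) < 2:
        return None
    cuts = quantiles(years, n=100, method="inclusive")  # 99 percentile cut points
    return round(cuts[low - 1]), round(cuts[high - 1])


bio = ("Galileo Galilei was born in 1564. In 1610 he published "
       "Sidereus Nuncius. He died in 1642.")
print(naive_footprint(bio))  # (1573, 1636)
```

Using percentiles rather than the minimum and maximum year keeps a single stray date (a later edition, a commemoration) from stretching the footprint, at the cost of shrinking it when only a few years are mentioned.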
Error: 0.204
results
■ Galileo Galilei (1564-1642), prediction: 1556-1654
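The slide reports an error but does not define it. One measure that reproduces the 0.204 shown for Galileo is one minus the overlap ratio of the predicted and true year intervals: the overlap is 1564 to 1642 (78 years) and the covering span is 1556 to 1654 (98 years), so the error is 1 - 78/98 ≈ 0.204. The sketch assumes that definition; `footprint_error` is an illustrative name.

```python
def footprint_error(pred, gold):
    """1 - overlap / covering span for (start, end) year intervals.

    The covering span equals the union length when the intervals overlap;
    disjoint intervals have zero overlap and therefore error 1."""
    overlap = max(0, min(pred[1], gold[1]) - max(pred[0], gold[0]))
    span = max(pred[1], gold[1]) - min(pred[0], gold[0])
    return 1 - overlap / span


print(round(footprint_error((1556, 1654), (1564, 1642)), 3))  # 0.204
```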
Source: http://www.cs.man.ac.uk/~filannim/projects/temporal_footprints/
results
■ Computer (1940-today), prediction: 1882-1982
Source: http://start.csail.mit.edu/answer.php?query=
application?
temporal intent of queries
Source: Filannino, M., Nenadic G. Using machine learning to predict temporal orientation of search engines’ queries in the Temporalia challenge. In Proceedings of the Sixth International Workshop on Evaluating Information Access (EVIA 2014) (Tokyo, Japan, December 2014).
Can we predict the temporal intent of search engine
user queries?
■ input: queries & submission date
■ output: past, present, future
or atemporal
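The cited system is machine-learning based and is not detailed on this slide. To make the input/output contract concrete, the sketch below is a deliberately simple rule-based stand-in: explicit years are compared with the submission date, and otherwise cue words decide the class. The cue lists, the `temporal_intent` name and the submission date are invented for illustration.

```python
import re
from datetime import date

# Invented cue lists (assumption), checked in this order after explicit years.
FUTURE_CUES = {"forecast", "will", "next", "upcoming", "tomorrow"}
PRESENT_CUES = {"now", "today", "current", "live", "price"}
PAST_CUES = {"history", "was", "yesterday", "last", "died"}


def temporal_intent(query: str, submitted: date) -> str:
    """Classify a query as past, present, future or atemporal."""
    words = query.lower().split()
    for w in words:
        if re.fullmatch(r"(19|20)\d\d", w):  # explicit year in the query
            year = int(w)
            if year < submitted.year:
                return "past"
            if year > submitted.year:
                return "future"
            return "present"
    if FUTURE_CUES.intersection(words):
        return "future"
    if PRESENT_CUES.intersection(words):
        return "present"
    if PAST_CUES.intersection(words):
        return "past"
    return "atemporal"


submitted = date(2013, 2, 28)  # hypothetical submission date
print(temporal_intent("weather forecast manchester", submitted))  # future
print(temporal_intent("google stock price", submitted))           # present
```

The submission date matters only for queries with explicit dates, which is why it is part of the input: "world cup 2010" is a past query when issued in 2013 but a future one when issued in 2009.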
Source: https://www.google.co.uk/search?q=google+stock+price
queries’ temporal intent
Source: https://www.google.co.uk/search?q=weather+forecast+manchester
queries’ temporal intent
results
Figure: accuracy (%, axis 0 to 100) of the Full, Intermediate, Minimal and Minimal fixed configurations; values shown: 72.33%, 66.33%, 61.33% and 55.00%; annotation: 1st ranked system
just the tip of the iceberg…
■ “I’ve played tennis for 10 years” vs.
“I’ve played tennis for 4 hours”
■ How can these methods be ported to different domains?
■ How can they be adapted to different languages?
■ Is ISO-TimeML enough to cover different domains/languages?
Thank you.
publications
■ Non-peer-reviewed:
● Filannino, M. Temporal expression normalisation in natural language texts. CoRR abs/1206.2010 (2012).
■ Peer-reviewed:
● Kovačević, A., Dehghan, A., Filannino, M., Keane, J. A., and Nenadic, G. Combining rules and machine learning for extraction of
temporal expressions and events from clinical narratives. Journal of the American Medical Informatics Association (2013).
● Filannino, M., Brown, G., and Nenadic, G. ManTIME: Temporal expression identification and normalization in the TempEval-3
challenge. Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013) (Atlanta, Georgia,
USA, June 2013), ACL.
● Filannino, M., Nenadic G. Mining temporal footprints from Wikipedia. Proceedings of the First AHA!-Workshop on Information
Discovery in Text. (COLING 2014) (Dublin, Ireland, August 2014), ACL.
● Filannino, M., Nenadic G. Using machine learning to predict temporal orientation of search engines’ queries in the Temporalia
challenge. In Proceedings of the Sixth International Workshop on Evaluating Information Access (EVIA 2014) (Tokyo, Japan,
December 2014).
■ Under review:
● Filannino, M., and Nenadic G. ManTIME: Temporal expression extraction with systematic feature type selection and a
posteriori label adjustment. Information Processing and Management: Special Issue on Time and Information
Retrieval, (2014), Elsevier.