
Page 1

Medical Information Retrieval and its Evaluation: an Overview of CLEF eHealth Evaluation Task

Lorraine Goeuriot, LIG – Université Grenoble Alpes (France)

[email protected]

Page 2

Presentation Overview

• Medical IR and its Evaluation
• CLEF eHealth
  – Context and tasks
  – IR tasks description
  – Datasets
  – Evaluation
  – Participation
• Conclusion

Page 3

Presentation Overview

• Medical IR and its Evaluation
• CLEF eHealth
  – Context and tasks
  – IR tasks description
  – Datasets
  – Evaluation
  – Participation
• Conclusion

Page 4

Medical Professionals – Web Search and Data

• Online information search on a regular basis
• Searches fail for 2 out of 3 patient cases
• PubMed searches take too long (30+ minutes, against the ~5 available)
• Knowledge production constantly growing
• More and more publications
• Varying web access

Page 5

Medical Professionals – Web Search and Data

Page 6

Patients and general public

• Change in the patient-physician relationship
• Patients more committed - cyberchondria
• How can information quality be guaranteed?

Page 7

Patients – Web Search and Data

Page 8

Patients – Web Search and Data

Page 9

Patients – Web Search and Data

Page 10

Medical Information Retrieval

• How different is medical IR from general IR?
  – Domain-specific search: narrowing down the applications to improve results for categories of users
  – Consequences of bad performance of a medical search system
• Characteristics of medical IR:
  – Data: medical/clinical reports, research papers, medical websites…
  – Information need: decision support, technology/progress watch, education, daily care…
  – Evaluation: relevance, readability, trustworthiness, time

Page 11

Evaluating Information Retrieval?

Did the user find the information she needed? How many relevant documents did she get back? What is a relevant document? How many irrelevant documents did she get back? How long before she found the information? Is she satisfied with the results? …

• Creation of (artificial) datasets representing a specific search task, in order to compare the efficiency of various systems
• Involving human rating
• Shared with the community to improve IR

Page 12

Typical IR Evaluation Dataset

[Diagram: a document collection, a topic set, and relevance assessments linking topics to documents]
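In practice such a dataset is distributed as plain files following TREC conventions. A minimal loading sketch for the relevance assessments (the file name and the "topic iteration doc grade" line layout are the usual TREC conventions, not something specified on the slide):

```python
# A minimal sketch, assuming a TREC-style qrels file with one judgement
# per line: "topic_id iteration doc_id grade". File name is illustrative.

from collections import defaultdict

def load_qrels(path):
    """Return {topic_id: {doc_id: grade}} from a TREC-style qrels file."""
    qrels = defaultdict(dict)
    with open(path) as f:
        for line in f:
            topic_id, _iteration, doc_id, grade = line.split()
            qrels[topic_id][doc_id] = int(grade)
    return qrels

# Example line: "qtest3 0 doc-00017 2" -> for topic qtest3, doc-00017 has grade 2
```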

Page 13

Existing Medical IR evaluation tasks

• TREC Medical Records 2011, 2012
• TREC 2000 Filtering Track (OHSUMED corpus)
• TREC Genomics 2003-2007
• ImageCLEFMed 2005-2013
• TREC Clinical Decision Support 2014, 2015

→ No patient-centered evaluation task

Page 14

Presentation Overview

• Medical IR and its Evaluation
• CLEF eHealth
  – Context and tasks
  – IR tasks description
  – Datasets
  – Evaluation
  – Participation
• Conclusion

Page 15

CLEF eHealth

AP: 72 yo w/ ESRD on HD, CAD, HTN, asthma, p/w significant hyperkalemia & associated arrythmias.

Page 16

CLEF eHealth Tasks

2013
• Task 1: Named entity recognition in clinical text
• Task 2: Acronym normalization in clinical text
• Task 3: User-centred health IR

2014
• Task 1: Visual-interactive search and exploration of eHealth data
• Task 2: Information extraction from clinical text
• Task 3: User-centred health IR

2015
• Task 1a: Clinical speech recognition from nurses' handover
• Task 1b: Clinical named entity recognition in French
• Task 2: User-centred health IR

Page 17

Presentation Overview

• Medical IR and its Evaluation
• CLEF eHealth
  – Context and tasks
  – IR tasks description
  – Datasets
  – Evaluation
  – Participation
• Conclusion

Page 18

IR Evaluation Task Scenario

[Diagrams of the search scenario for 2013-2014 and for 2015]

Page 19

IR Evaluation Task over the years

Goal
  2013-2014: Help laypersons better understand medical reports
  2015: Laypersons checking their symptoms

Topics
  2013: 55 EN topics built from discharge summaries
  2014: 55 EN topics + translation in CZ, DE, FR
  2015: 67 EN topics built from images + translation in AR, CZ, DE, FA, FR, IT, PT

Documents
  2013-2015: Medical document collection provided by the Khresmoi project

Relevance assessment
  2013-2014: Manual evaluation of relevance of documents
  2015: Manual evaluation of relevance and readability of documents

Page 20

Presentation Overview

• Medical IR and its Evaluation
• CLEF eHealth
  – Context and tasks
  – IR tasks description
  – Datasets
  – Evaluation
  – Participation
• Conclusion

Page 21

Document Collection

• Web crawl of health-related documents (~1M)
• Made available through the Khresmoi project (khresmoi.eu)
• Target: general public and medical professionals
• Broad range of medical topics covered
• Content:
  – Health On the Net (HON) Foundation certified websites (~60%)
  – Various well-known medical websites: DrugBank, Diagnosia, TRIP answers, etc. (~40%)

Page 22

Topics & context

2013: Manual creation from randomly selected annotations of disorders in the DS (context)
2014: Manual creation from manually identified main disorders in the DS (context)
2015: Manual creation from images describing a medical problem (context)

Page 23

Topics - Examples

2013-2014:
<topic>
  <id>qtest3</id>
  <discharge_summary>02115-010823-DISCHARGE_SUMMARY.txt</discharge_summary>
  <title>Asystolic arrest</title>
  <desc>what is asystolic arrest</desc>
  <narr>asystolic arrest and why does it cause death</narr>
  <profile>A 87 year old woman with a stroke and asystolic arrest dies and the daughter wants to know about asystolic arrest and what it means.</profile>
</topic>

2015:
<topic>
  <id>clef2015.test.15</id>
  <query>weird brown patches on skin</query>
</topic>
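Such topic files can be read with the standard library. A minimal sketch for the 2015-style topics (it assumes the <topic> elements are wrapped in a single root element; the file name is illustrative):

```python
# A minimal sketch of reading 2015-style topics with Python's standard library.

import xml.etree.ElementTree as ET

def load_topics(path):
    """Return {topic_id: query string} from a topics XML file."""
    root = ET.parse(path).getroot()
    return {t.findtext("id"): t.findtext("query") for t in root.iter("topic")}

topics = load_topics("topics2015.xml")   # hypothetical file name
# e.g. {"clef2015.test.15": "weird brown patches on skin", ...}
```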

Page 24

Datasets - Summary

Provided to the participants:
• Document collection
• Discharge summaries (optional) [2013-2014]
• Training set:
  – 5 queries + qrels [2013]
  – 5 queries (+ translation) + qrels [2014-2015]
• Test set:
  – 50 queries [2013]
  – 50 queries (+ translation) [2014]
  – 62 queries (+ translation) [2015]

Page 25

Presentation Overview

• Medical IR and its Evaluation
• CLEF eHealth
  – Context and tasks
  – IR tasks description
  – Datasets
  – Evaluation
  – Participation
• Conclusion

Page 26

Guidelines for Submissions

2013-2014: Submission of up to 7 runs (per language):
• Run 1 (mandatory) - team baseline: only title and description fields, no external resources
• Runs 2-4 (optional): any experiment WITH the DS
• Runs 5-7 (optional): any experiment WITHOUT the DS

2015: Submission of up to 10 ranked runs (per language):
• Run 1 (mandatory): baseline run
• Runs 2-10: any experiment with any external resource

Page 27

Relevance Assessment

• Manual relevance assessment conducted by medical professionals and IR experts
• 4-point scale assessment mapped to a binary scale (sketched below):
  – {0: non relevant, 1: on topic but unreliable} → non relevant
  – {2: somewhat relevant, 3: relevant} → relevant
• 4-point scale used for NDCG, 2-point scale for precision
• [2015] Manual assessment of the readability of the documents, conducted by the same assessors on a 4-point scale
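A minimal sketch of the grade mapping above, assuming qrels are held as {topic: {doc: grade}} dictionaries (the data layout is an assumption, the thresholds are from the slide):

```python
# Grades 0-1 become 0 (non relevant), grades 2-3 become 1 (relevant).

def binarise(graded_qrels):
    return {
        topic: {doc: int(grade >= 2) for doc, grade in docs.items()}
        for topic, docs in graded_qrels.items()
    }

# The graded qrels feed NDCG; the binarised copy feeds precision.
```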

Page 28

Relevance Assessment - Pools

Training set
• 2013: merged top 30 ranked documents from a Vector Space Model and Okapi BM25

Test set
• 2013-2014: merged top 10 documents from participants' baseline run, the two highest-priority runs with DS, and the two highest-priority runs without DS
• 2015: merged top 10 documents from participants' three highest-priority runs
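All of these are instances of depth-k pooling: per topic, the union of each selected run's top k documents is sent to the assessors. A minimal sketch, assuming each run is a {topic: ranked doc list} dictionary:

```python
# A minimal depth-k pooling sketch; only pooled documents get assessed.

def build_pool(runs, k):
    pool = {}
    for run in runs:
        for topic, ranked_docs in run.items():
            pool.setdefault(topic, set()).update(ranked_docs[:k])
    return pool

# 2013 training pool (run names hypothetical): build_pool([vsm_run, bm25_run], k=30)
```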

Page 29

Evaluation Metrics

• Classical TREC evaluation: P@5, P@10, NDCG@5, NDCG@10, MAP
• Ranking based on P@10
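For reference, a minimal sketch of the two headline measures, assuming binary qrels for precision and graded qrels (0-3) for NDCG; the DCG below uses linear gain, one common formulation:

```python
import math

def precision_at_k(ranked_docs, binary_qrels, k):
    return sum(binary_qrels.get(d, 0) for d in ranked_docs[:k]) / k

def ndcg_at_k(ranked_docs, graded_qrels, k):
    dcg = sum(graded_qrels.get(d, 0) / math.log2(i + 2)   # i is the 0-based rank
              for i, d in enumerate(ranked_docs[:k]))
    ideal = sorted(graded_qrels.values(), reverse=True)[:k]
    idcg = sum(g / math.log2(i + 2) for i, g in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0
```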

Page 30

Presentation Overview

• Medical IR and its Evaluation
• CLEF eHealth
  – Context and tasks
  – IR tasks description
  – Datasets
  – Evaluation
  – Participation
• Conclusion

Page 31

Participants and Runs

        Monolingual IR        Multilingual IR
        # teams   # runs      # teams   # runs
2013    9         48          --        --
2014    14        62          2         24
2015    12        92          1         35

Page 32

Baselines

2013:
• JSoup
• Okapi stop words & Porter stemmer
• Lucene BM25

2014:
• Indri HTML parser
• Okapi stop words & Krovetz stemmer
• Indri BM25, tf.idf, LM

(BM25 scoring is sketched below.)
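Both years' baselines rank with Okapi BM25. A minimal scoring sketch; k1 and b are the textbook defaults, not settings reported on the slide:

```python
import math

def bm25(query_terms, doc_tf, doc_len, avg_doc_len, df, n_docs, k1=1.2, b=0.75):
    """doc_tf: term frequencies in the document; df: document frequencies."""
    score = 0.0
    for t in query_terms:
        tf = doc_tf.get(t, 0)
        if tf == 0:
            continue
        idf = math.log(1 + (n_docs - df[t] + 0.5) / (df[t] + 0.5))
        score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avg_doc_len))
    return score
```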

Page 33

2013 Participants P@10 (best run)

[Bar chart, P@10 from 0 to 0.6: Team-Mayo (2), Team-AEHRC (5), Team-MEDINFO (1), Team-UOG (5), Team-THCIB (5), Team-KC (1), Team-UTHealth (1), Team-QUT (2), Team-OHSU (5); reference lines for BM25 and BM25 + PRF]

Page 34

2014 Task 3a P@10 (best run)

[Bar chart, P@10 from 0 to 0.8: GRIUM_EN_Run.5, SNUMEDINFO_EN_Run.2, KISTI_EN_Run.2, IRLabDAIICT_EN_Run.1, UIOWA_EN_Run.1, baseline.dir, DEMIR_EN_Run.6, RePaLi_EN_Run.5, NIJM_EN_Run.2, YORKU_EN_Run.5, UHU_EN_Run.5, COMPL_EN_Run.5, ERIAS_EN_Run.6, miracl_en_run.1, CUNI_EN_RUN.5]

Page 35

Participants P@10 (2013 and 2014)

[Distribution of participants' P@10 (0 to 0.8) in 2013 and 2014; baselines: BM25 (2013), LM with Dirichlet smoothing (2014)]

Page 36

2013 Participants' Results - Baseline vs Best Run

[Bar chart (0 to 0.6) comparing baseline and best run for Team-Mayo, Team-AEHRC, Team-MEDINFO, Team-UOG, Team-THCIB, Team-KC, Team-UTHealth, Team-QUT, Team-OHSU]

Page 37

What Worked Well?

Team-Mayo:
• Markov random field model of query term dependency
• QE using external collections
• Combination of indexing techniques + re-ranking

Team-AEHRC:
• Language models with Dirichlet smoothing
• QE with spelling correction and acronym expansion

Team-MEDINFO:
• Query likelihood model

Baseline: BM25 (Dirichlet-smoothed query likelihood is sketched below.)
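Language models with Dirichlet smoothing recur among the best runs in both years. A minimal sketch of that scoring function; mu = 2000 is a common default, not a setting the teams reported:

```python
import math

def ql_dirichlet(query_terms, doc_tf, doc_len, coll_prob, mu=2000.0):
    """Log query likelihood; coll_prob[t] = P(t | collection)."""
    return sum(
        math.log((doc_tf.get(t, 0) + mu * coll_prob.get(t, 1e-9)) / (doc_len + mu))
        for t in query_terms
    )
```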

Page 38

2014 Participants' Results - Baseline vs Best Run

[Bar chart (0 to 0.8) comparing baseline and best run for COMPL, CUNI, DEMIR, ERIAS, GRIUM, IRLabDAIICT, KISTI, miracl, NIJM, RePaLi, SNUMEDINFO, UHU, UIOWA, YORKU]

Page 39

What Worked Well?

Team-GRIUM:
• Hybrid IR approach (text-based and concept-based)
• Language models
• Query expansion based on mutual information

Team-SNUMEDINFO:
• Language models with Dirichlet smoothing
• QE with medical concepts
• Google Translate

Team-KISTI:
• Language models
• Various QE approaches

Page 40

Task 3b Results

[Bar chart (0 to 0.8) of results per query language (CS, DE, FR) for CUNI and SNUMEDINFO]

Page 41

2013 - Use of Discharge Summaries

[Bar chart (0 to 0.6) comparing runs with DS, without DS, and the baseline for Team-Mayo, Team-Medinfo, Team-THCIB, Team-KC, Team-QUT]

Page 42

How were DS used?

• Result re-ranking based on concepts extracted from queries, relevant documents, and DS (Team-Mayo)
• Query expansion:
  – Filtering of non-relevant expansion terms/concepts (Team-MEDINFO)
  – Expansion with all concepts from query and DS (Team-THCIB)
  – Expansion with concepts identified in relevant passages of the DS (Team-KC)
  – Query refinement (Team-TOPSIG)

Page 43

2014 - Use of Discharge Summaries

[Bar chart (0 to 0.8) comparing runs with and without DS for IRLabDAIICT, KISTI, NIJM]

Page 44

How Were DS Used?

• Query expansion (see the sketch after this list):
  – Expansion using MetaMap, with expansion candidates filtered using the DS (Team-SNUMEDINFO)
  – Expansion with abbreviations and DS, combined with pseudo-relevance feedback (Team-KISTI)
  – Expansion with MeSH terminology and DS (Team-IRLABDAIICT)
  – Expansion with terms from the DS (Team-Nijmegen)
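These approaches share a feedback-style expansion pattern: score candidate terms from some evidence source and append the best ones to the query. A minimal sketch with plain term frequency standing in for the teams' richer evidence (MetaMap concepts, MeSH terms, the DS):

```python
from collections import Counter

def expand_query(query_terms, feedback_docs_tokens, n_terms=5):
    """feedback_docs_tokens: one token list per feedback document (e.g. the DS)."""
    candidates = Counter(tok for doc in feedback_docs_tokens for tok in doc
                         if tok not in query_terms)
    return list(query_terms) + [t for t, _ in candidates.most_common(n_terms)]
```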

Page 45

Presentation Overview

• Medical IR and its Evaluation
• CLEF eHealth
  – Context and tasks
  – IR tasks description
  – Datasets
  – Evaluation
  – Participation
  – Further analysis
• Conclusion

Page 46

Medical Queries Complexity

• Query complexity = number of medical concepts/entities the query contains, e.g.:
  – radial neck fracture and healing time
  – facial cuts and scar tissue
  – nausea and vomiting and hematemesis
• Dataset: 50 queries from CLEF eHealth 2013 (patient queries), runs from 9 teams
• Studied: impact of query complexity on system performance (see the sketch below)
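A minimal sketch of this analysis, assuming concept extraction has already been done (the study used annotated entities; the data layouts here are assumptions):

```python
def mean_p10_by_complexity(per_query_p10, per_query_concepts):
    """per_query_p10: {qid: P@10}; per_query_concepts: {qid: list of concepts}."""
    buckets = {}
    for qid, p10 in per_query_p10.items():
        buckets.setdefault(len(per_query_concepts[qid]), []).append(p10)
    return {c: sum(v) / len(v) for c, v in sorted(buckets.items())}

# e.g. "nausea and vomiting and hematemesis" has complexity 3.
```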

Page 47

Medical Queries Complexity

Page 48

Presentation Overview

• Medical IR and its Evaluation
• CLEF eHealth
  – Context and tasks
  – IR tasks description
  – Datasets
  – Evaluation
  – Participation
  – Further analysis
• Conclusion

Page 49

Conclusion

• 3 successful years running CLEF eHealth
• Datasets are publicly available for research purposes
• Used for research by organizers, participants, and other groups
• Building a community - evaluation tasks, workshop @ SIGIR, special issue of JIR

Page 50

For More Details

CLEF eHealth Lab overviews:
• Suominen et al. (2013). Overview of the ShARe/CLEF eHealth Evaluation Lab 2013. In CLEF 2013 Proceedings.
• Kelly et al. (2014). Overview of the ShARe/CLEF eHealth Evaluation Lab 2014. In CLEF 2014 Proceedings.

CLEF eHealth IR task overviews:
• Goeuriot et al. (2013). ShARe/CLEF eHealth Evaluation Lab 2013, Task 3: Information Retrieval to Address Patients' Questions when Reading Clinical Reports. In CLEF 2013 Working Notes.
• Goeuriot et al. (2014). ShARe/CLEF eHealth Evaluation Lab 2014, Task 3: User-centred health information retrieval. In CLEF 2014 Working Notes.

Page 51

Follow us!

• Website: http://sites.google.com/site/clefehealth2015
• Google Groups: clef-ehealth-evaluation-lab-information
• @clefehealth
• Join the party in Toulouse: http://clef2015.clef-initiative.eu/CLEF2015/conferenceRegistration.php

Page 52

Consortium

• Lab chairs: Lorraine Goeuriot, Liadh Kelly
• Task 1: Hanna Suominen, Leif Hanlen, Gareth Jones, Liyuan Zhou, Aurélie Névéol, Cyril Grouin, Thierry Hamon, Pierre Zweigenbaum
• Task 2: Joao Palotti, Guido Zuccon, Allan Hanbury, Mihai Lupu, Pavel Pecina

Page 53

Thank you! Questions?

Page 54

Task 3a - Topic Generation Process (1)

Discharge Medications:

1. Aspirin 81 mg Tablet, Delayed Release (E.C.) Sig: One (1) Tablet, Delayed Release (E.C.) PO DAILY (Daily). Disp:*30 Tablet, Delayed Release (E.C.)(s)* Refills:*0*

2. Docusate Sodium 100 mg Capsule Sig: One (1) Capsule PO BID (2 times a day). Disp:*60 Capsule(s)* Refills:*0*

3. Levothyroxine Sodium 200 mcg Tablet Sig: One (1) Tablet PO DAILY (Daily).

Discharge Disposition:

Extended Care

Facility:

[**Hospital 5805**] Manor - [**Location (un) 348**]

Discharge Diagnosis:

Coronary artery disease.

s/p CABG

post op atrial fibrillation

Page 55

Task 3a - Topic Generation Process (2)

[Same discharge summary as in step (1)]

Page 56

Task 3a - Topic Generation Process (3)

[Same discharge summary as in step (1)]

What is coronary heart disease?

Page 57

Participants Approaches