Information Retrieval and its Application in Biomedicine


Page 1: Information Retrieval and its Application in Biomedicine

Information Retrieval and its Application in Biomedicine

Hong Yu 1,2, PhD
Susan McRoy 1, PhD
1 Department of Computer Science
2 Department of Health Sciences
University of Wisconsin-Milwaukee

Sept 4 Introduction

Page 2: Information Retrieval and its Application in Biomedicine

What is Information Retrieval?

The field concerned with the acquisition, organization, and searching of knowledge-based information. (Hersh, 2003)

Page 3: Information Retrieval and its Application in Biomedicine

Speed Up Communication

Page 4: Information Retrieval and its Application in Biomedicine

Information

World Wide Web
Company documentation
Drug descriptions
Medical records
Books
Everything that is text, image, video, or sound and that can be digitized

Page 5: Information Retrieval and its Application in Biomedicine

Information in Biomedicine

Literature (over 17 million publications)
WWW
Electronic medical records
Genomics data
– DNA sequences, etc.
Knowledge representation
– Gene Ontology
Company databases
– Micromedex drug database

Page 6: Information Retrieval and its Application in Biomedicine

IR in Biomedicine

Index Medicus (Billings 1879)
MEDLARS (NLM 1966)
SAPHIRE (Hersh 1990)
PubMed (NLM 1996)
Arrowsmith (Smalheiser 1998)
BioText (Hearst 2003)
BioMedQA (Yu 2006)

Page 7: Information Retrieval and its Application in Biomedicine

Electronic and Open Publishing

The Internet and the Web have a profound impact on the publishing of knowledge-based information
Most of the literature can be made available electronically
Open access
– The Bethesda Statement on Open Access Publishing (http://www.earlham.edu/~peters/fos/bethesda.htm) (April 11, 2003)
– The Berlin Declaration on Open Access to Knowledge in the Sciences and Humanities (http://www.zim.mpg.de/openaccess-berlin/berlindeclaration.html) (2003)
– PubMed Central (NLM 2004)

Page 8: Information Retrieval and its Application in Biomedicine

Quality of Information

A lack of quality control
– Anyone can publish online
– A wealth of studies concluded that healthcare information on the Web is of poor quality
Readability
– Hard to read

Page 9: Information Retrieval and its Application in Biomedicine

Information Needs and Seeking

Unrecognized needs
– Clinicians unaware of information needs or knowledge deficit
Recognized needs
– Clinicians aware of needs but may or may not pursue them
Pursued needs
– Information seeking occurs but may or may not be successful
Satisfied needs
– Information seeking successful

Page 10: Information Retrieval and its Application in Biomedicine

Evidence-Based Medicine

Page 11: Information Retrieval and its Application in Biomedicine

What You Will Learn

IR algorithms
– Indexing
– Query and Retrieval
– Evaluation
– Text Classification
– XML retrieval
– Web retrieval
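
To make the first three topics above concrete, here is a minimal sketch of indexing, Boolean AND retrieval, and precision/recall evaluation. It is not taken from the slides or from any particular IR tool: the function names, the whitespace tokenizer, and the toy documents are all assumptions made for illustration.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercased token to the set of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Assumption: naive whitespace tokenization, no stemming or stop words.
        for token in text.lower().split():
            index[token].add(doc_id)
    return index


def search(index, query):
    """Boolean AND retrieval: ids of documents containing every query term."""
    terms = query.lower().split()
    if not terms:
        return set()
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings)


def precision_recall(retrieved, relevant):
    """Score one result set against a hand-labelled set of relevant ids."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical toy collection, invented for illustration only.
docs = {
    1: "gene ontology annotation of dna sequences",
    2: "drug database for electronic medical records",
    3: "retrieval of medical literature about gene expression",
}
index = build_index(docs)
results = search(index, "gene retrieval")
```

The inverted index trades memory for lookup speed: each query term is resolved to its posting set directly, so a query never scans the document text itself.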

Page 12: Information Retrieval and its Application in Biomedicine

What You Will Learn (Cont.)

Open-Source IR tools
– What open-source IR tools are available
  Indexing/retrieval
  Part-of-speech and syntactic parsing
  Semantic parsing
  Discourse relations
  Machine-learning classifiers
– How to use the tools?

Page 13: Information Retrieval and its Application in Biomedicine

What You Will Learn (Cont.)

State of the art IR systems
– Baruch 1965 [BLIMP http://blimp.cs.queensu.ca/index.html]
– SAPHIRE (Hersh 1990)
  Retrieval
– MedLEE (Friedman 1994)
  Extraction
– PubMed (NLM 1997)
– ARROWSMITH Systems (Smalheiser 1998)
  Hidden Relation Discovery Tool
– GENIES (Friedman 2001)
  Extraction

Page 14: Information Retrieval and its Application in Biomedicine

BioNLP Systems

BioText (Hearst 2003 http://biotext.berkeley.edu/)
– Retrieval+Categorization
GeneWays (Rzhetsky 2004 http://geneways.genomecenter.columbia.edu/)
– Extraction+Visualization
TextPresso (Muller 2004 http://www.textpresso.org/)
– Retrieval+Extraction
iHOP (Hoffman and Valencia 2005 http://www.ihop-net.org/UniPub/iHOP/)
– Retrieval
BioMedQA (Yu 2006 http://monkey.ims.uwm.edu/MedQA)
– Question Answering

Page 15: Information Retrieval and its Application in Biomedicine

Advanced NLP applications

Page 16: Information Retrieval and its Application in Biomedicine

Beyond text: Image and Video

Image classification
– Finding concepts in captions and annotations
– Machine learning on textual & visual features
– Determining salient features in text and image separately and merging the results
Extracting text from image
– Understanding and correcting OCR (handwriting, equations)
– Finding text in images
Finding document text related to illustrations
Video retrieval

Page 17: Information Retrieval and its Application in Biomedicine

Beyond Extraction: Experimental Tools

Page 18: Information Retrieval and its Application in Biomedicine

Resources

Annotated collections (GENIA, Medstract, Yapex …)
Ontologies, tools, knowledge bases …
Publications, Conferences, Evaluations …
Centres and web portals

Page 19: Information Retrieval and its Application in Biomedicine

What We Provide

Textbook
– Christopher D. Manning, Prabhakar Raghavan and Hinrich Schutze. Introduction to Information Retrieval. Cambridge University Press, 2007
  http://www-csli.stanford.edu/~schuetze/information-retrieval-book.html
Office hour:
– Tuesdays, 3-4 pm EMS 710 and by appointment
– Hong Yu, 414-229-3344
– Susan McRoy, 414-229-6695

Page 20: Information Retrieval and its Application in Biomedicine

What We Expect

Undergraduate:
– 30% Homework, 35% Midterm exam, 35% Final exam or project
Graduate:
– 20% Midterm exam, 40% Homework, 40% Project: The project may be done individually or in a team of 2-3 people. The final project will include a software system, a 2-3 page written project report, and an oral presentation. The report should describe the problem, the approach, and evaluation and should cite related work where appropriate.