
Date submitted: 24/06/2009

The Development of an Automatic Mapping Engine for Biomedical Image Descriptors¹

Sujin Kim²
Assistant Professor, School of Library and Information Science and Department of Pathology and Laboratory Medicine, University of Kentucky, Lexington, USA

Aswathnarayanan Sadagopan
Graduate Research Assistant, Department of Computer Science, University of Kentucky, Lexington, USA

Meeting: 180. Audiovisual and Multimedia and Bibliographic Control

WORLD LIBRARY AND INFORMATION CONGRESS: 75TH IFLA GENERAL CONFERENCE AND COUNCIL

23-27 August 2009, Milan, Italy

http://www.ifla.org/annual-conference/ifla75/index.htm

Abstract

Background: Research communities often collect pathology images and post them to the Web in digital libraries available for others to use. As scientists and other scholars seek visual images to support their research, however, they find these images difficult to locate.

Objective: This pilot study developed an automatic image indexing engine that generates topical descriptors for digitized microscopic images available on the Web. The system also implemented a mapping module that transfers the indexing terms to the controlled vocabularies of the Unified Medical Language System (UMLS) through the MetaMap Transfer (MMTx) algorithm developed by the National Library of Medicine (USA).

Method: Researchers for the pilot study developed a Web crawler to identify websites that contain microscopic images with narrative descriptions. These descriptions were processed through an automatic indexing engine to generate sets of topical descriptors for the accompanying images.

Result: The Web-based image search engine developed in this study can map indexing terms to the UMLS Metathesaurus. Pilot tests of the search engine found that keywords from captions are useful for identifying granular information for image description. The MMTx semantic scores for individual search terms also proved helpful in ranking indexing terms.

Conclusion: Because this study provides a semantic mapping mechanism for the descriptive captions found with biomedical images, it contributes to the understanding of bibliographic control of Web-based visual materials and thereby enhances the possibility of more efficient retrieval.

¹ This project was supported in part by a grant (RE-04-08-0069-08) from the Institute of Museum and Library Services. In addition, this publication was made possible by Grant Number P20RR-16481 from the National Center for Research Resources (NCRR), a component of the National Institutes of Health (NIH). Its contents are solely the responsibility of the authors and do not necessarily represent the official views of NCRR or NIH.

² First and corresponding author, who contributed 90% of the work for the manuscript, including conception and design; acquisition, analysis, and interpretation of data; drafting of the manuscript; critical revision of the manuscript; obtaining funding; and design and supervision of the system development.

Introduction

Visual evidence of biomedical findings has become a vital component of clinical and scientific healthcare practice. Scholars, healthcare professionals, and the general public post more and more biomedical images to the Web. These images serve as a valuable source of information for medical education, patient care, and scholarly research. For instance, anatomic pathology is an image-intensive discipline that requires rigorous histologic diagnosis based on cellular-level image features. Advances in digital pathology imaging have made it standard practice to use scanned microscopic and gross images for case conferences, resident training, research publishing, and histologic consultation. The majority of pathology departments in large academic teaching hospitals host a publicly accessible case database, along with other clinical and histologic findings, for training.

Basic and clinical scientists look to the Web for evidence to support their findings. However, it is extremely hard to locate the images published in scholarly articles. Biomedical articles indexed in the publicly accessible PubMed database are not searchable by words in captions, so potentially important images in "Figures" remain irretrievable. More importantly, Medical Subject Headings (MeSH) are assigned to describe the context of the whole article rather than the granular information found in image captions. As the basic link between an image and the content of a published work, a caption is the best source of topical descriptors for the non-text information contained in a scholarly paper. Few studies have assessed the usefulness of topical descriptors extracted from captions attached to biomedical images, and none of them has attempted to characterize the keywords found in captions. As a preliminary step, we developed a prototype search engine to index, map, and search biomedical images published in articles and Web databases.

To assess the usefulness of captions as a source of topical descriptors for biomedical images published on the Web, we tested the collections within the GoogleImage and PubMed Central databases. The characteristics of caption-based keywords and their mapping to controlled vocabularies are briefly discussed in the Results section.


Background Studies

Digital Pathology Images on the Web

Much of conventional pathology practice involves examining cellular structures, shapes, sizes, and colors in specimens placed on glass slides and viewed under light microscopes. The digitization of microscopic and gross images has allowed the Web to be used for educational, research, and diagnostic purposes. Scanned pathology images can be archived digitally without the risk of breaking, fading, or scratching. With the continuing development of computer-assisted image analysis for techniques such as immunohistochemistry and fluorescence microscopy, digital pathology shows great potential as a tool in modern molecular-based biomedicine [1-3]. Most recently, whole slide imaging, which digitizes entire slides rather than requiring the capture of individual or sequential images for viewing, has gained growing attention [4, 5].

Problematically, only a minimal level of description has been achieved in digital pathology. This minimum consists of a default sequential file name assigned automatically by a digital scanner. The other descriptive option, a surgical pathology number, serves as a unique identifier but is not ideal because of patient privacy and confidentiality requirements. Proper cataloging of digital slides makes them easier to retrieve and disseminate and thereby simplifies and speeds slide transfer for medical and scientific purposes. Yet even a description as elemental as that of the image-capturing system (e.g., the digital microscope) is not currently part of any "complete" metadata set for digital pathology. Retrieving published images by topical headings is impossible without a granular level of figure description, and the problem is even worse for biomedical images posted on the Web, where there is no way to standardize the subject indexing practices of Web users.

Social tagging, a user-driven form of description, is an interesting alternative. However, given the life-supporting mission of health care, such descriptions call for more careful consideration to ensure reliable retrieval later.

Caption-based image descriptions

Captions are a concise summary of important research findings contained in figures in published articles [6]. Several studies have confirmed that keywords found in captions are an extremely effective way to index and retrieve biomedical journal articles compared with the usual method of searching titles and abstracts [7-11]. Hearst et al. also found that captions contain important information about experimental methods [12, 13]. For instance, a search for "Western Blot" returned more than a thousand results in caption search, while very few results were returned when the search was run only over title and abstract text. The usefulness of captions in categorizing biomedical documents has also been reported [6]. In their prototype projects, Murphy et al. (2004) and Hua et al. (2007) developed systems to identify fluorescence microscope images in published papers by analyzing image processing results based on gray-level frequencies and k-nearest neighbor classifiers [8, 9].

Some researchers have worked on cataloging documents based on the controlled vocabulary terms found in captions. Sneiderman et al. (2008) discuss their pilot study of a system that automatically indexes biomedical images using terms extracted from dermatology image captions and from the portions of the article that pertain to the images [10]. In their usability assessment, the results of the automated extraction system were somewhat disappointing: only 26% of the exact UMLS matches contained in the captions were considered useful for indexing the images. Gay et al. (2005) reported promising results on the usefulness of captions for automatic biomedical literature indexing using the Medical Text Indexer (MTI) [14]. Kahn (2008) goes further with caption analysis, using the information in captions to filter retrieval by age, gender, and image modality in his radiology image database [15]. Because this database is limited to images collected from five selected radiology journals, image classification by age, gender, and image modality should be studied further in prospective work. The findings discussed in this section suggest that microscopic imaging descriptors collected from different texts should be evaluated in order to develop better image search mechanisms.

Method

Research Questions (RQs)

Within the collections of the GoogleImage and PubMed Central databases:
• RQ1: What are the core functions and outputs of the three modules developed, namely iRetrieve, iIndex, and iTransfer?
• RQ2: What are the characteristics of the unique keywords indexed by iIndex?
• RQ3: What are the characteristics of the entries meta-mapped by MMTx for the sample keywords from iIndex?

System Design and Architecture

Figure 1: Architecture of Biomedical Image Indexing and Retrieval Engine
[Figure not reproduced. Labeled components: search engine user interface; Java middleware; iRetrieve, iIndex (automatic indexer producing unique words, core descriptors, and similarity measures), and iTransfer modules; NIH MMTx API; Google AJAX API; E-Utilities and Web crawler; local database server; query and results flows.]

The architecture of the developed system is depicted in Figure 1. The main technologies used to develop the Web-based biomedical image search engine include: (1) a user interface for searching and indexing, (2) Java-based middleware for accessing and mapping, (3) backend bibliographic and/or biomedical image databases, (4) a Web crawler and E-Utilities for collecting Medline records and figure images, and (5) a local database server that stores the locations of images (e.g., Uniform Resource Locators, URLs) along with a basic bibliographic description. The study involved the development of three modules. The first, iIndex, is an indexing module intended to automatically select and extract core keywords from a collection of user-provided textual descriptions such as captions, abstracts, and titles. The second, iRetrieve, sends, refines, and displays user queries and results. The third, iTransfer, is a meta-mapping module that automatically maps user-given search words to controlled vocabularies through the National Library of Medicine's (NLM's) MetaMap Transfer Engine (MMTx). The MMTx algorithm processes the given terms in five phases: parsing, variant generation, candidate retrieval, candidate evaluation, and mapping construction [16]. The functions and outputs of each module are described in more detail in the Results section as findings for RQ1.
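The data flow among the three modules can be sketched in Java, the language of the system's middleware. This is a minimal sketch under stated assumptions, not the authors' code: the interfaces Indexer, Mapper, and Retriever and the method run are hypothetical names, and the toy lambdas in main stand in for iIndex, for an MMTx lookup (the single dcis mapping is taken from Table 3), and for a query against the local database.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical sketch of how iIndex, iTransfer, and iRetrieve could pass data along. */
public class ModulePipeline {
    /** Stands in for iIndex: narrative text in, core keywords out. */
    interface Indexer { List<String> coreKeywords(String narrative); }

    /** Stands in for iTransfer: keyword in, controlled-vocabulary term out (e.g., via MMTx). */
    interface Mapper { String preferredTerm(String keyword); }

    /** Stands in for iRetrieve: mapped terms in, matching image URLs out. */
    interface Retriever { List<String> imageUrls(List<String> terms); }

    /** Indexes a caption, maps each keyword, drops duplicate terms, then retrieves images. */
    static List<String> run(String caption, Indexer indexer, Mapper mapper, Retriever retriever) {
        List<String> terms = indexer.coreKeywords(caption).stream()
                .map(mapper::preferredTerm)
                .distinct()
                .collect(Collectors.toList());
        return retriever.imageUrls(terms);
    }

    public static void main(String[] args) {
        // Toy stand-ins; the one mapping below comes from Table 3 (dcis -> Carcinoma, Intraductal)
        Indexer indexer = text -> List.of(text.toLowerCase().split("\\W+"));
        Map<String, String> vocabulary = Map.of("dcis", "Carcinoma, Intraductal");
        Mapper mapper = kw -> vocabulary.getOrDefault(kw, kw);
        Retriever retriever = terms -> terms.contains("Carcinoma, Intraductal")
                ? List.of("http://example.org/figure1.jpg")
                : List.<String>of();
        System.out.println(run("DCIS", indexer, mapper, retriever));
    }
}
```

Keeping each module behind its own interface mirrors the separation in Figure 1: any one stand-in could be replaced by a real component (for example, a call into MMTx) without changing the others.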

Source Databases and MetaMapping Engine

Original data sources of image collections used in our prototype search engine are linked through Java-based APIs for the GoogleImage and PubMed Central databases. The GoogleImage database is accessible through Google Image Search (via the Google AJAX API), a beta version of Google's Web image search service. A GoogleImage search is limited to searching words "on the filename of the image, the link text pointing to the image, and text adjacent to the image" [17]. GoogleImage was included to test Web-based image collections, while PubMed Central was connected to the search engine to test images published in the literature. PubMed Central is an NLM product that provides open access to full-text biomedical journal articles. To develop an advanced search option as well as the prototype search engine, we downloaded a test collection of Medline records into our local database through NLM's E-Utilities. The main mapping tool was MMTx, a Java-based keyword mapping program. Given user-supplied words, MMTx produces a list of candidate terms that are semantically close to them. The candidates produced by MetaMap are ordered by score, which is computed from four metrics: centrality, variation, coverage, and cohesiveness. If a candidate is not the preferred name of its Metathesaurus concept, the preferred name is shown in parentheses, and the semantic types of the concept are listed in square brackets. If no single Metathesaurus concept covers the complete input, the final mapping consists of multiple concepts that together represent it. MetaMap gives users limited flexibility to customize mappings.
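The paper does not specify which E-Utilities calls were used to build the local Medline collection. As a sketch, the standard pattern is an ESearch request that returns matching record IDs, followed by an EFetch request for those records in MEDLINE format. The class and method names below are hypothetical, the code only builds request URLs (no network access or response parsing), and the IDs in main are placeholders rather than real PubMed records.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;

/** Sketch: builds NCBI E-Utilities request URLs for collecting Medline records. */
public class EUtilsUrls {
    private static final String BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/";

    /** ESearch: asks for up to retmax record IDs in db that match the query term. */
    static String esearch(String db, String term, int retmax) {
        return BASE + "esearch.fcgi?db=" + db
                + "&term=" + URLEncoder.encode(term, StandardCharsets.UTF_8)
                + "&retmax=" + retmax;
    }

    /** EFetch: asks for the given PubMed records as MEDLINE-formatted text. */
    static String efetchMedline(List<String> pmids) {
        return BASE + "efetch.fcgi?db=pubmed&id=" + String.join(",", pmids)
                + "&rettype=medline&retmode=text";
    }

    public static void main(String[] args) {
        System.out.println(esearch("pubmed", "breast cancer", 20));
        // Placeholder IDs for illustration only
        System.out.println(efetchMedline(List.of("111", "222")));
    }
}
```

The first line printed is https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=breast+cancer&retmax=20; URLEncoder encodes the space as "+", which E-Utilities accepts in query strings.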


Results

Data Collection for Pilot Testing

The functionalities of the developed modules were tested on a collection of open access journal articles from the PMC Open Access Subset and a collection of Web images accessed through the GoogleImage database. A subset of PubMed Central articles containing microscopic images was selected manually. The study's disease of interest was breast cancer, chosen as a means of reducing word variations for "disease name." The captions and abstracts of the articles identified as containing microscopic images were processed through iIndex to generate automatic keywords.

Core functions and technologies of the iRetrieve, iIndex, and iTransfer modules (RQ1)

iRetrieve searches user-given words within a selected search field: caption, abstract, title, or MeSH descriptors. The retrieved figure images are then displayed along with their bibliographic details. Advanced search options allow users to refine searches by imaging modality (e.g., x-ray, computed tomography, microscopic images), demographic variables (e.g., age, gender, human or animal), laboratory procedures (e.g., staining, bioassay), and quantitative/qualitative measures (e.g., 100x, 200x, 10%, 20%).
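A field-restricted search with one advanced limit (imaging modality) can be sketched as follows. The Figure record, its fields, and the case-insensitive substring match are assumptions made for illustration; the paper does not describe iRetrieve's data model or matching rules, and a production system would more likely query a database index than scan a list.

```java
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

/** Sketch of field-restricted search with an optional modality limit (hypothetical data model). */
public class FieldSearch {
    // Hypothetical record for one figure stored in the local database
    record Figure(String url, String title, String caption, String abstractText,
                  String meshTerms, String modality) {}

    enum Field { TITLE, CAPTION, ABSTRACT, MESH }

    /** Returns the text of the field the user chose to search. */
    static String text(Figure f, Field field) {
        return switch (field) {
            case TITLE -> f.title();
            case CAPTION -> f.caption();
            case ABSTRACT -> f.abstractText();
            case MESH -> f.meshTerms();
        };
    }

    /** Case-insensitive substring match in one field; modalityLimit may be null (no limit). */
    static List<Figure> search(List<Figure> figures, String query, Field field, String modalityLimit) {
        String q = query.toLowerCase(Locale.ROOT);
        return figures.stream()
                .filter(f -> text(f, field).toLowerCase(Locale.ROOT).contains(q))
                .filter(f -> modalityLimit == null || f.modality().equalsIgnoreCase(modalityLimit))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Figure> figures = List.of(
                new Figure("fig1.jpg", "Breast cancer subtypes", "Invasive ductal carcinoma, 200x",
                        "Abstract text", "Breast Neoplasms", "microscopic"),
                new Figure("fig2.jpg", "Breast cancer screening", "Mammogram showing a ductal lesion",
                        "Abstract text", "Mammography", "x-ray"));
        // Both captions mention "ductal", but only fig1 passes the microscopic-image limit
        for (Figure f : search(figures, "ductal", Field.CAPTION, "microscopic")) {
            System.out.println(f.url());
        }
    }
}
```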

Figure 2: Snapshots of iRetrieve, iIndex, and iTransfer modules

iIndex reads user-given narrative image descriptions, either in batch mode (e.g., a collection of captions or abstracts) or one description at a time. It generates output files containing a list of unique words, term weights, core keywords, and similarity measures for the core keywords. Its measures and keyword selections are based on word frequency and on the cosine similarity for each core descriptor. It then exports the aggregate automatic indexing results to an Excel file.
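The frequency-based selection and cosine similarity described for iIndex can be illustrated with a short Java sketch. It assumes simple tokenization, a tiny hypothetical stopword list, top-k selection by raw collection frequency, and cosine similarity computed between two keywords' per-document count vectors. The paper does not state which vectors iIndex compares or how it weights terms, and the Excel export is omitted; the sample captions are invented.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

/** Sketch: frequency-based core keywords plus keyword-to-keyword cosine similarity. */
public class KeywordIndexer {
    // Tiny illustrative stopword list (assumption; iIndex's own list is not described)
    static final Set<String> STOP = Set.of("a", "an", "and", "the", "of", "in", "with", "by", "x");

    /** Lowercases, splits on non-letters, and drops stopwords. */
    static List<String> tokens(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!t.isEmpty() && !STOP.contains(t)) {
                out.add(t);
            }
        }
        return out;
    }

    /** One word-count map per document (caption or abstract). */
    static List<Map<String, Integer>> countPerDoc(List<String> docs) {
        List<Map<String, Integer>> result = new ArrayList<>();
        for (String d : docs) {
            Map<String, Integer> counts = new HashMap<>();
            for (String t : tokens(d)) {
                counts.merge(t, 1, Integer::sum);
            }
            result.add(counts);
        }
        return result;
    }

    /** Top-k words by total frequency across all documents; ties broken alphabetically. */
    static List<String> coreKeywords(List<Map<String, Integer>> perDoc, int k) {
        Map<String, Integer> total = new HashMap<>();
        for (Map<String, Integer> m : perDoc) {
            m.forEach((w, c) -> total.merge(w, c, Integer::sum));
        }
        return total.entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue().reversed()
                        .thenComparing(Map.Entry.comparingByKey()))
                .limit(k)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    /** Cosine similarity between two words' count vectors over the documents. */
    static double cosine(List<Map<String, Integer>> perDoc, String w1, String w2) {
        double dot = 0, n1 = 0, n2 = 0;
        for (Map<String, Integer> m : perDoc) {
            int a = m.getOrDefault(w1, 0);
            int b = m.getOrDefault(w2, 0);
            dot += (double) a * b;
            n1 += (double) a * a;
            n2 += (double) b * b;
        }
        return (n1 == 0 || n2 == 0) ? 0.0 : dot / (Math.sqrt(n1) * Math.sqrt(n2));
    }

    public static void main(String[] args) {
        List<String> captions = List.of(
                "Invasive ductal carcinoma of the breast, magnification x200.",
                "Ductal carcinoma in situ (DCIS) with nuclear staining.",
                "Normal breast tissue, magnification x100.");
        List<Map<String, Integer>> perDoc = countPerDoc(captions);
        System.out.println(coreKeywords(perDoc, 3));
        System.out.printf("%.3f%n", cosine(perDoc, "breast", "ductal"));
    }
}
```

With these three captions the program prints [breast, carcinoma, ductal] (four words tie at two occurrences, and ties are broken alphabetically) and then 0.500, because breast and ductal each appear in two captions but co-occur in only one.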


iTransfer parses user-given search keywords (or iIndex-generated keywords), generates word variations, retrieves mapping candidates, scores the candidates, selects the final mapping suggestion, and adds semantic type notations for semantic clarification. Its outputs are the suggested MetaMapping, MetaCandidates, matching scores, preferred terms, number of candidates, and semantic types. Table 1 lists the core functions of each module and the expected outputs of those functions.

Table 1: Core functions and outputs of the prototype system developed

iRetrieve
Core functions:
• Reading user-given search words
• Restricting searches to captions, abstracts, or MeSH descriptors
• Displaying retrieved images and descriptions
• Refining search results by setting search limits
• Collecting locations of images and Medline records
• Linking to the other two modules
Outputs: images; figures; captions; Medline records; article URLs; user interfaces

iIndex
Core functions:
• Reading user-given narrative image descriptions
• Selecting core descriptors based on word frequency
• Calculating cosine similarity for a core descriptor
• Exporting aggregate automatic indexing results
Outputs: unique words; keywords; similarity scores; Excel tables

iTransfer
Core functions:
• Parsing user-given search keywords
• Generating word variations
• Retrieving mapping candidates
• Scoring mapping candidates
• Selecting the final mapping suggestion
• Adding semantic types
Outputs: MetaMapping; MetaCandidates; matching scores; preferred terms; number of candidates; semantic types

Characteristics of the unique keywords indexed by iIndex (RQ2)

RQ2 was posed to describe the keywords automatically identified by iIndex and to find any notable differences between abstract-based and caption-based narratives that could inform recommended indexing terms. Table 2 lists only the top 30 keywords identified from each source. Because the source articles for the abstracts and captions were screened only for "breast cancer," the top keywords include breast, cancer, carcinoma, and tumor. In each collection, the top 30 most frequently recommended indexing words account for roughly 30 percent of all keyword occurrences (see the Sum (%) columns of Table 2). A few keywords identified from the captions are not included among the abstract-based keywords, such as magnification, immunohistochemistry, and hematoxylin and eosin. These terms describe laboratory procedures rather than granular scientific findings and are otherwise not found in the abstracts. Combining keywords from the abstracts and the captions yields a list of representative keywords that can be used as descriptors for retrieving breast-cancer-related microscopic images.

Table 2: Caption-based and Abstract-based Keywords by iIndex

Abstract-based                          | Caption-based
Keyword     Count  %     Sum (%)        | Keyword         Count  %     Sum (%)
breast      1490   6.35  6.35           | Cells           569    5.01  5.01
cancer      907    3.87  10.22          | Breast          270    2.38  7.39
cells       746    3.18  13.40          | Carcinoma       229    2.02  9.77
expression  568    2.42  15.83          | Expression      200    1.76  11.79
cell        412    1.76  17.58          | Ductal          170    1.50  13.55
patients    360    1.54  19.12          | Magnification   159    1.40  15.05
tumor       345    1.47  20.59          | Tumor           136    1.20  16.45
carcinoma   234    1.00  21.59          | Normal          134    1.18  17.65
human       219    0.93  22.52          | Cell            103    0.91  18.83
growth      191    0.81  23.34          | Figure          103    0.91  19.73
tumors      171    0.73  24.06          | Cancer          101    0.89  20.64
gene        146    0.62  24.69          | Tissue          92     0.81  21.53
invasive    137    0.58  25.27          | Invasive        90     0.79  22.34
cases       120    0.51  25.78          | Original        79     0.70  23.13
carcinomas  110    0.47  26.25          | Positive        77     0.68  23.83
dcis        108    0.46  26.71          | Tumour          72     0.63  24.51
normal      108    0.46  27.17          | Mammary         63     0.56  25.14
receptor    99     0.42  27.60          | Negative        63     0.56  25.70
survival    99     0.42  28.02          | Antibody        61     0.54  26.25
treatment   99     0.42  28.44          | Epithelial      61     0.54  26.79
ductal      97     0.41  28.85          | Panel           56     0.49  27.33
protein     92     0.39  29.25          | Nuclear         54     0.48  27.82
results     92     0.39  29.64          | Sections        54     0.48  28.30
lines       90     0.38  30.02          | Tumors          53     0.47  28.77
levels      89     0.38  30.40          | Dcis            51     0.45  29.24
metastasis  88     0.38  30.78          | representative  48     0.42  29.69
tumour      87     0.37  31.15          | Control         47     0.41  30.11
women       87     0.37  31.52          | Lobular         47     0.41  30.53
primary     86     0.37  31.89          | Nuclei          47     0.41  30.94
mammary     85     0.36  32.25          | Situ            47     0.41  31.35
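The % and Sum (%) columns of Table 2 follow from the counts: each percentage is a keyword's count divided by the total number of keyword occurrences in its collection, and Sum (%) is the running total of those percentages. The collection totals are not reported, so this sketch only shows the arithmetic; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: recomputes the "%" and "Sum (%)" columns of a keyword frequency table. */
public class KeywordShares {
    /** Share of one keyword's count in the total, as a percentage rounded to two decimals. */
    static double percent(int count, int total) {
        return Math.round(count * 10000.0 / total) / 100.0;
    }

    /** Running total of (already rounded) percentages, rounded to two decimals. */
    static List<Double> runningSum(List<Double> percents) {
        List<Double> out = new ArrayList<>();
        double acc = 0.0;
        for (double p : percents) {
            acc += p;
            out.add(Math.round(acc * 100.0) / 100.0);
        }
        return out;
    }

    public static void main(String[] args) {
        // First three percentages of each column in Table 2
        System.out.println(runningSum(List.of(6.35, 3.87, 3.18))); // abstract-based
        System.out.println(runningSum(List.of(5.01, 2.38, 2.02))); // caption-based
    }
}
```

Run on the first three rows, the abstract-based column reproduces the printed running totals (6.35, 10.22, 13.40), while the caption-based values give 9.41 at the third row rather than the printed 9.77. From the third row onward, each printed caption total equals the previous printed total plus the previous row's percentage, so the caption Sum (%) column appears to be offset by one row, and its final value of 31.35 likely overstates the true running total.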


Characteristics of meta-mapped entries by MMTx (RQ3)

RQ3 was posed to describe how the keywords identified by iIndex can be mapped to Meta Candidates by iTransfer. The Meta Candidates are the suggested keywords that are semantically relevant to the source keywords. As shown in Table 3, the preferred names of the Metathesaurus concepts for the source keyword antibody, "Antibodies" and "antigen binding," are enclosed in parentheses. The semantic types of the concepts, which further explain the identified concepts, are enclosed in square brackets. Four semantic types were identified across the two candidates for antibody: Amino Acid, Peptide, or Protein; Immunologic Factor; Indicator, Reagent, or Diagnostic Aid; and Molecular Function. The suggested candidates are ordered by their mapping scores; for the source keyword breast, for example, the scores are 1000, 944, 916, and 900. The mapping scores generated are useful in ranking the candidates.

Table 3: Sample Meta Candidate entries by MMTx (each source keyword is followed by the keywords mapped by iTransfer)

antibody: Meta Candidates (2)
    1000 Antibody (Antibodies) [Amino Acid, Peptide, or Protein,Immunologic Factor,Indicator, Reagent, or Diagnostic Aid]
    1000 antibody (antigen binding) [Molecular Function]

breast: Meta Candidates (7)
    1000 Breast [Body Part, Organ, or Organ Component]
    1000 Breast (Entire breast) [Body Part, Organ, or Organ Component]
    944 Mammary (Mammary gland) [Body Part, Organ, or Organ Component]
    944 Pectoral [Qualitative Concept]
    916 Chest [Body Location or Region]
    916 Chest (Anterior thoracic region) [Body Location or Region]
    900 Thoracic (Dissecting aneurysm of the thoracic aorta) [Disease or Syndrome]

carcinoma: Meta Candidates (5)
    1000 Carcinoma [Neoplastic Process]
    1000 Carcinoma (Carcinoma of the Mouse Prostate Gland) [Neoplastic Process]
    1000 Carcinoma (Mouse Carcinoma) [Neoplastic Process]
    900 carcinogen (Carcinogens) [Hazardous or Poisonous Substance]
    900 Carcinogenicity [Neoplastic Process]

chemotherapy: Meta Candidates (3)
    1000 Chemotherapy (Pharmacotherapy) [Therapeutic or Preventive Procedure]
    1000 chemotherapy (pharmacotherapeutic) [Functional Concept]
    1000 Chemotherapy (Chemotherapy-Oncologic Procedure) [Therapeutic or Preventive Procedure]

dcis: Meta Candidates (1)
    1000 DCIS (Carcinoma, Intraductal) [Neoplastic Process]

ductal: Meta Candidates (4)
    1000 Ductal [Qualitative Concept]
    1000 Ductal (Ductal Hypoplasia of the Mouse Mammary Gland) [Disease or Syndrome]
    928 Duct [Body Part, Organ, or Organ Component]
    928 Duct (Entire duct) [Body Part, Organ, or Organ Component]
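Because each candidate in Table 3 is printed as a score, the matched string, an optional preferred name in parentheses, and semantic types in square brackets, such a list can be turned back into structured records and re-ranked by score. This sketch parses that printed text format; it does not call the MMTx Java API, and the Candidate record and method names are hypothetical. It assumes, from how Table 3 prints them, that semantic types are separated by a comma with no following space, since several type names (e.g., Amino Acid, Peptide, or Protein) contain a comma plus space internally.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch: parses printed Meta Candidate lines and ranks them by mapping score. */
public class CandidateParser {
    /** One candidate: score, matched string, preferred name, semantic types. */
    record Candidate(int score, String matched, String preferred, List<String> semanticTypes) {}

    // score, matched text, optional "(preferred name)", then "[semantic types]"
    private static final Pattern LINE =
            Pattern.compile("^(\\d+)\\s+(.+?)(?:\\s+\\((.+)\\))?\\s+\\[(.+)\\]$");

    static Candidate parse(String line) {
        Matcher m = LINE.matcher(line.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Unrecognized candidate line: " + line);
        }
        // No parenthesized name means the matched string is itself the preferred name
        String preferred = m.group(3) != null ? m.group(3) : m.group(2);
        // Assumption: types are split at a comma NOT followed by whitespace, as printed in Table 3
        List<String> types = Arrays.asList(m.group(4).split(",(?=\\S)"));
        return new Candidate(Integer.parseInt(m.group(1)), m.group(2), preferred, types);
    }

    /** Parses all lines and sorts by descending score (stable, so ties keep input order). */
    static List<Candidate> ranked(List<String> lines) {
        List<Candidate> out = new ArrayList<>();
        for (String l : lines) {
            out.add(parse(l));
        }
        out.sort(Comparator.comparingInt(Candidate::score).reversed());
        return out;
    }

    public static void main(String[] args) {
        List<String> breast = List.of(
                "916 Chest [Body Location or Region]",
                "1000 Breast [Body Part, Organ, or Organ Component]",
                "944 Mammary (Mammary gland) [Body Part, Organ, or Organ Component]");
        for (Candidate c : ranked(breast)) {
            System.out.println(c.score() + " " + c.preferred() + " " + c.semanticTypes());
        }
    }
}
```

Because List.sort is stable, candidates with equal scores (such as the two 1000-point Breast entries in Table 3) keep the order in which MMTx listed them.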

Discussion/Conclusion

The current system was developed to support searchers of biomedical images by identifying recommended keywords, transferring the keywords into better retrieval terms, and retrieving biomedical images using the refined terms from the PubMed Central and GoogleImage databases. Images within biomedical literature carry important information, both detailed and abstract, about the experimental results published therein. Conventional MeSH descriptors have not translated into descriptors that make this image information easy to discover. In combination with conventional MeSH descriptors assigned for an article's overall topic, the recommended indexing terms from captions can help provide granular descriptions of biomedical figures. Therefore, the iIndex module we developed is expected to increase the retrieval effectiveness of biomedical images published in articles by identifying image indexing terms from unconventional but more relevant source texts such as captions.

Future work will assess the retrieval effectiveness of the iRetrieve search engine in real-user use case scenarios. Advanced search features will be tested for the major types of imaging modalities in biomedicine (e.g., radiology, gross, microscopic, MRI, CT, PET). The major limitation identified in our pilot testing was that iIndex produces only single-term indexing entries. To improve the current design, we will integrate an open source project, Apache Lucene, which will provide tools for better indexing, retrieving, and ranking of indexing terms in conjunction with the iTransfer mapping module. The data collection used in our pilot testing was also limited to the topic of "breast cancer," and the identified images were limited to microscopic images. We plan to expand testing to cover other imaging modalities, such as radiologic and computed tomography images. Additionally, because we observed potential benefits from using MeSH descriptors (especially tags) in combination with automatically generated indexing terms for better retrieval of published biomedical images, we plan to formally assess retrieval effectiveness when using the combined PubMed Central and GoogleImage databases.

As mentioned before, advances in biomedical imaging have resulted in vast growth in the number of images used in clinical diagnosis, scientific research, and biomedical education. By providing a semantic mapping mechanism, this pilot study will help in understanding the bibliographic control of Web-based visual materials for better retrieval. In the field of information organization and retrieval, methods and tools for non-textual retrieval have garnered increasing attention as the digital world expands. It is incumbent upon libraries and other information agencies to promote and maintain an interest in the opportunities and challenges associated with biomedical imaging.

References

1. Leong FJ, Leong AS. Digital imaging applications in anatomic pathology. Advances in Anatomic Pathology, 2003. 10(2):88-95.

2. Pritt BS, Gibson PC, Cooper K. Digital imaging guidelines for pathology: a proposal for general and academic use. Advances in Anatomic Pathology, 2003. 10(2):96-100.

3. Montalto MC. Pathology RE-imagined: the history of digital radiology and the future of anatomic pathology. Archives of Pathology and Laboratory Medicine, 2008. 132(5):764-5.

4. Dee FR. Virtual microscopy for comparative pathology. Toxicologic Pathology, 2006. 34(7):966-7.

5. Li XX, et al. A feasibility study of virtual slides in surgical pathology in China. Human Pathology, 2007. 38(12):1842-1848.

6. Shatkay H, Chen C, Blostein D. Integrating image data into biomedical text categorization. Bioinformatics, 2006. 22(14):e446-53.

7. Xu S, McCusker J, Krauthammer M. Yale Image Finder (YIF): a new search engine for retrieving biomedical images. Bioinformatics, 2008. 24(17):1968-70.

8. Murphy RF, et al. Extracting and structuring subcellular location information from on-line journal articles: the Subcellular Location Image Finder. Proceedings of the IASTED International Conference on Knowledge Sharing and Collaborative Engineering (KSCE 2004), 2004:109-114.

9. Hua J, et al. Identifying fluorescence microscope images in online journal articles using both image and text features. Proceedings of the 2007 IEEE International Symposium on Biomedical Imaging (ISBI 2007), 2007:1224-1227.

10. Sneiderman CA, et al. UMLS-based automatic image indexing. AMIA Annual Symposium Proceedings, 2008:1141. [Accessed on March 15, 2009]. Available at: <http://archive.nlm.nih.gov/pubs/pubPDFs/Sneiderman_et_al_AMIA_2008.pdf>

11. Yeh AS, Hirschman L, Morgan AA. Evaluation of text data mining for database curation: lessons learned from the KDD Challenge Cup. Bioinformatics, 2003. 19:i331-9.

12. Hearst MA, et al. BioText Search Engine: beyond abstract search. Bioinformatics, 2007. 23(16):2196-7.

13. Hearst MA, Wooldridge MA. Exploring the efficacy of caption search for bioscience journal search interfaces. BioNLP 2007: Biological, translational, and clinical language processing, 2007:73-80.

14. Gay CW, Kayaalp M, Aronson AR. Semi-automatic indexing of full text biomedical articles. AMIA Annual Symposium Proceedings, 2005. [Accessed on March 15, 2009]. Available at: <http://ii.nlm.nih.gov/resources/amia05.fulltext.w.footer.pdf>

15. Kahn CE. Effective metadata discovery for dynamic filtering of queries to a radiology image search engine. Journal of Digital Imaging, 2008. 21(3):269-73.

16. Aronson AR. The MetaMap Mapping Algorithm. [Accessed on March 15, 2009]. Available at: <http://skr.nlm.nih.gov/papers/references/mm.mapping.pdf>

17. Google Image Search. Wikipedia. [Accessed on March 15, 2009]. Available at: <http://en.wikipedia.org/wiki/Google_image>

About the Authors:

Sujin Kim, Ph.D., Assistant Professor, School of Library and Information Science and Department of Pathology and Laboratory Medicine, University of Kentucky, 339 Lucille Little Fine Art Library Building, Lexington, Kentucky 40506-0224, USA. Tel: (+1) 859-257-8657; Fax: (+1) 859-257-4205

Aswathnarayanan Sadagopan, MS, Graduate Research Assistant, Department of Computer Science, University of Kentucky, 350 Lucille Little Fine Art Library Building, Lexington, Kentucky 40506-0224, USA. Tel: (+1) 859-257-2335; Fax: (+1) 859-257-4205