Separating compound figures in journal articles to allow for subfigure classification

Institute of Information Systems. Separating compound figures in journal articles to allow for subfigure classification. Ajad Chhatkuli, Antonio Foncubierta-Rodríguez, Dimitrios Markonis, Henning Müller

Upload: university-of-applied-sciences-western-switzerland

Post on 22-Jun-2015

DESCRIPTION

Journal images represent an important part of the knowledge stored in the medical literature. Figure classification has received much attention because information about image types can be used in a variety of contexts to focus image search and to filter out unwanted information or "noise", for example non-clinical images. A major problem in figure classification is that many figures in the biomedical literature are compound figures that often contain more than one figure type. Some journals separate compound figures into several parts, but many do not, so separation currently has to be done manually. In this work, a technique for compound figure separation is proposed and implemented, based on systematic detection and analysis of uniform space gaps. The method is evaluated on a dataset of journal figures from the open access literature that was created for the ImageCLEF 2012 benchmark and contains about 3000 compound figures. Automatic tools can easily reach a relatively high accuracy in separating compound figures. To further increase accuracy, efforts are needed to improve the detection process and to avoid over-separation through more powerful analysis strategies. The tools described in this article have also been tested on a database of approximately 150,000 compound figures from the biomedical literature, making these images available as separate figures for further image analysis and making it possible to filter important information from them.

TRANSCRIPT

Page 1: Separating compound figures in journal articles to allow for subfigure classification

Institute of Information Systems

Separating compound figures in journal articles to allow for subfigure classification

Ajad Chhatkuli

Antonio Foncubierta-Rodríguez

Dimitrios Markonis

Henning Müller

Page 2: Separating compound figures in journal articles to allow for subfigure classification

Motivation

• Figures in biomedical journals contain a lot of information

• CBIR has been proposed for accessing medical literature

• Modality classification

  • Improves accessibility

  • Allows result filtering

• But 50% of figures are compound or multipanel

Page 3: Separating compound figures in journal articles to allow for subfigure classification

Aim

• Develop a system that separates compound figures in the biomedical literature

• Visual information only

  • Textual information is discarded

• Modality-independent

  • One method for many image types, rather than many methods for few image types

• Tunable according to the dataset

• Tested at large scale

  • Approximately 250 open access journals

Page 4: Separating compound figures in journal articles to allow for subfigure classification

Compound figure examples

Page 5: Separating compound figures in journal articles to allow for subfigure classification

Institute of Information Systems

Methods. Dataset

• 2982 manually classified figures from the ImageCLEF 2012 dataset

• Ground truth:

• Image subclass: 2x1,1x2,

• Position of separators

Page 6: Separating compound figures in journal articles to allow for subfigure classification

Methods. Overview

• The problem is split into two stages

• Find subfigure separator candidates

  • Preprocessing if required

• Analyze candidates

  • Remove false positives

  • Rule-based decisions

Page 7: Separating compound figures in journal articles to allow for subfigure classification

Methods. Separator detection

• Based on minimum pixel projection for white-space separated figures

• Horizontal, then vertical detection

• Inverse order by rotation according to aspect ratio

• Recursive
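The detection idea on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the image is a list of grayscale rows (0 = black, 255 = white), and the function names, the `white` threshold, the `min_width` value and the fixed "rows first, then columns" order are all hypothetical choices. The aspect-ratio rule for inverting the order is not specified on the slide and is left out.

```python
from typing import List, Tuple

Image = List[List[int]]  # grayscale rows; 0 = black, 255 = white


def row_minima(img: Image) -> List[int]:
    """Minimum pixel value of every row (the minimum projection)."""
    return [min(row) for row in img]


def transpose(img: Image) -> Image:
    """Swap rows and columns so column gaps can be found with row code."""
    return [list(col) for col in zip(*img)]


def find_gaps(profile: List[int], white: int = 250,
              min_width: int = 3) -> List[Tuple[int, int]]:
    """Return (start, end) ranges where the profile stays near-white."""
    gaps, start = [], None
    for i, value in enumerate(profile + [0]):  # sentinel closes a final run
        if value >= white and start is None:
            start = i
        elif value < white and start is not None:
            if i - start >= min_width:
                gaps.append((start, i))
            start = None
    return gaps


def cut_rows(img: Image, gaps: List[Tuple[int, int]]) -> List[Image]:
    """Cut img into bands between gaps; gaps at the border act as margins."""
    bands, prev = [], 0
    for start, end in gaps:
        if start > prev:
            bands.append(img[prev:start])
        prev = end
    if prev < len(img):
        bands.append(img[prev:])
    return bands


def separate(img: Image, max_depth: int = 4) -> List[Image]:
    """Recursively split at white gaps, trying rows first, then columns."""
    if max_depth == 0:
        return [img]
    for by_rows in (True, False):
        work = img if by_rows else transpose(img)
        bands = cut_rows(work, find_gaps(row_minima(work)))
        if len(bands) > 1:
            parts = bands if by_rows else [transpose(b) for b in bands]
            return [leaf for p in parts for leaf in separate(p, max_depth - 1)]
    return [img]  # no separator found in either direction
```

Every white run counts as a candidate here, including thin gaps inside a single subfigure; deciding which candidates are real separators is the job of the analysis stage.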

Page 8: Separating compound figures in journal articles to allow for subfigure classification

Methods. Separator detection

• Rule-based processing

• Progressive truncation to remove labels if no separators are found

• Text removal based on connected components if no separators are found

• Complement image for black-space separations

• Standard deviation image for subtle separations

• Binarization of non-graph figures:

  • Less than 40% of the image is white or almost white
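Some of these preprocessing rules are easy to illustrate. The sketch below is an assumption-laden example, not the method from the paper: the image is a list of grayscale rows (0 to 255), "almost white" is taken to mean a value of at least 250, the binarization threshold of 128 is arbitrary, and the per-row standard deviation profile is a simplified stand-in for the standard deviation image mentioned on the slide. Only the 40% figure comes from the slide itself.

```python
from statistics import pstdev
from typing import List

Image = List[List[int]]  # grayscale rows; 0 = black, 255 = white


def complement(img: Image) -> Image:
    """Invert intensities so black separators become white ones."""
    return [[255 - v for v in row] for row in img]


def white_fraction(img: Image, white: int = 250) -> float:
    """Share of pixels that are white or almost white (assumed >= 250)."""
    total = sum(len(row) for row in img)
    if total == 0:
        return 0.0
    return sum(v >= white for row in img for v in row) / total


def is_non_graph(img: Image, max_white: float = 0.40) -> bool:
    """Slide rule: a figure with less than 40% white is treated as non-graph."""
    return white_fraction(img) < max_white


def binarize(img: Image, threshold: int = 128) -> Image:
    """Map every pixel to pure black or pure white (threshold is assumed)."""
    return [[255 if v >= threshold else 0 for v in row] for row in img]


def row_std_profile(img: Image) -> List[float]:
    """Per-row standard deviation; uniform rows (possible gaps) give 0.0."""
    return [pstdev(row) for row in img]
```

A low value in `row_std_profile` marks a row of uniform intensity whatever its color, which is why a deviation-based view can expose separators that are neither clearly white nor clearly black.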

Page 9: Separating compound figures in journal articles to allow for subfigure classification

Methods. Separator analysis

• Classification problem

  • True/false separator

• Features used:

  • Closeness to border, division ratio, standard deviation, text removal analysis, histogram, gap comparison

• Classifiers:

  • SVM

  • Rule-based classifier
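A rule-based true/false decision over a few of these features might look like the following sketch. The slide names the features but does not define them, so the feature formulas, the per-rule thresholds, the weights and the acceptance threshold below are all hypothetical; only three of the listed features are covered, and the SVM variant is not shown.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Candidate:
    start: int   # first pixel of the gap along the split axis
    end: int     # one past the last pixel of the gap
    length: int  # image extent along the split axis


def border_distance(c: Candidate) -> float:
    """Distance from the gap to the nearest border, as a fraction of length."""
    return min(c.start, c.length - c.end) / c.length


def division_ratio(c: Candidate) -> float:
    """Size of the smaller resulting part relative to the larger one."""
    center = (c.start + c.end) / 2
    small, large = sorted((center, c.length - center))
    return small / large if large > 0 else 0.0


# Hypothetical rule weights; the slides only say they can be tuned per dataset.
DEFAULT_WEIGHTS: Dict[str, float] = {"border": 1.0, "ratio": 1.0, "gap": 2.0}


def filter_separators(cands: List[Candidate],
                      weights: Optional[Dict[str, float]] = None,
                      threshold: float = 3.0) -> List[Candidate]:
    """Keep candidates whose weighted rule votes reach the threshold."""
    weights = weights or DEFAULT_WEIGHTS
    if not cands:
        return []
    widest = max(1, max(c.end - c.start for c in cands))
    kept = []
    for c in cands:
        votes = {
            "border": border_distance(c) >= 0.05,         # not a page margin
            "ratio": division_ratio(c) >= 0.2,            # parts not too uneven
            "gap": (c.end - c.start) / widest >= 0.5,     # comparable to widest gap
        }
        score = sum(weights[name] for name, ok in votes.items() if ok)
        if score >= threshold:
            kept.append(c)
    return kept
```

With these example settings, a gap touching the image border or a gap much thinner than the widest candidate is rejected, while a wide gap near the middle is kept. The same feature values could be fed to a trained classifier instead of fixed rules.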

Page 10: Separating compound figures in journal articles to allow for subfigure classification

Results

Page 11: Separating compound figures in journal articles to allow for subfigure classification

Successful examples

Page 12: Separating compound figures in journal articles to allow for subfigure classification

Successful examples

Page 13: Separating compound figures in journal articles to allow for subfigure classification

Unsuccessful examples

No separation gap

Not horizontal/vertical separation

Page 14: Separating compound figures in journal articles to allow for subfigure classification

Conclusions and future work

• Good results for a wide range of images

  • Using purely visual information

• Separation problem: detection and analysis

• Rule weights can be fine-tuned according to the dataset

• What would be the impact of a larger training set?

• What would be the impact on existing modality classification accuracy?

Page 16: Separating compound figures in journal articles to allow for subfigure classification

Thanks for your attention!

More information at http://medgift.hevs.ch

Ajad Chhatkuli, Dimitrios Markonis, Antonio Foncubierta-Rodríguez, Fabrice Meriaudeau and Henning Müller, Separating compound figures in journal articles to allow for subfigure classification, in: SPIE, Medical Imaging, Orlando, FL, USA, 2013