An Investigation on Combination Methods for Multimodal Content-based Medical Image Retrieval

Ali Hosseinzadeh Vahid
Asst. Prof. Dr. Adil Alpkoçak
August 2012, İzmir
An Investigation on Combination Methods for Multimodal Content-based Medical Image Retrieval



An Investigation on Combination Methods for Multimodal Content-based Medical Image Retrieval

Introduction

Medical images play an important role in revealing anatomical and functional information about the body for diagnosis, medical research, and education:

Physicians or radiologists examine them in conventional ways, based on their individual experience and knowledge.

Retrieval systems can provide diagnostic support to physicians or radiologists by displaying relevant past cases.

They also serve as a training tool for medical students and residents, in follow-up studies, and for research purposes.

10/8/2012 An Investigation on Combination Methods for Multimodal Content-based Medical Image Retrieval

Background (Image Retrieval Systems)

Image retrieval has long been a poor stepchild to other forms of information retrieval (IR), yet it has also been one of the most interesting and vivid research areas in computer vision over the last decades.

An image retrieval system is a computer system for browsing, searching, and retrieving similar (not necessarily identical) images from a large database of digital images, using key attributes associated with the images or features inherently contained in the images.


Background (TBIR)

In a Text-Based Image Retrieval (TBIR) system, images are indexed by text known as the image's metadata, such as the patient's ID number, the date the image was produced, the image type, and a manually annotated description of the image content; Google Images and Flickr are examples of such systems.

Image retrieval based only on text information is not sufficient because of:

the amount of labor required to manually annotate every single image;

differences in human perception when describing images, which might lead to inaccuracies during the retrieval process.


Background (CBIR)

The main goal of a Content-Based Image Retrieval (CBIR) system is searching for and finding similar images based on their content.

To accomplish this, the content must first be described in an efficient way, in the so-called indexing or feature-extraction step, where binary signatures are formed and stored as the data.

When a query image is given to the system, the system extracts image features for the query, compares these features with those of the images in the database, and displays the relevant results to the user.

There are many factors to consider in the design of a CBIR system:

Choice of the right features: how to mathematically describe an image?

Similarity measurement criteria: how to assess the similarity between a pair of images?

Indexing mechanism and query formulation technique.
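As a minimal sketch of this query flow, the following Python uses a toy intensity histogram standing in for real descriptors such as CEDD; all names and the histogram feature are illustrative, not the system's actual implementation.

```python
import math

def color_histogram(image, bins=4):
    """Toy feature extractor: a normalized histogram of pixel
    intensities (0-255). Real CBIR systems use richer descriptors
    such as CEDD or FCTH."""
    hist = [0] * bins
    pixels = [p for row in image for p in row]
    for p in pixels:
        hist[min(p * bins // 256, bins - 1)] += 1
    total = len(pixels)
    return [h / total for h in hist]

def euclidean(a, b):
    """Distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def retrieve(query_img, database, top_k=3):
    """Rank database images by ascending distance to the query."""
    q = color_histogram(query_img)
    scored = [(name, euclidean(q, color_histogram(img)))
              for name, img in database.items()]
    return sorted(scored, key=lambda s: s[1])[:top_k]
```

The same three stages (extract, compare, rank) reappear in every CBIR system, whatever the descriptor.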


Background (CBIR)

Major problems of CBIR are:

Semantic gap: the lack of coincidence between the information that one can extract from the visual data and the interpretation that the same data have for a user in a given situation. The user seeks semantic similarity, but the database can only provide similarity by data processing.

A huge number of objects to search among.

Incomplete query specification.

Incomplete image description.


Image Content Descriptors

Image content may include:

Visual content:
General: color, texture, shape, spatial relationship, etc.
Domain-specific: application dependent and may involve domain knowledge.

Semantic content: obtained by textual annotation or by complex inference procedures based on visual content.


Color

Color is one of the most widely used visual features. It is relatively robust to changes in the background colors and independent of image size and orientation.

Considerable design and experimental work went into MPEG-7 to arrive at efficient color descriptors for similarity matching.

No single generic color descriptor exists that can be used for all foreseen applications; examples include SCD, CLD, and CSD.


Texture

Texture is another fundamental visual feature. It captures the structuredness, regularity, directionality, and roughness of images.

Examples include HTD and EHD.


Compact composite descriptors

Color and Edge Directivity Descriptor (CEDD):

a six-bin histogram from the fuzzy system that uses the five digital filters proposed by the MPEG-7 EHD;

a 24-bin color histogram produced by the 24-bin fuzzy-linking system;

overall, the final histogram has 144 regions (6 × 24).

Fuzzy Color and Texture Histogram (FCTH):

an eight-bin histogram from the fuzzy system that uses the high-frequency bands of the Haar wavelet transform;

a 24-bin color histogram produced by the 24-bin fuzzy-linking system;

overall, the final histogram includes 192 regions (8 × 24).

Compact composite descriptors

Brightness and Texture Directionality Histogram (BTDH):

very similar to the FCTH feature; the main difference is using brightness instead of a color histogram;

uses brightness and texture characteristics, as well as the spatial distribution of these characteristics, in one compact 1D vector;

the texture information comes from the directionality histogram;

a fractal scanning method, through the Hilbert curve or the Z-grid method, is used to capture the spatial distribution of brightness and texture information.


Similarity Measures

Geometric measures treat objects as vectors.

Information-theoretic measures are derived from Shannon's entropy theory and treat objects as probability distributions.

Statistical measures compare two objects in a distributional manner, basically assuming that the vector elements are samples.
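As an illustrative sketch (not the system's actual implementation), here is one representative measure from each family in plain Python; the vectors are assumed to be plain lists of floats, and the KL divergence assumes the second distribution is strictly positive wherever the first is.

```python
import math

def euclidean(a, b):
    """Geometric: straight-line distance between vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    """Geometric: cosine of the angle between vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def kl_divergence(p, q):
    """Information-theoretic: treats vectors as probability
    distributions; q must be > 0 wherever p is > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def pearson(a, b):
    """Statistical: treats vector elements as paired samples."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)
```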


Performance evaluation

Precision evaluates the efficiency of a system with respect to the retrieved items: the fraction of retrieved documents that are relevant.

Recall determines retrieval efficiency with respect to all relevant documents: the fraction of relevant documents that are retrieved.
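In set notation, these two measures are:

```latex
\mathrm{precision} = \frac{|\,\mathrm{relevant} \cap \mathrm{retrieved}\,|}{|\,\mathrm{retrieved}\,|},
\qquad
\mathrm{recall} = \frac{|\,\mathrm{relevant} \cap \mathrm{retrieved}\,|}{|\,\mathrm{relevant}\,|}
```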


Performance evaluation

R-precision is the precision of the system after R documents are retrieved, where R is the number of relevant documents for the topic.

Mean Average Precision (MAP) for a set of queries is the mean of the average precision scores for each query:

MAP = (1 / |Q|) · Σ_{q=1}^{|Q|} AveP(q)

where |Q| is the number of queries.
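These measures can be computed directly from a ranked list; the following sketch uses illustrative function names (it is not tied to any evaluation toolkit such as trec_eval) and implements the standard definitions of average precision and MAP.

```python
def average_precision(ranked_ids, relevant_ids):
    """AP: mean of the precision values at each rank where a
    relevant document is retrieved, divided over all relevant docs."""
    hits, precisions = 0, []
    for rank, doc in enumerate(ranked_ids, start=1):
        if doc in relevant_ids:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(relevant_ids) if relevant_ids else 0.0

def mean_average_precision(runs):
    """MAP: mean of AP over a set of queries.
    `runs` is a list of (ranked_ids, relevant_ids) pairs."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)
```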


Need to fuse (CBIR)

Several research efforts have been reported that enhance CBIR performance by taking multi-modality fusion approaches, because:

each feature extracted from an image characterizes only a certain aspect of the image content;

a given feature is not equally important for different image queries, since it has different importance in reflecting the content of different images.


Fusion

“Information fusion is the study of efficient methods for automatically or semi-automatically transforming information from different sources and different points in time into a representation that provides effective support for human or automated decision making.”

The major challenge is to find suitable techniques for combining multiple sources of information for either decision-making or information retrieval.

Traditional work on multimodal integration has largely been heuristic-based. Even today, the understanding of how fusion works and what influences it is limited.

Significant techniques in the multimodal fusion process

Feature-level fusion: an information process that integrates, associates, correlates, and combines unimodal features, data, and information from single or multiple sensors or sources to achieve refined estimates of parameters, characteristics, events, and behaviors.

Information fusion at the data or sensor level can achieve the best performance improvements (Koval, 2007).

It can exploit the correlation between multiple features from different modalities at an early stage, which helps in better task accomplishment.

It requires only one learning phase, on the combined feature vector.

However, it is hard to represent time synchronization between the multimodal features.

The features to be fused should be represented in the same format before fusion.

An increase in the number of modalities makes it difficult to learn the cross-correlation among the heterogeneous features.
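A minimal sketch of feature-level fusion by concatenation, using hypothetical two- and three-dimensional stand-ins for descriptors such as CEDD and FCTH; the variable names are illustrative only.

```python
import math

def early_fusion(feature_vectors):
    """Early (feature-level) fusion by concatenation: the unimodal
    descriptors of one image are joined into a single vector before
    any matching takes place."""
    fused = []
    for v in feature_vectors:
        fused.extend(v)
    return fused

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# One distance computation on the combined vector replaces one per modality.
cedd_q, fcth_q = [0.1, 0.4], [0.2, 0.3, 0.5]   # hypothetical descriptors
cedd_d, fcth_d = [0.1, 0.5], [0.2, 0.3, 0.4]
d = euclidean(early_fusion([cedd_q, fcth_q]), early_fusion([cedd_d, fcth_d]))
```

Note how fusion happens before matching, which is exactly why the fused features must share a common format.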


Significant techniques in the multimodal fusion process

Score-, rank-, and decision-level fusion, also called high-level or late information fusion, arose in the neural-network literature. Each modality/sensor/source/feature is first processed individually; the results, so-called experts, can be scores in classification or ranks in retrieval. The experts' values are then combined to determine the final decision.

This type of information fusion is faster and easier to implement than early fusion.

The decision-level fusion strategy offers scalability (i.e., graceful upgrading or degrading) in terms of the modalities used in the fusion process.

The disadvantage of the late fusion approach lies in its failure to exploit the feature-level correlation among modalities.

As different classifiers are used to obtain the local decisions, their learning process becomes tedious and time-consuming.


Formal presentation of Fusion on Multimodal Retrieval systems

 



Venn diagram of different modalities relevant retrieved document set in combined result set



Experiments

We performed our experiments with the CLEF 2011 medical image classification and retrieval tasks dataset. The database includes 231,000 images from journals of BioMed Central in the PubMed Central database, associated with their original articles in the journals.

Besides the images, a single XML file is provided as textual metadata for all documents in the collection.

30 topics, ten each for visual, textual, and mixed retrieval, were chosen to allow for the evaluation of a large variety of techniques. Each topic has both a textual query and at least one sample query image.

Text modality

We used the Terrier IR platform API.

Preprocessing:

Split the metadata file and represent each image in the collection as a structured XML document.

Special-character deletion: characters with no meaning, like punctuation marks or blanks, are all eliminated.

Stop-word removal: discarding semantically empty, very high-frequency words.

Token normalization: converting all words to lower case.

Stemming: we used the Porter stemmer.
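A simplified stand-in for this pipeline (the actual system relies on Terrier and the Porter stemmer; the tiny stop-word list and crude suffix stripping below are illustrative only):

```python
import re

STOP_WORDS = {"the", "a", "an", "of", "and", "in", "is", "on"}  # tiny sample

def preprocess(text):
    """Simplified preprocessing: strip special characters, lowercase,
    drop stop words, and apply naive suffix stripping as a stand-in
    for the Porter stemmer used in the actual system."""
    tokens = re.findall(r"[a-zA-Z]+", text)              # special-char deletion
    tokens = [t.lower() for t in tokens]                 # token normalization
    tokens = [t for t in tokens if t not in STOP_WORDS]  # stop-word removal
    stemmed = []
    for t in tokens:                                     # crude stemming
        for suffix in ("ing", "ed", "s"):
            if t.endswith(suffix) and len(t) > len(suffix) + 2:
                t = t[: -len(suffix)]
                break
        stemmed.append(t)
    return stemmed
```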


Text modality

We compared the performance of the subsystem using a variety of weighting models implemented in Terrier, and chose the DFR-BM25 weighting model (Amati, 2003) as the base textual modality of our system, because its results were close to the average of the other weighting models' results.

Additionally, we calculate the similarity score of every document in the collection for each query topic and then sort the documents in descending order as a ranked list.


Comparison of weighting models on Text modality

Measure       DFR_BM25  BM25    TF_IDF  BB2     IFB2    In_expB2  In_expC2  InL2    PL2
num_q         30        30      30      30      30      30        30        30      30
num_ret       28333     28333   28333   28333   28333   28333     28333     28333   28333
num_rel       2584      2584    2584    2584    2584    2584      2584      2584    2584
num_rel_ret   1444      1487    1479    1425    1430    1422      1418      1440    1425
map           0.1942    0.2021  0.2022  0.1922  0.1902  0.1920    0.1847    0.1991  0.1855
Rprec         0.2242    0.2428  0.2401  0.2236  0.2189  0.2220    0.2251    0.2338  0.2198
bpref         0.2215    0.2323  0.2318  0.2220  0.2203  0.2209    0.2128    0.2283  0.2135
P_5           0.3800    0.4067  0.4000  0.3667  0.3667  0.3733    0.3533    0.3933  0.3600
P_10          0.3400    0.3333  0.3367  0.3333  0.3333  0.3367    0.3467    0.3333  0.3367
P_15          0.3067    0.3067  0.3178  0.3200  0.3133  0.3200    0.3133    0.3111  0.2978
P_20          0.2933    0.3000  0.3000  0.2933  0.2883  0.2983    0.2883    0.2900  0.2850
P_30          0.2644    0.2767  0.2789  0.2689  0.2622  0.2633    0.2589    0.2633  0.2533
P_100         0.1797    0.1903  0.1930  0.1837  0.1757  0.1793    0.1807    0.1823  0.1797
P_200         0.1385    0.1430  0.1450  0.1387  0.1340  0.1387    0.1397    0.1383  0.1377
P_500         0.0802    0.0848  0.0849  0.0803  0.0799  0.0805    0.0795    0.0800  0.0787
P_1000        0.0481    0.0496  0.0493  0.0475  0.0477  0.0474    0.0473    0.0480  0.0475


Visual Modality

We extracted features for all images in the test collection and for the query examples using the Rummager tool.

We examined the performance of all extracted features.

We observed that compact composite features such as CEDD and FCTH give satisfactory retrieval results on our image collection.

Because the CEDD feature gives satisfactory retrieval results while requiring noticeably less computational power and storage space, we used it as the base visual modality result.

Comparison on performance of different low level features


[Bar chart reconstructed as a table: number of relevant retrieved documents (num_rel_ret) per feature.]

Feature           num_rel_ret
CEDD              547
FCTH              603
SpCD              519
BTDH              329
CLD               530
EHD               352
SCD               167
Tamura Texture    120
Color Histogram   265

Visual Modality

In the matching phase, we had to evaluate the similarity between the vector corresponding to the query example image and the vectors representing the dataset images.

We assessed the performance of different similarity functions on the compact composite features.

Then we sorted all dataset images into a descending list based on their similarity score for each query example image.


Distance function performance evaluation of different features


[Chart reconstructed as a table: number of relevant retrieved documents (num_rel_ret) per distance function and feature.]

Distance function   CEDD  FCTH  SpCD  BTDH
Euclidean           547   603   519   329
Cosine              537   596   533   333
Manhattan           521   583   506   322
Minkowski P3        537   570   516   319
Minkowski P4        525   568   499   305
Minkowski P5        521   568   476   291
Tanimoto            441   481   443   297

Integrated Combination Multimodal Retrieval (ICMR)

Our proposed method is a super-level of late fusion: like late fusion, it can be applied to either the similarity scores or the ranks of each modality's individually processed feature.

The significant difference between this approach and late fusion is the scale of the combination: in our method, all documents in the data collection take part in the combination, whereas in late fusion the number of documents combined depends on a threshold on score or rank.
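A minimal sketch of this idea, assuming each modality supplies a normalized score for (ideally) every document in the collection; the function and variable names are illustrative, not the thesis implementation.

```python
def icmr_combine(score_lists, weights):
    """Integrated combination over the complete collection: every
    document receives a weighted combined score, with no cut-off
    threshold. `score_lists` maps modality -> {doc_id: normalized
    score}; `weights` maps modality -> weight."""
    doc_ids = set()
    for scores in score_lists.values():
        doc_ids |= scores.keys()
    combined = {
        doc: sum(weights[m] * score_lists[m].get(doc, 0.0)
                 for m in score_lists)
        for doc in doc_ids
    }
    # Descending combined score gives the final ranked list.
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)
```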


Experiments on combination methods of modalities


Early fusion: we used the feature-concatenation method on the synchronized compact composite feature vectors of all images. We used the Euclidean distance as the similarity measure and selected the top 1000 documents for each query.

Late fusion with substitution value of zero (LFSVZ): we applied the CombSUM function to the similarity scores of the top 1000 retrieved documents in each feature's result set. We normalized the similarity scores using the min-max normalization function before combination. Following Fagin's A0 combination algorithm, we substitute a similarity score of zero for documents that do not appear in a retrieved document list.
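Assuming each result list arrives as a {document_id: score} map, the two LFSVZ steps above (min-max normalization, then CombSUM with zero substitution) can be sketched as follows; the names are illustrative.

```python
def min_max(scores):
    """Min-max normalization of a {doc: score} map to [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo or 1.0  # avoid division by zero for constant scores
    return {d: (s - lo) / span for d, s in scores.items()}

def comb_sum_lfsvz(result_lists):
    """CombSUM over top-k result lists with Fagin's A0 substitution:
    a document missing from a list contributes a score of zero."""
    normalized = [min_max(r) for r in result_lists]
    docs = set().union(*(r.keys() for r in normalized))
    fused = {d: sum(r.get(d, 0.0) for r in normalized) for d in docs}
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)
```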

Comparison of different combination method performance

Combination function       Method        # relevant retrieved  MAP
CombSUM(CEDD,FCTH)         Early Fusion  658                   0.0201
CombSUM(CEDD,FCTH)         LFSVZ         643                   0.0194
CombSUM(CEDD,FCTH)         ICMR          665                   0.199
CombSUM(CEDD,FCTH,SpCD)    Early Fusion  676                   0.0231
CombSUM(CEDD,FCTH,SpCD)    LFSVZ         541                   0.0198
CombSUM(CEDD,FCTH,SpCD)    ICMR          699                   0.0252


Impact of different weights of modalities on Integrated Combination Multimodal Retrieval

[Table: the first column is the modality weight; the remaining column headers were lost in extraction.]

3.0   1559  287  1410  219  219  1191   68    34  260   81
2.5   1573  298  1404  219  219  1185   79    40  249   90
2.0   1595  310  1393  219  219  1174   91    51  237  111
1.7   1599  321  1373  219  219  1154  102    71  226  124
1.5   1597  329  1354  219  219  1135  110    90  218  133
1.0   1578  394  1254  219  219  1035  175   190  153  149
0.7   1454  432  1089  219  219   870  213   355  115  152


Venn diagram of different modalities relevant retrieved document set in combined result set


Impact of different weights of modalities on Late Fusion

[Table: the first column is the modality weight; the remaining column headers (including Rmixed) were lost in extraction.]

3.0   1479  261  1437  219  219  1218   42     7  286  0
2.5   1515  337  1397  219  219  1178  118    47  210  0
2.0   1484  429  1274  219  219  1055  210   170  118  0
1.7   1385  481  1123  219  219   904  262   321   66  0
1.5   1246  506   959  219  219   740  287   485   41  0
1.0    720  543   396  219  219   177  324  1048    4  0
0.7    560  547   232  219  219    13  328  1212    0  0


Details of ICMR in response to query #18

Threshold at the 1000th top score: Text 0.4053, Visual 0.8677, Mixed 0.6486

Similarity scores:

Document ID         Textual   Visual   Mixed
1471-213X-4-16-2    0.3029    0.7768   1.2919
1471-213X-4-16-3    0.3242    0.8055   1.3567
1471-213X-4-16-5    0.3223    0.7837   1.3318


Conclusion

In this study, we found that:

Effective combination of textual and visual modalities improves the overall performance of content-based medical image retrieval systems.

Integrated retrieval outperforms the other fusion techniques, whether late or early fusion, in multimodal CBIR systems.

In the best combination of textual and visual modalities, the weight of the textual modality is about 1.7 times the weight of the visual modality.

Documents common to the relevant retrieved sets of the individual modalities also appear in the relevant retrieved set of the combined modality, regardless of weights or methods.


Future directions

Our study can be extended in several ways:

Apply and verify these experimental results on other medical image collections.

Extend our study to other domains of CBIR systems beyond the medical domain.

Implement an effective and efficient tool for applying ICMR.


Thanks for your kind consideration.