
Ali Hosseinzadeh Vahid
Asst. Prof. Dr. Adil Alpkoçak
August 2012, İzmir
An Investigation on Combination Methods for Multimodal Content-based Medical Image Retrieval

Introduction
Medical images play an important role in detecting anatomical and functional information about body parts for diagnosis, medical research, and education:
physicians or radiologists examine them in conventional ways based on their individual experience and knowledge;
they provide diagnostic support to physicians or radiologists by displaying relevant past cases;
they serve as a training tool for medical students and residents, in follow-up studies, and for research purposes.
10/8/2012 An Investigation on Combination Methods for Multimodal Content-based Medical Image Retrieval

Background (Image Retrieval Systems)
Although image retrieval has long been a poor stepchild to other forms of information retrieval (IR), it has been one of the most interesting and vivid research areas in computer vision over the last decades.
An image retrieval system is a computer system for browsing, searching, and retrieving similar (not necessarily identical) images from a large database of digital images, using key attributes associated with the images or features inherently contained in them.

Background (TBIR)
In a Text-Based Image Retrieval (TBIR) system, images are indexed by text known as the image's metadata, such as the patient's ID number, the date it was produced, the type of the image, and a manually annotated description of the image content; Google Images and Flickr are well-known examples.
Image retrieval based only on text information is not sufficient because of:
the amount of labor required to manually annotate every single image;
the differences in human perception when describing images, which can lead to inaccuracies during the retrieval process.

Background (CBIR)
The main goal of a Content-Based Image Retrieval (CBIR) system is searching for and finding similar images based on their content.
To accomplish this, the content must first be described in an efficient way, through so-called indexing or feature extraction, and the resulting binary signatures are stored as the data.
When a query image is given to the system, the system extracts image features for the query, compares these features with those of the images in the database, and displays the relevant results to the user.
There are many factors to consider in the design of a CBIR system: the choice of the right features (how to describe an image mathematically), the similarity measurement criterion (how to assess the similarity between a pair of images), the indexing mechanism, and the query formulation technique.
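The design factors above can be sketched as a minimal end-to-end CBIR loop. This is an illustrative toy, not the system described here: the grayscale-histogram feature, the Euclidean measure, and all names are stand-ins I chose for the sketch.

```python
import math

def extract_features(image):
    """Toy 'feature extraction': a 4-bin grayscale histogram as the signature.
    Real CBIR systems use richer descriptors such as color, texture, or shape."""
    hist = [0] * 4
    for pixel in image:            # image as a flat list of 0-255 intensities
        hist[pixel // 64] += 1
    total = float(len(image))
    return [h / total for h in hist]   # normalize so image size does not matter

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def retrieve(query_image, database, top_k=2):
    """Rank database images by ascending distance to the query's features."""
    q = extract_features(query_image)
    scored = [(name, euclidean(q, extract_features(img)))
              for name, img in database.items()]
    return sorted(scored, key=lambda t: t[1])[:top_k]

database = {
    "dark":   [10, 20, 30, 40],
    "bright": [220, 230, 240, 250],
    "mixed":  [10, 100, 180, 250],
}
results = retrieve([15, 25, 35, 45], database)   # most similar images first
```

The query's histogram matches the "dark" image exactly, so it ranks first with distance zero.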

Background (CBIR)
The major problems of CBIR are:
Semantic gap: the lack of coincidence between the information that can be extracted from the visual data and the interpretation that the same data have for a user in a given situation. The user seeks semantic similarity, but the database can only provide similarity by data processing.
The huge number of objects to search among. Incomplete query specification. Incomplete image description.

Image Content Descriptors
Image content may include:
Visual content
General: color, texture, shape, spatial relationships, etc.
Domain specific: application dependent and may involve domain knowledge.
Semantic content: obtained either by textual annotation or by complex inference procedures based on visual content.

Color
Color is one of the most widely used visual features:
relatively robust to changes in the background colors;
independent of image size and orientation.
Considerable design and experimental work went into MPEG-7 to arrive at efficient color descriptors for similarity matching.
No single generic color descriptor exists that can be used for all foreseen applications; examples include SCD, CLD, and CSD.

Texture
Texture is another fundamental visual feature.
It captures the structuredness, regularity, directionality, and roughness of images.
Examples include HTD and EHD.

Compact composite descriptors
Color and Edge Directivity Descriptor (CEDD):
the six-bin histogram of the fuzzy system that uses the five digital filters proposed by the MPEG-7 EHD;
the 24-bin color histogram produced by the 24-bin fuzzy-linking system.
Overall, the final histogram has 144 regions (6 × 24).
Fuzzy Color and Texture Histogram (FCTH):
the eight-bin histogram of the fuzzy system that uses the high-frequency bands of the Haar wavelet transform;
the 24-bin color histogram produced by the 24-bin fuzzy-linking system.
Overall, the final histogram includes 192 regions (8 × 24).

Compact composite descriptors
Brightness and Texture Directionality Histogram (BTDH):
very similar to the FCTH feature; the main difference is using brightness instead of a color histogram;
uses brightness and texture characteristics, as well as the spatial distribution of these characteristics, in one compact 1D vector;
the texture information comes from the directionality histogram;
a fractal scanning method, through the Hilbert curve or the Z-grid method, is used to capture the spatial distribution of brightness and texture information.
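As a rough illustration of how such a scan linearizes spatial information, here is a sketch of the Z-grid (Morton) ordering; the Hilbert curve used by BTDH follows the same idea with a different visiting order, and the function names are mine, not BTDH's.

```python
def z_order_index(x, y, bits=8):
    """Interleave the bits of (x, y) to get the cell's position on a Z-curve."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i) | ((y >> i) & 1) << (2 * i + 1)
    return z

def z_scan(grid):
    """Flatten a 2D grid of cell values into a 1D vector in Z-curve order,
    so that spatially close cells stay close in the resulting vector."""
    h, w = len(grid), len(grid[0])
    cells = [(z_order_index(x, y), grid[y][x])
             for y in range(h) for x in range(w)]
    return [value for _, value in sorted(cells)]

grid = [[1, 2],
        [3, 4]]
scanned = z_scan(grid)   # Z-order visits (0,0), (1,0), (0,1), (1,1)
```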

Similarity Measures
Geometric measures treat objects as vectors.
Information-theoretic measures are derived from Shannon's entropy theory and treat objects as probability distributions.
Statistical measures compare two objects in a distributional manner and basically assume that the vector elements are samples.
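One representative of each family can be sketched in a few lines, assuming plain Python lists as object vectors; these are textbook formulas, not the specific functions used in this work.

```python
import math

def euclidean(a, b):
    """Geometric: distance between two vectors in space."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    """Geometric: cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def kl_divergence(p, q):
    """Information-theoretic: treats the objects as probability distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def pearson_correlation(a, b):
    """Statistical: treats the vector elements as paired samples."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)
```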

Performance evaluation
Precision evaluates the efficiency of a system according to the relevant items among those retrieved.
Recall determines the retrieval efficiency according to all relevant documents.

Performance evaluation
R-precision is the precision of the system after R documents are retrieved, where R is the number of relevant documents for the topic.
Mean Average Precision (MAP) for a set of queries is the mean of the average precision scores over the queries:
MAP = (1/Q) Σ_{q=1}^{Q} AveP(q), where Q is the number of queries.
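The measures above can be sketched as a small toy, with `retrieved` as a ranked list and `relevant` as the ground-truth set; this is an illustration, not the trec_eval implementation used in CLEF evaluations.

```python
def precision_recall(retrieved, relevant):
    """Precision: fraction of retrieved documents that are relevant.
    Recall: fraction of all relevant documents that were retrieved."""
    hits = sum(1 for d in retrieved if d in relevant)
    return hits / len(retrieved), hits / len(relevant)

def average_precision(retrieved, relevant):
    """Mean of the precision values at each rank where a relevant doc appears."""
    hits, total = 0, 0.0
    for rank, doc in enumerate(retrieved, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0

def mean_average_precision(runs):
    """runs: one (retrieved_list, relevant_set) pair per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)

# Relevant docs retrieved at ranks 1 and 3 -> AP = (1/1 + 2/3) / 2
ap1 = average_precision(["d1", "d9", "d3"], {"d1", "d3"})
```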

Need to fuse (CBIR)
Some research efforts have been reported to enhance CBIR performance by taking multi-modality fusion approaches, because:
each feature extracted from an image characterizes only a certain aspect of the image content;
a given feature is not equally important for different image queries, since it has different importance in reflecting the content of different images.

Fusion
“Information fusion is the study of efficient methods for automatically or semi-automatically transforming information from different sources and different points in time into a representation that provides effective support for human or automated decision making.”
The major challenge is to find adjusted techniques for associating multiple sources of information for either decision-making or information retrieval.
Traditional work on multimodal integration has largely been heuristic based. Still today, the understanding of how fusion works and what influences it is limited.

Significant techniques in the multimodal fusion process
Feature-level fusion: an information process that integrates, associates, correlates, and combines unimodal features, data, and information from single or multiple sensors or sources to achieve refined estimates of parameters, characteristics, events, and behaviors.
Information fusion at the data or sensor level can achieve the best performance improvements (Koval, 2007).
It can exploit the correlation between multiple features from different modalities at an early stage, which helps in better task accomplishment.
Also, it requires only one learning phase on the combined feature vector.
However, it is hard to represent the time synchronization between the multimodal features.
The features to be fused should be represented in the same format before fusion.
An increase in the number of modalities makes it difficult to learn the cross-correlation among the heterogeneous features.
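Feature-level fusion can be sketched as simple vector concatenation; the per-modality normalization step and all the feature names below are illustrative assumptions, not the descriptors used in this work.

```python
def min_max(v):
    """Scale a feature vector to [0, 1] so no modality dominates by magnitude."""
    lo, hi = min(v), max(v)
    return [(x - lo) / (hi - lo) for x in v] if hi > lo else [0.0] * len(v)

def early_fuse(feature_vectors):
    """Concatenate the per-modality feature vectors into one joint vector,
    which a single learner or distance function then operates on."""
    fused = []
    for v in feature_vectors:
        fused.extend(min_max(v))
    return fused

color_hist = [12, 40, 8, 0]    # hypothetical color feature
texture = [0.9, 0.1]           # hypothetical texture feature
joint = early_fuse([color_hist, texture])
```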

Significant techniques in the multimodal fusion process
Score-, rank-, and decision-level fusion, also called high-level or late information fusion, arose in the neural-network literature. Each modality/sensor/source/feature is first processed individually; the results, so-called experts, can be scores in classification or ranks in retrieval. The experts' values are then combined to determine the final decision.
This type of information fusion is faster and easier to implement than early fusion.
The decision-level fusion strategy offers scalability (i.e., graceful upgrading or degrading) in terms of the modalities used in the fusion process.
The disadvantage of the late fusion approach lies in its failure to utilize the feature-level correlation among modalities.
As different classifiers are used to obtain the local decisions, the learning process for them becomes tedious and time-consuming.
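The rank-based flavor of late fusion can be sketched with a Borda count, one common scheme for combining per-modality rankings (chosen here for illustration; it is not necessarily the scheme used in this work).

```python
def borda_fuse(ranked_lists):
    """Rank-level late fusion: each expert (modality) awards points by rank;
    the top document in a list of n earns n points (Borda count)."""
    points = {}
    for ranking in ranked_lists:
        n = len(ranking)
        for rank, doc in enumerate(ranking):
            points[doc] = points.get(doc, 0) + (n - rank)
    return sorted(points, key=points.get, reverse=True)

text_ranking = ["a", "b", "c"]     # per-modality results, best first
visual_ranking = ["b", "c", "a"]
final = borda_fuse([text_ranking, visual_ranking])
```

Document "b" wins overall even though neither modality alone ranks it first in both lists, which is exactly the kind of consensus effect late fusion exploits.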

Formal presentation of Fusion on Multimodal Retrieval systems


Venn diagram of different modalities relevant retrieved document set in combined result set


Experiments
We performed our experiments with the CLEF 2011 medical image classification and retrieval task dataset. The database includes 231,000 images from journals of BioMed Central in the PubMed Central database, associated with their original articles in the journals.
Besides, a single XML file is provided as textual metadata for all documents in the collection.
30 topics, ten each for visual, textual, and mixed retrieval, were chosen to allow the evaluation of a large variety of techniques. Each topic has both a textual query and at least one sample query image.

Text modality
We used the Terrier IR Platform API. Preprocessing:
Split the metadata file and represent each image in the collection as a structured XML document.
Special-character deletion: characters with no meaning, like punctuation marks or blanks, are all eliminated.
Stop-word removal: discarding semantically empty, very high-frequency words.
Token normalization: converting all words to lower case.
Stemming: we used the Porter stemmer.
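The preprocessing steps above can be sketched as a small pipeline. The stop-word list is a tiny sample, and `simple_stem` is a crude suffix stripper standing in for the real Porter stemmer, which has considerably more rules.

```python
import re

STOP_WORDS = {"the", "of", "a", "in", "on", "is", "and"}

def simple_stem(token):
    """Crude suffix stripping; a stand-in for the Porter stemmer, not Porter itself."""
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def preprocess(text):
    text = re.sub(r"[^A-Za-z0-9\s]", " ", text)          # special-character deletion
    tokens = text.lower().split()                         # token normalization
    tokens = [t for t in tokens if t not in STOP_WORDS]   # stop-word removal
    return [simple_stem(t) for t in tokens]               # stemming

terms = preprocess("Imaging of the lungs, showing nodules!")
```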

Text modality
We compared the performance of the subsystem using a variety of weighting models implemented in Terrier, and chose the DFR-BM25 weighting model (Amati, 2003) as the base textual modality of our system, because its results were close to the average of the results of the other weighting models.
Additionally, we calculated the similarity score of every document in the collection with respect to each query topic and then sorted them in descending order as a ranked list.

Comparison of weighting models on Text modality
Measure      DFR_BM25  BM25    TF_IDF  BB2     IFB2    In_expB2  In_expC2  InL2    PL2
num_q        30        30      30      30      30      30        30        30      30
num_ret      28333     28333   28333   28333   28333   28333     28333     28333   28333
num_rel      2584      2584    2584    2584    2584    2584      2584      2584    2584
num_rel_ret  1444      1487    1479    1425    1430    1422      1418      1440    1425
map          0.1942    0.2021  0.2022  0.1922  0.1902  0.1920    0.1847    0.1991  0.1855
Rprec        0.2242    0.2428  0.2401  0.2236  0.2189  0.2220    0.2251    0.2338  0.2198
bpref        0.2215    0.2323  0.2318  0.2220  0.2203  0.2209    0.2128    0.2283  0.2135
P_5          0.3800    0.4067  0.4000  0.3667  0.3667  0.3733    0.3533    0.3933  0.3600
P_10         0.3400    0.3333  0.3367  0.3333  0.3333  0.3367    0.3467    0.3333  0.3367
P_15         0.3067    0.3067  0.3178  0.3200  0.3133  0.3200    0.3133    0.3111  0.2978
P_20         0.2933    0.3000  0.3000  0.2933  0.2883  0.2983    0.2883    0.2900  0.2850
P_30         0.2644    0.2767  0.2789  0.2689  0.2622  0.2633    0.2589    0.2633  0.2533
P_100        0.1797    0.1903  0.1930  0.1837  0.1757  0.1793    0.1807    0.1823  0.1797
P_200        0.1385    0.1430  0.1450  0.1387  0.1340  0.1387    0.1397    0.1383  0.1377
P_500        0.0802    0.0848  0.0849  0.0803  0.0799  0.0805    0.0795    0.0800  0.0787
P_1000       0.0481    0.0496  0.0493  0.0475  0.0477  0.0474    0.0473    0.0480  0.0475

Visual Modality
We extracted features for all images in the test collection and the query examples using the Rummager tool.
We examined the performance of all extracted features.
We observed that compact composite features like CEDD and FCTH give satisfactory retrieval results on our image collection.
Because the CEDD feature gives a satisfactory retrieval result while its required computational power and storage space are noticeably lower, we used it as the base visual modality result.

Comparison on performance of different low level features
Feature           num_rel_ret
CEDD              547
FCTH              603
SpCD              519
BTDH              329
CLD               530
EHD               352
SCD               167
Tamura Texture    120
Color Histogram   265

Visual Modality
In the matching phase, we had to evaluate the similarity between the vector corresponding to the query example image and the vectors representing the dataset images.
We assessed the performance of different similarity functions on the compact composite features.
Then we sorted all of the dataset images into a descending list based on their similarity score with respect to each query example image.

Distance function performance evaluation of different features
Feature  Euclidean  Cosine  Manhattan  Minkowski p=3  Minkowski p=4  Minkowski p=5  Tanimoto
CEDD     547        537     521        537            525            521            441
FCTH     603        596     583        570            568            568            481
SPCD     519        533     506        516            499            476            443
BTDH     329        333     322        319            305            291            297
(values are num_rel_ret)

Integrated Combination Multimodal Retrieval
Our proposed method is a super level of late fusion because, like late fusion, it can be applied to either the similarity scores or the ranks of each modality feature, processed individually.
The significant difference between this approach and late fusion is the scale of the combination.
In our method, all documents in the data collection are involved in the combination, in contrast to late fusion, where the number of documents combined in the list depends on a threshold on score or rank.
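The difference in combination scale can be sketched as follows; this is my illustrative reading of the idea, with made-up scores, not the exact ICMR procedure.

```python
def fuse(scores_text, scores_visual, top_k=None, w_text=1.0):
    """Score-level fusion over two modalities' score dictionaries.

    top_k=None -> integrated combination: every document in the collection
                  contributes its score to the fused list (the ICMR idea).
    top_k=N    -> classic late fusion: only each modality's top-N survive;
                  documents missing from a list implicitly score zero.
    """
    def truncate(scores):
        if top_k is None:
            return scores
        best = sorted(scores, key=scores.get, reverse=True)[:top_k]
        return {d: scores[d] for d in best}

    t, v = truncate(scores_text), truncate(scores_visual)
    docs = set(t) | set(v)
    fused = {d: w_text * t.get(d, 0.0) + v.get(d, 0.0) for d in docs}
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)

text = {"a": 0.9, "b": 0.5, "c": 0.4}
visual = {"a": 0.1, "b": 0.2, "c": 0.8}
integrated = fuse(text, visual)        # all documents combined
late = fuse(text, visual, top_k=1)     # only each modality's top-1 survives
```

With full-collection combination, document "c" wins on its combined evidence; truncating each list to the top document changes the winner, showing how the threshold shapes the result.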

Experiments on combination method of modalities
Early fusion: we used the feature concatenation method on the synchronous compact composite feature vectors of all images.
We used the Euclidean distance as the similarity measure and selected the top 1000 documents for each query.
Late fusion with substitution value of zero (LFSVZ): we applied the CombSUM function to the similarity scores of the top 1000 retrieved documents of each feature's result set.
We normalized the similarity scores using the min-max normalization function before combination.
Following Fagin's A0 combination algorithm, we substituted zero as the similarity score of documents that do not appear in a retrieved document list.
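The min-max normalization and the zero substitution for absent documents can be sketched as below; the scores and image names are invented for illustration.

```python
def min_max_normalize(scores):
    """Rescale one modality's scores to [0, 1] so modalities are comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {d: 0.0 for d in scores}
    return {d: (s - lo) / (hi - lo) for d, s in scores.items()}

def comb_sum_a0(result_lists):
    """CombSUM over several modalities' result lists; a document absent from
    a list contributes 0 to the sum, as in Fagin's A0 substitution."""
    normalized = [min_max_normalize(s) for s in result_lists]
    docs = set().union(*normalized)
    return {d: sum(s.get(d, 0.0) for s in normalized) for d in docs}

cedd = {"img1": 12.0, "img2": 30.0, "img3": 21.0}
fcth = {"img1": 0.8, "img4": 0.2}      # img2 and img3 are absent -> score 0
fused = comb_sum_a0([cedd, fcth])
```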

Comparison of different combination method performance
Combination function       Method        # relevant retrieved  MAP
CombSUM(CEDD, FCTH)        Early Fusion  658                   0.0201
CombSUM(CEDD, FCTH)        LFSVZ         643                   0.0194
CombSUM(CEDD, FCTH)        ICMR          665                   0.0199
CombSUM(CEDD, FCTH, SpCD)  Early Fusion  676                   0.0231
CombSUM(CEDD, FCTH, SpCD)  LFSVZ         541                   0.0198
CombSUM(CEDD, FCTH, SpCD)  ICMR          699                   0.0252

Impact of different weights of modalities on Integrated Combination Multimodal Retrieval
3 1559 287 1410 219 219 1191 68 34 260 81
2.5 1573 298 1404 219 219 1185 79 40 249 90
2.0 1595 310 1393 219 219 1174 91 51 237 111
1.7 1599 321 1373 219 219 1154 102 71 226 124
1.5 1597 329 1354 219 219 1135 110 90 218 133
1.0 1578 394 1254 219 219 1035 175 190 153 149
0.7 1454 432 1089 219 219 870 213 355 115 152

Venn diagram of different modalities relevant retrieved document set in combined result set

Impact of different weights of modalities on Late Fusion
Rmixed
3 1479 261 1437 219 219 1218 42 7 286 0
2.5 1515 337 1397 219 219 1178 118 47 210 0
2.0 1484 429 1274 219 219 1055 210 170 118 0
1.7 1385 481 1123 219 219 904 262 321 66 0
1.5 1246 506 959 219 219 740 287 485 41 0
1.0 720 543 396 219 219 177 324 1048 4 0
0.7 560 547 232 219 219 13 328 1212 0 0

Details of ICMR in response to query #18
Modality:                       Text     Visual   Mixed
Threshold at 1000th top score   0.4053   0.8677   0.6486

Document ID        Textual  Visual  Mixed
1471-213X-4-16-2   0.3029   0.7768  1.2919
1471-213X-4-16-3   0.3242   0.8055  1.3567
1471-213X-4-16-5   0.3223   0.7837  1.3318

Conclusion
In this study, we found that:
Effective combination of the textual and visual modalities improves the overall performance of content-based medical image retrieval systems.
Integrated retrieval clearly outperforms the other fusion techniques, whether late or early fusion, in multimodal CBIR systems.
In the best combination of the textual and visual modalities, the weight of the textual modality is about 1.7 times the weight of the visual modality.
Documents common to the relevant retrieved sets of the individual modalities also appear in the relevant retrieved set of the combined modality, regardless of the weights or methods used.

Future directions
Our study can be extended in several ways:
applying and verifying these experimental results on other medical image collections;
extending our study to domains of CBIR systems other than the medical domain;
implementing an effective and efficient tool for applying ICMR.

Thanks for your kind consideration.