
IPAL at ImageClef 2007

Mixing Features, Models and Knowledge

Sheng Gao

IPAL French-Singaporean Joint Lab, Institute for Infocomm Research

Singapore

IPAL at ImageClef’07 Team members

Jean-Pierre Chevallet, IPAL & CNRS, France (photo & medical)

Thi Hoang Diem Le, I2R, Singapore (photo & medical)

Trong Ton Pham, I2R, Singapore (photo)

Joo Hwee Lim, I2R, Singapore (photo)

Outline

Ad-hoc photographic image retrieval (ImageCLEFphoto)

Content based image retrieval (CBIR) system

Text based information retrieval (TBIR) system

Mix-modality retrieval system

Benchmark results

Summary

Ad-hoc medical image retrieval (ImageCLEFmed)

UMLS based medical retrieval system

Benchmark results

Summary

ImageCLEFphoto’07 vs. ImageCLEFphoto’06

Less text information is available in ImageCLEFphoto’07.

Image text annotations: notes are excluded this year.

Query: only the annotations in the title field are used.

Visual information plays a more important role in ImageCLEFphoto’07 than in ImageCLEFphoto’06.

Query image samples are excluded from the image database.

244 new images are added to the image database.

Similar text annotation, different visual representation

01/1311 01/1310 Title: Accommodation Huanchaco - Exterior View

Title: Accommodation Huanchaco - Interior View

Visual evidence plays a critical role in this case.

Low-level visual features

COR: auto color correlogram, 324 dimensions.

HSV: 166-dimensional histogram in HSV (162 dimensions, 18*3*3) and gray image (4 dimensions).

GABOR: 48 dimensions, including means and variances at 2 scales and 12 orientations in 5x5 grids.

SIFT: 128-dimensional appearance feature.

EDGE: Canny edge histogram, 80 dimensions (6 orientations * 16 patches).

HSV_UNI: 96-dimensional HSV histogram, 32 bins per channel.

GABOR_global: 60 dimensions, including means and variances at 5 scales and 6 orientations in the whole image.
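As an illustration of the simplest feature in this list, the sketch below builds a 96-dimensional HSV_UNI-style vector from three 32-bin histograms, one per channel. The function name `hsv_uni_histogram`, the [0, 1] scaling of the H, S and V values, and the per-pixel normalization are assumptions of this sketch, not details given on the slide.

```python
import numpy as np

def hsv_uni_histogram(hsv_pixels, bins_per_channel=32):
    """Concatenate per-channel histograms of an HSV image into one vector.

    hsv_pixels: array of shape (..., 3) with H, S, V scaled to [0, 1]
    (the scaling is an assumption of this sketch).
    """
    pixels = np.asarray(hsv_pixels, dtype=float).reshape(-1, 3)
    parts = []
    for channel in range(3):
        counts, _ = np.histogram(pixels[:, channel],
                                 bins=bins_per_channel, range=(0.0, 1.0))
        parts.append(counts)
    hist = np.concatenate(parts).astype(float)
    # Divide by the pixel count so that each channel block sums to 1.
    return hist / max(pixels.shape[0], 1)

rng = np.random.default_rng(0)
image = rng.random((24, 32, 3))   # random stand-in for a real HSV image
feature = hsv_uni_histogram(image)
print(feature.shape)
```

With 32 bins for each of the three channels, the vector length matches the 96 dimensions quoted for HSV_UNI.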

Indexing and similarity measure: Indexing

tf-idf: index histogram, each bin treated as a word.

SVD: indexing in the eigen-space (80% of the eigenvectors are kept).

ISM: Integrated Statistic Model (ISM) based supervised learning is used to learn the ranking function (refer to S. Gao, et al., ACM Multimedia’07).

HME: Hidden Maximum Entropy (HME) based supervised learning is used for the ranking function (refer to S. Gao, et al., ICME’07 and ICIP’07).

WORD_SVD: index at intermediate concepts.

200 frequent keywords are the intermediate concepts.

HME models are trained for the 200 concepts.

Images are indexed in the 200-dim concept space.

WORD_ISM: ISM is used for the ranking function on the 200-dim feature.

BoV: index with the bag-of-visterms (1000 visterms).
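The tf-idf idea of treating each histogram bin as a word can be sketched as follows. This is only one plausible reading: the smoothed idf formula and the name `tfidf_index` are assumptions, since the slides do not give the exact weighting.

```python
import numpy as np

def tfidf_index(histograms):
    """Weight each histogram bin like a word: term frequency times idf."""
    counts = np.asarray(histograms, dtype=float)
    n_images = counts.shape[0]
    # tf: bin count normalized by the total count of the image
    totals = counts.sum(axis=1, keepdims=True)
    tf = counts / np.where(totals == 0, 1.0, totals)
    # idf (smoothed variant, an assumption): bins that are non-empty
    # in many images receive a lower weight
    df = (counts > 0).sum(axis=0)
    idf = np.log((1.0 + n_images) / (1.0 + df)) + 1.0
    return tf * idf, idf

# Three toy images described by 3-bin histograms
hists = np.array([[4, 0, 1],
                  [2, 2, 0],
                  [0, 0, 5]])
index, idf = tfidf_index(hists)
print(index.round(3))
```

Bins that fire in few images (the middle bin here) end up with the largest idf, which is the same effect rare words have in text retrieval.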

Indexing and similarity measure: Similarity measure

Cosine distance if the image is indexed by one feature vector, e.g. tf-idf, SVD, etc.

Likelihood ratio between the positive and negative models, if supervised learning is used, e.g. ISM, HME.
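Both similarity measures can be sketched in a few lines. The helper names are hypothetical, and the likelihood ratio is written in the log domain on the assumption that the positive and negative models return log-likelihoods; the ISM and HME models themselves are not reproduced here.

```python
import numpy as np

def cosine_similarity(query_vec, index_matrix):
    """Cosine between one query vector and every row of the index."""
    q = np.asarray(query_vec, dtype=float)
    m = np.asarray(index_matrix, dtype=float)
    denom = np.linalg.norm(m, axis=1) * np.linalg.norm(q)
    denom = np.where(denom == 0, 1.0, denom)   # guard against empty vectors
    return (m @ q) / denom

def log_likelihood_ratio(log_p_positive, log_p_negative):
    """Score = log P(x | positive model) - log P(x | negative model)."""
    return np.asarray(log_p_positive) - np.asarray(log_p_negative)

index = np.array([[1.0, 0.0],
                  [1.0, 1.0],
                  [0.0, 2.0]])
scores = cosine_similarity([1.0, 0.0], index)
ranking = np.argsort(-scores)   # most similar image first
print(ranking, scores.round(3))
```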

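CBIR system: example fusion structure

[Diagram: each visual feature (COR, HSV, GABOR, SIFT) goes through indexing and ranking (tf-idf, SVD, ISM, HME, WORD_SVD, WORD_ISM); the results are fused per feature (cor, hsv, gabor, sift) and then fused again into the final fusion system.]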
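A weighted late fusion of several runs could look like the sketch below. The min-max normalization, the function names and the example weights are assumptions made for illustration; the slides only state that runs are combined with empirically tuned weights.

```python
import numpy as np

def min_max_normalize(scores):
    """Map a score list to [0, 1] so that runs become comparable."""
    s = np.asarray(scores, dtype=float)
    span = s.max() - s.min()
    return np.zeros_like(s) if span == 0 else (s - s.min()) / span

def fuse_runs(run_scores, weights):
    """Weighted sum of min-max normalized score lists, one list per run."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    normalized = np.stack([min_max_normalize(s) for s in run_scores])
    return weights @ normalized

runs = [[0.2, 0.9, 0.4],       # e.g. scores from a COR-based run
        [10.0, 30.0, 50.0]]    # e.g. scores from a SIFT-based run
fused = fuse_runs(runs, weights=[0.5, 0.5])
print(fused.round(3))
```

With two runs and weights w and 1-w, the same function covers a simple linear combination of a visual and a textual system.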

TBIR system: processing XML annotation files

[Diagram: XML reader -> Remove stop words -> Stemming -> Lexicon]
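The preprocessing chain can be sketched with the Python standard library. The XML layout (`DOC` and `TITLE` tags), the tiny stop-word list and the crude suffix stripper are all stand-ins chosen for this example; a real system would use the collection's actual schema and a proper stemmer such as Porter's.

```python
import re
import xml.etree.ElementTree as ET

# Tiny illustrative stop-word list (an assumption, not the list used by the authors)
STOP_WORDS = {"a", "an", "and", "in", "is", "of", "on", "the", "with"}

def naive_stem(word):
    """Crude suffix stripping; a stand-in for a real stemmer."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def process_annotation(xml_text, field="TITLE"):
    """XML reader -> stop-word removal -> stemming for one annotation file."""
    root = ET.fromstring(xml_text)
    text = root.findtext(field, default="")
    tokens = re.findall(r"[a-z]+", text.lower())
    return [naive_stem(t) for t in tokens if t not in STOP_WORDS]

def build_lexicon(documents):
    """Assign an integer id to every distinct term, in order of appearance."""
    lexicon = {}
    for tokens in documents:
        for token in tokens:
            lexicon.setdefault(token, len(lexicon))
    return lexicon

sample = "<DOC><TITLE>Accommodation Huanchaco - Exterior View</TITLE></DOC>"
tokens = process_annotation(sample)
lexicon = build_lexicon([tokens])
print(tokens, lexicon)
```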

TBIR system: language model based IR

A document-dependent language model (LM) is estimated over the lexicon for each document D in the database: P(w1|D), P(w2|D), ..., P(wn|D).

Documents are ranked according to the probability that the query Q = q1, q2, ..., qm is generated by the document-dependent LM:

$$P(Q \mid D) = \prod_{i=1}^{m} P(q_i \mid D)$$

[Diagram: Lexicon -> Estimate LM]
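A query-likelihood ranking can be sketched as below. The slides do not say how the document models are smoothed, so the Jelinek-Mercer interpolation with the collection model, the value of `lam` and all names are assumptions of this example.

```python
import math
from collections import Counter

def query_log_likelihood(query, doc, collection_counts, collection_size, lam=0.5):
    """log P(Q|D) = sum_i log P(q_i|D), with Jelinek-Mercer smoothing (assumed)."""
    doc_counts = Counter(doc)
    doc_len = len(doc)
    score = 0.0
    for term in query:
        p_doc = doc_counts[term] / doc_len if doc_len else 0.0
        p_coll = collection_counts[term] / collection_size
        p = (1 - lam) * p_doc + lam * p_coll
        if p == 0.0:                 # term unseen in the whole collection
            return float("-inf")
        score += math.log(p)
    return score

docs = {
    "d1": ["church", "exterior", "view"],
    "d2": ["hotel", "interior", "view"],
}
collection_counts = Counter(t for d in docs.values() for t in d)
collection_size = sum(collection_counts.values())
query = ["exterior", "view"]
ranking = sorted(
    docs,
    key=lambda d: query_log_likelihood(query, docs[d], collection_counts, collection_size),
    reverse=True,
)
print(ranking)
```

Working in the log domain avoids numerical underflow when the product runs over many query terms.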

TBIR system: Latent Semantic Indexing (LSI) based IR

[Diagram: Lexicon -> Term-document matrix -> SVD -> Eigen-space -> Index in eigen-space]

Cosine distance is used as the similarity measure in LSI space.
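The LSI pipeline can be sketched with a truncated SVD from NumPy. The toy term-document matrix, the choice k = 2 and the function names are assumptions for illustration; the query is folded into the eigen-space by projecting it onto the kept left singular vectors.

```python
import numpy as np

def lsi_fit(term_doc, k):
    """Truncated SVD of a term-document matrix (terms in rows)."""
    u, s, vt = np.linalg.svd(np.asarray(term_doc, dtype=float), full_matrices=False)
    u_k, s_k = u[:, :k], s[:k]
    doc_vectors = vt[:k].T * s_k      # documents indexed in the k-dim eigen-space
    return u_k, doc_vectors

def lsi_rank(query_vec, u_k, doc_vectors):
    """Fold the query into the eigen-space and rank documents by cosine."""
    q = np.asarray(query_vec, dtype=float) @ u_k
    norms = np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(q)
    sims = (doc_vectors @ q) / np.where(norms == 0, 1.0, norms)
    return np.argsort(-sims), sims

# Rows: church, hotel, view, beach; columns: three toy documents
term_doc = np.array([[1, 0, 0],
                     [0, 1, 0],
                     [1, 1, 0],
                     [0, 0, 2]])
u_k, doc_vectors = lsi_fit(term_doc, k=2)
ranking, sims = lsi_rank([1, 0, 0, 0], u_k, doc_vectors)   # query: "church"
print(ranking, sims.round(3))
```

In this toy case the second document does not contain "church" but shares "view" with the first, so both score highly in the reduced space while the unrelated third document scores near zero.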

TBIR system: external knowledge

Access Wikipedia to extract external knowledge for query / document expansion.

4,881,983 pages are downloaded (Wikipedia in English, April 2, 2007).

23,399 animal terms are extracted, which are useful for queries 5, 20 and 35.

709 geographical terms are extracted, of which only mountain should be useful for queries 4 and 44.

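Mix-modality system

Linear combination

[Diagram: CBIR and TBIR outputs are combined with weights w and 1-w into the mix-system.]

Cross-modality pseudo-relevance feedback (PRF)

One scheme: CBIR to boost TBIR

[Diagram: the top N image documents from CBIR drive query expansion of the query words sent to TBIR.]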
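The "CBIR to boost TBIR" scheme can be sketched as a query expansion step. How expansion terms are selected is not stated on the slide; picking the most frequent annotation terms of the top-N images, and every name below, is an assumption of this example.

```python
from collections import Counter

def expand_query_with_cbir(query_terms, cbir_ranking, annotations, top_n=2, n_terms=1):
    """Add the most frequent annotation terms of the top-N CBIR images to the text query."""
    counts = Counter()
    for image_id in cbir_ranking[:top_n]:
        counts.update(annotations.get(image_id, []))
    for term in query_terms:          # do not re-add original query words
        counts.pop(term, None)
    expansion = [term for term, _ in counts.most_common(n_terms)]
    return list(query_terms) + expansion

annotations = {
    "img1": ["church", "exterior", "tower"],
    "img2": ["church", "tower", "square"],
    "img3": ["beach", "sunset"],
}
cbir_ranking = ["img1", "img2", "img3"]   # visual ranking for the query image
expanded = expand_query_with_cbir(["church"], cbir_ranking, annotations)
print(expanded)
```

The expanded query is then submitted to the text retrieval system in place of the original query words.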

Results

27 runs are submitted, including CBIR, TBIR and mix-modality runs.

Best run MAP: 0.2833; 6th place among 476 runs; 2nd place among automatic runs.

IPAL_04V_12RUNS WEIGHT: combine 12 CBIR runs using the empirically tuned weights.

IPAL_11TrV_LM_12RUNSVISUAL: LM-based TBIR plus the PRF.

CBIR best run MAP: 0.1204 without PRF, 4th place among CBIR.

1st run from INAOE (MAP: 0.1925) and the 2nd run from XRCE (MAP: 0.1890).

Results

Our TBIR best run MAP: 0.1806 with automatic feedback, 7th place among TBIR.

19TiV_WTmM S0M2D0.8C6T6: with a very small thesaurus manually extracted from Wikipedia. These are terms that are not from info boxes but are relevant to this collection.

Using a black and white image detector based on the HSV values of the image.

Run15: LM, document expansion with automatic Wikipedia.

Top TBIR runs’ MAP: 0.2020 without feedback (Budapest) and 0.2075 with feedback (XRCE).

PRF analysis

CBIR system: feedback from the TBIR has little effect.

MAP of the HSV-based CBIR (run 02) is only increased to 0.0693 from 0.0684 (run 01).

Combining 12 CBIR PRF runs, MAP is increased to 0.1358 (run 05) from 0.1204 (run 04).

TBIR system: feedback from CBIR significantly improves MAP.

MAP of the LM-based TBIR is 0.1377 (run 08). With PRF from the CBIR run 04, MAP reaches 0.2442 (run 11).

Summary on ImageCLEFphoto

Combining rich visual content representations and indexing techniques significantly improves the CBIR system compared with any individual visual system.

CBIR based pseudo-relevance feedback significantly boosts the text based search system.

Exploiting external knowledge such as Wikipedia gives an extra bonus; however, it is less effective than expected. Its large size causes confusion due to the lack of disambiguation.

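ImageCLEFmed - Bayesian network based approach

Conceptualization

Knowledge base: UMLS Metathesaurus (NLM).

[Diagram: images / texts in English, French and German are mapped to UMLS concepts using TreeTagger, Metamap and XIotamap. The Bayesian network has a query node q, a document node d, and concept nodes c1, c2, ..., cj, ..., cm, cn.]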


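Retrieval process

Document D observed: P(D) = 1

Concept weight: $w_c = tf_c \cdot idf_c$

$$P(c, D) = P(c \mid D)\, P(D) = \frac{w_c}{\sqrt{\sum_{c' \in D} w_{c'}^{2}}}$$

[Diagram: Bayesian network with nodes q, d and concept nodes c1, c2, ..., cj, ..., cm, cn]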


Inference via semantic links from document concept nodes to query concept nodes

$$P(c_q) = \frac{\max_i \big( P(c_q \mid pa_i(c_q))\, P(pa_i(c_q)) \big)}{\max_i \big( P(pa_i(c_q)) \big)}$$

L: maximum length of the UMLS taxonomy

l: minimal length of the path between 2 concepts

pa(c): document concept nodes which are parent nodes of c


Relevance status value, RSV(q,d): belief at q

$$RSV(q, d) = bel(q) = \frac{\sum_{c_i \in q} w_{c_i}\, P(c_i)}{\sum_{c_i \in q} w_{c_i}}$$

Results

Run                    Isa   PAR-CHD  BR-RN  RL     RQ     MAP     R-Prec
IPAL-TXT-BAY-ISA0.1    0.1   0        0      0      0      0.3057  0.332
IPAL-TXT-BAY-ALLREL2   0.2   0.01     0.01   0      0      0.3042  0.333
IPAL-TXT-BAY-ALLREL1   0.2   0.01     0.01   0.001  0.001  0.304   0.3338
IPAL-TXT-BAY-ISA0.2    0.2   0        0      0      0      0.3039  0.3255
IPAL-TXT-BAY-ISA0.3    0.3   0        0      0      0      0.2996  0.3212
IPAL-TXT-BAY-ISA0.4    0.4   0        0      0      0      0.2935  0.3177
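For readers unfamiliar with the two measures reported here, the sketch below computes average precision (whose mean over queries is MAP) and R-precision for a ranked list. These are the standard textbook definitions; the function names and the toy data are illustrative and do not come from the benchmark.

```python
def average_precision(ranking, relevant):
    """Mean of the precision values at the rank of each relevant document."""
    relevant = set(relevant)
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0

def r_precision(ranking, relevant):
    """Precision within the top R results, where R is the number of relevant documents."""
    relevant = set(relevant)
    r = len(relevant)
    return sum(doc in relevant for doc in ranking[:r]) / r if r else 0.0

def mean_average_precision(runs):
    """runs: list of (ranking, relevant) pairs, one per query."""
    return sum(average_precision(rk, rel) for rk, rel in runs) / len(runs)

ranking = ["a", "b", "c", "d"]
relevant = {"a", "c"}
ap = average_precision(ranking, relevant)
rp = r_precision(ranking, relevant)
print(round(ap, 4), rp)
```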

Summary on ImageCLEFmed

The Bayesian model approach exploits the semantic relationships between document concepts and query concepts in a unified framework.

It enhances the VSM by using the semantic relatedness between concepts.

Improving the relationship weighting and the performance of the model is left for further study.
