Cross-media Retrieval based on Subspace Mapping

Hong Zhang

Wuhan University of Science & Technology, Wuhan, China

[email protected]

11 February, 2014

outline

background:

content-based image retrieval

cross-media retrieval:

challenging issues

algorithm & discussion

Content-Based Image Retrieval (CBIR)

1. Technical Background:

the earliest, typical and long-standing

[Ref: Efficient Manifold Ranking for Image Retrieval. SIGIR 2011]

the first image (in the red box) is the query example

the rest are the top 5 returns, ranked by their similarity to the query.

[Ref: Content-based multimedia information retrieval: state-of-the-art and challenges. ACM Trans. Multimedia Comput. Commun.]

(a) Visual Content Representation

Key Issues of CBIR:

image database

feature extraction

high-dimensional feature space

color

shape

texture

more accurate and more efficient?
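As one illustration of the feature extraction step, the sketch below turns an image into a color feature vector using a joint RGB histogram. The function name `color_histogram`, the bin count, and the random toy image are assumptions made for this example; the slides do not say which color, shape, or texture descriptors were actually used.

```python
import numpy as np

def color_histogram(image, bins=4):
    """Return a normalized joint RGB histogram as a 1-D feature vector.

    image: array of shape (height, width, 3) with values in [0, 255].
    """
    pixels = image.reshape(-1, 3).astype(float)
    hist, _ = np.histogramdd(pixels, bins=(bins, bins, bins),
                             range=((0, 256),) * 3)
    hist = hist.ravel()
    return hist / hist.sum()  # normalize so images of any size are comparable

# Toy image (random pixels), only to show the shape of the output.
rng = np.random.default_rng(0)
toy_image = rng.integers(0, 256, size=(32, 32, 3))
feature = color_histogram(toy_image)  # a point in a 64-dimensional feature space
```

Even this coarse 4 x 4 x 4 quantization gives 64 dimensions for color alone, which is why the feature space becomes high-dimensional once shape and texture descriptors are appended.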

(b) Similarity metric

What is the similarity value between samples A and B?

Similarity = linear distance in the Euclidean space
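A minimal sketch of ranking by Euclidean distance, assuming every database item is already a fixed-length feature vector. The helper `rank_by_euclidean` and the toy vectors are hypothetical and only illustrate the idea of returning the top-ranked items for a query example.

```python
import numpy as np

def rank_by_euclidean(query, database, top_k=5):
    """Return indices and distances of the top_k database rows closest to query."""
    dists = np.linalg.norm(database - query, axis=1)
    order = np.argsort(dists)[:top_k]
    return order, dists[order]

# Toy 2-D feature vectors; a smaller distance means a higher similarity.
database = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 4.0]])
query = np.array([0.0, 0.0])
order, dists = rank_by_euclidean(query, database, top_k=3)
```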

[Ref: Optimizing Learning in Image Retrieval. CVPR 2000]

predefined image representation model

fused dimension1

fused dimension2

fused dimension3

[Ref: Learning an Image Manifold for Retrieval. ACM Multimedia Conference. 2004]

Similarity = nonlinear distance in an embedded manifold
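One common way to approximate a nonlinear (manifold) distance is to connect each sample to its nearest neighbors and measure shortest paths through that graph. The sketch below does this in an Isomap-like fashion; it is an illustrative assumption, not the method of the cited paper, and `geodesic_distances` and `n_neighbors` are hypothetical names.

```python
import numpy as np

def geodesic_distances(points, n_neighbors=2):
    """Approximate manifold distances via shortest paths on a k-NN graph."""
    n = len(points)
    euclid = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    graph = np.full((n, n), np.inf)
    np.fill_diagonal(graph, 0.0)
    for i in range(n):
        nearest = np.argsort(euclid[i])[1:n_neighbors + 1]  # skip the point itself
        graph[i, nearest] = euclid[i, nearest]
        graph[nearest, i] = euclid[i, nearest]  # keep the graph symmetric
    for k in range(n):  # Floyd-Warshall all-pairs shortest paths
        graph = np.minimum(graph, graph[:, [k]] + graph[[k], :])
    return graph

# Samples on a half circle: the two end points are close in a straight line
# but far apart when measured along the curve.
angles = np.linspace(0.0, np.pi, 20)
arc = np.column_stack([np.cos(angles), np.sin(angles)])
geo = geodesic_distances(arc)
straight = np.linalg.norm(arc[0] - arc[-1])
```

Here `straight` is the diameter of the circle, while `geo[0, -1]` follows the arc and is therefore larger, which is the kind of difference that makes the choice of metric depend on how the data are distributed.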

Whether a linear or a nonlinear distance is better depends on the properties of the selected database.

Suppose the following is a visualized data distribution model:

2. Cross-media Retrieval: (my research interests)

To meet the need to query different types of multimedia data, cross-media retrieval emerged from CBIR research.

CBIR

process different types of multimedia data

support flexible retrieval of different data

cross-media retrieval

submit a query example

image database, audio database

return similar images and audio clips

a sketch map of cross-media retrieval

[Ref: Harmonizing Hierarchical Manifolds for Multimedia Document Semantics Understanding and Cross-Media Retrieval. IEEE Transactions on Multimedia, 2011]

semantic similarity measure

2.1 Main challenging issues

Image: feature extraction and formalization, giving visual feature vectors (color, shape, texture), followed by machine learning

Audio: feature extraction and formalization, giving auditory feature vectors (e.g. time-domain, frequency-domain), followed by machine learning

Feature heterogeneity

how to represent heterogeneous features

Suppose we already have a cross-media representation model for both image and audio data:

if A and B are close under the distance measure, they should also be ranked close in semantics.

[Figure: cross-media representation model with samples A and B; axes: visual axis 1, visual axis 2, auditory axis 1]

geometrical distance?

semantic distance?

multi-feature fusion & correlation learning

cross-media content representation

the flowchart of my work

graph-based multimodal ranking

Paper publication:

Journal: Neurocomputing, vol. 119, 2013; Conference: ICIP 2013, PCM 2013

audio feature vectors, image feature vectors

low-dimensional & isomorphic subspace for cross-media similarity measure

2.2 Algorithm & discussion

semi-supervised fusion

hierarchical linear regression model

intra-modality

inter-modality

deep learning, neural network (performance optimization)

multi-feature fusion & correlation learning

2.2.1 Problem formalization: step 1

the objective function:

global regression matrix for

new data

local hierarchical regression

intra-modality

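$$\min \sum_{g=1}^{v} \sum_{i=1}^{n} \left( \left\| (X_g^i)^T w_g^i + 1_k b_g^i - Y_g^i \right\|_F^2 + \lambda \left\| w_g^i \right\|_F^2 \right) + \sum_{g=1}^{v} \left( \left\| (X_g)^T W_g + 1_n B_g - Y_g \right\|_F^2 + \mu \left\| W_g \right\|_F^2 \right)$$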

inter-modality
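The global regression part of the objective fits a matrix W and a bias B so that X^T W + 1 B approximates Y, with a penalty on the norm of W. A minimal sketch of that single building block follows, assuming a ridge penalty weight `lam`, indicator-style targets, and the standard closed-form solution. The local hierarchical terms are not modeled here, and all names and toy data are hypothetical.

```python
import numpy as np

def ridge_regression(X, Y, lam=1.0):
    """Minimize ||X^T W + 1 b^T - Y||_F^2 + lam * ||W||_F^2.

    X: d x n (columns are samples), Y: n x c. Returns W (d x c) and b (c,).
    """
    d, n = X.shape
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix removes the bias term
    W = np.linalg.solve(X @ H @ X.T + lam * np.eye(d), X @ H @ Y)
    b = (Y - X.T @ W).mean(axis=0)
    return W, b

rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 20)   # two toy semantic classes, 40 samples
Y = np.eye(2)[labels]            # n x c indicator matrix shared by both modalities

# Toy class prototypes (assumed): 5-D "image" and 3-D "audio" features.
proto_img = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0], [2.0, -1.0]])
proto_aud = np.array([[1.0, -1.0], [0.0, 2.0], [3.0, 0.0]])
X_img = proto_img[:, labels] + 0.1 * rng.normal(size=(5, 40))
X_aud = proto_aud[:, labels] + 0.1 * rng.normal(size=(3, 40))

W_img, b_img = ridge_regression(X_img, Y)
W_aud, b_aud = ridge_regression(X_aud, Y)

# After mapping, both modalities live in the same c-dimensional subspace.
z_img = X_img.T @ W_img + b_img
z_aud = X_aud.T @ W_aud + b_aud
```

Because `z_img` and `z_aud` have the same dimensionality regardless of the original feature sizes, an image and an audio clip can be compared directly, for example by Euclidean distance between their mapped vectors.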

2.2.1 Problem formalization: step 2

covariance matrices corresponding to original image and audio features

the objective function:

Example: simplified two-dimensional data distribution & comparison

(a) data distribution in PCA subspace (traditional PCA method); axes: PCA 1, PCA 2; groups shown: "bird, dog" and "car"

(b) data distribution in cross-media subspace (our approach); axes: dimension 1, dimension 2; groups shown: "bird, dog" and "car"
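For reference, the PCA baseline in this comparison can be sketched in a few lines. This is the textbook SVD formulation on toy data, not the experimental setup of the slides; `pca_project` is a hypothetical helper name.

```python
import numpy as np

def pca_project(X, n_components=2):
    """Project rows of X (n x d) onto the top principal directions."""
    Xc = X - X.mean(axis=0)                        # center each feature
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T                # n x n_components scores

# Toy data with decreasing variance along each original dimension.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5)) * np.array([5.0, 2.0, 1.0, 0.5, 0.1])
scores = pca_project(X, n_components=2)
```

PCA keeps the directions of largest variance and never looks at semantic labels, so nothing in it encourages semantically related samples to end up close together in the projected space.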

2.2.2 Experiment results and comparisons

(a) cross validation for different parameters

gains the best

performance

when 1,10

(b) cross-media retrieval results:

Ref:

[2] Fusing inherent and external knowledge with nonlinear learning for cross-media retrieval. Neurocomputing, 2013

[6] Ranking with Local Regression and Global Alignment for Cross-Media Retrieval. ACM Multimedia, 2009

(c) performance evaluation with new data

Main limitations:

offline: response time is neither required nor considered

for large-scale real-world applications, computational complexity must be reduced to obtain a timely response

not directly compatible with multiple kinds of data

lacks a comprehensive cross-media correlation measure

Thanks!
