Cross-media Retrieval based on Subspace Mapping
Hong Zhang
Wuhan University of Science & Technology, Wuhan, China
11 February, 2014
outline
background:
content-based image retrieval
cross-media retrieval:
challenging issues
algorithm & discussion
Content-Based Image Retrieval (CBIR)
1. Technical Background:
the earliest, typical and long standing
[Ref: Efficient Manifold Ranking for Image Retrieval. SIGIR 2011]
The first image, in the red box, is the query example;
the rest are the top 5 returns, ranked by similarity.
[Ref: Content-based multimedia information retrieval: state-of-the-art and challenges. ACM Trans. Multimedia Comput. Commun.]
(a) Visual Content Representation
Key Issues of CBIR:
image database
feature extraction
high-dimensional feature space
color
shape
texture
more accurate and more efficient?
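As a minimal sketch of the color part of this feature-extraction step, the snippet below turns an RGB image into a fixed-length joint color histogram. The function name `color_histogram`, the 4-bins-per-channel quantization and the toy image are illustrative assumptions, not the features used in the talk.

```python
import numpy as np

def color_histogram(image, bins_per_channel=4):
    """Return an L1-normalized joint RGB histogram as a flat feature vector."""
    pixels = np.asarray(image, dtype=np.uint8).reshape(-1, 3)
    # Quantize each 0..255 channel value into bins_per_channel levels.
    quantized = (pixels.astype(np.int64) * bins_per_channel) // 256
    # Combine the three per-channel bin indices into one joint bin index.
    index = (quantized[:, 0] * bins_per_channel + quantized[:, 1]) * bins_per_channel + quantized[:, 2]
    hist = np.bincount(index, minlength=bins_per_channel ** 3).astype(np.float64)
    return hist / hist.sum()

# Hypothetical 2x2 RGB image: two pure-red pixels, one green, one blue.
toy_image = np.array([[[255, 0, 0], [255, 0, 0]],
                      [[0, 255, 0], [0, 0, 255]]], dtype=np.uint8)
feature = color_histogram(toy_image)  # 64-dimensional feature vector
```

Each image becomes one point in a 64-dimensional space here; finer quantization or added shape and texture descriptors raise the dimensionality, which is where the accuracy versus efficiency question comes from.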
(b) Similarity metric
A, B: what is the similarity value between samples A and B?
Similarity =
linear distance in the Euclidean space
[Ref: Optimizing learning in image retrieval. CVPR 2000]
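A minimal sketch of linear-distance retrieval: rank database items by their Euclidean distance to the query feature vector. The helper `rank_by_euclidean` and the toy 2-D features are hypothetical, chosen only to illustrate the metric.

```python
import numpy as np

def rank_by_euclidean(query, database, top_k=5):
    """Return indices of the top_k database rows closest to query, and their distances."""
    database = np.asarray(database, dtype=np.float64)
    distances = np.linalg.norm(database - np.asarray(query, dtype=np.float64), axis=1)
    # A smaller distance means a higher similarity, so sort in ascending order.
    order = np.argsort(distances, kind="stable")[:top_k]
    return order, distances[order]

# Hypothetical 2-D feature vectors for four database images.
toy_database = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 4.0], [0.5, 0.5]])
order, dists = rank_by_euclidean([0.0, 0.0], toy_database, top_k=3)
```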
predefined image representation model
fused dimension1
fused dimension2
fused dimension3
[Ref: Learning an Image Manifold for Retrieval. ACM Multimedia Conference. 2004]
Similarity =
nonlinear distance in an embedded manifold
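One common way to approximate a manifold distance, sketched here under stated assumptions, is to connect each sample to its nearest neighbors and take shortest-path lengths through that graph (the idea behind Isomap). This is an illustration of the concept, not the method of the cited paper; `geodesic_distances` and the semicircle data are hypothetical.

```python
import numpy as np

def geodesic_distances(points, n_neighbors=2):
    """Approximate manifold distances by shortest paths on a k-nearest-neighbor graph."""
    points = np.asarray(points, dtype=np.float64)
    n = len(points)
    euclid = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    graph = np.full((n, n), np.inf)
    np.fill_diagonal(graph, 0.0)
    for i in range(n):
        # Position 0 of the sorted row is the point itself, so skip it.
        neighbors = np.argsort(euclid[i], kind="stable")[1:n_neighbors + 1]
        graph[i, neighbors] = euclid[i, neighbors]
        graph[neighbors, i] = euclid[i, neighbors]  # keep the graph symmetric
    # Floyd-Warshall: allow paths that pass through each intermediate node k.
    for k in range(n):
        graph = np.minimum(graph, graph[:, [k]] + graph[[k], :])
    return graph, euclid

# Seven samples on a semicircle: the two ends are close in a straight line
# but far apart when the path has to follow the curve.
angles = np.linspace(0.0, np.pi, 7)
arc = np.column_stack([np.cos(angles), np.sin(angles)])
geo, euclid = geodesic_distances(arc, n_neighbors=2)
```

For the two end points, `euclid[0, -1]` is the chord length 2, while `geo[0, -1]` follows the arc and is noticeably larger.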
Whether a linear or a nonlinear distance works better depends on the properties of the selected database.
Suppose the following is a visualized data distribution model:
2. Cross-media Retrieval: (my research interests)
To meet the need to query different types of multimedia data,
cross-media retrieval emerged from CBIR research.
CBIR
process different types of multimedia data
support flexible retrieval of different data
cross-media retrieval
submit a query
example
image database audio database
return similar
images and audios
a sketch map of cross-media retrieval
[Ref: Harmonizing Hierarchical Manifolds for Multimedia Document Semantics Understanding and Cross-Media Retrieval. IEEE Transactions on Multimedia, 2011]
semantic similarity measure
semantic similarity measure
2.1 Main challenging issues
Image: feature extraction and formalization, visual feature vectors (color, shape, texture), machine learning
Audio: feature extraction and formalization, auditory feature vectors (e.g. time domain, frequency domain), machine learning
Feature heterogeneity
How can heterogeneous features be represented?
Suppose we already have a cross-media representation model for both image and audio data:
if A and B are close under the distance measure, they should also be ranked close in semantics.
A
B
cross-media representation model
visual axis 1
visual axis 2
auditory axis 1
geometrical distance?
semantic distance?
multi-feature fusion &
correlation learning
cross-media content representation
the flowchart of my work
graph-based multimodal ranking
Paper publication:
Journal: Neurocomputing, vol. 119, 2013; Conference: ICIP 2013, PCM 2013
audio feature vectors, image feature vectors
low-dimensional &
isomorphic subspace
for cross-media
similarity measure
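The "graph-based multimodal ranking" stage of the flowchart can be illustrated with classical manifold ranking, which spreads a query's score over a similarity graph. This is a sketch of that generic technique, assuming all samples already live in one shared subspace; it is not the specific ranking algorithm of the talk, and `manifold_ranking`, `alpha` and `sigma` are illustrative names and values.

```python
import numpy as np

def manifold_ranking(features, query_index, alpha=0.9, sigma=1.0):
    """Closed-form manifold ranking: scores = (I - alpha * S)^-1 y."""
    features = np.asarray(features, dtype=np.float64)
    n = len(features)
    sq_dist = np.sum((features[:, None, :] - features[None, :, :]) ** 2, axis=2)
    affinity = np.exp(-sq_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(affinity, 0.0)  # no self-loops
    # Symmetric normalization S = D^-1/2 W D^-1/2.
    inv_sqrt_degree = 1.0 / np.sqrt(affinity.sum(axis=1))
    S = affinity * inv_sqrt_degree[:, None] * inv_sqrt_degree[None, :]
    y = np.zeros(n)
    y[query_index] = 1.0  # only the query carries an initial score
    return np.linalg.solve(np.eye(n) - alpha * S, y)

# Hypothetical shared-subspace coordinates: two well-separated groups of three.
subspace_points = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                            [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
scores = manifold_ranking(subspace_points, query_index=0)
ranking = np.argsort(-scores)  # best match first
```

With the query in the first group, the three highest-scoring samples are the members of that group, regardless of which modality each point originally came from.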
2.2 Algorithm & discussion
semi-supervised fusion
hierarchical linear
regression model
intra-modality
inter-modality
deep learning, neural network
(performance optimization)
multi-feature fusion & correlation learning
2.2.1 Problem formalization: step 1
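the objective function:
$$\min \sum_{g=1}^{v} \sum_{i=1}^{n} \left( \left\| (X_g^i)^T w_g^i + 1_k b_g^i - Y_g^i \right\|_F^2 + \lambda \left\| w_g^i \right\|_F^2 \right) + \mu \sum_{g=1}^{v} \left( \left\| X_g^T W_g + 1_n B_g - Y_g \right\|_F^2 + \lambda \left\| W_g \right\|_F^2 \right)$$
first sum: local hierarchical regression
second sum: global regression matrix for new data
λ, μ: trade-off weights (symbols assumed)
intra-modality
inter-modality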
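The slide names a linear regression model with a global regression matrix for new data. As a minimal sketch of that building block (an assumption about its general form, not the full hierarchical objective), ridge regression with a bias term minimizes ||X^T W + 1 b^T - Y||_F^2 + λ||W||_F^2 and has a closed-form solution. The names `ridge_with_bias` and `lam`, and the toy data, are hypothetical.

```python
import numpy as np

def ridge_with_bias(X, Y, lam=1.0):
    """Solve min ||X^T W + 1 b^T - Y||_F^2 + lam ||W||_F^2.

    X is d x n (one column per sample), Y is n x c. Returns W (d x c) and b (c,).
    """
    X = np.asarray(X, dtype=np.float64)
    Y = np.asarray(Y, dtype=np.float64)
    d = X.shape[0]
    x_mean = X.mean(axis=1, keepdims=True)
    y_mean = Y.mean(axis=0)
    # Centering the data removes the bias from the problem; it is recovered afterwards.
    Xc = X - x_mean
    Yc = Y - y_mean
    W = np.linalg.solve(Xc @ Xc.T + lam * np.eye(d), Xc @ Yc)
    b = y_mean - W.T @ x_mean[:, 0]
    return W, b

# Toy data generated from a known linear map, so the fit can be checked.
X_toy = np.array([[0.0, 1.0, 2.0, 3.0, 4.0],
                  [1.0, 0.0, 1.0, 0.0, 1.0]])
W_true = np.array([[2.0], [-1.0]])
Y_toy = X_toy.T @ W_true + 0.5
W_fit, b_fit = ridge_with_bias(X_toy, Y_toy, lam=1e-9)

# A sample outside the training set is scored with the learned global map.
new_sample = np.array([1.5, 0.5])
prediction = W_fit.T @ new_sample + b_fit
```

Because W and b are global, a sample that was not in the training set can be mapped directly, without re-solving the optimization.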
2.2.1 Problem formalization: step 2
the objective function:
covariance matrices corresponding to original image and audio features
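The step-2 objective itself is not reproduced in this transcript; only its annotation about covariance matrices survives. One classical subspace-mapping technique built on the covariance matrices of two feature sets is canonical correlation analysis (CCA), sketched below as a stand-in. It is not claimed to be the author's objective; `cca`, `reg` and the synthetic "image" and "audio" features are assumptions for illustration.

```python
import numpy as np

def cca(X, Y, n_components=1, reg=1e-6):
    """Canonical correlation analysis for paired rows of X (n x p) and Y (n x q)."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    # Covariance matrices of each modality and their cross-covariance.
    Cxx = X.T @ X / (n - 1) + reg * np.eye(X.shape[1])
    Cyy = Y.T @ Y / (n - 1) + reg * np.eye(Y.shape[1])
    Cxy = X.T @ Y / (n - 1)

    def inv_sqrt(C):
        vals, vecs = np.linalg.eigh(C)
        return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T

    Kx, Ky = inv_sqrt(Cxx), inv_sqrt(Cyy)
    # Singular vectors of the whitened cross-covariance give the projections.
    U, s, Vt = np.linalg.svd(Kx @ Cxy @ Ky)
    A = Kx @ U[:, :n_components]
    B = Ky @ Vt.T[:, :n_components]
    return A, B, s[:n_components]

# Synthetic paired data: both modalities depend on one shared latent factor z.
rng = np.random.default_rng(0)
z = rng.normal(size=200)
image_feats = np.column_stack([z + 0.1 * rng.normal(size=200),
                               rng.normal(size=200),
                               rng.normal(size=200)])
audio_feats = np.column_stack([rng.normal(size=200),
                               -2.0 * z + 0.1 * rng.normal(size=200)])
A, B, corr = cca(image_feats, audio_feats, n_components=1)

# Project both modalities into the shared one-dimensional subspace.
image_proj = (image_feats - image_feats.mean(axis=0)) @ A
audio_proj = (audio_feats - audio_feats.mean(axis=0)) @ B
```

After projection, image and audio samples have the same dimensionality, so a single distance measure can compare them directly.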
Example:
Simplified two-dimensional data distribution & comparison: traditional PCA method vs. our approach
(a) data distribution in PCA subspace (axes: PCA 1, PCA 2; labels: bird, dog, car)
(b) data distribution in cross-media subspace (axes: dimension 1, dimension 2; labels: bird, dog, car)
2.2.2 Experiment results and comparisons
(a) cross validation for different parameters
gains the best
performance
when 1,10
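As a sketch of how such a parameter sweep is commonly run, the snippet below performs k-fold cross validation over a grid of values for one regularization weight of a ridge regressor. The model, the grid and the synthetic data are placeholders and do not reproduce the experiment on the slide.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge regression weights for rows-as-samples X."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def cross_validate(X, y, lam, n_folds=5):
    """Mean validation error of ridge regression over n_folds contiguous folds."""
    errors = []
    for val_idx in np.array_split(np.arange(len(X)), n_folds):
        train_mask = np.ones(len(X), dtype=bool)
        train_mask[val_idx] = False  # hold this fold out of training
        w = ridge_fit(X[train_mask], y[train_mask], lam)
        errors.append(np.mean((X[val_idx] @ w - y[val_idx]) ** 2))
    return float(np.mean(errors))

# Synthetic regression data (placeholder for real retrieval features).
rng = np.random.default_rng(0)
X_data = rng.normal(size=(60, 5))
y_data = X_data @ np.array([1.0, -2.0, 0.0, 0.5, 3.0]) + 0.5 * rng.normal(size=60)

grid = [0.01, 0.1, 1.0, 10.0, 100.0]
cv_error = {lam: cross_validate(X_data, y_data, lam) for lam in grid}
best_lam = min(cv_error, key=cv_error.get)
```

With two parameters, the same loop runs over every pair of grid values and keeps the pair with the lowest validation error.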
(b) cross-media retrieval results:
Ref:
[2] Fusing inherent and external knowledge with nonlinear learning for cross-media retrieval. Neurocomputing, 2013
[6] Ranking with Local Regression and Global Alignment for Cross-Media Retrieval. ACM Multimedia, 2009
Main limitations:
offline: response time is neither required nor considered
large-scale real-world applications need lower computational complexity and timely responses
not directly compatible with multiple kinds of data
lacks a comprehensive cross-media correlation measure