IIIT Hyderabad
Efficient Image Retrieval Methods For Large Scale Dynamic Image Databases
Suman Karthik
200407013
Advisor: Dr. C.V.Jawahar
Images
• Cheap imaging hardware
• Plummeting storage costs
• User-generated content
Image Databases
• Large scale – millions to billions of images
• Dynamic – highly dynamic in nature
[Chart: number of images on Flickr, December 2005 to November 2007, in millions]
CBIR
• Content-based IR – uses image content
• Pros – good quality; annotation agnostic
• Cons – inefficient; not scalable
[Figure: shape, color, and texture features]
Bag of Words
[Plate diagram: pLSA model with words w, topics z, documents d (N, D) – Hofmann, 2001]
Pipeline: Feature Extraction → Vector Quantization → Semantic Indexing → Index
• Compute SIFT descriptors [Lowe '99]
[Diagram: inverted index mapping a word W to documents D1, D2, D3]
* J. Sivic & A. Zisserman, 2003; D. Nistér & H. Stewénius, 2006; J. Philbin, J. Sivic, A. Zisserman et al., 2008
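The inverted index sketched above maps each visual word to the documents that contain it; a minimal illustration in Python (the identifiers are ours, not from the thesis):

```python
from collections import defaultdict

# Minimal inverted index over bag-of-visual-words documents.
# Each document is a list of quantized visual-word ids.
index = defaultdict(set)

def insert(doc_id, words):
    """Register every visual word of a document in the index."""
    for w in words:
        index[w].add(doc_id)

def query(words):
    """Return ids of documents sharing at least one visual word."""
    result = set()
    for w in words:
        result |= index[w]
    return result

insert("D1", [3, 7, 7, 42])
insert("D2", [7, 9])
insert("D3", [1, 42])
print(sorted(query([7, 42])))  # ['D1', 'D2', 'D3']
```

Insertion touches only the posting lists of the new document's words, which is what makes the index attractive for dynamic databases.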
Dynamic Databases
• Large scale
• New images added continuously
• High rate of change
• Nature of data not known a priori
[Diagram: images and videos arriving from the Internet]
Text vs. Images: Dynamic Databases
Text:
• Vocabulary known
• Rate of change of vocabulary low
• Stable vocabulary
Images:
• Vocabulary unknown
• Rate of change of vocabulary high
• Unstable vocabulary
Quantization and Semantic Indexing in Dynamic Databases
• As DB changes vocabulary is outmoded
• Updating vocabulary is too costly
• Not incremental
• Cannot keep up with rate of change
• As DB changes semantic index is invalid
• Updating semantic index is resource intensive
• Not incremental
• Cannot keep up with rate of change or scale
Dynamic Databases
[Diagram: Internet images and videos → Dynamic Database → Feature Extraction → Vector Quantization → Semantic Indexing → Index]
Quantization and semantic indexing methods are a bottleneck
Objective 1
A. Motivation: CBIR is inefficient and not scalable
B. Objective: Develop methods to improve the efficiency and scalability of CBIR
C. Contributions:
   C 1.1 – Virtual Textual Representation
   C 1.2 – A new efficient indexing structure
   C 1.3 – Relevance feedback methods that improve performance
Objective 2
A. Motivation: Quantization is a bottleneck for BoW when dealing with dynamic image databases
B. Objective: Develop an incremental quantization method for the BoW model to successfully deal with dynamic image databases
C. Contributions:
   C 2.1 – Incremental Vector Quantization
   C 2.2 – Comparison of retrieval performance with existing methods
   C 2.3 – Comparison of incremental quantization with existing methods
Objective 3
A. Motivation: Semantic indexing is not scalable for BoW when dealing with dynamic image databases
B. Objective: Develop an incremental semantic indexing method for the BoW model to successfully deal with dynamic image databases
C. Contributions:
   C 3.1 – Bipartite Graph Model
   C 3.2 – An algorithm for semantic indexing on the BGM
   C 3.3 – Search engines for images
CBIR
Literature
• Global image retrieval
• Region-based image retrieval
• Region-based relevance feedback
Costly nearest-neighbor-based retrieval
Spatial indexing
Relevance feedback heavily used
* Image Retrieval: Past, Present, and Future, Yong Rui, Thomas S. Huang, Shih-Fu Chang, International Symposium on Multimedia Information Processing, 1997
* Blobworld: A System for Region-Based Image Indexing and Retrieval, Chad Carson, Megan Thomas, Serge Belongie, Joseph M. Hellerstein, Jitendra Malik, Third International Conference on Visual Information Systems, 1999
* Region-Based Relevance Feedback in Image Retrieval, Feng Jing, Mingjing Li, Hong-Jiang Zhang, Bo Zhang, Proc. IEEE International Symposium on Circuits and Systems, 2002
Search
Transformation
[Figure: feature space quantized into bins represented by strings or words; axes: color, compactness, position]
Virtual Textual Representation
• Quantization
  – Uniform quantization (grid)
  – Density-based quantization (k-means)
• Each cell is a string
[Diagram: Image → Segmentation → Segments → Words, analogous to Document → Text → Words]
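Mapping a region's feature vector to a word string via uniform grid quantization can be sketched as follows (the cell size and string format are illustrative assumptions, not the thesis's exact scheme):

```python
def feature_to_word(features, cell=0.25):
    """Map a normalized feature vector (values in [0, 1)) to a string
    by uniform grid quantization: each dimension's cell index becomes
    one token of the resulting 'word'."""
    return "-".join(str(int(f // cell)) for f in features)

# A region described by (color, compactness, position), all in [0, 1)
print(feature_to_word([0.1, 0.6, 0.9]))  # 0-2-3
```

Regions falling in the same grid cell share the same string, so standard text indexing machinery applies directly.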
CBIR Indexing
• Spatial databases
• Relevance feedback skews the feature space, rendering spatial databases inefficient*.
* Indexing for Relevance Feedback Image Retrieval, Jing Peng, Douglas R. Heisterkamp, Proceedings of the IEEE International Conference on Image Processing (ICIP '03)
Elastic Bucket Trie
[Diagram: insert – strings such as CAB and CBA fill a bucket until an overflow triggers a split into child nodes; query – BBC follows trie edges to its retrieved bucket]
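A simplified bucket trie illustrating the overflow-and-split behavior sketched above (the bucket capacity and per-symbol branching are our assumptions, not the exact EBT structure):

```python
class EBTNode:
    """Bucket-trie node: either a leaf holding a bucket of (key, value)
    pairs, or an internal node with children keyed by one symbol."""
    def __init__(self, capacity=2):
        self.capacity = capacity
        self.bucket = []       # leaf storage
        self.children = {}     # symbol -> EBTNode
        self.leaf = True

    def insert(self, key, value, depth=0):
        if self.leaf:
            self.bucket.append((key, value))
            if len(self.bucket) > self.capacity:   # overflow: split
                self.leaf = False
                items, self.bucket = self.bucket, []
                for k, v in items:
                    self._child(k, depth).insert(k, v, depth + 1)
        else:
            self._child(key, depth).insert(key, value, depth + 1)

    def _child(self, key, depth):
        sym = key[depth] if depth < len(key) else ""
        if sym not in self.children:
            self.children[sym] = EBTNode(self.capacity)
        return self.children[sym]

    def lookup(self, key, depth=0):
        """Return the bucket reached by following the key's symbols."""
        if self.leaf:
            return self.bucket
        sym = key[depth] if depth < len(key) else ""
        child = self.children.get(sym)
        return child.lookup(key, depth + 1) if child else []

trie = EBTNode()
for word in ["CAB", "CBA", "BBC", "BBA"]:
    trie.insert(word, word)
print([v for _, v in trie.lookup("BBC")])  # ['BBC', 'BBA']
```

Splitting only the overflowing bucket keeps inserts local, which is what makes the structure suitable for incrementally growing word collections.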
Relevance Feedback
[Diagram: query → retrieved results → relevance feedback loop]
Region-Importance-Based Relevance Feedback
[Diagram: relevant images → extracted words (KEYWORDS) → keyword selection → pseudo-image for the next iteration; errors in retrieval]
Discriminative Relevance Feedback
• Classification is given precedence over clustering.
• Discriminative segments become the keywords.
• Non-discriminative segments are ignored.
[Figure: SURFERS vs. WAVES, ROSES vs. FLOWERS]
Discriminative Relevance Feedback
[Diagram: relevant and irrelevant images → extracted words (KEYWORDS) → keyword selection → pseudo-image for the next iteration; no errors in retrieval]
Performance
Discriminative relevance feedback consistently outperforms the region-importance-based method.
[Charts: high F-score for discriminative relevance feedback, low F-score for region importance]
[Taxonomy: global vs. local image retrieval × spatial vs. non-spatial indexing × global/no relevance feedback vs. region-based relevance feedback – early CBIR, Blobworld (no indexing), SIMPLIcity (no indexing), and our work]
Analysis
• Relevance feedback algorithms need to be modified to work with text.
• Keywords emerge with relevance feedback signifying association between key segments.
• EBT can be used without any modifications with discriminative relevance feedback.
• Advent of Bag of Words model for image retrieval
Quantization
Literature
• K-means
• Hierarchical k-means
• K-means with soft assignment
Time-consuming offline quantization
Representative data available a priori
Quantization is not incremental
* Video Google: A Text Retrieval Approach to Object Matching in Videos, Josef Sivic, Andrew Zisserman, ICCV 2003
* Scalable Recognition with a Vocabulary Tree, D. Nistér and H. Stewénius, CVPR 2006
* Lost in Quantization: Improving Particular Object Retrieval in Large Scale Image Databases, James Philbin, Ondrej Chum, Michael Isard, Josef Sivic, Andrew Zisserman, CVPR 2008
Quantization Losses
• Perceptual loss – under-quantization, synonymy, poor precision
• Binning loss – over-quantization, polysemy, poor recall
Incremental Vector Quantization
• Control perceptual loss
• Minimize binning loss
• Create quality code books
• Data dependent
• Incremental in nature
Algorithm
[Figure: a quantization cell of radius r; L = 2]
• L: minimum cardinality of a cell
• Puts an upper bound on perceptual loss
• Builds quality codebooks by ignoring noise
• Soft bin assignment minimizes binning loss
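The incremental flavor of IVQ can be illustrated as follows: a feature joins an existing cell if it lies within radius r (bounding perceptual loss), otherwise it seeds a new cell, and cells with fewer than L members are dropped as noise. This is a hedged reconstruction from the slide's bullets, not the thesis's exact algorithm:

```python
import math

class IVQ:
    """Illustrative incremental vector quantizer: radius r bounds
    perceptual loss; cells with fewer than L members are treated as
    noise when the codebook is read out."""
    def __init__(self, r=1.0, L=2):
        self.r, self.L = r, L
        self.cells = []  # list of [centroid, members]

    def add(self, point):
        for cell in self.cells:
            c = cell[0]
            if math.dist(c, point) <= self.r:
                cell[1].append(point)
                n = len(cell[1])
                # incremental centroid update: c += (p - c) / n
                cell[0] = [ci + (pi - ci) / n for ci, pi in zip(c, point)]
                return
        self.cells.append([list(point), [point]])  # seed a new cell

    def codebook(self):
        return [c for c, members in self.cells if len(members) >= self.L]

q = IVQ(r=1.0, L=2)
for p in [(0.0, 0.0), (0.2, 0.1), (5.0, 5.0)]:
    q.add(p)
print(len(q.codebook()))  # 1: the isolated point (5, 5) is noise
```

Each insertion touches at most one cell, which is why the per-update cost stays constant as the database grows.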
An Experiment
• Given – all possible feature points in a feature space that could be generated by natural processes
• Quantize
  – k-means with a priori knowledge of the entire data
  – IVQ with no a priori information
• Performance
  – F-score
  – Time taken for incremental quantization
F-score
[Chart: F-score (0–0.8) vs. x-axis values 200–1000 for k-means, k-means soft, and IVQ; IVQ: 1115 bins, k-means: 1000 bins]
IVQ outperforms k-means
Time
IVQ outperforms k-means
• IVQ quantizes in 0.1 seconds; its time complexity is linear
• K-means takes 1000 seconds; its time complexity is exponential
Holiday Dataset
• Dataset: Holiday dataset – 1491 images, 500 categories
• Pre-processing
  – SIFT feature extraction
  – quantization using k-means
  – quantization using IVQ
Incremental Quantization
• Dataset: ALOI dataset – 100,000 images, added sequentially in 1000 batches of 100 images each
• Pre-processing
  – SIFT feature extraction
  – quantization using k-means / online k-means
  – quantization using IVQ
[Table: time per batch – S = seconds, D = days]
Analysis
• IVQ produces more bins than k-means (constant perceptual loss)
• IVQ is efficient because updates are local
• LSH is used to accelerate IVQ
• Semantic indexing can improve mAP
[Taxonomy: offline vs. online quantization × density-based vs. non-density-based × incremental vs. non-incremental – k-means, online k-means, regular lattice, adaptive vocabulary tree (global), IVQ (local)]
Semantic Indexing
Semantic Indexing
[Diagram: P(w|d) factored through latent topics – LSI, pLSA, and LDA cluster words (whippet, GSD, doberman; daffodil, tulip, rose) around latent topics (Animal, Flower)]
• Words clustered around latent topics
• Visual words clustered around latent topics
* Hofmann 1999; Blei, Ng & Jordan 2003; R. Lienhart and M. Slaney 2007
Literature
• Visual pLSA
• Visual LDA
• Spatial semantic indexing
High space complexity due to large matrix operations
Slow, resource-intensive offline processing
* Discovering Objects and Their Location in Images, Josef Sivic, Bryan Russell, Alexei A. Efros, Andrew Zisserman, Bill Freeman, ICCV 2005
* Image Retrieval on Large-Scale Image Databases, Eva Hörster, Rainer Lienhart, Malcolm Slaney, CIVR 2007
* Spatial Latent Dirichlet Allocation, X. Wang and E. Grimson, Proceedings of Neural Information Processing Systems Conference (NIPS) 2007
Bipartite Graph Model
• The vector space model is encoded as a bipartite graph of words and documents.
• TF values are retained as edge weights.
• IDF values are retained as term weights.
[Diagram: Cash Flow algorithm – words (subprime, reforms, war, Iraq, elections, democrats) with IDF term weights linked by TF-weighted edges to documents (Saddam Captured, Iraq Pullout, Obama Elected, Bush Popularity, Financial Crisis)]
BGM with BoW
• Feature extraction – local detectors, SIFT
• Vector quantization – k-means
• BGM insertion – words, documents, TF, IDF
Why is the BGM Superior?
[Diagram: a query image with words w1, w2; Cash Flow result vs. inverted index result over documents containing words w1–w5]
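The Cash Flow ranking can be sketched as follows: cash starts on the query's words, flows to documents along TF-weighted edges, and flows back to co-occurring words, so a document sharing no word with the query can still accumulate cash. The damping factor and round count below are illustrative assumptions, not the thesis's exact parameters:

```python
from collections import defaultdict

def cash_flow(edges, query_words, rounds=2, damping=0.5):
    """Rank documents on a word-document bipartite graph.
    edges: {(word, doc): tf_weight}. Cash starts on the query words
    and flows word -> doc -> word, attenuated by `damping` per hop."""
    w2d, d2w = defaultdict(dict), defaultdict(dict)
    for (w, d), tf in edges.items():
        w2d[w][d] = tf
        d2w[d][w] = tf
    word_cash = {w: 1.0 for w in query_words}
    doc_cash = defaultdict(float)
    for _ in range(rounds):
        next_words = defaultdict(float)
        for w, cash in word_cash.items():
            total = sum(w2d[w].values()) or 1.0
            for d, tf in w2d[w].items():
                share = cash * tf / total      # cash flowing into doc d
                doc_cash[d] += share
                dtotal = sum(d2w[d].values())
                for w2, tf2 in d2w[d].items():  # flow back to words
                    next_words[w2] += damping * share * tf2 / dtotal
        word_cash = next_words
    return sorted(doc_cash, key=doc_cash.get, reverse=True)

edges = {("w1", "d1"): 2, ("w1", "d2"): 1, ("w2", "d2"): 3, ("w3", "d3"): 1}
print(cash_flow(edges, ["w1"]))  # ['d1', 'd2']
```

Note that d2 also collects cash via the co-occurring word w2, which is exactly the semantic association an inverted index alone cannot exploit.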
Naïve vs. BGM
• Dataset: 9000 images from Flickr – 9 sports categories, 5 animal categories
• Pre-processing
  – SIFT feature extraction
  – quantization using k-means
• F-score: 2*(p*r)/(p+r)
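The F-score above is the harmonic mean of precision and recall:

```python
def f_score(p, r):
    """F1 score: 2*(p*r)/(p+r), the harmonic mean of precision p
    and recall r; defined as 0 when both are 0."""
    return 2 * p * r / (p + r) if (p + r) else 0.0

print(round(f_score(0.5, 1.0), 3))  # 0.667
```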
BGM vs. pLSA, IpLSA
• pLSA – cannot scale to large databases; cannot update incrementally; latent topic initialization is difficult; high space complexity
• IpLSA – cannot scale to large databases; cannot add new latent topics; latent topic initialization is difficult; high space complexity
• BGM + Cash Flow – efficient; low space complexity

Number of concepts known:
      | mAP   | Time  | Space
pLSA  | 0.553 | 5062s | 3267MB
IpLSA | 0.567 | 56s   | 3356MB
BGM   | 0.594 | 42s   | 57MB

Number of concepts unknown:
      | mAP   | Time  | Space
pLSA  | 0.649 | 5144s | 3267MB
IpLSA | 0.612 | 63s   | 3356MB
BGM   | 0.594 | 42s   | 57MB

• Dataset: Holiday dataset – 1491 images, 500 categories
• Pre-processing: SIFT feature extraction; quantization using k-means
Near-Duplicate Retrieval
• Dataset: 500,000 movie frames – SIFT vectors, k-means quantization
• Indexed using the text search library Ferret – efficient indexing and retrieval; effectively scalable to large data
• A query frame is given as a query to the Ferret index
• Cash is propagated to every node until a cut-off
Sample Retrieval
[Figure: query and retrieved frames from Fastest Indian, Fight Club, and Harry Potter]
Analysis
• Low index insert time for new images – less than 200 seconds to insert 1000 images into a million-image index
• Marginally higher retrieval time – due to multiple levels of graph traversal
• Memory usage is minimal
• Works without knowing the number of concepts a priori
• BGM is a hybrid model – generative and discriminative
[Taxonomy: offline vs. online semantic indexing × generative vs. discriminative × incremental vs. non-incremental – pLSA, LDA, IpLSA, BGM TF, BGM IDF, BGM (generative + discriminative)]
Conclusion
• Efficient methods for retrieval in large scale dynamic image databases
• Scalability and adaptability have been addressed
• A step closer to real world image retrieval
• Features and their mixture, a long way to go
Future Work
• Quality and quantity of features
• Automatic feature modeling
• Text search engines for image search
• GPU based quantization methods
• Multiple vocabularies for image retrieval
• Multimodal semantic indexing with BGM
List of Publications
• Suman Karthik, C.V. Jawahar, "Incremental On-line Semantic Indexing for Image Retrieval in Dynamic Databases", 4th International Workshop on Semantic Learning and Applications, CVPR, 2008, Florida.
• Suman Karthik, C.V. Jawahar, "Analysis of Relevance Feedback in Content Based Image Retrieval", Proceedings of the 9th International Conference on Control, Automation, Robotics and Vision (ICARCV), 2006, Singapore.
• Suman Karthik, C.V. Jawahar, "Virtual Textual Representation for Efficient Image Retrieval", Proceedings of the 3rd International Conference on Visual Information Engineering (VIE), 26-28 September 2006, Bangalore, India.
• Suman Karthik, C.V. Jawahar, "Efficient Region Based Indexing and Retrieval for Images with Elastic Bucket Tries", Proceedings of the International Conference on Pattern Recognition (ICPR), 2006.
The End
Intuitive Way of Learning Content
Over-segmentation and subsequent deduction of content through relevance feedback.
[Diagram: Image → Segmentation → Segments → Words, analogous to Document → Text → Words]
Discriminative relevance feedback leverages this advantage to achieve better performance than standard techniques.
K-means
• Pros – simple; efficient
• Cons – computationally expensive; requires a representative training set; sensitive to the parameter K
A Naïve Quantization Scheme
[Figure: uniform quantization of a feature space with features F1, F2, F3]
Advantages: high speed, no quantization overhead; as dataset size grows, precision increases
Disadvantages: not data dependent, no notion of visual concept; information loss due to hard assignment
* Suman Karthik, C.V. Jawahar, Virtual Textual Representation for Efficient Image Retrieval, VIE 2006
* T. Tuytelaars and C. Schmid, Vector Quantizing Feature Space with a Regular Lattice, ICCV 2007
C 2.1 Methodology
• Data
  – 1000 random feature vectors generated from each of 1000 normal distributions in a 2-D feature space: a total of 1 million feature points.
  – 100,000 virtual images falling into 100 categories, where each category image is generated by drawing random numbers from 10 of the above normal distributions.
• Algorithms
  – k-means (quantized with the entire data and ideal K = 1000)
  – IVQ
  – k-means with soft assignment
• Measures
  – F-score for retrieval performance
  – Time estimates for incremental quantization
Performance
Performance
Image Retrieval
• Contemporary approach – uses textual cues
• Pros – simple; efficient
• Cons – images are subjective; text cues are unscalable; quality suffers
[Figure: a rose image tagged rose, petals, red, green, bud, gift, love, flower]
Losses
[Figure: quantizations with high perceptual loss, high binning loss, and optimal quantization]
Image Retrieval as Text Retrieval
Can an image be indexed, queried for, and retrieved as a text document?
[Figure: an image transformed into a text document]
Relevance Feedback
• Statistical – delta mean algorithm; query point movement; inverse variance; membership criterion
• Kernel based – Parzen windows; SVM; kernel BDA
• Entropy based – KL divergence
[Charts: performance (0–140) on datasets D1–D4 for Inverse Sigma, Delta Mean, MC, QPM, KL Divergence, Parzen, KBDA, SVM]
Semantic Indexing for Images
• Objects and their location in images – J. Sivic, B.C. Russell, A.A. Efros, A. Zisserman, W. Freeman
• Large-scale image databases – R. Lienhart, M. Slaney
• Web image selection – Keiji Yanai
• Spatial Latent Dirichlet Allocation – Xiaogang Wang, Eric Grimson
• Image auto-annotation – Florent Monay, Daniel Gatica-Perez
High space complexity due to large matrix operations.
Slow, resource-intensive offline processing.