
1

Content-based Image Retrieval (CBIR)

Searching a large database for images that match a query:

What kinds of databases? What kinds of queries? What constitutes a match? How do we make such searches efficient?

2

Applications

• Art collections, e.g. museums, galleries, archives
• Medical image databases: CT, MRI, ultrasound, the Visible Human
• Scientific databases, e.g. earth sciences, Hubble
• General image collections for licensing: Corbis, Getty Images, the photo gallery of Gulhasan
• The World Wide Web

3

What is a query?

• an image you already have
• a rough sketch you draw
• a symbolic description of what you want, e.g. an image of a man and a woman on a beach

4

Some Systems You Can Try

Corbis Stock Photography and Pictures

http://www.corbis.com/

• Corbis sells high-quality images for use in advertising, marketing, illustrating, etc.

• Search is entirely by keywords.

• Human indexers look at each new image and enter keywords.

• A thesaurus constructed from user queries is used.

5

AVAILABLE CBIR SYSTEMS

6

IBM’s QBIC (Query by Image Content)

http://wwwqbic.almaden.ibm.com

• The first commercial system.

• Uses, or has used, color percentages, color layout, texture, shape, location, and keywords.

7

Blobworld

UC Berkeley’s Blobworld

http://elib.cs.berkeley.edu/photos/blobworld

• Images are segmented on color plus texture

• User selects a region of the query image

• System returns images with similar regions

• Works really well for tigers and zebras

8

Ditto: See the Web

http://www.ditto.com

• Small company

• Allows you to search for pictures from web pages

National Archives, USA

9

Declaration of Independence

Ottoman Archives ???

10

QUERY BY EXAMPLE

11

12

Image Features / Distance Measures

[Diagram: the user supplies a query image; image feature extraction maps both the query image and the database images into a feature space, and a distance measure between feature vectors determines the retrieved images.]

13

Features

• Color (histograms, gridded layout, wavelets)

• Texture (Laws, Gabor filters, local binary pattern)

• Shape (first segment the image, then use statistical or structural shape similarity measures)

• Objects and their Relationships

This is the most powerful, but you have to be able to recognize the objects!

14

Color Histograms

15

QBIC’s Histogram Similarity

The QBIC color histogram distance is:

d_hist(I, Q) = (h(I) − h(Q)) A (h(I) − h(Q))^T

• h(I) is a K-bin histogram of a database image

• h(Q) is a K-bin histogram of the query image

• A is a K x K similarity matrix
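A minimal numpy sketch of this quadratic-form distance (the 3-bin histograms and the matrix A below are illustrative placeholders, not QBIC's actual values):

```python
import numpy as np

def qbic_histogram_distance(h_i, h_q, A):
    """Quadratic-form histogram distance (h(I)-h(Q)) A (h(I)-h(Q))^T."""
    d = h_i - h_q          # K-dimensional difference vector
    return float(d @ A @ d)  # scalar quadratic form

# Toy example: K = 3 bins; A discounts differences between similar bins.
A = np.array([[1.0, 0.5, 0.0],
              [0.5, 1.0, 0.5],
              [0.0, 0.5, 1.0]])
h_db    = np.array([0.2, 0.5, 0.3])   # database image histogram
h_query = np.array([0.3, 0.4, 0.3])   # query image histogram
print(qbic_histogram_distance(h_db, h_query, A))
```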

16

Similarity Matrix

Columns and rows are R G B Y C V; the rows recoverable from the slide:

     R    G    B    Y    C    V
R    1    0    0    .5   0    .5
G    0    1    0    .5   .5   0
B    0    0    1    1    1    1

How similar is blue to cyan?

17

Gridded Color

Gridded color distance is the sum of the color distances in each of the corresponding grid squares.

What color distance would you use for a pair of grid squares?

[Figure: two images, each divided into a 2 x 2 grid of cells numbered 1-4; distances are computed between corresponding cells and summed.]
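One plausible implementation, assuming each square's color is summarized by its mean RGB and squares are compared with Euclidean distance (the slide deliberately leaves this choice open):

```python
import numpy as np

def gridded_color_distance(img1, img2, grid=(2, 2)):
    """Sum of color distances over corresponding grid squares.

    Each image is an (H, W, 3) array; the color of a square is
    summarized here by its mean RGB vector.
    """
    H, W, _ = img1.shape
    gh, gw = H // grid[0], W // grid[1]
    total = 0.0
    for r in range(grid[0]):
        for c in range(grid[1]):
            cell1 = img1[r*gh:(r+1)*gh, c*gw:(c+1)*gw].reshape(-1, 3).mean(axis=0)
            cell2 = img2[r*gh:(r+1)*gh, c*gw:(c+1)*gw].reshape(-1, 3).mean(axis=0)
            total += np.linalg.norm(cell1 - cell2)  # per-square color distance
    return total
```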

18

Texture Distances

• Pick and Click (the user clicks on a pixel, and the system retrieves images that contain a region whose texture is similar to the region surrounding that pixel).

• Gridded (just like gridded color, but use texture).

• Histogram-based (e.g. compare the LBP histograms).

19

Local Binary Pattern Measure

Example neighborhood of a pixel p with value 50:

100 101 103
 40  50  80
 50  60  90

Neighbors are numbered clockwise starting from the top-left:

1 2 3
8 p 4
7 6 5

• For each pixel p, create an 8-bit number b1 b2 b3 b4 b5 b6 b7 b8, where bi = 0 if neighbor i has a value less than or equal to p's value and 1 otherwise. For the neighborhood above this gives 1 1 1 1 1 1 0 0.

• Represent the texture in the image (or a region) by the histogram of these numbers.

• Compute the L1 distance between two histograms: L1(h, g) = Σ_i |h(i) − g(i)|.
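A sketch following the slide's conventions (bit set when the neighbor's value exceeds the center's, neighbors taken clockwise from the top-left):

```python
import numpy as np

def lbp_histogram(gray):
    """Normalized histogram of 8-bit local binary patterns."""
    g = gray.astype(np.int32)
    H, W = g.shape
    center = g[1:-1, 1:-1]
    # 8 neighbor offsets, clockwise from the top-left (slide's numbering)
    offsets = [(-1,-1), (-1,0), (-1,1), (0,1), (1,1), (1,0), (1,-1), (0,-1)]
    codes = np.zeros_like(center)
    for bit, (dy, dx) in enumerate(offsets):
        neighbor = g[1+dy:H-1+dy, 1+dx:W-1+dx]
        # bit = 1 only when the neighbor strictly exceeds the center
        codes |= (neighbor > center).astype(np.int32) << (7 - bit)
    hist = np.bincount(codes.ravel(), minlength=256).astype(float)
    return hist / hist.sum()

def l1_distance(h1, h2):
    """L1 distance between two LBP histograms."""
    return float(np.abs(h1 - h2).sum())
```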

20

Laws Texture

21

Laws’ texture masks (1)

22

Shape Distances

• Shape goes one step further than color and texture.

• It requires identification of regions to compare.

• There have been many shape similarity measures suggested for pattern recognition that can be used to construct shape distance measures.

23

Global Shape Properties: Projection Matching

[Figure: a binary shape on a 6 x 6 grid; one projection gives the counts 0 4 1 3 2 0, the other 0 4 3 2 1 0.]

In projection matching, the horizontal and vertical projections form a histogram.

Feature vector: (0, 4, 1, 3, 2, 0, 0, 4, 3, 2, 1, 0)

What are the weaknesses of this method? strengths?
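A minimal sketch; whether the column or the row counts come first in the concatenation is an assumption:

```python
import numpy as np

def projection_feature(binary_shape):
    """Concatenate column and row projections of a binary shape image."""
    cols = binary_shape.sum(axis=0)   # counts per column
    rows = binary_shape.sum(axis=1)   # counts per row
    return np.concatenate([cols, rows])

# For the 6x6 example on the slide this would yield
# (0, 4, 1, 3, 2, 0, 0, 4, 3, 2, 1, 0).
```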

24

Global Shape Properties: Tangent-Angle Histograms

[Figure: a shape and its tangent-angle histogram, with marked angles at 0, 30, 45, and 135 degrees.]

Is this feature invariant to starting point?

Boundary Matching: Fourier Descriptors

Given a sequence of points V_k along a boundary, the Fourier descriptors are computed from its discrete Fourier transform.
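The slide's equations did not survive the transcript, but a standard construction treats each boundary point as a complex number and takes its DFT; the invariance normalizations below are one common variant, not necessarily the slide's exact formulation:

```python
import numpy as np

def fourier_descriptors(boundary, n_coeffs=16):
    """Fourier descriptors of a closed boundary given as an (N, 2) array
    of (x, y) points. Dropping the DC term gives translation invariance;
    dividing by the first harmonic's magnitude gives scale invariance;
    taking magnitudes gives rotation and starting-point invariance."""
    z = boundary[:, 0] + 1j * boundary[:, 1]   # points as complex numbers
    F = np.fft.fft(z)
    F = F[1:n_coeffs + 1]                      # drop the DC term
    return np.abs(F) / (np.abs(F[0]) + 1e-12)
```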

25

26

Boundary Matching

• Elastic Matching

The distance between the query shape and the image shape has two components:

1. energy required to deform the query shape into one that best matches the image shape

2. a measure of how well the deformed query matches the image

27

Del Bimbo Elastic Shape Matching

[Figure: a query shape and the images retrieved for it.]

28

Our work on CBIR at METU

1. M. Uysal, F.T. Yarman-Vural, “A Content-Based Fuzzy Image Database Based on the Fuzzy ARTMAP Architecture”, Turkish Journal of Electrical Engineering & Computer Sciences, vol. 13, pp. 333-342, 2005.
2. Ö.Ö. Tola, N. Arıca, F.T. Yarman-Vural, “Shape Recognition with Generalized Beam Angle Statistics”, Lecture Notes in Computer Science, vol. 2, 2004.
3. Ö.C. Özcanlı, F.T. Yarman-Vural, “An Image Retrieval System Based on Region Classification”, Lecture Notes in Computer Science, vol. 1, 2004.
4. Y. Xu, P. Duygulu, E. Saber, A.M. Tekalp, F.T. Yarman-Vural, “Object-based Image Labeling Through Learning by Example and Multi-level Segmentation”, Pattern Recognition, vol. 36, pp. 1407-1423, 2003.
5. N. Arıca, F.T. Yarman-Vural, “BAS: A Perceptual Shape Descriptor Based on the Beam Angle Statistics”, Pattern Recognition Letters, vol. 24, pp. 1627-1639, 2003.
6. M. Özay, F.T. Yarman-Vural, “On the Performance of Stacked Generalization Architecture”, Lecture Notes in Computer Science, ICIAR, pp. 445-451, 2008.
7. E. Akbaş, F.T. Yarman-Vural, “Automatic Image Annotation by Ensemble of Visual Descriptors”, CVPR 2007, pp. 1-8, 2007.
8. İ. Yalabık, F.T. Yarman-Vural, G. Ucoluk, O.T. Sehitoglu, “A Pattern Classification Approach for Boosting with Genetic Algorithms”, ISCIS'07, pp. 1-6, 2007.
9. M. Uysal, E. Akbaş, F.T. Yarman-Vural, “A Hierarchical Classification System Based on Adaptive Resonance Theory”, ICIP'06, pp. 2913-2916, 2006.
10. E. Akbaş, F.T. Yarman-Vural, “Design of Feature Set for Face Recognition Problem”, Lecture Notes in Computer Science, ISCIS 2006, pp. 239-247, 2006.
11. N. Arıca, F.T. Yarman-Vural, “Shape Similarity Measurement for Boundary Based Features”, Lecture Notes in Computer Science, vol. 3656, ISBN 3-540-29069-9, pp. 431-438, 2005.
12. M. Uysal, F.T. Yarman-Vural, “ORF-NT: An Object-Based Image Retrieval Framework Using Neighborhood Trees”, Lecture Notes in Computer Science, vol. 3733, ISBN 3-540-29414-7, ISCIS 2005, pp. 595-605, 2005.
13. N. Arıca, F.T. Yarman-Vural, “A Compact Shape Descriptor Based on the Beam Angle Statistics”, International Conference on Image and Video Retrieval (CIVR), 2003.
14. N. Arıca, F.T. Yarman-Vural, “A Perceptual Shape Descriptor”, International Conference on Pattern Recognition (ICPR), 2002.
15. A. Çarkacıoğlu, F.T. Yarman-Vural, “Learning Similarity Space”, Proc. Int. Conf. on Image Processing (ICIP'02), pp. 405-408, Rochester, NY, USA, September 2002.
16. N. Arıca, F.T. Yarman-Vural, “A Perceptual Shape Descriptor”, Proc. International Conference on Pattern Recognition (ICPR), Quebec, Canada, 2002.
17. N. Arıca, F.T. Yarman-Vural, “A Shape Descriptor Based on Circular Hidden Markov Model”, Proc. International Conference on Pattern Recognition (ICPR), Barcelona, Spain, 2000.
18. P. Duygulu, E. Saber, M. Tekalp, F.T. Yarman-Vural, “Multi-Level Image Segmentation for Content Based Image Representation”, IEEE ICASSP 2000, July 2000, Istanbul.
19. P. Duygulu, A. Çarkacıoğlu, F.T. Yarman-Vural, “Multi-Level Object Description: Color or Texture”, First IEEE Balkan Conference on Signal Processing, Communications, Circuits, and Systems, June 1-3, 2000, Istanbul, Turkey.
20. N. Dicle, F.T. Yarman-Vural, “NES: A File Format for Content Based Coding of Images”, ISCIS 15, 2000, Istanbul.
21. N. Arıca, F.T. Yarman-Vural, “A New HMM Topology for Shape Recognition”, IEEE-EURASIP Workshop on Nonlinear Signal and Image Processing (NSIP'99), pp. 162-168, Antalya, Turkey, 1999.
22. S. Genc, F.T. Yarman-Vural, “Morphing for Motion Estimation”, Int. Conf. on Image Processing Systems and Technologies and Information Systems, Las Vegas, 1999.

29

Türkçe Bildiriler (Papers in Turkish)

1. M. Özay, F.T. Yarman Vural, “Yığılmış Genelleme Algoritmalarının Performans Analizi”, Sinyal İşleme ve İletişim Uygulamaları Kurultayı, 2008.
2. A. Sayar, F.T. Yarman Vural, “Yarı Denetimli Kümeleme ile Görüntü İçeriğine Açıklayıcı Not Üretme”, Sinyal İşleme ve İletişim Uygulamaları Kurultayı, 2008.
3. Ö. Özcanlı, E. Akbaş, F.T. Yarman Vural, “Görüntü Erişiminde Bulanık ARTMAP ve Adaboost Sınıflandırma Yöntemlerinin Performans Karşılaştırması”, IEEE 13. Sinyal İşleme ve İletişim Uygulamaları Kurultayı, 2005.
4. Ö.Ö. Tola, N. Arıca, F.T. Yarman Vural, “Genelleştirilmiş Kerteriz Açısı İstatistikleri ile Şekil Tanıma”, Sinyal İşleme ve İletişim Uygulamaları Kurultayı, 1, 2004.
5. M. Uysal, Ö. Özcanlı, F.T. Yarman Vural, “Bulanık ARTMAP Mimarisini Kullanan İçerik Bazlı Bir İmge Sorgulama Sistemi”, Sinyal İşleme ve İletişim Uygulamaları Kurultayı, 2004.
6. A. Çarkacıoğlu, F.T. Yarman Vural, “Doku Benzerliğini Tesbit Etmek İçin Yeni Bir Tanımlayıcı”, 11. SIU, Sinyal İşleme ve İletişim Uygulamaları Kurultayı, 2003.
7. N. Arıca, F.T. Yarman Vural, “Tıkız Şekil Betimleyicileri”, 11. SIU, Sinyal İşleme ve İletişim Uygulamaları Kurultayı, 2003.
8. Ö.C. Özcanlı, P. Duygulu Şahin, F.T. Yarman Vural, “Açıklamalı Görüntü Veritabanları Kullanarak Nesne Tanıma ve Erişimi”, 11. SIU, Sinyal İşleme ve İletişim Uygulamaları Kurultayı, 2003.
9. S. Alkan, F.T. Yarman Vural, “Öğretim Üyesi Yetiştirme Programı”, Elektrik Elektronik, Bilgisayar Mühendislikleri Eğitimi I. Ulusal Sempozyumu, 2003.
10. M. Uysal, F.T. Yarman Vural, “En İyi Temsil Eden Öznitelik Kullanılarak İçeriğe Dayalı Endeksleme ve Bulanık Mantığa Dayalı Sorgulama Sistemi”, 11. SIU, Sinyal İşleme ve İletişim Uygulamaları Kurultayı, 2003.
11. N. Arıca, F.T. Yarman Vural, “Kerteriz Tabanlı Şekil Tanımlayıcısı”, 10. SIU, cilt 1, s. 129-134, 2002.
12. Ö.C. Özcanlı, S. Dağlar, F.T. Yarman Vural, “Yerel Benzerlik Örüntüsü Yöntemi ile Görüntü Erişim Sistemi”, 10. SIU, cilt 1, s. 141-146, 2002.

30

BAS: Beam Angle Statistics

Ömer Önder Tola, Nafiz Arıca, Fatoş Tünay Yarman Vural

Beam Angle Statistics (BAS)

BAS is a shape descriptor for a shape defined as an ordered set of boundary pixels: each boundary pixel is assigned the statistics of the beam angles measured at it, forming a feature vector.

The beam angle shows how much the shape bends at a point.

BAS describes the 2D shape by 1D moment functions of the beam angles.

Beam Angle Statistics: given an ordered set of boundary pixels p(i), i = 1, …, N.

[Figures: a boundary whose pixels are numbered in order (1-36 on the slides); at a point p(i), the beams to p(i+K) and p(i−K) form the K-th order beam angle, illustrated for C_1(p(i)), C_2(p(i)), C_3(p(i)), C_17(p(i)), and a general C_K(p(i)).]

The K-th order beam angle C_K(i) is the angle between the forward beam vector from p(i) to p(i+K) and the backward beam vector from p(i) to p(i−K).

Treating C(i) as a random variable over K, each boundary point is represented by its moments:

Γ(i) = [ E[C^1(i)], E[C^2(i)], … ]

E[C^m(i)] = Σ_K (C_K(i))^m · P(C_K(i)),  m = 0, 1, 2, …
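A direct numpy sketch of these definitions (treating every K from 1 to N/2 − 1 as equally likely, i.e. uniform P(C_K), which is one plausible reading of the slides):

```python
import numpy as np

def bas_moments(boundary, n_moments=3):
    """Beam Angle Statistics sketch: for each boundary point p(i), the
    K-th order beam angle C_K(i) is the angle between p(i+K)-p(i) and
    p(i-K)-p(i); the point is represented by the first moments of C_K(i)
    taken over K. `boundary` is an (N, 2) array of ordered pixels."""
    N = len(boundary)
    feats = np.zeros((N, n_moments))
    for i in range(N):
        angles = []
        for K in range(1, N // 2):
            fwd = boundary[(i + K) % N] - boundary[i]
            bwd = boundary[(i - K) % N] - boundary[i]
            c = np.dot(fwd, bwd) / (np.linalg.norm(fwd) * np.linalg.norm(bwd) + 1e-12)
            angles.append(np.arccos(np.clip(c, -1.0, 1.0)))
        angles = np.asarray(angles)
        # m-th moment with each K weighted equally (uniform P(C_K))
        feats[i] = [np.mean(angles ** m) for m in range(1, n_moments + 1)]
    return feats
```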

41

Plots of C_K(i) with fixed K values

[Figure: plots of C_K(i) for K = N/40, K = N/10, K = N/4, and for the range K = N/40 to N/4.]

42

What is the most appropriate value of K, one that discriminates the shapes in a large database and represents the shape information at all scales?

Answer: Find a representation that employs the information in C_K(i) for all values of K, treating the value at each point as the output of a stochastic process.

43

C(i) is a random variable of the stochastic process that generates the beam angles. Each boundary point i is represented by the m-th moments of C(i).

44

First three moments of C(i)’s

45

Correspondence of Visual Parts and Insensitivity to Affine Transformation

46

Robustness to Polygonal Approximation

Robustness to Noise

47

Similarity Measurement: Elastic Matching Algorithm

• An application of dynamic programming
• Minimizes the distance between two patterns by allowing deformations of the patterns
• The cost of matching two items is calculated with the Euclidean metric
• Robust to distortions; promises to approximate the human way of perceiving similarity
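A DTW-style dynamic-programming sketch in this spirit (warping plays the role of deformation; the actual elastic matching used with BAS deforms the patterns themselves, so this is an analogy rather than the exact algorithm):

```python
import numpy as np

def elastic_match_cost(a, b):
    """Dynamic-programming alignment of two feature sequences a (n, d)
    and b (m, d). Local costs are Euclidean distances; the recurrence
    allows stretching/compressing either sequence."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]
```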

48

TEST RESULT FOR MPEG 7 CE PART A-1 Robustness to Scaling

49

TEST RESULT FOR MPEG 7 CE PART A-2 Robustness to Rotation

50

TEST RESULT FOR MPEG 7 CE PART B Similarity-based Retrieval

51

TEST RESULT FOR MPEG 7 CE PART C Motion and Non-Rigid Deformations

52

Comparison with the Best Studies on the MPEG-7 CE Shape-1 Data Set

Method                   Part A1   Part A2   Part B   Part C
Shape Context            -         -         76.51    -
Tangent Space            88.65     100       76.45    92
Curvature Scale Space    89.76     99.37     75.44    96
Zernike Moments          92.54     99.60     70.22    94.5
Wavelet                  88.04     97.46     67.76    93
DAG                      85        85        60       83
BAS (length 40)          89.32     99.82     81.04    93
BAS (length 60)          90.87     100       82.37    93.5

53

Performance Evaluation

(1) Average performance over the three parts: Total Score 1 = (1/3) A + (1/3) B + (1/3) C

Method                   Total Score 1
Tangent Space            87.59
Curvature Scale Space    88.67
Zernike Moments          86.93
Wavelet                  84.50
DAG                      76
BAS (length 40)          90.80
BAS (length 60)          91.69

(2) Average performance over the number of queries: Total Score 2 = (840/2241) A + (1400/2241) B + (1/2241) C

Method                   Total Score 2
Tangent Space            83.16
Curvature Scale Space    82.62
Zernike Moments          79.92
Wavelet                  77.14
DAG                      69.38
BAS (length 40)          86.12
BAS (length 60)          87.27

The boundary of a shape is not always available.

Question: Can we somehow extract the boundary of a shape directly from the edge-detected image?

GENERALIZE THE BAS FUNCTIONS

Generalized Beam Angle Statistics (GBAS)

Assume the shape is embedded in a set of unordered edge pixels. Define the generalized beam angles formed by the forward and backward boundary pixel sets that are partitioned by the mean beam vector.

Generalized Beam Angle Statistics (GBAS)

[Figures: an edge-detected shape given as unordered edge pixels (numbered 1-34 on the slides); at a pixel p(i), the mean beam vector is drawn and partitions the remaining edge pixels.]

At each pixel p(i), the mean beam vector splits the edge pixels into a forward boundary pixel set G = {g_1, g_2, …, g_S} and a backward boundary pixel set I = {i_1, i_2, …, i_R}.

The generalized beam angle C_k,l(p(i)) is the angle formed between the beam from p(i) to a forward pixel g_k and the beam from p(i) to a backward pixel i_l.

As in BAS, each pixel is represented by the moments of these angles:

Γ(i) = [ E[C^1(i)], E[C^2(i)], … ]

E[C^m(i)] = Σ_{k=1..S} Σ_{l=1..R} (C_k,l(i))^m · P(C_k,l(i)),  m = 0, 1, 2, …

HANOLISTIC: a hierarchical automatic image annotation system using a holistic approach

Özge Öztimur Karadağ & Fatoş T. Yarman Vural
Department of Computer Engineering, Middle East Technical University, Ankara, Turkey

Automatic Image Annotation

Image annotation: assigning keywords to digital images. Done by hand, it is labor intensive and time consuming, so a system is needed that annotates images automatically.

Image Annotation Literature

The annotation problem has been popular since the 1990s. It is related to CBIR: CBIR processes visual information only, while annotation processes both visual and semantic information, relating visual content information to semantic context information.

Problems About Automatic Image Annotation
• Human subjectivity
• Semantic gap
• Availability of datasets

Image Annotation Approaches in the Literature…

Segmental approaches:
1. Segment or partition the image into regions
2. Extract features from the regions
3. Quantize features into blobs
4. Model the relation between the image regions and annotation words

Holistic approaches: features are extracted from the whole image.

The Proposed System: HANOLISTIC

Introducing semantic information as supervision: each word is considered a class label, and an image belongs to one or more classes.

Holistic approach: multiple visual features are extracted from the whole image, giving multiple feature spaces.

Description of an Image

Content description by MPEG-7 visual features:
• Color Layout
• Color Structure
• Scalable Color
• Homogeneous Texture
• Edge Histogram

Context description by semantic words: annotation words.

System Architecture of HANOLISTIC

Level-0: consists of level-0 annotators, one annotator for each visual description space.

Meta-level: consists of a meta-annotator.

Level-0 Annotator

Each level-0 annotator takes the features of the i-th image in the j-th description space as input and outputs the membership value of the l-th word for the i-th image in that description space.

Meta-Level

The results of the level-0 annotators are aggregated into a vector of final word membership values for the i-th image.

Experimental studies

Realization of HANOLISTIC: instance-based realization of level-0, eager realization of level-0, realization of the meta-level

Performance criteria

Results

Experimental Setup

• Data set: a subset of the Corel Stock Photo Collection, consisting of 5000 images
• Training set: 4500 images (500 of them for validation)
• Testing set: 500 images
• Each image is annotated with 1-5 words

Instance-based Realization of the Level-0 Annotator by Fuzzy-kNN

Level-0 annotators are realized by fuzzy-kNN. For each description space, the k nearest neighbors of the image are determined. Word membership values are estimated from the neighbors' words and their distances from the image: high membership values are assigned to words that appear in the close neighborhood.
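A minimal sketch of such a fuzzy-kNN level-0 annotator (the inverse-distance weighting and all names below are illustrative assumptions, not the paper's exact formula):

```python
import numpy as np

def fuzzy_knn_memberships(query_feat, train_feats, train_words, vocab, k=5):
    """Estimate word membership values for a query image from its k
    nearest training images in one description space, weighting each
    neighbor's words by its (normalized) inverse distance."""
    dists = np.linalg.norm(train_feats - query_feat, axis=1)
    nearest = np.argsort(dists)[:k]
    weights = 1.0 / (dists[nearest] + 1e-6)
    weights /= weights.sum()
    membership = {w: 0.0 for w in vocab}
    for idx, wt in zip(nearest, weights):
        for word in train_words[idx]:   # annotation words of neighbor idx
            membership[word] += wt
    return membership
```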

Eager Realization of Level-0 by ANN

For a given image I_i, the ANN receives the visual description of the image as input and the semantic annotation words of the image as target. Each ANN is trained with backpropagation, and a randomly selected set of images is used for validation to determine when to stop training. K-fold cross-validation is applied.

Realization of the Meta-Level by Majority Voting

The membership values returned by the level-0 annotators are added up, P_i = Σ_j P_i,j, where P_i,j is a vector containing the word membership values returned by the j-th level-0 annotator. Alternatively, for each word, the maximum of the five word membership values estimated by the level-0 annotators is selected.
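Both aggregation rules in a few lines (a sketch; the names are illustrative):

```python
import numpy as np

def meta_annotate(level0_outputs, mode="sum"):
    """Combine level-0 membership vectors P_{i,j}, either by summing
    them or by taking the per-word maximum across annotators."""
    P = np.vstack(level0_outputs)   # shape (n_annotators, n_words)
    return P.sum(axis=0) if mode == "sum" else P.max(axis=0)
```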

Performance Criteria
• Precision
• Recall
• F-score
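The formulas themselves did not survive the transcript; assuming the standard per-word definitions used in the annotation literature:

```python
def word_prf(n_correct, n_predicted, n_ground_truth):
    """Per-word precision, recall, and F-score (standard definitions)."""
    precision = n_correct / n_predicted if n_predicted else 0.0
    recall = n_correct / n_ground_truth if n_ground_truth else 0.0
    f = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return precision, recall, f
```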

Performance of Level-0 Annotators

[Table: performance of the level-0 annotators realized with fuzzy-kNN.]

Performance of HANOLISTIC

Comparison of HANOLISTIC with other systems in the literature:

Annotation Examples

Annotation Examples…

Conclusion

We proposed a hierarchical automatic image annotation system using a holistic approach. We tested the system with both an instance-based and an eager method, and found the instance-based methods more promising in the considered problem domain.

Conclusion…

The power of the proposed system comes from the following main principles:
• Simplicity
• Fuzziness
• Simultaneous processing of content and context information
• Holistic view of the image through different perspectives

84

Regions and Relationships

• Segment the image into regions

• Find their properties and interrelationships

• Construct a graph representation with nodes for regions and edges for spatial relationships

• Use graph matching to compare images

Like what?

85

Tiger Image as a Graph

[Figure: the tiger image represented as a graph. Region nodes (sky, sand, tiger, grass) hang off the image's abstract regions and are connected by spatial-relationship edges labeled "above", "adjacent", and "inside".]
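An illustrative encoding of such a region graph with networkx (the specific edges below are guesses at the figure's labels, and `same_structure` is a hypothetical helper comparing two images by relation-labelled graph isomorphism):

```python
import networkx as nx

# Region graph for the tiger image; edge labels follow the figure,
# but the exact set of edges is an assumption.
g = nx.DiGraph()
g.add_edge("sky", "sand", rel="above")
g.add_edge("sky", "grass", rel="above")
g.add_edge("grass", "sand", rel="adjacent")
g.add_edge("tiger", "grass", rel="inside")

def same_structure(g1, g2):
    """Compare two region graphs by isomorphism with matching relations."""
    return nx.is_isomorphic(g1, g2,
                            edge_match=lambda e1, e2: e1["rel"] == e2["rel"])
```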

86

Object Detection: Rowley’s Face Finder

1. Convert to gray scale.
2. Normalize for lighting.*
3. Histogram equalization.
4. Apply a neural net trained on 16K images.

What data is fed to the classifier? 32 x 32 windows in a pyramid structure.

* Like the first step in the Laws algorithm, p. 220.

87

Fleck and Forsyth’s Flesh Detector

The “Finding Naked People” Paper

• Convert RGB to HSI.
• Use the intensity component to compute a texture map: texture = med2(|I − med1(I)|), where med1 and med2 are median filters of radii 4 and 6.
• If a pixel falls into either of the following ranges, it is a potential skin pixel:

  texture < 5, 110 < hue < 150, 20 < saturation < 60
  texture < 5, 130 < hue < 170, 30 < saturation < 130

Look for LARGE areas that satisfy this to identify pornography.
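A sketch of the pixel test with scipy (square median windows of sizes 9 and 13 approximate the paper's radius-4 and radius-6 filters; the HSI conversion is assumed to be done elsewhere):

```python
import numpy as np
from scipy.ndimage import median_filter

def skin_mask(hue, saturation, intensity):
    """Fleck-Forsyth-style skin test: texture from two median filters,
    then the hue/saturation ranges quoted on the slide."""
    med1 = median_filter(intensity, size=9)            # ~radius-4 filter
    texture = median_filter(np.abs(intensity - med1), size=13)  # ~radius-6
    range1 = (texture < 5) & (hue > 110) & (hue < 150) \
             & (saturation > 20) & (saturation < 60)
    range2 = (texture < 5) & (hue > 130) & (hue < 170) \
             & (saturation > 30) & (saturation < 130)
    return range1 | range2   # boolean mask of potential skin pixels
```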


88

Relevance Feedback

In real interactive CBIR systems, the user should be allowed to interact with the system to “refine” the results of a query until he/she is satisfied.

Relevance feedback work has been done by a number of research groups, e.g.

• The Photobook Project (Media Lab, MIT)
• The Leiden Portrait Retrieval Project
• The MARS Project (Tom Huang’s group at Illinois)

89

Information Retrieval Model*

An IR model consists of:
• a document model
• a query model
• a model for computing similarity between documents and queries

Term (keyword) weighting

Relevance Feedback

*from Rui, Huang, and Mehrotra’s work

90

Term weighting

Term weighting assigns different weights to different keywords (terms) according to their relative importance to the document.

Define w_ik to be the weight for term t_k, k = 1, 2, …, N, in document i. Document i can then be represented as a weight vector in the term space:

D_i = [w_i1, w_i2, …, w_iN]

91

Term weighting

The query Q is also a weight vector in the term space:

Q = [w_q1, w_q2, …, w_qN]

The similarity between D and Q is the cosine of the angle between the two vectors:

Sim(D, Q) = (D · Q) / (|D| |Q|)
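In code, this is just cosine similarity:

```python
import numpy as np

def sim(d, q):
    """Cosine similarity between document and query weight vectors."""
    return float(np.dot(d, q) / (np.linalg.norm(d) * np.linalg.norm(q) + 1e-12))
```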

92

Using Relevance Feedback

The CBIR system should automatically adjust the weights that were given by the user for the relevance of previously retrieved documents.

Most systems use a statistical method for adjusting the weights.

93

The Idea of Gaussian Normalization

If all the relevant images have similar values for component j, then component j is relevant to the query. If the relevant images have very different values for component j, then component j is not relevant to the query.

The inverse of the standard deviation over the sequence of relevant images is therefore a good measure of the weight for component j: the smaller the variance, the larger the weight.
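A minimal sketch of this inverse-standard-deviation weighting, applied as a weighted Euclidean distance (one common realization; the exact normalization is an assumption):

```python
import numpy as np

def feedback_weights(relevant_feats):
    """Weight each feature component by the inverse standard deviation
    over the images the user marked as relevant."""
    std = np.std(relevant_feats, axis=0)   # per-component spread
    w = 1.0 / (std + 1e-6)                 # small variance -> large weight
    return w / w.sum()

def weighted_distance(q, x, w):
    """Distance between query q and image x under component weights w."""
    return float(np.sqrt(np.sum(w * (q - x) ** 2)))
```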

94

Leiden Portrait System

The Leiden Portrait Retrieval System is an example of the use of relevance feedback.

http://ind156b.wi.leidenuniv.nl:2000/

95

Andy Berman’s FIDS System
• multiple distance measures
• Boolean and linear combinations
• efficient indexing using images as keys

96

Andy Berman’s FIDS System:

Use of key images and the triangle inequality for efficient retrieval.

97

Andy Berman’s FIDS System:

Bare-Bones Triangle Inequality Algorithm

Offline

1. Choose a small set of key images

2. Store distances from database images to keys

Online (given query Q)

1. Compute the distance from Q to each key

2. Obtain lower bounds on distances to database images

3. Threshold or return all images in order of lower bounds
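A sketch of the lower-bound step: by the triangle inequality, d(Q, X) ≥ |d(Q, key) − d(X, key)|, so the stored key distances yield bounds without touching the images themselves:

```python
import numpy as np

def lower_bounds(d_query_keys, d_db_keys):
    """Triangle-inequality lower bounds on query-to-image distances.

    d_query_keys: (n_keys,) distances from the query to the key images
    d_db_keys:    (n_images, n_keys) stored distances from DB images to keys
    Returns the tightest bound per image (max over keys).
    """
    return np.abs(d_db_keys - d_query_keys).max(axis=1)

# Images whose lower bound already exceeds a threshold can be discarded
# without ever computing their true distance to the query.
```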

98

Andy Berman’s FIDS System:

99

Andy Berman’s FIDS System:

Bare-Bones Algorithm with Multiple Distance Measures

Offline

1. Choose key images for each measure

2. Store distances from database images to keys for all measures

Online (given query Q)

1. Calculate lower bounds for each measure

2. Combine to form lower bounds for composite measures

3. Continue as in single measure algorithm

100

Andy Berman’s FIDS System:

Triangle Tries

A triangle trie is a tree structure that stores the distances from database images to each of the keys, one key per tree level.

[Figure: a two-level triangle trie. The root branches on the distance to key 1 (values 3 and 4); the next level branches on the distance to key 2 (1 under the 3-branch; 9 and 8 under the 4-branch), and the leaves hold the images {W, Z}, {X}, and {Y}.]

101

Andy Berman’s FIDS System:

Triangle Tries and Two-Stage Pruning

• First Stage: Use a short triangle trie.

• Second Stage: Bare-bones algorithm on the images returned from the triangle-trie stage.

The quality of the output is the same as with the bare-bones algorithm itself, but execution is faster.

102

Andy Berman’s FIDS System:

103

Andy Berman’s FIDS System:

104

Andy Berman’s FIDS System:

Performance on a 200-MHz Pentium Pro

Step 1. Extract features from the query image. (0.02 s ≤ t ≤ 0.25 s)

Step 2. Calculate the distances from the query to the key images. (1 μs ≤ t ≤ 0.8 ms)

Step 3. Calculate the lower-bound distances. (t ≈ 4 ms per 1000 images using 35 keys, which is about 250,000 images per second.)

Step 4. Return the images with smallest lower bound distances.

105

Andy Berman’s FIDS System:

106

Weakness of Low-level Features

Can’t capture the high-level concepts

107

Current Research Objective

[Diagram: the user's query image and the database images pass through object-oriented feature extraction; retrieved images are organized into semantic categories such as Animals, Buildings (Office Buildings, Houses), and Transportation (Boats, Vehicles), e.g. “boat”.]

108

Overall Approach

• Develop object recognizers for common objects
• Use these recognizers to design a new set of both low- and high-level features
• Design a learning system that can use these features to recognize classes of objects

109

Boat Recognition

110

Vehicle Recognition

111

Building Recognition

112

Building Features: Consistent Line Clusters (CLC)

A Consistent Line Cluster is a set of lines that are homogeneous in terms of some line features.

Color-CLC: The lines have the same color feature.

Orientation-CLC: The lines are parallel to each other or converge to a common vanishing point.

Spatially-CLC: The lines are in close proximity to each other.

113

Color-CLC

• Color feature of lines: the color pair (c1, c2)
• Color pair space: RGB gives 256^3 x 256^3 pairs, too big! Use dominant colors (20 x 20)
• Finding the color pairs: one line yields several color pairs
• Constructing the Color-CLC: use clustering

114

Color-CLC

115

Orientation-CLC

• The lines in an Orientation-CLC are parallel to each other in the 3D world
• The parallel lines of an object in a 2D image can be: parallel in 2D, or converging to a vanishing point (perspective)

116

Orientation-CLC

117

Spatially-CLC
• Vertical position clustering
• Horizontal position clustering

118

Building Recognition by CLC

• Two types of buildings
• Two criteria: the inter-relationship criterion and the intra-relationship criterion

119

Inter-relationship criterion: (Nc1 > Ti1 or Nc2 > Ti1) and (Nc1 + Nc2) > Ti2

Nc1 = number of intersecting lines in cluster 1

Nc2 = number of intersecting lines in cluster 2

120

Intra-relationship criterion

|So| > Tj1 or w(So) > Tj2

So = the set of heavily overlapping lines in a cluster

121

Experimental Evaluation: Object Recognition

• 97 well-patterned buildings (bp): 97/97
• 44 not well-patterned buildings (bnp): 42/44
• 16 not patterned non-buildings (nbnp): 15/16 (one false positive)
• 25 patterned non-buildings (nbp): 0/25

CBIR

122

Experimental Evaluation Well-Patterned Buildings

123

Experimental Evaluation Non-Well-Patterned Buildings

124

Experimental Evaluation Non-Well-Patterned Non-Buildings

125

Experimental Evaluation: Well-Patterned Non-Buildings (false positives)

126

Experimental Evaluation (CBIR)

Data set       Total positive (#)   Total negative (#)   False positive (#)   False negative (#)   Accuracy (%)
Arborgreens    0                    47                   0                    0                    100
Campusinfall   27                   21                   0                    5                    89.6
Cannonbeach    30                   18                   0                    6                    87.5
Yellowstone    4                    44                   4                    0                    91.7

127

Experimental Evaluation (CBIR): False Positives from Yellowstone
