
1

Content-based Image Retrieval (CBIR)

Searching a large database for images that match a query:

What kinds of databases? What kinds of queries? What constitutes a match? How do we make such searches efficient?

2

Applications

• Art collections, e.g. museums, galleries, archives
• Medical image databases: CT, MRI, ultrasound, the Visible Human
• Scientific databases, e.g. earth sciences, Hubble
• General image collections for licensing: Corbis, Getty Images, the photo gallery of Gulhasan
• The World Wide Web

3

What is a query?

• an image you already have
• a rough sketch you draw
• a symbolic description of what you want, e.g. an image of a man and a woman on a beach

4

Some Systems You Can Try

Corbis Stock Photography and Pictures

http://www.corbis.com/

• Corbis sells high-quality images for use in advertising, marketing, illustrating, etc.

• Search is entirely by keywords.

• Human indexers look at each new image and enter keywords.

• A thesaurus constructed from user queries is used.

5

AVAILABLE CBIR SYSTEMS

6

IBM’s QBIC (Query by Image Content)

http://wwwqbic.almaden.ibm.com

• The first commercial system.

• Uses, or has used, color percentages, color layout, texture, shape, location, and keywords.

7

Blobworld

UC Berkeley’s Blobworld

http://elib.cs.berkeley.edu/photos/blobworld

• Images are segmented on color plus texture

• User selects a region of the query image

• System returns images with similar regions

• Works really well for tigers and zebras

8

Ditto: See the Web

http://www.ditto.com

• Small company

• Allows you to search for pictures from web pages

National Archives, USA

9

Declaration of Independence

Ottoman Archives ???

10

QUERY BY EXAMPLE

11

12

Image Features / Distance Measures

[Diagram: the user supplies a query image; image feature extraction maps both the query image and the database images into a feature space, and a distance measure between feature vectors determines the retrieved images.]

13

Features

• Color (histograms, gridded layout, wavelets)

• Texture (Laws, Gabor filters, local binary pattern)

• Shape (first segment the image, then use statistical or structural shape similarity measures)

• Objects and their Relationships

This is the most powerful, but you have to be able to recognize the objects!

14

Color Histograms

15

QBIC’s Histogram Similarity

The QBIC color histogram distance is:

d_hist(I, Q) = (h(I) − h(Q)) A (h(I) − h(Q))^T

• h(I) is a K-bin histogram of a database image

• h(Q) is a K-bin histogram of the query image

• A is a K x K similarity matrix
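A minimal numpy sketch of this quadratic-form distance (the 3-bin histograms and the matrix A below are illustrative placeholders, not QBIC's actual values):

```python
import numpy as np

def qbic_histogram_distance(h_i, h_q, A):
    """Quadratic-form histogram distance (h(I)-h(Q)) A (h(I)-h(Q))^T."""
    d = h_i - h_q          # K-dimensional difference vector
    return float(d @ A @ d)  # scalar quadratic form

# Toy example: K = 3 bins; A discounts differences between similar bins.
A = np.array([[1.0, 0.5, 0.0],
              [0.5, 1.0, 0.5],
              [0.0, 0.5, 1.0]])
h_db    = np.array([0.2, 0.5, 0.3])   # database image histogram
h_query = np.array([0.3, 0.4, 0.3])   # query image histogram
print(qbic_histogram_distance(h_db, h_query, A))
```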

16

Similarity Matrix

Columns and rows are R G B Y C V; the rows recoverable from the slide:

     R    G    B    Y    C    V
R    1    0    0    .5   0    .5
G    0    1    0    .5   .5   0
B    0    0    1    1    1    1

How similar is blue to cyan?

17

Gridded Color

Gridded color distance is the sum of the color distances in each of the corresponding grid squares.

What color distance would you use for a pair of grid squares?

[Figure: two images, each divided into a 2 x 2 grid of cells numbered 1-4; distances are computed between corresponding cells and summed.]
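One plausible implementation, assuming each square's color is summarized by its mean RGB and squares are compared with Euclidean distance (the slide deliberately leaves this choice open):

```python
import numpy as np

def gridded_color_distance(img1, img2, grid=(2, 2)):
    """Sum of color distances over corresponding grid squares.

    Each image is an (H, W, 3) array; the color of a square is
    summarized here by its mean RGB vector.
    """
    H, W, _ = img1.shape
    gh, gw = H // grid[0], W // grid[1]
    total = 0.0
    for r in range(grid[0]):
        for c in range(grid[1]):
            cell1 = img1[r*gh:(r+1)*gh, c*gw:(c+1)*gw].reshape(-1, 3).mean(axis=0)
            cell2 = img2[r*gh:(r+1)*gh, c*gw:(c+1)*gw].reshape(-1, 3).mean(axis=0)
            total += np.linalg.norm(cell1 - cell2)  # per-square color distance
    return total
```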

18

Texture Distances

• Pick and Click (the user clicks on a pixel, and the system retrieves images that contain a region whose texture is similar to the region surrounding that pixel).

• Gridded (just like gridded color, but use texture).

• Histogram-based (e.g. compare the LBP histograms).

19

Local Binary Pattern Measure

Example neighborhood of a pixel p with value 50:

100 101 103
 40  50  80
 50  60  90

Neighbors are numbered clockwise starting from the top-left:

1 2 3
8 p 4
7 6 5

• For each pixel p, create an 8-bit number b1 b2 b3 b4 b5 b6 b7 b8, where bi = 0 if neighbor i has a value less than or equal to p's value and 1 otherwise. For the neighborhood above this gives 1 1 1 1 1 1 0 0.

• Represent the texture in the image (or a region) by the histogram of these numbers.

• Compute the L1 distance between two histograms: L1(h, g) = Σ_i |h(i) − g(i)|.
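A sketch following the slide's conventions (bit set when the neighbor's value exceeds the center's, neighbors taken clockwise from the top-left):

```python
import numpy as np

def lbp_histogram(gray):
    """Normalized histogram of 8-bit local binary patterns."""
    g = gray.astype(np.int32)
    H, W = g.shape
    center = g[1:-1, 1:-1]
    # 8 neighbor offsets, clockwise from the top-left (slide's numbering)
    offsets = [(-1,-1), (-1,0), (-1,1), (0,1), (1,1), (1,0), (1,-1), (0,-1)]
    codes = np.zeros_like(center)
    for bit, (dy, dx) in enumerate(offsets):
        neighbor = g[1+dy:H-1+dy, 1+dx:W-1+dx]
        # bit = 1 only when the neighbor strictly exceeds the center
        codes |= (neighbor > center).astype(np.int32) << (7 - bit)
    hist = np.bincount(codes.ravel(), minlength=256).astype(float)
    return hist / hist.sum()

def l1_distance(h1, h2):
    """L1 distance between two LBP histograms."""
    return float(np.abs(h1 - h2).sum())
```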

20

Laws Texture

21

Laws’ texture masks (1)

22

Shape Distances

• Shape goes one step further than color and texture.

• It requires identification of regions to compare.

• There have been many shape similarity measures suggested for pattern recognition that can be used to construct shape distance measures.

23

Global Shape Properties: Projection Matching

[Figure: a binary shape on a 6 x 6 grid; one projection gives the counts 0 4 1 3 2 0, the other 0 4 3 2 1 0.]

In projection matching, the horizontal and vertical projections form a histogram.

Feature vector: (0, 4, 1, 3, 2, 0, 0, 4, 3, 2, 1, 0)

What are the weaknesses of this method? strengths?
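A minimal sketch; whether the column or the row counts come first in the concatenation is an assumption:

```python
import numpy as np

def projection_feature(binary_shape):
    """Concatenate column and row projections of a binary shape image."""
    cols = binary_shape.sum(axis=0)   # counts per column
    rows = binary_shape.sum(axis=1)   # counts per row
    return np.concatenate([cols, rows])

# For the 6x6 example on the slide this would yield
# (0, 4, 1, 3, 2, 0, 0, 4, 3, 2, 1, 0).
```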

24

Global Shape Properties: Tangent-Angle Histograms

[Figure: a shape and its tangent-angle histogram, with marked angles at 0, 30, 45, and 135 degrees.]

Is this feature invariant to starting point?

Boundary Matching: Fourier Descriptors

Given a sequence of points V_k along a boundary, the Fourier descriptors are computed from its discrete Fourier transform.
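The slide's equations did not survive the transcript, but a standard construction treats each boundary point as a complex number and takes its DFT; the invariance normalizations below are one common variant, not necessarily the slide's exact formulation:

```python
import numpy as np

def fourier_descriptors(boundary, n_coeffs=16):
    """Fourier descriptors of a closed boundary given as an (N, 2) array
    of (x, y) points. Dropping the DC term gives translation invariance;
    dividing by the first harmonic's magnitude gives scale invariance;
    taking magnitudes gives rotation and starting-point invariance."""
    z = boundary[:, 0] + 1j * boundary[:, 1]   # points as complex numbers
    F = np.fft.fft(z)
    F = F[1:n_coeffs + 1]                      # drop the DC term
    return np.abs(F) / (np.abs(F[0]) + 1e-12)
```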

25

26

Boundary Matching

• Elastic Matching

The distance between the query shape and the image shape has two components:

1. energy required to deform the query shape into one that best matches the image shape

2. a measure of how well the deformed query matches the image

27

Del Bimbo Elastic Shape Matching

[Figure: a query shape and the images retrieved for it.]

28

Our work on CBIR at METU

1. M. Uysal, F.T. Yarman-Vural, “A Content-Based Fuzzy Image Database Based on the Fuzzy ARTMAP Architecture”, Turkish Journal of Electrical Engineering & Computer Sciences, vol. 13, pp. 333-342, 2005.
2. Ö.Ö. Tola, N. Arıca, F.T. Yarman-Vural, “Shape Recognition with Generalized Beam Angle Statistics”, Lecture Notes in Computer Science, vol. 2, 2004.
3. Ö.C. Özcanlı, F.T. Yarman-Vural, “An Image Retrieval System Based on Region Classification”, Lecture Notes in Computer Science, vol. 1, 2004.
4. Y. Xu, P. Duygulu, E. Saber, A.M. Tekalp, F.T. Yarman-Vural, “Object-based Image Labeling Through Learning by Example and Multi-level Segmentation”, Pattern Recognition, vol. 36, pp. 1407-1423, 2003.
5. N. Arıca, F.T. Yarman-Vural, “BAS: A Perceptual Shape Descriptor Based on the Beam Angle Statistics”, Pattern Recognition Letters, vol. 24, pp. 1627-1639, 2003.
6. M. Özay, F.T. Yarman-Vural, “On the Performance of Stacked Generalization Architecture”, Lecture Notes in Computer Science, ICIAR, pp. 445-451, 2008.
7. E. Akbaş, F.T. Yarman-Vural, “Automatic Image Annotation by Ensemble of Visual Descriptors”, CVPR 2007, pp. 1-8, 2007.
8. İ. Yalabık, F.T. Yarman-Vural, G. Ucoluk, O.T. Sehitoglu, “A Pattern Classification Approach for Boosting with Genetic Algorithms”, ISCIS'07, pp. 1-6, 2007.
9. M. Uysal, E. Akbaş, F.T. Yarman-Vural, “A Hierarchical Classification System Based on Adaptive Resonance Theory”, ICIP'06, pp. 2913-2916, 2006.
10. E. Akbaş, F.T. Yarman-Vural, “Design of Feature Set for Face Recognition Problem”, Lecture Notes in Computer Science, ISCIS 2006, pp. 239-247, 2006.
11. N. Arıca, F.T. Yarman-Vural, “Shape Similarity Measurement for Boundary Based Features”, Lecture Notes in Computer Science, vol. 3656, ISBN 3-540-29069-9, pp. 431-438, 2005.
12. M. Uysal, F.T. Yarman-Vural, “ORF-NT: An Object-Based Image Retrieval Framework Using Neighborhood Trees”, Lecture Notes in Computer Science, vol. 3733, ISBN 3-540-29414-7, ISCIS 2005, pp. 595-605, 2005.
13. N. Arıca, F.T. Yarman-Vural, “A Compact Shape Descriptor Based on the Beam Angle Statistics”, International Conference on Image and Video Retrieval (CIVR), 2003.
14. N. Arıca, F.T. Yarman-Vural, “A Perceptual Shape Descriptor”, International Conference on Pattern Recognition (ICPR), 2002.
15. A. Çarkacıoğlu, F.T. Yarman-Vural, “Learning Similarity Space”, Proc. Int. Conf. on Image Processing (ICIP'02), pp. 405-408, Rochester, NY, USA, September 2002.
16. N. Arıca, F.T. Yarman-Vural, “A Perceptual Shape Descriptor”, Proc. International Conference on Pattern Recognition (ICPR), Quebec, Canada, 2002.
17. N. Arıca, F.T. Yarman-Vural, “A Shape Descriptor Based on Circular Hidden Markov Model”, Proc. International Conference on Pattern Recognition (ICPR), Barcelona, Spain, 2000.
18. P. Duygulu, E. Saber, M. Tekalp, F.T. Yarman-Vural, “Multi-Level Image Segmentation for Content Based Image Representation”, IEEE ICASSP 2000, July 2000, Istanbul.
19. P. Duygulu, A. Çarkacıoğlu, F.T. Yarman-Vural, “Multi-Level Object Description: Color or Texture”, First IEEE Balkan Conference on Signal Processing, Communications, Circuits, and Systems, June 1-3, 2000, Istanbul, Turkey.
20. N. Dicle, F.T. Yarman-Vural, “NES: A File Format for Content Based Coding of Images”, ISCIS 15, 2000, Istanbul.
21. N. Arıca, F.T. Yarman-Vural, “A New HMM Topology for Shape Recognition”, IEEE-EURASIP Workshop on Nonlinear Signal and Image Processing (NSIP'99), pp. 162-168, Antalya, Turkey, 1999.
22. S. Genc, F.T. Yarman-Vural, “Morphing for Motion Estimation”, Int. Conf. on Image Processing Systems and Technologies and Information Systems, Las Vegas, 1999.

29

Türkçe Bildiriler (Papers in Turkish)

1. M. Özay, F.T. Yarman Vural, “Yığılmış Genelleme Algoritmalarının Performans Analizi”, Sinyal İşleme ve İletişim Uygulamaları Kurultayı, 2008.
2. A. Sayar, F.T. Yarman Vural, “Yarı Denetimli Kümeleme ile Görüntü İçeriğine Açıklayıcı Not Üretme”, Sinyal İşleme ve İletişim Uygulamaları Kurultayı, 2008.
3. Ö. Özcanlı, E. Akbaş, F.T. Yarman Vural, “Görüntü Erişiminde Bulanık ARTMAP ve Adaboost Sınıflandırma Yöntemlerinin Performans Karşılaştırması”, IEEE 13. Sinyal İşleme ve İletişim Uygulamaları Kurultayı, 2005.
4. Ö.Ö. Tola, N. Arıca, F.T. Yarman Vural, “Genelleştirilmiş Kerteriz Açısı İstatistikleri ile Şekil Tanıma”, Sinyal İşleme ve İletişim Uygulamaları Kurultayı, 1, 2004.
5. M. Uysal, Ö. Özcanlı, F.T. Yarman Vural, “Bulanık ARTMAP Mimarisini Kullanan İçerik Bazlı Bir İmge Sorgulama Sistemi”, Sinyal İşleme ve İletişim Uygulamaları Kurultayı, 2004.
6. A. Çarkacıoğlu, F.T. Yarman Vural, “Doku Benzerliğini Tesbit Etmek İçin Yeni Bir Tanımlayıcı”, 11. SIU, Sinyal İşleme ve İletişim Uygulamaları Kurultayı, 2003.
7. N. Arıca, F.T. Yarman Vural, “Tıkız Şekil Betimleyicileri”, 11. SIU, Sinyal İşleme ve İletişim Uygulamaları Kurultayı, 2003.
8. Ö.C. Özcanlı, P. Duygulu Şahin, F.T. Yarman Vural, “Açıklamalı Görüntü Veritabanları Kullanarak Nesne Tanıma ve Erişimi”, 11. SIU, Sinyal İşleme ve İletişim Uygulamaları Kurultayı, 2003.
9. S. Alkan, F.T. Yarman Vural, “Öğretim Üyesi Yetiştirme Programı”, Elektrik Elektronik, Bilgisayar Mühendislikleri Eğitimi I. Ulusal Sempozyumu, 2003.
10. M. Uysal, F.T. Yarman Vural, “En İyi Temsil Eden Öznitelik Kullanılarak İçeriğe Dayalı Endeksleme ve Bulanık Mantığa Dayalı Sorgulama Sistemi”, 11. SIU, Sinyal İşleme ve İletişim Uygulamaları Kurultayı, 2003.
11. N. Arıca, F.T. Yarman Vural, “Kerteriz Tabanlı Şekil Tanımlayıcısı”, 10. SIU, cilt 1, s. 129-134, 2002.
12. Ö.C. Özcanlı, S. Dağlar, F.T. Yarman Vural, “Yerel Benzerlik Örüntüsü Yöntemi ile Görüntü Erişim Sistemi”, 10. SIU, cilt 1, s. 141-146, 2002.

30

BAS: Beam Angle Statistics

Ömer Önder Tola, Nafiz Arıca, Fatoş Tünay Yarman Vural

Beam Angle Statistics (BAS)

BAS is a shape descriptor for a shape defined as an ordered set of boundary pixels: each boundary pixel is assigned the statistics of the beam angles measured at it, forming a feature vector.

The beam angle shows how much the shape bends at a point.

BAS describes the 2D shape by 1D moment functions of the beam angles.

Beam Angle Statistics: given an ordered set of boundary pixels p(i), i = 1, …, N.

[Figures: a boundary whose pixels are numbered in order (1-36 on the slides); at a point p(i), the beams to p(i+K) and p(i−K) form the K-th order beam angle, illustrated for C_1(p(i)), C_2(p(i)), C_3(p(i)), C_17(p(i)), and a general C_K(p(i)).]

The K-th order beam angle C_K(i) is the angle between the forward beam vector from p(i) to p(i+K) and the backward beam vector from p(i) to p(i−K).

Treating C(i) as a random variable over K, each boundary point is represented by its moments:

Γ(i) = [ E[C^1(i)], E[C^2(i)], … ]

E[C^m(i)] = Σ_K (C_K(i))^m · P(C_K(i)),  m = 0, 1, 2, …
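A direct numpy sketch of these definitions (treating every K from 1 to N/2 − 1 as equally likely, i.e. uniform P(C_K), which is one plausible reading of the slides):

```python
import numpy as np

def bas_moments(boundary, n_moments=3):
    """Beam Angle Statistics sketch: for each boundary point p(i), the
    K-th order beam angle C_K(i) is the angle between p(i+K)-p(i) and
    p(i-K)-p(i); the point is represented by the first moments of C_K(i)
    taken over K. `boundary` is an (N, 2) array of ordered pixels."""
    N = len(boundary)
    feats = np.zeros((N, n_moments))
    for i in range(N):
        angles = []
        for K in range(1, N // 2):
            fwd = boundary[(i + K) % N] - boundary[i]
            bwd = boundary[(i - K) % N] - boundary[i]
            c = np.dot(fwd, bwd) / (np.linalg.norm(fwd) * np.linalg.norm(bwd) + 1e-12)
            angles.append(np.arccos(np.clip(c, -1.0, 1.0)))
        angles = np.asarray(angles)
        # m-th moment with each K weighted equally (uniform P(C_K))
        feats[i] = [np.mean(angles ** m) for m in range(1, n_moments + 1)]
    return feats
```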

41

Plots of C_K(i) with fixed K values

[Figure: plots of C_K(i) for K = N/40, K = N/10, K = N/4, and for the range K = N/40 to N/4.]

42

What is the most appropriate value of K, one that discriminates the shapes in a large database and represents the shape information at all scales?

Answer: Find a representation that employs the information in C_K(i) for all values of K, treating the value at each point as the output of a stochastic process.

43

C(i) is a random variable of the stochastic process that generates the beam angles. Each boundary point i is represented by the m-th moments of C(i).

44

First three moments of C(i)’s

45

Correspondence of Visual Parts and Insensitivity to Affine Transformation

46

Robustness to Polygonal Approximation

Robustness to Noise

47

Similarity Measurement: Elastic Matching Algorithm

• An application of dynamic programming
• Minimizes the distance between two patterns by allowing deformations of the patterns
• The cost of matching two items is calculated with the Euclidean metric
• Robust to distortions; promises to approximate the human way of perceiving similarity
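A DTW-style dynamic-programming sketch in this spirit (warping plays the role of deformation; the actual elastic matching used with BAS deforms the patterns themselves, so this is an analogy rather than the exact algorithm):

```python
import numpy as np

def elastic_match_cost(a, b):
    """Dynamic-programming alignment of two feature sequences a (n, d)
    and b (m, d). Local costs are Euclidean distances; the recurrence
    allows stretching/compressing either sequence."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]
```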

48

TEST RESULT FOR MPEG 7 CE PART A-1 Robustness to Scaling

49

TEST RESULT FOR MPEG 7 CE PART A-2 Robustness to Rotation

50

TEST RESULT FOR MPEG 7 CE PART B Similarity-based Retrieval

51

TEST RESULT FOR MPEG 7 CE PART C Motion and Non-Rigid Deformations

52

Comparison with the Best Studies on the MPEG-7 CE Shape-1 Data Set

Method                   Part A1   Part A2   Part B   Part C
Shape Context            -         -         76.51    -
Tangent Space            88.65     100       76.45    92
Curvature Scale Space    89.76     99.37     75.44    96
Zernike Moments          92.54     99.60     70.22    94.5
Wavelet                  88.04     97.46     67.76    93
DAG                      85        85        60       83
BAS (length 40)          89.32     99.82     81.04    93
BAS (length 60)          90.87     100       82.37    93.5

53

Performance Evaluation

(1) Average performance over the three parts: Total Score 1 = (1/3) A + (1/3) B + (1/3) C

Method                   Total Score 1
Tangent Space            87.59
Curvature Scale Space    88.67
Zernike Moments          86.93
Wavelet                  84.50
DAG                      76
BAS (length 40)          90.80
BAS (length 60)          91.69

(2) Average performance over the number of queries: Total Score 2 = (840/2241) A + (1400/2241) B + (1/2241) C

Method                   Total Score 2
Tangent Space            83.16
Curvature Scale Space    82.62
Zernike Moments          79.92
Wavelet                  77.14
DAG                      69.38
BAS (length 40)          86.12
BAS (length 60)          87.27

The boundary of a shape is not always available.

Question: Can we somehow extract the boundary of a shape directly from the edge-detected image?

GENERALIZE THE BAS FUNCTIONS

Generalized Beam Angle Statistics (GBAS)

Assume the shape is embedded in a set of unordered edge pixels. Define the generalized beam angles formed by the forward and backward boundary pixel sets that are partitioned by the mean beam vector.

Generalized Beam Angle Statistics (GBAS)

[Figures: an edge-detected shape given as unordered edge pixels (numbered 1-34 on the slides); at a pixel p(i), the mean beam vector is drawn and partitions the remaining edge pixels.]

At each pixel p(i), the mean beam vector splits the edge pixels into a forward boundary pixel set G = {g_1, g_2, …, g_S} and a backward boundary pixel set I = {i_1, i_2, …, i_R}.

The generalized beam angle C_k,l(p(i)) is the angle formed between the beam from p(i) to a forward pixel g_k and the beam from p(i) to a backward pixel i_l.

As in BAS, each pixel is represented by the moments of these angles:

Γ(i) = [ E[C^1(i)], E[C^2(i)], … ]

E[C^m(i)] = Σ_{k=1..S} Σ_{l=1..R} (C_k,l(i))^m · P(C_k,l(i)),  m = 0, 1, 2, …

HANOLISTIC: a hierarchical automatic image annotation system using a holistic approach

Özge Öztimur Karadağ & Fatoş T. Yarman Vural
Department of Computer Engineering, Middle East Technical University, Ankara, Turkey

Automatic Image Annotation

Image annotation: assigning keywords to digital images. Done by hand, it is labor intensive and time consuming, so a system is needed that annotates images automatically.

Image Annotation Literature

The annotation problem has been popular since the 1990s. It is related to CBIR: CBIR processes visual information only, while annotation processes both visual and semantic information, relating visual content information to semantic context information.

Problems About Automatic Image Annotation
• Human subjectivity
• Semantic gap
• Availability of datasets

Image Annotation Approaches in the Literature…

Segmental approaches:
1. Segment or partition the image into regions
2. Extract features from the regions
3. Quantize features into blobs
4. Model the relation between the image regions and annotation words

Holistic approaches: features are extracted from the whole image.

The Proposed System: HANOLISTIC

Introducing semantic information as supervision: each word is considered a class label, and an image belongs to one or more classes.

Holistic approach: multiple visual features are extracted from the whole image, giving multiple feature spaces.

Description of an Image

Content description by MPEG-7 visual features:
• Color Layout
• Color Structure
• Scalable Color
• Homogeneous Texture
• Edge Histogram

Context description by semantic words: annotation words.

System Architecture of HANOLISTIC

Level-0: consists of level-0 annotators, one annotator for each visual description space.

Meta-level: consists of a meta-annotator.

Level-0 Annotator

Each level-0 annotator takes the features of the i-th image in the j-th description space as input and outputs the membership value of the l-th word for the i-th image in that description space.

Meta-Level

The results of the level-0 annotators are aggregated into a vector of final word membership values for the i-th image.

Experimental studies

Realization of HANOLISTIC: instance-based realization of level-0, eager realization of level-0, realization of the meta-level

Performance criteria

Results

Experimental Setup

• Data set: a subset of the Corel Stock Photo Collection, consisting of 5000 images
• Training set: 4500 images (500 of them for validation)
• Testing set: 500 images
• Each image is annotated with 1-5 words

Instance-based Realization of the Level-0 Annotator by Fuzzy-kNN

Level-0 annotators are realized by fuzzy-kNN. For each description space, the k nearest neighbors of the image are determined. Word membership values are estimated from the neighbors' words and their distances from the image: high membership values are assigned to words that appear in the close neighborhood.
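A minimal sketch of such a fuzzy-kNN level-0 annotator (the inverse-distance weighting and all names below are illustrative assumptions, not the paper's exact formula):

```python
import numpy as np

def fuzzy_knn_memberships(query_feat, train_feats, train_words, vocab, k=5):
    """Estimate word membership values for a query image from its k
    nearest training images in one description space, weighting each
    neighbor's words by its (normalized) inverse distance."""
    dists = np.linalg.norm(train_feats - query_feat, axis=1)
    nearest = np.argsort(dists)[:k]
    weights = 1.0 / (dists[nearest] + 1e-6)
    weights /= weights.sum()
    membership = {w: 0.0 for w in vocab}
    for idx, wt in zip(nearest, weights):
        for word in train_words[idx]:   # annotation words of neighbor idx
            membership[word] += wt
    return membership
```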

Eager Realization of Level-0 by ANN

For a given image I_i, the ANN receives the visual description of the image as input and the semantic annotation words of the image as target. Each ANN is trained with backpropagation, and a randomly selected set of images is used for validation to determine when to stop training. K-fold cross-validation is applied.

Realization of the Meta-Level by Majority Voting

The membership values returned by the level-0 annotators are added up, P_i = Σ_j P_i,j, where P_i,j is a vector containing the word membership values returned by the j-th level-0 annotator. Alternatively, for each word, the maximum of the five word membership values estimated by the level-0 annotators is selected.
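Both aggregation rules in a few lines (a sketch; the names are illustrative):

```python
import numpy as np

def meta_annotate(level0_outputs, mode="sum"):
    """Combine level-0 membership vectors P_{i,j}, either by summing
    them or by taking the per-word maximum across annotators."""
    P = np.vstack(level0_outputs)   # shape (n_annotators, n_words)
    return P.sum(axis=0) if mode == "sum" else P.max(axis=0)
```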

Performance Criteria
• Precision
• Recall
• F-score
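The formulas themselves did not survive the transcript; assuming the standard per-word definitions used in the annotation literature:

```python
def word_prf(n_correct, n_predicted, n_ground_truth):
    """Per-word precision, recall, and F-score (standard definitions)."""
    precision = n_correct / n_predicted if n_predicted else 0.0
    recall = n_correct / n_ground_truth if n_ground_truth else 0.0
    f = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return precision, recall, f
```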

Performance of Level-0 Annotators

[Table: performance of the level-0 annotators realized with fuzzy-kNN.]

Performance of HANOLISTIC

Comparison of HANOLISTIC with other systems in the literature:

Annotation Examples

Annotation Examples…

Conclusion

We proposed a hierarchical automatic image annotation system using a holistic approach. We tested the system with both an instance-based and an eager method, and found the instance-based methods more promising in the considered problem domain.

Conclusion…

The power of the proposed system comes from the following main principles:
• Simplicity
• Fuzziness
• Simultaneous processing of content and context information
• Holistic view of the image through different perspectives

84

Regions and Relationships

• Segment the image into regions

• Find their properties and interrelationships

• Construct a graph representation with nodes for regions and edges for spatial relationships

• Use graph matching to compare images

Like what?

85

Tiger Image as a Graph

[Figure: the tiger image represented as a graph. Region nodes (sky, sand, tiger, grass) hang off the image's abstract regions and are connected by spatial-relationship edges labeled "above", "adjacent", and "inside".]
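An illustrative encoding of such a region graph with networkx (the specific edges below are guesses at the figure's labels, and `same_structure` is a hypothetical helper comparing two images by relation-labelled graph isomorphism):

```python
import networkx as nx

# Region graph for the tiger image; edge labels follow the figure,
# but the exact set of edges is an assumption.
g = nx.DiGraph()
g.add_edge("sky", "sand", rel="above")
g.add_edge("sky", "grass", rel="above")
g.add_edge("grass", "sand", rel="adjacent")
g.add_edge("tiger", "grass", rel="inside")

def same_structure(g1, g2):
    """Compare two region graphs by isomorphism with matching relations."""
    return nx.is_isomorphic(g1, g2,
                            edge_match=lambda e1, e2: e1["rel"] == e2["rel"])
```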

86

Object Detection: Rowley’s Face Finder

1. Convert to gray scale.
2. Normalize for lighting.*
3. Histogram equalization.
4. Apply a neural net trained on 16K images.

What data is fed to the classifier? 32 x 32 windows in a pyramid structure.

* Like the first step in the Laws algorithm, p. 220.

87

Fleck and Forsyth’s Flesh Detector

The “Finding Naked People” Paper

• Convert RGB to HSI.
• Use the intensity component to compute a texture map: texture = med2(|I − med1(I)|), where med1 and med2 are median filters of radii 4 and 6.
• If a pixel falls into either of the following ranges, it is a potential skin pixel:

  texture < 5, 110 < hue < 150, 20 < saturation < 60
  texture < 5, 130 < hue < 170, 30 < saturation < 130

Look for LARGE areas that satisfy this to identify pornography.
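A sketch of the pixel test with scipy (square median windows of sizes 9 and 13 approximate the paper's radius-4 and radius-6 filters; the HSI conversion is assumed to be done elsewhere):

```python
import numpy as np
from scipy.ndimage import median_filter

def skin_mask(hue, saturation, intensity):
    """Fleck-Forsyth-style skin test: texture from two median filters,
    then the hue/saturation ranges quoted on the slide."""
    med1 = median_filter(intensity, size=9)            # ~radius-4 filter
    texture = median_filter(np.abs(intensity - med1), size=13)  # ~radius-6
    range1 = (texture < 5) & (hue > 110) & (hue < 150) \
             & (saturation > 20) & (saturation < 60)
    range2 = (texture < 5) & (hue > 130) & (hue < 170) \
             & (saturation > 30) & (saturation < 130)
    return range1 | range2   # boolean mask of potential skin pixels
```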


88

Relevance Feedback

In real interactive CBIR systems, the user should be allowed to interact with the system to “refine” the results of a query until he/she is satisfied.

Relevance feedback work has been done by a number of research groups, e.g.

• The Photobook Project (Media Lab, MIT)
• The Leiden Portrait Retrieval Project
• The MARS Project (Tom Huang’s group at Illinois)

89

Information Retrieval Model*

An IR model consists of:
• a document model
• a query model
• a model for computing similarity between documents and queries

Term (keyword) weighting

Relevance Feedback

*from Rui, Huang, and Mehrotra’s work

90

Term weighting

Term weighting assigns different weights to different keywords (terms) according to their relative importance to the document.

Define w_ik to be the weight for term t_k, k = 1, 2, …, N, in document i. Document i can then be represented as a weight vector in the term space:

D_i = [w_i1, w_i2, …, w_iN]

91

Term weighting

The query Q is also a weight vector in the term space:

Q = [w_q1, w_q2, …, w_qN]

The similarity between D and Q is the cosine of the angle between the two vectors:

Sim(D, Q) = (D · Q) / (|D| |Q|)
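In code, this is just cosine similarity:

```python
import numpy as np

def sim(d, q):
    """Cosine similarity between document and query weight vectors."""
    return float(np.dot(d, q) / (np.linalg.norm(d) * np.linalg.norm(q) + 1e-12))
```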

92

Using Relevance Feedback

The CBIR system should automatically adjust the weights that were given by the user for the relevance of previously retrieved documents.

Most systems use a statistical method for adjusting the weights.

93

The Idea of Gaussian Normalization

If all the relevant images have similar values for component j, then component j is relevant to the query. If the relevant images have very different values for component j, then component j is not relevant to the query.

The inverse of the standard deviation over the sequence of relevant images is therefore a good measure of the weight for component j: the smaller the variance, the larger the weight.
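A minimal sketch of this inverse-standard-deviation weighting, applied as a weighted Euclidean distance (one common realization; the exact normalization is an assumption):

```python
import numpy as np

def feedback_weights(relevant_feats):
    """Weight each feature component by the inverse standard deviation
    over the images the user marked as relevant."""
    std = np.std(relevant_feats, axis=0)   # per-component spread
    w = 1.0 / (std + 1e-6)                 # small variance -> large weight
    return w / w.sum()

def weighted_distance(q, x, w):
    """Distance between query q and image x under component weights w."""
    return float(np.sqrt(np.sum(w * (q - x) ** 2)))
```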

94

Leiden Portrait System

The Leiden Portrait Retrieval System is an example of the use of relevance feedback.

http://ind156b.wi.leidenuniv.nl:2000/

95

Andy Berman’s FIDS System
• multiple distance measures
• Boolean and linear combinations
• efficient indexing using images as keys

96

Andy Berman’s FIDS System:

Use of key images and the triangle inequality for efficient retrieval.

97

Andy Berman’s FIDS System:

Bare-Bones Triangle Inequality Algorithm

Offline

1. Choose a small set of key images

2. Store distances from database images to keys

Online (given query Q)

1. Compute the distance from Q to each key

2. Obtain lower bounds on distances to database images

3. Threshold or return all images in order of lower bounds
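A sketch of the lower-bound step: by the triangle inequality, d(Q, X) ≥ |d(Q, key) − d(X, key)|, so the stored key distances yield bounds without touching the images themselves:

```python
import numpy as np

def lower_bounds(d_query_keys, d_db_keys):
    """Triangle-inequality lower bounds on query-to-image distances.

    d_query_keys: (n_keys,) distances from the query to the key images
    d_db_keys:    (n_images, n_keys) stored distances from DB images to keys
    Returns the tightest bound per image (max over keys).
    """
    return np.abs(d_db_keys - d_query_keys).max(axis=1)

# Images whose lower bound already exceeds a threshold can be discarded
# without ever computing their true distance to the query.
```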

98

Andy Berman’s FIDS System:

99

Andy Berman’s FIDS System:

Bare-Bones Algorithm with Multiple Distance Measures

Offline

1. Choose key images for each measure

2. Store distances from database images to keys for all measures

Online (given query Q)

1. Calculate lower bounds for each measure

2. Combine to form lower bounds for composite measures

3. Continue as in single measure algorithm

100

Andy Berman’s FIDS System:

Triangle Tries

A triangle trie is a tree structure that stores the distances from database images to each of the keys, one key per tree level.

[Figure: a two-level triangle trie. The root branches on the distance to key 1 (values 3 and 4); the next level branches on the distance to key 2 (1 under the 3-branch; 9 and 8 under the 4-branch), and the leaves hold the images {W, Z}, {X}, and {Y}.]

101

Andy Berman’s FIDS System:

Triangle Tries and Two-Stage Pruning

• First Stage: Use a short triangle trie.

• Second Stage: Bare-bones algorithm on the images returned from the triangle-trie stage.

The quality of the output is the same as with the bare-bones algorithm itself, but execution is faster.

102

Andy Berman’s FIDS System:

103

Andy Berman’s FIDS System:

104

Andy Berman’s FIDS System:

Performance on a 200-MHz Pentium Pro

Step 1. Extract features from the query image. (0.02 s ≤ t ≤ 0.25 s)

Step 2. Calculate the distances from the query to the key images. (1 μs ≤ t ≤ 0.8 ms)

Step 3. Calculate the lower-bound distances. (t ≈ 4 ms per 1000 images using 35 keys, which is about 250,000 images per second.)

Step 4. Return the images with smallest lower bound distances.

105

Andy Berman’s FIDS System:

106

Weakness of Low-level Features

Can’t capture the high-level concepts

107

Current Research Objective

[Diagram: the user's query image and the database images pass through object-oriented feature extraction; retrieved images are organized into semantic categories such as Animals, Buildings (Office Buildings, Houses), and Transportation (Boats, Vehicles), e.g. “boat”.]

108

Overall Approach

• Develop object recognizers for common objects
• Use these recognizers to design a new set of both low- and high-level features
• Design a learning system that can use these features to recognize classes of objects

109

Boat Recognition

110

Vehicle Recognition

111

Building Recognition

112

Building Features: Consistent Line Clusters (CLC)

A Consistent Line Cluster is a set of lines that are homogeneous in terms of some line features.

Color-CLC: The lines have the same color feature.

Orientation-CLC: The lines are parallel to each other or converge to a common vanishing point.

Spatially-CLC: The lines are in close proximity to each other.

113

Color-CLC

• Color feature of lines: the color pair (c1, c2)
• Color pair space: RGB gives 256^3 x 256^3 pairs, too big! Use dominant colors (20 x 20)
• Finding the color pairs: one line yields several color pairs
• Constructing the Color-CLC: use clustering

114

Color-CLC

115

Orientation-CLC

• The lines in an Orientation-CLC are parallel to each other in the 3D world
• The parallel lines of an object in a 2D image can be: parallel in 2D, or converging to a vanishing point (perspective)

116

Orientation-CLC

117

Spatially-CLC
• Vertical position clustering
• Horizontal position clustering

118

Building Recognition by CLC

• Two types of buildings
• Two criteria: the inter-relationship criterion and the intra-relationship criterion

119

Inter-relationship criterion: (Nc1 > Ti1 or Nc2 > Ti1) and (Nc1 + Nc2) > Ti2

Nc1 = number of intersecting lines in cluster 1

Nc2 = number of intersecting lines in cluster 2

120

Intra-relationship criterion

|So| > Tj1 or w(So) > Tj2

So = the set of heavily overlapping lines in a cluster

121

Experimental Evaluation: Object Recognition

• 97 well-patterned buildings (bp): 97/97
• 44 not well-patterned buildings (bnp): 42/44
• 16 not patterned non-buildings (nbnp): 15/16 (one false positive)
• 25 patterned non-buildings (nbp): 0/25

CBIR

122

Experimental Evaluation Well-Patterned Buildings

123

Experimental Evaluation Non-Well-Patterned Buildings

124

Experimental Evaluation Non-Well-Patterned Non-Buildings

125

Experimental Evaluation: Well-Patterned Non-Buildings (false positives)

126

Experimental Evaluation (CBIR)

Data set       Total positive (#)   Total negative (#)   False positive (#)   False negative (#)   Accuracy (%)
Arborgreens    0                    47                   0                    0                    100
Campusinfall   27                   21                   0                    5                    89.6
Cannonbeach    30                   18                   0                    6                    87.5
Yellowstone    4                    44                   4                    0                    91.7

127

Experimental Evaluation (CBIR): False Positives from Yellowstone
