


Pattern Recognition 35 (2002) 2673–2686
www.elsevier.com/locate/patcog

Image retrieval via isotropic and anisotropic mappings

Qasim Iqbal∗, J.K. Aggarwal
Computer and Vision Research Center, Department of Electrical and Computer Engineering, The University of Texas at Austin, Austin, TX 78712, USA

    Received 31 October 2001; accepted 31 October 2001

    Abstract

This paper presents an approach for content-based image retrieval via isotropic and anisotropic mappings. Isotropic mappings are defined as mappings invariant to the action of the planar Euclidean group on the image space—invariant to the translation, rotation and reflection of image data, and hence, invariant to orientation and position. Anisotropic mappings, on the other hand, are defined as those mappings that are correspondingly variant. Structure extraction (via a perceptual grouping process) and color histogram are shown to be representations of isotropic mappings. Texture analysis using a channel energy model comprised of even-symmetric Gabor filters is considered to be a representation of anisotropic mapping. An integration framework for these mappings is developed. Results of retrieval of outdoor images by query and by classification using a nearest neighbor classifier are presented. © 2002 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved.

Keywords: Image retrieval; Euclidean group; Perceptual grouping; Structure; Texture; Color histogram; Gabor filter; Nearest neighbor classifier

    1. Introduction

The interest in the automatic analysis of images based upon their content has significantly increased with recent developments in digital image collections, the world wide web (WWW), networking and multimedia. Active research in content-based image retrieval (CBIR) is geared towards the development of methodologies for analyzing, interpreting, cataloging and indexing image databases. In addition to their development, efforts are also being made to evaluate the performance of image retrieval systems [1].

Most of the previous work in image retrieval has focused on retrieval by image query [2–5]. However, retrieval by image classification has also gained attention [6–10].

This work was supported in part by the Army Research Office under Contracts DAAD19-00-1-0044, DAAG55-98-1-0230 and DAAD19-99-1-0012 (Johns Hopkins University Subcontract Agreement 8905-48168).

∗ Corresponding author. Tel.: +1-512-471-1369; fax: +1-512-471-5532.
E-mail addresses: [email protected] (Q. Iqbal), [email protected] (J.K. Aggarwal).

Retrieval by image query refers to the retrieval of images similar to a given query image from an image database, whereas retrieval by classification refers to the classification of images into certain known classes for retrieval. In this paper, we develop a methodology for retrieval of outdoor images using both image query and image classification by using a nearest neighbor classifier.

In image analysis, a desirable attribute is the notion of isotropy of computations in the sense of Euclidean invariance: any rotation, translation or reflection of the input should produce an identical result under these transformations, thus achieving orientation and position invariance. These image transformations are generated by the action of the (planar) Euclidean group on the image space. An isometry is a transformation of a space which preserves (Euclidean) distance. The Euclidean group is the group of isometries of a Euclidean space and is (isomorphic to) the semi-direct product of the orthogonal group and the translation group. The orthogonal group is represented by rotations and reflections. The translation group is represented by shifts of the Euclidean space.

The action of the Euclidean group on the space of positions and directions R² × S¹, where positions are represented using R² and directions using the unit circle S¹, generates isometric geometrical objects. It has been argued that visual computations occur on R² × S¹, rather than on just R² [11]. Using this notion of isotropy, we present an approach for content-based image retrieval via isotropic and anisotropic mappings.

We define an isotropic mapping as a mapping that is invariant to the action of the Euclidean group on the image space—invariant to translation, rotation, and reflection of image data. Similarly, we define an anisotropic mapping as a mapping that is variant to the action of the Euclidean group. Isometries are important for developing a framework for isotropic mappings. It is shown later that all isometries of a plane can be represented using the product of a translation and either a rotation or a reflection. Isotropic mappings acting on perceptually salient image structures are useful in retrieval, as they illustrate the similarity of different structures in an image. On the other hand, anisotropic mappings indicate the uniqueness of certain attributes of different images. We show that structure extraction via perceptual grouping is a natural candidate for isotropic mappings, as are histograms of pixel color values. On the other hand, lower-level texture analysis via a Gabor filter bank (which possesses affinity for certain preferred directions) operating in a channel energy model is shown as an effective candidate for anisotropic mappings.

This paper discusses an integration framework for these mappings. The integrated framework takes advantage of the strength of structure, color histogram and texture in their respective domains for retrieval. The motivation is to develop a system that is able to retrieve images ranging from purely natural objects, such as images of vegetation, flowers, water and sky, to images containing conspicuous structure, such as images of buildings, towers, bridges and other architectural objects. Results of retrieval of outdoor images by query and by classification using a nearest neighbor classifier are presented.

The rest of the paper is organized as follows: Section 2 explains the perceptual grouping process to extract structure. Section 3 provides a brief introduction to the Euclidean group. Section 4 establishes structure and the color histogram as representations of isotropic mapping. Section 5 describes the texture analysis via a channel energy model as a representation of anisotropic mapping. Section 6 outlines the integration of isotropic and anisotropic mappings. Section 7 describes the results obtained, and finally, Section 8 provides the conclusions.

2. Perceptual grouping—structure extraction and feature selection

The human visual system can detect many classes of patterns and statistically significant arrangements of image elements. Perceptual grouping refers to the human visual ability to extract significant image relations from lower-level primitive image features without any knowledge of the image content and hierarchically group them to obtain meaningful higher-level structure. It stresses the uniformity of psychological grouping for perception and recognition, as opposed to recognition by analysis of discrete primitive image features, and embodies such concepts as grouping by proximity, similarity, continuation, closure, and symmetry [8].

Man-made objects have sharp edges and straight boundaries. The presence of a man-made object in an image generates a large number of significant edges, junctions, parallel lines and groups, and closed structures, in comparison with an image with predominantly non-man-made (non-structural) objects. These features are generated by the presence of corners, windows, doors and boundaries of objects such as buildings, towers, bridges and other architectural objects. They exhibit regularity and relationship, and are strong evidence that structure is present in an image. The presence of these distinguishing features follows the "principle of non-accidentalness" [8]; therefore, these features are more likely to be generated by man-made objects. Hence, these discriminating features distinguish between an image containing man-made objects and an image containing none.

In our approach, segmentation and detailed object representation are not required. We extract the following features hierarchically in an unconstrained environment, i.e., with no constraints on the viewing angle and depth, using the approach detailed in Refs. [8,9]: line (edge) segments, longer linear lines, retained lines, coterminations, "L" junctions, "U" junctions, parallel lines, parallel groups, "significant" parallel groups, and polygons. The features are shown in Fig. 1. Perceptual grouping rules of similarity, continuity, parallelism and closure are used to extract these features.

In general, the extracted feature vector X_S = (x̃_S1, …, x̃_Sd)^t, where d is the dimensionality of the feature space, and

\[
\tilde{x}_{S_i} = \frac{\sum_j \chi_{\Omega_{S_i}}(l_j)}{\sum_k \chi_{\Omega_{el}}(l_k)}, \qquad i \in \{1, \ldots, d\},
\]

where χ denotes the characteristic (indicator) function, l is a retained line, Ω_el is the set of all retained lines, Ω_Si is a higher-level structure extracted, and x̃_Si ∈ [0, 1], i.e., the feature space is represented by a unit hypercube. For retrieval by both image query and image classification, we set d = 3, and let Ω_Si represent "L" junctions, "U" junctions, and "significant parallel groups and polygons" for i ∈ {1, 2, 3}, respectively. Thus, x̃_Si represents the corresponding normalized number of lines. Hence, the feature vector extracted is expressed as

\[
X_S = (\tilde{x}_{S_1}, \tilde{x}_{S_2}, \tilde{x}_{S_3})^t, \tag{1}
\]

where

\[
\tilde{x}_{S_1} = \frac{\text{No. of lines in ``L'' junctions}}{\text{Total no. of retained lines}}, \qquad
\tilde{x}_{S_2} = \frac{\text{No. of lines in ``U'' junctions}}{\text{Total no. of retained lines}},
\]

\[
\tilde{x}_{S_3} = \frac{\text{No. of lines in (significant) parallel groups and polygons}}{\text{Total no. of retained lines}}. \tag{2}
\]

Fig. 1. Visualization of the groupings: (a) longer linear line, (b) coterminations, (c) "L" junctions, (d) "U" junction, (e) "U" junction, (f) parallel groups, (g) polygons.

Detailed justification for using this feature vector is provided in Ref. [8]. In addition, elimination of weak-edged line segments and lines shorter than a given threshold helps to keep background clutter to a minimum [8,9].
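To make Eqs. (1)–(2) concrete, the following minimal Python sketch assembles X_S from line counts. The counts passed in are hypothetical values; in the actual system they would come from the perceptual grouping stage of Refs. [8,9], which is not reproduced here.

```python
import numpy as np

def structure_features(n_L, n_U, n_PG, n_retained):
    """Eqs. (1)-(2): normalized line counts for "L" junctions, "U" junctions,
    and (significant) parallel groups/polygons, each divided by the total
    number of retained lines."""
    if n_retained == 0:
        return np.zeros(3)          # no retained lines -> no structure evidence
    return np.array([n_L, n_U, n_PG], dtype=float) / n_retained

# Hypothetical counts for one image:
X_S = structure_features(n_L=12, n_U=4, n_PG=20, n_retained=60)
print(X_S)   # each component lies in [0, 1]
```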

3. Action of the Euclidean group—action by translation, rotation, and reflection

An isometry is a mapping that preserves distances, i.e., the distance between any two points in a space remains invariant after the application of the mapping. It is well known that the set of all isometries of R² forms a group, called the Euclidean group. To see this, let σ be an isometry of R², and let b = σ(0), where b, 0 ∈ R². Let τ_b represent a member of the translation group T(2) of R², such that τ_b(r) = r + b, r ∈ R². It is easy to see that the translation group is isomorphic to the additive group R², and that a translation is an isometry. Then, ψ = τ_{−b}σ is an isometry of R², satisfying ψ(0) = 0. It can be shown that if ψ(0) = 0, then ψ is linear [12], and thus, σ = τ_{−b}^{−1}ψ = τ_b ψ is a product of a linear isometry and a translation. Further, it can also be shown that the linear isometries can be represented by the orthogonal group O(2, R) of 2 × 2 orthogonal matrices that represent rotations and reflections.

The matrix orthogonal group O(2, R) is the set of all orthogonal matrices. The determinant of any element (matrix) of O(2, R) is either 1 or −1. The set of orthogonal matrices with determinant equal to 1 forms a subgroup (called the special orthogonal group) that represents rotations. Reflections have a determinant equal to −1. It can be shown that the special orthogonal group is a normal subgroup of O(2, R).

The (semi-direct) product of the orthogonal group and the translation group is the group of isometries of R² (called the Euclidean group E(2)). A semi-direct product is a mechanism by which two groups can be fitted together to form a larger group, where one of the two groups is a normal subgroup and the other is a subgroup of the larger group formed. The fact that the translation group is a normal subgroup of E(2), and the intersection of the translation group and the orthogonal group is the identity, can be used to deduce that E(2) ≅ O(2, R) ⋉ T(2), where ≅ denotes isomorphism and ⋉ denotes the semi-direct product [13]. The construction is straightforward. As a set, O(2, R) ⋉ T(2) is T(2) × O(2, R), but now the product is defined by (τ_b, ψ)(τ_{b′}, ψ′) = (τ_b ψ · τ_{b′}, ψψ′), where τ_b ψ · τ_{b′} is to be interpreted as a translation by an amount ψb′ + b, i.e., the translation τ_{(ψb′+b)}. The semi-direct product contains isomorphic copies of the orthogonal group and the translation group as subgroups.

The elements of E(2) are invertible. The distance-preserving mappings that are elements of E(2) are implicit in many computations in Euclidean geometry. It may be noticed, however, that in agreeing not to distinguish between two congruent figures in a plane, we are in essence agreeing not to distinguish between the figures if there is an element of the Euclidean group that maps one of the figures onto the other [14]. In this sense, all of the symmetry groups of the two-dimensional figures are subgroups of the Euclidean group. In addition, the Euclidean group is a subgroup of the affine group. Further, it can be shown that the quotient group E(2)/T(2) ≅ O(2, R).
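The semi-direct product structure is easy to verify numerically. The small sketch below (an illustration, not from the paper) represents an element of E(2) as a pair (ψ, b) with ψ ∈ O(2, R) and b ∈ R², and checks both the composition rule τ_{b₁}ψ₁ · τ_{b₂}ψ₂ = τ_{b₁+ψ₁b₂} ψ₁ψ₂ and the isometry property.

```python
import numpy as np

def rotation(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def reflection(alpha):
    # reflection in an axis through the origin at angle alpha:
    # rho_alpha = rho' R_{-2 alpha}, with rho' = diag(1, -1)
    return np.diag([1.0, -1.0]) @ rotation(-2 * alpha)

def compose(g1, g2):
    """Semi-direct product rule: (psi1, b1)(psi2, b2) = (psi1 psi2, psi1 b2 + b1)."""
    psi1, b1 = g1
    psi2, b2 = g2
    return (psi1 @ psi2, psi1 @ b2 + b1)

def act(g, r):
    psi, b = g
    return psi @ r + b

rng = np.random.default_rng(0)
g1 = (rotation(0.7), rng.normal(size=2))
g2 = (reflection(0.3), rng.normal(size=2))
p, q = rng.normal(size=2), rng.normal(size=2)

# acting by the composite equals acting twice (group action axiom)
assert np.allclose(act(compose(g1, g2), p), act(g1, act(g2, p)))
# every element of E(2) is an isometry: distances are preserved
assert np.isclose(np.linalg.norm(act(g1, p) - act(g1, q)),
                  np.linalg.norm(p - q))
```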

    4. Isotropic mapping

We consider features extracted from the structural analysis of an image via the perceptual grouping process and the color histogram, and show that they are representations of isotropic mappings.

4.1. Euclidean isotropy of X_S

Let Ω = {Ω_i} represent the set of objects of interest present in an image. Each object Ω_i is a set of Ω_ik = {r, φ} ∈ R² × S¹, where r = {x, y} ∈ R² is a coordinate pair, S¹ is the unit circle and φ ∈ S¹ represents an orientation. We treat r and φ as independent variables, so that all possible orientations for φ exist at each corresponding position r. The relation between Ω_i and various image structures must be properly understood. At the lowest level of vision, Ω_ik are represented by points (pixels) on an edge segment Ω_i (where each Ω_i is obtained using the edge detection process described in Refs. [8,9]) and φ represents the orientation of Ω_i. At this level, Ω_i are identified with edge segments l_i as shown in Fig. 1. For example, in Fig. 1(a), Ω_1 might be identified with the edge segment l_1.

At the next level of perceptual grouping, certain Ω_i will be combined to generate a higher-level structure. The structure obtained from the grouping of Ω_i may be called Ω_j for consistency of notation, although it should be understood that Ω_j now represents a structure at a higher level than Ω_i. (Refer to Fig. 2, where edge segments Ω_3, Ω_4, Ω_5 and Ω_6 combine to form Ω_7.) As an example, at a higher level, Ω_i might

Fig. 2. Ω_3, Ω_4, Ω_5 and Ω_6 combined to form Ω_7. At the lowest level of vision, Ω_i are identified with edge segments.

refer to higher-level structures such as the polygons shown in Fig. 1(g). It may be noted that though the representations of objects Ω_i change as they represent hierarchical higher-level structures, the representations of Ω_ik remain the same, since Ω_ik are points.

A group action of a group G on a set A is a map from G × A → A (written as g · a for all g ∈ G and a ∈ A) that satisfies [13]:

\[
g_1 \cdot (g_2 \cdot a) = (g_1 g_2) \cdot a \quad \forall g_1, g_2 \in G,\ a \in A,
\]
\[
I_e \cdot a = a \quad \forall a \in A, \tag{3}
\]

where I_e is the identity element of G. As mentioned before, we do not distinguish between two congruent figures if an element of the Euclidean group maps a (planar) figure to another congruent figure. This representation includes all of the symmetry groups of the two-dimensional figures, since the symmetry groups are subgroups of the Euclidean group.

We define a mapping Ψ: Ω → R^d (where d is the dimensionality of the feature space) to be isotropic if it is invariant to the action of the Euclidean group on the space of positions and directions R² × S¹:

\[
\Psi(E_j \cdot \Omega) = \Psi(\Omega), \tag{4}
\]

where E is the Euclidean group E(2)—the semi-direct product of the orthogonal group and the translation group. The extraction of the feature vector X_S is represented by Ψ. Since the group action is originally defined on R² × S¹, the action E_j · Ω_i, for all E_j ∈ E and Ω_i ∈ Ω, transforms each Ω_ik ∈ Ω_i (refer to Fig. 3):

\[
\tau_b \cdot (r, \phi) = (r + b, \phi), \qquad r, b \in \mathbb{R}^2,\ \phi \in S^1,
\]
\[
R_\theta \cdot (r, \phi) = (R_\theta r, \phi + \theta), \qquad \theta \in S^1,
\]
\[
\rho_\alpha \cdot (r, \phi) = \rho' R_{-2\alpha} \cdot (r, \phi) = (\rho' R_{-2\alpha}\, r,\ -(\phi - 2\alpha)), \tag{5}
\]

where τ_b ∈ T(2) (b ∈ R²) represents a member of the translation group T(2) of R², such that τ_b(r) = r + b, r ∈ R²; R_θ ∈ O(2, R) is a rotation by an angle θ (with center at the origin); ρ_α ∈ O(2, R) is a reflection along an axis (through the origin) in R²; and ρ′ is the reflection along the x-axis: {x, y} → {x, −y}. The axis of reflection ρ_α is


Fig. 3. Action of E(2) on an edge segment Ω_i. r_k and r_l represent the endpoints of Ω_i. (a) translation (along the x-axis); (b) rotation; (c) reflection (in the y-axis).

inclined at an angle α with the x-axis and is spanned by the unit vector (cos α, sin α)^t. In general, E_j = τ_b ψ, where ψ is either R_θ or ρ_α.

The action ρ_α · (r, φ) = R_α ρ′ R_{−α} · (r, φ) = ρ′ R_{−2α} · (r, φ) (by using the identity R_α ρ′ = ρ′ R_{−α}), because reflection along an arbitrary axis is equivalent to a rotation of R² by an angle −α to align the axis of reflection along the direction of the original x-axis, followed by a reflection in the (new) x-axis, and then a (reverse) rotation by an angle α.

The homomorphism implied in the upper equality in Eq. (3) may be seen as follows. The action of E_j on a function S: R² × S¹ → R (e.g., an image function) is given by E_j · S(r, φ) = S(E_j^{−1} · (r, φ)). Therefore,

\[
(E_j \cdot E_k) \cdot S(r, \phi) = S((E_j \cdot E_k)^{-1} \cdot (r, \phi))
= S(E_k^{-1} \cdot E_j^{-1} \cdot (r, \phi))
= E_k \cdot S(E_j^{-1} \cdot (r, \phi))
= E_j \cdot (E_k \cdot S(r, \phi)). \tag{6}
\]

It may be noted that if the center of rotation is not the origin, or if the axis of reflection does not pass through the origin, then the fact that the translation group is a normal subgroup of the Euclidean group may be used to reduce the resulting transformation into a product of a linear isometry and a translation. This assertion (regarding the generalized rotation and reflection) is proved in Appendix A. If the translation invariance of the first equality in Eq. (5) is established, then the generalized rotation and reflection reduce to action by R_θ and ρ′R_{−2α}, respectively. It is interesting to note that if the translation and rotation invariance of the first and second equalities in Eq. (5), respectively, are established, it is sufficient to show the invariance of ρ′ to show the invariance of ρ_α.
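The group actions of Eq. (5) on an oriented point (r, φ) translate directly into code. The sketch below follows the symbols of the text; the reduction of φ modulo 2π is an implementation choice, not something the paper specifies.

```python
import numpy as np

def rotation(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

RHO_PRIME = np.diag([1.0, -1.0])   # reflection in the x-axis

def translate(r, phi, b):
    # tau_b . (r, phi) = (r + b, phi)
    return r + b, phi

def rotate(r, phi, theta):
    # R_theta . (r, phi) = (R_theta r, phi + theta)
    return rotation(theta) @ r, (phi + theta) % (2 * np.pi)

def reflect(r, phi, alpha):
    # rho_alpha . (r, phi) = (rho' R_{-2 alpha} r, -(phi - 2 alpha))
    return RHO_PRIME @ rotation(-2 * alpha) @ r, (-(phi - 2 * alpha)) % (2 * np.pi)

# an oriented point: position r, direction phi
r, phi = np.array([1.0, 2.0]), np.pi / 6
print(rotate(*translate(r, phi, np.array([3.0, -1.0])), np.pi / 4))
```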

4.1.1. Linear feature modeling

The premise of linear feature modeling is to extract rich descriptions of lower-level local image primitives and use these descriptions for subsequent grouping into higher-level features (linear line segments). We develop a mathematical model of the perceptual grouping process described in Refs. [8,9] for the collection of edge segments Ω_k to form a longer linear line Ω_j (Fig. 1(a)). At this level of vision, Ω_k are identified with lines l_k as shown in the figure. Let r = {x, y} denote the x- and y-coordinates of an end-point of an edge segment Ω_k, and φ ∈ S¹ represent the orientation of the edge segment. We treat r and φ as independent variables, so that all possible orientations for Ω_k exist at each corresponding position r.

A certain collection C_i of Ω_k is assembled, which will be replaced by Ω_j, that maximizes the energy Γ_i given as

\[
\Gamma_i^{(n)} = \Gamma_i^{(n-1)} + \sum_{k \in K,\ l \notin K,\ K = \{\tilde{k}:\ \Omega_{\tilde{k}} \in C_i\}} \xi_{kl}, \qquad \Gamma_i^{(0)} = 0, \tag{7}
\]

where the superscript n is an iteration index and (omitting the subscript i) the energy functional ξ_kl: (Ω_b, Ω_k, Ω_l) → R is expressed as

\[
\xi_{kl}(\Omega_b \,|\, \Omega_k, \Omega_l) = \Theta(q)\,\Theta(st)\,\delta(r_k - r_l - s e_{kl})\,\delta(\phi_b - \phi_l), \tag{8}
\]

where Ω_b (identified with l_b in Fig. 1(a)) is a certain base edge segment in the collection that is used to determine that all other edge segments are parallel to it. Further, Θ is a weighting function and q is the maximum length of the orthogonal distance of any point of Ω_l from Ω_b. In the above equation, r_k and r_l represent those end-points of two edge segments Ω_k and Ω_l, respectively, which are closer to each other (at the lower level), and φ_b and φ_l are the orientations of Ω_b and Ω_l, respectively. In addition, δ is the Dirac delta function, e_kl is a unit vector in the direction of r_k−r_l and s is a distance parameter along an axis parallel to the direction of r_k−r_l. The Boolean parameter t is such that t = 0 if the length of the orthogonal projection of Ω_l on Ω_k is greater than zero, otherwise t = 1.

We represent Θ by a constant function (not equal to zero) with compact support. Specifically, we have selected the constant as 1 and the support is equal to 5 units (pixels). Eq. (7) indicates the iterative nature of the grouping. At the start, C_i consists of only one segment Ω_b. At the end of each iteration, those Ω_l's for which ξ_kl is non-zero are put into C_i. The grouping is started again and continued until there

  • 2678 Q. Iqbal, J.K. Aggarwal / Pattern Recognition 35 (2002) 2673–2686

is no increase in Γ_i. The higher-level longer linear line Ω_j is then obtained by a weighted average of the lengths and orientations of all edge segments in C_i [8].

The energy functional expressed in Eq. (8) is similar to the one defined in Ref. [15]; however, in their model r_k represents the V1 image of the center of the receptive field of a neuron, and e_kl represents the V1 image of the orientation preference of the neuron. Unlike their model, in our system e_kl points in the direction of r_k−r_l and incorporates the non-collinearity of two edge segments to an arbitrary extent (e.g., Fig. 1). (To further emphasize closer points, unequal weights, as opposed to constant weights in the support of Θ, can be obtained by replacing Θ with an appropriate weighting function such as a Gaussian function.) It may be noted that the energy functional given in Eq. (8) incorporates the Gestalt principles of proximity, collinearity, parallelism, and good continuation.
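The following sketch illustrates the iterative grouping of Eqs. (7)–(8) in discrete form. The Dirac deltas are replaced by tolerance tests, the compactly supported weight Θ by a hard distance window of 5 pixels, and the q and t tests are omitted for brevity; it is an illustrative analog, not the authors' implementation.

```python
import numpy as np

def xi_kl(seg_b, seg_k, seg_l, dist_tol=5.0, angle_tol=0.05):
    """Discrete analog of the energy functional of Eq. (8). A segment is a
    dict with end-points 'p0', 'p1' (numpy arrays) and orientation 'phi'."""
    # closest pair of end-points of seg_k and seg_l (the r_k, r_l of Eq. (8))
    pairs = [(pk, pl) for pk in (seg_k['p0'], seg_k['p1'])
                      for pl in (seg_l['p0'], seg_l['p1'])]
    rk, rl = min(pairs, key=lambda p: np.linalg.norm(p[0] - p[1]))
    s = np.linalg.norm(rk - rl)                       # end-point gap
    parallel = abs(seg_b['phi'] - seg_l['phi']) < angle_tol
    return 1.0 if (s < dist_tol and parallel) else 0.0

def group(segments, b_idx, **tol):
    """Iterative grouping of Eq. (7): grow the collection C_i around the
    base segment until the energy no longer increases."""
    C = {b_idx}
    grew = True
    while grew:
        grew = False
        for l, seg_l in enumerate(segments):
            if l in C:
                continue
            if any(xi_kl(segments[b_idx], segments[k], seg_l, **tol) > 0
                   for k in C):
                C.add(l)
                grew = True
    return C
```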

4.1.2. Euclidean invariance of ξ_kl

The energy functional expressed in Eq. (8) has a well-defined symmetry: it is invariant under the action of E(2); invariant under translations {r, φ} → {r + b, φ}, rotations {r, φ} → {R_θ r, φ + θ} and reflections {r, φ} → {ρ′R_{−2α}r, −(φ − 2α)}. In Appendix B, it is verified that the invariance of s = ‖r_k − r_l‖ can be established as ‖τ_b ψ r_k − τ_b ψ r_l‖ = ‖r_k − r_l‖ = s, where ψ is either a rotation R_θ or a reflection ρ_α. The invariance of q and t may also be established in a similar manner. Translation, rotation, and reflection invariance of Eq. (8) imply the following equalities, respectively:

\[
\xi_{kl}(\tau_b \cdot \Omega_b \,|\, \tau_b \cdot \Omega_k, \tau_b \cdot \Omega_l) = \xi_{kl}(\Omega_b \,|\, \Omega_k, \Omega_l),
\]
\[
\xi_{kl}(R_\theta \cdot \Omega_b \,|\, R_\theta \cdot \Omega_k, R_\theta \cdot \Omega_l) = \xi_{kl}(\Omega_b \,|\, \Omega_k, \Omega_l),
\]
\[
\xi_{kl}(\rho_\alpha \cdot \Omega_b \,|\, \rho_\alpha \cdot \Omega_k, \rho_\alpha \cdot \Omega_l) = \xi_{kl}(\Omega_b \,|\, \Omega_k, \Omega_l). \tag{9}
\]

These relations are also proved in Appendix B.

Eq. (8) is at the heart of the perceptual grouping process. Its Euclidean invariance, as stated above in Eq. (9) and proved in Appendix B, means that Eq. (7) remains invariant, and the perceptual grouping process will produce the same groupings—retained lines. All higher-level structures are extracted using the retained lines.
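The invariance of s, the quantity at the core of Eq. (9), can be checked numerically; the snippet below verifies ‖τ_b ψ r_k − τ_b ψ r_l‖ = ‖r_k − r_l‖ for a random point pair under a rotation and a reflection.

```python
import numpy as np

def rotation(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

rng = np.random.default_rng(1)
rk, rl, b = rng.normal(size=2), rng.normal(size=2), rng.normal(size=2)

for psi in (rotation(0.9),                                # a rotation R_theta
            np.diag([1.0, -1.0]) @ rotation(-2 * 0.4)):   # a reflection rho_alpha
    # s = ||r_k - r_l|| is unchanged by tau_b psi (cf. Eq. (B.1))
    lhs = np.linalg.norm((psi @ rk + b) - (psi @ rl + b))
    assert np.isclose(lhs, np.linalg.norm(rk - rl))
```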

4.1.3. Higher-level structures

The fundamental perceptual grouping proposed in Refs. [8,9] for higher-level structures can be modeled as follows. The proximity of two objects Ω_k and Ω_l can be modeled by the relation Θ(s)δ(r_k − r_l − s e_kl). Here, r_k and r_l refer to those end-points of Ω_k and Ω_l, respectively, that are closer to each other than any other pair of end-points. The variation in the orientations of Ω_k and Ω_l can be controlled by the relation Θ̃(p)δ(φ_k − φ_l − p), where the variable p = φ_k − φ_l, p ∈ [0, 2π), and Θ̃ is a constant function (not equal to zero) with compact support (similar to Θ). At a higher level, φ_k and φ_l represent the general orientations associated with the entire objects Ω_k and Ω_l, respectively. Using an argument similar to the one shown above, it can be verified that these relations are invariant under the action of E(2). Hence, X_S obtained by the mapping Ψ remains invariant to the action of E(2).

    4.2. Color histogram

Color histogram measures are invariant to both O(2, R) and T(2), and hence E(2), because histogram measures are only dependent on summations of identical pixel values and do not incorporate orientation and position. The extraction of the normalized histogram X_H ∈ R⁵¹² is used as a representation of an isotropic mapping.

A color space is perceptually uniform if a small perturbation to a component value is approximately equally perceptible across the range of that value. The RGB color space does not exhibit perceptual uniformity. However, the CIE LAB space, conceived in 1976, improves the perceptual uniformity of RGB space considerably. In this space, L defines lightness, A denotes the red/green chrominance and B the yellow/blue chrominance. Given an image I_RGB(x, y) in RGB space, we generate I_LAB(x, y), where the pair (x, y) denotes the coordinates in an image I. A 512-dimensional feature vector X_H, representing the 512-bin normalized histogram, is extracted from the image I_LAB(x, y) by uniformly quantizing the LAB space, i.e.,

\[
X_H = (\tilde{x}_{H_0}, \ldots, \tilde{x}_{H_{511}})^t, \tag{10}
\]

where x̃_Hj (with the integer index j ∈ [0, 511]) represents the normalized value of the jth bin of the histogram such that ∑_{j=0}^{511} x̃_Hj = 1. This feature space represents a unit hypercube.
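A minimal sketch of the 512-bin normalized LAB histogram of Eq. (10), assuming scikit-image is available for the RGB to LAB conversion. Quantizing each of the three channels into 8 uniform levels gives 8³ = 512 bins; the per-channel value ranges used below are assumptions, since the paper does not spell out the quantization limits.

```python
import numpy as np
from skimage import color   # assumed available for RGB -> LAB conversion

def lab_histogram(rgb):
    """512-bin normalized LAB histogram (Eq. (10)): each of the L, A, B
    axes is uniformly quantized into 8 levels."""
    lab = color.rgb2lab(rgb)
    # map each channel into [0, 8) using its nominal range (an assumption)
    L = np.clip(lab[..., 0] / 100.0, 0, 0.999) * 8
    A = np.clip((lab[..., 1] + 128) / 256.0, 0, 0.999) * 8
    B = np.clip((lab[..., 2] + 128) / 256.0, 0, 0.999) * 8
    idx = (L.astype(int) * 64 + A.astype(int) * 8 + B.astype(int)).ravel()
    hist = np.bincount(idx, minlength=512).astype(float)
    return hist / hist.sum()    # bins sum to 1, as required by Eq. (10)

X_H = lab_histogram(np.random.rand(64, 64, 3))   # dummy image
assert np.isclose(X_H.sum(), 1.0)
```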

    5. Anisotropic mapping

In most quantitative channel energy models of texture analysis, an image is processed by channel-selective filters along certain fundamental stimulus dimensions such as spatial frequency and orientation. Texture analysis via a channel energy model employing a Gabor filter bank is considered to be a representation of anisotropic mapping. The representation is accomplished by the extraction of the feature vector X_T ∈ R⁴⁸, which measures the fractional energy in various spatial channels after treating the input image with the Gabor filter bank.

The fact that this representation is anisotropic may readily be verified from the fact that the translation of an image I(r) → I(τ_b(r)) transforms the Fourier transform of the image I(ν) → I(ν)e^{j2π⟨b, ν⟩}, where b, r, ν ∈ R², r = {x, y} are the space domain coordinates and ν = {u, v} are the Fourier domain coordinates. A rotation of the image, I(r) → I(R_θ(r)), transforms the Fourier transform I(ν) → I(R_θ(ν)). Similarly, the reflection of an image


I(r) → I(ρ_α(r)) transforms the Fourier transform I(ν) → I(ρ_α(ν)). (These relations are derived in Appendix C.) Hence, texture analysis is not invariant after the action of E(2) on an image.

The LAB space is used for multiresolution texture analysis by measuring the fractional energies in the lightness and the two chrominance channels. Given an image I, the convolved sequence {I ∗ f′_{m,n}} defines the multiresolution image texture characteristics, where f′_{m,n} denotes the base texture extraction function f at scale m and orientation n. The filter energy (‖f′_{m,n}‖²) is held constant. Even-symmetric, two-dimensional Gabor filters have been used to represent f′_{m,n}. The impulse response of the base filter f is given as

\[
f(x, y) = \frac{1}{2\pi \sigma_x \sigma_y}\, e^{-\frac{1}{2}\left(\frac{x^2}{\sigma_x^2} + \frac{y^2}{\sigma_y^2}\right)} \cos(2\pi u_0 x), \tag{11}
\]

where f(x, y) represents the response at spatial locations x and y, u_0 is the frequency of a sinusoidal plane wave along the x-axis (i.e., the 0° orientation), and σ_x and σ_y are the spreads of the Gaussian envelope along the x- and y-axes, respectively.

A set of self-similar Gabor filters is obtained by appropriate rotations and scalings of f(x, y) through the generating function f′_{m,n}(x, y) = k^{−m} f(k^{−m}x′, k^{−m}y′), where k > 1, m and n are integers, f′_{m,n}(x, y) is the rotated and scaled version of the original filter and k is the scale factor. In the above equation, n = 0, 1, …, N − 1 is the current orientation index, where N is the total number of orientations, and m = 0, 1, …, M − 1 is the current scale index, where M is the total number of scales. In addition, x′ and y′ are the rotated coordinates x′ = x cos θ + y sin θ, y′ = −x sin θ + y cos θ, and θ = nπ/N is the orientation. The scale factor k^{−m} ensures that the filter energy is independent of m. In order to make the filters zero-mean, we set F′_{m,n}(0, 0) = 0, where F′_{m,n} is the Fourier transform of f′_{m,n}. A total of 16 Gabor filters (per channel) are selected, with four filters in equi-angular orientations at four different scales, i.e., N = 4 and M = 4, starting at the 0° orientation. Parameters σ_x, σ_y and k are calculated using a multiresolution filter design approach [16].
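A sketch of the even-symmetric filter bank defined by Eq. (11) and the generating function above. The numeric values of σ_x, σ_y, u_0 and k below are placeholders; the paper derives them with the design approach of Ref. [16].

```python
import numpy as np

def gabor_bank(size=31, sigma_x=3.0, sigma_y=3.0, u0=0.2, k=2.0, M=4, N=4):
    """Even-symmetric Gabor bank: base filter of Eq. (11) plus the
    generating function f'_{m,n}(x, y) = k^{-m} f(k^{-m} x', k^{-m} y'),
    with theta = n*pi/N."""
    def f(a, b):   # base filter of Eq. (11)
        return (np.exp(-0.5 * ((a / sigma_x) ** 2 + (b / sigma_y) ** 2))
                * np.cos(2 * np.pi * u0 * a) / (2 * np.pi * sigma_x * sigma_y))

    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    bank = []
    for m in range(M):
        for n in range(N):
            theta = n * np.pi / N
            xp = x * np.cos(theta) + y * np.sin(theta)    # rotated coordinates
            yp = -x * np.sin(theta) + y * np.cos(theta)
            g = (k ** -m) * f((k ** -m) * xp, (k ** -m) * yp)
            bank.append(g - g.mean())        # zero mean: F'_{m,n}(0, 0) = 0
    return bank

filters = gabor_bank()     # 16 filters per channel (M = N = 4)
assert len(filters) == 16
```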

Channels L, A and B are treated with the Gabor filter bank described above. The 48-dimensional feature vector X_T is constructed using the fractional energies in each of the 16 filters operating in the L, A and B channels, i.e.,

\[
X_T = (\tilde{x}_{T_{L_{0,0}}}, \ldots, \tilde{x}_{T_{L_{3,3}}},\ \tilde{x}_{T_{A_{0,0}}}, \ldots, \tilde{x}_{T_{A_{3,3}}},\ \tilde{x}_{T_{B_{0,0}}}, \ldots, \tilde{x}_{T_{B_{3,3}}})^t, \tag{12}
\]

where x̃_{T_{L_{m,n}}}, x̃_{T_{A_{m,n}}} and x̃_{T_{B_{m,n}}} represent the fractional energy at the output of the filter in the nth orientation and the mth scale, for the L, A and B channels, respectively. The fractional energy x̃_{T_{L_{m,n}}} (in the discrete form) is given as

\[
\tilde{x}_{T_{L_{m,n}}} = \frac{\sum_{y=0}^{W_y-1} \sum_{x=0}^{W_x-1} \hat{L}^2_{m,n}(x, y)}{\sum_{m=0}^{M-1} \sum_{n=0}^{N-1} \sum_{y=0}^{W_y-1} \sum_{x=0}^{W_x-1} \hat{L}^2_{m,n}(x, y)}, \tag{13}
\]

where L̂_{m,n} is the L channel treated with the filter f′_{m,n}, W_x is the width of the image, W_y is the height, and ∑_{m=0}^{M−1} ∑_{n=0}^{N−1} x̃_{T_{L_{m,n}}} = 1. In a similar manner, we define

\[
\tilde{x}_{T_{A_{m,n}}} = \frac{\sum_{y=0}^{W_y-1} \sum_{x=0}^{W_x-1} \hat{A}^2_{m,n}(x, y)}{\sum_{m=0}^{M-1} \sum_{n=0}^{N-1} \sum_{y=0}^{W_y-1} \sum_{x=0}^{W_x-1} \hat{A}^2_{m,n}(x, y)}, \qquad
\tilde{x}_{T_{B_{m,n}}} = \frac{\sum_{y=0}^{W_y-1} \sum_{x=0}^{W_x-1} \hat{B}^2_{m,n}(x, y)}{\sum_{m=0}^{M-1} \sum_{n=0}^{N-1} \sum_{y=0}^{W_y-1} \sum_{x=0}^{W_x-1} \hat{B}^2_{m,n}(x, y)},
\]

where Â_{m,n} and B̂_{m,n} represent channels A and B treated with the Gabor filters, respectively. This feature space is also represented by a unit hypercube.
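Given a filter bank (such as the sketch above), the fractional energies of Eqs. (12)–(13) reduce to a few array operations. SciPy is assumed for the convolutions, and the 'same' boundary handling is an implementation choice the paper does not specify.

```python
import numpy as np
from scipy.signal import fftconvolve   # assumed available

def fractional_energies(channel, bank):
    """Eq. (13) for one channel: fraction of the total filtered energy in
    each of the M*N filter outputs; the 16 values sum to 1."""
    energies = np.array([np.sum(fftconvolve(channel, g, mode='same') ** 2)
                         for g in bank])
    return energies / energies.sum()

def texture_vector(L, A, B, bank):
    """48-dimensional X_T of Eq. (12): fractional energies of the
    L, A and B channels, concatenated."""
    return np.concatenate([fractional_energies(c, bank) for c in (L, A, B)])
```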

    6. Integration framework

A retrieval process is described that integrates perceptual grouping, color histogram and texture. Both image query and image classification are discussed.

    6.1. Image query

A two-level framework is employed for integrating lower-level and higher-level vision features. Given the isotropic feature vectors X_S and X_H and the anisotropic feature vector X_T extracted from a query image, and X_Sj, X_Hj and X_Tj extracted from the jth image in the database, the first level of the framework maps the feature vectors to a discriminant value within each of the three categories: structure, histogram and texture. The respective mappings γ_S: R^{N_S} → R, γ_H: R^{N_H} → R and γ_T: R^{N_T} → R, where N_S = 3, N_H = 512 and N_T = 48, are explained as follows. The mappings γ_S and γ_T are selected as ℓ₂ norms, γ_S(X_Sj, X_S) = ‖X_Sj − X_S‖ and γ_T(X_Tj, X_T) = ‖X_Tj − X_T‖, and γ_H is selected as the histogram intersection measure [2]: γ_H(X_Hj, X_H) = 1 − ∩(X_Hj, X_H), where

\[
\cap(X_{H_j}, X_H) = \frac{\sum_{k=1}^{N_H} \min(\tilde{x}_{H_{jk}}, \tilde{x}_{H_k})}{\sum_{k=1}^{N_H} \tilde{x}_{H_k}}.
\]

Since ∑_{k=1}^{N_H} x̃_{H_{jk}} = ∑_{k=1}^{N_H} x̃_{H_k} = 1, the difference in the size of images is incorporated. At the second level, a supradiscriminant is generated by utilizing the mapping ζ_SHT: R³ × R³ → R, given by

\[
\zeta_{SHT}(X_{S_j}, X_{H_j}, X_{T_j}; X_S, X_H, X_T) = W^t \cdot \gamma_{SHT}(X_{S_j}, X_{H_j}, X_{T_j}; X_S, X_H, X_T), \tag{14}
\]

where W = (w₁, w₂, w₃)^t is a weight vector such that ∑_{i=1}^{3} w_i = 1, ζ_SHT ∈ [0, 1], and γ_SHT: R³ × R³ → R³, such that γ_SHT ∈ [0, 1] × [0, 1] × [0, 1], is given as

\[
\gamma_{SHT}(X_{S_j}, X_{H_j}, X_{T_j}; X_S, X_H, X_T) = (\hat{\gamma}_S(X_{S_j}, X_S),\ \hat{\gamma}_H(X_{H_j}, X_H),\ \hat{\gamma}_T(X_{T_j}, X_T))^t, \tag{15}
\]

where

\[
\hat{\gamma}_S(X_{S_j}, X_S) = \frac{\gamma_S(X_{S_j}, X_S)}{\max_j \gamma_S(X_{S_j}, X_S)}, \qquad
\hat{\gamma}_H(X_{H_j}, X_H) = \frac{\gamma_H(X_{H_j}, X_H)}{\max_j \gamma_H(X_{H_j}, X_H)}, \qquad
\hat{\gamma}_T(X_{T_j}, X_T) = \frac{\gamma_T(X_{T_j}, X_T)}{\max_j \gamma_T(X_{T_j}, X_T)}. \tag{16}
\]

These normalizations ensure that γ̂_S ∈ [0, 1], γ̂_H ∈ [0, 1] and γ̂_T ∈ [0, 1]. The index î of the image most similar to a given query image is given by

\[
\hat{\imath} = \arg\min_i\ \zeta_{SHT}(X_{S_i}, X_{H_i}, X_{T_i}; X_S, X_H, X_T). \tag{17}
\]

The above integration framework has the following advantages over a simple concatenation of the vectors X_Sj, X_Hj and X_Tj. First, the different lengths of these three vectors preclude the proper construction of a concatenated vector that is equally sensitive to all of its components. The three-dimensional vector output by γ_SHT is equally sensitive to all three of its one-dimensional components. Second, the size of the corresponding weight vector for the concatenated vector would be large, making the selection of proper weights difficult and unfeasible. Third, in our proposed integration, weights are assigned at the module level, i.e., structure, histogram and texture, whereas weights in a concatenated vector are assigned at the vector component level without particular regard to the modular structure of the system. The weight vector plays an important role in controlling the content of images retrieved by assigning different weights to structure, histogram and texture.
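A compact sketch of the query pipeline of Eqs. (14)–(17), assuming the database features are stacked row-wise in arrays (one row per image); variable names mirror the symbols in the text.

```python
import numpy as np

def query_rank(XS_db, XH_db, XT_db, XS, XH, XT, W=(1/3, 1/3, 1/3)):
    """Per-module discriminants, the max-normalization of Eq. (16), and
    the weighted sum of Eq. (14); returns indices sorted best-first."""
    gamma_S = np.linalg.norm(XS_db - XS, axis=1)            # l2 norm
    gamma_T = np.linalg.norm(XT_db - XT, axis=1)            # l2 norm
    inter = np.minimum(XH_db, XH).sum(axis=1) / XH.sum()    # histogram intersection
    gamma_H = 1.0 - inter
    zeta = (W[0] * gamma_S / gamma_S.max()
            + W[1] * gamma_H / gamma_H.max()
            + W[2] * gamma_T / gamma_T.max())               # supradiscriminant
    return np.argsort(zeta)                                 # Eq. (17): best match first
```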

6.2. Image classification

A nearest neighbor classifier [17] is used for image classification. The image space is partitioned into three classes: Structure, Non-structure and Intermediate, based upon the measure of man-made object structure present in an image. Here, a more intuitive approach is presented than the original approach of Ref. [10]. In that approach, for the combined retrieval methodology, slight alterations of the γ_SHT values (Eq. (15)) were taken as patterns. A new three-dimensional feature space was generated from the three feature spaces (structure, histogram and texture) as {R^{N_S}, R^{N_H}, R^{N_T}} → R³, by using γ_S(X_Sj) = ‖X_Sj‖ and γ̂_S(X_Sj) = γ_S(X_Sj)/max_j γ_S(X_Sj). The mappings γ_H, γ_T, γ̂_H and γ̂_T were defined similarly. The individual components of the resultant three-dimensional vector were multiplied by a weight vector to yield a final discriminant value.

In the current approach, the distance function for the nearest neighbor classifier is redefined to incorporate the distances of the training feature vectors from the test vectors in the three pattern spaces (structure, histogram and texture) to generate a discriminant value. Specifically, it is defined as the weighted ℓ₁ norm on the product space R^{N_S} × R^{N_H} × R^{N_T}. Let d_SHT: {{R^{N_S}, R^{N_H}, R^{N_T}}, {R^{N_S}, R^{N_H}, R^{N_T}}} → R denote the distance function, and X_S, X_H and X_T be the structure, histogram and texture feature vectors, respectively, for a given test image. Let X_Sj, X_Hj and X_Tj denote the corresponding training feature vectors of the jth training image. Then,

\[
d_{SHT}(X_S, X_H, X_T; X_{S_j}, X_{H_j}, X_{T_j}) = W^t \cdot (\hat{\gamma}_S(X_{S_j}, X_S),\ \hat{\gamma}_H(X_{H_j}, X_H),\ \hat{\gamma}_T(X_{T_j}, X_T))^t, \tag{18}
\]

where γ̂_S, γ̂_H and γ̂_T are defined in Eq. (16).
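A sketch of nearest neighbor classification with the distance of Eq. (18). For brevity the ℓ₂ norm is used for every module here; a faithful version would use the histogram intersection measure of Section 6.1 for the histogram module.

```python
import numpy as np

def nn_classify(test, train, labels, W=(1/3, 1/3, 1/3)):
    """`test` is a tuple (X_S, X_H, X_T) for one image; `train` is a tuple
    of three arrays holding the stacked training vectors of each module;
    `labels` holds the training classes."""
    labels = np.asarray(labels)
    d = np.zeros(len(labels))
    for w, X, Xj in zip(W, test, train):
        gamma = np.linalg.norm(Xj - X, axis=1)   # per-module distances
        d += w * gamma / gamma.max()             # normalization of Eq. (16)
    return labels[np.argmin(d)]                  # nearest training sample
```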

    7. Results obtained

We employ three different image databases consisting of a total of 4329 24-bit color images. Database #1 consists of 2139 images of size adjusted to 1024 × 1024, acquired from two CDs obtained from Visual Delights, Inc. [18]: "Austin and Vicinity: The Human World" and "Austin and Vicinity: The World of Nature". Database #2 consists of 521 images of size adjusted to 512 × 512, acquired from the ground level using a Sony Digital Mavica camera. Database #3 consists of 1669 images of size adjusted to 512 × 512, downloaded from the internet (Free Nature Pictures, Info. for Travel Media, Dave's Wall Paper and Photo Art of Nature [19]).

    7.1. Image query

Results for image query were obtained using W = (1/3, 1/3, 1/3)^t. The first 16 images retrieved are displayed in all the figures shown. Figs. 4 and 5 show queries for the retrieval of images containing conspicuously natural objects: flowers, leaves and grass, and a duck in water, respectively. The retrieved images closely match the natural content of the supplied query images. Fig. 6 shows an interesting query. An image containing an automobile was supplied as a query. This image contains a mixture of an intermediate-level structural object (an automobile) with road and vegetation. The system retrieved images containing automobiles of various colors, because an integral part of the system depends on perceptual grouping, which examines the structure of an object regardless of its color. Fig. 7 depicts a query for a purely structural object: a building facade. The retrieved images again closely match the structural content of the supplied query image.

7.2. Image classification

Results for image classification were also obtained using W = (1/3, 1/3, 1/3)^t. Tables 1–4 display results for retrieval by image classification obtained using a nearest neighbor classifier. Based upon the measure of structure present in an image, the image space was partitioned into three classes: structure, non-structure and intermediate. Each class was represented by 10 training samples.

  • Q. Iqbal, J.K. Aggarwal / Pattern Recognition 35 (2002) 2673–2686 2681

Fig. 4. Retrieval by image query (Databases #1 and #2): flowers, leaves and grass.

    Fig. 5. Retrieval by image query (Databases #2 and #3): duck in water.


    Fig. 6. Retrieval by image query (Databases #1 and #2): an automobile.

    Fig. 7. Retrieval by image query (Databases #1 and #2): a building facade.


Table 1 shows the overall retrieval rate. Table 2 displays class-conditional retrieval performance measured in terms of recall and precision. Recall is defined as the fraction of the total number of images that are correctly retrieved for a particular class. Precision is defined as the fraction of images retrieved for a particular class that are actually correct. The detailed retrieval statistics are shown in the confusion matrix in Table 3. Table 4 shows the distribution of images that actually belong to a particular class within the "best matches" for that class, in intervals of 100 images, and the corresponding efficiency of the system. The best matches were obtained by sorting images in ascending order based upon their distances from the training samples of each class. Efficiency is defined as the ratio of the number of images that actually belong to a particular class in the block of closest best matches to the size of the block. The block size is set equal to the number of images in that class.

Table 1
Retrieval by image classification (Database #2): overall retrieval rate

Total (T)   Training   Effective (D)   Correct (C)   RR (C/D)
521         30         491             372           75.76%

T = total no. of images, D = effective no. of images, C = correct, RR = retrieval rate.

Table 2
Retrieval by image classification (Database #2): recall and precision

Class           T     R     C     Recall (C/T)   Precision (C/R)
Structure       255   203   188   73.73%         92.61%
Non-structure   140   151   119   85.00%         78.81%
Intermediate    96    137   65    67.71%         47.45%

T = total, R = retrieved, C = correct.

Table 3
Retrieval by image classification (Database #2): confusion matrix

Class           Structure   Non-structure   Intermediate
Structure       188         13              54
Non-structure   3           119             18
Intermediate    12          19              65

Entries are presented along rows, e.g., 188 structure-class images were classified as structure, 13 as non-structure, and 54 as intermediate.

Table 4
Retrieval by image classification (Database #2): distribution of images actually belonging to a particular class in the "best matches" for that class, in intervals of 100 images, and the efficiency of the system

Class           1–100   101–200   201–300   301–400   401–491   T     Q     Eff. (Q/T)
Structure       85      70        57        32        11        255   189   74.12%
Non-structure   82      30        18        5         5         140   101   72.14%
Intermediate    56      23        8         3         6         96    55    57.29%

T = total no. of images belonging to a certain class, Q = no. of images that actually belong to a certain class in the first T best matches for that class, Eff. = efficiency.
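The figures in Tables 1–3 can be reproduced from the confusion matrix alone; the short check below does so.

```python
import numpy as np

# Confusion matrix of Table 3 (rows: true class; columns: assigned class)
C = np.array([[188,  13, 54],
              [  3, 119, 18],
              [ 12,  19, 65]])

correct   = np.diag(C)
totals    = C.sum(axis=1)          # T: images per true class
retrieved = C.sum(axis=0)          # R: images assigned to each class

recall    = correct / totals       # Table 2: 73.73%, 85.00%, 67.71%
precision = correct / retrieved    # Table 2: 92.61%, 78.81%, 47.45%
overall   = correct.sum() / C.sum()  # Table 1: 372/491 = 75.76%
print(recall, precision, overall)
```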

    8. Conclusions

This paper has presented an approach for content-based image retrieval via isotropic and anisotropic mappings. Isotropic mappings were defined as mappings invariant to the action of the planar Euclidean group on the image space—invariant to the translation, rotation and reflection of image data, and hence, invariant to orientation and position. Anisotropic mappings, on the other hand, were defined as those mappings that are correspondingly variant. Structure extraction (via a perceptual grouping process) and the color histogram were shown to be representations of isotropic mappings. Texture analysis using a channel energy model comprised of even-symmetric Gabor filters was considered to be a representation of anisotropic mapping. Segmentation of an image and detailed object representation were not required.

An integration framework for these mappings was also described. The integrated framework took advantage of the strength of structure, color histogram and texture in their respective domains for retrieval. Results of retrieval of outdoor images by query and by classification using a nearest neighbor classifier were presented. The system was able to retrieve images ranging from purely natural objects, such as images of vegetation, flowers, water and sky, to images containing conspicuous structure, such as images of buildings, towers and bridges. In addition, the system gave good performance for retrieval of images containing intermediate-level structure, such as images containing automobiles (even when they were of different colors). The judicious use of perceptual grouping to extract structure gives our system an edge over content-based image retrieval systems that retrieve images containing structural objects based purely upon color and texture. The results obtained show the efficacy of combining structure, histogram and texture for retrieval.


    Acknowledgements

The authors wish to thank Prof. B.S. Manjunath (University of California at Santa Barbara) for his comments on Gabor filters. Thanks are also due to Sadia Sharif for her painstaking effort in locating and downloading images from the internet.

Appendix A. General rotation and reflection

For the case of a general rotation, the center of rotation can be shifted to the origin by a translation, followed by a rotation, and then a reverse translation of the same magnitude. The resulting transformation τ_b R_θ τ_b^{−1} is given as

\[
\tau_b R_\theta \tau_b^{-1} = \tau_b R_\theta \tau_{-b} = \tau_b\, \tau_{R_\theta(-b)}\, R_\theta = \tau_{(R_\theta(-b)+b)}\, R_\theta, \tag{A.1}
\]

where we have made use of the fact that, in general, ψτ_bψ^{−1} = τ_{ψ(b)}, where ψ represents either a rotation or a reflection.

For the case of a general reflection, after the rotation of the axis of reflection to align it along the original x-axis, the rotated axis can be translated to align it with the original x-axis. This is followed by the reflection ρ′, then a reverse translation and a reverse rotation. The resulting transformation R_α τ_b ρ′ τ_{−b} R_{−α} is given as

\[
R_\alpha \tau_b \rho' \tau_{-b} R_{-\alpha} = (R_\alpha \tau_b)\,\rho'\,(R_\alpha \tau_b)^{-1}
= (\tau_{R_\alpha(b)} R_\alpha)\,\rho'\,(\tau_{R_\alpha(b)} R_\alpha)^{-1}
= \tau_{R_\alpha(b)}\,\rho' R_{-2\alpha}\,\tau_{-R_\alpha(b)}
= \tau_{(\rho' R_{-2\alpha}(-R_\alpha(b)) + R_\alpha(b))}\,\rho' R_{-2\alpha}. \tag{A.2}
\]
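Eq. (A.1) can be checked numerically: rotation about an arbitrary center equals a rotation about the origin followed by the stated translation.

```python
import numpy as np

def rotation(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

theta, b = 0.8, np.array([2.0, -1.0])
r = np.random.default_rng(2).normal(size=2)

# general rotation about center b: translate b to the origin, rotate, translate back
lhs = rotation(theta) @ (r - b) + b
# Eq. (A.1): the same map written as tau_{R_theta(-b)+b} R_theta
rhs = rotation(theta) @ r + (rotation(theta) @ (-b) + b)
assert np.allclose(lhs, rhs)
```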

Appendix B. Euclidean invariance of ξ_kl

The energy functional expressed in Eq. (8) has a well-defined symmetry: it is invariant under the action of E(2); invariant under translations {r, φ} → {r + b, φ}, rotations {r, φ} → {R_θ r, φ + θ} and reflections {r, φ} → {ρ′R_{−2α}r, −(φ − 2α)}. The invariance of s = ‖r_k − r_l‖ may be established as

\[
\|\tau_b \psi r_k - \tau_b \psi r_l\|^2 = \|(\psi r_k + b) - (\psi r_l + b)\|^2
= \langle \psi r_k, \psi r_k \rangle + \langle \psi r_l, \psi r_l \rangle - 2\langle \psi r_k, \psi r_l \rangle
= \|r_k\|^2 + \|r_l\|^2 - 2\langle r_k, r_l \rangle
= \|r_k - r_l\|^2 = s^2, \tag{B.1}
\]

where ⟨·, ·⟩ denotes the dot product and ψ is either a rotation or a reflection. The relation ⟨ψr_k, ψr_l⟩ = ⟨r_k, ψ^{−1}ψr_l⟩ = ⟨r_k, r_l⟩ (where we consider ψ as an operator) follows from the fact that, for example, in the case ψ = R_θ, the adjoint operator of R_θ is given (in matrix notation) as the conjugate transpose of R_θ—which is real and orthogonal, i.e., R_θ^t R_θ = I_e, where I_e is the identity. Hence, the adjoint operator of R_θ is R_θ^t = R_θ^{−1}. The adjoint operator of ρ_α is ρ_α^{−1} and the adjoint operator of ψ is ψ^{−1}. The first and second terms in the second line of Eq. (B.1) may be dealt with in a similar manner. Similarly, the invariance of q and t may also be established.

Translation invariance of Eq. (8) is easy to see, because

\[
\xi_{kl}(\tau_b \cdot \Omega_b \,|\, \tau_b \cdot \Omega_k, \tau_b \cdot \Omega_l)
= \Theta(q)\,\Theta(st)\,\delta((r_k + b) - (r_l + b) - s e_{kl})\,\delta(\phi_b - \phi_l)
= \Theta(q)\,\Theta(st)\,\delta(r_k - r_l - s e_{kl})\,\delta(\phi_b - \phi_l)
= \xi_{kl}(\Omega_b \,|\, \Omega_k, \Omega_l). \tag{B.2}
\]

Invariance with respect to a rotation R_θ follows from

\[
\xi_{kl}(R_\theta \cdot \Omega_b \,|\, R_\theta \cdot \Omega_k, R_\theta \cdot \Omega_l)
= \Theta(q)\,\Theta(st)\,\delta(R_\theta r_k - R_\theta r_l - s R_\theta e_{kl})\,\delta((\phi_b + \theta) - (\phi_l + \theta))
= \Theta(q)\,\Theta(st)\,\delta(R_\theta(r_k - r_l - s e_{kl}))\,\delta(\phi_b - \phi_l)
= \Theta(q)\,\Theta(st)\,\delta(r_k - r_l - s e_{kl})\,\delta(\phi_b - \phi_l)
= \xi_{kl}(\Omega_b \,|\, \Omega_k, \Omega_l). \tag{B.3}
\]

Invariance under a reflection ρ_α about an axis holds, since

\[
\xi_{kl}(\rho_\alpha \cdot \Omega_b \,|\, \rho_\alpha \cdot \Omega_k, \rho_\alpha \cdot \Omega_l)
= \Theta(q)\,\Theta(st)\,\delta(\rho' R_{-2\alpha} r_k - \rho' R_{-2\alpha} r_l - s \rho' R_{-2\alpha} e_{kl})\,\delta(-(\phi_b - 2\alpha) + (\phi_l - 2\alpha))
= \Theta(q)\,\Theta(st)\,\delta(\rho' R_{-2\alpha}(r_k - r_l - s e_{kl}))\,\delta(-(\phi_b - \phi_l))
= \Theta(q)\,\Theta(st)\,\delta(r_k - r_l - s e_{kl})\,\delta(\phi_b - \phi_l)
= \xi_{kl}(\Omega_b \,|\, \Omega_k, \Omega_l), \tag{B.4}
\]

where δ(R_θ(r_k − r_l − s e_kl)) = δ(r_k − r_l − s e_kl) and δ(ρ′R_{−2α}(r_k − r_l − s e_kl)) = δ(r_k − r_l − s e_kl), as explained in the following. Let ψ represent either R_θ or ρ′R_{−2α}, where R_θ, ρ′ and R_{−2α} are orthogonal matrices. Recall that the product of any number of orthogonal matrices is also orthogonal, and hence has an inverse. (The set of orthogonal matrices forms a group—which is closed under multiplication of the elements of the group.) It can be shown that the linear operator ψ (defined on the finite-dimensional space R²) has an inverse if and only if ψ(b) = 0 ⇒ b = 0, 0 ∈ R² [20]. That is, r_k − r_l − s e_kl = 0 ⇔ ψ(r_k − r_l − s e_kl) = 0, from which it follows that r_k − r_l − s e_kl ≠ 0 ⇔ ψ(r_k − r_l − s e_kl) ≠ 0. Recall that δ(b) = 0 if b ≠ 0 (in the sense of distributions). Hence, the assertion follows, i.e., δ(R_θ(r_k − r_l − s e_kl)) = 0 if and only if δ(r_k − r_l − s e_kl) = 0, and similarly δ(ρ′R_{−2α}(r_k − r_l − s e_kl)) = 0 if and only if δ(r_k − r_l − s e_kl) = 0. The case for δ(b) when b = 0 may also be argued in a distributional sense.

Appendix C. Fourier transform of translated, rotated and reflected image data

We shall derive the general case of the Fourier transform under the affine mapping r ↦ Mr + b, where M ∈ R^{2×2} is a real-valued invertible matrix, and r, b ∈ R². Translations, rotations and reflections will be seen as special cases.

The two-dimensional Fourier transform of a function f: R² → R is given as

\[
F(\nu) = \int_{\mathbb{R}^2} f(r)\, e^{-j 2\pi \langle r, \nu \rangle}\, dr, \tag{C.1}
\]

where F denotes the Fourier transform of f, ν ∈ R² are the Fourier domain coordinates, and ⟨·, ·⟩ denotes the dot product. The Fourier transform of f(Mr + b) is given as

\[
F'(\nu) = \int_{\mathbb{R}^2} f(Mr + b)\, e^{-j 2\pi \langle r, \nu \rangle}\, dr
= \frac{1}{|\det(M)|} \int_{\mathbb{R}^2} f(r')\, e^{-j 2\pi \langle M^{-1} r' - M^{-1} b,\, \nu \rangle}\, dr'
\]
\[
= \frac{e^{j 2\pi \langle M^{-1} b,\, \nu \rangle}}{|\det(M)|} \int_{\mathbb{R}^2} f(r')\, e^{-j 2\pi \langle M^{-1} r',\, \nu \rangle}\, dr'
= \frac{e^{j 2\pi \langle b,\, (M^{-1})^t \nu \rangle}}{|\det(M)|} \int_{\mathbb{R}^2} f(r')\, e^{-j 2\pi \langle r',\, (M^{-1})^t \nu \rangle}\, dr'
\]
\[
= \frac{e^{j 2\pi \langle b,\, (M^{-1})^t \nu \rangle}}{|\det(M)|}\, F((M^{-1})^t \nu)
= \frac{e^{j 2\pi \langle b,\, \nu' \rangle}}{|\det(M)|}\, F(\nu'), \tag{C.2}
\]

where F′ is the Fourier transform of f(Mr + b) and r′ = Mr + b. Hence, dr′ = |det(M)| dr, where det(M) denotes the determinant of the matrix M, i.e., the Jacobian determinant of the transformation. In the above equation, ν′ = (M^{−1})^t ν are the coordinates ν transformed by the affine mapping. In addition, we have made use of the fact that ⟨M^{−1}r′, ν⟩ = ⟨r′, (M^{−1})^t ν⟩. It may be noted that the result derived in Eq. (C.2) remains valid for higher-dimensional (Euclidean) spaces other than R².

For translation of an image, let M be the identity. Hence, translation of an image I(r) → I(τ_b(r)) transforms the Fourier transform of the image I(ν) → I(ν)e^{j2π⟨b, ν⟩}, where r = {x, y} are the space domain coordinates and ν = {u, v} are the Fourier domain coordinates.

For rotation of an image by an angle θ about the origin, let M = R_θ and b = 0 ∈ R². It may be realized that (R_θ^{−1})^t = R_θ, since R_θ is an orthogonal matrix, i.e., the product R_θ^t R_θ is the identity matrix. The determinant of R_θ is equal to 1 (because R_θ is a special orthogonal matrix). Hence, a rotation I(r) → I(R_θ(r)) transforms the Fourier transform I(ν) → I(R_θ(ν)).

For reflection of an image in an axis inclined at an angle α with the x-axis, let M = ρ_α = ρ′R_{−2α}, where b = 0 ∈ R². The determinant of ρ_α is equal to −1, since det(ρ_α) = det(ρ′) det(R_{−2α}). The determinant of R_{−2α} is equal to 1 (since R_{−2α} is a rotation and hence a special orthogonal matrix), and the determinant of ρ′ is equal to −1, because in matrix notation the mapping {x, y} → {x, −y} is obtained by

\[
\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix},
\]

where ρ′ is the matrix on the left side of the above expression. Since ρ_α is an orthogonal matrix, (ρ_α^{−1})^t = ρ_α. Therefore, the reflection I(r) → I(ρ_α(r)) transforms the Fourier transform I(ν) → I(ρ_α(ν)).
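The translation case of Eq. (C.2) has an exact discrete counterpart in the DFT shift theorem, verified below for an integer shift of a random image.

```python
import numpy as np

# Shifting an image multiplies its Fourier transform by e^{j 2 pi <b, nu>}.
rng = np.random.default_rng(3)
img = rng.random((64, 64))
bx, by = 5, 9                                 # integer shift (exact for the DFT)
shifted = np.roll(np.roll(img, -by, axis=0), -bx, axis=1)   # I(r) -> I(r + b)

F  = np.fft.fft2(img)
Fs = np.fft.fft2(shifted)
v, u = np.meshgrid(np.fft.fftfreq(64), np.fft.fftfreq(64), indexing='ij')
phase = np.exp(1j * 2 * np.pi * (bx * u + by * v))
assert np.allclose(Fs, F * phase)
```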

    References

[1] H. Müller, W. Müller, D.McG. Squire, S. Marchand-Maillet, T. Pun, Performance evaluation in content-based image retrieval: overview and proposals, Pattern Recognition Lett. 22 (5) (2001) 593–601.

[2] M.J. Swain, D.H. Ballard, Color indexing, Int. J. Comput. Vision 7 (1) (1991) 11–32.

[3] J. Ashley, R. Barber, M. Flickner, J. Hafner, D. Lee, W. Niblack, D. Petkovic, Automatic and semi-automatic methods for image annotation and retrieval in QBIC, in: Proceedings of the SPIE: Storage and Retrieval for Image and Video Databases III, San Jose, CA, Vol. 2420, February 1995, pp. 24–35.

[4] A.P. Pentland, R. Picard, S. Sclaroff, Photobook: content-based manipulation of image databases, Int. J. Comput. Vision 18 (3) (1996) 233–254.

[5] J.R. Smith, S.-F. Chang, VisualSEEk: a fully automated content-based image query system, in: Proceedings of the ACM Multimedia Conference, Boston, MA, November 1996, pp. 87–98.

[6] M. Szummer, R.W. Picard, Indoor–outdoor image classification, in: IEEE International Workshop on Content-based Access of Image and Video Databases, Bombay, India, 1998, pp. 42–51.

[7] A. Vailaya, A.K. Jain, H.-J. Zhang, On image classification: city images vs. landscapes, Pattern Recognition 31 (12) (1998) 1921–1935.

[8] Q. Iqbal, J.K. Aggarwal, Retrieval by classification of images containing large manmade objects using perceptual grouping, Pattern Recognition 35 (7) (2002) 1463–1479.

[9] Q. Iqbal, J.K. Aggarwal, Applying perceptual grouping to content-based image retrieval: building images, in: Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition, Fort Collins, CO, 1999, pp. 42–48.

[10] Q. Iqbal, J.K. Aggarwal, Image retrieval via isotropic and anisotropic mappings, in: The First International Workshop on Pattern Recognition in Information Systems, Setubal, Portugal, July 6–8, 2001, pp. 34–49.

[11] J. Zweck, L.R. Williams, Euclidean group invariant computation of stochastic completion fields using shiftable-twistable functions, in: Proceedings of the Sixth European Conference on Computer Vision (ECCV '00), Dublin, Ireland, 2000, pp. 100–116.

[12] F.M. Goodman, Algebra, Abstract and Concrete, Prentice-Hall, Inc., Englewood Cliffs, NJ, 1998.

[13] D.S. Dummit, R.M. Foote, Abstract Algebra, 2nd Edition, Wiley, New York, 1999.

[14] J.R. Durbin, Modern Algebra: An Introduction, Wiley, New York, 1985.

[15] P.C. Bressloff, J.D. Cowan, M. Golubitsky, P.J. Thomas, M.C. Wiener, Geometric visual hallucinations, Euclidean symmetry, and the functional architecture of striate cortex, Philos. Trans. R. Soc. London B 356 (2001) 299–330.

[16] B.S. Manjunath, W.Y. Ma, Texture features for browsing and retrieval of image data, IEEE Trans. Pattern Anal. Mach. Intell. 18 (8) (1996) 837–842.

[17] R.O. Duda, P.E. Hart, D.G. Stork, Pattern Classification, 2nd Edition, Wiley, New York, 2001.

[18] Visual Delights, Inc., http://www.visualdelights.net

[19] Downloaded images: Free Nature Pictures, http://members.nbci.com/5555623/; Info. for Travel Media, http://www.sfvisitor.org/travelmedia/html/slides.html; Dave's Wall Paper, http://davydicus.tripod.com/; Photo Art of Nature, http://www.photoartofnature.com

[20] E. Kreyszig, Introductory Functional Analysis with Applications, Wiley, New York, 1978.

About the Author—QASIM IQBAL obtained a Bachelor of Science (B.Sc.) degree in Electrical Engineering from the University of Engineering and Technology, Lahore, Pakistan in 1996. He obtained a Master of Science (M.S.E.) degree in Electrical Engineering from The University of Texas at Austin in 1998. He is currently working towards a Ph.D. in Computer Vision at the Computer and Vision Research Center (CVRC), The University of Texas at Austin. His research interests include computer vision, content-based image retrieval, image processing, wavelets and pattern recognition.

About the Author—J.K. AGGARWAL has served on the faculty of The University of Texas at Austin College of Engineering since 1964 and is currently the Cullen Professor of Electrical and Computer Engineering and Director of the Computer and Vision Research Center. His research interests include computer vision and pattern recognition. A Fellow of IEEE since 1976 and IAPR since 1998, he received the Senior Research Award of the American Society of Engineering Education in 1992, and the 1996 Technical Achievement Award of the IEEE Computer Society. He is author or editor of seven books and 39 book chapters, and author of over 175 journal papers, as well as numerous proceedings papers and technical reports. He served as Chairman of the IEEE Computer Society Technical Committee on Pattern Analysis and Machine Intelligence (1987–1989); Director of the NATO Advanced Research Workshop on Multisensor Fusion for Computer Vision, Grenoble, France (1989); Chairman of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (1993); and President of the International Association for Pattern Recognition (1992–1994).

