
In Search of Information in Visual Media

Amarnath Gupta, Simone Santini, and Ramesh Jain

A search through visual media should be as imprecise as "I know it when I see it."

[Art: Shaman with elephant, Sandawe, Tanzania. Werner Forman Archive, Tanzania National Museum, Dar es Salaam]

Information is data with semantic association. An information system's most important role is to capture the data and its semantic associations so users can perform meaningful tasks with the information. An information system's design and functionality change along with changes in the nature of the data, the nature of data association, and the task the user is trying to perform. For example, temporal interval data involves different semantics from those of a row of floating-point numbers. Similarly, users of On-Line Analytical Processing database software need a different set of operations from those used in relational systems, although the underlying data might be the same. In a perfect world, given any kind of data, semantics, and operational requirement, a suitable information system can be designed and built. Unfortunately, that is not reality.

Reality is that information systems are most successful when the data has human-imposed structure. Systems derived from the relational model, for example, provide users a powerful set of tools to specify the domain of every attribute and the semantic associations between them. An application schema created through these tools constrains the data that fits the problem domain. The resulting database is semantically rich because the system's developer takes the trouble to ensure that every attribute of the model has a well-defined interpretation and that the dependencies between attributes faithfully reflect the real problem world. A successful information system gives the user enough capability to define the attribute domains, express data associations, and perform an adequate set of retrieval operations. A great case in point is the practical success of spatial information systems, in which the data types are points, lines, and regions in space and operations can be as complex as finding the intersection of two arbitrary polygons in 3D space.

A second area of moderate success has been information systems in which the data has little structure but the associations are semantically rich. A modern example is the world of hyperlinks represented by the phenomenal application known as the World-Wide Web. With hyperlinks between entities covering such a wide and heterogeneous spectrum as text, documents, images, movie clips, audio files, virtual reality, and databases, the Web has revolutionized our ability to access unstructured information. With all its success, however, can the Web be called a good information system? We can say it is a good application when users want to browse by navigating hyperlinks or users have prior navigational patterns that take them close to the right information. But trying to locate specific but unknown information via search can be a nightmare.

Compared with database and hypertext-based systems, free-text retrieval systems have had mixed success. Most common text-information retrieval systems lack the ability to interpret a user's intent and instead try to approximate intent through statistical techniques, like word frequency and term co-occurrence. Some systems go a step deeper, trying to group a corpus of terms through interterm association (such as through a thesaurus relating dog to canine and mongrel or through a latent semantic association relating Macbeth to Shakespeare). These systems perform some degree of associative search, so a query on mongrel matches documents with the words dog and canine. However, only a few systems go beyond word structure and occurrence statistics to use natural language processing techniques to extract phrase-level, sentence-level, and intersentence-level associations and use these associations for retrieval. Hence, the semantics expressed through intersentence or interparagraph associations cannot yet be modeled and retrieved.
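As a toy illustration of this kind of associative search, the sketch below expands a query term through a small hand-built thesaurus before matching documents; the thesaurus entries and documents are invented for the example, and real systems would also weight matches by frequency statistics.

```python
# Toy associative search: a query on "mongrel" also matches documents
# containing "dog" or "canine" via a small hand-built thesaurus.

thesaurus = {
    "mongrel": {"dog", "canine"},
    "dog": {"mongrel", "canine"},
}

documents = {
    "doc1": "the dog chased the ball",
    "doc2": "a canine unit patrolled the airport",
    "doc3": "the cat slept all day",
}

def associative_search(query_term, docs):
    """Return documents containing the query term or any related term."""
    terms = {query_term} | thesaurus.get(query_term, set())
    hits = []
    for name, text in docs.items():
        words = set(text.split())
        if terms & words:
            hits.append(name)
    return hits

print(associative_search("mongrel", documents))  # ['doc1', 'doc2']
```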

Given the absence of structure in free text, the task of the system is not only to store and retrieve associations with data but to extract associations from raw text data, thereby generating information. This association extraction is computationally more difficult and often an almost impossible task. However, if an information retrieval system had the ideal extraction component, it could interpret a discourse written over multiple paragraphs and answer semantic queries like, "What is the basic plot of the story?" But few limited-domain research prototypes have such ability. The issue here is that an information retrieval system that also needs to extract information from raw data (as opposed to being told what the associations are) is inherently weaker, retrieving information suboptimally. This problem has much in common with visual information systems in which the system contains images (or videos) as its primary information object.

The Character of Visual Information

As in all these systems, the information in a visual object is its value, that is, a depiction of what it contains, and its association with other visual objects.


Figure 1. The larger picture is an airplane. The picture in the inset has the same background texture but with two "+" shapes. Two similar shapes are enough for many systems to consider the two pictures very similar.

However, for visual objects, characterizing information content becomes more complex than for text. In text, every word has a finite number of meanings, and the correct semantic value of a word, if not immediately clear, needs to be disambiguated (chosen from these finite possibilities) through sentence-level or paragraph-level analysis.

In Figure 1, the semantic value of the visual object is obviously a flying aircraft. Visually, however, the picture is quite similar to the small inset image in the upper right-hand corner. This seeming equivalence of the two visual objects comes about because they have a similar appearance value, characterized by a combination of their color and texture. If we query on visual object 1, should the system retrieve visual object 2? The problem would be trivially simple if the system could label visual object 1 as an aircraft, because then we could match visual objects by their semantic value alone. Extracting the semantic value from the visual appearance is an arduous task, complicated by the facts that many objects with the same semantic label exhibit a very large variety of appearance values and that automatic isolation (segmentation) of the semantic objects from an image is in itself a difficult problem. In this article, we focus mainly on retrieval based on appearance values of visual objects. For an object-modeling perspective of the retrieval problem, see [4].

Retrieval based on visual object appearance involves four categories of information items: features, feature space, feature groups, and image space.

Features. Computationally, a feature is a derived attribute obtained by transforming the original visual object through an image analysis algorithm; it characterizes a specific property of an image. A feature is typically represented as a set of numbers, often called a feature vector, although several vector operations, like addition and multiplication by a constant, are never performed on them. The operations used most often are:

• Projection. Projection creates a lower-dimensional vector from a higher-dimensional vector by choosing a user-specified set of dimensions. For example, if the feature shape of a visual object contains 10 floating-point numbers representing Zernike moments, choosing the first five of them is a projection. Features like Zernike moments are frequently ordered arrays, so choosing the first five to match objects retrieves objects that match in their overall shape but may differ in details.

• Apply Function. Apply Function takes a feature as input and applies a function F to all numbers to create another set of numbers with the same dimension. For example, a filter function (see Figure 2) may be applied to the hue histogram of an image to compute the redness of the image.

• Distance. Given two features, Distance computes a difference value between them, a crucial operation because for many kinds of visual information, a match in the appearance value is defined in terms of the Distance function of the constituent features; the greater the distance, the less the match. The design of the Distance function often accounts for the inherent imprecision in matching two visual objects. One predominant effect of treating features as vectors is that the Distance function is often defined as a Euclidean, or city-block, distance between two points in feature space. The cosine (or angular) distance between two features is another vector-space measure very popular in text retrieval, but it has been used less for visual information.
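A minimal sketch of these three operations, assuming made-up features (a 10-number "shape" feature and a six-bin hue histogram) and a triangular redness filter standing in for the filter of Figure 2:

```python
import math

# A feature is a set of numbers derived from an image by some analysis
# algorithm; the two features below are invented stand-ins.
shape_feature = [0.91, 0.42, 0.10, 0.03, 0.77, 0.15, 0.08, 0.61, 0.29, 0.05]  # e.g., 10 Zernike moments
hue_histogram = [0.30, 0.20, 0.10, 0.05, 0.05, 0.30]  # pixel fractions in six hue bins (0-90 degrees)

def project(feature, dims):
    """Projection: keep only a user-specified subset of dimensions."""
    return [feature[i] for i in dims]

def apply_function(feature, f):
    """Apply Function: apply f to every component, keeping the dimension."""
    return [f(x) for x in feature]

def euclidean(a, b):
    """Distance: Euclidean distance between equal-length features."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def city_block(a, b):
    """Distance: city-block (L1) distance."""
    return sum(abs(x - y) for x, y in zip(a, b))

# Projection: match on overall shape by keeping only the first five moments.
coarse_shape = project(shape_feature, range(5))

# Apply Function: a triangular stand-in for the redness filter of Figure 2,
# evaluated at each hue-bin center, then used to weight the hue histogram.
bin_centers = [7.5, 22.5, 37.5, 52.5, 67.5, 82.5]
redness_filter = apply_function(bin_centers, lambda h: max(0.0, 1.0 - h / 90.0))
redness = sum(w * x for w, x in zip(redness_filter, hue_histogram))

# Distance: the smaller the distance, the better the appearance match.
other_shape = [0.88, 0.40, 0.12, 0.02, 0.80, 0.35, 0.30, 0.10, 0.44, 0.21]
print(coarse_shape, round(redness, 3),
      round(euclidean(shape_feature, other_shape), 3),
      round(city_block(shape_feature, other_shape), 3))
```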

Figure 2. Hue is the color content of a pixel, represented with an angular scale from 0 to 360 degrees. The perceptual color red appears around 0 degrees, green at 120 degrees, and blue at 240 degrees. The curve (filter factor versus hue in degrees) shows a filter that can be used to compute the redness of an image.

However, features cannot always be thought of as vectors. For example, if the feature represents a distribution (or histogram) of a variable measured in the image, the distance function needs to compute the difference between two distributions. A classic distance measure for comparing distributions is the Mahalanobis distance, widely used in statistical pattern recognition. Rubner, Guibas, and Tomasi [10] recently used a color distance function called the "earth mover's distance" to compute the work involved in converting one distribution to another distribution. The features of a visual object can also be represented as groups of points in the feature space. To illustrate, assume a feature called "texture" of an image region can be represented by three numbers: randomness (a chessboard has little randomness, while an arbitrary set of dots has a lot); periodicity (repetitiveness of pattern); and directionality (the stripes in the American flag have an oriented texture) [7].
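As an aside, for the special case of two one-dimensional histograms with the same total mass, the earth mover's distance (with bin index as the ground distance) reduces to the L1 distance between their cumulative histograms. The sketch below uses only that special case, with invented histograms, as an illustration; it is not the general color distance of [10].

```python
# Earth mover's distance between two equal-mass 1-D histograms: the mass
# carried past each bin boundary, summed over bins, equals the L1 distance
# between the cumulative histograms.

def emd_1d(h1, h2):
    """EMD for equal-mass 1-D histograms with unit bin spacing."""
    assert len(h1) == len(h2)
    work, carried = 0.0, 0.0
    for a, b in zip(h1, h2):
        carried += a - b      # mass that must still be moved rightward
        work += abs(carried)  # moving it one bin costs |carried|
    return work

hist_a = [0.5, 0.3, 0.2, 0.0]
hist_b = [0.0, 0.2, 0.3, 0.5]
print(emd_1d(hist_a, hist_b))  # large: much mass must travel far
print(emd_1d(hist_a, hist_a))  # 0.0: identical distributions
```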

Consider an image with 10 different regions, each with a different texture value. These texture values form 10 points in the randomness-periodicity-directionality coordinates. How similar is this image to another that has 10 other textured regions? Many distance functions are defined between point sets. Eiter and Mannila recently compared a set of such distance functions [3], and many other kinds of distance functions have also been reported in the literature. How well do these distance functions portray the human sense of difference in appearance? We have seen no thorough investigation of the issue to date.
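For concreteness, here is one classic member of that family of point-set distances, the Hausdorff distance, applied to invented texture points; [3] compares several such measures, and this one is shown only as an illustration, not as the best choice.

```python
import math

def point_dist(p, q):
    """Euclidean distance between two texture points
    (randomness, periodicity, directionality)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def hausdorff(set_a, set_b):
    """Hausdorff distance: the worst case, over all points in either set,
    of the distance to the closest point in the other set."""
    d_ab = max(min(point_dist(a, b) for b in set_b) for a in set_a)
    d_ba = max(min(point_dist(b, a) for a in set_a) for b in set_b)
    return max(d_ab, d_ba)

# Two images, each summarized by the texture points of a few regions.
image_1 = [(0.1, 0.9, 0.8), (0.7, 0.2, 0.1), (0.4, 0.5, 0.5)]
image_2 = [(0.2, 0.8, 0.7), (0.6, 0.3, 0.2)]
print(round(hausdorff(image_1, image_2), 3))
```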

Feature space. Regardless of whether or not it is a vector, the feature for a visual object inhabits some region in a space defined by its variables. In the texture example, we saw that the feature space is 3D. As we add more images to a database, the 3D space gets populated with a point for every new textured region. If we treat this space as an information object, we can query it by using a number of operations. Querying the feature space gives us not only a feeling for which objects are there in a specific part of the feature space but a general impression of what the database contains—an important class of queries for browsing and exploring the database. The most obvious operations are union, difference, and set-membership of point sets in feature space. More involved operations include:

• Find Boundary. Given a set of points in feature space, Find Boundary returns a hyperpolyhedral boundary of the points. It can be used for exploratory queries like, "Here are 10 examples of the American flag. Show me the part of the feature space covering all 10 instances." Once we get back the region, we can ask further queries, like, "What other visual objects belong to this area?"

• Select by Spatial Constraint. In its simplest form, Select by Spatial Constraint is a range query that retrieves all feature points lying within (or outside) a hypercube in the feature space; in a 2D feature space, the hypercube is a rectangle whose bounds have been specified. In a more general form, the range may be specified by using constraints on the boundary of the region (e.g., by "drawing" the boundary of the region with lines and curves). This operation, although commonly used in spatial databases, can be computationally prohibitive for feature spaces with a large number of dimensions or variables.

• Select by Distance. Although discussed here because of its popularity, Select by Distance is a special form of the previous query, whereby the user picks a query feature point in space and the range is always the shape of a hyperellipsoid (an ellipse in 2D) around the selected point. The lengths of the different axes (dimensions or variables) of the hyperellipsoid are determined by how tightly the user wants a result to match the query feature point along an axis; the tighter the match, the shorter the axis.

• k-Nearest Neighbor. This is the most popular query supported by current query-by-example systems. As with the previous query, the user selects a query feature point. The operation finds the k visual objects, denoted by the other points in the feature space, closest in distance to the query feature point, often ranking them in order of distance (counting and ordering are two important operations in feature space). While this is a very useful query in itself, it may turn out to be less useful if the feature space is sparsely populated, because then the nearest neighbor can be very far away in the feature space and very dissimilar to the example in appearance value.

• Partition Space. Related to the classic transitive closure computation, this operation is used to divide the feature space into a small number of regions (called clusters in pattern recognition) by gathering close-by feature points into a region. Parts of the feature space with many nearby points form a dense group, while sparse regions may have only a few points per group. There are several methods for creating the partition; in one, the user provides two distance bounds d and d*, specifying that all members within a group must have mutual distance less than d and that the distance between two groups has to be at least d* (see the colored clusters in Figure 3).

• Rename. Rename assigns a new name to a specific part of the feature space. For example, in the 3D texture feature space, the region high in both orientation and periodicity values may be renamed "stripes." In a more general case, a Boolean combination of regions (Region A and Region B) can be renamed as a single entity.

• Aggregate Operations. These operations include a group, like count, mean, standard deviation, and cluster diameter, that compute aggregate properties in the feature space.
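A sketch of how three of these selections might look over a tiny in-memory feature space; the points, ranges, and axis tolerances are invented, and a real system would use an index structure rather than a linear scan.

```python
import math

# A tiny in-memory "feature space": object id -> 3-D texture feature
# (randomness, periodicity, directionality). All values are invented.
feature_space = {
    "img01": (0.10, 0.90, 0.85),
    "img02": (0.15, 0.80, 0.90),
    "img03": (0.70, 0.20, 0.10),
    "img04": (0.40, 0.55, 0.50),
    "img05": (0.12, 0.85, 0.80),
}

def select_by_range(space, lo, hi):
    """Select by Spatial Constraint: all points inside the hypercube [lo, hi]."""
    return [k for k, p in space.items()
            if all(l <= x <= h for x, l, h in zip(p, lo, hi))]

def select_by_distance(space, query, axis_tolerance):
    """Select by Distance: all points inside a hyperellipsoid around the query;
    a tighter tolerance on an axis means a shorter ellipsoid axis."""
    return [k for k, p in space.items()
            if sum(((x - q) / t) ** 2 for x, q, t in zip(p, query, axis_tolerance)) <= 1.0]

def k_nearest(space, query, k):
    """k-Nearest Neighbor: the k points closest to the query, ranked by distance."""
    def dist(p):
        return math.sqrt(sum((x - q) ** 2 for x, q in zip(p, query)))
    return sorted(space, key=lambda name: dist(space[name]))[:k]

print(select_by_range(feature_space, (0.0, 0.7, 0.7), (0.2, 1.0, 1.0)))
print(select_by_distance(feature_space, (0.1, 0.9, 0.85), (0.1, 0.1, 0.1)))
print(k_nearest(feature_space, (0.1, 0.9, 0.85), k=2))
```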

Clearly, users can compose more meaningful queries by combining these operations and a few programming constructs. For example, an advanced user can express the query, "Find k-Nearest Neighbors of the query feature point within its cluster," by saying, "Partition the feature space into a set of regions called R. Find the boundary of the k-Nearest Neighbors of the query feature point, and call the bounded region S. Compute R ∩ S."
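Such a composed query might look like the following sketch, which partitions a small invented feature space with a naive single-link grouping under a distance bound d and then answers "k nearest neighbors within the query point's cluster"; the clustering below is only a stand-in for the Partition Space operation.

```python
import math

# Invented 2-D feature points; a real feature space would be higher-dimensional.
points = {
    "cat1": (0.9, 0.8), "cat2": (0.85, 0.75), "cat3": (0.8, 0.85),
    "car1": (0.1, 0.2), "car2": (0.15, 0.1),
}

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def partition(space, d):
    """Partition Space (naive single-link): a point joins the first cluster
    containing a member closer than d; a stand-in for a real clustering method."""
    clusters = []
    for name, p in space.items():
        for cluster in clusters:
            if any(dist(p, space[member]) < d for member in cluster):
                cluster.add(name)
                break
        else:
            clusters.append({name})
    return clusters

def knn_within_cluster(space, query_name, k, d):
    """k-Nearest Neighbors of the query restricted to its own cluster,
    i.e., the intersection R ∩ S described in the text."""
    cluster = next(c for c in partition(space, d) if query_name in c)
    candidates = [n for n in cluster if n != query_name]
    return sorted(candidates, key=lambda n: dist(space[n], space[query_name]))[:k]

print(knn_within_cluster(points, "cat1", k=2, d=0.3))  # neighbors drawn only from the cat cluster
```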

Feature Groups

Grouping more than one simple feature into one complex feature often makes the complex feature more expressive. In [4], the authors refer to the combination of a skin-color detector (renaming a transformation on the hue-saturation-luminance space) and cylindrical geometric features (a range-selection in the shape feature space of all closed objects detected in images) to detect bare-skinned human figures. In a more general framework, feature groups can be created by at least two kinds of operations:

• Distance Aggregation. Suppose there are two features, such as color and texture, with their own feature spaces and distance functions. Let the color distance between two visual objects be d1 and the texture distance between them be d2. We can then compute a combined distance D between them through a combination function F1, which may be as simple as a weighted sum. Thus, D = F1(d1, d2). Taking this one step further, we can compose an aggregate hierarchy of distances of the form D = F3(d4, F2(d3, F1(d1, d2))). Even when the functions are all implemented as weighted sums, Distance Aggregation allows different feature combinations by allowing the user to adjust weights [6]. (A brief sketch of this operation follows the list.)

• Dimension Joining. If we want to create a domain-specific feature from a set of primitive features, such as a red-and-white-stripe detector (for the American flag) or a bare-skinned-human detector, we need a more sophisticated tool than Distance Aggregation. Dimension Joining is the construction of a new feature axis (colored stripe) by a natural join-style operation between chosen feature axes (high periodicity and hue). However, we also need to specify a distance function for the newly computed feature space.

Figure 3. Detailed view of the area found to contain cats on the x-axis. We are actually in a "cat-rich" area, although there are many other images.
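A minimal sketch of Distance Aggregation, with invented color and texture features and weights:

```python
# Distance Aggregation: combine per-feature distances into one overall
# distance with a weighted sum, D = F1(d1, d2). Feature values and weights
# below are invented for illustration.

def color_distance(a, b):
    """City-block distance between two 3-bin color histograms."""
    return sum(abs(x - y) for x, y in zip(a, b))

def texture_distance(a, b):
    """City-block distance between two 3-number texture features."""
    return sum(abs(x - y) for x, y in zip(a, b))

def aggregate(d1, d2, w1=0.6, w2=0.4):
    """F1: a simple weighted sum; users tune w1 and w2 to re-balance features."""
    return w1 * d1 + w2 * d2

query = {"color": (0.6, 0.3, 0.1), "texture": (0.2, 0.8, 0.7)}
candidate = {"color": (0.5, 0.3, 0.2), "texture": (0.3, 0.6, 0.7)}

d1 = color_distance(query["color"], candidate["color"])
d2 = texture_distance(query["texture"], candidate["texture"])
print(round(aggregate(d1, d2), 3))
# A nested aggregation such as D = F3(d4, F2(d3, F1(d1, d2))) is just
# repeated application of the same idea with further distances d3 and d4.
```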

Image Space

If we ask, "Show images with more than 20% blue on the top, more than 40% green in the bottom section, and a red-pink area in the middle," we may be looking for an outdoor picture showing a flower in a garden. This example shows how a user may describe the compositional elements of the desired image and their layout in the image space. Understandably, much image-database research has been devoted to methods for using locations, sizes, and arrangements of the image's compositional elements for retrieval. Such queries are allowed through three kinds of systems (a small sketch of this kind of layout check follows their descriptions):

Those with explicit spatial data structures and operators. There has been strong interest from the database community in applying spatial database techniques to queries on the image space. These systems have developed either a query language with spatial operators (such as PICQUERY+ [1], which allows query conditions like SIZE = short AND INTERSECTS_WITH = wrist) or a first-order logic scheme (similar to relational calculus) with additional syntax and semantics for spatial operations. Del Bimbo et al. [2] developed a logic with region-based expressions to specify positional information and object-based expressions for specifying interobject relationships. The flexibility these systems offer in formulating visual queries works well for constrained domains, but in our experience, it is often offset by a general lack of control over the set of images present in the database. For example, with these systems, it is difficult to express queries like, "Find images with scattered white regions on a blue background at the top of the image." It is necessary for database researchers to extend the scope of their well-defined domains to accommodate the imprecision inherent in the world of images.

Those using fixed regions and implicit spatial functions. These systems divide the image space into a number of predetermined regions. These regions can be as simple as 8 × 8 pixel blocks. A system by Stricker and Dimai [11] uses a central oval region and four image corners. The user needs to specify which regions are important for spatial matching. The system computes image features for each specified region and evaluates candidate images by computing region-wise similarity. Once the similarity for each region is evaluated, the system evaluates a composite function to combine the region-wise similarities into a composite image-level similarity. Stricker and Dimai's system, for example, uses a fuzzy function that puts more weight on the central oval region; the weight progressively diminishes away from the center. Similarities from the corners are added as a weighted function. Stricker and Dimai's system is also insensitive to 90-degree rotations. Such transformations are an important issue all image-space systems have to address. Should a translated, rotated, scaled, or somewhat geometrically transformed version of an image be considered a near-perfect match with the original? The answer is strictly domain dependent; what is acceptable for a stock photo house may not be for a satellite image analyst. However, a system that handles image-space queries is incomplete unless the system designer specifies the types of invariance allowed for geometrically transformed images [5].

Those using segmented regions and implicit spatial functions. We are beginning to see more systems that attempt to segment an image to extract its constituent regions. For example, in the NETRA (in Sanskrit, netra means "eye") system [8], developed at the University of California at Santa Barbara, images are segmented into homogeneous regions. Users can compose such queries as, "Retrieve all images containing regions that have the color of object A, the texture of object B, and the shape of object C and lie in the upper third of the image," whereby the individual objects could be regions belonging to different images. Ideally, these regions can be treated as spatial objects with location, shape, size, and neighborhood properties and used to execute queries containing spatial and topological predicates. Realistically, however, segmentation is imperfect, leading to errors in the computation of spatial properties. This imperfection makes even simple operations, like adjacency testing, more complex than what current spatial systems have been used for. So far, we have not seen much research on the relation between the approximations introduced by segmentation algorithms and the performance of spatial queries.
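To make the layout query that opened this section concrete, here is a small sketch that divides an image into horizontal thirds and checks coarse color fractions; the tiny label-grid "image" is invented, and real systems would derive such labels from pixel values and from the fixed or segmented regions described above.

```python
# Check a compositional query such as "more than 20% blue on top, more than
# 40% green at the bottom, and some red-pink in the middle" against a tiny
# synthetic image represented as a grid of coarse color labels.

image = [
    ["blue", "blue", "white", "blue"],     # top rows
    ["blue", "white", "blue", "blue"],
    ["pink", "red", "green", "white"],     # middle rows
    ["green", "green", "green", "white"],  # bottom rows
    ["green", "green", "brown", "green"],
]

def color_fraction(rows, color):
    """Fraction of cells in the given rows carrying the given color label."""
    cells = [c for row in rows for c in row]
    return cells.count(color) / len(cells)

def matches_layout(img):
    n = len(img)
    top, middle, bottom = img[: n // 3], img[n // 3 : 2 * n // 3], img[2 * n // 3 :]
    return (color_fraction(top, "blue") > 0.20
            and color_fraction(bottom, "green") > 0.40
            and (color_fraction(middle, "red") + color_fraction(middle, "pink")) > 0.0)

print(matches_layout(image))  # True for this invented garden-like picture
```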

A Search Environment for Visual Information

Although a great deal of information is computable from both single images and image collections, having extractable information is never enough. Information is meaningful only when it can be retrieved through an expressive query. Many recent systems (such as NETRA, as described by Ma [8]) have upgraded from features computed from whole images to those that compute features for each segmented region. A system developed at the University of Massachusetts, Amherst, [9] can express spatial relationships between segmented objects. For most such systems, however, the primary mode of search is still a k-Nearest Neighbor query over region objects. To date, no system offers a search environment to express queries that involve all the information types described in the previous section. In practice, however, similarity queries work fairly well in many applications. The reason is that similarity queries in most of these systems are based on simultaneous comparison of many different image attributes. For example, color itself has three variables, and one can define at least 12 different texture measures based on an image's gray-level co-occurrence matrix and many more variables for shape-like and position-sensitive properties. Through the simple laws of joint probability over these many AND-ed variables, it is only natural that the results of a k-Nearest Neighbor query match the query image very closely. We have also seen many cases in which users issue similarity queries with a particular intent, get back unexpected results, and discover the result set, though different from their personal intent, is quite consistent with the original query.
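The gray-level co-occurrence matrix mentioned here supports many texture measures; the sketch below builds the matrix for a tiny invented image and computes two of the commonly used measures (contrast and energy), not the full set of 12.

```python
# Gray-level co-occurrence matrix (GLCM) for a tiny 4-level image, using a
# single horizontal offset, plus two texture measures derived from it.

image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
levels = 4

def glcm_horizontal(img, levels):
    """Count co-occurrences of gray levels in horizontally adjacent pixels,
    then normalize the counts to joint probabilities."""
    counts = [[0] * levels for _ in range(levels)]
    for row in img:
        for a, b in zip(row, row[1:]):
            counts[a][b] += 1
    total = sum(sum(r) for r in counts)
    return [[c / total for c in row] for row in counts]

def contrast(p):
    """Sum of p(i, j) * (i - j)^2: high for abrupt gray-level changes."""
    return sum(p[i][j] * (i - j) ** 2 for i in range(len(p)) for j in range(len(p)))

def energy(p):
    """Sum of p(i, j)^2: high for very regular (ordered) textures."""
    return sum(v * v for row in p for v in row)

p = glcm_horizontal(image, levels)
print(round(contrast(p), 3), round(energy(p), 3))
```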

At the University of California at San Diego, we have sought to model how users search through a large collection of images, particularly when they have no prior knowledge of the collection's content and have an imprecisely formed query in mind. We hold that such users go through a number of explore-navigate-select-refine cycles before identifying the objects of interest.

Users start the exploration phase by first looking at the distribution of images in a virtual space whose axes are features they themselves select. They usually opt to see sample images from any part of the virtual feature space (see Figure 4). Alternatively, users could start by querying on an example image. In such cases, the distribution is recomputed in terms of the feature distance from the referent image. As users turn the feature cube around or walk through it, they become aware that although there is a wide variation along the color axis, it is the color distribution axis that shows meaningful clusters. In this case, there are several cat images on the x-axis.

Figure 4. Global view of the database. The three axes correspond to color, color distribution, and structure. The axes display a sample of the images taken randomly at several points along each axis.

Once users develop a basic feel for the organization or content, they try to sift through information objects that seem mutually correlated and are possibly related to the query intent. This information sifting is the navigation phase. In our example, we would like somehow to express the idea of "cathood" as a query. So we click on the part of the feature cube where we saw some cats. Note that in the cube in Figure 3, a group of images is quite close to the intended query and blends into the larger group.

After users have played with images that seem related to the initial unspecified query, they actively specify the search criteria (make a selection) and execute another query. In addition to range queries and similarity queries in feature space, our system at San Diego permits an unconventional selection query using a cluster-selection operation. Roughly, the operation means, "I have selected a set of images to be somewhat close to what I want. Now determine a similarity criterion so these images are very close to each other, and make the search." The result positions the user in an area of the feature space with a higher density of cat images (see Figure 5).

Figure 5. The user has already selected a set of cat images for a cluster selection, implicitly defining a similarity criterion. The distribution was recomputed, taking the user to a cat-intensive part of the database.

The refinement phase follows, in which search parameters are modified to improve the quality of results. In most systems, adjusting relative weights of the primitives is the only way to perform query refinement. In our system, users can also explore any cluster, using it to determine similarity criteria.
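One plausible way to turn a user-selected cluster into a similarity criterion, sketched below with invented feature vectors, is to weight each feature axis by the inverse of the cluster's spread along it; this is a common relevance-feedback heuristic, not necessarily the exact method our system uses.

```python
import math

# Turn a cluster of user-selected "good" examples into a similarity criterion
# by weighting each feature axis inversely to the cluster's spread along it:
# axes on which the selected images agree count more in the refined query.

selected = [  # feature vectors of the images the user marked as close to the intent
    (0.80, 0.20, 0.55),
    (0.82, 0.25, 0.10),
    (0.78, 0.22, 0.90),
]

def axis_weights(cluster, eps=1e-3):
    """Inverse-variance weight per axis; tight axes get large weights."""
    dims = len(cluster[0])
    weights = []
    for d in range(dims):
        values = [p[d] for p in cluster]
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        weights.append(1.0 / (var + eps))
    return weights

def weighted_distance(a, b, w):
    return math.sqrt(sum(wi * (x - y) ** 2 for x, y, wi in zip(a, b, w)))

w = axis_weights(selected)
centroid = tuple(sum(p[d] for p in selected) / len(selected) for d in range(3))
candidate = (0.79, 0.21, 0.40)
print([round(x, 1) for x in w])  # the third axis, where the cluster disagrees, matters least
print(round(weighted_distance(candidate, centroid, w), 3))
```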

Although these operations lead to richer user interaction than in most systems, we are far from being able to associate semantics with images. The more informed and focused the user is, the less time is needed for exploration, and the sooner the selection is made. In any case, if the refined query does not produce desired results, users may go back to the navigation or exploration phase.

Conclusions

A major difference between the query environment of a traditional database system and that of a visual information management system is that the latter needs to be able to handle a plurality of possible interpretations of data. This shift of focus from precise and well-formed queries and provably correct results to I-know-it-when-I-see-it correctness requires a query environment embedding visualization and visual manipulation of data. Such an environment facilitates exploration of the data, flexible feature and similarity selection, and incremental refinement of user queries.

However, large databases containing semantically rich data sets, like images with multiple domain- and application-dependent semantics, require interaction environments far richer than those provided by the current generation of visual information systems. The ideas presented here are only a small step in a very rich research direction.

References

1. Cardenas, A., Ieong, I., Barker, R., Taira, R., and Breant, C. The knowledge-based object-oriented PICQUERY+ language system. IEEE Trans. Knowl. Data Eng. 5, 4 (Aug. 1993), 644–658.
2. Del Bimbo, A., Vicario, E., and Zingoni, D. Symbolic description and visual querying of image sequences with spatio-temporal logic. IEEE Trans. Knowl. Data Eng. 7, 4 (Aug. 1995), 609–622.
3. Eiter, T., and Mannila, H. Distance measures for point sets and their computation. Acta Inf. 34, 2 (Feb. 1997), 109–133.
4. Forsyth, D., Malik, J., Fleck, M., Greenspan, H., Leung, T., Belongie, S., Carson, C., and Bregler, C. Finding pictures in large collections of images. Tech. Rep. CSD96-905, Univ. of California, Berkeley, 1996.
5. Gudivada, V., and Raghavan, V. An experimental evaluation of algorithms for retrieval by spatial similarity. ACM Trans. Inf. Syst. 13, 2 (Apr. 1995), 115–144.
6. Gupta, A. Visual information retrieval: A Virage perspective. Tech. Rep. TR95-01, Virage, Inc., San Mateo, Calif., 1995.
7. Liu, F., and Picard, R. Periodicity, directionality, and randomness: Wold features for image modeling and retrieval. IEEE Trans. Patt. Anal. Mach. Intell. 18, 7 (July 1996), 722–733.
8. Ma, W. NETRA: A toolbox for navigating large image databases. Ph.D. dissertation, Dept. of Electrical and Computer Engineering, Univ. of California at Santa Barbara, 1997.
9. Ravela, S., and Manmatha, R. Characterization of visual appearance applied to image retrieval. In Proceedings of the DARPA Image Understanding Workshop, T. Strat, Ed. (New Orleans, La., May 11–14, 1997). Morgan Kaufmann Publishers, San Francisco, 1997, pp. 693–699.
10. Rubner, Y., Guibas, L., and Tomasi, C. The earth mover's distance, multi-dimensional scaling, and color-based image retrieval. In Proceedings of the DARPA Image Understanding Workshop, T. Strat, Ed. (New Orleans, La., May 11–14, 1997). Morgan Kaufmann Publishers, San Francisco, 1997, pp. 661–668.
11. Stricker, M., and Dimai, A. Color indexing with weak spatial constraints. In Proceedings of the SPIE Storage and Retrieval for Image and Video Databases IV, vol. 2670, I. Sethi and R. Jain, Eds. (San Jose, Calif., Jan. 28–Feb. 2, 1996). SPIE, Bellingham, Wash., 1996, pp. 29–40.

Amarnath Gupta ([email protected]) is a senior software scientist at Virage, Inc.

Simone Santini ([email protected]) is a Ph.D. candidate in the Computer Science Department at the University of California at San Diego.

Ramesh Jain ([email protected]) is a professor of electrical and computer engineering at the University of California at San Diego and is the chairman of the board and founder of Virage, Inc.

Permission to make digital/hard copy of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage, the copyright notice, the title of the publication and its date appear, and notice is given that copying is by permission of ACM, Inc. To copy otherwise, to republish, to post on servers, or to redistribute to lists requires prior specific permission and/or a fee.

© ACM 0002-0782/97/1200 $3.50
