
Spatial/spatial-frequency representations for image segmentation and grouping

Todd Reed* and Harry Wechsler†

In this paper, we consider the application of spatial/spatial-frequency representations to texture-based image segmentation and to perceptual grouping, a related phenomenon. We discuss a system for segmentation and grouping, based on the pseudo-Wigner distribution (PWD), a discrete approximation to the Wigner distribution (WD), and show experimental results.

Keywords: texture, segmentation, pattern recognition

Vision systems are often considered to be composed of two sub-systems, referred to as low-level and high-level. Low-level vision (also referred to as early vision, due to its association with the initial processes of biological vision systems) consists primarily of image processing. The input image is operated on in such a way as to produce another image, with more favourable characteristics. These operations may yield images with reduced noise, or cause certain features of the image to be emphasized (such as edges). The high-level vision processes are of the type studied by the artificial intelligence community. These include object recognition and, at the highest level, scene interpretation.

Segmentation, the grouping of parts of an image into regions that are homogeneous with respect to one or more characteristics, is an important function in computer vision. By mapping an image representation, typically at the pixel level, into a description involving regions with common features, segmentation serves as a bridge between low-level and high-level vision processes (see Figure 1).

The selection of relevant features and methods for finding regions that are uniform with respect to these features is the subject of this paper.

*Laboratoire de Traitement des Signaux, EPFL-Ecublens (DE), CH-1015 Lausanne, Switzerland. †Department of Computer Science, George Mason University, Fairfax, VA 22030, USA

Paper received: 21 November 1989. Revised paper received: 31 August 1990

In the next section, we will discuss the segmentation problem in detail. Texture analysis will be examined, with the emphasis on segmentation and the characteristics desired in texture segmentation methods. Below we briefly review recent texture segmentation techniques. We note that frequency domain methods have, in general, performed poorly as compared to statistical methods. We then examine some currently used joint spatial/spatial-frequency representations which, by allowing spatially localized frequency analysis, overcome the primary shortcoming of classical Fourier techniques. We then discuss the Wigner distribution, which has the highest joint resolution of the representations discussed, and also implicitly includes phase information. We show an experimental approach to texture segmentation based on the pseudo-Wigner distribution, a computable approximation to the Wigner distribution, and show several segmentation examples. Using the same system, we examine the results of a number of grouping experiments, which imply a possible mathematical basis for perceptual grouping in humans. Our work is then summarized.

PROBLEM DESCRIPTION

Definitions

In the previous section, a general definition of segmentation was given. A more formal definition of picture segmentation1 is stated below.

Definition 1. Let X denote the grid of sample points for a given picture. Let Y be a subset of X containing at least two points. Then a uniformity predicate P(Y) is one which assigns the value true or false to Y depending only on properties of the brightness matrix f(i, j) for the points of Y. Furthermore, P has the property that if Z is a nonempty subset of Y then P(Y) = true implies P(Z) = true.

Definition 2. A segmentation of the grid X for a

0262-8856/91/003175-19 © 1991 Butterworth-Heinemann Ltd

vol 9 no 3 june 1991 175



Figure 1. Segmentation is a bridge between low-level vision and high-level vision (interpretation)

uniformity predicate P is a partition of X into disjoint nonempty subsets {Xi} such that:

1. ∪ Xi = X;
2. for all 1 ≤ i ≤ n, Xi is connected and P(Xi) = true; and
3. P is false on the union of any number of adjacent members of the partition.

It should be noted that the second definition does not imply uniqueness of the segmentation or that the number of regions n is as small as possible.
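The two definitions can be illustrated with a small sketch (Python; the grey-level-range predicate and the 4 x 4 brightness matrix are illustrative choices, not from the original): a monotone uniformity predicate holds on each region of a valid partition but fails on the union of adjacent regions.

```python
import numpy as np

def predicate(f, ys, tol=10):
    """Uniformity predicate P(Y): true iff the grey levels of the points
    in Y span a range no larger than tol (an illustrative choice).
    This P is monotone: any subset of a uniform Y is also uniform."""
    vals = [f[i, j] for (i, j) in ys]
    return max(vals) - min(vals) <= tol

# A 4x4 brightness matrix with a bright left half and a dark right half
f = np.array([[100, 100, 10, 10],
              [100, 100, 10, 10],
              [100, 100, 10, 10],
              [100, 100, 10, 10]])

left  = [(i, j) for i in range(4) for j in range(2)]
right = [(i, j) for i in range(4) for j in range(2, 4)]

# Each region is uniform, but their union is not -- exactly the
# conditions Definition 2 places on a valid segmentation {Xi}.
print(predicate(f, left))          # True
print(predicate(f, right))         # True
print(predicate(f, left + right))  # False
```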

In fact, segmentation algorithms examine only a subset of possible partitions. For an image of useful size, an exhaustive search of all possible segmentations is a practical impossibility, since the problem is NP-hard. The final segmentation is heavily influenced by the subset examined.

The determination of the predicate P is the central problem in the segmentation process. The choice of characteristics, or features, upon which to base the segmentation strongly affects the quality of the result. Image intensity, colour, multispectral data, and range data have all been considered as candidates for features2,3.

However, when the homogeneity of regions is decided based on the uniformity of the above charac- teristics, the resulting segmentation is often unsatisfactory3. A single, strongly textured surface in an image will be segmented into numerous small regions, rather than the single surface desired. Because textured surfaces are extremely common in natural scenes, this is a serious difficulty. The desire to segment such scenes into meaningful partitions is the motivation for examining texture for image segmentation.

A universally accepted definition of texture does not presently exist. Often, however, texture is described as being generated by some basic element, or elements, referred to as primitives. These elements are repeated at positions and orientations determined by some placement rule, or rules. While this definition may be useful when describing artificially generated textures, such as checkerboards and tilings, it is difficult to apply to natural textures, which might exhibit some random appearance. In fact, textures may be found in a continuous spectrum from purely deterministic to purely stochastic. A number of surveys of general texture analysis techniques are available.


In texture segmentation, and especially in unsupervised texture segmentation (in which prior knowledge of texture and region characteristics is not used), the task of selecting the uniformity predicate P is complicated by the cell unit problem. The image resolution and the size of the area over which the predicate is to be evaluated must be determined, since measurements upon which the predicate is based are valid only over a specific range of sizes and resolutions. In order to determine this range, however, a segmentation is required.

The potential benefits make the texture segmentation problem interesting, despite the difficulties involved in solving it. The ability to segment natural images reliably and without human intervention is critical for computer vision. It is more appropriate to use texture-based segmentation rather than methods based on the picture value function, as discussed above. Finally, once areas of uniform texture are known, methods for the detection of surface shape, orientation, and motion can be employed. The ability to derive shape from texture makes 3D object recognition and scene interpretation possible.

Desirable characteristics for texture segmentation methods

In light of recent research, there are a number of characteristics that are desirable in a texture segmenta- tion method. These include locality, high resolution, automaticity and consistency with what is known about the human visual system.

There is substantial support for the idea that visual texture discrimination is a local process10-12. That is, the ability to differentiate between two texture fields is based on the characteristics in relatively small (local) neighbourhoods, rather than on the overall (global) image characteristics. This explains the relatively poor performance of early frequency analysis methods, which were global in nature.

The resolution of a texture segmentation method determines the size of the smallest detail that can be detected, as well as the accuracy to which boundaries between textures can be found. Perceptual grouping (which is the basis for texture segmentation in humans) is dependent on both shape similarity (i.e., spatial properties) and organization (i.e., spectral properties).


This leads to the need for high resolution in both the spatial and spatial-frequency domains. However, arbitrarily high resolution cannot be obtained in both domains simultaneously.

As a key goal of texture segmentation is to allow image interpretation with a minimum of human inter- vention, the technique should also be as automatic as possible. If parameters (such as thresholds) are re- quired, it should be possible to derive them from the image being analysed. Failing that, the method should be sufficiently insensitive to the parameters so that a wide class of images can be analysed for a given parameter set.

Finally, it is desirable that the underlying mechan- isms suggested by any computer vision theory be consistent with theories of human vision. Furthermore, it should be possible to confirm predictions made by the theory with psychophysical experiments. In this case, the segmentations or groupings performed by a texture segmentation system should be consistent with those that a human would make.

BACKGROUND

In this section, we briefly examine some of the main approaches used in texture segmentation, and further discuss the motivation for the use of spatial/spatial-frequency methods for this task. A more complete comparison of current approaches is available elsewhere.

The main techniques used in texture segmentation can be grouped loosely into those based on statistical methods and those using spatial-frequency or spatial/ spatial-frequency techniques.

Statistical methods

One of the first methods used in texture segmentation, and one which is still widely in use, is the Spatial-Grey Level Dependence method. Given an image, a set of matrices P(i, j, d, θ) is defined, such that the (i, j)th entry is the number of times that the grey levels i and j exist in the image at a distance d, in the direction θ, from each other. Each (d, θ) pair defines a new matrix. Various features are then extracted from these matrices. This method is highly dependent on the resolution chosen, however, and many of the features derived in this way have little correlation with features visible to humans.
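The construction of one such matrix, and two features commonly derived from it, can be sketched as follows (Python; the 3 x 3 image and the choice d = 1, θ = 0 are illustrative, not from the original):

```python
import numpy as np

def cooccurrence(img, dr, dc, levels):
    """Spatial grey-level dependence matrix: entry (i, j) counts pairs
    of pixels with grey levels i and j at displacement (dr, dc)."""
    P = np.zeros((levels, levels), dtype=int)
    rows, cols = img.shape
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                P[img[r, c], img[r2, c2]] += 1
    return P

img = np.array([[0, 0, 1],
                [0, 1, 1],
                [2, 2, 2]])

# d = 1 in the horizontal direction (theta = 0)
P = cooccurrence(img, 0, 1, levels=3)

# Typical features extracted from the normalized matrix:
Pn = P / P.sum()
energy   = (Pn ** 2).sum()
contrast = sum(Pn[i, j] * (i - j) ** 2
               for i in range(3) for j in range(3))
print(P.sum())   # 6 horizontal pixel pairs in a 3x3 image
```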

More recently, the use of Gibbs random fields has become popular in image processing. Derin and Elliott have proposed a hierarchical (two-level) image model based on the Gibbs distribution. The high-level distribution models the regions in the image. The low-level distributions model the textural properties of each region. Using this model, they have segmented images based on the distribution parameters, using dynamic programming. This segmentation method has currently been applied only to noise-free textures with known high-level distribution parameters. An estimation procedure for the distribution parameters of textures has been developed, but it is only applicable to isolated samples of textures.

Spatial/spatial-frequency methods

In the past, statistical methods have proven superior to frequency domain techniques. This can be explained by the global nature of these early frequency analysis methods. Joint spatial/spatial-frequency techniques are inherently local in nature, and have additional characteristics that make them attractive for texture segmentation (and other computer vision tasks) as well.

Joint spatial/spatial-frequency (s/sf) methods are based on image representations that measure the frequency content in localized regions in the spatial domain. In this way, these methods overcome the shortcomings of the traditional Fourier-based techniques. They are able to achieve relatively high resolution in both the spatial and spatial-frequency domains (depending on the representation) and are also consistent with recent theories on human vision. There is a large and growing body of theory postulating frequency analysis in the human visual system, ranging in complexity from the 3 or 4 frequency-selective channels proposed by Crick et al. to a continuous spectrum of frequency analyzers. Candidates for the implementation of this frequency analysis include the use of 2D Gabor functions or Gaussian-smoothed sectors24. Support for a spatial-frequency interpretation of human vision in predicting object recognition has been reported by Ginsburg. Beck et al. have shown a correlation between the ability of humans to segment tripartite textured images and the outputs of a bank of 2D Gabor filters applied to the images.

Spatial/spatial-frequency representations

A number of spatial/spatial-frequency representations are in current use.

The short-time (1D) Fourier transform has long been used in the analysis of time-varying signals. Straightforward two-dimensional extension of this idea yields the finite-support Fourier transform. The spectrogram of an image is the squared magnitude of this transform. The spectrogram has been used by Bajcsy and Lieberman28 for the extraction of texture features, and by Pentland29 for the estimation of fractal dimension.
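A single slice of such a spectrogram can be sketched as follows (Python with NumPy; the Hanning window and patch size are illustrative assumptions, not the choices of the works cited): the image is windowed around a point of interest and the squared magnitude of the 2D DFT is taken.

```python
import numpy as np

def local_spectrum(img, r, c, half):
    """Squared magnitude of the 2D DFT of a windowed patch centred at
    (r, c): one 'slice' of the image spectrogram."""
    patch = img[r - half:r + half, c - half:c + half]
    win = np.hanning(2 * half)            # separable Hanning window
    patch = patch * np.outer(win, win)
    return np.abs(np.fft.fft2(patch)) ** 2

# A vertical sinusoidal grating: 8 cycles across 64 columns, constant
# along rows, so local spectral energy concentrates at one frequency.
x = np.arange(64)
img = np.cos(2 * np.pi * 8 * x / 64)[np.newaxis, :] * np.ones((64, 1))
S = local_spectrum(img, 32, 32, half=16)
print(np.unravel_index(np.argmax(S), S.shape))
```

The peak lands at row frequency 0 and the grating's column frequency (bin 4 of the 32-point patch, or its mirror bin 28).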

The difference-of-Gaussians (DOG) representation expresses an image as the sum of the outputs of a bank of isotropic 2D bandpass filters, similar to that proposed as a model of the human visual system by Wilson. Since the Nyquist frequencies for all filter outputs but that of the filter with highest centre frequency are less than that of the original image, the filtered images can be subsampled, yielding a multi-resolution (pyramid) representation in three dimensions (x, y, ρ), where ρ is the filter centre frequency. The squared magnitudes of the outputs of these filters form the DOG power representation. Signal representations consisting of a number of filtered versions of the original signal have received considerable attention, with the multi-resolution or pyramid representations being most popular32,33. The generating kernels used as filters have been of particular interest, with recent investigations done by Meer et al.34. A representation for shape has been implemented by Crowley and Parker using the "difference of low-pass" (DOLP) transform, a generalization of the


DOG. The use of difference-of-Gaussians filters as generating filters has been popular due to the correlation between this response and the measured receptive fields of both retinal ganglion and lateral geniculate cells36.
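One channel of such a representation is simply the difference of two Gaussian-blurred copies of the image; a minimal sketch (Python, using scipy.ndimage; the σ ratio of 1.6 and the noise image are illustrative assumptions):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def dog_channel(img, sigma, ratio=1.6):
    """One isotropic bandpass (DOG) filter output: the difference of two
    Gaussian-blurred copies of the image. A bank of these at octave-
    spaced sigmas gives the multi-channel representation."""
    return gaussian_filter(img, sigma) - gaussian_filter(img, sigma * ratio)

rng = np.random.default_rng(0)
img = rng.standard_normal((64, 64))

# The channel suppresses both the local mean (low frequencies) and the
# finest detail (high frequencies), so its energy is well below the input's.
band = dog_channel(img, sigma=2.0)
print(band.var() < img.var())
```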

An image can also be represented as the sum of outputs of a bank of 2D Gabor filters. Since this class of filters is orientation sensitive (anisotropic), the resulting representation is 4D, as was the spectrogram. In fact, the squared magnitude of this representation (the Gabor power representation) is an instance of the spectrogram, using a Gaussian window function. Recently, Porat and Zeevi37 have proposed a generalized Gabor representation for use in machine vision. Gabor filters have also been used for texture segmentation. Daugman17 has shown that this class of filters achieves the lower limits imposed by the 2D uncertainty inequalities. He has also demonstrated that the 2D Gabor functions agree with 2D receptive-field profiles measured for simple cells in the cat striate cortex. As noted by Clark et al.41, and earlier by Kulikowski et al.42, the real and imaginary parts of the Gabor filter impulse response differ only by a shift in phase of π/2. Pollen and Ronner have observed that simple cells in the visual cortex are arranged in pairs, with the members of each pair having the same orientation and spatial frequency tuning, but with phase responses differing by approximately π/2. The resulting even and odd symmetries of the respective members of each pair lend support to the idea that the cell pairs correspond to the real and imaginary parts of complex Gabor receptive fields.
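The even/odd quadrature structure described above can be verified directly (Python; the kernel parameters are illustrative choices): the real and imaginary parts of a complex Gabor kernel are even- and odd-symmetric versions of the same oriented filter, π/2 apart in phase.

```python
import numpy as np

def gabor_kernel(size, wavelength, theta, sigma):
    """Complex 2D Gabor: a complex exponential (oriented at theta, with
    the given wavelength) under a Gaussian envelope. Its real part is
    even-symmetric, its imaginary part odd-symmetric."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    return envelope * np.exp(1j * 2 * np.pi * xr / wavelength)

g = gabor_kernel(size=15, wavelength=6.0, theta=0.0, sigma=3.0)

# Even and odd symmetry of the quadrature pair:
print(np.allclose(g.real, g.real[::-1, ::-1]))    # g_e(-x,-y) =  g_e(x,y)
print(np.allclose(g.imag, -g.imag[::-1, ::-1]))   # g_o(-x,-y) = -g_o(x,y)
```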

WIGNER DISTRIBUTION

The Wigner distribution, shown in its original one-dimensional form below:

W_f(x, \omega) = \int_{-\infty}^{\infty} f\left(x + \frac{\alpha}{2}\right) f^*\left(x - \frac{\alpha}{2}\right) e^{-j\omega\alpha}\, d\alpha   (1)

was first introduced in quantum mechanics to characterize the positions and momenta of particles43. It was first used in signal analysis by Ville44. The properties of the WD, and examples of its application to functions of one variable, are discussed in a classic series of articles by Claasen and Mecklenbrauker. The 1D WD has been applied in a number of areas, including speech analysis and optics.

Extending the above definition to two dimensions, the 2D WD is defined as:

W_f(x, y, u, v) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} R_f(x, y, \alpha, \beta)\, e^{-j(\alpha u + \beta v)}\, d\alpha\, d\beta   (2)

where:

R_f(x, y, \alpha, \beta) = f\left(x + \frac{\alpha}{2},\, y + \frac{\beta}{2}\right) f^*\left(x - \frac{\alpha}{2},\, y - \frac{\beta}{2}\right)

and the asterisk denotes complex conjugation. The WD can also be defined in terms of the Fourier transform F(u, v) of f(x, y) as:

W_f(x, y, u, v) = \frac{1}{4\pi^2} \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \Phi(u, v, \xi, \eta)\, e^{j(x\xi + y\eta)}\, d\xi\, d\eta   (3)

where:

\Phi(u, v, \xi, \eta) = F\left(u + \frac{\xi}{2},\, v + \frac{\eta}{2}\right) F^*\left(u - \frac{\xi}{2},\, v - \frac{\eta}{2}\right)

The use of this distribution for 2D and 3D image analysis was first advanced by Jacobson and Wechsler53. Cristobal et al.54 have calculated a discrete approximation to the WD for digitized images. They have noted that differences can be seen between the WDs of different textured images, and have suggested that the WD might be useful for texture analysis and segmentation.

Some properties of the 2D Wigner distribution that are of particular interest in image processing applications are shown in Table 1.

Table 1. Some properties of the 2D Wigner distribution

1. W_f(x, y, u, v) is a strictly real-valued function.

2. For real f(x, y), W_f(x, y, u, v) = W_f(x, y, -u, -v).

3. \frac{1}{4\pi^2} \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} W_f(x, y, u, v)\, du\, dv = |f(x, y)|^2

4. \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} W_f(x, y, u, v)\, dx\, dy = |F(u, v)|^2

5. If g(x, y) = f(x - x_0, y - y_0), then W_g(x, y, u, v) = W_f(x - x_0, y - y_0, u, v).

6. If g(x, y) = f(x, y)\, e^{j(u_0 x + v_0 y)}, then W_g(x, y, u, v) = W_f(x, y, u - u_0, v - v_0).

7. If f(x, y) = g(x, y) * h(x, y), then W_f(x, y, u, v) = W_g(x, y, u, v) *_s W_h(x, y, u, v), where '*_s' denotes convolution with respect to the spatial variables x and y.

8. If f(x, y) = g(x, y)\, h(x, y), then W_f(x, y, u, v) = \frac{1}{4\pi^2} W_g(x, y, u, v) *_{sf} W_h(x, y, u, v), where '*_{sf}' denotes convolution with respect to the spatial-frequency variables u and v.

It should be noted that Property 1 implies that the WD lacks the phase component present in the Fourier transform. Nonetheless, given the WD of an image, the image can be recovered (to within a constant). This indicates that phase information is implicitly present in


the WD. The importance of phase information in images has been established by Oppenheim and Lim55.

That the Wigner distribution tracks non-stationarities (changes in frequency content) can be seen in the following example45. Consider the 1D function:

f(t) = A\, e^{j\alpha t^2 / 2}   (4)

This function is often referred to as a chirp. Its instantaneous frequency increases linearly with time. Calculating the 1D WD:

W_f(t, \omega) = \int_{-\infty}^{\infty} A\, e^{j\alpha(t + \tau/2)^2/2}\, A^*\, e^{-j\alpha(t - \tau/2)^2/2}\, e^{-j\omega\tau}\, d\tau = |A|^2 \int_{-\infty}^{\infty} e^{-j(\omega - \alpha t)\tau}\, d\tau = |A|^2\, 2\pi\, \delta(\omega - \alpha t)   (5)

This example shows that the WD can track the instantaneous frequency of a chirp signal; i.e., at any point in time the spectral energy is concentrated at the instantaneous frequency.

As a second example, consider the function:

f(x) = e^{j u_0 x}   (6)

The WD of this function can be calculated as:

W_f(x, \omega) = \int_{-\infty}^{\infty} e^{j u_0 (x + \alpha/2)}\, e^{-j u_0 (x - \alpha/2)}\, e^{-j\omega\alpha}\, d\alpha = 2\pi\, \delta(\omega - u_0)   (7)

As seen in the above examples, analytic calculations with the 1D WD are relatively straightforward. For cases in which the function of interest is separable, the calculation of the 2D WD is equally simple. Extending the previous example to two dimensions:

f(x, y) = e^{j u_0 x}\, e^{j v_0 y}   (8)

so that:

W_f(x, y, u, v) = 4\pi^2\, \delta(u - u_0)\, \delta(v - v_0)   (9)

If the function to be considered is not separable, however, analytic calculation of the WD can be very

complicated. Obviously, in cases where the “function” is not available in analytic form, but only as an array of data (as in the case of a digitized image, for example), the above formulations of the WD cannot be used at all.

For cases in which the WD must be calculated numerically, either because of the complexity of the analytic expression or the need to operate on discrete data, an estimator of the Wigner distribution must be used. These numerically calculated representations are referred to as pseudo-Wigner distributions (PWDs).

A form of PWD that has been used for the tracking of non-stationarities in 1D data by Martin and Flandrin is shown below:

W_f(t, \omega) = 2 \sum_{k=-(N-1)}^{N-1} h_N(k)\, e^{-j2\omega k} \sum_{l=-(M-1)}^{M-1} g_M(l)\, f(t + l + k)\, f^*(t + l - k)   (10)

where h_N(k) and g_M(l) are window functions. The function h_N(k) is a window centred at k = 0 and 2N - 1 points wide. This window determines the region, around the point in time of interest, used in calculating the PWD. The function g_M(l) is centred at l = 0, and is 2M - 1 points wide. This window allows averaging in the calculation of the PWD.

Extending the above 1D definition, we define the 2D PWD as59:

W_f(m, n, p, q) = 2 \sum_{k=-(N_1-1)}^{N_1-1} \sum_{l=-(N_2-1)}^{N_2-1} h_{N_1,N_2}(k, l)\, e^{-j4\pi(qk/Q + pl/P)} \sum_{r=-(M_1-1)}^{M_1-1} \sum_{s=-(M_2-1)}^{M_2-1} g_{M_1,M_2}(r, s)\, f(m + r + k,\, n + s + l)\, f^*(m + r - k,\, n + s - l)   (11)

where p = 0, ±1, ..., ±(N_2 - 1), q = 0, ±1, ..., ±(N_1 - 1), P = 2N_2 - 1, Q = 2N_1 - 1, and m and n are integers. The functions h_{N_1,N_2}(k, l) and g_{M_1,M_2}(r, s) are window functions, analogous in function to those in the 1D case.
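As a concrete check of the 1D form (10), the following sketch (Python with NumPy; a rectangular h_N and no averaging window, i.e. M = 1, are simplifying assumptions) computes the PWD of a complex tone and shows its energy concentrating in a single frequency bin:

```python
import numpy as np

def pwd1d(f, N):
    """Pseudo-Wigner distribution of a 1D complex signal: rectangular
    window h_N of width 2N-1, no spatial averaging (M = 1). Bin p of
    row t corresponds to the frequency omega = pi * p / L, L = 2N - 1."""
    L = 2 * N - 1
    T = len(f)
    W = np.zeros((T, L))
    k = np.arange(-(N - 1), N)
    for t in range(N - 1, T - N + 1):
        r = f[t + k] * np.conj(f[t - k])       # f(t+k) f*(t-k)
        # put k = 0 first, then take the DFT over the 2N-1 lags
        W[t] = 2 * np.real(np.fft.fft(np.fft.ifftshift(r)))
    return W

# A complex tone at omega0 = 6*pi/31: the PWD concentrates all of its
# energy in bin p = 6 at every interior time instant.
N, L = 16, 31
n = np.arange(128)
f = np.exp(1j * np.pi * 6 * n / L)
W = pwd1d(f, N)
print(np.argmax(W[64]))   # 6
```

For a chirp, the same computation produces a peak that moves linearly with t, the discrete analogue of example (5).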

The previous analytic example involving the product of complex exponentials has been computed using the 2D PWD, with the result shown in Figure 2.

A key issue in comparing joint spatial/spatial-frequency representations is the resolution that can be attained (simultaneously) in the two domains. Since the joint resolution determines the accuracy with which boundaries can be detected and the ability to discriminate between similar textures, it is a very important consideration in selecting a representation for texture segmentation. Jacobson and Wechsler60 have shown that the spectrogram and the DOG and Gabor power representations are smoothed versions of the WD. Because of this smoothing, these s/sf representations cannot improve on the resolution of the WD.

Another advantage of using the WD over power representations is that it encodes phase. Obviously, this is critical for cases in which textures differ only in phase. A common example is the case in which one texture field is composed of “T” texture elements on a


Figure 2. The PWD of f(x, y) = e^{jx} e^{jy} for N_1 = N_2 = 8, and m = n = 8. The window h_{N_1,N_2}(k, l) is rectangular, and g_{M_1,M_2}(r, s) is a normed rectangular window

regular grid, and a second is composed of "L" elements on the same grid. Such textures are readily discriminated by humans, which supports the already existing evidence of the importance of phase in vision55. It is therefore clear that to be successful in general, a segmentation method must account for phase. Although the finite-support Fourier transform and the DOG and Gabor representations all have phase components, their power spectra are almost always used as the basis for segmentation. Investigations into the use of phase, which requires the solution of the phase unwrapping problem, have only recently been undertaken41. Since magnitude and phase information are combined in the real-valued WD, the formulation of (unwrapped) features to account for phase is not required.

A potential disadvantage to using the PWD for image analysis is the cost of its computation. Implemented naively, the PWD can be very expensive to compute. However, as shown by Durrani et al.61, a discrete approximation to the 1D WD can be calculated using a systolic architecture. More recently, Boashash and Black62 have proposed an algorithm and architecture for computing such a discrete approximation in real-time. Although we have not attempted to use these methods in our experiments, similar techniques should


Figure 3. A generic image segmentation system


be applicable to the calculation of a discrete approx- imation to the 2D WD.

TEXTURE SEGMENTATION USING THE PSEUDO-WIGNER DISTRIBUTION

In the preceding section a number of s/sf representations were examined, and it was concluded that in addition to the advantages inherent in all s/sf representations, the WD has superior resolution. In this section a generic form for image segmentation systems is proposed, and the requirements of the system components for PWD-based texture segmentation are discussed. An implementation of a texture segmentation system based on the PWD is described, and the results of applying this system to a variety of texture mosaics are reported.

PWD-based system for texture segmentation

A generic system for texture segmentation is shown in Figure 3. Following the form of this generic system, a PWD-based texture segmentation system is shown in Figure 4. We discuss the ideas governing the selection of the system components in the sections that follow. Aspects of this approach are also discussed in References 63-65.

Preprocessing/anti-alias filtering

Preprocessing of an input image can be used to reduce noise, or, if information about the input is known a priori, to emphasize desired characteristics (e.g., certain frequency components). In this case, the only preprocessing performed is anti-alias filtering.

In calculating the WD (2), we calculate the Fourier transform (with respect to α and β) of f(x + α/2, y + β/2) f*(x - α/2, y - β/2). To calculate the PWD (11), we must calculate the associated discrete Fourier transform. To do so without aliasing, the Nyquist condition must be satisfied for this product. Because the product function has a spectrum twice as wide (in each dimension) as the original function, the most obvious way to avoid aliasing in the product function is to oversample the original function by a factor of two in each dimension. This guarantees that the product function will satisfy the Nyquist criterion.

If we assume that the input image has been sampled at the Nyquist rate, we must reduce the image




Figure 4. A texture segmentation system using the PWD

bandwidth by a factor of two in each dimension. Since this is the same function performed in the resolution reduction for the generation of pyramids, we have used the pyramid generating kernel proposed by Meer et al.34 as our anti-aliasing filter.
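The bandwidth-halving step can be sketched as follows (Python; the 5-tap binomial kernel is a stand-in assumption, not necessarily the kernel of Meer et al.): a separable low-pass attenuates the upper half of the spectrum so that the doubled-bandwidth PWD product term no longer aliases.

```python
import numpy as np

# A separable 5-tap binomial low-pass, used here as a stand-in for the
# pyramid-generating kernel (the exact kernel of Meer et al. may differ).
k1 = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0

def halfband_filter(img):
    """Attenuate the upper half of the spectrum in each dimension by
    separable convolution with the binomial kernel."""
    out = np.apply_along_axis(np.convolve, 1, img, k1, mode='same')
    out = np.apply_along_axis(np.convolve, 0, out, k1, mode='same')
    return out

rng = np.random.default_rng(1)
img = rng.standard_normal((64, 64))
low = halfband_filter(img)

# Energy at the highest horizontal frequencies (|omega| >= pi/2) drops
# sharply after filtering.
F = np.abs(np.fft.fft2(img)) ** 2
G = np.abs(np.fft.fft2(low)) ** 2
print(G[:, 16:48].sum() < 0.5 * F[:, 16:48].sum())
```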

Image representation/the Pseudo-Wigner Distribution (PWD)

Above we discussed the calculation of the PWD (11) in general. The choice of windows used in this calculation can strongly affect the results obtained.

The selection of the window h_{N_1,N_2}(k, l) is governed by the same considerations that hold in any application of the DFT. We wish to select a window that is of a specified size in the spatial domain, which is largely limited in extent in the spatial-frequency domain, and which has small (if any) sidelobes in both domains. The size of the window is dictated by the resolution required in the spatial-frequency domain.

In the interest of simplicity, a rectangular window was used for h_{N_1,N_2}(k, l) in the previously shown calculations of the PWD. It is well known, however, that the use of rectangular windows in this type of application can lead to significant ripple in the transform domain (the PWD, in this case). To reduce this ripple, we have chosen to use a 2D extension of the 1D Kaiser window. This parameterized window is an approximation to the family of prolate-spheroidal wave functions of order zero, which maximize the energy in a given band of frequencies, for a specified time duration. The parameters in the Kaiser approximation allow compromise between the centre lobe width and side lobe height of the window's Fourier transform, thus giving a degree of control over the amount of ripple allowed.

The 2D window is defined as:

$$h_{N_1,N_2}(k,l) = \begin{cases} \dfrac{I_0\!\left(\alpha_1\sqrt{1-(k/N_1)^2}\,\right) I_0\!\left(\alpha_2\sqrt{1-(l/N_2)^2}\,\right)}{I_0(\alpha_1)\, I_0(\alpha_2)} & \text{if } |k| \le N_1-1 \text{ and } |l| \le N_2-1 \\ 0 & \text{otherwise} \end{cases} \qquad (12)$$

Figure 5. The PWD of f(x, y) = e^{jx} e^{jy} with h_{N1,N2}(k, l) a Kaiser window, and α1 = α2 = 7.0

and I_0(x) is the modified Bessel function of the first kind and zeroth order.
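As we read equation (12), the 2D window is a separable product of 1D Kaiser windows. A sketch in NumPy (np.i0 is the modified Bessel function I_0); the function names are ours:

```python
import numpy as np

def kaiser_1d(N, alpha):
    # h_N(k) = I0(alpha * sqrt(1 - (k/N)^2)) / I0(alpha) for |k| <= N - 1
    k = np.arange(-(N - 1), N)                    # 2N - 1 samples
    return np.i0(alpha * np.sqrt(1.0 - (k / N) ** 2)) / np.i0(alpha)

def kaiser_2d(N1, N2, alpha1=7.0, alpha2=7.0):
    # Separable 2D window h_{N1,N2}(k, l) = h_{N1}(k) * h_{N2}(l),
    # with alpha1 = alpha2 = 7.0 as in Figure 5.
    return np.outer(kaiser_1d(N1, alpha1), kaiser_1d(N2, alpha2))

h = kaiser_2d(8, 8)    # the N1 = N2 = 8 window size used in the experiments
```

The window peaks at unity at k = l = 0 and tapers smoothly towards the edges of its (2N1 − 1) × (2N2 − 1) support.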

The advantage of using a Kaiser function for the window hN1,N2(k, l) can be seen by comparing Figure 2 and Figure 5. Both examples show a portion of the PWD of a product of complex exponentials. The use of the rectangular window (as was done in the original example) results in substantial ripple and aliasing (Figure 2), while the use of the Kaiser window makes such effects almost negligible (Figure 5).

The function of the window gM1,M2(r, s) is to allow local averaging. As before, we have chosen to use a normalized rectangular window, defined as:

$$g_{M_1,M_2}(r,s) = \begin{cases} \dfrac{1}{(2M_1-1)(2M_2-1)} & \text{if } |r| \le M_1-1 \text{ and } |s| \le M_2-1 \\ 0 & \text{otherwise} \end{cases} \qquad (13)$$

The larger the values of M1 and M2, the more averaging occurs.

Data reduction

The 2D PWD of an image can be represented as a 4D array, consisting of 2D arrays containing the frequency content at each point in the image. For an N × N image and the main PWD window size specified by N1 and N2, this array has 2N1 × 2N2 × N × N elements. For N1 = N2 = 8 and N = 128, this requires the storage of over 4 million (floating point) elements. For

vol 9 no 3 june 1991 181


practical implementation, a reduction in the amount of memory required is desirable. In addition to reducing memory requirements, choosing the proper reduction method (the proper features to be extracted from the PWD) can simplify the boundary detection task. One approach to data reduction is to retain information only at key frequencies. The frequencies which have the highest energy are often important, making this a natural selection criterion. It should be noted that the most important frequencies for each texture in the image could easily be selected by human intervention, examining each frequency plane visually. In many cases this may lead to better segmentations. However, because it is our goal to make the system as automatic as possible, we have chosen to select frequencies in an unsupervised manner.

In this system, we first determine the frequency occurring with maximum energy for each pixel. The resulting frequencies are ranked by energy of occurrence and the frequency contents at a number of the top-ranked frequencies are retained. In most of the cases that follow, only the highest energy frequency is kept, yielding a single N x N array, or frequency plane. However, as shown in the first of the examples (see below), more frequency planes can be used to improve discrimination. It should be noted that at no point in the data reduction process does averaging occur, so that the high joint resolution of the PWD is preserved in the reduced data. We simply extract a selected portion of the representation.
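The selection step described above can be sketched as follows; the array layout and the helper name are our assumptions, not the paper's:

```python
import numpy as np

def primary_planes(pwd, num_planes=1):
    """Reduce a 4D PWD array to a few frequency planes.

    pwd : array of shape (N, N, 2*N1, 2*N2) -- a full frequency map at
          each pixel (this layout is our assumption).
    Returns the planes at the num_planes top-ranked frequencies.
    """
    N, _, F1, F2 = pwd.shape
    flat = np.abs(pwd).reshape(N, N, F1 * F2)
    winners = flat.argmax(axis=2)          # max-energy frequency per pixel
    peak = flat.max(axis=2)                # that frequency's energy
    # Rank frequencies by the total energy of the pixels they "won".
    score = np.bincount(winners.ravel(), weights=peak.ravel(),
                        minlength=F1 * F2)
    top = np.argsort(score)[::-1][:num_planes]
    planes = [pwd.reshape(N, N, F1 * F2)[:, :, f] for f in top]
    return np.stack(planes, axis=0)        # (num_planes, N, N)
```

No averaging is involved: each retained plane is simply a slice of the full PWD at a single frequency.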

Figure 6. The sigmoidal transformation

Region detection/relaxation

In this step, the goal is to identify regions considered to be homogeneous. By our choice of representation, this implies uniformity in the PWD (or some reduced version of the PWD). In general, a compromise must be made between the ability to discriminate between similar textures and the ability to recognize, as a single region, a uniformly textured area with slight perturbations. Relaxation methods have been shown to be useful for the identification of homogeneous regions. The relaxation process used in this case was proposed by Caelli12. It is applied individually to each of the 2D arrays selected in the data reduction step. This iterative process consists of first calculating a local average about each point in the array:

$$A^{(i)}(x,y) = \sum_{m=-L}^{L} \sum_{n=-L}^{L} \frac{1}{(2L+1)^2}\, I^{(i)}(x+m,\, y+n) \qquad (14)$$

The averaged array is then transformed:

$$I^{(i+1)}(x,y) = L\big(A^{(i)}(x,y)\big) = \frac{1 - e^{-(\alpha A^{(i)}(x,y) - \beta)}}{1 + e^{-(\alpha A^{(i)}(x,y) - \beta)}} \qquad (15)$$

where I^{(i)}(x, y) is the 2D array at the ith iteration. The L(·) transformation, shown in Figure 6, is a sigmoidal function whose purpose is to facilitate a labelling decision by pushing the corresponding estimate towards 1 (class A) or −1 (class B).

The averaging and transformation steps are repeated a specified number of times, although the process could also be terminated at the point at which the 2D array changes by only a small amount. The resulting set of relaxed 2D arrays forms a set of feature vectors for each pixel in the image. When the normalized inner product between the feature vectors for adjacent pixels is above a threshold, the pixels are grouped together.

Results

The experiments described next are of two types. First, we use synthesized textures to demonstrate our approach. Then we show the results obtained using natural textures. All textured regions in these examples have zero mean values. The natural texture images were first scaled to have pixel values between 0 and 255, then processed to have zero mean. All images shown in the examples that follow have been scaled to


Figure 7. The first synthesized texture example (from top left to bottom right). (a) The real part of the input image; (b) the imaginary part of the input image; (c) the real part after anti-alias filtering; (d) the imaginary part after anti-alias filtering



maximum and minimum pixel values of 0 and 255, respectively, for display purposes.

Table 2. The parameters used in the segmentation experiments

Experiment      N1,N2  M1,M2  α    β    Iter.  Trans./Iter.  Thresh.
Synthesized 1   8,8    1,1    128  20   5      1             0.95
Synthesized 2   8,8    1,1    128  20   5      1             0.5
Synthesized 3   8,8    1,1    128  20   5      1             0.5
Synthesized 4   8,8    1,1    128  20   5      1             0.5
Natural 1       8,8    5,5    8    220  10     1             0.5
Natural 2       8,8    5,5    128  20   5      1             0.5

Synthesized textures

The first example in this section is an extension of the product of complex exponentials example used previously. By using these relatively simple textures, the basic ideas underlying the system are clearly demonstrated. By using an image with more than two textures, we also show how multiple frequency planes can be used to enhance discrimination. Figures 7a and 7b show the real and imaginary parts of the original image, 64 × 64 pixels in size (as are all images in this section), with each quadrant 32 × 32 pixels. The image after anti-alias filtering is shown in Figures 7c and 7d. In Figures 8a, 8b and 8c the three frequency planes of the

PWD of the image which occur with the highest energy are shown. Three of the regions are clearly indicated as high intensity regions, while the fourth is determined by process of elimination. Figure 8d shows the first of the planes after relaxation. The relaxation results for the other planes are similar. Finally, in Figure 9 the original real part of the image is shown, with boundaries between the regions indicated in white. Since the image is padded with zeros as a preliminary step in calculating the PWD, a low amplitude border a half window wide results, which accounts for the border region around the image. The parameters used in this example, and those that follow, are shown in Table 2.

In the second example, shown in Figure 10a, the five foreground regions are generated using the function f(x, y) = cos(x/2) cos(y/2), against a background function b(x, y) = cos x cos y. The image after anti-aliasing is shown in Figure 10b. The PWD frequency plane with the highest energy content (which we will refer to as the primary plane) is shown in Figure 10c. While one can visually distinguish the different regions in this image, the regions are not uniform. This is due to the PWD taking negative values within the foreground regions. Taking the absolute value of the frequency plane, we get the result shown in Figure 10d. Smoothing the plane with a second application of the anti-aliasing filter,

Figure 8. (top left to bottom right) (a) The first plane from the PWD; (b) the second plane from the PWD; (c) the third plane from the PWD; (d) the first plane from the PWD, after relaxation

Figure 9. The real part of the input image with boundaries

Figure 10. The second synthesized texture example (from top left to bottom right). (a) The input image; (b) the image after anti-alias filtering; (c) the primary frequency plane of the PWD; (d) the absolute value of the primary plane



Figure 11. (a) The absolute value of the primary plane, filtered; (b) the result of the relaxation step

Figure 13. The result of the relaxation step

Figure 11a results. Finally, applying the relaxation procedure, we get Figure 11b. In the next example, an alternative to taking the absolute value and filtering is shown.

In Figure 12a, the background texture is sinusoidal, b(x, y) = cos x. The foreground texture is simply a rotated version of the background, f(x, y) = cos(x cos(π/10) + y sin(π/10)). After the anti-aliasing filter, we get Figure 12b. With no averaging (M1 = M2 = 1), the primary plane of the PWD is shown in Figure 12c. As in the previous example, the regions are visually distinct, but not uniform. Recalculating the PWD with averaging (M1 = M2 = 3), Figure 12d results. The advantages of the averaging are clear. After the relaxation, the image in Figure 13 is obtained.

The final synthesized example is shown in Figure 14a. In this example, the background function is b(x, y) = cos x + cos 0.95x. The foreground function is f(x, y) = cos x + cos 0.95(x + 2π). The only difference

Figure 14. The fourth synthesized texture example. (a) The input image; (b) the image after anti-alias filtering; (c) the primary frequency plane of the PWD; (d) the result of the relaxation step

between these two functions is in their phase. As such, their power spectra are identical, making them difficult for power-representation-based methods to discriminate. The filtered image is shown in Figure 14b. Figure 14c shows the primary frequency plane of the PWD. After relaxation, the image in Figure 14d results.

Figure 12. The third synthesized texture example. (a) The input image; (b) the image after anti-alias filtering; (c) the primary frequency plane of the PWD, M1 = M2 = 1; (d) the primary frequency plane of the PWD, M1 = M2 = 3

Natural textures

The final experiments in this section were performed on images composed of textures from Brodatz. The first image is shown, 256 × 256 pixels in size, in Figure 15. It has texture D1 on the left, and D14 on the right. Both textures are of woven aluminum wire, but with different magnifications and lighting. To reduce the image to a computationally tractable size, the 256 × 256 image was reduced in resolution using the kernel proposed by Meer et al.34 to yield a 64 × 64 image. It



Figure 15. The original 256 × 256 image, composed of textures D1 and D14 from Brodatz

Figure 17. The original 256 × 256 image, composed of textures D102 and D103 from Brodatz

Figure 16. (a) The resolution-reduced (64 × 64) image; (b) the 64 × 64 image, after anti-alias filtering; (c) the primary frequency plane of the PWD; (d) the result of the relaxation step

should be noted that this averaging is not a requirement of the PWD, but is done only to reduce computer time and memory usage. Figures 16a and 16b show the 64 × 64 pixel version of the image, before and after anti-alias filtering. In Figure 16c, the primary plane of the PWD is shown. Finally, Figure 16d shows the result of the relaxation. The ragged boundary between the two regions can be attributed largely to the uneven lighting of the fine texture, obvious in Figure 16a.

The final image to be segmented is shown in Figure 17. The textures are D102 from Brodatz, a shadowgraph of cane, and D103, a shadowgraph of loose burlap. The reduced resolution (64 × 64 pixel) versions of the image, before and after anti-alias filtering, are shown in Figures 18a and 18b. The primary frequency

Figure 18. (a) The resolution-reduced (64 × 64) image; (b) the 64 × 64 image, after anti-alias filtering; (c) the primary frequency plane of the PWD; (d) the result of the relaxation step

plane of the PWD is shown in Figure 18c. In Figure 18d, the image after relaxation is shown. These results are clearly better than those of the previous case. This is largely due to the higher contrast and lack of lighting effects in this example.

Discussion

The experiments described above demonstrate the feasibility of using an s/sf approach for texture segmentation. There are a number of issues that could be examined in future research, including the selection of parameters, specularity, and multiple plane/data fusion.



Parameter selection

The parameters available in the segmentation system are summarized in Table 2, along with their values for the examples we have discussed. Since we would like the system to be as automatic as possible, the method used to select these parameters, and the impact of the parameters on the results obtained, are important.

The size of the window hN1,N2(k, l) was held constant throughout these experiments. N1 and N2 are determined by the resolution desired in the frequency domain, the size of image to be considered, and the available memory on the computer system used (since for an image N × N pixels in size, the PWD will consist of 4 × N × N × N1 × N2 elements). With the additional constraint that 2N1 and 2N2 be powers of 2 (for efficient calculation of the DFT), the values chosen were considered a reasonable compromise.

The size of the averaging window, gM1,M2(r, s), was chosen to provide the amount of averaging needed for each case. Images with a significantly random aspect (e.g., natural textures) require more averaging. However, because increasing the amount of averaging increases the uncertainty of the boundary location between regions, and can cause small regions to be missed, a small window is preferred. Based on the results obtained, a reasonable compromise window might be M1 = M2 = 3.

The relaxation parameter α, and the number of relaxation iterations performed, were the variables to which results were most sensitive (using a fixed β and number of transformations per iteration). For images with textures that have strong and distinct primary frequencies, 5 iterations with α = 128 and β = 20 provided good segmentations. If such was not the case, experimentation was required. This indicates that features other than the frequency content at primary frequencies (e.g., the sum of the contents at a number of frequencies) might reduce the sensitivity to these parameters, if they yield sufficiently distinct values for different textures. The examination of such features is a topic for future research. Other possibilities for the determination of these parameters are to derive the value of α from the statistics of the feature (in this case, primary PWD) plane, and to determine the number of iterations based on the amount of change in the feature plane between iterations.
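The relaxation iteration can be sketched as a box average followed by a sigmoid label push. This is our reading of equations (14)-(15); in particular, treating α as a gain inside the sigmoid is an assumption, since the printed equation is damaged:

```python
import numpy as np

def relax(plane, alpha=128.0, beta=20.0, iters=5, L=1):
    """Caelli-style relaxation of one feature plane (a sketch of
    eqs. (14)-(15) as we read them; the exact placement of the gain
    alpha is our assumption)."""
    I = np.asarray(plane, dtype=float)
    w = 2 * L + 1
    for _ in range(iters):
        # Eq. (14): (2L+1)^2 box average about each point (zero padded).
        padded = np.pad(I, L)
        A = sum(padded[i:i + I.shape[0], j:j + I.shape[1]]
                for i in range(w) for j in range(w)) / w ** 2
        # Eq. (15): sigmoid pushing each estimate towards +1 (class A)
        # or -1 (class B); algebraically tanh((alpha*A - beta)/2).
        I = np.tanh((alpha * A - beta) / 2.0)
    return I
```

With α = 128 and β = 20, values above β/α ≈ 0.16 after averaging are pushed towards +1 and the rest towards −1, and the labels are stable under further iterations.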

The label threshold determines the degree of similarity required before pixels are considered members of the same region. With the exception of the first (four region) example, the selection of this threshold was straightforward. Since we were examining only single frequency planes, a threshold of 0.5 was used in each case. For the first example, however, multiple frequency planes were used. While the 0.95 threshold yielded reasonable results, there was no attempt made to find an optimal threshold. While experience with the single plane case indicates little need to change this threshold from image to image, automatic determination of optimal thresholds for the multi-plane case may be of future interest.
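The labelling decision itself reduces to a cosine similarity test between the feature vectors of neighbouring pixels; a sketch (the helper name and array layout are ours):

```python
import numpy as np

def same_region(features, p, q, thresh=0.5):
    """Decide whether adjacent pixels p and q belong to one region.

    features : array (num_planes, N, N) of relaxed feature planes,
               giving each pixel one feature value per plane.
    The normalized inner product (cosine similarity) of the two
    feature vectors is compared against the labelling threshold.
    """
    u = features[:, p[0], p[1]]
    v = features[:, q[0], q[1]]
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    if denom == 0.0:
        return False
    return float(u @ v) / denom > thresh
```

With a single plane the vectors are scalars near ±1, so the 0.5 threshold simply separates like from unlike labels; with multiple planes a stricter threshold such as 0.95 demands agreement across all planes.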

Specularity

As seen in the natural texture examples, non-uniform lighting and specularity can lead to difficulties in segmentation (Figures 15 through 16d). These problems are alleviated somewhat by averaging in the calculation of the PWD. Alternative methods such as homomorphic filtering could also be investigated.

Multiple planes/data fusion

Most of the examples shown used only a single feature plane. As was demonstrated in the first example (Figures 7a through 9), however, multiple planes can be used. The method could also be extended to use information from multiple sensors and different spec- tral bands.

GESTALT GROUPING USING THE PSEUDO-WIGNER DISTRIBUTION

The formulation of the Gestalt laws resulted from the study of perceptual grouping in the 1920s. It was suggested by Wertheimer that individual elements appear to group according to a set of principles, including proximity, similarity, good continuation, symmetry, closure and common fate. The proximity law states that elements group together based on nearness. The similarity law says that elements are perceived together when they are similar in appearance. Good continuation means that elements group in such a way as to minimize discontinuities in otherwise continuous lines. The law of symmetry states that symmetric groupings are favoured. The closure law says that simple, closed groupings are most likely. Finally, common fate means that elements with common fate (such as similar velocities) will group together. Examples of some of these laws are shown in Figure 19. In the first example, the circles are perceived


Figure 19. Examples of some Gestalt grouping laws (from top left to bottom right): proximity; similarity; good continuation; symmetry; closure



as grouped in columns, since the closest neighbours to each circle are either directly above or directly below. Grouping is also columnar in the second example. In this case, the squares with similar brightness appear to be in the same group. The third example is an instance of the law of good continuation. The lines A-E-B and C-E-D are seen as objects, instead of the less continuous interpretations, such as A-E-C or C-E-B. The law of symmetry is demonstrated in the fourth example, where the vertical lines are seen as paired to form three columns, the interpretation with the most symmetry. Finally, the last example demonstrates closure. The six shapes are perceived as three closed objects.

The tendency to see elements having similar characteristics as belonging to an approximately homogeneous group suggests that the same mechanism might be at work for both Gestalt grouping and the grouping of texture elements. The Gestalt laws do not lead in any direct way to a computational theory of vision. In particular, they do not lead to a texture segmentation method. They do, however, provide a convenient method for predicting certain types of perceptual grouping. As we desire any segmentation system to give results consistent with human experience, agreement with Gestalt laws is desirable. Furthermore, in producing a system that behaves as predicted by the Gestalt laws, a mathematical basis for these laws might be discovered.

In the experiments that follow, we apply our PWD-based segmentation method to several examples for which Gestalt-type grouping is apparent to human observers. The results obtained indicate possible explanations for different aspects of Gestalt organization.

Clustering/basic Gestalt laws

In the following examples, we consider grouping based on proximity, similarity and good continuation.

Proximity

The proximity law states that elements appear to be in a group when they are close together. Figure 20a is a typical example of a situation in which grouping by proximity occurs. Using the same procedure applied to texture segmentation in the previous section, the image after anti-alias filtering is shown in Figure 20b. The primary frequency plane of the PWD is shown in Figure 20c. After applying the relaxation process, Figure 20d results. The parameter values used for this experiment and those that follow are shown in Table 3. The resultant grouping is consistent both with that experienced visually and that predicted by the proximity law.

Figure 20. An example of the proximity law. (a) The input image; (b) the image after anti-alias filtering; (c) the primary frequency plane of the PWD; (d) the result of the relaxation step

Table 3. The parameters used in the grouping experiments

Experiment                N1,N2  M1,M2  α    β   Iter.  Trans./Iter.  Thresh.
Proximity                 8,8    1,1    128  20  10     1             0.5
Similarity in Size        8,8    1,1    128  20  10     1             0.5
  in Brightness           8,8    1,1    128  20  10     1             0.5
  in Shape                8,8    1,1    128  20  10     1             0.5
  in Rotation             8,8    1,1    128  20  10     1             0.5
Good continuation "Tree"  8,8    1,1    90   20  4      1             0.5
  "Circle"                8,8    1,1    90   20  5      1             0.5
Multiple Cues: Parallel   8,8    1,1    128  20  10     1             0.5
  Serial                  8,8    1,1    128  20  10     1             0.5

Similarity

The similarity law states that elements group together based on the similarity of their features. In the following examples, we examine grouping based on similarity with respect to size, brightness, shape, and rotation.

Similarity in size: Figure 21a is an example of an image in which grouping according to size is perceived. The image after anti-alias filtering is shown in Figure 21b. The primary frequency plane of the PWD, before

Figure 21. An example illustrating similarity in size. (a) The input image; (b) the image after anti-alias filtering; (c) the primary frequency plane of the PWD; (d) the result of the relaxation step



Figure 22. An example illustrating similarity in brightness. (a) The input image; (b) the image after anti-alias filtering; (c) the primary frequency plane of the PWD; (d) the result of the relaxation step

Figure 24. An example illustrating similarity in rotation. (a) The input image; (b) the image after anti-alias filtering; (c) the primary frequency plane of the PWD; (d) the result of the relaxation step

and after relaxation, is shown in Figures 21c and 21d, respectively. The grouping is consistent with that perceived and with Gestalt laws.

Similarity in brightness: in Figure 22a, the clustering of elements with similar brightness can be seen. After anti-alias filtering, Figure 22b results. The primary plane of the PWD and the plane after relaxation are shown in Figures 22c and 22d. Again, the grouping performed by the system is as predicted.

Similarity in shape: Figure 23a shows an example in which grouping based on shape similarity is demonstrated. After anti-alias filtering, Figure 23b results. Figure 23c shows the primary PWD plane. After relaxation, the grouping shown in Figure 23d is found. Examination of the original image would lead one to expect groupings smaller than those shown, and perhaps more of them. Examining Figure 23a more

Figure 23. An example illustrating similarity in shape. (a) The input image; (b) the image after anti-alias filtering; (c) the primary frequency plane of the PWD; (d) the result of the relaxation step

closely, we see how this grouping comes about. Due to the low resolution of the image, the bottoms of the triangles (rows three and six) are identical to the tops of the squares (rows one, four and seven) and are grouped with the squares into the white regions of Figure 23d. The tops of the triangles are identical to the bottoms of the circles (rows two, five and eight) and are grouped with the circles into the black regions. The use of a higher resolution image and/or multiple frequency planes of the PWD might allow better grouping in such images.

Similarity in rotation: in Figure 24a, elements in adjacent columns differ only in rotation. The expected grouping, then, is by columns. Figure 24b shows the image after anti-alias filtering. Figures 24c and 24d show the primary plane of the PWD, before and after relaxation. The resultant grouping is columnar, but by pairs of columns rather than single columns of similar rotation. This can be understood by noting the proximity of the columns, so that in this case grouping is due to a combination of similarity and proximity.

Good continuation

The good continuation law states that elements will appear to be in the group with minimum discon- tinuities. In cases where a closed group is formed along a smooth curve, the closure law (which predicts that closed groups are preferred) could be considered a corollary of the good continuation law.

Figures 25a and 25b show a set of elements, before and after anti-alias filtering, respectively. Figures 25c and 25d show the primary frequency plane of the PWD and its absolute value. The result of the relaxation (applied to the absolute value of the primary frequency plane, to facilitate grouping) is shown in Figure 26. This grouping is consistent with the good continuation law.

In Figure 27a, a second set of elements is shown. The image after anti-alias filtering is shown in Figure 27b. Figures 27c and 27d show the primary PWD plane and its absolute value. After relaxation, the grouping in



Figure 25. An example of good continuation. (a) The input image; (b) the image after anti-alias filtering; (c) the primary frequency plane of the PWD; (d) the absolute value of the primary plane

Figure 27. A second example of good continuation. (a) The input image; (b) the image after anti-alias filtering; (c) the primary frequency plane of the PWD; (d) the absolute value of the primary plane

Figure 26. The result of the relaxation step

Figure 28. The result of the relaxation step

Figure 28 results. This grouping is consistent with that predicted by both good continuation and closure.

Alternative groupings

In the previous examples we sought only the primary groupings, using the primary frequency plane of the PWD. In some cases, less obvious (though still perceptible) groupings exist. In the first example in this section, the primary grouping was by rows, based on similarity of object size. Examining the primary plane of the PWD (Figure 21c), this grouping is very clear. A less obvious clustering of objects is into columns, based on their proximity. Some evidence of this grouping can be seen in the second frequency plane of the PWD (the plane corresponding to the second highest energy content, Figure 29a), although the row grouping is still

dominant. In the third plane in the PWD (Figure 29b), the grouping into columns is clear. Comparing the energy levels of the second and third planes with the first (Figures 29c and 29d), it can be seen that while the second plane is nearly as strong as the first, the third has a much lower level. This indicates that the strength of a grouping, as perceived visually, is related to the energy level of the PWD plane which displays that grouping.

Clustering based on multiple cues

It has been found that there is a substantial difference in the speed with which human subjects can differentiate between regions, depending upon the number of cues which must be used. If only one cue must be examined, for example if the regions are defined only



Figure 29. An example displaying alternative groupings. (a) The second frequency plane of the PWD; (b) the third frequency plane of the PWD; (c) a comparison of the first and second PWD planes; (d) a comparison of the first and third PWD planes

by the shapes of the elements they contain, discrimina- tion is very rapid. If two or more cues must be examined, however, such as when both shape and colour are necessary to differentiate between regions, more time is required. The discrimination process in the single cue case has been termed parallel or preattentive, while the process in the multiple cue case has been called serial or attentive. In the examples that follow, we examine the performance of our PWD- based segmentation method on these two types of images.

In Figure 30a, an image with one region consisting of circles and another of squares, is shown. The circles and squares are of two intensities (white and grey), selected randomly. Due to photographic difficulties, this is not clear in the figures. Although the circles and squares vary in brightness, only shape needs to be noted to discriminate between these regions. Thus, this image is an example of one requiring “parallel” processing in humans. The image after anti-alias

Figure 30. A case requiring “parallel” processing in humans. (a) The input image; (b) the image after anti- alias filtering; (c) the primary frequency plane of the PWD; (d) the result of the relaxation step

Figure 31. A case requiring “serial” processing in humans. (a) The input image; (b) the image after anti- alias filtering; (c) the primary frequency plane of the PWD; (d) the result of the relaxation step

filtering is shown in Figure 30b. The primary PWD plane, before and after relaxation, is shown in Figures 30c and 30d. The resultant grouping is as expected.

In Figure 31a, the first region consists of white circles and grey squares, while the second is made up of white squares and grey circles (again, this is not clear in the figures due to photographic difficulties). This image, then, is one requiring “serial” processing in humans. The image after anti-aliasing is shown in Figure 31b. The primary PWD plane, before and after relaxation, is shown in Figures 31c and 31d. The grouping indicated does not differentiate between the regions we have specified.

The successful grouping obtained in the first case gives some evidence that this system performs in a manner consistent with the human visual system for preattentive grouping. The failure in the second example, however, shows that the system is not capable, in its current form, of correct grouping in images which require human subjects to focus their attention.

Discussion

The above experiments demonstrate that our s/sf approach to segmentation also produces groupings as predicted by the Gestalt laws. While grouping is often described as a low frequency phenomenon, which can be produced simply by applying lowpass filters, Jañez has shown that grouping can be explained in some cases by using only high frequencies. This indicates that the entire spectrum should be examined, as has been done here, to achieve grouping in the general case.

Additional investigations in this direction could be useful in increasing our understanding of grouping as performed by the human visual system. One area that might be explored is the sensitivity of grouping to viewing distance. The frequencies perceived vary with distance from the image, and as described by the contrast sensitivity function, the sensitivity of the visual system varies with frequency. Since we have shown that different (more or less dominant) groupings can be



seen in different frequency planes, one would expect synthesis: some applications to the study of human that the dominant grouping would be distance depen- vision’, IEEE Tram PAMI Vol3 No 5 (September dent. 1981) pp 520-533

11

CONCLUSIONS

In this work, we examined texture segmentation and grouping as performed using a spatial/spatial-frequency representation. An experimental system based on the pseudo-Wigner distribution was implemented, and experiments in texture segmentation and Gestalt grouping were performed.


We have also shown results adding support to the theory that human vision can be explained in terms of a spatial/spatial-frequency representation. Examples were shown in which elements were grouped as predicted by the Gestalt laws. It was demonstrated that there is a correlation between the perceived grouping of elements and the energy of the PWD frequency planes in which the grouping is observed. Finally, experiments examining clustering based on multiple cues were performed. It was found that our system behaved in a manner consistent with human perception in cases where grouping is done preattentively.
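The per-plane energy measurement summarized above can be sketched in one dimension. The following is our own minimal illustration of a pseudo-Wigner-style computation, not the implemented 2-D system (the rectangular window, its length, the normalization and the test signal are all arbitrary choices): the local correlation product f(n+m)·f*(n−m) is formed over a short window and transformed, and the energy of each frequency plane is accumulated separately over two signal regions.

```python
import numpy as np

def pwd(f, L=8):
    """Pseudo-Wigner sketch: DFT of the windowed local correlation
    product f(n+m) * conj(f(n-m)).  A rectangular window is used;
    the paper's 2-D implementation and Kaiser windowing are omitted."""
    f = np.asarray(f, dtype=complex)
    N = len(f)
    fp = np.pad(f, (L, L))           # zero-pad so edge windows are defined
    m = np.arange(-L, L)
    W = np.empty((N, 2 * L), dtype=complex)
    for n in range(N):
        r = fp[n + L + m] * np.conj(fp[n + L - m])
        W[n] = np.fft.fft(r)         # one row = frequency planes at sample n
    return W

# Two-region test signal: frequency 0.125 cycles/sample on the left,
# 0.25 on the right.
t = np.arange(128)
sig = np.where(t < 64,
               np.exp(2j * np.pi * 0.125 * t),
               np.exp(2j * np.pi * 0.25 * t))
W = pwd(sig)

# Accumulate the energy of each frequency plane over each region; the
# dominant plane differs between regions (note the Wigner distribution's
# frequency doubling: 0.125 and 0.25 cycles/sample fall in bins 4 and 8
# of the 16 planes).
e_left = np.abs(W[:64]).sum(axis=0)
e_right = np.abs(W[64:]).sum(axis=0)
print(e_left.argmax(), e_right.argmax())
```

Grouping the two regions then amounts to noting that their energy concentrates in different frequency planes, which is the 1-D analogue of the plane-energy correlation reported above.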


ACKNOWLEDGEMENTS

This work was supported in part by the Minnesota Supercomputer Institute. We would like to thank Todd Ell for his contributions to the Gestalt experiments.

REFERENCES

1 Pavlidis, T Structural Pattern Recognition Springer-Verlag, New York (1977) pp 68-70
2 Haralick, R M and Shapiro, L M 'Image segmentation techniques', Comput. Vision, Graph. & Image Process. Vol 29 (1985) pp 100-132
3 Nevatia, R 'Image segmentation' in: T Y Young and K S Fu (eds) Handbook of Pattern Recognition and Image Processing Academic Press, New York (1986) pp 215-231
4 Wechsler, H 'Texture analysis - a survey', Signal Process. Vol 2 (1980) pp 271-282
5 Van Gool, L, Dewaele, P and Oosterlinck, A 'Texture analysis anno 1983', Comput. Vision, Graph. & Image Process. Vol 29 (1985) pp 336-357
6 Haralick, R M 'Statistical image texture analysis' in: T Y Young and K S Fu (eds) Handbook of Pattern Recognition and Image Processing Academic Press, New York (1986) pp 247-279
7 Gurari, E M and Wechsler, H 'On the difficulties involved in the segmentation of pictures', IEEE Trans. PAMI Vol 4 No 3 (May 1982) pp 304-306
8 Witkin, A P 'Recovering surface shape and orientation from texture', Artif. Intell. Vol 17 (1981) pp 17-45
9 Aloimonos, J and Chou, P B Detection of Surface Orientation and Motion from Texture: I. The Case of Planes Technical Report TR 161, Department of Computer Science, University of Rochester, Rochester, NY (January 1985)
10 Gagalowicz, A 'A new method for texture field synthesis: some applications to the study of human vision', IEEE Trans. PAMI Vol 3 No 5 (September 1981) pp 520-533
11 Julesz, B and Bergen, J 'Textons, the fundamental elements in preattentive vision and perception of textures', Bell Syst. Tech. J. Vol 62 No 6 (July-August 1983) pp 1619-1645
12 Caelli, T 'Three processing characteristics of visual texture segmentation', Spatial Vision Vol 1 No 1 (1985) pp 19-30
13 Connors, R W and Harlow, C A 'A theoretical comparison of texture algorithms', IEEE Trans. PAMI Vol 2 No 3 (May 1980) pp 204-222
14 Weszka, J S, Dyer, C R and Rosenfeld, A 'A comparative study of texture measures for terrain classification', IEEE Trans. Syst., Man & Cybern. Vol 6 No 4 (1976) pp 269-286
15 Gabor, D 'Theory of communication', Proc. Inst. Electrical Eng. Vol 93 No 26 (1946) pp 429-457
16 Shannon, C E 'Communication in the presence of noise', Proc. Inst. Radio Eng. Vol 37 (January 1949) pp 10-21
17 Daugman, J G 'Uncertainty relation for resolution in space, spatial frequency, and orientation optimized by two-dimensional visual cortical filters', J. Optical Soc. Am. A Vol 2 No 7 (July 1985) pp 1160-1169
18 Wilson, R and Granlund, G H 'The uncertainty principle in image processing', IEEE Trans. PAMI Vol 6 No 6 (November 1984) pp 758-767
19 du Buf, J M H, Kardan, M and Spann, M 'Texture feature performance for image segmentation', Pattern Recognition Vol 23 Nos 3/4 (1990) pp 291-309
20 Haralick, R M, Shanmugam, K and Dinstein, I 'Textural features for image classification', IEEE Trans. Syst., Man & Cybern. Vol 3 No 1 (November 1973) pp 610-621
21 Geman, S and Geman, D 'Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images', IEEE Trans. PAMI Vol 6 No 6 (November 1984) pp 721-741
22 Derin, H and Elliott, H 'Modelling and segmentation of noisy and textured images using Gibbs random fields', IEEE Trans. PAMI Vol 9 No 1 (January 1987) pp 39-55
23 Crick, F H C, Marr, D C and Poggio, T An Information Processing Approach to Understanding the Visual Cortex AI Memo 557, MIT, USA (April 1980)
24 Watson, A B 'The cortex transform: rapid computation of simulated neural images', Comput. Vision, Graph. & Image Process. Vol 39 (1987) pp 311-327
25 Ginsburg, A P 'Specifying relevant spatial information for image evaluation and display design: an explanation of how we see certain objects', Proc. SID Vol 21 No 3 (1980) pp 219-227
26 Beck, J, Sutter, A and Ivry, R 'Spatial frequency channels and perceptual grouping in texture segregation', Comput. Vision, Graph. & Image Process. Vol 37 (1987) pp 299-325
27 Allen, J B and Rabiner, L R 'A unified approach to short-time Fourier analysis and synthesis', Proc. IEEE Vol 65 No 11 (November 1977) pp 1558-1564
28 Bajcsy, R and Lieberman, L 'Texture gradient as a depth cue', Comput. Vision, Graph. & Image Process. Vol 5 (1976) pp 52-67
29 Pentland, A P 'Fractal-based description of natural scenes', IEEE Trans. PAMI Vol 6 No 6 (November 1984) pp 661-674
30 Wilson, H and Giese, S C 'Threshold visibility of frequency gradient patterns', Vision Res. Vol 17 (1977) pp 1177-1190
31 Witkin, A P 'Scale-space filtering', Proc. 8th Int. Joint Conf. Artif. Intell. Karlsruhe, Germany (August 1983) pp 1019-1022
32 Anderson, C H, Burt, P J and van der Wal, G S 'Change detection and tracking using pyramid transform techniques', Proc. SPIE Cambridge Int. Conf. Intelligent Robots and Comput. Vision Cambridge, MA (September 1985) pp 72-78
33 Rosenfeld, A 'Some useful properties of pyramids' in: A Rosenfeld (ed) Multiresolution Image Processing and Analysis Springer-Verlag, Berlin-Heidelberg (1984) pp 2-5
34 Meer, P, Baugher, E S and Rosenfeld, A 'Frequency domain analysis and synthesis of image pyramid generating kernels', IEEE Trans. PAMI Vol 9 No 4 (July 1987) pp 512-522
35 Crowley, J L and Parker, A C 'A representation for shape based on peaks and ridges in the difference of low-pass transform', IEEE Trans. PAMI Vol 6 No 2 (March 1984) pp 156-170
36 Marr, D Vision W H Freeman, New York, NY (1982) p 64
37 Porat, M and Zeevi, Y Y 'The generalized Gabor scheme of image representation in biological and machine vision', IEEE Trans. PAMI Vol 10 No 4 (July 1988) pp 452-468
38 Clark, M, Bovik, A C and Geisler, W S 'Texture segmentation using a class of narrowband filters', Proc. IEEE Int. Conf. Acoustics, Speech and Signal Process. Dallas, TX (April 1987) pp 14.6.1-14.6.4
39 du Buf, J M H 'Towards unsupervised texture segmentation using Gabor spectral decomposition', Proc. 5th Int. Conf. Image Analysis and Process. Positano, Italy (September 1989) pp 65-72
40 du Buf, J M H 'Gabor phase in texture discrimination', Signal Process. Vol 21 (1990), to appear
41 Kulikowski, J J, Marcelja, S and Bishop, P O 'Theory of spatial position and spatial frequency relations in the receptive fields of simple cells in the visual cortex', Biol. Cybern. Vol 43 (1982) pp 187-198
42 Pollen, D A and Ronner, S F 'Phase relationships between adjacent simple cells in the visual cortex', Science Vol 212 (1981) pp 1409-1411
43 Wigner, E 'On the quantum correction for thermodynamic equilibrium', Phys. Rev. Vol 40 (June 1932) pp 749-759
44 Ville, J 'Theorie et applications de la notion de signal analytique', Cables et Transmission Vol 2 No 1 (1948) pp 61-74
45 Claasen, T A C M and Mecklenbrauker, W F G 'The Wigner distribution - a tool for time-frequency signal analysis, Part I: continuous-time signals', Philips J. Res. Vol 35 No 3 (1980) pp 217-250
46 Claasen, T A C M and Mecklenbrauker, W F G 'The Wigner distribution - a tool for time-frequency signal analysis, Part II: discrete-time signals', Philips J. Res. Vol 35 Nos 4/5 (1980) pp 276-300
47 Claasen, T A C M and Mecklenbrauker, W F G 'The Wigner distribution - a tool for time-frequency signal analysis, Part III: relations with other time-frequency signal transformations', Philips J. Res. Vol 35 No 6 (1980) pp 372-389
48 Bartelt, H O, Brenner, K H and Lohmann, A W 'The Wigner distribution function and its optical production', Optics Commun. Vol 32 No 1 (January 1980) pp 32-38
49 Bastiaans, M J 'The Wigner distribution function applied to optical signals and systems', Optics Commun. Vol 25 No 1 (April 1978) pp 26-30
50 Bastiaans, M J 'Wigner distribution function and its application to first-order optics', J. Optical Soc. Am. Vol 69 No 12 (December 1979) pp 1710-1716
51 Jacobson, L and Wechsler, H 'A paradigm for invariant object recognition of brightness, optical flow and binocular disparity images', Pattern Recognition Lett. Vol 1 (October 1982) pp 61-68
52 Jacobson, L and Wechsler, H 'A theory for invariant object recognition in the frontoparallel plane', IEEE Trans. PAMI Vol 6 No 3 (May 1984) pp 325-331
53 Jacobson, L and Wechsler, H 'Derivation of optical flow using a spatiotemporal-frequency approach', Comput. Vision, Graph. & Image Process. Vol 38 (1987) pp 29-65
54 Cristobal, G, Bescos, J, Santamaria, J and Montes, J 'Wigner distribution representation of digital images', Pattern Recognition Lett. Vol 5 (March 1987) pp 215-221
55 Oppenheim, A V and Lim, J S 'The importance of phase in signals', Proc. IEEE Vol 69 (1981) pp 529-541
56 Martin, W and Flandrin, P 'Analysis of non-stationary processes: short-time periodograms versus a pseudo-Wigner estimator', in H Schussler (ed.) EUSIPCO-83 North-Holland, Amsterdam (1983) pp 455-458
57 Martin, W 'Measuring the degree of non-stationarity by using the Wigner-Ville spectrum', Proc. Int. Conf. Acoustics, Speech and Signal Process. San Diego, CA (March 19-21 1984) pp 41B.3-41B.4
58 Martin, W and Flandrin, P 'Detection of changes of signal structure by using the Wigner-Ville spectrum', Signal Process. Vol 8 No 2 (1985) pp 215-233
59 Reed, T R and Wechsler, H 'Tracking of non-stationarities for texture fields', Signal Process. Vol 14 No 1 (January 1988) pp 95-102
60 Jacobson, L and Wechsler, H 'Joint spatial/spatial-frequency representation', Signal Process. Vol 14 No 1 (January 1988) pp 37-68
61 Durrani, T S, Chapman, R and Willey, T 'Systolic processor for computing the Wigner distribution', Electronics Lett. Vol 19 No 13 (June 23 1983) pp 476-477
62 Boashash, B and Black, P J 'An efficient real-time implementation of the Wigner-Ville distribution', IEEE Trans. Acoustics, Speech, and Signal Process. Vol 35 No 11 (November 1987) pp 1611-1618
63 Reed, T R and Wechsler, H 'Texture segmentation and organization using the Wigner distribution', in J L Lacoume, A Chehikian, N Martin and J Malbos (eds.) Signal Processing IV: Theories and Applications, Proceedings of EUSIPCO-88 Elsevier Science Publishers BV, North-Holland (1988) pp 263-266
64 Reed, T R and Wechsler, H 'Texture analysis and clustering using the Wigner distribution', Proc. 9th Int. Conf. Pattern Recognition Rome, Italy (November 14-17 1988) pp 770-772
65 Reed, T R and Wechsler, H 'Segmentation of textured images and Gestalt organization using spatial/spatial-frequency representations', IEEE Trans. PAMI Vol 12 No 1 (January 1990) pp 1-12
66 Kaiser, J F 'Nonrecursive digital filtering using the I0-sinh window function', Proc. IEEE Int. Symposium on Circuit Theory (1974) pp 20-23
67 Rosenfeld, A 'Iterative methods in image analysis', Pattern Recognition Vol 10 (1978) pp 181-187
68 Brodatz, P Textures - A Photographic Album for Artists and Designers Dover, New York (1966)
69 Wertheimer, M 'Principles of perceptual organization', in D C Beardslee and M Wertheimer (eds.) Readings in Perception Van Nostrand, Princeton (1958) pp 115-135
70 Treisman, A 'Preattentive processing in vision', Comput. Vision, Graph. & Image Process. Vol 31 No 2 (August 1985) pp 156-177
71 Jañez, L 'Visual grouping without low spatial frequencies', Vision Res. Vol 24 No 3 (1984) pp 271-274