
A Class of Discrete Multiresolution Random Fields and Its Application to Image Segmentation

Roland Wilson and Chang-Tsun Li

Abstract—In this paper, a class of random field models, defined on a multiresolution array, is used in the segmentation of gray level and textured images. The novel feature of one form of the model is that it is able to segment images containing unknown numbers of regions, where there may be significant variation of properties within each region. The estimation algorithms used are stochastic but, because of the multiresolution representation, they are computationally fast, requiring only a few iterations per pixel to converge to accurate results, with error rates of 1-2 percent across a range of image structures and textures. The addition of a simple boundary process gives accurate results even at low resolutions, and consequently at very low computational cost.

Index Terms—Markov random fields, image segmentation, Bayesian estimation.


1 INTRODUCTION

Markov Random Fields (MRFs) are among the oldest of the statistical approaches to image modeling [37], [6], [9]. Recently, however, they have gained significant attention [8], [12], [22], [28], [30], [33], especially in the segmentation of regions of more or less uniform color or texture. For example, Geman et al. [11] used the Kolmogorov-Smirnov nonparametric measure of difference between the distributions of spatial features extracted from pairs of blocks of pixel gray levels, with maximum a posteriori (MAP) estimation of the boundary. Panjwani and Healey [30] adopted an MRF model to characterize textured color images in terms of spatial interaction within and between color planes. In a technique which has some similarities with [36], Bouman and Shapiro used sequential maximum a posteriori (SMAP) estimation in conjunction with a multiscale random field (MSRF) [5], which is a sequence of random fields at different scales. Other work exploring the multiresolution approach to MRFs is described in [7], [1], [32], [19], and [31]. The last two of these papers point out the difficulty of preserving the Markovian properties, which require a locality constraint, within a sampling scheme that implies the violation of that constraint. On the other hand, there is ample evidence that multiresolution processing can lead to highly efficient algorithms in many areas, from image restoration to optical flow.

The upsurge in interest in MRFs has been prompted by the work of Besag [2], Geman and Geman [12], and Geman et al. [11], largely because of the new approaches to Bayesian or MAP estimation. Although computationally expensive, these algorithms provide a relatively simple way to approach an optimal estimate using the principles of stochastic sampling, or Markov Chain Monte Carlo (MCMC) methods [34], as they are sometimes called. Alternatively, it is often possible to get adequate results with deterministic procedures, such as Besag's Iterated Conditional Modes (ICM) algorithm [3]. In any event, MAP estimation based on noncausal MRFs suffers from several drawbacks when applied to images. Apart from the considerable computational burden, the simpler models have "low energy" states which represent uniform colorings, so that, if an MCMC algorithm is allowed to run long enough, it will tend to produce results which reflect this, especially if the data are noisy. On the other hand, deterministic algorithms such as ICM can become trapped in local minima, which is also undesirable. While much work has shown the efficacy of multiresolution methods, these have often been justified on heuristic grounds. The early stochastic models [7], [1], [27], [5], such as those based on quadtrees, are prone to the blocking artifacts associated with the sampling scheme.

The last few years have seen a number of papers describing multiresolution RF models and proposing estimation and segmentation algorithms that use deterministic or stochastic methods to find a MAP labeling [16], [20], [29], [24]. Although these authors give results on a wide variety of images, they do not address two problems which are important in a number of applications: images may contain an unknown number of classes, and regions may show significant internal variation of properties, such as intensity and texture. The work presented here is an attempt to address these issues using a novel form of multiresolution model. As such, it is closely related to the work of Bouman, as well as the more recent work in [16], [17], [20]. It also starts from a quadtree process in which the passage from coarse to fine scales can be described as a Markov Chain. It differs fundamentally, however, in the process by which each scale or resolution is conditioned on its immediate predecessor in scale: Whereas, in the simple quadtree model, a pixel at a given scale depends only on its ancestors in scale, in this model it is also directly dependent on its neighbors at its own scale. This allows it to model image structures, such as multiple regions having smooth boundaries, in a more realistic way than earlier models. In this respect, it is closely related to the models presented by Kato et al. [17] and Mignotte et al. [29]. Unlike those works, however, the new model can be applied to images containing an arbitrary number of classes. This approach is complemented by a new observation model, which is suited to problems in computer vision and similar areas, where there is significant variation within regions as well as between regions. Equally important, by an appropriate choice of model, estimation algorithms can be made independent of the number of classes. MAP estimation with the model is computed using simulated annealing on a sequential, coarse-to-fine basis. This is fast compared to conventional approaches and allows simultaneous estimation of model parameters. After the main features of the prior and data models are presented, the estimation algorithms derived from them are described. To demonstrate their effectiveness, the methods are applied to the segmentation of noisy and textured images. The paper concludes with some remarks on the usefulness of this type of RF model for the segmentation problem and its limitations in such applications.

42 IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 25, NO. 1, JANUARY 2002

• R. Wilson is with the Computer Science Department, University of Warwick, Coventry, CV4 7AL UK. E-mail: [email protected].

• C.-T. Li is with the Electrical Engineering Department, CCIT, National Defense University, Tahsi, Taoyuan 33509, Taiwan, R.O.C. E-mail: [email protected].

Manuscript received 23 Feb. 2000; revised 28 Mar. 2001; accepted 21 Mar. 2002. Recommended for acceptance by C. Bouman. For information on obtaining reprints of this article, please send e-mail to: [email protected], and reference IEEECS Log Number 111544.

0162-8828/03/$17.00 © 2003 IEEE Published by the IEEE Computer Society

2 THE PRIOR MODEL

The most common form of the segmentation problem, and the one considered here, is that of inferring the existence of a number of more or less well-defined regions from noisy or ambiguous data. This presupposes that the images of interest have a relatively small number of such regions, each of which is sufficiently distinct in its properties and of sufficient area to be identifiable. Reflecting this in a prior model is by no means simple. In particular, MRF models on 2D lattices suffer from the criticality phenomenon: for interesting ranges of prior parameters, they tend to give uniform colorings, i.e., a single region, with a few small clusters of other colors [18]. Strictly speaking, this phenomenon is a limiting effect, applying to infinite lattices, which suggests that a multiresolution approach might offer a solution: use the coarse scales to capture the typical sizes and numbers of regions, while finer scales elaborate the boundary shapes. Among the multiresolution models which have been proposed, the approach taken first by Bouman and Shapiro [5] and elaborated recently by Mignotte et al. [29] seems to offer this possibility. It preserves the so-called scale causal approach of coarse-to-fine approximation within an MRF framework. However, both these works have limitations: The number of classes has to be known (in [29], it is fixed at 2), and they cannot deal effectively with textures with large cell sizes.

As in [29], therefore, the image model is a sequence of MRFs $X^k$, $0 \le k \le N$, conforming to a quadtree structure, with a nominal top level 0 and $N$ levels below that, level $k$ having $2^k \times 2^k$ sites for a finite image of size $2^N \times 2^N$ pixels.

Note that levels are ordered from 0 at the top of the tree to $N$ at the image level. Each site $s \in Q$ has an integer state $x_s \in \mathbb{Z}$, its class label. The neighborhood structure $\mathcal{N}_s$ consists of $n$ pixels in an isotropic neighborhood, such as the standard first- and second-order MRF models [2]. In addition, there is a parent set on level $k-1$, which in the simplest case consists of the quadtree father,

$$\mathcal{P}_s = \{(\lfloor i_s/2 \rfloor, \lfloor j_s/2 \rfloor, k-1)\}, \qquad (1)$$

where $\lfloor \cdot \rfloor$ denotes the floor of a real number and $(i_s, j_s)$ is the image coordinate of the site $s$ on level $k$. The "father" site will be denoted $p(s)$, so that, in the above case, $\mathcal{P}_s = \{p(s)\}$. The main feature of these neighborhood systems is that they imply causality in scale: The process at level $k$ is conditioned on that at level $k-1$. This has important consequences for the properties which such fields can display. Only pairwise interactions are involved, so that the conditional probability defining the RF can be written

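$$P(X_s = m \mid \{X_r = x_r, r \in \mathcal{N}_s \cup \mathcal{P}_s\}) = \eta \prod_{r \in \mathcal{N}_s \cup \mathcal{P}_s} \phi(X_s = m \mid X_r = x_r), \qquad (2)$$

where $m$ is the label at $s$, $\eta$ is a normalizing factor, and the pairwise interactions are given, for a site $s \in Q_k$ on the $k$th level of the tree, by

$$\phi(X_s = m \mid X_r = n) = e^{b_{kl}(\delta_{mn} - 1)}, \quad r \in Q_l, \qquad (3)$$

where $b_{kl}$ are constants and $\delta_{mn}$ is the Kronecker delta. Note that the set $Q_k$ contains $2^k \times 2^k$ sites in this scheme. Because (3) depends only on whether two labels agree, not on which labels they are, the model thus defined is independent of the number of classes. Globally, the model can be expressed in terms of Gibbs potentials [18]: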

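$$U(x^k \mid x^{k-1}) = \sum_s \sum_{r \in \mathcal{P}_s} V_{k,k-1}(x_s, x_r) + \frac{1}{2} \sum_s \sum_{r \in \mathcal{N}_s} V_{k,k}(x_s, x_r), \qquad (4)$$

where $x^l$ denotes the configuration for the whole level $l$ and, for the model of (3),

$$V_{k,l}(m, n) = b_{kl}(1 - \delta_{mn}). \qquad (5)$$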
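A minimal Python sketch of how (1), (2), (3), and (5) combine at a single site is given below. It assumes a first-order (4-neighbor) system, free boundaries, and a candidate label range 0, ..., n_labels - 1; the function and argument names are hypothetical, not taken from the paper.

```python
import numpy as np

# Illustrative sketch; function and parameter names are hypothetical.
def local_conditional(level, parent, i, j, b_kk, b_kp, n_labels):
    """P(X_s = m | neighbours, father) for site s = (i, j), m = 0..n_labels-1.

    level    : 2D int array of current labels on level k
    parent   : 2D int array of labels on level k-1
    b_kk     : intralevel constant b_{kk}
    b_kp     : father-child constant b_{k,k-1}
    n_labels : number of candidate labels (a user-chosen range)
    """
    h, w = level.shape
    # First-order neighbours; sites outside the lattice are simply dropped.
    neighbours = [level[a, c] for a, c in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
                  if 0 <= a < h and 0 <= c < w]
    father = parent[i // 2, j // 2]                     # quadtree father, eq. (1)
    labels = np.arange(n_labels)
    # Total potential of each candidate label: sum of b_kl * (1 - delta_mn), eq. (5).
    energy = np.array([b_kk * sum(int(m != n) for n in neighbours) for m in labels],
                      dtype=float)
    energy += b_kp * (labels != father)
    # Each factor in eq. (2) is phi = exp(-V), eq. (3); normalising plays the role of eta.
    p = np.exp(-(energy - energy.min()))
    return p / p.sum()
```

Only the number of neighbors that disagree with each candidate label enters the computation, which is why the same function works for any number of classes.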

The model encompasses both the quadtree model, for which $b_{kk} = 0$, and the conventional "flat" MRF, for which $b_{k,k-1} = 0$. Note that the model is based on differences between labels: It specifies that adjacent sites are more likely to have the same label than not. Because of the product form, the conditional distribution of $X_s$ depends only on the number of its neighbors belonging to each class. The causal nature of the field is expressed via the conditional Gibbs distribution

$$P(X^k = x^k \mid X^{k-1} = x^{k-1}) = \frac{1}{Z_k(x^{k-1})} e^{-U(x^k \mid x^{k-1})}, \qquad (6)$$

where $Z_k(\cdot)$ is the so-called partition function, which depends on the configuration $x^{k-1}$ at the parent level. The dependence of the configuration on the father level is demonstrated most clearly in the following proposition.

Proposition 1. Let $P(X^k \mid X^{k-1})$ define a conditional Markov Random Field through the potential function in (4), where the potentials are, without loss of generality, zero if the two states are identical, positive if they are different, and satisfy

$$V_{k,k-1}(m, n) - |\mathcal{N}_s| \max_{p \neq q} V_{k,k}(p, q) > 0, \quad \text{if } m \neq n. \qquad (7)$$

WILSON AND LI: A CLASS OF DISCRETE MULTIRESOLUTION RANDOM FIELDS AND ITS APPLICATION TO IMAGE SEGMENTATION 43

Fig. 1. "Distance" measure between boundary segments is based on the line joining their estimated positions, $\vec{l}_{sr}$, and their estimated orientations $\theta_r, \theta_s$.

Then, the configuration $x^{k*}$ which maximizes $P(x^k \mid x^{k-1})$, for a given configuration $x^{k-1}$, is given by

$$x^*_s = x_{p(s)}, \qquad (8)$$

where $p(s)$ is the quadtree father of $s$. In other words, the most likely configuration on level $k$, $x^{k*}$, is just the copy, or "projection", of that on the level above.
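Proposition 1 can be checked numerically on a toy problem. The sketch below assumes binary labels, a 4-neighbor system with free boundaries, and hypothetical constants $b_{k,k-1} = 5$ and $b_{kk} = 1$, so that (7) holds ($5 - 4 \cdot 1 > 0$). It enumerates all $2^{16}$ labelings of a 4 × 4 child level and confirms that the energy (4) is minimized, and hence (6) maximized, by the projection of a 2 × 2 father level. This is a brute-force check on one example, not a proof.

```python
import numpy as np

# Illustrative brute-force check; names and constants are hypothetical.
def project(parent):
    """Copy each father label onto its four children: the projection of eq. (8)."""
    return np.repeat(np.repeat(parent, 2, axis=0), 2, axis=1)

def energies(configs, parent, b_father, b_neigh):
    """Energy U(x^k | x^{k-1}) of eq. (4) for a batch of child configurations.

    configs has shape (batch, H, W). Potentials follow eq. (5) with a
    4-neighbour system and free boundaries; each unordered neighbour pair is
    counted once, which is what the factor 1/2 in eq. (4) achieves.
    """
    father_term = (configs != project(parent)).sum(axis=(1, 2))
    horiz = (configs[:, :, 1:] != configs[:, :, :-1]).sum(axis=(1, 2))
    vert = (configs[:, 1:, :] != configs[:, :-1, :]).sum(axis=(1, 2))
    return b_father * father_term + b_neigh * (horiz + vert)

parent = np.array([[0, 1],
                   [1, 1]])
codes = np.arange(2 ** 16)
# Bit b of each code gives the label of site b in row-major order.
configs = ((codes[:, None] >> np.arange(16)) & 1).reshape(-1, 4, 4)

U = energies(configs, parent, b_father=5.0, b_neigh=1.0)
best = configs[np.argmin(U)]
print(np.array_equal(best, project(parent)))  # True when (7) holds
```

Reversing the constants so that (7) fails (for example, `b_father=1.0, b_neigh=5.0`) makes the uniform all-ones labeling the minimum for this parent, which illustrates why the condition is needed.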

The proof is given in the Appendix. To illustrate the effect of the father level, Fig. 2a shows an array of sample images generated from the same 16 × 16 father image using different combinations of father-child and neighbor interactions. Each 32 × 32 image within this array is the result of 2,000 iterations of the sampler, more than sufficient for an array of this size to approach the stationary distribution. The image at the top left shows complete randomness, as both interactions are 0 in this case; moving from top to bottom, the neighbor interaction potential doubles at each step and, from left to right, the father-child interaction similarly doubles. In this and the other sample images, the image boundary pixels are fixed as "black" (0), so that, in the absence of father-child interactions, for the higher levels of neighbor interaction, the stationary distribution will have a maximum when all pixels are black. Although the 4-neighbor field is simple, it is limited in its ability to produce smooth boundaries, so it may be worth considering the 8-neighborhood. Using the 8-neighbors as conditioning elements produces the result shown in Fig. 2b. In comparison with Fig. 2a, the regions are noticeably smoother, with the gaps in the nonconvex shape being filled, but the same general trends can be observed as the interaction strengths increase. For example, with greater potentials to the father level, small regions tend to survive better. This is evident in the images on the right side of the array.

An obvious defect of the above models is that boundaries tend to align with the quadtree, which introduces a "blockiness" into both the statistical structure and the sample images. A simple way around this problem is to make the influence of the father level zero for any site at a boundary (i.e., having a neighbor of a different color). This is accomplished by setting the father-child potential to 0 whenever the father has a neighbor in a different class. With a slight abuse of the notation of (4) to indicate spatial variation of the potentials,

$$V_{s,p(s)}(m, n) = 0, \quad \text{if } x_{p(s)} \neq x_r \text{ for some } r \in \mathcal{N}_{p(s)}. \qquad (9)$$

This is equivalent to extending the set of parents of $s$ on level $k-1$ to a larger set of pixels. A noteworthy consequence is that the approximation of a straight edge becomes less jagged as the resolution increases. As an illustration of the effect of these modifications, Fig. 3a shows samples from a binary process using the 8-neighbors plus parents on levels $5 \le k \le 8$, but a quadtree process, i.e., $b_{kk} = 0$, on levels $0 \le k \le 4$. By choosing the model parameters appropriately as functions of the level, different combinations of structure can be obtained. In this case, the coarse structure representing the bottom level of the pure quadtree process is refined by the 8-neighbor multiresolution MRF (MMRF), resulting in a single, smooth "blob" representing the object. Similar results were obtained after 10,000 iterations of the simulation, suggesting that the system is in equilibrium. A second example, using a lower correlation coefficient between neighbors, is shown in Fig. 3b. Note that the large scale structure, in this case, is punctuated by some smaller "objects."

Fig. 2. Illustrating the effect of father-child interactions. Each of the 16 subimages is sampled at 32 × 32 pixels using (a) 4-neighbors and the father as the conditioning elements and (b) 8-neighbors and the father. From top to bottom, the neighbor interaction energy increases, while from left to right the father-child interaction increases. Each image is the outcome of 2,000 iterations using a common seed.

For some applications, the two properties of reducing the bias caused by the quadtree and preserving the region structure given by the highest level in the tree can be captured using a parent-child probability

$$P(X_s \neq X_{p(s)}) = \alpha_k^{d_{p(s)}}, \quad s \in Q_k, \qquad (10)$$

where $0 < \alpha_k < 1$ is a constant for level $k$ and $d_s$ is the distance between site $s$ and the nearest site on the same level in a different class. With this dependence on the father, the process tends to behave more like a scale causal model in areas far from the boundaries and like a single scale MRF in boundary regions.
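The boundary rule (9) and the distance-dependent parent-child probability (10) can both be computed from a label map with a few array operations. The sketch below assumes 4-neighbor adjacency when deciding whether a father lies on a boundary and Euclidean distance for $d_s$ (the text does not fix the metric). It uses brute-force distances that are adequate only for small levels, and the function names are hypothetical.

```python
import numpy as np

# Illustrative sketch; function names and the distance metric are assumptions.
def on_boundary(labels):
    """True where a site has a 4-neighbour with a different label."""
    b = np.zeros(labels.shape, dtype=bool)
    b[1:, :] |= labels[1:, :] != labels[:-1, :]
    b[:-1, :] |= labels[:-1, :] != labels[1:, :]
    b[:, 1:] |= labels[:, 1:] != labels[:, :-1]
    b[:, :-1] |= labels[:, :-1] != labels[:, 1:]
    return b

def father_weight_mask(parent):
    """Per-child factor on the father potential under eq. (9):
    0 where the father lies on a boundary, 1 elsewhere."""
    keep = (~on_boundary(parent)).astype(float)
    return np.repeat(np.repeat(keep, 2, axis=0), 2, axis=1)

def distance_to_other_class(labels):
    """d_s: Euclidean distance to the nearest site with a different label
    (brute force; np.inf if the level is uniform)."""
    ii, jj = np.indices(labels.shape)
    coords = np.stack([ii.ravel(), jj.ravel()], axis=1).astype(float)
    flat = labels.ravel()
    d = np.full(flat.shape, np.inf)
    for s in range(flat.size):
        other = coords[flat != flat[s]]
        if len(other):
            d[s] = np.sqrt(((other - coords[s]) ** 2).sum(axis=1)).min()
    return d.reshape(labels.shape)

def prob_differs_from_father(parent, alpha):
    """Eq. (10): P(X_s != X_p(s)) = alpha ** d_p(s), expanded to the child level."""
    p = alpha ** distance_to_other_class(parent)   # alpha ** inf == 0 for 0 < alpha < 1
    return np.repeat(np.repeat(p, 2, axis=0), 2, axis=1)
```

Far from a boundary $d_{p(s)}$ is large, so children almost surely copy their father; next to a boundary the probability of departing from the father is largest, which matches the intended behavior described above.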

3 THE OBSERVATION MODEL

The simplest model of the observations, given the labeling $\omega^k$, is that at each site $s$ there is an observation $Y_s$ related to the site label $X_s$ by

$$P(Y_s \mid X_r, r \in Q) = P(Y_s \mid X_s). \qquad (11)$$

In other words, given the site labels, the observations are conditionally independent. In particular, if the data are multivariate normal,

$$P(\mathbf{y}_s \mid X_s = i) = \frac{1}{|2\pi\Sigma_i|^{1/2}} \exp\left[-0.5\,(\mathbf{y}_s - \boldsymbol{\mu}_i)^T \Sigma_i^{-1} (\mathbf{y}_s - \boldsymbol{\mu}_i)\right], \qquad (12)$$

where $\boldsymbol{\mu}_i$ and $\Sigma_i$ are the $n$-dimensional mean vector and covariance matrix associated with the $i$th class.
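For reference, the class-conditional density (12) is usually evaluated in log form to avoid numerical underflow. A minimal NumPy sketch, assuming each class $i$ is described by a mean vector and a positive definite covariance matrix (the function names are hypothetical):

```python
import numpy as np

# Illustrative sketch of eq. (12) in log form; names are hypothetical.
def log_likelihood(y, mu, cov):
    """log P(y | X_s = i) from eq. (12) for one n-dimensional observation y."""
    diff = y - mu
    sign, logdet = np.linalg.slogdet(2.0 * np.pi * cov)   # log |2 pi Sigma_i|
    if sign <= 0:
        raise ValueError("covariance must be positive definite")
    return -0.5 * logdet - 0.5 * diff @ np.linalg.solve(cov, diff)

def most_likely_class(y, mus, covs):
    """Maximum-likelihood label under model (11)-(12), ignoring the prior."""
    scores = [log_likelihood(y, m, c) for m, c in zip(mus, covs)]
    return int(np.argmax(scores))
```

Note that this model needs a mean and a covariance per class, which is the parameter burden the difference-based model below is designed to avoid.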

However, this model cannot cope with the variability typically seen in some types of image, where the 3D geometry of the world induces smooth lighting and textural variations within regions, not just between them. To solve this problem, it is convenient to use a second model which, like the prior, is based on differences between neighboring sites. This is an MRF model in which the differences between neighboring sites are conditionally independent, given the label field:

$$P(Y_s \mid Y_r, X_s, X_r, r \in \mathcal{N}_s \cup \mathcal{P}_s) = \prod_{r \in \mathcal{N}_s \cup \mathcal{P}_s} P(Y_s - Y_r \mid X_s, X_r). \qquad (13)$$

Although it might seem more complex than the model of (11), in some ways it is simpler: It has the same structure as the label model and can be used with a minimum number of parameters. Moreover, the conditional independence of differences between observations based on blocks of image data is not unreasonable, particularly at large scales. In the application to texture segmentation, a normal model is used, of the form

$$P(\mathbf{y}_s \mid \mathbf{y}_r, x_s, x_r, r \in \mathcal{N}_s) = \prod_{r \in \mathcal{N}_s} \frac{1}{|2\pi\Sigma_{x_s x_r}|^{1/2}} \exp\left[-0.5\,(\mathbf{y}_s - \mathbf{y}_r)^T \Sigma_{x_s x_r}^{-1} (\mathbf{y}_s - \mathbf{y}_r)\right], \qquad (14)$$

where $\Sigma_{x_s x_r}$ is the covariance matrix of the difference for the class pair $(x_s, x_r)$. Note that no difference in means is assumed, even when the two sites are in different classes. Removing any dependence on class means makes the model robust to smooth variations in the mean, such as those on the right side of Fig. 11, where not only the intensity but also the texture cell size varies considerably within the "reptile" texture. Moreover, if the covariance model is reduced to

$$\Sigma_{x_s x_r} = \begin{cases} \Sigma_w & \text{if } x_s = x_r, \\ \Sigma_o & \text{otherwise,} \end{cases} \qquad (15)$$

where $\Sigma_w$ and $\Sigma_o$ are the within-class and outside-class covariances, then there are only two matrix parameters to estimate, regardless of the number of classes. Thus, the right combination of prior and observation models does indeed remove the dependence on the number of classes and gives robustness to intraregion variation.
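The practical appeal of (14) with (15) is that a site's log-likelihood needs only the two matrices $\Sigma_w$ and $\Sigma_o$, whatever the number of classes, and that adding the same offset to every feature vector leaves it unchanged. The sketch below makes both properties explicit; it assumes a first-order neighborhood and free boundaries, and its function and argument names are hypothetical.

```python
import numpy as np

# Illustrative sketch of eqs. (14)-(15); names are hypothetical.
def gauss_log_density(d, cov):
    """Log of a zero-mean normal density, as in each factor of eq. (14)."""
    sign, logdet = np.linalg.slogdet(2.0 * np.pi * cov)
    return -0.5 * logdet - 0.5 * d @ np.linalg.solve(cov, d)

def site_log_likelihood(features, labels, i, j, m, cov_w, cov_o):
    """log P(y_s | y_r, x_s = m, x_r, r in N_s) from eqs. (14)-(15).

    features : array (H, W, n) of feature vectors y_s
    labels   : array (H, W) of current labels x_r
    m        : candidate label for site s = (i, j)
    """
    h, w, _ = features.shape
    total = 0.0
    for a, c in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)):
        if 0 <= a < h and 0 <= c < w:
            cov = cov_w if labels[a, c] == m else cov_o       # eq. (15)
            total += gauss_log_density(features[i, j] - features[a, c], cov)
    return total
```

Only differences between neighboring feature vectors appear, so a smooth drift in the mean within a region has little effect. Only the agreement pattern of labels matters, so renaming the classes changes nothing.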


Fig. 3. Samples from two MMRF processes. The bottom level of the pyramid, k = 8, is 256 × 256 pixels. 1,000 iterations were used at each level. (a) has stronger neighbor interaction than (b).

4 ESTIMATION

The main application of the model is in inference of the label field $X$ from noisy data. Consider the problem of estimating the image at one level, $X^k$, say, from noisy image data $Y$. If the image $x^{k-1}$ is known, it is possible to use the Conditional Maximum A Posteriori (CMAP) estimator

$$\hat{x}^k = \arg\max_{x^k} P(x^k \mid y, x^{k-1}), \qquad (16)$$

where, from (6) above,

$$P(X^k \mid Y, X^{k-1}) = \frac{P(Y \mid X^k)\, P(X^k \mid X^{k-1})}{P(Y \mid X^{k-1})}, \qquad (17)$$

the first factor in the numerator being the likelihood. While the CMAP estimate is simple, there are few practical applications where any of the images $X^k$, $0 \le k \le N$, is available. As a practical alternative, Laferte et al. [20] and Mignotte et al. [29], following Bouman and Shapiro [5], use the sequential estimator based on the same conditional structure, in which first the top level $X^{k_{\min}}$ is estimated and then each level below it, conditioned on its father level via (17):

$$\hat{x}^k = \arg\max_{x^k} P(x^k \mid y, \hat{x}^{k-1}). \qquad (18)$$

There remains the problem of defining the interaction between the label field $X^k$ and the observations. This is tackled in both [5] and [29] by relating the labels at various resolutions to the full resolution data, $Y$, although in the latter case this is done directly on the posterior distribution, rather than through the likelihood. The approach adopted here differs in that it is based on a multiresolution representation of the observations: assume there exists a sequence of observation fields $\{Y^k, 0 \le k \le N\}$, having the same structure as the label fields $X^k$ and satisfying the refinement condition

$$P(Y \mid X^k) = P(Y \mid Y^k)\, P(Y^k \mid X^k). \qquad (19)$$

In other words, the detail at resolutions finer than $k$ carries no information about the label field $X^k$ beyond that already contained in the observation field $Y^k$ at resolution $k$. In this case, $P(Y \mid Y^k)$ may be cancelled between the numerator and denominator in (17) (the denominator factorizes in the same way once (19) is summed over $x^k$ with weights $P(x^k \mid x^{k-1})$), giving

$$P(X^k \mid Y, X^{k-1}) = \frac{P(Y^k \mid X^k)\, P(X^k \mid X^{k-1})}{P(Y^k \mid X^{k-1})}, \qquad (20)$$

so that the inference for the labels $X^k$ need only be based on the observation field $Y^k$ at that resolution, giving

$$\hat{x}^k = \arg\max_{x^k} P(x^k \mid y^k, \hat{x}^{k-1}). \qquad (21)$$

Although (19) is a strong requirement, which is not an immediate consequence of the model structure, it turns out to be approximately satisfied in the applications. A heuristic explanation is that the details, in high-frequency wavelet bands, are concentrated in the regions defined by boundaries at coarser scales, where labels are determined by neighbors, not ancestors; within regions, these bands represent noise, which is independent of the region class. This holds for the gray-level images shown in the applications. For the textured images, it applies not to the image data but to the texture features upon which the segmentation is based.

To implement the estimator in a practical application, if the prior has neighbor interactions, only a stochastic (sampling) method is guaranteed to reach the posterior mode, and then only asymptotically. Mignotte et al. report satisfactory results using the ICM algorithm [29]. In our experience, a Gibbs sampler using simulated annealing (SA) gives better results at similar computational cost. The resulting multiresolution estimator, which is called Sequential Multiresolution MAP (SMMAP), is summarized thus:

• At level $k = k_{\min}$, initialize $\hat{x}^k$ with random labels, chosen uniformly and independently from the range $[0, M]$, where $M$ can be set to any number greater than the number of classes expected in a given problem.

• For levels $k = k_{\min}, \ldots, k_{\max}$:

   1. Run SA for (21) on level $k$ using the annealing schedule

      $$T_t = \frac{C_k}{\log(1 + t)}, \qquad (22)$$

      where the constant $C_k$ is varied linearly with $k$, with higher values at low $k$.

   2. On convergence, project $\hat{x}^k$ to level $k+1$ as the initial estimate $\hat{x}^{k+1} = \hat{x}^{k*}$.

Note that, although SA requires random site visits to guarantee that the equilibrium distribution will be reached, many authors have noted that satisfactory results are obtained using a raster scan. This approach has been used in the experiments shown below. A practical choice for $k_{\min}$ is 3 or 4, given that neighbor interactions involve a 3 × 3 window, while $k_{\max}$ is determined by the smallest window used to estimate features, i.e., 1 pixel for gray-level images and 4 × 4 pixels for the textured images.
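The SMMAP steps above can be sketched for the binary, conditionally normal setting of Section 6. This is an illustration under stated assumptions, not the authors' implementation. The image is assumed square with side $2^N$, and the observation fields $Y^k$ are formed by 2 × 2 block averaging. The prior uses the 8-neighbors plus the quadtree father with hypothetical constants `b_nb` and `b_fa`, and the class means are fixed at 0 and 1. Labels are restricted to {0, 1}, i.e., $M$ is set to the known number of classes. $C_k$ decreases linearly from a hypothetical `c0` at $k_{\min}$, and a fixed number of raster-scan sweeps replaces a convergence test.

```python
import numpy as np

# Illustrative SMMAP sketch; all names and constants are hypothetical.
def block_mean_pyramid(image, k_min):
    """Observation fields Y^k, k = N, ..., k_min, by 2x2 block averaging (eq. (28))."""
    n_top = int(np.log2(image.shape[0]))
    levels = {n_top: image.astype(float)}
    for k in range(n_top, k_min, -1):
        y = levels[k]
        levels[k - 1] = 0.25 * (y[0::2, 0::2] + y[1::2, 0::2] + y[0::2, 1::2] + y[1::2, 1::2])
    return levels

def anneal_level(x, y, father, n_k, sigma2, b_nb, b_fa, c_k, sweeps, rng):
    """Gibbs sampler on one level, raster scan, schedule T_t = C_k / log(1 + t), eq. (22)."""
    h, w = x.shape
    means = (0.0, 1.0)                          # mu_s(m) for m = 0, 1, assumed known
    for t in range(1, sweeps + 1):              # t starts at 1 because log(1) = 0
        temp = c_k / np.log(1.0 + t)
        for i in range(h):
            for j in range(w):
                window = x[max(i - 1, 0):i + 2, max(j - 1, 0):j + 2]  # 8-neighbours plus s
                energy = np.empty(2)
                for m in (0, 1):
                    # Data term of eq. (27).
                    e = n_k * (means[m] - y[i, j]) ** 2 / (2.0 * sigma2)
                    # Neighbour potentials, eq. (5); site s itself is excluded from the count.
                    e += b_nb * (np.count_nonzero(window != m) - int(x[i, j] != m))
                    if father is not None:      # father potential, eq. (5)
                        e += b_fa * int(father[i // 2, j // 2] != m)
                    energy[m] = e
                p = np.exp(-(energy - energy.min()) / temp)
                x[i, j] = int(rng.random() < p[1] / p.sum())
    return x

def smmap(image, sigma2, k_min=3, b_nb=0.7, b_fa=1.5, c0=4.0, sweeps=10, seed=0):
    """Sequential coarse-to-fine estimate of a binary label field."""
    rng = np.random.default_rng(seed)
    ys = block_mean_pyramid(image, k_min)
    k_max = max(ys)
    x = rng.integers(0, 2, size=ys[k_min].shape)         # random labels at k_min
    father = None
    for k in range(k_min, k_max + 1):
        n_k = 4 ** (k_max - k)                             # pixels per block B_s
        c_k = c0 * (k_max - k + 1) / (k_max - k_min + 1)   # linear in k, highest at k_min
        x = anneal_level(x, ys[k], father, n_k, sigma2, b_nb, b_fa, c_k, sweeps, rng)
        if k < k_max:
            father = x
            x = np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)  # projection, eq. (8)
    return x
```

For example, `smmap(noisy, sigma2=0.25)` returns a label array of the same size as `noisy`. Because each level starts from the projection of the level above, most sweeps at fine levels only adjust sites near boundaries.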

The SMMAP algorithm has a number of practical advantages over other methods:

• Because it is sequential, it is easy to tailor the annealing on each scale so that a slower schedule, which is relatively cheap at large scales because there are few sites, can be used to overcome local modes in the posterior, as in multitemperature annealing [17].

• The copy configuration can also be used to initialize the estimation at each level below the top level of the tree, which speeds convergence.

• Although a Gibbs sampler can be used if the posterior is suitable, there are many other sampling schemes, such as Metropolis-Hastings, which can be used if the posterior is hard to sample from [34].
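To illustrate the last point, a single-site Metropolis-Hastings update needs only energy differences, not the normalized conditional distribution that a Gibbs sampler draws from. The sketch below assumes a user-supplied local energy function (the name `local_energy` is hypothetical) and a symmetric proposal that draws a new label uniformly from 0, ..., n_labels - 1, so that the acceptance probability reduces to $\min(1, e^{-\Delta E / T})$.

```python
import numpy as np

# Illustrative sketch; `local_energy` is a hypothetical user-supplied function.
def metropolis_site_update(x, i, j, local_energy, n_labels, temp, rng):
    """One Metropolis-Hastings step at site (i, j) of label array x.

    local_energy(x, i, j, m) must return the posterior energy of label m at
    (i, j) given the rest of x (likelihood term plus prior potentials).
    Returns True if the proposal was accepted.
    """
    current = x[i, j]
    proposal = rng.integers(n_labels)               # symmetric, uniform proposal
    if proposal == current:
        return False
    delta = local_energy(x, i, j, proposal) - local_energy(x, i, j, current)
    if delta <= 0 or rng.random() < np.exp(-delta / temp):
        x[i, j] = proposal
        return True
    return False
```

Since only two energies are evaluated per step, this remains usable when normalizing over all labels is expensive or when the local posterior has no convenient closed form.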

The main weakness of SMMAP is that it does not, in general, maximize the joint posterior for $X$ given $Y$. Achieving simultaneous maximization is the goal of the multiresolution MAP (MMAP) estimator. For the above problem,

$$P(X^k, X^{k-1}, \ldots, X^{k_{\min}} \mid Y) = P(X^{k_{\min}} \mid Y) \prod_{i=k_{\min}+1}^{k} P(X^i \mid X^{i-1}, Y), \qquad (23)$$

using the Markov property of the sequence $X^i$. Maximizing this product requires sampling from the distributions on the right-hand side. Provided these conditional distributions are known exactly, Gibbs sampling can be used in the following way:

1. Choose a site $s \in Q$ at random.
2. Update its state by sampling from the posterior $P(x_s \mid y, \hat{x}_r, r \in \mathcal{N}_s \cup \mathcal{P}_s)$, where $\hat{x}_r$ is the current estimate at site $r$.

As long as each site is visited sufficiently often and a Gibbs sampler is used, it follows directly from Theorems A and B in [12] that the invariant distribution for the sampler is the product in (23). Although this is potentially superior to SMMAP, MMAP does not easily allow for multitemperature annealing, which is a significant weakness. There may also be applications where, because of the models used, it is impossible to sample directly from the posterior distribution.

5 PARAMETER ESTIMATION

Estimation of model parameters has been addressed by a succession of authors, from Besag [2] to Bouman and Shapiro [5], Kato et al. [17], Laferte et al. [20], and, most recently, Mignotte et al. [29]. Most methods rely on likelihood or pseudolikelihood maximization, which is straightforward if applied at image resolution but can cause problems at high levels of a quadtree, where there are insufficient data to support models of high complexity. The problem is particularly acute when the number of classes is unknown since, even with the normal model of (12), there are at least 2 independent data parameters per class, in addition to the prior parameters.

In the prior model of (3), there are just two parameters per level, $P(X_s = X_r, r \in \mathcal{N}_s)$ and $P(X_s = X_{p(s)})$, equivalent to the parameters $b_{kk}$ and $b_{k,k-1}$ in (3). The intralevel parameter can be estimated using maximum pseudolikelihood during the segmentation process. Initial estimates are based on the copy configuration at all levels below $k_{\min}$. At $k_{\min}$, it is possible to use the simple approximation that the proportion of boundary cliques is $2^{-(1+k_{\min})}$, based on a single boundary running from top to bottom of the image. Of course, the father-child interaction cannot be approached in this way. While the estimator used by Mignotte et al. [29] and the EM estimator of Bouman and Shapiro [5] are suitable for certain problems, they cannot be used in the segmentation of textures having a cell size greater than the block size at high resolutions. The example in Fig. 11 illustrates the difficulty: the reptile skin texture element consists of two quite distinct types of region, the light ribs and the dark center. At small scales, these will appear as separate regions, not elements of the same texture. This difficulty can be avoided using the prior of (10) with $\alpha_k = 0.5$, which has been found adequate in practice.
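The initial value $2^{-(1+k_{\min})}$ can be recovered by simple counting, assuming first-order cliques. A $2^k \times 2^k$ level has $2 \cdot 2^k(2^k - 1)$ horizontal and vertical pairs, and a single vertical boundary cuts one horizontal pair in each of the $2^k$ rows, giving a proportion $1/(2(2^k - 1)) \approx 2^{-(k+1)}$. The Python sketch below checks that count. It then shows one way to estimate $b_{kk}$ by maximum pseudolikelihood, using a grid search over candidate values. The first-order neighborhood, the grid, the fixed label range, and the function names are all assumptions for illustration, and the father-child term is deliberately left out.

```python
import numpy as np

# Illustrative sketch; first-order cliques and all names are assumptions.
def boundary_clique_fraction(labels):
    """Fraction of first-order (horizontal + vertical) cliques whose labels differ."""
    horiz = labels[:, 1:] != labels[:, :-1]
    vert = labels[1:, :] != labels[:-1, :]
    return (horiz.sum() + vert.sum()) / (horiz.size + vert.size)

def neighbour_disagreements(labels, n_labels):
    """counts[m, i, j] = number of first-order neighbours of (i, j) whose label differs from m."""
    h, w = labels.shape
    counts = np.zeros((n_labels, h, w))
    padded = np.pad(labels, 1, constant_values=-1)       # -1 marks "outside the lattice"
    for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        nb = padded[1 + di:1 + di + h, 1 + dj:1 + dj + w]
        inside = nb >= 0
        for m in range(n_labels):
            counts[m] += inside & (nb != m)
    return counts

def mpl_estimate(labels, n_labels, grid=np.linspace(0.0, 3.0, 61)):
    """Maximum pseudolikelihood estimate of b_kk for the intralevel part of eq. (3)."""
    counts = neighbour_disagreements(labels, n_labels)
    observed = np.take_along_axis(counts, labels[None], axis=0)[0]
    best_b, best_pl = None, -np.inf
    for b in grid:
        log_norm = np.log(np.exp(-b * counts).sum(axis=0))
        pl = (-b * observed - log_norm).sum()
        if pl > best_pl:
            best_b, best_pl = b, pl
    return best_b
```

On a noise-free two-region map, the pseudolikelihood increases monotonically in $b_{kk}$, so the largest grid value is returned. On independently random labels, the estimate stays near zero. This is one reason to compute the estimate on the evolving labeling during annealing, as the text describes, rather than on an idealized map.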

The observation parameters also require estimation. Using the SMMAP or MMAP algorithms, sample averages for the mean and variance parameters of either observation model are easily obtained from the current labeling during the annealing process. Using the symmetric observation model of (14), (15), only two data parameters are required, independent of the number of classes.
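Under the symmetric model, the two variances can be estimated from the current labeling alone. The sketch below is an assumption about how this might be done, not the authors' estimator: it treats each first-order neighbor difference of block means as zero-mean normal, with variance $\sigma_w^2$ when the two sites share a label and $\sigma_o^2$ otherwise, and returns the maximum-likelihood (mean-square) estimate of each. The function name is hypothetical.

```python
def estimate_pair_variances(ybar, labels):
    """Estimate (sigma_w^2, sigma_o^2) from neighbor differences of block means.

    ybar: 2-D list of block means; labels: 2-D list of class labels, same shape.
    Each first-order pair contributes d^2 = (ybar_s - ybar_r)^2 to the
    within-class list if the labels agree, otherwise to the between-class list.
    """
    within, between = [], []
    rows, cols = len(labels), len(labels[0])
    for i in range(rows):
        for j in range(cols):
            for ni, nj in ((i + 1, j), (i, j + 1)):  # each pair counted once
                if ni < rows and nj < cols:
                    d2 = (ybar[i][j] - ybar[ni][nj]) ** 2
                    if labels[i][j] == labels[ni][nj]:
                        within.append(d2)
                    else:
                        between.append(d2)

    def mean(xs):
        return sum(xs) / len(xs) if xs else float("nan")

    return mean(within), mean(between)

print(estimate_pair_variances([[0, 0, 4], [0, 0, 4]], [[0, 0, 1], [0, 0, 1]]))
```

Only two numbers come out regardless of how many labels are present, which is the property the text relies on when the number of classes is unknown.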

To summarize, the choice of prior and observation models has significant consequences for the tractability of the parameter estimation problem, in particular when the number of classes may be unknown.

6 APPLICATIONS

As a simple example, consider Fig. 4a, which shows the gray-level pyramid obtained from the image at the bottom level of Fig. 3, corrupted by additive white Gaussian noise of unit variance. This image was synthesized using a second-order neighborhood. The estimation problem is to reconstruct the original binary pyramid from these data. The data are conditionally normal:

$$p(Y_s \mid X_s = m) = \mathcal{N}(m, \sigma^2), \quad m = 0, 1, \tag{24}$$

where the variance is $\sigma^2$. The MMAP estimate is defined by

$$\hat{x}_s = \arg\max_m P(X_s = m \mid \mathbf{y}_s, \hat{x}_r,\ r \in N_s \cup P_s), \tag{25}$$

where $\mathbf{y}_s$ are the noisy data associated with site $s$: the block $B_s$ of $2^{N-k} \times 2^{N-k}$ pixels at $(2^{N-k} i_s, 2^{N-k} j_s)$. Because the noise is independent and identically distributed, the likelihood is

$$P(\mathbf{y}_s \mid X_s = m) = (2\pi\sigma^2)^{-n_k/2} \exp\left[-\sum_{(i,j)\in B_s} \frac{\big(f_{ij} - \mu_s(m)\big)^2}{2\sigma^2}\right], \tag{26}$$

where $\mu_s(m)$ is the mean intensity at image level, given that $X_s = m$, and $f_{ij}$ is the intensity at pixel $(i, j)$. The posterior probability is then

$$P(X_s = m \mid \mathbf{y}_s, X_r,\ r \in N_s \cup P_s) \propto \exp\left[-\frac{n_k\big(\mu_s(m) - \bar{y}_s\big)^2}{2\sigma^2}\right] P(X_s = m \mid X_r,\ r \in N_s \cup P_s), \tag{27}$$

where $\bar{y}_s$ is the sample mean of $Y_s$ over the $n_k$ pixels in $B_s$:

$$\bar{y}_s = n_k^{-1} \sum_{(i,j)\in B_s} f_{ij}. \tag{28}$$
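The step from (26) to (27) rests on the standard decomposition of a sum of squares about the block mean, which shows that $\bar{y}_s$ is all the posterior needs from the block:

$$\sum_{(i,j)\in B_s}\big(f_{ij}-\mu_s(m)\big)^2 = \sum_{(i,j)\in B_s}\big(f_{ij}-\bar{y}_s\big)^2 + n_k\big(\bar{y}_s-\mu_s(m)\big)^2,$$

since the cross term $2\big(\bar{y}_s-\mu_s(m)\big)\sum_{(i,j)\in B_s}\big(f_{ij}-\bar{y}_s\big)$ vanishes by (28). The first term on the right does not depend on $m$, so it cancels when (26) is normalized over $m$, leaving exactly the exponent in (27).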

This conforms to the model of (19) and results in the use of the quadtree shown in Fig. 4a, which is formed from the noisy image data by block averaging. The estimates were initialized by thresholding level 4 in the data pyramid and thereafter using Gibbs sampling. After 100 iterations at each level, the result in Fig. 4b was obtained. The error rates at the various levels for each estimator are shown in Table 1. Although only a few iterations were used, the results at high resolutions are significantly better than those obtained by simply copying the initial level or by using a conventional 8-neighbor MRF estimator. Of the estimators, the MMAP algorithm performed better than either CMAP or SMMAP. Closer examination shows that the MMAP estimate gives a more or less constant error rate of 50 percent per boundary pixel across a range of scales. This is because the data are uncertain in these areas: averaging across the boundary does not improve this. Fig. 5 shows the number of sites changing on each iteration for each of the four levels. After an initial burst at each level, occupying a few iterations, the sampler settles down to a steady state where only a handful of sites change on any iteration. Fig. 6 shows that, even with a random initial configuration, the sampler quickly converges to a point where only a few sites change on each iteration. Note that one iteration here refers to a scan-order visit to every pixel on a given level.
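The per-level sampling described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the prior of (3) is replaced by a hypothetical Potts-style energy with an intralevel weight `beta` over the 8 neighbors (second-order neighborhood) and a father-child weight `gamma`, the class means are 0 and 1 as in (24), and the data term is the exponent of (27). The function name is hypothetical; it returns the number of changed sites, the quantity plotted in Figs. 5 and 6.

```python
import math
import random

def gibbs_sweep(labels, ybar, parent, n_k, sigma2, beta, gamma, temperature=1.0, rng=random):
    """One scan-order Gibbs sweep over a binary label field on one quadtree level.

    labels: 2-D list of 0/1 labels, updated in place.
    ybar: 2-D list of block means of the noisy data, as in (28).
    parent: 2-D list of labels on the level above (half the size in each direction).
    beta, gamma: hypothetical prior weights for neighbor and father disagreement.
    Returns the number of sites whose label changed.
    """
    rows, cols = len(labels), len(labels[0])
    changes = 0
    for i in range(rows):
        for j in range(cols):
            energy = []
            for m in (0, 1):
                # Data term from (27), with class mean mu(m) = m.
                e = n_k * (m - ybar[i][j]) ** 2 / (2.0 * sigma2)
                # Intralevel term: penalize disagreement with each of the 8 neighbors.
                for di in (-1, 0, 1):
                    for dj in (-1, 0, 1):
                        ni, nj = i + di, j + dj
                        if (di, dj) != (0, 0) and 0 <= ni < rows and 0 <= nj < cols:
                            e += beta * (labels[ni][nj] != m)
                # Interlevel term: penalize disagreement with the father site.
                e += gamma * (parent[i // 2][j // 2] != m)
                energy.append(e / temperature)
            # P(X_s = 1 | rest) = 1 / (1 + exp(E1 - E0)), computed without overflow.
            d = energy[1] - energy[0]
            p1 = math.exp(-d) / (1.0 + math.exp(-d)) if d >= 0 else 1.0 / (1.0 + math.exp(d))
            new = 1 if rng.random() < p1 else 0
            if new != labels[i][j]:
                changes += 1
                labels[i][j] = new
    return changes

# Noise-free toy level: left half 0, right half 1, random initial labels.
rng = random.Random(0)
truth = [[0 if j < 4 else 1 for j in range(8)] for _ in range(8)]
father = [[0 if j < 2 else 1 for j in range(4)] for _ in range(4)]
labels = [[rng.randint(0, 1) for _ in range(8)] for _ in range(8)]
counts = [gibbs_sweep(labels, truth, father, n_k=16, sigma2=0.1, beta=1.0, gamma=1.0, rng=rng)
          for _ in range(3)]
print(counts)
```

With strong data, as here, the first sweep corrects every wrong label and later sweeps change nothing; with realistic noise the count decays more gradually, as in Fig. 5.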

A more realistic case is the image shown in Fig. 7a, which again is a binary image with added white Gaussian noise, with

WILSON AND LI: A CLASS OF DISCRETE MULTIRESOLUTION RANDOM FIELDS AND ITS APPLICATION TO IMAGE SEGMENTATION 47

a standard deviation of 1, i.e., equal to the difference between object and background intensities. In this experiment, the prior model used the second-order neighborhood and the observation model of (12) was used. Prior parameters were tuned by hand, and the class means and variances were estimated from the data during the SA processing. The estimation error at the highest resolution, using the MMAP algorithm, was 1.3 percent. To the extent that comparisons between results based on different images are possible, this compares favorably with the results reported on similar problems in the literature [5], [21], [36], [20], [17], [29]. The resulting estimate is shown in Fig. 7c; apart from the corners, where the model does not fit the data, the estimate is visually quite good. A similar result, with an error rate of 1.9 percent, was obtained using ICM within the SMMAP algorithm. The best result obtained with a pure quadtree process, i.e., $b_{kk} = 0$ in (3), was an error rate of 3.1 percent, broadly comparable with the SMAP result on image 1 in [5] or that in Fig. 5 of [20]. The result shown in Fig. 7d, obtained on the input image with an SNR of 5 dB shown in Fig. 7b, has an error rate of 0.46 percent, which again compares favorably with that shown in Fig. 5 of [17], a case similar in terms of shape and SNR. The ICM-SMMAP algorithm gave a 0.47 percent error rate in this case, virtually identical to the SA result.

48 IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 25, NO. 1, JANUARY 2002

Fig. 4. Multiresolution MAP estimation of noisy image conforming to 8-neighbor MRF from Fig. 3a. Noise is additive white Gaussian noise, SNR = 0 dB. (a) Quadtree averaged noisy image. (b) MMAP estimate from (a).

TABLE 1. Error Rates in MMAP Estimates at Various Levels, Compared with Rate from 8-Neighbor MRF and from Thresholding at Level 4 and Copying

The final set of results using this model shows two slices from a CT scan of a vertebra, which have been segmented into bone and soft tissue using the SA-SMMAP algorithm. These images are also 256 × 256 pixels, at 8 bits/pixel resolution. As is clear from Figs. 8a and 8b, the two classes have very different means and variances: the bone, although lighter on average, is highly variable in its opacity, and the edges of the two bone segments are ill-defined. In these tests, only two levels of resolution were required. The prior parameters were again hand tuned. Fig. 8 shows the original images, along with the results of the SMMAP algorithm, Figs. 8e and 8f, and of single resolution MAP using the same second-order MRF model on the original image, Figs. 8c and 8d. To facilitate comparison, the results are shown superposed on the original data as boundary lines. The MAP estimator was initialized with the same starting configuration as the SMMAP on the image level and had exactly the same prior parameter $b_{kk}$. Note that the single resolution result in Fig. 8c shows fusing of two bone regions, which are clearly separate, while the effect of the parent level is sufficient to keep them separate in the SMMAP estimate. In Fig. 8c, a dark region has also been misclassified as bone because the class variances (17.5 for soft tissue versus 284.4 for bone) are such that the likelihood of bone is much higher, despite the mean for soft tissue being 66.5 against 100.0 for bone. The parent level prevents this in the SMMAP estimate. Although no ground truth data are available for these images, it is clear from these results that the MMRF model avoids the artifacts caused by a conventional MRF approach.
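The misclassification in Fig. 8c follows directly from the quoted class statistics. The sketch below evaluates the Gaussian log-likelihood of a single gray level under each class, using the means and variances given in the text and no prior at all; the test gray levels and function names are hypothetical.

```python
import math

# Class statistics quoted in the text for the CT data: (mean, variance).
CLASSES = {"soft tissue": (66.5, 17.5), "bone": (100.0, 284.4)}

def gaussian_loglik(y, mean, var):
    """Log of the normal density N(y; mean, var)."""
    return -0.5 * math.log(2.0 * math.pi * var) - (y - mean) ** 2 / (2.0 * var)

def ml_label(y):
    """Pixelwise maximum-likelihood class, ignoring the prior entirely."""
    return max(CLASSES, key=lambda c: gaussian_loglik(y, *CLASSES[c]))

for y in (40.0, 66.5, 100.0):  # hypothetical gray levels
    print(y, ml_label(y))
```

With these numbers, a dark value of 40 is assigned to bone even though it lies much closer to the soft-tissue mean; only values within about a dozen gray levels of 66.5 favor soft tissue. A single resolution MAP estimator has only its intralevel prior to counter this, whereas in the SMMAP estimate the parent level supplies the missing context.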

The second test of the model is more demanding: segmenting images containing an unknown number of regions of more or less homogeneous texture. This uses the prior of (3), with a first-order neighborhood, again with only one parameter to estimate: the probability that neighboring sites on a given level have the same class. The father-child probability in this case used the model of (10), with its parameter set to 0.5 for $k > k_{\min}$. This avoids the oversegmentation of structured textures, at the expense of having no region significantly smaller than the block size at level $k_{\min}$, which was 32 × 32 pixels in the examples shown.

The texture model underlying the measurements builds on the local spectrum estimation described in [36], using the two-component model presented in [15]. The important features of this model are its ability to model textures containing large-scale structures with a small number of parameters and to deal effectively with variations in geometry and intensity within regions. The difficulty of capturing such features with a conventional model, such as an MRF model, or with cooccurrence statistics [14], is illustrated in the first example in Fig. 11: the reptile skin (right side) consists of approximately hexagonal cells, whose size and orientation vary across the sample, while each cell consists of a dark center surrounded by light edges. The average cell size is about 20 pixels, implying a neighborhood of about 20 × 20 = 400 pixels for a (nonlinear) MRF texture model and a similar size of array for cooccurrence calculations. There are simply not enough data within the window associated with each site to get adequate estimates of joint probabilities on this scale, even if computational cost were not an issue. Correspondingly, the observation model in these tests is that of (14), applied independently to four measurements of textural quality. These measurements are based on the "deterministic+stochastic" decomposition, a generalization of the Wold decomposition [25], [10] of signals, first presented in [15]. The four components are:

1. The difference between the average gray levels in the blocks:

$$V_1(\mathbf{y}_s, \mathbf{y}_r, x_s, x_r) = \frac{(\bar{y}_s - \bar{y}_r)^2}{2\sigma^2_{1 x_s x_r}}, \tag{29}$$

where $\bar{y}_s$ is, as before, the mean intensity at a site. This is the only measure used between father and child sites.

Fig. 5. Plot of the number of changes on each iteration of the sampler, for levels 8 (darkest) up to 5 (lightest).

Fig. 6. Number of changes on each iteration, for random initial configuration (top) and state copied from father (bottom) on level 8 of the pyramid.

2. Two measures associated with the deterministic component, based on an affine deformation model

$$f_s(\vec{\xi}) = f_r\big(\mathbf{A}_{sr}^{-1}(\vec{\xi} - \vec{\xi}_{sr})\big) + \varepsilon_{sr}(\vec{\xi}), \tag{30}$$

where $\vec{\xi}$ represents the image coordinate, $f_s(\cdot)$ represents the patch of the image centered at site $s$, site $r$ is a 4-neighbor of site $s$, and $\mathbf{A}_{sr}$ is the $2 \times 2$ nonsingular linear coordinate transform and $\vec{\xi}_{sr}$ the translation which together give the best fit, in terms of total deformation energy, between the two patches. These are identified using the method described in [15], which makes use of local Fourier spectra calculated at the appropriate scale using the Multiresolution Fourier Transform (MFT) [35]. The deformation energy consists of:

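a. The deformation term $\|\mathbf{A}_{sr} - \mathbf{I}\|^2$ represents the amount of "warping" required to match the given patch using its neighbor:

$$V_2(\mathbf{y}_s, \mathbf{y}_r, x_s, x_r) = \sum_{i,j=1}^{2} \frac{\big(A_{sr,ij} - \delta_{ij}\big)^2}{2\sigma^2_{2 x_s x_r}}. \tag{31}$$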

b. The error term $\|\varepsilon_{sr}(\vec{\xi})\|^2$ is the average residual error in the approximation:

$$V_3(\mathbf{y}_s, \mathbf{y}_r, x_s, x_r) = \sum_{(i,j)\in B_s} \frac{\big(f_{ij} - f^{(r)}_{ij}\big)^2}{2\sigma^2_{3 x_s x_r}}, \tag{32}$$

where $B_s$ is the block of pixels associated with site $s$, $f_{ij}$ is, as above, the image intensity and $f^{(r)}_{ij}$ its approximation using the warped patch, as in (30).

3. A measure for the stochastic component, based on differences in the spectral energy densities estimated at each site via the MFT, $|\hat{f}(\vec{\xi}, \vec{\omega}, \sigma)|$, where

$$\hat{f}(\vec{\xi}, \vec{\omega}, \sigma) = \frac{1}{\sqrt{\sigma}} \int d\vec{x}\, f(\vec{x})\, w\!\left(\frac{\vec{x} - \vec{\xi}}{\sigma}\right) e^{-j\vec{\omega}\cdot\vec{x}} \tag{33}$$

is the (continuous) MFT at spatial coordinate $\vec{\xi}$, frequency $\vec{\omega}$, and scale $\sigma$ [35], computed at the appropriate window scale $\sigma_k = 2^{N-k}$ at the site $s \in Q_k$:

$$V_4(\mathbf{y}_s, \mathbf{y}_r, x_s, x_r) = \sum_{(i,j)\in B_s} \frac{\big(|\hat{f}_{sij}| - |\hat{f}_{rij}|\big)^2}{2\sigma^2_{4 x_s x_r}}, \tag{34}$$

where $\hat{f}_{sij}$ is the MFT coefficient at site $s$ and frequency coordinate $(i, j)$, computed via a windowed FFT algorithm.
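The spectral measure of (34) can be sketched with an FFT. Here a separable Hann window and `numpy.fft.fft2` stand in for the windowed-FFT computation of the MFT at one scale; the window choice, block size, test patterns, and function names are assumptions for illustration, and `sigma2` is whichever of the two variance values applies to the label pair.

```python
import numpy as np

def magnitude_spectrum(block):
    """|FFT| of a Hann-windowed block (a stand-in for |MFT| at one scale)."""
    win = np.outer(np.hanning(block.shape[0]), np.hanning(block.shape[1]))
    return np.abs(np.fft.fft2(block * win))

def v4(block_s, block_r, sigma2):
    """Sketch of (34): summed squared differences of spectral magnitudes / (2 sigma^2)."""
    diff = magnitude_spectrum(block_s) - magnitude_spectrum(block_r)
    return float(np.sum(diff ** 2) / (2.0 * sigma2))

# Hypothetical patterns: vertical stripes, the same stripes shifted, horizontal stripes.
x = np.arange(32)
vertical = np.tile(np.cos(2 * np.pi * x / 8), (32, 1))
shifted = np.tile(np.cos(2 * np.pi * (x + 2) / 8), (32, 1))
horizontal = vertical.T

print(v4(vertical, shifted, 1.0), v4(vertical, horizontal, 1.0))
```

Because only magnitudes enter (34), a shifted copy of the same texture gives a small potential, while a rotated texture gives a large one; the phase-sensitive geometry is left to the deterministic terms (31) and (32).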

The process of modeling the texture at a pair of sites in this way is illustrated in Fig. 9. In particular, the magnitude spectra of the windowed data are used both to estimate the matrix $\mathbf{A}_{sr}$, using the two pairs of centroid vectors, and as a measure of stochastic textural similarity. Note that each of the variance parameters $\sigma^2_{i x_s x_r}$ can take only two values, namely $\sigma^2_{iw}$ when $x_s = x_r$ and $\sigma^2_{io}$ when $x_s \neq x_r$, as in (15).


Fig. 7. Segmentation of noisy “Shapes” image. (a) 0 dB SNR image. (b) 5 dB SNR image. (c) MMAP segmentation of (a). (d) MMAP segmentation of (b).

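Combining these four potentials, the posterior can finally be expressed as

$$P(x_s \mid \mathbf{y}_s, \mathbf{y}_r, x_r,\ r \in N_s \cup P_s) \propto \prod_{r\in N_s} P(x_s \mid x_r) \exp\left[-\sum_{i=1}^{4} V_i(\mathbf{y}_s, \mathbf{y}_r, x_s, x_r)\right] \times P(x_s \mid x_{p(s)}) \exp\left[-V_1(\mathbf{y}_s, \mathbf{y}_{p(s)}, x_s, x_{p(s)})\right]. \tag{35}$$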
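One way to evaluate (35) for a single site is to accumulate log-probabilities for each candidate label and normalize. In the sketch below, which is illustrative only, each potential $V_i$ is passed as its label-independent numerator $D_i$ (for example $(\bar{y}_s - \bar{y}_r)^2$ for $V_1$), and the symmetric model picks $\sigma^2_{iw}$ or $\sigma^2_{io}$ according to whether the labels agree. The prior weights `p_same` and `p_father_same`, the simplification of treating $P(x_s \mid x_r)$ as `p_same` or `1 - p_same`, and all function names are hypothetical.

```python
import math

def pair_energy(m, x_r, dists, sig2_w, sig2_o):
    """Sum of V_i = D_i / (2 sigma_i^2) over the measurements in `dists`,
    using the within-class variances if the labels agree, else the between-class ones."""
    variances = sig2_w if m == x_r else sig2_o
    return sum(d / (2.0 * v) for d, v in zip(dists, variances))

def site_posterior(candidates, nbr_labels, nbr_dists, father_label, father_d1,
                   sig2_w, sig2_o, p_same=0.9, p_father_same=0.5):
    """Normalized posterior of (35) over candidate labels for one site.

    nbr_dists[r] holds the four numerators D_1..D_4 for neighbor r;
    father_d1 is the numerator of V_1 between the site and its father.
    """
    log_post = []
    for m in candidates:
        lp = 0.0
        for x_r, dists in zip(nbr_labels, nbr_dists):
            prior = p_same if m == x_r else 1.0 - p_same
            lp += math.log(prior) - pair_energy(m, x_r, dists, sig2_w, sig2_o)
        # Only V_1 links a site to its father, as noted after (29).
        prior_f = p_father_same if m == father_label else 1.0 - p_father_same
        lp += math.log(prior_f) - pair_energy(m, father_label, [father_d1],
                                              sig2_w[:1], sig2_o[:1])
        log_post.append(lp)
    top = max(log_post)  # subtract the maximum before exponentiating, for stability
    weights = [math.exp(v - top) for v in log_post]
    total = sum(weights)
    return {m: w / total for m, w in zip(candidates, weights)}

post = site_posterior(candidates=[0, 1],
                      nbr_labels=[0, 0, 0, 0],
                      nbr_dists=[[0.1] * 4] * 4,
                      father_label=0, father_d1=0.1,
                      sig2_w=[1.0] * 4, sig2_o=[4.0] * 4)
print(post)
```

Note the balance this form implies: with $\sigma^2_{io} > \sigma^2_{iw}$, a disagreeing label always pays less data energy for a given difference, so it is the prior that favors agreement, and only large feature differences overturn it.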

In addition, a line process has been introduced to increase the accuracy of the segmentations, especially at coarse scales, at low computational cost. It is based on an assumption of smoothness of the boundary, since texture measurements require a minimum sample size, which corresponds to a sampling interval of 4 × 4 pixels with the above texture measures. As in [12], [11], [13], the line process is based on pairwise interactions between neighboring boundary blocks. The model is based on the oriented line joining the estimated positions of the putative boundary in each block. Boundary processing is also a simulation designed to find the Bayesian estimate, but it occurs after the regions have been identified on a given level. Only region sites having neighbors which belong to a different class are identified as potentially boundary-containing, and the process is run on those alone. This is illustrated in Fig. 10 for the image shown in Fig. 11.

Fig. 8. Segmentation of CT images of a vertebra. (a) First slice from CT data set. (b) Third slice from CT data. (c) Single resolution MAP segmentation of (a). (d) Single resolution MAP segmentation of (b). (e) MMAP segmentation of (a). (f) MMAP segmentation of (b).

From the set of potential boundary sites on level $k$, $B_k$, a subset is selected by stochastic labeling, using a potential function which penalizes curvature in the line joining the estimated centroids of the putative boundary segment in each block. The potential depends on local measurements $\mathbf{y}^b_s$ of the boundary orientation and position in the block of pixels associated with neighboring sites:

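$$V^b(\mathbf{y}^b_s, \mathbf{y}^b_r \mid \lambda_s, \lambda_r) = \frac{(\sin\theta_s + \sin\theta_r)^2\, \|\vec{l}_{sr}\|^2}{2\sigma^{b\,2}_{\lambda_s \lambda_r}} \quad \text{if } r, s \in B_k \text{ and } r \in N_s, \tag{36}$$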

where the vector $\vec{l}_{sr}$ joins the feature centroids at sites $s$ and $r$, and $\theta_s, \theta_r$ are the angles between that vector and the features at the two sites, as in Fig. 1. The centroid position and boundary angle at a site are estimated using the MFT-based technique first described in [35]. In this way, both texture and boundary features can be computed within the same framework [23]. It is assumed that, a priori, a pair of neighbors $r, s \in B_k$ has a probability of 0.5 of being connected. In this case, there are again two variance values: the one used when both sites are boundary sites (both labels equal to 1), $\sigma^{b\,2}_w$, is estimated from the bottom half of the sorted list of distances, while the one used if either site is nonboundary is estimated from the top 50 percent of distances. A summary of the boundary labeling algorithm is then:
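The curvature potential (36) can be computed directly from two estimated centroids and boundary directions. The sketch below assumes that $\theta_s$ and $\theta_r$ are the unsigned acute angles between $\vec{l}_{sr}$ and each boundary direction, so that any departure from a straight line is penalized; that reading of Fig. 1, the mean-square estimator in `split_variances`, and all names are assumptions rather than details given in the text.

```python
import math

def sin_acute_angle(u, v):
    """sin of the acute angle between vector u and the line along vector v."""
    cross = u[0] * v[1] - u[1] * v[0]
    return abs(cross) / (math.hypot(*u) * math.hypot(*v))

def boundary_potential(c_s, d_s, c_r, d_r, sigma_b2):
    """Sketch of (36): c_* are boundary centroids, d_* boundary direction vectors."""
    l_sr = (c_r[0] - c_s[0], c_r[1] - c_s[1])
    s = sin_acute_angle(l_sr, d_s) + sin_acute_angle(l_sr, d_r)
    return s ** 2 * (l_sr[0] ** 2 + l_sr[1] ** 2) / (2.0 * sigma_b2)

def split_variances(distances):
    """Hypothetical estimator: the both-boundary variance from the lower half of
    the sorted distances, the other variance from the upper half (mean squares)."""
    d = sorted(distances)
    half = len(d) // 2

    def mean_square(xs):
        return sum(x * x for x in xs) / len(xs)

    return mean_square(d[:half]), mean_square(d[half:])

# Collinear boundary segments cost nothing; perpendicular ones are penalized.
print(boundary_potential((0, 0), (1, 0), (4, 0), (1, 0), 1.0))
print(boundary_potential((0, 0), (0, 1), (1, 0), (0, 1), 1.0))
print(split_variances([1.0, 2.0, 3.0, 4.0]))
```

Taking the absolute value of the cross product makes the potential independent of the arbitrary sign of each direction vector, which a line orientation does not fix.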

1. For each site $s \in B_k$, determine the posterior potential for a boundary label $\lambda_s = 1$ using

$$V^b(\lambda_s, \lambda_r,\ r \in N_s \cap B_k) = \sum_{r \in N_s \cap B_k} V^b(\mathbf{y}^b_s, \mathbf{y}^b_r \mid \lambda_s, \lambda_r). \tag{37}$$

2. Sample from the corresponding Gibbs distribution to determine the label $\lambda_s$.

A logarithmic annealing schedule is again followed for the boundary processing, which runs after the region processing is complete at a given level. When the boundary process converges, lines are drawn between the estimated centroids of any pair of neighbors which are both labeled as boundary sites. In the present scheme, no information is propagated from “boundary fathers” to their children, and sites in the boundary set $B_k$ are labeled as either boundary (1) or nonboundary (0).
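Step 2 of the summary, combined with the logarithmic schedule, can be sketched as a binary Gibbs update at a temperature that decreases as $T_n = T_0/\log(n+2)$. The offset in the logarithm, the constant $T_0$, and the function names are assumptions; `e0` and `e1` stand for a site's total potentials for the nonboundary and boundary labels, however those are assembled from (37) and the prior.

```python
import math
import random

def log_temperature(n, t0=4.0):
    """Logarithmic cooling: T_n = t0 / log(n + 2) for sweep n = 0, 1, 2, ..."""
    return t0 / math.log(n + 2)

def sample_binary_label(e0, e1, temperature, rng=random):
    """Gibbs step for a binary label with P(label = 1) proportional to exp(-e1 / T)."""
    d = (e1 - e0) / temperature
    # 1 / (1 + exp(d)), written to avoid overflow for large |d|.
    p1 = math.exp(-d) / (1.0 + math.exp(-d)) if d >= 0 else 1.0 / (1.0 + math.exp(d))
    return 1 if rng.random() < p1 else 0

def anneal_site(e0, e1, sweeps=50, t0=4.0, seed=0):
    """Resample one isolated site over an annealing run; returns the label history."""
    rng = random.Random(seed)
    return [sample_binary_label(e0, e1, log_temperature(n, t0), rng) for n in range(sweeps)]

history = anneal_site(e0=3.0, e1=0.0)
print(history[-10:])
```

The logarithmic schedule cools slowly, so late sweeps still occasionally flip a label whose energy advantage is modest; in a full sampler, `e0` and `e1` would be recomputed from the current neighbor labels before every update.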

Fig. 9. Steps in identifying the affine texture model. Using the energy spectra (c) and (d), the linear transformation $\mathbf{A}_{sr}$ is identified from a pair of representative vectors selected for each patch. The translation is estimated by cross-correlation of the warped prototype and the target. The overlay in (f) shows the skeleton of the warped prototype patch. (a) Windowed 64 × 64 prototype texture patch. (b) Target patch showing the effect of analysis and synthesis windows. (c) Magnitude spectrum of patch (a), with coordinate vectors. (d) Magnitude spectrum of (b), with coordinate vectors. (e) Approximation of (d) by (c), using linear transform. (f) Approximation of (b) by (a), using full affine transform.

Fig. 10. Potential boundary-containing blocks from segmentation at level 3, $B_3$, in Fig. 11, shown with the estimated boundary segments (solid lines). The real boundary is also drawn for comparison, as a dotted line.

Fig. 11. Segmentation results of Image I. (a) The results before the boundary process is executed, at levels 3 to 6, respectively. (b) The results after the boundary process is executed, at levels 3 to 6.

The experiments tested the model's ability to segment textured images of various types, as can be seen from Figs. 11 and 12. These images are all 256 × 256 pixels of 8 bit resolution. In each case, the number of classes was limited to 100 and no tuning was performed: the algorithm ran unsupervised, starting in each case at the same resolution of 8 × 8 sites. In Fig. 11, the refinement of the segmentation through the SMMAP procedure is evident, as is the improvement due to the boundary process. Table 2 summarizes the performance of the algorithm on these data, showing the pixel classification error rate of the region processing alone and of the combined region and boundary processing, along with the computational cost, in iterations per pixel, of the two processes. The error rate drops to less than 1 percent with the boundary process at the highest resolution, and this is achieved at a normalized number of iterations per pixel of only 2. This figure is the sum of contributions from the various levels, each weighted by the number of sites on that level, divided by the number of pixels. Note that the algorithm terminates two levels above the image level because this is the highest resolution for which meaningful texture and boundary estimates can be made.

For comparison, the same procedure was run with ICM instead of SA. As the results in Table 3 show, there is a roughly 30 percent reduction in computation, but the error rates for the deterministic algorithm are marginally worse. This finding was repeated in all the cases examined.
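The normalized cost figure used in Tables 2 and 3 can be written out explicitly. In the sketch below, level $k$ of the quadtree is assumed to have $2^k \times 2^k$ sites and the image level $N = 8$ corresponds to the 256 × 256 images used here; the per-level iteration counts in the example are hypothetical, not the values behind the tables.

```python
def iterations_per_pixel(iters_by_level, image_level=8):
    """Sum over levels of (scans x sites on that level), divided by the pixel count.

    iters_by_level maps a quadtree level k (2^k x 2^k sites) to the number of
    full scans made at that level.
    """
    pixels = 4 ** image_level
    return sum(n * 4 ** k for k, n in iters_by_level.items()) / pixels

# Hypothetical schedule from 8 x 8 sites (level 3) down to 64 x 64 sites (level 6).
print(iterations_per_pixel({3: 20, 4: 20, 5: 10, 6: 5}))  # 0.56640625
```

Because each level has a quarter of the sites of the level below it, scans at coarse levels are almost free: a single scan of level 6 costs 1/16 of an iteration per pixel, and one of level 3 costs 1/1024.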

Fig. 12 summarizes the high-resolution segmentations for four combinations of two or more textures. The test images were 256 × 256 pixels, with the textures taken from Brodatz's book. It should be noted that the algorithm requires no additional information on the number of textured regions. These pictures illustrate the effectiveness of the overall technique and the utility of the boundary process, which both improves the subjective quality and lowers the misclassification rates: typically of the order of 1-2 percent, with the worst error rate in the four examples shown being 2.2 percent without the boundary process and 2.1 percent with it. Because of the multiresolution estimation, the overall number of iterations required to attain convergence was low: in the examples shown in Fig. 12, the total number of iterations per pixel, combining all levels, was of the order of 2-4, including both region and boundary processing. Tables 4, 5, 6, and 7 summarize the algorithm's performance on these images.

We have compared these results with those presented by a number of authors, including [11], [19], [4], [5], [21], [36], and [26]. It is hard to be categorical in such comparisons because no two authors use the same image data, and clearly some methods are better adapted to certain classes of data than others. In their work on boundary detection, Geman et al. [11] show a number of classifications of texture collages similar to the ones used here. Their approach appears to show the weakness of using a single resolution: their final labeling in Fig. 10a of [11] shows some obvious misclassifications, possibly due to the misalignment of the blocks with the texture boundaries. Unfortunately, no error rates are quoted for their segmentations. In other cases, where results are presented on textures, these are often less structured than the examples in Figs. 11 and 12. Making allowance for this, it seems fair to say that the results shown here compare well with those reported elsewhere.

7 CONCLUSIONS

In this paper, we have presented a new model for image analysis, which combines multiple resolutions and MRFs to describe image structure statistically. The model builds on earlier MMRF models, but has some features which make it particularly suited to some classes of image segmentation problem. In particular, it was shown that, with the right choice of prior and observation models, the segmentation algorithm becomes independent of the number of regions and robust to the smooth variations in intensity and textural properties which are often encountered in image data. Of course, it does this at a cost: class means and interclass variances may contain useful information, which is inaccessible under the model assumptions. The model was used in the segmentation of noisy and textured images, where it achieved results comparable to or better than those reported in similar work, in terms of average misclassification rate and computational requirements, on a range of images containing natural textures with varying degrees of structure and with different numbers of regions. By including a simple boundary process, it is possible to achieve almost the same error rates (1-2 percent) using lower resolution, cutting the computational cost to less than 1 iteration per pixel.

Fig. 12. Summary of final segmentation results on various texture combinations. Left images: without boundary process; right images: with boundary process.

TABLE 2. Classification Error Rates and Number of Iterations per Pixel (# i/p) for Two Texture Image of Fig. 11, Using SA

TABLE 3. Error Rates and Number of Iterations per Pixel (# i/p) for Image of Fig. 11, Using ICM

Although the work reported here is encouraging, much remains to be done before it can be considered complete. The most obvious defect of the prior model is that it does not properly cater for the “creation” of new regions below the top level of the quadtree. As a result, any problem requiring segmentation of regions significantly smaller than the block size at the top level of the tree can only be dealt with by using fewer levels. Another weakness is that the line process used to improve the estimate of the boundary does not interact with the region labeling. Also, the segmentation model has not been tested with other image features. Most important, more effective parameter estimation methods for the general model are needed to extend the work to more general segmentation problems. Although many authors favor EM methods (e.g., [5], [20]), it may be that a Bayesian approach, in which the parameters are sampled from an appropriate posterior distribution, would be effective for this task. Work is currently under way to address these issues.

APPENDIX

PROOF OF PROPOSITION 1

To prove the proposition, it suffices to show that, for any state $x^k \neq x^{k*}$, there is at least one site change which gives a configuration that is both more likely and closer to $x^{k*}$. Suppose, therefore, that $x^k$ is such a state. It must have at least one site $s$ for which $x_s \neq x^*_s$. Consider the state $x^{k\prime}$, in which

$$x'_r = x_r, \quad r \neq s, \tag{38}$$

and

$$x'_s = x^*_s. \tag{39}$$

Then, recalling that $V^{k,l}(m,m) = 0$ and $V^{k,l}(m,n) \geq 0$,

$$U(x^{k\prime} \mid x^{k-1}) = U(x^k \mid x^{k-1}) - V^{k,k-1}(x_s, x^*_s) + \sum_{r\in N_s} \big(V^{k,k}(x^*_s, x_r) - V^{k,k}(x_s, x_r)\big), \tag{40}$$

but the sum on the right side is bounded by

$$\sum_{r\in N_s} \big(V^{k,k}(x^*_s, x_r) - V^{k,k}(x_s, x_r)\big) \leq \sum_{r\in N_s} V^{k,k}(x^*_s, x_r) \leq |N_s|\, V^{k,k}(m,n), \quad m \neq n. \tag{41}$$

Hence, if $V^{k,k-1}(m,n) > |N_s|\, V^{k,k}(m,n)$, $m \neq n$, we get

$$U(x^{k\prime} \mid x^{k-1}) < U(x^k \mid x^{k-1}), \tag{42}$$

and $x^{k\prime}$ is indeed more likely than $x^k$. Since $x^k$ was arbitrary, the proof is complete. $\square$

ACKNOWLEDGMENTS

The authors would like to thank Chung-Cheng Institute of Technology in Taiwan for supporting this study.

REFERENCES

[1] M. Basseville, A. Benveniste, K.C. Chou, S.A. Golden, R. Nikoukhah, and A.S. Willsky, “Modelling and Estimation of Multiresolution Stochastic Processes,” IEEE Trans. Information Theory, vol. 38, no. 2, pp. 766-784, 1992.

[2] J. Besag, “Spatial Interaction and the Statistical Analysis of Lattice Systems,” J. Royal Statistical Soc., Series B, vol. 36, pp. 192-236, 1974.

[3] J. Besag, “On the Statistical Analysis of Dirty Pictures,” J. Royal Statistical Soc., Series B, vol. 48, no. 3, pp. 259-302, 1986.

[4] C.A. Bouman and B. Liu, “Multiple Resolution Segmentation of Textured Images,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 13, no. 2, pp. 99-113, Feb. 1991.


TABLE 4. Error Rates and Number of Iterations per Pixel (# i/p) for Two Texture Top Image of Fig. 12, Using SA

TABLE 5. Error Rates and Number of Iterations per Pixel (# i/p) for Second Top Image of Fig. 12, Using SA

TABLE 6. Error Rates and Number of Iterations per Pixel (# i/p) for Third (Four Texture) Image of Fig. 12, Using SA

TABLE 7. Error Rates and Number of Iterations per Pixel (# i/p) for Bottom Image of Fig. 12, Using SA

[5] C.A. Bouman and M. Shapiro, “A Multiscale Random Field Model for Bayesian Image Segmentation,” IEEE Trans. Image Processing, vol. 3, pp. 162-176, 1994.

[6] R. Chellappa and R.L. Kashyap, “Texture Synthesis Using 2-D Noncausal Autoregressive Models,” IEEE Trans. Acoustics, Speech, Signal Processing, vol. 33, pp. 194-203, 1985.

[7] S. Clippingdale, “Multiresolution Image Modelling and Estimation,” PhD thesis, Dept. Computer Science, The Univ. of Warwick, U.K., Sept. 1988.

[8] F.S. Cohen and Z. Fan, “Maximum Likelihood Unsupervised Textured Image Segmentation,” Computer Vision, Graphics, and Image Processing, vol. 54, pp. 239-251, 1992.

[9] G.R. Cross and A.K. Jain, “Markov Random Field Texture Models,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 5, pp. 25-39, 1983.

[10] J.M. Francos, A. Meiri, and B. Porat, “A Unified Texture Model Based on a 2-D Wold-Like Decomposition,” IEEE Trans. Signal Processing, vol. 41, 1993.

[11] D. Geman, S. Geman, C. Graffigne, and P. Dong, “Boundary Detection by Constrained Optimization,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 12, pp. 609-628, 1990.

[12] S. Geman and D. Geman, “Stochastic Relaxation, Gibbs Distribution, and Bayesian Restoration of Images,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 6, pp. 721-741, 1984.

[13] B. Gunsel, A.K. Jain, and E. Panayirci, “Reconstruction and Boundary Detection of Range and Intensity Images Using a Multiscale MRF Representation,” Computer Vision and Image Understanding, vol. 63, pp. 353-366, 1996.

[14] R.M. Haralick, “Statistical and Structural Approaches to Texture,” Proc. IEEE, vol. 67, no. 5, pp. 786-804, 1979.

[15] T.I. Hsu and R.G. Wilson, “A Two-Component Model of Texture for Analysis and Synthesis,” IEEE Trans. Image Processing, vol. 7, no. 10, pp. 1466-1476, 1998.

[16] Z. Kato, M. Berthod, and J. Zerubia, “A Hierarchical Markov Random Field Model and Multi-Temperature Annealing for Parallel Image Classification,” Technical Report 1938, INRIA, 1993.

[17] Z. Kato, J. Zerubia, and M. Berthod, “Unsupervised Parallel Image Classification Using Markovian Models,” Pattern Recognition, vol. 32, pp. 591-604, 1999.

[18] R. Kinderman and J.L. Snell, Markov Random Fields and Their Applications. Am. Math. Soc., 1980.

[19] S. Krishnamachari and R. Chellappa, “Multiresolution Gauss-Markov Random Field Models for Texture Segmentation,” IEEE Trans. Image Processing, vol. 6, no. 2, pp. 251-267, 1997.

[20] J-M. Laferte, P. Perez, and F. Heitz, “Discrete Markov Image Modeling and Inference on the Quadtree,” IEEE Trans. Image Processing, vol. 9, pp. 390-404, 2000.

[21] S. Lakshmanan and H. Derin, “Simultaneous Parameter Estimation and Segmentation of Gibbs Random Fields Using Simulated Annealing,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 11, pp. 799-813, 1989.

[22] S.M. Lavalle and S.A. Hutchinson, “A Bayesian Segmentation Methodology for Parametric Image Models,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 17, pp. 211-217, 1995.

[23] Chang-Tsun Li, “Unsupervised Image Segmentation Using Multiresolution Markov Random Fields,” PhD thesis, Univ. of Warwick, U.K., 1998.

[24] J. Li, R.M. Gray, and R.A. Olshen, “Multiresolution Image Classification by Hierarchical Modeling with Two-Dimensional Hidden Markov Models,” IEEE Trans. Information Theory, vol. 46, pp. 1826-1840, 2000.

[25] F. Liu and R.W. Picard, “Periodicity, Directionality, and Randomness: Wold Features for Image Modelling and Retrieval,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 18, no. 7, pp. 722-733, July 1996.

[26] C.S. Lu, P.C. Chung, and C.F. Chen, “Unsupervised Texture Segmentation via Wavelet Transform,” Pattern Recognition, vol. 30, no. 5, pp. 729-742, 1997.

[27] M.R. Luettgen, W.C. Karl, and A. Willsky, “Multiscale Representations of Markov Random Fields,” IEEE Trans. Signal Processing, vol. 41, pp. 3377-3395, 1993.

[28] B.S. Manjunath and R. Chellappa, “Unsupervised Texture Segmentation Using Markov Random Field Models,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 13, pp. 478-482, 1991.

[29] M. Mignotte, C. Collet, P. Perez, and P. Bouthemy, “Sonar Image Segmentation Using an Unsupervised Hierarchical MRF Model,” IEEE Trans. Image Processing, vol. 9, pp. 1216-1231, 2000.

[30] D.K. Panjwani and G. Healey, “Markov Random Field Models for Unsupervised Segmentation of Textured Color Images,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 17, pp. 939-954, 1995.

[31] P. Perez and F. Heitz, “Restriction of a Markov Random Field on a Graph and Multiresolution Statistical Image Modeling,” IEEE Trans. Information Theory, vol. 42, no. 1, pp. 180-190, 1996.

[32] M.K. Schneider, P.W. Fieguth, W.C. Karl, and A.S. Willsky, “Multiscale Methods for the Segmentation and Reconstruction of Signals and Images,” IEEE Trans. Image Processing, vol. 9, pp. 456-468, 2000.

[33] R. Szeliski, Bayesian Modelling of Uncertainty in Low-Level Vision. Kluwer Academic, 1989.

[34] S. Richardson, W.R. Gilks, and D.J. Spiegelhalter, Markov Chain Monte Carlo in Practice. Chapman and Hall, 1996.

[35] R. Wilson, A. Calway, and E.R.S. Pearson, “A Generalised Wavelet Transform for Fourier Analysis: The Multiresolution Fourier Transform and Its Application to Image and Audio Signal Analysis,” IEEE Trans. Information Theory, vol. 38, no. 2, Mar. 1992.

[36] R. Wilson and M. Spann, Image Segmentation and Uncertainty. Research Studies Press, 1988.

[37] J.W. Woods, “Two-Dimensional Discrete Markovian Fields,” IEEE Trans. Information Theory, vol. 18, pp. 232-240, 1972.

Roland Wilson received the BSc and PhD degrees from the Department of Electrical and Electronic Engineering at the University of Glasgow, in 1971 and 1978, respectively. From 1978 to 1985, he was a lecturer in the Department of Electronic and Electrical Engineering at the University of Aston. From 1982 to 1983, he was a visiting professor at Linkoping University, Sweden. In 1985, he was appointed to a senior lectureship in the Department of Computer Science at the University of Warwick. In 1992, he was promoted to a readership. In 1985, he was jointly awarded the Pattern Recognition Society Medal for Best Paper in Pattern Recognition with his student Mike Spann. In 1999, he was promoted to a professorship. He has published more than 100 papers in the areas of communication theory, image and audio signal processing, and neural networks. He is an editorial board member for the journal Pattern Recognition.

Chang-Tsun Li received the BS degree in electrical engineering from Chung-Cheng Institute of Technology (CCIT), National Defense University, Taiwan, Republic of China, in 1987, the MS degree in computer science from the US Naval Postgraduate School in 1992, and the PhD degree in computer science from the University of Warwick, United Kingdom, in 1998. He was a lecturer in the Department of Electrical Engineering at CCIT during 1992-1994. He is currently an associate professor with the same department at CCIT. His research interests include computer vision, image processing, pattern recognition, neural networks, genetic algorithms, and fuzzy theory.

