
An Introduction to Image Segmentation
Chien-Chi Chen

E-mail: [email protected]
Graduate Institute of Communication Engineering
National Taiwan University, Taipei, Taiwan, ROC

Abstract
Segmentation is used to identify the objects of an image in which we are interested. This report discusses three approaches: edge detection, thresholding, and region-based segmentation. These three methods cannot solve every segmentation problem we meet, but they are the basic methods of segmentation.

1. Introduction

We first discuss the case of monochrome, static images. The fundamental problem of segmentation is to partition an image into regions. Segmentation algorithms for monochrome images generally fall into one of two basic categories: edge-based segmentation and region-based segmentation. A third method is thresholding, which can be regarded as belonging to the edge-based category. There are three goals that we want to achieve:

1. Speed. We want segmentation to be fast, so that the more complicated compression stage has more time to process.

2. Good shape matching, even under a limited computation time.

3. Good connectivity: the segmented shapes should be intact rather than fragmentary.

2. Edge-Based Segmentation

The focus of this section is on segmentation methods based on detecting sharp, local changes in intensity. The three types of image features in which we are interested are isolated points, lines, and edges. Edge pixels are pixels at which the intensity of the image function changes abruptly.

2.1. Fundamentals
Local changes in intensity can be detected using derivatives. For reasons that will become evident, first- and second-order derivatives are particularly well suited for this purpose.

Figure 2.1 The intensity histogram of an image

We draw the following conclusions from Figure 2.1:

1. First-order derivatives generally produce thicker edges in an image.
2. Second-order derivatives have a stronger response to fine detail, such as thin lines, isolated points, and noise.
3. Second-order derivatives produce a double-edge response at ramp and step transitions in intensity.
4. The sign of the second derivative can be used to determine whether a transition into an edge is from light to dark or dark to light.

2.2. Isolated Points
Based on the conclusions reached in the preceding section, we know that point detection should be based on the second derivative, so we expect to use a Laplacian mask.


Figure 2.2 The mask for isolated point detection

We use the mask to scan every point of the image and compute the response at each point. If the absolute response at a point is greater than T (a threshold we set), we label that point 1 (light); otherwise we set it to 0 (dark).
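As a concrete illustration (an addition to this report, not the author's original code), the following minimal C++ sketch performs this scan; it assumes a grayscale image stored row-major and uses the 3x3 Laplacian mask of Figure 2.2 (8 in the center, -1 elsewhere).

#include <cstdlib>
#include <vector>

// Minimal sketch: isolated-point detection with a 3x3 Laplacian mask.
// img is a grayscale image stored row-major; returns a binary map where
// 1 marks points whose |response| exceeds the threshold T.
std::vector<int> detectPoints(const std::vector<int>& img,
                              int width, int height, int T)
{
    // 3x3 Laplacian mask: 8 in the center, -1 elsewhere.
    const int mask[3][3] = { {-1, -1, -1}, {-1, 8, -1}, {-1, -1, -1} };
    std::vector<int> out(img.size(), 0);

    for (int y = 1; y < height - 1; ++y) {
        for (int x = 1; x < width - 1; ++x) {
            int response = 0;
            for (int dy = -1; dy <= 1; ++dy)
                for (int dx = -1; dx <= 1; ++dx)
                    response += mask[dy + 1][dx + 1] *
                                img[(y + dy) * width + (x + dx)];
            // Label the point 1 (light) if |response| exceeds T, else 0 (dark).
            out[y * width + x] = (std::abs(response) > T) ? 1 : 0;
        }
    }
    return out;
}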

2.3. Line Detection
As discussed in Section 2.1, the second-order derivative has a stronger response and produces thinner lines than the first derivative. We can construct masks for four different directions.

Figure 2.3 Line detection masks.

Let us discuss how to use the four masks to decide which direction responds most strongly. Let R1, R2, R3, and R4 denote the responses of the masks in Figure 2.3. If, at a given point in the image, |Rk| > |Rj| for all j ≠ k (for example, |R1| > |Rj| for j = 2, 3, 4), that point is said to be more likely associated with a line in the direction of mask k.

2.4. Edge detection

Figure 2.4 (a) Two regions of constant intensity separated by an ideal vertical ramp edge. (b) Detail near the edge, showing a horizontal intensity profile.

We conclude from this observation that the magnitude of the first derivative can be used to detect the presence of an edge at a point in an image. The second derivative has two useful properties: (1) it produces two values for every edge in an image; (2) its zero crossings can be used for locating the centers of thick edges.

2.4.1. Basic Edge Detection (Gradient)
The image gradient gives the edge strength and direction at location (x, y) of an image and is defined as the vector
∇f = [gx, gy]^T = [∂f/∂x, ∂f/∂y]^T.
The magnitude (length) of the vector ∇f, denoted M(x, y), is
M(x, y) = sqrt(gx^2 + gy^2).
The direction of the gradient vector is given by the angle
α(x, y) = tan^(-1)(gy / gx).
The direction of an edge at an arbitrary point (x, y) is orthogonal to the direction of the gradient vector at that point.


We are dealing with digital quantities, so a digital approximation of the partial derivatives over a neighborhood about a point is required.
1. Roberts cross-gradient operators (Roberts [1965])

Figure 2.5 Roberts mask

2. Prewitt operator

Figure 2.6 Prewitt’s mask

3. Sobel operator


Figure 2.7 (a)~(g) A region of an image and the various masks used to compute the gradient at the labeled point.

The Sobel masks use a value of 2 in the center location to provide image smoothing. The Prewitt masks are simpler to implement than the Sobel masks, but the Sobel masks have better noise-suppression (smoothing) characteristics, which makes them preferable.

In the previous discussion we obtained gx and gy and computed the gradient magnitude from them. However, this implementation is not always desirable because of the cost of the squares and the square root, so an approach used frequently is to approximate the magnitude of the gradient by absolute values:
M(x, y) ≈ |gx| + |gy|.
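To make the computation concrete, here is a small illustrative C++ sketch (not part of the original report) that applies the Sobel masks and uses the |gx| + |gy| approximation; the row-major image layout and the function name are assumptions.

#include <cstdlib>
#include <vector>

// Minimal sketch: Sobel gradient magnitude using M(x,y) ~ |gx| + |gy|.
// img is a grayscale image stored row-major.
std::vector<int> sobelMagnitude(const std::vector<int>& img,
                                int width, int height)
{
    const int sx[3][3] = { {-1, 0, 1}, {-2, 0, 2}, {-1, 0, 1} };   // horizontal derivative
    const int sy[3][3] = { {-1, -2, -1}, {0, 0, 0}, {1, 2, 1} };   // vertical derivative
    std::vector<int> mag(img.size(), 0);

    for (int y = 1; y < height - 1; ++y) {
        for (int x = 1; x < width - 1; ++x) {
            int gx = 0, gy = 0;
            for (int dy = -1; dy <= 1; ++dy)
                for (int dx = -1; dx <= 1; ++dx) {
                    int p = img[(y + dy) * width + (x + dx)];
                    gx += sx[dy + 1][dx + 1] * p;
                    gy += sy[dy + 1][dx + 1] * p;
                }
            // Approximate the gradient magnitude by absolute values.
            mag[y * width + x] = std::abs(gx) + std::abs(gy);
        }
    }
    return mag;
}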

2.4.2. The Marr-Hildreth Edge Detector (LoG)
The edge-detection methods in use at the time were based on small operators, and as discussed previously, the second derivative is better suited than the first derivative for small operators. We therefore use the Laplacian, combined with Gaussian smoothing.


Figure 2.8 A 5x5 mask of the LoG

The Marr-Hildreth algorithm consists of convolving the LoG filter with an input image f(x, y):
g(x, y) = [∇²G(x, y)] * f(x, y).     (2.4-9)
Because these are linear processes, Eq. (2.4-9) can also be written as
g(x, y) = ∇²[G(x, y) * f(x, y)].
The edge-detection algorithm may be summarized as follows:
1. Filter the input image with an n×n Gaussian lowpass filter (this smooths out the large number of small spatial details).
2. Compute the Laplacian of the image resulting from Step 1.
3. Find the zero crossings of the image from Step 2.
To specify the size of the Gaussian filter, recall that about 99.7% of the volume under a 2-D Gaussian surface lies within ±3σ of the mean, so n is usually chosen as the smallest odd integer not less than 6σ.

Figure 2.9 (a) Input image. (b) Result after using the LoG with threshold 200.


2.5. Edge Linking and Boundary Detection
Edge detection should yield sets of pixels lying on edges, but noise and nonuniform illumination introduce breaks in the edges. Therefore, edge detection is typically followed by linking algorithms designed to assemble edge pixels into meaningful edges and/or region boundaries.

2.5.1. Local Processing
This form of edge linking analyzes the characteristics of the pixels in a small neighborhood about every point (x, y) that has been declared an edge point by the previous techniques.

The two principal properties used for establishing the similarity of edge pixels are:
1. The strength (magnitude) of the gradient. Let Sxy denote the set of coordinates of a neighborhood centered at point (x, y). An edge pixel at (s, t) in Sxy is similar in magnitude to the pixel at (x, y) if
|M(s, t) - M(x, y)| ≤ E,
where E is a positive threshold.
2. The direction of the gradient vector. An edge pixel at (s, t) in Sxy has an angle similar to the pixel at (x, y) if
|α(s, t) - α(x, y)| ≤ A,
where A is a positive angle threshold.

2.5.2. Regional Processing
Often, the locations of regions of interest in an image are known or can be determined. In such situations, we can use techniques for linking pixels on a regional basis, with the desired result being an approximation to the boundary of the region. We discuss the mechanics of the procedure using Figure 2.10.

Figure 2.10 illustration of the iterative polygonal fit algorithm

An algorithm for finding a polygonal fit may be stated as follows:


1. Specify two starting points A and B.
2. Connect A and B, find the boundary point with the maximum distance from the line AB, and if that distance is larger than T (a threshold), define that point as a vertex of the polygon.
3. Connect all the vertices we have and repeat the comparison of Step 2 until the distance between every boundary point and the connecting lines is smaller than T.

2.5.3. Global Processing Using the Hough Transform
In regional processing, it makes sense to link a given set of pixels only if we know that they are part of the boundary of a meaningful region. Often, however, we have to work with unstructured environments in which all we have is an edge map, with no knowledge of where the objects of interest are located. In that case we can use the Hough transform, which uses a coordinate transformation to find the points that lie on the same line.

Figure 2.11 (a)xy-plane. (b)Parameter space

Consider a point (xi, yi) in the xy-plane and the general equation of a straight line in slope-intercept form, yi = a xi + b. A second point (xj, yj) also has a line in parameter space associated with it. A practical difficulty with this approach is that a (the slope of the line) approaches infinity as the line approaches the vertical direction. One way around this difficulty is to use the normal representation of a line:

x cos θ + y sin θ = ρ.     (2.5-3)
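A minimal illustrative sketch of the ρθ accumulator follows (an addition, not the report's code); every edge pixel votes for all (ρ, θ) cells whose line passes through it, and peaks in the accumulator correspond to lines in the image.

#include <cmath>
#include <vector>

// Minimal sketch: Hough transform for lines in normal form
// x*cos(theta) + y*sin(theta) = rho.
// edge is a binary edge map stored row-major (1 = edge pixel).
std::vector<std::vector<int>> houghLines(const std::vector<int>& edge,
                                         int width, int height,
                                         int thetaBins)
{
    const double PI = 3.14159265358979323846;
    // rho ranges over [-rhoMax, rhoMax]; shift it so indices are non-negative.
    int rhoMax = static_cast<int>(std::ceil(std::sqrt(double(width) * width +
                                                      double(height) * height)));
    std::vector<std::vector<int>> acc(2 * rhoMax + 1,
                                      std::vector<int>(thetaBins, 0));

    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x) {
            if (!edge[y * width + x]) continue;
            for (int t = 0; t < thetaBins; ++t) {
                double theta = -PI / 2 + PI * t / thetaBins;   // theta in [-90, 90) degrees
                int rho = static_cast<int>(std::lround(x * std::cos(theta) +
                                                       y * std::sin(theta)));
                ++acc[rho + rhoMax][t];                        // one vote per (rho, theta) cell
            }
        }
    return acc;   // cells with large counts indicate collinear edge points
}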


Figure 2.12 (a) A line in the xy-plane. (b) Sinusoidal curves in the ρθ-plane; the point of intersection (ρ, θ) corresponds to the line passing through the points (xi, yi) and (xj, yj) in the xy-plane.

2.6. Segmentation Using Morphological Watersheds
2.6.1. Background
The concept of the watershed is based on visualizing an image in three dimensions: two spatial coordinates and intensity. We consider three types of points:
1. Points belonging to a local minimum.
2. Points at which a drop of water, if placed there, would fall to a single local minimum. The set of such points for a given minimum is called its catchment basin or watershed.
3. Points at which water would be equally likely to fall to more than one local minimum. These points are similar to the crest lines on the topographic surface and are termed divide lines or watershed lines.

The two main properties of a watershed segmentation result are continuous boundaries and over-segmentation. The boundaries produced by the watershed algorithm are exactly the watershed lines in the image, so the number of regions is basically equal to the number of minima in the image. There are two steps in the marker-based solution: 1. preprocessing, and 2. defining the criteria that the markers have to satisfy. The following figures show the mechanism of dam construction.


Figure 2.10 (a)~(d) Watershed algorithm.

Suppose that the figure shows the input image, where the height of the "mountain" is proportional to the intensity values of the input image. We start to flood water from below by letting it rise through the holes at a uniform rate. In (b) we see that the water has risen into the first and second catchment basins, so we construct a dam to stop it from overflowing, and we repeat this motion step by step.

2.6.2. The Use of Markers
Direct application of the watershed segmentation algorithm in the form discussed in the previous section generally leads to oversegmentation due to noise and other local irregularities of the gradient.
An approach used to control oversegmentation is based on the concept of markers. We use internal markers, associated with the objects of interest, and external markers, associated with the background. A procedure for marker selection typically consists of two principal steps: (1) preprocessing (usually smoothing), and (2) definition of a set of criteria that the markers must satisfy (edge detection is then performed within every small region).

Figure 2.13 (a) Electrophoresis image. (b) Result of applying the watershed segmentation algorithm to the gradient image; oversegmentation is evident. (c) Image obtained from (b) by smoothing, showing the internal markers (light gray regions) and external markers (watershed lines). (d) Result of segmentation; note the improvement over (b). (Courtesy of Dr. S. Beucher, CMM/Ecole des Mines de Paris.)

2.7. Edge Detection Using the Hilbert Transform (HLT)
Compared with the derivative (or so-called differential) method, the impulse response of the HLT is much longer. The longer impulse response reduces the sensitivity of the edge detector and at the same time reduces the influence of noise: a longer response is less sensitive, but it gives good detection of ramp edges and more robustness against noise. The discrete-time version of the HLT is obtained by convolving the input with the impulse response of the discrete Hilbert transform.

2.7.1. Short Response Hilbert Transform (SRHLT)
We have seen the advantages and disadvantages of the derivative method and the HLT method for detecting edges. In 2007, S. C. Pei and J. J. Ding proposed another method that combines the two to detect edges [D-1][D-4]. They combine the HLT and differentiation to define the short response Hilbert transform (SRHLT), which is parameterized by a constant b.


When b → 0+ (a positive number very near 0), the SRHLT becomes the HLT. When b → ∞, the SRHLT tends to the differentiation operation.

Figure 2.14 The characteristics of the SRHLT

Figure 2.15 Impulse responses and their FTs of the SRHLT for different b: (a)(b) Hilbert transform, (c)(d) differentiation, (e)(f) SRHLT with b = 0.25, (g)(h) SRHLT with b = 1, (i)(j) SRHLT with b = 4. The left column shows the time domain and the right column the frequency domain.

                    Higher b (differentiation)    Lower b (HLT)
Impulse response    shorter                       longer
Noise robustness    bad                           good
Type of edge        step                          ramp
Output              sharp                         thick


3. Thresholding

3.1. Basic Global Thresholding
Because we need only the histogram of the image to segment it, segmenting images with the threshold technique does not involve the spatial information of the image. Therefore, problems may be caused by noise, blurred edges, or outliers in the image. That is why we say this method is the simplest concept for segmenting images.
When the intensity distributions of object and background pixels are sufficiently distinct, it is possible to use a single (global) threshold applicable over the entire image. The following iterative algorithm can be used for this purpose:
1. Select an initial estimate for the global threshold, T.
2. Segment the image using T as
g(x, y) = 1 if f(x, y) > T, and g(x, y) = 0 if f(x, y) ≤ T.
This produces two groups of pixels: G1, consisting of all pixels with intensity values > T, and G2, consisting of pixels with values ≤ T.
3. Compute the average (mean) intensity values m1 and m2 for the pixels in G1 and G2.
4. Compute a new threshold value:
T = (m1 + m2) / 2.
5. Repeat Steps 2 through 4 until the difference between values of T in successive iterations is smaller than a predefined parameter.
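A minimal C++ sketch of this iterative procedure is given below (added here for illustration); deltaT plays the role of the predefined parameter of Step 5.

#include <cmath>
#include <vector>

// Minimal sketch of the basic iterative global-thresholding algorithm.
// img holds gray levels; deltaT is the stopping tolerance of Step 5.
double iterativeThreshold(const std::vector<int>& img, double deltaT)
{
    double T = 0.0;
    for (int v : img) T += v;
    T /= img.size();                      // Step 1: initial estimate (global mean)

    while (true) {
        double sum1 = 0, sum2 = 0;        // Steps 2-3: split by T and take class means
        int n1 = 0, n2 = 0;
        for (int v : img) {
            if (v > T) { sum1 += v; ++n1; }
            else       { sum2 += v; ++n2; }
        }
        double m1 = n1 ? sum1 / n1 : 0.0;
        double m2 = n2 ? sum2 / n2 : 0.0;
        double newT = 0.5 * (m1 + m2);    // Step 4: new threshold
        if (std::fabs(newT - T) < deltaT) // Step 5: stop when T changes little
            return newT;
        T = newT;
    }
}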

3.2. Optimum Global Thresholding Using Otsu's Method
Thresholding may be viewed as a statistical decision theory problem whose objective is to minimize the average error incurred in assigning pixels to two or more groups. Let {0, 1, 2, ..., L-1} denote the L distinct intensity levels in a digital image of size M×N pixels, and let n_i denote the number of pixels with intensity i. The total number, MN, of pixels in the image is MN = n_0 + n_1 + ... + n_(L-1). The normalized histogram has components p_i = n_i / MN, from which it follows that
Σ_{i=0}^{L-1} p_i = 1,  p_i ≥ 0.
Now, we select a threshold T(k) = k, 0 < k < L-1, and use it to threshold the input image into two classes, C1 and C2, where C1 consists of the pixels with intensity in the range [0, k] and C2 consists of the pixels with intensity in [k+1, L-1].
Using this threshold, the probability P1(k) that a pixel is assigned to C1 is given by the cumulative sum
P1(k) = Σ_{i=0}^{k} p_i,
and P2(k) = 1 - P1(k) for C2. The class means, the cumulative mean up to level k, and the global mean are
m1(k) = (1/P1(k)) Σ_{i=0}^{k} i p_i,  m2(k) = (1/P2(k)) Σ_{i=k+1}^{L-1} i p_i,
m(k) = Σ_{i=0}^{k} i p_i,  mG = Σ_{i=0}^{L-1} i p_i.
The validity of the following two equations can be verified by direct substitution of the preceding results:
P1 m1 + P2 m2 = mG  and  P1 + P2 = 1.
In order to evaluate the "goodness" of the threshold at level k, we use the normalized, dimensionless metric
η = σB² / σG²,
where σG² is the global variance
σG² = Σ_{i=0}^{L-1} (i - mG)² p_i,
and σB² is the between-class variance, defined as
σB² = P1 (m1 - mG)² + P2 (m2 - mG)² = P1 P2 (m1 - m2)² = (mG P1 - m)² / (P1 (1 - P1)),
indicating that the between-class variance is a measure of the separability between the classes.
Then the optimum threshold is the value k* that maximizes σB²(k):
σB²(k*) = max over 0 ≤ k ≤ L-1 of σB²(k).
In other words, to find k* we simply evaluate σB²(k) for all integer values of k. Once k* has been obtained, the input image is segmented as before:
g(x, y) = 1 if f(x, y) > k*, and g(x, y) = 0 if f(x, y) ≤ k*,
for x = 0, 1, 2, ..., M-1 and y = 0, 1, 2, ..., N-1. The separability measure has values in the range 0 ≤ η(k*) ≤ 1.
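The following compact C++ sketch (an illustration added here, assuming 8-bit gray levels so that L = 256) implements Otsu's method using the between-class variance formula above.

#include <vector>

// Minimal sketch of Otsu's method for 8-bit images (L = 256 gray levels).
// Returns the threshold k* that maximizes the between-class variance.
int otsuThreshold(const std::vector<int>& img)
{
    const int L = 256;
    std::vector<double> p(L, 0.0);
    for (int v : img) p[v] += 1.0;
    for (double& pi : p) pi /= img.size();         // normalized histogram p_i

    double mG = 0.0;                               // global mean
    for (int i = 0; i < L; ++i) mG += i * p[i];

    double P1 = 0.0, m = 0.0, bestSigmaB = -1.0;   // P1(k), cumulative mean m(k)
    int bestK = 0;
    for (int k = 0; k < L - 1; ++k) {
        P1 += p[k];
        m  += k * p[k];
        if (P1 <= 0.0 || P1 >= 1.0) continue;
        // Between-class variance: (mG*P1 - m)^2 / (P1*(1 - P1)).
        double sigmaB = (mG * P1 - m) * (mG * P1 - m) / (P1 * (1.0 - P1));
        if (sigmaB > bestSigmaB) { bestSigmaB = sigmaB; bestK = k; }
    }
    return bestK;
}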

3.2.1. Using Image Smoothing or Edges to Improve Global Thresholding
We compare the two kinds of preprocessing, smoothing and edge detection:

                                          Smoothing                         Edge detection
Situation for which the method is suited  The object of interest is large   The object of interest is small

3.3. Multiple Thresholds
For three classes consisting of three intensity intervals, separated by two thresholds k1 and k2, the between-class variance is given by
σB² = P1 (m1 - mG)² + P2 (m2 - mG)² + P3 (m3 - mG)².
The following relationships hold:
P1 m1 + P2 m2 + P3 m3 = mG  and  P1 + P2 + P3 = 1.
The optimum threshold values k1* and k2* are the pair that maximizes σB²(k1, k2).
Finally, we note that the separability measure defined in Section 3.2 for one threshold extends directly to multiple thresholds:
η(k1*, k2*) = σB²(k1*, k2*) / σG².


3.4. Variable Thresholding
Image partitioning
One of the simplest approaches to variable thresholding is to subdivide the image into nonoverlapping rectangles. This approach is used to compensate for non-uniformities in illumination and/or reflectance.

Figure 3.16 (a) Noisy, shaded image. (b) Image subdivided into six subimages. (c) Result of applying Otsu's method to each subimage individually.

Image subdivision generally works well when the objects of interest and the background occupy regions of reasonably comparable size. When this is not the case, the method typically fails.

Variable thresholding based on local image properties
We illustrate the basic approach to local thresholding using the standard deviation and mean of the pixels in a neighborhood of every point in an image. Let σxy and mxy denote the standard deviation and mean value of the set of pixels contained in a neighborhood Sxy centered at (x, y). The segmented image is then
g(x, y) = 1 if Q(σxy, mxy) is true, and g(x, y) = 0 otherwise,
where Q is a predicate based on parameters computed using the pixels in the neighborhood, for example Q: f(x, y) > a σxy AND f(x, y) > b mxy.

Using a moving average
A moving average can be computed along the scan lines of an image. This implementation is quite useful in document processing, where speed is a fundamental requirement. The scanning typically is carried out line by line in a zigzag pattern to reduce illumination bias.
Let z_(k+1) denote the intensity of the point encountered in the scanning sequence at step k+1. The moving average at this new point is
m(k+1) = (1/n) Σ_{i=k+2-n}^{k+1} z_i = m(k) + (1/n)(z_(k+1) - z_(k-n)),
where n denotes the number of points used in computing the average and m(1) = z_1/n is the initial value. Segmentation is then implemented with the thresholding equation of Section 3.1, using T_xy = b m_xy, where b is a constant and m_xy is the moving average at point (x, y) in the input image.
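As an illustration (not from the original report), the sketch below thresholds a single scan line with a moving average of the last n points; zigzag scanning over the whole image is left to the caller, and b is the constant factor mentioned above.

#include <deque>
#include <vector>

// Minimal sketch: threshold one scan line with a moving average of the
// last n points; a pixel is set to 1 when it exceeds b times the average.
std::vector<int> movingAverageThreshold(const std::vector<int>& line,
                                        int n, double b)
{
    std::vector<int> out(line.size(), 0);
    std::deque<int> window;          // last n intensities z_k
    double sum = 0.0;

    for (size_t k = 0; k < line.size(); ++k) {
        window.push_back(line[k]);
        sum += line[k];
        if (static_cast<int>(window.size()) > n) {   // keep only n samples
            sum -= window.front();
            window.pop_front();
        }
        double m = sum / window.size();              // moving average m(k+1)
        out[k] = (line[k] > b * m) ? 1 : 0;          // threshold T = b * m
    }
    return out;
}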

Multivariable thresholding
So far we have been concerned with thresholding based on a single variable: gray-scale intensity. A notable example of multivariable thresholding is color imaging, where the red (R), green (G), and blue (B) components form a composite color image. Each pixel can then be represented as a 3-D vector z = (zR, zG, zB)^T whose components are the RGB values at that point.
Let a denote the average color in which we are interested (say, an average reddish color) and let D(z, a) be a distance measure between an arbitrary color point z and a. We segment the input image as follows:
g = 1 if D(z, a) < T, and g = 0 if D(z, a) ≥ T.
Note that these inequalities are the opposite of the ones we used before. The reason is that the equation D(z, a) = T defines a volume, and the points inside that volume are the ones close to a. A more powerful distance measure is the so-called Mahalanobis distance,
D(z, a) = [(z - a)^T C^(-1) (z - a)]^(1/2),
where C is the covariance matrix of the z's; when C = I, the identity matrix, it reduces to the Euclidean distance.


4. Region-Based Segmentation

4.1. Region Growing
Region growing is an approach that examines the neighboring pixels of initial "seed points" and determines whether those neighbors should be added to the region.
Step 1. Select a set of one or more starting points (seeds); the selection can often be based on the nature of the problem.
Step 2. Grow the regions from these seed points to adjacent points depending on a threshold or similarity criterion (e.g. 8-connectivity) that we define.
Step 3. Stop the growth when no more pixels satisfy the criteria for inclusion in that region.

Figure 4.17 (a) Original image. (b) Use Step 1 to find seeds based on the nature of the problem. (c) Use Step 2 (4-connected here) to grow the region and find similar points. (d)(e) Repeat Step 2 until no more pixels satisfy the criteria. (f) The final image.

We can draw several important conclusions about region growing:
1. The suitable selection of seed points is important, and it depends on the user.
2. More information about the image is better. Obviously, connectivity or pixel-adjacency information is helpful for determining the threshold and the seed points.
3. The "minimum area threshold": no region in the region-growing result will be smaller than this threshold in the segmented image.
4. The "similarity threshold value": if the difference of pixel values, or the difference of the average gray levels of two sets of pixels, is less than this value, the regions are considered to be the same region.
5. After region growing, the result may still contain points whose gray level is higher than the threshold but that are not connected with the object in the image.

We briefly summarize the advantages and disadvantages of region growing.
Advantages:
1. Region growing can correctly separate regions that have the properties we define.
2. Region growing gives good segmentation results for original images that have clear edges.
3. The concept is simple. We only need a small number of seed points to represent the property we want, and then grow the region.
4. We can use multiple criteria at the same time.
5. It performs well with respect to noise, which means its result has good shape matching.
Disadvantages:
1. The computation is expensive, in both time and power.
2. This method may not distinguish the shading of real images.
In conclusion, the region-growing method performs well in shape matching and connectivity. Its most serious problem is that it is time consuming.

4.2. Simulation of Region Growing Using C++.
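The original C++ listing is not reproduced in this copy of the report; the following is only a minimal sketch of growing one region from a single seed with 4-connectivity and a fixed intensity-difference threshold (the function name and data layout are assumptions).

#include <cstdlib>
#include <queue>
#include <vector>

// Minimal sketch: grow one region from a seed pixel using 4-connectivity.
// A neighbor joins the region when its gray level differs from the seed's
// gray level by less than the threshold T.
std::vector<int> growRegion(const std::vector<int>& img, int width, int height,
                            int seedX, int seedY, int T)
{
    std::vector<int> label(img.size(), 0);          // 1 = belongs to the region
    std::queue<std::pair<int,int>> q;
    int seedVal = img[seedY * width + seedX];

    label[seedY * width + seedX] = 1;
    q.push({seedX, seedY});

    const int dx[4] = {1, -1, 0, 0};
    const int dy[4] = {0, 0, 1, -1};
    while (!q.empty()) {
        auto [x, y] = q.front(); q.pop();
        for (int i = 0; i < 4; ++i) {
            int nx = x + dx[i], ny = y + dy[i];
            if (nx < 0 || ny < 0 || nx >= width || ny >= height) continue;
            int idx = ny * width + nx;
            if (label[idx]) continue;                   // already in the region
            if (std::abs(img[idx] - seedVal) < T) {     // similarity criterion
                label[idx] = 1;
                q.push({nx, ny});
            }
        }
    }
    return label;   // growth stops when no more pixels satisfy the criterion
}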


Figure 4.18 Lena image after using region growing; 90% of the pixels have been classified. Threshold/time: 20 / 4.7 seconds.

The method produces connected regions, but it needs more time to process.

4.3. Region Splitting and Merging
An alternative method is to subdivide an image initially into a set of arbitrary, disjoint regions and then merge and/or split the regions. With quadtrees, we subdivide each quadrant into subquadrants as follows:
1. Split into four disjoint quadrants any region Ri for which Q(Ri) = FALSE (that is, the region does not satisfy the predicate Q).
2. When no further splitting is possible, merge any adjacent regions Rj and Rk for which Q(Rj ∪ Rk) = TRUE (that is, Rj and Rk are similar under the predicate we define).
3. Stop when no further merging is possible.

Advantages of region splitting and merging: We can split the image by choosing any criterion we want, such as the variance or the mean of the pixel values of a segment, and the splitting criterion can be different from the merging criterion.
Disadvantages: 1. The computation is intensive. 2. It probably produces blocky segments. The blocky-segment effect can be reduced by splitting at a higher resolution, but at the same time the computational problem becomes more serious.

4.4. Data Clustering
The main idea of data clustering is to use centroids or prototypes to represent huge numbers of data points, with two goals: reducing the computational time of the image processing, and providing a segmented image that is in a better condition for compression (that is, more convenient for us to compress).
There is a difference between hierarchical and partitional clustering. In hierarchical clustering we can change the number of clusters at any time during the process if we want. In partitional clustering we have to decide the number of clusters before we begin the process.

4.4.1. Hierarchical Clustering
There are two kinds of hierarchical clustering. Agglomerative algorithms (build up) begin with each element as a separate cluster and merge them into successively larger clusters. Divisive algorithms (break up) begin with the whole set and proceed to divide it into successively smaller clusters. Agglomerative algorithms start from the individual elements (the leaves of the tree), whereas divisive algorithms start from the whole set (the root). We introduce the former one first.

Algorithm of hierarchical agglomeration:
1. See every single data point (for an image, every pixel) in the database (for an image, the whole image) as a cluster.
2. Find the two clusters whose distance is the shortest in the whole database, and agglomerate them together to form a new cluster.
3. Repeat Steps 1 and 2 until the number of clusters satisfies our demand.
Notice that we have to define the "distance" in a hierarchical algorithm. The two most commonly adopted definitions are the single-linkage and complete-linkage agglomerative methods.
Single-linkage agglomerative algorithm: D(Ci, Cj) = min { d(a, b) : a in Ci, b in Cj }.
Complete-linkage agglomerative algorithm: D(Ci, Cj) = max { d(a, b) : a in Ci, b in Cj }.


Here D(Ci, Cj) denotes the distance between clusters Ci and Cj, and d(a, b) denotes the distance between data points a and b (for an image, pixels a and b). For example, suppose we have six elements {a}, {b}, {c}, {d}, {e}, and {f}. The first step is to determine which elements to merge into a cluster. Usually, we take the two closest elements. Suppose we have merged the two closest elements b and c; we now have the clusters {a}, {b, c}, {d}, {e}, and {f} and want to merge them further. To do that, we need the distance between {a} and {b, c}, and therefore we must define the distance between two clusters.

Algorithm of hierarchical division:
1. See the whole database (for an image, the whole image) as one cluster.
2. Find the cluster C having the biggest diameter among the clusters we already have.
3. Find the data point x (for an image, the pixel) in C that is farthest, on average, from the other points of C.
4. Split x out as a new cluster C_new, and regard the remaining data points of C as C_old.
5. For every remaining point y in C_old, compute d(y, C_old) and d(y, C_new). If d(y, C_old) > d(y, C_new), split y out of C_old and classify it into C_new.
6. Go back to Step 2 and continue the algorithm until C_old and C_new no longer change between iterations.
We denote the diameter of a cluster C by diam(C), defined as
diam(C) = max { d(a, b) : a, b in C }.
The distance between a point x and a cluster C is defined as the mean of the distances between x and every single point in C:
d(x, C) = (1/|C|) Σ_{p in C} d(x, p).


Figure 4.19 The simple case of hierarchical division

4.5. Partitional clustering

Compared with hierarchical algorithms, partitional clustering cannot show the characteristics of the database as well. However, it saves much more computational time than the hierarchical algorithms. The most famous partitional clustering algorithm is the "K-means" method. Its algorithm is as follows:
1. Decide the number of clusters we want in the final classified result. We assume the number is K.
2. Randomly choose K data points (for an image, pixels) in the whole database (for an image, the whole image) as the centroids of the K clusters.
3. For every single data point (for an image, every pixel), find the nearest centroid and classify the point into the cluster where that centroid is located. After Step 3, all data points are classified into a specific cluster, and the total number of clusters is always K, as decided in Step 1.
4. For every cluster, calculate its centroid from the data points in it. The calculation of the centroid can also be defined by the user: it can be the median of the data within the cluster, or the true center. We again obtain K centroids, just as after Step 2.
5. Repeat Steps 3 and 4 until there is no change in two successive iterations. (Steps 4 and 5 are the checking steps.)


We have to mention that in Step 3 we do not always decide the cluster assignment by the "distance" between the data points and the centroids. Distance is used here because it is the simplest choice of criterion. We can also use other criteria, depending on the characteristics of the database or the final classified result we want.

Figure 4.20 (a) Original image. (b) Choose 3 clusters and 3 initial points. (c) Classify the other points using the minimum distance from each point to the center of a cluster.

There are some disadvantages of the K-means algorithm:
1. The results are sensitive to the initial random centroids we choose. That is, different choices of initial centroids may lead to different results.
2. We cannot show the segmentation details the way hierarchical clustering does.
3. The most serious problem is that the resulting clusters often have circular shapes, because the algorithm is distance-oriented.
4. It is important to clarify that having, say, 10 average values does not mean there are merely 10 regions in the segmentation result. See Figure 4.21.

Figure 4.21 (a) The result of the K-means. (b) The result that we want.

To solve the initialization problem
There is a solution to overcome the initialization problem. We can choose just one initial point in Step 2, use the two points beside the initial point as centroids to classify the data into two clusters, then use the four points beside those two points, and so on, until we reach the number of clusters we want. This means the initial point is fixed as well; we keep splitting the number of clusters until K clusters are classified. With this kind of improvement, we solve the initialization problem caused by randomly chosen initial data points. Whatever names such schemes are given, their concepts are all similar.

To determine the number of clusters
There has been much research on determining the number of clusters in K-means clustering. Siddheswar Ray and Rose H. Turi define a "validity" measure, the ratio of the "intra-cluster distance" to the "inter-cluster distance", as a criterion. The validity measure tells us what the ideal value of K in the K-means algorithm is.
They define the two distances as
intra = (1/N) Σ_{i=1}^{K} Σ_{x in Ci} ||x - zi||²,  inter = min over i ≠ j of ||zi - zj||²,
where N is the number of pixels in the image, K is the number of clusters, and zi is the cluster centre of cluster Ci.
Since the goal of the K-means clustering algorithm is to minimize the sum of squared distances from all points to their cluster centers, we first define the "intra-cluster" distance to describe the distances of the points from their cluster centre (centroid), and we want to minimize it.
On the other hand, the purpose of clustering a database (or an image) is to separate the clusters from one another. Therefore we define the "inter-cluster distance", which describes the difference between different clusters, and we want its value to be as large as possible. Obviously, if the distances between the clusters' centroids are


large, we need only a few clusters to segment the image, but if the centroids are close to one another, we need more clusters to classify it clearly.

4.5.1. Another Method to Solve the Initialization Problem Without Changing the K-means Scheme: Particle Swarm Optimization
PSO is a population-based random search process. We assume that there are N "particles" that appear randomly in a "solution space". Note that we are solving an optimization problem, and for data clustering there is always a criterion (for example, the squared error function) evaluated for every single particle at its position in the solution space. The N particles keep moving and evaluating the criterion at every position they visit (called the "fitness" in PSO) until the criterion reaches some threshold we need.
Each particle keeps track of the coordinates in the solution space that are associated with the best solution (fitness) it has achieved so far. This value is called the personal best, pbest. Another best value tracked by PSO is the best value obtained so far by any particle in the neighborhood of that particle; this value is called the global best, gbest. The exact mathematical statement is given below:

v_i(t+1) = w v_i(t) + c1 r1 (pbest_i - x_i(t)) + c2 r2 (gbest - x_i(t)),
x_i(t+1) = x_i(t) + v_i(t+1),
where x_i is the current position of the particle, v_i is the current velocity of the particle, pbest_i is the personal best position of the particle, gbest is the global best position, w, c1, c2 are constant factors, and r1, r2 are random numbers uniformly distributed within the interval [0, 1].

We use the previous velocity, together with the distances to the personal best and global best positions, to predict the current velocity. The current position is then predicted as the previous position plus the current velocity.

By using PSO, we can solve the initialization problem of K-means and still maintain the whole partitional clustering scheme. The most important thing is to think of the task as an optimization problem.
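An illustrative C++ sketch of a single PSO velocity and position update is given below (the function name and data layout are assumptions added here, not the report's own code).

#include <random>
#include <vector>

// Minimal sketch of one PSO update step for a swarm of D-dimensional particles.
// x, v, pbest are per-particle vectors; gbest is the best position found so far.
void psoStep(std::vector<std::vector<double>>& x,
             std::vector<std::vector<double>>& v,
             const std::vector<std::vector<double>>& pbest,
             const std::vector<double>& gbest,
             double w, double c1, double c2)
{
    static std::mt19937 gen(42);
    std::uniform_real_distribution<double> uni(0.0, 1.0);

    for (size_t i = 0; i < x.size(); ++i) {
        for (size_t d = 0; d < x[i].size(); ++d) {
            double r1 = uni(gen), r2 = uni(gen);    // random factors in [0, 1]
            // v = w*v + c1*r1*(pbest - x) + c2*r2*(gbest - x)
            v[i][d] = w * v[i][d]
                    + c1 * r1 * (pbest[i][d] - x[i][d])
                    + c2 * r2 * (gbest[d]    - x[i][d]);
            x[i][d] += v[i][d];                     // new position = old position + velocity
        }
    }
}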

4.5.2. Advantages and Disadvantages of Data Clustering

Hierarchical algorithms
Advantages: 1. The concept is simple. 2. The result is reliable; it shows a strong correlation with the original image. 3. Instead of calculating the centroids of clusters, we only have to calculate the distances between the individual data points.
Disadvantages: 1. The computing time is long, so this algorithm is not suitable for a large database.

Partitional algorithms (K-means)
Advantages: 1. The computing speed is fast. 2. The number of clusters is fixed, so the concept is also simple.
Disadvantages: 1. The number of clusters is fixed, so which number of clusters is the best choice? 2. The initialization problem. 3. Partitional clustering cannot show the characteristics of the database as hierarchical clustering does. 4. The resulting clusters are probably circular in shape. 5. We cannot improve K-means simply by setting fewer centroids.

We can solve the choice of the number of clusters by observing the "validity" value proposed by Siddheswar Ray and Rose H. Turi. The initialization problem can be solved by choosing only one initial point or by using the PSO algorithm directly. However, we cannot solve the circular-shape problem, because it is due to the core computing scheme of partitional algorithms.


4.6. Simulation of K-means using C++.
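The original C++ listing is not included in this copy of the report; the sketch below is a minimal illustration of K-means on scalar gray levels with random-index initialization.

#include <cmath>
#include <cstdlib>
#include <vector>

// Minimal sketch: K-means on scalar gray levels.
// Returns the cluster index assigned to every pixel.
std::vector<int> kmeansGray(const std::vector<double>& pixels, int K, int maxIter)
{
    std::vector<double> centroid(K);
    for (int c = 0; c < K; ++c)                        // Step 2: random initial centroids
        centroid[c] = pixels[std::rand() % pixels.size()];

    std::vector<int> assign(pixels.size(), 0);
    for (int it = 0; it < maxIter; ++it) {
        bool changed = false;
        for (size_t i = 0; i < pixels.size(); ++i) {   // Step 3: nearest centroid
            int best = 0;
            double bestDist = std::fabs(pixels[i] - centroid[0]);
            for (int c = 1; c < K; ++c) {
                double d = std::fabs(pixels[i] - centroid[c]);
                if (d < bestDist) { bestDist = d; best = c; }
            }
            if (assign[i] != best) { assign[i] = best; changed = true; }
        }
        std::vector<double> sum(K, 0.0);
        std::vector<int> cnt(K, 0);
        for (size_t i = 0; i < pixels.size(); ++i) {   // Step 4: recompute centroids
            sum[assign[i]] += pixels[i];
            ++cnt[assign[i]];
        }
        for (int c = 0; c < K; ++c)
            if (cnt[c] > 0) centroid[c] = sum[c] / cnt[c];
        if (!changed) break;                           // Step 5: stop when stable
    }
    return assign;
}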

Figure 4.22 Lena image after using K-means. Clusters/time: 9 clusters / 0.1 seconds.

The top-left image is the original; the others are the images after K-means. We observe that each cluster is not connected into one region. However, it is a fast method of region-based segmentation.


4.7. Cheng-Jin Kuo's Method
We now introduce the proposed method and explain, from the viewpoint of data compression, why we would like to use this kind of algorithm.

4.7.1. The Ideal Segmentation Result We Would Like to Obtain
The ideal result we would like to obtain is something like Figure 4.23. It is very important for us to classify similar regions together.

Figure 4.23 The ideal segmentation result we want.

We would like to see the whole hair section classified as one cluster, because after we obtain the result we can send it directly to the compression stage to compress every region. For almost all segmentation methods, it is unavoidable to over-segment a region such as the hair region of the Lena image.

4.7.2. Algorithm of Cheng-Jin Kuo's Method
Make the first pixel we scan (mostly, the top-left one) the first cluster.
1. See the pixel (x, 1) in the image as one cluster C1. See the pixel we are currently scanning as P.
2. In the first column, scan the next pixel (x, 1+1) and use the threshold T to decide whether it will be merged into the first cluster or become a new cluster.
If |P - mean(C1)| < T, we merge P into C1 and recompute the centroid of C1.
If |P - mean(C1)| ≥ T, we make P a new cluster.
3. Repeat Step 2 until all the pixels in the same column have been scanned.
4. Scan the next column, starting with pixel (x+1, 1), and compare it to the region C_up on its upper side. Make the merge decision to see whether we have to merge pixel (x+1, 1) into C_up.
If |P - mean(C_up)| < T, we merge P into C_up and recompute the centroid of C_up.
If |P - mean(C_up)| ≥ T, we make P a new cluster.
5. Scan the next pixel (x+1, 1+1) and compare it to the regions C_up and C_left, which lie on its upper side and on its left side, respectively. Make the merge decision to see whether we have to merge the pixel into either of them.
If |P - mean(C_up)| < T and |P - mean(C_left)| < T:
(1) We merge P into C_up and into C_left.
(2) Combine the regions C_up and C_left into one region Cn, where n is the cluster number so far.
(3) Recompute the centroid of Cn.
Else if |P - mean(C_up)| < T and |P - mean(C_left)| ≥ T, we merge P into C_up and recompute the centroid of C_up.
Else if |P - mean(C_up)| ≥ T and |P - mean(C_left)| < T, we merge P into C_left and recompute the centroid of C_left.
Else, we make P a new cluster.
6. Repeat Steps 4 and 5 until all the pixels in the image have been scanned.
7. Process the small regions produced by the previous steps.
It is important for us to deal with the isolated small regions carefully, because we do not want too many fragmentary results after segmenting the image with this method. Therefore, our goal is to classify each isolated small region into the big, already classified region that is adjacent to it. The following is the method to merge a small region into a big region:
(a) We process the regions that have a small size (for 256x256 input images, we aim at regions whose size is below 32 pixels).


(b) If the small region R is fully surrounded by a single bigger region, we merge R into that region.
(c) If the small region R is surrounded by several (for example, k) bigger regions, we take the adjacent pixels of R in each surrounding region as a region, compute the mean of R, and classify R into the most similar surrounding region, that is, the one whose mean is closest to the mean of R.

4.8. The Improvement of the Fast Algorithm: Adaptive Local Threshold Decision

In the algorithm of Section 4.7, the threshold does not change during the whole procedure. We would like to build a new procedure that adaptively decides the threshold from the local frequency and local variance of the original image.

4.8.1. Adaptive Threshold Decision with Local Variance
We would like to select the threshold based on the local variance of the image. Here are the steps of the algorithm:
1. Separate the original image into 4x4 = 16 sections.
2. Compute the variance of each of the 16 sections.
3. Depending on the local variance, select a suitable threshold: the bigger the variance, the bigger the threshold we assign.

Figure 4.24 Lena image separated into 16 sections.

The variances of the 16 sections can be arranged as a 4x4 matrix.

We can imagine that, after using the adaptive threshold method, the segmented results in sections (2,3) and (2,4), whose variances are 1960 and 1974, will be similar to those of the original method, whose global variance is 1943. We can also imagine that the new segmented results in sections (1,1) and (1,2) will be more detailed; in other words, in these two sections we will have more clusters in the result.

Most of the time, the adaptive threshold selection method helps us segment more precisely. However, as we can see, we do not really feel the improvement of selecting the local threshold with the local variance alone when observing the simulation results. A small variance makes our local threshold small, and therefore it produces a more detailed segmentation result in the end.

4.8.2. Adaptive Threshold Decision with Local Variance and Frequency
For example, baboon.bmp will be segmented in more detail because the adaptive local variance will select a smaller threshold in every partitioned area, owing to its low variance. However, we are not satisfied with this. We would like to segment the beard parts of this image more roughly.

Figure 4.25 Baboon.bmp

The local variance of baboon.bmp:


The local average frequency of baboon.bmp:

As we can see, the bottom-left area has a small variance but a large frequency component. In this area we would choose a small threshold, so the segmentation result would be too detailed. If we want to classify it as one region, we need to set the threshold depending on the local frequency as well.

To sum up, we consider four situations for our improvement:
1. High frequency, high variance. Set the highest threshold.

Figure 4.26 A high frequency and high variance image

2. High frequency, low variance. Set the second-highest threshold.

Figure 4.27 A high frequency and low variance image

3. Low frequency, high variance. Set the third-highest threshold.

Figure 4.28 A low frequency and high variance image


4. Low frequency, low variance. Set the lowest threshold.

Figure 4.29 A low frequency and low variance image

For the first case, the reason we select a higher threshold is that there are often many edges and different objects in this kind of area. The larger threshold may give a rougher segmentation result, but we believe the clear edges and the high variety between different objects will still make the segmentation work, and the larger threshold removes some of the over-segmentation caused by the high variance and high frequency. It might be thought that the smallest threshold in case four would cause an over-segmented result; however, the stable and monotonous character of case four prevents over-segmentation.

4.8.3. Deciding the Local Threshold
We use a formula to decide the threshold from two terms, F (derived from the local average frequency) and V (derived from the local variance):
F = A × (local average frequency) + B,
V = C × (local variance) + D.
In this thesis we always try to keep the threshold value between 16 and 32, since the best threshold found by testing with the original method (without the adaptive threshold) is 24. Accordingly, the range of F is 0 to 8, and so is the range of V; with F and V each at most 8, the combined threshold 16 + F + V reaches a maximum of 32.

If local average frequency > 9, F = 6;
else if local average frequency < 3, F = 0;
end
If local variance > 3000, V = 6;
else if local variance < 1000, V = 0;
end

With the range of F defined from 0 to 8 and the range of V defined from 0 to 8, the values of A, B, C, D are 4/3, -4, 0.004, and -4, respectively. The parameter values simply follow from the linear relationships we compute.

The formula for F expresses the linear relationship between the local average frequency and F; we apply it only when 3 < local average frequency < 9. The formula for V expresses the linear relationship between the local variance and V; we apply it only when 1000 < local variance < 3000. We can also change the range of the final threshold; all we have to do is recompute the parameters A, B, C, D from these linear relationships.
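A small illustrative sketch of this threshold rule follows; the combination threshold = 16 + F + V is our reading of the stated ranges, and the clipping of F and V to [0, 8] follows the pseudocode above.

#include <algorithm>

// Minimal sketch: adaptive local threshold from the local average frequency
// and the local variance, clamped so that F and V stay in [0, 8] and the
// threshold stays in [16, 32]. The combination 16 + F + V is inferred from
// the ranges stated in the text, not taken from the original listing.
double localThreshold(double avgFrequency, double variance)
{
    const double A = 4.0 / 3.0, B = -4.0;   // F = A * frequency + B
    const double C = 0.004,     D = -4.0;   // V = C * variance  + D

    double F = A * avgFrequency + B;
    double V = C * variance + D;
    F = std::clamp(F, 0.0, 8.0);            // keep both contributions in [0, 8]
    V = std::clamp(V, 0.0, 8.0);
    return 16.0 + F + V;                    // threshold between 16 and 32
}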

4.9. Comparison of All Algorithms by Data Compression

                    Region growing                  K-means                         Watershed          Cheng-Jin Kuo's method
Speed               bad                             good (worse than C.J.K.'s)      bad                good
Shape connectivity  intact                          fragmentary                     oversegmentation   intact
Shape match         good (better than C.J.K.'s)     good (equal to C.J.K.'s)        bad                good


5. Boundary Compression Using an Asymmetric Fourier Descriptor for Non-closed Boundary Segments

This chapter briefly introduces the Fourier descriptor and provides an improvement in using the Fourier descriptor to describe a boundary. We define a variable R to represent the ratio of the number of compressed terms P to the number of original terms K in the discrete Fourier transform; that is, R = P/K.

5.1. Fourier Descriptor
The Fourier descriptor is a method of describing a boundary by applying the DFT to the boundary coordinates, where the x-axis becomes the real part and the y-axis becomes the imaginary part. We assume that the boundary has K points whose coordinates can be expressed in the form (x(k), y(k)), k = 0, 1, 2, ..., K-1. Each coordinate pair can be treated as a complex number so that
s(k) = x(k) + j y(k).
The discrete Fourier transform (DFT) of s(k) is
a(u) = Σ_{k=0}^{K-1} s(k) e^(-j2πuk/K),  u = 0, 1, ..., K-1.
The complex coefficients a(u) are called the Fourier descriptors of the boundary. The inverse Fourier transform of these coefficients recovers s(k); that is,
s(k) = (1/K) Σ_{u=0}^{K-1} a(u) e^(j2πuk/K).
We get rid of the high-frequency terms whose index u is higher than P-1. In mathematics, this is equivalent to setting a(u) = 0 for u > P-1. The result is the following approximation to s(k):
ŝ(k) = (1/K) Σ_{u=0}^{P-1} a(u) e^(j2πuk/K),  k = 0, 1, ..., K-1.

In Fourier transform theory, high-frequency components account for fine detail and low-frequency components determine the global shape. Thus, the smaller P is, the more detail is lost on the boundary.
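For illustration (not the report's code), the sketch below implements the plain Fourier-descriptor pipeline described above: a direct O(K^2) DFT of the boundary, truncation to the first P coefficients, and the inverse DFT.

#include <complex>
#include <vector>

// Minimal sketch: Fourier descriptor of a boundary and its reconstruction
// from only the first P coefficients (the remaining ones are set to zero).
using cd = std::complex<double>;

std::vector<cd> reconstructBoundary(const std::vector<cd>& s, std::size_t P)
{
    const std::size_t K = s.size();
    const double PI = 3.14159265358979323846;

    std::vector<cd> a(K);                           // forward DFT: descriptors a(u)
    for (std::size_t u = 0; u < K; ++u)
        for (std::size_t k = 0; k < K; ++k)
            a[u] += s[k] * std::polar(1.0, -2.0 * PI * double(u) * double(k) / double(K));

    for (std::size_t u = P; u < K; ++u) a[u] = 0.0; // keep only the first P terms

    std::vector<cd> shat(K);                        // inverse DFT: approximation
    for (std::size_t k = 0; k < K; ++k) {
        for (std::size_t u = 0; u < K; ++u)
            shat[k] += a[u] * std::polar(1.0, 2.0 * PI * double(u) * double(k) / double(K));
        shat[k] /= static_cast<double>(K);
    }
    return shat;   // boundary points: real part = x, imaginary part = y
}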

Problems of the Fourier descriptor
The Fourier descriptor has a serious problem when the compression ratio is below 20%: below this ratio, the corners of the boundary shape are smoothed out. Note that the corners of an image or boundary usually correspond to high-frequency components in the frequency domain. If we reconstruct the boundary from the truncated descriptors with R less than 20%, the results are not very good at the corners of the boundaries.

5.2. Asymmetric Fourier Descriptor of a Non-closed Boundary Segment
There is a method proposed by Ding and Huang that can solve the problems mentioned above. This method, the so-called "asymmetric Fourier descriptor of a non-closed boundary segment", can improve the efficiency of the Fourier description even when the value of R is below 20%. There are three approaches (steps) in this method, introduced below. [A-1]

5.2.1. Approach 1: Predicting and Marking the Corners
The first step of the method is to find the corner points of the boundary; in this step we predict and mark the corners. As we can see in Figure 5.30, the corner points are at the regional maxima of the error value. In our experiment, we define the corner points as the places where the error value is greater than 0.5 and is the maximum within the nearby 10-point region.

Figure 5.30 (a) A star-shaped boundary. (b) The error between the two boundaries of (a).

In this method, predicting and marking the corners is just the first step. After this step, we have to segment the original boundary into several parts and convert these boundary segments with the Fourier descriptor.


5.2.2. Approach 2: Fourier Descriptor of a Non-closed Boundary Segment
Using the Fourier descriptor to describe a boundary means getting rid of the high-frequency components.


Figure 5.31 Using the Fourier descriptor on a non-closed boundary segment (DFT, truncation from K to P coefficients, inverse DFT).

However, if we truncate the high-frequency components of a non-closed boundary, the reconstructed boundary will be a closed one. We describe the solution as follows:

Figure 5.32 The steps used to solve the non-closed problem: (a) the boundary segment s1(k) with end points (x0, y0) and (xK-1, yK-1), (b) the linearly shifted segment s2(k), (c) the segment s3(k) obtained by adding a new, odd-symmetric segment.

Step 1: We set the coordinates of the start point as (x0, y0) and of the end point as (xK-1, yK-1). See Figure 5.32(a).
Step 2: We shift the boundary points linearly according to the distance along the curve between the two end points. If (xk, yk) is a point of the boundary segment s1(k), for k = 0, 1, ..., K-1, it is shifted to (xk', yk'), giving s2(k); see Figure 5.32(b).
Step 3: We add a boundary segment that is odd-symmetric to the original one. The new boundary segment s3(k) is then closed and perfectly continuous along the curve between the two end points. See Figure 5.32(c).
Step 4: Compute the Fourier descriptor of the new boundary segment s3(k).

40

Page 41: disp.ee.ntu.edu.twdisp.ee.ntu.edu.tw/meeting/建齊/Image Segmentation .doc · Web viewFigure 4.6 Lena image after using k-means. Clustering/time: 9 clustering/ 0.1 seconds. The image

If the signal s(k) is odd-symmetric, its DFT a(u) is also odd-symmetric.

Because the central point of s3(k) is the origin, the DC term (the first coefficient of the DFT) is zero. We therefore only need to record the second to the K-th coefficients of the Fourier descriptors, as illustrated in Figure 5.33.

Figure 5.33 Fourier descriptor of s3(k): because of the odd symmetry the DC term is zero, and the coefficients beyond the (K-1)-th are redundant.

After doing all these steps, we can take the Fourier descriptor of the processed boundary, and the problem we mentioned no longer exists.

5.2.3. Approach 3: Boundary Compression
This approach applies the process of Approach 2 to boundary compression. We reserve only the first P-1 coefficients and truncate the other coefficients. The whole coefficient sequence is later recovered by stuffing zeroes and copying to the odd-symmetric part.

Figure 5.34 Only the first P-1 coefficients of |a(u)| are reserved; the rest are truncated.

Figure 5.35 Recovering the whole coefficient sequence from Figure 5.34 by zero-stuffing and copying to the odd-symmetric part.


5.2.4. Approach 4: Boundary Encoding
In the boundary segment encoding, we have four data to record: the coordinates of each corner, the segment number of each boundary, the point number of each segment, and the coefficients of each segment (see Figure 5.36).

In the third item, the point number of each segment is recorded through the distance between the two end points of each boundary segment. The distance used here is the sum of the two distances along the x-axis and the y-axis.

Figure 5.37 Point numbers of the boundary segments and the distances between their two end points.

As we can see in Figure 5.37, we have a vector n of the point numbers of the boundary segments. Similarly, dx and dy are vectors that record the end-point distances along the x-axis and the y-axis, respectively. Therefore, we can compute the difference vector d = n - (dx + dy).
The values of the difference vector d are close to zero, so d is appropriate to encode with Huffman coding. In the decoder, we can recover n as n = d + (dx + dy).

In the fourth item, we combine the coefficients of all boundary segments of a whole boundary and encode them with zero-run-length and Huffman coding. When a boundary segment is a straight line, its Fourier-descriptor coefficients are all zero. Therefore, zero-run-length coding is appropriate when many boundary segments are straight lines.

Figure 5.36 Boundary segment encoding: the coordinates of each corner, the segment number of each boundary, and the point number of each segment are encoded with difference and Huffman encoding; the coefficients of each segment are truncated, quantized, and encoded with zero-run-length and Huffman encoding; the results are combined into the bit stream.

Because we have recorded the point number of each boundary segment, we can calculate the reserved coefficient number and split the combined coefficient array correctly. We can then recover the original coefficients by stuffing zeroes into the truncated positions.

(a) Original boundary. (b) Recovered boundary with R = 10% and the reserved coefficient number kept greater than 3.

Figure 5.38 Result of improved Boundary Compression

However, if we use the modified Fourier descriptor method, which splits the boundary at the corner points into several boundary segments, the sharp corners can be preserved even when R is small. We can also see that when R = 10%, the result of the original Fourier descriptor method is obviously distorted, whereas in the modified method the characteristics of the corners are preserved and the longer boundary segments are not obviously distorted.

We have to notice that the shorter boundary segments are stretched from curves into straight lines when the number of reserved coefficients falls below one. Therefore, we force the reserved coefficient number to be greater than three; in our experiment, three coefficients can represent most of the characteristics. The improved result is shown in Figure 5.38.


6. References

1. R. C. Gonzalez and R. E. Woods, Digital Image Processing, 3rd edition, Prentice Hall, 2010.

2. L. G. Roberts, "Machine Perception of Three-Dimensional Solids," in Optical and Electro-Optical Information Processing, J. T. Tippett (ed.), MIT Press, Cambridge, Mass., 1965.
