
Pattern Recognition Letters 18 (1997) 173-185

Colour image segmentation by modular neural network 1

A. Verikas a,b,*, K. Malmqvist a, L. Bergman a

a Center for Imaging Sciences and Technologies, Halmstad University, Box 823, S-301 18 Halmstad, Sweden
b Kaunas University of Technology, Studentu 50, 3031 Kaunas, Lithuania

    Received 20 June 1996; revised 22 October 1996

    Abstract

In this paper segmentation of colour images is treated as a problem of classification of colour pixels. A hierarchical modular neural network for classification of colour pixels is presented. The network combines different learning techniques, performs analysis in a rough to fine fashion and achieves a high average classification speed and a low classification error. Experimentally, we have shown that the network is capable of distinguishing among the nine colour classes that occur in an image. A correct classification rate of about 98% has been obtained even for two very similar black colours. © 1997 Elsevier Science B.V.

    Keywords: Colour classification; Image segmentation; Modular neural networks

    1. Introduction

Colour image processing and analysis is increasingly used in industry, medical applications, and other fields. Quality inspection, process control, material analysis, and medical image processing are a few examples. Therefore, research in colour perception and the development of efficient computational models for real world problems is of crucial importance. One task that often arises in colour image processing is image segmentation. Colour image segmentation techniques can roughly be categorised into techniques for chromatically dividing an image space and those for clustering a feature space derived from an image. Region growing, region splitting and merging are the common approaches used by methods of the first group (Liu

* Corresponding author. E-mail: [email protected].
1 Electronic Annexes available. See http://www.elsevier.nl/locate/patrec.

and Yang, 1994; Panjwani and Healey, 1995). Methods of the second group divide colour space into clusters (Uchiyama and Arbib, 1994; Tominaga, 1992). The colour image segmentation method we discuss here belongs to the latter category. We treat the colour image segmentation problem as a problem of classification of colour pixels.

The most common goal in colour image segmentation is to partition a colour image into a set of uniform colour regions. However, the aim of this work is slightly different. The motivation for this work is a need to determine the colours of inks used to produce a multi-coloured picture created by printing dots of cyan (c), magenta (m), yellow (y) and black (k) primary colours upon each other through screens having differing raster angles. The answer must be given for any possible combination of cyan, magenta, yellow and black ink and for any area of the picture. One factor that influences the colour impression of the picture is the size and shape of the areas covered by the

0167-8655/97/$17.00 © 1997 Elsevier Science B.V. All rights reserved. PII S0167-8655(97)00004-4


Table 1
The mean values and standard deviations of the variables R, G and B for five overlapping classes of colours (R, G, B ∈ [0, 255])

Colour class        m              my             cm             cmy            k
Variable         R   G   B     R    G   B     R   G   B     R   G   B     R   G   B
Mean           119  41  53   112   37  33    32  31  48    30  31  33    25  25  24
Stand. dev.      5   5   7     6    5   6     5   6  10     5   6   7     4   3   4

different inks. This information can be used to control the amount of ink transferred to the paper in each of the four printing nips holding cyan, magenta, yellow and black. The measurement of the area covered by ink of the different colours can be done automatically using an image analysis system, if the image taken from the printed picture can be segmented into regions according to the following two rules:

    1. Pixels should be assigned to the same cluster (colour class) if they correspond to areas of the picture that were printed with the same inks.

2. Pixels corresponding to areas printed with different inks should be assigned to different clusters.

The task is solved by determining a colour class for every pixel of the image. In order to solve the task with acceptable classification accuracy and at a high average speed, we propose the use of a hierarchical modular neural network. Note that classification speed is of primary interest in our application. The rest of the paper is organised as follows. In the next two sections we briefly describe the input data and the colour space used. The architecture of the network is presented in Section 4. Procedures for training the network are given in Section 5. Section 6 summarises the results of experimental investigations. Section 7 concludes the work.

    2. The data

When mixing dots of cyan, magenta and yellow colours, eight combinations are possible for every pixel in the picture. The combination cmy produces the black colour. However, in practice black ink is most often also printed. We assume the black ink to be opaque. Therefore, we have to distinguish between 9 colour classes, namely c, m, y, w (white paper), cy, cm, my, cmy (black resulting from overlay of cyan, magenta and yellow) and k (black resulting from black ink). Discrimination between some of the colour classes is a rather complicated matter, since they are highly

overlapping in the colour space. For example, m-my and cm-cmy-k are two clusters of such highly overlapping colour classes. To illustrate this we present in Table 1 the mean values and standard deviations of the variables R, G and B for these five classes of colours. Note that the intensity values shown in the table are for data taken from solid print areas only. This means no pixels from the dots and the "fuzzy borders" of the dots are included. The pixels from the borders create "bridges" between white and the other colours and make the classification problem more difficult to solve. Besides that, the R, G and B parameters of class k acquire a very large variance, since the black ink can appear beneath (or above) all possible combinations of the other coloured inks.

The number of clusters with highly overlapping classes of colours depends on several factors, such as the amount of black ink printed on the picture, the printing technology, the properties of the inks used and some other factors. By increasing the amount of black ink, we make the other colours darker and more and more similar until we get only one cluster with only one black colour. We assume, therefore, the range of variation of the amount of black ink to be 0-50%. In such a range of variation of black ink, printers would like to measure the percentage of an area covered by inks of different colours. Though several clusters of rather overlapping colour classes can appear and it is important to recognise all the colour classes with acceptable classification accuracy, the most difficult task is to distinguish between the colour classes cmy and k.

    3. Choice of colour space

Five colour spaces, namely RGB, HSV, CIELuv, CIELab and "IJK", have been tested and compared experimentally, and the "IJK" colour space was chosen. The highest correct classification rate was obtained in this space for the most overlapping colour


Fig. 1. Architecture of the network. A colour pixel x, together with additional features y_1, ..., y_m from its surrounding, is mapped to new variables from R, G and B. A binary decision tree outputs either a class label or a set of ambiguous classes; one counterpropagation network per set of ambiguous classes outputs either a class label or a set of CP weight vectors {w_1, ..., w_i, ..., w_k}; weighted Euclidean distances d(x, c_i) = {Σ_{k=1}^{n} [a_{ik}(x_k − w_{ik})]² + Σ_{l=1}^{m} [a_{il}(y_l − u_{il})]²}^{1/2}, with weights a_{ij} obtained by random optimisation, output a class label; fuzzy post-processing outputs the final class label.

classes (cm, cmy and k).

The "IJK" colour space uses colour difference signals. If we assume the random variables R, G and B to be of equal variances (σ²) and covariances (only the variances of the variables have been normalised to be equal to one in our experiments), the covariance matrix of these variables can be written as (Tan and Kittler, 1993)

$$\Sigma = \sigma^2 \begin{pmatrix} 1 & r & r \\ r & 1 & r \\ r & r & 1 \end{pmatrix}, \qquad (1)$$

where r is the correlation coefficient. The eigensolution of the covariance matrix gives the following eigenvectors (e_i) and the corresponding eigenvalues (λ_i):

$$e_1 = \{1, 1, 1\}^T; \quad e_2 = \{1, 0, -1\}^T; \quad e_3 = \{1, -2, 1\}^T; \qquad (2)$$

$$\lambda_1 = \sigma^2(1 + 2r); \quad \lambda_2 = \lambda_3 = \sigma^2(1 - r). \qquad (3)$$

    The linear transform of the {R, G, B} vector by the eigenvectors produces other random variables

$$I = R + G + B, \qquad (4)$$

$$J = R - B, \qquad (5)$$

$$K = R - 2G + B, \qquad (6)$$

which are almost uncorrelated and have zero covariances under the above assumption. I, J and K are the variables of the "IJK" colour space.
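For concreteness, a minimal sketch of the transform; the function name and the vectorised array layout are ours:

```python
import numpy as np

def rgb_to_ijk(rgb):
    """Map an (..., 3) array of R, G, B values to the "IJK" space of
    Eqs. (4)-(6): an achromatic signal I and two colour difference
    signals J and K."""
    rgb = np.asarray(rgb, dtype=float)   # avoid uint8 overflow in the sums
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    i = r + g + b          # Eq. (4): achromatic signal
    j = r - b              # Eq. (5): colour difference signal
    k = r - 2 * g + b      # Eq. (6): colour difference signal
    return np.stack([i, j, k], axis=-1)
```

For the class-m mean of Table 1, (119, 41, 53), this gives (213, 66, 90).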

According to Hunt (1991), three signals representing colour are transmitted via nerve fibres from the human eye to the brain. One of these signals is usually referred to as an achromatic signal and the other two as colour difference signals. In this sense the I, J and K variables mimic the signals transmitted to the human brain, since I can be referred to as an achromatic signal, and J and K as colour difference signals.

Distances measured in the "IJK" as well as the RGB colour space do not represent colour differences on a uniform scale from the point of view of perception. The CIELuv and CIELab colour spaces are more uniform in this sense. In spite of that we have chosen the "IJK" colour space. A low classification error and a high processing speed, not a good correspondence between measured and perceived colour differences, are our primary interest when choosing the colour space. Such an approach comes from the goal, which is to determine the colours of inks used to print any arbitrary area of a given picture, not to segment an image of the picture as similarly as possible to the way humans do. Even when calculated without pre-processing, the I, J and K variables are much less correlated than R, G and B. The amount of calculation required to obtain the I, J and K variables is less than for {L, u, v}, {L, a, b} or {H, S, V}. Besides, the variable H is undefined for a grey colour. Therefore, the choice of the "IJK" colour space seems reasonable in this particular case.

Fig. 2. A forward-only counterpropagation network: an input layer taking x_1, x_2, ..., x_n and y_1, y_2, ..., y_m, a competitive layer with weight vectors w_i, and a Grossberg layer with weight vectors u_i.

    4. Architecture of the neural network

    The architecture of the network is shown in Fig. 1. There are four steps in the proposed classification

procedure. A binary decision tree performs the first step of the procedure. We carry out the second and the third classification steps by using the weight vectors of the counterpropagation (CP) network. The last classification step (fuzzy post-processing) is based on analysis of the decisions made in the previous step. We use only the three variables I, J and K to describe a pixel in the first two classification steps. In the third step adjacency information is also exploited, since a pixel acquires additional co-ordinates, the values of which are calculated from the surrounding of the pixel being classified. Next, we briefly describe the network modules that perform the different classification steps.

    4.1. Binary decision tree

The binary decision tree performs the first classification step. Two types of terminal nodes can be encountered in the tree: (1) the node representing one colour class only, and (2) the node representing a cluster (a set) of ambiguous colour classes. The classification performed by the tree is final for the pixels arriving at terminal nodes of the first type. Pixels reaching terminal nodes of the second type are transferred to the CP network for further analysis. The tree divides colour space into several colour regions. The classification performed by the tree is very fast, since the tree consists of only a few nodes and only one neuron is used in every node of the tree.

    4.2. Counterpropagation network

Devoted to each set of ambiguous colour classes is a CP network (Fig. 2). The size of the network's competitive layer depends on the number of ambiguous classes in the set. Each class from this set is represented by a part of the layer. These parts are trained separately and concatenated to make one network for each and every set of ambiguous classes. The second classification step is performed by using the weight vectors w_i = (w_{i1}, ..., w_{in}) of the competitive layer as reference patterns in a k-NN classification rule. The number of nodes in the Grossberg layer is equal to the number of features y_1, ..., y_m extracted from the surroundings of the pixel being classified. The learned values of the features are stored as weights of the Grossberg layer. Therefore, each weight vector of the competitive layer w_i (a rough reference pattern) is associated with one Grossberg layer weight vector u_i = (v_{i1}, v_{i2}, ..., v_{im}) (a reference pattern containing more details). The association of the weight vectors produces a concatenated weight vector c_i = (w_{i1}, ..., w_{in}, v_{i1}, v_{i2}, ..., v_{im}). The third classification step is performed by using the concatenated weight vectors. The vectors are treated as reference patterns in a minimum distance classifier.

The CP network acts as a quantiser of a colour region. The weight vectors w_i quantise the region in the 3-dimensional "IJK" colour space, while the concatenated weight vectors c_i quantise the region in an extended (3 + m)-dimensional space.


4.2.1. Classification by using the CP network

During the second classification step the k nearest weight vectors w_i of the competitive layer are selected. Note that the nearest weight vectors are found among all N weight vectors of the concatenated competitive layer. The classification result is final if, among these weight vectors, vectors of one class dominate, i.e. the ratio of the numbers of weight vectors representing the two most frequently appearing colour classes exceeds some pre-specified threshold. Otherwise, k weight vectors c_i are emitted as an output of the CP network. The vector c_1 is a concatenation of the first winner w_1 of the competitive layer and the associated vector u_1 of the Grossberg layer. The weight vectors c_i are used in the third classification step, which is performed by calculating the weighted Euclidean distances. The weights a_ij that appear in the weighted Euclidean distance measure are specific for each reference pattern. The weights are found by performing a random optimisation in the weight space (Verikas et al., 1996). By doing such steps we perform analysis in a rough to fine fashion: a rough classification with the binary decision tree, a more precise one with the competitive layer weights, and an accurate classification with the concatenated weights.
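A minimal sketch of this dominance test, assuming the competitive-layer weight vectors are stacked in one array with non-negative integer class labels; the values of k and the ratio threshold are ours, the paper does not state them here:

```python
import numpy as np

def second_step(x, W, labels, k=5, ratio_threshold=3.0):
    """Second classification step: k-NN over all N competitive-layer
    weight vectors W (N x 3) with class labels.  Returns ("final", class)
    when one class dominates among the k nearest vectors, otherwise
    ("ambiguous", indices of the k winners) so the third step can refine
    the decision with the concatenated weight vectors c_i."""
    d = np.linalg.norm(W - x, axis=1)      # distances to all N vectors
    nearest = np.argsort(d)[:k]            # k nearest weight vectors
    counts = np.bincount(labels[nearest])  # votes per colour class
    top = np.sort(counts)[::-1]
    second = top[1] if len(top) > 1 else 0
    if second == 0 or top[0] / second >= ratio_threshold:
        return "final", int(np.argmax(counts))
    return "ambiguous", nearest
```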

    4.3. Fuzzy post-processing

As has been mentioned, one counterpropagation network is constructed for each and every set (cluster) of ambiguous classes. During the learning process the weight vectors of the network are distributed in the colour space according to the class-conditional probability density function of the input data used for learning. The weight vectors of the trained network are treated as reference patterns and they represent regions of the colour space.

Let M_AC be the number of ambiguous classes in the set. Each class j (j = 1, 2, ..., M_AC) is represented by N_j weight vectors. The set of these vectors is

$$\{C_j\} = \{c_i^j,\; i = 1, \ldots, N_j\}, \quad j = 1, \ldots, M_{AC}, \qquad (7)$$

$$\{C\} = \bigcup_{j=1}^{M_{AC}} \{C_j\}, \qquad (8)$$

$$N = \sum_{j=1}^{M_{AC}} N_j. \qquad (9)$$

As far as highly overlapping colour classes are considered, most of the weight vectors will be located in the overlapping regions of the class-conditional distributions. However, some of the vectors will also be placed in the non-overlapping "tails" of the distributions. Therefore, the decisions made by using the different weight vectors are not of the same reliability. We say that the decision is made (when classifying pixel x) by using the weight vector c_i^j (i = 1, 2, ..., N_j; j = 1, ..., M_AC) if the minimum distance d(x, c_i^j) has been obtained by using the weight vector c_i^j. Some of the decisions made by using weight vectors from the overlapping regions can be rather doubtful. Therefore, a correction of the decisions (the post-processing) takes place after the pixels have been classified. The concept of the correction is as follows.

The decision classes (the colour classes) and the weight vectors c_i^j representing the regions of the colour space are considered as fuzzy sets. Membership values for the fuzzy sets and the fuzziness of the decisions made by the weight vectors are defined. Classification of an image by the counterpropagation network results in the classified image as well as in a number M_AC (the number of ambiguous classes in the set) of supplementary images. Every pixel x in the supplementary image j is represented by the value of the membership function A_j(x) of the jth ambiguous class. Post-processing is based on information about the membership values and the fuzziness of the decisions. More details about the post-processing can be found in (Verikas and Malmqvist, 1995).

    4.4. Benefits of the architecture

A high average classification speed and a low classification error are the main attributes of the architecture chosen. The binary decision tree performs fast classification by using only the three colour space co-ordinates and assigning each colour pixel to one of several colour regions. In the second classification step the location of the pixel in the region is analysed by using 3-dimensional weight vectors representing sub-regions of the region being considered. As a result of the analysis, several sub-regions for the possible location of the pixel are selected. In addition to the three main colour space co-ordinates, adjacency information is also exploited in the third classification step for making a decision about the pixel's colour class. The colour class assigned to the pixel may be further


changed in the last classification step, depending on which weight vectors have been used to make decisions about the colour classes of the adjacent pixels. Therefore, in each step the dimensionality of the decision space is reduced, while the amount of information used to make a decision is increased. Depending on the colour of a pixel, the classification process can be completed at any of the steps.

By performing nested analysis and by adaptively using the amount of information for the classification process we achieve the required accuracy and gain analysis speed. In contrast, image segmentation methods based on region growing, region splitting and merging require intensive calculations for performing multiple splits and merges. Besides, such methods are not directly applicable in our case, since our goal is to determine the colours of inks used to print any arbitrary area of a given picture, not to segment an image of the picture as similarly as possible to the way humans do. For example, some "cmy" regions are sometimes perceived as being more similar to the class "k" than to their own class; nevertheless, pixels from these regions should acquire the label "cmy".

    5. Training the network

    5.1. Binary decision tree

The binary decision tree is constructed by sequentially dividing the learning set into two parts. In every node of the tree the learning set is divided into two subsets according to the decision boundary developed during learning. Only one neuron (of any desired order, in the general case) is used to solve the task (of dividing the learning set into two parts) in every node of the tree. The neuron can classify a data point x into one of two subsets according to the sign of the neuron's output value. The output is given by

$$y = f(u) = f\Big(w_0 + \sum_i w_i x_i + \cdots + \sum_{i_1 \leq \cdots \leq i_L} w_{i_1 \cdots i_L} x_{i_1} \cdots x_{i_L}\Big), \qquad (10)$$

with x_i being the ith component of the input data, w_i the corresponding weight, and L the neuron's order. The function f() ranges from -1 to +1 with f(0) = 0, for example f(u) = tanh(u).

The learning set X is partitioned into two subsets X_+ and X_- according to the following rule:

$$x \in \begin{cases} X_+ & \text{if } g(x) \geq 0, \\ X_- & \text{if } g(x) < 0, \end{cases} \quad \forall x \in X, \qquad (11)$$

where g(x) is given by

$$g(x) = w_0 + \sum_i w_i x_i + \cdots + \sum_{i_1 \leq \cdots \leq i_L} w_{i_1 \cdots i_L} x_{i_1} \cdots x_{i_L}. \qquad (12)$$

The learning set X contains labelled as well as unlabelled pixels. The unlabelled pixels are those coming from the borders of the dots. Labels for such pixels are hard or even impossible to obtain. Therefore, an unsupervised learning algorithm that we have recently proposed is used for the binary decision tree construction (Verikas et al., 1995). For every node of the tree the algorithm tries to locate the decision boundary (12) in a place with few learning samples. A node of the tree is labelled as a terminal node of the first type when all labelled samples falling into the node belong to the same class, or if only one class has a number of labelled samples above the threshold T1 and the ratio of samples of the two major classes represented by the node is above the threshold T2. A node of the tree is labelled as a terminal node of the second type if the number of labelled samples falling into the node is above the threshold T1 for more than one class and the samples falling into the node form a "compact cluster". The algorithm that can find the "compact clusters" is given in (Verikas et al., 1996).
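As an illustration, a sketch of the split at one node for the simplest first-order case (L = 1); the unsupervised placement of the boundary itself follows Verikas et al. (1995) and is not reproduced here:

```python
import numpy as np

def node_split(X, w0, w):
    """Partition a learning set X (N x d) with a first-order neuron:
    g(x) = w0 + sum_i w_i x_i, i.e. Eq. (12) with L = 1.
    Returns the subsets X+ (g(x) >= 0) and X- (g(x) < 0) of Eq. (11)."""
    g = w0 + X @ w                 # sign of g(x) decides the subset
    return X[g >= 0], X[g < 0]
```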

    5.2. Counterpropagation network

5.2.1. Process of designing the network

The CP networks with input from the second-type nodes of the tree are trained separately. In order to avoid overtraining and to achieve better generalisation properties of the network, separate data sets have been used in different steps of the design process.

Data sets used to design a pattern recognition system are always limited and very often not representative enough. This often happens because of a lack of experimental data (this was not the case in this study), limited resources of computer memory or computation time. It also often happens that sets are collected in favour of one or another class. Therefore, the use of different data sets in different steps of the design process reduces the possibility that the system is designed in favour of some classes and improves the generalisation properties of the system.

Six sets of data, namely a learning set, two validation sets, two optimisation sets and a testing set, have been used for constructing each network. Each set represents the M_AC classes. Clearly, if only a small amount of experimental data is available, the optimisation sets can be replaced by the learning set, the two validation sets by only one, or "leave few out" techniques can be applied.

First, the weight vectors of the competitive and the Grossberg layers of the CP network are obtained for each of the M_AC classes (using the learning set) by means of competitive learning with "conscience" and the Grossberg learning law, respectively. See the section below for the detailed learning procedure.

Next, the set of concatenated weights is optimised by using the modified LVQ algorithm (Section 5.3) and optimisation set 1. We use a "pocket optimisation" strategy: the best set of weights c is traced during the optimisation process and kept in the "pocket", and the optimisation terminates with the best set of weights. The quality of the weights is tested on validation set 1.

In the next step of the design process we find the weights a_ij that appear in the weighted Euclidean distance (Section 5.4). The Alopex algorithm (Unnikrishnan and Venugopal, 1994), performing a random search in the weight space, and optimisation set 2 are used for obtaining the weights. During the optimisation process some of the weights decrease to zero and eliminate the corresponding features (Verikas et al., 1996). The features eliminated are different for different reference patterns. Therefore we say that the features are selectively used for classification: the features used are different for different regions of the colour space. The whole CP network is tested on validation set 2 after the optimisation.

This design process is repeated for different numbers of CP network nodes. The network yielding a reasonable trade-off between classification error and complexity is chosen as the final one.

5.2.2. Training the network

First, the weight vectors w of the competitive layer of the CP network are obtained for each of the M_AC classes by means of competitive learning with "conscience" (Verikas and Malmqvist, 1995). The "conscience" mechanism is similar to that proposed by Desieno (1988).

In each iteration of the learning process we find a "winning" weight vector using the following equation:

$$k = \arg\min_i \big( d(x, w_q^i) - b_q \big), \quad i = 1, 2, \ldots, N_q, \qquad (13)$$

where d(x, w_q^i) is the distance between pixel x and the ith weight vector of the qth class, and b_q is the winning-frequency-sensitive term that penalises too frequent "winners" and rewards those that win seldom (Verikas and Malmqvist, 1995). Then the winning weight vector w_q^k(t) is updated according to the rule

$$w_q^k(t+1) = w_q^k(t) + \alpha_t \big[ x(t) - w_q^k(t) \big], \qquad (14)$$

where {α_t} is a slowly decreasing sequence of learning coefficients.
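A sketch of one learning iteration; the exact form of the conscience bias b_q and the constant c are our Desieno-style assumptions, as the paper only states that the bias penalises frequent winners:

```python
import numpy as np

def conscience_step(x, W, wins, alpha, c=10.0):
    """One iteration of competitive learning with a "conscience" for one
    class q, Eqs. (13)-(14).  W is the (Nq x d) weight matrix and `wins`
    counts how often each node has won so far."""
    n = len(W)
    p = wins / max(1, wins.sum())      # empirical winning frequencies
    b = c * (1.0 / n - p)              # bias rewarding seldom winners (assumed form)
    d = np.linalg.norm(W - x, axis=1)
    k = int(np.argmin(d - b))          # Eq. (13): biased winner
    wins[k] += 1
    W[k] += alpha * (x - W[k])         # Eq. (14): move winner towards x
    return k
```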

When training of the competitive layer terminates, the weights w are frozen and the learning proceeds for the Grossberg layer (separately for each of the M_AC classes). The learning of the Grossberg layer is governed by the Grossberg learning law:

$$\nu_{ij}(t+1) = \nu_{ij}(t) + \beta \big[ y_j - \nu_{ij}(t) \big] z_i, \qquad (15)$$

where t is the iteration index, β the learning rate (0 < β < 1), y_j the jth feature from the surrounding, v_j = (v_{j1}, v_{j2}, ..., v_{jN_q}) the weight vector associated with the jth node of the Grossberg layer, N_q the number of competitive layer nodes representing the qth class, and z_i the output signal of the ith node of the competitive layer, which is given by

$$z_i = \begin{cases} 1, & \text{if } d(x, w_q^i) = \min_{j=1,2,\ldots,N_q} d(x, w_q^j), \\ 0, & \text{otherwise}, \end{cases} \qquad (16)$$

where d(x, w_q^i) is the distance between the pixel being classified and the ith competitive layer weight vector representing the qth class. Here we assume that the training of the CP network proceeds for the qth class, q ∈ I_AC, where I_AC is the set of class indices from one cluster of ambiguous classes.


After learning, the network will output a vector u_i = (v_{i1}, v_{i2}, ..., v_{im}) whenever node i wins the competitive layer's competition. The vector u_i is an approximate average of the features y_1, ..., y_m associated with those pixels x that cause node i of the competitive layer to win.
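A sketch of one Grossberg-layer iteration under Eqs. (15)-(16); the array layout and the value of β are ours:

```python
import numpy as np

def grossberg_step(x, y, Wq, Uq, beta=0.1):
    """One Grossberg-layer update for class q with the competitive
    weights Wq (Nq x 3) frozen.  y holds the m features from the pixel's
    surrounding; column i of Uq (m x Nq) is the vector u_i associated
    with competitive node i."""
    i = int(np.argmin(np.linalg.norm(Wq - x, axis=1)))  # Eq. (16): winner, z_i = 1
    Uq[:, i] += beta * (y - Uq[:, i])                   # Eq. (15) with z_i = 1
    return i
```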


5.3. Modified LVQ

Assume that d_i and d_j are the Euclidean distances from pixel x to the weight vectors c_i and c_j, respectively. Note that the vector c_i is obtained by concatenating w_i and u_i. Then x is defined to fall into a window of relative width λ if

$$\min\Big(\frac{d_i}{d_j}, \frac{d_j}{d_i}\Big) > \frac{1 - \lambda}{1 + \lambda}. \qquad (17)$$

For all x falling into the window we adapt:

$$c_i(t+1) = c_i(t) - \alpha(t) \big[ x(t) - c_i(t) \big],$$

$$c_j(t+1) = c_j(t) + \alpha(t) \big[ x(t) - c_j(t) \big], \qquad (18)$$

where α(t) decreases with time and 0 < α(t) < 1; c_i and c_j are the two closest weight vectors to x, whereby x belongs to the same class as c_j, but not as c_i.

If x, c_i and c_j belong to the same class,

$$c_k(t+1) = c_k(t) + \varepsilon(t)\alpha(t) \big[ x(t) - c_k(t) \big] \qquad (19)$$

for c_k the closest weight vector. If x, c_i and c_j belong to different classes,

$$c_k(t+1) = c_k(t) - \varepsilon(t)\alpha(t) \big[ x(t) - c_k(t) \big] \qquad (20)$$

for k ∈ {i, j}. The modified LVQ is similar to that described by Song and Lee (1996). However, we allow modifications of the weights only inside the window λ.
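A sketch of one update of the modified LVQ; the window width λ and the rates α and ε are assumed constants here, whereas in the paper the rates decrease with time:

```python
import numpy as np

def lvq_step(x, x_label, C, labels, alpha, eps, lam=0.3):
    """One iteration of the modified LVQ, Eqs. (17)-(20), on the
    concatenated weight vectors C (N x (3+m)) with class labels."""
    d = np.linalg.norm(C - x, axis=1)
    i, j = np.argsort(d)[:2]                       # two closest weight vectors
    if min(d[i] / d[j], d[j] / d[i]) <= (1 - lam) / (1 + lam):
        return                                     # Eq. (17): outside the window
    if labels[i] == labels[j] == x_label:          # Eq. (19): attract the winner
        C[i] += eps * alpha * (x - C[i])
    elif labels[i] != x_label and labels[j] != x_label:
        for k in (i, j):                           # Eq. (20), our reading: repel both
            C[k] -= eps * alpha * (x - C[k])
    else:                                          # Eq. (18): one winner shares x's class
        good, bad = (i, j) if labels[i] == x_label else (j, i)
        C[bad] -= alpha * (x - C[bad])
        C[good] += alpha * (x - C[good])
```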

    5.4. Determining weights for the Euclidean distance

The weights a_ij that appear in the weighted Euclidean distance are specific for each reference pattern. The weights are found by maximising the following function of classification performance:

$$F = \Big( \sum_{i=1}^{Q} N_{tc}^i - k \sum_{i=1}^{Q} N_{tw}^i \Big) \Big/ N_L, \qquad (21)$$

where N_tc^i denotes the number of samples from class i classified correctly at the tth iteration of the optimisation process, N_L is the number of samples in the learning set, k is a constant, Q is the number of classes, and N_tw^i is given by

$$N_{tw}^i = \begin{cases} N_{0c}^i - N_{tc}^i, & \text{if } N_{0c}^i > N_{tc}^i, \\ 0, & \text{otherwise}, \end{cases} \qquad (22)$$

where N_0c^i is the number of samples from class i classified correctly at the zeroth iteration of the optimisation process. The second term in the performance measure penalises an increase in wrong classifications. The Alopex algorithm (Unnikrishnan and Venugopal, 1994), performing a random search in the weight space, is used for the optimisation.
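A sketch of the criterion; the array-based bookkeeping is ours:

```python
import numpy as np

def performance_F(pred_t, pred_0, truth, n_learning, k=1.0):
    """Criterion F of Eqs. (21)-(22): correct classifications at
    iteration t, penalised (constant k) for every class whose correct
    count dropped below its value at iteration 0."""
    classes = np.unique(truth)
    n_tc = np.array([np.sum((pred_t == q) & (truth == q)) for q in classes])
    n_0c = np.array([np.sum((pred_0 == q) & (truth == q)) for q in classes])
    n_tw = np.maximum(n_0c - n_tc, 0)                    # Eq. (22)
    return (n_tc.sum() - k * n_tw.sum()) / n_learning    # Eq. (21)
```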

    5.5. Additional features

The features extracted from the surrounding, y_1, ..., y_m, are defined to be

E[I_i], E[J_i], E[K_i], min[J_i], max[J_i], min[K_i], max[K_i],

where E[·] is an average operator. The operators E, min and max are calculated in the window around the pixel being classified.
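A sketch of this feature extraction, assuming a square window of radius r (the paper does not give the window size, and the feature list above is our reading of a garbled original):

```python
import numpy as np

def window_features(ijk, r=2):
    """Features y_1, ..., y_7 for every pixel: window means of I, J and K
    plus window min/max of J and K.  ijk is an (H x W x 3) image in the
    "IJK" space; the window is (2r+1) x (2r+1)."""
    H, W, _ = ijk.shape
    padded = np.pad(ijk, ((r, r), (r, r), (0, 0)), mode="edge")
    out = np.empty((H, W, 7))
    for row in range(H):
        for col in range(W):
            win = padded[row:row + 2 * r + 1, col:col + 2 * r + 1]
            i, j, k = win[..., 0], win[..., 1], win[..., 2]
            out[row, col] = (i.mean(), j.mean(), k.mean(),
                             j.min(), j.max(), k.min(), k.max())
    return out
```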

    6. Experimental testing

    6.1. Learning and testing sets

The system was tested by segmentation of colour images containing the nine colour classes mentioned above. The learning set for designing the binary decision tree consisted of 30000 pixels; 18000 of them were labelled and the others unlabelled. The labelled pixels have been collected from both full tone and half tone prints. Fig. 3 illustrates an example of an image taken from a full tone print of the c class. An example of the "half tone image" used to collect the "cyan" pixels is given in Fig. 4. Since class k (dots printed with black ink) can appear on all eight possible backgrounds, pixels of class k have been collected from all the backgrounds. Figs. 5 and 6 illustrate dots of class k on the yellow and magenta backgrounds,


respectively. Pixels from several windows containing all the colour classes (with and without k) have been included in the learning set as unlabelled data. Fig. 7 presents an example of such a window.

Collection of data for training the CP networks starts after the labelling of the terminal nodes of the decision tree. Pixels falling into the nodes of the second type are used to train the respective CP network. Two nodes of the second type have been found, one representing the colour classes m and my and the other the colour classes cm, cmy and k. For example, pixels from the image shown in Fig. 8 would fall into two nodes: node "y" of the first type and node "m, my" of the second type. Since we know that the image shown in Fig. 8 contains only two colour classes, namely y and my (100% of the area is covered by yellow ink), all the pixels falling into the node "m, my" can be used to train the my part of the respective CP network. In the same manner the data from half tone prints are collected to train the "cm, cmy, k" CP network. For example, the image shown in Fig. 9 contains no "cm" and no "k" pixels. Therefore, all the pixels falling into the node "cm, cmy, k" can be used to train the cmy part of the respective CP network.

    The learning, optimisation and validation sets for designing the CP networks contained 10000 pixels from each class.

The networks were tested on about 100000 pixels from each class. The exact number of testing samples processed from the different classes is given in Table 2. "Window images" of 256 × 64 pixels, extracted from larger primary images, have been used for evaluating the performance of the developed system. All the "window images" were extracted from different primary images. More than 200 "window images" have been processed.

    The "ground truth" of the classification was estab- lished by visual inspection knowing the desired result. For example, all pixels coming from the red dots of Fig. 8 should be assigned the label "my". The other pixels of the image should acquire the label "y". Any other classification result would be treated as error.

    6.2. Parameter setting

There are six coefficients controlling different steps of the training process of the CP networks. These parameters are:

{α_t}: the learning rate controlling training of the competitive layer;

β: the learning rate controlling training of the Grossberg layer;

k: a constant that controls the degree of penalising an increase in wrong classifications compared with the initial optimisation state (the constant appears in the criterion function for obtaining the weights used in the weighted Euclidean distance measure);

α, λ and ε: coefficients controlling the behaviour of the LVQ algorithm.

At the beginning of training, the coefficient α_t was set to a relatively large value, 0.4. As the weight vectors w_i move into the area of the input data, the coefficient is then lowered for final convergence. Therefore, the following learning coefficients α_t have been used in the training process. If the training process contains a total of t_2 steps, then for 0


Fig. 3. An example of an image taken from a full-tone print of the c class.

Fig. 4. An example of an image taken from a half-tone print of the c class.

Fig. 5. Dots of class k on the yellow background.

Fig. 6. Dots of class k on the magenta background.

Fig. 7. An example of the image containing eight colour classes.

    6.3. Results obtained

Two CP networks have been constructed: one for the cluster cm-cmy-k and the other for the cluster m-my. The other four colour classes have been classified by the binary decision tree. The CP network constructed for m-my contained 16 nodes and that for cm-cmy-k 32 nodes.

Let p denote the correct classification probability and f the observed frequency. Then the 1−α confidence interval P(p_1 < p < p_2) = 1 − α is given by

$$p_{1,2} = \frac{2f + \dfrac{z_{\alpha/2}^2}{N_T} \pm z_{\alpha/2}\sqrt{\dfrac{4f(1-f)}{N_T} + \dfrac{z_{\alpha/2}^2}{N_T^2}}}{2\Big(1 + \dfrac{z_{\alpha/2}^2}{N_T}\Big)}, \qquad (23)$$

where z_{α/2} is the fractile of the normal distribution at the risk α/2 and N_T is the size of the testing set.
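A sketch of Eq. (23) as a function; z = 1.96 corresponds to the 95% interval:

```python
import math

def confidence_interval(f, n_t, z=1.96):
    """Confidence limits p1, p2 of Eq. (23) for an observed frequency f
    over n_t test samples."""
    half = z * math.sqrt(4 * f * (1 - f) / n_t + z ** 2 / n_t ** 2)
    denom = 2 * (1 + z ** 2 / n_t)
    return ((2 * f + z ** 2 / n_t - half) / denom,
            (2 * f + z ** 2 / n_t + half) / denom)
```

For the first column of Table 2 (f = 0.992, N_T = 9·10^4) this reproduces the tabulated interval (0.9914, 0.9926).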

Table 2 presents the network's performance for the different colour classes, as well as the confidence intervals and the number of testing samples used.

The values presented in the column w, y, cy of the table are averaged values for these colour classes; the correct classification rates obtained for these classes were very similar. Pixels coming from the c colour class were also classified by the binary decision tree. The lower classification accuracy for this class results from its similarity to the cm colour class. The similarity arises due to the widely varying properties of the m layer of the cm coverage as well as the varying properties of the paper used for printing. For example, dark micro-spots in the paper make c pixels look like those of the cm colour class. The last column of the table


    Fig. 8. An example of an image containing "y" and "my" colour classes.

    Fig. 9. An example of an image containing several colour classes.

    Fig. 10. An example of an image containing eight classes of colours (no k). A part of the image was classified by the developed neural network.

Fig. 11. An image of dots printed with black ink on a magenta-yellow background.

    Fig. 12. An image of dots printed with cyan, magenta and yellow inks on a magenta-yellow background.

Table 2
Performance of the network and confidence intervals for different colour classes

Colours   w, y, cy   c        m        my       cm       cmy      k        cm, cmy, k
f         0.992      0.981    0.982    0.978    0.941    0.902    0.908    0.980
p1        0.9914     0.9801   0.9811   0.9770   0.9395   0.9003   0.9064   0.9791
p2        0.9926     0.9819   0.9828   0.9789   0.9424   0.9037   0.9096   0.9808
N_T       9·10^4     12·10^4  9·10^4   9·10^4   10·10^4  12·10^4  12·10^4  10·10^4


presents the average performance of the network for full tone (solid) prints of the cm, cmy and k colour classes. The other columns illustrate the network's performance for half tone prints.

About 85% of the pixels coming from the m and my colour classes and about 80% of the pixels from the cm colour class have been classified in the second classification step. Nearly 65% of the pixels from the cmy and k colour classes reach the third classification step.

Figs. 10, 11 and 12 illustrate some examples of the classification results. An example of an image containing eight colour classes (no k) is presented in Fig. 10. A part of the image was classified by the developed neural network. Eight colour classes can easily be found in the classified part of the image. The colour classes w, c, m and y are displayed with the colours of their names. The my colour class is shown in red, cm in blue, cy in green, and cmy in black. Note that no post-processing has been applied in the examples presented. Figs. 11 and 12 illustrate the classification results for the most overlapping pair of colour classes, namely cmy and k. The dots presented in Fig. 11 have been printed with black ink, while those presented in Fig. 12 with a coverage of cyan, magenta and yellow inks. The dots have been printed on the same magenta-yellow background in both pictures. The classification results are shown only for the central part of the pictures. After the classification we display class k with a black colour and class cmy with a brown colour; the my class is displayed in red as in the previous picture. Therefore, a brown colour inside the classified part of the image of Fig. 11 and a black colour inside the classified part of the image of Fig. 12 indicate classification errors. Some small green spots can also be found inside the brown dots. This means that these areas of paper were occasionally printed with only cyan and yellow inks and no magenta. Since the cy pixels are classified at the first step, without using adjacency information, the green spots appear.

As has already been mentioned, the black ink can appear on all eight possible backgrounds. The background cyan-magenta-yellow has proven to be the most difficult one. Since the cyan-magenta-yellow coverage also produces a black colour, the classification task in this case becomes a task of "finding" black dots on a black background. To the human eye such areas of a picture look completely black. Effects of light diffusion in paper make the classification task

very difficult. A correct classification rate of about 70-75% has been obtained for the black dots on the cyan-magenta-yellow background. We hope that the classification results for such dark areas of pictures can be improved by exploiting knowledge about the light-paper-ink interaction and by using more elaborate extraction of additional features. Work on how an artificial neural network can be used to find a set of additional features is in progress. On the other hand, for the application it is important to "find" black ink on the lighter areas of pictures.

In order to evaluate the results obtained from the system we attempted to distinguish between the two "black" images using another method for colour image segmentation. Good segmentation results for textured colour images, obtained using Gaussian Markov random field models, have recently been reported (Panjwani and Healey, 1995). In this model it is assumed that the RGB colour vector at each location is a linear combination of the neighbours in all three planes plus Gaussian noise. The coefficients of the combination are estimated as parameters of the model. Three colour planes and four directions are used; therefore there are 12 parameters of the model for each colour plane. For two textures, a difference in the estimated values of the parameters indicates a difference between the textures themselves. This approach has been chosen for the comparison. The two "black" images have been treated as two textures with different spatial interaction of coloured pixels. Table 3 provides an example of the estimated values of the parameters for the R colour plane. As we can see from the table, there is no significant difference in the values of the model parameters estimated from the image of the picture printed in black ink and that printed in cyan, magenta and yellow inks in this order on top of each other. The same range of difference between the estimated parameter values has been obtained for the colour planes G and B.

Table 3
Means and standard deviations of the estimated values of model parameters for the R colour plane

               "Black" image           "cmy" image
Parameter    Mean       St. dev.     Mean       St. dev.
1           -0.1330     0.062       -0.1431     0.050
2            0.2248     0.078        0.2458     0.071
3           -0.1364     0.053       -0.1500     0.064
4            0.5558     0.056        0.5577     0.072
5           -0.0545     0.025       -0.047      0.023
6            0.0751     0.021        0.0653     0.021
7           -0.0352     0.012       -0.0351     0.015
8            0.0040     0.006        0.0151     0.015
9           -0.0200     0.015       -0.0171     0.013
10           0.0411     0.020        0.0351     0.025
11          -0.0251     0.010       -0.0255     0.016
12           0.0072     0.003        0.0062     0.005

7. Conclusions

Small neural networks of different origins and different learning techniques have been combined to make a hierarchical network for classification of colour pixels. The hierarchical neural network performs analysis in a rough to fine fashion and enables

a high average classification speed and a low classification error. Experimentally, we have shown that the network is capable of distinguishing among the nine colour classes that occur in a half tone colour image. A correct classification rate of about 98% has been obtained even for two very similar black colours, namely the black printed in black ink and the black printed in cyan, magenta and yellow inks in this order on top of each other.

    Acknowledgements

We gratefully acknowledge the support we have received from The Swedish National Board for Industrial and Technical Development and The Royal Swedish Academy of Sciences. We also wish to thank the two anonymous reviewers for their valuable comments on the manuscript.

References

Desieno, D. (1988). Adding a conscience to competitive learning. Proc. ICNN, Vol. I. IEEE Press, New York, 117-124.

    Hunt, R.W.G. (1991). Measuring Colour. Ellis Horwood, Chichester, UK.

Kohonen, T. (1990). The self-organizing map. Proc. IEEE 78 (9), 1464-1480.

Liu, J. and Y.-H. Yang (1994). Multiresolution color image segmentation. IEEE Trans. Pattern Anal. Machine Intell. 16 (7), 689-700.

    Panjwani, D. K. and G. Healey (1995). Markov Random Field models for unsupervised segmentation of textured color images. IEEE Trans. Pattern Anal. Machine Intell. 17 (10), 939-954.

    Song, H.-H. and S.-W. Lee (1996). LVQ combined with simulated annealing for optimal design of large-set reference patterns. Neural Networks 9 (2), 329-336.

    Tan, T.S.C. and J. Kittler (1993). Colour texture classification using features from colour histogram. Proc. SCIA-93, Tromso, Norway, 807-813.

Tominaga, S. (1992). Color classification of natural color images. Color Research and Application 17 (4), 230-239.

Uchiyama, T. and M.A. Arbib (1994). Color image segmentation using competitive learning. IEEE Trans. Pattern Anal. Machine Intell. 16 (12), 1197-1206.

    Unnikrishnan, K.P. and K.P. Venugopal (1994). Alopex: A correlation-based learning algorithm for feedforward and recurrent neural networks. Neural Computation 6, 469-490.

    Verikas, A. and K. Malmqvist (1995). Increasing colour image segmentation accuracy by means of fuzzy post-processing. Proc. IEEE Internat. Conf. on Artificial Neural Networks, Perth, Australia, Vol. 4, 1713-1718.

Verikas, A., K. Malmqvist, L. Bergman and A. Gelzinis (1995). An unsupervised learning technique for finding decision boundaries. Proc. 5th European Conf. on Artificial Neural Networks, ICANN-95, Paris, Vol. 2, 99-104.

Verikas, A., K. Malmqvist and A. Gelzinis (1996a). A new technique to generate a binary decision tree. Proc. Symposium on Image Analysis, Lund, Sweden, 164-168.

    Verikas, A., K. Malmqvist, L. Malmqvist and L. Bergman (1996b). Weighting colour space coordinates for colour classification. Proc. Symposium on Image Analysis, Lund, Sweden, 49-53.