
Pattern Recognition Letters 18 (1997) 173-185

Colour image segmentation by modular neural network 1

A. Verikas a,b,*, K. Malmqvist a, L. Bergman a

a Center for Imaging Sciences and Technologies, Halmstad University, Box 823, S-301 18 Halmstad, Sweden
b Kaunas University of Technology, Studentu 50, 3031 Kaunas, Lithuania

    Received 20 June 1996; revised 22 October 1996

    Abstract

In this paper segmentation of colour images is treated as a problem of classification of colour pixels. A hierarchical modular neural network for classification of colour pixels is presented. The network combines different learning techniques, performs analysis in a rough to fine fashion and achieves a high average classification speed and a low classification error. Experimentally, we have shown that the network is capable of distinguishing among the nine colour classes that occur in an image. A correct classification rate of about 98% has been obtained even for two very similar black colours. © 1997 Elsevier Science B.V.

    Keywords: Colour classification; Image segmentation; Modular neural networks

    1. Introduction

Colour image processing and analysis is increasingly used in industry, medical applications, and other fields. Quality inspection, process control, material analysis, and medical image processing are a few examples. Therefore, research in colour perception and the development of efficient computational models for real world problems is of crucial importance. One task that often arises in colour image processing is image segmentation. Colour image segmentation techniques can roughly be categorised into techniques for chromatically dividing an image space and those for clustering a feature space derived from an image. Region growing, region splitting and merging are the common approaches used by methods of the first group (Liu

* Corresponding author. E-mail: [email protected].
1 Electronic Annexes available. See http://www.elsevier.nl/locate/patrec.

and Yang, 1994; Panjwani and Healey, 1995). Methods of the second group divide colour space into clusters (Uchiyama and Arbib, 1994; Tominaga, 1992). The colour image segmentation method we discuss here belongs to the latter category. We treat the colour image segmentation problem as a problem of classification of colour pixels.

The most common goal in colour image segmentation is to partition a colour image into a set of uniform colour regions. However, the aim of this work is slightly different. The motivation for this work is a need to determine the colours of inks used to produce a multi-coloured picture created by printing dots of cyan (c), magenta (m), yellow (y) and black (k) primary colours upon each other through screens having differing raster angles. The answer must be given for any possible combination of cyan, magenta, yellow and black ink and for any area of the picture. One factor that influences the colour impression of the picture is the size and shape of the areas covered by the

0167-8655/97/$17.00 © 1997 Elsevier Science B.V. All rights reserved. PII S0167-8655(97)00004-4


Table 1
The mean values and standard deviations of the variables R, G and B for five overlapping classes of colours (R, G, B ∈ [0, 255])

Colour class        m              my             cm             cmy            k
Variable         R   G   B     R    G   B     R   G   B     R   G   B     R   G   B
Mean           119  41  53   112   37  33    32  31  48    30  31  33    25  25  24
Stand. dev.      5   5   7     6    5   6     5   6  10     5   6   7     4   3   4

different inks. This information can be used to control the amount of ink transferred to the paper in each of the four printing nips holding cyan, magenta, yellow and black. The measurement of the area covered by ink of the different colours can be done automatically using an image analysis system, if the image taken from the printed picture can be segmented into regions according to the following two rules:

    1. Pixels should be assigned to the same cluster (colour class) if they correspond to areas of the picture that were printed with the same inks.

2. Pixels corresponding to areas printed with different inks should be assigned to different clusters.

The task is solved by determining a colour class for every pixel of the image. In order to solve the task with acceptable classification accuracy and at a high average speed, we propose the use of a hierarchical modular neural network. Note that classification speed is of primary interest in our application. The rest of the paper is organised as follows. In the next two sections we briefly describe the input data and the colour space used. The architecture of the network is presented in Section 4. Procedures for training the network are given in Section 5. Section 6 summarises the results of experimental investigations. Section 7 concludes the work.

    2. The data

When mixing dots of cyan, magenta and yellow colours, eight combinations are possible for every pixel in the picture. The combination cmy produces the black colour. However, in practice black ink is most often also printed. We assume the black ink to be opaque. Therefore, we have to distinguish between 9 colour classes, namely c, m, y, w (white paper), cy, cm, my, cmy (black resulting from overlay of cyan, magenta and yellow) and k (black resulting from black ink). Discrimination between some of the colour classes is a rather complicated matter, since they are highly

overlapping in the colour space. For example, m-my and cm-cmy-k are two clusters of such highly overlapping colour classes. To illustrate this we present in Table 1 the mean values and standard deviations of the variables R, G and B for these five classes of colours. Note that the intensity values shown in the table are for data taken from solid print areas only. This means no pixels from the dots and the "fuzzy borders" of the dots are included. The pixels from the borders create "bridges" between white and the other colours and make the classification problem more difficult to solve. Besides that, the R, G and B parameters of class k acquire a very large variance, since the black ink can appear beneath (or above) all possible combinations of the other coloured inks.

The number of clusters with highly overlapping classes of colours depends on several factors, such as the amount of black ink printed on the picture, the printing technology, the properties of the inks used and some other factors. By increasing the amount of black ink, we make the other colours darker and more and more similar until we get only one cluster with only one black colour. We assume, therefore, the range of variation of the amount of black ink to be 0-50%. In such a range of variation of black ink, printers would like to measure the percentage of an area covered by inks of different colours. Though several clusters of rather overlapping colour classes can appear and it is important to recognise all the colour classes with acceptable classification accuracy, the most difficult task is to distinguish between the colour classes cmy and k.

    3. Choice of colour space

Five colour spaces, namely RGB, HSV, CIELuv, CIELab and "IJK", have been tested and compared experimentally, and the "IJK" colour space was chosen. The highest correct classification rate was obtained in this space for the most overlapping colour


Fig. 1. Architecture of the network. A colour pixel x, together with additional features y_1, ..., y_m from its surrounding, is mapped to new variables from R, G and B. A binary decision tree outputs either a class label or a set of ambiguous classes; one counterpropagation network per set of ambiguous classes outputs either a class label or a set of CP weight vectors {w_1, ..., w_i, ..., w_k}; weighted Euclidean distances d(x, c_i) = {Σ_{k=1}^{n} [a_{ik}(x_k − w_{ik})]² + Σ_{l=1}^{m} [a_{il}(y_l − u_{il})]²}^{1/2}, with weights a_{ij} obtained by random optimisation, output a class label; fuzzy post-processing outputs the final class label.

classes (cm, cmy and k).

The "IJK" colour space uses colour difference signals. If we assume the random variables R, G and B to be of equal variances (σ²) and covariances (only the variances of the variables have been normalised to be equal to one in our experiments), the covariance matrix of these variables can be written as (Tan and Kittler, 1993)

$$\Sigma = \sigma^2 \begin{pmatrix} 1 & r & r \\ r & 1 & r \\ r & r & 1 \end{pmatrix}, \qquad (1)$$

where r is the correlation coefficient. The eigensolution of the covariance matrix gives the following eigenvectors (e_i) and the corresponding eigenvalues (λ_i):

$$e_1 = \{1, 1, 1\}^T; \quad e_2 = \{1, 0, -1\}^T; \quad e_3 = \{1, -2, 1\}^T; \qquad (2)$$

$$\lambda_1 = \sigma^2(1 + 2r); \quad \lambda_2 = \lambda_3 = \sigma^2(1 - r). \qquad (3)$$

    The linear transform of the {R, G, B} vector by the eigenvectors produces other random variables

$$I = R + G + B, \qquad (4)$$

$$J = R - B, \qquad (5)$$

$$K = R - 2G + B, \qquad (6)$$

which are almost uncorrelated and have zero covariances under the above assumption. I, J and K are the variables of the "IJK" colour space.
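For concreteness, a minimal sketch of the transform; the function name and the vectorised array layout are ours:

```python
import numpy as np

def rgb_to_ijk(rgb):
    """Map an (..., 3) array of R, G, B values to the "IJK" space of
    Eqs. (4)-(6): an achromatic signal I and two colour difference
    signals J and K."""
    rgb = np.asarray(rgb, dtype=float)   # avoid uint8 overflow in the sums
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    i = r + g + b          # Eq. (4): achromatic signal
    j = r - b              # Eq. (5): colour difference signal
    k = r - 2 * g + b      # Eq. (6): colour difference signal
    return np.stack([i, j, k], axis=-1)
```

For the class-m mean of Table 1, (119, 41, 53), this gives (213, 66, 90).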

According to Hunt (1991), three signals representing colour are transmitted via nerve fibres from the human eye to the brain. One of these signals is usually referred to as an achromatic signal and the other two as colour difference signals. In this sense the I, J and K variables mimic the signals transmitted to the human brain, since I can be referred to as an achromatic signal, and J and K as colour difference signals.

Distances measured in the "IJK" as well as the RGB colour space do not represent colour differences on a uniform scale from the point of view of perception. The CIELuv and CIELab colour spaces are more uniform in this sense. In spite of that we have chosen the "IJK" colour space. A low classification error and a high processing speed, not a good correspondence between measured and perceived colour differences, are our primary interest when choosing the colour space. Such an approach comes from the goal, which is to determine the colours of inks used to print any arbitrary area of a given picture, not to segment an image of the picture as similarly as possible to the way humans do. Even when calculated without pre-processing, the I, J and K variables are much less correlated than R, G and B. The amount of calculation required to obtain the I, J and K variables is less than for {L, u, v}, {L, a, b} or {H, S, V}. Besides, the variable H is undefined for a grey colour. Therefore, the choice of the "IJK" colour space seems reasonable in this particular case.

Fig. 2. A forward-only counterpropagation network: an input layer taking x_1, x_2, ..., x_n and y_1, y_2, ..., y_m, a competitive layer with weight vectors w_i, and a Grossberg layer with weight vectors u_i.

    4. Architecture of the neural network

    The architecture of the network is shown in Fig. 1. There are four steps in the proposed classification

procedure. A binary decision tree performs the first step of the procedure. We carry out the second and the third classification steps by using the weight vectors of the counterpropagation (CP) network. The last classification step (fuzzy post-processing) is based on analysis of the decisions made in the previous step. We use only the three variables I, J and K to describe a pixel in the first two classification steps. In the third step adjacency information is also exploited, since a pixel acquires additional co-ordinates, the values of which are calculated from the surrounding of the pixel being classified. Next, we briefly describe the network modules that perform the different classification steps.

    4.1. Binary decision tree

The binary decision tree performs the first classification step. Two types of terminal nodes can be encountered in the tree: (1) the node representing one colour class only, and (2) the node representing a cluster (a set) of ambiguous colour classes. The classification performed by the tree is final for the pixels arriving at terminal nodes of the first type. Pixels reaching terminal nodes of the second type are transferred to the CP network for further analysis. The tree divides colour space into several colour regions. The classification performed by the tree is very fast, since the tree consists of only a few nodes and only one neuron is used in every node of the tree.

    4.2. Counterpropagation network

Devoted to each set of ambiguous colour classes is a CP network (Fig. 2). The size of the network's competitive layer depends on the number of ambiguous classes in the set. Each class from this set is represented by a part of the layer. These parts are trained separately and concatenated to make one network for each and every set of ambiguous classes. The second classification step is performed by using the weight vectors w_i = (w_{i1}, ..., w_{in}) of the competitive layer as reference patterns in a k-NN classification rule. The number of nodes in the Grossberg layer is equal to the number of features y_1, ..., y_m extracted from the surroundings of the pixel being classified. The learned values of the features are stored as weights of the Grossberg layer. Therefore, each weight vector of the competitive layer w_i (a rough reference pattern) is associated with one Grossberg layer weight vector u_i = (v_{i1}, v_{i2}, ..., v_{im}) (a reference pattern containing more details). The association of the weight vectors produces a concatenated weight vector c_i = (w_{i1}, ..., w_{in}, v_{i1}, v_{i2}, ..., v_{im}). The third classification step is performed by using the concatenated weight vectors. The vectors are treated as reference patterns in a minimum distance classifier.

The CP network acts as a quantiser of a colour region. The weight vectors w_i quantise the region in the 3-dimensional "IJK" colour space, while the concatenated weight vectors c_i quantise the region in an extended (3 + m)-dimensional space.


4.2.1. Classification by using the CP network

During the second classification step the k nearest weight vectors w_i of the competitive layer are selected. Note that the nearest weight vectors are found among all N weight vectors of the concatenated competitive layer. The classification result is final if, among these weight vectors, vectors of one class dominate, i.e. the ratio of the numbers of weight vectors representing the two most frequently appearing colour classes exceeds some pre-specified threshold. Otherwise, k weight vectors c_i are emitted as an output of the CP network. The vector c_1 is a concatenation of the first winner w_1 of the competitive layer and the associated vector u_1 of the Grossberg layer. The weight vectors c_i are used in the third classification step, which is performed by calculating the weighted Euclidean distances. The weights a_ij that appear in the weighted Euclidean distance measure are specific for each reference pattern. The weights are found by performing a random optimisation in the weight space (Verikas et al., 1996). By doing such steps we perform analysis in a rough to fine fashion: a rough classification with the binary decision tree, a more precise one with the competitive layer weights, and an accurate classification with the concatenated weights.
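A minimal sketch of this dominance test, assuming the competitive-layer weight vectors are stacked in one array with non-negative integer class labels; the values of k and the ratio threshold are ours, the paper does not state them here:

```python
import numpy as np

def second_step(x, W, labels, k=5, ratio_threshold=3.0):
    """Second classification step: k-NN over all N competitive-layer
    weight vectors W (N x 3) with class labels.  Returns ("final", class)
    when one class dominates among the k nearest vectors, otherwise
    ("ambiguous", indices of the k winners) so the third step can refine
    the decision with the concatenated weight vectors c_i."""
    d = np.linalg.norm(W - x, axis=1)      # distances to all N vectors
    nearest = np.argsort(d)[:k]            # k nearest weight vectors
    counts = np.bincount(labels[nearest])  # votes per colour class
    top = np.sort(counts)[::-1]
    second = top[1] if len(top) > 1 else 0
    if second == 0 or top[0] / second >= ratio_threshold:
        return "final", int(np.argmax(counts))
    return "ambiguous", nearest
```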

    4.3. Fuzzy post-processing

As has been mentioned, one counterpropagation network is constructed for each and every set (cluster) of ambiguous classes. During the learning process the weight vectors of the network are distributed in the colour space according to the class-conditional probability density function of the input data used for learning. The weight vectors of the trained network are treated as reference patterns and they represent regions of the colour space.

Let M_AC be the number of ambiguous classes in the set. Each class j (j = 1, 2, ..., M_AC) is represented by N_j weight vectors. The set of these vectors is

$$\{C_j\} = \{c_i^j,\; i = 1, \ldots, N_j\}, \quad j = 1, \ldots, M_{AC}, \qquad (7)$$

$$\{C\} = \bigcup_{j=1}^{M_{AC}} \{C_j\}, \qquad (8)$$

$$N = \sum_{j=1}^{M_{AC}} N_j. \qquad (9)$$

As far as highly overlapping colour classes are considered, most of the weight vectors will be located in the overlapping regions of the class-conditional distributions. However, some of the vectors will also be placed in the non-overlapping "tails" of the distributions. Therefore, the decisions made by using the different weight vectors are not of the same reliability. We say that the decision is made (when classifying pixel x) by using the weight vector c_i^j (i = 1, 2, ..., N_j; j = 1, ..., M_AC) if the minimum distance d(x, c_i^j) has been obtained by using the weight vector c_i^j. Some of the decisions made by using weight vectors from the overlapping regions can be rather doubtful. Therefore, a correction of the decisions (the post-processing) takes place after the pixels have been classified. The concept of the correction is as follows.

The decision classes (the colour classes) and the weight vectors c_i^j representing the regions of the colour space are considered as fuzzy sets. Membership values for the fuzzy sets and the fuzziness of the decisions made by the weight vectors are defined. Classification of an image by the counterpropagation network results in the classified image as well as in a number M_AC (the number of ambiguous classes in the set) of supplementary images. Every pixel x in the supplementary image j is represented by the value of the membership function A_j(x) of the jth ambiguous class. Post-processing is based on information about the membership values and the fuzziness of the decisions. More details about the post-processing can be found in (Verikas and Malmqvist, 1995).

    4.4. Benefits of the architecture

A high average classification speed and a low classification error are the main attributes of the architecture chosen. The binary decision tree performs fast classification by using only the three colour space co-ordinates and assigning each colour pixel to one of several colour regions. In the second classification step the location of the pixel in the region is analysed by using 3-dimensional weight vectors representing sub-regions of the region being considered. As a result of the analysis, several sub-regions for the possible location of the pixel are selected. In addition to the three main colour space co-ordinates, adjacency information is also exploited in the third classification step for making a decision about the pixel's colour class. The colour class assigned to the pixel may be further


changed in the last classification step, depending on which weight vectors have been used to make decisions about the colour classes of the adjacent pixels. Therefore, in each step the dimensionality of the decision space is reduced, while the amount of information used to make a decision is increased. Depending on the colour of a pixel, the classification process can be completed at any of the steps.

By performing nested analysis and by adaptively using the amount of information for the classification process we achieve the required accuracy and gain analysis speed. In contrast, image segmentation methods based on region growing, region splitting and merging require intensive calculations for performing multiple splits and merges. Besides, such methods are not directly applicable in our case, since our goal is to determine the colours of inks used to print any arbitrary area of a given picture, not to segment an image of the picture as similarly as possible to the way humans do. For example, some "cmy" regions are sometimes perceived as being more similar to the class "k" than to their own class; nevertheless, pixels from these regions should acquire the label "cmy".

    5. Training the network

    5.1. Binary decision tree

The binary decision tree is constructed by sequentially dividing the learning set into two parts. In every node of the tree the learning set is divided into two subsets according to the decision boundary developed during learning. Only one neuron (of any desired order, in the general case) is used to solve the task (of dividing the learning set into two parts) in every node of the tree. The neuron can classify a data point x into one of two subsets according to the sign of the neuron's output value. The output is given by

$$y = f(u) = f\Big(w_0 + \sum_i w_i x_i + \cdots + \sum_{i_1 \leq \cdots \leq i_L} w_{i_1 \cdots i_L} x_{i_1} \cdots x_{i_L}\Big), \qquad (10)$$

with x_i being the ith component of the input data, w_i the corresponding weight, and L the neuron's order. The function f() ranges from -1 to +1 with f(0) = 0, for example f(u) = tanh(u).

The learning set X is partitioned into two subsets X_+ and X_- according to the following rule:

$$x \in \begin{cases} X_+ & \text{if } g(x) \geq 0, \\ X_- & \text{if } g(x) < 0, \end{cases} \quad \forall x \in X, \qquad (11)$$

where g(x) is given by

$$g(x) = w_0 + \sum_i w_i x_i + \cdots + \sum_{i_1 \leq \cdots \leq i_L} w_{i_1 \cdots i_L} x_{i_1} \cdots x_{i_L}. \qquad (12)$$

The learning set X contains labelled as well as unlabelled pixels. The unlabelled pixels are those coming from the borders of the dots. Labels for such pixels are hard or even impossible to obtain. Therefore, an unsupervised learning algorithm that we have recently proposed is used for the binary decision tree construction (Verikas et al., 1995). For every node of the tree the algorithm tries to locate the decision boundary (12) in a place with few learning samples. A node of the tree is labelled as a terminal node of the first type when all labelled samples falling into the node belong to the same class, or if only one class has a number of labelled samples above the threshold T1 and the ratio of samples of the two major classes represented by the node is above the threshold T2. A node of the tree is labelled as a terminal node of the second type if the number of labelled samples falling into the node is above the threshold T1 for more than one class and the samples falling into the node form a "compact cluster". The algorithm that can find the "compact clusters" is given in (Verikas et al., 1996).
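As an illustration, a sketch of the split at one node for the simplest first-order case (L = 1); the unsupervised placement of the boundary itself follows Verikas et al. (1995) and is not reproduced here:

```python
import numpy as np

def node_split(X, w0, w):
    """Partition a learning set X (N x d) with a first-order neuron:
    g(x) = w0 + sum_i w_i x_i, i.e. Eq. (12) with L = 1.
    Returns the subsets X+ (g(x) >= 0) and X- (g(x) < 0) of Eq. (11)."""
    g = w0 + X @ w                 # sign of g(x) decides the subset
    return X[g >= 0], X[g < 0]
```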

    5.2. Counterpropagation network

5.2.1. Process of designing the network

The CP networks with input from the second-type nodes of the tree are trained separately. In order to avoid overtraining and to achieve better generalisation properties of the network, separate data sets have been used in different steps of the design process.

Data sets used to design a pattern recognition system are always limited and very often not representative enough. This often happens because of a lack of experimental data (this was not the case in this study), limited resources of computer memory or computation time. It also often happens that sets are collected in favour of one or another class. Therefore, the use of different data sets in different steps of the design process reduces the possibility that the system is designed in favour of some classes and improves the generalisation properties of the system.

Six sets of data, namely a learning set, two validation sets, two optimisation sets and a testing set, have been used for constructing each network. Each set represents the M_AC classes. Clearly, if only a small amount of experimental data is available, the optimisation sets can be replaced by the learning set, the two validation sets by only one, or "leave few out" techniques can be applied.

First, the weight vectors of the competitive and the Grossberg layers of the CP network are obtained for each of the M_AC classes (using the learning set) by means of competitive learning with "conscience" and the Grossberg learning law, respectively. See the section below for the detailed learning procedure.

Next, the set of concatenated weights is optimised by using the modified LVQ algorithm (Section 5.3) and optimisation set 1. We use a "pocket optimisation" strategy: the best set of weights c is traced during the optimisation process and kept in the "pocket", and the optimisation terminates with the best set of weights. The quality of the weights is tested on validation set 1.

In the next step of the design process we find the weights a_ij that appear in the weighted Euclidean distance (Section 5.4). The Alopex algorithm (Unnikrishnan and Venugopal, 1994), performing a random search in the weight space, and optimisation set 2 are used for obtaining the weights. During the optimisation process some of the weights decrease to zero and eliminate the corresponding features (Verikas et al., 1996). The features eliminated are different for different reference patterns. Therefore we say that the features are selectively used for classification: the features used are different for different regions of the colour space. The whole CP network is tested on validation set 2 after the optimisation.

This design process is repeated for different numbers of CP network nodes. The network yielding a reasonable trade-off between classification error and complexity is chosen as the final one.

5.2.2. Training the network

First, the weight vectors w of the competitive layer of the CP network are obtained for each of the M_AC classes by means of competitive learning with "conscience" (Verikas and Malmqvist, 1995). The "conscience" mechanism is similar to that proposed by Desieno (1988).

In each iteration of the learning process we find a "winning" weight vector using the following equation:

$$k = \arg\min_i \big( d(x, w_q^i) - b_q \big), \quad i = 1, 2, \ldots, N_q, \qquad (13)$$

where d(x, w_q^i) is the distance between pixel x and the ith weight vector of the qth class, and b_q is the winning-frequency-sensitive term that penalises too frequent "winners" and rewards those that win seldom (Verikas and Malmqvist, 1995). Then the winning weight vector w_q^k(t) is updated according to the rule

$$w_q^k(t+1) = w_q^k(t) + \alpha_t \big[ x(t) - w_q^k(t) \big], \qquad (14)$$

where {α_t} is a slowly decreasing sequence of learning coefficients.
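A sketch of one learning iteration; the exact form of the conscience bias b_q and the constant c are our Desieno-style assumptions, as the paper only states that the bias penalises frequent winners:

```python
import numpy as np

def conscience_step(x, W, wins, alpha, c=10.0):
    """One iteration of competitive learning with a "conscience" for one
    class q, Eqs. (13)-(14).  W is the (Nq x d) weight matrix and `wins`
    counts how often each node has won so far."""
    n = len(W)
    p = wins / max(1, wins.sum())      # empirical winning frequencies
    b = c * (1.0 / n - p)              # bias rewarding seldom winners (assumed form)
    d = np.linalg.norm(W - x, axis=1)
    k = int(np.argmin(d - b))          # Eq. (13): biased winner
    wins[k] += 1
    W[k] += alpha * (x - W[k])         # Eq. (14): move winner towards x
    return k
```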

When training of the competitive layer terminates, the weights w are frozen and the learning proceeds for the Grossberg layer (separately for each of the M_AC classes). The learning of the Grossberg layer is governed by the Grossberg learning law:

$$\nu_{ij}(t+1) = \nu_{ij}(t) + \beta \big[ y_j - \nu_{ij}(t) \big] z_i, \qquad (15)$$

where t is the iteration index, β the learning rate (0 < β < 1), y_j the jth feature from the surrounding, v_j = (v_{j1}, v_{j2}, ..., v_{jN_q}) the weight vector associated with the jth node of the Grossberg layer, N_q the number of competitive layer nodes representing the qth class, and z_i the output signal of the ith node of the competitive layer, which is given by

$$z_i = \begin{cases} 1, & \text{if } d(x, w_q^i) = \min_{j=1,2,\ldots,N_q} d(x, w_q^j), \\ 0, & \text{otherwise}, \end{cases} \qquad (16)$$

where d(x, w_q^i) is the distance between the pixel being classified and the ith competitive layer weight vector representing the qth class. Here we assume that the training of the CP network proceeds for the qth class, q ∈ I_AC, where I_AC is the set of class indices from one cluster of ambiguous classes.


After learning, the network will output a vector u_i = (v_{i1}, v_{i2}, ..., v_{im}) whenever node i wins the competitive layer's competition. The vector u_i is an approximate average of the features y_1, ..., y_m associated with those pixels x that cause node i of the competitive layer to win.
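A sketch of one Grossberg-layer iteration under Eqs. (15)-(16); the array layout and the value of β are ours:

```python
import numpy as np

def grossberg_step(x, y, Wq, Uq, beta=0.1):
    """One Grossberg-layer update for class q with the competitive
    weights Wq (Nq x 3) frozen.  y holds the m features from the pixel's
    surrounding; column i of Uq (m x Nq) is the vector u_i associated
    with competitive node i."""
    i = int(np.argmin(np.linalg.norm(Wq - x, axis=1)))  # Eq. (16): winner, z_i = 1
    Uq[:, i] += beta * (y - Uq[:, i])                   # Eq. (15) with z_i = 1
    return i
```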


5.3. Modified LVQ

Assume that d_i and d_j are the Euclidean distances from pixel x to the weight vectors c_i and c_j, respectively. Note that the vector c_i is obtained by concatenating w_i and u_i. Then x is defined to fall into a window of relative width λ if

$$\min\Big(\frac{d_i}{d_j}, \frac{d_j}{d_i}\Big) > \frac{1 - \lambda}{1 + \lambda}. \qquad (17)$$

For all x falling into the window we adapt:

$$c_i(t+1) = c_i(t) - \alpha(t) \big[ x(t) - c_i(t) \big],$$

$$c_j(t+1) = c_j(t) + \alpha(t) \big[ x(t) - c_j(t) \big], \qquad (18)$$

where α(t) decreases with time and 0 < α(t) < 1; c_i and c_j are the two closest weight vectors to x, whereby x belongs to the same class as c_j, but not as c_i.

If x, c_i and c_j belong to the same class,

$$c_k(t+1) = c_k(t) + \varepsilon(t)\alpha(t) \big[ x(t) - c_k(t) \big] \qquad (19)$$

for c_k the closest weight vector. If x, c_i and c_j belong to different classes,

$$c_k(t+1) = c_k(t) - \varepsilon(t)\alpha(t) \big[ x(t) - c_k(t) \big] \qquad (20)$$

for k ∈ {i, j}. The modified LVQ is similar to that described by Song and Lee (1996). However, we allow modifications of the weights only inside the window λ.
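A sketch of one update of the modified LVQ; the window width λ and the rates α and ε are assumed constants here, whereas in the paper the rates decrease with time:

```python
import numpy as np

def lvq_step(x, x_label, C, labels, alpha, eps, lam=0.3):
    """One iteration of the modified LVQ, Eqs. (17)-(20), on the
    concatenated weight vectors C (N x (3+m)) with class labels."""
    d = np.linalg.norm(C - x, axis=1)
    i, j = np.argsort(d)[:2]                       # two closest weight vectors
    if min(d[i] / d[j], d[j] / d[i]) <= (1 - lam) / (1 + lam):
        return                                     # Eq. (17): outside the window
    if labels[i] == labels[j] == x_label:          # Eq. (19): attract the winner
        C[i] += eps * alpha * (x - C[i])
    elif labels[i] != x_label and labels[j] != x_label:
        for k in (i, j):                           # Eq. (20), our reading: repel both
            C[k] -= eps * alpha * (x - C[k])
    else:                                          # Eq. (18): one winner shares x's class
        good, bad = (i, j) if labels[i] == x_label else (j, i)
        C[bad] -= alpha * (x - C[bad])
        C[good] += alpha * (x - C[good])
```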

    5.4. Determining weights for the Euclidean distance

The weights a_ij that appear in the weighted Euclidean distance are specific for each reference pattern. The weights are found by maximising the following function of classification performance:

$$F = \Big( \sum_{i=1}^{Q} N_{tc}^i - k \sum_{i=1}^{Q} N_{tw}^i \Big) \Big/ N_L, \qquad (21)$$

where N_tc^i denotes the number of samples from class i classified correctly at the tth iteration of the optimisation process, N_L is the number of samples in the learning set, k is a constant, Q is the number of classes, and N_tw^i is given by

$$N_{tw}^i = \begin{cases} N_{0c}^i - N_{tc}^i, & \text{if } N_{0c}^i > N_{tc}^i, \\ 0, & \text{otherwise}, \end{cases} \qquad (22)$$

where N_0c^i is the number of samples from class i classified correctly at the zeroth iteration of the optimisation process. The second term in the performance measure penalises an increase in wrong classifications. The Alopex algorithm (Unnikrishnan and Venugopal, 1994), performing a random search in the weight space, is used for the optimisation.
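A sketch of the criterion; the array-based bookkeeping is ours:

```python
import numpy as np

def performance_F(pred_t, pred_0, truth, n_learning, k=1.0):
    """Criterion F of Eqs. (21)-(22): correct classifications at
    iteration t, penalised (constant k) for every class whose correct
    count dropped below its value at iteration 0."""
    classes = np.unique(truth)
    n_tc = np.array([np.sum((pred_t == q) & (truth == q)) for q in classes])
    n_0c = np.array([np.sum((pred_0 == q) & (truth == q)) for q in classes])
    n_tw = np.maximum(n_0c - n_tc, 0)                    # Eq. (22)
    return (n_tc.sum() - k * n_tw.sum()) / n_learning    # Eq. (21)
```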

    5.5. Additional features

The features extracted from the surrounding, y_1, ..., y_m, are defined to be

E[I_i], E[J_i], E[K_i], min[J_i], max[J_i], min[K_i], max[K_i],

where E[·] is an average operator. The operators E, min and max are calculated in the window around the pixel being classified.
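A sketch of this feature extraction, assuming a square window of radius r (the paper does not give the window size, and the feature list above is our reading of a garbled original):

```python
import numpy as np

def window_features(ijk, r=2):
    """Features y_1, ..., y_7 for every pixel: window means of I, J and K
    plus window min/max of J and K.  ijk is an (H x W x 3) image in the
    "IJK" space; the window is (2r+1) x (2r+1)."""
    H, W, _ = ijk.shape
    padded = np.pad(ijk, ((r, r), (r, r), (0, 0)), mode="edge")
    out = np.empty((H, W, 7))
    for row in range(H):
        for col in range(W):
            win = padded[row:row + 2 * r + 1, col:col + 2 * r + 1]
            i, j, k = win[..., 0], win[..., 1], win[..., 2]
            out[row, col] = (i.mean(), j.mean(), k.mean(),
                             j.min(), j.max(), k.min(), k.max())
    return out
```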

    6. Experimental testing

    6.1. Learning and testing sets

The system was tested by segmentation of colour images containing the nine colour classes mentioned above. The learning set for designing the binary decision tree consisted of 30000 pixels; 18000 of them were labelled and the others unlabelled. The labelled pixels have been collected from both full tone and half tone prints. Fig. 3 illustrates an example of an image taken from a full tone print of the c class. An example of the "half tone image" used to collect the "cyan" pixels is given in Fig. 4. Since class k (dots printed with black ink) can appear on all eight possible backgrounds, pixels of class k have been collected from all the backgrounds. Figs. 5 and 6 illustrate dots of class k on the yellow and magenta backgrounds,


respectively. Pixels from several windows containing all the colour classes (with and without k) have been included in the learning set as unlabelled data. Fig. 7 presents an example of such a window.

Collection of data for training the CP networks starts after the labelling of the terminal nodes of the decision tree. Pixels falling into the nodes of the second type are used to train the respective CP network. Two nodes of the second type have been found, one representing the colour classes m and my and the other the colour classes cm, cmy and k. For example, pixels from the image shown in Fig. 8 would fall into two nodes: node "y" of the first type and node "m, my" of the second type. Since we know that the image shown in Fig. 8 contains only two colour classes, namely y and my (100% of the area is covered by yellow ink), all the pixels falling into the node "m, my" can be used to train the my part of the respective CP network. In the same manner the data from half tone prints are collected to train the "cm, cmy, k" CP network. For example, the image shown in Fig. 9 contains no "cm" and no "k" pixels. Therefore, all the pixels falling into the node "cm, cmy, k" can be used to train the cmy part of the respective CP network.

    The learning, optimisation and validation sets for designing the CP networks contained 10000 pixels from each class.

The networks were tested on about 100000 pixels from each class. The exact number of testing samples processed from the different classes is given in Table 2. "Window images" of 256 × 64 pixels, extracted from larger primary images, have been used for evaluating the performance of the developed system. All the "window images" were extracted from different primary images. More than 200 "window images" have been processed.

    The "ground truth" of the classification was estab- lished by visual inspection knowing the desired result. For example, all pixels coming from the red dots of Fig. 8 should be assigned the label "my". The other pixels of the image should acquire the label "y". Any other classification result would be treated as error.

    6.2. Parameter setting

There are six coefficients controlling different steps of the training process of the CP networks. These parameters are:

{α_t}: the learning rate controlling training of the competitive layer;

β: the learning rate controlling training of the Grossberg layer;

k: a constant that controls the degree of penalising an increase in wrong classifications compared with the initial optimisation state (the constant appears in the criterion function for obtaining the weights used in the weighted Euclidean distance measure);

α, λ and ε: coefficients controlling the behaviour of the LVQ algorithm.

At the beginning of training, the coefficient α_t was set to a relatively large value, 0.4. As the weight vectors w_i move into the area of the input data, the coefficient is then lowered for final convergence. Therefore, the following learning coefficients α_t have been used in the training process. If the training process contains a total of t_2 steps, then for 0


Fig. 3. An example of an image taken from a full-tone print of the c class.

Fig. 4. An example of an image taken from a half-tone print of the c class.

Fig. 5. Dots of class k on the yellow background.

Fig. 6. Dots of class k on the magenta background.

Fig. 7. An example of the image containing eight colour classes.

    6.3. Results obtained

Two CP networks have been constructed: one for the cluster cm-cmy-k and the other for the cluster m-my. The other four colour classes have been classified by the binary decision tree. The CP network constructed for m-my contained 16 nodes and that for cm-cmy-k 32 nodes.

Let p denote the correct classification probability and f the observed frequency. Then the 1−α confidence interval P(p_1 < p < p_2) = 1 − α is given by

$$p_{1,2} = \frac{2f + \dfrac{z_{\alpha/2}^2}{N_T} \pm z_{\alpha/2}\sqrt{\dfrac{4f(1-f)}{N_T} + \dfrac{z_{\alpha/2}^2}{N_T^2}}}{2\Big(1 + \dfrac{z_{\alpha/2}^2}{N_T}\Big)}, \qquad (23)$$

where z_{α/2} is the fractile of the normal distribution at the risk α/2 and N_T is the size of the testing set.
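A sketch of Eq. (23) as a function; z = 1.96 corresponds to the 95% interval:

```python
import math

def confidence_interval(f, n_t, z=1.96):
    """Confidence limits p1, p2 of Eq. (23) for an observed frequency f
    over n_t test samples."""
    half = z * math.sqrt(4 * f * (1 - f) / n_t + z ** 2 / n_t ** 2)
    denom = 2 * (1 + z ** 2 / n_t)
    return ((2 * f + z ** 2 / n_t - half) / denom,
            (2 * f + z ** 2 / n_t + half) / denom)
```

For the first column of Table 2 (f = 0.992, N_T = 9·10^4) this reproduces the tabulated interval (0.9914, 0.9926).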

Table 2 presents the network's performance for the different colour classes, as well as the confidence intervals and the number of testing samples used.

The values presented in the column w, y, cy of the table are averaged values for these colour classes; the correct classification rates obtained for these classes were very similar. Pixels coming from the c colour class were also classified by the binary decision tree. The lower classification accuracy for this class results from its similarity to the cm colour class. The similarity arises due to the widely varying properties of the m layer of the cm coverage as well as the varying properties of the paper used for printing. For example, dark micro-spots in the paper make c pixels look like those of the cm colour class. The last column of the table


    Fig. 8. An example of an image containing "y" and "my" colour classes.

    Fig. 9. An example of an image containing several colour classes.

    Fig. 10. An example of an image containing eight classes of colours (no k). A part of the image was classified by the developed neural network.

Fig. 11. An image of dots printed with black ink on a magenta-yellow background.

    Fig. 12. An image of dots printed with cyan, magenta and yellow inks on a magenta-yellow background.

Table 2
Performance of the network and confidence intervals for different colour classes

Colours   w, y, cy   c        m        my       cm       cmy      k        cm, cmy, k
f         0.992      0.981    0.982    0.978    0.941    0.902    0.908    0.980
p1        0.9914     0.9801   0.9811   0.9770   0.9395   0.9003   0.9064   0.9791
p2        0.9926     0.9819   0.9828   0.9789   0.9424   0.9037   0.9096   0.9808
N_T       9·10^4     12·10^4  9·10^4   9·10^4   10·10^4  12·10^4  12·10^4  10·10^4


presents the average performance of the network for full tone (solid) prints of the cm, cmy and k colour classes. The other columns illustrate the network's performance for half tone prints.

About 85% of the pixels coming from the m and my colour classes and about 80% of the pixels from the cm colour class have been classified in the second classification step. Nearly 65% of the pixels from the cmy and k colour classes reach the third classification step.

Figs. 10, 11 and 12 illustrate some examples of the classification results. An example of an image containing eight colour classes (no k) is presented in Fig. 10. A part of the image was classified by the developed neural network. Eight colour classes can easily be found in the classified part of the image. The colour classes w, c, m and y are displayed with the colours of their names. The my colour class is shown in red, cm in blue, cy in green, and cmy in black. Note that no post-processing has been applied in the examples presented. Figs. 11 and 12 illustrate the classification results for the most overlapping pair of colour classes, namely cmy and k. The dots presented in Fig. 11 have been printed with black ink, while those presented in Fig. 12 with a coverage of cyan, magenta and yellow inks. The dots have been printed on the same magenta-yellow background in both pictures. The classification results are shown only for the central part of the pictures. After the classification we display class k with a black colour and class cmy with a brown colour; the my class is displayed in red as in the previous picture. Therefore, a brown colour inside the classified part of the image of Fig. 11 and a black colour inside the classified part of the image of Fig. 12 indicate classification errors. Some small green spots can also be found inside the brown dots. This means that these areas of paper were occasionally printed with only cyan and yellow inks and no magenta. Since the cy pixels are classified at the first step, without using adjacency information, the green spots appear.

As has already been mentioned, the black ink can appear on all eight possible backgrounds. The background cyan-magenta-yellow has proven to be the most difficult one. Since the cyan-magenta-yellow coverage also produces a black colour, the classification task in this case becomes a task of "finding" black dots on a black background. To the human eye such areas of a picture look completely black. Effects of light diffusion in paper make the classification task

very difficult. A correct classification rate of about 70-75% has been obtained for the black dots on the cyan-magenta-yellow background. We hope that the classification results for such dark areas of pictures can be improved by exploiting knowledge about the light-paper-ink interaction and by using more elaborate extraction of additional features. Work on how an artificial neural network can be used to find a set of additional features is in progress. On the other hand, for the application it is important to "find" black ink on the lighter areas of pictures.

In order to evaluate the results obtained from the system we attempted to distinguish between the two "black" images using another method for colour image segmentation. Good segmentation results for textured colour images, obtained using Gaussian Markov random field models, have recently been reported (Panjwani and Healey, 1995). In this model it is assumed that the RGB colour vector at each location is a linear combination of the neighbours in all three planes plus Gaussian noise. The coefficients of the combination are estimated as parameters of the model. Three colour planes and four directions are used; therefore there are 12 parameters of the model for each colour plane. For two textures, a difference in the estimated values of the parameters indicates a difference between the textures themselves. This approach has been chosen for the comparison. The two "black" images have been treated as two textures with different spatial interaction of coloured pixels. Table 3 provides an example of the estimated values of the parameters for the R colour plane. As we can see from the table, there is no significant difference in the values of the model parameters estimated from the image of the picture printed in black ink and that printed in cyan, magenta and yellow inks in this order on top of each other. The same range of difference between the estimated parameter values has been obtained for the colour planes G and B.

Table 3
Means and standard deviations of the estimated values of model parameters for the R colour plane

               "Black" image           "cmy" image
Parameter    Mean       St. dev.     Mean       St. dev.
1           -0.1330     0.062       -0.1431     0.050
2            0.2248     0.078        0.2458     0.071
3           -0.1364     0.053       -0.1500     0.064
4            0.5558     0.056        0.5577     0.072
5           -0.0545     0.025       -0.047      0.023
6            0.0751     0.021        0.0653     0.021
7           -0.0352     0.012       -0.0351     0.015
8            0.0040     0.006        0.0151     0.015
9           -0.0200     0.015       -0.0171     0.013
10           0.0411     0.020        0.0351     0.025
11          -0.0251     0.010       -0.0255     0.016
12           0.0072     0.003        0.0062     0.005

7. Conclusions

Small neural networks of different origins and different learning techniques have been combined to make a hierarchical network for classification of colour pixels. The hierarchical neural network performs analysis in a rough to fine fashion and enables

a high average classification speed and a low classification error. Experimentally, we have shown that the network is capable of distinguishing among the nine colour classes that occur in a half tone colour image. A correct classification rate of about 98% has been obtained even for two very similar black colours, namely the black printed in black ink and the black printed in cyan, magenta and yellow inks in this order on top of each other.

    Acknowledgements

We gratefully acknowledge the support we have received from The Swedish National Board for Industrial and Technical Development and The Royal Swedish Academy of Sciences. We also wish to thank the two anonymous reviewers for their valuable comments on the manuscript.

References

Desieno, D. (1988). Adding a conscience to competitive learning. Proc. ICNN, Vol. I. IEEE Press, New York, 117-124.

    Hunt, R.W.G. (1991). Measuring Colour. Ellis Horwood, Chichester, UK.

Kohonen, T. (1990). The self-organizing map. Proc. IEEE 78 (9), 1464-1480.

Liu, J. and Y.-H. Yang (1994). Multiresolution color image segmentation. IEEE Trans. Pattern Anal. Machine Intell. 16 (7), 689-700.

    Panjwani, D. K. and G. Healey (1995). Markov Random Field models for unsupervised segmentation of textured color images. IEEE Trans. Pattern Anal. Machine Intell. 17 (10), 939-954.

    Song, H.-H. and S.-W. Lee (1996). LVQ combined with simulated annealing for optimal design of large-set reference patterns. Neural Networks 9 (2), 329-336.

    Tan, T.S.C. and J. Kittler (1993). Colour texture classification using features from colour histogram. Proc. SCIA-93, Tromso, Norway, 807-813.

Tominaga, S. (1992). Color classification of natural color images. Color Research and Application 17 (4), 230-239.

Uchiyama, T. and M.A. Arbib (1994). Color image segmentation using competitive learning. IEEE Trans. Pattern Anal. Machine Intell. 16 (12), 1197-1206.

    Unnikrishnan, K.P. and K.P. Venugopal (1994). Alopex: A correlation-based learning algorithm for feedforward and recurrent neural networks. Neural Computation 6, 469-490.

    Verikas, A. and K. Malmqvist (1995). Increasing colour image segmentation accuracy by means of fuzzy post-processing. Proc. IEEE Internat. Conf. on Artificial Neural Networks, Perth, Australia, Vol. 4, 1713-1718.

Verikas, A., K. Malmqvist, L. Bergman and A. Gelzinis (1995). An unsupervised learning technique for finding decision boundaries. Proc. 5th European Conf. on Artificial Neural Networks, ICANN-95, Paris, Vol. 2, 99-104.

Verikas, A., K. Malmqvist and A. Gelzinis (1996a). A new technique to generate a binary decision tree. Proc. Symposium on Image Analysis, Lund, Sweden, 164-168.

    Verikas, A., K. Malmqvist, L. Malmqvist and L. Bergman (1996b). Weighting colour space coordinates for colour classification. Proc. Symposium on Image Analysis, Lund, Sweden, 49-53.