
2008 IEEE International Conference on Signal Image Technology and Internet Based Systems (SITIS) – Bali, Indonesia, 30 November – 3 December 2008

Expectation-Maximization × Self-Organizing Maps for Image Classification

Thales Sehn Korting, Leila Maria Garcia Fonseca
National Institute for Space Research – INPE/DPI, São José dos Campos, Brazil

{tkorting,leila}@dpi.inpe.br

Fernando Lucas Bação
Universidade Nova de Lisboa – UNL/ISEGI, Lisboa, Portugal

[email protected]

Abstract

To deal with the huge volume of information provided by remote sensing satellites, which produce images used for agriculture monitoring, urban planning, deforestation detection and so on, several algorithms for image classification have been proposed in the literature. This article compares two approaches, Expectation-Maximization (EM) and Self-Organizing Maps (SOM), applied to unsupervised image classification, i.e. data clustering without the direct intervention of a specialist. Remote sensing images are used to test both algorithms, and results are reported concerning visual quality, matching rate and processing time.

1 Introduction

The huge volume of information provided by remote sensing satellites is constantly growing; therefore, the demand for algorithms able to deal with such data and produce valid results is also increasing. Satellites with different ground resolutions produce different kinds of images, each suited to purposes such as agriculture monitoring, urban planning and deforestation detection.

To deal with all this information, several approaches for image classification have been proposed in the literature. In the pattern recognition literature they are divided into two main types: supervised and unsupervised classification. Unsupervised techniques perform data clustering without the direct intervention of a specialist. In this article we test two unsupervised approaches for remote sensing image classification: the Expectation-Maximization (EM) algorithm is compared with the Self-Organizing Maps (SOM) approach.

Many methods have been developed to deal with image classification problems. The work of [2] proposed a framework of four kernel-based techniques for hyperspectral image classification using Support Vector Machines (SVM). Neural networks have also been used to perform classification, as in [12], which performs image segmentation to extract object regions that are then classified using shape and texture attributes. Other methods for remote sensing image classification include Gaussian Mixture Models fitted by the EM method [13, 9], Self-Organizing Maps [7, 18, 16], and fuzzy sets and their combination with neural networks [14, 5].

In Section 2 we describe the EM and SOM algorithms applied to unsupervised image classification. In Section 3 we apply these algorithms to some images and compare the results to reference images created by a specialist. In the conclusion section we highlight the strong points of each algorithm and make recommendations on their application.

2 The algorithms

This section describes the EM and SOM techniques applied to unsupervised image classification. Both algorithms have been implemented in the same C++ library, TerraLib [1], available at http://www.terralib.org/, so that the processing time for basic instructions is the same for the two algorithms.

2.1 Expectation-Maximization Approach

In statistical pattern recognition, mixture models allow a formal approach to unsupervised learning (i.e. clustering) [4]. A standard method to fit finite mixture models to observed data is the EM algorithm, first proposed by [3]. EM is an iterative procedure which converges to a (local)

2008 IEEE International Conference on Signal Image Technology and Internet Based Systems
978-0-7695-3493-0/08 $25.00 © 2008 IEEE. DOI 10.1109/SITIS.2008.35


Figure 1. How to create the input vectors.

maximum of the a posteriori probability function p(θ|x) without explicitly computing the marginal likelihood:

p(θ|x) ∝ p(x|θ) p(θ)

where θ is the set of unknown parameters of x. EM therefore estimates the probability of each component being present in a certain cluster. In the present implementation, the input vectors are the image pixels (see Figure 1), and the parameters to be estimated are the mean and variance.
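For concreteness, building such pixel-wise input vectors from a multi-band image can be sketched as follows. This is a Python/NumPy illustration, not the TerraLib C++ code; the (bands, rows, cols) array layout is an assumption of the sketch.

```python
import numpy as np

def image_to_vectors(image):
    """Flatten a (bands, rows, cols) image into an (N, bands) matrix,
    one row per pixel, as suggested by Figure 1."""
    bands, rows, cols = image.shape
    return image.reshape(bands, rows * cols).T

# tiny 3-band, 2x2 image for illustration
img = np.arange(12).reshape(3, 2, 2)
vectors = image_to_vectors(img)
print(vectors.shape)  # (4, 3): four pixels, three spectral values each
```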

EM is a general method for estimating the parameters of a given data set when the data are incomplete or have missing values. It works iteratively by applying two steps: the E-step (Expectation) and the M-step (Maximization). Formally, θ(t) = {µj(t), Σj(t)}, j = 1, . . . , M stands for successive parameter estimates. The method aims to approximate θ(t) to the real data distribution as t = 0, 1, . . .

E-step: This step calculates the conditional expectation ofthe complete a posteriori probability function;

M-step: This step updates the parameter estimation θ(t).

Each cluster probability Cj , given a certain attribute-vector, is estimated as following:

P (Cj |x) =|Σj(t)|−

12 eηjPj(t)∑M

k=1 |Σk(t)|− 12 eηkPk(t)

(1)

where

ηi = −12

(x− µi(t))TΣ−1x (t)(x− µi(t))

With such probabilities, one can now estimate the mean, covariance, and a priori probability of each cluster at time t+1, according to Equations 2, 3 and 4:

µj(t+1) = [ Σ_{k=1}^{N} P(Cj | xk) xk ] / [ Σ_{k=1}^{N} P(Cj | xk) ]    (2)

Σj(t+1) = [ Σ_{k=1}^{N} P(Cj | xk) (xk - µj(t)) (xk - µj(t))^T ] / [ Σ_{k=1}^{N} P(Cj | xk) ]    (3)

Pj(t+1) = (1/N) Σ_{k=1}^{N} P(Cj | xk)    (4)

These steps are performed until convergence, according to the following criterion [15]:

‖θ(t+1) - θ(t)‖ < ε    (5)

where ‖·‖, in this implementation, is the Euclidean distance between the vectors µ(t+1) and µ(t), and ε is a threshold chosen by the user. After convergence, Equation 1 is used to classify the input.
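Putting Equations 1–5 together, one EM clustering pass can be sketched as below. This is a hedged Python/NumPy illustration, not the TerraLib implementation: the deterministic initialization, the small covariance regularization term, and the use of the updated mean in Equation 3 are choices of this sketch.

```python
import numpy as np

def e_step(X, mu, sigma, prior):
    """Equation (1): posterior P(C_j | x) for every pixel and cluster."""
    N, d = X.shape
    M = len(mu)
    post = np.zeros((N, M))
    for j in range(M):
        inv = np.linalg.inv(sigma[j])
        diff = X - mu[j]
        eta = -0.5 * np.sum(diff @ inv * diff, axis=1)
        post[:, j] = np.linalg.det(sigma[j]) ** -0.5 * np.exp(eta) * prior[j]
    return post / post.sum(axis=1, keepdims=True)

def m_step(X, post):
    """Equations (2)-(4): new means, covariances and a priori probabilities.
    (Uses the updated mean in Eq. 3, a common variant.)"""
    N, d = X.shape
    w = post.sum(axis=0)                     # sum_k P(C_j | x_k)
    mu = (post.T @ X) / w[:, None]           # Eq. (2)
    sigma = np.empty((post.shape[1], d, d))
    for j in range(post.shape[1]):
        diff = X - mu[j]
        sigma[j] = (post[:, j, None] * diff).T @ diff / w[j] + 1e-6 * np.eye(d)
    return mu, sigma, w / N                  # Eq. (3), Eq. (4)

def em_cluster(X, M, eps=1e-4, max_iter=100):
    # simple deterministic initialization: M evenly spaced samples
    mu = X[np.linspace(0, len(X) - 1, M).astype(int)].astype(float)
    d = X.shape[1]
    sigma = np.array([np.cov(X.T) + 1e-6 * np.eye(d)] * M)
    prior = np.full(M, 1.0 / M)
    for _ in range(max_iter):
        post = e_step(X, mu, sigma, prior)
        new_mu, sigma, prior = m_step(X, post)
        converged = np.linalg.norm(new_mu - mu) < eps   # Eq. (5)
        mu = new_mu
        if converged:
            break
    return e_step(X, mu, sigma, prior).argmax(axis=1)   # classify via Eq. (1)

# toy demonstration on two well-separated synthetic clusters
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(8, 0.3, (20, 2))])
labels = em_cluster(X, 2)
```

The constant (2π)^(-d/2) is omitted, as in Equation 1, since it cancels in the ratio.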

2.2 Self-Organizing Maps Approach

The SOM is a powerful tool for exploring huge amounts of high-dimensional data. It defines an elastic, topology-preserving grid of points that is fitted to the input space [10]. Proposed by Kohonen as a tool for the visualization and analysis of high-dimensional data, the SOM algorithm has been used for a wide variety of applications, such as clustering, dimensionality reduction, classification, sampling, vector quantization and data mining [8].

This approach shares characteristics with human neurons. The idea of a set of neurons which specialize in identifying certain types of patterns through learning experiences is consistent with current research on the human brain. The idea that certain parts of the brain are responsible for specific skills and tasks is remarkably similar to the SOM principles. Organizing information spatially, where similar concepts are mapped to adjacent areas, is a trademark of the SOM and is believed to be one of the human brain's functioning paradigms.

The basic idea of the SOM is to map the data patterns onto an n-dimensional grid of neurons or units. That grid forms what is known as the output space, as opposed to the input space, the original space where the data patterns lie. This mapping tries to preserve topological relations, i.e. patterns that are close in the input space will be mapped to units that are close in the output space, and vice versa. The output space is usually two-dimensional, and most implementations of the SOM use a rectangular grid of units.

The neuron structure is composed of a vector of weights. The number of weights is defined by the dimensionality of the features in the input space. Besides this, there is also another dimension, related to the network structure: the spatial position of each neuron and its relations to its neighbors. Thus, a SOM can be 1D for neurons in a line, or of higher dimension, creating planes (2D), surfaces (3D) or hyperplanes (nD).

2.2.1 Training:

The training process occurs through a set of epochs. For each input pattern (Figure 1) the algorithm finds the winner neuron, that is, the neuron closest to the pattern [17]; the Euclidean distance can be used to calculate the distances. The other neurons close to the winner are also updated, but by a reduced factor. This process is repeated for a pre-defined number of epochs, and the neurons move in the attribute space, always keeping the topological neighborhood relation. It is important to note that the initialization of the neurons is randomized, which means that over the iterations each neuron will converge to a different class.

Let us define x as a set of N training patterns and w the set of neurons. In this example the neural network has two dimensions, so wij is the neuron at position (i, j). Let 0 ≤ α ≤ 1 be the learning rate, and h(wij, wmn) the neighborhood function, varying in the [0, 1] interval and closer to 0 for more distant neurons. The learning rate is set wide in the early epochs and is reduced as the epochs pass, to smooth the convergence.

The basic training algorithm is described as follows:

1. Calculate the distance from pattern xk to all neurons:

dij = ‖xk - wij‖    (6)

2. Find the winner neuron wmn, i.e. the neuron whose distance to the pattern is minimal:

dmn = min_{i,j}(dij)    (7)

3. Update the neural network:

wij = wij + α h(wmn, wij) (xk - wij)    (8)

4. Repeat until a stop criterion is met.

Generally, the function h(wij, wmn) used to update the neurons is a Gaussian with an influence radius r:

h(wij, wmn) = exp( -(1/2) [ (i - m)² + (j - n)² ] / r² )    (9)

Figure 2 shows an example of a training set, where two neurons converge to the two distinct classes presented. The figure also shows the winner neuron and the updating of the whole neural network. Across the epochs, the green neuron is expected to converge to the top-left region and the blue one to the bottom-right region of the feature space shown.

Figure 2. Training example.

A postprocessing stage can be used to improve the final neural network, since some neurons may be too close and represent the same cluster in the real data distribution. Such neurons can be merged or removed from the final network.

2.2.2 The Classification Step:

The approach presented in this article uses one neuron for each pattern, or class. This means that, during the epochs, each neuron of the map becomes specialized in one specific class; for example, an image with 5 classes is classified using 5 neurons, and so on. After the training step, we can perform the classification of the input data: the algorithm calculates the distance between each input element and each neuron, and the class of the closest neuron defines the class of the input element.
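The training loop (Equations 6–9) and the classification step can be sketched as follows. This Python/NumPy illustration uses a 1-D map with one neuron per class, linearly decaying learning rate and neighborhood radius, and shuffled patterns; these schedule choices are assumptions of the sketch, not details specified by the method above.

```python
import numpy as np

def train_som(X, n_neurons, epochs=30, alpha0=0.5, r0=1.0, seed=0):
    """1-D SOM with one neuron per expected class (Eqs. 6-9 on a 1-D grid)."""
    rng = np.random.default_rng(seed)
    # random initialization: neurons start at randomly chosen patterns
    w = X[rng.choice(len(X), n_neurons, replace=False)].astype(float)
    for epoch in range(epochs):
        alpha = alpha0 * (1.0 - epoch / epochs)      # decaying learning rate
        r = max(r0 * (1.0 - epoch / epochs), 0.05)   # shrinking neighborhood
        for k in rng.permutation(len(X)):            # shuffled training patterns
            d = np.linalg.norm(X[k] - w, axis=1)     # Eq. (6)
            m = int(d.argmin())                      # winner neuron, Eq. (7)
            h = np.exp(-0.5 * (np.arange(n_neurons) - m) ** 2 / r ** 2)  # Eq. (9)
            w += alpha * h[:, None] * (X[k] - w)     # Eq. (8)
    return w

def classify(X, w):
    """Classification step: the closest neuron defines the class."""
    return np.array([int(np.linalg.norm(x - w, axis=1).argmin()) for x in X])

# toy demonstration on two well-separated synthetic clusters
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 0.2, (20, 2)), rng.normal(10, 0.2, (20, 2))])
w = train_som(X, 2)
labels = classify(X, w)
```

With two neurons and two well-separated clusters, each neuron specializes in one cluster, as described above.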

3 Results and Comparison

This section shows images obtained by the Spot and Quickbird sensors, and their resulting classifications using both methods. The comparison of the techniques uses parameters such as visual inspection, processing time and difference from a manual classification.

The first image, from the Cubatão municipality (Brazil), was taken by the Spot sensor in May 2005, with color composition R4G3B2. It contains three main classes, namely Water, Urban and Green area, besides Shadow and Road inside the urban area. It is shown in Figure 3.

Each result shows two crops of the original image, since it is too large to show in full detail. Figure 4 shows the classification using the EM algorithm, and Figure 5 shows the SOM result. Both algorithms were able to distinguish between


Figure 3. First image used in comparison, and crops used to show results.

the classes Water, Urban and Green area fairly well, judging from visual inspection of the results. However, the class Shadow remains mixed with the other three main classes. The class Road was better classified by SOM, but EM produced the smoothest result, considering the granularity of pixels. The SOM results present a somewhat noisy appearance because of isolated misclassified pixels.

The second result is a color composition of an urban area of São José dos Campos, Brazil. The image was taken in January 2004 by the Quickbird satellite, with color composition R3G2B1. Figure 6 shows the original image and the manual classification, with 5 classes, namely Green area, Building, Roof, Road and Other.

Figure 7 shows the classification using the EM algorithm and Figure 8 shows the SOM result. In terms of processing time, SOM took 8 s to classify the image, whereas EM took 2 min 34 s for the same task. However, when compared with the manual classification, the EM results show a better matching rate than SOM, according to Table 1. One can see that EM achieved the best matching rates for the classes Roof and Road, with more than 70% of correct matches. However, due to spectral similarities, some occurrences of the class Building were misclassified as Road.

Figure 4. EM results for Figure 3.

To overcome the wrong matches, other kinds of image features could be added, such as texture or neighborhood information. On the other hand, the SOM results could not reach 70% of correct matches in any class.

The last example shows one more high-resolution image from the Quickbird satellite, also from São José dos Campos. Figure 9 shows the original image, with 543 × 402 pixels, which represents a wider area compared to the previous example. The classes here are Roof, Green area, Road, Building (White), Other and Shadow.

Figure 10 displays the classification results for Figure 9. In terms of processing time, the SOM algorithm took 21 s to classify the image, whereas EM took 7 min 52 s for the same task. By visual inspection we can perceive the wrong matches produced by EM when distinguishing the classes Shadow, Road and Green area. However, the algorithm was satisfactory for the classes Roof and Building; this result can be considered good, since the correct matches reached almost 100%. Even with confusion between classes such as Shadow and Road, which are also visually similar, this result can be applied to urban studies, since the exact area of roofs can be extracted from the classification.

Figure 5. SOM results for Figure 3.



Figure 6. Second image used in comparison: a) Quickbird scene from São José dos Campos – Brazil and b) manual classification.

The SOM classification kept the differences between classes more distinguishable, since visual inspection shows the difference between the classes Road and Shadow. However, this result presents some drawbacks, such as the merging of the classes Green area and Shadow, and some wrong matches between the classes Roof and Road. The confusion involving roofs could be solved using a contextual postprocessing step, analysing the area of small road segments, which in this case should be reclassified as roofs.

Concerning the smoothness of the results, the EM result does not need this kind of postprocessing, since the image, despite incorrect matches, keeps a certain spatial correlation among neighboring pixels. On the other hand, the SOM result needs postprocessing to restore spatial correlation, which could also be achieved by using neighboring pixels in the training and classification steps.

4 Conclusions

This paper presented two approaches to image classification, described how to implement each of them, and compared their results on remote sensing images.

Figure 7. EM results for Figure 6.

Each technique has advantages and drawbacks, which were discussed here, so the reader can choose the best algorithm depending on the application, volume of data, processing capacity, etc.

According to [11], SOM has some non-optimal features, such as slowness when using large maps. However, since the number of neurons used in the proposed method is equal to the number of classes (generally low), the algorithm executed in much less time than EM. Thus, the main point in favor of the SOM approach is processing time compared with EM: in some tests it executed up to 20× faster than EM. This can be explained by the different calculations performed by the two approaches: the most complex operation in the SOM algorithm is the square root used in the Euclidean distance, whereas EM performs several matrix inversions in each iteration during the convergence and classification steps.

Concerning the smoothness of the results, [6] points out that we may expect some degree of spatial correlation among neighboring pixels, resulting in a smoothed map. In this aspect EM obtained better results, as visual inspection of the classification results shows.

Figure 8. SOM results for Figure 6.


a) EM

      1     2     3     4     5
1   0.31  0.02  0.00  0.00  0.00
2   0.06  0.57  0.00  0.03  0.03
3   0.05  0.06  0.86  0.25  0.40
4   0.05  0.28  0.10  0.70  0.40
5   0.53  0.08  0.03  0.02  0.17

b) SOM

      1     2     3     4     5
1   0.34  0.01  0.00  0.00  0.00
2   0.12  0.63  0.00  0.06  0.05
3   0.12  0.05  0.64  0.19  0.29
4   0.11  0.21  0.04  0.52  0.25
5   0.31  0.10  0.31  0.23  0.41

Table 1. Confusion matrices for Figure 6: a) EM and b) SOM. Classes are: 1) Green area, 2) Building, 3) Roof, 4) Road, and 5) Other.
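For reference, a column-normalized confusion matrix such as Table 1 can be computed from the reference and predicted label vectors; its diagonal then gives the per-class matching rate. A small illustration with made-up labels:

```python
import numpy as np

def confusion_matrix(reference, predicted, n_classes):
    """Column-normalized confusion matrix: entry (i, j) is the fraction of
    reference-class-j pixels assigned to class i (as in Table 1)."""
    cm = np.zeros((n_classes, n_classes))
    for r, p in zip(reference, predicted):
        cm[p, r] += 1
    return cm / cm.sum(axis=0, keepdims=True)

# hypothetical reference and predicted labels for ten pixels
reference = [0, 0, 0, 1, 1, 1, 1, 2, 2, 2]
predicted = [0, 0, 1, 1, 1, 1, 0, 2, 2, 1]
cm = confusion_matrix(reference, predicted, 3)
matching_rate = np.diag(cm)   # per-class fraction of correct matches
```

Each column sums to 1, so the diagonal reads directly as the matching rate per reference class.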

SOM classifications need a postprocessing stage to improve the results, mostly because of the crisp nature of the outputs in some regions. One possible solution is to incorporate neighboring pixels into the input vector. This solution increases the data volume, but can increase the quality of the result, since the algorithm executes quickly. Another solution, also pointed out in the previous section, is to use the image context to discover incorrect matches, and to use additional features, such as image texture.
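As an illustration of this idea, each pixel vector could be augmented with, for example, the mean of its 4-connected neighbors. The scheme below is a sketch of one such option, not the method used in the paper; it doubles the feature dimension from bands to 2×bands.

```python
import numpy as np

def vectors_with_neighbors(image):
    """Augment each (bands,) pixel vector with the mean of its 4-connected
    neighbors (edges replicated), giving (N, 2*bands) input vectors."""
    bands, rows, cols = image.shape
    padded = np.pad(image.astype(float), ((0, 0), (1, 1), (1, 1)), mode='edge')
    neighbors = (padded[:, :-2, 1:-1] + padded[:, 2:, 1:-1] +
                 padded[:, 1:-1, :-2] + padded[:, 1:-1, 2:]) / 4.0
    center = image.reshape(bands, -1).T
    return np.hstack([center, neighbors.reshape(bands, -1).T])

# tiny 3-band, 2x2 image for illustration
img = np.arange(12).reshape(3, 2, 2)
vectors = vectors_with_neighbors(img)
print(vectors.shape)  # (4, 6)
```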

Figure 9. Third image used in comparison: Quickbird scene from São José dos Campos – Brazil.


Figure 10. Classification results for Figure 9: a) EM and b) SOM.

References

[1] G. Camara, R. Souza, B. Pedrosa, L. Vinhas, A. Monteiro, J. Paiva, M. Carvalho, and M. Gatass. TerraLib: technology in support of GIS innovation. II Workshop Brasileiro de Geoinformática (GeoInfo2000), 2:1–8, 2000.

[2] G. Camps-Valls and L. Bruzzone. Kernel-based methods for hyperspectral image classification. IEEE Transactions on Geoscience and Remote Sensing, 43(6):1351–1362, 2005.

[3] A. Dempster, N. Laird, and D. Rubin. Maximum likelihood estimation from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, 39(1):1–38, 1977.

[4] M. Figueiredo and A. Jain. Unsupervised learning of finite mixture models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(3):381–396, 2002.

[5] D. Gomez, J. Montero, and J. Yanez. A coloring fuzzy graph approach for image classification. Information Sciences, 176(24):3645–3657, 2006.

[6] L. Guo and J. Moore. Post-classification processing for thematic mapping based on remotely sensed image data. IEEE International Geoscience and Remote Sensing Symposium (IGARSS'91), vol. 4, 1991.

[7] D. Kim. Comparison of three land cover classification algorithms - ISODATA, SMA, and SOM - for the monitoring of North Korea with MODIS multi-temporal data. Korean Journal of Remote Sensing, 23(3):181–188, 2007.

[8] T. Kohonen. Self-Organizing Maps. Springer, Berlin, 3rd edition, 2001.

[9] T. S. Korting, L. V. Dutra, L. M. G. Fonseca, G. Erthal, and F. C. Silva. Improvements to the Expectation-Maximization approach for unsupervised classification of remote sensing data. In GeoInfo, 2007.

[10] J. Laaksonen, V. Viitaniemi, and M. Koskela. Application of Self-Organizing Maps and automatic image segmentation to 101 object categories database. Proc. Fourth International Workshop on Content-Based Multimedia Indexing (CBMI05), Riga, Latvia, June 2005.

[11] J. Pakkanen and J. Iivarinen. A novel self-organizing neural network for defect image classification. Proc. IEEE International Joint Conference on Neural Networks, vol. 4, 2004.

[12] S. Park, J. Lee, and S. Kim. Content-based image classification using a neural network. Pattern Recognition Letters, 25(3):287–300, 2004.

[13] H. Permuter, J. Francos, and I. Jermyn. A study of Gaussian mixture models of color and texture features for image classification and segmentation. Pattern Recognition, 39(4):695–706, 2006.

[14] B. Shankar, S. Meher, A. Ghosh, and L. Bruzzone. Remote sensing image classification: a neuro-fuzzy MCS approach. Lecture Notes in Computer Science, 4338:128, 2006.

[15] S. Theodoridis and K. Koutroumbas. Pattern Recognition. Academic Press, 2003.

[16] C.-F. Tsai, K. McGarry, and J. Tait. Image classification using hybrid neural networks. In Proc. 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 431–432, New York, NY, USA, 2003. ACM.

[17] J. Vesanto and E. Alhoniemi. Clustering of the self-organizing map. IEEE Transactions on Neural Networks, 11(3):586–600, 2000.

[18] J. Zhang, Q. Liu, and Z. Chen. A medical image segmentation method based on SOM and wavelet transforms. Journal of Communication and Computer, 2(5):46–50, 2005.