
Describing Texture Directions with Von Mises Distributions

Costantino Grana, Daniele Borghesani, Rita Cucchiara

Dipartimento di Ingegneria dell’Informazione

Università degli Studi di Modena e Reggio Emilia

[email protected]

Abstract

A new approach for document analysis is proposed. The goal is to find a coherent way to describe texture within generic documents, in order to classify text (mainly), background and images. Autocorrelation matrices are computed, and an elegant description through mixtures of Von Mises distributions is implemented.

1. Introduction

Document image analysis has a long history in pattern recognition. Several techniques have been proposed for content and layout segmentation to provide the basis for semantic annotation, classification and retrieval: when applied to a large collection of digital documents, both the accuracy of the analysis and the computational effort required are significant. A specific use case is ancient books or illuminated manuscripts, which cannot be flipped through by the public due to their value and delicacy. Computer science can fill the gap between people and these precious libraries of masterpieces: digital versions of the artistic works can be made publicly accessible, either locally or remotely, giving users the freedom to choose their own way to navigate and enjoy them.

The quality of images of illuminated manuscripts heavily depends on the way they have been acquired and on the preservation status of the work. Small rotations or scaling can occur, pages can be spoiled, and grayscale or low quality acquisition is also possible, generally resulting in a set of noisy textures. Moreover, different manuscripts have different contents and layouts. For all these reasons, a simple approach based on color, shape or layout would not be effective enough for a large scale implementation. In this paper, a flexible approach to the problem is proposed based on texture analysis. Images are analyzed by blocks through autocorrelation, and a direction histogram is computed for each block. We then describe it with a mixture of Von Mises distributions (MoVM), a statistical formulation that is more suitable for angular data than a mixture of Gaussians. We implemented an EM algorithm for parameter estimation and finally we used a Support Vector Machine (SVM) for block classification. The goal is to provide a very fast and compact representation for each class, in order to make retrieval as fast and effective as possible.

The test set used for our experiments is composed of illuminated manuscripts provided by Franco Cosimo Panini S.P.A., specifically the photo collection of The Borso D’Este Holy Bible.

2. Related works

Document segmentation is normally based on partitioning the image into blocks followed by texture analysis. Several works on text segmentation have been proposed: a clustering approach is presented in [1], while in [2] a classification based on Gabor filters is used. A comprehensive survey is provided by Busch et al. [3], exploring and comparing several techniques and situations. More general approaches dealing also with background and picture segmentation have been proposed. Some works exploit geometric constraints on the layout: for a literature survey, please refer to [4]. Many others compute specific descriptors followed by classification: an example is provided by [5] with hidden tree Markov models.

The majority of works have been developed for printed documents, while only a limited set of works has been carried out on illuminated manuscripts. A reference paper is the description of the DEBORA system [6], a complete system for the analysis of Renaissance books. In [7] an effective technique for texture characterization in old books was proposed, exploiting the autocorrelation matrix in order to extract the relevant directions within the texture (called directional rose). In our work, we extend this approach by formulating a more elegant description of the different directions within the block with the use of mixtures of Von Mises distributions.

3. Texture analysis

Documents are mainly characterized by three kinds of textures: a noisy background, the text, and colored images or decorations (Fig. 1). The quality of these textures heavily depends on the way the image has been acquired. Moreover, each document can have various contents and layouts. For all these reasons, a simple approach based on color, shape or layout would not be effective enough for a large scale implementation. A flexible approach to the problem is necessary, so we chose to look at the texture structure using a well-known texture feature, the autocorrelation matrix.

Autocorrelation is a very powerful and discriminative feature in our context because textual textures have a pronounced orientation, which differs markedly from that of background, pictures or decorations. The autocorrelation function is a typical signal processing technique. Formally, it is the cross-correlation of a signal with itself, and it measures the similarity between a signal and a shifted copy of itself. Applied to a grayscale image, it produces a centrally symmetric matrix that gives an idea of how regular the texture is.

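The image is divided into square blocks whose size $bs$ must be set according to the scale at which the texture should be analyzed. The definition of the autocorrelation for a block is:

$$C(k, l) = \sum_{x=0}^{bs-1} \sum_{y=0}^{bs-1} I(x, y)\, I(x + k,\, y + l) \tag{1}$$

where $I(x, y)$ is the gray level of the block at pixel $(x, y)$, and $l$ and $k$ are defined in $[-bs/2, bs/2]$.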
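As an illustration, the block autocorrelation can be sketched in plain Python as follows. This is a minimal, dependency-free sketch and not the implementation used in the experiments: the function name `block_autocorrelation`, the offset range of half the block size in each direction, and the treatment of pixels outside the block as zero are all assumptions.

```python
def block_autocorrelation(block):
    """Autocorrelation C(k, l) of a square grayscale block.

    block is a list of rows, so block[y][x] is the gray level I(x, y).
    Offsets k (horizontal) and l (vertical) range over [-bs/2, bs/2].
    Pixels that fall outside the block are treated as zero (assumption).
    Returns a dict mapping (k, l) to C(k, l).
    """
    bs = len(block)
    half = bs // 2
    corr = {}
    for k in range(-half, half + 1):
        for l in range(-half, half + 1):
            total = 0.0
            for y in range(bs):
                for x in range(bs):
                    xs, ys = x + k, y + l
                    if 0 <= xs < bs and 0 <= ys < bs:
                        total += block[y][x] * block[ys][xs]
            corr[(k, l)] = total
    return corr


# Toy block with horizontal stripes, loosely mimicking lines of text.
stripes = [[1 if y % 2 == 0 else 0 for x in range(8)] for y in range(8)]
corr = block_autocorrelation(stripes)
# A horizontal shift keeps the stripes aligned, a one-pixel vertical shift does not.
print(corr[(1, 0)], corr[(0, 1)])
```

On such a striped block the response is strong for horizontal offsets and vanishes for a one-pixel vertical offset, which is the kind of pronounced orientation expected from text.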

The result of the autocorrelation can be analyzed by extracting an estimate of the relevant directions within the texture (a similar approach has been proposed in [7]). Each angle $\theta$ determines a direction, and the sum of all the matrix entries along each direction is computed to form a polar representation of the autocorrelation matrix, called direction histogram. In this way, each direction is characterized by a weight $w(\theta)$, indicating its importance within the block.

$$w(\theta) = \sum_{r \in (0,\, r_{max}]} C(r\cos\theta,\, r\sin\theta) \tag{2}$$

Since the autocorrelation matrix has a central symmetry by definition, we consider only the first half of the direction histogram, from 0° to 179°. $\theta$ and $r$ are quantized: the step of $\theta$ is set to 1 degree (so we obtain 180 values, representing 180 possible directions), and the maximum radius $r_{max}$ is defined as a fixed fraction of the block size. A text block will be characterized by peaks around 0° and 180° because the dominant direction is horizontal; this behavior differs from image textures (described by a generic monomodal or multimodal distribution) and background texture (described by a nearly uniform, flat distribution).
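The construction of the direction histogram can be sketched as below. This is only an illustrative sketch: the function name `direction_histogram`, the parameter `r_max`, and the nearest-integer sampling of the positions along each direction are assumptions, since the text does not specify how the matrix is sampled.

```python
import math


def direction_histogram(corr, r_max, n_angles=180):
    """Sum autocorrelation values along each direction, one bin per degree.

    corr maps integer offsets (k, l) to C(k, l). Each position
    (r cos(theta), r sin(theta)) is rounded to the nearest integer offset
    (assumption); offsets missing from corr contribute zero.
    """
    hist = []
    for deg in range(n_angles):
        theta = math.radians(deg)
        total = 0.0
        for r in range(1, r_max + 1):
            k = round(r * math.cos(theta))
            l = round(r * math.sin(theta))
            total += corr.get((k, l), 0.0)
        hist.append(total)
    return hist


# Synthetic autocorrelation with a dominant horizontal direction.
toy_corr = {(k, l): (1.0 if l == 0 else 0.1)
            for k in range(-4, 5) for l in range(-4, 5)}
toy_hist = direction_histogram(toy_corr, r_max=4)
print(toy_hist[0], toy_hist[90])
```

With a dominant horizontal direction, the bins near 0° and 179° receive the largest weights, matching the behavior described for text blocks.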

4. Texture characterization

Figure 1. Example of illuminated manuscripts and relative ground truth. White identifies background, red identifies text and blue identifies pictures or decorations.

The polar distribution obtained by autocorrelation in the previous steps can be modeled using Von Mises distributions. Gaussian distributions are inappropriate for modeling periodic datasets: setting the origin at 0°, elements crossing this angle will be assigned to two distinct directions, even if they express almost the same one. The choice of the origin thus becomes critical, and could be a weak point of the fit. A Von Mises distribution, instead, is defined on the circle, so it can correctly represent angular datasets. The probability density function is defined as follows:

$$V(\theta \mid \theta_0, m) = \frac{1}{2\pi I_0(m)}\, e^{m\cos(\theta - \theta_0)} \tag{3}$$

The parameter $m$ denotes how concentrated the distribution is around the mean angle $\theta_0$. In our context, we use a slightly different formulation (we simply multiply the angles by 2) with a periodicity of $\pi$ instead of $2\pi$, considering only angles in $[0, \pi)$ as representative of meaningful directions. $I_0$ is the modified Bessel function of order 0, defined as:

$$I_0(m) = \frac{1}{2\pi} \int_0^{2\pi} e^{m\cos\theta}\, d\theta \tag{4}$$

To capture the general multimodal behavior of the input datasets, we chose a mixture of Von Mises distributions. We used mixtures with only 2 components, because they proved sufficient to recognize the two most meaningful directions (horizontal and vertical) while keeping the computational cost affordable. An example of fitting for the three types of texture analyzed is shown in Fig. 2.
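A minimal sketch of the Von Mises density with doubled angles, together with the Bessel function $I_0$, is given below. The function names `bessel_i0` and `von_mises_pi` are hypothetical, and the power series used for $I_0$ is a simple choice that is adequate for moderate values of $m$; it is not claimed to be the evaluation used in the original work.

```python
import math


def bessel_i0(m, terms=80):
    """Modified Bessel function of order 0 via its power series.

    I0(m) = sum over j of ((m/2)^2)^j / (j!)^2; adequate for moderate m.
    """
    total, term = 0.0, 1.0
    for j in range(terms):
        total += term
        term *= (m / 2.0) ** 2 / ((j + 1) ** 2)
    return total


def von_mises_pi(theta, theta0, m):
    """Von Mises density with period pi (angles doubled), theta in radians.

    The normalization pi * I0(m) makes the density integrate to 1 on [0, pi).
    """
    return math.exp(m * math.cos(2.0 * (theta - theta0))) / (math.pi * bessel_i0(m))


# Directions just above 0 and just below pi are treated as neighbors.
print(von_mises_pi(0.01, 0.0, 4.0), von_mises_pi(math.pi - 0.01, 0.0, 4.0))
```

The two printed values coincide up to rounding, which is exactly the wrap-around behavior that a Gaussian with its origin at 0° cannot provide.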

Generally, a mixture of K Von Mises distributions is defined as follows:

$$MoVM(\theta) = \sum_{k=1}^{K} \pi_k\, V(\theta \mid \theta_{0,k}, m_k) \tag{5}$$

where $\pi_k$ represents the weight of the $k$-th distribution within the mixture. A standard way to get the maximum likelihood estimates of the mixture parameters is the Expectation-Maximization algorithm [8]. In the E step the expected values of the likelihood are computed; in the M step the parameters that maximize them are obtained; the two steps are repeated until convergence or until a maximum number of iterations is reached. To maximize the likelihood, a set of responsibilities $\gamma_{ik}$ of the bins for each Von Mises component is necessary. Let $i$ be the index of the bin. The responsibilities are computed as follows:

$$\gamma_{ik} = \frac{\pi_k\, V(\theta_i \mid \theta_{0,k}, m_k)}{\sum_{s=1}^{K} \pi_s\, V(\theta_i \mid \theta_{0,s}, m_s)} \tag{6}$$

A new set of weights for the Von Mises components of the mixture can now be computed:

$$\pi_k = \frac{\sum_i w(\theta_i)\, \gamma_{ik}}{\sum_i w(\theta_i)} \tag{7}$$

This formulation differs from the one in [8], and the motivation lies in the dataset we used: we do not have a general distribution of angular data to fit, but a sampling of directions and relative weights. For this reason, we consider the weight as a multiplier for each angle, so formally the angle $\theta$ appears $w(\theta)$ times in our dataset.

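In the M step, we compute the new $\theta_{0,k}$ and $m_k$ values for each Von Mises component within the mixture. In particular, $\theta_{0,k}$ is computed by maximization of the relative likelihood as follows:

$$\theta_{0,k} = \frac{1}{2} \arctan \frac{\sum_i w(\theta_i)\, \gamma_{ik} \sin 2\theta_i}{\sum_i w(\theta_i)\, \gamma_{ik} \cos 2\theta_i} \tag{8}$$

Note the multiplication by 2 in order to relate to a $\pi$ periodicity. The retrieval of $m_k$ by maximization is a bit more complicated, due to the presence of the Bessel functions. Given the derivative of the modified Bessel function defined as:

$$I_0'(m) = I_1(m) = \frac{1}{2\pi} \int_0^{2\pi} \cos\theta\; e^{m\cos\theta}\, d\theta \tag{9}$$

the problem can be solved mathematically using this formulation: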

$$A(m_k) = \frac{I_1(m_k)}{I_0(m_k)} = \frac{\sum_i w(\theta_i)\, \gamma_{ik} \cos\big(2(\theta_i - \theta_{0,k})\big)}{\sum_i w(\theta_i)\, \gamma_{ik}} \tag{11}$$

The value of $m_k$ can be found by the numerical inversion of $A(m_k)$. In particular we use the approximation proposed in [9].

Figure 2. Example of directional histograms and the corresponding fitting with Von Mises mixtures.
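The weighted EM iteration described above can be sketched as follows, under several assumptions. The names `bessel_i`, `invert_a` and `fit_movm` are hypothetical; the inversion of the Bessel ratio is done here by plain bisection as a stand-in for the approximation of [9]; the concentration is capped at `m_max`; and `atan2` replaces the arctangent of the ratio in order to resolve the quadrant.

```python
import math


def bessel_i(order, m, terms=80):
    """Modified Bessel function I_order(m), order 0 or 1, by power series."""
    term = (m / 2.0) ** order  # j = 0 term of the series
    total = 0.0
    for j in range(terms):
        total += term
        term *= (m / 2.0) ** 2 / ((j + 1) * (j + 1 + order))
    return total


def invert_a(ratio, m_max=50.0, steps=40):
    """Solve I1(m) / I0(m) = ratio by bisection (stand-in for [9])."""
    lo, hi = 0.0, m_max
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if bessel_i(1, mid) / bessel_i(0, mid) < ratio:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def fit_movm(weights, init_means, iters=50):
    """Weighted EM for a pi-periodic Von Mises mixture on a direction histogram.

    weights[i] is the value of bin i, whose angle is pi * i / n radians;
    the total weight is assumed to be positive.
    Returns (mixture weights, mean angles in radians, concentrations).
    """
    n, K = len(weights), len(init_means)
    thetas = [math.pi * i / n for i in range(n)]
    pis, means, ms = [1.0 / K] * K, list(init_means), [1.0] * K
    w_sum = sum(weights)
    for _ in range(iters):
        # E step: responsibility of each component for each bin.
        norms = [math.pi * bessel_i(0, m) for m in ms]
        resp = []
        for t in thetas:
            p = [pis[k] * math.exp(ms[k] * math.cos(2.0 * (t - means[k]))) / norms[k]
                 for k in range(K)]
            s = sum(p)
            resp.append([v / s for v in p])
        # M step: every bin counts as if its angle appeared weights[i] times.
        for k in range(K):
            wk = [weights[i] * resp[i][k] for i in range(n)]
            tot = sum(wk)
            if tot <= 0.0:
                continue  # empty component: keep its previous parameters
            pis[k] = tot / w_sum
            s2 = sum(wk[i] * math.sin(2.0 * thetas[i]) for i in range(n))
            c2 = sum(wk[i] * math.cos(2.0 * thetas[i]) for i in range(n))
            means[k] = (0.5 * math.atan2(s2, c2)) % math.pi
            a = sum(wk[i] * math.cos(2.0 * (thetas[i] - means[k]))
                    for i in range(n)) / tot
            ms[k] = invert_a(a)
    return pis, means, ms
```

A call such as `fit_movm(hist, [0.0, math.pi / 2])` initializes the two components on the horizontal and vertical directions; this initialization is a choice made for the sketch and is not stated in the text.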

At this point, each block is described by 6 parameters: $\pi_k$, $\theta_{0,k}$ and $m_k$ for both Von Mises components. This is a consistent and compact way to describe a whole distribution, making retrieval fast and effective.

The similarity between two Von Mises distributions can be defined using the Bhattacharyya distance. Given two Von Mises distributions $V_1$ and $V_2$, the formulation is shown in Eq. 10:

$$B(V_1, V_2) = \sqrt{1 - \frac{1}{\sqrt{I_0(m_1)\, I_0(m_2)}}\; I_0\!\left(\frac{\sqrt{m_1^2 + m_2^2 + 2 m_1 m_2 \cos\big(2(\theta_{0,1} - \theta_{0,2})\big)}}{2}\right)} \tag{10}$$

No explicit form is available for mixtures, so we propose a new metric that also takes into account the relative weights of the components of the mixture. Given two mixture distributions, we compute the Bhattacharyya distance between the two components of one distribution and the two of the other, select the best matching pair (call them $b$, and the other two $o$) and then measure the distance as:

,(12)

where

(13)

This metric accounts for the fact that two components can be very similar while their contribution to the mixtures is quite low.
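For the distance between two single Von Mises components, a closed form follows from integrating the square root of the product of the two densities. The sketch below assumes the $\pi$-periodic density (angles doubled) and takes the distance as $\sqrt{1 - BC}$, with $BC$ the Bhattacharyya coefficient; this is one common convention, and $-\ln BC$ is another. The names `bessel_i0` and `bhattacharyya_vm` are hypothetical, and the mixture metric of Eqs. 12 and 13 is not covered.

```python
import math


def bessel_i0(m, terms=80):
    """Modified Bessel function of order 0 via its power series."""
    total, term = 0.0, 1.0
    for j in range(terms):
        total += term
        term *= (m / 2.0) ** 2 / ((j + 1) ** 2)
    return total


def bhattacharyya_vm(theta1, m1, theta2, m2):
    """Distance sqrt(1 - BC) between two pi-periodic Von Mises densities.

    BC = I0(R / 2) / sqrt(I0(m1) * I0(m2)), where
    R^2 = m1^2 + m2^2 + 2 * m1 * m2 * cos(2 * (theta1 - theta2)).
    """
    r_sq = m1 * m1 + m2 * m2 + 2.0 * m1 * m2 * math.cos(2.0 * (theta1 - theta2))
    r = math.sqrt(max(r_sq, 0.0))  # guard against tiny negative rounding
    bc = bessel_i0(r / 2.0) / math.sqrt(bessel_i0(m1) * bessel_i0(m2))
    return math.sqrt(max(0.0, 1.0 - bc))


# Identical components are at distance 0; orthogonal directions are far apart.
print(bhattacharyya_vm(0.0, 5.0, 0.0, 5.0))
print(bhattacharyya_vm(0.0, 5.0, math.pi / 2, 5.0))
```

Because the angle difference enters only through the cosine of its double, two components whose means differ by $\pi$ are treated as the same direction.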

5. Experimental results

Tests were performed using 20 uncompressed 24-bit pictures of illustrated biblical manuscripts, provided in high resolution (8373x6039) at 400 dpi. For each image, a ground truth was manually annotated, focusing on the three main characteristics of these images: text, images and background. Images were divided using different fixed window sizes, then a single suitable window size was chosen. For each window (block) over the image, the direction histogram was computed. A preprocessing stage is necessary to highlight the real shape of the distribution, regardless of the numerical values assumed by the bins. A simple normalization approach has the major drawback of exaggerating the shape, and thus the noise, in low-variability data.

In order to represent the data distribution coherently with its real shape, we chose to change the baseline reference of the values, subtracting from each bin the minimum of the entire block. In this way, a noisy block like the background will show a nearly flat distribution with a very small variance. A text block, instead, with a dominant direction along the horizontal axis, will continue to show a coherent distribution, with peaks near 0° and 180°.

Moreover, we would like to keep the same scale in all blocks, so that the shape can be compared. For this reason the direction histogram bins have been quantized to a fixed step. The normalized distribution has been used to fit a mixture of Von Mises distributions.
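The two preprocessing operations can be sketched as below. This is one possible reading of the description: the function name `preprocess_histogram` and the parameter `step` are hypothetical, the value of the quantization step is not given in the text, and rounding to the nearest multiple of the step is an assumption.

```python
def preprocess_histogram(hist, step):
    """Shift the baseline to the block minimum, then quantize to a fixed step.

    Subtracting the minimum keeps the absolute scale of the variations,
    so a nearly flat (background) histogram stays nearly flat instead of
    being stretched as it would be by a min-max normalization.
    """
    base = min(hist)
    return [round((v - base) / step) * step for v in hist]


# A noisy, nearly flat block collapses to zeros; a peaked block keeps its shape.
print(preprocess_histogram([5.0, 5.1, 5.05], 1.0))
print(preprocess_histogram([10.0, 10.4, 13.1], 1.0))
```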

In our experiments, we tested mixtures with different numbers of Von Mises distributions, and we observed that even a very limited set is sufficient to produce good retrieval results in terms of precision and recall, without excessively affecting the computational time.

To perform a first evaluation of the proposed characterization, a confusion matrix was computed using a 1-nearest neighbor approach. The training set was analyzed, then a test set was classified: each block within the test set was assigned the class of the most similar block within the training set. The results are shown in Table 1. Results were quite promising: this feature has a good discriminative power for all three kinds of texture used.

             text   background   image   recall
text          431            5      54   0.879592
background      7          446     172   0.7136
image          27           63     631   0.875173

Table 1. Confusion matrix relative to a 1-nearest neighbor classification.

             recall     precision
text         0.931183   0.911579
background   0.854086   0.87976
image        0.826138   0.898477

Table 2. SVM classification using radial basis function as kernel.
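The recall values of the 1-nearest neighbor confusion matrix can be re-derived from the raw counts, as in the following sketch. It assumes that rows are ground-truth classes and columns are predicted classes, which is consistent with the reported recall column; the function name `recall_precision` is hypothetical.

```python
labels = ["text", "background", "image"]
confusion = [
    [431, 5, 54],    # ground truth: text
    [7, 446, 172],   # ground truth: background
    [27, 63, 631],   # ground truth: image
]


def recall_precision(matrix):
    """Per-class recall (diagonal / row sum) and precision (diagonal / column sum)."""
    n = len(matrix)
    recalls, precisions = [], []
    for c in range(n):
        row_total = sum(matrix[c])
        col_total = sum(matrix[r][c] for r in range(n))
        recalls.append(matrix[c][c] / row_total)
        precisions.append(matrix[c][c] / col_total)
    return recalls, precisions


recalls, precisions = recall_precision(confusion)
for name, rec, prec in zip(labels, recalls, precisions):
    print(f"{name:<10} recall={rec:.6f} precision={prec:.6f}")
```

The same function also yields the per-class precision, which is not reported for the 1-nearest neighbor test.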

In order to produce a more generic classifier, we implemented an SVM classification. The best results (in terms of recall and precision) were obtained with the radial basis function kernel. The results of this classification are shown in Table 2, and a visual representation of the classification is shown in Fig. 3. The text is the main focus of this feature, because of its typical horizontal orientation, and these experiments reveal a good discriminative power for this kind of texture. The background also has a very peculiar characteristic that makes this feature suitable for retrieval: a flat distribution is very different from the monomodal distribution of the text. This feature, instead, is not designed to be effective for image classification: in practice, pictures present a generic multimodal distribution; occasionally they show a particular symmetry, but it is not representative of the entire class. For this reason, the good classification results in this case have to be considered a side effect of the limited number of Von Mises components within the mixture: the algorithm tends to describe every image with a two-component distribution (horizontal and vertical orientations), which has proved to be sufficiently distinctive with respect to the other two classes.

6. Conclusions

In this paper a new technique for document analysis is presented, characterizing each texture with a 6-dimensional feature vector. The autocorrelation computation and the fitting of the direction histogram with the mixture of Von Mises distributions have a reasonable processing time for this application: a high resolution page with more than 1800 blocks can be processed in about 100 seconds on a standard PC. The result of the feature extraction provides a very compact description of blocks. For this reason, these features can be easily exploited by a content-based image retrieval system: the computational time needed for comparison and searching will be extremely low.

Finally, we thank Franco Cosimo Panini S.P.A., which gave us the possibility to analyze an invaluable piece of art such as The Borso D’Este Holy Bible for this work.

References

[1] Bres, S.; Eglin, W.; Gagneux, A., "Unsupervised clustering of text entities in heterogeneous grey level documents," Pattern Recognition, 2002. Proceedings. 16th International Conference on , vol.3, no., pp. 224-227 vol.3, 2002

[2] A. K. Jain and S. Bhattacharjee, "Text segmentation using Gabor filters for automatic document processing," Machine Vision and Applications, vol. 5, no. 3, pp. 169--184, 1992.

[3] Busch, A.; Boles, W.W.; Sridharan, S., "Texture for script identification," Pattern Analysis and Machine Intelligence, IEEE Transactions on , vol.27, no.11, pp. 1720-1732, Nov. 2005

[4] Mao, S., Rosenfeld, A., Kanungo, T.: Document structure analysis algorithms: a literature survey. Proc. SPIE Electronic Imaging 5010 (2003) 197–207

[5] Diligenti, M.; Frasconi, P.; Gori, M., "Hidden tree Markov models for document image classification," Pattern Analysis and Machine Intelligence, IEEE Transactions on , vol.25, no.4, pp. 519-523, April 2003

[6] F. Le Bourgeois, H. Emptoz, “DEBORA: Digital AccEss to BOoks of the RenAissance”, in International Journal on Document Analysis and Recognition, vol. 9, n. 2-4, pp. 193-221, 2007

[7] Journet, N.; Eglin, V.; Ramel, J.Y.; Mullot, R., "Dedicated texture based tools for characterisation of old books," Document Image Analysis for Libraries, 2006. DIAL '06. Second International Conference on , vol., no., pp. 10 pp.-, 27-28 April 2006

[8] A. Prati, S. Calderara, R. Cucchiara, “Using Circular Statistics for Trajectory Analysis” in Proceedings of International Conference on Computer Vision and Pattern Recognition (CVPR 2008), Anchorage, Alaska (USA), June 24-26, 2008

[9] G.W. Hill, “Evaluation and Inversion of the Ratios of Modified Bessel Functions, I1(x)/I0(x) and I1.5(x)/I0.5(x)”, ACM Transactions on Mathematical Software, vol. 7, n. 2, June 1981, pp. 199-208

Figure 3. Visualization of the proposed classification. A filtering technique has been applied to clean up the results.