
Boosting bonsai trees for handwritten/printed text discrimination

Yann Ricquebourg^a, Christian Raymond^a, Baptiste Poirriez^a, Aurelie Lemaitre^b and Bertrand Couasnon^a

^a Universite Europeenne de Bretagne, IRISA/INSA Rennes, France
^b Universite Europeenne de Bretagne, IRISA/Universite de Rennes-2, France

ABSTRACT

Boosting over decision stumps has proved efficient in Natural Language Processing, essentially with symbolic features, and its good properties (fast, few and non-critical parameters, not sensitive to over-fitting) could be of great interest in the numeric world of pixel images. In this article we investigate the use of boosting over small decision trees in image classification, for the discrimination of handwritten/printed text. We then conducted experiments to compare it to the usual SVM-based classification, revealing convincing results with very close performance, but with faster predictions and a behaviour far less like a black box. These promising results encourage the use of this classifier in more complex recognition tasks such as multiclass problems.

Keywords: Handwritten/Printed text discrimination, Boosting, Maurdor campaign

1. INTRODUCTION

During the last decades, the problem of classifying text regions as printed or handwritten has arisen in document image analysis. The presence of both printed and handwritten text in the same document is an important difficulty in the automation of optical character recognition. Indeed, both printed and handwritten texts are often present in many different kinds of documents: blank or completed forms, annotated prints, private or business correspondence, etc. In any of these cases, starting from a digitized version of the document, it is crucial to detect and distinguish printed from handwritten text so as to process each with the appropriate tools and extract as much information as possible during the recognition analysis.

The Maurdor campaign [1], which provides the application context of this study, aims exactly at this goal. Maurdor aims at evaluating systems for automatic processing of written documents. To do so, the campaign intends to assess, in quantity and quality, the ability of such systems to retrieve relevant information from digital images obtained by digitizing paper documents. Prior to retrieving information, the problem consists in characterizing the semantic areas of the documents (image, graphic, printed text, handwritten text, etc.). This campaign also favors efficient solutions, able to cope with a large variety of documents (kinds, languages, etc.).

In section 2, we present some evolutions in the solutions from the literature concerning discrimination between printed and handwritten texts. Section 3 motivates the use of boosting over small decision trees, called bonsai trees, unlike the classic use of simple decision stumps, which capture less information than a bonsai, and without falling into the over-fitting drawback of a full decision tree. The different stages of our overall system are detailed in section 4. Section 5 describes the results of the experiments conducted to evaluate this system. Finally, section 6 summarizes the conclusions drawn from this study, which shows that using bonsai trees leads to an efficient classifier system, fast and not sensitive to parameters, and proposes future work directions.

Further author information: send correspondence to Yann Ricquebourg.

2. STATE OF THE ART

Previous works on the discrimination between printed and handwritten texts have investigated various solutions.

In [2], gradient and luminance histograms are extracted from the images, then passed to a neural network to segment the document into homogeneous semantic areas: printed characters, handwritten characters, photographs and paintings. A neural network was also used in [3], based on directional and symmetric features to distinguish printed from handwritten character images. In the same way, a method to discriminate handwritten from printed texts was proposed in [4], using low-level general features classified by a feed-forward multilayer perceptron (MLP).

Other approaches proposed the use of statistical and structural features as nodes of tree classifiers for automatic separation of printed from handwritten text lines [5, 6]. Hidden Markov Models (HMM) were then also applied to the problem, as proposed in [7]. Outstanding results with Support Vector Machines (SVM) using Gabor filters and run-length histograms as features were obtained in [8], and their system was further improved by a Markov Random Field (MRF) as post-processing. SVM were also used with the Radon transform in [9], whereas in [10] simple basic features like height, aspect ratio, density and maximum run-length are chosen to feed an SVM.

In [11], a method was presented using simple generic features like the height of the main body and different ratios of ascender and descender height to body height, computed from the upper-lower profile histograms. Those features fed a discriminant analysis approach for classification into printed or handwritten text.

Spectrum domain local fluctuation detection (SDLFD) was introduced in [12] as a method to distinguish printed from handwritten texts. Local regions of the document are transformed into the frequency domain and fed to an MLP.

In [13], simple basic features (like width, height, area, density and major axis) were also extracted and then processed with various data mining algorithms. More recently, [14] also used basic and generic shape features (area, perimeter, form factor, major-minor axes, roundness, compactness) to separate handwritten and printed texts using a k-nearest neighbor algorithm (KNN).

From those studies, we drew the conclusion that generic and basic features were often retained as sufficient to achieve the goal, at word level as in recent works [13, 14] and as concluded to be most promising by [8], and that SVM classification has become a very frequently selected solution, with strong results.

3. WHY BOOSTING BONSAI TREES?

To process all future OCR stages, we would like an efficient classifier able to deal with multiclass/multilabel problems, to manage arbitrary input features and to handle a large dataset. In this work we investigate the use of boosting over decision trees. These approaches turned out to be very efficient in Natural Language Processing, with discrete features. Their interesting aspects, detailed hereafter, motivated the experiment of transferring this classifier to the image world, with continuous numeric features.

Boosting is a meta-learning algorithm: the final model is a linear combination of several classifiers built iteratively. The principle is to assign an identical weight to all training samples at the beginning and to use a weak learner (a classifier better than random guessing) to classify the data. Then all misclassified samples are “boosted”: their weights are increased and another weak classifier is run on the data with the new distribution. Samples misclassified by this classifier are boosted in turn, and the process iterates until we decide to stop it. Finally, all individual classifiers are linearly combined according to their respective performance to build the final model. More precisely, we use a multiclass/multilabel boosting algorithm called Adaboost.MH [15]. This algorithm is derived using a natural reduction of multiclass, multilabel data to binary data. Adaboost.MH maintains a set of weights over training examples and labels. As boosting progresses, training examples and their corresponding labels that are difficult to predict correctly get incrementally higher weights, while examples and labels that are easy to classify get lower weights.

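The algorithm is presented in figure 1. $S$ is a set of training samples $(x_1, Y_1), \ldots, (x_m, Y_m)$ where each instance $x_i \in \mathcal{X}$ and each label set $Y_i \subseteq \mathcal{Y}$, the set of all possible labels, with $k = |\mathcal{Y}|$. $Y_i[l]$ is $+1$ if label $l$ belongs to $Y_i$ and $-1$ otherwise. $D_t$ is the weight distribution, and $h_t(x_i, l)$ is the real-valued prediction of the weak learner for the presence of label $l$ in $x_i$.

Given: $(x_1, Y_1), \ldots, (x_m, Y_m)$ where $x_i \in \mathcal{X}$, $Y_i \subseteq \mathcal{Y}$

Init: $D_1(i, l) = \frac{1}{mk}$

For $t = 1, \ldots, T$:

• Provide distribution $D_t$ to the weak learner

• Get weak hypothesis $h_t : \mathcal{X} \times \mathcal{Y} \to \mathbb{R}$

• Choose $\alpha_t \in \mathbb{R}$

• Update: $D_{t+1}(i, l) = \frac{D_t(i, l) \exp(-\alpha_t Y_i[l] h_t(x_i, l))}{Z_t}$

where $Z_t$ is a normalisation factor that makes $D_{t+1}$ a distribution.

Output final hypothesis:

$f(x, l) = \sum_{t=1}^{T} \alpha_t h_t(x, l)$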

Figure 1. AdaBoost.MH algorithm
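As an illustration of figure 1, the loop can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the bonzaiboost implementation: the weak learner is a plain decision stump with a ±1 vote per label (not a bonsai tree), $\alpha_t$ is set to $\frac{1}{2}\ln\frac{1+r}{1-r}$ (one standard choice for ±1-valued hypotheses), and the function names `fit_stump`, `stump_predict`, `adaboost_mh` and `score` are hypothetical.

```python
import math

def fit_stump(X, Y, D):
    """Weak learner: one threshold on one feature, with a +/-1 vote per label on each side."""
    m, k = len(X), len(Y[0])
    best = None
    for j in range(len(X[0])):
        for thr in sorted({x[j] for x in X}):
            # w[side][lab] accumulates D(i, lab) * Y_i[lab] over the samples on that side
            w = [[0.0] * k, [0.0] * k]
            for i in range(m):
                side = 1 if X[i][j] > thr else 0
                for lab in range(k):
                    w[side][lab] += D[i][lab] * Y[i][lab]
            # edge r = sum over (i, l) of D(i, l) * Y_i[l] * h(x_i, l) when votes follow the sign of w
            r = sum(abs(v) for row in w for v in row)
            if best is None or r > best[0]:
                votes = [[1.0 if v >= 0 else -1.0 for v in row] for row in w]
                best = (r, j, thr, votes)
    return best

def stump_predict(stump, x):
    """Return the list of +/-1 votes (one per label) for sample x."""
    _, j, thr, votes = stump
    return votes[1 if x[j] > thr else 0]

def adaboost_mh(X, Y, T):
    m, k = len(X), len(Y[0])
    D = [[1.0 / (m * k)] * k for _ in range(m)]      # D_1(i, l) = 1 / (mk)
    model = []
    for _ in range(T):
        stump = fit_stump(X, Y, D)
        r = min(stump[0], 1.0 - 1e-10)               # keep alpha finite on a perfect split
        alpha = 0.5 * math.log((1.0 + r) / (1.0 - r))
        for i in range(m):
            h = stump_predict(stump, X[i])
            for lab in range(k):
                D[i][lab] *= math.exp(-alpha * Y[i][lab] * h[lab])
        z = sum(sum(row) for row in D)               # normalisation factor Z_t
        D = [[d / z for d in row] for row in D]
        model.append((alpha, stump))
    return model

def score(model, x, lab):
    """Final hypothesis f(x, l) = sum over t of alpha_t * h_t(x, l)."""
    return sum(alpha * stump_predict(stump, x)[lab] for alpha, stump in model)
```

On a toy one-dimensional problem where the positive samples lie at both ends of the axis, a single stump necessarily misclassifies one sample, while three boosting rounds are enough for the sign of `score` to match every training label.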

In [15], “decision stumps” are chosen as weak learners; these are simply one-level decision trees (two leaves). AdaBoost in combination with trees has been described as the “best off-the-shelf classifier in the world” [16]. In bonzaiboost [17], we propose to build deeper, but still small, decision trees (i.e. decision trees with low depth, which we will refer to as bonsai trees: they are deep enough to capture a piece of information while not being sensitive to over-fitting). This strategy turned out to be very effective in practice, since bonzaiboost is ranked first on multiclass problems on http://mlcomp.org (a web site that evaluates classification algorithms on representative classification problems).

As said above, using an SVM classifier appears to be a good choice for handwritten/printed separation since we are facing a binary classification problem with numeric input features. This classifier has been widely used and is state-of-the-art for this kind of problem. Despite its performance on most classification problems, SVM exhibits several drawbacks that we may avoid with our proposed solution. We next give a point-by-point comparison between our proposed algorithm and SVM.

3.1 Training time

Concerning SVM, when the number of features is much lower than the number of instances, a fast linear kernel would not be efficient; an RBF kernel is then a reasonable choice, but it makes the training phase costly since its complexity is quadratic.

Concerning boosting, the complexity of Adaboost.MH is linear in the number of training samples and the number of labels. We have to add the complexity of the weak learner, which for a binary decision tree is linear in the number of nodes and the number of features. Note that the boosting algorithm itself is not parallelizable, but the weak learner is, in two ways: tree nodes can be induced in parallel, and features can be evaluated in parallel. Processing numeric features in a decision tree is not efficient: at each tree node, the tree looks for the best threshold to split the current node, and to do so it examines every candidate threshold, which can become costly if the training data contain many different numeric values.

Finally, the main aspects of the two classifiers are summarised here:

                            SVM         Boosting
complexity                  quadratic   linear
numeric features process    distance    threshold
parallelisation             yes         yes
overall                     equality

3.2 Tuning phase

SVM needs to be finely tuned: with an RBF kernel, at least two parameters must be tuned to make the SVM efficient, the Gaussian parameter γ and C, the trade-off between training error and margin. To tune these parameters, a cross-validation step over a variety of pairs of values is necessary. The complete SVM training process may therefore become very costly.

On the other hand, no parameter needs to be finely tuned in our boosting; there are only two “human-readable” parameters:

1. the tree depth: going further than a one-level tree (“decision stump”) allows structure in the input data to be captured (the XOR problem cannot be solved with a one-level tree). Trees should be deep enough to capture important structural information while being kept as simple as possible, to avoid the instability of decision trees and to keep a good generalisation performance. In practice, a depth of 2, 3 or 4 is sufficient for many tasks, and the precise choice of depth is not crucial since the learning curves for different depths meet each other after a given number of iterations.

2. the number of iterations: the number of iterations is not crucial; it just needs to be large enough for the error curve to reach its plateau, which then remains stable because boosting is not prone to over-fitting in the general case [18], even if this phenomenon appears in some circumstances (e.g. in the presence of noise) [19].
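The XOR remark in the first item can be checked by brute force. The following sketch, with hypothetical helper names, enumerates every one-level tree on the four XOR points and compares the best of them with a hand-written depth-2 tree.

```python
from itertools import product

# The four XOR points: the label is 1 when exactly one coordinate is 1.
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

def stump(feature, left_label, right_label):
    """Depth-1 tree: test a single feature against the threshold 0.5."""
    return lambda x: right_label if x[feature] > 0.5 else left_label

def accuracy(classifier):
    return sum(classifier(x) == y for x, y in XOR) / len(XOR)

# Every possible stump: 2 features x 2 left labels x 2 right labels.
best_stump_accuracy = max(
    accuracy(stump(f, a, b)) for f, a, b in product((0, 1), (0, 1), (0, 1))
)

def depth2_tree(x):
    """Depth-2 tree: split on feature 0, then on feature 1 inside each branch."""
    if x[0] > 0.5:
        return 0 if x[1] > 0.5 else 1
    return 1 if x[1] > 0.5 else 0

print(best_stump_accuracy, accuracy(depth2_tree))
```

No stump does better than chance on this data, whereas the depth-2 tree classifies all four points, which is the kind of feature conjunction a bonsai tree can capture in a single weak learner.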

We can conclude that this tuning phase is clearly a major drawback of SVM when confronted with a large amount of data. Finding the parameters that make the SVM efficient is a very costly process.

3.3 Data preparation

SVM needs data preparation:

1. although particular variants turn out to be capable of feature selection when combined with a particular norm [20], with most implementations, like libSVM [21], a pre-processing step of feature selection is advisable to filter out useless features that might otherwise degrade the SVM performance

2. SVM can deal only with numeric features; all non-numeric features have to be transformed

3. to prevent some features from dominating others, features should be scaled

Whereas for boosting:

1. no feature selection is needed, since the tree base classifier does it intrinsically

2. decision trees can deal with several feature types: numeric or discrete

3. no numeric scaling is necessary
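The last point can be illustrated on a single numeric feature: a threshold test only depends on the ordering of the values, so a monotone rescaling leaves the chosen split unchanged. The sketch below uses hypothetical word heights and labels, and a simplified error-count criterion that is not necessarily the one used by bonzaiboost.

```python
def min_max_scale(values):
    """Rescale values linearly to [0, 1], as typically done before training an SVM."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def best_threshold_partition(values, labels):
    """Return the indices sent to the left branch by the threshold with the fewest errors."""
    best_errors, best_left = None, None
    for thr in sorted(set(values)):
        left = [i for i, v in enumerate(values) if v <= thr]
        right = [i for i, v in enumerate(values) if v > thr]
        ones_left = sum(labels[i] for i in left)
        ones_right = sum(labels[i] for i in right)
        # errors if left predicts 0 and right predicts 1, or the opposite assignment
        errors = min(ones_left + len(right) - ones_right,
                     len(left) - ones_left + ones_right)
        if best_errors is None or errors < best_errors:
            best_errors, best_left = errors, left
    return best_left

# Hypothetical word heights in pixels, with binary labels (0 or 1).
heights = [12.0, 15.0, 40.0, 44.0, 90.0]
labels = [0, 0, 1, 1, 1]

raw_split = best_threshold_partition(heights, labels)
scaled_split = best_threshold_partition(min_max_scale(heights), labels)
print(raw_split == scaled_split)
```

The same samples fall on the same side of the split before and after scaling, so scaling brings nothing to a tree-based weak learner.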

Then, boosting over decision trees will obviously simplify the data preparation process.

3.4 Multiclass/multilabel classification

SVM needs heuristics to perform multiclass/multilabel classification: the usual way to do multiclass and/or multilabel classification with an SVM is to decompose the problem into several binary problems using the one-vs-all or the one-vs-one paradigm, and to manage a voting scheme. Combined with the necessary tuning step, the whole process becomes extremely costly.

The boosting algorithm Adaboost.MH is intrinsically multilabel/multiclass, and the cost of moving from a binary problem to a multiclass/multilabel one is light.

Thus, boosting should clearly be of greater interest thanks to its potential for future, more complex classification situations.

3.5 Post-processing interpretation possibilities

The SVM classifier works as a black box: the model gives no feedback on the data, such as which features are relevant.

In boosting, each decision tree gives interesting feedback on the data thanks to the readable rules it produces. Many interesting statistics may thus be computed from the complete boosting model: which features have been selected? How many times? Which conjunctions of features have been used? etc.

4. OUR SYSTEM

This system has been designed and tested in the context of the Maurdor campaign [1]. Here we focus on the problem of distinguishing handwritten from printed text blocks. It is important to notice that in the process implied by the Maurdor campaign, this task is invoked with a block of homogeneous text (only handwritten or only printed).

4.1 Text-block segmentation into lines and words

The method used for text-block segmentation has been presented in [22]. This method is dedicated to the segmentation of homogeneous text blocks into lines and words. It is based on the notion of perceptive vision that is used by human vision. The goal is to combine several points of view of the same image in order to detect the text lines.

We first consider the image at low resolution (the dimensions of the original image are divided by 16). At that resolution level, the text lines appear as line segments. Thus, we apply a line segment extractor, based on Kalman filtering. The line segment extraction gives a prediction of the presence of text lines. Then, we use the extraction of connected components at a high resolution level in order to verify the presence of a text line.

Once the text lines have been detected, our goal is to segment each text line into words. To do so, we use a neighbour distance based on Voronoi tessellation. We compute the distance between each pair of neighbouring connected components. Then, using the k-means algorithm, we compute a distance threshold separating intra-word distances from inter-word distances. Thanks to this threshold, we group together the connected components that belong to the same word.
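The thresholding step can be sketched as a two-cluster k-means on one-dimensional distances. This is a simplified illustration, not the method of [22]: it assumes the connected components of a line are already ordered from left to right and that `gaps[i]` is the distance between component `i` and component `i + 1` (the Voronoi-based distance itself is not computed here). The function names and the gap values are hypothetical.

```python
def two_means_threshold(distances, max_iterations=100):
    """1-D k-means with k = 2; returns the midpoint between the two cluster centres."""
    lo, hi = min(distances), max(distances)       # initial centres
    if lo == hi:
        return lo                                 # a single distinct value: nothing to separate
    for _ in range(max_iterations):
        small = [d for d in distances if abs(d - lo) <= abs(d - hi)]
        large = [d for d in distances if abs(d - lo) > abs(d - hi)]
        new_lo, new_hi = sum(small) / len(small), sum(large) / len(large)
        if (new_lo, new_hi) == (lo, hi):
            break                                 # assignments are stable
        lo, hi = new_lo, new_hi
    return (lo + hi) / 2.0

def group_into_words(gaps, threshold):
    """Components i and i + 1 belong to the same word when gaps[i] is below the threshold."""
    words, current = [], [0]
    for i, gap in enumerate(gaps):
        if gap < threshold:
            current.append(i + 1)
        else:
            words.append(current)
            current = [i + 1]
    words.append(current)
    return words

# Hypothetical gaps (in pixels) between nine consecutive connected components.
gaps = [2.0, 3.0, 2.0, 15.0, 3.0, 2.0, 18.0, 2.0]
threshold = two_means_threshold(gaps)
words = group_into_words(gaps, threshold)
print(threshold, words)
```

With these values the small gaps cluster around 2 to 3 pixels and the large ones around 15 to 18, so the threshold falls between the two groups and the nine components are grouped into three words.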

Figure 2 presents some examples of the segmentation of text blocks into words. More details on the method can be found in [22].

4.2 Features extraction

This next step makes use of the segmentation information provided above to split the document image into the corresponding word sub-images. Then, the feature extraction process transforms each of these sub-images into a real-valued vector.

For this system, we preferred to use a standard and general feature set (that we use as a library in different application domains, such as word or number recognition for handwriting [23]) instead of designing dedicated and specific features. So, each sub-image leads to the computation of a 244-dimensional real-valued vector composed of the following set of features:

• Basic features like width, height, surface, pixel-value average, centre of inertia coordinates, moments of inertia (giving 11 components).

• 13th order Zernike moments [24] (giving 105 components):

$$Z_{pq} = \frac{p+1}{\pi} \int_{0}^{2\pi} \int_{0}^{+\infty} \overline{V_{pq}(r, \theta)} \, f(r, \theta) \, r \, dr \, d\theta \qquad (1)$$

where $p$ is the radial magnitude and $q$ is the radial direction, and $\overline{V}$ denotes the complex conjugate of a Zernike polynomial $V$, defined by $V_{pq}(r, \theta) = R_{pq}(r) e^{iq\theta}$ where $p - q$ is even with $0 \le q \le p$, and $R$ is a real-valued polynomial:

$$R_{pq}(r) = \sum_{m=0}^{\frac{p-q}{2}} (-1)^m \frac{(p-m)!}{m! \left(\frac{p-2m+q}{2}\right)! \left(\frac{p-2m-q}{2}\right)!} \, r^{p-2m}$$

• Histograms of 8-contour directions using the Freeman chain code representation (with a zoning in 16 areas (2x8), implying 16 histograms, giving 128 components).

Figure 2. Segmentation of text blocks into words, using the proposed perceptive line of words approach
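The component counts of this feature list can be checked with a short script. It relies on one counting convention that is an assumption here, although it reproduces the stated 105: each moment with $q > 0$ contributes a real and an imaginary component, while moments with $q = 0$ contribute only a real one (their imaginary part vanishes for a real-valued image). The script also evaluates the radial polynomial $R_{pq}$ directly from its definition; all function names are hypothetical.

```python
from math import factorial

def zernike_indices(max_order):
    """All (p, q) with 0 <= q <= p <= max_order and p - q even."""
    return [(p, q) for p in range(max_order + 1)
            for q in range(p + 1) if (p - q) % 2 == 0]

def zernike_component_count(max_order):
    """One real component when q == 0, a real and an imaginary one otherwise (assumed convention)."""
    return sum(1 if q == 0 else 2 for _, q in zernike_indices(max_order))

def radial_polynomial(p, q, r):
    """R_pq(r) as a finite sum over m = 0 .. (p - q) / 2."""
    total = 0.0
    for m in range((p - q) // 2 + 1):
        coeff = ((-1) ** m * factorial(p - m)
                 / (factorial(m)
                    * factorial((p - 2 * m + q) // 2)
                    * factorial((p - 2 * m - q) // 2)))
        total += coeff * r ** (p - 2 * m)
    return total

basic = 11
zernike = zernike_component_count(13)
freeman = 16 * 8                      # 16 zones x 8 directions
print(zernike, basic + zernike + freeman)
print(radial_polynomial(2, 0, 0.5))   # R_20(r) = 2 r^2 - 1
```

Under this convention the three groups add up to the 244 dimensions of the feature vector.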

4.3 Learning with bonzaiboost

Each feature vector is supplied to the bonzaiboost learning stage. We should not forget that processing numeric features is a penalizing case for decision-tree-based boosting, which has to deal with a huge number of thresholds (as pointed out in section 3.1) associated with all the floating-point values. Currently bonzaiboost uses a greedy algorithm to select a candidate among all numeric thresholds.

To address this problem, we tested naively rounding the numeric precision to $10^{-2}$ to reduce the number of candidate thresholds. This can drastically cut their number and generally speeds up training by a factor of up to 2. It is important to notice that this reduction comes at no significant cost in overall recognition results in our experiments. Nevertheless, it is not optimal, because the best reduction should be different for each feature. We are currently implementing a smart automatic method to extract candidate thresholds that would lead to an important training time reduction without losing accuracy.
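The effect of rounding on the number of candidate thresholds can be illustrated on synthetic data. The sketch below assumes, for illustration only, that the candidate thresholds of a feature are its distinct values, and draws hypothetical feature values uniformly in [0, 1]; it does not reproduce the bonzaiboost threshold selection.

```python
import random

def candidate_thresholds(values):
    """Assumed candidate set: one threshold per distinct feature value."""
    return sorted(set(values))

random.seed(0)
feature_values = [random.random() for _ in range(10000)]   # synthetic feature in [0, 1]

raw = candidate_thresholds(feature_values)
rounded = candidate_thresholds([round(v, 2) for v in feature_values])

print(len(raw), len(rounded))
```

For a feature bounded in [0, 1], rounding to two decimals leaves at most 101 distinct values whatever the number of samples, which is why a single global precision is convenient but cannot suit features with very different ranges.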

4.4 Post-processing interpretation

As presented before, after the training stage, the model built by bonzaiboost may give some interesting and interpretable information that could lead to an optimization of the feature set.

Figure 3. The weak learner computed at iteration 1 of the boosting. Each node is presented with the binary test selected. Each leaf shows the number P of samples that belong to it, and the majority label with its probability

For instance, the features are automatically selected, and one can output which ones are the most used by the system (see table 1). Moreover, each bonsai tree represents human-readable rules that can exhibit how the system makes its decision (see figure 3).
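A statistic such as the one in table 1 amounts to counting, over all trees of the model, the feature tested at each internal node. The following sketch assumes a hypothetical nested-dictionary representation of the trees (bonzaiboost's actual model format is not shown in this article); the thresholds and leaf labels are made up, and only the attribute names are taken from table 1.

```python
from collections import Counter

def count_features(node, counts):
    """Recursively count the feature tested at each internal node of one tree."""
    if "feature" not in node:          # a leaf carries only a label
        return
    counts[node["feature"]] += 1
    count_features(node["left"], counts)
    count_features(node["right"], counts)

# Two hypothetical bonsai trees of depth 2 and 1.
trees = [
    {"feature": "PixelAverage", "threshold": 0.42,
     "left": {"label": "printed"},
     "right": {"feature": "Surface", "threshold": 310.0,
               "left": {"label": "hand"}, "right": {"label": "printed"}}},
    {"feature": "PixelAverage", "threshold": 0.57,
     "left": {"label": "hand"}, "right": {"label": "printed"}},
]

usage = Counter()
for tree in trees:
    count_features(tree, usage)

for feature, frequency in usage.most_common():
    print(frequency, feature)
```

The same traversal could be extended to record pairs of features appearing on one root-to-leaf path, which corresponds to the feature conjunctions mentioned in section 3.5.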

5. EVALUATION

We evaluated our system using the images from the documents of the Maurdor campaign [1]. This corpus provides documents in English, French and Arabic, of different kinds: blank or completed forms, printed but also manually annotated business documents, private and handwritten correspondence sometimes with printed letterheads, printed but also manually annotated business correspondence, other documents such as newspaper articles or blueprints, etc.

We conducted two experiments in parallel to compare our system, based on boosting using bonzaiboost [17], with a reference system in which the classification stage has been rewritten, based on SVM using the open-source library libSVM [21].

5.1 Training

To train our two systems, we used the train0 and train1 datasets of the Maurdor campaign, resulting in nearly 500,000 ground-truth word images corresponding to 3000 pages (see table 2). The SVM classifier took about 10 hours to train (with a modified version of libSVM using parallelism, 16 threads and 20 GB of parameter cache; otherwise training lasts 133 hours), whereas the bonzaiboost classifier was stopped after 4000 iterations of training, which lasted about 56 hours.

On the one hand, it is worth mentioning that boosting can be continued as long as the user wants. The limit of 4000 iterations is an arbitrary default, since we observed in our experiments that this value is far beyond the number of iterations needed to reach good performance. bonzaiboost took 51 seconds per iteration and could be stopped (and resumed if needed) far earlier, as visible on the convergence curves of figure 4, with a linear saving of time. Moreover, as visible on this figure, continuing the training does not lead to over-fitting, since the validation set still shows improving results while the train set has already reached its maximum.

Besides, we can use the threshold reduction by rounding numeric values described earlier. Here it cuts the number of distinct thresholds from more than 49 million to about 1 million, which shortens each boosting iteration from 51 seconds to 29 seconds and would also save time.
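As a quick check of these orders of magnitude, the total training time can be recomputed from the per-iteration cost. The sketch assumes a constant time per iteration, which is a simplification; the helper name is hypothetical.

```python
def training_hours(iterations, seconds_per_iteration):
    """Total training time in hours, assuming every iteration costs the same."""
    return iterations * seconds_per_iteration / 3600.0

full_precision = training_hours(4000, 51)   # all distinct thresholds
rounded_values = training_hours(4000, 29)   # thresholds rounded to two decimals

print(round(full_precision, 1), round(rounded_values, 1))
print(round(51 / 29, 2))                    # speed-up per iteration
```

The first figure is consistent with the 56 hours or so reported above, and the per-iteration speed-up of about 1.76 is in line with the factor of up to 2 mentioned in section 4.3.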

On the other hand, as described in section 3.2, SVM needs fine tuning of its parameters to reach good performance. Without this tuning, using the default parameters, only the poor results of table 3 were obtained. We therefore performed a pre-processing step before the full training, to tune the two main parameters of the RBF kernel in SVM using a cross-validation script supplied with libSVM, which increases the cost by a factor of 450 by default.

Table 1. Frequency of each attribute selected at nodes of the 4000 trees, during the 4000 boosting iterations of training

Frequency   Attribute
1715        PixelAverage
1177        Surface
971         ZernikeMomentREAL 0 0
672         ZernikeMomentREAL 2 0
590         ZernikeMomentREAL 6 0
583         ZernikeMomentREAL 6 2
568         ZernikeMomentIMAG 4 2
562         ZernikeMomentIMAG 3 1
556         ZernikeMomentREAL 4 0
551         ZernikeMomentREAL 3 1
549         GravityCenterY
542         ZernikeMomentREAL 8 2
532         WidthHeightRatio
525         ZernikeMomentIMAG 6 2
511         ZernikeMomentIMAG 5 1
506         Orientation
501         ZernikeMomentREAL 12 4
499         ZernikeMomentREAL 8 4
494         ZernikeMomentREAL 8 0
494         ZernikeMomentREAL 10 2
493         ZernikeMomentIMAG 7 1
492         ZernikeMomentREAL 12 2
485         ZernikeMomentREAL 10 0
473         ZernikeMomentIMAG 1 1
469         ZernikeMomentIMAG 10 2
467         ZernikeMomentIMAG 7 3
465         ZernikeMomentREAL 12 8
465         ZernikeMomentREAL 12 0
464         ZernikeMomentIMAG 13 1
462         ZernikeMomentREAL 9 1
460         ZernikeMomentREAL 5 1
457         ZernikeMomentIMAG 5 3
455         ZernikeMomentREAL 12 6
452         GravityCenterX
451         ZernikeMomentREAL 7 1
450         ZernikeMomentIMAG 8 2
448         ZernikeMomentREAL 10 4
448         ZernikeMomentIMAG 11 1
446         ZernikeMomentIMAG 9 1
445         ZernikeMomentIMAG 2 2
444         ZernikeMomentIMAG 13 5
437         ZernikeMomentIMAG 12 2
431         ZernikeMomentIMAG 10 4
430         ZernikeMomentIMAG 9 3
429         ZernikeMomentIMAG 12 4
426         ZernikeMomentREAL 9 3
426         ZernikeMomentREAL 5 3
426         ZernikeMomentREAL 1 1
423         ZernikeMomentREAL 13 3
420         ZernikeMomentREAL 11 1
419         ZernikeMomentREAL 4 2
419         ZernikeMomentREAL 13 5
418         ZernikeMomentREAL 11 5
415         ZernikeMomentREAL 6 4
...         ...
58          FreemanHistoDir 1 0 5
57          FreemanHistoDir 6 1 7
49          FreemanHistoDir 2 1 7

Table 2. Datasets of the Maurdor campaign

Name     Documents   Word images
train0   1000        157,761
train1   2000        338,453
dev1     1000        165,309
test1    1000        181,239

[Figure 4: two plots for bonzaiboost, error-rate and F-measure versus number of iterations (0 to 4000), each with a validation set curve and a train set curve.]

Figure 4. Error-rate and F-measure curves for bonzaiboost. They illustrate the insensitivity to over-fitting: whilst the train set reached its maximum result after 2000 iterations, further training iterations still improve the validation results

Table 3. Classification results on the validation set

SVM (without tuning)
Label     Predicted   Truth     Correct   Error    Precision (%)   Recall (%)   F-measure (%)   Error-rate (%)
hand      21,141      30,415    16,822    13,593   79.57           55.31        65.26           44.69
printed   144,168     134,894   130,575   13,593   90.57           96.80        93.58           10.08
All       165,309     165,309   147,397   17,912   89.16           89.16        89.16           10.84

SVM (with tuning)
Label     Predicted   Truth     Correct   Error    Precision (%)   Recall (%)   F-measure (%)   Error-rate (%)
hand      28,726      30,415    26,185    4,230    91.15           86.09        88.55           13.91
printed   136,583     134,894   132,353   4,230    96.90           98.12        97.51           3.14
All       165,309     165,309   158,538   6,771    95.90           95.90        95.90           4.10

bonzaiboost
Label     Predicted   Truth     Correct   Error    Precision (%)   Recall (%)   F-measure (%)   Error-rate (%)
hand      29,128      30,415    25,868    4,547    88.81           85.05        86.89           14.95
printed   136,181     134,894   131,634   4,547    96.66           97.58        97.12           3.37
All       165,309     165,309   157,502   7,807    95.28           95.28        95.28           4.72
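The precision, recall and F-measure columns of table 3 follow from the Predicted, Truth and Correct counts with the usual definitions. The sketch below recomputes them for the "hand" row of bonzaiboost; the function name is hypothetical.

```python
def precision_recall_f(predicted, truth, correct):
    """Return precision, recall and F-measure as percentages."""
    precision = correct / predicted      # share of predictions of this label that are right
    recall = correct / truth             # share of true samples of this label that are found
    f_measure = 2 * precision * recall / (precision + recall)
    return 100 * precision, 100 * recall, 100 * f_measure

# "hand" row of the bonzaiboost sub-table.
p, r, f = precision_recall_f(predicted=29128, truth=30415, correct=25868)
print(round(p, 2), round(r, 2), round(f, 2))
```

The rounded values match the 88.81, 85.05 and 86.89 reported for that row.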

Tuning on all the data is actually too costly, so the tuning phase was conducted with only 10% of the full training dataset. Despite this reduction, the tuning stage lasted more than 12 days with the parallel version of libSVM using 16 threads.

5.2 Prediction

The validation set used in our experiments was the dev1 dataset, containing 165,309 word images (see table 2). The SVM-based system reached an error-rate of 4.10%, to be compared with an error-rate of 4.72% for bonzaiboost (see table 3). While their respective performance results are quite close, it is also important to add that decoding is very fast for bonzaiboost (which needs only 1 min 30 sec to predict the whole dataset, taken as a whole), whereas the SVM system needed 64 minutes for prediction (also with the whole dataset as input). We can note that for future multiclass and/or multilabel problems the bonzaiboost prediction time will remain stable, whereas the SVM prediction time will increase with the number of classes, since we will need to query as many SVMs as classes.

Table 4. Results for the test set corresponding to round 1 of Maurdor campaign

System          Precision (%)   Silence (%)   Error-rate (%)
SVM             95.41           3.55          7.98
bonzaiboost     94.93           4.03          8.90
bonzaiboost-2   94.42           0.37          5.93

5.3 Final decision

In the context of the application for the Maurdor campaign, our system was tested on the test1 dataset, where we had to output a prediction at zone level (in other words, for a group of images, each one supposedly corresponding to a single word of a homogeneous text). To this end, we added a majority vote stage that takes the individual outputs of the above predictions and votes for the group. Naturally, this led to a significant improvement of the recognition, because the effect of a misrecognized entity can be masked by the majority vote inside the associated zone (see table 4, where both the SVM and bonzaiboost systems reach around 95% precision among the non-silent outputs, thus implying an error-rate smaller than 10% considering all the inputs).
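The zone-level decision and the relation between the columns of table 4 can be sketched as follows. Two assumptions are made here: a zone is left silent when the vote is tied (the article only states that silence occurs where no majority could be found), and the error-rate over all inputs is taken as 100 minus the share of inputs that are both non-silent and correct, a relation that is not stated in the article but reproduces the three rows of table 4. Function names are hypothetical.

```python
from collections import Counter

def majority_vote(word_labels):
    """Return the most frequent word label of a zone, or None (silence) when there is no majority."""
    counts = Counter(word_labels).most_common()
    if not counts:
        return None
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None                      # tie: no majority, the zone is left silent
    return counts[0][0]

def error_rate(precision, silence):
    """Error-rate (%) over all zones, from precision among non-silent outputs and silence rate (%)."""
    return 100.0 - precision * (100.0 - silence) / 100.0

print(majority_vote(["hand", "printed", "hand"]))
print(majority_vote(["hand", "printed"]))
print(round(error_rate(95.41, 3.55), 2))   # SVM row of table 4
```

Under this relation a lower silence rate directly lowers the error-rate as long as precision stays roughly constant, which is what the bonzaiboost-2 row shows.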

Concerning this final decision stage, it is also worth mentioning that:

• These results are significantly better than the official result of the first competition round of the Maurdor campaign (that will be published), where our system finished first with a precision of 90.4% (but with a silence rate of 6.6%) compared to the second competitor system with a precision of 89.9% (with no silence at all). This is because we had the time to correct some bugs and improve the learning stage with more of the available data.

• We investigated improved voting strategies, in particular using the surface of the word and the confidence score of our bonzaiboost classifier. But since no score value or distance is supplied as output by the available libSVM library, we were not able to use those improvements with the SVM for comparison. Nevertheless, they are given for information in table 4 as a third system, bonzaiboost-2, to show that the silence rate, where no majority could be found, can be overcome (and reduced nearly to zero) without significant loss of performance, in spite of sometimes very problematic images (see figure 5).

6. CONCLUSION

In this article we presented the idea of using Adaboost.MH in an efficient way on small decision trees, rather than the usual simple decision stumps, for our image classification task. In this application context, this classifier turned out to be very promising: its performance came close to that of SVM, which is generally used. Moreover, it offers very attractive advantages: automatic selection of features implying no prior choice, no scaling between features, and only two general parameters (number of iterations and depth of the decision trees) that are not critical and do not need to be finely tuned, far from the costly optimization of critical SVM parameters. Last but not least, prediction is very fast (40 times faster than SVM in our experiments).

Concerning our application system of printed/handwritten text discrimination, short-term work will focus on improving the vote for a text region: instead of a majority vote, possibly with weights, we wish to add a training stage with bonzaiboost to learn the best way to vote according to different situations. Concerning longer-term work on document processing, the good properties of this classifier should make it easy to build multiclass systems, able to predict multilabel properties, to recognize printed text, handwritten text and also graphics, logos, signatures... As presented in the introduction, such classifiers, able to identify the semantic nature of document regions, would improve the performance of OCR systems during the segmentation and structure analysis process.

Figure 5. Difficult samples from the Maurdor campaign datasets (mixed, twisted, hand-like printed, background...)

ACKNOWLEDGMENTS

This work has been conducted in collaboration with Cassidian (a division of EADS, producing security and defense systems) to study, develop and implement a prototype for automatic recognition of documents intended to be integrated into a management chain of military intelligence, for the French Ministry of Defence (DGA).

REFERENCES

[1] DGA, Cassidian, and LNE, “Maurdor campaign dataset,” (2013). http://www.maurdor-campaign.org.

[2] Imade, S., Tatsuta, S., and Wada, T., “Segmentation and classification for mixed text/image documents using neural network,” in [Document Analysis and Recognition, 1993., Proceedings of the Second International Conference on], 930–934 (1993).

[3] Kuhnke, K., Simoncini, L., and Kovacs-V, Z. M., “A system for machine-written and hand-written character distinction,” in [Proceedings of the Third International Conference on Document Analysis and Recognition (Volume 2) - Volume 2], ICDAR ’95, 811–, IEEE Computer Society, Washington, DC, USA (1995).

[4] Violante, S., Smith, R., and Reiss, M., “A computationally efficient technique for discriminating between hand-written and printed text,” in [Document Image Processing and Multimedia Environments, IEEE Colloquium on], 17/1–17/7 (1995).

[5] Pal, U. and Chaudhuri, B., “Machine-printed and hand-written text lines identification,” Pattern Recognition Letters 22(3-4), 431–441 (2001).

[6] Mazzei, A., Kaplan, F., and Dillenbourg, P., “Extraction and classification of handwritten annotations,” in [UbiComp ’10], (2010).

[7] Guo, J. and Ma, M., “Separating handwritten material from machine printed text using hidden markov models,” in [Document Analysis and Recognition, 2001. Proceedings. Sixth International Conference on], 439–443 (2001).

[8] Zheng, Y., Li, H., and Doermann, D., “Machine printed text and handwriting identification in noisy document images,” Pattern Analysis and Machine Intelligence, IEEE Transactions on 26(3), 337–353 (2004).


[9] Zemouri, E. and Chibani, Y., “Machine printed handwritten text discrimination using radon transform and svm classifier,” in [Intelligent Systems Design and Applications (ISDA), 2011 11th International Conference on], 1306–1310 (2011).

[10] Shirdhonkar, M. and Kokare, M. B., “Discrimination between printed and handwritten text in documents,” in [IJCA Special Issue on “Recent Trends in Image Processing and Pattern Recognition”. RTIPPR ’10], (2010).

[11] Kavallieratou, E. and Stamatatos, S., “Discrimination of machine-printed from handwritten text using simple structural characteristics,” in [Pattern Recognition, 2004. ICPR 2004. Proceedings of the 17th International Conference on], 1, 437–440 Vol.1 (2004).

[12] Koyama, J., Kato, M., and Hirose, A., “Distinction between handwritten and machine-printed characters with no need to locate character or text line position,” in [Neural Networks, 2008. IJCNN 2008. (IEEE World Congress on Computational Intelligence). IEEE International Joint Conference on], 4044–4051 (2008).

[13] da Silva, L., Conci, A., and Sanchez, A., “Word-level segmentation in printed and handwritten documents,” in [Systems, Signals and Image Processing (IWSSIP), 2011 18th International Conference on], 1–4 (2011).

[14] Patil, U. P. and Begum, M., “Word level handwritten and printed text separation based on shape features,” International Journal of Emerging Technology and Advanced Engineering 2(4), 590–594 (2012).

[15] Schapire, R. E. and Singer, Y., “BoosTexter: A boosting-based system for text categorization,” Machine Learning 39, 135–168 (2000). http://www.cs.princeton.edu/~schapire/boostexter.html.

[16] Breiman, L., “Arcing classifiers,” Annals of Statistics 26, 801–823 (1998).

[17] Raymond, C., “Bonzaiboost,” (2013). http://bonzaiboost.gforge.inria.fr.

[18] Freund, Y. and Schapire, R. E., “A short introduction to boosting,” in [Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence], 1401–1406, Morgan Kaufmann (1999).

[19] Vezhnevets, A. and Barinova, O., “Avoiding boosting overfitting by removing confusing samples,” in [Proceedings of the 18th European conference on Machine Learning], ECML ’07, 430–441, Springer-Verlag, Berlin, Heidelberg (2007).

[20] Bradley, P. and Mangasarian, O. L., “Feature selection via concave minimization and support vector machines,” in [Machine Learning Proceedings of the Fifteenth International Conference (ICML 98)], 82–90, Morgan Kaufmann (1998).

[21] Chang, C.-C. and Lin, C.-J., “LIBSVM: A library for support vector machines,” ACM Transactions on Intelligent Systems and Technology 2, 27:1–27:27 (2011). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.

[22] Lemaitre, A., Camillerapp, J., and Couasnon, B., “A perceptive method for handwritten text segmentation,” in [Document Recognition and Retrieval XVIII], (2011).

[23] Ricquebourg, Y., Couasnon, B., and Guichard, L., “Evaluation of lexicon size variations on a verification and rejection system based on svm, for accurate and robust recognition of handwritten words,” Proc. SPIE, Document Recognition and Retrieval XX 8658, 86580A–86580A–11 (2013).

[24] Teague, M. R., “Image analysis via the general theory of moments,” Journal of the Optical Society of America (1917-1983) 70, 920–930 (August 1980).
