International Journal of Modern Trends in Engineering and Research (IJMTER), Volume 02, Issue 08, August 2015. ISSN (Online): 2349–9745; ISSN (Print): 2393–8161.

Novel Document Image Binarization Technique

for Degraded Document Images

Sruthy P.1, A. Thamizharasi2

1 M.Tech Student (CSE), Department of Computer Science, MCET, Anad
2 Assistant Professor (CSE), Department of Computer Science, MCET, Anad

Abstract— Historical documents have great cultural and scientific importance because many inventions are derived from them. The originals of these documents are therefore carefully preserved and are not available for public viewing; only photocopies are provided. There are mainly two types of degradation in historical document images. First, the original document ages, leading to ink bleed-through, stains, damage and dirt. Second, degradation is introduced during conversion of the documents to their digital image format. In this paper a novel document image binarization technique is proposed that addresses these issues by using an enhanced adaptive image contrast. The adaptive image contrast is a combination of the local image contrast and the local image gradient that is tolerant to the text and background variation caused by different types of document degradation. The enhanced adaptive image contrast is a standard-deviation-based adaptive image contrast. In the proposed technique, degraded document images are first pre-processed using filters to eliminate background noise. An enhanced adaptive contrast map is then constructed for the input degraded document image. The enhanced adaptive contrast map is binarized and combined with a Canny edge map of the input image to identify the text stroke edge pixels. The document text is further segmented by a local threshold that is estimated from the intensities of the detected text stroke edge pixels within a local window. The final binarized image is human-recognizable and can be fed into an OCR system. The details available in historical documents are thus enhanced through this technique.

Keywords— Adaptive image contrast, degraded document image binarization

I. INTRODUCTION

Ancient document collections available in libraries throughout the world are of great importance. The transformation of such documents into digital form is essential for preserving the quality of the originals while providing the public with full access to the information they contain. It is quite common for such documents to suffer from degradation problems like ink bleed-through, stains, large background variations and uneven illumination. Document image binarization is performed in the pre-processing stage of document analysis and aims to segment the foreground text from the document background. Binarization is frequently used as a pre-processor before OCR because most OCR packages on the market work only on bi-level (black and white) images. The simplest approach to image binarization is to choose a threshold value and classify all pixels with values above this threshold as white and all other pixels as black. The problem then is how to select the correct threshold for degraded document images. A binarized image without any enhancement cannot be fed directly to an OCR system, because most documents suffer from certain types of degradation such as ink bleed-through, ink stains, oil stains and varying illumination. Document image enhancement is a technique that improves the quality of a degraded document image and enhances human perception.

Binarization converts a gray-scale image into a binary image as shown in Fig. 1(a) and (b). When the processed documents are of very poor quality due to spreading of ink or background noise, the noisy region becomes black, so extracting the foreground from the background becomes a complex task. A good binarization results in better recognition.

Fig 1(a) Degraded image in gray scale

Fig 1(b) Binarized image of Fig 1(a)

II. LITERATURE REVIEW

This section presents a literature survey on document image binarization. Document image binarization is usually performed in the pre-processing stage of image processing. It converts an image of up to 256 gray levels to a black-and-white image using a threshold value. Binarization techniques can be divided into two categories, namely global thresholding methods and local thresholding methods. Global binarization techniques assign a single threshold to the whole document image, whereas local binarization techniques find a threshold for each pixel in the document image. Finding a single threshold suited to a degraded image is very difficult, and in many cases even impossible. Many thresholding techniques for document image binarization have been reported.

Local contrast [1] is a major technique used to binarize an input degraded image. It is evaluated from the local maximum and minimum within a neighbourhood window of size 3 × 3: the local maximum is the largest value and the local minimum the smallest value within the window. Local contrast [7] is used to overcome the contrast variation problems of the image gradient, which is obtained as the absolute image difference within a local neighbourhood window. Background estimation combined with stroke edge information [2] is another concept used to binarize degraded documents. It is based on the observations that text documents usually have a background of uniform color and texture, and that the document text has a different intensity level from the surrounding background. The document background surface is estimated through an iterative polynomial smoothing procedure; the text stroke edges are then detected by combining the local image variation and the estimated background surface. Bolan Su's binarization technique [3] makes use of an adaptive image contrast that combines the local image contrast and the local image gradient adaptively.

The objective of this work is to propose document image enhancement techniques for old and historical document images using an enhanced adaptive contrast image. A recent literature survey [4] shows that binarization based on image contrast produces the best binarization results. The enhanced adaptive contrast map extends the adaptive contrast map method in order to further improve that technique.

III. PROPOSED METHOD

This section describes the proposed method in detail. The objective of this work is to enhance degraded document images using an enhanced adaptive contrast map. The local image contrast of a degraded document image is evaluated using local maximum and minimum filters, but this introduces an over-normalization problem. To reduce it, the local image contrast and the local image gradient are combined to derive an adaptive contrast image. The intensity variation among pixels in the adaptive contrast image, however, reduces the thresholding accuracy for the degraded input image. To reduce this thresholding problem, the standard deviation of the adaptive contrast image is used to further enhance it.

Filters are used to remove noise from historical document images and improve their quality before they are fed into the enhanced adaptive contrast map construction module. The enhanced adaptive contrast map is an extension of the adaptive contrast map. From Fig. 3 it is clear that the pixels in the enhanced adaptive contrast map have almost similar intensity, so the binarized enhanced adaptive contrast map does not suffer from thresholding problems. The combined result of the binarized enhanced adaptive contrast map and a Canny edge map [5] of the input image detects text stroke edge pixels effectively. Local thresholding of the degraded input image is then performed based on stroke edge width information obtained from the text stroke edge pixels. Post-processing is applied to further enhance the local thresholding result.

A. Preprocessing

Pre-processing is an essential step for degraded document images: it eliminates noisy areas, smooths the background texture and enhances the contrast between background and text areas. An average filter followed by a Wiener filter is used here for pre-processing. The average filter is a simple sliding-window spatial filter that replaces the center value in the window with the average of all the pixel values in the local window, reducing the intensity variation between pixels and thereby smoothing the input image. The Wiener filter is an adaptive low-pass filter that reduces photographic noise effectively; it filters a gray-scale image that has been degraded by constant-power additive noise.
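Since the implementation is in MATLAB (Section IV), a minimal sketch of this pre-processing chain might look as follows; the file name and the Wiener window size are our assumptions, and the Image Processing Toolbox is required.

% Pre-processing sketch: average filter followed by a Wiener filter.
I = im2double(imread('degraded_doc.png'));    % hypothetical input file
if size(I, 3) == 3, I = rgb2gray(I); end      % work on the gray-scale image
Iavg = imfilter(I, fspecial('average', [3 3]), 'replicate');  % 3x3 average filter
Ipre = wiener2(Iavg, [3 3]);                  % adaptive Wiener filter (assumed 3x3)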

Fig 2 Input image after applying preprocessing

B. Enhanced Contrast Image Construction

The image gradient is widely used for edge detection, but it works well only on images with a uniform background; on degraded images it also responds to non-stroke edges in the background, so it needs to be normalized to compensate for the image variation. The image contrast, on the other hand, has the limitation that it cannot handle degraded document images with a complex background. The adaptive local image contrast therefore combines both measures, as shown in Eq. (1):

C_a(i,j) = \alpha \, C(i,j) + (1 - \alpha)\,\big(I_{max}(i,j) - I_{min}(i,j)\big)    (1)

C(i,j) = \frac{I_{max}(i,j) - I_{min}(i,j)}{I_{max}(i,j) + I_{min}(i,j) + \epsilon}    (2)


where C(i,j) is the local contrast defined in Eq. (2) and (I_{max}(i,j) - I_{min}(i,j)) is the local image gradient, normalized to [0, 1]. Empirically, the window size is set to 3. The weight \alpha between local contrast and local gradient is calculated as

\alpha = \left(\frac{std}{128}\right)^{\gamma}    (3)

where std denotes the standard deviation of the document image intensity and \gamma is a pre-defined power. When \gamma is large, \alpha becomes small and the local image gradient plays the major role in Eq. (1); when \gamma is small, the local image contrast plays the major role. Since there is intensity variation within the text stroke itself, the adaptive contrast map alone does not produce a proper binarized result; this is its major limitation. The contrast map is therefore enhanced using the following equation:

C_E(i,j) = C_a(i,j) \cdot 0.01^{\,C_s(i,j)}    (4)

where C_s(i,j) is the local standard deviation of the adaptive contrast map within a neighborhood window of size 3 × 3 around pixel (i,j), and C_a denotes the adaptive contrast map of the pre-processed input image.

The first step in building the enhanced contrast map is to construct the adaptive contrast map and compute its standard deviation locally. Next, a matrix of the same dimensions as the adaptive contrast map is created with every element initialized to the small constant 0.01, and each element is raised to the power of the corresponding local standard deviation. Finally, the adaptive contrast map is multiplied element-wise by this result, which reduces the variation within the text stroke edge pixels; the resulting enhanced adaptive contrast map is shown in Fig 3.
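Continuing the sketch, Eqs. (1)-(4) can be realized as follows; the variable names are ours, and \gamma = 1 is assumed (the best-performing value found in Section V).

% Enhanced adaptive contrast map construction, Eqs. (1)-(4).
Imax = imdilate(Ipre, ones(3));                % local maximum in a 3x3 window
Imin = imerode(Ipre, ones(3));                 % local minimum in a 3x3 window
epsv = 1e-10;                                  % small constant epsilon in Eq. (2)
C    = (Imax - Imin) ./ (Imax + Imin + epsv);  % local contrast, Eq. (2)
G    = mat2gray(Imax - Imin);                  % local gradient normalized to [0,1]
gamma = 1;                                     % assumed; best value per Section V
alpha = (std2(255*Ipre)/128)^gamma;            % Eq. (3), intensity on 0-255 scale
Ca   = alpha*C + (1 - alpha)*G;                % adaptive contrast map, Eq. (1)
Cs   = stdfilt(Ca, ones(3));                   % local standard deviation of Ca
Ce   = Ca .* 0.01.^Cs;                         % enhanced contrast map, Eq. (4)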


Fig 3 Enhanced adaptive contrast map

C. Edge Pixel Detection

The next step is to detect text stroke edge pixels from the enhanced adaptive contrast map. A global thresholding technique is used to segment the text stroke from the enhanced contrast map. In Bolan Su's paper [3] the text stroke edge pixels are detected using Otsu's global thresholding [6], which calculates a single optimal threshold for the whole image. Otsu's method produces good binarization results for clean documents, i.e. when the pixels in each class are close to each other, but produces poor results for complex degraded document images. Here, the enhanced contrast map undergoes global thresholding and a Canny edge map is constructed from the input degraded image. The global thresholding is based on the mean and standard deviation of the image: since the major portion of the enhanced contrast map consists of text pixels, its mean and standard deviation depend mainly on the text pixels. Global thresholding using the mean and standard deviation of the enhanced contrast map can thus overcome the problems that arise with Otsu's global threshold. The result after global thresholding is shown in Fig 4.


Fig 4 Binarized enhanced adaptive contrast map

Fig 5 Canny edge map of degraded input image

Fig 6 Combined high contrast map

Since the presence of noise in the enhanced contrast map is negligible, the mean and standard deviation depend only on the text strokes. The global thresholding is done as follows:

B(x,y) = \begin{cases} 1 & \text{if } Map(x,y) \le E_{mean} + E_{std} \\ 0 & \text{otherwise} \end{cases}    (5)

where E_{mean} and E_{std} are the mean and standard deviation of the entire enhanced contrast map. This yields the initial binary map B, which can be further improved by combining it with the Canny edge map of the input image, shown in Fig 5. The combined map keeps only the pixels that appear in both the initial binary map and the Canny edge map, which eliminates uncertain pixels from the initial result; the resulting high-contrast text strokes are shown in Fig 6.
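A minimal sketch of this step, reusing Ce and Ipre from above; the threshold direction follows Eq. (5) as stated.

% Edge pixel detection: global threshold of Ce combined with a Canny map.
Emean = mean2(Ce);  Estd = std2(Ce);   % statistics of the enhanced contrast map
B   = Ce <= (Emean + Estd);            % initial binary map B, Eq. (5)
Edg = edge(Ipre, 'canny');             % Canny edge map of the input image
HC  = B & Edg;                         % keep pixels present in both maps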

D. Local Threshold Estimation

Once the high-contrast stroke edge pixels are detected, the text can be extracted from the document background pixels. Two characteristics can be observed across different kinds of document images: first, the text pixels are close to the detected text stroke edge pixels; second, there is a distinct intensity difference between the high-contrast stroke edge pixels and the surrounding background pixels. Local thresholding is used here to threshold the input image using the following equation:

R(x,y) = \begin{cases} 1 & \text{if } I(x,y) \le E_{mean} + E_{std}/5 \\ 0 & \text{otherwise} \end{cases}    (6)

where E_{mean} and E_{std} are the mean and standard deviation of the intensities of the detected text stroke edge pixels within a neighborhood window W. The window size W is closely related to the stroke width EW and should be larger than it; W is set to around 5EW, since this works well on almost all DIBCO datasets, while a much larger local neighborhood window would increase the computational load significantly. EW itself is estimated from the detected stroke edges as shown in Algorithm 1 [3]. Since a precise stroke width is not needed, Algorithm 1 simply takes the most frequently occurring horizontal distance between two adjacent edge pixels (which denote the two sides of a stroke) as the estimated stroke width.

Algorithm 1 Edge Width Estimation [3]

Require: the input document image I and the corresponding binary text stroke edge image Edg
Ensure: the estimated text stroke edge width EW

Step 1: Get the width and height of I.
Step 2: For each row i = 1 to height in Edg:
a) scan from left to right and select the edge pixel candidates whose label is 0 (background) and whose next pixel is labeled 1 (edge);
b) examine the intensities in I of the pixels selected in (a) and remove those with a lower intensity than the pixel that follows them in the same row of I;
c) match the remaining adjacent pixels in the same row into pairs and calculate the distance between the two pixels of each pair.
Step 3: Construct a histogram of the calculated distances.
Step 4: Use the most frequently occurring distance as the estimated stroke edge width EW.
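Algorithm 1 translates almost line for line into MATLAB; the function name edgeWidth is ours, not from the paper.

function EW = edgeWidth(I, Edg)
% Estimate the text stroke edge width EW from a binary edge image (Algorithm 1).
dists = [];
for i = 1:size(Edg, 1)
    row = Edg(i, :);
    % Step 2(a): background pixel immediately followed by an edge pixel
    cand = find(row(1:end-1) == 0 & row(2:end) == 1);
    % Step 2(b): drop candidates darker than the pixel to their right in I
    cand = cand(I(i, cand) >= I(i, cand + 1));
    % Step 2(c): pair the remaining candidates, record horizontal distances
    for k = 1:2:numel(cand) - 1
        dists(end + 1) = cand(k + 1) - cand(k); %#ok<AGROW>
    end
end
% Steps 3-4: the most frequent distance is the estimated stroke width
EW = mode(dists);
end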

First the edge image is scanned horizontally row by row and the edge pixel candidates are selected as described in Step 2(a). Since true edge pixels should have higher intensities than the few pixels that follow them, improperly detected edge pixels are removed in Step 2(b). After the histogram is constructed, the most frequently occurring distance is taken as the edge width. The result obtained after local threshold estimation, termed the initial binarization result, is shown in Fig 7.

Fig 7 Result obtained after local thresholding
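A straightforward, unoptimized sketch of Eq. (6), using the detected edge pixels from the combined map HC; the handling of windows at the image border is our assumption.

% Local thresholding, Eq. (6), with window statistics over edge pixels.
EW = edgeWidth(Ipre, HC);              % stroke width from Algorithm 1
W  = 5 * EW;                           % local window size, around 5*EW
half = floor(W / 2);
[h, w] = size(Ipre);
R = false(h, w);
for y = 1:h
    for x = 1:w
        ys = max(1, y-half):min(h, y+half);
        xs = max(1, x-half):min(w, x+half);
        win  = Ipre(ys, xs);
        mask = HC(ys, xs);             % detected stroke edge pixels in window
        if any(mask(:))
            Em = mean(win(mask));      % E_mean over the edge pixels
            Es = std(win(mask));       % E_std over the edge pixels
            R(y, x) = Ipre(y, x) <= Em + Es/5;   % Eq. (6)
        end
    end
end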


E. Post Processing

Once the initial binarized result is obtained from the local thresholding module, it can be further enhanced by post-processing. First, isolated foreground pixels that do not connect with other foreground pixels are filtered out. Second, the neighboring pixels of each stroke edge pixel are grouped into two pairs: pixels (i-1, j) and (i+1, j) as one pair, and pixels (i, j-1) and (i, j+1) as the other. Pixels lying on symmetric sides of a text stroke edge pixel should belong to different classes (i.e. either the document background or the foreground text), so if both pixels of a pair belong to the same class, one of them is relabeled to the other category. The post-processing improves the initial binarization result; Fig 8 shows the outcome.

Fig 8 Result obtained after post processing

Algorithm 2 Post-Processing Procedure

Require: the input document image I, the initial binary result B and the corresponding binary text stroke edge image Edg
Ensure: the final binary result Bf

Step 1: Find all the connected components of the stroke edge pixels in Edg.
Step 2: Remove those pixels that do not connect with other pixels.
Step 3: For each remaining edge pixel (i, j):
get the neighborhood pairs (i-1, j), (i+1, j) and (i, j-1), (i, j+1);
if the pixels of a pair belong to the same class, assign the pixel with the lower intensity to the foreground class and the other to the background class.
Step 4: Remove single-pixel artifacts along the text stroke boundaries after the document thresholding.
Step 5: Store the new binary result in Bf.
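A partial sketch of Algorithm 2; bwareaopen stands in for the isolated-pixel removal of Steps 1 and 2, and only the vertical pixel pair of Step 3 is shown.

% Post-processing sketch (Algorithm 2, simplified).
Bf = bwareaopen(R, 2);                 % drop isolated single foreground pixels
[ei, ej] = find(HC);                   % detected stroke edge pixels
for k = 1:numel(ei)
    i = ei(k); j = ej(k);
    if i > 1 && i < size(Bf, 1) && Bf(i-1, j) == Bf(i+1, j)
        % vertical pair in the same class: relabel by intensity,
        % lower intensity goes to the foreground (text) class
        if Ipre(i-1, j) < Ipre(i+1, j)
            Bf(i-1, j) = true;  Bf(i+1, j) = false;
        else
            Bf(i-1, j) = false; Bf(i+1, j) = true;
        end
    end
end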

IV. IMPLEMENTATION

This section describes the implementation details. The work is implemented using MATLAB R2013a. The four datasets are composed of series of document images that suffer from several common document degradations such as smear, smudge, bleed-through and low contrast. The DIBCO 2009 dataset [8] contains ten testing images, consisting of five degraded handwritten documents and five degraded printed documents. The H-DIBCO 2010 dataset [9] consists of ten degraded handwritten documents. The DIBCO 2011 dataset [10] contains eight degraded handwritten and eight degraded printed documents; in total, these three datasets provide 36 degraded document images with ground truth. The DIBCO 2013 dataset [11] contains a further eight handwritten and eight printed images.

The performance of the proposed work is measured by the PSNR and F-measure between the final binarized result and its corresponding ground truth. PSNR measures how close the resulting image is to the ground truth image; a higher PSNR value means a better binarized result. In binary classification, the F-score (F-measure) is a measure of a test's accuracy; it can be interpreted as a weighted average of the precision and recall, reaching its best value at 1 and its worst at 0.

The proposed technique is also applied to another dataset of Malayalam, Tamil and English document images (created by the authors) that suffer from the usual degradations. The proposed algorithm binarizes images in all these languages, but the accuracy of the binarized result is not the same across languages. The binarization performance on this self-created dataset cannot be calculated, because no corresponding ground truth is available.
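For reference, both measures can be computed in a few lines, assuming logical images Bf (result) and GT (ground truth, an assumed variable) with 1 denoting foreground text.

% Evaluation metrics: PSNR and F-measure against the ground truth.
err  = mean((double(Bf(:)) - double(GT(:))).^2);   % mean squared error
PSNR = 10 * log10(1 / err);                        % peak signal is 1 for binary images
TP = nnz(Bf & GT);                                 % true positives
precision = TP / nnz(Bf);
recall    = TP / nnz(GT);
Fmeasure  = 2 * precision * recall / (precision + recall);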

Fig 9 Malayalam degraded input image suffering from low contrast

Fig 10 Final binarized result generated from proposed system

Fig 11 Tamil degraded input image


Fig 12 Final binarized result generated from proposed system

V. RESULTS AND DISCUSSION

The proposed technique is tested and compared on four DIBCO datasets: DIBCO 2009, H-DIBCO 2010, DIBCO 2011 and DIBCO 2013. The performance of the proposed enhanced binarization is evaluated using the F-measure and the Peak Signal-to-Noise Ratio (PSNR).

A. Parameter Selection

As described in the proposed system, the weight α changes with the value of γ. This value is important because α determines which term plays the major role in the adaptive contrast map equation: the local image gradient dominates when γ is large and the local image contrast dominates when γ is small. In the first experiment, different values of γ are applied and the resulting F-measures are compared on the DIBCO 2009 and H-DIBCO 2010 datasets. γ is increased exponentially and monotonically from 2^-10 to 2^10, as shown in Fig 13. The best F-measure is obtained when γ equals 2^0 = 1.

Fig 13 F-measure performance on the DIBCO 2009 and H-DIBCO 2010 datasets using different γ power functions

Fig 14 PSNR on the DIBCO 2009, H-DIBCO 2010, DIBCO 2011 and DIBCO 2013 datasets using different edge widths


The local window size W is tested in the second experiment on the DIBCO 2009, H-DIBCO 2010 and DIBCO 2011 datasets. The window size W is related to the stroke width EW. Fig 14 shows how the thresholding performance varies with the edge width EW. The performance of the proposed method becomes stable once the local window size exceeds 5EW, consistently across the three datasets. W can therefore be set to around 5EW, since a larger local neighborhood window would increase the computational load significantly.

B. Testing on DIBCO Datasets

The four DIBCO datasets described in Section IV suffer from smear, smudge, bleed-through and low contrast: DIBCO 2009 [8] (five degraded handwritten and five degraded printed images), H-DIBCO 2010 [9] (ten degraded handwritten images), DIBCO 2011 [10] (eight degraded handwritten and eight degraded printed images) and DIBCO 2013 [11] (sixteen images), all with ground truth.

Table 1 PSNR and F-measure on the DIBCO 2009 dataset using the adaptive contrast map method

Input image    PSNR       F-measure
P01            15.6492    0.984
P02            16.3863    0.98
P03            19         0.99
P04            16.4182    0.987

Table 2 PSNR and F-measure on the DIBCO 2009 dataset using the proposed method

Input image    PSNR       F-measure
P01            16.56      0.9875
P02            17.625     0.989
P03            20.15      0.99
P04            17.024     0.988

Tables 2 and 6 show that the results obtained from the proposed system are better than those of the adaptive contrast map method shown in Tables 1 and 5. Table 4 shows that some document images in the H-DIBCO 2010 dataset yield lower performance values than the existing method.

Table 3 PSNR and F-measure on the H-DIBCO 2010 dataset using the adaptive contrast map method

Input image    PSNR     F-measure
H01            14.28    0.970
H02            18.00    0.990
H03            19.85    0.994
H04            16.41    0.988

The DIBCO 2009 results shown are for degraded printed document images; using the proposed technique, the text can be recovered from the background with little noise. The H-DIBCO 2010 dataset includes only handwritten document images. Since edge width variation is higher in these handwritten documents, there is a chance of estimating a wrong edge width, so for some input images the F-measure and PSNR are poorer than with the existing method.

Table 4 PSNR and F-measure on the H-DIBCO 2010 dataset using the proposed method

Input image    PSNR     F-measure
H01            14.32    0.9775
H02            16.45    0.9880
H03            17.10    0.9894
H04            17.20    0.9896

Table 5 PSNR and F-measure on the DIBCO 2011 dataset using the adaptive contrast map method

Input image    PSNR     F-measure
HW1            11.61    0.964
HW2            20.10    0.995
HW6            12.57    0.966

Table 6 PSNR and F-measure on the DIBCO 2011 dataset using the proposed method

Input image    PSNR     F-measure
HW1            13.75    0.9756
HW2            20.84    0.9956
HW6            13.64    0.9764

VI. CONCLUSION

Document image enhancement is a technique that improves the quality of a document image to enhance human perception and facilitate subsequent automated image processing. It is widely used in the pre-processing stage of different document analysis tasks and aims to segment the foreground text from the document background. For degraded documents, pre-processing is an essential step that can reduce background noise effectively; in this work, filters are used in the pre-processing step for this purpose. The adaptive contrast map is a technique for identifying the text stroke edges of degraded document images, but it does not work well on complex backgrounds. The proposed system therefore constructs an enhanced adaptive contrast map, which handles complex degraded backgrounds effectively. Thresholding the contrast map with Otsu's global threshold does not produce correct binarization in the existing system; global thresholding with the mean and standard deviation can overcome this problem. Experiments show that the proposed method outperforms binarization using the adaptive contrast map method in terms of PSNR and F-measure. The proposed system works well on English documents; when applied to other languages it gives less accurate results. Future work is to propose a general stroke width estimation algorithm for handling document images in all languages.

REFERENCES

[1] B. Su, S. Lu, and C. L. Tan, "Binarization of historical handwritten document images using local maximum and minimum filter", IEEE Conf. on Document Analysis Systems, Jun. 2010.
[2] S. Lu, B. Su, and C. L. Tan, "Document image binarization using background estimation and stroke edges", IEEE Conf. on Document Analysis and Recognition, Dec. 2010.
[3] B. Su, S. Lu, and C. L. Tan, "A robust document image binarization technique for degraded document images", IEEE Transactions on Image Processing, 2012.
[4] S. S. Lokhande and N. A. Dawande, "A survey on document image binarization techniques", IEEE Transactions on Image Processing, 2015.
[5] J. Canny, "A computational approach to edge detection", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. PAMI-8, no. 6, pp. 679–698, Nov. 1986.
[6] N. Otsu, "A threshold selection method from gray-level histograms", IEEE Transactions on Systems, Man, and Cybernetics, vol. 9, no. 1, pp. 62–66, Jan. 1979.
[7] J. Bernsen, "Dynamic thresholding of gray-level images", Proc. Int. Conf. on Pattern Recognition, Oct. 1986, pp. 1251–1255.
[8] B. Gatos, K. Ntirogiannis, and I. Pratikakis, "ICDAR 2009 document image binarization contest (DIBCO 2009)", Int. Conf. on Document Analysis and Recognition, pp. 1375–1382, Jul. 2009.
[9] I. Pratikakis, B. Gatos, and K. Ntirogiannis, "H-DIBCO 2010 – handwritten document image binarization competition", Int. Conf. on Frontiers in Handwriting Recognition, pp. 727–732, Nov. 2010.
[10] I. Pratikakis, B. Gatos, and K. Ntirogiannis, "ICDAR 2011 document image binarization contest (DIBCO 2011)", Int. Conf. on Document Analysis and Recognition, Sep. 2011.
[11] I. Pratikakis, B. Gatos, and K. Ntirogiannis, "ICDAR 2013 document image binarization contest (DIBCO 2013)", 12th Int. Conf. on Document Analysis and Recognition (ICDAR 2013), pp. 1471–1476, Washington, DC, USA, 2013.
