Devanagari and Bangla Text Extraction from Natural Scene Images

U. Bhattacharya, S. K. Parui and S. Mondal
Computer Vision and Pattern Recognition Unit, Indian Statistical Institute, Kolkata – 108, India
{ujjwal, swapan, srikanta_t}@isical.ac.in

Abstract

With the increasing popularity of digital cameras attached to various handheld devices, many new computational challenges have gained significance. One such problem is the extraction of text from natural scene images captured by such devices. The extracted text can be sent to an OCR or a text-to-speech engine for recognition. In this article, we propose a novel and effective scheme, based on the analysis of connected components, for extraction of Devanagari and Bangla texts from camera captured scene images. A common unique feature of these two scripts is the presence of the headline, and the proposed scheme uses mathematical morphology operations for its extraction. Additionally, we consider a few criteria for robust filtering of text components from such scene images. Moreover, we studied the problem of binarization of such scene images and observed that there are situations when repeated binarization by a well-known global thresholding approach is effective. We tested our algorithm on a repository of 100 scene images containing texts of Devanagari and/or Bangla.

1. Introduction

Digital cameras have now become very popular and are often attached to various handheld devices like mobile phones, PDAs etc. Manufacturers of these devices are nowadays looking to embed various useful technologies into such devices. Prospective technologies include recognition of text in scene images, text-to-speech conversion etc. Extraction and recognition of text in images of natural scenes are useful to the blind and to foreigners facing a language barrier. Furthermore, the ability to automatically detect text in scene images has potential applications in image retrieval, robotics and intelligent transport systems. However, developing a robust scheme for extraction and recognition of text from camera captured scenes is a great challenge due to several factors, which include variations of style, color, spacing, distribution and alignment of text, background complexity, influence of luminance, and so on.

A survey of existing methods for detection, localization and extraction of text embedded in images of natural scenes can be found in [1]. The two broad categories of available methods are connected component (CC) based and texture based algorithms. The first category of methods segments an image into a set of CCs, and then classifies each CC as either text or non-text. CC-based algorithms are relatively simple, but they often fail to be robust. On the other hand, texture-based methods assume that text in images has textural properties different from the background and other non-text regions. Although the algorithms of the latter category are more robust, they usually have higher computational complexities. Additionally, a few authors studied various combinations of the above two categories of methods.

Among early works, Zhong et al. [2] located text in images of compact discs, book covers, and traffic scenes in two steps. In the first step, approximate locations of text lines are obtained, and then text components in those lines are extracted using color segmentation. Wu et al. [3] proposed a texture segmentation method to generate candidate text regions. A set of feature components is computed for each pixel and these are clustered using the K-means algorithm.
Jung et al. [4] employed a multi-layer perceptron classifier to discriminate between text and non-text pixels. A sliding window scans the whole image and serves as the input to the neural network. A probability map is constructed, where high probability areas are regarded as candidate text regions.

In [5], Li et al. computed features from the wavelet decomposition of the grayscale image and used a neural network classifier for labeling small windows as text or non-text. Gllavata et al. [6] considered wavelet transform based texture analysis for text detection. They used the K-means algorithm to cluster text and non-text regions. Saoi et al. [7] used a similar but improved method for detection of text in natural scene images. In this attempt, the wavelet transform is applied to each of the R, G and B channels of the input color image separately.

Ezaki, Bulacu and Schomaker [8] studied morphological operations for detection of connected text components in images. They used a disk filter, obtaining the difference between the closing image and the opening image. The filtered images are binarized to extract connected components.

In a recent work, Liu et al. [9] used a Gaussian mixture distribution to model the occurrence of three neighbouring characters and proposed a scheme under the Bayes framework for discriminating text and non-text components. Pan et al. [10] used a sparse representation based method for the same purpose. Ye et al. [11] proposed a coarse-to-fine strategy using multiscale wavelet features to locate text lines in color images. The text segmentation method described in [12] uses a combination of a CC-based stage and a region filtering stage based on a texture measure.

Devanagari and Bangla are the two most popular Indian scripts, used by more than 500 and 200 million people respectively in the Indian subcontinent. A unique and common characteristic of these two scripts is the existence of certain headlines, as shown in Fig. 1. The focus of the present work is to exploit this fact for extraction of Devanagari and Bangla texts from images of natural scenes. The only assumption we make is that the characters are sufficiently large and/or thick so that a linear structuring element of a certain fixed length can capture their headlines. To the best of our knowledge, no existing work deals with this problem.

Figure 1. (a) A piece of text in Devanagari, (b) a piece of text in Bangla.

The present study is based on a set of 100 outdoor images of signboards, banners, hoardings and nameplates collected using two different cameras. Connected components (both black and white) are extracted from the binary image. Then, we use the morphological opening operation along with a set of criteria to extract headlines of Devanagari or Bangla texts. Next, we use several geometrical properties of the characters of these two scripts to locate the whole text parts in relation to the detected headlines.

The rest of this article is organized as follows. Section 2 describes the preprocessing operations. The proposed method is described in Section 3. Experimental results are provided in Section 4. Section 5 concludes the paper.

2. Preprocessing

The size of an input image varies depending upon the resolution of the digital camera. Usually, this resolution is 1 MP or more. Initially, we downsample the input image by an integral factor so that its size is reduced to the nearest of 0.25 MP. Next, it is converted to an 8-bit grayscale image using the formula Gray = 0.299*R + 0.587*G + 0.114*B. In fact, there is no absolute reference for the weight values of R, G and B. However, the above set of weights was standardized by the NTSC (National Television System Committee) and its usage is common in computer imaging.

A global binarization method like the well-known Otsu's technique is usually not suitable for camera captured images since the gray-value histogram of such an image is not bi-modal. Binarization of such an image using a single threshold value often leads to loss of textual information against the background. The texts in the images of Figs. 2(a) and 2(b) are lost during binarization by Otsu's method.

Figure 2. (a) and (b) Two scene images, (c) and (d) after binarization of (a) and (b) by Otsu's method.
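As a rough sketch of this preprocessing pipeline (downsampling, grayscale conversion and a first Otsu binarization), assuming OpenCV and NumPy are available; the function name and the way the integral downsampling factor is computed are our own choices, not taken from the paper.

```python
import cv2
import numpy as np

def preprocess(path, target_mp=0.25e6):
    """Downsample to roughly 0.25 MP, convert to 8-bit gray, binarize with Otsu."""
    bgr = cv2.imread(path)                       # color image, BGR channel order
    h, w = bgr.shape[:2]
    # Integral downsampling factor bringing the image closest to 0.25 MP.
    factor = max(1, int(round(np.sqrt(h * w / target_mp))))
    bgr = bgr[::factor, ::factor]
    # 8-bit grayscale with the NTSC weights 0.299 R + 0.587 G + 0.114 B
    # (these are the weights cv2.cvtColor uses for BGR -> GRAY).
    gray = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY)
    # First-pass global Otsu threshold; often insufficient on its own for
    # scene images, hence the repeated binarization discussed below.
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return gray, binary
```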

On the other hand, local binarization methods are generally window-based, and the choice of window size in such methods severely affects the result, producing broken characters if the characters are thicker than the window size. We implemented an adaptive thresholding technique which uses the simple average gray value in a 27×27 window around a pixel as the threshold for that pixel. In Fig. 3, we show the binarization results of the images of Fig. 2 by this adaptive method. However, the example in Fig. 3(b) has text components connected with the background, and similar situations occurred frequently with the scene images used during our experiments. Moreover, the later stages of the proposed method cannot recover from this error.

Figure 3. (a) & (b) After binarization of the images in Figs. 2(a) & 2(b) by the adaptive method.

On the other hand, we observed that applying Otsu's method a second time, separately on both the sets of foreground and background pixels of the binarized image, often recovers lost text efficiently. This second application of Otsu's method converts several pixels from foreground to background and also vice versa. The final results of applying Otsu's method twice on the input images of Fig. 2 are shown in Fig. 4.

Figure 4. Results of binarization by applying Otsu's method two times; (a) the binarized image of the sample in Fig. 2(a), (b) the binarized image of the sample in Fig. 2(b).
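The repeated binarization can be sketched as follows, under our reading of the description above: Otsu's threshold is recomputed separately over the foreground and the background pixel populations of the first binarization, and each population is re-thresholded, which may flip pixels in either direction. The function names and the use of OpenCV are our choices, not the authors'.

```python
import cv2
import numpy as np

def otsu_threshold(values):
    """Otsu threshold of a 1-D array of 8-bit gray values."""
    values = values.reshape(-1, 1).astype(np.uint8)
    t, _ = cv2.threshold(values, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return t

def double_otsu(gray):
    """First-pass Otsu, then Otsu again on foreground and background pixels separately."""
    _, binary1 = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    fg = gray[binary1 == 0]        # darker population after the first pass
    bg = gray[binary1 == 255]      # brighter population
    binary2 = binary1.copy()
    if fg.size:                    # re-threshold the dark population
        binary2[(binary1 == 0) & (gray > otsu_threshold(fg))] = 255
    if bg.size:                    # re-threshold the bright population
        binary2[(binary1 == 255) & (gray <= otsu_threshold(bg))] = 0
    return binary1, binary2
```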
3. Proposed approach for text extraction

Extraction of Devanagari and/or Bangla texts from binarized images is primarily based on the unique property of these two scripts that they have headlines, as in Fig. 1. The basic steps of our approach, summarized below, are executed separately on the resulting images of the first and the second binarization.

3.1. Algorithm

Step 1: Obtain connected components (C) from the binary image (B) corresponding to the gray image (A). These include both white and black components.
Step 2: Compute all horizontal or nearly horizontal line segments by applying the morphological opening operation (Section 3.2) on each C. See Fig. 5(a).
Step 3: Obtain connected sets of the above line segments. If multiple connected sets are obtained from the same C, then we consider only the largest one and call it the candidate headline component HC.
Step 4: Let E denote a component C that produces a candidate headline component HC. Replace E by subtracting HC from it. Thus, E may now get disconnected, consisting of several connected components.
Step 5: For each E, compute H1 and H2, which are respectively the heights of the parts of E that lie above and below HC.
Step 6: Obtain the height (h) of each connected component F of E that lies below HC. Compute p = the standard deviation of h divided by the mean of h, for each E.
Step 7: If both H1/H2 and p are less than two suitably selected threshold values, call the corresponding HC the true headline component, HT. Here, it should be noted that the characters of Devanagari and Bangla always have a part below the headline, and a possible part above the headline is always smaller than the part below it.
Step 8: Select all the components C corresponding to each true headline component HT.
Step 9: Revisit all the connected components which have not been selected above. For each such component we examine whether any other component in its immediate neighborhood has already been selected. If so, we compare the gray values of the two concerned components in image A, and if these values are very close, then we include the former component in the set of already selected components.

As an example, we consider the binarized image of Fig. 4(a). All the line segments produced by the morphological operations on each component are shown in Fig. 5(a). Points on horizontal line segments obtained from white components are represented in gray, while those from black components are represented in black. Candidate headlines obtained at the end of Step 3 are shown in Fig. 5(b). The result of subtracting candidate headline components from their respective parent components is shown in Fig. 5(c). True headline components obtained at the end of Step 7 are shown in Fig. 5(d). Text components selected by Step 8 are shown in Fig. 5(e). Finally, a few other possible text components are selected by the last step, and the final set of selected components is shown in Fig. 5(f).

In this particular example, all the text components have been selected. However, only one non-text component (at the bottom of the image) has also been selected.

Figure 5. Results of different stages of the algorithm based on the image of Fig. 2(a); (a) all line segments obtained by the morphological operation, (b) set of candidate headlines, (c) all the components minus the respective candidate headlines, (d) true headlines, (e) components selected corresponding to true headlines, (f) final set of selected components.
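The following is a condensed, interpretive sketch of Steps 2-7 for a single connected component, assuming OpenCV and NumPy; the threshold values T_RATIO and T_CV are illustrative placeholders, since the paper does not state the values it uses, and the height computations are our approximation of the description above.

```python
import cv2
import numpy as np

T_RATIO = 0.5   # illustrative threshold on H1/H2 (not given in the paper)
T_CV    = 0.8   # illustrative threshold on p = std(h)/mean(h) (not given in the paper)

def true_headline(comp, se_len=21):
    """Return (is_text, headline_mask) for one connected-component mask (uint8, 0/1)."""
    # Step 2: horizontal opening keeps only (nearly) horizontal line segments.
    se = cv2.getStructuringElement(cv2.MORPH_RECT, (se_len, 1))
    segments = cv2.morphologyEx(comp, cv2.MORPH_OPEN, se)
    # Step 3: keep the largest connected set of segments -> candidate headline HC.
    n, labels, stats, _ = cv2.connectedComponentsWithStats(segments, connectivity=8)
    if n <= 1:
        return False, None
    largest = 1 + np.argmax(stats[1:, cv2.CC_STAT_AREA])
    hc = (labels == largest).astype(np.uint8)
    # Step 4: subtract HC from the component.
    residue = cv2.subtract(comp, hc)
    if not residue.any():
        return False, None
    # Step 5: heights of the residue above and below the headline rows.
    ys_hc = np.where(hc.any(axis=1))[0]
    top, bottom = ys_hc.min(), ys_hc.max()
    ys_res = np.where(residue.any(axis=1))[0]
    h1 = max(0, top - ys_res.min())          # part above the headline
    h2 = max(0, ys_res.max() - bottom)       # part below the headline
    if h2 == 0:
        return False, None
    # Step 6: coefficient of variation of the heights of the parts below HC.
    below = residue.copy()
    below[:bottom + 1, :] = 0
    _, _, stats_b, _ = cv2.connectedComponentsWithStats(below, connectivity=8)
    heights = stats_b[1:, cv2.CC_STAT_HEIGHT]
    p = heights.std() / heights.mean() if heights.size else np.inf
    # Step 7: HC is a true headline only if both criteria are small enough.
    return (h1 / h2 < T_RATIO) and (p < T_CV), hc
```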

Figure 6. (a) An object (A), (b) a structuring element (B), (c) the eroded object (C = A-B), (d) the object after opening (D = (A-B)+B).

3.2. Morphological operation

We apply mathematical morphology tools, namely erosion followed by dilation, on each connected component to extract possible horizontal line segments. For illustration, consider the object A and the structuring element B shown in Figs. 6(a) and 6(b) respectively.

The erosion of object A by the structuring element B, denoted by A-B, is defined as the set of all pixels P in A such that if B is placed on A with its center at P, B is entirely contained in A. For the object A and structuring element B above, the eroded object A-B is shown in Fig. 6(c).

The dilation operation is in some sense the dual of erosion. For each pixel P in the object A, consider the placement B(P) of the structuring element B with its center at P. Then the dilation of object A by the structuring element B, denoted by A+B, is defined as the union of such placements B(P) for all P in A. The opening of A by the element B is (A-B)+B, and it is shown in Fig. 6(d).

It is evident that opening an object A with a linear structuring element B can effectively identify the horizontal line segments present in a connected component. However, a suitable choice of the length of this structuring element is crucial for the later stages of processing, and we empirically selected its length as 21 for the present problem.
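To make the opening operation concrete, here is a small toy illustration (assuming OpenCV) with a horizontal linear structuring element of length 21, as used in this work: erosion followed by dilation removes horizontal runs shorter than the element and restores the longer ones unchanged.

```python
import cv2
import numpy as np

# Toy binary image: a 30-pixel horizontal run and a 10-pixel run.
img = np.zeros((9, 60), np.uint8)
img[3, 5:35] = 1    # long run: survives the opening
img[6, 40:50] = 1   # short run: removed by the opening

se = cv2.getStructuringElement(cv2.MORPH_RECT, (21, 1))  # horizontal line, length 21
eroded = cv2.erode(img, se)       # A - B in the paper's notation
opened = cv2.dilate(eroded, se)   # (A - B) + B, i.e. the opening of A by B

print(opened[3, 5:35].sum(), opened[6, 40:50].sum())  # prints: 30 0
```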


Figure 7. A few images on which our algorithm performed perfectly, with the respective outputs.

4. Experimental results

We obtained simulation results based on 100 test images acquired by (i) a Kodak DX7590 (5.0 MP) still camera and (ii) a SONY DCR-SR85E handycam used in still mode (1.0 MP). The resolutions of images captured by these two cameras are respectively 2576×1932 and 644×483 pixels. After downsampling, their sizes are reduced to 644×483 and 576×432 pixels respectively. The images are of highways, institutions, railway stations, festival grounds etc. They are focused on names of buildings, shops, railway stations and financial institutions, or on advertisement hoardings. They contain Devanagari and Bangla texts of various font styles, sizes, and directions. A few of the images on which the algorithm perfectly extracts all the Bangla and Devanagari text components are shown in Fig. 7. There are 58 such images, all of whose relevant text components could be extracted. On the other hand, two of the sample images on which the performance of our algorithm is extremely poor are shown in Fig. 8. Similarly poor performance occurred with 6 of our sample images. On the remaining 36 images, the algorithm either partially extracted the relevant text components or extracted text along with a few non-text components. In summary, the precision and recall values of our algorithm obtained on the present set of 100 images are respectively 68.8% and 71.2%.

Figure 8. Two sample images on which the performance of our algorithm is very poor.

5. Conclusions

The proposed algorithm works well even on slanted or curved text components of Devanagari and Bangla. One such situation is shown in Fig. 9. However, the proposed algorithm will fail whenever the size of such curved or slanted text is not sufficiently large.

Figure 9. Two images consisting of curved or slanted texts. Extracted components are shown to the right of each source image.

In future, we shall study the use of machine learning tools to improve the performance of the proposed algorithm.

References

[1] J. Liang, D. Doermann, H. Li, "Camera based analysis of text and documents: a survey", Int. Journal on Document Analysis and Recognition (IJDAR), vol. 7, pp. 84-104, 2005.
[2] Y. Zhong, K. Karu, A. K. Jain, "Locating text in complex color images", 3rd International Conference on Document Analysis and Recognition, vol. 1, 1995, pp. 146-149.
[3] V. Wu, R. Manmatha, E. M. Riseman, "TextFinder: an automatic system to detect and recognize text in images", IEEE Transactions on PAMI, vol. 21, pp. 1224-1228, 1999.
[4] K. Jung, K. I. Kim, T. Kurata, M. Kourogi, J. H. Han, "Text Scanner with Text Detection Technology on Image Sequences", Proceedings of 16th International Conference on Pattern Recognition (ICPR), vol. 3, 2002, pp. 473-476.
[5] H. Li, D. Doermann, O. Kia, "Automatic text detection and tracking in digital video", IEEE Trans. Image Processing, vol. 9, no. 1, pp. 147-167, 2000.
[6] J. Gllavata, R. Ewerth, B. Freisleben, "Text Detection in Images Based on Unsupervised Classification of High Frequency Wavelet Coefficients", Proc. of 17th Int. Conf. on Pattern Recognition (ICPR), vol. 1, 2004, pp. 425-428.
[7] T. Saoi, H. Goto, H. Kobayashi, "Text Detection in Color Scene Images Based on Unsupervised Clustering of Multichannel Wavelet Features", Proc. of 8th Int. Conf. on Document Analysis and Recognition (ICDAR), pp. 690-694, 2005.
[8] N. Ezaki, M. Bulacu, L. Schomaker, "Text detection from natural scene images: towards a system for visually impaired persons", Proc. of 17th Int. Conf. on Pattern Recognition, vol. II, pp. 683-686, 2004.
[9] X. Liu, H. Fu, Y. Jia, "Gaussian mixture modeling and learning of neighboring characters for multilingual text extraction in images", Pattern Recognition, vol. 41, pp. 484-493, 2008.
[10] W. Pan, T. D. Bui, C. Y. Suen, "Text Detection from Scene Images Using Sparse Representation", Proc. of the 19th International Conference on Pattern Recognition, 2008.
[11] Q. Ye, Q. Huang, W. Gao, D. Zhao, "Fast and robust text detection in images and video frames", Image and Vision Computing, vol. 23, pp. 565-576, 2005.
[12] C. Merino, M. Mirmehdi, "A framework towards real-time detection and tracking of text", 2nd Int. Workshop on Camera-Based Document Analysis and Recognition, pp. 10-17, 2007.