Hierarchical Attributed Relational Graph Based ‘Devnagari’ Character Recognition

P Mukherji (a) and P P Rege (b)

(a) Electronics and Telecommunication Department, SKNCOE, S.No. 44/1, Vadgaon, Pune 411044, India. Contact: prachimukherji@rediffmail.com

(b) Electronics and Telecommunication Department, College of Engineering, Pune, Shivajinagar, Pune, India.

International Journal of Information Processing, 2(3), 49-62, 2008. ISSN: 0973-8215. I. K. International Publishing House Pvt. Ltd., New Delhi, India.

‘Devnagari’ script is a major script of India, widely used for various languages. In this paper, a new hybrid approach is proposed for the recognition of handwritten ‘Devnagari’ characters, which integrates the advantages of structural and statistical approaches. In the proposed approach, the characters are first segmented into two parts: direct segments and complement segments, forming the sub-graph and the graph nodes respectively in the Hierarchical Attributed Relational Graph (HARG) based ‘Devnagari’ character representation. Segment adjacency in the sub-graph is found by complement segment neighborhood connectivity to direct segments. Each segment is given a primitive, based on the Average Compressed Direction Coding (ACDC) algorithm, forming the labels of the sub-graph nodes. Additional statistical features of the segments are used as attributes of the nodes and their spatial relationships as edges. The node label of the graph is the sub-graph type, and the connected nodes determine the edges, with the ACDC primitive as labels. The relative angle feature is used as an attribute of the graph edge. Inexact graph matching is proposed using dissimilarity based matching, by comparing feature vectors extracted from the graphs. The nearest neighbor rule is used as a classifier. The database is collected from trained, semi-trained and untrained writers. The recognition accuracy obtained varies from 95% to 99% depending on the database.

1. Introduction

Over 500 million people all over the world use the ‘Devnagari’ script. It provides the written form of over forty languages [1], including Hindi, Konkani, Nepali and Marathi. The character set consists of 35 consonants and 13 vowels. While most work has been published for printed ‘Devnagari’ text, very little is reported for handwritten ‘Devnagari’ script. One of the first attempts for hand printed characters was by Sethi [2]. A set of very simple primitives is used, and all the ‘Devnagari’ characters are looked upon as a concatenation of these primitives. Most of the decisions are taken on the basis of the presence/absence or positional relationship of these primitives, and the decision process is a multistage process, where each stage of decision making narrows down the choice regarding the class membership of the input token. In [3], a survey is given of different structural techniques used for feature extraction in OCR of different scripts, together with the status of other Indian scripts. Recently in [4], zoning based statistical parameters are used by dividing the character into zones of size 3 × 3, and an accuracy of 81% is achieved on characters without top modifiers. In [5], a feature vector of length 400, based on the Roberts filter and normalization, is extracted from isolated characters and an accuracy of 94% is achieved. In [6], an accuracy of 85% is achieved with top modifier segmentation using stroke analysis.

Attributed and labeled graphs are used in pattern recognition and various other domains of computer science. They have been used to represent Chinese characters [7], hand-drawn symbols [8], word recognition [9] and others. When graphs are used for representing structured objects, the problem of measuring object similarity turns into the problem of computing similarity between graphs, which is generally referred to as graph matching [10]. The graph matching methods can




prototype in the training set in a brute force manner, the nearest neighbour of a character can be selected by simply looking at the row minimum, i.e., the element in a row closest to zero in the distance metric. The zero diagonal is the case given in Equation 7, where the comparison is made between model characters. As hand-drawn characters are considered, a perfect zero on the diagonal is not possible; however, a diagonal entry that is the smallest value in its row is evidence of the system's ability to recall and find the closest match.

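$$
D = [d_{ij}] =
\begin{bmatrix}
d_{11} & d_{12} & d_{13} & \cdots & d_{1j} \\
d_{21} & d_{22} & d_{23} & \cdots & d_{2j} \\
d_{31} & d_{32} & d_{33} & \cdots & d_{3j} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
d_{i1} & d_{i2} & d_{i3} & \cdots & d_{ij}
\end{bmatrix}
\tag{6}
$$

$$
D = [d_{ij}] =
\begin{bmatrix}
0 & d_{12} & d_{13} & \cdots & d_{1j} \\
d_{21} & 0 & d_{23} & \cdots & d_{2j} \\
d_{31} & d_{32} & 0 & \cdots & d_{3j} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
d_{i1} & d_{i2} & d_{i3} & \cdots & 0
\end{bmatrix}
\tag{7}
$$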
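The row-minimum rule applied to the distance matrix of Equations 6 and 7 can be illustrated with a minimal sketch. The feature vectors below are hypothetical three-dimensional values, not the HARG features of the paper, and the Euclidean distance is an assumption made only for illustration; the function names `distance_matrix` and `nearest_neighbour` are likewise invented for this example.

```python
import numpy as np

def distance_matrix(tests, models):
    """Return D with D[i, j] = distance between test vector i and model vector j.

    Euclidean distance is an assumption made for this sketch.
    """
    tests = np.asarray(tests, dtype=float)
    models = np.asarray(models, dtype=float)
    diff = tests[:, None, :] - models[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

def nearest_neighbour(D):
    """Row-minimum rule: index of the closest model for every row of D."""
    return np.argmin(D, axis=1)

# Hypothetical feature vectors, one per model character.
models = np.array([[1.0, 0.0, 2.0],
                   [0.0, 3.0, 1.0],
                   [4.0, 1.0, 0.0]])

# Models compared with themselves: the diagonal is exactly zero (Equation 7).
D_models = distance_matrix(models, models)

# Perturbed copies stand in for handwritten samples: the diagonal is no
# longer zero, but it should still hold the smallest entry of each row.
tests = models + np.array([[0.2, -0.1, 0.1],
                           [-0.3, 0.2, 0.0],
                           [0.1, 0.1, -0.2]])
D_tests = distance_matrix(tests, models)
labels = nearest_neighbour(D_tests)
print(labels)
```

With one row per unknown character and one column per prototype, classification reduces to a single `argmin` per row, so the cost grows with the number of prototypes rather than with the size of the graphs being compared.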

7. Results and Discussions

Character recognition experiments are performed using three databases collected by us. Database I is from 270 trained writers, each writing the 45 ‘Devnagari’ characters 3 times on a sheet provided with boxes. A reference sheet with standard characters is provided to avoid too many variations in the characters. Database II was collected from 20 writers, each writing 45 characters 3 times. The writers were from different age groups and social backgrounds. They were given a sheet with boxes and no reference character sheet. Database III is totally unconstrained: 100 writers were given the instruction orally to write the characters 3-6 times. This data was collected from school children, college students, middle-aged working professionals and the older generation.
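The collection protocol implies nominal database sizes, which the following sketch works out. These are upper bounds that assume every writer completed every character; for Database III the sketch further assumes the same 45 characters, which the text does not state explicitly. The helper name `nominal_samples` is hypothetical.

```python
CHARACTERS = 45  # characters per writer, as stated for Databases I and II

def nominal_samples(writers, repeats, characters=CHARACTERS):
    """Nominal sample count if every writer wrote every character `repeats` times."""
    return writers * characters * repeats

db1 = nominal_samples(270, 3)   # Database I: trained writers
db2 = nominal_samples(20, 3)    # Database II: semi-trained writers
# Database III: 3 to 6 repetitions, so only a range can be given
# (assumes the same 45 characters, which is not stated in the text).
db3_low = nominal_samples(100, 3)
db3_high = nominal_samples(100, 6)

print(db1, db2, (db3_low, db3_high))
```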

With relative attributes associated with the nodes and edges of the graph, the recognition rate ranged from 95% to 99%. For the sake of conciseness, only the distances of the prototypes shown in Figure 16 are given in Table 3. The results show that each diagonal entry of the matrix, though not zero, is the minimum value in its row, thereby giving an accurate match with the model. Table 4 gives the recognition results on the three different databases.

Figure 16. Prototype Characters
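The observation that the diagonal holds the row minimum can be checked directly on the distances reported in Table 3. The sketch below only re-reads the published values; the margin between the best and second-best match is an extra quantity computed here for illustration and is not reported in the paper.

```python
import numpy as np

names = ["Ksha", "Chha", "Kha", "Tha", "Dha"]

# Distances d_i1 ... d_i5 copied from Table 3 (rows: handwritten characters,
# columns: model characters, same order as `names`).
D = np.array([
    [4.5, 10.1, 12.0, 8.1, 9.0],
    [7.1, 5.2, 6.3, 7.3, 6.1],
    [12.1, 7.4, 5.5, 8.5, 9.1],
    [7.6, 11.8, 10.0, 4.2, 7.4],
    [7.1, 5.3, 8.0, 6.3, 4.6],
])

# Nearest-neighbour decision: column index of the smallest entry in each row.
row_min = D.argmin(axis=1)
diagonal_is_min = bool((row_min == np.arange(len(names))).all())

# Illustrative extra: gap between the best and the second-best match per row.
sorted_rows = np.sort(D, axis=1)
margins = sorted_rows[:, 1] - sorted_rows[:, 0]
closest_call = names[int(margins.argmin())]

print(diagonal_is_min, closest_call)
```

On these five rows every character is matched to its own model; the narrowest margin is for Dha, whose distance to its own model (4.6) is only slightly below its distance to the Chha model (5.3).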

Table 4. Recognition results on the three databases.

Database        Recognition, training data (%)    Recognition, test data (%)
Trained         99.40                             99.00
Semi-trained    97.30                             96.50
Untrained       96.14                             95.12

The graph in Figure 17 gives the number of segments, the number of regions and the number of recognized sub-graphs. It shows that, although there is over-segmentation, some similarity is preserved in each case and comparatively few sub-graphs are matched, which reduces the complexity of the algorithm while giving high recognition accuracy.

Table 3. Distance of handwritten characters vs. models.

Character       d_i1    d_i2    d_i3    d_i4    d_i5
i = 1 (Ksha)    4.5     10.1    12.0    8.1     9.0
i = 2 (Chha)    7.1     5.2     6.3     7.3     6.1
i = 3 (Kha)     12.1    7.4     5.5     8.5     9.1
i = 4 (Tha)     7.6     11.8    10.0    4.2     7.4
i = 5 (Dha)     7.1     5.3     8.0     6.3     4.6

Figure 17. Regions, segments and sub-graphs.

8. Conclusions

In this paper, a new method is proposed for character recognition. The new structural signature is easy to compute from the structural Hierarchical Attributed Relational Graph (HARG) of an image. The representation of an image by the proposed technique is tolerant to about 10 degrees of slant and 5 degrees of skew. The idea behind developing this vectorial representation is, firstly, to develop a method with minimum processing time, unlike graph-subgraph matching, which is exponential in the worst case, and, secondly, to avoid the drawback of statistical methods, which extract various features from enormous numbers of samples to train the models and then recognize an unknown character by comparison with the existing models. Another advantage of this approach is that it is easily extensible and does not require a large amount of training data again. The future scope of this work is to include ‘Devnagari’ script conjuncts: the script contains consonant-vowel and consonant-consonant combinations, called ‘yuktakshar’ or conjuncts, and this approach may be tried for their recognition, since in conjuncts only part of one character is attached to the other character. Thus this approach is general and appears promising for the recognition of handwritten characters in any script, particularly the ‘Devnagari’ script.

REFERENCES

1. S Kompalli, S Nayak, S Setlur and V Govindaraju. Challenges in OCR of ‘Devnagari’ Documents, Proceedings of Eighth International Conference on Document Analysis and Recognition, pages 327-331, September, 2005.

2. I K Sethi and B Chatterjee. Machine Recognition of Constrained Handprinted ‘Devnagari’, Pattern Recognition, 9(2):69-75, July, 1977.

3. P Mukherji and P P Rege. A Survey of Techniques for Optical Character Recognition of Handwritten Documents with Reference to ‘Devnagari’ Script, Proceedings of First International Conference on Signal and Image Processing, India, pages 178-183, December, 2006.

4. N Sharma, U Pal, F Kimura and S Pal. Recognition of Off-Line Handwritten ‘Devnagari’ Characters Using Quadratic Classifier, Proceedings of Indian Conference on Computer Vision Graphics and Image Processing (ICVGIP), India, pages 805-816, December, 2006.

5. U Pal, N Sharma, T Wakabayashi and F Kimura. Off-Line Handwritten Character Recognition of ‘Devnagari’ Script, Proceedings of Ninth International Conference on Document Analysis and Recognition, Brazil, pages 496-500, September, 2007.

6. P Mukherji and P P Rege. Stroke Analysis of Handwritten ‘Devnagari’ Characters, Proceedings of WSEAS’s Sixth Conference on Circuits, Systems, Electronics, Control and Signal Processing, Egypt, pages 843-849, December, 2007.

7. I J Kim and J H Kim. Statistical Character Structure Modeling and its Application to Handwritten Chinese Character Recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(11):1422-1436, November, 2003.

8. W Lee, L B Kara and T F Stahovich. An Efficient Graph-Based Recognizer for Hand-Drawn Symbols, Computers and Graphics, 31(4):554-567, August, 2007.

9. John T Favata. Offline General Handwritten Word Recognition Using an Approximate BEAM Matching Algorithm, IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(9):1009-1021, September, 2001.

10. H Bunke. Graph Matching: Theoretical Foundations, Algorithms and Applications, Proceedings of Vision Interface, Canada, pages 82-88, May, 2000.

11. J R Ullmann. An Algorithm for Subgraph Isomorphism, Journal of Association for Computing Machinery, 23(1):31-42, January, 1976.

12. J Lladós, E Martí and J J Villanueva. Symbol Recognition By Error-tolerant Subgraph Matching Between Region Adjacency Graphs, IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(10):1137-1143, October, 2001.

13. B T Messmer and H Bunke. A New Algorithm for Error-tolerant Subgraph Isomorphism Detection, IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(5):493-504, May, 1998.

14. E Bengoetxea, Pedro Larrañaga, Isabelle Bloch, Aymeric Perchant and Claudia Boeres. Inexact Graph Matching by Means of Estimation of Distribution Algorithms, Pattern Recognition, 35(12):2867-2880, December, 2002.

15. P A Champin and C Solnon. Measuring the Similarity of Labeled Graphs, Proceedings of the Fifth International Conference on Case-Based Reasoning Research and Development, pages 80-95, June, 2003.

16. Roberto M Cesar Jr., E Bengoetxea, Isabelle Bloch and Pedro Larrañaga. Inexact Graph Matching for Model-Based Recognition: Evaluation and Comparison of Optimization Problems, Pattern Recognition, 38(11):2099-2133, November, 2005.

17. D Lopresti and G Wilfong. Applications of Graph Probing to Web Document Analysis, Proceedings of the First International Workshop on Web Document Analysis, USA, pages 51-54, September, 2001.

18. A N Papadopoulos and Y Manolopoulos. Structure-based Similarity Search with Graph Histograms, Proceedings of the Tenth International Workshop on Database and Expert System Applications, pages 174-178, September, 1999.

19. H Qiu and E R Hancock. Graph Matching and Clustering Using Spectral Partitions, Pattern Recognition, 39(1):22-34, January, 2006.

20. S Auwatanamongkol. Inexact Graph Matching using a Genetic Algorithm for Image Recognition, Pattern Recognition Letters, 28(12):1428-1437, January, 2007.

21. B Luo and E R Hancock. Structural Graph Matching Using the EM Algorithm and Singular Value Decomposition, IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(10):1120-1136, October, 2001.

22. R C Gonzalez and R E Woods. Digital Image Processing (Delhi, India: Pearson Education, 2003).

23. P Mukherji, P P Rege and L K Pradhan. Analytical Verification System for Handwritten ‘Devnagari’ Script, Proceedings of the Sixth IASTED’s International Conference on Visualization, Imaging and Image Processing, Spain, pages 237-242, August, 2006.

24. J Y Ramel, N Vincent and H Emptoz. A Structural Representation for Understanding Line-drawing Images, International Journal on Document Analysis and Recognition, 3(2):58-66, December, 2000.

25. K Fukunaga. Introduction to Statistical Pattern Recognition (Second Edition), Academic Press, New York, 1990.

Prachi Mukherji is an Assistant Professor with the Department of Electronics and Telecommunication, Smt. Kashibai Navale College of Engineering, Pune, India. She received her B.E. from Madhav Institute of Technology and Science, Gwalior and M.Tech. from MACT Bhopal. She is currently pursuing her PhD in Optical Character Recognition of Devnagari Script under Pune University.

Priti Rege received her B.E. (Bachelor in Engineering) with distinction in 1983 and her M.E. degree with Gold medal in 1985, both from Devi Ahilya University of Indore, India. She received her doctoral degree from Pune University, India in 2002. She is a Professor at College of Engineering, Pune. Currently, she is also working as Dean, Student Affairs at the same Institute. She has authored/coauthored over 50 research publications in peer-reviewed journals and conference proceedings. She has served on the advisory and program committees of various conferences and as a reviewer for various journals. She has also chaired sessions at national and international conferences. Her research interests include Digital Watermarking, Texture Analysis, Optical Character Recognition, Document Enhancement and 3-D Volume Rendering. She has contributed to research projects funded by Government and Industries.