

    DISSERTATION

    on

    ONLINE HANDWRITTEN ENGLISH NUMERAL

    CHARACTER RECOGNITION

    THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR

    THE DEGREE OF

Master of Technology in

    IT (COURSEWARE ENGINEERING)

SUBMITTED BY

    DALIA PRATIHAR

    EXAMINATION ROLL NUMBER: M48CWE13-05

    REGISTRATION NUMBER: 117311 of 2011-2012

    UNDER THE SUPERVISION

    OF

    Sri Arunasish Acharya

    SCHOOL OF EDUCATION TECHNOLOGY

    FACULTY COUNCIL FOR UG AND PG STUDIES

IN ENGINEERING AND TECHNOLOGY

    JADAVPUR UNIVERSITY

    KOLKATA-700032

    2013


    Faculty Council for UG and PG studies in Engineering and Technology

    JADAVPUR UNIVERSITY, KOLKATA- 700032

    Certificate of Recommendation

This is to certify that Dalia Pratihar (M48CWE13-05) has completed her dissertation entitled "Online Handwritten English Numeral Character Recognition" under the direct supervision and guidance of Sri Arunasish Acharya, Assistant Professor, School of Education Technology, Jadavpur University, Kolkata. We are satisfied with her work, which is being presented for the partial fulfillment of the requirements for the degree of MTech in I.T. (Courseware Engineering) of Jadavpur University, Kolkata - 700032.

    _____________________________

    Sri Arunasish Acharya

    Assistant Professor,

    School of Education Technology,

    Jadavpur University,

    Kolkata 700032

    _____________________________

    Prof. Samar Bhattacharya

    Director,

    School of Education Technology,

    Jadavpur University,

    Kolkata 700032


    JADAVPUR UNIVERSITY

    FACULTY OF ENGINEERING & TECHNOLOGY

    CERTIFICATE OF APPROVAL *

The thesis at instance is hereby approved as a creditable study of an Engineering subject carried out and presented in a manner satisfactory to warrant its acceptance as a prerequisite to the degree for which it has been submitted. It is understood that by this approval the undersigned do not necessarily endorse or approve any statement made, opinion expressed or conclusion drawn therein, but approve this thesis for the purpose for which it is submitted.

    Final Examination for

    Evaluation of the Thesis.

    ________________________________________

    ________________________________________

    ________________________________________

    Signature of the Examiners

    * Only in case the thesis is approved.


    Declaration of Originality and Compliance of Academic Ethics

I hereby declare that this thesis contains a literature survey and original research work by the undersigned candidate, as part of her MTech in I.T. (Courseware Engineering) studies.

All information in this document has been obtained and presented in accordance with academic rules and ethical conduct.

I also declare that, as required by these rules and conduct, I have fully cited and referenced all materials and results that are not original to this work.

    Name : DALIA PRATIHAR

    Exam Roll Number : M48CWE13-05

    Registration Number: 117311 of 2011-2012

    Thesis Title : ONLINE HANDWRITTEN ENGLISH NUMERAL

    CHARACTER RECOGNITION

    Signature with date:


    ACKNOWLEDGEMENT

I owe thanks to a great many people who helped and supported me during the project work.

My deepest thanks to Sri Arunasish Acharya, the supervisor of the project, for guiding and correcting various documents of mine with much attention and care. Without his constant endeavours, this project would not have been completed successfully.

I express my sincere thanks to Prof. Samar Bhattacharya, Director of the School of Education Technology, for extending his valuable support.

My deep sense of gratitude to Prof. Kalyan Kumar Datta, Dr. (Prof.) Pramathanath Basu, Dr. Ranjan Parekh, Dr. Matangini Chattopadhyay, Smt. Saswati Mukherjee and Mr. Joydeep Mukherjee for their continuous support during the entire course of the research.

I would also like to thank my classmates, without whose support and motivation this project would have remained a distant reality. Thanks and appreciation to all the helpful people of my department for their support.

I also extend my heartfelt thanks to my family and well-wishers.

    With regards,

    Dalia Pratihar

    Examination Roll Number: M48CWE13-05

    Registration Number: 117311 of 2011-2012

    SCHOOL OF EDUCATION TECHNOLOGY

    JADAVPUR UNIVERSITY

    KOLKATA-700032


TABLE OF CONTENTS

Executive Summary
1. Introduction
    1.1 Problem Statement
    1.2 Objectives
    1.3 Character Recognition (General Idea)
    1.4 Assumptions & Scope
    1.5 Organization of Thesis
2. Literature Survey
3. Concepts & Problem Analysis
4. Design & Development
    4.1 Introduction
    4.2 System Components
    4.3 Procedure and Flowchart
    4.4 Pseudocode
    4.5 Character Database
    4.6 Hardware & Software Used
        4.6.1 Development Platform
        4.6.2 External Interfaces
    4.7 Design Methodology
5. Results & Interpretation
6. Conclusion & Future Scope
    6.1 Conclusion
    6.2 Future Scope
References
Appendix


    EXECUTIVE SUMMARY

In the present work, a prototype system for the recognition of online handwritten English numeral characters has been developed. A database of this script is created for the digits 0-9. The total number of samples in the present database is 150, collected from 15 different users. A novel method of feature extraction has been proposed, which employs the concept of the image centroid. Apart from this, the range of black pixels in both the vertical and horizontal directions is computed, which forms another feature of the present algorithm. Finally, the statistical method of Euclidean distance has been employed for the final recognition. The accuracy has also been measured, based on which some scope for further improvement has been suggested.


    Chapter 1

INTRODUCTION

1.1. PROBLEM STATEMENT

    Develop a highly accurate online English numeral character recognition system.

    1.2. OBJECTIVES

This project has the following objectives:

To design and develop a high-performance online handwritten English numeral character recognition system.

To explore the effectiveness of the present work with respect to the works already done in the field of character recognition.

    1.3. CHARACTER RECOGNITION (GENERAL IDEA)

Imparting machines with human-like capabilities for accomplishing various tasks has always been an active area of interest, as this eases many tedious tasks that could earlier be performed only by human beings. Character Recognition is one such domain where the man-machine divide narrows down significantly. Apart from data entered through the keyboard, the machine is trained to understand data fed through other methods like scanning, bar-code reading, etc. The birth of such Optical Character Recognition (OCR) systems can be traced back to as early as the 1870s. Presently, a few thousand systems are sold every week, and their cost too has decreased significantly, making them easily accessible to the general public. [22]

    1.4. ASSUMPTIONS AND SCOPE

In a classification task, we are allotted a pattern and the job is to classify it into one out of c classes. The number of classes, c, is assumed to be known a-priori and here is equal to 10. Each pattern class is represented by a set of feature values. We assume that each pattern is represented uniquely by a single feature vector and that it can belong to only one class. Also, all the classes are equi-probable, that is, no single class has greater priority than the other ones in this recognition task. Moreover, we assume that all the input digits are perfectly straight and non-tilted; this forms a constraint on the writer. Also, we are taking the input using a touchpad and assume that the user is writing fairly accurately.


    1.5. ORGANIZATION OF THESIS

In the present work a system for Online English Handwritten Digit Recognition has been designed. Furthermore, a set of experiments can be conducted using different images of the same digit written by different persons, and the results then verified. Based on the results of these experiments, the advantages and disadvantages of the method can then be analyzed and discussed. Finally, some conclusions about this method will be drawn.

Chapter 1: The thesis is divided into six chapters in total. The first chapter is an introduction to our project work, which includes the problem statement, the objectives of the dissertation, a brief discussion, the assumptions made and the scope of the project. Lastly, it gives an idea about the organization of the other chapters and a brief idea about their contents, as listed below:

Chapter 2: This chapter, titled Literature Survey, gives a thorough, if not exhaustive, account of the works carried out in the field of Character Recognition which have been referred to in the present work.

Chapter 3: This chapter, titled Concepts & Problem Analysis, describes the background concepts that are needed in order to analyze the problem and hence derive an optimal solution for the task at hand, that is, recognizing the input numerals.

Chapter 4: This chapter, titled Design & Development, demonstrates the novel algorithm proposed in this dissertation, the logic behind it, the database developed for both the ideal and the test cases, and the coding done using the software tools.

Chapter 5: This chapter, entitled Results & Interpretations, points to some experimental studies, their results and their interpretations.

Chapter 6: This final chapter, named Conclusion & Future Scope, concludes the thesis and discusses some of the limitations and the scope for enhancement of this project.


    Chapter 2

LITERATURE SURVEY

    Online character recognition has been an area of extensive study and research in the past.

Character recognition systems contribute tremendously to the advance of the automation process and can be of significant benefit to man-machine communication in many applications, such as (1) reading aids for the blind, (2) automatic text entry into the computer for desktop publication, library cataloging and ledgering, (3) automatic reading for sorting of postal mail, bank cheques and other documents, (4) document data compression: from document image to ASCII format, (5) language processing, (6) multi-media system design, etc. [9]

In designing a highly accurate character recognition system, the challenge is to extract the most efficient features from the different character images of several characters so that they can be easily identified by the system. Several methods of feature extraction for character recognition have been reported in the literature. [10]

In paper [11], an OCR system has been designed and tested for Urdu characters. After research, the entire alphabet set of 40 characters is zoned into 21 classes. Initially, after binarization and segmentation, a chain code of each column of the image is generated. This chain code is then stored in an XML file. The file contains all the 21 classes of Urdu alphabets as parent nodes or elements. Each child node has three attributes: one, the name of the character; the other, the chain code of that character, calculated from its image earlier. The Unicode of the character is saved in the XML as the third attribute of the child node, and this is assigned to the identified character at the end of the matching procedure. The ideal and test images are matched with some extent of pre-set error margin.

In paper [12], a novel diagonal feature extraction scheme for recognizing off-line handwritten characters is proposed. Every character image of size 90x60 pixels is divided into 54 equal zones, each of size 10x10 pixels. The features are then extracted from each zone's pixels by moving along the diagonals of its respective 10x10 pixel zone. Each such 10x10 zone has 19 diagonal lines, and the foreground pixels present along each diagonal line are summed to get a single sub-feature; thus 19 sub-features are obtained from each of the 54 zones. These 19 sub-feature values are averaged to form a single feature value and placed in the corresponding zone. This procedure is sequentially repeated for all the 54 zones. The zones whose diagonals are empty of foreground pixels have feature values equal to zero.


Finally, 54 features are extracted for each character. Apart from these, 9 and 6 features are obtained by averaging the values placed in the zones row-wise and column-wise, respectively (since the original image size was assumed to be 90x60 pixels). As a result, every character is represented by 69, that is, 54 + 15, features. These extracted features are then used to train a feed-forward back-propagation neural network used for classification.

In paper [13], some statistical features like zonal density, projection histograms (horizontal, vertical and both diagonals, that is, diagonal-1 (left diagonal) and diagonal-2 (right diagonal)) and distance profiles (from the left, right, top and bottom sides) have been used. The distance profiles are computed by calculating the distance (number of pixels) of the outer edge of the character from the image boundary. In computing the Background Directional Feature, they have considered a mask along 8 directions and then passed it over the original image to obtain the cumulative fraction of pixels in each direction. One noteworthy contribution of this paper is the incorporation of a post-processing technique to construct meaningful sentences using the component constructs.

The recognition system proposed in paper [14] inputs some text lines and extracts certain features like projection profiles (vertical and horizontal), density of the black pixels, and variance of the horizontal profile derivative. A text line containing words has been labeled with Htop, Hupper, Hbase and Hbottom. Moreover, while considering the vertical projection profile, more refined features have been taken into account, like the height of the middle part and the height of the upper part. Finally, they have used a Bayesian classifier for recognition of the fonts, where the fonts are known a-priori.

In paper [15], the normalized and thinned character image is divided into 12 sectors, each covering a fixed angle of 30 degrees. For each sector, the distance of each black pixel from the image centre is computed and then summed up to give a single value. Also, per sector, the slope of each black pixel with respect to the image centre is calculated. These values are normalized by dividing each value by the number of black pixels. Thus 12x2 = 24 features are obtained from these 12 sectors. Next, 4 sectors have been considered, with a sector angle of 90 degrees each. Then an occupancy value is computed, which has been defined as the proportion of black pixels in that sector with respect to the entire image. Also, the end point in each of the 4 sectors is found by the neighbouring-pixel analysis method. So, the 32 features in total include vector distances, angles, occupancy and end-points. For recognition, both neural networks and fuzzy logic techniques are then adopted.

In paper [16], skewed images are rectified by transforming them from a tilted to an upright form. After extracting the directional element feature (DEF) from each character image, City Block Distance with Deviation (CBDD) and Asymmetric Mahalanobis Distance (AMD) are proposed for rough classification and fine classification.

Based on the theory of image segmentation, the centroid of a character can be found. Around this centroid, the image is divided clockwise into 36 equal-angle (each 10 degrees) regions, and the direction feature of the character distribution is obtained in paper [17]. Then the slope of each black pixel with respect to the image centroid is found. The angle thus obtained is used to attach the concerned pixel to one of the 36 regions. The handwriting restraint is removed by adopting measures to remove deflection, that is, skewness. In the principle of high-accuracy matching, the minimal matching database has been used to approach real-time character matching.

In paper [18], after segmentation, each isolated image glyph is processed to extract features of the glyph like the character height, character width, the number of horizontal lines (long and short), the number of vertical lines (long and short), the horizontally oriented curves, the vertically oriented curves, the number of circles, the number of slope lines, the image centroid and special dots. The heights, circles and slopes are computed by applying appropriate masks. The glyphs are then ready for classification based on these features. The extracted features are passed to a Support Vector Machine (SVM), where the characters are classified by a supervised learning algorithm.


    Chapter 3

CONCEPTS & PROBLEM ANALYSIS

    Pixel

Short for Picture Element, a pixel is generally thought of as the smallest single component of a digital image. The address of a pixel corresponds to its physical coordinates. Each pixel is a sample of the original image; hence more samples typically provide more accurate representations of the original. [1]

    Fig. 3.1 Fig. 3.2


For example, the above figure titled Fig. 3.1 shows an image drawn in Adobe Photoshop 7.0 using the brush tool. The same image, when zoomed to 1600%, reveals the component pixels, a portion of which is shown in Fig. 3.2.

The intensity of each pixel may vary. In color image systems, a color is typically represented by three or four component intensities, such as red, green and blue (RGB systems) or cyan, magenta, yellow and black (CMYK systems). [1]

The number of bits used to represent each pixel determines how many distinct colors can be displayed using that pixel. For example, in 8-bit color mode, the color monitor uses 8 bits for each pixel, making it possible to display 2 to the 8th power (256) different colors or shades of gray. [2]

    Pattern Recognition

Pattern recognition can be broadly defined as a process to generate a meaningful description of data and a deeper understanding of a problem through the manipulation of a large set of primitive, obviously haphazard and quantifying data. Some of that large data set may come from statistics, a document, or graphics, and is eventually expected to be in a visual form. Preprocessing of these data is necessary for error correction, for image enhancement, and for their understanding and recognition. Preprocessing operations are generally classified as low-level operations, while pattern recognition, including analysis, description and understanding of the image (or the large data set), is high-level processing. [3]

    Fig. 3.3

There are three general approaches for implementing pattern recognition systems, namely:

    Statistical Pattern Recognition (StatPR)

    Syntactic Pattern Recognition (SynPR)

    Neural Pattern Recognition (NeurPR)


several other cases where this type of recognition comes into play, like Vehicle License Plate Recognition for security and surveillance purposes [6], Traffic Sign Detection [7], Signature Verification [8], etc. More specifically, it is an application of the science of Pattern Recognition.

Irrespective of the problem type, the steps involved in recognizing a character are as follows:

1. Data Acquisition/Sensing
2. Pre-processing
3. Segmentation
4. Feature Extraction
5. Classification
6. Post-processing [9]

    Fig. 3.5

The preprocessing step involves image binarization, image skeletonization/thinning and image segmentation, amongst others.

Image thinning is the process where the input image is processed to remove unnecessary pixels while retaining connectivity. The Hilditch thinning algorithm is the most widely used algorithm in this regard. [23][24]

Skew detection and correction is another preprocessing technique whereby slanted images are rectified to produce upright ones. Several papers demonstrate such preprocessing based on techniques like vertical projection profiles [25], the Radon transform [26], etc.


Image segmentation is a vital preprocessing step where the scanned text is partitioned into paragraphs, lines, words and finally single characters. The works carried out in this area include techniques like projection profiles [25] and the Hough Transform [27], amongst several others.

    Pattern

A pattern is a d-dimensional vector x = (x1, ..., xd), where each element of the vector corresponds to the value of a different feature. A classification task is perfect when each pattern is uniquely represented by a set of unique values of these features.

Fig. 3.6 (a), (b)

For example, an acetylene structure is uniquely described by pattern (a), whereas pattern (b) describes a benzene ring. The unique orientation/pattern of the component carbon and hydrogen atoms contributes to the difference between these compounds.

    Feature

These are used to describe an observation. A pattern is also referred to as a sample or observation.

    Feature extraction

The process of extracting certain attributes from the collected data. These attributes are used to map the original data to a feature space in which the data will be separable, so that classification can be performed.

    Feature selection

The process of selecting the most relevant of the features that have been extracted, thus selecting the features that will map to the optimal feature space for performing classification.


For example, in order to discriminate between 6 and 9, apart from using features like the number of pixels etc., if we can incorporate another feature in the set, namely the phase angle, then the discrimination process gives the correct result.

    Character

In computer software, any symbol that requires one byte of storage is known as a character. This includes all the ASCII and extended ASCII characters, including the space character.

Fig. 3.7

The above figure shows the character A written in various fonts like Times New Roman, Courier New and Verdana, in font size 36.

    Character Recognition Modes

    Offline

The image of the written text may be sensed "off line" from a piece of paper by optical scanning and then fed into the system as a graphic file. From that file, the character of interest has to be located and then processed for recognition.

    Online

In this case, the movements of the pen tip may be sensed "on line", or in real time, for example by a pen-based computer screen surface. In the present project, a writing tablet has been used for facilitating online digit recognition.

    Euclidean Distance

Very often, especially when measuring distance in the plane, we use the formula for the Euclidean distance. According to the Euclidean distance formula, the distance between two points in the plane with coordinates (x, y) and (a, b) is given by:

dist((x, y), (a, b)) = √((x - a)² + (y - b)²)

As an example, the (Euclidean) distance between the points (2, -1) and (-2, 2) is found to be

dist((2, -1), (-2, 2)) = √((2 - (-2))² + ((-1) - 2)²) = √(4² + (-3)²) = √(16 + 9) = √25 = 5

The source of this formula is the Pythagorean theorem. [19]
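As a quick check, this computation can be written as a couple of lines of MATLAB (a minimal sketch; the point values are those of the example above and the variable names are illustrative):

% Euclidean distance between the two example points (2, -1) and (-2, 2)
pt1 = [2 -1];
pt2 = [-2 2];
d = sqrt(sum((pt1 - pt2).^2));   % sqrt((x-a)^2 + (y-b)^2) = 5
disp(d);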

    BMP File Format

The BMP file format, also known as the bitmap image file or device independent bitmap (DIB) file format, or simply a bitmap, is an image file format used to store bitmap digital images independently of the display device (such as a graphics adapter), especially on Microsoft Windows and OS/2 operating systems. The main aim of DIBs is to allow bitmaps to be moved from one device to another (hence, the device-independent part of the name). [20] The BMP file format supports bitmaps of 1, 4, 8 or 24 bits per pixel. In our project, we have used the .bmp format for the following reasons:

BMP files store each pixel independently, hence maintaining the accuracy and quality of the stored image. There is no compression and hence no loss in pixel information. In the present work, the images undergo several steps and are saved for treating them as input for the next step. Hence, preserving the quality of the images is a very important factor, and the bmp format is absolutely perfect for that. Also, this format can represent complex images and shapes and retains image properties even when the image is magnified. [21]

Having gained a fair idea about the concepts required, the task at hand can now be taken up.


    Chapter 4

DESIGN & DEVELOPMENT

4.1. INTRODUCTION

In this project, a novel system for English numeral recognition has been proposed. It focuses on online recognition of the digits. The recognition system is based on the statistical approach to pattern recognition. As will be evident in the following sub-sections, the system extracts some features related to the digitized images and compares them with the ideal values to find the closest match. The degree of this similarity is assessed by means of a Euclidean distance classifier.

    4.2. SYSTEM COMPONENTS

The numeral recognition system developed has several sub-systems, as follows:

    1. Image Acquisition

For the dataset of our system, we have used the Adobe Photoshop 6.0 Text Tool, in Times New Roman font, pen size 72 pt, color black. The image size is immaterial here, as the digitized images are subsequently resized to an optimal dimension. The images are saved as Bitmap images in Windows format, 24-bit pixel depth. The test images are input into the system using a tablet. The images are then read into the MATLAB package using standard MATLAB I/O functions.

    2. Binarization

In this step, the images are converted from color images into binary images. In order to reduce storage requirements and to increase processing speed, it is often desirable to represent gray-scale or color images as binary images by picking a threshold value. As evident from the name, a binary image is composed of only two types of pixels: black and white. So, the number of possible pixel values is reduced to a mere 2. Image binarization via thresholding may be either local or global [10]. In our system, the intensity threshold based on which the images are binarized is purely experimental.
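A minimal MATLAB sketch of this step, assuming the fixed experimental threshold of 0.6 used later in the pseudocode and the appendix code (the file names here are illustrative):

img = imread('sample.bmp');     % colour test image (illustrative path)
bw = im2bw(img, 0.6);           % pixels with luminance > 0.6 become 1 (white), the rest 0 (black)
imwrite(bw, 'sample_bin.bmp');  % save the binary image for the next stage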

3. Image Cropping

The binarized images are now cropped in such a way that the bounding box touches the outermost black pixels exactly, that is, no extra white space is allowed beyond the border black pixels.

    4. Size Normalization

This is an important step in our system, where the cropped image is resized to a 40x40 image for comparison. This is absolutely necessary, as no constraint was imposed on the input image size; hence the image is not suitable for comparison unless size-normalized.
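A short sketch of the cropping and size-normalization steps in MATLAB (assuming a binarized image with black foreground pixels; the file name is illustrative):

bw = im2bw(imread('sample.bmp'), 0.6);     % binarized input (illustrative)
[r, c] = find(bw == 0);                    % coordinates of all black pixels
bw = bw(min(r):max(r), min(c):max(c));     % crop to the tight bounding box
bw = imresize(bw, [40 40]);                % normalize to 40x40 for comparison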

    5. Image Centroid Calculation

In this step, the centroid of an image is calculated. The image centroid, in the horizontal (and likewise the vertical) direction, is defined as

centroid = (sum of the positions of the black pixels in that direction) / (total number of black pixels)

So, the image centroid is not necessarily the same as the image centre.
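In MATLAB this definition can be sketched as follows (black pixels are 0 in the binarized image, and the row and column indices stand for the vertical and horizontal positions):

bw = im2bw(imread('sample.bmp'), 0.6);     % illustrative binarized image
[r, c] = find(bw == 0);                    % positions of all black pixels
centroid = [mean(r) mean(c)];              % sum of positions / number of black pixels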

    6. Division into blocks

The image is then divided into 4 blocks of equal size starting from the top-left corner. Now, for each such block, we do the following steps (a sketch of these steps is given after the list):

i. Calculate the block's image centroid using the above formula.
ii. Find the distance of this centroid from the original image centroid.
iii. Sum up all these distances into a single value.
iv. Calculate the Euclidean distance between this value and the same value for all the other ideal images. This constitutes one of our features.
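The per-block computation for one 40x40 image might look like the following sketch (the names are illustrative; an empty block would give a NaN centroid, which the actual code later replaces with a large penalty value):

bw = im2bw(imread('sample.bmp'), 0.6);     % illustrative 40x40 binary image
[r, c] = find(bw == 0);
gc = [mean(r) mean(c)];                    % whole-image centroid
rows = {1:20, 1:20, 21:40, 21:40};         % row ranges of the four blocks
cols = {1:20, 21:40, 21:40, 1:20};         % column ranges of the four blocks
featureSum = 0;
for k = 1:4
    [br, bc] = find(bw(rows{k}, cols{k}) == 0);               % black pixels of block k
    bcent = [mean(br)+rows{k}(1)-1, mean(bc)+cols{k}(1)-1];   % block centroid in image coordinates
    featureSum = featureSum + sqrt(sum((bcent - gc).^2));     % distance to the image centroid
end
% featureSum is then compared, via Euclidean distance, with the corresponding value of each ideal image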

    7. Euclidean Distance with respect to image centroid

The Euclidean distance between the ideal images and the test image in question is calculated and stored as our second feature.

8. Calculation of the range of black pixels, both horizontally and vertically

In this step, the difference between the extreme black pixels per row and per column is calculated and then summed up to give our third and fourth features, respectively. This is done as follows:


    For example, let us consider the following image matrix. Here, 1 implies a black pixel is present.

                col:   1  2  3        Range (row-wise)
        row 1:         1  0  0        1 - 1 = 0
   I =  row 2:         0  1  1        3 - 2 = 1        Sum = 0 + 1 + 2 = 3
        row 3:         1  1  1        3 - 1 = 2

   Range (column-wise): 2  1  1                        Sum = 2 + 1 + 1 = 4

These values are calculated for both the test and the ideal images, and for each test image the Euclidean distance between these values and the ideal images' values is computed, giving two more of our features.
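The same small computation on the example matrix, as a MATLAB sketch (here 1 marks a black pixel, matching the worked example above):

I = [1 0 0; 0 1 1; 1 1 1];                         % example matrix from above
rowRange = 0;
colRange = 0;
for r = 1:size(I,1)
    cols = find(I(r,:) == 1);
    rowRange = rowRange + max(cols) - min(cols);   % 0 + 1 + 2 = 3
end
for c = 1:size(I,2)
    rows = find(I(:,c) == 1);
    colRange = colRange + max(rows) - min(rows);   % 2 + 1 + 1 = 4
end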

    9. Summation of all the Euclidean Distance values

In this stage, the four matrices containing the Euclidean distance values are summed up to generate the final data matrix. In this matrix, the rows correspond to the test images and the columns correspond to the ideal images. So, in a certain row, the cell with the minimum value marks the recognized character. The recognition is correct if the column containing this minimum value corresponds to the same digit as the test image in question.
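A minimal sketch of this final decision step, with placeholder matrices standing in for the four feature-wise distance matrices produced earlier (rows are the test digits 0-9, columns are the ideal digits 0-9):

D1 = rand(10); D2 = rand(10); D3 = rand(10); D4 = rand(10);   % placeholders for the four feature matrices
D = D1 + D2 + D3 + D4;                         % final data matrix
[~, idx] = min(D, [], 2);                      % column holding the minimum value in each row
recognized = idx - 1;                          % columns 1..10 correspond to digits 0..9
accuracy = mean(recognized == (0:9)') * 100;   % percentage recognized correctly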

    4.3. PROCEDURE AND FLOWCHART

A complete procedure of handwritten English numeral recognition is given below:

o Capture the test numerals.
o Perform binarization.
o Perform the normalization process (cropping, resizing, thinning).
o Apply the feature extraction techniques (image centroid and pixel position range techniques).
o Implement the Euclidean distance classifier.
o Get the recognized character.

A complete flowchart of the handwritten English numeral recognition system is given below:


Fig. 4.1

4.4. PSEUDOCODE

Some of the algorithms used are as follows:

    Image Binarization

Start at the left top of the image.
For each pixel in the input image,
    if (luminance > 0.6)
        pixel := 1    /* white pixel */
    else
        pixel := 0    /* black pixel */

So, after applying this algorithm, the output binary image has values of 1 (white) for all pixels in the input image with luminance greater than LEVEL = 0.6, and 0 (black) for all other pixels.

    Image Cropping

    For each image,

Identify the extreme top-left corner containing a black pixel (x1, y1)
Identify the extreme top-right corner containing a black pixel (x1, y2)
Identify the extreme bottom-left corner containing a black pixel (x2, y1)
Identify the extreme bottom-right corner containing a black pixel (x2, y2)

Crop the image so that the entire character is contained within the bounding box of the above-mentioned four points. So, the dimension of the rectangle for cropping is:

(y2 - y1 + 1) x (x2 - x1 + 1)

where y2 - y1 + 1 = image width and x2 - x1 + 1 = image height

    Image resizing

For each image, normalize it such that the resultant image has width = 40 pixels and height = 40 pixels

    Image Thinning

For each image,
    Remove pixels so that the image shrinks to a minimally connected stroke. This removal operation is carried out repeatedly, till the image thus obtained has the following properties:
        As thin as possible
        Connected
        Centered
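In MATLAB this thinning step can be carried out with bwmorph, as is done in the appendix code; a minimal sketch (the strokes are black, so the image is inverted before thinning and inverted back afterwards):

bw = im2bw(imread('sample.bmp'), 0.6);     % illustrative binarized input
thin = ~bwmorph(~bw, 'thin', Inf);         % thin repeatedly until no further change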

    Centroid Calculation

Let
    n = number of black pixels = 0 (initially)
    sumr = running sum of the x co-ordinates of the black pixels
    sumc = running sum of the y co-ordinates of the black pixels


    Start at top-left corner of the image.

while (extreme bottom-right corner is not reached)
    if (black pixel is found)
        n := n + 1    /* increment black pixel count by one */
        sumr := sumr + x co-ordinate of black pixel
        sumc := sumc + y co-ordinate of black pixel
    else
        proceed to next pixel
end

Finally, the centroid (x, y) is:

x = sumr / n ,  y = sumc / n

    Division of image into blocks

Start at the top-left corner of the image. The image is divided into four 20x20 blocks as follows:

Fig. 4.2: the 40x40 image divided into four equal blocks (NorthWest, NorthEast, SouthWest and SouthEast)

    Range calculation

Row-wise:

    Start at top-left of the image.

    For each row,


    Diff(d) = (y co-ordinate of rightmost black pixel) - (y co-ordinate of leftmost black pixel)
End

    Sum = sum of all the values(d) per row

Column-wise:

    Start at top-left of the image.

For each column,
    Diff(d) = (x co-ordinate of bottommost black pixel) - (x co-ordinate of topmost black pixel)
end

    Sum = sum of all the values(d) per column

    4.5. CHARACTER DATABASE

One hundred and fifty samples, 10 per person, were collected from 15 persons for constructing our test image database, while, as discussed earlier, the Adobe Photoshop software's Text Tool has been used to generate the 10 ideal digitized images.

    4.6. HARDWARE & SOFTWARE USED

    4.6.1. Development Platform

    HARDWARE

Processor: Intel Core 2 Duo T5870 @ 2.00 GHz
RAM: 2.99 GB
Motherboard: Intel Gigabyte Technology
Chipset: 8I865G

    OPERATING SYSTEM USED

    Microsoft Windows XP Professional Version 02 Service Pack 3

    SOFTWARE

MATLAB 7.12
Adobe Photoshop 7.0

    MS Paint

    4.6.2. External Interfaces

    HARDWARE INTERFACES


    Tablet and pen

    SOFTWARE INTERFACES

A GUI is made using MATLAB to take the user's command for recognizing a character.

    4.7. DESIGN METHODOLOGY

The design started with collecting handwritten samples of English numerals. A blank image file, of size 200x200 pixels and 72 ppi resolution, is created in Adobe Photoshop, and the digit is written into this file by the writer by means of a tablet and its associated pen.

    Fig. 4.3

    Thereafter, the file is saved in the appropriate folder using .bmp extension.


    Fig. 4.4


    In this way all handwritten samples of English numeral characters have been collected.

    Fig. 4.5

    Then these images are imported into MATLAB using in-built functions.

    After cropping, the images are saved in a separate folder.

    Fig. 4.6

    Thereafter, these are resized and thinned and kept aside in a new folder.


    Fig. 4.6

    The final results are obtained once the user clicks a button developed using MATLAB guide.

    Fig. 4.7


    Chapter 5

RESULTS & INTERPRETATIONS

A sample of the final data matrix, which shows the Euclidean distance values, is shown below:

    Fig. 5.1

Here, the orange box denotes the digit recognized by the system, and the green one denotes the digit that should have been recognized in case of a mismatch. The result is emphasized by means of a flag, where R denotes a correct recognition and W a wrong recognition.

The final results are displayed in the MATLAB command window as follows:


    Fig. 5.2

Here the digits shown in the column represent the recognized digits, whether correct or incorrect, and the accuracy shows the percentage of digits that have been recognized correctly.

After analyzing the results of all the 150 samples, we derive the actual accuracy of the system developed. One thing that must be noted here is that the accuracy so derived is totally dependent on the size of the test image set and hence is not fixed.

The accuracy values for the total dataset are saved in a certain folder as follows:

    Fig. 5.3


The accuracy values and the digits recognized for all the datasets from 1-15 are shown below:

    Fig.5.4

So, the overall accuracy for the 150 (15 x 10) samples comes out to be
= ((100.00 + 100.00 + 53.33 + 100.00 + 80.00 + 60.00 + 73.33 + 73.33 + 73.33 + 80.00) / 10) %
= (793.32 / 10) %
= 79.332%
≈ 80%

    Of this, the best accuracy is for 0, 1 and 3. The worst accuracy is for digit 2.


    Chapter 6

CONCLUSION & FUTURE SCOPE

    6.1. CONCLUSION

In this project, a system for recognizing handwritten English numerals has been developed by adopting a qualitative research methodology approach. The research is termed qualitative in the sense that it began with a broad problem statement of having to identify English numerals, and plans were then chalked out to accomplish that task. After the initial research and literature survey, a fair number of concepts were drawn, based on which an efficient algorithm was proposed. Thereafter a top-down design methodology was adopted, with some inherent design constraints as discussed earlier. Then experiments were carried out to collect sample data. Next, existing statistical methods were adopted to process that data and derive the final interpretations of the system's results. Thus, conclusions regarding the efficiency of the system developed were drawn based on the statistical findings. A fair recognition accuracy of 80% has been achieved, which is quite acceptable given the novelty of the algorithm proposed.

    6.2. FUTURE SCOPE

However, the system developed has some shortcomings. First, it is capable of testing only isolated characters from 0-9; that is, the user cannot expect it to recognize a number containing multiple digits. Second, there is no option for skew and slant normalization. Also, the system expects the user to write all the digits 0-9 for proper recognition, as the backend logic is coded likewise.

The accuracy of the system can be increased by applying some classifier other than the Euclidean distance based one. The approach can be extended to the recognition of words, sentences and documents by implementing segmentation techniques. Also, post-processing checks incorporating semantic information can help in increasing the accuracy and efficiency of the system. A neural network can also be applied in future to make the system learn and adapt; this would do away with the need for a database. Finally, a better user interface is certainly possible, one which incorporates all the steps from image input to recognition within itself rather than as disjoint software components. Also, more transparency can be incorporated into the system so as to convey to the user what is actually happening and what the outputs of each step of the system are.

Otherwise, the system is a useful piece of work towards the development of a viable Optical Character Recognition system.


    References

[1] Pixel - Wikipedia, the free encyclopedia. https://en.wikipedia.org/wiki/Pixel

[2] What is pixel - A Word Definition From the Webopedia Computer Dictionary. http://www.webopedia.com/TERM/P/pixel.html

[3] Bow, S.T., Pattern Recognition, Marcel Dekker, New York, 1984, ISBN 0-8247-0659-5, page v.

[4] Schalkoff, Robert, Pattern Recognition: Statistical, Structural and Neural Approaches, John Wiley & Sons, 2007, pages 2, 18-19.

[5] http://www.byclb.com/TR/Tutorials/neural_networks/ch1_1.htm

[6] Duan, Tran Duc; Du, Tran Le Hong; Phuoc, Tran Vinh; Hoang, Nguyen Viet, Building an Automatic Vehicle License-Plate Recognition System, Intl. Conf. in Computer Science (RIVF'05), February 21-24, 2005, Can Tho, Vietnam, pages 59-63, 2005.

[7] Garcia, Miguel Angel; Sotelo, Miguel Angel; Gorostiza, E. Martin, Traffic Sign Detection in Static Images using Matlab, IEEE 0-7803-7937-3/03, pages 212-215, 2003.

[8] Deng, Peter Shaohua; Liao, Hong-Yuan Mark; Ho, Chin Wen; Tyan, Hsiao-Rong, Wavelet-Based Off-Line Handwritten Signature Verification, Computer Vision and Image Understanding, Vol. 76, No. 3, December 1999, pages 173-190.

[9] Pal, U.; Chaudhuri, B.B., Indian script character recognition: a survey, Pattern Recognition 37 (2004), pages 1887-1899. http://www.elsevier.com/locate/patcog

[10] Arica, Nafiz; Yarman-Vural, Fatos T., An Overview of Character Recognition Focused on Off-Line Handwriting, IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, Vol. 31, No. 2, May 2001, pages 216-233.

[11] Nawaz, Tabassam; Naqvi, Syed Ammar Hassan Shah; Rehman, Habib ur; Faiz, Anoshia, Optical Character Recognition System for Urdu (Naskh Font) Using Pattern Matching Technique, International Journal of Image Processing (IJIP), Volume 3, Issue 3, pages 92-104.

[12] Pradeep, J.; Srinivasan, E.; Himavathi, S., Diagonal Based Feature Extraction for Handwritten Alphabets Recognition System Using Neural Network, International Journal of Computer Science & Information Technology (IJCSIT), Vol. 3, No. 1, February 2011, pages 27-38.


[13] Siddharth, Kartar Singh; Jangid, Mahesh; Dhir, Renu; Rani, Rajneesh, Handwritten Gurmukhi Character Recognition Using Statistical and Background Directional Distribution Features, ISSN: 0975-3397, Vol. 3, No. 6, June 2011, pages 2332-2345.

[14] Zramdini, Abdel Wahab; Ingold, Rolf, Optical font recognition from projection profiles, Electronic Publishing, Vol. 6(3), September 1993, pages 249-260.

[15] Madasu, Vamsi K.; Lovell, Brian C.; Hanmandlu, M., Hand printed Character Recognition using Neural Networks.

[16] Kato, Nei; Omachi, Shinichiro; Aso, Hirotomo; Nemoto, Yoshiaki, A Handwritten Character Recognition System Using Directional Element Feature and Asymmetric Mahalanobis Distance, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 21, No. 3, March 1999, pages 258-262.

[17] Lin, Huiqin; Ou, Wennuan; Zhu, Tonglin, The Research of Algorithm for Handwritten Character Recognition in Correcting Assignment System, IEEE Sixth International Conference on Image and Graphics, 978-0-7695-4541-7/11, pages 456-460, 2011.

[18] Seethalakshmi, R.; Sreeranjani, T.R.; Balachandar, T.; Singh, Abnikant; Singh, Markandey; Ratan, Ritwaj; Kumar, Sarvesh, Optical Character Recognition for printed Tamil text using Unicode, Journal of Zhejiang University SCIENCE, ISSN 1009-3095, pages 1297-1305, 2005.

[19] The Distance Formula. http://www.cut-the-knot.org/pythagoras/DistanceFormula.shtml

[20] BMP file format - Wikipedia, the free encyclopedia. http://en.wikipedia.org/wiki/BMP_file_format

[21] The Advantages of BMP, eHow. http://www.ehow.com/list_6118423_advantages-bmp.html

[22] Eikvil, Line, OCR - Optical Character Recognition, December 1993. http://citeseerx.ist.psu.edu

[23] Yin, Ming; Narita, Seinosuke, Speedup Method for Real-Time Thinning Algorithm, Digital Image Computing Techniques and Applications, 21-22 January 2002, Melbourne, Australia, 2002.

[24] Shanthi, N.; Duraiswamy, K., Preprocessing Algorithms for the Recognition of Tamil Handwritten Characters, 3rd International CALIBER 2005, Cochin, 2-4 February 2005, pages 77-82.

[25] Papandreou, A.; Gatos, B., A Novel Skew Detection Technique Based on Vertical Projections, International Conference on Document Analysis and Recognition, 2011, IEEE 1520-5363/11, pages 384-388.


[26] Louloudis, G.; Gatos, B.; Pratikakis, I.; Halatsis, C., Text line and word segmentation of handwritten documents, Pattern Recognition 42 (2009), pages 3169-3183. http://www.elsevier.com/locate/pr


    Appendix

Some code snippets are given below:

    Final_34.m

function varargout = final_34(varargin)
gui_Singleton = 1;
gui_State = struct('gui_Name',       mfilename, ...
                   'gui_Singleton',  gui_Singleton, ...
                   'gui_OpeningFcn', @final_34_OpeningFcn, ...
                   'gui_OutputFcn',  @final_34_OutputFcn, ...
                   'gui_LayoutFcn',  [], ...
                   'gui_Callback',   []);
if nargin && ischar(varargin{1})
    gui_State.gui_Callback = str2func(varargin{1});
end
if nargout
    [varargout{1:nargout}] = gui_mainfcn(gui_State, varargin{:});
else
    gui_mainfcn(gui_State, varargin{:});
end
end

function final_34_OpeningFcn(hObject, ~, handles, varargin)
handles.output = hObject;
end

function varargout = final_34_OutputFcn(~, ~, handles)
varargout{1} = handles.output;
end

function recognize_Callback(~, ~, ~)
% Runs the full recognition pipeline: preprocessing, feature extraction,
% distance computation and accuracy calculation.
call_crop_test34();
prepro_test34();
centroid_call();
centroid_call_test34();
callcentroid_seg();
callcentroid_seg_test34();
centroidseg_dist();
centroidseg_dist_test34();
cal_diff_seg_dist34();
allsegdist_diff();
call_getextrm_ideal();
call_getextrm_test34();
cal_getxtrm_ideal_col();
cal_getextrm_test_col34();
cal_diff_xtrm();
cal_diff_xtrm_col();
calceuclid();
summing_run_cent();
min_vals();
accu();
end

    accu.m

count = 0;
for i = 1:10
    if ((i-1) == in(i))
        count = count + 1;
    end
end
perc = (count/10)*100;
disp('accuracy=');
disp(perc);

    min_vals.m

for i = 1:10
    [minimum, in] = min(sum_run_cent, [], 2);
    in = in - 1;
end
disp(in);

    call_crop_test34.m

for i = 1:10
    cropping34(i);
end

    cropping34.m

function [] = cropping34(f)
% Crop test image (f-1) to the bounding box of its black pixels.
b = imread(strcat('C:\Program Files\MATLAB\R2011a\bin\test images new34\', num2str(f-1), '.bmp'));
b = im2bw(b, 0.6);
k = 0;
s = size(b);
for i = 1:s(1)
    for j = 1:s(2)
        if (b(i,j) == 0)       % black pixel found
            k = k + 1;
            arr_x(k) = i;
            arr_y(k) = j;
        end
    end
end
x1 = min(arr_x, [], 2);        % topmost black row
x2 = max(arr_x, [], 2);        % bottommost black row
y1 = min(arr_y, [], 2);        % leftmost black column
y2 = max(arr_y, [], 2);        % rightmost black column
rect = [y1-1 x1-1 y2-y1+1 x2-x1+1];
b = imcrop(b, rect);
imwrite(b, strcat('C:\Program Files\MATLAB\R2011a\bin\test34_cropped\', num2str(f-1), '.bmp'), 'bmp');
end

    prepro_test34.m

for i = 1:10
    b = imread(strcat('C:\Program Files\MATLAB\R2011a\bin\test34_cropped\', num2str(i-1), '.bmp'));
    k = 0;
    dim = size(b);
    b = imresize(b, [40 40]);         % size normalization
    b = bwmorph(~b, 'thin', Inf);     % thinning (strokes are black, so invert first)
    b = ~b;
    imwrite(b, strcat('C:\Program Files\MATLAB\R2011a\bin\test34_thinned\', num2str(i-1), '.bmp'), 'bmp');
end

    centroid_call.m

for i = 1:10
    p(i, 1:2) = getcentroid(i);
end

centroid_call_test34.m

for i = 1:10
    q(i, 1:2) = getcentroid_test34(i);
end

    getcentroid.m

function cent = getcentroid(i)
% Centroid of the black pixels of ideal image (i-1).
f = imread(strcat('C:\MATLAB7\bin\imagedatabase\', num2str(i-1), '.bmp'));
ti = im2bw(f, 0.6);      % binarize (assumed, as in the other routines)
s = size(ti);
c = 0; sumr = 0; sumc = 0;
for i = 1:s(1)
    for j = 1:s(2)
        if (ti(i,j) == 0)          % black pixel
            sumr = sumr + i;
            sumc = sumc + j;
            c = c + 1;
        end
    end
end
cr = sumr / c;
cc = sumc / c;
cent = [cr cc];
end

    getcentroid_test34.m

    function cent_test=getcentroid_test34(i)

f = imread(strcat('C:\Program Files\MATLAB\R2011a\bin\test34_thinned\', num2str(i-1), '.bmp'));
ti = im2bw(f, 0.6);      % binarize (assumed, as in the other routines)
s = size(ti);
c = 0; sumr = 0; sumc = 0;
for i = 1:s(1)
    for j = 1:s(2)
        if (ti(i,j) == 0)          % black pixel
            sumr = sumr + i;
            sumc = sumc + j;
            c = c + 1;
        end
    end
end
cr = sumr / c;
cc = sumc / c;
cent_test = [cr cc];
end

    centroid_seg.m

for i = 1:10
    cent_seg(i, 1:8) = getcentroid_seg(i);
end

    getcentroid_seg.m

function cent = getcentroid_seg(i)
% Per-block centroids of ideal image (i-1).
f = imread(strcat('C:\MATLAB7\bin\imagedatabase\', num2str(i-1), '.bmp'));
ti = im2bw(f, 0.6);      % binarize (assumed, as in getcentroid_seg_test34)
s = size(ti);

c_seg1 = 0; sumr_seg1 = 0; sumc_seg1 = 0;
for i = 1:s(1)/2
    for j = 1:s(2)/2
        if (ti(i,j) == 0)
            sumr_seg1 = sumr_seg1 + i; sumc_seg1 = sumc_seg1 + j; c_seg1 = c_seg1 + 1;
        end
    end
end
cr_seg1 = sumr_seg1/c_seg1; cc_seg1 = sumc_seg1/c_seg1;

%******************************seg2*****************************%
c_seg2 = 0; sumr_seg2 = 0; sumc_seg2 = 0;
for i = 1:s(1)/2
    for j = (s(2)/2)+1:s(2)
        if (ti(i,j) == 0)
            sumr_seg2 = sumr_seg2 + i; sumc_seg2 = sumc_seg2 + j; c_seg2 = c_seg2 + 1;
        end
    end
end
cr_seg2 = sumr_seg2/c_seg2; cc_seg2 = sumc_seg2/c_seg2;

%******************************seg3*****************************%
c_seg3 = 0; sumr_seg3 = 0; sumc_seg3 = 0;
for i = (s(1)/2)+1:s(1)
    for j = (s(2)/2)+1:s(2)
        if (ti(i,j) == 0)
            sumr_seg3 = sumr_seg3 + i; sumc_seg3 = sumc_seg3 + j; c_seg3 = c_seg3 + 1;
        end
    end
end
cr_seg3 = sumr_seg3/c_seg3; cc_seg3 = sumc_seg3/c_seg3;

%******************************seg4*****************************%
c_seg4 = 0; sumr_seg4 = 0; sumc_seg4 = 0;
for i = (s(1)/2)+1:s(1)
    for j = 1:s(2)
        if (ti(i,j) == 0)
            sumr_seg4 = sumr_seg4 + i; sumc_seg4 = sumc_seg4 + j; c_seg4 = c_seg4 + 1;
        end
    end
end
cr_seg4 = sumr_seg4/c_seg4; cc_seg4 = sumc_seg4/c_seg4;

cent = [cr_seg1 cc_seg1 cr_seg2 cc_seg2 cr_seg3 cc_seg3 cr_seg4 cc_seg4];
end

    callcentroid_seg_test.m

for i = 1:10
    cent_seg_test(i, 1:8) = getcentroid_seg_test34(i);
end

    getcentroid_seg_test34.m

function cent = getcentroid_seg_test34(i)
f = imread(strcat('C:\Program Files\MATLAB\R2011a\bin\test34_thinned\', num2str(i-1), '.bmp'));
% showgrid();
% calculation of centroid %
% level = graythresh(f);
% ti = im2bw(f, level);
ti = im2bw(f, 0.6);
s = size(ti);

c_seg1 = 0; sumr_seg1 = 0; sumc_seg1 = 0;
for i = 1:s(1)/2
    for j = 1:s(2)/2
        if (ti(i,j) == 0)
            sumr_seg1 = sumr_seg1 + i; sumc_seg1 = sumc_seg1 + j; c_seg1 = c_seg1 + 1;
        end
    end
end
cr_seg1 = sumr_seg1/c_seg1; cc_seg1 = sumc_seg1/c_seg1;

%******************************seg2*****************************%
c_seg2 = 0; sumr_seg2 = 0; sumc_seg2 = 0;
for i = 1:s(1)/2
    for j = (s(2)/2)+1:s(2)
        if (ti(i,j) == 0)
            sumr_seg2 = sumr_seg2 + i; sumc_seg2 = sumc_seg2 + j; c_seg2 = c_seg2 + 1;
        end
    end
end
cr_seg2 = sumr_seg2/c_seg2; cc_seg2 = sumc_seg2/c_seg2;

%******************************seg3*****************************%
c_seg3 = 0; sumr_seg3 = 0; sumc_seg3 = 0;
for i = (s(1)/2)+1:s(1)
    for j = (s(2)/2)+1:s(2)
        if (ti(i,j) == 0)
            sumr_seg3 = sumr_seg3 + i; sumc_seg3 = sumc_seg3 + j; c_seg3 = c_seg3 + 1;
        end
    end
end
cr_seg3 = sumr_seg3/c_seg3; cc_seg3 = sumc_seg3/c_seg3;

%******************************seg4*****************************%
c_seg4 = 0; sumr_seg4 = 0; sumc_seg4 = 0;
for i = (s(1)/2)+1:s(1)
    for j = 1:s(2)
        if (ti(i,j) == 0)
            sumr_seg4 = sumr_seg4 + i; sumc_seg4 = sumc_seg4 + j; c_seg4 = c_seg4 + 1;
        end
    end
end
cr_seg4 = sumr_seg4/c_seg4; cc_seg4 = sumc_seg4/c_seg4;

cent = [cr_seg1 cc_seg1 cr_seg2 cc_seg2 cr_seg3 cc_seg3 cr_seg4 cc_seg4];
end

centroidseg_dist.m

for i = 1:10
    dist_seg1(i,1) = sqrt((p(i,1)-cent_seg(i,1))^2 + (p(i,2)-cent_seg(i,2))^2);
    dist_seg1(i,2) = sqrt((p(i,1)-cent_seg(i,3))^2 + (p(i,2)-cent_seg(i,4))^2);
    dist_seg1(i,3) = sqrt((p(i,1)-cent_seg(i,5))^2 + (p(i,2)-cent_seg(i,6))^2);
    dist_seg1(i,4) = sqrt((p(i,1)-cent_seg(i,7))^2 + (p(i,2)-cent_seg(i,8))^2);
end

for i = 1:10
    dist_seg1(i,5) = dist_seg1(i,1) + dist_seg1(i,2) + dist_seg1(i,3) + dist_seg1(i,4);
end

    centroidseg_dist_test34.m

for i = 1:10
    dist_seg1_test(i,1) = sqrt((q(i,1)-cent_seg_test(i,1))^2 + (q(i,2)-cent_seg_test(i,2))^2);
    dist_seg1_test(i,2) = sqrt((q(i,1)-cent_seg_test(i,3))^2 + (q(i,2)-cent_seg_test(i,4))^2);
    dist_seg1_test(i,3) = sqrt((q(i,1)-cent_seg_test(i,5))^2 + (q(i,2)-cent_seg_test(i,6))^2);
    dist_seg1_test(i,4) = sqrt((q(i,1)-cent_seg_test(i,7))^2 + (q(i,2)-cent_seg_test(i,8))^2);
end

for i = 1:10
    dist_seg1_test(i,5) = dist_seg1_test(i,1) + dist_seg1_test(i,2) + dist_seg1_test(i,3) + dist_seg1_test(i,4);
end

    cal_diff_seg_dist34.m

for i = 1:10
    for m = 1:10
        diff_seg(i,m) = abs(dist_seg1_test(i,5) - dist_seg1(m,5));
    end
end

    allsegdist_diff.m

sum_seg_dist_four_diff = diff_seg;
for i = 1:10
    for j = 1:10
        if (isnan(sum_seg_dist_four_diff(i,j)))
            sum_seg_dist_four_diff(i,j) = 9999;   % an empty block yields NaN; replace with a large penalty
        end
    end
end

    call_getextrm_ideal.m

for i = 1:10
    extrm(i,1) = getextrm_ideal(i);
end

call_getextrm_test34.m

for i = 1:10
    extrm_test(i,1) = getextrm_test34(i);
end


    getextrm_ideal.m

function xt = getextrm_ideal(i)
% Row-wise range of black pixels for ideal image (i-1): for every row, the
% distance between the leftmost and rightmost black pixels, summed over rows.
f = imread(strcat('C:\MATLAB7\bin\imagedatabase\', num2str(i-1), '.bmp'));
ti = im2bw(f, 0.6);                   % binarize (assumed, as in the other routines)
s = size(ti);
left(40) = zeros(); right(40) = zeros(); diff(40) = zeros();
for i = 1:s(1)                        % leftmost black pixel of each row
    for j = 1:s(2)
        if (ti(i,j) == 0)
            left(i) = j;
            break;
        end
    end
end
for m = s(1):-1:1                     % rightmost black pixel of each row
    for n = s(2):-1:1
        if (ti(m,n) == 0)
            right(m) = n;
            break;
        end
    end
end
diff = abs(right - left);
disp(diff);
xt = 0;
for p = 1:40
    xt = xt + diff(p);
end
disp(xt);
end

getextrm_test34.m

function xt = getextrm_test34(i)
% Row-wise range of black pixels for test image (i-1).
f = imread(strcat('C:\Program Files\MATLAB\R2011a\bin\test34_thinned\', num2str(i-1), '.bmp'));
ti = im2bw(f, 0.6);                   % binarize (assumed, as in the other routines)
s = size(ti);
left(40) = zeros(); right(40) = zeros(); diff(40) = zeros();
for i = 1:s(1)                        % leftmost black pixel of each row
    for j = 1:s(2)
        if (ti(i,j) == 0)
            left(i) = j;
            break;
        end
    end
end
for m = s(1):-1:1                     % rightmost black pixel of each row
    for n = s(2):-1:1
        if (ti(m,n) == 0)
            right(m) = n;
            break;
        end
    end
end
diff = abs(right - left);
disp(diff);
xt = 0;
for p = 1:40
    xt = xt + diff(p);
end
disp(xt);
end

    cal_getextrm_ideal_col.m

for i = 1:10
    extrm_col(i,1) = getextrm_ideal_col(i);
end

cal_getextrm_test_col34.m

for i = 1:10
    extrm_col_test(i,1) = getextrm_test_col34(i);
end


    getextrm_ideal_col.m

function xt = getextrm_ideal_col(i)
% Column-wise range of black pixels for ideal image (i-1): for every column,
% the distance between the topmost and bottommost black pixels, summed over columns.
f = imread(strcat('C:\MATLAB7\bin\imagedatabase\', num2str(i-1), '.bmp'));
ti = im2bw(f, 0.6);                   % binarize (assumed, as in the other routines)
s = size(ti);
top(40) = zeros(); bottom(40) = zeros(); diff(40) = zeros();
for j = 1:s(2)                        % topmost black pixel of each column
    for i = 1:s(1)
        if (ti(i,j) == 0)
            top(j) = i;
            break;
        end
    end
end
for n = 1:s(2)                        % bottommost black pixel of each column
    for m = s(1):-1:1
        if (ti(m,n) == 0)
            bottom(n) = m;
            break;
        end
    end
end
diff = abs(bottom - top);
disp(diff);
xt = 0;
for p = 1:40
    xt = xt + diff(p);
end
disp(xt);
end

    getextrm_test_col34.m

function xt = getextrm_test_col34(i)
% Column-wise range of black pixels for test image (i-1).
f = imread(strcat('C:\Program Files\MATLAB\R2011a\bin\test34_thinned\', num2str(i-1), '.bmp'));
ti = im2bw(f, 0.6);                   % binarize (assumed, as in the other routines)
s = size(ti);
top(40) = zeros(); bottom(40) = zeros(); diff(40) = zeros();
for j = 1:s(2)                        % topmost black pixel of each column
    for i = 1:s(1)
        if (ti(i,j) == 0)
            top(j) = i;
            break;
        end
    end
end
for n = 1:s(2)                        % bottommost black pixel of each column
    for m = s(1):-1:1
        if (ti(m,n) == 0)
            bottom(n) = m;
            break;
        end
    end
end
diff = abs(bottom - top);
disp(diff);
xt = 0;
for p = 1:40
    xt = xt + diff(p);
end
disp(xt);
end

    cal_diff_xtrm.m

for i = 1:10
    for j = 1:10
        diff_xtrm(i,j) = abs(extrm_test(i,1) - extrm(j,1));
    end
end

    cal_diff_xtrm_col.m

for i = 1:10
    for j = 1:10
        diff_xtrm_col(i,j) = abs(extrm_col_test(i,1) - extrm_col(j,1));
    end
end

    caleuclid.m

m = 1;
n = 1;
for i = 1:10
    for j = 1:10
        D1(m,j) = sqrt((q(i,1)-p(j,1))^2 + (q(i,2)-p(j,2))^2);
    end
    m = m + 1;
end
disp(D1);

    summing_run_cent.m

sum_run_cent = sum_seg_dist_four_diff + D1 + diff_xtrm + diff_xtrm_col;
sum_run_cent = sum_run_cent / 100;
disp(sum_run_cent);