© 2014 Cognizant
Cursive Handwriting Recognition
March - 2015
Introduction
Cursive handwriting recognition has been an active area of research and, because of its diverse enterprise applications, it continues to be a challenging research topic in terms of accuracy.
The focus is specifically on recognition of cursive handwritten characters in insurance forms.
Six different types of insurance forms have been used for the current study. These forms are mixtures of printed and handwritten text.
Example fields: phone number, SSN, telephone number, policy number, name, address, dependent details, etc.
Objective
To design a semi-automated method for extracting the cursive handwritten text present in insurance forms while reducing the errors that can occur during extraction and recognition.
The method also attempts to correct recognition errors using Natural Language Processing (NLP).
Challenges
It is challenging to design a practical cursive handwriting recognition system that maintains high accuracy independent of the quality of the input documents.
The complexity of character segmentation stems from the wide variety of fonts, rapidly expanding text styles and poor image characteristics.
Touching, overlapping, separated and broken characters are the major causes of segmentation errors.
Methodology
MATLAB is used as the platform so that an efficient solution can be built within a short period of time.
The Image Processing Toolbox is used to process the captured image, extract the enhanced image snippets and segment them into characters.
Histogram of Oriented Gradients (HOG) and Principal Component Analysis (PCA) are used to construct the feature model.
The Computer Vision Toolbox is used to classify and recognize the segmented characters.
MATLAB utilities are used to build a GUI for displaying the results and to store the extracted data in a database within the MATLAB environment.
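The PCA step named in the methodology can be illustrated with a minimal sketch. It is written in Python with only the standard library rather than with the MATLAB functions the deck used, it recovers just the first principal component by power iteration, and the toy feature vectors and function names are assumptions made for illustration.

```python
import math

def first_principal_component(samples, iterations=200):
    """Return (mean, unit vector of largest variance) using power iteration."""
    n, d = len(samples), len(samples[0])
    mean = [sum(s[j] for s in samples) / n for j in range(d)]
    centered = [[s[j] - mean[j] for j in range(d)] for s in samples]
    # Sample covariance matrix (d x d)
    cov = [[sum(row[i] * row[j] for row in centered) / (n - 1)
            for j in range(d)] for i in range(d)]
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # degenerate data with no variance
            break
        v = [x / norm for x in w]
    return mean, v

def project(sample, mean, direction):
    """Coordinate of one sample along the principal direction."""
    return sum((sample[j] - mean[j]) * direction[j] for j in range(len(sample)))

# Toy 2-D "feature vectors" lying on the line y = x / 2 (illustrative only)
features = [[2.0, 1.0], [4.0, 2.0], [6.0, 3.0], [8.0, 4.0]]
mean, pc1 = first_principal_component(features)
scores = [project(f, mean, pc1) for f in features]
```

In practice several components would be kept and the inputs would be HOG descriptors; the single-component case is shown only to keep the projection step visible.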
The Proposed Model
Input Image (Scanned Insurance Form)
Preprocessing (Snippet Extraction + Segmentation)
Feature Extraction (HOG-PCA)
Classification and Recognition (Neural Network)
Natural Language Processing (Error Correction)
Manual Error Correction (Low-confidence case)
Store the Extracted Data
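The low-confidence branch of the model can be sketched as a simple routing step. This is an illustration in Python, not the deck's MATLAB code: the threshold value, the `FieldResult` type and the `route` function are hypothetical names, and the slides do not state which confidence cut-off was used.

```python
from dataclasses import dataclass

# Hypothetical threshold; the slides do not give the value that was used.
CONFIDENCE_THRESHOLD = 0.85

@dataclass
class FieldResult:
    name: str          # form field, e.g. "policy number"
    text: str          # recognized text after automatic correction
    confidence: float  # classifier confidence in [0, 1]

def route(results, threshold=CONFIDENCE_THRESHOLD):
    """Split recognized fields into auto-accepted and manual-review lists."""
    accepted, review = [], []
    for r in results:
        (accepted if r.confidence >= threshold else review).append(r)
    return accepted, review

results = [
    FieldResult("policy number", "A1234567", 0.97),
    FieldResult("name", "Jonh Smith", 0.62),
]
accepted, review = route(results)
```

Fields in `accepted` would go straight to storage, while fields in `review` would be shown to an operator for manual correction.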
Preprocessing
The following series of operations needs to be performed on the scanned input image:
Tagging of forms (manual process): snippet extraction
Segmentation (each image snippet is segmented into individual glyphs or letters)
Labeling (assigning a number to each segmented character)
Image enhancement: noise filtering, smoothing the edges, normalization, dilation
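The segmentation and labeling steps can be sketched with connected-component labeling, which is conceptually what `bwlabel` does. The version below is a plain Python illustration under the assumption that the snippet is already binarized; the function name and the toy snippet are made up for the example.

```python
from collections import deque

def label_components(image):
    """Label 8-connected foreground regions of a binary image (rows of 0/1).

    Returns (labels, count): labels holds 0 for background and 1..count
    for each region, numbered left to right.
    """
    rows, cols = len(image), len(image[0])
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    # Scan column by column so that glyphs are numbered left to right
    for c in range(cols):
        for r in range(rows):
            if image[r][c] and not labels[r][c]:
                count += 1
                labels[r][c] = count
                queue = deque([(r, c)])
                while queue:  # breadth-first flood fill of one region
                    y, x = queue.popleft()
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = y + dy, x + dx
                            if (0 <= ny < rows and 0 <= nx < cols
                                    and image[ny][nx] and not labels[ny][nx]):
                                labels[ny][nx] = count
                                queue.append((ny, nx))
    return labels, count

# Toy binarized snippet containing three separate strokes
snippet = [
    [1, 1, 0, 0, 1, 0, 0, 1],
    [1, 0, 0, 0, 1, 0, 1, 1],
    [1, 1, 0, 0, 1, 0, 0, 1],
]
labels, count = label_components(snippet)
```

Touching cursive characters would fall into a single component under this scheme, which is one reason segmentation is listed among the challenges.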
Feature Extraction
HOG-PCA features describe the relevant structural information contained in a pattern.
A Histogram of Oriented Gradients (HOG) descriptor is computed and then projected onto a linear subspace using Principal Component Analysis (PCA).
The resulting features are robust under illumination, pose and viewpoint changes.
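The core of a HOG descriptor is an orientation histogram of image gradients. The sketch below computes that histogram for a single cell in plain Python. It is a simplification and not what `extractHOGFeatures` does internally: it omits bin interpolation and block normalization, and the 9-bin unsigned layout is assumed as a common default.

```python
import math

def hog_cell_histogram(cell, bins=9):
    """Unsigned-orientation gradient histogram for one grayscale cell.

    `cell` is a list of rows of intensities; border pixels are skipped
    because central differences need both neighbours.
    """
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    for y in range(1, len(cell) - 1):
        for x in range(1, len(cell[0]) - 1):
            gx = cell[y][x + 1] - cell[y][x - 1]  # horizontal central difference
            gy = cell[y + 1][x] - cell[y - 1][x]  # vertical central difference
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle // bin_width) % bins] += magnitude
    # L2-normalize so the histogram is less sensitive to contrast
    norm = math.sqrt(sum(v * v for v in hist)) or 1.0
    return [v / norm for v in hist]

# Toy cell with a vertical edge: dark left half, bright right half
cell = [[0, 0, 0, 255, 255, 255] for _ in range(6)]
hist = hog_cell_histogram(cell)
```

For this toy cell all gradient energy falls into the first bin, because the intensity changes only in the horizontal direction.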
Classification
A Neural Network (NN) is a powerful classifier that is well suited to classifying HOG-PCA features.
A Feed-Forward Back-Propagation Neural Network (FFBPN) is used.
The neural classifier consists of two hidden layers in addition to an input layer and an output layer.
The output layer has 62 neurons (26 upper-case letters, 26 lower-case letters and 10 digits), as the proposed system is designed to identify numeric characters and alphabets.
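The layer structure described above can be sketched as a forward pass through two hidden layers and a 62-way output. This Python illustration uses random, untrained weights and leaves out back-propagation training. The input size, the tanh and softmax activations and the use of 30 units in each hidden layer are assumptions, since the slides do not specify them.

```python
import math
import random
import string

# 62 output classes: 26 upper-case letters, 26 lower-case letters, 10 digits
CLASSES = string.ascii_uppercase + string.ascii_lowercase + string.digits

def make_layer(n_in, n_out, rng):
    """Random weights and zero biases for one fully connected layer."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def dense(x, layer, activation):
    """Apply one fully connected layer followed by an activation."""
    weights, biases = layer
    return [activation(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, biases)]

def softmax(z):
    m = max(z)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def predict(x, layers):
    """Return (predicted character, confidence) for one feature vector."""
    h1 = dense(x, layers[0], math.tanh)
    h2 = dense(h1, layers[1], math.tanh)
    probs = softmax(dense(h2, layers[2], lambda v: v))
    best = max(range(len(probs)), key=probs.__getitem__)
    return CLASSES[best], probs[best]

rng = random.Random(0)
n_features, n_hidden = 40, 30  # hypothetical sizes, not taken from the slides
layers = [make_layer(n_features, n_hidden, rng),
          make_layer(n_hidden, n_hidden, rng),
          make_layer(n_hidden, len(CLASSES), rng)]
x = [rng.random() for _ in range(n_features)]
char, confidence = predict(x, layers)
```

With untrained weights the predicted character is arbitrary; the point is only the shape of the network and how a confidence value can be read off the output layer.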
Error Correction Using Natural Language Processing (NLP)
An NLP-aided dictionary of insurance terms is built.
It stores user terms extracted from previous history (the database), which improves recognition accuracy.
It prevents repeated misrecognitions.
It also avoids the user stress caused by repeated recognition failures.
It is used to modify the classification model, which helps to build an adaptive classification model.
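Dictionary-aided correction can be illustrated with fuzzy matching against a term list. The slides do not say which matching technique was used, so the sketch below substitutes Python's standard `difflib` similarity matching; the dictionary contents, the cut-off value and the `correct` function are hypothetical.

```python
import difflib

# Hypothetical dictionary; in the deck it is built from insurance terms
# and from user terms stored in the database.
DICTIONARY = ["policy", "premium", "dependent", "beneficiary", "insured", "claim"]

def correct(word, dictionary=DICTIONARY, cutoff=0.75):
    """Return the closest dictionary term, or the word unchanged if none is close."""
    matches = difflib.get_close_matches(word.lower(), dictionary, n=1, cutoff=cutoff)
    return matches[0] if matches else word

fixed_a = correct("prernium")   # "rn" misread in place of "m"
fixed_b = correct("dependant")  # single wrong character
fixed_c = correct("xyz")        # nothing similar, so left unchanged
```

Adding confirmed user corrections to the dictionary over time is one way to get the adaptive behaviour the slide describes.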
Tools used
MATLAB R2013b
  Purpose: generic math functions present in MATLAB
Image Processing Toolbox
  Functions: imhist, histeq, dilate, bwlabel, imadjust, adapthisteq, imfilter, imopen, imclose, etc.
  Purpose: supports a wide range of image processing operations, including noise filtering, histogram operations, enhancement and normalization
Computer Vision Toolbox
  Functions: extractHOGFeatures
  Purpose: to extract HOG features from the input image
Statistics Toolbox
  Functions: prepca, prestd, trapca, etc.
  Purpose: principal component analysis on the input data
Neural Network Toolbox
  Functions: nntool
  Purpose: to classify and correctly recognize objects based on the observed features
Database Toolbox
  Functions: database, isconnection, set, sql2native
  Purpose: to build a database to store the extracted data
Graphical User Interface (GUI)
  Functions: GUIDE
  Purpose: to display the results within the MATLAB environment
Results
Training
Number of characters used for training (alphabets, numeric and alphanumeric): 20,000
Number of characters used for testing: 15,000
Testing
Numeric recognition: 96% accuracy at an average confidence level of 95%.
Alphabet recognition: 81% accuracy at an average confidence level of 85%.
Average (Upper, Lower and Numeric) Character Recognition Results
Hidden units    Classification rate (%) with NLP    Classification rate (%) without NLP
10              82.25                               80.08
20              84.93                               82.00
30              85.83                               83.84
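The effect of NLP correction can be read off the table as a difference in percentage points. The short Python snippet below does that arithmetic on the three rows; the variable names are illustrative.

```python
# (hidden units, rate with NLP, rate without NLP), copied from the table
results = [(10, 82.25, 80.08), (20, 84.93, 82.00), (30, 85.83, 83.84)]

# Gain from NLP correction, in percentage points, per network size
gains = {hidden: round(with_nlp - without_nlp, 2)
         for hidden, with_nlp, without_nlp in results}

best_hidden = max(results, key=lambda row: row[1])[0]
```

The gain is roughly 2 to 3 percentage points at every network size, and the best rate in the table is obtained with 30 hidden units.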
[Figure: Receiver Operating Characteristic plots for Numeric Recognition and Alphabets Recognition]
Screen shots
Graphical User Interface
Conclusion
A new NLP-based cursive handwriting recognition approach has been presented that produces promising results.
MATLAB is used to build an efficient cursive handwritten character recognition system within a short span of time.
NLP-aided error correction improves the classification rate by roughly 2 to 3 percentage points (for example, from 83.84% to 85.83% with 30 hidden units).
The recognition rates on the various insurance forms are very promising in a real-time environment.