character recognition: scope and challenges
Post on 18-Oct-2014
DESCRIPTION
Useful in research on character recognition in general and Devnagari character recognition in particular.
TRANSCRIPT
![Page 1: character recognition: Scope and challenges](https://reader033.vdocument.in/reader033/viewer/2022061106/5442e0a1afaf9f0e118b477d/html5/thumbnails/1.jpg)
04/07/23 Devnagari Character Recognition 1of 62
by Vikas J. Dongre
Lecturer in Electronics, Government Polytechnic, Gondia
![Page 2: character recognition: Scope and challenges](https://reader033.vdocument.in/reader033/viewer/2022061106/5442e0a1afaf9f0e118b477d/html5/thumbnails/2.jpg)
Contents
• Introduction
• Scope
• Features of Devnagari Script
• Image Preprocessing
• Feature Extraction
• Character Classification
• Post processing
• Character Recognition challenges
• Current research results
![Page 3: character recognition: Scope and challenges](https://reader033.vdocument.in/reader033/viewer/2022061106/5442e0a1afaf9f0e118b477d/html5/thumbnails/3.jpg)
OCR (Optical Character Recognition)
Character recognition is a part of pattern or object recognition, with a special focus on natural language processing (NLP).
“…a system that provides a full alphanumeric recognition of printed or handwritten characters at electronic speed by simply scanning the document.”
Documents are scanned with a scanner; the recognition engine of the OCR system then interprets the images and turns images of handwritten or printed characters into ASCII data (machine-readable characters).
![Page 4: character recognition: Scope and challenges](https://reader033.vdocument.in/reader033/viewer/2022061106/5442e0a1afaf9f0e118b477d/html5/thumbnails/4.jpg)
Some applications
• Postal address reading
• Check reading
• Census data collection and processing
• Image document reading
• Digitizing old books in editable form
• Extended research:
  • Text-to-speech conversion (e-book reading)
  • Visually impaired users should be able to access computers in their native Indian languages
![Page 5: character recognition: Scope and challenges](https://reader033.vdocument.in/reader033/viewer/2022061106/5442e0a1afaf9f0e118b477d/html5/thumbnails/5.jpg)
Postal Address Recognition
![Page 6: character recognition: Scope and challenges](https://reader033.vdocument.in/reader033/viewer/2022061106/5442e0a1afaf9f0e118b477d/html5/thumbnails/6.jpg)
![Page 7: character recognition: Scope and challenges](https://reader033.vdocument.in/reader033/viewer/2022061106/5442e0a1afaf9f0e118b477d/html5/thumbnails/7.jpg)
![Page 8: character recognition: Scope and challenges](https://reader033.vdocument.in/reader033/viewer/2022061106/5442e0a1afaf9f0e118b477d/html5/thumbnails/8.jpg)
![Page 9: character recognition: Scope and challenges](https://reader033.vdocument.in/reader033/viewer/2022061106/5442e0a1afaf9f0e118b477d/html5/thumbnails/9.jpg)
![Page 10: character recognition: Scope and challenges](https://reader033.vdocument.in/reader033/viewer/2022061106/5442e0a1afaf9f0e118b477d/html5/thumbnails/10.jpg)
Prime commitments
![Page 11: character recognition: Scope and challenges](https://reader033.vdocument.in/reader033/viewer/2022061106/5442e0a1afaf9f0e118b477d/html5/thumbnails/11.jpg)
International Scenario (Source: IBM)
Internet Users by Language (chart; legend: English, Chinese, Japanese, Spanish, German, French, Korean, Italian, Portuguese, Dutch, Other)
![Page 12: character recognition: Scope and challenges](https://reader033.vdocument.in/reader033/viewer/2022061106/5442e0a1afaf9f0e118b477d/html5/thumbnails/12.jpg)
International Scenario (Source: IBM)
Internet Users: Growth (chart; legend: English, Chinese, Japanese, Spanish, German, French, Korean, Italian, Portuguese, Dutch, Other)
![Page 13: character recognition: Scope and challenges](https://reader033.vdocument.in/reader033/viewer/2022061106/5442e0a1afaf9f0e118b477d/html5/thumbnails/13.jpg)
Main Research Themes
• Online Character Recognition
• Printed Text Recognition
• Handwriting Recognition
• Language Recognition
• Graphics Document Recognition
• Document Understanding
• Tables and Forms Processing
• Document Engineering
![Page 14: character recognition: Scope and challenges](https://reader033.vdocument.in/reader033/viewer/2022061106/5442e0a1afaf9f0e118b477d/html5/thumbnails/14.jpg)
Introduction to Devnagari character Recognition
• Devnagari optical character recognition (DOCR) is more complicated than OCR for English.
• Various soft computing tools used in other types of pattern recognition and image processing can also be applied to DOCR.
![Page 15: character recognition: Scope and challenges](https://reader033.vdocument.in/reader033/viewer/2022061106/5442e0a1afaf9f0e118b477d/html5/thumbnails/15.jpg)
Features Of Devnagari Script
• Devnagari is the most popular script in India.
• Hindi, an official language of India, is written in the Devnagari script.
• It is also used for writing the Marathi, Konkani, Sanskrit and Nepali languages.
• Moreover, Hindi is often ranked as the third most widely spoken language in the world.
• The alphabet set tends to be quite large.
• It has 11 vowels and 33 consonants as basic characters.
• Compound characters can be formed by joining characters in various ways.
• Characters have a horizontal line at the upper part, known as the Shirorekha or headline.
![Page 16: character recognition: Scope and challenges](https://reader033.vdocument.in/reader033/viewer/2022061106/5442e0a1afaf9f0e118b477d/html5/thumbnails/16.jpg)
Vowels and Corresponding Modifiers
Consonants
Half Form of Consonants with Vertical Bar
![Page 17: character recognition: Scope and challenges](https://reader033.vdocument.in/reader033/viewer/2022061106/5442e0a1afaf9f0e118b477d/html5/thumbnails/17.jpg)
Examples of Combination of Half-Consonant and Consonant
Examples of Special Combination of Half-Consonant and Consonant.
Special Symbols
![Page 18: character recognition: Scope and challenges](https://reader033.vdocument.in/reader033/viewer/2022061106/5442e0a1afaf9f0e118b477d/html5/thumbnails/18.jpg)
Character recognition Process
• Image digitization using scanner
• Image pre-processing
• Character segmentation
• Feature extraction & normalization
• Character classifier
• Storing character in text file
![Page 19: character recognition: Scope and challenges](https://reader033.vdocument.in/reader033/viewer/2022061106/5442e0a1afaf9f0e118b477d/html5/thumbnails/19.jpg)
Image Preprocessing
• Thresholding & Binarization
• Noise Reduction
• Segmentation
• Skew Detection and Correction
• Size Normalization
• Thinning
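The thresholding and binarization step can be illustrated with a minimal sketch. The slides do not name a specific thresholding method; Otsu's global threshold is assumed here purely for illustration, and the function names `otsu_threshold` and `binarize` are hypothetical. The input is assumed to be an 8-bit grayscale NumPy array with dark ink on light paper.

```python
import numpy as np

def otsu_threshold(gray):
    """Return Otsu's threshold for an 8-bit grayscale image (values 0..255)."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                   # probability of the dark class up to each level
    mu = np.cumsum(prob * np.arange(256))     # cumulative mean up to each level
    mu_total = mu[-1]
    denom = omega * (1.0 - omega)
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_total * omega - mu) ** 2 / denom   # between-class variance
    sigma_b[denom == 0] = 0.0                 # levels where one class is empty
    return int(np.argmax(sigma_b))

def binarize(gray):
    """Map dark ink to 1 and light paper to 0 (assumes dark text on a light page)."""
    return (gray <= otsu_threshold(gray)).astype(np.uint8)

# Synthetic example: a dark stroke (value 30) on light paper (value 220)
page = np.full((10, 10), 220, dtype=np.uint8)
page[2:8, 4:6] = 30
binary = binarize(page)
```

A global threshold works best on evenly lit scans; unevenly lit or degraded pages would likely need a local (adaptive) threshold instead.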
![Page 20: character recognition: Scope and challenges](https://reader033.vdocument.in/reader033/viewer/2022061106/5442e0a1afaf9f0e118b477d/html5/thumbnails/20.jpg)
Preprocessed images: (a) original, (b) segmented, (c) Shirorekha removed, (d) thinned, (e) edge image
![Page 21: character recognition: Scope and challenges](https://reader033.vdocument.in/reader033/viewer/2022061106/5442e0a1afaf9f0e118b477d/html5/thumbnails/21.jpg)
Slant Correction
• The dominant slope of the word is found as the angle whose slope-corrected word gives the minimum entropy of the vertical projection histogram. The vertical projection histogram is calculated for a range of angles ±R. In our case R = 60, which seems to cover all writing styles. The slope of the word, $\alpha_m$, is found from:

$$\alpha_m = \arg\min_{\alpha \in [-R,\, R]} H_\alpha, \qquad H = -\sum_{i=1}^{N} p_i \log p_i$$

• The character is then corrected by using:

$$x' = x - y \tan(\alpha_m), \qquad y' = y$$
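The minimum-entropy search described on this slide can be sketched as follows. This is only an illustration under stated assumptions: the image is a binary NumPy array, $p_i$ is taken to be the fraction of ink pixels falling in column $i$, the shear is taken as $x' = x - y\tan(\alpha)$ with $y$ measured as the row index from the top, and the angles are searched in 1-degree steps. The function names are hypothetical.

```python
import numpy as np

def shear(img, angle_deg):
    """Shear a binary image horizontally: x' = x - y * tan(angle), y' = y."""
    h, w = img.shape
    t = np.tan(np.radians(angle_deg))
    pad = int(np.ceil(abs(t) * h))             # room for pixels shifted left or right
    out = np.zeros((h, w + 2 * pad), dtype=np.uint8)
    ys, xs = np.nonzero(img)
    new_x = np.round(xs - ys * t).astype(int) + pad
    out[ys, new_x] = 1
    return out

def projection_entropy(img):
    """Entropy H = -sum(p_i * log p_i) of the vertical projection histogram."""
    hist = img.sum(axis=0).astype(float)
    p = hist[hist > 0] / hist.sum()            # empty columns contribute nothing
    return float(-(p * np.log(p)).sum())

def estimate_slant(img, R=60):
    """Return the angle in [-R, R] (degrees) whose sheared image has minimum entropy."""
    return min(range(-R, R + 1), key=lambda a: projection_entropy(shear(img, a)))

# Synthetic example: a single stroke slanted by 30 degrees
stroke = np.zeros((40, 80), dtype=np.uint8)
for y in range(40):
    stroke[y, 20 + int(round(y * np.tan(np.radians(30))))] = 1
angle = estimate_slant(stroke)
corrected = shear(stroke, angle)
```

An upright stroke concentrates its ink in few columns, so its projection histogram is sharply peaked and has low entropy; this is why the minimum marks the slant angle.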
![Page 22: character recognition: Scope and challenges](https://reader033.vdocument.in/reader033/viewer/2022061106/5442e0a1afaf9f0e118b477d/html5/thumbnails/22.jpg)
Skew Correction
![Page 23: character recognition: Scope and challenges](https://reader033.vdocument.in/reader033/viewer/2022061106/5442e0a1afaf9f0e118b477d/html5/thumbnails/23.jpg)
Feature Extraction
A set of features is extracted for each class that helps distinguish it from other classes while remaining invariant to characteristic differences within the class. Various methods are:
• Global Transformation and Series Expansion
• Statistical Features
• Geometrical and Topological Features
![Page 24: character recognition: Scope and challenges](https://reader033.vdocument.in/reader033/viewer/2022061106/5442e0a1afaf9f0e118b477d/html5/thumbnails/24.jpg)
Global Transformation and Series Expansion
• Fourier Transforms
• Gabor Transform
• Wavelets
• Moments
• Karhunen-Loeve (KL) Expansion
Statistical Features
• Zoning
• Crossings and Distances
• Projections
![Page 25: character recognition: Scope and challenges](https://reader033.vdocument.in/reader033/viewer/2022061106/5442e0a1afaf9f0e118b477d/html5/thumbnails/25.jpg)
Geometrical and Topological Features
• Extracting and Counting Topological Structures
• Measuring and Approximating the Geometrical Properties
• Coding
• Graphs and Trees
![Page 26: character recognition: Scope and challenges](https://reader033.vdocument.in/reader033/viewer/2022061106/5442e0a1afaf9f0e118b477d/html5/thumbnails/26.jpg)
Zoning
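As a rough illustration of zoning, the sketch below splits a binary character image into a grid and uses the ink density of each zone as a feature. The 4×4 default grid, the density measure, and the name `zoning_features` are assumptions made for this example; the slide itself only shows a figure.

```python
import numpy as np

def zoning_features(char_img, rows=4, cols=4):
    """Split a binary character image into rows x cols zones; return ink density per zone."""
    h, w = char_img.shape
    row_edges = np.linspace(0, h, rows + 1).astype(int)
    col_edges = np.linspace(0, w, cols + 1).astype(int)
    feats = []
    for r in range(rows):
        for c in range(cols):
            zone = char_img[row_edges[r]:row_edges[r + 1], col_edges[c]:col_edges[c + 1]]
            feats.append(float(zone.mean()) if zone.size else 0.0)
    return np.array(feats)

# Synthetic example: ink only in the top-left quarter of an 8x8 image
img = np.zeros((8, 8), dtype=np.uint8)
img[:4, :4] = 1
features = zoning_features(img, rows=2, cols=2)
```

The resulting vector has a fixed length (rows × cols) regardless of image size, which is convenient as input to the classifiers listed later.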
![Page 27: character recognition: Scope and challenges](https://reader033.vdocument.in/reader033/viewer/2022061106/5442e0a1afaf9f0e118b477d/html5/thumbnails/27.jpg)
Structural Features
![Page 28: character recognition: Scope and challenges](https://reader033.vdocument.in/reader033/viewer/2022061106/5442e0a1afaf9f0e118b477d/html5/thumbnails/28.jpg)
![Page 29: character recognition: Scope and challenges](https://reader033.vdocument.in/reader033/viewer/2022061106/5442e0a1afaf9f0e118b477d/html5/thumbnails/29.jpg)
![Page 30: character recognition: Scope and challenges](https://reader033.vdocument.in/reader033/viewer/2022061106/5442e0a1afaf9f0e118b477d/html5/thumbnails/30.jpg)
Character Classification
OCR systems extensively use the methodologies of pattern recognition, which assigns an unknown sample to a predefined class. Various methods are:
• Template Matching.
• Statistical Techniques.
• Neural Networks.
• Support Vector Machine (SVM) algorithms.
• Combination classifier.
![Page 31: character recognition: Scope and challenges](https://reader033.vdocument.in/reader033/viewer/2022061106/5442e0a1afaf9f0e118b477d/html5/thumbnails/31.jpg)
Character Classification…
Template Matching
This is the simplest way of character recognition. The recognition rate of this method is very sensitive to noise and image deformation. Various methods are:
• Euclidean Distance
• Mahalanobis, Jaccard or Yule similarity measures
• K-Nearest Neighbor measurements
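A minimal sketch of template matching with two of the measures named on this slide, Euclidean distance and Jaccard similarity, might look like this. It assumes all images are binary NumPy arrays already normalized to the same size; the template shapes and labels are made up for the example.

```python
import numpy as np

def euclidean_distance(a, b):
    """Euclidean distance between two equally sized images."""
    return float(np.sqrt(((a.astype(float) - b.astype(float)) ** 2).sum()))

def jaccard_similarity(a, b):
    """Ratio of shared ink pixels to all ink pixels in either image."""
    a, b = a.astype(bool), b.astype(bool)
    union = np.logical_or(a, b).sum()
    return float(np.logical_and(a, b).sum() / union) if union else 1.0

def match_template(sample, templates):
    """Return the label of the stored template nearest to the sample."""
    return min(templates, key=lambda label: euclidean_distance(sample, templates[label]))

# Two toy 5x5 templates (hypothetical labels): a vertical bar and a horizontal dash
bar = np.zeros((5, 5), dtype=np.uint8)
bar[:, 2] = 1
dash = np.zeros((5, 5), dtype=np.uint8)
dash[2, :] = 1
templates = {"bar": bar, "dash": dash}

noisy = bar.copy()
noisy[0, 0] = 1          # one pixel of noise
label = match_template(noisy, templates)
```

Because every pixel contributes equally to the distance, a small shift or deformation of the sample changes many pixels at once, which is the sensitivity the slide warns about.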
![Page 32: character recognition: Scope and challenges](https://reader033.vdocument.in/reader033/viewer/2022061106/5442e0a1afaf9f0e118b477d/html5/thumbnails/32.jpg)
Character Classification…
Statistical Techniques
• Likelihood or Bayes classifier
• Clustering Analysis
• Hidden Markov Modeling (HMM)
• Fuzzy Set Reasoning
• Quadratic classifier
![Page 33: character recognition: Scope and challenges](https://reader033.vdocument.in/reader033/viewer/2022061106/5442e0a1afaf9f0e118b477d/html5/thumbnails/33.jpg)
Character Classification…
Neural Networks
• Multilayer perceptron (MLP)
• Kohonen's Self Organizing Map (SOM)
• Back Propagation algorithm
Support Vector Machine (SVM) algorithms
![Page 34: character recognition: Scope and challenges](https://reader033.vdocument.in/reader033/viewer/2022061106/5442e0a1afaf9f0e118b477d/html5/thumbnails/34.jpg)
Character Classification…
Combination Classifier
• ANN and HMM
• K-Means and SVM
• MLP and SVM
• MLP and minimum edit
• SVM and ANN
• Fuzzy neural network
• NN, fuzzy logic and genetic algorithm
![Page 35: character recognition: Scope and challenges](https://reader033.vdocument.in/reader033/viewer/2022061106/5442e0a1afaf9f0e118b477d/html5/thumbnails/35.jpg)
Post processing
• Save the recognized text in a text file.
• Refine the OCR output using spell check, grammar check and other knowledge source comparisons.
• Use the text in other applications through standard word processors.
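The spell-check refinement can be sketched with Python's standard `difflib` module, which finds the closest dictionary entry to an OCR output word. The tiny lexicon, the 0.7 similarity cutoff, and the name `correct_word` are assumptions for illustration; a real system would use a full dictionary and language-specific rules.

```python
import difflib

def correct_word(word, lexicon, cutoff=0.7):
    """Replace an OCR output word with its closest lexicon entry, if one is similar enough."""
    if word in lexicon:
        return word
    matches = difflib.get_close_matches(word, lexicon, n=1, cutoff=cutoff)
    return matches[0] if matches else word

# Hypothetical lexicon and an OCR output where the last letter was misread
lexicon = ["भारत", "नागरी", "अक्षर"]
fixed = correct_word("भारन", lexicon)
unchanged = correct_word("xyz", lexicon)
```

Words with no sufficiently close entry are left untouched, so proper nouns missing from the lexicon are not silently changed.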
![Page 36: character recognition: Scope and challenges](https://reader033.vdocument.in/reader033/viewer/2022061106/5442e0a1afaf9f0e118b477d/html5/thumbnails/36.jpg)
Some Research results
Scanned document (input image)
![Page 37: character recognition: Scope and challenges](https://reader033.vdocument.in/reader033/viewer/2022061106/5442e0a1afaf9f0e118b477d/html5/thumbnails/37.jpg)
Paragraph Segmentation
![Page 38: character recognition: Scope and challenges](https://reader033.vdocument.in/reader033/viewer/2022061106/5442e0a1afaf9f0e118b477d/html5/thumbnails/38.jpg)
Segmented Paragraph
![Page 39: character recognition: Scope and challenges](https://reader033.vdocument.in/reader033/viewer/2022061106/5442e0a1afaf9f0e118b477d/html5/thumbnails/39.jpg)
Segmented Paragraph
![Page 40: character recognition: Scope and challenges](https://reader033.vdocument.in/reader033/viewer/2022061106/5442e0a1afaf9f0e118b477d/html5/thumbnails/40.jpg)
Zero pixel zone
Line Segmentation
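The zero pixel zones on this slide suggest splitting lines wherever the horizontal projection of the page is empty. The sketch below assumes a binary NumPy array with ink as 1 and no skew; `segment_lines` is a hypothetical name.

```python
import numpy as np

def segment_lines(binary_page):
    """Return (top, bottom) row ranges of text lines, split at rows with zero ink pixels."""
    row_has_ink = binary_page.sum(axis=1) > 0
    lines, start = [], None
    for y, ink in enumerate(row_has_ink):
        if ink and start is None:
            start = y                          # first inked row of a new line
        elif not ink and start is not None:
            lines.append((start, y))           # zero pixel zone ends the line
            start = None
    if start is not None:
        lines.append((start, len(row_has_ink)))
    return lines

# Synthetic page with two text lines separated by blank rows
page = np.zeros((12, 10), dtype=np.uint8)
page[1:4, 2:8] = 1
page[6:9, 1:9] = 1
lines = segment_lines(page)
```

Each returned range can be used to crop one line, for example `page[top:bottom]`. Skewed pages or lines whose modifiers touch would leave no empty rows, so skew correction needs to happen first.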
![Page 41: character recognition: Scope and challenges](https://reader033.vdocument.in/reader033/viewer/2022061106/5442e0a1afaf9f0e118b477d/html5/thumbnails/41.jpg)
Line Segmentation
![Page 42: character recognition: Scope and challenges](https://reader033.vdocument.in/reader033/viewer/2022061106/5442e0a1afaf9f0e118b477d/html5/thumbnails/42.jpg)
Word Segmentation
Devnagari Word
Individual Devnagari symbols
Segmented word
![Page 43: character recognition: Scope and challenges](https://reader033.vdocument.in/reader033/viewer/2022061106/5442e0a1afaf9f0e118b477d/html5/thumbnails/43.jpg)
Word Segmentation
Devnagari Word
Individual Devnagari symbols
Segmented word
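One common way to obtain individual symbols from a Devnagari word, sketched here as an assumption rather than as the method behind these slides, is to blank out the Shirorekha and then split at empty columns. The headline is taken to be any row whose ink count is at least 80% of the maximum row count; that ratio and the function names are illustrative choices.

```python
import numpy as np

def remove_shirorekha(word_img, ratio=0.8):
    """Blank out the headline, taken as rows whose ink count is near the maximum."""
    rows = word_img.sum(axis=1)
    out = word_img.copy()
    out[rows >= ratio * rows.max()] = 0
    return out

def segment_symbols(word_img):
    """Return (left, right) column ranges of symbols after headline removal."""
    col_has_ink = remove_shirorekha(word_img).sum(axis=0) > 0
    spans, start = [], None
    for x, ink in enumerate(col_has_ink):
        if ink and start is None:
            start = x
        elif not ink and start is not None:
            spans.append((start, x))
            start = None
    if start is not None:
        spans.append((start, len(col_has_ink)))
    return spans

# Synthetic word: a full-width headline on row 1 joining two cross-shaped symbols
word = np.zeros((10, 12), dtype=np.uint8)
word[1, :] = 1
word[2:9, 2] = 1
word[5, 1:4] = 1
word[2:9, 8] = 1
word[5, 7:10] = 1
symbols = segment_symbols(word)
```

This simple split fails for compound letters and for modifiers above or below the character, which is the segmentation difficulty listed among the challenges.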
![Page 44: character recognition: Scope and challenges](https://reader033.vdocument.in/reader033/viewer/2022061106/5442e0a1afaf9f0e118b477d/html5/thumbnails/44.jpg)
Some observations
• Experiments with degraded text images show that the chief source of error is at the level of segmentation of characters.
• A similar situation exists for recognition of handwritten texts.
• Error rates are at acceptable levels for the other stages, i.e. line segmentation, word segmentation, character recognition, etc.
![Page 45: character recognition: Scope and challenges](https://reader033.vdocument.in/reader033/viewer/2022061106/5442e0a1afaf9f0e118b477d/html5/thumbnails/45.jpg)
Character classification
Input characters: 54
Recognized characters:
• Correct = 42
• Incorrect = 9
• Not recognized: 3
• Accuracy = 77.8 %
Features used:
• Filled Area
• Euler Number
• Perimeter
• Convex Area
Classifier used: Absolute difference
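The reported accuracy follows from 42 correct out of 54 input characters. The sketch below reproduces that arithmetic and shows one plausible reading of an absolute difference classifier: pick the class whose stored feature vector has the smallest sum of absolute differences from the sample. The prototype values and class names are invented for illustration and do not come from the slides.

```python
import numpy as np

def classify(features, prototypes):
    """Return the label whose prototype has the smallest sum of absolute differences."""
    f = np.asarray(features, dtype=float)
    return min(prototypes,
               key=lambda label: np.abs(f - np.asarray(prototypes[label], dtype=float)).sum())

def accuracy(correct, total):
    """Recognition accuracy in percent."""
    return 100.0 * correct / total

# Hypothetical prototypes: [filled area, Euler number, perimeter, convex area]
prototypes = {
    "class_a": [310, 0, 120, 520],
    "class_b": [250, 1, 100, 430],
}
label = classify([300, 1, 118, 515], prototypes)

# Figures from the slide: 42 correct + 9 incorrect + 3 not recognized = 54 characters
reported = round(accuracy(42, 54), 1)
```

Since areas are numerically much larger than the Euler number, unscaled absolute differences are dominated by the area features; normalizing each feature first would be a sensible refinement.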
![Page 46: character recognition: Scope and challenges](https://reader033.vdocument.in/reader033/viewer/2022061106/5442e0a1afaf9f0e118b477d/html5/thumbnails/46.jpg)
Research Publications
Vikas J Dongre, Vijay H Mankar, “A Review of Research on Devnagari Character Recognition”, International Journal of Computer Applications (0975 – 8887) Volume 12– No.2, pp. 8-15, November 2010.
![Page 47: character recognition: Scope and challenges](https://reader033.vdocument.in/reader033/viewer/2022061106/5442e0a1afaf9f0e118b477d/html5/thumbnails/47.jpg)
Complexity in Indic writing
![Page 48: character recognition: Scope and challenges](https://reader033.vdocument.in/reader033/viewer/2022061106/5442e0a1afaf9f0e118b477d/html5/thumbnails/48.jpg)
Devnagari Character recognition challenges -1
• Devnagari is a two-dimensional script, as consonants are modified in many ways to form a meaningful letter.
• The same is true for its recognition.
• The recognizer has to identify all the modifiers present in a letter.
• The generated ISCII or Unicode codes are then combined properly to display the digitized document.
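How recognized components are combined into Unicode can be shown with a short sketch. The code points are standard Devanagari characters; the `compose` helper and the idea that the recognizer outputs consonants, a vowel sign, and a modifier separately are assumptions for illustration.

```python
import unicodedata

KA = "\u0915"         # DEVANAGARI LETTER KA
SSA = "\u0937"        # DEVANAGARI LETTER SSA
VIRAMA = "\u094D"     # DEVANAGARI SIGN VIRAMA, joins consonants into a compound letter
VOWEL_I = "\u093F"    # DEVANAGARI VOWEL SIGN I, drawn to the left of the consonant
ANUSVARA = "\u0902"   # DEVANAGARI SIGN ANUSVARA, the dot above the headline

def compose(consonants, vowel_sign="", modifier=""):
    """Join consonants with virama, then append vowel sign and modifier in logical order."""
    return VIRAMA.join(consonants) + vowel_sign + modifier

compound = compose([KA, SSA])            # renders as the compound letter क्ष
syllable = compose([KA], VOWEL_I)        # renders as कि
nasalized = compose([KA], VOWEL_I, ANUSVARA)
names = [unicodedata.name(ch) for ch in nasalized]
```

The vowel sign I is stored after the consonant even though it is drawn to its left, so the recognizer's left-to-right visual order cannot simply be written out as the code sequence.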
![Page 49: character recognition: Scope and challenges](https://reader033.vdocument.in/reader033/viewer/2022061106/5442e0a1afaf9f0e118b477d/html5/thumbnails/49.jpg)
Devnagari Character recognition challenges -2
• Compound letter segmentation.
• Upper and lower modifier segmentation.
• Left and right modifier segmentation.
• Separating anuswara (.) and full stop from noise.
• Understanding punctuation marks in the document.
• Unconnected compound letters in handwritten documents.
• Connected simple letters in handwritten documents.
![Page 50: character recognition: Scope and challenges](https://reader033.vdocument.in/reader033/viewer/2022061106/5442e0a1afaf9f0e118b477d/html5/thumbnails/50.jpg)
Devnagari Character recognition challenges -3
• India is a multilingual country. More than one language is frequently used in a document.
• Recognition of more than one language at a time is a great challenge.
• Language recognition has to be done first, by looking into the properties of the script.
• English–Hindi language discrimination is moderately simple compared to Marathi–Hindi discrimination, since Marathi and Hindi are both written in Devnagari.
• Various forms in banks use three languages (Marathi as the state language, Hindi as the official language of the Union, and English as the international language). This work is still more challenging.
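One script property that can separate Devnagari from English is the Shirorekha. The sketch below is a simple heuristic assumed for illustration, not a method taken from the slides: a tightly cropped binary word image is called Devnagari if some row in its upper half is inked across at least 80% of the word's width. The threshold and function names are hypothetical.

```python
import numpy as np

def has_headline(word_img, min_fraction=0.8):
    """True if some row in the upper half is inked across most of the word's width."""
    h, w = word_img.shape
    upper = word_img[: max(1, h // 2)]
    return bool(upper.sum(axis=1).max() >= min_fraction * w)

def guess_script(word_img):
    """Very rough script guess for a tightly cropped binary word image."""
    return "Devnagari" if has_headline(word_img) else "Latin"

# Devnagari-like word: full-width headline with strokes hanging below it
devnagari_like = np.zeros((10, 12), dtype=np.uint8)
devnagari_like[1, :] = 1
devnagari_like[2:9, [2, 8]] = 1

# Latin-like word: separate vertical strokes, no connecting headline
latin_like = np.zeros((10, 12), dtype=np.uint8)
latin_like[1:9, [2, 5, 8]] = 1

scripts = (guess_script(devnagari_like), guess_script(latin_like))
```

A check like this cannot tell Marathi from Hindi, because both share the same script; that distinction would need language-level cues such as vocabulary.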
![Page 51: character recognition: Scope and challenges](https://reader033.vdocument.in/reader033/viewer/2022061106/5442e0a1afaf9f0e118b477d/html5/thumbnails/51.jpg)
Multilingual character recognition
![Page 52: character recognition: Scope and challenges](https://reader033.vdocument.in/reader033/viewer/2022061106/5442e0a1afaf9f0e118b477d/html5/thumbnails/52.jpg)
Examples of multi-oriented documents
![Page 53: character recognition: Scope and challenges](https://reader033.vdocument.in/reader033/viewer/2022061106/5442e0a1afaf9f0e118b477d/html5/thumbnails/53.jpg)
Two column documents with image
![Page 54: character recognition: Scope and challenges](https://reader033.vdocument.in/reader033/viewer/2022061106/5442e0a1afaf9f0e118b477d/html5/thumbnails/54.jpg)
Image Document recognition
![Page 55: character recognition: Scope and challenges](https://reader033.vdocument.in/reader033/viewer/2022061106/5442e0a1afaf9f0e118b477d/html5/thumbnails/55.jpg)
Image Document recognition
Video caption text recognition
Cargo container code recognition
![Page 56: character recognition: Scope and challenges](https://reader033.vdocument.in/reader033/viewer/2022061106/5442e0a1afaf9f0e118b477d/html5/thumbnails/56.jpg)
Image Document recognition
Poster capturing
License plate reading
![Page 57: character recognition: Scope and challenges](https://reader033.vdocument.in/reader033/viewer/2022061106/5442e0a1afaf9f0e118b477d/html5/thumbnails/57.jpg)
Image Document recognition
Whiteboard reading
Road sign recognition
![Page 58: character recognition: Scope and challenges](https://reader033.vdocument.in/reader033/viewer/2022061106/5442e0a1afaf9f0e118b477d/html5/thumbnails/58.jpg)
Image Document recognition
Message on glass door with complex background
Document recognition on mobile phone
![Page 59: character recognition: Scope and challenges](https://reader033.vdocument.in/reader033/viewer/2022061106/5442e0a1afaf9f0e118b477d/html5/thumbnails/59.jpg)
International journals related to Character recognition
![Page 60: character recognition: Scope and challenges](https://reader033.vdocument.in/reader033/viewer/2022061106/5442e0a1afaf9f0e118b477d/html5/thumbnails/60.jpg)
Conclusion
• Development in character recognition will boost word processing and image understanding.
• Devnagari character recognition will help readers listen to Indian literature using computers, PDAs or e-book readers.
• It will help in language translation, which is a complex problem in a multilingual country like India, where each state has its own language.
• Many modern, innovative applications will evolve, which is the need of the hour in this information age.
• This will help in information processing to a large extent.
![Page 62: character recognition: Scope and challenges](https://reader033.vdocument.in/reader033/viewer/2022061106/5442e0a1afaf9f0e118b477d/html5/thumbnails/62.jpg)
Acknowledgement
Friend, Philosopher and “GUIDE” Dr. V.H. Mankar,
for his consistent help and encouragement