Chapter 9
Experimental Results and Analysis
We have approached the problem of Indian language HCR in a series of phases of
increasing complexity, starting with vowels, then adding consonants and finally covering
Kagunita. At each stage, the performance was critically and systematically analyzed to
identify problem areas; these were investigated through sub-experiments, and the overall
model was refined accordingly.
At each stage, different experiments were formulated and conducted. Given the overall
recognition pipeline for OCR problems (see chapter 2), the experiments were set up with the
following hypotheses:
1. Preprocessing the images: Given the wide variation in the image samples that we have
collected, a perfect preprocessor was unlikely. Therefore, a preprocessor needs to be
formulated based on how well the overall recognition performs. At the same time, we felt
that the pipeline usually followed in HCR for other languages may not be appropriate for
Indian languages (we restrict ourselves to Kannada in the rest of the discussion). On the
one hand, Kannada gives a lot of importance to small elements like knots, dots and hooks,
and hence the preprocessor should recognize and retain them. On the other hand, the
richness of shapes in Kannada may allow a higher tolerance of distortion. Thus, the first
challenge was to evaluate a traditional preprocessor pipeline and modify it to evolve a
preprocessor suitable for Kannada.
2. The choice of features: The literature on character recognition (see chapter 2) reports
a wide range of features of differing natures. Given the differences between Indian scripts and
scripts of other languages such as English, a one-size-fits-all approach is unlikely to
work. So we needed to study the utility and effectiveness of the available features for
the Kannada script and formulate additional features as required. Much of our work was in
this direction. As mentioned below (and introduced in chapter 7), a number of different
feature classes were studied empirically. We also formulated some new feature classes
which we felt would be useful for the Kannada script.
3. Tackling Kagunita: Kagunita, the combination of a consonant and a vowel that changes
the shape of both, is a major challenge for Indian languages. Blindly treating every
combination as a separate class did not appear appropriate. A number of studies were
carried out to formulate a split recognition strategy, trying to recognize the vowel and
consonant individually and collaboratively from a given Kagunita.
In the next three sub-sections we discuss these three classes of experiments briefly and
analyze the results.
9.1 Studies on Preprocessing Pipeline
A set of experiments was conducted using the modified static preprocessing pipeline on the
basic character set. Analysis of the results showed that around 10% of the images remained
noisy after preprocessing, and these resulted in poor recognition.
We also experimented to find the effect of resizing the image with and without
preserving the aspect ratio, and found that the results were influenced by both the target
size and the method of size normalization. Images resized to 64x64 while maintaining the
original shape showed an improvement of 7% over images resized to 64x64 ignoring the
aspect ratio. This was to be expected, since the overall shape of the character is an
important attribute for distinguishing certain character pairs. We also studied the effect
of various thinning algorithms on the basic character set, since loss of critical details
like knots and dots would be damaging for Kannada. The thinning algorithm had a clear
influence: results dropped by 9% with the LoG and Canny edge detection methods compared to
morphological thinning. We hence included morphological thinning in our pipeline. Due to
huge variations in background and foreground intensity in our image database, this
modified static pipeline still could not eliminate all the noise. We analyzed the noise as
discussed in section 6.4. The two major noise problems traced to the static preprocessing
pipeline are:
1. Conversion of background information to foreground and foreground information to
background, due to the huge background and foreground variations observed over the
whole corpus.
2. The addition of new joining edges between nearby contours, due to small image size,
ink spread, etc.
Based on this feedback, we formulated the dynamic preprocessing pipeline (see
section 6.5) that dynamically categorizes the images based on the above factors for noise
cleaning. These studies, observations and the resulting pipeline architecture were
discussed in detail in chapter 6. The dynamically preprocessed images were mostly free
from noise. The resulting image – thinned, confined within a bounding box and resized – is
used further for feature extraction.
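As a concrete reference, here is a minimal sketch of the static stages named above (Otsu binarization, morphological thinning, bounding box, aspect-preserving resize to 64x64), assuming Python with OpenCV and its ximgproc contrib module; the parameter choices are illustrative, and the dynamic categorization of section 6.5 is not reproduced:

```python
import cv2
import numpy as np

def preprocess(gray):
    """Static preprocessing stages: Otsu binarization, morphological
    thinning, bounding box, and aspect-preserving resize to 64x64.
    The dynamic pipeline (section 6.5) additionally categorizes images
    by background/foreground variation before noise cleaning."""
    # Otsu binarization; INV makes the ink the foreground (non-zero).
    _, binary = cv2.threshold(gray, 0, 255,
                              cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

    # Morphological thinning preserved knots and dots better than the
    # LoG/Canny edge alternatives in the comparison above (9% drop).
    thinned = cv2.ximgproc.thinning(binary)  # needs opencv-contrib

    # Confine the character within its bounding box.
    ys, xs = np.nonzero(thinned)
    cropped = thinned[ys.min():ys.max() + 1, xs.min():xs.max() + 1]

    # Resize to 64x64 preserving the aspect ratio: scale the longer
    # side to 64 and center the result on a blank canvas.
    h, w = cropped.shape
    scale = 64.0 / max(h, w)
    new_w, new_h = max(1, round(w * scale)), max(1, round(h * scale))
    resized = cv2.resize(cropped, (new_w, new_h),
                         interpolation=cv2.INTER_NEAREST)
    canvas = np.zeros((64, 64), dtype=np.uint8)
    y0, x0 = (64 - new_h) // 2, (64 - new_w) // 2
    canvas[y0:y0 + new_h, x0:x0 + new_w] = resized
    return canvas
```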
9.2 Studies on Feature Set for Basic Character Set
We experimented with the basic character set first and analyzed the maximum recognition
capability of different feature sets, considering parameters such as the average
recognition rate, generalization capability and the confusion matrix.
A number of features were derived for the experiments on the basic character set (and
later for Kagunita) images, based on the observations and transformations discussed in
chapter 7. For convenience in identifying the features in subsequent discussions and
figures, we introduced a notation scheme (in chapter 7) to describe the different
features. The notation is summarized in table 9.1 for ease of reference: for example, O-CM
means central moment features extracted from the original image.
Table 9.1 Feature notation

Notation | Meaning | Examples
O- | Derived from original image (preprocessed) | O-CM
G- | Derived from Gabor directional images | G-CM
C- | Derived from cut images | C-CM
CM | Central moment features | O-CM, G-CM, C-CM
GM | Geometric moment features | O-GM, G-GM, C-GM
HU | Hu's moment features | O-HU, G-HU, C-HU
CPB | Curve position from boundary | O-CPB
GI | Global information content | O-GI, G-GI
GA | Global aspect ratio | O-GA, G-GA
ZD | Zonal density | O-ZD
CPC | Curve position from center | O-CPC
CLB | Lengths of curves from boundary | O-CLB
CLC | Lengths of curves from center | O-CLC
ZPD | Zonal probability distribution | G-ZPD, C-ZPD
S | Statistical features: GI, GA and ZD (original); GI, GA and ZPD (directional/cut) | O-S, G-S
C | Curvature features: CPB, CPC, CLB & CLC | O-C
M | Moments features: GM, CM and HU | O-M, G-M
For the basic character set, we extracted the features from the preprocessed original
image. We chose the initial set of features based on manual observation of their
discrimination ability, non-linearity and likely impact on recognition. The initial
feature sets chosen were O-M (14 features), O-ZD (30 features) and O-CPB (18 features,
along with O-GA and O-GI). The value ranges taken by the features were verified using
Weka, an open-source machine learning framework. Some features had a very low range of
values compared to others, which may lower the triggering of the corresponding nodes of
the neural network. Hence, such features were enhanced by scaling their value ranges to
make all the features comparable.
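A minimal sketch of such range scaling, assuming simple min-max normalization fitted on the training split (the thesis does not specify the exact scaling used, so this choice is an assumption):

```python
import numpy as np

def scale_features(train, test):
    """Rescale every feature column to a comparable range (here [0, 1])
    so that features with small numeric ranges (e.g. Hu's moments) do
    not under-trigger the corresponding network input nodes."""
    lo = train.min(axis=0)
    span = train.max(axis=0) - lo
    span[span == 0] = 1.0                 # guard against constant features
    # Apply the training-set statistics to both splits.
    return (train - lo) / span, (test - lo) / span
```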
9.2.1 Effectiveness of various feature classes
The manual observations on the feature sets were then tested through experimental
results. We first performed experiments on individual feature sets and on various subsets
to determine their individual contributions and their relationships with the other
features. Then, based on the observations, the final feature set was formed. Along with
the average recognition rate, we computed, for every feature category, the classes that
obtained the minimum and maximum recognition results and the count of classes whose
recognition rate is above average. Table 9.2 shows the contributions of the various
feature sets taken individually and in combination.
Table 9.2 Performance of various feature sets on basic character set

S.No. | Feature set | Min % | Max % | Avg % | Count of classes >= average | Class no. with min % | Class no. with max %
1 | O-CPB | 19 | 78 | 43 | 19 | 20 | 41
2 | O-ZD | 40 | 88 | 63 | 25 | 29 | 43
3 | O-CPB+O-ZD | 55 | 95 | 79 | 21 | 33 | 43
4 | O-GM | 11 | 79 | 35 | 18 | 3 | 43
5 | O-CM | 4 | 56 | 26 | 22 | 11 | 9
6 | O-HU | 21 | 87 | 40 | 19 | 22 | 43
7 | O-HU+O-CM | 40 | 93 | 62 | 20 | 45 | 43
8 | O-HU+O-GM | 37 | 93 | 61 | 19 | 3 | 43
9 | O-M | 50 | 96 | 72 | 18 | 31 | 43
9.2.2 Effectiveness of individual feature classes
The experiments on the individual feature sets showed that the zonal density statistical
features (O-ZD) performed well, with an average of 63%, compared to the curvature features
(O-CPB) with 43%. Hu's moments (O-HU) performed well, with 40%, compared to the geometric
moments (O-GM) with 35% and the central moments (O-CM) with 26%. As the number of
character classes to be recognized is large (49 classes: 15 vowels and 34 consonants),
with huge variations in the sample images, the individual performance of these feature
sets is not satisfactory. Relative to the number of features per feature set, however,
even the central moments' performance (26%) is promising. Hence, we chose to continue
experimenting with all these feature sets.
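For reference, a sketch of computing a 14-value moments vector (4 geometric, 3 central and 7 Hu's moments, matching the O-M count) with OpenCV; the thesis does not restate which moment orders were used, so the particular selections below are illustrative:

```python
import cv2
import numpy as np

def moment_features(img):
    """14 moments features per image: 4 geometric, 3 central and 7 Hu's
    moments (the O-M feature count).  Exact orders are illustrative."""
    m = cv2.moments(img, binaryImage=True)
    # Geometric (raw spatial) moments -- computed without normalization.
    gm = [m['m00'], m['m10'], m['m01'], m['m11']]
    # Central moments -- translation normalized.
    cm = [m['mu20'], m['mu11'], m['mu02']]
    # Hu's moments -- invariant to translation, scale and rotation.
    hu = cv2.HuMoments(m).flatten().tolist()
    return np.array(gm + cm + hu)   # 4 + 3 + 7 = 14 features
```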
9.2.3 Effect of Combining Feature Classes
It is observed that, with most of these features, the characters with minimum
recognition are different. This shows how differently the feature sets behave on the
character shapes. A character shape poorly recognized by one feature set is often well
recognized by another – we call this the non-linear interaction of the respective features
– and hence these feature sets can be combined. A number of combinations were explored,
and the important results are summarized below.
a. As density and curvature features deal with different kinds of information, they can be
grouped together; the combined average performance is 79%. Though the individual
contributions of these feature sets are poor, together they are capable of classifying 21
out of 49 characters above 79%.
b. Even though the central moments performed poorly compared to the geometric moments,
together with Hu's moments they showed 62% average recognition power, distinguishing 20 of
the 49 characters above average. As geometric moments deal with the image without
normalization, central moments are translation normalized, and Hu's moments represent the
image shape with features invariant to translation, scale and rotation, they convey
different information and hence can be grouped. The combined moments feature set (only 14
features) showed an average recognition rate of 72%, against the 79% of the density and
curvature feature set (48 features). Hence, moments are robust features against huge
variations. We also found the character confused the most by each of the combined feature
sets.
c. We found that the character with the minimum recognition rate differs across feature
sets. As moments capture centralness, diagonality, deviation, etc., we can group these
features with the zonal density and curvature information. With O-M, O-ZD and O-CPB (along
with the O-GA and O-GI features), almost all the characters were classified well, with an
average recognition rate of 92%, a minimum of 78% for character 24 ( ) and a maximum of
100% for character 6 ( ), whose handwritten shapes are shown in figure 9.1.
Figure 9.1 Sample Images of characters 24 and 6
The number of characters recognized above 92% is 24; that is, more than 50% of the
characters have a recognition rate above 92%.
To find the reasons for misclassification, we analyzed the confusion matrix built from the
misses in the testing stage, as shown in table 9.3. Each entry c(x,y) in the matrix gives
the count of how many times character x was confused with character y. The total number of
confusions per character was calculated and used to analyze the performance of the system
and identify areas requiring further attention.
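A sketch of this bookkeeping, assuming integer class labels and NumPy; the threshold parameter mirrors the miss-count > 4 cut-off used below:

```python
import numpy as np

def confusion_pairs(y_true, y_pred, threshold=4):
    """Build the confusion matrix c(x, y) from test-stage misses and
    list the character pairs confused more than `threshold` times."""
    n = int(max(y_true.max(), y_pred.max())) + 1
    c = np.zeros((n, n), dtype=int)
    for x, y in zip(y_true, y_pred):
        if x != y:
            c[x, y] += 1                  # character x misread as y
    pairs = [(x, y, c[x, y])
             for x in range(n) for y in range(n)
             if c[x, y] > threshold]
    # Sort so the worst confusions (e.g. 12 -> 23, nine times) come first.
    return c, sorted(pairs, key=lambda p: -p[2])
```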
The confusion matrix was analyzed to identify the nature of the failures and to find
measures to improve performance. For the characters missed most often, we examined which
characters they were confused with and how many times. Characters with a miss count > 4
were chosen for further analysis. Some of these are tabulated in table 9.3, showing images
of the confused character and of those with which it was confused, along with the
misclassification count. For example, character no. 10 was confused with character no. 9
seven times; similarly, character no. 12 was confused with character no. 20 seven times
and with character no. 23 nine times. The printed form of each character is also shown in
the original table to facilitate comparing the characters.
Table 9.3 Confusion matrix for O-CPB, O-M and O-S feature set
(printed forms and handwritten sample images appear in the original)

Sl No | Character no. | Confused with: character no. / (count) times
1 | 10 | 9/7
2 | 12 | 20/7, 23/9
3 | 20 | 12/6
4 | 22 | 39/5
5 | 28 | 33/8, 48/6
6 | 31 | 42/5
7 | 32 | 29/11
8 | 34 | 29/5
9 | 40 | 7/7
10 | 44 | 9/7
These confusion matrix entries clearly indicate high shape similarity among some character
pairs, together with huge writing variations, as the reasons for confusion. The data set
considered for the experiment has very large variations: some characters (see the sample
images in rows 1, 6 and 10 of the original table) are written so badly or so differently
that it is hard even for human readers to identify them when presented independently.
Based on the confusion matrix analysis, we developed two further feature sets for basic
character recognition, as explained in (d) and (e) below.
d. The detailed analysis of the confusion matrix showed that the confusion involves
characters that are differentiated by features within the contour. We also found that the
curvature features' contribution is low compared to the zonal density and moments
features, mainly because the curvature details were taken only from the boundaries. As
some Kannada characters are differentiated by central information, curvature details
observed from the center may be important. Hence we improved the curvature feature set by
adding further curvature features (O-CLB, O-CPC and O-CLC). We found that these additional
features improved the average performance of the curvature features from 43% to 73%.
e. As Kannada characters are curved, the moments features from the Gabor directional
images (see section 7.4) appeared likely to provide useful discriminatory information. We
studied this experimentally and obtained an average performance of 85% with moments from
the Gabor directional images, compared to 72% when using moments from the original image.
The minimum and maximum recognition rates obtained are 67% and 99% respectively, and the
number of character classes recognized above average (85%) is 25. The character that
suffered the minimum recognition rate is character 47, which is different from the
character (character 31) that suffered the minimum recognition using moments from the
original image. The plot of both sets of recognition rates, shown in figure 9.2,
demonstrates that they are non-linearly related; hence they can be combined into a still
stronger feature set.
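A sketch of generating the four directional images with OpenCV's Gabor kernels; the kernel parameters below are illustrative assumptions, with the actual settings given in section 7.4:

```python
import cv2
import numpy as np

def gabor_directional_images(img, directions=(0, 45, 90, 135)):
    """Filter the preprocessed image with Gabor kernels at four
    orientations, yielding the directional images from which the G-*
    features are extracted.  Kernel parameters are illustrative."""
    out = []
    for theta in directions:
        kernel = cv2.getGaborKernel(ksize=(9, 9), sigma=2.0,
                                    theta=np.deg2rad(theta),
                                    lambd=8.0, gamma=0.5, psi=0)
        out.append(cv2.filter2D(img.astype(np.float32), -1, kernel))
    return out   # one response image per direction
```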
[Figure 9.2 plots the per-class recognition results (%) over the 49 character classes,
comparing original image moments, directional image moments and the combined features.]
Figure 9.2 Recognition results with moments features from original and directional images
The study on feature sets thus resulted in a number of feature sets: the curvature feature
set, the moments feature set, the density (statistical) feature set and the moments
feature set from Gabor directional images. All these feature sets show a nonlinear
relation and hence can be combined to obtain better performance. Together they produce
better results than any of them would individually, as we discuss in the next section.
9.2.4 Final Recommendations
A number of experiments were performed on an optimized Multi-Layer Perceptron neural
network trained with the Back Propagation method. In each case, the training process was
stopped when the performance was observed to be at its peak and the % recognition results
remained steady over a number of epochs. The best results observed for basic character set
recognition are presented in table 9.4. The results for the basic character set using
curvature, statistical and moments features extracted from dynamically preprocessed
images, and those using moments features from the original image and from the Gabor
directional images, are discussed in detail in the following sub-sections.
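As a stand-in for this setup, a minimal sketch using scikit-learn's MLPClassifier; only the 188-hidden-node configuration (section 9.2.4.1) comes from the text, and the remaining hyperparameters are illustrative assumptions:

```python
from sklearn.neural_network import MLPClassifier

def train_mlp(X_train, y_train, hidden_nodes=188):
    """Optimized MLP trained with back propagation (SGD); training stops
    once validation performance plateaus, mirroring the 'peak and steady
    over epochs' criterion.  Hyperparameters are illustrative."""
    mlp = MLPClassifier(hidden_layer_sizes=(hidden_nodes,),
                        solver='sgd',            # plain back propagation
                        learning_rate_init=0.01,
                        early_stopping=True,     # hold out a validation split
                        n_iter_no_change=20,     # stop when results stay steady
                        max_iter=2000)
    return mlp.fit(X_train, y_train)
```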
Table 9.4 Experimental results for basic character set

Sl No | Feature set | Average % | Minimum % | Maximum %
1 | O-M + O-S + O-C | 96 | 86 | 100
2 | G-M + O-M | 92 | 80 | 100
9.2.4.1 Experiment results with curvature, statistical and moments features
With the CPC, CLB and CLC features added (O-M, O-S and O-C together), the recognition
rate improved further, to an average of 96% and a minimum of 86%. We also performed
experiments with a reduced number of hidden nodes: this 94-feature set recognized with an
average of 88% using 49 hidden layer nodes, against 96% with 188 nodes, a performance drop
of only 8%. This suggests that the system generalizes and may be capable of recognizing
unknown characters with this feature set. The confusion matrix is given in table 9.5.
Table 9.5 Confusion matrix for O-C, O-M and O-S feature set

Sl No | Character no. | Confused with: character no. / (count) times
1 | 12 | 23/5
2 | 20 | 12/4
3 | 27 | 28/4, 42/4
4 | 29 | 32/5, 34/4
5 | 31 | 28/4
6 | 32 | 34/4, 47/4
7 | 40 | 7/4
8 | 44 | 28/4
9 | 45 | 35/5, 42/4
10 | 46 | 19/6
The confusion matrix analysis shows that the remaining confusion is between similarly
shaped characters. It is observed that 5 is the maximum number of times a character was
confused with another, against the 200 instances used (less than 2.5%). All other
mismatches are scattered, and some appear to be random matches to that class, like
character 45 mismatching with character 42, etc.
9.2.4.2 Experiment results with moments as features
The moments features from the four directional images gave an average recognition rate of
85% with a minimum recognition rate of 76%. As the moments features extracted from the
original image and from the directional images show nonlinearity, as explained in item (e)
of sub-section 9.2.3, they can be combined as features. When the O-M features are
considered along with the G-M features, the minimum recognition rate increased from 67% to
80%, an improvement of 13%, and the average recognition rate improved from 85% to 92%.
There are 22 characters above the 92% average recognition rate and 42 characters (out of
49) above 85%. The character that suffered the minimum recognition rate of 80% is
character 32. The final confusion matrix is shown in table 9.6.
Table 9.6 Confusion matrix for combined moments features

Sl No | Character no. | Confused with: character no. / (count) times
1 | 12 | 20/5
2 | 19 | 37/7
3 | 22 | 39/6
4 | 23 | 20/6
5 | 24 | 7/7
6 | 26 | 10/5
7 | 27 | 42/5
8 | 28 | 33/6
9 | 32 | 22/5, 34/9, 37/6
10 | 36 | 27/5, 47/5, 48/5
11 | 37 | 29/5
12 | 41 | 40/5
13 | 44 | 9/8
14 | 45 | 18/5, 27/7
15 | 47 | 4/5, 16/5, 34/6
16 | 48 | 7/6
17 | 49 | 25/5
The confusion matrix analysis shows that the remaining confusion is between closely
similar character shapes, which are hard to distinguish even manually under large writing
variations.
The different categories of features – curvature, statistical and moments – together
perform well. It is observed that moments are strong features for representing Kannada
characters. All these features can be used together to improve the performance further.
But due to character shape similarities and writing deformations, even manual recognition
of some of the samples is difficult when such images are seen in isolation. In such cases,
no number of added features may be able to discriminate the character, and it is better to
take the top-3 choices and resolve them further in a post-processing stage. Thus, at this
stage, we have two sets of features that can be used effectively for Kannada, as shown in
table 9.4.
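Extracting the top-3 choices from the classifier's output is straightforward; a sketch, assuming the network exposes per-class scores as a NumPy array:

```python
import numpy as np

def top3(output_scores):
    """Indices of the three highest-scoring classes, best first; these
    are handed to the post-processing stage instead of a single guess."""
    return np.argsort(output_scores)[::-1][:3]
```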
9.3 Studies on Feature Set for Kagunita
As a Kagunita is a combination of a vowel and a consonant whose resultant shape fully
retains neither the consonant shape nor the vowel shape, recognizing a Kagunita character
is a difficult task. Given the similarities between vowel matras and between consonants,
and the huge writer variations, recognizing the shape directly as one of 510 Kagunita
classes may not be a suitable choice. To reduce the number of classes from 510, we
proposed to explore separate recognition of the vowel and the consonant present in the
Kagunita image.
We used three different categories of images for recognizing a Kagunita in terms of its
vowel and consonant: cut images (see section 7.5), Gabor directional images (see section
7.4) and the original preprocessed image. The original image and the directional images
are complete images, whereas the cut images are portions of the whole image, cut so as to
include regions dominated by vowel or consonant information and form isolated images. For
Kagunita recognition, we used all the above features – curvature, statistical, moments and
moments from directional images – plus a new zonal probability density feature. The
Kagunita features extracted from the three different image types (original, directional
and cut images) are shown in table 9.7.
Table 9.7 Feature sets for experiments on Kagunita

Image type | Kagunita features (feature count) | Feature set size
Cut images | C-CM (3), C-HU (7), C-ZPD (1 per zone x 6 zones) | 16 for each cut image
Four Gabor directional images | G-CM (3), G-HU (7), G-GI (1), G-GA (1), G-ZPD (1 per zone x 10 zones) | 22 for each directional image
Original image | O-CPB (16), O-CPC (16), O-CLB (8), O-CLC (8), O-GM (4), O-CM (3), O-HU (7), O-GI (1), O-GA (1), O-ZD (5 per zone x 6 zones) | 94
Vowel recognition in a Kagunita is a 15-class problem: identifying the vowel matra
associated with a consonant. For vowel recognition in the Kagunita, we used cut images and
directional images. Similarly, consonant recognition in a Kagunita is a 34-class problem:
identifying the consonant associated with a vowel matra. For consonant recognition, since
the number of consonant classes is large, many consonants are similar in shape, and the
vowel matra modifies the shape, we need a large feature set with strong discriminative
features. Here we used the features extracted from all three image types: cut images,
directional images and the original image. To extract the zonal statistical feature – the
zonal probability density feature – from cut images and Gabor directional images, we used
Nx1 horizontal zones and 1xN vertical zones, with N=3 giving 6 zones for cut images and
N=5 giving 10 zones for Gabor directional images.
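A sketch of this zoning; reading "zonal probability density" as each zone's share of the total foreground mass is our assumption (chapter 7 defines the feature precisely), as is the helper name:

```python
import numpy as np

def zonal_probability_density(img, n):
    """One ZPD value per zone: n horizontal strips (n x 1 zoning) plus
    n vertical strips (1 x n zoning), i.e. 2n zones in total
    (n=3 -> 6 zones for cut images, n=5 -> 10 for directional images)."""
    total = float(img.sum()) or 1.0       # guard empty images
    h_strips = np.array_split(img, n, axis=0)   # n x 1 horizontal zones
    v_strips = np.array_split(img, n, axis=1)   # 1 x n vertical zones
    return np.array([z.sum() / total for z in h_strips + v_strips])
```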
9.3.1 Performance of moment features on whole Kagunita images
The performances of the individual feature sets and sub-feature sets of moments features
on whole Kagunita images are given in table 9.8.
Table 9.8 Performance of moments features from whole Kagunita images (recognition results in %)

Sl no | Feature description | Vowels: Mean | Min | Max | Consonants: Mean | Min | Max
1 | O-CM+O-HU | 40 | 15 | 70 | 24 | 8 | 51
2 | O-M | 47 | 18 | 74 | 29 | 4 | 62
3 | G-CM+G-HU | 63 | 42 | 81 | 35 | 9 | 60
4 | G-M | 64 | 40 | 82 | 36 | 9 | 62
The performance of the CM and HU features from the original image is compared with that of
the CM and HU features from the four Gabor directional images: there is an improvement of
23% and 11% in the recognition of vowels and consonants respectively. The performance of
all 14 moments features (O-M) from the original image is 47% for vowels and 29% for
consonants; for G-M (56 features) it is 64% and 36%. The performance of G-M improved by
just 1% on average, to 64% and 36% for vowels and consonants respectively, when G-GM was
considered along with G-CM and G-HU. It is observed that the moments features from the
directional images performed well compared to the moments features from the original
image. The consonant results are poor with both feature sets.
The plots of O-M from the original image and G-CM+G-HU from the directional images, for
the 15 vowels and the 34 consonants, are shown in figures 9.3 (a) and (b) respectively. As
expected, they show a nonlinear relation (indicated with circles in the figure). This
suggests that they can be used together as features in further experiments.
Figure 9.3 Nonlinearity of moments features from the original image and the four
directional images: (a) recognition results for vowels, (b) recognition results for
consonants
The vowel recognition results using combined moments along with statistical features are
shown in table 9.9. When the moments features G-CM, G-HU and O-M are used together, the
average vowel recognition results improved from 63% to 71%. The statistical features G-S
showed a performance of 21% individually, but when combined with G-CM and G-HU, the
results improved from 63% to 69%. When all these features were used together, the average
result is 77%.
Table 9.9 Performance of combined moments and statistical features from whole Kagunita images (recognition results in %)

Sl no | Feature description | Vowels: Mean | Min | Max | Consonants: Mean | Min | Max
1 | G-CM+G-HU+O-M | 71 | 57 | 83 | 44 | 22 | 65
2 | G-S | 21 | 09 | 43 | 10 | 01 | 29
3 | G-CM+G-HU+G-S | 69 | 50 | 84 | 43 | 20 | 67
4 | G-CM+G-HU+G-S+O-M | 77 | 56 | 92 | 49 | 23 | 78
Table 9.9 also lists the combined moments and statistical feature results for consonants.
When the moments features G-CM, G-HU and O-M are used together, the average recognition
result is 44%, an improvement of 9%. The statistical features G-S did not perform well
alone, but when combined with G-CM and G-HU the results improved from 35% to 43%, and with
O-M added as well, the result is 49%.
9.3.2 Performance of cut images on original Kagunita images
To understand the effect of cuts that extract a portion of the image, we experimented on
the basic character set with a 20% cut from all directions; that is, the base character
loses 20% of its extent on every side, so that the contribution of the image region around
the boundary can be analyzed. As something similar happens to the cut image shapes, this
analysis gives a feel for the impact of a cut on recognition. The results average 89% when
tested using the O-CPB, O-S and O-M features; the reduction observed is only 3% in the
mean results (see table 9.11). These results supported adopting the concept of cut images
to make the vowel- or consonant-related strokes in the image more significant for feature
extraction. This study also helped in understanding the effect of cuts and cut positions
on consonant recognition within a Kagunita.
To isolate portions of the image for Kagunita recognition, we analyzed the regions
dominated by vowel information and by consonant information, as explained in section 7.5.
It is clear from the Kagunita shape analysis that extracting the vowel information
requires five cut images – one on top, three on the right and one on the bottom – while
extracting the consonant information requires three cut images from the left. As the
positions of the cuts for generating the cut images are influenced by writing variations
and cannot be uniquely determined in advance, we identified some potential cut sizes. The
% cut image sizes considered for the vowel recognition and consonant recognition
experiments are listed in tables 9.10 (a) and (b) respectively.
Table 9.10 (a) Size and count of cut images for vowel recognition

SlNo | Vowel feature description | Top: no. of images | Top: % cut | Right: no. of images | Right: % cut | Bottom: no. of images | Bottom: % cut
1 | 20T20R20B | 1 | 20 | 1 | 20 | 1 | 20
2 | 30T30R30B | 1 | 30 | 1 | 30 | 1 | 30
3 | 30T50R30B | 1 | 30 | 1 | 50 | 1 | 30
5 | 20T204060R20B | 1 | 20 | 3 | 20, 40, 60 | 1 | 20
6 | 20T305070R20B | 1 | 20 | 3 | 30, 50, 70 | 1 | 20
Table 9.10 (b) Size and count of cut images for consonant recognition

Consonant feature description | Bottom: no. of images | Bottom: % cut | Left: no. of images | Left: % cut
507090 | - | - | 3 | 50, 70, 90
406080 | - | - | 3 | 40, 60, 80
305070 | - | - | 3 | 30, 50, 70
305070L30B | 1 | 30 | 3 | 30, 50, 70
In the tables, we have used a specific notation to describe the cuts effected. For
example, '20T305070R20B' means a single 20% cut on the top, three different cuts of 30%,
50% and 70% from the right, and a single 20% cut on the bottom. Similarly, we considered
different cut images for consonant recognition: for example, '406080' means 40%, 60% and
80% cuts from the left. The cut size is varied and different sets of cut images are
generated for the experiments.
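A sketch of generating cut images from this notation; we read a p% cut from a side as keeping the p% strip adjacent to that side, where the vowel matra or consonant body dominates (this keep-the-strip reading and the helper name are our assumptions; section 7.5 gives the exact construction):

```python
def cut_images(img, top=None, right=(), bottom=None, left=()):
    """Generate cut images per the notation above, e.g.
    '20T305070R20B' -> cut_images(img, top=20, right=(30, 50, 70), bottom=20)
    and '406080'     -> cut_images(img, left=(40, 60, 80))."""
    h, w = img.shape
    cuts = []
    if top is not None:
        cuts.append(img[: h * top // 100, :])          # strip at the top
    for p in right:
        cuts.append(img[:, w - w * p // 100 :])        # strip on the right
    if bottom is not None:
        cuts.append(img[h - h * bottom // 100 :, :])   # strip at the bottom
    for p in left:
        cuts.append(img[:, : w * p // 100])            # strip on the left
    return cuts
```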
From all the cut images, the geometric moments, central moments, Hu's moments and the
zonal probability density statistical features are computed. As there are four different
vowel sizes to the right of the consonant, we consider either one cut or three cuts on the
right, to check the effect of three cuts over a single cut. The performance of the
different cut images on vowel recognition is shown in table 9.11. Analyzing the effect of
the different percentage cuts, we found that extracting three cut images performs well
(69%) compared to a single cut (57%). As this selection is based on trials, two of the
sets, 30T204060R30B and 20T305070R20B, were selected for further computations.
Table 9.11 Results of CUT experiments on original images (recognition results in % on test data)

SlNo | Vowel feature description | Mean | Min | Max | Consonant feature description | Mean | Min | Max
- | 94 features from basic character set | 89 | 75 | 100 | | | |
1 | 30T50R30B | 52 | 25 | 75 | 507090 | 42 | 18 | 78
2 | 20T20R20B | 57 | 36 | 84 | 406080 | 46 | 15 | 88
3 | 30T30R30B | 55 | 26 | 78 | 305070 | 34 | 14 | 64
4 | 30T204060R30B | 66 | 47 | 90 | 305070L30B | 37 | 16 | 76
5 | 20T204060R20B | 69 | 48 | 83 | 406080L30B | 35 | 13 | 72
6 | 20T305070R20B | 69 | 50 | 87 | 507090L30B | 36 | 11 | 74
Similar experiments were done for consonant feature extraction, trying different
percentage cuts from the left; experiments were done with three different percentage cuts.
As the bottom portion of the consonant shape carries distinguishing information for some
similarly shaped characters, some experiments combined the three left cut images with one
bottom cut image, but we found that the system performance reduced from 46% to 35%, as
shown in table 9.11. The two highlighted sets, 406080 and 406080L30B, from the two
categories are used for further work.
We considered two cut image sets each for vowel and consonant recognition in the further
experiments, so that the effect of the cut sizes along with the other features can be
analyzed.
9.3.3 Kagunita Recognition
We find through these experiments that none of the three types of images (original,
directional and cut images) alone gives satisfactory results. The recognition result
analysis showed that the characters suffering lower recognition rates are different in all
three cases; hence a combined feature set should perform better. Based on this intuition,
we performed further experiments on combined feature sets. The results are shown in table
9.12.
Table 9.12 Experimental results for Kagunita with combined feature sets (images used: whole and cut images)

No. | Feature set | Vowels | Consonants | Observations
(1) | G-CM+G-HU+G-S+20T305070R20B | Top-1 84%, Top-2 92%, Top-3 96% | | 96% in top-3 shows that the maximum confusion is with similar shaped characters
(2) | (1) + O-M | Top-1 86% | | With the addition of O-M, a 2% increase in the vowel results
(3) | G-CM+G-HU+G-S+406080 | | Top-1 63% |
(4) | (3) + O-M | | Top-1 65% | The addition of moments from the original image improved the consonant results by 2%
(5) | G-CM+G-HU+G-S+406080+O-M+O-C+O-S | | Top-1 72%, Top-2 84%, Top-3 89% | The O-C curvature information, O-S density distribution and O-M moments from the original image improved the results from 63% to 72%
The moments and statistical features extracted from the cut images 20T305070R20B and from
the four Gabor directional images (G-CM, G-HU and G-S) performed well for vowel
recognition, with an average recognition rate of 84%, as in table 9.12; the minimum and
maximum recognition rates are 72% and 93% respectively, and the top-2 and top-3 results
are 92% and 96%. With the O-M features added, the result is 86%, an improvement of 2% for
vowel recognition. The best cut image result is shown in the table. With
G-CM+G-S+30T204060R30B, using the other set of 5 cut images, the top-1, top-2 and top-3
results are 82%, 91% and 95% respectively. This shows that the specific cut image size has
minimal influence and performs similarly alongside the other features for vowel
recognition.
Similarly for consonants, the average recognition rate is 63%, as in table 9.12. With O-M
there is an improvement of 2%, to an average consonant recognition rate of 65%, which is
not sufficient to build a practically feasible HCR solution. To strengthen the consonant
feature set, we also considered the curvature and statistical features from the original
image for consonant recognition. This improved the results by 7%, to an average
recognition result of 72%; the minimum and maximum recognition rates are 39% and 97%
respectively, and the top-2 and top-3 results are 84% and 89%. With G-CM+G-S+406080L30B
(using the other cut image set), the top-1, top-2 and top-3 results are 70%, 82% and 88%
respectively. Again, at the top-3 level the performance difference is just 1%, so we can
say that cut images of different sizes and numbers have minimal influence on the
performance.
To uniquely recognize the 34 different consonants under huge vowel shape variations and
close consonant similarities, the feature set needs to be strengthened further. For this
we computed the confusion matrix to analyze the reasons for confusion.
We examined the confusion matrices for vowels and for consonants to find the most confused
vowels and consonants and what they are confused with. We considered all characters
confused 5 or more times. The vowel confusion matrix is shown in table 9.13. For clarity,
the typed characters are shown instead of handwritten images.
Table 9.13 Vowel confusion matrix
[nine rows, each listing a vowel and its matra against the vowel/matra pairs it was
confused with 5 or more times; the entries are typed Kannada glyphs in the original]
The analysis of the confusion matrix shows that the confusion is mostly between similar
vowel matras. It is also observed that some Kagunita shapes, together with their matra,
become similar to other Kagunita shapes (several example pairs are shown in the original).
With huge writing variations, if the circle in the ( ) matra becomes smaller or the circle
in the ( ) matra becomes bigger, there will be close similarities. If the ( ) matra is
written with a short horizontal line, it can be confused with the ( ) matra.
The consonant confusion matrix is shown in table 9.14. We can observe close shape
similarity as the reason for the confusion: for example, ( ) with a matra becomes ( ), a
shape very close to ( ). The manually observed shape similarities match the confusion
results.
Table 9.14 Consonant confusion matrix
[17 rows, each listing a consonant against the consonants it was confused with 5 or more
times; the entries are typed Kannada glyphs in the original]
To further improve the results, a cascaded neural network is proposed. This design is
based on the intuition that knowing the kind of vowel present in the Kagunita image may
improve the consonant recognition, and vice versa (see section 8.4). As shown in figure
9.4, the architecture has four neural networks (NNs): two independent NNs in the first
stage, one for vowel recognition and another for consonant recognition, and two
independent NNs in the second stage, again separately for vowel and consonant recognition.
Figure 9.4 Cascaded Neural Network
The top-3 vowel results from the first stage are encoded in 15 bits, with the
corresponding vowel bit positions set to 1, giving the V feature vector of size 15.
Similarly, the top-3 consonant results from the first stage are encoded in 34 bits by
setting the corresponding bits, forming the C feature vector of size 34. This encoding is
chosen because it does not signify the ranking of the recognition, so the top-3 matches
carry equal weightage.
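A sketch of this unranked encoding and of how the second-stage inputs are assembled; the helper name and the concatenation shown are ours, not the thesis code:

```python
import numpy as np

def encode_top3(scores, n_classes):
    """Unranked top-3 encoding: an n_classes-bit vector (15 for vowels,
    34 for consonants) with the three best classes' bits set to 1, so
    all three hypotheses carry equal weight."""
    v = np.zeros(n_classes, dtype=np.float32)
    v[np.argsort(scores)[::-1][:3]] = 1.0
    return v

# Second-stage consonant input = consonant features + V (top-3 vowels
# from the first stage); the second-stage vowel network likewise gets C:
#   x_c2 = np.concatenate([x_c, encode_top3(vowel_scores, 15)])
#   x_v2 = np.concatenate([x_v, encode_top3(consonant_scores, 34)])
```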
The encoded top-3 vowel results from the first-stage NN are used as additional features,
along with the other consonant features, by the second-stage NN for consonant recognition.
Similarly, the encoded top-3 consonant results from the first stage are used as additional
features, along with the other vowel features, by the second-stage NN for vowel
recognition. The second-stage results are thus influenced by knowledge of the top-3
possible consonants that may be present in the Kagunita for vowel recognition, and vice
versa. The results and observations are in table 9.15.
Table 9.15 Experimental results for Kagunita with cascaded NN

Feature set | % Recognition results
Vowels: G-CM + G-S + 20T305070R20B + C | Top-1 84%, Top-2 93%, Top-3 97%
Consonants: G-CM + G-S + O-GM + O-CM + O-C + O-S + 406080 + V | Top-1 72%, Top-2 84%, Top-3 90%

Observation: the significant improvement in the top-3 results indicates that most of the
time the confusion is between three similar shaped characters.
The individual top-1 recognition results for the vowel and the consonant in a Kagunita are
84% and 72% respectively. These are the same as before cascading, which means that the
overall performance of the system is not affected. Looking at individual performances, for
consonants the minimum recognition rate improved from 39% to 44% and the top-3 recognition
results improved by 1%; for vowels, the maximum recognition rate improved from 93% to 95%.
When a Kagunita is viewed as a single character, recognizing it requires both the vowel
and the consonant in that character to be recognized correctly. We considered the top-3
results of the cascaded neural network, as shown in table 9.16.
Table 9.16 Kagunita recognition results
Columns, for each consonant number: vowel true & consonant true (count, %); vowel false &
consonant true (count, %); vowel true & consonant false (count, %); vowel false &
consonant false (count, %).
1 190 84.44 16 7.11 17 7.56 2 0.89
2 198 88 6 2.67 17 7.56 4 1.78
3 198 88 8 3.56 19 8.44 0 0
4 179 79.56 13 5.78 30 13.33 3 1.33
5 210 93.33 9 4 4 1.78 2 0.89
6 184 81.78 9 4 31 13.78 1 0.44
7 190 84.44 12 5.33 22 9.78 1 0.44
8 176 78.22 13 5.78 35 15.56 1 0.44
9 153 68 61 27.11 4 1.78 7 3.11
10 205 91.11 14 6.22 6 2.67 0 0
11 208 92.44 8 3.56 9 4 0 0
12 206 91.56 6 2.67 12 5.33 1 0.44
13 189 84 9 4 23 10.22 4 1.78
14 192 85.33 9 4 23 10.22 1 0.44
15 203 90.22 13 5.78 8 3.56 1 0.44
16 192 85.33 7 3.11 25 11.11 1 0.44
17 179 79.56 12 5.33 32 14.22 2 0.89
18 182 80.89 4 1.78 39 17.33 0 0
19 190 84.44 3 1.33 30 13.33 2 0.89
20 202 89.78 10 4.44 13 5.78 0 0
21 187 83.11 8 3.56 27 12 3 1.33
22 167 74.22 16 7.11 38 16.89 4 1.78
23 190 84.44 7 3.11 27 12 1 0.44
24 199 88.44 9 4 16 7.11 1 0.44
25 170 75.56 36 16 16 7.11 3 1.33
26 176 78.22 28 12.44 18 8 3 1.33
27 199 88.44 0 0 25 11.11 1 0.44
28 198 88 11 4.89 16 7.11 0 0
29 183 81.33 10 4.44 29 12.89 3 1.33
30 189 84 11 4.89 22 9.78 3 1.33
31 177 78.67 25 11.11 20 8.89 3 1.33
32 164 72.89 6 2.67 49 21.78 6 2.67
33 199 88.44 10 4.44 15 6.67 1 0.44
34 199 88.44 13 5.78 12 5.33 1 0.44
Average % 83.96 5.82 9.84 0.84
Minimum % 68 0 1.78 0
Maximum % 93.33 27.11 21.78 3.11
The top-3 Kagunita recognition rate is 84%, with both vowel and consonant recognized
correctly. Cases where only the vowel is wrongly recognized average 6%, cases where only
the consonant is wrongly recognized average 10%, and cases where both are wrongly
recognized average around just 1%.
The confusion is in terms of which component (vowel, consonant or both) is wrongly
recognized. The Kagunita with consonant 9 showed the maximum error, 27.1%, for wrong
recognition of the vowels associated with it; the consonant itself showed an error of only
1.78%, while both components were wrongly recognized 3.11% of the time, again the maximum
for that case. Similarly, the Kagunita with consonant 32 had its consonant wrongly
recognized at a maximum error rate of 21.8%, its vowel component wrongly recognized 2.67%
of the time and both wrongly recognized 2.67% of the time. Cases of a Kagunita with both
vowel and consonant false reach a maximum of only about 3%.
Consonant 5 has the maximum top-3 recognition rate of 93.33%, and consonant 9 the minimum,
at 68%; the same consonant 9 suffered the most faults in vowel recognition. Consonant 27
(both components correctly recognized 88.4% of the time) suffered no vowel recognition
problem (0%), with 11.11% wrong consonant recognition and 0.44% with both vowel and
consonant wrong. Consonant 32 suffered the most in consonant recognition, at 21.8%,
whereas both of its components going wrong occurred 2.67% of the time. Consonant 9
suffered the most with both wrong, at 3.11%. The confusion analysis shows that consonant 9
has similarity with ( ) and has more curves to the right in its basic shape than the
others, due to which the shape change caused by the matra is different. We also observe
reduced recognition rates for ( ) (78.2%) and ( ) (75.8%).
The overall analysis of the Kagunita results is as follows.
When both the vowel and the consonant are correctly recognized, the Kagunita is
recognized, and the top-3 Kagunita recognition rate is 84%. As both components being
erroneously recognized averages just 1%, a post-processing stage can determine the wrongly
recognized vowel or consonant, or the correct choice among the top-3 characters.
The cut image concept worked well for Kagunita recognition. In the vowel recognition
experiments, the three cuts from the right performed better than a single cut on the
right, presumably because one of the three cuts extracts the right-side vowel information
fairly well depending on the size of the vowel.
In the consonant recognition experiments, the three cuts from the left performed well. The
additional cut from the bottom did not contribute positively, which is expected given that
the matra modification is rarely on the bottom side.
The cascaded network appears to be a promising idea as per our initial experiments, but it
did not contribute any noticeable improvement to the top-1 performance, though there was
visible improvement in the overall performance. This is now being investigated further.
No comparable work in Indian language OCR exists against which to compare our results,
much less for Kannada; much of the existing work covers only numerals or, at best,
stand-alone vowels and consonants. Even though, in absolute terms, the recognition figures
may not be high enough for practical use, the recognition rate is achieved over a fairly
extreme range of character shapes and at the single, isolated character level. In a
practical system, one can use linguistic information as part of post-processing to
significantly enhance the effective recognition: for example, using a dictionary to
cross-check and rank the characters can often yield word recognition with accuracy high
enough for practical use. In some cases, human intervention can be brought in to fix the
surviving errors. In practice, one can also tune the system with character samples on a
per-user basis to get better performance figures.
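As an illustration of the dictionary cross-check idea (a sketch, not a component of the reported system), the per-position top-3 choices can be expanded into candidate words and filtered against a lexicon:

```python
from itertools import product

def dictionary_filter(top3_per_position, lexicon):
    """Expand the per-character top-3 alternatives into candidate words
    and keep only those found in the lexicon; a fuller post-processor
    would also rank candidates by the network's recognition scores."""
    candidates = (''.join(chars) for chars in product(*top3_per_position))
    return [word for word in candidates if word in lexicon]

# e.g. dictionary_filter([('a', 'o'), ('n', 'm')], {'an', 'om'})
# -> ['an', 'om']
```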