

Chapter 9

Experimental Results and Analysis

We have approached the problem of Indian language HCR in a series of phases of increasing complexity, starting with vowels, then adding consonants and finally covering Kagunita. At each stage, the performance was critically and systematically analyzed to identify problem areas. These were investigated through sub-experiments and the overall model was refined accordingly.

At each stage, different experiments were formulated and conducted. Given the overall recognition pipeline for OCR problems (see chapter 2), the experiments were set up with the following hypotheses:

1. Preprocessing the images: Given the wide variation in the image samples that we have collected, a perfect preprocessor was unlikely. Therefore, a preprocessor needs to be formulated based on how well the overall recognition performs. At the same time, we felt that the pipeline usually followed in HCR for other languages may not be appropriate for Indian languages (we restrict ourselves to Kannada in the rest of the discussion). On the one hand, Kannada gives a lot of importance to small elements like knots, dots and hooks, and hence the preprocessor should recognize and retain them. On the other hand, the richness of shapes in Kannada may allow higher tolerance of distortion. Thus, the first challenge was to evaluate a traditional preprocessing pipeline and modify it to evolve a preprocessor suitable for Kannada.

2. The choice of features: The literature on character recognition (see chapter 2) reports a wide range of features of different natures. Given the differences between Indian scripts and the scripts of other languages such as English, a one-size-fits-all approach is unlikely to work. So we needed to study the utility and effectiveness of the available features for the Kannada script and formulate additional features as required. Much of our work was in this direction. As mentioned below (and introduced in chapter 7), a number of different feature classes were studied empirically. We also formulated some new feature classes which we felt would be useful for the Kannada script.

3. Tackling Kagunita: Kagunita, the combination of a consonant and a vowel that changes the shape of both, is a major challenge for Indian languages. Blindly treating every combination as a separate class did not appear appropriate. A number of studies were carried out to formulate a split recognition strategy, trying to recognize the vowel and the consonant individually and collaboratively from a given Kagunita.

In the next three sub-sections we discuss these three classes of experiments briefly and analyze the results.

9.1 Studies on Preprocessing Pipeline

A set of experiments was performed using a modified static preprocessing pipeline on the basic character set. Analysis of the results showed that around 10% of the images remained noisy after preprocessing, and these resulted in poor recognition.

We also experimented to find the effect of resizing the image with and without preserving the aspect ratio, and found that the results were influenced by both the target size and the method of size normalization. Images resized to 64x64 while maintaining the original shape showed an improvement of 7% over images resized to 64x64 ignoring the aspect ratio. This was to be expected, since the overall shape of the character is an important attribute for certain character pairs. We also studied the effect of various thinning algorithms on the basic character set, since loss of critical details like knots and dots would be damaging for Kannada. The thinning algorithm had a clear influence: results dropped by 9% with the LoG and Canny edge detection methods as compared to morphological thinning. We therefore included morphological thinning in our pipeline. Due to the huge variations in background and foreground intensity in our image database, this modified static pipeline still could not eliminate the noise. We analyzed the noise as discussed in section 6.4. The two major noise problems traced to the static preprocessing pipeline are:

- conversion of background information to foreground and foreground information to background, due to the huge background and foreground variations observed over the whole corpus;

- the addition of spurious joining edges between nearby contours, due to small image size, ink spread, etc.

Based on this feedback, we formulated the dynamic preprocessing pipeline (see section 6.5) that dynamically categorizes the images based on the above factors for noise cleaning. These studies, observations and the resulting pipeline architecture were discussed in detail in chapter 6. The dynamically preprocessed images were mostly free from noise. This thinned image, confined within a bounding box and resized, is used further for feature extraction.
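To make this final normalization step concrete, the following is a minimal sketch (not the thesis implementation) of fitting a character into a 64x64 box while preserving its aspect ratio, centering it, and applying morphological thinning. The use of OpenCV and scikit-image, and the 1-pixel margin, are illustrative assumptions.

    import cv2
    import numpy as np
    from skimage.morphology import skeletonize

    def normalize_character(binary_img, size=64):
        """Fit a binary character image into a size x size box, keeping its shape."""
        h, w = binary_img.shape
        scale = (size - 2) / max(h, w)                 # preserve aspect ratio
        new_h = max(1, int(round(h * scale)))
        new_w = max(1, int(round(w * scale)))
        resized = cv2.resize(binary_img.astype(np.uint8), (new_w, new_h),
                             interpolation=cv2.INTER_NEAREST)
        canvas = np.zeros((size, size), dtype=np.uint8)    # bounding box
        y0, x0 = (size - new_h) // 2, (size - new_w) // 2
        canvas[y0:y0 + new_h, x0:x0 + new_w] = resized
        return skeletonize(canvas > 0).astype(np.uint8)    # morphological thinning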

9.2 Studies on Feature Set for Basic Character Set

We experimented with the basic character set first and analyzed the maximum recognition capability of different feature sets, considering parameters such as average recognition rate, generalization capability and the confusion matrix.

A number of features were derived for the experiments on the basic character set (and later on Kagunita images) based on the observations and transformations discussed in chapter 7. For convenience in identifying the features in subsequent discussions and figures, we introduced a notation scheme (in chapter 7) to describe the different features. The notation is summarized in table 9.1 for ease of reference. For example, O-CM means central moment features extracted from the original image.

Table 9.1 Feature notation

Notation   Meaning                                           Examples
O-         Derived from original image (preprocessed)        O-CM
G-         Derived from Gabor directional images             G-CM
C-         Derived from cut images                           C-CM
CM         Central moment features                           O-CM, G-CM, C-CM
GM         Geometric moment features                         O-GM, G-GM, C-GM
HU         Hu's moment features                              O-HU, G-HU, C-HU
CPB        Curve position from boundary                      O-CPB
GI         Global information content                        O-GI, G-GI
GA         Global aspect ratio                               O-GA, G-GA
ZD         Zonal density                                     O-ZD
CPC        Curve position from center                        O-CPC
CLB        Lengths of curves from boundary                   O-CLB
CLC        Lengths of curves from center                     O-CLC
ZPD        Zonal probability distribution                    G-ZPD, C-ZPD
S          Statistical features: GI, GA and ZD               O-S
           (GI, GA and ZPD for directional/cut images)       G-S
C          Curvature features: CPB, CPC, CLB & CLC           O-C
M          Moments features: GM, CM and HU                   O-M, G-M

For the basic character set, we extracted the features from the preprocessed original image. We chose the initial set of features based on manual observation of their discrimination ability, non-linearity, possible impact on recognition, etc. The initial feature sets chosen were O-M (14 features), O-ZD (30 features) and O-CPB (18 features, along with O-GA and O-GI). The value ranges taken by the features were verified using Weka, an open-source machine learning framework. Some features had a very low range of values compared to others, which may lower the triggering of the corresponding nodes of the neural network. Hence, such features were enhanced by scaling their range of values to make all the features comparable in terms of value range.
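A minimal sketch of this range-scaling step is given below; the target range [0, 1] is an assumption, the point being only that every feature ends up with a comparable span.

    import numpy as np

    def scale_feature_ranges(X):
        """Rescale each column of X (samples x features) to a comparable range."""
        lo, hi = X.min(axis=0), X.max(axis=0)
        span = np.where(hi > lo, hi - lo, 1.0)   # avoid division by zero
        return (X - lo) / span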

9.2.1 Effectiveness of various feature classes

The manual observations on the feature sets were analyzed through the experimental results. We first performed experiments on individual feature sets and on various subsets, to learn their individual contributions and their relations to the other features in the feature sets. Then, based on these observations, the final feature set was formed. Along with the average recognition rate, we computed for every feature category the classes that obtained the minimum and maximum recognition results and the count of classes whose recognition rate is above average. Table 9.2 shows the contributions of the various feature sets taken individually and in combination.


Table 9.2 Performance of various feature sets on the basic character set

                          Recognition results in %   Count of classes   Class no. with min /
S.No.   Feature set       Min    Max    Avg          >= average         max recognition
1       O-CPB             19     78     43           19                 20 / 41
2       O-ZD              40     88     63           25                 29 / 43
3       O-CPB+O-ZD        55     95     79           21                 33 / 43
4       O-GM              11     79     35           18                 3 / 43
5       O-CM              4      56     26           22                 11 / 9
6       O-HU              21     87     40           19                 22 / 43
7       O-HU+O-CM         40     93     62           20                 45 / 43
8       O-HU+O-GM         37     93     61           19                 3 / 43
9       O-M               50     96     72           18                 31 / 43

9.2.2 Effectiveness of individual feature classes

The experiments on the individual feature sets showed that the zonal density statistical features (O-ZD) performed well, with an average of 63%, as compared to the curvature features (O-CPB) with 43%. Hu's moments (O-HU) performed well, with 40%, as compared to geometric moments (O-GM) with 35% and central moments (O-CM) with 26%. As the number of character classes to be recognized is large (49 classes: 15 vowels and 34 consonants) with huge variations in the sample images, the individual performance of these feature sets is not satisfactory. Relative to the small count of features per feature set, however, even the central moments' performance (26%) is promising. Hence, we chose to continue experimenting with all these feature sets.

9.2.3 Effect of Combining Feature Classes

It is observed that, with most of these features, the characters that had the minimum recognition rate are different. This reveals how the feature sets behave on different character shapes. A character shape poorly recognized by one feature set is often well recognized by another (we call this the non-linear interaction of the respective features), and hence these feature sets can be combined. A number of combinations were explored, and the important results are summarized below.


a. As density and curvature features deal with different kinds of information, they can be grouped together; the combined average performance is 79%. Though the individual contributions of these feature sets are poor, together they are capable of classifying 21 out of 49 characters above 79%.

b. Even though the central moments performed poorly compared to the geometric moments, along with Hu's moments they showed 62% average recognition power and were able to distinguish 20 of 49 characters above average. As geometric moments deal with the image without normalization, central moments are translation normalized, and Hu's moments represent the image shape with features invariant to translation, scale and rotation, they convey different information and hence can be grouped. The combined moments feature set (only 14 features) showed an average recognition rate of 72%, compared to 79% for the density and curvature feature set (48 features). Hence, moments are robust features against huge variations. We also found the character that is confused the most by each of the combined feature sets.

c. We found that the character with the minimum recognition rate is different for different feature sets. As moments capture centralness, diagonality, deviation, etc., we can group these features with the zonal density and curvature information. With O-M, O-ZD and O-CPB (along with the O-GA and O-GI features), almost all the characters were classified well, with an average recognition rate of 92%, a minimum of 78% for character 24 ( ) and a maximum of 100% for character 6 ( ), whose handwritten shapes are shown in figure 9.1.

Figure 9.1 Sample Images of characters 24 and 6

The number of characters recognized above 92% is 24; that is, more than 50% of the characters have a recognition rate above 92%.

To find the reasons for misclassification, we analyzed the confusion matrix built from the misses in the testing stage, as shown in table 9.3. Each entry c(x,y) in the matrix gives the count of how many times character x is confused with character y. The total number of confusions per character is calculated. This is used to analyze the performance of the system and identify areas requiring further attention.

The confusion matrix was analyzed to identify the nature of the failures and to find measures to improve performance. The characters with the most misses were analyzed to determine with which characters they are confused and how many times. Characters with a miss count > 4 were chosen for further analysis. Some of these are tabulated in table 9.3, showing images of the confused character and of those with which it is confused, along with the misclassification count. For example, character no. 10 is confused with character no. 9 seven times. Similarly, character no. 12 is confused with character no. 20 seven times and with character no. 23 nine times, etc. The printed form of each character is also shown to facilitate comparison.
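A sketch of this analysis step is shown below: it accumulates the confusion counts c(x, y) and lists the pairs exceeding the miss-count threshold, as tabulated in table 9.3. The representation of labels as integer class indices is an assumption.

    import numpy as np

    def confused_pairs(y_true, y_pred, n_classes, min_count=5):
        """Return (x, y, count) for characters x confused with y >= min_count times."""
        c = np.zeros((n_classes, n_classes), dtype=int)
        for t, p in zip(y_true, y_pred):
            if t != p:
                c[t, p] += 1                     # entry c(x, y) of the matrix
        return [(x, y, int(c[x, y]))
                for x in range(n_classes) for y in range(n_classes)
                if c[x, y] >= min_count]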

Table 9.3 Confusion matrix for the O-CPB, O-M and O-S feature set
(printed character images omitted)

Sl No   Character no.   Confused with: character no. (count)
1       10              9 (7)
2       12              20 (7), 23 (9)
3       20              12 (6)
4       22              39 (5)
5       28              33 (8), 48 (6)
6       31              42 (5)
7       32              29 (11)
8       34              29 (5)
9       40              7 (7)
10      44              9 (7)

These confusion matrix entries clearly indicate high shape similarity among some character pairs, together with huge writing variations, as the reasons for confusion. The data set considered for the experiment has very large variations. Some characters are written so badly or differently (see the 2nd column of rows 1, 6 and 10 of the confusion matrix) that it is hard even for human readers to identify them when presented in isolation.


Based on the confusion matrix analysis, we further developed two new feature sets for basic character recognition, as explained in (d) and (e) below.

d. The detailed analysis of the confusion matrix showed that the characters that can only be differentiated by features within the contour are the ones under confusion. We also found that the contribution of the curvature features is low compared to the zonal density and moments features, mainly because the curvature details were taken only from the boundaries. As some Kannada characters are differentiated by central information, curvature details observed from the center may be important. Hence we improved the curvature feature set by adding more curvature features (O-CLB, O-CPC and O-CLC). We found that these additional features improved the average performance of the curvature features from 43% to 73%.

e. As Kannada characters are curved, the moments features from the Gabor directional images (see section 7.4) appeared likely to provide useful discriminatory information. We studied this experimentally and obtained an average performance of 85% with moments from the Gabor directional images, as compared to 72% when using moments from the original images. The minimum and maximum recognition rates obtained are 67% and 99% respectively. The number of character classes recognized above the average (85%) is 25. The character that suffered the minimum recognition rate is character 47, which is different from the character (character 31) that suffered the minimum recognition using moments from the original image. The plot of the recognition rates of both these results, shown in figure 9.2, indicates that they are non-linearly related. Hence they can be combined to produce a still stronger feature set.


[Figure: per-class recognition rates (y-axis: recognition results in %; x-axis: the 49 classes) for the original-image moments, the directional-image moments and the combined features.]

Figure 9.2 Recognition results with moments features from original and directional images
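A hedged sketch of how the four Gabor directional images might be generated (the thesis method is in section 7.4; the kernel size, wavelength and other filter parameters below are illustrative assumptions):

    import cv2
    import numpy as np

    def gabor_directional_images(img):
        """Filter the image at 0, 45, 90 and 135 degrees; one output per direction."""
        outputs = []
        for theta in (0.0, np.pi / 4, np.pi / 2, 3 * np.pi / 4):
            kernel = cv2.getGaborKernel(ksize=(9, 9), sigma=2.0, theta=theta,
                                        lambd=8.0, gamma=0.5)
            outputs.append(cv2.filter2D(img.astype(np.float32), -1, kernel))
        return outputs   # moment features are then extracted per directional image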

The study on feature sets resulted in a number of feature sets: the curvature feature set, the moments feature set, the density (statistical) feature set, and the moments feature set from Gabor directional images. All these feature sets show a non-linear relation and hence can be combined to obtain better performance. Together they produce better results than any of them would individually, as we discuss in the next section.

9.2.4 Final Recommendations

A number of experiments were performed on an optimized multi-layer perceptron (MLP) neural network trained with the back-propagation method. In each case, the training process was stopped when the performance was observed to be at its peak and the recognition results remained steady over a number of epochs. The best results observed for basic character set recognition are presented in table 9.4. The results on the basic character set using curvature, statistical and moments features extracted from dynamically preprocessed images, and those using moments features from the original image and from the Gabor directional images, are discussed in detail in the following sub-sections.
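As a stand-in for the thesis implementation, the sketch below trains a comparable MLP with scikit-learn; the 188 hidden nodes follow sub-section 9.2.4.1, while the solver and stopping hyperparameters are assumptions.

    from sklearn.neural_network import MLPClassifier

    def train_mlp(X_train, y_train, hidden_nodes=188):
        """MLP with back-propagation; stops once validation performance plateaus."""
        clf = MLPClassifier(hidden_layer_sizes=(hidden_nodes,), solver='sgd',
                            early_stopping=True, n_iter_no_change=20,
                            max_iter=2000)
        return clf.fit(X_train, y_train)   # X_train: scaled feature vectors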


Table 9.4 Experimental results for the basic character set
(Recognition results in %.)

Sl No   Feature set          Average   Minimum   Maximum
1       O-M + O-S + O-C      96        86        100
2       G-M + O-M            92        80        100

9.2.4.1 Experiment results with curvature, statistical and moments features

With the CPC, CLB and CLC features added (O-M, O-S and O-C together), the recognition rate further improved, to an average of 96% with a minimum of 86%. We also performed the experiments with a reduced number of hidden nodes: this 94-feature set achieves an average of 88% with 49 hidden-layer nodes versus 96% with 188 nodes, a performance drop of only 8%. This suggests that the system generalizes well and may be capable of recognizing unseen characters with this feature set. The confusion matrix is in table 9.5.

Table 9.5 Confusion matrix for the O-C, O-M and O-S feature set
(printed character images omitted)

Sl No   Character no.   Confused with: character no. (count)
1       12              23 (5)
2       20              12 (4)
3       27              28 (4), 42 (4)
4       29              32 (5), 34 (4)
5       31              28 (4)
6       32              34 (4), 47 (4)
7       40              7 (4)
8       44              28 (4)
9       45              35 (5), 42 (4)
10      46              19 (6)

The confusion matrix analysis shows that the remaining confusion is between similarly shaped characters. It is observed that 5 is the maximum number of times any character has been confused with another, against the 200 instances used (less than 2.5%). All other mismatches are scattered, and some are seen to be random matches to a class, like character 45 mismatching with character 42, etc.

9.2.4.2 Experiment results with moments as features

The moments features from the four directional images gave an average recognition rate of 85% with a minimum recognition rate of 76%. As the moments features extracted from the original image and from the directional images show non-linearity, as explained in item (e) of sub-section 9.2.3, they can be combined as features. When the O-M features are considered along with the G-M features, the minimum recognition rate increased from 67% to 80%, an improvement of 13%, and the average recognition rate improved from 85% to 92%. There are 22 characters above the 92% average recognition rate and 42 characters (out of 49) above 85%. The character that suffered the minimum recognition rate of 80% is character 32. The final confusion matrix is shown in table 9.6.

Table 9.6 Confusion matrix for the combined moments features
(printed character images omitted)

Sl No   Character no.   Confused with: character no. (count)
1       12              20 (5)
2       19              37 (7)
3       22              39 (6)
4       23              20 (6)
5       24              7 (7)
6       26              10 (5)
7       27              42 (5)
8       28              33 (6)
9       32              22 (5), 34 (9), 37 (6)
10      36              27 (5), 47 (5), 48 (5)
11      37              29 (5)
12      41              40 (5)
13      44              9 (8)
14      45              18 (5), 27 (7)
15      47              4 (5), 16 (5), 34 (6)
16      48              7 (6)
17      49              25 (5)

The confusion matrix shows that the remaining confusion is between closely similar character shapes, which are hard to distinguish even manually under large writing variations.

The different categories of features (curvature, statistical and moments) together perform well. It is observed that moments are strong features for representing Kannada characters. All these features can be used together to improve the performance further. But due to character shape similarities and writing deformations, even manual recognition of some of the samples is difficult when such images are seen in isolation. In such cases, no number of features added to the feature set may be able to discriminate the character, and it is better to output the top-3 choices and resolve them in a post-processing stage. Thus, at this stage, we have two sets of features that can be used effectively for Kannada, as shown in table 9.4.

9.3 Studies on Feature Set for Kagunita

As a Kagunita is a combination of a vowel and a consonant whose resultant shape fully retains neither the consonant shape nor the vowel shape, recognition of a Kagunita character is a difficult task. As there are similarities between the vowel matras and between the consonants, and also huge writer variations, recognizing the shape directly as one of 510 Kagunita classes may not be a suitable choice. To reduce the number of classes from 510, we proposed to explore separate recognition of the vowel and the consonant present in the Kagunita image.


We used three different categories of images for recognizing a Kagunita in terms of its vowel and consonant: cut images (see section 7.5), Gabor directional images (see section 7.4) and the original preprocessed image. The original image and the directional images are complete images, whereas the cut images are portions of the whole image, cut to include the regions dominated by vowel or consonant information, to form isolated images. For Kagunita recognition, we used all the above features (curvature, statistical, moments and moments from directional images) and a new zonal probability density feature. The Kagunita features extracted from the three different image types (original, directional and cut images) are shown in table 9.7.

Table 9.7 Feature sets for experiments on Kagunita

Image type                       Kagunita features (feature count)                    Feature set size
Cut images                       C-CM (3), C-HU (7),                                  16 per cut image
                                 C-ZPD (1 per zone x 6 zones)
Four Gabor directional images    G-CM (3), G-HU (7), G-GI (1), G-GA (1),              22 per directional image
                                 G-ZPD (1 per zone x 10 zones)
Original image                   O-CPB (16), O-CPC (16), O-CLB (8), O-CLC (8),        94
                                 O-GM (4), O-CM (3), O-HU (7), O-GI (1), O-GA (1),
                                 O-ZD (5 per zone x 6 zones)

Vowel recognition in a Kagunita is a 15-class problem, wherein the vowel matra associated with a consonant must be identified. For vowel recognition in the Kagunita, we used the cut images and the directional images. Similarly, consonant recognition in a Kagunita is a 34-class problem, wherein the consonant associated with a vowel matra must be identified. For consonant recognition, since the number of consonant classes is large, many consonants are similar in shape, and the vowel matra modifies the shape, we need a large feature set with strong discriminative features. Here we used the features extracted from all three image types: cut images, directional images and the original image. To extract the zonal statistical feature, the zonal probability density, from the cut images and the Gabor directional images, we used Nx1 horizontal zones and 1xN vertical zones. For the cut images we used N=3, giving 6 zones, and for the Gabor directional images N=5, giving 10 zones.
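The zoning itself can be sketched as follows; treating the zonal probability density as the fraction of the total foreground falling in each zone is our assumed reading of the definition. With n=3 this yields the 6 zones used for cut images, and with n=5 the 10 zones used for the directional images.

    import numpy as np

    def zonal_probability_density(img, n):
        """N x 1 horizontal and 1 x N vertical zones; one value per zone (2N total)."""
        total = img.sum() or 1.0
        horizontal = np.array_split(img, n, axis=0)   # N x 1 zones
        vertical = np.array_split(img, n, axis=1)     # 1 x N zones
        return [float(z.sum()) / total for z in horizontal + vertical]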


9.3.1 Performance of moment features on whole Kagunita images

The performances of the individual moments feature sets and their sub-feature sets on whole Kagunita images are given in table 9.8.

Table 9.8 Performance of moments features from whole Kagunita images
(Recognition results in %.)

                          Vowels                Consonants
Sl no   Feature set       Mean   Min   Max      Mean   Min   Max
1       O-CM+O-HU         40     15    70       24     8     51
2       O-M               47     18    74       29     4     62
3       G-CM+G-HU         63     42    81       35     9     60
4       G-M               64     40    82       36     9     62

Comparing the CM and HU features from the original images with the CM and HU features from the four Gabor directional images, there is an improvement of 23% and 11% in the recognition of vowels and consonants respectively. The performance of all 14 moment features (O-M) from the original image is 47% for vowels and 29% for consonants, and for G-M (56 features) it is 64% and 36%. The performance of G-M improved by just 1% on average, to 64% and 36% for vowels and consonants respectively, when G-GM was considered along with G-CM and G-HU. It is observed that the moments features from the directional images performed well compared to the moments features from the original images. The consonant results are poor with both feature sets.

The plots of O-M from the original images and G-CM+G-HU from the directional images for the 15 vowels and the 34 consonants are shown in figures 9.3 (a) and (b) respectively. As expected, they show a non-linear relation (indicated with circles in the figure). This suggests that they can be used together as features in further experiments.

Figure 9.3 Non-linearity of moments features from the original image and the four directional images: (a) recognition results for vowels, (b) recognition results for consonants


The vowel recognition results using the combined moments along with the statistical features are shown in table 9.9. When the moment features G-CM, G-HU and O-M are used together, the average vowel recognition rate improved from 63% to 71%. The statistical features G-S showed a performance of 21% individually, but when combined with G-CM and G-HU the results improved from 63% to 69%. When all these features were used together, the average result is 77%.

Table 9.9 Performance of combined moments and statistical features from whole Kagunita images
(Recognition results in %.)

                                 Vowels                Consonants
Sl no   Feature set              Mean   Min   Max      Mean   Min   Max
1       G-CM+G-HU+O-M            71     57    83       44     22    65
2       G-S                      21     9     43       10     1     29
3       G-CM+G-HU+G-S            69     50    84       43     20    67
4       G-CM+G-HU+G-S+O-M        77     56    92       49     23    78

Table 9.9 also lists the recognition results of the combined moments and statistical features for consonants. When the moment features G-CM, G-HU and O-M are used together, the average recognition result is 44%, an improvement of 9%. The statistical features G-S did not perform well individually, but when combined with G-CM and G-HU the results improved from 35% to 43%, and with O-M added, the result is 49%.

9.3.2 Performance of cut images on original Kagunita images

To understand the effect of cuts that extract a portion of the image, we experimented on the basic character set with a 20% cut from all directions; that is, the base character loses 20% of its information from every direction, so that the contribution of the image region around the boundary can be analyzed. As something similar happens to the cut image shapes, this analysis gives a feel for the impact of cuts on recognition. The results average 89% when tested using the O-CPB, O-S and O-M features; the reduction observed is only 3% in the mean results (see table 9.11). These results supported adopting the concept of cut images to make the vowel- or consonant-related strokes in the image more significant for feature extraction. This study also helped us understand the effect of cuts and cut positions on consonant recognition within a Kagunita.

In order to isolate portions of the image for Kagunita recognition, we analyzed the regions dominated by vowel information and by consonant information, as explained in section 7.5. It is clear from the Kagunita shape analysis that to extract the vowel information, five cut images are required (one on top, three on the right and one on the bottom), and to extract the consonant information, three cut images from the left are required. As the positions of the cuts for generating the cut images are influenced by writing variations, they cannot be uniquely determined in advance, and hence we identified some potential cut sizes. The percentage cut-image sizes considered for the vowel recognition and consonant recognition experiments are listed in tables 9.10(a) and (b) respectively.

Table 9.10 (a) Size and count of cut images for vowel recognition

                               Top               Right               Bottom
Sl No   Vowel feature set      Images  % cut     Images  % cut       Images  % cut
1       20T20R20B              1       20        1       20          1       20
2       30T30R30B              1       30        1       30          1       30
3       30T50R30B              1       30        1       50          1       30
4       30T204060R30B          1       30        3       20,40,60    1       30
5       20T204060R20B          1       20        3       20,40,60    1       20
6       20T305070R20B          1       20        3       30,50,70    1       20

Table 9.10 (b) Size and count of cut images for consonant recognition

                               Bottom            Left
Consonant feature set          Images  % cut     Images  % cut
507090                         -       -         3       50,70,90
406080                         -       -         3       40,60,80
305070                         -       -         3       30,50,70
305070L30B                     1       30        3       30,50,70

In the tables, we use a specific notation to describe the cuts. For example, '20T305070R20B' means a single 20% cut on the top, three different cuts of 30%, 50% and 70% from the right, and a single 20% cut at the bottom. Similarly, for consonant recognition, '406080' means 40%, 60% and 80% cuts from the left. The cut size is varied and different sets of cut images are generated for the experiments.
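The notation maps directly onto simple image slicing, as in the sketch below. The slicing convention, i.e. that a p% cut keeps p% of the rows or columns from the given side, is our reading of the notation.

    def top_cut(img, pct):
        return img[: max(1, int(img.shape[0] * pct / 100)), :]

    def bottom_cut(img, pct):
        return img[-max(1, int(img.shape[0] * pct / 100)):, :]

    def left_cut(img, pct):
        return img[:, : max(1, int(img.shape[1] * pct / 100))]

    def right_cut(img, pct):
        return img[:, -max(1, int(img.shape[1] * pct / 100)):]

    def cuts_20T305070R20B(img):
        """Five cut images for vowel recognition: 20%T, 30/50/70%R, 20%B."""
        return ([top_cut(img, 20)] +
                [right_cut(img, p) for p in (30, 50, 70)] +
                [bottom_cut(img, 20)])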


From all the cut images, the geometric moments, central moments, Hu's moments and the zonal probability density statistical features are computed. As there are four different vowel sizes to the right of the consonant, we considered either one cut or three cuts on the right, to check the effect of three cuts over a single cut. The performances of the different cut images on vowel recognition are shown in table 9.11, where the effect of different percentage cuts is analyzed. It is found that extracting three cut images performs well (69%) compared to a single cut (57%). As this selection is based on trials, two of the sets, 30T204060R30B and 20T305070R20B, were selected for further computations.

Table 9.11 Results of cut experiments on original images
(Recognition results in % on test data. Baseline: the 94 features from the basic character set under a 20% cut from all directions give Mean 89, Min 75, Max 100.)

        Vowels                                    Consonants
Sl No   Cut set           Mean   Min   Max        Cut set          Mean   Min   Max
1       30T50R30B         52     25    75         507090           42     18    78
2       20T20R20B         57     36    84         406080           46     15    88
3       30T30R30B         55     26    78         305070           34     14    64
4       30T204060R30B     66     47    90         305070L30B       37     16    76
5       20T204060R20B     69     48    83         406080L30B       35     13    72
6       20T305070R20B     69     50    87         507090L30B       36     11    74

Similar experiments were done for consonant feature extraction, trying three different percentage cuts from the left in each experiment. As the bottom portion of the consonant shape carries distinguishing information for some similarly shaped characters, some experiments also considered three left cut images together with one bottom cut image; however, we found that the performance reduced from 46% to 35%, as shown in table 9.11. The two highlighted sets, 406080 and 406080L30B, are used for further work.

We carried forward two cut-image sets each for vowel and consonant recognition, so that the effect of the cut sizes along with the other features could be analyzed.


9.3.3 Kagunita Recognition

We find through these experiments that none of the three image types (original, directional and cut images) alone gives satisfactory results. The recognition result analysis showed that the characters suffering lower recognition rates are different in all three cases; hence a combined feature set should perform better. Based on this intuition, we performed further experiments on combined feature sets. The results are shown in table 9.12.

Table 9.12 Experimental results for Kagunita with combined feature sets
(All feature sets draw on whole and cut images.)

(1) G-CM+G-HU+G-S+20T305070R20B
    Vowels: Top-1 84%, Top-2 92%, Top-3 96%.
    96% in top-3 shows that the maximum confusion is with similar-shaped characters.
(2) (1) + O-M
    Vowels: Top-1 86%. The addition of O-M gives a 2% increase in the vowel results.
(3) G-CM+G-HU+G-S+406080
    Consonants: Top-1 63%.
(4) (3) + O-M
    Consonants: Top-1 65%. The addition of moments from the original image improved the consonant results by 2%.
(5) G-CM+G-HU+G-S+406080+O-M+O-C+O-S
    Consonants: Top-1 72%, Top-2 84%, Top-3 89%. The O-C curvature information, O-S density distribution and O-M moments from the original image improved the results from 63% to 72%.

The moments and statistical features extracted from the cut images 20T305070R20B and from the four Gabor directional images (G-CM, G-HU and G-S) performed well, with an average recognition rate of 84% for vowel recognition, as in table 9.12. The minimum and maximum recognition rates are 72% and 93% respectively, and the top-2 and top-3 results are 92% and 96%. With the O-M features added, the result is 86%, an improvement of 2% for vowel recognition. The best cut-image result is shown in the table. With G-CM+G-S+30T204060R30B, using the other 5-cut-image set, the top-1, top-2 and top-3 results are 82%, 91% and 95% respectively. This shows that the specific cut-image size has minimal influence and performs similarly along with the other features for vowel recognition.

Similarly for consonants, the average recognition rate is 63%, as in table 9.12. With O-M there is an improvement of 2%, for an average consonant recognition rate of 65%, which is not sufficient to build a practically feasible HCR solution. To strengthen the consonant feature set, we also considered the curvature and statistical features from the original image. This improved the results by 7%, to an average recognition result of 72%. The minimum and maximum recognition rates are 39% and 97% respectively, and the top-2 and top-3 results are 84% and 89%. With G-CM+G-S+406080L30B (using the other cut-image set), the top-1, top-2 and top-3 results are 70%, 82% and 88% respectively. Again, at the top-3 level the performance difference is just 1%, so we can say that cut images of different sizes and counts have minimal influence on the performance.

To uniquely recognize the 34 different consonants under huge vowel shape variations and close consonant similarities, the feature set needs to be strengthened further. For this we computed the confusion matrix to analyze the reasons for confusion.

We examined the confusion matrices for vowels and for consonants to find the most confused vowels and consonants and what they are confused with. We considered all characters confused 5 or more times. The vowel confusion matrix is shown in table 9.13; for clarity, the typed characters are shown instead of handwritten images.

Table 9.13 Vowel confusion matrix

[Table: nine rows, each listing a vowel and its matra against the vowels and matras it is confused with 5 or more times; the character glyphs are images in the original.]

The analysis of the confusion matrix shows that the confusion is mostly between similar vowel matras. It is also observed that some Kagunita shapes, together with their matra, become similar to other Kagunita shapes (the glyph examples are images in the original).


With huge writing variations, if the circle in one matra is written smaller, or the circle in another is written bigger, close similarities arise; likewise, a matra written with a short horizontal line can be confused with another matra. The consonant confusion matrix is shown in table 9.14. Close shape similarity is again the reason for the confusion: for example, one consonant combined with a matra takes a shape very close to another consonant. The manually observed shape similarities match the confusion results.

Table 9.14 Consonant confusion matrix

[Table: seventeen rows, each listing a consonant against the consonants it is confused with 5 or more times; the character glyphs are images in the original.]

To further improve the results, a cascaded neural network is proposed. This design is based on the intuition that knowing the kind of vowel present in the Kagunita image may improve the consonant recognition, and vice versa (see section 8.4). As shown in figure 9.4, the architecture has four neural networks (NNs): two independent NNs in the first stage, one for vowel recognition and another for consonant recognition, and two independent NNs in the second stage, again separately for vowel and consonant recognition.


[Figure: two first-stage networks, NN V1 (vowel features to top-3 vowel matches) and NN C1 (consonant features to top-3 consonant matches), feed two second-stage networks, NN V2 (vowel class) and NN C2 (consonant class).]

Figure 9.4 Cascaded Neural Network

The top-3 vowel results from the first stage are encoded in 15 bits, with the corresponding vowel bit positions set to 1, giving a V feature vector of size 15. Similarly, the top-3 consonant results from the first stage are encoded in 34 bits by setting the corresponding bits, forming a C feature vector of size 34. This encoding is chosen because it does not signify the ranking of the recognition, so all three top-3 matches carry equal weight.

The encoded top-3 vowel results from the first-stage NN are used as additional features, along with the other consonant features, by the second-stage NN for consonant recognition. Similarly, the encoded top-3 consonant results from the first stage are used as features, along with the other vowel features, by the second-stage NN for vowel recognition. The second-level results are thus influenced by the knowledge of the top-3 possible consonants that may be present in the Kagunita for vowel recognition, and vice versa. The results and observations are in table 9.15.
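A sketch of this top-3 encoding is given below (n_classes = 15 for vowels, 34 for consonants); treating `scores` as the raw first-stage network outputs is an assumption.

    import numpy as np

    def encode_top3(scores, n_classes):
        """Binary vector with the bits of the 3 best first-stage classes set to 1."""
        vec = np.zeros(n_classes, dtype=np.float32)
        vec[np.argsort(scores)[-3:]] = 1.0   # equal weight for all three matches
        return vec                           # appended to the second-stage features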

Table 9.15 Experimental results for Kagunita with the cascaded NN

Feature set, vowels:       G-CM + G-S + 20T305070R20B + C
Feature set, consonants:   G-CM + G-S + O-GM + O-CM + O-C + O-S + 406080 + V

% Recognition results
Vowels:       Top-1 84%, Top-2 93%, Top-3 97%
Consonants:   Top-1 72%, Top-2 84%, Top-3 90%

Observation: the significant improvement in the top-3 results indicates that most of the time the confusion is between three similar-shaped characters.



The individual top-1 recognition results for the vowel and consonant in a Kagunita are 84% and 72% respectively. These are the same as before cascading, which means that the overall performance of the system is not hurt. Looking at individual performances, for consonants the minimum recognition rate improved from 39% to 44% and the top-3 results improved by 1%; for vowels, the maximum recognition rate improved from 93% to 95%.

When a Kagunita is viewed as a character, recognizing it requires both its vowel and its consonant to be recognized correctly. We considered the top-3 results of the cascaded neural network, as shown in table 9.16.
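The four outcome columns of table 9.16 can be tallied as in the sketch below, where a component counts as correct when its true class appears in the corresponding top-3 list.

    def kagunita_outcome(v_true, c_true, v_top3, c_top3):
        """Classify one test Kagunita into the four outcome columns of table 9.16."""
        v_ok, c_ok = v_true in v_top3, c_true in c_top3
        if v_ok and c_ok:
            return 'vowel true, consonant true'
        if c_ok:
            return 'vowel false, consonant true'
        if v_ok:
            return 'vowel true, consonant false'
        return 'vowel false, consonant false'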

Table 9.16 Kagunita recognition results (top-3 results of the cascaded NN; counts out of 225 test instances per consonant)

Consonant   Vowel true,         Vowel false,        Vowel true,         Vowel false,
No          Consonant true      Consonant true      Consonant false     Consonant false
            count     %         count     %         count     %         count     %

1 190 84.44 16 7.11 17 7.56 2 0.89

2 198 88 6 2.67 17 7.56 4 1.78

3 198 88 8 3.56 19 8.44 0 0

4 179 79.56 13 5.78 30 13.33 3 1.33

5 210 93.33 9 4 4 1.78 2 0.89

6 184 81.78 9 4 31 13.78 1 0.44

7 190 84.44 12 5.33 22 9.78 1 0.44

8 176 78.22 13 5.78 35 15.56 1 0.44

9 153 68 61 27.11 4 1.78 7 3.11

10 205 91.11 14 6.22 6 2.67 0 0

11 208 92.44 8 3.56 9 4 0 0

12 206 91.56 6 2.67 12 5.33 1 0.44

13 189 84 9 4 23 10.22 4 1.78

14 192 85.33 9 4 23 10.22 1 0.44

15 203 90.22 13 5.78 8 3.56 1 0.44

16 192 85.33 7 3.11 25 11.11 1 0.44

17 179 79.56 12 5.33 32 14.22 2 0.89

18 182 80.89 4 1.78 39 17.33 0 0

19 190 84.44 3 1.33 30 13.33 2 0.89

20 202 89.78 10 4.44 13 5.78 0 0

21 187 83.11 8 3.56 27 12 3 1.33

22 167 74.22 16 7.11 38 16.89 4 1.78


23 190 84.44 7 3.11 27 12 1 0.44

24 199 88.44 9 4 16 7.11 1 0.44

25 170 75.56 36 16 16 7.11 3 1.33

26 176 78.22 28 12.44 18 8 3 1.33

27 199 88.44 0 0 25 11.11 1 0.44

28 198 88 11 4.89 16 7.11 0 0

29 183 81.33 10 4.44 29 12.89 3 1.33

30 189 84 11 4.89 22 9.78 3 1.33

31 177 78.67 25 11.11 20 8.89 3 1.33

32 164 72.89 6 2.67 49 21.78 6 2.67

33 199 88.44 10 4.44 15 6.67 1 0.44

34 199 88.44 13 5.78 12 5.33 1 0.44

Average % 83.96 5.82 9.84 0.84

Minimum % 68 0 1.78 0

Maximum % 93.33 27.11 21.78 3.11

The top-3 Kagunita recognition rate, with both vowel and consonant recognized correctly, is 84%. Cases where only the vowel is recognized wrongly average 6%, cases where only the consonant is recognized wrongly average 10%, and cases where both are recognized wrongly average around just 1%.

The confusion is in terms of which component (vowel, consonant or both) is wrongly recognized. The Kagunita with consonant 9 showed the maximum error, 27.1%, for wrong recognition of the vowel associated with it; the consonant itself showed an error of 1.78%, and the percentage with both wrongly recognized is 3.11%, which is also the maximum for both being wrong. Similarly, the Kagunita with consonant 32 had the maximum consonant error rate of 21.8%; its vowel component is wrongly recognized in 2.67% of cases, and both are wrong in 2.67%. Cases of a Kagunita with both vowel and consonant false are at most about 3%.

Consonant 5 has the maximum top-3 recognition rate of 93.33%, and consonant 9 the minimum of 68%; the same consonant 9 also suffered the maximum faults for vowel recognition. Consonant 27 (both correctly recognized at 88.4%) suffered no vowel recognition problem (0%), with 11.11% wrong consonant recognition and 0.44% with both wrong. Consonant 32 suffered the most for consonant recognition, at 21.8%, with both going wrong in 2.67% of cases. Consonant 9 suffered the most with both wrong, at 3.11%. The confusion analysis shows that consonant 9 is similar in shape to another consonant and has more curves to the right in its basic shape than the others, due to which the shape change caused by the matra is different. We also observe reduced recognition rates of 78.2% and 75.8% for two other consonants.

The overall analysis of the Kagunita results is as follows. When both the vowel and the consonant are correctly recognized, the Kagunita is recognized, and the top-3 Kagunita recognition rate is 84%. As both components being erroneously recognized averages just 1%, a post-processing stage can determine the wrongly recognized vowel or consonant, or the correct choice among the top-3 characters.

The cut-image concept worked well for Kagunita recognition. In the vowel recognition experiments, the three cuts from the right performed better than a single cut on the right, presumably because one of the three cuts extracts the right-side vowel information fairly well, depending on the size of the vowel.

In the consonant recognition experiments, the three cuts from the left performed well. The additional cut from the bottom did not contribute positively; this is expected, given that the modification is rarely on the bottom side.

The cascaded network appears to be a promising idea as per our initial experiments, but it did not contribute any noticeable improvement to the top-1 performance, though there was visible improvement in the overall performance. This is now being investigated further.

No comparable work in Indian language OCR exists against which to compare our results, much less for Kannada; much of the existing work covers only numerals or, at best, stand-alone vowels and consonants. Even though, in absolute terms, the recognition figures may not be high enough for practical use, the recognition rates are achieved over a fairly extreme range of character shapes and at the level of single isolated characters. In a practical system, one can use linguistic information as part of post-processing to significantly enhance the effective recognition; for example, using a dictionary to cross-check and rank the characters can often recognize words with accuracy high enough for practical use. In some cases, human intervention can be brought in to fix the surviving errors. In practice, one can also tune the system with character samples on a per-user basis to get better performance figures.