thai english

Upload: parvezali

Post on 29-May-2018


  • 8/8/2019 Thai English

    1/6

    ON-LINE THAI-ENGLISH HANDWRITTEN CHARACTER RECOGNITION USING DISTINCTIVE FEATURES

    Thirapiroon Thongkamwitoon, Widhyakorn Asdornwised, Supavadee Aramvith, and Somchai Jitapunkul
    Digital Signal Processing Research Laboratory, Department of Electrical Engineering
    Faculty of Engineering, Chulalongkorn University, Bangkok 10330, Thailand
    Phone: (662) 218-6909, Fax: (662) 218-6912
    E-mail: {widhya,supava}@ee.eng.chula.ac.th

    ABSTRACT

    This paper presents an on-line bilingual handwritten character recognition system that classifies both Thai and English characters using distinctive feature extraction. We analytically derive distinctive features for Thai-English language classification. A decision tree based on the derived distinctive features is then built for use in practical applications. In addition, a language classifier is used as the front-end recognizer to improve recognition performance in terms of both complexity reduction and recognition accuracy. In our experiments, the implemented system recognizes an input as a Thai or English character with an accuracy of 86.34% and 95.42%, respectively. These numbers translate into an overall language classification accuracy of 90.21%.

    1. INTRODUCTION

    There is currently much research on both off-line and on-line handwriting recognition, mainly for English characters [4,5]. Since different languages have different structures, recognition techniques developed for one language cannot be directly applied to another. To recognize Thai characters, the structure of Thai characters needs to be studied so that techniques proposed for other languages can be adapted. Previous work on Thai character recognition includes the use of loop structure and other structural features of Thai characters [1], and the use of distinctive features and the Stroke Changing Sequence (SCS), which considers the head of the character [3], to aid recognition performance.

    The need for multilingual document analysis arises in many applications, such as postal automation, where the envelope of an international mail item may contain lines in different languages. It also arises in digital library applications, technical journals, newspapers, maps, and personal short notes, especially in countries where the native language is not English. These growing needs have led to multi-language document support in several popular computing devices, such as Personal Digital Assistant (PDA) systems.

    In a PDA system, e.g. a Palmtop, input data or personal short notes take the form of on-line handwritten letters. On-line handwritten Thai-English character recognition therefore plays a vital role in the input process on this type of device. The system and process must be as simple as possible, since limited resources and the requirement for a high processing rate can affect system performance.

    The paper is organized as follows. Section 2 describes the fundamentals of distinctive features, which are commonly used for character recognition. Section 3 analytically extracts extended features suitable for language classification. Section 4 presents the details of our proposed language classification rules and decision tree diagram. Section 5 discusses the experimental results, and Section 6 gives the conclusion.

    0-7803-7690-0/02/$17.00 ©2002 IEEE

    2. BASIC DISTINCTIVE FEATURES FOR CHARACTER RECOGNITION

    In this section, we briefly review the basic distinctive feature extraction process. Its objective is to extract distinctive features that indicate differences between characters, from which classification patterns are synthesized. Previous work [2] used basic distinctive features for Thai-English character recognition. However, no feature analysis has been carried out for Thai-English language classification. In this paper, we extract extended distinctive features, described in Section 3, and propose combining them with a specific basic distinctive feature to classify Thai and English alphabets.

    Basic distinctive features can be divided into primary and secondary features. Primary features are common features possessed by every character, while secondary features are possessed by only some characters.

    2.1 Primary features

    2.1.1 Number of islands

    An island is defined as an isolated part of a letter. Examples of a Thai character with one island and an English character with two islands are shown in Fig. 1(a) and 1(b), respectively.

    Figure 1. Example of two characters with (a) one island and (b) two islands.

    2.1.2 Number of loops

    A loop is defined as a closed path in a character. A character can have one loop, more than one loop, or no loop at all. Examples are shown in Figure 2.

    Figure 2. Example of characters with (a) no loop (b) one loop (c) two loops (d) three loops.

    2.1.3 Loop height

    We divide the height of a character into three levels, as shown in Figure 3. As shown in Figure 4, the level of a loop can be upper, middle, or lower, according to the height of the center of the loop.

    Figure 3. Boundary and loop level of character. [Figure labels: boundary of character; upper, middle, and lower levels.]

    2.1.4 Loop connection point

    Some characters consist of a loop connected to a stroke line. The stroke line can connect to the loop on either the left or the right side of a reference line, as shown in Figs. 5 and 6.

    Figure 5. Loop and its reference line.

    Figure 6. Loop connection (a) Left connection (b) Right connection (c) Left and right connections (d) No connection.

    2.2 Secondary features

    We use primary distinctive features to cluster characters into groups, e.g. characters with no loop or with one loop. After clustering, more specific features are needed to recognize each character. These are called secondary distinctive features.

    2.2.1 Active region feature

    We define some specific features, such as the end point, the tri-connecting point, and the tetra-connecting point. Next, we define four regions. When one of these features occurs in a region, the feature together with its region is called an active region feature. Figs. 7 and 8 show the active regions and their special features, respectively.

    Figure 7. Defined active region.

    Figure 4. Level of loop (a) upper level (b) middle level (c) lower level.


    Figure 8. Active region feature (a) O has no end point and no active region (b) Q has an end point in region 4.

    2.2.2 Loop-width-to-loop-height ratio

    For some similar characters, such as the letter O and a similarly shaped character, this feature can be used in the final step of recognition. An example of the loop-width-to-loop-height ratio for similar letters is shown in Fig. 9.

    Figure 9. Illustration of loop height and loop width of characters.

    2.2.3 Character ripples

    Ripples are an important distinctive feature for Thai character recognition. They can also be used for Thai-English character classification. However, extracting this feature is very time consuming. Note that English characters have only vertical ripples, while most Thai characters have horizontal ripples.

    Figure 10. Character ripples (a) vertical (b) horizontal.

    3. EXTENDED DISTINCTIVE FEATURES FOR LANGUAGE CLASSIFICATION

    To obtain the best results in Thai-English classification, new rules need to be synthesized and extended from the basic distinctive features, owing to the differences between Thai and English characters. In this section, we discuss our proposed extended distinctive features for the language classification task.

    3.1 Level of character

    We define a character as having up to three height levels, as shown in Fig. 11. English characters occupy only one level, the middle level. Thai characters occupy three levels: most Thai consonants are in the middle level, while the vowels are usually located in the upper or lower level.

    Figure 11. Illustration of an English character (P) positioned in the middle level and Thai characters positioned in the upper, middle, and lower levels.

    3.2 Loop-to-character-area ratio

    Although both Thai and English characters can have a loop, the ratio of the size of the loop to the size of the character can be used to distinguish Thai from English characters. For example, Thai characters often contain a loop at the character's head, which yields a small ratio compared with English characters, as shown in Fig. 12.

    Figure 12. Comparison of the loop-to-character-area ratio of (a) Thai character (b) English character.

    3.3 Loop-to-character-width ratio

    The ratio of loop width to character width can be used to classify a character that contains loops. In general, a Thai character often contains a loop at the character's head, giving a ratio of less than 40%, while an English character often has a ratio of more than 50%, as shown in Figure 13.

    Figure 13. Illustration of loop-to-character-width ratio of (a) Thai character (b) English character.

    3.4 Loop-to-character-height ratio

    The ratio of loop height to character height can likewise be used to classify a character that contains loops. A Thai character often contains a loop at the character's head, giving a ratio of less than 40%, while an English character often has a ratio of more than 50%, as shown in Fig. 14.
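The loop ratios of Sections 3.2 to 3.4 can be illustrated with a minimal sketch. It assumes the character and its loop are each available as a list of (x, y) pen samples and that bounding boxes are an acceptable stand-in for "size"; the paper does not say how size is measured, and the names `Box`, `bounding_box`, and `loop_ratios` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Axis-aligned bounding box of a set of (x, y) points."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    @property
    def width(self) -> float:
        return self.x_max - self.x_min

    @property
    def height(self) -> float:
        return self.y_max - self.y_min


def bounding_box(points):
    """Smallest axis-aligned box containing all points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return Box(min(xs), min(ys), max(xs), max(ys))


def loop_ratios(char_points, loop_points):
    """Return (width_ratio, height_ratio, area_ratio) of loop vs. character.

    Assumption: sizes are approximated by bounding boxes, so the area
    ratio is the product of the width and height ratios.
    """
    char_box = bounding_box(char_points)
    loop_box = bounding_box(loop_points)
    if char_box.width == 0 or char_box.height == 0:
        raise ValueError("character bounding box is degenerate")
    width_ratio = loop_box.width / char_box.width
    height_ratio = loop_box.height / char_box.height
    return width_ratio, height_ratio, width_ratio * height_ratio


# A small head loop in the corner of a 10 x 10 character (Thai-like shape).
character = [(0, 0), (10, 0), (10, 10), (0, 10)]
head_loop = [(0, 8), (3, 8), (3, 10), (0, 10)]
w, h, a = loop_ratios(character, head_loop)
print(f"width {w:.0%}, height {h:.0%}, area {a:.0%}")
```

In this toy case both the width and height ratios fall below the 40% figure the text associates with Thai characters.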


    Figure 14. Illustration of loop-to-character-height ratio of (a) Thai character (b) English character.

    3.5 Length-of-loop-line-to-character-stroke-length ratio

    A Thai character often contains a loop at the character's head. The length of the stroke from the initial point to a loop contribution point around this loop is defined as the length of the loop line. The length of the stroke from the initial point to the end point of the character is defined as the stroke length. Typically, the length-of-loop-line-to-character-stroke-length ratio is less than 15% for a Thai character and more than 40% for an English character, as shown in Fig. 15.
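As a rough illustration of this ratio, the sketch below measures polyline lengths over sampled pen points. It assumes the loop line ends at the sample where the loop closes, which is one reading of "loop contribution point"; the function names and the `loop_close_index` parameter are hypothetical.

```python
import math


def path_length(points):
    """Total length of the polyline through the sampled pen points."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))


def loop_line_ratio(points, loop_close_index):
    """Stroke length up to the loop-closing sample, divided by total length.

    Assumption: `loop_close_index` is the index of the sample at which
    the loop is completed; how it is found is outside this sketch.
    """
    total = path_length(points)
    if total == 0:
        raise ValueError("stroke has zero length")
    return path_length(points[: loop_close_index + 1]) / total


# A unit-square loop drawn first (length 4), followed by a long tail (length 36).
stroke = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0), (0, -36)]
print(f"{loop_line_ratio(stroke, 4):.0%}")  # 10%
```

A stroke that begins with a small loop and continues with a long tail gives a low ratio, the pattern the text associates with Thai characters.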

    Figure 15. Illustration of the difference in length-of-loop-line-to-character-stroke-length ratio between (a) Thai character (b) English character.

    3.6 Loop generated point

    We define the loop generated point as the point at which the stroke re-crosses its own path to form a loop. A Thai character often starts with the head of the character, so this point appears early in the path sequence. For an English character, it appears in the stern (late part) of the sequence.
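One way to locate such a point in a sampled stroke is to look for the first segment that crosses an earlier, non-adjacent segment. The sketch below uses a standard orientation test for segment intersection; it is an illustrative assumption, not the detection method used in the paper, and all function names are hypothetical.

```python
def _orient(a, b, c):
    """Twice the signed area of triangle abc (positive if counter-clockwise)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def _segments_cross(p1, p2, p3, p4):
    """True if segments p1-p2 and p3-p4 properly cross.

    Touching end points and collinear overlaps are deliberately ignored.
    """
    d1 = _orient(p3, p4, p1)
    d2 = _orient(p3, p4, p2)
    d3 = _orient(p1, p2, p3)
    d4 = _orient(p1, p2, p4)
    return d1 * d2 < 0 and d3 * d4 < 0


def loop_generated_index(points):
    """Index of the first segment that re-crosses the earlier path, or None.

    Segment k joins points[k] and points[k + 1].
    """
    for j in range(2, len(points) - 1):
        for i in range(j - 1):  # earlier segments not adjacent to segment j
            if _segments_cross(points[i], points[i + 1], points[j], points[j + 1]):
                return j
    return None


# Loop drawn first, then a tail: the crossing occurs early in the sequence.
head_first = [(0, 0), (2, 2), (0, 2), (2, 0), (2, -2), (2, -4), (2, -6), (2, -8)]
# The same shape drawn in reverse: the crossing occurs at the end.
head_last = list(reversed(head_first))
print(loop_generated_index(head_first), loop_generated_index(head_last))  # 2 6
```

Dividing the returned index by the number of segments gives a relative position that could be compared against a threshold; the paper does not state one, so none is assumed here.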

    Figure 16. Illustration of the difference in the position of the loop generated point (a) Thai (b) English. [Figure labels: initial point; stern.]

    4. LANGUAGE CLASSIFICATION RULES

    In this paper, our focus is to develop an on-line Thai-English handwritten character classification system. The input characters are 70 Thai letters (44 consonants and 26 vowels) and 52 English letters (lowercase and uppercase forms of 26 letters), as shown in Fig. 17.

    Thai characters: [the 70 Thai characters are not legible in this transcript]
    English characters: a b c d e f g h i j k l m n o p q r s t u v w x y z A B C D E F G H I J K L M N O P Q R S T U V W X Y Z

    Figure 17. Total input of 122 letters.

    We select appropriate basic and extended distinctive features to classify an input character and build the decision tree diagram shown in Fig. 18.

    Figure 18. Decision tree diagram of Thai-English language classification.

    From the decision tree diagram, our proposed rules can be described as follows. First, we consider the group of characters with no loop (Fig. 19). After applying the first rule, five Thai characters remain in this group. In this work, we do not classify these five characters further because their classification is too complicated. However, a simplified approach using the active region feature or context, i.e. the previous character, can be used to classify them.

    English: f h I j k m n t s t U v w y z C E F G H I J K L M N S T U V W X Y Z

    Figure 19. Group of characters with no loop.


    Second, we consider the group of characters with a loop, as shown in Fig. 20.

    Thai: [characters not legible in this transcript]
    English: a b d e g o p q A B D O P Q

    Figure 20. Group of characters with loop.

    Characters with a loop can be divided into two groups, shown in Figs. 21 and 22, using the loop-to-character-width ratio as an indicator. The group of characters with a loop-to-character-width ratio of less than 70% is shown in Fig. 21. We observe that all characters belonging to this group are Thai characters.

    Thai: [characters not legible in this transcript]

    Figure 21. Group of characters with a loop-to-character-width ratio of less than 70%.

    Thus, the loop-to-character-width ratio is a good feature for classifying bilingual characters effectively. The group of characters with a loop-to-character-width ratio of more than 70% is shown in Fig. 22.

    Thai: [characters not legible in this transcript]
    English: b d e g o p q A B D O P Q R

    Figure 22. Group of characters with a loop-to-character-width ratio of more than 70%.

    As we can see, this group still contains both Thai and English characters. By considering a loop-to-character-height ratio of more than 80%, we observe that English characters can be separated from Thai characters, as shown in Fig. 23.

    English: a b d e g p q A B P R

    Figure 23. Group of characters with a loop-to-character-height ratio of more than 80%.

    The group with a loop-to-character-height ratio of less than 80% is shown in Fig. 24.

    English: o D O Q

    Figure 24. Group of characters with a loop-to-character-height ratio of less than 80%.

    From Fig. 24, we need to consider the loop-height-to-loop-width ratio to finally classify the Thai characters shown in Fig. 25. The characters remaining after this last classification step are shown in Fig. 26. After this step, some specific basic distinctive features could be used to classify the remaining characters.

    Figure 25. Group of characters with a loop-height-to-loop-width ratio of less than 70%.

    Thai: [characters not legible in this transcript]
    English: o D O Q

    Figure 26. Group of characters with a loop-height-to-loop-width ratio of more than 70%.

    The structure of our proposed decision tree diagram is hierarchical. Note that, owing to simplification, the upper bound on the success rate of the proposed language classification rules described in this section is 95.08%. Nevertheless, adding a few more distinctive features and rules could raise this upper bound to 100%.
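The rule sequence described in this section can be written as a short function. This is a minimal sketch of one reading of the text, not the authors' implementation: the function name, parameter names, and the `certain` flag are hypothetical, and the behavior exactly at the 70% and 80% thresholds is an assumption because the text only says "less than" and "more than". Leaves that the text reports as still containing residual Thai characters return the English label with `certain=False`.

```python
def classify_language(num_loops,
                      loop_to_char_width=None,
                      loop_to_char_height=None,
                      loop_height_to_width=None):
    """Return (label, certain) following the rule order of Section 4.

    Ratios are fractions in [0, 1] (the last one may exceed 1).
    `certain` is False for leaves where the text says some Thai
    characters remain mixed with the English ones.
    """
    # No-loop group: English letters, plus five Thai characters that
    # the text leaves unresolved.
    if num_loops == 0:
        return "english", False

    ratios = (loop_to_char_width, loop_to_char_height, loop_height_to_width)
    if any(r is None for r in ratios):
        raise ValueError("loop ratios are required when the character has a loop")

    # Loop-to-character-width ratio below 70%: all Thai (Fig. 21).
    if loop_to_char_width < 0.70:
        return "thai", True
    # Loop-to-character-height ratio above 80%: English (Fig. 23).
    if loop_to_char_height > 0.80:
        return "english", True
    # Loop-height-to-loop-width ratio below 70%: Thai (Fig. 25).
    if loop_height_to_width < 0.70:
        return "thai", True
    # Remaining group (Fig. 26): o D O Q together with residual Thai characters.
    return "english", False


print(classify_language(0))                    # ('english', False)
print(classify_language(1, 0.30, 0.20, 1.00))  # ('thai', True)
print(classify_language(1, 0.90, 0.95, 1.20))  # ('english', True)
print(classify_language(1, 0.90, 0.60, 1.00))  # ('english', False)
```

Keeping the cheap loop-count test first and the ratio tests afterwards mirrors the hierarchical structure the text describes, where each rule only runs on the characters the previous rule could not separate.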

    5. EXPERIMENTAL RESULTS

    To validate our concept, we implemented software prototypes on the Windows and Palm operating systems, as shown in Figs. 27 and 28. Our test database was collected from 10 people. Each person wrote the whole set of characters (122 characters) ten times, for a total of 12,200 test characters.

    Figure 27. Prototype software implemented on Windows operating system.


    Figure 28. Palm OS emulator software implementation.

    Examples of performance in terms of accuracy are shown in Tables 1 and 2, where the columns indicate inputs classified as Thai characters, inputs classified as English characters, inputs that could not be classified, and the rate of successful classification, respectively. More detailed results can be found in [6].

    From the experimental results, our implemented software prototypes can recognize the Thai and English languages using the proposed conditions with an accuracy of 86.34% for Thai character input and 95.42% for English character input. These numbers translate into an overall language classification accuracy of 90.21%. In addition, the successful classification rate for Thai character input is lower than that for English character input. This is because Thai characters often have a loop at the head of the character, which can go undetected.
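The overall figure is consistent with a sample-weighted average of the two per-language accuracies. The short check below assumes every writer contributed every letter ten times, as described at the start of this section; the variable and function names are hypothetical. It also shows that the 95.08% upper bound quoted in Section 4 corresponds to 116 of the 122 letters, which is an inference from the numbers rather than a statement in the paper.

```python
THAI_LETTERS, ENGLISH_LETTERS = 70, 52
WRITERS, REPETITIONS = 10, 10

thai_inputs = THAI_LETTERS * WRITERS * REPETITIONS        # 7000
english_inputs = ENGLISH_LETTERS * WRITERS * REPETITIONS  # 5200
total_inputs = thai_inputs + english_inputs               # 12200


def overall_accuracy(thai_acc, english_acc):
    """Average the per-language accuracies weighted by number of test inputs."""
    return (thai_acc * thai_inputs + english_acc * english_inputs) / total_inputs


overall = overall_accuracy(0.8634, 0.9542)
print(f"{total_inputs} inputs, overall accuracy {overall:.2%}")  # 12200 inputs, overall accuracy 90.21%

# Number of letters implied by the 95.08% upper bound under a uniform
# character probability assumption.
separable = round(0.9508 * (THAI_LETTERS + ENGLISH_LETTERS))
print(separable, f"{separable / 122:.2%}")  # 116 95.08%
```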

    6. CONCLUSIONS

    Using our proposed distinctive features, we can build a small and efficient decision tree diagram with only five rules and an upper bound on classification accuracy of 95.08%, based on the uniform character probability assumption. Our proposed recognition system can classify the Thai and English languages over a total of 122 letters (Thai: 44 consonants and 26 vowels; English: lowercase and uppercase forms of 26 letters) with an overall language classification accuracy of 90.21%. To achieve a higher successful classification rate, new loop detection algorithms and more complicated rules could be added to enhance system performance.

    REFERENCES

    [1] S. Airphaiboon, M. Sangworasil, and S. Kondo, Off-line Handwritten Thai Characters from Word Script, Proceedings of the 12th IAPR International, 1994.

    [2] N. Sattayatham, Off-line Thai-English handwritten character recognition by using distinctive features, Technical report, Department of Electrical Engineering, Chulalongkorn University, 1998.

    [3] P. Chormengwiwat, Thai handwritten character recognition by using distinctive feature of stroke changing, Master Thesis, Department of Electrical Engineering, Chulalongkorn University, 1998.

    [4] P. Siy and C.S. Chen, Fuzzy Logic for Handwritten Numerical Character Recognition, IEEE Trans. Syst., Man, Cybernet., pp. 570-575, November 1974.

    [5] R. J. Schalkoff, Pattern Recognition and Statistical Structural and Neural Network Approaches, John Wiley & Sons, 1992.

    [6] T. Thongkamwitoon, Online Bilingual Handwritten Character Recognition on Palm OS, Technical report, Department of Electrical Engineering, Chulalongkorn University, 2001.

    Table 1. Classification accuracy of input as Thai characters

    Table 2. Classification accuracy of input as English characters
