


IEEE TRANSACTIONS ON FUZZY SYSTEMS, VOL. 16, NO. 3, JUNE 2008 747

Type-2 Fuzzy Markov Random Fields and Their Application to Handwritten Chinese Character Recognition

Jia Zeng, Member, IEEE, and Zhi-Qiang Liu

Abstract—In this paper, we integrate type-2 (T2) fuzzy sets with Markov random fields (MRFs), referred to as T2 FMRFs, which may handle both fuzziness and randomness in the structural pattern representation. On the one hand, the T2 membership function (MF) has a 3-D structure, in which the primary MF describes randomness and the secondary MF evaluates the fuzziness of the primary MF. On the other hand, MRFs can represent patterns statistical-structurally in terms of the neighborhood system and clique potentials and, thus, have been widely applied to image analysis and computer vision. In the proposed T2 FMRFs, we define the same neighborhood system as that in classical MRFs. To describe uncertain structural information in patterns, we derive the fuzzy likelihood clique potentials from T2 fuzzy Gaussian mixture models. The fuzzy prior clique potentials are penalties for mismatched structures based on prior knowledge. Because Chinese characters have hierarchical structures, we use T2 FMRFs to model character structures in the handwritten Chinese character recognition system. The overall recognition rate is 99.07%, which confirms the effectiveness of the proposed method.

Index Terms—Handwritten Chinese character recognition (HCCR), Markov random fields (MRFs), type-2 fuzzy sets (T2 FSs).

I. INTRODUCTION

MANY pattern recognition problems can be posed as the labeling problem, to which the solution is a set of linguistic labels, L = \{L_1, \ldots, L_M\}, assigned to a set of sites, S = \{1, \ldots, N\}, to interpret the observation set, O = \{o_1, \ldots, o_N\}, at all sites [1]. The sites may be successive times, image pixels, or image regions, while the labels reflect any relations, regularities, or structures inherent in the sites. The number of labels M is not necessarily equal to the number of sites N. The labeling strength, p_i(L_j) \in [0, 1], measures the assignment of label L_j to site i, where p_i(L_j) = 1 denotes that L_j is definitely assigned to i. The null label is assigned to no site, and the null site is associated with no label. Each site i is associated with a random observation o_i, which may represent symbols, feature vectors, or image pixel values. For simplicity, we assume that the observations are independent and identically distributed (i.i.d.). The label f_i assigned to site i is also a random variable and, thus, the labeling configuration at all sites, f = \{f_1, \ldots, f_N\}, is a stochastic process. For example, in

Manuscript received October 29, 2006; revised February 8, 2007.
The authors are with the School of Creative Media, City University of Hong Kong, Hong Kong (e-mail: [email protected]; [email protected]).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TFUZZ.2007.905916

speech recognition using hidden Markov models, we may have labels representing phonemes, and such a label set for the word “cat” would have labels for the phonemes /k/, /æ/, and /t/ (see Fig. 1); in Chinese character recognition using Markov random fields, we may have labels representing stroke models, which constitute the character model. The label set for a character would have labels for all of its decomposed straight-line strokes (see Fig. 1).

Given a set of class models, the recognition of an unknown pattern is performed by using the given test observation O to score each class model, and selecting the class whose model score is highest, i.e., the maximum-likelihood (ML) criterion [2]. The recognition can also be viewed as scoring how well a given class model (labels) matches a given observation. Because the matching score, summed over all possible labeling configurations, is an intractable combinatorial problem, we turn to finding the single best labeling configuration, \hat f, to explain the observation, O. The maximum a posteriori (MAP) estimation [3] guarantees the single best labeling configuration

\hat f = \arg\max_f P(f \mid O) \qquad (1)

\hat f = \arg\max_f P(O \mid f)\, P(f) \qquad (2)

where P(O \mid f) is the likelihood function of f given O, and P(f) is the prior probability of f. Markov random fields

(MRFs), with the Markov property on undirected graphs, can represent 2-D patterns statistical-structurally. Their great success in pattern recognition, image processing, and computer vision over the past decades has been largely due to their ability to reflect local statistical dependencies existing universally in patterns and images [3]–[8].

Within the MRF framework, statistical interactions at adjacent sites in a pattern or image are reflected by two fundamental concepts: the neighborhood system N and the clique potentials V_c. The neighborhood system N_i denotes the neighbors of site i, provided that i \notin N_i. The clique c is a subset of sites that are all pairwise neighbors. For simplicity, we consider only single-site cliques \{i\} and pair-site cliques \{i, i'\}. Clique potentials encourage or penalize different local interactions among neighboring sites. According to the Hammersley–Clifford theorem [8], the joint probability distribution of the random variables at all sites in an MRF is a Gibbs distribution. Therefore, the MAP estimation (1)–(2) of the MRF is equivalent to minimizing the posterior energy of the Gibbs distribution [7]

\hat f = \arg\min_f U(f \mid O) = \arg\min_f \left\{ U(O \mid f) + U(f) \right\} \qquad (3)

where

1063-6706/$25.00 © 2007 IEEE


Fig. 1. Many pattern recognition problems can be posed as the labeling problem.

Fig. 2. The MRF has four labels a, b, c, and d, with a link representing neighbors. The likelihood clique potentials are derived from GMMs. The single-site clique potentials V_1 describe the statistical information, and the pair-site clique potentials V_2 describe the structural information statistically.

U(O \mid f) = \sum_{c \in C} V_c(O \mid f) \qquad (4)

U(f) = \sum_{c \in C} V_c(f) \qquad (5)

are the likelihood energy and prior energy, respectively. Each energy function is a sum of clique potentials over all possible cliques C.

Fig. 2 illustrates the key concepts of MRFs. The label at each site describes the randomness of the observations in terms of probability density functions (PDFs); here we use Gaussian mixture models (GMMs) to model the underlying densities of the observations. The single-site likelihood clique potentials are derived from GMMs, which encode the statistical information of the observations at each site. Furthermore, the MRF describes the relationships between labels at neighboring sites by defining a proper neighborhood system N. In Fig. 2, we denote labels at neighboring sites, such as a and b, by a link. This structural information is evaluated by the pair-site likelihood clique potentials, also derived from GMMs. Therefore, the MRF can statistical-structurally model 2-D patterns, which are often represented by a set of observations at sites in the feature space.
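As an illustration of how the posterior energy decomposes into single-site and pair-site clique potentials, the following sketch scores a labeling with Gaussian single-site potentials and a simple pair-site penalty. The function names, the toy pair penalty, and the diagonal-Gaussian potential are our own illustrative choices, not the paper's exact model.

```python
import numpy as np

def gaussian_neg_loglik(o, mean, std):
    """Single-site likelihood clique potential: negative Gaussian
    log-density (up to a constant) of observation o under a label model."""
    o, mean, std = np.asarray(o), np.asarray(mean), np.asarray(std)
    return float(np.sum(0.5 * ((o - mean) / std) ** 2 + np.log(std)))

def posterior_energy(labels, obs, label_params, neighbors, pair_penalty):
    """U(f|O) = U(O|f) + U(f): sum of clique potentials over single-site
    cliques {i} and pair-site cliques {i, i'}."""
    u_likelihood = sum(
        gaussian_neg_loglik(obs[i], *label_params[labels[i]])
        for i in range(len(labels)))
    u_prior = sum(pair_penalty(labels[i], labels[j]) for i, j in neighbors)
    return u_likelihood + u_prior
```

Minimizing this energy over all labelings is the MAP estimation of (1)–(3): with two sites whose observations fit labels a and b respectively, the matched labeling receives a lower energy than the swapped one.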

However, randomness alone may be insufficient to characterize the following uncertainties [1]:

1) uncertain parameters of the class model because of insufficient and noisy training data, which further lead to an uncertain mapping of the model;

2) uncertain relationship between training and test data due to limited prior information;

3) uncertain linguistic labels because the same label may mean different things to different people.

One of the best sources of general discussion about uncertainty is Klir and Wierman [9]. Regarding the nature of uncertainty, they state that three types of uncertainty are now recognized:

1) fuzziness (vagueness), which results from the imprecise boundaries of fuzzy sets;

2) nonspecificity (information-based imprecision), which is connected with the sizes (cardinalities) of relevant sets of alternatives;


Fig. 3. Three-dimensional type-2 fuzzy membership function (T2 MF). (a) Primary membership with (thick dashed line) lower and (thick solid line) upper membership functions, where \underline{h}(o) and \overline{h}(o) are the lower and upper bounds of the primary membership of the observation o. The shaded region is the footprint of uncertainty (FOU). (b) Gaussian secondary membership function. (c) Interval secondary membership function. (d) The mean m has a uniform membership function.

3) strife (discord), which expresses conflicts among the various sets of alternatives.

Observe that the types of uncertainty in the labeling problem may be fuzziness and nonspecificity resulting from incomplete information, i.e., fuzzy class models (uncertain mapping), fuzzy observations (nonstationary data), and fuzzy matching (uncertain labeling configuration). Therefore, we integrate the type-2 fuzzy set (T2 FS) with the MRF, referred to as the T2 FMRF, to handle both fuzziness and randomness in the labeling problem.

A T2 FS [10] is characterized by a 3-D membership function (MF), which evaluates the uncertainty of the observation by the primary membership, and further describes the uncertainty of that primary membership by the secondary MF. The T2 MF can be viewed as an ensemble of embedded type-1 (T1) MFs with fuzzy parameters. Fig. 3(a) shows the T1 Gaussian MF with a fuzzy mean m, which is bounded by an interval [\underline m, \overline m]. We assume the mean varies anywhere in this interval, which results in the movement of the T1 MF, forming the footprint of uncertainty (FOU) illustrated by the shaded region bounded by the lower and upper MFs in Fig. 3(a). We see that if the movement is uniform, i.e., the mean has a uniform MF as in Fig. 3(d), then the FOU is also uniform with equal possibilities, and so is the secondary MF in Fig. 3(c). More specifically, if the mean is a fuzzy variable [11] with a uniform MF in Fig. 3(a), the output of the input o is also a fuzzy variable with a uniform MF in Fig. 3(c). This special T2 FS is called the interval type-2 fuzzy set (IT2 FS), which can be denoted by the interval of upper and lower MFs, i.e., [\underline{h}(o), \overline{h}(o)]. However, if the mean has a Gaussian MF, the output fuzzy variable is definitely not associated with the Gaussian secondary MF in Fig. 3(b). Therefore, in practice, it is convenient to define the secondary MF in Fig. 3(b) and (c) directly, without considering the MF of the fuzzy parameters of the original T1 MF, though we know that there is a complex relationship between the MF of the fuzzy parameters and the secondary MF [12]. Operations on general T2 FSs, meet “⊓” and join “⊔”, involve an intractable combinatorial problem on the primary membership, whereas IT2 FSs use interval arithmetic, which makes the operations very simple [13]. Hence, we use only IT2 FSs throughout this paper unless otherwise stated. More details of T2 FSs can be found in [14].
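Because every IT2 membership grade is just an interval [lower, upper], meet and join reduce to endpoint-wise arithmetic. A minimal sketch, with function names of our own choosing, using the product t-norm for meet and the maximum t-conorm for join (the paper's FGMM later mixes components with a weighted sum instead):

```python
def meet(a, b):
    """Meet of two interval membership grades under the product t-norm."""
    (al, au), (bl, bu) = a, b
    return (al * bl, au * bu)

def join(a, b):
    """Join of two interval membership grades under the maximum t-conorm."""
    (al, au), (bl, bu) = a, b
    return (max(al, bl), max(au, bu))
```

For example, meet((0.2, 0.6), (0.5, 0.9)) gives (0.1, 0.54): no combinatorial search over primary memberships is needed, which is what makes IT2 FSs tractable.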

Recently, T2 FSs have been successfully applied to pattern recognition [12] (clustering image data [15], classification of MPEG VBR video traffic [16], evaluation of welded structures [17], speech recognition [18]–[20], and classification of battlefield ground vehicles [21]). T2 FS-based classifiers have the potential to outperform competing T1 FS-based ones because, intuitively, an ensemble of T1 FSs may have more expressive power than the single “best” T1 FS when uncertainty exists in real-world problems.


Based on our previous work [1], to handle fuzziness and randomness in the labeling problem, we extend the MAP estimation (1)–(2) to the following operations on T2 FSs:

\hat f = \arg\max_f \tilde\mu(f \mid O) \qquad (6)

\tilde\mu(f \mid O) = \tilde\mu(O \mid f) \sqcap \tilde\mu(f) \qquad (7)

where the tilde denotes the fuzzy class model with uncertain parameters. Nonsingleton fuzzification is able to handle fuzzy observations due to noise [22]. Similar to the i.i.d. assumption, we assume that \tilde\mu(O \mid f) can be written as a meet of the T2 MFs of the individual observations

\tilde\mu(O \mid f) = \tilde\mu(o_1 \mid f_1) \sqcap \cdots \sqcap \tilde\mu(o_N \mid f_N) \qquad (8)

Compared with (2), (7) conveys more possibilities because operations on T2 FSs propagate more uncertainty than scalar calculation does [23]. We see that (7) reduces to (2) if the fuzzy class model becomes certain. From this perspective, \tilde\mu(O \mid f) and \tilde\mu(f) describe the fuzziness of the likelihood and the prior, respectively. In the case of IT2 FSs, \tilde\mu = [\underline\mu, \overline\mu], we can rewrite (7) as follows:

\underline\mu(f \mid O) = \underline\mu(O \mid f) \cdot \underline\mu(f) \qquad (9)

\overline\mu(f \mid O) = \overline\mu(O \mid f) \cdot \overline\mu(f) \qquad (10)

where “\cdot” denotes the product t-norm. Because we use GMMs for the densities of observations in

classical MRFs, in order to incorporate fuzziness in the labeling problem, we focus on the following issues in the proposed T2 FMRFs.

1) We propose T2 fuzzy GMMs (T2 FGMMs) for density modeling.

2) We derive the fuzzy likelihood clique potentials from T2 FGMMs.

3) We develop the T2 fuzzy relaxation labeling algorithm (T2 FRL) that finds the single best labeling configuration in (6).

4) We use the generalized linear model (GLM) [24] to make the classification decision from the interval, [\underline\mu(f \mid O), \overline\mu(f \mid O)], in (9) and (10).

In Section II, we introduce the T2 FGMMs for density modeling. By information theory, we demonstrate that the interval of lower and upper MFs can evaluate the uncertainty of the class model about the observation. Section III proposes T2 FMRFs, which have the same neighborhood system as classical MRFs. The difference lies in that we derive the fuzzy likelihood clique potentials from T2 FGMMs. Similarly, we design the fuzzy prior clique potentials to reflect the prior structural constraints in the labeling space. In Section IV, we build a T2 FMRF-based handwritten Chinese character recognition (HCCR) system. We use the single-site fuzzy clique potentials to extract candidate strokes, and then determine the best stroke relationships by the pair-site fuzzy clique potentials. Section V draws conclusions, and the Appendix lists the important notations used in this paper.

II. TYPE-2 FUZZY GMMS

A. Multivariate Uncertain Gaussian

Given a D-dimensional observation vector o = [o_1, \ldots, o_D]^T, the corresponding mean vector m = [m_1, \ldots, m_D]^T, and the diagonal covariance matrix

Fig. 4. Gaussian with uncertain (a) mean and (b) std. The (a) mean and (b) std are fuzzy variables with uniform possibilities. The shaded region is the FOU. The thick solid and dashed lines denote the lower and upper boundaries of the FOU, referred to as the lower and upper MFs. Because of the uniform FOU, the secondary MFs are all interval sets.

\Sigma = \mathrm{diag}(\sigma_1^2, \ldots, \sigma_D^2), the multivariate Gaussian primary MF with uncertain mean vector or covariance matrix [18] is

\tilde\eta(o) \propto \prod_{d=1}^{D} \exp\!\left[-\frac{1}{2}\left(\frac{o_d - m_d}{\sigma_d}\right)^2\right], \quad m_d \in [\underline m_d, \overline m_d] \qquad (11)

or

\tilde\eta(o) \propto \prod_{d=1}^{D} \exp\!\left[-\frac{1}{2}\left(\frac{o_d - m_d}{\sigma_d}\right)^2\right], \quad \sigma_d \in [\underline\sigma_d, \overline\sigma_d] \qquad (12)

where each exponential component is a Gaussian primary MF with uncertain mean or standard deviation (std) [25] (see Fig. 4). Obviously, the uncertainty of the multivariate Gaussian is accumulated over all of its D uncertain exponential components. Without loss of generality, we consider only the Gaussian primary MF with uncertain mean in the rest of the paper. Interested readers can investigate the properties of the Gaussian primary MF with uncertain std in a similar way.

In the case of the Gaussian primary MF with uncertain mean [see Fig. 4(a)], the upper MF is

\overline{h}(o) = \begin{cases} \exp\!\left[-\frac{1}{2}\left(\frac{o - \underline m}{\sigma}\right)^2\right], & o < \underline m \\ 1, & \underline m \le o \le \overline m \\ \exp\!\left[-\frac{1}{2}\left(\frac{o - \overline m}{\sigma}\right)^2\right], & o > \overline m \end{cases} \qquad (13)


Fig. 5. Example of a Gaussian primary MF with uncertain mean, with factor k = 1, 2, 3. The FOU reflects the amount of uncertainty in the primary membership, i.e., the larger (smaller) the amount of uncertainty, the larger (smaller) the FOU will be. Although a larger FOU covers more uncertainty, it in the meantime loses more information for decision making, because it is more difficult to figure out which point to choose in the larger FOU (k = 3) than in the smaller FOU (k = 1).

and the lower MF is

\underline{h}(o) = \begin{cases} \exp\!\left[-\frac{1}{2}\left(\frac{o - \overline m}{\sigma}\right)^2\right], & o \le \frac{\underline m + \overline m}{2} \\ \exp\!\left[-\frac{1}{2}\left(\frac{o - \underline m}{\sigma}\right)^2\right], & o > \frac{\underline m + \overline m}{2} \end{cases} \qquad (14)

where

\underline m = m - k\sigma, \qquad \overline m = m + k\sigma \qquad (15)

The factor k [18] controls the FOU

k \in [0, 3] \qquad (16)

As shown in Fig. 5, because a 1-D Gaussian has 99.7% of its probability mass in the range [m - 3\sigma, m + 3\sigma], we constrain k \in [0, 3] in (16). The FOU reflects the amount of uncertainty in the primary membership, i.e., the larger (smaller) the amount of uncertainty, the larger (smaller) the FOU will be. However, a large FOU loses useful information for the classification decision. From Fig. 5, we see that it is more difficult to know which point to choose in the larger FOU (k = 3) than in the smaller FOU (k = 1). Indeed, different values of k play different roles in classification performance. Although in some cases other values of k may perform better, in practice we take k = 1 because the 1-D Gaussian has 68% of its probability mass in the range [m - \sigma, m + \sigma], which covers proper uncertainty and, in the meantime, keeps useful information for classification decision making.
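The piecewise upper and lower MFs of (13)–(16) can be sketched directly. This is our reading of the construction for an uncertain mean m ∈ [m − kσ, m + kσ]; the helper names are ours.

```python
import math

def _g(o, m, s):
    """Unnormalized 1-D Gaussian membership grade."""
    return math.exp(-0.5 * ((o - m) / s) ** 2)

def upper_mf(o, m, s, k=1.0):
    """Upper MF of a Gaussian primary MF with uncertain mean."""
    m1, m2 = m - k * s, m + k * s
    if o < m1:
        return _g(o, m1, s)
    if o > m2:
        return _g(o, m2, s)
    return 1.0                      # plateau over the uncertain-mean interval

def lower_mf(o, m, s, k=1.0):
    """Lower MF: the farther mean endpoint gives the smaller grade."""
    m1, m2 = m - k * s, m + k * s
    return _g(o, m2, s) if o <= (m1 + m2) / 2 else _g(o, m1, s)
```

The upper MF plateaus at 1 over the uncertain-mean interval, while the lower MF always uses the farther mean endpoint, so \underline{h}(o) ≤ \overline{h}(o) for every o.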

Based on the multivariate uncertain Gaussian, we obtain the following T2 FGMM with M mixture components

\tilde\mu(o) = \sum_{j=1}^{M} w_j\, \tilde\eta_j(o) \qquad (17)

Fig. 6. The dashed line is the original T1 MF, and the solid and dotted lines are the upper and lower MFs of this T1 MF with fuzzy mean. The length L = |\ln \overline h(o) - \ln \underline h(o)| describes the uncertainty of the T2 FS about the observation o. The longer L, the more uncertainty, which is marked by the darker gray bar. For example, o_b deviates far from the mean, so it has not only a lower membership grade but also a longer L_b. The three intervals L, L_a, and L_b reflect the uncertainty of the T2 FS about the observations.

where w_j is the mixture weight, and the sum “+” and the product t-norm “\cdot” are used for the join “⊔” and meet “⊓”, respectively.

B. Bounded Uncertainty

T2 FSs can model bounded uncertainties deterministically [26]. As shown in Fig. 6, the observation o is measured by a bounded interval set, [\underline h(o), \overline h(o)], in T2 MFs, instead of by a crisp scalar h(o) in T1 MFs. Similar to the entropy of a uniform random variable, the uncertainty of the interval set is equal to the logarithm of the length of the interval [23]. In Fig. 6, we are interested in the lengths of three intervals, L, L_a, and L_b. For analytical purposes, we often use the log-likelihood [2] in pattern recognition, so we turn to the logarithm of the lower and upper MFs, [\ln \underline h(o), \ln \overline h(o)]. According to (13)–(16), we have the following lengths:

L = \left| \ln \overline h(o) - \ln \underline h(o) \right| \qquad (18)

L = \frac{1}{2}\left(\frac{|o - m|}{\sigma} + k\right)^2, \quad |o - m| \le k\sigma \qquad (19)

L = \frac{2k\,|o - m|}{\sigma}, \quad |o - m| > k\sigma \qquad (20)

We observe that (18)–(20) are all increasing functions of the deviation |o - m| and of the factor k. For example, given a fixed k, the further o deviates from m, the longer the interval L, increasing the entropy (uncertainty). This relationship accords with our prior knowledge. If an observation deviates far from the class model, a so-called outlier, it not only has a lower membership grade, but also a longer interval L reflecting its uncertainty with respect to the class model. Indeed, we are often uncertain whether outliers belong to the class or not. From (18)–(20), we see that k plays an important role in representing the bounded uncertainty. If k = 0, then L = 0, which implies that no uncertainty exists and only the membership grade evaluates the observation. If k increases for a fixed deviation |o - m|, the length of the interval increases, which brings more uncertainty. Nevertheless, a longer interval loses some information compared with the original T1 membership grade. Thus, though a larger k can describe more uncertainty, as depicted by the FOU in Fig. 5, it also results in the loss of more information. In practice, k = 1 is a good
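A small self-contained check of this behavior, under the same piecewise upper/lower MFs as in (13)–(16): the log-interval length L = |ln h̄(o) − ln h̲(o)| grows with both the deviation |o − m| and the factor k, and collapses to zero when k = 0. The function name is ours.

```python
import math

def _g(o, m, s):
    return math.exp(-0.5 * ((o - m) / s) ** 2)

def log_interval_length(o, m, s, k=1.0):
    """L(o) = |ln upper(o) - ln lower(o)| for a Gaussian MF with
    uncertain mean m in [m - k*s, m + k*s]."""
    m1, m2 = m - k * s, m + k * s
    if o < m1:
        hi = _g(o, m1, s)
    elif o > m2:
        hi = _g(o, m2, s)
    else:
        hi = 1.0                    # upper MF plateau inside the FOU
    lo = _g(o, m2, s) if o <= m else _g(o, m1, s)
    return abs(math.log(hi) - math.log(lo))
```

For observations outside the FOU (|o − m| > kσ) the length works out to 2k|o − m|/σ, so e.g. o = 3, m = 0, σ = 1, k = 1 gives L = 6.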


Fig. 7. Classification with T2 FGMMs. The black squares and triangles are the centers of the mixture components. The thin black solid line is the optimal Bayesian decision boundary given by the generating distributions. The thick black solid line is the decision boundary given by GMMs. The dotted lines are the decision boundaries given by the lower and upper MFs in T2 FGMMs with k = 1.

choice because it keeps a balance between uncertainty and information. Based on (11) and (12), the bounded uncertainty is accumulated in each dimension and propagated through (9) and (10). The output posterior interval, [\underline\mu(f \mid O), \overline\mu(f \mid O)], contains both the membership grade and the uncertainty of the best labeling configuration with respect to the observation.

C. Density Modeling

The GMM is a universal approximator, in that it can model any density function arbitrarily closely, provided that it contains enough components [27]. The expectation-maximization (EM) algorithm [28] finds ML estimates of the parameters in GMMs. In contrast, we propose T2 FGMMs for density modeling and classification. The key problem is how to estimate the parameters of the T2 FGMMs. The simplest way is to assume that k is a fixed constant, so we can first estimate the parameters of a GMM by the EM algorithm, and then add the factor k to make a T2 FGMM.

Fig. 7 shows the decision boundaries (dotted lines) given by the lower and upper MFs of the T2 FGMM with k = 1. We see that these two boundaries twist around the decision boundary (thick black line) given by GMMs. The optimal decision boundary given by the Bayesian classifier is the thin black line. Observing Fig. 7, we believe that a linear combination of the lower and upper MFs of T2 FGMMs will be better than the decision boundary of GMMs in terms of classification rates. Hence, we propose the GLM to make the classification decision from the intervals of the lower and upper MFs. First, we evaluate the training data by T2 FGMMs, and use the output intervals to estimate a GLM by iterated re-weighted least-squares training [24]. Second, we use the GLM to classify the output intervals of the test data evaluated by T2 FGMMs. The pattern classification system based on T2 FGMMs is illustrated in Fig. 8.
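A toy end-to-end sketch of the Fig. 8 pipeline, with two stated simplifications: one Gaussian component per class (so the EM step reduces to mean/std estimation), and plain gradient-descent logistic regression standing in for the iterated re-weighted least-squares GLM fit. All function names and the feature layout are ours.

```python
import numpy as np

def fit_class(x):
    """Per-class 'EM' degenerate case: one diagonal Gaussian per class."""
    return x.mean(axis=0), x.std(axis=0) + 1e-6

def interval_features(x, mean, std, k=1.0):
    """Map each sample to [ln lower MF, ln upper MF] under one class model
    with uncertain mean (plateau of width 2k inside, shifted tails outside)."""
    z = (x - mean) / std
    up = np.where(np.abs(z) <= k, 0.0, -0.5 * (np.abs(z) - k) ** 2)
    lo = -0.5 * (np.abs(z) + k) ** 2
    return np.stack([lo.sum(axis=1), up.sum(axis=1)], axis=1)

def fit_glm(features, y, steps=2000, lr=0.1):
    """Logistic regression by gradient descent (GLM stand-in)."""
    phi = np.hstack([features, np.ones((len(features), 1))])
    w = np.zeros(phi.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-np.clip(phi @ w, -30.0, 30.0)))
        w -= lr * phi.T @ (p - y) / len(y)
    return w

def glm_predict(features, w):
    phi = np.hstack([features, np.ones((len(features), 1))])
    return (phi @ w > 0).astype(int)
```

Each sample is mapped to the pair [ln h̲, ln h̄] under every class model, and the GLM learns a linear decision over those interval features.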

III. TYPE-2 FUZZY MRFS

A. Type-2 Fuzzy Energy Function

Hidden fuzzy MRFs [29] characterize fuzzy image pixels (sites) that are difficult to segment. In contrast, the proposed T2

Fig. 8. Pattern classification system based on T2 FGMMs.

FMRF uses fuzzy clique potentials to evaluate the uncertain labeling configuration at all sites. By integrating T2 FSs, we can rewrite (6)–(7) in terms of the fuzzy posterior energy function

\hat f = \arg\min_f \tilde U(f \mid O) \qquad (21)

\tilde U(f \mid O) = \tilde U(O \mid f) \sqcup \tilde U(f) \qquad (22)

where

\tilde U(O \mid f) = \bigsqcup_{c \in C} \tilde V_c(O \mid f) \qquad (23)

\tilde U(f) = \bigsqcup_{c \in C} \tilde V_c(f) \qquad (24)

are the fuzzy likelihood and prior energy functions, respectively. The fuzzy energy functions are the join “⊔” operation on the fuzzy clique potentials over all possible cliques C. Furthermore, based on (9) and (10), we can rewrite the energy functions (22)–(24) as sum t-conorm operations on the lower and upper MFs in parallel.

B. Type-2 Fuzzy Clique Potentials

In principle, we can select clique potentials arbitrarily, provided that they decrease the value of the energy as the matching degree increases [7]. To evaluate uncertain structural constraints in the labeling configuration, we derive the fuzzy clique potentials from T2 FGMMs. From (8) and the Hammersley–Clifford theorem [8], we find the following relationship:

V_1(o_i \mid f_i) = -\ln p(o_i \mid f_i) \qquad (25)

V_2(o_i, o_{i'} \mid f_i, f_{i'}) = -\ln p(o_i, o_{i'} \mid f_i, f_{i'}) \qquad (26)

where we usually assume the normalizing constant Z = 1. If \tilde\mu(o_i \mid f_i) and \tilde\mu(o_i, o_{i'} \mid f_i, f_{i'}) are T2 FGMMs, we can derive the single-site and pair-site fuzzy likelihood clique potentials from (25)–(26)

\tilde V_1(o_i \mid f_i) = -\ln \tilde\mu(o_i \mid f_i) \qquad (27)

\tilde V_2(o_i, o_{i'} \mid f_i, f_{i'}) = -\ln \tilde\mu(o_i, o_{i'} \mid f_i, f_{i'}) \qquad (28)

where the pair-site observations may be binary features describing relationships between sites. The fuzzy prior clique potentials \tilde V_1(f_i) and \tilde V_2(f_i, f_{i'}) encode the prior structural information in the labeling space. We may think of this prior structural information as a kind of smoothness [3], i.e., we encourage local labeling configurations consistent with the predefined structure at the sites, and penalize mismatched labeling configurations.


Fig. 9. (First line) Samples in Hanja1 and (second line) Hanja2 databases.

C. Type-2 Fuzzy Relaxation Labeling

To find the single best labeling configuration and the minimum fuzzy posterior energy \tilde U(\hat f \mid O) in (21)–(22), we develop the T2 FRL algorithm, as shown in Algorithm 1.

First, we convert the minimization of the fuzzy posterior energy into the maximization of the following fuzzy gain function:

\tilde G(p) = \sum_{i} \sum_{f_i} p_i(f_i) \left[ \tilde r_1(i, f_i) + \sum_{i' \in N_i} \sum_{f_{i'}} \tilde r_2(i, f_i, i', f_{i'})\, p_{i'}(f_{i'}) \right] \qquad (29)

The fuzzy compatibility functions are defined by the fuzzy clique potentials (see the initialization part of Algorithm 1)

\tilde r_1(i, f_i) = C_1 - \tilde V_1(o_i \mid f_i) - \tilde V_1(f_i), \qquad \tilde r_2(i, f_i, i', f_{i'}) = C_2 - \tilde V_2(o_i, o_{i'} \mid f_i, f_{i'}) - \tilde V_2(f_i, f_{i'}) \qquad (30)

where the constants C_1 and C_2 are chosen so that both fuzzy compatibility functions are non-negative. The fuzzy gain function (29) is the matching degree of the labeling configuration f to the observation O. In order to find the best f, we need to search for the labeling strengths maximizing (29). The final labeling strength implies an assignment of the label to the observation.

Second, we update the labeling strengths p_i(f_i) (line 6) by the gradient of (29) (line 5)

q_i(f_i) = \frac{\partial \tilde G(p)}{\partial p_i(f_i)} = \tilde r_1(i, f_i) + \sum_{i' \in N_i} \sum_{f_{i'}} \tilde r_2(i, f_i, i', f_{i'})\, p_{i'}(f_{i'}) \qquad (31)

until the iteration counter reaches the fixed number of iterations in Algorithm 1. Because the compatibility functions contain both fuzzy likelihood and prior information, the solution of the T2 FRL does not depend much on the initial labeling.

Finally, we use the winner-take-all strategy to retrieve the best labeling configuration, which ensures that each label is assigned to merely one site (lines 11–12). The T2 FRL is a context-based algorithm because a labeling strength will increase only if its neighboring labeling strengths increase (lines 5–6). According to (9) and (10), the meet “⊓” and join “⊔” can be rewritten as t-norm and t-conorm operations on the lower and upper MFs in parallel. We shall use the product t-norm “\cdot” and the sum t-conorm “+”, which involve interval arithmetic because the compatibility functions and labeling strengths are all interval sets. For division between two intervals, we divide the centers of the intervals unless otherwise stated.

Observing Algorithm 1, we see that the T2 FRL algorithm has polynomial complexity in the number of sites, the number of labels, and the number of iterations. In each iteration (line 5), we have to compute the two bounds of an interval set rather than a scalar as in the classical RL [30]. Therefore, the computational cost of the T2 FRL is twice that of the classical RL.
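The core relaxation-labeling loop can be sketched as follows. Per the description above, T2 FRL runs the same update twice in parallel, once on the lower and once on the upper bounds of the interval compatibilities; a single scalar pass is shown for clarity, and the array shapes of the compatibility functions r1 and r2 are our simplification.

```python
import numpy as np

def relaxation_labeling(r1, r2, n_iter=20):
    """Scalar relaxation-labeling core.
    r1: (n_sites, n_labels) non-negative unary compatibilities.
    r2: (n_sites, n_labels, n_sites, n_labels) pairwise compatibilities."""
    n_sites, n_labels = r1.shape
    p = np.full((n_sites, n_labels), 1.0 / n_labels)   # uniform start
    for _ in range(n_iter):
        # support q_i(f): unary compatibility plus context from neighbors
        q = r1 + np.einsum('ifjg,jg->if', r2, p)
        p = p * q
        p /= p.sum(axis=1, keepdims=True)              # renormalize strengths
    return p.argmax(axis=1)                            # winner-take-all
```

With context-free compatibilities (r2 = 0), the update reduces to picking the label with the largest unary compatibility at each site.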

IV. HANDWRITTEN CHINESE CHARACTER RECOGNITION

A. Character Structure Representation

Chinese characters [31] are typical 2-D patterns with complex structural information (see Fig. 9). According to Biederman’s recognition-by-components (RBC) theory [32], the visual input is matched against structural representations of objects in the brain, and these structural representations consist of primitive shapes and their interrelations. A Chinese character can be decomposed into many straight-line strokes. So, we can represent character structures by fragmental features (such as the position of a stroke) and configurational features for the relationships (structural information) among the fragmental features. Currently, many successful HCCR systems are based on either statistical methods [33]–[36], structural methods [37]–[39], or hybrid statistical-structural methods [40]–[43].

This paper proposes T2 FMRFs to model Chinese characterstructures as the labeling problem. Each class of characters is as-sociated with a T2 FMRF-based character model composed of aset of stroke models (labels) in Fig. 1. Different labeling config-urations identify different stroke relationships by interactions atneighboring sites in the neighborhood system . We encourageor penalize different stroke relationships by assigning differentfuzzy clique potentials derived from T2 FGMMs, and theirparameters are estimated from training samples automatically.The recognition is equivalent to finding the best structural match

between the labels (stroke models) and the observations (candidate strokes). Hence, the fuzzy posterior energy is the cost of that structural match. After structural match with all character models by the T2 FRL Algorithm 1, we classify the


754 IEEE TRANSACTIONS ON FUZZY SYSTEMS, VOL. 16, NO. 3, JUNE 2008

Fig. 10. The global neighborhood system defines all strokes as neighbors of each other in (b)–(e). The connected neighborhood system defines only connected strokes as neighbors in (f) and (g).

observation to the character model with the minimum cost in (21)–(22). Similar to Fig. 8, we use the GLM to make the classification decision from the interval set (3).

To account for all types of stroke relationships completely, the global neighborhood system [43] defines all strokes as neighbors of each other (see Fig. 10), which is computationally expensive because all relationships between every two sites have to be calculated in the T2 FRL algorithm. Nevertheless, in most cases we extract fewer than forty candidate strokes from a Chinese character, which makes the global neighborhood system computationally tractable. In contrast, the connected neighborhood system [7] defines only two connected strokes as neighbors (Fig. 10), because connected strokes often have stable structures, such as stable relative directions, positions, and lengths. The Kullback–Leibler (KL) divergence neighborhood system [41] selects the most important stroke relationships by minimizing the KL divergence among the stroke distributions. Comparing the three neighborhood systems: the connected neighborhood system is natural but reflects only fixed types of stroke relationships based on connection; the global neighborhood system is complete but comparatively costly; and the KL divergence neighborhood system is a compromise between the two, but requires computing the KL divergence for all stroke distributions. Therefore, to model character structures completely, we adopt the global neighborhood system in this paper. Because of the ambiguity of stroke segmentation, we usually extract many candidate strokes, some of which may be repetitive or overlapped. These candidate strokes are not neighbors of each other in the neighborhood system.

Section III-B derives the fuzzy likelihood clique potentials (27) and (28) from T2 FGMMs. Here, we design the fuzzy prior clique potentials. The single-site fuzzy prior clique potential penalizes the null label

$\tilde{V}_1(f_i) = \begin{cases} 0, & \text{if } f_i \neq 0 \\ v_{10}, & \text{if } f_i = 0 \end{cases}$   (32)

where $v_{10} > 0$. The pair-site fuzzy prior clique potential penalizes the mismatch between two input candidate strokes and their labeling configuration in terms of connection. We denote two connected labels as joined by an edge; otherwise they are disconnected. The connection between labels is fixed according to the initial character model. In Fig. 11(d), the connection reflects the prior local structure of the character in (a). During the structural match between the candidate strokes and the labels 1, 2, 3, and 4, we penalize the inconsistent relationship if two disconnected labels 1 and 3 are assigned to two connected strokes by the pair-site fuzzy prior clique potential

Fig. 11. Connection between labels is prior knowledge about the character structure. In (d), the four connected labels represent prior stroke relationships of the character in terms of connection. We penalize the inconsistent relationship if two disconnected labels 1 and 3 are assigned to two connected strokes.

$\tilde{V}_2(f_i, f_{i'}) = \begin{cases} 0, & \text{if } f_i, f_{i'} \text{ are consistent with the edges} \\ v_{20}, & \text{if } f_i, f_{i'} \text{ are inconsistent with the edges} \end{cases}$   (33)

where $v_{20} > 0$. In order to keep a balance between the fuzzy prior and likelihood potentials, we set the penalties $v_{10}$ and $v_{20}$ according to the fuzzy likelihood clique potentials

$v_{10} = \min_j \tilde{V}_1(o_i \mid j)$   (34)

$v_{20} = \varepsilon\, \tilde{V}_2(o_i, o_{i'} \mid j, j'), \qquad 0 < \varepsilon \ll 1$   (35)

Equation (34) penalizes the null label by the minimum fuzzy single-site likelihood potential, as if the label were assigned to that observation. Sometimes there are exceptions to the prior constraints on connected strokes. For example, in Fig. 11(d), labels 2 and 3 are connected to represent connected strokes, but an exception occurs in (a), where the corresponding strokes are separated. We therefore take a small fraction of the pair-site likelihood potential as the penalty in (35).

B. Stroke Extraction

Because we use T2 FMRFs to model stroke relationships, stroke extraction is an essential part of our HCCR system. After decomposing a character into strokes, we can obtain their complete spatial information, such as direction, position, and length.

The preprocessing of character images consists of three steps.

1) We normalize the slant and moment, with the aspect ratio preserved, for character images [31], and then perform Euclidean distance transform (EDT)-based thinning on the input characters, which can recover jam-packed holes and remove loosely touching strokes [44].

2) From the character skeleton, we extract the end and intersection points, called feature points. Meanwhile, we trace the consecutive pixels connecting the feature points, referred to as substrokes, and remove spurious substrokes whose lengths are short [38].

3) We use corner detection to break each substroke at high-curvature points [45].
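Step 2 can be sketched with a simple neighbor count on a one-pixel-wide skeleton: end points have exactly one 8-neighbor and intersection points have three or more. This is an assumption standing in for the paper's feature-point extraction, not its actual implementation.

```python
import numpy as np

def feature_points(skel):
    """skel: binary 2-D array (1 = skeleton pixel). Returns end points
    (exactly one 8-neighbor) and intersection points (three or more)."""
    ends, joints = [], []
    rows, cols = skel.shape
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            if not skel[r, c]:
                continue
            n = skel[r-1:r+2, c-1:c+2].sum() - 1  # 8-neighbor count
            if n == 1:
                ends.append((r, c))
            elif n >= 3:
                joints.append((r, c))
    return ends, joints

# a plus-shaped skeleton: four end points and a crossing at (3, 3)
img = np.zeros((7, 7), dtype=int)
img[3, 1:6] = 1
img[1:6, 3] = 1
ends, joints = feature_points(img)
print(sorted(ends))      # [(1, 3), (3, 1), (3, 5), (5, 3)]
print((3, 3) in joints)  # True; pixels next to the crossing are flagged too,
                         # together forming the "intersection region" of the text
```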

To represent the substroke direction, we use a Gabor filter-based directional feature [46]–[48].

1) Eight Gabor filters with orientations 0°, 22.5°, 45°, 67.5°, 90°, 112.5°, 135°, and 157.5° are convolved with the thinned character image, which produces eight uncorrelated gray images.


2) At each pixel on the character skeleton, we take the eight gray values from the eight gray images as a direction vector, and normalize them by setting the maximum and minimum values to one and zero, respectively.

3) The direction of a substroke is the average direction vector of its component pixels. If a substroke has its largest normalized responses in the 0° and 22.5° orientations, it must be a roughly horizontal line, because the responses of the corresponding Gabor filters are largest there.

The Euclidean distance between two vectors measures the similarity of two directions, because identical directions have zero distance and perpendicular directions have the largest distance.
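The 8-orientation direction feature described above can be sketched as follows: per-pixel responses are min–max normalized to [0, 1], a substroke's direction is the mean over its pixels, and direction similarity is the Euclidean distance between 8-dimensional vectors. The response values below are made up for illustration; the paper obtains them by Gabor filtering.

```python
import numpy as np

def normalize(responses):
    """Min-max normalize the eight per-pixel responses to [0, 1]."""
    r = np.asarray(responses, dtype=float)
    return (r - r.min()) / (r.max() - r.min())

def substroke_direction(pixel_responses):
    """pixel_responses: (n_pixels, 8) raw Gabor magnitudes per skeleton pixel.
    The substroke direction is the mean normalized vector over its pixels."""
    return np.mean([normalize(p) for p in pixel_responses], axis=0)

def direction_distance(d1, d2):
    """Euclidean distance between two 8-D direction vectors."""
    return float(np.linalg.norm(d1 - d2))

# a roughly horizontal substroke: strong responses at 0 and 22.5 degrees
horiz = substroke_direction([[9, 8, 4, 2, 1, 2, 3, 5],
                             [9, 7, 4, 2, 1, 2, 4, 6]])
# a roughly vertical substroke: strong responses near 90 degrees
vert = substroke_direction([[1, 2, 4, 8, 9, 8, 4, 2]])
print(direction_distance(horiz, horiz))       # 0.0
print(direction_distance(horiz, vert) > 1.0)  # True: perpendicular, large distance
```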

Because of complicated character shapes and ambiguous cross strokes, traditional stroke extraction methods are not reliable and often produce erroneous broken strokes, which may lead to multiple structural representations being assigned to the same character. To overcome this problem, we propose T2 FMRF-based stroke extraction, which uses a greedy search to extract many candidate strokes based on the stroke models (labels). In summary, Algorithm 2 extracts all possible candidate strokes using the single-site fuzzy likelihood clique potential (27) of each label.

First, we build a graph G = (O, E) for all substrokes, where E indicates which pairs of substrokes are connectable; two substrokes are connectable if they satisfy the following conditions.

1) The two substrokes share the same intersection region, or the distance between their end points is less than a threshold.

2) The angle between the directions of the two substrokes is small.

The first condition ensures that the two substrokes come from a continuous straight line, and the second condition checks the linearity of the two substrokes. Since the normalized character image is 64 × 64 pixels, a small distance suffices for the gap between substrokes. The directional threshold prevents perpendicular substrokes from being connectable. For each substroke, we record its centroid coordinates and its number of pixels, both normalized with respect to the character size.
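The connectable-substroke test above can be sketched as follows. Each substroke is summarized by its two end points; condition 1 checks the end-point gap, and condition 2 checks linearity via the angle between the two substrokes. The thresholds `GAP_THRESH` and `ANGLE_THRESH` are illustrative assumptions; the paper's exact values for the normalized 64 × 64 image are not reproduced here.

```python
import math

GAP_THRESH = 6.0            # pixels; assumed value
ANGLE_THRESH = math.pi / 4  # reject near-perpendicular pairs; assumed value

def angle(p, q):
    """Orientation of the segment from p to q."""
    return math.atan2(q[1] - p[1], q[0] - p[0])

def connectable(s1, s2):
    """s1, s2: ((x0, y0), (x1, y1)) end points of two substrokes."""
    # condition 1: smallest end-point gap below the threshold
    gap = min(math.dist(a, b) for a in s1 for b in s2)
    # condition 2: undirected angle difference below the threshold
    a1, a2 = angle(*s1), angle(*s2)
    diff = abs(a1 - a2) % math.pi
    diff = min(diff, math.pi - diff)
    return gap < GAP_THRESH and diff < ANGLE_THRESH

h1 = ((0, 0), (10, 0))    # horizontal segment
h2 = ((13, 1), (23, 1))   # nearly collinear continuation
v1 = ((12, 0), (12, 10))  # perpendicular segment
print(connectable(h1, h2))  # True: small gap, nearly parallel
print(connectable(h1, v1))  # False: perpendicular
```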

Second, the set built in lines 3–7 stores the initial substrokes that satisfy the positional and directional constraints

Fig. 12. T2 FMRF-based stroke extraction. (a) The input character image. (b) EDT-based thinning and feature-point extraction. (c) The substrokes that build the graph G = (O, E). (d) The character model composed of four labels. (e) The candidate strokes extracted by label 1, where the candidate "o + o" has the minimum center of likelihood clique potential.

with the label. The positional and directional thresholds are loose enough to include proper candidates close and parallel to the label, such as the candidates shown for label 1 in Fig. 12(e). For each initial substroke in the set, we concatenate it with its connectable substroke to form a new substroke (line 9)

$d_{\mathrm{new}} = \frac{l\, d + l'\, d'}{l + l'}$   (36)

$p_{\mathrm{new}} = \frac{l\, p + l'\, p'}{l + l'}$   (37)

$l_{\mathrm{new}} = l + l'$   (38)

and put the old one into the set (line 12). If the new substroke decreases the center of the single-site fuzzy likelihood potential, we repeatedly concatenate it with its connectable substrokes until the center of the likelihood potential increases (line 14). A small positive threshold controls the required decrease. Finally, the set stores all candidate strokes that decrease the center of the single-site fuzzy likelihood clique potential (27). After stroke extraction, we also obtain the connection information of all candidate strokes. Fig. 12 shows the process of stroke extraction, where the candidate "o + o" has the minimum center of likelihood clique potential for label 1. In practice, we may extract multiple candidate strokes for each label, and a null label indicates that no candidate strokes are extracted.
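The greedy loop of Algorithm 2 can be sketched as follows: starting from an initial substroke, keep merging a connectable substroke as long as the center of the single-site fuzzy likelihood clique potential decreases by more than a small epsilon. Here `merge` and `potential_center` are placeholders for (36)–(38) and the center of (27); the toy example below uses a set-difference cost purely for illustration.

```python
def grow_candidate(stroke, connectable_of, merge, potential_center, eps=1e-3):
    """Greedily merge connectable substrokes while the potential center drops."""
    best = potential_center(stroke)
    improved = True
    while improved:
        improved = False
        for nb in connectable_of(stroke):
            cand = merge(stroke, nb)
            c = potential_center(cand)
            if c < best - eps:           # potential decreased: accept the merge
                stroke, best, improved = cand, c, True
                break
    return stroke, best

# toy run: substrokes are ids 1..4; the "potential" prefers the set {1, 2, 3}
target = {1, 2, 3}
conn = lambda s: [i for i in range(1, 5) if i not in s]   # connectable ids
merge = lambda s, i: s | {i}
center = lambda s: len(target ^ s)       # symmetric difference as a stand-in
stroke, cost = grow_candidate({1}, conn, merge, center)
print(stroke, cost)                      # {1, 2, 3} 0
```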

C. Structural Match

As shown in Fig. 13, each label corresponds to a set of candidate strokes, which constitute a search space for structural match. The objective is to determine the best labeling configuration by Algorithm 1, i.e., to fix the best candidate stroke associated with each label.

First, for the candidate strokes extracted by each label, we set their initial labeling strengths, and zero for the others, in Algorithm 1 (initialization part). Through stroke extraction, we also obtain the single-site and pair-site fuzzy compatibilities for Algorithm 1 (initialization part). Because interactions between candidates extracted by the same label are not considered, their compatibility is zero. For example, the candidates extracted by label 2 have no interactions in Fig. 13. Second, the structural match in lines 3–8 of Algorithm 1 computes the gradient of the gain function


Fig. 13. Candidate strokes constitute a search space for structural match. We adopt the global neighborhood system, where every two candidates are neighbors except repetitive or overlapped ones. The fuzzy compatibility ~K(j, j) = 0 prevents the same label from being assigned to neighboring sites. At each iteration of the T2 FRL algorithm, we update the labeling strength of each label to the observation until it is assigned to only one candidate.

(29) for every candidate in the label according to its compatibility as well as the neighboring compatibility support (line 5). We take only the maximum neighboring support from each label into account (line 5); e.g., for a candidate in label 1, we select the maximum support among the candidates in label 4 (see Fig. 13).

Finally, we update the labeling strength by the gradient (line 6) until the maximum number of iterations is reached. Based on the final labeling strengths, we determine the best labeling configuration for the candidates using the winner-take-all strategy (lines 11–12); e.g., in Fig. 13, label 1 is definitely assigned to a candidate if its labeling strength is the largest. Sometimes two labels may be assigned to two overlapped candidate strokes; e.g., labels 1 and 2 may be assigned simultaneously (see Fig. 13). In this case, we retain the label with the higher compatibility and assign the other label to another candidate stroke that does not overlap with the other labeled candidate strokes. The label is null if there is no such candidate. The unlabeled candidate strokes are regarded as noise (surplus strokes). The penalty of an unlabeled stroke is proportional to its length [41].

D. Training a T2 FMRF-Based Character Model

Algorithm 3 summarizes the training of a T2 FMRF-based character model.

First, we set up the T2 FMRF prototype for each class of characters from the observation extracted from a well-segmented standard character (line 2). The number of sites of the standard character is equal to the number of labels. In this case, the initial mean vectors are the unary and binary features, respectively, and the initial covariance matrices are diagonal. We use a factor to produce the T2 FGMM (17), and derive the clique potentials (27), (28), and (32)–(33).

Second, for each training character image, we use the T2 FMRF-based stroke extraction to obtain the observation. Suppose a set of training observations is used to estimate the parameters of a T2 FMRF-based character model with several mixture components. We find the single best labeling configuration for each training observation by the T2 FRL (line 4). For the observations associated with the same label, we cluster them into different mixture components by fuzzy c-means (FCM) [49] (line 9). As a result, each observation is associated with a single unique mixture component, which can be represented by the indicator function [see (39) and (40), shown at the bottom of the page]. Therefore, the mean vector, covariance matrix, and mixture weight can be estimated via simple averages (lines 10–15).
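Lines 9–15 of Algorithm 3 can be sketched as follows: the observations sharing a label are clustered into M mixture components (here with hard k-means as a stand-in for FCM), and each component's mean, per-dimension variance, and weight are then simple averages over its hard-assigned observations. The function name and initialization are assumptions.

```python
import numpy as np

def estimate_label_model(X, M, iters=20):
    """X: (n, d) observations assigned to one label; M mixture components.
    Returns means, per-dimension variances, and mixture weights."""
    # deterministic spread-out initialization (a stand-in for FCM initialization)
    means = X[np.linspace(0, len(X) - 1, M, dtype=int)]
    for _ in range(iters):
        # hard indicator as in (39): nearest component for each observation
        z = np.argmin(((X[:, None, :] - means[None]) ** 2).sum(-1), axis=1)
        means = np.array([X[z == m].mean(0) for m in range(M)])
    var = np.array([X[z == m].var(0) for m in range(M)])       # simple averages
    w = np.bincount(z, minlength=M) / len(X)                   # mixture weights
    return means, var, w

# toy data: two tight clusters around 0 and 3
X = np.vstack([np.random.default_rng(1).normal(0, 0.1, (50, 2)),
               np.random.default_rng(2).normal(3, 0.1, (50, 2))])
means, var, w = estimate_label_model(X, M=2)
print(w)  # [0.5 0.5]
```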

Finally, we use the EM algorithm [28] to refine the parameters of a T2 FMRF. After the iterations, the T2 FRL terminates and associates each observation with a label and a mixture component by the labeling strength. Rather than assigning a label to one specific site, we assign the label to each site in proportion to the labeling strength. The process of the EM

$\delta_{jm}(o_i) = \begin{cases} 1, & \text{if } o_i \text{ is associated with the } m\text{th mixture component of label } j \\ 0, & \text{otherwise} \end{cases}$   (39)

$\delta_{jm,j'm'}(o_i, o_{i'}) = \begin{cases} 1, & \text{if } (o_i, o_{i'}) \text{ is associated with the } m\text{th and } m'\text{th mixture components of labels } j \text{ and } j' \\ 0, & \text{otherwise} \end{cases}$   (40)


Fig. 14. Results of T2 FMRF-based stroke extraction. The first column shows the T2 FMRF-based character models. Only the best candidate strokes, which have the minimum center of single-site fuzzy likelihood clique potentials, are illustrated.

is almost the same as Algorithm 3, except that we use the labeling strength as the updating weight. Because of limited training observations, the covariance matrix may be singular and not invertible. In this case, the EM algorithm updates only the mean vectors and leaves the covariance matrix unchanged.

E. Experimental Results

We evaluated the performance of T2 FMRF-based character models on the publicly available KAIST [38] (Hanja1, Hanja2) database. The Hanja1 database has 783 classes with 200 samples per class. The Hanja2 database has 1309 samples of cursive handwritten characters from real documents, used only for testing. The image quality of the Hanja1 database is good, whereas that of Hanja2 is poor. Some typical samples from Hanja1 and Hanja2 are illustrated in Fig. 9.

Fig. 14 shows the results of T2 FMRF-based stroke extraction. We show only the candidate strokes with the minimum center of the single-site fuzzy likelihood clique potentials. Compared with other model-based stroke extraction methods [38], [41], we use the single-site fuzzy likelihood potential to limit the total number of candidate strokes for each label in order to reduce the search space. Furthermore, after stroke extraction we obtain the initial labeling, which is part of the structural match in Algorithm 3. From this perspective, we have formulated stroke extraction and the later structural match within a unified framework.

We compared the T2 FMRF-based character model with the baseline recognizer [41]. The baseline recognizer used the odd-numbered samples of every class for training and the first ten even-numbered samples for testing on the Hanja1 database. By handling degraded regions, the baseline recognition rate was 98.45%. The Hanja2 database was used only for testing, with a recognition rate of 83.14%. Based on the baseline recognizer, a binary classifier [50] was proposed to further differentiate similar characters by a neural network learning algorithm, which improved the overall recognition rate from 98.45% to 99.46% on the Hanja1 database. In our experiment, we selected 783 classes of characters from the Hanja1 database as the recognition vocabulary. For each class, we used ten even-numbered samples for testing and the remaining 190 samples for training. To evaluate the T2 FMRF-based character model on cursive Chinese characters, we also used the test samples from the Hanja2 database, as shown in Fig. 9. For simplicity, we did not specially differentiate the similar characters as in [50].

We set the uncertainty factor in each T2 FMRF-based character model and used three mixture components. Although more mixture components might give better results, some mixtures would be associated with too few observations for estimation because of the limited training data. After parameter estimation as described in Algorithm 3, we evaluated the training samples again with the T2 FMRFs and obtained the training interval sets for the GLM. Finally, we matched the test samples with all T2 FMRFs and used the trained GLM to classify the output test interval sets. The whole process is similar to Fig. 8.

Table I shows the comparison with other recognizers on the Hanja1 and Hanja2 datasets. Kim and Kim [41] used the first 100 odd-numbered samples in Hanja1 for training, and the first ten even-numbered samples for recognition. Liu et al. [38] used


TABLE I
RECOGNITION RATE COMPARISON ON KAIST DATABASE

Fig. 15. Sixteen highly similar characters in structure.

TABLE II
COMPARISON OF RECOGNITION RATE

the first 80 odd-numbered samples in Hanja1 for training, and the first 20 even-numbered samples for recognition. To estimate the parameters of all mixture components in the T2 FMRF-based character models accurately, we used more training samples, 190 in total for each class, though the number of training samples did not greatly affect the performance of the recognizer. The recognition rate of the T2 FMRF-based HCCR system on Hanja1 was 1.18% and 0.62% higher than those reported in [38] and [41], respectively. Moreover, the recognition rate on the Hanja2 database was 2.65% higher than that reported in [41], which was previously the best result. The Hanja2 database mainly contains complex cursive handwritten characters composed of cursive-line strokes. Although the recognition rate of 85.79% is far from satisfactory compared with 99.07% on the Hanja1 database, it demonstrates the effectiveness of the proposed method for complex Chinese characters. Two factors account for the misclassified samples: 1) character image degradation and 2) structurally similar characters.

To recognize structurally similar characters, we selected sixteen highly similar characters from the Hanja1 database as the recognition vocabulary (Fig. 15). Some of them differ only slightly and share the same structure or substructure. In this experiment, we compared the recognition and generalization abilities of T2 FMRFs and MRFs, with three mixtures used in both. For each class of characters, we selected 100 samples as training data and took the remaining 100 samples as test data. The T1 FS-based classifier is not considered in this experiment because it has a formulation similar to the MRF, except that the PDF is replaced by the T1 MF; thus, we believe that the T1 FS-based classifier performs similarly to the MRF. After training, we tested the T2 FMRFs and MRFs on the training data and test data separately. Table II compares the recognition rates of T2 FMRFs and MRFs. The recognition rate of T2 FMRFs is 0.92% higher than that of MRFs on the training data. When generalizing to unknown test data, Table II shows that T2 FMRFs degrade by only 1.08%, whereas MRFs degrade by 1.79%, which demonstrates that T2 FMRFs have better generalization ability than MRFs. The reason is partly that the proper FOU in T2 FMRFs retains and propagates possibilities through the interval arithmetic until the final classification decision.

V. CONCLUSION

This paper has proposed an integration of T2 FSs with MRFs, referred to as T2 FMRFs, to solve the labeling problem. T2 FSs can be viewed as an ensemble of embedded T1 FSs and thus have more expressive power when uncertainty exists. If the primary MF describes randomness and the secondary MF represents the fuzziness of the primary MF, both uncertainties can be accounted for within a unified framework. In the labeling problem, because of uncertain class-model parameters, nonstationary data, and fuzzy labeling configurations, we extend the MAP estimation (1), (2) to operations on T2 FSs (6), (7). In practice, we use IT2 FSs because the meet operation can be expressed as product t-norm operations on the lower and upper MFs in parallel (9), (10). Consequently, the computational cost is doubled compared with operations on T1 FSs.

As far as density modeling is concerned, we extend GMMs to T2 FGMMs based on T2 FSs. Not only can the output interval reflect the membership grade, but it can also evaluate the uncertainty of the class model with respect to the observation. The classification results demonstrate that T2 FGMMs have the potential to outperform GMMs at double the computational cost. In the proposed T2 FMRFs, we derive the fuzzy likelihood clique potentials from T2 FGMMs. To perform structural match, we develop the T2 FRL algorithm to find the single best labeling configuration according to (6) and (7).

Because Chinese characters have complex structures, we apply the proposed T2 FMRFs to model character structures in the HCCR system. We adopt the global neighborhood system to describe all types of stroke relationships, and design the fuzzy prior clique potentials to penalize inconsistent local structural matches. Because of the ambiguities of strokes, we propose the T2 FMRF-based stroke extraction, which can be incorporated into the later structural match within a unified framework. The experimental results are encouraging: the recognition rate of the T2 FMRF-based HCCR system is 1.18% and 0.62% higher than the best results reported in [38] and [41], respectively. Furthermore, the T2 FMRF shows better recognition and generalization abilities than the classical MRF.

APPENDIX

A set of sites.

A set of labels.

A collection of observations at all sites.

A labeling configuration at all sites.

Neighborhood system of a site.

A clique in which all sites are pairwise neighbors.

Set of single-site cliques.

Set of pair-site cliques.


Type-2 fuzzy membership function.

Multivariate Gaussian primary MF withuncertain mean vector.

Multivariate Gaussian primary MF withuncertain covariance matrix.

Single-site fuzzy clique potential.

Pair-site fuzzy clique potential.

Set of all parameters defining a T2 FMRF.

Fuzzy energy function that is a sum of fuzzy clique potentials over all possible cliques.

REFERENCES

[1] J. Zeng and Z.-Q. Liu, "Type-2 fuzzy sets for handling uncertainty in pattern recognition," in Proc. Int. Conf. Fuzzy Syst., 2006, pp. 1247–1252.

[2] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, 2nd ed. New York: Wiley, 2001.

[3] S. Z. Li, Markov Random Field Modeling in Image Analysis, 2nd ed. Tokyo, Japan: Springer-Verlag, 2001.

[4] R. Chellappa and A. Jain, Eds., Markov Random Fields: Theory and Application. Boston, MA: Academic, 1993.

[5] R. G. Cowell, A. P. Dawid, S. L. Lauritzen, and D. J. Spiegelhalter, Probabilistic Networks and Expert Systems. New York: Springer, 1999.

[6] S. Geman and D. Geman, "Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images," IEEE Trans. Pattern Anal. Mach. Intell., vol. 6, no. 6, pp. 721–741, 1984.

[7] J. Cai and Z.-Q. Liu, "Pattern recognition using Markov random field models," Pattern Recognit., vol. 35, no. 3, pp. 725–733, 2002.

[8] J. M. Hammersley and P. Clifford, Markov Fields on Finite Graphs and Lattices, 1971, unpublished.

[9] G. J. Klir and M. J. Wierman, Uncertainty-Based Information. New York: Physica-Verlag, 1998.

[10] J. M. Mendel, "Advances in type-2 fuzzy sets and systems," Inf. Sci., vol. 177, pp. 84–110, 2007.

[11] Z.-Q. Liu and Y.-K. Liu, "Type-2 fuzzy variables," IEEE Trans. Fuzzy Syst., to be published.

[12] J. Zeng and Z.-Q. Liu, "Type-2 fuzzy sets for pattern recognition: The state-of-the-art," J. Uncertain Syst., vol. 1, no. 3, pp. 163–177, Aug. 2007.

[13] J. M. Mendel and R. I. John, "Type-2 fuzzy sets made simple," IEEE Trans. Fuzzy Syst., vol. 10, no. 2, pp. 117–127, Apr. 2002.

[14] J. M. Mendel, Uncertain Rule-Based Fuzzy Logic Systems: Introduction and New Directions. Upper Saddle River, NJ: Prentice-Hall, 2001.

[15] R. I. John, P. R. Innocent, and M. R. Barnes, "Neuro-fuzzy clustering of radiographic tibia image data using type-2 fuzzy sets," Inf. Sci., vol. 125, pp. 65–82, 2000.

[16] Q. Liang and J. M. Mendel, "MPEG VBR video traffic modeling and classification using fuzzy technique," IEEE Trans. Fuzzy Syst., vol. 9, no. 5, pp. 183–193, Oct. 2001.

[17] H. Mitchell, "Pattern recognition using type-II fuzzy sets," Inf. Sci., vol. 170, pp. 409–418, 2005.

[18] J. Zeng and Z.-Q. Liu, "Type-2 fuzzy hidden Markov models and their application to speech recognition," IEEE Trans. Fuzzy Syst., vol. 14, no. 3, pp. 454–467, Jun. 2006.

[19] J. Zeng and Z.-Q. Liu, "Interval type-2 fuzzy hidden Markov models," in Proc. IEEE FUZZ, 2004, pp. 1123–1128.

[20] J. Zeng and Z.-Q. Liu, "Type-2 fuzzy hidden Markov models to phoneme recognition," in Proc. Int. Conf. Pattern Recognition, 2004, vol. 1, pp. 192–195.

[21] H. Wu and J. M. Mendel, "Classification of battlefield ground vehicles using acoustic features and fuzzy logic rule-based classifiers," IEEE Trans. Fuzzy Syst., vol. 15, no. 1, pp. 56–72, Feb. 2007.

[22] G. C. Mouzouris and J. M. Mendel, "Nonsingleton fuzzy logic systems: Theory and application," IEEE Trans. Fuzzy Syst., vol. 5, no. 1, pp. 56–71, Feb. 1997.

[23] H. Wu and J. M. Mendel, "Uncertainty bounds and their use in the design of interval type-2 fuzzy logic systems," IEEE Trans. Fuzzy Syst., vol. 10, no. 5, pp. 622–639, Oct. 2002.

[24] I. T. Nabney, NETLAB: Algorithms for Pattern Recognition. London, U.K.: Springer, 2002.

[25] Q. Liang and J. M. Mendel, "Interval type-2 fuzzy logic systems: Theory and design," IEEE Trans. Fuzzy Syst., vol. 8, no. 5, pp. 535–549, Oct. 2000.

[26] J. M. Mendel, "Type-2 fuzzy sets: Some questions and answers," IEEE Connections, vol. 1, pp. 10–13, Aug. 2003.

[27] G. J. McLachlan and K. E. Basford, Mixture Models: Inference and Applications to Clustering. New York: Marcel Dekker, 1988.

[28] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," J. Roy. Statist. Soc. B, vol. 39, no. 1, pp. 1–38, 1977.

[29] F. Salzenstein and W. Pieczynski, "Parameter estimation in hidden fuzzy Markov random fields and image segmentation," Graph. Models Image Process., vol. 59, no. 4, pp. 205–220, 1997.

[30] S. Z. Li, H. Wang, and K. L. Chan, "Minimization of MRF energy with relaxation labeling," J. Math. Imag. Vis., vol. 7, no. 2, pp. 149–161, 1997.

[31] C.-L. Liu, S. Jaeger, and M. Nakagawa, "Online recognition of Chinese characters: The state-of-the-art," IEEE Trans. Pattern Anal. Mach. Intell., vol. 26, no. 2, pp. 198–213, Feb. 2004.

[32] I. Biederman, "Recognition-by-components: A theory of human image understanding," Psych. Rev., vol. 94, no. 2, pp. 115–147, Apr. 1987.

[33] N. Kato, M. Suzuki, S. Omachi, H. Aso, and Y. Nemoto, "A handwritten character recognition system using directional element feature and asymmetric Mahalanobis distance," IEEE Trans. Pattern Anal. Mach. Intell., vol. 21, no. 3, pp. 258–262, Mar. 1999.

[34] C.-L. Liu and M. Nakagawa, "Evaluation of prototype learning algorithms for nearest-neighbor classifier in application to handwritten character recognition," Pattern Recognit., vol. 34, no. 3, pp. 601–615, 2001.

[35] Y. Y. Tang, L.-T. Tu, J. Liu, S.-W. Lee, and W.-W. Lin, "Off-line recognition of Chinese handwriting by multifeature and multilevel classification," IEEE Trans. Pattern Anal. Mach. Intell., vol. 20, no. 5, pp. 556–561, May 1998.

[36] P.-K. Wong and C. Chan, "Off-line handwritten Chinese character recognition as a compound Bayes decision problem," IEEE Trans. Pattern Anal. Mach. Intell., vol. 20, no. 9, pp. 1016–1023, Sep. 1998.

[37] X. Huang, J. Gu, and Y. Wu, "A constrained approach to multifont Chinese character recognition," IEEE Trans. Pattern Anal. Mach. Intell., vol. 15, no. 8, pp. 838–843, Aug. 1993.

[38] C.-L. Liu, I.-J. Kim, and J. H. Kim, "Model-based stroke extraction and matching for handwritten Chinese character recognition," Pattern Recognit., vol. 34, no. 12, pp. 2339–2352, 2001.

[39] H. Y. Kim and J. H. Kim, "Hierarchical random graph representation of handwritten characters and its application to Hangul recognition," Pattern Recognit., vol. 34, no. 2, pp. 187–201, 2001.

[40] Q. Wang, Z. Chi, D. Feng, and R. Zhao, "Hidden Markov random field based approach for off-line handwritten Chinese character recognition," in Proc. Int. Conf. Pattern Recognition, Sep. 2000, vol. 2, pp. 347–350.

[41] I.-J. Kim and J.-H. Kim, "Statistical character structure modeling and its application to handwritten Chinese character recognition," IEEE Trans. Pattern Anal. Mach. Intell., vol. 25, no. 11, pp. 1422–1436, Nov. 2003.

[42] K.-W. Kang and J.-H. Kim, "Utilization of hierarchical, stochastic relationship modeling for Hangul character recognition," IEEE Trans. Pattern Anal. Mach. Intell., vol. 26, no. 9, pp. 1185–1196, Sep. 2004.

[43] J. Zeng and Z.-Q. Liu, "Markov random fields for handwritten Chinese character recognition," in Proc. Int. Conf. Document Analysis and Recognition, 2005, pp. 101–105.

[44] H.-H. Chang and H. Yan, "Analysis of stroke structures of handwritten Chinese characters," IEEE Trans. Syst., Man, Cybern. B, vol. 29, no. 2, pp. 47–61, Feb. 1999.

[45] X. He and N. Yung, "Curvature scale space corner detector with adaptive threshold and dynamic region of support," in Proc. Int. Conf. Pattern Recognition, Aug. 2004, vol. 2, pp. 791–794.

[46] Z.-Q. Liu, J. Cai, and R. Buse, Handwriting Recognition: Soft Computing and Probabilistic Approaches. Berlin, Germany: Springer, 2003.

[47] Y.-M. Su and J.-F. Wang, "A novel stroke extraction method for Chinese character using Gabor filters," Pattern Recognit., vol. 36, no. 3, pp. 635–647, 2003.

[48] J. Zeng and Z.-Q. Liu, "Stroke segmentation of Chinese characters using Markov random fields," in Proc. Int. Conf. Pattern Recognition, 2006, pp. 868–871.

[49] J. C. Bezdek, Pattern Recognition With Fuzzy Objective Function Algorithms. New York: Plenum, 1981.

[50] I.-J. Kim and J.-H. Kim, "Pair-wise discrimination based on a stroke importance measure," Pattern Recognit., vol. 35, no. 10, pp. 2259–2266, 2002.


Jia Zeng (S'05–M'07) received the B.Eng. degree in electrical engineering from the Wuhan University of Technology, China, in 2002, and the Ph.D. degree from the School of Creative Media, City University of Hong Kong, in 2006.

In 2003, he was a Research Assistant in the Center for Media Technology, School of Creative Media, City University of Hong Kong. He is currently a Research Fellow in the Department of Electronic Engineering, City University of Hong Kong. His research interests are type-2 fuzzy sets, pattern recognition, and bioinformatics.

Dr. Zeng was awarded First Place and Second Place in the 2005 and 2006 IEEE Region 10 Postgraduate Student Paper Competition, respectively.

Zhi-Qiang Liu received the M.A.Sc. degree in aerospace engineering from the Institute for Aerospace Studies, University of Toronto, Toronto, ON, Canada, and the Ph.D. degree in electrical engineering from The University of Alberta, Edmonton, AB, Canada.

He is currently a Professor with the School of Creative Media, City University of Hong Kong. He has taught computer architecture, computer networks, artificial intelligence, programming languages, machine learning, pattern recognition, computer graphics, and art and technology. His interests include neural-fuzzy systems, machine learning, human-media systems, computer vision, mobile computing, and computer networks.