

Intelligent Data Analysis 7 (2003) 59–73. IOS Press.

Integrating rough set theory and fuzzy neural network to discover fuzzy rules

Shi-tong Wang (a), Dong-jun Yu (b) and Jing-yu Yang (b)

(a) Department of Computer Science, School of Information, Southern Yangtse University, Jiangsu 214036, P.R. China

(b) Department of Computer Science, Nanjing University of Science & Technology, Nanjing, Jiangsu 210094, P.R. China

    Received 15 April 2002

    Revised 15 June 2002

    Accepted 25 June 2002

Abstract. Most fuzzy systems use the complete combination rule set based on partitions to discover fuzzy rules, which often results in low generalization capability and high computational complexity. To a large extent, this is because such fuzzy systems do not utilize the field knowledge contained in the data. In this paper, based on rough set theory, a new generalized incremental rule extraction algorithm (GIREA) is presented to extract rough domain knowledge, namely certain and possible rules. A fuzzy neural network (FNN) is then used to refine the obtained rules and produce the final fuzzy rule set. Our approach and the experimental results demonstrate its superiority in both rule length and the number of fuzzy rules.

    Keywords: Rough set, fuzzy set, neural networks, incremental rule extraction

    1. Introduction

In the real world, almost every problem eventually requires processing data characterized by uncertainty and imprecision. To date, many scholars have developed a variety of approaches, such as neural networks [1], fuzzy systems [2], rough set theory [3] and genetic algorithms. Each approach has its own advantages and disadvantages, and using only one of them is not enough to provide a flexible and robust information processing system. There is already a trend to integrate different computing paradigms, such as neural networks, fuzzy systems, rough set theory and genetic algorithms, to generate more efficient hybrid systems such as neural-fuzzy systems [4].

Typically, a fuzzy neural network (FNN) embodies the advantages of both neural networks (NN) and fuzzy systems. In other words, an FNN can be used to construct a knowledge-based NN: human field knowledge can be incorporated into the NN, making the FNN more suitable for the problem to be solved. But problems remain; in some circumstances people cannot even derive appropriate rules for a given system. Of course, we can divide every input dimension into several fuzzy subsets and then combine all the fuzzy subsets of every input dimension to construct the complete rule set. However, such an FNN contains no field knowledge, i.e., it may not fit the given system at the very beginning. In recent years, rough set theory has been attracting more and more

1088-467X/03/$8.00 © 2003 IOS Press. All rights reserved.


attention and has been used in various applications, due to its excellent capability of extracting knowledge from data. In this paper, we first apply rough set theory to extract certain and possible rules, which are then used to determine the initial structure of the FNN, so that the FNN works from the beginning with

this type of useful knowledge.

As to fuzzy rule extraction, two important problems are worth studying. One is how to extract a rule set from data; the other is how to refine/simplify the obtained rule set. Several approaches [1] can be applied to extract rules from data, such as fuzzy rule extraction based on product space clustering, on ellipsoidal covariance learning, or on direct matching. The fuzzy rule simplification approach [12] based on similarity measures can effectively reduce the number of fuzzy rules by merging similar fuzzy sets in fuzzy rules. This paper addresses the above two problems from a different angle. The main contribution of our approach is the effective integration of rough set theory and the FNN to discover fuzzy rules from data. Concisely, the approach first extracts certain and possible rules from data in an incremental mode using the new generalized incremental rule extraction algorithm GIREA, and then applies the FNN to refine/simplify the extracted fuzzy rules.

This paper is organized as follows. Section 2 gives a brief description of fuzzy systems and the FNN. Section 3 introduces basic concepts of rough set theory. In Section 4, the new generalized incremental fuzzy rule extraction algorithm GIREA is presented. Section 5 deals with the method of mapping the fuzzy rule set to the corresponding FNN. Simulation results are demonstrated in Section 6. Section 7 concludes the paper.

    2. Fuzzy system and its fuzzy neural network

Generally speaking, a fuzzy system consists of a set of fuzzy rules as follows [5]:

Rule 1: if $x_1$ is $A_1^1$ and $x_2$ is $A_2^1$ and ... $x_n$ is $A_n^1$, then $y$ is $B^1$
Rule 2: if $x_1$ is $A_1^2$ and $x_2$ is $A_2^2$ and ... $x_n$ is $A_n^2$, then $y$ is $B^2$
...
Rule N: if $x_1$ is $A_1^N$ and $x_2$ is $A_2^N$ and ... $x_n$ is $A_n^N$, then $y$ is $B^N$

Fact: $x_1$ is $A'_1$ and $x_2$ is $A'_2$ and ... $x_n$ is $A'_n$
Conclusion: $y$ is $B'$.

With max-product inference and centroid defuzzification, the final output of this fuzzy system can be written as:

$$y = \frac{\int B'(y)\, y\, dy}{\int B'(y)\, dy} \qquad (1)$$

where $B'(y) = \max_{x_1, x_2, \dots, x_n} \left[ \prod_{i=1}^{n} A'_i(x_i) \cdot \max_{j=1}^{N} \left( \prod_{i=1}^{n} A_i^j(x_i) \cdot B^j(y) \right) \right]$.

L.X. Wang [6] has proved that the fuzzy system of Eq. (1) is a universal approximator.


    Fig. 1. The FNN implementation of the fuzzy system.

In practice, one can often take the output fuzzy sets $B^j$ to be singletons $\beta^j$, i.e.,

$$B^j(y) = \begin{cases} 1, & \text{if } y = \beta^j \\ 0, & \text{otherwise} \end{cases} \qquad j = 1, 2, \dots, N \qquad (2)$$

thus we have

$$B'^j(y) = \begin{cases} \prod_{i=1}^{n} A_i^j(x_i), & \text{if } y = \beta^j \\ 0, & \text{otherwise} \end{cases} \qquad j = 1, 2, \dots, N \qquad (3)$$

and the final output can be rewritten as:

$$y = \frac{\sum_{j=1}^{N} \beta^j \prod_{i=1}^{n} A_i^j(x_i)}{\sum_{j=1}^{N} \prod_{i=1}^{n} A_i^j(x_i)} \qquad (4)$$

The I/O relationship of the fuzzy system defined in Eq. (4) can be implemented by a corresponding FNN. The FNN consists of four layers: the input layer, the fuzzification layer, the inference layer and the defuzzification layer, as shown in Fig. 1.
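As an illustration, the singleton fuzzy system of Eq. (4) can be sketched in a few lines of Python. The Gaussian membership functions and the rule centers, widths and consequents below are illustrative assumptions, not values from the paper:

```python
import math

def gaussian_mf(x, center, sigma):
    """Gaussian membership function A(x)."""
    return math.exp(-((x - center) ** 2) / (2 * sigma ** 2))

def singleton_fuzzy_output(x, rules):
    """Eq. (4): weighted average of the singleton consequents beta_j,
    weighted by the product firing strengths prod_i A_i^j(x_i)."""
    num = den = 0.0
    for centers, sigmas, beta in rules:
        w = 1.0
        for xi, c, s in zip(x, centers, sigmas):
            w *= gaussian_mf(xi, c, s)   # product inference over the premises
        num += beta * w
        den += w
    return num / den

# two illustrative rules on a 2-dimensional input (all numbers are made up)
rules = [
    ((-1.0, -1.0), (1.0, 1.0), 0.0),  # if x1 is "low"  and x2 is "low",  then y = 0
    (( 1.0,  1.0), (1.0, 1.0), 1.0),  # if x1 is "high" and x2 is "high", then y = 1
]
print(round(singleton_fuzzy_output((1.0, 1.0), rules), 3))  # 0.982
```

The four FNN layers correspond directly to the steps in `singleton_fuzzy_output`: reading `x`, evaluating the membership functions, forming the product firing strengths, and taking the weighted average.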

Generally speaking, an FNN can be utilized in two modes, the series-parallel mode and the parallel mode [13,14]; see Figs 2(a) and (b), where TDL represents time-delayed logic, RS represents the real system, FNN represents the fuzzy neural network, $u_k$ is the input (excitation) signal, $y_k$ and $\hat{y}_k$ are the outputs of the RS and the FNN, respectively, and $e_k$ is the difference between $y_k$ and $\hat{y}_k$. Figure 2(a) shows the series-parallel mode and Fig. 2(b) the parallel mode. When the FNN works in series-parallel mode, all the delayed output data (used as the input data of the FNN) are observations of the real system. In this circumstance high observation precision is needed; too much observation noise will greatly degrade the performance of the FNN. In parallel mode, by contrast, the delayed output data (used as the input data of the FNN) are independent of the observations of the real system and relate only to the FNN itself. No



Fig. 2. Two modes in which the FNN can be applied: (a) series-parallel mode; (b) parallel mode.

matter which mode is used, once the FNN approximates the real system well enough, it can be applied independently. The FNN has been widely used, but the question described in Section 1 remains: when there is no prior field knowledge, how can appropriate rules be obtained to construct the FNN so as to reduce its search space and training time? The rest of this paper tries to solve this problem.

    3. Rough set, decision matrix and rule extraction

    3.1. Basic concepts of rough sets

Here we introduce only the concepts needed in this paper; for details, please refer to [3].

An information system is $K = (U, C \cup D)$, where $U$ denotes the domain of discourse, $C$ a non-empty condition attribute set, and $D$ a non-empty decision attribute set. Let $A = C \cup D$; an attribute $a$ ($a \in A$) can be regarded as a function from the domain of discourse $U$ to the value set $Val_a$.

An information system may be represented in the form of an attribute-value table, in which rows are labeled by the objects of the domain of discourse and columns by the attributes.

For every subset of attributes $B \subseteq C$, an equivalence relation $I_B$ on $U$ can be defined as:

$$I_B = \{(x, y) \in U \times U : \text{for every } a \in B,\ a(x) = a(y)\} \qquad (5)$$

thus the equivalence class of an object $x \in U$ relative to $I_B$ can be defined as:

$$[x]_B = \{y \in U : y\, I_B\, x\} \qquad (6)$$

An equivalence class is also called an indiscernible class, because any two objects in it are indiscernible.

Lower and upper approximations are two further important concepts in rough set theory. Given $X \subseteq U$ and $B \subseteq C$, $X$'s B-lower and B-upper approximations are defined as $\underline{B}X = \{x \in U : [x]_B \subseteq X\}$ and $\overline{B}X = \{x \in U : [x]_B \cap X \neq \emptyset\}$, respectively. The boundary set is defined as $BN_B(X) = \overline{B}X - \underline{B}X$. If $BN_B(X) \neq \emptyset$, i.e. $\underline{B}X \neq \overline{B}X$, then $X$ is B-rough; otherwise $X$ is B-exact.
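The lower and upper approximations follow directly from the equivalence classes. The Python sketch below illustrates this; the object names and the attribute coding are our own, chosen to match the flu example used later in the paper:

```python
from collections import defaultdict

def approximations(objects, concept):
    """objects: dict name -> tuple of condition attribute values;
    concept: set of object names.  Returns (lower, upper) approximations
    under the indiscernibility relation of Eqs (5)-(6)."""
    classes = defaultdict(set)          # equivalence classes [x]_B
    for name, attrs in objects.items():
        classes[attrs].add(name)
    lower, upper = set(), set()
    for eq in classes.values():
        if eq <= concept:               # [x]_B entirely inside X -> certainly in X
            lower |= eq
        if eq & concept:                # [x]_B intersects X -> possibly in X
            upper |= eq
    return lower, upper

# the inconsistent flu table, coded as (Headache, Temperature)
# with Yes=0 / No=1 and Normal=0 / High=1 / Very High=2
objs = {'o1': (0, 0), 'o2': (0, 1), 'o3': (0, 2), 'o4': (1, 0),
        'o5': (1, 1), 'o6': (1, 2), 'o7': (1, 1), 'o8': (1, 2)}
X1 = {'o2', 'o3', 'o6', 'o7'}           # the "flu infected" concept
low, up = approximations(objs, X1)
print(sorted(low))  # ['o2', 'o3']
print(sorted(up))   # ['o2', 'o3', 'o5', 'o6', 'o7', 'o8']
```

Note that `o5`/`o7` and `o6`/`o8` fall into the same equivalence classes, which is exactly why the concept is rough.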

    3.2. Rule extraction using decision matrix

The decision matrix is a generalized form of rough set theory. The concept of the decision matrix derives from discernibility matrices [8]; it can be used to compute decision rules and reducts of an information system. It provides a way to generate the simplest set of rules while preserving all classification information [9].


Table 1. Consistent information table

            Headache   Temperature   Flu
Object1     Yes        Normal        No
Object2     Yes        High          Yes
Object3     Yes        Very High     Yes
Object4     No         Normal        No
Object5     No         High          No
Object6     No         Very High     Yes

Table 2. Decision matrix for class 0 (flu infected)

            Obj1          Obj4          Obj5
Obj2        (T,1)         (T,1)(H,0)    (H,0)
Obj3        (T,2)         (T,2)(H,0)    (T,2)(H,0)
Obj6        (H,1)(T,2)    (T,2)         (T,2)

    3.2.1. Rule extraction from consistent information table

Let us introduce the decision matrix first. For an information system $K = (U, C \cup D)$, suppose $U$ is divided into $m$ classes $(c_1, c_2, \dots, c_m)$ by the equivalence relation defined on $D$. Given any class $c \in (c_1, c_2, \dots, c_m)$, the objects that belong to this class are numbered with subscripts $i$ ($i = 1, 2, \dots$) and the objects that do not belong to it with subscripts $j$ ($j = 1, 2, \dots$). The decision matrix $M(K) = (M_{ij})$ of information system $K$ is defined as a matrix whose entry at position $(i, j)$ is a set of attribute-value pairs:

$$M_{ij} = \{(a, a(i)) : a(i) \neq a(j)\}, \quad (i = 1, 2, \dots;\ j = 1, 2, \dots) \qquad (7)$$

where $a(i)$ is the value of attribute $a$ for object $i$. For a given object $i$ belonging to class $c$, we can compute its minimal-length decision rule

$$|B_i| = \bigwedge_j \bigvee M_{ij} \qquad (8)$$

where $\wedge$ and $\vee$ are the generalized conjunction and disjunction operators, respectively. So for the given class $c$, its decision rule set can be represented as

$$RUL = \bigvee_i |B_i|, \quad (i = 1, 2, \dots) \qquad (9)$$

Let H represent Headache, and T and F represent Temperature and Flu, respectively:

$Val_H = \{0, 1\}$ represents $Val_{Headache} = \{\text{Yes}, \text{No}\}$.
$Val_T = \{0, 1, 2\}$ represents $Val_{Temperature} = \{\text{Normal}, \text{High}, \text{Very High}\}$.
$Val_F = \{0, 1\}$ represents $Val_{Flu} = \{\text{Yes}, \text{No}\}$.

Tables 2 and 3 show the decision matrices for class 0 (flu infected) and class 1 (flu not infected), respectively.

Let $|B_i^0|$ ($i = 1, 2, 3$) denote the $i$-th minimal-length rule in the decision matrix of class 0. So:

$|B_1^0| = (T,1) \wedge ((T,1) \vee (H,0)) \wedge (H,0) = (T,1) \wedge (H,0)$


Table 3. Decision matrix for class 1 (flu not infected)

            Obj2          Obj3          Obj6
Obj1        (T,0)         (T,0)         (H,0)(T,0)
Obj4        (T,0)(H,1)    (T,0)(H,1)    (T,0)
Obj5        (H,1)         (H,1)(T,1)    (T,1)

Table 4. Inconsistent information table

            Headache   Temperature   Flu
Object1     Yes        Normal        No
Object2     Yes        High          Yes
Object3     Yes        Very High     Yes
Object4     No         Normal        No
Object5     No         High          No
Object6     No         Very High     Yes
Object7     No         High          Yes
Object8     No         Very High     No

$|B_2^0| = (T,2) \wedge ((T,2) \vee (H,0)) \wedge ((T,2) \vee (H,0)) = (T,2)$

$|B_3^0| = ((T,2) \vee (H,1)) \wedge (T,2) \wedge (T,2) = (T,2)$

Similarly, the $i$-th minimal-length rule in the decision matrix of class 1 can be computed as follows:

$|B_1^1| = (T,0) \wedge (T,0) \wedge ((H,0) \vee (T,0)) = (T,0)$

$|B_2^1| = ((T,0) \vee (H,1)) \wedge ((T,0) \vee (H,1)) \wedge (T,0) = (T,0)$

$|B_3^1| = (H,1) \wedge ((H,1) \vee (T,1)) \wedge (T,1) = (T,1) \wedge (H,1)$

The final minimal-length decision rule sets for class 0 and class 1 can be represented as:

$RUL^0 = (T,2) \vee ((T,1) \wedge (H,0))$

$RUL^1 = (T,0) \vee ((T,1) \wedge (H,1))$
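The computation above can be sketched in code. Finding the minimal-length rule $|B_i| = \bigwedge_j \bigvee M_{ij}$ amounts to finding a smallest set of the object's attribute-value pairs that intersects every matrix entry (a minimal hitting set). The brute-force search below is an illustrative sketch of the idea, not the paper's algorithm:

```python
from itertools import combinations

def decision_matrix_row(obj, others, attrs=('H', 'T')):
    """M_ij entries for one in-class object against every out-of-class object:
    the attribute-value pairs of obj on which the two objects differ (Eq. 7)."""
    return [{(a, obj[k]) for k, a in enumerate(attrs) if obj[k] != other[k]}
            for other in others]

def minimal_rule(obj, others, attrs=('H', 'T')):
    """Smallest set of obj's attribute-value pairs hitting every row entry,
    i.e. the shortest conjunction |B_i| = AND_j OR M_ij (Eq. 8), by brute force."""
    row = decision_matrix_row(obj, others, attrs)
    pairs = sorted(set().union(*row))
    for r in range(1, len(pairs) + 1):
        for cand in combinations(pairs, r):
            if all(set(cand) & entry for entry in row):
                return set(cand)
    return set(pairs)

# Table 1 coding: H: Yes=0 / No=1, T: Normal=0 / High=1 / Very High=2
not_infected = [(0, 0), (1, 0), (1, 1)]        # Object1, Object4, Object5

print(minimal_rule((0, 1), not_infected))      # Obj2 -> {('H', 0), ('T', 1)}
print(minimal_rule((0, 2), not_infected))      # Obj3 -> {('T', 2)}
```

The outputs reproduce $|B_1^0| = (T,1) \wedge (H,0)$ and $|B_2^0| = (T,2)$ above; brute force is exponential in the number of attribute-value pairs, which is fine for toy tables but not for large ones.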

    3.3. Rule extraction from inconsistent information table using decision matrix

In real-life applications a consistent information table often does not exist, so inconsistent information has to be coped with.

Suppose we add Object7 and Object8 to Table 1, obtaining Table 4. Table 4 is an inconsistent information table, since some objects have the same condition attribute values but different decision attribute values. For example, Object5 and Object7 have the same condition attribute values but different decision attribute values.

From Table 4 we can form two concepts, $X_1 = \{Object2, Object3, Object6, Object7\}$ and $X_2 = \{Object1, Object4, Object5, Object8\}$, representing flu infected and flu not infected, respectively. These two concepts are rough because neither of them is definable. In order to extract rules from an inconsistent information table, lower and upper approximations are needed. Rules extracted from the lower approximation are certain rules; rules extracted from the upper approximation are possible rules.


Table 5. Decision matrix for computing concept X1's certain rules

            Object1    Object4       Object5       Object6       Object7       Object8
Object2     (T,1)      (H,0)(T,1)    (H,0)         (H,0)(T,1)    (H,0)         (H,0)(T,1)
Object3     (T,2)      (H,0)(T,2)    (H,0)(T,2)    (H,0)         (H,0)(T,2)    (H,0)

Table 6. Decision matrix for computing concept X1's possible rules

            Object1       Object4
Object2     (T,1)         (T,1)(H,1)
Object3     (T,2)         (T,2)(H,1)
Object5     (H,1)(T,1)    (T,1)
Object6     (H,1)(T,2)    (T,2)
Object7     (H,1)(T,1)    (T,1)
Object8     (H,1)(T,2)    (T,2)

First, we compute the lower and upper approximations of concepts $X_1$ and $X_2$:

$\underline{B}X_1 = \{Object2, Object3\}$

$\underline{B}X_2 = \{Object1, Object4\}$

$\overline{B}X_1 = \{Object2, Object3, Object5, Object6, Object7, Object8\}$

$\overline{B}X_2 = \{Object1, Object4, Object5, Object6, Object7, Object8\}$

Let $|B_i^0|_{certain}$ ($i = 1, 2$) denote the $i$-th minimal-length certain rule in the decision matrix of class 0. Using the method of Section 3.2.1, we can compute the certain rules for concept $X_1$ (class 0) from Table 5 as follows:

$|B_1^0|_{certain} = (T,1) \wedge ((T,1) \vee (H,0)) \wedge (H,0) \wedge ((T,1) \vee (H,0)) \wedge (H,0) \wedge ((T,1) \vee (H,0)) = (T,1) \wedge (H,0)$

$|B_2^0|_{certain} = (T,2) \wedge ((T,2) \vee (H,0)) \wedge ((T,2) \vee (H,0)) \wedge (H,0) \wedge ((T,2) \vee (H,0)) \wedge (H,0) = (T,2) \wedge (H,0)$

thus we obtain the certain rule set for class 0:

$RUL^0_{certain} = ((T,1) \wedge (H,0)) \vee ((T,2) \wedge (H,0))$

For certain rules, we define the belief function $df = 1$; in other words, rules with $df = 1$ are positively believable.

Let $|B_i^0|_{possible}$ denote the $i$-th minimal-length possible rule in the decision matrix of class 0. Similarly, we can compute the possible rules for concept $X_1$ from Table 6 as follows:

$|B_1^0|_{possible} = (T,1) \wedge ((T,1) \vee (H,1)) = (T,1)$

$|B_2^0|_{possible} = (T,2) \wedge ((T,2) \vee (H,1)) = (T,2)$

$|B_3^0|_{possible} = ((T,1) \vee (H,1)) \wedge (T,1) = (T,1)$


$|B_4^0|_{possible} = ((T,2) \vee (H,1)) \wedge (T,2) = (T,2)$

$|B_5^0|_{possible} = ((T,1) \vee (H,1)) \wedge (T,1) = (T,1)$

$|B_6^0|_{possible} = ((T,2) \vee (H,1)) \wedge (T,2) = (T,2)$

thus we obtain the possible rule set for class 0:

$RUL^0_{possible} = (T,1) \vee (T,2) \vee (T,1) \vee (T,2) \vee (T,1) \vee (T,2) = (T,1) \vee (T,2)$

For possible rules, we define the belief function

$$df = 1 - \frac{card(\overline{B}X - \underline{B}X)}{card(U)}$$

where $card(\cdot)$ denotes the cardinality of a set. In other words, possible rules are believable with degree $df$, $0 < df < 1$. The rationale of this definition is intuitive: the greater the difference between $\underline{B}X$ and $\overline{B}X$, the more inexact the concept $X$, so the belief degree of the possible rules extracted from $X$ should decrease accordingly. As $\underline{B}X$ approaches $\overline{B}X$, $df$ approaches 1.

Similarly, we can compute concept $X_2$'s certain and possible rules.
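The belief degree is a one-line computation; a minimal sketch, applied to the approximations of $X_1$ computed above (object names are our own shorthand for Object1-Object8):

```python
def belief_degree(lower, upper, universe_size):
    """df = 1 - card(upper - lower) / card(U); df = 1 for an exact concept."""
    return 1.0 - len(upper - lower) / universe_size

# X1 from the inconsistent table: lower {o2, o3}, upper adds {o5, o6, o7, o8}
df = belief_degree({'o2', 'o3'},
                   {'o2', 'o3', 'o5', 'o6', 'o7', 'o8'}, 8)
print(df)  # 0.5
```

This reproduces the value $df = 0.5$ attached to the possible rules in Section 5.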

    4. New Generalized Incremental Rule Extracting Algorithm (GIREA)

Suppose we have extracted certain and possible rules from an information table. When new objects are added to it, the rule set may change. In this circumstance an incremental rule extraction algorithm is required; otherwise it takes much longer to re-compute the rule set from the very beginning. It should be pointed out that the incremental rule extraction algorithm in [9] does not simultaneously compute certain and possible rules and cope with inconsistent information tables. The new generalized incremental rule extraction algorithm (GIREA) presented here, a generalization of the algorithm in [9], can deal with both consistent and inconsistent information tables and extract certain and possible rule sets at the same time. The main idea of the new algorithm can be summarized as follows:

Given a newly added object:

- Does the new object give rise to a new concept? If it does, update the concept set.
- Collision detection: Object_a collides with Object_b if and only if Object_a and Object_b have the same condition attribute values and their decision attribute values differ. For example, Object6 and Object8 collide with each other (see Section 3.3).
- Update the certain and possible rule sets according to the collision detection.

With this algorithm, when a new object is added to the information system it is unnecessary to re-compute the rule sets from the very beginning; we can update the rule sets by partly modifying the original ones, saving considerable time. This is especially useful when extracting rules from large databases.
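Collision detection as defined above is a simple scan over the table; a minimal sketch, where the row encoding (condition tuple plus decision value) is our own assumption:

```python
def collides(new_obj, table):
    """new_obj and each table row are (condition_tuple, decision).
    A collision is identical condition attribute values
    with a different decision attribute value."""
    cond, dec = new_obj
    return any(c == cond and d != dec for c, d in table)

# Object6-style and Object5-style rows: (Headache, Temperature), Flu
table = [((1, 2), 'yes'), ((1, 1), 'no')]
print(collides(((1, 2), 'no'), table))   # True  (like adding Object8)
print(collides(((0, 0), 'no'), table))   # False (new condition values)
```

In GIREA this check decides which branch of STEP 3 is taken (FLAG = 1 or 0).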

GIREA Algorithm:

Condition: the rule sets and the concept set $X = \{X_1, X_2, \dots, X_m\}$ computed from the given information system; a new object Object_new is added to the information system.


BEGIN

STEP 1.
Determine which concept the newly added object belongs to. If it belongs to no concept in the concept set X = {X1, X2, ..., Xm}, create a new concept X(m+1) and add it to X, i.e. X = X ∪ {X(m+1)}.

STEP 2.
// Collision detection
IF (the new object Object_new collides with an original object in the information table)
    FLAG = 1;
ELSE
    FLAG = 0;

STEP 3.
Get a concept Xi from X, and let X = X − {Xi}.
IF (FLAG = 0)    // no collision
{
    IF (Val(Xi) = Val(Object_new))
    {   add a new row to concept Xi's certain and possible decision matrices
        (labeled k1 and k2, respectively):
            M(k1,j) = {(a, a(k1)) | a(k1) ≠ a(j)}
            M(k2,j) = {(a, a(k2)) | a(k2) ≠ a(j)}
        compute the decision rule for each added row:
            |B(k1)| = ∧_j ∨ M(k1,j)
            |B(k2)| = ∧_j ∨ M(k2,j)
        update concept Xi's certain and possible rule sets as follows:
            RUL_i_certain = RUL_i_certain ∨ |B(k1)|
            RUL_i_possible = RUL_i_possible ∨ |B(k2)|
    }
    ELSE
    {   add a new column to concept Xi's certain and possible decision matrices
        (labeled k1 and k2, respectively):
            M(i,k1) = {(a, a(i)) | a(i) ≠ a(k1)}
            M(i,k2) = {(a, a(i)) | a(i) ≠ a(k2)}
        update every row's decision rule:
            |B_i|certain = |B_i|certain ∧ ∨ M(i,k1)
            |B_i|possible = |B_i|possible ∧ ∨ M(i,k2)
        update concept Xi's certain and possible rule sets as follows:
            RUL_i_certain = ∨_i |B_i|certain
            RUL_i_possible = ∨_i |B_i|possible
    }
}
ELSE    // collision detected
{   IF (Object_new collides with an Object in concept Xi's lower approximation)
    {   delete the row containing that Object from the certain decision matrix of
        concept Xi (labeled l), and update the certain rule set as follows:
            RUL_i_certain = RUL_i_certain − |B_l|certain
        Then add a new column to the certain decision matrix of concept Xi (labeled k).
        Update every row's decision rule:
            |B_i|certain = |B_i|certain ∧ ∨ M(i,k)
        Update the final certain rule set:
            RUL_i_certain = ∨_i |B_i|certain
        Add a new row to the possible decision matrix of concept Xi (labeled k):
            M(k,j) = {(a, a(k)) | a(k) ≠ a(j)}
        compute the possible decision rule for this row:
            |B_k|possible = ∧_j ∨ M(k,j)
        update the final possible rule set:
            RUL_i_possible = RUL_i_possible ∨ |B_k|possible
    }
    ELSE IF (Val(Xi) = Val(Object_new))
    {   add a new column to the certain decision matrix of concept Xi (labeled k):
            M(i,k) = {(a, a(i)) | a(i) ≠ a(k)}
        update every row's decision rule:
            |B_i|certain = |B_i|certain ∧ ∨ M(i,k)
        update the final certain rule set:
            RUL_i_certain = ∨_i |B_i|certain
        delete the column containing the colliding Object from the possible decision
        matrix of concept Xi and add a new row (Object_new) to it;
        re-calculate each row's possible rule |B_i|possible;
        calculate RUL_i_possible as: RUL_i_possible = ∨_i |B_i|possible
    }
    ELSE
    {   add a new column to concept Xi's certain and possible decision matrices
        (labeled k1 and k2, respectively):
            M(i,k1) = {(a, a(i)) | a(i) ≠ a(k1)}
            M(i,k2) = {(a, a(i)) | a(i) ≠ a(k2)}
        update each row's decision rule:
            |B_i|certain = |B_i|certain ∧ ∨ M(i,k1)
            |B_i|possible = |B_i|possible ∧ ∨ M(i,k2)
        update concept Xi's certain and possible rule sets as follows:
            RUL_i_certain = ∨_i |B_i|certain
            RUL_i_possible = ∨_i |B_i|possible
    }
}

STEP 4.
IF (X ≠ ∅)
    GOTO STEP 3.
ELSE
    STOP.

END

A question one may raise here is that when a new object is added to the domain of discourse U, the cardinality of U changes, so the belief degrees of the possible rules must be recomputed; this affects the entire learned rule set and might seem to make the algorithm non-incremental. We analyze this as follows. According to the definition of the belief function in Section 3.3, the belief degrees of the possible rules extracted from the same concept are equal, so when a new object is added, recomputing each concept's belief function yields the belief degrees of all possible rules. Moreover, the incrementality of the proposed algorithm comes from properly modifying the already existing rules, and the belief-degree recomputation is only a small part of this modification. Compared with the computational cost of rule modification, the cost of recomputing belief degrees is rather small.

    5. Mapping rules into the FNN

When certain and possible rules have been extracted from the information table, we need to map them into the corresponding FNN, just as fuzzy rules are mapped to the FNN as described in Section 2.

Taking the rules extracted in Section 3.3 as an example, there are 3 certain rules and 3 possible rules in the rule set:

Certain rules:

$RUL^0_{certain} = ((T,1) \wedge (H,0)) \vee ((T,2) \wedge (H,0))$

$RUL^1_{certain} = (T,0)$

Possible rules:

$RUL^0_{possible} = (T,1) \vee (T,2)$

$RUL^1_{possible} = (H,1)$

    We can describe these rules in the form of natural language as follows:

(1) If Temperature is High and Headache is Yes, then Flu is Infected. (df1 = 1)
(2) If Temperature is Very High and Headache is Yes, then Flu is Infected. (df2 = 1)
(3) If Temperature is Normal, then Flu is not Infected. (df3 = 1)

Rules (1), (2) and (3) are certain rules, whose belief degrees (df) are all 1, i.e., these certain rules are definitely believable.

(4) If Temperature is High, then Flu is Infected. (df4 = 0.5)
(5) If Temperature is Very High, then Flu is Infected. (df5 = 0.5)
(6) If Headache is No, then Flu is not Infected. (df6 = 0.5)

Rules (4), (5) and (6) are possible rules, whose belief degrees (df) lie between 0 and 1, i.e., these possible rules are partially believable.

As there are two kinds of rules (certain and possible), the inference layer of the corresponding FNN consists of two parts, as shown in Fig. 3: a certain part, which contains the certain rules, and a possible part, which contains the possible rules.


    Fig. 3. Mapping rules to FNN.

Let $df_i$ be the belief degree of the $i$-th rule. The final fitness of the $i$-th rule in the FNN can be measured by $df_i \cdot \mu_i$, where $\mu_i$ is the fitness (firing strength) of the $i$-th rule in the conventional sense. Let $x$ be the input variable for Headache and $y$ the input variable for Temperature, and let $C_1$ represent flu not infected and $C_2$ flu infected. Define two fuzzy sets, Yes and No, on input dimension $x$ and three fuzzy sets, N, H and V, on input dimension $y$, where N, H and V represent Normal, High and Very High, respectively. Then the six rules described above can be mapped into the FNN as shown in Fig. 3.
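A minimal sketch of how the six rules could fire with belief-degree weighting, taking the final fitness of rule $i$ as $df_i$ times its firing strength. The Gaussian membership functions over the crisp attribute codes are illustrative assumptions, not the paper's actual fuzzification:

```python
import math

def gauss(x, c, s):
    return math.exp(-((x - c) ** 2) / (2 * s ** 2))

# the six rules of Section 5: (firing strength over (headache, temperature), class, df)
# crisp coding: headache Yes=0 / No=1, temperature Normal=0 / High=1 / Very High=2
rules = [
    (lambda h, t: gauss(t, 1, 0.5) * gauss(h, 0, 0.5), 'infected', 1.0),      # (1) certain
    (lambda h, t: gauss(t, 2, 0.5) * gauss(h, 0, 0.5), 'infected', 1.0),      # (2) certain
    (lambda h, t: gauss(t, 0, 0.5), 'not infected', 1.0),                     # (3) certain
    (lambda h, t: gauss(t, 1, 0.5), 'infected', 0.5),                         # (4) possible
    (lambda h, t: gauss(t, 2, 0.5), 'infected', 0.5),                         # (5) possible
    (lambda h, t: gauss(h, 1, 0.5), 'not infected', 0.5),                     # (6) possible
]

def classify(h, t):
    """Winning class by the maximum of df_i * mu_i over all rules."""
    scores = {}
    for mu, cls, df in rules:
        scores[cls] = max(scores.get(cls, 0.0), df * mu(h, t))
    return max(scores, key=scores.get)

print(classify(0, 2))  # headache = Yes, temperature = Very High -> 'infected'
```

Note how the df factor lets a certain rule dominate a conflicting possible rule when both fire strongly.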

    6. Numerical simulations

In this section, numerical simulations demonstrate our approach's superiority over the rule extraction approach that uses only the conventional FNN [1].

Given a nonlinear system:

$$y(t+1) = \frac{y(t)\, y(t-1)\, (y(t) + 2.5)}{1 + y^2(t) + y^2(t-1)} + u(t), \qquad u(t) = \sin(2\pi t / 25) \qquad (10)$$

with initial values $y(0) = 0.9$, $y(1) = 0.5$.
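The plant of Eq. (10) is easy to iterate directly; a minimal sketch, assuming the input signal is the usual benchmark $\sin(2\pi t/25)$:

```python
import math

def u(t):
    # assumed input signal sin(2*pi*t/25)
    return math.sin(2 * math.pi * t / 25)

def simulate(n, y0=0.9, y1=0.5):
    """Iterate Eq. (10): y(t+1) = y(t)y(t-1)(y(t)+2.5)/(1+y(t)^2+y(t-1)^2) + u(t),
    returning the first n values of the trajectory."""
    y = [y0, y1]
    while len(y) < n:
        t = len(y) - 1
        y.append(y[t] * y[t - 1] * (y[t] + 2.5)
                 / (1 + y[t] ** 2 + y[t - 1] ** 2) + u(t))
    return y

traj = simulate(100)
print(round(traj[2], 3))  # 0.904
```

A trajectory like this, generated from the true plant, is what serves as the sample data for both rule extraction methods below.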

Method 1: use the conventional FNN [1].

First, we divide the input interval into three equal sub-intervals on each dimension, and then define three fuzzy subsets on them (see Fig. 4). Figure 4 shows how to define fuzzy sets on the sub-intervals, where S, M and L represent the fuzzy sets Small, Middle and Large, respectively; $y_{min}$ and $y_{max}$ are the minimum and maximum values that may be taken on dimension $y$.


    Fig. 4. Defining fuzzy sets on y dimension.

Table 7. Performance comparison between method 1 and method 2

            R    ARL   No. of iterations
Method 1    27   3     200
Method 2    20   2.2   89

We define the average rule length ARL as:

$$ARL = \frac{\sum_{i=1}^{R} P_i}{R} \qquad (11)$$

where $R$ is the number of rules and $P_i$ is the number of premise variables in the $i$-th rule.

Using the complete combination rule set, there are 27 ($3 \times 3 \times 3$) rules, and the ARL is 3 (because there are 3 premise variables in each rule).
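Eq. (11) in code is a one-liner; for the complete combination rule set of method 1:

```python
def average_rule_length(premise_counts):
    """Eq. (11): ARL = (sum of P_i over all rules) / R."""
    return sum(premise_counts) / len(premise_counts)

# the complete combination rule set of method 1: 27 rules, 3 premises each
print(average_rule_length([3] * 27))  # 3.0
```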

Method 2: use the approach of this paper, i.e.:

- Discretize the samples (quantify the continuous attribute values). For comparability with method 1, the input interval is also divided into 3 equal sub-intervals on each dimension.
- To demonstrate the incrementality of the proposed algorithm GIREA, set the information table to null at the beginning, then gradually add samples into it (one at a time), extracting certain and possible rules using GIREA until all samples have been processed.
- Map the rules to the FNN, and use the FNN to refine the rules obtained in the previous step.

Using method 2, we obtained 20 rules with an average rule length of 2.2.

In our experiment, in order to reach the same approximation level, the numbers of iterations for method 1 and method 2 were 200 and 89, respectively. Figure 5 shows the final identification results of method 1 and method 2 (using each FNN independently after training, with initial state values different from those of the real system: the real system uses $y(0) = 0.9$, $y(1) = 0.5$, while both FNNs use the same initial state values $y(0) = 0.4$, $y(1) = 0.2$). Table 7 compares the performance of method 1 and method 2.

From Fig. 5 we can see that, compared with method 1, method 2 has a simpler rule set and faster learning. The reason is that the FNN based on our approach contains knowledge obtained from the sample data.

Figure 6 shows the final identification results of method 1 and method 2 after 20% white Gaussian noise was added. It is easy to see that the FNN based on method 2 is more robust than the FNN based on method 1.

Another experiment was done to demonstrate the performance superiority of the proposed GIREA over the conventional rule extraction algorithm.



Fig. 5. (a) and (b) are identification results using method 1 and method 2, respectively. Small dots: real system (initial state $y(0) = 0.9$, $y(1) = 0.5$); big dots: FNN (initial state $y(0) = 0.4$, $y(1) = 0.2$).


Fig. 6. (a) and (b) are identification results using method 1 and method 2, respectively, with 20% white Gaussian noise added. Small dots: real system (initial state $y(0) = 0.9$, $y(1) = 0.5$); big dots: FNN (initial state $y(0) = 0.4$, $y(1) = 0.2$).

    Suppose there are 100 samples in the original sample set and that rules have been extracted from it using the conventional rule extraction algorithm; take the time used as the benchmark time 1. Now another 20 samples are added to the sample set. Re-extracting the rules with the conventional algorithm takes 1.19, while re-extracting them with GIREA takes only 1.08, as shown in Table 8. The reason is that, when new objects are added, the proposed GIREA updates the rule set by partly modifying the original rules, whereas the conventional rule extraction algorithm must recompute the rule set from the very beginning.
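    The source of this saving can be illustrated with a toy sketch. The names and the drastic simplification below are ours, not GIREA itself (which works on rough-set approximations to produce certain and possible rules): the point is only that new objects touch just the rules whose condition part they match, so the rest of the rule set is reused unchanged.

```python
from collections import defaultdict

def extract_rules(samples):
    """Batch extraction: map each condition vector to the set of decisions
    observed for it. A condition with one decision corresponds to a certain
    rule; with several decisions, to a possible rule."""
    rules = defaultdict(set)
    for *cond, decision in samples:
        rules[tuple(cond)].add(decision)
    return rules

def update_rules(rules, new_samples):
    """Incremental update: only entries whose condition part occurs in the
    new objects are modified; nothing is recomputed from scratch."""
    for *cond, decision in new_samples:
        rules[tuple(cond)].add(decision)
    return rules

base = [(0, 1, 'A'), (1, 1, 'B'), (0, 0, 'A')]
rules = extract_rules(base)
rules = update_rules(rules, [(0, 1, 'B')])   # (0, 1) now admits two decisions
```

    After the update, the condition (0, 1) carries the decisions {'A', 'B'}, i.e. a formerly certain rule has become a possible one, while the rules for (0, 0) and (1, 1) are untouched.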

    7. Conclusions

    How to extract rules from data without expert knowledge is a bottleneck of knowledge discovery. Our approach attempts to integrate rough set theory and an FNN to discover knowledge. The rule set obtained by GIREA has fewer rules and shorter rule lengths. Simulation results show the effectiveness of our approach and its advantages over a conventional FNN: because the approach exploits the distribution characteristics of the sample data, it extracts a better rule set, and the FNN built on this rule set has a better topology and, accordingly, better robustness and learning speed. Further


    Table 8
    Performance comparison between the conventional rule extraction algorithm and GIREA (note: the times listed are relative to the benchmark time 1)

    Algorithm                                     Time used
    The conventional rule extraction algorithm    1.19
    GIREA                                         1.08

    studies should focus on the theoretical and practical aspects of the static-dynamic topology-changeable FNN and on knowledge discovery.

    Acknowledgement

    The work here is financially supported by the National Science Foundation of China. The authors would like to thank the anonymous reviewers for their valuable comments.

    About the authors

    Shi-tong Wang: Professor in computer science.

    Dong-jun Yu: Ph.D. candidate in computer science.

    Jing-yu Yang: Professor in computer science.

    References

    [1] S.T. Wang, Fuzzy System and Fuzzy Neural Networks, Shanghai Science and Technology Press, 1998, Edition 1.
    [2] L.A. Zadeh, Fuzzy sets, Inform. Contr. 8 (1965), 338–353.
    [3] Z. Pawlak, Rough Sets: Theoretical Aspects of Reasoning About Data, Kluwer, Dordrecht, The Netherlands, 1991.
    [4] M. Banerjee et al., Rough fuzzy MLP: knowledge encoding and classification, IEEE Trans. Neural Networks 9(6) (1998), 1203–1216.
    [5] C.T. Lin, Neural Fuzzy Systems, Prentice-Hall, USA, 1997.
    [6] L.X. Wang, A Course on Fuzzy Systems, Prentice-Hall, USA, 1999.
    [7] S. Wang and D. Yu, Error analysis in nonlinear system identification using fuzzy system, J. of Software Research 11(4) (2000), 447–452.
    [8] A. Skowron and C. Rauszer, The discernibility matrices and functions in information systems, in: Intelligent Decision Support: Handbook of Applications and Advances of Rough Sets Theory, R. Slowinski, ed., Kluwer, Dordrecht, The Netherlands, 1992, pp. 331–362.
    [9] N. Shan and W. Ziarko, An incremental learning algorithm for constructing decision rules, in: Rough Sets, Fuzzy Sets and Knowledge Discovery, W. Ziarko, ed., Springer-Verlag, 1994, pp. 326–334.
    [10] P. Wang, Constructive theory for fuzzy systems, Fuzzy Sets and Systems 88(2) (1997), 1040–1045.
    [11] Z. Mao et al., Topology-changeable neural network, Control Theory and Applications 16(1), 54–60.
    [12] M. Setnes et al., Similarity measures in fuzzy rule base simplification, IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics 28(3) (June 1998).
    [13] K.S. Narendra and K. Parthasarathy, Identification and control of dynamical systems using neural networks, IEEE Trans. Neural Networks 1(1) (March 1990), 4–23.
    [14] J. Lu, W. Xu and Z. Han, Research on parallel identification algorithm of neural networks, Control Theory and Applications 15(5) (1998), 741–745.
