
Journal of Telecommunications, ISSN 2042-8839, Volume 15, Issue 2, August 2012, http://www.journaloftelecommunications.co.uk


JOURNAL OF TELECOMMUNICATIONS, VOLUME 15, ISSUE 2, AUGUST 2012

© 2012 JOT www.journaloftelecommunications.co.uk


Fuzzy Models for Educational Data Mining Ashraf Darwish and Olga Poleshchuk

 

Abstract- Models of educational information processing based on semantic spaces are developed in this paper. The first model formalizes the results of expert evaluation of students’ qualitative characteristics with the help of fuzzy sets. The second model determines students’ rating points within a set of qualitative characteristics and assigns one of the qualification levels to each student. The obtained rating points are then used in a third model, which clusters students based on expert opinions regarding the importance of certain characteristics for the corresponding cluster. The fourth model is proposed to predict students’ characteristics and to study the relationship between these characteristics. The regression model is built by transforming the input and output fuzzy numbers into intervals, which are called weighted intervals.

Index Terms- Educational information processing, fuzzy expert evaluation, complete orthogonal semantic space, qualitative characteristic, rating points, clusterization, prediction.

1 INTRODUCTION AND BACKGROUND

The complexity of quantitatively assessing the educational process stems from the general complexity of quantitatively assessing training and management processes. This complexity is associated with the specific features of measurement in education: it is necessary to take into account the characteristics and judgments of experts who make decisions based on their personal evaluation. It is also associated with the fact that many of the characteristics are non-numeric and can be described only on a verbal level.

Information received from experts can contain both crisp and fuzzy data. For objective reasons the latter prevail, because an expert normally uses words of a natural language to evaluate processes, events and objects. The use of such words causes uncertainty in the form of fuzziness. Expert information with fuzzy data (fuzzy expert information) is difficult to formalize within the framework of traditional mathematical concepts.

The results of expert evaluation of students’ characteristics are often assumed to be values of random variables. This might be true if students got their grades written on cards and pulled from a black box. But this is not the case because, for example, their academic progress is assessed by their teachers, who do not assign these results randomly. In many real situations the conditions for applying probability theory are clearly not satisfied. Nevertheless, the methods of probability theory are applied, because sometimes it is better to have some plausible solution than none.

Recently, the definition and application of probability theory have been widely discussed by members of the BISC (Berkeley Initiative in Soft Computing) Group. Prof. L. Zadeh said in this connection: “Standard probability theory is a more than adequate tool when one deals with physical, that is, inanimate systems in which human judgment, perceptions and emotions do not have a role. But there is another realm, the realm of what I call humanistic systems - systems in which human judgment, perceptions and emotions play a prominent role. This is the realm of education, economics, psychology, law and decision analysis. In the realm of humanistic systems, perceptions play a prominent role. Perceptions are intrinsically imprecise - more specifically, perceptions are intrinsically fuzzy. In large measure, subjective probabilities are perception-based and hence fuzzy. Here is where fuzzy logic comes into play. To deal with fuzzy probabilities what is needed is fuzzy logic. Standard probability theory is not designed to deal with perceptions - especially with perceptions described in a natural language. However, standard probability theory can be generalized through addition of concepts and techniques drawn from fuzzy logic. The generalized probability theory does have the capability to deal with perception-based information described in natural language”.

• Ashraf Darwish is with the Computer Science Division, Faculty of Science, Helwan University, Cairo, Egypt.

• Olga Poleshchuk is with the Department of Electronics and Computers, Moscow State Forest University, Moscow, Russian Federation.

As is well known, qualitative (non-numeric) characteristics are scored on different scales and are often incomparable in principle. The elements of these scales (ordinal scales, as a rule) are transformed into scores. As a result, the information is coarsened, and its valuable component, defined by the individual experience and knowledge of a person, is lost. Moreover, such a transformation needs substantiation because the stability of the final findings depends on it. Stable results are very important when determining students’ rating points.

Rating point systems are widely used in the educational process and are of great importance in quality control problems. These systems make it possible to obtain accessible and timely information in the form of an aggregate index at any stage of training and to use it for control decision making. However, merely adding up scores, for example on examinations, will not give a full picture of what students know [1]. As already mentioned, rating points based on traditional convolutions of separate characteristics may be unstable. The following example makes the point clear.

Let us suppose that two students got 4 and 3 points for one characteristic and 4 and 5 points for the other characteristic, respectively. After the two assessments each student has the same total score of 8, so the conclusion is made that they have the same rating points and, accordingly, the same rating. Since we deal with an ordinal scale when assessing students’ qualitative characteristics, we may apply an acceptable, strictly increasing transformation $\Phi$ of this scale: $\Phi(3) = 3$, $\Phi(4) = 4$, $\Phi(5) = 7$. It is known [2] that an acceptable transformation of the values of the assessed quality feature is one that retains the subject matter of the type of assessment involved. Under this transformation the total score of the first student remains 8, while that of the second student becomes 10 points. Thus the rating point of the second student has increased. The stability of the results under an acceptable transformation is violated, which shows that the transformation of the elements of verbal scales into scores needs substantiation. Since arithmetic operations on the elements of an ordinal scale are incorrect, a new model of rating assessment needs to be developed.
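The instability described in this example can be reproduced in a few lines. The following Python sketch is only an illustration: the dictionary names and the helper `total` are hypothetical, and the marks and the relabelling are the ones from the example above.

```python
# Marks of two students on two characteristics (ordinal scale).
scores = {"student_1": [4, 4], "student_2": [3, 5]}

# A strictly increasing relabelling of the scale: 3 -> 3, 4 -> 4, 5 -> 7.
phi = {3: 3, 4: 4, 5: 7}


def total(marks, transform=None):
    """Sum the marks, optionally relabelling each mark first."""
    if transform is None:
        return sum(marks)
    return sum(transform[m] for m in marks)


before = {name: total(marks) for name, marks in scores.items()}
after = {name: total(marks, phi) for name, marks in scores.items()}

print(before)  # {'student_1': 8, 'student_2': 8}
print(after)   # {'student_1': 8, 'student_2': 10}
```

The relabelling preserves the order of the levels, yet it breaks the tie between the two totals, which is exactly why sums of ordinal marks cannot be treated as stable rating points.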

The same problems arise when classical regression analysis is used to predict the values of non-numerical characteristics and to study the relationship between them. Fuzzy regression analysis [3-9] and hybrid fuzzy least-squares regression analysis [10-18], which combines the advantages of classical and fuzzy analysis, have become alternative approaches to solving these problems. No single theory will solve all the problems; each has its own natural area of application. Sometimes they compete with each other as methods for solving a particular problem but, as the authors’ long-term studies show, the key to solving many problems of the educational sphere, as one of the so-called humanistic systems, lies in the generalization of known mathematical theories through the addition of concepts and techniques drawn from fuzzy sets and fuzzy logic.

The modeling approach based on fuzzy set theory has made it possible to eliminate the shortcomings of traditional approaches to formalizing fuzzy information. The successful development of fuzzy set theory has ensured its recognition; however, it has also revealed issues that still need to be resolved. One of the most criticized parts of fuzzy set theory is the data formalization phase, that is, the construction of the corresponding membership functions. As a rule, the requirements for a formalization model are defined within the framework of each specific problem, and the quality of the resulting models often depends on the experience and skills of their developers. Apparently, a reason for this dependence is that formalization methods are limited both by the type of information and by the way the experts provide it.

2 PROBLEM FORMULATION

The initial task in processing the expert evaluation of students is to formalize the obtained information. The solution of this problem lies in creating expert evaluation models on a uniform universal set. Within fuzzy set theory, semantic spaces, which have a wide range of practical applications (expert systems, intelligent decision support systems, data analysis and complex process management), may serve as such models [2, 19].

As is well known, a semantic space is a linguistic variable with a fixed term-set [20].

A linguistic variable is a quintuple $\{X, T(X), U, V, S\}$, where

$X$ is the name of the variable;

$T(X) = \{X_i,\ i = \overline{1,m}\}$ is the term-set of the variable $X$, i.e. the set of terms, or names of linguistic values, of the variable $X$ (each of these values is a fuzzy variable with values from a universal set $U$); here and below, $i = \overline{1,m}$ means $i = 1, 2, \dots, m$;

$V$ is a syntactic rule that generates the names of the values of the linguistic variable $X$;

$S$ is a semantic rule that assigns to every fuzzy variable with a name from $T(X)$ a corresponding fuzzy subset of the universal set $U$.

Theoretical research into the properties of semantic spaces, aimed at improving the adequacy of expert evaluation models and their usefulness for practical tasks, has made it possible to formulate the following requirements on the membership functions $\mu_l(x)$, $l = \overline{1,m}$, of their term-sets $T(X) = \{X_l,\ l = \overline{1,m}\}$ [2]:

1. For every $X_l$, $l = \overline{1,m}$, the set $U_l = \{x \in U : \mu_l(x) = 1\}$ is non-empty ($U_l \neq \emptyset$) and is a point or an interval.

2. For $U_l = \{x \in U : \mu_l(x) = 1\}$, the function $\mu_l(x)$, $l = \overline{1,m}$, does not decrease to the left of $U_l$ and does not increase to the right of $U_l$.

3. Each $\mu_l(x)$, $l = \overline{1,m}$, has at most two points of discontinuity of the first kind.

4. For every $x \in U$, $\sum_{l=1}^{m} \mu_l(x) = 1$.

It is assumed that each term of a semantic space has at least one typical representative and that each point of the universal set is described by at least one term with a non-zero membership value. Only the membership functions of two adjacent terms can intersect, and they do so at the 0.5 level. The sum of all the membership functions at any fixed point of the universal set equals 1. This makes it possible to separate the notions used and to avoid semantically close terms or synonyms.


All these properties correspond to the way experts think, and for this reason semantic spaces were chosen for modeling. Theoretical and practical studies by several researchers [21-23] have shown that these models describe expert evaluations most adequately, and as a result they have often been included in more sophisticated models of intelligent systems for decision making and data analysis.

Semantic spaces whose membership functions meet the requirements listed above are called complete orthogonal semantic spaces (COSS) [19].

Thus, the first task of this paper is to construct a COSS based on the expert evaluation of students’ characteristics.

The second task is to develop a rating model for students’ evaluation.

The third task is to develop a fuzzy regression model to predict students’ characteristics and to study the relationship between these characteristics.

3 EXPERT EVALUATION MODEL

We choose $T$-numbers for modeling. A fuzzy number $\tilde A \equiv (a_1, a_2, a_L, a_R)$ is a $T$-number if its membership function is

$$
\mu_{\tilde A}(x) =
\begin{cases}
\dfrac{x - a_1 + a_L}{a_L}, & a_1 - a_L \leq x < a_1,\ a_L > 0; \\
1, & a_1 \leq x \leq a_2; \\
\dfrac{a_2 + a_R - x}{a_R}, & a_2 < x \leq a_2 + a_R,\ a_R > 0; \\
0, & x < a_1 - a_L \ \text{or}\ x > a_2 + a_R.
\end{cases}
$$

The normal triangular number $\tilde A \equiv (a_1, a_L, a_R)$ is a special case of a $T$-number with $a_1 = a_2$. A triangular number is called symmetric if $a_L = a_R$ and asymmetric if $a_L \neq a_R$. A $T$-number is called nonnegative if $a_1 - a_L \geq 0$.
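The piecewise definition of the membership function of a $T$-number can be written down directly. The Python sketch below is a minimal illustration; the class name `TNumber`, its method names and the sample values are assumptions made for this example and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TNumber:
    """T-number (a1, a2, aL, aR): plateau [a1, a2] with left/right wings aL, aR."""
    a1: float
    a2: float
    aL: float
    aR: float

    def membership(self, x: float) -> float:
        # Plateau: full membership.
        if self.a1 <= x <= self.a2:
            return 1.0
        # Left wing: rises linearly from 0 at a1 - aL to 1 at a1.
        if self.aL > 0 and self.a1 - self.aL <= x < self.a1:
            return (x - self.a1 + self.aL) / self.aL
        # Right wing: falls linearly from 1 at a2 to 0 at a2 + aR.
        if self.aR > 0 and self.a2 < x <= self.a2 + self.aR:
            return (self.a2 + self.aR - x) / self.aR
        return 0.0

    def is_triangular(self) -> bool:
        return self.a1 == self.a2

    def is_nonnegative(self) -> bool:
        return self.a1 - self.aL >= 0


t = TNumber(a1=0.3, a2=0.5, aL=0.2, aR=0.1)
for x in (0.2, 0.4, 0.55, 0.7):
    print(x, round(t.membership(x), 3))
```

A triangular number is obtained by passing the same value for `a1` and `a2`.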

Let us consider a group of $N$ students who are assessed for the characteristic $X$ on a verbal scale with the levels $X_l$, $l = \overline{1,m}$, $m \geq 2$, ordered by the intensity of manifestation of the characteristic $X$. The levels of the applied verbal scale uniquely specify the term-set $T(X) = \{X_1, X_2, \dots, X_m\}$. The interval $U = [0, 1]$ is selected as the universal set of the COSS. The point $x = 0$ corresponds to the total absence of the characteristic $X$ and is therefore considered a typical point of the term $X_1$; the point $x = 1$ corresponds to the total presence of the characteristic $X$ and is therefore considered a typical point of the term $X_m$.

We denote the membership functions of the fuzzy numbers $\tilde X_l$, $l = \overline{1,m}$, that correspond to the terms $X_l$, $l = \overline{1,m}$, by $\mu_l(x)$, $l = \overline{1,m}$, respectively. We denote the number of students assessed at the level $X_l$ by $n_l$, $l = \overline{1,m}$, and the ratio $n_l / N$ by $a_l$, $l = \overline{1,m}$, so that $\sum_{l=1}^{m} a_l = 1$. The membership functions of the fuzzy numbers $\tilde X_l$, $l = \overline{1,m}$, are constructed so that the areas of the figures bounded by their graphs equal $a_l$, $l = \overline{1,m}$, respectively. We denote $\min(a_l, a_{l+1})$ by $b_l$, $l = \overline{1,m-1}$. Then

$$\mu_1(x) \equiv \left(0,\ a_1 - \frac{b_1}{2},\ 0,\ b_1\right),$$

$$\mu_l(x) \equiv \left(\sum_{i=1}^{l-1} a_i + \frac{b_{l-1}}{2},\ \sum_{i=1}^{l} a_i - \frac{b_l}{2},\ b_{l-1},\ b_l\right), \quad l = \overline{2,m-1},$$

$$\mu_m(x) \equiv \left(1 - a_m + \frac{b_{m-1}}{2},\ 1,\ b_{m-1},\ 0\right).$$

The first two parameters in parentheses are the abscissas of the vertices of the upper base of the trapezoid that forms the graph of the corresponding membership function, while the last two parameters are the lengths of its left and right wings, respectively.
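The construction of the COSS from the observed frequencies can be sketched as follows. This Python fragment is an illustration under stated assumptions: the function names `build_coss`, `membership` and `area` and the sample counts are hypothetical, and the sketch assumes that every level of the scale was used at least once, so that all wings between adjacent terms have positive length.

```python
def build_coss(counts):
    """Build COSS terms (a1, a2, aL, aR) on [0, 1] from per-level student counts."""
    m = len(counts)
    if m < 2 or any(c <= 0 for c in counts):
        raise ValueError("need at least two levels, each used at least once")
    total = sum(counts)
    a = [c / total for c in counts]                    # a_l = n_l / N
    b = [min(a[l], a[l + 1]) for l in range(m - 1)]    # b_l = min(a_l, a_{l+1})
    terms = []
    cum = 0.0                                          # a_1 + ... + a_{l-1}
    for l in range(m):
        left_wing = b[l - 1] if l > 0 else 0.0
        right_wing = b[l] if l < m - 1 else 0.0
        a1 = cum + left_wing / 2
        # The last plateau ends exactly at 1, as in the formula for mu_m.
        a2 = 1.0 if l == m - 1 else cum + a[l] - right_wing / 2
        terms.append((a1, a2, left_wing, right_wing))
        cum += a[l]
    return terms


def membership(term, x):
    """Membership value of x in the T-number term = (a1, a2, aL, aR)."""
    a1, a2, aL, aR = term
    if a1 <= x <= a2:
        return 1.0
    if aL > 0 and a1 - aL <= x < a1:
        return (x - a1 + aL) / aL
    if aR > 0 and a2 < x <= a2 + aR:
        return (a2 + aR - x) / aR
    return 0.0


def area(term):
    """Area under the trapezoidal membership function."""
    a1, a2, aL, aR = term
    return (a2 - a1) + (aL + aR) / 2


# Hypothetical group of 20 students assessed on a four-level scale.
terms = build_coss([2, 6, 8, 4])
for term in terms:
    print([round(v, 3) for v in term], "area:", round(area(term), 3))
```

With these formulas the area under each term equals the share of students at that level, and the membership values of all terms add up to 1 at every point of $[0, 1]$, which is the orthogonality requirement of a COSS.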

Thus, an expert evaluation model of students’ characteristics has been developed in the form of a COSS, i.e. a model for formalizing the elements of verbal scales, which transforms them not into scores but into fuzzy numbers defined on a uniform universal set. Such formalization makes it possible to present dissimilar data in a common abstract form and to operate on them correctly through their membership functions. The developed model makes it possible to compare different groups of students, for example their progress and their psychophysiological and personal characteristics. It also makes it possible to compare the criteria of different experts who assess one and the same group of students.

4 DETERMINATION OF STUDENTS’ RATING POINTS AND QUALIFICATION LEVELS

As already mentioned, rating point systems are widely used in the educational process and are of great importance in quality control problems [22-33]. The expert evaluation model of students’ characteristics based on fuzzy sets was developed in Section 3. Presenting separate characteristics as fuzzy sets defined on a uniform universal set and operating correctly on their membership functions provides adequate and stable rating points. The developed expert evaluation model is used to determine students’ rating points and qualification levels, taking into account their academic progress as well as their psychophysiological and personal characteristics.

Let us consider a group of $N$ students who are evaluated by the qualitative characteristics $X_j$, $j = \overline{1,k}$. Let $X_{lj}$, $l = \overline{1,m_j}$, be the levels of the verbal scales used to evaluate these characteristics. We create $k$ COSS with the names $X_j$, $j = \overline{1,k}$, and the term-sets $X_{lj}$, $l = \overline{1,m_j}$, $j = \overline{1,k}$. We denote the membership function of the fuzzy number $\tilde X_{lj}$ that corresponds to the $l$-th term of the $j$-th COSS by $\mu_{lj}(x)$, $l = \overline{1,m_j}$, $j = \overline{1,k}$. We refer to the fuzzy numbers $\tilde X_{lj}$, $l = \overline{1,m_j}$, $j = \overline{1,k}$, or to their membership functions $\mu_{lj}(x)$, as students’ points.

We denote the point of the $n$-th student for the characteristic $X_j$ by $\tilde X_j^n$ or by

$$\mu_j^n(x) \equiv \left(a_1^{nj},\ a_2^{nj},\ a_L^{nj},\ a_R^{nj}\right), \quad n = \overline{1,N},\ j = \overline{1,k}.$$

The fuzzy number $\tilde X_j^n$ with the membership function $\mu_j^n(x)$ is equal to one of the fuzzy numbers $\tilde X_{lj}$, $l = \overline{1,m_j}$, $j = \overline{1,k}$. We denote the weight coefficients of the evaluated characteristics by $\omega_j$, $j = \overline{1,k}$, $\sum_{j=1}^{k} \omega_j = 1$.

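The fuzzy rating point of the $n$-th student, $n = \overline{1,N}$, within the characteristics $X_j$, $j = \overline{1,k}$, is defined as the fuzzy number

$$\tilde A_n = \omega_1 \otimes \tilde X_1^n \oplus \dots \oplus \omega_k \otimes \tilde X_k^n$$

with the membership function

$$\mu_n(x) \equiv \left(\sum_{j=1}^{k} \omega_j a_1^{nj},\ \sum_{j=1}^{k} \omega_j a_2^{nj},\ \sum_{j=1}^{k} \omega_j a_L^{nj},\ \sum_{j=1}^{k} \omega_j a_R^{nj}\right), \quad n = \overline{1,N}.$$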

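Let us determine a confidence interval for the crisp rating point $x_n$ that characterizes the manifestation of the characteristics $X_j$, $j = \overline{1,k}$, for the $n$-th student, $n = \overline{1,N}$. At the confidence level $\mu_n(x_n) \geq \alpha$, $0 < \alpha < 1$, we get:

$$\sum_{j=1}^{k} \omega_j a_1^{nj} - (1 - \alpha) \sum_{j=1}^{k} \omega_j a_L^{nj} \ \leq\ x_n \ \leq\ \sum_{j=1}^{k} \omega_j a_2^{nj} + (1 - \alpha) \sum_{j=1}^{k} \omega_j a_R^{nj}.$$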

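We defuzzify the fuzzy numbers $\tilde A_n$, $n = \overline{1,N}$,

$$\tilde B_1 = \omega_1 \otimes \tilde X_{11} \oplus \dots \oplus \omega_k \otimes \tilde X_{1k}, \qquad \tilde B_m = \omega_1 \otimes \tilde X_{m_1 1} \oplus \dots \oplus \omega_k \otimes \tilde X_{m_k k}$$

by the center of gravity method. The obtained crisp numbers are denoted by $A_n$, $n = \overline{1,N}$, $B_1$, $B_m$.

The number $A_n$ is called the rating point of the $n$-th student, $n = \overline{1,N}$, within the characteristics $X_j$, $j = \overline{1,k}$.

We find the normed rating point of the $n$-th student with the following formula:

$$E_n = \frac{A_n - B_1}{B_m - B_1}, \quad n = \overline{1,N}.$$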
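The chain from students’ points to a normed rating point can be sketched in Python. This is a minimal illustration, not the authors’ implementation: the function names, the two sample COSS and the weights are assumptions, and the closed-form centroid of a trapezoid is used here as the center of gravity of a $T$-number.

```python
def weighted_sum(weights, numbers):
    """Crisp-weighted sum of T-numbers (a1, a2, aL, aR), taken component-wise."""
    return tuple(sum(w * t[c] for w, t in zip(weights, numbers)) for c in range(4))


def confidence_interval(t, alpha):
    """Interval of crisp values whose membership in t is at least alpha."""
    a1, a2, aL, aR = t
    return (a1 - (1 - alpha) * aL, a2 + (1 - alpha) * aR)


def centroid(t):
    """Center of gravity of a trapezoidal T-number."""
    a1, a2, aL, aR = t
    area = (a2 - a1) + (aL + aR) / 2
    if area == 0:
        return a1  # degenerate (crisp) number
    moment = ((aL / 2) * (a1 - aL / 3)          # left triangle
              + (a2 - a1) * (a1 + a2) / 2       # central rectangle
              + (aR / 2) * (a2 + aR / 3))       # right triangle
    return moment / area


def normed_rating(weights, student_points, coss_list):
    """E_n = (A_n - B_1) / (B_m - B_1) with centroid defuzzification."""
    a_n = centroid(weighted_sum(weights, student_points))
    b_1 = centroid(weighted_sum(weights, [coss[0] for coss in coss_list]))
    b_m = centroid(weighted_sum(weights, [coss[-1] for coss in coss_list]))
    return (a_n - b_1) / (b_m - b_1)


# Two hypothetical COSS (one per characteristic) and hypothetical weights.
coss_a = [(0.0, 0.05, 0.0, 0.1), (0.15, 0.25, 0.1, 0.3),
          (0.55, 0.7, 0.3, 0.2), (0.9, 1.0, 0.2, 0.0)]
coss_b = [(0.0, 0.2, 0.0, 0.2), (0.4, 0.6, 0.2, 0.2), (0.8, 1.0, 0.2, 0.0)]
weights = [0.6, 0.4]

# A student rated at the third level of A and the second level of B.
points = [coss_a[2], coss_b[1]]
fuzzy_rating = weighted_sum(weights, points)
print("fuzzy rating point:", [round(v, 3) for v in fuzzy_rating])
print("0.5-level interval:", [round(v, 3) for v in confidence_interval(fuzzy_rating, 0.5)])
print("normed rating:", round(normed_rating(weights, points, [coss_a, coss_b]), 3))
```

By construction a student with the lowest level on every characteristic gets a normed rating of 0, and a student with the highest level on every characteristic gets 1.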

Let us assume that it is necessary to assign one of the accepted qualification levels $D_l$, $l = \overline{1,m}$, to each student. The levels are arranged in ascending order of the intensity of manifestation of the relevant characteristic. Let us create a COSS with the terms $D_l$, $l = \overline{1,m}$, using the model described in Section 3. We denote the membership function of the fuzzy number $\tilde D_l$ that corresponds to the term $D_l$ by $\eta_l(x)$, $l = \overline{1,m}$. To assign one of the qualification levels $D_l$, $l = \overline{1,m}$, to the $n$-th student, it is necessary to identify the fuzzy number $\tilde A_n$ with the membership function $\mu_n(x)$ with one of the fuzzy numbers $\tilde D_l$ with the membership functions $\eta_l(x)$. For this purpose we calculate the identification indexes

$$\beta_n^l = \frac{\int_0^1 \min\left(\mu_n(x), \eta_l(x)\right) dx}{\int_0^1 \max\left(\mu_n(x), \eta_l(x)\right) dx}, \quad l = \overline{1,m},\ n = \overline{1,N}.$$

If $\beta_n^j = \max_l \beta_n^l$, then $\mathrm{Pos}\left(\tilde A_n = \tilde D_j\right)$ is calculated [33].

If $\mathrm{Pos}\left(\tilde A_n = \tilde D_j\right) = \gamma$, then the $n$-th student is assigned the qualification level $D_j$ with possibility $\gamma$.
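The identification step can be approximated numerically. The Python sketch below is illustrative only: the function names, the sample levels and the sample rating point are hypothetical, the integrals are approximated by the midpoint rule on $[0, 1]$, and the possibility $\mathrm{Pos}$ is computed as the largest value of $\min(\mu_n(x), \eta_j(x))$ on the grid, which is a common definition assumed here because the paper refers to [33] for it.

```python
def membership(t, x):
    """Membership value of x in the T-number t = (a1, a2, aL, aR)."""
    a1, a2, aL, aR = t
    if a1 <= x <= a2:
        return 1.0
    if aL > 0 and a1 - aL <= x < a1:
        return (x - a1 + aL) / aL
    if aR > 0 and a2 < x <= a2 + aR:
        return (a2 + aR - x) / aR
    return 0.0


def identification_index(a, d, steps=2000):
    """Ratio of the integrals of min and max of two membership functions on [0, 1]."""
    num = den = 0.0
    for i in range(steps):
        x = (i + 0.5) / steps            # midpoint of the i-th subinterval
        mu, eta = membership(a, x), membership(d, x)
        num += min(mu, eta)
        den += max(mu, eta)
    return num / den if den > 0 else 0.0  # the step width cancels in the ratio


def possibility(a, d, steps=2000):
    """Grid approximation of sup_x min(mu_a(x), mu_d(x)) (assumed definition of Pos)."""
    return max(min(membership(a, (i + 0.5) / steps), membership(d, (i + 0.5) / steps))
               for i in range(steps))


# Hypothetical qualification levels D_1..D_3 and one fuzzy rating point.
levels = [(0.0, 0.2, 0.0, 0.2), (0.4, 0.6, 0.2, 0.2), (0.8, 1.0, 0.2, 0.0)]
rating = (0.35, 0.55, 0.15, 0.2)

indexes = [identification_index(rating, d) for d in levels]
best = max(range(len(levels)), key=lambda l: indexes[l])
print("identification indexes:", [round(v, 3) for v in indexes])
print("assigned level:", best + 1, "with possibility", possibility(rating, levels[best]))
```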

5 CLUSTERIZATION OF STUDENTS BASED ON EXPERT OPINIONS


The obtained fuzzy rating points are proposed for the clusterization of students based on expert opinions regarding the importance of certain characteristics for the corresponding cluster. The following can serve as an example of such an opinion: «For students belonging to the $i$-th cluster, characteristics from the first group are not very important, characteristics from the second group are rather important, ..., and characteristics from the $v$-th group are very important», $i = \overline{1,r}$. For example, these expert opinions may be employers’ requirements regarding the importance of students’ characteristics for their future successful professional activity. To formalize the linguistic terms «not important at all», «rather unimportant», «not very important», «rather important», «important», «very important», fuzzy numbers $\tilde C_1, \dots, \tilde C_6$ with the following membership functions may be used:

$$\mu_1(x) \equiv (0,\ 0,\ 0.2), \quad \mu_2(x) \equiv (0.2,\ 0.2,\ 0.2), \quad \mu_3(x) \equiv (0.4,\ 0.2,\ 0.2),$$

$$\mu_4(x) \equiv (0.6,\ 0.2,\ 0.2), \quad \mu_5(x) \equiv (0.8,\ 0.2,\ 0.2), \quad \mu_6(x) \equiv (1,\ 0.2,\ 0).$$

The corresponding fuzzy rating points of the $n$-th student for the first, the second, ..., the $v$-th group of characteristics are denoted by $\tilde A_n^1, \dots, \tilde A_n^v$. Then, according to the expert opinions, the fuzzy number $\tilde R_n^i$ with the membership function $\mu_n^i(x)$ is the fuzzy rating point of the $n$-th student within the $i$-th cluster:

$$\tilde R_n^i \equiv \tilde C^1 \otimes \tilde A_n^1 \oplus \tilde C^2 \otimes \tilde A_n^2 \oplus \dots \oplus \tilde C^v \otimes \tilde A_n^v, \quad n = \overline{1,N},\ i = \overline{1,r}.$$

Each fuzzy number $\tilde C^j$, $j = \overline{1,v}$, is equal to one of the fuzzy numbers $\tilde C_1, \dots, \tilde C_6$.

The rating points for the other clusters are obtained for all the students in a similar way, in accordance with the expert opinions. The obtained results are compared on the basis of $\tilde R_n^i$, $n = \overline{1,N}$, $i = \overline{1,r}$. For this purpose fuzzy sets $I_i$, $i = \overline{1,r}$, are defined on the index set $\{1, 2, \dots, N\}$. The values of the membership functions of these sets, $\mu_i(n)$, $n = \overline{1,N}$, $i = \overline{1,r}$, are interpreted as the degree to which the $n$-th student belongs to the $i$-th cluster.

If $\sup_n \{x : \mu_n^i(x) = 1\}$, $n = \overline{1,N}$, belongs to $\tilde R_k^i$, then the $k$-th student is considered a typical representative of the $i$-th cluster and $\mu_i(k) = 1$. The values $\mu_i(n)$, $n = \overline{1,N}$, with $n \neq k$ are calculated in the following way:

$$\mu_i(n) = \max_x \min\left(\mu_n^i(x),\ \mu_k^i(x)\right).$$

If there are several typical representatives of the $i$-th cluster (for example, the students $k_1, k_2, \dots, k_p$), then the values $\mu_i^l(n)$, $l = \overline{1,p}$, $i = \overline{1,r}$, $n = \overline{1,N}$, $n \neq k_l$, are calculated in the following way:

$$\mu_i^l(n) = \max_x \min\left(\mu_n^i(x),\ \mu_{k_l}^i(x)\right).$$

Finally,

$$\mu_i(n) = \max_l \mu_i^l(n), \quad n = \overline{1,N},\ n \neq k_l,\ l = \overline{1,p}.$$
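The degrees of belonging to a cluster can be sketched as follows. This Python fragment rests on two loudly stated assumptions that are not part of the paper: the cluster rating points are approximated by $T$-numbers $(a_1, a_2, a_L, a_R)$, and a typical representative is read as a student whose plateau reaches the largest value $a_2$ in the group. Under the first assumption the max-min of two membership functions has a closed form, the height of the intersection of the facing wings. All names and sample values are hypothetical.

```python
def max_min(a, b):
    """sup_x min(mu_a(x), mu_b(x)) for two T-numbers (a1, a2, aL, aR)."""
    if a[0] > b[0]:
        a, b = b, a                   # make `a` the number that starts further left
    gap = b[0] - a[1]                 # distance between the two plateaus
    if gap <= 0:
        return 1.0                    # plateaus overlap
    spread = a[3] + b[2]              # right wing of a plus left wing of b
    if spread <= 0:
        return 0.0
    return max(0.0, 1.0 - gap / spread)


def cluster_memberships(points):
    """Return the typical representatives and the degree of belonging of every student."""
    top = max(p[1] for p in points)
    typical = [k for k, p in enumerate(points) if p[1] == top]
    degrees = []
    for n, p in enumerate(points):
        if n in typical:
            degrees.append(1.0)
        else:
            # Several typical representatives: take the best match.
            degrees.append(max(max_min(p, points[k]) for k in typical))
    return typical, degrees


# Hypothetical cluster rating points of three students.
points = [(0.2, 0.3, 0.1, 0.1), (0.5, 0.6, 0.2, 0.1), (0.7, 0.8, 0.2, 0.1)]
typical, degrees = cluster_memberships(points)
print("typical representatives:", typical)
print("degrees of belonging:", [round(d, 3) for d in degrees])
```

If the rating points are kept as exact products of fuzzy numbers rather than trapezoidal approximations, the closed form no longer applies and the max-min would have to be evaluated numerically.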

6 PREDICTION OF STUDENTS’ CHARACTERISTICS

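Let

$$\tilde Y = \begin{pmatrix} \tilde Y_1 \\ \vdots \\ \tilde Y_n \end{pmatrix}, \quad \tilde Y_i \equiv \left(y_1^i,\ y_2^i,\ y_L^i,\ y_R^i\right), \quad y_1^i - y_L^i \geq 0, \quad i = \overline{1,n},$$

be output $T$-numbers, and

$$\tilde X_j = \begin{pmatrix} \tilde X_j^1 \\ \vdots \\ \tilde X_j^n \end{pmatrix}, \quad \tilde X_j^i \equiv \left(x_1^{ji},\ x_2^{ji},\ x_L^{ji},\ x_R^{ji}\right), \quad x_1^{ji} - x_L^{ji} \geq 0, \quad j = \overline{1,m},\ i = \overline{1,n},$$

be input $T$-numbers.

The relation between the input and output data is sought in the form

$$\tilde Y = \tilde a_0 + \tilde a_1 \tilde X_1 + \dots + \tilde a_m \tilde X_m,$$

where $\tilde a_j \equiv \left(b^j,\ b_L^j,\ b_R^j\right)$, $j = \overline{0,m}$, are unknown coefficients of the regression model.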

Fuzzy numbers corresponding to the terms of the expert evaluation models developed in Section 3 are considered as the initial information of the regression model. Since the universal set of these models is the interval [0, 1], the fuzzy numbers corresponding to their terms are nonnegative.

A method for the defuzzification of fuzzy numbers based on weighted intervals was developed in [35]. As is well known, defuzzification methods convert a fuzzy number into a crisp number, but very often two different fuzzy numbers are converted into the same crisp number. While this may not be a problem in a number of practical tasks, in decision-making and some other problems it is necessary to find aggregate indexes that retain as much information as possible about the bounds of the input fuzzy numbers.

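As is well known, an $\alpha$-level set of $\tilde A \equiv (a_1, a_2, a_L, a_R)$ is defined as the ordinary set $A_\alpha$ such that

$$A_\alpha = \left\{x \in R : \mu_{\tilde A}(x) \geq \alpha\right\} = \left[A_\alpha^1,\ A_\alpha^2\right] = \left[a_1 - (1 - \alpha) a_L,\ a_2 + (1 - \alpha) a_R\right], \quad \alpha \in [0, 1].$$

Let us consider two triangular numbers $\tilde B_1 \equiv (a_1, a_L, 0)$ and $\tilde B_2 \equiv (a_2, 0, a_R)$ that belong to the number $\tilde A \equiv (a_1, a_2, a_L, a_R)$. The $\alpha$-level sets of $\tilde B_1$, $\tilde B_2$ are denoted by $\left[B_\alpha^1,\ a_1\right]$ and $\left[a_2,\ B_\alpha^2\right]$, respectively, where $B_\alpha^1 = a_1 - (1 - \alpha) a_L$ and $B_\alpha^2 = a_2 + (1 - \alpha) a_R$.

Then a weighted interval $[A_1, A_2]$ for the $T$-number $\tilde A \equiv (a_1, a_2, a_L, a_R)$ can be obtained as follows:

$$A_1 = \int_0^1 \left(B_\alpha^1 + a_1\right) \alpha \, d\alpha = \int_0^1 \left(2 a_1 - (1 - \alpha) a_L\right) \alpha \, d\alpha = a_1 - \frac{1}{6} a_L,$$

$$A_2 = \int_0^1 \left(a_2 + B_\alpha^2\right) \alpha \, d\alpha = \int_0^1 \left(2 a_2 + (1 - \alpha) a_R\right) \alpha \, d\alpha = a_2 + \frac{1}{6} a_R.$$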

The defuzzification method based on weighted intervals is suggested for situations where more information about fuzzy numbers must be retained than aggregate crisp point indexes contain, and where there is no requirement to obtain only a single aggregate number. It is quite obvious that it is easier to operate with aggregate indexes in regression models than with the fuzzy numbers themselves. Other defuzzification methods and their discussion can be found in [36-39]. The developed method can be used in regression analysis, decision-making problems and many other tasks.

Proposition 1. The weighted interval of a sum of $T$-numbers is equal to the sum of the weighted intervals of these numbers.

Proof. Let us prove that the sum of the $T$-numbers $\tilde A \equiv (a_1, a_2, a_L, a_R)$ and $\tilde B \equiv (b_1, b_2, b_L, b_R)$ with the weighted intervals $[A_1, A_2]$ and $[B_1, B_2]$, respectively, has the weighted interval $[A_1 + B_1,\ A_2 + B_2]$. Let us denote the weighted interval of $\tilde A + \tilde B$ by $[C_1, C_2]$. Then

$$C_1 = \int_0^1 \left[2 (a_1 + b_1) - (1 - \alpha) a_L - (1 - \alpha) b_L\right] \alpha \, d\alpha = a_1 + b_1 - \frac{1}{6} a_L - \frac{1}{6} b_L = A_1 + B_1;$$

$$C_2 = \int_0^1 \left[2 (a_2 + b_2) + (1 - \alpha) a_R + (1 - \alpha) b_R\right] \alpha \, d\alpha = a_2 + b_2 + \frac{1}{6} a_R + \frac{1}{6} b_R = A_2 + B_2.$$

Thus $[C_1, C_2] = [A_1 + B_1,\ A_2 + B_2]$. Proposition 1 is proved.
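A quick numerical illustration of Proposition 1, written as a Python sketch: the function names and the sample numbers are hypothetical, and the sum of two $T$-numbers is taken component-wise, as in the proof.

```python
def weighted_interval(t):
    """Weighted interval of the T-number t = (a1, a2, aL, aR)."""
    a1, a2, aL, aR = t
    return (a1 - aL / 6, a2 + aR / 6)


def add(a, b):
    """Sum of two T-numbers: plateau ends and wing lengths add up."""
    return tuple(p + q for p, q in zip(a, b))


A = (0.3, 0.5, 0.2, 0.1)
B = (0.1, 0.4, 0.1, 0.3)

interval_of_sum = weighted_interval(add(A, B))
sum_of_intervals = tuple(p + q for p, q in zip(weighted_interval(A), weighted_interval(B)))
print(interval_of_sum, sum_of_intervals)
```

Up to floating-point rounding the two printed intervals coincide.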

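Let us consider a nonnegative $T$-number $\tilde X \equiv (x_1, x_2, x_L, x_R)$, $x_1 - x_L \geq 0$, and a triangular number $\tilde a \equiv (b, b_L, b_R)$.

Proposition 2. The boundaries of the weighted interval $\left[\theta^1_{\tilde a \tilde X},\ \theta^2_{\tilde a \tilde X}\right]$ of the fuzzy number $\tilde a \tilde X$, which is the product of the fuzzy numbers $\tilde a$ and $\tilde X$, have the form

$$\theta^1_{\tilde a \tilde X} = b \left[x_q + (-1)^q \frac{1}{6} x_{M_q}\right] - b_L \left[\frac{1}{6} x_q + (-1)^q \frac{1}{12} x_{M_q}\right];$$

$$\theta^2_{\tilde a \tilde X} = b \left[x_r + (-1)^r \frac{1}{6} x_{M_r}\right] + b_R \left[\frac{1}{6} x_r + (-1)^r \frac{1}{12} x_{M_r}\right];$$

$$q = \begin{cases} 1, & b - b_L \geq 0; \\ 2, & b + b_R < 0; \end{cases} \qquad M_q = \begin{cases} L, & q = 1; \\ R, & q = 2; \end{cases}$$

$$r = \begin{cases} 2, & b - b_L \geq 0; \\ 1, & b + b_R < 0; \end{cases} \qquad M_r = \begin{cases} L, & r = 1; \\ R, & r = 2. \end{cases}$$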

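Proof. Let us write out the $\alpha$-level set of $\tilde X$,

$$X_\alpha = \left[X_\alpha^1,\ X_\alpha^2\right] = \left[x_1 - (1 - \alpha) x_L,\ x_2 + (1 - \alpha) x_R\right],$$

and the $\alpha$-level set of $\tilde a$,

$$a_\alpha = \left[a_\alpha^1,\ a_\alpha^2\right] = \left[b - (1 - \alpha) b_L,\ b + (1 - \alpha) b_R\right].$$

If $\tilde a \equiv (b, b_L, b_R)$ is a nonnegative fuzzy number ($b - b_L \geq 0$), then according to the multiplication of fuzzy numbers [33] the $\alpha$-level set of $\tilde a \tilde X$ has the form $\left[A_\alpha^1,\ A_\alpha^2\right]$, where

$$A_\alpha^1 = b x_1 - (1 - \alpha) b x_L - (1 - \alpha) b_L x_1 + (1 - \alpha)^2 b_L x_L;$$

$$A_\alpha^2 = b x_2 + (1 - \alpha) b x_R + (1 - \alpha) b_R x_2 + (1 - \alpha)^2 b_R x_R.$$

Then

$$\theta^1_{\tilde a \tilde X} = \int_0^1 \left(b x_1 + A_\alpha^1\right) \alpha \, d\alpha = b x_1 - \frac{1}{6} b x_L - \frac{1}{6} b_L x_1 + \frac{1}{12} b_L x_L = b \left(x_1 - \frac{1}{6} x_L\right) - b_L \left(\frac{1}{6} x_1 - \frac{1}{12} x_L\right);$$

$$\theta^2_{\tilde a \tilde X} = \int_0^1 \left(b x_2 + A_\alpha^2\right) \alpha \, d\alpha = b x_2 + \frac{1}{6} b x_R + \frac{1}{6} b_R x_2 + \frac{1}{12} b_R x_R = b \left(x_2 + \frac{1}{6} x_R\right) + b_R \left(\frac{1}{6} x_2 + \frac{1}{12} x_R\right).$$

If $\tilde a \equiv (b, b_L, b_R)$ is a negative fuzzy number ($b + b_R < 0$), then according to the multiplication of fuzzy numbers the $\alpha$-level set of $\tilde a \tilde X$ has the form $\left[B_\alpha^1,\ B_\alpha^2\right]$, where

$$B_\alpha^1 = b x_2 + (1 - \alpha) b x_R - (1 - \alpha) b_L x_2 - (1 - \alpha)^2 b_L x_R;$$

$$B_\alpha^2 = b x_1 - (1 - \alpha) b x_L + (1 - \alpha) b_R x_1 - (1 - \alpha)^2 b_R x_L.$$

Then

$$\theta^1_{\tilde a \tilde X} = \int_0^1 \left(b x_2 + B_\alpha^1\right) \alpha \, d\alpha = b x_2 + \frac{1}{6} b x_R - \frac{1}{6} b_L x_2 - \frac{1}{12} b_L x_R = b \left(x_2 + \frac{1}{6} x_R\right) - b_L \left(\frac{1}{6} x_2 + \frac{1}{12} x_R\right);$$

$$\theta^2_{\tilde a \tilde X} = \int_0^1 \left(b x_1 + B_\alpha^2\right) \alpha \, d\alpha = b x_1 - \frac{1}{6} b x_L + \frac{1}{6} b_R x_1 - \frac{1}{12} b_R x_L = b \left(x_1 - \frac{1}{6} x_L\right) + b_R \left(\frac{1}{6} x_1 - \frac{1}{12} x_L\right),$$

or, in the unified form,

$$\theta^1_{\tilde a \tilde X} = b \left[x_q + (-1)^q \frac{1}{6} x_{M_q}\right] - b_L \left[\frac{1}{6} x_q + (-1)^q \frac{1}{12} x_{M_q}\right];$$

$$\theta^2_{\tilde a \tilde X} = b \left[x_r + (-1)^r \frac{1}{6} x_{M_r}\right] + b_R \left[\frac{1}{6} x_r + (-1)^r \frac{1}{12} x_{M_r}\right];$$

$$q = \begin{cases} 1, & b - b_L \geq 0; \\ 2, & b + b_R < 0; \end{cases} \qquad M_q = \begin{cases} L, & q = 1; \\ R, & q = 2; \end{cases}$$

$$r = \begin{cases} 2, & b - b_L \geq 0; \\ 1, & b + b_R < 0; \end{cases} \qquad M_r = \begin{cases} L, & r = 1; \\ R, & r = 2. \end{cases}$$

Proposition 2 is proved.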
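The closed-form bounds of Proposition 2 can be compared with a direct numerical integration over the $\alpha$-levels of the product. The following Python sketch is an illustration only: the function names and the sample numbers are hypothetical, and, like the proposition itself, it covers only coefficients that are entirely nonnegative or entirely negative.

```python
def theta(coef, x):
    """Closed-form weighted interval of the product of coef = (b, bL, bR)
    and a nonnegative T-number x = (x1, x2, xL, xR)."""
    b, bL, bR = coef
    x1, x2, xL, xR = x
    if b - bL >= 0:        # nonnegative coefficient
        lo = b * (x1 - xL / 6) - bL * (x1 / 6 - xL / 12)
        hi = b * (x2 + xR / 6) + bR * (x2 / 6 + xR / 12)
    elif b + bR < 0:       # negative coefficient
        lo = b * (x2 + xR / 6) - bL * (x2 / 6 + xR / 12)
        hi = b * (x1 - xL / 6) + bR * (x1 / 6 - xL / 12)
    else:
        raise ValueError("coefficient support contains zero: case not covered")
    return lo, hi


def theta_numeric(coef, x, steps=5000):
    """The same bounds from interval multiplication of the alpha-level sets."""
    b, bL, bR = coef
    x1, x2, xL, xR = x
    h = 1.0 / steps
    lo = hi = 0.0
    for i in range(steps):
        alpha = (i + 0.5) * h
        a_lo, a_hi = b - (1 - alpha) * bL, b + (1 - alpha) * bR
        x_lo, x_hi = x1 - (1 - alpha) * xL, x2 + (1 - alpha) * xR
        products = [a_lo * x_lo, a_lo * x_hi, a_hi * x_lo, a_hi * x_hi]
        plateau = [b * x1, b * x2]           # ends of the level set at alpha = 1
        lo += (min(plateau) + min(products)) * alpha * h
        hi += (max(plateau) + max(products)) * alpha * h
    return lo, hi


x = (0.4, 0.6, 0.2, 0.1)
for coef in [(0.8, 0.3, 0.2), (-0.8, 0.2, 0.3)]:
    print(coef, theta(coef, x), theta_numeric(coef, x))
```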

Let us define an affinity measure for two $T$-numbers $\tilde A$, $\tilde B$ with the weighted intervals $[A_1, A_2]$, $[B_1, B_2]$:

$$f\left(\tilde A, \tilde B\right) = \sqrt{(A_1 - B_1)^2 + (A_2 - B_2)^2}.$$

Let us determine the weighted intervals

$$\left[y^i_1-\frac{1}{6}y^i_L,\; y^i_2+\frac{1}{6}y^i_R\right],\quad i=\overline{1,n},$$

for the observable output data $\tilde Y_i$, using Proposition 1.

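Let us designate the weighted intervals of the fuzzy numbers $\tilde a_j\tilde X^i_j$, $j=\overline{1,m}$, $i=\overline{1,n}$, by

$$\left[\theta^1_{\tilde a_j\tilde X^i_j}\left(b^j,b^j_L,b^j_R\right),\;\theta^2_{\tilde a_j\tilde X^i_j}\left(b^j,b^j_L,b^j_R\right)\right].$$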

According to Proposition 2,

$$\theta^1_{\tilde a_j\tilde X^i_j}\left(b^j,b^j_L,b^j_R\right)=b^j\left(x^{ji}_1-\frac{1}{6}x^{ji}_L\right)-b^j_L\left(\frac{1}{6}x^{ji}_1-\frac{1}{12}x^{ji}_L\right),$$


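$$\theta^2_{\tilde a_j\tilde X^i_j}\left(b^j,b^j_L,b^j_R\right)=b^j\left(x^{ji}_2+\frac{1}{6}x^{ji}_R\right)+b^j_R\left(\frac{1}{6}x^{ji}_2+\frac{1}{12}x^{ji}_R\right),$$

if $\tilde a_j\equiv\left(b^j,b^j_L,b^j_R\right)$, $j=\overline{1,m}$, are nonnegative fuzzy numbers ($b^j-b^j_L\ge 0$), and

$$\theta^1_{\tilde a_j\tilde X^i_j}\left(b^j,b^j_L,b^j_R\right)=b^j\left(x^{ji}_2+\frac{1}{6}x^{ji}_R\right)-b^j_L\left(\frac{1}{6}x^{ji}_2+\frac{1}{12}x^{ji}_R\right);$$

$$\theta^2_{\tilde a_j\tilde X^i_j}\left(b^j,b^j_L,b^j_R\right)=b^j\left(x^{ji}_1-\frac{1}{6}x^{ji}_L\right)+b^j_R\left(\frac{1}{6}x^{ji}_1-\frac{1}{12}x^{ji}_L\right),$$

if $\tilde a_j\equiv\left(b^j,b^j_L,b^j_R\right)$, $j=\overline{1,m}$, are negative fuzzy numbers ($b^j+b^j_R<0$).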

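Let us determine the weighted intervals

$$\left[b^0-\frac{1}{6}b^0_L+\sum_{j=1}^{m}\theta^1_{\tilde a_j\tilde X^i_j}\left(b^j,b^j_L,b^j_R\right),\; b^0+\frac{1}{6}b^0_R+\sum_{j=1}^{m}\theta^2_{\tilde a_j\tilde X^i_j}\left(b^j,b^j_L,b^j_R\right)\right],\quad i=\overline{1,n},$$

for the model output data $Y_i=\tilde a_0+\tilde a_1\tilde X^i_1+\dots+\tilde a_m\tilde X^i_m$, using Propositions 1 and 2.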

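Let us consider the functional

$$F=\sum_{i=1}^{n}f^2\left(Y_i,\tilde Y_i\right),$$

which characterizes the affinity between the initial and the model output data. It is easy to demonstrate that

$$F=\sum_{i=1}^{n}\left[b^0-\frac{1}{6}b^0_L-y^i_1+\frac{1}{6}y^i_L+\sum_{j=1}^{m}\theta^1_{\tilde a_j\tilde X^i_j}\left(b^j,b^j_L,b^j_R\right)\right]^2+\sum_{i=1}^{n}\left[b^0+\frac{1}{6}b^0_R-y^i_2-\frac{1}{6}y^i_R+\sum_{j=1}^{m}\theta^2_{\tilde a_j\tilde X^i_j}\left(b^j,b^j_L,b^j_R\right)\right]^2.$$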

The optimization problem is set as follows:

$$F\left(b^j,b^j_L,b^j_R\right)=\sum_{i=1}^{n}f^2\left(Y_i,\tilde Y_i\right)\to\min;$$

$$b^j_L\ge 0,\quad b^j_R\ge 0,\quad j=\overline{0,m}.$$

Since $\theta^1_{\tilde a_j\tilde X^i_j}\left(b^j,b^j_L,b^j_R\right)$ and $\theta^2_{\tilde a_j\tilde X^i_j}\left(b^j,b^j_L,b^j_R\right)$ are piecewise linear functions in the domain $b^j_L\ge 0$, $b^j_R\ge 0$, $j=\overline{0,m}$, $F$ is a piecewise differentiable function, and solutions of the optimization problem are found by means of known methods [40].
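To make the functional $F$ concrete, the following Python sketch evaluates it for given coefficients. It is a minimal illustration under stated assumptions, not the authors' code: the names `theta` and `functional` and the tuple layouts are hypothetical, all inputs are assumed to be nonnegative T-numbers, and the constant 1/6 is the one used in the weighted intervals above.

```python
def theta(coef, x):
    """Weighted interval of a~_j * X~_j^i (Proposition 2).

    coef = (b, b_l, b_r); x = (x1, x2, x_l, x_r), assumed nonnegative.
    """
    b, b_l, b_r = coef
    x1, x2, x_l, x_r = x
    if b - b_l >= 0:      # nonnegative coefficient
        return (b * (x1 - x_l / 6) - b_l * (x1 / 6 - x_l / 12),
                b * (x2 + x_r / 6) + b_r * (x2 / 6 + x_r / 12))
    if b + b_r < 0:       # negative coefficient
        return (b * (x2 + x_r / 6) - b_l * (x2 / 6 + x_r / 12),
                b * (x1 - x_l / 6) + b_r * (x1 / 6 - x_l / 12))
    raise ValueError("coefficient is neither nonnegative nor negative")


def functional(coefs, inputs, outputs):
    """Sum of squared gaps between model and observed weighted intervals.

    coefs[0] is the free term a~_0; coefs[j] multiplies inputs[i][j - 1].
    outputs[i] = (y1, y2, y_l, y_r) is the observed T-number for object i.
    """
    b0, b0_l, b0_r = coefs[0]
    total = 0.0
    for xs, (y1, y2, y_l, y_r) in zip(inputs, outputs):
        lo = b0 - b0_l / 6            # weighted interval of a~_0
        hi = b0 + b0_r / 6
        for coef, x in zip(coefs[1:], xs):
            t1, t2 = theta(coef, x)
            lo += t1
            hi += t2
        total += (lo - (y1 - y_l / 6)) ** 2 + (hi - (y2 + y_r / 6)) ** 2
    return total
```

A bound-constrained minimizer would then be applied to `functional` over the coefficient values, subject to $b^j_L\ge 0$, $b^j_R\ge 0$; the choice of minimizer is left open here.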

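Let the initial output data $\tilde Y_i\equiv\left(y^i_1,y^i_2,y^i_L,y^i_R\right)$, $i=\overline{1,n}$, be formalizations $\tilde{\tilde Y}_k\equiv\left(y^k_1,y^k_2,y^k_L,y^k_R\right)$, $k=\overline{1,p}$, of the linguistic values $Y_k$ of a student characteristic $Y$. After the predicted values $Y_i$, $i=\overline{1,n}$, are obtained, the problem arises of identifying them with $\tilde{\tilde Y}_k$, $k=\overline{1,p}$.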

The weighted intervals

$$\left[b^0-l\,b^0_L+\sum_{j=1}^{m}\theta^1_{\tilde a_j\tilde X^i_j}\left(b^j,b^j_L,b^j_R\right),\; b^0+r\,b^0_R+\sum_{j=1}^{m}\theta^2_{\tilde a_j\tilde X^i_j}\left(b^j,b^j_L,b^j_R\right)\right],\quad i=\overline{1,n},$$

for the predicted $Y_i$, $i=\overline{1,n}$, are designated by $\left[A^1_i,A^2_i\right]$, $i=\overline{1,n}$, accordingly. The weighted intervals $\left[y^k_1-l\,y^k_L,\; y^k_2+r\,y^k_R\right]$, $k=\overline{1,p}$, for the fuzzy numbers $\tilde{\tilde Y}_k\equiv\left(y^k_1,y^k_2,y^k_L,y^k_R\right)$, $k=\overline{1,p}$, are designated by $\left[B^1_k,B^2_k\right]$, $k=\overline{1,p}$, accordingly. (The formulas above correspond to $l=r=\frac{1}{6}$.)

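Let

$$f^2\left(Y_i,\tilde{\tilde Y}_k\right)=\left(A^1_i-B^1_k\right)^2+\left(A^2_i-B^2_k\right)^2,\quad i=\overline{1,n},\; k=\overline{1,p}.$$

The predicted $Y_i$ is identified with the linguistic value $Y_s$ if

$$f^2\left(Y_i,\tilde{\tilde Y}_s\right)=\min_{k}f^2\left(Y_i,\tilde{\tilde Y}_k\right),\quad k=\overline{1,p}.$$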
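The identification rule amounts to a nearest-interval search. A minimal Python sketch follows, with hypothetical names and a hypothetical three-level scale; it assumes T-numbers with $l=r=\frac{1}{6}$ and is an illustration rather than the authors' implementation.

```python
def weighted_interval(t, l=1 / 6, r=1 / 6):
    """Weighted interval [y1 - l*y_l, y2 + r*y_r] of a number (y1, y2, y_l, y_r)."""
    y1, y2, y_l, y_r = t
    return y1 - l * y_l, y2 + r * y_r


def identify(predicted_interval, levels):
    """Index s of the linguistic value closest to the predicted weighted interval.

    predicted_interval = (A1, A2); levels is a list of T-numbers that
    formalize the linguistic values (hypothetical data in the example below).
    """
    a1, a2 = predicted_interval

    def f2(level):
        b1, b2 = weighted_interval(level)
        return (a1 - b1) ** 2 + (a2 - b2) ** 2   # squared affinity measure

    return min(range(len(levels)), key=lambda k: f2(levels[k]))


# Hypothetical scale "low", "medium", "high" on [0, 1].
LEVELS = [(0.0, 0.2, 0.0, 0.2), (0.4, 0.6, 0.2, 0.2), (0.8, 1.0, 0.2, 0.0)]
```

For instance, `identify((0.45, 0.7), LEVELS)` selects the index of the middle level, because its weighted interval lies closest to the predicted one.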

7 CONCLUSION

An essential feature of the educational sphere is the difficulty of quantitatively evaluating the training and management processes, because many characteristics of these processes are non-numeric and are evaluated by experts in the words or phrases of their professional language. The use of natural-language words gives rise to uncertainty in the form of fuzziness. Since the use of words in assessing characteristics is an objective reality rather than an artificially contrived problem, a fuzzy environment is a typical situation in the educational sphere and cannot be ignored.

This paper is devoted to models of educational information processing based on fuzzy set theory. The first model makes it possible to build expert evaluation models in the form of semantic spaces with special properties that are consonant with the intellectual activity of experts. Such formalization makes it possible to present dissimilar data in a common abstract form and to operate on them correctly through their membership functions. It allows different groups of students to be compared, for example with respect to their progress and psychophysiological characteristics. It also allows the criteria of different experts to be compared on the basis of their assessments of one and the same group of students. The second model determines fuzzy, interval (with a given confidence level), crisp and normed rating points of students within the framework of qualitative characteristics, and assigns one of the qualification levels to every student. The fuzzy rating points obtained are proposed for clustering students on the basis of expert opinions.

A method for multiple fuzzy regression based on weighted intervals was developed in this paper. The method makes it possible to fit a model to the linguistic values of students’ qualitative characteristics and to predict these values.

REFERENCES

[1] Chiu-Keung, Law. Using fuzzy numbers in educational grading system, Fuzzy Sets and Systems, V. 83, 1996, pp. 311-323.
[2] Poleshuk, O.M., Komarov, E.G. Methods and models of fuzzy information processing, M.: Energoatomizdat, 2007.
[3] Tanaka, H., Uejima, S., Asai, K. Linear regression analysis with fuzzy model, IEEE Trans. Systems Man Cybernet., SMC-12, 1982, pp. 903-907.


[4] Tanaka, H., Ishibuchi, H. Identification of possibilistic linear models, Fuzzy Sets and Systems, V. 41, 1991, pp. 145-160.
[5] Tanaka, H., Ishibuchi, H., Yoshikawa, S. Exponential possibility regression analysis, Fuzzy Sets and Systems, V. 69, 1995, pp. 305-318.
[6] Hathaway, R.J., Bezdek, J.C. Switching regression models and fuzzy clustering, IEEE Transactions on Fuzzy Systems, V. 1, № 3, 1993, pp. 195-203.
[7] Turksen, I.B. Fuzzy functions with LSE, Applied Soft Computing, V. 8, № 3, 2008, pp. 1178-1188.
[8] Celikyilmaz, A., Turksen, I.B. Fuzzy functions with support vector machines, Information Sciences, V. 177, № 23, 2007, pp. 5163-5177.
[9] Yao, C.C., Yu, P.T. Fuzzy regression based on asymmetric support vector machines, Applied Mathematics and Computation, V. 182, 2006, pp. 175-193.
[10] Celmins, A. Least squares model fitting to fuzzy vector data, Fuzzy Sets and Systems, V. 22, 1987, pp. 245-269.
[11] Celmins, A. Multidimensional least-squares model fitting of fuzzy models, Math. Modeling, V. 9, 1987, pp. 669-690.
[12] Savic, D.A., Pedrycz, W. Evaluation of fuzzy linear regression models, Fuzzy Sets and Systems, V. 39, 1991, pp. 51-63.
[13] Chang, Y.-H.O. Synthesize fuzzy-random data by hybrid fuzzy least-squares regression analysis, J. National Kaohsiung Inst. Technol., V. 28, 1997, pp. 1-14.
[14] Chang, Y.-H.O. Hybrid fuzzy-random analysis for system modeling, J. National Kaohsiung Inst. Technol., V. 29, 1998, pp. 1-9.
[15] Chang, Y.-H.O. Hybrid fuzzy least-squares regression analysis and its reliability measures, Fuzzy Sets and Systems, V. 119, 2001, pp. 225-246.
[16] Chang, Y.-H.O., Ayyub, B.M. Fuzzy regression methods - a comparative assessment, Fuzzy Sets and Systems, V. 119, 2001, pp. 187-203.
[17] Domrachev, V.G., Poleshuk, O.M. A regression model for fuzzy initial data, Automation and Remote Control, V. 64, № 11, 2003, pp. 1715-1724.
[18] Poleshuk, O.M., Komarov, E.G. Using fuzzy regression analysis in educational process, Proceedings of the X Belarussian Mathematical Conference, V. 5, 2008, pp. 74-79.
[19] Ryjov, A.P. Theory of fuzzy sets and fuzziness measurement elements, M.: Dialog-MGU, 1998.
[20] Zadeh, L.A. The concept of a linguistic variable and its application to approximate reasoning, Parts 1, 2 and 3, Information Sciences, V. 8, 1975, pp. 199-249, pp. 301-357, and Information Sciences, V. 9, 1976, pp. 43-80.


[21] Ryjov, A.P. Description of Students in Human-Machine Information Systems, Application of Fuzzy Systems, Proceedings of the International Conference on Application of Fuzzy Systems, 1994, pp. 246-249.
[22] Cheng, C.H., Wang, J.W., Tsai, M.F., Huang, K.C. Appraisal support system for high school teachers based on fuzzy linguistic integrating operation, Journal of Human Resource Management, № 4, 2004, pp. 73-89.
[23] Poleshuk, O.M., Komarov, E.G. The determination of students’ rating points on fuzzy formalization of initial information basis, Education, science and economics at universities. Integration to international education area, Plock, Poland, 2008, pp. 67-73.
[24] Wang, C.H., Chen, S.M. Appraising the performance of high school teachers based on fuzzy number arithmetic operations, Soft Computing - A Fusion of Foundations, Methodologies and Applications, № 12, 2008, pp. 919-934.
[25] Wang, Y.H., Yang, J.B., Xu, D.L., Chin, K.S. On the centroids of fuzzy numbers, Fuzzy Sets and Systems, V. 157, 2006, pp. 919-926.
[26] Echauz, J.R., Vachtsevanos, G.J. Fuzzy grading system, IEEE Trans. Educ., V. 38, № 2, 1995, pp. 158-164.
[27] Ranjit, B. An application of fuzzy set in students’ evaluation, Fuzzy Sets and Systems, V. 74, 1995, pp. 187-194.
[28] Biswas, R. An application of fuzzy sets in student’s evaluation, Fuzzy Sets and Systems, V. 74, 1995, pp. 194-197.
[29] Capaldo, G., Zollo, G. Applying fuzzy logic to personnel assessment: A case study, Omega The International Journal, № 29, 2001, pp. 585-597.
[30] Chen, S.M., Lee, C.H. New methods for students’ evaluation using fuzzy sets, Fuzzy Sets and Systems, V. 104, 1999, pp. 209-218.
[31] Poleshchuk, O., Komarov, E. The determination of rating points of objects with qualitative characteristics and their usage in decision making problems, Proceedings of World Academy of Science, Engineering and Technology, V. 40, ISSN: 2070-3740, 2009, pp. 313-317.
[32] Poleshchuk, O., Komarov, E. The determination of rating points of objects and groups of objects with qualitative characteristics, Proceedings of the 28th International Conference of the North American Fuzzy Information Processing Society, NAFIPS'2009, ISBN: 978-1-4244-4577-6, 2009.
[33] Poleshchuk, O., Komarov, E. The determination of students’ fuzzy rating points and qualification levels, Proceedings of the 1st International Fuzzy Systems Symposium, 2009, pp. 218-224.
[34] Dubois, D., Prade, H. Fuzzy real algebra: some results, Fuzzy Sets and Systems, V. 4, 1979, pp. 327-348.


[35] Poleshuk, O.M., Komarov, E.G. New defuzzification method based on weighted intervals, Proceedings of the 27th International Conference of the North American Fuzzy Information Processing Society, NAFIPS'2008, 2008.
[36] Klir, G.J., Yuan, B. Fuzzy Sets and Fuzzy Logic - Theory and Applications, Prentice-Hall, New York, 1995.
[37] Song, Q., Bortolan, G. Some properties of defuzzification neural networks, Fuzzy Sets and Systems, V. 61, 1994, pp. 83-89.
[38] Roychowdhury, S., Wang, B.-H. Cooperative neighbors in defuzzification, Fuzzy Sets and Systems, V. 78, 1996, pp. 37-49.
[39] Yager, R.R., Filev, D.P. On the issue of defuzzification and selection based on a fuzzy set, Fuzzy Sets and Systems, V. 55, 1993, pp. 255-272.
[40] Coleman, T.F., Li, Y. A reflective Newton method for minimizing a quadratic function subject to bounds on some of the variables, SIAM J. Optim., V. 6, № 4, 1996, pp. 1040-1058.